Pandas Kung Fu: Using Pandas with Threat Intelligence

Simple Threat Analysis with Pandas

This Jupyter notebook is available at

This notebook demonstrates a simple way to see whether egress (outbound) transactions from wire data traffic are terminating at potentially malicious sites. The Python library pandas is used to compare the server IP addresses in a mock set of business egress traffic against the IP addresses of known malicious sites.

To represent ‘known bad actors’, this example uses IP addresses that have been reported within the last 48 hours as having run attacks on the Mail, Postfix service, taken from the Blocklist website and downloaded Jan 9, 2018.

Import the pandas package. The %matplotlib inline magic is needed to display matplotlib graphs in a Jupyter notebook.
In [45]:
import pandas as pd
%matplotlib inline
For this example we use a mock dataset, ‘egress’, containing 10,000 records of egress (outbound) transactions in a CSV file.

Load the data into pandas using pd.read_csv. Our data uses latin-1 encoding; depending on your source, you may need to specify another option, such as utf-8 or utf-16.

In [46]:
egress=pd.read_csv("Traffic.csv", encoding='latin-1')
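
If you are not sure which encoding a file uses, one option is to try a short list of candidates in order. The following is a minimal sketch on a made-up in-memory sample rather than the real Traffic.csv; the helper name read_with_fallback and the sample rows are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample, encoded as latin-1 to mimic the notebook's file.
raw = (
    "Dest Location,Dest Country\n"
    "Île-de-France,France\n"
    "California,United States\n"
).encode("latin-1")


def read_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn and return (dataframe, encoding_used)."""
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(data), encoding=enc), enc
        except UnicodeDecodeError:
            # This encoding cannot decode the bytes; try the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


sample, used = read_with_fallback(raw)
print(used)
print(sample)
```

Because latin-1 accepts any byte sequence, it only makes sense as the last candidate; a file that decodes without error can still contain the wrong characters, so it is worth eyeballing a few non-ASCII values afterwards.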

Examining the first few records, we can see that the Server Address column is the destination IP address for outbound (egress) transactions.

In [47]:
Time Record Type Source Destination Source Location Dest Location Environment Dest Country Protocol Client Address Client Bytes Server Address Server Bytes Latency Process Time
0 55:39.4 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States telnet 3 36 NaN NaN
1 55:38.5 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States tcp:23 1 1 NaN 17.626
2 55:38.5 Flow Audit Santa Clara Campus (Users) External Santa Clara Office California EGRESS United States SSL:443 173 133 NaN NaN
3 55:38.5 Flow Audit Santa Clara Campus (Users) External Santa Clara Office California EGRESS United States SSL:443 600 1,406 NaN 0.747
4 55:38.3 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States tcp:23 1 1 NaN 64.652

We can quickly identify the countries and volume of traffic associated with the egress servers using the column ‘Dest Country’, which stands for Destination Country.

Index into the ‘egress’ dataframe to select the column. This creates a pandas Series object, which we assign to the variable ‘servers’.

Using the value_counts() method on the servers object gives us a count for each country. The US was the most frequent destination country (9580), followed by Norway (291) and Ireland (51).

In [15]:
servers=egress['Dest Country']
In [16]:
United States     9580
Norway             291
Ireland             51
Japan               22
France              13
Netherlands         10
Singapore            5
Poland               1
United Kingdom       1
Name: Dest Country, dtype: int64

We can plot the distribution by using the .plot() method and specifying a horizontal bar chart. However, in this example it is not very useful because of the very high frequency of US IP addresses compared with other countries.

In [18]:
egress['Dest Country'].value_counts().plot(kind='barh')
<matplotlib.axes._subplots.AxesSubplot at 0x118517780>
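
When one category dominates like this, it can help to look at percentage shares or to set the dominant category aside before plotting. The sketch below rebuilds the counts from the value_counts() output above as a plain Series; the variable names are illustrative and the plotting calls are left as comments so the snippet runs without a display.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {
        "United States": 9580,
        "Norway": 291,
        "Ireland": 51,
        "Japan": 22,
        "France": 13,
        "Netherlands": 10,
        "Singapore": 5,
        "Poland": 1,
        "United Kingdom": 1,
    },
    name="Dest Country",
)

# Share of each destination country as a percentage of all counted rows.
share = (counts / counts.sum() * 100).round(2)
print(share)

# Drop the dominant category so the remaining countries are comparable.
non_us = counts.drop("United States")
print(non_us)

# Two plotting options (require matplotlib):
# non_us.plot(kind="barh")
# counts.plot(kind="barh", logx=True)
```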

What if we wanted to look at the traffic to a particular country, such as France? We could use indexing combined with the pandas .isin() method, which is similar to the SQL IN operator. This produces a new dataframe ‘France’.

In [48]:
France=(egress.loc[egress['Dest Country'].isin(['France'])])
In [49]:
Time Record Type Source Destination Source Location Dest Location Environment Dest Country Protocol Client Address Client Bytes Server Address Server Bytes Latency Process Time
1784 50:13.4 Flow Audit Santa Clara Campus (Users) External Santa Clara Office Île-de-France EGRESS France SSL:443 126 258 NaN 57.071
1787 50:13.3 Flow Audit Santa Clara Campus (Users) External Santa Clara Office Île-de-France EGRESS France SSL:443 517 2,916 60.99 76.425
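
For comparison, here is a small sketch of three ways to filter on a column. .isin() tests exact membership in a list (closest to SQL IN), == tests a single exact value, and .str.contains() does pattern matching (closer to SQL LIKE). The toy dataframe and its values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the 'egress' dataframe.
egress_demo = pd.DataFrame(
    {
        "Dest Country": ["France", "United States", "Norway", "France", None],
        "Protocol": ["SSL:443", "tcp:23", "SSL:443", "SSL:443", "telnet"],
    }
)

# Exact membership in a list of values (like SQL IN).
in_list = egress_demo.loc[egress_demo["Dest Country"].isin(["France", "Norway"])]

# Exact match on a single value.
exact = egress_demo.loc[egress_demo["Dest Country"] == "France"]

# Pattern match (like SQL LIKE 'Fr%'); na=False treats missing values as no match.
pattern = egress_demo.loc[egress_demo["Dest Country"].str.contains("^Fr", na=False)]

print(len(in_list), len(exact), len(pattern))
```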
In [50]:

Using the describe method we can see that there are 13 transactions, the same number identified using .value_counts() above. All of these transactions are from the Santa Clara Campus, and the most frequent destination location is Île-de-France. The ‘top’ row also shows the most common server address.

In [51]:
Time Record Type Source Destination Source Location Dest Location Environment Dest Country Protocol Client Address Client Bytes Server Address Server Bytes Latency Process Time
count 13 13 13 13 13 5 13 13 13 13 13 13 13 6 12
unique 9 1 1 1 1 1 1 1 1 1 8 4 9 6 12
top 40:36.8 Flow Audit Santa Clara Campus (Users) External Santa Clara Office Île-de-France EGRESS France SSL:443 517 137 130.23 134.514
freq 2 13 13 13 13 5 13 13 13 13 3 4 2 1 1

Server Addresses

Let’s return to the full ‘egress’ dataframe and select all data in the ‘Server Address’ column, assigning it to the variable ‘server_ip’.

In [52]:
server_ip=egress['Server Address']

Using the .describe() method we see that there are 10,000 records and 691 unique server addresses; the top (highest-frequency) IP address occurs 225 times.

In [53]:
count            10000
unique             691
freq               225
Name: Server Address, dtype: object

We can use .value_counts() to count the frequency of each IP address and .head(20) to limit the results to the top 20.

In [27]:
Out[27]: 225 221 196 181 152 147 145 142 132 114 105 104 101 99 97 94 92 90 90 89
Name: Server Address, dtype: int64
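
To make the two calls concrete, here is a minimal sketch on a made-up Series. The addresses come from the 192.0.2.0/24 documentation range and are not from the real dataset.

```python
import pandas as pd

# Hypothetical server addresses: three, two and one occurrence respectively.
server_ip_demo = pd.Series(
    ["192.0.2.10"] * 3 + ["192.0.2.20"] * 2 + ["192.0.2.30"],
    name="Server Address",
)

# For text data, describe() reports count, unique, top and freq.
summary = server_ip_demo.describe()
print(summary)

# value_counts() sorts by frequency (descending); head(n) keeps the top n.
top_two = server_ip_demo.value_counts().head(2)
print(top_two)
```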

This time the bar graph offers greater visualization power.

In [28]:
<matplotlib.axes._subplots.AxesSubplot at 0x118ad8710>

Right now server_ip is a pandas.Series object; we can use the built-in type() function to confirm that. If you want this data in dataframe format, use the .to_frame() method to convert it.

In [54]:
In [55]:
In [56]:

Using .tail() we can confirm that our new server_ip dataframe still has 10,000 records.

In [35]:
Server Address
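
The conversion steps described above might look like the following sketch, shown on a made-up four-row Series so it runs on its own. The names server_ip_demo and server_ip_frame are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the 'server_ip' Series.
server_ip_demo = pd.Series(
    ["192.0.2.10", "192.0.2.20", "192.0.2.10", "192.0.2.30"],
    name="Server Address",
)

# type() is a Python built-in, not a pandas method.
print(type(server_ip_demo))

# to_frame() turns the Series into a one-column dataframe named after the Series.
server_ip_frame = server_ip_demo.to_frame()
print(type(server_ip_frame))

# tail() shows the last rows; the final index label confirms nothing was lost.
print(server_ip_frame.tail(2))
print(len(server_ip_frame))
```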

The Bad Guys

The file ‘badips.csv’ contains a list of malicious IP addresses from the Blocklist website. We use pd.read_csv to bring the data into a pandas dataframe. We specify that there is no header and assign the column name ‘Server Address’ to our one-column dataframe.

In [79]:
blocklist = pd.read_csv('badips.csv', header=None, names=['Server Address'])
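
The header=None argument matters for a headerless list like this one. Here is a small sketch of what happens with and without it, using a made-up three-line file held in memory instead of badips.csv.

```python
import io

import pandas as pd

# Hypothetical blocklist content: one address per line, no header row.
raw = "198.51.100.7\n203.0.113.9\n192.0.2.44\n"

# Without header=None, pandas treats the first address as the column name.
guessed = pd.read_csv(io.StringIO(raw))
print(guessed.columns.tolist(), len(guessed))

# With header=None and names=..., every line is kept as data.
blocklist_demo = pd.read_csv(
    io.StringIO(raw), header=None, names=["Server Address"]
)
print(blocklist_demo)
```

Losing the first row silently would mean one bad address never gets checked, so it is worth comparing the row count against the line count of the source file.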

The blocklist dataframe has 20,134 Server Addresses. All of them are unique, as each address represents a different site.

In [83]:
Server Address
count 20134
unique 20134
freq 1
In [84]:
Server Address

To see how many of the egress transactions in the ‘egress’ dataframe are terminating at malicious sites from the ‘blocklist’ dataframe, we merge the two dataframes.

We use pd.merge and specify an inner join on the ‘Server Address’ column.

Congratulations! The resulting joined_ips dataframe is empty. There are no transactions terminating at a known malicious site.

In [85]:
joined_ips=pd.merge(egress,blocklist, on='Server Address', how='inner')
In [86]:
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 15 columns):
Time               0 non-null object
Record Type        0 non-null object
Source             0 non-null object
Destination        0 non-null object
Source Location    0 non-null object
Dest Location      0 non-null object
Environment        0 non-null object
Dest Country       0 non-null object
Protocol           0 non-null object
Client Address     0 non-null object
Client Bytes       0 non-null object
Server Address     0 non-null object
Server Bytes       0 non-null object
Latency            0 non-null object
Process Time       0 non-null object
dtypes: object(15)
memory usage: 0.0+ bytes
In [87]:
Time Record Type Source Destination Source Location Dest Location Environment Dest Country Protocol Client Address Client Bytes Server Address Server Bytes Latency Process Time
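
The same check can be reproduced on toy data. This is a minimal sketch, with made-up addresses and frame names, showing the inner join, a quick emptiness test, and an equivalent .isin() filter that avoids building a joined frame.

```python
import pandas as pd

# Toy stand-ins for the 'egress' and 'blocklist' dataframes.
egress_demo = pd.DataFrame(
    {
        "Server Address": ["192.0.2.10", "192.0.2.20", "192.0.2.10"],
        "Protocol": ["SSL:443", "tcp:23", "SSL:443"],
    }
)
blocklist_demo = pd.DataFrame({"Server Address": ["198.51.100.7", "203.0.113.9"]})

# Inner join keeps only rows whose Server Address appears in both frames.
joined = pd.merge(egress_demo, blocklist_demo, on="Server Address", how="inner")
print(joined.empty)

# Alternative: boolean filter on membership, no second frame of columns needed.
flagged = egress_demo[
    egress_demo["Server Address"].isin(blocklist_demo["Server Address"])
]
print(len(flagged))
```

An empty result only means there were no exact string matches. Stray whitespace or differing address formats in either file would also produce an empty join, which is a good reason to validate the join with known matches.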

Let’s add some fake data (like fake news but better!) into our blocklist to make sure the join is working. I have taken 10 IP addresses from the ‘egress’ dataframe and created a file ‘fakebadservers.csv’. After loading this file using pd.read_csv, I concatenated this dataframe to ‘blocklist’ to create ‘fakeblocklist’.

In [88]:
fakeblocklist=pd.read_csv('fakebadservers.csv', names=['Server Address'])
In [89]:
fakeblocklist=pd.concat([blocklist, fakeblocklist], axis=0)
In [90]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20144 entries, 0 to 9
Data columns (total 1 columns):
Server Address    20144 non-null object
dtypes: object(1)
memory usage: 314.8+ KB
In [91]:
Server Address
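
Note that the index above runs "0 to 9" even though there are 20144 entries, because pd.concat keeps each frame's original row labels by default. The sketch below, on made-up two-row frames, shows that behaviour and one way to get a clean index and drop repeated addresses. Deduplicating is my suggestion rather than a step in the original notebook; it matters because a repeated blocklist address would duplicate matching rows in an inner merge.

```python
import pandas as pd

# Toy stand-ins; 203.0.113.9 deliberately appears in both frames.
blocklist_demo = pd.DataFrame({"Server Address": ["198.51.100.7", "203.0.113.9"]})
fake_demo = pd.DataFrame({"Server Address": ["192.0.2.10", "203.0.113.9"]})

# Default concat: row labels from both frames are kept, so they repeat.
stacked = pd.concat([blocklist_demo, fake_demo], axis=0)
print(stacked.index.tolist())

# Renumber the rows and remove repeated addresses.
clean = (
    pd.concat([blocklist_demo, fake_demo], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
print(clean)
```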

Let’s try the merge again, only this time using ‘egress’ and ‘fakeblocklist’. This merge identifies 580 transactions associated with a known ‘bad actor’; that is, the Server Address in ‘egress’ matches a Server Address in ‘fakeblocklist’.

In [92]:
joined_ips2=pd.merge(egress,fakeblocklist, on='Server Address', how='inner')
In [93]:
Time Record Type Source Destination Source Location Dest Location Environment Dest Country Protocol Client Address Client Bytes Server Address Server Bytes Latency Process Time
0 55:38.5 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States tcp:23 1 1 NaN 17.626
1 55:38.3 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States tcp:23 1 1 NaN 64.652
2 55:38.0 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States tcp:23 3 3 NaN 548,711.66
3 55:36.5 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States tcp:23 1 1 NaN 297.186
4 55:36.2 Flow Audit External External DC Datacenter DC Datacenter EGRESS United States tcp:23 1 4 NaN 61.68
In [94]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 580 entries, 0 to 579
Data columns (total 15 columns):
Time               580 non-null object
Record Type        580 non-null object
Source             580 non-null object
Destination        580 non-null object
Source Location    580 non-null object
Dest Location      580 non-null object
Environment        580 non-null object
Dest Country       580 non-null object
Protocol           580 non-null object
Client Address     580 non-null object
Client Bytes       573 non-null object
Server Address     580 non-null object
Server Bytes       573 non-null object
Latency            193 non-null object
Process Time       539 non-null object
dtypes: object(15)
memory usage: 72.5+ KB
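
The info() output shows every column stored as object, including byte counts that contain thousands separators such as 1,406. If you want to total the traffic per flagged server, the counts need converting first. The following is a hedged sketch on a made-up three-row frame; the column names match the dataset but the values and the per_server name are illustrative.

```python
import pandas as pd

# Toy stand-in for the merged hits; byte counts are strings, one is missing.
hits = pd.DataFrame(
    {
        "Server Address": ["192.0.2.10", "192.0.2.10", "192.0.2.20"],
        "Client Bytes": ["1", "1,406", None],
    }
)

# Strip thousands separators, then convert to numbers (missing stays NaN).
hits["Client Bytes"] = pd.to_numeric(
    hits["Client Bytes"].str.replace(",", "", regex=False)
)

# Count transactions and total client bytes per flagged server.
per_server = hits.groupby("Server Address").agg(
    transactions=("Client Bytes", "size"),
    client_bytes=("Client Bytes", "sum"),
)
print(per_server)
```

When loading from a file, passing thousands="," to pd.read_csv is another way to get numeric columns up front.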


This is just a simple example, but I think pandas offers great potential for threat detection. Next step – API integration.


Refer(er) MADNESS!!! “Phish Phinding” with Wire Data


I was reading the Cisco mid-year security review and once again, phishing is still staggeringly effective. One of the strategies in looking for possible phishing campaigns is investigating HTTP referrer metrics in log files. This is not a bad way to look for phishing campaigns but I wanted to take the time to cover a log-alternative to investigating phishing attacks using ExtraHop’s Wire Data analytics.

Why wire data?
I recently read a whitepaper by Solarwinds stating that a typical peak events-per-second rate on a web farm is around 1100. Cast this across 6-8 business hours a day and you are looking at paying to store and index between 24 million and 32 million events, of which only a fraction is relevant to searching for suspicious referrers. While I love logs and the use of them, the intelligence yield of most log solutions can be measured in the “thousandths” of a percent. In this post, I am going to walk through how to look for specific referrers that do not match what you expect to see and provide ONLY the actionable HTTP referrers that could be the result of phishing campaigns.

What do we want to do?
We have a site that logs users into our fictitious financial site. All users should be sent to a welcome page (yes, this is simplistic). We can either hash the expected HTTP referer(s) and look for referers whose hashes do NOT match, or we can compare against the referers directly. If you have a large number of HTTP referer sites, you may find it looks MUCH cleaner to use hashes. For this example we want to look for two things.

  1. Any HTTP referer that DOES NOT match the goodRefer array
  2. Any “double-dotted” namespace that exists in the domArray array. (You may also want to check for “appended” or “double-dotted” namespace on line 7 as well in case you think internal users may get phished and you want to catch it on the EGRESS.)
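
The two checks can be sketched in Python as follows. This is not the ExtraHop trigger itself (triggers are not written in Python); it only illustrates the logic. The names goodRefer and domArray come from the text above, while the example hostnames and my reading of "double-dotted" (a protected domain embedded in front of an attacker-controlled domain) are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical values standing in for the goodRefer and domArray arrays.
GOOD_REFER = {"login.example-bank.test"}
DOM_ARRAY = ["example-bank.test"]


def classify_referer(referer):
    """Return 'ok', 'double-dotted' or 'unexpected' for an HTTP referer URL."""
    host = (urlsplit(referer).hostname or "").lower()
    if host in GOOD_REFER:
        return "ok"
    for dom in DOM_ARRAY:
        # A genuine host is the domain itself or one of its subdomains.
        legit = host == dom or host.endswith("." + dom)
        # Assumed meaning of "double-dotted": our domain followed by more labels,
        # e.g. login.example-bank.test.evil.example
        if not legit and (dom + ".") in host:
            return "double-dotted"
    return "unexpected"


for url in (
    "https://login.example-bank.test/signin",
    "http://login.example-bank.test.evil.example/signin",
    "http://evil.example/landing",
):
    print(url, "->", classify_referer(url))
```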

Okay, we have some unauthorized HTTP referers and we have verified that they are phishing sites. Now what do we do?
Should you observe unauthorized HTTP referers, you have several options. Those that I can think of initially are as follows:

  • If this is an external customer, you can log the credentials on the site and begin the process of alerting them.
  • Build in a redirect policy sending users sourcing from the offending referer to a page that alerts them that they have been phished.
  • Send the data to your poor-overburdened SIEM with some high INTEL-yield actionable logs!!
  • JSON.stringify the results and send them to the FS-ISAC and alert the rest of the community (and punk-bust the offenders that much faster!)
  • HTTP POST the results to your API-driven orchestration and IR solution (ChatOps, Risklense, ServiceNow, etc.)
  • Expose the unauthorized referer data via our API to existing workflows that can check against Threat intelligence.

Within my lab I am writing the data out to our search appliance, with a one-click link directly to the packets for digital evidence and further investigation.

Phishing is still an effective vector for bad actors today, and the less friction we can put between practitioners and the data they need, the better. Leveraging wire data for this task is considerably more agile than log-based solutions, and it delivers better data to your existing SIEM investment. Having an open platform like ExtraHop brings incident response, system owners, end users and customers, as well as the security community at large, into the solution. We have had stand-alone security for a long time. In some situations that is as good as it gets, but where we can, I believe we should leverage the entire community. It is the gaps between us (users, customers, system owners and security) that are exploited as much as any vuln.

Thanks for reading!!

John Smith
Sr. Security Engineer
ExtraHop Networks


Hand to Hand Combat: Finding Responder on your network using Wire Data

I was listening to the Security Weekly podcast this week with Paul Asadoorian and John Strand and I heard them talking about a product called Mazerunner that can detect Responder running on your network. I began to wonder if there was a way to detect it using the ExtraHop platform. So I downloaded a Kali OVA file and started looking into what the traffic looked like on the wire to see if I could make some sense of it. What I found was that I was, in fact, able to consistently detect Responder traffic by looking at the LLMNR responder packet itself coming from the Kali server.

What I observed:
Over the last four years, I have literally spent my time pulling metadata and PCAPs at the core router or via some sort of SPAN aggregation like Gigamon or Arista. I can honestly say that, among the protocols I come across, LLMNR (udp 5355) is not one I see very often. It is possible this is due to the request normally being sent out to a broadcast address (rarely the focus of my work), but seeing a response from an RFC1918 address was somewhat interesting. What I mean is, I would start up responder on my Kali box, then see the request to the broadcast address with the response sourcing from the Kali system in its own separate flow. An analogy would be if you asked a crowd “Where is TypoShare?” and someone wearing a trench-coat, fedora and sunglasses said “I got ya’ TypoShare right here….just log in”. On the wire, this is easy to pick out because the response does not come back over the ephemeral port; rather, it creates a new flow, so I am able to easily parse it out: the Kali server appears as the sender and the Windows domain workstation appears as the receiver. This alone does NOT make a malicious situation, but I want to start the process of gathering internal threat intelligence by keeping track of the system that responded to the LLMNR request.

Below is an example of a user “fat fingering” a share and having the lookup answered by the malicious Kali host (or anyone who basically “git clone”s the tool).
Note below that the Kali system answers the LLMNR request and provides name resolution for a file server/share that does not exist. It is not entirely uncommon in a large enterprise for someone to have a typo in a CIFS share name.


Below we see that we have observed someone answering an LLMNR/NBNS request

The next step is to write the system answering LLMNR requests to our session table so that we can look them up later. (cip and sip are variables for Client and Server IP respectively)

ExtraHop IOC protocol threading:


So what exactly is happening here?
For this test, I wanted to check a number of IOCs around SQL (TDS), web proxying and WPAD MITM hijacking, as well as hash stealing. To research this with logs, you would need to take the time to interrogate several different source log files and then try to mash them up. I am trying to do this within seconds and in flight, rather than after the events have been written to the SIEM. To do this, I use the ExtraHop “Session” table, a memcache key-value store where I park the system answering LLMNR queries and then see if it shows up again when I match additional IOCs around CIFS, HTTP and TDS. (If that didn’t make sense, feel free to email me.)
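
To make the idea concrete, here is a minimal Python sketch of a key-value table with expiring entries, used to park the address of a system that answered an LLMNR request. It is only an illustration of the concept, not the ExtraHop Session table API; the class, method and key names are hypothetical, as is the one-hour expiry.

```python
import time


class SessionTable:
    """Tiny in-memory key-value store whose entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def add(self, key, value, ttl_seconds):
        # Remember the value together with the moment it stops being valid.
        self._store[key] = (value, self._clock() + ttl_seconds)

    def lookup(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            # Entry is stale: forget it and report a miss.
            del self._store[key]
            return None
        return value


def on_llmnr_response(table, responder_ip, asking_ip):
    """Park the address that answered an LLMNR request for later checks."""
    table.add("llmnr:" + responder_ip, {"asked_by": asking_ip}, ttl_seconds=3600)


table = SessionTable()
on_llmnr_response(table, "192.0.2.66", "192.0.2.15")
print(table.lookup("llmnr:192.0.2.66"))
print(table.lookup("llmnr:192.0.2.99"))
```

Expiring the entries keeps the table from filling up with hosts that answered once long ago; how long to keep them is a tuning decision.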

So the next step is to check for any type of hijacking/tunneling that could occur as a result of someone running Responder on your network. First let’s have a look at HTTP tunneling (WPAD proxy hijacking).

HTTP WPAD Hijacking and TUNNELING: Responder tries to hijack proxy sessions on browsers that are set to auto-discover proxy servers. In this case, I open my browser and the Kali system happily resolves WPAD for me and offers to be the proxy server for my web session. In addition to obtaining my hash, it may also prompt me for creds because, hey, clear text is always better than a hash.

How do we catch this?
To catch this, we look at the HTTP headers (in real time, at up to 40 Gbps) for the host WPAD. If the result is true, we check the server IP against the IP address we put in the session table earlier; if that ALSO comes back true, we flag it as a potential actionable event. You may actually have a WPAD server in your environment, but I am comfortable in saying that you likely don’t have a WPAD server that had previously been resolving LLMNR and/or NBNS name requests. That makes it suspicious and actionable. The goal here is to give CSIRT and the SOC BETTER data, not more data. Also, keep in mind, all of this is being interrogated in flight so #LookMomNoSIEM!!! We can also send the data to a Syslog/SIEM workflow, which would likely be a welcome change: pre-parsed actionable intel sent to the SIEM vs. thousands of logs that must be sifted through to find actionable intelligence.

Likewise, with Tunneling we do something similar where we look for the Tunneling protocol consistent with proxy servers and if we see it, we check the server IP against our LLMNR host from earlier (again, “sip” is a variable for Server IP). As I said before, you may have a proxy server but likely not one that speaks LLMNR.
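
Expressed in Python, the two checks could look like the sketch below. It is an illustration of the logic, not the trigger code. The function name and the set of parked responder addresses are hypothetical, and treating the HTTP CONNECT method as the sign of proxy tunneling is my assumption about what the tunneling check looks for.

```python
def check_http_transaction(llmnr_responders, sip, host_header, method):
    """Return findings for one HTTP transaction served by address sip."""
    # Only servers that previously answered LLMNR requests are of interest.
    if sip not in llmnr_responders:
        return []
    findings = []
    # Drop any port from the Host header and compare case-insensitively.
    hostname = host_header.split(":")[0].lower()
    if hostname == "wpad" or hostname.startswith("wpad."):
        findings.append("WPAD served by LLMNR responder " + sip)
    # Assumption: CONNECT requests indicate proxy-style tunneling.
    if method.upper() == "CONNECT":
        findings.append("proxy tunnel through LLMNR responder " + sip)
    return findings


parked = {"192.0.2.66"}  # hypothetical address parked earlier
print(check_http_transaction(parked, "192.0.2.66", "wpad", "GET"))
print(check_http_transaction(parked, "192.0.2.66", "example.test:443", "CONNECT"))
print(check_http_transaction(parked, "192.0.2.80", "wpad", "GET"))
```

The third call returns nothing: a legitimate WPAD server that never answered LLMNR is not flagged, which is the point of the cross-check.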

Below you see the results, here we are just logging them to the debug window but we offer a number of ways to deal with IOC discovery that include but are not limited to:

  • ServiceNow SEVONE ticket creation
  • Syslog to a SIEM based workflow or orchestration
  • REST API to a workflow or orchestration solution



SQL Server Credential Stealing: Responder can also listen for SQL Server browser requests. If someone installs SQL Management Studio on their system and then proceeds to browse the network for a SQL Server, the Kali system running responder will offer them a system. Most folks would be curious and likely click on it. When they did so, they would hand over their hash (likely that of a developer account that has access to sensitive data), or they would try to offer it SQL credentials, and then the Kali server would have their SQL-based creds. The first time I saw this I was imagining the horror of someone literally stealing hashes and creds while frustrated developers kept offering their creds to it. So to combat this, we will look for the SQL browsing protocol (udp:1434) and if we see it, as before, we will check the server IP against our LLMNR responder that we parked in memcache (Session Table) to see if it matches. The caveat here is that I am not sure whether the SQL Browser actually uses LLMNR natively. I can say that I don’t see it very often when my head is in the packets at customer sites, but for my part, I have spent the last 5 years disabling this service on most SQL Servers for security reasons. At any rate, we can detect it with regularity, and many best practices advise disabling the SQL Browser service anyway, so if it shows up on your network with anything other than a broadcast address it could be an issue.

Below you see the trigger logic where we check for the presence of the SQL Browsing traffic then we check it against the IP we put in the session table.


If we see a match, here we are warning inside the debug window but in an implementation this would be sent to the SIEM, ServiceNow, an Email alert or some form of orchestration to initiate an incident response.


Watching CIFS to see if hashes are being stolen:
The last thing I want to show in this post is how to detect if someone tries to steal a hash using CIFS. (This can also be done using WPAD but I will cover that in a later post).

If an end user inadvertently types the wrong share name, either via Explorer or the “net use” command, the normal name resolution will fail, causing the system to broadcast the request. The Kali system will answer this request via NBT-NS or LLMNR and prompt the user to type credentials (see graphic below). What I noticed on the wire is that there is a CIFS message for “SMB2_SESSION_SETUP: STATUS_ACCESS_DENIED” even though there wasn’t an official request, thus you see the access denied message in the CIFS dialog box for the share \\1233\.

Casting the web:
So in preparing the trigger for this, I simply look for the error string and then, as we have seen previously, cross-check the IP address against those that have responded to LLMNR requests. In the event of a match, we write it out to the debug window, but depending on the customer we would run the normal Incident Response regimen.
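
All of the checks in this post follow one pattern: remember who answered LLMNR, then flag later transactions on other protocols that are served by the same address. The following Python sketch threads a made-up event stream through that pattern. The event fields, protocol labels and rule table are hypothetical and stand in for what the triggers see on the wire.

```python
# Per-protocol predicates; each receives one event dictionary.
SUSPECT_RULES = {
    "HTTP": lambda e: e.get("host", "").lower() == "wpad",
    "SQL_BROWSER": lambda e: e.get("l4") == "udp" and e.get("port") == 1434,
    "CIFS": lambda e: "STATUS_ACCESS_DENIED" in e.get("error", ""),
}


def thread_iocs(events):
    """Return (protocol, server_ip) pairs served by known LLMNR responders."""
    responders = set()
    findings = []
    for event in events:
        if event["proto"] == "LLMNR_RESPONSE":
            # Park the responder so later events can be matched against it.
            responders.add(event["responder_ip"])
            continue
        rule = SUSPECT_RULES.get(event["proto"])
        if rule and event.get("sip") in responders and rule(event):
            findings.append((event["proto"], event["sip"]))
    return findings


events = [
    {"proto": "LLMNR_RESPONSE", "responder_ip": "192.0.2.66"},
    {"proto": "HTTP", "sip": "192.0.2.66", "host": "wpad"},
    {"proto": "SQL_BROWSER", "sip": "192.0.2.66", "l4": "udp", "port": 1434},
    {"proto": "CIFS", "sip": "192.0.2.66",
     "error": "SMB2_SESSION_SETUP: STATUS_ACCESS_DENIED"},
    {"proto": "CIFS", "sip": "192.0.2.80",
     "error": "SMB2_SESSION_SETUP: STATUS_ACCESS_DENIED"},
]
print(thread_iocs(events))
```

The last event is ignored because 192.0.2.80 never answered an LLMNR request; an access-denied error on its own is ordinary noise.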


It had been a long time since I had used any pen testing tools. One thing I have to say, after 20 years of being a blue-teamer, is DAMN, the red team’s toys are SO MUCH cooler than ours!! This is not an attempt to one-up Security Weekly or Mazerunner; it was an interesting use case that I had not tried to tackle before with the platform and it made for an interesting weekend project. If you are curious, this took a few hours to write once I got all of the systems in place (AD lab, Kali system).

While we will write all of the code that we include with our specific security bundles, this is an example of how we have an open platform on which, if desired, you can engage in your own version of hand-to-hand combat with IOCs and bad actors. In this example, we have not had to set up any honey-pots or misdirection; there are products that already do that. We are simply looking at the existing traffic and parsing out behavior that is indicative of an IOC. One of the challenges of doing this with logs or machine data is that if an IOC uses more than one protocol then you likely have disparate logs/PCAPs for each protocol, and researching each data source is not agile at all. Using the Session table, we are able to park specific characteristics to be referenced in real time (in microseconds) and reused with other transactions, threading them together to create a picture within seconds. This is the true power being handed back to the blue team using our platform.

With ExtraHop’s application inspection triggers you have the ability to engage directly with the wire to root out IOCs like LLMNR responses, SQL Browsers, SMBv1, expired/weak certificates, etc. This gives you the same agility that your adversaries have, allowing you to pivot your surveillance to what is relevant for your environment. Lastly, an internal RFC1918 address will NEVER be in your TAXII/STIX feed or Threat Intelligence subscription. Leveraging a surveillance platform like ExtraHop will position you to gather and build your own internal threat intelligence on your internal addresses. As most breaches begin/source from the inside, the ability to create internal intelligence is critical.

Thanks for reading!



The Case for Wire Data: Save the SIEM

I recently attended Black Hat and one of the key narratives that I overheard while meeting with INFOSEC practitioners was the need to have better/smarter data. Attendees voiced some frustration with current traditional tools that deliver too much data, while the time it takes to respond to an incident leaves attackers enough time to do their dirty work. Over the last year we have heard several criticisms of SIEM-based solutions and some of the limitations that they have in dealing with today’s more agile threat landscape. My induction into security was based in SIEM and I even ran a blog dedicated to Syslogs where I detailed work I had done around my “Skunk” project. It stemmed from my desperate, but unheeded, pleas to my former manager to purchase Splunk: I asked “Can I have $20,000?” and was told “No”. Then I saw that Kiwi Syslog supported SQL Server connections and made a second plea, “Can I have $300?”, which was accepted. SKUNK stood for SQL, KIWI to make it like Splunk. We used SQL, SSRS and KIWI’s ODBC connector with some parsing engines. As SIEM products go, Splunk was ABSOLUTELY my first love! I still know a few brilliant engineers that work there and I have a great deal of respect for what that company has done to revolutionize security.

Over the last 5 years I had another epiphany: I was introduced by an SE named Matt Cauthorn to the concept of Wire Data Analytics. So why am I talking about a SIEM on a wire data analytics blog? Well, first, I feel like a lot of the criticism of SIEMs isn’t necessarily 100% fair. My experience with the SIEM is that it is only as good as the data you put into it. Even more important is the investment you make in back-end processes, in terms of how you parse, interpret and report, so that you can “set context”. By “set context” I mean find actionable data, which is the subject of some intense criticisms of SIEM products today.

In an article from Dark Reading citing a Ponemon institute study:

Today, those same products “barely work at all,” says Exabeam CMO Rick Caccia. Older systems aren’t built to capture credential or identity-based threats, hackers impersonating people on corporate networks, or rogue employees trying to steal data. A recent report by the Ponemon Institute, commissioned by Cyphort, discovered 76% of SIEM users across 559 businesses view SIEM as a strategically important security tool. However, only 48% were satisfied with the actionable intelligence their SIEMs generate.


With that I’d like to start writing about how Wire Data Analytics with ExtraHop can set context in flight, reduce the cost of your SIEM investment and bring to bear an entirely new set of metrics and provide security teams with better data instead of more data.

Setting Context in Flight:
ExtraHop’s wire data analytics capabilities enable you to set context in flight by interrogating the wire for specific events then applying logic to them in milliseconds so that your logs have considerably more value and a much higher intelligence yield.

Example: Auditing your PCI Environment.
You have a PCI environment that you want to set up network based auditing for. The rules are as follows:

  • Alert on ANY external non-RFC1918 access and report it as an Egress Violation
  • Alert on any client or server based traffic that has not been pre-defined.

Using the SIEM only approach you must perform the following:

– Audit/Log every single build-up and tear-down action which could result in thousands and potentially millions of logs via Syslog or Netflow
– Index and parse these logs
– Build/run back end batch processes to root out the few suspect transactions from the, potentially, millions of logs that you already have.

Now let’s consider what that would look like using an ExtraHop appliance.
– Create a rule that sets the appropriate communications
– Acceptable client/server traffic (Fill out the pre-defined application inspection trigger with the appropriate protocols)
– Tell the ExtraHop appliance to alert on any non-RFC1918 connection.
– Send ONLY actionable intelligence to the SIEM relieving both the CSIRT team and the back-end SIEM of the burden of indexing/parsing and sorting millions of logs.
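
The two audit rules can be sketched in a few lines of Python. This is an illustration of the decision logic only, not the application inspection trigger. The allow-list entries and the function names are hypothetical, and the RFC1918 test is done explicitly against the three private ranges instead of using a broader "is private" check.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Hypothetical pre-defined (client, server, protocol) combinations.
ALLOWED_FLOWS = {("10.1.1.10", "10.1.2.20", "tcp:1433")}


def is_rfc1918(address):
    """True if the address is IPv4 and falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)


def audit_flow(cip, sip, protocol):
    """Return an alert label for a flow, or None if it is expected."""
    if not is_rfc1918(sip):
        return "Egress Violation"
    if (cip, sip, protocol) not in ALLOWED_FLOWS:
        return "Undefined Flow"
    return None


print(audit_flow("10.1.1.10", "8.8.8.8", "SSL:443"))
print(audit_flow("10.1.1.10", "10.1.2.99", "tcp:23"))
print(audit_flow("10.1.1.10", "10.1.2.20", "tcp:1433"))
```

Only the first two flows would produce a message for the SIEM; the third matches the pre-defined traffic and is dropped silently.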



After setting the criteria, aka “casting the web” we need only lie in wait for something to run across it. In the video below you will see examples of how we have integrated with Splunk Cloud to “set context in flight” by sending ONLY logs that have violated the criteria cast in the application inspection trigger above.

Now instead of leaving the Threat hunter to sort through thousands or millions of logs on the back end we are sending data that is actionable because we set the rules prior to sending the Syslog message. As you can see below in the Splunk Cloud instance, every transaction sent to the SIEM is actionable vs. the madness of sending thousands and thousands of logs every second to your SIEM. This will make the bill for indexing much cheaper both from a licensing standpoint as well as a hardware scaling standpoint. (Please Watch the Video on Youtube)

New Concept: Intelligence Yield
In my time as the Event Correlation guru for my security team, one of the more frustrating things I would run into was that I consistently needed about 30% of what was in a log file but would pay to store and/or index 100% of the data. When you use ExtraHop as a forwarder you can pick and choose what parts of a log/payload you want to forward, and you can even customize the delimiter if you like. This means that there is no leftover ASCII that needs to be stored/indexed. While this may not seem like a lot, at scale it can actually get expensive! Another way we provide better intelligence yield is, as noted in the example above, that we set the conditions under which we would like to send Syslog data and ignore transactions that you may already be logging via whatever daemon you are running (apache for HTTP, etc). Why log HTTP network connections when you are already doing it in /var/log/apache/?


Possible Licensing Cost Savings:
We actually had a scenario with a customer where they wanted to find out if there were excessive logins from a single client. The traffic was sending thousands of messages per minute to their SIEM. We looked at what was happening and did the following:

  • Kept a running ticker of the number of logins per client IP
  • Sent actionable data to the SIEM by sending just those client IPs that had more than 5 logins in a ten-minute period, reducing the message count from thousands per minute to between 5 and 7.
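
A running ticker like this can be sketched in Python with a sliding window per client. The class name and timestamps are hypothetical; the threshold (more than 5 logins) and the ten-minute window are the figures from the scenario above.

```python
from collections import defaultdict, deque


class LoginTicker:
    """Counts logins per client IP inside a sliding time window."""

    def __init__(self, threshold=5, window_seconds=600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)

    def record(self, client_ip, timestamp):
        """Record one login; return True if the client is over the threshold."""
        recent = self._events[client_ip]
        recent.append(timestamp)
        # Discard logins that fell out of the window.
        while recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.threshold


ticker = LoginTicker()
# Six logins one minute apart: only the sixth crosses the threshold.
alerts = [ticker.record("192.0.2.15", t) for t in range(0, 360, 60)]
print(alerts)
```

Only the calls that return True would be forwarded, which is how thousands of per-login messages shrink to a handful of actionable ones.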

In the fictitious scenario below, we are using Splunk’s list price to show the difference in savings when you use ExtraHop as a forwarder and give the SIEM a break on processing messages. Keep in mind that, while this is an overly simple example, there may be parts of your logging regimen where ExtraHop can provide in-flight context, a reduction in the amount of work and licensing costs, and an increase in the quality of the data you are receiving in your SIEM.

Scenario: Reducing your licensing costs as well as your Mean Time To “WTF!?” (MTTWTF)
A customer has 500 clients with each Client node sending around 2500 logs per minute (this is HARDLY out of the ordinary for a large enterprise).  So if you have say 500 clients sending 2500 events per minute you are looking at 1.25 million events being indexed every minute.  

Let’s say we use the SESSION_EXPIRE event: we are sending ONE event that has the Client:Server and a count of 500. In terms of “Intelligence Yield” it has the same value, but it has an overall impact of .04 percent (not four percent, “point zero four” percent). I would argue that the intelligence yield is actually higher because you have delivered a level of context (the count) in the syslog messages vs. leaving it to some algorithm or batch process on the back end to deliver context. Five events….”meh”……500, 5000 events….”WTF!”.
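The arithmetic behind the “point zero four” figure can be checked quickly. The sketch below assumes that each client’s 2500 events per minute collapse into a single summary event per client per minute; that reading is my assumption, and the function name is hypothetical.

```python
def summary_impact(clients, events_per_client_per_minute, summaries_per_client=1):
    """Return (raw events, summarized events, summarized as % of raw) per minute."""
    raw = clients * events_per_client_per_minute
    summarized = clients * summaries_per_client  # assumption: one summary event per client
    return raw, summarized, summarized / raw * 100


raw, summarized, percent = summary_impact(500, 2500)
print(raw, summarized, round(percent, 2))
```

With the numbers from the scenario, 1.25 million raw events per minute become 500 summary events, which is 0.04 percent of the original volume.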

Our “MTTWTF” (Mean Time To “WHAT THE F***….”) is potentially MUCH faster.  

So if I take the overly simplistic view of a 50GB Splunk license (it will NOT be this easy for you but I think most customers will get the value prop here)

From Splunk’s Website:

A 50GB Splunk license is $38K Annual or $95K perpetual WITH $19K in support.  If we can proportionally reduce the impact of their SIEM product you get the improved “MTTWTF” with a 2GB license which would cost $1500 Annually and $4500 perpetual w/$1500 in maintenance.

As I said earlier, the view here is simplistic but there are WITHOUT QUESTION logging regimens within customers that we can look to make more efficient using ExtraHop’s Wire Data Analytics and the Session table to replace logging every transaction. Also, please credit Splunk for publishing their sticker price.

This is not a knock on Splunk!

In this model you get the following:

  • On perpetual, roughly a 21x (about 95%) decrease in initial costs
  • On Annual, roughly a 25x (about 96%) decrease in initial costs
  • Better Intelligence Yield
  • No forwarders or debugging levels to enable on the clients themselves.

Keep in mind, the larger the license the smaller the savings will be, as Splunk rewards customers for the larger GB license, but the point here is that there are significant savings to be made in addition to having all around better logs.

What’s on YOUR wire!!?
When you leverage ExtraHop as a log forwarder you actually get access to the best source of data on your network. Not only do you get access to it, but you get application inspection triggers that will allow you to actively interact with it. When you are using ExtraHop, unlike logging based solutions, you are not dependent on someone to “opt-in” to logging. You will NEVER have to go to another team and ask them to install forwarders, agents or send data to a remote system. If you have an IP address, you have already opted-in and if you have an IP address, there is NO opting out. If a system is rooted and /var/log is deleted…we will still log. If logging is shut off on a system, we, like a closed-circuit TV, will continue surveillance and logging.

No agents, no logs….NO PROBLEM!
As previously stated, ExtraHop works from a mirror of the Network, so if you have IoT devices that cannot log, we can log for them. If you have systems that are “certified” by a vendor and cannot be patched or have forwarders installed on them, not a problem, we can log for them. If you have a malicious raspberry pi plugged into the MDF and have ACL’d yourself off so you cannot be discovered….not a problem, we’ll log everything you do!! (We also send a New Device alert when your mac shows up). What the ExtraHop Discover appliance does is allow you to “log the un-loggable”, if that makes sense. Adopting a passive surveillance strategy is a perfect complement to any SIEM regimen.

As I stated near the beginning of the post, INFOSEC teams do NOT need more data, they need better data. I am not saying you no longer need a SIEM but I am absolutely saying that we need to send better data to our SIEM. Using ExtraHop can greatly enhance the agility and certainty of any CSIRT team or SOC. Evaluating transactions BEFORE you send them to the SIEM provides the level of certainty needed to take that next step toward orchestration and automation. As Threat Hunting continues to evolve as a discipline, no one will provide you a more intelligent and scalable web to cast as we move from playing whack-a-mole to a role more consistent with a trap-door spider. Several INFOSEC workflows are currently tied to the SIEM and let’s not throw the baby out with the bathwater. The SIEM can still serve us well, we just need to take steps to send it better data, there is no better source of data than the network and there is no solution more capable of letting you mine data directly off the network than ExtraHop.

Thanks for reading


John M. Smith
Security Systems Engineer
ExtraHop Networks.


Advanced Persistent Dysfunction: Organizational “Air Gaps”

One of the more frustrating things about WannaCry, Petya and NotPetya is that they would have been significantly less effective had organizations applied MS17-010. The fact that we still see SMBv1 is utterly staggering to me and my colleagues. Why is it that we live in a world where we have automation, vulnerability scanning and patching solutions, and spend billions on security, yet our organizations are routinely compromised by what are, in many cases, patchable or at least significantly mitigable vulnerabilities? (Admittedly, Petya/NotPetya exploited LSASS.EXE as well which is pretty brutal!)

There is another vector that I think is being exploited, and it matters more as digital threats become more organized (the 2017 Cisco ASR states that it is not uncommon for malicious hashes to be less than 24 hours old): Organizational Air Gaps. While the separation is great for protecting integrity and accountability, I believe that organizational “air gaps” are part of the issue, and there are numerous instances where security teams have warned system owners about vulnerabilities and were ignored, thus the sad attempt at an INFOSEC meme below.

How did we get here?
The meme above is meant to lend humor to what is a frustrating situation. I am certain that most system owners do not mock their CISO office nor are they unconcerned about their security. The issue here is that post 9/11 we started to form Cyber security teams. While INFOSEC existed prior to 2001, its size and breadth were not nearly at the level they are today, where there are nearly as many INFOSEC roles as IT roles. The point is, we started the process of decoupling system owners from their own security. Cyber security teams started to form and eventually (and probably justifiably) the “Office of the Chief Information Security Officer” (OCISO) was created and, in my opinion, this is where the new risk of organizational air gaps was born. While having the CISO report to the CEO does allow for IT to be held accountable and prevents a CIO from brow-beating the security team into not reporting issues with security, it has created a difficult, albeit fixable, organizational challenge where the individuals responsible for addressing reported vulnerabilities have their agenda, budgeting and staffing levels set by an entirely different organization. The security apparatus is tasked with deriving a posture/strategy for an organization and it could easily be received by the IT department as an unfunded mandate. I have worked both in security and within IT departments, and throughout my career, when I wasn’t working in INFOSEC, the security of the systems under my purview was never a criterion in my performance evaluation. I was very security conscious but it had more to do with not wanting to be in the news or embarrassed. The time has come to evaluate cause and effect: systems will always have vulnerabilities and vendors will always have patches. The real vulnerability we need to address might be within our own organizations.

How do we fix this?
Well, let me start with one of my patented insufferable analogies. The fact that my city has a police department doesn’t mean that I don’t lock my door or that I am not vigilant about my own property. Sadly, the workloads, staff shortages and overall culture of today’s enterprise have system owners worrying about everything BUT security. When I first moved into my neighborhood it was VERY sketchy but I LOVED the old bungalow style houses and my wife and I decided to fix one up. Over the next few years, more people moved into the neighborhood who did not want to tolerate flop houses and crack houses and eventually things got considerably better. Crime went down, property values went up and an area that was very costly to the city was now less costly and paying higher property taxes.

So what changed?
As I stated previously, people in my neighborhood made a conscious decision not to allow fringe activity to continue. When we saw people behaving suspiciously, we called the police and generally got involved in the security of our own neighborhoods. Contrast this with a neighborhood 20 blocks south of me where the relationship with law enforcement was strained. This neighborhood was less safe, cost the city more money and had considerably lower property values. I am not going to get into the reasons for the strained relationship with the law enforcement (some legit, some not) but the point/parallel here is that the better your system owner’s relationship is with the CISO’s organization, the more functional and safer your CIDR block is going to be. You have to ask yourself what you have between you and security, a wall, a bridge or a moat with alligators in it. As a federal employee while I did not work for the OCISO my group assigned a daily “pit boss” and we built a bridge between our team and our peers on the OCISO side.

Back in 2010, on my Edgesight under the Hood blog, I made the following statement out of frustration with INFOSEC and the way it was functioning: “Unless you can buy your INFOSEC team a crystal ball or get them an enterprise license to Dionne Warwick’s Psychic Friend’s Network system owners are going to HAVE to start taking some responsibility for their own security”. Security teams cannot be responsible for knowing suspect behavior on systems that they don’t oversee on a day to day basis. When we factor in things like phishing or credential stealing then we basically have a bad actor using approved, albeit stolen, credentials coming in over approved ports. If someone had stolen a key to my house and walked up to the front door, opened it and started leaving with my property, even if a cop was standing right there it would not look suspicious. When I chase them out of my house with a Mossberg THAT looks suspicious. Sadly, once most systems are compromised, the last people to know are the actual system owners. At ExtraHop we pride ourselves on the visibility we provide both security teams AND system owners. As you evaluate solutions, think of how you can get system owners involved, include IT in the process of implementing them and make them stakeholders.

INFOSEC can be a lonely job. When I worked in IT security, generally the only friends I had in the organization were other security folks. The professional barrier with your IT colleagues is fine but there doesn’t need to be an air gap. In my old neighborhood, yes, the local police there could end up needing to arrest me one day (luckily I have yet to ascend beyond the occasional “suspicious character” in the police blotter) but the professional barrier should not prevent me from working hand in hand with them as they work to protect me. The people who build, support and architect our digital products pay all of our salaries, including INFOSEC. I think we need to ask ourselves if there are any organizational air-gaps between the CIO and CISO’s organizations and what steps we can take to build bridges to ensure everyone is working together.

Thanks for reading!

John M. Smith
Solutions Architect
ExtraHop networks











The next big breach will be……

Most of the circles I run in are at the point of rolling their eyes when they hear me say “I can’t tell you what the next big breach will be other than that it will involve one host talking to another host it’s not supposed to”. One of the challenges I come across, even in the Federal space occasionally, is that due to staff shortages, the system sprawl facilitated by virtualization and the ridiculous workloads that some operations teams have, distilling your security posture into who is talking to who is next to impossible. The top two critical controls of the SANS 20 critical controls are an inventory of Authorized and Unauthorized systems as well as an inventory of the Apps and Software running on said systems. In our conversations with practitioners these top two controls are consistently mentioned as being extremely difficult to wrangle. I believe that some of this is due to the top down nature of most security tools that perform tasks like SNMP/Ping sweeps or WMI sweeps. An individual looking to work in the dark will, if they are worth their salt, effectively ACL themselves off and hide from being discovered. The fix for this is wire data analytics, which does not depend on discovering data by having open ports or having a system respond. With ExtraHop’s wire data analytics platform, if you have an IP address and you engage in a transaction with another host that ALSO has an IP address, you are pretty much made. We will see the port/protocol, IP, client and server of the conversation as well as numerous performance metrics. When this feature is paired with Application inspection triggers, you are then positioned to take back your enterprise and get control of those conversations that you don’t expect or don’t know about. The type of stuff that keeps your CISO up at night.

Enter the ExtraHop Segment Auditing:
Using the ExtraHop platform to audit critical segments of your infrastructure has a two-fold function. First, you are positioned to be alerted immediately when an unauthorized protocol or port has been accessed by a client or one of your servers in that segment has engaged in unauthorized traffic to Belarus or China. The second function is to allow Architecture, Security and System owners to reclaim their enterprise by getting a grip on what the exact communication landscape looks like. As previously stated, the combination of staff turnover, system-sprawl and workload have left teams with little to no time to spend auditing communications. With the ExtraHop platform as a fulcrum, much of the heavy lifting is already done drastically reducing the analytics burden.

How it works:
Within the ExtraHop platform you create a device group, you then assign the Template Trigger to the device group (Example: PCI) and edit a few simple variables that allow you to declare your expected communications. If a transaction falls outside the white list of expected/permitted communications, the Discover Appliance will take action in the form of alerts, slack updates, dashboard updates and Explorer (our search appliance) updates. The alerted team will have five minutes to investigate the incident before they will receive another alert. The idea here is you investigate and either white list or suppress transactions that are not allowed/expected. In doing so, you should have a full map of communications within an hour of deploying the trigger to an audited segment/environment.

Declare Expected Communications:
In the trigger we have one declared variable and three white lists that can be used to reduce alert fatigue as well as root out unauthorized transactions.


Here we set the segment that we are auditing, this is what will show up in the dashboard.
Here we set the protocols that are approved for the specific device group we have
This variable is used to set the CIDR blocks that you wish to ignore. I generally only use broadcast-type addressing as there are risks with white listing an entire CIDR block.
For this variable I am using 24 bit blocks from the cidr_port variable. An example of this white list could be the need to alert on CIFS traffic while removing false positives for accessing the sysvol share on your Active Directory controllers. Let’s say your AD environment lives on then we would white list “” specifically allowing us to continue to monitor for CIFS while not being alerted on normal Active Directory policy downloads.

Below is a sample of the trigger used that you assign to each device group you would like to audit.

(Click Image)

This same trigger can also be edited to white list client based activity (Egress) as well as server based activity (Ingress).
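To make the white list logic concrete, here is a minimal Python sketch of the decision the trigger makes. It is not the ExtraHop trigger (that is shown in the image above); the variable names other than `cidr_port`, the example addresses and the example protocols are all hypothetical placeholders.

```python
import ipaddress

SEGMENT = "PCI"                           # label that shows up in the dashboard
ALLOWED_PROTOCOLS = {"SSL:443", "DNS"}    # hypothetical approved protocols
IGNORED_CIDRS = [                         # broadcast/multicast-type addressing only
    ipaddress.ip_network("224.0.0.0/4"),
    ipaddress.ip_network("255.255.255.255/32"),
]
# cidr_port: allow one protocol to one 24 bit block (placeholder block and protocol)
CIDR_PORT = {("192.0.2.0/24", "CIFS")}


def is_violation(server_ip, protocol):
    """Return True when a transaction is outside the declared communications."""
    addr = ipaddress.ip_address(server_ip)
    if protocol in ALLOWED_PROTOCOLS:
        return False
    if any(addr in net for net in IGNORED_CIDRS):
        return False
    for cidr, allowed_protocol in CIDR_PORT:
        if protocol == allowed_protocol and addr in ipaddress.ip_network(cidr):
            return False
    return True
```

In this sketch CIFS to the placeholder Active Directory block is quiet, while CIFS to any other server is still reported, which is the behavior described for the cidr_port white list.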

The results are you can methodically peel the onion back in the event you have a worm infecting your system(s). Additionally, you can systematically begin the process of understanding who is talking to who within your critical infrastructure. Below you see a dashboard that shows the unauthorized activity both as a server and as a client. You also have a ticker showing a count of the offending transactions that includes the client/server/protocol/role as well as a rate of protocol violations. Ideally after a few hours, and working out unexpected communications, you would expect this dashboard to be blank. Beyond the dashboards is where the real money is, let’s talk about some of the potential workflows that are available leveraging the ExtraHop ODS feature and our partners.

(Click Image)

Possible Workflows:

Export results to Excel and ask system owners what the HELL is going on:
The ExtraHop platform includes a search appliance that allows you to export the results of the segmentation audit to a spreadsheet. This can be attached to an email to the system owners or CSIRT team to find out what is going on with those unauthorized transactions. In the search grid below, what you see is a mapping of all transactions that were not previously declared as safe.

(FYI, the “Other” protocol is typically tunnel based traffic such as ICMP or GRE)

(Click Image)


SIEM Integration:
The ODS feature of the ExtraHop platform can send protocol violations to your SIEM workflow. Most CSIRT responses are tied to some sort of SIEM, and ExtraHop can thread wire data surveillance into those workflows seamlessly.

Slack updates:
If you have a distributed SECOPS team or you want the flexibility of creating a Slack channel and assigning a resource to watch it, the ability to leverage RESTful APIs to allow integration with other tools can greatly enhance the agility and effectiveness of your security incident response teams. Below you see an example of sending a link to the alert or the actual alert itself into a slack channel. In our example above, if you are a member of the PCI team or on the governance side of the house (or both for that matter) you can easily collaborate here. In the scenario below, the INFOSEC resource can actually chat with the system owner to find out if this is, in fact, suspicious activity. The majority of crimes that result in arrest do so as a result of a citizen calling the police and the two working together to determine if a crime has been committed. Sadly this dynamic doesn’t exist in IT today; we are creating it for you below (Alerts are sent within a few milliseconds).

(Click Image)
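For readers who want to try something similar outside of ExtraHop, the sketch below posts an alert to a Slack incoming webhook using only the Python standard library. The message format and function names are my own, and the webhook URL is something you would have to create in your own Slack workspace; nothing here is taken from the ExtraHop configuration.

```python
import json
import urllib.request


def build_slack_payload(segment, client, server, protocol):
    """Build the JSON body for a Slack incoming webhook (hypothetical format)."""
    text = f"[{segment}] Unauthorized {protocol} traffic: {client} -> {server}"
    return json.dumps({"text": text}).encode("utf-8")


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_slack_payload("PCI", "10.1.5.20", "198.51.100.7", "CIFS")
# post_to_slack("<your webhook URL>", payload)  # uncomment with a real webhook
```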

Tetration Nation:
One big announcement last week at Cisco Live was the ExtraHop integration with Cisco’s Tetration product. Below you see an example of how the ExtraHop platform handles a Ransomware outbreak. The workflow for protocol violations is the same, should the Discover appliance observe unauthorized communications, the traffic can be tagged and sent to the Cisco Security Policy Management engine where policies can be enforced.

One of the battle cries for security in 2017 has been the need to simplify security. Top-down device discovery simply does not work and leaves room for bad actors as well as insider threats to work in the dark. A foundational security practice that includes passive device discovery provides the ground-up approach to security that can then lay the groundwork for building a much more stable security practice. Distilling communications down to who is talking to who, and whether it is authorized or not, has been impossible for far too long. Leveraging ExtraHop’s segment auditing capabilities positions you to know, within milliseconds, when a system is operating outside its normal pre-defined parameters. When coupled with ExtraHop Addy you can obtain full-circle visibility 24×7.

Thanks for reading

John M. Smith

The Case for Wire Data: GONE in 60….”err”…8 seconds! (Counter-punching DNS jackassery)

A few days ago SANS wrote an article about the importance of tracking DNS Query length stating emphatically “Size Does Matter”. It’s an excellent article and certainly worth a read, you can find it here

The article demonstrates how easy it is to exfiltrate a file using the DNS Channel. They ran a script that encoded the /etc/passwd file into base32 chunks and exfiltrated the file to an external DNS Server. Since the subdomain limit is 63 characters or less they used all 63 characters to append an encoded text string onto the subdomains allowing them to push the data externally using the internal corporate DNS server as the mule in the process.

Just before showing us the Splunk Histograms they did something VERY unique, they showed you the following command:

# tcpdump -vvv -s 0 -i eth0 -l -n port 53 | egrep "A\? .*\.data\.rootshell\.be"

As you are aware, tcpdump is a wire analysis sniffing tool that shows you packets as they are being observed on the wire. If ONLY there were a way to take action against this behavior as it was being observed on the wire. What if I told you there was one? And you probably didn’t even know you could leverage it in your security practice!

Enter Wire Data:
In a way, the SANS article provided a segue for me to demonstrate the power of wire data. The ExtraHop platform provides full, in flight stream analysis and has the capability to interact with external APIs that may be able to actually STOP DNS exfiltration instead of telling you that it had already happened.

So to run a similar test, I attempted to use base64 to exfiltrate a much larger file called blockedIPs.txt using a script. I then used an online stopwatch to count the time from when I executed the script to when it finished. The entire text file was exfiltrated in 8.01 seconds. Basically, by the time it is rendered in Splunk, the data has already been exfiltrated. That doesn’t mean that the Splunk scenario isn’t valuable, but leveraging wire data we can do A LOT more than just tell your SOC that they have been breached.

Below you see that the script was able to exfiltrate the entire file in 8.01 seconds.


So in looking at the time it took to exfiltrate the entire BlockedIPs.txt file, 8.01 seconds isn’t really a lot to work with as your SOC does not have a crystal ball.  BUT, in the world of wire data analytics where you deal in microseconds, seconds are hours! Below is a diagram of how we have set up the ExtraHop appliance to alert us when DNS exfiltration takes place. Since I don’t have an active OpenDNS account I am using the Slack API to demonstrate how the ExtraHop platform can integrate with intelligent services. For this test we set the following criteria using ExtraHop’s Application Inspection Triggers.

  • DNS query length of 63 characters or greater
  • Not part of B2B partners’, or Internal Namespace
  • Not a common CDN like Akamai or

There will always be SOME white listing that will need to occur to avoid digital collateral damage. If the site is a .ru, .cn or, in this case, a .be and I am a hospital in Minneapolis, then I doubt that I have an abundance of business with those name spaces and I feel pretty comfortable bit-bucketing them via OpenDNS or another next generation DNS product.
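The criteria above boil down to a length check plus a white list. Here is a minimal Python sketch of that test; it is not the Application Inspection Trigger itself, the suffixes in `ALLOWED_SUFFIXES` are made-up placeholders, and I am assuming the length is measured on the full query name.

```python
MIN_SUSPECT_LENGTH = 63  # "63 or greater", per the criteria above

# Hypothetical white list: B2B partners, internal namespace, common CDNs.
ALLOWED_SUFFIXES = ("partner.example.com", "corp.example.com", "cdn.example.net")


def is_suspect_query(qname, allowed=ALLOWED_SUFFIXES, min_len=MIN_SUSPECT_LENGTH):
    """Flag long DNS query names that are not covered by the white list."""
    name = qname.rstrip(".").lower()
    if len(name) < min_len:
        return False
    # A name is white listed if it equals, or is a subdomain of, an allowed suffix.
    return not any(name == s or name.endswith("." + s) for s in allowed)
```

A query that passes this test is what you would hand to Slack, or to a DNS service API, for a response.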


So upon executing the script we begin to see results populating our Slack channel within a few milliseconds. This could just as easily be an API call to your DNS service to “black hole” the suspect domain.

Transaction speed: In the two images below, you can note the performance of the slack transactions. You can see an average round trip time of 41ms with an average processing time of 58ms. I would expect an API call to OpenDNS to be similarly fast basically meaning that you have plenty of time to stop a file from being fully transferred using DNS exfiltration. The point here is, unlike many SIEM based solutions, you are well positioned to counter-punch using ExtraHop and Wire Data Analytics.

Slack Performance Metrics: Taken from the ExtraHop Appliance

The article also made a point to suggest that you monitor the subdomain counts (they are one of the few to do that, tip of the hat to ya!). Using the ExtraHop platform, we also keep track of the number of subdomains by Client/Domain combo. If you look below, you see the number of lookups for a specific domain from a single IP. Unless it is a NATed address with a few hundred users behind it, it is pretty safe to say that a large number of the metrics below are NOT human metrics but some sort of program doing the lookups. Even the fastest internet users cannot perform more than a few DNS queries in a 30 second period.

Also noted in the SANS article was the need to pay attention to query lengths. Here we have written a trigger to give you the average query length by root domain. This metric, as well as the subdomain count, can be leveraged with an API to actually orchestrate a response via OpenDNS or another product.
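Both metrics, the subdomain count per Client/Domain combo and the average query length per root domain, are easy to prototype in Python. The sketch below is an illustration only: the function names are hypothetical, and the root domain is naively taken as the last two labels, which is wrong for names like example.co.uk.

```python
from collections import defaultdict


def root_domain(qname):
    """Naive root domain: the last two labels (assumption, see above)."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def summarize(queries):
    """queries: iterable of (client_ip, query_name) pairs.

    Returns (unique subdomain count per (client, root domain),
             average query length per root domain).
    """
    subdomains = defaultdict(set)
    lengths = defaultdict(list)
    for client, qname in queries:
        name = qname.rstrip(".").lower()
        root = root_domain(name)
        subdomains[(client, root)].add(name)
        lengths[root].append(len(name))
    counts = {key: len(names) for key, names in subdomains.items()}
    averages = {root: sum(vals) / len(vals) for root, vals in lengths.items()}
    return counts, averages


sample = [
    ("10.0.0.5", "aaa.evil.be"),
    ("10.0.0.5", "bbb.evil.be"),
    ("10.0.0.5", "aaa.evil.be"),
    ("10.0.0.9", "www.example.com"),
]
counts, averages = summarize(sample)
```

A high unique-subdomain count from a single client combined with a long average query length for the same root domain is the pattern worth acting on.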

There is a big push to add automation and orchestration to most security practices. This is more difficult than it reads/sounds. In the case of DNS Exfiltration, many SIEM based products, while still quite valuable, lack the shutter-speed to successfully observe and react to DNS based exfiltration. However, when you leverage the power of ExtraHop’s Wire Data Analytics and the Open Data Stream (ODS) technology allowing you to orchestrate responses the job becomes almost trivial. As I stated, in our world, seconds are hours and minutes are days. The number of orchestration products hitting the cyber security market are going to make show floors look like a middle-school science fair where practitioners are going to feel like they are looking at a myriad of baking-soda volcanos! Orchestration is only as good as the visibility integrated into it, good visibility starts with wire data, good visibility starts with the ExtraHop platform.

*PS anyone running OpenDNS and familiar with the API, I’d LOVE to try counter punching using the techniques described here!!

The Case for Wire Data: Security (Interacting with the wire)

Quick post today, I want to go over what I noticed over the weekend after reading up on Quantum Insert and the way Quantum Insert works to infect users with a MOTS (Man-On-The-Side) attack.

On Saturday, I watched a pretty interesting Bro-Con 2015 presentation from Yun Zheng Hu of

In the presentation, Yun details how you can use Bro to detect Quantum Insert activity by looking on the wire at Layer 4 sequence numbers and at Layer 7 HTTP headers. While I am still working on the Layer 4 surveillance what I saw on the HTTP headers and payloads were pretty interesting. Basically, the “shooter” has to send back a 302 redirect with a content-length of zero to avoid a malformed HTTP response. As a thought exercise I set up an Application Inspection Trigger to look for this behavior. Using the PCAP they provided we found the following:

First we set up the inspection trigger looking for a status code of 302 and a content-length of zero on the HTTP_RESPONSE.
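The condition itself is simple enough to express in a few lines. This Python sketch is not the trigger code; the function name is hypothetical and it assumes the response headers have already been parsed into a dictionary.

```python
def is_zero_length_redirect(status_code, headers):
    """True for a 302 response whose Content-Length header is zero."""
    if status_code != 302:
        return False
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {key.lower(): value for key, value in headers.items()}
    value = normalized.get("content-length")
    if value is None:
        return False
    try:
        return int(value) == 0
    except (TypeError, ValueError):
        return False
```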

The PCAP from their website shows them injecting a redirect from LinkedIn to another site. You will note in the results that we see the redirect and we have the ability to report on this type of behavior.

So for a thought exercise, I thought I would take a look at my “hackrificial” VM that I know has some malware/adware on it and did some browsing. What I noticed was that at least 2/3 of the sites that had a 302 redirect code coupled with a content-length of zero. Here are a few examples: (Using POSH VirusTotal script)

No webutation data but we did note that it had a malicious file observed in December of 2016

Now, this does not necessarily mean that 302 with a content-length of zero means malware, adware or anything like that but I think it is worth looking into if you have an ExtraHop Discover appliance. More importantly, what I am trying to point out is how ExtraHop allows you to interact directly with the wire to look for specific scenarios. From here you have the following workflows available:

  • Automate a threat intelligence feed that checks these domains and alerts/orchestrates a response to them.
  • Track them in our Session table and keep a count of them and report them in one minute blocks (instead of each observance) to give you a better idea of your exposure
  • Send them to Splunk or your INFOSEC CSIRT team
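The second workflow above, counting observances and reporting them in one minute blocks, can be sketched as follows. This stands in for the Session table with a plain Python Counter, so treat it as an illustration of the bookkeeping rather than how ExtraHop stores it; the function name is hypothetical.

```python
from collections import Counter


def count_per_minute(observations):
    """observations: iterable of (timestamp_seconds, domain) pairs.

    Returns a dict keyed by (start of minute, domain) with the number of
    zero-length 302 redirects seen for that domain in that minute.
    """
    buckets = Counter()
    for ts, domain in observations:
        minute_start = int(ts // 60) * 60
        buckets[(minute_start, domain)] += 1
    return dict(buckets)


report = count_per_minute([
    (0, "a.example"),
    (30, "a.example"),
    (61, "a.example"),
    (59.9, "b.example"),
])
```

Reporting one line per domain per minute, instead of one line per observance, gives a better picture of exposure and is far cheaper to forward.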

Other Wire Data Intelligence scenarios:

  • Banking login failure URI
    • How often does it get hit (thus how often do users fail to log in)
    • Geolocation of the IPs that failed to log in (I have a small bank in North Carolina that has ten login failures from China?????)
    • Which usernames are consistently failing?
  • Password Reset/New Cookie Banking Login URI:
    • Who was the referrer (has this user been phished?)
    • Geolocation of the IP Address (is it appropriate)
    • Did the user just log in a few days/hours ago? Why do we see a new cookie after they just recently logged in?

The wire will present you with a deluge of data, what a product like ExtraHop does is allow you to set conditions you want to observe and thread that intelligence into your security practice.

Thanks for reading!





The Case for Wire Data: Security

During the 2nd week of February I had the honor to deliver two speaking sessions at the RSA Conference in San Francisco. One of them was on Ad Hoc threat intelligence and the 2nd was a Birds of a Feather round-table session called “Beyond Logs: Wire Data Analytics”. While it was a great conference, I found that you get some strange looks at a security conference when you are walking around with a badge that says “John Smith”. In both sessions, a key narrative was the effectiveness of wire data analytics and its ability to position security teams with the needed agility to combat today’s threats. In this post I would like to make the case for wire data analytics and demonstrate the effectiveness of using wire data as another tool along with your IDS/IPS and Log consolidation.

Wire Data Analytics:
Most security professionals are familiar with wire data already. Having used Intrusion protection and detection software for nearly 20 years now, concepts such as port mirroring and span aggregation are already native to them. INFOSEC professionals are some of the original wire data analytics professionals. We differ from IDS/IPS platforms in that we are not looking specifically at signatures rather we are rebuilding layer 2-7 flows. We have several areas where we can help INFOSEC teams by providing visibility into the SANS first two critical security controls, augmenting logs and increasing visibility as well as providing a catalyst for ongoing orchestration efforts.

SANS Top 2 Security Critical Controls:
From Wikipedia we have the following list making up the SANS top 20 Cyber Security Controls (Click image if you, like me, are middle aged and can’t see it)

In our conversations with practitioners we commonly hear “if we could JUST get an inventory of what systems are on the network”. As virtualization and automation have matured over the years, the ability to mass-provision systems has made security teams’ jobs much harder, as there can be as much as a 15% difference in the number of nodes on a single 24 bit CIDR block from one day to the next, hell from one hour to the next. Getting a consistent inventory with current technologies generally depends on hosts responding to an SNMP sweep, Ping, WMI mining or an NMAP scan. As we have seen with IoT devices, many of them don’t have MIBs, WMI libraries or, in most (all) cases, logs. Most malicious doers will prefer to do their work in the dark; if detected, they will try to use approved or common ports to remain unseen.

“All snakes who wish to remain in Ireland, please raise your right hand….” Saint Patrick

The likelihood that a compromised system is going to respond to an SNMP walk, Ping, WMI connection or volunteer what they are doing may be about as likely as a snake raising their right hand.

How ExtraHop works with the top 2 SANS controls:
Most systems try to engineer down to this level of detail. Technologies such as SNMP, Netflow, logs and the like do a pretty good job of getting 80-90 percent of the hosts, but there are still blind spots. When you are a passive wire data analytics solution, you aren’t dependent on someone to “give” data to you; we “take” the data off the wire. This means that if someone shuts off logging or deletes /var/log/*, they cannot hide. A senior security architect once told me, “if it has an IP address it can be compromised”. To that we at ExtraHop would answer, “if it has an IP address, it can’t hide from us”. I cannot tell you what the next big breach or vulnerability will be, but what I CAN say with certainty (and trust me, I coined the phrase “certainty is the enemy of reason”, I am NEVER certain) is that it will involve one host talking to another host it isn’t supposed to. With wire data, if you have an IP address and you talk to another node that ALSO has an IP address, then provided we have the proper surveillance in place….YOU’RE BUSTED!

ExtraHop creates an inventory as it “observes” packets and layer 7 transactions. This positions the security team to account for who is talking on their network regardless of the availability of agents, Netflow, MIBs or WMI libraries. To add to this, ExtraHop applies a layer of intelligence around it. Below, you see a collection of hosts and locations as well as a transaction count. What we have done is import a customer’s CIDR block mapping csv that allows us to geocode both RFC1918 addresses and external addresses so that you have a friendly name for each CIDR block. This is a process of reconciling which networks belong to which groups and/or functions. As you can see, we have a few IP addresses that are not yet classified; the workflow here is to identify every IP address and classify its CIDR block until you can fully account for who lives where. This turns the process of getting an accurate inventory from what can be a 3 month or longer task into a few hours. Once you have reconciled which hosts belong to which functions, you have taken the first step in building your Security Controls foundation by securing the first control. The lack of this control is a significant reason why many security practices topple over. An accurate inventory is the foundation; to quote the podcast, “ya gotta know what you have first”.
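To make the CIDR reconciliation step concrete, here is a minimal sketch in plain Python. It is not ExtraHop's implementation: the CSV layout (`cidr,name`), the group names and the sample addresses are all assumptions made up for illustration. It labels each observed address with the most specific matching block and lists whatever is still unaccounted for.

```python
import csv
import io
import ipaddress

# Hypothetical CIDR-to-name mapping, standing in for the customer's csv.
CIDR_CSV = """cidr,name
10.1.0.0/16,Santa Clara Office
10.2.10.0/24,Web Farm
192.168.50.0/24,DC Datacenter
"""

def load_cidr_map(csv_text):
    """Parse the mapping into (network, name) pairs, most specific block first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    nets = [(ipaddress.ip_network(row["cidr"]), row["name"]) for row in rows]
    return sorted(nets, key=lambda pair: pair[0].prefixlen, reverse=True)

def classify(ip, cidr_map):
    """Return the friendly name for an address, or 'Unclassified'."""
    addr = ipaddress.ip_address(ip)
    for net, name in cidr_map:
        if addr in net:
            return name
    return "Unclassified"

cidr_map = load_cidr_map(CIDR_CSV)
observed = ["10.2.10.15", "10.1.44.7", "172.16.9.3"]  # made-up addresses seen on the wire
inventory = {ip: classify(ip, cidr_map) for ip in observed}

# Anything left here still needs an owner before the inventory is complete.
unclassified = [ip for ip, name in inventory.items() if name == "Unclassified"]
```

The workflow described above amounts to repeating this until `unclassified` is empty.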


SANS Control 2: Inventory of Authorized and Unauthorized Software:
While wire data cannot directly address this control, I tend to interpret it (maybe incorrectly) as being about networked software. Something running on just one host could do significant damage to that one host, but most of us worry more about data exfiltration. This means that the malicious software HAS to do something on the network. Here we look at both Layer 4 and Layer 7 to provide an inventory of what is actually being run on the systems that you have finally gathered an inventory for.

In the graphic below, you see one of our classified CIDR blocks (or End Point Group if you are a Cisco ACI person). We have used the designation “Destination” (server) to get an accurate inventory of what ports and protocols are being served up by the systems on this CIDR block. Given that I have filtered for transactions being served up by our “Web Farm”, the expected ports and protocols would be HTTP:8080, SSL:443, HTTP, etc. Sadly, what I am seeing below is someone SSHing into my system, and that is NOT what I expected. While getting to this view took me only two clicks, we can actually do better. We can trigger an alert letting the SOC or CSIRT know that there has been a violation. Later in this post, we will talk about how we could actually counter-punch this type of behavior using our API.
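The same check can be sketched in a few lines of Python. This is only an illustration of the idea, not how the product does it: the `(server group, protocol)` records and the allow-list for the "Web Farm" group are assumptions.

```python
from collections import Counter

# Assumed allow-list for the "Web Farm" group, taken from the protocols named above.
EXPECTED_WEB_FARM = {"HTTP", "HTTP:8080", "SSL:443"}

# Hypothetical observed transactions: (server group, protocol served).
transactions = [
    ("Web Farm", "HTTP:8080"),
    ("Web Farm", "SSL:443"),
    ("Web Farm", "SSH:22"),
    ("Web Farm", "SSL:443"),
    ("Mail", "SMTP:25"),
]

def served_protocols(records, group):
    """Count transactions per protocol served by one group."""
    return Counter(proto for grp, proto in records if grp == group)

def violations(counts, expected):
    """Keep only the protocols that are not on the allow-list."""
    return {proto: n for proto, n in counts.items() if proto not in expected}

counts = served_protocols(transactions, "Web Farm")
unexpected = violations(counts, EXPECTED_WEB_FARM)  # what the SOC or CSIRT should hear about
```

Here the SSH session is the only entry left in `unexpected`, which is the condition you would alert on.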

As far as the SANS 2nd Control goes, if I look at a web server and I see that it is an FTP client to a system in Belarus, I am generally left to conclude that the FTP is likely unauthorized. What ExtraHop gives you, in addition to an accurate inventory, is an accounting of what ports and protocols are in use by both the clients and servers using those segments. While this is not a literal solution for the SANS 2nd control, it does have significant value in that INFOSEC practitioners can see what is traversing their network and are positioned to respond with alerts or orchestrate remediation.

Layer 7 Monitoring:
In the video below, titled “Insider Hating”, you see our Layer 7 auditing capability. In this scenario we have set up an application inspection trigger to look for any queries of our EHR database. The fictitious scenario here is that we want to audit who is querying our EHR database to ensure that it is not improperly used and that nobody steals PHI from it. When an attacker has stolen credentials or you have an insider, you now have an attack that is going to use approved credentials and approved ports/protocols. This is what keeps CIOs, CISOs and practitioners up at night. We can help and we HAVE helped on numerous occasions. Here we are setting up an L7 inspection trigger to look for any ad hoc-like behavior. In doing so, we can position not JUST the security team to engage in surveillance, but the system owners as well. This is an ABSOLUTE IMPERATIVE if we want to be able to stop insiders or folks with stolen credentials. We need to do away with the idea that security teams have a crystal ball. When someone runs a “Select * from EHR” from a laptop in the mail room, we can tell you that it came from the mail room and not the web server. We can also alert the DBA of this and get system owners to take some responsibility for their own security. This same query, to many security teams, will look like an approved set of creds using approved ports. This same information being viewed by the DBA or system owner may cause them to fall out of their chair and run screaming to the security team’s office. The lack of vigilance by system owners, in my opinion, is the single greatest reason breaches are worse than ever before in spite of the fact that we spend more money than ever.
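The logic of that kind of audit can be sketched in Python. This is not an ExtraHop trigger (those are not written this way); it is an illustration under assumptions: the approved client subnet, the watched table name and the alert fields are all hypothetical, and the substring match on the SQL text is deliberately crude.

```python
import ipaddress

# Assumed: only the web farm subnet is expected to query the EHR database.
APPROVED_CLIENTS = [ipaddress.ip_network("10.2.10.0/24")]
WATCHED_TABLE = "ehr"

def audit_query(client_ip, sql):
    """Return an alert dict when a watched-table query comes from an unapproved client."""
    if WATCHED_TABLE not in sql.lower():
        return None  # not a query we are auditing
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in APPROVED_CLIENTS):
        return None  # expected application traffic
    # Notify the system owner as well as the security team.
    return {"client": client_ip, "query": sql, "notify": ["DBA", "SOC"]}

mail_room_alert = audit_query("10.9.3.20", "Select * from EHR")
web_farm_alert = audit_query("10.2.10.5", "Select * from EHR")
```

The credentials and the port are identical in both calls; only the client's location separates the mail room laptop from the web server, and that is the detail the DBA will recognize immediately.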


Augmenting Logs:
I love logs, I love Splunk, LogRhythm and of course my old friend Kiwi!! But today’s threats and breaches happen so fast that using just logs positions you to operate in a largely forensic fashion. In many cases, by the time the log is written and noticed by the SOC the breach has already happened. Below you see a graphic from the Verizon DBIR that states that 93% of all compromises happen within minutes, 11% within seconds. Using just logs and batch processing to find these threats is great for rooting out patterns and malicious behavior but, as I stated previously, largely forensic. As a Wire Data Analytics platform we work/live in a world of microseconds and thus for us, seconds are hours and minutes are days. Current SIEM products, when not augmented with wire data analytics, simply don’t have the shutter speed to detect and notify or orchestrate a timely response.


I saw an amazing Black Hat demo on how OpenDNS was using a Hadoop cluster to root out C2 controllers and FastFlux domains. It involved a periodic batch job using Pig to extract domains with a TTL of 150. Through this process they were able to consistently root out “FastFluxy” domains to get a new block list.

We have had some success here collecting the data directly off the wire. I will explain how it works: (we are using a DNS Tunneling PCAP but C2 and Exfiltration will have similar behavior).

  • First we whitelist common CDNs and common domains such as Microsoft, Akamai, my internal intranet namespace, etc.
  • We collect root domains and we start adding the number of subdomains that we observe.
    • In the example below, we see pirate.sea and we start to increment each time we observe a subdomain
  • If a root domain has a count of over 50 subdomains within a 30 second period, we account for it. (thus the dashboard below)
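The steps above can be sketched in Python as follows. This is an approximation, not the actual inspection trigger: the whitelist entries and lookup records are made up, the root domain is naively taken as the last two labels (real code should consult the public suffix list), and it uses fixed 30 second buckets rather than a sliding window.

```python
from collections import defaultdict

WHITELIST = {"microsoft.com", "akamai.net", "corp.example"}  # assumed entries
THRESHOLD = 50        # subdomains per root domain
WINDOW_SECONDS = 30   # size of each counting bucket

def root_domain(qname):
    """Naive root domain: the last two labels of the queried name."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])

def suspicious_roots(lookups):
    """lookups: iterable of (timestamp_seconds, qname).

    Returns {(root, window_start): distinct_subdomain_count} for every
    non-whitelisted root that exceeds THRESHOLD within one window.
    """
    seen = defaultdict(set)
    for ts, qname in lookups:
        name = qname.rstrip(".").lower()
        root = root_domain(name)
        if root in WHITELIST or name == root:
            continue  # skip whitelisted roots and bare root lookups
        window = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        seen[(root, window)].add(name)
    return {key: len(subs) for key, subs in seen.items() if len(subs) > THRESHOLD}

# Made-up traffic: 60 distinct subdomains of pirate.sea in under 30 seconds,
# plus two lookups that the whitelist should ignore.
lookups = [(i * 0.4, f"chunk{i}.pirate.sea") for i in range(60)]
lookups += [(1.0, "www.microsoft.com"), (2.0, "mail.corp.example")]
hits = suspicious_roots(lookups)
```

Counting distinct subdomains rather than raw lookups keeps a chatty but legitimate client that repeats the same few names from tripping the threshold.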

The idea behind this inspection trigger is that if the root domain is NOT a CDN, not my internal namespace and not a common domain like Google or Microsoft, WHY THE HELL DOES THE CLIENT HAVE 24K lookups? Using logs, this is done via a batch process; using wire data, we uncover suspicious behavior in 30 seconds. Does that mean you don’t need logs or that the ingenious work done by OpenDNS isn’t useful? Hell no, this is simply augmenting the log based approach to give you a more agile tool to engage directly with an issue as it is happening. I am certain that even the folks at OpenDNS would find value in being able to get an initial screening within 30 seconds. In my experience, with good whitelisting, the number of positives is not overly high. Ultimately, if a single client makes 24500 DNS lookups for a domain that you don’t normally do business with, it’s worth investigating. We routinely see malware, adware and 3rd party, unapproved apps that think they are clever by using DNS to phone home (yes YOU Dropbox) using this method.


SIEM products are a linchpin for most security teams. For this reason, we support sending data to SIEM platforms such as LogRhythm and Splunk, but we also provide a hand-to-hand combat tool for those SecOps (DevOps) focused teams who want to engage threats directly. In the hand-to-hand world of today’s threats, no platform gives you a sharper knife or a bigger stick than Wire Data Analytics with ExtraHop.

Automation and Orchestration (Digital counter-punching):
In an article in September of 2014, GCN asked “is automation security’s only hope?” With the emergence of the “human vector”, what we have learned over the last 18 months is that you can spend ten million dollars on security software, tools and training only to have Fred in payroll open a malicious attachment and undo all of it within a few seconds. As stated earlier in this post, 11% of compromises happen within seconds. All, I hope, is not lost, however; there have been significant improvements in orchestration and automation. At RSAC 2016, Phantom Cyber debuted their ability to counter-punch and won first prize in the innovation sandbox. You can go to my YouTube channel and see several instances of integration with OctoBlu where we are using OctoBlu to query threat intelligence and warn us of malicious traffic. But we can go a lot further with this. I don’t think we have to settle for post-mortem detection with logs and batched surveillance (which is still quite valuable for restricting subsequent breach attempts). Automation and orchestration will only be as effective as the visibility you can provide.

Enter Wire Data:
Using wire data analytics (keep in mind that ours is a world of microseconds), we have the shutter speed to observe and act on today’s threats, thread our observed intelligence into orchestration and automation platforms such as Phantom Cyber and/or OctoBlu, and do more than just warn. ExtraHop Open Data Stream has the ability to securely issue a command whereby we send a JSON object with the parameters of who to block, positioning INFOSEC teams to potentially stop malicious behavior BEFORE the compromise. Phantom Cyber supports REST based orchestration, as does Citrix OctoBlu; most of your newer firewalls have APIs that can be accessed, as does Cisco ACI. The important thing to remember here is that these orchestration tools and next generation hardware APIs need to partner with a platform that can not only observe the malicious behavior but also thread the intel into these APIs, positioning security teams for tomorrow’s threats.
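As a rough idea of what “send a JSON object with the parameters of who to block” could look like from the receiving side of a REST integration, here is a Python sketch using only the standard library. Everything specific in it is hypothetical: the endpoint URL, the payload field names and the bearer token are placeholders, not the ExtraHop Open Data Stream format or any vendor’s real API.

```python
import json
import urllib.request

def build_block_request(url, offender_ip, reason, api_token):
    """Build (but do not send) a POST request carrying a JSON block instruction."""
    payload = {"action": "block", "address": offender_ip, "reason": reason}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )

# Hypothetical orchestrator endpoint and a documentation-range address.
req = build_block_request(
    "https://orchestrator.example/api/block",
    "203.0.113.45",
    "nmap scan of DMZ",
    "REPLACE_ME",
)

# Sending is left commented out because the endpoint above does not exist:
# urllib.request.urlopen(req, timeout=5)
```

Keeping the payload small (who, what action, why) is a design choice: the orchestration platform decides how to enforce the block, and the wire data platform only supplies the timely observation.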

My dream integrations include:

  • Upon observing FastFluxy behavior, sending OpenDNS an API call that resolves the offending domain to or a warning page
  • Putting a mac address in an ACI “Penalty box” (quarantine endpoint group) when we see them accessing a system they are not supposed to
  • Sending an API call to the Cisco ASA API to create an ACL blocking a host that just nmapped your DMZ

As orchestration and automation continue to take shape within your own practices, please consider what kind of visibility is available to them. How fast you can observe actionable intelligence will have a direct effect on how effective your orchestration and automation endeavors are. Wire Data analytics with ExtraHop has no peer when it comes to the ability to set conditions that make a transaction actionable and act on it. Orchestration and automation vendors will not find a better partner to make their products better than ExtraHop.

The threat landscape is drastically changing and the tools in the industry are rapidly trying to adapt. An orchestration tool is not effective without a good surveillance tool, and a Wire Data analytics platform like ExtraHop is made better when coupled with an orchestration tool that can effectively receive REST based intel. The solution to tomorrow’s threats will not involve a single vendor, and the ability to integrate platforms using APIs will become key to implementing tomorrow’s solutions. The ExtraHop platform is the perfect visibility tool to add to your existing INFOSEC portfolio. Whether you are looking to map out a Cisco ACI implementation or you want to thread wire data analytics into your Cisco Tetration investment, getting real-time analytics and visibility will make all of your security investments better. Wire Data Analytics will become a key part of any security team’s arsenal in the future, and the days of closed platforms that cannot integrate with other platforms are coming to an end.

There is no security puzzle where ExtraHop’s Wire Data Analytics does not have a piece that fits.

If you’d like to see more, check out my YouTube channel:

Thanks for reading

John Smith










Advanced Persistent Surveillance: Threat Intelligence and Wire Data equals Real-time Wire Intelligence

Please watch the Video!!

As the new discipline of Threat Intelligence takes shape, Cyber Security teams will need to take a hard look at their existing tool sets if they want to take advantage of dynamic, ever changing threat intelligence feeds. Those feeds tell them which hosts are malicious; the team then needs to know whether any of their corporate nodes have communicated with any of the malicious hosts, DNS names or hashes collected from their CTI (Cyber Threat Intelligence) feeds. Currently the most common way that I see this accomplished is through the use of logs. Innovative products like Alienvault and Splunk have the ability to check the myriad of log files that they collect, cross reference them with CTI feeds and see whether there has been any IP based correspondence with any known malicious actors called out by such feeds.

Today I want to talk about a different, and in my opinion, better way of integrating with Cyber Threat Intelligence using Wire Data and the ExtraHop Platform featuring the Discover and Explorer Appliances respectively.

How does it work? Well let’s first start with our ingredients.

  1. A threat analytics feed (open source, subscription, Bro or CIF created text file)
  2. A peer Unix-based system to execute a python script (that I will provide)
  3. An ExtraHop Discover Appliance
  4. An ExtraHop Explorer Appliance


  • ExtraHop Discover Appliance:
    An appliance that can passively (no agents) read data at speeds from 1GB to 40GB. It can also scale horizontally to handle large environments.
  • ExtraHop Explore Appliance:
    ExtraHop’s Elastic appliance that allows for grouping and string searching INTEL gathered off the wire.
  • Session Table: ExtraHop’s memcache that allows for instant lookup of known malicious hosts.

The solution works by using the Unix peer to execute a python script that collects the threat intelligence data. It then uploads the malicious hosts into the Discover Appliance’s Session Table (up to 32K records). The Discover appliance then waits to observe a session that connects with one of these known malicious sites. If it sees a session with a known site from the TI feed, it takes actions that include, but are not limited to, the following:

  • Updates a Threat Intelligence dashboard
  • Triggers an alert that warns the appropriate Incident Response team(s) about the connection to the malicious host
  • Writes a record to the ExtraHop Explorer Device
  • Triggers a Precision PCAP capturing the entire session/transaction to a PCAP file to be leveraged as digital evidence in the event that “Chet” the security guard needs to issue someone a cardboard box! (not sure if any of you are old enough to remember “Chet” from Weird Science)
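The collect-and-lookup part of this flow can be sketched in plain Python. This is not the script referred to above and it does not touch the Session Table or any ExtraHop API: the feed format (one address per line, `#` comments), the exact record cap and the sample sessions are assumptions for illustration.

```python
import ipaddress

MAX_RECORDS = 32_000  # the post says "up to 32K records"; the exact cap is an assumption

def parse_feed(text, limit=MAX_RECORDS):
    """Extract unique, valid IP addresses from a plain-text feed, in order."""
    hosts, seen = [], set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        try:
            ip = str(ipaddress.ip_address(entry))
        except ValueError:
            continue  # skip anything that is not a bare IP address
        if ip not in seen:
            seen.add(ip)
            hosts.append(ip)
        if len(hosts) >= limit:
            break
    return hosts

def check_sessions(sessions, bad_hosts):
    """Return the (client, server) sessions whose server is on the feed."""
    table = set(bad_hosts)  # set membership stands in for the fast lookup table
    return [(client, server) for client, server in sessions if server in table]

# Made-up feed and sessions using documentation-range addresses.
FEED = """# sample blocklist
198.51.100.7
203.0.113.45  # reported for mail attacks
not-an-ip
198.51.100.7
"""
bad_hosts = parse_feed(FEED)
sessions = [("10.1.44.7", "203.0.113.45"), ("10.1.44.8", "192.0.2.10")]
violations = check_sessions(sessions, bad_hosts)
```

Each entry in `violations` is where the dashboard update, alert, record and packet capture listed above would be kicked off.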



Below you see the ExtraHop Threat Intelligence Monitoring Dashboard (last 30 minutes) showing the Client/Server and Protocol as well as the Alert and a running count of violations: (this is all 100% customizable)


On the Explorer Appliance, we see the custom data format for Malicious Host Access and we can note the regularity of the offense

And finally we have the Precision Packet Capture showing a PCAP file for forensics, digital evidence and if needed, punk busting.

The entire process that I have outlined above took less than one minute to complete every single task (Dashboard, Alert, EXA, PCAP). According to Security Week, the average time to detect a breach has “Improved” to 146 days in their 2015 report. Cyber Threat Intelligence has a chance to drastically reduce the amount of time it takes to detect a breach but it needs a way to interact with existing data.  ExtraHop positions your Threat Intelligence investment to interact directly with the network, and in real time.  Many incumbent security tools are not built to accommodate solutions like CTI feeds via API or do not have an open architecture to leverage Threat Intelligence, much less use memcache to do quick lookups. The solution outlined above using ExtraHop with a Threat Intelligence feed positions INFOSEC teams to be able to perform Advanced Persistent Surveillance without the cost of expensive log indexing SIEM solutions. Since the data is analyzed in flight and in real time, you have a chance to greatly reduce your time to detection of a breach, maybe even start the Incident Response process within a few minutes!

What you have read here is not a unicorn; this exists today. You just need to open your mind to leveraging the network as a data source (in my opinion the richest) that can work in conjunction with your log consolidation strategy and maximize your investment in Cyber Threat Intelligence.

Incidentally, the “Malicious Host” you see in my logs is actually  I did NOT want to browse any of the hosts on the blacklist so I manually added my host to the blacklist then accessed it.  Rest assured, is not on any blacklists that I am aware of!

Thanks for reading!

John M. Smith