
Advanced Persistent Surveillance: Re-thinking Lateral Communications with Wire Data Analytics

Several high-profile compromises, involving breaches of trusted systems working over trusted ports, have once again raised the issue of lateral communications between internal hosts.  Breaches will continue as hackers evolve and learn to work around existing countermeasures that are, at times, based too heavily on algorithms and not enough on surveillance.

So what is an infosec practitioner to do?

How practical is monitoring lateral communications?

Do we assign a person to look at every single build-up and tear-down?

Do we set all of our networking equipment to debug level 7 and pay to index petabytes of logs with a big data platform?

Do we assign a SecOps resource to watch every single conversation on our network?

Answer: Maybe…or maybe not.

Most of our critical systems (Cardholder Data Environment, CRM Databases, EMRs and HIS) are made up of a group of systems: some are client-server, some are tiered with web services or MTS (Microsoft Transaction Server) acting as middleware, and some are legacy socket-driven solutions.  All of them have a common set of expected communications that can be monitored.

What if we could separate the millions of packets of expected communication (hopefully the lion’s share) from the communication that is unexpected?

What if we could do it at layer 7?

Using ExtraHop’s Wire Data Analytics Platform, INFOSEC teams and application owners are positioned to see non-standard lateral communications that would otherwise go unnoticed by incumbent IPS, anti-malware and anti-virus tools.  The fact is, while we need the existing tool set, today’s complicated breaches tend to hide in the shadows, communicating over approved ports and using trusted internal hosts.  ExtraHop shines light on this behavior, leaving the attackers exposed and positioning teams to “get their ‘stomp’ on” and stamp out these threats like cockroaches.

How we do it: 
Most INFOSEC practitioners have worked with Wire Data before through their IPS and IDS systems. ExtraHop’s platform is similar in that we work off of a span, but instead of looking for specific signatures we observe and rebuild Layer 4-7 flows, supporting sustained speeds of up to 20 Gb per second. We also use a technology called triggers to support specific conditions we want to monitor and alert on (such as anomalies in lateral communications). This is in contrast to most of our perimeter defenses, which scale into the megabit or single-gigabit range; we are able to work up to the tens of gigabits. The same innovation that allows us to collect Operational Intelligence and Performance metrics directly off a span port can be used to provide layer 4-7 digital surveillance of critical systems within your or your customer’s infrastructure.

While it is not practical to monitor every single transaction between your critical systems and other components of your enterprise applications, it is now possible to monitor the unexpected traffic.  For instance, if we have a tiered application that has a web front end, a RESTful web service tier and a back end database tier, we can define the expected traffic that we should see between the tiers (what should be the lion’s share) and ignore that traffic while reporting on the traffic that is NOT expected.

Figure 1: We can write a trigger that ignores expected communications and reports on unexpected communications.

 

 

We can accomplish this task by writing two triggers. The first trigger is a basic Layer 4 detail surveillance trigger where we white-list selected hosts and protocols and only report on communications outside our expected conversations.  The second trigger is a layer 7 trigger that leverages ExtraHop’s layer 7 fluency in SQL Server database communications.  The layer 7 trigger will white-list the specific stored procedures, run from our REST tier, that make up our Layer 7 expected communications.  Anything outside of these will be reported.

The first trigger makes the following assumptions:
The Web/Public tier comes in from hosts 192.168.1.10/20  and the REST tier is at 192.168.1.30.  The approved ports are 8000, 8080 and 1433.

The first trigger monitors client communications from specific clients and over specific ports.  Unlike NAC (Network Access Control), we are simply monitoring unexpected communications and we are not blocking anything.  In many cases we white-list the Active Directory Controllers and, in Windows environments, we would likely white-list the WSUS server as well.  With the trigger below, you would be alerted to every RDP connection, any direct CIFS access or any exfiltration attempts that utilize ports not previously approved.  This simple trigger could have warned any number of breach victims of the staging that was going on prior to data being exfiltrated.  This trigger took less than one day to write. (Don’t be intimidated by the JavaScript; I knew none when I started and we have people who can help you with it.)
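The decision logic of a Layer 4 surveillance trigger like this can be sketched outside the appliance. The Python below is only an illustration and is not the ExtraHop trigger itself: the function names, the record layout and the sample flows are hypothetical, and it assumes that "192.168.1.10/20" above means the two web hosts 192.168.1.10 and 192.168.1.20.

```python
# Hypothetical sketch of the Layer 4 whitelist logic; not ExtraHop trigger code.
# Assumption: "192.168.1.10/20" is read as the two hosts .10 and .20.
EXPECTED_CLIENTS = {"192.168.1.10", "192.168.1.20", "192.168.1.30"}
APPROVED_PORTS = {8000, 8080, 1433}


def is_expected(client_ip, server_port):
    """Return True when a flow matches the whitelisted hosts and ports."""
    return client_ip in EXPECTED_CLIENTS and server_port in APPROVED_PORTS


def unexpected_flows(flows):
    """Keep only the flows that fall outside the expected conversations."""
    return [f for f in flows if not is_expected(f["client"], f["port"])]


sample_flows = [
    {"client": "192.168.1.10", "port": 8080},   # web tier to REST tier: expected
    {"client": "192.168.1.30", "port": 1433},   # REST tier to SQL Server: expected
    {"client": "192.168.1.30", "port": 3389},   # RDP from a trusted host: report
    {"client": "192.168.1.77", "port": 1433},   # unknown host hitting SQL: report
]

for flow in unexpected_flows(sample_flows):
    print("UNEXPECTED", flow["client"], flow["port"])
```

Only the last two sample flows are reported, which is the point of the approach: the expected lion's share is silently ignored and whatever is left is worth a look.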

Leveraging layer 7 fluency:
Layer 7 surveillance is also a critical part of preventing tomorrow’s sophisticated breaches. In the trigger below, we are watching for expected layer 7 communications. A common occurrence in many high-profile breaches is the compromise of a trusted system that is allowed to traverse sensitive networks. In the example above, a compromised REST tier is particularly dangerous because it is a trusted host of the Database environment (likely an open ACL). Using the trigger below, we can monitor which stored procedures we should be seeing run against the database. Stopthehacker.com states that there is a 156-day lapse between the time a computer is compromised and the time it is detected. That is nearly six months that, in the event of a SOAP/REST tier breach, attackers have to run ad hoc queries against my back end database. For this reason, properly identifying anomalies at layer 7 (What SQL Statements are being run?) will be key in preventing/mitigating data loss and just might keep you out of the news.
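To make the layer 7 idea concrete, here is a minimal Python sketch of white-listing stored procedures. It is an illustration under assumptions rather than the trigger itself: the procedure names, the pattern and the sample statements are all hypothetical, and a real deployment would take the SQL text from whatever the appliance exposes.

```python
import re

# Hypothetical whitelist; the procedure names are illustrative only.
APPROVED_PROCEDURES = {"usp_getcustomer", "usp_updateorder"}

# Matches "EXEC name" or "EXECUTE name", with an optional schema prefix.
EXEC_PATTERN = re.compile(r"^\s*exec(?:ute)?\s+(?:\w+\.)?(\w+)", re.IGNORECASE)


def is_expected_statement(statement):
    """True only for a call to one of the approved stored procedures."""
    match = EXEC_PATTERN.match(statement)
    return bool(match) and match.group(1).lower() in APPROVED_PROCEDURES


statements = [
    "EXEC dbo.usp_GetCustomer @id = 42",
    "SELECT * FROM Customers",            # ad hoc query: report
    "exec usp_DropEverything",            # procedure not on the list: report
]

for sql in statements:
    if not is_expected_statement(sql):
        print("UNEXPECTED SQL:", sql)
```

Anything that is not a call to an approved procedure, including every ad hoc SELECT, falls through to the report.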

So using the trigger we have created, if I run the following command (Example: emulate a breached Middleware server)

We are able to increment the counters in the ExtraHop dashboard:

Clicking on the “3” shows us the Offending IP and the Query that they ran:

 

And here we see that we can, in fact, send the data to Splunk or any other SIEM product:

Digital Evidence:

You can also assign a precision packet capture to a trigger that will create a pcap file that you can use as digital evidence after the fact.

 

Sample Scenario: Middleware Breach
Here is how ExtraHop can detect a middleware breach (see Figure 2) using the two triggers above. You would first catch the rogue queries being run with the layer 7 surveillance while ignoring the common stored procedures.  ExtraHop’s Wire Data Analytics will also catch the communications with the exfiltration/staging server because those communications are outside of the ones we set in the trigger.  ExtraHop sees this communication and implements the steps in the triggers: from the first trigger it sees the REST tier communicating with an unknown host over an unknown port, and in the second trigger it sees the ad hoc queries mixed in with the normal stored procedures.

Figure 2: SOAP/REST Tier Breached:
In figure 2 below you see how triggers on an ExtraHop Wire Data Analytics appliance can be written to ignore expected traffic and only report on unexpected traffic.

 

Conclusion:

There has been a lot of talk about the need to monitor lateral communications but there has been little practical information on how to do it. Your digital surveillance strategy with Wire Data Analytics will be a living process in which you periodically update and evaluate your connectivity profile. This is a crucial process often overlooked in INFOSEC.

Using ExtraHop’s platform to build out a surveillance strategy makes the once daunting prospect of monitoring lateral communications a workable process that can provide peace of mind.  We need to accept the fact that we cannot keep all Malware from getting inside; there are two types of systems: breached and about-to-get breached.  I think we need to look at our INFOSEC practice from the WHEN perspective and not the IF perspective.

As infiltration attempts become more sophisticated the ability to spot the building blocks of a data breach will become increasingly valuable.  ExtraHop can provide that extra set of eyes to do the heavy lifting for you while reporting on communications beyond those you expect, reducing the burden of additional auditing.  The reporting/surveillance framework also allows for application owners and shared services teams to get involved in INFOSEC by reviewing the periodic reports and, hopefully in most cases, deleting them.

Wire Data Analytics with the ExtraHop platform can provide your organization with a daily, weekly and monthly digital police blotter giving you the information you need to stay vigilant in the face of this new breed of threat.

Thank you for reading

 

John M. Smith

 

 

 

The “Arm-chair Architect”: Healthcare Dot Gov

Making the news the last few weeks has been the problems associated with the Healthcare.gov website. Being interested in the Healthcare Exchanges and seeing what might be available I decided to sign up. While doing so I decided to connect to healthcare.gov from my lab so that I could see the results in Extrahop and try to get some wire data of the experience.

My Experience:
There were definitely some areas that were slower than others, but luckily I signed in during the AM on the east coast so I am guessing the site was not especially busy at the time. I am waiting to hear back on my eligibility; they sent me a document to download but I am currently unable to download it as it times out repeatedly. Outside of that, the initial sign up took about 15 minutes and while there were some slowdowns, it was not so bad compared to other municipal sites (save those I hosted as a Federal Employee at the CDC, which snapped smartly to all requests :) ).

While several pundits are having a great time making fun of the Federal Government and the issues with the Healthcare.gov site, as a former Federal Employee and Contractor for over ten years I can tell you that I worked 50+ hours a week routinely, and while there are well noted inefficiencies in the Federal Government, some of the smartest people I ever worked with were feds. I am NOT dancing on the sorrows of anyone and I have NO DOUBT that people are busting their asses to make the end user experience as productive as possible. Regardless of how you feel about ObamaCare, we paid for this site to be up and it needs to work as well as possible. If you are involved with this project, I feel ya.

That said, it’s no secret that I am a big fan of wire data and Extrahop and this article is an attempt to promote it and with that, I will go into detail on how I used Extrahop to gain wire data (no agents installed on my workstation, all data was taken directly from the wire) and I will provide info on what could be done if Extrahop were located behind their firewalls and aggregated.

My lab setup: I have a span set up on a Cisco 3550 switch (it’s a lab but it should work on your SDN, Nexus or 6500) that grabs data from the uplink to my firewall. Extrahop has the ability to handle 20 Gb/s on an appliance, and they can cluster several appliances if you want to aggregate them and manage them from a single console. For my test, I launched a published desktop within my Citrix farm and signed up for the Obamacare site from behind the span.

While signing up to look at the Healthcare Exchanges I did two things. First, I let Extrahop grab the Layer 7 visibility and provide performance metrics on all non-encrypted URI stems. Second, I used Extrahop Triggers to send the following Events to a Splunk Syslog Server for parsing and reporting.

  • FLOW_TICK
  • FLOW_TURN
  • HTTP_REQUEST
  • HTTP_RESPONSE
  • SSL_OPEN
  • SSL_RECORD

Findings:

I will try to keep my suggestions and opinions to a minimum but I do have some suggestions that I will include later. For the most part I just want to report on what Extrahop was able to grab from the wire and either report in the Extrahop Console or Report to Splunk.

Data in the Extrahop Console:
From the Extrahop console I drilled into the Citrix server that I signed up for Healthcare.gov on. From there I am given a menu of options

Layer 4 TCP:
When I first start to troubleshoot an issue, after verifying Layer 1 and Layer 2 integrity (in this case, my lab is in a two post rack two feet from me so I can verify that) I dive into Layer 4, where Extrahop really starts to give you a solid holistic view of your environment. There are two views within the Layer 4 environment; the first is the L4 TCP node. The L4 TCP Node provides a quick holistic view of the Layer 4 bandwidth data for open, established and closed sessions, as well as reporting on the Aborted sessions. You also see a graph of the Round Trip Time.

If an Extrahop appliance were on a spanned port inside the healthcare.gov network, similar metrics could be provided for web farms, back end Database connections and SOAP/RESTful api calls. For this test, I only had the ability to grab the wire data from the client perspective. There would be a much larger breadth of data were the Extrahop appliance located on the healthcare.gov side. Also included in the L4 TCP node view are graphs on inbound/outbound congestion and inbound/outbound throttling.

Layer 4 TCP > Details:
The second node is the Layer 4 Details node.  I am a Systems Admin first and a Network Engineer a distant second. While I pride myself on being a generalist, I usually ask for help when looking at Layer 4 details just to make sure I know what I am looking at. I will give you my best effort observation of the L4 TCP > Details node.

Looking below, you see drill-down options on Accepted, Connected, Closed, Expired and Established Sessions. On the In and Out grids I generally look at Resets, Retransmission Timeouts (RTOs), Zero Windows, Aborts and Dropped Segments. A more experienced Network Engineer may focus on other metrics. Again, if we had an Extrahop Appliance on the inside, we would see the wire data for the actual web server.

As you can see below, when we look at the L4 Detail data we see a much higher number of outbound (From my client to Healthcare.gov) Aborts, Dropped Segments, Resets and Retransmissions. If you had a web farm, you could trigger this data and find a problem node in the group. You can click on any of the linked metrics to drill in to see which hosts are dropping segments, Aborting Connections, etc.

L7 Protocol Node:
The L7 Protocols node provides a holistic view of the protocol utilization during the specified time period as well as the peer devices. From the client perspective, you can see the sites that are providing data, either in iframes or because the client is sent to them directly as a result of healthcare.gov redirecting. You see two charts of incoming and outgoing protocol usage broken down by L7 technology. Also, below you see a list of peer devices. I generally look here as well to see if a CRL or OCSP service is not responding fast enough and delaying my site, or if I have an infected iFrame that is sending a user to a rogue site. We will get more into peer performance in the trigger section.

From here you can drill into actual Packets and Throughput per Protocol as well as take an interesting look at Turn Timing (also discussed in the trigger section) where you can see the performance of specific protocols.

Layer 7 Protocols > Turn Timing:
Within the turn timing you can see the Network In (Client to Server) Processing Time (Server Performance or Time waiting to respond) and Network Out (Server Response back to the Client).

If the Appliance were on the inside, this could be very valuable to see if there were back end systems that were not responding. From the Client Perspective, looking at the information below, it appears the servers themselves (processing time) actually performed relatively well on average (we will get into more detail on the triggers) and we seemed to have issues with the web servers responding back to the client. Keep in mind that this data is JUST from the client to the Healthcare.gov site. It would be considerably more valuable to have information on the performance inside Healthcare.gov.

Layer 7 Protocols > Details:
The details page can also be very valuable, if you are using a specific server for all of your images in your web farm you can take a look at the bandwidth and find out if a specific server has larger images than you have planned on. Also, you can ensure that all of the peer communications are with appropriate IP Addresses. If you are integrating with outside partners, you are only as secure as they are. Sometimes it’s best to periodically verify who your peer nodes are.


DNS: *Disclosure: I forgot to use an external DNS Server, so in the initial test my DNS Server was local and therefore did not traverse the span and was not logged in Extrahop. I went back, added an external DNS Server and returned to the site to do some browsing to get these metrics.

Few people fully realize the extent to which slow DNS resolution can wreck an application. In the DNS Node you can quickly get a glance at the number of requests and the performance of your DNS Servers as well as drill down into errors and timeouts. If your Web server is consuming a RESTful API, the DNS Resolution takes 200ms and the API is called several thousand times a minute, you could see a lot of waiting while using the web app. As previously stated, if I had an Extrahop Appliance inside the healthcare.gov network we could see if the web front end were having trouble resolving any names of the tiered APIs they are consuming.
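The arithmetic behind that warning is worth spelling out. The figures below are assumptions chosen for illustration: 3,000 calls a minute is just one reading of "several thousand", and the sketch assumes every call pays the full 200ms lookup with no caching.

```python
# Back-of-the-envelope cost of slow DNS resolution.
# Assumptions: 3,000 API calls a minute (one reading of "several thousand")
# and an uncached 200 ms lookup in front of every call.
calls_per_minute = 3000
dns_latency_ms = 200

wait_seconds_per_minute = calls_per_minute * dns_latency_ms / 1000
print(f"Cumulative DNS wait: {wait_seconds_per_minute:.0f} s per minute of traffic")
```

Under those assumptions the callers collectively spend ten minutes waiting on DNS for every minute of traffic. Caching changes the picture considerably, which is exactly why it pays to measure the resolution time on the wire instead of guessing.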


HTTP:
While the majority of the site is delivered via SSL, there are a few actions that are delivered by HTTP, and the HTTP Node provides a holistic view of the overall environment. In my case, I was the Client, so I would set my Metric Type to Client and look at the data. From there I have drilldowns for Errors, URIs and Referrers. If I were looking at a Healthcare.gov Webserver, I would select the Server Metric type and look at the same data. You see below I have Status Codes, methods, transaction metrics and transaction rates readily available. If you put the SSL Keys on the Extrahop Appliance (like you would in Wireshark) you can also get the Layer 7 performance of every URI stem that is being delivered via SSL. This could then be used to alert you to slow web applications or downstream API farms where you are consuming web services from 3rd party partners. I understand that exporting SSL keys is EXTREMELY taboo in the Federal space but I believe you can remove the keys once you have finished troubleshooting.

SSL:
Due to the data being encrypted there isn’t as much SSL Data as there is with other protocols. When you click on the SSL Node you can ensure that web servers have been configured with FIPS Compliant Ciphers and you can double check key lengths by Clicking on the “Certificates” link. From this menu you will see the Session details, if sessions are being aborted, which versions are being used (root out non-compliant SSLv2/v3 Certs). If I had an appliance on the inside, I would look at the “Aborted” Metric within the “Session Details” area.

Extrahop Trigger Data with Splunk Integration:
My favorite feature of Extrahop is its trigger function. Extrahop has the ability to fire off a syslog message, custom metrics or even a pcap-compliant capture based on a set of criteria you give it. In the case of Healthcare.gov, they could set a trigger that states “only syslog REST transactions that take longer than 200ms” or “alert me when a database transaction occurs for specific tables in a database”. Because I am looking at healthcare.gov from a client perspective I can only provide triggers on the Client end, but if I had an appliance inside, I could see not only the Client interaction with the site but I could also trigger on downstream performance of Databases, IBM Queueing, SOAP/REST calls and Slow DNS Lookups.

As I stated previously, I have triggers on the following. In the case of Healthcare.gov I may also look at the performance of the DNS Servers. Currently, I am only reporting on the DNS failures and not the performance. This can be added in less than a minute.

  • FLOW_TICK/FLOW_TURN
  • HTTP_REQUEST/HTTP_RESPONSE
  • SSL_OPEN/SSL_RECORD

Let’s examine a few of these triggers and see what we can glean from the information.

FLOW data:
Within the TCP Flow Triggers I want to look at the FLOW_TURN data as this gives us a good indication of where potential bottlenecks are and how long a client waited for a response from a server. In the FLOW_TURN trigger I am going to grab the following metrics and map them by average to ServerIP. When I make a request there are a number of potential bottlenecks that need to be monitored on the wire. DNS Performance, the client request, the server processing time and then finally the server response. I can wait on any one or all four of them. Within this first Splunk query I am going to look at the client request performance, server processing time and server response. The triggers I use for FLOW_TURN can be found in the Triggers/Downloads section of the blog.

What am I doing in this Query:
The query below uses a Regular Expression to extract the ServerIP into a field named “ip”. The “ip” field can be passed to a reverse lookup function within Splunk that will give me the hostname of the ServerIP field. If you are not a big RegEx person there is plenty of RegEx material you can research. I have always found that if you just hit <Shift> and the Number keys enough times, eventually what you are looking for will come up on the screen… :) If you are not interested in learning RegEx then you can simply copy the query below.

Splunk Query:
FLOW_TURN | search ClientIP="192.168.1.82" | rex field=_raw "ServerIP=(?<ip>.[^:]+)\sServerPort" | stats count(_time) as Instances avg(TurnReqXfer) avg(TurnRespXfer) avg(tprocess) by ip | lookup dnsLookup ip
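For readers who would like to see what the rex and stats stages are doing, the same extract-then-average idea can be tried offline. This is a rough Python sketch, not a replacement for the Splunk search: the sample syslog lines and their field layout are hypothetical, and only tprocess is averaged to keep it short.

```python
import re
from collections import defaultdict

# Same idea as the rex stage: capture what sits between "ServerIP=" and " ServerPort".
SERVER_IP = re.compile(r"ServerIP=(?P<ip>.[^:]+)\sServerPort")
TPROCESS = re.compile(r"tprocess=(?P<ms>[\d.]+)")

# Hypothetical syslog lines; the real field layout depends on your trigger.
lines = [
    "eh_event=FLOW_TURN ClientIP=192.168.1.82 ServerIP=10.0.0.5 ServerPort=443 tprocess=190",
    "eh_event=FLOW_TURN ClientIP=192.168.1.82 ServerIP=10.0.0.5 ServerPort=443 tprocess=192",
    "eh_event=FLOW_TURN ClientIP=192.168.1.82 ServerIP=10.0.0.9 ServerPort=80 tprocess=12",
]


def average_tprocess_by_ip(raw_lines):
    """Return {server_ip: (instances, average tprocess)}, like stats ... by ip."""
    samples = defaultdict(list)
    for line in raw_lines:
        ip = SERVER_IP.search(line)
        ms = TPROCESS.search(line)
        if ip and ms:
            samples[ip.group("ip")].append(float(ms.group("ms")))
    return {ip: (len(v), sum(v) / len(v)) for ip, v in samples.items()}


for ip, (count, avg) in average_tprocess_by_ip(lines).items():
    print(ip, count, avg)
```

Note that Python spells the named group `(?P<ip>...)` where the Splunk query uses `(?<ip>...)`; the pattern is otherwise the same.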

Below you see the results of the query above. I have taken the time to sort by the slowest return time, and you see that the server odlinks.govedelivery.com had the slowest average response turn time of more than 3 seconds per response. The saving grace is that there were only three instances of it. Far more concerning is the 191ms tprocess metric that occurred 676 times. (Please note that it is hitting the Akamai front end. I AM NOT saying Akamai is the bottleneck, but there may be a back end server that is causing the slowdown. Again, if I had an appliance on the inside, I could get this metric for each server in the web farm.) That said, the nearly 200ms tprocess time is the time that Extrahop observed before the server sent a response packet. This can give you an indication of how long the server took to respond, either due to DNS Resolution (how many DNS Suffixes are in YOUR IP Configuration!?) or just time processing the information.

Once you have the data in Splunk it becomes like “Baseball Stats” where you can get the average Response Transfer time of west coast customers between midnight and six AM on weekdays. The amount of stats you can query is dizzying. From the data below, I can see the average performance by Server.

One other point I want to make about the FLOW_TURN trigger is that it is very valuable in SSL Environments where you cannot get performance metrics because the data is encrypted. While I do not have URI Stems and SOAP/REST calls I do have the basic Layer 4 performance data which can be very valuable in instances where using the Private Key on the appliance is not possible.

*Please note that the data below is what was collected while I was signing up on healthcare.gov. While I was not knowingly doing anything outside of that, some connections may have been made to other sites that are not affiliated with healthcare.gov and may show up in the stats below. There is NO HIDING from wire data and without knowing the application I am not sure who to exclude. Please make a note of it.

HTTP_REQUEST/HTTP_RESPONSE:
Due to the site conducting all but a few packets in SSL, the HTTP data is actually quite light. I did want to point out what you can get from the HTTP data and show how you can correlate it with a big data back end like Splunk. You can essentially trigger on any HTTP Header value as well as the performance of web applications using the HTTP_RESPONSE trigger. Below you are seeing the performance of URI stems for the healthcare.gov site as well as the performance of the CRLs. In instances where environments are “Locked Down” the lack of access to CRLs and OCSP can have a negative impact on Web Applications. Here we note that there are no performance issues with CRLs or OCSP sites. The only Healthcare.gov URI that we see is the initial site. Everything thereafter was encrypted.
Note: With the keys installed on the Extrahop Appliance, you can see the performance of each URI stem and quickly identify web services that are not performing properly.
The Splunk Query:
Note I am removing some “noise” from the results. I had Bing as my search engine and I had to go to Gmail to verify my login.

HTTP_* | search ClientIP="192.168.1.82" Host!="mail.google.com" Host!="gmail.com" Host!="api.bing.com" | table _time eh_event ServerIP Host HTTP_uri tprocess


SSL_OPEN/SSL_RECORD:
Most of the relevant data was from the SSL_OPEN trigger. The only unique item I was triggering from SSL_RECORD was Key Size. I am not sure you can even get a 1024 bit Key anymore, and all keys were 2048 bit, so I will not include SSL_RECORD in this article.

While it does not say much in terms of performance, it is sometimes nice to just make sure those Certificates that people are using are what you expect, especially when you are working with 3rd parties and partners. This will allow you to ensure that all SOAP/RESTful web services are meeting the FIPS encryption standards.

The Splunk Query: Note we are, once again, using REGEX to parse out the “ip” so that we can perform a reverse lookup.
eh_event="SSL_OPEN" | rex field=_raw "ServerIP=(?<ip>.[^:]+)\sServerPort" | eval SSL_EXP=strftime(SSL_EXP,"%+") | stats count(_time) as Instances by ip SSL_VERSION SSL_CIPHER SSL_SUBJECT SSL_EXP | lookup dnsLookup ip

Conclusion:
I am certain that there is no end of “Arm Chair” architects offering DHHS advice. Like I said, I am NOT dancing on anyone’s sorrows here. As a former DHHS (CDC) employee myself I know that they are working around the clock to fix any issues users are having. I feel like the first step in that process is to start gathering Operational Intelligence, and Extrahop can do that without any impact on the existing Server architecture. As I stated throughout the post, the data I have is Client side and the data they could collect inside the healthcare.gov network would be orders of magnitude more valuable.

There are a few VERY important things to note for Feds/Govies (or anyone else) who want to leverage Extrahop’s Wire Data

  • It does NOT require any agents, there will be minimal, if any, changes to the incumbent C & A framework and you should only have to get the Appliance approved. This means that you can call them today, get the appliance rack mounted and tie it into your span and start looking at data without doing ANY configuration changes to the servers.
  • They have a free Discovery Edition that you can use to perform your own Proof of Concept that is a VM.
  • As I stated previously, they can handle up to 20 Gb/s of data per appliance and they can be clustered so that they are centrally managed as well as aggregated.
  • It will integrate with your existing Splunk environment or any other Syslog server that you have in place. I have used it with both KIWI and Splunk.
  • It augments existing INFOSEC strategies by allowing real-time access to wire data to find Malware, DNS Cache Poisoning (Pharming) and Session hijacking within seconds.

You can check out the Discovery Edition here at: (Please tell them John@wiredata.net sent you :) )
http://www.extrahop.com/discovery/
Thanks for Reading

John

Go with the Flow! Extrahop’s FLOW_TICK feature

I was test driving the new 3.10 firmware of ExtraHop and I noticed a new feature that I had not seen before (it may have been there in 3.9 and I just missed it). There is a new trigger called FLOW_TICK, that basically monitors connectivity between two devices at layer 4 allowing you to see the response times between two devices regardless of L7 Protocol. This can be very valuable if you just want to see if there is a network related issue in the communication between two nodes. Say, you have an HL7 interface or a SQL Server that an application connects to. You are now able to capture flows between those two devices or even look at the Round Trip time of tiered applications from the client, to the web farm to the back end database. When you integrate it with Splunk you get an excellent table or chart of the conversation between the nodes.

The Trigger:
The first step is to set up a trigger and select the “FLOW_TICK” event.

Then click on the Editor and enter in the following Text: (You can copy/Paste the text and it should appear as the graphic below)

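// Write the round trip time to the trigger debug log.
log("RTT " + Flow.roundTripTime);
// Send one key=value syslog record per FLOW_TICK event for Splunk to parse.
RemoteSyslog.info(
    " eh_event=FLOW_TICK" +
    " ClientIP=" + Flow.client.ipaddr +
    " ServerIP=" + Flow.server.ipaddr +
    " ServerPort=" + Flow.server.port +
    " ServerName=" + Flow.server.device.dnsNames[0] +
    " RTT=" + Flow.roundTripTime
);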
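The record this trigger emits is a flat run of key=value pairs, which is what makes it easy for Splunk (or anything else) to parse. As a small illustration of that format, the Python sketch below splits such a payload into fields. The sample payload, including the server name, is made up for the example.

```python
def parse_record(message):
    """Split a 'key=value key=value' syslog payload into a dict."""
    fields = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:                      # ignore tokens without an "="
            fields[key] = value
    return fields


# Hypothetical payload in the shape the trigger above emits.
sample = (" eh_event=FLOW_TICK ClientIP=192.168.1.82 ServerIP=192.168.1.30"
          " ServerPort=1433 ServerName=sql01.example.local RTT=4")

record = parse_record(sample)
print(record["ServerIP"], record["ServerPort"], record["RTT"])
```

This simple split assumes no value contains a space, which holds for the fields in this trigger but would not for something like a URI or a certificate subject.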

Integration with Splunk:
So if you have your integration with Splunk set up, you can start consulting your Splunk interface to see the performance of your layer 4 conversations using the following Text:
sourcetype="Syslog" FLOW_TICK | stats count(_time) as TotalSessions avg(RTT) by ClientIP ServerIP ServerPort

This should give you a table that looks like this: (Note you have the Client/Server the Port and the total number of sessions as well as the Round Trip Time)

If you want to narrow your search down you can simply put a filter into the first part of your Splunk Query: (Example, if I wanted to just look at SQL Traffic I would type the following Query)
sourcetype="Syslog" FLOW_TICK 1433
| stats count(_time) as TotalSessions avg(RTT) by ClientIP ServerIP ServerPort

By adding the 1433 (or whatever port you want to filter on) you can restrict to just that port. You can also enter in the IP Address you wish to filter on as well.
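If it helps to see the grouping spelled out, here is a hedged Python equivalent of the stats stage with an optional port filter. The records are hypothetical, already-parsed FLOW_TICK values, and the function name is mine.

```python
from collections import defaultdict

# Hypothetical parsed FLOW_TICK records: (ClientIP, ServerIP, ServerPort, RTT).
records = [
    ("192.168.1.82", "192.168.1.30", 1433, 4.0),
    ("192.168.1.82", "192.168.1.30", 1433, 6.0),
    ("192.168.1.82", "192.168.1.40", 8080, 10.0),
]


def summarize(rows, port=None):
    """Count sessions and average RTT per (client, server, port).

    Passing port restricts the summary to that server port.
    """
    rtts = defaultdict(list)
    for client, server, server_port, rtt in rows:
        if port is None or server_port == port:
            rtts[(client, server, server_port)].append(rtt)
    return {key: (len(v), sum(v) / len(v)) for key, v in rtts.items()}


print(summarize(records))             # every conversation
print(summarize(records, port=1433))  # SQL Server traffic only
```

One difference to keep in mind: a bare 1433 in the Splunk search matches that term anywhere in the raw event, while the sketch compares the port field exactly. Searching on ServerPort=1433 would be the stricter equivalent.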

INFOSEC Advantage:
Perhaps an even better function of the FLOW_TICK event is the ability to monitor egress points within your network. One of my soapbox issues in INFOSEC is the fact that practitioners beat their chests about what incoming packets they block, but until recently the few that got in could take whatever the hell they wanted and leave unmolested. Even a mall security guard knows that nothing is actually stolen until it leaves the building. If a system is infected with Malware you have the ability, when you integrate it with Splunk and the Google Maps add-on, to see outgoing connections over odd ports. If you see a client on your server segment (not workstation segment) making 6,000 connections to a server in China over port 8016, maybe, just maybe, that is something you should look into.
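The egress check described here boils down to three questions per flow: did it start on the server segment, did it leave the network, and was the port one we approved? A minimal Python sketch of that test follows. The segment, the approved port list and the sample flows are assumptions for illustration, and "leaves the network" is approximated as "the destination is not a private address".

```python
import ipaddress
from collections import Counter

# Assumptions for illustration: the server segment and the approved egress ports.
SERVER_SEGMENT = ipaddress.ip_network("192.168.1.0/24")
APPROVED_EGRESS_PORTS = {53, 80, 123, 443}


def suspicious_egress(flows):
    """Count flows from the server segment to public IPs on unapproved ports."""
    hits = Counter()
    for client, server, port in flows:
        leaves_network = not ipaddress.ip_address(server).is_private
        from_servers = ipaddress.ip_address(client) in SERVER_SEGMENT
        if from_servers and leaves_network and port not in APPROVED_EGRESS_PORTS:
            hits[(client, server, port)] += 1
    return hits


# Hypothetical flows; 8.8.8.8 stands in for any public destination.
flows = [("192.168.1.30", "8.8.8.8", 8016)] * 3 + [
    ("192.168.1.30", "8.8.8.8", 443),        # approved port
    ("192.168.1.30", "192.168.1.40", 8016),  # stays inside the network
]

for (client, server, port), count in suspicious_egress(flows).items():
    print(f"{client} -> {server}:{port} x{count}")
```

Only the repeated connection over port 8016 to the public address is counted; the approved port and the internal conversation are ignored.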

When you integrate with the Splunk Google Maps add-on you can use the following search:
sourcetype="Syslog" FLOW_TICK | rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ServerIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats avg(RTT) by ClientIP IP ServerPort IP_city IP_region_name IP_country_name

This will yield the following table: (Note that you can see a number of connections leaving the network to make connections in China and New Zealand, the Chinese connections I made on purpose for this lab and the New Zealand connections are NTP connections embedded into XenServer)

If you suspected you were infected with Malware and you wanted to see which subnets were infected you would use the following Splunk Query:
sourcetype="Syslog" FLOW_TICK
%MalwareDestinationAddress%
| rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ClientIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats count(_time) by NetID
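The NetID stage simply keeps the first three octets of the client address so that events can be counted per subnet. Here is the same idea as a small Python sketch; the events are hypothetical and are assumed to have already matched the malware destination address.

```python
import re
from collections import Counter

# First three octets of the client address, as in the NetID rex stage.
NET_ID = re.compile(r"ClientIP=(?P<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})")

# Hypothetical events that already matched the malware destination address.
events = [
    "eh_event=FLOW_TICK ClientIP=192.168.1.82 ServerIP=198.51.100.7 ServerPort=8016 RTT=210",
    "eh_event=FLOW_TICK ClientIP=192.168.1.90 ServerIP=198.51.100.7 ServerPort=8016 RTT=205",
    "eh_event=FLOW_TICK ClientIP=192.168.7.14 ServerIP=198.51.100.7 ServerPort=8016 RTT=199",
]


def count_by_subnet(raw_events):
    """Count matching events per /24-style NetID (first three octets)."""
    counts = Counter()
    for event in raw_events:
        match = NET_ID.search(event)
        if match:
            counts[match.group("NetID")] += 1
    return counts


print(count_by_subnet(events).most_common())
```

Treating the first three octets as the subnet only works where your segments are /24s; other mask lengths would need a proper network calculation.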

Geospatial representation:
Even better, if you want to do some big-time geospatial analysis with Extrahop and Splunk, you can use the Google Maps application and enter the following query into Splunk:
sourcetype="Syslog" FLOW_TICK | rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ClientIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats avg(RTT) by ClientIP NetID IP ServerPort IP_city IP_region_name IP_country_name | geoip IP

Conclusion:
I apologize for the RegEx on the ServerIP field; for some reason I wasn’t getting consistent results with my data. You should be able to geocode the ServerIP field without any issues. As you can see, the FLOW_TICK event gives you the ability to monitor the layer 4 communications between any two hosts, and when you integrate it with Splunk you get some outstanding reporting. You could actually look at the average Round Trip Time to a specific SQL Server or Web Server by Subnet. This could quickly allow you to diagnose issues in the MDF or determine whether you have a problem on the actual server. From an INFOSEC standpoint, this is fantastic; your INFOSEC team would love to get this kind of data on a daily basis. Previously, I used a custom Edgesight Query to deliver a report that I would look over every morning to see if anything looked inconsistent. If you see an IP making a 3389 connection to an IP on FIOS or COMCAST then you know they are RDPing home. More importantly, the idea that an INFOSEC team is going to be able to be responsible for everyone’s security is absurd. We, as Sys Admins and Shared Services folks, need to take responsibility for our own security. Periodically validating EGRESS is a great way to find out quickly if Malware is running amok on your network.

Thanks for reading

John M. Smith