
SACCP: Stream Analytics Critical Control Point

I left the enterprise approximately 30 months ago after being a cubicle drone for the previous 18 years. I now work for ExtraHop Networks, a software company that makes a wire data analytics platform providing organizations with operational intelligence about their applications and the data that traverses their wire, shining light on the somewhat opaque world of packet analysis.

In the last few years, I can honestly say that I find myself getting frustrated with the number of breaches that have occurred due, in my opinion, in large part to the lack of involvement by system owners in their own security. In my household alone, we are on our 5th credit card in the last 24 months (in fact, I look at the expiration dates on most of my credit cards and chuckle on the inside, knowing the card will never make it that far). I am also a former Federal employee with a clearance, so I have the added frustration of knowing several Chinese hackers likely had access to my SF86 information (basically my personal and financial life story). In the last 15 years we have added a range of regulatory frameworks and Security Operations Centers (SOCs), and I have watched INFOSEC budgets bulge while I had to justify my $300 purchase of a Kiwi Syslog server. I have concluded that maybe the time has come for the industry to try a new approach. The breaches keep getting bigger, and no matter what we put in place, insiders or hackers just move around it. At times I wonder if a framework I learned in my career prior to Information Technology may be just what the industry needs.

My first job out of college was with Maricopa County Environmental Health (I was the health inspector), where I was introduced to a concept called HACCP (Hazard Analysis Critical Control Point). I think some of what I learned from it can be very relevant in analyzing today's distributed and often problematic environments.


HACCP, pronounced "hassup", is a methodology for ensuring food safety through a series of processes designed so that, in most cases, no one gets sick from eating your food. It involves evaluating the ingredients of each dish, determining which foods are potentially hazardous, and deciding what steps need to be taken so that quality is maintained from food prep to serving.

While working as the health inspector, I was required to visit every permit holder twice a year and perform a typical inspection that involved taking temperatures, making sure they had hot water, that employees washed their hands and stayed home when they were sick, and so on. But in most, if not all, of the restaurants I inspected, the process of checking temperatures, ensuring there was soap at the hand wash station and making sure there was hot water did not JUST happen during an inspection; in most cases it went on even when I was not on the premises. Sadly, in today's enterprise, systems are generally only checked and/or monitored when an application team is being audited. An incumbent INFOSEC team cannot be responsible for the day-to-day security of a shared services or hosting team's applications any more than I could be in every single restaurant every single day. The operator has to take responsibility, and I am proposing the same framework for today's enterprise: shared services and hosting teams need to take responsibility for their own security and use INFOSEC as an auditing and escalation resource. I will attempt to show how ExtraHop's stream analytics solution can provide an easy way to accomplish this even in today's skeleton-crew enterprise environments.

Let’s start with some parallels.

An example of a HACCP based SOP would be:

  • The cooling of all pre-cooked foods will ensure that foods are cooled from 135 degrees to 70 degrees within two hours
  • The entire cooling process from 135 degrees to 41 degrees will not take more than 6 hours.

So, I am taking away the "H" and putting in an "S" for SACCP: I am proposing that we do the same for the applications and systems we support, at the packet level. Just as a dish may have chicken, cheese and other potentially hazardous ingredients, applications may have SSO logins, access tokens, and PII being transferred between the database and the middle or front-end tiers. We need to understand each part of an infrastructure that represents risk to an application, what the approved baseline is, what mitigation steps to take and who is responsible for maintaining it. Let's take a look at the 7 HACCP/SACCP principles.

Principle 1 – Conduct a Hazard Stream Analysis
The application of this principle involves listing the steps in the process and identifying where there may be significant risk. Stream analytics will focus on hazards that can be prevented, eliminated or controlled by the SACCP plan. A justification for including or excluding each hazard is recorded, and possible control measures are identified.

Principle 2 – Identify the Critical Control Points
A critical control point (CCP) is a point, transaction or process at which control (monitoring) can be applied to ensure compliance and, if needed, a timely response to a breach.

Principle 3 – Establish Critical Limits
A full understanding of acceptable thresholds, ports and protocols for specific transactions will help identify when a CCP is outside acceptable use.

Principle 4 – Monitor Critical Control Point
Monitor compliance with CCPs using ExtraHop's stream analytics Discover and Explorer appliances to ensure that communications are within the expected and approved ports and protocols established for each CCP.

Principle 5 – Establish Corrective Action
Part of this is not only understanding what to do when a specific critical control point is behaving outside the approved limits, but also establishing who owns the systems involved in each CCP. For example, if a Critical Control Point for a server in the middle tier of an application is suddenly SCP-ing files out to a server in Russia, establish who is responsible for ensuring that this is reported and escalated as soon as possible, as well as what will be done in the event a system appears to be compromised.

Principle 6 – Record Keeping
Using the ExtraHop Explorer appliance, custom queries can be set up and saved to ensure proper compliance with established limits. Integration with an external SIEM can also be enabled for communications outside the established limits, as can HTTP push and alerting.

Principle 7 – Establish Verification
Someone within the organization, either the INFOSEC team or a team lead/manager, must verify that the SACCP plan is being executed and that it is functioning as expected.

So what would a SACCP strategy look like?

Let's do a painfully simple exercise using both the ExtraHop Discover Appliance and the ExtraHop Explorer Appliance to create a Stream Analytics Critical Control Point profile.

Scenario: We have a Network that we want to call “Prod”.

Principle 1: Analysis
Any system with an IP address starting with "172.2" is a member of the Prod network, and there should ONLY be ingress sourced from the outside (the Internet) and peer-to-peer communications between Prod hosts. No system on the Prod network should establish a connection OUTSIDE Prod.

Principle 2: Identify CCPs
In this case, the only Critical Control Point (CCP) is the Prod network.

Principle 3: Limits
As stated, the limits are that Prod hosts can accept connections from the outside BUT they should not establish any sessions outside the Prod network.

Principle 4: Monitoring

Using the ExtraHop Discover Appliance (EDA) we will create a trigger that identifies transactions based on the logical network names of their given address space and monitor both the ingress and egress of these networks.

In the figure below, we outline how we set a logical boundary to monitor communications. In this manner we can lay the groundwork for monitoring the environment by first identifying which traffic belongs to which network.

  • You can see on line 5 of the trigger below that we are establishing which IP blocks belong to the source (egress) networks.
  • On line 11 we are identifying the Prod network as a destination (ingress).

*Important: you DO NOT have to learn to write triggers, as we will write them for you. But we are an open platform, and we provide an empty canvas to customers who want to paint their own masterpiece, so we are showing you how we do it.
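While the figure shows the actual trigger, here is a minimal sketch of the same idea, assigned to the FLOW_TICK event used later in this post. The "172.2" prefix test and the network labels come from the scenario above; everything else (the event name, the prefix-matching approach) is an illustrative assumption rather than the exact trigger in the figure:

// Minimal sketch (not the exact trigger in the figure): classify each flow by
// whether its endpoints fall inside the Prod address space and forward the
// result to a remote syslog destination. Assumed to run on the FLOW_TICK event.
var clientIP = "" + Flow.client.ipaddr;   // string form of the client address
var serverIP = "" + Flow.server.ipaddr;   // string form of the server address

// Assumption: a simple prefix test stands in for whatever CIDR matching the
// real trigger performs on lines 5 and 11.
var sourceNetwork = (clientIP.indexOf("172.2") === 0) ? "Prod" : "External";
var destNetwork   = (serverIP.indexOf("172.2") === 0) ? "Prod" : "External";

RemoteSyslog.info(
    " eh_event=SACCP_PROD" +
    " SourceNetwork=" + sourceNetwork +
    " DestNetwork=" + destNetwork +
    " ClientIP=" + clientIP +
    " ServerIP=" + serverIP +
    " ServerPort=" + Flow.server.port
);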

 

Next we will leverage the ExtraHop Explorer Appliance (EXA) to demonstrate where the traffic is going. You will see on line 28 (although commented out) that we are committing several metrics to the EXA, such as source, destination, protocol, bytes, etc. This completes Principle 4 and allows us to monitor the Prod network. In the figure below, you will see that we are grouping by "Sources". You will note that Prod has successfully been classified and it has over one million transactions.
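For reference, committing such a flow record from a trigger to the EXA looks roughly like the sketch below. The record type ID and field names are made up for illustration, commitRecord() should be verified against the Trigger API reference for your firmware, and the protocol and byte fields mentioned above are omitted here:

// Rough sketch of committing a custom record to the Explorer appliance (EXA).
// "prod-flow" and the field names are illustrative assumptions; check
// commitRecord() in the ExtraHop Trigger API docs before relying on this.
var client = "" + Flow.client.ipaddr;
var server = "" + Flow.server.ipaddr;
commitRecord("prod-flow", {
    sourceNetwork: (client.indexOf("172.2") === 0) ? "Prod" : "External",
    destNetwork:   (server.indexOf("172.2") === 0) ? "Prod" : "External",
    clientIP:      client,
    serverIP:      server,
    serverPort:    Flow.server.port
});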

 

 

Principle 5: Establish Corrective Action
Well, in our hypothetical Prod network, we have noted some anomalies. As you can see below, when we filter on Prod as the source and group by destination, we see that 15 of our nearly 1.3 million transactions were External. In most situations this would go largely unnoticed by many tools; using SACCP and ExtraHop's stream analytics platform, however, the hosting team or SOC is positioned to easily see that there is an issue and begin the process of escalating it or remedying it with further investigation.

*Note: we can easily create an alert to warn teams when a transaction occurs outside the expected set of transactions. We also have a RESTful API that existing equipment can interrogate to surface anomalies.

 

 

Digging Deeper:
As we dig a little deeper by pivoting to the Layer 7 communications (demonstrated in the video below), you will note that someone has uploaded a file to an external site at 14.146.24.124. Depending on what was in that file and on existing policies, the mitigation could involve a cardboard box and a visit from the security guard.

 

Principle 6: Establish Record Keeping
The ExtraHop Discover Appliance can send syslog to an incumbent SIEM system as well as a RESTful push. There is also a full alerting suite that can alert via email or SNMP trap. In most enterprises there is already an incumbent record-keeping system, and the ExtraHop platform has a variety of ways to integrate with it.
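As a simple illustration of the syslog option, a trigger can forward only the out-of-policy transactions to the incumbent SIEM. The event name and the prefix test below are assumptions carried over from the Prod scenario above, not a prescribed configuration:

// Minimal sketch: send only policy violations (a Prod host talking to something
// outside Prod) to the remote syslog destination configured on the Discover
// appliance, so the SIEM keeps a record of each out-of-policy transaction.
var client = "" + Flow.client.ipaddr;
var server = "" + Flow.server.ipaddr;
if (client.indexOf("172.2") === 0 && server.indexOf("172.2") !== 0) {
    RemoteSyslog.info(
        " eh_event=PROD_POLICY_VIOLATION" +
        " ClientIP=" + client +
        " ServerIP=" + server +
        " ServerPort=" + Flow.server.port
    );
}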

 

Principle 7: Verification
Someone should provide oversight of the SACCP plan and ensure that it is being executed and having the desired results. This can be either INFOSEC team management or hosting team management, but someone should be responsible for ensuring that the shared services team(s) are following the plan.

 

Conclusion:
The time has come for a new strategy. In several other industries with a regulatory framework for safety, compliance and responsibility, there exists a culture of operators taking responsibility for ensuring that they are compliant. The enterprise is over 30 years old, and just as the health inspector cannot be in every restaurant every day and a policeman cannot be on every street corner, the time has come for the IT industry to ask system owners to take some of the responsibility for their own security.

 

Thanks for reading and please check out the video below.

John

ADDENDUM!!!  (PUNKBUSTER OPTION!)

I wanted to take the time to show the next iteration of this; I call it precision punk busting… "err"… I mean Packet Capture.

The ExtraHop Discover Appliance has a feature called Precision Packet Capture. Within the same narrative described above, I have edited my trigger to take a packet capture any time the policy is violated. If you recall, I wanted to ensure that my Prod network ONLY communicated within the Prod network. I added the following JavaScript to my trigger, and you will see that I have instructed the appliance to kick off a packet capture in the event the policy is violated.

Busta_PCAP_Trigger
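The Busta_PCAP_Trigger screenshot shows the actual code; as a rough, hypothetical sketch of what such a capture clause can look like, something along these lines would fire a precision packet capture only when a Prod host opens a session outside Prod. The capture name is made up, and Flow.captureStart() and its signature should be verified against the Trigger API reference for your firmware version:

// Hypothetical sketch: start a precision packet capture when the Prod policy
// is violated, so the offending flow is preserved as forensic evidence.
// Flow.captureStart() is an assumption to verify in the Trigger API docs.
var client = "" + Flow.client.ipaddr;
var server = "" + Flow.server.ipaddr;
if (client.indexOf("172.2") === 0 && server.indexOf("172.2") !== 0) {
    Flow.captureStart("Prod_policy_violation_" + client + "_to_" + server);
}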

As a result of the FTP traffic out to the Internet, we notice that we have a PCAP waiting for us, indicating that a system has violated the Prod policy.

Busta_PCAP

 

We can also alert you that you have a PCAP waiting for you via Syslog, SNMP or email. This PCAP can be used for forensics, as digital evidence against an insider, or as a way to verify just what the "F" just happened.

Having this information readily available, and alerting either a system owner or the SOC team that a policy was violated, is a much easier surveillance method than sorting through terabytes of logs or sifting through a huge PCAP file to find what you want. Here we are ONLY writing PCAPs for those instances that violate the policy.

Thanks for reading!

Happy punk busting!!!

Thanks

John

Using ExtraHop Triggers to Monitor Databases for Leakage and Performance

To continue with the INFOSEC posts, I wanted to demonstrate how you can use ExtraHop triggers to monitor your database connectivity and be able to tell when data is actually being stolen from your back-end databases. In many cases, sensitive data (data that you will get sued, fined, embarrassed or fired over in the event it is compromised) is located on database servers. From an INFOSEC perspective, it is important that back-end databases are only accessed by the systems that are supposed to access them. Let's say we have a web-tiered application that connects to a CRM database that has all of my company's leads. If I see an IP address on the users' segment connecting to my back-end database, that is something I should be concerned about. Even if my organization has taken steps to layer the network so that only appropriate hosts can talk to one another, you still have the issue of someone potentially compromising one of the web servers and then running queries and stealing data from the compromised web server. To prevent this, I need to know two things: what type of SQL traffic is expected (it is actually rather rare to see "select *" in a well-written application) and which clients are expected to generate it. Taking stock of the types of queries that are being run against your data, and from whom and where, is an important step in preventing data leakage due to SQL injection or a trusted box getting compromised.

Outside of INFOSEC, you also have the benefit of being able to see which queries are taking the longest to run. If you have Splunk, you can use some RegEx to parse out the performance by table (I will show you in a video), which could give you an indication that a table needs indexing. Using triggers you can log and report on the following:

  • Table Performance
  • Processing time by Server
  • Processing time by Client
  • Total queries by Client
  • Total queries by server
  • Processing time by Query (which Queries take the longest time to complete)

Imagine doing an application upgrade or a schema update and being able to go to your stored procedures and see before and after performance without needing to run profiler. All of this data can be collected, parsed and reported on without a single agent being installed and without anyone touching an incumbent system.

The Triggers:
The two triggers that you need for this are located in the Triggers section, and they can be copied and pasted into your ExtraHop Discovery Edition. Once you have loaded the triggers, you can watch the SQL traffic traverse the span, and using the Console trigger you can see the data in the Console.
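For a feel of what such a trigger can look like, here is a minimal sketch assumed to run on the database response event. It emits the same field names (ClientIP, ServerIP, Statement, ProcessTime) that the Splunk searches later in this post parse; the DB_RESPONSE event name and the DB.statement / DB.processingTime properties are assumptions to verify against the Trigger API reference for your firmware:

// Minimal sketch of a database-monitoring trigger (assumed DB_RESPONSE event).
// Field names match the EH_DB_TRIGGER searches used later in this post.
// DB.statement and DB.processingTime are assumptions to confirm in the
// ExtraHop Trigger API documentation.
RemoteSyslog.info(
    " eh_event=EH_DB_TRIGGER" +
    " ClientIP=" + Flow.client.ipaddr +
    " ServerIP=" + Flow.server.ipaddr +
    " Statement=" + DB.statement +        // SQL text of the query
    " ProcessTime=" + DB.processingTime   // server processing time
);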

Note the simple Query below:

And you see the same query below; you have the IP address of whoever made the query, the actual query, and the amount of time it took. (I will show this in the video too.)

 

General Punk busting:
In addition to being able to see the overall performance of each SQL query, you will be able to audit exactly which queries have been run against your critical databases and even critical tables. In the graphic below you can see a user (myself) attempting to select critical data from a fictional table called CreditCardData. Note the time I ran the query (the Splunk server is not synced with my AD domain, so I am off by a few seconds).

What I look for in the results:
The first thing I note is that we see the query for PII running and I see the IP address. The important thing to ask yourself is, "Does that IP address look right? Is that my front-end e-commerce server verifying payment information, or is that some clown on the network?" The next thing I ask myself is, "Does that query look like something that was compiled into a stored procedure or written into the application, or is this someone who has compromised a trusted server and is running ad hoc queries?"

SPLUNK QUERY: (Note the RegEx to parse out the Statement)
EH_DB_TRIGGER | rex field=_raw "Statement=(?<STMT>.[^:]+)\sProcessTime" | table _time ClientIP STMT

Another way to keep track of exactly who is accessing your SQL Servers is to keep an average ProcessTime by ClientIP and ServerIP.

What I look for in the results:
Question 1: Are all of the IPs appropriate for SQL queries?
Question 2: What is 192.168.1.98 doing to 192.168.1.205 that its queries are taking several times longer to process?
Question 3: What is 192.168.1.205? Is that a proper database server, or has someone gone rogue?

SPLUNK QUERY:
EH_DB_TRIGGER | stats avg(ProcessTime) by ClientIP ServerIP

 


So let's say we suspect 192.168.1.98 of possible malfeasance; I can now query the ExtraHop data for every query that client has run over the last 24 hours. What we note from the query below is that this particular IP address has been engaging in some very undesirable behavior, and by the time you have finished your tirade of obscenities you can call security and have them deliver him or her a cardboard box and escort them out of the building. Either way, you have adequate digital evidence for termination or, if needed, prosecution, as the log itself is fully intact on the Syslog server.

SPLUNK QUERY:
EH_DB_TRIGGER | search ClientIP="192.168.1.98" | rex field=_raw "Statement=(?<STMT>.[^:]+)\sProcessTime" | stats count(STMT) by ClientIP ServerIP STMT

 

Conclusion:
In the past, to get this type of data I have had to run the very invasive SQL Profiler. This tool can take up to 20% of your resources, and you cannot run it on a long-term basis. Using ExtraHop's wire data, you can collect all of this information (I have cross-referenced it with SQL Profiler and, in all cases, the metrics were EXACTLY the same) and get access to very meaningful SQL data without impacting any systems. As always, this is completely agentless and requires no reconfiguration of any SQL Server or any client accessing the server. If you add 6 web servers to your web farm to accommodate extra front-end capacity, you don't have to worry about installing more agents; if they have an IP address, you will see the data.

More importantly, while we have tools to detect viruses, malware and spyware, they will not defend against a malicious employee or a trusted system that has become compromised. As part of the Human Algorithm, periodically inspecting the behavior of critical systems that house sensitive data is very important and should be part of the overall INFOSEC strategy. ExtraHop has better videos/documentation on monitoring SQL performance, but when used with triggers you can easily compare SQL performance and see changes (better or worse) as they happen in real time. If you are a Sys Admin who has taken a beating because a table needed indexing, you know what I mean. Let's be honest: they go after the systems folks first, the network folks second, and then they look at the software. I am not indicting developers; there just hasn't been a great deal of visibility until now. At my previous employer, we shared ExtraHop metrics with Systems, Network, INFOSEC AND Developers. It is better to know that application slowness is due to a single server in a back-end database cluster before you double the amount of RAM, add more spindles/IOPS and upgrade the switch.

Also, please note that while this article uses SQL Server in the examples, ExtraHop supports the major DB vendors (DB2, Oracle and MySQL) as well.

Thanks for reading

John

Go with the Flow! ExtraHop's FLOW_TICK feature

I was test-driving the new 3.10 firmware of ExtraHop and I noticed a new feature that I had not seen before (it may have been there in 3.9 and I just missed it). There is a new trigger event called FLOW_TICK that basically monitors connectivity between two devices at Layer 4, allowing you to see the response times between two devices regardless of the L7 protocol. This can be very valuable if you just want to see whether there is a network-related issue in the communication between two nodes. Say you have an HL7 interface or a SQL Server that an application connects to. You are now able to capture flows between those two devices, or even look at the round-trip time of tiered applications from the client to the web farm to the back-end database. When you integrate it with Splunk, you get an excellent table or chart of the conversation between the nodes.

The Trigger:
The first step is to set up a trigger and select the "FLOW_TICK" event.

Then click on the Editor and enter the following text (you can copy/paste it and it should appear as in the graphic below):

log("RTT " + Flow.roundTripTime)
RemoteSyslog.info(
    " eh_event=FLOW_TICK" +
    " ClientIP=" + Flow.client.ipaddr +
    " ServerIP=" + Flow.server.ipaddr +
    " ServerPort=" + Flow.server.port +
    " ServerName=" + Flow.server.device.dnsNames[0] +
    " RTT=" + Flow.roundTripTime
)

Integration with Splunk:
So if you have your integration with Splunk set up, you can start consulting your Splunk interface to see the performance of your layer 4 conversations using the following Text:
sourcetype="Syslog" FLOW_TICK | stats count(_time) as TotalSessions avg(RTT) by ClientIP ServerIP ServerPort

This should give you a table that looks like this (note you have the client/server, the port, and the total number of sessions as well as the round-trip time):

If you want to narrow your search down, you can simply put a filter into the first part of your Splunk query. For example, if I wanted to look at just SQL traffic I would type the following:
sourcetype="Syslog" FLOW_TICK 1433
| stats count(_time) as TotalSessions avg(RTT) by ClientIP ServerIP ServerPort

By adding the 1433 (or whatever port you want to filter on), you can restrict the results to just that port. You can also enter the IP address you wish to filter on.

INFOSEC Advantage:
Perhaps an even better function of the FLOW_TICK event is the ability to monitor egress points within your network. One of my soapbox issues in INFOSEC is the fact that practitioners beat their chests about what incoming packets they block, but until recently the few that got in could take whatever the hell they wanted and leave unmolested. Even a mall security guard knows that nothing is actually stolen until it leaves the building. If a system is infected with malware, you have the ability, when you integrate with Splunk and the Google Maps add-on, to see outgoing connections over odd ports. If you see a client on your server segment (not your workstation segment) making 6,000 connections to a server in China over port 8016, that is, maybe, something you should look into.

When you integrate with the Splunk Google Maps add-on you can use the following search:
sourcetype="Syslog" FLOW_TICK | rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ServerIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats avg(RTT) by ClientIP IP ServerPort IP_city IP_region_name IP_country_name

This will yield the following table. (Note that you can see a number of connections leaving the network for China and New Zealand; I made the Chinese connections on purpose for this lab, and the New Zealand connections are NTP connections embedded in XenServer.)

If you suspected you were infected with Malware and you wanted to see which subnets were infected you would use the following Splunk Query:
sourcetype="Syslog" FLOW_TICK
%MalwareDestinationAddress%
| rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ClientIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats count(_time) by NetID

Geospatial representation:
Even better, if you want to do some big-time geospatial analysis with ExtraHop and Splunk, you can use the Google Maps application by entering the following query into Splunk:
sourcetype="Syslog" FLOW_TICK | rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ClientIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats avg(RTT) by ClientIP NetID IP ServerPort IP_city IP_region_name IP_country_name | geoip IP

Conclusion:
I apologize for the RegEx on the ServerIP field; for some reason I wasn't getting consistent results with my data. You should be able to geocode the ServerIP field without any issues. As you can see, FLOW_TICK gives you the ability to monitor the Layer 4 communications between any two hosts, and when you integrate it with Splunk you get some outstanding reporting. You could actually look at the average round-trip time to a specific SQL Server or web server by subnet. This could quickly allow you to diagnose issues in the MDF or determine whether you have a problem on the actual server. From an INFOSEC standpoint, this is fantastic; your INFOSEC team would love to get this kind of data on a daily basis. Previously, I used a custom EdgeSight query to deliver a report that I would look over every morning to see if anything looked inconsistent. If you see an IP making a 3389 connection to an IP on FIOS or Comcast, then you know they are RDPing home. More importantly, the idea that an INFOSEC team is going to be able to be responsible for everyone's security is absurd. We, as Sys Admins and shared services folks, need to take responsibility for our own security. Periodically validating egress is a great way to find out quickly if malware is running amok on your network.

Thanks for reading

John M. Smith