In the previous installment I gave some background on the framework we use to perform anomaly detection on log files. Now it’s time for a working example of how we can detect an anomalous set of behaviors. There are many ways to perform anomaly detection; what I am going to show you is one such method. It models the likelihood that events follow one another in time, forming an event sequence, and compares each new sequence against a set of previously learned behaviors or event sequences.
Suppose we have a log file containing trace entries that relate to each other. The first step is to discover those relationships and learn the probability that when one event occurs, a certain other event will follow it. For simplicity’s sake, let’s say that our log file has multiple trace entries and that each trace entry contains exactly one event. In this context, a trace entry is a line within a log file that has been written out by some application and is entirely self-contained (see Figure 1 for an example).
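As a minimal sketch, here is how such self-contained trace entries might be parsed into their fields. The exact layout (the `id=` token, the field order, the names `timestamp`, `identifier`, `event`, `status`) is a hypothetical format for illustration, not the actual format of the log in Figure 1:

```python
import re

# Hypothetical trace layout: "<date> <time> id=<identifier> <event> <status message>"
TRACE_PATTERN = re.compile(
    r"(?P<timestamp>\S+ \S+) id=(?P<identifier>\S+) (?P<event>\S+) (?P<status>.*)"
)

def parse_trace(line):
    """Parse one self-contained trace entry into a dict of its fields."""
    match = TRACE_PATTERN.match(line)
    return match.groupdict() if match else None

entry = parse_trace("2024-01-15 10:32:01 id=ABC123 Event_1 request received")
# entry["identifier"] == "ABC123", entry["event"] == "Event_1"
```

Any real deployment would adapt the pattern to the application’s actual trace format; the point is simply that each line yields a timestamp, an identifier, and an event.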
Figure 1. An example excerpt of traces from a log file, where each trace entry contains a timestamp, an identifier, an event name or type, and a descriptive status message
Within our sample log, events are labelled Event_1, Event_2, Event_3, Event_4, Event_5, and Event_6. Each of these events has a probability of occurrence within our sample log file, and trace entries are associated with each other through some feature such as an identifier (though not necessarily limited to just one feature relating one trace entry to another). Within this log file we have multiple traces occurring at different times, carrying different identifier features, and of course each trace may or may not record a different event type. Figure 1 shows a small excerpt, but going forward imagine that there are many more traces recorded in the same way.
The goal is to mine and learn application behaviors against which we can run comparisons. Typically we would mine particular periods of time, but for the sake of this example we will mine the entire log file. If we mine the sequence of events for particular identifiers within our example log file, we can pull out a learned model with probabilities. We aggregate all of the events that relate to each other through an identifier to create an event sequence tied together by that identifier feature. The identifier uniquely maps each event to one chain of behaviors and distinguishes it from every other chain of behaviors.
An example would be a session id for a web browsing session that separates one user’s set of events from another user’s. When aggregated, this model represents the likelihood that, given the occurrence of one event, a certain other event will follow it in time (see Figure 2 for an example aggregate model of all behavior or event sequences learned from a log file). This means that if we see Event_1 (represented as the number 1 on the graph in Figure 2), then there is a 100% chance that the next event in the behavior sequence will be Event_2, and so on. You can also see that there are start and stop nodes; these nodes represent the aggregate view of where a set of behaviors begins or ends.
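The aggregation step can be sketched in a few lines. This is a minimal illustration, assuming the per-identifier event sequences have already been extracted from the log: we count transitions between consecutive events (with START and STOP markers added) and normalize the counts into probabilities. The example sequences below are made up for demonstration:

```python
from collections import defaultdict

def learn_transitions(sequences):
    """Learn transition probabilities from per-identifier event sequences.

    `sequences` is a list of event lists, one per identifier (e.g. one per
    session id). START/STOP markers model where behaviors begin and end.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        chain = ["START"] + seq + ["STOP"]
        for current, nxt in zip(chain, chain[1:]):
            counts[current][nxt] += 1
    # Normalize each event's follower counts into probabilities.
    model = {}
    for event, followers in counts.items():
        total = sum(followers.values())
        model[event] = {nxt: n / total for nxt, n in followers.items()}
    return model

# Two illustrative sequences, each already grouped by its identifier.
sequences = [
    ["Event_1", "Event_2", "Event_3", "Event_4"],
    ["Event_1", "Event_2", "Event_3", "Event_5"],
]
model = learn_transitions(sequences)
# Event_2 always follows Event_1 in this toy data, so
# model["Event_1"] == {"Event_2": 1.0}
```

With more data, the edge weights converge toward the kind of transition probabilities shown in Figure 2.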
Furthermore, from the information in the model we can see that there are really only two event sequence types: they begin with either Event_1 or Event_2 (labelled “1” and “2” in Figure 2, each with an associated probability), and a set of behaviors starting with Event_1 will end with either Event_4 or Event_5 occurring. Likewise, if we are looking at Event_3 in a series, then there is a 5% chance the next event in the sequence will be Event_5, a 15% chance it will be Event_4, and an 80% chance it will be Event_3 again. What we have done is create an aggregate view of the behavior sequences present in a log file and represent them as a graph.
Figure 2. Directed graph with labels, weights, and start-stop nodes, where each node (circle) represents a respective event. The edges carry transition probabilities that represent the likelihood of a given event occurring after another given event in time.
Once we have a learned model of previous behaviors, we can test newly logged behaviors against it and determine the degree to which any one event or series of events deviates from the model. This means that an event sequence beginning with Event_1, Event_2, and then Event_4 is somewhat anomalous compared to what we have seen before. Essentially, without going into the mathematics involved, we can map a set of behaviors against this model to produce an intuitive metric between 0 and 1, along with a significance measure of how important a finding of anomalousness may be. As an example, take the top sequence of events, or “walk”, from Figure 3 (top of graphic) as a newly discovered trace sequence mined from a log file in-flight. To determine how anomalous this newly discovered sequence of events is compared to our past event sequences, we can decompose it into a graph with transition probabilities (Figure 3, bottom of graphic).
Figure 3. An example incoming sequence of events (top) and its decomposed graph representation (bottom)
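The decomposition itself is straightforward: an in-flight sequence becomes the list of its consecutive transitions, i.e. the edges of the graph at the bottom of Figure 3. A minimal sketch, again using START/STOP markers as an assumed convention:

```python
def decompose(sequence):
    """Decompose an in-flight event sequence into its transition edges,
    the graph form of the walk (START/STOP markers added)."""
    chain = ["START"] + sequence + ["STOP"]
    return list(zip(chain, chain[1:]))

edges = decompose(["Event_1", "Event_2", "Event_4"])
# [('START', 'Event_1'), ('Event_1', 'Event_2'),
#  ('Event_2', 'Event_4'), ('Event_4', 'STOP')]
```

Each edge can then be looked up in the learned model to retrieve its transition probability.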
We can then compare this chain of events (also known as a Markov chain) to all possible event sequences in our model and determine the degree of match. In this particular case, based on the information we have about the in-flight behavior pattern, we can say that it is approximately 28% anomalous. There is some mathematics involved in arriving at this number: it measures the overlap, distance, and correlation of the in-flight chain against all other known chains of events within the learned model. Essentially, we score the anomaly metric against the probability distribution of all of the event chains contained in the learned model of previously known behavior sequences. Providing a value between 0 and 100% gives an end user, algorithm, or analyst an easily interpreted grade.
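To make the scoring step concrete, here is one simple way to grade an in-flight chain against a learned transition model: the geometric mean of the transition probabilities along the walk, which stays between 0 and 1 regardless of sequence length. This is a sketch of the general idea only; the method described above also factors in overlap, distance, and correlation against all known chains, which are not reproduced here, and the example model below is made up:

```python
import math

def sequence_likelihood(model, sequence):
    """Score an event sequence against a learned transition model.

    Returns the geometric mean of the transition probabilities along the
    walk (between 0 and 1). An unseen transition scores 0 outright.
    """
    chain = ["START"] + sequence + ["STOP"]
    log_prob, steps = 0.0, 0
    for current, nxt in zip(chain, chain[1:]):
        p = model.get(current, {}).get(nxt, 0.0)
        if p == 0.0:
            return 0.0  # transition never observed in training
        log_prob += math.log(p)
        steps += 1
    return math.exp(log_prob / steps)

# A toy learned model for illustration.
model = {
    "START":   {"Event_1": 1.0},
    "Event_1": {"Event_2": 1.0},
    "Event_2": {"Event_3": 0.8, "Event_4": 0.2},
    "Event_3": {"STOP": 1.0},
    "Event_4": {"STOP": 1.0},
}
score = sequence_likelihood(model, ["Event_1", "Event_2", "Event_4"])
anomaly = 1.0 - score  # closer to 1 means more anomalous
```

Here the walk takes the rare Event_2 → Event_4 edge (probability 0.2), so the likelihood dips below 1 and the complementary anomaly grade rises accordingly.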
Additional statistical methodologies can be applied to determine the significance of the anomalousness grade as well. Combining the anomalousness metrics with a significance factor for our findings provides a confidence measure that is paramount in determining whether a series of events and activities should be considered a threat, an interesting lead to follow up on, or something to be ignored.
We have implemented a method just like this in our log content analytics framework, and it can be used both for discovering typical behaviors and for proactive monitoring of emerging in-flight event sequences to aid adjudication. If you are intrigued by anomaly detection and log content analytics, then give us a call over in the Data Insights group in the TechLabs and we can show you how it is done!