Many enterprise environments are awash in log files. Sifting through these numerous data sources to find errors and anomalies can be a daunting task. However, this rigor is critical to application debugging, anomaly detection, compliance, investigation, error tracking, operational intelligence, and root cause analysis, to name a few (see link for more information on these activities). Anomalies are those interesting tidbits in data that, when found, provide the electricity to the proverbial light bulb that hovers over our heads as we hunch over and sift through a deluge of data. In short, they help facilitate insights.
Some time ago a colleague of mine blogged about anomaly detection and why it is important (see link). Continuing along that thread, this blog series will provide a basis for how anomaly detection works in this first installment, and show how it can actually be performed, with a working example, in the second installment.
First, let me provide our motivation and background. Log content analytics (LCA) is the application of analytics and semantic technologies to (semi-)automatically consume and analyze heterogeneous computer-generated log files, discovering and extracting relevant insights in a rationalized and structured form that can enable a wide range of enterprise activities. The data in log files is often characterized by log traces with unique identifiers, timestamps, events, and actions. These attributes can be indicative of the underlying behaviors of applications. Through mining and correlation, relevant information contained within log files can be modeled using learning techniques.
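To make those attributes concrete, here is a minimal sketch of pulling identifiers, timestamps, events, and actions out of a log trace. The log line format, field names, and regular expression below are illustrative assumptions, not the format any particular application actually emits:

```python
import re

# Hypothetical log format: "2024-01-15T10:32:07 [txn-4821] checkout STARTED"
# The pattern and field names are assumptions for illustration only.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+"        # timestamp
    r"\[(?P<trace_id>[^\]]+)\]\s+"  # unique trace identifier
    r"(?P<event>\S+)\s+"            # event name
    r"(?P<action>\S+)"              # action/state
)

def parse_line(line):
    """Extract the structured attributes from one log line, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

record = parse_line("2024-01-15T10:32:07 [txn-4821] checkout STARTED")
print(record)
```

Once lines are rationalized into records like this, traces sharing a `trace_id` can be correlated into ordered event sequences suitable for the learning techniques described above.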
Our goal when creating our log content analytics framework (introduced at this link) was to provide analytics that extend beyond the capabilities of existing technologies by utilizing machine learning techniques to increase data literacy. To that end, it is important to provide a contextual and intuitive metric for the anomalous behaviors and patterns that exist within many application logs. We sought to extend the framework's capabilities with a methodology that detects abnormal behaviors and patterns in-flight, as they emerge, and proactively delivers a metric for how contextually anomalous a sequence of events is. We do this by comparing newly discovered information against the probability distribution of patterns present within an overall learned model of behaviors seen in the past. Simply put, if a pattern of events doesn't look like what we've seen before, then it is probably anomalous. Within our framework, we use machine learning to understand the behaviors of applications, create a model represented as a graph, and then apply concepts of graph theory to measure what is currently being logged against that previously learned model. Numerous models can be learned for different times of day, days of the week, weeks of the month, or times of the year. Adjusting for different time frames allows us to dynamically tune the sensitivity of our approach.
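To give a flavor of the idea, here is a minimal sketch of one way to score a new event sequence against a learned model of past behavior. It learns a simple transition graph (a first-order Markov chain) from historical traces and scores new sequences by their likelihood under it; this is an illustrative toy, not our framework's actual implementation, and all the event names are made up:

```python
from collections import defaultdict
import math

def learn_model(sequences):
    """Learn event-to-event transition probabilities from past traces."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
            for a, nexts in counts.items()}

def anomaly_score(model, sequence, floor=1e-6):
    """Mean negative log-likelihood of a sequence under the model.
    Higher scores mean the chain of events looks less like past behavior;
    unseen transitions get a small floor probability instead of zero."""
    transitions = list(zip(sequence, sequence[1:]))
    nll = sum(-math.log(model.get(a, {}).get(b, floor))
              for a, b in transitions)
    return nll / max(len(transitions), 1)

# Behavior seen in the past (hypothetical traces):
history = [["login", "query", "logout"]] * 50
model = learn_model(history)
print(anomaly_score(model, ["login", "query", "logout"]))      # low score
print(anomaly_score(model, ["login", "delete_all", "logout"])) # high score
```

Because the model is just a dictionary, separate models can cheaply be learned per time window (hour of day, day of week, and so on) and swapped in to adjust sensitivity, as described above.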
Now you have a basis for how we can perform anomaly detection by mining log files for process behavior. In the next installment I’ll go over how we treat log files as graph structures over which we can apply concepts of graph theory to pull out interesting insights. Additionally, I’ll walk through how we can utilize one such graph theory method to detect anomalies by looking at complex behavior chains. Don’t worry, I’ll save you from the complicated math details!