August 14, 2013
Part IV: An Application of Log Content Analytics
By: Colin Puri

If you read my previous three blog posts about log content analytics, you may have thought that I was done espousing my views. For better or worse, I have more to say. First of all, I lied about it being a three-part series. It is actually a four-part series now; you can sue me if you like, although I don’t own much, so that may be a futile exercise :)

Continuing the thread of my past three posts, you may be scratching your head and asking: what can you REALLY do that goes beyond vendor tools? In this part of the series I will show you three things. First, an example of a log content analytics application that we have created at Accenture Technology Labs. Second, how it fits into the vendor ecosystem. Third, how it builds on top of existing vendor solutions to create new functionality.

First off, why is additional insight needed? As the IT Operational Management field grows, so too does log file management. With that growth comes an increase in spending, because log file management costs rise in step with the size of the log files. As the cycle continues, enterprises must spend more money upgrading their infrastructure to accommodate the influx of information generated in log files. As these log files grow, it becomes more and more difficult to parse them, find errors, and track issues. This is particularly true when cross-log correlations come into play. With this in mind, we identified three areas ripe for enhancement:

- ingestion and parsing
- analysis and exploration
- visualization

Drawing on our domain knowledge and expertise, we chose to focus on analysis and exploration. Our goal was a vendor-agnostic way to extract information from log files that requires little input from the end user.

We developed an asset framework, which we dubbed “SMART Log AnalyzER” or “SMARTER” for short. The framework gives enterprises a systematic and effective mechanism to understand, analyze, and visualize heterogeneous log files in order to discover insights. When analyzing log files, we found that many of them are really transaction logs.

What does that mean? It means that the log files contain information we can use in three ways:

- to uniquely identify events
- to compute event probabilities and statistics
- to discover the temporal relationships between events

It also means we can mine them for information and pull out a graph that depicts behaviors. To do so, we created algorithms to discover temporal causality relationships. We had done this exercise again and again on specific data sets for specific clients. We therefore set out to create an asset that could be used in a pipeline on various data sets without customization. It also needed to work without worrying about the underlying vendor infrastructure, whether that sits in an on-site data center or in the cloud.
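To make the idea of "pulling out a graph that depicts behaviors" more concrete, here is a minimal sketch. It builds a first-order directly-follows graph from log entries that have already been grouped into time-ordered transactions, and it estimates the probability of each event following another. This is an illustrative assumption, not the SMARTER algorithm, whose internals this post does not describe. The function name `build_event_graph` and its input shape are hypothetical.

```python
from collections import Counter, defaultdict


def build_event_graph(traces):
    """Build a directly-follows graph from time-ordered event traces.

    Hypothetical helper. `traces` is a list of event-name lists, one per
    transaction, each already sorted by time. Returns (event_counts,
    edge_probs), where edge_probs[a][b] estimates P(next event is b | a).
    """
    event_counts = Counter()
    edge_counts = defaultdict(Counter)
    for trace in traces:
        event_counts.update(trace)
        # Each adjacent pair (a, b) is one observed "a then b" transition.
        for a, b in zip(trace, trace[1:]):
            edge_counts[a][b] += 1

    # Normalize each event's outgoing counts into probabilities.
    edge_probs = {}
    for a, successors in edge_counts.items():
        total = sum(successors.values())
        edge_probs[a] = {b: n / total for b, n in successors.items()}
    return event_counts, edge_probs


traces = [
    ["login", "search", "checkout"],
    ["login", "search", "logout"],
    ["login", "logout"],
]
counts, probs = build_event_graph(traces)
print(probs["search"])  # {'checkout': 0.5, 'logout': 0.5}
```

Events with no observed successor (here `checkout` and `logout`) have no outgoing edges. A real system would likely also track timing gaps between events, which this sketch ignores.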

The SMARTER pipeline is a multi-step process:

1. Log files are ingested into a vendor solution, where they are indexed.
2. The SMARTER application acts as a wrapper, or layer, above the vendor tool and extracts the log data into a vendor-agnostic format.
3. This vendor-agnostic data is piped into our algorithms. They treat the log files as transaction logs and mine the temporal causality relationships of trace events. Trace entries are linked together by discovering the unique identifier for a sequence of events.
4. Additional statistics are mined that let us predict the likelihood of one event following another in time.

This data can then be used to seed real-time analysis for anomaly detection and pattern recognition.
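The linking step can be sketched as follows, assuming a simple vendor-agnostic record: a timestamp, an event name, and a dictionary of fields. The sketch guesses the identifier with a heuristic. It picks the field whose values most often recur across several records, since such a field groups events into sequences instead of labeling every line uniquely. It then groups and time-orders the records by that field. `LogRecord`, `guess_trace_id_field`, and `link_traces` are hypothetical names, and the heuristic is a stand-in, not the identifier-discovery method SMARTER actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    """Hypothetical vendor-agnostic record exported from the vendor tool."""
    timestamp: float
    event: str
    fields: dict = field(default_factory=dict)


def guess_trace_id_field(records):
    """Heuristically pick the field most likely to link entries into traces.

    Scores each field by how many distinct values appear in at least two
    records. A constant field (one value everywhere) scores 1, a unique
    per-line id scores 0, and a session-like key scores highest.
    """
    values_by_field = defaultdict(list)
    for r in records:
        for name, value in r.fields.items():
            values_by_field[name].append(value)

    def score(name):
        counts = defaultdict(int)
        for v in values_by_field[name]:
            counts[v] += 1
        return sum(1 for n in counts.values() if n >= 2)

    return max(values_by_field, key=score)


def link_traces(records, id_field):
    """Group records by the identifier field and order each group by time."""
    groups = defaultdict(list)
    for r in records:
        if id_field in r.fields:
            groups[r.fields[id_field]].append(r)
    return {
        key: [r.event for r in sorted(group, key=lambda r: r.timestamp)]
        for key, group in groups.items()
    }


records = [
    LogRecord(1.0, "login", {"session": "s1", "host": "web01", "req": "r1"}),
    LogRecord(2.0, "login", {"session": "s2", "host": "web01", "req": "r2"}),
    LogRecord(3.0, "search", {"session": "s1", "host": "web01", "req": "r3"}),
    LogRecord(4.0, "logout", {"session": "s2", "host": "web01", "req": "r4"}),
    LogRecord(5.0, "checkout", {"session": "s1", "host": "web01", "req": "r5"}),
]
id_field = guess_trace_id_field(records)
traces = link_traces(records, id_field)
```

The resulting traces are plain event-name lists, so they could feed directly into a transition-probability step like the one sketched earlier.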

What is really fascinating is how we have been able to use data from large media companies to infer demographics, gender groups, political affiliations, and so on. The discovered data can in turn help marketing campaigns target their audience properly, maximizing profit while minimizing effort. Using the same algorithms, we have also detected anomalous patterns in data from logistics companies as packages and shipments move through their transportation systems. Catching these patterns allows problems to be corrected before they become costly misplaced shipments. Further exploration also demonstrated the tool’s ability to pull out anomalous network events from data transmission logs. An example of the output is shown in the screenshot in Fig 2.
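One common way to turn mined transition statistics into anomaly detection is to score each new trace by how improbable its transitions are. The sketch below does this and flags traces whose score exceeds a threshold. It assumes transition probabilities in the nested-dictionary form `edge_probs[a][b]`. The functions `score_trace` and `flag_anomalies`, the `floor` value, the threshold, and the shipment events are all hypothetical. The post does not say how SMARTER scores anomalies.

```python
import math


def score_trace(trace, edge_probs, floor=1e-6):
    """Average negative log-probability of a trace's transitions.

    edge_probs[a][b] estimates P(b follows a) from historical traces.
    Unseen transitions get a small `floor` probability, so they dominate
    the score instead of raising an error. Higher means more unusual.
    """
    transitions = list(zip(trace, trace[1:]))
    if not transitions:
        return 0.0
    total = 0.0
    for a, b in transitions:
        p = edge_probs.get(a, {}).get(b, 0.0)
        total += -math.log(max(p, floor))
    return total / len(transitions)


def flag_anomalies(traces, edge_probs, threshold):
    """Return the ids of traces whose score exceeds `threshold`."""
    return [tid for tid, trace in traces.items()
            if score_trace(trace, edge_probs) > threshold]


# Hypothetical shipment-tracking statistics and new traces.
edge_probs = {
    "pickup": {"sort": 1.0},
    "sort": {"ship": 0.9, "return": 0.1},
    "ship": {"deliver": 1.0},
}
shipments = {
    "pkg1": ["pickup", "sort", "ship", "deliver"],
    "pkg2": ["pickup", "sort", "deliver"],  # skipped the "ship" step
}
print(flag_anomalies(shipments, edge_probs, threshold=2.0))  # ['pkg2']
```

Averaging over transitions keeps long and short traces comparable. The threshold would need tuning on real data.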

Fig 2. Screenshot of Trace Sequence Analysis Output

To help end users find insight, we also developed filtering capabilities. They let users drill down and see what is really going on without all the noise of a large mined graph. Interactions between events within a log file are also clustered together as their relationships strengthen over time. Users can toggle events that their selections have filtered out back into view at any time. Clicking on an event in the graph within our tool explains why it was filtered or considered anomalous.
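Filtering that can be toggled and explained can be sketched by keeping the removed edges alongside a human-readable reason, instead of discarding them. The sketch below assumes edges are stored as nested dictionaries of probabilities and raw counts. It uses two illustrative cut-offs, a minimum probability and a minimum count. The function `filter_graph` and both thresholds are hypothetical, not the parameters the SMARTER tool exposes.

```python
def filter_graph(edge_probs, edge_counts, min_prob=0.05, min_count=2):
    """Split graph edges into kept and filtered, recording why.

    edge_probs[a][b] is the estimated P(b follows a), and edge_counts[a][b]
    is how many times that transition was observed. Returns (kept, filtered).
    `kept` maps (a, b) to its probability. `filtered` maps (a, b) to a
    reason string, so a UI could toggle the edge back and explain the decision.
    """
    kept, filtered = {}, {}
    for a, successors in edge_probs.items():
        for b, p in successors.items():
            n = edge_counts.get(a, {}).get(b, 0)
            reasons = []
            if p < min_prob:
                reasons.append(f"probability {p:.2f} < {min_prob}")
            if n < min_count:
                reasons.append(f"seen {n} time(s) < {min_count}")
            if reasons:
                filtered[(a, b)] = "; ".join(reasons)
            else:
                kept[(a, b)] = p
    return kept, filtered


edge_probs = {"a": {"b": 0.9, "c": 0.1}}
edge_counts = {"a": {"b": 9, "c": 1}}
kept, filtered = filter_graph(edge_probs, edge_counts, min_prob=0.2)
print(filtered)  # {('a', 'c'): 'probability 0.10 < 0.2; seen 1 time(s) < 2'}
```

Because filtered edges keep their reasons, "toggling" one back is just moving it from `filtered` to `kept`.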

Great! So how do you use it? The steps are simple:

  1. Select the input

  2. Select any parameters to filter out particular event views, a purely optional exercise

  3. View the output

  4. Gain insight!

It is as simple as that: it was designed to be a lightweight framework that extends the functionality of existing vendor tools! Our framework was also designed to run on copious amounts of data and to scale elastically. Are you interested in the concepts of log content analytics and how they can help you? Reach out to us on the Data Insights team at Accenture Technology Labs, and we can show you how log content analytics can give you more insight into your operations.
