There are tons and tons of log files that reside in an enterprise. The old adage, one man’s trash is another man’s treasure, especially holds true when it comes to log files – administrators may see a bunch of junk, but the data team sees a treasure trove of scrumptious data “noshables”. It is all about the context. In other words, if you want to gain insight, don’t throw it away!
So, how do you get insights from log files? Content analytics. In its simplest form, log content analytics is the science of making sense of computer-generated records. However, it is much more expansive than that: log content analytics is the application of analytics and semantic technologies to (semi-)automatically consume and analyze heterogeneous computer-generated log files, discovering and extracting relevant insights in a rationalized, structured form that can enable a wide range of enterprise activities.
What can log content analytics be used for? Simple: log content analytics enables the following:
Audit or Regulatory Compliance – Ensuring that applications, for example, adhere to the relevant laws or regulations that corporations or public agencies must follow.
Security Policy Compliance – The adherence of individuals and applications to the policies of a company that ensure protected access to assets.
Digital Forensic Investigation – The investigation of details and the tracking of the traces an application leaves as footprints of its operation.
Security Incident Response – Monitoring and responding to security violations that may surface in alert logs.
Operational Intelligence – Business analytics that deliver visibility and insight into business operations, often in real time.
Anomaly Detection – The detection of patterns in a given data set that do not conform to an established normal behavior.
Error Tracking – The detection of error messages and alerts.
Application Debugging – The process of debugging an application through the use of trace logs.
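To ground the last two items, here is a minimal error-tracking sketch in Python. The log layout ("timestamp level message") and the level names are assumptions for illustration, not taken from any particular product:

```python
# Minimal error-tracking sketch: scan log lines and collect error-level
# entries. The log format shown is a hypothetical example.
ERROR_LEVELS = {"ERROR", "FATAL"}

def track_errors(lines):
    """Return the lines whose severity level is in ERROR_LEVELS."""
    errors = []
    for line in lines:
        parts = line.split(maxsplit=2)  # timestamp, level, message
        if len(parts) >= 2 and parts[1] in ERROR_LEVELS:
            errors.append(line)
    return errors

sample = [
    "2013-05-01T10:00:01 INFO service started",
    "2013-05-01T10:00:02 ERROR disk quota exceeded",
    "2013-05-01T10:00:03 WARN retrying connection",
]
print(track_errors(sample))
```

A real tool would of course match far richer formats, but even a filter this small turns raw lines into a focused error feed.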
Understanding the definition of log content analytics is only one part of the puzzle of understanding what lies within log files. This leads to the next question: how is log content analytics generally performed today? In general, there are six basic steps in the process of extracting information from log files and putting it to use.
File selection and ingestion – During the selection and ingestion process, log files are selected and consumed, either manually or through an automated process via a vendor tool. It is at this stage that tools aggregate log files from many sources into a single point of access.
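As an illustration of this step, the following sketch aggregates the *.log files found under several directories into one time-ordered stream. The directory layout, file names, and timestamps are made-up examples:

```python
# Aggregation sketch: collect log files from several (hypothetical) source
# directories into a single, time-ordered stream of lines.
import tempfile
from pathlib import Path

def ingest(paths):
    """Read every *.log file under each path and merge their lines,
    sorted by the leading timestamp (assumes each line starts with one)."""
    lines = []
    for root in paths:
        for log_file in Path(root).glob("*.log"):
            lines.extend(log_file.read_text().splitlines())
    return sorted(lines)  # leading ISO timestamps sort chronologically

# Demonstrate with two temporary "sources".
src = Path(tempfile.mkdtemp())
(src / "web.log").write_text("2013-05-01T10:00:02 INFO request served\n")
(src / "db.log").write_text("2013-05-01T10:00:01 INFO query executed\n")
merged = ingest([src])
print(merged[0])
```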
Parsing and extraction – Log files are parsed and relevant features and values are extracted. This is the most critical stage because it enables all of the following steps; without it, storage, indexing, analysis, visualization, and publication could not take place.
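A parsing step can be as simple as a regular expression with named groups. The pattern below assumes the hypothetical "timestamp level message" layout used in this post's examples; production tools maintain whole libraries of such patterns:

```python
# Parsing sketch: extract timestamp, severity, and message from a line.
import re

LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)"
)

def parse_line(line):
    """Return a dict of extracted fields, or None if the line doesn't match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

record = parse_line("2013-05-01T10:00:02 ERROR disk quota exceeded")
print(record)
```

Each matched line becomes a structured record, which is exactly the form the storage and indexing phase needs.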
Storage and indexing – Once a log file is parsed and its features extracted, it is stored and its contents are made indexable for the purpose of searching and querying. This is the second most important phase, as it enables the analysis and exploration phase.
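To show what indexing buys you, here is a toy inverted index mapping each token to the set of record ids that contain it. Real solutions rely on search engines such as Lucene; this only sketches the idea:

```python
# Indexing sketch: build a tiny in-memory inverted index so that any
# token can be looked up without rescanning the raw records.
from collections import defaultdict

def build_index(records):
    index = defaultdict(set)
    for rec_id, message in enumerate(records):
        for token in message.lower().split():
            index[token].add(rec_id)
    return index

records = ["disk quota exceeded", "disk healthy", "connection refused"]
index = build_index(records)
print(sorted(index["disk"]))
```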
Analysis and exploration – This phase is where a user or administrator interacts with log files, generates queries, analyzes the results, and iterates until the desired information is discovered.
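The iterate-until-found loop of this phase can be mimicked with a small query helper over parsed records (the field names here are illustrative assumptions):

```python
# Exploration sketch: iteratively narrow a query over parsed records.
from collections import Counter

records = [
    {"level": "INFO",  "source": "web"},
    {"level": "ERROR", "source": "web"},
    {"level": "ERROR", "source": "db"},
]

def query(records, **criteria):
    """Return the records matching every field=value criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

# First pass: how do events break down by level?
by_level = Counter(r["level"] for r in records)
# Second pass: drill into the errors from a single source.
web_errors = query(records, level="ERROR", source="web")
print(by_level, len(web_errors))
```

Each query result suggests the next, narrower question, which is what makes this phase interactive rather than a one-shot report.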
Visualization – The visualization phase is where the data starts to come to life with bar charts, line graphs, and so on for later use in dashboards. This phase is important for properly communicating the information contained in log files in a succinct and impactful manner.
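As a stand-in for a real charting library, this sketch renders event counts as a text-only bar chart; it shows the counts-to-visual transformation this phase performs, nothing more:

```python
# Visualization sketch: a text-only bar chart of event counts per level.
def ascii_bar_chart(counts, width=20):
    """Render counts as labeled '#' bars scaled to the largest value."""
    peak = max(counts.values())
    lines = []
    for label, n in sorted(counts.items()):
        bar = "#" * max(1, round(n / peak * width))
        lines.append(f"{label:<6} {bar} {n}")
    return "\n".join(lines)

chart = ascii_bar_chart({"INFO": 40, "WARN": 10, "ERROR": 5})
print(chart)
```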
Publication and usage of results – The last stage, where individual visualizations are collected into dashboards to gain actionable insight, and where gathered information may be pushed to other destinations for consumption.
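Finally, pushing results to other destinations usually means serializing them into a portable format. This sketch packages a summary as JSON for a downstream consumer; the payload shape is a made-up example, not any standard:

```python
# Publication sketch: package analysis results as JSON so another system
# (a dashboard, an alerting service) can consume them.
import json

summary = {
    "source": "web",
    "window": "2013-05-01T10:00/11:00",
    "counts": {"INFO": 40, "WARN": 10, "ERROR": 5},
}
payload = json.dumps(summary, sort_keys=True)

# A consumer on the other end restores the structure unchanged.
restored = json.loads(payload)
print(restored["counts"]["ERROR"])
```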
Stay tuned for my second blog, where I'll share my thoughts on what makes a good vendor solution for log content management, and how these vendor solutions can be used as a starting point for log content analytics.