
April 28, 2014
The Right Big Data Technology for Smart Grid – Distributed Stream Computing
By: Smart Grid Big Data Management Team

Accenture Technology Labs’ Smart Grid team has been developing a big data analytics solution that helps utilities monitor, analyze, and predict power grid conditions and act on them more effectively. One of its key components is stream computing, a new computational paradigm that processes continuously generated data.

The most conspicuous features of future electric power grids are: 1) the large number of sensors deployed across the grid infrastructure; 2) the continuous stream of information fed back from these sensors (for example, a provincial utility grid in China can span up to 20 million meters; with data collected every 5-15 minutes, that grid alone generates about 1 terabyte a day); and 3) the need to process data in real time so that action can be taken in time. As a result, the amount of sensor data to be analyzed increases by several orders of magnitude compared with current systems, which poses considerable challenges.

Compared with the traditional batch processing approach, stream computing can process large amounts of data as it arrives, in real time. This makes stream computing extremely well suited to smart grid data processing applications.

There are several distributed stream computing platforms available, including S4, Storm, StreamBase, HStreaming, and others. For the purposes of our research, we explored Storm’s stream computing framework. As a free and open-source distributed real-time computation system, Storm makes it easy to reliably process unbounded streams of data in real time. Storm is widely used because it is scalable and fault-tolerant, can be used with any programming language, and is easy to set up and operate. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.

Some key abstractions of Storm are as follows:

Tuple – A named list of values, where each value can be of any type. For example, a “3-tuple” might be (1, “Storm”, 50.98).

Spout – A logical computing node that is the source of a stream in a computation. A spout reads tuples from an external source and emits them into the topology.

Bolt – A logical computing node that processes input streams and produces output streams. Bolts can run functions; filter, aggregate, or join data; or talk to databases.

Topology – A network of spouts and bolts that represents the overall computation. A Storm task topology is a directed acyclic graph composed of spout and bolt nodes (as in the following figure).


Figure 1: Storm topology.
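To make these abstractions concrete, here is a minimal Python sketch that simulates the spout/bolt/topology concepts. It deliberately does not use the real Storm API (Storm topologies are normally written in Java or via multi-language adapters); the class names and the linear bolt chain are simplifications for illustration only.

```python
from collections import deque

class Spout:
    """Source of a stream: emits tuples read from an external source."""
    def __init__(self, readings):
        self.readings = deque(readings)

    def next_tuple(self):
        return self.readings.popleft() if self.readings else None

class Bolt:
    """Processing node: consumes an input tuple, emits an output tuple (or None to filter)."""
    def __init__(self, fn):
        self.fn = fn

    def process(self, tup):
        return self.fn(tup)

def run_topology(spout, bolts):
    """Drive tuples from the spout through a simple linear chain of bolts."""
    results = []
    while (tup := spout.next_tuple()) is not None:
        for bolt in bolts:
            tup = bolt.process(tup)
            if tup is None:       # a bolt may filter a tuple out
                break
        else:
            results.append(tup)
    return results

# A "3-tuple" per the post: (meter_id, timestamp, voltage)
spout = Spout([(10089, "20140414112356312", 221.5),
               (10090, "20140414112356319", 222.1)])
keep_high = Bolt(lambda t: t if t[2] > 222.0 else None)  # filter bolt
tag = Bolt(lambda t: t + ("HIGH",))                      # function bolt
print(run_topology(spout, [keep_high, tag]))
# -> [(10090, '20140414112356319', 222.1, 'HIGH')]
```

In real Storm, the topology is a directed acyclic graph rather than a linear chain, and tuples flow between nodes across a cluster; the sketch only shows the roles the abstractions play.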

In the Technology Labs, we designed a Storm task topology to process this data in a timely and efficient way. In the design, power grid resources are divided into five levels according to the grid’s network structure. The first and lowest level refers to equipment such as transformers; levels 2 through 5 refer to higher-level concepts such as distribution feeders, substations, zones, and power companies. The data streams are ingested by a spout and grouped by the related equipment, then transmitted as tuples, such as {(10089, 20140414112356312, 221.5), (10090, 20140414112356319, 222.1)}, to the first-level processing bolts. The new data streams generated by each lower-level processing bolt are aggregated according to their parent containers and transmitted to the next-level processing bolt, and so forth.
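The level-by-level roll-up can be sketched as follows. This is a hypothetical illustration, not the production topology: the parent mapping, field names, and loads are made-up examples of how equipment-level tuples might be aggregated into their parent feeders.

```python
from collections import defaultdict

PARENT = {            # equipment id -> parent feeder (level 1 -> level 2); made-up mapping
    10089: "feeder_A",
    10090: "feeder_A",
    10091: "feeder_B",
}

def level1_bolt(tuples):
    """Group raw (equipment_id, timestamp, load_kw) tuples by equipment."""
    by_equipment = defaultdict(list)
    for eq_id, ts, load in tuples:
        by_equipment[eq_id].append(load)
    return by_equipment

def level2_bolt(by_equipment):
    """Aggregate each equipment's load into its parent feeder (container)."""
    by_feeder = defaultdict(float)
    for eq_id, loads in by_equipment.items():
        by_feeder[PARENT[eq_id]] += sum(loads)
    return dict(by_feeder)

stream = [(10089, "20140414112356312", 221.5),
          (10090, "20140414112356319", 222.1),
          (10091, "20140414112356321", 118.0)]
rollup = level2_bolt(level1_bolt(stream))
print(rollup)  # feeder_A aggregates both of its meters; feeder_B has one
```

The same pattern repeats at each level: feeder totals roll up to substations, substations to zones, and so on, with each bolt emitting a new, coarser-grained stream.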

With this design, not only can each computing node perform data analysis on particular grid equipment, but comprehensive grid-network analyses can also be realized by computing across the task topology. For example, we can identify a transformer’s condition in real time at the first level, analyze a distribution feeder’s load at the second level, and track the system peak load across the whole topology.

One successful application of the topology design described above is real-time tracking of the system peak load, saving a snapshot of the whole grid at that moment. This helps utilities analyze the load distribution and act on grid dispatching and operation optimization. Currently, data is collected every 3-5 seconds for SCADA (supervisory control and data acquisition, a system for gathering and analyzing real-time data to help electric utilities monitor and control the power grid) and every 15 minutes for AMI (advanced metering infrastructure, an architecture for automated communication between smart meters and a utility; AMI provides utilities with real-time data about power consumption and allows customers to make informed choices about energy usage based on the price at the time of use). The volume of data to be cached and processed for a regional distribution network could exceed 10 gigabytes, and could be even bigger for a provincial power grid. As the smart grid evolves, more and more data from newly installed sensors must be processed, and tracking the system peak load in real time with traditional methods is a big challenge. With our Storm task topology design, we can process the data of a regional distribution network using 10 servers in our test environment.
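The peak-tracking step itself is conceptually simple once the topology has produced an aggregated system-load stream. The sketch below shows the idea with illustrative field names and numbers (not actual SCADA/AMI data): keep a running maximum and capture the grid snapshot at the moment it is set.

```python
class PeakTracker:
    """Track the running system peak load and the grid snapshot at that moment."""
    def __init__(self):
        self.peak_load = float("-inf")
        self.peak_time = None
        self.snapshot = None   # grid-wide state captured at the peak moment

    def process(self, timestamp, system_load, grid_state):
        """Update the peak if this interval's system load exceeds it."""
        if system_load > self.peak_load:
            self.peak_load = system_load
            self.peak_time = timestamp
            # copy so later mutations of grid_state don't alter the saved snapshot
            self.snapshot = dict(grid_state)

tracker = PeakTracker()
tracker.process("11:00", 860.0, {"feeder_A": 520.0, "feeder_B": 340.0})
tracker.process("11:15", 910.0, {"feeder_A": 560.0, "feeder_B": 350.0})
tracker.process("11:30", 880.0, {"feeder_A": 540.0, "feeder_B": 340.0})
print(tracker.peak_time, tracker.peak_load)  # -> 11:15 910.0
print(tracker.snapshot)                      # -> {'feeder_A': 560.0, 'feeder_B': 350.0}
```

In the actual topology this logic would live in a top-level bolt; the hard part, which the sketch omits, is keeping the grid-wide snapshot consistent and affordable to cache at SCADA data rates.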

Figure 2 illustrates the distribution power grid’s power flow and the snapshots tracked at each peak load moment.


Figure 2: Distribution power flow.

Figure 3 compares the system load curve with the load curves of the sub-grids (the feeders). The top curve is the system load; the three curves below it are the loads of the sub-grids. It is easy to see that the middle sub-grid is the main contributor to the system peak load, so the grid operation dispatcher can take appropriate load diversion measures.


Figure 3: System load compared with the loads of the sub-grids.
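Identifying the main contributor, as read off Figure 3, can also be computed directly from the peak-moment snapshot. A small illustrative sketch (feeder names and loads are made up):

```python
def main_contributor(snapshot):
    """Return the (sub_grid, load) pair with the largest load in a peak snapshot."""
    return max(snapshot.items(), key=lambda kv: kv[1])

# Hypothetical snapshot at the system peak moment
peak_snapshot = {"feeder_left": 210.0, "feeder_middle": 460.0, "feeder_right": 240.0}
name, load = main_contributor(peak_snapshot)
print(name, load)  # -> feeder_middle 460.0
```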

As the Smart Grid moves forward, energy utilities are dealing with huge and still-growing volumes of data. Stream computing systems can analyze large numbers of data streams from multiple sources in real time, increasing the speed and accuracy of data handling and analysis for the Smart Grid.

We also want to emphasize that the distributed stream computing solution we developed for handling real-time Smart Grid data could easily be adapted to other industries to meet their real-time data processing requirements.
