Mining data in maritime vessel traffic
February 18, 2022
In 2018, there were almost 95,000 commercial transport vessels and 4 million commercial fishing vessels documented in the world’s oceans and waterways. These types of vessels account for a large portion of the world’s economic activity, transporting many of the products we buy, catching and/or transporting much of the food we eat, and in the process, posing significant challenges to the authorities—especially countries and international organizations—with the responsibility of keeping vessels safe and their activities efficient.
Since 2004, most commercial vessels have been required to participate in the worldwide Automatic Identification System (AIS) to improve the safety of maritime navigation. Within this system, each vessel broadcasts its location at varying intervals so that every vessel can see the location of other vessels broadcasting in the vicinity. Unsurprisingly, in this era of big data, the billions of AIS data points broadcast across the world each day are increasingly serving as input to modern analytics. These analytics help improve safety and efficiency, even as fleets continue to grow in size and activity.
At Accenture Federal Services, we have been developing geospatial analytics and Machine Learning (ML) techniques to mine geolocation data like AIS for patterns to help identify anomalies and suspicious behaviors. The effort began as an R&D program in which a customer challenged us to identify specific types of vessels engaged in specific types of activities. To do this, we developed the Trajectory Data Mining Framework (TDMF), a distributed application that operates on geolocation data.
In this post we describe TDMF and how we are using it to help with two particular challenges facing the maritime industry: collision risk prediction and detection of vessel spoofing.
Using TDMF, we can infer many interesting details about vessels from trajectory data alone. The system first runs a sequence of detectors to capture features for ML across a range of classes, including kinematic, temporal, and spatial. TDMF then classifies vessels by size (small, medium, or large), as our earliest experiments with trajectory data showed that vessel size is a key discriminator for many subsequent vessel analytics.
After size prediction, TDMF performs behavior modeling, an ML process that identifies the degree to which the behavior of every vessel in a region resembles that of a set of common vessel types. TDMF delivers the behavior prediction as a probability vector, with one entry per vessel type, providing a profile of how the vessel behaved. The vessel type with the highest probability can be used as a vessel type prediction, which is how we answered the original challenge posed to us by our customer: “find instances of these types of vessels.”
For example, Figure 1 depicts the fishing vessel behavior modeling findings from a region of approximately 600 square kilometers. The known fishing vessel trajectories are shown in green, and trajectories whose classified behavior most resembled fishing vessels (i.e., the predictions) are in red. Among the most encouraging features visible in that figure is the extent to which the algorithm appears to have learned from the back-and-forth patterns apparent in some of the known trajectories at sea.
Figure 1: Fishing vessel trajectory prediction. Known trajectories are green; predicted are in red. The filaments around the perimeter are waterways.
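To make the step from behavior profile to vessel type prediction concrete, here is a minimal sketch in Python. The vessel type labels and probabilities are made up for illustration, and `predict_vessel_type` is a hypothetical helper, not part of TDMF.

```python
# Hypothetical vessel type labels; the actual set used by TDMF is not
# given in this post.
VESSEL_TYPES = ["cargo", "tanker", "fishing", "tug", "passenger"]


def predict_vessel_type(behavior_profile, vessel_types=VESSEL_TYPES):
    """Return (vessel_type, probability) for the highest-probability entry.

    behavior_profile is a probability vector with one entry per vessel type.
    """
    if len(behavior_profile) != len(vessel_types):
        raise ValueError("profile must have one entry per vessel type")
    # Index of the largest probability in the profile.
    best = max(range(len(behavior_profile)), key=behavior_profile.__getitem__)
    return vessel_types[best], behavior_profile[best]


# Illustrative profile for a vessel that behaved mostly like a fishing vessel.
profile = [0.10, 0.05, 0.70, 0.10, 0.05]
print(predict_vessel_type(profile))
```

Keeping the full vector rather than only the top label preserves information: a vessel scoring 0.45 fishing and 0.40 tug is a very different case from one scoring 0.95 fishing.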
After making predictions about vessels based on features derived directly from each vessel’s AIS points, TDMF uses a form of indirect learning that derives features for ML from the nature of the activity in time and space across the region. Further, TDMF indirect learning is implemented to allow local learning, which uses localized features from small areas to learn about phenomena specific to each area. Indirect, local learning allows TDMF to make predictions about even tiny amounts of geolocation data, for example, providing predictions about a vessel at the location of a single satellite photograph.
Once TDMF direct and indirect learning are complete for a region, the application can package its detections and learning artifacts into a TDMF Observer component, which allows users to generate predictions in near real time.
Across each region, TDMF learns the associations between local activities and the conditions that have historically been associated with risk of collision in that specific area. This localized predictive capability is packaged into TDMF Observer, which can then warn in near real time when local conditions resemble those that have historically been associated with high risk of collision.
Figure 2a depicts areas within the Hormuz region where vessels came dangerously close to one another. Unsurprisingly, some of those areas are in and around ports where, for example, tugboats and tenders get close to other vessels on purpose; the areas of risk in the lower right of Figure 2a, for instance, are localized around the port at Fujairah Harbor. Other areas, however, are not near ports and might be cause for more concern. Above Fujairah, to the right off the top of the peninsula, another, small area of risk is shown in the open sea. Figure 2b then depicts the results of TDMF Observer’s real-time prediction of areas where there is increased risk of collision, showing results in the same areas of Figure 2a just described.
Figure 2a (left): Region-wide collision risk. Figure 2b (right): Real-time collision risk.
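This post does not say how "dangerously close" is defined, so the following is only a sketch of one simple way to flag close approaches: compute the great-circle (haversine) distance between every pair of vessels reporting at roughly the same instant and keep the pairs below a threshold. The 500-meter threshold, the function names, and the coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def close_approaches(positions, threshold_m=500.0):
    """Return (id_a, id_b, distance_m) for every pair closer than threshold_m.

    positions maps a vessel id to its (lat, lon) at roughly the same instant.
    threshold_m is an assumed value, not one taken from TDMF.
    """
    ids = sorted(positions)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = haversine_m(*positions[a], *positions[b])
            if d < threshold_m:
                pairs.append((a, b, d))
    return pairs


# Made-up positions: A and B are about 111 m apart, C is far away.
snapshot = {
    "A": (25.1700, 56.3700),
    "B": (25.1710, 56.3700),
    "C": (25.3000, 56.6000),
}
for a, b, d in close_approaches(snapshot):
    print(f"{a} and {b} are {d:.0f} m apart")
```

The pairwise loop is quadratic in the number of vessels, so a real system working at regional scale would first group reports into spatial cells and only compare vessels in the same or neighboring cells.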
Local learning enables TDMF to detect the different kinds of collision risk present at different locations. The conditions that contribute to increased risk of collision in, for example, port areas vs. areas at sea are different, and TDMF learns this automatically.
Across each region, TDMF automatically learns “normal” activity using a wide array of measures, including the types of vessels present, the times of day they are present, and the different activities in which they are engaged. With a detailed picture of normal, TDMF Observer is then able to identify abnormal, or anomalous, vessels and activities, including AIS spoofing. Spoofing occurs when illicit actors inject false AIS data that makes it appear as if a vessel is present at a time and place where it is not actually located. An increasing number of mysterious spoofing incidents have been reported in the media in recent years, and it is thought their purposes range from simply disrupting normal activity to making particular vessels more vulnerable to interference or attack.
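As a rough illustration of what learning a local “normal” can look like, the sketch below counts how often each combination of vessel type and time-of-day bucket has been seen in each grid cell, then scores a new observation by how rare that combination is in its cell. This is a deliberately simple frequency model under our own assumptions (the `LocalNormalModel` class, the 4-hour buckets, and the rarity score are all hypothetical); the measures TDMF actually uses are broader than this.

```python
from collections import Counter, defaultdict


class LocalNormalModel:
    """Per-cell frequency model of (vessel_type, time-of-day bucket) pairs."""

    def __init__(self, bucket_hours=4):
        self.bucket_hours = bucket_hours
        self._counts = defaultdict(Counter)  # cell -> Counter of pairs

    def observe(self, cell, vessel_type, hour_of_day):
        """Record one historical observation in the given cell."""
        bucket = hour_of_day // self.bucket_hours
        self._counts[cell][(vessel_type, bucket)] += 1

    def rarity(self, cell, vessel_type, hour_of_day):
        """Return a score in [0, 1]; 1.0 means never seen in this cell."""
        counts = self._counts.get(cell)
        if not counts:
            return 1.0  # no history for this cell at all
        total = sum(counts.values())
        seen = counts[(vessel_type, hour_of_day // self.bucket_hours)]
        return 1.0 - seen / total


# Made-up history for one cell: mostly cargo traffic in late morning.
model = LocalNormalModel()
for _ in range(9):
    model.observe((10, 20), "cargo", 10)
model.observe((10, 20), "fishing", 2)

print(model.rarity((10, 20), "cargo", 9))        # common combination, low score
print(model.rarity((10, 20), "coast_guard", 10)) # never seen here, high score
```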
TDMF identifies vessels as anomalous if they differ significantly from what TDMF has learned is normal in the local areas where the vessels have sailed. To test how well this approach would detect spoofing, spoofed AIS points were presented to TDMF Observer along with normal AIS points. Figure 3 depicts two spoofed voyages (black) in the Hormuz region, shown over a background of cargo vessel trajectories (green). The more compact, erratic black line on the right is an actual coast guard voyage taken from another region. The longer, straighter black line on the left is an actual cargo vessel voyage, also taken from another region. The cargo vessel voyage was explicitly chosen and placed so as to be nearly indistinguishable from normal Hormuz traffic, intentionally posing a more difficult spoofing detection challenge.
Figure 3: A spoofed cargo vessel voyage on the left; a spoofed coast guard vessel voyage on the right.
In the test, TDMF Observer was presented with eight hours of Hormuz AIS data it had not seen before. Both spoofed vessels were introduced into the AIS stream approximately 90 minutes into the test. Each of the spoofed vessels sailed along its fake course throughout the remainder of the test time. TDMF Observer first identified the coast guard vessel as anomalous approximately 1 hour into its fake voyage, after it had seen the vessel execute too many turns and sail too many times in opposition to prevailing traffic patterns. TDMF Observer identified the fake cargo vessel as anomalous 15 minutes later.
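One of the signals mentioned above, sailing in opposition to prevailing traffic, can be sketched as a comparison between a vessel's reported course and the typical course learned for the area it is in. The function names and the 135-degree cutoff for "opposing" are assumptions for illustration; the post does not describe how TDMF Observer measures this.

```python
def heading_difference(a_deg, b_deg):
    """Smallest absolute angle in degrees between two compass headings (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def counterflow_fraction(courses, prevailing, opposing_threshold_deg=135.0):
    """Fraction of reports in which the vessel sails against prevailing traffic.

    courses: the vessel's course over ground at each report, in degrees.
    prevailing: the prevailing course at each report's location, in degrees.
    opposing_threshold_deg is an assumed cutoff, not a TDMF parameter.
    """
    if len(courses) != len(prevailing):
        raise ValueError("courses and prevailing must have the same length")
    if not courses:
        return 0.0
    opposed = sum(
        1
        for course, flow in zip(courses, prevailing)
        if heading_difference(course, flow) >= opposing_threshold_deg
    )
    return opposed / len(courses)


# Made-up example: eastbound traffic lane, vessel reverses course twice.
vessel_courses = [90.0, 270.0, 265.0, 100.0]
lane_courses = [90.0, 90.0, 90.0, 90.0]
print(counterflow_fraction(vessel_courses, lane_courses))
```

Taking the difference modulo 360 matters: headings of 350 and 10 degrees are only 20 degrees apart, not 340.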
Figure 4: Snapshot from the TDMF Observer interface. Areas of increased collision risk are red dots; the trajectories of anomalous vessels are black.
TDMF direct and indirect learning aggregate data on a spatial scale of approximately 1 square kilometer and a time scale of 4 hours, though these can be generalized based on use case and available compute resources. The real power of the analytics learned from this data is apparent when you consider data at scale. In a large region with an area of nearly a quarter of a million square kilometers over a period of one year, TDMF generated behavior predictions and other analytics for more than 105,000 vessels, resulting in a pattern of life rich enough for any number of further analytics, including port modeling and specialized anomaly detection.
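To give a feel for aggregation at this scale, the sketch below maps each AIS report to a key made of an approximately 1-kilometer grid cell and a 4-hour time window, then counts reports per key. The flat-grid approximation, the constants, and the function names are our assumptions for illustration rather than TDMF's actual binning scheme.

```python
import math
from collections import Counter
from datetime import datetime, timezone

CELL_KM = 1.0            # approximate cell edge length
WINDOW_HOURS = 4         # time window length
KM_PER_DEG_LAT = 111.32  # approximate kilometers per degree of latitude


def bin_key(lat, lon, ts):
    """Map a report to (row, col, window) for ~1 km cells and 4-hour windows.

    ts must be a timezone-aware datetime. Longitude spacing is scaled by the
    cosine of the row's center latitude so that cells stay roughly square.
    """
    row = math.floor(lat * KM_PER_DEG_LAT / CELL_KM)
    row_center_lat = (row + 0.5) * CELL_KM / KM_PER_DEG_LAT
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(row_center_lat))
    col = math.floor(lon * km_per_deg_lon / CELL_KM)
    window = int(ts.timestamp() // (WINDOW_HOURS * 3600))
    return row, col, window


def count_reports(reports):
    """Count reports per (row, col, window); reports are (lat, lon, ts) tuples."""
    return Counter(bin_key(lat, lon, ts) for lat, lon, ts in reports)


# Made-up reports: two close together in the same window, one 8 hours later.
reports = [
    (26.4950, 56.4950, datetime(2022, 2, 18, 1, 0, tzinfo=timezone.utc)),
    (26.4951, 56.4951, datetime(2022, 2, 18, 3, 30, tzinfo=timezone.utc)),
    (26.4950, 56.4950, datetime(2022, 2, 18, 9, 0, tzinfo=timezone.utc)),
]
for key, n in count_reports(reports).items():
    print(key, n)
```

Changing `CELL_KM` or `WINDOW_HOURS` trades resolution against the number of bins that must be stored and learned from, which is the compute trade-off mentioned above.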
With more than 4 million vessels in oceans and waterways around the world, assessing collision risk and detecting vessel spoofing are challenges of enormous scale. By applying ML techniques to AIS geolocation data, TDMF, a distributed application, works to improve safety and efficiency in maritime navigation and to protect the products that support the global economy.