On the Edge: The Increasing Importance of Decentralized Analytics in IIoT
Learn how edge computing can solve the issues faced by IIoT companies that use data-driven methods to avoid unplanned equipment downtime.
Big data, data lakes, cloud computing. Well-recognized terms that have generated a lot of hype in recent years. Many industrial internet of things (IIoT) companies hope to recreate the wild success of the big data giants by using their own data to generate value. However, many of these efforts end up in the dreaded PowerPoint graveyard:
Through 2020, 80% of AI projects will remain alchemy, run by wizards whose talents won't scale in the organization [Gartner].
This is partly due to the unique challenges posed by remote industrial assets such as vessels and rigs.
Let’s talk about how moving the computation out of the cloud and to the edge can help meet these challenges and allow IIoT companies to make the most of the valuable data they own.
Unplanned downtime due to equipment failure is extremely costly. Traditionally, industrial companies have relied on the domain expertise of their operators to avoid unexpected equipment failures. However, given the complex nature of industrial assets, such failures can't always be prevented this way. Increasingly, companies are looking to data-driven methods of assessing equipment health and scheduling maintenance to avoid unplanned downtime.
Such data-driven methodology is well-trodden territory:
- Model Train: use machine learning to train a model on historical data to spot patterns that may indicate imminent equipment failure.
- Model Predict: deploy the trained model onto a live data stream to make predictions of equipment health, thus informing operators of what actions to take to avoid unplanned downtime.
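The two steps above can be sketched in Python. This is a deliberately minimal stand-in rather than a real failure model: it assumes a single numeric sensor channel (say, a vibration reading) and replaces the machine learning model with a z-score against a healthy baseline. The names `HealthModel`, `train` and `predict_stream`, the threshold of 3 standard deviations, and the sample readings are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Iterable, Iterator, Tuple


@dataclass
class HealthModel:
    """Hypothetical model: flags readings that stray far from a healthy baseline."""
    baseline_mean: float
    baseline_std: float
    z_threshold: float = 3.0  # hypothetical alert threshold, in standard deviations

    def predict(self, reading: float) -> bool:
        """Return True if the reading looks anomalous (a possible sign of failure)."""
        z = abs(reading - self.baseline_mean) / self.baseline_std
        return z > self.z_threshold


def train(historical: Iterable[float], z_threshold: float = 3.0) -> HealthModel:
    """Model Train: learn the healthy baseline from historical sensor readings."""
    data = list(historical)
    if len(data) < 2:
        raise ValueError("need at least two historical readings")
    std = stdev(data)
    if std == 0:
        raise ValueError("historical readings have zero variance")
    return HealthModel(mean(data), std, z_threshold)


def predict_stream(model: HealthModel, stream: Iterable[float]) -> Iterator[Tuple[float, bool]]:
    """Model Predict: score each live reading as it arrives."""
    for reading in stream:
        yield reading, model.predict(reading)


if __name__ == "__main__":
    history = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98]  # made-up healthy readings
    model = train(history)
    for reading, flagged in predict_stream(model, [1.01, 0.97, 2.5]):
        print(reading, "ALERT" if flagged else "ok")
```

Keeping training and prediction as separate functions mirrors the split described above: `train` can run wherever the historical data lives, while `predict_stream` only needs the fitted parameters and the live stream.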
The traditional environment for running both steps is a cloud-based data lake, a centralized solution. The data lake holds the historical data, and new data streams are marshalled into the same environment, where model predictions can be run.
THE CONNECTION CHALLENGE
However, these kinds of solutions are rarely successful in IIoT, which poses a unique set of challenges to scalability. There are often restrictions on getting raw data off a remote asset that is connected to the outside world via satellite. A continuous connection cannot be relied upon, and data streaming may also be restricted for cost and security reasons.
Let’s take a look at the specific challenges:
CHALLENGE 1: STRANDED DATA
Industrial assets generate large volumes of time series data. Streaming all of it off the asset is too expensive or too much of a security risk. As a result, raw data typically gets stored on the asset in a legacy environment, such as an industrial historian.
CHALLENGE 2: STRANDED INFRASTRUCTURE
Often, substantial time and money have been spent on building cloud-based data lakes, which is typically where the predictive model is trained. However, the cloud environment is cut off from the asset and the data it generates. The data doesn’t reach the data lake in a timely fashion, so the models have no fresh data to train on. When the behavior of the underlying data changes, as it does when components are swapped out or wear out, models trained on old data quickly grow stale and irrelevant.
CHALLENGE 3: STRANDED VALUE
The person who makes the decisions on an asset is an operator, such as the captain of a ship or an equipment operator on a rig. They are the end consumer of the analytics models and ultimately the person who will unlock the value of the data. Cloud-based applications, while of great value for offline tasks such as data exploration, are of limited use to this persona: they are physically on the asset and cannot use such applications to inform the decisions they make in real time.
The data is cut off from the infrastructure and the model. The model is cut off from the decision maker. No decisions on the asset are influenced. Ultimately, the company derives no value from the data it owns, however valuable that data is.
‘If the mountain does not go to Muhammad, Muhammad will go to the mountain’:
Edge computing involves decentralized data processing at the edge of the network, close to where the data is generated. It is increasingly discussed in an industrial context: Gartner forecasts that 75% of enterprise-generated data will be created and processed at the edge by 2022.
Traditionally in IIoT, edge computing has been used to stream data off the asset to a centralized cloud environment or a data historian. However, there is another use case: running the analytics models themselves on the edge agents, or edge analytics. By this, I mean running computations on the data as it streams through the edge agents.
The computations can take the form of aggregation, machine learning or model prediction. The edge agent is, at the end of the day, simply a piece of software that can run whatever computation is deployed onto it, as long as the underlying hardware is powerful enough.
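To make the idea of an agent that runs whatever is deployed onto it concrete, here is a minimal Python sketch. `EdgeAgent`, `deploy` and `process` are hypothetical names, not the API of any real edge platform; each record is assumed to be a dict of sensor values, and computations are plain functions applied to each record as it passes through.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# A computation takes one record (a dict of sensor values) and returns a result.
Computation = Callable[[Dict[str, Any]], Any]


class EdgeAgent:
    """Hypothetical edge agent: applies whatever computations are deployed to it."""

    def __init__(self) -> None:
        self._computations: Dict[str, Computation] = {}

    def deploy(self, name: str, computation: Computation) -> None:
        """Deploy a named computation, replacing any existing one with the same name."""
        self._computations[name] = computation

    def process(self, stream: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        """Run every deployed computation on each record as it streams through."""
        for record in stream:
            yield {name: fn(record) for name, fn in self._computations.items()}


if __name__ == "__main__":
    agent = EdgeAgent()
    agent.deploy("overheat", lambda r: r["temp_c"] > 90.0)      # simple rule
    agent.deploy("temp_f", lambda r: r["temp_c"] * 9 / 5 + 32)  # unit conversion
    for result in agent.process([{"temp_c": 80.0}, {"temp_c": 95.0}]):
        print(result)
```

The computations here are stateless. Stateful ones, such as aggregations or online models, can be deployed the same way by wrapping their state in an object or closure.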
Edge analytics provides three solutions to the connection challenge:
SOLUTION 1: CONNECTED DATA
The traditional use of edge agents is to stream the data to local data storage, which can be backed up in the cloud whenever feasible. Edge analytics, however, enables data to be aggregated on the edge, with only the aggregations streamed to a cloud environment. Important information is still streamed off the asset, but aggregations can use much less bandwidth than raw data.
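Edge aggregation can be sketched as a tumbling window: raw readings are buffered on the asset and only a small summary per window is streamed off. This assumes a single numeric channel and a count-based window; the choice of count, min, max and mean is illustrative, the function names are hypothetical, and JSON payload size is only a rough proxy for bandwidth.

```python
import json
from statistics import fmean
from typing import Dict, Iterable, Iterator, List


def _summarize(window: List[float]) -> Dict[str, float]:
    """Reduce a window of raw readings to a handful of summary statistics."""
    return {"count": len(window), "min": min(window), "max": max(window), "mean": fmean(window)}


def aggregate(stream: Iterable[float], window_size: int) -> Iterator[Dict[str, float]]:
    """Hypothetical helper: one summary per non-overlapping window of `window_size` readings."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    window: List[float] = []
    for reading in stream:
        window.append(reading)
        if len(window) == window_size:
            yield _summarize(window)
            window = []
    if window:  # flush the final, partial window
        yield _summarize(window)


if __name__ == "__main__":
    raw = [float(i % 10) for i in range(3600)]  # synthetic: e.g. one hour of 1 Hz readings
    summaries = list(aggregate(raw, window_size=60))  # one summary per minute
    raw_bytes = len(json.dumps(raw).encode())
    agg_bytes = len(json.dumps(summaries).encode())
    print(f"{len(raw)} raw readings ({raw_bytes} B) -> {len(summaries)} summaries ({agg_bytes} B)")
```

With one-minute windows over an hour of synthetic 1 Hz data, 3,600 readings shrink to 60 summaries. How much detail can safely be discarded depends on what the downstream models need.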
SOLUTION 2: CONNECTED INFRASTRUCTURE
Hardware solutions are increasingly capable of running complex models, such as online machine learning. The model training infrastructure can therefore run on the rig, next to the data at its point of origin. Data no longer needs to leave the rig to feed an advanced analytics solution. The model can update its parameters online as data streams through it, so it stays relevant to the most recent data.
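One simple form of online learning is an exponentially weighted estimate of a signal's mean and variance, updated with each new reading so that old behavior is gradually forgotten. The sketch below uses it as an anomaly detector that adapts after, for example, a component swap. It is a stand-in for the more complex online models mentioned above; `OnlineBaseline`, the learning rate `alpha` and the threshold are assumptions, and the data is synthetic.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnlineBaseline:
    """Exponentially weighted mean and variance, updated one reading at a time.

    Old readings are gradually forgotten, so the baseline tracks current behavior.
    """
    alpha: float = 0.05       # hypothetical learning rate: larger values forget faster
    z_threshold: float = 3.0  # hypothetical alert threshold, in standard deviations
    mean: Optional[float] = None
    var: float = 0.0

    def score(self, x: float) -> float:
        """Distance of x from the current baseline, in standard deviations."""
        if self.mean is None or self.var == 0.0:
            return 0.0  # not enough history to judge yet
        return abs(x - self.mean) / math.sqrt(self.var)

    def update(self, x: float) -> bool:
        """Score x against the current baseline, then learn from it.

        Returns True if x was anomalous relative to the baseline before the update.
        """
        anomalous = self.score(x) > self.z_threshold
        if self.mean is None:
            self.mean = x
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


if __name__ == "__main__":
    model = OnlineBaseline()
    for i in range(500):  # synthetic healthy regime around 1.0
        model.update(0.9 if i % 2 == 0 else 1.1)
    print("spike flagged:", model.update(5.0))
    for i in range(500):  # synthetic regime after a component swap, around 2.0
        model.update(1.9 if i % 2 == 0 else 2.1)
    print("baseline mean now:", round(model.mean, 2))
    print("2.05 flagged:", model.update(2.05))
```

Each reading is scored against the baseline before it is learned from, so a sudden spike is flagged, while a sustained shift in behavior is gradually absorbed into the baseline. A model trained once on the old regime would instead keep flagging every reading after the shift.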
SOLUTION 3: CONNECTED VALUE
Edge analytics allows the model predictions and supporting applications to run on the rig. The solution is co-located with the operator, enabling them to make better decisions at the right time.
CONNECTED DATA, CONNECTED INFRASTRUCTURE, CONNECTED VALUE
By decentralizing the computing infrastructure and bringing analytics to the edge, data, infrastructure, and value are connected together at the source. A complete solution is formed, and companies can start using their data to influence decisions and generate value.