We’ve all seen how successful internet giants have been at deploying artificial intelligence (AI). Now companies in many different sectors want to do the same. In this video, I’ll talk with Carola about the challenges many organizations in heavy-asset industries face when deploying AI in predictive maintenance. I’ll also let you know what you can do to be successful.
Below the video, you’ll get a summary of what we talked about.
What’s been my experience so far?
When I've gone into organizations, what I typically see is that they're at different stages of their journey to generating value from their data. A lot of effort has gone into pilots and proofs of concept (POCs). Perhaps they've hired a team of data scientists to build models from their data, and they might have had some success building models that make accurate and useful predictions. However, what I often see is that these models are still on the data scientists' laptops. They're not in front of the decision-makers, influencing their decisions and creating value.
Why is this happening?
On the technical side, the key issue is data accessibility. The data scientists don't have the data they need to build models that can scale to a level worth putting into production. Then there's data sparsity: many organizations in heavy-asset industries simply don't have the rich, varied datasets needed to build compelling models.
Data is generated on the assets, and these assets generate huge amounts of time series data. So why don't the data scientists actually have this data at their fingertips? It's because industrial assets weren't built with data science in mind. Data is captured and stored on the asset's equipment in legacy historian systems, because streaming data over satellite connections is expensive, unreliable, and can pose security risks.
What’s the solution?
I always ask this question before proposing solutions: are you in a position to tackle the data sparsity problem, or aren't you? Each situation calls for a different approach.
1. You aren’t able to tackle data sparsity
Your data scientists or engineers have been given a fairly incomplete dataset. However, you do have a lot of people who actually know the equipment you're trying to model. At Arundo, we do the modeling at the equipment level - on pumps, generators, compressors, heat exchangers, etc. These pieces of equipment might have around 5-40 sensors attached to them. Combine the two, and you can pair a physical understanding of the components with the data they generate.
My advice is to always start simple. Monitor the data coming off the assets. Stream the data, put it in an application, and put it in front of the people who know and monitor the equipment. Then you can start cleaning and structuring your data. Set up some simple threshold alerts, and maybe later some anomaly detection. Generate data, capture how the operator interacts with the application, and feed that into the next iteration.
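As a rough sketch of what that first step can look like, here is a minimal threshold-alert check in Python. The sensor names and limits are illustrative assumptions, not values from any real asset:

```python
# Minimal threshold-alert sketch. Sensor names and limits are
# illustrative assumptions, not from a real asset.

def check_thresholds(reading, limits):
    """Return alerts for any sensor outside its configured limits."""
    alerts = []
    for sensor, value in reading.items():
        low, high = limits.get(sensor, (float("-inf"), float("inf")))
        if value < low or value > high:
            alerts.append(f"{sensor}: {value} outside [{low}, {high}]")
    return alerts

# Hypothetical operating limits per sensor: (low, high)
limits = {
    "pump_discharge_pressure": (10.0, 50.0),
    "bearing_temperature": (0.0, 85.0),
}

reading = {"pump_discharge_pressure": 55.2, "bearing_temperature": 71.0}
print(check_thresholds(reading, limits))
# → ['pump_discharge_pressure: 55.2 outside [10.0, 50.0]']
```

The point is not the code itself but the loop it enables: alerts go in front of the operator, and their reactions tell you which thresholds are actually useful before you invest in anything fancier.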
Another thing you could potentially do is look at hybrid modeling techniques. There are two main ways to do advanced analytics:
- Theoretical analytical approach: Start with the underlying theory that describes your equipment and work towards the data.
- Purely data-driven approach: Take the dataset you have access to and use it to try to build a model describing the equipment.
If you don't have a lot of data, or your theoretical model doesn't hold up in real life, you can combine the two into a hybrid model: fill in the gaps of one approach with the other.
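To make the idea concrete, here is a toy hybrid model in Python. The "physics" is a made-up pump head curve and the history is invented data; it only illustrates the pattern of letting a data-driven correction fill the gap left by the theoretical model:

```python
# Toy hybrid model: a theoretical baseline plus a data-driven correction.
# The pump curve and historical data below are invented for illustration.

def physics_model(flow_rate):
    # Simplified theoretical pump head curve
    return 100.0 - 0.5 * flow_rate

# Historical observations: (flow_rate, measured_head)
history = [(10.0, 96.0), (20.0, 91.5), (30.0, 86.5)]

# Data-driven part: learn the average residual between theory and measurement
residuals = [measured - physics_model(flow) for flow, measured in history]
correction = sum(residuals) / len(residuals)

def hybrid_model(flow_rate):
    # Theory plus the learned correction
    return physics_model(flow_rate) + correction

print(round(hybrid_model(25.0), 2))  # → 88.83
```

In practice the correction would be a proper regression model rather than a constant offset, but the division of labor is the same: theory covers what you understand, data covers what you don't.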
2. You’re able to tackle data sparsity
If you're on the business or corporate IT side, you potentially have the ability to tackle the data sparsity problem. Even though this is difficult, there are some fairly simple things you can do to start generating value. As mentioned, the data is often stranded on the asset, out of reach of onshore systems. However, it isn't necessarily stranded from the people who need it, because your operator is often co-located with the data. So instead of thinking about this purely as a cloud-based model, you can move the computation to the data. This is edge computing: running the computation on the asset, where the data is generated.
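A minimal sketch of that edge-computing idea, with an invented sensor stream and limit: the rolling computation happens where the data is, and only small alert messages would ever need to leave the asset:

```python
# Edge-computing sketch: compute locally on the asset, transmit only alerts.
# The sensor stream, window size, and limit are illustrative assumptions.

from collections import deque

class EdgeMonitor:
    """Keeps a rolling window locally; emits an alert only on anomalies."""

    def __init__(self, window=5, limit=80.0):
        self.readings = deque(maxlen=window)
        self.limit = limit

    def ingest(self, value):
        self.readings.append(value)
        rolling_mean = sum(self.readings) / len(self.readings)
        if rolling_mean > self.limit:
            # Only this small message would leave the asset
            return {"alert": "rolling mean high", "mean": round(rolling_mean, 1)}
        return None  # nothing transmitted

monitor = EdgeMonitor(window=3, limit=80.0)
stream = [70.0, 75.0, 82.0, 90.0, 95.0]
alerts = [a for a in (monitor.ingest(v) for v in stream) if a]
print(alerts)
# → [{'alert': 'rolling mean high', 'mean': 82.3},
#    {'alert': 'rolling mean high', 'mean': 89.0}]
```

Compare the bandwidth: five raw readings in, two compact alerts out. Over satellite links, that trade-off is the whole argument for moving the computation to the data.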
Another thing you can do is make sure there’s a mechanism allowing a data scientist to put a model into an application and get that in front of the operator. The operator will then see and interact with the data and provide feedback to the data scientist. The data scientist can then adapt the model and put it back into the application. You’ll have a continuous cycle where the model is always in production.
What does success look like?
In my experience, the organizations that have been most successful are those that start small and simple. Instead of having grand ambitions of building data lakes, hiring large teams of data scientists and building complicated models, they start with a simple model: just thresholds and data streaming, putting the application in front of the decision-maker and getting their feedback to iterate. Once you've nailed the first use case and proved the ROI, you can expand to other use cases.