Alexandra discusses a typical process for data scientists building predictive equipment models.
Whenever I start a project, we start off by flying to the customer. We'll sit in a room with some subject matter experts, which could be equipment specialists, maintenance engineers, or project managers, and we'll bat around a few ideas: Where do you think we should build a predictive model? Where do you think you're losing money?
The equipment specialists have a real gut feel because they've been working on this equipment for decades, so you have to include them in the loop. Eventually, we'll decide we should build a model predicting dry gas seal failure for compressors because it's losing this oil company a lot of money. I'll say, "Okay, we're going to start with one compressor on one rig." Then I'll go home, sit at my desk, send some emails, and make a few phone calls, because the first thing I want is the sensor data.
Somebody will eventually send me a list of all of the sensors on rig X, the rig we decided to start with. I'll look at this list and see that there are 50,000 sensors. How do I know which ones sit on the compressor? There are certainly not 50,000 sensors on one compressor. Then somebody will send me these diagrams. They're very, very confusing if you haven't seen them before. They're still confusing, and I have seen them before.
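The narrowing-down step she describes can be sketched in a few lines. The tag names, the equipment code "KA2701", and the naming convention below are all invented for illustration; real P&ID tag schemes vary by operator, and in practice the mapping often has to be confirmed against the diagrams by hand.

```python
# Hypothetical sketch: narrowing a rig's full sensor list down to one compressor.
# Assumption: compressor-mounted sensors carry the equipment code in their tag.
sensor_tags = [
    "20PT1017",       # pressure transmitter on an unrelated system
    "23KA2701-PT01",  # pressure sensor on compressor KA2701
    "23KA2701-TT02",  # temperature sensor on compressor KA2701
    "23KA2701-VT03",  # vibration sensor on compressor KA2701
    "18FT0042",       # flow transmitter elsewhere on the rig
]

compressor_code = "KA2701"  # assumed equipment code taken from the P&ID
compressor_tags = [t for t in sensor_tags if compressor_code in t]
print(compressor_tags)
# ['23KA2701-PT01', '23KA2701-TT02', '23KA2701-VT03']
```

The filtering itself is trivial; the expensive part is exactly what she describes, which is finding out what the tag convention is and which equipment code to look for.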
They'll tell me, "Okay, this is a process and instrumentation diagram. It describes how pieces of equipment relate to each other and how sensors sit on the equipment. You see all those little circles with text in them? Those are sensors. You have to go through and find the ones you need for this analysis." Finally, I'll sit with somebody who is an expert, and maybe it takes a day to figure out everything we need, and then we'll have the right sensor data. It's maybe 50 sensors.
Then I want to start evaluating patterns. How can I predict this dry gas seal failure? I want to know when these failures occurred so that I can overlay my sensor data with them and start to see if patterns emerge. The thing is, like I said, I live in Norway, so this is all in Norwegian, or a dialect of Norwegian, or in Swedish or Danish, and there are thousands of these notifications relating to this equipment and to the rig.
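The overlay step she mentions, lining up known failure times against the sensor history, can be sketched as labeling every reading that falls inside a lookback window before a failure. The timestamps, values, and the 48-hour window below are invented for illustration; the source doesn't say what window was used.

```python
# Minimal sketch: mark sensor readings that fall within a lookback window
# before a known failure as "pre-failure", so patterns can be inspected.
from datetime import datetime, timedelta

# Assumed failure time, recovered from the maintenance notifications
failures = [datetime(2017, 3, 14, 9, 0)]
window = timedelta(hours=48)  # illustrative lookback window

readings = [  # (timestamp, sensor value), all invented
    (datetime(2017, 3, 10, 12, 0), 101.2),
    (datetime(2017, 3, 13, 6, 0), 143.7),
    (datetime(2017, 3, 14, 8, 0), 188.9),
]

labeled = [
    (ts, value, any(f - window <= ts < f for f in failures))
    for ts, value in readings
]
for ts, value, pre_failure in labeled:
    print(ts, value, pre_failure)
```

With labels like these in place, one can eyeball the pre-failure readings for trends, or feed them to a supervised model once there are enough failure events to learn from.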
I'll have to fly to Trondheim and sit with somebody who can help me find the two or three failures that were actually related to this specific use case. Then I want to know, "Okay, how do the people in the control room make the decision to fix it? How did they know it was a failure?"
I might have to fly to Bergen and speak to the engineers in the control room, and they'll explain that process to me. Finally, I want to know how much money I'm going to save this customer if we build this model. I want to know: Did they have to shut down the rig? When they fixed it, how long did they have to shut it down? How much time did it take to find the new parts? Then I'll make a phone call to rig X, and they'll tell me that maybe they had to shut down the rig for a week because it took that long to get the parts.
We chose dry gas seal failure because the business value is huge. A week of downtime for an oil rig is a huge amount of money, millions and millions of dollars. But the thing is, for a compressor with a lifetime of 20 years, this dry gas seal failure occurs maybe once, maybe twice on one compressor. That's not very high statistics. It's hard to build a successful supervised learning model with one incident to learn from. You start to see that performance improves as you get more and more data.
The thing is, it takes a really long time to get that data. When we did this for this oil company, we looked at five compressors across five rigs, one piece of equipment per rig, and that took two months just to get the data. This oil company has 50 assets, and if they really want to do data science at scale, employ machine learning at a high level, and do it for 10 critical pieces of equipment, that would take them 15 years just to find the data, clean it, and figure out how it all joins together. I think it goes without saying that nobody has time for that.
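The "15 years" figure can be checked with a back-of-the-envelope calculation using only the numbers from the story; the extrapolation assumes the per-unit effort stays constant, and her quoted figure is clearly a round number.

```python
# Back-of-the-envelope check of the scaling claim, using only numbers
# given in the story.
months_for_pilot = 2   # data gathering for one equipment type on five rigs
units_in_pilot = 5     # five compressors

assets = 50            # assets the company operates
equipment_types = 10   # critical equipment types per asset

months_per_unit = months_for_pilot / units_in_pilot  # 0.4 months each
total_units = assets * equipment_types               # 500 equipment units
total_years = total_units * months_per_unit / 12
print(round(total_years, 1))  # ~16.7 years, the same order as the quoted "15 years"
```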