
Organizing Data For Industrial Data Science

In this video, Alexandra shares the methodology behind tools we are building to structure and link data from different sources within asset-heavy industries.


You should connect data along two axes for each asset. The first one is the equipment hierarchy: which sensors are on which equipment, and which events are related to each piece of equipment. That way, you don't have to sift through all of these complicated diagrams, lists of sensors, and thousands of notifications; everything is already organized this way. Once you've done this for one asset, you need to do it for all of your assets so that you can really employ data science at scale.
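As a rough illustration of the equipment hierarchy described here, the minimal Python sketch below attaches sensors and events to equipment nodes. The `Equipment` class, its fields, and every tag and ID are hypothetical and chosen only for this example; they are not the structure used by the tools in the video.

```python
from dataclasses import dataclass, field


@dataclass
class Equipment:
    """One node in a hypothetical equipment hierarchy."""
    tag: str
    kind: str
    sensors: list = field(default_factory=list)   # sensor tags mounted on this node
    events: list = field(default_factory=list)    # event IDs recorded against this node
    children: list = field(default_factory=list)  # sub-equipment

    def walk(self):
        """Yield this node and every node below it, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def sensors_under(node):
    """Collect every sensor tag on a node and on all of its sub-equipment."""
    return [sensor for n in node.walk() for sensor in n.sensors]


# Illustrative asset: all tags and IDs below are made up.
seal = Equipment("K-101-SEAL", "dry gas seal",
                 sensors=["PT-1012"], events=["NOTIF-0007"])
compressor = Equipment("K-101", "compressor",
                       sensors=["TT-1010", "VT-1011"], children=[seal])
rig = Equipment("RIG-A", "rig", children=[compressor])

compressor_sensors = sensors_under(compressor)
print(compressor_sensors)
```

With a structure like this, asking for everything on a compressor becomes a traversal rather than a search through diagrams and sensor lists.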

Once you do this, if I'm a data scientist, I want to be able to say, "Give me all the data that you have in your portfolio": across all of your 50 rigs, I want all of the data related to all of your compressors, specifically for dry gas seal failure. In order to do that, you have to label the events. These are what I think the steps to fully scalable data science in industry should be. The first one is identifying the locations of sensors and the relationships between them.
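To make the portfolio-wide request concrete, here is a small Python sketch of that kind of query over labeled events. The record layout, field names, and sample values are assumptions made for illustration only.

```python
# Hypothetical labeled event records; every field name and value is illustrative.
events = [
    {"asset": "RIG-01", "equipment_type": "compressor", "equipment": "K-101",
     "failure_mode": "dry gas seal failure", "event_id": "N-1"},
    {"asset": "RIG-01", "equipment_type": "pump", "equipment": "P-201",
     "failure_mode": "bearing failure", "event_id": "N-2"},
    {"asset": "RIG-02", "equipment_type": "compressor", "equipment": "K-101",
     "failure_mode": "dry gas seal failure", "event_id": "N-3"},
    {"asset": "RIG-02", "equipment_type": "compressor", "equipment": "K-102",
     "failure_mode": "surge", "event_id": "N-4"},
]


def select_events(records, equipment_type, failure_mode):
    """Return every event, on any asset, matching an equipment type and failure mode."""
    return [
        r for r in records
        if r["equipment_type"] == equipment_type
        and r["failure_mode"] == failure_mode
    ]


seal_failures = select_events(events, "compressor", "dry gas seal failure")
print([(r["asset"], r["event_id"]) for r in seal_failures])
```

The query only works because each event already carries an equipment type and a failure mode label, which is why labeling comes first.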

One step could be P&ID mining: taking these complicated diagrams and figuring out a way to automatically parse them, so that you can say which sensors lie on which equipment and how pieces of equipment relate to each other. You also want to be able to ask, "What is the equipment that's upstream or downstream from the compressor?", because that could give an indication of failure beyond what's available on the compressor itself. Then, you also want to map all of the sensors to the hierarchy.
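The upstream and downstream question can be sketched as a graph traversal, assuming the process connections have already been extracted from the P&IDs. The Python example below does not parse any diagrams; the edge list and equipment tags are hypothetical.

```python
from collections import deque

# Directed process-flow edges (from -> to), assumed to be already extracted
# from a P&ID. All tags are made up for this example.
flows = [("V-100", "E-100"), ("E-100", "K-101"),
         ("K-101", "E-102"), ("E-102", "V-103")]


def build_adjacency(edges, reverse=False):
    """Map each node to its direct successors (or predecessors if reverse=True)."""
    adjacency = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        adjacency.setdefault(src, []).append(dst)
    return adjacency


def reachable(start, adjacency):
    """Breadth-first search: every node reachable from start, nearest first."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


downstream = reachable("K-101", build_adjacency(flows))
upstream = reachable("K-101", build_adjacency(flows, reverse=True))
print("upstream:", upstream, "downstream:", downstream)
```

Sensors on the equipment returned in `upstream` or `downstream` can then be pulled in as extra inputs alongside the compressor's own sensors.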

Next, you want to focus on labeling your event data. There are three types of event data: failure notifications, work orders, and downtime. All of them should be organized by location and specific failure mode so that you can start building supervised models. Once you've labeled your sensors and your events and you have an indication of where they occur on the equipment, you can join everything together in one cohesive landscape, like I showed on the slide before.
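One way the join between sensors and labeled events might feed a supervised model is sketched below in Python. The one-day lead window, the tuple layouts, and all tags and values are assumptions for illustration; the video does not specify how training labels are derived.

```python
from datetime import datetime, timedelta

# Hypothetical sensor readings: (equipment tag, sensor tag, timestamp, value).
readings = [
    ("K-101", "PT-1012", datetime(2020, 1, 1, 0), 5.1),
    ("K-101", "PT-1012", datetime(2020, 1, 2, 0), 5.0),
    ("K-101", "PT-1012", datetime(2020, 1, 3, 0), 3.2),
    ("K-201", "PT-2012", datetime(2020, 1, 3, 0), 5.2),
]

# Hypothetical labeled failures: (equipment tag, failure mode, timestamp).
failures = [
    ("K-101", "dry gas seal failure", datetime(2020, 1, 3, 12)),
]


def label_readings(readings, failures, failure_mode, lead=timedelta(days=1)):
    """Label a reading 1 if a matching failure occurs on the same equipment
    within `lead` after the reading, else 0. The lead window is an assumption."""
    rows = []
    for equipment, sensor, ts, value in readings:
        label = int(any(
            eq == equipment
            and mode == failure_mode
            and timedelta(0) <= failed_at - ts <= lead
            for eq, mode, failed_at in failures
        ))
        rows.append((equipment, sensor, ts, value, label))
    return rows


labeled = label_readings(readings, failures, "dry gas seal failure")
labels = [row[-1] for row in labeled]
print(labels)
```

The join key here is the equipment tag, which is exactly what organizing both sensors and events by location provides.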