Creating Business Value from Unstructured Data

Consider an oil platform operating in the North Sea. This platform generates data from complex control systems, personnel and resource planning systems, and various computers and devices—in addition to tens of thousands of individual sensors attached to equipment, meters, and gauges.

Meanwhile, important data related to the platform's operations, such as production losses, failure notifications, and maintenance records, is generated at a steady rate and stored in systems that vary from asset to asset. Long-term trends toward near-ubiquitous sensors, cheaper data storage, and cheaper compute resources drive an ever-growing blizzard of operational data from industrial assets such as this platform.

However, the vast majority of data generated each day from, about, or related to this platform will go unused and unnoticed.

Accessing and analyzing this data, and using it to drive daily business decisions, is the fundamental challenge and promise underlying the Industrial Internet of Things and the big data revolution. It is the future of machine learning and smart industrial operations.

This data varies widely in format. Some of it is highly structured. Sensor data, for instance, consists of time-stamped measurements of physical values such as temperature, pressure, vibration, or flow rate. Sensor records may be generated at high frequency and may need to be understood in the context of other sensors, but once found and contextualized, the information in any individual sensor record is relatively easy to parse.
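To illustrate why an individual sensor record is easy to parse once it has been found, here is a minimal Python sketch. It assumes a hypothetical comma-separated layout of sensor ID, ISO-8601 timestamp, value, and unit; real historians and control systems each use their own formats, and the tag `PT-1042` is invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record layout (an assumption, not a real system's format):
# "<sensor_id>,<ISO-8601 timestamp>,<value>,<unit>"
@dataclass(frozen=True)
class SensorReading:
    sensor_id: str
    timestamp: datetime
    value: float
    unit: str

def parse_reading(line: str) -> SensorReading:
    """Parse one comma-separated sensor record into a typed reading."""
    sensor_id, ts, value, unit = line.strip().split(",")
    return SensorReading(sensor_id, datetime.fromisoformat(ts), float(value), unit)

# Example: a pressure reading from a hypothetical transmitter tag.
reading = parse_reading("PT-1042,2019-03-01T12:00:05,87.3,bar")
print(reading.sensor_id, reading.value, reading.unit)
```

Each record carries everything needed to interpret it on its own; the hard part is finding the right sensors and placing them in context.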

Often, however, critical information about an industrial operation is much less structured than sensor data, yet it needs to be joined with sensor data to yield valuable insight. For instance, on our oil platform, we might want to investigate whether the different failure modes experienced by compressors could be predicted in advance from patterns in sensor readings. To do this, we would have to sift through process diagrams and sensor hierarchies to find all of the sensors related to compressors. We would then need to review thousands of historical maintenance log entries, written by people in free-form natural language (often in several languages!), to find which failures commonly occurred on compressors, and when.
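The join described above can be sketched as two steps: look up the sensors attached to an asset, then pull the readings from a window before each failure. In this minimal Python sketch the asset hierarchy, sensor tags, readings, and two-hour window are all hypothetical; in practice the hierarchy would come from process diagrams and sensor hierarchies, and the failure times from the maintenance logs.

```python
from datetime import datetime, timedelta

# Hypothetical asset hierarchy: maps each asset to the sensors attached to it.
SENSOR_HIERARCHY = {
    "compressor-A": ["PT-1042", "VT-2210"],
    "pump-B": ["FT-3301"],
}

# Hypothetical readings as (sensor_id, timestamp, value) tuples.
READINGS = [
    ("PT-1042", datetime(2019, 3, 1, 10, 0), 85.0),
    ("PT-1042", datetime(2019, 3, 1, 11, 30), 91.5),
    ("VT-2210", datetime(2019, 3, 1, 11, 45), 4.2),
    ("FT-3301", datetime(2019, 3, 1, 11, 50), 120.0),
    ("PT-1042", datetime(2019, 3, 1, 13, 0), 80.1),
]

def readings_before_failure(asset, failure_time, window=timedelta(hours=2)):
    """Return readings from the asset's sensors within the window preceding a failure."""
    sensors = set(SENSOR_HIERARCHY.get(asset, []))
    start = failure_time - window
    return [
        r for r in READINGS
        if r[0] in sensors and start <= r[1] < failure_time
    ]

# A failure event, as it might be extracted from a maintenance log entry.
failure = ("compressor-A", datetime(2019, 3, 1, 12, 0))
for sensor_id, ts, value in readings_before_failure(*failure):
    print(sensor_id, ts.isoformat(), value)
```

The sensor lookup is only as good as the hierarchy, and the failure timestamp is only as good as the log entry, which is why both the diagrams and the maintenance records have to be reviewed.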

This is not easy. For instance, one human operator might record a maintenance log entry that states: “the compressor leaked a light brown fluid that smells like eggs.” Another operator might enter: “maple syrup was leaking out of the compressor and it smelled like breakfast.” Human subject matter experts understand that these two entries refer to similar types of compressor failure. However, it is extraordinarily time-intensive, expensive, and error-prone to manually review and categorize years of unstructured maintenance data for any specific piece of equipment—much less all of the equipment across an entire fleet of platforms.

Getting input data into a state that a machine learning model can properly understand can take significant time and effort; many data scientists report spending as much as 80% of their time finding, cleansing, and joining the data required to build advanced analytics models.

Data integration is a fundamental barrier to data science adoption in heavy industry, costing not only data science and data engineering effort but also calendar time. With a manual approach, it could take two months to access and blend the data needed to analyze a single type of equipment across five sites. At that rate, analyzing ten types of equipment across fifty sites is 100 times that scope, or about 200 months of work: over fifteen years!
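The fifteen-year figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes, as the estimate itself does, that effort scales linearly with the number of equipment types and the number of five-site groups.

```python
# Figures from the text: 2 months to blend data for 1 equipment type across 5 sites.
MONTHS_PER_BATCH = 2
SITES_PER_BATCH = 5

def manual_integration_years(equipment_types: int, sites: int) -> float:
    """Estimate calendar years of manual integration, assuming linear scaling."""
    batches = equipment_types * (sites / SITES_PER_BATCH)  # one batch = 1 type x 5 sites
    return batches * MONTHS_PER_BATCH / 12

print(round(manual_integration_years(10, 50), 1))  # 10 types x 10 site groups x 2 months
```

The result is roughly 16.7 years, consistent with "over fifteen years".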

One solution to this challenge is to train machine learning models, through human-augmented review of topic clusters, to classify maintenance entries by failure mode automatically, querying human experts only for the rare exceptions. This approach greatly accelerates the data integration process. It also brings the ultimate goal within much closer reach: a model that ingests datasets from sensors, failure notifications, and maintenance logs on an ongoing basis and returns processed datasets and critical information about equipment performance and potential downtime.
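One way to picture this classify-or-escalate loop is the minimal Python sketch below. It assumes experts have already reviewed topic clusters and assigned failure-mode labels to a few seed entries; the failure-mode names and all seed entries other than the two quoted earlier are hypothetical. It also swaps in a simple bag-of-words nearest-centroid classifier for whatever model a production system would use, and routes any entry whose best similarity falls below a hypothetical threshold to a human expert.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; a real system would also handle synonyms, spelling, and languages."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical seed labels, as experts might assign them after reviewing topic clusters.
LABELED = {
    "seal leak": [
        "the compressor leaked a light brown fluid that smells like eggs",
        "maple syrup was leaking out of the compressor and it smelled like breakfast",
    ],
    "bearing wear": [
        "loud grinding noise from compressor bearing",
        "high vibration and grinding at the bearing housing",
    ],
}

# One bag-of-words centroid per failure mode.
CENTROIDS = {
    mode: Counter(t for text in texts for t in tokens(text))
    for mode, texts in LABELED.items()
}

def classify(entry, threshold=0.3):
    """Return (failure_mode, score), or ("needs expert review", score) below the threshold."""
    vec = Counter(tokens(entry))
    mode, score = max(((m, cosine(vec, c)) for m, c in CENTROIDS.items()), key=lambda x: x[1])
    return (mode, score) if score >= threshold else ("needs expert review", score)

for entry in [
    "brown fluid leaking from compressor seal",
    "grinding noise at the bearing",
    "control panel display flickering",
]:
    mode, score = classify(entry)
    print(f"{entry!r}: {mode} ({score:.2f})")
```

With these seeds, the first entry is assigned to seal leak, the second to bearing wear, and the third, which shares no words with any labelled entry, is flagged for expert review. Plain word overlap cannot tell that "maple syrup" and "light brown fluid" describe the same failure; that link comes from the human labels, which is why expert review stays in the loop.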

Automated labelling of unstructured event data allows machines to adapt to human processes. This ultimately enables humans to make better, more informed business decisions. This is the future of heavy industry.

Written by Alexandra Gunderson and Ellie Dobson
