Getting Started with Machine Learning for Compressors

Downtime is often a significant cost and source of revenue loss for operations requiring gas compression. Learn how to get started with machine learning to reduce downtime.

If you work with natural gas, chemicals, or other industrial processes, you likely deal with compressor systems. These systems tend to be complex, fit-for-purpose, and expensive. Because of their role in maintaining phase states, a compressor malfunction can affect many parts of an operation. Downtime, whether unplanned (failures or disruptions) or planned (scheduled maintenance, equipment movement, upgrades, and so forth), is often a significant cost and source of revenue loss for operations requiring compression.

For industrial companies that want to take advantage of machine learning (ML) algorithms and artificial intelligence (AI) applications for critical equipment, compressors are often a natural place to begin. In this article, we’ll explain how to get started.


Machine learning refers to computer programs that are trained to identify relationships, patterns, and classifications based on data, rather than taking advantage of specific underlying knowledge of how a system works.

Mastering complexity with machine learning algorithms

Machine learning applications can handle a high level of complexity. They can process high volumes of data and consider many inputs at the same time. In addition, technologies such as Computer Vision (CV) and Natural Language Processing (NLP) extend the range of usable inputs to images and text. This allows machine learning applications to learn the complex interactions that exist in highly engineered systems.

A machine learning algorithm is trained on data specific to a system to learn its characteristics and use them to make predictions or categorizations. We can train algorithms on available sensor data to capture how the system’s conditions, e.g., pressure and temperature, relate to compressor performance or its characteristic states. By contrast, a compressor simulation program might use formulae based on gas mix and thermodynamic conditions to estimate compressor performance. 
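To make the idea of learning from sensor data concrete, here is a minimal sketch under stated assumptions. It uses a k-nearest-neighbours estimate, chosen only for illustration, to predict a performance figure from past readings with no thermodynamic formula involved. The sensor names, the values, and the `knn_predict` helper are all hypothetical.

```python
import math


def fit_scaler(rows):
    """Return per-feature means and standard deviations of the history."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid division by zero
    return means, stds


def scale(row, means, stds):
    """Standardize one reading so pressure and temperature are comparable."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_predict(history, targets, query, k=3):
    """Average the targets of the k historical readings closest to the query."""
    means, stds = fit_scaler(history)
    scaled_history = [scale(r, means, stds) for r in history]
    scaled_query = scale(query, means, stds)
    ranked = sorted(
        range(len(history)),
        key=lambda i: math.dist(scaled_history[i], scaled_query),
    )
    nearest = ranked[:k]
    return sum(targets[i] for i in nearest) / k


# Hypothetical history: (suction pressure [bar], suction temperature [degC])
history = [(10.0, 20.0), (10.5, 22.0), (11.0, 25.0),
           (12.0, 30.0), (12.5, 33.0), (13.0, 35.0)]
# Hypothetical observed efficiency for each historical reading
efficiency = [0.80, 0.79, 0.78, 0.75, 0.74, 0.72]

estimate = knn_predict(history, efficiency, (10.4, 21.5), k=3)
print(f"Estimated efficiency: {estimate:.3f}")
```

The estimate is only as good as the coverage of the history: a query far from any past reading would still return an average of the "nearest" points, which is one reason the operating range of the training data matters.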

The main difference, then, is that a simulation program applies predetermined knowledge to the system, while a machine learning program learns from data collected on that system (or similar ones). This makes the latter adaptable to the specifics of each individual system or asset, and easier to apply efficiently at scale. Moreover, machine learning programs can capture phenomena for which no physical model can readily be constructed.

Better insights through combining machine learning and traditional simulations

Often, a better understanding of the overall system is possible by combining a machine learning approach with parameters from traditional simulations. For example, statistical learning can be used to estimate a simulation's physical parameters from operating data. However, machine learning programs don’t necessarily need to know why a system operates in a certain way. They simply analyze what has actually happened in the past to better understand what’s happening now or what may happen in the future.
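As one possible illustration of estimating a physical parameter from data, the sketch below fits a polytropic exponent n from pressure and temperature readings, using the standard polytropic relation T2/T1 = (P2/P1)^((n-1)/n). The article does not name a specific parameter, so the choice of n, the `fit_polytropic_exponent` helper, and the synthetic readings are all assumptions made for the example.

```python
import math


def fit_polytropic_exponent(readings):
    """Estimate n from (p_suction, p_discharge, t_suction_K, t_discharge_K).

    Taking logs gives ln(T2/T1) = m * ln(P2/P1) with m = (n - 1) / n,
    so m is the slope of a least-squares line through the origin.
    Pressures must be absolute and temperatures in kelvin.
    """
    sxy = 0.0
    sxx = 0.0
    for p1, p2, t1, t2 in readings:
        x = math.log(p2 / p1)
        y = math.log(t2 / t1)
        sxy += x * y
        sxx += x * x
    m = sxy / sxx
    return 1.0 / (1.0 - m)


# Synthetic, noise-free readings generated with a known exponent,
# so the fit can be checked against the value used to create them.
true_n = 1.3
readings = []
for p1, p2, t1 in [(10.0, 25.0, 293.0), (11.0, 30.0, 298.0), (12.0, 28.0, 303.0)]:
    t2 = t1 * (p2 / p1) ** ((true_n - 1.0) / true_n)
    readings.append((p1, p2, t1, t2))

n_hat = fit_polytropic_exponent(readings)
print(f"Fitted polytropic exponent: {n_hat:.4f}")
```

With real sensor data the readings would be noisy, and a drift in the fitted value over time could itself be a signal worth monitoring.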

For compressor reliability and maintenance, it is often of primary interest to understand whether specific patterns in operating data, such as pressure, temperature, flow, and vibration, indicate undesired operational modes, especially future critical failures. Ideally, you would like to know about specific failure modes, such as valve failure, lubrication system failure, dry gas seal failure, corrosion, compressor surge, and so forth, before they occur. Moreover, you would also like to know where in the system the failure will happen.

The challenge of insufficient datasets for compressors

One common challenge is that machine learning algorithms need data to learn from, but data about failures is typically scarce. Industrial compressors are highly engineered systems designed not to fail, so a historical dataset collected from compressor systems is unlikely to show many episodes of failure, especially across the whole universe of different failure modes. 

In addition, sensor-collected data has historically been used for real-time asset monitoring and failure alerts (alarms) rather than to construct databases of failures. Marking sensor-collected data with evidence of failures, an activity called data labeling, is key to improving the performance of machine learning algorithms, but has not traditionally been done in industrial settings. This lack of historical examples makes it challenging to apply standard approaches that predict future incidents from past patterns.

How to mitigate the lack of historical data on compressors

Sometimes, it is possible to mitigate this problem. Possible workarounds are to focus on recent failures for which back-labeling is still possible, or on frequent failures that constitute recurring pain points for the compressor system. This is generally done with the help of subject matter experts, who provide in-depth interpretation of the data.
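Back-labeling can be sketched as follows: given failure times confirmed by a subject matter expert, every sensor reading that falls within a chosen lead window before a failure is marked as a positive example. The `back_label` helper, the 12-hour window, and the timestamps are hypothetical; in practice the window length would be chosen with the experts.

```python
from datetime import datetime, timedelta


def back_label(timestamps, failure_times, lead_window):
    """Return 1 for readings within lead_window before a failure, else 0."""
    labels = []
    for ts in timestamps:
        is_pre_failure = any(
            failure - lead_window <= ts <= failure for failure in failure_times
        )
        labels.append(1 if is_pre_failure else 0)
    return labels


# Hypothetical sensor timestamps, one reading every 6 hours over two days
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(0, 48, 6)]

# Hypothetical failure time, as confirmed by a subject matter expert
failure_times = [datetime(2024, 1, 2, 6)]

labels = back_label(timestamps, failure_times, timedelta(hours=12))
print(labels)
```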

In short, standard approaches that predict future incidents from historical failure data remain applicable, but their performance may suffer in the early phases of a project because labeled historical data is scarce.


Modern industrial monitoring systems allow operators to have real-time evaluation and alarming of critical parameters. More advanced systems may allow increasingly sophisticated monitoring options, for example, presenting derived values such as efficiency or identifying temporal trends in operational data. 

How to identify critical events before they occur

However, such monitoring typically relies on hard-coded alerting thresholds and may still require significant expertise and analysis time to relate alerts to real problems needing action. Adding a well-designed machine learning layer to this workflow can both increase sophistication and decrease the time needed to identify mission-critical events when, or even before, they occur. In addition, machine learning applications can improve over time, issuing fewer false alarms in the long run, which mitigates alarm fatigue.

Such a layer may contain one or both of the following approaches based on learnings from historical observations combined with an operator’s expert domain knowledge:

  • Interpretable generic multi-sensor anomaly detection: Despite having few failure examples, a historian will contain plenty of sensor data from regular operations. An approach can therefore be developed to monitor all (or a subset of) compressor sensors simultaneously and alert an operator when the input is outside the regular operational domain based on historical observations. This approach is sometimes called anomaly or outlier detection, and uses algorithms that learn the patterns of regular operations without the need for data representing fault modes. Preference should be given to interpretable anomaly detection algorithms, which by their nature make it possible to connect an alert to the sensors contributing to it, thus guiding the operator to where in the compression system a problem might be occurring. Feedback from the operator can also be collected and used to construct a dataset of labeled suspected failures. This data collection process serves the double purpose of improving the “generic anomaly detection” algorithm and laying the foundations for training fault-specific machine learning algorithms.
  • Fleet learning: Compressors are well-understood physical systems, and information on how their failure modes manifest is often available even if a specific compressor hasn’t experienced that failure in its own history. This can be leveraged with a technique called fleet learning, in which an algorithm learns from data collected from multiple compressors of the same type. The machine learning algorithm can then “apply” those learnings to new compressors being monitored, even though they did not appear in the initial dataset, as long as they are of the same kind and their operating conditions are represented in the data. This technique effectively increases the amount of available data and makes it possible to operate efficiently at scale. One caveat is that the algorithm will not perform satisfactorily if the asset to which it is applied is of a different type, or operates in very different conditions, than those seen during training. In this case, the model must be re-trained with data representing the new scenario to capture its specifics.
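The two approaches above can be sketched together under simple assumptions: a baseline of regular operation is learned from data pooled across several compressors of the same type, and a new reading is scored with per-sensor z-scores so that any alert names the sensors behind it. This is an illustrative stand-in rather than a specific production algorithm; the sensor names, compressor names, values, and threshold are hypothetical.

```python
import math


def fit_baseline(normal_rows):
    """Learn each sensor's mean and standard deviation from normal operation."""
    baseline = {}
    for sensor in normal_rows[0]:
        values = [row[sensor] for row in normal_rows]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        baseline[sensor] = (mean, math.sqrt(var) or 1.0)  # avoid zero std
    return baseline


def score(row, baseline, threshold=3.0):
    """Flag a reading and list the sensors that deviate beyond the threshold."""
    z_scores = {
        sensor: abs(row[sensor] - mean) / std
        for sensor, (mean, std) in baseline.items()
    }
    contributors = sorted(
        (s for s in z_scores if z_scores[s] > threshold),
        key=lambda s: -z_scores[s],
    )
    return {
        "is_anomaly": bool(contributors),
        "contributors": contributors,  # most deviating sensor first
        "z_scores": z_scores,
    }


# Hypothetical normal-operation history from two compressors of the same type
fleet_history = {
    "compressor_A": [
        {"discharge_pressure_bar": 30.0, "vibration_mm_s": 2.0},
        {"discharge_pressure_bar": 30.5, "vibration_mm_s": 2.2},
        {"discharge_pressure_bar": 29.5, "vibration_mm_s": 1.8},
    ],
    "compressor_B": [
        {"discharge_pressure_bar": 30.2, "vibration_mm_s": 2.1},
        {"discharge_pressure_bar": 29.8, "vibration_mm_s": 1.9},
        {"discharge_pressure_bar": 30.0, "vibration_mm_s": 2.0},
    ],
}

# Fleet learning in its simplest form: pool the rows before fitting
pooled = [row for rows in fleet_history.values() for row in rows]
baseline = fit_baseline(pooled)

# Reading from a third compressor that was not in the training data
new_reading = {"discharge_pressure_bar": 30.1, "vibration_mm_s": 4.5}
result = score(new_reading, baseline)
print(result["is_anomaly"], result["contributors"])
```

Per-sensor z-scores are easy to interpret but ignore interactions between sensors, so a real application would likely need a multivariate method that still attributes alerts to individual sensors. Pooling is also only valid under the caveat stated above: the new compressor must be of the same type and operate within the conditions seen in the fleet data.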

Reducing wear and improving visibility for natural gas compressor systems

In our experience, a well-designed machine learning application improves monitoring of a compressor system and provides new insight and visibility into it. In one of our industrial applications, we studied a multi-stage compressor system for natural gas. The system operators aimed to reduce the maintenance cost of the system by receiving alerts of potential faults before their impact became too large.

The system was well-equipped with sensors, but the collected data did not present any evidence of critical failures or other (non-critical) faults. In addition, a basic alert system was in place to highlight if parameters such as the final output pressure were outside the expected range. In such a scenario, a machine learning application that complements the existing alert system can be developed. 

In this particular application, we trained an algorithm to recognize the operational modes of the system, which led to the insight that the system was sometimes left for too long in a state that led to excessive strain. The algorithm, enriched with the business logic from the observation, was put into production to alert the operators when the same conditions were observed again. Thanks to this new alert, the operators can now ensure the system does not remain in this state unnecessarily, thus reducing its wear.
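The alerting logic described here can be sketched as a dwell-time check on top of the recognized operational modes. The sketch assumes the mode labels are already produced by a trained model; the mode name `high_strain`, the 30-minute limit, and the `dwell_alerts` helper are hypothetical and do not describe the actual production system.

```python
def dwell_alerts(samples, watched_mode, max_dwell_minutes):
    """Return the times at which the system has stayed too long in a mode.

    samples: list of (minute, mode) tuples in time order.
    At most one alert is raised per continuous stay in the watched mode.
    """
    alerts = []
    entered_at = None
    alerted = False
    for minute, mode in samples:
        if mode == watched_mode:
            if entered_at is None:
                entered_at = minute  # start of a new stay in the mode
                alerted = False
            if not alerted and minute - entered_at >= max_dwell_minutes:
                alerts.append(minute)
                alerted = True
        else:
            entered_at = None  # leaving the mode resets the timer
    return alerts


# Hypothetical mode labels, one every 10 minutes
samples = [
    (0, "normal"), (10, "high_strain"), (20, "high_strain"),
    (30, "high_strain"), (40, "high_strain"), (50, "normal"),
    (60, "high_strain"), (70, "normal"),
]

alerts = dwell_alerts(samples, "high_strain", max_dwell_minutes=30)
print(alerts)
```

The first stay in the watched mode lasts long enough to trigger one alert, while the short second stay does not.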


Compression systems are an excellent place to start for implementing artificial intelligence (AI) and machine learning (ML) solutions for asset monitoring. First, their criticality in maintaining operations makes downtime expensive and undesirable. Second, their physics and failure modes are well known, and they are generally well-monitored assets.

Building a context-aware machine learning application on top of existing monitoring solutions can significantly streamline problem diagnosis while adding sophistication, speed, accuracy, and scale. The lack of failure data in the lifetime of a given compressor may appear to be a barrier, but well-designed anomaly detection approaches can provide significant value from the beginning.

Furthermore, as solutions are scaled across a fleet of comparable compression systems, information can be learned across that fleet as a whole, further increasing the value machine learning approaches can provide.
