Machine Learning for Industrial Use Cases: Hard-Won Lessons

In this article, we'll go through some of the lessons we learned while providing data and machine learning solutions to customers across different industrial use cases.

With the advent of the "Big Data" era and the subsequent explosion of interest in applying data science techniques to all manner of problems, it is no surprise that heavy industry has been exploring these areas in recent years. The promise of these techniques is increased efficiency, reduced non-productive downtime, and improved safety. Identifying which opportunities offer the best combination of feasibility and value is still the most important question industrial companies face. However, how to implement machine learning successfully in an industrial context should be a close second.

By their nature, industrial companies are very different from the tech companies that have embraced data science and machine learning in recent years. While industrials aren’t software companies at their core, there are promising opportunities for solving valuable problems with machine learning and related techniques. These opportunities are enabled in part by the current abundance of data in industrial settings. Sensors on equipment are now ubiquitous, and with the right software their data can be streamed to the cloud, used on premises, or even processed at the edge.

Industrial use cases present both unique opportunities and unique problems for applying data science and machine learning techniques. Industrial environments often combine physical systems, high monetary risk (or reward), physical safety considerations, and, by necessity, a conservative operating environment. Together, these mean that the approach often has to differ from the typical application of machine learning.

At Arundo, we have worked with many customers to provide them with data and machine learning powered solutions for a very wide range of industrial use cases. Out of this experience, we have learned several lessons on how to approach industrial use cases.


Data scientists and other machine learning practitioners live in a world where the best-performing model is typically seen as the ultimate goal. This is reflected in the high visibility of machine learning competitions and published papers that push state-of-the-art performance. While it’s widely understood that "Kaggle isn't the real world" (i.e., these competitions do not fully reflect real-world problems), it’s not as well understood what real-world solutions should consist of.

For industrial use cases, we’ve recognized a few common scenarios:

  1. Training data may lack scenarios that are known to be of interest.
  2. Events of interest exist in the training data, but are poorly or inadequately labeled.
  3. Certain scenarios of interest aren’t in the training data, but are well understood from a physics point of view.

These scenarios present several challenges for a purely data-driven solution, including machine learning.

Gathering more (and higher quality) data is a universally good strategy for improving underperforming models. Unfortunately, adequate data isn’t always available for physical systems, or worse, you may not realize that you lack it. This is a major reason why it’s important for machine learning practitioners to work closely with subject matter experts.


For physical systems, events of interest, such as failure modes, are often well known, if not well understood, by the operators and engineers. Taking advantage of the existing knowledge of subject matter experts (SMEs) can round out the predictive system where a machine learning model would be inadequate on its own. SME knowledge can help the data scientist identify which additional data to gather, which other features should be used or engineered, and which hand-coded rules can act as guard rails for the system. Just because your water data set doesn’t contain data below 0°C doesn’t mean you should be surprised by "unexpected" behavior when the system goes below freezing.
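To make the idea of hand-coded guard rails concrete, here is a minimal Python sketch built on the water example. It is an illustration under stated assumptions, not a description of any particular production system: `guarded_predict`, `GuardedPrediction`, `toy_model`, and the temperature bounds are all hypothetical names and values introduced for this example. The wrapper checks an SME-supplied physical rule first, then checks whether the input falls inside the range the model was trained on, and only then returns the model's output.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuardedPrediction:
    value: Optional[float]  # model output, or None when the model is not trusted
    trusted: bool           # True only if every guard rail passed
    reason: str             # human-readable explanation for operators


def guarded_predict(
    model: Callable[[float], float],
    temperature_c: float,
    train_min_c: float,
    train_max_c: float,
) -> GuardedPrediction:
    """Wrap a model with an SME rule and a training-range check."""
    # Guard rail 1 (assumed SME rule): the water data set has no samples
    # at or below freezing, and the physics changes there.
    if temperature_c <= 0.0:
        return GuardedPrediction(None, False, "at or below freezing: SME rule applies")

    # Guard rail 2: refuse to extrapolate beyond the training data.
    if not (train_min_c <= temperature_c <= train_max_c):
        return GuardedPrediction(None, False, "outside training range")

    return GuardedPrediction(model(temperature_c), True, "within training range")


def toy_model(temperature_c: float) -> float:
    # Stand-in for a trained model; the linear form is purely illustrative.
    return 2.0 * temperature_c + 1.0


if __name__ == "__main__":
    for t in (20.0, -3.0, 90.0):
        print(t, guarded_predict(toy_model, t, train_min_c=5.0, train_max_c=80.0))
```

Returning an explicit `trusted` flag and `reason`, rather than silently clipping or substituting a value, is one possible design choice: it leaves the decision about what to do outside the model's valid range to the operators and engineers who understand the system.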

In our experience, the best machine learning systems for physical industrial processes combine the data with the knowledge of the on-the-ground experts. This allows the machine learning practitioner to handle edge cases, improve the core predictive system, and ultimately improve the business outcome that the solution is being built for.