Machine Learning in Industrial Applications: Research to Reality

In this article you'll learn some of the problems you might face when trying to apply machine learning in industrial applications.

There is an emerging trend of using machine learning (ML) in industrial applications. However, moving a solution from a research and development (R&D) setting to actual use in industry is not without hiccups. In this first article in our series, Transitioning from R&D to Reality, we'll go through some of the problems you might face when trying to apply ML in industrial applications.


Easy accessibility of data science tools (e.g., Keras, TensorFlow, Scikit-learn, AutoML), infrastructure and low-cost computation have changed the landscape of machine learning in the last decade. The trend of sharing data and software has made the field of machine learning more accessible, reducing entry barriers that might have previously existed due to lack of sufficient data or irreproducible models. This trend has also spread to academia. Machine learning conferences such as ICML, ICLR, and NeurIPS have started allowing data and code to be uploaded during submission so that reviewers can consider these supplemental materials. Anonymized peer reviews for submissions are also posted online.

Thus, we see an emerging trend where machine learning algorithms that had earlier been restricted to academia are increasingly being applied to use cases in industry.


Fortunately, easy access to data science tools has drawn a great deal of attention, talent, and funding to the field. Unfortunately, much of this has happened without sufficient attention to model limitations, leading to false promises, unrealistic expectations, and possible failures in delivery.

Transitioning your solution from an R&D setting to actual use isn’t without hiccups. Possible reasons include the following:

  • You have a communication gap between model developers and actual users
  • Your models don’t generalize from training data to unseen data (overfitting)
  • Your real-world scenarios don’t adhere to model assumptions
  • Your existing IT infrastructure doesn’t support the model

Finally, the real world isn’t always reflective of historical data used for model training and testing. Because of this, the “best” model in R&D isn’t always the best model in a production environment. As an example, in Kaggle competitions, top-ranking solutions aren’t necessarily usable in practice.
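One common way to get a more honest estimate of field performance is to evaluate on data that is newer than anything the model was trained on, rather than on a random shuffle of historical data. The sketch below illustrates that idea only; the `(timestamp, value)` record layout, the function name `chronological_split`, and the 80/20 default are assumptions made for this example, not something prescribed by the article.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (timestamp, payload). Any sortable timestamp works.
Record = Tuple[float, object]


def chronological_split(
    records: Sequence[Record], train_fraction: float = 0.8
) -> Tuple[List[Record], List[Record]]:
    """Split records so every training record is older than every test record."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Sort by timestamp so the held-out set is the most recent slice.
    ordered = sorted(records, key=lambda record: record[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    history = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
    train, test = chronological_split(history)
    print("train:", train)
    print("test:", test)
```

A model that looks best on a shuffled split but degrades on the most recent slice is a warning sign that it may not hold up once deployed.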


In the industrial environment, the stakes are high because of the financial, environmental, and safety risks involved, which calls for a human-in-the-loop machine learning solution. Machine learning solutions need to be designed for the end-users who make critical decisions. In the heavy-asset industry, end-users may include equipment operators, field engineers, and plant managers, not all of whom have exposure to or an understanding of machine learning. It’s imperative that the results of your model are easy to understand and actionable.

Secondly, real-world data is noisy, unlabeled, and unstructured. In the heavy-asset industry, we often observe a combination of the following scenarios:

  • Fusion of machine & human-generated data: sensor data combined with data from diagrams, reports, logs, etc. (not all of which are digitized)
  • Non-stationary dynamical systems (assets exposed to perturbations & disturbances)
  • Sensor bias, drift, and failure
  • Strong system safety, reliability & quality requirements
  • Infrequently occurring events (e.g., equipment failures that occur every 5-7 years)
  • Operator-controlled parameters, resulting in a system with humans in the loop
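To make the sensor bias and drift item concrete, here is a deliberately crude check that compares a recent window of readings against a baseline window. It is a sketch under stated assumptions, not a production drift detector: the function name `drift_flag`, the window contents, and the threshold `k = 3.0` are all hypothetical, and a non-stationary asset would likely need a baseline that is refreshed over time.

```python
from statistics import mean, stdev
from typing import Sequence


def drift_flag(
    baseline: Sequence[float], recent: Sequence[float], k: float = 3.0
) -> bool:
    """Return True when the recent mean is more than k baseline standard
    deviations away from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two readings")
    if not recent:
        raise ValueError("recent window must not be empty")
    baseline_mean = mean(baseline)
    baseline_std = stdev(baseline)  # sample standard deviation
    shift = abs(mean(recent) - baseline_mean)
    if baseline_std == 0:
        # A perfectly flat baseline: any shift at all counts as a change.
        return shift > 0
    return shift > k * baseline_std


if __name__ == "__main__":
    baseline_readings = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]
    print(drift_flag(baseline_readings, [10.0, 10.1, 9.9]))
    print(drift_flag(baseline_readings, [10.9, 11.0, 11.1]))
```

A flag like this cannot tell sensor drift apart from a genuine change in the process, which is one more reason to route flagged cases to an operator or field engineer.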


Finally, adoption of machine learning solutions will be slow because of apprehension about, and distrust of, model performance in the field. Human experts with decades of engineering experience will be hesitant to trust a system that they don’t understand. Hence, in certain situations, there is an incentive to include the human operator or expert in the solution loop.


For tasks where a fully automatic ML solution is too risky, it might make sense to adopt a machine-first solution before invoking manual review by the human expert. This manual review has an added advantage of giving you more (labeled) data that may further improve your ML models.
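The machine-first pattern described above can be sketched as a simple routing step: the model handles what it is confident about, and everything else goes to the expert, whose decisions come back as labeled examples. This is a minimal illustration, assuming the model exposes a confidence score per prediction; the names `Prediction`, `route_predictions`, `apply_review`, and the 0.9 threshold are hypothetical and would need tuning to the risk of the task.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # assumed to be the model's score for `label`, in [0, 1]


def route_predictions(
    predictions: Iterable[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted ones and ones needing manual review."""
    auto_accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            auto_accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return auto_accepted, needs_review


def apply_review(prediction: Prediction, expert_label: str) -> Tuple[str, str]:
    """Turn an expert's decision into a labeled example (item_id, label)."""
    return (prediction.item_id, expert_label)


if __name__ == "__main__":
    batch = [
        Prediction("item-1", "valve", 0.97),
        Prediction("item-2", "pump", 0.55),
    ]
    accepted, review_queue = route_predictions(batch)
    # The expert corrects the low-confidence item; the result is new training data.
    new_labels = [apply_review(p, "valve") for p in review_queue]
    print(accepted, new_labels)
```

Lowering the threshold reduces the expert's workload at the cost of more unreviewed mistakes, so the right value depends on how costly an error is in the application.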

This is the first article in a series on "Transitioning From R&D to Industry", written by the co-authors below. We have applied this human-in-the-loop paradigm successfully to an industrial application: the digitization of engineering schematic diagrams. To find out more, look out for our next article on “Digitization of Engineering Schematic Diagrams”.


Jo-Anne Ting is Lead Data Scientist at Arundo Analytics, based out of the Palo Alto office. Her experience lies in developing and implementing machine learning solutions for application domains in the robotics, control, risk, automotive, manufacturing, and industrial spaces. She received a PhD in Computer Science from the University of Southern California and completed postdocs at the University of Edinburgh and University of British Columbia. She was previously a Research Scientist at Bosch Research and Director of Data Science & Engineering at Insikt, Inc. (now known as Aura Financial).

Pushkar Kumar Jain is a Data Scientist at Arundo Analytics in the Houston office. He received his PhD in Engineering Mechanics from the University of Texas at Austin, where his work focused on advancements in computational science and high performance computing. His experience includes developing data science applications in the heavy-asset industry across machine learning domains such as computer vision and time-series analysis. He was previously an Engineering Consultant at General Electric Global Research Center, developing simulation software, and an R&D Research Intern at Quantlab Financial, developing algorithmic trading strategies.