There is an emerging trend of using machine learning (ML) in industrial applications. However, transitioning a solution from a research and development (R&D) setting to actual use in industry is not without hiccups. In this first article in our series, Transitioning from R&D to Reality, we'll go through some of the problems you might face when trying to apply ML in industrial applications.
Access to Data Science Tools is Easy
Easily accessible data science tools (e.g., Keras, TensorFlow, scikit-learn, AutoML), infrastructure, and low-cost computation have changed the landscape of machine learning over the last decade. The practice of sharing data and software has made the field more accessible, lowering entry barriers that previously existed due to a lack of sufficient data or irreproducible models. This practice has also spread to academia: machine learning conferences such as ICML, ICLR, and NeurIPS have started allowing data and code to be uploaded with submissions so that reviewers can consider these supplemental materials, and anonymized peer reviews of submissions are also posted online.
As a result, machine learning algorithms that were once largely confined to academia are increasingly being applied to industrial use cases.
There is a Gap Between Research and Implementation
Fortunately, easy access to data science tools has attracted a great deal of attention, talent, and money. Unfortunately, much of this has happened without sufficient attention to model limitations, leading to false promises, unrealistic expectations, and failures to deliver.
Moving a solution from an R&D setting into actual use rarely goes smoothly. Possible reasons include the following:
- There is a communication gap between model developers and actual users
- Your models don't generalize to unseen test data (overfitting)
- Real-world conditions violate your model's assumptions, or
- Your existing IT infrastructure doesn't support the model.
Moreover, the real world doesn't always reflect the historical data used for model training and testing. Because of this, the "best" model in R&D isn't always the best model in a production environment. For example, top-ranking solutions in Kaggle competitions aren't necessarily usable in practice.
Face the Reality
First, in industrial environments the stakes are high because of the financial, environmental, and safety risks involved, which calls for a human-in-the-loop machine learning solution. Machine learning solutions need to be designed for the end-users who make critical decisions. In the asset-heavy industry, end-users may include equipment operators, field engineers, and plant managers, not all of whom have exposure to, or an understanding of, machine learning. It's imperative that the results of your model are easy to understand and actionable.
Second, real-world data is often noisy, unlabeled, and unstructured. In the asset-heavy industry, we often observe a combination of the following scenarios:
- Fusion of machine- and human-generated data: sensor data combined with diagrams, reports, logs, etc. (not all of which are digitized)
- Non-stationary dynamical systems (assets exposed to perturbations and disturbances)
- Sensor bias, drift, and failure
- Strict system safety, reliability, and quality requirements
- Infrequently occurring events (e.g., equipment failures that occur every 5 to 7 years)
- Operator-controlled parameters, resulting in a system with humans in the loop
Applying ML to these problems is not trivial.
Finally, adoption of machine learning solutions is likely to be slow because of apprehension about, and distrust of, model performance in the field. Human experts with decades of engineering experience will be hesitant to trust a system they don't understand. Hence, in certain situations, there is an incentive to include the human operator or expert in the solution loop.
How Should You Solve It?
For tasks where a fully automatic ML solution is too risky, it may make sense to adopt a machine-first solution: the model processes the data first, and a human expert then reviews the results. This manual review has the added advantage of giving you more labeled data, which can be used to further improve your ML models.
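One way to picture a machine-first workflow is a confidence-based router: predictions above a threshold are accepted automatically, the rest are queued for expert review, and each reviewed result is kept as a new labeled example. The Python below is a minimal sketch under those assumptions; the class name, the 0.9 threshold, and the item and label names are hypothetical illustrations, not taken from the application described in this series.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class HumanInTheLoopRouter:
    """Hypothetical machine-first router: the model decides first, and
    low-confidence predictions are deferred to a human expert."""

    threshold: float = 0.9  # assumed confidence cut-off; tune per use case
    review_queue: List[Tuple[str, str, float]] = field(default_factory=list)
    labeled_data: List[Tuple[str, str]] = field(default_factory=list)

    def route(self, item: str, predicted_label: str, confidence: float) -> str:
        """Accept a confident prediction, or queue it for manual review."""
        if confidence >= self.threshold:
            return "auto"
        self.review_queue.append((item, predicted_label, confidence))
        return "review"

    def review(self, expert: Callable[[str, str], str]) -> None:
        """Let the expert confirm or correct each queued prediction.

        Every reviewed item becomes a labeled example that could later be
        used to retrain the model.
        """
        while self.review_queue:
            item, predicted_label, _ = self.review_queue.pop(0)
            self.labeled_data.append((item, expert(item, predicted_label)))


if __name__ == "__main__":
    router = HumanInTheLoopRouter(threshold=0.9)
    print(router.route("symbol-001", "valve", 0.97))  # auto
    print(router.route("symbol-002", "valve", 0.55))  # review
    # Stand-in for a human expert who corrects the second prediction.
    router.review(lambda item, predicted_label: "pump")
    print(router.labeled_data)  # [('symbol-002', 'pump')]
```

The threshold trades automation against risk: raising it sends more predictions to the expert. In practice it would likely be chosen from validation data rather than fixed in advance.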
This was the first article in our series "Transitioning From R&D to Industry"; subsequent articles will be written by co-authors. We have successfully applied this human-in-the-loop paradigm to an industrial application: the digitization of engineering schematic diagrams. To find out more, watch out for our next article, "Digitization of Engineering Schematic Diagrams".
About the authors of this article:
Jo-Anne Ting is Lead Data Scientist at Arundo Analytics, based in the Palo Alto office. Her experience lies in developing and implementing machine learning solutions in various application domains, including the robotics, control, risk, automotive, manufacturing, and industrial spaces. She received a PhD in Computer Science from the University of Southern California and completed postdocs at the University of Edinburgh and the University of British Columbia. She was previously a Research Scientist at Bosch Research and Director of Data Science & Engineering at Insikt, Inc. (now known as Aura Financial).
Pushkar Kumar Jain is a Data Scientist at Arundo Analytics, based in the Houston office. He received his PhD in Engineering Mechanics from the University of Texas at Austin for work on advances in computational science and high-performance computing. His experience includes developing data science applications in the asset-heavy industry, spanning machine learning domains such as computer vision and time-series analysis. He was previously an Engineering Consultant at General Electric Global Research Center, developing simulation software, and an R&D Research Intern at Quantlab Financial, developing algorithmic trading strategies.