
Data Science Applications: 5 Things an Operations Engineer Should Know

Posted by Mark Tibbetts on May 14, 2019

Over the last few years, the use of data science and machine learning techniques in asset-heavy industries has taken off. Digital transformation is moving from proofs of concept into an ecosystem of fully-fledged data science applications. Engineers operating production-critical equipment are key end-users of these applications, and they need to know what these applications are and how to interact with them. In this article I’ll go through 5 things you need to know as an operations engineer.


1. The human in the loop

Data science is often described as a path to fully automated intelligent systems. However, this simply isn’t true in heavy-asset equipment operations. Well-designed data science applications should arm you with the information you need to make better decisions. You must be able to trust the information a data science application provides. If you can’t, that information is essentially useless, no matter how good the data science team that built it is.


2. Data science is powerful but imperfect

The hype around data science exists for good reason: it’s a fast-moving field with technical revolutions occurring on an almost weekly basis. However, no data science application can provide insights that are 100% accurate. You'll need to understand the following concepts:

  • False positive: an insight that says something is happening when it's not.
  • False negative: an insight that fails to flag something when it is happening.
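To make these two concepts concrete, here is a minimal sketch of how you might tally them from an alert log. The log structure and field names (`flagged`, `happened`) are hypothetical, not from any particular application:

```python
# Hypothetical alert log: each record pairs the application's alert
# ("flagged") with what the engineer later confirmed ("happened").
alerts = [
    {"flagged": True,  "happened": True},   # true positive: correct alert
    {"flagged": True,  "happened": False},  # false positive: alert, no event
    {"flagged": False, "happened": True},   # false negative: missed event
    {"flagged": False, "happened": False},  # true negative: correctly quiet
]

# Count the two failure modes described above.
false_positives = sum(a["flagged"] and not a["happened"] for a in alerts)
false_negatives = sum(not a["flagged"] and a["happened"] for a in alerts)

print(false_positives, false_negatives)  # 1 1
```

A well-designed application would report counts like these back to you, so you can see its error rates rather than having to guess at them.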

The application should set clear expectations around these scenarios. A main reason operations engineers lose trust in data science applications is that they don’t understand this limitation.


3. Engineers know systems better than data science applications do

Data science applications are only as good as the data used to develop them. In asset-heavy industries, the biggest challenge for data scientists is understanding the underlying truth of the system they are trying to model. When did something happen and why?

The best people to provide this information are the engineers operating the system. Within a data science application, you’re the most direct proxy for what’s actually happening in the underlying system. Furthermore, you should use your own experience to assess how trustworthy the insights are.


4. Feedback will improve data science applications

Understanding and capturing what really happened in a system, through feedback from an operations engineer, can improve the application. Over time, the data science application can learn from this information and more accurately identify the true state of the underlying system. In turn, this will lead to improved trust in the insights provided.
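One simple form this feedback loop can take is recalibration: the application adjusts its alert threshold using the labels engineers provide. The scores, labels, and recalibration rule below are all hypothetical, purely to illustrate the idea:

```python
# Hypothetical feedback records: each pairs the anomaly score the model
# produced with whether the engineer confirmed a real event.
feedback = [
    (0.92, True),   # confirmed event
    (0.85, True),   # confirmed event
    (0.40, False),  # rejected alert
    (0.55, False),  # rejected alert
    (0.78, True),   # confirmed event
]

# Recalibrate: set the alert threshold to the lowest score among
# confirmed events, so future alerts track what engineers agreed was real.
confirmed_scores = [score for score, confirmed in feedback if confirmed]
new_threshold = min(confirmed_scores)

print(new_threshold)  # 0.78
```

Real applications may retrain models rather than just shift a threshold, but the principle is the same: your labels become the ground truth the application learns from.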


5. Recognize and avoid applications built with black box data science

Heavy-asset equipment consists of well-understood engineered systems. Any data science approach that tries to learn from data without this engineering context will be significantly less trustworthy. In addition, if a data science application can’t give transparent information about why it generated an insight, it won’t be useful to you. You should ask whether the data science team that developed the application understood the type of operator and the system they were building it for.



Data science applications in heavy-asset operations are becoming more prevalent. They promise a lot, but it’s critical that you learn to trust those applications by knowing their limitations and how they can improve with time. The most trustworthy data science applications will be those built with the operator and engineering context of the system in mind.



Topics: data science, Industrial Internet of Things (IIoT), machine learning

Mark Tibbetts


Mark Tibbetts is Lead Data Scientist for R&D in Arundo Analytics' Houston office, where he has a core role in advanced analytics engagements with partner organizations and researchers. His current focus is IIoT data analysis for industries such as oil & gas, mining, maritime and utilities. He previously worked as a postdoctoral researcher at Berkeley National Laboratory, where he had a leading role in analyzing data from the Large Hadron Collider facility at CERN. Mark obtained a PhD specializing in High Energy Physics from Imperial College in 2010.