Tsaug: An Open-Source Python Package for Time Series Augmentation

We are releasing tsaug, an open-source data augmentation package that helps train machine learning models on time series.

We built a data augmentation tool to help us train machine learning models on time series. We're now releasing this tool, tsaug, as an open-source package to help everyone improve their data-hungry time series models.

DATA AUGMENTATION FOR DEEP LEARNING

The breakthrough of artificial neural networks in the past few years is one of the most exciting achievements of machine learning in recent decades. Advances in deep learning models give us powerful tools for tackling challenging problems that traditional machine learning models could hardly solve. However, deep learning is notorious for its data-hungry nature.

To avoid overfitting its many parameters, a neural network model usually requires a large training set, which is often hard to obtain. One solution to this problem is data augmentation, a process that generates more data by transforming existing data. It is especially common in computer vision tasks, where image augmentation has become a standard technique and several open-source image augmentation libraries are widely used by data scientists.
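As a minimal illustration of the idea (this is not tsaug code), the sketch below derives three extra training examples from a single series using jitter, reversal, and random cropping. The function names, the noise strength, and the toy series are assumptions chosen for illustration, and only the Python standard library is used.

```python
import random


def jitter(series, strength, rng):
    """Add zero-mean Gaussian noise to every point."""
    return [x + rng.gauss(0.0, strength) for x in series]


def reverse(series):
    """Flip the timeline."""
    return series[::-1]


def random_crop(series, size, rng):
    """Take a random contiguous window of length `size`."""
    start = rng.randrange(len(series) - size + 1)
    return series[start:start + size]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]

augmented = [
    jitter(original, 0.1, rng),
    reverse(original),
    random_crop(original, 5, rng),
]

# One original series has become four training examples.
training_set = [original] + augmented
```

Whether a given transform is safe depends on the task: reversing a series, for example, only makes sense when the target does not depend on the direction of time.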

MOTIVATION BEHIND DEVELOPING TSAUG

We recently published a research paper [1] on using a convolutional neural network (CNN) for time series segmentation and anomaly detection. Because CNN models like U-net [2] have proven effective for image segmentation tasks, we applied a model with a similar architecture to time series data and achieved good results on multiple anomaly detection and segmentation benchmarks.

During this research, we wanted to augment time series data in the same way images are augmented, because the model itself is an analog of a computer vision model. However, we could not find a comprehensive open-source package for time series data augmentation. We therefore developed tsaug, a lightweight but handy Python library for this purpose, and recently released it as open source.

AUGMENTING TIME SERIES WITH TSAUG

tsaug implements 15 augmentation methods. Some mimic common image augmentation methods, such as cropping, magnifying, flipping (reversing the timeline), and adding noise. Others are designed specifically for time series data, such as time warping, sidetracking, and superposing trends.

Every augmentation method is available in tsaug as both a function and a class. The function form is convenient for calling directly on a time series. The following example applies random time warping to three univariate time series and their corresponding anomaly labels.

>>> plot(X, Y)

>>> from tsaug import random_time_warp
>>> X_aug, Y_aug = random_time_warp(X, Y)
>>> plot(X_aug, Y_aug)
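To make the idea of warping a series together with its labels concrete, here is a rough standard-library sketch. It is not tsaug's implementation: the segment-wise random speeds, the speed range of 0.5 to 2.0, and the helper names are all assumptions. The parameter name n_speed_change is borrowed from the pipeline summary in this article, but how it is interpreted here is a guess. The key point is that the series and its labels are resampled at the same warped positions, so anomalies stay aligned with the data.

```python
import random


def random_warp_positions(n, n_speed_change, rng):
    """Return n increasing source positions in [0, n - 1] (requires n >= 2)."""
    n_segments = n_speed_change + 1
    # Assumption: one random positive playback speed per segment.
    speeds = [rng.uniform(0.5, 2.0) for _ in range(n_segments)]
    # Each output step advances through the source at its segment's speed.
    steps = [speeds[i * n_segments // (n - 1)] for i in range(n - 1)]
    total = sum(steps)
    positions = [0.0]
    for step in steps:
        positions.append(positions[-1] + step * (n - 1) / total)
    positions[-1] = float(n - 1)  # remove floating-point drift at the end
    return positions


def interpolate(series, pos):
    """Linearly interpolate `series` at fractional index `pos`."""
    lo = int(pos)
    hi = min(lo + 1, len(series) - 1)
    frac = pos - lo
    return series[lo] * (1 - frac) + series[hi] * frac


def time_warp(series, labels, n_speed_change=3, rng=None):
    """Warp a series and its labels with the same random time mapping."""
    rng = rng or random.Random()
    positions = random_warp_positions(len(series), n_speed_change, rng)
    warped = [interpolate(series, p) for p in positions]
    # Labels are categorical, so take the nearest original point instead
    # of interpolating between classes.
    last = len(labels) - 1
    warped_labels = [labels[min(int(p + 0.5), last)] for p in positions]
    return warped, warped_labels


x = [float(i % 10) for i in range(50)]             # toy sawtooth series
y = [1 if 20 <= i < 30 else 0 for i in range(50)]  # toy anomaly labels
x_aug, y_aug = time_warp(x, y, rng=random.Random(42))
```

The output has the same length as the input; only the local speed of the timeline changes, which stretches some regions and compresses others.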

Augmentor classes share a unified API and can easily be chained into an augmentation pipeline, which is what most practical uses require. A user can set the probability that each component in a pipeline executes, so that the randomly augmented data follows the desired distribution. The same random augmentation can also be applied several times in parallel to the same original time series to generate different augmented versions.

In the following example, we connect four augmentors with the tsaug operator +, control the execution probability of each augmentor with the operator @, and set up multiple executions with the operator *. We also print a summary of the augmentation pipeline with its summary() method.

>>> from tsaug import RandomTimeWarp, RandomMagnify, RandomJitter, RandomTrend
>>> my_aug = (
...     RandomMagnify(max_zoom=4.0, min_zoom=2.0) * 2
...     + RandomTimeWarp() * 2
...     + RandomJitter(strength=0.1) @ 0.5
...     + RandomTrend(min_anchor=-0.5, max_anchor=0.5) @ 0.5
... )
>>> X_aug, Y_aug = my_aug.run(X, Y)
>>> plot(X_aug, Y_aug)

>>> my_aug.summary()
Augmentor       M  Prob  Output Size  Params
====================================================================================================
RandomMagnify   2  1.0   (2N, n, c)   {'max_zoom': 4.0, 'min_zoom': 2.0, 'random_seed': None}
RandomTimeWarp  2  1.0   (4N, n, c)   {'n_speed_change': 3, 'random_seed': None}
RandomJitter    1  0.5   (4N, n, c)   {'dist': 'normal', 'strength': 0.1, 'random_seed': None}
RandomTrend     1  0.5   (4N, n, c)   {'num_anchors': 5, 'min_anchor': -0.5, 'max_anchor': 0.5, 'random_seed': None}
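The summary shows how the operators shape the output: each augmentor with M = 2 doubles the number of series, so N inputs become 2N and then 4N, while an augmentor with Prob = 0.5 keeps the count unchanged and transforms each series only half of the time on average. The toy classes below sketch one way such operators could be built with Python's operator overloading. They are hypothetical and say nothing about tsaug's internals: the class names, the run() signature, and the stand-in transforms are assumptions, and label handling is omitted for brevity.

```python
import random


class Augmentor:
    """Hypothetical augmentor: one transform plus execution settings."""

    def __init__(self, name, transform, copies=1, prob=1.0):
        self.name = name
        self.transform = transform  # function: (series, rng) -> series
        self.copies = copies        # M: outputs produced per input series
        self.prob = prob            # chance that the transform is applied

    def __mul__(self, copies):      # augmentor * 2
        return Augmentor(self.name, self.transform, copies, self.prob)

    def __matmul__(self, prob):     # augmentor @ 0.5
        return Augmentor(self.name, self.transform, self.copies, prob)

    def __add__(self, other):       # augmentor + augmentor
        return Pipeline([self]) + other

    def run(self, batch, rng):
        out = []
        for series in batch:
            for _ in range(self.copies):
                if rng.random() < self.prob:
                    out.append(self.transform(series, rng))
                else:
                    out.append(list(series))  # pass through unchanged
        return out


class Pipeline:
    """Hypothetical pipeline: runs augmentors one after another."""

    def __init__(self, steps):
        self.steps = steps

    def __add__(self, other):
        extra = other.steps if isinstance(other, Pipeline) else [other]
        return Pipeline(self.steps + extra)

    def run(self, batch, rng):
        for step in self.steps:
            batch = step.run(batch, rng)
        return batch


def scale(series, rng):
    """Stand-in transform: multiply by a random factor."""
    factor = rng.uniform(2.0, 4.0)
    return [x * factor for x in series]


def jitter(series, rng):
    """Stand-in transform: add Gaussian noise."""
    return [x + rng.gauss(0.0, 0.1) for x in series]


pipeline = (
    Augmentor("scale", scale) * 2
    + Augmentor("scale_again", scale) * 2
    + Augmentor("jitter", jitter) @ 0.5
)

batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # N = 3
result = pipeline.run(batch, random.Random(0))
print(len(result))  # 12, i.e. 4N
```

Because * and @ bind more tightly than + in Python, an expression such as A * 2 + B @ 0.5 groups as (A * 2) + (B @ 0.5), which is why the pipeline can be written without inner parentheses.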

READY FOR THE WORLD

We are happy to share this open-source package with data scientists who apply deep learning to time series problems and have the same need for data augmentation that we had. We also welcome contributions from members of the open-source community. Please try out tsaug today.

REFERENCES

[1] Wen, Tailai, and Roy Keyes. "Time Series Anomaly Detection Using Convolutional Neural Networks and Transfer Learning." arXiv preprint arXiv:1905.13628 (2019).

[2] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.
