
Tsaug: An Open-Source Python Package for Time Series Augmentation



We built a data augmentation tool to help us train machine learning models on time series. We're now releasing this tool, tsaug, as an open-source package to help everyone improve their data-hungry time series models.

DATA AUGMENTATION FOR DEEP LEARNING

The breakthrough of artificial neural networks in the past few years is the most exciting achievement of machine learning in recent decades. Advances in many types of deep learning models give us powerful tools for challenging problems that traditional machine learning models could hardly solve. However, deep learning is notorious for being data-hungry.

Because a neural network has a large number of parameters, it usually needs a large training set to avoid overfitting, and such a training set is often hard to obtain. One solution is data augmentation: generating more training data by transforming existing data. It is especially common in computer vision, where image augmentation has become a standard technique and several open-source image augmentation libraries are widely used by data scientists.

MOTIVATION BEHIND DEVELOPING TSAUG

We recently published a research paper [1] on using a convolutional neural network (CNN) for time series segmentation and anomaly detection. Because CNN models such as U-Net [2] have proven effective for image segmentation, we applied a model with a similar architecture to time series data and achieved good results on multiple anomaly detection and segmentation benchmarks.

During this research, we tried to augment time series data in the same way that images are augmented, since the model itself is an analog of a computer vision model. However, we could not find a comprehensive open-source package for time series data augmentation. We therefore developed tsaug, a lightweight but handy Python library for this purpose, and have now released it as open source.

AUGMENTING TIME SERIES WITH TSAUG

tsaug implements 15 augmentation methods. Some mimic common image augmentation methods, such as cropping, magnifying, flipping (reversing the timeline), and adding noise. Others are designed specifically for time series data, such as time warping, sidetracking, and superposing trends.
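To make a few of the generic methods concrete, here is a minimal NumPy sketch of reversing, jittering, magnifying (crop and stretch), and superposing a random trend. It is not tsaug's implementation: the function names and parameters are hypothetical, and it assumes the data are a float array of shape (N, n, c), which we read as N series of n time points with c channels, matching the output sizes reported by summary() later in this article.

```python
import numpy as np

# Hypothetical helpers for illustration, NOT tsaug's API.
# X is a float array of shape (N, n, c): N series, n time points, c channels.


def reverse(X: np.ndarray) -> np.ndarray:
    """Flip each series by reversing its timeline."""
    return X[:, ::-1, :]


def jitter(X: np.ndarray, strength: float = 0.1, rng=None) -> np.ndarray:
    """Add i.i.d. Gaussian noise with standard deviation `strength`."""
    rng = np.random.default_rng(rng)
    return X + rng.normal(0.0, strength, size=X.shape)


def magnify(X: np.ndarray, zoom: float = 2.0, rng=None) -> np.ndarray:
    """Crop a random window of about n / zoom points and stretch it back to n points."""
    rng = np.random.default_rng(rng)
    N, n, c = X.shape
    win = min(n, max(2, int(round(n / zoom))))  # window length, clamped to [2, n]
    src_t = np.linspace(0, win - 1, n)          # n sample positions inside the window
    out = np.empty(X.shape, dtype=float)
    for i in range(N):
        start = rng.integers(0, n - win + 1)    # random window start
        for j in range(c):
            window = X[i, start:start + win, j]
            out[i, :, j] = np.interp(src_t, np.arange(win), window)
    return out


def add_trend(X, min_anchor=-0.5, max_anchor=0.5, num_anchors=5, rng=None):
    """Superpose a random piecewise-linear trend through evenly spaced anchors."""
    rng = np.random.default_rng(rng)
    N, n, c = X.shape
    anchor_t = np.linspace(0, n - 1, num_anchors)
    t = np.arange(n)
    trend = np.empty(X.shape, dtype=float)
    for i in range(N):
        for j in range(c):
            anchors = rng.uniform(min_anchor, max_anchor, num_anchors)
            trend[i, :, j] = np.interp(t, anchor_t, anchors)
    return X + trend
```

Each helper keeps the (N, n, c) shape, so they can be applied one after another to the same batch.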

Every augmentation method in tsaug is implemented both as a function and as a class. Functions are convenient for applying an augmentation to a time series directly. The following example applies random time warping to three univariate time series and their corresponding anomaly labels.

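```python
>>> plot(X, Y)
```

[Figure: the three original time series with their anomaly labels]

```python
>>> from tsaug import random_time_warp
>>> X_aug, Y_aug = random_time_warp(X, Y)
>>> plot(X_aug, Y_aug)
```

[Figure: the same series and labels after random time warping]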
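Time warping is where the labels matter most, because the anomaly labels must be warped exactly like the values. The sketch below is a hypothetical illustration, not tsaug's algorithm: it cuts the timeline into segments replayed at random speeds, linearly interpolates the values, and resamples the labels by nearest neighbour so they stay discrete. The parameter name n_speed_change is borrowed from the summary() output later in the article, but its meaning here is our assumption, and max_speed_ratio is invented for the sketch.

```python
import numpy as np


def random_time_warp_sketch(X, Y, n_speed_change=3, max_speed_ratio=3.0, rng=None):
    """Hypothetical random time warp, NOT tsaug's implementation.

    X: float array (N, n, c); Y: integer labels (N, n), e.g. 0/1 anomaly flags.
    The timeline is cut into n_speed_change + 1 equal segments, each replayed at a
    random speed in [1, max_speed_ratio]. Values are linearly interpolated; labels
    are resampled by nearest neighbour so they stay discrete and aligned with X.
    """
    rng = np.random.default_rng(rng)
    N, n, c = X.shape
    k = n_speed_change + 1
    t = np.arange(n)
    segment = t * k // n  # segment index of each output step
    X_aug = np.empty(X.shape, dtype=float)
    Y_aug = np.empty_like(Y)
    for i in range(N):
        speeds = rng.uniform(1.0, max_speed_ratio, k)
        # Source time advanced between consecutive output steps, accumulated.
        progress = np.concatenate([[0.0], np.cumsum(speeds[segment[:-1]])])
        # Rescale to [0, n - 1]: a strictly increasing map from output step to source time.
        src = progress / progress[-1] * (n - 1)
        for j in range(c):
            X_aug[i, :, j] = np.interp(src, t, X[i, :, j])
        Y_aug[i] = Y[i, np.rint(src).astype(int)]
    return X_aug, Y_aug
```

Applying one warp map to both arrays is the key design choice: if values and labels were warped independently, an anomaly label would drift away from the anomalous segment it marks.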

Augmentor classes share a unified API and can easily be chained into an augmentation pipeline, which is usually what is needed in practice. Users can control the probability with which each component of a pipeline is executed, so that the randomly augmented data follows the desired distribution. It is also easy to apply the same random augmentation several times to the same original time series, producing several different randomly augmented versions.

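In the following example, we chain four augmentors with the operator +, control the probability of each augmentor with the operator @, and set up multiple executions with the operator *. We also print a summary of the pipeline with the method summary(). For each augmentor, the summary lists the number of executions (M), the execution probability (Prob), and the resulting output size: each two-fold execution doubles the number of series, from N to 2N and then to 4N.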

```python
>>> from tsaug import RandomTimeWarp, RandomMagnify, RandomJitter, RandomTrend
>>> my_aug = (
...     RandomMagnify(max_zoom=4.0, min_zoom=2.0) * 2
...     + RandomTimeWarp() * 2
...     + RandomJitter(strength=0.1) @ 0.5
...     + RandomTrend(min_anchor=-0.5, max_anchor=0.5) @ 0.5
... )
>>> X_aug, Y_aug = my_aug.run(X, Y)
>>> plot(X_aug, Y_aug)
```
[Figure: the series and labels after the four-step augmentation pipeline]
```python
>>> my_aug.summary()
Augmentor        M   Prob   Output Size   Params
====================================================================================================
RandomMagnify    2   1.0    (2N, n, c)    {'max_zoom': 4.0, 'min_zoom': 2.0, 'random_seed': None}
RandomTimeWarp   2   1.0    (4N, n, c)    {'n_speed_change': 3, 'random_seed': None}
RandomJitter     1   0.5    (4N, n, c)    {'dist': 'normal', 'strength': 0.1, 'random_seed': None}
RandomTrend      1   0.5    (4N, n, c)    {'num_anchors': 5, 'min_anchor': -0.5, 'max_anchor': 0.5, 'random_seed': None}
```
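The operator behaviour described above can be reproduced in a few lines. In this hypothetical sketch (the Step and Pipeline classes are not part of tsaug), step * m runs m independent copies and stacks the results, step @ p augments each series with probability p and otherwise passes it through unchanged, and a + b chains steps. Because Python gives * and @ the same precedence, higher than +, an expression like my_aug above parses as a sum of individually configured steps.

```python
import numpy as np


class Step:
    """Hypothetical augmentor wrapping fn(X, Y, rng) -> (X, Y); NOT tsaug's classes."""

    def __init__(self, fn, repeats=1, prob=1.0):
        self.fn, self.repeats, self.prob = fn, repeats, prob

    def __mul__(self, m):
        # step * m: run m independent copies and stack them (N -> m * N series).
        return Step(self.fn, self.repeats * m, self.prob)

    def __matmul__(self, p):
        # step @ p: augment each series with probability p, else pass it through.
        return Step(self.fn, self.repeats, p)

    def __add__(self, other):
        # step + step: chain into a pipeline.
        return Pipeline([self]) + other

    def run(self, X, Y, rng):
        outs_X, outs_Y = [], []
        for _ in range(self.repeats):
            X_new, Y_new = self.fn(X, Y, rng)
            skip = rng.random(len(X)) >= self.prob  # per-series coin flip
            outs_X.append(np.where(skip[:, None, None], X, X_new))
            outs_Y.append(np.where(skip[:, None], Y, Y_new))
        return np.concatenate(outs_X), np.concatenate(outs_Y)


class Pipeline:
    def __init__(self, steps):
        self.steps = list(steps)

    def __add__(self, other):
        more = other.steps if isinstance(other, Pipeline) else [other]
        return Pipeline(self.steps + more)

    def run(self, X, Y, seed=None):
        rng = np.random.default_rng(seed)
        for step in self.steps:
            X, Y = step.run(X, Y, rng)
        return X, Y

    def output_sizes(self, N):
        """Number of series after each step, like the Output Size column."""
        sizes = []
        for step in self.steps:
            N *= step.repeats
            sizes.append(N)
        return sizes


# Usage with two toy steps on arrays X: (N, n, c) and Y: (N, n).
jitter = Step(lambda X, Y, rng: (X + rng.normal(0.0, 0.1, X.shape), Y))
reverse = Step(lambda X, Y, rng: (X[:, ::-1, :], Y[:, ::-1]))
pipe = jitter * 2 + reverse * 2 + jitter @ 0.5
print(pipe.output_sizes(3))  # [6, 12, 12]
```

Computing every augmentation and only then flipping the per-series coin is wasteful but keeps the sketch short; a production implementation would skip the work for series that are passed through.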

READY FOR THE WORLD

We are happy to share this open-source package with data scientists who are applying deep learning to time series problems and have the same need for data augmentation that we did. We also welcome contributions from members of the open-source community. Please try out tsaug today.

REFERENCES

[1] Wen, Tailai, and Roy Keyes. "Time Series Anomaly Detection Using Convolutional Neural Networks and Transfer Learning." arXiv preprint arXiv:1905.13628 (2019).

[2] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015.