In areas with excellent road infrastructure, we can travel at high speed to our destination using, for example, a racing car. In rough and difficult terrain, however, a racing car fails to serve its purpose. If the terrain is not too rough, we could instead resort to an off-road vehicle, which is a better option than a car built primarily for paved roads.

This example describes the situation for time series classification. The majority of classification algorithms in machine learning assume that the data we want to learn from live in a Euclidean space. The Euclidean space has a well-developed mathematical infrastructure with a plethora of powerful techniques for statistical data analysis. The concepts of derivative and gradient are examples of such techniques; they correspond to the racing cars in the above analogy.

Treating time series as vectors can be misleading. The next figure illustrates the problem. Assume that the red and blue time series have similar amplitudes but are shifted along the vertical axis for illustrative purposes.

In many applications, it is necessary to permit variations in speed, that is, compressions and expansions along the time axis. The red and blue time series in the above figure are quite similar in shape but are not aligned along the time axis. Regarding both time series as vectors results in a large Euclidean distance. Elastic matching techniques such as dynamic time warping (DTW) are more intuitive and yield lower distances, because the matching of time points is guided by similarity of values.
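A small numerical experiment makes this concrete. The following sketch compares the Euclidean distance with a textbook dynamic-programming DTW (squared local costs, no warping window) on two series that have identical shape but whose peaks are shifted along the time axis:

```python
import math

def dtw_distance(a, b):
    """Classic dynamic-programming DTW with squared local costs."""
    n, m = len(a), len(b)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible predecessor paths
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

def euclidean(a, b):
    """Plain vector distance; compares time point i with time point i only."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

# Same bump shape, shifted along the time axis.
x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 2, 1, 0]

print(euclidean(x, y))     # large (about 3.16): peaks are compared with zeros
print(dtw_distance(x, y))  # 0.0: DTW aligns the two peaks perfectly
```

The Euclidean distance penalizes the time shift heavily, while DTW warps the time axis and recognizes that the two series trace out the same shape.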

Consequently, time series under elastic transformations live in spaces that are mathematically poorly structured. The concepts of derivative and gradient are undefined in such spaces, so gradient-based classifiers such as logistic regression, linear support vector machines, and feed-forward neural networks cannot be applied directly to time series. This is one reason why the simple nearest-neighbor method in conjunction with the dynamic time warping distance still belongs to the state of the art and is reported to be exceptionally hard to beat.
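The 1-NN/DTW baseline can be sketched in a few lines. The helper `dtw_distance`, the toy prototypes, and the labels are illustrative only:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with squared local costs."""
    n, m = len(a), len(b)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** 0.5

def nn_dtw_classify(train, query):
    """1-nearest-neighbor classification under DTW.
    `train` is a list of (series, label) pairs."""
    return min(train, key=lambda pair: dtw_distance(pair[0], query))[1]

# A time-shifted bump is still closest to the bump prototype under DTW.
train = [([0, 1, 2, 1, 0, 0], "bump"),
         ([0, 1, 2, 3, 4, 5], "ramp")]
print(nn_dtw_classify(train, [0, 0, 0, 1, 2, 1]))  # "bump"
```

Despite its simplicity, this is essentially the baseline that the elastic linear classifiers below are compared against.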

Paper [1] generalizes gradient-based linear classifiers to time series spaces under dynamic time warping. The resulting elastic linear classifiers are piecewise smooth and preserve elastic transformations. They can be regarded as off-road vehicles designed for the rough terrain of dynamic time warping spaces.
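To convey the flavor of the idea, here is a rough sketch of one such classifier, an elastic perceptron. This is not the algorithm from [1] verbatim: we assume, for illustration, that the score of a time series is obtained by maximizing the summed products between the weight "series" and the input over all warping paths, and that a misclassified example updates the weights along the optimal path. The function names, the bias term `b`, and the learning rate `eta` are our illustrative choices.

```python
def elastic_score(w, x):
    """Score of series x under weight series w: the maximum, over all
    warping paths, of the summed products w[j] * x[i] along the path.
    Returns the score and the optimal path as (i, j) index pairs."""
    n, m = len(x), len(w)
    S = [[float("-inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    S[0][0] = x[0] * w[0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append((S[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((S[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((S[i - 1][j - 1], (i - 1, j - 1)))
            value, prev = max(candidates, key=lambda t: t[0])
            S[i][j] = x[i] * w[j] + value
            back[i][j] = prev
    # recover the optimal warping path by backtracing
    path, ij = [], (n - 1, m - 1)
    while ij is not None:
        path.append(ij)
        ij = back[ij[0]][ij[1]]
    return S[n - 1][m - 1], path

def elastic_perceptron_step(w, b, x, y, eta=0.1):
    """One perceptron-style update with label y in {-1, +1}: if (x, y)
    is misclassified, push the weights towards x along the optimal path."""
    score, path = elastic_score(w, x)
    if y * (score + b) <= 0:
        for i, j in path:
            w[j] += eta * y * x[i]
        b += eta * y
    return w, b
```

Because the optimal warping path depends on `w`, the resulting decision function is only piecewise smooth, which is exactly the "off-road" character of these classifiers.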

The next table shows the average error rate of four elastic linear classifiers applied to all two-class classification problems of the UCR time series benchmark dataset. The error rates are averaged over 100 trials.

The first three columns show the error rates of the nearest neighbor method using the dynamic time warping distance. The four elastic linear classifiers extend the perceptron algorithm (ePERC), logistic regression (eLOGR), margin perceptron (eMARG), and linear support vector machine (eLSVM).

Green shaded areas in the table show superior performance of elastic linear classifiers, yellow shaded areas show comparable performance, and red shaded areas show superior performance of nearest neighbor methods.

The results show that elastic linear classifiers behave similarly to their standard counterparts in Euclidean spaces. They are simple and efficient methods that rest on the strong assumption that an elastic-linear decision boundary is appropriate. Elastic linear classifiers may therefore yield inaccurate predictions when this assumption oversimplifies the problem and/or when the number of training examples is too small relative to the length of the time series.

In summary, the simplest form of an elastic linear classifier is comparable to the state of the art. We therefore believe that generalizing more sophisticated gradient-based learning methods has the potential to complement the state of the art in learning on time series and sequence data.

To learn how elastic linear classifiers work and what their theoretical properties are, we refer the reader to publication [1]. A Java library implementing elastic linear classifiers for multi-class classification problems is publicly available on GitHub.

[1] B. Jain. Generalized Gradient Learning on Time Series under Elastic Transformations. *Machine Learning*, 2015 (accepted for publication).
