Ph.D. Dissertation Proposal

Learning Representation and Modeling Time Series

 Zhiguang Wang

10:00-12:00 Friday, 12 December 2015, ITE 325B

Most real-world data have a temporal component, whether they are measurements of natural phenomena (weather, sound) or man-made ones (stock markets, robotics). Time-series analysis has been an active research topic for decades and is still considered a challenge in machine learning and data mining because of the properties of temporal data, such as high dimensionality, noise, and strong dependence between successive observations.

Traditional approaches to modeling and representing time-series data fall into three categories. Non-data adaptive methods, such as the Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), and Discrete Cosine Transform (DCT), apply a transformation that is the same regardless of the data. Data adaptive methods, such as Symbolic Aggregate approXimation (SAX), Piecewise Linear Approximation (PLA), and shapelets, compute transforms that depend heavily on the data. Model-based approaches, such as AutoRegressive Moving Average (ARMA) models, Linear Dynamical Systems (LDS), and Hidden Markov Models (HMMs), assume that the data were generated by a specific type of model; the estimated model parameters can then be used as features, for example in a classifier.
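To make the distinction between the non-data adaptive and model-based categories concrete, the sketch below extracts two kinds of features from a series: DFT magnitudes, whose basis is fixed in advance, and the coefficients of an autoregressive AR(p) model fitted by least squares. AR(p) stands in here for the broader ARMA family; the function names and parameters (`dft_features`, `ar_features`, `n_coeffs`, `p`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def dft_features(x, n_coeffs=4):
    """Non-data adaptive: magnitudes of the first n_coeffs DFT coefficients.
    The Fourier basis is the same for every input series."""
    return np.abs(np.fft.rfft(np.asarray(x, dtype=float)))[:n_coeffs]

def ar_features(x, p=2):
    """Model-based: fit x[t] = c + a_1 x[t-1] + ... + a_p x[t-p] + e[t]
    by ordinary least squares and return [c, a_1, ..., a_p] as features."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column i-1 holds the lag-i values, aligned with the targets x[p:].
    lags = np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef

# Usage: a noise-free AR(1) series x[t] = 0.8 * x[t-1].
series = 0.8 ** np.arange(20)
features = ar_features(series, p=1)  # close to [0.0, 0.8]
```

Either feature vector could then be passed to an ordinary classifier; the AR coefficients summarize the series only as well as the assumed model fits it.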

However, real-world time series that are complex, high-dimensional, and noisy are often difficult to model because their dynamics are either too complex or unknown. Traditional shallow methods, which apply only a small number of non-linear operations, may lack the capacity to model such systems accurately.

We develop and verify three different approaches to representing and modeling time series. Time-Warping SAX and Pooling SAX are two extensions of vanilla SAX, a symbolic representation of time series. Time-Warping SAX captures linear temporal dependencies by building time-delay embedding vectors from which more informative SAX words are constructed. Pooling SAX applies a non-parametric weighting scheme to extract significant variables. Both are data adaptive methods, and they achieve state-of-the-art accuracy on time-series classification problems.
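For reference, the sketch below shows vanilla SAX (z-normalization, Piecewise Aggregate Approximation, then mapping each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions), together with a generic time-delay embedding. It does not reproduce Time-Warping SAX or Pooling SAX: how the embedding vectors are turned into SAX words, and the Pooling SAX weighting scheme, are not described here. All names and parameters (`sax_word`, `delay_embed`, `w`, `alpha`, `dim`, `tau`) are hypothetical, and PAA uses `np.array_split`, a simplification for lengths not divisible by `w`.

```python
import numpy as np
from statistics import NormalDist

def znorm(x, eps=1e-8):
    """Z-normalize; a (near-)constant series is only mean-centered."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return x - x.mean() if sd < eps else (x - x.mean()) / sd

def paa(x, w):
    """Piecewise Aggregate Approximation: mean of w (nearly) equal segments."""
    return np.array([seg.mean() for seg in np.array_split(x, w)])

def sax_word(x, w=8, alpha=4):
    """Vanilla SAX word of length w over an alphabet of size alpha."""
    # Breakpoints cut N(0, 1) into alpha regions of equal probability.
    breakpoints = [NormalDist().inv_cdf(k / alpha) for k in range(1, alpha)]
    symbols = np.searchsorted(breakpoints, paa(znorm(x), w))
    return "".join(chr(ord("a") + int(s)) for s in symbols)

def delay_embed(x, dim=3, tau=1):
    """Time-delay embedding: row t is (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

# Usage: a rising ramp maps to increasing letters.
word = sax_word(np.arange(16), w=4, alpha=4)  # "abcd"
```

Because the breakpoints depend only on `alpha`, words from different series are directly comparable once each series is z-normalized.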

We also propose the Gramian Angular Field (GAF) and the Markov Transition Field (MTF), two novel approaches that encode a time series as an image. These representations show potential for visual inspection by humans, and when combined with deep learning models (Convolutional Neural Networks and Denoising Auto-encoders) they achieve excellent performance compared with other modern algorithms on classification and regression/imputation problems. GAF and MTF are themselves non-data adaptive encodings; pairing them with deep networks lets us learn models that extract abstract representations, as model-based approaches do.
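A minimal sketch of both encodings, assuming the usual construction: for GAF, rescale the series to [-1, 1], take the angle phi = arccos(x), and form cos(phi_i + phi_j) (summation field) or sin(phi_i - phi_j) (difference field); for MTF, assign each value to one of Q quantile bins, estimate the row-normalized first-order transition matrix W between bins, and set M[i, j] = W[q_i, q_j]. The function names and `n_bins` are hypothetical, and details such as downsampling the images before feeding them to a network are omitted.

```python
import numpy as np

def gramian_angular_field(x, kind="summation"):
    """GAF image of a 1-D series (non-constant input assumed)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("constant series cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1]; clipping guards against rounding outside arccos's domain.
    x_tilde = np.clip((2 * x - hi - lo) / (hi - lo), -1.0, 1.0)
    phi = np.arccos(x_tilde)  # polar angle for each time step
    if kind == "summation":
        return np.cos(phi[:, None] + phi[None, :])  # symmetric field
    if kind == "difference":
        return np.sin(phi[:, None] - phi[None, :])  # antisymmetric field
    raise ValueError("kind must be 'summation' or 'difference'")

def markov_transition_field(x, n_bins=4):
    """MTF image: M[i, j] = probability of moving from x[i]'s bin to x[j]'s bin."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, so each bin holds roughly the same number of points.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)  # bin index 0 .. n_bins-1 for each time step
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1  # count first-order transitions between bins
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return W[q[:, None], q[None, :]]
```

The GAF diagonal equals 2 * x_tilde**2 - 1, so the original (rescaled) values can be recovered from the summation image, while the MTF image preserves transition statistics rather than values.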

Finally, we propose to model time series by learning representations directly from raw data with model-based approaches. We will develop recurrent auto-encoders, in which a new Adaptive Risk-Averting/Seeking Criterion ensures the global optimum, to model real-world complex time series (dynamical systems) by learning the implicit data-generating distribution over time. These models will be applied to tasks such as classification, regression/imputation, and anomaly detection.
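The criterion itself is not defined in this abstract. As a hedged illustration only, the sketch below assumes a normalized log-mean-exp form of the kind used in risk-averting error criteria, C(lam) = (1/lam) * log(mean_k exp(lam * e_k^2)), where e_k^2 is the squared residual norm of sample k. Under this assumption, lam > 0 weights large errors more heavily (risk-averting), lam < 0 favors small errors (risk-seeking), and lam -> 0 recovers the mean squared error. How lam is adapted during training is not specified here, so it is a fixed argument, and the name `risk_sensitive_error` is hypothetical.

```python
import numpy as np

def risk_sensitive_error(residuals, lam):
    """Assumed criterion: (1/lam) * log(mean(exp(lam * e_k^2))).

    residuals: array of shape (K,) or (K, d), one row per sample.
    lam > 0 -> risk-averting, lam < 0 -> risk-seeking, lam == 0 -> MSE.
    """
    r = np.asarray(residuals, dtype=float)
    sq = (r.reshape(len(r), -1) ** 2).sum(axis=1)  # squared norm per sample
    if lam == 0:
        return sq.mean()  # limiting case: mean squared error
    z = lam * sq
    m = z.max()
    # Log-mean-exp with the maximum factored out to avoid overflow.
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

# Usage: squared errors are 1, 4, and 9 (mean 14/3).
errors = np.array([1.0, 2.0, 3.0])
averse = risk_sensitive_error(errors, 50.0)    # approaches the largest squared error, 9
seeking = risk_sensitive_error(errors, -50.0)  # approaches the smallest squared error, 1
```

In a training loop this value would replace the mean squared reconstruction error of the auto-encoder; the log-mean-exp formulation keeps it numerically stable even for large |lam|.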

Committee: Drs. Timothy Oates (Chair), James Lo (Math), Yun Peng and Matt Schmill