No.122 Analysing Large Collections of Time Series

NII Shonan Meeting Seminar 122

Overview

Shonan Village Center
12 – 15 February, 2018

Due to advances in sensor technology, there is a tremendous increase in the availability of massive data streams, many of which are time series in nature. For example, sensors could measure various parameters of a manufacturing environment, the vital parameters of a medical patient, or the fitness parameters of a healthy person; movement sensors could be installed in a fixed environment; and traffic sensors could measure the number of people or vehicles in a network. Other examples where massive time series data are generated include web applications tracking user clicks, machine log data generated in an IT infrastructure, and point-of-sale data in a large store. The aim of capturing this data could be monitoring production quality, monitoring the state of health of a patient, detecting intruders climbing over fences, predicting if a user is likely to click on an advertisement, forecasting the number of passengers on a particular train route, and so on.

These advances in sensor and cloud technologies have led to the Internet of Things (IoT) phenomenon, where huge sets of time series are collected in a distributed way and either the data or some aspect of them is transferred to a centralized cloud.

As a result of this deluge of information, new paradigms are needed for working with time series data. Instead of working at the level of individual observations, we can consider each time series as a single data point in a space of time series. Data analysis tasks such as forecasting, clustering, density estimation and outlier detection have largely been developed for Euclidean (feature) spaces, and cannot easily be applied in these spaces of time series. We need new algorithmic methods in order to handle the infinite-dimensional geometry of the space of time series.

Indeed, forecasting methods have a long history and have been studied in various scientific communities, including statistics, econometrics, control engineering and computer science. Many of the widely-used techniques (such as exponential smoothing and the Kalman filter) were developed in the 1960s. From an algorithmic perspective, these methods are elegant and efficient, which makes them very appealing when computational power is scarce. Since then, much progress has been made with respect to both theoretical and computational aspects of forecasting. However, the focus has been limited to forecasting individual time series, or a small number of time series. New methods are required in order to develop algorithms and models designed for forecasting millions of related series.

Once we take the perspective of studying a space of time series, we can consider potential time series that have not yet been observed (e.g., the data that will be observed after we install a new sensor). We may wish to forecast these unobserved time series, but the existing paradigms provide no way of doing so.

Visualization of large collections of time series is also challenging, and impossible using classical time series graphics. Similarly, identifying outliers in a space of time series, or defining the “median” of a large collection of time series, are difficult tasks, and existing tools are very limited or non-existent.

The workshop will bring together researchers in machine learning, statistics, econometrics and computer science along with industry practitioners to discuss the computational and conceptual challenges that we are facing in analysing very large collections of time series. In particular, we have invited researchers from the following communities, because we see huge potential in the cross-fertilization of these research communities under the lens of time series data.

  • Computational topology is a field at the intersection of computer science and mathematics that is concerned with algorithms to compute topological features of point clouds. Its focus on topology rather than geometry allows it to detect and to reason about non-linear structures that underlie the data. Topological data analysis (TDA) is an emerging field that draws from concepts and methods developed in computational topology and makes a connection to traditional statistical data analysis.
  • Functional data analysis (FDA) is a field in statistics that is concerned with probability distributions over curves or functions. Researchers in this field have shown that traditional methods used for point clouds, such as principal component analysis (PCA), can be extended to spaces of curves. To the best of our knowledge there has not been much interaction between the fields of FDA and TDA, although there is clearly considerable overlap.
  • Manifold learning is an umbrella term for different techniques that involve non-linear dimensionality reduction, metric embeddings and clustering. The underlying assumption of these methods is that the unobserved coordinates of the data points lie on an unknown manifold embedded in a high-dimensional space of observations.
  • Forecasting large collections of time series is a common problem in modern data-driven companies. Data scientists working in this area are either using algorithms that work on many individual time series in parallel, or they are using deep learning approaches that model a collection of time series as a group. This latter approach overlaps with the idea of studying a space of time series, but we know of no attempts to connect these models with the underlying implicit space of time series.

 

 

Organizers

Participants

(Alphabetical order)

Ms. Mahsa Ashouri, National Tsing Hua University, Taiwan

Prof. George Athanasopoulos, Monash University, Australia

Prof. Alexander Aue, University of California, Davis, USA

Jun-Prof. Maike Buchin, Ruhr-Universität Bochum, Germany

Prof. Kevin Buchin, TU Eindhoven, the Netherlands

Prof. Sanjay Chawla, University of Sydney, Australia

Prof. Dan Feldman, University of Haifa, Israel

Dr. Ben D Fulcher, Monash University, Australia

Prof. Jie Gao, Stony Brook University, USA

Dr. Michael Horton, University of Sydney, Australia

Dr. Maarten Löffler, Utrecht University, the Netherlands

Dr. Julie Novak, Netflix, USA

Dr. Anastasios Panagiotelis, Monash University, Australia

Prof. Nalini Ravishankar, University of Connecticut, USA

Dr. Hanlin Shang, Australian National University, Australia

Prof. Kate Smith-Miles, University of Melbourne, Australia

Dr. Frank Staals, Utrecht University, the Netherlands

Ms. Dilini Talagala, Monash University, Australia

Ms. Thiyanga Talagala, Monash University, Australia

Dr. Kevin Verbeek, TU Eindhoven, the Netherlands

Prof. Bei Wang, University of Utah, USA

Prof. Qiwei Yao, London School of Economics, UK

Schedule

12th February (Monday)

  • 9.00 – 10.30: Introduction
  • 10.30 – 11.00: Tea/Coffee Break
  • 11.00 – 12.00: Ben Fulcher: Feature-based time-series analysis (abstract) (Slides)
  • 12.00 – 13.30: Lunch
  • 13.30 – 14.15: Julie Novak: Challenges in Forecasting High-Dimensional Time Series (abstract) (Slides)
  • 14.15 – 15.00: Dan Feldman: Core-sets for learning streaming signals in real-time (abstract) (Slides)
  • 15.00 – 16.00: Open Problems
  • 16.00 – 16.30: Tea/Coffee Break
  • 16.30 – 18.00: Work in Groups
  • 18.00: Dinner

13th February (Tuesday)

  • 9.00 – 9.45: Alexander Aue: Functional data analysis, with a view on current time series methods (abstract) (Slides)
  • 9.45 – 10.30: Bei Wang: Topological Data Analysis In a Nutshell (abstract)
  • 10.30 – 11.00: Tea/Coffee Break
  • 11.00 – 12.00: Work in Groups
  • 12.00 – 13.30: Lunch
  • 13.30 – 13.45: Group Photo
  • 13.45 – 16.00: Work in Groups
  • 16.00 – 16.30: Tea/Coffee Break
  • 16.30 – 17.00: Midterm Feedback
  • 17.00 – 18.00: Work in Groups
  • 18.00: Dinner

14th February (Wednesday)

  • 9.00 – 9.45: Kevin Buchin & Maike Buchin: Trajectory Segmentation and Clustering (abstract)
  • 9.45 – 10.30: Work in Groups
  • 10.30 – 11.00: Tea/Coffee Break
  • 11.00 – 12.00: Work in Groups
  • 12.00 – 13.30: Lunch
  • 13.30 – 18.15: Excursion to Jomyoji Temple with Japanese tea ceremony
  • 18.15: Main Banquet

15th February (Thursday)

  • 9.00 – 10.30: Work in Groups
  • 10.30 – 11.00: Tea/Coffee Break
  • 11.00 – 12.00: Conclusions
  • 12.00 – 13.30: Lunch

Survey Talks

Time: Monday, 11.00 am

Speaker: Dr. Ben D Fulcher, Monash University, Australia

Title: Feature-based time-series analysis (Slides)

Abstract: I will give an introduction to feature-based approaches to time-series analysis. I will summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis will be given to emerging research that facilitates wide comparison of feature-based representations that allow us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. I argue that the future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning to aid understanding of the complex dynamical patterns in the time series we measure from the world.
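
As a minimal illustration of the feature-based idea described in this abstract (not taken from the talk itself), the Python sketch below maps each time series to a short, fixed-length vector of summary statistics, so that series of different lengths become points in the same feature space. The particular features (mean, standard deviation, lag-1 autocorrelation) and the names `lag1_autocorrelation` and `extract_features` are assumptions chosen for illustration only.

```python
from statistics import fmean, pstdev


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation; returns 0.0 for a constant series."""
    mu = fmean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((a - mu) * (b - mu) for a, b in zip(series, series[1:]))
    return num / denom


def extract_features(series):
    """Map one time series to a fixed-length feature vector (illustrative choice)."""
    return (fmean(series), pstdev(series), lag1_autocorrelation(series))


# Two series of different lengths become points in the same 3-dimensional space.
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
zigzag = [1.0, -1.0, 1.0, -1.0]
features = [extract_features(s) for s in (trend, zigzag)]
```

In this toy example the smoothly trending series receives a positive lag-1 autocorrelation and the alternating series a negative one, so the two are separated in feature space even though their raw observations cannot be compared point by point.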

 

Time: Monday, 1.30 pm

Speaker: Dr. Julie Novak, Netflix, USA

Title: Challenges in Forecasting High-Dimensional Time Series (Slides)

Abstract: This talk provides an overview of the challenges faced when forecasting high-dimensional time series data and the methods used to address them. We motivate the topic by describing issues that arise when forecasting IBM’s quarterly revenue for all their divisions and markets. It is often the case that such high-dimensional time series are naturally structured in a hierarchical manner. As a result, the forecast reconciliation problem becomes a critical one, where the goal is to make sure that forecasts produced independently at each node of the hierarchy are aggregate consistent while remaining as accurate as possible. We review the state-of-the-art methodology that practitioners currently use and highlight recent advances in the field. Finally, we discuss open questions and future research directions in this area.
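
To make the reconciliation problem in this abstract concrete, here is a small Python sketch for the simplest possible hierarchy, a total that must equal the sum of its children. It uses an unweighted least-squares adjustment, which is only one of several possible approaches; the abstract does not say which methods the talk covers, and the function name `reconcile_ols` and the numbers are hypothetical.

```python
def reconcile_ols(total_forecast, bottom_forecasts):
    """Least-squares reconciliation for a two-level hierarchy (total = sum of children).

    Returns the aggregate-consistent forecasts that are closest, in unweighted
    squared error, to the independently produced base forecasts.
    """
    n = len(bottom_forecasts)
    # Discrepancy between the total forecast and the sum of its children.
    gap = total_forecast - sum(bottom_forecasts)
    # Every node absorbs an equal share of the gap: children move towards
    # the total, and the total moves towards the children.
    share = gap / (n + 1)
    reconciled_bottom = [b + share for b in bottom_forecasts]
    reconciled_total = total_forecast - share
    return reconciled_total, reconciled_bottom


# Base forecasts that do not add up: 40 + 50 != 100.
total, bottom = reconcile_ols(100.0, [40.0, 50.0])
```

After the adjustment the children sum exactly to the total, and forecasts that were already consistent are returned unchanged. Larger hierarchies and weighted variants require a matrix formulation that is beyond this sketch.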

 

Time: Monday, 2.15 pm

Speaker: Prof. Dan Feldman, University of Haifa, Israel

Title: Core-sets for learning streaming signals in real-time (Slides)

Abstract: A coreset (or, core-set) for a given problem is a “compressed” representation of its input, in the sense that a solution for the problem with the (small) coreset as input would yield a provable (1+epsilon) factor approximation to the problem with the original (large) input.

Using traditional techniques, a coreset usually implies provable linear time algorithms for the corresponding optimization problem, which can be computed in parallel on the cloud/GPU, via one pass over big data, and using only logarithmic space (i.e., in the streaming model).

In this talk I will survey the main coreset techniques, with applications for real-time signal processing such as localization of nano-drones, GPS data, and new coresets for deep learning.
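
A very simple special case may help to illustrate the coreset idea; it is offered as a sketch and is not one of the constructions surveyed in the talk. For the one-dimensional problem of choosing a center q that minimizes the sum of squared distances to the data, three running numbers (count, sum, sum of squares) reproduce the cost of every candidate q exactly, so the approximation error is zero. The summary is built in one pass and can be merged across machines. The class name `MeanCoreset` is hypothetical.

```python
class MeanCoreset:
    """Constant-size summary of a stream for the sum-of-squared-distances cost."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        """Process one stream item, updating three running numbers."""
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other):
        """Combine two summaries built on separate chunks of the data."""
        merged = MeanCoreset()
        merged.count = self.count + other.count
        merged.total = self.total + other.total
        merged.total_sq = self.total_sq + other.total_sq
        return merged

    def cost(self, q):
        """Sum of (x - q)^2 over all items seen, computed from the summary alone."""
        return self.total_sq - 2 * q * self.total + self.count * q * q

    def best_center(self):
        """The minimizer of the cost is the mean of the items seen."""
        return self.total / self.count


# Summarize two chunks of a stream independently, then merge them.
left, right = MeanCoreset(), MeanCoreset()
for x in [1.0, 2.0]:
    left.add(x)
for x in [3.0, 6.0]:
    right.add(x)
summary = left.merge(right)
```

Here `summary.cost(q)` agrees with the cost computed on the full data for any q, because the sum of (x - q)^2 expands into terms that depend only on the three stored quantities. Coresets for harder problems are typically approximate, which is where the (1+epsilon) guarantee in the abstract comes in.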

 

Time: Tuesday, 9.00 am

Speaker: Prof. Alexander Aue, University of California, Davis, USA

Title: Functional data analysis, with a view on current time series methods (Slides)

Abstract: In this talk, I will trace the broader developments within the field of functional data analysis that have taken place during the past two or so decades, with attention focused on the case of dependent functional observations. I will discuss by way of examples the most important tools of statistical inference for independent data, such as dimension reduction techniques, and explain what issues arise under dependence and how these may be resolved. These general considerations will then be utilized to give an overview of more specialized prediction algorithms and estimation strategies for functional time series. The talk will conclude with some speculation about future research directions.

 

Time: Tuesday, 9.45 am

Speaker: Prof. Bei Wang, University of Utah, USA

Title: Topological Data Analysis In a Nutshell

Abstract: Topological Data Analysis (TDA) is an emerging area in exploratory data analysis and data visualization that has seen growing interest and notable successes with an expanding research community. Topological techniques which capture the “shape of data” have the potential to extract salient features and to provide robust descriptions of large and complex (i.e., high throughput, high-dimensional, incomplete and noisy) data. In this talk, I will survey some of the classic topological techniques, with a focus on their applications in data analysis and data visualization. I will also briefly touch on the new opportunities connecting TDA with time series analysis.

 

Time: Wednesday, 9.00 am

Speaker: Prof. Kevin Buchin, TU Eindhoven, the Netherlands and Prof. Maike Buchin, Ruhr-Universität Bochum, Germany

Title: Trajectory Segmentation and Clustering

Abstract: Nowadays more and more movement data is being collected, of people, animals, and vehicles. Analysing such data requires efficient algorithms. We first give a brief overview of work in this field, and then focus on algorithms for two analysis tasks: segmentation and clustering. Segmentation asks to split and possibly group trajectories such that they have similar movement characteristics. We present geometric and model-based approaches to segmentation, and show how these can be used to classify subtrajectories based on their characteristics. Clustering asks to group similar trajectories or subtrajectories. We present algorithmic results for clustering based on geometric similarity measures.

Introduction Round

NII Shonan Meeting Seminar 122- Introduction Round