PRESENT VA Workshop

Tools for Visual Analytics of Time Series.

23rd July 2024
Julian Rakuschek
Prof. Dr. Tobias Schreck

Goals for this workshop:

  1. Time series analysis tasks
  2. Visual analytics tools for these tasks
  3. Discussion of use cases

Clustering

  1. Raw time series
  2. Segment into equal-sized parts
  3. Cluster segments (e.g. k-means)
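
The three steps above can be sketched in plain Python. This is a minimal illustration, not the tool's implementation: the segment length, the deterministic farthest-point initialization and the helper names (`segment`, `centroid`, `kmeans`) are assumptions made for a reproducible example.

```python
import math


def segment(series, size):
    """Step 2: split a series into non-overlapping, equal-sized segments (a short tail is dropped)."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]


def centroid(members):
    """Point-wise average of all member segments (the cluster representative)."""
    return [sum(values) / len(values) for values in zip(*members)]


def kmeans(segments, k, iterations=20):
    """Step 3: plain k-means on the segments, using Euclidean distance."""
    # Deterministic farthest-point initialization (an assumption; k-means++ is a common alternative).
    centroids = [list(segments[0])]
    while len(centroids) < k:
        farthest = max(segments, key=lambda s: min(math.dist(s, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(segments)
    for _ in range(iterations):
        # Assignment step: each segment joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in segments]
        # Update step: recompute each centroid; keep the old one if a cluster ends up empty.
        for c in range(k):
            members = [s for s, label in zip(segments, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


# Step 1: a raw series of four hypothetical "days" with 24 hourly values each.
quiet = [0.0] * 24
busy = [0.0] * 12 + [5.0] * 4 + [0.0] * 8
series = quiet + busy + quiet + busy

labels, centroids = kmeans(segment(series, 24), k=2)
print(labels)  # [0, 1, 0, 1]
```

Each cluster centroid is the point-wise average of its member days, which is also how a cluster can be summarized visually.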

Search

Query
Where does this pattern occur?
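
A brute-force way to answer this question is to slide the query over the series and compute one distance per window (a distance profile). The sketch assumes z-normalized Euclidean distance, so a match is found regardless of offset and amplitude; the tools may use a different distance, and the function names are hypothetical.

```python
import math


def znorm(window):
    """Z-normalize a window; constant windows carry no shape and map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def distance_profile(series, query):
    """Distance between the query and every window of the same length in the series."""
    m = len(query)
    q = znorm(query)
    return [math.dist(q, znorm(series[start:start + m])) for start in range(len(series) - m + 1)]


series = [0, 0, 1, 2, 1, 0, 0, 0, 2, 4, 2, 0]
query = [1, 2, 1]
profile = distance_profile(series, query)
best = sorted(range(len(profile)), key=profile.__getitem__)[:2]
print(sorted(best))  # [2, 8]: the query shape also occurs, scaled up, at index 8
```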

Anomaly Detection

Find unexpected behavior.

Forecasting

Predict the next values of the series.

Tasks

Clustering: ~15 minutes
Search: ~15 minutes
Anomaly Detection: ~25 minutes
Forecasting: < 5 minutes

We want to use Visual Analytics

Image Source: J. Bernard, "Exploratory search in time-oriented primary data"

Why use Visual Analytics?

User-Friendly: No need to write R / Python scripts.

Dynamic Exploration: Instantly change the visualization with a few clicks.

Collaboration

Clustering

AirQ Sensor

AirQ Data

Goals

  1. Find pattern groups and compare across seasons
  2. Find anomalies
  3. Compare channels

Clustering the Sound Channel

Sound time series: each segment (day) is colored according to its cluster.
Each cluster is represented by the average of all its members.
The calendar view enables the user to find recurrences, such as a lecture every Monday.
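
The calendar view makes recurrences visible by laying the daily cluster labels out by weekday. A rough textual equivalent, assuming one label per day and a known first day, is to count labels per weekday; the function name and the example labels are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def labels_by_weekday(first_day, labels):
    """Count how often each cluster label occurs on each weekday (one label per day)."""
    table = {name: Counter() for name in WEEKDAYS}
    for offset, label in enumerate(labels):
        day = first_day + timedelta(days=offset)
        table[WEEKDAYS[day.weekday()]][label] += 1
    return table


# Two weeks of hypothetical daily labels: cluster 1 ("lecture day") on Mondays only.
labels = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
table = labels_by_weekday(date(2024, 7, 1), labels)  # 1 July 2024 was a Monday
for name, counts in table.items():
    print(name, dict(counts))
# Mon {1: 2}
# Tue {0: 2}
# ...
```

A weekday dominated by a single label, as Monday is here, points to a recurring event.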

Comparing Semesters

The tool allows the comparison of variables with each other.

Workflow

[Video] Demonstration of the tool: columns can be compared and easily adjusted via the settings menu.

PV Energy Production of Households

Another dataset: PV energy production of households in southern Germany. The daily segmentation shows how charging and discharging shift over the course of the day and across the seasons of the year.

PV Energy Production of Households

The monthly segmentation enables the user to see the difference between summer and winter months.

Which application scenarios can you imagine?

What are users currently using?

Example Scenario

  • Employees schedule a new jour fixe ...
  • ... but don't enter it in the company-wide calendar
  • CO2, temperature, and sound readings reveal this pattern
  • Pre-heat the room, open windows accordingly, etc.

Search

Google Search Queries

Szilagyi et al., "Impact of the pandemic and its containment measures in Europe upon aspects of affective impairments: a Google Trends informetrics study", 2023, Cambridge University Press.

Searching through Hover & Click

The user first selects a query pattern in the top chart by hovering over it and clicking. The window width is determined by the granularity setting (day, week, month, sliding window).
The matching time series are shown below, overlaid on each other so that they can be compared.

Heatmap Visualization of Results

[Video] The tool features a heatmap that shows where the search results are located. Adjusting the threshold controls how many patterns are found: a looser threshold returns more matches.
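
How the threshold changes the result set can be sketched on top of a precomputed distance profile (one distance per window start). The rule that skips windows overlapping an already accepted match is an assumption, added so that one occurrence is not reported several times; the function name is hypothetical.

```python
def matches_below(profile, window, threshold):
    """Window starts whose distance is within the threshold, best first, without overlaps."""
    candidates = sorted((d, start) for start, d in enumerate(profile) if d <= threshold)
    accepted = []
    for _, start in candidates:
        # Two windows of length `window` overlap if their starts are closer than `window`.
        if all(abs(start - other) >= window for other in accepted):
            accepted.append(start)
    return accepted


profile = [0.5, 0.1, 0.2, 3.0, 0.05, 0.3]
print(matches_below(profile, window=2, threshold=0.06))  # [4]
print(matches_below(profile, window=2, threshold=0.4))   # [4, 1]
```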

Self-Similarities

Sometimes the most similar windows lie in the immediate neighborhood of the query.
We can analyze self-similarities by comparing every segment of the time series with every other segment and plotting the pairwise distances in a heatmap.
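
A minimal sketch of such a self-similarity matrix, assuming equal-sized segments and Euclidean distance; the text rendering stands in for the color heatmap, and the function names are hypothetical.

```python
import math


def self_similarity(series, size):
    """Pairwise Euclidean distances between all equal-sized segments of a series."""
    segments = [series[i:i + size] for i in range(0, len(series) - size + 1, size)]
    return [[math.dist(a, b) for b in segments] for a in segments]


def render(matrix, shades=" .:-=+*#%@"):
    """Crude text heatmap: darker characters mean larger distances."""
    largest = max(max(row) for row in matrix) or 1.0
    return "\n".join(
        "".join(shades[round(d / largest * (len(shades) - 1))] for d in row) for row in matrix
    )


series = [0, 0, 1, 1, 0, 0, 1, 1]
matrix = self_similarity(series, 2)
print(render(matrix))
```

The alternating segments produce a checkerboard: blank cells mark identical segments, `@` cells the most dissimilar ones.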

AirQ Use Case

The user can select an interesting sound profile of a particular day and find all other occurrences of the most similar sound profiles, i.e. days with events at 1 p.m.

Sketch Editor (Future Work)

Bernard et al., "VisInfo: a digital library system for time series research data based on exploratory search—a user-centered design approach", 2015, International Journal on Digital Libraries.

Applications

  • Anomaly Detection: the user provides an example of an anomaly, e.g. a spurious trend in Google searches.
  • Find recurring patterns.

In which scenario would you search for a specific pattern in your series?

Anomaly Detection

GutenTAG

Good Time Series Anomaly Generator

Anomaly Types in the Dataset

There are many time series!

Desired Output: Scoring

Introducing AnoScout

AnoScout offers seven algorithms

AutoEncoder
Discrete Wavelet Transform
Subsequence Isolation Forest
Subsequence Local Outlier Factor
LSTM Forecasting
Random Black Forest
ARIMA

AutoEncoder

Src: https://towardsdatascience.com/autoencoders-and-the-denoising-feature-from-theory-to-practice-db7f7ad8fc78

Discrete Wavelet Transform I

Discrete Wavelet Transform II

  1. For each level $l$ of the wavelet decomposition, store sliding windows over the coefficients of that level in matrices $\mathbf{D}^{(l)}$ and $\mathbf{C}^{(l)}$.
  2. Estimate a Gaussian distribution on each matrix through maximum likelihood estimation (MLE).
  3. Create a binary vector $\mathbf{a}$ that labels each row (window) as either anomalous (1), i.e. unlikely under the estimated Gaussian, or normal (0).
  4. With each level, the number of coefficients is halved, so the levels form binary trees across the time series.
  5. If an anomaly is detected in one of the nodes, it is marked as an event $e$, which is pushed down to the leaves of the tree.
  6. The event count in each leaf is the final anomaly score.

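
The steps above can be illustrated with a heavily simplified sketch. It assumes a Haar wavelet with averaging coefficients, a window size of 1 (each matrix row is a single coefficient, so the Gaussian is univariate) and a z-score cutoff `z_max` as the "unlikely" rule; the actual algorithm uses longer sliding windows and a likelihood threshold. All names are hypothetical.

```python
import math


def haar_step(x):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return approx, detail


def flag_outliers(coeffs, z_max):
    """Fit a univariate Gaussian by MLE and flag coefficients that lie far in its tails."""
    n = len(coeffs)
    mu = sum(coeffs) / n                           # MLE of the mean
    var = sum((c - mu) ** 2 for c in coeffs) / n   # MLE of the variance (divides by n)
    if var == 0:
        return [0] * n
    sd = math.sqrt(var)
    return [1 if abs(c - mu) / sd > z_max else 0 for c in coeffs]


def dwt_event_scores(series, z_max=1.5):
    scores = [0] * len(series)
    approx, level = list(series), 0
    while len(approx) >= 2:
        approx, detail = haar_step(approx)
        level += 1
        span = 2 ** level                  # original points (leaves) covered by one coefficient
        for coeffs in (detail, approx):
            for i, flagged in enumerate(flag_outliers(coeffs, z_max)):
                if flagged:                # push the event down to every covered leaf
                    for t in range(i * span, min((i + 1) * span, len(series))):
                        scores[t] += 1
    return scores


series = [0.0] * 16
series[5] = 8.0                            # a single spike
print(dwt_event_scores(series))  # [0, 0, 0, 0, 4, 4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0]
```

The spike is flagged on the two finest levels; points covered by both events receive the highest count.
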
Discrete Wavelet Transform III

Subsequence Isolation Forest I

Partition the dataset using random trees: anomalies are easy to isolate and therefore end up closer to the root node.

Isolation tree construction:

  1. Randomly select an attribute $q$ of the dataset.
  2. Randomly select a value $p$ from the selected attribute column and split the dataset into two partitions at that value.
  3. Attach two new nodes to the current node: the left node contains all points whose value of $q$ is smaller than $p$, while the right node contains all points whose value is greater than or equal to $p$.
  4. Repeat these steps recursively until either a partition cannot be split further (it holds at most one point) or the maximum tree height specified by the user has been reached.

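
The construction steps translate into a short recursive function. In this sketch, subsequences are sliding windows stored as tuples, the tree is a nested dictionary, and the path-length correction for leaves holding several points is omitted; names and parameters are hypothetical.

```python
import random


def build_itree(points, rng, height=0, max_height=8):
    """Grow one isolation tree over a list of equal-length tuples (e.g. subsequences)."""
    # Stop when the partition cannot be split further or the height limit is reached.
    if len(points) <= 1 or height >= max_height:
        return {"size": len(points)}
    q = rng.randrange(len(points[0]))            # 1. random attribute
    p = rng.choice([x[q] for x in points])       # 2. random value from that attribute column
    left = [x for x in points if x[q] < p]       # 3. values smaller than p go left,
    right = [x for x in points if x[q] >= p]     #    values >= p go right
    return {                                     # 4. recurse on both partitions
        "q": q,
        "p": p,
        "left": build_itree(left, rng, height + 1, max_height),
        "right": build_itree(right, rng, height + 1, max_height),
    }


def path_length(tree, x, depth=0):
    """Number of edges from the root to the leaf that x falls into."""
    if "size" in tree:
        return depth
    branch = "left" if x[tree["q"]] < tree["p"] else "right"
    return path_length(tree[branch], x, depth + 1)


# Subsequences (sliding windows of width 3) of a flat series with one spike.
series = [0.0] * 20 + [10.0] + [0.0] * 20
windows = [tuple(series[i:i + 3]) for i in range(len(series) - 2)]
rng = random.Random(0)
forest = [build_itree(windows, rng) for _ in range(200)]


def mean_path(x):
    return sum(path_length(tree, x) for tree in forest) / len(forest)


print(mean_path((0.0, 10.0, 0.0)), mean_path((0.0, 0.0, 0.0)))
```

The window containing the spike gets isolated early in some trees, so its average path is shorter than that of the flat windows, which always run to the height limit.
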
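Subsequence Isolation Forest II

Subsequence Isolation Forest III

Assign an anomaly score to each point $x$ (here: a subsequence):

  1. Compute the average path length $E(h(x))$ of $x$ over multiple isolation trees, where $h(x)$ is the number of edges from the root to the leaf containing $x$.
  2. Normalize by $c(n)$, the average path length of an unsuccessful search in a binary search tree built from $n$ points, where $H(i) \approx \ln(i) + 0.5772156649$ is the harmonic number: \[c(n) = 2H(n - 1) - \frac{2(n - 1)}{n}\]
  3. Anomaly score $s$ for a point $x$: \[ s(x, n) = 2^{-\frac{E(h(x))}{c(n)}} \] Scores close to 1 indicate anomalies, while scores clearly below 0.5 indicate normal points.
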
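
Both formulas can be evaluated directly. The special cases $c(2) = 1$ and $c(n) = 0$ for $n \le 1$ follow the usual convention for isolation forests and are an addition to the slide; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
    return 2 * harmonic - 2 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n)), defined for n >= 2."""
    return 2 ** (-mean_path_length / c(n))


n = 256
print(round(c(n), 2))                   # 10.24
print(anomaly_score(c(n), n))           # 0.5: an average path length gives a neutral score
print(round(anomaly_score(3.0, n), 2))  # short paths push the score toward 1
```
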
Subsequence Local Outlier Factor

Compute a Local Outlier Factor (LOF) for each point to measure how strongly its density deviates from that of its local neighborhood. $N_k(p)$ denotes the set of the $k$ nearest neighbors of $p$, and $\text{k-distance}(q)$ the distance from $q$ to its $k$-th nearest neighbor.

  1. Reachability distance (RD) of $p$ w.r.t. $q$: \[RD_k(p, q) = \max \left\{ \text{k-distance}(q), d(p, q) \right\}\]
  2. Local reachability density (LRD) of a point $p$: \[ LRD_k(p) = \frac{1}{\sum_{q \in N_k(p)} \frac{RD_k(p, q)}{\lvert N_k(p) \rvert} } \]
  3. Local Outlier Factor (LOF): \[ LOF_k(p) = \frac{\sum_{q \in N_k(p)} \frac{LRD_k(q)}{LRD_k(p)} }{\lvert N_k(p) \rvert} \]

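
The three definitions map one-to-one onto code. This sketch simplifies $N_k(p)$ to exactly $k$ neighbors (ties broken by index), whereas the full definition may include additional tied points; it also assumes no point has $k$ or more exact duplicates, which would make a reachability sum zero. Names are hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k nearest neighbors of point i (ties broken by index)."""
    others = sorted((math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i)
    return [j for _, j in others[:k]]


def local_outlier_factors(points, k):
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance(q): distance from q to its k-th nearest neighbor.
    k_distance = [math.dist(points[q], points[neighbors[q][-1]]) for q in range(n)]

    def reach_dist(p, q):
        return max(k_distance[q], math.dist(points[p], points[q]))

    lrd = [1 / (sum(reach_dist(p, q) for q in neighbors[p]) / k) for p in range(n)]
    return [sum(lrd[q] / lrd[p] for q in neighbors[p]) / k for p in range(n)]


points = [(0,), (1,), (2,), (3,), (20,)]
print(local_outlier_factors(points, k=2))
# the first four values are 1.0; the isolated point at 20 gets roughly 11.67
```

A LOF close to 1 means the point is as dense as its neighbors; values well above 1 mark local outliers.
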
LSTM Forecasting I

Idea of an RNN:

Neural networks for time series!

LSTM Forecasting II

A plain RNN struggles to retain information from far in the past (vanishing gradients), so the LSTM is introduced:
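
For reference, one time step of a standard LSTM cell (forget, input, candidate and output gates) is sketched below. The weights are random and untrained, so the outputs are illustrative only. In a forecasting-based detector, a trained network typically predicts the next value and the forecast error serves as the anomaly score; the readout layer and training are omitted, and all names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(input_size, hidden_size, seed=0):
    """Random, untrained weights for the gates: forget (f), input (i), candidate (g), output (o)."""
    rng = random.Random(seed)
    width = hidden_size + input_size
    return {
        gate: ([[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(hidden_size)],
               [0.0] * hidden_size)
        for gate in "figo"
    }


def lstm_step(x, h, c, params):
    """One LSTM time step: returns the new hidden state h and cell state c."""
    z = h + x                                    # concatenation [h_{t-1}, x_t]
    act = {}
    for gate, (weights, bias) in params.items():
        pre = [a + b for a, b in zip(matvec(weights, z), bias)]
        squash = math.tanh if gate == "g" else sigmoid
        act[gate] = [squash(v) for v in pre]
    # The cell state keeps a gated share of the old memory and adds a gated share of the candidate.
    c_new = [f * c_old + i * g for f, c_old, i, g in zip(act["f"], c, act["i"], act["g"])]
    h_new = [o * math.tanh(cell) for o, cell in zip(act["o"], c_new)]
    return h_new, c_new


params = init_params(input_size=1, hidden_size=4)
h, c = [0.0] * 4, [0.0] * 4
for value in [0.1, 0.5, 0.9, 0.4, 0.2]:
    h, c = lstm_step([value], h, c, params)
print([round(v, 3) for v in h])
```

The forget gate decides how much of the cell state survives each step, which is what lets information from far in the past persist.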

Random Black Forest

RBF is a nested regression ensemble method:

  • Bagging Regressor, whose base estimators are
    • Random Forest Regressors (three in the diagram), each of which combines
      • Decision Tree Regressors (three per forest in the diagram)

ARIMA I

Forecasting model with three components:

  1. Autoregressive model: \[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \epsilon_t \]
  2. Moving average: \[ y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} \]
  3. Differencing: \[ y_t^{\prime} = y_t - y_{t-1} \]
ARIMA II

All combined:

\[ \begin{aligned} y_t^{\prime} = {} & c + \phi_1 y^{\prime}_{t-1} + \phi_2 y^{\prime}_{t-2} + \ldots + \phi_p y^{\prime}_{t-p} \\ & + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} + \epsilon_t \end{aligned} \]
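
To make the combined equation concrete, the sketch fits the simplest non-trivial case, ARIMA(1, 1, 0) (one differencing step, $p = 1$, no moving-average terms), by ordinary least squares and uses the absolute one-step forecast error as an anomaly score. Full ARIMA implementations estimate all coefficients, including the $\theta$ terms, by maximum likelihood; the function names are hypothetical.

```python
def fit_arima_110(y):
    """Least-squares fit of y'_t = c + phi * y'_{t-1} on the differenced series."""
    diff = [y[t] - y[t - 1] for t in range(1, len(y))]   # y'_t = y_t - y_{t-1}
    prev, curr = diff[:-1], diff[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var
    const = mean_curr - phi * mean_prev
    return const, phi


def forecast_errors(y, const, phi):
    """Absolute one-step-ahead errors; the first two points have no forecast (score 0)."""
    errors = [0.0, 0.0]
    for t in range(2, len(y)):
        predicted_diff = const + phi * (y[t - 1] - y[t - 2])
        errors.append(abs(y[t] - (y[t - 1] + predicted_diff)))   # undo the differencing
    return errors


y = [2.0 * t for t in range(20)]   # linear trend
y[10] += 15.0                      # injected spike
const, phi = fit_arima_110(y)
errors = forecast_errors(y, const, phi)
print(const, phi)                  # 3.0 -0.5
print(errors.index(max(errors)))   # 10
```

The spike also distorts the fitted coefficients, so the points right after it receive smaller but non-zero errors.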

Upload and Computation

Do we have anomalies now?

Not yet!

Anomaly Extraction
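
One common way to turn per-point scores into discrete anomalies is to threshold them and merge consecutive points above the threshold into intervals. This is an assumption about what the extraction step does, not a description of AnoScout's internals; the function name and the dictionary fields are hypothetical.

```python
def extract_anomalies(scores, threshold):
    """Merge consecutive points with score >= threshold into anomaly intervals."""
    anomalies, start = [], None
    for t, s in enumerate(scores + [float("-inf")]):   # sentinel closes a run at the end
        if s >= threshold and start is None:
            start = t
        elif s < threshold and start is not None:
            anomalies.append({
                "start": start,
                "end": t - 1,
                "length": t - start,
                "score": max(scores[start:t]),
            })
            start = None
    return anomalies


scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]
for anomaly in extract_anomalies(scores, threshold=0.5):
    print(anomaly)
# {'start': 1, 'end': 2, 'length': 2, 'score': 0.9}
# {'start': 4, 'end': 5, 'length': 2, 'score': 0.7}
```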

Algorithm Settings

Anomaly Representation

Manual Inspection

Exploring Anomalies

Heatmap

Scatterplot

Cluster Overview

Dissimilarities

Heatmap

Bird's Eye Perspective

The Severity of an Anomaly

\[\text{severity} = \sqrt{\text{score}^2 + \text{length}^2}\]
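
The severity combines an anomaly's score and its length into one number, so anomalies that are both strong and long rank highest. The slide does not say how the two quantities are scaled; the sketch assumes the score already lies in [0, 1] and divides each length by the longest anomaly's length so that both terms are comparable. Names and example values are hypothetical.

```python
import math


def severity(score, length):
    return math.hypot(score, length)   # sqrt(score^2 + length^2)


def rank_by_severity(anomalies):
    """Sort anomalies by severity, most severe first."""
    max_length = max(a["length"] for a in anomalies)
    # Assumption: length is scaled to [0, 1] so it is comparable with a score in [0, 1].
    return sorted(anomalies, key=lambda a: severity(a["score"], a["length"] / max_length),
                  reverse=True)


anomalies = [
    {"id": "A", "score": 0.9, "length": 2},
    {"id": "B", "score": 0.5, "length": 10},
    {"id": "C", "score": 0.2, "length": 1},
]
print([a["id"] for a in rank_by_severity(anomalies)])  # ['B', 'A', 'C']
```

Without scaling, raw lengths in time steps would dominate the score term entirely.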

The Scatterplot is Interactive!

The Recommender in action

How do we gain an overview of anomaly patterns?

Clustering!

Clustering of Anomalies

Finding The Most Dissimilar Anomalies

Main features of AnoScout summarized

  1. Exploration pipeline for anomalies in time-oriented data.
  2. 7 algorithms for computing anomalies.
  3. "Playground" for testing various algorithms.
  4. Using user labels to fine-tune the system.

Application Scenario

  • A company wants to install a new machine.
  • The machine conducts an etching process (semiconductor manufacturing).
  • Each etching process is recorded by sensors (e.g. pressure, temperature, and gas).
  • We want to use AnoScout to:
    1. Find possible anomaly patterns.
    2. Check which algorithms work well.

What anomalies do you encounter?

Why do you care?

Do you have a dataset that looks like this?

Forecasting

Predict Number of Airline Passengers

Introducing PredictPal

Workflow

Demo Video kindly provided by Yaryna Korduba.

Conclusion

Four Tasks

Clustering
Search
Anomaly Detection
Forecasting

Four Solutions

Clustering
Search
Anomaly Detection
Forecasting

Contact

Prof. Dr. Tobias Schreck

tobias.schreck@cgv.tugraz.at

Julian Rakuschek

julian.rakuschek@tugraz.at


We offer:
  • Preliminary Data Checks
  • Pairwise Visual Analytics Sessions
  • Visual Analytics Consulting

Thank you!