How to read these slides:

Use arrow keys to navigate.
Press O to switch to overview mode.
Press O again to zoom in.

PRESENT VA Workshop

Tools for Visual Analytics of Time Series.

23rd July 2024
Julian Rakuschek
Prof. Dr. Tobias Schreck

Goals for this workshop:

Time series analysis tasks
Visual analytics tools for these tasks
Discussion of use cases

Clustering

Raw time series
Segment into equal-sized parts
Cluster segments (e.g. k-means)
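
The three steps above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the implementation behind the tool; the function names, the toy series, and the choice of a plain k-means with random initialization are assumptions.

```python
import random

def segment(series, length):
    """Split a series into consecutive, non-overlapping segments of equal length.
    A trailing remainder shorter than `length` is dropped."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]

def sq_dist(a, b):
    """Squared Euclidean distance between two segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_segment(segments):
    """Point-wise average of several segments (the cluster representative)."""
    return [sum(column) / len(segments) for column in zip(*segments)]

def kmeans(segments, k, iterations=20, seed=0):
    """Plain k-means: assign each segment to its nearest centroid, then update centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(segments, k)
    labels = [0] * len(segments)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(s, centroids[c])) for s in segments]
        for c in range(k):
            members = [s for s, label in zip(segments, labels) if label == c]
            if members:  # keep the old centroid if a cluster ran empty
                centroids[c] = mean_segment(members)
    return labels, centroids

# Toy series (hypothetical data): alternating quiet and busy "days", four samples each.
series = [0, 0, 1, 0, 5, 9, 8, 5, 0, 1, 0, 0, 6, 8, 9, 5]
days = segment(series, 4)
labels, centroids = kmeans(days, k=2)
```

Each centroid is the average of its members, so it can be drawn as the representative curve of a cluster.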

Search

Query

Where does this pattern occur?

Anomaly Detection

Find unexpected behavior.

Forecasting

Predict next values in the series.

Tasks

Clustering

Anomaly Detection

Forecasting

Tasks

Clustering

~ 15 minutes

Anomaly Detection

~ 25 minutes

Forecasting

< 5 minutes

We want to use Visual Analytics

Image Source: J. Bernard, "Exploratory search in time-oriented primary data"

Why use Visual Analytics?

User-Friendly: No need to write R or Python scripts.

Dynamic Exploration: Instantly change the visualization with a few clicks.

Collaboration

Clustering

AirQ Sensor

AirQ Data

Goals

Find pattern groups and compare across seasons
Find anomalies
Compare channels

Clustering the Sound Channel

Sound time series: each segment (one day) is colored according to its cluster.

Each cluster is represented as the average of all its members.

The calendar view enables the user to find recurrences, such as a lecture every Monday.

Comparing Semesters

The tool allows the comparison of variables with each other.

Workflow

[Video] Demonstration of using the tool - columns can be compared and easily adjusted by using the settings menu.

PV Energy Production of Households

Another dataset: PV energy production of households in southern Germany. The daily segmentation shows how charge and discharge shift over the day and across the seasons of the year.

PV Energy Production of Households

The monthly segmentation enables the user to see the difference between summer and winter months.

Which application scenarios can you imagine?

What are users currently using?

Example Scenario

Employees plan new Jour-Fixe ...
... but don't enter it in a company-wide calendar
CO2, Temperature, Sound reveal this pattern
Pre-Heat room, open windows accordingly, etc.

Search

Google Search Queries

Szilagyi et al., "Impact of the pandemic and its containment measures in Europe upon aspects of affective impairments: a Google Trends informetrics study", 2023, Cambridge University Press.

Searching through Hover & Click

The user first selects a query pattern in the top chart by hovering over it and clicking. The window width is determined by the granularity setting (day, week, month, sliding window).

The matching time series windows are shown below, overlaid so that they can be compared.

Heatmap Visualization of Results

[Video] The tool features a heatmap which allows the user to see where the search results are located. By changing the threshold, more patterns can be found.
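
A threshold-based search of this kind can be sketched as follows. The Euclidean distance, the sliding-window scan, and all names are assumptions made for illustration; the slides do not state which distance measure the tool uses.

```python
import math

def search(series, query, threshold):
    """Return (start_index, distance) for every window whose Euclidean distance
    to the query is at most `threshold`, best matches first."""
    width = len(query)
    hits = []
    for start in range(len(series) - width + 1):
        distance = math.dist(series[start:start + width], query)
        if distance <= threshold:
            hits.append((start, distance))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical data: two similar peaks, one at index 2 and one at index 8.
series = [0, 0, 3, 6, 3, 0, 0, 0, 3, 5, 3, 0, 1, 0]
query = series[2:5]  # the user "clicks" the first peak

strict = search(series, query, threshold=0.5)  # only the query window itself
loose = search(series, query, threshold=1.5)   # also finds the second peak
```

Raising the threshold admits windows that are less similar to the query, which is why more patterns appear in the heatmap.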

Self Similarities

Sometimes the most similar windows occur in the close neighborhood of the query.

We can analyze self-similarities by comparing every segment of the time series with every other segment and plotting the pairwise distances in a heatmap.
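
A minimal sketch of such a self-similarity matrix, assuming equal-length segments and Euclidean distance (both are assumptions, as are the function names):

```python
import math

def segment(series, length):
    """Split a series into consecutive, non-overlapping segments of equal length."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]

def distance_matrix(segments):
    """Symmetric matrix of pairwise Euclidean distances between segments."""
    n = len(segments)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = math.dist(segments[i], segments[j])
    return matrix

# Hypothetical data: the third segment differs from the other three.
series = [0, 1, 0, 0, 1, 0, 5, 5, 5, 0, 1, 0]
matrix = distance_matrix(segment(series, 3))
```

Each cell `matrix[i][j]` would become one colored cell of the heatmap; a row with large values throughout marks a segment that is unlike all others.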

AirQ Use Case

The user can select an interesting sound profile of a particular day and find all other occurrences of the most similar sound profiles, e.g. days with events at 1 p.m.

Sketch Editor (Future Work)

Bernard et al., "VisInfo: a digital library system for time series research data based on exploratory search—a user-centered design approach", 2015, International Journal on Digital Libraries.

Applications

Anomaly detection: the user provides a sample of an anomaly, e.g. a spurious trend in Google searches.
Find recurring patterns.

In which scenario would you search for a specific pattern in your series?

Anomaly Detection

GutenTAG

Good Time Series Anomaly Generator

Anomaly Types in the Dataset

There are many time series!

Desired Output: Scoring

Introducing AnoScout

AnoScout offers seven algorithms

AutoEncoder

Discrete Wavelet Transform

Subsequence Isolation Forest

Subsequence Local Outlier Factor

LSTM Forecasting

Random Black Forest

ARIMA

AutoEncoder

Src: https://towardsdatascience.com/autoencoders-and-the-denoising-feature-from-theory-to-practice-db7f7ad8fc78

Discrete Wavelet Transform I

Discrete Wavelet Transform II

Store sliding windows for each level $l$ in matrices $\mathbf{D}^{(l)}$ and $\mathbf{C}^{(l)}$.
Fit a Gaussian distribution to each matrix through maximum likelihood estimation (MLE).
Create a binary vector $a$ that labels each row as either anomalous (1) or normal (0).
At each level, the number of coefficients is halved, so the levels form binary trees over the time series.
If an anomaly is detected in one of the nodes, it is marked as an event $e$, which is pushed down to the leaves of the tree.
The event count in the leaves is the final anomaly score.
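
The scoring scheme above can be illustrated with a strongly simplified sketch. Assumptions: a Haar wavelet is used, single coefficients with a univariate Gaussian stand in for the sliding-window matrices, a coefficient counts as anomalous when it lies more than `z` standard deviations from the mean, and the series length is divisible by `2 ** levels`. All names are illustrative.

```python
import math

def haar_step(values):
    """One Haar level: scaled pairwise sums (approximation) and differences (detail)."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]
    return approx, detail

def flag(values, z):
    """Fit a Gaussian by MLE (mean, biased variance) and label each value 1 or 0."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0] * len(values)
    return [1 if abs(v - mean) > z * std else 0 for v in values]

def dwt_event_counts(series, levels, z=2.0):
    """Count, for every sample (leaf), how many anomalous nodes lie above it."""
    counts = [0] * len(series)
    approx = list(series)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        span = 2 ** level  # number of leaves covered by one node at this level
        for coefficients in (detail, approx):
            for node, anomalous in enumerate(flag(coefficients, z)):
                if anomalous:  # push the event down to all leaves below the node
                    for leaf in range(node * span, (node + 1) * span):
                        counts[leaf] += 1
    return counts

# Hypothetical data: a flat series with a single spike at index 5.
series = [1.0] * 16
series[5] = 9.0
counts = dwt_event_counts(series, levels=2)
```

In this toy example the spike is flagged in both the detail and the approximation coefficients of the first level, so the two samples below that node receive the highest event count.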

Discrete Wavelet Transform III

Subsequence Isolation Forest I

Partition the dataset with a tree: anomalies are easy to isolate and therefore end up closer to the root node.

Isolation tree construction:

Randomly select an attribute $q$ in the dataset
Randomly select a value $p$ from the selected attribute column and split the dataset into two partitions based on that value.
Two new nodes are attached to the current node: the left node contains all rows whose value is smaller than $p$, while the right node contains all rows whose value is greater than or equal to $p$.
These steps are recursively repeated until either the dataset is depleted or the maximum tree height specified by the user has been reached.
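
The construction steps above can be sketched as a small recursive function. This is an illustration under assumptions, not the code used in AnoScout: trees are stored as dictionaries, attributes with a single distinct value are skipped, and the column minimum is excluded as a split value so that both partitions are non-empty. For the subsequence variant, each row would be one sliding window of the time series.

```python
import random

def build_tree(rows, height, max_height, rng):
    """Recursively build one isolation tree over a list of equal-length rows."""
    if len(rows) <= 1 or height >= max_height:
        return {"size": len(rows)}  # leaf
    # Assumption: only consider attributes that can actually separate the rows.
    splittable = [q for q in range(len(rows[0])) if len({row[q] for row in rows}) > 1]
    if not splittable:
        return {"size": len(rows)}  # all remaining rows are identical
    q = rng.choice(splittable)  # random attribute q
    column = [row[q] for row in rows]
    # Random value p from the column; the minimum is excluded (see lead-in).
    p = rng.choice([v for v in column if v > min(column)])
    left = [row for row in rows if row[q] < p]
    right = [row for row in rows if row[q] >= p]
    return {"q": q, "p": p,
            "left": build_tree(left, height + 1, max_height, rng),
            "right": build_tree(right, height + 1, max_height, rng)}

def path_length(tree, row, depth=0):
    """Number of edges from the root to the leaf that contains `row`."""
    if "size" in tree:
        return depth
    child = tree["left"] if row[tree["q"]] < tree["p"] else tree["right"]
    return path_length(child, row, depth + 1)

# Hypothetical data: five rows close together and one obvious outlier (last row).
rows = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.1, 0.1], [0.0, 0.0], [10.0, 10.0]]
rng = random.Random(42)
trees = [build_tree(rows, 0, 8, rng) for _ in range(300)]
avg_path = [sum(path_length(t, row) for t in trees) / len(trees) for row in rows]
```

Averaged over many trees, the outlier ends up with a shorter path than the other rows, which is the property the anomaly score builds on.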

Subsequence Isolation Forest II

Subsequence Isolation Forest III

Assign an anomaly score to each data point $x$

Compute the average path length $E(h(x))$ of point $x$ over multiple isolation trees.
Normalize by the average path length $c(n)$ for a dataset of $n$ points, where $H(i)$ is the $i$-th harmonic number: \[ c(n) = 2H(n-1) - \frac{2(n-1)}{n} \]
Anomaly score $s$ for a point $x$: \[ s(x, n) = 2^{-\frac{E(h(x))}{c(n)}} \]
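
The two formulas translate directly into code. In this sketch the harmonic number is approximated as $\ln(i) + 0.5772$ (Euler's constant), and the special cases for $n \le 2$ follow a common convention in Isolation Forest implementations; both are assumptions that the slides do not spell out.

```python
import math

EULER_GAMMA = 0.5772156649

def harmonic(i):
    """Approximate the i-th harmonic number H(i)."""
    return math.log(i) + EULER_GAMMA

def c(n):
    """Normalization term c(n) for a dataset of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # assumed convention for the smallest splittable dataset
    return 2 * harmonic(n - 1) - 2 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n)); values close to 1 indicate anomalies."""
    return 2 ** (-mean_path_length / c(n))

short_path = anomaly_score(2.0, 256)   # isolated quickly: high score
long_path = anomaly_score(12.0, 256)   # hard to isolate: low score
```

A point whose average path length equals $c(n)$ receives a score of exactly 0.5, so scores well above 0.5 point to anomalies.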

Subsequence Local Outlier Factor

Compute a Local Outlier Factor (LOF) for each point to measure the degree of deviation from normal data in the local neighborhood.

Reachability distance (RD) of $p$ w.r.t. $q$: \[RD_k(p, q) = \max \left\{ \text{k-distance}(q), d(p, q) \right\}\]
Local reachability density (LRD) for a point $p$: \[ LRD_k(p) = \frac{1}{\sum_{q \in N_k(p)} \frac{RD_k(p, q)}{\left| N_k(p) \right|} } \]
Local Outlier Factor (LOF): \[ LOF_k(p) = \frac{\sum_{q \in N_k(p)} \frac{LRD_k(q)}{LRD_k(p)} }{\left| N_k(p) \right|} \]
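
The three definitions can be followed step by step in a small sketch. Assumptions: Euclidean distance, exactly $k$ neighbors per point (ties are broken by index), and no duplicate points, which would make a reachability density undefined. For the subsequence variant, each point would be one sliding window of the series.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k nearest neighbors of point i, excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

def lof_scores(points, k):
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance(q): distance from q to its k-th nearest neighbor
    k_distance = [math.dist(points[i], points[neighbors[i][-1]]) for i in range(n)]

    def reach_dist(p, q):
        return max(k_distance[q], math.dist(points[p], points[q]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [len(neighbors[p]) / sum(reach_dist(p, q) for q in neighbors[p])
           for p in range(n)]
    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [sum(lrd[q] for q in neighbors[p]) / (lrd[p] * len(neighbors[p]))
            for p in range(n)]

# Hypothetical data: four corners of a unit square and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

Points inside a uniformly dense region obtain a LOF of about 1, while the distant point obtains a clearly larger value.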

LSTM Forecasting I

Idea of an RNN:

Neural networks for time series!

LSTM Forecasting II

A plain RNN struggles to retain information from far in the past, which is why the LSTM was introduced:

Random Black Forest

RBF is a nested regression ensemble method:

[Diagram] A Bagging Regressor combines several Random Forest Regressors, each of which is built from Decision Tree Regressors.

ARIMA I

Forecasting model with three components:

Autoregressive model: \[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \epsilon_t \]
Moving average: \[ y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} \]
Differencing: \[ y_t^{\prime} = y_t - y_{t-1} \]

ARIMA II

All combined:

\[ \begin{aligned} y_t^{\prime} = c &+ \phi_1 y^{\prime}_{t-1} + \phi_2 y^{\prime}_{t-2} + \ldots + \phi_p y^{\prime}_{t-p} \\ &+ \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} + \epsilon_t \end{aligned} \]
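
To make the combined equation concrete, the sketch below computes a one-step-ahead forecast for first-order differencing. Assumptions: the coefficients $c$, $\phi_i$ and $\theta_j$ are given rather than estimated, missing past values at the start of the series are treated as zero, and the function names are illustrative. In practice a statistics library would fit the coefficients.

```python
def difference(y):
    """First-order differencing: y'_t = y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def predict_diff(dy, eps, t, c, phi, theta):
    """Expected value of y'_t given the past differences and residuals."""
    ar = sum(phi[i] * dy[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
    ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
    return c + ar + ma

def forecast_next(y, c, phi, theta):
    """One-step-ahead forecast of y for an ARIMA(p, 1, q) model."""
    dy = difference(y)
    eps = []
    for t in range(len(dy)):
        # Residual: observed difference minus what the model expected.
        eps.append(dy[t] - predict_diff(dy, eps, t, c, phi, theta))
    # The future error term has expectation zero, so it is left out.
    dy_next = predict_diff(dy, eps, len(dy), c, phi, theta)
    return y[-1] + dy_next  # undo the differencing

trend = forecast_next([1, 2, 3, 4, 5], c=0.0, phi=[1.0], theta=[])
with_ma = forecast_next([0, 1, 1, 2], c=0.0, phi=[], theta=[0.5])
```

The forecast is made on the differenced series and then added back to the last observed value, which is how differencing lets the model follow a trend.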

Upload and Computation

Do we have anomalies now?

Not yet!

Anomaly Extraction

Algorithm Settings

Anomaly Representation

Manual Inspection

Exploring Anomalies

Heatmap

Scatterplot

Cluster Overview

Dissimilarities

Heatmap

Bird's Eye Perspective

The Severity of an Anomaly

\[\text{severity} = \sqrt{\text{score}^2 + \text{length}^2}\]
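
As a small worked example, the severity can be computed and used for ranking as follows. It is an assumption of this sketch that score and length are both normalized to a comparable range such as [0, 1]; the slide does not state the scaling, and the anomaly records are hypothetical.

```python
import math

def severity(score, length):
    """Euclidean norm of the (score, length) pair."""
    return math.hypot(score, length)

# Hypothetical anomalies with normalized score and length.
anomalies = [
    {"id": "a", "score": 0.9, "length": 0.1},
    {"id": "b", "score": 0.6, "length": 0.8},
    {"id": "c", "score": 0.2, "length": 0.3},
]
ranked = sorted(anomalies, key=lambda a: severity(a["score"], a["length"]), reverse=True)
```

With this measure a long anomaly with a moderate score can rank above a short anomaly with a high score.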

The Scatterplot is Interactive!

The Recommender in action

How do we gain an overview of anomaly patterns?

Clustering!

Clustering of Anomalies

Finding The Most Dissimilar Anomalies

Main features of AnoScout summarized

Exploration pipeline for anomalies in time-oriented data.
7 algorithms for computing anomalies.
"Playground" for testing various algorithms.
Using user labels to fine-tune the system.

Application Scenario

A company wants to install a new machine.
The machine conducts an etching process (semiconductor manufacturing).
Each etching process is recorded by sensors (e.g. pressure, temperature, and gas).
We want to use AnoScout to:
1. Find possible anomaly patterns.
2. Check which algorithms work well.

What anomalies do you encounter?

Why do you care?

Do you have a dataset that looks like this?

Forecasting

Predict Number of Airline Passengers

Introducing PredictPal

Workflow

Demo Video kindly provided by Yaryna Korduba.

Conclusion

Four Tasks

Clustering

Search

Anomaly Detection

Forecasting

Four Solutions

Clustering

Anomaly Detection

Forecasting

Contact

Prof. Dr. Tobias Schreck

tobias.schreck@cgv.tugraz.at

Julian Rakuschek

julian.rakuschek@tugraz.at

We offer:

Preliminary Data Checks
Pairwise Visual Analytics Sessions
Visual Analytics Consulting

Thank you!