AnoScout
A System for Visual Recommendations of Anomalies in Time-Oriented Data
Master Thesis by Julian Rakuschek
Supervised by Prof. Dr. Tobias Schreck
Three Easy Pieces
Introduction
AnoScout
Behind the scenes
This is an Anomaly (Outlier)
This is an Anomaly (Shapelet)
Some anomalies mess with the seasonality ...
... while others cause a drift
Easy to see visually, but how does a computer see it?
We use anomaly detection algorithms!
Forecasting-Based Methods
Anomaly Score is the deviation from the forecast.
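As a minimal illustration of this idea (the values are made up), the score is simply the pointwise deviation between observation and forecast:

```python
import numpy as np

# observed: the actual time series; forecast: a model's predictions
observed = np.array([1.0, 1.1, 0.9, 5.0, 1.0])
forecast = np.array([1.0, 1.0, 1.0, 1.0, 1.0])

# anomaly score = absolute deviation from the forecast
scores = np.abs(observed - forecast)  # the spike at index 3 scores highest
```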
AnoScout offers seven algorithms
AutoEncoder
Discrete Wavelet Transform
Subsequence Isolation Forest
Subsequence Local Outlier Factor
LSTM Forecasting
Random Black Forest
ARIMA
Discrete Wavelet Transform I
Discrete Wavelet Transform II
- Store sliding windows for each layer $l$ in matrices $\mathbf{D}^{(l)}$ and $\mathbf{C}^{(l)}$.
- Estimate a Gaussian distribution on the matrices through MLE.
- Create a binary vector $\mathbf{a}$ that labels each row as either anomalous (1) or normal (0).
- On each level, the number of coefficients is halved, thus creating multiple binary trees across the time series.
- If an anomaly is detected in one of the nodes, it is marked as an event $e$, which gets pushed down to the leaves of the tree.
- The event count in the leaves is the final anomaly score (sketched below).
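A minimal sketch of these steps, assuming the PyWavelets package (`pywt`); for brevity it scores only the detail matrices $\mathbf{D}^{(l)}$ and pushes events down to the covered time points rather than building explicit trees. The Haar wavelet, window size, and quantile threshold are illustrative choices:

```python
import numpy as np
import pywt  # PyWavelets

def dwt_anomaly_scores(ts, levels=4, window=4, q=0.99):
    counts = np.zeros(len(ts))  # event counts per point (the tree leaves)
    coeffs = pywt.wavedec(ts, "haar", level=levels)
    # coeffs = [approximation C^(L), detail D^(L), ..., detail D^(1)]
    for lvl, d in zip(range(levels, 0, -1), coeffs[1:]):
        D = np.lib.stride_tricks.sliding_window_view(d, window)  # rows of D^(l)
        mu = D.mean(axis=0)                                      # Gaussian MLE
        cov = np.cov(D, rowvar=False) + 1e-6 * np.eye(window)
        diff = D - mu
        dist = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        a = dist > np.quantile(dist, q)   # binary vector a: anomalous rows
        stride = 2 ** lvl                 # coefficients halve per level
        for i in np.flatnonzero(a):       # push each event e down to the leaves
            counts[i * stride : (i + window) * stride] += 1
    return counts                         # event count = final anomaly score
```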
Discrete Wavelet Transform III
Subsequence Isolation Forest I
Partition the dataset using trees: anomalies are easy to isolate and therefore end up closer to the root node.
Isolation tree construction:
- Randomly select an attribute $q$ in the dataset.
- Randomly select a split value $p$ from the selected attribute column and split the dataset into two partitions based on this value.
- Attach two new nodes to the current node; the left node contains all values smaller than $p$, while the right node contains all values greater than or equal to $p$.
- Repeat these steps recursively until either the dataset is depleted or the maximum tree height specified by the user has been reached.
Subsequence Isolation Forest II
Subsequence Isolation Forest III
Assign an anomaly score to each point $x$:
- Compute the average path length $h(x)$ of point $x$ across multiple isolation trees.
- Estimate the average path length of an unsuccessful search for normalization:
\[
c(n) = 2H(n-1) - \frac{2(n-1)}{n}
\]
where $H(i)$ denotes the $i$-th harmonic number.
- Anomaly score $s$ for a point $x$ (see the sketch below):
\[
s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}
\]
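In practice these computations are delegated to a library. A minimal Sub-IF sketch, assuming scikit-learn, where each sliding window becomes one point $x$ to isolate; the window size and forest size are illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def sub_if_scores(ts, window=32):
    # each sliding window of the series is one point x to isolate
    X = np.lib.stride_tricks.sliding_window_view(ts, window)
    forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
    win_scores = -forest.score_samples(X)  # score_samples returns -s(x, n)
    # map window scores back onto points: max over all covering windows
    scores = np.zeros(len(ts))
    for i, s in enumerate(win_scores):
        scores[i : i + window] = np.maximum(scores[i : i + window], s)
    return scores
```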
Subsequence Local Outlier Factor
Compute a Local Outlier Factor (LOF) for each point to measure its degree of deviation from normal data in its local neighborhood.
- Reachability distance (RD) of $p$ w.r.t. $q$:
\[
RD_k(p, q) = \max \left\{ \text{k-distance}(q), d(p, q) \right\}
\]
- Local reachability density (LRD) of a point $p$:
\[
LRD_k(p) = \left( \frac{\sum_{q \in N_k(p)} RD_k(p, q)}{\left| N_k(p) \right|} \right)^{-1}
\]
- Local Outlier Factor (LOF), sketched below:
\[
LOF_k(p) = \frac{\sum_{q \in N_k(p)} \frac{LRD_k(q)}{LRD_k(p)}}{\left| N_k(p) \right|}
\]
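A minimal Sub-LOF sketch, assuming scikit-learn; the window size and $k$ are illustrative:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def sub_lof_scores(ts, window=32, k=20):
    # each sliding window is one point p in the LOF computation
    X = np.lib.stride_tricks.sliding_window_view(ts, window)
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    # sklearn stores -LOF_k(p); negate so larger means more anomalous
    return -lof.negative_outlier_factor_
```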
LSTM Forecasting I
Neural networks for time series! The idea of an RNN:
LSTM Forecasting II
A plain RNN is not able to retain information from far in the past; therefore, the LSTM is introduced:
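A minimal sketch of an LSTM one-step-ahead forecaster, assuming PyTorch; the layer sizes and window length are illustrative, not AnoScout's actual configuration:

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, window, 1)
        out, _ = self.lstm(x)         # hidden state at every time step
        return self.head(out[:, -1])  # forecast the next value

# anomaly score = deviation of the next observation from this forecast
model = LSTMForecaster()
windows = torch.randn(8, 32, 1)       # a batch of sliding windows
forecast = model(windows)             # shape: (8, 1)
```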
Random Black Forest
RBF is a nested regression ensemble method:
[Diagram: a Bagging Regressor ensembles multiple Random Forest Regressors, each of which in turn ensembles multiple Decision Tree Regressors.]
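A minimal sketch of this nesting, assuming scikit-learn (the `estimator` keyword requires scikit-learn ≥ 1.2); the ensemble sizes are illustrative:

```python
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

# a bag of random forests, each of which is itself an ensemble of trees
rbf = BaggingRegressor(
    estimator=RandomForestRegressor(n_estimators=3, max_depth=5),
    n_estimators=3,
)
# used like any regressor: rbf.fit(X_train, y_train); rbf.predict(X_test)
```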
ARIMA I
Forecasting model with three components:
- Autoregressive model: \[
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \epsilon_t
\]
- Moving average:
\[
y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q}
\]
- Differencing: \[
y_t^{\prime} = y_t - y_{t-1}
\]
ARIMA II
All combined:
\[
\begin{gather*}
y_t^{\prime} = c + \phi_1 y^{\prime}_{t-1} + \phi_2 y^{\prime}_{t-2} + \ldots + \phi_p y^{\prime}_{t-p} + {} \\
\theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} + \epsilon_t
\end{gather*}
\]
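A minimal sketch, assuming statsmodels; the order $(p, d, q) = (2, 1, 2)$ and the toy series are arbitrary illustrations:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

ts = np.cumsum(np.random.randn(200))    # toy series with a trend
fit = ARIMA(ts, order=(2, 1, 2)).fit()  # p=2 AR terms, d=1 difference, q=2 MA terms
scores = np.abs(fit.resid)              # deviation from the forecast = anomaly score
```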
Ensemble Method
Compute weighted average of anomaly scores
What if there are many time series?
A Data Scientist wants to know:
Which anomaly patterns arise?
Which algorithms work well?
Which anomalies are the most severe?
This is a Visual Analytics Process!
Image Source: J. Bernard, "Exploratory search in time-oriented primary data"
Inspiration: MTV
Image Source: Liu et al., "MTV: Visual Analytics for Detecting, Investigating, and Annotating Anomalies in Multivariate Time Series"
Inspiration: Cyclic Time Series Data with Matrix and Glyph Representations
Image Source: Suschnigg et al., "Visual Exploration of Anomalies in Cyclic Time Series Data with Matrix and Glyph Representations"
AnoScout
A System for Anomaly Exploration
GutenTAG
Good Time Series Anomaly Generator
Anomaly Types in the Dataset
Do we have anomalies now?
Not yet!
Exploring Anomalies
Heatmap
Scatterplot
Cluster Overview
Dissimilarities
Heatmap
Bird's Eye Perspective
The Severity of an Anomaly
\[\text{severity} = \sqrt{\text{score}^2 + \text{length}^2}\]
i.e., the Euclidean norm of an anomaly's score and its length.
The Scatterplot is Interactive!
Ranking Anomalies
Two methods implemented:
- Sort by severity and rating.
- Estimate the rating based on similar items' ratings:
\[
\hat{r}_i = \frac{\sum_{j \in I^k} sim(i, j) \, r_j}{\sum_{j \in I^k} sim(i, j)}
\]
where $I^k$ denotes the k-nearest neighbors of anomaly $i$.
Measuring Similarity
AnoScout uses Dynamic Time Warping
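A minimal sketch of both pieces: a plain dynamic-programming DTW (a dedicated library would be used in practice) and the kNN rating estimate $\hat{r}_i$ from the previous slide. The `rated` list and the similarity transform $sim = 1/(1 + DTW)$ are illustrative assumptions:

```python
import numpy as np

def dtw(a, b):
    # classic O(n*m) dynamic program for the DTW distance
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def estimate_rating(anomaly, rated, k=5):
    # rated: list of (subsequence, user_rating) pairs; sim = 1 / (1 + DTW)
    sims = sorted(((1.0 / (1.0 + dtw(anomaly, s)), r) for s, r in rated),
                  reverse=True)[:k]      # the k nearest neighbors I^k
    return sum(s * r for s, r in sims) / sum(s for s, _ in sims)
```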
The Recommender in action
How do we gain an overview of anomaly patterns?
Clustering!
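The slides do not fix a particular clustering algorithm; one plausible sketch, assuming SciPy, is hierarchical clustering on the pairwise DTW distances (the linkage method and cluster count are assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# toy symmetric matrix of pairwise DTW distances between anomaly subsequences
dtw_matrix = np.array([[0, 2, 9], [2, 0, 8], [9, 8, 0]], dtype=float)
dists = squareform(dtw_matrix)                   # condensed form for linkage
Z = linkage(dists, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # e.g. two anomaly patterns
```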
Finding The Most Dissimilar Anomalies
Optimizing the Ensemble Hyperparameters
Hypothesis: user labels $y_i$ can be approximated by the anomaly scores $\hat{y}_i$:
\[
\frac{\mathbf{a}^{T}_i \mathbf{w}}{\left\| \mathbf{w} \right\|} = \hat{y}_i \approx y_i
\]
The Optimization Problem
\[
\begin{gather*}
\min \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{k} w_j \hat{y}_{ij} \right)^2 \text{ s.t. } \\
\sum_{j=1}^{k} w_j = 1, \\
w_j \geq 0, \forall j = 1 \ldots k
\end{gather*}
\]
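A minimal sketch of this constrained least-squares problem, assuming SciPy; `Y` is the $n \times k$ matrix of per-algorithm scores $\hat{y}_{ij}$ and `y` holds the user labels:

```python
import numpy as np
from scipy.optimize import minimize

def fit_ensemble_weights(Y, y):
    n, k = Y.shape
    loss = lambda w: np.mean((y - Y @ w) ** 2)             # the objective above
    constraints = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    bounds = [(0.0, None)] * k                             # w_j >= 0
    res = minimize(loss, x0=np.full(k, 1.0 / k), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x  # the ensemble weights w
```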
Grid search for the other parameters:
- Differencing: To apply or not to apply, that is the question.
- Threshold: Search space in the interval $[0, 1]$ with step size $0.001$.
- Savitzky-Golay filter: window-length search space in the interval $[10, 200]$ with step size $1$ (see the sketch below).
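A sketch of the grid search; `evaluate` is a hypothetical stand-in for the scoring routine, which the slides do not show:

```python
import numpy as np
from itertools import product

def evaluate(differencing, threshold, savgol_window):
    # hypothetical stand-in: would run the ensemble with these parameters
    # and return a quality measure (e.g. F1 against the user labels)
    return 0.0

grid = product(
    [False, True],                # differencing: apply or not
    np.arange(0.0, 1.0, 0.001),   # threshold in [0, 1], step 0.001
    range(10, 201),               # Savitzky-Golay window in [10, 200], step 1
)
best_params = max(grid, key=lambda p: evaluate(*p))
```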
Result
Caution: Garbage in, garbage out!
WIP: Shape Matching
Boost shapelets that are similar to highly rated anomalies.
Mute shapelets rated as "Nominals".
Summary
Contributions
- Exploration pipeline for anomalies in time-oriented data.
- "Playground" for testing various algorithms.
- Using user labels to fine-tune the system.
Future Work
- Advanced Recommenders, e.g. RecWalk or BPR [Prof. Lex]
- Explainable AI, e.g. a 2D Projection for the Sub-LOF method [Prof. Holzinger]
- Impact of user labels [Prof. Schreck]
- Evaluation: Domain experts and proper use case
Please tell me your ideas! 🙏
Conclusion
AnoScout combines a variety of anomaly detection algorithms and presents their results to the user through interactive visual recommendations.
Thank you!