Supervised vs Unsupervised methods
Supervised learning consists of input variables x and an output variable Y. An algorithm is used to learn the mapping function f from the input to the output: Y = f(x). The goal is to learn this mapping function such that, given new input data x, the output variable Y can be predicted for that data. Unsupervised learning involves only input data x and no corresponding output variables; its goal is to model the underlying structure or distribution of the data in order to learn more about it.
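As a minimal illustration of this contrast, the sketch below (a toy example using scikit-learn; the data and model choices are purely illustrative) fits a supervised model from labeled pairs (x, Y) and an unsupervised model from x alone:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))                    # input variables x

# Supervised: labels Y are available, so we learn the mapping Y = f(x)
# and use it to predict Y for new inputs.
Y = 3.0 * x[:, 0] + rng.normal(scale=0.1, size=100)
f = LinearRegression().fit(x, Y)
Y_new = f.predict(rng.normal(size=(5, 1)))

# Unsupervised: only x is available; we model its underlying structure,
# here by grouping the observations into clusters.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(x)
```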
In supervised anomaly detection, a model must characterize the system very precisely, both in normal behavior and in the presence of anomalies. However, normal behaviors can be multiple, as can behaviors in the presence of anomalies. This large number of possible system behaviors means that a large amount of labeled data is needed to capture as many different behaviors, normal and abnormal, as possible. This is all the more complicated as anomalies are rare events: the smaller the training dataset, the fewer anomalies it contains, and these are precisely the elements we want to be able to discriminate efficiently.
Unsupervised learning is therefore well suited to the anomaly detection problem, since it does not require labeling large datasets. Moreover, some anomalies come from new behaviors of the system; by definition, such behaviors cannot be correctly classified by supervised anomaly detection methods.
Machine learning-based methods
The methods presented in this section fall into three categories proposed in the survey by Domingues et al. [35]: isolation, neighbourhood-based, and domain-based methods. Isolation algorithms consider a point as an anomaly when it is easy to isolate from the others. Neighbourhood-based models look at the neighbourhood of each data point to identify outliers. Domain-based methods rely on the construction of a boundary separating the nominal data from the rest of the input space.
A common characteristic of machine learning-based techniques is that they typically model the dependency between a current time point and previous ones by transforming the multivariate time series T into a sequence of windows W = {W_1, ..., W_T}, where W_t is a time window of length K at a given time t: W_t = {x_{t-K+1}, ..., x_{t-1}, x_t}.
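A minimal sketch of this windowing step (the function name to_windows and all shapes are illustrative):

```python
import numpy as np

def to_windows(T: np.ndarray, K: int) -> np.ndarray:
    """Turn a multivariate time series T of shape (n, d) into a sequence of
    overlapping windows W_t = {x_{t-K+1}, ..., x_t}, one per time step t."""
    n = T.shape[0]
    return np.stack([T[t - K + 1 : t + 1] for t in range(K - 1, n)])

series = np.random.randn(1000, 5)   # 1000 time steps, 5 variables (toy data)
W = to_windows(series, K=10)        # shape (991, 10, 5)
```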
Neighbourhood-based methods
Among neighbourhood-based methods, which study the neighbourhood of every point to identify anomalies, the Local Outlier Factor (LOF) [59] measures the local deviation of a given data point with respect to its neighbours. Based on the k-nearest neighbours [60], the local density of an observation is evaluated by considering the k closest observations in its neighbourhood. The anomaly score is then calculated by contrasting its local density with those of its k nearest neighbours. A high score indicates a lower density than the neighbours and therefore a potential anomaly. LOF has been applied to multivariate time series [61], demonstrating its ability to detect anomalies in long-term data.
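As an illustration, a minimal sketch applying scikit-learn's LocalOutlierFactor to flattened time windows (the number of neighbours and the toy data are illustrative):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

W = np.random.randn(991, 10, 5)             # time windows (toy data)
X = W.reshape(len(W), -1)                   # flatten each window into one point

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                 # -1 for outliers, 1 for inliers
scores = -lof.negative_outlier_factor_      # high score = lower density than neighbours
```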
Density-based spatial clustering of applications with noise (DBSCAN) [62] is a clustering method that groups data points in high-density areas (many nearby neighbours) and marks points in low-density regions (few neighbours) as anomalous. DBSCAN thus classifies points into three categories. Core points have at least minPts points within their distance neighbourhood. Density-reachable points have at least one core point in their neighbourhood. All other points are considered anomalies. To handle multivariate time series, DBSCAN considers each time window as a point, the anomaly score being the distance from the point to the nearest cluster [63].
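A minimal sketch of this use of DBSCAN (eps and min_samples are illustrative; the distance to the nearest cluster is approximated here by the distance to the closest core point, which is one possible choice among several):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.randn(500, 10)                # flattened time windows (toy data)
db = DBSCAN(eps=5.0, min_samples=5).fit(X)  # labels_ == -1 marks noise points

# Score each window by its distance to the nearest cluster, approximated
# here by the distance to the closest core point of any cluster.
core = X[db.core_sample_indices_]
dists = np.linalg.norm(X[:, None, :] - core[None, :, :], axis=-1)
scores = dists.min(axis=1)                  # large distance = likely anomaly
```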
Deep learning-based methods
DNN-based methods are a sub-category of machine learning-based approaches that rely on deep neural networks. Given the explosion of DNN-based methods in recent years, they are presented as a separate category.
An Auto-Encoder (AE) [66] is an artificial neural network combining an encoder E and a decoder D. The encoder takes the input window W and maps it into a set of latent variables Z, whereas the decoder maps the latent variables Z back into the input space as a reconstruction Ŵ. The difference between the original input W and the reconstruction Ŵ is called the reconstruction error, and the training objective is to minimize this error. Auto-encoder-based anomaly detection uses the reconstruction error as the anomaly score: time windows with a high score are considered to be anomalies [21].
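A minimal PyTorch sketch of this scheme (the architecture sizes and the training loop are illustrative, not those of any specific method from the literature):

```python
import torch
import torch.nn as nn

K, d, latent = 10, 5, 8                     # window length, variables, latent size

class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.E = nn.Sequential(nn.Flatten(), nn.Linear(K * d, 32), nn.ReLU(),
                               nn.Linear(32, latent))            # encoder E
        self.D = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(),
                               nn.Linear(32, K * d))             # decoder D

    def forward(self, W):                   # W: (batch, K, d)
        return self.D(self.E(W)).view_as(W) # reconstruction W_hat

model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
W = torch.randn(64, K, d)                   # normal training windows (toy data)

for _ in range(100):                        # minimize the reconstruction error
    loss = ((model(W) - W) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                       # per-window anomaly score at inference
    score = ((model(W) - W) ** 2).mean(dim=(1, 2))
```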
Auto-Encoders and Generative Adversarial Networks limitations
Since only samples of normal data are used during training, the AE reconstructs normal data very well at inference, while failing to do so with anomalous data it has not encountered. However, if an anomaly is too small, i.e. relatively close to normal data, its reconstruction error will be small and the anomaly will not be detected.
This occurs because the AE aims to reconstruct the input data as faithfully (as close to normal) as possible. To overcome this problem, the AE should be able to identify whether the input data contains an anomaly before producing a good reconstruction.
A GAN is trained in an adversarial way between a Generator and a Discriminator, and GAN-based anomaly detection uses the output of the Discriminator as the anomaly score at inference. However, GAN training is not always easy, due to problems such as mode collapse and non-convergence [81], often attributed to the imbalance between the generator and the discriminator: if an imbalance arises, the network can no longer learn correctly, since one of the two is overwhelmed by the performance of the other. Techniques to stabilize training and ensure good convergence have been proposed, such as the Wasserstein GAN (WGAN [82]), but they remain to be improved, especially because of the added complexity they involve.
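For concreteness, a minimal PyTorch sketch of GAN-based anomaly scoring on flattened windows (all dimensions, learning rates, and data are toy values; published GAN-based detectors are considerably more elaborate):

```python
import torch
import torch.nn as nn

m = 50                                      # flattened window size (toy)
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, m))
D = nn.Sequential(nn.Linear(m, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()

real = torch.randn(128, m)                  # normal training windows (toy data)
ones, zeros = torch.ones(128, 1), torch.zeros(128, 1)

for _ in range(200):
    fake = G(torch.randn(128, 16))
    # Discriminator step: push real windows towards 1, generated ones towards 0.
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: try to fool the discriminator.
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

with torch.no_grad():                       # windows the discriminator rejects
    score = 1.0 - D(real).squeeze(1)        # as "not real" get a high score
```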
UnSupervised Anomaly Detection (USAD)
The UnSupervised Anomaly Detection (USAD) method is formulated as an AE architecture within a two-phase adversarial training framework. On the one hand, this makes it possible to overcome the intrinsic limitations of AEs by training a model capable of identifying when the input data does not contain an anomaly, and thus of performing a good reconstruction. On the other hand, the AE architecture provides stability during adversarial training, thereby addressing the mode-collapse and non-convergence problems encountered in GANs.
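A compact sketch of this two-phase scheme, in which an epoch-dependent weight shifts training from pure reconstruction (phase 1) to an adversarial game between two auto-encoders sharing an encoder (phase 2), following the USAD loss formulation; dimensions, epoch count, and the alpha/beta trade-off values are illustrative:

```python
import torch
import torch.nn as nn

m = 50                                            # flattened window size (toy)
E  = nn.Sequential(nn.Linear(m, 16), nn.ReLU())   # shared encoder
D1 = nn.Sequential(nn.Linear(16, m))              # decoder of AE1
D2 = nn.Sequential(nn.Linear(16, m))              # decoder of AE2
AE1 = lambda W: D1(E(W))
AE2 = lambda W: D2(E(W))
opt1 = torch.optim.Adam(list(E.parameters()) + list(D1.parameters()), lr=1e-3)
opt2 = torch.optim.Adam(list(E.parameters()) + list(D2.parameters()), lr=1e-3)
mse = lambda a, b: ((a - b) ** 2).mean()

W = torch.randn(128, m)                     # normal training windows (toy data)
for n in range(1, 101):                     # weights shift from phase 1 to phase 2
    w = 1.0 / n
    # AE1: reconstruct W, and fool AE2 into reconstructing AE1's output well.
    loss1 = w * mse(AE1(W), W) + (1 - w) * mse(AE2(AE1(W)), W)
    opt1.zero_grad(); loss1.backward(); opt1.step()
    # AE2: reconstruct W, but amplify the error on data coming from AE1.
    loss2 = w * mse(AE2(W), W) - (1 - w) * mse(AE2(AE1(W).detach()), W)
    opt2.zero_grad(); loss2.backward(); opt2.step()

alpha, beta = 0.5, 0.5                      # trade-off between the two error terms
with torch.no_grad():                       # anomaly score at inference
    score = alpha * ((AE1(W) - W) ** 2).mean(dim=1) \
          + beta * ((AE2(AE1(W)) - W) ** 2).mean(dim=1)
```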
Table of contents:
List of Figures
List of Tables
1 Introduction
1.1 Context and Motivations
1.2 Contributions
1.3 Structure of thesis
1.4 Publications
1.5 Challenge participation
2 Anomaly detection in time-series
2.1 Time Series
2.1.1 Univariate vs Multivariate
2.1.2 Decomposition of a time series
2.1.2.1 Trend
2.1.2.2 Seasonality
2.1.2.3 Level
2.1.2.4 Noise
2.1.3 Stationarity
2.2 Anomaly Detection
2.2.1 Types of Anomalies in Time Series
2.2.2 Supervised vs Unsupervised methods
2.2.3 Taxonomy
2.2.4 Conventional methods
2.2.4.1 Control Charts methods
2.2.4.2 Forecast methods
2.2.4.3 Decomposition methods
2.2.4.4 Similarity-search approach
2.2.5 Machine learning-based methods
2.2.5.1 Isolation methods
2.2.5.2 Neighbourhood-based methods
2.2.5.3 Domain-based methods
2.2.6 Deep learning-based methods
3 Unsupervised Anomaly Detection on Multivariate Time Series
3.1 Introduction
3.2 Auto-Encoders and Generative Adversarial Networks limitations
3.3 UnSupervised Anomaly Detection (USAD)
3.3.1 Method
3.3.2 Implementation
3.3.3 Experimental setup
3.3.3.1 Datasets
3.3.3.2 Feasibility study: Orange’s dataset
3.3.3.3 Evaluation Metrics
3.3.4 Experiments and Results
3.3.4.1 Overall performance
3.3.4.2 Effect of parameters
3.3.4.3 Training time
3.3.4.4 Ablation Study
3.3.4.5 Feasibility study
3.4 Conclusion
4 From Univariate to Multivariate Time Series Anomaly Detection with Non-Local Information
4.1 Introduction
4.2 Related works
4.3 From univariate to multivariate time series
4.4 Experiments and Results
4.4.1 Datasets
4.4.2 Experimental setup
4.4.2.1 Implementation
4.4.3 Results
4.5 Discussion and Conclusions
5 Are Deep Neural Networks Methods Needed for Anomaly Detection on Multivariate Time Series?
5.1 Introduction
5.2 Related work
5.3 Experimental setup
5.3.1 Public Datasets
5.3.2 Evaluation Metrics
5.4 Experiments and Results
5.4.1 Benchmark Performance
5.4.2 Analysis of WADI
5.4.3 Impact of training set size
5.4.4 Discussion
5.5 Conclusion
6 Conclusion and Perspectives
6.1 Conclusion
6.2 Perspectives
References