Open access peer-reviewed chapter - ONLINE FIRST

A Comparative Analysis of Image Encoding of Time Series for Anomaly Detection

Written By

Chris Aldrich

Submitted: 27 July 2023 Reviewed: 02 August 2023 Published: 15 December 2023

DOI: 10.5772/intechopen.1002535


From the Edited Volume

Time Series Analysis - Recent Advances, New Perspectives and Applications [Working Title]

Jorge Rocha, Cláudia M. Viana and Sandra Oliveira


Abstract

A novel approach to anomaly detection in time series data is based on the use of multivariate image analysis techniques. With this approach, time series are encoded as images that make them amenable to analysis by pretrained deep neural networks. Few studies have evaluated the merits of the different image encoding algorithms, and in this investigation, encoding of time series data with Euclidean distance plots or unthresholded recurrence plots, Gramian angular fields, Morlet wavelet scalograms, and an ad hoc approach based on the presentation of the raw time series data in a stacked format are compared. This is done based on three case studies where features are extracted from the images with gray level co-occurrence matrices, local binary patterns and the use of a pretrained convolutional neural network, GoogleNet. Although no method consistently outperformed all the other methods, the Euclidean distance plots and GoogleNet features yielded the best results.

Keywords

  • image encoded time series
  • wavelet spectrograms
  • Gramian angular fields
  • Euclidean distance plots
  • recurrence plots
  • gray level co-occurrence matrices
  • local binary patterns
  • GoogleNet

1. Introduction

Anomaly detection in nonlinear time series is a field that focuses on identifying unusual or abnormal patterns in time series data that exhibit nonlinear behavior. Time series data refers to a sequence of observations collected over time, and nonlinear time series data refers to data that does not follow a linear trend or relationship.

Anomalies, also known as outliers or aberrations, are data points or patterns that deviate significantly from the expected or normal behavior of the time series. These anomalies may represent critical events, irregularities, faults, or rare occurrences that are of interest for further investigation.

Detecting anomalies in nonlinear time series is a difficult task, since traditional methods based on linear assumptions or statistical measures may not effectively capture the nonlinear relationships and patterns present in the data.

These challenges can be addressed by use of nonlinear models and time series analysis methods specifically designed to handle nonlinear data. They aim to capture the intricate structures, dependencies, and irregularities present in the time series and identify deviations from the expected behavior. This can be accomplished by learning a representation of the original or normal time series behavior that can be used as a reference for the identification of anomalous patterns in the time series.

The quality of these models depends on the quality of the features that invariably have to be extracted from the time series to enable analysis. In this chapter, imaging of time series is empirically investigated by comparing different methods of image generation, as well as the extraction of features from these images. One of the major drivers in time series imaging is that this allows the use of pretrained convolutional neural networks and currently emerging vision transformers [1] that have recently redefined the state of the art in image analysis.

This is a major advantage in time series classification, where end-to-end learning direct from the images can be used. In what is referred to as transfer learning, this often entails further training of some of the feature layers of the networks, which could markedly improve the performance of the model, even if relatively few data are available for this purpose [2].

In the context of anomaly or change point detection in time series data, where labeled data are not necessarily available, this is not a direct option. Under these circumstances, the problem can be treated as a multivariate statistical process monitoring problem, where images are used as a basis for comparison. This would require a reference time series or sets of time series from which image features could be extracted. New time series data would subsequently be encoded as images, and the features extracted from these time series could then be compared with those from the reference time series in some formal monitoring scheme.

In this investigation, such formal monitoring approaches are not considered. Instead, the focus is on the comparative merits of the imaging and feature extraction methods, as these would be critical to the performance of any monitoring scheme.

The rest of the chapter is organized as follows. In the next section, a brief overview of image encoding of time series data is given. This is followed by a summary of the analytical methodology used in the investigation. The next three sections deal with three different case studies. In the final section, the results are discussed and the conclusions of the study are summarized.


2. Imaging of time series data

The rationale for using image-encoding techniques of time series data is that it allows for the use of well-established image processing and computer vision techniques for analysis. It also provides a visual representation that can be easily interpreted by humans, aiding in the understanding of complex temporal patterns.

Over the last decade, different approaches have been investigated. Broadly, these can be categorized as dealing with univariate or multivariate data, as outlined in Figure 1. Some methods, such as Euclidean distance plots and their thresholded version, recurrence plots, can deal naturally with time series in more than one dimension, while others, like Gramian angular field plots, cannot. Regardless, multivariate time series can be handled by stacking or multibanding of images obtained from individual time series, for example [3, 4].

Figure 1.

Image-encoding methods for time series.

Some feature extraction methods, such as pretrained convolutional neural networks that are increasingly used with imaged time series, can also naturally process image triplets, so in principle, they could handle up to three different types of (gray-scale) images (Figure 2).

Figure 2.

Comparative analysis of the use of the most popular approaches for image encoding of time series data.

Gramian angular fields: GAFs were among the first approaches used to encode time series for multivariate image analysis [5] and remain the most popular approach for this purpose. GAFs have been used in diverse fields, ranging from tool wear classification with CNNs [6] to identification of nontechnical losses in power systems [7], recognition of wearable sensor-based human activity [8], fault detection in transmission lines [9], time series classification with vision transformers [10], and so on.

Distance and recurrence plots [11, 12]: Recurrence plots are graphical representations of the recurrence of a signal’s pattern. They are created by comparing each point in a signal to all other points and determining if they are close enough to be considered recurrent. Recurrence plots display the time points at which recurrences occur as black pixels or dots on a two-dimensional grid, providing insights into the signal’s temporal structure and periodicity.

They are essentially thresholded distance plots, with the Euclidean distance as the most popular measure of proximity between points. Recurrence plots underpin recurrence quantification analysis (RQA), a well-established approach to time series analysis with applications in a wide range of disciplines. Although recurrence plots per se could also be used directly as a basis for feature extraction by means other than RQA, this approach has not been widely established. On the other hand, unthresholded distance plots are seeing growing use in time series analysis [13, 14, 15].

Wavelet scalograms [16, 17, 18, 19]: Wavelet analysis involves decomposing a signal into different frequency components using wavelet transforms. By representing the resulting coefficients as an image, known as a wavelet scalogram, both time and frequency information can be captured simultaneously. The intensity or color of each pixel represents the magnitude of the corresponding frequency component. Image encoding of time series data with wavelet scalograms is relatively new and not as well-established as encoding with Gramian angular fields or recurrence plots, for example.

Ad hoc methods: These could include methods specific to the domain, such as heat maps for furnaces, day-hour power consumption heat maps [20], or climatic events with a time axis and a spatial dimension and time delay embeddings [21, 22] that are not captured in recurrence plots.


3. Analytical methodology

The analytical methodology is shown schematically in Figure 3. It consists of several steps. The first is segmentation of the time series by a moving window of a user-specified length, b, moving with a step size, s. If b = s, the time series is segmented into a number of contiguous segments. Otherwise, if b > s, the segments are overlapping. In some predictive monitoring schemes, it may also be possible to use b < s parameter configurations.

Figure 3.

General approach to image encoding and feature extraction from time series.
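As a minimal sketch of the segmentation step described above (function and variable names are illustrative, not taken from the chapter):

```python
import numpy as np

def segment_series(y, b, s):
    """Split a 1-D series into windows of length b taken every s samples.

    b == s gives contiguous segments; b > s gives overlapping segments.
    """
    y = np.asarray(y)
    starts = range(0, len(y) - b + 1, s)
    return np.stack([y[i:i + b] for i in starts])

# 2000 samples cut into 20 contiguous segments of length 100
segments = segment_series(np.arange(2000), b=100, s=100)
print(segments.shape)  # (20, 100)
```

Each row of the returned array is one segment, ready to be encoded as an image.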

3.1 Image encoding

Four methods were used to encode the time series into images. These were based on Euclidean distance matrices of the time series, Gramian angular field matrices, imaging derived from Toeplitz matrices of time series, as well as wavelet scalograms, as briefly summarized below.

3.1.1 Euclidean distance plots

Euclidean distance plots are obtained from the distances between pairs of points in the time series segment, as shown in Figure 4. Imaging of the pairwise distance values (dij) yields a Euclidean distance plot. Thresholded Euclidean distance plots, also referred to as recurrence plots, are widely used in recurrence quantification analysis of time series.

Figure 4.

Derivation of Euclidean distance plots from time series data.
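A minimal numpy sketch of this construction for a univariate segment (function names are illustrative):

```python
import numpy as np

def distance_plot(y):
    """Pairwise Euclidean distances d_ij = |y_i - y_j| for a 1-D segment.

    Imaging this matrix yields the (unthresholded) distance plot.
    """
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    return np.abs(y - y.T)

def recurrence_plot(y, eps):
    """Binary recurrence plot: 1 where points are within distance eps."""
    return (distance_plot(y) <= eps).astype(np.uint8)

d = distance_plot([0.0, 1.0, 3.0])
print(d)
```

Thresholding the same matrix at a radius eps recovers the recurrence plot discussed in Section 2.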

3.1.2 Gramian angular fields

Gramian angular fields can serve as a basis for imaging signals, particularly time series data. A GAF represents the dynamics of a signal by encoding pairwise angles between its data points in visual form, providing a novel way to capture temporal relationships and patterns within a signal.

To obtain the Gramian angular field of a time series $y_i \in [-1, 1]$, for all $i = 1, 2, \ldots, N$ and time stamps $t_i = 0, 1, \ldots, N$, the time series in the Cartesian coordinate system is converted to a polar coordinate system with the following equations, in which $N$ serves as a constant to regularize the span of the polar coordinates.

\theta_i = \arccos(y_i) \qquad (E1)

r_i = \frac{t_i}{N} \qquad (E2)

\mathrm{GAF} = \begin{bmatrix}
\cos(\theta_1 + \theta_1) & \cos(\theta_1 + \theta_2) & \cdots & \cos(\theta_1 + \theta_N) \\
\cos(\theta_2 + \theta_1) & \cos(\theta_2 + \theta_2) & \cdots & \cos(\theta_2 + \theta_N) \\
\vdots & \vdots & \ddots & \vdots \\
\cos(\theta_N + \theta_1) & \cos(\theta_N + \theta_2) & \cdots & \cos(\theta_N + \theta_N)
\end{bmatrix} \qquad (E3)

Finally, the Gramian angular field (GAF) matrix is imaged, with the elements of the field serving as pixel values.
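The encoding can be sketched as follows, assuming the common convention of min-max rescaling the segment into [−1, 1] before taking the arccosine:

```python
import numpy as np

def gramian_angular_field(y):
    """Gramian angular (summation) field of a 1-D segment.

    The segment is rescaled into [-1, 1] (an assumed convention),
    mapped to angles theta_i = arccos(y_i), and the field
    G_ij = cos(theta_i + theta_j) is returned as an image.
    """
    y = np.asarray(y, dtype=float)
    y = 2 * (y - y.min()) / (y.max() - y.min()) - 1   # rescale to [-1, 1]
    theta = np.arccos(np.clip(y, -1.0, 1.0))
    return np.cos(theta[:, None] + theta[None, :])

g = gramian_angular_field(np.sin(np.linspace(0, 6.28, 64)))
print(g.shape)  # (64, 64)
```

The resulting matrix is symmetric with entries in [−1, 1], which map directly to pixel intensities.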

3.1.3 Toeplitz stacking of time series segments

As an ad hoc approach to imaging the time series, it is stacked by generating a Toeplitz or diagonal-constant matrix from the time series, $Y = \{y_i\} \in \mathbb{R}^N$, $i = 1, 2, \ldots, N$, as indicated by Eq. (4)

T = \begin{bmatrix}
y_1 & y_2 & \cdots & y_N \\
y_2 & y_1 & \cdots & y_{N-1} \\
\vdots & \vdots & \ddots & \vdots \\
y_N & y_{N-1} & \cdots & y_1
\end{bmatrix} \qquad (E4)
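A diagonal-constant stack of this form can be generated with scipy (a sketch; the chapter's exact implementation is not specified):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_image(y):
    """Symmetric (diagonal-constant) Toeplitz stack of a segment:
    T_ij = y_{|i-j|+1}, so y_1 lies on the diagonal and y_N in the
    corners, as in Eq. (4)."""
    return toeplitz(np.asarray(y, dtype=float))

t = toeplitz_image([1.0, 2.0, 3.0])
print(t)
```

`scipy.linalg.toeplitz` with a single argument builds exactly this symmetric form from the first column.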

3.1.4 Wavelet scalograms

Morlet wavelets were used to generate wavelet scalograms of the time series segments. Morlet wavelets are a type of complex-valued wavelet that combines the benefits of both wavelets and Fourier analysis and are commonly used in time-frequency analysis and signal-processing tasks to extract information about the time-varying frequency content of a signal.

Mathematically, the Morlet wavelet can be defined as:

\psi(t) = A\, e^{i\omega t}\, e^{-t^2/(2\sigma^2)} \qquad (E5)

where ψ(t) represents the complex-valued Morlet wavelet at time t, A is a normalization constant, ω is the angular frequency, and σ is the standard deviation of the Gaussian envelope.

In practice, Morlet wavelets with basis functions as shown in Figure 5 are often used in wavelet transforms or continuous wavelet transforms (CWT). The wavelet transform convolves the Morlet wavelet with the signal of interest at different scales or frequencies, generating a time-frequency representation known as a scalogram. This scalogram provides insights into the signal’s frequency content and how it evolves over time, and it is these images that were used to encode the time series segments.

Figure 5.

Morlet waveform. (Source: Author).
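A numpy-only sketch of a Morlet scalogram computed by direct convolution (the scale grid, the choice ω = 6, and the truncation of the wavelet support are illustrative assumptions):

```python
import numpy as np

def morlet_scalogram(y, scales, omega=6.0):
    """CWT magnitude with a Morlet wavelet, by direct convolution.

    Each row holds |CWT| at one scale; plotting the matrix as an
    image gives the time-frequency encoding of the segment.
    """
    y = np.asarray(y, dtype=float)
    out = np.empty((len(scales), len(y)))
    for k, a in enumerate(scales):
        t = np.arange(-4 * a, 4 * a + 1)
        # Morlet wavelet exp(i*omega*t/a) * exp(-(t/a)^2 / 2), L2-normalized
        psi = np.exp(1j * omega * t / a) * np.exp(-(t / a) ** 2 / 2)
        psi /= np.sqrt(a)
        # correlation = convolution with the reversed conjugate wavelet
        out[k] = np.abs(np.convolve(y, np.conj(psi)[::-1], mode="same"))
    return out

sc = morlet_scalogram(np.sin(0.3 * np.arange(512)), scales=np.arange(1, 33))
print(sc.shape)  # (32, 512)
```

In practice a library CWT (e.g., PyWavelets) would be used; this sketch only makes the construction explicit.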

3.2 Feature extraction

3.2.1 Gray-level co-occurrence matrices

Gray-level co-occurrence matrices (GLCMs) are statistical tools used to describe the spatial relationships between pixel intensities in an image. GLCMs capture the frequency of occurrence of pairs of pixel values at specific spatial displacements or directions within an image, as indicated in Figure 6. As such, they represent the joint probability distributions of pixel intensity values for a given set of pixels and their respective neighboring pixels within a defined distance and direction.

Figure 6.

Gray-level image (left) from which two gray-level co-occurrence matrices (middle and right) are derived.

More formal descriptions can be found elsewhere, for instance, Löfstedt et al. [23], but basically, GLCMs and features derived from them are generated by:

  1. Choosing a distance and direction (e.g., horizontal, vertical, or diagonal) for the pixel pairs.

  2. Creating a matrix with the number of rows and columns equal to the number of gray levels in the image (usually 256 for 8-bit images).

  3. Checking the designated neighboring pixel for each pixel in the image according to a chosen distance and direction.

  4. Incrementing the corresponding element in the GLCM matrix based on the pixel pair values.

  5. Normalizing the GLCM, so that the elements of the matrix can be interpreted as probability estimates.

The resulting GLCM can then be used to derive various texture features, of which the Haralick features or texture descriptors are the most commonly used [24]. Four of these features were used in this study, namely, contrast, correlation, energy, and homogeneity, as defined by Eqs. (6)–(9). In all cases, the number of gray levels used in the gray level co-occurrence matrices was 8, and the offset between pixels was a unit distance at an angle of zero degrees.

Contrast: Captures the intensity contrast between neighboring pixels.

\mathrm{CON} = \sum_i \sum_j (i - j)^2\, G_{ij} \qquad (E6)

Correlation: Measures the linear dependency between pixel pairs, with $m_i$ and $s_i$ ($m_j$ and $s_j$) denoting the mean and standard deviation of the elements in the $i$th row ($j$th column) of the co-occurrence matrix.

\mathrm{COR} = \sum_i \sum_j \frac{(i - m_i)(j - m_j)\, G_{ij}}{s_i s_j} \qquad (E7)

Energy: Quantifies the homogeneity of the image texture.

E = \sum_i \sum_j G_{ij}^2 \qquad (E8)

Homogeneity: Reflects the closeness of the GLCM values to the diagonal, indicating homogeneous textures.

H = \sum_i \sum_j \frac{G_{ij}}{1 + (i - j)^2} \qquad (E9)
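The five steps and the four Haralick descriptors above can be sketched as a slow but explicit reference implementation:

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """GLCM for one offset: count co-occurring gray-level pairs at the
    chosen displacement, then normalize so entries are probabilities."""
    img = np.asarray(img)
    G = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            i2, j2 = i + dy, j + dx
            if 0 <= i2 < h and 0 <= j2 < w:
                G[img[i, j], img[i2, j2]] += 1
    return G / G.sum()

def haralick(G):
    """Contrast, correlation, energy, and homogeneity of a normalized GLCM."""
    n = G.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    mi, mj = (i * G).sum(), (j * G).sum()
    si = np.sqrt(((i - mi) ** 2 * G).sum())
    sj = np.sqrt(((j - mj) ** 2 * G).sum())
    return {
        "contrast": ((i - j) ** 2 * G).sum(),
        "correlation": ((i - mi) * (j - mj) * G).sum() / (si * sj),
        "energy": (G ** 2).sum(),
        "homogeneity": (G / (1 + (i - j) ** 2)).sum(),
    }

img = np.random.default_rng(0).integers(0, 8, (32, 32))  # 8 gray levels
print(haralick(glcm(img)))
```

The default offset (dx=1, dy=0) corresponds to a unit distance at an angle of zero degrees, as used in this study.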

3.2.2 Local binary patterns

Local binary patterns (LBPs) are designed to capture the texture information of an image by comparing the intensity of a central pixel with its surrounding neighbors [25, 26]. The basic idea is to convert the local image patch around each pixel into a binary pattern, which is then used to represent the texture characteristics of that particular region. Local binary pattern methods have, among others, been used by Mitiche et al. [27] for the extraction of features from encoded time series data.

The basic process of computing LBP involves the following steps, as indicated in Figure 7.

  1. Selection of a central pixel: For each pixel in the image, a local neighborhood around that pixel is considered.

  2. Comparison of central pixel with neighboring pixels: The intensity value of the central pixel is compared with the intensities of its neighbors. The comparisons are performed in a circular or square region around the central pixel.

  3. Binary pattern formation: For each comparison, if the intensity of the neighboring pixel is smaller than the intensity of the central pixel, the result is set to 0; otherwise, it is set to 1. These binary values are then concatenated to form a binary pattern for that specific region.

  4. Histogram creation: The binary patterns are collected for all the pixels in the image, and a histogram is created, showing the frequency of occurrence of different binary patterns.

Figure 7.

Feature extraction from images with local binary patterns.
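The steps above can be sketched in numpy; note that this basic variant produces 256-bin histograms, whereas the 59 features in Table 1 come from the uniform-pattern variant of LBP:

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbor LBP (radius 1, no rotation invariance):
    each interior pixel is compared with its eight neighbors and
    the comparison bits are packed into one byte."""
    img = np.asarray(img)
    h, w = img.shape
    # neighbor offsets, clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neighbor >= center).astype(np.uint8) << bit)
    return codes

def lbp_histogram(img):
    """Normalized 256-bin histogram of the LBP codes."""
    h = np.bincount(lbp_image(img).ravel(), minlength=256).astype(float)
    return h / h.sum()

img = np.random.default_rng(1).integers(0, 256, (64, 64))
print(lbp_histogram(img).shape)  # (256,)
```

The normalized histogram is the feature vector; restricting it to the 58 uniform patterns plus one catch-all bin yields the 59 features used here.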

With the parameter settings in Table 1, 59 LBP features were generated for each image.

Hyperparameter   Comment                                                Setting
P                Number of pixel neighbors                              8
r                Central pixel neighborhood radius                      1
Other            Encoding of rotation information                       No
                 Interpolation method used to compute pixel neighbors   Linear
                 Normalization of LBP histograms                        L2

Table 1.

Local binary pattern hyperparameter settings.

3.2.3 GoogleNet

GoogleNet [28], also known as Inception-v1, is a deep convolutional neural network (CNN) architecture that was developed by researchers at Google for image classification tasks. It was the winner of the ImageNet Large Scale Visual Recognition Competition (ILSVRC) in 2014.

To use GoogleNet for feature extraction from images, the network is typically truncated or frozen after the desired layer. The earlier layers capture low-level features such as edges, corners, and basic textures, while deeper layers capture more complex and abstract features specific to the task on which the network was trained (e.g., ImageNet classification). These higher-level features can then be used as input to other machine learning models or for further analysis.

One of the most common uses of GoogleNet is in transfer learning. Transfer learning involves leveraging the pretrained weights of GoogleNet on a large-scale dataset (e.g., ImageNet) and fine-tuning it on a smaller, task-specific dataset. By doing so, one can benefit from the powerful feature extraction capabilities of the network without needing to train it from scratch, even if the target dataset is relatively small. In all cases, features were extracted from the images by simply passing the images through GoogleNet; that is, no further training was done, and the features were generated solely from the pretraining of GoogleNet on the ImageNet database.

For each image, 1024 features were extracted from the final global average pooling layer in GoogleNet.

3.3 Evaluation of features

The quality of the features derived by means of different algorithms from images generated by different encoding schemes was evaluated through their use as predictors in machine learning models, specifically random forests [29]. In all cases, the hyperparameter settings of the random forests that were used are summarized in Table 2. This provided for as consistent an evaluation scheme as possible for the features.

Hyperparameter   Description                                       Value
ntrees           Number of trees                                   200
ntry             Number of observations drawn at each split        0.7N
mtry             Number of variables drawn at each split           M/3
Replacement      TRUE/FALSE                                        TRUE
Node size        Minimum number of samples in a terminal node      5
Splitting rule   Criterion on which splitting of nodes was based   Gini

Table 2.

Hyperparameters of random forest constructed with a training data set consisting of N samples and M variables.
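A hedged sketch of this evaluation step with scikit-learn, approximating the Table 2 settings (the synthetic features below stand in for the extracted image features; the chapter's own implementation is not specified):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Two synthetic classes of feature vectors, well separated by a mean shift
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 10)), rng.normal(1, 1, (200, 10))])
y = np.repeat([0, 1], 200)

rf = RandomForestClassifier(
    n_estimators=200,      # ntrees
    max_samples=0.7,       # 0.7N observations per tree
    max_features=1 / 3,    # mtry = M/3 variables per split
    min_samples_leaf=5,    # node size
    criterion="gini",      # splitting rule
    bootstrap=True,        # sampling with replacement
    oob_score=True,
    random_state=0,
)
rf.fit(X, y)
print(f"OOB error: {1 - rf.oob_score_:.3f}")
```

The out-of-bag error reported here is the same statistic plotted against the number of trees in the case studies below.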


4. Case study 1: Bivariate time series

In the first case study, a simulated bivariate Gaussian time series is considered, as shown in Figure 8. The 2000-sample series is generated with a zero mean vector and unit variances. The covariance matrix of the first 1000 samples was invariant, with $\Sigma_{1:1000} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. However, from the 1001st sample, the series started to change slowly, as the covariance was incrementally increased with each sample over the following 1000 samples, such that $\Sigma_{1001:2000} = \begin{bmatrix} 1 & s/1000 \\ s/1000 & 1 \end{bmatrix}$, with $s = 1, 2, \ldots, 1000$.

Figure 8.

Bivariate time series considered in case study 1.

A close-up view of the time series is provided in the lower panel of Figure 8. As can be seen from this panel, the two time series move independently initially (left, lower panel), and toward the end, the two time series follow essentially the same trajectories.
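The drifting-covariance series shown in Figure 8 can be simulated as follows (the random generator and seed are illustrative choices):

```python
import numpy as np

def simulate_bivariate(n=2000, seed=0):
    """Bivariate Gaussian series: identity covariance for the first
    1000 samples, then the off-diagonal term grows linearly from
    1/1000 to 1 over the following 1000 samples."""
    rng = np.random.default_rng(seed)
    out = np.empty((n, 2))
    for k in range(n):
        c = 0.0 if k < 1000 else (k - 999) / 1000.0
        cov = np.array([[1.0, c], [c, 1.0]])
        out[k] = rng.multivariate_normal([0.0, 0.0], cov)
    return out

series = simulate_bivariate()
print(np.corrcoef(series[1500:].T)[0, 1])  # strongly correlated tail
```

The two channels are independent in the first half and nearly perfectly correlated by the end, matching the close-up panels of Figure 8.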

A moving window with a size b = s = 1000 was used to segment the time series, and each segment was encoded as an image, examples of which are shown in Figure 9. A total of 100 images were generated for each time series segment. The Gramian angular field images of the two time series, whose correlation varied from 0 to 1 as explained above, were stacked horizontally.

Figure 9.

Imaging of the time series data in case study 1 (b = 100, s = 100). Images in the left column are from the invariant series (r = 0), while those on the right are from the variant section of the time series (r = 1).

Four GLCM, 59 LBP, and 1024 GoogleNet features were extracted from the images. These features were visualized by projecting them to a two-dimensional space with the t-distributed stochastic neighbor embedding (t-SNE) algorithm [30], using a Euclidean distance metric and a perplexity value of 30, as shown in Figure 10.

Figure 10.

Visualization of image features in case study 1 with t-SNE score plots. White and black markers show the first and second parts of the time series, respectively.

These projections tend to preserve the topological structure of the original data in the high-dimensional spaces; in other words, features that are similar (representing images that are similar) would tend to be located in the same area on the t-SNE map, while features that are different would tend to be segregated in the t-SNE map. As indicated by the bottom panel of these score plots, wavelet imaging facilitated the best segregation between the two time series.
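The projection used for these score plots can be sketched with scikit-learn's t-SNE (random feature vectors stand in here for the extracted GLCM/LBP/GoogleNet features):

```python
import numpy as np
from sklearn.manifold import TSNE

# Two groups of synthetic feature vectors playing the role of the
# features extracted from the invariant and variant series segments
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0, 1, (30, 64)),
                      rng.normal(3, 1, (30, 64))])

emb = TSNE(n_components=2, perplexity=30, metric="euclidean",
           random_state=0).fit_transform(features)
print(emb.shape)  # (60, 2)
```

Well-separated feature groups appear as segregated clusters in the embedding, which is the behavior inspected in the score plots.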

To further quantify the performance of the features, they were used as predictors in random forest models, as discussed in Section 3.3. In each case, the features were used in a binary classification problem to discriminate as best as possible between the two time series.

For evaluation purposes, the out-of-bag (OOB) errors of the random forest models are shown as a function of the number of trees in the forest (each had 200 trees) in Figure 11. In this figure, each panel represents a different imaging method. GLCM features are represented by blue curves, LBP features by red curves, and the GoogleNet features by black curves.

Figure 11.

Out-of-bag (OOB) errors of random forest models discriminating between the different realizations of the bivariate system in case study 1, using GLCM (blue), LBP (red), and GoogleNet (black) features derived from different images.

The best results were obtained with the wavelet scalograms, as indicated by the lower right panel in the graph and more specifically by the features that were generated by GoogleNet. Conversely, features derived from the Toeplitz matrix and Gramian angular field images were not as predictive, noting that a classification error of 0.5 would be equivalent to random features with no predictive value, given that there were only two equisized classes to predict.


5. Case study 2: Effect of imaging and feature extraction on anomaly detection in a nonlinear dynamic system

In the second case study, the effects of time series preprocessing, imaging method, and feature extraction on the ability of a principal component model to detect changes in the dynamics of a nonlinear time series are considered. For this purpose, a simulation of the Thomas attractor is investigated. Imaging is done by use of Euclidean distance plots, Gramian angular fields, and wavelet scalograms, as discussed in more detail below.

5.1 Thomas attractor

In the second case study, a univariate nonlinear time series is considered. This time series was obtained by simulating the Thomas attractor [31, 32, 33]. The Thomas attractor is a type of strange attractor that arises in a three-dimensional dynamical system and is named after its proposer, the Belgian scientist René Thomas. The attractor is defined by the following system of three nonlinear differential equations:

\frac{dx}{dt} = \sin(y) - b_1 x \qquad (E10)
\frac{dy}{dt} = \sin(z) - b_1 y \qquad (E11)
\frac{dz}{dt} = \sin(x) - b_1 z \qquad (E12)

where b1 is a positive constant.

This system of equations exhibits chaotic behavior, meaning that the solutions of the equations are highly sensitive to initial conditions. The behavior of the system is characterized by a complex, non-repeating pattern of trajectories in three-dimensional space, which forms the Thomas attractor. b1 corresponds to how dissipative the system is and acts as a bifurcation parameter.

Changes in the behavior of the attractor can therefore be easily simulated by making small changes in the value of the parameter b₁. For this case study, the attractor was simulated by generating 20,000 samples of x, y, and z. For parameter values 0 < b₁ < 0.33, the system exhibits chaotic behavior, with up to six separate coexisting attractors. An example of such an attractor is shown in Figure 12.

Figure 12.

The Thomas attractor in phase space.

Only one of the variables, x, was retained for the purposes of the analysis. 10,000 samples were generated with a parameter value b₁ = 0.2 and another 10,000 with b₁ = 0.21. These time series were concatenated to yield a single time series consisting of 20,000 samples, as shown in Figure 13.

Figure 13.

Thomas attractor with persistent change in dynamics at index = 10,000 (top) and a close-up of the same data at the change point (bottom).
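A sketch of this simulation (the integration step size and initial state are illustrative assumptions, as the chapter does not specify them):

```python
import numpy as np

def thomas_series(n, b1, x0=(1.1, 1.1, -0.01), dt=0.05):
    """Integrate the Thomas attractor with fourth-order Runge-Kutta
    and return the x coordinate of the trajectory."""
    def f(u):
        x, y, z = u
        return np.array([np.sin(y) - b1 * x,
                         np.sin(z) - b1 * y,
                         np.sin(x) - b1 * z])
    u = np.array(x0, dtype=float)
    xs = np.empty(n)
    for i in range(n):
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        u = u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[i] = u[0]
    return xs

# concatenate the two regimes used in the case study
series = np.concatenate([thomas_series(10_000, b1=0.20),
                         thomas_series(10_000, b1=0.21)])
print(series.shape)  # (20000,)
```

The small change in b₁ at the midpoint produces the persistent change in dynamics shown in Figure 13.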

5.2 Effect of imaging of time series

The time series was segmented with a window size b = [100, 1000] and step size s = [100, 1000]. Each segment was imaged with Euclidean distance plots, Gramian angular field matrices, Toeplitz stacks, and Morlet wavelets. In addition, GLCM, LBP, and GoogleNet features were extracted from these images, as before. Examples of these images are shown in Figure 14.

Figure 14.

Examples of images of the Thomas attractor shown in Figure 13.

The features extracted with gray level co-occurrence matrices, local binary patterns, and GoogleNet are visualized in Figure 15. Overall, there do not appear to be marked differences between the imaging methods, and this is also borne out by the data shown in Figure 16. On the whole, the LBP features combined with the Euclidean distance plots yielded the best results.

Figure 15.

Visualization of image features in case study 2 with t-SNE score plots. White and black markers show the first and second parts of the time series, respectively.

Figure 16.

Effect of different image encodings and features in discriminating between two different realizations of the nonlinear Thomas time series in case study 2 shown in Figure 13. GLCM, LBP, and GoogleNet features are respectively represented by blue, red, and black curves.


6. Case study 3: Power consumption of a SAG mill

In the final case study, measurements of the power consumption collected from an IsaMill on an industrial copper processing circuit in Western Australia, as described in more detail by Napier and Aldrich [34], are considered. The time series consisted of 80,000 measurements collected over a two-month period of operation.

A surrogate time series with the same distribution of measurements, as well as having the same autocorrelation as the real data, was generated using an iterative amplitude adjusted Fourier transform algorithm. These algorithms are commonly used in nonlinear time series analysis to test hypotheses about the (non)linearity or deterministic nature of the time series based on pivotal test statistics, such as the correlation dimension of the time series.
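A minimal sketch of the IAAFT algorithm, which alternately imposes the original amplitude spectrum and the original value distribution on a shuffled copy of the series (iteration count and seed are illustrative):

```python
import numpy as np

def iaaft_surrogate(y, n_iter=100, seed=0):
    """Iterative amplitude adjusted Fourier transform surrogate."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    amp = np.abs(np.fft.rfft(y))       # target amplitude spectrum
    sorted_vals = np.sort(y)           # target value distribution
    s = rng.permutation(y)
    for _ in range(n_iter):
        # step 1: impose the power spectrum of the original series
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(amp * np.exp(1j * phases), n=len(y))
        # step 2: impose the amplitude distribution via rank remapping
        s = sorted_vals[np.argsort(np.argsort(s))]
    return s

y = np.cumsum(np.random.default_rng(1).normal(size=1024))
s = iaaft_surrogate(y)
print(np.allclose(np.sort(s), np.sort(y)))  # True
```

Because the last operation is the rank remapping, the surrogate has exactly the same value distribution as the original, while its autocorrelation approximates that of the original through the spectral step.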

In this case study, such formal tests are not conducted, but the extent to which the real data can be distinguished from the surrogate data could be seen as an indication of the nonlinear deterministic nature of the time series. The real data and the surrogate data are shown in Figure 17.

Figure 17.

Scaled power consumption on an industrial copper grinding circuit, as well as an iterative amplitude adjusted Fourier transform surrogate of the data.

Both the real time series and its surrogate were encoded as images based on the use of Euclidean distance plots, Gramian angular fields, Toeplitz matrices of the time series segments, as well as Morlet wavelet scalograms, as shown in Figure 18. As before, features were extracted from these images by use of gray level co-occurrence matrices, local binary patterns, and GoogleNet, and these are visualized in Figure 19. By and large, the two time series segments are significantly separated in the t-SNE score spaces of all the images, with the exception of the wavelets.

Figure 18.

Examples of images of the power consumption on the industrial copper grinding circuit shown in Figure 17.

Figure 19.

Visualization of image features in case study 3 with t-SNE score plots. White and black markers show the first and second parts of the time series, respectively.

These features were subsequently used as predictors in random forest models trained to discriminate between the real and surrogate time series. The out-of-bag (OOB) errors are graphically portrayed in Figure 20.

Figure 20.

Effect of imaging method and feature extraction on the characterization of the dynamics of power consumption of a mill on a copper circuit in case study 3. GLCM, LBP, and GoogleNet features are respectively represented by blue, red, and black curves.

As can be seen from these results, the GoogleNet features extracted from the Euclidean distance plots and Gramian angular fields gave the best results, being able to discriminate between the two time series with an error of approximately 7%.

Features extracted from Euclidean distance plots consistently yielded the best performance. In addition, the GLCM features were consistently outperformed by the other feature sets.


7. Discussion and conclusion

When the time series is imaged, the capture of information contained in the time series depends on the characteristics of the image. Euclidean distance plots, for example, tend to capture recurrent behavior in the time series well. Although some comparative analysis has been conducted [35], the effects of different imaging approaches have not been studied very widely as yet. In another study, Yuan et al. [36] have compared recurrence plots, wavelets, and Markov transition fields in seismographic data. Although the wavelet-based images gave the best results, the differences were marginal. Song et al. [37] have likewise compared recurrence plots, Gramian angular difference fields, and Markov transition fields and have concluded that the recurrence plots yielded the best results in their application related to fault detection in manufacturing processes.

Overall, the feature extraction algorithms considered in this study can be compared by their rankings in each case study. That is, each algorithm can be awarded a score of 1 (best), 2, or 3 (worst) based on its performance in each trial. Where two algorithms perform equally, the ranks are shared; for example, if two algorithms perform equally well and best, each is awarded a score of 1½.

For the 12 trials, that is, four image types in three case studies, the results are summarized in Table 3, with a lower score better than a higher score. This shows that on average GoogleNet features performed somewhat better than LBP features and markedly better than GLCM features.

Case study   Imaging*   GLCM   LBP   GoogleNet
1            D          1      2½    2½
             G          3      2     1
             T          1      2½    2½
             W          3      2     1
2            D          2½     1     2½
             G          3      1½    1½
             T          3      1     2
             W          3      1     2
3            D          2      2     2
             G          2      3     1
             T          3      2     1
             W          3      1½    1½
Total                   29½    22    20½

Table 3.

Ranking of feature extraction methods across the three case studies. Smaller values are better.

*D = Euclidean distance plot, G = Gramian angular field, T = Toeplitz stacking, and W = wavelet scalogram.
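The column totals in Table 3 can be reproduced with a few lines of code; the individual scores below are transcribed directly from the table:

```python
# Rank scores from Table 3: each list holds the 12 trial scores for one
# feature set (four image types x three case studies; 1 = best, 3 = worst).
scores = {
    "GLCM":      [1, 3, 1, 3, 2.5, 3, 3, 3, 2, 2, 3, 3],
    "LBP":       [2.5, 2, 2.5, 2, 1, 1.5, 1, 1, 2, 3, 2, 1.5],
    "GoogleNet": [2.5, 1, 2.5, 1, 2.5, 1.5, 2, 2, 2, 1, 1, 1.5],
}
totals = {name: sum(vals) for name, vals in scores.items()}
print(totals)  # {'GLCM': 29.5, 'LBP': 22.0, 'GoogleNet': 20.5}
```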


The different imaging methods can also be analyzed on a similar basis. On average, this yields the ordering D, G, W, T, from best to worst; that is, the best models were associated with Euclidean distance plots, although none of the imaging methods consistently outperformed the others.

These results should be considered preliminary only, and further validation with more diverse time series data would be required. Other imaging methods, or more advanced implementations of the basic approaches, would also have to be considered.

References

1. Li Z, Li S, Yan X. Time Series as Images: Vision Transformer for Irregularly Sampled Time Series. arXiv:2303.12799v1 [cs.LG]. 1 Mar 2023
2. Aldrich C, Liu X. Quantitative texture analysis with convolutional neural networks. In: IoT-Enabled Convolutional Neural Networks: Techniques and Applications. Denmark: River Publishers; 2023. Available from: https://ieeexplore.ieee.org/Xplorehelp/browsing-ieee-xplore/river-publishers#learn-more-about-river-publishers
3. Abidi A, Ienco D, Abbes AB, Farah IR. Combining 2D encoding and convolutional neural network to enhance land cover mapping from satellite image time series. Engineering Applications of Artificial Intelligence. 2023;122:106152. DOI: 10.1016/j.engappai.2023.106152
4. Wang C-C, Kuo C-H. Detecting dyeing machine entanglement anomalies by using time series image analysis and deep learning techniques for dyeing-finishing process. Advanced Engineering Informatics. 2023;55:101852. DOI: 10.1016/j.aei.2022.101852
5. Wang Z, Oates T. Imaging time-series to improve classification and imputation. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015); Buenos Aires, Argentina, 25-31 July 2015. Palo Alto, CA, USA: AAAI Press; 2015
6. Zhou X, Yu T, Wang G, Guo R, Fu Y, Sun Y, et al. Tool wear classification based on convolutional neural network and time series images during high precision turning of copper. Wear. 2023;522:204692. DOI: 10.1016/j.wear.2023.204692
7. Chen Y, Li J, Huang Q, Li K, Zhao Z, Ren X. Non-technical losses detection with Gramian angular field and deep residual network. Energy Reports. 2023;9:1392-1401. DOI: 10.1016/j.egyr.2023.05.183
8. Qin Z, Zhang Y, Meng S, Qin Z, Choo KKR. Imaging and fusing time series for wearable sensor-based human activity recognition. Information Fusion. 2020;53:80-87. DOI: 10.1016/j.inffus.2019.06.014
9. Zhang Q, Qi Z, Cui P, Xie M, Din J. Detection of single-phase-to-ground faults in distribution networks based on Gramian angular field and improved convolutional neural networks. Electric Power Systems Research. 2023;221:109501. DOI: 10.1016/j.epsr.2023.109501
10. Jiang H, Liu L, Lian C. Multi-modal fusion transformer for multivariate time series classification. In: 14th International Conference on Advanced Computational Intelligence (ICACI); Wuhan, China, 15-17 July 2022. IEEE; 2022. pp. 284-288. DOI: 10.1109/ICACI55529.2022.9837525
11. Marwan N. A historical review of recurrence plots. European Physical Journal ST. 2008;164(1):3-12. DOI: 10.1140/epjst/e2008-00829-1
12. Zbilut JP, Webber CL Jr. Embeddings and delays as derived from quantification of recurrence plots. Physics Letters A. 1992;171(3-4):199-203. DOI: 10.1016/0375-9601(92)90426-M
13. Debayle J, Hatami N, Gavet Y. Classification of time-series images using deep convolutional neural networks. In: Proceedings of the 10th International Conference on Machine Vision (ICMV 2017). Vienna, Austria: SPIE; 2018. DOI: 10.1117/12.2309486
14. Hou Y, Aldrich C, Lepkova K, Machuca L, Kinsella B. Monitoring of carbon steel corrosion by use of electrochemical noise and recurrence quantification analysis. Corrosion Science. 2016;112:63-72
15. Hou Y, Aldrich C, Lepkova K, Kinsella B. Identifying corrosion of carbon steel buried in iron ore and coal cargoes based on recurrence quantification analysis of electrochemical noise. Electrochimica Acta. 2018;283:212-220
16. Abbasi H, Bennet L, Gunn AJ, Unsworth CP. 2D wavelet scalogram training of deep convolutional neural network for automatic identification of micro-scale sharp wave biomarkers in the hypoxic-ischemic EEG of preterm sheep. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS); Berlin, Germany, 23-27 July 2019. IEEE; 2019. pp. 1825-1828
17. Ahmad S, Ahmad Z, Kim C-H, Kim J-M. A method for pipeline leak detection based on acoustic imaging and deep learning. Sensors. 2022;22(4):1562
18. Roy AD, Islam MM. Detection of epileptic seizures from wavelet scalogram of EEG signal using transfer learning with AlexNet convolutional neural network. In: ICCIT 2020 - 23rd International Conference on Computer and Information Technology, Proceedings; Dhaka, Bangladesh, 19-21 December 2020. IEEE; 2020. Art. no. 9392720
19. Sharan RV. Spoken digit recognition using wavelet scalogram and convolutional neural networks. In: 2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS 2020); Thiruvananthapuram, India (virtual conference), 3-5 December 2020. 2020. pp. 101-105. Art. no. 9332505
20. Almaghrabi S, Rana M, Hamilton M, Rahaman MS. Solar power time series forecasting utilising wavelet coefficient. Neurocomputing. 2022;508:182-207
21. Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from a time series. Physical Review Letters. 1980;45:712
22. Sauer TD. Attractor reconstruction. Scholarpedia. 2006;1(10):1727. DOI: 10.4249/scholarpedia.1727
23. Löfstedt T, Brynolfsson P, Asklund T, Nyholm T, Garpebring A. Gray-level invariant Haralick texture features. PLoS One. 2019;14(2):e0212110. DOI: 10.1371/journal.pone.0212110
24. Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics. 1973;3:610-621
25. Ojala T, Pietikäinen M, Harwood D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition. 1996;29:51-59
26. Ojala T, Pietikäinen M, Mäenpää T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24(7):971-987. DOI: 10.1109/TPAMI.2002.1017623
27. Mitiche I, Morison G, Nesbitt A, Hughes-Narborough M, Stewart BG, Boreha P. Imaging time series for the classification of EMI discharge sources. Sensors. 2018;18(9):3098. DOI: 10.3390/s18093098
28. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Boston, MA, USA, 7-12 June 2015. IEEE; 2015. pp. 1-9
29. Breiman L. Random forests. Machine Learning. 2001;45:5-32
30. Van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research. 2008;9:2579-2605
31. Butusov DN, Ostrovskii VY, Tutueva AV, Savelev AO. Comparing the algorithms of multiparametric bifurcation analysis. In: XX IEEE International Conference on Soft Computing and Measurements (SCM); St. Petersburg, Russia, 24-26 May 2017. IEEE; 2017. DOI: 10.1109/SCM.2017.7970536
32. McDonald B, Roy Choudhury S. The Thomas attractor with and without delay: Complex dynamics to amplitude death. Discontinuity, Nonlinearity, and Complexity. 2020;9(1):27-45. DOI: 10.5890/DNC.2020.03.003
33. Sprott JC, Chlouverakis KE. Labyrinth chaos. International Journal of Bifurcation and Chaos. 2007;17(6):2097-2108
34. Napier LFA, Aldrich C. An IsaMill™ soft sensor based on random forests and principal component analysis. IFAC-PapersOnLine. 2017;50(1):1175-1180. DOI: 10.1016/j.ifacol.2017.08.270
35. Lee G, Kwon D, Lee C. A convolutional neural network model for SOH estimation of Li-ion batteries with physical interpretability. Mechanical Systems and Signal Processing. 2023;188:110004. DOI: 10.1016/j.ymssp.2022.110004
36. Yuan X, Tanksley D, Jiao P, Li L, Chen G, Wunsch D. Encoding time-series ground motions as images for convolutional neural networks-based seismic damage evaluation. Frontiers in Built Environment. 2021;729:660103. DOI: 10.3389/fbuil.2021.660103
37. Song J, Lee YC, Lee J. Deep generative model with time series-image encoding for manufacturing fault detection in die casting process. Journal of Intelligent Manufacturing. 2023;34:3001-3014. DOI: 10.1007/s10845-022-01981-6
