Temporal Clustering of Time Series via Threshold Autoregressive Models: Application to Commodity Prices
Abstract
This study aimed to find temporal clusters for several commodity prices using the threshold nonlinear autoregressive model. Determining commodity groups that are time-dependent is expected to advance the current knowledge about the dynamics of comoving and coherent prices, and it can serve as a basis for multivariate time series analyses. The clustering of commodity prices was examined using a proposed clustering approach based on time series models, which incorporates the time varying properties of price series into the clustering scheme. Accordingly, the primary aim of this study was to group time series according to the similarity between their Data Generating Mechanisms (DGMs) rather than by comparing pattern similarities in the time series traces. The approximation to the DGM of each series was accomplished using threshold autoregressive models, which are recognized for their ability to represent nonlinear features in time series, such as abrupt changes, time-irreversibility and regime-shifting behavior. Through the use of the proposed approach, one can determine and monitor the set of comoving time series variables across the time dimension. Furthermore, generating a time varying commodity price index and sub-indexes becomes possible. We conducted a simulation study to assess the effectiveness of the proposed clustering approach, and the results are presented for both the simulated and real data sets.
1 Introduction
The movement of commodity prices and the associated dynamics are interrelated with economics and directly affect many industries. For example, energy-based commodities constitute the main input cost for firms and households; changes in agricultural food prices affect the purchasing power parity; and base metals such as copper and aluminum serve as primary raw materials for industrial production and the building trade. Inflation and growth rate dynamics are linked to changes in commodity prices, and the form of the link is time varying. An avenue of research that is receiving considerable attention from monetary and fiscal policy makers as well as from academia concerns the dynamics and the endogeneity of commodity prices with respect to macroeconomic variables and monetary developments, such as expected growth, expected inflation, interest rates, and currency movements [?]. Moreover, understanding commodity price behavior has become more important for commodity market participants since the financialization of commodities [?]. Findings suggest that determining commodity groups that are comoving with changing dynamics will find an application in inflation prediction and investors’ portfolios [?]. Hence, determining price groups that are time-dependent can deepen the knowledge about the dynamics of comoving and coherent prices, and it can serve as a basis for other statistical analyses, such as causality, cointegration and multivariate analyses, from which the underlying causes of comovements can be investigated. Generating commodity price indices (e.g. RICI, TRCCI, S&P GSCI) that rely on time-dependently determined price groups is also possible by assigning weights to each commodity or to commodity groups.
Research on comovements of commodity prices has attracted interest since the seminal work of Pindyck and Rotemberg [?], in which they concluded that seemingly unrelated commodity prices tend to move together. They hypothesized that this price behavior was an “excess” comovement that was not related to macroeconomic fundamentals, such as inflation, industrial production, interest rates or exchange rates, but was rather due to herding behavior, where traders speculate on commodities for no plausible economic reason [?]. Since then, a number of studies have focused on the comovement of commodity prices, in which consideration has been given to various aspects of the price movements via econometric model specifications, such as the effects of macroeconomic variables, structural changes and the volatility of prices [?]. For example, Ai et al. [?] presented evidence against the herding behavior of commodities: the data suggested that the majority of the comovements were related to commodity factors such as supply and demand. They claimed that information about price movements alone was insufficient to understand commodity markets or develop a commodity price model. Similarly, according to the findings of Lescaroux [?], for oil and six particular metals, price movements did not support the hypothesis of “excess” comovement, but rather demonstrated that the tendency of commodity prices to comove was related to the tendency of their fundamental factors to move together.
Nevertheless, Byrne et al. [?] reported significant evidence in favor of comovement in commodity prices and identified a common factor utilizing the nonstationary panel factor (PANIC) model of Bai and Ng [?] and the Factor Augmented Vector Autoregression (FAVAR) framework of Bernanke et al. [?]. Examples of sector and intra-sector based assessments of price comovements for energy, non-energy and metal type commodities can be found in [?]. Rossen [?] analyzed the prices of different groups of metals, namely non-ferrous metals, light metals and precious metals, using common statistical methods: concordance statistics, distance correlation coefficients, cross-correlations and cointegration analyses. The findings suggested that comovement does not hold among metal prices in general, but it can be validated within specific groups of metals. Fernandez [?] focused on a new measure of global comovement by determining the average partial autocorrelation of one commodity with others. Several commodity categories and their influence on other commodity markets were analyzed, and the results showed that there has been a strong comovement among the nominal returns of metals since 2003 and that the comovement among unrelated commodity returns has been negligible except for the period of the global financial crisis in 2007-2010. Similarly, Sensoy et al. [?] provided evidence of a dynamic convergence of commodity futures returns for three groups of commodities (precious metals, industrial metals and energy) by utilizing a dynamic equicorrelation (DECO) generalized autoregressive conditional heteroscedasticity (GARCH) model. An example of comovement analysis for a wide group of commodity prices from a clustering perspective can be found in [?]. In their paper, Matesanz et al. [?] implemented hierarchical clustering based on a model-free dissimilarity measure and provided results describing the dynamics of comovements and temporal interdependencies of commodities.
Moreover, throughout the majority of literature about the comovement of commodity prices, attention was drawn to the temporal variability of price behavior.
This study aimed to address the question of determining the time varying commodity price groups that may have a comovement property, and it differed from the existing literature in two ways. First, we proposed a time series clustering approach for grouping commodity prices that utilizes time series model-based and goal-oriented feature vectors, which are able to represent the time varying behavior of commodity prices. In addition, since the clustering methodology does not assume any category based evaluation unless one is explicitly imposed, it is possible to form a group whose members originate from different commodity categories. Thus, seemingly unrelated commodities, such as gold and natural gas, may fall into the same cluster. Second, the temporal variability of prices was examined through a regime switching concept, and we proposed the use of a threshold autoregressive (TAR) time series model to incorporate the time varying properties of price series into the clustering task. More specifically, each time series was converted into a vector that represents a point in the generated high dimensional space, and the vectors in this space were then clustered. It should also be noted that, to represent the temporal dynamics of the time series and therefore of the clusters, the vectors that represent each time series could be regenerated at each time point.
This paper is organized as follows: the motivation behind the study is explained in Section 2, the methodology and proposed approach are given in Section 3, and Section 4 presents the results of the simulation study and the application to real data. The conclusion and future work are summarized in Section 5.
2 Motivation and Contribution of the Study
Time series clustering is the unsupervised grouping of a set of unlabeled time series into homogeneous clusters in such a way that the within-group dissimilarity is minimized and the between-group dissimilarity is maximized. Various approaches and procedures have been developed and used to cluster time series from different fields, such as economics, finance, bioinformatics, neuroscience, and climatology; however, most of them share a common perspective, in which they use or modify the algorithms for the clustering of static data by converting time series data into the form of static data. In this process, feature extraction from time series can be implemented, and it is the extracted information (i.e. static information such as mean and variance) that is then used in clustering instead of the raw data. Although the intention is for the extracted features to be representative of the time series, information that has static characteristics (i.e. time independent) is not sufficient to represent the time series when their temporal behavior is considered. Therefore, besides the technique used in clustering, the performance of a time series clustering approach is highly dependent on feature selection. How features are extracted remains an open issue that needs to be addressed carefully. Comprehensive surveys on different aspects of time series clustering can be found in [?].
The perfect clustering of multiple time series, which also implies classification in the idealized case, can be achieved if the underlying data generating mechanisms (DGMs) are thoroughly known. Since we observe limited realizations of the underlying processes, determining the true model (i.e. the actual DGM) is usually not feasible in real cases. However, statistical inference becomes accessible after obtaining good approximations to those DGMs. In this study, the proposed time series clustering approach aimed to incorporate the time-dependent information of the time series that was derived from approximations to their true DGMs. Here, the approximation to the DGMs was done using linear and nonlinear time series models, and the associations of time series with these models were used in the proposed clustering approach. Thus, the feature extraction phase, or the feature vector formation, of the proposed clustering approach mainly relied on time series models. Two main objectives were expected to be achieved from the formation of feature vectors: firstly, to ensure that the time-dependent characteristics were well represented in the feature vectors; and secondly, to provide comparable and distinctive inputs to the feature vectors from the DGM approximations. In this study, one of the approximations to the DGM of each time series was accomplished by the TAR models detailed in Section 3.1. They are known for their dynamic structure and ability to represent nonlinear features in time series, such as abrupt changes, asymmetry and, especially, regime shifting behavior (time varying state phenomena).
Another motivation of this study was to gain an insight into comoving commodities by investigating temporal price groups in commodities through a time series clustering perspective. Since the aim of the proposed clustering approach was to group time series according to their time-dependent behavioral similarities, the members of the same cluster, which share similar dynamics (i.e. DGM), would provide useful information for further investigation of comovement. To the best of the authors’ knowledge, a time series model based clustering framework has not been reported for commodity prices in the literature.
3 Methodology
The primary concern of this paper was grouping time series with respect to the similarity/dissimilarity between their DGM approximations instead of considering their commonly shared patterns (i.e. time series traces, static information). Since identical nondeterministic DGMs can produce different patterns (i.e. traces) as a result of their stochastic nature, finding a coherent time series group with respect to a trace-like pattern similarity/dissimilarity can lead to inappropriate conclusions. In order to distinguish time series with respect to their actual DGM approximations, we need a rich environment (i.e. a multifaceted time series model) to provide distinctive outcomes for clustering. Thus, instead of aiming to find a true DGM, the associations of time series with this environment can be observed and used for clustering. By having a switching mechanism that allows time varying states, accounting for nonlinearities with a relatively simple structure, and enabling an easy implementation procedure, TAR models can provide comparable outputs for each time series.
In brief, to determine the clusters of time series, nonlinear TAR model outputs, autoregressive (AR) model outputs, the sample autocorrelation function (acf), the partial autocorrelation function (pacf) and the cross-correlation function (ccf) of each time series were combined within the proposed clustering approach. The TAR model and the proposed approach are explained in the following subsections.
3.1 Threshold Autoregressive Model (TAR)
The TAR model, introduced by Tong and Lim [?], is motivated by several nonlinear characteristics commonly observed in practice, such as asymmetry in the declining and rising patterns of a time-dependent process. The model aims to determine the time varying behavior of a time series process by switching regimes (states) via a threshold variable. Thus, unlike linear and time reversible autoregressive time series models, the TAR modeling perspective offers a satisfactory way of analyzing time irreversible and complex systems, since it handles the complexity through different but simpler linear subsystems that are connected by a threshold process. The capability of TAR models to generate and capture nonlinear dynamics, limit cycles, severe jumps and asymmetries is exemplified in diverse fields and comprehensively discussed in Tong [?]. The subsequent influence of the threshold nonlinearity concept and TAR modeling in the fields of economics and finance is reviewed in Hansen [?] and Chen [?].
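A general TAR($m$; $p_1, \dots, p_m$) model, where $m$ denotes the number of regimes, can be represented as

$$y_t = \sum_{j=1}^{m} \Big( \phi_0^{(j)} + \sum_{i=1}^{p_j} \phi_i^{(j)} y_{t-i} \Big) I(r_{j-1} < z_{t-d} \le r_j) + \varepsilon_t, \tag{1}$$

where $y_t$ represents the time series process; $m$ is the number of regimes; $p_j$ is the AR lag order for regime $j$; $\phi_0^{(j)}$ is the intercept and $\phi_i^{(j)}$, $i = 1, \dots, p_j$, are the AR coefficients for regime $j$; $z_{t-d}$ is the threshold variable with delay $d$; $-\infty = r_0 < r_1 < \dots < r_m = \infty$ are the threshold values; $I(\cdot)$ is the indicator function; and $\varepsilon_t$ is the innovation term.

For example, if $r_{j-1} < z_{t-d} \le r_j$ at time $t$, then the active regime at that time is characterized by

$$y_t = \phi_0^{(j)} + \sum_{i=1}^{p_j} \phi_i^{(j)} y_{t-i} + \varepsilon_t.$$

The model represented in Equation 1 becomes a self-exciting threshold autoregressive (SETAR) model if the threshold variable is replaced by the time series variable itself, i.e., $z_{t-d} = y_{t-d}$, where the regime of the time series is now determined by its own past value $y_{t-d}$.

Similar to the TAR model, a general SETAR($m$; $p_1, \dots, p_m$) model, where $m$ denotes the number of regimes, can be represented as

$$y_t = \sum_{j=1}^{m} \Big( \phi_0^{(j)} + \sum_{i=1}^{p_j} \phi_i^{(j)} y_{t-i} \Big) I(r_{j-1} < y_{t-d} \le r_j) + \varepsilon_t. \tag{2}$$

The unknown parameters for the model in Equation 1 are

$$\theta = \big( \phi^{(1)}, \dots, \phi^{(m)}, r_1, \dots, r_{m-1}, d \big), \qquad \phi^{(j)} = \big( \phi_0^{(j)}, \phi_1^{(j)}, \dots, \phi_{p_j}^{(j)} \big),$$

and can be estimated by least-squares (LS) estimation under the assumption that $\varepsilon_t$ is i.i.d. The minimization of the sum of squared residuals yields the LS estimators:

$$\hat{\theta} = \arg\min_{\theta} \sum_{t} \Big( y_t - \sum_{j=1}^{m} \Big( \phi_0^{(j)} + \sum_{i=1}^{p_j} \phi_i^{(j)} y_{t-i} \Big) I(r_{j-1} < z_{t-d} \le r_j) \Big)^2. \tag{3}$$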
The minimization problem of Equation 3 can be solved by a grid search over all combinations of candidate values of the threshold and delay parameters. For a sample of size $T$ and a model with $m - 1$ thresholds, this search requires on the order of $T^{m-1}$ arranged autoregressions. Alternatively, one can estimate the unknown thresholds and parameters from a model selection perspective by searching for the minimum of a specified information criterion (e.g. AIC, BIC or HQIC). Gonzalo and Pitarakis [?] proposed a sequential model selection approach under an unknown number of thresholds that estimates the threshold parameters one at a time, which reduces the computational cost significantly. The procedure starts with deciding between a linear and a one threshold (i.e. two regime) AR specification. If the existence of a threshold cannot be rejected, then the sample is split into two subsamples by the threshold. To search for another threshold, the same procedure is repeated on both subsamples created in the first step. The iterations stop when the model selection procedure cannot verify the presence of an additional threshold. In this case, the required number of arranged autoregressions to be estimated decreases to approximately $(m - 1)T$.
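The grid search idea can be illustrated with a minimal sketch for the simplest case of a two-regime SETAR model with a single threshold. This is not the estimation code used in the study: the function names `simulate_setar` and `fit_setar2`, the 15% trimming fraction, and the example coefficients are assumptions made purely for illustration.

```python
import numpy as np

def simulate_setar(n, phi_low, phi_high, r, d=1, sigma=1.0, burn=200, seed=0):
    """Simulate a two-regime SETAR process with AR(1) dynamics in each regime.

    phi_low and phi_high are (intercept, slope) pairs (illustrative values);
    the lower regime is active when y[t - d] <= r.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(n + burn)
    for t in range(max(d, 1), n + burn):
        c, a = phi_low if y[t - d] <= r else phi_high
        y[t] = c + a * y[t - 1] + sigma * rng.standard_normal()
    return y[burn:]

def fit_setar2(y, p=1, d=1, trim=0.15):
    """Least-squares fit of a two-regime SETAR(p) by grid search over the threshold."""
    y = np.asarray(y, dtype=float)
    start = max(p, d)
    target = y[start:]
    # Design matrix: intercept plus p lags of y.
    X = np.column_stack([np.ones(len(target))] +
                        [y[start - i:len(y) - i] for i in range(1, p + 1)])
    z = y[start - d:len(y) - d]  # threshold variable y_{t-d}
    # Candidate thresholds: observed values of z between the trimming quantiles
    # (the trimming fraction is an assumption, chosen to keep both regimes populated).
    lo, hi = np.quantile(z, [trim, 1.0 - trim])
    candidates = np.unique(z[(z >= lo) & (z <= hi)])
    best = None
    for r in candidates:
        low = z <= r
        ssr, coefs = 0.0, []
        for mask in (low, ~low):
            # One "arranged autoregression" per regime.
            beta, *_ = np.linalg.lstsq(X[mask], target[mask], rcond=None)
            resid = target[mask] - X[mask] @ beta
            ssr += resid @ resid
            coefs.append(beta)
        if best is None or ssr < best["ssr"]:
            best = {"threshold": r, "ssr": ssr,
                    "coef_low": coefs[0], "coef_high": coefs[1]}
    return best

# Example: simulate a series with a true threshold at 0 and recover it.
y = simulate_setar(1500, phi_low=(1.0, 0.5), phi_high=(-1.0, 0.5), r=0.0, seed=42)
fit = fit_setar2(y, p=1, d=1)
print(fit["threshold"], fit["coef_low"], fit["coef_high"])
```

Each candidate threshold costs one pair of regressions, so the number of regressions grows linearly with the sample size for a single threshold; with several thresholds searched jointly the cost multiplies, which is what the sequential procedure avoids.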
The statistical properties of the TAR model and more detailed information can be seen in [?], [?] and [?]. In line with the potential of the TAR specification to distinguish multiple time series, we made use of the TAR specification for observing nonlinear associations, and the AR specification for observing linear associations in the multiple time series clustering task.
3.2 Clustering of Time Series
The clustering of multiple univariate time series based on distances measured over raw data via common clustering methods, such as K-means and fuzzy C-means, would be inappropriate, since raw time series data cannot reveal most of their time-dependent statistical properties and structure without a statistical analysis tool. Therefore, a proper way of summarizing the time-dependent data should be sought to allow the underlying structures to be compared.
In this study, the set of time series to be clustered was represented by feature vectors. The feature vectors were specifically designed to cover the time-dependent information of the time series, and the entries of each vector contained comparable model based outputs and time-dependent statistics that could be used to distinguish each time series. Real time series data may have a complicated generating mechanism (i.e. source) and complicated features, and for this reason, we needed competent and flexible tools to capture and summarize them. In this respect, one of the concerns of the study was grouping time series with respect to the similarity between their true DGMs. The approximation to the DGM of each series was investigated using TAR models, which are known for their ability to represent nonlinear features in time series, such as abrupt changes and regime shifting behavior.
Forming a feature vector from a univariate time series, $y_t$, can be illustrated schematically as

$$F(y) = \big( \hat{\phi}_{\mathrm{TAR}},\ \mathrm{acf}_h(\hat{\varepsilon}),\ \mathrm{pacf}_h(\hat{\varepsilon}),\ \mathrm{ccf}_h(\hat{\varepsilon}, \hat{\varepsilon}^2),\ \hat{\phi}_{\mathrm{AR}},\ \mathrm{acf}_h(y),\ \mathrm{pacf}_h(y) \big),$$

where $\hat{\varepsilon}$ denotes the residuals of the fitted TAR model and $h$ is the lag order up to which the acf and pacf are calculated. To compare time series via TAR models, the coefficient estimates of the TAR model for each time series (the estimated intercepts and AR coefficients of every regime) were included in the feature vectors. The remaining serial correlation structure of the residuals was examined through acf and pacf calculations, and statistically significant correlations were added to the feature vector. Potential heteroscedasticity in the residuals was evaluated by the ccf of the residuals versus the squared residuals, and significant correlations were retained in the feature vectors. Finally, significant AR coefficients and the autocorrelation structure of the stationarized time series, which represent the linear associations, were added to the feature vector. Thus, each time series, whatever its number of observations, was represented as a single point in a feature space whose dimension equals the length of the feature vector. Spectral clustering approaches, which are known for their strength in graph partitioning, then become available for the purpose of time series clustering.
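The linear part of such a feature vector can be sketched as follows. This is an illustrative construction rather than the exact feature definition of the study: the helper names, the approximate significance band of $\pm 1.96/\sqrt{n}$, the choice of zeroing out insignificant correlations, and the use of AR residuals in place of TAR residuals are all assumptions. Coefficients from a fitted TAR model would be concatenated in the same way.

```python
import numpy as np

def sample_acf(x, h):
    """Sample autocorrelations at lags 1..h."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([(x[k:] @ x[:-k]) / denom for k in range(1, h + 1)])

def sample_pacf(x, h):
    """Partial autocorrelations at lags 1..h via the Durbin-Levinson recursion."""
    rho = sample_acf(x, h)
    pacf = np.zeros(h)
    phi_prev = np.array([rho[0]])
    pacf[0] = rho[0]
    for k in range(2, h + 1):
        num = rho[k - 1] - phi_prev @ rho[:k - 1][::-1]
        den = 1.0 - phi_prev @ rho[:k - 1]
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k - 1] = phi_kk
    return pacf

def cross_corr(a, b, h):
    """Sample cross-correlations corr(a_t, b_{t+k}) for lags k = 0..h."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    return np.array([(a[:n - k] @ b[k:]) / n for k in range(h + 1)])

def keep_significant(r, n):
    """Zero out correlations inside the approximate 95% band (an assumed rule)."""
    return np.where(np.abs(r) > 1.96 / np.sqrt(n), r, 0.0)

def ar_fit(x, p):
    """Least-squares AR(p) fit; returns (intercept and lag coefficients, residuals)."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x) - p)] +
                        [x[p - i:len(x) - i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return beta, x[p:] - X @ beta

def feature_vector(x, p=2, h=10):
    """Concatenate AR coefficients with acf, pacf and residual correlation features."""
    x = np.asarray(x, dtype=float)
    beta, resid = ar_fit(x, p)
    n = len(resid)
    return np.concatenate([
        beta,                                                    # p + 1 entries
        keep_significant(sample_acf(x, h), len(x)),              # h entries
        keep_significant(sample_pacf(x, h), len(x)),             # h entries
        keep_significant(sample_acf(resid, h), n),               # h entries
        keep_significant(cross_corr(resid, resid ** 2, h), n),   # h + 1 entries
    ])

# Example: an AR(1) series with coefficient 0.7.
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + e[t]
f = feature_vector(x, p=1, h=5)
print(f.shape)
```

Because every series is mapped to a vector of the same length, series with different numbers of observations remain comparable in the feature space.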
In this study, we used normalized spectral clustering in accordance with Ng et al. [?]. The main idea of the procedure is to use the eigenvectors of an affinity matrix derived from the data. The algorithm is summarized in Figure ? (for the more theoretical aspects of spectral clustering see [?] and [?]).
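As a rough illustration of normalized spectral clustering in the spirit of Ng et al., the following sketch builds a Gaussian affinity matrix, normalizes it, takes the leading eigenvectors, renormalizes their rows and runs a plain k-means on them. The scale parameter `sigma`, the farthest-point initialization of k-means and the function names are assumptions for illustration; the study's own implementation may differ in these choices.

```python
import numpy as np

def kmeans(Y, k, n_iter=100):
    """Plain Lloyd k-means with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(F, k, sigma=1.0):
    """Normalized spectral clustering of the rows of F into k groups."""
    F = np.asarray(F, dtype=float)
    # Gaussian affinity with zero diagonal.
    sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, eigvecs = np.linalg.eigh(L)
    X = eigvecs[:, -k:]
    # Renormalize each row to unit length, then cluster the rows.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    return kmeans(Y, k)

# Example: three well separated groups of synthetic feature vectors.
rng = np.random.default_rng(1)
group_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
F = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in group_centers])
labels = spectral_clustering(F, 3)
print(labels)
```

In practice the choice of `sigma` matters: a value that is too small relative to the distances between feature vectors drives the affinities, and hence the row sums used in the normalization, towards zero.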
3.3 Proposed time series clustering approach
Describing the temporal dynamics of a time series process can be achieved by utilizing TAR modeling and, hence, the logic of the use of TAR model outputs in time series clustering is somewhat akin to imposing a similar effect to that of centrifuge machinery (i.e. a separator tool for different substances with respect to their densities) throughout the observed multiple time series. Rather than finding an exact or “true” model for each process, the main motivation was forming a plausible scale that allows us to distinguish time series according to their associations with different phases of the TAR model. In addition, linear AR model outputs and autocorrelation information were also considered during the formation of the feature space.
The proposed clustering approach aimed to produce a reasonable partition of time series by their feature vectors, which were designed to contain the associations of time series with linear and nonlinear properties. As stated in Section 2, the temporal properties of time series were refined in the feature space, which has more explicit characteristics (i.e. spectra) than raw time series data.
The steps of the proposed approach are given in Figure ?.
4 Computational Study
To assess the performance of the proposed time series clustering approach, a simulation study and applications on real data sets were conducted; the details are given in Section 4.1 and Section 4.2.
The applicability of the proposed time series clustering approach was evaluated by considering various linear and nonlinear time series models (i.e. DGMs) in the simulation scenario. To determine the optimum number of clusters, the average silhouette values were used.
With the support of the results from the simulation study, the proposed time series clustering approach was applied to several commodity prices to obtain price groups that share similar dynamics. In addition, it was possible to check whether the price groups varied across time. Commodity price clusters and their time-dependent behaviors were also evaluated; see Section 4.2.
Table 1: Time series models (DGMs) used in the simulation study, labeled ser01 to ser10.
4.1 Simulation
The simulation study was conducted to evaluate the effectiveness of the proposed time series clustering approach. For the simulation study, consideration was given to 10 different time series models (i.e. DGMs), given in Table 1, each generating series of 400 observations. Representative patterns of the DGMs are shown in Figure 1. Ten different samples were generated from each model, and the final dataset of 100 series (10 models with 10 samples each) was clustered with the proposed clustering approach. To explore the sampling variability of the clustering results, this scenario was replicated 30 times.
The DGMs used in the study were selected to cover different types of characteristics of time series, such as stationarity, nonstationarity, seasonality and threshold nonlinearity or regime switching. In order to assess the sensitivity of the proposed approach to similar DGMs, some of them were selected based upon their similarity in terms of dynamics/structure, such as lag orders and coefficients. The first six models in Table 1 are from the family of linear processes, more specifically: the first and second have mainly seasonal characteristics; the third, fourth and fifth share the same orders but with slightly different magnitudes at each order; and the sixth model is a first order integrated nonstationary process. The last four models in Table 1 are three regime threshold nonlinear processes with different magnitudes and threshold values in each regime.
In order to determine the appropriate number of clusters, the silhouette method developed by Rousseeuw [?] was considered. For each object $i$ (a time series variable or its corresponding feature vector in our case), one can find a value called the silhouette value, which is calculated by

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},$$

where $a(i)$ is the average dissimilarity of object $i$ to all other objects within the same cluster, and $b(i)$ is the minimum of the average dissimilarities of object $i$ to the objects in each of the other clusters. The silhouette value, $s(i)$, attains its maximum value when the best clustering is observed for the object $i$. The maximum possible value of $s(i)$ is 1, and this occurs when the within dissimilarity $a(i)$ is much smaller (close to 0) than the smallest between dissimilarity, $b(i)$.
For a specific number of clusters, $k$, the overall quality of clustering can be evaluated by assessing the average silhouette value, $\bar{s}(k) = \frac{1}{n} \sum_{i=1}^{n} s(i)$, where $n$ is the total number of objects (time series or corresponding feature vectors) to be clustered. Thus, the optimum number of clusters can be determined by finding the maximum of the average silhouette value, $\bar{s}(k)$, amongst the possible numbers of clusters.
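The silhouette computation can be sketched directly from these definitions. The sketch below assumes Euclidean distance between feature vectors as the dissimilarity and assigns a silhouette value of 0 to singleton clusters; both are illustrative conventions rather than choices stated in the study, and the function name is hypothetical.

```python
import numpy as np

def silhouette_values(F, labels):
    """Silhouette value s(i) for each row of F, given cluster labels.

    Assumes Euclidean dissimilarity and at least two clusters.
    """
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(F))
    for i in range(len(F)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) is set to 0 by convention
        # a(i): average dissimilarity to the other members of the own cluster.
        a = D[i, same].sum() / (same.sum() - 1)
        # b(i): smallest average dissimilarity to any other cluster.
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Example: four points on a line forming two obvious clusters.
F = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
s = silhouette_values(F, labels)
s_bar = s.mean()  # average silhouette value for this partition
print(s, s_bar)
```

To choose the number of clusters, one would repeat the clustering for each candidate number, compute the average silhouette value of each resulting partition, and keep the candidate with the largest average.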
Figure 2 presents the average silhouette values for one replication of the simulation scenario; the optimum number of clusters is selected at the maximum average silhouette value, which coincides with the actual number of DGM groups. In order to explore the sampling variability of the average silhouette values, 30 independent replicates of the simulation scenario were considered, and the resultant boxplots per cluster number are presented in Figure 3. In this figure, the maximum values with the smallest variance were obtained at cluster number 10, which is the true number of DGM groups. Despite their wide spread and large variance, the boxplots for cluster numbers 2 and 4 implied that the average silhouette values could come close enough to the maximum to be regarded as local maxima. This result logically corresponded to the general categories of the selected models. That is to say, the considered models (DGMs) in the simulation study could be classified into 2 main groups, linear vs nonlinear models, which explained the outliers of the boxplot at cluster number 2, or they could be classified into 4 categories, namely stationary, nonstationary, seasonal and nonlinear models, which explained the skewness of the boxplot at cluster number 4.
The simulation study revealed that the true number of clusters could be determined via the proposed approach by finding the maximum average silhouette value over possible cluster numbers. With the optimum number of clusters set at 10, the correct assignments of the simulated time series to their true DGM groups for 30 replicates of the simulation scenario are given in Figure 4. According to the overall result, 98.5% of the simulated time series were clustered into their true DGM group, given in Table 1.
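One simple way to compute a rate of correct assignments when cluster labels are arbitrary is to map each cluster to the true group that is most frequent among its members and count the matches. This majority-vote rule (often called purity) is an assumption made here for illustration; the source does not state how the reported rate was computed, and the function name is hypothetical.

```python
import numpy as np

def correct_assignment_rate(true_groups, cluster_labels):
    """Share of objects whose cluster's majority group equals their true group."""
    true_groups = np.asarray(true_groups)
    cluster_labels = np.asarray(cluster_labels)
    correct = 0
    for c in np.unique(cluster_labels):
        members = true_groups[cluster_labels == c]
        # Count how many members belong to the most frequent true group.
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()
    return correct / len(true_groups)

# Example: six series from two true groups, one of them misassigned.
true_groups = [0, 0, 0, 1, 1, 1]
cluster_labels = [2, 2, 5, 5, 5, 5]
rate = correct_assignment_rate(true_groups, cluster_labels)
print(rate)
```

Note that the majority-vote rule may map two clusters to the same true group; a one-to-one matching between clusters and groups would be a stricter alternative.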
4.2 Application to commodity prices
The data used in the study contained the monthly averages of 14 commodity prices between January 1990 and December 2014. The data were obtained from the publicly available World Bank database, Commodity Price Dataset - Global Economic Monitor (GEM) Commodities. Table 2 shows the prices used in the study and Figure 5 depicts the logged and scaled traces of each commodity price.
Table 2: Commodity prices used in the study
Metals      Energy
Aluminum
Copper      Crude Oil WTI
Lead        Crude Oil Dubai
Nickel      Crude Oil Avg
Tin         Natural Gas US
Zinc
Gold
Platinum
Silver
Here, we considered samples from two general categories of commodities, namely metal and energy related commodities. As stated in Section 3.2 and Section 3.3, in determining the price groups, the proposed clustering approach does not impose any qualitative information related to price categories. We aimed to find clusters that contained similar commodities in terms of their price behaviors. In this context, the comparison and clustering of price behaviors were independent of their categorical classification. However, the viability of the qualitative categorization could be investigated by clustering within a category (i.e. metal prices) and across categories (i.e. metal & energy prices).
Moreover, to illustrate the time varying price clusters, the proposed clustering approach was implemented on three different time periods. The first period, from January 1990 to December 2007, was chosen to be outside the immediate effect of the global financial crisis that occurred in 2008. The second period, from January 2010 to December 2014, was chosen to see whether the behavior of price groups changed after the global financial crisis. Finally, the third period, spanning the whole sample and including the effect of the financial crisis, was used to compare cluster results with the clusters obtained in the first period.
One of the important steps of the proposed approach was the feature extraction based on nonlinear associations. This task was accomplished by fitting a SETAR model with $m$ regimes to each commodity price. The number of regimes for economic time series can be decided within the framework of the empirical study, and it has generally been taken as two [?]. It should be noted that, for the sake of clarity, the number of regimes, $m$, was fixed to three for both the simulated and real data sets. From a statistical point of view, the SETAR concept for modeling any time series can be utilized after ensuring the existence of a threshold nonlinearity. However, the aim of using the SETAR specification in the proposed approach was not modeling and forecasting, but rather measuring the associations of time series with different phases of the SETAR model through the coefficients and the correlation structure of the residuals obtained from the estimation of the three-regime SETAR model. Therefore, proving the existence of a threshold nonlinearity (i.e. more than one regime) was not strictly necessary in this case. As we observed in the simulation study, using the SETAR model as a separator tool in the proposed approach performed well even for similar linear causal time series models. Nevertheless, although it was not necessary for implementing the proposed clustering approach, we evaluated the regime shifting behavior or threshold nonlinearity of the commodities using the threshold nonlinearity test developed by Hansen [?]. Table 3 shows the p-values of the test. According to the test results, the threshold nonlinearity of commodity prices and regime shifting behavior could not be rejected at the 0.05 significance level except for ZINC and CRUDEWTI.
The results of the metal price clustering and of the metal & energy price clustering are shown in Figure 6(a)-(c) and Figure 7(a)-(c), respectively. Each cluster implied that its members came from a similar source of data generating mechanisms, and that comovement of commodities could be valid among the members of each cluster.
The general category of metals in the commodity market is qualitatively divided into two sub categories: base and precious metals. For the sake of clarity, aluminum, lead, tin, nickel, zinc and copper are base (i.e. industrial) metals, whereas gold, silver and platinum are precious metals. Figure 6(a) shows the two groups obtained, with a maximum average silhouette value of 0.92, for metal prices over the first period (i.e. before the global financial crisis). Here, one of the groups consists of all the precious metals and three of the base metals (aluminum, lead and tin). Figure 6(b) shows the period after the global financial crisis: the number of groups increased to three and the group members changed. Three clusters were again obtained when we considered the overall data period, shown in Figure 6(c). It should be noted that both the number of clusters and the intra category (i.e. metals) transitions changed across time.
Extending the data set to include five energy related commodities caused slightly different cluster formations than in the previous configuration. Panels (a), (b) and (c) of Figure 7 correspond to their counterparts in Figure 6 and share some similarities with them. However, considering the dynamic properties of energy related commodity prices together with the metal prices affected the cluster numbers and members. For example, in the first data period in Figure 7, aluminum formed a cluster with crude oil prices, whereas in Figure 6(a) it was clustered with five other metal prices. This implies that the dynamic characteristics of the aluminum price are more similar to those of the crude oil price than to those of the other metal prices in the first period. Similarly, the lead prices formed a cluster with the natural gas prices in the second period, as shown in Figure 7(b), which is different from the case when only metal prices were considered, as shown in Figure 6(b). In this respect, the results of the clustering were dependent on the selected commodities, which were chosen in line with the aims of the study. Although testing the hypothesis of comovement between commodity prices was not the primary aim, the scope of the study has limited similarities with previous studies that focused on the comovement of commodities, such as [?]. Since determining the price groups in this study relied on a comparison between their generating mechanisms, we could infer that the group members that shared similar dynamics tend to comove over a certain period. In this context, our findings do not support the hypothesis of herding behavior of commodity prices, and we observe that the comovement of the commodity prices can only be valid for time-dependently determined price groups (i.e. clusters).
5 Concluding remarks
Time series clustering and time varying clusters were investigated in this study. We proposed a clustering approach based on TAR and AR specifications, which are able to capture the nonlinear and linear associations of time series, respectively. To this end, feature vectors were formed that are expected to represent the linear and nonlinear temporal properties of the time series. More importantly, the feature vectors (or points) can be regenerated at each time index. In other words, the time series are represented as points in the resulting feature space in such a way that the dynamics of the points can be traced across time. Thus, the time varying groups of time series can be found by partitioning those points across time. Naturally, points that are close to each other are considered to be in the same group. The spectral clustering of Ng et al. [?] was adapted to the time series clustering task because of its known merits with regard to partitioning a set of points.
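The partitioning step can be illustrated with a minimal sketch of spectral clustering in the style of Ng et al., written with NumPy only. It is not the implementation used in the study: the Gaussian affinity width `sigma`, the deterministic farthest-point k-means start, the function names and the toy feature points are all assumptions made for illustration.

```python
import numpy as np


def kmeans(Y, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def spectral_clusters(points, k, sigma=1.0):
    """Cluster feature points: affinity, normalization, eigenvectors, k-means."""
    X = np.asarray(points, dtype=float)
    # Gaussian affinity matrix with a zero diagonal (sigma is an assumed width)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the last k columns
    # are the eigenvectors of the k largest eigenvalues
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]
    # Rescale each row to unit length before running k-means on the rows
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k)


# Toy feature points (hypothetical): two well separated groups of five series
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.3, (5, 2)),
                      rng.normal(5.0, 0.3, (5, 2))])
labels = spectral_clusters(features, k=2)
```

Under these assumptions, calling `spectral_clusters` on the feature points regenerated at each time index would give one partition per time index, which is how time varying groups could be tracked.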
The proposed clustering approach was evaluated in a simulation study and applied to several commodity price series to obtain time varying clusters. The results indicate that feature vectors based on time series models (e.g. TAR and AR models) can be used to capture and cluster the temporal dependencies of time series that originate from different sources. Determining the differences between multiple time series is easier if the models used in the clustering task are able to capture frequently encountered characteristics of time series, such as seasonality, asymmetry, regime shifting, time varying and chaotic behaviors. The association of each time series with the considered time series models can then be obtained, and strong associations can be emphasized by feature extraction. In fact, TAR models lend themselves to extension with further components, such as an ARCH/GARCH structure in the innovations. Ultimately, the proposed approach suggests the use of omnidirectional time series models for time series clustering, whose outputs can be used as separators for multiple time series.
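As a rough illustration of how model based feature vectors might be formed, the sketch below fits an AR(p) model and a two-regime TAR model by least squares and concatenates the estimated coefficients. The exact feature definition used in the study is not reproduced here; the threshold grid over quantiles, the delay `d`, the function names and the simulated series are assumptions.

```python
import numpy as np


def lagged_design(y, p, m):
    """Target y_t and regressors [1, y_{t-1}, ..., y_{t-p}] for t >= m."""
    cols = [np.ones(len(y) - m)]
    cols += [y[m - i - 1:len(y) - i - 1] for i in range(p)]
    return y[m:], np.column_stack(cols)


def ols(X, target):
    """Least-squares coefficients and sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return beta, float(resid @ resid)


def fit_tar(y, p=1, d=1, quantiles=None):
    """Two-regime TAR fit: grid search for the threshold on y_{t-d}."""
    y = np.asarray(y, dtype=float)
    if quantiles is None:
        quantiles = np.linspace(0.15, 0.85, 29)  # assumed search grid
    m = max(p, d)
    target, X = lagged_design(y, p, m)
    z = y[m - d:len(y) - d]  # threshold variable y_{t-d}
    best = None
    for r in np.quantile(z, quantiles):
        low = z <= r
        if low.sum() <= p + 1 or (~low).sum() <= p + 1:
            continue  # too few observations in one regime
        b_lo, s_lo = ols(X[low], target[low])
        b_hi, s_hi = ols(X[~low], target[~low])
        if best is None or s_lo + s_hi < best[0]:
            best = (s_lo + s_hi, float(r), b_lo, b_hi)
    if best is None:
        raise ValueError("series too short for the requested TAR fit")
    return best


def feature_vector(y, p=1, d=1):
    """Concatenate AR and TAR coefficients (one possible feature choice)."""
    y = np.asarray(y, dtype=float)
    target, X = lagged_design(y, p, p)
    ar_beta, _ = ols(X, target)
    _, _, b_lo, b_hi = fit_tar(y, p, d)
    return np.concatenate([ar_beta, b_lo, b_hi])


def simulate_setar(n, seed=0):
    """Hypothetical test series: slope 0.8 below zero, -0.5 above zero."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for t in range(1, n):
        phi = 0.8 if y[t - 1] <= 0.0 else -0.5
        y[t] = phi * y[t - 1] + rng.normal(scale=0.5)
    return y


series = simulate_setar(5000)
tar_ssr, threshold, b_lo, b_hi = fit_tar(series)
features = feature_vector(series)
```

Recomputing `feature_vector` on a moving window would give one point per series and time index; extending the regimes with ARCH/GARCH innovations would require a different estimator than the least-squares fit assumed here.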
The obtained commodity price clusters are promising and, therefore, the proposed approach can be used to generate specific price group indexes such as RICI, TRCCI and S&P GSCI. The clustering results can also serve as a basis for further multivariate statistical analyses such as VAR modeling and cointegration.
There are several possibilities for future work. The effectiveness of the proposed approach should be examined for different kinds of DGMs. For example, ARCH/GARCH effects combined with threshold models, or hybrid models such as double threshold GARCH models, can be considered. The number of variables used in the study was selected to give the longest coverage period, and we used the monthly averages of commodity prices. The effectiveness of the proposed approach still needs to be evaluated for higher frequency time series such as daily and hourly data. Finally, expanding the feature space by constructing feature matrices for each time series and their multivariate counterparts can also be considered.
Footnotes
 if is true, else