A Study of Correlations in the Stock Market
Abstract
We study the various sectors of the Bombay Stock Exchange (BSE) for a period of 8 years from April 2006 to March 2014. Using the daily returns over this period, we make a direct, model-free analysis of the pattern of movement of the sectoral indices and the correlations among them. Our analysis shows significant autocorrelation within the individual sectors and also strong cross-correlation among sectors. We also find that the autocorrelations in some of the sectors persist in time. This is a very significant result and has not been reported so far in the Indian context. These findings will be very useful in model building for prediction of price movement of equities and derivatives, and for portfolio management. We show that the Random Walk Hypothesis is not applicable in modeling the Indian market and that mean-variance-skewness-kurtosis based portfolio optimization might be required. We also find that almost all sectors are highly correlated during large fluctuation periods and have only moderate correlation during normal periods.
I Introduction
The stock market is an extremely complex system with various interacting components [1]. The movements of stock prices are somewhat interdependent as well as dependent on a wide multitude of external stimuli such as announcements of government policies, changes in interest rates, changes in the political scenario, announcements of quarterly results by the listed companies and many others. The overall result is a chaotic complex system which has so far proved very difficult to analyze and predict. In fact, it is still not completely clear which generic features will appear in any stock market and which features depend on the social, political and economic climate of the country and/or of the world. So it is important to study each market individually so that finally we can be sure that certain behaviors or patterns are universal. Although some amount of work has been done in understanding the stock markets in Europe [2] and the United States [3], the proper mathematical and statistical study of emerging markets like India is in its infancy [4].
So far, there is no exact understanding of how much effect each external stimulus has on stock prices, or even of how the self-interactions of the various stocks or the various sectors drive the market. Broadly speaking, the price movement of a particular stock can be classified as (i) market (common to all stocks), (ii) sector (related to a particular business sector) and (iii) idiosyncratic (limited to an individual stock). While it is virtually impossible to develop any theory for the idiosyncratic movement, it is possible to analyze, study and build models for the other two types of stock movement. From an investor's point of view, the most important reason to understand the stock market is to get the maximum possible return on an investment with the minimum possible risk. So a better understanding of the stock market will lead to better theories of portfolio management.
One important step in improving our understanding of the stock market is to study how the price movement of one stock affects the prices of other stocks. One way to do this would be to see how the movement of one stock affects the others within the same sector. Another is to study how the overall prices of the various sectors are correlated. The goal of this study is to determine and quantify, from the available data, some of the possible correlations which might exist between the stock prices. This will not only enhance the understanding of the stock market as a whole but will also play a crucial role in investment decisions like portfolio management. The systematic, model-independent analysis of the data that we carry out will also help in building more efficient and enhanced models, which give adequate weight to the various relations that exist between movements of stock prices across sectors in a market and may help in forecasting future trends. Studies of such correlations have been carried out to a limited extent in the context of the New York Stock Exchange [5], but to the best of our knowledge, no such study exists for the Bombay Stock Exchange (BSE) [6].
To understand the financial market, it is very important to know the distribution of the return on a stock. Our data consists of the daily returns of 12 sectors of stocks of the BSE for Financial Year (FY) 2006 to FY 2013, i.e. the trading days from 3rd April 2006 to 31st March 2014. We will be treating each sector as one entity in the rest of the paper. This approach is novel and has not been carried out before, at least in the context of Indian markets.
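If $I_i(t)$ is the index of the sector $i = 1, \dots, N$ at time $t$, then the (logarithmic) return of the $i$th sector over a time interval $t$ to $t + \Delta t$ days in the interval $[0, T]$ is defined as
$$ R_i(t) = \ln I_i(t + \Delta t) - \ln I_i(t). \tag{1} $$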
In our case $T = 1990$, the number of days we have considered, and $N = 13$ because we look at the following 12 sectors: S&P BSE Auto (Auto), S&P BSE Bankex (Bankex), S&P BSE Consumer Durables (CD), S&P BSE Capital Goods (CG), S&P BSE FMCG (FMCG), S&P BSE Health care (HC), S&P BSE IT (IT), S&P BSE Metal (Metal), S&P BSE Oil and Gas (Oil and Gas), S&P BSE Power (Power), S&P BSE Realty (Realty) and S&P BSE Teck (Teck), together with the S&P BSE SENSEX (Sensex), which serves as the benchmark. The plot of the Sensex index and its log return over the time interval under consideration is given in Figure 1. From Figure 1, it is clear that we can divide the entire period into two sub-intervals: (i) FY 2006 to FY 2009 as the large fluctuation period and (ii) FY 2010 to FY 2013 as the normal period. We shall discuss later in the paper how the cross-correlations of the sectors are markedly different in these two periods.
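The mean return of the $i$th sector, with $R_i(t)$ the log return defined in (1) and $T$ the number of days, is given by
$$ \langle R_i \rangle = \frac{1}{T} \sum_{t=1}^{T} R_i(t). \tag{2} $$
Defining $\delta R_i(t) = R_i(t) - \langle R_i \rangle$, we can write the $n$th moment of the $i$th sector as
$$ m_n^{(i)} = \frac{1}{T} \sum_{t=1}^{T} \left[ \delta R_i(t) \right]^n. \tag{3} $$
For example, the second moment gives the variance as
$$ \sigma_i^2 = m_2^{(i)} = \frac{1}{T} \sum_{t=1}^{T} \left[ R_i(t) - \langle R_i \rangle \right]^2. \tag{4} $$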
These definitions are used in the analysis subsequently.
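As an illustration of definitions (1) to (4), the following is a minimal Python sketch that computes log returns and central moments from a list of index levels. The function names and the sample index levels are hypothetical and are not taken from the BSE data; the moments are normalised by the number of observations, which is an assumption about the convention used.

```python
import math


def log_returns(index, dt=1):
    """Log return R(t) = ln I(t + dt) - ln I(t) for a list of index levels."""
    return [math.log(index[t + dt]) - math.log(index[t])
            for t in range(len(index) - dt)]


def central_moment(returns, n):
    """n-th central moment, normalised by the number of observations."""
    count = len(returns)
    mean = sum(returns) / count
    return sum((r - mean) ** n for r in returns) / count


# Hypothetical index levels, for illustration only (not BSE data).
levels = [100.0, 110.0, 99.0, 99.0]
returns = log_returns(levels)
mean_return = sum(returns) / len(returns)
variance = central_moment(returns, 2)  # second central moment
```

Because log returns are additive, the sum of the daily returns equals the log return over the whole window, which gives a quick sanity check on the data handling.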
Our paper is organized as follows. In section II we explore the individual sectors mentioned above and use the data to determine some features of the distribution of the returns, finding significant deviations from normality. We then calculate the autocorrelation of log returns for all sector indices, to test the market efficiency, and find that there is significant autocorrelation in most of the sectors of the BSE at lag 1. The more surprising result is that the autocorrelations in some sectors persist at higher lags. In section III we analyze the cross-correlations among sectors in the BSE. Our study spans FY 2006 to FY 2013, a time span which contains both a period of large fluctuations in index movement and a period of normal fluctuations. We find that almost all sectors are highly correlated during 2006 to 2009 and moderately correlated during 2009 to 2013. We finally conclude in section IV with a summary of our results and their interpretation.
II Understanding BSE sectors
It is commonly believed that the distribution of the log return of a stock, or of the log change in an index, is a normal distribution. However, many empirical studies show deviations from this perception. Consequently, any prediction based on the normal distribution will generally fail. In particular, if there is any deviation from normality, the Random Walk Hypothesis will not be valid. Therefore, it is essential to first understand the distribution of any stock or index movement before using any model. Let us first consider the distribution of returns for the various sectors.
The study of skewness and kurtosis is very useful to characterize the distribution. We know that if a distribution is normal, then the sample skewness and sample excess kurtosis will be close to zero [7]. Any significant deviation from zero indicates a deviation from normality.
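The sample skewness and excess kurtosis of the return distribution of the $i$th sector can be written in terms of the moments $m_n^{(i)}$ of (3), for $T$ observations, as
$$ \text{Sample Skewness:} \quad S_i = \frac{\sqrt{T(T-1)}}{T-2} \, \frac{m_3^{(i)}}{\left( m_2^{(i)} \right)^{3/2}}, \tag{5} $$
$$ \text{Sample excess Kurtosis:} \quad K_i = \frac{T-1}{(T-2)(T-3)} \left[ (T+1) \left( \frac{m_4^{(i)}}{\left( m_2^{(i)} \right)^{2}} - 3 \right) + 6 \right]. \tag{6} $$
The Standard Error in Skewness (SES) and Standard Error in Kurtosis (SEK) are given by [7]
$$ \mathrm{SES} = \sqrt{\frac{6\,T\,(T-1)}{(T-2)(T+1)(T+3)}}, \qquad \mathrm{SEK} = 2\,\mathrm{SES}\,\sqrt{\frac{T^2-1}{(T-3)(T+5)}}. \tag{7} $$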
The sample skewness and sample excess kurtosis of the above sectors are displayed in Figure 2. The Standard Error in Skewness (SES) and Standard Error in Kurtosis (SEK) are calculated using the formulae given in (7). It is clear from Figure 2 that there is significant deviation from zero for the sample skewness and sample excess kurtosis in all sectors. Hence, based on the sample skewness and sample excess kurtosis, we can say confidently that each individual sector's return shows positive kurtosis (fat tails) accompanied by skewness. This clearly shows that the returns of none of the sectors are normally distributed.
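The shape statistics discussed above can be sketched in Python as follows. This is only an illustration under a stated assumption: it uses the bias-adjusted estimators of Joanes and Gill and the matching standard errors, which may differ in detail from the exact expressions used for Figure 2. All function names and the toy sample are hypothetical.

```python
import math


def central_moment(x, n):
    """n-th central moment, normalised by the number of observations."""
    count = len(x)
    mean = sum(x) / count
    return sum((v - mean) ** n for v in x) / count


def sample_skewness(x):
    """Bias-adjusted sample skewness (assumed Joanes-Gill form)."""
    count = len(x)
    m2, m3 = central_moment(x, 2), central_moment(x, 3)
    return math.sqrt(count * (count - 1)) / (count - 2) * m3 / m2 ** 1.5


def sample_excess_kurtosis(x):
    """Bias-adjusted sample excess kurtosis (assumed Joanes-Gill form)."""
    count = len(x)
    m2, m4 = central_moment(x, 2), central_moment(x, 4)
    g2 = m4 / m2 ** 2 - 3.0
    return (count - 1) / ((count - 2) * (count - 3)) * ((count + 1) * g2 + 6.0)


def standard_errors(count):
    """Standard errors of skewness and excess kurtosis under normality."""
    ses = math.sqrt(6.0 * count * (count - 1)
                    / ((count - 2) * (count + 1) * (count + 3)))
    sek = 2.0 * ses * math.sqrt((count * count - 1)
                                / ((count - 3) * (count + 5)))
    return ses, sek


# Toy symmetric sample, for illustration only (not BSE data).
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew = sample_skewness(sample)
kurt = sample_excess_kurtosis(sample)
ses, sek = standard_errors(1990)  # sample size quoted in the paper
```

A skewness or excess kurtosis whose magnitude is several times its standard error is the kind of deviation from zero that the text describes as significant.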
To further strengthen this claim we perform the D'Agostino-Pearson omnibus test [7]. It is based on two quantities which together depend on both the skewness and the kurtosis. The quantities are defined as follows:
$$ Z_{S_i} = \frac{S_i}{\mathrm{SES}}, \tag{8} $$
$$ Z_{K_i} = \frac{K_i}{\mathrm{SEK}}, \tag{9} $$
where $S_i$ and $K_i$ are the sample skewness and sample excess kurtosis of the $i$th sector.
If the distribution of the $i$th sector is normal, the sum $DP_i = Z_{S_i}^2 + Z_{K_i}^2$ should follow a $\chi^2$ distribution with two degrees of freedom. So if $DP_i$ exceeds the critical value of that distribution at the chosen significance level, then the distribution of the sector is not normal. What we find is that the values of $DP_i$ for all the sectors are much larger than the critical value. Therefore, the statistical results clearly indicate that the data do not satisfy the normality assumption, i.e. the change in index movement of the individual sectors shows a large deviation from the normal distribution. This finding is also consistent with recent works [8] and shows that returns are driven by an asymmetric and fat-tailed distribution. This also clearly indicates that the market cannot be modeled using the Random Walk Hypothesis [9]. For stock market modeling, or from the perspective of portfolio management, the mean-variance model [10] should be extended to mean-variance-skewness-kurtosis based portfolio optimization [11].
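The decision rule of an omnibus normality test of this kind can be sketched as below. Two assumptions are made here and should be checked against the original analysis: the sketch uses the simplified statistic built from the ratios of skewness and excess kurtosis to their standard errors (the full D'Agostino-Pearson test applies additional normalising transformations), and the 5% level is an illustrative choice rather than the level used in the paper. The input values are hypothetical.

```python
import math


def omnibus_statistic(skewness, excess_kurtosis, ses, sek):
    """Sum of squared z-scores of skewness and excess kurtosis (simplified form)."""
    z_skew = skewness / ses
    z_kurt = excess_kurtosis / sek
    return z_skew ** 2 + z_kurt ** 2


def chi2_two_dof_p_value(statistic):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


def chi2_two_dof_critical(alpha):
    """Critical value c such that P(chi2_2 > c) = alpha."""
    return -2.0 * math.log(alpha)


# Hypothetical inputs, for illustration only (not values from Figure 2).
stat = omnibus_statistic(0.1, 0.2, 0.05, 0.1)
critical = chi2_two_dof_critical(0.05)  # illustrative 5% level
reject_normality = stat > critical
```

For two degrees of freedom the chi-squared tail probability has the closed form exp(-x/2), so no statistics library is needed for this sketch.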
We next look at the autocorrelation of the return time series of the various sectors. This study is important because, if there is autocorrelation in the time series, we can predict the immediate future based on present information. If there is no autocorrelation in the time series, the data are uncorrelated and it is not possible to make future predictions confidently.
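To emphasize, if there is autocorrelation in the time series at lag $k$, it is possible to make predictions about the immediate future with a high degree of certainty. Here, we have estimated the sample autocovariance at lag $k$ for a finite time series of $T$ observations by [12]
$$ c_k = \frac{1}{T} \sum_{t=1}^{T-k} \left( R_i(t) - \langle R_i \rangle \right) \left( R_i(t+k) - \langle R_i \rangle \right), \tag{10} $$
where $R_i(t)$ is given by our definition (1) and $\langle R_i \rangle$ is its mean (2). The autocorrelation at lag $k$ can then be estimated as:
$$ r_k = \frac{c_k}{c_0}. \tag{11} $$
The function $r_k$ is known as the Auto Correlation Function (ACF).
We have used Bartlett's approximation [13] to estimate the variance of the ACF at lags $k$ greater than some value $q$ beyond which the autocorrelation function may be deemed to have died out. This is defined as [12]:
$$ \mathrm{var}[r_k] \simeq \frac{1}{T} \left( 1 + 2 \sum_{v=1}^{q} r_v^2 \right), \qquad k > q. \tag{12} $$
The standard error for the estimated autocorrelation is:
$$ \mathrm{SE}[r_k] = \sqrt{\mathrm{var}[r_k]}. \tag{13} $$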
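The estimators labelled (10) to (13) can be sketched in Python as follows. This is a minimal illustration assuming the standard Box-Jenkins autocovariance estimator and Bartlett's large-lag approximation; the function names and the alternating toy series are hypothetical and are not derived from the BSE returns.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series x."""
    count = len(x)
    mean = sum(x) / count
    c0 = sum((v - mean) ** 2 for v in x) / count
    result = []
    for k in range(max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean)
                 for t in range(count - k)) / count
        result.append(ck / c0)
    return result


def bartlett_se(r, q, count):
    """Standard error of r_k for k > q, assuming the ACF vanishes beyond lag q."""
    return math.sqrt((1.0 + 2.0 * sum(r[v] ** 2 for v in range(1, q + 1))) / count)


# Toy series with strong negative lag-1 dependence (illustration only).
series = [1.0, -1.0] * 50
r = acf(series, 2)
se_white_noise = bartlett_se(r, 0, len(series))  # q = 0: no correlation assumed
lag1_significant = abs(r[1]) > 2.0 * se_white_noise
```

Comparing each estimated autocorrelation with about two standard errors is a common rule of thumb for significance; the threshold of two is an assumption of this sketch, not a value stated in the text.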
We calculate the autocorrelation of log returns for all sector indices of the BSE. The results clearly show that there is significant autocorrelation in most of the sectors of the BSE at lag 1. Therefore, a residual effect is confirmed in almost all sectors of the BSE. A statistically significant ACF value at lag 1 indicates that an autoregressive component exists in the time series. In fact, we find that some autocorrelation in most of the sectors persists over time.
Our results show that there is very little autocorrelation in FMCG, weak autocorrelation in IT, Teck and Oil and Gas, and significant lagged autocorrelation in Auto, Bankex, CD, CG, HC, Metal, Power, Realty, and also in the Sensex. In Figure 3, we have plotted the ACF for three BSE sectors for illustration. The figure also shows how the ACF persists in time. This feature, obtained by the analysis of our data, is extremely striking and has not been reported in the literature before. Further analysis is required, in future works, to fully understand this feature.
The study of the ACF is an empirical test of the efficiency of the BSE market for the period under consideration. The persistence of autocorrelation we see above clearly indicates that the BSE is not an efficient market. For example, Figure 3 shows significant, consistent autocorrelation at several lags in Realty, FMCG and the Sensex. Without high frequency data it is not possible to comment on why these particular lags are significant. However, this broad analysis shows that during the period under consideration the BSE was not even weak-form efficient.
According to the Efficient Market Hypothesis (EMH), stock prices fully reflect any changes in the information available to investors. For example, a market following a random walk is consistent with the EMH. It has been shown [14] that mature stock markets are generally weak-form efficient. A departure from weak-form efficiency (i.e. a deviation from a random walk) may point towards possible market manipulation.
The autocorrelation exhibited by the BSE sectors agrees with the findings in [15]. Those authors also show that autocorrelation in returns might generate momentum. Therefore, a BSE sector that outperformed other sectors in the past might continue to do so for some time interval. These features of the autocorrelation may be crucial for portfolio management in Indian equity markets.
Financial market volatility is central to the theory and practice of asset pricing, asset allocation, and risk management. A popular assumption is that volatilities and correlations are constant, but we have seen that they vary significantly over time. Therefore, the study of $\beta$ can be useful for investors [16]. The $\beta$ factor of the $i$th sector is defined as
$$ \beta_i = \frac{\mathrm{Cov}(R_i, R_M)}{\mathrm{Var}(R_M)}, \tag{14} $$
where $R_i$ is the return of the $i$th sector and $R_M$ is the return of the market benchmark (here the Sensex). A $\beta$ of 1 indicates that the security's price will move with the market. A $\beta$ of less than 1 means that the security will be less volatile than the market. A $\beta$ of greater than 1 indicates that the security's price will be more volatile than the market. As an example, from Figure 4 we can see that the $\beta$ of the Bankex sector is 1.45 (45% more volatile than the market) while that of the HC sector is 0.49 (51% less volatile than the market). A systematic study of this parameter will be undertaken in a future work.
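A minimal Python sketch of the $\beta$ estimate is given below, assuming the usual definition as the covariance of the sector return with the benchmark return divided by the variance of the benchmark return. The function name and the return values are hypothetical; the toy sector series is constructed to reproduce the Bankex value of 1.45 quoted above, and is not real data.

```python
def beta(sector_returns, market_returns):
    """Covariance of sector and market returns divided by the market variance."""
    count = len(market_returns)
    mean_s = sum(sector_returns) / count
    mean_m = sum(market_returns) / count
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(sector_returns, market_returns)) / count
    var = sum((m - mean_m) ** 2 for m in market_returns) / count
    return cov / var


# Hypothetical daily returns, for illustration only (not BSE data).
market = [0.01, -0.02, 0.03, 0.0]
sector = [1.45 * m for m in market]  # moves 45% more than the market
sector_beta = beta(sector, market)
market_beta = beta(market, market)   # the benchmark has beta 1 by construction
```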
III Cross-correlation among BSE sectors
In the last section we have shown that there is significant autocorrelation in most sectors. Let us now see whether the movements of the indices in the various sectors are also correlated, i.e. whether there exists any cross-correlation between the sectors. Some studies of cross-correlations in other markets have been carried out in different contexts [17] but, to the best of our knowledge, there exist no studies of the correlations between sectors, at least in the context of the Indian financial markets.
To understand the interactions among the sectors, it is useful to study the spectral properties of the correlation matrix of sectoral index movements. The deviation of the eigenvalues of the correlation matrix from those of a random matrix provides signals about the underlying interactions between the various sectors. The largest eigenvalue is identified as representing the influence of the entire market, common to all sectors. The remaining large eigenvalues are associated with the different sectors, as indicated by the composition of their corresponding eigenvectors [18]. This is the analysis we carry out in the rest of this section.
If the $N$ time series of sector returns, each of length $T$, are mutually uncorrelated, then the resulting correlation matrix is random and is known as a Wishart matrix [19]. It is known that the empirical distribution of the eigenvalues of the Wishart matrix almost surely converges to a probability distribution as $N \to \infty$ and $T \to \infty$, where $Q = T/N$ is a constant such that $Q \geq 1$. In that limit the distribution is continuous and supported on $[\lambda_-, \lambda_+]$, where $\lambda_\pm = \left( 1 \pm 1/\sqrt{Q} \right)^2$ [19]. This bound is known as the Random Matrix Theory (RMT) bound. Therefore, the eigenvalues of the Wishart matrix should lie between $\lambda_-$ and $\lambda_+$. We estimate the sample cross-correlation matrix for our data set, i.e. for $N = 13$ sectors over $T = 1990$ days, so that $Q = T/N \approx 153$ (see Figure 5).
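The RMT bound can be evaluated with a few lines of Python. This sketch assumes the Marchenko-Pastur form for a correlation matrix of unit-variance series, with edges $(1 \pm 1/\sqrt{Q})^2$ and $Q = T/N$; the function name is hypothetical, and the inputs are the 13 series and 1990 days quoted elsewhere in the paper.

```python
import math


def rmt_bounds(n_days, n_series):
    """Marchenko-Pastur edges for a correlation matrix of uncorrelated series."""
    q = n_days / n_series
    lam_min = (1.0 - 1.0 / math.sqrt(q)) ** 2
    lam_max = (1.0 + 1.0 / math.sqrt(q)) ** 2
    return lam_min, lam_max


# 13 series (12 sectors plus the Sensex) over 1990 days, as stated in the paper.
lam_min, lam_max = rmt_bounds(1990, 13)
```

With these inputs the band is narrow, roughly 0.84 to 1.17, so any eigenvalue of the empirical correlation matrix well outside it signals genuine correlation rather than sampling noise.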
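The reduced number of Principal Components (PC) of the cross-correlation matrix that can explain most of the total variance is given in terms of the eigenvectors of the cross-correlation matrix as
$$ \mathrm{PC}_k(t) = \sum_{i=1}^{N} u_{ki} \, \frac{R_i(t) - \langle R_i \rangle}{\sigma_i}, \tag{15} $$
where $u_{ki}$ is the $i$th component of the eigenvector belonging to the $k$th largest eigenvalue and $\sigma_i$ is the standard deviation of the returns of the $i$th sector.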
We find the eigenvalues of the cross-correlation matrix. The eigenvectors corresponding to these eigenvalues are the PCs of the cross-correlation matrix. These eigenvectors can be expanded in a basis given by the 13 sectors we are considering. All the eigenvalues and the expansion of the PCs in our chosen basis are given in Figure 6.
From Figure 6 we can see a significant deviation of the largest eigenvalue (that of PC1) from the upper RMT bound. The largest eigenvalue of the cross-correlation matrix is 9.11. Also, from the first column of Figure 6 (corresponding to PC1), the eigenvector of the largest eigenvalue shows a relatively uniform composition, i.e. all sectors contribute to it and all its elements have the same sign.
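The following Python sketch illustrates how a correlation matrix and its leading eigenpair can be obtained. It is not the procedure used for Figure 6: to stay dependency-free it uses power iteration, which returns only the largest eigenvalue and its eigenvector, whereas a full analysis would use a symmetric eigensolver. The function names and the toy series are hypothetical.

```python
import math


def correlation_matrix(series):
    """Pearson correlation matrix of a list of equal-length return series."""
    count = len(series[0])
    standardized = []
    for s in series:
        mean = sum(s) / count
        sd = math.sqrt(sum((v - mean) ** 2 for v in s) / count)
        standardized.append([(v - mean) / sd for v in s])
    n = len(series)
    return [[sum(a * b for a, b in zip(standardized[i], standardized[j])) / count
             for j in range(n)] for i in range(n)]


def largest_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector by power iteration (uniform start)."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


# Three perfectly correlated toy series (illustration only, not BSE data).
base = [0.01, -0.02, 0.03, 0.0, -0.01]
toy = [base, [2.0 * v for v in base], [v + 1.0 for v in base]]
corr = correlation_matrix(toy)
lam1, vec1 = largest_eigenpair(corr)
```

In this extreme toy case the leading eigenvalue equals the number of series and the eigenvector is uniform with a single sign, which is the limiting form of the "market mode" behaviour described above. The uniform starting vector is a reasonable choice when correlations are predominantly positive, but power iteration can fail if the start is orthogonal to the leading eigenvector.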
A very useful visualization of what we discussed above is the scree plot [20], shown in Figure 7. Because the largest eigenvalue is so large, and because PC1 affects all the sectors in nearly the same proportion, we can say that the largest eigenvalue is associated with the collective response of the entire market to external information [1, 21], i.e. the largest eigenvalue is due to the existence of a market-induced correlation across all sectors. Since PC1 dominates to such a large extent, it is difficult to observe the correlations between sectors.
From the investment point of view, it is interesting to note that the Teck and IT sectors are highly correlated all the time. Hence, it would be better to club these two sectors together for modelling and for portfolio diversification purposes.
The scree plot also gives some very useful information about periods of large fluctuations. During times of large fluctuations we find that there is a large correlation among most of the sectors. As a comparison, consider Figure 8, where we compare the cross-correlation matrix of a period of large fluctuation (April 2008 to March 2009) with that of a period of relatively small fluctuation (April 2012 to March 2013). As can be seen from the figure, although there exist significant cross-correlations in both periods, the magnitude is smaller in the later period. This indicates that periods of large fluctuations can be studied using models in which the correlation strength becomes large. Since periods of large fluctuations may correspond to crashes in the stock market, a systematic study of the cross-correlation matrices of these periods will provide valuable insights into understanding and modeling crashes.
A more efficient way of analyzing this is the Principal Component analysis performed earlier in this section. Again, scree plots provide a more efficient and rigorous demonstration of the increase in correlations during periods of crisis. As can be seen in Figure 9, the largest eigenvalue (that of PC1) is 9.91 when the entire market is experiencing large fluctuations, while it comes down to 6.72 during the period of relative calm. We can zoom in to the actual time of the crash (January 2008) in Figure 10, using the quarterly and monthly data, and see that the largest eigenvalue is even higher (11.32) during that time. This can provide an efficient and novel way of analyzing crashes of the stock market.
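One way to read these eigenvalues is as shares of total variance: the trace of a correlation matrix of $N$ series equals $N$, so the largest eigenvalue divided by $N$ is the fraction of variance carried by PC1. The short Python sketch below applies this to the values quoted in the text, assuming that every matrix is built from all 13 series (12 sectors plus the Sensex); that assumption and the dictionary labels are ours.

```python
# Largest eigenvalues quoted in the text; the labels are descriptive only.
N_SERIES = 13  # assumed identical for every window
largest_eigenvalue = {
    "full period": 9.11,
    "large fluctuations": 9.91,
    "relative calm": 6.72,
    "crash window": 11.32,
}

# Share of total variance explained by PC1 in each window.
pc1_share = {label: lam / N_SERIES for label, lam in largest_eigenvalue.items()}

for label, share in pc1_share.items():
    print(f"{label}: {share:.1%}")
```

Under this assumption the share rises from roughly one half in the calm window to nearly nine tenths around the crash, which restates the observation above in a form that is comparable across markets with different numbers of sectors.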
IV Conclusions
In this paper, we have carried out a model-independent analysis of the BSE for a period of 1990 days. This time frame contains periods of both small and large fluctuations and thus provides a good sample to understand and study the generic behavior of the stock market. Also, the number of days chosen was large enough to avoid small sample size errors. Instead of studying the movement of individual stock returns, as is usually done, we study the movement and behavior of groups of stocks, the grouping being done in terms of sectors. We look at 12 sectors of stocks and use the whole Sensex as the benchmark. The autocorrelations in the return data capture how the stocks within the individual sectors interact among themselves, while the cross-correlations look at how the sectors affect each other.
We found the presence of significant autocorrelations in all the sectors, clearly demonstrating that the movement of the stock prices cannot be modeled via a random walk. While this is usually an accepted feature of stock market models, our analysis of the departure from normality is rigorous. It is not just based on the non-zero skewness and kurtosis but also on the D'Agostino-Pearson omnibus test. This comprehensively shows the existence of autocorrelations. From an investor's point of view, this means that only mean-variance-skewness-kurtosis based methods of portfolio optimization will be useful.
A more interesting feature which we find in the study of autocorrelations is that they persist over time. The ACF is significant for all sectors at lag one, and there are certain sectors where this autocorrelation persists at higher lags. This indicates that the BSE departs significantly from an efficient market and that the EMH cannot be used to model the stock price movement in the BSE. This is a very interesting property of the stock market which has to be accounted for in future models. For financial markets to be meaningful and useful to the economy, they must be at least weak-form efficient. Some of the reasons why the BSE is not efficient may be (i) weak disclosure procedures, (ii) poor quality and quantity of company disclosures, (iii) almost no public awareness about securities and (iv) a lack of transparent regulation, supervision and administrative rules. This feature of the BSE should be of great interest not just to investors but also to policy makers and market regulators. Further analysis of this will be done in a future work.
We also study the relative volatility of the sectors compared to the whole market, measured in terms of $\beta$. This parameter, as we point out, should have a significant role in making investment decisions. How to use this parameter in building physics models of financial markets is a direction of future work.
The cross-correlation was studied by doing the Principal Component analysis of the correlation matrix. Our findings show that there exists a very large cross-correlation, but that this correlation is due to some external force which drives the market as a whole. The effect of sectors on each other is smaller but not insignificant and will be the focus of a future work. A very important feature following from our analysis is that the largest eigenvalue, that of PC1, increases during periods of large fluctuation of the market. This can have far-reaching applications in studying and predicting crashes of financial markets.
Acknowledgement
We would like to thank Dr. Sitabhra Sinha for careful reading of the manuscript and helpful suggestions.
References

(1) R. N. Mantegna and H. E. Stanley, “Introduction to Econophysics”, Cambridge University Press, Cambridge, (1999).
Sitabhra Sinha, Arnab Chatterjee, Anirban Chakraborti and Bikas K. Chakrabarti, “Econophysics: An Introduction”, Wiley-VCH, Weinheim, (2010).
(2) T. Lux, “The stable Paretian Hypothesis and the Frequency of Large Returns: an Examination of Major German Stocks”, Applied Financial Economics 6: 463–475 (1996).
(3) Dennis P. Quinn and Hans-Joachim Voth, “A Century of Global Equity Market Correlations”, American Economic Review: Papers & Proceedings 98:2, 535–540, (2008).
Fabrizio Lillo, Rosario N. Mantegna, “Ensemble properties of securities traded in the NASDAQ market”, Physica A, Volume 299, Issues 1–2, Pages 161–167 (2001).
(4) S. Sinha and R. K. Pan, “The Power (Law) of Indian Markets: Analyzing NSE and BSE trading statistics”, Econophysics of Stock and Other Markets (Eds. A. Chatterjee and B. K. Chakrabarti), Springer pp 24–34 (2006).
S. Sinha and R. K. Pan, “Uncovering the Internal Structure of the Indian Financial Market: Large cross-correlation behavior in the NSE”, Econophysics of Markets and Business Networks (Eds. A. Chatterjee and B. K. Chakrabarti), Springer pp 3–19 (2007).
(5) http://www.nyse.com/
(6) http://www.bseindia.com/
(7) Ralph B. D'Agostino, Albert Belanger, Ralph B. D'Agostino, Jr., “A suggestion for using powerful and informative tests of normality”, The American Statistician 44 (4): 316–321. JSTOR 2684359, (1990).
D. N. Joanes and C. A. Gill, “Comparing Measures of Sample Skewness and Kurtosis”, The Statistician 47(1): 183–189, (1998).
R. D'Agostino, E. S. Pearson, “Tests for Departure from Normality”, Biometrika, 60(3): 613–622, (1973).
(8) V. Plerou, P. Gopikrishnan, H. E. Stanley, “Two-Phase Behavior of Financial Markets”, Nature 421: 130, (2003).
(9) Eugene F. Fama, “Random Walks In Stock Market Prices”, Financial Analysts Journal 21 (5): 55–59 (1965).
(10) H. Markowitz, “Portfolio Selection”, Journal of Finance (7): 77–91, (1952).
(11) K. K. Lai, L. Yu, and S. Wang, “Mean-Variance-Skewness-Kurtosis based Portfolio Optimization”, Proceeding of the First International Multi-Symposiums on Computer and Computational Sciences, Vol 2 (IMSCCS 06): 292–297, (2006).
P. Jena, T. K. Roy, and S. K. Majumdar, “Multi-Objective Mean-Variance-Skewness Model for Portfolio Optimization”, Advanced Modelling and Optimization, 9(1): 181–193, (2007).
(12) G. E. P. Box, G. M. Jenkins, G. C. Reinsel, “Time Series Analysis: Forecasting and Control”, 4th ed., Wiley, (2008).
(13) M. S. Bartlett, “On the theoretical specification and sampling properties of autocorrelated time series”, J. Royal Statist. Soc., B8: 27–41, (1946).
(14) E. Fama, “The Behavior of Stock Market Prices”, Journal of Business, 38: 34–105, (1965).
E. Fama, “Efficient Capital Markets: A review of theory and empirical work”, Journal of Finance, 25 (2): 383–417, (1970).
(15) A. W. Lo and A. C. MacKinlay, “A Non-Random Walk Down Wall Street”, Princeton University Press, Pages 14, 16, and chapter 2 (1999).
(16) Myron Scholes, Joseph Williams, “Estimating betas from nonsynchronous data”, Journal of Financial Economics 5 (3): 309–327, (1977).
Seth Klarman, Joseph Williams, “Beta”, Journal of Financial Economics 5 (3), (1991).
(17) R. K. Pan and S. Sinha, “Collective behavior of stock price movements in an emerging market”, Phys. Rev. E (76) 046116: 1–9, (2007).
(18) V. Plerou, P. Gopikrishnan, L. A. N. Amaral, M. Meyer, H. E. Stanley, “Scaling of the Distribution of Price Fluctuations of Individual Companies”, Phys. Rev. E (60): 6519–6529 (1999).
P. Gopikrishnan, V. Plerou, L. A. N. Amaral, M. Meyer, H. E. Stanley, “Scaling of the Distribution of Fluctuations of Financial Market Indices”, Phys. Rev. E 60: 5305–5316 (1999).
M. Sarma, “Characterisation of the tail behaviour of financial returns: studies from India”, EURANDOM Report 2005003 (http://www.eurandom.tue.nl/reports/2005/003MSreport.pdf) (2005).
K. Matia, M. Pal, H. Salunkay, H. E. Stanley, “Scale-dependent price fluctuations for the Indian stock market”, Europhys. Lett. 66: 909–914 (2004).
(19) D. Jonsson, “Some limit theorems for the eigenvalues of a sample covariance matrix”, J. Multivariate Anal. (12): 1–38, (1982).
V. A. Marchenko and L. A. Pastur, “The distribution of eigenvalues in certain sets of random matrices”, Mat. Sb. (72): 507–536 (1967).
(20) R. B. Cattell, “The Scree Test for the Number of Factors”, Multivariate Behavioral Research 1: 245–276, (1966).
(21) L. Laloux, P. Cizeau, J. P. Bouchaud, M. Potters, “Noise Dressing of Financial Correlation Matrices”, Phys. Rev. Lett. (83): 1467 (1999).
V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. Nunes Amaral, H. E. Stanley, “Universal and Nonuniversal Properties of Cross Correlations in Financial Time Series”, Phys. Rev. Lett. (83): 1471 (1999).