# Distinguishing manipulated stocks via trading network analysis

## Abstract

Manipulation is an important issue for both developed and emerging stock markets. For the study of manipulation, it is critical to analyze investor behavior in the stock market. In this paper, an analysis of the full transaction records of over a hundred stocks in a one-year period is conducted. For each stock, a trading network is constructed to characterize the relations among its investors. In trading networks, nodes represent investors and a directed link connects a stock seller to a buyer with the total trade size as the weight of the link, and the node strength is the sum of all edge weights of a node. For all these trading networks, we find that the node degree and node strength both have tails following a power-law distribution. Compared with non-manipulated stocks, manipulated stocks have a high lower bound of the power-law tail, a high average degree of the trading network and a low correlation between the price return and the seller-buyer ratio. These findings may help us to detect manipulated stocks.

###### keywords:

network analysis, trading network, power-law, manipulation

Corresponding address:

No. 6, South Road, Kexueyuan, Zhongguancun, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

## 1 Introduction

Dynamic behavior of stock markets has attracted much academic and industrial attention. Studies on dynamic behavior are facilitated by the availability of numerous historical financial data accumulated with the development of stock markets. The financial data can be roughly classified into two categories: time series of financial variables and detailed transaction data which contain the information of trader identities and their transactions.

For the first category of financial data, the traditional research approach is to employ probability distribution functions to analyze the statistical properties of various variables (1); (2); (3); (4); (5); (6); (7); (8); (9); (10); (11); (12); (13); (14) and to employ correlation functions to study the cross-correlation among different variables (15); (16); (17); (18); (19); (20); (21). Power-law distributions have been found for many variables, such as price fluctuations, trading volume and the number of trades. In (2), a model is proposed to explain these empirical power-law distributions. Network analysis provides another method for studying the financial market (22). For example, researchers have analyzed the networks of listed companies (23); (24); (25) and of stock index fluctuations (26); (27); (28). Community structure and hierarchical structure are two properties of complex networks (29); (30); (31); (32). Community structure has been shown in two real-world financial networks, namely the board network and the ownership network of the firms of the Italian Stock Exchange (24). Mantegna found a hierarchical arrangement of stocks by investigating the daily time series of the logarithm of stock price (25).

Unlike the data of the first category, which characterize the macro-level properties of a stock market, the second category of financial data describes the specific transactions among investors from a micro-level perspective. From transaction data, a trading network can be constructed, in which traders are mapped into nodes and each transaction relation corresponds to a directed link. Kyriakopoulos et al. investigated the statistical properties of the transaction network of all major financial players within Austria over one year (33). They found that the transaction network is disassortative and that eigenvalue analysis is helpful for detecting abnormal behavior of financial players. Tseng et al. conducted a market experiment with 2,095 effective participants in order to explore the structure of transaction networks and to study the dynamics of wealth accumulation (34). They found that the transaction networks are scale-free and disassortative. In addition, they found that the wealth distribution follows Pareto’s law. Wang et al. constructed a trading network using the real trading records from the Shanghai Futures Exchange and found that futures trading networks exhibit such features as scale-free behavior, a small-world effect, hierarchical organization, and a power-law betweenness distribution (35). Furthermore, in (36), Jiang and Zhou first studied the trading networks of all traders of one stock in the Chinese stock market. They found that the trading networks comprise a giant component and have power-law degree distributions and disassortative architectures.

These studies of investor networks indicate that micro-level analysis provides a promising way to shed light on the relationship between macro-level statistical properties and micro-level trading behavior. In this paper, to study stock manipulation, we focus on trading behavior at the micro level. Specifically, we investigate the investors’ trading behavior using the transaction data of 108 liquid stocks. We first analyze the statistical properties of the trading networks and find that the network constructed from every stock has power-law degree distributions. We further find that the node strength distributions of the trading networks also have power-law tails. Finally, by comparing manipulated stocks with non-manipulated stocks, we find that manipulated stocks have a high lower bound of the power-law tail, a high average degree of the trading network and a low correlation between the price return and the seller-buyer ratio. These findings provide a promising way of detecting manipulated stocks.

The paper is organized as follows. Section 2 describes the data set. In Section 3, statistical properties of the trading network are analyzed. In Section 4, the features of manipulated stocks are analyzed and then compared with those of non-manipulated stocks. Finally, the conclusions are given in Section 5.

## 2 The data sets

In a stock market, a trader submits bid/ask orders to the electronic trading system when he/she wants to buy/sell shares. In the trading system, the ask orders are sorted in ascending order of price and the bid orders are sorted in descending order. By matching a bid order with an ask order according to a rule of price/time priority, the trading system generates a transaction record. These matched orders are often called executed orders. All of the transaction records together constitute the transaction data, which are the raw data used in this paper.

In this paper, we study the transaction data of 108 stocks traded on the Shanghai Stock Exchange and the Shenzhen Stock Exchange during the whole year 2004. The transaction data for the 108 stocks contain 44,333,930 transaction entries, which involve 11,686,740 unique trader accounts. Each entry consists of the date and time, the unique number for the transaction, the buyer ID, the seller ID, the volume and the price. Among these 108 stocks, eight were manipulated by some investors; these manipulated stocks were disclosed by the China Securities Regulatory Commission (CSRC) as cases of trade-based manipulation. For five of the eight manipulated stocks, the manipulated period persists through the whole year of 2004, and for the remaining three stocks the manipulated period is from Jan 2004 to Sep 2004. The basic description of the data sets is reported in Table 1.

| Stocks | *N* | Average number of traders | Average number of entries | Period |
|---|---|---|---|---|
| Non-manipulated | 100 | 111,215 | 416,719 | Jan 2-Dec 31 |
| Manipulated | 5 | 29,564 | 119,838 | Jan 2-Dec 31 |
| Partially manipulated | 3 | 139,130 | 687,597 | Jan 2-Sep 3 |

Table 1: Basic description of the data sets. *N* is the total number of stocks; the other two columns give the average number of traders and the average number of entries.

## 3 Stock trading network

Generally speaking, a stock market comprises two main entities, namely listed companies and investors. Stock trading occurs between buyers and sellers, and all the trades together form a directed network, which is called the stock trading network. The stock trading network provides a straightforward representation of the trading relationships among institutional and individual investors. It is believed that the study of the stock trading network may provide some insights into the relationship between price fluctuations and collective trading behavior.

### 3.1 Construction

We first describe how the stock trading network is constructed from the transaction data. For each specific stock, we construct a trading network using the transaction records of the whole year. Specifically, the nodes of the trading network correspond to the buyers and sellers involved in the transaction data of the stock considered. For each transaction record, a directed edge is constructed pointing from the seller to the buyer. The volume of the transaction record is taken as the weight of the edge. In this way, there may exist multiple edges between two nodes if more than one transaction occurred between the same seller and buyer. Note that the directed and weighted edges reflect the flow of shares from the seller nodes to the buyer nodes and the flow of cash in the opposite direction. Without losing this physical meaning, we merge the multiple edges with the same seller and buyer nodes into one weighted edge. The final stock trading network is a directed and weighted network. Figure 1 illustrates the construction of the stock trading network using a toy example. To give an intuitive understanding, Figure 2 shows the stock trading network of a real non-manipulated stock in our data set.

In (36), a stock trading network is constructed using the transaction data for each trading day. However, as pointed out in (37), the stock market suffers weekend, holiday and monthly effects. To alleviate such effects, we aggregate the transaction data of the whole year into a sole trading network instead of constructing one trading network for each trading day. In addition, no isolated node exists in trading networks since each transaction involves two traders, a seller and a buyer.
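The construction described above can be sketched in a few lines of Python. This is only a minimal illustration: the record layout `(seller, buyer, volume)`, the function name `build_trading_network` and the toy records are assumptions made for the example and are not taken from the data set.

```python
from collections import defaultdict

def build_trading_network(transactions):
    """Aggregate (seller, buyer, volume) records into a directed, weighted edge map.

    Multiple transactions between the same seller and buyer are merged into a
    single edge whose weight is the total traded volume.
    """
    weights = defaultdict(int)
    for seller, buyer, volume in transactions:
        weights[(seller, buyer)] += volume  # edge points from seller to buyer
    return dict(weights)

# Hypothetical toy records for one stock over the whole period.
transactions = [
    ("A", "B", 100),
    ("A", "B", 200),  # same seller and buyer: merged into the edge A -> B
    ("B", "C", 50),
    ("C", "A", 300),
]

network = build_trading_network(transactions)
for (seller, buyer), weight in sorted(network.items()):
    print(f"{seller} -> {buyer}: {weight}")
```

Because every record contributes both a seller and a buyer, every node in the resulting edge map has at least one edge, which is consistent with the absence of isolated nodes noted above.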

### 3.2 Degree distribution

The degree of a node in a network is the number of connections that the node has to other nodes. Since the edges of the trading network are directed, each node has two different degrees: the out-degree, which is the number of outgoing edges, and the in-degree, which is the number of incoming edges. The degree distribution of a network describes the heterogeneity of its nodes. The out-degree distribution gives the probability that an investor sells shares to *k* investors, and the in-degree distribution gives the probability that an investor buys shares from *k* investors. We investigate these two probability distributions for each stock over the whole year. The more nodes a node *i* connects to, the more active the corresponding investor is.
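As a small illustration of these definitions, the sketch below counts in-degrees and out-degrees from a set of merged directed edges and builds an empirical cumulative distribution. The helper names `degrees` and `ccdf` and the toy edges are hypothetical; nodes with zero in-degree or out-degree simply do not appear in the corresponding counter.

```python
from collections import Counter

def degrees(edges):
    """Return (in_degree, out_degree) counters for distinct (seller, buyer) edges."""
    in_degree = Counter()
    out_degree = Counter()
    for seller, buyer in set(edges):
        out_degree[seller] += 1  # the seller sells to one more distinct buyer
        in_degree[buyer] += 1    # the buyer buys from one more distinct seller
    return in_degree, out_degree

def ccdf(values):
    """Empirical cumulative distribution P(X >= x) for each distinct value x."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    result = {}
    for x in sorted(counts):
        result[x] = remaining / n
        remaining -= counts[x]
    return result

# Hypothetical merged edges of a tiny trading network.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
in_degree, out_degree = degrees(edges)
out_ccdf = ccdf(list(out_degree.values()))
print(dict(in_degree), dict(out_degree), out_ccdf)
```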

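Figure 3(a) illustrates the cumulative probability distributions of out-degrees and in-degrees for five randomly chosen stocks. We can see that there is a power-law decay when *k* is larger than a lower bound $k_{\min}$, such that:

$$P(k_{\mathrm{out}} \ge k) \sim k^{-\gamma_{\mathrm{out}}}, \qquad k \ge k_{\min}^{\mathrm{out}} \tag{1}$$

$$P(k_{\mathrm{in}} \ge k) \sim k^{-\gamma_{\mathrm{in}}}, \qquad k \ge k_{\min}^{\mathrm{in}} \tag{2}$$

where $\gamma_{\mathrm{out}}$ and $\gamma_{\mathrm{in}}$ denote the power-law exponents of the out-degree and in-degree tails, and $k_{\min}^{\mathrm{out}}$ and $k_{\min}^{\mathrm{in}}$ the corresponding lower bounds.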

In order to confirm our observations of these power-law distributions, we adopt the least-squares estimate to fit the curve and test the goodness of fit with the Kolmogorov-Smirnov (KS) statistic. The fitted lower bound and exponent are substituted into the KS test procedure. The null hypothesis for our KS test is that the data above the lower bound are drawn from a power-law distribution. In (38), a power-law fitting approach is proposed for estimating the power-law exponent of empirical data and the lower bound of the power-law behavior; it combines maximum-likelihood fitting methods with goodness-of-fit tests based on the KS statistic and likelihood ratios. We calibrate the empirical data by this approach and find that the null hypothesis cannot be rejected at the significance level of 0.01 for all the 100 non-manipulated stocks. Figure 3(b) depicts the bivariate distribution of the (lower bound, exponent) pairs for the two degree types. Each point represents, for one stock, the lower bound of the power-law tail and the estimated power-law exponent. The average values of the two exponents are shown in the inset of Figure 3(b). For one of the two degree types, 75% of the stocks have exponents well within the Lévy-stable regime; for the other, 73% do. The degree distributions of the directed trading networks studied in (36) also fall in the Lévy-stable regime. The degree distributions show that a few investors have many connections while most investors have only a few links.
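To make the fitting idea concrete, here is a simplified sketch in the spirit of the approach of (38): for each candidate lower bound, the tail exponent is estimated by maximum likelihood, and the lower bound with the smallest KS distance is kept. Several simplifications are assumptions of this sketch and not the procedure used in the paper: the continuous-variable estimator is used even though degrees are integers, the `min_tail` cutoff is arbitrary, the bootstrap that yields a p-value is omitted, and the input is synthetic data rather than real degrees.

```python
import math
import random

def fit_tail(data, x_min):
    """Fit P(X >= x) ~ x^(-gamma) to the values >= x_min.

    Returns (gamma, ks), or None if the tail is empty or degenerate.
    gamma is the exponent of the cumulative distribution.
    """
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0.0:
        return None
    gamma = n / log_sum  # maximum-likelihood estimate (continuous case)
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (-gamma)
        # compare with the empirical CDF just before and just after x
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return gamma, ks

def fit_power_law(data, min_tail=100):
    """Scan candidate lower bounds and keep the one with the smallest KS distance."""
    best = None
    for x_min in sorted(set(data)):
        if sum(1 for x in data if x >= x_min) < min_tail:
            break  # too few points left in the tail for a meaningful fit
        fit = fit_tail(data, x_min)
        if fit is not None and (best is None or fit[1] < best[2]):
            best = (x_min, fit[0], fit[1])
    return best

# Synthetic sample with a known tail exponent, generated by inverse transform.
random.seed(0)
true_gamma = 1.5
sample = [(1.0 - random.random()) ** (-1.0 / true_gamma) for _ in range(1000)]

x_min, gamma, ks = fit_power_law(sample)
print(f"x_min={x_min:.3f}  gamma={gamma:.3f}  KS={ks:.4f}")
```

A full analysis would additionally compare the observed KS distance with the distances obtained from synthetic power-law samples in order to obtain the p-value used for the hypothesis test.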

### 3.3 The node strength distribution

In order to understand the investors’ trading behavior for one stock, we analyze the node strength of the trading network. The node strength is defined as the sum of all edge weights of the node. We define the weight $w_{ij}$ on an edge as the total volume exchanged from *i* to *j*. We investigate three types of strength: the in-strength $S_i^{\mathrm{in}} = \sum_j w_{ji}$, the out-strength $S_i^{\mathrm{out}} = \sum_j w_{ij}$, and the total strength $S_i = S_i^{\mathrm{in}} + S_i^{\mathrm{out}}$, where $w_{ij}$ is the weight on link *ij*. The in-strength of a node is the total volume bought by the corresponding trader in a certain period, the out-strength is the total volume that the investor sold, and the total strength is equal to the total trading volume exchanged by the investor in that period. We analyze the trading volume of each investor for one stock in a year to find the statistical properties of the investors’ trading behavior.
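The three strengths can be computed directly from the merged edge weights. The sketch below assumes the network is stored as a mapping from `(seller, buyer)` pairs to total volume; the function name `node_strengths` and the toy weights are hypothetical.

```python
from collections import defaultdict

def node_strengths(weights):
    """Return {node: (in_strength, out_strength, total_strength)}.

    weights maps a directed edge (seller, buyer) to the total volume traded
    from the seller to the buyer.
    """
    in_strength = defaultdict(int)
    out_strength = defaultdict(int)
    for (seller, buyer), volume in weights.items():
        out_strength[seller] += volume  # volume sold by the seller
        in_strength[buyer] += volume    # volume bought by the buyer
    nodes = set(in_strength) | set(out_strength)
    return {
        node: (
            in_strength.get(node, 0),
            out_strength.get(node, 0),
            in_strength.get(node, 0) + out_strength.get(node, 0),
        )
        for node in nodes
    }

# Hypothetical merged edge weights of a tiny trading network.
weights = {("A", "B"): 300, ("B", "C"): 50, ("C", "A"): 300}
strengths = node_strengths(weights)
for node in sorted(strengths):
    print(node, strengths[node])
```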

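After calculating the cumulative distributions for the 100 stocks, we find that the strength distributions exhibit power-law decay in the tails when the strength exceeds a lower bound $s_{\min}$:

$$P(S \ge s) \sim s^{-\alpha}, \qquad s \ge s_{\min} \tag{3}$$

where $\alpha$ denotes the power-law exponent of the strength tail.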

We use the same strategy as in Section 3.2 to confirm our observations of power-law distributions. The KS test is applied to the fitted results to check the goodness of fit. Our null hypothesis is that the strength can be well modeled by a power-law distribution. At a significance level of 0.01, we find that the null hypothesis cannot be rejected for any of the 100 stocks for the in-strength, the out-strength and the total strength *S*. Figure 4(a) illustrates the in-strength and out-strength cumulative distributions for five randomly chosen stocks, and the solid lines are the best fits to the power-law distributions. Figure 4(b) depicts the bivariate distribution of the (lower bound, exponent) pairs. Each point is associated with the lower bound of the power-law tail and the estimated power-law exponent. The inset shows the frequency of the exponent, together with the average values for the three strength types. From Figure 4(b), we find that the power-law exponents lie in [0,2] for the in-strength distribution, the out-strength distribution and the total strength distribution.

## 4 Trading activity of manipulated stocks

The first researchers who studied manipulation were Allen, Gale and Jarrow. They studied the history of stock price manipulation and classified manipulations into three types: information-based manipulation, action-based manipulation and trade-based manipulation. Today, the most common types of manipulation are information-based manipulation and trade-based manipulation. In information-based manipulation, manipulators spread rumors and false information to influence stock prices. In trade-based manipulation, manipulators engage in fraudulent trading to create an image of an active market (39); (40). The manipulated stocks in our data set were published by the China Securities Regulatory Commission (CSRC) as cases of trade-based manipulation. In this paper, we investigate the trading network for the analysis of trade-based manipulation only.

### 4.1 The degree and strength distribution of manipulated stocks

Having understood the statistical properties of stock trading networks, we want to know whether manipulated stocks exhibit the same phenomena as non-manipulated stocks and, if they differ, how different they are. In order to answer these questions, we analyze the eight manipulated stocks published by the CSRC. In our data set, five of these eight stocks were manipulated throughout the whole year, and the manipulated periods of the remaining three stocks run from Jan 2004 to Sep 2004.

In order to compare the differences between non-manipulated stocks and manipulated stocks under the same conditions, we take the following method to select a set of non-manipulated stocks as reference stocks. First, for each manipulated stock, the reference stocks are the non-manipulated stocks which have the same capitalization and belong to the same industry sector. Second, we compute the degree distribution and strength distribution of the trading networks for each manipulated stock during the manipulation period and we also compute the same statistics of the reference stocks during the same periods. Third, for each stock we fit and test the power-law hypothesis using the methods described in the previous sections, and for each manipulated stock we take the average value of its reference stocks as the reference value.

In Figure 5(a) and (b), for all the power-law distributions mentioned above, the lower bounds of the power-law fits for manipulated stocks are much larger than the reference values (except the in-degree lower bound of the fourth stock). From Figure 5(c), it can be seen that the average degrees of the trading networks of manipulated stocks are larger than those of non-manipulated stocks. This means that there are more transactions between traders of the manipulated stocks. This phenomenon is attributed to malpractices in which a group of traders act and trade together to achieve a specific effect on the volume of a target security. In fact, manipulators trade heavily among themselves in order to artificially increase the price and volume of a stock and thereby attract other investors to buy it. Manipulators earn a profit and investors incur losses. The anomaly in these distributions means that the manipulators are disturbing the market order.
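The comparison against reference values can be illustrated with a short sketch. The text does not spell out how the average degree is computed, so the definition used here, mean total degree 2E/N over distinct directed edges, is an assumption; the function names and the toy networks are also hypothetical.

```python
def average_degree(edges):
    """Mean total degree 2E/N of a directed network given as (seller, buyer) pairs.

    Assumption of this sketch: each distinct directed edge adds one to the
    degree of both of its endpoints.
    """
    distinct = set(edges)
    nodes = {node for edge in distinct for node in edge}
    return 2 * len(distinct) / len(nodes)

def reference_value(values):
    """Average of a statistic over the reference stocks of one manipulated stock."""
    return sum(values) / len(values)

# Hypothetical manipulated stock: three accounts trading among themselves.
manipulated = [
    ("M1", "M2"), ("M2", "M1"),
    ("M2", "M3"), ("M3", "M2"),
    ("M1", "M3"), ("M3", "M1"),
]
# Hypothetical reference stocks with sparser trading relations.
references = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("C", "D")],
]

manipulated_degree = average_degree(manipulated)
reference_degree = reference_value([average_degree(e) for e in references])
print(manipulated_degree, reference_degree)
```

In this toy setting, the circular trading among a few accounts raises the average degree well above the reference value, which mirrors the qualitative pattern reported for Figure 5(c).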

### 4.2 Price return and trader number

After investigating the statistical properties of a manipulated trading network, we study how price change correlates with the trading activities. First, we consider how the number of traders changes with price. We plot the number of traders who buy shares and the number of traders who sell shares with respect to the trading day *t* in the top panel of Figure 6 and the stock prices are also plotted in the bottom panel.

Figure 6 illustrates the evolution of the daily prices, the number of buyers and the number of sellers, from which one can see that the two trader counts change synchronously with the stock prices for both the non-manipulated stock and the manipulated stock. For the non-manipulated stock (Figure 6(a)), the ordering of the two counts when the price goes up is the reverse of the ordering when it goes down. Compared with the non-manipulated stock, the manipulated stock (Figure 6(b)) behaves differently. For this stock, one count is always greater than the other, no matter whether the price rises or falls.

As we know, stock prices should be determined by the condition of the listed companies without any interference, and the price of a stock can affect investors’ decisions to buy or to sell. In order to obtain more profit, investors buy stocks at low prices and sell them at high prices, which is called "buy low and sell high". When a buyer submits a big bid order, a great number of retail investors sell their shares. There are then more buyer-initiated trades than seller-initiated trades, and the price goes up. However, manipulators employ different methods to influence the price of the targeted stocks, and the number of traders may not reflect the price change.

Then we study the correlation between the price return and the number of traders for each non-manipulated stock and manipulated stock. Each return series is evaluated as the logarithmic change of the corresponding price series,

$$pr(t) = \ln P(t) - \ln P(t-1) \tag{5}$$

in which *P(t)* denotes the average price on day *t*. The seller-buyer ratio *r* is defined as the ratio of the number of sellers to the number of buyers. Next, we calculate the correlation coefficient of the price return and *r*. This is illustrated in Figure 7. From Figure 7, we find that *r* is positively correlated with *pr(t)* and that the correlation coefficients are greater than 0.2 for most of the non-manipulated stocks; however, for the manipulated stocks they are all less than 0.2.
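The calculation can be sketched as follows. The sketch assumes that the seller-buyer ratio is computed per trading day from the distinct seller and buyer accounts active on that day and that the Pearson coefficient is the correlation measure; neither detail is stated explicitly in the text. The prices and trades are invented toy values and the function names are hypothetical.

```python
import math

def log_returns(prices):
    """Daily logarithmic returns ln P(t) - ln P(t-1) for t = 1, ..., T-1."""
    return [math.log(prices[t]) - math.log(prices[t - 1]) for t in range(1, len(prices))]

def seller_buyer_ratio(day_trades):
    """Number of distinct sellers divided by number of distinct buyers on one day."""
    sellers = {seller for seller, _ in day_trades}
    buyers = {buyer for _, buyer in day_trades}
    return len(sellers) / len(buyers)

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical average daily prices and (seller, buyer) trades for five days.
prices = [10.0, 10.5, 10.2, 10.8, 10.4]
daily_trades = [
    [("a", "x")],                          # day 0 (no return defined)
    [("a", "x"), ("b", "x"), ("c", "x")],  # day 1: many sellers, one buyer
    [("a", "x"), ("a", "y")],              # day 2: one seller, two buyers
    [("a", "x"), ("b", "x")],              # day 3
    [("a", "x"), ("a", "y"), ("a", "z")],  # day 4
]

returns = log_returns(prices)
ratios = [seller_buyer_ratio(day) for day in daily_trades[1:]]  # align with returns
corr = pearson(returns, ratios)
print(f"correlation between pr(t) and r: {corr:.3f}")
```

The first day is dropped from the ratio series so that both series cover the same days; with real data, days on which a stock does not trade would need to be handled before computing the ratio.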

In trade-based manipulation, manipulators engage in fraudulent trading to create an image of an active market and attract other investors to buy the stock (39). First, a manipulator controls hundreds of accounts in order to purchase or sell a large number of shares at the same time and thereby influence the stock price. Second, money is transferred to a central account from one or several satellite accounts that are also controlled by the fraudulent trader. A manipulator trades shares from one account to another without changing ownership, and this fraudulent trading creates the image of an active market. Therefore, price fluctuations of manipulated stocks do not reflect fluctuations in demand.

## 5 Conclusion

In this paper, we investigate the statistical properties of investors’ trading behavior by means of network analysis based on the transaction data for 108 stocks during the whole year 2004.

For each stock, we construct a trading network in which the nodes represent investors and each transaction is translated into a directed edge drawn from a seller to a buyer with the total trading volume as its weight. We find that the degree distributions follow a power law and the node strength distributions display power laws in the tails.

Furthermore, we examine the statistics of the manipulated stocks. As references, we choose the non-manipulated stocks which have the same capitalization and industry sector as the manipulated stocks. By comparing the degree and strength distributions of non-manipulated stocks and manipulated stocks, we find that manipulated stocks have a higher lower bound of the power-law tail and a higher average degree of the trading network than non-manipulated ones. By analyzing the correlation between the price return and the seller-buyer ratio, we find that manipulated stocks have a lower correlation than non-manipulated stocks.

A trading network provides a way to trace shares and cash flows in financial markets. However, the transaction records are generated by the trading system automatically according to the principle of price-time priority rather than from the direct results of the negotiation among users and spontaneous behavior. How to combat such a drawback is an issue for future work.

## Acknowledgments

This work was funded by the National Natural Science Foundation of China under grant numbers 60873245, 60933005, and 60803123. This work was also partly funded by the National High-Tech R&D (863) Program of China under grant number 2010AA012500.

### Footnotes

- journal: Physica A

### References

- V. Plerou and H. E. Stanley, Tests of scaling and universality of the distributions of trade size and share volume: Evidence from three distinct markets, Phys. Rev. E 76 (2007), 046109.
- X. Gabaix, P. Gopikrishnan, V. Plerou and H. E. Stanley, A theory of power-law distributions in financial market fluctuations, Nature 423 (2003), 267-270.
- V. Plerou, P. Gopikrishnan, X. Gabaix, L. A. N. Amaral and H. E. Stanley, Price fluctuations, market activity, and trading volume, Quantitative Finance 1 (2001), 262-269.
- P. Gopikrishnan, V. Plerou, X. Gabaix and H. E. Stanley, Statistical properties of share volume traded in financial markets, Phys. Rev. E 62 (2000), 4493-4496.
- Y. Liu, P. Gopikrishnan, P. Cizeau, M. Meyer, C. K. Peng and H. E. Stanley, Statistical properties of the volatility of price fluctuations, Phys. Rev. E 60 (1999), 1390-1400.
- K. E. Lee, J. W. Lee, Probability distribution function and multiscaling properties in the Korean stock market, Physica A 383 (2007), 65-70.
- J.-W. Zhang, Y. Zhang, H. Kleinert, Power tails of index distributions in Chinese stock market, Physica A 377 (2007), 166-172.
- G.-H. Mu, W. Chen, J. Kertész and W.-X. Zhou, Preferred numbers and the distributions of trade sizes and trading volumes in the Chinese stock market, European Physical Journal B 68 (2009), 145-152.
- G.-F. Gu and W.-X. Zhou, Statistical properties of daily ensemble variables in the Chinese stock markets, Physica A 383 (2007), 497-506.
- T. Qiu, L.-X. Zhong, G. Chen, X.-R. Wu, Statistical properties of trading volume of Chinese Stocks, Physica A 388 (2009), 2427-2434.
- G.-F. Gu, W. Chen, W.-X. Zhou, Empirical distributions of Chinese stock returns at different microscopic timescales, Physica A 387 (2008), 495-502.
- G.-H. Mu, W.-X. Zhou, Tests of nonuniversality of the stock return distributions in an emerging market, Phys. Rev. E 82 (2010), 066103.
- B.-H. Wang, P.-M. Hui, The distribution and scaling of fluctuations for Hang Seng index in Hong Kong stock market, Eur. Phys. J. B 20 (2001), 573-579.
- X.-Q. Sun, X.-Q. Cheng, H.-W. Shen and Z.-Y. Wang, Statistical properties of trading activity in Chinese stock market, Physics Procedia 3 (2010), 1699-1706.
- B. Podobnik, H. E. Stanley, Detrended cross-correlation analysis: a new method for analyzing two nonstationary time series, Phys. Rev. Lett. 100 (2008), 084102.
- B. Podobnik, D. Horvatic, A. M. Petersen and H. E. Stanley, Cross-correlations between volume change and price change, Proc. Natl. Acad. Sci. USA 106 (2009), 22079-22084.
- W.-X. Zhou, Multifractal detrended cross-correlation analysis for two nonstationary signals, Phys. Rev. E 77 (2008), 066211.
- F. Ren, W.-X. Zhou, Recurrence interval analysis of high-frequency financial returns and its application to risk estimation, New J. Phys. 12 (2010), 075030.
- B. Podobnik, D. Wang, D. Horvatic, I. Grosse, and H. E. Stanley, Time-lag cross-correlations in collective phenomena, Europhys. Lett. 90 (2010), 68001.
- K. Matia, Y. Ashkenazy and H. E. Stanley, Multifractal properties of price fluctuations of stocks and commodities, Europhys. Lett. 61 (2003), 422.
- P. Ch. Ivanov, A. Yuen, B. Podobnik and Y. Lee, Common scaling patterns in intertrade times of U.S. stocks, Phys. Rev. E 69 (2004), 056107.
- J. P. Onnela, K. Kaski and J. Kertész, Complex networks in the study of financial and social systems, Eur. Phys. J. B 38 (2004), 353.
- T. Qiu, B. Zheng and G. Chen, Financial networks with static and dynamic thresholds, New J. Phys. 12 (2010), 043057.
- C. Piccardi, L. Calatroni and F. Bertoni, Communities in Italian corporate networks, Physica A 389 (2010), 5247-5258.
- R. N. Mantegna, Hierarchical structure in financial markets, Eur. Phys. J. B 11 (1999), 193-197.
- P. Li, B.-H. Wang, An approach to Hang Seng index in Hong Kong stock market based on network topological statistics, Chinese Science Bulletin 51 (2006), 624-629.
- P. Li, B.-H. Wang, Extracting hidden fluctuation patterns of Hang Seng stock index from network topologies, Physica A 378 (2007), 519-526.
- S.-M. Cai, Y.-B. Zhou, T. Zhou, P.-L. Zhou, Hierarchical organization and disassortative mixing of correlation-based weighted financial networks, Int. J. Mod. Phys. C 21 (2010), 433-441.
- S. Fortunato. Community detection in graphs. Phys. Rep., 486 (2010), 75-174.
- X.-Q. Cheng, F.-X. Ren, S. Zhou, and M.-B. Hu, Triangular clustering in document networks, New J. Phys. 11 (2009), 033019.
- H.-W. Shen, X.-Q. Cheng, K. Cai, and M.-B. Hu, Detect overlapping and hierarchical community structure in networks, Physica A 388 (2009), 1706-1712.
- H.-W. Shen, X.-Q. Cheng, and B.-X. Fang, Covariance, correlation matrix and the multi-scale community structure of networks, Phys. Rev. E 82 (2010), 016114.
- F. Kyriakopoulos, S. Thurner, C. Puhr, S. W. Schmitz, Network and eigenvalue analysis of financial transaction networks, Eur. Phys. J. B 71 (2009), 523-531.
- J.-J. Tseng, S.-P. Li, S.-C. Wang, Experimental evidence for the interplay between individual wealth and transaction network, Eur. Phys. J. B 73 (2010), 69-74.
- J.-J. Wang, S.-G. Zhou and J.-H. Guan, Characteristics of real futures trading networks, Physica A 390 (2011), 398-409.
- Z.-Q. Jiang, W.-X. Zhou, Complex stock trading network among investors, Physica A 389 (2010), 4929-4941.
- R. Mookerjee and Q. Yu, An empirical analysis of the equity markets in China, Review of Financial Economics, 8 (1999), 41-60.
- A. Clauset, C. R. Shalizi, M. E. J. Newman, Power-law distributions in empirical data, SIAM Rev. 51 (2009), 661-703.
- F. Allen and D. Gale, Stock-price manipulation, Review of Financial Studies, 5 (1992), 503-529.
- R. A. Jarrow, Market manipulation, bubbles, corners, and short squeezes, Journal of Financial and Quantitative Analysis 27 (1992), 311-336.