Clusters of investors around Initial Public Offering

Margarita Baltakienė
Tampere University
margarita.baltakiene@tuni.fi
Corresponding author
   Kęstutis Baltakys
Tampere University
   Juho Kanniainen
Tampere University
   Dino Pedreschi
University of Pisa
   Fabrizio Lillo
University of Bologna
July 1, 2019
Abstract

The complex networks approach has been gaining popularity in analysing investor behaviour and stock markets, but within this approach, initial public offerings (IPOs) have barely been explored. We fill this gap in the literature by analysing investor clusters in the first two years after the IPO filing on the Helsinki Stock Exchange, using a statistically validated network method to infer investor links from the co-occurrences of investors’ trade timing for 14 IPO stocks. Our findings show that a rather large part of the statistically similar network structures is persistent and general: they form in different securities and persist over time for both mature and IPO companies. We also find evidence of institutional herding that hints at the existence of an investor information network.

Introduction

Initial public offerings (IPOs) play an important role in financial markets because they open new investment opportunities, redistribute funds’ allocations and attract new investors to the market. An IPO is usually a long-awaited event in the life of a privately held company, both for the current stockholders and for the public exchange investors: it gives the owners the opportunity to cash in and gives the investors a chance to gain from potential underpricing and future returns. Numerous financial studies have addressed various behavioural biases in relation to IPOs: Ljungqvist and Wilhelm [1] analysed the satisfaction with an IPO underwriter’s performance and indicated a unique pricing behaviour around the dot-com bubble [2], while Kaustia et al. [3] found that investors’ personal experiences and previous IPO returns have a significant impact on future IPO subscriptions. Other studies have analysed IPO investments [4], IPO earnings [5] and IPO underpricing [6] in financial markets on an aggregated level.

Financial markets, in turn, are complex systems comprised of financial decisions, information flows and direct and indirect investor interactions. Typical aspects of a financial market are multidimensionality and agent heterogeneity [7, 8]. Making an investment decision is a complex procedure because it is layered with different choices that are influenced by various market factors, investors’ experiences, wealth and investors’ stage of life. It is crucial to understand the characteristics of the underlying investor behaviour patterns because these patterns, taken together, shape the dynamics of the whole market and thus are important factors in explaining the booms and bubbles in the financial markets [9]. Because investors seek higher returns, one possibility is to use social networks and other private information channels to follow other investors’ strategies and to exploit privately channelled information in stock markets. Recently, [10] provided evidence of a negative relationship between distance and trade timing similarity for household investors, indicating that face-to-face communication is still important in financial decision making. According to [11], information links can be identified from realised trades because investors who are directly linked in the information network tend to time their transactions similarly. We follow this idea and use observations on investor-level transactions from shareholder registration data to identify the links between investors, here with a special focus on identifying investor clusters. Prior studies have investigated the structures of investor networks in different contexts [11, 12, 13, 14, 9, 15], but investor clusters around IPOs have barely been explored.

We address this research gap by performing a broad multistock exploratory analysis of investor clusters over 14 stocks in the first two years after their IPOs. In particular, we seek to establish whether the identified investor clusters persist over the first two years after the IPOs and whether they appear across multiple IPO securities, as well as in existing, mature stocks in the market. Our analysis unveils one property of a financial market: we detect statistically robust investor clusters that form simultaneously in various securities and that persist over time.

Most of the earlier papers perform analyses on an aggregated category level [4, 16, 17, 18] or concentrate on a single highly liquid stock [12, 14]. Even though this type of analysis might include nearly all market participants interconnecting in a giant subsystem, the results do not generalise to collective market strategies but instead are rather stock specific. In contrast to previous research in the IPO literature, the current study is the first one on early-stage trading behaviour patterns at the level of individual investor accounts. Moreover, in contrast to the existing research on investor networks, instead of focusing on heavily capitalised stocks, we analyse the collective investor trading strategies that emerge after IPOs on the Helsinki Stock Exchange (HSE).

With the growing amounts of data and the availability of new datasets, network theory has become a popular approach for analysing complex financial systems (e.g., [19]). Notwithstanding the high interest in market structure, investor networks and the complexity of investor behavioural interrelationships remain weakly explored. Indeed, high-precision investor-level financial datasets covering years of historical data and containing information about social links are very rare and expensive because of their sensitive nature. Moreover, transactional data often have no explicit or implicit links between investors. As a consequence, network inference methodologies have gained much interest in recent research [11, 13]. Similar to [14], we use the statistical validation method proposed by [20], which best suits our objectives and the available dataset.

In the current paper, we infer investor networks based on the investors’ trading co-occurrences for 14 securities that had their IPOs between the years 1995 and 2007, and we obtain dynamic multilink networks covering two years after their IPOs. Further, by applying the Infomap algorithm [21] on the investor networks, we analyse the networks’ topology. With the obtained network partitioned into clusters, we detect statistically robust clusters that persist in the networks between the first and the second years after the IPO. We also find clusters that form and re-occur over multiple securities. Finally, by cross-validating investor clusters on IPO securities with the investor clusters of a more mature stock, we conclude that the phenomenon of persistent clusters is not IPO specific but is rather market specific.

Dataset and methodology

Dataset

In this paper, we use a unique database provided by Euroclear Finland. The dataset contains, on a daily basis, all transactions executed on the Helsinki Stock Exchange by shareholders of Finnish stocks between 1995 and 2009. The data records represent the official certificates of ownership and include all the transactions executed on the Helsinki Stock Exchange that change the ownership of assets. Each transaction in the dataset has a rich set of attributes, such as investor sector code, investor birth year, gender and postal code, that we make use of in our analysis to identify and characterise the investor groups. The dataset classifies investors into six main categories: households; nonfinancial corporations; financial and insurance corporations; government; nonprofit institutions; and the rest of the world. Each Finnish domestic investor corresponds to a separate account ID, while foreign investors can choose nominee registration for their trades. The analysis cannot be conducted for nominee-registered transactions because individual nominee investors cannot be uniquely identified. Rather, the nominee investors are pooled together under the custodian’s nominee trading account, so a single nominee-registered account may correspond to the large aggregated ownership of several foreign investors. To avoid inconsistencies in the results, we therefore eliminated nominee transactions from our analysis. This dataset has also been analysed and described in previous research (e.g., [22, 15, 9, 10, 18]).

The analysed data are restricted to marketplace transactions for securities that had their IPO listing on the Helsinki Stock Exchange between 1995 and 2009. The official listing dates were provided by NASDAQ OMX Nordic explicitly for the current research. In total, 67 stocks were listed in Finland on the Main Exchange or First North in the given time period. Unfortunately, the data appeared to have issues with the trading date attribute for some securities, particularly for the transactions between 1998 and 2004. Specifically, the net trading volumes at a daily resolution do not reconcile to 0 for all trading dates, although the volume sold should equal the volume bought for each stock on each day across all investors. At the same time, we noticed that over the whole analysed period, the trading volume cumulatively sums to 0 for each of the 67 selected stocks. This suggests that some transactions in the dataset were misplaced timewise because of incorrectly recorded trading dates. Because our analysis is based on the investors’ synchronous trading actions, the securities with incorrectly timestamped transactions were eliminated from the dataset to avoid a misinterpretation of the results. We also excluded three of the least-liquid securities with a very small number of transactions and investors. This resulted in a subset of 14 securities with correctly registered transactions, where one of the stocks was issued in 1995 and 13 of the stocks had IPOs between 2003 and 2007. The reduced set includes 11 securities with medium liquidity and three with high liquidity in terms of the total number of transactions and unique investors per security (see Table 1). Some companies (e.g. Oriola) have two share classes with different voting rights. Class A shares give the owner more voting rights than Class B shares and hence potentially attract a separate group of investors. Therefore, the comparison or a direct substitution of shares with one another seems improper, and we consider the securities with different voting classes as separate stocks.
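The reconciliation check used above to screen out misdated transactions can be sketched as follows. This is a minimal illustration only: the tuple layout (ISIN, date, signed volume), the function name and the toy records are assumptions made for the example and do not reflect the actual Euroclear data schema.

```python
from collections import defaultdict

def unreconciled_days(transactions):
    """Return {(isin, date): net_volume} for stock-days where buys and sells do not cancel.

    `transactions` is an iterable of (isin, date, signed_volume) tuples, with a
    positive volume for a purchase and a negative one for a sale (hypothetical layout).
    """
    net = defaultdict(int)
    for isin, date, signed_volume in transactions:
        net[(isin, date)] += signed_volume
    return {key: value for key, value in net.items() if value != 0}

# Toy records: the sale matching the first purchase is booked one day late.
toy = [
    ("STOCK_A", "2001-03-01", +100),
    ("STOCK_A", "2001-03-02", -100),
    ("STOCK_A", "2001-03-05", +50),
    ("STOCK_A", "2001-03-05", -50),
]
bad_days = unreconciled_days(toy)                 # two stock-days fail to reconcile
total_net = sum(volume for _, _, volume in toy)   # yet the cumulative sum is zero
```

In the toy data, the daily check flags two days although the cumulative net volume is zero, which is the pattern the text attributes to incorrectly recorded trading dates.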

| Company name | ISIN | Industry | Total # of transactions | # of unique investors | IPO date |
|---|---|---|---|---|---|
| Neste | FI0009013296 | Oil & Gas | 4,903,603 | 81,773 | 2005 Apr 21 |
| KONE | FI0009013403 | Industrials | 3,073,482 | 30,204 | 2005 Jun 01 |
| Cargotec | FI0009013429 | Industrials | 1,768,379 | 29,214 | 2005 Jun 01 |
| Kemira GrowHow | FI0009012843 | Basic Materials | 301,810 | 25,256 | 2004 Oct 18 |
| Oriola B | FI0009014351 | Health Care | 296,249 | 19,300 | 2006 Jul 03 |
| Ahlstrom | FI0009010391 | Basic Materials | 158,464 | 16,599 | 2006 Mar 17 |
| SRV Yhtiöt | FI0009015309 | Industrials | 71,302 | 9,601 | 2007 Jun 15 |
| Affecto | FI0009013312 | Technology | 56,833 | 5,727 | 2005 Jun 01 |
| Oriola A | FI0009014344 | Health Care | 41,711 | 5,600 | 2006 Jul 03 |
| Terveystalo Health | FI0009012413 | Health Care | 38,929 | 8,975 | 2007 Apr 10 |
| Salcomp | FI0009013924 | Industrials | 38,510 | 3,689 | 2006 Mar 17 |
| FIM Group | FI0009013593 | Financials | 21,824 | 3,086 | 2006 Apr 21 |
| OMX | SE0000110165 | Financials | 21,138 | 1,852 | 2003 Sep 04 |
| Aspoyhtymä | FI0009004881 | Industrials | 16,687 | 2,070 | 1995 Jan 12 |
Table 1: Summary of IPO stocks. Company, International Securities Identification Number (ISIN), industry, total number of transactions, total number of unique investors and the IPO day of security in the reduced set.

Table 2 gives the number of investors, the number of transactions and the traded volume for the entire set of 67 IPO stocks and for the reduced set of 14 stocks. The total number of investors who traded an IPO security is 360,607, and the total number of transactions is 55,161,521. The table also shows the number of nominee and non-nominee-registered investors.

| Investor category | Entire set: # ids | Entire set: volume | Entire set: # transactions | Reduced set: # ids | Reduced set: volume | Reduced set: # transactions |
|---|---|---|---|---|---|---|
| Non-financial corporations | 19,957 | 8,215,911,863 | 2,685,198 | 8,363 | 1,049,640,371 | 493,666 |
| Financial and insurance corporations | 767 | 245,572,690,436 | 40,068,666 | 750 | 23,307,610,030 | 8,149,556 |
| Government | 374 | 4,937,302,118 | 225,817 | 113 | 590,306,199 | 31,099 |
| Households | 334,214 | 7,158,812,157 | 9,389,947 | 121,434 | 735,428,351 | 1,555,604 |
| Non-profit institutions | 2,359 | 709,718,657 | 216,629 | 1,067 | 78,654,895 | 38,824 |
| Rest of the world | 2,936 | 9,945,272,389 | 2,575,264 | 806 | 1,012,077,566 | 563,797 |
| Total | 360,607 | 276,539,707,620 | 55,161,521 | 132,533 | 26,773,717,412 | 10,832,546 |
| Nominee registered | 86 | 225,960,607,014 | 35,958,801 | 71 | 21,987,309,532 | 7,566,437 |
| Non-nominee registered | 360,560 | 50,579,100,606 | 19,202,720 | 132,495 | 4,786,407,880 | 3,266,109 |
Table 2: Summary of the number of investors, absolute exchanged shares volume and the number of transactions of the entire set and the reduced set. Note that the total volume in the table is counted twice, both for the selling and buying transactions. Here, 39 out of 86 investors in the entire set and 33 out of 71 investors in the reduced set with a nominee-registered holding type also made transactions with a non-nominee-registered holding type.

As shown, a few nominee accounts perform roughly twice as many trades as the non-nominee accounts. The reduced set accounts for 37% of all investors, 10% of total traded volume and 20% of transactions in the entire set.

Methodology

The given dataset is composed of transaction data where investors’ social links are not explicitly given, nor can they be directly obtained from other sources because of data anonymisation. However, given that investors must individually react and adapt to a quickly changing environment, they should identify and follow the best trading strategies. To detect investors with similar trading strategies or, more precisely, trade timing similarity, we take a look at the pairwise investors’ trading co-occurrences. In the current paper, we use a statistically validated network (SVN) method first introduced by [20]. This method, briefly presented below, has been demonstrated to be effective in investigating financial, biological and social systems [20, 12].

To compare the trading position taken by an investor $i$ on a given day $t$, irrespective of the absolute volume traded, a categorical variable is introduced that describes the investor’s trading activity. For each investor $i$ and each trading day $t$, given the volume sold of a security $V_s(i,t)$ and the volume bought of a security $V_b(i,t)$, we calculate the scaled net volume ratio as follows:

$$r(i,t) = \frac{V_b(i,t) - V_s(i,t)}{V_b(i,t) + V_s(i,t)} \qquad (1)$$

Then, a daily trading state can be assigned to investor $i$ after having selected a threshold $\theta$, as follows:

$b$ (primarily buying state) when $r(i,t) > \theta$;
$s$ (primarily selling state) when $r(i,t) < -\theta$;
$bs$ (buying and selling state) when $-\theta \le r(i,t) \le \theta$.

In our analysis, we set $\theta$ much like in [8]. We have verified that the calculations are not sensitive to this selection: the results do not vary significantly for thresholds ranging from 0.01 to 0.25. With this categorisation, the system can be mapped into a bipartite network, with one set of nodes composed of investors and the other set composed of the trading days.

The states $b$, $s$ and $bs$ of investor $i$ are indicated as $i_b$, $i_s$ and $i_{bs}$, respectively. There are nine possible combinations of the three trading states between investors $i$ and $j$: $(i_b, j_b)$, $(i_b, j_s)$, $(i_b, j_{bs})$, $(i_s, j_b)$, $(i_s, j_s)$, $(i_s, j_{bs})$, $(i_{bs}, j_b)$, $(i_{bs}, j_s)$ and $(i_{bs}, j_{bs})$. Because we are focusing on the positive relationship between investors’ trading strategies, we further analyse only the situations where both investors have been in the buy state $(i_b, j_b)$, both investors have been in the sell state $(i_s, j_s)$, and both investors have been day traders $(i_{bs}, j_{bs})$, thus excluding the other six trading state co-occurrences.
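The state assignment described above can be sketched in a few lines. The function name, the handling of inactive days (returning `None`) and the threshold value 0.1 used in the example are assumptions made for illustration; the value 0.1 is simply one point inside the 0.01 to 0.25 range over which the results are reported to be stable.

```python
def trading_state(volume_bought, volume_sold, theta):
    """Map one investor-day to 'b', 's' or 'bs' from the scaled net volume ratio.

    Returns None when the investor did not trade that day (assumed convention).
    """
    total = volume_bought + volume_sold
    if total == 0:
        return None
    ratio = (volume_bought - volume_sold) / total  # scaled net volume ratio, in [-1, 1]
    if ratio > theta:
        return "b"    # primarily buying
    if ratio < -theta:
        return "s"    # primarily selling
    return "bs"       # buying and selling in nearly equal amounts

THETA = 0.1  # hypothetical threshold, for illustration only
states = [trading_state(b, s, THETA) for b, s in [(100, 0), (0, 100), (100, 95), (0, 0)]]
```

Because the ratio is scaled by the total traded volume, a large and a small investor who both only buy on a given day are assigned the same state.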

Statistically validated networks

With the categorical variables on the trading states, the co-occurrence of the trading states of investors $i$ and $j$ can be identified and statistically validated. First, for each investor, her or his activity period is identified. Second, for an investor pair, the length $T$ of a joint trading period is determined, which is equal to the number of trading days in an annual data sample for a given security (approximately 250). Then, in the intersection periods of the traders’ activity, $N_i^P$ ($N_j^P$) denotes the number of days when investor $i$ ($j$) is in a given state $P$. Moreover, $N_{i,j}^P$ denotes the number of days when we observe the co-occurrence of the given state for investors $i$ and $j$. Under the null hypothesis of random co-occurrences of state $P$ for investors $i$ and $j$, the probability of observing $X$ co-occurrences of the investigated state for two investors in $T$ observations can be expressed by the hypergeometric distribution $H(X \mid T, N_i^P, N_j^P)$ [20]. For each trading state $P$, a $p$-value can be associated as follows:

$$p(N_{i,j}^P) = 1 - \sum_{X=0}^{N_{i,j}^P - 1} H(X \mid T, N_i^P, N_j^P) \qquad (2)$$
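As an illustration of Eq. 2, the sketch below evaluates the hypergeometric right tail with exact binomial coefficients from the Python standard library. The function names and the example counts (20 active days per investor, 10 shared days out of 250) are hypothetical. The code sums the upper tail directly, which is mathematically equivalent to one minus the lower sum but avoids numerical cancellation for very small $p$-values.

```python
from math import comb

def hypergeom_pmf(x, n_days, n_i, n_j):
    """Probability of exactly x co-occurrences when the two investors' state days are independent."""
    if x < 0 or x > min(n_i, n_j):
        return 0.0
    return comb(n_i, x) * comb(n_days - n_i, n_j - x) / comb(n_days, n_j)

def cooccurrence_p_value(n_ij, n_days, n_i, n_j):
    """Probability of observing at least n_ij co-occurrences by chance (right tail)."""
    return sum(hypergeom_pmf(x, n_days, n_i, n_j)
               for x in range(n_ij, min(n_i, n_j) + 1))

# Hypothetical pair: each investor is in the buy state on 20 of 250 days, 10 of them shared.
p_link = cooccurrence_p_value(10, 250, 20, 20)
# With no shared days required, the tail covers the whole distribution.
p_none = cooccurrence_p_value(0, 250, 20, 20)
```

With roughly 1.6 shared days expected by chance in this example, ten shared days yield a very small $p$-value, so the link would be a candidate for validation once the multiple-testing correction is applied.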

Using the SVN method, we construct dynamic networks for the IPO securities. The analysis for each security spans from the initial listing day up to the second year after the IPO. We assign the categorical variables that define the investor’s daily trading state, and we select only domestic Finnish investors who have traded an IPO stock at least five days during the first or second year. For each analysed security, we take two consecutive one-year periods of categorised trading states for investors. Taking the projection of the investor set in a year, we obtain an annual monopartite investor network, and two investor networks for consecutive years are obtained for each security.

We adjust the $p$-thresholds using a false discovery rate (FDR) correction [23]: we sort the $p$-values in increasing order, $p_1 \le p_2 \le \dots \le p_{N_t}$, and retain those up to the largest rank $k$ that satisfies $p_k \le k\alpha/N_t$. Here, we apply $\alpha = 0.05$, and $N_t$ equals the total number of observed relationships in a year. This adjustment is needed because there are multiple links and thus multiple tests within a given network. All networks are essentially multilink networks, where each link describes the type of trading co-occurrence between an investor pair. The link between investors $i$ and $j$ is considered to be statistically significant, and thus existing, if the corresponding $p$-value, $p(N_{i,j}^P)$, is below the FDR-adjusted $p$-threshold. In this way, we obtain validated dynamic networks for the first and second years. As an example, Fig. 10 in Appendix C shows the first-year sorted $p$-values and the FDR thresholds for Kemira GrowHow links.
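The correction can be sketched as below, assuming that the FDR correction of [23] is the standard step-up procedure in which the $k$-th smallest $p$-value is compared with $k\alpha/N_t$. The function name and the example $p$-values are hypothetical.

```python
def fdr_validated(p_values, alpha=0.05):
    """Return the indices of the tests retained by a step-up FDR correction."""
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda idx: p_values[idx])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep track of the largest rank whose p-value is under its own threshold.
        if p_values[idx] <= rank * alpha / n_tests:
            k_max = rank
    return sorted(order[:k_max])

# Five hypothetical link tests; per-rank thresholds are 0.01, 0.02, 0.03, 0.04, 0.05.
validated = fdr_validated([0.001, 0.2, 0.012, 0.045, 0.025])
# Step-up behaviour: the smallest p-value misses its threshold, the second one passes,
# so both tests are retained.
step_up = fdr_validated([0.02, 0.03, 0.9])
```

The second example shows why the procedure is described in terms of the largest passing rank: a test can be retained even if its own rank-specific threshold is not met, as long as a higher-ranked test passes.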

Statistically validated cluster dynamics timewise

We are interested in the investors’ cluster dynamics in the networks and their evolution over time. In other words, we want to verify whether investors systematically synchronise their trading strategies with other investors and whether such behaviour can be detected in dynamic networks. With the community partition for each network, we identify persistent clusters (i.e., clusters that share the same statistically significant component of investors in both the first and the second years after the IPO). Below, we briefly present the method from [24].

We are interested in identifying statistically similar clusters that emerged in both years (i.e., pairs of clusters in which the same investor composition is overexpressed, so that the shared elements are nonrandom). The probability that $X$ elements in the cluster $C_1$ of the first-year network, composed of $N_{C_1}$ elements, also appear in the cluster $C_2$ of the second year, composed of $N_{C_2}$ elements, under the null hypothesis that the elements in each cluster are randomly selected, is given by the hypergeometric distribution $H(X \mid N, N_{C_1}, N_{C_2})$, where $N$ is the total number of unique elements over two years. By using this distribution, a $p$-value can be associated with the observed number $N_{C_1,C_2}$ of elements of the cluster $C_1$ reoccurring in $C_2$ according to the following equation:

$$p(N_{C_1,C_2}) = 1 - \sum_{X=0}^{N_{C_1,C_2} - 1} H(X \mid N, N_{C_1}, N_{C_2}) \qquad (3)$$

We reject the null hypothesis if $p(N_{C_1,C_2})$ is smaller than a given adjusted threshold, in which case we say that the cluster $C_1$ is statistically similar to the cluster $C_2$. We adjust the statistical threshold using the FDR correction with level $\alpha$, the number of tests being equal to the total number of cluster pairs over two years that share at least one common element.
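The timewise comparison can be sketched as follows: enumerate the first-year and second-year cluster pairs that share at least one investor, and score each pair with the right tail of Eq. 3. The dictionary layout of the partitions, the function names and the toy investor ids are assumptions made for this example; the number of returned pairs is the number of tests that would enter the FDR correction.

```python
from math import comb

def overlap_p_value(n_12, n_total, n_1, n_2):
    """Probability of at least n_12 shared elements between two random clusters."""
    tail = sum(comb(n_1, x) * comb(n_total - n_1, n_2 - x)
               for x in range(n_12, min(n_1, n_2) + 1))
    return tail / comb(n_total, n_2)

def candidate_pairs(year_1, year_2):
    """Score every cluster pair across the two years that shares at least one investor."""
    investors = set().union(*year_1.values(), *year_2.values())
    n_total = len(investors)  # unique elements over the two years
    results = {}
    for name_1, members_1 in year_1.items():
        for name_2, members_2 in year_2.items():
            shared = len(members_1 & members_2)
            if shared > 0:
                results[(name_1, name_2)] = overlap_p_value(
                    shared, n_total, len(members_1), len(members_2))
    return results

# Toy partitions over 40 hypothetical investor ids.
year_1 = {"A": set(range(0, 10)), "B": set(range(10, 40))}
year_2 = {"C": set(range(0, 8)) | {10}, "D": set(range(11, 40)) | {8, 9}}
p_values = candidate_pairs(year_1, year_2)
```

In the toy data, cluster A shares eight of its ten members with C, which gives a very small $p$-value, whereas the two members it shares with the much larger cluster D are fully compatible with chance.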

Statistically validated cluster dynamics security-wise

Additionally, to check whether the same cluster exists over multiple securities, we expand the analysis and look for statistically significant overlapping clusters security-wise. Because the IPO event is the alignment point in our analysis, we look for the overlapping clusters in the set of first-year networks and the set of second-year networks separately. We again use the method of Eq. 3 for the cluster overlaps to detect clusters with nonrandomly overlapping elements (investors). To calculate the $p$-values, we take $N$ equal to the total number of unique investors across all investigated securities in the same year, where $N_{C_1}$ is the number of investors in the cluster $C_1$, $N_{C_2}$ is the number of investors in the cluster $C_2$, and $N_{C_1,C_2}$ is the number of common investors in both $C_1$ and $C_2$. Again, we adjust the statistical threshold using the FDR correction with level $\alpha$, where the number of tests is equal to the total number of cluster pairs within the same year that share at least one common element.

Over- and underexpression of the characterising investor attributes

To describe the investor clusters from the perspective of attributes such as postal code, age, gender or the type of organisation, we again use the hypergeometric test for identifying nonrandom overlap [25]. Once we obtain a system of $N$ elements partitioned into clusters (communities), we want to characterise each cluster $C$ of $N_C$ elements. Each element of the system has a certain number of attributes from a specific class. Here, we want to see whether the number of elements in the cluster with a specific attribute value is significantly larger than expected when randomly selecting the elements from the total system. For each attribute $Q$ of the system, we test whether $Q$ is overexpressed in the cluster $C$. The probability that $X$ elements in cluster $C$ have the attribute $Q$, under the null hypothesis that the elements in the cluster are randomly selected, is given by the hypergeometric distribution $H(X \mid N, N_C, N_Q)$, where $N_Q$ is the total number of elements in the system with attribute $Q$. By using this distribution, a $p$-value can be associated with the observed number $N_{C,Q}$ of elements in cluster $C$ that have the attribute $Q$, analogously with Eq. 3. We reject the null hypothesis if the $p$-value is smaller than a given FDR-adjusted $p$-threshold, and we then say that the attribute $Q$ is overexpressed in cluster $C$. In the FDR adjustment, the number of tests is equal to the total number of unique attribute values over all attribute classes and all clusters in a network.

Alternatively, the attribute’s underexpression can also be tested. Here, we want to see whether the number of elements in the cluster with a specific attribute value is significantly lower than expected when randomly selecting the elements from the total system. The probability under the null hypothesis that the number of elements with attribute $Q$ in cluster $C$ is no larger than the observed value $N_{C,Q}$ can be obtained from the left tail of the hypergeometric distribution, as follows:

$$p_u(N_{C,Q}) = \sum_{X=0}^{N_{C,Q}} H(X \mid N, N_C, N_Q) \qquad (4)$$

Again, if $p_u(N_{C,Q})$ is smaller than a given FDR-adjusted $p$-threshold, we say that the attribute $Q$ is underexpressed in cluster $C$. We used the same setting for the FDR correction.
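Both tails can be illustrated with counts reported later in Tables 4 and 5 for FI0009013296: 8 of 20 cluster members with sector code General-Government against 29 of 3,288 in the first-year network, and 3 of 19 cluster members that are households against 2,647 of 3,134 in the second-year network. The function names are hypothetical, and the sketch yields raw $p$-values only, before any FDR adjustment.

```python
from math import comb

def hypergeom_pmf(x, n_total, n_cluster, n_attr):
    """Chance that exactly x of the n_cluster members carry the attribute."""
    if x < 0 or x > min(n_cluster, n_attr):
        return 0.0
    return (comb(n_attr, x) * comb(n_total - n_attr, n_cluster - x)
            / comb(n_total, n_cluster))

def over_expression_p(n_cq, n_total, n_cluster, n_attr):
    """Right tail: at least n_cq cluster members carry the attribute."""
    return sum(hypergeom_pmf(x, n_total, n_cluster, n_attr)
               for x in range(n_cq, min(n_cluster, n_attr) + 1))

def under_expression_p(n_cq, n_total, n_cluster, n_attr):
    """Left tail: at most n_cq cluster members carry the attribute."""
    return sum(hypergeom_pmf(x, n_total, n_cluster, n_attr)
               for x in range(0, n_cq + 1))

p_over = over_expression_p(8, 3288, 20, 29)        # General-Government, first year
p_under = under_expression_p(3, 3134, 19, 2647)    # Households, second year
```

Both raw $p$-values are far below 0.05, in line with these attributes being reported as over- and underexpressed, respectively.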

Results

We identify investor clusters in the statistically validated networks using the Infomap community detection algorithm [21]. Communities are locally dense connected subgraphs in a network that play a particularly important role in understanding a system’s topology. In the current paper, communities represent investor clusters that time their trades synchronously throughout the year. The three network layers, one for each possible joint trading state, are first aggregated into one network with weighted links so that each link is given a weight in {1, 2, 3}, depending on how many network layers have a validated link between a given investor pair. For example, if two investors timed their buy transactions similarly, so that they have a statistically validated link in the buy layer, but there was no statistical association in the sell and buy–sell states, then the weight of the link between them would be 1. We run 100 Infomap trials on the networks with edge weights. Further, we detect clusters that overlap statistically significantly in the first and second years for each security. Fig. 1 (a) and (b) visualise the Infomap clusters for the first- and second-year networks of the security Kemira GrowHow (FI0009012843). Fig. 1 (c) and (d) display five clusters that persist over the first two years after the IPO. Note that in this case, most of the investor clusters actually disappear after the first year, and completely new investor clusters are born in the second year. In addition, the size and shape of the network are considerably different between the years.

The latter observation is consistent across all of the analysed securities. Table 3 shows the number of asset-specific clusters over the total number of communities, as well as the persisting clusters, per security. By asset-specific or unique clusters, we refer to the clusters that are not observable within the investor networks of other IPO securities. For many stocks, most of the clusters do not persist into the second year, and new investor clusters form in the following year. However, the share of asset-specific clusters is rather small: on average per asset, it is around 31% in the first year and around 24% in the second year.
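The aggregation of the three validated layers into one weighted network, described at the start of this section, can be sketched as follows. The layer names, the representation of links as unordered investor pairs and the toy investors are assumptions for illustration; the community detection itself is not reproduced here.

```python
from collections import Counter

def aggregate_layers(layers):
    """Merge validated link sets into {frozenset({i, j}): weight}.

    The weight counts the layers in which the pair has a validated link.
    """
    weights = Counter()
    for links in layers.values():
        # Treat links as undirected and count each pair once per layer.
        for pair in {frozenset(link) for link in links}:
            weights[pair] += 1
    return dict(weights)

# Hypothetical validated links for three investors A, B and C.
layers = {
    "buy": [("A", "B"), ("B", "C")],
    "sell": [("B", "A")],
    "buy-sell": [],
}
weighted = aggregate_layers(layers)
```

In this toy case, the pair A and B is validated in two layers and receives weight 2, while B and C receives weight 1. With three layers, the weight is always in {1, 2, 3}.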

| ISIN | Unique clusters Y1 | Unique clusters Y2 | Persisting clusters Y1 Y2 |
|---|---|---|---|
| FI0009013296 | 66/262 (25%) | 138/336 (41%) | |
| FI0009013429 | 22/133 (17%) | 11/89 (12%) | |
| FI0009013403 | 16/112 (14%) | 8/92 (9%) | |
| FI0009015309 | 19/64 (30%) | 7/29 (24%) | |
| FI0009014351 | 16/56 (29%) | 1/6 (17%) | |
| FI0009012843 | 8/54 (15%) | 11/64 (17%) | |
| FI0009010391 | 5/37 (14%) | 11/50 (22%) | |
| FI0009013924 | 5/29 (17%) | 4/11 (36%) | |
| FI0009013312 | 5/26 (19%) | 2/13 (15%) | |
| FI0009012413 | 14/22 (64%) | 10/22 (45%) | |
| FI0009013593 | 3/17 (18%) | 0/0 (0%) | |
| FI0009004881 | 7/14 (50%) | 3/10 (30%) | |
| SE0000110165 | 2/4 (50%) | 0/0 (0%) | |
| FI0009014344 | 2/3 (67%) | 0/0 (0%) | |
Table 3: Number of asset-specific clusters and persisting clusters in the first and second years after an IPO. Columns ‘Unique clusters Y1 (Y2)’ show the number of asset-specific investor clusters in the first (second) year networks, where asset-specific investor clusters are defined as those that were not observed in other IPO networks. The number in the brackets shows the ratio in percentage. ‘Persisting clusters Y1 Y2’ shows the number of clusters with statistically significant overlaps in the first and the second years. Note that because of cluster splits and merges, the number of persisted clusters is not necessarily the same for both years.
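The average shares of asset-specific clusters quoted in the text can be approximately recomputed from the fractions in Table 3. The sketch below assumes that securities without a second-year network (0/0) are left out of the second-year average; this assumption is not stated in the text but is the one under which the quoted second-year figure is reproduced.

```python
# (unique, total) cluster counts per security, copied from Table 3 in row order.
year_1 = [(66, 262), (22, 133), (16, 112), (19, 64), (16, 56), (8, 54), (5, 37),
          (5, 29), (5, 26), (14, 22), (3, 17), (7, 14), (2, 4), (2, 3)]
year_2 = [(138, 336), (11, 89), (8, 92), (7, 29), (1, 6), (11, 64), (11, 50),
          (4, 11), (2, 13), (10, 22), (0, 0), (3, 10), (0, 0), (0, 0)]

def mean_share(rows):
    """Average percentage of unique clusters, skipping securities with no network."""
    shares = [unique / total for unique, total in rows if total > 0]
    return 100 * sum(shares) / len(shares)

mean_y1 = mean_share(year_1)  # averaged over 14 securities
mean_y2 = mean_share(year_2)  # averaged over the 11 securities with a second-year network
```

The unrounded fractions give averages of roughly 30.5% and 24.5%, close to the figures of around 31% and 24% quoted above; the small difference in the first year likely comes from averaging the rounded percentages instead.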
Figure 1: Infomap clusters and their evolution for Kemira GrowHow (FI0009012843). Community detection is used with weighted links based on the total number of buy state, sell state, and day trade link types between two investors. (a) FDR: 54 clusters, first year after IPO, (b) FDR: 64 clusters, second year after IPO, (c) and (d) show five statistically significant overlapping clusters in both years. Node position is fixed. The colours of reoccurring clusters in all graphs coincide. In (a) and (b), each cluster has a unique colour, with the exception of those with fewer than four elements, which are coloured in grey.

In our empirical analysis over all the IPO companies, we first analyse the over- and underexpression of the investor attributes in the identified investor clusters. Table 4 summarises the overexpressed cluster attributes. The No-Gender and No-Age attributes refer to the institutional investors but also to the individual investors who had no gender and/or birth year indicated in the data, which is why the numbers of No-Gender and No-Age attributes in the same cluster of Table 4 may differ. For FI0009013296, the five largest clusters in the networks of both years are presented. Clusters composed of fewer than eight elements with overexpressed attributes, found for FI0009010391 (Y1) and FI0009012413, are not presented in the table. FI0009014351, FI0009014344, FI0009013924, FI0009013593, SE0000110165 and FI0009004881 do not contain clusters with overexpressed attributes. Institutions, namely general government and nonprofit organisations, are over-represented in the largest clusters of the analysed networks. In addition, in the larger networks, location and decade attributes are overexpressed. At the same time, according to Table 5, households are under-represented in the largest clusters. (Clusters in FI0009013403, FI0009014351, FI0009014344, FI0009012413, FI0009013924, FI0009013593, SE0000110165 and FI0009004881 had no underexpressed attributes.) Other attributes that are under-represented in the analysed clusters are ‘Male’, ‘Helsinki’ and the ‘South-West’ region. Overall, the results of both tables show that the largest clusters mainly contain institutions that time their trades similarly within a year. Compared with household investors, institutional traders thus form larger clusters that follow similar trade-timing strategies. Our findings therefore support the studies that provide evidence of institutional herding [26, 27]. Several explanations are possible. Some of the financial institutions, such as pension insurance companies, are driven by the same legislation and portfolio restrictions, which can lead to the same trading strategies. Alternatively, traders working for financial institutions may have mutual and/or joint private information channels, leading to similar trade timing. A third explanation is that they react to public news in similar ways.

| ISIN | Year | Cluster rank | Attribute | Attribute value | # occur. cluster | # occur. netw. | # distinct |
|---|---|---|---|---|---|---|---|
| FI0009013296 | Y1 | 1 | location | Central-Finland | 79/129 | 261/3,288 | 11 |
| | | 2 | location | Helsinki | 36/49 | 1,145/3,288 | 7 |
| | | 3 | sector code | General-Government | 8/20 | 29/3,288 | 5 |
| | | | gender | No-Gender | 11/20 | 447/3,288 | 2 |
| | | | decade | No-Age | 13/20 | 504/3,288 | 4 |
| | | 4 | location | Northern-Finland | 11/17 | 244/3,288 | 4 |
| | | 5 | location | Western-Tavastia | 10/17 | 304/3,288 | 5 |
| | Y2 | 1 | decade | 1940 | 166/543 | 765/3,134 | 10 |
| | | | gender | Male | 433/543 | 2,314/3,134 | 3 |
| | | 2 | location | Central-Finland | 20/30 | 183/3,134 | 4 |
| | | 3 | decade | 2000 | 2/24 | 3/3,134 | 8 |
| | | 4 | sector code | General-Government | 3/19 | 21/3,134 | 5 |
| | | | sector code | Non-Profit | 5/19 | 24/3,134 | 5 |
| | | | gender | No-Gender | 12/19 | 423/3,134 | 3 |
| | | | decade | No-Age | 16/19 | 474/3,134 | 4 |
| | | 5 | location | Central-Finland | 8/18 | 183/3,134 | 7 |
| FI0009013403 | Y1 | 1 | sector code | General-Government | 9/17 | 21/441 | 4 |
| | | | gender | No-Gender | 14/17 | 105/441 | 2 |
| | | | decade | No-Age | 17/17 | 132/441 | 1 |
| | Y2 | 1 | sector code | General-Government | 8/13 | 16/408 | 3 |
| | | | gender | No-Gender | 9/13 | 80/408 | 2 |
| | | | decade | No-Age | 13/13 | 99/408 | 1 |
| FI0009012843 | Y1 | 1 | sector code | General-Government | 8/18 | 16/371 | 4 |
| | | | gender | No-Gender | 16/18 | 105/371 | 2 |
| | | | decade | No-Age | 18/18 | 128/371 | 1 |
| | | 2 | location | Ostrobothnia | 6/14 | 29/371 | 4 |
| | Y2 | 1 | location | South-West | 56/62 | 146/776 | 4 |
| | | 2 | sector code | General-Government | 8/22 | 19/776 | 5 |
| | | | sector code | Non-Profit | 4/22 | 12/776 | 5 |
| | | | gender | No-Gender | 15/22 | 146/776 | 3 |
| | | | decade | No-Age | 17/22 | 171/776 | 5 |
| FI0009013429 | Y1 | 1 | sector code | General-Government | 8/20 | 18/571 | 5 |
| | | | sector code | Non-Profit | 5/20 | 10/571 | 5 |
| | | | gender | No-Gender | 13/20 | 112/571 | 2 |
| | | | decade | No-Age | 17/20 | 149/571 | 2 |
| | | 2 | sector code | General-Government | 4/8 | 18/571 | 3 |
| | Y2 | 1 | sector code | General-Government | 8/15 | 12/360 | 4 |
| | | | gender | No-Gender | 9/15 | 65/360 | 2 |
| | | | decade | No-Age | 13/15 | 81/360 | 3 |
| FI0009010391 | Y2 | 1 | sector code | General-Government | 5/12 | 7/309 | 4 |
| | | | decade | No-Age | 11/12 | 73/309 | 2 |
| FI0009015309 | Y1 | 1 | sector code | General-Government | 6/17 | 9/273 | 5 |
| | | | location | Helsinki | 16/17 | 140/273 | 2 |
| | | | decade | No-Age | 12/17 | 61/273 | 4 |
| | Y2 | 1 | sector code | General-Government | 5/10 | 6/122 | 4 |
| | | | decade | No-Age | 9/10 | 34/122 | 2 |
| FI0009013312 | Y1 | 1 | sector code | General-Government | 8/11 | 8/137 | 3 |
| | | | gender | No-Gender | 9/11 | 30/137 | 2 |
| | | | decade | No-Age | 11/11 | 35/137 | 1 |
| | Y2 | 1 | sector code | General-Government | 8/15 | 8/60 | 5 |
| | | | gender | No-Gender | 10/15 | 18/60 | 2 |
| | | | decade | No-Age | 14/15 | 22/60 | 2 |

Table 4: Overexpressed attributes in the largest clusters with at least eight nodes. Blank cells repeat the value of the row above. Here, the column ‘Cluster rank’ is the rank of the cluster with validated overexpressed attributes, ‘# occur. cluster’ is the number of times the attribute is present in the cluster over the size of the cluster and ‘# occur. netw.’ is the number of times the attribute is present in the network over the total network size. ‘# distinct’ is the number of distinct attributes of the same class in the cluster. ‘Y1’ (‘Y2’) corresponds to the network clusters in the first (second) year after the IPO.

ISIN year cluster rank attribute attribute value # occur. cluster # occur. netw. #
FI0009013296 Y1 1 location Helsinki 17/129 1145/3,288 11
location South-West 2/129 403/3,288 11
2 location Helsinki 7/20 2,767/3,288 5
3 location Helsinki 2/10 2,767/3,288 5
Y2 1 sector code Households 3/19 2,647/3,134 5
gender Male 3/19 2,314/3,134 3
2 sector code Households 4/17 2,647/3,134 4
FI0009012843 Y1 1 gender Male 2/18 227/371 2
Y2 1 location Helsinki 4/62 304/776 4
2 sector code Households 5/22 602/776 5
gender Male 6/22 540/776 3
FI0009013312 Y1 1 gender Male 2/11 101/137 2
Y2 1 sector code Households 1/15 38/60 5
FI0009015309 Y1 1 sector code Households 5/17 211/273 5
Y2 1 sector code Households 1/10 88/122 4
FI0009013429 Y1 1 sector code Households 3/20 422/571 5
Y2 1 sector code Households 2/15 279/360 4
FI0009010391 Y2 1 sector code Households 1/12 232/309 4
Table 5: Underexpressed attributes in the largest clusters with at least eight nodes. Here, the column ‘cluster rank’ is the rank of the cluster with validated underexpressed attributes, ‘# occur. cluster’ is the number of times the attribute is present in the cluster over the size of the cluster and ‘# occur. netw.’ is the number of times the attribute is present in the network over the total network size. ‘# ’ is the number of distinct attributes of the same type in the cluster. ‘Y1’ (‘Y2’) corresponds to the clusters in the FDR networks in the first (second) year after the IPO.
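
The counts reported in Tables 4 and 5 can be turned into over- and underexpression p-values with a short calculation. The following is a minimal sketch that assumes a hypergeometric null model (cluster members drawn at random, without replacement, from the network); the exact test and the multiple-testing correction behind the tables are not restated in this section, and the function names are hypothetical. The two example calls reuse one row from each table.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k attribute carriers in a cluster of n nodes drawn without
    replacement from a network of N nodes, K of which carry the attribute."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def overexpression_p(k: int, N: int, K: int, n: int) -> float:
    """Upper tail P(X >= k): small values suggest the attribute is overexpressed."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


def underexpression_p(k: int, N: int, K: int, n: int) -> float:
    """Lower tail P(X <= k): small values suggest the attribute is underexpressed."""
    lower = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))


# Table 4 row: General-Government, 9 of 17 cluster nodes, 21 of 441 network nodes.
p_over = overexpression_p(k=9, N=441, K=21, n=17)

# Table 5 row: Male, 2 of 18 cluster nodes, 227 of 371 network nodes.
p_under = underexpression_p(k=2, N=371, K=227, n=18)

print(f"overexpression p-value:  {p_over:.3e}")
print(f"underexpression p-value: {p_under:.3e}")
```

Because every cluster is tested against every attribute value, such raw p-values would still need to be compared against a threshold corrected for multiple testing before an attribute is reported as validated.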

Next, for two consecutive years of each security, we detect persistent clusters with a statistically significant investor overlap, hereafter referred to as the timewise overlap. The visualised results are provided in Appendix A. The clusters in the current paper are visualised as rectangular blocks composed of investors with four attributes: sector code, geographic location, gender and birth-year decade (see Fig. 2). Statistically significant overlapping cluster pairs are connected by an arrow that points from the first-year to the second-year cluster. In the reduced set, three out of 14 securities (FI0009014344, FI0009013593 and SE0000110165) did not produce any valid links for the second-year network. Overlapping clusters that existed in the first year and persisted over the second year were detected in eight out of the remaining 11 securities (no linked clusters were found for FI0009013924, FI0009014351 and FI0009004881). Importantly, and as mentioned above, we do not detect many clusters that persist over time. This can be partially explained by the fact that the analysed networks are rather small: the trading activity for the selected stocks is moderate, and the number of links depends on the co-occurrences of investor trade timing.

Figure 2: Graphical representation of the clusters. A single cluster is visualised as a rectangle block, where a row represents one investor with four attributes: sector code, location, gender and birth year decade.
Sector code: - Households, - Non-financial, - Financial-Insurance, - General-Government, - Non-Profit, - Rest-World.
Geographic location: - Helsinki, - South-West, - Western-Tavastia, - Central-Finland, - Northern-Finland, - Ostrobothnia, - Rest-Uusimaa, - Eastern-Tavastia, - Eastern-Finland, - South-East, - Northern-Savonia.
Gender: - Male, - Female, - No-Gender.
Decade: - No-Age, - 1910, - 1920, - 1930, - 1940, - 1950, - 1960, - 1970, - 1980, - 1990, - 2000.

Further, we analyse the identified clusters that overlap across multiple securities, hereafter referred to as the security-wise overlap, separately for the first- and second-year networks. Fig. 8 in Appendix A shows part of the results for the statistically significant overlaps across multiple securities. Some statistically significant overlapping clusters are not presented in the figure. The arrows between the clusters are omitted for the simplification of the visualisation. In the figure, each cluster has a statistically significant overlap with at least one cluster in a group.

Combining the previous results together, we obtain persistent clusters that emerge in investor networks over multiple securities. The results show that some statistically similar network structures are persistent and general: they appear in different securities and persist over time. Some examples are visualised in Fig. 3, where each group consists of clusters that overlap over time and security-wise. In the figure, the top (bottom) row of the group refers to the first- (second-) year clusters. Moreover, the downward arrows associate statistically similar clusters in the first- and second-year networks. The arrows between the clusters in the same year networks are omitted for the simplification of the visualisation. Notably, even if some of the clusters are not persistent over time, they appear quite generally over different securities.

Figure 3: Statistically significant cluster overlaps across multiple securities and over time. The row alignment shows statistically similar clusters in the same year: the top row is the first year after the IPO, and the bottom row is the second year after the IPO. The downward arrows show the timewise evolution of a cluster from the first to the second year for the same security. A cluster is represented by a rectangle. Each cluster is composed of investors with four attributes: sector code, geographic location, gender and decade. See the attribute colour mapping in Fig. 2.

Do clusters of IPO investors exist with mature companies?

To verify whether the identified persistent clusters are IPO-specific or also exist for mature companies, we compare the clusters of the new-to-the-market stocks with those of two mature companies, Nokia (FI0009000681) and UPM-Kymmene (FI0009005987). The long-term evolution of investor clusters on Nokia, the most capitalised stock in the HSE, has recently been analysed in [14]. Because the results for Nokia and UPM-Kymmene are consistent, we report our findings for Nokia only (the results for UPM-Kymmene are available upon request). For this analysis, we select the transactions between 1995 and 2009 for Nokia and check for the mismatch of the daily trading volume mentioned earlier in the subsection Dataset. We construct SVNs with the FDR correction for Nokia based on yearly synchronised nonrandom trading co-occurrences and identify investor clusters with Infomap cluster partitions. Unfortunately, our data contained incorrectly registered transactions between 2000 and 2004, so we excluded that part of the data. As a result, we obtain nine networks for Nokia. Table 6 shows the time spans of the SVNs and the corresponding number of validated links per network. The yearly periods starting in June 2004 were selected intentionally so that they could be aligned over time and hence be more comparable with the investor networks of some of the IPOs (see Table 1).

Time span Number of validated links (Nokia)
1995 Jan - 1996 Jan 823
1996 Jan - 1997 Jan 1179
1997 Jan - 1998 Jan 6869
1998 Jan - 1999 Jan 38446
2004 Jun - 2005 Jun 303263
2005 Jun - 2006 Jun 120703
2006 Jun - 2007 Jun 48637
2007 Jun - 2008 Jun 59794
2008 Jun - 2009 Jun 243961
Table 6: Number of links and corresponding time spans for Nokia SVNs with the FDR correction.

First, we identify statistically robust clusters for Nokia’s stock across its networks for consecutive years. One example of cluster evolution and persistence over time is shown in Fig. 4. Interestingly, the evolution of a robust cluster can be traced across the gap from 1999 to 2004 and spans a long period. Next, we analyse the overlaps between the investor clusters on Nokia and the investor clusters on IPOs to answer the question of whether the investor clusters identified for IPO securities also exist for a mature company. Fig. 5 shows the identified investor clusters on Nokia that can be linked to the investor clusters on IPOs. In the figure, a statistically significant overlap between a cluster on an IPO and a cluster on Nokia is marked by X. Notably, the clusters on IPOs are also significantly similar to each other; this example cluster group corresponds to the group in the bottom left corner of Fig. 3. Significant similarities within the cluster overlap can be observed from 1996 to 2008. From 2005 to 2007, an overlap with Nokia was detected for each shown cluster of seven securities in both years. We detect further overlapping clusters between the IPO stocks and Nokia, and some examples of the evolution of clusters on Nokia are presented in Appendix B. Because the clusters also persist in a non-IPO stock, their dynamics can be traced several years back. This indicates that they arise not from some IPO-specific factor but from a market-specific origin.

Figure 4: Evolution of investor clusters on Nokia. Example of the evolution of clusters on Nokia over time. Statistically similar clusters are connected by arrows. A cluster is represented by a rectangle. Cluster labels show the year of the cluster and the conventional cluster name. Clusters with labels highlighted in bold are reused in Fig. 5. Each cluster is composed of investors with four attributes: sector code, geographic location, gender and decade. See the attribute colour mapping in Fig. 2.
Figure 5: An example of investor cluster overlap between a sample of clusters observed throughout the years in Nokia investor networks and IPO networks during the first and second years. Each cluster is represented by a rectangle. Clusters on Nokia are rotated clockwise for visualisation purposes. Cluster labels show the year of the cluster and the conventional cluster name (introduced in Fig. 4). X marks statistically significant overlaps between the clusters on Nokia and IPOs. An overlap of clusters between the IPOs and Nokia was observed for eight (seven) of the 14 networks in the first (second) year. Note that FI0009013312 has two clusters in the first year. Each cluster is composed of investors with four attributes: sector code, geographic location, gender and decade. See the attribute colour mapping in Fig. 2.

Conclusions

In the current paper, we analysed investor interactions and behaviours using a unique dataset of all Finnish investors’ transactions in the HSE. Our selected set of 14 securities is aligned to an IPO event, which occurs when a company first starts publicly trading its securities. We performed an analysis for multiple securities on an individual investor account level by constructing the networks from the statistically validated trading co-occurrences. Our main focus was on the newly emerging market networks and their common and persistent market-driven structures with the other mature and new stocks.

Applying a community detection algorithm, we found statistically similar investor clusters with synchronised trading strategies that formed repeatedly over several years and for multiple securities. We detected statistically robust clusters between the first and second year after an IPO for eight out of 11 securities with statistically validated links in both years. We also found clusters that recur across different securities.

Comparing the findings with the clusters on Nokia, we show an example of a highly persistent government cluster whose evolution spans a long period. Our results show that some synchronised trading strategies in financial markets span multiple stocks, persist over time and also occur with mature stocks. However, this analysis applies to the HSE only and may not generalise to other markets. Further research should check whether this phenomenon also exists in other stock exchanges with a larger number of IPOs; however, to the best of our knowledge, such investor-level data are not available, for example, from the U.S. markets.

Traditional financial research assumes that investors are rational and hold optimal portfolios. However, actual investors face informational, intellectual and computational limitations, and rather than optimising, they satisfice when making decisions. The systematic recurrence of the clusters hints at stronger information connections that the investors may share. For example, they may be consistently following the same public information sources or have mutual private information channels. However, in the current research, we do not try to explain the direction or the publicity of the information transfer. At the same time, according to [11], investor networks can be considered proxies of information networks if they are fairly stable over time. In light of this argument, the persistent and security-wise recurring investor clusters can represent mutual information channels that exist for both new IPO securities and mature stocks (e.g., Nokia).

Data availability

The dataset analysed in the current study is not publicly available and cannot be distributed by the authors because it is a proprietary database of Euroclear Finland. The database can be accessed for research purposes under the nondisclosure agreement by asking permission from Euroclear Finland.

Acknowledgements

M.B. is grateful for the grants received from the Finnish Foundation for Technology Promotion and the Finnish Foundation for Share Promotion. K.B. received funding from the EU Research and Innovation Programme Horizon 2020 under grant agreement No. 675044 (BigDataFinance) and from the doctoral school of Tampere University. F.L. and D.P. acknowledge partial support from the European Community H2020 Program under the scheme INFRAIA-1-2014-2015: Research Infrastructures, grant agreement No. 654024 SoBigData: Social Mining and Big Data Ecosystem (http://www.sobigdata.eu). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Appendix A The evolution of investor clusters on IPOs

Figure 6: Cluster evolution for networks with the FDR validation. Partial results for ISIN FI0009013296. A cluster is represented by a rectangle. Cluster evolution is represented by the rectangles connected by downward arrows. The top rectangle is the cluster in the first year after the IPO, and the bottom rectangle is the cluster in the second year after an IPO in the same network. Sector code: - Households, - Non-financial, - Financial-Insurance, - General-Government, - Non-Profit, - Rest-World.
Geographic location: - Helsinki, - South-West, - Western-Tavastia, - Central-Finland, - Northern-Finland, - Ostrobothnia, - Rest-Uusimaa, - Eastern-Tavastia, - Eastern-Finland, - South-East, - Northern-Savonia.
Gender: - Male, - Female, - No-Gender.
Decade: - No-Age, - 1910, - 1920, - 1930, - 1940, - 1950, - 1960, - 1970, - 1980, - 1990, - 2000.
Figure 7: Cluster evolution for FDR networks (continued from Fig. 6). (a) FI0009013403. (b) FI0009013429. (c) FI0009012843. (d) FI0009015309. (e) FI0009013312. (f) FI0009012413. (g) FI0009010391.
Figure 8: Statistically significant cluster overlaps across multiple securities. (a) Overlapping clusters across multiple securities during the first year after an IPO. (b) Overlapping clusters across multiple securities during the second year after an IPO. A cluster is represented by the rectangle. Statistically overlapping cluster groups are separated by horizontal lines. Each cluster is composed of investors with four attributes: sector code, geographic location, gender and decade. See the attribute colour mapping in Fig. 2.

Appendix B Evolution of investor clusters on Nokia over time

Figure 9: Examples of the persistence and evolution of clusters on Nokia over time. A cluster is represented by a rectangle. Statistically similar clusters are connected by arrows. Each cluster is composed of investors with four attributes: sector code, geographic location, gender, and decade. At least one cluster on Nokia in the chain has a statistically significant overlap with at least one cluster on IPOs. Cluster labels show the year of the cluster and the conventional cluster name. Sector code: - Households, - Non-financial, - Financial-Insurance, - General-Government, - Non-Profit, - Rest-World. Geographic location: - Helsinki, - South-West, - Western-Tavastia, - Central-Finland, - Northern-Finland, - Ostrobothnia, - Rest-Uusimaa, - Eastern-Tavastia, - Eastern-Finland, - South-East, - Northern-Savonia, - no-region. Gender: - Male, - Female, - No-Gender. Decade: - No-Age, - 1910, - 1920, - 1930, - 1940, - 1950, - 1960, - 1970, - 1980, - 1990, - 2000.

Appendix C Link validation with the FDR correction

Figure 10: Example of link validation for Kemira GrowHow (FI0009012843), first year after IPO, log-log scale. The number of observed synchronous trade co-occurrences: 33,595. The number of statistically validated links with the FDR correction: 1,481.
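
The FDR correction used for link validation follows the Benjamini-Hochberg step-up procedure [23]. The sketch below illustrates that procedure on a generic list of p-values; the function name, the significance level `alpha` and the example p-values are illustrative assumptions and are not taken from the study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Return one flag per p-value: True where the null is rejected, i.e. the
    candidate link would be validated at false discovery rate alpha."""
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Every test ranked at or below k_max is rejected, even if its own
    # p-value exceeds its own rank threshold.
    validated = [False] * m
    for idx in order[:k_max]:
        validated[idx] = True
    return validated


# Hypothetical p-values for five candidate links (not data from the paper).
example_p = [0.9, 0.001, 0.039, 0.03, 0.035]
flags = benjamini_hochberg(example_p, alpha=0.05)
print(flags)
```

In this toy input the fourth-smallest p-value passes its rank threshold, so the second- and third-smallest are validated as well, although each of them alone exceeds its own threshold; this is what distinguishes the step-up rule from a fixed per-test cutoff.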

References

  • [1] Alexander Ljungqvist and William J Wilhelm Jr. Does prospect theory explain IPO market behavior? The Journal of Finance, 60(4):1759–1790, 2005.
  • [2] Alexander Ljungqvist and William J Wilhelm Jr. IPO pricing in the dot-com bubble. The Journal of Finance, 58(2):723–752, 2003.
  • [3] Markku Kaustia and Samuli Knüpfer. Do investors overweight personal experience? Evidence from IPO subscriptions. The Journal of Finance, 63(6):2679–2702, 2008.
  • [4] Jussi Karhunen and Matti Keloharju. Shareownership in Finland 2000. Liiketaloudellinen aikakauskirja, pages 188–226, 2001.
  • [5] Jonas Spohr. Earnings management and IPOs-evidence from Finland. Liiketaloudellinen aikakauskirja, pages 157–172, 2004.
  • [6] Matti Keloharju. The winner’s curse, legal liability, and the long-run price performance of initial public offerings in Finland. Journal of Financial Economics, 34(2):251–277, 1993.
  • [7] Josef Lakonishok and Edwin Maberly. The weekend effect: Trading patterns of individual and institutional investors. The Journal of Finance, 45(1):231–243, 1990.
  • [8] Federico Musciotto, Luca Marotta, Salvatore Miccichè, Jyrki Piilo, and Rosario N Mantegna. Patterns of trading profiles at the Nordic Stock Exchange. A correlation-based approach. Chaos, Solitons & Fractals, 88:267–278, 2016.
  • [9] Sindhuja Ranganathan, Mikko Kivelä, and Juho Kanniainen. Dynamics of investor spanning trees around dot-com bubble. PLoS ONE, 13(6):e0198807, 2018.
  • [10] Kęstutis Baltakys, Margarita Baltakienė, Hannu Kärkkäinen, and Juho Kanniainen. Neighbors matter: Geographical distance and trade timing in the stock market. Finance Research Letters, 2018.
  • [11] Han N Ozsoylev, Johan Walden, M Deniz Yavuz, and Recep Bildik. Investor networks in the stock market. The Review of Financial Studies, 27(5):1323–1366, 2013.
  • [12] Michele Tumminello, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna. Identification of clusters of investors from their real trading activity in a financial market. New Journal of Physics, 14(1):013041, 2012.
  • [13] Stanislao Gualdi, Giulio Cimini, Kevin Primicerio, Riccardo Di Clemente, and Damien Challet. Statistically validated network of portfolio overlaps and systemic risk. Scientific Reports, 6:39467, 2016.
  • [14] Federico Musciotto, Luca Marotta, Jyrki Piilo, and Rosario N Mantegna. Long-term ecology of investors in a financial market. Palgrave Communications, 4(1):92, 2018.
  • [15] Kęstutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib. Multilayer aggregation with statistical validation: Application to investor networks. Scientific Reports, 8(1):8198, 2018.
  • [16] Mark Grinblatt and Matti Keloharju. How distance, language, and culture influence stockholdings and trades. The Journal of Finance, 56(3):1053–1073, 2001.
  • [17] Fabrizio Lillo, Salvatore Miccichè, Michele Tumminello, Jyrki Piilo, and Rosario N Mantegna. How news affects the trading behaviour of different categories of investors in a financial market. Quantitative Finance, 15(2):213–229, 2015.
  • [18] Milla Siikanen, Kęstutis Baltakys, Juho Kanniainen, Ravi Vatrapu, Raghava Mukkamala, and Abid Hussain. Facebook drives behavior of passive households in stock markets. Finance Research Letters, 2018.
  • [19] Frank Emmert-Streib, Aliyu Musa, Kęstutis Baltakys, Juho Kanniainen, Shailesh Tripathi, Olli Yli-Harja, Herbert Jodlbauer, and Matthias Dehmer. Computational analysis of the structural properties of economic and financial networks. arXiv preprint arXiv:1710.04455, 2017.
  • [20] Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna. Statistically validated networks in bipartite complex systems. PLoS ONE, 6(3):e17994, 2011.
  • [21] Martin Rosvall and Carl T Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008.
  • [22] Matti Ilmanen and Matti Keloharju. Shareownership in Finland. Finnish Journal of Business Economics, 48(1):257–285, 1999.
  • [23] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), pages 289–300, 1995.
  • [24] Luca Marotta, Salvatore Miccichè, Yoshi Fujiwara, Hiroshi Iyetomi, Hideaki Aoyama, Mauro Gallegati, and Rosario N Mantegna. Bank-firm credit network in Japan: an analysis of a bipartite network. PLoS ONE, 10(5):e0123079, 2015.
  • [25] Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jan Varho, Jyrki Piilo, and Rosario N Mantegna. Community characterization of heterogeneous complex systems. Journal of Statistical Mechanics: Theory and Experiment, 2011(01):P01019, 2011.
  • [26] John R Nofsinger and Richard W Sias. Herding and feedback trading by institutional and individual investors. The Journal of Finance, 54(6):2263–2295, 1999.
  • [27] Richard W Sias. Institutional herding. The Review of Financial Studies, 17(1):165–206, 2004.