Change Detection in Noisy Dynamic Networks: A Spectral Embedding Approach

Isuru Udayangani Hewapathirana (1), Dominic Lee (2), Elena Moltchanova (2), Jeanette McLeod (2)

1. Faculty of Science, University of Kelaniya, Sri Lanka.
2. School of Mathematics and Statistics, University of Canterbury, New Zealand.

Email: ihewapathirana@kln.ac.lk

Abstract

Change detection in dynamic networks is an important problem in many areas, such as fraud detection, cyber intrusion detection and health care monitoring. It is a challenging problem because it involves a time sequence of graphs, each of which is usually very large and sparse with heterogeneous vertex degrees, resulting in a complex, high dimensional mathematical object. Spectral embedding methods provide an effective way to transform a graph to a lower dimensional latent Euclidean space that preserves the underlying structure of the network. Although change detection methods that use spectral embedding are available, they do not address sparsity and degree heterogeneity that usually occur in noisy real-world graphs and a majority of these methods focus on changes in the behaviour of the overall network.

In this paper, we adapt previously developed techniques in spectral graph theory and propose a novel concept of applying Procrustes techniques to embedded points for vertices in a graph to detect changes in entity behaviour. Our spectral embedding approach not only addresses sparsity and degree heterogeneity issues, but also obtains an estimate of the appropriate embedding dimension. We call this method CDP (change detection using Procrustes analysis). We demonstrate the performance of CDP through extensive simulation experiments and a real-world application. CDP successfully detects various types of vertex-based changes including (i) changes in vertex degree, (ii) changes in community membership of vertices, and (iii) unusual increase or decrease in edge weight between vertices. The change detection performance of CDP is compared with two other baseline methods that employ alternative spectral embedding approaches. In both cases, CDP generally shows superior performance.

Keywords: Change Detection, Dynamic Networks, Sparse Networks, Degree Heterogeneity, Spectral Embedding, Dimensionality Reduction, Procrustes Analysis
journal: arXiv.org

1 Introduction

A network is a collection of entities that have inherent relationships. Some examples include a social network of friendships among people, a communication network of company employees connected by phone calls, emails or text messages, and a biological network of neurons connected by their synapses. A network can be mathematically conceptualized as a graph by associating entities with vertices, and relationships with edges connecting vertices in the graph. For example, in the graph representation of a social network like Facebook, vertices may represent people and edges represent friendship connections.

Most real-world networks evolve over time: both the entities and their relationships keep changing. This type of relational data can be represented as a dynamic network. For example, a communication network of a company is a dynamic network because new employees (entities) join the network and communication patterns (relationships) are modified continuously. Although both the entities and the relationships in a network can vary over time, in this paper we assume that a dynamic network consists of a fixed set of entities with time-varying relationships between them. A dynamic network can be represented as a time sequence of graphs, each representing the entities (as vertices) and their relationships (as edges) at a given time instant. Change detection is the process of continuously monitoring a dynamic network for deviations in entities and their relationship structure. An illustration of the change detection process based on a toy example is given in Hewapathirana (2019). Given a dynamic network conceptualized as a time sequence of undirected, weighted graphs, we address the problem of detecting vertex-based changes at each time instant. Detecting vertex-based changes is important in areas such as fraud detection, cyber intrusion detection and spam detection. For example, consider the time-varying email communications between a set of employees in an organisation. A sudden collaboration between a set of employees who rarely communicated during the recent past may indicate some unusual motivation or a major event involving the organisation Sricharan and Das (2014). Such changes in entity behaviour can be detected by monitoring the behaviour of vertices in the corresponding sequence of graphs.

Monitoring the behaviour of every vertex in the graph is a challenging problem because each graph in the time sequence contains a large number of vertices, resulting in a high-dimensional mathematical object. Spectral embedding methods provide an effective solution to the high dimensionality problem. These methods can be used to obtain a low dimensional representation of the graph that excludes noise and redundant information and retains important structural information Skillicorn (2007). Our goal for spectral embedding is to obtain a low dimensional representation of vertices which maintains their edge-based closeness in the graph. In Figure 1, we give an illustration of an embedding of a small graph. The left figure (a) shows a graph where the length of each edge is drawn proportionally to the closeness between the corresponding pair of vertices. We can observe three clusters of vertices in this graph. The right figure (b) gives the two-dimensional embedding, where each vertex is represented as a point in a two-dimensional Euclidean space. We can see how the edge-based closeness of vertices in the graph in (a) is maintained by the embedded points in (b). This characteristic emphasizes the clustering property of the embedded points Saerens et al. (2004).

(a) Original graph.
(b) Two-dimensional embedding.
Figure 1: Illustration of an embedding of a network using a toy example. The two-dimensional embedding preserves the edge-based closeness in the original graph.

In the literature, we can find numerous approaches that detect vertex-based changes in a time series of graphs Neil et al. (2013); Heard et al. (2010); Papadimitriou et al. (2010); Priebe et al. (2005); Gupta et al. (2012); Yu et al. (2018). However, only a few utilize spectral methods. For example, Akoglu and Faloutsos (2010); Idé et al. (2007); Sun et al. (2008) apply matrix-based spectral embedding while Sun et al. (2006); Papalexakis et al. (2012) use a tensor-based spectral embedding method. The majority of real-world graphs are sparse and contain vertices with heterogeneous degrees Sengupta and Chen (2015). Currently available spectral-based change detection methods do not simultaneously address sparsity and degree heterogeneity issues prior to obtaining an embedding from the graph. Consequently, changes involving only a few vertices, or changes involving low degree vertices, tend to be missed by these methods.

In this paper, we propose a novel method called CDP (change detection using Procrustes analysis) to detect changes in vertex behaviour. In our method, we first obtain a low dimensional embedding from the weighted adjacency matrix representing the graph at each time instant. Each embedded point characterizes the behaviour of a vertex in the graph at a given time instant. We use statistical Procrustes analysis techniques Dryden and Mardia (1998) to compare embeddings across time instants and calculate change scores for vertices. We evaluate the performance of CDP using extensive simulation experiments and the dynamic network for the Enron email dataset Klimt and Yang (2004). By carefully structuring the simulation experiments, we fully evaluate the performance of the method in detecting various types of changes that occur in real world networks. In all our experiments, we formally compare CDP to two other methods. Based on the results, we conclude that CDP efficiently and effectively identifies various vertex-based changes that are considered in our experiments.

The rest of the paper is organized as follows. We first provide a brief overview of our overall change detection method in Section 2. In Section 3, we provide a detailed description of our change detection framework. In Section 3.6, we summarize our change detection procedure and present the CDP algorithm. We evaluate the performance of CDP using simulation experiments (Section 4) and a real-world application (Section 5). In each experiment, the performance of CDP is compared with two other change detection approaches which are discussed in Section 4.5. Finally, we conclude by summarizing our findings in Section 6.

2 Brief Overview

Our proposed method, CDP (change detection using Procrustes analysis), aims to detect vertex-based changes in a dynamic network. A dynamic network is represented as a time sequence of undirected graphs, where each graph is in turn represented as a symmetric, weighted adjacency matrix. We apply spectral methods to the weighted adjacency matrix and embed the vertices into a low-dimensional Euclidean space that preserves the closeness between vertices in the original graph representation. The embedded points also highlight important vertex properties such as transitivity, homophily by attributes, and clustering, that are present in most real-world graphs Hoff et al. (2002); Nickel (2007). In this paper, we define these embedded points as features for vertices characterizing vertex behaviour at each time instant. Vertices in sparse and heterogeneous graphs depict entities with different abilities to establish connections. It is difficult to achieve a good representation if we ignore sparseness and degree heterogeneity when obtaining a low dimensional embedding Joseph and Yu (2013). By employing ideas from spectral graph theory Chung (1997), combined with the graph regularization technique introduced in Amini et al. (2013), we formulate a strategy to effectively embed sparse and heterogeneous graphs into low dimensional Euclidean spaces. It is important to identify a suitable value for the embedding dimension in order to obtain an accurate representation of the inherent clusters of the data in the embedded space Brand and Huang (2003). CDP adapts the low-rank matrix approximation method in Achlioptas and McSherry (2007) to automatically estimate the proper embedding dimension.

Generalized orthogonal Procrustes analysis (GPA) methods can be used to calculate an average from a set of matrices after removing Euclidean similarity transformations Dryden and Mardia (1998); Stegmann and Gomez (2002). We adjust the standard GPA technique to extract profile features from the recent past time instants, and calculate change scores for vertices at each time instant. A profile feature, which is also a vector, represents the average behaviour of the vertex over the recent past time instants. Our idea of applying Procrustes analysis techniques to compare embeddings for the purpose of change detection in dynamic networks is new and is inspired by Tang et al. (2012). Using a moving window approach, the change score calculation procedure is repeated over time to detect changes for all time instants.

Figure 2 provides an illustration of the overall CDP framework. In order to evaluate the performance of CDP, we apply it to both synthetic and real-world datasets. We compare our method with two baseline change detection methods that are also based on different spectral embedding procedures. The results show that CDP performs better than the others in various change scenarios considered.

Figure 2: Illustration of the overall CDP framework. The time sequence of graphs is first represented as a time sequence of weighted adjacency matrices. At each time instant, we perform spectral embedding on the matrix and obtain an embedding where each row corresponds to a feature representing a vertex’s behaviour. Next, we define a window of fixed length over the previous embeddings, and use GPA to obtain the profile embedding, where each row corresponds to a vertex’s profile feature. The dissimilarity between the current embedding and the profile embedding is then obtained to compute the change scores of the vertices at the current time instant. The window is moved along the time sequence to calculate vertex change scores for the whole time period.

3 Problem Framework

3.1 Notation and Terminology

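Let $\{G^{(t)}\}_{t=1}^{T}$ be a sequence of graphs defined over time instants, $t = 1, \dots, T$. Each $G^{(t)}$ is a weighted and undirected graph with a fixed set of $n$ vertices, $V = \{v_1, \dots, v_n\}$. In our discussions, we also refer to $v_i$ as vertex $i$. Define the edge set of graph, $G^{(t)}$, as $E^{(t)}$, where $E^{(t)} \subseteq V \times V$, and $E^{(t)}$ contains edge, $(v_i, v_j)$, if there is an edge between vertex $i$ and vertex $j$. Each graph is represented by a symmetric weighted adjacency matrix, $W^{(t)}$, of dimension $n \times n$, where each element, $w_{ij}^{(t)} \geq 0$. If $w_{ij}^{(t)} = 0$, then the vertices $i$ and $j$ are not connected in $G^{(t)}$. The degree of each vertex at time instant $t$ is defined as

$$d_i^{(t)} = \sum_{j=1}^{n} w_{ij}^{(t)}.$$

The degree matrix, $D^{(t)}$, is the diagonal matrix containing the vertex degrees, $d_1^{(t)}, \dots, d_n^{(t)}$, on the diagonal. Let $\bar{d}^{(t)}$ be the average vertex degree of graph, $G^{(t)}$, where $\bar{d}^{(t)} = \frac{1}{n} \sum_{i=1}^{n} d_i^{(t)}$. From Amini et al. (2013), we define a network as sparse when .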
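
The quantities above can be illustrated with a minimal NumPy sketch. It assumes that the degree of a vertex is the sum of the weights of its incident edges (a row sum of the weighted adjacency matrix); the function name `degree_summary` and the toy matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def degree_summary(W):
    """Return (degrees, degree matrix, average degree) of a symmetric weighted adjacency matrix."""
    W = np.asarray(W, dtype=float)
    if W.shape[0] != W.shape[1] or not np.allclose(W, W.T):
        raise ValueError("W must be a square, symmetric matrix")
    degrees = W.sum(axis=1)      # degree of vertex i: sum of the weights in row i (assumption)
    D = np.diag(degrees)         # diagonal degree matrix
    avg_degree = degrees.mean()  # average vertex degree
    return degrees, D, avg_degree

# Hypothetical 3-vertex graph: edge (0, 1) has weight 2, edge (1, 2) has weight 1.
W = np.array([[0, 2, 0],
              [2, 0, 1],
              [0, 1, 0]])
degrees, D, avg_degree = degree_summary(W)
```

For this toy graph the degrees are 2, 3 and 1, so the average degree is 2.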

3.2 Problem Statement

At each time instant $t$, our goal is to calculate a change score for each $v_i$ in $G^{(t)}$, relative to its recent past behaviour. The change score for $v_i$ at time instant $t$ is defined as follows.

Definition 1

The change score, $s_i^{(t)}$, for $v_i$ at time instant $t$ is

$$s_i^{(t)} = f\left(x_i^{(t)}, \bar{x}_i^{(t)}\right), \qquad (1)$$

where $x_i^{(t)}$ is the feature vector representing the behaviour of $v_i$ at time instant $t$, $\bar{x}_i^{(t)}$ is the profile feature vector representing the behaviour of $v_i$ in the recent past time instants, and $f$ is a dissimilarity function.

According to this definition, our overall change detection procedure can be summarized as follows.

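  1. Obtain a feature, $x_i^{(t)}$, for $v_i$ from each $G^{(t)}$, where $i = 1, \dots, n$ and $t = 1, \dots, T$.

  2. Obtain a profile feature, $\bar{x}_i^{(t)}$, for $v_i$ from the recent past time instants, $t - \ell, \dots, t - 1$, where $\ell$ is the window length, $i = 1, \dots, n$ and $t = \ell + 1, \dots, T$.

  3. Calculate the dissimilarity between $x_i^{(t)}$ and $\bar{x}_i^{(t)}$, and obtain the change scores, $s_i^{(t)}$, for $v_i$ by using a suitable dissimilarity function $f$.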

In Sections 3.3, 3.4, and 3.5, we discuss how these steps are implemented respectively.

3.3 Feature Extraction at Each Time Instant

In this section, we formulate our spectral embedding strategy for each $G^{(t)}$. (Note that in this paper, for discussions focused on one time instant, we drop the superscript $(t)$ to simplify notation. For example, we use $W$ instead of $W^{(t)}$ to denote the weighted adjacency matrix of $G^{(t)}$.) The embedding of a graph is an $n \times k$ matrix, where rows correspond to the $k$-dimensional embedded points for vertices. Our spectral embedding procedure consists of three main steps.

  1. Pre-processing the weighted adjacency matrix, $W$, of $G$.
    Because we consider weighted, heterogeneous graphs, some edges carry considerably higher weights than others and can be very influential during the embedding process. These edges are called dominant edges, and they cause the elements of $W$ to show high variability. The presence of dominant edges may also mask unusual edges that have lower weights, preventing the corresponding change from being detected. Applying a transformation to $W$, such as the logarithm, helps to mitigate this problem. After the log transformation, we scale each element so that all elements in the resulting matrix lie between zero and one. Below we state our two pre-processing steps in more detail.

    1. Apply a log transformation to each element in $W$, and obtain $W'$, where

      $$w'_{ij} = \log(1 + w_{ij}). \qquad (2)$$
    2. Scale the elements of $W'$ by its maximum element, and obtain $\tilde{W}$, where

      $$\tilde{w}_{ij} = \frac{w'_{ij}}{\max_{p,q} w'_{pq}}. \qquad (3)$$

      Note that the methodology discussed in this paper is also applicable to an unweighted graph, where the representation matrix is the binary adjacency matrix, $A$, with elements that are 0's or 1's. However, performing the log transformation followed by scaling makes no difference in this case, and hence can be omitted.

  2. Obtaining a suitable representation matrix.
    The mapping of edge weights into a suitable representation matrix is an essential task when using the embedded points to study the structure of the underlying graph Skillicorn (2007). In sparse and heterogeneous graphs possessing power law degree distributions, the embeddings from the weighted adjacency matrix will only focus on vertices with the highest degrees, resulting in an inaccurate representation of the underlying connectivity structure Mihail and Papadimitriou (2002). To account for sparsity and degree heterogeneity, we construct the regularized degree normalized weighted adjacency matrix, $L_\alpha$, as the representation matrix. Let the regularizer, $\alpha$, be

    (4)

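    Then $L_\alpha$ is given by

    $$L_\alpha = D_\alpha^{-1/2} W_\alpha D_\alpha^{-1/2}, \qquad (5)$$

    where

    $$W_\alpha = \tilde{W} + \alpha \mathbf{1}\mathbf{1}^{\top}, \qquad (6)$$

    $\mathbf{1}$ is an $n$-dimensional column vector containing all ones, and $D_\alpha$ is the degree matrix for $W_\alpha$. The regularization step (Equation 6) addresses sparseness by adding $\alpha$ to each element in $\tilde{W}$, while the degree normalization step (Equation 5) further adjusts for the irregularity in the degree distribution by dividing each element, $(W_\alpha)_{ij}$, by $\sqrt{(D_\alpha)_{ii} (D_\alpha)_{jj}}$. For a detailed theoretical justification on using $L_\alpha$ as the representation matrix to obtain an embedding, we refer the reader to Amini et al. (2013).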

  3. Obtaining a low dimensional embedding from the representation matrix, , using spectral decomposition.

    A low dimensional embedding, , from the representation matrix, can be seen as a solution to the optimization function,

    (7)

    subject to , where for Ng et al. (2001). The embedding, , can be estimated by performing the singular value decomposition (SVD), , and extracting principal singular vectors. In order to determine we employ the low-rank matrix approximation procedure in Achlioptas and McSherry (2007), which proposes to retain those singular vectors capturing the strongest structure in based on the norm. The norm of matrix is defined as,

    (8)

    where denotes the Frobenius norm.

    We refer the interested reader to Achlioptas and McSherry (2007) for a detailed theoretical description of the method. In this section, we summarize our implementation of their method in Algorithm 1.

    0:  (i) Symmetric matrix, , with dimensions , where , (ii) threshold,
    0:
    1:  Compute SVD,
    2:  Update
    3:  Compute SVD,
    4:  Initialize k=1,
    5:  while  and  do
    6:
    7:
    8:     Calculate by randomly flipping the signs of elements in such that, and
    9:     Update
    10:     Set
    11:  end while
    12:  if  then
    13:     Set the converged value
    14:  else
    15:     Set the converged value
    16:  end if
    Algorithm 1 Optimal Low-Rank Approximation

    It is important to note that the regularization step (Equation 6) inserts edges between all disconnected components and creates a connected graph. For such a graph, the first principal singular vector, $u_1$ (with corresponding singular value $\sigma_1$), of $L_\alpha$, is a constant vector and therefore not useful for the embedding Von Luxburg (2007). Thus, to obtain the embedding dimension, $k$, we initially remove the first reconstruction in step 2 of Algorithm 1. Hence, the output returned by the algorithm is the number of principal singular vectors that should be kept starting from the second principal singular vector onwards. (Footnote 1: The Frobenius norm of a matrix measures its average linear trend Achlioptas and McSherry (2007). Hence, the division by the Frobenius norm of in step 9 of the algorithm provides a standardization to each Skillicorn (2007).) Once $k$ is obtained, the low dimensional embedding, $X$, is given by

    Each row vector, $x_i$, of $X$ is the feature for $v_i$ at a given time instant.

    Furthermore, an important input parameter for Algorithm 1 is the convergence threshold, . Since there is no definitive method for choosing discussed in Achlioptas and McSherry (2007), we conduct extensive experiments and decide (See Appendix).

By following steps 1, 2, and 3 in Section 3.3, each graph, $G^{(t)}$, in the time sequence is represented as a low dimensional embedding, $X^{(t)}$, of dimension $n \times k^{(t)}$, where $k^{(t)}$ is the embedding dimension returned by Algorithm 1. The sequence of graphs, $\{G^{(t)}\}_{t=1}^{T}$, is thereby reduced to a sequence of low dimensional embeddings, $\{X^{(t)}\}_{t=1}^{T}$.
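
The three steps of this section can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the embedding dimension `k` is passed in by hand rather than estimated with Algorithm 1, the default regularizer (average degree divided by the number of vertices) is a placeholder for Equation 4, and the singular vectors are returned unscaled. The function name `spectral_embedding` and the toy graph are hypothetical.

```python
import numpy as np

def spectral_embedding(W, k, alpha=None):
    """Embed the vertices of a weighted graph into k dimensions (illustrative sketch)."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Step 1: log-transform the weights, then scale so all entries lie in [0, 1].
    W_log = np.log1p(W)
    if W_log.max() == 0:
        raise ValueError("the graph has no edges")
    W_tilde = W_log / W_log.max()
    # Step 2: regularize by adding alpha to every entry, then degree-normalize.
    if alpha is None:
        alpha = W_tilde.sum() / n**2  # ASSUMPTION: average degree / n, not the paper's Equation 4
    W_alpha = W_tilde + alpha
    d_alpha = W_alpha.sum(axis=1)
    L_alpha = W_alpha / np.sqrt(np.outer(d_alpha, d_alpha))
    # Step 3: SVD; skip the first singular vector and keep the next k as the embedding.
    U, _, _ = np.linalg.svd(L_alpha)
    return U[:, 1:k + 1]

# Hypothetical graph: two triangles joined by one weak edge between vertices 2 and 3.
W = np.array([[0, 5, 4, 0, 0, 0],
              [5, 0, 6, 0, 0, 0],
              [4, 6, 0, 1, 0, 0],
              [0, 0, 1, 0, 7, 3],
              [0, 0, 0, 7, 0, 8],
              [0, 0, 0, 3, 8, 0]])
X = spectral_embedding(W, k=2)  # one 2-dimensional feature per vertex
```

Each row of `X` plays the role of the feature of one vertex at a single time instant.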

3.4 Obtaining the Profile Features at Each Time Instant

After performing the steps stated in Section 3.3, we have a set of embeddings from the recent past time instants. From the uniqueness property of SVD Skillicorn (2007), the embedding obtained at each time instant is unique only up to Euclidean similarity transformations such as scale, rotation and reflection. Thus, we cannot directly average the embeddings from the recent past time instants to obtain profile features. Generalized orthogonal Procrustes analysis (GPA) can be used to obtain an average from a set of matrices after adjusting for Euclidean similarity transformations. In this section, we show how we employ GPA to obtain an average embedding, $\bar{X}^{(t)}$, from the set of embeddings, $\{X^{(t-\ell)}, \dots, X^{(t-1)}\}$, where $\ell$ is the window length. We call $\bar{X}^{(t)}$ the profile embedding for time instant $t$. Let us first state the GPA procedure.

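The pre-shape, $Z$, of an $n \times k$ matrix, $X$, is defined as

$$Z = \frac{C X}{\lVert C X \rVert}, \qquad (9)$$

where

$$\lVert C X \rVert = \sqrt{\operatorname{trace}\left((C X)^{\top} C X\right)}, \qquad (10)$$

and the centering matrix, $C = I_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^{\top}$. Here, $I_n$ is an $n \times n$ identity matrix, and $\mathbf{1}_n$ is an $n$ dimensional vector of ones. Let $X_1, \dots, X_m$ be $m$ matrices, each of dimension $n \times k$. GPA involves the optimization of the least squares objective function

$$\min_{\Gamma_1, \dots, \Gamma_m} \sum_{i=1}^{m} \sum_{j=i+1}^{m} \lVert Z_i \Gamma_i - Z_j \Gamma_j \rVert^2, \qquad (11)$$

where $\Gamma_j$ is the $k \times k$ orthogonal rotation/reflection matrix corresponding to $X_j$, and $Z_j$ is the preshape corresponding to $X_j$ as given in Equation 9. In Algorithm 2 we summarize our implementation of the iterative algorithm that solves the GPA objective function.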

0:  , threshold
0:  , for
1:  Initialize ,
2:  while  do
3:     for  do
4:        Calculate , where
5:        Calculate SVD,
6:        Calculate
7:        Obtain
8:     end for
9:     Calculate mean embedding, , using the aligned embeddings
10:     Update
11:     Update:
12:  end while
Algorithm 2 Generalized Procrustes Distance Calculation
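
A common iterative form of GPA can be sketched as below. It is a hedged illustration rather than a transcription of Algorithm 2: the pre-shape is taken to be the centred matrix scaled to unit Frobenius norm, each rotation is obtained from the standard SVD solution of the orthogonal Procrustes problem, and the stopping rule (change in the summed squared distance to the mean below `tol`) is an assumption. The names `preshape`, `rotate_onto` and `gpa` are hypothetical.

```python
import numpy as np

def preshape(X):
    """Centre the rows of X and scale the result to unit Frobenius norm."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc / np.linalg.norm(Xc)

def rotate_onto(Z, target):
    """Orthogonal matrix G minimising ||Z @ G - target||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(Z.T @ target)
    return U @ Vt

def gpa(embeddings, tol=1e-8, max_iter=100):
    """Iteratively align pre-shapes to their running mean; return (aligned list, mean)."""
    aligned = [preshape(X) for X in embeddings]
    mean = aligned[0]
    prev = np.inf
    for _ in range(max_iter):
        aligned = [Z @ rotate_onto(Z, mean) for Z in aligned]
        mean = np.mean(aligned, axis=0)
        objective = sum(np.linalg.norm(Z - mean) ** 2 for Z in aligned)
        if abs(prev - objective) < tol:
            break
        prev = objective
    return aligned, mean

# Usage: Y is X after a rotation, a scaling and a translation.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
angle = 0.7
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])
Y = 3.0 * X @ R + 5.0
aligned, mean = gpa([X, Y])
```

Because `Y` differs from `X` only by a similarity transformation, the two aligned pre-shapes coincide up to numerical error.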

There is one limitation in applying GPA to the embeddings obtained at different time instants. GPA assumes that all matrices, $X_1, \dots, X_m$, are of the same dimension, but the embeddings resulting from the methods discussed in Section 3.3 can have different numbers of columns. We consider two possible solutions to this problem. Let $k_{\max}$ and $k_{\min}$ denote the largest and smallest embedding dimensions among the embeddings to be compared.

  1. For any $X_j$ with $k_j < k_{\max}$, append $k_{\max} - k_j$ columns of zeros to $X_j$ to make it of size $n \times k_{\max}$.

  2. For any $X_j$ with $k_j > k_{\min}$, truncate the additional columns of $X_j$ to make it of size $n \times k_{\min}$.

Truncating extra dimensions drops singular vectors that may describe important structure of the graph. Appending columns of zeros does not cause loss of information, and is thus preferred. Therefore, whenever the dimensions of the embeddings to be compared differ, we pad the lower dimensional embeddings with columns of zeros before fitting the generalized Procrustes model.

The profile embedding, $\bar{X}^{(t)}$, is therefore calculated as follows:

  1. Let $k_{\max} = \max\{k^{(t-\ell)}, \dots, k^{(t-1)}\}$. Append $k_{\max} - k^{(j)}$ columns of zeros to each $X^{(j)}$ and obtain $\tilde{X}^{(j)}$, for $j = t - \ell, \dots, t - 1$.

  2. Perform the generalized Procrustes analysis procedure: input $\tilde{X}^{(t-\ell)}, \dots, \tilde{X}^{(t-1)}$ into Algorithm 2, and estimate the mean embedding, $\bar{X}^{(t)}$.

At each time instant $t$, the rows of $\bar{X}^{(t)}$ give the profile features for the vertices in the graph.
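
The zero-padding step is simple to sketch. The helper name `pad_to_common_dim` is hypothetical; the sketch only illustrates that padding brings all embeddings to the largest column count while leaving the original coordinates, and hence all pairwise distances between rows, unchanged.

```python
import numpy as np

def pad_to_common_dim(embeddings):
    """Append zero columns so that every embedding has the largest column count."""
    k_max = max(X.shape[1] for X in embeddings)
    return [np.hstack([X, np.zeros((X.shape[0], k_max - X.shape[1]))])
            for X in embeddings]

# Two hypothetical embeddings of the same 4 vertices with 2 and 3 dimensions.
X1 = np.arange(8, dtype=float).reshape(4, 2)
X2 = np.ones((4, 3))
padded = pad_to_common_dim([X1, X2])
```

After padding, both matrices have shape `(4, 3)` and can be passed to a Procrustes routine together.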

3.5 Change Score Calculation

After applying the methods discussed in Sections 3.3 and 3.4, at each time instant $t$, we have the profile embedding, $\bar{X}^{(t)}$, and the current embedding, $X^{(t)}$. Vertex change scores are calculated by computing the dissimilarity between $X^{(t)}$ and $\bar{X}^{(t)}$. Procrustes analysis can be used to compare two matrices after adjusting for Euclidean similarity transformations. As in Section 3.4, when the two embeddings have different numbers of columns, we append columns of zeros to the lower dimensional embedding. Thus, the change score, $s_i^{(t)}$, for vertex $i$ at time instant $t$ is calculated as follows:

  1. Let . Append and columns of zeros to and , respectively and obtain and .

  2. Perform GPA using Algorithm 2 and obtain the transformed embeddings, and , and the average of the transformed embeddings, .

  3. For each vertex , calculate the change score

    (12)
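
The steps above can be sketched as a single function. The exact form of Equation 12 is not reproduced here, so the dissimilarity used below (the Euclidean distance between corresponding rows after alignment) is an assumption, as are the single-pass alignment of the current embedding onto the profile and the function name `change_scores`.

```python
import numpy as np

def change_scores(X_current, X_profile):
    """Row-wise distance between two embeddings after Procrustes alignment (sketch)."""
    k = max(X_current.shape[1], X_profile.shape[1])

    def pad(X):
        # Append zero columns so both embeddings have k columns.
        return np.hstack([X, np.zeros((X.shape[0], k - X.shape[1]))])

    def preshape(X):
        # Remove translation and scale.
        Xc = X - X.mean(axis=0, keepdims=True)
        return Xc / np.linalg.norm(Xc)

    Zc, Zp = preshape(pad(X_current)), preshape(pad(X_profile))
    # Remove rotation/reflection: orthogonal Procrustes solution via SVD.
    U, _, Vt = np.linalg.svd(Zc.T @ Zp)
    Zc_aligned = Zc @ (U @ Vt)
    # ASSUMPTION: Euclidean distance between aligned rows stands in for Equation 12.
    return np.linalg.norm(Zc_aligned - Zp, axis=1)

rng = np.random.default_rng(0)
profile = rng.normal(size=(8, 2))
c, s = np.cos(0.5), np.sin(0.5)
rotated = 2.0 * profile @ np.array([[c, -s], [s, c]]) + 1.0
unchanged = change_scores(rotated, profile)   # similarity transform only
moved = profile.copy()
moved[3] += 5.0                               # vertex 3 changes its position
changed = change_scores(moved, profile)
```

When the current embedding is only a rotated, scaled and shifted copy of the profile, all scores are numerically zero; moving a vertex produces non-zero scores.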

3.6 Proposed Algorithm - CDP (Change Detection using Procrustes Method)

We have now constructed the three main steps of our change detection procedure. These include, at each time instant, extracting features for vertices through graph embedding (Section 3.3), calculating profile features for vertices by applying GPA on the recent past embeddings (Section 3.4), and finally calculating change scores for vertices through generalized Procrustes distance calculation between current and profile embeddings (Section 3.5). The steps are listed in Algorithm 3.

0:  (i) Time sequence of symmetric, weighted adjacency matrices, , where each has dimension (ii) window size,
0:  Time sequence of vertex change scores, . Each is a vector of dimension
1:  for  to  do
2:     Update:
3:     Update:
4:     Calculate Update: ,Calculate , where Calculate
5:     Input to Algorithm 1 and estimate
6:     Perform SVD:
7:     Obtain the low dimensional embedding , where
8:  end for
9:  for  to  do
10:     Let
11:     for  do
12:        if  then
13:           append columns of zeros to
14:        end if
15:     end for
16:     Input into Algorithm 2, and estimate the profile embedding
17:     Let
18:     if  then
19:        append columns of zeros to
20:     end if
21:     if  then
22:        append columns of zeros to
23:     end if
24:     Align and with each other using Algorithm 2, and obtain the adjusted embeddings, and , and the mean
25:     Calculate vertex change scores,
26:  end for
Algorithm 3 Change Detection using Procrustes Analysis - CDP
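
The moving-window structure of the second loop of Algorithm 3 can be sketched independently of the embedding and Procrustes details. In the sketch below the profile and score functions are passed in as arguments; the placeholder functions `naive_profile` and `naive_score` are deliberately simplistic stand-ins (a plain average and a row-wise Euclidean distance), not the GPA-based computations of CDP. Time instants are indexed from zero, and all names are hypothetical.

```python
import numpy as np

def moving_window_scores(embeddings, window, profile_fn, score_fn):
    """Score each time instant t >= window against the previous `window` embeddings."""
    scores = {}
    for t in range(window, len(embeddings)):
        profile = profile_fn(embeddings[t - window:t])  # recent past only
        scores[t] = score_fn(embeddings[t], profile)
    return scores

def naive_profile(past):
    # Placeholder: plain average, ignoring rotation/scale (CDP uses GPA here).
    return np.mean(past, axis=0)

def naive_score(X, profile):
    # Placeholder: Euclidean distance between corresponding rows.
    return np.linalg.norm(X - profile, axis=1)

# Five time instants, three vertices; vertex 1 moves at the last time instant.
embeddings = [np.zeros((3, 2)) for _ in range(5)]
embeddings[4][1] = [1.0, 0.0]
scores = moving_window_scores(embeddings, window=3,
                              profile_fn=naive_profile, score_fn=naive_score)
```

With a window of three, scores are produced only for time instants 3 and 4, and only vertex 1 at time instant 4 receives a non-zero score.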

Having described our algorithm, we now evaluate its performance by conducting experiments on simulated dynamic networks and a real-world dataset.

4 Simulation Experiments

Simulated networks enable us to understand not only how and when a specific technique performs well, but also when it does not Yu et al. (2019). We conduct such an investigation by generating different synthetic datasets that mimic several real-world change scenarios. Within each scenario, a subset of vertices undergo a change from their recent past behaviour.

4.1 Overall Setting

For each change scenario we generate a time sequence of symmetric weighted adjacency matrices, $\{W^{(t)}\}_{t=1}^{T}$, to represent a time sequence of weighted graphs. Similar to Wang et al. (2017), we assume that each network is generated from a known underlying model that determines the process of generation. We assume that the edges of the graphs have distribution $F_0$ and that, when a change occurs, the distribution becomes $F_1$. We consider two types of changes.

  1. Change occurs at a given time instant: change-point. A change-point is injected into the time sequence of graphs at time instant $t^{*}$ by defining the edge distribution as

    $$W^{(t)} \sim \begin{cases} F_1, & t = t^{*}, \\ F_0, & \text{otherwise}, \end{cases} \qquad (13)$$

    for $t = 1, \dots, T$.

  2. Change occurs at a time instant, and persists for some time period: change-interval. A change-interval over time instants $t_1^{*}, \dots, t_2^{*}$ is generated by defining the edge distribution as

    $$W^{(t)} \sim \begin{cases} F_1, & t_1^{*} \leq t \leq t_2^{*}, \\ F_0, & \text{otherwise}, \end{cases} \qquad (14)$$

    for $t = 1, \dots, T$.
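
The two change types can be illustrated by a small helper that records which edge distribution is in force at each time instant. The labels `"F0"` and `"F1"`, the function name `edge_model_schedule`, and the assumption that the pre-change distribution is restored after the change are illustrative choices, not details taken from the paper.

```python
def edge_model_schedule(T, change_start, change_end=None):
    """Return the edge-distribution label ('F0' or 'F1') for t = 1, ..., T."""
    if change_end is None:
        change_end = change_start  # a change-point lasts a single time instant
    return ["F1" if change_start <= t <= change_end else "F0"
            for t in range(1, T + 1)]

point = edge_model_schedule(T=6, change_start=4)                   # change-point at t = 4
interval = edge_model_schedule(T=6, change_start=3, change_end=5)  # change-interval t = 3..5
```

Here `point` marks only the fourth time instant as changed, while `interval` marks the third to fifth.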

In Section 4.2, we discuss the model that is used to generate graphs for our experiments.

4.2 Random Graph Model Used for Synthetic Network Generation

The degree corrected stochastic block model (DCSBM) Karrer and Newman (2011) is a commonly used model because it can closely mimic the community structure of real-world networks. In our simulation experiments, we employ the DCSBM to define the probability distribution of the edges of a graph. By adjusting the model parameters, we obtain a wide variety of edge distributions.

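Let $g_i$ denote the block membership of vertex $i$. Then the vector, $g = (g_1, \dots, g_n)$, of dimension $n$ denotes the block memberships of the vertices in the graph. In terms of the weighted adjacency matrix, $W$, its distribution under the DCSBM is given by

$$P(W \mid \theta, \omega, g) = \prod_{i<j} \frac{\left(\theta_i \theta_j \omega_{g_i g_j}\right)^{w_{ij}}}{w_{ij}!} \exp\left(-\theta_i \theta_j \omega_{g_i g_j}\right), \qquad (15)$$

where $\omega_{rs}$ is the expected number of edges between a vertex in block $r$ and a vertex in block $s$, and $\theta = (\theta_1, \dots, \theta_n)$ is an $n$-dimensional vector of degree parameters. Each element, $w_{ij}$, is a Poisson random variable with mean $\theta_i \theta_j \omega_{g_i g_j}$. In order to mimic the degree distribution of real-world graphs, the vector, $\theta$, is generated from a power-law distribution Clauset et al. (2009) defined as

$$p(\theta_i) = \frac{\beta - 1}{\theta_{\min}} \left(\frac{\theta_i}{\theta_{\min}}\right)^{-\beta}, \quad \theta_i \geq \theta_{\min},$$

where $\theta_{\min}$ is the lower bound of the support of $\theta_i$, and $\beta$ is the shape parameter. The $\theta_i$'s are normalized to sum to one for vertices in the same block, i.e., $\sum_{i=1}^{n} \theta_i \delta_{g_i, r} = 1$ for each block $r$ (where $\delta_{g_i, r} = 1$ if vertex $i$ belongs to block $r$, and 0 otherwise).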

To specify what is, let be the block probability matrix where each element, , denotes the probability of an edge between vertices in blocks and . Using , we can obtain , where each element, , denotes the number of vertices in block . Using and we can calculate the expected number of edges, , between a vertex in block and a vertex in block giving

We select to have the form

(16)

where . For example, for a graph with three blocks, can take the form,

(17)

where give the intra-block probabilities. is given by

(18)

where is the vector of ones, and . can be regarded primarily as an inter-block probability. Thus, by varying , we can vary the level of noise in the generated graphs, which makes it more difficult to identify the blocks.
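
A DCSBM graph with Poisson edge weights can be sampled with a few lines of NumPy. The sketch below is illustrative only: it draws each vertex pair once and omits self-loops, uses a classical Pareto draw for the power-law degree parameters, and takes the matrix of expected edge counts `omega` as a given input rather than deriving it from block probabilities. The function names and all parameter values are hypothetical.

```python
import numpy as np

def power_law_theta(blocks, beta, theta_min, rng):
    """Draw power-law degree parameters and normalise them to sum to one per block."""
    blocks = np.asarray(blocks)
    # Classical Pareto with density proportional to theta**(-beta) for theta >= theta_min.
    raw = theta_min * (1.0 + rng.pareto(beta - 1.0, size=len(blocks)))
    block_sums = np.bincount(blocks, weights=raw)
    return raw / block_sums[blocks]

def sample_dcsbm(blocks, theta, omega, rng):
    """Draw a symmetric weighted adjacency matrix with Poisson edge weights."""
    blocks = np.asarray(blocks)
    theta = np.asarray(theta, dtype=float)
    omega = np.asarray(omega, dtype=float)
    # Poisson mean for pair (i, j): theta_i * theta_j * omega[g_i, g_j].
    mean = np.outer(theta, theta) * omega[np.ix_(blocks, blocks)]
    upper = np.triu(rng.poisson(mean), k=1)  # each pair sampled once; no self-loops (assumption)
    return upper + upper.T

rng = np.random.default_rng(42)
blocks = np.repeat([0, 1], 5)                  # two blocks of five vertices
theta = power_law_theta(blocks, beta=2.5, theta_min=1.0, rng=rng)
omega = np.array([[40.0, 2.0],
                  [2.0, 40.0]])                # hypothetical expected edge counts
W = sample_dcsbm(blocks, theta, omega, rng)
```

Larger off-diagonal entries in `omega` add inter-block edges and make the block structure harder to recover, which mirrors the role of the noise level described above.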

Equations 15 to 18 make the DCSBM a flexible and popular tool for analyzing complex networks De Ridder et al. (2016); Yu et al. (2019). The pre-change and post-change edge distributions are obtained using different sets of parameter values. Each set of parameter values is chosen to mimic real-world change scenarios involving vertices. In Table 1, we summarize the parameter settings of different DCSBM models used to generate graphs in our experiments.

4.3 Change Scenarios

A detailed review of numerous change scenarios studied in previous research is given in Hewapathirana (2019). Based on these ideas, we formulate the following change scenarios to evaluate our change detection method.

  1. Change in block membership - group-change.
    A set of vertices in a block change their block (group) membership.

  2. Change in block structure:

    1. split - a block in the graph splits into two blocks,

    2. merge - the reverse of split: two blocks join together and form one block,

    3. form - a high increase in connections in a block that was previously sparse,

    4. fragment - the reverse of form: a dense block becomes sparse.

  3. Change in degree:

    1. Heterogeneous degrees to homogeneous degrees - hetero-to-homo.
      The degree parameters of a block of vertices in the graph change from heterogeneous to homogeneous.

    2. Homogeneous degrees to heterogeneous degrees - homo-to-hetero.
      The reverse of hetero-to-homo: the degree parameters of a block of vertices change from homogeneous to heterogeneous.

  4. Change in connectivity patterns:

    1. Clear block structure to complex structure - simple-to-complex.
      Two blocks add inter-block edges, disrupting the clear block structure in the graph.

    2. Complex block structure to clear block structure - complex-to-simple.
      The reverse of simple-to-complex: most inter-block edges between two blocks vanish, resulting in a graph with a clear block structure.

In Table 2, we give a detailed description of how we mimic these change scenarios through transitions of the underlying generative models. Each scenario corresponds to changes in the connectivity patterns of a subset of vertices in the graph. For each scenario, we visualize an example of the adjacency matrices generated from the models before and after the change.

For each change scenario, we generate a sequence of graphs, that is, we set . The parameters for the two types of changes defined in Section 4.1 are as follows.

  1. change-point (Equation 13): ,

  2. change-interval (Equation 14): .

We use windows of sizes , and , and calculate change scores for all vertices. We repeat this times, and calculate our performance measures (Section 4.4).

[Table 1: columns "Model" and "Distribution of"; cell values not shown.]
Footnote 2: Different values of were tested (), but the same value is used for the pair of models involved in a given change scenario.
Footnote 3: is a vector of degree parameters of the set of vertices, , and denotes a positive vector of constants.
Footnote 4: is a vector of degree parameters of the set of vertices, .
Table 1: Parameter settings of different models with fixed parameters, , with , and with .
No. Change Scenario Changed Vertices
1 group-change
2 split
3 merge
4 form
5 fragment
6 hetero-to-homo
7 homo-to-hetero
8 simple-to-complex
9 complex-to-simple
Table 2: Illustration of change scenarios. Each scenario corresponds to a change in the connectivity patterns of a subset of vertices in the DCSBM graph and is visualized using the pixel-plots of the adjacency matrices generated.

4.4 Performance Measure

Since our goal is to detect vertices that have changed their behaviour with respect to the recent past, we measure the performance of CDP with respect to the ability of the change scores produced to discriminate between changed and unchanged vertices. Each change scenario discussed in Section 4.3 involves a set of vertices, , changing their behaviour. Let . If our method performs well, the change scores for vertices in should be higher than the change scores for the rest of the vertices in , especially at the time instant corresponding to a change. Note that , , and .

Let us consider a time sequence of vertex change scores, , where each is a vector of length obtained from a single simulation run of a change scenario. Let be the vector of change scores obtained for , and let be the vector of change scores obtained for . We use a sampling procedure to estimate

which is the probability that vertex, , in has a higher change score than vertex, , in . We separately sample (with replacement) a vector of elements, , from and a vector of elements, , from ; then is calculated by counting the proportion of entries in that are larger than the corresponding entries in as

where is one if and zero otherwise. In our experiments we use .
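The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `estimate_prob_higher`, the example scores, and the default sample count are assumptions introduced here.

```python
import random


def estimate_prob_higher(changed_scores, unchanged_scores, n_samples=1000, seed=0):
    """Estimate the probability that a changed vertex outscores an unchanged one.

    n_samples and seed are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    rng = random.Random(seed)
    # Sample each group separately, with replacement.
    sample_changed = rng.choices(changed_scores, k=n_samples)
    sample_unchanged = rng.choices(unchanged_scores, k=n_samples)
    # The indicator is one when the changed-vertex score is strictly larger.
    hits = sum(1 for a, b in zip(sample_changed, sample_unchanged) if a > b)
    return hits / n_samples


# Hypothetical change scores for a single time instant.
changed = [0.90, 0.80, 0.95]
unchanged = [0.10, 0.20, 0.15, 0.30]
p_hat = estimate_prob_higher(changed, unchanged)
p_rev = estimate_prob_higher(unchanged, changed)
```

In this toy input every changed score exceeds every unchanged score, so the estimated proportion is at its maximum; with overlapping score distributions it would fall between zero and one.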

A proportion greater than indicates a higher chance of a change score for a vertex in being greater than a change score for a vertex in . By repeating this for all simulation runs, we obtain a vector of probabilities, . If all elements of are greater than and closer to one at a changed time instant, good change detection performance is indicated. Instead of directly using , we use the log odds

(19)

which measures the odds that a vertex in has higher change scores than a vertex in . When a change occurs, we expect the values of to lie above zero and be strongly positive. After calculating , we further calculate the log odds ratio between time instants and which gives

(20)

In our experiments we calculate both and to measure detection performance.
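As a sketch of the two measures, the log odds and the log odds ratio can be computed from estimated proportions as below. The function names, the clipping constant `eps`, and the convention of subtracting the earlier time instant from the later one are assumptions made for illustration.

```python
import math


def log_odds(p, eps=1e-6):
    """Log odds of a proportion p.

    Clipping p away from 0 and 1 is an assumption added here so that the
    result stays finite; the source does not say how such cases are handled.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def log_odds_ratio(p_later, p_earlier, eps=1e-6):
    """Log odds ratio between two time instants (later minus earlier, assumed)."""
    return log_odds(p_later, eps) - log_odds(p_earlier, eps)


# Hypothetical proportions at two time instants.
lam_even = log_odds(0.5)            # no discrimination between the two groups
lam_high = log_odds(0.8)            # changed vertices tend to score higher
lor = log_odds_ratio(0.8, 0.5)      # increase relative to the earlier instant
```

A proportion of one half maps to a log odds of zero, which is why values above zero indicate that changed vertices tend to receive the higher scores.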

4.5 Comparison Methods

We compare our CDP algorithm with two baseline methods.

  1. ACT
    This is the activity (ACT) vector-based change detection algorithm developed by Idé and Kashima (2004). They employ a spectral embedding procedure, and represent a time sequence of graphs as a time sequence of activity vectors, , for . A profile vector, , is calculated from recent past activity vectors. The change score, , can be calculated as

    (21)

    where denotes absolute value. The elements of the activity vector, , denote the eigenvector centrality scores of the vertices in the graph. Idé and Kashima (2004) developed ACT to perform change detection in a time sequence of dense graphs. However, in the majority of the applications we encounter, the graph obtained at each time instant is sparse and heterogeneous. As discussed in Section 2, such a graph contains vertices with very high degree (hubs) as well as vertices with very low degree (sometimes zero, resulting in disconnected vertices). According to Martin et al. (2014), eigenvector centrality is a poor measure of vertex centrality in sparse graphs. They show that the centrality scores concentrate on hubs and fail to capture the centrality of lower-degree vertices. While this behaviour might be useful in some applications, it is unsuitable for our requirement of detecting changes in the behaviour of all vertices in the graph. Thus, we find that the approach of Idé and Kashima (2004) cannot be generalized to most real-world graphs.

  2. ACTM
    We make a slight improvement to the profile vector calculation step in Idé and Kashima (2004) and call this method the modified activity (ACTM) vector-based algorithm. Recall that Idé and Kashima (2004) represent the recent past behaviour using the profile vector, . However, is only the first vector, , from the singular vectors, , resulting from the SVD of the matrix of activity vectors, , representing the recent past. The left singular vectors, , define an orthonormal basis for the subspace defined by the activity vectors, . Selecting only the first vector, , might cause us to lose information. Hence, a more representative profile vector can be obtained by projecting onto the dimensional orthonormal subspace defined by , where

    (22)

    The profile vector, , is also the best approximation to in the subspace spanned by (Poole, 2014). The error vector, , gives an indication of the deviation of from its recent past. Thus, the change score, , is

    (23)
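The difference between the two baselines can be sketched with NumPy as below. This is an illustration under stated assumptions, not the authors' code: activity vectors are taken as given inputs, the per-vertex score is assumed to be the element-wise absolute difference between the current activity vector and the profile vector, and the names `act_scores`, `actm_scores`, `past`, `current`, and `k` are hypothetical.

```python
import numpy as np


def left_singular_vectors(past):
    """Left singular vectors of an n x w matrix whose columns are past activity vectors."""
    U, _, _ = np.linalg.svd(past, full_matrices=False)
    return U


def act_scores(past, current):
    """ACT-style scores: the profile is the first left singular vector only."""
    r = left_singular_vectors(past)[:, 0]
    if r @ current < 0:  # singular vectors are only defined up to sign
        r = -r
    return np.abs(current - r)  # element-wise form is an assumption


def actm_scores(past, current, k):
    """ACTM-style scores: residual after projecting onto the first k singular vectors."""
    Uk = left_singular_vectors(past)[:, :k]
    profile = Uk @ (Uk.T @ current)  # orthogonal projection of `current`
    return np.abs(current - profile)  # element-wise form is an assumption


# Toy example with three vertices and a window of two past activity vectors.
past = np.array([[1.0, 0.8],
                 [0.0, 0.6],
                 [0.0, 0.0]])
current = np.array([0.6, 0.8, 0.0])   # lies in the span of the past vectors
novel = np.array([0.0, 0.0, 1.0])     # orthogonal to the past vectors

act = act_scores(past, current)
actm = actm_scores(past, current, k=2)
actm_novel = actm_scores(past, novel, k=2)
```

In this toy case the current vector lies in the span of the past activity vectors, so the projection-based residual vanishes, whereas comparing against the first singular vector alone still reports a deviation. A vector orthogonal to the past subspace is returned unchanged as the residual.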

4.6 Results

For each change scenario discussed in Section 4.3, we first calculate the performance measure for several time instants before and after for CDP, ACT, and ACTM for both change-point and change-interval. In Figure 3, we show the corresponding results for group-change with for .

Figure 3: Observing on CDP (left), ACT (middle), and ACTM (right) over time on group-change for change-point (top) and change-interval (bottom). CDP shows a clear detection at for both change-point and change-interval. Although the ACT and ACTM methods also show an increase at , the intervals still lie below zero.

Let us first discuss the results of CDP. For all time instants, before , the ’s are centred at a given level. All graphs generated before are from the same model . Since there is no model change, the odds of each being greater than are similar during these time instants. At , the generative model changes to , and we see a clear increase of compared to . This shows that there is a clear increase in . From onwards, we observe different patterns for change-point and change-interval.

  • Change-point: the generative model returns to at and persists for all time instants, .

    1. There is a large decrease in compared to . Our window is . Inside the window, there are four graphs generated from , and one graph generated from . Hence, unlike at , there is less change relative to the recent past, and is less than .

    2. For , the window contains four graphs generated from , and one graph generated from . Thus, the change occurring in these time instants is similar. So, the ’s are generally centred at the same level.

    3. At , the window contains graphs generated purely from , and the comparison is also done with a graph generated from . So the change involving the set of vertices, , at is less than the change at . Hence, decreases.

    4. For , the window contains graphs generated purely from , and the comparison is also done with a graph generated from . Thus, the change occurring in these time instants is similar. So, the ’s are generally centred at the same level.

  • Change-interval: the generative model is for time instants, .

    1. There is a decrease in compared to . Inside the window there are four graphs generated from , and one graph generated from . So there is less change involving the set of vertices, , compared to their change at . Hence, is less than .

    2. For all time instants, , the change becomes progressively smaller as the window (recent past) contains more time instants that are similar to the current time instant. So decreases with time, causing to decrease accordingly.

    3. At , the window contains graphs generated purely from , and the comparison is also done with a graph generated from . So the change is less compared to the change at . Thus, is less than .

    4. For , the window contains graphs generated purely from , and the comparison is also done with a graph generated from . Thus, the change occurring in these time instants is similar. So, the ’s are generally centred at the same level.
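The window-composition argument used in the two lists above can be traced with a small sketch. Everything here is hypothetical: the labels "M1" and "M2" stand for the two generative models, the window is assumed to hold the graphs immediately preceding the current one, and the window size of five and the change time of 20 are illustrative values chosen to match the "four graphs and one graph" description.

```python
from collections import Counter


def model_at(t, t_star, length):
    """Label of the generative model at time t.

    The changed model "M2" is active on [t_star, t_star + length); length = 1
    mimics a change-point and length > 1 mimics a change-interval.
    """
    return "M2" if t_star <= t < t_star + length else "M1"


def window_composition(t, w, t_star, length):
    """Count the model labels in the w graphs preceding t, plus the label at t."""
    window = [model_at(s, t_star, length) for s in range(t - w, t)]
    return Counter(window), model_at(t, t_star, length)


W, T_STAR = 5, 20  # hypothetical window size and change time

# Change-point: the changed model is active at T_STAR only.
cp_at_change = window_composition(T_STAR, W, T_STAR, length=1)
cp_one_after = window_composition(T_STAR + 1, W, T_STAR, length=1)
cp_recovered = window_composition(T_STAR + W + 1, W, T_STAR, length=1)

# Change-interval: the changed model stays active for ten time instants.
ci_one_after = window_composition(T_STAR + 1, W, T_STAR, length=10)
ci_settled = window_composition(T_STAR + W, W, T_STAR, length=10)
```

At the change time the window holds only graphs from the original model while the current graph comes from the changed model, which is the largest possible contrast. One step later the window holds four graphs from the original model and one from the changed model, and once the window is filled from a single model again the contrast disappears.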

For change detection methods ACT and ACTM, ’s are wider. Furthermore, the bulk of lies below zero for all time instants. Thus, although we see an increase in the intervals for ACT and ACTM, these methods do not perform well in detecting the change.

Note that the graphs generated at each time instant are independent samples from a given generative model ( or ). Thus, within the same generative model, edge weights can change from one time instant to another, which also causes the connectivity patterns of vertices to change. For example, in Figure 3 (Top), we observe that the ’s are centred at a positive level even within the generative model, . This shows that the set of vertices, , for the group-change scenario (Table 2) undergoes a higher change in its connectivity patterns across independent graph realizations under . However, when calculating our performance measure, the set of vertices, , does not necessarily contain those vertices whose connectivity patterns have changed between independent realizations from a given generative model. For example, in Figure 3 (Bottom), we observe that ’s are centred at a negative level within generative model, . This shows that the vertices in the set, , are the ones that change more during independent graph realizations under .

Despite these changes in connectivity patterns within a given generative model, our interest lies in detecting a change during model transitions. At , we expect to be larger than the ’s observed for time instants corresponding to the same model. To observe this clearly, we calculate the performance measure (Equation 20). In Figure 4, we plot for CDP, ACT, and ACTM with for the group-change scenario for . We observe that provides a clearer picture than of a method’s ability to detect change caused by model transitions. For the rest of the scenarios, we only plot over time and compare the performance measure, (Equation 19), for CDP, ACT, and ACTM for all window sizes only at the time instant corresponding to a change, i.e., we only compare .

Figure 4: Observing on CDP (left), ACT (middle), and ACTM (right) over time on group-change for change-point (top) and change-interval (bottom).
Figure 5: Plot of on CDP, ACT, and ACTM for group-change for at change-point. For all , is positive and increases with . For all , a majority of elements of and are negative.

We compare the ’s only for change scores obtained in the change-point setting: it is sufficient to calculate for either change-point or change-interval, since both involve similar changes when considering only . If is positive, the vertices in the set, , have higher change scores than the rest of the vertices in , at the time instant of change (). Figure 5 shows returned by CDP, ACT, and ACTM for the group-change scenario using various window sizes, . The returned by CDP for all window sizes are clearly positive. For returned by ACT and ACTM, the bulk of the interval lies below zero for all window sizes, showing failure in detection for those methods. We also observe for the other change scenarios, split (Figure 6), merge (Figure 8), form (Figure 10), fragment (Figure 12), hetero-to-homo (Figure 14), homo-to-hetero (Figure 16), simple-to-complex (Figure 18), and complex-to-simple (Figure 20). Our results show that CDP successfully detects the change in all the scenarios considered. ACT fails to detect the change in all scenarios except form and fragment, while ACTM fails in all scenarios except form.

We further observe for split (Figure 7), merge (Figure 9), form (Figure 11), fragment (Figure 13), hetero-to-homo (Figure 15), homo-to-hetero (Figure 17), simple-to-complex (Figure 19), and complex-to-simple (Figure 21). CDP shows a clear detection at for change scenarios form, fragment, hetero-to-homo, homo-to-hetero, simple-to-complex, and complex-to-simple. In the case of split and merge, we observe an increase at , with the intervals being wide. ACT and ACTM do not show a clear increase at for split, merge, simple-to-complex, and complex-to-simple cases. For homo-to-hetero and hetero-to-homo, we observe a slight increase in for ACT and ACTM. For fragment, is highly negative for both ACT and ACTM methods. Thus, although we observed in Figure 12 that ACT shows good performance in terms of , Figure 13 shows that the change scores have decreased at . ACT therefore fails to detect the fragment scenario.

In Table 3, we perform the sign test to assess the statistical significance of the observed results. We compare calculated for group-change, split, merge, form, hetero-to-homo, homo-to-hetero, simple-to-complex, and complex-to-simple at for change-point. We do not perform the sign test for the fragment scenario, as we already observed in Figure 13 a decrease in for ACT and ACTM compared to previous time instants (this clearly shows how CDP outperforms these two methods). The leftmost column in Table 3 gives the alternative hypothesis tested. In Table 4, we then show the proportion of values in , that correspond to the hypothesis tested in Table 3. CDP outperforms ACT and ACTM for all change scenarios except form. ACTM outperforms ACT for group-change, form, and homo-to-hetero. For the other scenarios tested, there is no difference in for ACT and ACTM. However, when we consider the proportions in Table 4, the majority of the entries in are greater than for all change scenarios except hetero-to-homo.
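A one-sided paired sign test of the kind referred to above can be sketched as follows. The pairing of values by simulation run, the dropping of ties, and the function name `sign_test_greater` are assumptions made for illustration; the exact test configuration used for Table 3 is not specified here.

```python
from math import comb


def sign_test_greater(x, y):
    """One-sided paired sign test of the alternative that x tends to exceed y.

    Returns (number of positive differences, number of non-tied pairs, p-value).
    Tied pairs are dropped, which is a common convention (assumed here).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    n_pos = sum(1 for d in diffs if d > 0)
    # Under the null hypothesis each sign is positive with probability 1/2,
    # so the p-value is the upper tail of a Binomial(n, 1/2) distribution.
    p_value = sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n
    return n_pos, n, p_value


# Hypothetical performance values from ten paired simulation runs.
method_a = [2.1, 1.8, 2.5, 1.9, 2.2, 2.0, 2.4, 1.7, 2.3, 2.6]
method_b = [0.4, 0.1, 0.9, 0.3, 0.2, 0.5, 0.8, 0.0, 0.6, 0.7]
n_pos, n, p_value = sign_test_greater(method_a, method_b)
```

The proportion `n_pos / n` corresponds to the kind of summary reported alongside the test: the fraction of runs in which one method's value exceeds the other's.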

Figure 6: Plot of on CDP, ACT, and ACTM for split for at change-point. For all , is positive and increases with . For all , and are negative.
Figure 7: Observing on CDP (left), ACT (middle), and ACTM (right) over time on split for change-point (top) and change-interval (bottom). Although shows an increase at for both change-point and change-interval, shows high variability, extending below zero. ACT and ACTM do not show an increase at .
Figure 6: Plot of on CDP, ACT, and ACTM for split for