# Graph Diffusion-Embedding Networks

###### Abstract

We present novel graph diffusion-embedding networks (GDEN) for graph-structured data. GDEN is motivated by our closed-form formulation of regularized feature diffusion on graphs. GDEN integrates regularized feature diffusion and low-dimensional embedding simultaneously in a unified network model. Moreover, based on GDEN, we can naturally deal with data that carries multiple graph structures. Experiments on semi-supervised learning tasks on several benchmark datasets demonstrate that the proposed GDEN performs better than traditional GCN models.


Bo Jiang, Doudou Lin, Jin Tang School of Computer Science and Technology Anhui University Hefei, China jiangbo@ahu.edu.cn

Preprint. Work in progress.

## 1 Introduction

### 1.1 Graph based feature diffusion

Given a graph $G = (V, E)$ with $V$ denoting the nodes and $E$ representing the edges, let $A \in \mathbb{R}^{n \times n}$ be the corresponding adjacency matrix and $H = (h_1, h_2, \dots, h_n)^T \in \mathbb{R}^{n \times d}$ be the feature set of nodes, where $h_i$ denotes the attribute vector of node $v_i$. The aim of our graph based feature diffusion is to learn a feature representation $z_i$ for each node by incorporating the contextual information of the other node representations. In the following, we provide three kinds of graph based feature diffusion models $D(A, H)$, as summarized in Table 1. Similar diffusion models have been commonly used in ranking and label propagation [13, 14, 4, 8]. Differently, in this paper, we explore them for the feature diffusion problem, whose aim is to learn a contextual feature representation for each graph node.

(1) Graph Laplacian diffusion

Motivated by manifold ranking [13], we propose to compute the optimal diffused representation $Z = (z_1, z_2, \dots, z_n)^T$ by solving the following optimization problem:

$$\min_{Z}\;\frac{1}{2}\sum_{i,j=1}^{n} A_{ij}\,\|z_i - z_j\|^2 + \mu\,\|Z - H\|_F^2 \tag{1}$$

where $z_i$ denotes the diffused feature of node $v_i$. The first term conducts feature diffusion/propagation on the graph, while the second term encourages $Z$ to preserve the original feature information during diffusion. It is known that the optimal closed-form solution of this problem is given by

$$Z = \mu\,(\mu I + L)^{-1} H \tag{2}$$

where $L = D - A$ and $D$ is the diagonal degree matrix with $D_{ii} = \sum_{j} A_{ij}$; that is, $L$ is the (unnormalized) Laplacian of the graph.
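To make the closed form of Eq. (2) concrete, here is a minimal NumPy sketch; the toy graph, features and value of $\mu$ are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Toy 3-node path graph (illustrative, not from the paper).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
D = np.diag(A.sum(axis=1))   # diagonal degree matrix
L = D - A                    # unnormalized graph Laplacian

H = np.array([[1., 0.],      # toy input node features, one row per node
              [0., 1.],
              [1., 1.]])
mu = 4.5                     # regularization weight (illustrative)

n = A.shape[0]
# Closed-form diffused representation: Z = mu * (mu*I + L)^(-1) * H
Z = mu * np.linalg.solve(mu * np.eye(n) + L, H)

# Sanity check of the stationarity condition of Eq. (1): L Z + mu (Z - H) = 0
residual = L @ Z + mu * (Z - H)
```

A large $\mu$ suppresses diffusion (so $Z \to H$), while a small $\mu$ smooths the features more aggressively over the graph.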

(2) Graph normalized Laplacian diffusion

One can also compute the optimal diffused representation by solving the following normalized optimization problem [14, 8]:

$$\min_{Z}\;\frac{1}{2}\sum_{i,j=1}^{n} A_{ij}\,\Bigl\|\frac{z_i}{\sqrt{D_{ii}}} - \frac{z_j}{\sqrt{D_{jj}}}\Bigr\|^2 + \mu\,\|Z - H\|_F^2 \tag{3}$$

The optimal closed-form solution for this problem is given by

$$Z = \mu\,(\mu I + \tilde{L})^{-1} H \tag{4}$$

where $\tilde{L} = I - D^{-1/2} A D^{-1/2}$ is the normalized Laplacian of the graph and $D$ is the diagonal degree matrix defined above.
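As a quick sanity check, the closed form of Eq. (4) coincides with the familiar ranking form $(1-\alpha)(I - \alpha S)^{-1}H$ of [14], with $S = D^{-1/2} A D^{-1/2}$ and $\alpha = 1/(1+\mu)$. A small NumPy sketch (toy graph and random features, not from the paper) verifies the equivalence:

```python
import numpy as np

# Toy triangle graph (illustrative).
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
d = A.sum(axis=1)
S = A / np.sqrt(np.outer(d, d))   # D^(-1/2) A D^(-1/2)
L_tilde = np.eye(3) - S           # normalized graph Laplacian

H = np.random.default_rng(0).normal(size=(3, 2))  # random toy features
mu = 0.65
alpha = 1.0 / (1.0 + mu)

# Eq. (4): Z = mu * (mu*I + L~)^(-1) H
Z_eq4 = mu * np.linalg.solve(mu * np.eye(3) + L_tilde, H)
# Equivalent ranking form: Z = (1-alpha) * (I - alpha*S)^(-1) H
Z_rank = (1 - alpha) * np.linalg.solve(np.eye(3) - alpha * S, H)
```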

(3) Graph random walk diffusion

Another way to formulate feature diffusion is based on the random walk with restart (RWR) model [14, 8], which obtains the equilibrium representation on the graph. To do so, we first define a transition probability matrix $P$ as

$$P = D^{-1} A \tag{5}$$

Then, the RWR is conducted on the graph and converges to an equilibrium representation $Z^{*}$. Formally, it conducts the update

$$Z^{(t+1)} = \alpha\, P Z^{(t)} + (1 - \alpha)\, H \tag{6}$$

where $\alpha \in (0, 1)$ and $1 - \alpha$ is the jump (restart) probability. We can obtain the equilibrium representation as

$$Z^{*} = (1 - \alpha)\,(I - \alpha P)^{-1} H \tag{7}$$
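The iterative update of Eq. (6) converges to the closed form of Eq. (7) whenever $0 < \alpha < 1$. A small NumPy sketch (toy graph and features, not from the paper) illustrates this:

```python
import numpy as np

# Toy 4-node cycle graph (illustrative).
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
P = A / A.sum(axis=1, keepdims=True)         # row-stochastic transition matrix D^(-1) A
H = np.arange(8, dtype=float).reshape(4, 2)  # toy node features
alpha = 0.91                                 # jump probability is 1 - alpha

# Iterate the RWR update of Eq. (6) until numerical convergence.
Z = H.copy()
for _ in range(500):
    Z = alpha * (P @ Z) + (1 - alpha) * H

# Closed-form equilibrium of Eq. (7).
Z_star = (1 - alpha) * np.linalg.solve(np.eye(4) - alpha * P, H)
```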

Table 1 summarizes the feature diffusion results of the above three models. In addition to these three, some other diffusion models can also be explored here [4, 8]. The above three diffusion models share three desirable properties. (1) They conduct feature diffusion while preserving the information of the original input features. (2) They have explicit optimization formulations, and their equilibrium representations can be obtained via simple closed-form solutions. (3) They can be naturally extended to address data with multiple graph structures, as shown in §3.

**Table 1.** Summary of the three graph feature diffusion functions.

| Model | Diffusion function $D(A, H)$ |
|---|---|
| Laplacian diffusion | $\mu\,(\mu I + L)^{-1} H$ |
| Random walk with restart | $(1 - \alpha)\,(I - \alpha P)^{-1} H$ |
| Normalized Laplacian diffusion | $\mu\,(\mu I + \tilde{L})^{-1} H$ |

### 1.2 Graph embedding

Graph embedding techniques have been widely used in dimensionality reduction and label prediction. Given a graph with adjacency matrix $A$ and node features $H \in \mathbb{R}^{n \times d}$, the aim of graph embedding is to generate a low-dimensional representation $z_i \in \mathbb{R}^{k}$ (with $k \ll d$) for each node $v_i$. One popular way is to utilize linear embedding, which assumes that

$$Z = H\, W \tag{8}$$

where $W \in \mathbb{R}^{d \times k}$ denotes the linear projection matrix.

## 2 Graph Diffusion-Embedding Networks

In this section, we present our graph diffusion-embedding networks (GDEN). Similar to previous GCNs [3, 5], the aim of our GDEN is to seek a nonlinear function to conduct dimensionality reduction and label prediction. It contains several propagation layers and one final perceptron layer. For different tasks, one can design different final perceptron layers.

### 2.1 Propagation layer

Given any input features $H^{(k)}$ and graph structure (adjacency matrix) $A$, GDEN conducts the layer-wise propagation rule

$$H^{(k+1)} = \sigma\bigl(D(A, H^{(k)})\, W^{(k)}\bigr) \tag{9}$$

where $H^{(0)} = H$. The function $D(A, H^{(k)})$ denotes the diffused feature representation; any of the diffusion models of Table 1 can be used in a GDEN layer. The parameter $W^{(k)}$ is a layer-specific trainable weight matrix used to conduct the linear projection, and $\sigma(\cdot)$ denotes an activation function, such as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$. The output of the $k$-th layer provides a low-dimensional embedding of the graph nodes.
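The propagation rule of Eq. (9) can be sketched in NumPy as follows, using Laplacian diffusion for $D(A, H)$ and ReLU for $\sigma$; the graph, dimensions, weights and $\mu$ are illustrative placeholders, not values from the paper:

```python
import numpy as np

def gden_layer(A, H, W, mu=4.5):
    """One GDEN propagation layer (Eq. (9)) with Laplacian diffusion as D(A, H)."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A                          # graph Laplacian
    diffused = mu * np.linalg.solve(mu * np.eye(n) + L, H)  # D(A, H), Eq. (2)
    return np.maximum(diffused @ W, 0.0)                    # ReLU(D(A, H) W)

rng = np.random.default_rng(1)
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
H0 = rng.normal(size=(3, 5))        # toy input features (3 nodes, 5 dims)
W0 = 0.1 * rng.normal(size=(5, 2))  # toy projection down to 2 dims
H1 = gden_layer(A, H0, W0)          # low-dimensional node embedding
```

In practice $W^{(k)}$ would be learned by backpropagation; the sketch only shows the forward pass of a single layer.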

### 2.2 Final perceptron layer

One can design different final perceptron layers for different problems. In the following, we present two kinds of perceptron layers for semi-supervised learning and link prediction problem, respectively.

(1) Semi-supervised learning

For semi-supervised learning, let $\mathcal{L}$ indicate the set of labelled nodes and $Y$ be the corresponding labels of those nodes. The aim of semi-supervised learning is to predict the labels of the unlabelled nodes. To do so, we can design the final perceptron layer as

$$Z = \mathrm{softmax}\bigl(D(A, H^{(K)})\, W^{(K)}\bigr) \tag{10}$$

where $Z$ is the label output of the final layer and row $Z_i$ is the label indication vector of node $v_i$. Similar to GCN [5], we can use the following cross-entropy loss over all labelled nodes for semi-supervised classification, where $c$ denotes the number of classes:

$$\mathcal{L}_{\mathrm{semi}} = -\sum_{i \in \mathcal{L}} \sum_{j=1}^{c} Y_{ij} \ln Z_{ij} \tag{11}$$

(2) Graph auto-encoder

The aim of the graph auto-encoder (GAE) is to reconstruct/recover an optimal graph based on the input features and the initial graph $A$. Similar to [6], we can use a GDEN encoder and a simple inner product decoder, which can be used for the link prediction task. For the GAE problem, the final perceptron layer can be designed as the inner product of the final output embedding $Z$, i.e.,

$$\hat{A} = s\bigl(Z Z^{T}\bigr) \tag{12}$$

where $s(\cdot)$ is the logistic sigmoid function and $Z$ is the output representation of the final propagation layer. One can use the MSE loss function here, defined as

$$\mathcal{L}_{\mathrm{GAE}} = \|A - \hat{A}\|_F^2 \tag{13}$$

Some other loss functions, such as cross-entropy loss, can also be used here.
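A minimal NumPy sketch of the inner-product decoder of Eq. (12) and a reconstruction loss in the spirit of Eq. (13) (averaged rather than summed, for scale); the embedding values and target graph are illustrative placeholders, not from the paper:

```python
import numpy as np

def decode(Z):
    """Inner-product decoder of Eq. (12): A_hat = sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

def mse_loss(A, A_hat):
    """MSE reconstruction loss (mean over entries rather than the Frobenius sum)."""
    return np.mean((A - A_hat) ** 2)

# Toy 2-dimensional embeddings: nodes 0 and 1 are close, node 2 is far away.
Z = np.array([[ 2.0, 0.0],
              [ 2.0, 0.1],
              [-2.0, 0.0]])
A_hat = decode(Z)             # reconstructed edge probabilities

A = np.array([[1., 1., 0.],   # toy target adjacency (with self-loops)
              [1., 1., 0.],
              [0., 0., 1.]])
loss = mse_loss(A, A_hat)
```

Because nodes 0 and 1 have similar embeddings, the decoder assigns them a high link probability, while the dissimilar pair (0, 2) gets a low one.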

### 2.3 Comparison with related works

We provide a detailed comparison with the recent graph convolutional network (GCN) [5], the diffusion-convolutional neural network (DCNN) [1] and the diffusion convolutional recurrent neural network (DCRNN) [7].

Previous GCN, DCNN and DCRNN generally use a random walk based diffusion process. GCN [5] utilizes a one-step diffusion, while DCRNN [7] utilizes a finite $K$-step truncation of diffusion on a graph. One main limitation is that the equilibrium (converged) representation of feature diffusion cannot be obtained. Also, these models cannot be used directly for data with multiple graph structures. In contrast, our GDEN explores regularized diffusion models (as shown in Table 1). The benefits are threefold. (1) They conduct feature diffusion while preserving the information of the original input features. (2) They have explicit optimization formulations, and the equilibrium representations of diffusion in our models can be obtained via simple closed-form solutions, which can thus be computed efficiently. (3) They can be naturally extended to address data with multiple graph structures.

## 3 Multi-GDEN

Compared with previous GCN [5], DCNN [1] and DCRNN [7], one benefit of the proposed GDEN is that it can naturally deal with structured data carrying multiple graph structures.

Given features $H$ with multiple graph structures $A^{(1)}, A^{(2)}, \dots, A^{(M)}$, we aim to seek a nonlinear function to conduct dimensionality reduction and label prediction. This is known as the multiple graph learning problem.

First, for multiple graphs, we can conduct feature diffusion as

$$\min_{Z}\;\frac{1}{2}\sum_{m=1}^{M}\sum_{i,j=1}^{n} A^{(m)}_{ij}\,\|z_i - z_j\|^2 + \mu\,\|Z - H\|_F^2 \tag{14}$$

The optimal closed-form solution for this problem is given by

$$Z = \mu\,\Bigl(\mu I + \sum_{m=1}^{M} L^{(m)}\Bigr)^{-1} H \tag{15}$$

where $L^{(m)} = D^{(m)} - A^{(m)}$ and $D^{(m)}$ is the diagonal degree matrix of $A^{(m)}$; that is, $L^{(m)}$ is the Laplacian of the $m$-th graph. Similarly, we can also derive multiple graph feature diffusion based on normalized Laplacian diffusion.
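A minimal NumPy sketch of the multi-graph diffusion of Eq. (15), summing the Laplacians of two toy graphs defined over the same nodes (all values illustrative, not from the paper):

```python
import numpy as np

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

# Two toy graph structures over the same 3 nodes.
A1 = np.array([[0., 1., 0.],
               [1., 0., 1.],
               [0., 1., 0.]])
A2 = np.array([[0., 0., 1.],
               [0., 0., 1.],
               [1., 1., 0.]])
H = np.array([[1., 0.],       # toy node features
              [0., 1.],
              [0., 0.]])
mu = 4.5                      # illustrative regularization weight

L_sum = laplacian(A1) + laplacian(A2)
# Closed form of Eq. (15): Z = mu * (mu*I + sum_m L^(m))^(-1) * H
Z = mu * np.linalg.solve(mu * np.eye(3) + L_sum, H)

# Stationarity condition of Eq. (14): (sum_m L^(m)) Z + mu (Z - H) = 0
residual = L_sum @ Z + mu * (Z - H)
```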

We can then incorporate this multiple graph feature diffusion into each layer of GDEN (Eqs. (9) and (10)) to achieve multiple graph diffusion and embedding.

## 4 Experiments

To evaluate the effectiveness of the proposed GDEN, we follow the experimental setup of [12] and test our model on the citation network datasets Citeseer, Cora and Pubmed [10]. The datasets used in our experiments are summarized in Table 2.

**Table 2.** Summary of the datasets used in our experiments.

| Dataset | Type | Nodes | Edges | Classes | Features | Label rate |
|---|---|---|---|---|---|---|
| Citeseer | Citation network | 3327 | 4732 | 6 | 3703 | 0.036 |
| Cora | Citation network | 2708 | 5429 | 7 | 1433 | 0.052 |
| Pubmed | Citation network | 19717 | 44338 | 3 | 500 | 0.003 |

We compare against the same baseline methods, including traditional label propagation (LP) [15], semi-supervised embedding (SemiEmb) [11], manifold regularization (ManiReg) [2], Planetoid [12], DeepWalk [9] and the graph convolutional network (GCN) [5]. For GCN [5], we use the PyTorch code provided by the authors. For a fair comparison, we also implement our GDEN in PyTorch. Results for the other baseline methods are taken from [12, 5]. We implement three versions of our model: 1) GDEN-L, which utilizes graph Laplacian diffusion; 2) GDEN-RWR, which utilizes random walk with restart; and 3) GDEN-NL, which utilizes normalized Laplacian diffusion. The parameter $\mu$ in GDEN-L and GDEN-NL is set to 4.5 and 0.65, respectively, and the parameter $\alpha$ in GDEN-RWR is set to 0.91. Table 3 summarizes the comparison results. We note that: 1) GDEN generally performs better than the other competing methods, demonstrating the effectiveness and benefit of the proposed GDEN model; and 2) overall, GDEN-NL performs better than GDEN-L and GDEN-RWR.

**Table 3.** Semi-supervised classification accuracy on the citation network datasets.

| Method | Citeseer | Cora | Pubmed |
|---|---|---|---|
| ManiReg [2] | 60.1% | 59.5% | 70.7% |
| SemiEmb [11] | 59.6% | 59.0% | 71.1% |
| LP [15] | 45.3% | 68.0% | 63.0% |
| DeepWalk [9] | 43.2% | 67.2% | 65.3% |
| Planetoid [12] | 64.7% | 75.7% | 77.2% |
| GCN [5] | 70.4% | 81.4% | 78.6% |
| GDEN-L | 71.3% | 81.9% | 78.7% |
| GDEN-RWR | 72.9% | 79.3% | 77.9% |
| GDEN-NL | 72.1% | 83.0% | 79.2% |

## 5 Conclusion

We have presented novel graph diffusion-embedding networks (GDEN) which operate on graph-structured data. GDEN integrates feature diffusion and low-dimensional embedding simultaneously in a unified model. Based on GDEN, we can easily deal with data carrying multiple graph structures. Semi-supervised learning experiments on several benchmark datasets demonstrate that the proposed GDEN performs better than recent widely used GCN models.

## References

- (1) J. Atwood and D. Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1993–2001, 2016.
- (2) M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of machine learning research, 7(Nov):2399–2434, 2006.
- (3) M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
- (4) M. Donoser and H. Bischof. Diffusion processes for retrieval revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1320–1327, 2013.
- (5) T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- (6) T. N. Kipf and M. Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
- (7) Y. Li, R. Yu, C. Shahabi, and Y. Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In International Conference on Learning Representations (ICLR ’18), 2018.
- (8) S. Lu, V. Mahadevan, and N. Vasconcelos. Learning optimal seeds for diffusion-based salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2790–2797, 2014.
- (9) B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710, 2014.
- (10) P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93, 2008.
- (11) J. Weston, F. Ratle, H. Mobahi, and R. Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. 2012.
- (12) Z. Yang, W. W. Cohen, and R. Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pages 40–48, 2016.
- (13) D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in neural information processing systems, pages 321–328, 2004.
- (14) D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf. Ranking on data manifolds. In Advances in neural information processing systems, pages 169–176, 2004.
- (15) X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03), pages 912–919, 2003.