ThreeStage Subspace Clustering Framework with GraphBased Transformation and Optimization
Abstract
Subspace clustering (SC) refers to the problem of clustering highdimensional data into a union of lowdimensional subspaces. Based on spectral clustering, stateoftheart approaches solve SC problem within a twostage framework. In the first stage, data representation techniques are applied to draw an affinity matrix from the original data. In the second stage, spectral clustering is directly applied to the affinity matrix so that data can be grouped into different subspaces. However, the affinity matrix obtained in the first stage usually fails to reveal the authentic relationship between data points, which leads to inaccurate clustering results. In this paper, we propose a universal ThreeStage Subspace Clustering framework (3SSC). GraphBased Transformation and Optimization (GBTO) is added between data representation and spectral clustering. The affinity matrix is obtained in the first stage, then it goes through the second stage, where the proposed GBTO is applied to generate a reconstructed affinity matrix with more authentic similarity between data points. Spectral clustering is applied after GBTO, which is the third stage. We verify our 3SSC framework with GBTO through theoretical analysis. Experiments on both synthetic data and the realworld data sets of handwritten digits and human faces demonstrate the universality of the proposed 3SSC framework in improving the connectivity and accuracy of SC methods based on , , or nuclear norm regularization.
ThreeStage Subspace Clustering Framework with GraphBased Transformation and Optimization
Shuai Yang, Wenqi Zhu, Yuesheng Zhu Communication and Information Security Lab, Institute of Big Data Technologies, Shenzhen Graduate School, Peking University ethanyang@pku.edu.cn wenqizhu@pku.edu.cn zhuys@pkusz.edu.cn
noticebox[b]Preprint. Under review.\end@float
1 Introduction
Highdimensional data, such as images and documents, are ubiquitous in many applications of computer vision, e.g., face clustering [1], image representation and compression [2], and motion segmentation [3,4]. In order to deal with these data in a highdimensional ambient space, a union of lowdimensional data is adopted to approximate the original highdimensional data, which is known as Subspace Clustering (SC) [5]. The task of SC is to partition the data into groups so that data points in a group come from the same subspace.
Many methods for subspace clustering have been proposed, including algebraic [6], iterative [7][10], statistical [11][16], and spectral clustering based methods [17][27]. Stateoftheart methods tend to adopt spectral clustering due to its advantage to reveal the relationship between data points based on graph. In spectral clustering, data points are viewed as vertexes in set in an undirected graph , and the task is to partition into different subgraphs. Usually, an affinity matrix (or similarity matrix) which has the similarity between each vertex pair as its elements is generated to represent . By applying spectral clustering to , the data points will be clustered into different subspaces precisely.
Spectral clustering based methods usually follow a twostage framework. In the first stage, with data representation techniques, a coefficient matrix is generated from the original data based on the selfexpressiveness property of data belonging to the same subspace, and then an affinity matrix is obtained from . That is,
(1) 
(2) 
where is the data matrix, is the coefficients matrix, and is the affinity matrix. Then spectral clustering is directly applied to to partition data points into subspaces in the second stage.
However, the affinity matrix obtained in the first stage of the twostage framework contains insufficient information of since it’s usually sparse or low rank, which is obviously not capable to represent the relationship between data points and inappropriate to be applied with spectral clustering directly. Applying spectral clustering to will lead to inaccurate results especially in some data sets with large number of data points. Two factors lead to this problem. Firstly, too much elements in deny potential relations among data points, which makes the number of connections between data points particularly small. Secondly, nonzero elements in cannot reveal the authentic similarity between data points.
In this paper, we propose a universal ThreeStage Subspace Clustering framework (3SSC), which aims to overcome the drawbacks of the twostage SC framework. It provides a GraphBased Transformation and Optimization (GBTO) mechanism which turns the original obtained from data representation techniques into an optimized affinity matrix which is more capable to represent the distribution of data points in the highdimensional ambient space. In GBTO, we adopt the classic FloydWarshall algorithm as an optimization strategy to solve this SC problem. Besides, depending on different application scenarios, we propose two implementations of the 3SSC framework which are Hard 3SSC and Soft 3SSC.
In addition, we compare and analyze different transformation strategies in GBTO which determines how weight and distance are transformed into each other. Finally, we set experiments on synthetic data and realworld data sets of handwritten digits as well as human faces with varying lighting to verify our theoretical analysis. The universality of the 3SSC framework in improving the accuracy and connectivity of SC methods with different norm regularizations is demonstrated in comparison with traditional twostage framework.
Related work.
Current spectral clustering based methods tend to apply , , or nuclear norm regularization on the coefficient matrix , including original Sparse Subspace Clustering using norm (SSC) [28][29], Least Squares Regression (LSR) [17] using norm, Low Rank Representation (LRR) [30][31] using nuclear norm, Elastic Net Subspace Clustering (ENSC) [27] using a mixture of and norm and SSC by Orthogonal Matching Pursuit (OMP) [26] using norm. Meanwhile, Block Diagonal Representation (BDR) [32] uses block diagonal matrix induced regularizer to directly pursue the block diagonal matrix. Such methods divide the SC problem into two steps described as twostage SC framework. Though Structured Sparse Subspace Clustering (SSSC) [33] uses a joint affinity learning and subspace clustering framework to reweight the representation matrix, it still ignores that generated by (SSC) in the first stage cannot represent the authentic distribution of highdimensional data points. Our paper presents a more universal framework which can be applied to SC methods with different data representation techniques in the first stage.
2 ThreeStage Subspace Clustering: A universal framework
The main difference between these spectral clustering based SC methods lies in how the affinity matrix is obtained. Therefore, these algorithms can be concluded in a TwoStage Subspace Clustering framework. In order to deal with its drawbacks mentioned in Section 1.1, we propose a ThreeStage Subspace Clustering, which is universal for stateoftheart SC algorithms. Step 3 in Algorithm 1 is added in comparison with twostage SC. As is shown in Figure 1, 3SSC contains GraphBased Transformation and Optimization, details of which is in Section 3. Depending on the application scenario, two implementations of the 3SSC framework are proposed:

Hard 3SSC: in this case all elements in affinity matrix are applied with GraphBased transformation and optimization, in which way the number of nonzero entries in will increase. It works well especially in data sets with plenty subspaces and data points.

Soft 3SSC: in this case only nonzero elements in W are applied with GBTO, which will not change the sparsity in . It just reconstructs the existing relations by adjusting similarity.
3 Graphbased Transformation and Optimization (GBTO)
In the last stage of 3SSC, we apply spectral clustering to get SC results. To accomplish this, an affinity (or similarity) matrix containing sufficient information of how highdimensional data points distribute needs to be generated. As described in Section 1, the affinity matrix obtained in the first stage is usually sparse or low rank, which is obviously unqualified to represent the relationship between data points and inappropriate to be applied with spectral clustering directly. To solve this problem, Graphbased Transformation and Optimization algorithm (GBTO) is proposed.
Definition 1 (Simulated distances and simulated graph)
Elements in matrix generated from the affinity matrix to represent distances between data points are called simulated distances and is called simulated distances matrix, while the graph containing data points with simulated distances is defined as simulated graph.
Stage 2 in Figure 1 describes how GBTO works. By transforming the affinity matrix into simulated distances matrix via WeightDistance Transformation (WDT), a simulated undirected graph is generated. Then with minimized distances is generated from by applying FloydWarshall, after which a new affinity matrix with optimized similarity is generated by applying DistanceWeight Transformation (DWT) to .
Obviously, GBTO can be divided into two parts: transformation between weights and distances (section 3.1), and optimization for simulated distances which determines how the relationship between data points is reconstructed (section 3.2).
3.1 Transformation strategy between weights and distances (WDT and DWT)
In an undirected graph with vertex set , the weight (or similarity) between two data points grows approximately in inverse proportion to the distance between them, and the transformation should obey this rule as well. Due to the column normalization, the weight are distributed in the range . Thus WDT and DWT can be approximated as , where here stands for weights. Both f(x) and its inverse transformation are simple in computation. As is shown in Figure 2, compared with another proposed transformation ln, enjoys more stable changes when approaches , which means the transformation pays more attention to small weights. It’s a better mapping for GBTO aiming to deal with abnormally small similarity which is irrational in ternary relationship as described later in (6) and (7). Thus, transformation algorithms between affinity matrix and distances matrix are proposed as below.
WDT is applied to the original affinity matrix , and it transforms into a simulated distances matrix. After optimization for simulated distances, is generated from and it needs transforming back into affinity matrix so that spectral clustering can be applied. Thus the inverse transformation of WDT which is called DWT is proposed as follows.
3.2 Optimization strategy for simulated distances matrix D
After WDT, the weights are transformed into simulated distances so that graphbased optimization can be applied. in means no similarity between data points, so the corresponding distance is defined as . Elements in such as stands for the distance from data point to , and it’s obvious that is symmetric which has as diagonal elements. In the first stage, elements in fail to reveal the authentic relations between data points, thus the simulated distances generated from are not precise either. In the simulated undirected graph with vertex set , many potential connections are not established, which results in extremely overlarge distances or even infinite distances. So it’s a problem of how to minimize the distances between data points. We adopt the classic FloydWarshall algorithm [40] which aims to find the shortest paths between all pairs of vertices in a weighted graph with positive or negative edge weights, after which can be optimized as with shortest distances between all data points.
Theorem 1.
FloydWarshall minimizes the distances between highdimensional data points.
(3) 
Corollary 1.
3SSC with GBTO optimizes the similarity between data points.
(4) 
After traversal of all intermediate points , , the Floyd–Warshall algorithm compares all potential paths through the graph between each pair of data points. The left upper of Figure 3 is the simulated graph as proposed in Section 3.1. After FloydWarshall, some new connections have been created which are shown as dotted lines in the right upper of Figure 3. Besides, some existing connections are optimized by minimizing the distances via intermediate nodes, such as the connection between node and which is optimized via intermediate node .
3.3 3SSC applied with GBTO
With the implementation of GBTO, 3SSC can be applied to most of spectral clustering based SC methods. 3SSC with GBTO applied with Algorithm 5 is defined as Hard 3SSC, which means both zero and nonzero elements in affinity matrix are optimized. This will lead to more nonzero elements in affinity matrix. For Soft 3SSC, restriction as is added in step 6 of Algorithm 5 to ensure only nonzero elements in are optimized, in which way only the existing relations are optimized and there is no risk that wrong connections between subspaces are established. Compared with Hard 3SSC, Soft 3SSC is a more conservative optimization method which improves spectral clustering accuracy without changing sparsity.
(5) 
In the first stage, different techniques are applied to solve the problem in (5). Usually it’s simplified as a or problem which is not NP hard. in are coefficients for to be written as a linear combination of other points based on selfexpressiveness. Multiple solutions for vary from each other, so selfexpressiveness could not show the global linear correlation of data.
Lemma 1.
3SSC with GBTO optimizes the expression of linear correlation of data points.
With standing for subspace, (1) shows the selfexpressiveness property of data lying in the same subspace, which can be derived that:
(6) 
However, the coefficients matrix obtained in the first stage doesn’t possess good selfexpressiveness. For example, if we initialize , in some special cases:
(7) 
where and belong to the same subspace, while the opposite conclusion can be derived from . More commonly, for some nonzero entries in :
(8) 
the distance from intermediate point to and are small while is far from , which is irrational in ternary relationship. After GBTO, the distances between data points are minimized and the similarity is optimized, which can be viewed as optimization for as well. That is:
(9) 
Thus, 3SSC with GBTO optimizes the data representation to possess better linear correlation expression.
Lemma 2.
In spectral clustering, higher similarity of edges inside subgraphs leads to smaller cut.
Stateoftheart spectral clustering adopt Ncut [34] to gain a more accurate result.Ncut is defined as :
(10) 
where is the degree of dissimilarity between these two subgraphs and , which can be computed as total weight of the edges that have been removed, and is the total connection from vertexes in to all vertexes in graph . It can be rewritten as:
(11) 
where if vertex and if , and . For the task of partitioning graph into pieces, the Ncut for SC can be rewritten as:
(12) 
where is defined as the sum of the weights of all edges in subgraph whose nodes belong to , and is graph without . After applied with 3SSC, with optimized similarity is generated. The increment of weights inside subgraphs is much larger than that between subgraphs especially in graph with more subgraphs and more nodes per subgraph. That is:
(13) 
This is obvious since data points belong to the same subspace have larger similarity and more connections inside subgraphs are established. So it can be derived that:
(14) 
Theorem 2.
3SSC with GBTO improves accuracy and connectivity by higher similarity which is closer to the truth.
(15) 
This can be concluded from (4) and (15), which show how 3SSC affects the similarity. This means 3SSC with GBTO improves clustering accuracy and connectivity by optimized similarity.
4 Experiments
We compare the performance of spectral clustering based SC methods with these implemented as 3SSC methods, including SSC, OMP, ENSC, LRR, LSR and BDR, which contain , , and nuclear norm. These methods applied with hard 3SSC are prefixed with ’3S’, such as 3SSSC. Those applied with soft 3SSC are prefixed with ’soft’. Parameters are set as recommended ( for ENSC, for BDR). Experiments are set on synthetic data, handwritten digits set MNIST [35] and USPS [36], and human faces set Extended Yale B (EYaleB)[37] with 50 trails. Connectivity, Clustering Accuracy and Normalized Mutual Information (NMI) [38][39] are metrics to evaluate the performance. Connectivity is defined as the second smallest eigenvalue of the normalized Laplacian , where is the degree matrix of graph . NMI quantifies the amount of information obtained by clustering results compared with the groundtruth.
Synthetic Experiments. Figure 4 shows how clustering accuracy changes with samples per subspace on synthetic data with subspaces each of dimension in ambient space of dimension ( for BDR). Hard 3SSC improves accuracy for LSR, LRR and BDR, while soft 3SSC works well in SSC, OMP and ENSC with and norm. NMI synchronously changes with accuracy, and connectivity of 3SSC methods is usually higher than 80%.
Clustering Handwritten Digits. Experiments on two data sets are set differently. The number of subjects chosen to be clustered changes on MNIST, while samples per subspace changes on USPS. The feature vectors for images in MNIST are projected to dimension 500 and 200 for USPS by PCA. As is shown in Table 1, Figure 5(a) and 5(b), in realworld data sets of handwritten digits, 3SSC methods usually obtain higher accuracy and NMI. In terms of data sets with more samples per subspace, GBTO works better due to more authentic relations between data points.
Samples  50  100  200  400 

SSC  62.68  65.05  60.97  60.13 
3SSSC  70.50  78.45  73.13  82.30 
LSR  36.10  68.18  71.09  71.11 
3SLSR  37.24  71.80  73.79  75.88 
LRR  67.42  64.86  61.86  62.85 
3SLRR  71.01  70.25  69.02  71.74 
OMP  58.90  61.39  59.99  61.62 
3SOMP  67.28  70.57  74.41  73.13 
ENSC  57.94  60.34  60.62  59.21 
3SENSC  63.50  73.87  69.09  71.48 
BDRZ  68.86  66.45  60.07  53.54 
3SBDRZ  71.60  72.16  73.28  72.55 
BDRB  68.18  65.98  64.61  57.35 
3SBDRB  70.41  72.50  73.53  72.99 
Clustering Human Faces with Varying Lighting. Data points are images downsampled from to . Table 2 shows accuracy and Figure 5(c) reports connectivity. For OMP, slight improvements has been made when applied with soft 3SSC since no new connections are created. For ENSC and BDR, accuracy decrease of 3SSC methods occurs when clustering 5 or 10 subjects because more connections between subspaces are established compared with inner connections when data sets are small. Considerable increase can be observed when clustered subjects are more than 10.
The time complexity of FloydWarshall is , while the time cost of 3SSC is still acceptable even on limited computing resources. For larger and more complex data sets, techniques like vectorization for loops in ®MATLAB could reduce time. Moreover, the performance of BDR varies greatly when and are set differently, while 3SBDR overcomes the overdependence on parameters.
5 Conclusion
We proposed a ThreeStage Subspace Clustering framework (3SSC) with two implementations (Hard & Soft), in which GraphBased Transformation and Optimization (GBTO) is applied to optimize the representation of authentic data distribution. 3SSC is universal for SC methods with different regularizations, and the effectiveness of it is demonstrated on several data sets. We note that 3SSC sometimes doesn’t work well in small data sets and this is left for future research.
Acknowledgments
This work was supported in part by the Shenzhen Municipal Development and Reform Commission (Disciplinary Development Program for Data Science and Intelligent Computing), in part by Shenzhen International cooperative research projects GJHZ20170313150021171, and in part by NSFCShenzhen Robot Jointed Founding (U1613215).
References
[1] R. Basri and D. W. Jacobs, "Lambertian reflectance and linear subspaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 218233, Feb. 2003.
[2] W. Hong, J. Wright, K. Huang and Y. Ma, "Multiscale Hybrid Linear Models for Lossy Image Representation," IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 36553671, Dec. 2006.
[3] J. Costeira and T. Kanade, “A Multibody Factorization Method for Independently Moving Objects,” International Journal of Computer Vision, vol. 29, no. 3, pp. 159179, 1998.
[4] K. Kanatani, "Motion segmentation by subspace separation and model selection," in Proceedings of IEEE 8th International Conference on Computer Vision, vol.2, pp. 586591, 2001
[5] R. Vidal, “Subspace Clustering,” Signal Processing Magazine, vol. 28, no. 2, pp. 5268, 2011.
[6] R. Vidal, Y. Ma, and S. Sastry, “Generalized principal component analysis (GPCA),” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1945–1959, Dec. 2005.
[7] P. S. Bradley and O. L. Mangasarian, “kplane clustering,” Journal of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.
[8] P. Tseng, “Nearest qflat to m points,” Journal of Optimization Theory and Applications, vol. 105, no. 1, pp. 249–252, 2000.
[9] T. Zhang, A. Szlam, and G. Lerman, “Median kflats for hybrid linear modeling with many outliers,” in Proceedings of IEEE 12th International Conference on Computer Vision Workshops, pp. 234–241, 2009.
[10] P. Agarwal and N. Mustafa, “kmeans projective clustering,” in Proceedings of ACM Symposium on Principles of Database Systems, pp. 155–165, 2004.
[11] T. E. Boult and L. G. Brown, “Factorizationbased segmentation of motions,” in Proceedings of IEEE Workshop Motion Understanding, pp. 179–186, 1991.
[12] Leonardis, H. Bischof, and J.Maver, “Multiple eigenspaces,” Pattern Recognition, vol. 35, no. 11, pp. 2613–2627, 2002.
[13] Archambeau, N. Delannay, and M. Verleysen, “Mixtures of robust probabilistic principal component analyzers,” Neurocomputing, vol. 71, nos. 7–9, pp. 1274–1282, 2008.
[14] Gruber and Y. Weiss, “Multibody factorization with uncertainty and missing data using the EM algorithm,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 707–714, 2004.
[15] Y. Ma, H. Derksen, W. Hong, and J. Wright, “Segmentation of multivariate mixed data via lossy data coding and compression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1546–1562, 2007.
[16] Y. Yang, S. R. Rao, and Y. Ma, “Robust statistical estimation and segmentation of multiple subspaces,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshop, p. 99, 2006.
[17] C.Y. Lu, H. Min, Z.Q. Zhao, L. Zhu, D.S. Huang, and S. Yan, “Robust and efficient subspace segmentation via least squares regression,” in Proceedings of European Conference on Computer Vision, pp. 347–360, 2012.
[18] R. Vidal and P. Favaro, “Low rank subspace clustering (LRSC),” Pattern Recognition Letter, vol. 43, pp. 47–61, 2014.
[19] Lu, J. Tang, M. Lin, L. Lin, S. Yan, and Z. Lin, “Correntropy induced L2 graph for robust subspace clustering,” in Proceedings of IEEE International Conference Computer Vision, pp. 1801–1808, 2013.
[20] Y. X. Wang, H. Xu, and C. Leng, “Provable subspace clustering: When LRR meets SSC,” in Proceedings of Neural Information Processing Systems, pp. 64–72, 2013.
[21] E. L. Dyer, A. C. Sankaranarayanan, and R. G. Baraniuk, “Greedy feature selection for subspace clustering,” Journal of Machine Learning Research, vol. 14, no. 1, pp. 2487–2517, 2013.
[22] Y. Zhang, Z. Sun, R. He, and T. Tan, “Robust subspace clustering via halfquadratic minimization,” in Proceedings of IEEE International Conference Computer Vision, pp. 3096–3103, 2013.
[23] Park, C. Caramanis, and S. Sanghavi, “Greedy subspace clustering,” in Proceedings of Neural Information Processing Systems, pp. 2753–2761, 2014.
[24] R. Heckel and H. Bölcskei, “Robust subspace clustering via thresholding,” IEEE Transactions on Information Theory, vol. 61, no. 11, pp. 6320–6342, 2015.
[25] B. Li, Y. Zhang, Z. Lin, and H. Lu, “Subspace clustering by mixture of Gaussian regression,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2094–2102, 2015.
[26] You, D. Robinson, and R. Vidal, “Scalable sparse subspace clustering by orthogonal matching pursuit,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3918–3927, 2016.
[27] C. You, C.G. Li, D. P. Robinson, and R. Vidal, “Oracle based active set algorithm for scalable elastic net subspace clustering,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3928–3937, 2016.
[28] Elhamifar and R. Vidal, “Sparse subspace clustering,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2790–2797, 2009.
[29] Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.
[30] Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by lowrank representation,” in Proceedings of International Conference on Machine Learning, pp. 663–670, 2010.
[31] Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by lowrank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
[32] C. Lu, J. Feng, Z. Lin, T. Mei and S. Yan, "Subspace Clustering by Block Diagonal Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 487501, 2019.
[33] C. Li, C. You and R. Vidal, "Structured Sparse Subspace Clustering: A Joint Affinity Learning and Subspace Clustering Framework," IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 29883001, 2017.
[34] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888905, 2000.
[35] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, "Gradientbased learning applied to document recognition," in Proceedings of the IEEE, vol. 86, no. 11, pp. 22782324, 1998.
[36] J. Hull, “A database for handwritten text recognition research,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 550–554, 1994.
[37] S. Georghiades, P. N. Belhumeur, and D. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.
[38] X. Wang, X. Guo, Z. Lei, C. Zhang and S. Z. Li, "ExclusivityConsistency Regularized Multiview Subspace Clustering," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 19, 2017.
[39] M. Yin, Y. Guo, J. Gao, Z. He and S. Xie, "Kernel Sparse Subspace Clustering on Symmetric Positive Definite Manifolds," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 51575164, 2016.
[40] Floyd, Robert W, "Algorithm 97: Shortest Path," Communications of the ACM, vol. 5, no. 6, pp. 345, 1962.