GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data
Abstract
Recent research on pattern discovery has progressed from mining frequent patterns and sequences to mining structured patterns, such as trees and graphs. As a general data structure, graphs can model complex relations among data, with wide applications in web exploration and social networks. However, mining large graph patterns is challenging because of the large number of subgraphs. In this paper, we aim to mine only frequent complete graph patterns. A graph in a database is complete if every pair of distinct vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining algorithm that uses pruning techniques to extract maximal complete graphs from the large spatial dataset of the Sloan Digital Sky Survey (SDSS). Using a divide and conquer strategy, GCG shows high efficiency, especially in the presence of a large number of patterns. We describe how GCG can mine not only simple colocation spatial patterns but also complex ones. To the best of our knowledge, this is the first algorithm that exploits the extraction of maximal complete graphs in the process of mining complex colocation patterns in large spatial datasets.
I Introduction
With the rapid advance of technology, researchers in many fields have been collecting large amounts of data on a continuous or periodic basis. This data gives researchers the potential to discover useful information and knowledge that has not been seen before. In order to process this data and extract useful information, the data needs to be organised in a suitable format. Hence data preparation plays a very important role in the data mining process.
The focus of this study is mainly on extracting interesting complex patterns from the Sloan Digital Sky Survey (SDSS) astronomy dataset [1]. The SDSS is the most ambitious astronomical survey project ever undertaken. The survey maps in detail one-quarter of the entire sky, determining the positions and absolute brightness of more than 100 million celestial objects. The first official Data Release (DR1) of SDSS was in June 2003. Since then there have been many new releases, including the ninth major release (DR9) in August 2012, which provides images, imaging catalogs, spectra, and redshifts. Release DR9 contains more than 5TB of data, which includes measures of 500 million unique celestial objects.
The availability of such a large amount of useful data is an obvious opportunity to apply data mining techniques to extract interesting information. However, while much research has been done by astronomical researchers, little effort has been made to apply data mining techniques to SDSS data. This is because the SDSS data format is not suitable for mining purposes, which is the main motivation of this paper.
As mentioned in [2], spatial databases store spatial attributes about objects; hence, SDSS is a large spatial dataset, as it contains many attributes for each object. One of the most significant problems in spatial data mining is to find object types that frequently colocate with each other in large databases. Colocation means that objects are found in the neighborhood of each other. The proposed approach mines colocation patterns in SDSS data and uses these patterns to generate interesting information about different types of galaxies. In this work only the galaxies in SDSS are used. However, the approach could be generalised to any other celestial objects.
Data preparation plays a vital role in the mining process; hence it is done in two stages. First, the galaxies in the SDSS data are extracted and categorised into “Early” and “Late” type galaxies. Second, our proposed algorithm GCG is used to generate colocation patterns (maximal complete graphs) from the data. A complete graph is any set of spatial objects such that all objects in the set colocate. A maximal complete graph is a complete graph which is not a subset of any other complete graph. Fig. 1 depicts some examples of spatial colocations; a line between two vertices (objects) indicates that they are colocated. The second column of Table I displays the maximal complete graph patterns that are present in Fig. 1.
The general problem of extracting maximal complete graphs from a graph is known to be NP-hard. GCG efficiently extracts maximal complete graphs in a given spatial database by dividing the space into a grid structure based on a predefined distance. A divide and conquer strategy is applied via this grid structure to reduce the search space.
ID  Maximal Complete Graphs  Transactions 

1  {C1,D1}  {C,D} 
2  {C5,D7}  {C,D} 
3  {B1,C5}  {B,C} 
4  {A4,D2,D5}  {A,D+} 
5  {A5,C3,D3}  {A,C,D} 
6  {A5,D3,D6}  {A,D+} 
7  {A2,B2,D4}  {A,B,D} 
8  {A1,A3,C2,C4,C6}  {A+,C+} 
In this paper we focus on using maximal complete graphs to allow us to mine interesting complex spatial relationships between the object types. A complex spatial relationship includes not only whether an object type, say $A$, is present in a (maximal) complete graph, but also:

Whether more than one object of its type is present in the maximal complete graph. This is called a positive type and is denoted by $A+$.

Whether objects of a particular type are not present in a maximal complete graph, that is, the absence of types. This is called a negative type and is denoted by $-A$.
The inclusion of positive and / or negative types makes a relationship complex. This allows us to mine patterns that say, for example, that $A$ occurs with multiple $B$’s but not with a $C$. That is, the presence of $A$ may imply the presence of multiple $B$’s and the absence of $C$. This is interesting in the astronomy domain. The last column of Table I shows examples of (maximal) complex relationships.
Maximal complete graphs generated by GCG can be represented as transactions, as given in Column 3 of Table I. These transactions can be used by any association rule mining technique. Association rule mining techniques such as those proposed in [3, 4] can generate useful rules, which are interpreted as relationships between objects.
We are not interested in maximal complex patterns (relationships) in themselves, as they provide only local information (that is, about a single maximal complete graph). We are, however, interested in sets of object types (including complex types) that appear across the entire dataset (that is, amongst many maximal complete graphs). In other words, we are interested in mining interesting complex spatial relationships (sets), where “interesting” is defined by a global measure. We use a variation of the minPI measure [10] to define interestingness.
I-A Problem Statement
Given the set of maximal complete graphs, find all interesting complex patterns that occur amongst the set of maximal complete graphs. More specifically, find all sets of object types, including positive and negative (that is, complex) types, that are interesting as defined by their minPI being above a threshold.
This problem therefore becomes an itemset mining task. In order to do this very quickly, we use an interesting itemset mining algorithm, namely GLIMIT [5].
Including negative types makes the problem much more difficult, as it is typical for spatial data to be sparse. This means that the absence of a type can be very common.
I-B Contributions
In this paper we make the following contributions:

An efficient algorithm called GCG is proposed to generate all maximal complete graph patterns that exist in large spatial datasets.

We introduce the concept of the maximal complete graph. We demonstrate that using maximal complete graphs makes more sense than using complete graphs, and we show that they allow the use of negative patterns.

We show that complex and interesting colocation patterns can be efficiently extracted from huge and sparse spatial datasets.
II Maximal Complete Graph Mining
Figure 2 shows the overall process flow of our method. The following subsections elaborate on our approach.
II-A Data Extraction and Categorisation
Raw data usually needs to be prepared to suit data mining algorithms. This section illustrates the method of extracting the important attributes from the SDSS database. These attributes are used to categorise galaxy objects. A view called SpecPhoto, which is derived from a table called SpecPhotoAll, is used. The latter is a join of the PhotoObjAll and SpecObjAll tables. In other words, SpecPhoto is a view of joined Spectro and PhotoObjects that have clean spectra (see http://www.sdss.org/).
The aim was to extract only the galaxy objects from the SDSS using the parameter (object type = 0). The total number of galaxy-type objects stored in the SDSS catalog is more than 507,594. However, to ensure accuracy when calculating the distance between each object and the earth, which is needed to calculate the $X$, $Y$, and $Z$ coordinates of each object, additional parameters are used to keep only the rigid objects and the objects with a correct RedShift. The number of objects is thereby reduced to 442,923.
SDSS release 9 provides a table called Neighbors. This table contains all objects that are located within 0.5 arcmin of each other, which makes it unsuitable for this study because the distance that forms the neighborhood relation between objects cannot be chosen freely. For example, in our experiments distances measured in megaparsecs (Mpc) are used as the thresholds to check whether objects are close to each other or not. Table II lists the fields extracted from the SDSS (DR9) that were used during the preparation process.
Data extraction: The data was obtained from SDSS (DR9) [6]. This data is extracted from the online catalog services using several SQL statements and tools offered by the catalog. These tools are accessible from the SDSS site (http://cas.sdss.org/dr5/en/tools/search/sql.asp).
Data transformation: The data obtained from the previous step is transformed to identify the categories of the galaxies and to represent the data in the right format.
No  Field name  Field description 

1.  specObjID  Unique ID 
2.  z  Final RedShift 
3.  ra  Right ascension 
4.  dec  Declination 
5.  cx  x of Normal unit vector 
6.  cy  y of Normal unit vector 
7.  cz  z of Normal unit vector 
8.  primTarget  prime target categories 
9.  objType  object type : Galaxy =0 
10.  modelMag_u  Ultraviolet magnitude 
11.  modelMag_r  Red Light magnitude 
New attributes creation: With all necessary fields available, this step calculates the exact values of the $X$, $Y$, and $Z$ coordinates, which are not explicitly given in the SDSS data. First, the distance $D$ between each object and the earth is calculated using Hubble’s law and the value of $z$ for the object, as in Equation (1). Second, the unit vector components $cx$, $cy$, and $cz$ are multiplied by $D$ to obtain the $X$, $Y$, and $Z$ coordinates, as in Equations (2), (3), and (4), respectively.
$$D \approx \frac{c \times z}{H_0} \tag{1}$$
where $c$ is the speed of light, $z$ is the object RedShift, and $H_0$ is Hubble’s constant. Currently the best estimate for this constant is 71 km s$^{-1}$ Mpc$^{-1}$ [7, 8].
$$X = D \times cx \tag{2}$$
$$Y = D \times cy \tag{3}$$
$$Z = D \times cz \tag{4}$$
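As an illustration of Equations 1 to 4, the following Python sketch converts a redshift and a unit vector into Cartesian coordinates. The function names, constant names, and sample values are assumptions made for this example. The speed of light is expressed in km/s and Hubble’s constant is taken as 71 km/s/Mpc, so the distance comes out in megaparsecs.

```python
C_KM_PER_S = 299_792.458  # speed of light in km/s
H0 = 71.0                 # Hubble's constant in km/s/Mpc (value quoted in the text)


def hubble_distance(z, h0=H0):
    """Equation 1: approximate distance in Mpc from the redshift z."""
    return C_KM_PER_S * z / h0


def to_cartesian(z, cx, cy, cz, h0=H0):
    """Equations 2-4: scale the unit vector (cx, cy, cz) by the distance."""
    d = hubble_distance(z, h0)
    return d * cx, d * cy, d * cz


# Hypothetical object: redshift 0.071 along the unit vector (0.6, 0.0, 0.8).
x, y, z_coord = to_cartesian(0.071, 0.6, 0.0, 0.8)
print(round(x, 3), round(y, 3), round(z_coord, 3))
```

Hubble’s law in this linear form is only an approximation for small redshifts, so the sketch should be read as a restatement of the equations rather than a cosmological distance calculator.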
Galaxies Categorisation: Different parameters were used to categorise galaxy types. Based on the difference between the Ultraviolet and Red light magnitudes (modelMag_u and modelMag_r in Table II), galaxies are categorised as either “Early” or “Late”. If the difference is greater than or equal to a fixed threshold, the galaxy is “Early”; otherwise it is “Late”. The value of the r-band Petrosian magnitude indicates whether the galaxy is “Main” (close to the earth) or a “Luminous Red Galaxy” (LRG): if the r-band value satisfies the threshold condition, the object is a “Main” galaxy; otherwise it is an “LRG” [9]. The four galaxy types found are Main-Late, Main-Early, LRG-Late, and LRG-Early.
II-B Basic Definitions and Concepts
This section briefly defines the concepts that are used in this paper.
Consider a set of objects $O$ with fixed locations. Given an appropriate distance measure $\mathrm{dist}$, we can define a graph $G$ as follows: let $O$ be the vertices and construct an edge between two objects $o_i$ and $o_j$ if $\mathrm{dist}(o_i, o_j) \leq d$, where $d$ is a chosen distance. A colocation pattern is a connected subgraph.
Definition 1 (Complete Graph)
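A Complete Graph $CG \subseteq O$ is any fully connected subgraph of $G$. That is, $\mathrm{dist}(o_i, o_j) \leq d$ for every pair of distinct objects $o_i, o_j \in CG$.
For example, in Fig. 1, {A4,D2,D5} form a complete graph, as every object colocates with every other object. Similarly, {C5,D7} form another complete graph.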
As we have mentioned in Section I we use maximal complete graphs so that we can define and use complex patterns meaningfully and to avoid double counting.
Definition 2 (Maximal Complete Graph)
A maximal complete graph is a complete graph that is not a subset (subgraph) of any other complete graph.
In Fig. 1, {A4,D2,D5} form a maximal complete graph as it is not a subset of another complete graph. However, {A2,D4} is not a maximal complete graph since it is a subset of the complete graph {A2,B2,D4}.
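Definitions 1 and 2 can be checked mechanically. The sketch below is a minimal illustration, assuming the graph is given as an adjacency dictionary; the helper names `is_complete` and `is_maximal` and the toy adjacency (which mirrors only the triangle A2, B2, D4 from the example above) are assumptions made for this example.

```python
from itertools import combinations


def is_complete(vertices, adj):
    """True if every pair of distinct vertices is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def is_maximal(vertices, adj):
    """True if the set is complete and no outside vertex is adjacent to all members."""
    members = set(vertices)
    if not is_complete(members, adj):
        return False
    return not any(members <= adj[w] for w in adj if w not in members)


# Toy adjacency: the three objects are pairwise colocated.
adj = {
    "A2": {"B2", "D4"},
    "B2": {"A2", "D4"},
    "D4": {"A2", "B2"},
}

print(is_complete({"A2", "D4"}, adj))       # the pair is a complete graph
print(is_maximal({"A2", "D4"}, adj))        # but B2 extends it, so it is not maximal
print(is_maximal({"A2", "B2", "D4"}, adj))  # the triangle cannot be extended
```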
The mining of maximal complete graphs is done directly – it does not require mining all subcomplete graphs first.
Definition 3 (Complete Graph’s Cardinality)
It is the number of vertices in a complete graph, that is, $|CG|$. In other words, it is the size of the complete graph. This value can be used to find the total number of edges that the complete graph has, as given in Equation 5:
$$\text{number of edges} = \frac{n(n-1)}{2} \tag{5}$$
where $n$ is the number of vertices.
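As a worked instance of Equation 5, consider the largest pattern in Table I, {A1,A3,C2,C4,C6}, which has five vertices:

```latex
\[
  \frac{n(n-1)}{2} = \frac{5 \times 4}{2} = 10 \ \text{edges}.
\]
```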
II-C Mining Maximal Complete Graphs
First, the data preparation process runs a maximal complete graph mining algorithm to extract all maximal complete graphs and strips them of the object identifiers (producing raw maximal complete graphs, as shown in Table VI). One pass is then made over the raw maximal complete graphs in order to extract complex relationships. We describe this in Section II-F. This produces complex maximal complete graphs. Each of these complex maximal complete graphs is then considered as a transaction, and an interesting itemset mining algorithm, using minPI as the interestingness measure, is used to extract the interesting complex relationships.
In itemset mining, the dataset consists of a set of transactions $\mathcal{T}$, where each transaction $T \in \mathcal{T}$ is a subset of a set of items $\mathcal{I}$; that is, $T \subseteq \mathcal{I}$. In our work, the set of complex maximal complete graphs (relationships) becomes the set of transactions (third column in Table I). The items are the object types, including the complex types such as $A+$ and $-A$. For example, if the object types are $\{A, B\}$, and each of these types is present and absent in at least one maximal complete graph, then $\mathcal{I} = \{A, A+, -A, B, B+, -B\}$. An interesting itemset mining algorithm mines $\mathcal{T}$ for interesting itemsets. The support of an itemset $I' \subseteq \mathcal{I}$ is the number of transactions containing the itemset: $\sigma(I') = |\{T \in \mathcal{T} : I' \subseteq T\}|$. So-called frequent itemset mining uses the support as the measure of interestingness. For reasons described in Section I we use minPI [10] which, under the mapping described above, is equivalent to
$$\mathrm{minPI}(I') = \min_{i \in I'} \frac{\sigma(I')}{\sigma(\{i\})}.$$
Since minPI is antimonotonic, we can easily prune the search space for interesting patterns. We adopted the method used in GLIMIT [5, 11] to mine the interesting patterns from maximal complete graphs. GLIMIT is a very fast and efficient itemset mining algorithm that has been shown to outperform Apriori-like algorithms [12] and FP-Growth [13].
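The support and interestingness computation can be sketched as follows. This is not GLIMIT; it is a direct, unoptimised evaluation that assumes minPI is the minimum, over the items of an itemset, of the itemset support divided by the single-item support. The function names, the "-" prefix for negative types, and the three sample transactions (written in the style of Table VI) are assumptions made for this example.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def min_pi(itemset, transactions):
    """Assumed form of minPI: min over items i of support(I) / support({i})."""
    joint = support(itemset, transactions)
    if joint == 0:
        return 0.0  # also avoids dividing by a zero single-item support
    return min(joint / support({i}, transactions) for i in itemset)


# Complex relationships used as transactions; "-C" marks an absent type.
transactions = [
    {"A", "B", "B+", "-C"},
    {"-A", "B", "C"},
    {"A", "A+", "B", "-C"},
]

print(support({"A", "B"}, transactions))
print(min_pi({"A", "B"}, transactions))
# Adding an item can never raise the value, which is what allows pruning.
print(min_pi({"A", "B", "-C"}, transactions) <= min_pi({"A", "B"}, transactions))
```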
As shown in Fig. 2, the complete graph generation and complex relationship extraction are local procedures, in the sense that they deal only with individual maximal complete graphs. In contrast, the interesting pattern mining is global – it finds patterns that occur across the entire space. Secondly, we consider subsets of maximal complete graphs only in the last step – after the complex patterns have been extracted.
Object type  X-Coordinate  Y-Coordinate 
A1  2.5  4.5 
A2  6  4 
A3  2  9 
B1  1.5  3.5 
B2  5  3 
B3  5  4 
C1  2.5  3 
C2  6  3 
D1  3  9 
D2  7  1.5 
II-D GCG algorithm
Algorithm 1 gives the pseudocode of the GCG algorithm. This section shows how the algorithm works through an example. Assuming that all objects are spatial, we use Fig. 3 to depict some example items and their locations. These objects and their coordinates are given in Table III. Note that SDSS is a three-dimensional dataset, but two dimensions are used in the example for the sake of simplicity.
Lists  Members 

1  {A1, B1, C1} 
2*  {B1, A1, C1} 
3*  {C1, A1, B1} 
4  {D1, A3} 
5  {A2, B2, C2, B3} 
6*  {B2, A2, B3, C2} 
7**  {C2, A2, B2, B3, D2} 
8  {D2, C2} 
9*  {A3, D1} 
10*  {B3, C2, B2, A2} 
Lists  1  4  5  8 

Members  {A1,B1,C1}  {D1,A3}  {A2,B2,C2,B3}  {D2,C2} 
Edges in each subgraph are formed by calculating the distance between adjacent objects. In other words, if the distance between two objects is at most $d$, an edge is created. Each subgraph, in this context, forms a colocation pattern. Therefore, the results of this algorithm are patterns containing objects that are colocated. The functionality of the GCG algorithm (Algorithm 1) is described as follows:

Lines 1-12: Dividing the space into a grid structure and concurrently placing each point into its grid cell based on its coordinates (Fig. 3). The size of a grid cell is $d \times d$, where $d$ is the predefined neighborhood distance. The value of $d$ is given as one of the inputs to the GCG algorithm.

Lines 13-25: Finding each object’s neighborhood list. This is the most important step and the one that dominates the complexity of the algorithm. It uses the Euclidean distance to check the neighborhood relationship between objects. The number of spatial objects checked depends on the density of the grid and the content of the neighbor cells (the number of neighbor cells is 9 or 27 if the data is 2D or 3D, respectively). In the example of Fig. 3, the sample data contains 10 objects, and a list is created for each object except those that have no neighbors. Our concern is to find colocation patterns with at least two members (i.e. $|CG| \geq 2$), because a single object does not form any type of relationship. Consequently, there is no need to count objects that have no connection (i.e. relationship) with at least one other object. In our example, however, all objects share relationships. For example, object {A1} has a relationship with {B1,C1}, and object {A2} with {B2, B3, C2}. These objects share the same neighborhood; that is, {A1, B1, C1} are colocated because the distance between each pair of them is at most $d$. Table IV shows some redundant lists, marked by *, which contain the same objects in a different order; such a list can be pruned without losing its objects, as they are present in another list.

Lines 26-38: Pruning any neighbor list that contains at least one object violating the colocation condition. For example, list 7 (marked by ** in Table IV) is pruned because two of its members, {A2,D2}, are not close to each other.
As a result of the previous steps, a list of maximal complete graphs is formed. For example, {A1, B1, C1} forms a maximal complete graph, and likewise for lists 4, 5, and 8, as shown in Table V.
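The walk-through above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not Algorithm 1 itself: the grid bucketing follows the description of the first two steps, but the final step uses the standard Bron-Kerbosch enumeration of maximal complete graphs in place of the list-pruning procedure. The threshold `D = 2.0` is an assumption, chosen because it reproduces the neighbor lists of Table IV from the coordinates in Table III; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

# Objects and coordinates from Table III.
points = {
    "A1": (2.5, 4.5), "A2": (6, 4), "A3": (2, 9),
    "B1": (1.5, 3.5), "B2": (5, 3), "B3": (5, 4),
    "C1": (2.5, 3), "C2": (6, 3),
    "D1": (3, 9), "D2": (7, 1.5),
}
D = 2.0  # assumed distance threshold (not stated in the text)


def neighbor_lists(points, d):
    """Bucket points into d x d cells, then compare each point only with
    points in its own cell and the 8 surrounding cells."""
    grid = defaultdict(list)
    for name, (x, y) in points.items():
        grid[(floor(x / d), floor(y / d))].append(name)
    adj = {name: set() for name in points}
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for p in members:
                for q in grid.get((cx + dx, cy + dy), ()):
                    if p != q and dist(points[p], points[q]) <= d:
                        adj[p].add(q)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (a standard substitute for the pruning step)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    # Keep only patterns with at least two members.
    return sorted(c for c in found if len(c) >= 2)


adj = neighbor_lists(points, D)
for clique in maximal_cliques(adj):
    print(clique)
```

With this threshold, the neighbor set computed for C2 contains A2, B2, B3, and D2 (list 7 in Table IV), and the enumerated patterns are the four lists of Table V.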
II-E GCG algorithm analysis
This section discusses the GCG algorithm completeness, correctness, and complexity.
Completeness: All objects in the neighbor lists appear as a set or subset in the maximal complete graph lists. After acquiring all the neighbors of each point, another check among these neighbors is done to ensure that all points are neighbors of each other. This naturally yields repeated neighbor lists. Therefore, all maximal complete graphs in any given graph are found.
Correctness: Every subset of a maximal complete graph appears in the neighbors list. Thus, no maximal complete graph that appears in the list of maximal complete graphs will be found as a subset of another maximal complete graph, which is the definition of a maximal complete graph. Fig. 4 displays an undirected graph, its neighborhood list, and the maximal complete graph patterns it contains. A pair of vertices does not appear in the neighborhood list when the distance between them exceeds $d$ (i.e. there is no edge between them). As a result, such a pair will not be included in the list of maximal complete graphs. In other words, any subset of a maximal complete graph appears in the neighborhood list, and it will not appear as an independent maximal complete graph. This shows the correctness of the proposed algorithm.
Complexity: Assume there are $N$ points and $c$ cells in the grid, and assume that all points are uniformly distributed. Hence, on average there are $N/c$ points per cell. Also, assume each cell has $l$ neighbors. Then, to create the neighborhood list of one point, $l(N/c)$ points need to be examined to check whether they are within distance $d$. Since the total number of points is $N$, the cost is $O(N^2 l / c)$. Since $c \gg l$, this part of the algorithm can be assumed to be subquadratic. Second, for pruning the neighborhood lists, assume that the average length of each neighborhood list is $k$. Then, for each neighborhood list, $k$ other lists have to be examined to check whether a point is in the other neighborhood lists. Therefore, for each point, $k$ other neighborhood lists are examined and, within each one, up to $k$ points are checked. Consequently, the cost is $O(N k^2)$. Finally, the total cost is the cost to put the points in cells, $O(N)$, the cost to create the neighborhood lists, $O(N^2 l / c)$, and the cost to prune the lists, $O(N k^2)$. The total complexity of the algorithm is $O(N^2 l / c + N k^2 + N)$.
ID  Maximal Complete Graphs  Raw Maximal Complete Graphs  Complex Relationships 

1  {A3, B1, B2, B3}  {A, B, B, B}  {A, B, B+, -C} 
2  {B1, C1}  {B, C}  {-A, B, C} 
3  {A1, A2, B}  {A, A, B}  {A, A+, B, -C} 
II-F Extracting Complex Relationships
A relationship is called complex if it consists of complex types as defined in Section I.
Extracting a complex relationship $C$ from a maximal complete graph is straightforward: we simply use the following rules for every type $t$:

First, remove the object identifiers. This produces a “raw” maximal complete graph $R$.

If $R$ contains an object with type $t$, $C = C \cup \{t\}$.

If $R$ contains more than one object of type $t$, $C = C \cup \{t+\}$.

If $R$ does not contain an object of type $t$, $C = C \cup \{-t\}$.
Note that if $C$ includes a positive type $A+$, it will also always include the basic type $A$. This is necessary so that maximal complete graphs that contain $A+$ will also be counted as containing $A$ when we mine for interesting patterns.
Recall that the negative type only makes sense if we use maximal complete graphs. The last column of Table VI shows the result of applying all four rules.
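The four rules translate directly into code. The following sketch assumes that an object identifier is the type name followed by digits, that negative types are written with a "-" prefix, and that the full set of types is {A, B, C}; the function names are hypothetical.

```python
from collections import Counter


def object_type(obj):
    """Strip the numeric identifier, e.g. 'B12' -> 'B'."""
    return obj.rstrip("0123456789")


def complex_relationship(clique, all_types):
    """Apply the extraction rules to one maximal complete graph."""
    raw = Counter(object_type(o) for o in clique)  # raw maximal complete graph
    result = set()
    for t in all_types:
        if raw[t] >= 1:
            result.add(t)        # the type is present
        if raw[t] > 1:
            result.add(t + "+")  # positive type: more than one object
        if raw[t] == 0:
            result.add("-" + t)  # negative type: the type is absent
    return result


types = ["A", "B", "C"]  # assumed full set of object types
print(sorted(complex_relationship(["A3", "B1", "B2", "B3"], types)))
print(sorted(complex_relationship(["B1", "C1"], types)))
```

Passing the full list of types explicitly matters: a negative type can only be emitted for a type that the function knows about, even if no object of that type occurs in the graph being processed.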
III Experiments and Results Discussion
Experiments were carried out to evaluate the proposed algorithm on the SDSS data. All experiments were run on a laptop running Mac OS X 10.7 with a 2.53 GHz Intel Core Duo processor and 4 GB of main memory. The data structures and algorithm were implemented in Java and compiled with the GNU compiler.
III-A Scalability of GCG algorithm
Fig. 6 shows the runtime of the GCG algorithm for various numbers of objects (galaxies) and distances. It illustrates that the runtime increases slightly as the number of objects and the distance increase. The distance is increased by 1 Mpc at a time, whereas the number of objects is increased by 50K objects. The maximum number of records was 350,000. To explain further, when the distance increases, the grid cell size increases. Increasing the number of objects at the same time allows more objects to appear in the same grid cell or in the neighboring cells. Therefore, both factors (distance and number of objects) affect the runtime of the GCG algorithm.
III-B Galaxy types in large complete graphs
We applied the GCG algorithm to the “Main” galaxies extracted from SDSS to generate maximal complete graphs with a neighborhood distance of 4 Mpc. We selected the complete graphs with the largest cardinalities. Fig. 7 shows the distribution of “Early” and “Late” type galaxies in the reported complete graphs. These results show that large complete graphs consist of more “Early” type (elliptical) galaxies than “Late” type (spiral) galaxies. This conforms to the patterns given by [1], which say that “Early” type galaxies tend to stay away from “Late” type galaxies.
III-C Complete Graphs Cardinalities
Figure 8 shows the cardinalities of complete graphs in “Main” galaxies. It shows that small complete graphs, with cardinality between 2 and 5, are more frequent than large complete graphs.
IV Related Work
Huang et al. [14] defined the colocation pattern as the presence of a spatial feature in the neighborhood of instances of other spatial features. They developed an algorithm for mining valid rules in spatial databases using an Apriori based approach. Their algorithm does not separate the colocation mining and interesting pattern mining steps like our approach does. Also, they did not consider complex relationships or patterns.
Munro et al. [15] used cliques as colocation patterns (subgraphs), whereas in our research we used complete graphs instead. Similar to our approach, they separated the clique mining from the pattern mining stages. However, they did not use maximal complete graphs. They treated each clique as a transaction and used an Apriori-based technique for mining association rules. Since they used cliques (rather than maximal complete graphs) as their transactions, the counting of pattern instances is very different. They considered complex relationships within the pattern mining stage. However, their definition of negative patterns is very different: they used infrequent types, while we base our definition on the concept of absence in maximal complete graphs. They also used a different measure, namely maxPI.
Arunasalam et al. [4] used a similar approach to [15]. They proposed an algorithm called NP_maxPI, which also used the maxPI measure. The proposed algorithm prunes the candidate itemsets using a property of maxPI. They also used an Apriori-based technique to mine complex patterns. A primary goal of their work was to mine patterns which have low support and high confidence. As with the work of [15], they did not use maximal complete graphs.
Zhang et al. [16] enhanced the algorithm proposed in [14] and used it to mine special types of colocation relationships in addition to cliques, namely the spatial star and generic patterns. They therefore did not use maximal complete graphs.
To the best of our knowledge, most previous work has used Apriori-type algorithms for mining interesting colocation patterns. In contrast, we embedded GLIMIT [5] as the underlying pattern mining algorithm, as discussed in Section II-C. To the best of our knowledge, no previous work has used the concept of maximal complete graphs to mine complex colocation patterns in large spatial data.
V Conclusion
In this paper, we presented a framework that incorporates our proposed algorithm GCG to mine complex colocation patterns that exist in a large spatial dataset (SDSS). Most of the previous research conducted in this area used Apriori-type algorithms to mine only normal colocation patterns. In contrast, we showed the importance of using complex colocation patterns, which are extracted from maximal complete graphs. We also presented how our proposed algorithm efficiently extracts all maximal complete graphs in a large spatial dataset (SDSS) using a divide and conquer strategy. We have shown that mining maximal complete graphs is very important in our work, since complex patterns only make sense when using maximal complete graphs. Future work would be to extend this framework to extract interesting relationships using different types of spatial objects in the astronomy domain.
References
 [1] J. Gray, D. Slutz, A. S. Szalay, A. R. Thakar, J. vandenBerg, P. Z. Kunszt, and C. Stoughton, “Data mining the SDSS SkyServer database,” Microsoft Research, Tech. Rep. MSR-TR-2002-01, 2002.
 [2] S. Shekhar and S. Chawla, Spatial Databases: A Tour. Prentice Hall, 2003.
 [3] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in SIGMOD ’93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data. New York, NY, USA: ACM Press, 1993, pp. 207–216.
 [4] B. Arunasalam, S. Chawla, and P. Sun, “Striking two birds with one stone: Simultaneous mining of positive and negative spatial patterns,” in Proceedings of the Fifth SIAM International Conference on Data Mining, 2005, pp. 173–182.
 [5] F. Verhein and S. Chawla, “Geometrically inspired itemset mining,” in ICDM. IEEE Computer Society, 2006, pp. 655–666. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/ICDM.2006.75
 [6] Sloan Digital Sky Survey, “SDSS: Sloan Digital Sky Survey,” retrieved August 5, 2005, from http://cas.sdss.org/dr5/en/help/download/, 2006.
 [7] D. N. Spergel, M. Bolte, and W. Freedman, “The age of the universe,” Proceedings of the National Academy of Science, vol. 94, pp. 6579–6584, 1997.
 [8] H. M. and S. Churchman, “Hubble’s law,” retrieved March 12, 2005, from http://map.gsfc.nasa.gov/, 1999.
 [9] V. J. Martinez and E. Saar, Statistics of the Galaxy Distribution. Chapman and Hall/CRC, 2002.
 [10] S. Shekhar, Y. Huang, and H. Xiong, “Discovering spatial colocation patterns from spatial data sets: a general approach,” vol. 16, 2004, pp. 1472–1485.
 [11] F. Verhein and G. Al-Naymat, “Fast mining of complex spatial colocation patterns using GLIMIT,” in The 2007 International Workshop on Spatial and Spatiotemporal Data Mining (SSTDM) in cooperation with The 2007 IEEE International Conference on Data Mining (ICDM). Los Alamitos, CA, USA: IEEE Computer Society, 2007, pp. 679–684.
 [12] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of 20th International Conference on Very Large Data Bases VLDB. Morgan Kaufmann, 1994, pp. 487–499.
 [13] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in 2000 ACM SIGMOD Intl. Conference on Management of Data. ACM Press, May 2000, pp. 1–12. [Online]. Available: citeseer.ist.psu.edu/han99mining.html
 [14] Y. Huang, H. Xiong, S. Shekhar, and J. Pei, “Mining confident colocation rules without a support threshold,” in Proceedings of the 18th ACM Symposium on Applied Computing ACM SAC. ACM Press, New York, 2003.
 [15] R. Munro, S. Chawla, and P. Sun, “Complex spatial relationships,” in Proceedings of the 3rd IEEE International Conference on Data Mining, ICDM 2003. IEEE Computer Society, 2003, pp. 227–234.
 [16] X. Zhang, N. Mamoulis, D. W. Cheung, and Y. Shou, “Fast mining of spatial collocations,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM Press, New York, 2004, pp. 384–393.