GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data

Ghazi Al-Naymat College of Computer Science and Information Technology
University of Dammam, KSA
ghalnaymat@ud.edu.sa
Abstract

Recent research on pattern discovery has progressed from mining frequent patterns and sequences to mining structured patterns, such as trees and graphs. As a general data structure, graphs can model complex relations among data, with wide applications in web exploration and social networks. However, mining large graph patterns is challenging because of the large number of subgraphs. In this paper, we aim to mine only frequent complete graph patterns. A graph in a database is complete if every pair of distinct vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining algorithm that uses pruning techniques to extract maximal complete graphs from the large spatial dataset of the Sloan Digital Sky Survey (SDSS). Using a divide and conquer strategy, GCG shows high efficiency, especially in the presence of a large number of patterns. We describe how GCG can mine not only simple co-location spatial patterns but also complex ones. To the best of our knowledge, this is the first algorithm to exploit the extraction of maximal complete graphs in the process of mining complex co-location patterns in large spatial datasets.

I Introduction

With the rapid development of advanced technology, researchers have been collecting large amounts of data on a continuous or periodic basis in many fields. This data gives researchers the potential to discover useful information and knowledge that has not been seen before. To process this data and extract useful information, the data needs to be organised in a suitable format. Hence, data preparation plays a very important role in the data mining process.

The focus of this study is mainly on extracting interesting complex patterns from the Sloan Digital Sky Survey (SDSS) astronomy dataset [1]. The SDSS is the most ambitious astronomical survey project ever undertaken. The survey maps in detail one-quarter of the entire sky, determining the positions and absolute brightness of more than 100 million celestial objects. The first official Data Release (DR1) of SDSS was in June 2003. Since then there have been many new releases, including the ninth major release (DR9) in August 2012, which provides images, imaging catalogs, spectra, and redshifts. Release DR9 contains more than 5TB of data, which includes measurements of 500 million unique celestial objects.

The availability of such a large amount of useful data is an obvious opportunity to apply data mining techniques to extract interesting information. However, while much research has been done by astronomical researchers, little effort has been made to apply data mining techniques to SDSS data. This is because the SDSS data format is not suitable for mining purposes, which is the main motivation of this paper.

As mentioned in [2], spatial databases store spatial attributes about objects; hence, SDSS is a large spatial dataset, as it contains many attributes for each object. One of the most significant problems in spatial data mining is to find object types that frequently co-locate with each other in large databases. Co-location means that objects are found in the neighborhood of each other. The proposed approach mines co-location patterns in SDSS data and uses these patterns to generate interesting information about different types of galaxies. In this work only the galaxies existing in SDSS are used. However, the approach could be generalised to other celestial objects.

Data preparation plays a vital role in the mining process, and it is done in two stages. First, the galaxies in the SDSS data are extracted and categorised into “Early” and “Late” type galaxies. Second, our proposed algorithm GCG is used to generate co-location patterns (maximal complete graphs) from the data. A complete graph is any set of spatial objects such that all objects in the set co-locate. A maximal complete graph is a complete graph which is not a subset of any other complete graph. Fig. 1 depicts some examples of spatial co-locations; a line between two vertices (objects) indicates that they are co-located. The second column of Table I displays the maximal complete graph patterns that are presented in Fig. 1.

Fig. 1: Graphs in a plane as examples of the co-location patterns

The general problem of extracting maximal complete graphs from a graph is known to be NP-hard. GCG efficiently extracts maximal complete graphs in a given spatial database by dividing the space into a grid structure based on a predefined distance. A divide and conquer strategy is applied via the grid structure to reduce the search space.

ID | Maximal Complete Graphs | Transactions
1  | {C1,D1}                 | {C,D}
2  | {C5,D7}                 | {C,D}
3  | {B1,C5}                 | {B,C}
4  | {A4,D2,D5}              | {A,D+}
5  | {A5,C3,D3}              | {A,C,D}
6  | {A5,D3,D6}              | {A,D+}
7  | {A2,B2,D4}              | {A,B,D}
8  | {A1,A3,C2,C4,C6}        | {A+,C+}
TABLE I: Maximal Complete Graph patterns

In this paper we focus on using maximal complete graphs to allow us to mine interesting complex spatial relationships between the object types. A complex spatial relationship includes not only whether an object type, say A, is present in a (maximal) complete graph, but also:

  • Whether more than one object of its type is present in the maximal complete graph. This is called a positive type and is denoted by A+.

  • Whether objects of a particular type are not present in a maximal complete graph, that is, the absence of types. This is called a negative type and is denoted by -A.

The inclusion of positive and/or negative types makes a relationship complex. This allows us to mine patterns that say, for example, that A occurs with multiple B's but not with a C. That is, the presence of A may imply the presence of multiple B's and the absence of C. This is interesting in the astronomy domain. The last column of Table I shows examples of (maximal) complex relationships.

Maximal complete graphs generated by GCG can be represented as transactions, as given in Column 3 of Table I. These transactions can be used by any association rule mining technique. Association rule mining techniques such as those proposed in [3, 4] can generate useful rules, which are interpreted as relationships between objects.

We are not interested in maximal complex patterns (relationships) in themselves, as they provide only local information (that is, about a single maximal complete graph). We are, however, interested in sets of object types (including complex types) that appear across the entire dataset (that is, amongst many maximal complete graphs). In other words, we are interested in mining interesting complex spatial relationships (sets), where “interesting” is defined by a global measure. We use a variation of the minPI [10] measure to define interestingness.

I-A Problem Statement

Given the set of maximal complete graphs, find all interesting complex patterns that occur amongst the set of maximal complete graphs. More specifically, find all sets of object types, including positive and negative (that is, complex) types, that are interesting as defined by their minPI being above a threshold.

This problem therefore becomes an itemset mining task. To do this very quickly, we use an interesting itemset mining algorithm, namely GLIMIT [5].

Including negative types makes the problem much more difficult, as it is typical for spatial data to be sparse. This means that the absence of a type can be very common.

I-B Contributions

In this paper we make the following contributions:

  • An efficient algorithm called GCG is proposed to generate all maximal complete graph patterns that exist in large spatial datasets.

  • We introduce the concept of maximal complete graph. We demonstrate how the use of maximal complete graphs makes more sense than using complete graphs, and we show that they allow the use of negative patterns.

  • We show that complex and interesting co-location patterns can be efficiently extracted from huge and sparse spatial datasets.

The rest of the paper is organized as follows: Section II gives further details of our approach. Section III contains our experiments and an analysis of the results. Section IV puts our contributions in context of related work, then we conclude in Section V.

Fig. 2: Framework showing the complete mining process.

II Maximal Complete Graph Mining

Figure 2 shows the overall process flow of our method. The following subsections elaborate on our approach.

II-A Data Extraction and Categorisation

Raw data usually needs to be prepared to suit data mining algorithms. This section illustrates the method of extracting the important attributes from the SDSS database. These attributes are used to categorise galaxy objects. A view called SpecPhoto, which is derived from a table called SpecPhotoAll, is used. The latter is a joined table between the PhotoObjAll and SpecObjAll tables. In other words, SpecPhoto is a view of joined Spectro and PhotoObjects that have clean spectra (see http://www.sdss.org/).

The concern was to extract only the galaxy objects from the SDSS using the parameter (object type=0). The total number of galaxy-type objects stored in the SDSS catalog is more than 507,594. However, to ensure accuracy when calculating the distance between each object and the earth, which is then used to calculate the X, Y, and Z coordinates of each object, additional filtering parameters are used: one selecting the rigid objects and one selecting objects with a correct redshift. As a result, the number of objects is reduced to 442,923.

SDSS release 9 provides a table called Neighbors. This table contains all objects that are located within 0.5 arcmins of each other, which makes it unsuitable for this study because the distance that forms the neighborhood relation between objects cannot be chosen. For example, in our experiments distances in mega-parsecs (Mpc) are used as the thresholds to check whether objects are close to each other. Table II lists the fields extracted from the SDSS (DR9) that are used during the preparation process.

Data extraction: The data was obtained from SDSS (DR9) [6]. This data is extracted from the online catalog services using several SQL statements and tools, which are offered by the catalog. These tools are accessible from the SDSS site (http://cas.sdss.org/dr5/en/tools/search/sql.asp).

Data transformation: The data obtained from the previous step is transformed to identify the categories of the galaxies and to represent the data in the right format.

No  | Field name | Field description
1.  | specObjID  | Unique ID
2.  | z          | Final redshift
3.  | ra         | Right ascension
4.  | dec        | Declination
5.  | cx         | x of normal unit vector
6.  | cy         | y of normal unit vector
7.  | cz         | z of normal unit vector
8.  | primTarget | Prime target categories
9.  | objType    | Object type: Galaxy = 0
10. | modelMag_u | Ultraviolet magnitude
11. | modelMag_r | Red light magnitude
TABLE II: The SDSS schema

New attributes creation: With all the necessary fields available, this step calculates the exact values of the $X$, $Y$, and $Z$ coordinates, which are not explicitly given in the SDSS data. First, the distance $D$ between each object and the earth is calculated using Hubble's law and the value of the redshift $z$ of the object, as in Equation (1). Second, by taking the unit vector components $cx$, $cy$, and $cz$ and multiplying them by $D$, the values of the $X$, $Y$, and $Z$ coordinates are calculated by Equations (2), (3), and (4), respectively.

$$D \approx \frac{c \times z}{H_0} \qquad (1)$$

where $c$ is the speed of light, $z$ is the object's redshift, and $H_0$ is Hubble's constant. Currently the best estimate for this constant is 71 km s$^{-1}$ Mpc$^{-1}$ [7, 8].

$$X = D \times cx \qquad (2)$$
$$Y = D \times cy \qquad (3)$$
$$Z = D \times cz \qquad (4)$$
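The coordinate calculation can be illustrated with a minimal Python sketch. It assumes the usual low-redshift form of Hubble's law, distance = (speed of light × redshift) / Hubble constant, with the constant value of 71 quoted in the text taken in km/s/Mpc; the function names and the sample galaxy are hypothetical and are not part of the SDSS tooling.

```python
# Speed of light in km/s and the Hubble constant value quoted in the text.
SPEED_OF_LIGHT_KM_S = 299_792.458
HUBBLE_CONSTANT = 71.0  # assumed units: km/s/Mpc


def distance_mpc(z, h0=HUBBLE_CONSTANT):
    """Approximate distance in Mpc from redshift z (Hubble's law, small z)."""
    return SPEED_OF_LIGHT_KM_S * z / h0


def cartesian_mpc(z, cx, cy, cz, h0=HUBBLE_CONSTANT):
    """Scale the unit vector (cx, cy, cz) by the distance to get (X, Y, Z) in Mpc."""
    d = distance_mpc(z, h0)
    return (d * cx, d * cy, d * cz)


if __name__ == "__main__":
    # Hypothetical galaxy: redshift 0.1, unit vector pointing along the x axis.
    print(cartesian_mpc(0.1, 1.0, 0.0, 0.0))
```

Because the three coordinates are expressed in Mpc, a neighborhood threshold given in Mpc can be compared directly with Euclidean distances between objects.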

Galaxies Categorisation: Different parameters were used to categorise galaxy types. Based on the difference between the ultraviolet and red light magnitudes (modelMag_u and modelMag_r in Table II), galaxies are categorised as either “Early” or “Late”. If the difference is greater than or equal to a threshold value, the galaxy is “Early”; otherwise it is “Late”. The value of the r-band Petrosian magnitude indicates whether the galaxy is “Main” (close to the earth) or a “Luminous Red Galaxy” (LRG): if the r-band value satisfies the magnitude cut, the object is a “Main” galaxy; otherwise it is an “LRG” [9]. The four galaxy types found are Main-Late, Main-Early, LRG-Late, and LRG-Early.

II-B Basic Definitions and Concepts

This section briefly defines the concepts that are used in this paper.

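Consider a set of objects $O$ with fixed locations. Given an appropriate distance measure $\mathrm{dist}$, we can define a graph $G$ as follows: let $O$ be the vertices, and construct an edge between two objects $o_i \in O$ and $o_j \in O$ if $\mathrm{dist}(o_i, o_j) \le d$, where $d$ is a chosen distance. A co-location pattern is a connected subgraph.

Definition 1 (Complete Graph)

A Complete Graph $CG \subseteq O$ is any fully connected subgraph of $G$. That is, $\mathrm{dist}(o_i, o_j) \le d$ for every pair of distinct objects $o_i, o_j \in CG$.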

For example, in Fig. 1, {A4,D2,D5} form a complete graph as each object co-locates with each other. Similarly {C5,D7} form another complete graph.

As we have mentioned in Section I we use maximal complete graphs so that we can define and use complex patterns meaningfully and to avoid double counting.

Definition 2 (Maximal Complete Graph)

A maximal complete graph is a complete graph that is not a subset (sub-graph) of any other complete graph.

In Fig. 1, {A4,D2,D5} form a maximal complete graph as it is not a subset of another complete graph. However, {A2,D4} is not a maximal complete graph since it is a subset of the complete graph {A2,B2,D4}.

The mining of maximal complete graphs is done directly – it does not require mining all sub-complete graphs first.
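Definitions 1 and 2 can be checked mechanically on a small graph. The sketch below is only an illustration: the edge set is a hypothetical fragment chosen to agree with the sets named in the text ({A2,B2,D4} and {C5,D7}), and the helper names are not taken from the paper.

```python
from itertools import combinations

# Hypothetical co-location edges, consistent with the sets named in the text.
EDGES = {("A2", "B2"), ("A2", "D4"), ("B2", "D4"), ("C5", "D7")}


def build_adjacency(edges):
    """Turn an undirected edge set into a vertex -> neighbors mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def is_complete(vertices, adjacency):
    """True if every pair of distinct vertices is joined by an edge."""
    return all(v in adjacency.get(u, set()) for u, v in combinations(vertices, 2))


def is_maximal_complete(vertices, adjacency):
    """True if the set is complete and no outside vertex is adjacent to all members."""
    vertices = set(vertices)
    if not is_complete(vertices, adjacency):
        return False
    return not any(vertices <= adjacency[w] for w in adjacency if w not in vertices)


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print(is_maximal_complete({"A2", "D4"}, adjacency))        # subset of {A2,B2,D4}
    print(is_maximal_complete({"A2", "B2", "D4"}, adjacency))
```

The maximality test only looks for a single outside vertex that is adjacent to every member, which is enough because any larger complete graph containing the set must add at least one such vertex.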

Definition 3 (Complete Graph’s Cardinality)

It is the number of vertices in a complete graph, that is, $|CG|$. In other words, it is the size of the complete graph. This value can be used to find the total number of edges that the complete graph has, as shown in Equation (5):

$$E = \frac{n(n-1)}{2} \qquad (5)$$

where $n$ is the number of vertices. For example, a complete graph with $n = 5$ vertices, such as {A1,A3,C2,C4,C6} in Table I, has 10 edges.

II-C Mining Maximal Complete Graphs

First, after data preparation, a maximal complete graph mining algorithm extracts all maximal complete graphs and strips them of their object identifiers, producing raw maximal complete graphs as shown in Table VI. One pass is then made over the raw maximal complete graphs in order to extract complex relationships. We describe this in Section II-F. This produces complex maximal complete graphs. Each of these complex maximal complete graphs is then considered as a transaction, and an interesting itemset mining algorithm, using minPI as the interestingness measure, is used to extract the interesting complex relationships.

In itemset mining, the dataset consists of a set of transactions $T$, where each transaction $t \in T$ is a subset of a set of items $I$; that is, $t \subseteq I$. In our work, the set of complex maximal complete graphs (relationships) becomes the set of transactions (third column in Table I). The items are the object types, including the complex types such as A+ and -A. For example, if the object types are {A, B, C}, and each of these types is present and absent in at least one maximal complete graph, then $I$ = {A, A+, -A, B, B+, -B, C, C+, -C}. An interesting itemset mining algorithm mines $T$ for interesting itemsets. The support of an itemset $I' \subseteq I$ is the number of transactions containing the itemset: $\sigma(I') = |\{t \in T : I' \subseteq t\}|$. So-called frequent itemset mining uses the support as the measure of interestingness. For reasons described in Section I we use minPI [10] which, under the mapping described above, is equivalent to $\mathrm{minPI}(I') = \min_{i \in I'} \{\sigma(I') / \sigma(\{i\})\}$.

Since minPI is anti-monotonic, we can easily prune the search space for interesting patterns. We adopted the method used in GLIMIT ([5, 11]) to mine the interesting patterns from maximal complete graphs. GLIMIT is a very fast and efficient itemset mining algorithm that has been shown to outperform Apriori-like algorithms [12] and FP-Growth [13].
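To make the support and minPI measures concrete, the following minimal Python sketch evaluates them on the three complex relationships listed in the last column of Table VI. It assumes that minPI of an itemset is the minimum, over its items, of the itemset's support divided by the support of that single item; this is a plain brute-force evaluation for illustration and is not the GLIMIT algorithm.

```python
# Complex relationships from the last column of Table VI, used as transactions.
TRANSACTIONS = [
    {"A", "B", "B+", "-C"},
    {"-A", "B", "C"},
    {"A", "A+", "B", "-C"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def min_pi(itemset, transactions):
    """Assumed form: min over items i of support(itemset) / support({i})."""
    itemset = set(itemset)
    if not itemset:
        raise ValueError("itemset must not be empty")
    joint = support(itemset, transactions)
    ratios = []
    for item in itemset:
        single = support({item}, transactions)
        if single == 0:
            return 0.0  # an item that never occurs cannot participate
        ratios.append(joint / single)
    return min(ratios)


if __name__ == "__main__":
    print(support({"A", "B"}, TRANSACTIONS))
    print(min_pi({"A", "B"}, TRANSACTIONS))
    print(min_pi({"A", "-C"}, TRANSACTIONS))
```

On this toy data the value for {A, B, -C} does not exceed the value for its subset {A, -C}, which is the anti-monotonic behaviour that allows the search space to be pruned.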

As shown in Fig. 2, the complete graph generation and complex relationship extraction are local procedures, in the sense that they deal only with individual maximal complete graphs. In contrast, the interesting pattern mining is global – it finds patterns that occur across the entire space. Secondly, we consider subsets of maximal complete graphs only in the last step – after the complex patterns have been extracted.

Object type | X-Coordinate | Y-Coordinate
A1          | 2.5          | 4.5
A2          | 6            | 4
A3          | 2            | 9
B1          | 1.5          | 3.5
B2          | 5            | 3
B3          | 5            | 4
C1          | 2.5          | 3
C2          | 6            | 3
D1          | 3            | 9
D2          | 7            | 1.5
TABLE III: Example: Dataset of two dimensions.
Input: a set of points and a distance threshold
Output: a list of maximal complete graph patterns.
{Generating grid structure.}
1:  
2:  
3:  for all  do
4:     Get the coordinates of each point
5:     Generate the composite key (GridKey=()).
6:     if  then
7:         
8:     else
9:          new GridKey
10:         
11:     end if
12:  end for
{Obtaining the neighborhood lists.}
13:  for all  do
14:     
15:      (the 27 neighbor cells of )
16:     
17:     if  then
18:         for all  do
19:            if EucDist  then
20:                ( are neighbors)
21:            end if
22:         end for
23:     end if
24:     
25:  end for
{Pruning neighborhood list if at least one of its items violates the maximal Complete Graph definition.}
26:  
27:  
28:  for all  do
29:     
30:     for all  do
31:         for all  do
32:            if  then
33:                ( are neighbors)
34:            end if
35:         end for
36:     end for
37:     
38:  end for
Algorithm 1 Grid Complete Graph algorithm.

II-D GCG algorithm

Algorithm 1 gives the pseudocode of the GCG algorithm. This section shows how the algorithm works through an example. Assuming that all objects are spatial, we use Fig. 3 to depict some example items and their locations. These objects and their coordinates are given in Table III. Note that SDSS is a three-dimensional dataset, but the example uses two dimensions for the sake of simplicity.

Fig. 3: Spatial objects in a 2D (x and y coordinates) grid structure.
List | Members
1    | {A1, B1, C1}
2*   | {B1, A1, C1}
3*   | {C1, A1, B1}
4    | {D1, A3}
5    | {A2, B2, C2, B3}
6*   | {B2, A2, B3, C2}
7**  | {C2, A2, B2, B3, D2}
8    | {D2, C2}
9*   | {A3, D1}
10*  | {B3, C2, B2, A2}
TABLE IV: Neighbor lists.

List | Members
1    | {A1,B1,C1}
4    | {D1,A3}
5    | {A2,B2,C2,B3}
8    | {D2,C2}
TABLE V: Neighbor lists after the pruning step.

Edges in each subgraph are formed by calculating the distance between adjacent objects. In other words, if the distance between two objects is within the given threshold, an edge is created. Each subgraph, in this context, forms a co-location pattern. Therefore, the results of this algorithm are patterns containing objects that are co-located. The functionality of the GCG algorithm (Algorithm 1) is described as follows:

  1. Lines 1 - 12: Dividing the space into a grid structure and concurrently placing each point into its particular grid cell based on its coordinates (Fig. 3). The size of each grid cell is $d \times d$, where $d$ is the distance threshold. The value of $d$ is given as one of the inputs for the GCG algorithm.

  2. Lines 13 - 25: Finding each object’s neighborhood list. This is the most important step, and it dominates the complexity of the algorithm. It uses the Euclidean distance to check the neighborhood relationship between objects. However, the number of spatial objects checked depends on the density of the grid and the content of the neighbor cells (the number of neighbor cells is 9 or 27 if the data is 2D or 3D, respectively). In the example in Fig. 3, the sample data contains 10 objects, and a list is created for each object except for isolated ones. Our concern is to find co-location patterns that have at least 2 members, because one object does not form any type of relationship. Consequently, there is no need to count objects that do not have a connection (i.e. a relationship) with at least one other object. However, in our example all objects share relationships. For example, object {A1} has a relationship with {B1,C1} and object {A2} with {B2, B3, C2}. These objects share the same location: {A1, B1, C1} are co-located because the distance between each pair of them is within the threshold. Table IV shows some redundant lists, marked by * (the same objects in a different order); such a list can be pruned without losing its objects, as they are present in another list.

  3. Lines 26 - 38: Pruning any neighbor list that contains at least one object violating the co-location condition. For example, list 7 is pruned because two of its members {A2,D2} are not close to each other as given in Table IV (lists marked by **).

As a result of the previous steps, a list of maximal complete graphs is formed. For example, {A1, B1, C1} forms a maximal complete graph, and likewise for lists 4, 5, and 8, as shown in Table V.
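The three steps can be sketched in Python on the data of Table III. This is a minimal illustration under stated assumptions, not the author's implementation: the threshold value 2 is an assumption (the text does not state it, but it reproduces the neighbor lists of Table IV), the function names are hypothetical, and the pruning stage is replaced by a standard Bron-Kerbosch enumeration over the neighbor lists rather than the list-pruning procedure of Algorithm 1.

```python
import math
from itertools import product

# Example objects from Table III (2D coordinates).
POINTS = {
    "A1": (2.5, 4.5), "A2": (6, 4), "A3": (2, 9),
    "B1": (1.5, 3.5), "B2": (5, 3), "B3": (5, 4),
    "C1": (2.5, 3), "C2": (6, 3),
    "D1": (3, 9), "D2": (7, 1.5),
}


def grid_key(coords, d):
    """Composite key: index of the d-sized cell along each axis."""
    return tuple(math.floor(c / d) for c in coords)


def neighbor_lists(points, d):
    """Map each object to the set of objects within distance d of it."""
    grid = {}
    for name, coords in points.items():
        grid.setdefault(grid_key(coords, d), []).append(name)

    neighbors = {name: set() for name in points}
    for name, coords in points.items():
        key = grid_key(coords, d)
        # Scan the object's own cell and the adjacent cells (9 in 2D, 27 in 3D).
        for offset in product((-1, 0, 1), repeat=len(key)):
            cell = tuple(k + o for k, o in zip(key, offset))
            for other in grid.get(cell, ()):
                if other != name and math.dist(coords, points[other]) <= d:
                    neighbors[name].add(other)
    return neighbors


def maximal_complete_graphs(points, d):
    """Enumerate maximal complete graphs with at least two members."""
    neighbors = neighbor_lists(points, d)
    found = []

    def expand(current, candidates, excluded):
        # Bron-Kerbosch recursion without pivoting (swapped-in technique).
        if not candidates and not excluded:
            if len(current) >= 2:
                found.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(points), set())
    return found


if __name__ == "__main__":
    for graph in maximal_complete_graphs(POINTS, 2):
        print(sorted(graph))
```

With a cell size equal to the threshold, two objects within the threshold of each other can differ by at most one cell index per axis, so scanning only the adjacent cells does not miss any neighbor. On this data the sketch yields the four sets of Table V.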

II-E GCG algorithm analysis

This section discusses the GCG algorithm completeness, correctness, and complexity.

Completeness: All objects in the neighbor lists appear as a set or subset in the maximal complete graph lists. After acquiring all the neighbors of each point, another check among these neighbors is done to ensure that all points are neighbors of each other. Doing so naturally produces repeated neighbor lists, which ensures that all maximal complete graphs in any given graph are found.

Fig. 4: Example of two maximal complete graphs used to show the correctness of the proposed algorithm.

Correctness: Every subset of a maximal complete graph appears in the neighbors list. Thus, no maximal complete graph that appears in the maximal complete graphs’ list will be found as a subset of another maximal complete graph, which is the definition of a maximal complete graph. Fig. 4 displays an undirected graph together with its neighborhood list and the maximal complete graph patterns it contains. A pair of objects whose distance exceeds the threshold does not appear in the neighborhood list, because there is no edge between them. As a result, such a pair will not be included in the maximal complete graphs’ list. In other words, any subset of any maximal complete graph appears in the neighborhood list, and it will not appear as an independent maximal complete graph. This shows the correctness of the proposed algorithm.

Complexity: Assume there are $N$ points and $c$ cells in a grid, and assume that all points are uniformly distributed. Hence, on average there are $N/c$ points per cell. Also, assume each cell has $l$ neighbors. Then, to create the neighborhood list of one point, $l(N/c)$ points need to be examined to check whether they are within distance $d$. Since the total number of points is $N$, the cost is $O(N^2 l/c)$. And since $c \gg l$, this part of the algorithm can be assumed to be sub-quadratic. Second, consider pruning the neighborhood lists, assuming that on average the length of each neighborhood list is $k$. For each neighborhood list, $k$ other lists have to be examined to check whether a point is in the others' neighborhood lists. Therefore, for each point, $k$ other neighborhood lists are examined and, within each one, up to $k$ points are checked. Consequently, the cost is $O(Nk^2)$. Finally, the total cost is the cost to put the points in cells, $O(N)$, the cost to create the neighborhood lists, $O(N^2 l/c)$, and the cost to prune the lists, $O(Nk^2)$. The total complexity of the algorithm is $O(N(Nl/c + k^2 + 1))$.

Fig. 5: Complete graph example used in explaining the process of extracting the complex relationships.
ID | Maximal Complete Graphs | Raw Maximal Complete Graphs | Complex Relationships
1  | {A3, B1, B2, B3}        | {A, B, B, B}                | {A, B, B+, -C}
2  | {B1, C1}                | {B, C}                      | {-A, B, C}
3  | {A1, A2, B}             | {A, A, B}                   | {A, A+, B, -C}
TABLE VI: Representing maximal complete graphs of Fig. 5 as complex relationships

II-F Extracting Complex Relationships

A relationship is called complex if it consists of complex types as defined in Section I.

Extracting a complex relationship $CR$ from a maximal complete graph is straightforward. We simply use the following rules for every type A:

  1. First, remove the object identifiers. This produces a “raw” maximal complete graph $R$.

  2. If $R$ contains an object with type A, $CR = CR \cup \{A\}$.

  3. If $R$ contains more than one object of type A, $CR = CR \cup \{A+\}$.

  4. If $R$ does not contain an object of type A, $CR = CR \cup \{-A\}$.

Note that if a complex relationship includes a positive type A+, it will also always include the basic type A. This is necessary so that maximal complete graphs that contain A+ will also be counted as containing A when we mine for interesting patterns.

Recall that the negative type only makes sense if we use maximal complete graphs. The last column of Table VI shows the result of applying all four rules.
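The four rules can be sketched in a few lines of Python and checked against Table VI. The sketch assumes that an object identifier is a type name followed by optional digits (as in A3 or B1); the function names are hypothetical.

```python
import re
from collections import Counter


def object_type(object_id):
    """Rule 1: strip the trailing digits of an identifier, e.g. 'B12' -> 'B'."""
    return re.sub(r"\d+$", "", object_id)


def complex_relationship(maximal_complete_graph, all_types):
    """Apply rules 2 to 4 for every type in all_types."""
    counts = Counter(object_type(o) for o in maximal_complete_graph)
    relationship = set()
    for t in all_types:
        if counts[t] >= 1:
            relationship.add(t)        # rule 2: the type is present
        if counts[t] > 1:
            relationship.add(t + "+")  # rule 3: positive type
        if counts[t] == 0:
            relationship.add("-" + t)  # rule 4: negative type
    return relationship


if __name__ == "__main__":
    types = ["A", "B", "C"]
    for graph in ({"A3", "B1", "B2", "B3"}, {"B1", "C1"}, {"A1", "A2", "B"}):
        print(sorted(complex_relationship(graph, types)))
```

The full list of types has to be supplied explicitly, because a negative type can only be emitted for a type that is known to exist elsewhere in the dataset.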

Fig. 6: GCG’s runtime using 5 different distances.
(a) Number of “Main-Late” galaxies in complete graphs
(b) Number of “Main-Early” galaxies in complete graphs
Fig. 7: The existence of galaxies in the universe.

III Experiments and Results Discussion

Experiments were carried out to evaluate the proposed algorithm on the SDSS data. All experiments were carried out on a laptop running Mac OS X 10.7 with a 2.53 GHz Intel Core Duo processor and 4 GB of main memory. The data structures and algorithm were implemented in Java and compiled with the GNU compiler.

III-A Scalability of GCG algorithm

Fig. 6 shows the runtime of the GCG algorithm with various numbers of objects (galaxies) and distances. It illustrates that the runtime increases slightly as the number of objects and the distance increase. The distance is increased by 1 Mpc every time, whereas the number of objects is increased by 50K objects. The maximum number of records was 350,000. To explain further, when the distance increases, the grid cell size increases. Increasing the number of objects at the same time allows more objects to appear in the same grid cell or in the neighboring cells. Therefore, the two factors (distance and number of objects) affect the runtime of the GCG algorithm.

III-B Galaxy types in large complete graphs

We applied the GCG algorithm to the “Main” galaxies extracted from SDSS to generate maximal complete graphs with a neighborhood distance of 4 Mpc. We selected the complete graphs with the largest cardinality. Fig. 7 shows the distribution of “Early” and “Late” type galaxies in the reported complete graphs. These results show that large complete graphs consist of more “Early” type galaxies (elliptical) than “Late” type galaxies (spiral). This conforms to the patterns given by [1], which say that “Early” type galaxies tend to stay away from “Late” type galaxies.

III-C Complete Graphs Cardinalities

Figure 8 shows the cardinalities of the complete graphs in “Main” galaxies. It shows that small complete graphs, with cardinality between 2 and 5, are more frequent than large complete graphs.

Fig. 8: Complete Graphs cardinalities for Main galaxies using threshold = 4 Mpc.

IV Related Work

Huang et al. [14] defined the co-location pattern as the presence of a spatial feature in the neighborhood of instances of other spatial features. They developed an algorithm for mining valid rules in spatial databases using an Apriori based approach. Their algorithm does not separate the co-location mining and interesting pattern mining steps like our approach does. Also, they did not consider complex relationships or patterns.

Munro et al. [15] used cliques as co-location patterns (subgraphs), but in our research we used complete graphs instead. Similar to our approach, they separated the clique mining from the pattern mining stages. However, they did not use maximal complete graphs. They treated each clique as a transaction and used an Apriori based technique for mining association rules. Since they used cliques (rather than maximal complete graphs) as their transactions, the counting of pattern instances is very different. They considered complex relationships within the pattern mining stage. However, their definition of negative patterns is very different: they used infrequent types, while we base our definition on the concept of absence in maximal complete graphs. They also used a different measure, namely maxPI.

Arunasalam et al. [4] used a similar approach to [15]. They proposed an algorithm called NP_maxPI, which also used the maxPI measure. The proposed algorithm prunes the candidate itemsets using a property of maxPI. They also used an Apriori based technique to mine complex patterns. A primary goal of their work was to mine patterns which have low support and high confidence. As with the work of [15], they did not use maximal complete graphs.

Zhang et al. [16] enhanced the algorithm proposed in [14] and used it to mine special types of co-location relationships in addition to cliques, namely the spatial star and generic patterns. They did not use maximal complete graphs.

To the best of our knowledge, most previous work has used Apriori type algorithms for mining interesting co-location patterns. In contrast, we embedded GLIMIT [5] as the underlying pattern mining algorithm, as discussed in Section II-C. To the best of our knowledge, no previous work has used the concept of maximal complete graphs to mine complex co-location patterns in large spatial data.

V Conclusion

In this paper, we presented a framework that incorporates our proposed algorithm, GCG, to mine complex co-location patterns that exist in a large spatial dataset (SDSS). Most of the previous research conducted in this area used Apriori type algorithms to mine only normal co-location patterns. However, we showed the importance of using complex co-location patterns, which are extracted from maximal complete graphs. We also presented how our proposed algorithm efficiently extracts all maximal complete graphs in a large spatial dataset (SDSS) using a divide and conquer strategy. We have shown that the idea of mining maximal complete graphs is very important in our work, since complex patterns only make sense when using maximal complete graphs. Future work would be to extend this framework to extract interesting relationships using different types of spatial objects in the astronomy domain.

References

  • [1] J. Gray, D. Slutz, A. S. Szalay, A. R. Thakar, J. vandenBerg, P. Z. Kunszt, and C. Stoughton, “Data mining the sdss skyserver database,” Microsoft Research, Tech. Rep. MSR-TR-2002-01, 2002.
  • [2] S. Shekhar and S. Chawla, Spatial Databases: A Tour.   Prentice Hall, 2003.
  • [3] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in SIGMOD ’93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data.   New York, NY, USA: ACM Press, 1993, pp. 207–216.
  • [4] B. Arunasalam, S. Chawla, and P. Sun, “Striking two birds with one stone: Simultaneous mining of positive and negative spatial patterns,” in Proceedings of the Fifth SIAM International Conference on Data Mining, 2005, pp. 173–182.
  • [5] F. Verhein and S. Chawla, “Geometrically inspired itemset mining,” in ICDM.   IEEE Computer Society, 2006, pp. 655–666. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/ICDM.2006.75
  • [6] S. D. S. Survey, “Sdss - sloan digital sky survey. retrieved august 5, 2005 from http://cas.sdss.org/dr5/en/help/download/,” 2006.
  • [7] D. N. Spergel, M. Bolte, and W. Freedman, “The age of the universe,” Proceedings of the National Academy of Science, vol. 94, pp. 6579–6584, 1997.
  • [8] H. M. and S. Churchman, “Hubble’s law. retrieved march 12, 2005, from http://map.gsfc.nasa.gov/,” 1999.
  • [9] V. J. Martínez and E. Saar, Statistics of the Galaxy Distribution.   Chapman and Hall/CRC, 2002.
  • [10] S. Shekhar, Y. Huang, and H. Xiong, “Discovering spatial co-location patterns from spatial data sets:a general approach,” vol. 16, 2004, pp. 1472–1485.
  • [11] F. Verhein and G. Al-Naymat, “Fast mining of complex spatial co-location patterns using glimit,” in The 2007 International Workshop on Spatial and Spatio-temporal Data Mining (SSTDM) in cooperation with The 2007 IEEE International Conference on Data Mining (ICDM).   Los Alamitos, CA, USA: IEEE Computer Society, 2007, pp. 679–684.
  • [12] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of 20th International Conference on Very Large Data Bases VLDB.   Morgan Kaufmann, 1994, pp. 487–499.
  • [13] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in 2000 ACM SIGMOD Intl. Conference on Management of Data.   ACM Press, May 2000, pp. 1–12. [Online]. Available: citeseer.ist.psu.edu/han99mining.html
  • [14] Y. Huang, H. Xiong, S. Shekhar, and J. Pei, “Mining confident co-location rules without a support threshold,” in Proceedings of the 18th ACM Symposium on Applied Computing ACM SAC.   ACM Press, New York, 2003.
  • [15] R. Munro, S. Chawla, and P. Sun, “Complex spatial relationships,” in Proceedings of the 3rd IEEE International Conference on Data Mining, ICDM 2003.   IEEE Computer Society, 2003, pp. 227–234.
  • [16] X. Zhang, N. Mamoulis, D. W. Cheung, and Y. Shou, “Fast mining of spatial collocations,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM Press-New York, 2004, pp. 384 – 393.