A Formal Definition for Configuration
There exists a wide set of techniques to perform keyword-based search over relational databases, but all of them match the keywords in the users’ queries to elements of the databases to be queried as a first step. This matching process is complex and time-consuming, so improving its performance is key to improving keyword-based search over relational data sources.
In this work, we show how to model the matching process in keyword-based search over relational databases by means of the symmetric group, and we explain in detail how this approach reduces the search space.
Configuration, Permutation, Symmetric group, -module, Relational database, Keyword-based search, Keyword-based queries.
University of Zaragoza, Spain
In the last decade, the amount of structured digital data available on the Web has been increasing thanks to the success of initiatives such as DBpedia, Linked Open Data and the Semantic Web. Moreover, keyword-based search has become the de facto standard for searching information on the Web since its adoption by the main web search engines, such as Google, at the end of the 1990s. The reasons for its success are mainly its simplicity and intuitiveness: it does not require users to know either a formal query language, such as SQL or SPARQL, or how the data and documents are stored.
This context has increased interest in supporting keyword search over structured databases and, in particular, over relational databases. There exists a wide set of techniques and methods to perform keyword-based search over relational databases, classified into two main groups: graph-based approaches and schema-based approaches. Nevertheless, all of them require matching the keywords in the users’ queries to elements of the databases to be queried as a first step; that is, they must explore what Keymantic calls the configurations of the users’ queries. Exploring the possible configurations with specific algorithms, such as the Hungarian algorithm, is therefore required to optimize the search.
In this paper, we propose a formal definition for configurations to improve the performance of current techniques such as Keymantic. In particular, we define configurations as elements of a given type in the symmetric group, in order to establish equivalence relations among different configurations and, therefore, to reduce the number of configurations to evaluate.
Moreover, since there is a natural one-to-one correspondence between the conjugacy classes of the symmetric group and the partitions of its degree, and since partitions can be ordered, it is possible to order configurations.
In the following, the basis for formalizing keyword-based search over relational databases is described.
2.1 Keyword Search over Relational Databases
A keyword query is an ordered list of keywords. Each keyword is a specification of an element of interest.
We make the natural assumptions that each keyword can be mapped to only one database term, that no two keywords can be mapped to the same database term, and that there are no unjustified keywords.
We must map the keywords in a query to the database terms in the vocabulary of , so there are
with denoting the arity of the relation and the number of tables in the database.
To give a formal definition for a configuration we start by considering the keyword query and the vocabulary of the database as sets of numbers , respectively, where . The definition of a configuration can be interpreted as an injective correspondence
that can be extended to a one to one correspondence
that is, we define it as an element of the symmetric group .
2.2 The Symmetric Group
To obtain the partition associated with a permutation, we list the length of each of its cycles, repeating a length once for every cycle that has it, starting with the biggest and decreasing. This is another way to give the cycle type.
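The passage from a permutation to its partition can be sketched in a few lines of Python. This is only an illustration: the function names `cycles`, `partition_of` and `cycle_type`, and the encoding of a permutation as a dictionary on 1..n, are assumptions made for the example and do not come from the text.

```python
from collections import Counter


def cycles(perm):
    """Return the cycles of a permutation given as a dict {i: perm(i)} on 1..n."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        result.append(tuple(cycle))
    return result


def partition_of(perm):
    """Cycle lengths listed in weakly decreasing order (the partition)."""
    return tuple(sorted((len(c) for c in cycles(perm)), reverse=True))


def cycle_type(perm):
    """Map each cycle length to the number of cycles of that length."""
    return dict(Counter(len(c) for c in cycles(perm)))


# Hypothetical example in the symmetric group on 5 letters: (1 2 3)(4)(5).
sigma = {1: 2, 2: 3, 3: 1, 4: 4, 5: 5}
print(partition_of(sigma))  # (3, 1, 1)
print(cycle_type(sigma))    # {3: 1, 1: 2}
```

Both outputs carry the same information: one cycle of length 3 and two fixed points.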
We are interested in permutations in that have only cycles of length and with no more than a number bigger than in each cycle, therefore permutations in with .
Conjugacy is an equivalence relation in . Two permutations are in the same conjugacy class if and only if they have the same cycle type and, using for when has type (see  for more details):
Thus there is a natural one to one correspondence between partitions of and conjugacy classes of .
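The correspondence can be checked by brute force on a small case. The sketch below groups all permutations of four letters by partition and compares each class size with n!/z, where z is the product over cycle lengths i of i^{m_i} m_i! and m_i is the number of cycles of length i. That expression is the standard centralizer-size formula from the representation-theory literature and is assumed here; the choice n = 4 and all function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def partition_of(images):
    """Decreasing cycle lengths of a permutation of 0..n-1 given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(images)):
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            length += 1
            x = images[x]
        if length:
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def centralizer_size(partition):
    """z = product of i**m_i * m_i!, with m_i the number of parts equal to i (assumed formula)."""
    z = 1
    for part, mult in Counter(partition).items():
        z *= part ** mult * factorial(mult)
    return z


n = 4  # small illustrative degree
class_sizes = Counter(partition_of(p) for p in permutations(range(n)))
for lam, size in sorted(class_sizes.items(), reverse=True):
    # Every class size agrees with n! / z for its partition.
    assert size == factorial(n) // centralizer_size(lam)
    print(lam, size)
```

For n = 4 this finds five conjugacy classes, one for each of the five partitions of 4.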
3 Configurations as permutations
Let be a configuration of a keyword query on a database with a vocabulary given by
we identify it with given by:
for all ,
for each with , we call and define in order to obtain the smallest cycle containing . For each not obtained previously, we define .
For each such that there is no with , if , , , are all but , we call and define obtaining a cycle of length in which only one value is bigger than .
For each such that there is no with , we define obtaining a fixed point. All these points will be fixed points so, if is big enough , we will have at least fixed points.
Indeed, if in cycle notation and it has fixed points, the cycle type of is , where and .
Since each cycle of , , contains no more than one element of value bigger than and we have elements bigger than , has at least cycles. Thus .
Consequently, since , we have .
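One possible reading of this construction is sketched below in Python. It assumes that a configuration is an injective map from the keyword positions 1..k into the database terms 1..n; the symbols `k` and `n`, the dictionary encoding and the function names are assumptions made for the example. Each chain of images that leaves 1..k is closed into a cycle by sending its last element back to the start of the chain, and every term above k that is not used becomes a fixed point.

```python
def extend_to_permutation(config, n):
    """Extend an injective map config: {1..k} -> {1..n} to a permutation of {1..n}.

    A chain i0 -> config(i0) -> ... -> j that leaves {1..k} is closed by
    sending j back to i0; unused values above k become fixed points.
    """
    k = len(config)
    assert set(config) == set(range(1, k + 1)), "domain must be 1..k"
    assert len(set(config.values())) == k, "config must be injective"
    assert all(1 <= v <= n for v in config.values()), "images must lie in 1..n"
    sigma = dict(config)
    preimage = {v: i for i, v in config.items()}
    for j in range(k + 1, n + 1):
        if j not in preimage:
            sigma[j] = j          # unused database term: fixed point
            continue
        start = j
        while start in preimage:  # walk the chain back to its first element
            start = preimage[start]
        sigma[j] = start          # close the cycle
    return sigma


def cycle_lengths(sigma):
    """Decreasing list of cycle lengths of a permutation given as a dict."""
    seen, lengths = set(), []
    for start in sorted(sigma):
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            length += 1
            x = sigma[x]
        if length:
            lengths.append(length)
    return sorted(lengths, reverse=True)


# Hypothetical configuration: 3 keywords mapped into 6 database terms.
config = {1: 2, 2: 5, 3: 3}
sigma = extend_to_permutation(config, 6)
print(sigma)                 # {1: 2, 2: 5, 3: 3, 4: 4, 5: 1, 6: 6}
print(cycle_lengths(sigma))  # [3, 1, 1, 1]
```

In this example the permutation is (1 2 5)(3)(4)(6): each cycle contains at most one value above k = 3, and there are four cycles, which is at least n - k = 3.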
Conjugacy in the symmetric group thus gives an equivalence relation between configurations. In addition, this definition makes it possible to order configurations through the orders that can be defined on partitions. These are two important reasons why describing configurations as elements of the symmetric group opens up a way to explain top-k algorithms as combinatorial algorithms.
4 Matrix Representations of a Group
To establish the orders needed to explain top-k algorithms as combinatorial algorithms, we need some preliminary results about group representations, which are explained in this section.
4.1 Matrix Representations and -Modules
A matrix representation can be thought of as a way to model an abstract group with a concrete group of matrices.
Let be a group and be a vector space over the complex numbers of finite dimension. Let stand for the set of all invertible linear transformations of to itself, called the general linear group of . If , then and are isomorphic as groups.
Let be a group of finite order . We denote by the algebra of over ; this algebra has a basis indexed by elements of and most of the time we identify this basis with . Each element in can be uniquely written in the form
and multiplication in extends that in .
Let be a -vector space and let be a linear representation of in . For and set
By linearity this defines for and . Thus, is endowed with the structure of a left -module. Conversely, such a structure defines a linear representation of in .
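A concrete instance may help. The sketch below takes the symmetric group on three letters, represents each permutation by its permutation matrix, checks the homomorphism property, and then extends the action by linearity to formal linear combinations of group elements. The permutation representation is chosen only as an illustration, and all function names are assumptions made for the example.

```python
from itertools import permutations


def perm_matrix(p):
    """Matrix of the permutation p (tuple of images of 0..n-1): column j has its 1 in row p[j]."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


def compose(p, q):
    """Composition (p o q)(x) = p(q(x))."""
    return tuple(p[q[x]] for x in range(len(p)))


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def act(p, v):
    """Action of a group element on a vector: p . v = rho(p) v."""
    m = perm_matrix(p)
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]


def act_algebra(coeffs, v):
    """Linear extension: (sum_g c_g g) . v = sum_g c_g (g . v)."""
    result = [0] * len(v)
    for g, c in coeffs.items():
        result = [r + c * x for r, x in zip(result, act(g, v))]
    return result


group = list(permutations(range(3)))
# rho is a homomorphism: rho(p o q) = rho(p) rho(q) for all p, q.
homomorphism = all(
    perm_matrix(compose(p, q)) == matmul(perm_matrix(p), perm_matrix(q))
    for p in group
    for q in group
)
print(homomorphism)  # True

identity, three_cycle = (0, 1, 2), (1, 2, 0)
e0 = [1, 0, 0]
print(act(three_cycle, e0))                              # [0, 1, 0]
print(act_algebra({identity: 2, three_cycle: 3}, e0))    # [2, 3, 0]
```

The last line applies the group-algebra element 2·id + 3·(three-cycle) to a basis vector, which is exactly the extension by linearity described above.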
An idea pervading all of science is that large structures can be understood by breaking them up into their smallest pieces. The same thing is true in representation theory. Some representations are built out of smaller ones, whereas others are indivisible. This is the distinction between reducible and irreducible representations.
4.2 Tableaux and Tabloids
We need some facts about irreducible representations of the symmetric group. We know that the number of such representations is equal to the number of conjugacy classes, that is the number of partitions of .
It may not be obvious how to associate an irreducible representation with each partition, but it is easy to find a corresponding subgroup that is an isomorphic copy of
inside . So, we can produce the right number of representations by including the trivial representation on each up to .
If is a module for the last representation, it is not irreducible. However, we will be able to find an ordering , , of all partitions of with nice properties.
To build the modules first we need:
There are Young tableaux for any shape .
If , then the number of tableaux in any given equivalence class is
Thus the number of -tabloids is just
Let stand for the entry of a -tableau in position . Now acts on a tableau of shape as follows:
This induces an action on tabloids by letting
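These counts and the induced action can be checked on a small shape. The sketch below assumes the standard facts from the literature on Young tableaux: a shape with n boxes has n! tableaux, each row-equivalence class contains the product of the factorials of the row lengths, and a permutation acts on a tableau entry by entry. The shape (2, 1), the permutation `pi` and the function names are illustrative choices.

```python
from itertools import permutations
from math import factorial, prod


def tableaux(shape):
    """Yield every filling of the given shape with 1..n, as a tuple of rows."""
    n = sum(shape)
    for filling in permutations(range(1, n + 1)):
        rows, pos = [], 0
        for length in shape:
            rows.append(filling[pos:pos + length])
            pos += length
        yield tuple(rows)


def tabloid(t):
    """Row-equivalence class of a tableau: the order inside each row is forgotten."""
    return tuple(frozenset(row) for row in t)


def act(pi, t):
    """Apply the permutation pi (a dict) to every entry of the tableau t."""
    return tuple(tuple(pi[x] for x in row) for row in t)


shape = (2, 1)
n = sum(shape)
all_tableaux = list(tableaux(shape))
all_tabloids = {tabloid(t) for t in all_tableaux}
class_size = prod(factorial(part) for part in shape)

print(len(all_tableaux))  # 6
print(class_size)         # 2
print(len(all_tabloids))  # 3

# The action on tabloids is well defined: row-equivalent tableaux
# are sent to row-equivalent tableaux.
pi = {1: 2, 2: 3, 3: 1}
well_defined = all(
    tabloid(act(pi, s)) == tabloid(act(pi, t))
    for s in all_tableaux
    for t in all_tableaux
    if tabloid(s) == tabloid(t)
)
print(well_defined)       # True
```

For the shape (2, 1) there are 3! = 6 tableaux, 2!·1! = 2 tableaux in each class, and therefore 6 / 2 = 3 tabloids.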
4.3 Dominance and Lexicographic ordering
We consider two important orderings on partitions of .
The dominance order is partial and the lexicographic is total.
The lexicographic order is a refinement of the dominance order in this sense:
Intuitively, is greater than in the dominance order if the Ferrers diagram of is short and fat but the one for is long and skinny.
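Both orders are easy to compare computationally. The sketch below assumes the usual definitions: one partition dominates another when each of its partial sums is at least the corresponding partial sum of the other, and the lexicographic order compares parts from left to right. The degree 6 and the function names are illustrative; 6 is the smallest degree at which the dominance order has incomparable pairs.

```python
from itertools import accumulate


def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dominates(lam, mu):
    """True if every partial sum of lam is >= the matching partial sum of mu."""
    size = max(len(lam), len(mu))
    a = accumulate(lam + (0,) * (size - len(lam)))
    b = accumulate(mu + (0,) * (size - len(mu)))
    return all(x >= y for x, y in zip(a, b))


def lex_greater_equal(lam, mu):
    """Lexicographic comparison; Python tuples already compare this way."""
    return lam >= mu


parts = list(partitions(6))

# Dominance is only a partial order: some pairs are incomparable.
incomparable = [
    (lam, mu)
    for lam in parts
    for mu in parts
    if lam > mu and not dominates(lam, mu) and not dominates(mu, lam)
]
print(incomparable)  # [((4, 1, 1), (3, 3)), ((3, 1, 1, 1), (2, 2, 2))]

# The lexicographic order refines dominance: domination implies lex >=.
refines = all(
    lex_greater_equal(lam, mu)
    for lam in parts
    for mu in parts
    if dominates(lam, mu)
)
print(refines)  # True
```

The pair (4, 1, 1) and (3, 3) has partial sums 4, 5, 6 and 3, 6, 6, so neither dominates the other, while the lexicographic order still ranks (4, 1, 1) first.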
We have provided a formal characterization for the configurations in terms of some elements in the symmetric group . We have shown that such a characterization allows us to reduce the number of configurations to check.
In addition, using the symmetric group it is possible to define an order on configurations. This is important because in keyword-based search more than one configuration can be obtained for the same keyword query.
Many results about representations of the symmetric group can be used in a purely combinatorial manner.
Some top-k works use combinatorial algorithms, such as the Hungarian algorithm (also called the Munkres assignment algorithm), to give the best answer for a keyword query in the context of information search. So, since the representations of the symmetric group can be used to obtain combinatorial algorithms, describing configurations as elements of the symmetric group is a first step towards formalizing Keymantic as a combinatorial algorithm.
- Collecting and generating new ideas.
R. Amaro, J. G. Breslin, J. Cardoso, F. Guerra, R. Trillo-Lado, and Y. Velegrakis. Whitepaper, Keystone COST Action IC 1302, 2014.
- Keyword search over relational databases: A metadata approach.
S. Bergamaschi, E. Domnori, F. Guerra, R. Trillo, and Y. Velegrakis. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pages 565–576, 2011.
- An extension of the Munkres algorithm for the assignment problem to rectangular matrices.
F. Bourgeois and J. Lassalle. Communications of the ACM.
- Top-k results using the Hungarian algorithm.
E. Domnori and B. Hitaj. Unpublished manuscript, Epoka University, Tirane (Albania).
- Young Tableaux: With Applications to Representation Theory and Geometry.
W. Fulton. Volume 35 of London Mathematical Society Student Texts.
- SQL: A Complete Reference.
A. Leon and M. Leon.
- SPARQL query language for RDF.
E. Prud’hommeaux and A. Seaborne. Recommendation, W3C, January 2008.
- The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions.
B. E. Sagan.
- Keyword Search in Databases.
J. X. Yu, L. Chang, and L. Qin.
- Keyword search in relational databases: A survey.
J. X. Yu, L. Qin, and L. Chang. IEEE Data Eng. Bull.