The finite index basis property
Abstract
We describe in this paper a connection between bifix codes, symbolic dynamical systems and free groups. This is in the spirit of the connection established previously for the symbolic systems corresponding to Sturmian words. We introduce a class of sets of factors of an infinite word with linear factor complexity containing Sturmian sets and regular interval exchange sets, namely the class of tree sets. We prove as a main result that for a uniformly recurrent tree set $F$, a finite bifix code $X$ on the alphabet $A$ is $F$-maximal of degree $d$ if and only if it is the basis of a subgroup of index $d$ of the free group on $A$.
1 Introduction
In this paper we study a relation between symbolic dynamical systems and bifix codes. It continues the paper [3], written by part of the present list of authors, on bifix codes and Sturmian words. We understand here by Sturmian words the generalization to arbitrary alphabets, often called strict episturmian words or Arnoux-Rauzy words (see the survey [12]), of the classical Sturmian words on two letters.
As a main result, we prove that, under natural hypotheses satisfied by a Sturmian set $F$, a finite bifix code $X$ on the alphabet $A$ is $F$-maximal of degree $d$ if and only if it is the basis of a subgroup of index $d$ of the free group on $A$ (Theorem 4.4 called below the Finite Index Basis Theorem).
The proof uses the property, proved in [6], that the sets of first return words in a uniformly recurrent tree set $F$ containing the alphabet $A$ form a basis of the free group on $A$ (this result is referred to below as the Return Words Theorem).
We actually introduce several classes of uniformly recurrent sets of words on $k+1$ letters having all $kn+1$ elements of length $n$ for all $n\ge 1$.
The smallest class () is formed of the Sturmian sets on a binary alphabet, that is, with (see Figure 1.1). It is contained both in the class of regular interval exchange sets (denoted ) and of Sturmian sets (denoted ). Moreover, it can be shown that the intersection of and is reduced to . Indeed, Sturmian sets on more than two letters are not the set of factors of an interval exchange transformation with each interval labeled by a distinct letter (the construction in [2] allows one to obtain the Sturmian sets of letters as an exchange of intervals labeled by letters).
The next one is the class of uniformly recurrent sets satisfying the tree condition (), which contains the previous ones. The class of uniformly recurrent sets satisfying the neutrality condition () contains the class . All these classes are contained in the class of uniformly recurrent sets of complexity on an alphabet with letters.
Throughout the paper, we have tried to use the weakest possible conditions to prove our results. As an example, we prove that, under the neutrality condition, any finite $F$-maximal bifix code of degree $d$ has $d(\operatorname{Card}(A)-1)+1$ elements (Theorem 3.6 called below the Cardinality Theorem).
The class is closed under decoding by a maximal bifix code (Theorem 3.13 in [5] referred to as the Bifix Decoding Theorem) but it is not the case for Sturmian sets. In contrast, the uniformly recurrent tree sets form a class of sets containing the Sturmian sets and the regular interval exchange sets which is closed under decoding by a maximal bifix code (see [7]) and for which the Finite Index Basis Theorem is true.
For each class, the array on the right of Figure 1.1 indicates whether it satisfies the Cardinality Theorem (), the Return Words Theorem (), the Finite Index Basis Theorem () or the Bifix Decoding Theorem (). All these classes are distinct.
The paper is organized as follows. In Section 3, we introduce strong, weak and neutral sets. We prove the Cardinality Theorem in neutral sets (Theorem 3.6). We also prove a converse in the sense that a uniformly recurrent set containing the alphabet and such that the Cardinality Theorem holds for any finite maximal bifix code is neutral (Theorem 3.12).
In Section 4, we introduce acyclic and tree sets. The family of tree sets contains Sturmian sets and, as shown in [5], regular interval exchange sets. We prove, as a main result, that uniformly recurrent tree sets have the finite index basis property (Theorem 4.4), a result which is proved in [3] for a Sturmian set. The proof uses a result of [6] concerning bifix codes in acyclic sets (Theorem 4.2 referred to as the Saturation Theorem). It also uses the Return Words Theorem proved in [6]. We also prove a converse of Theorem 4.4, in the sense that a uniformly recurrent set which has the finite index basis property is a tree set (Corollary 4.11).
Acknowledgement
This work was supported by grants from Région Île-de-France, the ANR projects Eqinocs ANR-11-BS02-004 and Dyna3S ANR-13-BS02-003, the Labex Bezout, the FARB Project “Aspetti algebrici e computazionali nella teoria dei codici, degli automi e dei linguaggi formali” (University of Salerno, 2013) and the MIUR PRIN 2010-2011 grant “Automata and Formal Languages: Mathematical and Applicative Aspects” H41J12000190001. We warmly thank the referee for his useful remarks on the first version of the paper.
2 Preliminaries
In this section, we first recall some definitions concerning words, prefix codes and bifix codes. We give the definitions of recurrent and uniformly recurrent sets of words. We also give the definitions and basic properties of bifix codes (see [3] for a more detailed presentation).
2.1 Words
In this section, we give definitions concerning extensions of words. We define recurrent sets and sets of first return words. For all undefined notions, we refer to [4].
2.1.1 Recurrent sets
Let $A$ be a finite nonempty alphabet. All words considered below, unless stated explicitly, are supposed to be on the alphabet $A$. We denote by $A^*$ the set of all words on $A$. We denote by $1$ or by $\varepsilon$ the empty word. We refer to [4] for the notions of prefix, suffix, factor of a word.
A set of words is said to be prefix-closed (resp. factorial) if it contains the prefixes (resp. factors) of its elements.
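Let $F$ be a set of words on the alphabet $A$. For $w\in F$, we denote
\[ L(w)=\{a\in A\mid aw\in F\},\qquad R(w)=\{a\in A\mid wa\in F\},\qquad E(w)=\{(a,b)\in A\times A\mid awb\in F\} \]
and further
\[ \ell(w)=\operatorname{Card}(L(w)),\qquad r(w)=\operatorname{Card}(R(w)),\qquad e(w)=\operatorname{Card}(E(w)). \]
A word $w$ is right-extendable if $r(w)>0$, left-extendable if $\ell(w)>0$ and biextendable if $e(w)>0$. A factorial set $F$ is called right-extendable (resp. left-extendable, resp. biextendable) if every word in $F$ is right-extendable (resp. left-extendable, resp. biextendable).
A word $w$ is called right-special if $r(w)\ge 2$. It is called left-special if $\ell(w)\ge 2$. It is called bispecial if it is both right-special and left-special.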
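To make these notions concrete, the following sketch computes the left, right and two-sided extensions of a word in the set of factors of a long prefix of the Fibonacci word. The morphism a -> ab, b -> a, the prefix length and all function names are illustrative assumptions. A finite prefix only approximates the factor set, so only short words should be queried.

```python
def fibonacci_prefix(min_len):
    """Prefix of the Fibonacci word (assumed morphism: a -> ab, b -> a)."""
    word = "a"
    while len(word) < min_len:
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def factor_set(word, max_len):
    """All factors of `word` of length at most `max_len` (empty word included)."""
    return {word[i:i + n]
            for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


ALPHABET = "ab"
# Factors up to length 8, so that words up to length 6 can be extended safely.
F = factor_set(fibonacci_prefix(500), 8)


def left_ext(w):
    """Letters a such that a + w is in F."""
    return {a for a in ALPHABET if a + w in F}


def right_ext(w):
    """Letters a such that w + a is in F."""
    return {a for a in ALPHABET if w + a in F}


def bi_ext(w):
    """Pairs (a, b) such that a + w + b is in F."""
    return {(a, b) for a in ALPHABET for b in ALPHABET if a + w + b in F}


def is_right_special(w):
    return len(right_ext(w)) >= 2


def is_left_special(w):
    return len(left_ext(w)) >= 2


def is_bispecial(w):
    return is_left_special(w) and is_right_special(w)
```

For instance, under these assumptions `right_ext("b")` contains only the letter a, since bb is not a factor of the Fibonacci word.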
A set of words $F$ is recurrent if it is factorial and if for every $u,w\in F$ there is a $v\in F$ such that $uvw\in F$. A recurrent set $F\ne\{1\}$ is biextendable.
A set of words $F$ is said to be uniformly recurrent if it is right-extendable and if, for any word $u\in F$, there exists an integer $n\ge 1$ such that $u$ is a factor of every word of $F$ of length $n$. A uniformly recurrent set is recurrent, and thus biextendable.
A morphism $f:A^*\to B^*$ is a monoid morphism from $A^*$ into $B^*$. If $a\in A$ is such that the word $f(a)$ begins with $a$ and if $|f^n(a)|$ tends to infinity with $n$, there is a unique infinite word denoted $f^\omega(a)$ which has all words $f^n(a)$ as prefixes. It is called a fixpoint of the morphism $f$.
A morphism $f:A^*\to A^*$ is called primitive if there is an integer $k$ such that for all $a,b\in A$, the letter $b$ appears in $f^k(a)$. If $f$ is a primitive morphism, the set of factors of any fixpoint of $f$ is uniformly recurrent (see [11], Proposition 1.2.3 for example).
A morphism $f:A^*\to B^*$ is trivial if $f(a)=1$ for all $a\in A$. The image of a uniformly recurrent set by a nontrivial morphism is uniformly recurrent (see [1], Theorem 10.8.6 and Exercise 10.11.38).
An infinite word is episturmian if the set of its factors is closed under reversal and contains for each $n$ at most one word of length $n$ which is right-special. It is a strict episturmian word if it has exactly one right-special word of each length and moreover each right-special factor $u$ is such that $r(u)=\operatorname{Card}(A)$.
A Sturmian set is a set of words which is the set of factors of a strict episturmian word. Any Sturmian set is uniformly recurrent (see [3]).
Example 2.1
Let $A=\{a,b\}$. The Fibonacci word is the fixpoint $x=f^\omega(a)$ of the morphism $f:A^*\to A^*$ defined by $f(a)=ab$ and $f(b)=a$. It is a Sturmian word (see [14]). The set of factors of $x$ is the Fibonacci set.
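As an illustration, a prefix of the Fibonacci word can be obtained by iterating the morphism, and its factor complexity can be checked experimentally. This is a minimal sketch assuming the usual Fibonacci morphism a -> ab, b -> a; the helper names and the number of iterations are arbitrary choices.

```python
def iterate_morphism(images, start, steps):
    """Apply the morphism given by `images` to `start`, `steps` times."""
    word = start
    for _ in range(steps):
        word = "".join(images[c] for c in word)
    return word


def complexity(word, n):
    """Number of distinct factors of length n occurring in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


# Assumed definition of the Fibonacci morphism.
FIBONACCI = {"a": "ab", "b": "a"}

# Each iterate is a prefix of the next one, hence a prefix of the fixpoint.
x = iterate_morphism(FIBONACCI, "a", 15)

print(x[:13])
print([complexity(x, n) for n in range(1, 9)])
```

On a binary alphabet a Sturmian set has exactly n + 1 factors of each length n, which is what the second line is expected to display for small n.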
Example 2.2
Let $A=\{a,b,c\}$. The Tribonacci word is the fixpoint $x=f^\omega(a)$ of the morphism $f:A^*\to A^*$ defined by $f(a)=ab$, $f(b)=ac$, $f(c)=a$. It is a strict episturmian word (see [13]). The set of factors of $x$ is the Tribonacci set.
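The defining property of a strict episturmian word can be observed on a prefix of the Tribonacci word: for each small length there should be exactly one right-special factor, and it should be followed by every letter. The sketch below assumes the usual Tribonacci morphism a -> ab, b -> ac, c -> a; the function names are hypothetical.

```python
def tribonacci_prefix(steps):
    """Prefix of the Tribonacci word (assumed morphism: a -> ab, b -> ac, c -> a)."""
    images = {"a": "ab", "b": "ac", "c": "a"}
    word = "a"
    for _ in range(steps):
        word = "".join(images[ch] for ch in word)
    return word


def right_special_factors(word, n):
    """Map each right-special factor of length n to the letters that follow it."""
    followers = {}
    for i in range(len(word) - n):
        # word[i:i+n] is a factor of length n followed by the letter word[i+n]
        followers.setdefault(word[i:i + n], set()).add(word[i + n])
    return {w: ext for w, ext in followers.items() if len(ext) >= 2}


x = tribonacci_prefix(12)
for n in range(4):
    print(n, right_special_factors(x, n))
```

The check is only experimental: a finite prefix can confirm the property for short factors but does not prove it.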
2.2 Bifix codes
In this section, we present basic definitions concerning prefix codes and bifix codes. For a more detailed presentation, see [4]. We also describe an operation on bifix codes called internal transformation and prove a property of this transformation (Proposition 2.9). It will be used in Section 3.3.
2.2.1 Prefix codes
A prefix code is a set of nonempty words which does not contain any proper prefix of its elements. A suffix code is defined symmetrically. A bifix code is a set which is both a prefix code and a suffix code.
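These three definitions translate directly into code for finite sets of words. The following sketch uses hypothetical function names and a quadratic comparison that is sufficient for small examples.

```python
def is_prefix_code(words):
    """True if no word is empty and no word is a proper prefix of another."""
    words = set(words)
    if "" in words:
        return False
    return not any(x != y and y.startswith(x) for x in words for y in words)


def is_suffix_code(words):
    """Symmetric notion, obtained by reversing every word."""
    return is_prefix_code({w[::-1] for w in words})


def is_bifix_code(words):
    return is_prefix_code(words) and is_suffix_code(words)


print(is_bifix_code({"a", "baab", "bab"}))
print(is_prefix_code({"a", "ba", "bb"}), is_suffix_code({"a", "ba", "bb"}))
```

The set {a, ba, bb} is a prefix code but not a suffix code, because a is a suffix of ba.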
A coding morphism for a prefix code $X\subset A^+$ is a morphism $f:B^*\to A^*$ which maps bijectively $B$ onto $X$.
Let $F$ be a set of words. A prefix code $X\subset F$ is $F$-maximal if it is not properly contained in any prefix code $Y\subset F$. Note that if $X\subset F$ is an $F$-maximal prefix code, any word of $F$ is comparable for the prefix order with a word of $X$.
We denote by $X^*$ the submonoid generated by $X$. A set $X\subset F$ is right $F$-complete if any word of $F$ is a prefix of a word in $X^*$. Given a factorial set $F$, a prefix code $X\subset F$ is $F$-maximal if and only if it is right $F$-complete (Proposition 3.3.2 in [3]).
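A parse of a word $w$ with respect to a set $X$ is a triple $(q,x,p)$ such that $w=qxp$ where $q$ has no suffix in $X$, $p$ has no prefix in $X$ and $x\in X^*$. We denote by $\delta_X(w)$ the number of parses of $w$ with respect to $X$. Let $X$ be a prefix code. By Proposition 4.1.6 in [3], for any $u\in A^*$ and $a\in A$, one has
\[ \delta_X(ua)=\begin{cases}\delta_X(u) & \text{if } ua\in A^*X,\\ \delta_X(u)+1 & \text{otherwise.}\end{cases} \tag{2.1} \]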
2.2.2 Maximal bifix codes
Let $F$ be a set of words. A bifix code $X\subset F$ is $F$-maximal if it is not properly contained in a bifix code $Y\subset F$. For a recurrent set $F$, a finite bifix code is $F$-maximal as a bifix code if and only if it is an $F$-maximal prefix code (see [3], Theorem 4.2.2).
By definition, the $F$-degree of a bifix code $X$, denoted $d_F(X)$, is the maximal number of parses of a word in $F$. It can be finite or infinite.
For $F=A^*$, we use the term ‘maximal bifix code’ instead of $A^*$-maximal bifix code and ‘degree’ instead of $A^*$-degree. This is consistent with the terminology of [4].
Let $X$ be a bifix code. The number of parses of a word $w$ is also equal to the number of suffixes of $w$ which have no prefix in $X$ and the number of prefixes of $w$ which have no suffix in $X$ (see Proposition 6.1.6 in [4]).
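The number of parses can be computed by brute force from the definition: enumerate the ways of cutting a word into three pieces, the first without a suffix in the code, the middle one a product of codewords, and the last without a prefix in the code. The sketch below is only an illustration; the function names are hypothetical, and the code {a, baab, bab} used in the usage lines is an assumed example of a bifix code of degree 2 in the Fibonacci set.

```python
def in_submonoid(x, code):
    """True if x is a (possibly empty) product of words of `code`."""
    ok = [True] + [False] * len(x)
    for j in range(1, len(x) + 1):
        # x[:j] is a product if it ends with a codeword c and x[:j-len(c)] is a product
        ok[j] = any(x[:j].endswith(c) and ok[j - len(c)] for c in code)
    return ok[-1]


def count_parses(w, code):
    """Number of triples (q, x, p) with w = q x p, q without suffix in `code`,
    x in the submonoid generated by `code`, and p without prefix in `code`."""
    total = 0
    for i in range(len(w) + 1):
        q = w[:i]
        if any(q.endswith(c) for c in code):
            continue
        for j in range(i, len(w) + 1):
            x, p = w[i:j], w[j:]
            if in_submonoid(x, code) and not any(p.startswith(c) for c in code):
                total += 1
    return total


X = {"a", "baab", "bab"}  # assumed example code
for w in ["a", "bab", "baab", "abaab"]:
    print(w, count_parses(w, X))
```

Because `code` is assumed to be a code, each pair of cut points corresponds to at most one parse, so counting cut points counts parses.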
The set of internal factors of a set of words $X$, denoted $I(X)$, is the set of words $w$ such that there exist nonempty words $u,v$ with $uwv\in X$.
Let $F$ be a set of words. A set $X\subset F$ is said to be $F$-thin if there is a word of $F$ which is not a factor of $X$. If $F$ is biextendable, any finite set $X\subset F$ is $F$-thin. Indeed, any long enough word of $F$ is not a factor of $X$. The converse is true if $F$ is uniformly recurrent. Indeed, let $w\in F$ be a word which is not a factor of $X$. Then any long enough word of $F$ contains $w$ as a factor, and thus is not itself a factor of $X$. Hence the words of $X$ have bounded length and $X$ is finite.
Let $F$ be a recurrent set and let $X$ be an $F$-thin and $F$-maximal bifix code of degree $d$. A word $w\in F$ is such that $\delta_X(w)<d$ if and only if it is an internal factor of $X$, that is,
\[ I(X)=\{w\in F\mid \delta_X(w)<d\} \]
(Theorem 4.2.8 in [3]). Thus any word of $F$ which is not a factor of $X$ has $d$ parses. This implies that the $F$-degree $d$ is finite.
Example 2.3
Let $F$ be a recurrent set. For any integer $n\ge 1$, the set $F\cap A^n$ is an $F$-maximal bifix code of degree $n$.
The kernel of a bifix code $X$ is the set $K(X)=I(X)\cap X$. Thus it is the set of words of $X$ which are also internal factors of $X$. By Theorem 4.3.11 of [3], an $F$-thin and $F$-maximal bifix code is determined by its degree and its kernel. Moreover, by Theorem 4.3.12 of [3], we have the following result.
Theorem 2.4
Let $F$ be a recurrent set. A bifix code $Y\subset F$ is the kernel of some $F$-thin $F$-maximal bifix code of degree $d$ if and only if $Y$ is not $F$-maximal and $\delta_Y(y)\le d-1$ for all $y\in Y$.
Example 2.5
Let $F$ be the Fibonacci set. The set $Y=\{a\}$ is a bifix code which is not $F$-maximal and $\delta_Y(a)=1$. The set $X=\{a,baab,bab\}$ is the unique $F$-maximal bifix code of degree $2$ with kernel $\{a\}$. Indeed, the word $bab$ is not an internal factor and has two parses, namely $(1,bab,1)$ and $(b,a,b)$.
The following proposition allows one to embed an $F$-maximal bifix code in a maximal one of the same degree.
Proposition 2.6
Let $F$ be a recurrent set. For any $F$-thin and $F$-maximal bifix code $X$ of degree $d$, there is a thin maximal bifix code $X'$ of degree $d$ such that $X=X'\cap F$.
Proof.
Let be the kernel of and let be the degree of . By Theorem 2.4, the set is not maximal and for any . Thus, applying again Theorem 2.4 with , there is a maximal bifix code with kernel and degree . Then, by Theorem 4.2.11 of [3], the set is an maximal bifix code.
Let us show that is prefix. Suppose that and are comparable for the prefix order. We may assume that is a prefix of (the other case works symmetrically). If , then and thus . Otherwise, . Set with . Then, by Equation (2.1), and thus . But since all the factors of which are in are in , we have . Analogously, since all factors of which are in are in , we have . Therefore . But, since has degree , . Then, by Equation (2.1) again, we have and . Let be the suffix of which is in . If , then or and in both cases . Since is prefix and is suffix, this implies .
Since and are maximal prefix codes included in , this implies that . ∎
Example 2.7
Let be the Fibonacci set. Let be the maximal bifix code of degree with kernel . Then is the maximal bifix code with kernel of degree such that .
2.2.3 Internal transformation
We will use the following transformation which operates on bifix codes (see [4, Chapter 6] for a more detailed presentation). For a set of words and a word , we denote and the residuals of with respect to (one should not confuse this notation with that of the inverse in the free group). Let be a set of words and a word. Let
(2.2)  
(2.3)  
(2.4) 
Note that . Consequently . The set
(2.5) 
is said to be obtained from by internal transformation with respect to . When , the transformation takes the simpler form
(2.6) 
It is this form which is used in [3] to define the internal transformation.
Example 2.8
Let be the Fibonacci set. Let . The internal transformation applied to with respect to gives . The internal transformation applied to with respect to gives .
The following result is proved in [3] in the case (Proposition 4.4.5).
Proposition 2.9
Proof.
By Proposition 2.6 there is a thin maximal bifix code of degree such that . Let be the code obtained from by internal transformation with respect to . Then
with , , and , , , . We have , , and , for . In particular , . Thus . This implies that is a thin maximal bifix code of degree (see Proposition 6.2.8 and its complement page 242 in [4]).
Since , we have . By Theorem 4.2.11 of [3], is an maximal bifix code of degree at most . Since is uniformly recurrent, this implies that is finite. ∎
3 Strong, weak and neutral sets
In this section, we introduce strong, weak and neutral sets. We prove a theorem concerning the cardinality of an $F$-maximal bifix code in a neutral set $F$ (Theorem 3.6).
3.1 Strong, weak and neutral words
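Let $F$ be a factorial set. For a word $w\in F$, let
\[ m(w)=e(w)-\ell(w)-r(w)+1. \]
We say that, with respect to $F$, $w$ is strong if $m(w)>0$, weak if $m(w)<0$ and neutral if $m(w)=0$.
A biextendable word $w$ is called ordinary if $E(w)\subset (a\times A)\cup(A\times b)$ for some $(a,b)\in E(w)$ (see [8, Chapter 4]). If $F$ is biextendable, any ordinary word is neutral. Indeed, one has $e(w)=\ell(w)+r(w)-1$ and thus $m(w)=0$.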
Example 3.1
In a Sturmian set $F$, any word is ordinary. Indeed, for any bispecial word $w$, there is a unique letter $a$ such that $aw$ is right-special and a unique letter $b$ such that $wb$ is left-special. Then $awb\in F$ and $E(w)=(a\times A)\cup(A\times b)$.
We say that a set of words is strong (resp. weak, resp. neutral) if it is factorial and every word is strong or neutral (resp. weak or neutral, resp. neutral).
The sequence $(p_n)_{n\ge 0}$ with $p_n=\operatorname{Card}(F\cap A^n)$ is called the complexity of $F$. Set $k=\operatorname{Card}(F\cap A)-1$.
Proposition 3.2
The complexity of a strong (resp. weak, resp. neutral) set $F$ is at least (resp. at most, resp. exactly) equal to $kn+1$.
Given a factorial set $F$ with complexity $p_n$, we denote by $s_n=p_{n+1}-p_n$ the first difference of the sequence $p_n$ and by $b_n=s_{n+1}-s_n$ its second difference. The following is from [9] (it is also part of Theorem 4.5.4 in [8, Chapter 4] and also Lemma 3.3 in [6]).
Lemma 3.3
We have
\[ b_n=\sum_{w\in F\cap A^n} m(w) \]
for all $n\ge 0$.
Proposition 3.2 follows easily from the following lemma.
Lemma 3.4
If $F$ is strong (resp. weak, resp. neutral), then $s_n\ge k$ (resp. $s_n\le k$, resp. $s_n=k$) for all $n\ge 0$.
Proof.
Assume that $F$ is strong. Then $m(w)\ge 0$ for all $w\in F$ and thus, by Lemma 3.3, the sequence $(s_n)$ is nondecreasing. Since $s_0=k$, this implies $s_n\ge k$ for all $n$. The proof of the other cases is similar. ∎
We now give an example of a set of complexity $2n+1$ on an alphabet with three letters which is not neutral.
Example 3.5
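Let $A=\{a,b,c\}$. The Chacon word on three letters is the fixpoint $x=f^\omega(a)$ of the morphism $f$ from $A^*$ into itself defined by $f(a)=aabc$, $f(b)=bc$ and $f(c)=abc$. Thus $x=aabcaabcbcabc\cdots$. The Chacon set is the set $F$ of factors of $x$. It is of complexity $2n+1$ (see [11, Section 5.5.2]).
It contains strong, neutral and weak words. Indeed, $F\cap A^2=\{aa,ab,bc,ca,cb\}$ and thus $m(\varepsilon)=0$, showing that the empty word is neutral. Next $E(abc)=\{(a,a),(c,a),(a,b),(c,b)\}$ shows that $m(abc)=1$ and thus $abc$ is strong. Finally, $E(bca)=\{(a,a),(c,b)\}$ and thus $m(bca)=-1$, showing that $bca$ is weak.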
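The classification of words as strong, weak or neutral can be reproduced numerically on a prefix of the Chacon word. The sketch below assumes the ternary Chacon morphism a -> aabc, b -> bc, c -> abc and the multiplicity m(w) = e(w) - l(w) - r(w) + 1, where e, l and r count the two-sided, left and right extensions of w. The names and the prefix length are illustrative choices.

```python
def chacon_prefix(steps):
    """Prefix of the ternary Chacon word (assumed morphism: a -> aabc, b -> bc, c -> abc)."""
    images = {"a": "aabc", "b": "bc", "c": "abc"}
    word = "a"
    for _ in range(steps):
        word = "".join(images[ch] for ch in word)
    return word


ALPHABET = "abc"
x = chacon_prefix(6)
# Factors of length at most 6: enough to extend words of length up to 4 on both sides.
F = {x[i:i + n] for n in range(7) for i in range(len(x) - n + 1)}


def multiplicity(w):
    """m(w) = e(w) - l(w) - r(w) + 1 computed in the finite factor set F."""
    e = sum(a + w + b in F for a in ALPHABET for b in ALPHABET)
    l = sum(a + w in F for a in ALPHABET)
    r = sum(w + b in F for b in ALPHABET)
    return e - l - r + 1


def kind(w):
    m = multiplicity(w)
    return "strong" if m > 0 else "weak" if m < 0 else "neutral"


for w in ["", "abc", "bca"]:
    print(repr(w), multiplicity(w), kind(w))
```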
3.2 The Cardinality Theorem
The following result, referred to as the Cardinality Theorem, is a generalization of a result proved in [3] in the less general case of a Sturmian set. Since the set $F\cap A^n$ is an $F$-maximal bifix code of degree $n$ (see Example 2.3), it is also a generalization of Proposition 3.2.
Theorem 3.6
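Let $F$ be a recurrent set containing the alphabet $A$ and let $X\subset F$ be a finite $F$-maximal bifix code. Set $k=\operatorname{Card}(A)-1$ and $d=d_F(X)$. If $F$ is strong (resp. weak), then $\operatorname{Card}(X)-1\ge dk$ (resp. $\operatorname{Card}(X)-1\le dk$). If $F$ is neutral, then $\operatorname{Card}(X)-1=dk$.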
Note that, for a recurrent neutral set $F$, an $F$-maximal bifix code may be infinite since this may happen for a Sturmian set (see [3, Example 5.1.4]).
We consider rooted trees with the usual notions of root, node, child and parent. The following lemma is an application of a well-known lemma on trees relating the number of its leaves to the sum of the degrees of its internal nodes.
Lemma 3.7
Let $F$ be a prefix-closed set. Let $X$ be a finite $F$-maximal prefix code and let $P$ be the set of its proper prefixes. Then $\operatorname{Card}(X)=1+\sum_{p\in P}(r(p)-1)$.
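The lemma can be read as the usual leaf-counting identity on the tree of prefixes of the code: the number of leaves equals one plus the sum, over internal nodes, of the number of children minus one. The sketch below checks this reading, which is our assumption about the stripped formula, on two prefix codes of the Fibonacci set. The example codes, the morphism a -> ab, b -> a and the function names are illustrative.

```python
def fibonacci_factors(max_len, steps=14):
    """Factors of length <= max_len of a prefix of the Fibonacci word
    (assumed morphism: a -> ab, b -> a)."""
    word = "a"
    for _ in range(steps):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return {word[i:i + n]
            for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


def proper_prefixes(code):
    """All proper prefixes of the words of `code` (the empty word included)."""
    return {x[:i] for x in code for i in range(len(x))}


def leaves_from_internal_nodes(code, factors, alphabet):
    """1 + sum over proper prefixes p of (number of right extensions of p, minus 1)."""
    total = 1
    for p in proper_prefixes(code):
        r = sum(p + a in factors for a in alphabet)
        total += r - 1
    return total


F = fibonacci_factors(6)
for X in [{"a", "baab", "bab"}, {"aab", "aba", "baa", "bab"}]:
    print(sorted(X), len(X), leaves_from_internal_nodes(X, F, "ab"))
```

In both cases the two printed numbers should agree, since each set is a maximal prefix code relative to the Fibonacci set.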
We order the nodes of a tree from the parent to the child and thus we have if is a descendant of . We denote if with .
Lemma 3.8
Let be a finite tree with root on a set of nodes, let , and let be functions assigning to each node an integer such that

for each internal node , where the sum runs over the children of ,

for each leaf of , one has .
Then .
Proof.
We use an induction on the number of nodes of . If is reduced to its root, then implies and the result is true. Assume that it holds for trees with less nodes than . Since is finite and not reduced to its root, there is an internal node such that all its children are leaves of . Let be such a node. Since has value for each child of , the value is the same for all children of . Let be the tree obtained from by deleting all children of . Let be the set of nodes of . Let be the restriction of to and let be defined by
It is easy to verify that and satisfy the same hypotheses as and . Then
whence the result by the induction hypothesis. ∎
A symmetric statement holds replacing the inequality
in condition (i) by and the conclusion
by .
Proof of Theorem 3.6. Assume first that is strong. Let be larger than the lengths of the words of .
Let be the set of words of of length at most . By considering each word as the father of for , the set can be considered as a tree with root the empty word . The leaves of are the elements of of length .
For , set and let
Let us verify that the conditions of Lemma 3.8 are satisfied. Let be in with . Then, since is strong or neutral, . This implies that showing that condition (i) is satisfied.
Let be a leaf of , that is, a word of of length . Since is larger than the maximal length of the words of , the word is not an internal factor of and thus it has parses with respect to . It implies that it has suffixes which are proper prefixes of (since is right complete, this is the same as to have no prefix in ). Thus . Thus condition (ii) is also satisfied.
By Lemma 3.8, we have . Let be the set of proper prefixes of . By definition of , we have and thus by definition of , . Since is recurrent, is an maximal prefix code. Thus, by Lemma 3.7, we have and thus we obtain which is the desired conclusion.
The proof that if is weak is symmetric, using the symmetric version of Lemma 3.8. The case where is neutral follows then directly.
We illustrate Theorem 3.6 in the following example.
Example 3.9
Consider the set of words on the alphabet obtained as follows. Let be the Fibonacci set and let be the maximal bifix code of degree defined by . We consider the morphism defined by , , , . We set .
The words of of length at most are represented in Figure 3.1 on the left.
Since is Sturmian, it is a uniformly recurrent tree set (see the definition in Section 4). By the main result of [7], the family of uniformly recurrent tree sets is closed under maximal bifix decoding. Thus is a uniformly recurrent tree set.
The following example illustrates the necessity of the hypotheses in Theorem 3.6.
Example 3.10
Consider again the Chacon set of Example 3.5. Let and let be the maximal bifix codes of degree represented in Figure 3.3. The first one is obtained from by internal transformation with respect to . The second one with respect to .
We have and showing that and , illustrating the fact that is neither strong nor weak.
The following example shows that the class of sets of factor complexity $kn+1$ is not closed under maximal bifix decoding.
Example 3.11
Let be the Chacon set and let be a coding morphism for the maximal bifix code of degree with elements of Example 3.10. One may verify that . This shows that the set does not have factor complexity .
3.3 A converse of the Cardinality Theorem
We end this section with a statement proving a converse of the Cardinality Theorem.
Theorem 3.12
Let $F$ be a uniformly recurrent set containing the alphabet $A$. If any finite $F$-maximal bifix code of degree $d$ has $d(\operatorname{Card}(A)-1)+1$ elements, then $F$ is neutral.
Proof.
We may assume that has more than one element. We argue by contradiction. Let be a word which is not neutral. We cannot have since otherwise the maximal bifix code has not the good cardinality.
Set and . The set is an maximal bifix code of degree . Let be the code obtained by internal transformation from with respect to and defined by Equation (2.5). Note that and .
We distinguish two cases.
Case 1.
Assume that .
The code is defined by Equation (2.6) and we have . Since , the hypotheses of Proposition 2.9 are satisfied and has degree (by Proposition 4.4.5 in [3]). This implies . On the other hand
Since is not neutral, we have and thus we obtain a contradiction.
Case 2.
Assume next that . Then with for some letter and the sets defined by Equation (2.3) are . Moreover .
Since is not neutral, it is bispecial. Thus the sets are nonempty and the hypotheses of Proposition 2.9 are satisfied. Since is uniformly recurrent and since , the set is finite. Set . Thus .
Let be a letter such that . Then, since has suffixes which are proper prefixes of . Moreover, has no suffix in . Indeed, if , we cannot have since . And since all words in except have length greater than , is also impossible. Thus by Equation (2.1), we have and thus . This shows that the degree of is and thus that as in Case 1.
We may assume that is chosen maximal such that is not neutral. This is always possible if is neutral. Otherwise, Case 1 applies to and .
For (there may be no such integer if ), since is neutral, we have
Moreover, and . Thus
Hence evaluates as
(the last on the first line comes from the word counted twice in ). Since , this contradicts the fact that and have the same number of elements. ∎
4 Tree sets
We introduce in this section the notions of acyclic and tree sets. We state and prove the main result of this paper (Theorem 4.4). The proof uses results from [6].
4.1 Acyclic and tree sets
Let $F$ be a set of words. For $w\in F$, the extension graph of $w$ is the following undirected bipartite graph $G(w)$. Its set of vertices is the disjoint union of two copies of the sets $L(w)$ and $R(w)$. Next, its edges are the pairs $(a,b)\in E(w)$. By definition of $E(w)$, an edge goes from $a\in L(w)$ to $b\in R(w)$ if and only if $awb\in F$.
Recall that an undirected graph is a tree if it is connected and acyclic.
Let $F$ be a biextendable set. We say that $F$ is acyclic if for every word $w\in F$, the graph $G(w)$ is acyclic. We say that $F$ is a tree set if $G(w)$ is a tree for all $w\in F$.
Clearly an acyclic set is weak and a tree set is neutral.
Note that a biextendable set $F$ is a tree set if and only if the graph $G(w)$ is a tree for every bispecial non-ordinary word $w$. Indeed, if $w$ is not bispecial or if it is ordinary, then $G(w)$ is always a tree.
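The tree condition can be tested word by word with a union-find pass over the edges of the extension graph: an edge joining two vertices that are already connected closes a cycle, and the graph is a tree when it is acyclic with a single component. The sketch below is illustrative only. It works on factor sets of finite prefixes, and the Fibonacci and Chacon morphisms as well as all names are assumptions.

```python
def fixpoint_prefix(images, steps, start="a"):
    word = start
    for _ in range(steps):
        word = "".join(images[c] for c in word)
    return word


def factor_set(word, max_len):
    return {word[i:i + n]
            for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


def graph_shape(w, factors, alphabet):
    """Return (acyclic, connected) for the extension graph of w in `factors`."""
    left = [("L", a) for a in alphabet if a + w in factors]
    right = [("R", b) for b in alphabet if w + b in factors]
    parent = {v: v for v in left + right}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    acyclic = True
    for a in alphabet:
        for b in alphabet:
            if a + w + b in factors:
                root_a, root_b = find(("L", a)), find(("R", b))
                if root_a == root_b:
                    acyclic = False  # this edge closes a cycle
                else:
                    parent[root_a] = root_b
    connected = len({find(v) for v in parent}) == 1
    return acyclic, connected


def is_tree(w, factors, alphabet):
    return graph_shape(w, factors, alphabet) == (True, True)


fib = factor_set(fixpoint_prefix({"a": "ab", "b": "a"}, 14), 8)
chacon = factor_set(fixpoint_prefix({"a": "aabc", "b": "bc", "c": "abc"}, 6), 8)

print(all(is_tree(w, fib, "ab") for w in fib if len(w) <= 5))
print(graph_shape("abc", chacon, "abc"), graph_shape("bca", chacon, "abc"))
```

In the Chacon set the graph of abc contains a cycle and the graph of bca is disconnected, in line with abc being strong and bca being weak.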
Proposition 4.1
A Sturmian set is a tree set.
Indeed, a Sturmian set $F$ is biextendable and every bispecial word is ordinary (see Example 3.1).
The following example shows that there are neutral sets which are not tree sets.
Example 4.2
Let and let be the set of factors of . The set is biextendable. One has . It is neutral. Indeed the empty word is neutral since . Next, the only nonempty bispecial words are and for . They are neutral since and . However, is not acyclic since the graph contains a cycle (and has two connected components, see Figure 4.1).
In the last example, the set is not recurrent. We present now an example, due to Julien Cassaigne [10] of a uniformly recurrent set which is neutral but is not a tree set (it is actually not even acyclic).
4.2 Finite index basis property
Let be a recurrent set containing the alphabet . We say that has the finite index basis property if the following holds: a finite bifix code is an maximal bifix code of