# On Subword Complexity of Morphic Sequences

Rostislav Devyatov,
Moscow State University and Independent University of Moscow,
deviatov1@rambler.ru

###### Abstract

We study the structure of pure morphic and morphic sequences and prove the following result: the subword complexity of an arbitrary morphic sequence is either $\Theta(n^{1+1/k})$ for some $k\in\mathbb{N}$, or is $O(n\log n)$.

## 1 Introduction

Morphisms and morphic sequences are well known and well studied in combinatorics on words (see, e.g., [1]). We study their subword complexity.

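Let $\Sigma$ be a finite alphabet. A mapping $\varphi\colon\Sigma^*\to\Sigma^*$ is called a morphism if $\varphi(\gamma\delta)=\varphi(\gamma)\varphi(\delta)$ for all $\gamma,\delta\in\Sigma^*$. A morphism is determined by its values on single-letter words. A morphism is called nonerasing if $|\varphi(a)|\ge 1$ for each $a\in\Sigma$, and is called a coding if $|\varphi(a)|=1$ for each $a\in\Sigma$. Let $|\varphi|$ denote $\max_{a\in\Sigma}|\varphi(a)|$.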

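Let $\varphi(a)=a\gamma$ for some $a\in\Sigma$, $\gamma\in\Sigma^*$, and suppose that $\varphi^n(\gamma)$ is not empty for every $n$. Then the infinite sequence $\varphi^\infty(a)=\lim_{n\to\infty}\varphi^n(a)$ is well-defined and is called pure morphic. Sequences of the form $\psi(\varphi^\infty(a))$ with a coding $\psi$ are called morphic.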
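As an illustration of these definitions, the following Python sketch computes a finite prefix of a pure morphic sequence and applies a coding to it. The function names and the Thue-Morse example are our own choices for illustration; a morphism is represented simply as a dictionary from letters to words.

```python
from typing import Dict

def apply_morphism(phi: Dict[str, str], word: str) -> str:
    """Apply a morphism letter by letter, using phi(xy) = phi(x) phi(y)."""
    return "".join(phi[c] for c in word)

def fixed_point_prefix(phi: Dict[str, str], a: str, length: int) -> str:
    """Return the first `length` letters of the limit of phi^n(a)."""
    if not phi[a].startswith(a):
        raise ValueError("phi(a) must start with a")
    word = a
    while len(word) < length:
        nxt = apply_morphism(phi, word)
        if len(nxt) == len(word):
            # phi^n(a) stopped growing, so the limit is a finite word.
            raise ValueError("the iterates of a do not grow")
        word = nxt
    return word[:length]

# Hypothetical example: the Thue-Morse morphism and a coding to {0, 1}.
thue_morse = {"a": "ab", "b": "ba"}
prefix = fixed_point_prefix(thue_morse, "a", 8)
coded = apply_morphism({"a": "0", "b": "1"}, prefix)
print(prefix, coded)
```

Since $\varphi(a)$ starts with $a$, each iterate is a prefix of the next one, which is why truncating a sufficiently long iterate gives a prefix of the limit.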

In this paper we study a natural combinatorial characteristic of sequences, namely subword complexity. The subword complexity of a sequence $\beta$ is the function $p_\beta\colon\mathbb{N}\to\mathbb{N}$, where $p_\beta(n)$ is the number of all different subwords of length $n$ occurring in $\beta$. For a survey on subword complexity, see, e.g., [2]. Pansiot showed [3] that the subword complexity of an arbitrary pure morphic sequence adopts one of the five following asymptotic behaviors: $O(1)$, $\Theta(n)$, $\Theta(n\log\log n)$, $\Theta(n\log n)$, or $\Theta(n^2)$. Since codings can only decrease subword complexity, the subword complexity of every morphic sequence is $O(n^2)$. We formulate the following main result.

###### Theorem 1.1.

The subword complexity $p_\beta$ of a morphic sequence $\beta$ is either $p_\beta(n)=\Theta(n^{1+1/k})$ for some $k\in\mathbb{N}$, or $p_\beta(n)=O(n\log n)$.

Note that for each $k$ the complexity class $\Theta(n^{1+1/k})$ is non-empty [4].

We give an example of a morphic sequence $\beta$ with $p_\beta(n)=\Theta(n^{3/2})$ in Section 9.
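To make the notion of subword complexity concrete, here is a small Python sketch that counts the distinct subwords of a given length in a finite prefix. The helper names and the Fibonacci morphism used as test data are assumptions made for this illustration only. Counting inside a finite prefix gives a lower bound for the complexity of the infinite sequence, which is exact once the prefix is long enough to contain every subword of that length.

```python
def fibonacci_prefix(length: int) -> str:
    """Prefix of the fixed point of the morphism a -> ab, b -> a."""
    phi = {"a": "ab", "b": "a"}
    word = "a"
    while len(word) < length:
        word = "".join(phi[c] for c in word)
    return word[:length]

def subword_complexity(prefix: str, n: int) -> int:
    """Number of distinct subwords of length n occurring in `prefix`."""
    return len({prefix[i:i + n] for i in range(len(prefix) - n + 1)})

prefix = fibonacci_prefix(2000)
# For the Fibonacci word the complexity is known to be n + 1.
print([subword_complexity(prefix, n) for n in range(1, 7)])
```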

Let $\Sigma$ be a finite alphabet, $\varphi\colon\Sigma^*\to\Sigma^*$ be a morphism, $\psi\colon\Sigma^*\to\Sigma^*$ be a coding, $a\in\Sigma$ be a letter such that $\varphi(a)$ starts with $a$, $\alpha=\varphi^\infty(a)$ be the pure morphic sequence generated by $\varphi$ from $a$, and $\beta=\psi(\alpha)$ be a morphic sequence. By Theorem 7.7.1 from [1] every morphic sequence can be generated by a nonerasing morphism, so from now on we assume that $\varphi$ is nonerasing. To prove Theorem 1.1, we will first replace $\varphi$ by a power of it that has better properties, see Section 3. It is already clear from the definition of a pure morphic sequence that if we replace $\varphi$ by its power, then $\alpha$ and $\beta$ will not change.
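The remark that passing to a power of the morphism does not change the generated sequence can be checked experimentally. The sketch below is only an illustration under our own naming; it assumes that the iterates of the starting letter grow, so that the loop terminates.

```python
def compose_power(phi: dict, n: int) -> dict:
    """Return the n-th power of the morphism phi, as a dictionary on letters."""
    result = {c: c for c in phi}          # zeroth power: the identity
    for _ in range(n):
        # One more application of phi to every current image.
        result = {c: "".join(phi[x] for x in w) for c, w in result.items()}
    return result

def prefix_of_fixed_point(phi: dict, a: str, length: int) -> str:
    """First `length` letters of the sequence generated by phi from a."""
    word = a
    while len(word) < length:
        word = "".join(phi[c] for c in word)
    return word[:length]

# Hypothetical example: the Fibonacci morphism and its third power.
phi = {"a": "ab", "b": "a"}
phi3 = compose_power(phi, 3)
same = prefix_of_fixed_point(phi, "a", 50) == prefix_of_fixed_point(phi3, "a", 50)
print(phi3, same)
```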

Possibly, we will also add several (at most two) "new" letters to $\Sigma$ so that $\varphi$ and $\psi$ will be defined on the "old" letters as previously, and will map the "new" letters to the "new" letters only. This will not modify $\alpha$ and $\beta$, and the only reason why we do that is that this simplifies the formulations of some statements. For example, we may want to say that a (finite) subword $\gamma$ of $\alpha$ can be written as a finite word $\lambda$ repeated several times, where $\lambda$ belongs to a finite set $\Lambda$ fixed in advance. In a particular case it can turn out that $\gamma$ is the empty word, and then it can be written as any word repeated zero times. However, to ease the formulation of this statement, it is convenient to know that the set $\Lambda$ where we are allowed to take $\lambda$ from is nonempty, even if all letters of all possible words $\lambda$ are not present in $\alpha$ at all.

To prove Theorem 1.1, we will have to develop some "structure theory" of pure morphic and morphic sequences (see Sections 4–7). We will introduce and study the notions of a letter of order $k$, of a $k$-block, of a $k$-multiblock, of a stable $k$-(multi)block, of an evolution, and of a continuously periodic evolution. Actually, these notions will be defined correctly only after we replace $\varphi$ with $\varphi^n$ for an appropriate $n$ and possibly add several letters to $\Sigma$ as explained in Section 3 (more precisely, if $\varphi$ is a strongly 1-periodic morphism with long images, and if $\Sigma$ contains at least one periodic letter of order 1 and at least one periodic letter of order 2). These studies of the structure of pure morphic and morphic sequences may be of independent interest.

Using these notions, we can formulate the following two propositions, which the proof of Theorem 1.1 is based on:

###### Proposition 1.2.

Let . If is a letter such that for some , and there are evolutions of -blocks arising in that are not continuously periodic, then the subword complexity of is .

###### Proposition 1.3.

Let . If is a letter of order at least such that for some , and all evolutions of -blocks arising in are continuously periodic, then the subword complexity of is .

However, these two propositions do not cover all cases needed to prove Theorem 1.1. This is not clear right now, before we give the definitions, but, for example, if is a letter of order , where , such that for some , and evolutions of -blocks that are not continuously periodic do not exist (as we will see later, in this case evolutions of -blocks do not exist at all), then Proposition 1.2 does not give us any upper estimate, and we cannot use Proposition 1.3 either, because if we want to use it for -blocks, has to be a letter of order at least . Also, Propositions 1.2 and 1.3 do not say anything about complexity . The following three propositions will help us to prove Theorem 1.1 in these cases:

###### Proposition 1.4.

Let . Suppose that is a strongly 1-periodic morphism with long images and is a letter of order such that for some . Suppose that all evolutions of -blocks arising in are continuously periodic.

Let be the rightmost letter of order in , and let be the rightmost letter of order in .

If there exists a final period such that is a completely -periodic word with period , then the subword complexity of is , otherwise it is .

###### Proposition 1.5.

If is a letter of order 2 such that for some , then the subword complexity of is .

###### Proposition 1.6.

Let . Let be a letter of order such that for some , and let . Suppose that if is a letter of finite order and occurs in , then . Suppose that all evolutions of -blocks arising in are continuously periodic.

Then the subword complexity of is .

## 2 Preliminaries

When we speak about finite words or about words infinite to the right, their letters are enumerated by nonnegative integer indices (starting from 0). The length of a finite word $\gamma$ is denoted by $|\gamma|$.

We will speak about occurrences in $\alpha$. Strictly speaking, we call a pair of a word $\gamma$ and a location $i$ in $\alpha$ an occurrence if the subword of $\alpha$ that starts from position $i$ in $\alpha$ and is of length $|\gamma|$ is $\gamma$. This occurrence is denoted by $\alpha_{i\ldots j}$ if $j$ is the index of the last letter that belongs to the occurrence. In particular, $\alpha_{i\ldots i}$ denotes a single-letter occurrence, and $\alpha_{i\ldots i-1}$ denotes an occurrence of the empty word between the $(i-1)$-th and the $i$-th letters. Since $\alpha=\varphi^\infty(a)=\varphi(\varphi^\infty(a))=\varphi(\alpha)$, $\varphi$ might be considered either as a morphism on words (which we sometimes call abstract words), or as a mapping on the set of occurrences in $\alpha$. Usually we speak of the latter, unless stated otherwise. Sometimes we write $\varphi^0$ for the identity morphism.

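A finite word $\gamma$ is called a prefix of a (finite or infinite to the right) word $\delta$ if $\delta_{0\ldots|\gamma|-1}=\gamma$. A finite word $\gamma$ is called a suffix of a finite word $\delta$ if $\delta_{|\delta|-|\gamma|\ldots|\delta|-1}=\gamma$.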

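We call a finite word $\gamma$ weakly $p$-periodic with a left (resp. right) period $\lambda$ (where $p\in\mathbb{N}$) if $|\lambda|=p$ and $\gamma=\lambda\lambda\ldots\lambda\lambda_{0\ldots r-1}$ (resp. $\gamma=\lambda_{p-r\ldots p-1}\lambda\ldots\lambda\lambda$), where $r$ is the remainder of $|\gamma|$ modulo $p$; $|\gamma|<p$ is allowed here. We shortly say "a weakly left (resp. right) $\lambda$-periodic word" instead of "a weakly $|\lambda|$-periodic word with left (resp. right) period $\lambda$". The period $\lambda$ will always be considered as an abstract word. The subword $\lambda_{0\ldots r-1}$ (resp. $\lambda_{p-r\ldots p-1}$) is called the incomplete occurrence. The same terminology applies to sequences of symbols or numbers. If $r=0$, then $\gamma$ is called completely $p$-periodic with period $\lambda$ (which is both a left period and a right period in this case, so we sometimes call it a complete period). Again, we shortly say "a completely $\lambda$-periodic word" instead of "a completely $|\lambda|$-periodic word with period $\lambda$".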
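The three kinds of periodicity can be tested mechanically. The following Python sketch is one possible reading of the definitions, with function names of our own choosing: a left period is matched from the left end of the word, a right period from the right end, and complete periodicity additionally requires the length of the word to be a multiple of the length of the period.

```python
def is_weakly_periodic_left(word: str, period: str) -> bool:
    """word = period ... period followed by a prefix of period."""
    p = len(period)
    return p > 0 and all(word[i] == period[i % p] for i in range(len(word)))

def is_weakly_periodic_right(word: str, period: str) -> bool:
    """word = a suffix of period followed by period ... period."""
    p, n = len(period), len(word)
    return p > 0 and all(
        word[n - 1 - i] == period[p - 1 - (i % p)] for i in range(n)
    )

def is_completely_periodic(word: str, period: str) -> bool:
    """word is period repeated a whole number of times (possibly zero)."""
    return is_weakly_periodic_left(word, period) and len(word) % len(period) == 0

# Hypothetical example: "abcabcab" has left period "abc" and right period "cab".
print(is_weakly_periodic_left("abcabcab", "abc"),
      is_weakly_periodic_right("abcabcab", "cab"),
      is_completely_periodic("abcabc", "abc"))
```

Note that the empty word passes all three tests for any nonempty period, in agreement with the convention that it is any word repeated zero times.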

Clearly, a weakly -periodic word with some left period always is also weakly -periodic with some right period, and these periods are cyclic shifts of each other. So, we introduce some notation for cyclic shifts. If is a finite word and , we denote the cyclic shift of that begins with the last letters of and ends with the first letters of by . In other words, . If and is the residue of modulo , we denote . In particular, if , then , in other words, is the cyclic shift of that begins with the last letters of and ends with the first letters of .
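A cyclic shift in the sense described above (the shifted word begins with the last $k$ letters of the original one, with $k$ taken modulo the length) can be sketched in Python as follows. The function name is ours. The example at the end checks, on one hypothetical word, the observation that the right period of a weakly periodic word is a cyclic shift of its left period.

```python
def cyclic_shift(word: str, k: int) -> str:
    """Cyclic shift of `word` that begins with its last k letters.

    k may be any integer; it is reduced modulo len(word).
    """
    if not word:
        return word
    k %= len(word)
    return word[len(word) - k:] + word[:len(word) - k]

# "abcabcab" has left period "abc"; its last three letters form a right period.
word, left_period = "abcabcab", "abc"
right_period = word[-len(left_period):]
print(cyclic_shift("abcde", 2), right_period == cyclic_shift(left_period, -len(word)))
```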

We widely use the following easy properties of periods and cyclic shifts:

###### Remark 2.1.

1. If , then .

2. If a finite word is weakly -periodic with left period , where is a word of length , then is also weakly -periodic with right period .

3. If a finite word is weakly -periodic with right period , where is a word of length , then is also weakly -periodic with left period .

4. If is a word of length , two finite words and are weakly -periodic, and (resp. ) is weakly -periodic with right (resp. left) period , then the concatenation is weakly -periodic with left period and is also weakly -periodic with right period .

The following lemma, informally speaking, shows that if we know that a finite word $\gamma$ is "long enough" and is weakly $p$-periodic for some $p$, which may itself be unknown to us, but we know that $p$ is "small enough", then these data determine $p$ and the left, right or complete period uniquely.

###### Lemma 2.2.

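Let $\gamma$ be a finite word. Suppose that $\gamma$ is weakly $p$-periodic with a left period $\lambda$ and is weakly $q$-periodic with a left period $\mu$ at the same time. Suppose also that $2p\le|\gamma|$ and $2q\le|\gamma|$. Then there exists a finite word $\nu$ such that $\lambda$ is $\nu$ repeated $k$ times and $\mu$ is $\nu$ repeated $l$ times for some $k,l\in\mathbb{N}$.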

###### Proof.

If , then the statement is obvious. Otherwise, without loss of generality we may suppose that . Then .

Note that the fact that is weakly -periodic can be written as follows: for all one has . Let us prove that is weakly -periodic with a left period . Choose an index , . If , then (since is weakly -periodic) and (since is weakly -periodic). If , then since , so . So .

Note that if is divisible by , then the claim is also clear. Otherwise set , and write , where . If we repeat the argument above times, we will see that is weakly -periodic with a left period .

Finally, we write Euclid algorithm for and :

.
If we repeat all arguments above for each of the pairs , we will finally see that is weakly -periodic with a left period , where is the g. c. d. of and . In particular, since is also weakly -periodic with left period and , this also means that is repeated times. Similarly, is repeated times. ∎
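The statement of Lemma 2.2 can be checked by brute force on short words. The sketch below assumes that the length hypotheses of the lemma read $2p\le|\gamma|$ and $2q\le|\gamma|$ (this reading is our assumption), and verifies for all binary words up to a given length that two left periods of lengths $p$ and $q$ are powers of a common word of length $\gcd(p,q)$. It is an experiment, not a substitute for the proof.

```python
from itertools import product
from math import gcd

def has_left_period(word: str, p: int) -> bool:
    """True if word[i] == word[i - p] whenever both indices are valid."""
    return all(word[i] == word[i - p] for i in range(p, len(word)))

def check_common_root(max_len: int = 10, alphabet: str = "ab") -> bool:
    """Exhaustively test the common-root property on all short words."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            for p in range(1, n // 2 + 1):          # assumed hypothesis 2p <= n
                for q in range(1, p + 1):           # assumed hypothesis 2q <= n
                    if has_left_period(word, p) and has_left_period(word, q):
                        g = gcd(p, q)
                        root = word[:g]
                        if word[:p] != root * (p // g) or word[:q] != root * (q // g):
                            return False
    return True

print(check_common_root())
```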

The same lemma for right periods instead of left ones can be proved in exactly the same way. Now that we have this lemma, it is reasonable to give the following definition. A finite word $\lambda$ is called the minimal left (resp. right) period of a finite word $\gamma$ if $2|\lambda|\le|\gamma|$, $\gamma$ is weakly left (resp. right) $\lambda$-periodic, and $\gamma$ is not weakly $q$-periodic if $q<|\lambda|$. The following corollary provides more properties of the minimal periods if they exist.

###### Corollary 2.3.

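Let $\gamma$ be a finite word. If there exists $p$ such that $\gamma$ is weakly $p$-periodic and $2p\le|\gamma|$, then there exist minimal left and right periods of $\gamma$.

If $\lambda$ is the minimal left (resp. right) period of $\gamma$, and $\gamma$ is weakly $p$-periodic with left (resp. right) period $\mu$, where $2p\le|\gamma|$, then $p$ is divisible by $|\lambda|$ and $\mu$ is $\lambda$ repeated $p/|\lambda|$ times.∎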

A similar statement in the case of complete $p$-periodicity follows directly since a word is completely $p$-periodic exactly if it is weakly $p$-periodic and its length is divisible by $p$. A finite word $\lambda$ is called the minimal complete period of a finite word $\gamma$ if $2|\lambda|\le|\gamma|$, $\gamma$ is completely $\lambda$-periodic, and $\gamma$ is not weakly $q$-periodic if $q<|\lambda|$.

###### Corollary 2.4.

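Let $\gamma$ be a finite word. If there exists $p$ such that $\gamma$ is completely $p$-periodic and $2p\le|\gamma|$, then there exists a minimal complete period of $\gamma$.

If $\lambda$ is the minimal complete period of $\gamma$, and $\gamma$ is weakly $p$-periodic with left (resp. right) period $\mu$, where $2p\le|\gamma|$, then $p$ is divisible by $|\lambda|$ and $\mu$ is $\lambda$ repeated $p/|\lambda|$ times.∎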

###### Corollary 2.5.

Let be a finite word, let and be two occurrences in . Suppose that is weakly -periodic, and is weakly -periodic. Suppose also that these two occurrences overlap, and their intersection (denote it by ) has length at least . In other words, , , and .

Then the union of these two occurrences (i. e. the occurrence , where and ) is a weakly -periodic word.

###### Proof.

Without loss of generality, . Then and . Let be the left period of (so that ), and let be the left period of (so that ). Denote the residue of modulo by . Then, if we write as repeated several times, will be . Moreover, becomes a weakly -periodic word with left period . Since is a prefix of , is also a weakly -periodic word with left period . Now, by Lemma 2.2, there exists a word of length such that is repeated times and is repeated times. But then can also be written as a cyclic shift of repeated times.

Now, since is weakly -periodic with left period , it is also weakly -periodic. Since is weakly -periodic with left period , it is also weakly -periodic. In other words, if and are two indices such that and , then as an abstract letter. Also, if and are two indices such that and , then again as an abstract letter.

If , we are done. Otherwise and , and we have . So, if , but , then , , and anyway. Hence, is weakly -periodic. ∎

Note that in the last computation an inequality instead of would be enough, but we cannot replace with in the statement of the corollary, because we also need the inequality in Lemma 2.2, and there it cannot be a priori replaced by .

An infinite word (where ) is called periodic with a period (where , ) if , in other words, if for all , . An infinite word (where ) is called eventually periodic with a period (where , ) and a preperiod (where , ) if , in other words, if for and for all , .

Sometimes we will also speak about words infinite to the left. We enumerate indices in such words by nonpositive indices, i. e. such a word can be written as (where , ). Such a word is called periodic with a period (where , ) if , in other words, if for all , .

## 3 Periodicity properties of morphisms

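For each letter $a\in\Sigma$, the function $r_a\colon\mathbb{N}\to\mathbb{N}$, $r_a(n)=|\varphi^n(a)|$, is called the growth rate of $a$. Let us define orders of letters with respect to $\varphi$. We say that $a$ has order $k$ if $r_a(n)=\Theta(n^{k-1})$, and has order $\infty$ if $r_a(n)=\Omega(q^n)$ for some $q>1$ ($q\in\mathbb{R}$).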
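Growth rates can be computed without building the words themselves, since the length of the image of a letter after one more application of the morphism is the sum of the current lengths of the images of the letters of its image. The Python sketch below uses our own function name and two hypothetical morphisms. Under the assumption that order $k$ corresponds to polynomial growth of degree $k-1$, it shows a bounded letter (order 1), a linearly growing letter (order 2), and an exponentially growing letter (order $\infty$).

```python
def growth(phi: dict, a: str, steps: int) -> list:
    """Return the lengths of the first `steps` + 1 iterates of phi on the letter a."""
    lengths = {c: 1 for c in phi}               # zeroth iterate of c is c itself
    values = [lengths[a]]
    for _ in range(steps):
        # The next iterate of c is the current iterate applied to each letter of phi(c).
        lengths = {c: sum(lengths[x] for x in phi[c]) for c in phi}
        values.append(lengths[a])
    return values

linear = {"a": "ab", "b": "b"}      # b is bounded, a grows linearly
doubling = {"c": "cc"}              # c grows exponentially
print(growth(linear, "b", 3), growth(linear, "a", 4), growth(doubling, "c", 3))
```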

Consider a directed graph defined as follows. Vertices of are letters of . For every , for each occurrence of in , construct an edge . For instance, if , we construct two edges and three edges . Fig. 1 shows an example of graph .
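The construction of this graph is easy to express in code. In the sketch below (our own naming, with a hypothetical morphism as input), the multigraph is stored as a dictionary that maps each letter to a counter of its outgoing edges, so that the multiplicity of an edge equals the number of occurrences of the target letter in the image of the source letter.

```python
from collections import Counter

def letter_graph(phi: dict) -> dict:
    """One edge from a to b for every occurrence of b in phi(a)."""
    return {a: Counter(image) for a, image in phi.items()}

# Hypothetical example: the image of a contains a twice and b three times.
graph = letter_graph({"a": "abbab", "b": "b"})
print(graph["a"]["a"], graph["a"]["b"], graph["b"]["b"])
```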

Using the graph , let us prove the following lemma.

###### Lemma 3.1.

For every , either has some order , or has order . If is a letter of order , then contains at least one letter of order . For every of order , either never appears in (and then is called preperiodic), or for each a unique letter of order occurs in , and the sequence is periodic (then is called periodic).

If is a letter of order , then contains at least one letter of order , and contains at least two letters of order if is large enough.

If is a periodic letter of order and occurs in , then at least one letter of order occurs in .

###### Proof.

Consider also the following graph . Vertices of are strongly connected components of . There is an edge from to iff there is an edge from some of vertices (in ) to some of vertices. Fig. 2 shows an example of the corresponding graph .

Let be the subgraph of induced by vertices such that for all vertices there is at most one edge outgoing from to a vertex . Let be the subgraph of induced by vertices such that for all vertices there are no edges outgoing from to a vertex . In Fig. 2 and 1 the vertices of (resp. the corresponding vertices of ) are black, the vertices of (resp. the corresponding vertices of ) are gray, and the vertices of (resp. the corresponding vertices of ) are white. We will now assign orders (natural numbers or infinity) to the vertices of (hence, to the vertices of too).

A vertex is called a vertex of order one if it does not have outgoing edges (in , not in ). Then assign order one to the vertices (if any) of graph that have outgoing edges to the vertices that are already of order one only. Repeat this operation until there are no new vertices of order one.

Suppose some vertices already are of order (and we don’t want to assign order to any other vertex of ). Then a vertex is called vertex of order if all the edges outgoing from it lead to vertices of order or less. Then, consider a vertex that has not been currently assigned to be of some order. If all its outgoing edges lead to vertices of orders , assign to be of order . Repeat this operation until there are no new vertices of order .

All vertices that currently have no order assigned (after completing the above procedure for each ), are called vertices of order .

We have assigned orders to the vertices of , hence also to the vertices of (that are the letters of ). It follows directly from the definition of the order of a vertex that if is a vertex of order , then there is an edge going from to (possibly another) vertex of order . One can prove by induction on that

Any letter of finite order $k$ has the rate of growth $\Theta(n^{k-1})$. Any letter of infinite order has the rate of growth $\Omega(q^n)$ for some $q>1$.

Thus, two definitions of the order of a letter are equivalent.

Vertices of of order are exactly the vertices of such that there exists a path from to a vertex . It is already clear that if is a letter of order , then contains a letter of order . To prove that if is large enough, then contains at least two letters of order , we may assume without loss of generality that already belongs to a strongly connected component of such that . Then there exists a vertex such that there are at least two edges leading from to vertices of in . This means that contains at least two letters of order , and contains for some . Then contains at least two letters of order if .

A vertex of of finite order is called preperiodic if it actually belongs to , otherwise it is called periodic. A vertex of (i. e. a letter) is called periodic (resp. preperiodic) iff the corresponding vertex of is periodic (resp. preperiodic). If is a periodic vertex of order , it has exactly one outgoing edge to a vertex of order . These two vertices correspond to the same vertex , and all vertices of that correspond to (i. e. that belong to the strongly connected component ) actually form a directed loop. Unlike that, any edge that starts in a preperiodic vertex of order , leads to a vertex that had been assigned to be of some order before . Hence, this definition of a periodic letter and the definition from the lemma statement are equivalent.

To prove the last claim, observe that if is a periodic letter of order , then there must be an edge going from to a vertex of order , otherwise we would have assigned to be a vertex of order or less. Therefore, there is a vertex corresponding to such such that there is an edge going from to a vertex of of order . In other words, contains a letter of order . Let be (possibly another) vertex of corresponding to . Then we already know that and are contained in a directed loop in . If occurs in , then is divisible by the length of this loop, hence is greater than or equal to the length of this loop, and there exists () such that contains . Then contains a letter of order , and contains . The image of a letter of order always contains a letter of order , so a letter of order occurs in and hence in . ∎

In the example of a graph in Fig. 1, and are vertices of order one. We cannot assign any other vertex to be of order one, so we assign then to be of order two. It is a periodic vertex. Then we can see that has a single outgoing edge, and it leads to . Thus, should be a preperiodic vertex of order two. The remaining vertex cannot be of finite order since it does not belong to . It is a vertex of order .
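The assignment of orders can be sketched in Python. The code below is not the labeling procedure of the proof verbatim, but a reformulation under our own assumptions: the morphism is nonerasing; a letter has infinite order if it can reach a strongly connected component in which some vertex has at least two edges staying inside the component; a letter outside every cycle (preperiodic) inherits the largest order among the letters of its image; and a letter on a simple cycle (periodic) has order one more than the largest order of the letters reached by edges leaving the cycle. The example morphism is hypothetical and is chosen to exhibit all four situations.

```python
from math import inf

def reachable(phi: dict, start: str) -> set:
    """Letters reachable from `start` by a path with at least one edge."""
    seen, stack = set(), list(set(phi[start]))
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(set(phi[x]))
    return seen

def letter_orders(phi: dict) -> dict:
    reach = {a: reachable(phi, a) for a in phi}
    # Strongly connected component of a; empty if a lies on no cycle.
    comp = {a: {b for b in reach[a] if a in reach[b]} for a in phi}

    def branching(a: str) -> bool:
        # Some vertex of the component has two or more edges inside it.
        return any(sum(1 for x in phi[b] if x in comp[a]) >= 2 for b in comp[a])

    order = {}

    def solve(a: str):
        if a in order:
            return order[a]
        if any(branching(b) for b in reach[a] | {a}):
            order[a] = inf                       # exponential growth
        elif not comp[a]:
            # Preperiodic letter: the order comes from the letters of phi(a).
            order[a] = max(solve(x) for x in phi[a])
        else:
            # Periodic letter: look at the edges leaving its cycle.
            outside = [solve(x) for b in comp[a] for x in phi[b] if x not in comp[a]]
            order[a] = 1 + max(outside, default=0)
        return order[a]

    for a in phi:
        solve(a)
    return order

# b: periodic of order 1, c: periodic of order 2,
# d: preperiodic of order 2, e: infinite order.
example = {"b": "b", "c": "cb", "d": "c", "e": "eed"}
print(letter_orders(example))
```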

In general, it is possible that all letters in $\Sigma$ have order $\infty$. However, it will be convenient for us if at least one periodic letter of order 1 and at least one periodic letter of order 2 exist. So, first, if periodic letters of order 1 do not exist in $\Sigma$ (then it follows from the construction above that all letters in $\Sigma$ have order $\infty$), we add one more letter (that we temporarily denote by $b$) to $\Sigma$ and set $\varphi(b)=b$, $\psi(b)=b$ (without changing $\varphi$ and $\psi$ on other letters). Then $b$ is a periodic letter of order 1. From now on, we suppose that periodic letters of order 1 exist in $\Sigma$.

Second, suppose that periodic letters of order 1 exist in , but periodic letters of order 2 do not exist (it follows from the above construction that in this case all letters in have either order 1, or order ). Let be a periodic letter of order 1. We add one more letter to (denote it temporarily by ) and set , (again, we do not change and on other letters). Then is a periodic letter of order 2. From now on, we suppose that periodic letters of order 2 exist in .

Now we are going to replace $\varphi$ by $\varphi^n$ for some $n\in\mathbb{N}$ to get a morphism satisfying better properties. Namely, first let us call a nonerasing morphism $\varphi$ weakly 1-periodic if:

1. If $a$ is a preperiodic letter of order $k$, then all letters of order $k$ in $\varphi(a)$ are periodic.

2. If $a$ is a periodic letter of order $k$, then the letter of order $k$ contained in $\varphi(a)$ is $a$.

We would like to choose $n$ so that $\varphi^n$ is a weakly 1-periodic morphism. Note first that the orders of letters with respect to $\varphi^n$ are the same as their orders with respect to $\varphi$. Periodic and preperiodic letters with respect to $\varphi$ remain periodic and preperiodic (respectively) with respect to $\varphi^n$. If the first letter in $\varphi(a)$ is $a$ for some $a\in\Sigma$, then $\varphi^n(a)$ begins with $a$ as well, and $(\varphi^n)^\infty(a)=\varphi^\infty(a)$.

###### Lemma 3.2.

There exists $n\in\mathbb{N}$ such that $\varphi^n$ is a weakly 1-periodic morphism.

###### Proof.

If is a preperiodic letter of order , then does not contain for any . Therefore, there exists such that if , then all letters of order in are periodic. Take any such that for all these numbers for all preperiodic letters of finite order. (Clearly, is sufficient, in the example above we can take .) Set . If is a preperiodic letter of order , all letters of order in are periodic, and all letters of order in are also periodic. So, now it is sufficient to choose so that if is a periodic letter of order , then the letter of order occurring in is again. By the definition of a periodic letter, for each individual periodic letter there exists such that if is divisible by , then the letter of order contained in is . Now let us take divisible by all numbers for all periodic letters . (E. g., we can always take , and in the example above we can take .) Then is a weakly 1-periodic morphism. ∎

From now on, we replace by from the proof and assume that is a weakly 1-periodic morphism.

Actually, we want to improve more. For each and for each letter of order , the leftmost and rightmost letters of order in will be important for us. If and is a finite word in containing at least one letter of order , denote the leftmost (resp. rightmost) letter of order in by (resp. by ). Observe that if , then since if is a letter of order or less, then consists of letters of order or less only. Hence, is an eventually periodic sequence. Similarly, is also an eventually periodic sequence. We want to make these sequence as simple as possible, so we call a morphism strongly 1-periodic if for each and for each letter of order , one has and , in other words, the sequences and are both eventually periodic with periods of length one and preperiods of length 1.

Observe that the definition of a weakly 1-periodic morphism guarantees that if is a letter of finite order, then these sequences have periods of length 1, but we cannot say anything about the length of the preperiods. Also, we cannot say anything about the length of the period if all letters have order .

###### Lemma 3.3.

There exists $n\in\mathbb{N}$ such that $\varphi^n$ is a strongly 1-periodic morphism.

###### Proof.

The proof is similar to the proof of the previous lemma. Namely, if is large enough, then for sequences and are eventually periodic with preperiods of length 1 for all and for all letters of order . Again, is sufficient for this purpose.

Now, if we take a large enough and set , then the sequences and will become eventually periodic with periods of length 1 for all and for all letters of order . This time, is sufficient. Clearly, the preperiods of length 1 will remain the same. ∎

From now on, we replace by from the proof and assume that is strongly 1-periodic.

Our final improvement of the morphism will guarantee that the image of each letter is ”sufficiently long”. Namely, first we are going to define the set of final periods. Let be a letter such that . Then the prefix of to the left of the leftmost occurrence of in consists of letters of order 1 only, denote it by . That is, if and for , then . Suppose that is nonempty. Then consists of periodic letters of order 1 only. Recall that to construct a morphic sequence, we use and also a coding . Consider the word . Since we have repeated twice, we can apply Corollary 2.4 and conclude that there exists the minimal complete period of , denote it by . We call , as well as all its cyclic shifts, final periods. Similarly, we can define a final period using a letter such that and considering the suffix of to the right of the rightmost occurrence of . These are all words we call final periods, i. e. a final period is a word obtained from a letter such that and by the procedure described above or a word obtained from a letter such that and does not end with by a similar procedure.
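The step that extracts a minimal complete period from a word repeated twice can be illustrated in Python. The sketch below uses our own function names and computes the shortest word whose repetition a whole number of times gives the input. We assume that, for a word written twice in a row, this coincides with the minimal complete period in the sense of Corollary 2.4. A word that is its own shortest such root cannot be written as a shorter word repeated more than once.

```python
def minimal_complete_period(word: str) -> str:
    """Shortest prefix whose repetition a whole number of times gives `word`."""
    n = len(word)
    for p in range(1, n + 1):
        if n % p == 0 and word == word[:p] * (n // p):
            return word[:p]
    return word  # only reached for the empty word

def is_primitive(word: str) -> bool:
    """True if `word` is nonempty and is not a shorter word repeated."""
    return len(word) > 0 and minimal_complete_period(word) == word

# Hypothetical coded word "abab", written twice as in the construction above.
coded = "abab"
period = minimal_complete_period(coded + coded)
print(period, is_primitive(period), is_primitive(coded))
```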

###### Lemma 3.4.

If is a final period, then cannot be written as a finite word repeated more than once.

###### Proof.

Since is a final period, there exists a finite word (which is also a final period) and a finite word such that is the minimal complete period of and for some ().

Suppose that can be written as a finite word repeated more than once, in other words, that is a completely -periodic word and . But then is a completely -periodic word, where . Then the word , which is repeated several times, is also a completely -periodic word. But , and this is a contradiction with the fact that is the minimal complete period of . ∎

Note that final periods always exist if $\varphi$ is a strongly 1-periodic morphism and there is a periodic letter of order 2 in $\Sigma$ (we have already assumed that this is true). Indeed, if $a$ is a periodic letter of order 2, then $\varphi(a)$ contains exactly one occurrence of a letter of order 2, which is $a$, and at least one letter of order 1. In other words, $\varphi(a)$ can be written as $\gamma a\delta$, where the words $\gamma$ and $\delta$ consist of letters of order 1 only, and at least one of these words is nonempty. We can use this nonempty word to construct a final period.

Clearly, the number of final periods is finite and their lengths are bounded. Denote the maximal length of a final period by .

###### Lemma 3.5.

Let . Then the sets of final periods for and for are the same.

If (resp. ) is a prefix (resp. suffix) of , where and is a finite word consisting of letters of order 1 only, then (resp. ), where is repeated times, is a prefix (resp. suffix) of .

###### Proof.

Choose a letter such that . (The case is completely symmetric.) Suppose that and denote the prefix of to the left of the leftmost occurrence of by . Then is a prefix of and consists of letters of order 1 only. Let us prove by induction on that , where is repeated times, is a prefix of . For we already know this. If , where is repeated times, is a prefix of , then , where is repeated times, is a prefix of . Recall that is a prefix of , so , where is repeated times, is also a prefix of