Finite-State Complexity and the Size of Transducers
Abstract
Finite-state complexity is a variant of algorithmic information theory obtained by replacing Turing machines with finite transducers. We consider the state-size of transducers needed for minimal descriptions of arbitrary strings and, as our main result, show that the state-size hierarchy with respect to a standard encoding is infinite. We also consider hierarchies yielded by more general computable encodings.
Keywords: finite transducers, descriptional complexity, state-size hierarchy, computability
1 Introduction
Algorithmic information theory [7, 5] uses the minimal size of a Turing machine that outputs a string as a descriptional complexity measure. The theory has produced many elegant and important results; however, a drawback is that all variants of descriptional complexity based on various types of universal Turing machines are incomputable. Descriptional complexity defined by resource-bounded Turing machines has been considered in [4], while at the other end of the spectrum lie models based on context-free grammars or finite automata.
Grammar-based complexity measures the size of the smallest context-free grammar generating a single string. This model has been investigated since the 1970s, and recently there has been renewed interest due to applications in text compression and connections with Lempel-Ziv codings, see e.g. [12, 13]; a general overview of this area can be found in [11]. The automatic complexity of a string $w$ [17] is defined as the smallest number of states of a DFA (deterministic finite automaton) that accepts $w$ and does not accept any other string of length $|w|$. Note that a DFA recognizing the singleton language $\{w\}$ always needs more than $|w|$ states, which is the reason the definition considers only strings of length $|w|$. Automaticity [1, 16] is an analogous descriptional complexity measure for languages. The finite-state dimension is defined in terms of computations of finite transducers on infinite sequences, see e.g. [2, 9].
The NFA (nondeterministic finite automaton) based complexity of a string [8] can also be viewed as being defined in terms of finite transducers that are called “NFAs with advice” in [8]. However, the model allows the advice strings to be over an arbitrary alphabet with no penalty in terms of complexity and, as observed in [8], consequently the NFAs used for compression can always be assumed to consist of only one state.
The finite-state complexity of a finite string $x$ was introduced recently [6] in terms of a finite transducer $T$ and a string $p$ such that the transducer $T$ on input $p$ outputs $x$. Due to the nonexistence of universal transducers, the size of the transducer is included as part of the descriptional complexity measure. We get different variants of the measure by using different encodings of the transducers.
In our main result we establish that the measure results in a rich hierarchy in the sense that there is no a priori upper bound for the number of states used by transducers in minimal descriptions of given strings. The result applies to our standard encoding, as well as to any other “reasonable” encoding where a transducer is encoded by listing the productions in some uniform way.
By the state-size hierarchy we refer to the hierarchy of languages $L_n$, $n \ge 1$, consisting of strings where a minimal description uses a transducer with at most $n$ states. We show that the state-size hierarchy with respect to the standard encoding is infinite; however, it remains an open question whether the hierarchy is strict at every level.
In a more general setting, the definition of finite-state complexity [6] allows an arbitrary computable encoding of the transducers, and properties of the state-size hierarchy depend significantly on the particular encoding. We establish that, for suitably chosen computable encodings, every level of the state-size hierarchy can be strict.
2 Preliminaries
If $X$ is a finite set then $X^*$ is the set of all strings (words) over $X$, with $\varepsilon$ denoting the empty string. The length of $x \in X^*$ is denoted by $|x|$. We use $\subset$ to denote strict set inclusion.
For all unexplained notions concerning transducers we refer the reader to [3, 18]. In the following, by a transducer we mean a (left) sequential transducer [3], also called a deterministic generalised sequential machine [18], where both the input and output alphabets are $\{0,1\}$. The set of all transducers is $\mathcal{T}$.
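A transducer is denoted as a triple $T = (Q, q_0, \Delta)$ where $Q$ is the finite set of states, $q_0 \in Q$ is the start state (all states of $T$ are considered to be final), and

$$\Delta : Q \times \{0,1\} \to Q \times \{0,1\}^* \qquad (1)$$

is the transition function. When a transducer is represented as a figure, each transition $\Delta(q, a) = (p, w)$, $q, p \in Q$, $a \in \{0,1\}$, $w \in \{0,1\}^*$, is represented by an arrow with label $a/w$ from state $q$ to state $p$, and $a$ (respectively, $w$) is called the input (respectively, output) label of the transition. By the (state) size of $T$, $\mathrm{size}(T)$, we mean the number of states in the set $Q$.

The function $\{0,1\}^* \to \{0,1\}^*$ computed by the transducer $T$ is, by slight abuse of notation, also denoted by $T$ and defined by $T(\varepsilon) = \varepsilon$, $T(xa) = T(x) \cdot \pi_2(\Delta(\hat{\delta}(q_0, x), a))$, for $x \in \{0,1\}^*$, $a \in \{0,1\}$. Here $\pi_i$, $i = 1, 2$, are the two projections on $Q \times \{0,1\}^*$, and $\hat{\delta} : Q \times \{0,1\}^* \to Q$ is defined by $\hat{\delta}(q, \varepsilon) = q$, $\hat{\delta}(q, xa) = \pi_1(\Delta(\hat{\delta}(q, x), a))$, $q \in Q$, $x \in \{0,1\}^*$, $a \in \{0,1\}$.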
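The recursive definition of the function computed by a transducer can be unfolded into a simple loop. The following is a minimal Python sketch, assuming the transition function is stored as a dictionary from (state, input bit) to (target state, output word), with states named 1, ..., |Q| and start state 1; the names `Delta`, `delta_hat` and `run` are hypothetical and not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical representation of (1): (state, input bit) -> (target state, output word).
# States are named 1..|Q|, the start state is 1, and every state is final.
Delta = Dict[Tuple[int, str], Tuple[int, str]]


def delta_hat(delta: Delta, q: int, x: str) -> int:
    """Extended state function: the state reached from q after reading x."""
    for a in x:
        q = delta[(q, a)][0]  # first projection: the target state
    return q


def run(delta: Delta, x: str, start: int = 1) -> str:
    """T(x): concatenation of the output labels along the computation on x."""
    q, out = start, []
    for a in x:
        q, w = delta[(q, a)]  # follow one transition, keep its output label
        out.append(w)
    return "".join(out)


# Usage: a two-state transducer and the one-state identity transducer.
example: Delta = {(1, "0"): (2, "0"), (1, "1"): (1, "1"),
                  (2, "0"): (1, "11"), (2, "1"): (2, "")}
identity: Delta = {(1, "0"): (1, "0"), (1, "1"): (1, "1")}
print(run(example, "010"))  # prints 011
```

The iterative `run` agrees with the recursive definition: appending one input symbol $a$ to $x$ appends exactly the output label of the transition taken from the state reached after $x$.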
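By a computable encoding of all transducers we mean a pair $S = (\mathrm{dom}_S, f_S)$ where $\mathrm{dom}_S \subseteq \{0,1\}^*$ is a decidable set and $f_S : \mathrm{dom}_S \to \mathcal{T}$ is a computable bijective mapping that associates a transducer $T^S_\sigma$ to each $\sigma \in \mathrm{dom}_S$.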
We say that $S$ is a polynomial-time (computable) encoding if $\mathrm{dom}_S$ is decidable in polynomial time and for a given $\sigma \in \mathrm{dom}_S$ we can compute the transducer $T^S_\sigma$ in polynomial time. We identify a transducer with its transition function (1), and the set of state names is always $\{1, \ldots, |Q|\}$ where $1$ is the start state. By computing the transducer we mean an algorithm that (in polynomial time) outputs the list of transitions (corresponding to (1), with state names written in binary) of $T^S_\sigma$.
Next we define a fixed natural encoding of transducers that we call the standard encoding. For our main result we need some fixed encoding of the transducers where the length of the encoding relates in a “reasonable way” to the lengths of the transition outputs. We encode a transducer as a binary string by listing for each state $q$ and input symbol $a \in \{0,1\}$ the output and target state corresponding to the pair $(q, a)$, that is, $\Delta(q, a)$. Thus, the encoding of a transducer is a list of (encodings of) states and output strings. For succinctness, in the list we omit (that is, replace by $\varepsilon$) the states that correspond to self-loops.
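By $\mathrm{bin}(i)$ we denote the binary representation of $i \ge 1$. Note that for all $i \ge 1$, $\mathrm{bin}(i)$ always begins with a $1$. For $x = x_1 x_2 \cdots x_m$, $x_i \in \{0,1\}$, $i = 1, \ldots, m$, we use the following functions producing self-delimiting versions of their inputs (see [5]): and , where $\bar{\ }$ is the negation morphism given by $\bar{0} = 1$, $\bar{1} = 0$. It is seen that and .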
We define the set $\mathrm{dom}_{S_0}$ to consist of all strings of the form
(2) 
where , , , and
A string as in (2) encodes the transducer , where , , . Note that in (2), if the corresponding transition of is a self-loop.
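Now we define the standard encoding $S_0$ as the pair $(\mathrm{dom}_{S_0}, f_{S_0})$ where $f_{S_0}$ associates to each $\sigma \in \mathrm{dom}_{S_0}$ the transducer $T_\sigma$ as described above. It can be verified that for each $T \in \mathcal{T}$ there exists a unique $\sigma \in \mathrm{dom}_{S_0}$ such that $T_\sigma = T$, that is, $T_\sigma$ and $T$ have the same transition function. The details of verifying that $T_\sigma \neq T_{\sigma'}$ when $\sigma \neq \sigma'$ can be found in [6]. For $T \in \mathcal{T}$, the standard encoding of $T$ is the unique $\sigma \in \mathrm{dom}_{S_0}$ such that $T_\sigma = T$. The standard encoding is a polynomial-time encoding.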
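To make the listing idea concrete, the sketch below serialises a transducer by writing, for each state $q$ and input symbol $a$ in order, the target state (in binary) and the output word of $\Delta(q, a)$, with the target replaced by the empty string for self-loops. It does not reproduce the exact strings of (2): the self-delimiting code `sd` used here (double every bit, then append `01`) is an assumption standing in for the paper's self-delimiting functions, and `encode`/`decode` are hypothetical names. The decoder rejects strings outside its domain, including explicitly written self-loop targets, so each transducer has exactly one encoding in this sketch.

```python
from typing import Dict, List, Optional, Tuple

Delta = Dict[Tuple[int, str], Tuple[int, str]]  # (state, bit) -> (target, output)


def sd(x: str) -> str:
    """Hypothetical self-delimiting code: double every bit, then append '01'."""
    return "".join(b + b for b in x) + "01"


def read_sd(s: str, i: int) -> Optional[Tuple[str, int]]:
    """Decode one sd-block starting at s[i]; return (word, next index) or None."""
    out: List[str] = []
    while i + 2 <= len(s):
        pair = s[i:i + 2]
        i += 2
        if pair == "01":
            return "".join(out), i
        if pair in ("00", "11"):
            out.append(pair[0])
        else:  # "10" never occurs in a valid block
            return None
    return None  # ran off the end without a terminator


def encode(delta: Delta, n: int) -> str:
    """List target and output for q = 1..n, a = 0, 1; self-loop targets become ''."""
    parts = []
    for q in range(1, n + 1):
        for a in "01":
            p, w = delta[(q, a)]
            target = "" if p == q else format(p, "b")  # bin(p) starts with 1
            parts.append(sd(target) + sd(w))
    return "".join(parts)


def decode(s: str) -> Optional[Tuple[Delta, int]]:
    """Inverse of encode; returns None for strings outside the domain."""
    entries = []
    i = 0
    while i < len(s):
        t = read_sd(s, i)
        if t is None:
            return None
        target, i = t
        w = read_sd(s, i)
        if w is None:
            return None
        word, i = w
        entries.append((target, word))
    if not entries or len(entries) % 2:
        return None  # need exactly two transitions per state
    n = len(entries) // 2
    delta: Delta = {}
    for k, (target, word) in enumerate(entries):
        q, a = k // 2 + 1, "01"[k % 2]
        if target == "":
            p = q  # omitted target means self-loop
        elif target[0] != "1" or int(target, 2) == q or int(target, 2) > n:
            return None  # leading zero, explicit self-loop, or unknown state
        else:
            p = int(target, 2)
        delta[(q, a)] = (p, word)
    return delta, n
```

In this sketch an output word $w$ costs $2|w| + 2$ symbols and a non-self-loop target $p$ costs $2|\mathrm{bin}(p)| + 2$; the exact overhead in the standard encoding depends on the functions used in (2).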
Note that using a modification of the above definitions it is possible to guarantee that the set of all legal encodings of transducers is regular [6] – this is useful e.g., for showing that the nonexistence of a universal transducer is not caused simply by the fact that a finite transducer cannot recognize legal encodings of transducers. More details about computable encodings can be found in [6], including binary encodings that are more efficient than the standard encoding.
3 Finite-state complexity
In the general form, the transducer-based finite-state complexity with respect to a computable encoding $S$ of transducers in $\mathcal{T}$ is defined as follows [6].
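We say that a pair $(\sigma, p)$, $\sigma \in \mathrm{dom}_S$, $p \in \{0,1\}^*$, defines the string $x$ provided that $T^S_\sigma(p) = x$; the pair $(\sigma, p)$ is called a description of $x$. As the pair $(T^S_\sigma, p)$ is uniquely represented by the pair $(\sigma, p)$, we define the size of the description $(\sigma, p)$ by

$$\|(\sigma, p)\|_S = |\sigma| + |p|.$$

We define the finite-state complexity of a string $x \in \{0,1\}^*$ with respect to encoding $S$ by the formula:

$$C_S(x) = \inf \{\, |\sigma| + |p| : \sigma \in \mathrm{dom}_S,\ p \in \{0,1\}^*,\ T^S_\sigma(p) = x \,\}.$$

We will be interested in the state-size, that is, the number of states of transducers used for minimal descriptions of arbitrary strings. For $n \ge 1$ we define the language $L^S_n$ to consist of strings $x$ that have a minimal description using a transducer with at most $n$ states. Formally, we write

$$L^S_n = \{\, x \in \{0,1\}^* : (\exists \sigma \in \mathrm{dom}_S,\ p \in \{0,1\}^*)\ T^S_\sigma(p) = x,\ |\sigma| + |p| = C_S(x),\ \mathrm{size}(T^S_\sigma) \le n \,\}.$$

By setting $L^S_0 = \emptyset$, the set of strings $x$ for which the smallest number of states of a transducer in a minimal description of $x$ is $n$ can then be denoted as

$$L^S_n - L^S_{n-1}, \qquad n \ge 1.$$

Also, we let $D^S_n$ denote the set of strings that have a minimal description in terms of a transducer with exactly $n$ states. Note that $L^S_n - L^S_{n-1} \subseteq D^S_n$, but the converse inclusion need not hold, in general, because strings in $D^S_n$ may have other minimal descriptions with fewer than $n$ states.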
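The definitions of the complexity and of the languages of strings with a minimal description using at most $n$ states (written $L_n$ in this sketch) or exactly $n$ states (written $D_n$ in this sketch) can be illustrated on a finite list of candidate descriptions. This is a hedged illustration only: each description is assumed to be summarised by $|\sigma|$, the number of states of the encoded transducer, $|p|$ and the output string, and the true complexity minimises over all descriptions rather than a given list. All names are hypothetical.

```python
from typing import List, NamedTuple, Set


class Description(NamedTuple):
    """A description (sigma, p), reduced to what the definitions need."""
    enc_len: int  # |sigma|
    states: int   # number of states of the transducer encoded by sigma
    inp_len: int  # |p|
    output: str   # the string the transducer outputs on input p


def complexity(x: str, descs: List[Description]) -> int:
    """Minimum of |sigma| + |p| over the given descriptions of x."""
    return min(d.enc_len + d.inp_len for d in descs if d.output == x)


def minimal_state_counts(x: str, descs: List[Description]) -> Set[int]:
    """State counts of the transducers used in minimal descriptions of x."""
    c = complexity(x, descs)
    return {d.states for d in descs
            if d.output == x and d.enc_len + d.inp_len == c}


def in_L(n: int, x: str, descs: List[Description]) -> bool:
    """x is in L_n: some minimal description uses at most n states."""
    return min(minimal_state_counts(x, descs)) <= n


def in_D(n: int, x: str, descs: List[Description]) -> bool:
    """x is in D_n: some minimal description uses exactly n states."""
    return n in minimal_state_counts(x, descs)


descs = [Description(10, 1, 2, "0101"),  # size 12, one state
         Description(8, 3, 4, "0101"),   # size 12, three states
         Description(11, 2, 3, "0101"),  # size 14, not minimal
         Description(3, 1, 1, "11")]
```

In this toy list the string `0101` has minimal descriptions with one and with three states, so it lies in $D_3$ but also in $L_1$, hence not in $L_3 - L_2$; this is the situation in which the inclusion $L_n - L_{n-1} \subseteq D_n$ is proper.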
In the following, when dealing with the standard encoding $S_0$ (introduced in Section 2) we write, for short, $C$, $T_\sigma$ and $L_n$, $D_n$, $n \ge 1$, instead of $C_{S_0}$, $T^{S_0}_\sigma$ and $L^{S_0}_n$, $D^{S_0}_n$, $n \ge 1$, respectively. The main result in Section 4 is proved using the standard encoding; however, it could easily be modified for any “naturally defined” encoding of transducers, where each transducer is described by listing the states and transitions in a uniform way. For example, the more efficient encoding considered in [6] clearly satisfies this property. On the other hand, when dealing with arbitrarily defined computable encodings $S$, the languages $L^S_n$, $n \ge 1$, obviously can have very different properties. In Section 5 we will consider properties of more general computable encodings.
The finite-state complexity with respect to an arbitrary computable encoding $S$ is computable [6] because, for a given $x$, $|\sigma_0| + |x|$ gives an upper bound for $C_S(x)$, where $\sigma_0 \in \mathrm{dom}_S$ is an encoding of the one-state identity transducer. An encoding $\sigma_0$ of the identity transducer can be found from an enumeration of strings in $\mathrm{dom}_S$, and after this we can simply try all transducer encodings $\sigma$ and input strings $p$ with $|\sigma| + |p| \le |\sigma_0| + |x|$. Hence “inf” can be replaced by “min” in the definition of $C_S$.
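The computability argument amounts to an exhaustive search, sketched below under stated assumptions: a decoder is assumed to return the function computed by the encoded transducer, or `None` for strings outside the domain, and `sigma_id` is assumed to encode the one-state identity transducer. The two-element domain `toy_decode` is a hypothetical stand-in chosen to keep the search tiny; it is not a bijection onto all transducers. The search is exponential in $|\sigma_0| + |x|$, so it only illustrates computability, not efficiency.

```python
from itertools import product
from typing import Callable, Iterator, Optional

# A decoder maps sigma to the function computed by the transducer it encodes,
# or to None when sigma is not a legal encoding.
Decoder = Callable[[str], Optional[Callable[[str], str]]]


def words(k: int) -> Iterator[str]:
    """All binary strings of length exactly k."""
    for bits in product("01", repeat=k):
        yield "".join(bits)


def fs_complexity(x: str, decode: Decoder, sigma_id: str) -> int:
    """Exhaustive search for the minimum of |sigma| + |p| with T_sigma(p) = x.

    sigma_id must encode the identity transducer, so |sigma_id| + |x| is an
    upper bound and the search below always terminates."""
    bound = len(sigma_id) + len(x)
    for total in range(bound + 1):      # description sizes in increasing order
        for k in range(total + 1):      # k = |sigma|
            for sigma in words(k):
                t = decode(sigma)
                if t is None:
                    continue
                if any(t(p) == x for p in words(total - k)):
                    return total        # first hit has minimal size
    raise ValueError("sigma_id does not encode the identity transducer")


def toy_decode(sigma: str) -> Optional[Callable[[str], str]]:
    """Hypothetical two-element domain: '0' = identity, '1' = bit doubling."""
    if sigma == "0":
        return lambda p: p
    if sigma == "1":
        return lambda p: "".join(b + b for b in p)
    return None


print(fs_complexity("0000", toy_decode, "0"))  # prints 3 (sigma = '1', p = '00')
```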
Proposition 3.1
For any computable encoding $S$, the languages $L^S_n$, $n \ge 1$, are decidable.
We conclude this section with an example concerning the finite-state complexity with respect to the standard encoding.
Example 3.1
Define the sequence of strings
Using the transducer of Figure 1 we produce an encoding of . Note that .
With the encodings of the states indicated in Figure 1, the transducer is encoded by a string of length 352. Each number from 0 to 99 can be represented as a sum of, on average, 3.18 numbers from the multiset [15]. Thus, when we represent the input string in the form , we need on average at most $3.18 \cdot 6 \approx 19.1$ input symbols to output each substring , . (This is only a very rough estimate, since it assumes that for each element in the sum representing a number we need to make a cycle of length six through the start state, and this is of course not true when the sum has some element occurring more than once.) Additionally, we need to produce the 100 symbols “1”, which means that the length of the input string can be chosen to be at most 2008. Our estimate gives that
which is a very rough upper bound for .
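The arithmetic behind this estimate can be spelled out as follows, assuming (as the figures 352 and 2008 in the text imply) that there are 100 substrings to output, that each summand costs a cycle of six input symbols, and that each substring needs on average 3.18 summands.

```latex
\begin{align*}
  100 \cdot (3.18 \cdot 6) &= 1908 && \text{input symbols for the 100 substrings,}\\
  1908 + 100 &= 2008 && \text{after adding the 100 symbols ``1'' (input length),}\\
  352 + 2008 &= 2360 && \text{encoding length plus input length.}
\end{align*}
```

Under these assumptions the description built from Figure 1 has size at most 2360.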
The above estimate could be improved using more detailed information from the computation of the average in [15]. Furthermore, [15] gives other systems of six numbers that, on average, would give a more efficient way to represent numbers from 0 to 99 as sums with the least number of summands.
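The quantity taken from [15], the average over 0, ..., 99 of the fewest summands (repetitions allowed) from a fixed set needed to write each number, can be computed with the standard dynamic program for the change-making problem. In this sketch the set of summands is a parameter, since the multiset used in the example is not reproduced here; the function names are hypothetical. As a sanity check, for the U.S. coin values {1, 5, 10, 25} the average comes out as 4.7.

```python
import math
from typing import List, Sequence


def summand_table(parts: Sequence[int], limit: int = 99) -> List[float]:
    """best[v] = fewest summands from `parts` (repetition allowed) adding up to v."""
    best: List[float] = [0] + [math.inf] * limit
    for v in range(1, limit + 1):
        best[v] = min((best[v - c] + 1 for c in parts if c <= v), default=math.inf)
    return best


def average_summands(parts: Sequence[int], limit: int = 99) -> float:
    """Average of best[v] over v = 0..limit; every v must be representable."""
    table = summand_table(parts, limit)
    if math.inf in table:
        raise ValueError("some number cannot be represented")
    return sum(table) / (limit + 1)


print(average_summands([1, 5, 10, 25]))  # prints 4.7
```

A dynamic program is used rather than a greedy choice because greedy decompositions are not optimal for every set of summands (for {1, 3, 4}, greedy writes 6 as 4 + 1 + 1 instead of 3 + 3).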
4 State-size hierarchy
We establish that finite-state complexity is a rich complexity measure with respect to the number of states of the transducers, in the sense that there is no a priori upper bound for the number of states used for minimal descriptions of arbitrary strings. This is in contrast to algorithmic information theory, where the number of states of a universal Turing machine can be fixed.
For the hierarchy result we use the standard encoding. The particular choice of the encoding is not important, and the proof could easily be modified for any encoding that is based on listing the transitions of a transducer in a uniform way. However, as we will see later, arbitrary computable encodings can yield hierarchies with very different properties.
Theorem 4.1
For any $n \in \mathbb{N}$ there exists a string $x_n$ such that $x_n \notin L_n$.
Proof. Consider an arbitrary but fixed . We define strings of length ,
For , we define
Let be an arbitrary encoding of where . We show that by choosing to be sufficiently large as a function of , we have
(3) 
The set of transitions of can be written as a disjoint union , where
consists of the transitions where the output contains a unique , , as a substring, that is, for any , is not a substring of the output;
consists of the transitions where for distinct , the output contains both and as a substring;
consists of transitions where the output does not contain any of the ’s as a substring, .
Note that if a transition is used in the computation , the output produced by cannot completely overlap any of the occurrences of ’s, . Hence
a transition of used by on has output length at most .  (4) 
Since has at most states, and consequently at most transitions, it follows by the pigeonhole principle that there exists such that is not a substring of any transition of . We consider how the computation of on outputs the substring of . Let , …, be the minimal sequence of outputs that “covers” . That is, (respectively, ) is the output of a transition that overlaps with a prefix (respectively, a suffix) of and is a substring of .
Define
By the choice of we know that . For , we know that the transition outputting can be applied only once in the computation of on because for all occurrences of as substrings of occur before all occurrences of . Thus, for , the use of this transition contributes at least to the length of the encoding .
Finally, by (4), for any we have . Such transitions may naturally be applied multiple times; however, the use of each transition outputting , , contributes at least one symbol to .
Thus, we get the following estimate:
To complete the proof it is sufficient to show that, with a suitable choice of , . The string can be represented by the pair where is the state transducer from Figure 2 and .
Each state of can be encoded by a string of length at most , so (recalling that in the standard encoding each transition output contributes to the length of the encoding and each binary encoding of a state name that is the target of a transition that is not a self-loop contributes to the length of the encoding) we get the following upper bound for the length of a string encoding :
Noting that we observe that
(5) 
for example, if we choose . This completes the proof.
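The case distinction on transitions in the proof of Theorem 4.1, together with the pigeonhole step that picks a word not occurring in any transition of the first kind, can be sketched as follows. The three classes correspond to the three sets of transitions in the proof; the concrete words and outputs in the test data are hypothetical placeholders, since the strings defined in the proof are not reproduced here, and only the classification logic follows the text.

```python
from typing import Dict, List, Optional, Sequence


def classify(outputs: Sequence[str], words: Sequence[str]) -> Dict[str, List[int]]:
    """Split transition indices by how many of the words occur in their output."""
    classes: Dict[str, List[int]] = {"unique": [], "several": [], "none": []}
    for t, out in enumerate(outputs):
        hits = [i for i, w in enumerate(words) if w in out]  # substring tests
        if not hits:
            classes["none"].append(t)
        elif len(hits) == 1:
            classes["unique"].append(t)
        else:
            classes["several"].append(t)
    return classes


def uncovered_word(outputs: Sequence[str], words: Sequence[str]) -> Optional[int]:
    """Index of a word occurring in no output of the 'unique' class.

    If there are more words than transitions, such an index exists by the
    pigeonhole principle, since each 'unique' transition covers one word."""
    covered = set()
    for t in classify(outputs, words)["unique"]:
        covered.update(i for i, w in enumerate(words) if w in outputs[t])
    for i in range(len(words)):
        if i not in covered:
            return i
    return None
```

Note that the word returned may still occur in an output of the second class; the proof handles such transitions separately, using the fact that they can be applied only once.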
As a corollary we obtain that the sets of strings with minimal descriptions using a transducer with at most states, , form an infinite hierarchy.
Corollary 4.1
For any $n \in \mathbb{N}$, there exists effectively $m > n$ such that $L_n \subset L_m$.
We do not know whether all levels of the state-size hierarchy with respect to the standard encoding are strict. Note that the proof of Theorem 4.1 constructs strings that have a smaller description using a transducer with states than any description using a transducer with states. We believe that (with chosen as in the proof of Theorem 4.1) the minimal description of , in fact, has states, but we do not have a complete proof for this claim. The claim would imply that is strictly included in , . In any case, the construction used in the proof of Theorem 4.1 gives an effective upper bound for the size of such that , because the estimation (5) (with the particular choice for ) also implies an upper bound for the number of states of a transducer used in a minimal description of .
The standard encoding is monotonic in the sense that adding states to a transducer or increasing the lengths of the outputs always increases the length of an encoding. This leads us to believe that for any there exist strings where the minimal transducer has exactly states, that is, for any , .
Conjecture 4.2
, for all .
By Proposition 3.1 we know that the languages are decidable. Thus, for such that , in principle, it would be possible to compute the length of the shortest words in . However, we do not know how behaves as a function of . Using a brute-force search we have established [6] that all strings of length at most 23 have a minimal description using a single-state transducer.
Open problem 1
What is the asymptotic behavior of the length of the shortest words in as a function of ?
Also, we do not know whether there exists that has two minimal descriptions (in the standard encoding) where the respective transducers have different numbers of states. This amounts to the following:
Open problem 2
Does there exist such that ?
5 General computable encodings
While the proof of Theorem 4.1 can be easily modified for any encoding that, roughly speaking, is based on listing the transitions of a transducer, the proof breaks down if we consider arbitrary computable encodings . Note that the number of transducers with states is infinite and, for arbitrary computable , it does not seem easy, as in the proof of Theorem 4.1, to get upper and lower bounds for for suitably chosen strings . We do not know whether there exist computable encodings for which the state-size hierarchy collapses to a finite level.
Open problem 3
Does there exist and a computable encoding such that, for all , ?
On the other hand, it is possible to construct particular encodings for which every level of the state-size hierarchy is strict.
Theorem 5.1
There exists a computable encoding such that
Proof. Let , , be the th prime. We define an state () transducer by setting , , , , , and .
In the encoding we use the string to encode the transducer , . Any transducer that is not one of the above transducers , , is encoded in by a string , , where is at least the sum of the lengths of outputs of all transitions in . This condition is satisfied, for example, by choosing the encoding of in to be simply 0 concatenated with the standard encoding of .
Let be arbitrary but fixed. The string has a description of size , where encodes and the transducer has states. We show that .
By the definition of the transducers , for any , is of the form , . Thus, cannot be the output of any transducer , .
On the other hand, consider an arbitrary description of the string where is not any of the transducers , . Let be the length of the longest output of a transition of . Thus, . By the definition of we know that , and we conclude that
We have shown that, in the encoding , the unique minimal description of uses a transducer with states, which implies .
The encoding constructed in the proof of Theorem 5.1 is not a polynomial-time encoding because has an encoding of length , whereas the description of the transition function of (in the format specified in Section 2) has length . Apart from this problem, the encoding is efficiently computable, and using standard “padding techniques” we can simply increase the length of all encodings of transducers in .
Corollary 5.1
There exists a polynomial-time encoding such that
Proof. The encoding is obtained by modifying the encoding of the proof of Theorem 5.1 as follows. For , is encoded by the string . Any transducer that is not one of the transducers , , is encoded by a string where and is the sum of the lengths of outputs of all transitions of . If is the standard encoding of , for example, we can choose .
Now is polynomially related to the length of the description of the transition function of , , and given the transition function of can be output in quadratic time. For transducers not of the form , , the same holds trivially.
Essentially in the same way as in the proof of Theorem 5.1, we verify that for any , the string has a unique minimal description , where is the description of the state transducer . The same argument works because the encoding of any transducer in is, roughly speaking, obtained from the encoding of in by appending symbols 1.
There exist computable encodings that allow minimal descriptions of strings based on transducers with different numbers of states. Furthermore, the gap between the numbers of states of the transducers used for different minimal descriptions of the same string can be made arbitrarily large, that is, for any we can construct an encoding where some string has minimal descriptions using transducers with either or states. The proof uses an idea similar to the proof of Theorem 5.1.
Theorem 5.2
For any , there exists a computable encoding such that .
6 Conclusion
As perhaps expected, the properties of the state-size hierarchy with respect to the specific computable encodings considered in Section 5 could be established using constructions where we added additional states to transducers without changing the size of the encoding. In a similar way, various other properties can be established for the state-size hierarchy corresponding to specific (artificially defined) computable encodings. The main open problem concerning general computable encodings is whether it is possible to construct an encoding for which the state-size hierarchy collapses to some finite level, see Open problem 3.
As our main result we have established that the state-size hierarchy with respect to the standard encoding is infinite. Many interesting open problems dealing with the hierarchy with respect to the standard encoding remain. In addition to the problems discussed in Section 4, we can consider various types of questions related to combinatorics on words. For example, assuming that a minimal description of a string needs a transducer with at least states, is it possible that has a minimal description based on a transducer with fewer than states?
Conjecture 6.1
If (), then for any , .
Footnotes
 Research supported in part by FRDF Grant of the UoA.
 Research supported in part by NSERC.
In a more general setting the mapping may not be injective (for example, if we want to define as a regular set [6]); however, in the following we restrict consideration to bijective encodings in order to avoid unnecessary complications with the notation associated with our state-size hierarchy.
 In [15] it is established that 18 is the optimal value to add to an existing system of .
 By a substring we mean a “continuous substring”.
 Note that here “” stands for strict inclusion.
References
[1] J.-P. Allouche, J. Shallit. Automatic Sequences. Cambridge University Press, 2003.
[2] C. Bourke, J.M. Hitchcock, N.V. Vinodchandran. Entropy rates and finite-state dimension. Theoret. Comput. Sci. 349, 392–406, 2005.
[3] J. Berstel. Transductions and Context-free Languages. Teubner, 1979.
[4] H. Buhrman, L. Fortnow. Resource-bounded Kolmogorov complexity revisited. In Proc. STACS’97, Lect. Notes Comput. Sci. 1200, 105–116, Springer, 1997.
[5] C.S. Calude. Information and Randomness: An Algorithmic Perspective, 2nd ed., Springer, Berlin, 2002.
[6] C.S. Calude, K. Salomaa, T. Roblot. Finite-State Complexity and Randomness. Technical Report CDMTCS-374, Dec. 2009. Extended abstract presented at CiE 2010.
[7] G. Chaitin. Algorithmic Information Theory. Cambridge University Press, 1987.
[8] M. Charikar, E. Lehman, D. Liu, R. Panigrahy, M. Prabhakaran, A. Rasala, A. Sahai, A. Shelat. Approximating the smallest grammar: Kolmogorov complexity in natural models. In Proc. STOC’02, ACM Press, 792–801, 2002.
[9] D. Doty, J.H. Lutz, S. Nandakumar. Finite-state dimension and real arithmetic. Inform. Comput. 205, 1640–1651, 2007.
[10] R.K. Guy. Unsolved Problems in Number Theory, 3rd ed., Springer, Berlin, 2004.
[11] E. Lehman. Approximation Algorithms for Grammar-based Compression. PhD thesis, MIT, 2002.
[12] E. Lehman, A. Shelat. Approximation algorithms for grammar-based compression. In Proc. SODA’02, SIAM Press, 205–212, 2002.
[13] W. Rytter. Grammar compression, LZ-encodings, and string algorithms with implicit input. In Proc. ICALP’04, Lect. Notes Comput. Sci. 3142, 15–27, Springer, 2004.
[14] J. Shallit. The computational complexity of the local postage stamp problem. SIGACT News 33, 90–94, 2002.
[15] J. Shallit. What this country needs is an 18 cent piece. Mathematical Intelligencer 25, 20–23, 2003.
[16] J. Shallit, Y. Breitbart. Automaticity I: Properties of a measure of descriptional complexity. J. Comput. System Sci. 53, 10–25, 1996.
[17] J. Shallit, M.W. Wang. Automatic complexity of strings. J. Automata, Languages and Combinatorics 6, 537–554, 2001.
[18] S. Yu. Regular languages. In: G. Rozenberg, A. Salomaa (eds.), Handbook of Formal Languages, vol. I, Springer, Berlin, 41–110, 1997.