Decision Problems For Convex Languages
Abstract
In this paper we examine decision problems associated with various classes of convex languages, studied by Ang and Brzozowski (under the name “continuous languages”). We show that we can decide whether a given language $L$ is prefix-, suffix-, factor-, or subword-convex in polynomial time if $L$ is represented by a DFA, but that the problem is PSPACE-hard if $L$ is represented by an NFA. In the case that a regular language is not convex, we prove tight upper bounds on the length of the shortest words demonstrating this fact, in terms of the number of states of an accepting DFA. Similar results are proved for some subclasses of convex languages: the prefix-, suffix-, factor-, and subword-closed languages, and the prefix-, suffix-, factor-, and subword-free languages.
1 Introduction
Thierrin [11] introduced convex languages with respect to the subword relation. Ang and Brzozowski [2] generalized this concept to arbitrary relations. For example, a language $L$ is said to be prefix-convex if, whenever $u, w \in L$ with $u$ a prefix of $w$, then any word $v$ must also be in $L$ if $u$ is a prefix of $v$ and $v$ is a prefix of $w$. Similar definitions hold for suffix-, factor-, and subword-convex languages. (In this paper, a “factor” is a contiguous block inside another word, while a “subword” need not be contiguous. In the literature, these concepts are sometimes called “subword” and “subsequence”, respectively.)
A language $L$ is said to be prefix-free if whenever $w \in L$, then no proper prefix of $w$ is in $L$. (By proper we mean a prefix of $w$ other than $w$ itself.) Prefix-free languages (prefix codes) were studied by Berstel and Perrin [4]. Han has recently considered $X$-free languages for various values of $X$, such as prefix, suffix, factor and subword [7].
A language $L$ is said to be prefix-closed if whenever $w \in L$, then every prefix of $w$ is also in $L$. Analogous definitions hold for suffix-, factor-, and subword-closed languages. A factor-closed language is often called factorial.
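The three prefix-based definitions above can be checked directly on a finite language. The following is a minimal brute-force sketch, assuming the language is given as a Python set of strings; the function names are ours and purely illustrative, and the approach is not the DFA-based method developed later in the paper.

```python
def is_prefix_convex(lang):
    """True if every word lying between two members (in prefix order) is a member."""
    for u in lang:
        for w in lang:
            if w.startswith(u):
                # Every v with u prefix of v and v prefix of w is w[:k], len(u) <= k <= len(w).
                for k in range(len(u), len(w) + 1):
                    if w[:k] not in lang:
                        return False
    return True


def is_prefix_closed(lang):
    """True if every prefix (including the empty word) of every member is a member."""
    return all(w[:k] in lang for w in lang for k in range(len(w) + 1))


def is_prefix_free(lang):
    """True if no proper prefix of a member is a member."""
    return not any(w[:k] in lang for w in lang for k in range(len(w)))
```

For instance, `{"a", "ab", "abc"}` is prefix-convex but neither prefix-closed (the empty word is missing) nor prefix-free, while `{"a", "abc"}` is not prefix-convex because `"ab"` is missing.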
In this paper we consider the computational complexity of testing whether a given language has the property of being prefix-convex, suffix-convex, etc., prefix-closed, suffix-closed, etc., and prefix-free, suffix-free, etc., for a total of 12 different problems. As we will see, the computational complexity of these decision problems depends on how the language is represented. If it is represented as the language accepted by a DFA, then the decision problem is solvable in polynomial time. On the other hand, if it is represented as a regular expression or an NFA, then the decision problem is PSPACE-complete. We also consider the following question: given that a language is not prefix-convex, suffix-convex, etc., what is a good upper bound on the shortest words (shortest witnesses) demonstrating this fact?
The remainder of the paper is structured as follows. In Section 2 we study the complexity of testing for convexity for languages represented by DFA’s, and we include testing for closure and freeness as special cases. In Section 3 we exhibit shortest witnesses to the failure of the convexity property. Convex languages specified by NFA’s are studied in Section 4. We also briefly consider convex languages specified by context-free grammars in Section 5. Section 6 concludes the paper.
2 Deciding convexity for DFA’s
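We will show that, if a regular language $L$ is represented by a DFA $M$ with $n$ states, it is possible to test the property of prefix-, suffix-, factor-, and subword-convexity efficiently. More precisely, we can test these properties in $O(n^3)$ time.
Let $\trianglelefteq$ be one of the four relations prefix, suffix, factor, or subword. The basic idea is as follows: $L$ is not $\trianglelefteq$-convex if and only if there exist words $u, v, w \in \Sigma^*$ such that $u \trianglelefteq v \trianglelefteq w$, $u, w \in L$, and $v \notin L$. Given $M$, we create an NFA $M'$ (with $\epsilon$-moves) with $O(n^3)$ states and transitions that accepts the language
$$L(M') = \{ w \in \Sigma^* : \text{there exist } u, v \in \Sigma^* \text{ such that } u \trianglelefteq v \trianglelefteq w,\ u, w \in L,\ v \notin L \}.$$
Then $L(M') = \emptyset$ if and only if $L$ is $\trianglelefteq$-convex. We can test the emptiness of $L(M')$ using depth-first search in time linear in the size of $M'$. This gives an $O(n^3)$ algorithm for testing the $\trianglelefteq$-convex property.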
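The emptiness test mentioned above amounts to asking whether some accepting state is reachable from the start state, ignoring edge labels. A minimal sketch follows, assuming a hypothetical representation in which `edges` maps each state to a list of `(label, next_state)` pairs and the label `None` stands for an $\epsilon$-move; these names are ours.

```python
def nfa_is_empty(start, accepting, edges):
    """Return True if no accepting state is reachable from `start`.

    edges: dict mapping a state to an iterable of (label, next_state) pairs.
    Labels are irrelevant for emptiness, so epsilon-moves need no special care.
    """
    stack = [start]
    seen = {start}
    while stack:
        state = stack.pop()
        if state in accepting:
            return False  # some word (possibly empty) is accepted
        for _label, nxt in edges.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True
```

Each state and each transition is examined at most once, which gives the running time linear in the size of the automaton.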
Since the constructions for all four properties are similar, in the next subsection we handle the hardest case (factor-convexity) in detail. In the following subsections we content ourselves with a brief sketch of the necessary constructions.
2.1 Factor-convexity
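Suppose $M = (Q, \Sigma, \delta, q_0, F)$ is a DFA accepting the language $L = L(M)$, and suppose $M$ has $n$ states. We now construct an NFA $M'$ such that
$$L(M') = \{ w \in \Sigma^* : \text{there exist } u, v \in \Sigma^* \text{ such that } u \text{ is a factor of } v,\ v \text{ is a factor of } w,\ u, w \in L, \text{ and } v \notin L \}.$$
Clearly $L(M') = \emptyset$ if and only if $L$ is factor-convex.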
Here is the construction of $M'$. States of $M'$ are quadruples, where components 1, 2, and 3 keep track of where $M$ is upon processing $w$, $v$, and $u$ (respectively). The last component is a flag indicating the present mode of the simulation process.
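Formally, $M' = (Q', \Sigma, \delta', q_0', F')$, where $Q' = Q \times Q \times Q \times \{1, 2, 3, 4, 5\}$, $q_0' = [q_0, q_0, q_0, 1]$, $F' = F \times (Q \setminus F) \times F \times \{5\}$, and $\delta'$ is defined as follows, for all $p, q, r \in Q$ and $a \in \Sigma$:
1. $\delta'([p, q_0, q_0, 1], a) = \{[\delta(p,a), q_0, q_0, 1]\}$;
2. $\delta'([p, q_0, q_0, 1], \epsilon) = \{[p, q_0, q_0, 2]\}$;
3. $\delta'([p, q, q_0, 2], a) = \{[\delta(p,a), \delta(q,a), q_0, 2]\}$;
4. $\delta'([p, q, q_0, 2], \epsilon) = \{[p, q, q_0, 3]\}$;
5. $\delta'([p, q, r, 3], a) = \{[\delta(p,a), \delta(q,a), \delta(r,a), 3]\}$;
6. $\delta'([p, q, r, 3], \epsilon) = \{[p, q, r, 4]\}$;
7. $\delta'([p, q, r, 4], a) = \{[\delta(p,a), \delta(q,a), r, 4]\}$;
8. $\delta'([p, q, r, 4], \epsilon) = \{[p, q, r, 5]\}$;
9. $\delta'([p, q, r, 5], a) = \{[\delta(p,a), q, r, 5]\}$.
One verifies that the NFA $M'$ has $O(n^3)$ states and $O(|\Sigma| n^3)$ transitions, where $|\Sigma|$ is the cardinality of $\Sigma$.
To see that the construction is correct, suppose $L$ is not factor-convex. Then there exist words $u, v, w$ such that $u$ is a factor of $v$, $v$ is a factor of $w$, and $u, w \in L$ while $v \notin L$. Then there exist words $u', u'', v', v''$ such that $v = u' u u''$ and $w = v' v v''$, so that $w = v' u' u u'' v''$. Let $\delta(q_0, v') = p_1$, $\delta(p_1, u') = p_2$, $\delta(p_2, u) = p_3$, $\delta(p_3, u'') = p_4$, and $\delta(p_4, v'') = p_5$. Moreover, let $\delta(q_0, u') = q_2$, $\delta(q_2, u) = q_3$, and $\delta(q_3, u'') = q_4$, and $\delta(q_0, u) = r_3$. Since $u, w \in L$, we know that $r_3$ and $p_5$ are accepting states. Since $v \notin L$, we know that $q_4$ is not accepting.
Automaton $M'$ operates as follows. In the initial state $[q_0, q_0, q_0, 1]$ we process the symbols of $v'$ using Rule 1, ending in the state $[p_1, q_0, q_0, 1]$. At this point, we use Rule 2 to move to $[p_1, q_0, q_0, 2]$ by an $\epsilon$-move. Next, we process the symbols of $u'$ using Rule 3, ending in the state $[p_2, q_2, q_0, 2]$. Then we use Rule 4 to move to $[p_2, q_2, q_0, 3]$ by an $\epsilon$-move. Next, we process the symbols of $u$ using Rule 5, ending in the state $[p_3, q_3, r_3, 3]$. Then we use Rule 6 to move to $[p_3, q_3, r_3, 4]$ by an $\epsilon$-move. Next, we process the symbols of $u''$ using Rule 7, ending in the state $[p_4, q_4, r_3, 4]$. Then we use Rule 8 to move to $[p_4, q_4, r_3, 5]$ by an $\epsilon$-move. Finally, we process the symbols of $v''$ using Rule 9, ending in the state $[p_5, q_4, r_3, 5]$, and this state is in $F'$.
On the other hand, suppose $M'$ accepts the input $w$. Then we must have $\delta'(q_0', w) \cap F' \neq \emptyset$. But the only way to reach a state in $F'$ is, by our construction, to apply Rules 1 through 9 in that order, where odd-numbered rules can be used any number of times, and even-numbered rules can be used only once. Letting $v', u', u, u'', v''$ be the words labeling the uses of Rules 1, 3, 5, 7, and 9, respectively, we see that $w = v' u' u u'' v''$, where $\delta(q_0, w) \in F$, $\delta(q_0, u' u u'') \notin F$, and $\delta(q_0, u) \in F$. Setting $v = u' u u''$, it follows that $u, w \in L$ and $v \notin L$, and so $L$ is not factor-convex.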
We have proved
Theorem 1.
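If $M$ is a DFA with $n$ states, there exists an NFA $M'$ with $O(n^3)$ states and transitions such that $M'$ accepts the language
$$L(M') = \{ w \in \Sigma^* : \text{there exist } u, v \in \Sigma^* \text{ such that } u \text{ is a factor of } v,\ v \text{ is a factor of } w,\ u, w \in L(M), \text{ and } v \notin L(M) \}.$$
Corollary 2.
We can decide if a given regular language $L$ accepted by a DFA with $n$ states is factor-convex in $O(n^3)$ time.
Proof.
Since $L$ is factor-convex if and only if $L(M') = \emptyset$, it suffices to check if $L(M') = \emptyset$ using depth-first search of a directed graph, in time linear in the number of vertices and edges of $M'$. ∎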
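To make the decision procedure concrete, here is a minimal sketch that explores the quadruple states on the fly instead of building the NFA explicitly. It assumes a complete DFA given as a dictionary `delta` from `(state, letter)` to state; the mode numbering 1 to 5 and all identifiers are our own illustrative choices, read off the prose description of the simulation (the $w$-component always moves, the $v$-component moves in modes 2 to 4, the $u$-component moves only in mode 3).

```python
def is_factor_convex(delta, q0, finals, alphabet):
    """Decide factor-convexity of the language of a complete DFA.

    A state is (p, q, r, mode): p tracks w, q tracks v, r tracks u.
    Reaching mode 5 with p accepting, q rejecting, r accepting exhibits
    words u, v, w with u a factor of v, v a factor of w, u, w in L, v not in L.
    """
    start = (q0, q0, q0, 1)
    stack, seen = [start], {start}
    while stack:
        p, q, r, mode = stack.pop()
        if mode == 5 and p in finals and q not in finals and r in finals:
            return False  # a witness to non-convexity exists
        successors = []
        if mode < 5:
            successors.append((p, q, r, mode + 1))  # epsilon-move to the next mode
        for a in alphabet:
            p2 = delta[(p, a)]                              # w reads every letter
            q2 = delta[(q, a)] if mode in (2, 3, 4) else q  # v reads the middle part
            r2 = delta[(r, a)] if mode == 3 else r          # u reads the innermost part
            successors.append((p2, q2, r2, mode))
        for s in successors:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return True
```

At most $5n^3$ quadruples are visited, each with $|\Sigma| + 1$ outgoing moves, in line with the cubic bound of Corollary 2.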
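2.1.1 Factor-closure
The language $L$ is not factor-closed if and only if there exist words $v, w$ such that $v$ is a factor of $w$, and $w \in L$, while $v \notin L$.
Given a DFA $M = (Q, \Sigma, \delta, q_0, F)$ accepting $L$, we construct from $M$ an NFA $M'$ such that
$$L(M') = \{ w \in \Sigma^* : \text{there exists } v \in \Sigma^* \text{ such that } v \text{ is a factor of } w,\ w \in L, \text{ and } v \notin L \}.$$
As before, $L(M') = \emptyset$ if and only if $L$ is factor-closed. The size of $M'$ is $O(n^2)$.
States of $M'$ are triples, where components 1 and 2 keep track of where $M$ would be upon processing $w$ and $v$ (respectively). The last component is a flag as before.
Formally, $M' = (Q', \Sigma, \delta', q_0', F')$, where $Q' = Q \times Q \times \{1, 2, 3\}$, $q_0' = [q_0, q_0, 1]$, $F' = F \times (Q \setminus F) \times \{3\}$, and
1. $\delta'([p, q_0, 1], a) = \{[\delta(p,a), q_0, 1]\}$ for $p \in Q$, $a \in \Sigma$.
2. $\delta'([p, q_0, 1], \epsilon) = \{[p, q_0, 2]\}$, for all $p \in Q$;
3. $\delta'([p, q, 2], a) = \{[\delta(p,a), \delta(q,a), 2]\}$, for all $p, q \in Q$, $a \in \Sigma$;
4. $\delta'([p, q, 2], \epsilon) = \{[p, q, 3]\}$, for all $p, q \in Q$;
5. $\delta'([p, q, 3], a) = \{[\delta(p,a), q, 3]\}$, for $p, q \in Q$, $a \in \Sigma$.
$M'$ has $O(n^2)$ states and transitions. Thus we have: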
Theorem 3.
We can decide if a given regular language $L$ accepted by a DFA with $n$ states is factor-closed in $O(n^2)$ time.
This result was previously obtained by Béal et al. [3, Prop. 5.1, p. 13] through a slightly different approach.
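The factor-closure test can be sketched in the same style as a search over triples. This is a hedged illustration under the assumption that the DFA is complete and given as a dictionary `delta`; the three modes (before, inside, and after the factor $v$) and the identifiers are ours, and this is not the approach of Béal et al.

```python
def is_factor_closed(delta, q0, finals, alphabet):
    """Decide factor-closure of the language of a complete DFA.

    A state is (p, q, mode): p tracks w, q tracks the candidate factor v.
    Mode 1: before v starts; mode 2: inside v; mode 3: after v ends.
    """
    start = (q0, q0, 1)
    stack, seen = [start], {start}
    while stack:
        p, q, mode = stack.pop()
        if mode == 3 and p in finals and q not in finals:
            return False  # w is accepted but its factor v is rejected
        successors = []
        if mode < 3:
            successors.append((p, q, mode + 1))  # epsilon-move to the next mode
        for a in alphabet:
            q2 = delta[(q, a)] if mode == 2 else q  # v only reads letters inside the factor
            successors.append((delta[(p, a)], q2, mode))
        for s in successors:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return True
```

Only $3n^2$ triples exist, so the search is quadratic in the number of DFA states for a fixed alphabet.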
The converse of the relation “$u$ is a factor of $v$” is “$v$ contains $u$ as a factor”. This converse relation and similar converse relations, derived from the prefix, suffix, and subword relations, lead to “converse-closed languages” [2]. It has been shown by de Luca and Varricchio [5] that a language $L$ is factor-closed (factorial, in their terminology) if and only if it is a complement of an ideal, that is, if and only if $L = \overline{\Sigma^* K \Sigma^*}$ for some $K \subseteq \Sigma^*$. Ang and Brzozowski [2] noted that a language $L$ is an ideal if and only if it is converse-factor-closed, that is, if, for every $u \in L$, each word of the form $xuy$ is also in $L$. Thus, to test whether $L$ is converse-factor-closed, we must check that there is no pair $(u, v)$ such that $u \in L$, $v \notin L$, and $u$ is a factor of $v$. This is equivalent to testing whether $\overline{L}$ is factor-closed. Since a DFA for $\overline{L}$ is obtained from a DFA for $L$ by exchanging accepting and rejecting states, the following is an immediate consequence of Theorem 3:
Corollary 4.
We can decide if a given regular language $L$ accepted by a DFA with $n$ states is an ideal in $O(n^2)$ time.
The results above also apply to other converse-closed languages. Similarly, any result about the size of a witness demonstrating the lack of prefix-, suffix-, and subword-closure applies also to the witness demonstrating the lack of converse-prefix-, converse-suffix-, and converse-subword-closure, respectively. Subword-closed and converse-subword-closed languages were also investigated and characterized by Thierrin [11].
2.1.2 Factor-freeness
Factor-free languages (also known as infix-free) have recently been studied by Han et al. [8]; they gave an efficient algorithm for determining if the language accepted by an NFA is prefix-free, suffix-free, or factor-free.
We can decide whether a DFA language is factor-free in $O(n^2)$ time with the automaton we used for testing factor-closure, except that the set of accepting states is now
Similar results hold for prefix-free, suffix-free, and subword-free languages.
2.2 Prefix-convexity
Prefix-convexity can be tested in an analogous fashion. We give the construction of $M'$ without proof: let , where
The NFA has states and transitions.
2.2.1 Prefix-closure
By varying the construction as before, we have
Theorem 5.
We can decide if a given regular language $L$ accepted by a DFA with $n$ states is prefix-closed, suffix-closed, or subword-closed in time.
2.2.2 Prefix-freeness
See Section 2.1.2.
2.3 Suffix-convexity
Suffix-convexity can be tested in an analogous fashion. We give the construction of $M'$ without proof. Let , where
The NFA has states and transitions.
2.4 Subword-convexity
Subword-convexity can be tested in an analogous fashion. We give the construction of $M'$ without proof. Let , where
The NFA has states and transitions.
The idea is that as the symbols of $w$ are read, we keep track of the state of $M$ in the first component. We then “guess” which symbols of the input also belong to $v$ and/or $u$, enforcing the condition that, if a symbol belongs to $u$, then it must belong to $v$, and if it belongs to $v$, then it must belong to $w$. We therefore cover all possibilities of words $u, v, w$ such that $u$ is a subword of $v$ and $v$ is a subword of $w$.
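The guessing idea can be illustrated by a search over triples of DFA states, where each input letter is given to $w$ only, to $w$ and $v$, or to all three words. This is a sketch under our own assumptions (complete DFA as a dictionary `delta`, hypothetical function name); it follows the informal description above and is not a transcription of the formal construction.

```python
def is_subword_convex(delta, q0, finals, alphabet):
    """Decide subword-convexity of the language of a complete DFA.

    A state (p, q, r) records where the DFA is after reading w, v, and u,
    where v is the subsequence of w formed by the letters guessed to be in v,
    and u is the subsequence of v formed by the letters guessed to be in u.
    """
    start = (q0, q0, q0)
    stack, seen = [start], {start}
    while stack:
        p, q, r = stack.pop()
        if p in finals and q not in finals and r in finals:
            return False  # u, w accepted, v rejected: a witness exists
        for a in alphabet:
            p2 = delta[(p, a)]
            q2 = delta[(q, a)]
            r2 = delta[(r, a)]
            for s in ((p2, q, r),      # letter belongs to w only
                      (p2, q2, r),     # letter belongs to w and v
                      (p2, q2, r2)):   # letter belongs to w, v, and u
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
    return True
```

No mode flag is needed here because the letters of $u$ and $v$ may be scattered anywhere inside $w$. For example, the language $\{aa, abba\}$ is factor-convex, but this test reports that it is not subword-convex, since $aa$ is a subword of $aba$, which is a subword of $abba$.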
2.5 Almost convex languages
As we have seen, a language $L$ is prefix-convex if and only if there are no triples $(u, v, w)$ with $u$ a prefix of $v$, $v$ a prefix of $w$, and $u, w \in L$, $v \notin L$. We call such a triple a witness. A language could fail to be prefix-convex because there are infinitely many witnesses (for example, the language ), or it could fail because there is at least one, but only finitely many witnesses (for example, the language ).
We define a language to be almost prefix-convex if there exists at least one, but only finitely many witnesses to the failure of the prefix-convex property. Analogously, we define almost suffix-, almost factor-, and almost subword-convex.
Theorem 6.
Let $L$ be a regular language accepted by a DFA with $n$ states. Then we can determine if $L$ is almost prefix-convex (respectively, almost suffix-convex, almost factor-convex, almost subword-convex) in $O(n^3)$ time.
Proof.
We give the proof for the almost factor-convex property, leaving the other cases to the reader.
Consider the NFA $M'$ defined in Section 2.1. As we have seen, $M'$ accepts the language
$$L(M') = \{ w \in \Sigma^* : \text{there exist } u, v \in \Sigma^* \text{ such that } u \text{ is a factor of } v,\ v \text{ is a factor of } w,\ u, w \in L, \text{ and } v \notin L \}.$$
Then $M'$ accepts an infinite language if and only if $L$ has infinitely many witnesses. For if $M'$ accepts infinitely many distinct words, then there are infinitely many distinct witnesses, while if there are infinitely many distinct witnesses $(u, v, w)$, then there must be infinitely many distinct $w$ among them, since the lengths of $u$ and $v$ are bounded by $|w|$. Hence $L$ is almost factor-convex if and only if $L(M')$ is nonempty and finite.
Nonemptiness can be tested as in Corollary 2, so it suffices to see if $M'$ accepts an infinite language. If $M'$ were an NFA without $\epsilon$-moves, this would be easy: first, we remove all states not reachable from the start state or from which we cannot reach a final state. Next, we look for the existence of a cycle. All three goals can be easily accomplished in time linear in the size of $M'$, using depth-first search.
However, $M'$ is an NFA with $\epsilon$-moves, so there is one additional complication: namely, that the cycle we find might be labeled completely by $\epsilon$-transitions. To solve this, we use an idea suggested to us by Jack Zhao and Timothy Chan (personal communication): we find all the strongly connected components of the transition graph of $M'$ (which can be done in linear time) and then, for each edge $(p, q)$ labeled with something other than $\epsilon$ (corresponding to the transition $q \in \delta'(p, a)$ for some $a \in \Sigma$), we check to see if $p$ and $q$ are in the same strongly connected component. If they are, we have found a cycle labeled with something other than $\epsilon$. This technique runs in linear time in the size of the NFA. ∎
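The infiniteness test from the proof can be sketched as follows. We assume a hypothetical edge-list representation `(source, label, target)` with label `None` for an $\epsilon$-move, and we use Kosaraju's two-pass algorithm to obtain the strongly connected components; any linear-time component algorithm would serve equally well.

```python
from collections import defaultdict


def accepts_infinitely_many(start, accepting, edges):
    """True if the epsilon-NFA accepts infinitely many words.

    edges: list of (src, label, dst); label None denotes an epsilon-move.
    """
    fwd, bwd = defaultdict(list), defaultdict(list)
    for s, _lab, t in edges:
        fwd[s].append(t)
        bwd[t].append(s)

    def reach(sources, adj):
        seen, stack = set(sources), list(sources)
        while stack:
            s = stack.pop()
            for t in adj[s]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    # Keep only states that are reachable and can still reach an accepting state.
    useful = reach([start], fwd) & reach(accepting, bwd)

    # Kosaraju, pass 1: record finishing order of an iterative DFS.
    order, seen = [], set()
    for root in useful:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            s, it = stack[-1]
            for t in it:
                if t in useful and t not in seen:
                    seen.add(t)
                    stack.append((t, iter(fwd[t])))
                    break
            else:
                order.append(s)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing order.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            s = stack.pop()
            for t in bwd[s]:
                if t in useful and t not in comp:
                    comp[t] = root
                    stack.append(t)

    # A non-epsilon edge inside one component lies on a cycle that reads a letter.
    return any(lab is not None and s in useful and t in useful and comp[s] == comp[t]
               for s, lab, t in edges)
```

A cycle consisting only of $\epsilon$-moves is correctly ignored, because no letter-labeled edge has both endpoints in its component.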
2.5.1 Almost closed languages
In analogy with Section 2.5, we can define a language to be almost prefix-closed if there exists at least one, but only finitely many witnesses to the failure of the prefix-closed property. Analogously, we define almost suffix-, almost factor-, and almost subword-closed.
Theorem 7.
Let $L$ be a regular language accepted by a DFA with $n$ states. Then we can determine if $L$ is almost prefix-closed (respectively, almost suffix-closed, almost factor-closed, almost subword-closed) in time.
Proof.
Just like the proof of Theorem 6. ∎
2.5.2 Almost free languages
In a similar way, we can define a language to be almost prefix-free if there exists at least one, but only finitely many witnesses to the failure of the prefix-free property. Analogously, we define almost suffix-, almost factor-, and almost subword-free.
Theorem 8.
Let $L$ be a regular language accepted by a DFA with $n$ states. Then we can determine if $L$ is almost prefix-free (respectively, almost suffix-free, almost factor-free, almost subword-free) in time.
Proof.
Just like the proof of Theorem 6. ∎
3 Minimal witnesses
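Let $\trianglelefteq$ represent one of the four relations: factor, prefix, suffix, or subword. A necessary and sufficient condition that a language $L$ be not $\trianglelefteq$-convex is the existence of a triple $(u, v, w)$ of words, where $u \trianglelefteq v$, $v \trianglelefteq w$, $u, w \in L$, and $v \notin L$. As before, we call such a triple a witness to the lack of $\trianglelefteq$-convexity. A witness $(u, v, w)$ is minimal if every other witness $(u', v', w')$ satisfies $|w| < |w'|$, or $|w| = |w'|$ and $|v| < |v'|$, or $|w| = |w'|$, $|v| = |v'|$, and $|u| \le |u'|$. The size of a witness is $|w|$.
Similarly, if $L$ is not $\trianglelefteq$-closed, then $(v, w)$ is a witness if $v \trianglelefteq w$, $w \in L$, and $v \notin L$. A witness $(v, w)$ is minimal if there exists no witness $(v', w')$ such that $|w'| < |w|$, or $|w'| = |w|$ and $|v'| < |v|$. The size is again $|w|$. For $\trianglelefteq$-freeness, witness, minimal witness, and size are defined as for $\trianglelefteq$-closure, except that both words are in $L$.
Suppose we are given a regular language $L$ specified by an $n$-state DFA $M$, and we know that $L$ is not $\trianglelefteq$-convex (respectively, $\trianglelefteq$-closed or $\trianglelefteq$-free). A natural question then is, what is a good upper bound on the size of the shortest witness that demonstrates the lack of this property?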
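3.1 Factor-convexity
From Theorem 1, we get an upper bound for a witness to the lack of factor-convexity.
Corollary 9.
Suppose $L$ is accepted by a DFA with $n$ states and $L$ is not factor-convex. Then there exists a witness $(u, v, w)$ such that $|w| = O(n^3)$.
Proof.
In our proof of Theorem 1, we constructed an NFA $M'$ with $O(n^3)$ states accepting $L(M') = \{ w \in \Sigma^* : \text{there exist } u, v \text{ such that } u \text{ is a factor of } v,\ v \text{ is a factor of } w,\ u, w \in L, \text{ and } v \notin L \}$. Thus, if $L$ is not factor-convex, $M'$ accepts such a word $w$, and the length of a shortest such $w$ is clearly bounded above by the number of states of $M'$ minus $1$. ∎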
It turns out that the bound in Corollary 9 is best possible:
Theorem 10.
There exists a class of non-factor-convex regular languages $L_n$, accepted by DFA’s with $O(n)$ states, such that the size of the minimal witness is $\Omega(n^3)$.
The proof is postponed to Section 3.3 below.
Results analogous to Corollary 9 hold for prefix-, suffix-, and subword-convex languages. However, in some cases we can do better, as we show below.
3.1.1 Factor-closure
Theorem 3 gives us an $O(n^2)$ upper bound on the length of a witness to the failure of the factor-closed property:
Corollary 11.
If $L$ is accepted by a DFA with $n$ states and $L$ is not factor-closed, then there exists a witness $(v, w)$ such that $|w| = O(n^2)$.
It turns out that this upper bound is best possible. Let be a DFA , where , , . For , , the transition function is
The DFA has states. For , is illustrated in Figure 1.
Then we have the following theorem:
Theorem 12.
For the DFA above, let . For any witness to the lack of factor-closure we have , and this bound is achievable.
Proof.
Let be a minimal witness. Since the only rejecting state in leads only to itself, all the states along the accepting path of are final. We claim that is a suffix of , that is, for some . Otherwise, if the last letter of is not the last letter of , we can just omit it and get a shorter , which contradicts the minimality of . Similarly, all the states along the rejecting path of except the last one are final; otherwise, we get a shorter .
First, we prove that the set of states along the accepting path of includes both states and states. Let for . Then . If is a state, we are done. Otherwise, let for some . If , then , a contradiction. If , then , which is a state. Otherwise, , a contradiction. Hence, the set of states along the accepting path of includes both states and states.
Now, consider the set of states along the rejecting path of . We prove that the set of states along the rejecting path of includes only states. Suppose it includes both states and states. Since there is only one transition from a state to a state and all transitions from a state to a state are to the rejecting state , we have , where , and
Since is a suffix of , the last letter of is also . So, by the construction of , we have that , where , and
It is obvious that , which contradicts the equality . Therefore, the set of states along the rejecting path of includes only states.
Consider the last block of ’s in the words and . By the structure of , we have
and
Therefore, the length of the last block of ’s is at least . In other words, . Since the shortest word that leads to state (which is the only state having a transition to a state on input ) is , we also have , and the first part of this theorem is proved.
To see that equality is achieved, let and ∎
3.1.2 Factor-freeness
From the remarks in Section 2.1.2, we get
Corollary 13.
If is accepted by a DFA with states and is not factor-free, then there exists a witness such that .
Up to a constant, Corollary 13 is best possible, as the following theorem shows.
Theorem 14.
There exists a class of languages accepted by DFA’s with states, such that the smallest witness showing the language not factor-free is of size .
Proof.
Let . This language can be accepted by a DFA with states. However, the shortest witness to lack of factor-freeness is , which has size . ∎
3.2 Prefix-convexity
For prefix-convexity, we have the following theorem.
Theorem 15.
Let $M$ be a DFA with $n$ states. Then if $L = L(M)$ is not prefix-convex, there exists a witness $(u, v, w)$ with $|w| \le 2n - 1$. Furthermore, this bound is best possible, as for all $n \ge 2$, there exists a unary DFA with $n$ states that achieves this bound.
Proof.
If $L$ is not prefix-convex, then such a witness $(u, v, w)$ exists. Without loss of generality, assume that $(u, v, w)$ is minimal. Now write $w = uxy$, where $v = ux$ and $x, y \neq \epsilon$.
Let $p = \delta(q_0, u)$, $q = \delta(p, x)$, and $r = \delta(q, y)$. Let $P$ be the path from $q_0$ to $r$ traversed by $w$, and let $P_1$ be the states from $q_0$ to $p$ (not including $p$), $P_2$ be the states from $p$ to $q$ (not including $q$), and $P_3$ be the states from $q$ to $r$ (not including $r$). See Figure 2. Since $(u, v, w)$ is minimal, we know that every state of $P_1$ is rejecting, since we could have found a shorter $u$ if there were an accepting state among them. Similarly, every state of $P_2$ must be accepting, for, if there were a rejecting state among them, we could have found a shorter $x$ and hence a shorter $v$. Finally, every state of $P_3$ must be rejecting, since, if there were an accepting state, we could have found a shorter $w$.
Let $r_i = |P_i|$ for $i = 1, 2, 3$. There are no repeated states in $P_1$, for if there were, we could cut out the loop to get a shorter $u$; the same holds for $P_2$ and $P_3$. Thus $r_i \le n$ for $i = 1, 2, 3$.
Now $P_1$ and $P_2$ are disjoint, since all the states of $P_1$ are rejecting, while all the states of $P_2$ are accepting. Similarly, the states of $P_2$ are disjoint from $P_3$. So $r_1 + r_2 \le n$ and $r_2 + r_3 \le n$. It follows that $r_1 + r_2 + r_3 \le 2n - r_2$. Since $r_2 \ge 1$ (the path $P_2$ contains the state $p$), it follows that $|w| = r_1 + r_2 + r_3 \le 2n - 1$.
To see that $2n - 1$ is optimal, consider the DFA of $n$ states accepting the unary language $L = a^{n-1}(a^n)^*$. Then $L$ is not prefix-convex, and the shortest witness is $(a^{n-1}, a^n, a^{2n-1})$. ∎
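For experimenting with small automata, a witness to the failure of prefix-convexity with shortest possible longest word can be found by breadth-first search. The sketch below is our own simplification, not the NFA construction of Section 2.2: because all three words of a prefix witness follow the same path in the DFA, it suffices to pair each DFA state with a phase counter. The DFA is assumed complete and given as a dictionary `delta`; all names are hypothetical.

```python
from collections import deque


def shortest_prefix_witness(delta, q0, finals, alphabet):
    """Return (u, v, w) with u, w accepted, v rejected, u prefix of v prefix of w,
    and |w| as small as possible, or None if the language is prefix-convex.

    Phase 0: no accepted prefix yet; 1: u fixed; 2: v fixed; 3: w found.
    """
    def advance(state, phase):
        accepting = state in finals
        if phase == 0 and accepting:
            return 1
        if phase == 1 and not accepting:
            return 2
        if phase == 2 and accepting:
            return 3
        return phase

    first = advance(q0, 0)
    queue = deque([(q0, first, "", 0 if first == 1 else None, None)])
    seen = {(q0, first)}
    while queue:
        state, phase, word, u_len, v_len = queue.popleft()
        for a in alphabet:
            nxt = delta[(state, a)]
            new_phase = advance(nxt, phase)
            new_word = word + a
            nu, nv = u_len, v_len
            if new_phase != phase:
                if new_phase == 1:
                    nu = len(new_word)   # first accepted prefix becomes u
                elif new_phase == 2:
                    nv = len(new_word)   # first rejected prefix after u becomes v
                else:
                    return new_word[:nu], new_word[:nv], new_word
            if (nxt, new_phase) not in seen:
                seen.add((nxt, new_phase))
                queue.append((nxt, new_phase, new_word, nu, nv))
    return None
```

For the three-state unary DFA accepting $a^2(a^3)^*$, the search returns $(aa, aaa, aaaaa)$, whose longest word has length $5$.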
3.2.1 Prefix-closure
For prefix-closed languages we can get an even better bound.
Theorem 16.
Let $M$ be an $n$-state DFA, and suppose $L = L(M)$ is not prefix-closed. Then the minimal witness $(v, w)$ showing $L$ is not prefix-closed has $|w| \le n$, and this is best possible.
Proof.
Assume that $(v, w)$ is a minimal witness. Consider the path $P$ from $q_0$ to $q = \delta(q_0, w)$, passing through $p = \delta(q_0, v)$. Let $P_1$ denote the part of the path $P$ from $q_0$ to $p$ (not including $p$) and $P_2$ denote the part of the path $P$ from $p$ to $q$ (not including $q$). Then all the states traversed in $P_2$ must be rejecting, because if any were accepting we would get a shorter $w$. Similarly, all the states traversed in $P_1$ must be accepting, because otherwise we could get a shorter $v$. Neither $P_1$ nor $P_2$ contains a repeated state, because if they did, we could “cut out the loop” to get a shorter $v$ or $w$. Furthermore, the states in $P_1$ are disjoint from $P_2$. So the total number of states in the path to $q$ (not counting $q$) is at most $n$. Thus $|w| \le n$.
The result is best possible, as the example of the unary language $L = (a^n)^*$ shows. This language is not prefix-closed, can be accepted by a DFA with $n$ states, and the smallest witness is $(a, a^n)$. ∎
3.2.2 Prefix-freeness
For the prefix-free property we have:
Theorem 17.
If $L$ is accepted by a DFA with $n$ states and $L$ is not prefix-free, then there exists a witness $(v, w)$ with $|w| \le 2n - 1$. The bound is best possible.
Proof.
The proof is similar to that of Theorem 15. The bound is achieved by a unary DFA accepting $a^{n-1}(a^n)^*$. ∎
3.3 Suffix-convexity
For the suffix-convex property, the cubic upper bound implied by Corollary 9 is best possible, up to a constant factor.
Theorem 18.
There exists a class of non-suffix-convex regular languages $L_n$, accepted by DFA’s with $O(n)$ states, such that the size of the minimal witness is $\Omega(n^3)$.
Proof.
Let