Finding a non-minority ball with majority answers

Dániel Gerbner (1), Balázs Keszegh (2, 3), Dömötör Pálvölgyi (4, 5), Balázs Patkós (6, 7), Máté Vizer (8, 9), Gábor Wiener (10, 11)

(1) Research supported by the Hungarian Scientific Research Fund (OTKA), under grant PD 109537.
(2) Research supported by the Hungarian Scientific Research Fund (OTKA), under grant PD 108406.
(3) Research supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
(4) Research supported by the Hungarian Scientific Research Fund (OTKA), under grant PD 104386.
(5) Research supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
(6) Research supported by the Hungarian Scientific Research Fund (OTKA), under grant SNN 116095.
(7) Research supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
(8) Research supported by the Hungarian Scientific Research Fund (OTKA), under grant SNN 116095.
(9) Research supported by ERC Advanced Research Grant no 267165 (DISCONV).
(10) Research supported by the Hungarian Scientific Research Fund (OTKA), under grant K 108947.
(11) Research supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.

MTA Alfréd Rényi Institute of Mathematics
Reáltanoda u. 13-15 Budapest, 1053 Hungary
email: {gerbner.daniel, keszegh.balazs, patkos.balazs, vizer.mate}@renyi.mta.hu
Eötvös Loránd University, Department of Computer Science.
Pázmány Péter sétány 1/C Budapest, 1117 Hungary,
email:dom@cs.elte.hu
MTA-ELTE Geometric and Algebraic Combinatorics Research Group.
Pázmány Péter sétány 1/C Budapest, 1117 Hungary,
email:patkosb@cs.elte.hu
Department of Computer Science and Information Theory
Budapest University of Technology and Economics
1117 Budapest, Magyar tudósok körútja 2.
email:wiener@cs.bme.hu
Abstract

Suppose we are given a set of $n$ balls, each colored either red or blue in some way unknown to us. To find out some information about the colors, we can query any triple of balls. As an answer to such a query we obtain (the index of) a majority ball, that is, a ball whose color is the same as the color of another ball from the triple. Our goal is to find a non-minority ball, that is, a ball whose color occurs at least $n/2$ times among the $n$ balls. We show that the minimum number of queries needed to solve this problem is $\Theta(n)$ in the adaptive case and $\Theta(n^3)$ in the non-adaptive case. We also consider some related problems.

keywords:
combinatorial search, majority, median

1 Introduction

This paper deals with search problems where the input is a set of $n$ balls, each colored in some way unknown to us, and we have to find a ball possessing a certain property (or show that such a ball does not exist) by asking certain queries. Our goal is to determine the minimum number of queries needed in the worst case. It is possible that the queries are all fixed beforehand (in which case we speak of non-adaptive search) or each query might depend on the answers to the earlier queries (in which case we speak of adaptive search). We say that a ball $b$ is a majority ball of a set $A$ if there are more than $|A|/2$ balls in $A$ that have the same color as $b$. Similarly, a ball $b$ is a non-minority ball of a set $A$ if there are at least $|A|/2$ balls in $A$ that have the same color as $b$. Note that these two notions are different if and only if $|A|$ is even and exactly $|A|/2$ balls have the same color, in which case each of these balls is a non-minority ball and there are no majority balls (moreover, if there are only two colors, then the size of the other color class is also $|A|/2$, thus every ball is a non-minority ball). For more than two colors it is possible that even non-minority balls do not exist. A ball $b$ is said to be a plurality ball of a set $A$ if the number of balls in $A$ with the same color as $b$ is greater than the number of balls with any of the other colors. In this paper we focus on the case of just two colors.

The most natural non-trivial question is the so-called majority problem, which has attracted the attention of many researchers. In this problem our goal is to find a majority ball (or show that none exists), where the possible queries are pairs of balls $\{a, b\}$ and the answer tells us whether $a$ and $b$ have the same color or not. In the adaptive model, Fisher and Salzberg FS () proved that $\lceil 3n/2 \rceil - 2$ queries are necessary and sufficient for any number of colors, while Saks and Werman SW () showed that if the number of colors is known to be two, then the minimum number of queries is $n - b(n)$, where $b(n)$ is the number of 1’s in the binary representation of $n$ (simplified proofs of the latter result were later found, see ARS (); HKM (); W ()). In the non-adaptive model with two colors, it is easy to see that the minimum number of queries needed is $n - 1$ if $n$ is even and $n - 2$ if $n$ is odd.

There are several variants of the majority problem A (). The plurality problem, where we have to find a plurality ball (or show that none exists) was considered, among others, in A (); DJKKS (); GKPP (). Another possible direction is to use sets of size greater than two as queries MKW (); DK ().

The main model studied in this paper is the following. In the original comparison model the answer to the query can be interpreted as the answer to the question whether there is a majority ball in the subset . If the answer is yes, then obviously both and are majority balls. Therefore we obtain a generalization of the comparison model if for any query that is a subset of the balls the answer is either (the index of) a majority ball, or that there is no majority ball in the given subset (which cannot be the case if the size of the subset is odd and there are only two colors). We study this model in case of two colors, and mostly when only queries of size three are allowed, although we also prove some results for greater query sizes.

Unfortunately, even asking all triples cannot guarantee that we can solve the majority problem for two colors. Suppose we have an even number of balls that are partitioned into two sets of the same size, and , and suppose that the answer for any triple is a ball from if and only if . In this case we cannot decide whether all balls have the same color or all balls in are red, but all balls in are blue. In the former case all balls are majority balls, while in the latter there exists no majority ball.
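The indistinguishability argument above can be checked mechanically on a small instance. The following Python sketch assumes that the Adversary answers with a ball of the first class exactly when the triple contains at least two balls of that class (this is our reading of the rule above); the function names, the variable `side_a` and the choice of six balls are illustrative only.

```python
from itertools import combinations


def is_majority_in_triple(ball, triple, color):
    """True if `ball` shares its color with another ball of the triple."""
    return any(color[ball] == color[other] for other in triple if other != ball)


def adversary_answer(triple, side_a):
    """Assumed rule: answer a ball of A iff the triple has at least two balls in A."""
    in_a = [b for b in triple if b in side_a]
    in_b = [b for b in triple if b not in side_a]
    return in_a[0] if len(in_a) >= 2 else in_b[0]


def consistent(color, n, side_a):
    """Check that every answer of the Adversary is a majority ball under `color`."""
    return all(
        is_majority_in_triple(adversary_answer(t, side_a), t, color)
        for t in combinations(range(n), 3)
    )


n = 6
side_a = {0, 1, 2}
monochromatic = {b: "red" for b in range(n)}
split = {b: ("red" if b in side_a else "blue") for b in range(n)}
both_legal = consistent(monochromatic, n, side_a) and consistent(split, n, side_a)
```

Both colorings are compatible with the same answers to all triples, although only the monochromatic one has a majority ball.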

Because of this, our aim will be to show a non-minority ball (which always exists if there are only two colors). Let us assume the balls are all red or blue and all queries are of size . We will denote the minimum number of queries needed (in the worst case) to determine a non-minority ball by in the adaptive model and by in the non-adaptive model.

At first sight the model we have just introduced seems to be rather artificial. Let us, however, state a more natural problem that is equivalent to this model.

Suppose that our input is a binary sequence of length $n$, i.e., $n$ numbers $a_1, \dots, a_n$ such that each $a_i$ is either 0 or 1. Our task is to find a median element, such that the queries are odd subsets of the input elements and the answer is one of the median elements of the subset. Let us assume queries are of size . Denote the minimum number of queries needed in the worst case to determine a median element by in the adaptive model and by in the non-adaptive model.

Proposition 1.

and .

Proof.

If we replace 0 and 1 by red and blue, then the median elements of any set are exactly the non-minority balls of the set. ∎

We obtain a natural generalization that also works for even-sized subsets if the answer is the th element for some fixed . More precisely, for a query , the answer may be if and only if there exist elements , such that and elements , such that . Note that in this model there might be more than one valid answer to a given query. This can be ruled out by assuming that all elements are different (in which case we do not deal with just the numbers 0 and 1, obviously). This approach was proposed by G. O. H. Katona and studied by Johnson and Mészáros JM (). They have shown that if all elements are different, then they can be almost completely sorted (the largest and the smallest elements can never be differentiated with such questions, so we only want to determine these and sort the rest) using queries in the adaptive model and queries in the non-adaptive model, and both results are sharp. However, their algorithms fail if not all elements are different. Our results imply that the same bound holds in the adaptive model with no restriction. However, the bound in the non-adaptive model cannot be extended to the general case. We discuss our related results in Section 4.

To state our results concerning and we introduce the following notations. We write for the set of the first positive integers and the set of balls is denoted by . For a set , the set of its -subsets will be denoted by . Let be a query set. Then for any ball let denote the degree of in and for any two balls let denote the co-degree of and in . Furthermore, let us write and .

Throughout the paper we use the following standard notation to compare the asymptotic behavior of two functions and . We write if holds. We write () if there exists a positive constant such that () holds for all values of and we write if both and hold. Sometimes, the function might have two variables and . Then means that for every there exists a constant such that holds for all values of . Finally, we write if there exist positive constants and such that holds.

Theorem 2.

.

Before stating our results on we state a theorem on the structure of query sets that do not determine non-minority balls.

Theorem 3.

(i) If is odd and is a set of non-adaptive queries with , i.e., there is a pair of balls with , then cannot determine a non-minority ball.

(ii) For every there exists a non-adaptive query set with that determines a non-minority ball.

(iii) If is even and is a set of non-adaptive queries, such that there exist four balls with , then cannot determine a non-minority ball.

(iv) Any non-adaptive query set with determines a non-minority ball.

(v) There exists a non-adaptive query set with that does not determine a non-minority ball.

With the help of Theorem 3 we will give bounds on .

Theorem 4.

(i) If is odd, then .

(ii) If is even, then .

(iii) .

Concerning large query sizes we prove the following results:

Theorem 5.

We have and .

Theorem 6.

We have and .

The rest of the paper is organized as follows. In Section 2 we examine some generalizations of the well-known median of medians algorithm of Blum, Floyd, Pratt, Rivest, and Tarjan bfprt (). Using the facts shown in Section 2, we prove Theorem 2, Theorem 3 and Theorem 4 in Section 3. Section 4 contains the proofs of Theorem 5 and Theorem 6; there we also introduce some new models and some open problems. We postpone some proofs and the analysis of a related model to the Appendix.

2 Algorithms

In this section we gather most of the building blocks of the algorithms we will use in Section 3 to prove our main results. We start with an easy algorithm that finds two balls of different colors unless all balls have the same color.


Algorithm 2DB (Two Different Balls)

Input: a subset of the balls colored with two colors.

Query: a triple of balls.

Answer: a majority ball in (that we will call answer ball).

Output: two balls with the property that either they are of different colors or all balls in have the same color.

Description of Algorithm 2DB

We start with an arbitrary query, then remove the answer ball, keep the other two elements and add a new element to obtain the second query. Then we repeat this procedure, always replacing the answer ball by a ball that has not appeared in any earlier query. After questions, we have removed balls, and thus we cannot continue this procedure and the algorithm outputs the remaining two balls. If these balls are of the same color, then it is easy to see that all balls in have the same color.

Remark: Algorithm 2DB is clearly adaptive, but if a non-adaptive query set contains all queries from a subset of , then Algorithm 2DB can be used. We will do so in Section 3.
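A minimal Python sketch of Algorithm 2DB follows. The oracle interface `query(triple)`, the helper `make_oracle` (one possible way of answering, not the only one) and the exhaustive check are illustrative assumptions. On $m \ge 3$ input balls the procedure asks $m - 2$ queries, since each query removes exactly one ball and two balls remain at the end.

```python
from itertools import product


def two_different_balls(balls, query):
    """Run Algorithm 2DB; `query(triple)` must return a majority ball of the triple.

    Returns the two remaining balls and the number of queries asked.
    """
    current = list(balls[:3])
    unused = list(balls[3:])
    asked = 0
    while True:
        answer = query(tuple(current))
        asked += 1
        current.remove(answer)            # drop the answer ball, keep the other two
        if not unused:
            return tuple(current), asked
        current.append(unused.pop())      # add a ball that has not appeared yet


def make_oracle(color):
    """One possible oracle: answer the first ball that has a same-colored partner."""
    def query(triple):
        for b in triple:
            if any(color[b] == color[o] for o in triple if o != b):
                return b
    return query


def check_all_colorings(n):
    """Check the output property for every two-coloring of n balls."""
    for color in product((0, 1), repeat=n):
        (u, v), asked = two_different_balls(list(range(n)), make_oracle(color))
        if asked != n - 2:
            return False
        if color[u] == color[v] and len(set(color)) > 1:
            return False
    return True
```

The check only exercises one answering rule; the argument in the description covers every rule, because each removed ball shares its color with one of the two balls that are kept.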

From now on for a coloring and a ball we denote the color of by .

Observation 7.

Suppose the balls are colored with 0 and 1, and we know that the color of $x$ is 0 and the color of $y$ is 1. If we query $\{x,a,b\}$ and $\{y,a,b\}$, then we can conclude one of the following.

  • $c(a) \le c(b)$,

  • $c(a) \ge c(b)$,

  • $c(a) \ne c(b)$.

Proof.

If the answers to the queries and are, respectively, and , then and are of different colors. Otherwise, either the answer to is , in which case holds, or the answer to is , in which case holds. ∎
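The case analysis can be sketched in Python as follows. We assume the reading in which $x$ is a ball known to have color 0, $y$ is a ball known to have color 1, and the two queries are the triples formed by adding $a$ and $b$ to $x$ and to $y$, respectively. The names `compare_via_triples`, `valid_answers` and `check_observation` are ours, and the sketch asks the second query only when the first one is answered by $x$.

```python
from itertools import product


def compare_via_triples(x, y, a, b, query):
    """Return '<=', '>=' or '!=' as a true statement about (c(a), c(b))."""
    first = query((x, a, b))
    if first == a:      # a has color 0 or the color of b
        return "<="
    if first == b:      # b has color 0 or the color of a
        return ">="
    second = query((y, a, b))
    if second == a:     # a has color 1 or the color of b
        return ">="
    if second == b:     # b has color 1 or the color of a
        return "<="
    return "!="         # answers x and y: one of a, b is 0 and one is 1


def valid_answers(triple, color):
    """All balls of the triple that the Adversary may legally answer."""
    return [p for p in triple if any(color[p] == color[q] for q in triple if q != p)]


def check_observation():
    """Try every coloring of a, b and every legal pair of answers."""
    x, y, a, b = "x", "y", "a", "b"
    holds = {"<=": lambda u, v: u <= v, ">=": lambda u, v: u >= v, "!=": lambda u, v: u != v}
    for ca, cb in product((0, 1), repeat=2):
        color = {x: 0, y: 1, a: ca, b: cb}
        t1, t2 = (x, a, b), (y, a, b)
        for ans1, ans2 in product(valid_answers(t1, color), valid_answers(t2, color)):
            answers = {t1: ans1, t2: ans2}
            verdict = compare_via_triples(x, y, a, b, answers.__getitem__)
            if not holds[verdict](ca, cb):
                return False
    return True
```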

Proposition 1 showed the connection between the non-minority problem and finding a median element among entries. Algorithm 2DB and Observation 7 tell us that we can almost imitate comparison based algorithms to find the median with triple queries. Therefore it is natural to think that some of the comparison based median finding algorithms can be altered in a way that can be useful for our purposes. In the remainder of this section, we show two variants of the well-known median of medians (MoM) algorithm by Blum, Floyd, Pratt, Rivest, and Tarjan bfprt (). The first variant (MoM2) is a very natural generalization and is of independent interest: entries are not necessarily distinct integers, the answer to a query is either or and the Adversary has the right to answer any of and if and are equal. (Using the well-known Adversary method.) The second variant (MoM3) is much less natural, but is based on Observation 7: entries are 0’s and 1’s and apart from the previous possibilities the Adversary has the right to answer if that is the case.

Since in both models an element may appear more than once, we need the following definitions in order to state our results.

Let be an -element multiset of integers. We call an element of a kth largest element if there exists a partition of , such that , , such that each element in is at least and each element in is at most . A median is a ()st largest element of an -element set.

A decreasing enumeration of an element multiset is a permutation with . Clearly, if all elements of are distinct, then has only one decreasing enumeration, while if in a multiset of integers the multiplicities are , then the number of decreasing enumerations is . We say that a multiset is completely sorted, if we fix one of its decreasing enumerations.

Observation 8.

If for all we know or , then we know a decreasing enumeration of .

Proof.

Consider the directed graph on vertex set with if and only if . If is a DAG (directed acyclic graph), then is a transitive tournament and we are done. If contains some directed cycles, then the integers in corresponding to all elements of such a directed cycle are equal. Therefore we can contract all the vertices of the cycle into one vertex. At some point no directed cycles will remain and we obtain the desired decreasing enumeration. ∎
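For illustration, a decreasing enumeration can also be read off directly from the answers by counting, for each element, how many comparisons it "won". This is a different technique from the cycle contraction used in the proof: an element of a larger value wins against every element of a smaller value, so sorting by the number of wins never places a smaller value before a larger one, however ties are answered. The function name and the oracle signature below are assumptions made for this sketch.

```python
def decreasing_enumeration(n, at_least):
    """Return indices 0..n-1 ordered so that the values are non-increasing.

    at_least(i, j), called for i < j, is True if the answer certifies
    a_i >= a_j and False if it certifies a_i <= a_j (either is allowed on ties).
    """
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if at_least(i, j):
                wins[i] += 1
            else:
                wins[j] += 1
    return sorted(range(n), key=lambda i: -wins[i])


values = [3, 1, 3, 2, 1]
# An oracle that answers "a_i <= a_j" whenever the two values are equal.
order = decreasing_enumeration(len(values), lambda i, j: values[i] > values[j])
```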

The next algorithm shows that instead of asking all pairs, the problem of finding a th largest element can be solved adaptively in linear time.

Algorithm MoM2

Input: an element multiset of integers and an integer with .

Goal: find one of the th largest elements of the input multiset.

Query: a pair .

Answer: either or (in case , both answers are possible).

Output of Algorithm MoM2: (, , ), a partition of the input multiset, where:

is one of the th largest elements,

is a set of elements that are at least , and

is a set of elements that are at most .

Description of Algorithm MoM2

The Algorithm MoM2 consists of 4 phases. For each phase we will write in teletype style what the algorithm does, and our analysis will be in normal typestyle.

In the description of the algorithm we will introduce two sets: and , change them dynamically during the phases and finally use them to define and of the output. (For the sake of simplicity we will use the term set instead of multiset in the description of the algorithm.)

To count the queries used by the recursive calls, we denote by the worst case running time of Algorithm MoM2 on elements for any .

Phase 1: We divide the elements of the -element input set into groups of five except at most four elements and sort each group.

Observation 8 shows that this can be done with 10 comparisons in each group, but a simple case analysis shows that adaptively 7 queries are enough. Therefore this phase requires queries. Take a median from each group to form of size .

Phase 2: By recursion we can find a median of , called the pivot, and a partition of the rest of the elements of into two almost equal subsets, and , such that each element of is at most as large as the pivot and each element of is at least as large as the pivot.

Then we put into the elements that were smaller than or equal to some in their (completely sorted) group and we put into the elements which were larger than or equal to in their (completely sorted) group.

We use queries during this phase. Note also that at the end of this phase we have

Phase 3: We compare each element in the complement of to the pivot and if the answer is , we put in , if the answer is , we put in .

This phase requires at most queries. At the end of this phase we have

Phase 4:

Case 1: If (i.e., the pivot is a th largest element), then the output of the algorithm is .

Case 2: If , then by a recursive call on , whose output is , we find , a th element of and a partition of such that and . In this case the output of the algorithm is .

Case 3: If , then by a recursive call on we find , a th element of and a partition of such that . In this case the output of the algorithm is .

In each case of Phase 4 we find the respective element, using at most

queries. During the algorithm altogether we have used

queries. Now by induction . It is worth mentioning that this is somewhat better than the well-known bound for all different numbers using the original algorithm. (Note that the best bound for finding the median is between and , see DZ ().)
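The overall structure of the four phases can be sketched in Python as follows. This is a simplified variant under stated assumptions, not the exact procedure analysed above: groups are sorted by insertion sort rather than with 7 queries, Phase 3 compares every element to the pivot instead of reusing the information gained in Phase 2, and only the selected element (not the partition) is returned, so the constant in the query count differs. Elements are represented by distinct indices, and the names `sort_desc`, `kth_largest` and `at_least` are ours.

```python
def sort_desc(items, at_least):
    """Insertion sort into a decreasing enumeration, using only the oracle."""
    out = []
    for x in items:
        pos = 0
        while pos < len(out) and at_least(out[pos], x):
            pos += 1
        out.insert(pos, x)
    return out


def kth_largest(items, k, at_least):
    """Return one k-th largest element of `items` (k is 1-indexed).

    at_least(a, b) is True if the answer certifies a >= b and False if it
    certifies a <= b; on ties the Adversary may give either answer.
    """
    items = list(items)
    if len(items) <= 5:
        return sort_desc(items, at_least)[k - 1]
    # Phase 1: sort groups of (at most) five and take a middle element of each.
    groups = [sort_desc(items[i:i + 5], at_least) for i in range(0, len(items), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # Phase 2: the pivot is a middle element of the medians, found recursively.
    pivot = kth_largest(medians, (len(medians) + 1) // 2, at_least)
    # Phase 3 (simplified): compare every other element to the pivot.
    upper, lower = [], []
    for x in items:
        if x == pivot:
            continue
        (upper if at_least(x, pivot) else lower).append(x)
    # Phase 4: either the pivot is the answer or we recurse on one side.
    if len(upper) == k - 1:
        return pivot
    if len(upper) >= k:
        return kth_largest(upper, k, at_least)
    return kth_largest(lower, k - len(upper) - 1, at_least)


values = [3, 1, 3, 2, 1, 0, 2, 3, 1, 0, 2]
oracle = lambda i, j: values[i] > values[j]   # ties are answered "a_i <= a_j"
sixth = kth_largest(range(len(values)), 6, oracle)
```

Since every element of `upper` is at least the pivot and every element of `lower` is at most the pivot, the recursion is valid even when the input contains repeated values.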


Let us now consider the second model, that is, where the input is restricted to binary sequences, but the Adversary can answer , or . In this model for any there is a strategy of the Adversary, such that even asking all possible queries we cannot find any of the th largest elements. Indeed, let us partition into two sets of equal size and . If the Adversary answers whenever and , then all we know is that there are the same number of 0’s and 1’s in and the two classes are and , but we cannot tell which one is which, therefore we cannot solve the problem.

On the other hand, using Algorithm 2DB, for balls we can suppose that a 0 and a 1 is in , provided not all elements of are the same. Still, a similar strategy of the Adversary shows that even after querying all pairs, we cannot sort all the elements. However, we can show a th largest element for any .

Observation 9.

If the Adversary answers all possible queries, then

(i) we can partition such that is completely sorted and contains exactly many 0’s and 1’s. In particular, if is odd then we can find a median element.

(ii) We can determine a th largest element provided hold.

Proof.

We can check if there exists an element multiset satisfying all answers of the Adversary. If not, then we are done, as we revealed a contradiction in his answers. However, if there is an element multiset satisfying the Adversary’s answers, then let us remove a subset in the following way: let and as long as there exists a pair with the answer , let us put . If there is no such pair, then we stop and write . As removed elements come in pairs, we obtain that for some and contains exactly 0’s and 1’s. If , then a th largest element must be 1, thus we can output . Similarly, if , then a th largest element must be 0 and we can output . Finally, if holds, then by Observation 8 we can sort and output a th element of . To see the second part of (i), note that if is odd, then always holds. ∎

Algorithm MoM3

Input: an -element multiset containing 0’s and 1’s (and we will use 0 instead of and 1 instead of ) and an integer with .

Goal: find one of the th largest elements of the input multiset.

Query: a pair .

Answer: either , or .

Output of Algorithm MoM3: (), a partition of the input multiset where:

is one of the th largest elements of .

such that of them are 0’s and of them are 1’s. Moreover, the output also contains a bijection among the 0’s and 1’s of . Using this bijection, we will talk about the pair of an element of .

, a partition of such that each element in is at most and each element of is at least as large as with if .

If we output and we output if . (Note that cannot happen.)

Description of Algorithm MoM3

Algorithm MoM3 consists of 5 phases. For each phase we will write in teletype style what the algorithm does, and our analysis will be in normal typestyle.

In the description of the algorithm we will introduce three sets: , and , change them dynamically during the phases and finally use them to define , and of the output. We make sure that at any moment of the algorithm consists of pairs of 0’s and 1’s. (Again, for the sake of simplicity we will use the term set instead of multiset.)

To count the queries used by the recursive calls, we denote by the worst-case running time of Algorithm MoM3 on elements for any . For the sake of simplicity, during the computation we omit the additive constants and the floor and ceiling signs.

Phase 1: We divide the elements of into groups of five (with the exception of at most four elements) and execute all possible queries within all groups. Applying Observation 9 (i) we obtain a median element in each of the groups.

Let denote the set of medians, then . For any , let denote its five element group, and let and be the partition of we obtain by Observation 9. We put and note that has size 1, 3, or 5 for all .

This phase requires queries.

Phase 2: By a recursive call on , we find a median of , called the pivot, a set of pairs (we do not put these elements into ), and a partition of the rest of the elements into two subsets, and with , such that each element of is at most as large as the pivot and each element of is at least as large as the pivot. For all we put all elements of the sorted not larger than into and for all elements we put all elements of the sorted not smaller than into .

Note that both and can contain and .

This phase requires queries.

Phase 3: For every pair we query all pairs with . Using the fact that one of and is 0, the other one is 1, we can apply Observation 9 (i) to and obtain the partition with .

If , we put its elements into ,

if not, then we can still deduce that some elements of and are 0 and 1 using that are median elements of and . Therefore we can pair the first one and the last one (if ) or the first two and the last two (if ) elements of the order of and put them into along with and .

Note that at this moment by Phase 2 for every we have at least 3 elements of in and by Phase 3 for every pair we have at least 6 elements of in . Thus holds. Observe that at this moment we have as half of the groups , , contributed at most 3 elements to and the other half of such groups contributed at most 3 elements to , while groups with contributed only to . Let .

This phase requires queries.

Phase 4: For every element we query the pair . If the answer is , then we put into , if the answer is , then we put into .

Observe that new elements to and came from , thus at this moment their size is not more than . Let be the set of those elements, when the answer is . Note that all elements of have the same color (the opposite of that of ) and as , we have .

By the above observation, we use at most queries.

Phase 5: In this phase, we compare every element of to one element of . We proceed in the following way:

if the answer to query with , is , then we put and as a pair to and the remaining elements of should be compared to a remaining element of ,

if the answer is or , then we just move on to the next element of .

We use at most queries.

The following can occur during Phase 5:

If in any of the following cases happens, we output and if happens, we output .

Case 1: becomes empty as all its elements are moved to .

In this case is partitioned into , , , and . By the above case we are not done if , but then observe that the th largest element out of the original elements is the same as the th element from . We know that for all .

Therefore we can make a recursive call to either with or with depending on whether or (if , then is a th largest element of and we can output ).

If the recursive call on outputs , then our final output is . The case when we make the recursive call to is similar.

As , this requires at most queries.

Case 2: We obtain an answer for some and .

In this case the value of the pivot is , all elements of are and all elements of are . Indeed, supposing that the value of the pivot is 0, then as for all we have , we obtain , a contradiction. As the value of the pivot is 1, so are the values of all . Thus

if , then we can output , where is an arbitrary subset of of size and ,

if , we can output , where is a subset of of size and , since half of the elements in and all elements in are 0,

if , then a st element of is a th element of . We make a recursive call to with . If the output of the recursive call is , then our final output is .

This case uses at most queries.

Case 3: We obtain an answer for some and .

In this case the value of the pivot is , all elements of are and all elements of are and we proceed analogously to the previous case.

Case 4: We have and for all , , .

In this case all elements of have value and all elements of have value . Indeed, we have , and .

We put one element and to as a pair and

if , then we can output , where is a subset of of size and ,

if , then we output , where is a subset of of size and ,

if , then we output , where , with and .

In all cases of the final case analysis we used at most queries. Therefore we obtain that the running time satisfies

Solving this we obtain a linear bound .

3 Proofs of the main theorems

In this section we prove Theorems 2, 3 and 4. To make the presentation easier to follow, we restate them before their proofs.

First we put together the pieces from Section 2 to obtain a proof of Theorem 2.

Theorem 2. .

Proof of Theorem 2.

Let us start by executing Algorithm 2DB. If the two balls of the output have the same color, then no matter which ball we output, it will be a non-minority ball. Thus, from now on we assume that the two remaining balls are of different colors. We call the color of one of them 0, the other 1, and denote the respective balls by 0 and 1. To every ball we assign the number of its color, e.g., for two balls $a$ and $b$, $c(a) \le c(b)$ means that if $a$ has color 1, then so does $b$. By Observation 7, after obtaining the answers to the queries $\{0,a,b\}$ and $\{1,a,b\}$ we know whether $c(a) \le c(b)$, $c(a) \ge c(b)$ or $c(a) \ne c(b)$ holds.

Therefore, we can run in linear time the queries that correspond to the queries of Algorithm MoM3 to find a median, which is, by Proposition 1, a non-minority ball. ∎

Now we turn our attention to non-adaptive problems. We start with a definition and a simple observation that we will use in many of our proofs.

Definition 10.

Let be a non-adaptive query set and () a possible sequence of answers. We say that is a legal coloring of the ball set if for every is a majority ball in . The minority set of a coloring is the set of all balls that are not non-minority balls.

Observation 11.

A non-adaptive query set does not determine a non-minority ball if and only if there exists a sequence () of answers for which the minority sets of all legal colorings cover the ball set, i.e., where denotes the set of all legal colorings. ∎
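For very small instances the criterion of Observation 11 can be tested by exhaustive search. The Python sketch below enumerates every sequence of answers, collects the legal colorings, and checks whether their minority sets cover the ball set; it is exponential in the number of queries and is meant only as an illustration. The function names and the two example query sets are our own choices.

```python
from itertools import combinations, product


def majority_balls(triple, coloring):
    """Balls of the triple that share their color with another ball of it."""
    return {b for b in triple if any(coloring[b] == coloring[o] for o in triple if o != b)}


def minority_set(coloring):
    """Balls whose color class contains fewer than n/2 balls."""
    n = len(coloring)
    ones = sum(coloring)
    size = {0: n - ones, 1: ones}
    return {b for b in range(n) if 2 * size[coloring[b]] < n}


def determines_non_minority(n, queries):
    """Brute-force test of the criterion in Observation 11."""
    colorings = list(product((0, 1), repeat=n))
    allowed = {c: [majority_balls(t, c) for t in queries] for c in colorings}
    for answers in product(*queries):      # one answered ball per query
        legal = [c for c in colorings
                 if all(a in ok for a, ok in zip(answers, allowed[c]))]
        covered = set()
        for c in legal:
            covered |= minority_set(c)
        if legal and len(covered) == n:
            return False                   # these answers reveal no non-minority ball
    return True


all_triples_4 = list(combinations(range(4), 3))
# Five balls, where balls 0 and 1 never appear in a common query.
pair_never_together_5 = [t for t in combinations(range(5), 3) if not {0, 1} <= set(t)]
```

For four balls, all triples together determine a non-minority ball. For five balls, the second query set does not: if balls 0 and 1 are answered whenever they appear with one ball from each of $\{2,3\}$ and $\{4\}$, and ball 2 is answered otherwise, the minority sets of the legal colorings cover all five balls.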

Using the above simple observation, we can prove Theorem 3 that we restate here.

Theorem 3.

(i)  If is odd and is a set of non-adaptive queries with , i.e., there is a pair of balls with , then cannot determine a non-minority ball.

(ii)  For every there exists a non-adaptive query set with that determines a non-minority ball.

(iii)  If is even and is a set of non-adaptive queries, such that there exist four balls with , then cannot determine a non-minority ball.

(iv)  Any non-adaptive query set with determines a non-minority ball.

(v) There exists a non-adaptive query set with that does not determine a non-minority ball.

Proof of Theorem 3.

First we prove (i). Let and assume that for a query set and a pair of balls we have . We have to show that we cannot determine a non-minority ball. Let us partition into two sets and such that , , and . To prove (i) we will show how to answer queries of such that the conditions of Observation 11 are met. Let be an arbitrary query.

  1. if for some , then the answer to is a ball from ,

  2. if for , then we answer the ball from ,

  3. if , then by the assumption on the partition we know that the third ball in belongs to , and the answer is this third ball.

Note that the above answers are all possible if we assume that balls in are blue and balls in are red. Furthermore, (3) assures that at least one of and is also red. Thus the three different colorings for which is blue, is red and at least one of is red, with respective minority sets , are all legal with respect to the above answers. Therefore by Observation 11 does not determine a non-minority ball.

To prove (ii) we construct a query set with that determines a non-minority ball. Let be a subset of the balls with and let . In proving that does indeed determine a non-minority ball we will apply our results concerning the adaptive algorithms from Section 2.

We start by executing Algorithm 2DB on (we can do that as ). We obtain two balls and that have different colors unless all balls in have the same color. If and are of the same color, then that color is the majority color (and hence non-minority), and if some is colored with the other color, cannot be in , therefore the answer to the query cannot be . We look at the queries of the form for all , and define to be the set of those balls , for which the answer to the query is or . We will make sure that our final output will be a ball in . This guarantees that if the colors of and are the same, then we will output a non-minority ball. Therefore, we can assume that the color of is 0 and the color of is 1. If the answer to the query is , then the color of is 0, if the answer is , then the color of is 1. Let denote the number of balls of the latter type. We remove the balls in , and then the median is the th largest among the remaining balls for (this is always positive as ), and we can find it using Algorithm MoM3. All these queries are in , as they contain or .

For (iii), we have to show that if is even and is a set of non-adaptive queries such that there exist four balls with , then cannot determine a non-minority ball. Take the set of balls that are in a query with one of and one of , e.g., , and add some further balls to them, if necessary, to form a set of size . Let be the set of the remaining balls. We answer the queries such that all colorings are valid for which , , and are all monochromatic sets, and are colored differently and either and , or and (possibly all four) have the same color as the balls in .

  • If a query meets one of the four sets above in at least two balls, then the answer is one of those balls,

  • the answer to a query with , , is ,

  • the answer to a query with is .

By the definition of the partition , there are no other possible queries, and one can easily check that the sets and are all minority sets of legal colorings and thus we are done by Observation 11.

We prove (iv) by contradiction. Assume is a query set with that does not determine a non-minority ball. Then by Observation 11 there exists a set of answers for which the minority sets of all legal colorings cover the ball set . Let be a minimal set of legal colorings for which holds. Note that as for any legal coloring . Let us consider three legal colorings and the corresponding minority sets . By the minimality of , there exist balls with