Context-Aware Hierarchical Online Learning
for Performance Maximization
in Mobile Crowdsourcing
Abstract
In mobile crowdsourcing, mobile users accomplish outsourced human intelligence tasks. Mobile crowdsourcing requires an appropriate task assignment strategy, since different workers may differ in performance in terms of acceptance rate and quality. Task assignment is challenging, since a worker's performance (i) may fluctuate, depending on both the worker's current context and the task context, and (ii) is not known a priori, but has to be learned over time. However, learning context-specific worker performance requires access to context information, which workers may not grant to a central entity. Moreover, evaluating worker performance might require costly quality assessments. In this paper, we propose a context-aware hierarchical online learning algorithm addressing the problem of performance maximization in mobile crowdsourcing. In our algorithm, a local controller (LC) in the mobile device of a worker regularly observes the worker's context, his decisions to accept or decline tasks and the quality of completed tasks. Based on these observations, the LC regularly estimates the worker's context-specific performance. The mobile crowdsourcing platform (MCSP) then selects workers based on performance estimates received from the LCs. This hierarchical approach enables the LCs to learn context-specific worker performance and enables the MCSP to select suitable workers. In addition, our algorithm keeps worker context local, and it keeps the number of required quality assessments low. We prove that our algorithm converges to the optimal task assignment strategy. Moreover, the algorithm outperforms simpler task assignment strategies in experiments based on synthetic and real data.
I Introduction
Conventional web-based crowdsourcing is a popular way to outsource human intelligence tasks, prominent examples being Amazon Mechanical Turk (https://www.mturk.com) and Crowdflower (https://www.crowdflower.com/). More recently, mobile crowdsourcing has evolved as a powerful tool to leverage the workforce and skills of a plethora of mobile users to accomplish tasks in a distributed manner [1]. This may be due to the fact that the number of mobile devices is growing rapidly and, at the same time, people spend a considerable amount of their daily time using these devices. For example, according to the Cisco Visual Networking Index, between 2015 and 2016, the global number of mobile devices grew from 7.6 to 8 billion [2]. Moreover, according to an estimate of eMarketer [3], the daily time US adults spend using mobile devices will be more than 3 hours in 2017, a marked increase compared to 2013.
In mobile crowdsourcing, task owners outsource their tasks via an intermediary mobile crowdsourcing platform (MCSP) to a set of workers, i.e., mobile users, who may be willing to complete these tasks. A mobile crowdsourcing task may require the worker to interact with his mobile device in the physical world (e.g., photography tasks) or to complete some virtual task via his mobile device (e.g., image annotation, sentiment analysis). Some mobile crowdsourcing tasks, subsumed under the term spatial crowdsourcing [4], are spatially constrained (e.g., a photography task at a point of interest), or require high spatial resolution (e.g., an air pollution map of a city). In spatial crowdsourcing, tasks typically require workers to travel to certain locations. However, recently emerging mobile crowdsourcing applications are also concerned with location-independent tasks. For example, MapSwipe (https://mapswipe.org/) lets mobile users annotate satellite imagery to find inhabited regions around the world. The GalaxyZoo app (https://www.galaxyzoo.org/) lets mobile users classify galaxies. The latter project is an example of the more general trend of citizen science [5]. On the commercial side, Spare5 (https://app.spare5.com/fives) or Crowdee (https://www.crowdee.de/) outsource micro-tasks (e.g., image annotation, sentiment analysis, and opinion polls) to mobile users in return for small payments. While location-independent tasks could just as well be completed by users of static devices as in web-based crowdsourcing, emerging mobile crowdsourcing applications for location-independent tasks exploit the increasing number of online mobile users that complete tasks on the go.
Mobile crowdsourcing – be it spatial or location-independent – requires an appropriate task assignment strategy, since not all online workers may be equally suitable for a given task. First, different workers may have different task preferences and hence different acceptance rates. Secondly, different workers may have different skills, and hence provide different quality when completing a task. Two assignment modes considered in the crowdsourcing literature are the server-assigned task (SAT) mode and the worker-selected task (WST) mode (see [6] for a taxonomy). In SAT mode, the MCSP tries to match workers and tasks in an optimal way, e.g., to maximize the number of task assignments, possibly under a given task budget. For this purpose, the MCSP typically gathers task and worker information to decide on task assignment. While this represents a sophisticated strategy, it may incur a large communication overhead and raises a privacy concern for workers, since the MCSP has to be regularly informed about the current worker contexts (e.g., their current positions). Moreover, previous work on the SAT mode often assumed either that workers always accept a task once assigned to it, or that workers' acceptance rates and quality are known in advance. However, this is not necessarily true in reality, since acceptance rates and quality are usually not known beforehand and therefore have to be learned by the MCSP over time. In addition, a worker's acceptance rate and the quality of completed tasks might depend not only on the specific task, but also on the worker's context, e.g., the worker's location or the time of day [7]. This context may change quickly, especially in mobile crowdsourcing with location-independent tasks, since workers can complete such tasks anytime and anywhere.
In contrast, in WST mode, workers autonomously select tasks from a list. This rather simple mode is often used in practice (e.g., on Amazon Mechanical Turk) since it has the advantage that workers automatically select tasks they are interested in. However, the WST mode can lead to suboptimal task assignments since, first, finding suitable tasks is not as easy as it seems (e.g., time-consuming searches within a long list of tasks are needed and workers might simply select from the first displayed tasks [8]) and, secondly, workers might leave unpopular tasks unassigned. Therefore, in WST mode, the MCSP might additionally provide personalized task recommendation (TR) to workers such that workers find appropriate tasks [7]. However, personalized TR typically requires the workers to share their current context with the MCSP, which again means a communication overhead and a privacy concern for workers.
We argue that a task assignment strategy is needed which combines the advantages of the above modes: The MCSP should centrally coordinate task assignment to ensure that appropriate workers are selected for each task, as in SAT mode. At the same time, the communication overhead due to task assignment should be small and workers’ personal data should be protected. Hence, no worker context should be shared with the MCSP, as in basic WST mode. Moreover, task assignment should take into account that workers may decline a task they are assigned to, and hence, the assignment should fit to the workers’ preferences, as in WST mode with personalized TR. In addition, task assignment should be based both on acceptance rates and on the quality with which a task is completed. Since observing the quality of completed tasks may require costly quality assessments (e.g., a manual quality rating from a task owner, or an automatic quality assessment using local software in a mobile device), the required number of quality assessments should be kept low. Finally, workers’ acceptance rates and quality have to be learned over time.
Our contribution therefore is as follows: We propose a context-aware hierarchical online learning algorithm for performance maximization in mobile crowdsourcing for location-independent tasks. Our algorithm is split into two parts, one part executed by the MCSP, the other part by local controllers (LCs) located in each of the workers' mobile devices. An LC learns its worker's performance in terms of acceptance rate and quality online over time, by observing the worker's personal contexts, his decisions to accept or decline tasks and the quality of completed tasks. The LC learns from its worker's context only locally, and personal context is not shared with the MCSP. Each LC regularly sends performance estimates to the MCSP. Based on these estimates, the MCSP takes care of the worker selection. This hierarchical approach (in the sense of the coordination between the MCSP and the LCs) enables the MCSP to select suitable workers for each task within its budget, without having access to the workers' personal contexts. Moreover, workers receive personalized task requests based on their interests and skills, while the number of (possibly costly) quality assessments is kept low. Since our algorithm learns in an online fashion, it adapts and improves the worker selection over time and can hence achieve good results already at run time. We prove that our algorithm converges to the optimal task assignment strategy, which for each task selects the subset of workers that maximizes the expected performance under the task budget.
The remainder of this paper is organized as follows. Section II gives an overview on related work. Section III describes the system model. In Section IV, we propose a contextaware hierarchical online algorithm for performance maximization in mobile crowdsourcing. In Section V, we theoretically analyze our proposed algorithm in terms of its regret, as well as its requirements with respect to local storage, communication and worker quality assessment. Section VI contains a numerical evaluation of our algorithm based on synthetic and real data. Section VII concludes the paper.
II Related Work
Research has put considerable effort into theoretically defining and classifying crowdsourcing systems, such as web-based [9], mobile [1] and spatial [4] crowdsourcing.
| | [13] | [14] | [15] | [16] | [17] | [18] | [6], [19] | [20] | [21] | [22] | This work |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Crowdsourcing Type | General | General | General | General | Mobile | Mobile | Spatial | Spatial | Spatial | Spatial | Loc.-Ind. |
| Task Assignment Mode | SAT | SAT | SAT | WST/TR | WST/TR | SAT | SAT | SAT | SAT | SAT | proposed |
| Different Task Types | Yes | No | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| Worker Context-Aware | No | No | No | No | Yes | No | Yes | Yes | Yes | No | Yes |
| Context-Spec. Performance | No | No | No | No | Yes | No | No | Yes | Yes | No | Yes |
| Worker Context Protected | N/A | N/A | N/A | N/A | Yes | N/A | No | No | Yes | N/A | Yes |
| Type of Learning | Online | Online | Batch | Online | Offline | Online | N/A | Online | N/A | Offline | Online |
| Regret Bounds | Yes | Yes | No | No | No | Yes | N/A | Yes | N/A | No | Yes |
Below, we give an overview of related work on task assignment in general, mobile and spatial crowdsourcing systems, as relevant for our scenario. Note that strategic behavior of workers and task owners in crowdsourcing systems, e.g., concerning pricing and effort spent in task completion [10], represents an active research area in itself, which is out of the scope of this paper. Also note that we assume that the quality of a completed task can be observed. A different line of work on crowdsourcing deals with quality estimation in case of missing ground truth, recently also using online learning [11].
Due to the inherent dynamic nature of crowdsourcing, with tasks and/or workers typically arriving dynamically over time, task assignment is often modeled as an online decision making problem [12]. For general crowdsourcing systems, [13] proposed a competitive online task assignment algorithm for maximizing the utility of a task owner on a given set of task types, with a finite number of tasks per task type, by learning the skills of sequentially appearing workers. While [13] considers sequentially arriving workers and their algorithm decides which task to assign to a worker, we consider sequentially arriving tasks and our algorithm decides which workers should be assigned to a task. Therefore, our algorithm can be applied to an infinite number of task types by describing a task using its context. Moreover, our algorithm additionally takes worker context into account, which may affect worker performance in mobile crowdsourcing. In [14], a bounded multi-armed bandit model for expert crowdsourcing is presented and a task assignment algorithm with sublinear regret is derived which maximizes the utility of a budget-constrained task owner under uncertainty about the skills of a finite set of workers with (known) different prices and limited working time. While in [14], the average skill of a worker is learned, our algorithm additionally takes worker and task contexts into account, and thereby learns context-specific performance. In [15], a real-time algorithm for finding the top-k workers for sequentially arriving tasks is presented. First, tasks are categorized offline into different types and the similarity between a worker's profile and each task type is computed. Then, in real time, the top-k workers are selected for a task based on a matching score, which takes into account the similarity and historic worker performance. The authors propose to periodically update the performance estimates offline in batches, but no guarantees on the learning process are given.
Compared to [15], we additionally take worker context into account, learn context-specific performance and derive guarantees on the learning speed. In [16], methods for learning a worker preference model are proposed for personalized TR in WST mode. These methods use the history of worker preferences for different tasks, but in contrast to our algorithm, they do not take the worker context into account for context-aware TR.
In mobile crowdsourcing, [17] proposes algorithms for optimal TR in WST mode that take into account the trade-off between the privacy of worker context, the utility of recommending the best tasks and the efficiency in terms of communication and computation overhead. The TR is performed by a server based on a generalized context shared by the worker. The statistics used for TR are gathered offline via a proxy that ensures differential privacy guarantees. While [17] allows the shared generalized context to be flexibly adjusted and makes TRs based on offline statistics and generalized worker context, our approach keeps the worker context local and learns each worker's individual statistics online. In [18], which is focused on mobile crowdsensing, an online learning algorithm is presented to maximize the sensing revenue of a budget-constrained task owner by learning the sensing value of workers with known prices. While [18] considers a total budget and each crowdsensing task requires a minimum number of workers, we consider a separate budget per task, which translates to a maximum number of required workers. Moreover, we additionally take task and worker context into account.
A taxonomy for spatial crowdsourcing was first introduced in [6]. The authors present a location-entropy based algorithm for SAT mode to maximize the number of task assignments under uncertainty about task and worker arrival processes. The server decides on task assignment based on centrally gathered knowledge about the workers' current locations. In [19], the authors extend this framework to maximize the quality of assignments under varying worker skills for different task types. However, in contrast to our work, [6] and [19] assume that worker context is centrally gathered, that workers always accept assigned tasks within certain known bounds and that worker skills are known a priori. In [20], an online task assignment algorithm is proposed for spatial crowdsourcing with SAT mode for maximizing the expected number of accepted tasks. The problem is modeled as a contextual multi-armed bandit problem, and workers are selected for sequentially arriving tasks. The authors adapt the LinUCB algorithm by assuming that the acceptance rate is a linear function of the worker's distance to the task and the task type. However, such a linearity assumption is restrictive, and it may especially not hold in mobile crowdsourcing with location-independent tasks. In contrast, our algorithm works for more general relationships between context and performance. In [21], an algorithm for privacy-preserving spatial crowdsourcing in SAT mode is proposed. Using differential privacy and geocasting, the algorithm protects worker locations (i.e., their context) while optimizing the expected number of accepted tasks. However, the authors assume that the workers' acceptance rates are identical and known, whereas our algorithm learns context-specific acceptance rates. In [22], exact and approximation algorithms for acceptance maximization in spatial crowdsourcing with SAT mode are proposed.
The algorithms are performed offline for given sets of available workers and tasks, based on a probability of interest for each worker-task pair. The probabilities of interest are computed beforehand using maximum likelihood estimation. In contrast, our algorithm learns acceptance rates online, and we provide an upper bound on the regret of this learning.
Our proposed algorithm is based on the contextual multi-armed bandit problem [23, 24, 25, 26, 27, 28]. The closest related work is [28], in which a learner observes multiple context arrivals in each round and selects a subset of actions which maximize the expected rewards given the set of context arrivals. We extend the algorithm in [28] as follows: While in [28], a central learner observes all contexts and selects actions based on these contexts, our algorithm is decoupled into several learning entities, each observing the context of one particular action and learning the rewards of this action, and a coordinating entity, which selects actions based on the learning entities' estimates. In the crowdsourcing scenario, an action corresponds to a worker, the learning entities correspond to the LCs which learn the performance of their workers, and the coordinating entity corresponds to the MCSP, which selects workers based on the performance estimates from the LCs. Moreover, while in [28], the same number of actions is selected per round and all actions are available in every round, we allow different numbers of actions to be selected per round and we allow actions to be unavailable. In the crowdsourcing scenario, this corresponds to allowing different numbers of required workers for different tasks and allowing workers to be unavailable.
III System Model
III-A Mobile Crowdsourcing Platform
We consider an MCSP, to which a fixed set $\mathcal{W} := \{1, \dots, W\}$ of $W$ workers belongs. A worker is a user equipped with a mobile device, in which the mobile crowdsourcing application is installed. Workers can be in two modes: A worker is called available if the mobile crowdsourcing application on his device is running. In this case, the MCSP might request the worker to complete a task, which the worker may then accept or decline. A worker is called unavailable if the mobile crowdsourcing application on his device is turned off.
Task owners can place location-independent tasks of different types into the MCSP and select their task budget. In detail, a task $t$ is defined by a tuple $(b_t, x_t)$, where $b_t$ denotes the budget that the task owner is willing to pay the MCSP for this task and $x_t$ denotes the task context. The task context is taken from a bounded $D_\mathrm{T}$-dimensional task context space $\mathcal{X}_\mathrm{T} := [0,1]^{D_\mathrm{T}}$ and captures information about the task type. (In order to represent different task types, we assume that tasks are described by $D_\mathrm{T}$ context dimensions. In each of the context dimensions, a task is classified via a value between $0$ and $1$. Then, $x_t$ is a vector describing task $t$'s overall context.) The task owner has to pay the MCSP for taking care of selecting appropriate workers for his task and for requesting these workers to complete the task. We assume here that the MCSP sets a fixed price $p$ per requested worker to be paid by the task owner. Moreover, we assume that for each task $t$, the budget satisfies $p \le b_t \le pW$, so that the MCSP should request at least one worker and at most $W$ workers. Based on the budget $b_t$, the MCSP computes the number $m_t := \lfloor b_t / p \rfloor$ of workers it should request.
We assume that tasks arrive at the MCSP sequentially and we denote the sequentially arriving tasks by $t = 1, \dots, T$. For each arriving task $t$, the MCSP should select a subset of workers which maximizes the worker performance for that task.
Due to the dynamics in worker availability over time, the MCSP can only select workers for task $t$ from the set $\mathcal{W}_t \subseteq \mathcal{W}$ of currently available workers, where the number of available workers (we assume that for each arriving task, at least one worker is available) is denoted by $W_t := |\mathcal{W}_t|$. Since the MCSP can select at most all available workers, it aims at selecting $\min\{m_t, W_t\}$ workers for task $t$, see Fig. 1 for an illustration. (Note that each task will only be processed once by the MCSP, even if too few workers are available. If fewer workers are available than required for a task, the MCSP will request all available workers to complete the task and the task owner will only be charged for the actual number of requested workers.)
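To make the budget handling concrete, the following minimal sketch computes how many workers the MCSP requests for a task, assuming a fixed price per requested worker as described above (the function and parameter names are illustrative, not taken from the paper):

```python
def num_workers_to_request(budget: float, price_per_worker: float,
                           num_available: int) -> int:
    """Number of workers the MCSP requests for one task.

    The task owner pays a fixed price per requested worker, so the budget
    affords at most floor(budget / price) workers; the MCSP can never
    request more workers than are currently available.
    """
    affordable = int(budget // price_per_worker)   # m_t = floor(b_t / p)
    return max(1, min(affordable, num_available))  # request at least one worker
```

For example, with a budget of 10, a price of 3 per worker and 5 available workers, the MCSP would request 3 workers; with only 2 available workers, it would request both and charge the task owner accordingly.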
III-B Context-Specific Worker Performance
The performance of an (available) worker depends on (i) the worker's willingness to accept the task and (ii) the worker's quality in completing the task, where we assume that the quality can take values in a bounded range $[q_{\min}, q_{\max}]$ with $0 \le q_{\min} \le q_{\max}$. Both the willingness of a worker to accept the task and the quality may depend on the worker's current context and the task context. Let $c_{i,t}$ denote the personal context of worker $i$ at the arrival time of task $t$, coming from a bounded $D_i$-dimensional personal context space $\mathcal{X}_i := [0,1]^{D_i}$. Here, we allow each worker to have an individual personal context space $\mathcal{X}_i$, since each worker may allow access to an individual set of context dimensions due to his personal settings (e.g., the worker allows access to a certain set of sensors of his mobile device that are used to derive his context). Possible personal context dimensions could be the worker's current location (in terms of geographic coordinates), the type of location (e.g., at home, in a coffee shop), the worker's current activity (e.g., commuting, working) or his current device status (e.g., battery state, type of wireless connection). We further call the concatenation $(c_{i,t}, x_t)$ the joint context of worker $i$ and task $t$. For worker $i$, this joint context is hence a vector of dimension $D_i + D_\mathrm{T}$. We call $\mathcal{X}_i \times \mathcal{X}_\mathrm{T}$ the joint (personal and task) context space of worker $i$. The reason for considering the joint context is that the performance of worker $i$ may depend on both his current context and the task context; in other words, the performance depends jointly on $(c_{i,t}, x_t)$.
Let $p_{i,t} := p_i(c_{i,t}, x_t)$ denote the performance of worker $i$ with current personal context $c_{i,t}$ for task context $x_t$. The performance can be decomposed into (i) worker $i$'s decision $d_{i,t}$ to accept ($d_{i,t} = 1$) or reject ($d_{i,t} = 0$) the task and, in case the worker accepts the task, also (ii) worker $i$'s quality $q_{i,t}$ when completing the task. Hence, we can write

$$p_{i,t} = d_{i,t} \cdot q_{i,t}.$$

The performance $p_{i,t}$ is a random variable whose distribution depends on the distributions of the random variables $d_{i,t}$ and $q_{i,t}$. Here, since the decision $d_{i,t}$ is binary, it is drawn from a Bernoulli distribution with unknown parameter $\mu_i(c_{i,t}, x_t)$. Hence, $\mu_i(c_{i,t}, x_t)$ represents the acceptance rate of worker $i$ when the joint context of worker $i$ and task $t$ is $(c_{i,t}, x_t)$. The quality $q_{i,t}$ is a random variable conditioned on $d_{i,t} = 1$ (i.e., conditioned on task acceptance) with unknown distribution, and we denote its expected value by $\lambda_i(c_{i,t}, x_t)$. Hence, $\lambda_i(c_{i,t}, x_t)$ represents the average quality of worker $i$ with personal context $c_{i,t}$ when completing a task of context $x_t$. Therefore, the performance of worker $i$ with personal context $c_{i,t}$ for a task of context $x_t$ has unknown distribution, takes values in $\{0\} \cup [q_{\min}, q_{\max}]$ and its expected value satisfies

$$\theta_i(c_{i,t}, x_t) := \mathbb{E}[p_i(c_{i,t}, x_t)] = \mu_i(c_{i,t}, x_t) \cdot \lambda_i(c_{i,t}, x_t).$$
III-C Formal Problem Formulation
Consider an arbitrary sequence of task budgets $b_1, \dots, b_T$ (which translates to a sequence $m_1, \dots, m_T$) and an arbitrary sequence of worker availability $\mathcal{W}_1, \dots, \mathcal{W}_T$. Let $y_{i,t}$ denote a binary variable which is $1$ if worker $i$ is requested to complete task $t$ and $0$ otherwise. Then, the problem of selecting, for each task, a subset of workers which maximizes the sum of expected performances given the task budget is given by

$$\max \; \sum_{t=1}^{T} \sum_{i \in \mathcal{W}_t} y_{i,t} \, \theta_i(c_{i,t}, x_t) \qquad (1)$$

$$\text{s.t.} \quad \sum_{i \in \mathcal{W}_t} y_{i,t} \le \min\{m_t, W_t\}, \quad t = 1, \dots, T,$$

$$\phantom{\text{s.t.}} \quad y_{i,t} \in \{0, 1\}, \quad i \in \mathcal{W}_t, \; t = 1, \dots, T.$$
First, we analyze problem (1) assuming full knowledge about worker performance. Therefore, assume that there was an entity that (i) was an omniscient oracle, which knows the expected performance $\theta_i(c, x)$ of each worker $i$ in each personal context $c$ for each task context $x$ a priori, and that (ii) for each arriving task, is centrally informed about the current contexts of all available workers. For such an entity, problem (1) is an integer linear programming problem, which can be decoupled into an independent subproblem per arriving task. For a task $t$, if fewer workers are available than required, i.e., $W_t < m_t$, the optimal solution is to request all available workers to complete the task. However, if $W_t \ge m_t$, the corresponding subproblem is a special case of a knapsack problem with a knapsack of size $m_t$ and with items of identical size and non-negative profit. Therefore, the optimal solution can be easily computed in at most $O(W_t \log W_t)$ steps by ranking the available workers according to their context-specific expected performance and selecting the $m_t$ highest ranked workers. By $S_t^*$, we denote the optimal subset of workers to select for task $t$. Formally, these workers satisfy

$$S_t^* \in \operatorname*{argmax}_{S \subseteq \mathcal{W}_t, \, |S| = \min\{m_t, W_t\}} \; \sum_{i \in S} \theta_i(c_{i,t}, x_t).$$
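The oracle's per-task subproblem can be sketched as follows (a minimal Python illustration with names of our own choosing; the expected performances are assumed to be known a priori, exactly as the oracle requires):

```python
def oracle_selection(expected_perf: dict, m: int) -> set:
    """Select the m available workers with highest expected performance.

    expected_perf maps worker id -> context-specific expected performance
    (known a priori by the oracle). Since all "items" have identical size
    and non-negative profit, this knapsack special case reduces to sorting
    the workers by expected performance and taking the top m.
    """
    ranked = sorted(expected_perf, key=expected_perf.get, reverse=True)
    return set(ranked[:m])  # if fewer than m workers are available, take all
```

The sort dominates the running time, which matches the $O(W_t \log W_t)$ ranking argument above.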
Note that $S_t^*$ depends on the task budget $b_t$ and context $x_t$, the set $\mathcal{W}_t$ of available workers and their personal contexts $\{c_{i,t}\}_{i \in \mathcal{W}_t}$, but we write $S_t^*$ instead of $S_t^*(b_t, x_t, \mathcal{W}_t, \{c_{i,t}\}_{i \in \mathcal{W}_t})$ for brevity. Let $(S_t^*)_{t=1,\dots,T}$ be the collection of optimal subsets of workers for the collection of tasks. We call this collection the solution achieved by a centralized oracle, since it requires an entity which has a priori knowledge about expected performances and central knowledge about all current worker contexts to make optimal decisions.
However, we assume that the MCSP does not have a priori knowledge about expected performances, but it still has to select workers for arriving tasks. Let $S_t$ denote the set of workers that the MCSP selects and requests to complete task $t$. If, for an arriving task, fewer workers are available than required, i.e., $W_t < m_t$, by simply requesting all available workers (i.e., $S_t = \mathcal{W}_t$) to complete the task, the MCSP still automatically selects the optimal subset of workers. Otherwise, for $W_t \ge m_t$, the MCSP cannot simply solve problem (1) like an omniscient oracle, since it does not know the expected performances $\theta_i(c_{i,t}, x_t)$. Moreover, we assume that a worker's current personal context is only locally available in his mobile device. We call the software of the mobile crowdsourcing application, which is installed in a worker's mobile device, a local controller (LC), and we denote by LC $i$ the LC of worker $i$. Each LC has access to its corresponding worker's personal context information, but it does not share this information with the MCSP. Hence, only the set of LCs, but not the MCSP, has knowledge about the workers' current personal contexts (such as their current locations and activities). However, the expected performance of a worker might depend on his personal context.
Hence, the MCSP and the LCs should cooperate in order to learn expected performances over time and in order to select an appropriate subset of workers for each task. For this purpose, over time, the system of MCSP and LCs has to find a tradeoff between exploration and exploitation, by, on the one hand, selecting workers about whose performance only little information is available and, on the other hand, selecting workers which are likely to have high expected performance. For each arriving task, the selection of workers depends on the history of previously selected workers and their observed performances. However, observing worker performance might be costly, since it might require a manual quality rating from a task owner, or an automatic quality assessment using local software in a batteryconstrained mobile device. Hence, the number of performance observations should be limited in order to keep the cost for such quality assessment feasible (e.g., a task owner should not have to rate the quality of dozens of workers for a single task; quality assessment in mobile devices should be limited to save battery).
Next, we will present a context-aware hierarchical online learning algorithm, which maps the history of previously selected workers and observed performances to the next selection of workers. The performance of this algorithm can be evaluated by comparing its loss with respect to the centralized oracle. This loss is called the regret of learning. For an arbitrary sequence of task budgets $b_1, \dots, b_T$ (translating to a sequence $m_1, \dots, m_T$) and an arbitrary sequence of worker availability $\mathcal{W}_1, \dots, \mathcal{W}_T$, the regret is formally defined as

$$R(T) = \mathbb{E}\left[\sum_{t=1}^{T} \sum_{i \in S_t^*} p_i(c_{i,t}, x_t) - \sum_{t=1}^{T} \sum_{i \in S_t} p_i(c_{i,t}, x_t)\right], \qquad (2)$$

which is equivalent to

$$R(T) = \sum_{t=1}^{T} \left( \sum_{i \in S_t^*} \theta_i(c_{i,t}, x_t) - \mathbb{E}\left[\sum_{i \in S_t} \theta_i(c_{i,t}, x_t)\right] \right). \qquad (3)$$
IV A Context-Aware Hierarchical Online Learning Algorithm for Performance Maximization in Mobile Crowdsourcing
The goal of the MCSP is to select, for each arriving task, a set of workers that maximizes the sum of expected performances for that task given the task budget. Since the expected performances are known a priori neither to the MCSP nor to the LCs, they have to be learned over time. Moreover, since only the LCs have access to the personal context of their respective worker, coordination is needed between the MCSP and the LCs. Below, we propose a hierarchical contextual online learning algorithm, which is based on algorithms for the contextual multi-armed bandit problem [23, 24, 25, 26, 27, 28]. Our algorithm is based on the assumption that a worker's expected performance is similar in similar personal and task contexts. Therefore, by observing the task context, a worker's personal context and his performance when requested to complete a task, the worker's context-specific expected performance can be learned and exploited for future worker selection.
We call the proposed algorithm Hierarchical Context-aware Learning (HCL). Fig. 2 shows an overview of HCL's operation. In HCL, the MCSP broadcasts the context of each arriving task to the LCs. Upon receiving information about a task, an LC first observes its worker's personal context. If the worker's performance has been observed sufficiently often before given the current joint personal and task context, the LC relies on previous observations to estimate its worker's performance and sends the estimated performance to the MCSP. If its worker's performance has not been observed sufficiently often before, the LC informs the MCSP that its worker has to be explored. Based on the messages received from the LCs, the MCSP selects a subset of workers and requests them to complete the task. The number of selected workers depends on the task budget, the price per worker and the number of available workers. The LC of each selected worker observes its worker's decision to accept or decline the task. If a worker was selected for exploration purposes and he accepted the task, the LC additionally observes the quality of the completed task. The reason for making a quality assessment only when a worker was selected for exploration purposes is that it may require either a manual quality rating from the task owner or an automatic quality assessment at the mobile device using local software, both of which may be costly. (If quality assessment is cheap, HCL can be adapted to always observe worker quality, which may increase the learning speed.) Hence, by observing the quality of a completed task only if the worker was selected for exploration purposes, HCL keeps the number of costly quality assessments low.
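A minimal sketch of the MCSP's selection step, consistent with the description above (the tie-breaking among under-explored workers and the exact priority of exploration over exploitation are our own assumptions, not prescribed by the paper's pseudocode):

```python
import random

def mcsp_select(replies: dict, m: int) -> set:
    """One round of worker selection at the MCSP (illustrative sketch).

    replies maps worker id -> either the string "explore" (the LC reports
    that its estimate is not yet reliable) or a float performance estimate.
    Under-explored workers are selected first; remaining slots are filled
    with the workers whose performance estimates are highest.
    """
    explore = [w for w, r in replies.items() if r == "explore"]
    exploit = sorted((w for w, r in replies.items() if r != "explore"),
                     key=lambda w: replies[w], reverse=True)
    random.shuffle(explore)            # break ties among under-explored workers
    selected = explore[:m]             # exploration has priority
    selected += exploit[:m - len(selected)]
    return set(selected)
```

For example, with replies `{'a': 0.9, 'b': 'explore', 'c': 0.5}` and two slots, worker `b` is selected for exploration and worker `a` fills the remaining slot.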
In HCL, a worker's personal contexts, decisions and qualities are stored only locally at the LC and are not shared with the MCSP. Thereby, (i) personal context is protected locally; (ii) the storage space required for worker information at the MCSP is kept low; (iii) task completion and result transmission can be handled directly between the LC and the task owner, without the need for the MCSP to interfere; (iv) workers receive requests for tasks that are interesting for them and that they are good at, without needing to share their context information; (v) even though an LC has to keep track of its worker's personal context, decisions and qualities, the computation and storage overhead for each LC is small.
In more detail, an LC operates as follows, as given in the pseudocode in Alg. 1. First, for synchronization purposes, the LC receives the finite number of tasks to be considered, the task context space and its dimension from the MCSP. Moreover, the LC checks to which of its worker's context dimensions it has access. This defines the personal context space and its dimension. Then, the LC sets the joint context space to the product of the personal and task context spaces. In addition, the LC sets a partition granularity parameter and a control function, which are both described below. Next, the LC initializes a uniform partition of its worker's joint context space, which consists of equally sized hypercubes; the granularity parameter determines how finely the context space is partitioned. Moreover, for each hypercube, the LC initializes a counter representing the number of times before (i.e., up to, but not including) the current task in which its worker was selected to complete a task while the worker's joint personal and task context belonged to that hypercube. Additionally, for each hypercube, the LC initializes an estimate of its worker's performance for contexts in that hypercube.
Then, the LC performs the following steps for each of the sequentially arriving tasks. For an arriving task, the LC takes actions only if its worker is currently available. If this is the case, the LC first receives the task context sent by the MCSP.^11 Moreover, it observes its worker's current personal context and determines the hypercube of the partition to which the joint personal and task context belongs; we call this the current hypercube. Then, the LC checks whether its worker has been selected sufficiently often before when the joint context belonged to the current hypercube. For this purpose, the LC compares the hypercube's counter with the value of the control function, a deterministic, monotonically increasing function set in the beginning of the algorithm. On the one hand, if the worker has been selected sufficiently often, the LC can rely on the estimated performance, which it sends to the MCSP in this case. On the other hand, if the worker has not been selected sufficiently often, the LC sends an "explore" message to the MCSP. The control function is hence needed to decide whether a worker should be selected for exploration (to achieve reliable estimates) or whether his estimates are already reliable and can be exploited. The choice of the control function is therefore essential for the result of the learning algorithm, since it determines the trade-off between exploration and exploitation. Then, the LC waits for the MCSP to take care of the worker selection. If the worker is not selected, the LC takes no further action. However, if the MCSP requests the worker, the LC observes whether the worker accepts or declines the task. If the worker was selected for exploration purposes, the LC performs an additional counter update.
^11 A worker being unavailable may mean that he is offline, so that the LC cannot even receive information about the arriving task. Therefore, we here consider the LC to take actions only if its worker is in the "available" mode.
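The LC's explore-or-exploit decision described above can be sketched as follows. This is a minimal Python sketch: the function names and the concrete control function K(t) = sqrt(t)·log(t) are illustrative assumptions, not the paper's exact choice.

```python
import math

def lc_handle_task(counter, estimate, t, control=lambda t: math.sqrt(t) * math.log(t)):
    """Per-task decision of an LC for one hypercube (illustrative names).

    counter  -- times the worker was selected under contexts in the current hypercube
    estimate -- current performance estimate for that hypercube
    t        -- index of the arriving task
    control  -- deterministic, monotonically increasing control function K(t)
    """
    if counter <= control(t):
        # not observed often enough yet: ask the MCSP to explore this worker
        return ("explore", None)
    # estimate is considered reliable: send it to the MCSP for exploitation
    return ("exploit", estimate)
```

A slowly growing control function keeps exploration (and hence costly quality assessments) rare, at the price of slower convergence of the estimates.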
For this update, if the worker accepted the task, the LC first additionally observes the worker's quality in completing the task (e.g., by requesting a quality rating from the task owner or by using local software for automatic quality assessment) and sets the observed performance to the observed quality. If the worker declined the task, the LC sets the observed performance to zero. Then, based on the observed performance, the LC updates the estimated performance and the counter of the current hypercube. Note that in Alg. 1, the task index is omitted from the counters since it is not necessary to store their previous values.
By this definition, the estimated performance corresponds to the product of (i) the relative frequency with which the worker accepted tasks when the joint context belonged to the hypercube and (ii) the average quality in completing these tasks. Formally, consider the set of observed performances of the worker before the current task over all selections in which the joint context was in the hypercube. If the worker's performance has never been observed before for a joint context in the hypercube, the counter and the estimate are zero. Otherwise, the estimated performance is the sample mean of the observed performances. In Alg. 1, however, this set never has to be stored, since the estimated performance can be computed incrementally from its previous value, the counter and the performance observed for the current task.
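Because the estimate is a sample mean, it can be maintained incrementally, so the set of past observations never has to be stored. A minimal sketch (names are illustrative):

```python
def update_estimate(estimate, counter, observed):
    """Incrementally update the sample-mean performance estimate of one
    hypercube from the previous estimate, the selection counter and the
    newly observed performance (0 if the task was declined)."""
    new_counter = counter + 1
    new_estimate = (estimate * counter + observed) / new_counter
    return new_estimate, new_counter
```

Each LC thus stores only two numbers per hypercube, which is what keeps the local storage overhead small.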
In HCL, the MCSP is responsible for the worker selection, which it performs according to the pseudocode given in Alg. 2. First, for synchronization purposes, the MCSP sends the finite number of tasks to be considered, the task context space and its dimension to the LCs. Then, for each arriving task, the MCSP computes the required number of workers based on the budget and the price per worker. In addition, the MCSP initializes two sets: the set of available workers for the arriving task and the so-called set of underexplored workers, which contains all available workers that have not been selected sufficiently often before. After broadcasting the task context, the MCSP waits for messages from the LCs. If the MCSP receives a message from an LC, it adds the corresponding worker to the set of available workers. Moreover, in this case the MCSP additionally checks whether the received message is an "explore" request. If so, the MCSP adds the corresponding worker to the set of underexplored workers. Note that according to Alg. 1 and Alg. 2, the set of underexplored workers hence consists exactly of those available workers whose counter for the current hypercube has not yet reached the value of the control function (Eq. (4)).
Next, the MCSP calculates the number of available workers. If at most the required number of workers is available, the MCSP enters a select-all-workers phase and selects all available workers to complete the task. Otherwise, the MCSP continues by calculating the number of underexplored workers. If there is no underexplored worker, the MCSP enters an exploitation phase: it ranks the available workers according to the estimated performances received from their respective LCs and selects the required number of highest ranked workers. By this procedure, the MCSP is able to use context-specific estimated performances without actually observing the workers' current personal contexts. If there are underexplored workers, the MCSP enters an exploration phase. These phases are needed so that the LCs of all workers are able to update their estimated performances sufficiently often. Here, two cases can occur, depending on the number of underexplored workers. Either the number of underexplored workers is at least the required number of workers, in which case the MCSP selects the required number of underexplored workers at random. Or the number of underexplored workers is smaller than the required number, in which case the MCSP selects all underexplored workers and, since it must select additional workers, ranks the available sufficiently explored workers according to the estimated performances received from their respective LCs and additionally selects the highest ranked among them until the required number is reached. In this way, additional exploitation is carried out in exploration phases when the number of underexplored workers is small. After worker selection, the MCSP requests the selected workers to perform the task by alerting them via the application's user interface, through which they are also informed about the task context.
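The MCSP's three selection phases described above can be sketched as follows. This is a hedged Python sketch with illustrative names; ties among equal estimates are broken arbitrarily.

```python
import random

def select_workers(available, underexplored, estimates, m, rng=random.Random(0)):
    """Sketch of the MCSP's selection for one task (illustrative names).

    available     -- ids of available workers
    underexplored -- ids of workers whose LCs sent an "explore" message
    estimates     -- estimated performance per sufficiently explored worker
    m             -- required number of workers for the task
    """
    if len(available) <= m:                      # select-all-workers phase
        return set(available)
    if not underexplored:                        # exploitation phase
        ranked = sorted(available, key=lambda w: estimates[w], reverse=True)
        return set(ranked[:m])
    if len(underexplored) >= m:                  # exploration phase
        return set(rng.sample(list(underexplored), m))
    selected = set(underexplored)                # explore all, then exploit
    explored = [w for w in available if w not in selected]
    ranked = sorted(explored, key=lambda w: estimates[w], reverse=True)
    selected.update(ranked[: m - len(selected)])
    return selected
```

Note that the MCSP ranks workers purely by the scalar estimates it received, which is exactly what allows it to remain oblivious to the workers' personal contexts.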
Note that since the MCSP does not have to keep track of the workers' decisions, the LCs can handle the contact with the task owner directly (e.g., the task owner may send more detailed task instructions directly to the LC; after task completion, the LC sends the result of the completed task to the task owner).
V Theoretical Analysis
V-A Upper Regret Bound
The performance of HCL is evaluated by analyzing its regret, see Eq. (2), with respect to the centralized oracle. In this section, we derive a sublinear bound on the regret, i.e., we show that the regret R(T) satisfies R(T) = O(T^γ) for some γ < 1. Hence, our algorithm suffers no loss compared to the centralized oracle in the long run, since the time-averaged regret R(T)/T vanishes as T grows. The regret bound is derived based on the assumption that under a similar personal context and a similar task context, a worker's expected performance is also similar. This assumption can be formalized as follows.^12
^12 Note that our algorithm can also be applied to data which does not satisfy this assumption. In this case, however, the regret bound may not hold.
Assumption 1 (Hölder continuity assumption)
There exist constants L > 0 and α > 0 such that for all workers i and for all joint contexts x, x̃ in the joint context space, it holds that
|θ_i(x) − θ_i(x̃)| ≤ L ‖x − x̃‖^α,
where θ_i denotes worker i's expected performance and ‖·‖ denotes the Euclidean norm in R^D.
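This assumption is what controls the discretization error of the hypercube partition in the regret analysis. As a sketch (with generic symbols: θ_i for worker i's expected performance, L and α for the Hölder constants, D for the joint context dimension, and hypercubes of side length 1/h_T), any two joint contexts in the same hypercube are close, so their expected performances differ only slightly:

```latex
\|x - \tilde{x}\| \le \sqrt{D}\, h_T^{-1}
\quad\Longrightarrow\quad
|\theta_i(x) - \theta_i(\tilde{x})|
\le L \left(\sqrt{D}\, h_T^{-1}\right)^{\alpha}
= L\, D^{\alpha/2}\, h_T^{-\alpha}.
```

A finer partition (larger h_T) hence reduces the bias of each hypercube's estimate, at the cost of more hypercubes to explore.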
The theorem given below shows that the regret of HCL is sublinear in the time horizon T.
Theorem 1 (Regret Bound for HCL)
The proof of Theorem 1 is given in Appendix A. Theorem 1 shows that HCL converges to the centralized oracle in the sense that the time-averaged regret diminishes as the number of tasks goes to infinity. Moreover, since Theorem 1 holds for any finite number of tasks, it can be used to characterize the algorithm's speed of learning.
While the regret bound given in Theorem 1 holds for an arbitrary sequence of task budgets and worker availability, more specific regret bounds can be derived for specific stochastic models of task budgets and worker availability. For example, consider the case in which both the task budgets and the worker availability are i.i.d. random variables, and assume that their distributions are such that, with some fixed positive probability, the number of available workers exceeds the required number of workers for a task. For this scenario, the following regret bound holds.
Corollary 1 (Regret Bound for HCL under i.i.d. Task Budgets and Worker Availability)
The proof of Corollary 1 is given in Appendix B. Compared to Theorem 1, in this special case the regret bound is scaled by a constant factor, since with the complementary probability a task requires at least as many workers as are available, in which case the algorithm trivially selects the optimal set of workers, namely the set of all available workers.
V-B Local Storage Requirements
The required local storage in the mobile device of a worker is determined by the storage needed when the LC executes Alg. 1. In Alg. 1, the LC stores, for each hypercube of the partition, a counter and a performance estimate. Using the parameters from Theorem 1, the number of hypercubes in the partition grows polynomially in the time horizon, and the number of values to store in the mobile device is bounded accordingly. Hence, the required storage depends on the number of context dimensions: if the worker allows access to a large number of personal context dimensions and/or the number of task context dimensions is large, the algorithm learns the worker's context-specific performance at a finer granularity, so the assigned tasks are more personalized, but the required storage increases.
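As a sketch of this storage bound, assume (illustratively) a partition with h parts per context dimension and two stored values per hypercube:

```python
def num_stored_values(h, D):
    """Upper bound on values stored by one LC: two per hypercube
    (selection counter and performance estimate), with h**D hypercubes
    in a uniform partition that splits each of the D joint context
    dimensions into h identical parts (h and D are assumptions)."""
    return 2 * h ** D
```

The exponential dependence on D makes concrete why granting access to additional personal context dimensions increases the local storage requirement quickly.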
V-C Communication Requirements
The communication requirements of HCL can be deduced from its main operation steps. For each task, first, the MCSP broadcasts the task context to the LCs, which is a vector with one scalar per task context dimension. Then, the LC of each available worker estimates its worker's performance and sends it to the MCSP, which corresponds to one scalar per available worker. Finally, the MCSP informs the selected workers about its decision, which corresponds to one scalar per selected worker. Hence, per task, the number of transmitted scalars is the number of task context dimensions plus the number of available workers plus the number of selected workers. Among these, the broadcast and the selection messages are transmitted by the MCSP, and one scalar is transmitted by each mobile device of an available worker.
We now compare the communication requirements of HCL with those of a centralized version, in which, for each task, the personal contexts of the available workers are first gathered at the MCSP, which then makes the worker selection based on the task and personal contexts and informs the selected workers about its decision. The communication requirements of the centralized version are as follows. For each task, the LC of each available worker sends the current worker context to the MCSP, which is a vector with one scalar per personal context dimension. After worker selection, the MCSP requests the selected workers to perform the task (one scalar per selected worker) and additionally informs them about the task context (one scalar per task context dimension per selected worker). Hence, per task, the MCSP transmits the selection messages and the task contexts, while each mobile device of an available worker transmits its full personal context vector.
From this analysis, we can deduce the following. Since the number of personal context dimensions will typically be larger than 1, HCL reduces the transmission size of each mobile device compared to the centralized approach, while still taking advantage of personal context information. Moreover, when the number of available workers is sufficiently large relative to the number of selected workers, even the total communication requirements (summed over all mobile devices and the MCSP) of HCL are smaller than those of the centralized approach.
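The per-task scalar counts derived above can be made concrete with a small sketch (illustrative function names; the counts follow the message accounting in the text):

```python
def hcl_scalars(num_available, num_selected, task_dim):
    """Scalars transmitted per task under HCL: broadcast of the task
    context, one estimate per available LC, one selection message per
    selected worker."""
    return task_dim + num_available + num_selected

def centralized_scalars(num_available, num_selected, task_dim, personal_dim):
    """Scalars per task if personal contexts were sent to the MCSP:
    each available LC uploads its personal context vector, and the MCSP
    sends a request plus the task context to each selected worker."""
    return num_available * personal_dim + num_selected * (1 + task_dim)
```

For example, with 100 available workers, 5 selected workers, a 3-dimensional task context and a 2-dimensional personal context (all assumed values), HCL transmits fewer scalars in total than the centralized variant.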
V-D Worker Quality Assessment Requirements
As mentioned above, observing a worker's quality might be costly, since it might require either a quality rating by the task owner or an automatic quality assessment using local software in the battery-constrained mobile device. HCL explicitly takes this into account by requesting a quality assessment only if a worker is selected for exploration purposes. Here, we give an upper bound on the number of quality assessments per worker up to a given task.
Corollary 2 (Bound on the Number of Quality Assessments per Worker)
VI Numerical Results
We evaluate HCL by comparing its performance with that of various reference algorithms on synthetic and real data.
VI-A Reference Algorithms
The following algorithms are used for comparison.

(Centralized) Oracle: The Oracle has perfect a priori knowledge about context-specific expected performances. Moreover, it is centrally informed about the current contexts of available workers.

LinUCB: This algorithm assumes that the expected performance of a worker is linear in its context [29], [30]. Based on a linear reward function over contexts and the history of previous observations of context-specific worker performances, for each task the algorithm chooses the available workers with the highest estimated upper confidence bounds on their expected performance. The algorithm has an input parameter controlling the influence of the confidence bound. LinUCB is used in [20] for task assignment in spatial crowdsourcing.

AUER: This algorithm [31] is an extension of the well-known UCB algorithm [32] to the sleeping-arm case. It learns from previous observations of worker performances, but without taking context information into account. Based on the history of previous observations, it selects the available workers with the highest estimated upper confidence bounds on their expected performance. The algorithm has an input parameter controlling the influence of the confidence bound.

Greedy: With a certain exploration probability, this algorithm selects a random subset of available workers; otherwise, it selects the available workers with the highest estimated performance. The estimated performance of a worker is computed based on the history of his previous performances [32], but without taking context into account.

Myopic: This algorithm learns only from the last interaction with each worker. For the first task, it selects a random subset of workers. For each of the following tasks, it checks which of the available workers have previously accepted a task. If more than the required number of available workers accepted a task when requested the last time, the algorithm selects from these the workers with the highest performance in their last completed task. Otherwise, the algorithm selects all of these workers and an additional random subset of workers, so that the required number of workers is selected in total.

Random: For each task, a random subset of available workers is selected.
Note that if an algorithm originally would have selected only one worker per task, we adapted it to select the required number of workers per task. Also, above we described the behavior of the algorithms for the case that more workers are available than required. In the opposite case, we adapted each algorithm such that it selects all available workers. Moreover, most of the reference algorithms are usually used in a centralized fashion, but they can be decoupled so as to use them in the same hierarchical setting as HCL.
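As an illustration of the simplest context-free baseline, the Greedy selection step can be sketched as follows (function names, the rng seed and the parameter name epsilon are illustrative; the parameter value actually used is the one listed in Table II):

```python
import random

def greedy_select(available, estimates, m, epsilon, rng=random.Random(0)):
    """Epsilon-greedy baseline: with probability epsilon select m random
    available workers, otherwise the m workers with the highest estimated
    performance (context is ignored entirely)."""
    if len(available) <= m:
        return set(available)          # fewer workers available than required
    if rng.random() < epsilon:
        return set(rng.sample(list(available), m))
    return set(sorted(available, key=lambda w: estimates[w], reverse=True)[:m])
```

Because the estimates here are single per-worker averages, this baseline cannot distinguish a worker's performance across different contexts, which is what the experiments below expose.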
VI-B Evaluation Metrics
Each algorithm is run over a sequence of tasks and its result is evaluated using the following metrics. As a function of the arriving tasks, we compute the cumulative performance up to task t achieved by an algorithm, which is the cumulative sum of the performances of all selected workers up to (and including) task t. Formally, if S_τ denotes the set of workers selected by an algorithm for task τ and p_{i,τ} the observed performance of worker i for task τ, the cumulative performance up to task t is the sum of p_{i,τ} over all tasks τ ≤ t and all workers i in S_τ. Moreover, we compute the average worker performance up to task t achieved by an algorithm, which is this cumulative performance divided by the total number of worker selections up to task t.
VI-C Simulation Setup
We evaluate the different algorithms using synthetic as well as real data. The difference between the two approaches lies in the arrival process of workers and their contexts. To produce synthetic data, we generate workers and their contexts based on predefined distributions, as described below. For real data, similar to, e.g., [6, 20, 22], we use a data set from Gowalla [33]. Gowalla is a location-based social network where users share their location by checking in at 'spots', i.e., certain places in their vicinity. We use the check-ins to simulate the arrival process of workers and their contexts. The Gowalla data set consists of 6,442,892 check-ins of 107,092 distinct users over the period of February 2009 to October 2010. Each entry of the data set has the form (User ID, Check-in Time, Latitude, Longitude, Location ID). Similar to the approach in [22], we first extract the check-ins in New York City, which leaves a subset of 138,954 check-ins of 7,115 distinct users at 21,509 distinct locations. This resulting Gowalla-NY data set is used below.
Fig. 3(a) shows the distribution of the number of check-ins in the Gowalla-NY data set. The number of check-ins per user ranges from 1 check-in (1414 users) up to 1794 check-ins (1 user); for example, 2532 users checked in more than 10 times. Fig. 3(b) shows the distribution of the number of distinct locations visited by the users in the Gowalla-NY data set. The number of visited locations per user ranges from 1 location (1524 users) up to 1633 locations (1 user); for example, 3661 users checked in at more than 5 locations.
For both synthetic and real data, the basic simulation setup is as follows. We simulate an MCSP to which a set of workers belongs. For synthetic data, the workers are created in the beginning. For real data, we randomly select a corresponding number of users from the Gowalla-NY data set to represent the workers of the MCSP, and we use the reduced Gowalla-NY data set containing the check-ins of these users. Task owners have to pay a fixed price per worker requested by the MCSP to perform the task. The quality of a completed task lies in a bounded range. The task properties are modeled as follows. The task budget is sampled from a truncated normal distribution, such that, given the price per worker, a certain average number of workers is required per task. The task context is assumed to be uniformly distributed in its context space.
The worker arrival process and the worker context are sampled differently for synthetic and real data. For synthetic data, we let each worker be available with a fixed default probability for each arriving task. The personal context space of an available worker is two-dimensional. The first personal context dimension refers to the worker's location, which is sampled from a fixed set of (personal) locations using a weighted discrete distribution, to represent the fact that workers may use the crowdsourcing application different amounts of time in different places (e.g., at home more often than at work). The second personal context dimension refers to the worker's battery state, which is sampled from a uniform distribution. The worker performance is modeled as follows. The joint personal and task context space is split into a uniform grid (i.e., in each dimension, the space is split into the same number of identical parts). In each subset of the grid, the expected performance of a worker is a priori sampled uniformly at random. Later, for each sampled joint worker and task context, it is first checked to which subset of the grid the sampled context belongs. Then, the instantaneous performance is sampled by adding uniformly distributed noise to the expected performance in the given subset of the grid (the noise interval is truncated if the expected performance lies close to either end of the performance range).
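The grid-based synthetic performance model can be sketched as follows. This is a hedged sketch: the grid granularity, the lazy creation of grid cells and the [0, 1] ranges are illustrative assumptions.

```python
import random

def make_performance_model(parts, rng=random.Random(0)):
    """Sketch of the synthetic performance model: the joint context space
    [0, 1]^D is split into a uniform grid with `parts` parts per dimension,
    and each grid cell is assigned an expected performance drawn uniformly
    from [0, 1]. Cells are created lazily on first lookup."""
    expected = {}

    def mean_performance(context):
        # map each context coordinate to its grid cell index
        cell = tuple(min(int(c * parts), parts - 1) for c in context)
        if cell not in expected:
            expected[cell] = rng.random()
        return expected[cell]

    return mean_performance
```

All contexts falling into the same grid cell share one expected performance, which is exactly the piecewise-constant structure that a context-partitioning learner such as HCL can pick up but a linear model such as LinUCB cannot.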
Using real data, the arrival processes of workers and their contexts are sampled as follows. For the worker availability, we use a Binomial distribution to sample the number of available workers for an arriving task.^13 Having sampled this number, we randomly draw samples from the reduced Gowalla-NY data set until the samples contain the corresponding number of distinct users. These sampled users correspond to the available workers (i.e., users with a higher number of check-ins in the reduced Gowalla-NY data set translate to workers that are more often available at the MCSP). The personal context space of an available worker is again two-dimensional. We set the first personal context dimension of an available worker to the check-in location of the respective user from the sample.^14 The second personal context dimension refers again to the battery state, sampled from a uniform distribution. The worker performance is modeled as follows. The joint personal and task context space is split into a grid: along the dimensions of task context and battery state, the context space is split into a fixed number of uniform parts each, while along the dimension of location context, it is split into as many uniform parts as the number of distinct locations visited by the corresponding user in the reduced Gowalla-NY data set (i.e., for each visited location, the expected performance may be different). In each subset of the grid, the expected performance of a worker is a priori sampled uniformly at random. In this way, workers with a higher number of visited locations have a higher number of different context-specific performances.
^13 In this way, the numbers of available workers in our experiments using the real and the synthetic data are distributed in the same way.
^14 If a user was sampled several times until the required number of distinct users was reached, we choose his first sampled check-in location.
Later, for each sampled joint worker and task context, the instantaneous performance is sampled by adding uniformly distributed noise to the expected performance in the given subset of the grid (the noise interval is truncated if the expected performance lies close to either end of the performance range).
VI-D Parameter Selection
HCL, LinUCB, AUER and Greedy each require an input parameter, which affects their performance. In order to find appropriate parameters, we generate synthetic instances, each consisting of a sequence of tasks and worker arrivals sampled according to the procedure explained above. Then, we run HCL, LinUCB, AUER and Greedy with different parameters. Note that for HCL, we choose the partition granularity as in Theorem 1 and use the control function from Theorem 1 scaled by a constant factor, where the factor is included to reduce the number of exploration phases; we then search for an appropriate value of this factor. Table II shows the parameter value at which each of the algorithms performed best.
Algorithm | Parameter | Selected Value
HCL | |
LinUCB | |
AUER | |
Greedy | |
VI-E Results
Next, in order to evaluate the different algorithms, we generate another set of synthetic instances and of instances based on real data. Each instance again consists of a sequence of tasks and worker arrivals sampled according to the descriptions given above. Then, we run the algorithms with the parameters from Table II on the synthetic instances and on the instances based on real data, respectively. The results shown below are averaged over these instances.
Fig. 4(a) and Fig. 4(b) show the cumulative worker performance as a function of the sequentially arriving tasks, for synthetic and real data, respectively.
The cumulative performance achieved by each algorithm increases linearly in the number of processed tasks for both synthetic and real data. As expected, Oracle outperforms all other algorithms due to its a priori knowledge of the workers' expected performances, while Random gives a lower bound on the achievable cumulative performance. The proposed algorithm HCL clearly outperforms LinUCB, AUER, Greedy and Myopic, even though HCL observes worker performance only when requesting a worker for exploration purposes, while the other algorithms have access to worker performance whenever a worker is requested. This is due to the fact that HCL smartly exploits worker and task context information. Indeed, the cumulative worker performance finally achieved by HCL is a multiple of the results achieved by the other non-oracle algorithms for both the synthetic and the real data, and it comes close to the one achieved by Oracle. In contrast, the algorithms LinUCB, AUER, Greedy and Myopic perform far worse and, interestingly, lie in a very similar region close to the result of Random. This shows that learning approaches which either do not take context into account (i.e., AUER, Greedy and Myopic) or which assume a linear dependency between context and performance (i.e., LinUCB) cannot cope with the nonlinear context dependency of expected worker performance. Comparing synthetic and real data, HCL performs better on the synthetic data, but it still reaches a good result on the real data, even though in the real data each worker has his own diversity in context arrivals and hence in expected performance (since users in the Gowalla-NY data set have different numbers of visited check-in locations).
Fig. 5(a) and Fig. 5(b) show the average worker performance as a function of the sequentially arriving tasks.
We see that over the sequence of tasks, the average worker performance achieved by Oracle and Random stays nearly constant for both synthetic and real data. The algorithms LinUCB, AUER, Greedy and Myopic increase the average worker performance only slightly. In contrast, HCL increases the average worker performance substantially over the sequence of tasks for both the synthetic and the real data. Hence, HCL learns context-specific worker performances over time and, after sufficiently many processed tasks, selects workers almost as well as Oracle does.
Finally, we evaluate the impact of worker availability by varying the availability probability.^15 For each value of this probability, we run all algorithms on synthetic instances and on instances based on real data, and average the results. Fig. 6(a) and 6(b) show the cumulative performance at the final task achieved by the algorithms for different availability probabilities.
^15 Note that this is equivalent to varying the expected task budget; we therefore do not separately evaluate the impact of the task budget. In particular, the results presented below for different worker availabilities of course scale with the task budget and cannot be used to draw absolute conclusions.
For small availability probabilities, all algorithms yield approximately the same performance. This is as expected since, given our modelling of the task budget, for small