Context-Aware Hierarchical Online Learning for Performance Maximization in Mobile Crowdsourcing


Sabrina Müller (1), Cem Tekin (2), Mihaela van der Schaar (3,4) and Anja Klein (1)



This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
(1) Communications Engineering Lab, TU Darmstadt, Germany, {s.mueller, a.klein}@nt.tu-darmstadt.de
(2) Electrical and Electronics Engineering Department, Bilkent University, Turkey, cemtekin@ee.bilkent.edu.tr
(3) Department of Electrical Engineering, University of California Los Angeles, USA, mihaela@ee.ucla.edu
(4) Department of Engineering Science, University of Oxford, Oxford, UK
Abstract

In mobile crowdsourcing, mobile users accomplish outsourced human intelligence tasks. Mobile crowdsourcing requires an appropriate task assignment strategy, since different workers may have different performance in terms of acceptance rate and quality. Task assignment is challenging, since a worker’s performance (i) may fluctuate, depending on both the worker’s current context and the task context, (ii) is not known a priori, but has to be learned over time. However, learning context-specific worker performance requires access to context information, which workers may not grant to a central entity. Moreover, evaluating worker performance might require costly quality assessments. In this paper, we propose a context-aware hierarchical online learning algorithm addressing the problem of performance maximization in mobile crowdsourcing. In our algorithm, a local controller (LC) in the mobile device of a worker regularly observes its worker’s context, his decisions to accept or decline tasks and the quality in completing tasks. Based on these observations, the LC regularly estimates its worker’s context-specific performance. The mobile crowdsourcing platform (MCSP) then selects workers based on performance estimates received from the LCs. This hierarchical approach enables the LCs to learn context-specific worker performance and it enables the MCSP to select suitable workers. In addition, our algorithm preserves worker context locally, and it keeps the number of required quality assessments low. We prove that our algorithm converges to the optimal task assignment strategy. Moreover, the algorithm outperforms simpler task assignment strategies in experiments based on synthetic and real data.

Index Terms: Crowdsourcing, task assignment, online learning, contextual multi-armed bandits

I Introduction

Conventional web-based crowdsourcing is a popular way to outsource human intelligence tasks, prominent examples being Amazon Mechanical Turk (https://www.mturk.com) and Crowdflower (https://www.crowdflower.com/). More recently, mobile crowdsourcing has evolved as a powerful tool to leverage the workforce and skills of a plethora of mobile users to accomplish tasks in a distributed manner [1]. This may be due to the fact that the number of mobile devices is growing rapidly and, at the same time, people spend a considerable amount of their daily time using these devices. For example, according to the Cisco Visual Networking Index, between 2015 and 2016, the global number of mobile devices grew from 7.6 to 8 billion [2]. Moreover, according to an estimate of eMarketer [3], the daily time US adults spend using mobile devices will be more than 3 hours in 2017, a considerable increase compared to 2013.

In mobile crowdsourcing, task owners outsource their tasks via an intermediary mobile crowdsourcing platform (MCSP) to a set of workers, i.e., mobile users, who may be willing to complete these tasks. A mobile crowdsourcing task may require the worker to interact with his mobile device in the physical world (e.g., photography tasks) or to complete some virtual task via his mobile device (e.g., image annotation, sentiment analysis). Some mobile crowdsourcing tasks, subsumed under the term spatial crowdsourcing [4], are spatially constrained (e.g., a photography task at a point of interest), or require high spatial resolution (e.g., an air pollution map of a city). In spatial crowdsourcing, tasks typically require workers to travel to certain locations. However, recently emerging mobile crowdsourcing applications are also concerned with location-independent tasks. For example, MapSwipe (https://mapswipe.org/) lets mobile users annotate satellite imagery to find inhabited regions around the world. The GalaxyZoo app (https://www.galaxyzoo.org/) lets mobile users classify galaxies. The latter project is an example of the more general trend of citizen science [5]. On the commercial side, Spare5 (https://app.spare5.com/fives) or Crowdee (https://www.crowdee.de/) outsource micro-tasks (e.g., image annotation, sentiment analysis, and opinion polls) to mobile users in return for small payments. While location-independent tasks could equally be completed by users of static devices as in web-based crowdsourcing, emerging mobile crowdsourcing applications for location-independent tasks exploit the increasing number of online mobile users that complete tasks on the go.

Mobile crowdsourcing – be it spatial or location-independent – requires an appropriate task assignment strategy, since not all online workers may be equally suitable for a given task. First, different workers may have different task preferences and hence different acceptance rates. Secondly, different workers may have different skills, and hence provide different quality when completing a task. Two assignment modes considered in the crowdsourcing literature are the server-assigned task (SAT) mode and the worker-selected task (WST) mode (see [6] for a taxonomy). In SAT mode, the MCSP tries to match workers and tasks in an optimal way, e.g., to maximize the number of task assignments, possibly under a given task budget. For this purpose, the MCSP typically gathers task and worker information to decide on task assignment. While this enables sophisticated assignment strategies, it entails a large communication overhead and raises privacy concerns for workers, since the MCSP has to be regularly informed about the current worker contexts (e.g., their current positions). Moreover, previous work on the SAT mode often assumed either that workers always accept a task once assigned to it or that workers' acceptance rates and quality are known in advance. However, this is not necessarily true in reality, since acceptance rates and quality are usually not known beforehand and therefore have to be learned by the MCSP over time. In addition, a worker's acceptance rate and the quality of completed tasks might depend not only on the specific task, but also on the worker's context, e.g., the worker's location or the time of day [7]. This context may change quickly, especially in mobile crowdsourcing with location-independent tasks, since workers can complete such tasks anytime and anywhere.

In contrast, in WST mode, workers autonomously select tasks from a list. This rather simple mode is often used in practice (e.g., on Amazon Mechanical Turk) since it has the advantage that workers automatically select tasks they are interested in. However, the WST mode can lead to suboptimal task assignments since, first, finding suitable tasks is not as easy as it seems (e.g., time-consuming searches within a long list of tasks are needed and workers might simply select from the first displayed tasks [8]) and, secondly, workers might leave unpopular tasks unassigned. Therefore, in WST mode, the MCSP might additionally provide personalized task recommendation (TR) to workers such that workers find appropriate tasks [7]. However, personalized TR typically requires the workers to share their current context with the MCSP, which again entails communication overhead and privacy concerns for workers.

We argue that a task assignment strategy is needed which combines the advantages of the above modes: The MCSP should centrally coordinate task assignment to ensure that appropriate workers are selected for each task, as in SAT mode. At the same time, the communication overhead due to task assignment should be small and workers’ personal data should be protected. Hence, no worker context should be shared with the MCSP, as in basic WST mode. Moreover, task assignment should take into account that workers may decline a task they are assigned to, and hence, the assignment should fit to the workers’ preferences, as in WST mode with personalized TR. In addition, task assignment should be based both on acceptance rates and on the quality with which a task is completed. Since observing the quality of completed tasks may require costly quality assessments (e.g., a manual quality rating from a task owner, or an automatic quality assessment using local software in a mobile device), the required number of quality assessments should be kept low. Finally, workers’ acceptance rates and quality have to be learned over time.

Our contribution therefore is as follows: We propose a context-aware hierarchical online learning algorithm for performance maximization in mobile crowdsourcing for location-independent tasks. Our algorithm is split into two parts, one part executed by the MCSP, the other part by local controllers (LCs) located in each of the workers' mobile devices. An LC learns its worker's performance in terms of acceptance rate and quality online over time, by observing the worker's personal contexts, his decisions to accept or decline tasks and the quality in completing these tasks. The LC learns from its worker's context only locally, and personal context is not shared with the MCSP. Each LC regularly sends performance estimates to the MCSP. Based on these estimates, the MCSP takes care of the worker selection. This hierarchical approach (in the sense of the coordination between the MCSP and the LCs) enables the MCSP to select suitable workers for each task under its budget without having access to the workers' personal contexts. Moreover, workers receive personalized task requests based on their interests and skills, while the number of (possibly costly) quality assessments is kept low. Since our algorithm learns in an online fashion, it adapts and improves the worker selection over time and hence achieves good results already at run time. We prove that our algorithm converges to the optimal task assignment strategy, which for each task selects the subset of workers that maximizes the expected performance under the task budget.

The remainder of this paper is organized as follows. Section II gives an overview on related work. Section III describes the system model. In Section IV, we propose a context-aware hierarchical online algorithm for performance maximization in mobile crowdsourcing. In Section V, we theoretically analyze our proposed algorithm in terms of its regret, as well as its requirements with respect to local storage, communication and worker quality assessment. Section VI contains a numerical evaluation of our algorithm based on synthetic and real data. Section VII concludes the paper.

II Related Work

Considerable research effort has been devoted to theoretically defining and classifying crowdsourcing systems, such as web-based [9], mobile [1] and spatial [4] crowdsourcing.

| | [13] | [14] | [15] | [16] | [17] | [18] | [6],[19] | [20] | [21] | [22] | This work |
| Crowdsourcing Type | General | General | General | General | Mobile | Mobile | Spatial | Spatial | Spatial | Spatial | Loc.-Ind. |
| Task Assignment Mode | SAT | SAT | SAT | WST/TR | WST/TR | SAT | SAT | SAT | SAT | SAT | proposed |
| Different Task Types | Yes | No | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| Worker Context-Aware | No | No | No | No | Yes | No | Yes | Yes | Yes | No | Yes |
| Context-Spec. Performance | No | No | No | No | Yes | No | No | Yes | Yes | No | Yes |
| Worker Context Protected | N/A | N/A | N/A | N/A | Yes | N/A | No | No | Yes | N/A | Yes |
| Type of Learning | Online | Online | Batch | Online | Offline | Online | N/A | Online | N/A | Offline | Online |
| Regret Bounds | Yes | Yes | No | No | No | Yes | N/A | Yes | N/A | No | Yes |
TABLE I: Comparison with related work on task assignment in crowdsourcing.

Below, we give an overview of related work on task assignment in general, mobile and spatial crowdsourcing systems, as relevant to our scenario. Note that strategic behavior of workers and task owners in crowdsourcing systems, e.g., concerning pricing and the effort spent in task completion [10], represents an active research area in itself, which is out of the scope of this paper. Also note that we assume that the quality of a completed task can be observed. A different line of work on crowdsourcing deals with quality estimation in the case of missing ground truth, recently also using online learning [11].

Due to the inherently dynamic nature of crowdsourcing, with tasks and/or workers typically arriving dynamically over time, task assignment is often modeled as an online decision-making problem [12]. For general crowdsourcing systems, [13] proposed a competitive online task assignment algorithm for maximizing the utility of a task owner on a given set of task types, with a finite number of tasks per task type, by learning the skills of sequentially appearing workers. While [13] considers sequentially arriving workers and its algorithm decides which task to assign to a worker, we consider sequentially arriving tasks and our algorithm decides which workers should be assigned to a task. Therefore, our algorithm can be applied to an infinite number of task types by describing a task using its context. Moreover, our algorithm additionally takes worker context into account, which may affect worker performance in mobile crowdsourcing. In [14], a bounded multi-armed bandit model for expert crowdsourcing is presented and a task assignment algorithm with sublinear regret is derived, which maximizes the utility of a budget-constrained task owner under uncertainty about the skills of a finite set of workers with (known) different prices and limited working time. While in [14], the average skill of a worker is learned, our algorithm additionally takes worker and task contexts into account, and thereby learns context-specific performance. In [15], a real-time algorithm for finding the top-k workers for sequentially arriving tasks is presented. First, tasks are categorized offline into different types and the similarity between a worker's profile and each task type is computed. Then, in real time, the top-k workers are selected for a task based on a matching score, which takes into account the similarity and historic worker performance. The authors propose to periodically update the performance estimates offline in batches, but no guarantees on the learning process are given. Compared to [15], we additionally take worker context into account, learn context-specific performance and derive guarantees on the learning speed. In [16], methods for learning a worker preference model are proposed for personalized TR in WST mode. These methods use the history of worker preferences for different tasks but, in contrast to our algorithm, do not take the worker context into account for context-aware TR.

In mobile crowdsourcing, [17] proposes algorithms for optimal TR in WST mode that take into account the trade-off between the privacy of worker context, the utility of recommending the best tasks and the efficiency in terms of communication and computation overhead. The TR is performed by a server based on a generalized context shared by the worker. The statistics used for TR are gathered offline via a proxy that ensures differential privacy guarantees. While [17] allows flexible adjustment of the shared generalized context and makes TRs based on offline statistics and generalized worker context, our approach keeps the worker context local and learns each worker's individual statistics online. In [18], which focuses on mobile crowdsensing, an online learning algorithm is presented to maximize the sensing revenue of a budget-constrained task owner by learning the sensing values of workers with known prices. While [18] considers a total budget and each crowdsensing task requires a minimum number of workers, we consider a separate budget per task, which translates to a maximum number of required workers. Moreover, we additionally take task and worker context into account.

A taxonomy for spatial crowdsourcing was first introduced in [6]. The authors present a location-entropy-based algorithm for SAT mode to maximize the number of task assignments under uncertainty about task and worker arrival processes. The server decides on task assignment based on centrally gathered knowledge about the workers' current locations. In [19], the authors extend this framework to maximize the quality of assignments under varying worker skills for different task types. However, in contrast to our work, [6] and [19] assume that worker context is centrally gathered, that workers always accept assigned tasks within certain known bounds and that worker skills are known a priori. In [20], an online task assignment algorithm is proposed for spatial crowdsourcing in SAT mode for maximizing the expected number of accepted tasks. The problem is modeled as a contextual multi-armed bandit problem, and workers are selected for sequentially arriving tasks. The authors adapt the LinUCB algorithm by assuming that the acceptance rate is a linear function of the worker's distance to the task and the task type. However, such a linearity assumption is restrictive, and it may especially not hold in mobile crowdsourcing with location-independent tasks. In contrast, our algorithm works for more general relationships between context and performance. In [21], an algorithm for privacy-preserving spatial crowdsourcing in SAT mode is proposed. Using differential privacy and geocasting, the algorithm protects worker locations (i.e., their context) while optimizing the expected number of accepted tasks. However, the authors assume that the workers' acceptance rates are identical and known, whereas our algorithm learns context-specific acceptance rates. In [22], exact and approximation algorithms for acceptance maximization in spatial crowdsourcing in SAT mode are proposed. The algorithms are run offline for given sets of available workers and tasks, based on a probability of interest for each pair of worker and task. The probabilities of interest are computed beforehand using maximum likelihood estimation. In contrast, our algorithm learns acceptance rates online and we provide an upper bound on the regret of this learning.

Our proposed algorithm is based on the contextual multi-armed bandit problem [23, 24, 25, 26, 27, 28]. The closest related work is [28], in which a learner observes multiple context arrivals in each round and selects a subset of actions which maximize the expected rewards given the set of context arrivals. We extend the algorithm in [28] as follows: While in [28], a central learner observes all contexts and selects actions based on these contexts, our algorithm is decoupled into several learning entities, each observing the context of one particular action and learning the rewards of this action, and a coordinating entity, which selects actions based on the learning entities' estimates. In the crowdsourcing scenario, an action corresponds to a worker, the learning entities correspond to the LCs, which learn the performance of their workers, and the coordinating entity corresponds to the MCSP, which selects workers based on the performance estimates from the LCs. Moreover, while in [28], the same number of actions is selected per round and all actions are available in every round, we allow different numbers of actions to be selected per round and we allow actions to be unavailable. In the crowdsourcing scenario, this corresponds to allowing different numbers of required workers for different tasks and allowing workers to be unavailable.

III System Model

III-A Mobile Crowdsourcing Platform

We consider an MCSP to which a fixed set $\mathcal{W} = \{1, \dots, W\}$ of $W$ workers belongs. A worker is a user equipped with a mobile device on which the mobile crowdsourcing application is installed. Workers can be in two modes: A worker is called available if the mobile crowdsourcing application on his device is running. In this case, the MCSP might request the worker to complete a task, which the worker may then accept or decline. A worker is called unavailable if the mobile crowdsourcing application on his device is turned off.

Task owners can place location-independent tasks of different types into the MCSP and select their task budget. In detail, a task $t$ is defined by a tuple $(b_t, y_t)$, where $b_t$ denotes the budget that the task owner is willing to pay the MCSP for this task and $y_t$ denotes the task context. The task context is taken from a bounded $D_Y$-dimensional task context space $\mathcal{Y}$ and captures information about the task type. (In order to represent different task types, we assume that tasks are described by $D_Y$ context dimensions. In each of the context dimensions, a task is classified via a value between $0$ and $1$. Then, $y_t \in \mathcal{Y} = [0,1]^{D_Y}$ is a vector describing task $t$'s overall context.) The task owner has to pay the MCSP for taking care of selecting appropriate workers for his task and for requesting these workers to complete the task. We assume here that the MCSP sets a fixed price $p$ per requested worker to be paid by the task owner. Moreover, we assume that for each task $t$, the budget satisfies $p \le b_t \le Wp$, so that the MCSP should request at least one worker and at most $W$ workers. Based on the budget $b_t$, the MCSP computes the number $m_t := \lfloor b_t / p \rfloor$ of workers it should request.
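As a small illustration, the mapping from a task budget to the number of requested workers fits in a few lines; the following Python sketch assumes the fixed per-worker price $p$ from above and clips the result to the feasible range (the function and variable names are ours, not part of the system model):

```python
def num_requested_workers(budget: float, price: float, num_workers: int) -> int:
    """Number m_t of workers the MCSP requests for a task with budget b_t,
    assuming a fixed price p per requested worker: m_t = floor(b_t / p),
    clipped to the feasible range 1 <= m_t <= W (a sketch, not the paper's code)."""
    m = int(budget // price)
    return max(1, min(m, num_workers))
```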

We assume that tasks arrive at the MCSP sequentially and we denote the sequentially arriving tasks by $t = 1, \dots, T$, where $T$ is the finite time horizon. For each arriving task $t$, the MCSP should select a subset of workers which maximizes the worker performance for that task.

Fig. 1: System model. A task arrives at the MCSP. The MCSP has to select an appropriate subset of available workers for the task.

Due to the dynamics in worker availability over time, the MCSP can only select workers from the set $\mathcal{W}_t \subseteq \mathcal{W}$ of currently available workers for task $t$, where the number of available workers (we assume that for each arriving task, at least one worker is available) is denoted by $W_t := |\mathcal{W}_t| \ge 1$. Since the MCSP can select at most all available workers, it aims at selecting $\min\{m_t, W_t\}$ workers for task $t$; see Fig. 1 for an illustration. (Note that each task is only processed once by the MCSP, even if too few workers are available: If fewer workers are available than required for a task, the MCSP requests all available workers to complete the task and the task owner is only charged for the actual number of requested workers.)

III-B Context-Specific Worker Performance

The performance of an (available) worker $i \in \mathcal{W}_t$ depends on (i) the worker's willingness to accept the task and (ii) the worker's quality in completing the task, where we assume that the quality can take values in a range $[q_{\min}, q_{\max}]$ with $0 \le q_{\min} < q_{\max}$. Both the willingness of a worker to accept the task and the quality may depend on the worker's current context and the task context. Let $x_{i,t}$ denote the personal context of worker $i$ at the arrival time of task $t$, coming from a bounded $D_i$-dimensional personal context space $\mathcal{X}_i$. Here, we allow each worker to have an individual personal context space $\mathcal{X}_i$, since each worker may allow access to an individual set of context dimensions due to his personal settings (e.g., the worker allows access to a certain set of sensors of his mobile device that are used to derive his context). Possible personal context dimensions could be the worker's current location (in terms of geographic coordinates), the type of location (e.g., at home, in a coffee shop), the worker's current activity (e.g., commuting, working) or his current device status (e.g., battery state, type of wireless connection). We further call the concatenation $z_{i,t} := (x_{i,t}, y_t)$ the joint context of worker $i$ and task $t$. For worker $i$, this joint context is hence a vector of dimension $D_{Z_i} := D_i + D_Y$. We call $\mathcal{Z}_i := \mathcal{X}_i \times \mathcal{Y}$ the joint (personal and task) context space of worker $i$. The reason for considering the joint context is that the performance of worker $i$ may depend on both his current personal context and the task context – in other words, the performance depends jointly on $z_{i,t} = (x_{i,t}, y_t)$.

Let $r_i(x, y)$ denote the performance of worker $i$ with current personal context $x$ for task context $y$. The performance can be decomposed into (i) worker $i$'s decision $d_i \in \{0, 1\}$ to accept ($d_i = 1$) or decline ($d_i = 0$) the task and, in case the worker accepts the task, (ii) worker $i$'s quality $q_i$ when completing the task. Hence, we can write

$r_i(x, y) = d_i(x, y) \cdot q_i(x, y).$

The performance $r_i(x, y)$ is a random variable whose distribution depends on the distributions of the random variables $d_i(x, y)$ and $q_i(x, y)$. Here, since the decision is binary, $d_i(x, y)$ is drawn from a Bernoulli distribution with unknown parameter $a_i(x, y)$. Hence, $a_i(x, y)$ represents the acceptance rate of worker $i$ when the joint context of worker $i$ and the task is $(x, y)$. The quality $q_i(x, y)$ is a random variable conditioned on $d_i(x, y) = 1$ (i.e., conditioned on task acceptance) with unknown distribution, and we denote its expected value by $\bar{q}_i(x, y)$. Hence, $\bar{q}_i(x, y)$ represents the average quality of worker $i$ with personal context $x$ when completing a task of context $y$. Therefore, the performance of worker $i$ with personal context $x$ for a task of context $y$ has unknown distribution, takes values in $\{0\} \cup [q_{\min}, q_{\max}]$ and its expected value satisfies

$\mathbb{E}[r_i(x, y)] = a_i(x, y) \, \bar{q}_i(x, y) =: \mu_i(x, y),$

where $\mu_i(x, y)$ denotes the expected performance of worker $i$ for joint context $(x, y) \in \mathcal{Z}_i$.
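To make the decomposition concrete, the following Python sketch samples one realization of the performance $r_i = d_i \cdot q_i$; the acceptance rate and the quality sampler are hypothetical stand-ins for the unknown $a_i(x, y)$ and the conditional quality distribution:

```python
import random

def sample_performance(accept_rate, quality_sampler):
    """One realization of r_i = d_i * q_i: d_i ~ Bernoulli(a_i(x, y)) is the
    accept/decline decision; the quality q_i is drawn only upon acceptance.
    By construction, E[r_i] = a_i(x, y) * q_bar_i(x, y)."""
    d = 1 if random.random() < accept_rate else 0
    return quality_sampler() if d == 1 else 0.0

# Example: acceptance rate 0.7 and quality uniform on [0.4, 1.0] (mean 0.7)
# yield an expected performance of 0.7 * 0.7 = 0.49.
r = sample_performance(0.7, lambda: random.uniform(0.4, 1.0))
```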

III-C Formal Problem Formulation

Consider an arbitrary sequence of task budgets $\{b_t\}_{t=1,\dots,T}$ (which translates to a sequence $\{m_t\}_{t=1,\dots,T}$) and an arbitrary sequence of worker availability $\{\mathcal{W}_t\}_{t=1,\dots,T}$. Let $v_{i,t}$ denote a binary variable which is $1$ if worker $i$ is requested to complete task $t$ and $0$ otherwise. Then, the problem of selecting, for each task, a subset of workers which maximizes the sum of expected performances given the task budget is given by

(1)   $\max_{\{v_{i,t}\}} \ \sum_{t=1}^{T} \sum_{i \in \mathcal{W}_t} v_{i,t} \, \mu_i(x_{i,t}, y_t)$
s.t.  $\sum_{i \in \mathcal{W}_t} v_{i,t} \le m_t \ \forall t, \qquad v_{i,t} \in \{0, 1\} \ \forall i \in \mathcal{W}_t, \forall t.$

First, we analyze problem (1) assuming full knowledge about worker performance. Therefore, assume that there was an entity that (i) is an omniscient oracle, which knows the expected performance of each worker in each personal context for each task context a priori, and (ii) for each arriving task, is centrally informed about the current contexts of all available workers. For such an entity, problem (1) is an integer linear program, which can be decoupled into an independent sub-problem per arriving task. For a task $t$, if fewer workers are available than required, i.e., $W_t \le m_t$, the optimal solution is to request all available workers to complete the task. However, if $W_t > m_t$, the corresponding sub-problem is a special case of a knapsack problem with a knapsack of size $m_t$ and with items of identical size and non-negative profit. Therefore, the optimal solution can be computed easily, in at most $O(W_t \log W_t)$ operations, by ranking the available workers according to their context-specific expected performance and selecting the $m_t$ highest ranked workers. By $S_t^*$, we denote the optimal subset of workers to select for task $t$. Formally, these workers satisfy

$S_t^* \in \underset{S \subseteq \mathcal{W}_t,\ |S| = \min\{m_t, W_t\}}{\operatorname{argmax}} \ \sum_{i \in S} \mu_i(x_{i,t}, y_t).$

Note that $S_t^*$ depends on the task budget $b_t$ and context $y_t$, as well as on the set $\mathcal{W}_t$ of available workers and their personal contexts $\{x_{i,t}\}_{i \in \mathcal{W}_t}$, but we write $S_t^*$ instead of $S_t^*(b_t, y_t, \mathcal{W}_t, \{x_{i,t}\}_{i \in \mathcal{W}_t})$ for brevity. Let $\{S_t^*\}_{t=1,\dots,T}$ be the collection of optimal subsets of workers for the collection of tasks. We call this collection the solution achieved by a centralized oracle, since it requires an entity which has a priori knowledge about expected performances and central knowledge about all current worker contexts in order to make optimal decisions.
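Since the per-task sub-problem degenerates to ranking, the oracle's selection rule is simple. The following Python sketch assumes direct access to the true expected performances, which only the hypothetical oracle has:

```python
def oracle_selection(available, expected_perf, m_t):
    """Centralized oracle for one task: rank the available workers by their
    true expected performance mu_i(x_{i,t}, y_t) and select the top
    min(m_t, W_t). `expected_perf` maps worker id -> mu_i (oracle knowledge)."""
    if len(available) <= m_t:
        return set(available)          # fewer workers than required: take all
    ranked = sorted(available, key=lambda i: expected_perf[i], reverse=True)
    return set(ranked[:m_t])
```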

However, we assume that the MCSP does not have a priori knowledge about expected performances, but it still has to select workers for arriving tasks. Let $S_t \subseteq \mathcal{W}_t$ denote the set of workers that the MCSP selects and requests to complete task $t$. If, for an arriving task, fewer workers are available than required, i.e., $W_t \le m_t$, then by simply requesting all available workers (i.e., $S_t = \mathcal{W}_t$) to complete the task, the MCSP still automatically selects the optimal subset of workers. Otherwise, for $W_t > m_t$, the MCSP cannot simply solve problem (1) like an omniscient oracle, since it does not know the expected performances $\mu_i$. Moreover, we assume that a worker's current personal context is only locally available in his mobile device. We call the software of the mobile crowdsourcing application, which is installed in a worker's mobile device, a local controller (LC) and we denote by LC $i$ the LC of worker $i$. Each LC has access to its corresponding worker's personal context information, but it does not share this information with the MCSP. Hence, only the set of LCs, but not the MCSP, has knowledge about the workers' current personal contexts (such as their current locations or activities). However, the expected performance of a worker might depend on his personal context.

Hence, the MCSP and the LCs should cooperate in order to learn expected performances over time and in order to select an appropriate subset of workers for each task. For this purpose, over time, the system of MCSP and LCs has to find a trade-off between exploration and exploitation, by, on the one hand, selecting workers about whose performance only little information is available and, on the other hand, selecting workers which are likely to have high expected performance. For each arriving task, the selection of workers depends on the history of previously selected workers and their observed performances. However, observing worker performance might be costly, since it might require a manual quality rating from a task owner, or an automatic quality assessment using local software in a battery-constrained mobile device. Hence, the number of performance observations should be limited in order to keep the cost for such quality assessment feasible (e.g., a task owner should not have to rate the quality of dozens of workers for a single task; quality assessment in mobile devices should be limited to save battery).

Next, we will present a context-aware hierarchical online learning algorithm, which maps the history of previously selected workers and observed performances to the next selection of workers. The performance of this algorithm can be evaluated by comparing its loss with respect to the centralized oracle. This loss is called the regret of learning. For an arbitrary sequence of task budgets $\{b_t\}_{t=1,\dots,T}$ (translating to a sequence $\{m_t\}_{t=1,\dots,T}$) and an arbitrary sequence of worker availability $\{\mathcal{W}_t\}_{t=1,\dots,T}$, the regret is formally defined as

(2)   $R(T) = \mathbb{E}\Big[\sum_{t=1}^{T}\Big(\sum_{i \in S_t^*} r_i(x_{i,t}, y_t) - \sum_{i \in S_t} r_i(x_{i,t}, y_t)\Big)\Big],$

which is equivalent to

(3)   $R(T) = \sum_{t=1}^{T} \sum_{i \in S_t^*} \mu_i(x_{i,t}, y_t) - \mathbb{E}\Big[\sum_{t=1}^{T} \sum_{i \in S_t} \mu_i(x_{i,t}, y_t)\Big].$

IV A Context-aware Hierarchical Online Learning Algorithm for Performance Maximization in Mobile Crowdsourcing

The goal of the MCSP is to select, for each arriving task, a set of workers that maximizes the sum of expected performances for that task given the task budget. Since the expected performances are known a priori neither to the MCSP nor to the LCs, they have to be learned over time. Moreover, since only the LCs have access to the personal context of their respective workers, coordination is needed between the MCSP and the LCs. Below, we propose a hierarchical contextual online learning algorithm, which is based on algorithms for the contextual multi-armed bandit problem [23, 24, 25, 26, 27, 28]. Our algorithm is based on the assumption that a worker's expected performance is similar in similar personal and task contexts. Therefore, by observing the task context, a worker's personal context and his performance when requested to complete a task, the worker's context-specific expected performance can be learned and exploited for future worker selection.

We call the proposed algorithm Hierarchical Context-aware Learning (HCL). Fig. 2 shows an overview of HCL's operation. In HCL, the MCSP broadcasts the context of each arriving task to the LCs. Upon receiving information about a task, an LC first observes its worker's personal context. If the worker's performance has been observed sufficiently often before, given the current joint personal and task context, the LC relies on previous observations to estimate its worker's performance and sends the estimated performance to the MCSP. If its worker's performance has not been observed sufficiently often before, the LC informs the MCSP that its worker has to be explored. Based on the messages received from the LCs, the MCSP selects a subset of workers and requests them to complete the task. The number of selected workers depends on the task budget, the price per worker and the number of available workers. The LC of each selected worker observes its worker's decision to accept or decline the task. If a worker was selected for exploration purposes and he accepted the task, the LC additionally observes the quality of the completed task. The reason for making a quality assessment only when a worker was selected for exploration purposes is that such an assessment may require either a manual quality rating from the task owner or an automatic quality assessment at the mobile device using local software, both of which may be costly. (If quality assessment is cheap, HCL can be adapted to always observe worker quality, which may increase the learning speed.) Hence, by observing the quality of a completed task only if the worker was selected for exploration purposes, HCL keeps the number of costly quality assessments low.

In HCL, a worker's personal contexts, decisions and qualities are only stored locally at the LC and are not shared with the MCSP. Thereby, (i) personal context is protected locally, (ii) the required storage space for worker information at the MCSP is kept low, (iii) task completion and result transmission can be handled directly between the LC and the task owner, without the need for the MCSP to interfere, (iv) workers receive requests for tasks that are interesting to them and that they are good at, without needing to share their context information, and (v) even though an LC has to keep track of its worker's personal contexts, decisions and qualities, the computation and storage overhead for each LC is small.

Fig. 2: Overview of the operation of HCL for task $t$.
1: Receive input from MCSP: $T$, $\mathcal{Y}$, $D_Y$
2: Receive input from worker $i$: $\mathcal{X}_i$, $D_i$
3: Set joint context space $\mathcal{Z}_i = \mathcal{X}_i \times \mathcal{Y}$, set $D_{Z_i} = D_i + D_Y$
4: Set parameter $h_T \in \mathbb{N}$ and control function $K: \{1, \dots, T\} \to \mathbb{R}_+$
5: Initialize context partition: Create partition $\mathcal{P}_{T,i}$ of $\mathcal{Z}_i$ into $(h_T)^{D_{Z_i}}$ hypercubes of identical size
6: Initialize counters: For all $p \in \mathcal{P}_{T,i}$, set $N_{i,p} = 0$
7: Initialize estimated performance: For all $p \in \mathcal{P}_{T,i}$, set $\hat{\mu}_{i,p} = 0$
8: for each $t = 1, \dots, T$ do
9:     if worker $i$ is available then
10:         Receive task context $y_t$
11:         Observe worker $i$'s personal context $x_{i,t}$
12:         Find the set $p_{i,t} \in \mathcal{P}_{T,i}$ such that $(x_{i,t}, y_t) \in p_{i,t}$
13:         if $N_{i,p_{i,t}} \ge K(t)$ then
14:             Send $\hat{\mu}_{i,p_{i,t}}$ to MCSP
15:         else
16:             Send "explore" to MCSP
17:         end if
18:         Wait for MCSP's worker selection
19:         if MCSP requests worker $i$ to perform task $t$ then
20:             Observe worker $i$'s decision $d_{i,t}$
21:             if $N_{i,p_{i,t}} < K(t)$ then
22:                 if $d_{i,t} = 1$ then
23:                     Observe worker $i$'s quality $q_{i,t}$, set $r = q_{i,t}$
24:                 else
25:                     Set $r = 0$
26:                 end if
27:                 $\hat{\mu}_{i,p_{i,t}} \leftarrow \frac{\hat{\mu}_{i,p_{i,t}} N_{i,p_{i,t}} + r}{N_{i,p_{i,t}} + 1}$
28:                 $N_{i,p_{i,t}} \leftarrow N_{i,p_{i,t}} + 1$
29:             end if
30:         end if
31:     end if
32: end for
Algorithm 1 HCL: Local Controller of Worker $i$.

In more detail, LC $i$ operates as follows, as given in the pseudocode in Alg. 1. First, for synchronization purposes, LC $i$ receives the finite number $T$ of tasks to be considered, the task context space $\mathcal{Y}$ and its dimension $D_Y$ from the MCSP. Moreover, LC $i$ checks to which of worker $i$'s context dimensions it has access. This defines the personal context space $\mathcal{X}_i$ and its dimension $D_i$. Then, LC $i$ sets the joint context space to $\mathcal{Z}_i = \mathcal{X}_i \times \mathcal{Y}$ with dimension $D_{Z_i} = D_i + D_Y$. In addition, LC $i$ has to set a parameter $h_T \in \mathbb{N}$ and a control function $K(t)$, which are both described below. Next, LC $i$ initializes a uniform partition $\mathcal{P}_{T,i}$ of worker $i$'s joint context space $\mathcal{Z}_i$, which consists of $(h_T)^{D_{Z_i}}$ $D_{Z_i}$-dimensional hypercubes of equal side length $\frac{1}{h_T}$. Hence, the parameter $h_T$ determines the granularity of the partition of the context space. Moreover, LC $i$ initializes a counter $N_{i,p}(t)$ for its worker for each hypercube $p \in \mathcal{P}_{T,i}$. The counter $N_{i,p}(t)$ represents the number of times before (i.e., up to, but not including) task $t$ in which worker $i$ was selected to complete a task when worker $i$'s joint personal and task context belonged to hypercube $p$. Additionally, for each hypercube $p \in \mathcal{P}_{T,i}$, LC $i$ initializes the estimate $\hat{\mu}_{i,p}(t)$, which represents the estimated performance of its worker for contexts in hypercube $p$ before task $t$.

Then, LC $i$ performs the following steps for each of the sequentially arriving tasks $t = 1, \dots, T$. For an arriving task $t$, LC $i$ only takes actions if its worker is currently available. If this is the case, LC $i$ first receives the task context $y_t$ sent by the MCSP. (A worker who is not available may be offline, so that his LC cannot even receive information about the arriving task. Therefore, we here consider the LC to only take actions if its worker is in the "available" mode.) Moreover, it observes worker $i$'s current personal context $x_{i,t}$ and determines the hypercube from $\mathcal{P}_{T,i}$ to which the joint context belongs. We denote this hypercube by $p_{i,t}$; it satisfies $(x_{i,t}, y_t) \in p_{i,t}$. Then, LC $i$ checks if worker $i$ has not been selected sufficiently often before when worker $i$'s joint personal and task context belonged to hypercube $p_{i,t}$. For this purpose, LC $i$ compares the counter $N_{i,p_{i,t}}(t)$ with $K(t)$, where $K$ is a deterministic, monotonically increasing control function, set in the beginning of the algorithm. On the one hand, if worker $i$ has been selected sufficiently often before ($N_{i,p_{i,t}}(t) \ge K(t)$), LC $i$ can rely on the estimated performance $\hat{\mu}_{i,p_{i,t}}(t)$, which it sends to the MCSP in this case. On the other hand, if worker $i$ has not been selected sufficiently often before ($N_{i,p_{i,t}}(t) < K(t)$), LC $i$ sends an "explore" message to the MCSP. The control function is hence needed to decide whether a worker should be selected for exploration (to achieve reliable estimates) or whether his estimates are already reliable and can be exploited. Therefore, the choice of the control function is essential for a good result of the learning algorithm, since it determines the trade-off between exploration and exploitation. Then, LC $i$ waits for the MCSP to take care of the worker selection. If worker $i$ is not selected, LC $i$ does not take further actions. However, if the MCSP requests worker $i$, LC $i$ observes whether worker $i$ accepts or declines the task. If worker $i$ was selected for exploration purposes, LC $i$ performs an additional estimate and counter update. For this, if worker $i$ accepted the task, LC $i$ first additionally observes worker $i$'s quality $q_{i,t}$ in completing the task (e.g., by requesting a quality rating from the task owner or by using local software for automatic quality assessment) and sets the observed performance to $r = q_{i,t}$. If worker $i$ declined the task, LC $i$ sets the observed performance to $r = 0$. Then, based on the observed performance, LC $i$ updates the estimated performance $\hat{\mu}_{i,p_{i,t}}$ for hypercube $p_{i,t}$ and the counter $N_{i,p_{i,t}}$. Note that in Alg. 1, the argument $t$ is omitted from the counters $N_{i,p}$ and $\hat{\mu}_{i,p}$ since it is not necessary to store previous values of these counters.

By definition of $\hat{\mu}_{i,p}(t)$, the estimated performance corresponds to the product of (i) the relative frequency with which worker $i$ accepted tasks when the joint context belonged to hypercube $p$ and (ii) the average quality in completing these tasks. Formally, $\hat{\mu}_{i,p}(t)$ is computed as follows. Let $\mathcal{E}_{i,p}(t)$ be the set of observed performances of worker $i$ before task $t$ when worker $i$ was selected for a task and the joint context was in hypercube $p$. If, before task $t$, worker $i$'s performance has never been observed for a joint context in hypercube $p$, we have $\mathcal{E}_{i,p}(t) = \emptyset$ and $\hat{\mu}_{i,p}(t) = 0$. Otherwise, the estimated performance is given by $\hat{\mu}_{i,p}(t) = \frac{1}{|\mathcal{E}_{i,p}(t)|} \sum_{r \in \mathcal{E}_{i,p}(t)} r$. However, the set $\mathcal{E}_{i,p}(t)$ does not appear in Alg. 1, since the estimated performance can be computed incrementally based on the previous estimate, the previous counter value and the observed performance for task $t$ (lines 27–28 of Alg. 1).
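The per-worker state kept by an LC is small: one counter and one running mean per hypercube. The following Python sketch illustrates the partition indexing, the explore/exploit message and the incremental update from lines 27–28 of Alg. 1 (a minimal sketch with illustrative names, assuming contexts are normalized to $[0,1]^D$):

```python
class LocalController:
    """Sketch of one LC in HCL (illustrative, not the paper's exact code):
    a uniform partition of the joint context space [0,1]^D into h_T^D
    hypercubes, with a counter N and a performance estimate mu_hat per cube."""

    def __init__(self, h_T: int):
        self.h_T = h_T        # hypercubes per context dimension
        self.N = {}           # cube index -> number of exploration observations
        self.mu_hat = {}      # cube index -> estimated performance

    def cube(self, z):
        """Index of the hypercube containing joint context z in [0,1]^D."""
        return tuple(min(int(v * self.h_T), self.h_T - 1) for v in z)

    def message(self, z, t, K):
        """Return 'explore' if the current cube is under-explored (N < K(t)),
        otherwise the current estimate, as sent to the MCSP."""
        p = self.cube(z)
        return "explore" if self.N.get(p, 0) < K(t) else self.mu_hat.get(p, 0.0)

    def update(self, z, r):
        """Incremental mean update after an exploration observation r
        (r = quality if accepted, r = 0 if declined), cf. lines 27-28 of Alg. 1."""
        p = self.cube(z)
        n = self.N.get(p, 0)
        self.mu_hat[p] = (self.mu_hat.get(p, 0.0) * n + r) / (n + 1)
        self.N[p] = n + 1
```

The incremental update is why the observation set $\mathcal{E}_{i,p}(t)$ never needs to be stored.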

1: Send input to LCs: $T$, $\mathcal{Y}$, $D_Y$
2: for each $t = 1, \dots, T$ do
3:     Receive task $t = (b_t, y_t)$
4:     Compute number of workers to request $m_t = \lfloor b_t / p \rfloor$
5:     Set $\mathcal{W}_t = \emptyset$
6:     Set $\mathcal{W}_t^{\mathrm{ue}} = \emptyset$
7:     Broadcast task context $y_t$
8:     for each $i \in \mathcal{W}$ do
9:         if Receive message from LC $i$ then
10:             $\mathcal{W}_t = \mathcal{W}_t \cup \{i\}$
11:             if message is "explore" then
12:                 $\mathcal{W}_t^{\mathrm{ue}} = \mathcal{W}_t^{\mathrm{ue}} \cup \{i\}$
13:             end if
14:         end if
15:     end for
16:     Compute $W_t = |\mathcal{W}_t|$
17:     if $W_t \le m_t$ then ▷ SELECT ALL
18:         Select all workers from $\mathcal{W}_t$
19:     else
20:         Compute $u_t = |\mathcal{W}_t^{\mathrm{ue}}|$
21:         if $u_t = 0$ then ▷ EXPLOITATION
22:             Rank workers in $\mathcal{W}_t$ according to estimates from LCs
23:             Select the $m_t$ highest ranked workers
24:         else ▷ EXPLORATION
25:             if $u_t \ge m_t$ then
26:                 Select $m_t$ workers randomly from $\mathcal{W}_t^{\mathrm{ue}}$
27:             else
28:                 Select the $u_t$ workers from $\mathcal{W}_t^{\mathrm{ue}}$
29:                 Rank workers in $\mathcal{W}_t \setminus \mathcal{W}_t^{\mathrm{ue}}$ according to estimates from LCs
30:                 Select the $m_t - u_t$ highest ranked workers
31:             end if
32:         end if
33:     end if
34:     Request selected workers to perform task $t$
35: end for
Algorithm 2 HCL: Worker Selection at the MCSP.

In HCL, the MCSP is responsible for the worker selection, which it performs according to the pseudocode given in Alg. 2. First, for synchronization purposes, the MCSP sends the finite number $T$ of tasks to be considered, the task context space $\mathcal{Y}$ and its dimension $D_Y$ to the LCs. Then, for each arriving task $t$, the MCSP computes the required number $m_t$ of workers, based on the budget $b_t$ and the price $p$ per worker. In addition, the MCSP initializes two sets. The set $\mathcal{W}_t$ represents the set of available workers when task $t$ arrives, while $\mathcal{W}_t^{\mathrm{ue}}$ is the so-called set of under-explored workers, which contains all available workers that have not been selected sufficiently often before. After broadcasting the task context $y_t$, the MCSP waits for messages from the LCs. If the MCSP receives a message from an LC, it adds the corresponding worker to the set of available workers. Moreover, in this case the MCSP additionally checks whether the received message is an "explore" request. If so, the MCSP adds the corresponding worker to the set of under-explored workers. Note that according to Alg. 1 and Alg. 2, the set of under-explored workers is hence given by

(4)   $\mathcal{W}_t^{\mathrm{ue}} = \big\{ i \in \mathcal{W}_t : N_{i, p_{i,t}}(t) < K(t) \big\}.$

Next, the MCSP calculates the number $W_t$ of available workers. If $W_t \le m_t$, i.e., exactly the required number of workers or fewer are available, the MCSP enters a select-all-workers phase and selects all available workers to complete the task. Otherwise, the MCSP continues by calculating the number $u_t = |\mathcal{W}_t^{\mathrm{ue}}|$ of under-explored workers. If there is no under-explored worker, the MCSP enters an exploitation phase. It ranks the available workers in $\mathcal{W}_t$ according to the estimated performances received from their respective LCs. Then, the MCSP selects the $m_t$ highest ranked workers from this ranking. By this procedure, the MCSP is able to use context-specific estimated performances without actually observing the workers' current personal contexts. If there are under-explored workers, the MCSP enters an exploration phase. These phases are needed so that the LCs of all workers are able to update their estimated performances sufficiently often. Here, two different cases can occur, depending on the number of under-explored workers. Either the number of under-explored workers is at least $m_t$, in which case the MCSP selects $m_t$ under-explored workers at random. Or the number of under-explored workers is smaller than $m_t$, in which case the MCSP selects all $u_t$ under-explored workers. Since it should select $m_t - u_t$ additional workers, it ranks the available sufficiently-explored workers in $\mathcal{W}_t \setminus \mathcal{W}_t^{\mathrm{ue}}$ according to the estimated performances received from their respective LCs. Then, the MCSP additionally selects the $m_t - u_t$ highest ranked workers from this ranking. In this way, additional exploitation is carried out in exploration phases when the number of under-explored workers is small. After worker selection, the MCSP requests the selected workers to perform the task by alerting them via the application's user interface. Via this interface, these workers are also informed about the task context $y_t$. Note that since the MCSP does not have to keep track of the workers' decisions, the LCs can handle the contact with the task owner directly (e.g., the task owner may send more detailed task instructions directly to the LC; after task completion, the LC sends the result of the completed task to the task owner).
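The MCSP side can be summarized analogously. The following Python sketch implements the three phases (select-all, exploitation, exploration) on top of the messages received from the LCs; the message encoding is illustrative, not the paper's interface:

```python
import random

def mcsp_select(messages, m_t):
    """Sketch of the MCSP's per-task selection in HCL. `messages` maps each
    available worker to either the string 'explore' or a performance estimate
    received from its LC (an assumed encoding for this sketch)."""
    available = list(messages)
    if len(available) <= m_t:                     # SELECT ALL phase
        return set(available)
    under = [i for i in available if messages[i] == "explore"]
    explored = [i for i in available if messages[i] != "explore"]
    by_estimate = sorted(explored, key=lambda i: messages[i], reverse=True)
    if not under:                                 # EXPLOITATION phase
        return set(by_estimate[:m_t])
    if len(under) >= m_t:                         # EXPLORATION phase
        return set(random.sample(under, m_t))
    # Few under-explored workers: take all of them, fill up by exploitation.
    return set(under) | set(by_estimate[:m_t - len(under)])
```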

V Theoretical Analysis

V-A Upper Regret Bound

The performance of HCL is evaluated by analyzing its regret, see Eq. (2), with respect to the centralized oracle. In this section, we derive a sublinear bound on the regret, i.e., we show that $R(T) = O(T^\gamma)$ holds for some $\gamma < 1$. Hence, our algorithm suffers no asymptotic loss compared to the centralized oracle, since it follows that $\lim_{T \to \infty} R(T)/T = 0$. The regret bound is derived based on the assumption that under a similar personal context and a similar task context, a worker's expected performance is also similar. This assumption can be formalized as follows. (Note that our algorithm can also be applied to data which does not satisfy this assumption; in that case, however, the regret bound may not hold.)

Assumption 1 (Hölder continuity assumption)

There exist $L > 0$ and $\alpha > 0$ such that for all workers $i \in \mathcal{W}$ and for all joint contexts $z, z' \in \mathcal{Z}_i$, it holds that

$|\mu_i(z) - \mu_i(z')| \le L \, \|z - z'\|^{\alpha},$

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^{D_{Z_i}}$.

The theorem given below shows that the regret of HCL is sublinear in the time horizon $T$.

Theorem 1 (Regret Bound for HCL)

Given that Assumption 1 holds, when each LC $i$, $i \in \mathcal{W}$, runs Alg. 1 with parameter $h_T = \lceil T^{\frac{1}{3\alpha + D_{Z_i}}} \rceil$ and control function $K(t) = t^{\frac{2\alpha}{3\alpha + D_{Z_i}}} \log(t)$, and the MCSP runs Alg. 2, the regret $R(T)$ is bounded by a function that is sublinear in $T$.

Hence, the leading order of the regret is $O\big(W q_{\max} \log(T) \, T^{\frac{2\alpha + D_Z}{3\alpha + D_Z}}\big)$, where $D_Z := \max_{i \in \mathcal{W}} D_{Z_i}$.

The proof of Theorem 1 is given in Appendix A. Theorem 1 shows that HCL converges to the centralized oracle in the sense that the average regret $R(T)/T$ diminishes as the number $T$ of tasks goes to infinity. Moreover, since Theorem 1 is applicable for any finite number of tasks, it can be used to characterize the algorithm's speed of learning.

While the regret bound given in Theorem 1 holds for an arbitrary sequence of task budgets and worker availability, more specific regret bounds can be derived for specific stochastic task budgets and worker availability. For example, consider the case when both the task budgets $b_t$ and the numbers $W_t$ of available workers are i.i.d. random variables. Furthermore, assume that the distributions of $b_t$ and $W_t$ are such that $\Pr(W_t > m_t) = \bar{p}$ for some $\bar{p} \in (0, 1]$ for all $t$. This means that with probability $\bar{p}$, the number of available workers exceeds the required number of workers for any task. For this scenario, the following regret bound holds.

Corollary 1 (Regret Bound for HCL under i.i.d. Task Budgets and Worker Availability)

Given that Assumption 1 holds and $\Pr(W_t > m_t) = \bar{p}$ for some $\bar{p} \in (0, 1]$, when each LC $i$, $i \in \mathcal{W}$, runs Alg. 1 with the parameters given in Theorem 1, and the MCSP runs Alg. 2, the regret bound of Theorem 1 holds with an additional factor of $\bar{p}$ in its leading order.

The proof of Corollary 1 is given in Appendix B. Compared to Theorem 1, in this special case the regret is scaled by the factor $\bar{p}$, since with probability $1 - \bar{p}$ a task requires at least as many workers as are available, and hence the algorithm trivially selects the optimal set of workers, namely the set of all available workers.

V-B Local Storage Requirements

The required local storage in the mobile device of a worker is determined by the storage needed when the LC executes Alg. 1. In Alg. 1, the LC of worker $i$ stores the counters $N_{i,p}$ and $\hat{\mu}_{i,p}$ for each $p \in \mathcal{P}_{T,i}$. Using the parameters from Theorem 1, the number of hypercubes in the partition is $(h_T)^{D_{Z_i}} = \lceil T^{\frac{1}{3\alpha + D_{Z_i}}} \rceil^{D_{Z_i}}$. Hence, the number of counters to store in the mobile device of worker $i$ is upper bounded by $2 \lceil T^{\frac{1}{3\alpha + D_{Z_i}}} \rceil^{D_{Z_i}}$, so that the required storage depends on the number of context dimensions. If the worker allows access to a high number of personal context dimensions and/or the number of task context dimensions is large, the algorithm learns the worker's context-specific performance with finer granularity and the assigned tasks are therefore more personalized, but the required storage size also increases.

V-C Communication Requirements

The communication requirements of HCL can be deduced from its main operation steps: For each task $t$, first, the MCSP broadcasts the task context to the LCs, which is a vector of dimension $D_Y$ (or, in other words, $D_Y$ scalars). Then, the LCs of available workers estimate their worker's performance and send it to the MCSP. This corresponds to $W_t$ scalars to be transmitted (one scalar sent by each LC of an available worker). Finally, the MCSP informs the selected workers about its decision, which corresponds to $\min\{m_t, W_t\}$ scalars sent by the MCSP. Hence, for task $t$, in sum, $D_Y + W_t + \min\{m_t, W_t\}$ scalars have to be transmitted. Among these, $D_Y + \min\{m_t, W_t\}$ scalars are transmitted by the MCSP and one scalar is transmitted by each mobile device of an available worker.

We now compare the communication requirements of HCL and of a centralized version of HCL, in which, for each task, first the personal contexts of the available workers are gathered at the MCSP, which then makes the worker selection based on the task and personal contexts and informs the selected workers about its decision. The communication requirements of the centralized version are as follows: For each task $t$, the LC of each available worker $i$ sends the current worker context to the MCSP, which is a vector of dimension $D_i$ (i.e., $D_i$ scalars). Hence, in sum, $\sum_{i \in \mathcal{W}_t} D_i$ scalars are transmitted. After worker selection, the MCSP requests the selected workers to perform the task, which corresponds to $\min\{m_t, W_t\}$ scalars sent by the MCSP. Moreover, the MCSP has to inform the selected workers about the task context, which is a vector of dimension $D_Y$ (i.e., $D_Y$ scalars per selected worker). Hence, in total, $\sum_{i \in \mathcal{W}_t} D_i + \min\{m_t, W_t\}(1 + D_Y)$ scalars are transmitted for task $t$. Among these, $\min\{m_t, W_t\}(1 + D_Y)$ scalars are transmitted by the MCSP and $D_i$ scalars are transmitted by the mobile device of each available worker $i$.

From this analysis, we can deduce the following: Since the number $D_i$ of personal context dimensions will typically be larger than 1, HCL reduces the transmission size of each mobile device (one scalar instead of $D_i$ scalars per task), compared to the centralized approach, while still taking advantage of personal context information. In a special case, even the sum communication requirements (for all mobile devices and the MCSP in sum) of HCL are smaller than those of the centralized approach: If $W_t \ge D_Y$ and additionally $D_i \ge 2$ for all $i \in \mathcal{W}_t$, then in the centralized approach, in sum, at least $2 W_t + \min\{m_t, W_t\} \ge D_Y + W_t + \min\{m_t, W_t\}$ scalars are transmitted, i.e., at least as many as under HCL.
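For concreteness, the per-task scalar counts derived above can be compared directly; the following Python sketch simply encodes the two accounting formulas under the assumptions stated in this subsection:

```python
def scalars_hcl(D_Y, W_t, m_t):
    """Scalars per task under HCL: broadcast task context (D_Y), one message
    per available LC (W_t), one request per selected worker (min(m_t, W_t))."""
    return D_Y + W_t + min(m_t, W_t)

def scalars_centralized(D_list, D_Y, m_t):
    """Centralized variant: each available worker i uploads its D_i context
    scalars; the MCSP sends each selected worker a request plus the task
    context (1 + D_Y scalars). `D_list` holds D_i for the available workers."""
    sel = min(m_t, len(D_list))
    return sum(D_list) + sel * (1 + D_Y)
```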

V-D Worker Quality Assessment Requirements

As mentioned above, observing a worker's quality might be costly, since it might require either a quality rating by the task owner or an automatic quality assessment using local software in the battery-constrained mobile device. HCL explicitly takes this into account by requesting a quality assessment only if a worker is selected for exploration purposes. Here, we give an upper bound on the number of quality assessments per worker up to task $T$.

Corollary 2 (Bound on the Number of Quality Assessments up to Task $T$)

Given that Assumption 1 holds, when each LC $i$, $i \in \mathcal{W}$, runs Alg. 1 with the parameters given in Theorem 1, and the MCSP runs Alg. 2, the number $Q_i(T)$ of quality assessments of each worker $i$ up to task $T$ is upper bounded by

$Q_i(T) \le (h_T)^{D_{Z_i}} \lceil K(T) \rceil = \big\lceil T^{\frac{1}{3\alpha + D_{Z_i}}} \big\rceil^{D_{Z_i}} \big\lceil T^{\frac{2\alpha}{3\alpha + D_{Z_i}}} \log(T) \big\rceil.$

The proof of Corollary 2 is given in Appendix C. From Corollary 2, we see that the number of quality assessments per worker is sublinear in $T$. Hence, $\lim_{T \to \infty} Q_i(T)/T = 0$, so that for $T \to \infty$, the average rate of quality assessments approaches zero.

VI Numerical Results

We evaluate HCL by comparing its performance with various reference algorithms, based on synthetic and real data.

VI-A Reference Algorithms

The following algorithms are used for comparison.

  • (Centralized) Oracle: The Oracle has perfect a priori knowledge about context-specific expected performances. Moreover, it is centrally informed about the current contexts of available workers.

  • LinUCB: This algorithm assumes that the expected performance of a worker is linear in its context [29], [30]. Based on a linear reward function over contexts and the history of previous observations of context-specific worker performances, for each task, the algorithm chooses the $m_t$ available workers with the highest estimated upper confidence bounds on their expected performance. The algorithm has an input parameter controlling the influence of the confidence bound. LinUCB is used in [20] for task assignment in spatial crowdsourcing.

  • AUER: This algorithm [31] is an extension of the well-known UCB algorithm [32] to the sleeping-arm case. It learns from previous observations of worker performances, but without taking context information into account. Based on the history of previous observations of worker performances, this algorithm selects the $m_t$ available workers with the highest estimated upper confidence bounds on their expected performance. The algorithm has an input parameter which controls the influence of the confidence bound.

  • ε-Greedy: With probability ε, this algorithm selects a random subset of $m_t$ available workers. With probability $1 - ε$, the algorithm selects the $m_t$ available workers with the highest estimated performance. The estimated performance of a worker is computed based on the history of previous performances [32], but without taking context into account (see the sketch below this list).

  • Myopic: This algorithm only learns from the last interaction with each worker. For the first task, it selects a random subset of workers. For each of the following tasks $t$, it checks which of the available workers accepted a task when requested the last time. If more than $m_t$ of the available workers accepted their last requested task, the algorithm selects from these workers the $m_t$ workers with the highest performance in their last completed task. Otherwise, the algorithm selects all of these workers and an additional subset of random workers, so that in total $m_t$ workers are selected.

  • Random: For each task $t$, a random subset of $m_t$ available workers is selected.

Note that if an algorithm originally selects only one worker per task, we adapted it to select $m_t$ workers per task. Also, above, we described the behavior of the algorithms for the case $W_t > m_t$; in the case $W_t \le m_t$, we adapted each algorithm such that it selects all available workers. Moreover, most of the reference algorithms are usually used in a centralized fashion, but they can be decoupled so as to be used in the same hierarchical setting as HCL.
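As an example of how simple the context-free baselines are, the following Python sketch implements the ε-Greedy selection rule with the adaptations just described (illustrative names; the estimates would be running means of past observed performances):

```python
import random

def epsilon_greedy_select(available, estimates, m_t, eps):
    """Context-free epsilon-greedy baseline: with probability eps, pick m_t
    available workers at random; otherwise pick the m_t workers with the
    highest mean observed performance. Unseen workers default to 0."""
    if len(available) <= m_t:
        return set(available)                     # too few workers: take all
    if random.random() < eps:                     # explore
        return set(random.sample(list(available), m_t))
    ranked = sorted(available, key=lambda i: estimates.get(i, 0.0), reverse=True)
    return set(ranked[:m_t])                      # exploit
```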

VI-B Evaluation Metrics

Each algorithm is run over a sequence of $T$ tasks and its result is evaluated using the following metrics. As a function of the arriving tasks, we compute the cumulative performance up to task $t$ achieved by an algorithm, which is the cumulative sum of the performances of all selected workers up to (and including) task $t$. Formally, if the set of selected workers of an algorithm $A$ for task $\tau$ is $S_\tau^A$ and $r_{i,\tau}$ is the observed performance of worker $i$, the cumulative performance up to $t$ achieved by algorithm $A$ is

$P_{\mathrm{cum}}^A(t) = \sum_{\tau=1}^{t} \sum_{i \in S_\tau^A} r_{i,\tau}.$

Moreover, we compute the average worker performance up to $t$ achieved by an algorithm, which is the average performance of all selected workers up to task $t$. Formally, the average worker performance up to $t$ is defined by

$\bar{P}^A(t) = \frac{P_{\mathrm{cum}}^A(t)}{\sum_{\tau=1}^{t} |S_\tau^A|}.$
VI-C Simulation Setup

We evaluate the different algorithms using synthetic as well as real data. The difference between the two approaches lies in the arrival process of the workers and their contexts. To produce synthetic data, we generate workers and their contexts based on predefined distributions, as described below. In the case of real data, similar to, e.g., [6, 20, 22], we use a data set from Gowalla [33]. Gowalla is a location-based social network where users share their location by checking in at 'spots', i.e., certain places in their vicinity. We use the check-ins to simulate the arrival process of workers and their contexts. The Gowalla data set consists of 6,442,892 check-ins of 107,092 distinct users over the period from February 2009 to October 2010. Each entry of the data set is of the form (User ID, Check-in Time, Latitude, Longitude, Location ID). Similar to the approach in [22], we first extract the check-ins in New York City, which leaves a subset of 138,954 check-ins of 7,115 distinct users at 21,509 distinct locations. This resulting Gowalla-NY data set is used below.

Fig. 3: Statistics of the used Gowalla-NY data set: (a) check-ins, (b) visited locations.

Fig. 3(a) shows the distribution of the number of check-ins in the Gowalla-NY data set. The number of check-ins per user ranges from 1 check-in (1414 users) to 1794 check-ins (1 user); for example, 2532 users checked in more than 10 times. Fig. 3(b) shows the distribution of the number of distinct locations visited by the users in the Gowalla-NY data set. The number of visited locations per user ranges from 1 location (1524 users) to 1633 locations (1 user); for example, 3661 users checked in at more than 5 locations.

For both synthetic and real data, the basic simulation setup is as follows: We simulate an MCSP to which a set $\mathcal{W}$ of $W$ workers belongs. For synthetic data, the $W$ workers are created in the beginning. For real data, we randomly select $W$ users from the Gowalla-NY data set, who represent the workers of the MCSP; we then use this reduced Gowalla-NY data set containing the check-ins of these $W$ users. Task owners have to pay a fixed price $p$ per worker requested by the MCSP to perform the task. The quality of a completed task lies in the range $[q_{\min}, q_{\max}]$. The task properties are modeled as follows. The task budget $b_t$ is sampled from a normal distribution, truncated between $p$ and $Wp$, so that, given the price $p$, a fixed average number of workers is required per task. The task context is assumed to be uniformly distributed in $[0, 1]$ (i.e., $D_Y = 1$).

The worker arrival process and the worker contexts are sampled differently for synthetic and real data. For synthetic data, we let each worker be available with a fixed probability (default value) for each arriving task. The personal context space of an available worker is set to $[0, 1]^2$ (i.e., $D_i = 2$). The first personal context dimension refers to the worker's location, which is sampled from a fixed number of (personal) locations, using a weighted discrete distribution with different probabilities to represent the fact that workers may use the crowdsourcing application different amounts of time in different places (e.g., at home more often than at work). The second personal context dimension refers to the worker's battery state, which is sampled from a uniform distribution in $[0, 1]$. The worker performance is modeled as follows. The joint personal and task context space (of dimension $D_{Z_i} = 3$) is split into a uniform grid of subsets (i.e., in each of the 3 dimensions, the space is split into a number of identical parts). In each of the subsets, the expected performance of a worker is a priori sampled uniformly at random. Later, for each sampled joint worker and task context, it is first checked to which subset of the grid the sampled context belongs. Then, the instantaneous performance is sampled by adding uniform noise to the expected performance in the given subset of the grid (the noise interval is truncated to a smaller interval if the expected performance lies close to either boundary of the performance range).
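A sketch of this synthetic ground-truth construction is given below; the grid resolution, noise width and performance range used here are illustrative placeholders, not the values used in the experiments:

```python
import random

def make_expected_performance(parts, rng=random):
    """Ground truth for one worker: split [0,1]^D into a uniform grid of
    parts^D cells and draw one expected performance per cell (lazily)."""
    cells = {}
    def mu(z):
        cell = tuple(min(int(v * parts), parts - 1) for v in z)
        if cell not in cells:
            cells[cell] = rng.uniform(0.0, 1.0)   # illustrative range
        return cells[cell]
    return mu

def sample_instantaneous(mu_z, noise=0.1, rng=random):
    """Instantaneous performance: expected value plus uniform noise, with the
    noise interval truncated at the boundaries of the performance range."""
    lo, hi = max(0.0, mu_z - noise), min(1.0, mu_z + noise)
    return rng.uniform(lo, hi)
```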

Using real data, the arrival process of workers and their contexts is sampled as follows. For the worker availability, we use a Binomial distribution with parameters and (default value) to sample the number of available workers for an arriving task. (In this way, the number of available workers is distributed in the same way in the experiments with real and with synthetic data.) Having sampled this number, we randomly draw samples from the reduced Gowalla-NY data set (consisting of the check-ins of the selected users) until these samples contain the required number of distinct users. These sampled users correspond to the available workers (i.e., users with a higher number of check-ins in the reduced Gowalla-NY data set translate to workers that are more often available to the MCSP). The personal context space of an available worker is again set to (i.e., ). We set the first personal context of each available worker to the check-in location of the respective user from the sample. (If a user was sampled several times until the required number of distinct users was reached, we choose his first sampled check-in location.) The second personal context dimension again refers to the battery state, uniformly sampled from . The worker performance is modeled as follows. The joint personal and task context space () is split into a grid. Along the dimensions of task context and battery state, the context space is split into uniform parts each. Along the dimension of location context, the context space is split into uniform parts, where the number of parts corresponds to the number of distinct locations visited by the corresponding user in the reduced Gowalla-NY data set (i.e., for each visited location, the expected performance will be different). In each of the subsets, the expected performance of a worker is a priori sampled uniformly at random from . In this way, workers with a higher number of visited locations have a higher number of different context-specific performances. Later, for each sampled joint worker and task context, the instantaneous performance is sampled by adding noise, uniformly sampled from , to the expected performance in the given subset of the grid (the noise interval is truncated to a smaller interval if the expected performance lies close to either or ).
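The following sketch illustrates this worker arrival process for the real data; the Binomial parameters are placeholders, and the pool of check-ins is assumed to contain at least as many distinct users as the sampled number of available workers.

```python
# Hedged sketch of the real-data worker arrival process. N_TRIALS and
# P_AVAILABLE are placeholder Binomial parameters; `checkins` is assumed
# to hold the check-ins of the users selected from the Gowalla-NY data.
import numpy as np

rng = np.random.default_rng(2)

N_TRIALS = 20        # assumed number of workers registered at the MCSP
P_AVAILABLE = 0.5    # assumed availability probability

def sample_available_workers(checkins):
    """checkins: list of (user_id, location_id) tuples; assumes the pool
    contains at least as many distinct users as the sampled number m."""
    m = rng.binomial(N_TRIALS, P_AVAILABLE)  # number of available workers
    seen = {}                                # user_id -> first sampled location
    while len(seen) < m:
        user, loc = checkins[rng.integers(len(checkins))]
        seen.setdefault(user, loc)           # keep the first sampled check-in
    return seen
```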

VI-D Parameter Selection

The proposed algorithm, LinUCB, AUER and ε-Greedy each require a parameter as input, which affects their performance. In order to find appropriate parameters, we generate synthetic instances, each consisting of a sequence of tasks and worker arrivals sampled according to the procedure explained above. Then, we run the proposed algorithm, LinUCB, AUER and ε-Greedy with different parameters. Note that for the proposed algorithm, we set , choose , , as in Theorem 1, and set the control function to , , where the factor is included to reduce the number of exploration phases. Then we search for an appropriate . Table II shows the parameter at which each of the algorithms performed best.

Algorithm            Parameter    Selected Value
Proposed algorithm
LinUCB
AUER
ε-Greedy
TABLE II: Choice of Parameters for the Different Algorithms.
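For concreteness, the following is a minimal sketch of an ε-Greedy baseline of the kind compared against here, shown for selecting a single worker per task; the parameter value and the per-worker bookkeeping are generic assumptions, not necessarily the exact variant evaluated in the experiments.

```python
# Hedged sketch of a generic ε-Greedy baseline: with probability ε a
# random available worker is requested, otherwise the worker with the
# best empirical mean performance so far. eps=0.1 is a placeholder.
import numpy as np

rng = np.random.default_rng(3)

class EpsilonGreedy:
    def __init__(self, n_workers, eps=0.1):
        self.eps = eps
        self.counts = np.zeros(n_workers)
        self.means = np.zeros(n_workers)

    def select(self, available):
        """available: list of worker indices available for the current task."""
        if rng.random() < self.eps:
            return int(rng.choice(available))      # explore
        return max(available, key=lambda i: self.means[i])  # exploit

    def update(self, worker, performance):
        # Incremental update of the empirical mean performance.
        self.counts[worker] += 1
        self.means[worker] += (performance - self.means[worker]) / self.counts[worker]
```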

VI-E Results

Next, in order to evaluate the different algorithms, we generate another synthetic instances and instances based on real data. Each instance consists again of a sequence of tasks and worker arrivals sampled according to the descriptions given above. Then, we run the algorithms with the parameters from Table II, on the synthetic instances and on the instances based on real data, respectively. The results shown below are averaged over these instances.

Fig. 4(a) and Fig. 4(b) show the cumulative worker performance up to task as a function of the sequentially arriving tasks , for synthetic and real data, respectively.

Fig. 4: Cumulative worker performance up to task over the sequence of arriving tasks for worker availability probability : (a) experiments with synthetic data; (b) experiments with real data.

The cumulative performance achieved by each algorithm increases linearly with the number of processed tasks for both synthetic and real data. As expected, Oracle outperforms all other algorithms due to its a priori knowledge of the workers’ expected performances, while Random gives a lower bound on the achievable cumulative performance. The proposed algorithm clearly outperforms LinUCB, AUER, ε-Greedy and Myopic, even though it observes worker performance only when requesting a worker for exploration purposes, while the other algorithms have access to worker performance whenever a worker is requested. This is due to the fact that the proposed algorithm smartly exploits worker and task context information. In detail, the cumulative worker performance up to achieved by the proposed algorithm corresponds to between () and () times the results achieved by the other non-oracle algorithms for the synthetic (real) data. Moreover, the proposed algorithm reaches a result close to the optimum. In detail, the cumulative worker performance up to achieved by the proposed algorithm corresponds to () times the one achieved by Oracle for the synthetic (real) data. In contrast, LinUCB, AUER, ε-Greedy and Myopic perform far worse and, interestingly, lie in a very similar region close to the result of Random. This shows that learning approaches which either do not take context into account (i.e., AUER, ε-Greedy and Myopic) or which assume a linear dependency between context and performance (i.e., LinUCB) cannot cope with the non-linear context dependency of expected worker performance. Comparing synthetic and real data, the proposed algorithm performs better on the synthetic data, but it still reaches a good result on the real data, even though in the real data each worker has his own diversity in context arrivals and hence in expected performances (since users in the Gowalla-NY data set have different numbers of visited check-in locations).

Fig. 5(a) and Fig. 5(b) show the average worker performance up to task as a function of the sequentially arriving tasks .

Fig. 5: Average worker performance up to task for the sequence of arriving tasks for worker availability probability : (a) experiments with synthetic data; (b) experiments with real data.

We see that over the sequence of tasks, the average worker performance achieved by Oracle and by Random stays nearly constant at around and , respectively, for both synthetic and real data. LinUCB, AUER, ε-Greedy and Myopic increase the average worker performance only slightly, starting between and at and ending with an average performance of between and at . In contrast, the proposed algorithm is able to increase the average worker performance from at up to () at for the synthetic (real) data. Hence, the proposed algorithm learns context-specific worker performances over time and, after sufficiently many processed tasks, selects workers almost as well as Oracle does.
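Both evaluation curves can be obtained from a log of per-task performances. The sketch below uses placeholder data and computes the cumulative values plotted in Fig. 4 and the per-task averages plotted in Fig. 5.

```python
# Sketch of the two evaluation curves, computed from a hypothetical log
# of per-task performances of one algorithm run (placeholder data).
import numpy as np

perf_log = np.random.default_rng(4).uniform(0.0, 1.0, size=1000)

cumulative = np.cumsum(perf_log)                         # curve of Fig. 4
average = cumulative / np.arange(1, len(perf_log) + 1)   # curve of Fig. 5
```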

Finally, we evaluate the impact of worker availability by varying the probability . (Note that this is equivalent to varying the expected task budget; we therefore do not separately evaluate the impact of the task budget. In particular, the results presented below for different worker availabilities scale with the task budget and cannot be used to draw absolute conclusions.) For each value of , we run all algorithms on synthetic instances and on instances based on real data for , respectively. Then, we average the results. Fig. 6(a) and Fig. 6(b) show the cumulative performance at achieved by the algorithms for different .

Fig. 6: Impact of worker availability on the cumulative performance at for tasks: (a) experiments with synthetic data; (b) experiments with real data.

For small , all algorithms yield approximately the same performance. This is as expected since, given our modelling of the task budget, for small