A Real-Time Framework for Task Assignment in Hyperlocal Spatial Crowdsourcing


Luan Tran* (University of Southern California), Hien To* (University of Southern California), Liyue Fan (University at Albany, SUNY), Cyrus Shahabi (University of Southern California)
* The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Abstract

Spatial Crowdsourcing (SC) is a novel platform that engages individuals in the act of collecting various types of spatial data. This method of data collection can significantly reduce cost and turnover time, and is particularly useful in urban environmental sensing, where traditional means fail to provide fine-grained field data. In this study, we introduce hyperlocal spatial crowdsourcing, where all workers who are located within the spatiotemporal vicinity of a task are eligible to perform the task, e.g., reporting the precipitation level at their location and time. In this setting, there is often a constraint, either for every time period or for the entire campaign, on the number of workers to activate to perform tasks. The challenge is thus to maximize the number of assigned tasks under the budget constraint, despite the dynamic arrivals of workers and tasks. We introduce a taxonomy of several problem variants, such as budget-per-time-period vs. budget-per-campaign and binary-utility vs. distance-based-utility. We study the hardness of the task assignment problem in the offline setting and propose online heuristics that exploit the spatial and temporal knowledge acquired over time. Our experiments are conducted with spatial crowdsourcing workloads generated by the SCAWG tool, and extensive results show the effectiveness and efficiency of our proposed solutions.

Spatial Crowdsourcing, Crowdsensing, Participatory Sensing, GIS, Online Task Assignment, Budget Constraints

CCS Concepts: Information systems: Crowdsourcing; Human-centered computing: Ubiquitous and mobile computing; Information systems: Geographic information systems

ACM Reference Format:

Luan Tran, Hien To, Liyue Fan, Cyrus Shahabi, 2017. A Real-Time Framework for Task Assignment in Hyperlocal Spatial Crowdsourcing.

1 Introduction

With the ubiquity of smartphones and improvements in wireless network bandwidth, every person with a mobile phone can now act as a multimodal sensor, collecting and sharing various types of high-fidelity spatiotemporal data instantaneously. In particular, crowdsourcing for weather information has become popular. With a few recent apps, such as mPING (http://mping.nssl.noaa.gov/) and WeatherSignal (http://weathersignal.com), individual users can report weather conditions, air pollution, noise levels, etc. In fact, the author of [Dorminey (2014)] regards crowdsourcing as “the future of weather forecasting”.

Through our collaboration with the Center for Hydrometeorology and Remote Sensing (CHRS, http://chrs.web.uci.edu/) at the University of California, Irvine, we have developed a mobile app, iRain (https://itunes.apple.com/us/app/irain-uci/id982858283) [iRa (2016)], to perform spatial crowdsourcing for precipitation information. Unlike other weather crowdsourcing apps, iRain allows CHRS researchers to request rainfall information at specific locations and times where their global satellite precipitation estimation technologies (http://hydis.eng.uci.edu/gwadi/) fail to provide real-time, fine-grained data. Individual iRain users around those locations can respond to those requests by reporting rainfall observations, e.g., heavy/medium/light/none.

In general, spatial crowdsourcing (SC) [Kazemi and Shahabi (2012)] offers an effective data collection platform where data requesters can create spatial tasks dynamically and workers are assigned to tasks based on their locations. Figure 1 depicts the architecture of iRain. A requester issues a set of rainfall observation tasks to the SC-server (Step 1) where each task corresponds to a specific geographical extent, e.g., a circle. The workers continuously update their locations to the SC-server when they become available for performing tasks (Step 0). Subsequently, the SC-server crowdsources the tasks among the workers in the task regions and sends the collected data back to the requester (Steps 2, 3).

Figure 1: Hyperlocal spatial crowdsourcing framework.

One major difference from existing SC paradigms [Kazemi and Shahabi (2012), He et al. (2014), To et al. (2015), Xiao et al. (2015), Guo et al. (2016)] is that workers in our paradigm do not need to travel to the exact task locations, e.g., to the centers of the circular regions, and are eligible to perform tasks as long as they are in close spatiotemporal vicinity of the tasks, e.g., enclosed in the circular regions. (Tasks that require workers to physically travel to task locations, e.g., taking a picture of an event, are not considered in our problem setting.) We denote this new paradigm as Hyperlocal Spatial Crowdsourcing. The reason is two-fold. First, without requiring the workers to travel physically, our paradigm lowers the threshold for worker participation and will potentially yield faster responses. Second, the requested data, e.g., rainfall or temperature, exhibits spatiotemporal continuity in measurement. Therefore, observations obtained at nearby locations, e.g., within a certain distance of the task location, and close to the requested time, are sufficient to fulfil the task. For example, in Figure 1, two workers inside the task region are both eligible to report the precipitation level at the University of Southern California (USC), and a worker who becomes available 5 minutes later is also qualified. The acceptable ranges of space and time can be specified by data requesters, from which the SC-server can find the set of eligible workers for each task.

The SC-server operates to maximize the number of fulfilled tasks for revenue. However, it cannot assign tasks to an unlimited number of workers due to budget considerations. In Hyperlocal SC, the budget represents the payment to each selected worker upon task completion, or the communication cost of sending and receiving task notifications between the SC-server and each selected worker. Furthermore, it is not necessary to select many workers for overlapping tasks. For example, in Figure 1, the observation of a single worker can be used for the precipitation tasks at both USC and downtown Los Angeles (shown as two circles).

The goal of our study is to maximize the number of assigned tasks on the SC-server where only a given number of workers can be selected over a time period or during the entire campaign, i.e., under “budget” constraints. When tasks and workers are known a priori, we can reduce the task assignment problem to the classic Maximum Coverage Problem and its variants. However, the main challenge with SC comes from the dynamism of the arriving tasks and workers, which renders an optimal solution infeasible in the online scenario. In Figure 1, the SC-server is likely to activate a separate worker for each of the two tasks, without knowing that a more favorable worker, who is qualified for both tasks, will arrive in the near future. Previous heuristics in the literature [Kazemi and Shahabi (2012), To et al. (2015), Deng et al. (2016), Guo et al. (2016)] do not consider the vicinity of tasks in space and time or the budget, and thus cannot be applied to Hyperlocal SC.

The contributions of this paper are as follows. (This paper is an extension of a short paper that appeared in [To et al. (2016b)].)

1) We provide a formal definition of Hyperlocal Spatial Crowdsourcing, where the goal is to maximize task coverage under budget constraints. We introduce a taxonomy to classify several problem variants, e.g., given a budget constraint for each time period (fMTC) vs. for the entire campaign (dMTC). We show that both the fMTC and dMTC variants are NP-hard in the offline setting.

2) In the online setting, we propose several heuristics for real-time task assignment. When a budget constraint is given for each time period (fMTC), local heuristics, i.e., Basic, Temporal, and Spatial, are developed to select workers within each time period. When a budget is given for the entire campaign (dMTC), we devise an adaptive strategy based on the contextual bandit to dynamically allocate the total budget to a number of time periods.

3) When the utility of an assigned task is considered, we introduce two distance-based utility models to measure the assignment quality based on worker-task distance, which can be integrated with any of the previously developed heuristics. To avoid overloading workers, we introduce a multi-objective variant that minimizes the repetitive activation of the same worker. Online solutions based on a genetic algorithm and adaptive budget allocation are developed for the fMTC and dMTC scenarios, respectively.

4) We conduct extensive experiments with workbench datasets generated from real-world location check-ins. The empirical results confirm that our heuristics are efficient and effective in assigning hyperlocal tasks in real time.

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 provides notations and a taxonomy for the Hyperlocal SC problem. We prove in Section 4 that offline task assignment in Hyperlocal SC with budget constraints is NP-hard. In Section 5, we study two problem variants in the online setting. Section 6 discusses the multi-objective optimization variant to mitigate worker overloading, and Section 7 describes the integration of distance-based task utility models. We report our experimental results in Section 8, provide discussion in Section 9, and conclude the paper in Section 10.

2 Related work

Spatial Crowdsourcing (SC) can be deemed one of the main enablers of urban computing applications such as monitoring traffic information and air pollution [Zheng et al. (2014), Ji et al. (2016)]. Only recently has SC gained popularity in both the research community and industry, e.g., TaskRabbit and Gigwalk. A recent study [To et al. (2015)] distinguishes SC from related fields, including generic crowdsourcing, participatory sensing, volunteered geographic information, and online matching. Research efforts in SC have focused on different aspects, such as task assignment (e.g., [Kazemi and Shahabi (2012), Tong et al. (2016), Liu et al. (2016b), Cheng et al. (2016), Hu et al. (2016)]), task scheduling (e.g., [Deng et al. (2016), Li et al. (2015), Sales Fonteles et al. (2016)]), quality control and trust (e.g., [Kazemi et al. (2013), Cheng et al. (2015)]), privacy (e.g., [To et al. (2014), To et al. (2017), Wang et al. (2017)]), and incentive mechanisms (e.g., [Gao et al. (2015), Kandappu et al. (2016), Zhang et al. (2017)]). The authors in [Kazemi and Shahabi (2012)] proposed a task assignment problem whose goal is to maximize the number of assigned tasks. Requesters may want to crowdsource a complex spatial task that requires multiple workers at different locations to collectively perform several spatial sub-tasks [Dang et al. (2013), Zhang et al. (2016), Gao et al. (2017)]. In [To et al. (2014)], the authors introduce the problem of protecting worker location privacy in SC and propose a framework that ensures differentially private protection guarantees without significantly affecting the effectiveness and efficiency of the SC system. Another study proposes a differentially private incentive mechanism for mobile crowd sensing systems [Jin et al. (2016)]. In such systems, where workers bid on tasks, a worker’s bid may reveal her interests and knowledge base. Thus, the proposed method preserves the privacy of each worker’s bid against the other honest-but-curious workers. However, unlike our study, this work does not focus on challenges that are unique to spatial crowdsourcing, such as spatial task allocation. Meanwhile, the authors in [Pournajaf et al. (2014)] propose a two-phase framework whose objective is to match a set of spatial tasks to a set of workers given the workers’ cloaking regions, such that task assignment is maximized while satisfying the travel budget constraint of each worker.

The trust issues in SC have been studied in [Kazemi et al. (2013)], where one solution is to have tasks performed redundantly by multiple workers. Recently, in [Cheng et al. (2016)], the reliability of task assignment is measured in terms of both the confidence of task completion and the diversity quality of the tasks. However, the trust and reliability issues of workers are beyond the scope of our work; if there are multiple reports for one task, the SC-server will simply send all available reports to the task requester. It is worth noting that we assume the workers respond to their assigned tasks, which is a common assumption in the server-assigned mode of spatial crowdsourcing [Kazemi and Shahabi (2012)]. There have been studies that relax this assumption by associating with each worker a probability of performing an assigned task (e.g., [Cheng et al. (2016), To et al. (2014)]). However, this consideration is not the focus of our paper. In addition, the time required for a worker to respond to an assigned task is negligible (e.g., a few seconds with the iRain application) compared to the deadline of each task (e.g., one day).

Online Spatial Task Assignment: There have been extensive studies regarding task assignment in generic crowdsourcing (e.g., [Venanzi et al. (2013), Tran-Thanh et al. (2013)]). However, unlike our study, which focuses on the spatiotemporal aspects of task assignment, these studies address general crowdsourcing rather than spatial crowdsourcing. Recent studies that are closely related to ours include [ul Hassan and Curry (2014)] and [Tong et al. (2016)]. Both of them study the online spatial task assignment problem; however, they differ from our work in terms of the problem setting and objectives. First, in our problem the report of a worker can be used for multiple tasks as long as their geographical extents cover the worker’s location. Thus, our focus is worker selection rather than the matching of workers to tasks as in [ul Hassan and Curry (2014), Tong et al. (2016)]. Second, the objectives in these studies are, respectively, to maximize the number of assigned tasks and to maximize the total utility score of the worker-task matches, while our framework considers both kinds of utility, each combined with other real-world considerations, i.e., leveraging historical workload and minimizing worker overloading. In addition, our study maximizes the utility of assignment under budget constraints, while the others focus solely on maximizing the utility.

Budgeted Spatial Task Assignment: There have been recent studies on matching workers with tasks under budget constraints. In [Tran-Thanh et al. (2013)], the authors propose CrowdBudget, an agent-based budget allocation algorithm that divides a given budget among different tasks in order to achieve low estimation error (of the estimated answers for a set of tasks). This study does not consider the challenges of spatial task assignment, where workers and tasks can come and go at any time and we are not aware of their locations until their arrival time. The study in [Miao et al. (2016)] also differs from ours. In our problem, the workers do not need to travel and report sensed values at their current locations. In contrast, in [Miao et al. (2016)], the workers do need to travel to the task locations, which may take a long time in rush hour. Consequently, the workers may reject their assigned tasks. It is worth noting that these studies focus on the worker/task matching problem, whereas our aim is to select the best workers to maximize task coverage. Similar to [Zhang et al. (2015), Tran-Thanh et al. (2013)], the budget in [Miao et al. (2016)] refers to the payment to workers, while in our study, we consider the budget as the number of workers to select and focus on allocating a total budget across multiple time periods.

Worker Selection: Several works have studied the problem of selecting workers with budget constraints [Song et al. (2014), Zhang et al. (2014)]. However, those studies focus on the offline participant selection problem, while our focus is to propose online solutions. Furthermore, the problem settings in those studies differ from ours in several aspects. Sensing tasks in [Song et al. (2014)] are represented by non-overlapping regions, while tasks in our study can overlap spatially, which makes the optimization more challenging. The authors in [Zhang et al. (2014)] studied the problem of selecting a minimum number of workers to minimize the overall incentive payment while satisfying a probabilistic coverage requirement; however, in our problem, the number of workers to be selected is constrained by a predefined budget. Our work is also closely related to the problem of matching workers with tasks [He et al. (2014), Xiao et al. (2015)]. Particularly, in [He et al. (2014)], the authors studied the problem of task allocation that maximizes the reward of the SC platform given a time constraint for each worker. Recently, in [Xiao et al. (2015)], a task assignment problem that minimizes the average makespan of all assigned tasks was proposed. Unlike these studies, SC workers in our setting need not travel to task locations. Furthermore, our aim, which is to maximize task coverage, differs from that of the aforementioned studies.

3 Preliminaries

We first introduce the concepts and notations used in this paper. A task is a query for certain hyperlocal information, e.g., the precipitation level at a particular location and time. For simplicity, we assume that the result of a task is a numerical value, e.g., 0=rain, 1=snow, 2=none (remote sensing techniques based on satellite images cannot differentiate between rain and snow). Specifically, every task comes with a pre-defined region within which any enclosed worker can report data for that task. In this paper, we define each task region as a circular space centered at the task location; however, a task region can be extended to other shapes such as a polygon, or to geographic units such as a district, city, or county. Moreover, each task also specifies a valid time interval during which users can provide data.

Definition (Task). A task $t$ of form $\langle l, r, s, \delta \rangle$ is a query at location $l$, which can be answered by workers within a circular space centered at $l$ with radius $r$. The parameter $\delta$ indicates the duration of the query: it is requested at time $s$ and can be answered until time $s + \delta$. We refer to $s + \delta$ as the “deadline” of task $t$.

A task expires if it has not been answered before its deadline. Figure 2(a) shows the regions of six tasks. All tasks expire at time period 2 (i.e., they can be deferred to time period 2), represented by the dashed circles in Figure 2(b).

Figure 2: Graphical example of worker-task coverage: (a) time period 1, (b) time period 2, (c) bipartite graph. Subscripts represent time periods while superscripts denote ids.

A worker can accept task assignments when he is online.

Definition (Worker). A worker $w$ of form $\langle id, l \rangle$ is a carrier of a mobile device who can accept spatial task assignments. The worker is uniquely identified by his $id$, and his location is $l$.

Intuitively, a worker is eligible to perform a task if his location is enclosed in the task region. In Figure 2(a), a worker located in the overlap of two task regions is eligible to perform both tasks. Furthermore, a worker’s report for one task can also be used for all other unexpired tasks whose task regions enclose the worker. As shown in Figure 2(b), a worker available at time period 2 is eligible to perform tasks deferred from time period 1 whose regions enclose him.

Let $W_i$ denote the set of available workers at time $s_i$ and $T_i$ denote the set of available tasks, including tasks issued at time $s_i$ and previously issued unexpired tasks. Below we define the notions of worker-task coverage and coverage instance sets.

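Definition (Worker-Task Coverage). Given a worker $w_i^j \in W_i$ available at time $s_i$, let $C(w_i^j) \subseteq T_i$ denote the task coverage set of $w_i^j$, such that for every task $t_i^k \in C(w_i^j)$,

$$ t_i^k.s \le s_i \le t_i^k.s + t_i^k.\delta \qquad (1) $$
$$ \lVert w_i^j.l - t_i^k.l \rVert_2 \le t_i^k.r \qquad (2) $$

We also say that the worker $w_i^j$ covers the tasks in $C(w_i^j)$. Figure 2(a) illustrates examples of such coverage.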

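Definition (Coverage Instance Set). At time $s_i$, the coverage instance set, denoted by $I_i$, is the set of worker-task coverages of form $\langle w_i^j, C(w_i^j) \rangle$ for all workers $w_i^j \in W_i$, where $W_i$ is the set of workers available at $s_i$ and $C(w_i^j)$ is the coverage set of worker $w_i^j$.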
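The two coverage conditions (a temporal check against the deadline and a spatial check against the task radius) can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names (`Task`, `Worker`, `covers`, `coverage_set`, `coverage_instance_set`) are hypothetical, locations are assumed to be planar coordinates with Euclidean distance, and the deadline is assumed to be inclusive.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Task:
    tid: str
    x: float
    y: float
    radius: float   # radius of the circular task region
    start: int      # time period at which the task is requested
    duration: int   # the task can be answered until start + duration


@dataclass(frozen=True)
class Worker:
    wid: str
    x: float
    y: float


def covers(worker: Worker, task: Task, now: int) -> bool:
    """True if the worker can answer the task at time period `now`."""
    # Temporal condition (assumption: the deadline itself is still valid).
    in_time = task.start <= now <= task.start + task.duration
    # Spatial condition: the worker lies inside the circular task region.
    in_space = hypot(worker.x - task.x, worker.y - task.y) <= task.radius
    return in_time and in_space


def coverage_set(worker, tasks, now):
    """Ids of all tasks the worker covers at time period `now`."""
    return {t.tid for t in tasks if covers(worker, t, now)}


def coverage_instance_set(workers, tasks, now):
    """Map each available worker id to its coverage set at time `now`."""
    return {w.wid: coverage_set(w, tasks, now) for w in workers}


if __name__ == "__main__":
    tasks = [Task("t1", 0.0, 0.0, 1.0, 1, 1), Task("t2", 1.5, 0.0, 1.0, 1, 0)]
    workers = [Worker("w1", 0.8, 0.0), Worker("w2", 5.0, 5.0)]
    print(coverage_instance_set(workers, tasks, now=1))
```

In this toy instance the first worker sits in the overlap of both task regions, so a single report serves two tasks, while the second worker covers nothing.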

Time Coverage Instance Sets
1
2
Table 1: The coverage instance set of the example in Figure 2.

The coverage instance sets for the example in Figure 2 are illustrated in Table 1. For simplicity, we now assume that the utility of a specific task assignment is binary within the task region and before the deadline. That is, an assignment to any worker within a task region before the deadline has utility 1 (one successful assignment), and 0 otherwise. As a result, a task answered by one eligible worker at one time period yields the same utility as the same task answered by another eligible worker at a later time period before its deadline.

Again, the goal of our study is to maximize task assignment given a budget, despite the dynamic arrivals of tasks and workers. We now formally define the notion of a budget.

Definition (Budget). Budget $K$ is the maximum number of workers to select in a coverage instance set.

In practice, the budget can capture the communication cost that the SC-server incurs to push notifications to selected workers (Step 3 in Figure 1), or the rewards paid to the workers.

With these, we formally define the hyperlocal crowdsourcing problem with budget constraint as follows.

Definition (Problem). Given a set of workers $W$, a set of available tasks $T$, a budget constraint $K$, and a utility function $U$, find a subset of workers $L \subseteq W$ within the budget constraint, i.e., $|L| \le K$, such that the total utility of the covered tasks is maximized.
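To make the problem statement concrete, the following Python sketch solves a tiny instance by exhaustive search under the binary-utility assumption (each covered task counts once). It is not a method from this paper: the function name `best_selection` and the dictionary representation of coverage sets are hypothetical, and the search is exponential in the number of workers, so it is only meant for toy inputs.

```python
from itertools import combinations


def best_selection(coverage, budget):
    """Exhaustively find at most `budget` workers covering the most tasks.

    `coverage` maps a worker id to the set of task ids it covers
    (binary utility: every covered task contributes 1).
    Returns (selected worker ids, covered task ids).
    """
    workers = sorted(coverage)
    best, best_covered = (), set()
    for k in range(1, min(budget, len(workers)) + 1):
        for subset in combinations(workers, k):
            covered = set().union(*(coverage[w] for w in subset))
            # Keep the first subset that strictly improves the coverage.
            if len(covered) > len(best_covered):
                best, best_covered = subset, covered
    return set(best), best_covered


if __name__ == "__main__":
    coverage = {
        "w1": {"t1", "t2"},
        "w2": {"t3"},
        "w3": {"t2", "t3", "t4"},
    }
    for budget in (1, 2):
        selected, covered = best_selection(coverage, budget)
        print(budget, sorted(selected), len(covered))
```

With a budget of one worker the search picks the worker covering three tasks; with a budget of two it adds the worker that contributes the remaining uncovered task.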

3.1 Problem Taxonomy

3.1.1 Budget-per-time-period vs. Budget-per-campaign

In certain scenarios, the task requester may specify a budget constraint, i.e., the maximum number of workers to select, for each time period in a campaign, e.g., a day or a week. Given a set of time periods $\phi = \{s_1, s_2, \ldots, s_Q\}$, a budget constraint $K_i$ is specified for each $s_i$. The challenge is to decide which workers to select within each time period. On the other hand, the task requester could specify a budget constraint for the entire campaign. Given a set of time periods $\phi = \{s_1, s_2, \ldots, s_Q\}$ and assuming $K_i$ workers are selected for $s_i$, a budget constraint $K$ is specified for the sum of the $K_i$’s. The new challenge of this problem variant is to allocate the total budget wisely over time periods.

The choice of the constraint model depends on the financial flexibility of the task requester. Furthermore, the budget-per-time-period model is a special case of the budget-per-campaign model. As a result, given the same total budget, the utility of the best budget-per-campaign solution is no worse than that of the best budget-per-time-period solution for any problem instance.
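The special-case relation can be spelled out as a short derivation. The notation here is introduced only for this illustration: $Q$ is the number of time periods, $L_i$ is the set of workers selected in period $i$, $K_i$ is the per-period budget, $K$ is the campaign budget, and $\mathrm{OPT}_f$ and $\mathrm{OPT}_d$ denote the best achievable utility under the per-period and per-campaign models, respectively.

```latex
\begin{align*}
|L_i| \le K_i \;\; \text{for all } i \in \{1,\dots,Q\}
  &\;\Longrightarrow\; \sum_{i=1}^{Q} |L_i| \;\le\; \sum_{i=1}^{Q} K_i , \\
\text{hence}\quad
\mathrm{OPT}_{d}(K) &\;\ge\; \mathrm{OPT}_{f}(K_1,\dots,K_Q)
  \quad \text{whenever } K = \sum_{i=1}^{Q} K_i .
\end{align*}
```

In words, every selection that is feasible under the per-period budgets is also feasible under a campaign budget equal to their sum, so the per-campaign optimum is taken over a superset of solutions.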

3.1.2 Binary-utility vs. Distance-based-utility

Considering the utility of assigned tasks, our problem can be classified into binary-utility and distance-based-utility variants. In the binary-utility model, a task can be assigned to any worker located within the task radius to achieve utility 1. Unassigned tasks yield utility 0. Therefore, the optimization objective is to maximize the total number of assigned tasks. However, for some applications, a worker who is closer to the task location may be “preferred” over other workers farther away. For example, in weather crowdsourcing applications, e.g., iRain [iRa (2016)], a closer worker can report more accurate rainfall data. The distance-based-utility model thus evaluates a task assignment to a specific worker with various distance functions.

3.1.3 Single-objective vs. Multi-objective

In the single-objective problem formulation, we aim to maximize the total utility of assigned tasks. On the other hand, crowdsourcing applications may have more than one, sometimes conflicting, objectives, to ensure long-term prosperity. For example, worker overloading can be a critical concern of the novel crowdsourcing platforms, in which only a few workers are frequently selected to optimize task assignments. Therefore, a multi-objective formulation can introduce a second objective to minimize the worker overloading phenomenon. The challenge is thus to find solutions considering the trade-off between the two objectives.

3.1.4 Offline vs. Online

Orthogonal to the dimensions above, our problem can be further classified into offline and online variants. The offline variant selects workers with complete knowledge of task/worker arrivals during the entire campaign. Although this is not practical, studying the offline variant allows us to eliminate the hardness arising from the randomness of the online problem and focus on the optimization in a deterministic setting. In the online variant, assignments have to be made in real-time for the currently arriving tasks/workers without complete knowledge of future arrivals. While it is more fitting for crowdsourcing applications, it is also intuitively more challenging — it is uncertain in nature when and where future tasks and workers may appear. Thus, effective worker selection must optimize the objective(s) in the long run.

The majority of this paper will focus on the online, binary-utility, single-objective problem with both per-time-period and per-campaign budget constraints. We will also show how to extend our solutions to the distance-based utility model as well as the multi-objective problem.

4 Hardness of the Problem

In this section we study the complexity of task assignment with budget constraints in hyperlocal spatial crowdsourcing. We show that the two offline variants of the problem, i.e., budget-per-time-period and budget-per-campaign, are NP-hard; online heuristics are proposed in the next section.

4.1 Fixed Budget

Problem 1 (Fixed-budget Maximum Task Coverage)

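Given a set of time periods $\phi = \{s_1, s_2, \ldots, s_Q\}$ and a budget $K_i$ for each $s_i$, the fixed-budget maximum task coverage (fMTC) problem is to select a set of workers $L_i$ at every $s_i$, such that the total number of covered tasks $\left| \bigcup_{i=1}^{Q} \bigcup_{w \in L_i} C(w) \right|$ is maximized and $|L_i| \le K_i$, where $C(w)$ denotes the set of tasks covered by worker $w$.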

This optimization problem is challenging since each worker is eligible for only a subset of tasks. The fact that a task can be deferred to future time periods further adds to the complexity of the problem. With the following theorem, we prove that fMTC is NP-hard by a reduction from the maximum coverage with group budget constraints problem (MCG) [Chekuri and Kumar (2004)]. MCG is motivated by the maximum coverage problem (MCP) [Feige (1998)]. In MCG, we are given subsets $S_1, S_2, \ldots, S_m$ of a ground set $X$ and disjoint sets $G_1, G_2, \ldots, G_l$. Each $G_i$, called a group, is a subset of $S = \{S_1, S_2, \ldots, S_m\}$. We are also given an integer $k$ and an integer bound $k_i$ for each group $G_i$. A solution to MCG is a subset $H \subseteq S$ such that $|H| \le k$ and $|H \cap G_i| \le k_i$ for $1 \le i \le l$. The objective is to find a solution such that the number of elements of $X$ covered by the sets in $H$ is maximized. MCP is a special case of MCG. Since MCP is known to be strongly NP-hard [Feige (1998)], by restriction, MCG is also NP-hard.

Theorem 1

fMTC is NP-hard.

Proof. We prove the theorem by a reduction from MCG [Chekuri and Kumar (2004)]. That is, given an instance of the MCG problem, there exists an instance of the MTC problem (in this section, MTC refers to fixed-budget MTC for short) such that the solution to the MTC instance can be converted to the solution of the MCG instance in polynomial time. The reduction has two phases: transforming all workers and tasks across the entire campaign into a bipartite graph, and mapping from MCG to MTC. First, we lay out the tasks and workers as two sets of vertices in a bipartite graph, as in Figure 2(c). A worker can cover a task if both the spatial and temporal constraints hold, i.e., Equations 2 and 1, respectively. In Figure 2(c), a dashed line indicates that a worker covers a task deferred from an earlier time period.

Thereafter, MTC can be stated as follows: select workers within the budget of each group, where each group represents a time period, such that the number of covered tasks is maximized. To reduce MCG to MTC, we show a mapping from the components of the MCG instance to the components of the MTC instance. For every element of the ground set in the MCG instance, we create a task. Also, for every set in the MCG instance, we create a worker whose coverage set consists of the tasks created for the elements of that set. Consequently, to solve the MTC instance, we need to find, in each group, a subset of workers within the group budget such that the total coverage is maximized. Clearly, if an answer to the MTC instance is a collection of selected workers, the answer to the MCG instance is the corresponding collection of sets, which has maximum coverage and respects the overall bound and every group bound.

As the transformation is bounded by the polynomial time needed to construct the bipartite graph, this completes the proof.

Given this correspondence with the MCG problem, we can use any algorithm for MCG to solve the MTC problem. The greedy algorithm in [Chekuri and Kumar (2004)] provides a $\frac{1}{2}$-approximation for MCG. For example, a greedy solution can be computed on the bipartite graph in Figure 2(c). However, the approximation ratio only holds in the offline scenario, where the server knows the coverage instance set for every time period.
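A greedy selection under per-group budgets can be sketched as follows for the offline setting, where the coverage of every worker in every time period is known in advance. This is a minimal sketch in the spirit of the greedy rule described for MCG, not a transcription of the algorithm in [Chekuri and Kumar (2004)]; the function name `greedy_group_budget` and the nested-dictionary input format are assumptions made for illustration.

```python
def greedy_group_budget(groups, budgets):
    """Greedy coverage with one budget per group (time period).

    `groups[g]` maps a worker id to the set of task ids that worker can
    cover in group g; `budgets[g]` is the number of workers allowed in g.
    Returns (selected workers per group, covered task ids).
    """
    selected = {g: [] for g in groups}
    covered = set()
    while True:
        best = None  # (gain, group, worker)
        for g, workers in groups.items():
            if len(selected[g]) >= budgets[g]:
                continue  # this group's budget is exhausted
            for w, tasks in workers.items():
                if w in selected[g]:
                    continue
                gain = len(tasks - covered)
                if best is None or gain > best[0]:
                    best = (gain, g, w)
        if best is None or best[0] == 0:
            break  # no budget left, or no worker adds new coverage
        _, g, w = best
        selected[g].append(w)
        covered |= groups[g][w]
    return selected, covered


if __name__ == "__main__":
    groups = {
        1: {"a": {"t1", "t2"}, "b": {"t3"}},
        2: {"c": {"t2", "t3", "t4"}, "d": {"t5"}},
    }
    selected, covered = greedy_group_budget(groups, {1: 1, 2: 1})
    print(selected, sorted(covered))
```

The key difference from a per-period greedy is that each step compares candidates across all groups that still have budget, which is only possible when future periods are known.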

4.2 Dynamic Budget

Problem 2 (Dynamic-budget Maximum Task Coverage)

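The dynamic-budget maximum task coverage problem (dMTC) is similar to fMTC, except that the total budget $K$ is specified for the entire campaign, i.e., $\sum_{i=1}^{Q} |L_i| \le K$, where $L_i$ is the set of workers selected at time period $s_i$ and $Q$ is the number of time periods.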

In the offline setting where the server is clairvoyant about the future workers and tasks, we prove the problem is NP-hard by reduction from the maximum coverage problem (MCP).

Theorem 2

dMTC is NP-hard.

Proof. We prove the theorem by a reduction from MCP. That is, given an instance of the MCP problem, there exists an instance of the MTC problem (in this section, MTC refers to dynamic-budget MTC for short) such that the solution to the MTC instance can be converted to the solution of the MCP instance in polynomial time. The reduction includes two steps: transforming all workers and tasks across the entire campaign into a bipartite graph, and mapping from MCP to MTC. The first step is similar to that of Theorem 1, in which the workers and tasks from the entire campaign are transformed into a bipartite graph as illustrated in Figure 2(c). The mapping step can be considered a special case of the proof of Theorem 1, in which there is only one group that holds the entire budget. As the transformation is bounded by the polynomial time needed to construct the bipartite graph and MCP is strongly NP-hard, this completes the proof.

The results of these offline solutions will be used as upper bounds for the results of the online solutions discussed in Section 5.

5 Online Task Assignment

In this section we focus on online variants: online fMTC when a budget constraint is given for each time period, and online dMTC, when the budget constraint is given for the entire campaign. We will introduce heuristics for each variant as follows.

5.1 Fixed Budget

In the online scenario where workers and tasks arrive dynamically, it becomes more challenging to achieve the globally optimal solution for Problem 1. Since the server does not have prior knowledge about future workers and tasks, it tries to optimize task assignment locally at every time period. However, the optimization within every time period, similar to the maximum coverage problem (MCP), is also NP-hard. A greedy algorithm [Feige (1998)] achieves an approximation ratio of $1 - 1/e \approx 0.63$ by choosing, at each stage, a set that contains the largest number of uncovered elements. The same study shows that the greedy algorithm is essentially the best-possible polynomial-time approximation algorithm for MCP. Below we propose several greedy heuristics to solve the online problem, namely Basic, Spatial, and Temporal.

5.1.1 Basic Heuristic

The Basic heuristic solves the online problem by using the greedy algorithm [Hochbaum (1996)] for every time period. At each stage, Basic selects the worker that covers the maximum number of uncovered tasks, depicted in Line 10 of Algorithm 1. For instance, in Figure 2(a), is selected at the first stage. At the beginning of each time period, Line 4 removes expired tasks from the previous time period. Line 5 adds unassigned, unexpired tasks to the current task set. Line 12 outputs the covered tasks per time period, which will be used as the main performance metric in Section 8. The inner loop terminates when either the budget is exhausted or all the tasks are covered (Line 9).

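1:  Input: worker set, task set, budget for each time period
2:  Output: selected workers
3:  For each time period
4:       Remove expired tasks from the previous time period
5:       Update the task set with unassigned, unexpired tasks
6:       Remove tasks that do not enclose any worker
7:       Construct the worker set, where each worker contains the tasks he covers
8:       Init the selected workers to empty and the uncovered tasks to the current task set
9:       While budget remains and uncovered tasks remain
10:          Select the worker that maximizes the number of uncovered tasks he covers
11:          Add the worker to the selected workers; remove his tasks from the uncovered tasks
12:      Output the covered tasks of this time period
13:       Keep uncovered tasks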
Algorithm 1 Basic Algorithm
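The greedy selection that Basic performs within a single time period can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `basic_select` and the dictionary-based `coverage` input (worker id mapped to the set of task ids whose region encloses that worker) are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def basic_select(coverage: Dict[Hashable, Set[Hashable]], budget: int) -> List[Hashable]:
    """Greedy max-coverage selection for one time period (hypothetical helper).

    coverage: worker id -> set of task ids the worker can perform.
    budget:   maximum number of workers that may be activated.
    """
    # Every task covered by at least one worker starts as uncovered.
    uncovered: Set[Hashable] = set().union(*coverage.values())
    remaining = dict(coverage)
    selected: List[Hashable] = []
    while len(selected) < budget and uncovered and remaining:
        # Pick the worker that covers the most still-uncovered tasks.
        best = max(remaining, key=lambda w: len(remaining[w] & uncovered))
        if not remaining[best] & uncovered:
            break  # no remaining worker adds coverage
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


if __name__ == "__main__":
    demo = {"w1": {"t1", "t2", "t3"}, "w2": {"t3", "t4"}, "w3": {"t4", "t5"}}
    print(basic_select(demo, budget=2))
```

Ties are broken by dictionary order in this sketch; the source does not specify a tie-breaking rule.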

Basic can achieve fast task assignment by simply counting the number of tasks covered by each worker (Line 10). However, it treats all tasks equally without considering the spatial and temporal information of each task, i.e., location and deadline. For example, a task located in a “worker-sparse” area may not be assigned in the future due to lack of nearby workers and thus should be assigned with higher priority at the current iteration. Similarly, tasks that are expiring soon should be assigned with higher priorities. Consequently, the priority of a worker is high if he covers a larger number of high priority tasks. Below we introduce two assignment heuristics that explicitly model the task priority given its spatial and temporal characteristics.

5.1.2 Temporal Heuristic

One approach to prioritizing tasks is by considering their temporal urgency. The intuition is that a task which is further away from its deadline is more likely to be covered in the future, and vice versa. As a result, near-deadline tasks should have higher priorities to be assigned than others. Consequently, a worker who covers a large number of soon-to-expire tasks should be preferred for selection. Based on the above intuition, we model the priority of a worker based on the remaining time of each task he covers as follows.

(3)

The Temporal heuristic adapts Basic by selecting the worker with maximum priority at each stage. For instance, given two workers and at time , where and . Suppose both and expire in time periods and expires in time periods. The Temporal heuristic chooses over as their priorities are and , respectively. To implement Temporal, Line 10 in Algorithm 1 can be updated to select the worker with maximum priority defined as in Equation 3. We will empirically evaluate all heuristics in Section 8.
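As an illustration of how a deadline-aware priority could be computed, the sketch below sums the reciprocal of the remaining time over the tasks a worker covers. The reciprocal form is an assumption made for this example, chosen only because it grows as a deadline approaches; the exact expression of Equation 3 is not reproduced here, and the names `temporal_priority`, `deadlines` and `now` are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def temporal_priority(covered_tasks: Iterable[Hashable],
                      deadlines: Dict[Hashable, int],
                      now: int) -> float:
    """Assumed priority: sum of 1 / (remaining time periods) over covered tasks.

    Tasks whose deadline has already passed contribute nothing.
    """
    return sum(1.0 / (deadlines[t] - now)
               for t in covered_tasks
               if deadlines[t] > now)


if __name__ == "__main__":
    deadlines = {"t1": 5, "t2": 5, "t3": 2}
    # A worker covering two far-from-deadline tasks vs. one urgent task.
    print(temporal_priority(["t1", "t2"], deadlines, now=1))
    print(temporal_priority(["t3"], deadlines, now=1))
```

Under this assumed form, a worker covering a single task that expires in the next period outranks a worker covering two tasks that each have four periods left.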

5.1.3 Spatial Heuristic

To maximize task assignment in the long term, we also consider the “popularity” of a task location as an indicator of whether the task can be assigned to future workers. Accordingly, we can spend the budget for the current time period to assign those tasks which can only be covered by existing workers. The “popularity” of a task region can be measured using Location Entropy [Cranshaw et al. (2010)], which captures the diversity of visits to that region. A region has a high entropy if many workers visit it with equal probabilities. In contrast, a region has a low entropy if only a few workers visit it. We define the entropy of a given task as follows.

For task , let be the set of visits to the task region . Let be the set of distinct workers that visited , and be the set of visits that worker made to . The probability that a random draw from belongs to is . The entropy of is computed as follows

(4)
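The entropy computation described above can be sketched as follows, using the standard Shannon form over the per-worker visit fractions. The function name `location_entropy`, the representation of visits as a list of worker ids, and the use of the natural logarithm are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def location_entropy(visits: Iterable[Hashable]) -> float:
    """Shannon entropy of visits to a region.

    visits: one worker id per recorded visit to the region.
    The probability for a worker is (his visits) / (all visits).
    """
    counts = Counter(visits)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no visits: treat the region as having zero entropy
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    print(location_entropy(["a", "a", "a", "a"]))  # a single visitor
    print(location_entropy(["a", "b", "c", "d"]))  # four equally likely visitors
```

A region visited by one worker only has zero entropy, while a region visited equally often by four workers has entropy log 4, matching the intuition that diverse visits indicate a "popular" region.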

For efficient evaluation, can be approximated by aggregating the entropies of 2D grid cells within the task region and the cell entropies can be precomputed using historical data. Since any worker located inside can perform task , is likely to be covered in the future as long as one grid cell inside is “popular” among workers. Figure 3 illustrates the pre-computation of the entropy of task . When a task arrives, we first identify the grid cell that encloses the task location, i.e., the white cell in the center, and slightly adjust the task region (solid circle) to be centered at the white cell (dashed circle). We approximate the task entropy by the entropy of the dashed circle, which can be precomputed. This is because the dashed circle is solely determined by the white cell and radius . To further speed up the precomputation of all possible combinations of the cell and the radius, we approximate the dashed circle by a set of cells whose centers are within the circle. With the entropy of every task covered by worker , his priority can be calculated as follows

(5)

Note that the constant is needed to avoid division by zero. Consequently, the Spatial heuristic greedily selects the worker with maximum priority at each stage. Line 10 in Algorithm 1 can be modified to reflect the spatial priority of each worker.

Figure 3: Approximation of Task Entropy.

5.2 Dynamic Budget

The second problem variant we study is more general, where a budget constraint is given for the entire campaign. This relaxation often results in higher task coverage. For example, in Figure 2, if budget is given at every time period, we select and and obtain the coverage of 5. However, the dynamic-budget variant yields higher coverage of 6 by selecting and at time . Below we study the problem complexity in the offline scenario and propose adaptive budget allocation strategies for the online scenario.

The challenge of the online problem is to allocate the overall budget over the time periods optimally, despite the dynamic arrivals of workers and tasks. Below we introduce several budget allocation strategies. Once a budget is allocated to a particular time period, we can adopt previously proposed heuristics, i.e., Basic, Spatial, Temporal, to select the best worker.

The simplest strategy, namely Equal, equally divides to time periods; each time period has budget and the last time period obtains the remainder. However, Equal may over-allocate budget to the time periods with small numbers of tasks. Another strategy is to allocate a budget to each time period proportional to the number of available tasks at that time period, i.e., , where is the total number of tasks. However, is not known a priori. Furthermore, we may still over-allocate budget to any time period with large , if none of the tasks can be covered by any workers (or all the tasks can be covered by 1 worker). We cannot allocate budget optimally without looking at the coverage instance set at each time period.
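The Equal strategy can be sketched as below, assuming that each time period receives the integer quotient of the total budget and the number of periods, with the last period also receiving the remainder. The name `equal_budget` and the integer-division interpretation are assumptions for illustration.

```python
from typing import List


def equal_budget(total_budget: int, num_periods: int) -> List[int]:
    """Split total_budget evenly; the last period also gets the remainder."""
    if num_periods <= 0:
        raise ValueError("num_periods must be positive")
    base = total_budget // num_periods
    allocation = [base] * num_periods
    allocation[-1] += total_budget - base * num_periods
    return allocation


if __name__ == "__main__":
    print(equal_budget(10, 4))
```

The allocation always sums to the total budget, but it ignores how many tasks and workers actually arrive in each period, which is the weakness discussed above.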

5.2.1 Adaptive Budget Allocation

To maximize task assignment, we need to adaptively allocate the overall budget and consider the “return” of selecting every worker, i.e., the worker priority, given the dynamic coverage instance set at every time period. We define the following two notions. Delta budget, denoted as , captures the current status of budget utilization, compared to a baseline budget strategy , e.g., the Equal strategy . Given a certain baseline , is the difference between the cumulative baseline budget and the actual budget spent up to time period . Formally, at any time period ,

(6)

A positive indicates budget is under-utilized, and vice versa. Another notion is delta gain, denoted as , which represents the return of a worker currently being considered () compared to the ones selected in the past (). Formally,

(7)

where is the gain of the current worker, calculated by any previously proposed local heuristic, i.e., as . is the average gain of previously added workers, i.e., . A positive indicates the current worker has higher priority than the historical average, and vice versa.
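The two notions can be illustrated with the short sketch below. It assumes that the cumulative baseline includes the current time period and that delta gain is the plain difference between the current gain and the historical average; Equations 6 and 7 may normalize these quantities differently. The names `delta_budget` and `delta_gain` are hypothetical.

```python
from typing import Sequence


def delta_budget(baseline: Sequence[int], spent: int, t: int) -> int:
    """Cumulative baseline budget up to period t (0-indexed, inclusive)
    minus the budget actually spent so far. Positive means under-utilized."""
    return sum(baseline[: t + 1]) - spent


def delta_gain(current_gain: float, past_gains: Sequence[float]) -> float:
    """Gain of the candidate worker minus the average gain of workers
    selected earlier. With no history the average is taken to be zero."""
    average = sum(past_gains) / len(past_gains) if past_gains else 0.0
    return current_gain - average


if __name__ == "__main__":
    print(delta_budget([3, 3, 4], spent=4, t=1))
    print(delta_gain(5, [2, 4]))
```

In the example, 6 units were planned through the second period and 4 were spent, so the budget is under-utilized by 2; the candidate's gain of 5 exceeds the historical average of 3.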

Based on the contextual information and at each stage of worker selection, we examine all available workers at the current time period and decide whether to allocate budget to selecting any worker. Intuitively, when both and are positive, i.e., the budget is under-utilized and a worker has higher priority, the selection of the considered worker is favored. When both are negative, it may not be worthwhile to spend the budget. The other cases, when one is positive and the other is negative, are more complex, as we would like to spend budget on workers with higher priority but also need to save budget for future time periods in case better worker candidates arrive.

Our solution to the sequential decision problem is inspired by the well-known multi-armed bandit problem (MAB), which has been widely studied and applied to decisions in clinical trials, online news recommendation, and portfolio design. The ε-greedy strategy, which achieves a trade-off between exploitation and exploration, often proves hard to beat by other MAB algorithms [Vermorel and Mohri (2005)]. Hence, we propose an adaptive budget allocation strategy based on the contextual ε-greedy algorithm [Li et al. (2010)]. We illustrate our solution in Figure 4.

Figure 4: Adaptive budget allocation based on contextual ε-greedy.

At each stage of the local heuristic, a binary decision must be made: whether to allocate budget to activate the current worker with the highest priority. The contextual ε-greedy algorithm allows us to specify an exploration-exploitation ratio, i.e., ε, based on the worker’s context, i.e., and . As depicted in Figure 4, an ε-greedy algorithm is used to determine whether to select the current worker based on his and . For each case, a YES decision is made with probability and a NO decision with probability. By default, we set and to reflect NO and YES decisions, respectively, as discussed before. When and have different signs, the decision is not as straightforward as the other cases and thus we set to allow YES and NO decisions with equal probabilities. The pseudo code of our adaptive algorithm is depicted in Algorithm 2.

1:  Input: , , total budgets
2:  Output: selected workers
3:  Init ; used budget ; average gain
4:  Budget allocation with Equal strategy
5:  For each time period
6:       Perform Lines 4-8 from Algorithm 1
7:       Remained budget
8:       If , then {the last time period}
9:       Otherwise,
10:       While and is not empty:
11:          Select in with highest
12:          Delta gain
13:          If and and , then break
14:          If and and , then break
15:          If and and , then break
16:          If and and , then break
17:         
18:          Perform Line 11 from Algorithm 1
19:       {update the budget}
20:      
21:       Perform Lines 12,13 from Algorithm 1
Algorithm 2 Adaptive Budget Algorithm (Adapt)
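The decision made at each stage of Algorithm 2 can be sketched as a contextual ε-greedy rule. The sketch below is a hedged illustration only: the function name `should_activate`, the parameter `eps_agree` and its default value of 0.05 are hypothetical and are not the settings used in the paper. It encodes the three situations described above: both deltas positive (mostly YES), both negative (mostly NO), and mixed signs (YES and NO equally likely).

```python
import random


def should_activate(d_budget: float, d_gain: float,
                    rng: random.Random, eps_agree: float = 0.05) -> bool:
    """Return True (YES) if budget should be spent on the candidate worker.

    eps_agree is an assumed exploration rate for the two clear-cut cases.
    """
    if d_budget > 0 and d_gain > 0:
        p_yes = 1.0 - eps_agree   # under-utilized budget, good worker
    elif d_budget < 0 and d_gain < 0:
        p_yes = eps_agree         # over-spent budget, weak worker
    else:
        p_yes = 0.5               # mixed or zero signals: coin flip
    return rng.random() < p_yes


if __name__ == "__main__":
    rng = random.Random(42)
    yes_count = sum(should_activate(2, 1.5, rng) for _ in range(1000))
    print(f"YES decisions with both deltas positive: {yes_count} / 1000")
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; a NO decision corresponds to the `break` statements in Lines 13 to 16 of Algorithm 2.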

5.2.2 Historical Workload

Previously our solution is simplified by considering as the baseline budget strategy. Since human activity exhibits temporal patterns, understanding those patterns may help to guide budget allocation. Therefore, we propose to compute a baseline budget strategy with historical data that captures the expected worker/task patterns. The study in [Musthag and Ganesan (2013)] shows the time-of-day usage patterns of workers in mobile crowdsourcing applications. The activity peaks between 4 and 7 pm, when workers leave their day jobs. Similar patterns are observed in the Foursquare and Gowalla data sets in Figure 5. Figure 5(a) shows that the hourly count of check-ins presents three peaks, i.e., during lunch and the morning/afternoon commutes. In Figure 5(b), we can observe peak check-in activities during weekends.

(a) Foursquare, 16x24 hours
(b) Gowalla, 32x7 days
Figure 5: Daily and weekly human activity patterns.

With historical worker and task information, we can leverage the optimal budget allocation strategy in the recent past and use it as the baseline strategy in Equation 6. We propose to learn the budget allocation of previous time periods, namely workload, using the greedy algorithm for the offline problem. To guide future budget allocation decisions, the previous workload will be used as the baseline in Equation 6. We will empirically evaluate our proposed solutions in the experiment section.

6 Worker Overload

In this section, we present an enhancement to our solution in order to avoid repetitive activations of the same workers. The practical implication is that workers who are located in popular areas can be repeatedly selected by our heuristics. Overloading workers may result in undesirable consequences, such as tasks being rejected and workers feeling annoyed or too stressed to report. Several recent studies [Alfarrarjeh et al. (2015), Zhang et al. (2016), Liu et al. (2016a)] also discuss the issue of over-assigning tasks to workers. These studies minimize worker overloading by balancing the workload of the workers. For example, the objective is to find an assignment that minimizes the variance of the workload among workers, i.e., maximizes the so-called social fairness [Liu et al. (2016a)]. Another work [Alfarrarjeh et al. (2015)] also aims to assign a similar number of tasks to each worker. However, none of these studies considers task assignment and worker overloading as a multi-objective optimization problem.

Our idea is to minimize the phenomenon of overloading. This requires maintaining the number of times each worker has been activated up to time , . The counter is defined as:

(8)

where represents a decision to select the worker at time : if the worker is selected, otherwise . The brackets enclose a condition that includes the term to the sum iff is identified by the same .

We include minimization of worker overloading as another objective to coverage maximization. In the following, we formulate a multi-objective optimization (MOO) problem and propose solutions in both fixed-budget and dynamic-budget scenarios.

6.1 Fixed Budget

In the fixed-budget setting, we formally define the multi-objective optimization (MOO) problem for each time instance below:

(9a)
(9b)
(9c)

Equation 9a maximizes the coverage of the selected workers while Equation 9b minimizes the highest activation count across all workers present at the time. The constraint in Equation 9c ensures the number of selected workers does not exceed the budget at each time instance.

Rather than coming up with heuristics to sort the workers according to two objectives, we adopt a widely used approach, i.e., nondominated sorting genetic algorithm (NSGA) [Srinivas and Deb (1994)], to solve the MOO formulation for each time instance. Intuitively, nondominated sorting is to maintain stable nondominated fronts (i.e., subpopulations of good individuals) in a multi-dimensional space, where each dimension corresponds to an objective. A nondominated front, also referred to as Pareto optimal, is a solution where none of the objective functions can be improved in value without degrading other objective values. The advantage of genetic algorithms is that they simultaneously deal with a set of possible solutions i.e., population, which enables us to find several members of the Pareto optimal set in a single run of the algorithm. We outline our solution based on the NSGA algorithm in Algorithm 3.

1:  Population RandomInit,
2:  While
3:       Select nondominated fronts , ranked by Eqs. 9a and 9b
4:       Mutation Crossover
5:      
6:      
7:  Select best solution from , ranked by Eq. 10
Algorithm 3 NSGA Algorithm

The results of NSGA, at the end of while loop, include a set of nondominated fronts. Subsequently, in Line 7 we select the best individual solution based on a weighted sum of objective values:

(10)

In Equation 10, is a linear coefficient, , to specify the weight for each objective. The higher , the more important the objective of Equation 9a in comparison to that of Equation 9b. The minus sign indicates the minimization objective in Equation 9b. The two objective functions are normalized by the total number of tasks and the total number of time instances , respectively. In our experiments, we adopted the NSGA-II version [Deb et al. (2002)] for implementing Algorithm 3 and set .
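The final selection step can be illustrated with the sketch below, which filters candidate solutions down to the nondominated ones and then ranks them by a weighted sum. It does not implement NSGA or NSGA-II (no population, crossover or mutation); it only shows dominance checking and the scalarization step. The weighted form `alpha * coverage / num_tasks - (1 - alpha) * max_activations / num_periods` is an assumption consistent with the description above, and all names are hypothetical.

```python
from typing import List, Tuple

# A candidate is (covered_tasks, max_activation_count).
Candidate = Tuple[int, int]


def nondominated(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate beats on both objectives
    (coverage is maximized, maximum activation count is minimized)."""
    front = []
    for i, (c1, a1) in enumerate(cands):
        dominated = any(
            c2 >= c1 and a2 <= a1 and (c2 > c1 or a2 < a1)
            for j, (c2, a2) in enumerate(cands) if j != i
        )
        if not dominated:
            front.append((c1, a1))
    return front


def weighted_score(cand: Candidate, alpha: float,
                   num_tasks: int, num_periods: int) -> float:
    """Assumed weighted sum of the two normalized objectives."""
    covered, max_act = cand
    return alpha * covered / num_tasks - (1 - alpha) * max_act / num_periods


def pick_best(cands: List[Candidate], alpha: float,
              num_tasks: int, num_periods: int) -> Candidate:
    front = nondominated(cands)
    return max(front, key=lambda c: weighted_score(c, alpha, num_tasks, num_periods))


if __name__ == "__main__":
    cands = [(10, 3), (8, 1), (7, 2), (10, 4)]
    print(nondominated(cands))
    print(pick_best(cands, alpha=0.9, num_tasks=10, num_periods=4))
```

With a large weight on coverage the high-coverage candidate wins, while a small weight favors the candidate with fewer activations, which mirrors the role of the coefficient described above.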

6.2 Dynamic Budget

In the dynamic-budget setting, the multi-objective optimization (MOO) formulation is similar to Equations 9a and 9b, but the total budget is constrained over all time periods. Therefore, the constraint in Equation 9c is replaced by the following constraint:

(11)

In the online setting, we need to simultaneously consider the task coverage and the number of activations of the candidate worker, in order to optimize both objectives. As a result, we modify the adaptive strategy in Section 5.2.1 and define the gain of the current worker in Equation 7 to be a linear combination of the number of previous activations and his priority:

(12)

In Equation 12, and are respectively the priority of the worker calculated by any previously proposed local heuristic, and the number of times that worker was selected. The coefficient can be varied to balance the importance of the overloading and the priority.

7 Distance-based Task Utility

Thus far, our goal has been to maximize the number of assigned tasks, assuming that assigning a task to any worker within the task region is equivalent. However, in practice an assignment of a task to a nearby worker may yield higher utility than an assignment to a farther worker [Miao et al. (2016)]. Thus, in this section we aim to generalize the binary-utility model to a distance-based-utility variant, i.e., maximizing the utility of covered tasks. We assume the utility of worker ’s response to task is a function of the spatial distance between them: . And is a decreasing function of worker-task distance. Intuitively, the utility is highest when the worker is co-located with the task and decreases as the worker-task distance increases. The utility is zero if the distance is larger than the task radius. We consider three cases depicted in Figure 6: (i) Binary, where utility is either 1 or 0, (ii) Linear, where utility decreases linearly with the worker-task distance, and (iii) Zipf, where utility follows a Zipfian distribution with skewness parameter . The higher the value of , the faster utility drops.

Figure 6: Distance-based utility functions.
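The three utility models can be sketched as simple functions of the worker-task distance `d` and the task radius `r`. The exact functional forms are not reproduced here, so the ones below are assumptions chosen to match the qualitative description: the linear form `1 - d / r` and, in particular, the Zipf-like form `(1 + d) ** (-s)` are illustrative stand-ins, and all function names are hypothetical.

```python
def binary_utility(d: float, r: float) -> float:
    """1 inside the task region, 0 outside."""
    return 1.0 if d <= r else 0.0


def linear_utility(d: float, r: float) -> float:
    """Assumed linear decay from 1 at the task location to 0 at the radius."""
    return 1.0 - d / r if d <= r else 0.0


def zipf_utility(d: float, r: float, s: float) -> float:
    """Assumed Zipf-like decay with skewness s; larger s drops faster."""
    return (1.0 + d) ** (-s) if d <= r else 0.0


if __name__ == "__main__":
    for d in (0.0, 1.0, 2.0, 3.0):
        print(d, binary_utility(d, 2.0), linear_utility(d, 2.0),
              zipf_utility(d, 2.0, s=1.0))
```

All three functions return zero beyond the task radius and reach their maximum when the worker is co-located with the task, as required by the model above.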

This extension can be incorporated into all the previously developed algorithms. Specifically, Algorithm 1 (Line 10) now chooses the worker that maximizes utility increase at each stage.

(13)

With the temporal heuristic, Equation 3 becomes:

(14)

In the same fashion, with the spatial heuristic, Equation 5 becomes:

(15)

With the adaptive budget strategies, the gain of a candidate worker is adapted similarly.

8 Performance Evaluation

8.1 Experimental Methodology

We adopted real-world datasets from location-based applications, summarized in Table 2, to emulate spatial crowdsourcing (SC) workers and tasks. We consider Gowalla and Foursquare users as SC workers and the venues as tasks. The Gowalla dataset contains check-ins for 224 days in 2010, including more than 100,000 spots (e.g., restaurants), within the state of California. By considering each day as a unit time period, all the users who checked in during a day are available workers for that time period in our setting. The Foursquare dataset contains the check-in history of 45,138 users to 89,968 venues over 384 hours in Pittsburgh, Pennsylvania. We considered each hour as a unit time period for this dataset.

Name       | #Tasks  | #Workers      | MTD    | Time period
Foursquare | 89,968  | 45,138 (90/)  | 16.6km | 1 hour
Gowalla    | 151,075 | 6,160 (35/)   | 3.6km  | 1 day
MTD: Mean Travel Distance [To et al. (2014)].
Table 2: Summaries of real-world datasets.

We generated a range of datasets with the SCAWG toolbox [To et al. (2016a)] by utilizing real-world worker/task spatial distributions and varying their arrival rate. We generated the worker count following COSINE (default) and POISSON distributions with mean = 50 and set the default value of the task count per time period to be constant, i.e., 1000. We denote by Go-POISSON a dataset that uses Gowalla for the spatial distribution and POISSON for the worker arrival rate.

In all of our experiments, for Gowalla dataset, we varied the total number of time periods {7, 14, 28, 56} and the budget {56, 112, 224, 448, 896, 1288}. For Foursquare, {24, 48, 72, 96} and {24, 48, 96, 192,…, 1536} because we modeled a time period as one hour. The task duration was randomly chosen from 1 to and the task radius {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} km. The choices of and values are defined by the CHRS experts. Default values are shown in boldface. Finally, experiments were run on an Intel(R) Core(TM)i7-2600 CPU @ 3.40 GHz with 8 GB of RAM.

8.2 Experimental Results

In the following we evaluate our solutions in terms of the number of covered tasks, i.e., task coverage. We first show the performance of offline solutions (Section 8.2.1). Then we present the results for the online scenario, including local heuristics, adaptive budget strategy, and workload heuristic (Section 8.2.2). We next show the results of distance-based utility and worker overloading (Section 8.2.3), followed by runtime measurements (Section 8.2.4).

8.2.1 Optimal Solutions for Offline Setting

We implemented the offline solutions for the two problem variants, fMTC and dMTC (Section 4), using integer linear programming. These algorithms provide optimal results, which are used as the upper bounds of the online algorithms. Figure 7(a) illustrates the results for Go-COSINE by varying the total budget . As expected, higher budget yields higher coverage. Also, DynamicOff yields higher coverage than FixedOff, as FixedOff is a special case of DynamicOff. However, the higher the budget, the smaller the performance gap between them. This effect can be explained by the diminishing return property of the max cover problem. That is, the more workers are selected, the smaller the gain of each selected worker. We also evaluated the offline solutions by varying task radius (Figure 7(b)). Intuitively, when increases, more workers are located within the task’s spatial range. This means that a worker is eligible to perform more tasks, which yields higher task coverage.

(a) Vary , Go-COSINE
(b) Vary , Go-COSINE
(c) Vary , Go-POISSON
(d) Vary , Go-POISSON
Figure 7: Performance of offline solutions with Go-COSINE and Go-POISSON.

Similar trends were observed for Go-POISSON as shown in Figures 7(c) and 7(d). We observe a small difference between FixedOff and DynamicOff for Go-POISSON in Figure 7(d). However, when the arrival rate has high variance, such as in Go-COSINE, DynamicOff shows more coverage over FixedOff in Figure 7(b). The reason is that FixedOff uses a constant budget to the time periods with high spikes while DynamicOff can allocate more budget to those time periods to cover more tasks.

8.2.2 Local Heuristics and Budget Allocations for Online Setting


The Performance of Heuristics: We evaluate the performance of the online heuristics for online fMTC from Section 5.1, i.e., Basic, Spatial and Temporal. Figure 8(a) shows the improvements of Spatial and Temporal over Basic on Go-COSINE. When the budget is high, we observe that the simple heuristic Basic already obtains results close to the optimal solution. This is because most workers are selected. Furthermore, Spatial and Temporal yield 2% and 4% higher coverage than Basic (at ) and their performance converges as increases. Similar trends can be observed when increasing the task radius in Figure 8(b).

(a) Vary , Go-COSINE
(b) Vary , Go-COSINE
Figure 8: Performance of local heuristics in the fixed-budget scenario.

Adaptive Budget Allocation Strategy: We evaluate the performance of the adaptive budget allocation strategy in Section 5.2.1 by comparing with three baseline strategies inspired by a few previous studies, namely, Equal [To et al. (2016b), Tran-Thanh et al. (2013)], Random [Tran-Thanh et al. (2013)] and Naive [Kazemi et al. (2013), Ji et al. (2016), Zhang et al. (2015)]. In the Equal budget strategy, the total budget is allocated equally to the time intervals. In the Random budget strategy, a random positive number is generated for each time interval and then each time interval is given a budget . In the Naive budget strategy, there is no particular limitation for the budget of each time interval; the budget is used until no more workers are available or the entire budget is exhausted. We use the local heuristic Basic to compare the performance of the budget strategies. Figure 9 shows the results of the budget strategies when varying the total budget . AdaptB is shown to be the best in coverage. EqualB and RandomB do not perform as well as AdaptB, as they lack an intelligent budget allocation strategy. NaiveB performs poorly as it selects the workers on a first-come-first-serve basis without considering future time intervals. As a result, the total budget is quickly exhausted during the first few time intervals. The difference between AdaptB and the others is higher with Go-COSINE because it has more fluctuation in the number of workers over time. This shows that AdaptB can quickly adapt to the dynamic arrivals of workers.

(a) Vary , Go-COSINE
(b) Vary , Go-POISSON
Figure 9: Performance of adaptive budget allocation strategies.

Historical Workload Improvement: We also evaluate the performance of the adaptive budget allocation strategy applied with local heuristics, in which Temporal is shown to perform better than Basic. Figure 10 shows the results of EqualB, AdaptB, AdaptT and AdaptTW (AdaptT with historical workload improvement) when varying total budget and task radius on the Go-COSINE and Go-POISSON datasets. We include DynamicOff as the optimal result for reference. As can be seen in that figure, with small budgets, AdaptTW, which uses the historical optimal workload as the baseline budget strategy, has higher coverage than the others and with , EqualB performs better than AdaptB and AdaptT. The reason is that, with small budgets, AdaptB and AdaptT do not have enough contextual information. With higher budget (K ), the adaptive algorithms perform better than EqualB and their results are close to the optimal result. We observe similar results when varying .

(a) Vary , Go-COSINE
(b) Vary , Go-COSINE
(c) Vary , Go-POISSON
(d) Vary , Go-POISSON
Figure 10: Performance of adaptive budget allocation in the dynamic-budget scenario.

We further study the performance of various budget allocation strategies by plotting the task coverage across multiple workloads using boxplots. Figure 11 shows the results of the techniques with the default parameter setting. As can be seen in the figure, the adaptive algorithms perform better than EqualB, especially with the Go-COSINE dataset. While AdaptTW has the highest median, minimum, and maximum values, AdaptT is the most stable method, with the smallest difference between the minimum and the maximum values.

(a) Go-COSINE
(b) Go-POISSON
Figure 11: Boxplots for various budget allocation strategies.

8.2.3 Distance-based Task Utility and Worker Overload


Distance-based Task Utility: We show the performance of AdaptT under distance-based functions for task utility from Section 7. In Figure 12(a), we observe that the task coverage obtained in the Linear and Zipfian cases is similar to the result in the binary-utility model. We also present the overlapping ratio of selected workers between distance-based utility and binary utility, as shown in Figure 12(b). The figure shows that when the total budget increases, the ratio initially decreases because more distinct workers can be selected, and then increases because, when the budget is large enough, most workers are selected in both cases, resulting in large overlaps.

(a) Task coverage (vary )
(b) Overlapping ratio (vary )
Figure 12: Performance of AdaptT on Go-COSINE with distance-based utility. The overlapping ratio indicates the percentages of workers that are selected in both binary and the corresponding distance-based utility model.

Worker Overload Minimization: In this section we evaluate the performance of the multi-objective optimization techniques in both fixed-budget and dynamic-budget scenarios. The techniques are evaluated in terms of balancing the trade-off between maximizing task coverage and minimizing worker overload. In the fixed-budget scenario (Section 6.1), EqualGA refers to the equal-budget strategy with NSGA in Algorithm 3. We observe that varying the coefficient does not significantly change task coverage (Figure 13(a)). This is due to the equal allocation of the total budget to each time period, which yields suboptimal task coverage. As the coefficient increases, the average number of activations per worker in Figure 13(b) shows a slightly decreasing trend, due to a higher weight on the second objective. In the dynamic-budget scenario, AdaptT-MOO refers to adaptive budget allocation with the temporal local heuristic and the multi-objective optimization in Section 6.2. Figure 13(c) shows that the task coverage is quite stable as the coefficient increases while the average number of activations decreases significantly. This means that our adaptive budget allocation strategy achieves workload balancing among the workers at a very small cost in utility. Furthermore, without loss of generality, based on the observations, we set for the following experiments.

Figure 14 shows the distribution of activation counts of selected workers when . In the fixed-budget setting, it is shown that EqualGA does not cover as many tasks as EqualB but it activates more workers for a small number of times, i.e., 1, 2, and 3 times. In the dynamic-budget setting, AdaptT-MOO also has more workers with a small number of activations and yields comparable task coverage, compared to AdaptT. We conclude that our solutions can mitigate worker overloading without compromising task assignment.

(a) Task coverage (vary )
(b) Average number of activations (vary )
(c) Task coverage (vary )
(d) Average number of activations (vary )
Figure 13: Performance of EqualGA and AdaptT-MOO when varying the coefficient.
(a) Fixed budget
(b) Dynamic budget
Figure 14: Worker activation count distribution of MOO-based algorithms in the fixed and dynamic budget scenarios ().

8.2.4 Runtime Measurements

Figure 15 shows the runtime performance of our online algorithms by varying the number of tasks per time period. As expected, the runtime linearly increases when the number of tasks grows. In the fixed-budget scenario, the runtimes of the local heuristics (e.g., Temporal) are the same as Basic while the runtime of EqualGA is higher due to having a large number of iterations for Algorithm 3. We do not show the runtime of Spatial heuristic and Zipfian utility model but their runtimes are similar to EqualB and EqualT-Linear, respectively. In the dynamic-budget scenario, the runtime of AdaptTW is higher than AdaptT, AdaptT-Linear, and AdaptT-MOO. This suggests that the workload heuristic significantly increases the overhead of AdaptT. However, the MOO extension does not incur observable runtime overhead in the dynamic-budget scenario.

(a) Fixed budget, Go-COSINE
(b) Dynamic budget, Go-COSINE
(c) Fixed budget, Fo-COSINE
(d) Dynamic budget, Fo-COSINE
Figure 15: Average runtime per time period with Go-COSINE and Fo-COSINE.

9 Discussion

Existing studies show that knowing the worker mobility pattern a priori can improve the efficiency of the task assignment [Ji et al. (2016), Zhang et al. (2015)]. Our solution does not consider the individual worker mobility pattern, i.e., the worker’s trajectory, for task assignment. However, our Spatial heuristic (Section 5.1.3) does consider the worker population mobility pattern by prioritizing tasks whose locations are not likely to be visited by many workers in the future. Furthermore, our dynamic budget algorithm (Algorithm 2) takes into account the dynamic arrivals of workers and tasks as well as their co-location relationship.

It is worth noting that in our problem settings 1) task assignment is real-time and online and 2) workers are not required to travel to perform tasks. Workers can respond to a task immediately after receiving the task notifications from the SC-server. Therefore, they do not need to perform a sequence of tasks as in typical mobile crowdsourcing, where workers often chain multiple tasks to maximize their earnings while minimizing travel time [Ji et al. (2016)]. In addition, workers’ trajectories within the same task region would not have much impact in our problem setting, as the workers are not required to travel to perform the task. As the workers move, they may become relevant to another spatial task and/or irrelevant to the prior task, which can be represented as the addition and deletion of a worker in our framework at a given snapshot.

10 Conclusion

Motivated by weather crowdsourcing applications, we introduced the problem of Hyperlocal Spatial Crowdsourcing, where tasks can be performed by workers within their spatiotemporal vicinity. We studied task assignment in Hyperlocal SC to maximize the number of covered tasks without exceeding the budget for activating workers. A range of problem variants was considered, including offline vs. online, budget constraint for each time period vs. for the entire campaign, single objective vs. multiple objectives, and binary vs. distance-based utility. We showed that the offline variants are NP-hard and proposed several local heuristics and a dynamic budget allocation strategy for the online scenario, which utilize the spatial and temporal properties of workers and tasks. We generated spatial crowdsourcing workloads with the SCAWG tool and conducted extensive experiments. We concluded that AdaptT, which combines the temporal local heuristic with dynamic budget allocation, is the superior technique in terms of utility and runtime. The extensions to measure distance-based utility and to minimize worker overloading were shown to be very effective without imposing significant runtime overhead. As future work, we will consider non-uniform activation costs of workers, which represent the reputation or the compensation demand of each worker. We will also consider assigning a task to multiple workers to improve the quality of collected data and utilizing known worker mobility patterns to improve task assignment.

11 Acknowledgments

We would like to thank CHRS researchers, especially Dr. Phu Dinh Nguyen, for leading the development of the iRain project: http://irain.eng.uci.edu/.

This research has been funded by NSF grants IIS-1320149, CNS-1461963, the USC Integrated Media Systems Center, and the University at Albany. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any of the sponsors such as NSF.

References

  • iRa (2016) 2016. iRain: new mobile App to promote citizen-science and support water management. (2016). http://en.unesco.org/news/irain-new-mobile-app-promote-citizen-science-and-support-water-management
  • Alfarrarjeh et al. (2015) Abdullah Alfarrarjeh, Tobias Emrich, and Cyrus Shahabi. 2015. Scalable Spatial Crowdsourcing: A study of distributed algorithms. In Mobile Data Management (MDM), 2015 16th IEEE International Conference on, Vol. 1. IEEE, 134–144.
  • Chekuri and Kumar (2004) Chandra Chekuri and Amit Kumar. 2004. Maximum coverage problem with group budget constraints and applications. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Springer, 72–83.
  • Cheng et al. (2016) Peng Cheng, Xiang Lian, Lei Chen, Jinsong Han, and Jizhong Zhao. 2016. Task assignment on multi-skill oriented spatial crowdsourcing. IEEE Trans. Knowl. Data Eng. 28, 8 (2016), 2201–2215.
  • Cheng et al. (2015) Peng Cheng, Xiang Lian, Zhao Chen, Rui Fu, Lei Chen, Jinsong Han, and Jizhong Zhao. 2015. Reliable diversity-based spatial crowdsourcing by moving workers. Proc. VLDB Endow. 8, 10 (2015), 1022–1033.
  • Cranshaw et al. (2010) Justin Cranshaw, Eran Toch, Jason Hong, Aniket Kittur, and Norman Sadeh. 2010. Bridging the gap between physical location and online social networks. In Proceedings of the 12th ACM international conference on Ubiquitous computing. ACM.
  • Dang et al. (2013) Hung Dang, Tuan Nguyen, and Hien To. 2013. Maximum Complex Task Assignment: Towards Tasks Correlation in Spatial Crowdsourcing. In Proceedings of International Conference on Information Integration and Web-based Applications & Services. ACM, 77.
  • Deb et al. (2002) Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. Evolutionary Computation, IEEE Transactions on 6, 2 (2002), 182–197.
  • Deng et al. (2016) Dingxiong Deng, Cyrus Shahabi, Ugur Demiryurek, and Linhong Zhu. 2016. Task selection in spatial crowdsourcing from worker’s perspective. Geoinformatica 20, 3 (jul 2016), 529–568.
  • Dorminey (2014) Bruce Dorminey. 2014. Crowdsourcing The Weather. (February 2014). http://www.forbes.com/sites/brucedorminey/2014/02/26/crowdsourcing-as-the-future-of-weather-forecasting/ [Accessed Jan. 2016].
  • Feige (1998) Uriel Feige. 1998. A threshold of ln n for approximating set cover. Journal of the ACM (JACM) 45, 4 (1998), 634–652.
  • Gao et al. (2017) Dawei Gao, Yongxin Tong, Jieying She, Tianshu Song, Lei Chen, and Ke Xu. 2017. Top-k Team Recommendation and Its Variants in Spatial Crowdsourcing. Data Science and Engineering (2017), 1–15.
  • Gao et al. (2015) Hui Gao, Chi Harold Liu, Wendong Wang, Jianxin Zhao, Zheng Song, Xin Su, Jon Crowcroft, and Kin K Leung. 2015. A survey of incentive mechanisms for participatory sensing. IEEE Communications Surveys & Tutorials 17, 2 (2015), 918–943.
  • Guo et al. (2016) Bin Guo, Yan Liu, Wenle Wu, Zhiwen Yu, and Qi Han. 2016. ActiveCrowd: A Framework for Optimized Multitask Allocation in Mobile Crowdsensing Systems. IEEE Transactions on Human-Machine Systems (2016).
  • He et al. (2014) Shibo He, Dong-Hoon Shin, Junshan Zhang, and Jiming Chen. 2014. Toward optimal allocation of location dependent tasks in crowdsensing. In INFOCOM, 2014 Proceedings IEEE. IEEE, 745–753.
  • Hochbaum (1996) Dorit S Hochbaum. 1996. Approximating covering and packing problems: set cover, vertex cover, independent set, and related problems. In Approximation algorithms for NP-hard problems.
  • Hu et al. (2016) Huiqi Hu, Yudian Zheng, Zhifeng Bao, Guoliang Li, Jianhua Feng, and Reynold Cheng. 2016. Crowdsourced POI labelling: Location-aware result inference and Task Assignment. In 2016 IEEE 32nd Int. Conf. Data Eng. ICDE 2016. IEEE, 61–72.
  • Ji et al. (2016) Shenggong Ji, Yu Zheng, and Tianrui Li. 2016. Urban sensing based on human mobility. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 1040–1051.
  • Jin et al. (2016) Haiming Jin, Lu Su, Bolin Ding, Klara Nahrstedt, and Nikita Borisov. 2016. Enabling privacy-preserving incentives for mobile crowd sensing systems. In Distributed Computing Systems (ICDCS), 2016 IEEE 36th International Conference on. IEEE, 344–353.
  • Kandappu et al. (2016) Thivya Kandappu, Archan Misra, Shih-fen Cheng, Nikita Jaiman, and Randy Tandriansiyah. 2016. Campus-Scale Mobile Crowd-Tasking : Deployment & Behavioral Insights. In Proc. 19th ACM Conf. Comput. Coop. Work & Soc. Comput. ACM Press, New York, New York, USA, 798–810.
  • Kazemi and Shahabi (2012) Leyla Kazemi and Cyrus Shahabi. 2012. GeoCrowd: enabling query answering with spatial crowdsourcing. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems.
  • Kazemi et al. (2013) Leyla Kazemi, Cyrus Shahabi, and Lei Chen. 2013. GeoTruCrowd: trustworthy query answering with spatial crowdsourcing. In The 21st ACM SIGSPATIAL GIS 2013.
  • Li et al. (2010) Lihong Li, Wei Chu, John Langford, and Robert E Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web. ACM, 661–670.
  • Li et al. (2015) Yu Li, Man Lung Yiu, and Wenjian Xu. 2015. Oriented online route recommendation for spatial crowdsourcing task workers. In Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), Vol. 9239. Springer International Publishing, 137–156.
  • Liu et al. (2016a) Qing Liu, Talel Abdessalem, Huayu Wu, Zihong Yuan, and Stéphane Bressan. 2016a. Cost Minimization and Social Fairness for Spatial Crowdsourcing Tasks. In International Conference on Database Systems for Advanced Applications. Springer, 3–17.
  • Liu et al. (2016b) Yan Liu, Bin Guo, Yang Wang, Wenle Wu, Zhiwen Yu, and Daqing Zhang. 2016b. TaskMe: Multi-Task Allocation in Mobile Crowd Sensing. In Proc. 2016 ACM Int. Jt. Conf. Pervasive Ubiquitous Comput. - UbiComp ’16. ACM Press, New York, New York, USA, 403–414.
  • Miao et al. (2016) Chunyan Miao, Han Yu, Zhiqi Shen, and Cyril Leung. 2016. Balancing quality and budget considerations in mobile crowdsourcing. Decision Support Systems 90 (2016), 56–64.
  • Musthag and Ganesan (2013) Mohamed Musthag and Deepak Ganesan. 2013. Labor dynamics in a mobile micro-task market. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 641–650.
  • Pournajaf et al. (2014) Layla Pournajaf, Li Xiong, Vaidy Sunderam, and Slawomir Goryczka. 2014. Spatial task assignment for crowd sensing with cloaked locations. In Proc. - IEEE Int. Conf. Mob. Data Manag., Vol. 1. IEEE, 73–82.
  • Sales Fonteles et al. (2016) André Sales Fonteles, Sylvain Bouveret, and Jérôme Gensel. 2016. Trajectory recommendation for task accomplishment in crowdsourcing – a model to favour different actors. J. Locat. Based Serv. 10, 2 (apr 2016), 125–141.
  • Song et al. (2014) Zheng Song, Chi Harold Liu, Jie Wu, Jian Ma, and Wendong Wang. 2014. QoI-Aware Multitask-Oriented Dynamic Participant Selection With Budget Constraints. Vehicular Technology, IEEE Transactions on 63, 9 (2014), 4618–4632.
  • Srinivas and Deb (1994) Nidamarthi Srinivas and Kalyanmoy Deb. 1994. Muiltiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary computation 2, 3 (1994), 221–248.
  • To et al. (2016a) Hien To, Mohammad Asghari, Dingxiong Deng, and Cyrus Shahabi. 2016a. SCAWG: A toolbox for generating synthetic workload for spatial crowdsourcing. In 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops). IEEE, 1–6.
  • To et al. (2016b) Hien To, Liyue Fan, Luan Tran, and Cyrus Shahabi. 2016b. Real-time task assignment in hyperlocal spatial crowdsourcing under budget constraints. In Pervasive Computing and Communications (PerCom), 2016 IEEE International Conference on. IEEE, 1–8.
  • To et al. (2017) Hien To, Gabriel Ghinita, Liyue Fan, and Cyrus Shahabi. 2017. Differentially private location protection for worker datasets in spatial crowdsourcing. IEEE Transactions on Mobile Computing 16, 4 (2017), 934–949.
  • To et al. (2014) Hien To, Gabriel Ghinita, and Cyrus Shahabi. 2014. A framework for protecting worker location privacy in spatial crowdsourcing. Proceedings of the VLDB Endowment 7, 10 (2014), 919–930.
  • To et al. (2015) Hien To, Cyrus Shahabi, and Leyla Kazemi. 2015. A server-assigned spatial crowdsourcing framework. ACM Transactions on Spatial Algorithms and Systems 1, 1 (2015), 2.
  • Tong et al. (2016) Yongxin Tong, Jieying She, Bolin Ding, Libin Wang, and Lei Chen. 2016. Online mobile Micro-Task Allocation in spatial crowdsourcing. In 2016 IEEE 32nd Int. Conf. Data Eng. ICDE 2016. IEEE, 49–60.
  • Tran-Thanh et al. (2013) Long Tran-Thanh, Matteo Venanzi, Alex Rogers, and Nicholas R Jennings. 2013. Efficient budget allocation with accuracy guarantees for crowdsourcing classification tasks. In Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems. 901–908.
  • ul Hassan and Curry (2014) Umair ul Hassan and Edward Curry. 2014. A multi-armed bandit approach to online spatial task assignment. In 11th IEEE International Conference on Ubiquitous Intelligence and Computing UIC.
  • Venanzi et al. (2013) Matteo Venanzi, Alex Rogers, and Nicholas R Jennings. 2013. Crowdsourcing spatial phenomena using trust-based heteroskedastic gaussian processes. In First AAAI Conference on Human Computation and Crowdsourcing.
  • Vermorel and Mohri (2005) Joannes Vermorel and Mehryar Mohri. 2005. Multi-armed bandit algorithms and empirical evaluation. In Machine Learning: ECML 2005. Springer, 437–448.
  • Wang et al. (2017) Leye Wang, Dingqi Yang, Xiao Han, Tianben Wang, Daqing Zhang, and Xiaojuan Ma. 2017. Location Privacy-Preserving Task Allocation for Mobile Crowdsensing with Differential Geo-Obfuscation. (2017).
  • Xiao et al. (2015) Mingjun Xiao, Jie Wu, Liusheng Huang, Yunsheng Wang, and Cong Liu. 2015. Multi-task assignment for crowdsensing in mobile social networks. In Computer Communications (INFOCOM), 2015 IEEE Conference on. IEEE, 2227–2235.
  • Zhang et al. (2015) Bo Zhang, Zheng Song, Chi Harold Liu, Jian Ma, and Wendong Wang. 2015. An event-driven qoi-aware participatory sensing framework with energy and budget constraints. ACM Transactions on Intelligent Systems and Technology (TIST) 6, 3 (2015), 42.
  • Zhang et al. (2014) Daqing Zhang, Haoyi Xiong, Leye Wang, and Guanling Chen. 2014. CrowdRecruiter: selecting participants for piggyback crowdsensing under probabilistic coverage constraint. In ACM UbiComp 2014. ACM, 703–714.
  • Zhang et al. (2016) Hongli Zhang, Zhikai Xu, Xiaojiang Du, Zhigang Zhou, and Jiantao Shi. 2016. CAPR: context-aware participant recruitment mechanism in mobile crowdsourcing. Wireless Communications and Mobile Computing 16, 15 (2016), 2179–2193.
  • Zhang et al. (2017) Xinglin Zhang, Zheng Yang, Yunhao Liu, Jianqiang Li, and Zhong Ming. 2017. Toward Efficient Mechanisms for Mobile Crowdsensing. IEEE Transactions on Vehicular Technology 66, 2 (2017), 1760–1771.
  • Zheng et al. (2014) Yu Zheng, Licia Capra, Ouri Wolfson, and Hai Yang. 2014. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 5, 3 (2014), 38.