Collaborative P2P Streaming of
Interactive Live Free Viewpoint Video
Abstract
We study an interactive live streaming scenario where multiple peers pull streams of the same free viewpoint video that are synchronized in time but not necessarily in view. In free viewpoint video, each user can periodically select a virtual view between two anchor camera views for display. The virtual view is synthesized using texture and depth videos of the anchor views via depth-image-based rendering (DIBR). In general, the distortion of the virtual view increases with the distance to the anchor views, and hence it is beneficial for a peer to select the closest anchor views for synthesis. On the other hand, if peers interested in different virtual views are willing to tolerate larger distortion in using more distant anchor views, they can collectively share the access cost of common anchor views.
Given anchor view access cost and synthesized distortion of virtual views between anchor views, we study the optimization of anchor view allocation for collaborative peers. We first show that, if the network reconfiguration costs due to view-switching are negligible, the problem can be optimally and efficiently solved in polynomial time using dynamic programming. We then consider the case of non-negligible reconfiguration costs (e.g., large or frequent view-switching leading to anchor-view changes). In this case, the view allocation problem becomes NP-hard. We thus present a locally optimal and centralized allocation algorithm inspired by Lloyd’s algorithm in non-uniform scalar quantization. We also propose a distributed algorithm with guaranteed convergence where each peer group independently makes merge-and-split decisions with a well-defined fairness criterion. The results show that depending on the problem settings, our proposed algorithms achieve respective optimal and close-to-optimal performance in terms of total cost, and substantially outperform a P2P scheme without collaborative anchor selection.
I Introduction
The advent of multiview imaging technologies means that videos from different viewpoints of the same 3D scene can now be captured simultaneously by a system of multiple closely spaced cameras [1]. If depth maps (per-pixel distance between camera and physical objects) from the same camera viewpoints are also available (depth maps can be captured directly through time-of-flight (ToF) cameras [2], or indirectly through stereo-matching algorithms), then virtual views can be synthesized during video playback using texture and depth maps of the closest captured camera views (i.e., anchor views) via depth-image-based rendering (DIBR) [3]. This ability to construct and observe any virtual view is called free viewpoint video [4], which enables a 3D visual effect known as motion parallax [5]: a viewer’s detected head movements trigger correspondingly shifted video views on his/her 2D display. It is well known that motion parallax is the strongest cue in human perception of depth in a 3D scene [6], enhancing the immersive experience.
In a live free viewpoint video streaming scenario, texture and depth videos from multiple viewpoints in the same 3D scene are encoded in real time into separate streams at the server before delivery to interested peers. The clients, organized in a P2P system, can choose to look at the recorded anchor views or virtual views that are arbitrarily positioned between the anchor views. Because the distortion of a synthesized view tends to be larger as the virtual view distance to the anchor views increases [7], it is beneficial for a viewer to request anchor views that tightly “sandwich” the virtual viewpoint he wants to look at. On the other hand, given that a group of local peers can share the access cost of common anchor views, peers have an incentive to collaboratively select and share the same anchor views, even if doing so means that the anchor views are further away, with a distortion penalty in the synthesized views. In this paper, we investigate the anchor view allocation problem for collaborative streaming of live free viewpoint video under different network settings. To the best of our knowledge, this is the first piece of work addressing such an issue for collaborative streaming of free viewpoint video.
As a peer changes his view of interest over time, may eventually move outside the viewing range delimited by his two current anchor views and . This requires the system to allocate new anchor views to the peer. If such network reconfiguration costs due to peers’ view-switching are negligible, we first show that the anchor view allocation problem can be efficiently and optimally solved in polynomial time using dynamic programming (DP). This holds whether the anchor view access cost from the server to the group of peers is formulated as a constraint (i.e., the maximum number of anchor views allocated to a peer group cannot be larger than a certain number ) or as a cost function (i.e., each anchor view pulled from the source incurs a certain access cost ).
Conversely, if the network reconfiguration cost due to peers’ view-switching is non-negligible (e.g., in the case of large or frequent view-switching by the peers), the problem of anchor view allocation becomes NP-hard for both formulations of anchor view access cost (as a constraint or as a cost function). We thus present a locally optimal and centralized allocation algorithm inspired by Lloyd’s algorithm in non-uniform scalar quantization [8]. Finally, we propose a distributed version of the algorithm with guaranteed convergence, where each peer group can independently make merge-and-split decisions with a well-defined fairness criterion. The results show that our proposed algorithms achieve optimal and close-to-optimal performance respectively in terms of total cost, and substantially outperform a P2P scheme without collaborative anchor selection.
The outline of the paper is as follows. We first discuss related work in Section II. We then overview live free viewpoint video streaming in Section III. In Section IV, we formulate the anchor view allocation problem with negligible network reconfiguration cost and present the corresponding optimal DP algorithm. We then formulate our problem with reconfiguration cost in Section V and show that it is NP-hard. We describe locally optimal solutions to the problem in Section VI. Finally, we present results and conclusions in Sections VII and VIII, respectively.
II Related Work
Though much research in multiview video has focused on compression (e.g., multiview video coding (MVC) [9]), streaming strategies and network optimization for multiview video are still a relatively unexplored research topic. [10] discusses an interactive multiview video streaming (IMVS) video-on-demand scenario, where only a single requested view per client is needed at one time during video playback as the client periodically requests view-switches. It proposes an efficient coding structure where a captured image can be encoded into multiple versions, so that the appropriate version can be transmitted depending on the content currently available in the decoder’s buffer, in order to reduce the server transmission rate. Later, [11] leverages the IMVS coding structure for content replication, so that suitable versions of multiview video segments can be cached in a distributed manner across cooperative network servers.
Our current work on anchor view allocation differs from the above work in that: i) we consider the more general free viewpoint video, where a client can select and synthesize any intermediate virtual view between two anchor views via DIBR; and ii) we focus on the live collaborative streaming scenario, where anchor views can be shared among peers that are synchronized in time but not necessarily in view.
There has been a large body of work on peer-to-peer (P2P) streaming, addressing different aspects of the problem. For example, [12, 13] study the structure and organization of streaming overlays, while [14, 15] discuss the design and deployment of large-scale P2P streaming systems through measurement on real-world streaming systems. All of the works above study single-view streaming, and the results cannot be applied to live free viewpoint video streaming, where anchor-view selection is a critical and challenging issue.
There has been little work studying multiview streaming over P2P networks. For example, [16] proposes a scheduling algorithm that allows peers to frequently compute the scheduling of multiview segments. [17] studies achieving low view-switch delay by organizing viewers with different views together. These works essentially treat multiview video as streaming of multiple single-view videos, and it is not clear how to extend them to live free viewpoint streaming where anchor-view selection and its effect on distortion need to be considered. To the best of our knowledge, this is the first piece of work on collaborative streaming of interactive live free viewpoint video.
III Collaborative Streaming Model
III-A Network Model
We model the free viewpoint video distribution network with two nodes: is the server node where live video streams originate, and is a single node representing a group of local peers with close geographical or network distance. (Footnote 2: If the peer group is too large, subdivision into smaller groups for independent content sharing is also possible. Our current formulation can be easily extended to this case.) The connection between server and peer group may be modeled as a hard constraint; i.e., the number of anchor views pulled from by cannot exceed . Alternatively, the connection may be modeled as a soft constraint; i.e., each anchor view pulled by incurs a cost in the total cost function. The different connection constraints are used later in the problem formulation.
III-B Free Viewpoint Video Model
Let be a discrete set of captured viewpoints for equally spaced cameras in a 1D array as done in [1] and others. Each camera captures both a texture map (RGB image) and a depth map (per-pixel physical distances between captured objects in the 3D scene and the capturing camera) at the same resolution. The texture map of an intermediate virtual viewpoint between any two cameras can be synthesized using texture and depth maps of the two camera views (anchor views) via a depth-image-based rendering (DIBR) technique like 3D warping [3]. Disoccluded pixels in the synthesized view (pixel locations that are occluded in the two anchor views) can be filled using a depth-based inpainting technique like [18].
More specifically, denote a virtual viewpoint by that a peer currently requests for observation. We write as , , for some large . (Footnote 3: Though we consider here equally spaced virtual views for ease of exposition, our analysis and algorithms can be easily generalized to uneven virtual view spacing as well.) In other words, belongs to a discrete set of intermediate viewpoints between (and including) captured views and , spaced apart by integer multiples of distance ( approaches a continuum as increases). We consider that a distribution function describes the fraction of peers in the group who currently request virtual view . Any virtual view can be synthesized using left and right anchor views denoted as and , respectively, where and . Note that and do not have to be the closest captured views to . The distortion of the synthesized view varies with the choices of anchor views. Let be the distortion function of peers requesting virtual view , which is synthesized using , as anchors.
III-B1 Monotonic Distortion Model
A reasonable assumption on distortion is monotonicity with respect to anchor view distance [7]. It is not guaranteed that distortion always increases with the distance between reference views, but this is true in the vast majority of settings. We hence consider a monotonic distortion model in this paper: a further-away anchor view does not lead to a smaller synthesized view distortion:
(1) 
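As an illustration only, the monotonicity assumption described in words above could be written as follows. The notation is assumed here and is not taken from the source: $d_u(v^l, v^r)$ stands for the distortion of virtual view $u$ synthesized from left anchor $v^l$ and right anchor $v^r$, and tildes mark an alternative anchor pair that is at least as far away on both sides.

```latex
d_u(v^l, v^r) \;\le\; d_u(\tilde{v}^l, \tilde{v}^r)
\qquad \text{whenever} \qquad
\tilde{v}^l \le v^l \le u \le v^r \le \tilde{v}^r .
```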
III-C View-switching Model
To model the view-switching behavior of peers, we consider that a peer with current desired virtual view can switch in the next time instant to any virtual views ’s with probability , and is the view-transition probability matrix. For example, if a peer stays in the current view with probability , and switches to any one of the two adjacent views with equal probability , we have the following transition probabilities:
(2) 
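A minimal sketch of such a view-transition matrix, assuming the stay-or-move-to-a-neighbor behavior described above. The function name, the `stay_prob` parameter and the boundary rule (a move that would leave the view array keeps the peer in place) are illustrative assumptions, not part of the original model.

```python
def view_transition_matrix(num_views, stay_prob):
    """Build a row-stochastic matrix P where P[u][w] is the probability
    of switching from virtual view u to virtual view w in one step."""
    move = (1.0 - stay_prob) / 2.0  # equal probability for each adjacent view
    P = [[0.0] * num_views for _ in range(num_views)]
    for u in range(num_views):
        P[u][u] = stay_prob
        for w in (u - 1, u + 1):
            if 0 <= w < num_views:
                P[u][w] = move
            else:
                # Assumed boundary rule: a move off the array keeps the peer in place.
                P[u][u] += move
    return P


P = view_transition_matrix(5, 0.5)
for row in P:
    print(row)
```

Each row sums to one, so the matrix can be used directly as the transition matrix of a Markov chain over virtual views.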
IV Formulation I: no reconfiguration cost
In this section, we consider the case where the reconfiguration cost due to peers’ anchor view changes is negligible, e.g., peers tend to switch views infrequently, and hence the distribution network does not need to be reconfigured often. We now formulate the anchor view allocation problem formally as the interactive free-viewpoint live streaming (IFLS) problem.
IV-A Optimization and System Variables
We first define the optimization variables, which are the same for all our formulations of the problem. Let be a purchased set of captured views selected by the peer group to serve as anchor views to synthesize virtual views requested. A peer of virtual view selects left and right anchor views, and from the purchased set to synthesize its desired virtual view . We consider the following anchor view selection constraint:
(3) 
In words, Equation (3) states that peer of virtual view must select from the left anchor view to the left of (i.e., ) and right anchor view to the right of (i.e., ). The selected anchor views , and will induce synthesized distortion , as discussed in Section III-B. These are our variables to be optimized.
There is an access cost to purchase the set of anchor views by the peer group . If there is a hard connection constraint (or cost budget), we have
(4) 
One may alternatively consider a soft connection constraint, where the total access cost for the peer group is proportional to the number of anchor views purchased, i.e., . For now, we are only concerned with the access cost of camera views in the purchased set ; the question of how the cost should be fairly distributed to each peer is deferred to Section VI-C.
If the connection is modeled as a hard constraint, the objective of the IFLS problem is to select a subset and anchor views for each virtual view , so as to minimize the aggregate distortion of all peers of all virtual views ’s, i.e.,
(5) 
subject to Constraints (3) and (4). We label this combinatorial optimization problem as IFLS-H.
Alternatively, if the connection is modeled as a soft constraint, the objective becomes the combination of total distortion of all peers of all virtual views ’s plus the total access cost,
(6) 
subject to Constraint (3). We label this problem as IFLS-S.
IV-B Algorithm I: DP solution
Both IFLS-H and IFLS-S can be solved optimally in polynomial time via DP. We show here how IFLS-S is solved; the algorithm for IFLS-H follows similar steps in a straightforward manner, and hence is omitted.
Define as the minimum cost for all peers interested in virtual views , where and are the nearest left and right anchor views that have already been purchased. The optimal solution of IFLSS can be found by a call to , where and are the leftmost and rightmost virtual views requested by the peer group, and and are the corresponding camera views just to the left and right of them, i.e.,
(7) 
Given above, can be recursively calculated as
(8)  
where is the virtual view of a peer to the left and nearest to new anchor view (), and is the virtual view of a peer to the right and nearest to . The loop invariant of Equation (8) is .
In words, Equation (8) states that is the smaller of:

Sum of synthesized distortion of virtual views ’s, , given that no more anchor views will be purchased (and hence and are the best anchor views for synthesis of views ).

Cost of one more anchor view , , which is the access cost plus the recursive cost using two virtual-view ranges, given by and , that divide the original range .
The complexity of the solution given by Equation (8) can be analyzed as follows. Each time Equation (8) is solved for arguments , and , they can be stored in entry of a DP table so that any subsequent repeated subproblem can be simply looked up. Each computation of Equation (8) takes steps, and the size of the table is . This results in runtime complexity of .
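The recursion described above can be sketched as a memoized function. This is a simplified illustration under stated assumptions rather than the authors' implementation: cameras are assumed to sit at integer multiples of `spacing` on the virtual-view grid, the distortion function is assumed monotonic and zero when a view coincides with one of its anchors, and the memo table is indexed only by the pair of nearest purchased cameras (the virtual-view range is derived from that pair). All names (`solve_ifls_s`, `demand`, `toy_distortion`) are hypothetical.

```python
from functools import lru_cache


def solve_ifls_s(demand, spacing, access_cost, distortion):
    """demand maps a virtual-view position to its peer weight.
    Returns (total cost, sorted list of purchased camera indices)."""
    positions = sorted(demand)
    v_min = positions[0] // spacing       # camera at or just left of leftmost request
    v_max = -(-positions[-1] // spacing)  # camera at or just right of rightmost request

    @lru_cache(maxsize=None)
    def phi(vl, vr):
        # Views strictly between the two nearest purchased cameras vl and vr.
        inside = [u for u in positions if vl * spacing < u < vr * spacing]
        # Option 1: buy nothing more; all views inside use (vl, vr) as anchors.
        best = sum(demand[u] * distortion(u, vl * spacing, vr * spacing)
                   for u in inside)
        best_set = ()
        # Option 2: buy one more camera v and recurse on the two sub-ranges.
        for v in range(vl + 1, vr):
            left_cost, left_set = phi(vl, v)
            right_cost, right_set = phi(v, vr)
            cost = access_cost + left_cost + right_cost
            if cost < best:
                best, best_set = cost, left_set + (v,) + right_set
        return best, best_set

    inner_cost, inner = phi(v_min, v_max)
    boundary = {v_min, v_max}             # boundary cameras are always purchased
    return inner_cost + access_cost * len(boundary), sorted(boundary | set(inner))


def toy_distortion(u, left, right):
    # Assumed toy model: zero at an anchor, growing as either anchor moves away.
    return 0.1 * (u - left) * (right - u)


demand = {2: 1.0, 5: 1.0, 9: 1.0}
print(solve_ifls_s(demand, 4, 100.0, toy_distortion))
print(solve_ifls_s(demand, 4, 0.0, toy_distortion))
```

With a high access cost only the two boundary cameras are purchased; with a zero access cost every camera in the range is purchased, which matches the trade-off the formulation is meant to capture.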
V Formulation II: reconfiguration cost
As the video is played back, a peer may switch his observation viewpoint from a virtual view to a new view , where may fall outside the range spanned by the anchor views and . The network hence needs to be reconfigured to supply the peer with new anchors. If the reconfiguration cost is non-negligible, the peer group would tend to choose anchors and that are further apart, so that the likelihood of the virtual view switching outside the range is low. In this section, we formulate the anchor-view allocation problem with reconfiguration costs, termed free-viewpoint live streaming with view-switching (FLSV).
V-A Reconfiguration Cost
We define the reconfiguration cost as the probability that a peer requires new anchor views during the next view-switches, given the current virtual view and the anchor views and . may be computed as follows. We first define a submatrix that contains only entries ’s, where , defined in Equation (2). Note that unlike , the sum of the entries in a row in does not need to add up to . We can write as a simple sum:
(9) 
where is the entry in matrix , the step transition probability. In words, Equation (9) states that the reconfiguration cost is one minus the probability that the peer stays within the range for all view switches.
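The verbal description above (one minus the probability of staying inside the anchor range for all view switches) can be sketched as follows. This follows the description in words, not the exact form of Equation (9). The function name is hypothetical, the anchors are assumed to be given as virtual-view indices `left` and `right`, and the example matrix `P` is an assumed 5-view chain.

```python
def reconfiguration_cost(P, u, left, right, steps):
    """Probability that a peer starting at view u leaves [left, right]
    at least once during the next `steps` view switches."""
    # stay[w]: probability of being at view w without ever having left the range.
    stay = {w: 0.0 for w in range(left, right + 1)}
    stay[u] = 1.0
    for _ in range(steps):
        nxt = {w: 0.0 for w in stay}
        for w, p in stay.items():
            for w2 in nxt:
                # Only transitions that remain inside the range are kept,
                # which is the role of the submatrix in the text.
                nxt[w2] += p * P[w][w2]
        stay = nxt
    return 1.0 - sum(stay.values())


# Assumed example: stay with probability 0.5, move to each neighbor with 0.25.
P = [
    [0.75, 0.25, 0.0, 0.0, 0.0],
    [0.25, 0.5, 0.25, 0.0, 0.0],
    [0.0, 0.25, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.25, 0.5, 0.25],
    [0.0, 0.0, 0.0, 0.25, 0.75],
]
print(reconfiguration_cost(P, 2, 1, 3, 1))
print(reconfiguration_cost(P, 2, 1, 3, 2))
```

Widening the anchor range lowers this cost, which is why peers facing a non-negligible reconfiguration cost may prefer anchors that are further apart.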
V-B Objective Function
We first consider the server-peer cost as a hard constraint, and formulate the FLSV-H optimization problem. The objective is to select a subset of camera views and to select anchor views for each virtual view within , in order to minimize the total distortion of all peers plus a reconfiguration cost weighted by , i.e.,
(10) 
We next consider the connection as a soft constraint. The objective then becomes the sum of the distortion, reconfiguration cost, plus total access cost, i.e.,
(11) 
subject to Constraint (3). This problem is FLSV-S.
V-C NP-Hardness Proof
Both FLSV-H and FLSV-S are NP-hard. We present the proof for FLSV-H here; the proof for FLSV-S follows a similar argument and is discussed in the Appendix.
We show that the well-known NP-complete Minimum Cover (MC) problem is polynomial-time reducible to a special case of FLSV-H. In MC, a collection of subsets of a finite item set is given. The decision problem is: does contain a cover for of size at most , i.e., a subset where , such that every item in belongs to at least one subset of ?
Consider a special case of FLSV-H where in the optimal solution, all peers use the leftmost camera view 1 as their left anchor view. This is the case if the synthesized distortion for each peer of view is a local minimum whenever view 1 is used as left anchor, i.e., . Hence all peers will share view 1 as left anchor view, and need to select only the right anchor view to minimize the aggregate cost in Equation (10).
We first map items in set to consecutive virtual views ’s (each with ) just to the right of leftmost captured view . We map subsets in collection to captured views ’s to the right of the virtual views ’s. We next construct reconfiguration cost by assuming a view-switching probability in (1) and , resulting in a decreasing as a function of for all virtual views ’s, as shown in Figure 1.
We first set distortion for peers of virtual views ’s such that the aggregate cost is a constant , i.e., . Then for each item in subset , we reset distortion (of virtual view corresponding to item and of anchor view corresponding to set ) to distortion of anchor view . Note that the distortion function remains monotonically nondecreasing.
Figure 1 shows an example of the aggregate cost for peer of virtual view , where is the distortion and is the reconfiguration cost. Note that except for and . If an optimal solution to FLSV-H with constraint has a total cost less than , then the selected camera views will correspond to in . Hence MC is a special case of FLSV-H.
VI Algorithm II: heuristics
In this section, we present heuristic algorithms to address the anchor view selection problem with reconfiguration cost. We first present a centralized and locally optimal algorithm based on Lloyd’s algorithm [8] in non-uniform scalar quantization. Then we present a distributed algorithm with guaranteed convergence, followed by the fair access cost allocation mechanism.
VI-A Local Optimum with Lloyd’s Algorithm
We present here a low-complexity centralized optimization algorithm that converges to a locally optimal solution for FLSV. We first observe that for a given subset of camera views with a fixed access cost , a peer of virtual view can independently select and from in order to minimize its own sum of distortion and reconfiguration cost given by . This potentially leads to a better global solution. In other words, a solution cannot be globally optimal if a peer of a view can lower his own sum of distortion and reconfiguration cost by choosing a different left or right anchor view from the same purchased set . We formalize this necessary condition for global optimality in the following lemma.
Lemma 1
If , ’s and ’s are a set of optimal variables, then peer(s) of any virtual view cannot switch from a selected left anchor view to another anchor view and lower the overall cost.
The above Lemma also holds for switching of right anchor view to lower overall cost.
While the first lemma is concerned with switching of anchor views within a fixed subset of camera views, we can similarly construct a second Lemma concerning a selected camera view being replaced by another camera view .
Lemma 2
If , ’s and ’s are a set of optimal variables, then one cannot replace a selected camera view with an unselected camera view , so that peers of views ’s that currently select camera view as anchor, i.e. or , switch to as anchor, and lower overall cost.
These two Lemmas are analogous to the two necessary conditions in optimizing non-uniform scalar quantization (SQ). SQ is the problem of quantizing a large number of samples in space into Voronoi regions for compact representation, so that only bits are required to represent a sample with minimal distortion. The first necessary optimal condition for SQ is that each sample can freely select a Voronoi region to represent itself, one whose centroid has the minimum distance to itself (minimum distortion). This is similar to our first Lemma. In the second optimal condition, each Voronoi region can freely select a centroid that minimizes the sum of distance to all samples in the region. This is similar to our second Lemma.
Due to the similarity of our problem to SQ, we can deploy a modified version of Lloyd’s algorithm to solve our problem. We call our algorithm the centralized peer grouping (CPG) algorithm.
For FLSV-H, we first pull the leftmost and rightmost camera views from the server, and then a total number of camera views are randomly pulled in between. For each peer we calculate the optimal anchor views (chosen from camera views) that minimize the sum of its distortion and reconfiguration cost. Similar to Lloyd’s algorithm, we iteratively adjust the positions of camera views to reduce the total cost of all peers in the group. In each iteration, we go through each one of camera views and calculate the new total cost if we shift the camera view one step towards its left or right. If the new total cost is lower than the original, we substitute the camera view with the one to its left (or right). The algorithm stops when the total cost of peers cannot be further reduced. It is guaranteed to converge since the total cost strictly decreases with every accepted shift and there are only finitely many camera configurations.
For FLSV-S, we run the above procedure times with to , and then choose the optimal that gives us the minimum total cost due to distortion, reconfiguration and access.
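The shift-by-one local search for the hard-constraint case can be sketched as below. This is an illustrative reading of the description, not the authors' code: `budget` is assumed to count the two end cameras, `peer_cost` is an assumed callback that returns the total cost when every peer picks its best anchor pair from the given cameras, and `toy_peer_cost` (with its peer positions, distortion and reconfiguration terms) is a made-up stand-in used only to make the example runnable.

```python
import random


def cpg(num_cameras, budget, peer_cost, seed=0):
    """Local search over `budget` cameras; returns (cameras, total cost)."""
    rng = random.Random(seed)
    inner = rng.sample(range(1, num_cameras - 1), budget - 2)
    cams = sorted([0, num_cameras - 1] + inner)  # end cameras are always pulled
    best = peer_cost(cams)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(cams) - 1):        # end cameras stay fixed
            for step in (-1, 1):
                cand = cams[i] + step
                if cand in cams or not 0 < cand < num_cameras - 1:
                    continue
                trial = sorted(cams[:i] + [cand] + cams[i + 1:])
                cost = peer_cost(trial)
                if cost < best:                  # accept only strict improvements
                    cams, best, improved = trial, cost, True
                    break
    return cams, best


def toy_peer_cost(cams, peers=(2.5, 3.0, 3.5, 7.2, 7.9), mu=1.0):
    # Assumed toy costs: distortion grows with anchor distance, while the
    # reconfiguration term shrinks as the nearer anchor moves further away.
    total = 0.0
    for u in peers:
        total += min(0.1 * (u - l) * (r - u) + mu / (1.0 + min(u - l, r - u))
                     for l in cams for r in cams if l <= u <= r and l < r)
    return total


cams, best = cpg(11, 4, toy_peer_cost)
print(cams, best)
```

For the soft-constraint case, the same routine could be called once per candidate budget, adding the access cost of the purchased cameras to each result and keeping the smallest total.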
VI-B Distributed Heuristic
The centralized algorithm presented above is able to find a nearly optimal FLSV solution by assigning anchor views to each peer. The solution is suitable when there is a central controller, and the network is not large or highly dynamic (with peer arrivals, many view switches and departures). In this section, we present a simple, adaptive and distributed heuristic for collaborative sharing of anchor views, or equivalently for constructing the overlay P2P network, which scales well to large networks with peer churn. We call this distributed heuristic the distributed peer grouping (DPG) algorithm.
In a peer group, peers watching the same or adjacent virtual views are organized into “coalitions”. Figure 2 shows an example of how the peer coalitions are formed, where are virtual views. Peers watching virtual views between and are organized into a coalition, i.e., Coalition 1. All peers that belong to the same coalition share anchor views and thus access costs. There is a leader peer (marked in white) in each coalition, which keeps track of the number of peers watching each virtual view and of the total cost of the whole coalition. It periodically exchanges the cost information with the neighboring coalitions on each side. Two neighboring coalitions may merge into a new bigger coalition, and a coalition may also split into two coalitions if the overall cost can be reduced. We discuss algorithms for peer joins, coalition merge and split, peer leaves and view switching in the following.
Peer Join: When a new peer arrives, it first contacts a Rendezvous Point (RP) that forwards it to the peer group that belongs to. This could be done with an IP address lookup. If there is an existing coalition that covers the virtual view peer requests in the peer group, RP connects with the leader node of the coalition . The node joins coalition and starts to pull anchor views from other peers in the coalition. The leader peer of updates the cost and information of the coalition. However, if the virtual view requested by peer is not in the range of any coalition, a new coalition will be created, and becomes the leader of the coalition. It pulls from the streaming server the anchor views that minimize its own costs (distortion and reconfiguration cost).
Coalition merge: The coalition structure is adaptive to peer churns, which keeps the P2P network optimized. The leader peers of each coalition periodically exchange information with neighboring leaders. Let , be the cost for and respectively, and be the optimal cost from the result of the CPG algorithm run on if and merge and cooperate. If , the two coalitions and are merged. Let be the optimal set of anchor views returned by the CPG algorithm. Each peer in the merged coalition adapts to new anchor views and that give the minimum cost (). The leader who requested the merge becomes the new leader of the merged coalition.
Coalition split: For a big coalition , the leader periodically examines whether splitting into two coalitions leads to lower cost. Let be a virtual view separating into two coalitions , . For each different , the leader runs the CPG algorithm on both and . If the combination of optimal costs is smaller than , then is split into and , and a new leader will be randomly selected for the newly created coalition.
Peer leave: When a peer is about to leave, all content sharing between and its neighbors is stopped, and the leader node updates the cost of the coalition. If the leader node leaves, a new leader is randomly chosen.
View switch: A peer could switch the virtual view it currently watches in the middle of a streaming session. If the new virtual view is still within the range of the coalition, peer can still pull anchor views from other peers and synthesize the new view. There will be no change of the overlay structure. However, if the new virtual view goes out of the range of the coalition, the peer will leave the current coalition and join (or create) a new coalition. It follows the same process as in the situation where peers join or leave the system.
VI-C Fair cost allocation within a coalition
We propose a mechanism to fairly distribute the access costs to each peer for the DPG algorithm described in Section VI-B. From the above discussion, cooperation enables peers watching adjacent views to share the anchor views and thus the access cost. It helps to reduce the total cost of all users. As peers in P2P networks are selfish and rational, an important issue in our live free viewpoint video streaming problem is the fair allocation of the cost among peers in a coalition, so that our solution not only minimizes the total cost of the entire P2P network, but also helps each user to lower its own cost. As such, no user has an incentive to deviate from the proposed solution, and the constructed overlay P2P network is stable.
Coalitional game theory provides an ideal tool to provide fair rules for cost reduction via cooperation in our free-viewpoint live streaming problem [19]. Consider a coalition with peers who watch neighboring views and share the anchor views and the access cost. Let be a subgroup of users in watching nearby views, where is the total cost of peers in if they decide to cooperate, with L being the cost function defined in (11). An allocation vector divides the total cost among its members, where is the cost (including view distortion, access cost and reconfiguration cost) assigned to user . (Footnote 4: Note that from Section III-B and Equation (9), users’ view distortion and reconfiguration costs are fixed once the set of anchor views is selected, and only their access costs can be adjusted to achieve fairness among peers in a coalition. In our work, given the desired allocation , we adjust users’ access costs to ensure that user ’s total cost is .)
Given an allocation , define the excess of a subgroup (with respect to ) as , which is the extra cost incurred to if they deviate from the coalition and the allocation but form a coalition themselves. If , the subgroup has no incentive to deviate from the coalition . For an allocation, if its excesses are all nonnegative, then users in C have an incentive to stay in , and our goal is to find such stable coalitions and allocations.
Finding such stable allocations is often difficult, and a well-known fair solution is the nucleolus [19, 20]. The nucleolus always exists and is unique. It maximizes the excesses in the nondecreasing order, or equivalently, minimizes peers’ dissatisfaction in the nonincreasing order. Moreover, it is one of the stable allocations if they exist. The nucleolus is defined as follows. Given an allocation , let be the vector of all excesses sorted in the nondecreasing order. The nucleolus is the unique allocation that lexicographically maximizes over all allocations, that is, . (Footnote 5: A vector is said to be lexicographically larger than vector () if in the first component that they differ, that component of is larger than that of .)
To compute the nucleolus, we follow the above definition and solve a sequence of linear programs as follows [20]. We first solve the following problem
(14)  
which adds constraints on the allocation vectors to maximize the smallest excess. Let be the optimal solution of the above problem, which is the maximal smallest excess, and let be the collection of all subgroups whose excesses are equal to . We then solve
(18)  
which maximizes the second smallest excess. We continue this way until there is only one allocation that satisfies all the constraints in the optimal solution, and that allocation is the nucleolus.
In DPG, we apply the above procedure to compute the nucleolus for each coalition found by the algorithm.
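To make the excess and the lexicographic comparison concrete, the sketch below evaluates candidate allocations by brute force for a tiny coalition. It does not solve the sequence of linear programs; it only computes the sorted excess vector that those programs optimize. The excess is taken here as the subgroup's stand-alone cost minus the cost allocated to its members, which is inferred from the verbal definition above, and the cost function `toy_cost` is a made-up example.

```python
from itertools import combinations


def sorted_excesses(players, cost, alloc):
    """Excess of every proper non-empty subgroup S, in nondecreasing order.
    Excess is cost(S) minus the sum of the allocation over S."""
    excesses = []
    for size in range(1, len(players)):
        for group in combinations(players, size):
            excesses.append(cost(frozenset(group)) - sum(alloc[i] for i in group))
    return sorted(excesses)


def is_stable(players, cost, alloc, tol=1e-9):
    # No subgroup gains by deviating when all excesses are non-negative.
    return sorted_excesses(players, cost, alloc)[0] >= -tol


def toy_cost(group):
    # Assumed toy cost: a shared access cost of 4 plus 1 per peer.
    return 4.0 + len(group)


players = (0, 1, 2)                      # the whole coalition costs 7 in total
equal = {0: 7 / 3, 1: 7 / 3, 2: 7 / 3}
skewed = {0: 1.0, 1: 3.0, 2: 3.0}
unfair = {0: 0.0, 1: 3.0, 2: 4.0}

print(sorted_excesses(players, toy_cost, equal))
print(sorted_excesses(players, toy_cost, skewed))
print(is_stable(players, toy_cost, unfair))
```

Python compares lists lexicographically, so `sorted_excesses(...)` for two allocations can be compared directly with `>`; in this toy game the equal split has the lexicographically larger excess vector of the two stable candidates.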
VII Experimentation
In this section we present illustrative simulation results. In simulations, we assume the distortion function has the following form:
(19) 
Note that if virtual view is actually one of the anchor views, then the distortion is zero. The rate at which the distortion increases with the distance between anchor views depends on the parameters and .
Unless otherwise stated, we use the following baseline parameters in our simulation: number of captured views: 21, number of virtual views: 200, number of peers: 10000, . We assume that the distribution of peers watching each virtual view follows a normal distribution. We have also run our simulations on different peer distributions. The results of those simulations are qualitatively the same as what is presented here, and hence are not shown for the sake of brevity.
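One way to generate such a peer population is sketched below. The text only states that the distribution is normal, so the mean (centre of the view range) and the standard deviation (one sixth of the range) are assumptions made for this example, as is the rounding rule.

```python
import math

def peers_per_view(num_views=200, num_peers=10000, mean=None, std=None):
    """Discretize a normal density over view indices and scale it to num_peers."""
    mean = (num_views - 1) / 2 if mean is None else mean  # assumed: centred
    std = num_views / 6 if std is None else std           # assumed spread
    weights = [math.exp(-0.5 * ((v - mean) / std) ** 2) for v in range(num_views)]
    total = sum(weights)
    counts = [int(num_peers * w / total) for w in weights]
    # Hand out the peers lost to rounding down, most popular views first.
    leftover = num_peers - sum(counts)
    by_popularity = sorted(range(num_views), key=lambda v: weights[v], reverse=True)
    for v in by_popularity[:leftover]:
        counts[v] += 1
    return counts

counts = peers_per_view()
print(sum(counts), max(counts), min(counts))
```

Other peer distributions can be tried by replacing the `weights` list, for example with a uniform or bimodal profile.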
VII-A Results for Negligible Reconfiguration Cost
We compare the DP-based optimal solution with a simple P2P approach for solving the IFLS problem. In the simple P2P approach, peers independently choose the anchor views that minimize their own distortion. The access cost of each anchor view is shared by all users that request it. There is no collaboration among peers in anchor selection.
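The simple P2P baseline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the simulator used in the paper: cameras are placed at integer positions, every peer pulls the nearest camera on each side of its virtual view, and `distortion` is a hypothetical stand-in that is zero at an anchor view and grows with the distance to the anchors. It is not the distortion function of (19).

```python
import math

def distortion(view, left, right, alpha=1.0):
    """Hypothetical stand-in: zero at an anchor, growing as the anchors move away."""
    return alpha * (view - left) * (right - view)

def simple_p2p_cost(peer_views, num_cameras, price):
    """Return (total distortion, total access cost) when peers act independently.

    Cameras are assumed to sit at integer positions 0 .. num_cameras - 1, and
    every peer pulls the nearest camera on each side of its virtual view.
    """
    pulled = set()
    total_distortion = 0.0
    for view in peer_views:
        left = min(math.floor(view), num_cameras - 2)
        right = left + 1
        total_distortion += distortion(view, left, right)
        pulled.update((left, right))
    # Each pulled view is paid for once; its price is split among its users.
    return total_distortion, price * len(pulled)

dist, access = simple_p2p_cost([0.25, 0.5, 1.5, 3.75], num_cameras=5, price=2.0)
print(dist, access, dist + access)
```

In this toy run the four peers pull all five cameras, which shows why the baseline becomes expensive as the view price grows: nothing encourages peers to settle for a shared, slightly more distant anchor.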
Figure 5 shows the total cost (distortion plus access cost) for the peers as a function of the price of camera views. The DP algorithm gives much better results than the simple P2P approach, especially when the price is high. This is because, in the DP algorithm, peers collaboratively select and share the same anchor views to reduce the access cost, at the price of a small distortion penalty. As a result, fewer captured views are pulled from the server, and the total cost is minimized.
VII-B Results for Non-Negligible Reconfiguration Cost
We carried out simulations to compare the performance of our proposed CPG and DPG algorithms against the optimal solution (Optimal) and the simple P2P approach. The optimal solution is obtained through exhaustive search. The simple P2P approach is similar to the one we used for IFLS, except that peers choose anchor views to minimize their own total cost.
Figure 5 shows the total cost of all peers versus the price of a captured view. The total cost increases with the price of a camera view. This is because a higher view price leads to a higher access cost, so peers tend to share the same anchor views in order to split the cost of pulling them from the streaming server. This, in turn, increases the other cost components, i.e., distortion and reconfiguration costs. From the figure, we see that CPG performs very close to the globally optimal solution: the anchor views successfully adapt to good positions that minimize the total cost of all peers. DPG is also very effective in reducing the total cost, especially when the price of a captured view is high. Due to the lack of global information, DPG does not outperform simple P2P when the view price is low.
Figure 5 shows the total number of views pulled from the streaming server as a function of the access cost of an anchor view. The number drops as the access cost increases. When requesting a captured view from the streaming server becomes expensive, peers seek more cooperation by using the same anchor views and sharing the access cost. Therefore, the total number of camera views pulled from the streaming server becomes smaller. In DPG, the total number of views pulled can be higher than the total number of camera views, since peers share access costs only within the same coalition, and a captured view can be pulled multiple times by peers from different coalitions.
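The last observation follows from simple counting, as the sketch below illustrates. The coalition contents are hypothetical; each coalition is represented only by the set of anchor views its peers use.

```python
def views_pulled(coalition_anchors):
    """Total pulls: each coalition pulls every distinct anchor it uses once."""
    return sum(len(set(anchors)) for anchors in coalition_anchors)

# Three cameras (0, 1, 2) and three hypothetical coalitions with overlapping anchors.
coalitions = [{0, 2}, {0, 1}, {1, 2}]
num_cameras = 3
print(views_pulled(coalitions), num_cameras)
```

Here six streams are pulled although only three cameras exist, because no two coalitions share the cost of a common anchor view. Merging all peers into a single coalition would bring the count down to three.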
Figure 8 shows the number of coalitions formed by the Heuristics algorithm. The number of coalitions drops as the price of a captured view rises. When the anchor views are expensive, neighboring coalitions are more likely to merge into a bigger one so that the access costs can be shared by more peers. The Heuristics algorithm can thus efficiently rearrange the topology to minimize the total cost when the view price changes.
Figure 8 shows the total cost of all peers versus peer population. The total cost increases with the number of peers. Simple P2P performs the worst: it has a very high total cost even when the number of peers is low, due to the lack of collaboration in anchor selection. DPG and CPG achieve close-to-optimal performance. When there are fewer peers in the system, they tend to use the same anchor views to reduce the access cost, with a penalty in the other cost components. When the peer population increases, each peer can choose better anchor views that lead to lower distortion and reconfiguration costs, since there are more neighbors to share the access costs.
Figure 8 shows the cost components of the CPG algorithm. As the view price increases, the access cost becomes the major component of the total cost. Distortion and reconfiguration costs also increase, because peers settle for suboptimal anchor views (in terms of distortion and reconfiguration) so that their access costs can be shared with a larger crowd. The cost components of DPG are qualitatively the same as those of CPG, and hence are not shown for brevity.
VIII Conclusion
In this paper we study the design and optimization of interactive P2P streaming of live free viewpoint video. In free viewpoint live streaming, peers can select different virtual viewpoints, which are synthesized using texture and depth videos of the anchor views captured by multiple cameras. The access cost of common anchor views is collectively shared by peers at the price of higher distortion. We formulate two problems: IFLS with negligible reconfiguration cost, and FLSV with non-negligible reconfiguration cost. We then provide a DP-based optimal solution for IFLS, and heuristic algorithms for FLSV. The simulation results show that our proposed algorithms achieve respective optimal and close-to-optimal performance in terms of total cost, and substantially outperform a P2P scheme without collaborative anchor selection.
References
 [1] T. Fujii, K. Mori, K. Takeda, K. Mase, M. Tanimoto, and Y. Suenaga, “Multipoint measuring system for video and sound - 100 camera and microphone system,” in IEEE International Conference on Multimedia and Expo, Toronto, Canada, July 2006.
 [2] S. Gokturk, H. Yalcin, and C. Bamji, “A time-of-flight depth sensor - system description, issues and solutions,” in Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Washington, DC, June 2004.
 [3] W. Mark, L. McMillan, and G. Bishop, “Post-rendering 3D warping,” in Symposium on Interactive 3D Graphics, New York, NY, April 1997.
 [4] M. Tanimoto, M. P. Tehrani, T. Fujii, and T. Yendo, “Free-viewpoint TV,” IEEE Signal Processing Magazine, vol. 28, no. 1, January 2011.
 [5] C. Zhang, Z. Yin, and D. Florencio, “Improving depth perception with motion parallax and its application in teleconferencing,” in IEEE International Workshop on Multimedia Signal Processing, Rio de Janeiro, Brazil, October 2009.
 [6] S. Reichelt, R. Haussler, G. Futterer, and N. Leister, “Depth cues in human visual perception and their realization in 3D displays,” in SPIE Three-Dimensional Imaging, Visualization, and Display 2010, Orlando, FL, April 2010.
 [7] G. Cheung, V. Velisavljevic, and A. Ortega, “On dependent bit allocation for multiview image coding with depth-image-based rendering,” IEEE Transactions on Image Processing, vol. 20, no. 11, March 2011, pp. 3179–3194.
 [8] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992.
 [9] M. Flierl, A. Mavlankar, and B. Girod, “Motion and disparity compensated coding for multiview video,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, November 2007, pp. 1474–1484.
 [10] G. Cheung, A. Ortega, and N.-M. Cheung, “Interactive streaming of stored multiview video using redundant frame structures,” IEEE Transactions on Image Processing, vol. 20, no. 3, March 2011, pp. 744–761.
 [11] H. Huang, B. Zhang, G. Chan, G. Cheung, and P. Frossard, “Coding and replication co-design for interactive multiview video streaming,” in mini-conference in IEEE INFOCOM, Orlando, FL, March 2012.
 [12] N. Magharei, R. Rejaie, and Y. Guo, “Mesh or multiple-tree: A comparative study of live P2P streaming approaches,” in IEEE INFOCOM 2007, 26th IEEE International Conference on Computer Communications, May 2007, pp. 1424–1432.
 [13] X. Lu, Q. Wu, R. Li, and Y. Lin, “On tree construction of super peers for hybrid P2P live media streaming,” in 19th International Conference on Computer Communications and Networks (ICCCN), Aug. 2010, pp. 1–6.
 [14] L. Vu, I. Gupta, K. Nahrstedt, and J. Liang, “Understanding overlay characteristics of a large-scale peer-to-peer IPTV system,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 6, no. 4, Nov. 2010.
 [15] H. Chang, S. Jamin, and W. Wang, “Live streaming with receiver-based peer-division multiplexing,” IEEE/ACM Transactions on Networking, vol. 19, no. 1, pp. 55–68, Feb. 2011.
 [16] Y. Ding and J. Liu, “Efficient stereo segment scheduling in peer-to-peer 3D/multiview video streaming,” in IEEE International Conference on Peer-to-Peer Computing, Sept. 2011, pp. 182–191.
 [17] Z. Chen, L. Sun, and S. Yang, “Overcoming view switching dynamic in multiview video streaming over P2P network,” in 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), June 2010, pp. 1–4.
 [18] K. Oh, S. Yea, and Y.-S. Ho, “Hole-filling method using depth based inpainting for view synthesis in free viewpoint television and 3D video,” in 27th Picture Coding Symposium, Chicago, IL, May 2009.
 [19] G. Owen, Game Theory. Academic Press, 1995.
 [20] U. Faigle, W. Kern, and J. Kuipers, “On the computation of the nucleolus of a cooperative game,” International Journal of Game Theory, vol. 30, no. 1, pp. 79–98, Sept. 2001.
We prove that FLSVS is also NP-hard, by reducing the NP-complete MC problem to a special case of FLSVS. Following a construction similar to that in the proof for FLSVH, we first map items in set to virtual views ’s (each with ) to the right of leftmost captured view , and map subsets in collection to captured views ’s to the right of the virtual views. Consider again the case where the optimal solution has all peers sharing view as their left anchor.
We construct reconfiguration cost as done in the FLSVH proof. Next, we identify the smallest for all ’s and ’s for which and correspond to an item and a subset in original MC problem, respectively. Let . We then construct to be if the subset corresponding to contains the item corresponding to , and otherwise. That means that a virtual view covered by a camera view will have a decrease of in distortion. Note that by definition of , is monotonically nondecreasing. Finally, we define the access cost , which means that purchasing all the captured views ’s is cheaper than paying for for a virtual view uncovered by a captured view .
We now claim that, if the optimal solution to FLSVS has access cost smaller than , then the corresponding MC decision problem is positive, and vice versa. The reason is the following. Under the above construction, FLSVS can always find a solution that covers all virtual views ’s (items in MC) with camera views ’s. If the minimum cost solution requires or fewer captured views, then the corresponding subsets will cover all items in in MC.