CBA: Contextual Quality Adaptation for Adaptive Bitrate Video Streaming
(Extended Version)
This work has been funded by the German Research Foundation (DFG) as part of the projects B4 and C3 within the Collaborative Research Center (CRC) 1053 – MAKI.
Abstract
Recent advances in quality adaptation algorithms leave adaptive bitrate (ABR) streaming architectures at a crossroads: When determining the sustainable video quality one may either rely on the information gathered at the client vantage point or on server and network assistance. The fundamental problem here is to determine how valuable either information is for the adaptation decision. This problem becomes particularly hard in future Internet settings such as Named Data Networking (NDN) where the notion of a network connection does not exist.
In this paper, we provide a fresh view on ABR quality adaptation for QoE maximization, which we formalize as a decision problem under uncertainty, and for which we contribute a sparse Bayesian contextual bandit algorithm denoted CBA. This allows taking high-dimensional streaming context information, including client-measured variables and network assistance, to find online the most valuable information for the quality adaptation. Since sparse Bayesian estimation is computationally expensive, we develop a fast new inference scheme to support online video adaptation. We perform an extensive evaluation of our adaptation algorithm in the particularly challenging setting of NDN, where we use an emulation testbed to demonstrate the efficacy of CBA compared to state-of-the-art algorithms.
I Introduction
Video streaming services such as Netflix, YouTube, and Twitch, which constitute an overwhelming share of current Internet traffic, use adaptive bitrate streaming algorithms that try to find the most suitable video quality representation given the client’s networking conditions. Current architectures use Dynamic Adaptive Streaming over HTTP (DASH) in conjunction with client-driven algorithms to adjust the quality bitrate of each video segment based on various signals, such as measured throughput, buffer filling, and derivatives thereof. In contrast, new architectures such as SAND [1] introduce network-assisted streaming via DASH-enabled network elements that provide the client with guidance, such as accurate throughput measurements and source recommendations. Given the various adaptation algorithms that exist in addition to client-side and network-assisted information, a fundamental question arises on the importance of this context information for the Quality of Experience (QoE) of the video stream.
The problem of video quality adaptation is aggravated in Future Internet architectures such as Named Data Networking (NDN). In NDN, content is requested by name rather than location, and each node within the network will either return the requested content or forward the request. Routers are equipped with caches to hold frequently requested content, thereby reducing the round-trip time (RTT) of the request while simultaneously saving other network links from redundant content requests. Several attempts to make DASH-style streaming possible over NDN exist, e.g., [2], for which the key difficulty is that traditional algorithms rarely play to the strengths of NDN, where the notion of a connection does not exist. Throughput, for example, is not a trivial signal in NDN as data may not be coming from the same source.
In this paper, we closely look at the problem of using context information available to the client for video quality adaptation. Note that our problem description is agnostic to the underlying networking paradigm, making it a good fit to traditional IP-based video streaming as well as NDN. In essence, we consider the fundamental problem of sequential decision-making under uncertainty where the client uses network context information received with every fetched video segment. In Fig. 1 we show a sketch where the client adaptation algorithm decides on the quality of the next segment based on a high-dimensional network context. We model the client’s decision on a video segment quality as a contextual multi-armed bandit problem aiming to optimize an objective QoE metric that comprises (i) the average video quality bitrate, (ii) the quality degradation, and (iii) the video stalling.
One major challenge with incorporating high-dimensional network context information in video quality adaptation is extracting the information that is most relevant to the sought QoE metric. We note that the interactions within this context space become complicated given the NDN architecture, where the network topology and cache states influence the streaming session. Our approach introduces a sparse Bayesian contextual bandit algorithm that is fast enough to run online during video playback. The rationale behind the sparsity is that the given information, including network-assisted and client-side measured signals such as buffer filling and throughput, constitutes a high-dimensional context which is difficult to model in detail. Our intuition is that, depending on the client’s network context, only a few input variables have a significant impact on QoE. Note, however, that sparse Bayesian estimation is usually computationally expensive. Hence, we develop here a fast new inference scheme to support online quality adaptation.
Our contributions in this paper can be summarized as:

We formulate the quality adaptation decision for QoE maximization in ABR video streaming as a contextual multi-armed bandit problem.

We provide a sparse Bayesian contextual bandit algorithm, denoted CBA, which is computationally fast enough to provide real-world video players with quality adaptation decisions based on the network context.

We show emulation testbed results and demonstrate the fundamental differences to the established state-of-the-art quality adaptation algorithms, especially given an NDN architecture.
The developed software is available at https://github.com/arizk/cba-pipeline-public. The remainder of this paper is organized as follows: In Sect. II, we review relevant related work on ABR video streaming and contextual bandits. In Sect. III, we present the relevant background on ABR video streaming. In Sect. IV, we model the quality adaptation problem as a contextual multi-armed bandit problem before providing a fast contextual bandit algorithm for high-dimensional information. In Sect. V, we show how ABR streaming uses CBA and define a QoE-based reward. We describe the evaluation testbed before providing emulation results in Sect. VI. Section VII concludes the paper.
II Related Work
In the following, we split the state-of-the-art related work into two categories: work on ABR quality adaptation, especially in NDN, and related work on contextual bandit algorithms with high-dimensional covariates.
Significant research effort has been devoted to finding streaming architectures capable of satisfying high-bitrate and minimal-rebuffering requirements at scale. CDN brokers such as Conviva [3] allow content producers to easily use multiple CDNs, and are becoming crucial to meet user demand [4]. Furthermore, the use of network assistance in CDNs has received significant attention recently as a method of directly providing network details to DASH players. SAND [1] is an ISO standard which permits DASH-enabled in-network entities to communicate with clients and offer them QoS information. SDN-DASH [5] is another such architecture aiming to maintain QoE stability across clients, as clients without network assistance information are prone to misjudge current network conditions, causing QoE to oscillate. Beyond HTTP, the capabilities of promising new network paradigms such as NDN pose challenges to video streaming. The authors of [2] compare three state-of-the-art DASH adaptation algorithms over NDN and TCP/IP, finding that NDN performance notably exceeds that of TCP/IP under certain network conditions. New adaptation algorithms specific to NDN have also been proposed, such as NDNlive [6], which uses a simple RTT mechanism to stream live content with minimal rebuffering.
In this work, we model the video quality adaptation problem as a contextual bandit problem assuming a linear parametrization, which has successfully been used, e.g., for ad placement [7]. Another promising approach is based on cost-sensitive classification in the bandit setting [8]. Recently, [9] has discussed the use of variational inference in the bandit setting, wherein Thompson sampling is considered to cope with the exploration-exploitation tradeoff. By assuming a high-dimensional linear parametrization, we make use of sparse estimation techniques. High-dimensional information arises in video streaming due to the network context. Sparsity has been a major topic in statistical modeling and many Bayesian approaches have been proposed. Traditionally, double exponential (Laplace) priors, which correspond to $\ell_1$ regularization, have been used. However, these priors often fail due to limited flexibility in their shrinkage behavior. Other approaches that induce sparsity include ''spike-and-slab'' priors [10] and continuous shrinkage priors. Between these two, continuous shrinkage priors have the benefit of often being computationally faster [11]. For our approach we use the Three Parameter Beta Normal (TPBN) continuous shrinkage prior introduced by [11], which generalizes diverse shrinkage priors, e.g., the horseshoe prior [12], the Strawderman-Berger prior, the normal-exponential-gamma prior, and the normal-gamma prior.
III Adaptive Bitrate Streaming: Decisions under Uncertainty
In this section, we review the established model for quality adaptation in ABR video streaming and highlight the changes that arise when streaming over NDN.
III-A Adaptive Bitrate Streaming: A Primer
In adaptive bitrate streaming, the content provider offers multiple qualities of the same content to clients, who decide which one to pick according to their own client-side logic. Each video is divided into consecutive segments, each of which represents a fixed number of seconds of content. These segments are encoded at multiple bitrates corresponding to the perceived average segment quality. In practice, segment lengths are often chosen to be two to ten seconds [13], with several distinct quality levels to choose from, such as 720p and 1080p. Let $\mathcal{Q} = \{q_1, \dots, q_m\}$ represent the set of all available video qualities, such that $q_j < q_{j+1}$ for all $j < m$; i.e., a higher index indicates a higher bitrate and better quality. Let the $i$-th segment encoded at the $j$-th quality be denoted $s_{i,j}$.
Received video segments are placed into a playback buffer which contains downloaded, unplayed video segments. Let the number of seconds in the buffer when segment $s_{i,j}$ is received be $B_i$, and let the playback buffer size BUF_MAX be the maximum allowed seconds of video in the buffer. By convention, we define $B_0 = 0$ and write the recursion of the buffer filling as $B_i = \max\{B_{i-1} - t_i, 0\} + L$, where $t_i$ denotes the fetch time for $s_{i,j}$ and $L$ is the segment length in seconds. A stalling event is ascribed to the $i$-th segment when $t_i > B_{i-1}$. Note that the recursion above holds only if $B_{i-1} + L \le \text{BUF\_MAX}$; i.e., the client is blocked from fetching new segments if the playback buffer is full. If this occurs, the client idles for exactly $B_{i-1} + L - \text{BUF\_MAX}$ seconds before resuming segment fetching. In some related work [13], BUF_MAX is chosen between 10 and 30 seconds.
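The buffer recursion above can be sketched in a few lines of Python; `step_buffer`, `SEG_LEN`, and `BUF_MAX` are our own illustrative names, with a two-second segment length and a 30 s buffer assumed, not the player's actual implementation:

```python
# Sketch of the playback-buffer recursion:
# B_i = max(B_{i-1} - t_i, 0) + L, with a stall whenever t_i > B_{i-1},
# and the client idling when the buffer would exceed BUF_MAX.

SEG_LEN = 2.0    # seconds of content per segment (assumed)
BUF_MAX = 30.0   # maximum buffer filling in seconds (assumed)

def step_buffer(b_prev, fetch_time):
    """Advance the buffer by one fetched segment.

    Returns (new_buffer_seconds, stall_seconds, idle_seconds)."""
    idle = max(b_prev + SEG_LEN - BUF_MAX, 0.0)   # wait before fetching if full
    b_prev -= idle                                # buffer drains while idling
    stall = max(fetch_time - b_prev, 0.0)         # playback starved mid-fetch
    b_new = max(b_prev - fetch_time, 0.0) + SEG_LEN
    return b_new, stall, idle

# Example trace: start with an empty buffer, then fetch three segments.
b, total_stall = 0.0, 0.0
for t in [1.0, 3.0, 0.5]:
    b, stall, _ = step_buffer(b, t)
    total_stall += stall
```

Feeding the trace above through the recursion makes the stalling condition concrete: the first two fetches take longer than the buffered content and hence stall playback.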
To allow the client to select a segment in the $j$-th quality, the client fetches a Media Presentation Description (MPD), an XML-like file with information on the available video segments and quality levels, during session initialization. After obtaining the MPD, the client may begin to request each segment according to its adaptation algorithm. In general, uncertainty exists over the segment fetch time. The most prevalent quality adaptation algorithms take throughput estimates [14], the current buffer filling [15], or combinations and functions thereof to make a decision on the quality of the next segment. The decision aims to find the segment quality which maximizes a QoE metric, such as the average video bitrate, or compound metrics taking the bitrate, bitrate variations, and stalling events into account.
III-B Streaming over Named Data Networking
In NDN, consumers or clients issue interests which are forwarded to content producers, i.e., origin servers, via caching-enabled network routers. These interests are eventually answered with data provided by the producer or an intermediary router cache. To request a video, a consumer will first issue an interest for the MPD of the video. Each segment $s_{i,j}$ is given a name in the MPD, e.g., of the form /video ID/quality level/segment number. The client issues an interest for each data packet when requesting a particular segment. Since NDN data packets are of a small, fixed size, higher-quality video segments will require more data packets to encode. We do not permit the client to drop frames, so all data packets belonging to a segment must be in the playback buffer to watch that segment.
IV A Fast Contextual Bandit Algorithm for High-Dimensional Covariates
In this work, we model the problem of video quality adaptation as a sequential decision-making problem under uncertainty, for which a successful framework is given by the multi-armed bandit problem dating back to [16]. The contextual bandit problem [17] is an extension of the classic problem, where additional information is revealed sequentially. The decision-making can therefore be seen as a sequential game.
At decision step $n$, i.e., at the $n$-th segment, a learner observes a $d$-dimensional context variable $x_{n,a} \in \mathbb{R}^d$ for each action in a set of actions $\mathcal{A}$. Here, the actions map to the video qualities that the client chooses from. The client chooses an action $a_n \in \mathcal{A}$, for which it observes a reward $r_{n,a_n}$. This reward can be measured in terms of low-level metrics such as fetching time or, as we consider later, QoE. The decision making is performed over a typically unknown decision horizon $N$, i.e., $n \in \{1, \dots, N\}$. Therefore, the learner tries to maximize the cumulative reward until the decision horizon. It is important to note that after each decision the learner only observes the reward associated with the played action $a_n$; hypothetical rewards for other actions $a \ne a_n$ are not revealed to the learner.
Next, we model the contextual bandit problem under the linearizability assumption, as introduced in [18]. Here, we assume that a parameter $\theta_a^* \in \mathbb{R}^d$ controls the mean reward of each action $a$ at decision step $n$ as $\mathbb{E}[r_{n,a}] = x_{n,a}^{\top} \theta_a^*$. We introduce the regret of an algorithm to evaluate its performance as
$$R(N) = \sum_{n=1}^{N} \mathbb{E}\!\left[ x_{n,a_n^*}^{\top} \theta_{a_n^*}^{*} - x_{n,a_n}^{\top} \theta_{a_n}^{*} \right] \qquad (1)$$
with $a_n^* = \operatorname{arg\,max}_{a \in \mathcal{A}} x_{n,a}^{\top} \theta_a^*$. The regret compares the cumulative reward of the algorithm against the cumulative reward of an oracle with hindsight. In order to develop algorithms with a small regret in the linear setting, many different strategies have been proposed. Such algorithms include techniques based on forced sampling [19], Thompson sampling [20], and the upper confidence bound (UCB) [18, 7, 21, 22].
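The regret in (1) can be made concrete with a small sketch; the toy dimensions, the uniform comparison policy, and all names below are illustrative, not part of the paper's evaluation setup:

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def regret(contexts, chosen, theta):
    """Cumulative regret (1): sum over rounds of the gap between the best
    attainable mean reward and the mean reward of the chosen action."""
    total = 0.0
    for x_n, a_n in zip(contexts, chosen):
        means = [dot(x_n[a], theta[a]) for a in range(len(theta))]
        total += max(means) - means[a_n]
    return total

# Toy instance: 3 actions, 4-dimensional contexts, horizon 100.
rng = random.Random(0)
K, d, N = 3, 4, 100
theta = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]
contexts = [[[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]
            for _ in range(N)]
oracle = [max(range(K), key=lambda a: dot(x[a], theta[a])) for x in contexts]
uniform = [rng.randrange(K) for _ in range(N)]
```

By construction the oracle policy incurs zero regret, while any policy that ignores the context accumulates a positive gap in almost every round.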
Network-assisted video streaming environments provide high-dimensional context information, so it is natural to assume a sparse parameter $\theta_a^*$. We therefore impose a sparsity-inducing prior on the sought regression coefficients. To cope with the contextual bandit setting, we start with the Bayes-UCB algorithm for linear bandits introduced in [23] and develop a version which fits the given problem. Since previously developed sparse Bayesian inference algorithms are computationally expensive, we develop a fast new inference scheme for the contextual bandit setting.
IV-A The Contextual Bayes-UCB Algorithm: CBA
The Contextual Bayes-UCB algorithm (CBA-UCB) selects in each round the action which maximizes the index
$$a_n = \operatorname{arg\,max}_{a \in \mathcal{A}} \; Q\!\left( 1 - \frac{1}{n (\log N)^c},\, \rho_{n,a} \right) \qquad (2)$$
where $c$ is a width parameter for the UCB and $Q(t, \rho)$ is the quantile function associated with the distribution $\rho$, i.e., $\mathbb{P}_{X \sim \rho}\left( X \le Q(t, \rho) \right) = t$, with $t \in (0, 1)$. Additionally, we denote by $\rho_{n,a}$ the posterior distribution of the mean reward
$$\rho_{n,a} = p\left( x_{n,a}^{\top} \beta \,\middle|\, \mathcal{D}_{n,a} \right) \qquad (3)$$
where $\mathcal{D}_{n,a}$ is the set of data points of contexts and rewards for which action $a$ was previously played
$$\mathcal{D}_{n,a} = \left\{ (x_{m,a}, r_{m,a}) : a_m = a,\ 1 \le m \le n - 1 \right\} \qquad (4)$$
In the following subsections, we derive a Gaussian distribution $\mathcal{N}(\mu_a, \Sigma_a)$ for the posterior distribution of the regression coefficients. In this case the index in (2) reduces to
$$a_n = \operatorname{arg\,max}_{a \in \mathcal{A}} \; x_{n,a}^{\top} \mu_a + \sqrt{2\, x_{n,a}^{\top} \Sigma_a x_{n,a}}\; \operatorname{erf}^{-1}\!\left( 2t - 1 \right), \qquad t = 1 - \frac{1}{n (\log N)^c} \qquad (5)$$
where the quantile function of a Gaussian computes to $Q\left(t, \mathcal{N}(\mu, \sigma^2)\right) = \mu + \sigma \sqrt{2}\, \operatorname{erf}^{-1}(2t - 1)$, with the inverse error function $\operatorname{erf}^{-1}$. The algorithm for CBA-UCB is depicted in Fig. 2, Alg. 1.
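A minimal sketch of the index computation in (5), assuming a Gaussian posterior with mean `mu` and covariance `Sigma`; the dependency-free bisection for the inverse error function and all names are illustrative:

```python
import math

def gaussian_quantile(t, mean, std):
    """Quantile of N(mean, std^2): mean + std * sqrt(2) * erfinv(2t - 1).
    erfinv is obtained by bisecting math.erf to stay dependency-free;
    scipy.special.erfinv would serve the same purpose."""
    target = 2.0 * t - 1.0
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return mean + std * math.sqrt(2.0) * (lo + hi) / 2.0

def cba_index(x, mu, Sigma, n, horizon, c=0.0):
    """Index (5): upper posterior quantile of the mean reward x^T beta,
    with x^T beta | D ~ N(x^T mu, x^T Sigma x)."""
    mean = sum(xi * mi for xi, mi in zip(x, mu))
    var = sum(xi * sum(s * xj for s, xj in zip(row, x))
              for xi, row in zip(x, Sigma))
    t = 1.0 - 1.0 / (n * math.log(horizon) ** c)
    return gaussian_quantile(t, mean, math.sqrt(max(var, 0.0)))
```

As $n$ grows, the quantile level $t$ approaches one, so the index adds an increasingly optimistic exploration bonus on top of the posterior mean reward.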
Algorithm 2: subroutine SVI
Input: design matrix $X_a$, response vector $r_a$, hyperparameters $a$, $b$, $a_0$, $b_0$, step size schedule $(\rho_s)$
Output: updated parameters $\mu_a$, $\Sigma_a$
Initialize all natural parameters and set the iteration step $s = 1$
while ELBO not converged do
  draw a random sample from the data set
  calculate the intermediate parameters with Eq. (15)
  do the gradient update with Eq. (14) and step size $\rho_s$; set $s \leftarrow s + 1$
  update the variational parametrization with Eq. (16)
  update the moments with Eq. (11)
end while
return $\mu_a$, $\Sigma_a$

Algorithm 3: subroutine VB
Input: design matrix $X_a$, response vector $r_a$, hyperparameters $a$, $b$, $a_0$, $b_0$
Output: updated parameters $\mu_a$, $\Sigma_a$
while ELBO not converged do
  update the parameters of the variational distributions with Eq. (10)
  update the moments with Eq. (11)
end while
return $\mu_a$, $\Sigma_a$

Algorithm 4: subroutine OS-SVI
Input: context vector $x$ for the last played action, reward $r$ for the last played action, hyperparameters $a$, $b$, $a_0$, $b_0$, step size $\rho_n$, current decision step $n$
Output: updated parameters $\mu_a$, $\Sigma_a$
calculate the intermediate parameters with Eq. (15) from replicates of $(x, r)$
do one gradient update with Eq. (14) and step size $\rho_n$
update the variational parametrization with Eq. (16) and the moments with Eq. (11)
return $\mu_a$, $\Sigma_a$

IV-B Generative Model of the Linear Rewards
Here, we derive the posterior inference for the regression coefficients. The posterior distributions are calculated for each of the actions. For the inference of the posterior (3), we use Bayesian regression to infer the posterior of the regression coefficients $\beta$ (for readability we drop the dependency of the regression coefficients on the action $a$). We use the data $\mathcal{D}_{n,a}$, which is a set of previously observed contexts and rewards when taking action $a$.
Assuming a linear regression model with i.i.d. noise, the regression response follows the likelihood $r \mid x, \beta, \sigma^{-2} \sim \mathcal{N}(x^{\top} \beta, \sigma^{2})$, where $\sigma^{-2}$ is the noise precision for the regression problem. For the application of video streaming with high-dimensional context information, we use a sparsity-inducing prior over the regression coefficients to find the most valuable context information. We use here the Three Parameter Beta Normal (TPBN) continuous shrinkage prior introduced by [11], which puts on each regression coefficient $\beta_j$, $j = 1, \dots, d$, the following hierarchical prior
$$\beta_j \sim \mathcal{N}(0, \tau_j), \qquad \tau_j \sim \mathcal{G}(a, \lambda_j), \qquad \lambda_j \sim \mathcal{G}(b, \phi) \qquad (6)$$
where $\tau_j$ is a Gamma distributed (we use the shape and rate parametrization of the Gamma distribution) continuous shrinkage parameter that shrinks $\beta_j$ as $\tau_j$ gets small. The parameter $\lambda_j$ controls $\tau_j$ via the global shrinkage parameter $\phi$. For appropriate choices of the hyperparameters $a$ and $b$, different shrinkage priors are obtained; for example, we use $a = b = 1/2$, which corresponds to the horseshoe prior [12]. For notational simplicity, we collect the parameters for the $d$ context dimensions in the column vectors $\tau$ and $\lambda$, respectively.
For the estimation of the global shrinkage parameter $\phi$, an additional hierarchy is used with $\phi \sim \mathcal{G}(1/2, \omega)$ and $\omega \sim \mathcal{G}(1/2, 1)$. For the noise precision $\sigma^{-2}$, a Gamma prior with hyperparameters $a_0$ and $b_0$ is used. The graphical model [24] of this generative model is depicted in Fig. 3.
IV-C Variational Bayesian Inference (VB)
In the following, we review the general approximate inference scheme of mean field variational Bayes (VB) and its application to linear regression with the TPBN prior as proposed in [11]. Thereafter, we leverage stochastic variational inference (SVI) to develop a new contextual bandit algorithm.
Since exact inference of the posterior distribution is intractable [25], we apply approximate inference in the form of variational Bayes (VB). We use a mean field variational approximation with $q(\beta, \tau, \lambda, \phi, \omega, \sigma^{-2}) = q(\beta)\, q(\tau)\, q(\lambda)\, q(\phi)\, q(\omega)\, q(\sigma^{-2})$ for the approximate distribution. The variational distributions are obtained by minimizing the Kullback-Leibler (KL) divergence between the variational distribution and the intractable posterior distribution
$$q^* = \operatorname{arg\,min}_{q} \operatorname{KL}\left( q(\Theta) \,\middle\|\, p(\Theta \mid \mathcal{D}) \right) \qquad (7)$$
By Jensen’s inequality, a lower bound $\mathcal{L}(q)$ on the marginal likelihood (evidence) can be found
$$\log p(\mathcal{D}) \ge \mathbb{E}_q\!\left[ \log p(\mathcal{D}, \Theta) \right] - \mathbb{E}_q\!\left[ \log q(\Theta) \right] =: \mathcal{L}(q) \qquad (8)$$
The evidence lower bound (ELBO) $\mathcal{L}(q)$ is used for solving the optimization problem over the KL divergence (7), since maximizing $\mathcal{L}(q)$ is equivalent to minimizing the KL divergence. Using calculus of variations [25], the solution of the optimization problem can be found with the following optimal variational distributions ($\mathcal{GIG}$ denotes the generalized inverse Gaussian distribution, see Appendix A)
(9) 
with the parameters of the variational distributions
(10) 
and the moments
(11) 
where $K_\nu(\cdot)$ is the modified Bessel function of the second kind. The calculation of the ELBO is provided in Appendix B. Fig. 4 shows the probabilistic graphical model of the mean field approximation for the generative model. Note the factorization of the random variables, which enables tractable posterior inference in comparison to the probabilistic graphical model for the coupled Bayesian regression in Fig. 3.
A local optimum of the ELBO can be found by cycling through the coupled moments of the variational distributions. This corresponds to a coordinate ascent algorithm on $\mathcal{L}(q)$. The corresponding algorithm is shown in Fig. 2, Alg. 3.
IV-D Stochastic Variational Inference (SVI)
Next, we present a new posterior inference scheme with the TPBN prior based on stochastic variational inference (SVI) [26]. We optimize the ELBO by means of stochastic approximation [27], where we calculate the natural gradient with respect to the natural parameters of the exponential family distributions of the mean field variational factors.
Consider the mean field approximation $q(\Theta) = \prod_i q_i(\theta_i \mid \eta_i)$ for the intractable posterior distribution $p(\Theta \mid \mathcal{D})$, where $\Theta$ and $\mathcal{D}$ denote the tuple of parameters and the data, respectively. For each factor $q_i$, assuming it belongs to the exponential family, the probability density is
$$q_i(\theta_i \mid \eta_i) = h(\theta_i) \exp\!\left( \eta_i^{\top} T(\theta_i) - A(\eta_i) \right)$$
Here, $h(\theta_i)$ denotes the base measure, $\eta_i$ are the natural parameters, $T(\theta_i)$ is the sufficient statistic, and $A(\eta_i)$ is the log-normalizer.
We compute the natural gradient $\tilde{\nabla}_{\eta_i} \mathcal{L}$ of the ELBO with respect to the natural parameters of the factorized variational distributions for each variational factor $q_i$. Therefore, the natural gradient computes to
$$\tilde{\nabla}_{\eta_i} \mathcal{L} = \hat{\eta}_i - \eta_i \qquad (12)$$
where $\hat{\eta}_i = \mathbb{E}_q\!\left[ \eta_i^{c}(\Theta_{-i}, \mathcal{D}) \right]$. The parameter $\eta_i^{c}$ is the natural parameter of the full conditional distribution $p(\theta_i \mid \Theta_{-i}, \mathcal{D})$, where $\Theta_{-i}$ denotes the tuple of all variables but $\theta_i$. Using a gradient update, the variational approximation can be found as
$$\eta_i^{(s+1)} = \eta_i^{(s)} + \rho_s \left( \hat{\eta}_i - \eta_i^{(s)} \right) \qquad (13)$$
where $s$ denotes the iteration step of the algorithm and $\rho_s$ is a step size parameter.
Random subsampling of the data enables constructing a stochastic approximation algorithm. For this, $\hat{\eta}_i$ is replaced by an unbiased estimate $\hat{\eta}_i'$, which yields a stochastic gradient ascent algorithm on the ELBO in the form
$$\eta_i^{(s+1)} = (1 - \rho_s)\, \eta_i^{(s)} + \rho_s\, \hat{\eta}_i' \qquad (14)$$
For the step size $\rho_s$ we use a Robbins-Monro schedule satisfying $\sum_s \rho_s = \infty$ and $\sum_s \rho_s^2 < \infty$. In the case of the regression problem, we sample one data point from the set of observed data points and replicate it $|\mathcal{D}_{n,a}|$ times to calculate $\hat{\eta}_i'$. The intermediate estimates of the natural parameters are then obtained by
(15) 
The derivation is provided in Appendix C.
The transformation from the natural parametrization to the variational parametrization is calculated using
(16) 
and the moments can then be calculated with (11). We denote by $[\eta_i]_k$ the $k$-th variable of the tuple of natural parameters $\eta_i$. The gradient update (14) with random subsampling is performed until the ELBO converges. For an algorithmic description of SVI see Fig. 2, Alg. 2.
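The convex-combination form of the update (14) is easy to sketch; the constant target, the step-size exponent, and all names below are illustrative assumptions:

```python
def svi_update(eta, eta_hat, step):
    """One stochastic natural-gradient step on the ELBO, Eq. (14):
    eta <- (1 - rho) * eta + rho * eta_hat, where eta_hat is the
    intermediate estimate from a single replicated data point."""
    return [(1.0 - step) * e + step * eh for e, eh in zip(eta, eta_hat)]

# With a fixed target, the iterates converge to it under a
# Robbins-Monro step-size schedule (the exponent 0.7 is illustrative).
eta = [0.0, 0.0]
for s in range(1, 1001):
    eta = svi_update(eta, [1.0, -2.0], s ** -0.7)
```

The update is simply a convex combination of the current natural parameters and the noisy estimate, which is what makes each iteration so cheap compared to a full coordinate ascent sweep.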
IV-E One-Step Stochastic Variational Inference (OS-SVI)
Since the optimization of the ELBO until convergence with both VB and SVI is computationally expensive, we present a novel one-step SVI (OS-SVI) algorithm for the bandit setting. In each round of OS-SVI the learner observes a context $x_{n,a_n}$ and a reward $r_{n,a_n}$ based on the taken action $a_n$. This data point is used to update the variational parameters of the regression coefficients of action $a_n$ by going one step in the direction of the natural gradient of the ELBO. For this we calculate the intermediate estimates (15) based on replicates of the observed data point $(x_{n,a_n}, r_{n,a_n})$. Thereafter, the stochastic gradient update is performed with (14). By transforming the natural parameters back to their corresponding parametric form (16), the updated mean $\mu_{a_n}$ and covariance matrix $\Sigma_{a_n}$ can be found. This update step is computationally significantly faster than using VB or SVI. The OS-SVI subroutine is described in Fig. 2, Alg. 4.
IV-F Accuracy and Computational Speed of the CBA-UCB Algorithms
For the numerical evaluation of CBA-UCB with the Three Parameter Beta Normal prior, we first create data based on the linearization assumption. We use a problem with a fixed decision horizon, a high-dimensional context, and several actions. We use two experimental setups: one with a dense regression coefficient vector and one with a sparse regression coefficient vector in which only five regression coefficients are unequal to zero.
We compare the developed algorithm CBA-UCB, using the variants VB, SVI, and OS-SVI, with two baseline algorithms: LinUCB [7] and CGP-UCB [22]. For CGP-UCB, we use independent linear kernels for every action. Fig. 5 and Fig. 6 show the average regret (1) for the dense and the sparse setting, respectively. For the sparse setting expected in high-dimensional problems such as network-assisted video streaming, CBA-UCB with VB yields the smallest regret. We observe in Fig. 5 that in the dense setting CGP-UCB performs best, closely followed by CBA-UCB with VB. Note that CGP-UCB performs well because Gaussian process regression with a linear kernel corresponds to a dense Bayesian regression with marginalized regression coefficients, and therefore matches the model under which the dense data has been created.
In Tab. 1 we show the runtimes of the algorithms, where we observe that the runtimes for CBA-UCB with VB / SVI and the CGP-UCB baseline are impractically high. Further, this runtime performance deteriorates as the dimension of the context grows, since the computational bottleneck of both VB and SVI is multiple matrix inversions, see Fig. 7. Fig. 8 shows the scaling of the runtime with the decision horizon for a setup identical to that of Tab. 1. CGP-UCB scales badly with the horizon, as a kernel matrix whose size grows with the number of already observed contexts and rewards is inverted at every time step. Since the decision making has to be made in the order of a few hundred milliseconds for video streaming applications, neither CBA-UCB with VB nor CGP-UCB can be computed within this timing restriction. Therefore, we resort to the OS-SVI variant of the CBA algorithm, which empirically obtains a much smaller regret than the fast LinUCB baseline algorithm but still retains a comparable runtime performance (for updating CBA-UCB with OS-SVI or LinUCB we only have to invert a matrix once after a decision). This renders the use of CBA with One-Step Stochastic Variational Inference for network-assisted video quality adaptation feasible.
Tab. 1: Runtimes of the bandit algorithms in the sparse and dense settings.
Algorithm | Sparse Setting | Dense Setting
CGP-UCB | 638.68 s | 643.44 s
LinUCB | 31.24 s | 30.70 s
CBA-OS-SVI | 91.40 s | 89.56 s
CBA-SVI | 3784.00 s | 4081.74 s
CBA-VB | 1434.11 s | 1760.83 s
V Video Quality Adaptation as a Contextual Bandit Problem
In the following, we model ABR streaming as a contextual bandit problem where we use our developed CBA algorithm for video quality adaptation. The action set $\mathcal{A}$ corresponds to the set of available bitrates, such that action $a = j$ represents the decision to request quality $q_j$ for the $i$-th segment, i.e., to request the segment $s_{i,j}$. Below we formulate a real-valued segment-based QoE function to represent the reward obtained by performing $a$. Furthermore, we let $x_{i,a}$ represent the network context vector corresponding to an available action $a$ at segment $i$. At each segment $i$, therefore, there will be $|\mathcal{A}|$ unique context vectors available.
V-A Online Video Quality Adaptation using CBA
CBA performs online video quality adaptation by calculating the index presented in (5) for each available action $a$ after observing the context vector of the action, to determine the optimal bitrate to request for the next segment $i$. There are no constraints on the contents of the context vectors, allowing CBA to learn with any information available in the networking environment. Furthermore, each context feature may be either global or action-specific; for example, the current buffer filling percentage or the last 50 packet RTTs at bitrate $q_a$, respectively. The action $a$ with the largest computed index is chosen, and a request goes out for $s_{i,a}$. Once $s_{i,a}$ is received, its QoE value below is calculated and fed to CBA as the reward $r_{i,a}$. CBA then updates its internal parameters before observing the next set of context vectors and repeating the process for segment $i+1$, until the video ends.
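The per-segment control loop described above can be sketched as follows; `observe_contexts`, `fetch_segment`, `qoe`, and the context-ignoring `MeanBandit` stand-in are hypothetical placeholders for the player's actual callbacks and for CBA itself:

```python
def stream(bandit, num_segments, num_qualities,
           observe_contexts, fetch_segment, qoe):
    """Request one quality per segment and feed the realized QoE back as reward."""
    for i in range(num_segments):
        contexts = observe_contexts(i)                  # one vector per quality
        scores = [bandit.index(a, contexts[a]) for a in range(num_qualities)]
        a = max(range(num_qualities), key=scores.__getitem__)
        stats = fetch_segment(i, a)                     # blocking segment download
        bandit.update(a, contexts[a], qoe(stats))       # posterior update from reward

class MeanBandit:
    """Context-ignoring stand-in for CBA: tracks per-action average reward."""
    def __init__(self, k):
        self.sums = [0.0] * k
        self.counts = [1e-9] * k                        # avoid division by zero
    def index(self, a, x):
        return self.sums[a] / self.counts[a]
    def update(self, a, x, r):
        self.sums[a] += r
        self.counts[a] += 1.0
```

The loop is agnostic to the bandit implementation: swapping `MeanBandit` for the CBA-UCB index computation changes only the `index` and `update` methods, not the player logic.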
The performance of CBA depends upon several hyperparameters. In the description in Fig. 2, Alg. 1, we choose $c = 0$ as it was shown to yield the most promising results [23]. As mentioned in Sect. IV, we use $a = b = 1/2$ to obtain the horseshoe shrinkage prior. We choose the remaining hyperparameters of the noise precision, $a_0$ and $b_0$, to be small non-zero values such that a vague prior is obtained.
V-B Reward Formulation: Objective QoE
The calculated QoE metric is the feedback used by CBA to optimize the quality adaptation strategy. As QoE scores for a video segment may vary among users, we resort in this work to an objective QoE metric, similar to [28], which is derived from the following set of factors:

Video quality: the bitrate $q_{j(i)}$ of the $i$-th segment, where $j(i)$ denotes the quality index chosen for segment $i$.

Decline in quality: the bitrate decrease $\left( q_{j(i-1)} - q_{j(i)} \right)_+$ if the current segment is at a lower bitrate than the previous one, for two back-to-back segments (we use $(x)_+$ to denote $\max\{x, 0\}$).

Rebuffer time: the amount of time spent with an empty buffer after choosing $s_{i,j(i)}$.
The rationale behind using the decline in quality, in contrast to related work that counts quality variations, is that we do not want to penalize CBA if the player strives for higher qualities without risk of rebuffering. The importance of each component may vary based on the specific user or context, so, similar to [28], we define the QoE of a segment as a weighted sum of the above factors. Let the rebuffer time $\phi_i$ be the amount of time spent rebuffering after choosing $s_{i,j(i)}$. We define the QoE then as:
$$\mathrm{QoE}_i = w_1\, q_{j(i)} - w_2 \left( q_{j(i-1)} - q_{j(i)} \right)_+ - w_3\, \phi_i \qquad (17)$$
where $w_1$, $w_2$, and $w_3$ are non-negative weights corresponding to the importance of the video quality, the decline in quality, and the rebuffer time, respectively. For a comparison of several instantiations of these weights, see [28].
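The weighted sum in Eq. (17) translates directly into code; the default weights below are illustrative placeholders, not the values used in the evaluation:

```python
def qoe(bitrate, prev_bitrate, rebuffer_time, w1=1.0, w2=1.0, w3=3.0):
    """Segment QoE as in Eq. (17): reward the chosen bitrate, penalize only
    declines in quality and the time spent rebuffering.
    Bitrates are in Mbps, rebuffer_time in seconds; weights are illustrative."""
    decline = max(prev_bitrate - bitrate, 0.0)   # (x)_+ = max(x, 0)
    return w1 * bitrate - w2 * decline - w3 * rebuffer_time
```

Note that an increase in quality incurs no penalty at all, which is the asymmetry the paragraph above motivates.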
Note that the above QoE metric is independent of CBA; the bandit is only given the scalar result of the calculation. CBA is able to take arbitrary QoE metrics as input as long as they comprise a real-valued function producing the reward.
VI Evaluation of Quality Adaptation in NDN
To evaluate the performance of CBA and compare it with throughput-based (TBA) and buffer-based (BBA) adaptation peers, we emulate two NDN topologies: the doubles topology, shown in Fig. 9, and the full topology, shown in Fig. 10. The topologies are built using an extension of the Containernet project (https://github.com/containernet/containernet), which allows the execution of Docker hosts as nodes in the Mininet emulator.
The NDN clients use a DASH player implemented with libdash, based on the code from [2], with fixed Interest Control Protocol (ICP) parameters. We note that traffic burstiness can vary significantly depending on the ICP parameters used.
The clients begin playback simultaneously and stream the first 200 seconds of the Big Buck Bunny video, encoded in two-second H.264/AVC segments offered at the quality bitrates {1, 1.5, 2.1, 3, 3.5} Mbps, with a playback buffer size of 30 seconds. All containers run instances of the NDN Forwarding Daemon (NFD) with the access strategy, and repo-ng is used to host the video on the servers and caches.
In the following, we compare the performance of CBA in the VB and OS-SVI variants, in addition to the baseline algorithm LinUCB [7]. We also examine the performance of two state-of-the-art BBA and TBA algorithms, i.e., BOLA [15] and PANDA [14], respectively. There are many adaptation algorithms in the literature, some of which use BBA and TBA simultaneously, including [28], [29], [30], and [31]; however, BOLA and PANDA were chosen because they are widely used and achieve state-of-the-art performance in standard HTTP environments. The buffer filling percentage and quality-specific segment packet RTTs are provided to the client as context. Furthermore, we added a numHops tag to each Data packet to track the number of hops from the Data origin to the consumer.
We track the RTTs and number of hops of the last 50 packets of each segment received by the client, in accordance with measurements from [32]. If a segment does not contain 50 packets, results from the existing packets are resampled. As a result, each CBA algorithm is given a high-dimensional context vector constituted of the buffer fill percentage, the packet RTTs, and the numHops values for each of the available qualities.
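Assembling the context vector described above might look as follows; the flat layout, the resulting 501-entry dimension, and the helper names are our assumptions:

```python
import random

NUM_QUALITIES = 5
NUM_PACKETS = 50   # packets sampled per segment, following the setup above

def resample(values, n):
    """Pad a short measurement list by resampling its existing entries."""
    rng = random.Random(0)
    return values + [rng.choice(values) for _ in range(n - len(values))]

def build_context(buffer_fill, rtts_per_quality, hops_per_quality):
    """Concatenate the buffer fill with 50 RTTs and 50 hop counts per quality
    into one flat vector (1 + 2 * 50 * 5 = 501 entries in this sketch)."""
    ctx = [buffer_fill]
    for q in range(NUM_QUALITIES):
        ctx += resample(rtts_per_quality[q], NUM_PACKETS)
        ctx += resample(hops_per_quality[q], NUM_PACKETS)
    return ctx

ctx = build_context(0.8, [[10.0, 12.0]] * NUM_QUALITIES, [[2]] * NUM_QUALITIES)
```

Vectors of this size are exactly where the sparsity-inducing prior pays off: most of the hundreds of entries carry little information for any single decision.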
Tab. 1: Results on the doubles topology.

Algorithm | Avg. quality bitrate [Mbps] | Quality switches | Avg. switch magnitude [Mbps] | Parameter update time [ms]
CBA-OS-SVI | 3.10 | 6 | 0.57 | 15
CBA-VB | 2.58 | 6 | 0.65 | 325
LinUCB | 2.24 | 14 | 1.07 | 6
BOLA | 2.63 | 36 | 1.19 | n/a
PANDA | 2.51 | 16 | 1.00 | n/a
VI-A Results on the Doubles Topology
We modulate the capacity of the bottleneck link using truncated normal distributions. The link capacity is drawn with a mean of 7 Mbps and stays unchanged for a period whose length is drawn with a mean of 5 s. The weights in Eq. 17 are set to , , and , emphasizing the importance of the average quality bitrate without allowing a large amount of rebuffering to take place. We note that the use of subjective quality evaluation tests for different users to map these weights to QoE metrics via, e.g., the mean opinion score (MOS), is out of the scope of this work.
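A piecewise-constant capacity trace of this kind can be sketched as follows. Only the 7 Mbps capacity mean and 5 s period mean are given above; the standard deviations and truncation bounds in this sketch are assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def capacity_trace(duration=200.0, cap_mean=7.0, cap_sd=2.0,
                   period_mean=5.0, period_sd=2.0, seed=0):
    """Sample a piecewise-constant bottleneck capacity: each level is a
    truncated-normal draw (mean 7 Mbps), held for a random period
    (mean 5 s). Returns a list of (start_time, capacity_Mbps) pairs."""
    rng = np.random.default_rng(seed)
    lo, hi = 1.0, 13.0  # assumed truncation bounds around the mean
    a, b = (lo - cap_mean) / cap_sd, (hi - cap_mean) / cap_sd
    t, trace = 0.0, []
    while t < duration:
        cap = truncnorm.rvs(a, b, loc=cap_mean, scale=cap_sd,
                            random_state=rng)
        trace.append((round(t, 2), round(cap, 2)))
        # hold the level for a random period, floored to stay positive
        t += max(0.5, rng.normal(period_mean, period_sd))
    return trace
```

In the emulation, each (time, capacity) pair would be applied to the bottleneck link, e.g., via Mininet's traffic-control link options.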
Examining Tab. 1, we see that the one-step CBA-OS-SVI yields a significantly higher average bitrate. This is expected based on the QoE definition (17), but we might expect CBA-VB to pick high bitrates as well. However, we observe that the parameter update time for CBA-VB is 20 times greater than that of CBA-OS-SVI; this puts a delay of one-sixth of each segment length, on average, between receiving one segment and requesting another. Looking at Fig. 11, we see that CBA-VB accumulates a much larger rebuffer time than the other methods. Hence, CBA-VB is forced to request lower bitrates to cope with the extra rebuffer time incurred by updating its parameters. In addition, note that LinUCB fails to select high bitrates despite having a very small parameter update time, implying that LinUCB is not adequately fitting the context to the QoE and is instead accumulating a large amount of regret. This is corroborated by its cumulative QoE depicted in Fig. 11, which is nearly as poor as that of CBA-VB. By inducing sparsity on the priors and using just one sample, CBA-OS-SVI successfully extracts the most salient features quickly enough to obtain the highest cumulative QoE of all algorithms tested.
Interestingly, the CBA approaches also result in the lowest number of quality switches (Tab. 1), even though our QoE metric does not severely penalize quality variation. We see that the magnitude of their quality switches is also nearly half that of the other algorithms.
Concerning the rebuffering behavior, we observe rebuffering ratios of {4.5%, 8.4%, 11.4%, 17.6%, 32.9%} for LinUCB, BOLA, PANDA, CBA-OS-SVI, and CBA-VB, respectively. We trace some of the rebuffering events to the ICP congestion control in NDN. Note that tuning the impact of rebuffering on the adaptation decision is not a trivial task [2]. Fortunately, this is not hardwired in CBA but rather given through (17). Hence, in contrast to state-of-the-art adaptation algorithms, CBA could learn to filter the contextual information that is most important for rebuffering by tweaking the QoE metric used.
An important consideration when choosing a quality adaptation algorithm is fairness among clients simultaneously streaming over common links. While this is taken care of in DASH by the underlying TCP congestion control, we empirically show here how the ON-OFF segment request behavior, when paired with the considered quality adaptation algorithms, impacts QoE fairness in NDN. This is fundamentally different from considering bandwidth sharing fairness in NDN, e.g., in [2]. Here we are interested in QoE fairness since the QoE metric, and not the bandwidth share, is the main driver of the quality adaptation algorithm. Fig. 12 shows the regret of QoE fairness between both clients, where a larger regret indicates a greater difference in QoE between both clients up to a particular segment. Note that the regret is defined as a cumulative metric similar to (1). In accordance with the discussion in [33], the fairness measure used here is the entropy of the relative QoE of the two clients, where denotes the binary entropy and the QoE is given by (17). The regret is calculated with respect to the optimal fairness of . Observe that the CBA algorithms attain a significantly lower QoE fairness regret than the other techniques.
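The fairness regret can be sketched as follows, reading the optimal fairness as the maximum binary entropy of 1 (i.e., equal QoE shares); the function names are illustrative.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy H(p) in bits, with clipping for numerical safety."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def fairness_regret(qoe_a, qoe_b):
    """Cumulative QoE-fairness regret between two clients: per segment,
    fairness is the binary entropy of the relative QoE share, and regret
    accumulates the gap to the optimum (entropy 1 at equal QoE)."""
    qoe_a = np.asarray(qoe_a, dtype=float)
    qoe_b = np.asarray(qoe_b, dtype=float)
    share = qoe_a / (qoe_a + qoe_b)       # relative QoE of client A
    return np.cumsum(1.0 - binary_entropy(share))
```

When both clients obtain identical per-segment QoE, the share is 0.5, the entropy is 1, and the regret stays at zero; any imbalance makes the regret grow monotonically, matching the cumulative curves in Fig. 12.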
VI-B Results on the Full Topology
To evaluate the capacity of CBA to adapt to different reward functions in complex environments, we compare performance with the full topology on two sets of weights in Eq. 17: HIGH_QUALITY_WEIGHTS sets , , and , identical to those used in the evaluation on the doubles topology; conversely, NO_REBUFFERING_WEIGHTS sets , , and , placing greater importance on continuous playback at the expense of video quality. We evaluate each algorithm with each weighting scheme for 30 epochs, where one epoch corresponds to streaming 200 seconds of the BigBuckBunny video. All clients use the same adaptation algorithm and weighting scheme within an epoch, and bandits begin each epoch with no previous context information.
Inspecting Tab. 2, we observe that the performance statistics of the algorithms, even under different weighting schemes, are much closer than on the doubles topology. We attribute this to the use of a more complicated topology in which many more clients share network resources, leaving fewer and less predictable resources for each client. Furthermore, the average bitrate of the bandit algorithms does not change significantly across weighting schemes, and either stays the same or increases when using NO_REBUFFERING_WEIGHTS. This may seem contradictory, but analyzing part (a) of Figs. 13 and 14, we note that CBA-OS-SVI tended to choose much lower bitrates with NO_REBUFFERING_WEIGHTS than with HIGH_QUALITY_WEIGHTS, thereby accruing less rebuffer time in part (b), indicating that CBA-OS-SVI successfully adapted to either weighting scheme within the playback window. Similar to the doubles topology, LinUCB failed to map the context to either weighting scheme, selecting higher bitrates and rebuffering longer with NO_REBUFFERING_WEIGHTS. Note that, for both CBA-OS-SVI and LinUCB, the cumulative rebuffer time in part (b) of Figs. 13 and 14 tapers off roughly halfway through the video, as either algorithm learns to request more appropriate bitrates.
Interestingly, CBA-VB also fails to adapt to either weighting scheme, performing nearly identically in both cases. This is a byproduct of the excessive parameter update time of CBA-VB in Tab. 2, which stems from the unpredictable nature of a larger network and the computational strain of performing up to 7 CBA-VB parameter updates simultaneously on the test machine. CBA-VB therefore spends over half of each segment length deciding which segment to request next, causing the long rebuffering times in part (b) of Figs. 13 and 14 and culminating in very low QoE scores regardless of the weighting scheme used. This obfuscates the underlying QoE function, preventing CBA-VB from differentiating between the weights in either case within the time allotted. In a real-world scenario, where each client is an independent machine, we expect that CBA-VB, as well as CBA-OS-SVI and LinUCB to a lesser extent, would have parameter update times comparable to those on the doubles topology, resulting in better performance; we note, however, that evaluation in such an environment is out of the scope of this work.
Again, we see in Tab. 2 that CBA-OS-SVI switches qualities least frequently, despite neither weighting scheme explicitly penalizing quality variation. Furthermore, according to parts (c) and (d) of Figs. 13 and 14, CBA-OS-SVI and CBA-VB are both stable in the number of quality switches and the quality switch magnitude across epochs, even under different weighting schemes, as opposed to the other algorithms tested.
Tab. 2: Results on the full topology.

Algorithm | Avg. quality bitrate [Mbps] | Quality switches | Avg. switch magnitude [Mbps] | Parameter update time [ms]

HIGH_QUALITY_WEIGHTS
CBA-OS-SVI | 1.55 | 5 | 0.82 | 53
CBA-VB | 1.52 | 15 | 1.16 | 1254
LinUCB | 1.27 | 17 | 1.01 | 11
BOLA | 1.96 | 8 | 0.63 | n/a
PANDA | 1.15 | 18 | 0.56 | n/a

NO_REBUFFERING_WEIGHTS
CBA-OS-SVI | 1.55 | 6 | 0.93 | 55
CBA-VB | 1.68 | 12 | 1.08 | 1362
LinUCB | 1.43 | 22 | 1.04 | 16
BOLA | 1.92 | 12 | 0.71 | n/a
PANDA | 1.13 | 17 | 0.70 | n/a
VII Conclusions and Future Work
In this paper, we contributed a sparse Bayesian contextual bandit algorithm for quality adaptation in adaptive video streaming, denoted CBA. In contrast to state-of-the-art adaptation algorithms, we take high-dimensional video streaming context information and enforce sparsity to shrink the impact of unimportant features. In this setting, streaming context information includes client-measured variables, such as throughput and buffer filling, as well as network assistance information. Since sparse Bayesian estimation is computationally expensive, we developed a fast new inference scheme to support online video quality adaptation. Furthermore, the provided algorithm is naturally applicable to different adaptive video streaming settings, such as DASH over NDN. Finally, we provided NDN emulation results showing that CBA yields higher QoE and better QoE fairness between simultaneous streaming sessions compared to throughput- and buffer-based video quality adaptation algorithms.
Appendix A The Generalized Inverse Gaussian
The probability density function of a generalized inverse Gaussian (GIG) distribution with parameters $a, b > 0$ and $p \in \mathbb{R}$ is

$$p(x \mid a, b, p) = \frac{(a/b)^{p/2}}{2 K_p(\sqrt{ab})}\, x^{p-1} \exp\!\left(-\frac{ax + b/x}{2}\right), \qquad x > 0, \tag{18}$$

where $K_p$ denotes the modified Bessel function of the second kind.
The GIG distribution with parameters $(a, b, p)$ is a member of the exponential family with base measure $h(x) = 1$, natural parameters $\eta = (p - 1,\, -a/2,\, -b/2)$, sufficient statistics $T(x) = (\log x,\, x,\, 1/x)$, and log-normalizer $A(\eta) = \log 2 K_p(\sqrt{ab}) - \frac{p}{2} \log(a/b)$. The inverse transform of the natural parameters is obtained by $p = \eta_1 + 1$, $a = -2\eta_2$, $b = -2\eta_3$.
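As a quick sanity check of the parametrization in (18), the density can be integrated numerically; this is a sketch using SciPy, with arbitrary parameter values.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv  # modified Bessel function of the second kind

def gig_pdf(x, a, b, p):
    """GIG density in the (a, b, p) parametrization of Eq. (18)."""
    norm = (a / b) ** (p / 2) / (2.0 * kv(p, np.sqrt(a * b)))
    return norm * x ** (p - 1.0) * np.exp(-(a * x + b / x) / 2.0)

# The density should integrate to one for any a, b > 0 and real p.
total, _ = quad(gig_pdf, 0.0, np.inf, args=(2.0, 3.0, 1.5))
```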
Appendix B Calculation of the ELBO
Here, we present the calculation for the ELBO. The joint distributions involved in the calculation of the evidence lower bound (8) factorize as
(19) 
and
(20) 
Denoting as the expectation w.r.t. the distribution , the evidence lower bound (8) is
(21) 
The expected values of the log factorized joint distribution (19) needed for (21) are
(22) 
The expected values of the log factorized variational distribution (20) compute to
(23) 