Degenerate Feedback Loops in Recommender Systems

Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli
DeepMind London, UK

Machine learning is used extensively in recommender systems deployed in products. The decisions made by these systems can influence user beliefs and preferences, which in turn affect the feedback the learning system receives, thus creating a feedback loop. This phenomenon can give rise to the so-called “echo chambers” or “filter bubbles” that have user and societal implications. In this paper, we provide a novel theoretical analysis that examines both the role of user dynamics and the behavior of recommender systems, disentangling the echo chamber from the filter bubble effect. In addition, we offer practical solutions to slow down system degeneracy. Our study contributes toward understanding and developing solutions to commonly cited issues in the complex temporal scenario, an area that is still largely unexplored.


Copyright © 2019, Association for the Advancement of Artificial Intelligence ( All rights reserved.

Introduction

Recommender systems are increasingly used to provide users with personalized product and information offerings (Ben Schafer, Konstan, and Riedl, 2001; Lu et al., 2015; Covington, Adams, and Sargin, 2016). These systems use a user's personal characteristics and past behavior to generate a list of items individually tailored to the user's preferences. Whilst these systems are extremely successful commercially, there are growing concerns that they might lead to a self-reinforcing pattern of narrowing exposure and shifting user interests, problems that are often referred to in the literature as “echo chamber” and “filter bubble”. A significant amount of research has therefore been devoted to ways of favoring diversity in the set of items an individual may be exposed to (see Kunaver and Požrl (2017) for a review). However, current understanding of the echo chamber and filter bubble effects is limited, and experimental analyses report conflicting results.

In this paper, we define as echo chamber the effect of a user's interest being positively or negatively reinforced by repeated exposure to a certain item or category of items, thereby generalizing the definition in Sunstein (2009), where the term refers to over- and limited exposure to similar political opinions reinforcing one's existing beliefs. We narrow the definition of filter bubble introduced by Pariser (2011) to describe only the fact that recommender systems select limited content to serve users online. We provide a theoretical treatment that allows us to consider the echo chamber and filter bubble effects separately. We view a user's interest as a dynamical system and treat interest extremes as degeneracy points of the system. We consider different models of dynamics and identify sets of sufficient conditions that make them degenerate over time. We then use this analysis to understand the role played by the recommender system. Finally, we showcase the interplay between the user's dynamics and the recommender system's actions in a simulation study using synthetic data and several classic bandit algorithms. The results reveal several pitfalls of recommender system design and point towards mitigation strategies.

Related Work

Through an analysis of the MovieLens dataset, Nguyen et al. (2014) found that the diversity of the items recommended, and of those users engage with, narrows over time. The paper asks whether there is a “natural” tendency of degeneration in user interest. Our paper takes steps toward answering this question by providing theoretical conditions for user interest degeneracy.

In the social sciences literature, Flaxman, Goel, and Rao (2016) found that online services are associated with increased political polarization between users as well as increased exposure to the less preferred side of political opinions. These seemingly counter-intuitive findings are not contradictory according to our results: systems with some level of random exploration can still be degenerative. Barberá et al. (2015) also presented evidence of echo chambers related to political issues on Twitter. On the other hand, Borgesius et al. (2016); Beam, Hutchens, and Hmielowski (2018); and Nechushtai and Lewis (2018) found counter-evidence in online news consumption. Bakshy, Messing, and Adamic (2015) measured the effect of user choices separately from that of the recommendation algorithm, and found that individual choices play a larger role than the algorithm in creating echo chambers on Facebook. This supports our viewpoint that whether user interests degenerate depends on their internal dynamics; the recommender system can only slow down or accelerate the process of degeneration.


We consider a recommender system that interacts with a user over time. (For simplicity, in this paper we restrict ourselves to the case of a single user, and leave the case of multiple users possibly influencing each other's interests to future work.) At every time step $t$, the recommender system serves $k$ items (or categories of items, e.g. news articles, videos, or consumer products) to a user from a finite or countably infinite item set $\mathcal{M}$. (Throughout the paper, “items” also means categories of items.) In general, the goal of the system is to present items that the user is interested in: we assume that, at time step $t$, the user's interest in an item $a \in \mathcal{M}$ is described by a function $\mu_t : \mathcal{M} \to \mathbb{R}$ such that $\mu_t(a)$ is large (positive) if the user is interested in the item, and small (negative) if she is not. (Whilst we focus on $\mu_t : \mathcal{M} \to \mathbb{R}$, we show in Remark 5, Appendix B that our results can be extended to the case where $\mu_t(a)$ belongs to a bounded open interval.)

Given a recommendation $a_t = (a_t^1, \dots, a_t^k) \in \mathcal{M}^k$, the user provides some feedback $c_t$ based on her current interests $\mu_t(a_t^1), \dots, \mu_t(a_t^k)$. This interaction has multiple effects. In the traditional recommender systems literature, the feedback $c_t$ is used to update the internal model $\theta_t$ of the recommender system that was used to obtain the recommendation $a_t$, and the new model $\theta_{t+1}$ may depend on $\theta_t$, $a_t$, and $c_t$. In practice, $\theta_t$ usually predicts the distribution of user feedback to determine which items should be presented to the user. In this paper we focus on another effect and explicitly consider that the user's interaction with the recommender system may change her interest in different items for the next interaction, so the interest $\mu_{t+1}$ may depend on $\mu_t$, $a_t$, and $c_t$. The full model of interaction is depicted in Fig. 1.



Figure 1: Model of interaction between a recommender system and user over time. Continuous and dashed links indicate existing or possible dependencies, respectively.

We are interested in studying the evolution of the user's interest. An example of such an evolution is that the interest is reinforced by user interactions with the recommended items, that is, $\mu_{t+1}(a) > \mu_t(a)$ if the user clicks on an item $a$ at time step $t$, while $\mu_{t+1}(a) < \mu_t(a)$ if $a$ is shown but not clicked (here $c_t$ can be defined as the indicator vector of clicks on the corresponding items).

To analyze the echo chamber or filter bubble effect, we are interested in understanding when the user's interest changes drastically, which, in our model, translates to $\mu_t(a)$ taking values arbitrarily different from the initial interest $\mu_0(a)$: large positive values indicate that the user becomes extremely interested in item $a$, while large negative values indicate that the user dislikes $a$. Formally, for a finite item set $\mathcal{M}$, we can ask whether the norm $\|\mu_t - \mu_0\|_2$ can grow arbitrarily large: the user's interest sequence $(\mu_t)_{t \ge 0}$ is called weakly degenerate if

$$\limsup_{t \to \infty} \|\mu_t - \mu_0\|_2 = \infty. \tag{1}$$
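A stronger notion of degeneracy, which also requires that once $\mu_t$ has drifted away from $\mu_0$ it remains so, is strong degeneracy: the sequence $(\mu_t)_{t \ge 0}$ is strongly degenerate if

$$\lim_{t \to \infty} \|\mu_t - \mu_0\|_2 = \infty. \tag{2}$$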
In the next section we show that weak or strong degeneracy occurs under mild sufficient conditions on the evolution dynamics of $\mu_t$.

There are multiple ways to extend the above definitions to the case of an infinite item set $\mathcal{M}$. For simplicity, we only consider here replacing $\|\cdot\|_2$ with $\|\cdot\|_\infty$ in Eqs. (1) and (2), which is equivalent to the original definitions when $\mathcal{M}$ is finite. (As such, we could have used $\|\cdot\|_\infty$ in our original definitions, but we prefer $\|\cdot\|_2$ as it also provides some information about the “average” deviation of the user's interest over the different items at any finite time $t$.)
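As a small illustration of the two distances, the following Python sketch computes $\|\mu_t - \mu_0\|_2$ and $\|\mu_t - \mu_0\|_\infty$ for a finite item set represented as a list of interest values, and checks the norm-equivalence inequality $\|x\|_\infty \le \|x\|_2 \le \sqrt{m}\,\|x\|_\infty$ that makes the two definitions coincide when there are $m < \infty$ items. The function names and the list representation are illustrative assumptions, and a finite trajectory can only hint at, never establish, that a $\limsup$ or $\lim$ is infinite.

```python
import math
from typing import Sequence


def l2_distance(mu_t: Sequence[float], mu_0: Sequence[float]) -> float:
    """||mu_t - mu_0||_2 for interests over a finite item set (one entry per item)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(mu_t, mu_0)))


def sup_distance(mu_t: Sequence[float], mu_0: Sequence[float]) -> float:
    """||mu_t - mu_0||_inf, the variant used for infinite item sets."""
    return max(abs(x - y) for x, y in zip(mu_t, mu_0))


def norms_equivalent(mu_t: Sequence[float], mu_0: Sequence[float]) -> bool:
    """Check sup <= l2 <= sqrt(m) * sup, so both distances diverge together for finite m."""
    m = len(mu_0)
    sup, l2 = sup_distance(mu_t, mu_0), l2_distance(mu_t, mu_0)
    return sup <= l2 <= math.sqrt(m) * sup + 1e-12


# Three items: the first drifts up by 3, the third drifts down by 4.
mu_0 = [0.0, 1.0, -1.0]
mu_t = [3.0, 1.0, -5.0]
print(l2_distance(mu_t, mu_0), sup_distance(mu_t, mu_0))  # 5.0 4.0
```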

User Interest Dynamics – Echo Chamber

As items often represent diverse categories of things, we make the simplifying assumption that they are independent of each other. By serving a fixed item $a$ at every time step (i.e., $a \in a_t$ for all $t$), we can remove the influence of the recommender system and consider the dynamics of the user's interest separately. This allows us to analyze the echo chamber effect: what happens to the interest $\mu_t(a)$ if item $a$ is served infinitely often (i.o.).

Since $a$ is fixed, to simplify the notation, we write $\mu_t$ instead of $\mu_t(a)$ in this section. Given $\mu_t$, according to Fig. 1, $\mu_{t+1}$ is a (possibly stochastic) function of $\mu_t$ (as $\mu_{t+1}$ depends on $\mu_t$ and $c_t$, and $c_t$ depends on $\mu_t$). Below we discuss the general case when the drift $\mu_{t+1} - \mu_t$ is a nonlinear stochastic function; deterministic models for the drift are considered in Appendix B.

Nonlinear Stochastic Model.

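We assume that $\mu_0$ is fixed and that $\mu_{t+1} = \mu_t + f(\mu_t, \xi_t)$, where $(\xi_t)_{t \ge 0}$ is an infinite sequence of independent random variables, uniformly distributed on $[0, 1]$, that introduce noise into the system (i.e. $f(\mu_t, \xi_t)$ is a stochastic function of $\mu_t$). The function $f : \mathbb{R} \times [0, 1] \to \mathbb{R}$ is assumed to be measurable, but otherwise arbitrary. Denoting the uniform distribution over $[0, 1]$ by $\mathcal{U}$, let

$$m(\mu) = \mathbb{E}_{\xi \sim \mathcal{U}}\left[f(\mu, \xi)\right]$$

be the expected increment when $\mu_t = \mu$. We also define

$$F(\mu, x) = \mathbb{P}_{\xi \sim \mathcal{U}}\left(f(\mu, \xi) \le x\right)$$

to be the cumulative distribution function of the increment. The asymptotic behavior of $(\mu_t)$ depends on $f$, but under mild assumptions the system degenerates weakly (Theorem 1) or strongly (Theorem 2). The proofs of these theorems are given in Appendix A.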
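To make $m(\mu)$ and $F(\mu, x)$ concrete, the Python sketch below estimates both by Monte Carlo for one illustrative choice of $f$: the click-driven increment $f(\mu, \xi) = \delta$ if $\xi < \sigma(\mu)$ and $-\delta$ otherwise, with $\sigma$ the sigmoid function. This choice of $f$ and the step size $\delta = 0.1$ are assumptions made for the example. For this $f$, $m(\mu) = \delta(2\sigma(\mu) - 1)$ and $F(\mu, 0) = 1 - \sigma(\mu)$ in closed form, which the estimates should approach.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def click_increment(mu: float, xi: float, delta: float = 0.1) -> float:
    """Illustrative f(mu, xi): +delta if the uniform draw xi yields a click, else -delta."""
    return delta if xi < sigmoid(mu) else -delta


def expected_increment(f, mu: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of m(mu) = E_{xi ~ U[0,1]}[f(mu, xi)]."""
    rng = random.Random(seed)
    return sum(f(mu, rng.random()) for _ in range(n)) / n


def increment_cdf(f, mu: float, x: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of F(mu, x) = P_{xi ~ U[0,1]}(f(mu, xi) <= x)."""
    rng = random.Random(seed)
    return sum(f(mu, rng.random()) <= x for _ in range(n)) / n


mu = 2.0
print(expected_increment(click_increment, mu), 0.1 * (2 * sigmoid(mu) - 1))
print(increment_cdf(click_increment, mu, 0.0), 1 - sigmoid(mu))
```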

Theorem 1 (weak degeneracy).

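Assume that $F(\cdot, x)$ is continuous for every $x \in \mathbb{R}$ and there exists a $\mu_c \in \mathbb{R}$ such that 1) $F(\mu, 0) < 1$ (i.e. the increment is positive with non-zero probability) for all $\mu \ge \mu_c$, 2) $\lim_{x \uparrow 0} F(\mu, x) > 0$ (i.e. the increment is negative with non-zero probability) for all $\mu \le \mu_c$. Then the sequence $(\mu_t)$ is weakly degenerate, i.e. $\limsup_{t \to \infty} |\mu_t - \mu_0| = \infty$ almost surely.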

The assumptions guarantee that within any closed bounded interval there is a constant probability that the random walk escapes to the left/right when starting to the left/right of respectively. Under stronger conditions it is possible to guarantee the divergence of the random walk. We state a simple version of the theorem, but note that the result can be generalized in many ways.

Theorem 2 (strong degeneracy).

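Assume that the conditions of Theorem 1 hold, and additionally that there exists $B > 0$ such that $|f(\mu, \xi)| \le B$ almost surely, and there exists an $\epsilon > 0$ such that for all sufficiently large $\mu$ it holds that $m(\mu) \ge \epsilon$, and for all sufficiently small $\mu$ it holds that $m(\mu) \le -\epsilon$. Then $\lim_{t \to \infty} \mu_t = \infty$ or $\lim_{t \to \infty} \mu_t = -\infty$ almost surely.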

Intuitively, weak degeneracy occurs in a stochastic environment if the user's interest has some non-zero probability of drifting up when above some threshold, and of drifting down when below it. Strong degeneracy holds if, additionally, the increment $f$ is bounded and, for sufficiently large/small $\mu$, the expected increment $m(\mu)$ is positive/negative and bounded away from zero by a constant.

Theorems 1 and 2 show that the user's interest degenerates under very mild conditions, which hold, in particular, for the model we consider in our simulation studies. Thus, in such cases degeneracy can only be avoided if an item (or item category) is shown only finitely many times; otherwise one can only hope to control how fast $\mu_t$ degenerates (i.e. how fast $|\mu_t|$ tends to $\infty$).
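The following Python sketch simulates a single item served at every step under an illustrative click-driven increment: $+\delta$ on a click, which happens with probability $\sigma(\mu_t)$, and $-\delta$ otherwise. This increment is bounded by $\delta$, and its expected value $\delta(2\sigma(\mu) - 1)$ is bounded away from zero for large $|\mu|$, so it meets the kind of conditions Theorem 2 asks for. The parameters ($\delta = 0.1$, 20,000 steps, the seeds) are arbitrary choices for the example. Across seeds, $\mu_t$ should end up far from $\mu_0 = 0$, drifting towards either $+\infty$ or $-\infty$.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for very negative x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_served_item(mu_0: float = 0.0, delta: float = 0.1,
                         steps: int = 20_000, seed: int = 0) -> List[float]:
    """Trajectory of mu_t for one item served at every step (illustrative dynamics)."""
    rng = random.Random(seed)
    mu = mu_0
    trajectory = [mu]
    for _ in range(steps):
        clicked = rng.random() < sigmoid(mu)  # click with probability sigmoid(mu_t)
        mu += delta if clicked else -delta    # reinforce on a click, weaken otherwise
        trajectory.append(mu)
    return trajectory


for seed in range(5):
    final = simulate_served_item(seed=seed)[-1]
    print(seed, round(final, 1))  # |final| is typically in the hundreds or more
```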

System Design Role – Filter Bubble

In the previous section we discussed conditions for degeneracy under different user interest dynamics. In this section we examine the other side of the story: the influence of recommender system actions in creating filter bubbles. We typically do not know the dynamics of the user's interest in the real world. However, we consider the scenario relevant to the echo chamber/filter bubble problem, in which the user's interest in some items has degenerative dynamics, and examine how to design a recommender system that slows down the degeneracy process. We consider three dimensions, namely model accuracy, amount of exploration, and growing candidate pool.

Model Accuracy.

One common goal of recommender system designers is to increase the prediction accuracy of the internal model $\theta_t$. How does model accuracy coupled with greedily optimal action selection affect the speed of degeneration? We examine this question for the extreme case of exact predictions, and call such a prediction model the oracle model. We argue that, under the surfacing assumption explained below, the oracle model coupled with greedily optimal action selection results in the quickest degeneracy.

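In order to analyze the problem concretely, we focus on the degenerate linear dynamics model for $\mu_t(a)$, $a \in \mathcal{M}$, in which every time $a$ is served, $\mu_{t+1}(a) = (1 + \alpha(a))\mu_t(a) + \beta(a)$ with $\alpha(a) > 0$. Then we can solve for $\mu_t(a)$, obtaining

$$\mu_t(a) = \mu^*(a) + \left(1 + \alpha(a)\right)^{n_t(a)} \left(\mu_0(a) - \mu^*(a)\right), \qquad \mu^*(a) = -\beta(a)/\alpha(a),$$

where $n_t(a)$ is the number of times $a$ has been served before time $t$ (see Appendix B).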

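Surfacing Assumption: Let $\mathcal{M}$ be the candidate set of size $m$. If a subset $\mathcal{M}' \subseteq \mathcal{M}$ of items leads to positive degenerate dynamics (i.e. $\alpha(a) > 0$ and $\mu_0(a) > \mu^*(a)$ for all $a \in \mathcal{M}'$), then we assume that there exists a time $T$ such that, for all $t \ge T$, the items of $\mathcal{M}'$ take up the top positions in terms of $\mu_t(a)$, sorted by the base of the exponential function, $1 + \alpha(a)$.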

The surfacing assumption ensures that the quickest degenerating items surface to the top of the list given enough exposure time. It can be generalized to nonlinear stochastic dynamics of $\mu_t$ provided that the items in $\mathcal{M}'$ have an almost surely stable ordering of degeneracy speed over time.

Under the general surfacing assumption, after time $T$, the quickest path to degeneration is to serve the top $k$ items according to $\mu_t$, or equivalently according to $\theta_t$ of the oracle model. Even if the assumption is violated to some degree, the oracle model still leads to degeneracy very efficiently: it picks the top $k$ items according to $\mu_t$, which are likely to receive positive feedback due to their high $\mu_t$, thereby increasing $\mu_t$ further and reinforcing past choices.

In practice, recommender system models are inaccurate. We can think of inaccurate models as the oracle model with different levels of noise added to $\theta_t$. We discuss inaccurate models in the next section.

Amount of Exploration.

Consider a type of $\epsilon$-random exploration where $a_t$ always consists of the top $k$ items out of a finite candidate pool according to $\theta_t$ perturbed by uniform noise, whose magnitude is controlled by $\epsilon$.
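This selection rule can be sketched in Python as follows: add independent uniform noise to each item's score $\theta_t(a)$ and serve the $k$ highest noisy scores. The noise range $[-\epsilon, \epsilon]$ and tie-breaking by item index are assumptions made for the sketch. With $\epsilon = 0$ it reduces to greedy (oracle-style) top-$k$ selection, and as $\epsilon$ grows the choice approaches uniform random serving.

```python
import random
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores; ties broken by the smaller index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def noisy_top_k(theta: Sequence[float], k: int, eps: float,
                rng: random.Random) -> List[int]:
    """Serve the top-k items of theta + U[-eps, eps] noise (noise range is an assumption)."""
    noisy = [score + rng.uniform(-eps, eps) for score in theta]
    return top_k(noisy, k)


rng = random.Random(0)
theta = [0.1, 2.0, -1.0, 1.5, 0.3]
print(noisy_top_k(theta, k=2, eps=0.0, rng=rng))    # [1, 3]: plain greedy top-k
print(noisy_top_k(theta, k=2, eps=100.0, rng=rng))  # close to a uniformly random pair
```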

Given the same model sequence $(\theta_t)$, the bigger $\epsilon$ is, the slower the system usually degenerates. However, in practice $\theta_t$ is learned from observations, and random exploration added to an oracle model may in fact accelerate degeneration: random exploration can help reveal the most positively degenerating items over time, making the surfacing assumption more likely to hold (we show this phenomenon in the simulation experiments below, Fig. 17). In addition, if user interests have degenerative dynamics, even recommending items uniformly at random leads to degeneration, albeit quite slowly.

How, then, do we make sure that the recommender system does not make user interests degenerate? One way is to limit the number of times an item with degenerative interest dynamics is served to the user. In practice it is hard to detect which items correspond to degenerative dynamics; however, we can generally prevent degeneration if all items are served only a finite number of times, which suggests having an ever-growing pool of candidate items.

Figure 12: Echo chamber and filter bubble effects for the Optimal Oracle, Oracle, TS, UCB and Random models. Sorted user interest and serving rates are plotted every 500 steps. Under all models except the Random Model, both the top items served and the top user interests very quickly narrow down to the $k = 5$ most positively reinforced items.

Growing Candidate Pool.

With a growing candidate pool, at every time step an additional set of new items becomes available to be served to the user. Hence the domain of the function $\mu_t$ expands as $t$ increases. Adding new items at least linearly often is a necessary condition to avoid possible degeneration: in a finite or any sublinearly growing candidate pool, by the pigeonhole principle, there must exist at least one item that is served i.o., which degenerates in the worst-case scenario (and also under general conditions, e.g. those described in Theorem 2). However, with an at least linearly growing candidate pool, the system can potentially cap the number of times any item is served to a user and prevent degeneration.
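The counting argument and the linear-growth remedy can be illustrated with a short Python sketch. With a fixed pool of $m$ items and $k$ distinct items served per step, some item must be served at least $\lceil kT/m \rceil$ times within $T$ steps, a bound that grows without limit in $T$. In contrast, if $k$ new items join the pool at every step (linear growth), a hypothetical schedule that always serves the newest items serves every item exactly once. This schedule ignores relevance entirely and only illustrates the counting argument; it is not a proposed recommender.

```python
import math
from collections import Counter


def pigeonhole_lower_bound(pool_size: int, k: int, steps: int) -> int:
    """Smallest achievable maximum serving count when k distinct items are
    served per step from a fixed pool of pool_size items for `steps` steps."""
    return math.ceil(k * steps / pool_size)


def serve_newest_items(initial_pool: int, k: int, steps: int) -> Counter:
    """Hypothetical linear-growth schedule: k new items are added every step and
    exactly those k items are served, so no item is ever served twice."""
    counts = Counter()
    next_id = initial_pool  # ids 0..initial_pool-1 form the initial pool
    for _ in range(steps):
        new_items = range(next_id, next_id + k)
        next_id += k
        counts.update(new_items)
    return counts


print(pigeonhole_lower_bound(pool_size=100, k=5, steps=1000))  # 50
counts = serve_newest_items(initial_pool=100, k=5, steps=1000)
print(max(counts.values()), len(counts))  # 1 5000
```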

Simulation Experiments

In this section, we consider a simple degenerative dynamics for and examine degeneration speed under five different recommender system models. We further demonstrate that adding new items to the candidate pool can be an effective solution against system degeneracy.

We create a simulation of the model of interaction between a recommender system and a user shown in Fig. 1. Consider a possibly growing candidate pool $\mathcal{M}_t$ of items, of initial size $m_0$ and of size $m_t$ at time step $t$. At each time step $t$, the recommender system picks the top $k$ out of the $m_t$ items according to the internal model $\theta_t$ and serves them to the user. The user considers each of the $k$ items independently and chooses to click on a (possibly empty) subset of them, thereby generating a binary vector $c_t$ of size $k$, where $c_t(a)$ gives the user feedback on item $a$, according to $c_t(a) \sim \mathrm{Bernoulli}(\sigma(\mu_t(a)))$, where $\sigma$ is the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$. The system then updates the model $\theta_{t+1}$ based on the past actions, feedback and the current model parameter $\theta_t$. We assume that the user's interest increases/decreases by $\delta(a)$ if the item receives/does not receive a click, i.e.

$$\mu_{t+1}(a) - \mu_t(a) = \begin{cases} \delta(a) & \text{if } c_t(a) = 1, \\ -\delta(a) & \text{if } c_t(a) = 0, \end{cases} \tag{3}$$

where the function $\delta$ maps from the candidate set to the positive reals. From Theorem 2, we know that the interest $\mu_t(a)$ is strongly degenerate for every item $a$ that is served infinitely often. In the experiment, the drift $\delta(a)$ is sampled from a uniform random distribution, and the user's initial interest $\mu_0(a)$ for all items is independently sampled from a uniform random distribution.
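A minimal Python sketch of this user model follows: each served item is clicked with probability $\sigma(\mu_t(a))$, and its interest moves by $+\delta(a)$ on a click and $-\delta(a)$ otherwise, as in the update rule above. Two points are assumptions of the sketch rather than statements from the text: items that are not served keep their interest unchanged, and the initialization ranges in the hypothetical `init_user` helper (interest uniform on $[-1, 1]$, drift uniform on $[0, 0.1]$) are placeholders, since the experiment's actual ranges are not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_user(m: int, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Placeholder initialization: mu_0(a) ~ U[-1, 1], delta(a) ~ U[0, 0.1]."""
    mu = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    delta = [rng.uniform(0.0, 0.1) for _ in range(m)]
    return mu, delta


def user_step(mu: List[float], served: Sequence[int], delta: Sequence[float],
              rng: random.Random) -> List[int]:
    """Generate clicks c_t for the served items and update mu in place.

    Unserved items are left unchanged (an assumption of this sketch)."""
    clicks = []
    for a in served:
        c = 1 if rng.random() < sigmoid(mu[a]) else 0  # c_t(a) ~ Bernoulli(sigmoid(mu_t(a)))
        mu[a] += delta[a] if c else -delta[a]          # +delta on a click, -delta otherwise
        clicks.append(c)
    return clicks


rng = random.Random(0)
mu = [100.0, -100.0, 0.0]  # near-certain click, near-certain skip, unserved
clicks = user_step(mu, served=[0, 1], delta=[0.5, 0.5, 0.5], rng=rng)
print(clicks, mu)  # [1, 0] [100.5, -100.5, 0.0]
```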

Figure 13: The system evolves for 5,000 time steps with report interval 500. The results are averaged over 30 runs, with the shaded area indicating the standard deviation. In terms of degeneracy speed, Optimal Oracle > Oracle > TS > UCB > Random.
Figure 16: (a) Degeneracy surfaces for the Optimal Oracle (grey), UCB (green) and TS (orange) up to time $T$ while varying the candidate pool size $m$. A larger candidate pool requires a longer exploration time for the bandit algorithms, but among the three models UCB slows down system degeneracy the most given a large candidate pool. (b) Degeneracy speeds at time $T$ of the Optimal Oracle and the Oracle are higher given a larger candidate set, but those of the Random Model, UCB, and TS are lower.
Figure 17: Degeneracy speed for the Oracle model with different noise levels $\epsilon$ up to time $T$. Adding noise to the Oracle accelerates degeneration, but as the noise level grows, degeneracy slows down.

The internal recommender system model is updated according to following five algorithms:

  • Random Model: Instead of picking the top $k$ items, the set of $k$ items $a_t$ is sampled from a uniform random distribution over the candidate set $\mathcal{M}_t$.

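  • Oracle: the internal model is the true user interest, $\theta_t = \mu_t$, and the system serves the top $k$ items according to $\theta_t$.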

  • Optimal Oracle: this model does not pick the top $k$ items according to $\mu_t$, but according to their speed of degeneration. Thus, it always picks the fastest degenerating items, thereby maximizing both the long-term user engagement and the degeneracy speed. For a fixed candidate pool $\mathcal{M}$, this model is equivalent to an Oracle that satisfies the Surfacing Assumption.

  • Upper Confidence Bound Multi-armed Bandit Algorithm (UCB) (Lai, 1987; Auer, Cesa-Bianchi, and Fischer, 2002; Lattimore and Szepesvári, 2019): We use the version of the UCB algorithm in Chapter 8 of Lattimore and Szepesvári (2019); however, most UCB algorithms perform similarly for the purpose of this experiment. The algorithm prioritizes serving any item from the candidate set that has never been served before. This applies to the initial items as well as to new items whenever they are added to the candidate pool. At time step $t$, UCB first serves the previously unserved items and fills the remaining slots with the top items according to $\theta_t$. Define $\phi(t) = 1 + t \log^2(t)$; we use the model update $\theta_t(a) = \hat{c}_t(a) + \sqrt{2 \log \phi(t) / n_t(a)}$, where $\hat{c}_t(a)$ is the empirical average of the feedback on item $a$, i.e. $\hat{c}_t(a) = \frac{1}{n_t(a)} \sum_{s < t : a \in a_s} c_s(a)$, and $n_t(a)$ is the number of times item $a$ has been served up to time $t$, i.e. $n_t(a) = \sum_{s < t} \mathbb{1}\{a \in a_s\}$.

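  • Thompson Sampling Multi-armed Bandit Algorithm (TS) (Thompson, 1933): We initialize a $\mathrm{Beta}(1, 1)$ prior, i.e. $(p(a), q(a)) = (1, 1)$, for any new item $a$. If $a$ is served at time $t$, we perform the update $(p(a), q(a)) \leftarrow (p(a) + c_t(a),\ q(a) + 1 - c_t(a))$. At any time $t$, the internal model $\theta_t(a)$ is sampled from the corresponding beta distribution $\mathrm{Beta}(p(a), q(a))$.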
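The two bandit models can be sketched in Python as score functions whose top-$k$ entries are served. The UCB scores use the index $\hat{c}_t(a) + \sqrt{2\log \phi(t) / n_t(a)}$ with $\phi(t) = 1 + t\log^2 t$, the form of the asymptotically optimal UCB index in Chapter 8 of Lattimore and Szepesvári (2019), and give never-served items an infinite score so they are served first. The TS scores are drawn from $\mathrm{Beta}(1 + \text{clicks}, 1 + \text{non-clicks})$, which assumes a uniform $\mathrm{Beta}(1, 1)$ prior. Function names and the list-based bookkeeping are illustrative assumptions, not the code used for the experiments.

```python
import math
import random
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores; ties broken by the smaller index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def ucb_scores(click_sums: Sequence[float], n_served: Sequence[int], t: int) -> List[float]:
    """UCB index per item at time t >= 1; unserved items get +inf (forced exploration)."""
    log_phi = math.log(1.0 + t * math.log(t) ** 2)  # log phi(t), phi(t) = 1 + t log^2 t
    scores = []
    for s, n in zip(click_sums, n_served):
        if n == 0:
            scores.append(math.inf)
        else:
            scores.append(s / n + math.sqrt(2.0 * log_phi / n))
    return scores


def thompson_scores(clicks: Sequence[int], skips: Sequence[int],
                    rng: random.Random) -> List[float]:
    """Sample theta_t(a) ~ Beta(1 + clicks(a), 1 + skips(a)) for every item."""
    return [rng.betavariate(1 + c, 1 + s) for c, s in zip(clicks, skips)]


# Item 0 was never served, item 1 has 3 clicks out of 4, item 2 has 1 click out of 2.
print(top_k(ucb_scores([0, 3, 1], [0, 4, 2], t=10), k=2))  # [0, 2]
print(thompson_scores([3, 1], [1, 1], random.Random(0)))
```

In this example the unserved item is ranked first, and the less-explored item 2 outranks item 1 despite its lower empirical click rate, because its confidence bonus is larger.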

Echo Chamber & Filter Bubble Effect

We examine the echo chamber and filter bubble effects by running the simulation on a candidate pool of fixed size $m$ with time horizon $T$.

In Fig. 12, we show the degeneration of user interest (left column) and the serving rate (right column) of every item as each recommender model evolves over time. The serving rate of an item is how often it is served within the report interval. To see the distribution clearly, we sort the items according to their plotted (z-axis) values at the report time. Although all models cause user interest degeneration, the degeneration speeds are quite different (Optimal Oracle > Oracle, TS, UCB > Random Model). The Oracle, TS and UCB optimize based on (estimates of) $\mu_t$, so we see positive degenerative dynamics for $\mu_t$. The Optimal Oracle optimizes the degeneration speed directly rather than $\mu_t$, so we see both positive and negative degeneration in $\mu_t$. The Random Model also drifts in both directions, but at a much slower rate. Overall, however, under all models except the Random Model, both the top items served and the top user interests very quickly narrow down to the $k = 5$ most positively reinforced items.

Speed of Degeneracy

Next, we compare the degeneracy speed of the five recommender system models on both fixed and growing candidate sets. As the distance $\|\mu_t - \mu_0\|_2$ that measures system degeneracy is asymptotically linear in $t$ for all five models (see Appendix C), we quantify degeneracy speed by comparing $\|\mu_t - \mu_0\|_2$ empirically in finite candidate pools for different experimental setups.
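Since $\|\mu_t - \mu_0\|_2$ grows roughly linearly in $t$ in these experiments, one simple way to summarize a run by a single speed is the least-squares slope of the distance against time. The Python sketch below computes that slope from distances recorded at report times. Using a least-squares slope, rather than the raw distance at the horizon, is our assumption for illustration and not necessarily how the figures were produced.

```python
from typing import Sequence


def degeneracy_speed(times: Sequence[float], distances: Sequence[float]) -> float:
    """Least-squares slope of ||mu_t - mu_0|| against t (distance units per step)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two report times")
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var


# Report every 500 steps up to 5,000; a synthetic distance growing as 0.02 * t + 1.
times = list(range(0, 5001, 500))
distances = [0.02 * t + 1.0 for t in times]
print(round(degeneracy_speed(times, distances), 6))  # 0.02
```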

Figure 13 shows the degeneracy speed of the five models averaged across 30 runs for a fixed candidate pool when the system evolves for 5,000 steps. We see that the Optimal Oracle results in by far the fastest degeneration, followed by the Oracle, TS and UCB. The Random Model yields the slowest degeneracy.

The Effect of Candidate Pool Size.

In Fig. 16(a) we compare the degeneracy speed of the Optimal Oracle, UCB and TS up to 5,000 time steps for different candidate pool sizes $m$. Apart from the Random Model, we see that UCB slows down system degeneracy the most given a large candidate pool, since it is forced to explore any unserved item first. A larger candidate pool requires a longer exploration time for the bandit algorithms. As the candidate pool size grows to 10,000, UCB's degeneracy speed never picks up within the time horizon, though it will eventually grow given a longer horizon. TS has a higher degeneracy speed due to its weaker exploration of new items. The Optimal Oracle accelerates degeneration given a larger pool, as it can pick items that degenerate faster than those available in a smaller pool.

Additionally, in Fig. 16(b) we plot the degeneracy speed of all five models at time $T$ against the same range of candidate pool sizes. The degeneracy speed of the Optimal Oracle and the Oracle increases with the size of the candidate set, but that of the Random Model, UCB, and TS decreases. In practice, a large candidate pool can be a temporary solution to slow down system degeneration.

The Effect of the Noise Level.

Next we show the influence of internal model inaccuracy on degeneracy speed. We compare the Oracle model with different amounts of uniformly random noise, i.e. the system serves the top $k$ items according to the internal model perturbed by uniform noise of level $\epsilon$. The candidate pool has fixed size $m$. In Fig. 17, we vary $\epsilon$ from 0 to 10. Counter-intuitively, adding noise to the Oracle accelerates degeneration, since faster-degenerating items may be selected by chance instead of the fixed set of top items ranked by $\mu_t$, which makes the Surfacing Assumption more likely to hold. Beyond a certain noise level, as expected, we see a monotonically increasing damping effect on degeneracy speed as the noise level grows.

Figure 18: Comparison of the five models with candidate pools growing at different rates $r$; degeneracy up to time $T$, averaged over 10 runs. Both the Oracle and the Optimal Oracle are degenerate for all growth rates. The Random Model and UCB stop degeneration at sublinear growth, while the TS model requires linear growth to stop degeneration.

Growing Candidate Pool.

We extend the definition of degeneracy speed to an infinite candidate pool by computing $\|\mu_t - \mu_0\|_\infty$ (see Appendix C for an asymptotic analysis). Since the degeneracy speed may not be asymptotically linear for all five models, we examine the sup distance directly over 10,000 time steps. To construct candidate pools growing at different speeds, we define a growth function with a growth parameter $r$ (depending on $r$, the function gives a fixed candidate pool, sub-linear growth, or linear growth). In Fig. 18 we average the results over 10 independent runs. Both the Oracle and the Optimal Oracle are degenerate for all growth rates. The Random Model stops degeneration at sublinear growth, as does UCB thanks to forced exploration of previously unserved items, although its trajectory has a small upward tilt. The TS model degenerates at sublinear growth but stops degeneration at linear growth. For all models, the higher the growth rate $r$, the slower they degenerate, if they do at all. Overall, when applicable, a candidate set that ideally grows linearly, together with continuous random exploration, seems to be a good remedy against adversarial dynamics of $\mu_t$ to best prevent degeneracy.

Conclusion

We provided a theoretical analysis of the echo chamber and filter bubble effects for recommender systems. We used the dynamical system framework to model user’s interest and treated interest extremes as degeneracy points of the system. We gave formal definitions of system degeneracy and provided sufficient conditions which make the system degenerate with both deterministic and stochastic dynamics. On the recommender system side, we discussed the influence on degeneracy speed of three independent factors in system design, i.e. model accuracy, amount of exploration, and the growth rate of the candidate pool. An oracle model often leads to quick degeneracy of the system, while continuous exploration and a large candidate pool size can help slow it down. The best remedies against system degeneracy we found are continuous random exploration and growing the candidate pool at least linearly.

Our work has two main limitations. First, since user interests are hidden variables that are not directly observed, a good measure or proxy for user interests is necessary in practice to study degeneration reliably. Second, we assumed that items and users are independent from each other – we will extend the theoretical analysis to the case of possibly mutually dependent items and users in a future work.

Acknowledgments

We would like to thank William Isaac, Michael Mathieu, Krishnamurthy Dvijotham, Timothy Mann and Dilan Görür for helpful discussions and advice.

References

  • Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2):235–256.
  • Bakshy, E.; Messing, S.; and Adamic, L. A. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348:1130–1132.
  • Barberá, P.; Jost, J. T.; Nagler, J.; Tucker, J. A.; and Bonneau, R. 2015. Tweeting from left to right: Is online political communication more than an echo chamber? Psychological Science 26(10):1531–1542.
  • Beam, M. A.; Hutchens, M. J.; and Hmielowski, J. D. 2018. Facebook news and (de)polarization: Reinforcing spirals in the 2016 US election. Information, Communication & Society 21(7):940–958.
  • Ben Schafer, J.; Konstan, J.; and Riedl, J. 2001. E-commerce recommendation applications. Data Mining and Knowledge Discovery 5(1–2):115–153.
  • Borgesius, F. J. Z.; Trilling, D.; Möller, J.; Bodó, B.; de Vreese, C. H.; and Helberger, N. 2016. Should we worry about filter bubbles? Internet Policy Review.
  • Covington, P.; Adams, J.; and Sargin, E. 2016. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys).
  • Flaxman, S.; Goel, S.; and Rao, J. 2016. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80:298–320.
  • Galor, O. 2007. Discrete Dynamical Systems. Springer.
  • Kunaver, M., and Požrl, T. 2017. Diversity in recommender systems: A survey. Knowledge-Based Systems 123(C):154–162.
  • Lai, T. L. 1987. Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics 15(3):1091–1114.
  • Lattimore, T., and Szepesvári, C. 2019. Bandit Algorithms. Cambridge University Press (to appear).
  • Lu, J.; Wu, D.; Mao, M.; Wang, W.; and Zhang, G. 2015. Recommender system application developments: A survey. Decision Support Systems 74:12–32.
  • Nechushtai, E., and Lewis, S. C. 2018. What kind of news gatekeepers do we want machines to be? Filter bubbles, fragmentation, and the normative dimensions of algorithmic recommendations. Computers in Human Behavior.
  • Nguyen, T.; Hui, P.; Harper, F.; Terveen, L.; and Konstan, J. 2014. Exploring the filter bubble: The effect of using recommender systems on content diversity. In Proceedings of the 23rd International Conference on World Wide Web, 677–686.
  • Pariser, E. 2011. The Filter Bubble: What the Internet is Hiding from You. Penguin UK.
  • Sunstein, C. R. 2009. Republic.com 2.0. Princeton University Press.
  • Thompson, W. R. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.

Appendix A Proofs

Proof of Theorem 1.

Let denote the measure carrying when . Assume without loss of generality that and let be arbitrary. The result will follow by showing that for all it holds that


To see this suppose that

Then there exists a such that

which is a contradiction. In order to prove (4) notice that compactness of and and the continuity of at for all ensures there exists an depending only on such that for all and for all . Hence for

Let be the event that for all . Then

which completes the proof. ∎

Proof of Theorem 2.

We prove that if is sufficiently large and and , then




For the same holds, but with tending to . To see why this implies the result, notice that the previous theorem shows that eventually leaves almost surely. Each time this happens there is more than 0.5 probability of divergence and a certainty of either divergence or returning to . The conditional Borel-Cantelli theorem concludes the proof. To see why (5) and (6) hold, let

which is a martingale with bounded increments since are bounded. Now is also a martingale. Then by the strong law of large numbers for martingales,

which implies that either or almost surely. To see the latter case, suppose is sufficiently large, we compute

since a.s. A similar argument applies when is sufficiently small. Finally, Eq. (5) holds by Azuma’s inequality. ∎

Appendix B More on User Interest Dynamics

Linear Deterministic Model

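Recall that since $a$ is fixed, to simplify the notation, we write $\mu_t$ instead of $\mu_t(a)$ in this section. We assume $\mu_{t+1} - \mu_t = f(\mu_t)$ for some deterministic function $f$ and analyze the one-dimensional, autonomous, first-order discrete dynamical system:

$$\mu_{t+1} = g(\mu_t), \tag{7}$$

where $g(\mu) = \mu + f(\mu)$. In the simple case in which $f$ is a linear function, i.e. $f(\mu) = \alpha\mu + \beta$ for a linear coefficient $\alpha$ and constant term $\beta$, Eq. (7) takes the form

$$\mu_{t+1} = (1 + \alpha)\mu_t + \beta. \tag{8}$$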
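By unrolling Eq. (8) over time we obtain

$$\mu_t = (1 + \alpha)^t \mu_0 + \beta \sum_{s=0}^{t-1} (1 + \alpha)^s, \tag{9}$$

which, for $\alpha \neq 0$, can be written as

$$\mu_t = (1 + \alpha)^t \mu_0 + \frac{\beta}{\alpha}\left((1 + \alpha)^t - 1\right). \tag{10}$$

For $\alpha = 0$, the user's interest is not influenced by the recommender system but drifts with the constant $\beta$, i.e. $\mu_t = \mu_0 + t\beta$; this case occurs in the real world with probability 0.
For $\alpha \neq 0$, the steady state equilibrium, obtained by solving the equation $\mu = (1 + \alpha)\mu + \beta$, is given by $\mu^* = -\beta/\alpha$. Substituting $\beta$ with $-\alpha\mu^*$ in Eq. (10) gives

$$\mu_t = \mu^* + (1 + \alpha)^t (\mu_0 - \mu^*), \tag{11}$$

and taking the limit on both sides, we obtain

$$\lim_{t \to \infty} \mu_t \; \begin{cases} = \mu^* & \text{if } -2 < \alpha < 0, \\ \text{does not exist } (\mu_t \text{ alternates between } \mu_0 \text{ and } 2\mu^* - \mu_0) & \text{if } \alpha = -2,\ \mu_0 \neq \mu^*, \\ = \mu^* & \text{if } \mu_0 = \mu^*, \\ = \pm\infty \text{ (with the sign of } \mu_0 - \mu^*) & \text{if } \alpha > 0,\ \mu_0 \neq \mu^*, \\ \text{does not exist } (|\mu_t - \mu^*| \to \infty \text{ with alternating sign}) & \text{if } \alpha < -2,\ \mu_0 \neq \mu^*. \end{cases} \tag{12}$$
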
Therefore, no matter how the system selects items, in the first three cases of Eq. (12) the user interest model over the item in question is always bounded over time. Of these, only the first case , or equivalently , will occur with probability different from 0.

On the other hand, for the recommender system will degenerate strongly with growing at an exponential rate. For , the recommender system will degenerate weakly with growing exponentially.

In summary, we can draw the following conclusions when an item is served infinitely often (i.o.): 1) for or , the user interest degenerates by growing exponentially, and therefore the recommender system needs to control how frequently such items are shown to the user in order to control the speed of degeneracy of ; 2) in all other cases, either the user interest does not degenerate (), or the system cannot control linear degeneracy (, improbable case).
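The case analysis can be checked numerically. The sketch below iterates the linear model μ_{t+1} = μ_t + αμ_t + β, compares it with the unrolled closed form μ* + (1+α)^t(μ_0 − μ*), and labels each regime of α. The names alpha, beta and mu0 are hypothetical, and the regime labels follow our reading of the cases above rather than a verbatim transcription of Eq. (12).

```python
def simulate_linear(alpha: float, beta: float, mu0: float, steps: int) -> list[float]:
    """Iterate mu_{t+1} = mu_t + f(mu_t) with the linear drift f(mu) = alpha*mu + beta."""
    traj = [mu0]
    for _ in range(steps):
        mu = traj[-1]
        traj.append(mu + alpha * mu + beta)
    return traj


def closed_form(alpha: float, beta: float, mu0: float, t: int) -> float:
    """Unrolled solution; requires alpha != 0 so that mu* = -beta/alpha exists."""
    mu_star = -beta / alpha
    return mu_star + (1 + alpha) ** t * (mu0 - mu_star)


def regime(alpha: float) -> str:
    """Classify the linear model by its coefficient alpha."""
    if alpha == 0:
        return "constant drift (probability-zero case)"
    if -2 < alpha < 0:
        return "converges to mu* (no degeneracy)"
    if alpha == -2:
        return "bounded oscillation (probability-zero case)"
    return "exponential growth (degenerate)"  # alpha > 0 or alpha < -2


if __name__ == "__main__":
    for alpha in (-0.5, 0.1, -2.3):
        traj = simulate_linear(alpha, beta=1.0, mu0=0.5, steps=50)
        print(f"alpha={alpha:+.1f}: {regime(alpha)}; "
              f"|mu_50 - mu_0| = {abs(traj[-1] - traj[0]):.3g}")
```

Only the sign and magnitude of 1 + α matter for boundedness, which is why the recommender system cannot influence whether degeneracy occurs in this model, only how often the item is served.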

Non-linear Deterministic Model

If is a non-linear deterministic function, the steady state equilibrium is reached at zeros of . Sufficient (but not necessary) conditions for global stability are given by the following theorem (Galor, 2007):

Theorem 3 (sufficiency).

If is a contraction mapping, i.e. if

then a stationary equilibrium of the difference equation exists and is unique and globally (asymptotically) stable.
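Theorem 3 is an instance of the Banach fixed-point argument. The sketch below uses a hypothetical non-linear drift f(μ) = −0.5μ + 0.3 sin μ + 1, so that the update map μ ↦ μ + f(μ) has derivative 0.5 + 0.3 cos μ ∈ [0.2, 0.8] and is therefore a contraction with constant 0.8. Iterating from very different starting points reaches the same fixed point, which is also the zero of f.

```python
import math


def g(mu: float) -> float:
    """Hypothetical update map mu -> mu + f(mu) with f(mu) = -0.5*mu + 0.3*sin(mu) + 1.

    g'(mu) = 0.5 + 0.3*cos(mu) lies in [0.2, 0.8], so g is a contraction with constant 0.8.
    """
    return 0.5 * mu + 0.3 * math.sin(mu) + 1.0


def iterate(mu0: float, steps: int = 200) -> float:
    """Apply g repeatedly, starting from mu0."""
    mu = mu0
    for _ in range(steps):
        mu = g(mu)
    return mu


if __name__ == "__main__":
    limits = [iterate(x0) for x0 in (-100.0, 0.0, 3.0, 1e4)]
    # Every starting point ends at (numerically) the same stable equilibrium.
    print(limits)
```

The distance to the fixed point shrinks by at least a factor of 0.8 per step, so 200 iterations are far more than enough even from μ_0 = 10^4.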

Thus if (where is the identity function) is a contraction mapping, then there exists a steady state equilibrium that is unique and globally stable, and the system will not degenerate. Requiring a globally stable steady state equilibrium and requiring to be a contraction mapping are both strong sufficient conditions. Moreover, they are almost always impossible to verify in practice, since we do not know the actual function . We therefore gave three examples of sufficient conditions that are used to describe the general dynamics of user interests. In the first example, users respond to an item if their interest level exceeds some action threshold. With each feedback, their interest level changes and so does the action threshold. If, after some time, the changes in the action threshold are always smaller than the changes in user interest, and the total variation of user interest ranges over , then the system degenerates.

Theorem 4 (sufficiency).

Let be an infinite sequence, . Then if the user interest for an item has the following dynamics,


then it degenerates strongly as .

Proof.
We prove that as . At time , either or . First we consider the case where . At the next time step , there are again two different cases:

  1. : it implies that by Condition 14.

  2. : by Condition 15, and thus

using . Hence in both cases we have . Applying the same argument to every time step, we also have

which implies

By Condition 16, we conclude that if then as .

In the other case, . Similarly at time , we have either or . Following the same argument as above and reversing the inequality signs, we have

which implies

By Condition 16, we conclude that if then as . ∎
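The threshold mechanism described before Theorem 4 can be illustrated with a toy simulation. The update rules below are hypothetical choices that mirror the two cases of the proof (the gap between interest and threshold stays positive, or stays non-positive); they are not a transcription of Conditions 14 to 16. Because each threshold change is strictly smaller than the accompanying interest change, the sign of the gap never flips, and a divergent sum of step sizes drives the interest to ±∞.

```python
import random


def threshold_dynamics(mu0: float, theta0: float, steps: int, seed: int = 0):
    """Simulate a toy action-threshold model; return (interests, gaps mu_t - theta_t).

    Hypothetical update rules:
      * the user acts when mu_t > theta_t, which raises the interest by delta_t;
        otherwise the interest falls by delta_t;
      * the threshold moves by at most 0.9 * delta_t, in a random direction;
      * delta_t = 1 / (t + 1), so the total variation sum(delta_t) diverges.
    """
    rng = random.Random(seed)
    mu, theta = mu0, theta0
    mus, gaps = [mu], [mu - theta]
    for t in range(steps):
        delta = 1.0 / (t + 1)
        # Interest moves in the direction dictated by the current side of the threshold.
        mu += delta if mu > theta else -delta
        # Threshold change is strictly smaller in magnitude than the interest change.
        theta += rng.uniform(-0.9, 0.9) * delta
        mus.append(mu)
        gaps.append(mu - theta)
    return mus, gaps


if __name__ == "__main__":
    for mu0 in (0.5, -0.5):
        mus, gaps = threshold_dynamics(mu0, theta0=0.0, steps=10_000)
        same_sign = all(x > 0 for x in gaps) or all(x < 0 for x in gaps)
        print(f"mu_0={mu0:+.1f}: mu_T={mus[-1]:+.3f}, gap sign constant: {same_sign}")
```

With δ_t = 1/(t+1) the interest moves only like ±log T, which shows why the divergence of the total variation, rather than the size of individual steps, is what produces degeneracy.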


Let be the candidate pool of items, which is also the domain of function .

Remark 5 (scale-invariance).

Let for any open interval . All sufficiency theorems still hold if we define the system degeneracy for as follows. We call degenerate iff there exists a monotonic, continuous function such that .

If we define , then all sufficiency theorems readily apply to and the conclusions hold for since .
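As a concrete instance of this change of variables (our choice of example, not necessarily the authors'), take a quantity confined to the open interval (0, 1), such as a click probability, and the logit as the monotone, continuous bijection onto the real line. A bounded sequence approaching the boundary 1 then corresponds to an unbounded transformed sequence, and the sigmoid maps results back.

```python
import math


def logit(p: float) -> float:
    """Monotone, continuous bijection from (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit, mapping the real line back onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # A bounded quantity creeping towards its upper boundary ...
    ps = [1.0 - 10.0 ** (-k) for k in range(1, 8)]
    # ... corresponds to a strictly increasing, unbounded transformed quantity.
    print([round(logit(p), 2) for p in ps])
```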

Appendix C Degeneracy Speed Analysis

We analyze the system degeneracy (when the candidate pool is finite) and (when is infinite) asymptotically in the order of . As , for the selected items. Thus the selected items are asymptotically almost surely clicked, and hence for any such item , .

Finite Candidate Pool.


In the Random recommender system, and thus


Both the Oracle and the Optimal Oracle have a fixed set of items that they keep selecting. Thus for any item and


For both the UCB and TS we have


where are the numbers of close-to-optimal arms. Therefore, for all five models the degeneracy quantity converges to a linear function of .

Infinite Candidate Pool.

Since we sample from a bounded interval , asymptotically

where is an item with .

In the Random recommender system, and thus


The Oracle has a fixed set of items that it keeps selecting. Thus for any item and


The Optimal Oracle model will pick all items with and for some constant .


For UCB and Thompson sampling the situation is less clear. When the candidate pool grows linearly, UCB will spend most of its time exploring new items, and consequently degeneration is very slow. Thompson sampling, by contrast, will continue to play degenerate items with reasonable probability, so its degeneration speed will be higher than that of UCB. The precise rates of degeneration depend in a complicated way on the details of the model.
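The qualitative contrast between UCB and Thompson sampling can be seen in a deliberately extreme toy. Assume (hypothetically) that one new item joins the pool every round, that each item has a fixed Bernoulli click rate drawn uniformly from [0, 1], and that user-interest dynamics are ignored. With the standard UCB1 index, a never-served item has an infinite index, so UCB1 serves the newest item every round and never replays anything; Thompson sampling with Beta(1, 1) priors keeps re-serving items it has already shown. The sketch measures only this replay behaviour, not degeneration speed.

```python
import math
import random


def ucb1_index(clicks: int, pulls: int, t: int) -> float:
    """UCB1 index; an item that has never been served gets an infinite index."""
    if pulls == 0:
        return math.inf
    return clicks / pulls + math.sqrt(2.0 * math.log(t) / pulls)


def replay_fraction(policy: str, steps: int = 500, seed: int = 0) -> float:
    """Fraction of rounds in which `policy` ("ucb" or "ts") re-serves an already-served item."""
    rng = random.Random(seed)
    means, pulls, clicks = [], [], []
    replays = 0
    for t in range(1, steps + 1):
        # Candidate pool grows linearly: one fresh item per round.
        means.append(rng.random())
        pulls.append(0)
        clicks.append(0)
        if policy == "ucb":
            scores = [ucb1_index(c, n, t) for c, n in zip(clicks, pulls)]
        else:
            # Thompson sampling: one posterior draw per item from Beta(1 + clicks, 1 + misses).
            scores = [rng.betavariate(c + 1, n - c + 1) for c, n in zip(clicks, pulls)]
        arm = max(range(len(scores)), key=lambda i: scores[i])
        if pulls[arm] > 0:
            replays += 1
        pulls[arm] += 1
        clicks[arm] += int(rng.random() < means[arm])
    return replays / steps


if __name__ == "__main__":
    print("UCB1 replay fraction:", replay_fraction("ucb"))
    print("TS   replay fraction:", replay_fraction("ts"))
```

Replaying an item is the mechanism through which its interest can keep drifting, so a policy that replays more often has more opportunity to degenerate; how this translates into a rate depends on the interest dynamics omitted here.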
