Neuroscientific User Models:
The Source of Uncertain User Feedback and Potentials for Improving Recommendation and Personalisation
Abstract.
Recent research revealed a considerable lack of reliability in user feedback when users interact with adaptive systems, often denoted as user noise or human uncertainty. Moreover, this lack of reliability has striking consequences for the assessment of adaptive systems and personalisation approaches. Research on this topic has so far taken a strongly system-centric view in which user variation is undesirable and is modelled with a view to eliminating it. The possibility of extracting additional information from it, however, has received insufficient consideration.
In this contribution we draw on the neuroscientific theory of the Bayesian brain to develop novel user models that turn the variability of user behaviour into additional information for improving recommendation and personalisation. To this end, we first introduce an adaptive model in which populations of neurons provide an estimate of the feedback to be submitted. Subsequently, we present various decoder functions with which neuronal activity can be translated into quantitative decisions. The interplay of cognition model and decoder functions leads to different model-based properties of decision-making. This helps to associate users with different clusters on the basis of their individual neural characteristics and thinking patterns. By means of user experiments and simulations, we show that this information can be used to improve standard collaborative filtering.
1. Introduction
Personalisation and recommendation have become indispensable in most systems nowadays, and this trend continues to grow. During the last decade, the growing number of interactions has continuously supported innovation in a data-driven fashion. This is advantageous, since adapting recommendation and personalisation to provide an appealing user experience requires understanding a user along with their preferences, peculiarities and behaviour. This is achieved by inventive user models and by feeding information into modern personalisation engines based on machine learning techniques, but the bedrock of such efforts is thorough knowledge about the user, gained either by observation (implicit knowledge) or by questioning (explicit knowledge).
The strong dependence on user-generated data is both a curse and a blessing, because the fundamental problem with user feedback is its uncertainty. A considerable fraction of users behave differently in the same context or decide otherwise when a decision task is repeated. This phenomenon is often denoted as user noise (or human uncertainty in recent research) and gives user data the nature of a random variable. As an introductory example, we consider the repeated rating of film trailers within a short time interval (ensuring the same emotional and cognitive context), with a sufficient number of distractors between rating repetitions. Figure 1 shows the different ratings of four users in this experiment, which will be described in more detail in later sections. The feedback clearly scatters around a central tendency and hence supports the assumption of a random variable. This property has recently been the focus of research that presented both the problems it induces in the evaluation of adaptive systems and possible solution strategies.
What all these contributions in the field of user modelling, adaptation and personalisation have in common is a strongly system-centric view in which user variation is undesirable and is modelled with a view to eliminating it. All solutions developed so far essentially try to ignore uncertain data (which obviously yields results with less uncertainty as well), but they are by no means satisfactory, and we have to ask whether this view, shared by a large fraction of researchers, is still worthwhile. In this contribution, we introduce a new and diametrically opposed paradigm in which uncertainty is no longer considered a mistake or dysfunction with destructive side effects, but rather an opportunity for gathering additional information. Such an undertaking has to start with measuring user feedback along with its uncertainty and transferring these data to a user model that maps uncertainty into an information space in order to successively supplement a user profile. Since the measurement of user noise or human uncertainty has already been a subject of research, we confine ourselves to the development of a novel user model that is particularly sensitive to response uncertainty. This naturally leads to the following research questions:

What does a possible model look like that considers human decision variability and maps it into the highest possible concentration of additional information?

How can this information be integrated into existing recommender systems and personalisation engines?

What are the final benefits of this particular user model and what are the benefits of this novel paradigm in general?
2. Related Work
Recommender Systems and Assessment.
A large body of research on recommendation and personalisation has produced a variety of techniques and approaches (Jannach et al., 2010; Ricci et al., 2015). For comparative assessment, different metrics are used to determine prediction accuracy, such as the root mean squared error (RMSE) and the mean absolute error (MAE), along with many others (Herlocker, 2004; Bobadilla et al., 2013). In our contribution, we take up existing criticism of the lack of understanding of human beings in the process of system design (McNee et al., 2006; Knijnenburg et al., 2012) and develop a user model that is close to current views on the functioning of the human brain.
Dealing with Uncertainties.
The relevance of our contribution arises from the fact that unavoidable human uncertainty sometimes has a vast influence on the evaluation of different prediction algorithms (Amatriain et al., 2009; White et al., 2000). The idea of uncertainty is not restricted to recommender systems but also concerns measuring sciences such as metrology. There, a paradigm shift was recently initiated on the basis of a so far incomplete theory of error (Grabe, 2011; Buffler et al., 2001). As a consequence, measured properties are now modelled by probability density functions, and quantities calculated from them are assigned a distribution by means of a convolution of their argument densities. This model is described in (JCGM, 2008a).
We transfer this perspective to user feedback by considering it as a single draw from an underlying distribution.
This provides us with a probabilistic reference which we can use to verify the predictions based on our own user model.
The Idea of Human Uncertainty.
The idea of underlying distributions for user feedback is not far-fetched, since the complexity of human perception and cognition can successfully be addressed by means of latent distributions (D’Elia and Piccolo, 2005). We adopt the idea of modelling user uncertainty by means of individual Gaussians for constructing our individual response models and thus follow the argumentation of recent research in neuroscience and metrology (JCGM, 2008b; Pouget, 2006).
Probabilistic modelling of cognitive processes is also quite common in computational neuroscience. In particular, aspects of human decision-making can be stated as problems of probabilistic inference (Friston, 2010), often referred to as the “Bayesian brain” paradigm. Whenever a decision has to be made, one has to consider a variety of as yet unknown states of the world that are most relevant to the decision process itself. According to (Friston, 2013), each of these states is unconsciously estimated by a population of neurons (an agent) and thereby made accessible to the brain. There is evidence that each agent provides a probability density over possible values of such a state of the world (probabilistic population codes) and thus also accounts for its uncertainty (Pouget, 2006). However, these estimates differ slightly in each cognition trial due to the volatile concentration of released neurotransmitters, which affects the spiking of downstream neurons (neural noise) (Faisal et al., 2008; Pouget, 2006). In other words, human decisions can be seen as inherently uncertain quantities by virtue of the underlying cognitive mechanisms.
In this paper, we adopt the theory of noisy probabilistic population codes (nPPCs) and use them to construct a user model that can naturally represent and explain response uncertainty with neural noise, mapping this uncertainty to specific neural parameters.
Human Uncertainty in Computer Science. User noise or human uncertainty has been mentioned before in computer science. The first reference in the context of user feedback came with a study on the applicability of thinking strategies to product recommendations, in which reliability problems were registered for repeated ratings (Hill, 1995). The authors, like (Herlocker, 2004) later on, already speculated about its impact on the accuracy of adaptive systems. This assumption was later confirmed when uncertainty in user ratings (measured by re-rating) and its impact on the RMSE were demonstrated (Amatriain et al., 2009; Amatriain and Pujol, 2009). However, this impact was only described as a deterioration of a specific metric. A more sophisticated analysis is provided by (Said et al., 2012), who demonstrated that human uncertainty leads to an offset in a specific metric (the magic barrier). This approach was later extended by (Jasberg, 2017b), who showed that this barrier is itself uncertain, i.e. even RMSE scores that are not below this barrier could already be completely random. Further contributions revealed that each accuracy metric based on human feedback is naturally biased and that rankings built upon these metric scores are subject to probabilities of error (Jasberg, 2018).
To solve this problem, several strategies have been proposed over the years, such as a preprocessing step that deletes highly deviant values and replaces them with artificial values closer to the mean of a re-rating (Amatriain and Pujol, 2009). Another approach is to provide model-based predictions with uncertainty as well, so that the uncertainties of a rating and a predictor cancel each other out when calculating their difference (Koren and Sill, 2011). Yet another possibility is to compute metrics only from deviations that differ widely from a given predictor and hence cannot be explained by human uncertainty (Jasberg, 2017a). With our contribution, we want to move away from this paradigm of elimination and present a way in which uncertainty can be used sensibly to generate benefits.
3. Theory, Models and Applications
The Single Neuron Model
The response of a single neuron to a stimulus is limited to the transmission of electric impulses (spiking), and since each neuron has only two states of activation, theories of neural coding assume that information is encoded by the spiking frequency (rate) (Doya, 2006). The functional relationship between the responses of a neuron and the characteristics of a stimulus $s$ is given by the so-called tuning curve $f_i(s)$. Besides irregular shapes, tuning curves have frequently been measured to be bell-shaped or sigmoid-shaped. Each tuning curve attains its maximum at a particular value $s_i$, denoted as the preferred stimulus. For bell-shaped tuning curves, $f_i$ can be modelled as
$$f_i(s) = o + g \cdot \exp\left(-\frac{(s - s_i)^2}{2w^2}\right), \tag{1}$$
where the shape emerges from the Gaussian density function with mean $s_i$ and standard deviation $w$ (tuning curve width). The additional components $g$ and $o$ represent a frequency gain and offset respectively.
When measuring tuning curves in reality, one finds that they are somewhat noisy and that even one and the same stimulus never leads to exactly the same response. These fluctuations can be explained by so-called neural noise (Faisal et al., 2008). Neuronal responses must therefore be seen as random variables rather than as fixed values determined by tuning curves. It has been found that the response of the $i$th neuron follows a Poisson distribution with expectation $f_i(s)$ (Pouget, 2006; Tolhurst, 1983).
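To make the single-neuron model concrete, the following Python sketch implements a bell-shaped tuning curve and Poisson noise on top of it. It is a minimal illustration under stated assumptions: the parameters `offset`, `width` and `gain` play the roles of the offset, tuning curve width and frequency gain described for Eq. (1), all function names and numeric values are hypothetical, and Poisson variates are drawn with Knuth's multiplication method from the standard library, which is adequate for the moderate spike rates considered here.

```python
import math
import random


def tuning_curve(s, s_pref, offset, width, gain):
    """Bell-shaped tuning curve: offset plus a gain-scaled Gaussian bump around s_pref."""
    return offset + gain * math.exp(-((s - s_pref) ** 2) / (2.0 * width ** 2))


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for moderate lam)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def noisy_response(s, s_pref, offset, width, gain, rng):
    """Spike count of one neuron: a Poisson draw whose mean is the tuning-curve value."""
    return poisson_sample(tuning_curve(s, s_pref, offset, width, gain), rng)


rng = random.Random(42)
# Hypothetical neuron preferring 3 stars, presented five times with the same 3-star stimulus.
counts = [noisy_response(3.0, 3.0, offset=1.0, width=0.5, gain=10.0, rng=rng) for _ in range(5)]
print(tuning_curve(3.0, 3.0, 1.0, 0.5, 10.0))  # 11.0, the expected spike count
print(counts)  # counts scattered around 11 for one and the same stimulus
```

Repeated calls with an identical stimulus return different spike counts around the tuning-curve value, which is the neural noise described above.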
Probabilistic Population Codes
We now consider a population of neurons, all with the same type of tuning curve and (almost) the same neural parameters. The only difference lies in the preferred values $s_i$, which are spread equidistantly across the range of possible stimuli (estimation scale). Mathematically, we realise this by considering the real continuous set $S = [1, 5]$ (the five-star rating scale) and a sequence $(s_i)_{i=1}^{n}$ forming an equidistant discretisation of $S$. All parameters determining the population size $n$, the shape of all tuning curves (offset $o$, width $w$, gain $g$) as well as the assumed stimulus $s$ are summarised in a vector $\theta = (n, o, w, g, s)$, which we will refer to as the cognition vector in the following. Given a particular fixed $\theta$ (which is formed from unknown underlying cognitions), each neuron of this population responds according to its specific tuning curve, subject to interference from neural noise. Therefore, the response of the $i$th neuron must be seen as a realisation of the random variable $X_i \sim \mathrm{Poisson}(f_i(s))$. To keep in mind that these responses always depend on the parameters of the cognition vector, we henceforth write $x_{i,\theta}$ for a realisation of $X_{i,\theta}$. The response of the entire population is formed by the responses of all neurons, and so we denote the $n$-dimensional random variable
$$\mathbf{X}_\theta = (X_{1,\theta}, \dots, X_{n,\theta})^\top \tag{2}$$
as the population response for a given $\theta$, with realisation $\mathbf{x}_\theta$. This theory of the origin of noisy population responses is illustrated in Fig. 2.
In this example, we used a fixed cognition vector $\theta = (n, o, w, g, s)$, i.e. we consider $n$ neurons that respond to the assumed stimulus (in this case: the cognition result) of $s$ stars, where each tuning curve has offset $o$, width $w$ and gain $g$. The left picture shows the individual tuning curves, which are distributed equidistantly over the range of a five-star rating scale. For a stimulus of $s$ stars, the response of each neuron can be read off its tuning curve. For a better representation of the population response, it has become standard to plot the individual responses against the corresponding preferred values, as shown in the middle picture. These are the theoretical (static) responses without neural noise. To add neural noise, each static response $f_i(s)$ is replaced by a random number drawn from the Poisson distribution with parameter $f_i(s)$, as shown in the right picture. We repeated this sampling once, i.e. the blue and red dots represent two noisy population responses, and these responses clearly differ not only from the theoretical reference but also considerably from each other. We thus see that the same cognition (represented by $\theta$) leads to different neural activity on each pass, and that the estimate of a quantity (e.g. a product rating) or state of the world is thereby given a natural uncertainty.
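The construction shown in Fig. 2 can be mimicked in a short sketch: a hypothetical `CognitionVector` bundles population size, offset, width, gain and stimulus, the preferred values are spread equidistantly over the five-star scale, and each noisy population response is a vector of independent Poisson draws around the static tuning-curve values. The concrete numbers are illustrative assumptions, not the values used for Fig. 2.

```python
import math
import random
from typing import List, NamedTuple


class CognitionVector(NamedTuple):
    """Hypothetical container for the cognition vector theta = (n, o, w, g, s)."""
    n: int           # population size
    offset: float    # tuning-curve offset o
    width: float     # tuning-curve width w
    gain: float      # tuning-curve gain g
    stimulus: float  # assumed stimulus s (cognition result in stars)


def preferred_values(n: int, lo: float = 1.0, hi: float = 5.0) -> List[float]:
    """Equidistant preferred values s_1, ..., s_n covering the rating scale [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def tuning_curve(s, s_pref, offset, width, gain):
    return offset + gain * math.exp(-((s - s_pref) ** 2) / (2.0 * width ** 2))


def poisson_sample(lam, rng):
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def static_response(theta: CognitionVector) -> List[float]:
    """Noise-free population response: every neuron's tuning-curve value at s."""
    return [tuning_curve(theta.stimulus, sp, theta.offset, theta.width, theta.gain)
            for sp in preferred_values(theta.n)]


def population_response(theta: CognitionVector, rng: random.Random) -> List[int]:
    """One realisation of the population response: independent Poisson draws per neuron."""
    return [poisson_sample(lam, rng) for lam in static_response(theta)]


theta = CognitionVector(n=9, offset=1.0, width=0.6, gain=10.0, stimulus=3.0)  # hypothetical
rng = random.Random(7)
blue = population_response(theta, rng)
red = population_response(theta, rng)
print([round(v, 2) for v in static_response(theta)])
print(blue)
print(red)  # same theta, generally a different activity pattern
```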
Decoder Functions
So far, we have described the internal cognitive model that allows different neuronal activity in a population of neurons to encode one and the same state of the world. In terms of sensory perception, this model can be seen as a translation of outside reality into an internal representation of the external world. In terms of cognition, however, it provides a translation from a cognitive black box into internal and measurable representations of thoughts and thinking patterns. The main question arising at this point is: how does the human brain translate population activity into estimates of a state of the world or of a cognition? Theories assume the use of so-called decoder functions. Mathematically, a decoder function $\phi$ is a mapping from population activity onto the estimation scale of a stimulus or cognition. This means that for a particular user-item pair $(u, j)$, we can obtain an estimate of a single feedback submission directly from the realisation of a population response, i.e. $\hat{r}_{uj} = \phi(\mathbf{x}_\theta)$. Hence, noisy user feedback can be represented as a random variable given as
$$R_{uj} = \phi(\mathbf{X}_\theta). \tag{3}$$
In neuroscience literature, there are several decoders that have been suggested and frequently used so far (Ma and Pouget, 2009). We will give a brief overview of the most frequently discussed decoder functions and will relate them directly to the context of user feedback.
Mode Value Decoder
Due to the construction of tuning curves, the mode value decoder (MVD) assumes that the neuron with the maximum spiking frequency is the one most likely to be addressed by the stimulus or state of the world. The decoder function is thus given as
$$\phi_{\mathrm{MVD}}(\mathbf{x}_\theta) = s_k \quad \text{with} \quad k = \arg\max_{i \in \{1, \dots, n\}} x_{i,\theta}. \tag{4}$$
Figure 3 depicts a population response for a 3-star decision (red line) together with possible estimators (green lines) for this decision. This decoder is very prone to neural noise, and its estimators are subject to great ambiguity, which, however, diminishes for higher frequencies in neural responses.
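A possible implementation of the mode value decoder is shown below. Because several neurons can share the maximal spike count, the sketch returns all candidate estimators and, as an assumption not stated in the text, breaks ties uniformly at random; the function names and the response vector are hypothetical.

```python
import random
from typing import List, Sequence


def preferred_values(n: int, lo: float = 1.0, hi: float = 5.0) -> List[float]:
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def mvd_candidates(x: Sequence[int]) -> List[float]:
    """Every preferred value whose neuron reaches the maximal spike count (ties possible)."""
    peak = max(x)
    return [sp for sp, xi in zip(preferred_values(len(x)), x) if xi == peak]


def mode_value_decoder(x: Sequence[int], rng: random.Random) -> float:
    """MVD estimate; ties are broken uniformly at random (an assumption of this sketch)."""
    return rng.choice(mvd_candidates(x))


x = [1, 2, 4, 7, 9, 9, 5, 2, 1]  # hypothetical noisy response of nine neurons
print(mvd_candidates(x))  # [3.0, 3.5]: two equally valid estimators
print(mode_value_decoder(x, random.Random(0)))
```

The tie in the example is exactly the ambiguity mentioned above; with larger spike counts, exact ties become rarer.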
Weighted Average Decoder
The weighted average decoder (WAD) accounts for all responses by using each spiking frequency as a weight for the corresponding preferred value, thus considering its contribution to the total response. In mathematical terms, the WAD is given by
$$\phi_{\mathrm{WAD}}(\mathbf{x}_\theta) = \frac{\sum_{i=1}^{n} x_{i,\theta}\, s_i}{\sum_{i=1}^{n} x_{i,\theta}}. \tag{5}$$
As can be seen in Fig. 3, this decoder function does not produce ambiguous estimators and is very stable against neural noise.
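The weighted average decoder needs only the spike counts and the preferred values. In the following sketch (function names hypothetical), a completely silent population is treated as an error, since the weighted average is then undefined.

```python
from typing import List, Sequence


def preferred_values(n: int, lo: float = 1.0, hi: float = 5.0) -> List[float]:
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def weighted_average_decoder(x: Sequence[int]) -> float:
    """WAD: preferred values weighted by their neurons' spike counts."""
    total = sum(x)
    if total == 0:
        raise ValueError("silent population: the WAD is undefined")
    prefs = preferred_values(len(x))
    return sum(xi * sp for xi, sp in zip(x, prefs)) / total


x = [1, 2, 4, 7, 9, 9, 5, 2, 1]  # hypothetical noisy response of nine neurons
print(weighted_average_decoder(x))  # about 3.05: one unambiguous estimate
```

Because every neuron contributes, a single noisy spike count shifts the estimate only slightly, which matches the stability described above.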
Maximum Likelihood Decoder
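For a given population response, the maximum likelihood decoder (MLD) chooses the estimator that maximises the corresponding likelihood function, i.e.
$$\phi_{\mathrm{MLD}}(\mathbf{x}_\theta) = \arg\max_{s \in S} L(s \mid \mathbf{x}_\theta), \tag{6}$$
where the likelihood follows from the assumption of independent neurons together with the Poisson probability mass function,
$$L(s \mid \mathbf{x}_\theta) = \prod_{i=1}^{n} \frac{f_i(s)^{x_{i,\theta}}}{x_{i,\theta}!}\, e^{-f_i(s)}. \tag{7}$$
Figure 3 shows the likelihood function (green curve) for the particular population response together with the maximum likelihood estimate (green line). The MLD is the first of the decoders presented here that explicitly accounts for neural noise, through the Poisson probability mass function.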
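The maximum likelihood decoder can be approximated by a grid search over the rating scale. The sketch below assumes the Gaussian tuning curve with a strictly positive offset (so that every logarithm is finite), evaluates the logarithm of the Poisson likelihood using `math.lgamma` for the factorials, and searches a grid of 401 points on [1, 5]; the grid resolution, parameters, function names and response vectors are illustrative choices.

```python
import math
from typing import List, Sequence


def preferred_values(n: int, lo: float = 1.0, hi: float = 5.0) -> List[float]:
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def tuning_curve(s, s_pref, offset, width, gain):
    return offset + gain * math.exp(-((s - s_pref) ** 2) / (2.0 * width ** 2))


def log_likelihood(s: float, x: Sequence[int], offset, width, gain) -> float:
    """log L(s | x) for independent Poisson neurons (requires offset > 0)."""
    total = 0.0
    for xi, sp in zip(x, preferred_values(len(x))):
        lam = tuning_curve(s, sp, offset, width, gain)
        total += xi * math.log(lam) - lam - math.lgamma(xi + 1)
    return total


def maximum_likelihood_decoder(x: Sequence[int], offset, width, gain, steps: int = 401) -> float:
    """MLD: the grid point of [1, 5] with the largest log-likelihood."""
    grid = [1.0 + k * 4.0 / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda s: log_likelihood(s, x, offset, width, gain))


x = [1, 1, 3, 8, 11, 8, 3, 1, 1]  # hypothetical response centred on 3 stars
print(maximum_likelihood_decoder(x, offset=1.0, width=0.6, gain=10.0))  # 3.0
```

Working with log-likelihoods avoids the numerical underflow that the product of many small probabilities would otherwise cause.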
Maximum A Posteriori Decoder
The likelihood can be transformed into a probability function over the stimulus via Bayes’ theorem, i.e. $p(s \mid \mathbf{x}_\theta) = L(s \mid \mathbf{x}_\theta)\, p(s) / p(\mathbf{x}_\theta)$, where $p(s)$ denotes the prior belief about the stimulus or state of the world that has been learned through former experience. The estimator is then chosen so that this posterior is maximised, i.e.
$$\phi_{\mathrm{MAD}}(\mathbf{x}_\theta) = \arg\max_{s \in S} p(s \mid \mathbf{x}_\theta). \tag{8}$$
The maximum a posteriori decoder (MAD) is much like the MLD but with less variability, since the prior acts as a stabiliser. In the example of Fig. 3, we arbitrarily used a Gaussian with fixed mean and standard deviation as prior belief. The Bayesian brain theory assigns a prominent role to this decoder function, since each population would then naturally represent a probability density over a stimulus or state of the world, which can easily be aggregated with other populations’ densities by mere addition. For multiple sensory inputs, this decoder function was shown to be a plausible description of the brain’s operating principles (Beck et al., 2007).
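The maximum a posteriori decoder can be sketched as a grid search over [1, 5] that maximises the log-likelihood plus the log prior; the evidence $p(\mathbf{x}_\theta)$ is constant in $s$ and can be dropped. The Gaussian prior, its parameters, the grid, the function names and the response vector are hypothetical choices; with a flat prior, the same search reduces to the MLD.

```python
import math
from typing import List, Sequence


def preferred_values(n: int, lo: float = 1.0, hi: float = 5.0) -> List[float]:
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def tuning_curve(s, s_pref, offset, width, gain):
    return offset + gain * math.exp(-((s - s_pref) ** 2) / (2.0 * width ** 2))


def log_likelihood(s: float, x: Sequence[int], offset, width, gain) -> float:
    """log L(s | x) for independent Poisson neurons (requires offset > 0)."""
    total = 0.0
    for xi, sp in zip(x, preferred_values(len(x))):
        lam = tuning_curve(s, sp, offset, width, gain)
        total += xi * math.log(lam) - lam - math.lgamma(xi + 1)
    return total


def log_gaussian_prior(s: float, mean: float, sd: float) -> float:
    return -0.5 * ((s - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def maximum_a_posteriori_decoder(x, offset, width, gain, prior_mean, prior_sd, steps=401):
    """MAD: maximise log L(s | x) + log p(s) on a grid over [1, 5]; p(x) is dropped."""
    grid = [1.0 + k * 4.0 / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda s: log_likelihood(s, x, offset, width, gain)
               + log_gaussian_prior(s, prior_mean, prior_sd))


x = [1, 1, 3, 8, 11, 8, 3, 1, 1]  # hypothetical response centred on 3 stars
print(maximum_a_posteriori_decoder(x, 1.0, 0.6, 10.0, prior_mean=3.0, prior_sd=0.5))  # 3.0
print(maximum_a_posteriori_decoder(x, 1.0, 0.6, 10.0, prior_mean=4.0, prior_sd=0.5))  # pulled towards 4
```

The second call shows the stabilising role of the prior: the estimate moves from the likelihood maximum towards the prior mean.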
Theoretic Model Properties
As already mentioned, this model can explain the genesis of uncertain user feedback $R_{uj}$. For illustration, we computed the resulting feedback distributions for all introduced decoder functions using Eq. 3 for a fixed cognition vector $\theta$. The results are depicted in Fig. 4.
Certain properties of this model are already clearly visible here. For the MVD, the vulnerability to neural noise is quite obvious, since the corresponding feedback distribution has the largest spread. Even at the boundaries of 1 and 5 stars there are still high probabilities, so this distribution is only slightly more informative than a uniform distribution. Using the Bayesian definition of probability (interpreted as one’s personal confidence), such feedback would be provided by users who are not sure which rating seems appropriate. For the WAD, we notice the robustness to neural noise and the quality of estimation. A user employing this decoder function would surely give constant ratings. Conversely, users with larger uncertainties can probably not be modelled by this decoder. The MLD reveals a remarkable property: due to the small size of the rating scale, the maximum of the likelihood frequently coincides with the scale boundaries. This theory might therefore explain the common user behaviour of preferring boundary ratings. At first glance, the MAD provides the most plausible feedback distributions, which seems to support the Bayesian brain theory.
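The feedback distributions in Fig. 4 arise from repeatedly decoding fresh population responses for one fixed cognition vector. The following sketch does this for the MVD and the WAD under hypothetical parameters and compares the spread of the resulting feedback; rounding estimates half-up to whole stars is an assumption made only for the histogram, and all function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def preferred_values(n, lo=1.0, hi=5.0):
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def tuning_curve(s, s_pref, offset, width, gain):
    return offset + gain * math.exp(-((s - s_pref) ** 2) / (2.0 * width ** 2))


def poisson_sample(lam, rng):
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def population_response(n, offset, width, gain, s, rng):
    return [poisson_sample(tuning_curve(s, sp, offset, width, gain), rng)
            for sp in preferred_values(n)]


def mvd(x, rng):
    """Mode value decoder with random tie-breaking (assumption)."""
    prefs, peak = preferred_values(len(x)), max(x)
    return rng.choice([sp for sp, xi in zip(prefs, x) if xi == peak])


def wad(x, _rng):
    """Weighted average decoder; the rng argument is unused and only keeps signatures equal."""
    prefs = preferred_values(len(x))
    return sum(xi * sp for xi, sp in zip(x, prefs)) / sum(x)


def feedback_samples(decoder, theta, trials, rng):
    """Repeated draws of R = phi(X_theta) for one fixed cognition vector."""
    return [decoder(population_response(*theta, rng=rng), rng) for _ in range(trials)]


def star_histogram(samples):
    """Relative frequencies of estimates rounded half-up and clipped to 1..5 stars."""
    stars = Counter(min(5, max(1, math.floor(v + 0.5))) for v in samples)
    return {k: stars[k] / len(samples) for k in range(1, 6)}


theta = (9, 1.0, 0.6, 10.0, 3.0)  # hypothetical (n, o, w, g, s)
rng = random.Random(3)
for name, decoder in (("MVD", mvd), ("WAD", wad)):
    samples = feedback_samples(decoder, theta, 2000, rng)
    print(name, round(statistics.pvariance(samples), 3), star_histogram(samples))
```

For such settings, the MVD feedback should show a noticeably larger variance than the WAD feedback, in line with the qualitative picture described above.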
Of course, all of these distributions depend on the neural parameters in the cognition vector, i.e. the ability to decode responses and compute estimators depends strongly on many factors. A sensitivity analysis reveals that the strongest dependency is on the tuning curve gain, which is not surprising, as the gain determines the frequency of neural responses and information is neurally encoded by frequencies. A more thorough analysis of the decoding quality is depicted in Fig. 5. By repeated cognition (population responses), computed estimators can be compared with the true stimulus $s$ by means of fractions of the maximum mean squared error (MSE). The MSE has to be divided by its maximum because a change of $s$ naturally changes the limits of the MSE, which would otherwise bias the analyses (for a stimulus in the middle of the scale, the MSE is bounded much more tightly than for a stimulus at a boundary). For all decoders, the estimation quality increases with neural frequency, i.e. the more active the population, the better a cognition can be translated into a numerical estimate.
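The normalisation of the MSE can be written down directly: on the scale [1, 5], the largest possible squared error for a true stimulus $s$ is $\max\{(s-1)^2, (5-s)^2\}$, e.g. 4 for $s = 3$ but 16 for $s = 1$. The sketch below is our own reading of "fractions of the maximum MSE", not the authors' code: it divides the empirical MSE by this bound and uses the WAD with hypothetical parameters (and hypothetical function names) to illustrate that decoding quality improves with the gain.

```python
import math
import random
from typing import List, Sequence


def max_squared_error(s: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Largest squared error any estimate inside [lo, hi] can make for true stimulus s."""
    return max((s - lo) ** 2, (hi - s) ** 2)


def normalised_mse(estimates: Sequence[float], s: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Empirical MSE divided by its stimulus-dependent maximum, hence within [0, 1]."""
    mse = sum((e - s) ** 2 for e in estimates) / len(estimates)
    return mse / max_squared_error(s, lo, hi)


def preferred_values(n: int, lo: float = 1.0, hi: float = 5.0) -> List[float]:
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def tuning_curve(s, s_pref, offset, width, gain):
    return offset + gain * math.exp(-((s - s_pref) ** 2) / (2.0 * width ** 2))


def poisson_sample(lam, rng):
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def wad_quality(gain, s=3.0, n=9, offset=1.0, width=0.6, trials=500, seed=0):
    """Normalised MSE of WAD estimates over repeated cognitions (hypothetical setup)."""
    rng = random.Random(seed)
    prefs = preferred_values(n)
    estimates = []
    for _ in range(trials):
        x = [poisson_sample(tuning_curve(s, sp, offset, width, gain), rng) for sp in prefs]
        total = sum(x)
        if total:  # skip the (very unlikely) silent population
            estimates.append(sum(xi * sp for xi, sp in zip(x, prefs)) / total)
    return normalised_mse(estimates, s)


print(max_squared_error(3.0), max_squared_error(1.0))  # 4.0 16.0
for g in (2.0, 10.0, 50.0):
    print(g, round(wad_quality(g), 4))  # smaller is better
```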
For the MVD, lower frequencies cause both the middle and the margins of the scale to be estimated slightly worse than the rest. For higher frequencies, only the middle of the scale is estimated slightly worse. This would inevitably lead to more uncertainty for these values when a rating task is repeated. This decoder thus explains the effect that margin ratings are much more reliable. For the WAD, we observe the opposite effect: this decoder is suitable for users who give reliable ratings in the middle of the scale. Both decoder functions need high frequencies to work with high quality. In contrast, the MAD and the MLD are capable of forming the same quality profile with lower frequencies. This basically means lower neural energy consumption for the brain while maintaining full functionality (an evolutionary advantage). Moreover, since the MLD is only a special case of the MAD (with a uniform prior), the MAD is the only decoder function forming a variety of quality profiles for which one would otherwise need two different decoders. From an evolutionary perspective, it is much more reasonable to develop a single mechanism that can be used in all situations than different mechanisms for the same task. These arguments can therefore be seen as further evidence for the applicability of the Bayesian brain paradigm. They also mean that the MAD is again the best candidate for a neuroscientific user model, in line with the previous discussion of feedback distributions.
Neuroscientific User Model
The goal of this user model is to find a specific cognition vector $\theta_{uj}$ for each user-item pair $(u, j)$ along with a decoder function $\phi_{uj}$, so that the model-based feedback $\phi_{uj}(\mathbf{X}_{\theta_{uj}})$ minimises the difference to the real user feedback $\tilde{R}_{uj}$ by means of an arbitrary disparity metric $D$. Mathematically, our user model is given by the mapping $(u, j) \mapsto (\theta_{uj}, \phi_{uj})$ with
$$(\theta_{uj}, \phi_{uj}) = \arg\min_{\theta \in \Theta,\ \phi \in \Phi} D\big(\phi(\mathbf{X}_\theta),\, \tilde{R}_{uj}\big), \tag{9}$$
where $\Theta$ denotes the set of admissible cognition vectors and $\Phi$ the set of decoder functions.
In the case of ambiguity, that is, when several different cognition vectors lead to the same minimum of $D$, we select the vector that minimises the population energy
$$E(\theta) = \sum_{i=1}^{n} f_i(s). \tag{10}$$
This reasoning arises from the fact that the human brain always has to work in an energy-efficient manner and is thus most likely to use the cognition vector in which all neurons spike as sparsely as possible. The advantage of this model is that each user-item pair can be mapped into a high-dimensional space that theoretically carries much more information than product ratings alone.
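The brute-force search with the energy tie-break can be sketched generically. In this sketch, the disparity metric is passed in as a function, decoders are identified by name only, and population energy is taken to be the expected total spike count of the population (our reading of the energy principle); the toy disparity, the candidate grid and all function names are hypothetical and only serve to show how ties are resolved.

```python
import itertools
import math
from typing import Callable, List, NamedTuple, Sequence, Tuple


class CognitionVector(NamedTuple):
    n: int           # population size
    offset: float    # o
    width: float     # w
    gain: float      # g
    stimulus: float  # s


def preferred_values(n: int, lo: float = 1.0, hi: float = 5.0) -> List[float]:
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def tuning_curve(s, s_pref, offset, width, gain):
    return offset + gain * math.exp(-((s - s_pref) ** 2) / (2.0 * width ** 2))


def population_energy(theta: CognitionVector) -> float:
    """Energy as read here: the population's expected total spike count."""
    return sum(tuning_curve(theta.stimulus, sp, theta.offset, theta.width, theta.gain)
               for sp in preferred_values(theta.n))


Disparity = Callable[[CognitionVector, str, Sequence[int]], float]


def fit_user_model(observed: Sequence[int], candidates: Sequence[CognitionVector],
                   decoders: Sequence[str], disparity: Disparity,
                   tol: float = 1e-12) -> Tuple[CognitionVector, str]:
    """Brute-force argmin of the disparity; near-ties (within tol) go to the lowest energy."""
    scored = [(disparity(theta, dec, observed), theta, dec)
              for theta in candidates for dec in decoders]
    best = min(score for score, _, _ in scored)
    tied = [(theta, dec) for score, theta, dec in scored if score <= best + tol]
    return min(tied, key=lambda pair: population_energy(pair[0]))


def toy_disparity(theta: CognitionVector, decoder: str, observed: Sequence[int]) -> float:
    """Hypothetical stand-in for D: ignores the decoder and the noise, compares only means."""
    return abs(theta.stimulus - sum(observed) / len(observed))


grid = [CognitionVector(9, 1.0, 0.6, g, s)
        for g, s in itertools.product((5.0, 10.0, 20.0), (2.0, 3.0, 4.0))]
theta, decoder = fit_user_model([3, 3, 4, 2, 3], grid, ("MAD", "MLD"), toy_disparity)
print(theta.gain, theta.stimulus, decoder)  # 5.0 3.0 MAD
```

All gains tie under the toy disparity, so the sparsest population (lowest gain) is selected. Remaining ties between decoders for the same cognition vector keep the first listed decoder, which is an arbitrary choice of this sketch.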
4. Evaluation and Results
Although this model clearly does not represent the absolute truth about the human brain, the theory of nPPCs has often been confirmed in the context of sensory perception, and in our case this model is (at least in theory) capable of explaining the lack of reliability of user feedback in identical situational contexts. In this section, we systematically evaluate this theoretical ability and examine how well this model fits real human uncertainty and how adaptive systems may benefit from it.
Measuring Human Uncertainty (User Study)
In 2016, we conducted the RETRAIN (Reliability Trailer Rating) study as an online experiment in which 67 participants watched theatrical trailers of popular movies and television shows and provided ratings in five consecutive repetition trials. User ratings were recorded for five of ten trailers, so that the remaining ones acted as distractors triggering the misinformation effect, i.e. memory becomes less accurate due to interference from post-event information. The resulting data set comprises 67 × 5 × 5 = 1,675 individual ratings. As mentioned before, we discovered that user responses scattered around a central tendency rather than being constant. Of all user ratings, only 35% showed consistent response behaviour, while 50% comprised two different responses to the same item and 15% even three or more different ratings. A detailed breakdown can be found in Fig. 5(a). The human uncertainty itself is exponentially distributed, as can be seen in Fig. 5(b). In the following, we use this data record (available open access; link omitted for review) to fit individual feedback distributions from all ratings a user has given to the same item. These are then compared with our model-based distributions.
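The breakdown into consistent and inconsistent responses only requires counting distinct ratings per user-item pair. The sketch below uses hypothetical toy records instead of the RETRAIN data, and, as an illustrative assumption, measures per-pair uncertainty by the population standard deviation of the re-ratings; the function names are hypothetical as well.

```python
import statistics
from typing import Dict, List, Tuple


def consistency_breakdown(ratings: Dict[Tuple[str, str], List[int]]) -> Dict[str, float]:
    """Share of user-item pairs with 1, 2 or at least 3 distinct ratings across re-ratings."""
    buckets = {"consistent": 0, "two different": 0, "three or more": 0}
    for repeats in ratings.values():
        distinct = len(set(repeats))
        if distinct == 1:
            buckets["consistent"] += 1
        elif distinct == 2:
            buckets["two different"] += 1
        else:
            buckets["three or more"] += 1
    total = len(ratings)
    return {label: count / total for label, count in buckets.items()}


def per_pair_uncertainty(ratings: Dict[Tuple[str, str], List[int]]) -> Dict[Tuple[str, str], float]:
    """Population standard deviation of each pair's re-ratings (illustrative measure)."""
    return {pair: statistics.pstdev(repeats) for pair, repeats in ratings.items()}


# Hypothetical toy records: (user, trailer) -> ratings from five repetition trials.
toy = {
    ("u1", "t1"): [3, 3, 3, 3, 3],
    ("u1", "t2"): [4, 4, 5, 4, 4],
    ("u2", "t1"): [2, 3, 4, 3, 2],
    ("u2", "t2"): [5, 5, 5, 5, 5],
}
print(consistency_breakdown(toy))
# {'consistent': 0.5, 'two different': 0.25, 'three or more': 0.25}
```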
User Modelling Quality
To assign each user-item pair its own cognition vector and decoder, we compute the model-based feedback $\phi(\mathbf{X}_\theta)$ for each of the four decoder functions and for each cognition vector $\theta \in \Theta$, where the value set of each component of $\theta$ contains 100 equidistantly distributed values. All resulting combinations are examined by brute force. Subsequently, each model-based feedback is compared to the real user feedback by means of Eq. 9. In doing so, we use two different metrics $D$, one for a discrete evaluation (close to the original data) and another for a continuous evaluation (more accurate, but based on assumptions):
Cohen’s Kappa:

This metric is intended to evaluate inter-rater reliability and compares the agreement of two independent classifications with the probability of reaching this agreement by random guessing. It is given by $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the relative agreement of both raters and $p_e$ denotes the probability of a random agreement. Its use presupposes discrete finite classes, which are given by the discrete rating scale of the RETRAIN study.
To compute $\kappa$ for each cognition vector and decoder, we draw five model-based estimators (rounded to integers) and count the frequencies $h = (h_1, \dots, h_5)$, where $h_k$ denotes the frequency of $k$-star ratings. We draw only five estimators because the RETRAIN study has only five re-ratings and we want to stay as close as possible to the real data. To cope with the randomness arising from only five draws, we repeat this procedure a thousand times. The resulting frequency vectors are compared to the original frequency vector $\tilde{h}$ from our study, and $p_o$ emerges as the relative frequency of matches. For $p_e$ we follow the same procedure as for $p_o$, except that the five estimators are drawn not from the user model but from a uniform distribution.
Jensen–Shannon divergence:

This metric is in line with the spirit of the Bayesian brain paradigm, since it assumes the user feedback to have a full probability density rather than considering only five values. Each user-item pair is associated with a normal distribution obtained by ML fitting on the corresponding re-ratings. For the model-based feedback, we compute estimators and also apply ML fitting. Hence, we obtain the probability distributions $P$ and $Q$, for which we compute the Jensen–Shannon divergence (JSD)
$$\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M), \tag{11}$$
where $\mathrm{KL}$ denotes the Kullback–Leibler divergence and $M = \tfrac{1}{2}(P + Q)$. Since we use the base-2 logarithm, the JSD is bounded by
$$0 \le \mathrm{JSD}(P \,\|\, Q) \le 1. \tag{12}$$
The inequality on the right provides a normed metric to evaluate the disparity of probability distributions.
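Both disparity metrics can be sketched with the standard library. For Cohen's Kappa, the sketch follows the simulation procedure described above under one interpretation that the text leaves open: a "match" is taken to mean that the five rounded draws reproduce the observed frequency vector exactly; `model_draw` is a hypothetical callable returning one model-based estimate (for instance, a decoded population response). For the JSD, both normal distributions are obtained by ML fitting (sample mean and population standard deviation, with a small hypothetical floor for constant ratings), and the divergence is integrated numerically with base-2 logarithms. All function names and data are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple


# --- Cohen's Kappa by simulation ---
def star_frequencies(values: Sequence[float]) -> List[int]:
    """h = (h_1, ..., h_5): estimates rounded half-up and clipped to 1..5 stars."""
    h = [0] * 5
    for v in values:
        h[min(5, max(1, math.floor(v + 0.5))) - 1] += 1
    return h


def match_rate(draw: Callable[[random.Random], float], observed_h: List[int],
               rng: random.Random, repeats: int = 1000) -> float:
    """Share of repeats in which five draws reproduce the observed frequency vector."""
    hits = sum(star_frequencies([draw(rng) for _ in range(5)]) == observed_h
               for _ in range(repeats))
    return hits / repeats


def simulated_kappa(model_draw, observed_ratings, rng, repeats=1000):
    """kappa = (p_o - p_e) / (1 - p_e); p_e uses uniform draws over 1..5 stars."""
    observed_h = star_frequencies(observed_ratings)
    p_o = match_rate(model_draw, observed_h, rng, repeats)
    p_e = match_rate(lambda r: r.randint(1, 5), observed_h, rng, repeats)
    return (p_o - p_e) / (1.0 - p_e)  # undefined if p_e == 1, practically impossible here


# --- Jensen-Shannon divergence between fitted normals ---
def fit_normal(samples: Sequence[float], min_sd: float = 1e-3) -> Tuple[float, float]:
    """ML estimates: sample mean and population standard deviation (floored)."""
    return statistics.fmean(samples), max(statistics.pstdev(samples), min_sd)


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def jensen_shannon(p: Tuple[float, float], q: Tuple[float, float], steps: int = 20000) -> float:
    """JSD in bits, integrated with the midpoint rule over +-10 standard deviations."""
    (mu_p, sd_p), (mu_q, sd_q) = p, q
    lo = min(mu_p, mu_q) - 10.0 * max(sd_p, sd_q)
    hi = max(mu_p, mu_q) + 10.0 * max(sd_p, sd_q)
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        px, qx = normal_pdf(x, mu_p, sd_p), normal_pdf(x, mu_q, sd_q)
        mx = 0.5 * (px + qx)
        if mx > 0.0:  # skip points where both densities underflow
            if px > 0.0:
                total += 0.5 * px * math.log2(px / mx) * dx
            if qx > 0.0:
                total += 0.5 * qx * math.log2(qx / mx) * dx
    return total


observed = [2, 3, 3, 4, 3]             # hypothetical re-ratings of one user-item pair
model = [2.8, 3.1, 3.4, 2.9, 3.6, 3.0]  # hypothetical model-based estimators
print(jensen_shannon(fit_normal(observed), fit_normal(model)))  # between 0 and 1
```

A model that always reproduces the observed rating pattern reaches $\kappa = 1$, while identical fitted normals give a JSD of 0 and completely separated ones approach the upper bound of 1.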
For a perfect user model, one expects that only a single combination of cognition vector and decoder makes the disparity metric vanish and that all other combinations maximise $D$. Therefore, we consider not only the metric scores themselves but also their ambiguity.
The results show that each decoder function is able to fit constant users when using a sufficiently high frequency gain or sufficiently small tuning curve widths. The average ambiguity is 300, i.e. about 300 cognition vectors yield the same minimal metric score. Nevertheless, with the lowest-energy principle from Eq. 10, we can select a single vector and represent users with constant behaviour. However, the strength of our neuroscientific user models clearly lies in the modelling of human uncertainty. Therefore, in the following analyses, we only consider users who provided unreliable feedback.
For noisy users, the mean ambiguity is 5, i.e. only five of all examined cognition vectors lead to the same minimal metric score. Fig. 8 shows the distribution of metric scores for the best-fitting cognition vectors. For the descriptive evaluation with Cohen’s Kappa, the MVD and WAD perform very poorly; for some user-item pairs, even the best fit remains poor. The best decoders are the MLD and the MAD. It is also noteworthy that there are major overlaps when using this metric, which means that there is ambiguity in the decoder function as well. For example, a large proportion of the scores of the MLD can also be achieved by the MVD. Moreover, considering the whiskers of the MAD, half of its scores can also be attained by the MLD. Nonetheless, a first ranking of model quality can be anticipated, and it is confirmed by the normed Jensen–Shannon divergence. In addition, we notice the increased amount of information when using distributions rather than samples of five draws, as the scattering of metric scores is much smaller. In summary, the maximum a posteriori decoder is the best decoder function, leading to feedback distributions that model reality with high quality. Therefore, we confine ourselves to this decoder function in the following.
Information Extraction
Finally, we discuss how adaptive systems benefit from this new information provided by a high-dimensional neural space. In doing so, we consider collaborative filtering (CF) in its simplest form: user-item pairs with corresponding product ratings are clustered into user groups in order to recommend new products on the basis of group popularity. To compute a reference for further comparisons, we use a simple k-means approach to find clusters within the samples
$$\mathcal{R}_t = \big\{ (r^{(t)}_{u1}, \dots, r^{(t)}_{u5}) \;\big|\; u \in U \big\}, \quad t = 1, \dots, 5, \tag{13}$$
separately for each rating trial $t$, where $U$ denotes the set of participants and $r^{(t)}_{uj}$ the rating of user $u$ for item $j$ in trial $t$. In each sample, we randomly select 30% of the users in each cluster to delete their ratings for the fifth item (testing users). We use the mean rating of the remaining 70% of users (learning users) within each group as the specific group predictor for item 5. This predictor is then compared to the original ratings of the testing users by means of the RMSE. In this way, we obtain an RMSE score for each rating trial, and by repeating the randomised selection of testing users five times, we obtain 25 scores that form a distribution. This approach will be referred to as the noiseless reference.
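The noiseless reference can be sketched as plain k-means (Lloyd iterations) on users' rating vectors of one trial, followed by a random 30/70 split of every cluster into testing and learning users and an RMSE on the hidden fifth item. Function names, the number of iterations and the toy data are assumptions of this sketch; like the procedure described above, it clusters the complete rating vectors before hiding item 5.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points: List[Tuple[float, ...]], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """Plain Lloyd iterations; returns one cluster label per point."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_predictor_rmse(ratings, k, rng, test_share=0.3, target=4):
    """Cluster rating vectors, hide item `target` for ~30% of each cluster, predict it by
    the learning users' mean, and return the RMSE over all testing users."""
    labels = kmeans([tuple(r) for r in ratings], k, rng)
    squared_errors = []
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        n_test = round(test_share * len(members))
        test, learn = members[:n_test], members[n_test:]
        if not test or not learn:
            continue
        predictor = statistics.fmean(ratings[i][target] for i in learn)
        squared_errors += [(ratings[i][target] - predictor) ** 2 for i in test]
    if not squared_errors:
        raise ValueError("no cluster was large enough to split into testing and learning users")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rng = random.Random(0)
trial = [[5, 5, 4, 4, 5] for _ in range(10)] + [[1, 2, 1, 2, 1] for _ in range(10)]  # hypothetical
print(group_predictor_rmse(trial, k=2, rng=rng))  # 0.0: each group rates item 5 identically
```

Calling this once per rating trial and repeating the random split five times would yield the 25 RMSE scores described above.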
Since this approach does not consider any uncertainty information, we need a second reference to fill this gap.
For this purpose, we basically proceed as above. The only difference is that we execute clustering on the union of the samples of all five rating trials, i.e. we allow copies of user-item pairs with different ratings. Our cluster groups will therefore be much larger and the means (predictions) more accurate. Additionally, we do not compare the predictions with the ratings of a particular trial, but with the mean rating aggregated over all rating trials. This stochastic approach will be referred to as the noisy reference.
In contrast, we introduce the following methods, which are based on the additional information provided by the nPPC user model:
Clustering: We associate its best-fitting cognition vector with each user-item pair and apply k-means in the neural space of cognition vectors. We then select testing users and learning users just as for the references above. Since this space is higher-dimensional, the user groups may be much more differentiated and better suited for predicting the ratings of the testing users.
Subspace Clustering: Here, we associate its best-fitting cognition vector with each user-item pair and apply k-means separately to the individual neural subspaces, denoted as w-, o-, n- and g-Clustering. We then proceed as above.
Noise Profiling: We associate its best-fitting cognition vector with each user-item pair and aggregate by user, considering only the first four items. We then calculate the mean cognition vector for each user, leaving one of its components arbitrary. Subsequently, we choose this component such that the variance of the model-based feedback distribution is as close as possible to the user's average variance, computed from the rating distributions of these four items.
The results are depicted in Fig. 7. First of all, the variances of the RMSE distributions are relatively large, which is due to the limited size of our data set. To visualise the offset of the RMSE that emerges for uncertain user data, we additionally calculated the magic barrier as proposed by Said et al. (2012), together with its 95% confidence interval. The noisy reference performs much better than the noiseless reference. Moreover, the w-clustering and the o-clustering perform much worse than both references. This can be explained by causality: clustering by user ratings in order to predict other ratings is sensible because the ratings are causally related, whereas neither the tuning curve width nor the offset is causally related to the user ratings. One would expect the same for the n-clustering and the g-clustering. However, the n-clustering performs slightly better than the noiseless reference, although the two distributions overlap completely. The results for the g-clustering are quite surprising, since it outperforms the noisy reference. We attribute this to a latent causal dependency between a particular rating and the neural frequency: as mentioned before, information in the human brain is primarily encoded in terms of frequencies, so frequencies might encode ratings and uncertainty simultaneously. Both the clustering and the noise profiling perform excellently. However, all these distributions overlap to some extent. For example, the left whisker of the noisy reference reaches the third quartile of the noise profiling. Hence, the noise profiling is unambiguously better for only 75% of its RMSE scores, whereas its superiority for the remaining 25% is subject to some doubt.
Nevertheless, the neuroscientific user models clearly succeed against this stochastic uncertainty model, although one should bear in mind that we have only investigated a very simple collaborative filtering approach. A dedicated investigation of more sophisticated techniques is therefore needed and is left for future research.
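The whisker-versus-quartile argument can be made explicit with a short computation. The sketch assumes Tukey-style box plots (whiskers at the most extreme observations within 1.5 IQR of the quartiles), which is a common convention but not confirmed for Fig. 7; the RMSE scores are hypothetical and chosen so that the reference's lower whisker lies near the candidate's third quartile, as described in the text.

```python
import numpy as np


def lower_whisker(x):
    """Lower whisker of a Tukey box plot: smallest value >= Q1 - 1.5 * IQR."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    return float(x[x >= q1 - 1.5 * (q3 - q1)].min())


def share_clearly_better(candidate, reference):
    """Share of candidate RMSE scores lying below the reference's lower whisker,
    i.e. below every non-outlier score of the reference."""
    candidate = np.asarray(candidate, dtype=float)
    return float(np.mean(candidate < lower_whisker(reference)))


# Hypothetical RMSE scores (lower is better), not the values behind Fig. 7.
noise_profiling = [0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.90, 0.95]
noisy_reference = [0.86, 0.88, 0.90, 0.92, 0.94, 0.96, 0.98, 1.00]
print(share_clearly_better(noise_profiling, noisy_reference))  # 0.75
```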
5. Discussion
In this contribution we have broken with the view that user noise or human uncertainty is something undesirable that only causes trouble in the evaluation of adaptive systems. We explicitly embraced this human property and developed a user model based on noisy probabilistic population codes (nPPCs) to reveal and exploit the information it carries. For this purpose, we formulated three research questions at the beginning.
The first question was what a user model that takes human uncertainty into account could look like. To answer it, we consider a population of neurons whose noisy tuning curves are distributed equidistantly over an estimation scale (e.g. a rating scale). These tuning curves can be adjusted by various parameters, which we combine into a so-called cognition vector. Once these parameters are fixed, the population provides an unreliable response to a stimulus (e.g. the choice of a particular user rating), which decoder functions convert into an actual answer. By means of two disparity metrics, we can find a cognition vector together with a decoder function for each user-item pair such that the measured feedback distributions are reproduced.
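To make this model more concrete, the following sketch simulates one possible noisy population code and reads it out with a maximum a posteriori decoder. The Gaussian tuning-curve shape, the independent Poisson spike noise, the parameter names (width, gain, offset) and their default values are assumptions for illustration and need not coincide with the paper's cognition vector or its exact decoder definitions. With a uniform prior, the MAP decoder reduces to maximum likelihood decoding.

```python
import numpy as np

RATINGS = np.arange(1, 6)  # candidate answers on a 1..5 rating scale


def tuning_curves(s, centres, width, gain, offset):
    """Bell-shaped (Gaussian) tuning curves plus a baseline firing rate."""
    return gain * np.exp(-0.5 * ((s - centres) / width) ** 2) + offset


def simulate_feedback(stimulus, n_neurons=20, width=0.8, gain=10.0, offset=0.5,
                      prior=None, n_trials=5000, seed=0):
    """Feedback distribution of a noisy population read out by a MAP decoder."""
    rng = np.random.default_rng(seed)
    centres = np.linspace(RATINGS[0], RATINGS[-1], n_neurons)  # equidistant
    if prior is None:
        prior = np.full(len(RATINGS), 1.0 / len(RATINGS))  # uniform: MAP == ML
    # Mean rates for every candidate rating, shape (n_ratings, n_neurons).
    rates = tuning_curves(RATINGS[:, None], centres[None, :], width, gain, offset)
    # Independent Poisson spike counts for each simulated decision.
    spikes = rng.poisson(tuning_curves(stimulus, centres, width, gain, offset),
                         size=(n_trials, n_neurons))
    # Poisson log-likelihood up to terms that do not depend on the rating.
    log_post = spikes @ np.log(rates).T - rates.sum(axis=1) + np.log(prior)
    decisions = np.argmax(log_post, axis=1)
    return np.bincount(decisions, minlength=len(RATINGS)) / n_trials


print(simulate_feedback(3.0).round(3))
print(simulate_feedback(3.4, width=1.5, gain=2.0).round(3))
```

Broader tuning curves and lower gain make the population response less reliable, so the simulated feedback distribution spreads over more rating values; fitting such parameters per user-item pair is what turns rating variability into information.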
The second research question focused on possible ways of making this information available to adaptive systems. For this, we chose collaborative filtering as an example. The simplest and most efficient method is to cluster user groups based on the neural parameters, which span a higher-dimensional vector space than the ratings alone that are normally used for clustering. The first results are very promising. We also showed that neuroscientific user models can outperform a purely statistical model of uncertainty.
The third research question referred to the possible benefits of this novel paradigm in general. Every personalisation engine and every recommender system aims to map the human being as accurately as possible. Knowledge of human nature, together with individual peculiarities, is therefore crucial. The theory of nPPCs is currently much debated and is considered by many neuroscientists to be an adequate model of human decision-making that comes very close to the actual neural structures. The Bayesian brain paradigm plays a prominent role and has been confirmed many times in neurological experiments. Such a theory of human cognition therefore offers a decisive opportunity to reach the goal of adaptive systems, namely to map human beings according to their very nature. This contribution also has an epistemological component. The nPPCs, which have so far only been investigated for sensory perception, have been used here for the very first time to investigate cognition. Their performance in modelling decision-making is thus a very good result for theoretical neuroscience as well.
Future Research
In this article, we only examined bell-shaped tuning curves. However, sigmoid-shaped tuning curves have also frequently been measured in vivo, so these shapes need to be investigated with respect to our user model. For example, initial results show that the population activity for sigmoid-shaped tuning curves forms convex and concave functions, which are the basis of utility theory, the most widely used theory of human decision-making in economics. Moreover, the present model still needs to be extended by many factors and correlates. For example, there might be dependencies between the cognition vector and the evaluation duration, the testimonial length or the revaluations of a given rating; the weather, acute stress and emotional states are also possible candidates for biasing factors. Further research will also focus on accelerating the classification approach, since the brute-force search is very slow and expensive: the classification of user-item pairs for the MAD took about 6 days using multiprocessing on 400 CPUs with 2 TB of RAM on our university’s high-performance cluster.
Acknowledgements
Computational support and infrastructure were provided by the Centre for Information and Media Technology (ZIM) at the University of Duesseldorf (Germany).
References
 Amatriain et al. (2009) Amatriain and others. 2009. I Like It… I Like It Not: Evaluating User Ratings Noise in Recommender Systems. UMAP Conference (2009).
 Amatriain and Pujol (2009) Xavier Amatriain and Josep Pujol. 2009. Rate It Again: Increasing Recommendation Accuracy by User Rerating. In RecSys Conference. ACM.
 Beck et al. (2007) J Beck, WJ Ma, PE Latham, and A Pouget. 2007. Probabilistic population codes and the exponential family of distributions. Progress in Brain Research 165 (2007), 509–519.
 Bobadilla et al. (2013) Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. 2013. Recommender systems survey. Knowledge-Based Systems 46 (2013), 109–132.
 Buffler et al. (2001) Andy Buffler, Saalih Allie, and Fred Lubben. 2001. The development of first year physics students’ ideas about measurement in terms of point and set paradigms. International Journal of Science Education 23, 11 (2001), 1137–1156.
 D’Elia and Piccolo (2005) Angela D’Elia and Domenico Piccolo. 2005. A mixture model for preferences data analysis. Computational Statistics & Data Analysis 49, 3 (2005), 917–934.
 Doya (2006) Kenji Doya. 2006. Bayesian Brain: Probabilistic Approaches to Neural Coding.
 Faisal et al. (2008) A Aldo Faisal, Luc PJ Selen, and Daniel M Wolpert. 2008. Noise in the nervous system. Nature Reviews Neuroscience 9, 4 (2008).
 Friston (2010) Karl Friston. 2010. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience 11, 2 (2010), 127–138.
 Friston (2013) Karl Friston. 2013. The anatomy of choice: active inference and agency. (2013).
 Grabe (2011) Michael Grabe. 2011. Grundriss der Generalisierten Gauß’schen Fehlerrechnung. Springer Berlin Heidelberg.
 Herlocker (2004) Herlocker. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22, 1 (2004), 5–53.
 Hill (1995) Will Hill. 1995. Recommending and Evaluating Choices. In SIGCHI Conference.
 Jannach et al. (2010) Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender Systems: An Introduction. Cambridge University Press.
 Jasberg (2017a) Kevin Jasberg. 2017a. Assessment of Prediction Techniques: The Impact of Human Uncertainty. In Proceedings of WISE.
 Jasberg (2017b) Kevin Jasberg. 2017b. The Magic Barrier Revisited: Accessing Natural Limitations of Recommender Assessment. In Proceedings of ACM RecSys.
 Jasberg (2018) Kevin Jasberg. 2018. Human Uncertainty and Ranking Error – Fallacies in Metric-Based Evaluation of Recommender System. In Proceedings of ACM SAC.
 JCGM (2008a) JCGM. 2008a. Guide to the Expression of Uncertainty in Measurement. Technical Report. BIPM.
 JCGM (2008b) JCGM. 2008b. Supplement 1 to the GUM – Propagation of distributions using a Monte Carlo method. Technical Report. BIPM.
 Knijnenburg et al. (2012) Bart P Knijnenburg, Martijn C Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4–5 (2012), 441–504.
 Koren and Sill (2011) Yehuda Koren and Joe Sill. 2011. OrdRec: An Ordinal Model for Predicting Personalized Item Rating Distributions. In Proceedings of ACM RecSys.
 Ma and Pouget (2009) WJ Ma and A Pouget. 2009. Population Codes: theoretic aspects. Encyclopedia of Neuroscience 7 (2009), 749–755.
 McNee et al. (2006) Sean M McNee, John Riedl, and Joseph A Konstan. 2006. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI’06 extended abstracts on Human factors in computing systems. ACM, 1097–1101.
 Pouget (2006) Alexandre Pouget. 2006. Bayesian inference with probabilistic population codes. Nature Neuroscience 9 (2006).
 Ricci et al. (2015) Francesco Ricci, Lior Rokach, and Bracha Shapira. 2015. Recommender Systems Handbook. Springer.
 Said et al. (2012) Alan Said, Brijnesh Jain, Sascha Narr, and Till Plumbaum. 2012. Users and Noise: The Magic Barrier of Recommender Systems. In User Modeling, Adaptation, and Personalization. Vol. 7379. Springer Berlin / Heidelberg, 237–248.
 Tolhurst (1983) D.J. Tolhurst. 1983. The statistical reliability of signals in single neurons in cat and monkey visual cortex. In Vision Research.
 White et al. (2000) John A White, Jay T Rubinstein, and Alan R Kay. 2000. Channel noise in neurons. Trends in Neurosciences 23, 3 (2000), 131–137.