Infinite Factorial Finite State Machine for Blind Multiuser Channel Estimation
Abstract
New communication standards need to deal with machinetomachine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and timevarying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system. In this paper, we address the problem of joint channel parameter and data estimation in a multiuser communication channel in which the number of transmitters is not known. For that purpose, we develop the infinite factorial finite state machine model, a Bayesian nonparametric model based on the Markov Indian buffet that allows for an unbounded number of transmitters with arbitrary channel length. We propose an inference algorithm that makes use of slice sampling and particle Gibbs with ancestor sampling. Our approach is fully blind as it does not require a prior channel estimation step, prior knowledge of the number of transmitters, or any signaling information. Our experimental results, loosely based on the LTE random access channel, show that the proposed approach can effectively recover the datagenerating process for a wide range of scenarios, with varying number of transmitters, number of receivers, constellation order, channel length, and signaltonoise ratio.
Infinite Factorial Finite State Machine
for Blind Multiuser Channel Estimation
Francisco J. R. Ruiz,
Isabel Valera,
Lennart Svensson,
and Fernando PerezCruzF. J. R. Ruiz is with the Department of Computer Science, Columbia University in the City of New York, United States of America, and with the Engineering Department, University of Cambridge, United Kingdom.
Email: f.ruiz@columbia.edu
I. Valera is with the Department of Empirical Inference at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, and with the Department of Signal Processing and Communications, University Carlos III in Madrid, Spain.
L. Svensson is with Chalmers University of Technology at Gothenburg, Sweden.
F. PerezCruz is Chief Data Scientist at the Swiss Data Science Center, Zurich, Switzerland, and is also with the Department of Signal Processing and Communications, University Carlos III in Madrid, Spain.
Manuscript received MONTH DD, 2017; revised MONTH DD, 2017.
I Introduction
One of the trends in wireless communication networks (WCNs) is the increase of heterogeneity [1]. Nowadays, users of WCNs are no longer only humans talking, and the number of services and users are booming. Machinetomachine (M2M) communications and the Internet of Things (IoT) will shape the traffic in WCN in the years to come [2, 3, 4, 5]. While there are millions of M2M cellular devices already using second, third and fourth generation cellular networks, the industry expectation is that the number of devices will increase tenfold in the coming years [6].
M2M traffic, which also includes communication between a sensor/actuator and a corresponding application server in the network, is distinct from consumer traffic, which has been the main driver for the design of long term evolution (LTE) systems. First, while current consumer traffic is characterized by small number of long lived sessions, M2M traffic involves a large number of shortlived sessions, with typical transactions of a few hundred bytes. The short payloads involved in M2M communications make it highly inefficient to establish dedicated bearers for data transmission. Therefore, in some cases it is better to transmit small payloads in the random access request itself [7]. Second, a significant number of battery powered devices are expected to be deployed at adverse locations such as basements and tunnels (e.g., underground water monitors and traffic sensors) that demand superior link budgets. Motivated by this need for increasing the link budget for M2M devices, transmission techniques that minimize the transmit power for short burst communications are needed [8]. Third, the increasing number of M2M devices requires new techniques on massive access management [9, 10]. Due to these differences, there are strong arguments for the need to optimize WCNs specifically for M2M communications [6].
The nature of M2M traffic leads to multiuser communication systems in which a large number of users may aim at entering or leaving the system (i.e., start or stop transmitting) at any given time. In this context, we need a method that allows the users to access the system in a way that the signaling overhead is reduced. The method should determine the number of users transmitting in a communication system, and jointly perform channel estimation and detect the transmitted data, with minimum pilot signaling. This problem appears in several specific applications. For instance, in the context of wireless sensor networks, where the communication nodes can often switch on and off asynchronously during operation. It also appears in massive multipleinput multipleoutput (MIMO) multiuser communication systems [11, 12], in which the base station has a very large number of antennas and the mobile devices use a single antenna to communicate within the network. In a code division multiple access (CDMA) context, a set of terminals randomly access the channel to communicate with a common access point, which receives the superposition of signals from the active terminals only [13]. In these applications, the number of users is an unknown and timevarying parameter that we need to infer.
In this paper, we aim at solving the channel estimation and symbol detection problems in a fully unsupervised way, without the need of signaling data. Our approach is thus suitable for applicability on the random access channel, in which more than one terminal may decide to transmit data at the same time. To this end, we advocate for the use of Bayesian nonparametric (BNP) tools, which can adapt to heterogeneous structures by considering an unbounded number of users in their prior. We develop a BNP model that has the required flexibility to account for any number of transmitters without the need of additional previous knowledge or bounds. We are interested in showing that BNP models can solve the problem blindly and that versatility can improve detection in heterogeneous WCNs. Thus, or goal is to show that we can recover the messages and infer the number of transmitters using as little information as possible. The transmitted messages are therefore assumed to lack additional pilots or other structure. This makes our approach applicable on any interferencelimited multiuser communication system in which the transmission of pilots or the need for synchronization may severely limit the spectral efficiency of the WCN.
The main difference between our model and the existing approaches in the literature is that our model is fully blind—with respect to the channel coefficients, the number of transmitters, and the transmitted symbols—and can recover the number of users without the need of synchronization. Indeed, the problem of multiuser detection has been widely studied in the literature, using standard linear methods (least squares or minimum mean squared error) or nonlinear approaches (e.g., successive interference cancellation or parallel interference cancellation) [14, 15]. Most of these approaches require synchronization techniques (see, e.g., [16, 17, 13]). In order to perform the user identification step, the common approach is to prespecify signature sequences that are specific for each user [18, 19, 20], which imposes a constraint on the total number of users in the system.
Our model also differs from existing approaches in other ways. For example, in [21] users may only enter or leave the system at certain predefined instants. In [22] the user identification step is performed first without symbol detection. The approach in [23] requires knowledge of the channel, and the method in [13] is restricted to flatfading channels. The methods in [24, 25] come at the cost of both important approximations and prohibitive computational cost (experiments are presented with short frames of only ten symbols). In contrast, our model is fully blind, it does not need a previous channel estimation step, and it is not restricted to flatfading channels or very short frames.
All these methods assume that the number of users in the system is known, which makes sense in a DSCDMA system but may represent a limitation in other scenarios. They also require synchronization mechanisms. We build on previous BNP approaches that do not have these constraints [26, 27, 28]. In particular, we extend the method in [28] to channels with memory, and we address the exponential complexity of [26, 27], which is only applicable for BPSK systems and for a channel length of at most . Instead, the complexity of our algorithm scales with the square of the channel length.
In summary, our model does not need any of the following requirements: a constraint on the number of users, that the users are synchronized with the network, that the users transmit a preamble, that the channel is known, or that the channel length equals one. Our model allows for an unbounded number of transmitters due to its nonparametric nature, and it is not restricted to memoryless channels because transmitters are modeled as finite state machines. Our goal is to show that we can recover the number of transmitters and their payloads with very little information.
Our model follows the three steps of cognitive communications. It gathers the incoming signal, which is a mixture of all transmitters that are active at a given time instant (perception); it learns the number of users, the channel they face, and it detects their payload in a blind way (learning); and it assigns the symbols from each user and passes that information to the network to act upon (decisionmaking). These features can mitigate the effect of collisions and reduce the number of retransmissions over the random access channel, which is of mayor concern in LTEA systems when M2M communications start being a driving force in WCNs [29, 30].
Ia Technical contributions
We model the multiuser communication system as an infinite factorial finite state machine (IFFSM) in which a potentially infinite number of finite state machines (FSMs), each representing a single transmitter, contribute to generate the observations. The symbols transmitted by each user correspond to the inputs of each FSM, and its memory accounts for the multipath propagation between each transmitterreceiver pair. The output of the IFFSM corresponds to the received signal, which depends on the inputs and the current states of the active FSMs (i.e., the active transmitters), under noise. Our IFFSM considers that transmitters may start or stop transmitting at any time, and it ensures that only a finite subset of the users become active during any finite observation period, while the remaining (infinite) transmitters remain in an idle state (i.e., they do not transmit).
As for most BNP models, one of the main challenges of our IFFSM is posterior inference. We develop a suitable inference algorithm by building a Markov chain Monte Carlo (MCMC) kernel using particle Gibbs with ancestor sampling (PGAS) [31], a recently proposed algorithm that belongs to the broader family of particle Markov chain Monte Carlo (PMCMC) methods [32]. This algorithm presents quadratic complexity with respect to the memory length, avoiding the exponential complexity of previous approaches, such as forwardfiltering backwardsampling schemes [33, 34, 26, 27]. Our experimental results, based on the LTE random access channel, show that the proposed approach efficiently solves user activity detection, channel estimation and data detection in a jointly and fully blind way.
IB Organization of the paper
The rest of the paper is organized as follows. In Section II, we review the basic building blocks that we use to develop our model, namely, the stochastic finitememory FSM and the Markov Indian buffet process (mIBP). Section III details our proposed IFFSM, whereas we describe the inference algorithm in Section IV. Sections V and VI are devoted to the experiments and conclusions, respectively.
Ii Background
Here we describe the two main building blocks that we use to develop our IFFSM: the FSM [35] and the mIBP [34].
Iia Stochastic Finite State Machines
FSMs have been applied to a huge variety of problems in many areas, including biology (e.g., neurological systems), signal processing (e.g., speech modeling), control, communications and electronics [36, 37]. In its general form [35], an FSM is defined as an abstract machine consisting of:

A set of states, which might include one or several initial and final states.

A set of discrete or continuous inputs.

A set of discrete or continuous outputs.

A state transition function, which takes the current input and state and returns the next state.

An emission function, which takes the current state and input and returns an output.
The transition and emission functions are, in general, stochastic (e.g., hidden Markov models (HMMs) [38, 39]), which implies that the input events and the states are not directly observable through the output events (instead, each input and state produces one of the possible output events with a certain probability). We focus on stochastic finitememory FSMs, in which each state can be represented as a finitelength sequence containing the last inputs. This FSM relies on a finite memory and a finite alphabet . Each input produces a deterministic change in the state of the FSM, and a stochastic observable output at time . The next state and the output depend on the current state and the input. The state can be expressed as the vector containing the last inputs, i.e., , therefore yielding different states, where denotes the cardinality of the set . Each state can only transition to different states, depending on the next input event. A state diagram of an HMM and an FSM is depicted in Figure 3.
For any finitememory FSM, the probability distribution over the inputs and the observations (outputs) can be written as
(1) 
i.e., the likelihood of each observation depends not only on , but also on the previous inputs. The model also requires the specification of the initial state, which is defined by the inputs . We show the graphical model for an FSM with memory length in Figure 4. Note that this model can be equivalently represented as a standard HMM with a sparse transition probability matrix of size . However, the HMM representation of the FSM leads to a computationally intractable inference algorithm due to the exponential dependency on the memory length of the state space cardinality.
IiB Markov Indian Buffet Process
The central idea behind BNPs is the replacement of classical finitedimensional prior distributions with general stochastic processes, allowing for an openended number of degrees of freedom in a model [40]. They constitute an approach to model selection and adaptation in which the model complexity is allowed to grow with data size [41].
Within the family of BNP models, the Markov Indian buffet process (mIBP) is a variant of the Indian buffet process (IBP) [42] and constitutes the main building block of the infinite factorial hidden Markov model (IFHMM), which considers an infinite number of firstorder Markov chains with binaryvalued states that evolve independently [34].
The mIBP places a prior distribution over a binary matrix with a finite number of rows and an infinite number of columns. The th row represents time step , whereas the th column contains the (binary) states of the th Markov chain. Each element indicates whether the th Markov chain is active at time instant , with , , and . The states evolve according to the transition matrix
(2) 
i.e., is the transition probability from inactive to active, and is the selftransition probability of the active state. We assume an inactive initial state for all chains at time .
In order to specify the prior distribution over the transition probabilities and , we focus here on the stickbreaking construction of the IBP in [43], because it allows for simple and efficient inference algorithms. Under this construction, we first introduce the notation to denote the sorted values of , such that , with
(3) 
and
(4) 
being the indicator function, which takes value one if its argument is true and zero otherwise. Here, is the concentration hyperparameter of the model, which controls the expected number of latent Markov chains that become active a priori. The prior over variables is given by
(5) 
which does not depend on the index . In (3) and (5), , and are hyperparameters of the model. This prior distribution ensures that, for any finite value of , only a finite number of columns become active, while the rest of them remain in the allzero state.
Iii Model Description
We use the stochastic finitememory FSM and the mIBP detailed in Section II as building blocks for our IFFSM model, which we use for modeling a multiuser communication system in which the number of transmitters is potentially infinite.
The key idea of our approach is to model the multiuser communication system as a factorial FSM in which each FSM represents a user, and the inputs to the FSM are the symbols sent by each transmitter. The memory of the FSM accounts for the multipath propagation between each transmitterreceiver pair, and the output of the factorial FSM model corresponds to the received signal, which depends on the inputs and the current states of all the active FSMs (transmitters).
In order to account for an unbounded number of transmitters (parallel FSMs), we need to consider an inactive state, such that the observations cannot depend on those transmitters that are inactive. Furthermore, we need to ensure that only a finite number of them become active for any finitelength observed sequence. As detailed in Section IIB, the mIBP in [34] meets these requirements by placing a prior distribution over binary matrices with an infinite number of columns.
Iiia Infinite Factorial Finite State Machine
We place a mIBP prior over an auxiliary binary matrix , where each element indicates whether the th FSM is active at time instant . While active, the input symbol to the th FSM at time instant , denoted by , is assumed to belong to the set , with finite cardinality . Here, represents the constellation, and stands for the transmitted symbol of the th transmitter at time instant . We assume that the transmitted symbols are independent and uniformly distributed on the set . While inactive, we can assume that and, therefore, each input , with . This can be expressed as
(6) 
where denotes a point mass located at , and denotes the uniform distribution over the set .
In the IFFSM model, each input symbol does not only influence the observation at time instant , but also the future observations, . Therefore, the likelihood function for depends on the last input symbols of all the FSMs, yielding
(7) 
with being the matrix that contains all symbols . We assume innactive states for (note that implies and viceversa).
The resulting IFFSM model, particularized for , is shown in Figure 5. Note that this model can be equivalently represented as a nonbinary version of the IFHMM in [34], using the extended states given by the vector However, we maintain the representation in Figure 5 because it allows deriving an efficient inference algorithm.
IiiB Observation Model
The model described in the previous section is general and can be applied to any sequence that can be explained by a potentially unbounded number of parallel FSMs. In this section, we particularize this model for its applicability on wireless communications, in which the parallel chains correspond to different transmitters. The IFFSM requires two general conditions for the likelihood model to be valid as the number of FSMs tends to infinity: i) the likelihood must be invariant to permutations of the chains, and ii) the distribution on cannot depend on any parameter of the th FSM if for . These conditions are met by simultaneous transmissions in WCNs. The first one says that the likelihood does not depend on how we label the different transmitters, and the second one indicates that inactive transmitters should not affect the observations. Among others, the standard discretetime interference channel model with additive white Gaussian noise fulfils these restrictions:
(8) 
In (8), represents the complex input (i.e., a symbol from a given constellation) at time instant for the th FSM (user), are the channel coefficients (emission parameters, in the BNP literature), and is the additive white noise, which is circularly symmetric complex Gaussian distributed,
(9) 
The hyperparameter is the noise variance, is the identity matrix of size , and is the number of receiving antennas, and hence the dimensionality of , and . Consequently, the probability distribution over the received complex vector at time is described by
(10) 
We finally place a circularly symmetric complex Gaussian prior over the channel coefficients, i.e.,
(11) 
and an inversegamma hyperprior over the variances ,
(12) 
where and , being , and hyperparameters of the model. The mean and standard deviation of the variances are respectively given by and . The choice of this particular prior is based on the assumption that the channel coefficients are a priori expected to decay with the index , since they model the multipath effect. However, if the data contains enough evidence against this assumption, the posterior distribution will assign higher probability mass to larger values of .
We depict the corresponding graphical model in Figure 6, with a value of .
Iv Inference
One of the main challenges in Bayesian probabilistic models is posterior inference, which involves the computation of the posterior distribution over the hidden variables in the model given the data. In most models of interest, including BNP models, the posterior distribution cannot be obtained in closed form, and an approximate inference algorithm is used instead. In many BNP time series models, inference is carried out via MCMC methods [44]. In particular, for the IFHMM [34], typical approaches rely on a blocked Gibbs sampling algorithm that alternates between sampling the number of parallel chains and the global variables (emission parameters and transition probabilities) conditioned on the current value of matrices and , and sampling matrices and conditioned on the current value of the global variables. We follow a similar algorithm, which proceeds iteratively as follows:

Step 1: Add new inactive chains
^{2} (FSMs) using an auxiliary slice variable and a slice sampling method. In this step, the number of considered parallel chains is increased from its initial value to (we do not update because the new chains are in the allzero state). 
Step 2: Sample the states and the input symbols of all the considered chains. Compact the representation by removing those FSMs that remain inactive in the entire observation period, consequently updating .

Step 3: Sample the global variables, i.e., the transition probabilities and and the observation parameters for each active chain (), as well as the variances .
In Step 1, we follow the slice sampling scheme for inference in BNP models based on the IBP [43, 34], which effectively transforms the model into a finite factorial model with parallel chains. We first sample an auxiliary slice variable , which is distributed as
(13) 
where , and we can replace the uniform distribution with a more flexible scaled beta distribution. Then, starting from , new variables are iteratively sampled from
(14) 
with , until the resulting value is less than the slice variable, i.e., until . Since Eq. 14 is logconcave in [43], we can apply adaptive rejection sampling (ARS) [45]. Let be the number of new variables that are greater than the slice variable. If , then we expand the representation of matrices and by adding zero columns, and we sample the corresponding perchain global variables (i.e., and ) from the prior, given in Eqs. 11 and 5, respectively.
Step 2 consists in sampling the elements of the matrices and given the current value of and the global variables. In this step, several approaches can be taken. A naïve Gibbs sampling algorithm that sequentially samples each element (jointly with ) is simple and computationally efficient, but it presents poor mixing properties due to the strong couplings between successive time steps [46, 34]. An alternative to Gibbs sampling typically applied in factorial HMMs is blocked sampling, which sequentially samples each parallel chain, conditioned on the current value of the remaining ones. This approach requires a forwardfiltering backwardsampling (FFBS) sweep in each of the chains, which yields runtime complexity of for our IFFSM model. The exponential dependency on makes this method computationally intractable. In order to address this limitation of the FFBS approach, we propose to jointly sample matrices and using PGAS, an algorithm recently developed for inference in statespace models and nonMarkovian latent variable models [31]. The runtime complexity of this algorithm scales as , where is the number of particles used for the PGAS kernel. Details on the PGAS approach are given in Section IVA.
After running PGAS, we remove those chains that remain inactive in the whole observation period. This implies removing some columns of and as well as the corresponding variables , and , and updating .
In Step 3, we sample the global variables in the model from their complete conditional distributions.
(15) 
being the number of transitions from state to state in the th column of . For the selftransition probabilities of the active state , we have
(16) 
The complete conditional distributions over the emission parameters for all chains and for all taps are given by complex Gaussians of the form
(17) 
for , where is the matrix containing all vectors . Here, we have defined for notational simplicity as the vector that contains the th component of vectors for all and , as given by
(18) 
We can write the parameters and as follows. We first define the extended matrix of size , with
(19) 
as the diagonal matrix containing all variables , and as the dimensional vector containing the th element of each observation . Then, the posterior parameters in Eq. 17 are given by
(20) 
and
(21) 
being the conjugate transpose, ’’ the Kronecker product, and the identity matrix of size .
Regarding the complete conditionals of the variances , they are given by inversegamma distributions of the form
(22) 
being the squared Lnorm of the vector .
Iva Particle Gibbs with Ancestor Sampling
We rely on PGAS [31] for Step 2 of our inference algorithm, in order to obtain a sample of the matrices and (which at this stage of the inference algorithm are truncated to columns). PGAS is a method within the framework of PMCMC [32], which is a systematic way of combining sequential Monte Carlo (SMC) and MCMC to take advantage of the strengths of both techniques.
In PGAS, a Markov kernel is constructed by running an SMC sampler in which one particle is set deterministically to a reference input particle. This reference particle corresponds to the output of the previous PGAS iteration (extended to account for the new FSMs). After a complete run of the SMC algorithm, a new reference trajectory is obtained by selecting one of the particle trajectories with probabilities given by their importance weights. In this way, the resulting Markov kernel leaves its target distribution invariant, regardless of the number of particles used. In contrast to other particle Gibbs with backward simulation methods [47, 48], PGAS can also be applied to nonMarkovian latent variable models, i.e., models that are not expressed on a statespace form [31, 49]. In this section, we briefly describe the PGAS algorithm and provide the necessary equations for its implementation.
In PGAS, we assume a set of particles for each time instant , each representing the hidden states (hence, they also represent ). We denote the state of the th particle at time by the vector of length . Similarly, the input reference particle for each time instant is denoted as . We also introduce the ancestor indexes in order to denote the particle that precedes the th particle at time . That is, corresponds to the index of the ancestor particle of . Let also be the ancestral path of particle , i.e., the particle trajectory that is recursively defined as
(23) 
Figure 7 shows an example to clarify the notation.
The machinery inside the PGAS algorithm resembles an ordinary particle filter, with two main differences: one of the particles is deterministically set to the reference input sample, and the ancestor of each particle is randomly chosen and stored during the algorithm execution. Algorithm 1 summarizes the procedure. For each time instant , we first generate the ancestor indexes for the first particles according to the importance weights . Given these ancestors, the particles are then propagated across time according to a distribution . For simplicity, and dropping the global variables from the notation for conciseness, we assume that
(24) 
i.e., particles are propagated as in Figure 5 using a simple bootstrap proposal kernel, . The th particle is instead deterministically set to the reference particle, , whereas the ancestor indexes are sampled according to some weights . Indeed, this is a crucial step that vastly improves the mixing properties of the MCMC kernel.
We now focus on the computation of the importance weights and the ancestor weights . For the former, the particles are weighted according to
(25) 
being the set of observations . We have applied (24) to derive this expression. Eq. 25 implies that, in order to obtain the importance weights, it suffices to evaluate the likelihood at time .
The weights used to draw a random ancestor for the reference particle are given by
(26) 
In order to obtain this expression, we have made use of the Markov property of the model, and we have also ignored factors that do not depend on the particle index . Note that the transition probability factorizes across the parallel chains of the factorial model, as given in (24). We also note that, for memoryless models (i.e., ), Eq. 26 can be simplified, since the product in the last term is not present and, therefore, .
IvB Computational Complexity
For any channel length , the resulting complexity of each iteration of the algorithm scales as . This is because the most expensive step is the computation of the weights in (26) for , which has complexity scaling as . This computation needs to be performed for each time instant , and hence the resulting overall complexity.
V Experiments
In this section, we apply the IFFSM model in Section III to the problem of user activity detection and blind channel estimation. We use two different channel models, namely, a standard multipath Rayleigh fading channel model in Section VA and a raytracing model in Section VB.
Va Multipath Rayleigh Fading
In LTE, when the user equipment (UE) wants to transmit information, it needs to ask for resources in the random access channel (RACH) [50]. This is the first of four messages between the enhanced Node B (eNB), i.e., the network access point, and the UE before the actual transmission of information starts. The UE selects one of random sequences with symbols and uses subcarriers ( MHz) to transmit the sequence in ms in a ms subframe ( ms are used for cyclic prefix and time guard band). In the typical configuration, there is a RACH subframe every ms frame. If two UEs transmit the same preamble sequence at the same physical RACH, one of them (or both) would not get a response from the network and it (they) would need to resend the RACH preamble, although in most cases this channel goes unused.
In our simulations, we assume that instead of a RACH preamble asking for resources, our IoT devices send a symbol sequence with the payload that they want to transmit for ms, and the remaining ms are reserved for time guard band. The timing of the IoT might not be as accurate as typical cellphones, and hence we allow for much larger guard band. We assume that the devices can start their transmission at any time in the first half of the physical RACH, and we also assume active users in each physical RACH channel interval. We consider a Rayleigh AWGN channel, i.e., the channel coefficients and the noise are circularlysymmetric complex Gaussian distributed with zero mean, where the covariances matrices are and , respectively. We simulate a base scenario with receiving antennas, quadrature phaseshift keying (QPSK) modulation (the constellation is normalized to yield unit energy), and . Using this base configuration, we vary one of the parameters while holding the rest fixed.
We set the hyperparameters of the IFFSM as , , , , and . The choice of and is based on the fact that we expect the active Markov chains to remain active and, therefore, the transition probabilities from active to active , which are distributed, are a priori expected to be of the order of one over a few hundred bits. In order to avoid getting trapped in local modes of the posterior, we use a tempering procedure. We first add artificial noise so that the resulting and after each iteration we reduce this noise by a factor of , until there is no artificial noise left. After that, we run additional iterations to favor exploitation.
Evaluation. For the recovered transmitters,
(27) 
being the inferred channel coefficients.
We compare our approach (denoted by IFFSM in the plots) with three genieaided methods which have perfect knowledge of the true number of transmitters and channel coefficients.
For each considered scenario, we run independent simulations, each with different simulated data. We run iterations of our inference algorithm, and obtain the inferred symbols as the componentwise maximum a posteriori (MAP) solution over the last samples. The estimates of the channel coefficients are then obtained as the MAP solution, conditioned on the data and the inferred symbols . For the BCJR algorithm, we obtain the symbol estimates according to the componentwise MAP solution for each transmitter and each instant . For the genieaided PGAS and FFBS methods, we follow a similar approach by running the algorithms for iterations and considering the last samples to obtain the symbol estimates. Unless otherwise specified, we use particles for PGAS.
Results with perfect knowledge of the channel memory.
We first evaluate the performance of our model and inference procedure assuming that the memory of the channel is accurately known. We initially consider memoryless channels, i.e., with memory length .
Figure 13 shows the results when the noise variance varies from to (SNR from 1dB to 13dB).
Figure 19 shows the results when the true number of transmitters ranges from 2 to 6. Although the number of parameters to be estimated grows with the number of transmitters, we observe that the performance is approximately constant. The IFHMM recovers all the transmitters in nearly all the simulations, with performance similar to the genieaided methods.
Figure 25 shows the results when the number of receiving antennas varies from 2 to 30. In this figure, we observe that we need at least 6 receivers in order to properly recover the messages of all the transmitters. We see no further improvement in detectability with more than 8 receivers. As expected, the performance in terms of ADER and SER improves when the number of receiving antennas increases, as the diversity in the observations helps to recover the transmitted symbols and channel coefficients. This behavior is similar to the obtained by the genieaided PGAS and FFBS, as shown in this figure. Note that the MSE curve flattens after approximately receivers, as it reaches the threshold imposed by the noise level.
We now evaluate the performance of our model and inference procedure for frequencyselective channels, i.e., considering . Figure 31 shows the results when the noise variance varies from to (SNR from 2dB to 11dB), considering to generate the data. We use the true value of the channel length for inference. In the figure, we show the ADER, the SER, the MSE, a boxplot representation of the inferred number of transmitters , and also a boxplot representation of the number of recovered transmitters. As in the memoryless case, the performance improves with the SNR. However, the values of the resulting SER and ADER are much lower than in the memoryless case for a fixed value of the noise variance. For instance, the SER in Figure 13 for (SNR=dB) is one order of magnitude above the reported SER in Figure 31 for the same value of the noise variance (SNR=dB). This is a sensible result, because the channel memory adds more redundancy in the observed sequence, and therefore the resulting SNR is higher for a fixed value of the noise variance. Our inference algorithm is able to exploit such redundancy to better estimate the transmitted symbols, despite the fact that more channel coefficients need to be estimated. Note that the performance in terms of ADER and SER is similar to the genieaided PGASbased method.
In Figure 37, we show the obtained results for different values of the parameter , ranging from to . We use the true value of the channel length for inference, and we consider in these experiments. The figure shows the ADER, the SER, the MSE, and boxplot representations of the inferred number of transmitters and the number of recovered transmitters. Here, it becomes clear that our model can exploit the redundancy introduced by the channel memory, as the performance in terms of SER and ADER improves as increases. The MSE also improves with , although it reaches a constant value for , similarly to the experiments in which we increase the number of receivers (although differently, in both cases we add redundancy to the observations). We can also observe that the performance is similar to the genieaided methods (we do not run the FFBS algorithm for due to its computational complexity).
Finally, we run an experiment with a different constellation order. Specifically, we use a 1024QAM constellation, normalized to yield unit energy, and we vary the noise variance from to (SNR from 28dB to 37dB). In order to improve the mixing of the sampling procedure, and due to the high cardinality of , after running iterations of the inference algorithm, we run additional iterations in which we sequentially sample each hidden chain conditioned on the current value of the remaining ones, similarly to the standard FFBS algorithm for factorial models. We proceed similarly for the genieaided PGAS method. We do not run the genieaided FFBS or BCJR methods due to their high computational complexity in this scenario. We plot in Figure 43 the ADER, SER, and MSE, as well as a boxplot representation of the inferred number of transmitters and the number of recovered transmitters. We observe that, for the considered values of the SNR, we can recover the five transmitters in nearly all the cases. Furthermore, the SER curve of our approach closely follows the curve of the genieaided method with perfect knowledge of the channel coefficients and the number of transmitters.
Results for mismatched channel length. In the experiments above, we have assumed that the true value of the channel length is known at the receiver side. As this scenario might not be realistic, we now run an experiment to show that we can properly estimate the transmitted symbols and the channel coefficients as long as our inference algorithm considers a sufficiently large value of . For this purpose, we use our base experimental setup and generate data using (i.e., memoryless channel). However, we use different values for the channel length for inference.
In Figure 49, we show the obtained results for ranging from to . The obtained ADER and SER do not significantly degrade with increasing values of , and we are able to recover the five transmitters in nearly all the cases. Interestingly, the MSE improves as increases. This is a consequence of the way we measure it when is larger than the ground truth, as we compare our channel estimates with zero. The fact that the MSE becomes lower indicates that we obtain better estimates for the zero coefficients than for the nonzero ones, which in turn implies that our inference algorithm can properly decrease the channel variances when needed.
Sensitivity to the number of particles. Finally, we evaluate the impact of the number of particles in the PGAS kernel on the performance of the proposed algorithm. Note that, as the effective dimensionality of the hidden space increases, we should expect a larger number of particles to be required in order to properly estimate the transmitted symbols. To see this, we design an experiment with transmitters and . Figure 50 shows the loglikelihood trace plot for iterations of the inference algorithm, with a number of particles ranging from to . This experiment is based on a single run of the algorithm, as it is enough to understand its behavior. It can be seen that the best performance is achieved with the largest number of considered particles. Additionally, this plot suggests that particles are enough for this scenario.
We also show in Figure 53 the number of inferred transmitters , as well as the number of recovered transmitters, for each value of . In this figure, we represent with a green star the true number of transmitters (again, these results are obtained after a single run of the algorithm). Although we infer transmitters with only particles, Figure (b)b shows that we only recover of them. The other two transmitters have been mixed in the last two inferred chains. In agreement with Figure 50, increasing the number of particles from to does not seem to improve performance: in both cases our algorithm is able to recover all the transmitters. Furthermore, even the genieaided PGAS algorithm, which has perfect knowledge of the channel coefficients, needs a large value of (above ) in order to recover all the transmitters.
We can conclude from these plots that we should adjust the number of particles based on the number of transmitters. However, the number of transmitters is an unknown quantity that we need to infer. There are two ways to overcome this apparent limitation. A sensible solution is to adaptively select the number of particles as a function of the current number of active transmitters, . In other words, as we gather evidence for the presence of more transmitters, we consequently increase . A second approach, which is computationally less demanding but may present poorer mixing properties, consists in running the PGAS inference algorithm sequentially over each chain, conditioned on the current value of the remaining transmitters, similarly to the standard FFBS procedure for the IFHMM [34]. Alternatively, we can apply the PGAS algorithm over fixedsized blocks of randomly chosen transmitters.
VB Raytracing Model
With the aim of considering a more realistic communication scenario, we use WISE software [52] to design an indoor wireless system. This software tool, developed at Bell Laboratories, includes a 3D raytracing propagation model, as well as algorithms for computational geometry and optimization, to calculate measures of radiosignal performance in userspecified regions. Its predictions have been validated with physical measurements.
Using WISE software and the map of an office located at Bell Labs Crawford Hill, we place receivers and transmitters across the office, intentionally placing the transmitters together in order to ensure that interferences occur in the nearby receivers. Figure 54 shows the considered map.
We consider a WiFi transmission system with a bandwidth of MHz or, equivalently, ns per channel tap. We simulate the transmission of symbol bursts over this communication system, using a QPSK constellation normalized to yield unit energy. We scale the channel coefficients by a factor of , and we consequently scale the noise variance by , yielding . We set the transmission power to dBm. Each transmitter becomes active at a random point, uniformly sampled in the interval . We set time instants, so the burst duration of symbols ensures overlapping among all the transmitted signals.
WiFi systems are not limited by the noise level, which is typically small enough, but by the users’ interferences, which can be avoided by using a particular frequency channel for each user. Our goal is to show that cooperation of receivers in a WiFi communication system can help recover the symbols transmitted by several users even when they simultaneously transmit over the same frequency channel, therefore allowing for a larger number of users in the system.
In our experiments, we vary from to . Five channel taps correspond to the radio signal travelling a distance of 750 m, which should be enough given the dimensions of this office space (the signal suffers attenuation when it reflects off the walls, so we should expect it to be negligible in comparison to the lineofsight ray after a 750m travelling distance). Following the tempering procedure described above, we initialize the algorithm with and we linearly increase the SNR for around iterations, running additional iterations afterwards. We compare our IFFSM with a nonbinary IFHMM model with state space cardinality using FFBS sweeps for inference (we do not run the FFBS algorithm for due to its computational complexity). In this case, we set the hyperparameters as , , , , and .
We show in Table (a)a the number of transmitters recovered after running the two inference algorithms, together with the inferred value of , averaged for the last iterations. We see that the IFFSM recovers the transmitters in all cases, and it does not tend to create spurious chains. In contrast, the IFHMM significantly overestimates the number of transmitters, which deteriorates the overall symbol estimates and, as a consequence, not all the transmitted symbols are recovered. In the best case, the IFHMM only recovers out of the transmitters. In the extreme case of , the inference algorithm for the IFHMM estimates chains, but none of them corresponds to a true transmitter; rather, each transmitter is explained by a combination of several chains.
For completeness, we additionally report in Table (b)b the MSE of the channel coefficients, averaged for the last iterations. We sort the transmitters so that the MSE is minimized, and ignore the extra inferred transmitters. As expected, the MSE decreases as we consider a larger value of , since the model better fits the true radio propagation model.


Vi Conclusions and Discussion
In this paper, we have proposed a fully blind approach for joint channel estimation and detection of the transmitted data when the number of transmitters is unknown. Our approach is based on a BNP model, which we refer to as IFFSM, and considers a potentially unbounded number of stochastic finitememory FSMs that evolve independently over time. Our model and the PGASbased inference algorithm allow us to readily account for frequencyselective channels, avoiding the exponential complexity with respect to the channel length of previous approaches. We have evaluated the performance of our approach through a comprehensive experimental design using both synthetic and real data.
These results are promising for the suitability of BNPs applied to signal processing for communications. As a result, this paper opens several further research lines on this area. Some examples of such possible future research lines are:

Improving the scalability of the inference algorithm. This would allow us to account for both a larger number of transmitters and larger observation sequences at a sublinear computational cost. We believe that further investigations on faster approximate inference algorithms would be of great interest and importance for the development of practical receivers that operate in realtime.

An online inference algorithm. In many cases, the receiver does not have access to a fixed window of observations, but data arrives instead as a neverending stream. Thus, an online inference algorithm instead of a batch one is also of potential interest.

Modelling of timevarying channels. Our current approach is restricted to static channels, i.e., the channel coefficients do not vary over time. A potentially useful research line may consist in taking into account the temporal evolution of the channel coefficients.

An extension of the model that accounts for coding schemes. A channel coding scheme can be used in order to add redundancy to the user’s data, effectively decreasing the resulting bit error probability. This redundancy can potentially be included into the model and exploited by the inference algorithm.

An extension for sparse code multiple access (SCMA). Our approach can also be adapted to the particularities of the uplink SCMA techniques [53, 54], which use nonorthogonal transmission signals in the random access channel to reduce latency. The signal structure and the sparsity of the code/pilot scheme may simplify the inference procedure of the Bayesian model, at the cost of an upper bound on the number of potential users in the network. This would result in a model that does not need to be nonparametric, but our joint approach may still reduce the errors of the current severalstage detectors.
An extension of our IFFSM, less related to the aforementioned research lines, consists in a semiMarkov approach. In this way, transmitters are assumed to send only a burst of symbols during the observation period, and a changepoint detection algorithm may detect the (de)activation instants.
Acknowledgments
Francisco J. R. Ruiz acknowledges the European Union H2020 program (Marie SkłodowskaCurie grant agreement 706760).
Francisco J. R. Ruiz received the B.Sc. degree in Electrical Engineering from the University of Seville, Spain, and the M.Sc. degree in Multimedia and Communications from the University Carlos III in Madrid, Spain, in 2010 and 2012, respectively. He received the Ph.D. degree in 2015 from the University Carlos III in Madrid. He is currently a Postdoctoral Research Scientist at the Department of Computer Science in Columbia University, and at the Engineering Department in the University of Cambridge. Francisco holds a Marie SkłodowskaCurie fellowship in the context of the E.U. Horizon 2020 program. His research is focused on statistical machine learning, specially Bayesian modeling and inference. 
Isabel Valera is currently leading the Probabilistic Learning Group in the Department of Empirical Inference at the Max Planck Institute for Intelligent Systems in Tübingen. Prior to this she was a postdoctoral researcher at the University of Cambridge, and previously, she was a Humboldt postdoctoral fellowship holder at Max Planck Institute for Software Systems. She obtained her PhD in 2014 and her Master degree in 2012, both from the University Carlos III in Madrid. 
Lennart Svensson was born in Älvängen, Sweden in 1976. He received the M.S. degree in electrical engineering in 1999 and the Ph.D. degree in 2004, both from Chalmers University of Technology, Gothenburg, Sweden. He is currently Professor of Signal Processing, again at Chalmers University of Technology. His main research interests include machine learning and Bayesian inference in general, and nonlinear filtering, deep learning, and multitarget tracking in particular. 
Fernando PerezCruz (IEEE Senior Member) was born in Sevilla, Spain, in 1973. He received a PhD. in Electrical Engineering in 2000 from the Technical University of Madrid and an MSc/BSc in Electrical Engineering from the University of Sevilla in 1996. He is the Chief Data Scientist at the Swiss Data Science Center (ETH Zurich and EPFL). He has been a member of the technical staff at Bell Labs and an Associate Professor with the Department of Signal Theory and Communication at University Carlos III in Madrid and Computer Science at Stevens Institute of Technology. He has been a visiting professor at Princeton University under a Marie Curie Fellowship and a Research Scientist at Amazon. He has also held positions at the Gatsby Unit (London), Max Planck Institute for Biological Cybernetics (Tübingen), BioWulf Technologies (New York) and the Technical University of Madrid and Alcala University (Madrid). His current research interest lies in machine learning and information theory and its application to signal processing and communications. Fernando has organized several machine learning, signal processing, and information theory conferences. Fernando has supervised 8 PhD students and numerous MSc students, as well as one junior and one senior Marie Curie Fellow. 
Footnotes
 The complex Gaussian distribution over a vector of length is given by , where , denotes the complex conjugate, and denotes the conjugate transpose. A circularly symmetric complex Gaussian distribution has and .
 An inactive chain is a chain in which all elements .
 The complete conditional is the conditional distribution of a hidden variable, given the observations and the rest of hidden variables.
 Our procedure is fully blind, so it might return spurious transmitters or it might mix two or more transmitters in a single chain. We discard those chains in our evaluations.
 For the genieaided methods, we use and .
 We obtain the peruser SNR (in dB) as .
 We depict the 25th, 50th and 75th percentiles in the standard format, as well as the most extreme values. Moreover, the mean value is represented with a pink circle, and the true number of transmitters is represented with a green star.
References
 “Small cell market status,” Informa Telecoms and Media. Issue 4, Tech. Rep., December 2012.
 “M2M white paper: the growth of device connectivity,” The FocalPoint Group, Tech. Rep., 2003. [Online]. Available: goo.gl/S0j5O
 “Global machine to machine communication,” Vodafone, Tech. Rep., 2010. [Online]. Available: goo.gl/4V8az
 “Device connectivity unlocks value,” Ericsson, Tech. Rep., Jan. 2011. [Online]. Available: goo.gl/alou6
 S.Y. Lien, K.C. Chen, and Y. Lin, “Toward ubiquitous massive accesses in 3GPP machinetomachine communications,” IEEE Communications Magazine, vol. 49, no. 4, pp. 66–74, 2011.
 H. S. Dhillon, H. Huang, H. Viswanathan, and R. A. Valenzuela, “Fundamentals of throughput maximization with random arrivals for M2M communications,” IEEE Transactions on Communications, vol. 62, no. 11, pp. 4094–4109, Nov. 2014.
 Y. Chen and W. Wang, “Machinetomachine communication in LTEA,” in 2010 IEEE Vehicular Technology Conference, Sep. 2010.
 H. S. Dhillon, H. C. Huang, H. Viswanathan, and R. A. Valenzuela, “Powerefficient system design for cellularbased machinetomachine communications,” CoRR, vol. abs/1301.0859, 2013. [Online]. Available: http://arxiv.org/abs/1301.0859
 C.Y. Tu, C.Y. Ho, and C.Y. Huang, “Energyefficient algorithms and evaluations for massive access management in cellular based machine to machine communications,” in Proceedings of 2011 IEEE Vehicular Technology Conference, Sep. 2011, pp. 1–5.
 C.Y. Ho and C.Y. Huang, “Energysaving massive access control and resource allocation schemes for M2M communications in OFDMA cellular networks,” IEEE Wireless Communications Letters, vol. 1, no. 3, pp. 209–212, Jun. 2012.
 J. Hoydis, S. T. Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160–171, Feb. 2013.
 L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 742–758, Oct. 2014.
 M. A. Vázquez and J. Míguez, “User activity tracking in DSCDMA systems,” IEEE Transactions on Vehicular Technology, vol. 62, no. 7, pp. 3188–3203, 2013.
 S. Verdu, Multiuser detection. Cambridge University Press, 1998.
 M. Jiang and L. Hanzo, “Multiuser MIMOOFDM detection for next generation wireless systems,” Proceedings of the IEEE, vol. 95, no. 7, pp. 1430–1469, Jul. 2007.
 W.C. Wu and K.C. Chen, “Identification of active users in synchronous CDMA multiuser detection,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 9, pp. 1723–1735, dec 1998.
 P. Pad, A. Mousavi, A. Goli, and F. Marvasti, “Simplified MAPMUD for active user CDMA,” IEEE Communications Letters, vol. 15, no. 6, pp. 599–601, Jun. 2011.
 Z. Pi and U. Mitra, “On blind timing acquisition and channel estimation for wideband multiuser DSCDMA systems,” Journal of VLSI signal processing systems for signal, image and video technology, vol. 30, no. 13, pp. 127–142, 2002.
 S. Buzzi, L. Venturino, A. Zappone, and A. De Maio, “Blind user detection in doubly dispersive DS/CDMA fading channels,” IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1446–1451, Mar. 2010.
 A. Mousavi, P. Pad, P. Delgosha, and F. Marvasti, “A new decoding scheme for errorless codes for overloaded CDMA with active user detection,” in 18th International Conference on Telecommunications, May 2011, pp. 201–205.
 K. W. Halford and M. BrandtPearce, “Newuser identification in a CDMA system,” IEEE Transactions on Communications, vol. 46, no. 1, pp. 144–155, jan 1998.
 A. Angelosante, E. Biglieri, and M. Lops, “Multiuser detection in a dynamic environment  Part II: Joint user identification and parameter estimation,” IEEE Transactions on Information Theory, vol. 55, pp. 2365–2374, May 2009.
 H. Zhu and G. B. Giannakis, “Exploiting sparse user activity in multiuser detection,” IEEE Transactions on Communications, vol. 59, no. 2, pp. 454–465, Feb. 2011.
 D. Angelosante, E. Grossi, G. B. Giannakis, and M. Lops, “Sparsityaware estimation of CDMA system parameters,” in IEEE 10th Workshop on Signal Processing Advances in Wireless Communications ’09, Jun. 2009, pp. 697–701.
 D. Angelosante, E. Biglieri, and M. Lops, “Lowcomplexity receivers for multiuser detection with an unknown number of active users,” Signal Processing, vol. 90, pp. 1486–1495, May 2010.
 I. Valera, F. J. R. Ruiz, and F. PerezCruz, “Infinite factorial unbounded hidden Markov model for blind multiuser channel estimation,” in 4th International Workshop on Cognitive Information Processing, May 2014.
 ——, “Infinite factorial unboundedstate hidden Markov model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 9, pp. 1816–1828, 2015.
 I. Valera, F. J. R. Ruiz, L. Svensson, and F. PerezCruz, “A Bayesian nonparametric approach for blind multiuser channel estimation,” in European Signal Processing Conference, 2015.
 M. S. Ali, E. Hossain, , and D. I. Kim, “LTE/LTEA random access for massive machinetype communications in smart cities,” IEEE Communication Magazine, vol. 55, no. 1, pp. 76–83, Jan. 2017.
 M. Polese, M. Centenaro, A. Zanella, and M. Zorzi, “M2m massive access in lte: Rach performance evaluation in a smart city scenario,” in Proceedings of the International Conference on Communications, May 2016.
 F. Lindsten, M. I. Jordan, and T. B. Schön, “Particle Gibbs with ancestor sampling,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 2145–2184, 2014.
 C. Andrieu, A. Doucet, and R. Holenstein, “Particle Markov chain Monte Carlo methods,” Journal of the Royal Statistical Society Series B, vol. 72, no. 3, pp. 269–342, 2010.
 Z. Ghahramani and M. I. Jordan, “Factorial hidden Markov models,” Machine Learning, vol. 29, no. 2–3, pp. 245–273, 1997.
 J. Van Gael, Y. W. Teh, and Z. Ghahramani, “The infinite factorial hidden Markov model,” in Advances in Neural Information Processing Systems, vol. 21, 2009.
 J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation (3rd Edition). Boston, MA, USA: AddisonWesley Longman Publishing Co., Inc., 2006.
 J. A. Anderson and T. J. Head, Automata Theory with Modern Applications. Cambridge University Press, 2006.
 J. Wang, Handbook of Finite State Based Models and Applications, 1st ed. Chapman & Hall/CRC, 2012.
 L. E. Baum and T. Petrie, “Statistical inference for probabilistic functions of finite state Markov chains,” The Annals of Mathematical Statistics, vol. 37, no. 6, pp. 1554–1563, 1966.
 L. Rabiner and B. Juang, “An introduction to hidden Markov models,” ASSP Magazine, IEEE, vol. 3, no. 1, pp. 4–16, Jan. 1986.
 M. I. Jordan, Hierarchical models, nested models and completely random measures, M.H. Chen, D. Dey, P. Mueller, D. Sun, and K. Ye, Eds. New York, (NY): Springer, 2010.
 P. Orbanz and Y. W. Teh, “Bayesian nonparametric models,” in Encyclopedia of Machine Learning. Springer, 2010.
 T. L. Griffiths and Z. Ghahramani, “The Indian buffet process: An introduction and review,” Journal of Machine Learning Research, vol. 12, pp. 1185–1224, 2011.
 Y. W. Teh, D. Görür, and Z. Ghahramani, “Stickbreaking construction for the Indian buffet process,” in Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 11, 2007.
 C. P. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer New York., 2005.
 W. R. Gilks and P. Wild, “Adaptive rejection sampling for Gibbs sampling,” Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 41, no. 2, pp. 337–348, 1992.
 S. L. Scott, “Bayesian methods for hidden Markov models: recursive computing in the 21st century,” Journal of the American Statistical Association, vol. 97, no. 457, pp. 337–351, 2002.
 N. Whiteley, C. Andrieu, and A. Doucet, “Efficient Bayesian inference for switching statespace models using particle Markov chain Monte Carlo methods,” Bristol Statistics Research, Tech. Rep., 2010.
 F. Lindsten and T. B. Schön, “Backward simulation methods for Monte Carlo statistical inference,” Foundations and Trends in Machine Learning, vol. 6, no. 1, pp. 1–143, 2013.
 I. Valera, F. J. R. Ruiz, L. Svensson, and F. PerezCruz, “Infinite factorial dynamical model,” in Advances in Neural Information Processing Systems, 2015.
 S. Sesia, I. Toufik, and M. Baker, LTE  The UMTS Long Term Evolution: From Theory to Practice, 2nd ed. Wiley, 2011.
 L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, Mar. 1974.
 S. J. Fortune, D. M. Gay, B. W. Kernighan, O. Landron, R. A. Valenzuela, and M. H. Wright, “WISE design of indoor wireless systems: Practical computation and optimization,” IEEE Computing in Science & Engineering, vol. 2, no. 1, pp. 58–68, Mar. 1995.
 K. Au, L. Zhang, H. Nikopour, E. Yi, A. Bayesteh, U. Vilaipornsawai, J. Ma, and P. Zhu, “Uplink contention based SCMA for 5G radio access,” in Globecom Workshops (Austin, TX), 2014.
 A. Bayesteh, E. Yi, H. Nikopour, and H. Baligh, “Blind detection of SCMA for uplink grantfree multipleaccess,” in 11th International Symposium on Wireless Communications Systems (Barcelona, Spain), 2014.