On the Application of the Baum-Welch Algorithm for Modeling the Land Mobile Satellite Channel
Accurate channel models are of high importance for the design of upcoming mobile satellite systems. Nowadays most of the models for the land mobile satellite channel (LMSC) are based on Markov chains and rely on measurement data, rather than on pure theoretical considerations. A key problem lies in the determination of the model parameters out of the observed data. In this work we face the issue of state identification of the underlying Markov model whose model parameters are a priori unknown. This can be seen as a hidden Markov model (HMM) problem. For finding the maximum likelihood (ML) estimates of such model parameters the Baum-Welch (BW) algorithm is adapted to the context of channel modeling. Numerical results on test data sequences reveal the capabilities of the proposed algorithm. Results on real measurement data are finally presented.
- multi-protocol encapsulation
- forward error correction
- maximum distance separable
- Reed Solomon
- binary erasure channel
- binary error-and-erasure channel
- with respect to
- binary erasure channel
- single-error multiple-erasures
- Gaussian elimination
- independent and identically distributed
- low-density parity-check
- binary symmetric channel
- cyclic redundancy check
- Gauss-Jordan elimination
- error-and-erasure channel
- pulse-position modulation
- generalized irregular repeat-accumulate
- log-likelihood ratio
- land mobile satellite channel
- hidden Markov model
- maximum a posteriori
- line of sight
- probability density function
- state transition matrix
- soft-in soft-out
- satellite digital audio radio service
- maximum likelihood
- global positioning system
- digital video broadcasting
- European Space Agency
- simulated annealing
Satellite services to mobile users are experiencing a renewed interest thanks to the licenses granted for S-band usage for broadcast and interactive services [1, 2, 3]. The underlying communication channel, referred to as land mobile satellite channel (LMSC), is characterized by strong variations of the received signal power. Obstacles in the propagation path between the satellite and the mobile terminal, such as buildings or trees may cause shadowing or even a complete blockage of the signal. With increasing frequency and decreasing elevation angle such events become more and more likely and strongly impact service availability. A further source of fading is due to multipath propagation: objects in the vicinity of the receiver are source of reflections that cause constructive or destructive interference. In the past several authors proposed Markov chain models to describe the behavior of the LMSC [4, 5, 6, 7, 8]. The modeling approach can be divided into two stages. First a Markov chain is set up to model slow transitions between different signal levels due to macroscopic effects such as blockage, shadowing, etc. In practice models with two or three states are common, but also a larger number of states is possible. Second, fast signal variations within each state due to multipath are taken into account assuming that the signal amplitude follows some specific distribution. To give an example, a Ricean distribution may be used to describe the signal amplitude in line of sight (LoS) conditions, whereas the amplitude in a blockage state could be assumed to be Rayleigh distributed.
Knowing the underlying channel model, a major issue consists in how to determine the model parameters out of a sequence of measurement data. In literature there exist several approaches, most of them being rather simple and empirical. In  the authors propose first to associate with each measurement sample a state of the underlying Markov chain. This association is done manually. Then, for each state the distribution of the associated samples is approximated by some known distribution by means of curve fitting. In  the weighted sum of some known distributions is fitted to the probability density function (p.d.f.) of the measured data. This gives the parameters for the distributions in the different states. Then, each sample is associated with a state by placing thresholds on the signal level. The thresholds are put according to the state probabilities from the fitting step. For highly overlapping distributions, this only works with limited accuracy, as we will show later. A more rigorous attempt is the technique in  based on reversible jump Monte Carlo computation . It suggests fully blind estimation, making no prior assumptions on the number of states, nor on the specific distributions, allowing huge flexibility. However, this has the price of a significant increase in complexity and the resulting states and distributions often lack sufficient explanations in terms of underlying physical effects.
Within this work we propose a further way to estimate the model parameters. It exploits the fact that the state identification can be seen as a hidden Markov model (HMM) problem: out of the channel observation we would like to draw conclusions about the underlying Markov process that is not directly observable. A solution to this problem is given by the Baum-Welch (BW) algorithm, that has been widely used in other fields, such as speech or pattern recognition. An application to models of digital channels has already been provided in . In the sequel our focus is on the LMSC. We impose some constraints on the BW algorithm in order to improve its convergence and for sake of simplification. In particular it is well-known that its convergence properties depend on the initial model assumptions. Hence, unlike in  we assume prior knowledge on the type of distributions and the distribution parameters (to be obtained by a preceding curve fitting step). Also, we fix the number of states in advance.
The remaining part of the paper is organized as follows. In Section II we recap the BW algorithm. Further we introduce a log-domain computation of the forward-backward metric of the BW algorithm and discuss some adaptations. Section III reports the performance of the algorithm on test data, as well as on sequences of real measurement data. A comparison with the method in  is provided.
Ii Overview of the Bw Algorithm
Following the footsteps of , we consider next the problem of associating an observed sample with a state of our HMM. The BW algorithm can be applied to maximize the probability of a state given the entire observation sequence. Let us denote as the state of the HMM at time , and as the vector of observations. The problem can be formalized as follows: given the vector of , we are interested in locally calculating the probability of being in state at time , i.e. . We define
Here we used the shorthand to denote the elements of the observation sequence , with . Further, referring to (2), we define a forward metric and a backward metric . It follows that
where the normalization by corresponds to in (1) and denotes the number of states. Being the transition probability from state to , the probability of state and the probability density function given state , the forward and the backward metric can be computed iteratively as
with the initial metrics and .
Figure 1 shows an excerpt of a state diagram for a Markov chain in the interval . The nodes of the trellis diagram at each time instance denote one of the three possible states, whereas the lines denote all possible transitions. Consider for example state . As indicated by the solid arrows, the metrics from all the nodes at , as well as contribute to the calculation of the probability of state at time . The most likely state sequence can be determined by choosing the state with the highest probability at each time instance.
Further, we may wish to estimate the probability of having a transition from state at time to state at time , given the observation . This can be expressed as
and it turns out that
Even if the BW algorithm is in principle more general, we restrict ourselves to the simple case where the density functions , , are perfectly known, whereas we do not have any knowledge about the transition probabilities , which we want to estimate. To do so, we chose some initial values for ,111In principle the choice is arbitrary. Nevertheless initial values not too far from the real values facilitate the convergence of the BW algorithm. Good starting points can be found in literature (e.g. in ). run the forward-backward algorithm and re-estimate the transition probabilities and the initial state probabilities according to the re-estimation formulae
Former values of and are replaced by the new estimates and the forward-backward algorithm is run again, leading to updated estimates, which are then fed-back. This process is iterated several times. The state identification step corresponds to the E-step of the expectation-maximization (EM) algorithm, where the model parameters are assumed to be fixed. The re-estimation step corresponds to the M-step, where the most likely model parameters are determined given the hidden state sequence.
A final remark is related to the convergence of the algorithm. It is well known that the EM algorithm, as well as its special instance, the BW algorithm, increases the likelihood of the model iteration by iteration till it converges to a maximum value . However, the algorithm may converge to a local maximum of the likelihood function, rather than to a global one. The convergence of the algorithm can be facilitated by limiting the set of a priori unknown model parameters. Alternatively, a set of various starting points can be considered.
Ii-a Log-domain Implementation of the Bw Algorithm
Already for short observation sequences () the forward-backward metric may get numerically unstable. As a solution, for each a normalization of the metric is usually performed . Alternatively, a log-domain representation of the corresponding equations is proposed here. Let us define log-probabilities as , and , with being the natural logarithm. Then, (3) can be rewritten as
Note that the last term can be solved in the log-domain by applying recursively the so-called operator (also known as Jacobi logarithm) that is defined as . Exploiting the identity
and noticing that can be recursively calculated as , we have
In a similar manner, using the shorthand , and we have that the recursions
The estimation of the parameters proceeds as
Ii-B Restrictions on the Bw Algorithm
For modeling the LMSC applying the BW algorithm some a priori restrictions on the channel parameters have been applied. This is mainly motivated by two reasons. First, if reasonably good estimates of some channel parameters are available, their use may facilitate the convergence of the algorithm. Second, we consider important that the obtained results have a clear physical interpretation. To give an example, we would like states to be associated with different physical events, such as blockage of the signal or direct LoS. During this work the following restrictions have been applied:
The type of distributions to be used has been fixed in advance. The original BW algorithm allows estimating the densities iteratively as a mixture of Gaussian distributions . It is however well-established that typical propagation conditions (blockage or LoS, for instance) can be accurately modeled by known distributions.
The number of states is fixed in advance corresponding to some physical events, such as total blockage of the signal by obstacles or LoS .
Iii Applications of the Bw Algorithm
The capabilities of the BW algorithm on different data sets are evaluated next. First we generate artificially a test sequence of samples and run iterative re-estimation. Knowing the original model parameters, our goal is to assess the quality of the re-estimations provided by the BW algorithm. A comparison with the commonly used threshold method and some derivatives is done. Second, the BW algorithm is applied to data obtained from a measurement campaign.
Iii-a Application on Test Data Sequences
Given a Markov chain with transition probabilities we generate a sequence of states . For each state, an observation sample according to the associated p.d.f. is produced. For simplicity, we fix the number of states to 2. For state 1, we choose a Gaussian distribution with standard deviation . To perform different tests, the mean value ranges from to . The Gaussian distribution associated with state 2 has mean and variance . Since typically the LMSC is highly correlated , we choose the state transition probabilities of the Markov chain
with corresponding state probabilities and . The length of the state sequence (observation sequence) was set to .
Given the observations and the knowledge on the p.d.f.s, our iterative re-estimation algorithm is ran to determine the state sequence , for , as well as the transition probabilities and the state probabilities . It should be obvious that the closer the mean values of both Gaussian distributions are, the bigger shall be the deviation between the re-estimated state sequence , the associated re-estimated transition probabilities , as well as state probabilities and the actual values. To measure the distance of the two distributions associated with the two states, we make use of the Bhattacharyya distance
For sake of comparison we also apply the threshold method to separate the states . Samples below the threshold are associated with one state, the ones above with the other. We select the threshold , such that the average error probability
is minimized. In addition, we assume a priori knowledge of the state probabilities (which could be e.g. provided by a previous curve-fitting step). For two Gaussian distributions with variances , this yields
Further, to suppress frequent state transitions (crossing of the threshold) we apply moving average filtering on the observation sequence. The span of the moving average is set to 10 or 20 samples.
Figure 2 illustrates the estimated transition probability versus the Bhattacharyya distance for the BW algorithm and the threshold methods with and w/o filtering. Despite close mean values of both distributions, the BW algorithm provides always accurate estimates for the state transition probability , whereas the threshold methods typically fail when . Empirically we found that best results for the threshold methods can be obtained with an averaging window span of 10 samples. It shall be noted however that in this case the resulting state probabilities deviate remarkably as illustrated in Table I. It turns out that with increasing averaging window size even for the estimated state probability is too low. Figure 3 depicts the share of wrongly labeled states in the estimated state sequence , obtained through a comparison of the original state sequence with . Again the BW algorithm provides by far the best results, followed by the threshold methods with filtering. Note that for low Bhattacharyya distances the share of errors converges to 0.33 which corresponds to .
Iii-B Application on Measurement Data
In fall 2008 a vast measurement campaign was carried out along the US East Coast in the framework of the European Space Agency (ESA) funded MiLADY project . During the field trials the signal levels of the four satellite digital audio radio service (SDARS) satellites were recorded with a mobile vehicular receiver. A statistical channel model was derived out of the collected measurement data employing the BW algorithm. The proceedings are as follows: we first perform a curve-fitting step on the overall p.d.f. of similar to . We obtain parameters of the distributions in the different states which serve as input for the BW algorithm. The resulting state probabilities are used to initialize . The curve fitting is performed using simulated annealing (SA) , a fast meta-heuristic method for global optimization. In case the function to be optimized has several local maxima SA may overcome these and converge to the global minimum. Following literature, we chose three simple distributions to characterize the fast signal variations in the different states. The signal amplitude is assumed to follow a Rice distribution in case of direct LoS to the satellite. We associate a lognormal distribution with the shadowing state and assume that the signal amplitude in the blockage state is Rayleigh distributed. As second step, a preprocessing stage is required. The fast signal variations within a state are known to be correlated (see e.g. ). However, (2) implicitly assumes independency among observation samples given a certain state. To comply with the independence assumption, the measurement data is down-sampled, taking into account the coherence time of the process222The spatial separation between samples after down-sampling was chosen to be in accordance with . . This leads the final observation . Finally, given the three distributions and the observed sequence, the BW algorithm is applied as described in Section II.
Let’s consider a typical US urban environment with a satellite elevation of . The solid line in Figure 4 shows the p.d.f. of the measured signal envelope, whereas the dashed lines with markers give the results of curve fitting using the three p.d.f.s specified previously. The weighted sum of the Rice, lognormal and Rayleigh distributions is also plotted (dashed with diamonds) and turns out to be close to the p.d.f. of the measured data. The Bhattacharyya distance between the lognormal (Rayleigh) and Rice (lognormal) distribution is 0.5 (1.1), thus posing challenges for state identification. Table II gives the mean state durations and state probabilities obtained with different state identification methods with minimum state duration set to . Results for the BW algorithm, the threshold method from , as well as the modified threshold method with a filter length of 10 samples are shown. It can be observed that the BW algorithm and the threshold method yield the same state probabilities as obtained by means of curve fitting. However, the mean state durations obtained by the threshold method are very short. As illustrated in Figure 2 the threshold method tends to over-dimension , thus to under-dimension . If a prior filtering step is applied, the state durations become longer than the ones obtained with BW. This is caused by an under-dimensioning of for Bhattacharyya distances greater than 0.5 (c.f. Figure 2). Here, the state probabilities are no longer preserved.
This work investigates the application of the BW algorithm to determine the parameters for a LMSC model out of a set of measurement data. The BW algorithm, allows estimating iteratively the hidden state sequence and the transition probabilities of the underlying HMM even for highly overlapping states. Especially in environments with frequent shadowing events conventional methods, such as the threshold methods may lead to inaccurate results on the state transition matrix (STM) of the hidden Markov process. Adaptations of the BW algorithm presented here guarantee numerical stability, as well as proper convergence at manageable complexity. Adaptations to channels different from the LMSC are possible and may be a matter of future investigation.
This work was partly carried out in the framework of the ESA funded MiLADY project (contract no. 21159/07/NL/GLC). The authors would like to acknowledge Dr. S. Scalise and R. Prieto Cerdeira for the useful discussions.
-  A. Bolea Alamanac, P. Burzigotti, R. De Gaudenzi, G. Liva, H. N. Pham, and S. Scalise, “In-depth analysis of the satellite component of DVB-SH: Scenarios, system dimensioning, simulations and field trial results,” International Journal of Satellite Communications and Networking, vol. 27, no. 4-5, pp. 215–240, 2009. [Online]. Available: http://dx.doi.org/10.1002/sat.933
-  S. Scalise, C. Niebla, G. Gallinaro, M. Andrenacci, R. Rinaldo, O. Del Rio Herrero, M. Breiling, D. Finocchiaro, J. Cebrian Puyuelo, and G. Schlueter, “System design for Pan-European MSS services in S-band,” in Advanced Satellite Multimedia Systems Conference (ASMS) and the 11th Signal Processing for Space Communications Workshop (SPSC), 2010 5th, 2010, pp. 538 –545.
-  Satellite Digital Audio Radio Service (SDARS). [Online]. Available: http://www.sirius.com/
-  E. Lutz, D. Cygan, M. Dippold, F. Dolainsky, and W. Papke, “The land mobile satellite communication channel - recording, statistics, and channel model,” in IEEE Trans. Vehicular Technology, vol. 40, no. 2, May 1991, pp. 375–386.
-  Y. Karasawa, K. Kimura, and K. Minamisono, “Analysis of availability improvement in LMSS by means of satellite diversity based on three-state propagation channel model,” in IEEE Trans. Vehicular Technology, vol. 46, no. 4, November 1997, pp. 957–1000.
-  F. Perez-Fontan, M. Vazquez-Castro, C. Cabado, J. Garcia, and E. Kubista, “Statistical modeling of the lms channel,” in IEEE Trans. Vehicular Technology, vol. 50, no. 6, November 2001, pp. 1549–1567.
-  C. Alasseur, S. Scalise, L. Husson, and H. Ernst, “A novel approach to model the land mobile satellite channel through reversible jump Markov chain Monte Carlo technique,” IEEE Transactions on Wireless Communications, vol. 7, pp. 532–542, 2008.
-  S. Scalise, H. Ernst, and G. Harles, “Measurement and modeling of the land mobile satellite channel at Ku-band,” Vehicular Technology, IEEE Transactions on, vol. 57, no. 2, pp. 693 –703, 2008.
-  P. Green, “Reversible jump MCMC computation and Bayesian model determination,” in Biometrika, vol. 82, no. 40, 1995, pp. 711–732.
-  W. Turin and M. M. Sondhi, “Modeling Error Sources in Digital Channels,” IEEE Journal on Selected Areas in Communications, vol. 11, pp. 340–347, 1993.
-  L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Proceedings of the IEEE, February 1989, pp. 77(1):257–286.
-  W. Turin, Performance Analysis and Modeling of Digital Transmission Systems (Information Technology: Transmission, Processing and Storage), 1st ed. Springer, May 2004.
-  Mobile satellite channeL with Angle DiversitY. [Online]. Available: http://telecom.esa.int/telecom/www/object/index.cfm?fobjectid=29020
-  S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by Simulated Annealing,” Science, Number 4598, 13 May 1983, vol. 220, 4598, pp. 671–680, 1983.
-  W. C. Jakes, Microwave mobile communications. Wiley, New York, 1974.