On the Application of the BaumWelch Algorithm for Modeling the Land Mobile Satellite Channel
Abstract
Accurate channel models are of high importance for the design of upcoming mobile satellite systems. Nowadays most of the models for the land mobile satellite channel (LMSC) are based on Markov chains and rely on measurement data, rather than on pure theoretical considerations. A key problem lies in the determination of the model parameters out of the observed data. In this work we face the issue of state identification of the underlying Markov model whose model parameters are a priori unknown. This can be seen as a hidden Markov model (HMM) problem. For finding the maximum likelihood (ML) estimates of such model parameters the BaumWelch (BW) algorithm is adapted to the context of channel modeling. Numerical results on test data sequences reveal the capabilities of the proposed algorithm. Results on real measurement data are finally presented.
 LT
 Lubytransform
 CER
 codeworderrorrate
 ML
 maximumlikelihood
 MPE
 multiprotocol encapsulation
 FEC
 forward error correction
 MDS
 maximum distance separable
 RS
 Reed Solomon
 BEC
 binary erasure channel
 BEEC
 binary erroranderasure channel
 w.r.t.
 with respect to
 BEC
 binary erasure channel
 SEME
 singleerror multipleerasures
 GE
 Gaussian elimination
 i.i.d.
 independent and identically distributed
 LDPC
 lowdensity paritycheck
 BSC
 binary symmetric channel
 CRC
 cyclic redundancy check
 IT
 iterative
 GJE
 GaussJordan elimination
 EEC
 erroranderasure channel
 PPM
 pulseposition modulation
 GeIRA
 generalized irregular repeataccumulate
 LLR
 loglikelihood ratio
 LMSC
 land mobile satellite channel
 HMM
 hidden Markov model
 BW
 BaumWelch
 MAP
 maximum a posteriori
 LoS
 line of sight
 p.d.f.
 probability density function
 STM
 state transition matrix
 BCJR
 BahlCockeJelinekRaviv
 SISO
 softin softout
 EM
 expectationmaximization
 SDARS
 satellite digital audio radio service
 ML
 maximum likelihood
 GPS
 global positioning system
 DVB
 digital video broadcasting
 ESA
 European Space Agency
 SA
 simulated annealing
I Introduction
Satellite services to mobile users are experiencing a renewed interest thanks to the licenses granted for Sband usage for broadcast and interactive services [1, 2, 3]. The underlying communication channel, referred to as land mobile satellite channel (LMSC), is characterized by strong variations of the received signal power. Obstacles in the propagation path between the satellite and the mobile terminal, such as buildings or trees may cause shadowing or even a complete blockage of the signal. With increasing frequency and decreasing elevation angle such events become more and more likely and strongly impact service availability. A further source of fading is due to multipath propagation: objects in the vicinity of the receiver are source of reflections that cause constructive or destructive interference. In the past several authors proposed Markov chain models to describe the behavior of the LMSC [4, 5, 6, 7, 8]. The modeling approach can be divided into two stages. First a Markov chain is set up to model slow transitions between different signal levels due to macroscopic effects such as blockage, shadowing, etc. In practice models with two or three states are common, but also a larger number of states is possible. Second, fast signal variations within each state due to multipath are taken into account assuming that the signal amplitude follows some specific distribution. To give an example, a Ricean distribution may be used to describe the signal amplitude in line of sight (LoS) conditions, whereas the amplitude in a blockage state could be assumed to be Rayleigh distributed.
Knowing the underlying channel model, a major issue consists in how to determine the model parameters out of a sequence of measurement data. In literature there exist several approaches, most of them being rather simple and empirical. In [6] the authors propose first to associate with each measurement sample a state of the underlying Markov chain. This association is done manually. Then, for each state the distribution of the associated samples is approximated by some known distribution by means of curve fitting. In [4] the weighted sum of some known distributions is fitted to the probability density function (p.d.f.) of the measured data. This gives the parameters for the distributions in the different states. Then, each sample is associated with a state by placing thresholds on the signal level. The thresholds are put according to the state probabilities from the fitting step. For highly overlapping distributions, this only works with limited accuracy, as we will show later. A more rigorous attempt is the technique in [7] based on reversible jump Monte Carlo computation [9]. It suggests fully blind estimation, making no prior assumptions on the number of states, nor on the specific distributions, allowing huge flexibility. However, this has the price of a significant increase in complexity and the resulting states and distributions often lack sufficient explanations in terms of underlying physical effects.
Within this work we propose a further way to estimate the model parameters. It exploits the fact that the state identification can be seen as a hidden Markov model (HMM) problem: out of the channel observation we would like to draw conclusions about the underlying Markov process that is not directly observable. A solution to this problem is given by the BaumWelch (BW) algorithm, that has been widely used in other fields, such as speech or pattern recognition. An application to models of digital channels has already been provided in [10]. In the sequel our focus is on the LMSC. We impose some constraints on the BW algorithm in order to improve its convergence and for sake of simplification. In particular it is wellknown that its convergence properties depend on the initial model assumptions. Hence, unlike in [7] we assume prior knowledge on the type of distributions and the distribution parameters (to be obtained by a preceding curve fitting step). Also, we fix the number of states in advance.
The remaining part of the paper is organized as follows. In Section II we recap the BW algorithm. Further we introduce a logdomain computation of the forwardbackward metric of the BW algorithm and discuss some adaptations. Section III reports the performance of the algorithm on test data, as well as on sequences of real measurement data. A comparison with the method in [4] is provided.
Ii Overview of the Bw Algorithm
Following the footsteps of [11], we consider next the problem of associating an observed sample with a state of our HMM. The BW algorithm can be applied to maximize the probability of a state given the entire observation sequence. Let us denote as the state of the HMM at time , and as the vector of observations. The problem can be formalized as follows: given the vector of , we are interested in locally calculating the probability of being in state at time , i.e. . We define
(1) 
For now, we focus on the joint p.d.f. in the enumerator. Under the HMM assumption, after dropping the subscripts for simplicity, this p.d.f. can be rewritten as
(2) 
Here we used the shorthand to denote the elements of the observation sequence , with . Further, referring to (2), we define a forward metric and a backward metric . It follows that
(3) 
where the normalization by corresponds to in (1) and denotes the number of states. Being the transition probability from state to , the probability of state and the probability density function given state , the forward and the backward metric can be computed iteratively as
(4) 
(5) 
with the initial metrics and .
Figure 1 shows an excerpt of a state diagram for a Markov chain in the interval . The nodes of the trellis diagram at each time instance denote one of the three possible states, whereas the lines denote all possible transitions. Consider for example state . As indicated by the solid arrows, the metrics from all the nodes at , as well as contribute to the calculation of the probability of state at time . The most likely state sequence can be determined by choosing the state with the highest probability at each time instance.
Further, we may wish to estimate the probability of having a transition from state at time to state at time , given the observation . This can be expressed as
and it turns out that
(6) 
Even if the BW algorithm is in principle more general, we restrict ourselves to the simple case where the density functions , , are perfectly known, whereas we do not have any knowledge about the transition probabilities , which we want to estimate. To do so, we chose some initial values for ,^{1}^{1}1In principle the choice is arbitrary. Nevertheless initial values not too far from the real values facilitate the convergence of the BW algorithm. Good starting points can be found in literature (e.g. in [4]). run the forwardbackward algorithm and reestimate the transition probabilities and the initial state probabilities according to the reestimation formulae
Former values of and are replaced by the new estimates and the forwardbackward algorithm is run again, leading to updated estimates, which are then fedback. This process is iterated several times. The state identification step corresponds to the Estep of the expectationmaximization (EM) algorithm, where the model parameters are assumed to be fixed. The reestimation step corresponds to the Mstep, where the most likely model parameters are determined given the hidden state sequence.
A final remark is related to the convergence of the algorithm. It is well known that the EM algorithm, as well as its special instance, the BW algorithm, increases the likelihood of the model iteration by iteration till it converges to a maximum value [12]. However, the algorithm may converge to a local maximum of the likelihood function, rather than to a global one. The convergence of the algorithm can be facilitated by limiting the set of a priori unknown model parameters. Alternatively, a set of various starting points can be considered.
Iia Logdomain Implementation of the Bw Algorithm
Already for short observation sequences () the forwardbackward metric may get numerically unstable. As a solution, for each a normalization of the metric is usually performed [12]. Alternatively, a logdomain representation of the corresponding equations is proposed here. Let us define logprobabilities as , and , with being the natural logarithm. Then, (3) can be rewritten as
Note that the last term can be solved in the logdomain by applying recursively the socalled operator (also known as Jacobi logarithm) that is defined as . Exploiting the identity
and noticing that can be recursively calculated as , we have
In a similar manner, using the shorthand , and we have that the recursions
with and
with , . Finally, for the reestimation of the BW metrics we define . Taking (6) we have
The estimation of the parameters proceeds as
while
IiB Restrictions on the Bw Algorithm
For modeling the LMSC applying the BW algorithm some a priori restrictions on the channel parameters have been applied. This is mainly motivated by two reasons. First, if reasonably good estimates of some channel parameters are available, their use may facilitate the convergence of the algorithm. Second, we consider important that the obtained results have a clear physical interpretation. To give an example, we would like states to be associated with different physical events, such as blockage of the signal or direct LoS. During this work the following restrictions have been applied:

The type of distributions to be used has been fixed in advance. The original BW algorithm allows estimating the densities iteratively as a mixture of Gaussian distributions [11]. It is however wellestablished that typical propagation conditions (blockage or LoS, for instance) can be accurately modeled by known distributions.

The number of states is fixed in advance corresponding to some physical events, such as total blockage of the signal by obstacles or LoS .
Iii Applications of the Bw Algorithm
The capabilities of the BW algorithm on different data sets are evaluated next. First we generate artificially a test sequence of samples and run iterative reestimation. Knowing the original model parameters, our goal is to assess the quality of the reestimations provided by the BW algorithm. A comparison with the commonly used threshold method and some derivatives is done. Second, the BW algorithm is applied to data obtained from a measurement campaign.
Iiia Application on Test Data Sequences
Given a Markov chain with transition probabilities we generate a sequence of states . For each state, an observation sample according to the associated p.d.f. is produced. For simplicity, we fix the number of states to 2. For state 1, we choose a Gaussian distribution with standard deviation . To perform different tests, the mean value ranges from to . The Gaussian distribution associated with state 2 has mean and variance . Since typically the LMSC is highly correlated [6], we choose the state transition probabilities of the Markov chain
with corresponding state probabilities and . The length of the state sequence (observation sequence) was set to .
Given the observations and the knowledge on the p.d.f.s, our iterative reestimation algorithm is ran to determine the state sequence , for , as well as the transition probabilities and the state probabilities . It should be obvious that the closer the mean values of both Gaussian distributions are, the bigger shall be the deviation between the reestimated state sequence , the associated reestimated transition probabilities , as well as state probabilities and the actual values. To measure the distance of the two distributions associated with the two states, we make use of the Bhattacharyya distance
For sake of comparison we also apply the threshold method to separate the states [4]. Samples below the threshold are associated with one state, the ones above with the other. We select the threshold , such that the average error probability
is minimized. In addition, we assume a priori knowledge of the state probabilities (which could be e.g. provided by a previous curvefitting step). For two Gaussian distributions with variances , this yields
Further, to suppress frequent state transitions (crossing of the threshold) we apply moving average filtering on the observation sequence. The span of the moving average is set to 10 or 20 samples.
Figure 2 illustrates the estimated transition probability versus the Bhattacharyya distance for the BW algorithm and the threshold methods with and w/o filtering. Despite close mean values of both distributions, the BW algorithm provides always accurate estimates for the state transition probability , whereas the threshold methods typically fail when . Empirically we found that best results for the threshold methods can be obtained with an averaging window span of 10 samples. It shall be noted however that in this case the resulting state probabilities deviate remarkably as illustrated in Table I. It turns out that with increasing averaging window size even for the estimated state probability is too low. Figure 3 depicts the share of wrongly labeled states in the estimated state sequence , obtained through a comparison of the original state sequence with . Again the BW algorithm provides by far the best results, followed by the threshold methods with filtering. Note that for low Bhattacharyya distances the share of errors converges to 0.33 which corresponds to .
BW  T1  T10  T20  

1.13  0.60  0.33  0.33  0.31  0.29 
0.78  0.50  0.33  0.32  0.30  0.28 
0.50  0.40  0.33  0.31  0.28  0.26 
0.28  0.30  0.33  0.28  0.22  0.20 
0.13  0.20  0.33  0.22  0.07  0.04 
0.03  0.10  0.33  0.08  0.00  0.00 
IiiB Application on Measurement Data
In fall 2008 a vast measurement campaign was carried out along the US East Coast in the framework of the European Space Agency (ESA) funded MiLADY project [13]. During the field trials the signal levels of the four satellite digital audio radio service (SDARS) satellites were recorded with a mobile vehicular receiver. A statistical channel model was derived out of the collected measurement data employing the BW algorithm. The proceedings are as follows: we first perform a curvefitting step on the overall p.d.f. of similar to [4]. We obtain parameters of the distributions in the different states which serve as input for the BW algorithm. The resulting state probabilities are used to initialize . The curve fitting is performed using simulated annealing (SA) [14], a fast metaheuristic method for global optimization. In case the function to be optimized has several local maxima SA may overcome these and converge to the global minimum. Following literature, we chose three simple distributions to characterize the fast signal variations in the different states. The signal amplitude is assumed to follow a Rice distribution in case of direct LoS to the satellite. We associate a lognormal distribution with the shadowing state and assume that the signal amplitude in the blockage state is Rayleigh distributed. As second step, a preprocessing stage is required. The fast signal variations within a state are known to be correlated (see e.g. [15]). However, (2) implicitly assumes independency among observation samples given a certain state. To comply with the independence assumption, the measurement data is downsampled, taking into account the coherence time of the process^{2}^{2}2The spatial separation between samples after downsampling was chosen to be in accordance with [4]. . This leads the final observation . Finally, given the three distributions and the observed sequence, the BW algorithm is applied as described in Section II.
Let’s consider a typical US urban environment with a satellite elevation of . The solid line in Figure 4 shows the p.d.f. of the measured signal envelope, whereas the dashed lines with markers give the results of curve fitting using the three p.d.f.s specified previously. The weighted sum of the Rice, lognormal and Rayleigh distributions is also plotted (dashed with diamonds) and turns out to be close to the p.d.f. of the measured data. The Bhattacharyya distance between the lognormal (Rayleigh) and Rice (lognormal) distribution is 0.5 (1.1), thus posing challenges for state identification. Table II gives the mean state durations and state probabilities obtained with different state identification methods with minimum state duration set to . Results for the BW algorithm, the threshold method from [4], as well as the modified threshold method with a filter length of 10 samples are shown. It can be observed that the BW algorithm and the threshold method yield the same state probabilities as obtained by means of curve fitting. However, the mean state durations obtained by the threshold method are very short. As illustrated in Figure 2 the threshold method tends to overdimension , thus to underdimension . If a prior filtering step is applied, the state durations become longer than the ones obtained with BW. This is caused by an underdimensioning of for Bhattacharyya distances greater than 0.5 (c.f. Figure 2). Here, the state probabilities are no longer preserved.
Method  LoS  Shadowing  Blockage  

BW  0.66  0.14  0.20  

T1  0.66  0.14  0.20 

T10  0.70  0.12  0.18 
BW  22.42  4.11  26.88  

T1  7.25  1.48  8.90 

T10  64.35  6.87  25.15 
Iv Conclusion
This work investigates the application of the BW algorithm to determine the parameters for a LMSC model out of a set of measurement data. The BW algorithm, allows estimating iteratively the hidden state sequence and the transition probabilities of the underlying HMM even for highly overlapping states. Especially in environments with frequent shadowing events conventional methods, such as the threshold methods may lead to inaccurate results on the state transition matrix (STM) of the hidden Markov process. Adaptations of the BW algorithm presented here guarantee numerical stability, as well as proper convergence at manageable complexity. Adaptations to channels different from the LMSC are possible and may be a matter of future investigation.
V Acknowledgements
This work was partly carried out in the framework of the ESA funded MiLADY project (contract no. 21159/07/NL/GLC). The authors would like to acknowledge Dr. S. Scalise and R. Prieto Cerdeira for the useful discussions.
References
 [1] A. Bolea Alamanac, P. Burzigotti, R. De Gaudenzi, G. Liva, H. N. Pham, and S. Scalise, “Indepth analysis of the satellite component of DVBSH: Scenarios, system dimensioning, simulations and field trial results,” International Journal of Satellite Communications and Networking, vol. 27, no. 45, pp. 215–240, 2009. [Online]. Available: http://dx.doi.org/10.1002/sat.933
 [2] S. Scalise, C. Niebla, G. Gallinaro, M. Andrenacci, R. Rinaldo, O. Del Rio Herrero, M. Breiling, D. Finocchiaro, J. Cebrian Puyuelo, and G. Schlueter, “System design for PanEuropean MSS services in Sband,” in Advanced Satellite Multimedia Systems Conference (ASMS) and the 11th Signal Processing for Space Communications Workshop (SPSC), 2010 5th, 2010, pp. 538 –545.
 [3] Satellite Digital Audio Radio Service (SDARS). [Online]. Available: http://www.sirius.com/
 [4] E. Lutz, D. Cygan, M. Dippold, F. Dolainsky, and W. Papke, “The land mobile satellite communication channel  recording, statistics, and channel model,” in IEEE Trans. Vehicular Technology, vol. 40, no. 2, May 1991, pp. 375–386.
 [5] Y. Karasawa, K. Kimura, and K. Minamisono, “Analysis of availability improvement in LMSS by means of satellite diversity based on threestate propagation channel model,” in IEEE Trans. Vehicular Technology, vol. 46, no. 4, November 1997, pp. 957–1000.
 [6] F. PerezFontan, M. VazquezCastro, C. Cabado, J. Garcia, and E. Kubista, “Statistical modeling of the lms channel,” in IEEE Trans. Vehicular Technology, vol. 50, no. 6, November 2001, pp. 1549–1567.
 [7] C. Alasseur, S. Scalise, L. Husson, and H. Ernst, “A novel approach to model the land mobile satellite channel through reversible jump Markov chain Monte Carlo technique,” IEEE Transactions on Wireless Communications, vol. 7, pp. 532–542, 2008.
 [8] S. Scalise, H. Ernst, and G. Harles, “Measurement and modeling of the land mobile satellite channel at Kuband,” Vehicular Technology, IEEE Transactions on, vol. 57, no. 2, pp. 693 –703, 2008.
 [9] P. Green, “Reversible jump MCMC computation and Bayesian model determination,” in Biometrika, vol. 82, no. 40, 1995, pp. 711–732.
 [10] W. Turin and M. M. Sondhi, “Modeling Error Sources in Digital Channels,” IEEE Journal on Selected Areas in Communications, vol. 11, pp. 340–347, 1993.
 [11] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Proceedings of the IEEE, February 1989, pp. 77(1):257–286.
 [12] W. Turin, Performance Analysis and Modeling of Digital Transmission Systems (Information Technology: Transmission, Processing and Storage), 1st ed. Springer, May 2004.
 [13] Mobile satellite channeL with Angle DiversitY. [Online]. Available: http://telecom.esa.int/telecom/www/object/index.cfm?fobjectid=29020
 [14] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by Simulated Annealing,” Science, Number 4598, 13 May 1983, vol. 220, 4598, pp. 671–680, 1983.
 [15] W. C. Jakes, Microwave mobile communications. Wiley, New York, 1974.