On the Achievable Communication Rates of Generalized Soliton Transmission Systems
† Work in progress as part of Eado Meron's PhD.
Abstract
We analyze the achievable communication rates of a generalized soliton-based transmission system for the optical fiber channel. This method is based on modulating parameters of the scattering domain, via the inverse scattering transform, with the information bits. The decoder uses the direct spectral transform to estimate these parameters and decode the information message. Unlike ordinary On-Off Keying (OOK) soliton systems, the solitons' amplitude may take values in a continuous interval. A considerable rate gain is shown in the case where the waveforms are 2-bound soliton states. Using traditional information theory and inverse scattering perturbation theory, we analyze the influence of the amplitude fluctuations, as well as the soliton arrival-time jitter, on the achievable rates. Using this approach we show that the time-of-arrival jitter (Gordon-Haus effect) limits the information rate in a continuous manner, as opposed to the strict threshold it imposes on OOK systems.
I Introduction
Communication through optical fiber channels has evolved enormously in the past couple of decades, leading to unprecedented information rates. Current information-theoretic techniques have not produced relevant methods for predicting capacity bounds for these channels.
The nonlinear terms that affect signal evolution lead to the following question: Is the information capacity of the optical fiber channel monotonically increasing with the input power, and if so, does the capacity grow logarithmically with power as it does for linear channels? Moreover, as the complexity allowed in receivers grows, one looks for insights regarding the best (not necessarily the simplest) modulation schemes, signal spaces and error-correcting codes.
The basic generic partial differential equation (PDE) that describes the evolution of the electric field in space and time (in one dimension) in the optical fiber channel is (using normalized coordinates and the notation of [1]):
(1) i ∂q/∂Z + (1/2) ∂²q/∂T² + |q|² q = 0
where q = q(Z, T) is the normalized complex field envelope, the input of the channel is q(0, T) and the output is q(L, T) at propagation distance L. This equation is also known as the nonlinear scalar Schrödinger (NLS) equation.
Since the equivalent channel is nonlinear, a Fourier frequency-based analysis is not applicable. The usual way to analyze a continuous-time channel with traditional information-theoretic methods is to reduce the problem to a discrete one by considering the Nyquist samples of the input and output. However, since a band-limited input signal evolves into an output signal of infinite bandwidth, it is hard to find such discrete-time models. We stress that the nonlinearity invoked by the channel is fundamental and is conceptually different from nonlinearities caused by transmitter/receiver elements, e.g., amplifier nonlinearities, that have been studied in the past.
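The spectral broadening that defeats a Nyquist-sample reduction is easy to observe numerically. The following is a minimal split-step Fourier sketch, assuming the normalized NLS of (1); the grid sizes, the Gaussian input and the band cutoff are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal split-step Fourier sketch (all parameters illustrative) for the
# normalized focusing NLS  i q_Z + (1/2) q_TT + |q|^2 q = 0, showing that
# a smooth, effectively band-limited input develops new spectral content.
T = np.linspace(-20.0, 20.0, 1024, endpoint=False)
dT = T[1] - T[0]
w = 2.0 * np.pi * np.fft.fftfreq(T.size, d=dT)  # angular frequency grid
q = 2.0 * np.exp(-T**2)                          # smooth Gaussian input
dZ, steps = 1e-3, 2000                           # propagate to Z = 2

def band_fraction(field, cutoff=5.0):
    # fraction of spectral energy beyond |w| = cutoff
    spec = np.abs(np.fft.fft(field)) ** 2
    return spec[np.abs(w) > cutoff].sum() / spec.sum()

frac_in = band_fraction(q)
for _ in range(steps):
    q = q * np.exp(1j * np.abs(q) ** 2 * dZ)                    # nonlinear step
    q = np.fft.ifft(np.fft.fft(q) * np.exp(-0.5j * w**2 * dZ))  # dispersive step
frac_out = band_fraction(q)
print(frac_in, frac_out)  # the out-of-band fraction grows
```

The input carries essentially no energy beyond the cutoff, while the propagated field does, so no fixed sampling rate captures the output exactly.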
A different approach to analyzing signal evolution in nonlinear channels is the inverse scattering transform (IST). In this paper we present this method and apply it to a few tractable problems in which we approximate the achievable data rates. We also explain how this method should be developed to characterize the channel capacity and useful modulation schemes. A similar approach, first proposed by Hasegawa and Nyu ([2]), suggested using multiple solitonic waveforms. It should be noted that the IST approach presented in this paper is not complete in the following aspects:

It does not provide single-letter capacity results, but rather a new method to evaluate the capacity, which we feel is more aesthetic and better suited for this channel.

It does not solve the problems associated with the bounded symbol rate of solitonic waveforms, which is characterized by the Gordon-Haus bound ([3]).

It lacks a simple representation of the manner in which white noise is projected onto complex solitonic waveforms.
We now give a short introduction to the inverse scattering transform, which solves a set of nonlinear evolution problems via the solution of three linear problems. A recent, more complete introduction to the IST and its properties can be found in [4].
II A primer on the inverse scattering transform
The inverse scattering method does not consist of a single generic transform. In fact, it is more like a recipe for solving a family of nonlinear evolution problems. This recipe involves finding two q-dependent operators, L and M, that obey certain conditions. The first of the two operators defines an eigenvalue problem for an auxiliary wave function. This problem gives rise to solutions that obey boundary conditions at T → −∞ and T → +∞. The way these solutions evolve from T → −∞ to T → +∞ defines the scattering coefficients, or the scattering data, which is analogous to the spectral content in the Fourier frequency domain for linear channel problems. Extracting the scattering data from the q-dependent operator is called the direct transform. Due to special properties of the above operators, the evolution of the scattering data in time is rather simple. Moreover, there is a well-defined inverse transform that maps the scattering data back to q. All of the above steps, direct transform, inverse transform and time evolution, are essentially linear problems. We now present the details of the IST for the NLS.
To solve integrable systems such as the NLS, one needs to express the system as the compatibility condition of two linear equations for a wave function ψ:
(2) L ψ = ζ ψ
(3) ∂ψ/∂Z = M ψ
where L and M are differential operators in the derivative ∂/∂T and are called a Lax pair if:
(4) ∂L/∂Z = [M, L] ≡ ML − LM
The right-hand side of (4) is called the commutator of M and L. If (4) holds, then one can show that the eigenvalues ζ of the operator L are Z-invariant, dζ/dZ = 0, even though q is not invariant.
Finding a Lax pair for a given channel is not an obvious task. The Lax pair for the NLS, found by Zakharov and Shabat, is given by:
(5)  
(6) 
It is readily verified that for these operators the compatibility condition (4) results in the NLS equation. To solve equation (2) we define vector wave functions ψ(T, ζ) and ψ̄(T, ζ), for real ζ, with the asymptotic boundary conditions:
(7)  
(8) 
The pair (ψ, ψ̄) is a complete system of solutions for (2). Therefore:
(9) 
For T → +∞ we have:
(10) 
Comparing with equation (7), we recognize a(ζ) and b(ζ), which characterize the scattering data, as the transmission and reflection coefficients. The origin of these names is the fact that they describe what happens to a wave as it evolves from T → −∞ to T → +∞ and scatters due to a certain "potential" q (these terms are borrowed from quantum physics).
The discrete eigenvalues of the direct scattering problem are the set of points:
(11) ζ_k = ξ_k + iη_k,  Im ζ_k > 0,  k = 1, …, N
for which:
(12) a(ζ_k) = 0
Equation (12) shows that both ψ and ψ̄ approach zero as |T| approaches infinity, i.e., these are bound states. The scattering data, which has a one-to-one correspondence with q, and hence carries the same information, is comprised of:
(13) { a(ζ), b(ζ) for real ζ; ζ_k, C_k, k = 1, …, N }
where:
(14) C_k = b_k / a′(ζ_k)
are called the norming constants of the bound states.
The time evolution of the scattering data is governed by (3), the solution of which (see [1]) is:
(15)  
(16)  
(17) 
The inverse problem of finding given the scattering data is solved by a set of linear integral equations which are beyond the scope of this introduction.
The IST is important because it allows the use of linear techniques to solve initial value problems for nonlinear evolution equations. The main advantages of the IST are that the number of degrees of freedom a signal is comprised of, i.e., the number of solitons and the radiation bandwidth, does not change through the signal evolution, and that there are natural time-invariant scalar entities, i.e., the eigenvalues. The evolution of the solution in time is most naturally described through the IST, and thus the IST may lead us to insights regarding communication strategies. For an in-depth survey of the IST, also known as the nonlinear Fourier transform, and an OFDM-like communication transmission method, see the papers by Yousefi et al. ([4, 5]).
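As a concrete illustration of the direct transform, the discrete eigenvalue of a sech pulse can be located numerically. The sketch below assumes the standard focusing Zakharov-Shabat convention (the paper's exact operator normalization is not reproduced here), under which the classical Satsuma-Yajima result gives eigenvalues ζ = i(A − k + 1/2) for q(T) = A sech(T); for A = 1 a single soliton with ζ ≈ 0.5i is recovered by bisection on the imaginary axis.

```python
import math

# Sketch of the direct transform for q(T) = A/cosh(T), assuming the
# standard focusing Zakharov-Shabat system
#   v1' = -i*zeta*v1 + q*v2,   v2' = -conj(q)*v1 + i*zeta*v2.
# For purely imaginary zeta = i*eta and real q the system is real; the
# discrete eigenvalues are the zeros of a(i*eta) = v1(L) * exp(-eta*L).
A, L = 1.0, 15.0

def a_coeff(eta, steps=2000):
    # Integrate the Jost solution from T = -L to +L with RK4.
    def f(t, v1, v2):
        qt = A / math.cosh(t)
        return eta * v1 + qt * v2, -qt * v1 - eta * v2
    t, h = -L, 2.0 * L / steps
    v1, v2 = math.exp(-eta * L), 0.0   # boundary behavior at T -> -inf
    for _ in range(steps):
        a1, b1 = f(t, v1, v2)
        a2, b2 = f(t + h / 2, v1 + h / 2 * a1, v2 + h / 2 * b1)
        a3, b3 = f(t + h / 2, v1 + h / 2 * a2, v2 + h / 2 * b2)
        a4, b4 = f(t + h, v1 + h * a3, v2 + h * b3)
        v1 += (h / 6.0) * (a1 + 2 * a2 + 2 * a3 + a4)
        v2 += (h / 6.0) * (b1 + 2 * b2 + 2 * b3 + b4)
        t += h
    return v1 * math.exp(-eta * L)

lo, hi = 0.1, 0.9          # bracket the sign change of a(i*eta)
a_lo = a_coeff(lo)
for _ in range(30):        # bisection for the zero of a
    mid = 0.5 * (lo + hi)
    a_mid = a_coeff(mid)
    if a_lo * a_mid <= 0.0:
        hi = mid
    else:
        lo, a_lo = mid, a_mid
eta_hat = 0.5 * (lo + hi)
print(eta_hat)  # ~0.5
```

The same transfer-matrix idea, run over a fine ζ grid, is what a practical receiver based on the direct transform would implement.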
Actually, Hasegawa and Nyu (see [2, 1]) proposed a communication method that utilizes the fact that the eigenvalues associated with the IST do not change in time. The advantages of the method proposed by Hasegawa et al. are that it is inherently multi-valued and that it is similar to frequency-based methods for linear channels. The authors, however, do not analyze the effects of amplifier noise on the eigenvalues and its implications on channel capacity. In the following we elaborate on the idea of eigenvalue communication, extend it, and use results from perturbation theory for nearly integrable models (see for example [6, 7]) to estimate the capacity of nonlinear channels. We extend the idea of eigenvalue communication to that of spectral data modulation, and use the inverse scattering transform at the transmitter and the direct spectral transform at the receiver. We quantify the effects of amplitude fluctuations and jitter on the achievable communication rates and evaluate them for realistic configurations.
III Carrying information using the scattering data
We assume that the channel model is represented by:
(18) i ∂q/∂Z + (1/2) ∂²q/∂T² + |q|² q = ε n(Z, T)
where ε n(Z, T) is the perturbation term. Throughout this paper we assume that n(Z, T) is a white Gaussian noise process (in space and time) with a unit power spectral density (PSD), and that ε is used as a scaling parameter for the noise power that can be related to the physical parameters of the channel. We will later plug in these parameters to obtain practical results. The noise is generated by the effects of amplifiers that are spread along the fiber, but we assume it is injected adiabatically, i.e., infinitesimal noise is admitted at every point along the channel.
The information rate, R, that can be achieved on this channel is upper bounded by the channel capacity, which is the maximal mutual information between the channel's input X and output Y [8]:
(19) R ≤ C = max I(X; Y)
where the maximization is taken over some input constraint (e.g., an average power constraint, a peak power constraint, a Fourier bandwidth constraint, or a maximal number of solitons). Evaluating the quantity above turns out to be a very difficult task for nonlinear channels. In this paper we argue that the most tractable way of evaluating this quantity is through the statistics of the scattering data of the IST, namely the eigenvalues and the absolute values of the norming constants.
Since the IST is a one-to-one transformation, the mutual information between the input and output waveforms is equal to the mutual information between the corresponding scattering data, i.e.,
(20) 
To lower bound this quantity one can assume that the input is a reflectionless potential, so that the information is transmitted solely through the discrete eigenvalues and the corresponding norming constants, i.e.,
where the time index is added since the Gaussian noise changes the eigenvalues (that are otherwise constant) and can also possibly change their number via the birth/death of a soliton.
The observation that the mutual information in a nonlinear integrable channel can, and should, be evaluated through the statistics of the scattering data is the main message of this paper. This approach is motivated by several reasons. First, unlike in the linear spectral domain (i.e., Fourier methods, where spectral broadening is a result of the nonlinearity), the number of degrees of freedom in the scattering domain remains unchanged throughout the noiseless evolution. Second, the eigenvalues and norming constants serve as scalar candidates for the transmission of information, implying a new notion of a nonlinear signal space. The evaluation of equation (III) is still a cumbersome task, yet it can be approximated under some further restrictions on the input signals.
IV Main Results
In the generalized soliton transmission system we analyze, a codeword is a (large) set of symbols, and each symbol is in fact a set of eigenvalues and norming constants. At the transmitter, the waveform to be transmitted is generated using the inverse scattering transform. At the receiver, direct scattering is applied to derive the set of (perturbed) eigenvalues and norming constants. The waveforms used by the transmitter have infinite support but decay exponentially, so that if we truncate them at a suitable distance to create a finite symbol period, the resulting soliton interaction is negligible compared to the added noise.
Throughout this section, the imaginary parts of the eigenvalues, which can be considered generalized amplitudes, will be the information-carrying agents.
IV-A Information embedded in a single soliton
In this setting single solitons are modulated; unlike in ordinary OOK, their amplitudes belong to a continuous interval. Without a perturbation, the single-soliton solution of the NLS is
(21) q(Z, T) = η sech(η(T − T_0 − κZ)) exp(iκT + i(η² − κ²)Z/2 + iθ_0)
for which the corresponding discrete eigenvalue of the IST is ζ = (κ + iη)/2. For the rest of the paper we assume all eigenvalues are purely imaginary (except for perturbations), i.e., κ = 0. The localization of the soliton is around T_0.
We use results from [1] for the first order perturbations of the eigenvalues. The resulting fluctuation in the amplitude is:
(22) 
where .
Assuming n(Z, T) is a band-limited white Gaussian noise, we get:
(23) ⟨(δη)²⟩ ∝ ε² η
i.e., the variance of the additive amplitude noise is proportional to η (unlike ordinary multiplicative noise, for which the variance would be proportional to η²).
Thus, assuming the information is transmitted in the amplitude of a single soliton, we have the following scalar channel:
(24) y = η + n
where n is a Gaussian r.v. with zero mean and a variance of σ²η (σ² being proportional to ε²). We dismiss the probability that the soliton vanishes completely and allow y to theoretically be zero (or negative). This scenario can be prevented (with high probability) by using η_min > 0, which in the limit of η_min going to zero has a negligible effect on the capacity. We lower bound the mutual information for the case η ∈ [η_min, η_max]. It is assumed that the noise is Gaussian and of the largest possible variance:
(25) I(η; y) = h(y) − h(y | η)
(26) ≥ h(η) − h(y | η)
(27) ≥ h(η) − (1/2) log(2πe σ² η_max)
(28) = log(η_max − η_min) − (1/2) log(2πe σ² η_max)
where we use the uniform distribution as the input prior and bound (27) using the fact that Gaussian noise has the highest entropy for a given variance. We refer to this quantity as the "soliton spectral efficiency"; it can be considered the NLS analog of the spectral efficiency in conventional (linear) channels, where it is measured in bits/Hertz.
The capacity can also be evaluated directly using the Blahut-Arimoto algorithm ([9, 10]). Using this algorithm for the suitably restricted channel model, we get that the true capacity is 1.568 bits per channel use, while our bound reads 1.275 bits per channel use. The capacity-achieving prior and the resulting output distribution are plotted in Figures 2 and 3. Note that the capacity-achieving prior has both atoms and a continuous part, which is typical of interval-constrained capacity problems ([11]).
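The Blahut-Arimoto iteration itself is compact. The sketch below is generic, taking any row-stochastic channel matrix; the paper's exact amplitude grid and constraints are not reproduced, so the code is validated instead on a binary symmetric channel whose capacity is known in closed form.

```python
import numpy as np

# Generic Blahut-Arimoto sketch for a discrete memoryless channel given by
# the row-stochastic matrix P[x, y] = P(y|x); returns capacity in bits.
# Applying it to a fine discretization of the amplitude channel described
# above yields numbers of the kind quoted in the text.
def blahut_arimoto(P, iters=500):
    nx = P.shape[0]
    r = np.full(nx, 1.0 / nx)                 # input prior, start uniform
    for _ in range(iters):
        q = r[:, None] * P                    # r(x) P(y|x)
        q /= q.sum(axis=0, keepdims=True)     # posterior q(x|y)
        logr = (P * np.log(q + 1e-300)).sum(axis=1)
        r = np.exp(logr - logr.max())         # r(x) ∝ exp(Σ_y P(y|x) log q(x|y))
        r /= r.sum()
    # I(X;Y) = Σ_x r(x) Σ_y P(y|x) log( q(x|y) / r(x) )
    C = (r[:, None] * P * (np.log(q + 1e-300) - np.log(r)[:, None])).sum()
    return C / np.log(2.0)

# Sanity check on a channel with known capacity: a BSC with crossover 0.1,
# C = 1 - H2(0.1) ≈ 0.531 bits per use.
P_bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(P_bsc))
```

The same routine applied to a discretized continuous-amplitude channel reproduces the atoms-plus-continuum shape of the optimal prior mentioned above.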
IV-B Information embedded in a soliton train below the Gordon-Haus rate
The above result shows that the amplitude interval should be as large as possible, to allow each soliton to convey as many bits as possible. In fact, when one considers transmitting many solitons one after the other, there are other considerations that bound the optimal interval size from both sides, namely inter-soliton interaction and arrival-time jitter.
We now consider the case where many solitons are modulated sequentially. The distance between neighboring solitons is a multiple of the width of the widest soliton, with the multiple chosen so that the inter-soliton interaction has a negligible effect (compared to that of the noise) on the eigenvalues. The distance between solitons is inversely proportional to the symbol rate, and thus in an optimal system it is bounded from below.
Since we wish to assume a perfectly (or at least almost perfectly) synchronized communication system, the typical arrival-time jitter needs to be smaller than the distance between neighboring solitons. The time-of-arrival jitter is known to be directly connected to the fluctuations of the real part of the soliton eigenvalue, which is linearly related to the velocity of the soliton, as can be seen from (21). The fluctuations of the real part of the eigenvalue are very similar to those of the imaginary part:
(29) 
Using (29), we integrate the velocity fluctuations over the propagation distance to account for the arrival-time jitter (neglecting terms that do not originate from the velocity change):
(30) ⟨(δT_0)²⟩ ∝ ε² Z³
This is the well-known Gordon-Haus ([3]) phenomenon that bounds the symbol rate of all regular soliton systems (including OOK). The worst-case arrival-time jitter is proportional to Z^{3/2}. Thus, requiring an (almost) jitter-free model, e.g., a small out-of-synchronization probability, bounds the symbol rate from above.
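The cubic growth of the timing variance in (30) can be reproduced with a toy Monte Carlo; all constants below are illustrative and unrelated to any physical fiber parameters.

```python
import numpy as np

# Toy Monte Carlo of the Gordon-Haus mechanism: at each of `spans` amplifier
# spans, white noise kicks the soliton velocity; the arrival time integrates
# the velocity, so the timing variance grows like Z^3 (std like Z^{3/2}).
rng = np.random.default_rng(0)
spans, trials, sigma_v = 1000, 4000, 1e-2
kicks = rng.normal(0.0, sigma_v, size=(trials, spans))
velocity = np.cumsum(kicks, axis=1)      # random walk in velocity
position = np.cumsum(velocity, axis=1)   # integrated: arrival-time jitter
var = position.var(axis=0)
ratio = var[-1] / var[spans // 2 - 1]
print(ratio)  # ~ (Z / (Z/2))^3 = 8
```

Doubling the distance multiplies the variance by roughly eight, which is exactly why the jitter, rather than the noise power alone, ends up setting the symbol spacing.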
We wish to compare the gain (in terms of bits/second) of the continuous amplitude modulation scheme with that of the OOK modulation. We assume that the symbol spacing is tuned by the Gordon-Haus bound, requiring no jitter, and is shared by both the continuous system and the on-off reference system. The continuous system has a lower symbol rate than that of the reference system (one can also analyze the case where the symbol widths are not constant). However, the continuous system conveys more bits than just one per soliton. Weighing both terms, the continuous system has a bit rate which is
(31) 
times that of the reference system. We refer to this term as the "modulation gain". If one also considers the possibility that a symbol contains no soliton at all, and that, in that case, the transfer probability between the continuous interval and the zero hypothesis is less than p, then the modulation gain would approximately read:
(32)  
(33) 
where H(p) is the binary entropy of p (see the union of channels in [12]). The modulation gain is plotted in Figure 4 for different values of the parameters. It is evident that as the effective SNR improves, a larger amplitude interval is better, since it does not reduce the symbol rate.
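The bookkeeping behind adding the empty-slot hypothesis can be sketched with the union-of-channels formula from [12]: when each symbol is drawn from one of two disjoint channels of capacities C1 and C2, the combined capacity is log2(2^C1 + 2^C2). The capacity value below reuses the illustrative 1.275-bit bound from the single-soliton analysis, and error-free switching between the hypotheses is assumed.

```python
import math

# Union-of-channels sketch (Cover & Thomas, [12]): C1 is the continuous-
# amplitude soliton slot, C2 = 0 is the single "no soliton" hypothesis;
# the combined capacity is log2(2^C1 + 2^C2). Error-free switching assumed.
C1, C2 = 1.275, 0.0
C_union = math.log2(2.0 ** C1 + 2.0 ** C2)
print(C_union, C_union - C1)  # the empty slot buys extra bits
```

A nonzero transfer probability p between the hypotheses reduces this figure by roughly the H(p) term appearing in the modulation-gain expression above.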
IV-C Information embedded in a 2-bound soliton train below the Gordon-Haus rate
The system described above could be analyzed in the framework of perturbations to sech profiles, without necessarily using the perturbation theory of the inverse scattering transform. However, when considering more complicated symbols, made up of more than one soliton, the IST has major analytical and practical advantages. This is the case when the symbols are confined to be either a 2-soliton bound state or a single soliton (or none). We now analyze the modulation gain of this more complicated system and address such issues as common jitter and whether the solitons should be concentric or partially spaced apart.
The idea of transmitting a few concentric solitons was proposed in the paper by Hasegawa et al. However, a 2-bound soliton is affected by noise differently than each one of its components. We show that a 2-bound soliton solution has a larger jitter than its components. Therefore there is a tradeoff between the enlarged bit rate and the smaller symbol rate that is induced by the larger jitter.
The basic symbol is now comprised of a 2-bound soliton. This means the transmitter solves the following reflectionless algebraic inverse scattering problem for two eigenvalues ([1]):
The norming constants are used to localize the different eigenfunctions. As a generalization of the single-soliton case, we choose the norming constants such that each eigenfunction is localized around its generalized position. Actually, the eigenfunctions interact with one another, and the resulting time waveform is not a superposition of two single-soliton profiles. Nevertheless, their generalized positions remain unchanged throughout the evolution (apart from the noise influence) and can be recovered at the receiver. The generalized position evolution is given by (to first order):
and thus it behaves in the same way as the center of a single soliton. However, the fluctuations of the eigenvalues of a 2-bound soliton, both the imaginary and the real parts, are no longer orthogonal. In fact, they are highly correlated in the case of a small separation between the generalized locations, or in the case of very similar eigenvalues. Moreover, the variance of the fluctuations is generally magnified when the solitons "overlap". This effect makes modulating non-concentric solitons (or, actually, eigenfunctions) a sensible thing to do. We plot the variance of the eigenvalues as a function of the separation between the generalized positions in Figure 5.
In this setting the detector sees two eigenvalues and two norming constants that translate into generalized positions. All of these scalar quantities are now perturbed by noise. Since the two eigenfunctions are assumed to be much closer to each other than the spacing needed to neglect the Gordon-Haus jitter, we must account for the way the jitter affects the capacity.
In linear communication problems, a non-negligible jitter in the symbol arrival times can diminish the achievable rate to zero. This is due to the fact that in a linear channel the signal space is made up of translations of a limited number of base functions. Once there is jitter, these functions are no longer orthogonal and one cannot differentiate between neighboring symbols.
However, in a nonlinear integrable system, solitons can be detected through the direct scattering transform even if they are one on top of the other. Actually, they can be detected but not differentiated, i.e., both will be apparent but the receiver will not know which of the two belongs to the original slot.
To lower bound the achievable rate of the jitter-affected system, we assume that once the eigenfunctions are detected they are sorted according to their time of arrival. This channel is equivalent to transmitting a couple of solitons (eigenvalues), adding noise, and finally permuting them in case they have switched places. We denote the perturbed eigenvalues before and after the possible permutation correspondingly (n = 2 for the 2-bound soliton case). The permutation, which is a random variable, is denoted by R. The information-theoretic loss (in bits) due to the jitter is bounded by the entropy of the permutation, H(R).
For the two-soliton case, the permutation R.V. is equivalent to a Bernoulli R.V., where the mix-up probability is equal to the probability that the order of the generalized positions is changed. Using the assumption that the eigenvalues fluctuate approximately as if the solitons were far apart (which is not true when they walk by each other), we can approximate this probability: it equals the probability that the difference of the perturbed generalized positions changes sign. If this probability turns out to be 0.1, which is conventionally thought to be prohibitively large, the rate loss is only H(0.1) ≈ 0.5 bits for the 2-soliton symbol, i.e., only about 0.25 bits per soliton (H is the Shannon binary entropy function). The main advantage is a major increase in the soliton rate, since there are two solitons per symbol.
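Under the simplifying assumption of independent Gaussian position jitter (an assumption; as noted above, the fluctuations are actually correlated when the solitons overlap), the mix-up probability and its entropy penalty can be sketched in a few lines. The spacing and jitter values are illustrative.

```python
import math

# Hedged sketch of the mix-up probability: two generalized positions are
# independently Gaussian-jittered (std s) around slots a distance d apart.
# Their difference is ~N(d, 2 s^2), so the order swaps when it goes negative.
def swap_prob(d, s):
    return 0.5 * math.erfc(d / (2.0 * s))   # Q(d / (sqrt(2) * s))

def h2(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

p = swap_prob(2.0, 0.5)          # spacing 2, jitter std 0.5 (illustrative)
loss_per_soliton = h2(p) / 2.0   # H(R)/2 bits per soliton for the 2-bound symbol
print(p, loss_per_soliton)
```

Even a swap probability far above typical error-rate targets costs only a fraction of a bit, which is the quantitative point of this subsection.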
Assuming that the spacing between the solitons of the same symbol is small compared to the original distance between symbols, the soliton rate is increased by a factor close to two. Approximating the mix-up probability accordingly, the "modulation gain" of this setting compared to a simple OOK system is approximately:
(34) 
The modulation gain for a certain set of parameters is shown in Figure 6. The gain compared to single-soliton trains is roughly 2 for a wide set of parameters.
IV-D Approximating the information embedded in a soliton train slightly above the Gordon-Haus rate
The next natural generalization is to consider an N-bound solution that is made up of a train of well-spaced eigenfunctions (the spacing relates to the values of the norming constants); we assume N to be large, i.e., N > 5. The analysis of the former subsection is still a good approximation. The difference is that now the ambiguity in the time of arrival is not confined to a pair of solitons. Still, if the eigenfunctions are properly spaced, the entropy of the order-of-arrival sequence is mainly due to the probability that consecutive eigenfunctions change their order of arrival. The information-theoretic penalty on the bit rate due to this effect is:
(35) 
Now, assume a spacing between solitons that is much smaller than the one called for by the Gordon-Haus limit; the total modulation gain in this setting is:
(36) 
Again, there is no problem with trains of eigenfunctions with a typical mix-up probability (between consecutive eigenfunctions) of this order. Moreover, there is now a clear tradeoff in the eigenfunction spacing: the bigger the spacing, the smaller the symbol rate, while as the spacing becomes smaller, the penalty due to jitter grows, and so a unique maximum exists. The main disadvantage compared to the previous subsection is that the processing now involves a more complicated channel code. The main advantage is a larger symbol rate.
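The spacing tradeoff just described can be made concrete with a small numerical sketch. The bits-per-soliton value, jitter standard deviation and width overhead below are all illustrative assumptions, not the paper's parameters; the point is only the existence of an interior optimum.

```python
import math

# Sketch of the spacing tradeoff: rate(d) = (b - H2(p(d))) / (d + w), where
# b is the bits carried per soliton, w a fixed per-soliton width overhead,
# and p(d) the neighbor mix-up probability under Gaussian jitter of std s.
def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def h2(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def rate(d, b=1.275, s=0.5, w=0.5):
    return (b - h2(q_func(d / (math.sqrt(2.0) * s)))) / (d + w)

ds = [0.05 * k for k in range(1, 200)]   # candidate spacings 0.05 .. 9.95
best = max(ds, key=rate)
print(best, rate(best))  # interior maximum: a unique best spacing
```

Small spacings are killed by the entropy penalty and large spacings by the falling symbol rate, so a unique best spacing appears between the two regimes.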
The analysis above neglects a few things:

There is a small coupling between the amplitude and the time-of-arrival fluctuations. A precise analysis should only yield a higher rate.

When two solitons pass by each other, their perturbation statistics change. In many cases their amplitude fluctuations grow and become dependent. We ignore the growth in the fluctuations since, assuming the solitons are not too crowded, the walk-off is bounded in time and its effects are negligible. Furthermore, the dependency can only increase the rate.

We ignore the possibility that a soliton will die or be born. This happens with a small probability, and we assume that its effect on the achievable rates can also be bounded.
V Discussion and further work
The notion of modulating the "natural" domain of the channel is not new to communication theory. In fact, the scheme discussed in this paper can be considered the nonlinear analog of OFDM. Both methods allow for a natural examination of their respective channel capacities. There are two main differences between the two methods. The first is that in linear channels the noise projection on the different modes (spectral bands) is orthogonal, while in the nonlinear case the noise projection on the different modes (solitons) is orthogonal only in some cases (see Figure ). The second is that OFDM is very efficient in terms of complexity (through the use of the celebrated FFT and IFFT), while direct scattering is a computationally intensive method.
Future research directions include:

Find reasonable-complexity (preferably analog) methods to carry out the tasks of inverse and, especially, direct scattering at the transmitter and receiver.

Use the approach discussed in the paper with more complex potentials/waveforms (not reflectionless) to lower and upper bound the overall capacity (and not just achievable rates).

While the problems above are not related to information theory, there is a totally new and interesting information-theoretic problem that relates to communication via the scattering domain. When receiving waveforms that are comprised of N-bound solitons, or of solitons that are concentric due to jitter (and not through the constructed modulation), one detects a set of scalar values that can be detected but not differentiated. Essentially, the transmitter and receiver communicate through the transmission of a set, not a sequence, of perturbed scalar values. Clearly, transmitting and receiving a 3-bound soliton conveys less information than a sequence of three (ordered in time) solitons. The question is: how much less? We call this problem: communicating with colorless, but not massless, balls. For more on this issue see the work by Meron et al. [13].
References
 [1] A. Hasegawa and Y. Kodama, Solitons in optical communications. Oxford, 1995.
 [2] A. Hasegawa and T. Nyu, “Eigenvalue communication,” Journal of Lightwave Technology, vol. 11, no. 3, pp. 395–399, Mar. 1993.
 [3] J. P. Gordon and H. A. Haus, “Random walk of coherently amplified solitons in optical fiber transmission,” Opt. Lett., vol. 11, no. 10, pp. 665–667, 1986. [Online]. Available: http://ol.osa.org/abstract.cfm?URI=ol1110665
 [4] M. I. Yousefi and F. R. Kschischang, “Information transmission using the nonlinear Fourier transform, Part I: Mathematical tools,” CoRR, vol. abs/1202.3653, 2012.
 [5] ——, “Information transmission using the nonlinear Fourier transform, Part II: Numerical methods,” CoRR, vol. abs/1204.0830, 2012.
 [6] D. J. Kaup, “A perturbation expansion for the Zakharov-Shabat inverse scattering transform,” SIAM Journal on Applied Mathematics, vol. 31, no. 1, pp. 121–133, July 1976.
 [7] Y. S. Kivshar and B. A. Malomed, “Dynamics of solitons in nearly integrable systems,” Rev. Mod. Phys., vol. 61, no. 4, pp. 763–915, Oct 1989.
 [8] C. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pt. I, pp. 379–423; pt. II, pp. 623–656, 1948.
 [9] S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,” IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 14–20, Jan. 1972.
 [10] R. E. Blahut, “Computation of channel capacity and ratedistortion functions,” IEEE Transactions on Information Theory, vol. 18, pp. 460–473, 1972.
 [11] S. Shamai and I. Bar-David, “The capacity of average and peak-power-limited quadrature Gaussian channels,” IEEE Transactions on Information Theory, vol. 41, no. 4, pp. 1060–1071, 1995.
 [12] T. Cover and J. Thomas, Elements of Information Theory. Wiley series in telecommunications, 1991.
 [13] E. Meron, M. Shtaif, and M. Feder, “Information transmission between sets of values,” PhD work in progress, to be submitted, 2012.