# Capacity Lower Bounds of the Noncentral Chi-Channel with Applications to Soliton Amplitude Modulation

###### Abstract

The channel law for amplitude-modulated solitons transmitted through a nonlinear optical fibre with ideal distributed amplification and a receiver based on the nonlinear Fourier transform is a noncentral chi-distribution with $2n$ degrees of freedom, where $n=2$ and $n=3$ correspond to the single- and dual-polarisation cases, respectively. In this paper, we study capacity lower bounds of this channel under an average power constraint in bits per channel use. We develop an asymptotic semi-analytic approximation for a capacity lower bound for arbitrary $n$ and a Rayleigh input distribution. It is shown that this lower bound grows logarithmically with signal-to-noise ratio (SNR), independently of the value of $n$. Numerical results for other continuous input distributions are also provided. A half-Gaussian input distribution is shown to give larger rates than a Rayleigh input distribution for . At an SNR of dB, the best lower bounds we developed are approximately bit per channel use. The practically relevant case of amplitude shift-keying (ASK) constellations is also numerically analysed. For the same SNR of dB, a -ASK constellation yields a rate of approximately bit per channel use.

## I Introduction

Optical fibre transmission systems carrying the overwhelming bulk of the world’s telecommunication traffic have undergone a long process of increasing engineering complexity and sophistication [1, 2, 3]. However, the key physical effects affecting the performance of these systems remain largely the same. These are: attenuation, chromatic dispersion, fibre nonlinearity due to the optical Kerr effect, and optical noise. Although the bandwidth of optical fibre transmission systems is large, these systems are ultimately band-limited. This bandwidth limitation combined with the ever-growing demand for data rates is expected to result in a so-called “capacity crunch” [4], which caps the rate increase of error-free data transmission [4, 5, 6, 7]. Designing spectrally-efficient transmission systems is therefore a key challenge for future optical fibre transmission systems.

The channel model used in optical communication that includes all three above-mentioned key effects for two states of polarisation is the so-called Manakov equation (ME) [7, eq. (1.26)], [8, Sec. 10.3.1]. The ME describes the propagation of the optical field for systems employing polarisation division multiplexing. The ME therefore generalises the popular scalar nonlinear Schrödinger equation (NSE) [6, 9, 7, 8], used for single-polarisation systems. In both models, the evolution of the optical field along the fibre is represented by a nonlinear partial differential equation with complex additive Gaussian noise (the precise mathematical expressions for both channel models are given in Sec. II-A). The accumulated nonlinear interaction between the signal and the noise makes the analysis of the resulting channel model a very difficult problem. As recently discussed in, e.g., [10, Sec. 1], [11], [12], exact channel capacity results for fibre optical systems are scarce, and many aspects related to this problem remain open.

Until recently, the common belief among some researchers in the field of optical communication was that nonlinearity was always a nuisance that necessarily degrades the system performance. This led to the assumption that the capacity of the optical channel had a peaky behaviour when plotted as a function of the transmit power (nondecaying bounds can, however, be found in the literature, e.g., lower bounds in [10, 13] and upper bounds in [14, 15]). Partially motivated by the idea of improving the data rates in optical fibre links, a multitude of nonlinearity compensation methods have been proposed (see, e.g., [20, 21, 17, 16, 18, 19]), each resulting in different discrete-time channel models. Recently, a paradigm-shifting approach for overcoming the effects of nonlinearity has been receiving increased attention. This approach relies on the fact that both the ME and NSE in the absence of losses and noise are exactly integrable [23, 22].

One of the consequences of integrability is that the signal evolution can be represented using nonlinear normal modes. While the pulse propagation in the ME and NSE is nonlinear, the evolution of these nonlinear modes in the so-called nonlinear spectral domain is essentially linear [24], [25]. The decomposition of the waveform into the nonlinear modes (and the reciprocal operation) is often referred to as nonlinear Fourier transform (NFT), due to its similarity with the application of the conventional Fourier decomposition in linear systems [26]. (In the mathematics and physics literature, the name inverse scattering transform method is more commonly used for the NFT.) The linear propagation of the nonlinear modes implies that the nonlinear cross-talk in the NFT domain is theoretically absent, an idea exploited in the so-called nonlinear frequency division multiplexing [24, 27]. In this method, the nonlinear interference can be greatly suppressed by assigning users different ranges in the nonlinear spectrum, instead of multiplexing them using the conventional Fourier domain.

Integrability (and the general ideas based around NFT) has also led to several nonlinearity compensation, transmission and coding schemes [28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]. These can be seen as a generalisation of soliton-based communications [8], [9, 39, Chapter 5], which follow the pioneering work by Hasegawa and Nyu [40], and where only the discrete eigenvalues were used for communication. The development of efficient and numerically stable algorithms has also attracted a lot of attention [41]. Furthermore, there have also been a number of experimental demonstrations and assessments for different NFT-based systems [33, 34, 35, 36, 37, 38]. However, for systems governed by the ME, the only results available come from the recent theoretical work of Maruta and Matsuda [32].

Two nonlinear spectra (types of nonlinear modes) exist in the NSE and the ME. The first one is the so-called continuous spectrum, which is the exact nonlinear analogue of the familiar linear FT, inasmuch as its evolution in an optical fibre is exactly equivalent to that of the linear spectrum under the action of the chromatic dispersion and the energy contained in the continuous spectrum is related to that in the time domain by a modified Parseval equality [31, 26]. The unique feature of the NFT is, however, that apart from the continuous spectrum, it can support a set of discrete eigenvalues (the nondispersive part of the solution). In the time domain, these eigenvalues correspond to stable localised multi-soliton waveforms immune to both dispersion and nonlinearity [8]. The spectral efficiency of the multiple-eigenvalue encoding schemes is an area actively explored at the moment [29, 42, 43]. Multi-soliton transmission has also received increased attention in recent years, see, e.g., [44] and [45] and references therein. Finding the capacity of the multi-eigenvalue-based systems in the presence of in-line noise that breaks integrability still remains an open research problem. If only a single eigenvalue per time slot is used, the problem is equivalent to a well-known time-domain amplitude-modulated soliton transmission system, since the imaginary part of a single discrete eigenvalue is proportional to the soliton amplitude. In this paper, we consider this simple set-up, where a single eigenvalue is transmitted in every time slot. The obtained results are applicable not only to classical soliton communication systems, but also to the novel area of eigenvalue communications.

Although the set-up we consider in this paper is one of the simplest ones, its channel capacity is still unknown. Furthermore, the only results available in the literature [29, 48, 42, 43, 47, 46, 49] are exclusively for the NSE, leaving the ME completely unexplored. In particular, previous results include those by Meron et al. [48], who recognised that mutual information (MI) in a nonlinear integrable channel can (and should) be evaluated through the statistics of the nonlinear spectrum, i.e., via the channel defined in the NFT domain. Using a Gaussian scalar model for the amplitude evolution with in-line noise, a lower bound on the MI and capacity of a single-soliton transmission system was presented. The case of two and more solitons per one time slot was also analysed, where data rate gains of the continuous soliton modulation versus an on-off-keying (OOK) system were also shown. A bit-error rate analysis for the case of two interacting solitons has been presented in [50]. The derivations presented there, however, cannot be used straightforwardly for information theoretic analysis. Yousefi and Kschischang [29] addressed the question of achievable spectral efficiency for single- and multi-eigenvalue transmission systems using a Gaussian model for the nonlinear spectrum evolution. Some results on the continuous spectrum modulation were also presented. Later in [42], the spectral efficiency of a multi-eigenvalue transmission system was studied in more detail. In [43], the same problem was studied by considering the correlation functions of the spectral data obtained in the quasi-classical limit of large number of eigenvalues. Achievable information rates for multi-eigenvalue transmission systems utilising all four degrees of freedom of each scalar soliton in NSE were analytically obtained in [46]. 
These results were obtained within the framework of a Gaussian noise model provided in [29, 47] (non-Gaussian models have been presented in [51, 52]) and assuming a continuous uniform input distribution subject to peak power constraints. The spectral efficiency for the NFT continuous spectrum modulation was considered in [53, 54, 55]. Periodic NFT methods have been recently investigated in [56].

In [49], we used a non-Gaussian model for the evolution of a single soliton amplitude and the NSE. Our results showed that a lower bound for the capacity per channel use of such a model grows unbounded with the effective signal-to-noise ratio (SNR). In this paper, we generalise and extend our results in [49] to the ME. To this end, we use perturbation-based channel laws for soliton amplitudes previously reported in [51, 52] (for the NSE) and [57] (for the ME). Both channel laws are a noncentral chi ($\chi$) distribution with $2n$ degrees of freedom, where $n=2$ and $n=3$ correspond to the NSE and ME, respectively. Motivated by the similarity of the channel models mentioned above, in this paper we study asymptotic lower bound approximations on the capacity (in bit per channel use) of a general noncentral chi-channel with an arbitrary (even) number of degrees of freedom. To the best of our knowledge, this has not been previously reported in the literature. Similar models, however, do appear in the study of noise-driven coupled nonlinear oscillators [58].

The first contribution of this paper is to numerically obtain lower bounds for the channel capacity for three continuous input distributions, as well as for amplitude shift-keying (ASK) constellations with a discrete number of constellation points. For all the continuous inputs, the lower bounds are shown to be nondecreasing functions of the SNR under an average power constraint. The second contribution of this paper is to provide an asymptotic closed-form expression for the MI of the noncentral chi-channel with an arbitrary (even) number of degrees of freedom. This asymptotic expression shows that the MI grows unbounded and at the same rate, independently of the number of degrees of freedom.

## II Continuous-time Channel Model

### II-A The Propagation Equations

The propagation of light in optical fibres in the presence of amplified spontaneous emission (ASE) noise can be described by a stochastic partial differential equation which captures the effects of chromatic dispersion, nonlinear polarisation mode dispersion, optical Kerr effect, and the generation of ASE noise from the optical amplification process. Throughout this paper we assume that the fibre loss is continuously compensated along the fibre by means of (ideal) distributed Raman amplification (DRA)[59, 60]. In this work we consider the propagation of a slowly varying 2-component envelope over a nonlinear birefringent optical fibre, where and represent time and propagation distance, respectively. Our model also includes the 2-component ASE noise due to the DRA. We also assume a uniform change of polarised state on the Poincaré sphere [61].

The resulting lossless ME is then given by [7, eq. (1.26)], [8, Sec. 10.3.1], [62, 57] (throughout this paper, vectors are denoted by boldface symbols , while scalars are denoted by nonboldface symbols; the scalar product is denoted by , and over-bar denotes complex conjugation; the Euclidean norm is denoted by ; the partial derivatives in the partial differential equations are expressed as subscripts, e.g., , , etc.; the imaginary unit is denoted by )

(1) |

where the retarded time is measured in the reference frame moving with the optical pulse average group velocity, represents the slowly varying 2-component envelope of electric field, is the group velocity dispersion coefficient characterising the chromatic dispersion, and is the fibre nonlinearity coefficient. The pre-factor in (1) comes from the averaging of the fast polarisation rotation [8, Sec. 10.3.1], [61]. For simplicity we will further work with the effective averaged nonlinear coefficient when addressing the ME. In the case of a single polarisation state, the propagation equation above reduces to the lossless generalised scalar NSE [6, 9]

(2) |

In this paper we consider the case of anomalous dispersion (), i.e., the focusing case. In this case, both the ME in (1) and the NSE in (2) permit bright soliton solutions (“particle-like waves”), which will be discussed in more detail in Sec. II-B.

It is customary to re-scale (1) to dimensionless units. We shall use the following normalisation: The power will be measured in units of mW since it is a typical power level used in optical communications. The normalised (dimensionless) field then becomes . For the distance and time, we define the dimensionless variables and as and , where

(3) |

For the scalar case (2), we use the same normalisation but we replace by . Then, the resulting ME reads

(4) |

while the NSE becomes

(5) |

The ASE noise in (4) is a normalised version of , and is assumed to have the following correlation properties

(6) |

with , with being a Kronecker symbol, is the mathematical expectation operator, and is the Dirac delta function. The correlation properties (6) mean that each noise component is assumed to be a zero-mean, independent, white circular Gaussian noise. The scalar case follows by considering a single noise component only.

The noise intensity in (6) is (in dimensionless units)

(7) |

where is the spectral density of the noise, with real world units . For ideal DRA, this can be expressed through the optical fibre and transmission system parameters as follows: , where is the fibre attenuation coefficient, is the average photon energy, is a temperature-dependent phonon occupancy factor [6].

From now on, all the quantities in this paper are in normalised units unless specified otherwise. Furthermore, we define the continuous time channel as the one defined by the normalised ME and the NSE. This is shown schematically in the inner part of Fig. 1, where the transmitted and received waveforms are and , respectively, where is the propagation distance.

### II-B Fundamental Soliton Solutions

It is known that the noiseless ME (4) possesses a special class of solutions, the so-called fundamental bright solitons (fundamental solitons are “bright” only for the focusing case we consider in this paper, i.e., for anomalous dispersion). In general, the Manakov fundamental soliton is fully characterised by 6 parameters [57] (4 in the NSE case): frequency (also having the meaning of velocity in some physical applications), phase, phase mismatch, centre-of-mass position, polarisation angle, and amplitude (the latter is inversely proportional to the width of the soliton). In this paper we consider amplitude-modulated solitons, and thus, no information is carried by the other 5 parameters. The initial values of these 5 parameters can therefore be set to arbitrary values. In this paper, all of them have been set to zero. For the initial frequency, this choice is further motivated by the need to avoid deterministic pulse walk-offs. As for the initial phase, phase mismatch, and centre-of-mass position, as we shall see in the next section, their initial values do not affect the marginal amplitude channel law. Under these assumptions, the soliton solution at is given by [62, 57]

(8) |

where is the soliton amplitude and is the polarisation angle. The value of can be used to control how the signal power is split across the two polarisations.

For any , the Manakov soliton solution after propagation over a distance with the initial condition given by (8), is expressed as

(9) | ||||

(10) |

The soliton solution for the NSE in (5) can be obtained by using in (8)–(10) (this corresponds to the case where all the signal power is transmitted in the first polarisation), which gives

(11) |

and

(12) |

## III Discrete-time Channel Model

### III-A Amplitude-modulated Solitons: One and Two Polarisations

We consider a continuous-time input signal of the form

(13) |

where and is the discrete-time index. Motivated by the results in Sec. II-B, the pulses are chosen to be

(14) |

where is the symbol period. In principle, it is also possible to encode information by changing the polarisation angle from slot to slot. However, in this paper, we fix its value to be the same for all the time slots corresponding to a fixed (generally elliptic) degree of polarisation. Thus, the transmitted waveform corresponds to soliton amplitude modulation, which is schematically shown in Fig. 2 for the scalar (NSE) case.

At the transmitter, we assume that symbols are mapped to soliton amplitudes via . This normalisation is introduced only to simplify the analytical derivations in this paper. To avoid soliton-to-soliton interactions, we also assume that the separation is large, i.e., , . The receiver in Fig. 1 is assumed to process the received waveform during a window of via the forward NFT [22, 32] and to return the amplitude of the received soliton, which we denote by .
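The role of the pulse separation can be illustrated with a small numerical sketch. It is not the paper's pulse definition in (14): it assumes the scalar case and the common normalisation in which a fundamental soliton of amplitude `a` has the envelope `a * sech(a * t)`, so that its width scales as `1/a` and its energy equals `2 * a`. The function names `soliton_train` and `slot_energies` are hypothetical.

```python
import numpy as np


def soliton_train(t, amplitudes, symbol_period):
    """Sum of sech pulses a_k * sech(a_k * (t - k * Ts)) (assumed pulse shape)."""
    field = np.zeros_like(t, dtype=float)
    for k, a in enumerate(amplitudes):
        field += a / np.cosh(a * (t - k * symbol_period))
    return field


def slot_energies(t, field, n_slots, symbol_period):
    """Energy collected in each time slot of width Ts centred at k * Ts."""
    dt = t[1] - t[0]
    energies = []
    for k in range(n_slots):
        in_slot = np.abs(t - k * symbol_period) < symbol_period / 2
        energies.append(np.sum(field[in_slot] ** 2) * dt)
    return np.array(energies)


# Three well-separated solitons: each slot should hold an energy close to 2 * a_k.
amplitudes = np.array([1.0, 2.0, 0.5])
symbol_period = 40.0
t = np.arange(-20.0, 100.0, 0.01)
energies = slot_energies(t, soliton_train(t, amplitudes, symbol_period), 3, symbol_period)
```

With this separation the pulse tails are negligible at the slot boundaries, so the per-slot energy is proportional to the amplitude, which is consistent with the statement that the soliton amplitude is inversely proportional to its width.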

Before proceeding further, it is important to discuss the role of the amplitudes on a potential enhancement of soliton-soliton interactions. The interaction force prefactor is known to scale as the amplitude cubed [8, Chapter 9.2], [9, Chapter 5.4]. However, the interaction also decays exponentially as . This exponential decay dominates the interaction, and thus, considering very large amplitudes (or equivalently, very large powers, as we will do later in the paper) is in principle not a problem. At extremely large amplitudes, however, the model used in this paper is invalid for a different reason: higher-order nonlinear effects should be taken into account. These include stimulated Brillouin scattering (for very large powers) and Raman scattering (for very short pulses). Studying these effects is, however, out of the scope of this paper.

We would also like to emphasise that for a fixed pulse separation , the channel model we consider in this paper is not applicable for low soliton amplitudes. This is due to two reasons. The first one is that for low amplitude solitons, the perturbation theory used to derive the channel law becomes inapplicable as the signal becomes of the same order as noise. Secondly, low amplitude solitons are also very broad, and thus, nonnegligible soliton interactions are generated. These two cases can be overcome if the soliton amplitudes are always forced to be larger than certain cutoff amplitude , which we will now estimate. For the first case (noise-limited), the threshold is proportional to . In the second case (interaction-limited), the threshold is proportional to the symbol rate, i.e., . This shows that for fixed system parameters, the threshold is a constant. The implications of this will be discussed at the end of Sec. IV.

Having defined the transmitter and receiver, we can now define a discrete-time channel model, which encompasses the transmitter, the optical fibre, and the receiver, as shown in Fig. 1. Because the solitons are assumed to be well separated in time, we model the channel as memoryless, and thus, from now on we drop the time index . This memoryless assumption is supported by additional numerical simulations we performed, which are included in Appendix A. Nevertheless, at this point it is important to consider the implications of a potential mismatch between the memoryless assumption of the model and the true channel in the context of channel capacity lower bounds. In particular, if in some regimes (e.g., low power or large transmission distances) the memoryless assumption did not hold, considering a memoryless channel model would result in approximate lower bounds on the channel capacity. Provable lower bounds can be obtained by using mismatched decoding theory [63] (as done in [64, Sec. III-A and III-B]) or by considering an average memoryless channel (as done in [6, Sec. III-F]). Although both approaches can in principle be used in the context of amplitude-modulated solitons, they both rely on having access to samples from the true channel, and not from a (potentially memoryless) model. Such samples can only be obtained through numerical simulations or an optical experiment, which is beyond the scope of this paper. In this context, the channel capacity lower bounds in Sec. IV should be considered as a first step towards more involved analyses.

The conditional probability density function (PDF) for the received soliton amplitude given the transmitted amplitude was obtained in [57, eq. (15)] using a standard perturbative approach and the Fokker-Planck equation method. The result can be expressed as a noncentral chi-squared distribution

(15) |

where

(16) |

is the normalised variance of accumulated ASE noise, and is the modified Bessel function of the first kind of order two. The expression in (15) is a noncentral chi-squared distribution with six degrees of freedom (see, e.g., [65, eq. (29.4)]) providing non-Gaussian statistics for Manakov soliton amplitudes. By making the change of variables , and using , the PDF in (15) can be expressed as

(17) |

which corresponds to the noncentral chi-distribution with six degrees of freedom. An extra factor before the exponential function comes from the Jacobian.
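The Jacobian factor mentioned above follows from the standard change-of-variables rule for densities. As a generic sketch (the symbols $Q$ and $Y$ are placeholders introduced here, not the paper's notation, and any additional scaling used in the actual change of variables is ignored), if a nonnegative random variable $Q$ has density $p_Q(q)$ and $Y=\sqrt{Q}$, then

```latex
p_Y(y) \;=\; p_Q\!\left(y^2\right)\left|\frac{\mathrm{d}q}{\mathrm{d}y}\right|
       \;=\; 2y\,p_Q\!\left(y^2\right), \qquad y \ge 0 .
```

A noncentral chi-squared law in the squared variable therefore turns into a noncentral chi law with the same number of degrees of freedom, with one extra power of $y$ in front of the exponential.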

For the NSE, it is possible to show that the channel law becomes [49, 51, 52]

(18) |

which corresponds to a noncentral chi-distribution with four degrees of freedom.

We note that although in this paper we only consider an amplitude modulation (or in the NFT terms the imaginary part of each discrete eigenvalue), it is possible to include other discrete degrees of freedom corresponding to various soliton parameters in (14) in order to improve the achievable information rates. This is, however, beyond the scope of this paper. Furthermore, the channel models presented in this section were obtained via a perturbative treatment, and thus, in the context of soliton/eigenvalue communications they are technically valid only at high SNR (more precisely, when the total soliton energy in the time slot is much greater than that of the ASE noise). Despite that, in the current paper we will also study capacity lower bounds of a general noncentral chi-channel with an arbitrary number of degrees of freedom over any range of SNR. While admittedly the low-SNR region is currently only of interest when $n=1$ (noncoherent phase channel), we believe its generalisation for $n>1$ can still be of interest for the new generation of nonlinear optical regeneration systems.

### III-B Generalised Discrete-time Channel Model

The results in the previous section show that both scalar and vector soliton channels can be modelled using the same class of the noncentral chi-distribution with an even number of degrees of freedom $2n$, with $n=2,3$. The simplest channel of this type corresponds to $n=1$, which describes a fibre optical communication channel with zero-dispersion [13] as well as the noncoherent phase channel studied in [66] (see also [67]). Motivated by this, here we consider a general communication channel described by the noncentral chi-distribution with an arbitrary (even) number of degrees of freedom $2n$. Although we are currently not aware of any physically-relevant communication system that can be modelled with $n>3$, we present results for arbitrary $n$ to provide an exhaustive treatment for channels of this type.

The channel in question is therefore modelled via the PDF corresponding to noncentral chi-distribution

(19) |

with and where . This channel law corresponds to the following input-output relation

(20) |

where is a set of independent and identically distributed Gaussian random variables with zero mean and variance . The above input-output relationship is schematically shown in Fig. 3, which particularises to (17) and (18) for $n=3$ and $n=2$, respectively.
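The link between an input-output relation of this type and a noncentral chi channel law can be checked numerically. The sketch below is illustrative only. It assumes one particular normalisation, $Y^2 = \tfrac{1}{2}\sum_{i=1}^{2n}\left(X/\sqrt{n} + N_i\right)^2$ with $N_i \sim \mathcal{N}(0,\sigma^2)$, for which the output density is $p(y|x) = \frac{2y^n}{\sigma^2 x^{n-1}}\exp\!\left(-\frac{x^2+y^2}{\sigma^2}\right) I_{n-1}\!\left(\frac{2xy}{\sigma^2}\right)$. This scaling, the symbol $\sigma^2$ and the function names are assumptions and may differ from the exact constants in (19) and (20).

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function


def chi_channel_pdf(y, x, n, sigma2):
    """Noncentral chi density with 2n degrees of freedom (assumed scaling), x > 0, y > 0."""
    y = np.asarray(y, dtype=float)
    z = 2.0 * x * y / sigma2
    # exp(-(x^2 + y^2) / sigma2) * I_{n-1}(z) == exp(-(x - y)^2 / sigma2) * ive(n - 1, z)
    log_pdf = (np.log(2.0) + n * np.log(y) - np.log(sigma2)
               - (n - 1) * np.log(x) - (x - y) ** 2 / sigma2
               + np.log(ive(n - 1, z)))
    return np.exp(log_pdf)


def chi_channel_sample(x, n, sigma2, size, rng):
    """Draw outputs via Y^2 = 0.5 * sum_i (x / sqrt(n) + N_i)^2 (assumed scaling)."""
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(size, 2 * n))
    return np.sqrt(0.5 * np.sum((x / np.sqrt(n) + noise) ** 2, axis=1))


# Compare the density with Monte Carlo samples for n = 2 (four degrees of freedom).
rng = np.random.default_rng(0)
x0, n0, s2 = 3.0, 2, 0.5
dy = 1e-3
y_grid = (np.arange(12000) + 0.5) * dy          # midpoints on (0, 12)
pdf = chi_channel_pdf(y_grid, x0, n0, s2)
samples = chi_channel_sample(x0, n0, s2, 200_000, rng)
```

Under this assumed scaling the second moment of the output is $x^2 + n\sigma^2$, which gives a simple consistency check between the sampler and the density.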

## IV Main Results

In this section, we study capacity lower bounds of the channel in (19). We will show results as a function of the effective SNR defined as , where is the second moment of the input distribution and is given by (16). The value of also corresponds to the average soliton amplitude, i.e., . It can be shown that for given system parameters, the noise power (in real world units) is constant and proportional to , and the signal power (in real world units) is proportional to . The parameter therefore indeed corresponds to an effective SNR.

As previously explained, the inter-symbol interference due to pulse interaction can be neglected because the soliton separation is assumed to be large enough, and thus, the channel can be treated as memoryless (see Appendix A for more details). The channel capacity, in bits per channel use, is then given by [68, 69]

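$$C = \max_{p_X(x)} I(X;Y), \tag{21}$$

where

$$I(X;Y) = \iint p_{X,Y}(x,y)\,\log_2\frac{p_{Y|X}(y|x)}{p_Y(y)}\,\mathrm{d}x\,\mathrm{d}y \tag{22}$$

$$\phantom{I(X;Y)} = h(Y) - h(Y|X), \tag{23}$$

and where $X$ and $Y$ denote the channel input and output, and $h(Y)$ and $h(Y|X)$ are the output and conditional differential entropies, respectively. The optimisation in (21) is performed over all possible statistical distributions that satisfy the power constraint. In our case this constraint corresponds to a fixed second moment of the input symbol distribution or, equivalently, to a fixed average signal power in a given symbol period.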

The exact solution for the power-constrained optimisation problem (21) with the channel law (19) is unknown. For the noncentral chi-distribution with 2 degrees of freedom (i.e., the noncoherent additive noise channel), it was shown in [66] that the capacity-achieving distribution is discrete with an infinite number of mass points. To the best of our knowledge, that proof has not been extended to a higher number of degrees of freedom; however, we expect that this will also be the case for (19).

In this paper, we do not aim at finding the capacity-achieving distribution, but instead, we study lower bounds on the capacity. We do this because the capacity problem is in general very difficult, but also because of the relevance of having nondecreasing lower bounds on the capacity for the optical community. To obtain a lower bound on the capacity, we will simply choose an input distribution (as done in, e.g., [5, 49]). Without claiming generality, we consider four important candidates for the input distribution. First, following [49], we use symbols drawn from a Rayleigh distribution

(24) |

As we will see later, this input distribution is not the one giving the highest lower bound. However, it has one important advantage: it allows some analytical results for the mutual information. The other three distributions are considered later in this section as numerical examples.

The next two Lemmas provide an exact closed-form expression for the conditional differential entropy and an asymptotic expression for the output differential entropy .

###### Lemma 1

###### Proof:

See Appendix B. ∎

###### Proof:

See Appendix C. ∎

The next theorem is one of the main results of this paper.

###### Theorem 3

###### Proof:

We expand the function in (27) defining the conditional entropy in Lemma 1. At fixed large the integrand asymptotically decays as , i.e., with small decrement (which can be shown using the standard large-argument asymptotics of the Bessel functions). This means that the main contribution to the integral comes from the asymptotic region in most of which the large-argument expansion of both Bessel functions is indeed justified. Using it uniformly, we obtain

which, used in the expression of Lemma 1, gives the asymptotic expression

(30) |

The proof is completed by combining (30) and (28) with (23). ∎

The result in Theorem 3 is a universal and -independent expression. The expression in (29) shows that the capacity lower bound is asymptotically equivalent to half the logarithm of the SNR plus a constant which is order-independent. Fig. 4 shows the numerical evaluation of for obtained by numerically evaluating all the integrals in the exact expressions for the conditional and output entropies in Lemma 1 and (53), as well as the asymptotic expression in Theorem 3. Interestingly, we can see that even in the medium-SNR region, the influence of the number of degrees of freedom on the MI is minimal, and the curves are quite close to each other. In this figure, we also include the lower and upper bounds for given by [67, eq. (21)] and [66, eq. (41)], respectively. These results show that the asymptotic results in Theorem 3 correctly follow these two bounds.
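A numerical evaluation of this kind can be sketched with plain grid integration. The code below is a minimal illustration, not the procedure used for Fig. 4. It assumes a noncentral chi law of the form $p(y|x) = \frac{2y^n}{\sigma^2 x^{n-1}} e^{-(x^2+y^2)/\sigma^2} I_{n-1}(2xy/\sigma^2)$, a Rayleigh input written in terms of its second moment $S$ as $p_X(x) = (2x/S)\,e^{-x^2/S}$, and an SNR defined as $S/\sigma^2$. These normalisations and all function names are assumptions.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function


def log_chi_pdf(y, x, n, sigma2):
    """Log of the assumed noncentral chi density with 2n degrees of freedom."""
    z = 2.0 * x * y / sigma2
    return (np.log(2.0) + n * np.log(y) - np.log(sigma2) - (n - 1) * np.log(x)
            - (x - y) ** 2 / sigma2 + np.log(ive(n - 1, z)))


def rayleigh_pdf(x, second_moment):
    """Rayleigh density with E[X^2] = second_moment (assumed parametrisation)."""
    return 2.0 * x / second_moment * np.exp(-x ** 2 / second_moment)


def rayleigh_mi_bits(snr_db, n, sigma2=1.0, points=800):
    """MI in bits per channel use, evaluated with the midpoint rule on a grid."""
    second_moment = sigma2 * 10.0 ** (snr_db / 10.0)
    x_max = 5.0 * np.sqrt(second_moment)          # Rayleigh tail beyond this is negligible
    y_max = x_max + 6.0 * np.sqrt(sigma2)
    dx, dy = x_max / points, y_max / points
    x = (np.arange(points) + 0.5) * dx
    y = (np.arange(points) + 0.5) * dy
    log_p = log_chi_pdf(y[None, :], x[:, None], n, sigma2)   # rows: inputs, columns: outputs
    p = np.exp(log_p)
    px = rayleigh_pdf(x, second_moment)
    py = (px[:, None] * p).sum(axis=0) * dx                  # output density on the y grid
    integrand = px[:, None] * p * (log_p - np.log(py)[None, :])
    return integrand.sum() * dx * dy / np.log(2.0)


mi_10db = rayleigh_mi_bits(10.0, n=2)
mi_20db = rayleigh_mi_bits(20.0, n=2)
```

If the logarithmic growth stated in Theorem 3 holds for this assumed parametrisation, the computed MI should increase by roughly $\tfrac{1}{2}\log_2 10 \approx 1.66$ bit per 10 dB once the SNR is large.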

The main reason for considering a Rayleigh input distribution was that it yields a semi-analytical lower bound on the capacity. In the following example, we consider three other input distributions and numerically calculate the resulting MI.

###### Example 1

Consider the geometric (exponential), half-Gaussian, and Maxwell-Boltzmann distributions given by

(31) |

(32) |

and

(33) |

respectively. The MIs for these three distributions for are shown in Fig. 5 and show that the lower bound given by the geometric input distribution in (31) displays high MI in the low SNR regime ( dB), whereas the half-Gaussian input distribution in (32) is better for medium and large SNR. On the other hand, the Maxwell-Boltzmann distribution in (33) gives the lowest MI for all SNR. Numerical results also indicate that all the presented MIs asymptotically exhibit an equivalent growth irrespective of the number of the degrees of freedom .
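For readers who wish to reproduce comparisons of this kind, the three input densities can be written so that they share a common second moment. The sketch below uses the textbook forms of the exponential, half-Gaussian and Maxwell-Boltzmann densities, each parametrised by its second moment `s`. These forms and the function names are assumptions and may differ in notation from (31)–(33).

```python
import numpy as np


def exponential_pdf(x, s):
    """Exponential density lam * exp(-lam * x) with E[X^2] = 2 / lam^2 = s."""
    lam = np.sqrt(2.0 / s)
    return lam * np.exp(-lam * x)


def half_gaussian_pdf(x, s):
    """Density of |N(0, s)|, whose second moment is s."""
    return np.sqrt(2.0 / (np.pi * s)) * np.exp(-x ** 2 / (2.0 * s))


def maxwell_boltzmann_pdf(x, s):
    """Maxwell-Boltzmann density with scale a^2 = s / 3, so that E[X^2] = 3 a^2 = s."""
    a2 = s / 3.0
    return np.sqrt(2.0 / np.pi) * x ** 2 * np.exp(-x ** 2 / (2.0 * a2)) / a2 ** 1.5


def mass_and_second_moment(pdf, s, points=60000):
    """Midpoint-rule estimates of the total mass and of E[X^2] on (0, 30 * sqrt(s))."""
    dx = 30.0 * np.sqrt(s) / points
    x = (np.arange(points) + 0.5) * dx
    p = pdf(x, s)
    return np.sum(p) * dx, np.sum(x ** 2 * p) * dx


checks = {f.__name__: mass_and_second_moment(f, 4.0)
          for f in (exponential_pdf, half_gaussian_pdf, maxwell_boltzmann_pdf)}
```

Fixing the second moment in this way is what makes the MI curves of different input distributions comparable at the same SNR.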

The following example considers the use of discrete constellations. In particular, we assume that the soliton amplitudes take values on a set , where is the cardinality of the constellation, and is the number of bits per symbol. The MI (23) in this case can be evaluated as

(34) |

where we assumed the symbols are equally likely.
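The MI of an equiprobable discrete constellation can be evaluated with a one-dimensional integral per symbol. The sketch below is a hedged illustration: it assumes the channel law $p(y|x) = \frac{2y^n}{\sigma^2 x^{n-1}} e^{-(x^2+y^2)/\sigma^2} I_{n-1}(2xy/\sigma^2)$ (with its central chi limit at $x=0$), equally spaced input levels $\{0, d, \ldots, (M-1)d\}$ scaled to a given second moment, and an SNR equal to that second moment divided by $\sigma^2$. The constellation geometry, the normalisation and the function names are assumptions rather than the paper's definitions.

```python
import numpy as np
from scipy.special import gammaln, ive, logsumexp


def log_chi_pdf(y, x, n, sigma2):
    """Log-density of the assumed chi channel; x = 0 gives the central chi law."""
    if x == 0.0:
        return (np.log(2.0) + (2 * n - 1) * np.log(y) - n * np.log(sigma2)
                - gammaln(n) - y ** 2 / sigma2)
    z = 2.0 * x * y / sigma2
    return (np.log(2.0) + n * np.log(y) - np.log(sigma2) - (n - 1) * np.log(x)
            - (x - y) ** 2 / sigma2 + np.log(ive(n - 1, z)))


def ask_constellation(bits, second_moment):
    """Equally spaced levels {0, d, ..., (M - 1) d} with the requested second moment."""
    levels = np.arange(2 ** bits, dtype=float)
    return levels * np.sqrt(second_moment / np.mean(levels ** 2))


def ask_mi_bits(bits, snr_db, n, sigma2=1.0, points=20000):
    """MI in bits per channel use for equally likely ASK symbols (midpoint rule in y)."""
    symbols = ask_constellation(bits, sigma2 * 10.0 ** (snr_db / 10.0))
    y_max = symbols[-1] + 8.0 * np.sqrt(sigma2)
    dy = y_max / points
    y = (np.arange(points) + 0.5) * dy
    log_p = np.array([log_chi_pdf(y, x, n, sigma2) for x in symbols])
    # Output density is the equal-weight mixture of the conditional densities.
    log_py = logsumexp(log_p, axis=0) - np.log(len(symbols))
    integrand = np.exp(log_p) * (log_p - log_py)
    return integrand.sum() * dy / (len(symbols) * np.log(2.0))


ook_mi = ask_mi_bits(1, 20.0, n=2)
ask16_mi = ask_mi_bits(4, 25.0, n=2)
```

Working with log-densities and `logsumexp` avoids underflow in the regions between widely separated constellation points. By construction the result cannot exceed the number of bits per symbol.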

###### Example 2

Consider ASK constellations with and second moment , which correspond to OOK, 4-ASK, 8-ASK, and 16-ASK, respectively. The MI numerically evaluated for these constellations is shown in Fig. 6 for the chi-channel with . As a reference, in this figure we also show (black lines) the MI for the (continuous) half-Gaussian input distribution. The results in this figure show that in the low-SNR regime, the use of binary modulation is in fact better than the half-Gaussian distribution. This can, however, be remedied by using a geometric distribution, which, as shown in Fig. 5, outperforms the half-Gaussian distribution in the low-SNR regime. In the high-SNR regime, however, this is not the case.

Finally, let us address the impact of the cutoff we introduced in Sec. III. All our results for continuous input distributions have been obtained for input distributions that are not bounded away from zero (see (24), (31)–(33)). Therefore, symbols are generated below the threshold , where the channel law considered in this paper does not hold. We shall only consider here the case of the Rayleigh input (24), as this distribution was used to obtain the main result of this section. We will prove that in the high-power (i.e., high SNR) regime, the effect of the cutoff on the achievable data rate tends to zero. To do so, we note that for fixed fibre parameters and propagation distance, the cutoff is also fixed, while grows linearly with SNR. In other words, one can achieve high SNR at the expense of high power solitons for fixed noise variance. One possible way of showing that the effect of the cutoff on the achievable rate is zero as SNR tends to infinity is to consider a transmitter which generates a dummy symbol every time . The value of the threshold is message-independent and thus can be assumed to be known to the receiver, which will discard sub-threshold symbols. This allows us to keep the main results of the paper at the expense of a data rate loss (since part of the time, dummy symbols are transmitted). The probability of such an “outage” event is given by the integral of the input distribution from zero to the threshold. For the Rayleigh input PDF (24) this probability is given by (see (64)–(67)). Therefore asymptotically when . The average rate loss is then given by , which tends to zero as .
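As an illustrative calculation of the outage probability, assume that the Rayleigh input is parametrised by its second moment $S$, so that $p_X(x) = (2x/S)\,e^{-x^2/S}$, and write $x_c$ for the cutoff expressed in the symbol domain. Both symbols are placeholders introduced here and may differ from the notation and normalisation of (24). The probability of drawing a sub-threshold symbol is then

```latex
P_{\mathrm{out}}
  \;=\; \int_0^{x_c} \frac{2x}{S}\, e^{-x^2/S}\,\mathrm{d}x
  \;=\; 1 - e^{-x_c^2/S}
  \;\approx\; \frac{x_c^2}{S}
  \qquad \text{for } S \gg x_c^2 .
```

For a fixed cutoff and a fixed noise variance, this probability decays inversely with the SNR, whereas the MI grows only logarithmically, so any rate loss of the order of $P_{\mathrm{out}}$ times the MI vanishes in the high-SNR limit.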

An alternative and more rigorous solution to the problem above is to consider directly the difference between the MI asymptote obtained in the current paper (i.e., Theorem 3) and that obtained with a truncated Rayleigh input distribution, which simply does not generate sub-threshold symbols. This difference can be shown to tend to zero as . The proof is given in Appendix D.

## V Conclusions

A non-Gaussian channel model for the conditional PDF of well-separated (in time) soliton amplitudes was used to study lower bounds on the channel capacity. Results for propagation of signals over a nonlinear optical fibre using one and two polarisations were presented. The results in this paper demonstrated both analytically and numerically that there exist lower bounds on the channel capacity that display an unbounded growth with the effective SNR, similarly to the linear Gaussian channel. All the results in this paper are given in bit per channel use only, and thus, they should be considered as a first step towards analysing the more practically relevant problem of channel capacity in bit per second per unit bandwidth. This is a considerably more challenging problem, which is left for further investigation.

Apart from the ME soliton channel model, this paper also studied lower bounds on the capacity of an abstract general noncentral chi-channel with an arbitrary number of degrees of freedom. Similar channel models appear in the study of relatively general systems of noise-driven coupled nonlinear oscillators [58]. Therefore, we believe that the results for a large number of degrees of freedom might also some day find applications in nonlinear communication channels.

The results obtained in this paper for the general noncentral chi-channel are true capacity lower bounds for that channel model. For the application considered in this paper (amplitude-modulated soliton systems), however, the presented analysis was based on a perturbation-based model which holds at high SNR. This model also does not consider potential interaction between solitons, and thus, the results in this paper are limited to solitons well separated in time. Another way of interpreting these results is that the obtained expressions are approximate lower bounds on the capacity of the true channel. Bounds that consider memory effects are left for further investigation. Furthermore, another interesting open research problem is the derivation of capacity upper bounds for amplitude-modulated soliton systems. This is also left for further investigation.

## Appendix A Memoryless property of the discrete-time channel model

In this section, we present numerical simulations to verify the memoryless assumption for the discrete channel model in Sec. III. To this end, we simulated the propagation of sequences of soliton symbols through the scalar waveform channel given by (5). Two launch powers ( and dBm) and two propagation distances ( km and km) are considered. The simulations were carried out via the standard split-step Fourier method. The soliton amplitudes were generated as i.i.d. samples from a Rayleigh input distribution (see (24)) and the variance of was chosen to be 1.25 and 20, so that the resulting soliton waveforms have powers of and dBm, respectively. The transmitted waveform was created using (13) at a symbol rate of GBd. To guarantee an accurate simulation, the time-domain samples were taken every ps and the step size was km. White Gaussian noise was added at each step to model the ideal DRA process. The simulation parameters are similar to those used in [44] and are summarised in Table I.

Fig. 7 shows the waveforms before and after propagation through the channel given in (5). As expected, the received signal is a noisy version of the transmitted waveform, where the noise increases as the propagation distance increases. These results show that doubling the transmission distance and/or (approximately) doubling the launch power has very little effect on the soliton shapes.

The noisy waveforms shown in Fig. 7 were then used to obtain soliton amplitudes via the forward NFT. Each amplitude is obtained by processing the corresponding symbol period via the spectral matrix method [28, Sec. IV-B]. To test the memoryless assumption, we perform a simple correlation test. In particular, we consider the normalised output symbol correlation matrix, whose entries are defined as

(35) |

The obtained correlation matrices are shown in Fig. 8, where statistics were gathered by performing Monte-Carlo runs of the signal propagation. As can be seen from Fig. 8, the matrices are almost diagonal. Since our communication channel is believed to be non-Gaussian, the absence of correlation does not, of course, necessarily imply the memoryless property (understood here as statistical independence). However, it does provide an important quantitative check of the qualitative criterion given in Sec. III-A.
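The correlation test can be sketched as follows. This is a minimal illustration, assuming that the normalised correlation matrix is the standard Pearson correlation between received amplitudes in time slots k and l, estimated across Monte-Carlo runs; the exact normalisation in (35) may differ. The function `toy_memoryless_runs` is a hypothetical stand-in that produces i.i.d. Rayleigh amplitudes with additive Gaussian noise and does not model the split-step propagation or the NFT receiver.

```python
import math
import random


def normalised_correlation_matrix(runs):
    """Pearson correlation between time slots, estimated over runs.

    runs[m][k] is the received amplitude in slot k of Monte-Carlo run m.
    """
    n_runs = len(runs)
    n_slots = len(runs[0])
    means = [sum(run[k] for run in runs) / n_runs for k in range(n_slots)]
    centred = [[run[k] - means[k] for k in range(n_slots)] for run in runs]
    cov = [
        [sum(row[k] * row[l] for row in centred) / n_runs for l in range(n_slots)]
        for k in range(n_slots)
    ]
    # Normalise each covariance entry by the two slot standard deviations.
    return [
        [cov[k][l] / math.sqrt(cov[k][k] * cov[l][l]) for l in range(n_slots)]
        for k in range(n_slots)
    ]


def toy_memoryless_runs(n_runs, n_slots, sigma_s=1.0, sigma_n=0.1, seed=0):
    """Hypothetical memoryless data: i.i.d. Rayleigh amplitudes plus noise."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        run = []
        for _ in range(n_slots):
            # Inverse-CDF Rayleigh sample; 1 - U lies in (0, 1], so log is safe.
            amplitude = sigma_s * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            run.append(abs(amplitude + rng.gauss(0.0, sigma_n)))
        runs.append(run)
    return runs


if __name__ == "__main__":
    matrix = normalised_correlation_matrix(toy_memoryless_runs(2000, 4))
    for row in matrix:
        print("  ".join(f"{value:+.3f}" for value in row))
```

For memoryless data the estimated matrix is close to the identity, with off-diagonal entries of the order of the inverse square root of the number of runs.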

## Appendix B Proof of Lemma 1

The MI is invariant under a simultaneous linear re-scaling of the variables and . For notational simplicity, and without loss of generality, throughout this proof we thus assume . Furthermore, we study the conditional entropy as a function of and all the results will be given in nats.

We express the conditional differential entropy as

(36) | ||||

(37) |

where (37) follows from (19). In what follows, we will compute the five expectations in (37).

To compute the second and fifth terms in (37), we first calculate the output distribution as

(40) | ||||

(41) |

where the joint distribution can be expressed using (19) and (24) as

(42) |

with

(43) |

and where (41) can be obtained using symbolic integration software. Using (41), we obtain (again using symbolic integration software)

(44) |

where is the digamma function and is given by (26). The second moment of the output distribution is obtained directly from the channel input-output relation (20), yielding

(45) |

Substituting (38), (39), (44) and (45) into (37), we have

(46) |

where

(47) |

The last step is to compute the term , which using (42) can be expressed as

(48) |

We then make the change of variables , , with the Jacobian , yielding

(49) |

The integration over can be performed analytically, yielding

(50) |

where is the modified Bessel function of the second kind of order . Using (50) in (49) gives

(51) | ||||

(52) |

The proof is completed by using (52) in (46), the definition of in (43), and by returning to logarithm base 2.

## Appendix C Proof of Lemma 2

From (41), it follows that the output entropy (in nats, as in Appendix B) can be expressed as

(53) |

where is given by (43),

(54) | ||||

(55) |

where is given by (41) and

(56) | ||||

(57) |

Notice that from its definition it follows that the function is confined to the interval . We shall now prove that decays as or faster when . Indeed, one has

(58) | ||||

(59) |

Next, one notices that is positive and can be upper-bounded as follows

(60) | ||||

(61) |

It is therefore only left to prove that the integral converges, i.e., that the constant is finite. This can be done as follows:

where in the second line we have used the inequality , . Therefore, asymptotically decays no slower than .

## Appendix D Proof of the asymptotically vanishing rate loss

Here we shall prove that an input distribution bounded (truncated) away from zero gives the same results as Theorem 3 in the limit of large average power . To this end, consider a system where the transmitted amplitudes are drawn from a Rayleigh distribution with PDF given in (24). Let us introduce a threshold on the amplitude realisations, below which our channel law model is expected to be inapplicable. Let us also introduce an alternative system where the symbols are drawn from a “truncated” Rayleigh distribution with PDF

(63) |

where is the Heaviside step function, and is defined as

(64) |

This probability can be expressed as

(65) | ||||

(66) | ||||

(67) |

As discussed in Sec. III-A and Sec. IV, the threshold is a constant, and thus, .
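Drawing symbols from such a truncated Rayleigh distribution can be sketched by inverse-CDF sampling. This is a minimal illustration, assuming that the truncated PDF is the Rayleigh PDF restricted to amplitudes at or above the threshold and renormalised by the probability of exceeding it, and that the untruncated Rayleigh survival function is exp(-a^2 / E[A^2]); the normalisation in (24) and (63) may differ. The function name and parameters are hypothetical.

```python
import math
import random


def sample_truncated_rayleigh(n_samples, threshold, second_moment, seed=0):
    """Draw amplitudes from a Rayleigh law conditioned on A >= threshold.

    `second_moment` is E[A^2] of the untruncated Rayleigh distribution.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        v = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Conditional survival function:
        #   P(A > a | A >= t) = exp(-(a^2 - t^2) / E[A^2]).
        # Setting it equal to v and solving for a gives the sample below.
        samples.append(math.sqrt(threshold ** 2 - second_moment * math.log(v)))
    return samples


if __name__ == "__main__":
    amplitudes = sample_truncated_rayleigh(10000, threshold=0.5, second_moment=2.0)
    mean_square = sum(a * a for a in amplitudes) / len(amplitudes)
    print(f"min amplitude = {min(amplitudes):.4f}")
    print(f"mean square   = {mean_square:.4f}")
```

Under these assumptions the second moment of the truncated input equals the untruncated second moment plus the squared threshold, so the relative power penalty of the truncation becomes negligible at high power.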

To prove that the rate loss tends to zero, we shall prove that

(68) |

or equivalently,

(69) |

and