# Probabilistic Eigenvalue Shaping for Nonlinear Fourier Transform Transmission

###### Abstract

We consider a nonlinear Fourier transform (NFT)-based transmission scheme, where data is embedded into the imaginary part of the nonlinear discrete spectrum. Inspired by probabilistic amplitude shaping, we propose a probabilistic eigenvalue shaping (PES) scheme as a means to increase the data rate of the system. We exploit the fact that, for an NFT-based transmission scheme, the pulses in the time domain are of unequal duration by transmitting them with a dynamic symbol interval, and we find a capacity-achieving distribution. The PES scheme shapes the information symbols according to the capacity-achieving distribution and transmits them, via time-sharing, together with the suitably modulated parity symbols at the output of a low-density parity-check (LDPC) encoder. We furthermore derive an achievable rate for the proposed PES scheme. We verify our results with simulations of the discrete-time model as well as with split-step Fourier method (SSFM) simulations.


## I Introduction

Pulse propagation in optical fibers is severely impaired by nonlinear effects, which should be either compensated or exploited in the design of the communication system. The nonlinear Fourier transform (NFT) [1] provides a method to transform a signal from the time domain into a nonlinear frequency domain (spectrum), where the channel acts as a multiplicative filter on the signal. The nonlinear spectrum consists of a continuous and a discrete part. Both parts can be used to transmit information, either separately or jointly, and several schemes have been presented in theory and practice [1, 2, 3, 4, 5, 6]. However, very little is known so far about the probability density function (PDF) of the received signal in the nonlinear spectral domain when it is contaminated by channel noise.

In [7], a simplified communication system modulating only the imaginary part of the eigenvalues in the discrete nonlinear spectrum was presented. For this scheme, an approximation of the conditional PDF of the channel can be obtained in closed form. In general, the capacity-achieving distribution of a given channel is not known and often differs from the conventional distribution with equispaced signal points and uniform signaling. Hence, some form of shaping is required [8]. Two popular shaping methods are probabilistic shaping and geometric shaping. In geometric shaping, the capacity-achieving distribution is mimicked by optimizing the positions of equiprobable constellation points [9], whereas probabilistic shaping uses uniformly spaced constellation points and approximates the capacity-achieving distribution by assigning different probabilities to different constellation points [8].

The main drawback of probabilistic shaping is its practical implementation. A large number of probabilistic shaping schemes have been presented, most of which suffer from high decoding complexity, low flexibility in adapting the spectral efficiency, or error propagation. For a literature review on probabilistic shaping, we refer the reader to [10, Section II].

Recently, a new scheme called probabilistic amplitude shaping (PAS) was proposed in [10]. Compared to other shaping schemes, PAS offers high flexibility and close-to-capacity performance over a wide range of spectral efficiencies for the additive white Gaussian noise (AWGN) channel, while still allowing bit-metric decoding. Although originally introduced for the AWGN channel, PAS can be applied to other channels with a symmetric capacity-achieving input distribution, assuming a sufficiently high spectral efficiency.

In this paper, we consider an NFT-based transmission scheme similar to the one presented in [7], where data is embedded into the imaginary part of the nonlinear discrete spectrum. As a means to increase the data rate, we demonstrate that the concept of PAS can be adapted to this NFT-based transmission system. In particular, we propose a probabilistic eigenvalue shaping (PES) scheme, which enables the same low complexity and bit-metric decoding as PAS. We take advantage of the dependence of the pulse length on the data and transmit each pulse as soon as the previous one has been transmitted, rather than with a fixed interval as in [7], which increases the data rate. Accordingly, we find the capacity-achieving input distribution that maximizes the time-scaled mutual information (MI). For ease of notation, we refer to the maximized MI as capacity, noting that it is in fact the constrained capacity of a system transmitting first-order solitons. The PES scheme then shapes the information symbols according to the capacity-achieving distribution by means of a distribution matcher (DM). The information symbols are also encoded by a low-density parity-check (LDPC) encoder, and the parity symbols at the output of the encoder are suitably modulated. The resulting sequence of modulated parity symbols and the sequence at the output of the DM are transmitted via time-sharing. We further derive an achievable rate for such a PES scheme. We demonstrate via discrete-time Monte-Carlo and split-step Fourier method (SSFM) simulations that PES performs at around from capacity using off-the-shelf LDPC codes. The proposed PES scheme yields a significant improvement of up to twice the data rate compared to an unshaped system as in [7].

It is important to note that although first-order solitons do not outperform conventional coherent systems, due to their spectrally inefficient pulse shape compared to a Nyquist pulse shape, they have other advantages. For instance, first-order soliton transmission requires neither chromatic dispersion (CD) compensation nor digital backpropagation (DBP), as dispersion and nonlinearity are balanced and hence compensate each other. This work attempts to approach the limits of current NFT-based systems. To improve the spectral efficiency further, higher-order solitons and the continuous part of the nonlinear spectrum should be used jointly [4]. However, channel equalization will then not be as easy as for first-order solitons, and the channel model is not yet fully known.

The remainder of the paper is organized as follows. In Section II, we describe pulse propagation in an optical fiber and the NFT-based transmission scheme. In Section III, we optimize the input distribution, and in Section IV, we introduce the proposed PES scheme and derive its achievable rate. In Section V, we present numerical results for PES from both Monte-Carlo and SSFM simulations, and in Section VI, we draw some conclusions.

Notation: The following notation is used throughout the paper. and denote the real and the imaginary part of a complex number, respectively, and denotes the imaginary unit. Vectors are typeset in bold, e.g., , random variables (RVs) are capitalized, e.g., , and hence vectors of RVs are capitalized bold, e.g., . The PDF of an RV is written as and its expectation as . The conditional PDF of given is denoted as . The probability mass function (PMF) of an RV is denoted by . The transpose of a vector or matrix is given as . A set is denoted by a capitalized Greek letter, e.g., , and its cardinality by . We write for the logarithm of base and for the natural logarithm.

## II Nonlinear Fourier Transform-Based Transmission System

### II-A Pulse Propagation and the Nonlinear Fourier Transform

Pulse propagation in optical fibers is governed by a partial differential equation, the stochastic nonlinear Schrödinger equation (NLSE),

(1) |

where denotes the envelope of the electrical field as a function of the position along the fiber and time , the attenuation, the second-order dispersion, the nonlinearity parameter, and is a white Gaussian process in time and space with spectral density . The spectral density depends on the system and, for distributed Raman amplification, is given as , where is the temperature-dependent phonon occupancy factor, and is the average photon energy [7]. A general closed-form solution of the stochastic NLSE does not exist. In some special cases, e.g., for noise-free and lossless fibers, special solutions such as solitons exist. Furthermore, we consider the NLSE in normalized form in the focusing regime, i.e., , under the assumption of ideal distributed Raman amplification, i.e., ,

(2) |

where , , , and is the length of the fiber. In this case, the NLSE is an integrable partial differential equation for which a pair of operators, called a Lax pair, can be found. The eigenvalues of one of these operators remain invariant during noiseless propagation, and the Lax pair can be used to solve the partial differential equation. Solutions of (2) can be uniquely represented in terms of the spectrum of this operator via the so-called NFT. For a given position , the NFT of a signal (we drop the position for simplicity of presentation) with support on the time interval is calculated by solving the linear ordinary differential equation (in time)

(3) |

where is the eigenvector of the auxiliary operator, with boundary conditions

and is the spectral parameter. Solving (3) gives rise to the continuous and discrete nonlinear spectra

respectively, where , , and are the zeros of , , a finite set of isolated complex zeros, referred to as eigenvalues. Hence, the NFT represents the signal in the nonlinear spectral domain, where the influence of the channel on the signal is a multiplicative filter.

As a counterpart to the NFT, which transforms a signal from the time domain to the nonlinear spectral domain, the inverse NFT (INFT) transforms a signal from the nonlinear spectral domain to the time domain. For an in-depth mathematical description of the INFT, we refer the interested reader to [1].

### II-B Soliton Transmission

As in [7], we embed information in the imaginary part of the discrete spectrum, i.e., of the eigenvalues. Hence, the input of the channel is an RV , where is the set of eigenvalues, is the th eigenvalue, and is the order of the modulation. The eigenvalues are assumed to be sorted in ascending order of their imaginary parts. Furthermore, the output of the channel is an RV , where . A block diagram is depicted in Fig. 1. The information embedded in a single eigenvalue is transformed to a time-domain signal via the INFT, where the transmitter is located at position along the fiber. At position , the receiver calculates the discrete spectrum from the received signal via the NFT. The time-domain signal corresponds to first-order solitons, i.e.,

For the NFT to be valid, the signal must have finite support, i.e., before the next pulse is transmitted, the previous one must have returned to zero. As the pulses in general have infinitely long tails, we truncate them when they fall below a threshold close to zero. We therefore define the pulse over the smallest support containing a given fraction of its energy, which leads to the following formal definition of the pulse width.

###### Definition 1.

The pulse width of is defined as the smallest support containing a fraction of the energy of the pulse,

where .
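To make Definition 1 concrete, the sketch below evaluates the pulse width of a first-order soliton, assuming the normalization q(t) = 2η sech(2ηt) with η the imaginary part of the eigenvalue (this pulse form is an assumption, since the soliton expression is not reproduced above). Because |q(t)|² is symmetric and unimodal, the smallest support holding a given energy fraction is the centred interval; the energy inside [-T/2, T/2] equals 4η tanh(ηT) out of 4η in total, so the width is artanh(fraction)/η. The parameter name `energy_fraction` is ours.

```python
import numpy as np


def soliton(t, eta):
    """First-order soliton for eigenvalue j*eta (assumed normalization 2*eta*sech(2*eta*t))."""
    return 2.0 * eta / np.cosh(2.0 * eta * t)


def soliton_pulse_width(eta, energy_fraction):
    """Width T of the centred window holding `energy_fraction` of the soliton energy.

    The energy of 2*eta*sech(2*eta*t) inside [-T/2, T/2] is 4*eta*tanh(eta*T), and
    the total energy is 4*eta, so the contained fraction is tanh(eta*T).
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    if not 0.0 < energy_fraction < 1.0:
        raise ValueError("energy_fraction must lie in (0, 1)")
    return np.arctanh(energy_fraction) / eta
```

The width scales as 1/η, so eigenvalues with a larger imaginary part produce shorter pulses; this is the origin of the unequal pulse durations exploited in the following.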

The value of the cutoff parameter must be chosen such that soliton-soliton interactions are negligible. For longer transmission distances, decreases, i.e., the pulses must be spaced further apart. Furthermore, the condition

(4) |

must be fulfilled [7].

We now comment on the memorylessness of the system, which results from the absence of soliton-soliton interactions. A pulse train of well-separated first-order solitons was investigated in [7] for launch powers of and and transmission over and . It was shown via SSFM simulations that the correlation between the symbols at the receiver is essentially zero, indicating that the channel is indeed memoryless in the transmission range of to and the transmit power range of to , for which the model (5) is applicable. While this is not a rigorous proof, the results indicate that memorylessness is a valid assumption. Although the transmission scheme in [7] is different, the underlying condition that any two pulses need to be sufficiently separated is the same. Hence, we treat the NFT-based transmission system in this work as a memoryless channel.

In a practical system, we assume distributed Raman amplification to compensate for the fiber loss, such that the NFT can be used to relate the input and the output; the resulting amplifier-induced spontaneous emission (ASE) noise has received power spectral density . The conditional PDF of such a system has been derived via a perturbative approach and the Fokker-Planck equation method [11] and was used to design a communication system in [7]. It is given by

(5) |

where is the received symbol as in Fig. 1, and is the modified Bessel function of the first kind of order one. The power spectral density of the received ASE noise is normalized and relates to real-world units as . The signal-to-noise ratio (SNR) is defined as . It is important to note that the model (5) assumes the noise intensity to be small, such that it can be treated as a perturbation to the soliton. Hence, the model is only applicable if the signal energy is much larger than the noise energy. Furthermore, for very high signal powers, (5) is no longer valid either, since inelastic scattering effects (i.e., stimulated Raman or Brillouin scattering) are not considered within the first-order perturbation approach. For a detailed derivation of the model, we refer the reader to [11].

In [7], the shortest possible symbol interval is defined by the pulse duration of , i.e., the longest pulse. However, this is inefficient since, especially for short pulses, the guard interval between two consecutive pulses is longer than necessary, which limits the data rate. Here, we exploit the varying pulse lengths and transmit each pulse as soon as the previous one has returned to zero. This concept is depicted in Fig. 2, where pulse sequences with fixed and varying symbol intervals are compared. The figure illustrates both the advantage of a varying symbol interval and the aforementioned inefficiency of a fixed one. The data rate of a system with varying symbol intervals depends on the distribution of the data. Thus, we define the average symbol interval as follows.

###### Definition 2.

The average symbol interval is the expected pulse duration with respect to the channel input distribution.

In [7], only eigenvalues with an imaginary part larger than zero are used. We extend this by allowing . In the time domain, this results in a pulse with amplitude zero, i.e., nothing is transmitted. We define its duration to be that of the shortest pulse, .
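The following sketch computes per-symbol durations and the average symbol interval of Definition 2 for a hypothetical linearly spaced constellation. It again assumes sech-shaped pulses whose width is artanh(fraction)/η, and it assigns the zero eigenvalue the duration of the shortest pulse, as described above. The ratio of the longest duration to the average interval is the rate gain of a variable over a fixed symbol interval for the same input distribution.

```python
import numpy as np


def symbol_durations(etas, energy_fraction):
    """Duration of each symbol: arctanh(fraction)/eta for eta > 0 (assumed sech pulse),
    and the shortest pulse duration for the zero eigenvalue."""
    etas = np.asarray(etas, dtype=float)
    durations = np.empty_like(etas)
    positive = etas > 0
    durations[positive] = np.arctanh(energy_fraction) / etas[positive]
    durations[~positive] = durations[positive].min()  # "no pulse" gets the shortest slot
    return durations


def average_symbol_interval(pmf, durations):
    """Expected pulse duration under the input PMF (Definition 2)."""
    return float(np.dot(pmf, durations))


# Hypothetical constellation {0, 0.5, 1.0, 1.5, 2.0} with uniform signaling
etas = np.linspace(0.0, 2.0, 5)
T = symbol_durations(etas, 0.999)
uniform = np.full(etas.size, 1.0 / etas.size)
gain = T.max() / average_symbol_interval(uniform, T)  # about 2.14 for these values
```

Even without any shaping, the variable interval shortens the average slot considerably for this example; shaping then favors the short pulses further.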

As any practical system can handle only a limited peak power and bandwidth, we enforce a peak power constraint, which translates into a constraint on the maximum eigenvalue. Especially in systems with lumped amplification using erbium-doped fiber amplifiers (EDFAs), such a constraint is required, as eigenvalues fluctuate depending on their amplitude, which degrades the performance [12].

We note that the varying symbol interval introduces additional challenges for detection. In particular, an erroneously detected symbol may lead to error propagation, insertion errors (detection of a symbol when none was transmitted), deletion errors (failure to detect a transmitted symbol), or loss of synchronization. To calculate the capacity, however, we neglect these effects. Hence, the results can be seen as an upper bound on the performance.

## III Capacity-Achieving Distribution

From Fig. 2, it is intuitive that pulses with short duration should be transmitted more frequently than pulses with long duration. However, shorter pulses are more perturbed by noise than longer pulses. Hence, the optimal input distribution for the channel described by the conditional PDF (5) is not the conventional uniform distribution. The channel capacity is obtained by maximizing the MI,

over all possible input distributions . Here, due to the variable transmission duration, we need to consider the MI under a variable cost constraint [13],

(6) |

To emphasize that the cost of a symbol is its corresponding pulse duration, we refer to the MI in the form of (6) as the time-scaled MI. We can therefore define the capacity as

(7) |

where we set the supremum to zero if the set of distributions therein is empty. The capacity-achieving distribution, denoted by , is the distribution in this set that attains the supremum.

Since the MI is concave in and is linear in and positive, the time-scaled MI is quasiconcave [14, Table 2.5.2]. Hence, we can solve (7) numerically and obtain the corresponding capacity-achieving distribution.
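Since (7) is a quasiconcave fractional program, one standard way to solve it numerically is Dinkelbach's method: repeatedly maximize I(X;Y) - μ E[T(X)] and update μ to the resulting ratio. The sketch below does this for a discretized channel with transition matrix `W`, solving the inner problem with the Blahut-Arimoto iteration for a linear cost. This is our illustration of the numerical step and not necessarily the procedure used to obtain Fig. 3; in particular, how `W` would be obtained by binning the conditional PDF (5) is an assumption left out of the sketch.

```python
import numpy as np


def _divergences(p, W):
    """D(W[i] || p @ W) in nats for every input symbol i."""
    q = p @ W
    d = np.zeros(W.shape[0])
    for i, row in enumerate(W):
        mask = row > 0
        d[i] = np.sum(row[mask] * np.log(row[mask] / q[mask]))
    return d


def _max_mi_minus_cost(W, T, s, iters=5000, tol=1e-13):
    """Blahut-Arimoto with a linear cost: maximize I(p) - s * E_p[T] (in nats)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(iters):
        x = _divergences(p, W) - s * T
        w = p * np.exp(x - x.max())        # shift the exponent for numerical safety
        p_new = w / w.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return p, float(np.dot(p, _divergences(p, W)))


def time_scaled_capacity(W, T, outer_iters=100, tol=1e-12):
    """Dinkelbach iteration for max_p I(X;Y) / E[T(X)].

    W[i, j] = P(Y = y_j | X = x_i) of a discretized channel, T[i] = duration of x_i.
    Returns (capacity in bits per time unit, optimizing input PMF).
    """
    T = np.asarray(T, dtype=float)
    mu = 0.0
    for _ in range(outer_iters):
        p, mi = _max_mi_minus_cost(W, T, mu)
        avg_T = float(np.dot(p, T))
        gap = mi - mu * avg_T              # >= 0, and zero at the optimum
        mu = mi / avg_T
        if gap < tol:
            break
    return mu / np.log(2), p
```

Dinkelbach's update behaves like a Newton step on the parametric problem, so only a few outer iterations are typically needed once the inner Blahut-Arimoto loop has converged.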

Exemplary results of the capacity-achieving distribution are shown in Fig. 3. We note that the lowest and highest amplitudes are always used with equal and high probability. For low SNRs, only these two are used, i.e., on-off keying (OOK) is optimal. Furthermore, the capacity-achieving distribution is discrete and has an exponential-like shape, with the exception of a point mass at zero, as can be seen in Fig. 3.

Note that assumes memorylessness, which does not necessarily hold due to the variable symbol interval. Hence, is, in fact, the constrained capacity under the assumption of a memoryless channel and the constraint of transmitting only first-order solitons. However, for notational simplicity, we refer to it simply as capacity, with its corresponding capacity-achieving distribution.

In the case of a noiseless channel, a closed-form solution to (7) can be derived under the assumption of a finite discretization.

###### Lemma 1.

Let be eigenvalues with and let be the time of transmitting a pulse with eigenvalue . Let be the unique real positive root of the polynomial . Then, in the noiseless case, the capacity is obtained as

and the capacity-achieving distribution is given by

(8) |

###### Proof.

Suppose that the -th eigenvalue is transmitted with probability . For any fixed average symbol interval , where , we are interested in the distribution that maximizes the entropy while leading to the average symbol duration . It is known that this distribution takes the form [15, Ch. 12]

(9) |

where ensures that and has to be selected such that . In the noiseless case, the MI is given by . The entropy then is

The time-scaled MI hence takes the form

In order to maximize , we find the optimal parameter by setting . This can be seen by setting the derivative of to zero, with

where denotes the variance of for the given . By assumption, as all are different, the middle part of this expression is strictly positive and . Hence, it is easy to see that this derivative can only be zero if . The optimal is hence found by setting . Consider the polynomial

As this polynomial is monotonically decreasing for positive , with and , has exactly one positive real root. Let be the unique positive real root of . Then . Inserting into and (9) proves the lemma. ∎
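For the noiseless case, the optimum can also be computed directly. Writing the capacity in bits per time unit as C, the optimal distribution has the Gibbs form p_i ∝ 2^(-C T_i) with unit normalization, so C is the unique positive solution of Σ_i 2^(-C T_i) = 1, the classical result for noiseless channels with unequal symbol durations. We assume this matches the root condition of Lemma 1 up to a change of variable; the sketch below finds C by bisection, since the left-hand side is strictly decreasing in C.

```python
import numpy as np


def noiseless_time_scaled_capacity(T, tol=1e-13):
    """Noiseless capacity in bits per time unit for symbol durations T.

    Solves sum_i 2**(-C * T_i) = 1 by bisection; the optimal PMF is p_i = 2**(-C * T_i).
    """
    T = np.asarray(T, dtype=float)
    if T.size < 2 or np.any(T <= 0):
        raise ValueError("need at least two symbols with positive durations")

    def excess(C):
        # strictly decreasing in C, equal to len(T) - 1 > 0 at C = 0
        return np.sum(2.0 ** (-C * T)) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:   # grow hi until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    C = 0.5 * (lo + hi)
    return C, 2.0 ** (-C * T)
```

Since the zero eigenvalue is assigned the shortest duration, it and the largest eigenvalue receive the same, highest probability under this formula, in line with the observation made for Fig. 3.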

We clearly see that (8) is of exponential shape with an additional point mass at zero. Furthermore, we note that the shape of the distribution is mostly caused by the variable pulse duration. The noise then determines the optimal location and optimal number of constellation points.

For a given input distribution, the MI is an upper bound on the rate achievable by a practical coded transmission system. In Fig. 4, we evaluate the time-scaled MI for various input distributions for a cutoff parameter . The capacity is depicted with a black solid line. To reduce the implementation complexity, we constrain the constellation to linearly spaced points from to , i.e.,

and plot the corresponding time-scaled MI as colored solid lines with markers. We note that the time-scaled MI is very close to the capacity curve until it saturates. Increasing the modulation order significantly increases the time-scaled MI. For comparison, we also plot the time-scaled MI for a system with fixed symbol duration and a conventional uniform distribution on a linearly spaced constellation as in [7]. We observe that its rate saturates at very low values and that increasing the modulation order yields only a slight improvement.

## IV Probabilistic Eigenvalue Shaping

In the previous section, we observed a significant gap between the time-scaled MI of the system in [7] and the capacity. This gap is referred to as the shaping gap. In order to close it, we propose a PES system as shown in Fig. 5, inspired by PAS [10].

In the PAS scheme, the sequence of uniformly distributed data bits is mapped by a distribution matcher (DM) to a sequence of positive amplitudes with a half-Gaussian distribution. The binary image of this sequence is encoded by a systematic forward error correction (FEC) code, resulting in uniformly distributed parity bits, which are then used as sign bits to map the sequence of half-Gaussian distributed amplitudes to a stream of Gaussian distributed symbols.

As the capacity-achieving distribution is not symmetric, PAS cannot be directly applied here. However, in order to keep the benefits of PAS, we wish to apply the DM before the FEC. We describe PES in the following with reference to Fig. 5. The binary data sequence of length bits is mapped by the DM to a sequence of eigenvalues of length distributed according to . The constant composition distribution matcher (CCDM) can be used for this purpose [16]. It is asymptotically optimal, as its rate approaches the entropy of the desired channel input ,

For large block sizes, the gap between and is sufficiently small and can be neglected. Note that some of the possible eigenvalues may occur with probability zero.
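To illustrate why the CCDM rate approaches the entropy, the sketch below quantizes a target PMF to an integer composition of length n and evaluates the CCDM rate ⌊log₂(n!/∏ n_i!)⌋/n, i.e., the number of input bits per output symbol of a constant composition matcher. The simple rounding rule in `ccdm_composition` is our own choice and not an optimal quantizer, and the arithmetic coding that performs the actual matching is not shown.

```python
import math

import numpy as np


def ccdm_composition(pmf, n):
    """Round n*pmf to integer counts summing to n (simple rounding, not an optimal quantizer)."""
    pmf = np.asarray(pmf, dtype=float)
    counts = np.floor(n * pmf).astype(int)
    remainder = n - counts.sum()
    # hand the leftover symbols to the largest fractional parts
    order = np.argsort(-(n * pmf - counts))
    counts[order[:remainder]] += 1
    return counts


def ccdm_rate(counts):
    """Input bits per output symbol of a CCDM with this composition."""
    n = int(np.sum(counts))
    log2_multinomial = (math.lgamma(n + 1)
                        - sum(math.lgamma(int(c) + 1) for c in counts)) / math.log(2)
    return math.floor(log2_multinomial) / n
```

Increasing n shrinks the gap between the CCDM rate and the entropy of the composition, which is the sense in which the gap can be neglected for large block sizes.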

We assume the modulation order to be a power of two, such that its binary image is defined. The binary image of , , is then encoded by a systematic encoder with information block length , code length , and rate . The code is denoted by , with . The parity bits at the output of the encoder are mapped to a sequence of eigenvalues with modulation order and by the block in Fig. 5 such that they are uniformly distributed.

Assuming that a high code rate is used, we accept a small penalty with respect to the optimal channel input distribution and transmit and via time-sharing. The major difference of PES compared to PAS is that the channel input distribution is not the optimal one, due to the time-sharing with the sequence . Consequently, this causes a performance degradation. However, PES is highly flexible, as the spectral efficiency can be adapted by the DM and the code rate , and a single code can be used. Note that every eigenvalue is protected by the code, since FEC is performed after the DM, and decoding and demapping can be performed independently. Thus, PES shares these advantages with PAS.

A high code rate is desirable to keep the performance degradation due to time-sharing low. More precisely, we wish to maximize the number of symbols distributed according to . The ratio between information symbols and coded symbols, denoted by , is an indication of the expected performance degradation,

(10) |
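As a rough indicator of the time-sharing penalty, the helper below counts, per codeword, the information symbols that follow the shaped distribution and the uniformly distributed parity symbols, and returns the fraction of shaped symbols. The helper and its arguments are hypothetical, and this fraction is our own illustration, which may differ in detail from the ratio defined in (10).

```python
def pes_symbol_counts(n, code_rate, m_info, m_parity):
    """Symbols per codeword for a hypothetical PES frame (illustrative helper).

    n: codeword length in bits; k = code_rate * n information bits are carried
    m_info bits per shaped symbol, the n - k parity bits m_parity bits per symbol.
    """
    k = round(code_rate * n)
    if k % m_info or (n - k) % m_parity:
        raise ValueError("bit counts must be divisible by the bits per symbol")
    n_info, n_parity = k // m_info, (n - k) // m_parity
    return n_info, n_parity, n_info / (n_info + n_parity)
```

For instance, with a hypothetical 64800-bit codeword, code rate 0.9, 4 bits per shaped symbol, and 2 bits per parity symbol, 14580 of the 17820 transmitted symbols (about 82%) are shaped; lowering the code rate or the parity modulation order reduces this fraction.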

### IV-A Parity Symbols

The parity symbols at the output of the FEC encoder are uniformly distributed. In Fig. 4, we observed that OOK with uniform signaling, i.e., and , is optimal at low SNR, as it achieves capacity, and performs reasonably well at high SNR. However, we note from Fig. 4 that with a higher-order modulation, higher rates are possible even with uniform signaling. Hence, here we consider a scenario where . We further increase the rate by using only a subset of and by picking the eigenvalues such that they are not uniformly spaced.

###### Example 1.

Consider the information symbol alphabet with . For the , we could pick with and .

To find the function that maps the parity symbols onto , we use a greedy algorithm as described in Algorithm 1. It starts with OOK, i.e., . For each of the remaining symbols , it calculates the time-scaled MI of , finds the symbol for which the time-scaled MI of is maximized, and adds it to . All symbols with an imaginary part greater than or equal to that of are removed, i.e., the eigenvalues are removed. This process is repeated until there are no symbols left. We then choose the set of symbols that gives the highest time-scaled MI as . We note that this procedure does not guarantee an optimal solution. However, for , an exhaustive search gives the same result as Algorithm 1.
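The greedy search of Algorithm 1 can be sketched as follows. The callback `tsmi`, which returns the time-scaled MI of uniform signaling on a candidate set, is hypothetical and would in practice be evaluated numerically from (5); we also assume that the OOK starting point consists of the zero eigenvalue and the largest admissible eigenvalue.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_parity_constellation(
    candidates: Sequence[float], tsmi: Callable[[List[float]], float]
) -> Tuple[List[float], float]:
    """Greedy constellation search in the spirit of Algorithm 1 (sketch).

    candidates: imaginary parts of the admissible eigenvalues (including 0).
    tsmi: hypothetical callback returning the time-scaled MI of uniform signaling.
    """
    cands = sorted(set(candidates))
    current = [cands[0], cands[-1]]      # start from OOK: zero and the largest eigenvalue
    remaining = cands[1:-1]
    best_set, best_val = list(current), tsmi(current)
    while remaining:
        scores = [tsmi(sorted(current + [x])) for x in remaining]
        x_star = remaining[max(range(len(scores)), key=scores.__getitem__)]
        current = sorted(current + [x_star])
        # drop every candidate whose imaginary part is >= that of the chosen symbol
        remaining = [x for x in remaining if x < x_star]
        value = tsmi(current)
        if value > best_val:             # keep the best set seen over all rounds
            best_set, best_val = list(current), value
    return best_set, best_val
```

Because candidates at or above the newly added level are discarded, each round can only add levels below the last one, and the loop terminates after at most as many rounds as there are candidates.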

In Fig. 6, we show for different modulation orders and SNRs. For , we note that at low SNR, OOK gives the best result. Increasing the SNR results in a third level being added. The same behavior is observed for . Compared to , the third level is introduced at a slightly lower SNR. This results from the fact that for , different constellation points are available. For , a third level again appears when increasing the SNR. When the SNR is increased further, this third level moves to an eigenvalue with a larger imaginary part, and a fourth level appears at an eigenvalue with a lower imaginary part. This behavior can be observed repeatedly. To map the binary parity bits to the constellation points, we require to be a power of two. As this is not always the case (see Fig. 6), we pick the largest power of two that is smaller than or equal to the number of constellation points given by Algorithm 1.

### IV-B Achievable Rate of Probabilistic Eigenvalue Shaping

To characterize the performance of PES, we derive its achievable rate, denoted by . We assume that the channel is memoryless and that the decoder performs bit-metric decoding.

###### Theorem 1.

The achievable rate of PES is

(11) |

###### Proof.

The achievable rate for PAS has been derived in [17]. For a system employing time-sharing, the resulting achievable rate is the weighted average of the achievable rates of the two transmission schemes. ∎

In Fig. 7, we plot the capacity and the achievable rate (11) for different code rates , , , , , , , , , , and modulation orders for a cutoff parameter . and hence are chosen according to the results of Algorithm 1. For each modulation order, the curves cross at a certain SNR. For SNRs below this point, the lowest code rate gives the best performance (i.e., the highest curve), whereas for SNRs above this point, the highest code rate gives the best performance. We note the influence of time-sharing, which results in a gap between the achievable rate and capacity. The gap increases for lower code rates, as the channel input distribution then deviates more from the optimal one.

## V Numerical Evaluation

In this section, we evaluate the performance of the PES scheme via discrete-time Monte-Carlo and SSFM simulations. For the mapping (see Fig. 5), we use Gray labeling. For the FEC, we use the binary LDPC codes of the DVB-S2 standard with code length and code rates , , , , , , , , , , . For the parity symbols, we use the constellation obtained from Algorithm 1, depicted in Fig. 6.

### V-A Detection

For the SSFM simulation, we simulate a continuous signal and hence require a detector. We use the following method to deal with the variable pulse durations. We set a threshold sufficiently above the noise level. Once the magnitude of the signal rises above , we save the time as , and when the magnitude of the signal falls below , we save the time as . We then extend the interval bounded by and , i.e., and . Calculating the NFT over the interval using the spectral method [1, Part II, Section IV] and considering only the imaginary part of the discrete eigenvalue gives the received symbol . This approach requires a sufficiently high SNR. Since the model (5) has the same requirement due to the perturbation approach, this requirement is fulfilled.

It may happen that, due to noise, a received pulse never rises above the threshold . In this case, the shortest duration is assumed (i.e., the duration of the pulse with amplitude zero). This scenario can be avoided by choosing the threshold sufficiently below the lowest amplitude. Furthermore, due to the shape of the capacity-achieving distribution, low amplitudes are less likely, which makes this scenario rare.

To find the best threshold, we tested the performance for different values of and found that a threshold at of the lowest non-zero amplitude of the constellation works best. We observed that small deviations of the threshold do not affect the performance significantly, whereas setting the threshold too high (missing symbols with low amplitude) or too low (detecting a symbol where there is none) leads to performance degradation. Furthermore, we assume synchronization sequences spaced sufficiently far apart so as not to impact the rate. We assume synchronization to be ideal, such that error propagation is limited.
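The threshold-based segmentation described above can be sketched as follows for a sampled received signal. The extension `guard` applied on both sides of each pair of threshold crossings is a hypothetical parameter, since the exact extension is not given here, and the NFT of each resulting window is not included.

```python
import numpy as np


def detect_pulses(t, q, threshold, guard):
    """Return (start, end) windows around each excursion of |q| above `threshold`,
    widened by the hypothetical margin `guard` on both sides."""
    above = np.abs(q) > threshold
    edges = np.diff(above.astype(int))
    rises = np.flatnonzero(edges == 1) + 1    # first sample above the threshold
    falls = np.flatnonzero(edges == -1) + 1   # first sample back below it
    if above[0]:
        rises = np.r_[0, rises]               # signal starts above the threshold
    if above[-1]:
        falls = np.r_[falls, above.size]      # signal ends above the threshold
    windows = []
    for r, f in zip(rises, falls):
        windows.append((t[r] - guard, t[f - 1] + guard))
    return windows
```

Each returned window would then be passed to an NFT routine, and the imaginary part of the dominant discrete eigenvalue taken as the received symbol.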

Table I: Simulation parameters.

| Parameter | Symbol | Value |
|---|---|---|
| Span length | | |
| Second-order dispersion | | |
| Nonlinearity parameter | | |
| Attenuation | | |
| Shortest pulse | | |
| Longest pulse | | |
| Bandwidth | | |
| Avg. transmit power | | |
| Cutoff parameter | | |

### V-B Numerical Results

We perform Monte-Carlo simulations of the discrete-time model (5) and show the results in Fig. 8, where we plot the transmission rate at a bit error rate (BER) of for and . The highest transmission rate for each modulation order corresponds to the highest code rate . We notice that the gap to capacity for is smaller than for and . If we consider , i.e., the difference between the modulation orders and , we note that for a low , is low as well. For example, for , . Hence, the rate loss due to time-sharing is small. For , the gap to capacity is smaller than for . Considering the relevant SNR range, we note that is smaller for than for , which explains the smaller rate loss.

We also simulated the transmission over a fiber using SSFM simulations, transmitting a train of solitons. We consider a single mode fiber (SMF) with parameters as in Table I and two different amplification schemes: distributed Raman amplification and lumped amplification using EDFAs. For both schemes, the peak power constraint is chosen such that the effect of the EDFAs can be neglected, i.e., . We employ the detection scheme described in Section V-A and choose the cutoff parameter , i.e., of the energy is contained in the pulse, for which the condition (4) is fulfilled. This leads to a cutoff parameter similar to that in [7]. For each modulation order , we determine the furthest distance over which we achieve a BER of less than and consider the rate gain compared to an unshaped system as in [7]. For , this results in transmission over , , and at a rate gain of , , and , respectively. The results do not differ between distributed and lumped amplification, which is ensured by the peak power constraint.

## VI Conclusion

In this paper, we presented a probabilistic shaping scheme for an NFT-based transmission system that embeds information in the imaginary part of the discrete spectrum. The scheme shapes the information symbols according to the capacity-achieving distribution and transmits them via time-sharing together with the uniformly distributed, suitably modulated parity symbols. We exploited the fact that the pulses of the signal in the time domain are of unequal duration to improve the data rate compared to [7]. Using the time-scaled MI, we derived the capacity-achieving distribution in closed form for the noiseless case and numerically for the general case. We showed that probabilistic eigenvalue shaping significantly improves the performance of an NFT-based transmission scheme and can almost double the data rate. As a possible extension of our work, the continuous spectrum can be used to increase the spectral efficiency [4].
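The noiseless closed-form result can be illustrated with a short sketch. It assumes, for illustration only, that the pulse for an eigenvalue with imaginary part η is the first-order soliton q(t) = 2η sech(2ηt), so that a centered window of duration T holds the energy fraction tanh(ηT); fixing that fraction to a cutoff ε gives T = artanh(ε)/η, i.e., pulses with larger η are shorter. Maximizing H(X)/E[T(X)] over the input PMF then yields P_i = 2^(-C T_i), where C (in bits per unit time) is the unique root of Σ_i 2^(-C T_i) = 1, the classical result for noiseless channels with unequal symbol durations. The function names and the values of η and ε are hypothetical, and the exact duration definition and normalization used in this paper may differ.

```python
import math


def pulse_duration(eta, eps):
    """Duration T of the centered window holding a fraction eps of the energy of
    the assumed soliton q(t) = 2*eta*sech(2*eta*t).

    Energy in [-T/2, T/2] is 4*eta*tanh(eta*T) out of 4*eta in total,
    so tanh(eta*T) = eps, i.e. T = atanh(eps) / eta.
    """
    return math.atanh(eps) / eta


def noiseless_capacity(durations, tol=1e-12):
    """Maximize H(X)/E[T(X)] over PMFs on symbols with the given durations.

    The maximizer is P_i = 2**(-C*T_i), where C is the root of
    sum_i 2**(-C*T_i) = 1; C is found by bisection.
    """
    def f(c):
        return sum(2.0 ** (-c * t) for t in durations) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) > 0:  # f decreases in c; grow hi until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c, [2.0 ** (-c * t) for t in durations]


# Hypothetical eigenvalue imaginary parts and energy cutoff.
etas = [0.5, 1.0, 1.5, 2.0]
eps = 0.999
T = [pulse_duration(eta, eps) for eta in etas]

C, P = noiseless_capacity(T)
uniform_rate = math.log2(len(etas)) / (sum(T) / len(T))  # equiprobable symbols
print(f"shaped:  {C:.4f} bit per unit time, P = {[round(p, 4) for p in P]}")
print(f"uniform: {uniform_rate:.4f} bit per unit time")
```

Since C maximizes the time-scaled rate, it is never below the equiprobable rate log2(M)/E[T]; the resulting PMF favors the short, large-η pulses.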

## Acknowledgments

The authors would like to thank the anonymous reviewers for their feedback and comments which helped to improve this paper significantly. Especially, we would like to acknowledge one of the reviewers for proposing an elegant way to prove Lemma 1, which is included in this paper.

## References

- [1] M. I. Yousefi and F. R. Kschischang, “Information transmission using the nonlinear Fourier transform, Part I–III,” IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4312–4369, Jul. 2014.
- [2] Z. Dong, S. Hari, T. Gui, K. Zhong, M. I. Yousefi, C. Lu, P. K. A. Wai, F. R. Kschischang, and A. P. T. Lau, “Nonlinear frequency division multiplexed transmissions based on NFT,” IEEE Photon. Technol. Lett., vol. 27, no. 15, pp. 1621–1623, Aug. 2015.
- [3] V. Aref, H. Bülow, K. Schuh, and W. Idler, “Experimental demonstration of nonlinear frequency division multiplexed transmission,” in Proc. 41st Eur. Conf. Opt. Commun. (ECOC), Valencia, Spain, Sep. 2015, pp. 1–3.
- [4] V. Aref, S. T. Le, and H. Bülow, “Demonstration of fully nonlinear spectrum modulated system in the highly nonlinear optical transmission regime,” in Proc. 42nd Eur. Conf. Opt. Commun. (ECOC), Düsseldorf, Germany, Sep. 2016, pp. 1–3.
- [5] A. Geisler and C. Schaeffer, “Experimental nonlinear frequency division multiplexed transmission using eigenvalues with symmetric real part,” in Proc. 42nd Eur. Conf. Opt. Commun. (ECOC), Düsseldorf, Germany, Sep. 2016, pp. 1–3.
- [6] S. Hari, M. I. Yousefi, and F. R. Kschischang, “Multieigenvalue communication,” J. Lightw. Technol., vol. 34, no. 13, pp. 3110–3117, Jul. 2016.
- [7] N. A. Shevchenko, S. A. Derevyanko, J. E. Prilepsky, A. Alvarado, P. Bayvel, and S. K. Turitsyn, “Capacity lower bounds of the noncentral chi-channel with applications to soliton amplitude modulation,” IEEE Trans. Commun., to appear.
- [8] G. D. Forney, R. Gallager, G. Lang, F. Longstaff, and S. Qureshi, “Efficient modulation for band-limited channels,” IEEE J. Sel. Areas Commun., vol. 2, no. 5, pp. 632–647, Sep. 1984.
- [9] F.-W. Sun and H. C. A. van Tilborg, “Approaching capacity by equiprobable signaling on the Gaussian channel,” IEEE Trans. Inf. Theory, vol. 39, no. 5, pp. 1714–1716, Sep. 1993.
- [10] G. Böcherer, F. Steiner, and P. Schulte, “Bandwidth efficient and rate-matched low-density parity-check coded modulation,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
- [11] S. A. Derevyanko, S. K. Turitsyn, and D. A. Yakushev, “Fokker–Planck equation approach to the description of soliton statistics in optical fiber transmission systems,” J. Opt. Soc. Am. B, vol. 22, no. 4, pp. 743–752, Apr. 2005.
- [12] M. Zafrullah, M. Waris, and M. K. Islam, “Simulation and design of EDFAs for long-haul soliton based communication systems,” in Proc. Asia-Pacific Conf. Commun. (APCC), Penang, Malaysia, Sep. 2003.
- [13] S. Verdú, “On channel capacity per unit cost,” IEEE Trans. Inf. Theory, vol. 36, no. 5, pp. 1019–1030, Sep. 1990.
- [14] I. M. Stancu-Minasian, Fractional Programming, 1st ed. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1997.
- [15] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: Wiley, 2006.
- [16] P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
- [17] G. Böcherer, “Achievable rates for probabilistic shaping,” arXiv e-prints, Jul. 2017. [Online]. Available: https://arxiv.org/abs/1707.01134