# Polar Write Once Memory Codes

## Abstract

A coding scheme for write once memory (WOM) using polar codes is presented. It is shown that the scheme achieves the capacity region of noiseless WOMs when an arbitrary number of multiple writes is permitted. The encoding and decoding complexities scale as O(N log N) where N is the blocklength. For N sufficiently large, the error probability decreases sub-exponentially in N. The results can be generalized from binary to generalized WOMs, described by an arbitrary directed acyclic graph, using nonbinary polar codes. In the derivation we also obtain results on the typical distortion of polar codes for lossy source coding. Some simulation results with finite length codes are presented.

## 1 Introduction

The model of a write once memory (WOM) was proposed by Rivest and Shamir in [1]. In write once memories writing may be irreversible in the sense that once a memory cell is in some state it cannot easily convert to a preceding state. Flash memory is an important example since the charge level of each memory cell can only increase, and it is not possible to erase a single memory cell. It is possible to erase together a complete block of cells which comprises a large number of cells, but this is a costly operation and it reduces the life cycle of the device.

Consider a binary write-once memory (WOM) which is comprised of $N$ memory cells. Suppose that we write on the device $t$ times, and denote the number of possible messages in the $l$th write by $M_l$ ($1 \le l \le t$). The number of bits that are written in the $l$th write is $k_l = \log_2 M_l$ and the corresponding code rate is $R_l = k_l / N$. Let $\mathbf{s}_l$ denote the $N$ dimensional state vector of the WOM at time (generation) $l$ for $0 \le l \le t$, and suppose that $\mathbf{s}_0 = \mathbf{0}$. For $l = 1,\ldots,t$, the binary message vector is $\mathbf{a}_l$ ($N R_l$ bits). Given $\mathbf{a}_l$ and the memory state $\mathbf{s}_{l-1}$, the encoder computes $\mathbf{s}_l = E_l(\mathbf{s}_{l-1}, \mathbf{a}_l)$ using an encoding function $E_l$ and writes the result $\mathbf{s}_l$ on the WOM. The WOM constraints can be expressed by $\mathbf{s}_l \ge \mathbf{s}_{l-1}$, where the vector inequality applies componentwise. Since the WOM is binary, $\mathbf{s}_{l-1}$ and $\mathbf{s}_l$ are binary vectors, so that if $s_{l-1,j} = 1$ for some component $j$, then $s_{l,j} = 1$. The decoder uses a decoding function $D_l$ to compute the decoded message $\hat{\mathbf{a}}_l = D_l(\mathbf{s}_l)$. The goal is to design a low complexity read-write scheme that satisfies the WOM constraints and achieves $\hat{\mathbf{a}}_l = \mathbf{a}_l$ for $l = 1,\ldots,t$ with high probability for any set of messages $\mathbf{a}_1,\ldots,\mathbf{a}_t$. As is commonly assumed in the literature (see e.g. [2], where it is explained why this assumption does not affect the WOM rate), we also assume that the generation number on each write and read is known.

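The capacity region of the WOM is [3]

$$
C_t = \Big\{ (R_1,\ldots,R_t) \in \mathbb{R}_+^t \;\Big|\; R_1 < h(\epsilon_1),\ R_2 < (1-\epsilon_1)\, h(\epsilon_2),\ \ldots,\ R_{t-1} < \Big(\prod_{j=1}^{t-2} (1-\epsilon_j)\Big) h(\epsilon_{t-1}),\ R_t < \prod_{j=1}^{t-1} (1-\epsilon_j),\ \text{where } 0 \le \epsilon_1,\ldots,\epsilon_{t-1} \le 1/2 \Big\}
$$

($\mathbb{R}_+^t$ denotes a $t$-dimensional vector with positive elements; $h(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ is the binary entropy function). Note that this is both the zero-error capacity region and the $\epsilon$-error capacity region (see the comment after the statement of Theorem 4 in [3]). We also define the maximum average rate,

$$
\bar{C}_t = \sup_{(R_1,\ldots,R_t) \in C_t} \frac{1}{t} \sum_{j=1}^{t} R_j
$$

The maximum average rate was shown to be [3] $\bar{C}_t = \log_2(t+1)/t$. This means that the total number of bits that can be stored on $N$ WOM cells in $t$ writes is $N \log_2(t+1)$, which is significantly higher than $N$. The maximum fixed rate was also obtained [3]. WOM codes were proposed in the past by various authors, e.g. [1], [4], [5], [6], [7], [8], [2], [9] and references therein. For the case where there are two writes, $t = 2$, the method in [9] can approach capacity with computational complexity that is polynomial in the blocklength. To the best of our knowledge, this was the first solution with this property.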
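As a numerical illustration of the region and of the $\log_2(t+1)$ sum rate, the following sketch evaluates the boundary rates $R_l = \big(\prod_{j<l}(1-\epsilon_j)\big) h(\epsilon_l)$ for a given parameter vector. The particular choice $\epsilon_l = 1/(t-l+2)$ is an assumption made for this example (it is not stated in the text above); the script only checks numerically that this choice attains a total of $\log_2(t+1)$ bits per cell. The function names are illustrative.

```python
import math

def h(x):
    """Binary entropy function in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def boundary_rates(eps):
    """Boundary rates (R_1, ..., R_t) for parameters eps_1, ..., eps_{t-1}."""
    rates, alpha = [], 1.0          # alpha = prod_{j<l} (1 - eps_j)
    for e in eps:
        rates.append(alpha * h(e))
        alpha *= 1 - e
    rates.append(alpha)             # last write: R_t = prod_{j=1}^{t-1} (1 - eps_j)
    return rates

def candidate_eps(t):
    """Assumed parameter choice eps_l = 1 / (t - l + 2), l = 1, ..., t-1."""
    return [1.0 / (t - l + 2) for l in range(1, t)]

for t in (2, 3, 4):
    rates = boundary_rates(candidate_eps(t))
    # Compare the sum rate with log2(t + 1)
    print(t, [round(r, 4) for r in rates], round(sum(rates), 4), round(math.log2(t + 1), 4))
```

For $t = 2$ this gives $R_1 = h(1/3)$ and $R_2 = 2/3$, whose sum equals $\log_2 3$.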

In this work, which is an expanded version of [10], we propose a new family of WOM codes based on polar codes [11]. The method relies on the fact that polar codes are asymptotically optimal for lossy source coding [12] and can be encoded and decoded efficiently ($O(N \log N)$ operations, where $N$ is the blocklength). We show that our method can achieve any point in the capacity region of noiseless WOMs when an arbitrary number of multiple writes is permitted. The encoding and decoding complexities scale as $O(N \log N)$. For $N$ sufficiently large, the error probability is at most $2^{-N^\beta}$ for any $0 < \beta < 1/2$. We demonstrate that this method can be used to construct actual practical WOM codes. We also show that our results apply to generalized WOMs, described by an arbitrary directed acyclic graph (DAG), using nonbinary polar codes. In the derivation we also obtain results on the typical distortion of polar codes for lossy source coding.

Recently, another WOM code was proposed [13] that can approach any point in the capacity region of noiseless WOMs with computational complexity that scales polynomially with the blocklength. On the one hand, the method in [13] is deterministic and guarantees zero error, while our method is probabilistic and only guarantees an error probability that vanishes with the blocklength. On the other hand, the method in [13] requires a very long blocklength to closely approach capacity, and it is not clear whether it can be used in practice. Moreover, in an actual WOM (e.g., flash memory) there is also some channel noise, so some small error is inevitable.

The rest of this paper is organized as follows. In Section 2 we provide some background on polar codes for channel and lossy source coding. In Section 3 we provide extended results on polar codes for lossy source coding that will be required later. In Section 4 we present the new proposed polar WOM code for the binary case and analyze its performance. In Section 5 we present a generalization of our solution to generalized WOMs, described by an arbitrary DAG, using nonbinary polar codes. In Section 6 we present some simulation results. Finally, Section 7 concludes the paper.

## 2 Background on Polar codes

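In his seminal work [11], Arikan introduced polar codes for channel coding and showed that they can achieve the symmetric capacity (i.e. the capacity under uniform input distribution) of an arbitrary binary-input channel. In [14] it was shown that the results can be generalized to arbitrary discrete memoryless channels. We will follow the notation in [12]. Let $G_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and let its $n$th Kronecker product be $G_2^{\otimes n}$. Also denote $N = 2^n$. Let $\mathbf{u}$ be an $N$-dimensional binary message vector, and let $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$, where the matrix multiplication is over $\mathrm{GF}(2)$. Suppose that we transmit $\mathbf{x}$ over a memoryless binary-input channel with transition probability $W(y|x)$ and channel output vector $\mathbf{y}$. If $\mathbf{u}$ is chosen at random with uniform probability then the resulting probability distribution $P(\mathbf{u},\mathbf{x},\mathbf{y})$ is given by

$$
P(\mathbf{u},\mathbf{x},\mathbf{y}) = \frac{1}{2^N}\, 1_{\{\mathbf{x} = \mathbf{u} G_2^{\otimes n}\}} \prod_{i=0}^{N-1} W(y_i | x_i)
$$

Define the following $N$ sub-channels,

$$
W_N^{(i)}(\mathbf{y}, u_0^{i-1} | u_i) = P(\mathbf{y}, u_0^{i-1} | u_i) = \frac{1}{2^{N-1}} \sum_{u_{i+1}^{N-1}} P(\mathbf{y} | \mathbf{u})
$$

Denote by $I(W)$ the symmetric capacity of the channel $W$ (it is the channel capacity when the channel is memoryless binary-input output symmetric (MBIOS)) and by $Z(W_N^{(i)})$ the Bhattacharyya parameters of the sub-channels $W_N^{(i)}$. In [11], [15] it was shown that asymptotically in $N$, a fraction $I(W)$ of the sub-channels satisfy $Z(W_N^{(i)}) < 2^{-2^{n\beta}}$ for any $0 < \beta < 1/2$. Based on this result the following communication scheme was proposed. Let $R < I(W)$ be the code rate. Denote by $F$ the set of $N(1-R)$ sub-channels with the highest values of $Z(W_N^{(i)})$ (denoted in the sequel as the frozen set), and by $F^c$ the remaining $N R$ sub-channels. Fix the input to the sub-channels in $F$ to some arbitrary frozen vector $\mathbf{u}_F$ (known both to the encoder and to the decoder) and use the channels in $F^c$ to transmit information. The encoder then transmits $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$ over the channel. The decoder applies the following successive cancellation (SC) scheme. For $i = 0, 1, \ldots, N-1$, if $i \in F$ then $\hat{u}_i = u_i$ ($\mathbf{u}_F$ is common knowledge), otherwise

$$
\hat{u}_i = \begin{cases} 0 & \text{if } L_N^{(i)} > 1 \\ 1 & \text{if } L_N^{(i)} \le 1 \end{cases}
$$

where

$$
L_N^{(i)} = L_N^{(i)}(\mathbf{y}, \hat{u}_0^{i-1}) = \frac{W_N^{(i)}(\mathbf{y}, \hat{u}_0^{i-1} | u_i = 0)}{W_N^{(i)}(\mathbf{y}, \hat{u}_0^{i-1} | u_i = 1)}
$$

Asymptotically, reliable communication under SC decoding is possible for any $R < I(W)$. The error probability is upper bounded by $2^{-N^\beta}$ for any $\beta < 1/2$, and the SC decoder can be implemented in complexity $O(N \log N)$.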
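The mapping from the message vector to the codeword via the Kronecker power of the $2 \times 2$ kernel $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ can be sketched as follows. This is a minimal illustration that builds the full matrix explicitly, which costs $O(N^2)$ rather than the $O(N \log N)$ of a butterfly implementation; the function names are illustrative and NumPy is assumed to be available.

```python
import numpy as np

G2 = np.array([[1, 0], [1, 1]], dtype=int)

def kron_power(n):
    """Return the n-th Kronecker power of G2 (a 2^n x 2^n binary matrix)."""
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, G2)
    return G

def polar_transform(u):
    """Compute x = u * G2^{(x) n} over GF(2) for a binary vector u of length 2^n."""
    u = np.asarray(u, dtype=int)
    n = len(u).bit_length() - 1
    if 2 ** n != len(u):
        raise ValueError("length must be a power of two")
    return (u @ kron_power(n)) % 2

u = [1, 0, 1, 1, 0, 0, 1, 0]
x = polar_transform(u)
# Over GF(2) the kernel squares to the identity, so applying the
# transform twice returns the original vector.
print(x, polar_transform(x))
```

Because the kernel is its own inverse over GF(2), the same routine recovers the message vector from the codeword.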

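Polar codes can also be used for lossy source coding [12]. Consider a binary symmetric source (BSS), i.e. a random binary vector $\mathbf{Y}$ uniformly distributed over all $N$-dimensional binary vectors. Let $d(\mathbf{x},\mathbf{y})$ be a distance measure between two binary vectors, $\mathbf{x}$ and $\mathbf{y}$, such that $d(\mathbf{x},\mathbf{y}) = \sum_{i=0}^{N-1} d(x_i, y_i)$, where $d(0,0) = d(1,1) = 0$ and $d(0,1) = d(1,0) = 1$. Define a binary symmetric channel (BSC) $W(y|x)$ with crossover parameter $D$ and construct a polar code with frozen set $F$ that consists of the $(1-R) N$ sub-channels with the largest values of $Z(W_N^{(i)})$. This code uses some arbitrary frozen vector $\mathbf{u}_F$ which is known both to the encoder and to the decoder (e.g. $\mathbf{u}_F = \mathbf{0}$) and has rate $R = |F^c| / N$.

Given $\mathbf{Y} = \mathbf{y}$ the SC encoder applies the following scheme. For $i = 0, 1, \ldots, N-1$, if $i \in F$ then $\hat{u}_i = u_i$, otherwise

$$
\hat{u}_i = \begin{cases} 0 & \text{w.p. } L_N^{(i)} / (L_N^{(i)} + 1) \\ 1 & \text{w.p. } 1 / (L_N^{(i)} + 1) \end{cases}
$$

(w.p. denotes with probability). The complexity of this scheme is $O(N \log N)$. Since $\hat{\mathbf{u}}_F = \mathbf{u}_F$ is common knowledge, the decoder only needs to obtain $\hat{\mathbf{u}}_{F^c}$ from the encoder ($|F^c|$ bits). It can then reconstruct the approximating source codeword $\mathbf{x}$ using $\mathbf{x} = \hat{\mathbf{u}} G_2^{\otimes n}$. Let $E\, d(\mathbf{X},\mathbf{Y}) / N$ be the average distortion of this polar code (the averaging is over both the source vector, $\mathbf{Y}$, and over the approximating source codeword, $\mathbf{X}$, which is determined at random from $\mathbf{Y}$). Also denote by $R(D) = 1 - h(D)$ the rate distortion function. In [12] it was shown, given any $0 < D < 1/2$, $0 < \delta < 1 - R(D)$ and $0 < \beta < 1/2$, that for $N$ (i.e., $n$) sufficiently large, $R = R(D) + \delta$, and any frozen vector $\mathbf{u}_F$, the polar code with rate $R$ under SC encoding satisfies

$$
E\, d(\mathbf{X},\mathbf{Y}) / N \le D + O\big(2^{-N^\beta}\big)
$$

In fact, as noted in [12], the proof of this average distortion bound is not restricted to a BSS and extends to general sources, e.g. a binary erasure source [12].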
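The frozen-set selection rule for source coding (freeze the sub-channels with the largest Bhattacharyya parameters) can be sketched on a binary erasure channel, for which the recursion $Z \mapsto (2Z - Z^2,\ Z^2)$ is exact. The BEC is used here only as a stand-in: for the BSC test channel discussed above, the parameters have to be estimated by other means (for example density evolution or simulation). The index ordering produced by the recursion below is an assumption of this sketch, as are the function names.

```python
def bec_bhattacharyya(n, eps):
    """Bhattacharyya parameters of the 2^n sub-channels of a BEC(eps).

    For a BEC the one-step recursion Z -> (2Z - Z^2, Z^2) is exact.
    """
    z = [eps]
    for _ in range(n):
        z = [w for v in z for w in (2 * v - v * v, v * v)]
    return z

def source_frozen_set(z, rate):
    """Frozen set for source coding: the (1 - rate) * N indices with largest Z."""
    N = len(z)
    num_frozen = N - int(round(rate * N))
    order = sorted(range(N), key=lambda i: z[i], reverse=True)
    return sorted(order[:num_frozen])

z = bec_bhattacharyya(3, 0.5)
F = source_frozen_set(z, rate=0.5)
print([round(v, 4) for v in z])
print(F)
```

The remaining indices form $F^c$; they carry the $|F^c|$ bits that describe the compressed source word.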

## 3 Extended results for Polar source codes

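Although the result in [12] is concerned only with the average distortion, one may strengthen it by combining it with the strong converse result of the rate distortion theorem in [16]. The strong converse asserts that for any $\delta_1 > 0$, if $\delta > 0$ is chosen sufficiently small and $R < R(D) + \delta$ then $P\big(d(\mathbf{X},\mathbf{Y})/N < D - \delta_1\big)$ can be made arbitrarily small by choosing $N$ sufficiently large. Combining this with the average distortion bound of [12], we can conclude, for a polar code designed for a BSC($D$), with $R \le R(D) + \delta$ and $\delta$ sufficiently small, that

$$
\lim_{N \to \infty} P\big(D - \delta_1 \le d(\mathbf{X},\mathbf{Y})/N \le D + \delta_1\big) = 1
$$

for any $\delta_1 > 0$.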

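We now extend this concentration result in order to obtain an improved upper bound estimate (as a function of $N$) on the considered probability. The following discussion is valid for an arbitrary discrete MBIOS, $W(y|x)$, in the definition of the sub-channels $W_N^{(i)}$. As in [12] we construct a source polar code with frozen set defined by,

$$
F = \big\{ i \in \{0,\ldots,N-1\} : Z(W_N^{(i)}) \ge 1 - 2\delta_N^2 \big\}
$$

(note that $F$ depends on $N$, however for simplicity our notation does not show this dependence explicitly) and

$$
\delta_N = 2^{-N^\beta} / (2N)
$$

By the results in [12], $|F|/N \to 1 - I(W)$ as $N \to \infty$. Hence, for any $\epsilon > 0$, if $N$ is large enough then the rate $R$ of the code satisfies,

$$
R = 1 - |F|/N \le I(W) + \epsilon
$$

Let $\mathbf{y}$ be a source vector produced by a sequence of independent identically distributed (i.i.d.) realizations of $Y$. If $\mathbf{u}_F$ is chosen at random with uniform probability then the vector $\mathbf{u}$ produced by the SC encoder (that utilizes the randomized rule of Section 2) has a conditional probability distribution given by [12]

$$
Q(\mathbf{u} | \mathbf{y}) = \prod_{i=0}^{N-1} Q(u_i | u_0^{i-1}, \mathbf{y})
$$

where

$$
Q(u_i | u_0^{i-1}, \mathbf{y}) = \begin{cases} 1/2 & \text{if } i \in F \\ P(u_i | u_0^{i-1}, \mathbf{y}) & \text{if } i \in F^c \end{cases}
$$

On the other hand, the conditional probability of $\mathbf{u}$ given $\mathbf{y}$ corresponding to the joint distribution $P(\mathbf{u},\mathbf{x},\mathbf{y})$ of Section 2 is,

$$
P(\mathbf{u} | \mathbf{y}) = \prod_{i=0}^{N-1} P(u_i | u_0^{i-1}, \mathbf{y})
$$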

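In the sequel we employ standard strong typicality arguments. Similarly to the notation in [17], we define an $\epsilon$-strongly typical sequence $\mathbf{x} \in \mathcal{X}^N$ with respect to a distribution $p(x)$ on the finite set $\mathcal{X}$, and denote it by $\mathbf{x} \in A_\epsilon^{*(N)}(X)$ (or $A_\epsilon^{*(N)}$ for short), as follows. Let $C(a|\mathbf{x})$ denote the number of occurrences of the symbol $a$ in the sequence $\mathbf{x}$. Then $\mathbf{x} \in A_\epsilon^{*(N)}(X)$ if the following two conditions hold. First, for all $a \in \mathcal{X}$ with $p(a) > 0$, $\left| C(a|\mathbf{x})/N - p(a) \right| < \epsilon / |\mathcal{X}|$. Second, for all $a \in \mathcal{X}$ with $p(a) = 0$, $C(a|\mathbf{x}) = 0$. Similarly we define $\epsilon$-strongly typical sequences $(\mathbf{x},\mathbf{y}) \in \mathcal{X}^N \times \mathcal{Y}^N$ with respect to a distribution $p(x,y)$ on the finite set $\mathcal{X} \times \mathcal{Y}$, and denote it by $(\mathbf{x},\mathbf{y}) \in A_\epsilon^{*(N)}(X,Y)$ (or $A_\epsilon^{*(N)}$ for short). We denote by $C(a,b|\mathbf{x},\mathbf{y})$ the number of occurrences of $(a,b)$ in $(\mathbf{x},\mathbf{y})$, and require the following. First, for all $(a,b)$ with $p(a,b) > 0$, $\left| C(a,b|\mathbf{x},\mathbf{y})/N - p(a,b) \right| < \epsilon / (|\mathcal{X}|\,|\mathcal{Y}|)$. Second, for all $(a,b)$ with $p(a,b) = 0$, $C(a,b|\mathbf{x},\mathbf{y}) = 0$. The definition of $\epsilon$-strong typicality can be extended to more than two sequences in the obvious way.

In our case $\mathbf{x} = \mathbf{x}(\mathbf{u}) = \mathbf{u} G_2^{\otimes n}$. Note that $G_2^{\otimes n}$ is a full rank matrix. Therefore each vector $\mathbf{u}$ corresponds to exactly one vector $\mathbf{x}$. We say that $(\mathbf{u},\mathbf{y}) \in A_\epsilon^{*(N)}(U,Y)$ if $(\mathbf{x}(\mathbf{u}),\mathbf{y}) \in A_\epsilon^{*(N)}(X,Y)$ with respect to the probability distribution $p(x,y) = W(y|x)/2$ (the single-letter distribution induced by a uniform channel input, see Section 2).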
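The two conditions in the strong typicality definition can be checked mechanically. The sketch below is illustrative only: the per-symbol tolerance `tol` is passed in directly, so it does not commit to a particular normalization of $\epsilon$, and the function name and dictionary representation of the distribution are assumptions of this example. Joint typicality of a pair of sequences is obtained by applying the same check to the sequence of pairs.

```python
from collections import Counter

def is_strongly_typical(seq, p, tol):
    """Check strong typicality of `seq` with respect to distribution `p`.

    p   : dict mapping each symbol of the alphabet to its probability
    tol : tolerance on the deviation of each empirical frequency
    """
    n = len(seq)
    counts = Counter(seq)
    # Symbols outside the alphabet are treated as zero-probability symbols.
    if any(sym not in p for sym in counts):
        return False
    for sym, prob in p.items():
        c = counts.get(sym, 0)
        if prob == 0:
            if c != 0:          # second condition: zero-probability symbols never occur
                return False
        elif abs(c / n - prob) >= tol:
            return False        # first condition: empirical frequency close to p
    return True

p = {0: 0.5, 1: 0.5, 2: 0.0}
print(is_strongly_typical([0, 1, 0, 1], p, tol=0.1))
print(is_strongly_typical([0, 0, 0, 1], p, tol=0.1))

# Joint typicality: zip the two sequences and use a distribution on pairs.
x, y = [0, 1, 0, 1], [0, 1, 1, 0]
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(is_strongly_typical(list(zip(x, y)), p_xy, tol=0.1))
```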

Recall that the SC encoder’s output has conditional probability distribution given by -. Hence, Theorem ? asserts that, for sufficiently large, .

Proof: To prove the theorem we use the following result of [12],

Hence,

In addition we claim the following,

for some constant (that can depend on ). We now prove .

In the first equality we have used the fact that implies . Let be a binary random variable such that if and otherwise. Then,

Therefore,

where the inequality is due to Hoeffding’s inequality (using the fact ). Hence,

which, together with , proves . From

Combining this with we get

Recalling the definition of , , the theorem follows immediately.

Although not needed in the rest of the paper, we can now improve the inequality using the following Theorem.

Proof: Since then,

Denote by , and the events

Then for sufficiently large,

The last inequality is due to Theorem ?. This proves (since holds for any )

## 4 The proposed polar WOM code

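Consider the binary WOM problem that was defined in Section 1. Given some set of parameters $0 \le \epsilon_1, \ldots, \epsilon_{t-1} \le 1/2$, $\epsilon_0 = 0$ and $\epsilon_t = 1/2$, we wish to show that we can construct a reliable polar coding scheme for any set of WOM rates $(R_1, \ldots, R_t)$ in the capacity region $C_t$. That is, the rates satisfy

$$
R_l < \alpha_{l-1}\, h(\epsilon_l), \qquad l = 1, \ldots, t
$$

where

$$
\alpha_{l-1} = \prod_{j=0}^{l-1} (1 - \epsilon_j)
$$

For that purpose we consider the following $t$ test channels. The input set of each channel is $\{0, 1\}$. The output set is $\{(0,0), (0,1), (1,0), (1,1)\}$. Denote the input random variable by $X$ and the output by $(S, V)$. The probability transition function of the $l$th channel is defined by,

$$
P_l\big((S,V) = (s,v) \mid X = x\big) = f_l(s, x \oplus v)
$$

where

$$
f_l(s, b) = \begin{cases} \alpha_{l-1} (1 - \epsilon_l) & \text{if } s = 0,\ b = 0 \\ \alpha_{l-1}\, \epsilon_l & \text{if } s = 0,\ b = 1 \\ 1 - \alpha_{l-1} & \text{if } s = 1,\ b = 0 \\ 0 & \text{if } s = 1,\ b = 1 \end{cases}
$$

This channel is also shown in Figure 1. It is easy to verify that the capacity of this channel is $1 - \alpha_{l-1} h(\epsilon_l)$ and that the capacity achieving input distribution is symmetric, i.e., $P(X = 0) = P(X = 1) = 1/2$.

For each channel $l$ we design a polar code with blocklength $N$ and frozen set of sub-channels $F_l$ defined by the rule of Section 3. The rate is

$$
R'_l = 1 - \alpha_{l-1} h(\epsilon_l) + \delta_l
$$

where $\delta_l$ is arbitrarily small for $N$ sufficiently large. This code will be used as a source code.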

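Denote the information sequence by $\mathbf{a}_1, \ldots, \mathbf{a}_t$ and the sequence of WOM states by $\mathbf{s}_0 = \mathbf{0}, \mathbf{s}_1, \ldots, \mathbf{s}_t$. Hence $\mathbf{s}_l = E_l(\mathbf{s}_{l-1}, \mathbf{a}_l)$ and $\hat{\mathbf{a}}_l = D_l(\mathbf{s}_l)$, where $E_l$ and $D_l$ are the $l$th encoding and decoding functions, respectively, and $\hat{\mathbf{a}}_l$ is the retrieved information sequence. We define $E_l$ and $D_l$ as follows.

Encoding function, $\mathbf{s}_l = E_l(\mathbf{s}_{l-1}, \mathbf{a}_l)$:

1. Let $\mathbf{v} = \mathbf{s}_{l-1} \oplus \mathbf{g}$, where $\oplus$ denotes bitwise XOR and $\mathbf{g}$ is a sample from an $N$ dimensional uniformly distributed random binary vector. The vector $\mathbf{g}$ is a common randomness source (dither), known both to the encoder and to the decoder.

2. Let $y_j = (s_{l-1,j}, v_j)$ and $\mathbf{y} = (y_0, y_1, \ldots, y_{N-1})$. Compress the vector $\mathbf{y}$ using the $l$th polar code with $\mathbf{u}_{F_l} = \mathbf{a}_l$. This results in a vector $\mathbf{u}$ and a vector $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$.

3. Finally $\mathbf{s}_l = \mathbf{x} \oplus \mathbf{g}$.

Decoding function, $\hat{\mathbf{a}}_l = D_l(\mathbf{s}_l)$:

1. Let $\mathbf{x} = \mathbf{s}_l \oplus \mathbf{g}$.

2. $\hat{\mathbf{a}}_l = \big( \mathbf{x} (G_2^{\otimes n})^{-1} \big)_{F_l}$, where $(\mathbf{z})_{F_l}$ denotes the elements of the vector $\mathbf{z}$ in the set $F_l$.

Note that the information is embedded within the set $F_l$. Hence, when considered as a WOM code, our code has rate $R_l = |F_l| / N = 1 - R'_l$, where $R'_l$ is the rate of the polar source code.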
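The structure of the encoding and decoding functions (dither, message placed on the frozen positions, read-out through the polar transform) can be illustrated with a toy example. In the sketch below the SC source encoder is replaced by an exhaustive search over the non-frozen positions, which is exponential in their number and only usable for very small blocklengths; it is a stand-in chosen for clarity, not the encoder analyzed in this paper. The frozen set `[0, 1, 2, 4]`, the fixed dither, and all function names are hypothetical choices for this example.

```python
import itertools
import numpy as np

def polar_matrix(n):
    """n-th Kronecker power of [[1, 0], [1, 1]] (its own inverse over GF(2))."""
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, np.array([[1, 0], [1, 1]], dtype=int))
    return G

def wom_encode(s_prev, a, g, frozen, G):
    """Write message `a` (carried on the frozen positions) on top of state s_prev.

    Brute-force search over the free positions stands in for the SC encoder.
    Returns the new state, or None if no codeword satisfies the WOM constraints.
    """
    N = len(s_prev)
    free = [i for i in range(N) if i not in frozen]
    v = s_prev ^ g                          # step 1: dithered previous state
    best = None
    for bits in itertools.product((0, 1), repeat=len(free)):
        u = np.zeros(N, dtype=int)
        u[frozen] = a
        u[free] = bits
        x = (u @ G) % 2                     # step 2: candidate codeword
        if np.any((s_prev == 1) & (x != v)):
            continue                        # would need to reset a written cell
        flips = int(np.sum(x != v))         # cells newly set to 1
        if best is None or flips < best[0]:
            best = (flips, x)
    if best is None:
        return None                         # encoding failure
    return best[1] ^ g                      # step 3: new WOM state

def wom_decode(s, g, frozen, G):
    """Recover the message from the frozen positions of (s xor g) * G."""
    x = s ^ g
    return ((x @ G) % 2)[frozen]

G = polar_matrix(3)
frozen = [0, 1, 2, 4]                       # hypothetical frozen set, N = 8
g = np.array([1, 0, 1, 1, 0, 0, 1, 0])      # fixed dither for the example
s0 = np.zeros(8, dtype=int)
s1 = wom_encode(s0, np.array([1, 0, 1, 1]), g, frozen, G)
print(s1, wom_decode(s1, g, frozen, G))
```

The first write always succeeds because an all-zero state imposes no constraint; later writes may return `None`, which corresponds to the encoding error event discussed in the analysis.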

For the sake of the proof we slightly modify the coding scheme as follows:

To prove the theorem we need the following lemma1. Consider an i.i.d. source with the following probability distribution,

Note that this source has the marginal distribution of the output of the th channel defined by - under a symmetric input distribution.

Proof: According to Theorem ?, for (i.e., ) large enough,

w.p. at least . Consider all possible triples , where , and . From the definition of , if then (w.p. at least ),

and if then

In addition, using and the channel definition - we have,

Combining this with we obtain

Hence,

This proves . Similarly is due to since from the definition of the channel.

We proceed to the proof of Theorem ?. We denote by and the random variables corresponding to and .

Proof of Theorem ?: Note that we only need to prove successful encoding since the WOM is noiseless.

Recall our definition . Suppose that . Our first claim is that under this assumption, for sufficiently small and sufficiently large, w.p. at least , the encoding will be successful and . For notational simplicity we use instead of , and instead of . Considering step 1 of the encoding we see that , after the random permutation described in (M3), can be considered as i.i.d. sampling of the source defined in (by the fact that , and since is uniformly distributed). Hence, by Lemma ? and (M1), the compression of this vector in step 2 satisfies the following for any and sufficiently large w.p. at least .

1. If then .

2. For at most components we have and .

Hence, in step 3 of the encoding, if then (i.e. the WOM constraints are satisfied). In addition there are at most components for which and . Therefore, w.p. at least , the vectors and satisfy the WOM constraints and

(in the first inequality we have used the fact that for sufficiently large, w.p. at least for some independent of ). Setting yields our first claim.

From we know that in (M4) will indeed satisfy the condition w.p. at least . The proof of the theorem now follows by using induction on to conclude that (w.p. at least ) the th encoding is successful and . The complexity claim is due to the results in [11].

Notes:

1. The test channel in the first write is actually a BSC (since $\alpha_0 = 1$ in Figure 1). Similarly, in the last ($t$th) write we can merge together the source symbols $(0,0)$ and $(0,1)$ (note that $\epsilon_t = 1/2$, so that $X$ and $V$ are statistically independent given $S = 0$), thus obtaining a test channel which is a binary erasure channel (BEC).

2. Consider for example a flash memory device. In practice, the dither, $\mathbf{g}$, can be determined from the address of the word (e.g. the address is used as a seed value to a random number generator).

3. In the rare event where an encoding error has occurred, the encoder may re-apply the encoding using another dither vector value. Furthermore, the decoder can realize which value of dither vector should be used in various ways. One possibility is that this information is communicated, similarly to the assumption that the generation number is known. Another possibility is that the decoder will switch to the next value of the dither value upon detecting decoding failure, e.g. by using CRC information. By repeating this procedure of re-encoding upon a failure event at the encoder several times, one can reduce the error probability as much as required.
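Notes 2 and 3 can be sketched together: the dither is derived from the word address, and a retry counter selects the next dither value after an encoding failure. This is a minimal sketch under stated assumptions: Python's `random.Random` is used purely as a convenient deterministic generator, the string seed format is arbitrary, and `encode` is a hypothetical WOM encoder that returns `None` on failure. How the decoder learns the retry counter (side information or CRC-based detection) is outside the scope of the example.

```python
import random

def dither(address, attempt, N):
    """Pseudo-random N-bit dither derived from a word address and a retry counter."""
    rng = random.Random(f"{address}:{attempt}")   # string seed: reproducible across runs
    return [rng.getrandbits(1) for _ in range(N)]

def encode_with_retries(encode, s_prev, a, address, N, max_attempts=4):
    """Call encode(s_prev, a, g) with successive dither values until it succeeds.

    Returns (new_state, attempt_index), or (None, max_attempts) if all attempts fail.
    """
    for attempt in range(max_attempts):
        s_new = encode(s_prev, a, dither(address, attempt, N))
        if s_new is not None:
            return s_new, attempt
    return None, max_attempts

# The same address and attempt always give the same dither, so the decoder
# can regenerate it without any stored side information beyond the attempt index.
print(dither(42, 0, 8) == dither(42, 0, 8))
```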

## 5 Generalization to nonbinary polar WOM codes

### 5.1 Nonbinary polar codes

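Nonbinary polar codes over a $q$-ary alphabet ($q > 2$) for channel coding over arbitrary discrete memoryless channels were proposed in [14]. Nonbinary polar codes over a $q$-ary alphabet for lossy source coding of a memoryless source were proposed in [18]. First suppose that $q$ is prime. Similarly to the binary case, the codeword $\mathbf{x}$ of a $q$-ary polar code is related to the $N$-dimensional ($N = 2^n$) message vector $\mathbf{u}$ by the relation $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$, where the matrix $G_2^{\otimes n}$ is the same as in the binary case. However, now $\mathbf{u}, \mathbf{x} \in \mathcal{X}^N$, where $\mathcal{X} = \{0, 1, \ldots, q-1\}$, and the arithmetic is modulo $q$. Suppose that we transmit $\mathbf{x}$ over a memoryless channel with transition probability $W(y|x)$ and channel output vector $\mathbf{y}$. If $\mathbf{u}$ is chosen at random with uniform probability over $\mathcal{X}^N$ then the resulting probability distribution $P(\mathbf{u},\mathbf{x},\mathbf{y})$ is given by

$$
P(\mathbf{u},\mathbf{x},\mathbf{y}) = \frac{1}{q^N}\, 1_{\{\mathbf{x} = \mathbf{u} G_2^{\otimes n}\}} \prod_{i=0}^{N-1} W(y_i | x_i)
$$

Define the following $N$ sub-channels,

$$
W_N^{(i)}(\mathbf{y}, u_0^{i-1} | u_i) = P(\mathbf{y}, u_0^{i-1} | u_i) = \frac{1}{q^{N-1}} \sum_{u_{i+1}^{N-1}} P(\mathbf{y} | \mathbf{u})
$$

We denote by $I(W)$ and $I(W_N^{(i)})$, respectively, the symmetric capacity parameters of $W$ and $W_N^{(i)}$. In [14] it was shown that the sub-channels polarize as in the binary case with the same asymptotic polarization rate. The frozen set is chosen similarly to the binary case. Asymptotically, reliable communication under SC decoding is possible for any rate $R < I(W)$. The error probability is upper bounded by $2^{-N^\beta}$ for any $\beta < 1/2$, and the decoder can be implemented in complexity $O(N \log N)$.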
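For prime alphabet size the transform uses the same kernel as in the binary case, with the arithmetic carried out modulo the alphabet size (taken here as an assumption of the sketch). One practical difference from the binary case is that the kernel is no longer its own inverse: the inverse kernel is $\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$ modulo the alphabet size, and the inverse of a Kronecker power is the Kronecker power of the inverse. The function names below are illustrative.

```python
import numpy as np

def kron_power(M, n):
    """n-th Kronecker power of the matrix M."""
    P = np.array([[1]], dtype=int)
    for _ in range(n):
        P = np.kron(P, M)
    return P

def qary_polar_transform(u, q):
    """x = u * G2^{(x) n} with arithmetic modulo a prime q."""
    u = np.asarray(u, dtype=int)
    n = len(u).bit_length() - 1
    G = kron_power(np.array([[1, 0], [1, 1]]), n)
    return (u @ G) % q

def qary_polar_inverse(x, q):
    """Inverse map, using the inverse kernel [[1, 0], [-1, 1]] modulo q."""
    x = np.asarray(x, dtype=int)
    n = len(x).bit_length() - 1
    G_inv = kron_power(np.array([[1, 0], [-1, 1]]), n)
    return (x @ G_inv) % q

u = [2, 0, 1, 2]
x = qary_polar_transform(u, q=3)
print(x, qary_polar_inverse(x, q=3))
```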

Nonbinary polar codes were also proposed for lossy source coding [18]. Consider some random variable . For simplicity we assume that is finite. Also denote . Let the source vector random variable be created by a sequence of i.i.d. realizations of . Let be some (finite) distance measure between and . Furthermore, for and , we define . Given some distortion level, , let be the test channel that achieves the symmetric rate distortion bound, , (i.e., the rate distortion bound under the constraint that is uniformly distributed over ) for the source at distortion level . Using that channel, , we construct a polar code with frozen set defined by [18]

where . Given the SC encoder applies the following scheme. For , if then , otherwise

The complexity of this scheme is . It was shown [18] that

Hence, for sufficiently large, the rate of the code, , approaches . Furthermore, for any frozen vector, ,

under SC encoding, where is the average distortion.

In fact, using the results in [18], the statements in Section 3 immediately extend to the nonbinary case. Consider a polar code constructed using some discrete channel with frozen set defined in . Suppose that is chosen at random with uniform probability. Then, similarly to -, the vector produced by the SC encoder has a conditional probability distribution given by

where

On the other hand, the conditional probability of given corresponding to is

Similarly to above, it was shown in [18] that

Combining with exactly the same arguments that were presented in Theorem ?, yields the following generalization to Theorem ?.

Although not needed in the sequel, Theorem ? also generalizes to the -ary case:

Proof: Given some , we set sufficiently small and sufficiently large, thus obtaining

where the last inequality is due to Theorem ?, and the fact that if , for sufficiently small and sufficiently large, then .

When is not prime, the results in this section still apply provided that the polarization transformation is modified as described in [14]. In each step of the transformation, instead of

we use

where is a permutation, chosen at random with uniform probability over .

### 5.2 The generalized WOM problem

Following [19], the generalized WOM is described by a rooted DAG, represented by its set of states (vertices) and by its set of edges . The set represents the possible states of each memory cell. We say that there exists a path from state to state in the WOM, and denote it by