Asymptotically Optimal Policies for Hard-deadline Scheduling over Fading Channels

The work of J. Lee is supported by a Motorola Partnership in Research Grant.


Abstract

A hard-deadline, opportunistic scheduling problem in which $B$ bits must be transmitted within $T$ time-slots over a time-varying channel is studied: the transmitter must decide how many bits to serve in each slot based on knowledge of the current channel but without knowledge of the channel in future slots, with the objective of minimizing expected transmission energy. In order to focus on the effects of delay and fading, we assume that no other packets are scheduled simultaneously and no outage is considered. We also assume that the scheduler can transmit at capacity where the underlying noise channel is Gaussian such that the energy-bit relation is a Shannon-type exponential function. No closed-form solution for the optimal policy is known for this problem, which is naturally formulated as a finite-horizon dynamic program, but three different policies are shown to be optimal in the limiting regimes where $T$ is fixed and $B$ is large, $T$ is fixed and $B$ is small, and where $B$ and $T$ are simultaneously taken to infinity. In addition, the advantage of optimal scheduling is quantified relative to a non-opportunistic (i.e., channel-blind) equal-bit policy.

1 Introduction

Although the basic tenets of opportunistic communication over time-varying channels are well understood, much less is known when short-term delay constraints are imposed. Given the increasing importance of delay-constrained communication, e.g., multimedia transmission, it is critical to understand how to optimize communication performance in delay-limited settings. Thereby motivated, we consider the discrete-time causal scheduling problem of transmitting a packet of $B$ bits within a hard deadline of $T$ slots over a time-varying channel. At each time slot the scheduler determines how many bits to transmit based on the current channel state information (CSI), but without future CSI, and the number of unserved bits, with the objective of minimizing the expected total energy cost. In order to focus on the interplay between opportunistic communication and delay, it is assumed that no other packets are simultaneously transmitted, and the hard deadline must always be met.

This basic problem was formulated as a finite-horizon dynamic program in [1], but an analytic form for the optimal scheduling policy cannot be found for most energy-bit relationships. Indeed, such a problem is difficult to solve because the transmitter only has causal CSI and because a particular rate must be guaranteed over a finite time-horizon. In our earlier work [2], we studied this problem in the setting where transmission occurs at the capacity of the underlying Gaussian noise channel and proposed different suboptimal scheduling policies.

Building upon [2], in this work we prove the optimality of certain scheduling policies in different asymptotic regimes. In particular, we show that:

  • When the number of bits is large, the optimal scheduling policy is a linear combination of a delay-associated term and an opportunistic term. The opportunistic term depends on the logarithm of the channel quality, and the weight of this term decreases as the deadline approaches.

  • When the number of bits is small, a one-shot threshold policy where all bits are transmitted in the first slot in which the channel quality is above a specified threshold is optimal.

  • When the number of bits and the time horizon are both large, a waterfilling-like policy is optimal.

These results are particularly important in light of the fact that the general optimal solution appears intractable. In addition, the different asymptotically optimal schedulers provide an understanding of how the conflicting objectives of opportunistic communication (i.e. transmit only when the channel is strong) and delay-limited communication are optimally balanced, and how this balance depends on the time-horizon and the packet size.

In addition to showing asymptotic optimality, we also quantify the power benefits of optimal channel- and delay-aware scheduling relative to non-opportunistic equal-bit/rate transmission. These results identify that the largest benefits are obtained for severe fading, small packet size, and large time horizon. Moreover, we analyze the behavior of the scheduling policies for large and small $B$ using results on high and low SNR analysis in [3] and [4].

1.1 Prior Work

The basic scheduling problem was first proposed and formulated as a finite-horizon dynamic program (DP) in [1]. In that work a closed-form solution for the optimal scheduler is provided for the special case where the number of transmitted bits is linear in the transmit energy/power and the channel quality is restricted to integer multiples of some constant. In [5], the formulation is extended to continuous time; closed-form descriptions of the optimal policies for some specific models are found, but these do not directly apply to the discrete-time problem considered here. In our earlier work [2], we specialized [1] to the setting where the energy-bit relationship is dictated by AWGN channel capacity and proposed several different suboptimal policies. Two of these policies are shown to be asymptotically optimal in the present work.

Prior work has also considered the dual problem of (expected) rate maximization over a finite time horizon, i.e., the transmitter determines how to utilize a finite energy budget over a finite number of slots with the objective of maximizing the expected rate. This problem was considered in [6], and a one-shot threshold policy and equal power scheduling are shown to be asymptotically optimal in the low- and high-SNR regimes, respectively. This work was extended to a multiple-access setting in [7].

Because transmission scheduling corresponds to power allocation, it is also useful to put the present work in the context of prior work on optimal power allocation in fading channels, with and without delay constraints. In [8] it is established that waterfilling maximizes the long-term average transmitted rate; analogously, the long-term average power needed to achieve a particular long-term average rate is minimized by waterfilling. At the other extreme, channel inversion is known to be the optimal policy when a constant rate is desired in every fading state [9]. The current setting lies between these two extremes, because our objective is to find a power allocation policy (based on causal CSI) such that a particular rate (i.e., $B/T$) is guaranteed over $T$ fading slots. The case $T = 1$ clearly corresponds to zero-outage/delay-limited capacity in [9], while we intuitively expect $T \to \infty$ to correspond to the long-term average rate scenario of [8]. The latter correspondence is made precise in Section 4.3.

2 Problem Setup

This section summarizes the scheduling problem introduced in [2], which is a discrete-time delay constrained scheduling problem over a wireless fading channel as illustrated in Figure 1.

Figure 1: Point-to-point delay constrained scheduling

A packet of $B$ bits is to be transmitted within a deadline of $T$ slots. The scheduler determines the number of bits to allocate at each time slot using the fading realization/statistics to minimize the total expected transmit energy while satisfying the delay deadline constraint. We assume no other packets are to be scheduled simultaneously and that no outage is allowed.

The discrete-time slots are indexed by $t$ in descending order (i.e., starting at $t = T$ down to $t = 1$), and thus $t$ represents the number of remaining slots to the deadline. The channel state (at slot $t$) is denoted by $g_t$ in power units. We assume that $\{g_t\}$ are independently and identically distributed (i.i.d.) and the probability density function (PDF) and the cumulative distribution function (CDF) are denoted by and , respectively. The scheduler is assumed to have only causal knowledge of channel states (at time $t$, $g_T, \ldots, g_t$ are known but $g_{t-1}, \ldots, g_1$ are unknown). Assuming unit variance Gaussian additive noise and transmission at capacity, if energy $E_t$ is used under channel state $g_t$, the number of transmitted bits is given by:

$$b_t = \log_2\left(1 + g_t E_t\right).$$

By inverting this formula, the required energy to transmit $b_t$ bits with channel state $g_t$ is:

$$E_t(b_t, g_t) = \frac{2^{b_t} - 1}{g_t}.$$

The queue state is denoted by $\beta_t$, which is the number of unserved bits at the beginning of slot $t$. Thus, the number of bits to allocate at slot $t$ is determined by the queue state $\beta_t$ and the channel state $g_t$. That is, a scheduling policy is a sequence of functions, indexed by the time step, that map from the current queue and channel state to the bit allocation: $\{b_t(\beta_t, g_t)\}_{t=1}^{T}$. As for terminology, the entire set $\{b_t\}_{t=1}^{T}$ is referred to as a policy or a scheduler, and each element of it is referred to as a policy function or a scheduling function.
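As an illustration only, the energy-bit relation and the bookkeeping of a policy over one channel realization can be sketched in Python. The function names and the `policy(t, queue, gain)` calling convention are assumptions made for this sketch; only the Shannon-type relation itself and the rule that all remaining bits are served at $t = 1$ come from the setup above.

```python
import math

def bits_from_energy(energy, gain):
    """Bits served in one slot at capacity with unit noise variance: log2(1 + g*E)."""
    return math.log2(1.0 + gain * energy)

def energy_for_bits(bits, gain):
    """Inverse relation: energy needed to serve `bits` in one slot, (2**b - 1) / g."""
    return (2.0 ** bits - 1.0) / gain

def policy_energy(policy, gains, total_bits):
    """Total energy of a policy on one realization.

    gains[0] is the channel in slot t = T and gains[-1] the channel in slot t = 1.
    `policy(t, queue, gain)` is a hypothetical callable returning the bits to serve.
    """
    queue, energy = total_bits, 0.0
    horizon = len(gains)
    for i, g in enumerate(gains):
        t = horizon - i  # number of slots remaining, counting down to 1
        # No outage is allowed, so the last slot serves whatever is left;
        # otherwise the allocation is clipped to the feasible range [0, queue].
        b = queue if t == 1 else min(max(policy(t, queue, g), 0.0), queue)
        energy += energy_for_bits(b, g)
        queue -= b
    return energy
```

For instance, the channel-blind equal-bit policy corresponds to `policy = lambda t, queue, gain: queue / t`.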

3 Optimal & Suboptimal Schedulers

In this section we describe the optimal scheduling policy, two suboptimal policies introduced in [2], and a heuristic modification of the ergodic (infinite-horizon) policy.

3.1 The Optimal Scheduler

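The optimal scheduler for the hard-deadline causal scheduling problem described in Section 2 can be found by solving the sequential optimization:

$$\min_{b_T, \ldots, b_1} \mathbb{E}\left[\sum_{t=1}^{T} E_t(b_t, g_t)\right] \quad \text{subject to} \quad \sum_{t=1}^{T} b_t = B, \quad b_t \ge 0 \ \text{for all } t,$$

where $\mathbb{E}$ denotes the expectation operator. Equivalently, this can be formulated as a finite-horizon dynamic program (DP):

$$J_t^{\text{opt}}(\beta_t, g_t) = \begin{cases} \min_{0 \le b_t \le \beta_t} \left\{ E_t(b_t, g_t) + \bar{J}_{t-1}^{\text{opt}}(\beta_t - b_t) \right\}, & t \ge 2, \\ E_1(\beta_1, g_1), & t = 1, \end{cases}$$

where $\bar{J}_t^{\text{opt}}(\beta) = \mathbb{E}_g\left[J_t^{\text{opt}}(\beta, g)\right]$ is the cost-to-go function, i.e., the expected cost to serve $\beta$ bits in $t$ slots if the optimal policy is used.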

At the final step ($t = 1$) all remaining bits must be served because outage is not allowed. At all other steps the optimal bit allocation is determined by balancing the current energy cost and the expected energy expenditure in future slots $\bar{J}_{t-1}^{\text{opt}}(\beta_t - b_t)$. Although the optimal scheduler can be found in closed form for $T = 2$ (Section III-A in [2]), it is not possible to do the same for $T \ge 3$ because no closed-form expression for the cost-to-go function is known for $t \ge 2$. Nevertheless, the optimal scheduling functions can be described as [2]:

where represents the solution of the argument equation. The differentiability of can be verified by the properties of convexity and infimal convolution (pp. 254-255 in [10]).

See Appendix Section 7.

Intuitively, monotonicity in the queue and the channel state is expected because more bits should be served when there remain more unserved bits or when the channel is strong.
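Since no closed form is available in general, the dynamic program can be evaluated numerically. The following Python sketch is one minimal way to do so, under assumptions that are not part of the original formulation: the fading distribution is replaced by a finite set of states `gains` with probabilities `probs`, and the bit axis is discretized into `steps` equal units. The function name and return layout are hypothetical.

```python
def solve_dp(gains, probs, total_bits, horizon, steps=40):
    """Discretized finite-horizon DP for the hard-deadline scheduling problem.

    Returns (avg_cost, policy) where avg_cost[t][k] is the expected minimum
    energy to serve k*step bits in t slots, and policy[t][k][i] is the number
    of bits served in slot t when k*step bits remain and the channel is gains[i].
    """
    step = total_bits / steps
    energy = lambda k, g: (2.0 ** (k * step) - 1.0) / g
    # t = 1: everything left must be sent, whatever the channel is.
    avg_cost = {1: [sum(p * energy(k, g) for g, p in zip(gains, probs))
                    for k in range(steps + 1)]}
    policy = {}
    for t in range(2, horizon + 1):
        prev = avg_cost[t - 1]
        avg_cost[t], policy[t] = [], []
        for k in range(steps + 1):
            total, choices = 0.0, []
            for g, p in zip(gains, probs):
                # Serve j grid units now and leave k - j for the remaining t - 1 slots.
                best_j = min(range(k + 1),
                             key=lambda j: energy(j, g) + prev[k - j])
                total += p * (energy(best_j, g) + prev[k - best_j])
                choices.append(best_j * step)
            avg_cost[t].append(total)
            policy[t].append(choices)
    return avg_cost, policy
```

With the gains listed in increasing order, each row `policy[t][k]` should come out non-decreasing, which is the grid analogue of the monotonicity in the channel state noted above.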

3.2 The Boundary-relaxed Scheduler

The first suboptimal scheduler is derived by relaxing the boundary constraints (we no longer require $0 \le b_t \le \beta_t$), while maintaining the deadline constraint $\sum_{t=1}^{T} b_t = B$. The relaxed version of the original optimization is given by

where and can be calculated by induction [2]:

where denotes the geometric mean operator (i.e., ) and are the fractional moments of the fading distribution defined as:

Due to the simple form of the cost-to-go function , by substituting into and solving the minimization we obtain the following closed-form description of the optimal policy for the relaxed problem [2]:

where serves as a channel threshold given by

The policy function in solves the boundary-relaxed problem but does not guarantee $0 \le b_t \le \beta_t$ in each slot.

To obtain a policy for the actual unrelaxed problem, we simply truncate at 0 and $\beta_t$, and reach what we refer to as the boundary-relaxed scheduler:

where denotes truncation below and above . Notice that this policy function is optimal for , i.e., for all and since .

Note that this same scheduling policy can be reached using the high-SNR approximation . More specifically, if the energy-bit relationship in is approximated by:

and the optimal policy is found with the same relaxation as above, the policy in is also reached.
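The structure of this scheduler can be sketched in Python. The formulas in the sketch are re-derived here from the first-order condition of the relaxed DP with the energy-bit relation $(2^{b} - 1)/g$, not copied from [2]: they assume that the relaxed expected cost-to-go for $t$ slots has the form $t\, 2^{\beta/t} c_t$ minus a constant, where $c_t$ is the geometric mean of the fractional moments $\nu_m = (\mathbb{E}[g^{-1/m}])^m$ for $m = 1, \ldots, t$. The finite-state fading model and all names are assumptions of this sketch.

```python
import math

def relaxed_constants(gains, probs, horizon):
    """c[t] = geometric mean of nu_1..nu_t, with nu_m = (E[g**(-1/m)])**m."""
    log_nu_sum, c = 0.0, {}
    for m in range(1, horizon + 1):
        moment = sum(p * g ** (-1.0 / m) for g, p in zip(gains, probs))
        log_nu_sum += m * math.log(moment)  # log(nu_m) = m * log(E[g**(-1/m)])
        c[m] = math.exp(log_nu_sum / m)
    return c

def relaxed_bits(t, queue, gain, c):
    """Untruncated allocation: a queue term weighted 1/t plus a log-channel
    term weighted (t-1)/t, obtained from 2**b / g = c[t-1] * 2**((queue-b)/(t-1))."""
    if t == 1:
        return queue  # final slot: serve everything
    return queue / t + (t - 1) / t * math.log2(gain * c[t - 1])

def boundary_relaxed_bits(t, queue, gain, c):
    """Truncate the relaxed allocation to the feasible range [0, queue]."""
    return min(max(relaxed_bits(t, queue, gain, c), 0.0), queue)
```

In this form the weight $(t-1)/t$ on the channel-dependent term shrinks as the deadline approaches, so the allocation becomes progressively less opportunistic.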

3.3 The One-shot Scheduler

The second scheduler is derived by modifying the boundary constraint $0 \le b_t \le \beta_t$ into a stronger constraint $b_t \in \{0, \beta_t\}$ (equivalently, $b_t \in \{0, B\}$), i.e., in each slot either the entire packet is transmitted or nothing is transmitted. Then, the dynamic program is given by

where . Equivalently, we can express the above DP as an optimal stopping problem [11] (this can be shown inductively with ):

The optimal solution is a sequential threshold policy [2]:

where is the channel threshold in slot , and is recursively computed as:

Notice that the thresholds depend only on the channel statistics and are independent of $B$, and that the thresholds decrease as the deadline approaches (i.e., as $t$ decreases) [2].
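A minimal Python sketch of the threshold recursion follows. It relies on the observation that, under the one-shot constraint, the total energy is $(2^{B} - 1)/g_\tau$ for the slot $\tau$ in which the packet is sent, so the stopping problem reduces to minimizing the expected inverse channel gain. The symbol `omega`, the finite-state fading model, and the function names are assumptions of this sketch rather than the notation of [2].

```python
def one_shot_thresholds(gains, probs, horizon):
    """Return (omega, threshold).

    omega[t]     : expected inverse gain E[1/g] at the transmission slot when
                   t slots remain and the thresholds below are used.
    threshold[t] : channel level at or above which slot t sends the whole packet.
    """
    omega = {1: sum(p / g for g, p in zip(gains, probs))}  # last slot must transmit
    threshold = {1: 0.0}
    for t in range(2, horizon + 1):
        wait = omega[t - 1]          # expected inverse gain if slot t is skipped
        threshold[t] = 1.0 / wait    # transmit now iff 1/g <= wait
        omega[t] = sum(p * min(1.0 / g, wait) for g, p in zip(gains, probs))
    return omega, threshold

def one_shot_cost(total_bits, omega, horizon):
    """Expected energy of the one-shot scheduler for a packet of total_bits."""
    return (2.0 ** total_bits - 1.0) * omega[horizon]
```

The packet size enters only through the factor $2^{B} - 1$ in the cost, which is why the thresholds themselves do not depend on $B$.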

3.4 The Delay-constrained Ergodic Scheduler

The above two suboptimal policies are developed to solve the DP, formulated in , by simplifying the cost-to-go function. Unlike these two policies, we now consider a policy by modifying the ergodic scheduling policy to meet the hard deadline constraint. The ergodic policy is the optimal solution to a problem of minimizing the average energy to transmit a certain average number of bits (i.e., no hard deadline constraint). If we denote this average rate constraint as , the ergodic scheduling policy function , which does not depend on and determines how many bits to transmit based only upon the channel state , is determined by solving:

This optimization is readily solvable by standard waterfilling [12] and the solution is given by

where serves as a channel threshold and is the solution to:

When the time-horizon is large, we intuitively expect the ergodic policy to perform well in the delay-limited setting considered here. In order to meet the deadline constraint, we utilize the ergodic policy, with for some , at each time step with the exception that all remaining unserved bits are transmitted in the final step:

which is referred to as the delay-constrained ergodic scheduler.
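The waterfilling step and its deadline-aware modification can be sketched in Python as follows. With the energy-bit relation $(2^{b} - 1)/g$, the Lagrangian condition gives an allocation of the form $\max(\log_2(g/g_0), 0)$, where the threshold $g_0$ is set by the average-rate constraint; the sketch finds $g_0$ by bisection. The finite-state fading model, the function names, the choice of the target `avg_bits` (left to the caller), and the capping of each allocation at the remaining queue are assumptions of this sketch.

```python
import math

def ergodic_threshold(gains, probs, avg_bits, iters=200):
    """Bisection for g0 > 0 such that E[max(log2(g / g0), 0)] equals avg_bits > 0."""
    mean_bits = lambda g0: sum(p * max(math.log2(g / g0), 0.0)
                               for g, p in zip(gains, probs))
    hi = max(gains)   # at this threshold no state transmits, so mean_bits(hi) == 0
    lo = hi
    while mean_bits(lo) < avg_bits:
        lo /= 2.0     # a lower threshold serves more bits on average
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since g0 acts on a log scale
        if mean_bits(mid) > avg_bits:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def ergodic_bits(gain, g0):
    """Waterfilling allocation: transmit only when the channel exceeds g0."""
    return max(math.log2(gain / g0), 0.0)

def delay_constrained_ergodic_bits(t, queue, gain, g0):
    """Ergodic allocation capped at the queue; the final slot serves all that is left."""
    if t == 1:
        return queue
    return min(ergodic_bits(gain, g0), queue)
```

Because the allocation ignores the queue until the last slot, any bits that the channel realizations failed to serve are cleared at $t = 1$, possibly at a high energy cost when $T$ is small.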

4 Asymptotic Optimality

This section investigates the optimality of the suboptimal schedulers introduced in the previous section. The optimality can be analyzed in two ways: optimality in policy and optimality in the associated energy cost. Both forms of optimality are shown for the boundary-relaxed scheduler and the one-shot scheduler, whereas energy optimality is shown for the delay-constrained ergodic scheduler.

4.1 Large $B$ and Finite $T$: Asymptotic Optimality of Boundary-relaxed Scheduler

We first prove that the boundary-relaxed scheduler converges to the optimal policy when $T$ is fixed and the number of bits $B$ is taken to infinity. When $B$ is large, we intuitively expect that the optimal policy will allocate strictly positive bits to all time slots with high probability due to the nature of the Shannon energy-bit function. Thus, we expect the boundary-relaxed scheduler to coincide with the optimal policy when the number of bits to serve is large. The following theorem makes this relationship precise:

See Appendix Section 8.

Figure 2a illustrates the behaviors of $b_3^{\text{relax}}$ and $b_3^{\text{opt}}$ vs. $g_3$ for different values of $\beta_3$ and Figure 2b illustrates the behaviors in terms of $\beta_3$ for different values of $g_3$, when $g_t$ is a truncated exponential variable with a support of $[0.001, 10^6]$ (the pdf is given in ). When , for instance, it can be seen that the difference between $b_3^{\text{relax}}$ and $b_3^{\text{opt}}$ gets smaller as $\beta_3$ increases in both Figure 2a and Figure 2b. Notice also that the value of $\beta_3$ making the difference between $b_3^{\text{relax}}$ and $b_3^{\text{opt}}$ small varies with the value of $g_3$. As can be seen in Figure 2b, larger $\beta_3$ is required for larger $g_3$. Additionally, we can observe from Figure 2b that the slope of the plots is 1 for small $\beta_3$ and the slope changes to $1/3$ for some larger $\beta_3$ depending on the value of $g_3$, which is due to the policy function in .

Figure 2: The behavior of $b_3^{\text{relax}}$ and $b_3^{\text{opt}}$ when $\{g_t\}$ are truncated exponential variables with support $[0.001, 10^6]$

We now compare the incurred energy costs of the two policies. We first define the incurred energy with the boundary-relaxed scheduler as:

where . Notice that is not an optimization but is instead a calculation based upon the definition of in . Also notice that denotes the cost for the actual un-relaxed problem (the energy cost with a policy satisfying for all ), while the function defined in Section 3.2 denotes the cost for the relaxed problem (the energy cost with a policy that may not satisfy ).

See Appendix Section 9.

While proving Theorem ?, we obtain the asymptotic relations between the actual cost of the boundary-relaxed scheduler, the cost of the relaxed version, and the cost of the optimal one, i.e., and . Since we have a closed-form expression of shown in , these relations help us understand the behavior of the optimal cost for large , which will be discussed in Section 5.1.

Although the analytic form of the optimal scheduler is not available, the above two theorems tell us that the boundary-relaxed scheduler, which has a very simple form that can be easily implemented, is asymptotically optimal when the number of bits to transmit is sufficiently large. Furthermore, the scheduling function provides intuition on the interplay between the channel quality and the deadline. When the deadline is far away (large $t$), the bit allocation is almost completely determined by the channel quality; on the other hand, as the deadline approaches (small $t$), the policy becomes less opportunistic.

4.2 Small $B$ and Finite $T$: Asymptotic Optimality of One-shot Scheduler

We now show that the one-shot scheduling policy is asymptotically optimal when $T$ is fixed and $B$ is taken to zero. We first show convergence in terms of the policy function, and then in terms of the energy cost.

See Appendix Section 10.

Furthermore, we claim that the costs of the two policies also converge to one another. Since the average costs for the two policies converge to zero as $B \to 0$, cost convergence is investigated by studying the ratio, rather than the absolute difference, between the two costs:

See Appendix Section 11.

In Figure 3 the additional power cost of one-shot scheduling relative to optimal scheduling (i.e., ) is plotted versus the number of bits $B$ for and when $g$ is a truncated exponential variable with a support of $[0.001, 10^6]$. As can be seen, the ratio converges to 1 (0 dB) as $B$ converges to 0.

Figure 3: Additional power cost of one-shot scheduling relative to optimal scheduling as a function of $B$, when $g$ is a truncated exponential variable with support $[0.001, 10^6]$

The optimality of one-shot scheduling can also be seen by upper and lower bounding the energy-bit function by linear functions. Using for , we have:

If we solve the DP using either of these bounds on the energy-bit function, the optimization in becomes a linear program and thus a one-shot policy is optimal because a constrained linear program has a solution at a boundary of the constraint set. Furthermore, the one-shot policies based on the upper and lower bounds converge to the one-shot policy described in Section 3.3 as $B \to 0$ because the bounds themselves converge.

4.3 Large $B$ and $T$: Asymptotic Optimality of Causal Delay-constrained Ergodic Scheduler

When $B$ and $T$ are simultaneously taken to infinity at a particular ratio (i.e., with for some constant ), we can show the energy-cost optimality of the ergodic policy in Section 3.4.

The average energy cost of the delay-constrained ergodic scheduler is given by

where denotes the remaining bits at the final slot and the value of is chosen such that

See Appendix Section 12.

The effect of the hard deadline becomes inconsequential for large $T$ because the channel realizations over the deadline horizon closely match the fading distribution. As a result, the delay-constrained ergodic scheduler performs similarly to the ergodic scheduler when $T$ is large. Moreover, the delay-constrained ergodic scheduler becomes optimal among causal policies, since no causal policy can outperform the ergodic policy.

4.4 Numerical Results: Policy Comparison

In order to compare the different asymptotically optimal policies, we compare their respective energy costs for different time-horizons $T$. Since the analytical expression for the optimal policy is not available for $T \ge 3$, we solve the dynamic program numerically by the discretization method [13]. In Figure 4 the per-slot energy consumption of the suboptimal schedulers is plotted for $T = 5$ and $T = 50$, assuming that the fading states $\{g_t\}$ are i.i.d. truncated exponential with a support of $[0.001, 10^6]$, i.e.,

where is a normalization factor. As can be seen, the one-shot scheduler is near-optimal only when $B$ is small. The other schedulers perform close to the optimal through all ranges of $B$. When $T = 5$, as in Figure 4a, the delay-constrained ergodic scheduler performs worse than the boundary-relaxed scheduler for all $B$. This is because $T$ is too small for the delay-constrained ergodic scheduler to perform like the optimal one. When $T = 50$, as in Figure 4b, there exists a range of $B$ such that the delay-constrained ergodic scheduler outperforms the boundary-relaxed scheduler. This phenomenon is clearly illustrated in Figure 5, where the number of bits is given in logarithmic scale.

Figure 4: Per slot energy cost for $T = 5$ and $T = 50$

Figure 5: Average energy cost per slot for $T = 50$ when $g$ is a truncated exponential variable with support $[0.001, 10^6]$

As can be seen in Figure 5, the one-shot scheduler performs best for small $B$ (region ) and the boundary-relaxed scheduler outperforms the others when $B$ is very large (region ). In the middle range (region ), the delay-constrained ergodic scheduler performs better than the other two.

5 Scheduling Gain

We have shown that the boundary-relaxed and the one-shot schedulers are asymptotically optimal as $B \to \infty$ and $B \to 0$, respectively. Another interesting issue is quantifying the advantage these schedulers provide compared to a non-opportunistic equal-bit scheduler that simply transmits $B/T$ bits during each time slot.

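To compare energy performance, we first calculate the expected energy cost of the equal-bit scheduler, which is

$$\mathbb{E}\left[\sum_{t=1}^{T} \frac{2^{B/T} - 1}{g_t}\right] = T \left(2^{B/T} - 1\right) \mathbb{E}\left[\frac{1}{g}\right],$$

since the equal-bit scheduler chooses $b_t = B/T$ for all $t$. Notice that the equal-bit scheduler achieves the delay-limited capacity [9] [14] (i.e., zero-outage capacity) with rate $B/T$.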

We define the scheduling gain as the ratio between the expected energy expenditures:

and quantify its behavior in the following theorem:

See Appendix Section 13.

Since the boundary-relaxed scheduler is optimal as $B \to \infty$, the scheduling gain of the optimal scheduler and that of the boundary-relaxed scheduler are the same as $B \to \infty$; the same is true for the optimal and the one-shot scheduler as $B \to 0$. The plot of scheduling gain vs. $B$ in Figure 6 agrees with the results of Theorem ?. Intuitively, scheduling delivers a larger power gain for small $B$ because in such scenarios one can be very opportunistic and transmit the entire packet once a sufficiently good channel state is realized. For larger $B$, however, it is inefficient to transmit the entire packet in a single slot (because energy increases exponentially with the number of bits) and thus transmissions must be spread across many slots (in fact, all slots are used as $B \to \infty$), which reduces the channel quality during those transmissions and thus reduces the benefit of scheduling.

In Table 1 the limiting scheduling gains are shown for various fading distributions. As intuitively expected, the scheduling gain is larger for more severe fading distributions and for larger time horizons $T$. From the fact that both and decrease as $T$ increases [2], the asymptotic scheduling gains in and increase with $T$.
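The two limiting gains can be estimated with a short Python sketch. It is a rough illustration under stated assumptions, not a reproduction of the theorem: the large-$B$ limit is taken as the ratio of $\mathbb{E}[1/g]$ to the geometric mean of the fractional moments $\nu_m = (\mathbb{E}[g^{-1/m}])^m$, $m = 1, \ldots, T$ (the leading term of the relaxed cost as re-derived from the relaxed DP), and the small-$B$ limit as the ratio of $\mathbb{E}[1/g]$ to the expected inverse gain achieved by the one-shot thresholds. The finite-state fading model and the function name are hypothetical, so the numbers are not comparable to Table 1.

```python
import math

def limiting_gains_db(gains, probs, horizon):
    """Return (large_B_gain_dB, small_B_gain_dB) relative to equal-bit scheduling."""
    inv_mean = sum(p / g for g, p in zip(gains, probs))  # E[1/g], equal-bit factor
    # Large B: equal-bit cost ~ T * 2**(B/T) * E[1/g], while the relaxed cost
    # ~ T * 2**(B/T) * (geometric mean of nu_1..nu_T); their ratio is the gain.
    log_geo = sum(m * math.log(sum(p * g ** (-1.0 / m) for g, p in zip(gains, probs)))
                  for m in range(1, horizon + 1)) / horizon
    # Small B: equal-bit cost ~ B*ln(2)*E[1/g], one-shot cost ~ B*ln(2)*omega_T,
    # where omega_t = E[min(1/g, omega_{t-1})] and omega_1 = E[1/g].
    omega = inv_mean
    for _ in range(2, horizon + 1):
        omega = sum(p * min(1.0 / g, omega) for g, p in zip(gains, probs))
    to_db = lambda x: 10.0 * math.log10(x)
    return to_db(inv_mean / math.exp(log_geo)), to_db(inv_mean / omega)
```

For $T = 1$ both gains are 0 dB, since every policy must then send the whole packet in the single available slot.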

Figure 6: Scheduling gain $\Delta_5$ when $g$ is a truncated exponential variable with support $[0.001, 10^6]$

Table 1: Scheduling gain examples for several fading distributions
distribution of channel state
truncated exponential with supp. 0.97 dB 4.42 dB 1.26 dB 5.98 dB 1.63 dB 8.59 dB
truncated exponential with supp. 2.19 dB 6.72 dB 2.80 dB 8.63 dB 3.52 dB 11.51 dB
truncated exponential with supp. 3.38 dB 8.38 dB 4.22 dB 10.44 dB 5.17 dB 13.40 dB

5.1 Large $B$ Behavior (High SNR)

When is large relative to , it is useful to interpret the scheduling gain in terms of the well-known affine approximation to high-SNR () capacity [3]: , where denotes the slope representing the multiplexing gain and denotes the constant term representing the power/rate offset. We define the average SNR on a per-slot basis, i.e., . Similarly, the average rate is defined as , which represents the average spectral efficiency per slot. Then, we investigate in terms of and :

With algebraic calculations, we can obtain and for the equal-bit policy, the optimal scheduler (which is equal to the boundary-relaxed scheduler in this regime), as well as the ergodic capacity (see Appendix Section 14 for derivation). The three policies have the same multiplexing gain (degrees of freedom) per slot (), but the offsets are different:

The offset of the equal-bit scheduler is independent of $T$ since it does not take advantage of time diversity. On the other hand, the offset of the boundary-relaxed scheduler decreases with $T$ since decreases [2]. Moreover, the offset of the boundary-relaxed scheduler converges to that of the ergodic capacity because as [2]. Figure 7 illustrates the offsets for several fading distributions. As can be seen, for all the fading distributions $\mathcal{L}_{\infty,T}$ decreases from as $T$ increases and converges to . We can also see that the offsets have larger values for more severe fading distributions.

Figure 7: $\mathcal{L}_{\infty,T}$ for several fading distributions

Figure 8 illustrates the behavior of the spectral efficiency versus SNR. The dashed lines are obtained from the affine approximations in while the solid lines are obtained numerically by running the optimal scheduling policy. As can be seen, the affine approximations are very accurate when the SNR is 20 dB or higher.

Figure 8: High SNR behavior when $g$ is a truncated exponential variable with support $[0.001, 10^6]$

Furthermore, as $T$ increases the spectral efficiency increases from the delay-limited capacity (achieved with the optimal scheduling or the equal-bit scheduling) to the ergodic capacity (achieved with the optimal scheduling).

It is interesting to note that for the dual problem of rate maximization over a finite time horizon subject to a per-realization energy constraint (i.e., for every realization of channel gains the amount of energy used by the scheduling policy cannot exceed some constraint), considered in [6] and [7], at high SNR the optimal policy converges to uniform power allocation (independent of channel state) and there is no advantage to using an intelligent scheduling policy. This is to be contrasted with the setting considered here, where there is a non-vanishing benefit to using the optimal scheduler even at high SNR (i.e., large $B$).

5.2 Small $B$ Behavior (Low SNR)

In this regime, we characterize the linear approximation to the spectral efficiency versus $E_b/N_0$ curve based on the wideband analysis in [4]. The linear approximation consists of a constant term and a slope that represent the minimum energy per bit for reliable communication and the growth of spectral efficiency with respect to $E_b/N_0$, respectively. To be clear, we adopt the notion of $E_b$ as the required energy per slot to transmit one bit per slot instead of the required energy to transmit one bit throughout the entire $T$ slots:

These parameters and can be obtained for the equal-bit scheduler and the one-shot scheduler, which is optimal for $B \to 0$ (see Appendix Section 15 for derivations):

and both and are zero for ergodic capacity.

Figure 9 illustrates the behavior of and with respect to $T$. As can be seen, both and decrease from the delay-limited values to the ergodic capacity values as $T \to \infty$ due to the available time diversity.

Figure 9: Low SNR behavior when $g$ is a truncated exponential variable with support $[0.001, 10^6]$

6 Conclusion

We have shown the asymptotic optimality of three different scheduling policies for delay-constrained transmission over a fading channel. When only a small number of bits need to be served, a one-shot threshold policy is optimal: once a sufficiently good channel state is experienced, the entire packet is transmitted. On the other hand, when the number of bits is large, the number of transmitted bits at each time step should be a weighted sum of the unserved bits and a channel state-related term, where the weight is proportional to the time to deadline. In each of these two policies, the scheduler is opportunistic while also being cognizant of the deadline. Furthermore, a modification of the ergodic waterfilling policy is shown to be optimal when the number of bits and the time horizon are both large.

Although problems involving delay-limited communication are of great practical importance and have been the subject of considerable research, such problems generally do not have closed-form solutions. In this work, however, we are able to circumvent this general difficulty by considering different asymptotic regimes. It would be interesting to see if the asymptotically optimal policies identified here, which admit a very simple analytical form, can be extended to more general settings, for example, scheduling with time-varying channels and randomly arriving packets [15][16] and possibly multi-user channels [17].

7 Proof of Proposition

    1. We show the monotonicity of in . When , and thus is non-decreasing in . When , we suppose that decreases in . Then, increases and increases. As a result, increases but this leads to a contradiction. Thus, is non-decreasing in when . When , and thus is non-decreasing in .

    2. We show the strict monotonicity of for large . To do this, we first show the unboundedness of . Suppose not, i.e., there exists such that for all . By integrating both sides, we have for all . Note also that for all by . Consequently, we have for all , which leads to a contradiction. Therefore, is unbounded.

      Since is unbounded and monotonically increasing, for any given there exists such that for all . In this region of , we showed that is non-decreasing in by (i). Suppose maintains a constant value as increases in this region of . Then, increases and increases. As a result, increases but this also leads to a contradiction. Therefore, increases strictly in if .

    3. Finally, we show the monotonicity of in . Since is non-decreasing in , is non-decreasing. Since is an increasing function, must be non-decreasing in . If , then is strictly increasing by (ii) and thus is strictly increasing by the same argument.

  1. If and , is constant as increases, and thus is non-decreasing with respect to . When ,

    If we suppose that decreases strictly as increases, will increase and thus will also increase. This leads to a contradiction because the left-hand side of decreases strictly while the right-hand side increases. Therefore, is non-decreasing in .

8 Proof of Theorem

We show the result by induction, i.e., we show that if the scheduling functions converge at time step , then the functions also converge at time step . The base cases occur at and : by construction, for every , and for every .

In order to show policy convergence, it is useful to write as:

which is identical to the expression for in except replacing with . Since and are convex, and are increasing and moreover unbounded (shown in Appendix Section 7 (a)(ii)). Since and where and are the lower and the upper bounds of the support of the PDF (), there exists such that if then

Henceforth, we only consider , and thus, no truncation occurs in both policy functions, i.e., and are determined by

for , where and .

Let be given. By Lemma ? (stated later in Appendix Section 8), there exists such that if , then

Since and are strictly increasing in (when is sufficiently large) by Proposition ? and is unbounded due to the unboundedness and the monotonicity of , there exists such that implies

Therefore, if ,

If ,

where the last inequality follows from . Additionally, we have

Combining and , we have . By the same argument for , we have . Thus, we obtain

By , , and the continuity, uniformly on is obtained.

From and , we write the expected cost-to-go as:

where is a function of (and ). By differentiating using integral calculus, the derivative (with respect to ) of is:

Since is unbounded increasing and , and for sufficiently large , and thus

Since for by ,

As a result, the derivative of the expected cost-to-go can be stated simply in the limit of large :