
# Approximation capabilities of neural networks on unbounded domains

## Abstract

We prove that if $p\in[1,\infty)$ and if the activation function is a monotone sigmoid, relu, elu, softplus or leaky relu, then the shallow neural network is a universal approximator in $L^p(\mathbb{R}\times[0,1]^n)$. This generalizes classical universal approximation theorems on the unit cube $[0,1]^n$.

We also prove that if $p\in[1,\infty)$ and if the activation function is a sigmoid, relu, elu, softplus or leaky relu, then the shallow neural network expresses no non-zero functions in $L^p(\mathbb{R}^2)$. Consequently a shallow relu network expresses no non-zero functions in $L^p(\mathbb{R}^n)$ for $n\ge 2$. Some authors, on the other hand, have shown that a deep relu network is a universal approximator in $L^1(\mathbb{R}^n)$. Together we obtain a qualitative viewpoint which justifies the benefit of depth in the context of relu networks.

Index terms— Universal approximation theorem, unbounded domain, neural networks, sigmoid, relu, elu, softplus, leaky relu, tail risk, benefit of depth

## 1 Introduction

The universal approximation theorem in the mathematical theory of artificial neural networks was established by Cybenko [4], with various versions and different proofs contributed by Hornik-Stinchcombe-White [6] and Funahashi [5]. This theory treats the approximation of continuous functions as well as integrable functions.

Classical universal approximation theorems are mainly focused on bounded domains. We list here some articles which have also explored the theory on certain unbounded domains. Some authors, such as [8] and [2], studied the approximation capabilities of neural networks in the space of continuous functions vanishing at infinity. Hornik [7] studied the approximation capabilities of neural networks in $L^p(\mu)$, with respect to a finite input space environment measure $\mu$. Regarding approximation on unbounded domains with respect to the Lebesgue measure, Lu et al. [10] showed that a deep narrow relu network is a universal approximator in $L^1(\mathbb{R}^n)$; Clevert et al. [1] proved that a deep relu network with a bounded number of hidden layers is a universal approximator in $L^p(\mathbb{R}^n)$; Qu and Wang [17] proved that an artificial neural network with a single hidden layer and logistic activation is a universal approximator in $L^1(\mathbb{R}\times[0,1]^n)$.

Qu and Wang [17] pointed out the following connection between the universal approximation theory on $\mathbb{R}\times[0,1]^n$ and the management of tail risks in option trading: experiments in [17] demonstrated that in the design of an option price learning model, a decision function that falls within the approximation capability of networks on $\mathbb{R}\times[0,1]^n$ yields faster learning and better generalization performance. This suggests that a further study of approximation capabilities of neural networks on unbounded domains is of value not only in theory but also in practical applications.

The main results of this paper are Theorems 4.7 and 5.7. We prove that if $p\in[1,\infty)$ and if the activation function is a monotone sigmoid, relu, elu, softplus or leaky relu, then the shallow neural network is a universal approximator in $L^p(\mathbb{R}\times[0,1]^n)$. We also prove that if $p\in[1,\infty)$ and if the activation function is a sigmoid, relu, elu, softplus or leaky relu, then the shallow neural network expresses no non-zero function in $L^p(\mathbb{R}^2)$. As a corollary, a shallow relu network expresses no non-zero function in $L^p(\mathbb{R}^n)$ for $n\ge 2$. In contrast, [10] and [1] showed that a deep relu network (indeed even a narrow one) is a universal approximator in $L^1(\mathbb{R}^n)$. Together we obtain a qualitative viewpoint which justifies the benefit of depth for relu neural networks.

The organization of this article is as follows. In Section 2 we prove Theorem 2.1, which demonstrates that the universal approximation capacity in $L^p(\mathbb{R})$ is equivalent to the universal approximation capacity in $L^p(\mathbb{R}\times[0,1]^n)$. In Section 3 we prove a result which will be used to prove that typical shallow networks are not integrable functions on $\mathbb{R}^2$ as well as (as a corollary) on $\mathbb{R}^n$. In Section 4 we discuss shallow networks with bounded, eventually monotone activation functions. In Section 5 we discuss shallow networks with relu and other popular unbounded activation functions.

## 2 Approximation capabilities of networks on $\mathbb{R}\times[0,1]^n$

Given any measurable function $\phi$ defined on $\mathbb{R}$, $\varrho\in\mathbb{R}$ and $y\in\mathbb{R}^n$, we define

$$\phi^{\tau_\varrho}:=x\in\mathbb{R}\mapsto\phi(x+\varrho),\qquad \phi^{\delta_y}:=x\in\mathbb{R}^n\mapsto\phi(\langle y,x\rangle),\qquad \phi^{\tau_\varrho\delta_y}:=(\phi^{\tau_\varrho})^{\delta_y}.$$

These are called ridge functions. The space of shallow networks on $\mathbb{R}^n$ with activation function $\phi$ is denoted by $\Sigma(\phi)$ and consists of all sums

$$\sum_{i=1}^k t_i\,\phi^{\tau_{\varrho_i}\delta_{y_i}},\qquad k\in\mathbb{N},\ y_i\in\mathbb{R}^n,\ t_i\in\mathbb{R},\ \varrho_i\in\mathbb{R}.$$
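Unwinding the definitions, $\phi^{\tau_{\varrho}\delta_{y}}(x)=\phi(\langle y,x\rangle+\varrho)$, so in the familiar neural-network notation an element of this space with $k=2$ reads

$$x\mapsto t_1\,\phi(\langle y_1,x\rangle+\varrho_1)+t_2\,\phi(\langle y_2,x\rangle+\varrho_2),$$

i.e. a one-hidden-layer network with input weights $y_i$, biases $\varrho_i$, output weights $t_i$ and no output bias.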

The closure of a subset $A$ of a metric space is denoted by $\overline{A}$. The canonical Lebesgue measure on $\mathbb{R}^{n+1}$ will be denoted by $\lambda_{n+1}$. The indicator function of a set $E$ will be denoted by $I_E$. Before proving Theorem 2.1 we recall some facts from harmonic analysis. If $f\in L^1(\mathbb{R}^n)$ then its Fourier transform $\hat f$ is a bounded continuous function with the integral form

$$\hat f(\xi)=\int_{x\in\mathbb{R}^n}e^{-2\pi i\langle x,\xi\rangle}f(x)\,dx,\qquad \xi\in\mathbb{R}^n. \tag{1}$$
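As a standard worked example of (1) (not from the text), take $n=1$ and $f=I_{[0,1]}$; then

$$\hat f(\xi)=\int_0^1 e^{-2\pi ix\xi}\,dx=\frac{1-e^{-2\pi i\xi}}{2\pi i\xi}\quad(\xi\neq0),\qquad \hat f(0)=1,$$

and $|\hat f(\xi)|=\big|\tfrac{\sin(\pi\xi)}{\pi\xi}\big|\le 1$, illustrating that $\hat f$ is bounded and continuous.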

Let $\mathcal{S}$ be the Schwartz space on $\mathbb{R}^n$ ([13]), which is a Fréchet space. Let $\mathcal{S}'$ be the space of tempered distributions, which is the dual of $\mathcal{S}$. The Fourier transform of $u\in\mathcal{S}'$ is defined by

$$\hat u(\varphi)=u(\hat\varphi),\qquad \varphi\in\mathcal{S}. \tag{2}$$

For $1\le q\le\infty$ there is a natural embedding $L^q(\mathbb{R}^n)\subset\mathcal{S}'$ ([13, p.135]). If $f\in L^1(\mathbb{R}^n)$ then its Fourier transform (2) agrees with (1). To make Cybenko's strategy work in our case we need to show that a certain $u\in\mathcal{S}'$ is zero. Because the Fourier transform is an isomorphism of $\mathcal{S}'$ ([13, Theorem IX.2]), it suffices to show that $\hat u=0$.

###### Theorem 2.1.

Let $n\in\mathbb{N}$, $p\in[1,\infty)$ and $q=p/(p-1)$. If a measurable function $\sigma$ defined on $\mathbb{R}$ satisfies that $\operatorname{span}\{\sigma^{\tau_\varrho}:\varrho\in\mathbb{R}\}\cap L^p(\mathbb{R})$ is dense in $L^p(\mathbb{R})$, then $\Sigma(\sigma)\cap L^p(\Omega)$ is dense in $L^p(\Omega)$, where $\Omega=\mathbb{R}\times[0,1]^n$.

###### Proof.

Write $\Omega=\mathbb{R}\times[0,1]^n$. Suppose the hypothesis holds and our theorem is not true. By the Hahn–Banach theorem there exists a nonzero $u\in(L^p(\Omega))'$ and a nonzero real valued $h\in L^q(\Omega)$ representing $u$ such that if $\varphi\in\Sigma(\sigma)\cap L^p(\Omega)$ then

$$u(\varphi)=\int_\Omega\varphi(x)h(x)\,dx=0, \tag{3}$$

and we have $\|u\|=\|h\|_{L^q(\Omega)}$.

If $\gamma\in L^p(\mathbb{R})$ and if $y_0\neq 0$, then we show $\gamma^{\delta_y}\in L^p(\Omega)$. Indeed by a change of variables we have

$$\begin{aligned}
\|\gamma^{\delta_y}\|_{L^p(\Omega)}^p
&=\int_{\mathbb{R}\times[0,1]^n}|\gamma(\langle y,x\rangle)|^p\,dx_0\,dx_1\dots dx_n\\
&=|y_0|^{-1}\int_{\mathbb{R}\times[0,1]^n}|\gamma(t_0)|^p\,dt_0\,dt_1\dots dt_n\\
&=|y_0|^{-1}\|\gamma\|_{L^p(\mathbb{R})}^p.
\end{aligned}$$
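The scaling identity above can be checked numerically. The following sketch is an illustration only, not part of the proof; the Gaussian $\gamma$, the exponent $p$ and the vector $y$ are arbitrary test choices.

```python
import numpy as np

# Numerical sanity check of the identity
#   ||gamma^{delta_y}||_{L^p(Omega)}^p = |y_0|^{-1} ||gamma||_{L^p(R)}^p
# on Omega = R x [0,1] (n = 1), for an arbitrary test choice of gamma, p, y.

p = 2.0
y0, y1 = 2.0, 0.7                      # y = (y0, y1) with y0 != 0

def gamma(t):
    return np.exp(-t ** 2)             # decays fast, so truncating R is safe

def weights(x):
    """Trapezoid quadrature weights for a uniform grid x."""
    h = x[1] - x[0]
    w = np.full(x.size, h)
    w[0] = w[-1] = h / 2
    return w

x0 = np.linspace(-20.0, 20.0, 4001)    # truncation of the R factor
x1 = np.linspace(0.0, 1.0, 401)        # the [0,1] factor
X0, X1 = np.meshgrid(x0, x1, indexing="ij")

integrand = np.abs(gamma(y0 * X0 + y1 * X1)) ** p
lhs = weights(x0) @ (integrand @ weights(x1))          # ||gamma^{delta_y}||^p

t = np.linspace(-25.0, 25.0, 8001)
rhs = (weights(t) @ np.abs(gamma(t)) ** p) / abs(y0)   # |y0|^{-1}||gamma||^p

assert abs(lhs - rhs) < 1e-4
```

The two quadratures agree up to truncation and discretization error, matching the change-of-variables computation.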

Moreover by definition one can check that if $\gamma\in\operatorname{span}\{\sigma^{\tau_\varrho}:\varrho\in\mathbb{R}\}$ then $\gamma^{\delta_y}\in\Sigma(\sigma)$. Put together, if $\gamma\in\operatorname{span}\{\sigma^{\tau_\varrho}:\varrho\in\mathbb{R}\}\cap L^p(\mathbb{R})$ and if $y_0\neq 0$ then $\gamma^{\delta_y}\in\Sigma(\sigma)\cap L^p(\Omega)$, and then by (3) $u(\gamma^{\delta_y})=0$. Next we prove a stronger fact: if $\gamma\in L^p(\mathbb{R})$ and $y_0\neq 0$ then

$$u(\gamma^{\delta_y})=\int_\Omega\gamma^{\delta_y}(x)h(x)\,dx=0. \tag{4}$$

Because $\operatorname{span}\{\sigma^{\tau_\varrho}:\varrho\in\mathbb{R}\}\cap L^p(\mathbb{R})$ is dense in $L^p(\mathbb{R})$, for any $\epsilon>0$ there exists $\gamma_{y_0,\epsilon}\in\operatorname{span}\{\sigma^{\tau_\varrho}:\varrho\in\mathbb{R}\}\cap L^p(\mathbb{R})$ such that $\|\gamma-\gamma_{y_0,\epsilon}\|_{L^p(\mathbb{R})}\le|y_0|^{1/p}\,\|u\|^{-1}\epsilon$. From the preceding paragraph we have $u(\gamma_{y_0,\epsilon}^{\delta_y})=0$ and furthermore

$$\begin{aligned}
\big|u(\gamma^{\delta_y})\big|
&=\big|u(\gamma^{\delta_y}-\gamma_{y_0,\epsilon}^{\delta_y})+u(\gamma_{y_0,\epsilon}^{\delta_y})\big|
=\big|u(\gamma^{\delta_y}-\gamma_{y_0,\epsilon}^{\delta_y})\big|\\
&\le\|u\|\cdot\|(\gamma-\gamma_{y_0,\epsilon})^{\delta_y}\|_{L^p(\Omega)}
=\|u\|\cdot|y_0|^{-1/p}\|\gamma-\gamma_{y_0,\epsilon}\|_{L^p(\mathbb{R})}
\le\epsilon.
\end{aligned}$$

The above inequality holds for all $\epsilon>0$ and proves (4). For $k\in\mathbb{N}$ write $\Omega_k=[-k,k]\times[0,1]^n$. We identify $u$ as an element of $\mathcal{S}'(\mathbb{R}^{n+1})$ by assigning $u(\varphi)=\int_\Omega\varphi(x)h(x)\,dx$ for $\varphi\in\mathcal{S}$, and we define elements $u_k$ of $\mathcal{S}'(\mathbb{R}^{n+1})$ by setting $u_k(\varphi)=\int_{\Omega_k}\varphi(x)h(x)\,dx$. With respect to the topology of $\mathcal{S}'(\mathbb{R}^{n+1})$ we have $\lim_{k\to\infty}u_k=u$ and therefore

$$\lim_{k\to\infty}\hat u_k=\hat u.$$

By setting $h_k:=h\,I_{\Omega_k}$ we have $h_k\in L^1(\mathbb{R}^{n+1})\cap L^q(\mathbb{R}^{n+1})$ and

$$u_k(\varphi)=\int_{\mathbb{R}^{n+1}}\varphi(x)h_k(x)\,dx.$$

The Fourier transform of the integrable function $h_k$ is represented by the integral form

$$\hat h_k(\xi)=\int_{\Omega_k}e^{-2\pi i\langle x,\xi\rangle}h(x)\,dx. \tag{5}$$

As $u_k$ is represented by $h_k$, $\hat u_k$ is also represented by $\hat h_k$. Hence if $\varphi\in\mathcal{S}$ then

$$\hat u_k(\varphi)=\int_{\mathbb{R}^{n+1}}\varphi(\xi)\hat h_k(\xi)\,d\xi=\int_{\mathbb{R}^{n+1}}\varphi(\xi)\int_{\Omega_k}e^{-2\pi i\langle x,\xi\rangle}h(x)\,dx\,d\xi.$$

Let $K$ be any compact set in $\{z\in\mathbb{R}^{n+1}:z_0\neq 0\}$, and for $A\subset\mathbb{R}^{n+1}$ let $\langle z,A\rangle:=\{\langle z,x\rangle:x\in A\}$ and $\langle A,K\rangle:=\{\langle z,x\rangle:z\in K,\ x\in A\}$. There is a constant $C$ such that $\sup_{z\in K}|z|$ is bounded from above by $C$. Let $c:=\inf_{z\in K}|z_0|>0$. If $z\in K$ and $x\in\{0\}\times[0,1]^n$, then $|\langle z,x\rangle|\le C\sqrt n$ and therefore $\langle\{0\}\times[0,1]^n,K\rangle\subset[-C\sqrt n,C\sqrt n]$. Similarly if $z\in K$, $k\in\mathbb{N}$ and $x\in\{\pm k\}\times[0,1]^n$ then $|\langle z,x\rangle|\ge kc-C\sqrt n$. In particular we have proved that if $k>2c^{-1}C\sqrt n$ then

$$\langle\{0\}\times[0,1]^n,K\rangle\cap\langle\{\pm k\}\times[0,1]^n,K\rangle=\emptyset. \tag{6}$$
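The separation (6) can also be observed numerically. The sketch below is an illustration only, not part of the proof; the sampled bounds for $K$ and the value of $k$ are arbitrary test choices consistent with $z_0$ bounded away from $0$.

```python
import numpy as np

# Numerical illustration of the separation (6) for n = 1.  K is a sampled
# compact set in {z in R^2 : z_0 != 0}; <A, K> denotes the set of inner
# products <z, x> with z in K and x in A.

rng = np.random.default_rng(0)
z0 = rng.uniform(0.5, 2.0, 200)          # z_0 bounded away from 0
z1 = rng.uniform(-3.0, 3.0, 200)         # z_1 in a bounded range
x1 = np.linspace(0.0, 1.0, 101)          # the [0,1] factor of Omega

def value_interval(x0):
    """Range of <z, (x0, t)> over sampled z in K and t in [0, 1]."""
    vals = z0[:, None] * x0 + z1[:, None] * x1[None, :]
    return vals.min(), vals.max()

k = 20                                   # large enough for the sampled bounds
lo0, hi0 = value_interval(0.0)           # <{0}  x [0,1], K>, within [-3, 3]
lo_p, hi_p = value_interval(float(k))    # <{+k} x [0,1], K>, within [7, 43]
lo_m, hi_m = value_interval(-float(k))   # <{-k} x [0,1], K>, within [-43, -7]

# (6): the middle value set is disjoint from both far ones
assert hi0 < lo_p and hi_m < lo0
```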

For any $z\in K$ let $X^\pm_{z,k}$ respectively $\overline X^\pm_{z,k}$ consist of all points $x\in\Omega_k$ respectively $x\in\Omega$ such that there exists $w\in\{\pm k\}\times[0,1]^n$ satisfying $\langle z,x\rangle=\langle z,w\rangle$:

$$X^\pm_{z,k}=\{x\in\Omega_k:\langle z,x\rangle\in\langle z,\{\pm k\}\times[0,1]^n\rangle\},\qquad \overline X^\pm_{z,k}=\{x\in\Omega:\langle z,x\rangle\in\langle z,\{\pm k\}\times[0,1]^n\rangle\}.$$

Because $\overline X^\pm_{z,k}$ are connected, (6) implies that if $z\in K$ and $k$ is large enough that (6) holds, then

$$X^\pm_{z,k}\subset\overline X^\pm_{z,k}\subset\mathbb{R}_\pm\times\mathbb{R}^n. \tag{7}$$

Let $k_1$ and $k_2$ be integers large enough that (6) holds; we want to show that

$$X^\pm_{z,k_2}=(\pm(k_2-k_1),0)+X^\pm_{z,k_1}. \tag{8}$$

Let $x\in X^+_{z,k_1}$; we have $x\in\Omega_{k_1}$ and $\langle z,x\rangle\in\langle z,\{k_1\}\times[0,1]^n\rangle$. It is obvious that $\langle z,x+((k_2-k_1),0)\rangle=\langle z,x\rangle+(k_2-k_1)z_0$ and $\langle z,\{k_2\}\times[0,1]^n\rangle=(k_2-k_1)z_0+\langle z,\{k_1\}\times[0,1]^n\rangle$, which implies $\langle z,x+((k_2-k_1),0)\rangle\in\langle z,\{k_2\}\times[0,1]^n\rangle$. By (7), we have $x_0>0$ and therefore $x+((k_2-k_1),0)\in\Omega_{k_2}$. We have proved that $((k_2-k_1),0)+X^+_{z,k_1}\subset X^+_{z,k_2}$. The same argument proves also $(-(k_2-k_1),0)+X^+_{z,k_2}\subset X^+_{z,k_1}$, and consequently $X^+_{z,k_2}=((k_2-k_1),0)+X^+_{z,k_1}$. Similar arguments prove the other part of (8). By (7) and by (8), the number

$$\nu_K:=\lambda_{n+1}\Big(\bigcup_{z\in K}X^+_{z,k}\Big)+\lambda_{n+1}\Big(\bigcup_{z\in K}X^-_{z,k}\Big)$$

is finite and is independent of the choice of $k$. Moreover (8) leads to

$$\bigcup_{z\in K}X^\pm_{z,k_2}=(\pm(k_2-k_1),0)+\bigcup_{z\in K}X^\pm_{z,k_1}, \tag{9}$$

which implies that $\bigcup_{z\in K}X^\pm_{z,k}$ moves to infinity as $k$ goes to infinity. Write

$$X^\dagger_{z,k}=X^+_{z,k}\cup X^-_{z,k},\qquad Y^\dagger_{z,k}=\Omega_k\setminus X^\dagger_{z,k}.$$

We want to prove

$$\omega\in\Omega\ \text{and}\ \langle z,\omega\rangle\in\langle z,Y^\dagger_{z,k}\rangle\iff\omega\in Y^\dagger_{z,k}. \tag{10}$$

By definition the $\Leftarrow$ part of (10) is true. Now assume that $\omega\in\Omega$ and $\langle z,\omega\rangle\in\langle z,Y^\dagger_{z,k}\rangle$. By assumption there exists $w_{in}\in Y^\dagger_{z,k}$ such that $\langle z,\omega\rangle=\langle z,w_{in}\rangle$. Suppose $\omega\notin\Omega_k$; then the segment from $\omega$ to $w_{in}$ intersects the boundary of $\Omega_k$, as

$$\{t\omega+(1-t)w_{in}:t\in[0,1]\}\cap\big(\{\pm k\}\times[0,1]^n\big)\neq\emptyset.$$

Let $w_b$ be the intersection point. We have $\langle z,w_b\rangle=\langle z,\omega\rangle=\langle z,w_{in}\rangle$. This together with the definition of $X^\dagger_{z,k}$ gives $w_{in}\in X^\dagger_{z,k}$, contradicting $w_{in}\in Y^\dagger_{z,k}$. Therefore $\omega\notin\Omega_k$ is not true, and instead $\omega\in\Omega_k$. Suppose $\omega\in X^\dagger_{z,k}$; then there exists $w'\in\{\pm k\}\times[0,1]^n$ such that $\langle z,w_{in}\rangle=\langle z,\omega\rangle=\langle z,w'\rangle$, which in the same way contradicts $w_{in}\in Y^\dagger_{z,k}$. Therefore $\omega\in X^\dagger_{z,k}$ is not true, and instead $\omega\in\Omega_k\setminus X^\dagger_{z,k}=Y^\dagger_{z,k}$. Putting together we have proved that if $\omega$ satisfies the left hand side of (10) then $\omega\in Y^\dagger_{z,k}$. Finally we proved (10), which leads to

$$I_{Y^\dagger_{z,k}}=I_{\{\omega:\ \omega\in\Omega,\ \langle z,\omega\rangle\in\langle z,Y^\dagger_{z,k}\rangle\}}. \tag{11}$$

For $\alpha\in C_c(\mathbb{R})$, we define $\alpha_k:=\alpha\,I_{\langle z,Y^\dagger_{z,k}\rangle}$. Because $\alpha_k$ is bounded and with bounded support, $\alpha_k\in L^p(\mathbb{R})$. By the definition of $h_k$ and by (11), for $z\in K$,

$$\begin{aligned}
\int_{Y^\dagger_{z,k}}\alpha(\langle z,x\rangle)h_k(x)\,dx
&=\int_\Omega\alpha(\langle z,x\rangle)h(x)I_{\Omega_k}(x)I_{Y^\dagger_{z,k}}(x)\,dx\\
&=\int_\Omega\alpha(\langle z,x\rangle)h(x)I_{Y^\dagger_{z,k}}(x)\,dx\\
&=\int_\Omega\alpha(\langle z,x\rangle)h(x)I_{\{\omega:\ \omega\in\Omega,\ \langle z,\omega\rangle\in\langle z,Y^\dagger_{z,k}\rangle\}}(x)\,dx\\
&=\int_\Omega\alpha(\langle z,x\rangle)I_{\{\omega:\ \langle z,\omega\rangle\in\langle z,Y^\dagger_{z,k}\rangle\}}(x)\big(h(x)I_\Omega(x)\big)\,dx\\
&=\int_\Omega\alpha_k(\langle z,x\rangle)h(x)\,dx.
\end{aligned}$$

Therefore for any $\alpha\in C_c(\mathbb{R})$ and $z\in K$ we have

$$\begin{aligned}
\int_{\mathbb{R}^{n+1}}\alpha(\langle z,x\rangle)h_k(x)\,dx
&=\int_{X^\dagger_{z,k}}\alpha(\langle z,x\rangle)h_k(x)\,dx+\int_{Y^\dagger_{z,k}}\alpha(\langle z,x\rangle)h_k(x)\,dx\\
&=\int_{X^\dagger_{z,k}}\alpha(\langle z,x\rangle)h_k(x)\,dx+\int_\Omega\alpha_k(\langle z,x\rangle)h(x)\,dx.
\end{aligned}$$

Because $\alpha_k\in L^p(\mathbb{R})$ and $z_0\neq 0$, by (4) we have $\int_\Omega\alpha_k(\langle z,x\rangle)h(x)\,dx=0$ and therefore

$$\int_{\mathbb{R}^{n+1}}\alpha(\langle z,x\rangle)h_k(x)\,dx=\int_{X^\dagger_{z,k}}\alpha(\langle z,x\rangle)h_k(x)\,dx.$$

Take $\alpha\in C_c(\mathbb{R})$ with $|\alpha|\le 1$ and $\alpha(t)=e^{-2\pi it}$ on the bounded set $\langle z,\Omega_k\rangle$; then by (5) and the above equality,

$$\begin{aligned}
\big|\hat h_k(z)\big|
&=\Big|\int_{\mathbb{R}^{n+1}}\alpha(\langle z,x\rangle)h_k(x)\,dx\Big|
=\Big|\int_{X^\dagger_{z,k}}\alpha(\langle z,x\rangle)h_k(x)\,dx\Big|\\
&\le\|\alpha^{\delta_z}\|_{L^p(X^\dagger_{z,k})}\,\|h_k\|_{L^q(X^\dagger_{z,k})}
\le\nu_K^{1/p}\,\|h_k\|_{L^q(X^\dagger_{z,k})}.
\end{aligned} \tag{12}$$

Let $\psi\in\mathcal{S}$ with $\operatorname{supp}\psi\subset K$, where $K$ is a compact subset of $\{z\in\mathbb{R}^{n+1}:z_0>0\}$. By (9), for any $\epsilon>0$ there exists $N$ such that if $k>N$ then

$$\|h_k\|_{L^q(\bigcup_{z\in K}X^\dagger_{z,k})}<\|\psi\|_{L^\infty(\mathbb{R}^{n+1})}^{-1}\,\lambda_{n+1}^{-1}(K)\,\nu_K^{-1/p}\,\epsilon,$$

and for all such $k$ and all $z\in K$, using (12) and the fact that $\|h_k\|_{L^q(X^\dagger_{z,k})}\le\|h_k\|_{L^q(\bigcup_{z\in K}X^\dagger_{z,k})}$, we have $|\hat h_k(z)|\le\|\psi\|_{L^\infty(\mathbb{R}^{n+1})}^{-1}\lambda_{n+1}^{-1}(K)\,\epsilon$. For all such $k$ we then have

$$\begin{aligned}
|\hat u_k(\psi)|
&=\Big|\int_{\mathbb{R}^{n+1}}\psi(z)\hat h_k(z)\,dz\Big|
=\Big|\int_K\psi(z)\hat h_k(z)\,dz\Big|\\
&\le\lambda_{n+1}(K)\,\|\psi\|_{L^\infty(\mathbb{R}^{n+1})}\cdot\|\psi\|_{L^\infty(\mathbb{R}^{n+1})}^{-1}\lambda_{n+1}^{-1}(K)\,\epsilon
=\epsilon.
\end{aligned}$$

Consequently for all $\psi\in\mathcal{S}$ with compact support contained in $\{z\in\mathbb{R}^{n+1}:z_0>0\}$ we have

$$\hat u(\psi)=\lim_{k\to\infty}\hat u_k(\psi)=0.$$

By similar arguments applied to compact subsets of $\{z\in\mathbb{R}^{n+1}:z_0<0\}$, for all $\psi\in\mathcal{S}$ with compact support contained in $\{z\in\mathbb{R}^{n+1}:z_0<0\}$ we have

$$\hat u(\psi)=\lim_{k\to\infty}\hat u_k(\psi)=0.$$

This implies that $\hat u$ is supported on the hyperplane $\{z\in\mathbb{R}^{n+1}:z_0=0\}$, which has $\lambda_{n+1}$-measure zero. As $\hat u$ is represented by some function in $L^1_{loc}(\mathbb{R}^{n+1})$, $\hat u=0$ and therefore $u=0$, which contradicts the fact that $u$ is non-zero. ∎

## 3 Inexpressivity of sums of ridge functions

It is well known that “The graph of ridge function is a ruled surface, and it is this ruled nature that make ridge functions difficult to use in any standard way in approximation theory” ([9]). For instance [9, Proposition 1.1] shows that the space $L^p(\mathbb{R}^2)$ contains no ridge function except 0. In this section we shall prove a stronger result concerning finite sums of ridge functions.

###### Proposition 3.1.

Let $n\in\mathbb{N}$ and $F=\sum_{k=1}^n F_k^{\delta_{y_k}}$ with nonzero $y_1,\dots,y_n\in\mathbb{R}^2$. Suppose the real valued functions $F$ defined on $\mathbb{R}^2$ and $F_1,\dots,F_n$ defined on $\mathbb{R}$ satisfy

1. For all $k$ and $\omega\in\{-\infty,+\infty\}$, there exist real numbers $\alpha_{k,\omega}$ and $\beta_{k,\omega}$ such that

 $$\lim_{t\to\omega}(F_k(t)-\beta_{k,\omega}t)=\alpha_{k,\omega}.$$
2. $F$ is uniformly continuous.

3. $F\in L^p(\mathbb{R}^2)$ for some $p\in[1,\infty)$.

Then we have $F=0$.

###### Proof.

Throughout the proof, $e^{i\theta}$ refers to the point $(\cos\theta,\sin\theta)$ in $\mathbb{R}^2$. Because

$$G^{\delta_{\rho e^{i\theta}}}+H^{\delta_{se^{i\theta}}}=(G^{\delta_\rho}+H^{\delta_s})^{\delta_{e^{i\theta}}},\qquad G^{\delta_{e^{i(\theta+\pi)}}}=(G^{\delta_{-1}})^{\delta_{e^{i\theta}}},$$

without loss of generality we can assume

$$F=\sum_{k=1}^n F_k^{\delta_{e^{i\theta_k}}}, \tag{13}$$

where $\theta_1,\dots,\theta_n$ are different numbers in $[0,\pi)$. Condition 1 in the assumption implies that this condition holds for the new functions $F_k$ as well, i.e. for all $k$ and $\omega\in\{-\infty,+\infty\}$, there exist real numbers $\alpha_{k,\omega}$ and $\beta_{k,\omega}$ such that

$$\lim_{t\to\omega}(F_k(t)-\beta_{k,\omega}t)=\alpha_{k,\omega}.$$

We claim that $F_1,\dots,F_n$ are all linear functions. If this is not true then pick $j$ for which $F_j$ is not linear. By nonlinearity, there exists $x^\flat\in\mathbb{R}^2$ such that

$$F_j^{\delta_{e^{i\theta_j}}}(x^\flat)\neq\alpha_{j,\infty}+\beta_{j,\infty}(x^\flat_1\cos\theta_j+x^\flat_2\sin\theta_j). \tag{14}$$

The function $\overline F:=x\mapsto F(x+x^\flat)$ satisfies:

1. $\overline F\in L^p(\mathbb{R}^2)$ and is uniformly continuous.

2. There exist functions $\overline F_1,\dots,\overline F_n$ such that $\overline F=\sum_{k=1}^n\overline F_k^{\delta_{e^{i\theta_k}}}$ and that for all $k$ and $\omega\in\{-\infty,+\infty\}$ there are real numbers $\overline\alpha_{k,\omega}$ satisfying

 $$\lim_{t\to\omega}(\overline F_k(t)-\beta_{k,\omega}t)=\overline\alpha_{k,\omega}.$$
3. $\overline F_j(0)\neq\overline\alpha_{j,\infty}$.

Fact 1 follows from the corresponding properties of $F$. Fact 2 follows from

$$\overline F(x)=\sum_{k=1}^n F_k\big((x_1+x^\flat_1)\cos\theta_k+(x_2+x^\flat_2)\sin\theta_k\big)=\sum_{k=1}^n\overline F_k^{\delta_{e^{i\theta_k}}}(x),$$

where $\overline F_k(t):=F_k\big(t+x^\flat_1\cos\theta_k+x^\flat_2\sin\theta_k\big)$, and the following calculation:

$$\begin{aligned}
\lim_{t\to\infty}(\overline F_k(t)-\beta_{k,\infty}t)
&=\lim_{t\to\infty}\big(F_k(t+x^\flat_1\cos\theta_k+x^\flat_2\sin\theta_k)-\beta_{k,\infty}t\big)\\
&=\lim_{t\to\infty}(F_k(t)-\beta_{k,\infty}t)+\beta_{k,\infty}(x^\flat_1\cos\theta_k+x^\flat_2\sin\theta_k)\\
&=\alpha_{k,\infty}+\beta_{k,\infty}(x^\flat_1\cos\theta_k+x^\flat_2\sin\theta_k).
\end{aligned}$$

Fact 3 follows from the following and (14):

$$\overline\alpha_{j,\infty}=\alpha_{j,\infty}+\beta_{j,\infty}(x^\flat_1\cos\theta_j+x^\flat_2\sin\theta_j),\qquad \overline F_j(0)=F_j^{\delta_{e^{i\theta_j}}}(x^\flat).$$

Next we prove that for all $\theta\in[0,2\pi)$,

$$\lim_{r\to\infty}\overline F(re^{i\theta})=0. \tag{15}$$

By Fact 2, the infinity behavior of $\overline F(re^{i\theta})$ as $r\to\infty$ is the same as that of a linear function in $r$. Therefore $\lim_{r\to\infty}\overline F(re^{i\theta})$ exists in $[-\infty,+\infty]$. If this limit is not zero then there exist $c>0$ and $r_0>0$ such that if $r>r_0$ then $|\overline F(re^{i\theta})|>c$. As $\overline F$ is uniformly continuous there exists $\delta>0$ such that if $|x-y|\le\delta$ then