Representation Power of Feedforward Neural Networks · 2020. 8. 29. · Representation Power of...

Representation Power of Feedforward NeuralNetworks

Based on work by Barron (1993), Cybenko (1989),Kolmogorov (1957)

Matus Telgarsky

Feedforward Neural Networks

I Two node types:I Linear combinations:

x 7→∑i

wixi + w0.

I Sigmoid thresholded linear combinations:

x 7→ σ (〈w, x〉+ w0) .

I What can a network of these nodes represent?n∑i=1

wixi one layer,

n∑i=1

wiσ

ni∑j=1

wjixj + wj0

two layers,...

...

Forget about 1 layer.

I Target set [0, 3]; target function 1[x ∈ [1, 2]].

I Standard sigmoid σs(x) := 1/(1 + e−x).

I Consider sigmoid output at x = 2.

I w ≥ 0 and σ(2w + w0) ≥ 1/2: mess up on right side.

0 3

1

0.5

I w ≥ 0 and σ(2w + w0) < 1/2: mess up on middle bump.

0 3

1

0.5

I Can symmetrize (w < 0); no matter what, error ≥ 1/2.


I Target set [0, 3]; target function 1[x ∈ [1, 2]].I Standard sigmoid σs(x) := 1/(1 + e

−x).

I Consider sigmoid output at x = 2.

I w ≥ 0 and σ(2w + w0) ≥ 1/2: mess up on right side.

0 3

1

0.5


0 3

1

0.5



I Target set [0, 3]; target function 1[x ∈ [1, 2]].I Standard sigmoid σs(x) := 1/(1 + e

−x).

I Consider sigmoid output at x = 2.I w ≥ 0 and σ(2w + w0) ≥ 1/2: mess up on right side.

0 3

1

0.5


0 3

1

0.5


Meaning of “Universal Approximation”

Target set [0, 1]n; target function f ∈ C([0, 1]n).

I For any � > 0, exists NN f̂ ,

x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.

I This gives NNs f̂i → f pointwise.

I For any � > 0, exists NN f̂ and S ⊆ [0, 1]n, m(S) ≥ 1− �,

x ∈ S =⇒ |f(x)− f̂(x)| < �.

I If (for instance) bounded on Sc, gives NNs f̂i → f m-a.e..

Goal: 2-NNs approximate continuous functions over [0, 1]n.


Target set [0, 1]n; target function f ∈ C([0, 1]n).I For any � > 0, exists NN f̂ ,

x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.

I This gives NNs f̂i → f pointwise.I For any � > 0, exists NN f̂ and S ⊆ [0, 1]n, m(S) ≥ 1− �,

x ∈ S =⇒ |f(x)− f̂(x)| < �.





x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.

I This gives NNs f̂i → f pointwise.

I For any � > 0, exists NN f̂ and S ⊆ [0, 1]n, m(S) ≥ 1− �,

x ∈ S =⇒ |f(x)− f̂(x)| < �.





x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.


x ∈ S =⇒ |f(x)− f̂(x)| < �.

I If (for instance) bounded on Sc, gives NNs f̂i → f m-a.e..Goal: 2-NNs approximate continuous functions over [0, 1]n.



x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.


x ∈ S =⇒ |f(x)− f̂(x)| < �.





x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.


x ∈ S =⇒ |f(x)− f̂(x)| < �.

I If (for instance) bounded on Sc, gives NNs f̂i → f m-a.e..Goal: 2-NNs approximate continuous functions over [0, 1]n.

Outline

I 2-nn via functional analysis (Cybenko, 1989).

I 2-nn via greedy approx (Barron, 1993).

I 3-nn via histograms (Folklore).

I 3-nn via wizardry (Kolmogorov, 1957).

Overview of Functional Analysis proof (Cybenko, 1989)

I Hidden layer as a basis:

B := {σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R} .

I Want to show cl(span(B)) = C([0, 1]n).I Work via contradiction: if f ∈ C([0, 1]n) far from

cl(span(B)), can bridge the gap with a sigmoid.



B := {σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R} .

I Want to show cl(span(B)) = C([0, 1]n).

I Work via contradiction: if f ∈ C([0, 1]n) far fromcl(span(B)), can bridge the gap with a sigmoid.



B := {σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R} .

I Want to show cl(span(B)) = C([0, 1]n).I Work via contradiction: if f ∈ C([0, 1]n) far from

cl(span(B)), can bridge the gap with a sigmoid.

Abstracting σ

I Cybenko needs σ discriminates:

µ = 0 ⇐⇒ ∀w,w0 �∫σ(〈w, x〉+ w0)dµ(x) = 0.

I Satisfied for the standard choices

σs(x) =1

1 + e−x,

1

2(tanh(x) + 1) =

1

2

(ex − e−x

ex + e−x+ 1

)= σs(2x).

I Most results today only need σ approximates 1[x ≥ 0]:

σ(x)→

{1 as x→ +∞,0 as x→ −∞.

Combined with σ bounded&measurable givesdiscriminatory (Cybenko, 1989, Lemma 1).

Proof of Cybenko (1989)

I Consider the

closed

subspace

S :=

cl

(

span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

)

.

I Suppose (contradictorily) exists f ∈ C([0, 1]n) \ S.

I Exists bounded linear L 6= 0 with L|S = 0.I Exists µ 6= 0 so that L(h) =

∫h(x)dµ(x).

I Contradiction: σ is discriminatory.

I L|S = 0 implies ∀w,w0 �∫σ(〈w, x〉+ w0)dµ(x) = 0.

I But “discriminatory” means µ = 0 ⇐⇒ ∀w,w0 �∫. . ..


I Consider the closed subspace

S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).

I Suppose (contradictorily) exists f ∈ C([0, 1]n) \ S.

I Exists bounded linear L 6= 0 with L|S = 0.I Exists µ 6= 0 so that L(h) =

∫h(x)dµ(x).






S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).

I Suppose (contradictorily) exists f ∈ C([0, 1]n) \ S.I Exists bounded linear L 6= 0 with L|S = 0.

I Exists µ 6= 0 so that L(h) =∫h(x)dµ(x).






S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).


I Can think of projecting f onto S⊥.

I Don’t need inner products: Hahn-Banach Theorem.







S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).


I Can think of projecting f onto S⊥.I Don’t need inner products: Hahn-Banach Theorem.







S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).








S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).

I Suppose (contradictorily) exists f ∈ C([0, 1]n) \ S.I Exists bounded linear L 6= 0 with L|S = 0.I Exists µ 6= 0 so that L(h) =

∫h(x)dµ(x).






S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).


∫h(x)dµ(x).

I In Rn, linear functions have form 〈·, y〉 for some y ∈ Rn.

I With infinite dimensions, integrals give inner products.I A form of the Riesz Representation Theorem.






S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).


∫h(x)dµ(x).

I In Rn, linear functions have form 〈·, y〉 for some y ∈ Rn.I With infinite dimensions, integrals give inner products.

I A form of the Riesz Representation Theorem.






S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).


∫h(x)dµ(x).

I In Rn, linear functions have form 〈·, y〉 for some y ∈ Rn.I With infinite dimensions, integrals give inner products.I A form of the Riesz Representation Theorem.






S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).


∫h(x)dµ(x).






S := cl

(span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})

).


∫h(x)dµ(x).

I Contradiction: σ is discriminatory.I L|S = 0 implies ∀w,w0 �

∫σ(〈w, x〉+ w0)dµ(x) = 0.


What was Cybenko’s proof about?

I Set S := cl (span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})).

I Another way to look at it.

I Given f ∈ C([0, 1]n), current approximant f̂ ∈ S.I The residue f − f̂ must have nonzero projection onto S.I Add residue projection into f̂ ; does this now match f?

I Can this be turned into an algorithm?


I Set S := cl (span ({σ(〈w, x〉+ w0) : w ∈ Rn, w0 ∈ R})).I Another way to look at it.





I Given f ∈ C([0, 1]n), current approximant f̂ ∈ S.

I The residue f − f̂ must have nonzero projection onto S.I Add residue projection into f̂ ; does this now match f?




I Given f ∈ C([0, 1]n), current approximant f̂ ∈ S.I The residue f − f̂ must have nonzero projection onto S.

I Add residue projection into f̂ ; does this now match f?


Outline





The method of Barron (1993, Section 8)

Input: basis set B,

1. Choose arbitrary f̂0 ∈ B.2. For t = 1, 2, . . .:

2.1 Choose (αt, gt) apx minimizing

infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.

2.2 Update f̂t := αtf̂t−1 + (1− αt)gt.

I Can rescale B so f ∈ cl(conv(B)). (or tweak alg.)I Is this familiar?

I Barron carefully shows rate ‖f − f̂t‖22 = O(1/t).


Input: basis set B, target f ∈ cl(conv(B)).

1. Choose arbitrary f̂0 ∈ B.2. For t = 1, 2, . . .:


infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.


I Can rescale B so f ∈ cl(conv(B)). (or tweak alg.)

I Is this familiar?



Input: basis set B, target f ∈ cl(conv(B)).1. Choose arbitrary f̂0 ∈ B.

2. For t = 1, 2, . . .:


infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.



I Is this familiar?



Input: basis set B, target f ∈ cl(conv(B)).1. Choose arbitrary f̂0 ∈ B.2. For t = 1, 2, . . .:


infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.



I Is this familiar?




2.1 Choose (αt, gt) apx minimizing (tolerance O(1/t2))

infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.



I Is this familiar?





infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.


I Can rescale B so f ∈ cl(conv(B)). (or tweak alg.)I Is this familiar?





infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.


I Can rescale B so f ∈ cl(conv(B)). (or tweak alg.)I Is this familiar? Oh let’s see..





infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.


I Can rescale B so f ∈ cl(conv(B)). (or tweak alg.)I Is this familiar? Oh let’s see.. gradient descent, coordinate

descent, Frank-Wolfe, projection pursuit, basis pursuit,boosting.. as usual, cf. Zhang (2003).





infα∈[0,1],g∈B

‖f − (αf̂t−1 + (1− α)g)‖22.


I Can rescale B so f ∈ cl(conv(B)). (or tweak alg.)I Is this familiar? YES.


Greedy approx as coordinate descent

I Intuition: linear operator A with basis B as “columns”:

f̂t = Awt, g = Aej .

I Split the update into two steps:

infg∈B‖f − (f̂t−1 + g)‖22 = inf

i‖f − (Awt−1 +Aei)‖22,

infα∈[0,1]

‖f − (αf̂t−1 + (1− α)gt)‖22.

I Suppose ‖g‖2 = 1 for all g ∈ B; first step equiv to

supg∈B

〈g, f̂t−1 − f

〉= sup

i(Aei)

>(Awt−1 − f).

I Second version is coordinate descent update!

I (Standard rate proofs use curvature of ‖ · ‖22.)

Story for 2-layer NNs so far

I Can L2 apx any f ∈ C([0, 1]n) at rate O(1/√t)..

I ..so why aren’t we using this algorithm for neural nets?

I This is not an easy optimization problem:

infα∈[0,1],w∈Rn,w0∈R

‖f − (αf̂t−1 + (1− α)σ(〈w, ·〉+ w0)‖22.

I For a general basis B, must check each basis element..

I From an algorithmic perspective, much remains to be done.

Story for 2-layer NNs so far

I Can L2 apx any f ∈ C([0, 1]n) at rate O(1/√t)..

I ..so why aren’t we using this algorithm for neural nets?I This is not an easy optimization problem:

infα∈[0,1],w∈Rn,w0∈R

‖f − (αf̂t−1 + (1− α)σ(〈w, ·〉+ w0)‖22.

I For a general basis B, must check each basis element..

I From an algorithmic perspective, much remains to be done.

Lower bound from Barron (1993, Section 10)

I The greedy algorithm gets to search whole basis.

I If an n-element basis Bn is fixed, then

supf∈cl(spanC(B))

inff̂∈span(Bn)

‖f − f̂‖2 = Ω(

1

n1/d

),

the curse of dimension!

I Can defeat this with more layers (cf. Kolmogorov (1957)).

Outline





Folklore proof of NN power

I What’s an easy way to write function as sum of others?

I Project onto favorite basis (Fourier, Chebyshev, ...)!

I Let’s use this to build NNs.

I Easiest basis: histograms!


I What’s an easy way to write function as sum of others?I Project onto favorite basis (Fourier, Chebyshev, ...)!

I Let’s use this to build NNs.

I Easiest basis: histograms!


I What’s an easy way to write function as sum of others?I Project onto favorite basis (Fourier, Chebyshev, ...)!

I Let’s use this to build NNs.I Easiest basis: histograms!

Histogram basis for C([0, 1]n)

I Grid [0, 1]n into {Ri}mi=1; consider

f̂(x) =

m∑i=1

ai1[x ∈ Ri],

where target f(x) = ai for some x ∈ Ri.

I Fact: for any � > 0, exists histogram f̂ with

x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.

I Pf. Continuous function over compact set is uniformlycontinuous.

I Now just write individual histogram bars as a NNs.

Histogram basis for C([0, 1]n)

I Grid [0, 1]n into {Ri}mi=1; consider

f̂(x) =

m∑i=1

ai1[x ∈ Ri],

where target f(x) = ai for some x ∈ Ri.I Fact: for any � > 0, exists histogram f̂ with

x ∈ [0, 1]n =⇒ |f(x)− f̂(x)| < �.

I Pf. Continuous function over compact set is uniformlycontinuous.

I Now just write individual histogram bars as a NNs.

Histograms via 0/1 NNs

I Write 1[x ∈ R] as sum/composition of 1[〈w, x〉 ≥ w0].

I 2-D example. First carve out axes.

2

1

1 2

3

2

3

2

2

3

4

3

I Now pass through a second layer 1[input ≥ 3.5].

1

0


I Write 1[x ∈ R] as sum/composition of 1[〈w, x〉 ≥ w0].I 2-D example. First carve out axes.

2

1

1 2

3

2

3

2

2

3

4

3


1

0



2

1

1

2

3

2

3

2

2

3

4

3


1

0



2

1

1 2

3

2

3

2

2

3

4

3


1

0

Histograms via sigmoidal NNs

I Write 1[x ∈ R] as sum/composition of σ(〈w, x〉+ w0).

I 2-D example. First carve out axes, with fuzz region.

>2-2ε

Histograms via sigmoidal NNs

I Write 1[x ∈ R] as sum/composition of σ(〈w, x〉+ w0).I 2-D example. First carve out axes, with fuzz region.

>2-2ε

Histograms apx via sigmoidal NNs, some math

I Histogram region is clearly 2-layer 0/1 NN:

1[∀i � xi ≥ li ∧ xi ≤ ui]

= 1

[n∑i=1

(1[xi ≥ li] + 1[xi ≤ ui]) ≥ (2n− 1)/2n

].

I Choose B huge; apx 1[〈w, x〉 ≥ w0] with σ(B(〈w, x〉 −w0)).I �/2-apx f with histogram {ai, Ri}mi=1;�/(2m)-apx each bar with sigmoids.

I Final NN f̂(x) :=∑m

i=1 ai(sigmoidal apx to 1[x ∈ Ri]).I Have fuzz S ⊆ [0, 1]n with m(S) ≥ 1− �,

x ∈ S =⇒ |f(x)− f̂(x)| < �.



1[∀i � xi ≥ li ∧ xi ≤ ui]

= 1

[n∑i=1

(1[xi ≥ li] + 1[xi ≤ ui]) ≥ (2n− 1)/2n

].

I Choose B huge; apx 1[〈w, x〉 ≥ w0] with σ(B(〈w, x〉 −w0)).

I �/2-apx f with histogram {ai, Ri}mi=1;�/(2m)-apx each bar with sigmoids.



x ∈ S =⇒ |f(x)− f̂(x)| < �.



1[∀i � xi ≥ li ∧ xi ≤ ui]

= 1

[n∑i=1

(1[xi ≥ li] + 1[xi ≤ ui]) ≥ (2n− 1)/2n

].




x ∈ S =⇒ |f(x)− f̂(x)| < �.



1[∀i � xi ≥ li ∧ xi ≤ ui]

= 1

[n∑i=1

(1[xi ≥ li] + 1[xi ≤ ui]) ≥ (2n− 1)/2n

].



i=1 ai(sigmoidal apx to 1[x ∈ Ri]).

I Have fuzz S ⊆ [0, 1]n with m(S) ≥ 1− �,

x ∈ S =⇒ |f(x)− f̂(x)| < �.



1[∀i � xi ≥ li ∧ xi ≤ ui]

= 1

[n∑i=1

(1[xi ≥ li] + 1[xi ≤ ui]) ≥ (2n− 1)/2n

].




x ∈ S =⇒ |f(x)− f̂(x)| < �.

Folklore proof discussion

I Curse of dimension!

I Still, high level features useful.

Outline





Preface to the Kolmogorov Superposition Theorem

I In 1957, at 19, Kolmogorov’s student Vladimir Arnoldsolved Hilbert’s 13th problem:

Can the solution to

x7 + ax3 + bx2 + cx+ 1 = 0

be written as a finite number of 2-variablefunctions?

I Kolmogorov’s generalization is a NN apx theorem.

I The “transfer functions” (i.e., sigmoids), are not fixedacross nodes.

The Kolmogorov Superposition Theorem

Theorem. (Kolmogorov, 1957)

There exist continuous {ψp,q : p ∈ [n], q ∈ [2n+ 1]}, so that

forany f ∈ C([0, 1]n),

there exist continuous {χq}

2n+1

q=1 ,

f(x) =

2n+1

∑q=1

χq

n

∑p=1

ψp,q(xp)

I The proof is also histogram based.

I It must remove the “fuzz” gaps.I It must remove dependence on a particular � > 0.

I The magic is within ψp,q!!




forany f ∈ C([0, 1]n), there exist continuous {χq}

2n+1

q=1 ,

f(x) =

2n+1

∑q=1

χq

n

∑p=1

ψp,q(xp)







forany f ∈ C([0, 1]n), there exist continuous {χq}

2n+1

q=1 ,

f(x) =

2n+1

∑q=1

χq

n∑p=1

ψp,q(xp)







forany f ∈ C([0, 1]n), there exist continuous {χq}2n+1q=1 ,

f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)






There exist continuous {ψp,q : p ∈ [n], q ∈ [2n+ 1]}, so that forany f ∈ C([0, 1]n), there exist continuous {χq}2n+1q=1 ,

f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)







f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)


I It must remove the “fuzz” gaps.

I It must remove dependence on a particular � > 0.





f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)




Constructing ψp,q

I Given resolution k ∈ Z++, build a staggered gridding.

I Let Sqk,i1,...,im range over the cells.

I Construct ψp,q so that for each k and q,∑n

p=1 ψp,q maps

each Sqk,i1,...,im to its own specific interval of R.I Holds for all q: gracefully handles the fuzz regions.

I Holds for all k, simultaneously handles all resolutions.

I The functions ψp,q are fractals.

Constructing ψp,q


q



p=1 ψp,q maps




Constructing ψp,q


q



p=1 ψp,q maps

each Sqk,i1,...,im to its own specific interval of R.

I Holds for all q: gracefully handles the fuzz regions.



Constructing ψp,q


q



p=1 ψp,q maps




Constructing χq

I Recall goal:

f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)

I Start χq0 := 0; eventually χq := limr→∞ χ

qr.

I To build χqr+1 from χqr:

I Choose kr+1 so gridding fluctuations sufficiently small.I Make χq some value of f in each cell, smooth interpolation

in fuzz.I Every x ∈ [0, 1]n hits at least n+ 1 cells Sqk,i1,...,in ; i.e., sinceq ∈ [2n+ 1], more than half the approximations are good.

I (I’m leaving a lot out =( )

Constructing χq

I Recall goal:

f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)

I Start χq0 := 0; eventually χ

q := limr→∞ χqr.





Constructing χq

I Recall goal:

f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)




I Choose kr+1 so gridding fluctuations sufficiently small.

I Make χq some value of f in each cell, smooth interpolationin fuzz.

I Every x ∈ [0, 1]n hits at least n+ 1 cells Sqk,i1,...,in ; i.e., sinceq ∈ [2n+ 1], more than half the approximations are good.


Constructing χq

I Recall goal:

f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)





in fuzz.

I Every x ∈ [0, 1]n hits at least n+ 1 cells Sqk,i1,...,in ; i.e., sinceq ∈ [2n+ 1], more than half the approximations are good.


Constructing χq

I Recall goal:

f(x) =

2n+1∑q=1

χq

n∑p=1

ψp,q(xp)







Discussion 1

Discussion 2

I It needs different transfer functions..

I ..but these can be approximated by multiple layers!

I It shows the value of powerful nodes, whereas the standardapx results just suggest a very wide hidden layer.

Conclusion

I Some fancy mechanisms give 2-NNs f̂i → f ∈ C([0, 1]n).I Histogram constructions hint to the power of deeper

networks.

Thanks!

Andrew R. Barron. Universal approximation bounds forsuperpositions of a sigmoidal function. IEEE Transactions onInformation theory, 39(3):930–945, 1993.

George Cybenko. Approximation by superpositions of asigmoidal function. Mathematics of Control, Signals, andSystems, 2:303–314, 1989.

Andrei N. Kolmogorov. On the representation of continuousfunctions of several veriables as superpositions of continuousfunctions of one variable and addition. Dokl. Acad. NaukSSSR, 114(5):953–956, 1957. Translation to English: V. M.Volosov.

Tong Zhang. Sequential greedy approximation for certainconvex optimization problems. IEEE Transactions onInformation Theory, 49(3):682–691, 2003.

Representation Power of Feedforward Neural Networks · 2020. 8. 29. · Representation Power of...

Documents

Recurrent Neural Networksjcheung/teaching/fall-2017/...Recurrent Neural Networks COMP-550 Oct 5, 2017 Outline Introduction to neural networks and deep learning Feedforward neural networks

Representation Learning Using Multi-Task Deep Neural Networksfor Semantic Classification and Information Retrieval

Deep Learning for Natural Language Processing › slides › 20160618_DL4NLP@CityU.pdf · Machine Learning Deep Learning Implementation 2 Neural Models for Representation Learning

Web view"pure" hardware neural networks, cybernetic gadgets computer MYCIN expert system, STRIPS representation, knowledge-based systems network agents,

Controlkyodo/kokyuroku/contents/pdf/...feedforward in control. Second, we need other supervising neural network which,determines when the connection change should bedone. Third, this

Aula: Controle Antecipatório (Alimentação ou Feedforward)Aula: Controle Antecipatório (Alimentação ou Feedforward) Prof. Eduardo Stockler Departamento de Engenharia Elétrica

Feedforward Control ( 前馈控制 )

Representation presentation

INVERSOR MÓDULO INTEGRADO UTILIZANDO UM FEEDFORWARD

Knowledge Representation

Feedforward Backpropagation, Genetic Algorithm Approaches ... Shafika Sultan.pdf · Feedforward Backpropagation, Genetic Algorithm Approaches for Predicting ... Algoritma genetik;

5 Deep Learning Multi-layered feedforward neural networks ... · 5 Deep Learning •Some Topics in Deep Learning: ∗Learning algorithms: ~Back propagation, Stochastic Gradient Descent

FEEDFORWARD NEURAL NETWORKS UNTUK PEMODELAN …personal.its.ac.id/files/pub/1088-suhartono-statistics-Disertasi... · Gambar 1.1 : Arsitektur MLP dengan satu lapis tersembunyi, tiga

ANN and DNN-based Models for DDoS Detection via Network ... · Deep learning Neural Networks (DNN). The ANN models are Single-Layer Feedforward architecture ANN model, and Single-Layer

Representation Examples

REPRESENTATION VISUELLE

Feedback och Feedforward - Svenska kulturfonden...Varför feedback och feedforward? •Undervisning och lärande är sociala processer, där de delade erfarenheterna ger ett önskvärt

cascada y feedforward

Redes Neurais Feedforward e Backpropagation - DECOM-UFOP · Problemas que requerem duas camadas intermediárias são raros. Basicamente, uma rede neural com duas camadas intermediárias

Aula: Controle Antecipatório (Alimentação ou Feedforward) · 107484 – Controle de Processos Aula: Controle Antecipatório (Alimentação ou Feedforward) Prof. Eduardo Stockler