

Neural Networks 2nd Edition

Simon Haykin

柯博昌

Chap 1. Introduction


2

What is a Neural Network?

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use.

Knowledge is acquired by the network from its environment through a learning process. The procedure used to perform the learning process is called a learning algorithm.

Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.


3

Benefits of Neural Networks

The computing power of neural networks derives from:
– Their massively parallel distributed structure
– Their ability to learn and therefore to generalize

Using neural networks offers the following useful properties:
– Nonlinearity
– Input-Output Mapping
– Adaptivity
– Evidential Response
– Contextual Information
– Fault Tolerance
– VLSI Implementability
– Uniformity of Analysis and Design
– Neurobiological Analogy

Supervised Learning: Modifying the synaptic weights by applying a set of training samples, each consisting of an input signal and the corresponding desired response.
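To make the idea concrete, here is a minimal Python sketch of supervised weight modification. It is not from the slides; the error-correction (LMS-style) update rule, the learning rate eta, and the toy data are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of supervised learning for a single linear neuron.
# The error-correction (LMS-style) update rule and the learning rate
# `eta` are illustrative assumptions, not taken from these slides.
def train(samples, eta=0.1, epochs=50):
    m = len(samples[0][0])
    w = np.zeros(m)          # synaptic weights
    b = 0.0                  # bias
    for _ in range(epochs):
        for x, d in samples: # x: input signal, d: desired response
            y = np.dot(w, x) + b          # actual response
            e = d - y                     # error signal
            w += eta * e * np.asarray(x)  # adjust weights to reduce the error
            b += eta * e
    return w, b

# Example: learn a noisy linear mapping d = 2*x1 - x2
rng = np.random.default_rng(0)
samples = [(x, 2*x[0] - x[1] + 0.01*rng.standard_normal())
           for x in rng.standard_normal((100, 2))]
w, b = train(samples)
print(w, b)   # w approaches [2, -1], b approaches 0
```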


4

Human Brain - Function Block

Receptors: Convert stimuli from the human body or the external environment into electrical impulses that convey information to the brain.

Effectors: Convert electrical impulses generated by the brain into discernible responses as system outputs.

Block diagram representation of the human nervous system:

Stimulus → Receptors → Neural Net (Brain) → Effectors → Response

(forward paths carry information from left to right; feedback paths run in the reverse direction)


5

Comparison: Neural Net vs. Brain

Neuron: the structural constituent of the brain. Neurons are five to six orders of magnitude slower than silicon logic gates (e.g. silicon chips: $10^{-9}$ s per event; neural events: $10^{-3}$ s).

There are about 10 billion neurons and 60 trillion synapses, or connections, in the human cortex.

Energetic efficiency:
– Brain: $10^{-16}$ joules per operation per second.
– Best computer today: $10^{-6}$ joules per operation per second.


6

Synapses

Synapses are elementary structural and functional units that mediate the interactions between neurons.

The most common kind of synapse is the chemical synapse.

The operation of a synapse:
– A presynaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then acts on a postsynaptic process.
– The synapse thus converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal (a nonreciprocal two-port device).


7

Pyramidal Cell


8

Cytoarchitectural map of the cerebral cortex


9

Nonlinear model of a neuron

(Figure: input signals $x_1, x_2, \ldots, x_m$ are weighted by the synaptic weights $w_{k1}, w_{k2}, \ldots, w_{km}$ and combined at a summing junction together with the bias $b_k$; the induced local field $v_k$ then passes through the activation function φ(·) to produce the output $y_k$.)

$$u_k = \sum_{j=1}^{m} w_{kj}\, x_j, \qquad v_k = u_k + b_k, \qquad y_k = \varphi(v_k) = \varphi(u_k + b_k)$$

Let $b_k = w_{k0}$ and $x_0 = +1$. Then

$$v_k = \sum_{j=0}^{m} w_{kj}\, x_j \quad\text{and}\quad y_k = \varphi(v_k)$$
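As a concrete illustration of the two equivalent formulations above, here is a small Python sketch (not part of the slides); the logistic sigmoid used as the default activation and the numeric values are assumptions.

```python
import numpy as np

def neuron_output(x, w, b, phi=lambda v: 1.0 / (1.0 + np.exp(-v))):
    """Single neuron: y_k = phi(sum_j w_kj * x_j + b_k).

    `phi` defaults to a logistic sigmoid for illustration; any of the
    activation functions on the following slides could be substituted.
    """
    v = np.dot(w, x) + b      # induced local field v_k
    return phi(v)             # output y_k

# Equivalent form with the bias absorbed as w_k0 acting on a fixed input x_0 = +1
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.3])
b = 0.2
x_aug = np.concatenate(([1.0], x))   # x_0 = +1
w_aug = np.concatenate(([b], w))     # w_k0 = b_k
assert np.isclose(neuron_output(x, w, b), neuron_output(x_aug, w_aug, 0.0))
```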


10

Nonlinear model of a neuron (Cont.)

(Figure: the same neuron model redrawn with the bias absorbed into the synaptic weights: a fixed input $x_0 = +1$ with weight $w_{k0} = b_k$ joins the inputs $x_1, \ldots, x_m$ at the summing junction; the activation function φ(·) then maps the induced local field $v_k$ to the output $y_k$.)

(Figure: the affine transformation produced by the presence of a bias: the induced local field $v_k$ plotted against the linear combiner's output $u_k$ for $b_k > 0$, $b_k = 0$, and $b_k < 0$.)


11

Types of Activation Function

(Figure: plots of φ(v) versus v for the threshold, piecewise-linear, and sigmoid functions; the sigmoid curve steepens with increasing a.)

Threshold Function:

$$\varphi(v) = \begin{cases} 1, & v \ge 0 \\ 0, & v < 0 \end{cases}$$

Piecewise-Linear Function:

$$\varphi(v) = \begin{cases} 1, & v \ge +\tfrac{1}{2} \\ v, & +\tfrac{1}{2} > v > -\tfrac{1}{2} \\ 0, & v \le -\tfrac{1}{2} \end{cases}$$

Sigmoid Function:

$$\varphi(v) = \frac{1}{1 + \exp(-av)} \qquad (a \text{ is the slope parameter})$$
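A direct Python transcription of the three activation functions above may help; this is an illustrative sketch, not code from the slides, and the test values are arbitrary.

```python
import numpy as np

def threshold(v):
    return np.where(np.asarray(v) >= 0.0, 1.0, 0.0)

def piecewise_linear(v):
    # 1 for v >= +1/2, v inside the linear region, 0 for v <= -1/2
    v = np.asarray(v, dtype=float)
    return np.where(v >= 0.5, 1.0, np.where(v <= -0.5, 0.0, v))

def sigmoid(v, a=1.0):
    # a is the slope parameter; larger a gives a steeper curve
    return 1.0 / (1.0 + np.exp(-a * np.asarray(v, dtype=float)))

v = np.array([-2.0, -0.25, 0.0, 0.25, 2.0])
print(threshold(v))
print(piecewise_linear(v))
print(sigmoid(v, a=2.0))
```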


12

Types of Activation Function (Cont.)

The activation functions defined above range from 0 to +1.

Sometimes the activation function is required to range from -1 to +1 instead. (How can this be done?) Denote the activation function ranging from 0 to +1 by φ(·) and the one ranging from -1 to +1 by φ'(·); then

$$\varphi'(v) = 2\varphi(v) - 1$$

Note: if φ(v) is the sigmoid function, then

$$\varphi'(v) = \frac{2}{1 + \exp(-av)} - 1 = \frac{1 - \exp(-av)}{1 + \exp(-av)} = \tanh\!\left(\frac{av}{2}\right)$$
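A quick numerical check of this identity, as a sketch (the values of a and v are arbitrary):

```python
import numpy as np

# Verify that 2 * sigmoid(a*v) - 1 equals tanh(a*v / 2)
a, v = 1.7, np.linspace(-5.0, 5.0, 11)
sigmoid = 1.0 / (1.0 + np.exp(-a * v))
assert np.allclose(2.0 * sigmoid - 1.0, np.tanh(a * v / 2.0))
```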


13

Stochastic Model of a Neuron

The above model is deterministic in that its input-output behavior is precisely defined.

Some applications of neural networks base the analysis on a stochastic neuronal model.

Let x denote the state of the neuron and P(v) the probability of firing, where v is the induced local field of the neuron:

$$x = \begin{cases} +1 & \text{with probability } P(v) \\ -1 & \text{with probability } 1 - P(v) \end{cases}$$

A standard choice for P(v) is the sigmoid-shaped function

$$P(v) = \frac{1}{1 + \exp(-v/T)}$$

where T is a pseudo-temperature used to control the noise level and therefore the uncertainty in firing.
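A minimal sketch of sampling this stochastic neuron (not from the slides; the choice of v, the temperatures, and the sample count are illustrative):

```python
import numpy as np

def stochastic_state(v, T, rng):
    # State is +1 with probability P(v) = 1 / (1 + exp(-v/T)), otherwise -1
    p_fire = 1.0 / (1.0 + np.exp(-v / T))
    return 1 if rng.random() < p_fire else -1

rng = np.random.default_rng(0)
for T in (5.0, 0.1):   # lower pseudo-temperature -> less uncertainty in firing
    states = [stochastic_state(1.0, T, rng) for _ in range(1000)]
    print(T, np.mean(states))   # mean approaches +1 as T decreases (for v > 0)
```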


14

Neural Network Directed Graph

– Synaptic links: a linear input-output relation, $y_k = w_{kj}\, x_j$.
– Activation links: a nonlinear input-output relation, $y_k = \varphi(x_j)$.
– Synaptic convergence (fan-in): incoming signals are summed, $y_k = y_i + y_j$.
– Synaptic divergence (fan-out): the node signal $x_j$ is transmitted unchanged along each outgoing branch.


15

Signal-flow Graph of a Neuron

(Figure: signal-flow graph of a neuron: source nodes $x_0 = +1, x_1, x_2, \ldots, x_m$ feed the node $v_k$ through branches with weights $w_{k0} = b_k, w_{k1}, w_{k2}, \ldots, w_{km}$; the branch from $v_k$ to the output $y_k$ carries the nonlinear transmittance φ(·).)


16

Feedback

Feedback plays a major role in recurrent networks.

(Figure: single-loop feedback system: the input $x_j(n)$ is added to the fed-back signal to form $x_j'(n)$, which passes through the forward operator A to produce the output $y_k(n)$; the operator B feeds $y_k(n)$ back to the input.)

$$y_k(n) = \mathbf{A}\big[x_j'(n)\big], \qquad x_j'(n) = x_j(n) + \mathbf{B}\big[y_k(n)\big]$$

where A and B act as operators. Eliminating $x_j'(n)$ gives

$$y_k(n) = \frac{\mathbf{A}}{1 - \mathbf{A}\mathbf{B}}\big[x_j(n)\big]$$

A/(1 - AB) is referred to as the closed-loop operator and AB as the open-loop operator.

In general, AB ≠ BA.


17

Feedback (Cont.)

Let A be a fixed weight w, and let B be the unit-delay operator $z^{-1}$. Then

$$\frac{\mathbf{A}}{1 - \mathbf{A}\mathbf{B}} = \frac{w}{1 - w z^{-1}} = w\,(1 - w z^{-1})^{-1}$$

Expanding $(1 - w z^{-1})^{-1}$ as a geometric series (use Taylor's expansion or the binomial expansion to prove it):

$$\frac{\mathbf{A}}{1 - \mathbf{A}\mathbf{B}} = w \sum_{l=0}^{\infty} w^{l} z^{-l}$$

Hence

$$y_k(n) = w \sum_{l=0}^{\infty} w^{l} z^{-l}\big[x_j(n)\big]$$

Since $z^{-l}\big[x_j(n)\big] = x_j(n - l)$, this becomes

$$y_k(n) = \sum_{l=0}^{\infty} w^{l+1}\, x_j(n - l)$$
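A numerical cross-check of this result, as a sketch (not from the slides; the input signal is random and the weight values are arbitrary): simulating the single-loop system directly should reproduce the convolution sum.

```python
import numpy as np

def feedback_response(x, w):
    # Simulate y_k(n) = w * (x_j(n) + y_k(n-1)), i.e. A = w and B = unit delay
    y, y_prev = [], 0.0
    for x_n in x:
        y_n = w * (x_n + y_prev)
        y.append(y_n)
        y_prev = y_n
    return np.array(y)

def convolution_sum(x, w):
    # y_k(n) = sum_{l=0}^{n} w^(l+1) * x_j(n-l), with x_j(n) = 0 for n < 0
    return np.array([sum(w**(l + 1) * x[n - l] for l in range(n + 1))
                     for n in range(len(x))])

rng = np.random.default_rng(0)
x = rng.standard_normal(20)
for w in (0.5, 1.0, 1.5):
    assert np.allclose(feedback_response(x, w), convolution_sum(x, w))
```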


18

Time Responses for Different Values of the Weight w

(Figure: $y_k(n)$ versus n for w < 1 (exponentially decaying), w = 1 (constant at $w\,x_j(0)$), and w > 1 (exponentially growing).)

Conclusions:
1. If |w| < 1, $y_k(n)$ is exponentially convergent: the system is stable.
2. If |w| ≥ 1, $y_k(n)$ is divergent: the system is unstable.

Think about:
1. How does the time response change if -1 < w < 0?
2. How does the time response change if w ≤ -1?
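A small sketch for exploring these cases numerically, including the negative values of w asked about above. It is not from the slides and assumes the figure shows the response to a single sample $x_j(0)$ applied at n = 0, in which case the convolution sum reduces to $y_k(n) = w^{n+1} x_j(0)$.

```python
import numpy as np

def response(w, x0=1.0, steps=8):
    # y_k(n) = w^(n+1) * x_j(0) for a single input sample x_j(0) applied at n = 0
    return np.array([w**(n + 1) * x0 for n in range(steps)])

for w in (0.5, 1.0, 1.5, -0.5, -1.5):
    print(f"w = {w:+.1f}:", np.array2string(response(w), precision=3))
```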


19

Network Architectures

Single-Layer Feedforward Networks: an input layer of source nodes projects directly onto an output layer of neurons.

Multilayer Feedforward Networks: an input layer of source nodes, one or more layers of hidden neurons, and a layer of output neurons.

Fully Connected: Every node in each layer is connected to every other node in the adjacent forward layer. Otherwise, it’s Partially Connected.
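A minimal Python sketch of a fully connected multilayer feedforward pass (not from the slides); the layer sizes, the sigmoid activation, and the random weights are illustrative assumptions.

```python
import numpy as np

def feedforward(x, layers, phi=lambda v: 1.0 / (1.0 + np.exp(-v))):
    # Each layer computes y = phi(W x + b); `layers` is a list of (W, b) pairs
    y = x
    for W, b in layers:
        y = phi(W @ y + b)
    return y

rng = np.random.default_rng(0)
sizes = [4, 3, 2]    # 4 source nodes -> 3 hidden neurons -> 2 output neurons
layers = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(feedforward(rng.standard_normal(4), layers))
```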


20

Network Architectures (Cont.)

(Figure: a recurrent network with no self-feedback loops and no hidden neurons: each neuron's output is fed back through a unit-delay operator $z^{-1}$ to the inputs of the other neurons.)

(Figure: a recurrent network with hidden neurons: feedback connections originate from both the hidden and the output neurons and pass through unit-delay operators $z^{-1}$ before re-entering the network as inputs.)


21

Knowledge Representation

Primary characteristics of knowledge representation:
– What information is actually made explicit
– How the information is physically encoded for subsequent use

Knowledge is goal directed.

A good solution depends on a good representation of knowledge.

A set of input-output pairs, with each pair consisting of an input signal and the corresponding desired response, is referred to as a set of training data or a training sample.


22

Rules for Knowledge Representation

Rule 1: Similar inputs from similar classes should usually produce similar representations inside the network.

Measuring similarity:

(1) Using the Euclidean distance $d(\mathbf{x}_i, \mathbf{x}_j)$. Let $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^T$; then

$$d(\mathbf{x}_i, \mathbf{x}_j) = \left[\, \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 \right]^{1/2}, \qquad \text{Similarity} = \frac{1}{d(\mathbf{x}_i, \mathbf{x}_j)}$$

(2) Using the inner product $(\mathbf{x}_i, \mathbf{x}_j)$. Let $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^T$; then

$$(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j = \sum_{k=1}^{m} x_{ik}\, x_{jk}$$

(Figure: geometric interpretation: $\|\mathbf{x}_i - \mathbf{x}_j\|$ is the Euclidean distance between the tips of the vectors $\mathbf{x}_i$ and $\mathbf{x}_j$, while $\mathbf{x}_i^T \mathbf{x}_j$ is their inner product.)

If $\|\mathbf{x}_i\| = 1$ and $\|\mathbf{x}_j\| = 1$:

$$d^2(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^T(\mathbf{x}_i - \mathbf{x}_j) = 2 - 2\,\mathbf{x}_i^T \mathbf{x}_j$$

$$\cos(\mathbf{x}_i, \mathbf{x}_j) = \frac{\mathbf{x}_i^T \mathbf{x}_j}{\|\mathbf{x}_i\|\,\|\mathbf{x}_j\|}$$
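An illustrative Python sketch of the two similarity measures and the unit-vector identity above (not from the slides; the example vectors are arbitrary unit vectors):

```python
import numpy as np

def euclidean_distance(xi, xj):
    return np.sqrt(np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))

def inner_product(xi, xj):
    return float(np.dot(xi, xj))

xi = np.array([1.0, 0.0, 0.0])
xj = np.array([0.8, 0.6, 0.0])   # both unit-length vectors
# For unit vectors: d^2(xi, xj) = 2 - 2 * xi^T xj
assert np.isclose(euclidean_distance(xi, xj) ** 2, 2.0 - 2.0 * inner_product(xi, xj))
print(euclidean_distance(xi, xj), inner_product(xi, xj))
```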


23

Rules for Knowledge Representation (Cont.)

Rule 2: Items to be categorized as separate classes should be given widely different representations in the network.

(This is the exact opposite of Rule 1.)

Rule 3: If a particular feature is important, then there should be a large number of neurons involved in the representation of that item.

Rule 4: Prior information and invariance should be built into the design of a neural network, thereby simplifying the network design by not having to learn them.


24

How to Build Prior Information into Neural Network Design

Restricting the network architecture through the use of local connections known as receptive fields.

Constraining the choice of synaptic weights through the use of weight-sharing.

Example (combining both techniques):

$$v_j = \sum_{i=1}^{6} w_i\, x_{i+j-1}, \qquad j = 1, 2, 3, 4$$

x_1, …, x_6 constitute the receptive field for hidden neuron 1, and so on for the other hidden neurons. Because the same set of weights is shared by all the hidden neurons, each induced local field $v_j$ is a convolution sum, and a feedforward network using this form of weight sharing is called a convolutional network.
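A small Python sketch of this weight-sharing example (not from the slides; the number of inputs and the weight values are illustrative): four hidden neurons share the same six weights, each applied to a shifted six-sample receptive field.

```python
import numpy as np

def shared_weight_fields(x, w):
    # v_j = sum_{i=1}^{6} w_i * x_{i+j-1}  (1-based indexing as on the slide)
    return np.array([np.dot(w, x[j:j + len(w)])
                     for j in range(len(x) - len(w) + 1)])

x = np.arange(1.0, 10.0)                            # x_1 ... x_9 -> 4 hidden neurons
w = np.array([0.5, -0.25, 0.1, 0.0, 0.2, -0.1])     # the shared weights w_1 ... w_6
print(shared_weight_fields(x, w))                   # the convolution sums v_1 ... v_4
```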


25

Artificial Intelligence (AI)

Goal: Developing paradigms or algorithms that require machines to perform cognitive tasks.

An AI system must be capable of doing three things:
– Storing knowledge
– Applying the stored knowledge to solve problems
– Acquiring new knowledge through experience

Key components:
– Representation
– Reasoning
– Learning

(Figure: the three key components of an AI system: Representation, Reasoning, and Learning.)