
Chapter 5: Recurrent Networks and Temporal Feedforward Networks. National Yunlin University of Science and Technology, Graduate School of Computer Science and Information Engineering. Chuan-Yu Chang, Ph.D. Office: ES 709, TEL: 05-5342601 ext. 4337



• Chapter 5: Recurrent Networks and Temporal Feedforward Networks (Chuan-Yu Chang). Office: ES 709; TEL: 05-5342601 ext. 4337; E-mail: [email protected]


Overview of Recurrent Neural Networks
A network that has closed loops in its topological structure is considered a recurrent network.
Feedforward networks:
Implement a fixed weighted mapping from input space to output space.
The state of any neuron is solely determined by the input to the unit, not by the initial and past states of the neuron.
Recurrent neural networks:
Utilize feedback to allow initial and past states to be involved along with serial processing; they are fault-tolerant.
These networks can be fully connected.
The connection weights in a recurrent neural network can be symmetric or asymmetric.
In the symmetric case (wij = wji), the network always converges to a stable point; however, such networks cannot accommodate temporal sequences of patterns.
In the asymmetric case (wij != wji), the dynamics of the network can exhibit limit cycles and chaos, and with the proper selection of weights, temporal spatial patterns can be generated and stored in the network.


Hopfield Associative Memory
Hopfield (1982):
Physical systems consisting of a large number of simple neurons can exhibit collective emergent properties.
A collective property of a system cannot emerge from a single neuron, but it can emerge from local neuron interactions in the system.
Such a system produces a content-addressable memory that can correctly yield an entire memory from partial information.


Hopfield Associative Memory (cont.)
The standard discrete-time Hopfield neural network:
Is a kind of recurrent network.
Can be viewed as a nonlinear associative memory, or content-addressable memory.
Performs a dynamic mapping function.
Is intended to perform the function of data storage and retrieval.
The network stores the information in a dynamically stable environment.
A stored pattern in memory is retrieved in response to an input pattern that is a noisy or incomplete version of the stored pattern.


Hopfield Associative Memory (cont.)
Content-addressable memory (CAM):
An attractor is a state that the system will evolve toward in time, starting from a set of initial conditions (the basin of attraction).
If an attractor is a unique point in the state space, it is called a fixed point.
A prototype state Fh is represented by a fixed point sh of the dynamic system; thus, Fh is mapped onto the stable point sh of the network.


    Hopfield Associative Memory (cont.)


Hopfield Associative Memory (cont.)
Activation function: symmetric hard-limiter; the output can only be +1 or -1.
The output of a neuron is not fed back to itself; therefore, wij = 0 for i = j.


Hopfield Associative Memory (cont.)
The output of the linear combiner is written as
v_i(k) = Σ_{j=1..n} w_ij x_j(k) + θ_i   (5.1)
where x(k) = [x_1(k), x_2(k), ..., x_n(k)]^T is the state of the network and θ_i is the external threshold.
The state of each neuron is given by
x_i(k+1) = sgn(v_i(k)) = +1 if v_i(k) > 0, -1 if v_i(k) < 0   (5.2)
If v_i = 0, the value of x_i is defined as its previous state.
The vector-matrix form of (5.1) is given by
v(k) = W x(k) + θ   (5.3)
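As a concrete illustration of (5.1) and (5.2), the following is a minimal sketch of one asynchronous update sweep. This is our own illustrative code, not the book's implementation; the names sgn and async_update are assumptions.

```python
import numpy as np

def sgn(v, previous):
    """Symmetric hard-limiter of (5.2): keep the previous state when v == 0."""
    return previous if v == 0 else (1 if v > 0 else -1)

def async_update(W, x, theta=None):
    """One asynchronous sweep of (5.1)-(5.2) over all n neurons."""
    n = len(x)
    theta = np.zeros(n) if theta is None else theta
    x = x.copy()
    for i in range(n):
        v_i = W[i] @ x + theta[i]   # linear combiner, Eq. (5.1)
        x[i] = sgn(v_i, x[i])       # hard-limiter, Eq. (5.2)
    return x
```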


Hopfield Associative Memory (cont.)
The network weight matrix W is written as
W = [w_1^T; w_2^T; ...; w_n^T], where w_i = [w_i1, w_i2, ..., w_in]^T   (5.4)
Each row in (5.4) is the associated weight vector for one neuron.
The output of the network can be written in vector-matrix form as
x(k+1) = sgn(W x(k) + θ)   (5.5)
or in scalar form as
x_i(k+1) = sgn(Σ_{j=1..n} w_ij x_j(k) + θ_i)   (5.6)


Hopfield Associative Memory (cont.)
There are two basic operational phases associated with the Hopfield network: the storage phase and the recall phase.
During the storage phase, the associative memory is built according to the outer-product rule for correlation matrix memories. Given the set of r prototype memories {x_1, ..., x_r}, the network weight matrix is computed as
W = Σ_{h=1..r} x_h x_h^T − r I   (5.7)
which ensures wij = 0 for i = j.
Recall phase:
Given a test input vector x, the state of the network x(k) is initialized with the values of the unknown input, i.e., x(0) = x.
Using Eq. (5.6), the elements of the state vector x(k) are updated one at a time until there is no significant change in the elements of the vector. When this condition is reached, the stable state xe is the network output.


Hopfield Associative Memory (cont.)
Discrete-time Hopfield network training algorithm:
Step 1 (storage phase): Given a set of r prototype memories, the synaptic weights of the network are calculated according to (5.7):
w_ij = Σ_{h=1..r} x_{h,i} x_{h,j} for i ≠ j, and w_ij = 0 for i = j   (5.8)
Step 2 (recall phase): Given an unknown input vector x, the Hopfield network is initialized by setting the state of the network x(k) at time k = 0 to x:
x(0) = x   (5.9)
Step 3: The elements of the state of the network x(k) are updated asynchronously according to (5.6):
x_i(k+1) = sgn(Σ_{j=1..n} w_ij x_j(k) + θ_i)   (5.10)
This iterative process is continued until the elements of the state vector no longer change. When this condition is met, the network outputs the equilibrium state
x_e = x(k)   (5.11)
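Putting the storage phase (5.8) and the recall phase (5.9)-(5.11) together, a minimal sketch that reuses async_update from the earlier block; the prototype vectors are those of Example 5.1 below, and the function names are our own.

```python
import numpy as np

def store(prototypes):
    """Outer-product storage, Eq. (5.7)/(5.8): W = sum_h x_h x_h^T - r*I."""
    n = len(prototypes[0])
    return sum(np.outer(x, x) for x in prototypes) - len(prototypes) * np.eye(n)

def recall(W, probe, max_sweeps=100):
    """Recall phase (5.9)-(5.11): iterate asynchronous sweeps until stable."""
    x = probe.copy()
    for _ in range(max_sweeps):
        x_new = async_update(W, x)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

W = store([np.array([-1, 1, -1]), np.array([1, -1, 1])])
print(recall(W, np.array([1, 1, 1])))   # converges to [ 1 -1  1]
```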


Hopfield Associative Memory (cont.)
The major problem associated with the Hopfield network is spurious equilibrium states:
These are stable equilibrium states that are not part of the design set of prototype memories.
They can result from linear combinations of an odd number of patterns.
For a large number of prototype memories to be stored, there can exist local minima in the energy landscape.
Spurious attractors can also result from the symmetric energy function.
Li et al. proposed a design approach, based on a system of first-order linear ordinary differential equations, that minimizes the number of spurious attractors.


Hopfield Associative Memory (cont.)
Because the Hopfield network has symmetric weights and no neuron self-loops, an energy function (Lyapunov function) can be defined. An energy function for the discrete-time Hopfield neural network can be written as
E = −(1/2) x^T W x − (x^i)^T x + θ^T x   (5.12)
where x is the state of the network, x^i is an externally applied input presented to the network, and θ is the threshold vector.
The change in the energy function is given by
ΔE = −Δx_i (Σ_{j≠i} w_ij x_j + x^i_i − θ_i)   (5.13)
The operation of the Hopfield network leads to a monotonically decreasing energy function, and changes in the state of the network will continue until a local minimum of the energy landscape is reached.


Hopfield Associative Memory (cont.)
For no externally applied inputs, the energy function is given by
E = −(1/2) x^T W x + θ^T x   (5.14)
The energy change is
ΔE = −Δx_i (Σ_{j≠i} w_ij x_j − θ_i) ≤ 0   (5.15)
The storage capacity (for bipolar patterns) of the Hopfield network, where n is the number of neurons in the network, is approximately
C ≈ 0.138 n   (5.16)
If most of the prototype memories are to be recalled perfectly, the maximum storage capacity of the network is given by
C_max = n / (2 ln n)   (5.17)
If it is required that 99% of the prototype memories be recalled perfectly, then
C_max = n / (4 ln n)   (5.18)
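As a quick arithmetic check of (5.16)-(5.18) for a network of n = 144 neurons (the size used in Example 5.2 below):

```python
import numpy as np

n = 144
print(0.138 * n)             # about 19.9 memories, Eq. (5.16)
print(n / (2 * np.log(n)))   # about 14.5 memories, Eq. (5.17)
print(n / (4 * np.log(n)))   # about 7.2 memories,  Eq. (5.18)
```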


Hopfield Associative Memory (cont.)
Example 5.1: Let the threshold be 0 and store the two bipolar prototype memories x_1 = [-1, 1, -1]^T and x_2 = [1, -1, 1]^T. The weight matrix is computed from (5.7).
Presenting the corrupted test patterns [-1, -1, 1] and [1, -1, -1], and even the pattern [1, 1, 1], the network converges to the stored memory [1, -1, 1]. Using (5.14), the energy of each state of the network in Example 5.1 can be evaluated to verify that the stored memories lie at the minima of the energy landscape.


Hopfield Associative Memory (cont.)
Example 5.2: The prototype memories are 12×12 bipolar (+1/−1) images. Each pattern therefore has 12×12 = 144 elements, and the network has 144 neurons with 144×144 = 20,736 weights. The threshold is θ = 0.
Five prototype vectors are stored using (5.7). When test patterns corrupted with a 30% bit-error rate are presented, the network converges back to the corresponding stored patterns.


The Traveling-Salesperson Problem
Optimization problems:
Finding the best way to do something subject to certain constraints.
The best solution is defined by a specific criterion; in many cases, optimization problems are described in terms of a cost function.
The Traveling-Salesperson Problem (TSP):
A salesperson must make a circuit through a certain number of cities.
Each city is visited only once.
The salesperson returns to the starting point at the end of the trip.
The total distance traveled is minimized.


The Traveling-Salesperson Problem (cont.)
Constraints:
Weak constraints, e.g., minimum distance.
Strong constraints: constraints that must be satisfied.
The Hopfield network is guaranteed to converge only to a local minimum of the energy function.
To use the Hopfield memory for optimization problems, we must find a way to map the problem onto the network architecture. The first step is to develop a representation of the problem's solutions that fits an architecture having a single array of PEs.


The Traveling-Salesperson Problem (cont.)
To solve the TSP with a Hopfield network, the cost function of the tour must be mapped onto the Lyapunov energy function of the network, from which the weights W and thresholds θ are determined.


The Traveling-Salesperson Problem (cont.)
The energy function must satisfy the following criteria:
Visit each city only once on the tour.
Each position on the tour is occupied by only one city at a time.
Include all n cities.
Favor the shortest total distance.


The Traveling-Salesperson Problem (cont.)
The energy equation is
E = (A/2) Σ_x Σ_i Σ_{j≠i} v_{x,i} v_{x,j} + (B/2) Σ_i Σ_x Σ_{y≠x} v_{x,i} v_{y,i} + (C/2) (Σ_x Σ_i v_{x,i} − n)^2 + (D/2) Σ_x Σ_{y≠x} Σ_i d_{xy} v_{x,i} (v_{y,i+1} + v_{y,i−1})
where v_{x,i} = 1 if city x occupies position i of the tour (and 0 otherwise), d_{xy} is the distance between cities x and y, and n is the number of cities.


The Traveling-Salesperson Problem (cont.)
Comparing the cost function and the Lyapunov function of the Hopfield network, the synaptic interconnection strengths and the bias input of the network are obtained as
w_{(x,i),(y,j)} = −A δ_{xy} (1 − δ_{ij}) − B δ_{ij} (1 − δ_{xy}) − C − D d_{xy} (δ_{j,i+1} + δ_{j,i−1})
i_{x,i} = C n
where the Kronecker delta function is defined as δ_{ij} = 1 if i = j, and δ_{ij} = 0 otherwise.


The Traveling-Salesperson Problem (cont.)
The total input to neuron (x, i) is
net_{x,i} = Σ_y Σ_j w_{(x,i),(y,j)} v_{y,j} + i_{x,i}
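To make the mapping concrete, the following sketch evaluates the TSP energy above for a candidate tour matrix V, where V[x, i] = 1 if city x occupies tour position i. This is our own illustrative code; the function name tsp_energy and passing the penalty weights A, B, C, and D as arguments are assumptions.

```python
import numpy as np

def tsp_energy(V, D, A, B, C, Dw):
    """Hopfield-Tank TSP energy: row, column, and global constraints plus tour length."""
    n = V.shape[0]
    row = sum(V[x, i] * V[x, j] for x in range(n)
              for i in range(n) for j in range(n) if i != j)
    col = sum(V[x, i] * V[y, i] for i in range(n)
              for x in range(n) for y in range(n) if x != y)
    glob = (V.sum() - n) ** 2
    length = sum(D[x, y] * V[x, i] * (V[y, (i + 1) % n] + V[y, (i - 1) % n])
                 for x in range(n) for y in range(n) if x != y for i in range(n))
    return (A / 2) * row + (B / 2) * col + (C / 2) * glob + (Dw / 2) * length
```

For a valid tour (exactly one 1 in each row and column of V), the three penalty terms vanish and only the tour-length term remains.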

• A Contextual Hopfield Neural Network for Medical Image Edge Detection (Chuan-Yu Chang), Optical Engineering, Vol. 45, No. 3, pp. 037006-1~037006-9, 2006. (EI, SCI)


Introduction
Edge detection from medical images (such as CT and MRI) is an important step in a medical image understanding system.


Introduction
The proposed CHNN:
The input of the CHNN is the original two-dimensional image, and the output is an edge-based feature map.
It takes each pixel's contextual information into account.
Experimental results are more perceptual than those of the CHEFNN, and the execution time is faster than the CHEFNN.
Chang's [2000] CHEFNN:
Advantages: takes each pixel's contextual information into account; adopts the competitive learning rule.
Disadvantages: the parameters A and B must be predetermined by trial and error; execution time is long (26 seconds or more).


The Contextual Hopfield Neural Network (CHNN)
The architecture of the CHNN.


The CHNN
The total input to neuron (x, i) is computed as given in Eq. (1), and the activation function of the network is defined in Eq. (2).


The CHNN
Based on the update equation, the Lyapunov energy function of the two-dimensional Hopfield neural network is given by Eq. (3).


The CHNN
The energy function of the CHNN must satisfy the following condition: the gray levels within an area belonging to the non-edge points have the minimum Euclidean distance measure, as expressed in Eqs. (4) and (5).


The CHNN
The neighborhood function is defined in Eqs. (6) and (7).


The CHNN
The objective function for the CHNN is given in Eq. (8).


The CHNN
Comparing the objective function of the CHNN in Eq. (8) with the Lyapunov function of the CHNN in Eq. (3) yields the network parameters given in Eqs. (9)-(11).


The CHNN Algorithm
Input: The original image X, and the neighborhood parameters p and q.
Output: The stabilized neuron states representing the classified edge feature map of the original image.


The CHNN Algorithm
Algorithm:
Step 1) Assign the initial neuron states to 1.
Step 2) Use Eq. (11) to calculate the total input of each neuron.
Step 3) Apply the activation rule given in Eq. (2) to obtain the new output state of each neuron.
Step 4) Repeat Steps 2 and 3 for all neurons and count the number of neurons whose states change during the update. If any state changed, go to Step 2; otherwise, go to Step 5.
Step 5) Output the final states of the neurons, which indicate the edge detection result.
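Since Eqs. (2) and (11) are not reproduced in this transcript, the following sketch captures only the control flow of the algorithm: total_input is a hypothetical stand-in for the Eq. (11) input computation, and the hard-limiting rule stands in for Eq. (2).

```python
import numpy as np

def chnn_edge_map(X, total_input, max_epochs=100):
    """Control flow of the CHNN algorithm; total_input(states, x, i) is a placeholder."""
    rows, cols = X.shape
    states = np.ones((rows, cols), dtype=int)     # Step 1: initial neuron states = 1
    for _ in range(max_epochs):
        changed = 0
        for x in range(rows):                     # Steps 2-4: sweep all neurons
            for i in range(cols):
                net = total_input(states, x, i)   # stand-in for Eq. (11)
                new_state = 1 if net > 0 else 0   # stand-in for the Eq. (2) activation
                if new_state != states[x, i]:
                    states[x, i] = new_state
                    changed += 1
        if changed == 0:                          # Step 5: stable, output edge map
            break
    return states
```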


Experimental Results
Phantom images: (a) original phantom image, (b) with added noise (K=18), (c) with added noise (K=20), (d) with added noise (K=23), (e) with added noise (K=25), (f) with added noise (K=30).


Experimental Results
Noiseless phantom image: (a) the Laplacian-based, (b) the Marr-Hildreth, (c) the wavelet-based, (d) the Canny, (e) the CHEFNN, (f) the proposed CHNN.


Experimental Results
Noisy phantom image (K=18): (a) the Laplacian-based, (b) the Marr-Hildreth, (c) the wavelet-based, (d) the Canny, (e) the CHEFNN, (f) the proposed CHNN.


Experimental Results
Noisy phantom image (K=20): (a) the Laplacian-based, (b) the Marr-Hildreth, (c) the wavelet-based, (d) the Canny, (e) the CHEFNN, (f) the proposed CHNN.


Experimental Results
Noisy phantom image (K=23): (a) the Laplacian-based, (b) the Marr-Hildreth, (c) the wavelet-based, (d) the Canny, (e) the CHEFNN, (f) the proposed CHNN.


Experimental Results
Noisy phantom image (K=25): (a) the Laplacian-based, (b) the Marr-Hildreth, (c) the wavelet-based, (d) the Canny, (e) the CHEFNN, (f) the proposed CHNN.


Experimental Results
Noisy phantom image (K=30): (a) the Laplacian-based, (b) the Marr-Hildreth, (c) the wavelet-based, (d) the Canny, (e) the CHEFNN, (f) the proposed CHNN.


    Experimental Results


Experimental Results
Knee-joint-based MR image.


Experimental Results
Skull-based CT image.


Conclusion
We proposed a new contextual Hopfield neural network, the Contextual Hopfield Neural Network (CHNN), for edge detection.
The CHNN considers the contextual information of pixels.
The results of our experiments indicate that the CHNN can be applied to various kinds of medical image segmentation, including CT and MRI images.


Recommended Reading
Chuan-Yu Chang and Pau-Choo Chung, "Two-layer competitive based Hopfield neural network for medical image edge detection," Optical Engineering, Vol. 39, No. 3, pp. 695-703, March 2000. (SCI)
Chuan-Yu Chang and Pau-Choo Chung, "Medical Image Segmentation Using a Contextual-Constraint Based Hopfield Neural Cube," Image and Vision Computing, Vol. 19, pp. 669-678, 2001. (SCI)
Chuan-Yu Chang, "Spatiotemporal-Hopfield Neural Cube for Diagnosing Recurrent Nasal Papilloma," Medical & Biological Engineering & Computing, Vol. 43, pp. 16-22, 2005. (EI, SCI)
Chuan-Yu Chang, "A Contextual-based Hopfield Neural Network for Medical Image Edge Detection," Optical Engineering, Vol. 45, No. 3, pp. 037006-1~037006-9, 2006. (EI, SCI)
Chuan-Yu Chang, Hung-Jen Wang and Si-Yan Lin, "Simulation Studies of Two-layer Hopfield Neural Networks for Automatic Wafer Defect Inspection," Lecture Notes in Computer Science 4031, pp. 1119-1126, 2006. (SCI)


Simulated Annealing
For recalling stored patterns, converging to a local minimum of the energy function is acceptable; for an optimization problem, however, the global minimum is required. Because the Hopfield neural network follows gradient-descent dynamics, it can become trapped in a local minimum and miss the global minimum.
Simulated annealing (SA) is a stochastic technique, applicable to NP-complete optimization problems, that allows the search to escape local minima and approach the global minimum. SA consists of two steps:
Melting the system to be optimized at an effectively high temperature.
Lowering the temperature in slow stages until the system freezes.


Simulated Annealing (cont.)
Plot of a function of two variables with multiple minima and maxima.


Simulated Annealing (cont.)
Boltzmann-Gibbs distribution: in thermal equilibrium at temperature T, the probability that the system is in a state x with energy E(x) is
P(x) = (1/Z) exp(−E(x) / (k_B T))   (5.24)
where k_B is the Boltzmann constant (k_B = 1.3806×10^-23 J/K), T is the absolute temperature, and Z is the partition function
Z = Σ_x exp(−E(x) / (k_B T))   (5.25)
The ratio of the probabilities of two states x and x_p then depends only on their energy difference:
P(x_p) / P(x) = exp(−(E(x_p) − E(x)) / T)   (5.26)
where T is now treated as a dimensionless pseudo-temperature.


Simulated Annealing (cont.)
Let Pr(x→xp) denote the transition probability from the current state x to a perturbed state xp. A sufficient condition for the system to reach thermal equilibrium is the detailed-balance condition
P(x) Pr(x→xp) = P(xp) Pr(xp→x)   (5.27)


Simulated Annealing (cont.)
Combining (5.26) and (5.27) leads to the Metropolis algorithm, a Monte Carlo technique. A small random perturbation of the system (e.g., displacing an atom) is applied, and the resulting energy change ΔE is computed:
If ΔE ≤ 0, the new state is accepted.   (5.28)
If ΔE > 0, the new state is accepted with probability Pr(ΔE) = exp(−ΔE / T).   (5.29)


Simulated Annealing (cont.)
To implement the acceptance decision, a uniformly distributed random number r in [0, 1] is generated. If r < Pr(ΔE), the move is accepted; if r ≥ Pr(ΔE), the current state is kept.
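A minimal sketch of the acceptance decision of (5.28) and (5.29), using the uniform random number test just described (the function name is ours):

```python
import math
import random

def metropolis_accept(delta_E, T):
    """Accept downhill moves outright; accept uphill moves with prob exp(-dE/T), Eq. (5.29)."""
    if delta_E <= 0:
        return True                                # Eq. (5.28): always accept improvements
    return random.random() < math.exp(-delta_E / T)
```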


Simulated Annealing (cont.)
There are four basic components associated with a simulated annealing based global search algorithm:
A concise description of the system configuration.
An objective or cost function.
An exploration process, or a random generator of moves or rearrangements of the system elements in a configuration.
An annealing schedule of temperatures and defined time periods for which the system is to be evolved.
The basic idea is to go downhill most of the time, instead of always going downhill.


Simulated Annealing (cont.)
The performance of SA depends on the temperature schedule: if the temperature is lowered too quickly, the search can become trapped in a local minimum. Geman and Geman showed that a logarithmic schedule
T(k) = T(0) / log(1 + k)   (5.32)
guarantees convergence to the global minimum, but it is too slow for practical use.
A practical SA schedule instead decrements the temperature geometrically:
T(k+1) = γ T(k)   (5.33)
The decrementing factor γ should be close to unity, typically 0.8 to 0.99.


Simulated Annealing based global search algorithm
Step 1: Initialize the vector x to a random point in the feasible set.
Step 2: Select an annealing schedule for the parameter T, and initialize T to a sufficiently large number.
Step 3: Compute the perturbed point xp = x + Δx.
Step 4: Compute the change in the cost function, Δf = f(xp) − f(x).
Step 5: Use (5.29), associated with the Metropolis algorithm, to decide whether xp should be used as the new state of the system or the current state x kept.


Simulated Annealing based global search algorithm (cont.)
Step 6: Steps 3 through 5 are repeated until the system reaches equilibrium, which is determined when the number of accepted transitions becomes insignificant. Typically, Steps 3 through 5 are carried out a predetermined number of times.
Step 7: The temperature T is updated according to the annealing schedule specified in Step 2, and Steps 3 through 6 are repeated. The process can be stopped when the temperature T reaches zero or a predetermined small number.


Simulated Annealing (cont.)
Example 5.3: The TSP.
The optimization task is to determine the optimum sequence of cities that the salesperson is to follow on the trip.
The steps in applying SA to this problem:
Identify the state space of possible solutions: an ordered list of the cities on the sales trip. The number of possible different sequences is N!.
Specify the nature of the state perturbation: assume that a new solution is obtained by swapping the positions of two cities in the current solution.
Specify the cost function that quantifies the fitness of a proposed solution: the total distance traveled by the salesperson.


Simulated Annealing (cont.)
Example 5.3 (cont.): Twenty cities are placed at random. The figures show the initial solution and the final solution found by SA, together with the evolution of the cost.
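A compact sketch of Example 5.3 under the stated design choices: random cities, a two-city swap as the perturbation, total tour length as the cost, the geometric schedule of (5.33), and the metropolis_accept rule from the earlier sketch. All names and the parameter values (initial T, factor gamma, sweeps per temperature) are illustrative assumptions, not the book's settings.

```python
import math
import random

def tour_length(tour, cities):
    """Cost function: total closed-tour distance."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def sa_tsp(cities, T=10.0, gamma=0.95, sweeps=200):
    tour = list(range(len(cities)))
    random.shuffle(tour)                               # random initial solution
    cost = tour_length(tour, cities)
    while T > 1e-3:
        for _ in range(sweeps):
            i, j = random.sample(range(len(tour)), 2)  # perturb: swap two cities
            tour[i], tour[j] = tour[j], tour[i]
            new_cost = tour_length(tour, cities)
            if metropolis_accept(new_cost - cost, T):  # rule of (5.28)-(5.29)
                cost = new_cost
            else:
                tour[i], tour[j] = tour[j], tour[i]    # rejected: undo the swap
        T *= gamma                                     # geometric schedule, Eq. (5.33)
    return tour, cost

cities = [(random.random(), random.random()) for _ in range(20)]
print(sa_tsp(cities)[1])
```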


Boltzmann Machine
The Boltzmann machine is a parallel constraint-satisfaction network based on simulated annealing that uses stochastic neurons; because it stores patterns and uses feedback, it is a stochastic recurrent network.
The Boltzmann machine differs from the Hopfield network in two main respects:
The Boltzmann machine can have hidden neurons, whereas the Hopfield network has none.
The Boltzmann machine uses stochastic neurons, whereas the Hopfield network uses deterministic McCulloch-Pitts neurons.


Boltzmann Machine (cont.)
Like the Hopfield network, the Boltzmann machine has no self-feedback.
Processing units have bipolar states.
The neurons are selected randomly and one at a time for updating.
Constraints:
Strong constraints must be satisfied by any solution; the strong constraints are the rules.
Weak constraints carry a cost when violated; the Boltzmann machine solves problems expressed in terms of weak constraints.


Boltzmann Machine (cont.)
A stochastic neuron q fires (yq = +1) with probability
P(vq) = 1 / (1 + exp(−2 vq / T))   (5.34)
and takes yq = −1 with probability 1 − P(vq).   (5.35)
When vq = 0, the output yq is +1 or −1 with probability 0.5 each.
T is a pseudo-temperature; as T → 0, the stochastic neuron reduces to the deterministic neuron of the Hopfield network in (5.2).
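A minimal sketch of the stochastic neuron of (5.34) and (5.35); the function name is ours.

```python
import math
import random

def stochastic_neuron(v_q, T):
    """Fire (+1) with probability P(v_q) = 1/(1 + exp(-2 v_q / T)), else output -1."""
    p_fire = 1.0 / (1.0 + math.exp(-2.0 * v_q / T))   # Eq. (5.34); P = 0.5 when v_q = 0
    return 1 if random.random() < p_fire else -1       # Eq. (5.35)
```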


Boltzmann Machine (cont.)
Probability distribution function for a stochastic neuron firing, compared with the McCulloch-Pitts neuron activation function.
Like the Hopfield network, the Boltzmann machine has symmetric weights (wij = wji) and no self-feedback (wij = 0 for i = j). As T → 0, the stochastic neuron reduces to the deterministic McCulloch-Pitts activation function.


Boltzmann Machine (cont.)
The Boltzmann machine operates according to the Boltzmann-Gibbs distribution. The network consists of nv visible neurons and nh hidden neurons, with up to (nv + nh)(nv + nh − 1) connection weights.
Training in the unsupervised mode involves clamping patterns associated with the environment onto the visible neurons with the appropriate probabilities.
The supervised mode of training may involve a probabilistic correct response pattern for each of the input patterns.


Boltzmann Machine (cont.)
The energy of a global network configuration x is
E(x) = −(1/2) Σ_i Σ_{j≠i} w_ij x_i x_j + Σ_i θ_i x_i   (5.36)
The energy function can be written in vector form as
E(x) = −(1/2) x^T W x + θ^T x   (5.37)
In the Boltzmann machine learning cycle, a positive phase and a negative phase alternate, followed by synaptic weight adjustments. The state transition probability is given by
Pr(x_i → −x_i) = 1 / (1 + exp(ΔE_i / T))   (5.38)
where x_i denotes the ith neuron's output state and θ_i is the ith neuron's threshold.


Boltzmann Machine (cont.)
When neuron i flips its state (x_i → −x_i), the resulting change in the energy (5.36) is
ΔE_i = 2 x_i (Σ_{j≠i} w_ij x_j − θ_i) = 2 x_i v_i   (5.39)
If neuron i is currently off (x_i = −1) and flips on (x_i = +1), then from (5.39) the energy change is
ΔE_i = −2 v_i   (5.40)
and, substituting into (5.38), the probability of this transition is
Pr(x_i = −1 → x_i = +1) = 1 / (1 + exp(−2 v_i / T))   (5.41)


Boltzmann Machine (cont.)
Similarly, if neuron i is currently on (x_i = +1) and flips off (x_i = −1), the energy change is
ΔE_i = 2 v_i   (5.42)
and from (5.41) the probability of this transition is
Pr(x_i = +1 → x_i = −1) = 1 / (1 + exp(2 v_i / T))   (5.43)
Equations (5.42) and (5.43) are consistent with the general stochastic neuron of (5.34). A Boltzmann machine with n = nv + nh bipolar (+1/−1) neurons has 2^n possible global configurations. The Boltzmann machine employs simulated annealing: the pseudo-temperature T starts at a high value and is gradually lowered. The Boltzmann machine learning rule will be presented in a step-by-step algorithm.


Learning algorithm for the Boltzmann machine
Loop 1: At the outermost loop, the synaptic weights of the network are updated many times, to ensure convergence, according to
w_ij(k+1) = w_ij(k) + Δw_ij, with Δw_ij = μ (ρ+_ij − ρ−_ij), μ > 0   (5.45)
where ρ+_ij and ρ−_ij are the correlations ⟨x_i x_j⟩ measured with the visible neurons clamped (positive phase) and with the network free-running (negative phase), respectively.   (5.46)


Learning algorithm for the Boltzmann machine (cont.)
Loop 2: For each iteration of Loop 1, the correlations must be calculated both in the unclamped state and with the visible units clamped to each desired pattern.
To operate the Boltzmann machine, the system must be in thermal equilibrium at some positive temperature T > 0.
The state x of the system then fluctuates, and the correlations are measured by taking the time average of x_i x_j.
To obtain all the information necessary to compute the synaptic weight update rule in (5.45), this process must be carried out once with the visible neurons clamped in each of their states a for which Ra > 0, and once with the neurons unclamped.
The system must repeatedly reach thermal equilibrium before an average can be taken.


Learning algorithm for the Boltzmann machine (cont.)
Loop 3: For each of these averages in Loop 2, thermal equilibrium must be reached using a simulated annealing temperature schedule {T(k)}, starting from a sufficiently large initial temperature T(0) and then gradually decreasing the temperature.


Learning algorithm for the Boltzmann machine (cont.)
Loop 4: At each of the temperatures in Loop 3, many neurons must be sampled and updated according to the stochastic rule
x_i = +1 with probability P(v_i)   (5.47)
x_i = −1 with probability 1 − P(v_i), where P(v_i) = 1 / (1 + exp(−2 v_i / T))   (5.48)
and v_i is the activity level of neuron i, that is,
v_i = Σ_j w_ij x_j − θ_i   (5.49)
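Putting Loops 1 through 4 together, a schematic sketch of one learning step. The annealing schedule, the sweep counts, and the single-sample correlation estimate are simplified placeholders (a time average at equilibrium would be used in practice), and all names are our own.

```python
import numpy as np

def sample_correlations(W, theta, T_schedule, x, clamped_idx=(), sweeps=50, rng=None):
    """Anneal toward equilibrium (Loops 3-4), then estimate the correlations <x_i x_j>."""
    rng = rng if rng is not None else np.random.default_rng()
    free = [i for i in range(len(x)) if i not in clamped_idx]
    for T in T_schedule:                      # Loop 3: decreasing temperatures
        for _ in range(sweeps):               # Loop 4: stochastic updates, (5.47)-(5.49)
            i = rng.choice(free)
            v_i = W[i] @ x - theta[i]         # activity level, Eq. (5.49)
            p = 1.0 / (1.0 + np.exp(-2.0 * v_i / T))
            x[i] = 1 if rng.random() < p else -1
    return np.outer(x, x)                     # single-sample estimate of <x_i x_j>

def boltzmann_step(W, theta, pattern, visible_idx, T_schedule, mu=0.01):
    """One Loop-1 weight update: dW = mu * (rho_plus - rho_minus), Eqs. (5.45)-(5.46)."""
    n = W.shape[0]
    x_plus = np.random.choice([-1, 1], n)
    x_plus[list(visible_idx)] = pattern       # positive phase: visibles clamped
    rho_plus = sample_correlations(W, theta, T_schedule, x_plus, clamped_idx=visible_idx)
    x_minus = np.random.choice([-1, 1], n)    # negative phase: free-running
    rho_minus = sample_correlations(W, theta, T_schedule, x_minus)
    W += mu * (rho_plus - rho_minus)
    np.fill_diagonal(W, 0.0)                  # no self-feedback
    return W
```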


Overview of Temporal Feedforward Networks
Time delays allow a network to become a dynamic network.
The most common types of temporal networks:
Time-delay neural network (TDNN)
Finite impulse response (FIR) network
Simple recurrent network (SRN)
Real-time recurrent neural network (RTRNN)
Pipelined recurrent neural network (PRNN)
Nonlinear autoregressive moving average (NARMA) model


Simple Recurrent Network
The simple recurrent network (SRN), or Elman network:
A single hidden-layer feedforward neural network.
It has feedback connections from the outputs of the hidden-layer neurons to the input of the network.
It was developed to learn time-varying patterns or temporal sequences.


Simple Recurrent Network (cont.)
The upper portion of the network contains the context units. The function of these units is to replicate the hidden-layer output signals at the previous time step. The purpose of the context units is to deal with input pattern dissonance.


Simple Recurrent Network (cont.)
The feedback provides a mechanism within the network to discriminate between patterns occurring at different times that are otherwise essentially identical.
The weights of the context units are fixed. The other network weights can be adjusted in a supervised training mode by using the error backpropagation algorithm with momentum.
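A minimal sketch of one forward step of an Elman-style SRN: the context units hold the previous hidden state, and training (backpropagation with momentum, as described above) is omitted. All weight names are our own.

```python
import numpy as np

def srn_step(x, context, W_in, W_ctx, W_out, b_h, b_o):
    """One time step: the hidden layer sees the input plus the previous hidden state."""
    h = np.tanh(W_in @ x + W_ctx @ context + b_h)   # hidden layer with context feedback
    y = W_out @ h + b_o                             # network output
    return y, h                                     # h becomes the next step's context

# Usage: carry the context across a temporal sequence.
n_in, n_h, n_out = 3, 5, 2
rng = np.random.default_rng(0)
W_in, W_ctx = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))
W_out, b_h, b_o = rng.normal(size=(n_out, n_h)), np.zeros(n_h), np.zeros(n_out)
context = np.zeros(n_h)
for x in rng.normal(size=(4, n_in)):                # a short input sequence
    y, context = srn_step(x, context, W_in, W_ctx, W_out, b_h, b_o)
```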


Time-delay neural network
The TDNN uses time delays to perform temporal processing. It is a feedforward neural network with the inputs to the network successively delayed in time.
A temporal sequence for the input is established and can be expressed as
x(k) = [x(k), x(k−1), ..., x(k−p)]^T
The total number of weights required for the single neuron is (p+1)n, where n is the number of inputs and p is the number of delays per input.
This single-neuron model can be extended to a multilayer structure. The TDNN can be trained using a modified version of the standard backpropagation algorithm.
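A minimal sketch of the basic TDNN neuron with n inputs and p delays per input, matching the (p+1)n weight count stated above; the tanh activation is one common choice, and the names are ours.

```python
import numpy as np

def tdnn_neuron(x_seq, W, theta=0.0):
    """x_seq rows are x(k), x(k-1), ..., x(k-p) (shape (p+1, n)); W holds (p+1)*n weights."""
    v = np.sum(W * x_seq) - theta   # weighted sum over all taps and inputs
    return np.tanh(v)               # sigmoidal activation (one common choice)

n, p = 4, 3
x_seq = np.ones((p + 1, n))         # current input and its p delayed copies
W = np.full((p + 1, n), 0.1)        # (p+1)*n = 16 weights for this single neuron
print(tdnn_neuron(x_seq, W))
```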


Time-delay neural network (cont.)
Basic TDNN neuron with n inputs and p delays for each input.


Time-delay neural network (cont.)
Three-layer TDNN architecture for the recognition of phonemes.


Distributed Time-Lagged Feedforward Neural Networks
A DTLFNN is distributed in the sense that the element of time is distributed throughout the entire network.


Distributed Time-Lagged Feedforward Neural Networks (cont.)
The output of the linear combiner is given by
v(k) = w^T x(k)   (5.51)
where w = [w(0), w(1), ..., w(p)]^T and x(k) = [x(k), x(k−1), ..., x(k−p)]^T, so that
v(k) = Σ_{l=0..p} w(l) x(k − l)   (5.52)
The sum in (5.52) is referred to as a convolution sum. In the z-domain, we can write from (5.52)
V(z) = W(z) X(z)   (5.53)


Distributed Time-Lagged Feedforward Neural Networks (cont.)
Equivalently, as a transfer function,
H(z) = V(z) / X(z) = W(z)   (5.54)
or
H(z) = Σ_{l=0..p} w(l) z^{−l}   (5.55)
The output of the linear combiner in Fig. 5.19 for the qth neuron of the network is
v_q(k) = Σ_j Σ_{l=0..p} w_{qj}(l) x_j(k − l)   (5.56)


Distributed Time-Lagged Feedforward Neural Networks (cont.)
Each filtered input in Fig. 5.19, expressed in the time domain, is given by the convolution sum
s_{ji}(k) = Σ_{l=0..p} w_{ji}(l) x_i(k − l)   (5.57)
The output of the jth neuron in the network is given by
y_j(k) = f( Σ_i s_{ji}(k) − θ_j )   (5.58)
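A minimal sketch of (5.57) and (5.58): each synapse is an FIR filter (a convolution sum over the current and p delayed samples of its input), and the neuron applies a nonlinearity to the summed filter outputs minus the threshold. The names and the tanh choice for f are ours.

```python
import numpy as np

def fir_synapse(w, x_hist):
    """Eq. (5.57): s(k) = sum_l w[l] * x(k - l); x_hist = [x(k), x(k-1), ..., x(k-p)]."""
    return float(np.dot(w, x_hist))

def dtlfnn_neuron(W, X_hist, theta=0.0):
    """Eq. (5.58): y_j(k) = f( sum_i s_ji(k) - theta_j ), with f = tanh here."""
    s = sum(fir_synapse(W[i], X_hist[i]) for i in range(len(W)))
    return np.tanh(s - theta)

p, n_inputs = 2, 3
W = np.full((n_inputs, p + 1), 0.2)   # one FIR weight vector per input line
X_hist = np.ones((n_inputs, p + 1))   # each input's current and delayed samples
print(dtlfnn_neuron(W, X_hist))
```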


Distributed Time-Lagged Feedforward Neural Networks (cont.)
A DTLFNN is trained using a supervised learning algorithm: the temporal backpropagation algorithm, a temporal generalization of the standard backpropagation training algorithm. The appropriate network weight vector is updated according to
w_ji(k+1) = w_ji(k) + μ δ_j(k) x_i(k)   (5.59)


    Distributed Time-Lagged Feedforward neural networks (cont.)where In (5.60) ej(k) is the instantaneous error, and(5.60)