BARATH MUTHU KUMAR   BLENU4CSE08023
RAVIKIRAN CH         BLENU4CSE08027
V SUBASHINI          BLENU4CSE08510
Guide:
Dr. Amudha J
Associate Professor
Amrita School of Engineering, Bangalore-35
Problem Statement
Detailed Design (Training)
Detailed Design (Testing)
Coding Guidelines
Implementation
Performance Evaluation
References
Conclusion
Given an image frame or a video, analyse it using the Improvised VAM to generate the salient region and find the target object.
[Figure: Detailed design (training). The input image passes through the bottom-up VAM module: pyramid construction, feature maps, conspicuity maps and the saliency map. A winner-take-all stage selects the salient region; the extracted features then go through feature selection and a decision-tree classifier. Selected feature types: SM, 90FM2, BYFM3, BYFM1.]
Extracted features:

Feature No. | Type
 1          | RGFM1
 2          | SM
 3          | RGFM3
 4          | BYFM4
 5          | 90FM2
 6          | INTFM1
 7          | INTFM5
 8          | BYFM3
 9          | 0FM1
10          | BYFM1
11          | 135FM3
[Figure: Detailed design (testing). A test image enters the top-down module, which computes only the required feature maps (SM, 90FM2, BYFM3, BYFM1); the classifier then detects the target object.]
IDE:
Using the Code::Blocks cross-platform IDE
Code::Blocks console project (.cbp)
Variable Naming Convention:
Names are relevant to the use of the variable.
For example, the variable that stores the original image read from the hard disk is named original_img.
Each variable is documented beside its declaration to state the data it holds.
Function Naming Convention:
Named according to functionality.
For example, the function that builds pyramids is named find_pyramids().
A separate function performs each modular task in the project.
Data Structures:
Used the data structures in the "cxcore.h" header of the OpenCV libraries.
Ex: "Mat" to store an image.
Made use of vectors and scalars in C++.
Predefined headers:
cv.h
highgui.h
cxcore.h
ml.h
User-defined files:
color.h
intensity.h
orientation.h
saliency.h
winner_takeall.h
color_feature.cpp
intensity_feature.cpp
orientation_feature.cpp
saliency.cpp
winner_takeall.cpp
Predefined functions used:
buildPyramid()
pyrUp()
absdiff()
minMaxLoc()
Function name: buildPyramid
Number of parameters: 3
Parameters:
1. Source image
2. Vector to store the different pyramid levels
3. Number of levels
Syntax:
buildPyramid(image, dest_vec, no_of_levels)
Output:
Images at the different pyramid levels
Function name: pyrUp
Number of parameters: 3
Parameters:
1. Source image
2. Variable to store the resized image
3. Destination size
Syntax:
pyrUp(image, dest, size)
Output:
Resized (upsampled) image
Function name: absdiff
Number of Parameters: 3
Parameters:
1. Image 1
2. Image 2
3. Destination to store difference of image1 and image2
Syntax:
absdiff(image1, image2, dest)
Output:
Difference image
Function name: minMaxLoc
Number of Parameters: 3
Parameters:
1. Image
2. Variable to store minimum intensity
3. Variable to store maximum intensity
Syntax:
minMaxLoc(image, &min, &max)
Output:
Minimum and Maximum intensities in an image
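The four routines above can be mimicked on a plain array to see the center-surround idea they implement. The following is a minimal sketch, not the project's code: the `Img` struct, the 2x2-average downsample and replication upsample are illustrative stand-ins for buildPyramid and pyrUp.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy single-channel "image" as a flat row-major array.
struct Img {
    int w, h;
    std::vector<double> px;
    double at(int x, int y) const { return px[y * w + x]; }
};

// Analogue of one buildPyramid step: halve each dimension by 2x2 averaging.
Img downsample(const Img& src) {
    Img dst{src.w / 2, src.h / 2, {}};
    dst.px.resize(dst.w * dst.h);
    for (int y = 0; y < dst.h; ++y)
        for (int x = 0; x < dst.w; ++x)
            dst.px[y * dst.w + x] =
                (src.at(2 * x, 2 * y) + src.at(2 * x + 1, 2 * y) +
                 src.at(2 * x, 2 * y + 1) + src.at(2 * x + 1, 2 * y + 1)) / 4.0;
    return dst;
}

// Analogue of pyrUp: bring a coarse level back to the fine size.
Img upsample(const Img& src) {
    Img dst{src.w * 2, src.h * 2, {}};
    dst.px.resize(dst.w * dst.h);
    for (int y = 0; y < dst.h; ++y)
        for (int x = 0; x < dst.w; ++x)
            dst.px[y * dst.w + x] = src.at(x / 2, y / 2);
    return dst;
}

// Analogue of absdiff: per-pixel |a - b| (the center-surround feature map).
Img absDiff(const Img& a, const Img& b) {
    Img dst{a.w, a.h, {}};
    dst.px.resize(a.px.size());
    for (size_t i = 0; i < a.px.size(); ++i)
        dst.px[i] = std::fabs(a.px[i] - b.px[i]);
    return dst;
}

// Analogue of minMaxLoc: minimum and maximum intensity in the map.
void minMax(const Img& img, double* mn, double* mx) {
    *mn = *mx = img.px[0];
    for (double v : img.px) {
        if (v < *mn) *mn = v;
        if (v > *mx) *mx = v;
    }
}
```

A pixel that differs from its neighbourhood survives the downsample/upsample round trip as a large absDiff value, which is exactly what makes it "salient".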
color_feature.cpp
Called through color_feature()
Finds the colour pyramids
Finds the RG and BY colour maps
Colour feature maps
Colour conspicuity map
color_feature.cpp
R = r − (g + b)/2
G = g − (r + b)/2
B = b − (r + g)/2
Y = (r + g)/2 − |r − g|/2 − b
RG_fmap = |(R(c) − G(c)) − (R(s) − G(s))|
BY_fmap = |(B(c) − Y(c)) − (B(s) − Y(s))|
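The opponent channels can be sketched per pixel as below. This follows the Itti-Koch opponent-colour definitions; the struct and function names are illustrative, not taken from the project's code.

```cpp
#include <cassert>
#include <cmath>

// Opponent colour channels for one pixel (Itti-Koch definitions).
// Inputs are the raw r, g, b values of the pixel.
struct Opponent { double R, G, B, Y; };

Opponent opponentChannels(double r, double g, double b) {
    Opponent o;
    o.R = r - (g + b) / 2.0;                           // red vs. green+blue
    o.G = g - (r + b) / 2.0;                           // green vs. red+blue
    o.B = b - (r + g) / 2.0;                           // blue vs. yellow
    o.Y = (r + g) / 2.0 - std::fabs(r - g) / 2.0 - b;  // yellow vs. blue
    return o;
}

// Center-surround colour feature value for one pixel:
// RG_fmap = |(R(c) - G(c)) - (R(s) - G(s))|
double rgFeature(const Opponent& c, const Opponent& s) {
    return std::fabs((c.R - c.G) - (s.R - s.G));
}
```

In the project this is applied over whole pyramid levels (center c, surround s), not single pixels, but the per-pixel arithmetic is the same.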
intensity_feature.cpp
Called through intensity_feature()
Finds the intensity pyramids
Intensity feature maps
Intensity conspicuity map
intensity_feature.cpp
I = (r + g + b)/3
Intensity_fmap = |I(c) − I(s)|
orientation_feature.cpp
Called through orientation_feature()
Finds the orientation pyramids
Finds the orientation feature maps
Uses a Gabor filter with a kernel size of 21x21
Feature maps are computed for 4 orientation angles: 0, 45, 90 and 135 degrees.
orientation_feature.cpp
The Gabor function has the following parameters:
λ -> wavelength of the sinusoidal factor
θ -> orientation angle
ψ -> phase offset
σ -> standard deviation of the Gaussian envelope
γ -> spatial aspect ratio

g(x, y) = exp(−(x’² + γ²·y’²) / (2σ²)) · cos(2π·x’/λ + ψ)

where
x’ = x cos θ + y sin θ
y’ = −x sin θ + y cos θ
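The Gabor kernel construction can be sketched directly from these parameters. The parameter values below are illustrative defaults, not the project's tuned settings.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal sketch of a Gabor kernel: a Gaussian envelope modulating a
// cosine carrier, rotated by theta. Returns a size x size matrix.
std::vector<std::vector<double>> gaborKernel(int size, double lambda,
                                             double theta, double psi,
                                             double sigma, double gamma) {
    const double PI = std::acos(-1.0);
    std::vector<std::vector<double>> k(size, std::vector<double>(size));
    int half = size / 2;
    for (int y = -half; y <= half; ++y) {
        for (int x = -half; x <= half; ++x) {
            double xp =  x * std::cos(theta) + y * std::sin(theta);  // x'
            double yp = -x * std::sin(theta) + y * std::cos(theta);  // y'
            double envelope = std::exp(-(xp * xp + gamma * gamma * yp * yp)
                                       / (2.0 * sigma * sigma));
            double carrier  = std::cos(2.0 * PI * xp / lambda + psi);
            k[y + half][x + half] = envelope * carrier;
        }
    }
    return k;
}
```

Convolving the intensity pyramid levels with four such kernels (θ = 0, 45, 90, 135 degrees) yields the four orientation feature maps.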
saliency.cpp
Uses the results from color_feature.cpp, intensity_feature.cpp and orientation_feature.cpp.
Finds the average of all the conspicuity maps to obtain the saliency map.
Saliency_map = (color_consp + int_consp + orient_consp)/3;
Training images:

Signboard class | No. of samples
Pedestrian SB   | 16
Bike SB         | 16
Crossing SB     | 16
Total           | 48

Testing images:

Signboard class | No. of samples
Pedestrian SB   | 6
Bike SB         | 6
Crossing SB     | 6
Total           | 18
Two classes in total:
1. Signboard (SB)
2. Non-signboard (NSB)
Every detected object must belong to one of the above classes.
Four categories for every classification:
1. True Positive
2. True Negative
3. False Positive
4. False Negative
Consider the following example:
A study evaluating a new test that screens people for a disease.
Each person taking the test will
1. either have the disease (sick class), or
2. not have the disease (nsick class).
The test result may be
1. positive -> indicating disease
2. negative -> no disease
True Positive OR True Negative:
The outcome matches the class the sample actually belongs to.
i.e. with respect to our example:
sick people correctly diagnosed as sick, or healthy people correctly diagnosed as healthy.
False Positive OR False Negative:
The outcome does not match the sample's actual class.
i.e. with respect to our example:
healthy people incorrectly diagnosed as sick, or vice versa.
Confusion Matrix:
• A 2D array showing all four possible classifications.
• Shown below (rows: actual class, columns: predicted class):

            SB              NSB
SB    True Positive   False Negative
NSB   False Positive  True Negative
Detection Rate:
Ratio of the number of objects correctly classified to the total number of detections.
Gives the efficiency of the system.
Detection Rate = (true +ve + true −ve) / total detections
where total detections = sum of all values in the confusion matrix
Sensitivity:
Relates to the system's ability to identify positive results.
Sensitivity = (true +ve) / (true +ve + false −ve)
Gives the probability of the classification being correct given that it is a signboard.
Specificity:
Relates to the system's ability to identify negative results.
Specificity = (true −ve) / (true −ve + false +ve)
Gives the probability of the classification being correct given that it is not a signboard.
Precision:
A measure of the system's exactness.
Proportion of the samples classified under class x that truly belong to class x.
Precision = (true +ve) / (true +ve + false +ve)
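All four metrics follow directly from the confusion-matrix entries. A minimal sketch, with hypothetical names (the struct and function are not from the project's code); sensitivity and specificity use the standard definitions that match the prose above:

```cpp
#include <cassert>
#include <cmath>

// The four evaluation metrics, computed from confusion-matrix counts:
// tp/tn = correct SB/NSB classifications, fp/fn = incorrect ones.
struct Metrics { double detectionRate, sensitivity, specificity, precision; };

Metrics evaluate(double tp, double tn, double fp, double fn) {
    double total = tp + tn + fp + fn;     // sum of all matrix entries
    Metrics m;
    m.detectionRate = (tp + tn) / total;  // correct / total detections
    m.sensitivity   = tp / (tp + fn);     // P(correct | signboard)
    m.specificity   = tn / (tn + fp);     // P(correct | non-signboard)
    m.precision     = tp / (tp + fp);     // true SB / all predicted SB
    return m;
}
```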
Computation Time:
Time taken to construct the required feature maps and to detect and classify the object in a given image.
Depends on the number of feature maps to be constructed.
The lower the computation time, the more efficient the system.
Completed Literature survey
Completed design
Completed 60% of implementation
• N.V.P. Kiran Yarlagadda, "Computational Attention Model for Traffic Sign Detection System", M.Tech thesis, July 2011.
• S. Frintrop, E. Rome and H. I. Christensen, "Computational Visual Attention Systems and their Cognitive Foundations", ACM Transactions, Vol. 7, No. 1, 2011.
• B. Alefs, G. Eschemann, H. Ramoser and C. Beleznai, "Road Sign Detection from Edge Orientation Histograms", IEEE Intelligent Vehicles Symposium, 2007.
• N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", IEEE Computer Vision and Pattern Recognition, 2005.
• N. Ouerhani, "Visual Attention: From Bio-inspired Modelling to Real-time Implementation", 2003.
• L. Itti, C. Koch and E. Niebur, "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis", IEEE Trans. on PAMI, Vol. 20, No. 11, 1998, pp. 1254–1259.
• http://ilab.usc.edu/bu/
• http://opencv.willowgarage.com
• http://www.websters-online-dictionary.org/