Computer Vision - Unict


Computer Vision, Academic Year 2010/2011, Prof. Sebastiano Battiato

Computer Vision

Master's Degree in Computer Science

(9 CFU)

Academic Year 2012/2013

Sebastiano Battiato

Tuesday - Thursday (10:00-13:00), Room 24


What is Computer Vision?

Vision is perhaps the most important sense that humans possess. It makes it possible to infer the three-dimensional world, to recognize and localize the objects in a scene, to perceive rapid changes in the environment, and so on.

Computer Vision is the discipline that studies how to enable computers to understand and interpret the visual information present in images or video.



Computer Vision

Of all the sensory abilities, vision is widely recognized as the one with the greatest potential. The capabilities of biological systems are formidable: the eye collects a band of electromagnetic radiation that has bounced off various surfaces and comes from different light sources, and the brain processes this information to form the picture of the scene as we perceive it.

As a definition, we could say that Computational Vision (CV), or Computer Vision, deals with the analysis of digital images by computer.


Computer Vision

The analysis aims to discover what is present in the scene and where. It does not deal with:

Image processing: enhancement, restoration, and compression of images. An image is processed to obtain another one that is in some sense “better”;

Pattern recognition: (extraction), identification, and classification of features in images.

Computer Vision ≠ Pattern Recognition

Computer Vision ≠ Image Processing



Goals of Computer Vision

Interpreting pixels

What we see / What a computer sees

Source: S. Narasimhan
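As a minimal sketch of the "what a computer sees" side of this slide, the snippet below treats a tiny grayscale image as a plain grid of integers. The 4x4 pixel values and the helper name `describe` are illustrative assumptions, not taken from the slides.

```python
# A tiny 4x4 grayscale "image": each entry is an intensity in 0..255.
# The values are made up for illustration (dark left half, bright right half).
image = [
    [12, 15, 200, 210],
    [10, 18, 205, 215],
    [11, 14, 198, 220],
    [9, 16, 202, 212],
]


def describe(img):
    """Return (height, width, min, max, mean) of a grayscale image."""
    pixels = [p for row in img for p in row]
    return len(img), len(img[0]), min(pixels), max(pixels), sum(pixels) / len(pixels)


h, w, lo, hi, mean = describe(image)
print(f"{h}x{w} image, intensities {lo}..{hi}, mean {mean:.1f}")
```

To the program, the "scene" is only this grid of numbers; any notion of objects has to be inferred from it.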


Source: “80 million tiny images” by Torralba et al.

Goals of Computer Vision



Vision as a tool

Real-time stereo (NASA Mars Rover)

Structure from motion (Pollefeys et al.)

Reconstruction from Internet photo collections (Goesele et al.)


Challenges: viewpoint variation

Michelangelo 1475-1564

slide credit: Fei-Fei, Fergus & Torralba



Challenges: illumination

image credit: J. Koenderink


Challenges: scale




Challenges: deformation

Xu, Beihong 1943



Challenges: occlusion

Magritte, 1957




Challenges: motion

slide credit: Lazebnik


Challenges: intra-class variation




Challenges: local ambiguity




Source: Rob Fergus and Antonio Torralba




Source: Rob Fergus and Antonio Torralba


Challenges or opportunities?

Beyond all this, however, we can exploit some intrinsic properties of the images themselves (the so-called cues).

Image source: J. Koenderink



Depth: linear perspective



Depth: “aerial” perspective




Shape: Texture gradient



Shape and lighting: Shading

Source: J. Koenderink



Position and lighting: Cast shadows

Source: J. Koenderink


Limit cases



Connections to other disciplines

Computer Vision

Image Processing

Machine Learning

Artificial Intelligence

Robotics

Cognitive science

Neuroscience

Computer Graphics


Optical character recognition (OCR)

Source: S. Seitz, N. Snavely

Digit recognition

yann.lecun.com

License plate readers

http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Sudoku grabber

http://sudokugrab.blogspot.com/

Automatic check processing








Biometrics

Fingerprint scanners on many new laptops and other devices

Face recognition systems now beginning to appear more widely

http://www.sensiblevision.com/

Source: S. Seitz


Biometrics

How the Afghan Girl was Identified by Her Iris Patterns

Source: S. Seitz


http://www.cl.cam.ac.uk/~jgd1000/afghan.html




Mobile visual search: Google Goggles



Face detection

Many new digital cameras now detect faces: Canon, Sony, Fuji, …

Source: S. Seitz

http://www.google.com/mobile/goggles/



Smile detection

Sony Cyber-shot® T70 Digital Still Camera

Source: S. Seitz


Face recognition: Apple iPhoto software

http://www.apple.com/ilife/iphoto/


http://www.sonystyle.com/webapp/wcs/stores/servlet/ProductDisplay?catalogId=10551&storeId=10151&productId=8198552921665200469&langId=-1








Automotive safety

Mobileye: Vision systems in high-end BMW, GM, Volvo models

Pedestrian collision warning

Forward collision warning

Lane departure warning

Headway monitoring and warning

Source: A. Shashua, S. Seitz


Vision-based interaction: Xbox Kinect

http://electronics.howstuffworks.com/microsoft-kinect.htm


http://www.mobileye.com/







3D from Projected Light

Picoprojector

Structured light

Low-cost webcam


R&D projects on Safety and Security

Goal:

Stereo camera: real-time monitoring of dangerous conflicts (cars, pedestrians, etc.)

Real-time traffic monitoring



New Frameworks

– Analyze the feedback of audiovisual advertising

– Integrate interactive multimedia content through natural interfaces


Computer Vision Goals

Build systems capable of making decisions starting from a description of the scene extracted from images/video;

Infer the 3D world from digital images;

Recognize objects, scenes, and context from digital images.

….




Why study Computer Vision?



Applications: The Computer Vision Industry (1)

Automobile driver assistance – Systems that warn automobile drivers of danger, provide adaptive cruise control, and give driver assistance.

Automobile traffic management – Systems for reading automobile license plates.

Film and Television – Systems for tracking objects in video or film action to provide enhanced broadcasts.

General purpose vision systems – Vision systems for object recognition and navigation. Applications include mobile robotics, grocery retail, and recognition from cell phone cameras.

Image search – Image retrieval based on content.

Industrial automation and inspection

– Automotive industry: Systems for vision-guided robotics.

– Electronics industry: Electronics inspection systems for component assembly and semiconductor manufacturing.

– Food and agriculture: Vision systems for inspecting and grading fruits and vegetables.

– Printing and textiles: Inspection for the printing and packaging industries.



Applications: The Computer Vision Industry (2)

Medical and biomedical – Systems that use real-time stereo vision to detect and track the pose of markers for surgical applications.

Pedestrian tracking – Systems for counting and tracking pedestrians using overhead cameras.

Safety monitoring – Systems that monitor swimming pools to warn of accidents and drowning victims.

Security – Vision systems for video surveillance, including tracking, object monitoring, and behavior analysis.

Biometrics – Systems for fingerprint recognition and biometric face recognition.

Three-dimensional modeling – Creation of texture-mapped 3-D models from a small number of photographs.

Video Games – Interactive advertising for projected displays that tracks human gestures.


Video Examples

Object Classification

Automatic Object Detection and Recognition

Pedestrian Detection

Pedestrian Detection in Crowds

Face Tracking

Body Tracking

People Counting in store

In/out People counting

Detection of scene in video

Detection of Actions in Video

Detection of independent motion in Crowds

3D city modelling from photos

3D bone classification and Reconstruction

3D from single photo

3D Object Modelling from images



Demos

Viewing photographs in a “virtual” 3D environment

– http://photosynth.net


Pitt Patt: Video Face Mining

http://facemining.pittpatt.com/play_video.php?S1E03
















Web Links

– The Computer Vision Home Page – http://www.cs.cmu.edu/~cil/vision.html

– Computer Vision Education – http://www.cved.org/

– The Computer Vision Industry – http://www.cs.ubc.ca/spider/lowe/vision.html

– CVOnLine – http://homepages.inf.ed.ac.uk/rbf/CVonline/


Tentative course syllabus (1/2)

The course aims to cover in depth the theories and techniques specifically devoted to computer vision, together with a range of applications.

The first part of the course will cover:

- Image formation models: camera calibration

- Filters and features: edges, lines, Hough transform

- Laplacian pyramids

- Corner detection (Harris, …)

- SIFT: theory and applications

- Beyond SIFT

- Segmentation techniques:

- Thresholding

- Seeded Region Growing

- Statistical Region Merging, …
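As an illustration of the simplest segmentation technique in the list, thresholding, here is a minimal pure-Python sketch. It picks a global threshold with Otsu's method, which is one common choice and an assumption here (the slides do not say which thresholding scheme the course uses). The function names and the 4x4 test image are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form class 0, pixels with value > t form class 1.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # sum of intensities in class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # class 0 still empty
        if w1 == 0:
            break     # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def apply_threshold(img, t):
    """Binary mask: 1 where the pixel is brighter than t, 0 elsewhere."""
    return [[1 if p > t else 0 for p in row] for row in img]


# Made-up test image: dark left half, bright right half.
image = [
    [12, 15, 200, 210],
    [10, 18, 205, 215],
    [11, 14, 198, 220],
    [9, 16, 202, 212],
]
t = otsu_threshold([p for row in image for p in row])
mask = apply_threshold(image, t)
print(t, mask)
```

A single global threshold only works when foreground and background intensities are well separated, as in this toy image; region-based methods such as seeded region growing address cases where they are not.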




Tentative course syllabus (2/2)

The second part will cover:

- Probabilistic models applied to vision

- Shape Modeling

- Face Detection and Recognition

Some case studies and applications

Content-Based Image Retrieval (CBIR)

Video Stabilization

The last part of the course is devoted to an in-depth "specialized" topic.


Computational Photography

Computational photography refers broadly to sensing strategies and algorithmic techniques that enhance or extend the capabilities of digital photography. The output of these techniques is an ordinary photograph, but one that could not have been taken by a traditional camera.

Camera 2.0 project

Stanford Computer Graphics Laboratory, Nokia Research Center Palo Alto Laboratory, Adobe Systems, Kodak, Hewlett-Packard, Walt Disney Company. Also in collaboration with F. Durand and W. Freeman of MIT. http://graphics.stanford.edu/projects/camera-2.0/






Recent Trends: FrankenCamera

An Experimental Platform for Computational Photography

[SIGGRAPH 2010][IEEE CGA 2010] by Levoy et al.

An open architecture and API have been designed and implemented for the so-called Frankencamera. It consists of a base hardware specification, a software stack based on Linux, and an API for C++. The architecture permits control and synchronization of the sensor and image processing pipeline at the microsecond time scale, as well as the ability to incorporate and synchronize external hardware like lenses and flashes.

http://graphics.stanford.edu/papers/fcam/


FrankenCamera results




FrankenCamera results

http://graphics.stanford.edu/papers/fcam/fcam.mov


Typical Imaging Pipeline (1)

Data coming from the sensor (in Bayer format) are first analyzed to collect statistics used to set the acquisition parameters (pre-acquisition), and are then processed to obtain, at the end of the pipeline, a compressed RGB image of the acquired scene (post-acquisition and camera applications).

[Figure: block diagram of the imaging pipeline. Labels: Real Scene, Lens, Filters, Sensor.]

Pre-Acquisition: Auto Exposure, Image Statistics, Auto Focus

Post-Acquisition: Color Matrixing, Sharpening, White Balance, Color Interpolation, Gamma Correction, Color Conversion

Camera Applications: Panoramic, Multi-Frame Resolution Enhancement, Red Eye Removal, Video Stabilization

Noise Reduction
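Two of the post-acquisition stages named in the pipeline, white balance and gamma correction, can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions, not the pipeline of any specific camera: the white balance uses the gray-world heuristic (my choice, not stated in the slides), the gamma value 2.2 is a conventional default, pixels are linear RGB triples in [0, 1], and the function names are hypothetical.

```python
def gray_world_white_balance(pixels):
    """Scale each channel so the three channel means become equal.

    Gray-world assumption: the average color of the scene is gray.
    Results are clipped to 1.0.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(1.0, p[c] * gains[c]) for c in range(3)) for p in pixels]


def gamma_correct(pixels, gamma=2.2):
    """Map linear intensities to display values: v -> v ** (1 / gamma)."""
    return [tuple(v ** (1.0 / gamma) for v in p) for p in pixels]


# Two made-up linear RGB pixels with a green color cast.
raw = [(0.2, 0.4, 0.1), (0.4, 0.8, 0.2)]
balanced = gray_world_white_balance(raw)
display = gamma_correct(balanced)
print(balanced)
print(display)
```

A real pipeline would first interpolate the Bayer data (color interpolation) and would also apply color matrixing, sharpening, and compression; those stages are omitted here.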




Typical Imaging Pipeline (2)

Camera application functionalities are not mandatory; they usually include solutions for panoramic stitching, red-eye removal, and video stabilization, and can be considered an added value.



Limits of traditional photography

Slides from Lazebnik
























Embedded Computer Vision

Implementation of ad-hoc technologies on consumer devices (digital cameras, smartphones)

Internet Computer Vision

Web-oriented CV solutions (scalability, copyright, privacy, etc.)


OpenCV

OpenCV (Open Source Computer Vision) is a library of functions for building computer vision solutions in real-time applications.

OpenCV is released under a BSD license; it is free for both academic and commercial use. The library has >500 optimized algorithms. It is used around the world, has >2M downloads and >40K people in the user group. Uses range from interactive art and mine inspection to stitching maps on the web and advanced robotics.

Link: http://opencv.willowgarage.com/wiki/











Books

E. Trucco, A. Verri, “Introductory Techniques for 3-D Computer Vision”, Prentice Hall, 1998

R. Szeliski, “Computer Vision: Algorithms and Applications”, Springer, 2010, http://szeliski.org/Book/

M. Shah, “Fundamentals of Computer Vision” (online), 1997

G. Bradski, A. Kaehler, “Learning OpenCV: Computer Vision with the OpenCV Library”, O'Reilly Media, 2008

R. Hartley, A. Zisserman, “Multiple View Geometry in Computer Vision”, 2004

D. A. Forsyth, J. Ponce, “Computer Vision: A Modern Approach”, Prentice Hall PTR, 2002

R. O. Duda, P. E. Hart, D. G. Stork, “Pattern Classification”, Wiley Interscience, 2001

C. M. Bishop, “Pattern Recognition and Machine Learning”, 2006

Gonzalez, Woods, “Elaborazione delle Immagini Digitali”, PBM, third edition, 2008



Exam format

Individual software project, to be agreed with the instructor.

Mid-term tests (at least one) that grant an exemption.

Oral examination, including a demo of the project


Useful information

Slides and miscellaneous material: www.dmi.unict.it/~battiato/CVision1213/CVision1213.htm

Forum

E-mail:

[email protected]

Office hours:

(see the website)



Course outline

Introduction

Camera Calibration

Imaging Pipeline/Computational Photography

Low Level Computer Vision

Edges, Lines, Texture, Corners

SIFT: Theory and Applications

Beyond SIFT

Segmentation techniques applied to digital images

Face Detection and Recognition

Shape Characterization/Modeling

Probabilistic models applied to vision

Applications

Video Stabilization

Tracking

…


Computer Vision

We distinguish between low-level and high-level CV.

The former deals with extracting certain physical properties of the visible environment, such as depth, three-dimensional shape, and object contours.

Low-level vision processes are typically parallel, spatially uniform, and relatively independent of the problem and of the a priori knowledge associated with particular objects.
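To illustrate what "spatially uniform" means for a low-level process, the sketch below applies the same 3x3 kernel at every interior pixel, with no knowledge of the objects in the scene. The horizontal Sobel kernel is used as an example of contour extraction; the helper name `filter3x3`, the border handling (borders left at 0), and the test image are assumptions made for illustration.

```python
# Horizontal Sobel kernel: responds to left-to-right intensity changes.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def filter3x3(img, kernel):
    """Correlate img with a 3x3 kernel at every interior pixel.

    The same operation is repeated at each location (spatially uniform),
    and each output pixel could be computed independently (parallel).
    Border pixels are left at 0 for simplicity.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                kernel[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out


# Made-up image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = filter3x3(step, SOBEL_X)
for row in edges:
    print(row)
```

The response is non-zero only at the pixels next to the step, which is how a filter of this kind exposes object contours.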



Computer Vision

Conversely, high-level vision deals with extracting the properties of shapes and spatial relations, and with recognizing and classifying objects. High-level processes are usually applied to a portion of the image and depend on the goal of the computation and on the a priori knowledge associated with the objects.




Typical problems (1)

Lighting conditions that produce a variation in the distribution of light intensity in the scene.

Rigid geometric transformations of the object (in order of increasing difficulty):

– roto-translations and scale changes in 2D (and in 3D).

Noise.

Gaps: a particular kind of noise consisting of missing elements in the image.

Occlusion.
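The 2D roto-translations and scale changes mentioned in this list can be written as p' = s·R(θ)·p + t. The following minimal sketch applies such a transformation to a set of points; the function name and the sample values are hypothetical, chosen only for illustration.

```python
import math


def similarity_transform(points, angle_deg, scale, tx, ty):
    """Rotate by angle_deg (counter-clockwise), scale, then translate by (tx, ty)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [
        (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
        for x, y in points
    ]


# Two made-up points on an object, transformed by a 90-degree rotation,
# a scale factor of 2, and a translation of (1, 1).
shape = [(1.0, 0.0), (0.0, 1.0)]
moved = similarity_transform(shape, angle_deg=90, scale=2.0, tx=1.0, ty=1.0)
print(moved)
```

A recognition method that is robust to these transformations must treat `shape` and `moved` as the same object: distances between points change only by the common factor `scale`, and angles are preserved.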


Typical problems (2)

Segmentation: partitioning the input data into distinct semantic entities (lines, regions, objects).

Indexing: performing an efficient search in a catalog of models.

Identification: recognizing the instance of an object in an image.

Non-rigid objects (scissors, human faces, ...). Recognizing them is complicated by the fact that their shape can vary.

Classification: recognizing that an object in an image belongs to a given class.



Tentative schedule

Camera calibration and basics of stereoscopy

OpenCV (Android), Kinect (SDK)

Imaging Pipeline

Low Level Vision

Mid Level Vision: segmentation techniques

etc.
