
Analysis and knowledge extraction of user behaviour and social media content for art and culture events


Page 1

Analysis & Knowledge Extraction of Online User Behaviour and Visual Content for Art and Culture Events

Marco Brambilla Tahereh Arabghalizi Behnam Rahdari

Marco Brambilla

Contacts: @marcobrambi, [email protected], http://datascience.deib.polimi.it

UNIVERSITY OF PITTSBURGH

Page 2

Agenda

Context

Method

• Pre-processing

• Topic analysis

• User clustering

• Multimedia: Images

o concepts vs. text extraction

o color schema and the main color pattern(s)

• Prediction of interests

Challenges & Conclusions

Page 3

Context

• Role of social media in our life

• Social media for cultural and artistic events

• Behaviour and content

• Multi-disciplinary collaboration on social media analysis and cultural heritage

• Collaboration: Politecnico di Milano, Musei di Brescia, University of Pittsburgh

Page 4

Research Questions

Topics of interest of visitors?

Categorization of users?

Demographics of visitors?

Engagement and online participation?

Relation between photos, time, location, text and the event?

Page 5

Approach

Domain-specific pipeline to profile social media users and content in cultural or art events

Page 6

Case Study

The Floating Piers by Christo and Jeanne-Claude

Iseo Lake, Italy

June 2016

Page 7

Case Study

Page 8
Page 9
Page 10

Case Study

• 17 million USD

• 220,000 floating blocks

• 1.5 million visitors in 16 days

Page 11

Pre-processing

Page 12

Data Extraction

• Using Instagram and Twitter APIs

• Extract relevant tweets/posts during the event

• Extract all relevant users

o who tweet/post directly

o who like, comment, retweet, etc.

• Extract all properties

o Textual: bio, tweet/post text, hashtag, etc.

o Quantitative: #followers, #followings, etc.

o Media: photos, metadata (geotag, …)

Page 13

Collected Data (from June 10th to July 30th)

              Twitter          Instagram
Items         14,062 tweets    30,256 posts
Users         23,916           94,666
  Authors      7,724           16,681
  Reacting    16,197           77,985

Page 14

Preprocessing

• Text normalization (NLP)

• Language identification and translation

• Gender detection

• Data cleansing

• Store clean and transformed data
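The text-normalization step can be illustrated with a minimal sketch. The slides do not specify the actual rules, so the function name `normalize`, the `STOP_WORDS` list, and the choice to drop URLs and @mentions while keeping hashtag words are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "at", "on", "in"}

def normalize(text):
    """Lowercase a post, drop URLs and @mentions, keep hashtag words, tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    text = text.replace("#", " ")               # keep the hashtag word itself
    tokens = re.findall(r"[^\W\d_]+", text)     # runs of letters (Unicode-aware)
    return [t for t in tokens if t not in STOP_WORDS]

tokens = normalize("Walking on #TheFloatingPiers at Lake Iseo! https://example.com @friend")
print(tokens)
```

Language identification, translation, and gender detection would normally rely on dedicated services or models and are not sketched here.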

Page 15

Time Distribution (Twitter)

Page 16

Time series – Instagram vs. Twitter

Page 17

Instagram Likes and Comments

Page 18

Geographical Distribution (Instagram)

[Maps: Italy; Lombardy Region; Iseo Lake]

Page 19

Data Analysis Process

1. Document Term Matrix (DTM)

2. Topic Extraction

3. Dimension Reduction

4. Cluster Analysis and Validation

5. Prediction

6. Media Analysis

7. Content Network Analysis

Page 20

Topics

Page 21

Document-term Matrix

A matrix that describes the frequency of terms that occur in a collection of documents

Rows: documents. Columns: terms.

           Art   Travel   Italy   Design   …
Post 1      0      1        1       0
Post 2      1      2        0       1
Post 3      0      0        1       0
Post 4      1      1        3       1
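A document-term matrix like the one on this slide can be built with a few lines of code. The sketch below assumes posts are already tokenized; the function name `document_term_matrix` and the toy posts are illustrative, chosen so the counts match the slide's example (columns come out in alphabetical order rather than the slide's order).

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    vocab = sorted({term for doc in docs for term in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

# Toy tokenized posts reproducing the counts shown on the slide.
posts = [
    ["travel", "italy"],
    ["art", "travel", "travel", "design"],
    ["italy"],
    ["art", "travel", "italy", "italy", "italy", "design"],
]
vocab, dtm = document_term_matrix(posts)
for row in dtm:
    print(row)
```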

Page 22

Topic Extraction

Latent Dirichlet Allocation (LDA): models each document as a mixture of topics, with a probability for each topic

Input: Document Term Matrix

Outputs: Topics, Topic Probabilities Matrix

           Topic 1   Topic 2   Topic 3   Topic 4   Topic 5   Topic 6   …
Post 1      0.19      0.16      0.27      0.14      0.11      0.13
Post 2      0.31      0.18      0.21      0.08      0.10      0.12
Post 3      0.25      0.24      0.20      0.17      0.09      0.05
Post 4      0.19      0.32      0.22      0.10      0.07      0.10
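To make the "documents as mixtures of topics" idea concrete, here is a small sketch of LDA fitted by collapsed Gibbs sampling. The slides do not say which LDA implementation or inference method was used, so the sampler, the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy posts are all assumptions. For simplicity it takes tokenized documents rather than a pre-built matrix.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic-probability row per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]   # tokens of doc d assigned to topic k
    topic_word = [{} for _ in range(n_topics)]   # occurrences of word w under topic k
    topic_total = [0] * n_topics                 # tokens assigned to topic k

    def bump(d, k, w, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] = topic_word[k].get(w, 0) + delta
        topic_total[k] += delta

    # Start from a random topic assignment for every token.
    assign = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, assign[d]):
            bump(d, k, w, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, assign[d][i], w, -1)     # take the token out of the counts
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k].get(w, 0) + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                assign[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                bump(d, assign[d][i], w, +1)

    # Smoothed estimate of each document's topic mixture; every row sums to 1.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

posts = [["art", "design", "art"], ["travel", "italy", "lake"], ["art", "italy", "travel"]]
theta = lda_gibbs(posts, n_topics=2)
```

Each row of `theta` plays the role of one row of the Topic Probabilities Matrix above.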

Page 23

Dimensionality Reduction

• Hundreds of topics extracted with LDA

• Using Principal Component Analysis (PCA) to extract a smaller set of linearly uncorrelated topics

[Figure: variance share and cumulative variance share per component; annotation: > 0.95]
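One common way to choose how many principal components to keep is to retain the smallest number whose cumulative variance share exceeds a threshold. The sketch below assumes that the "> 0.95" annotation refers to such a threshold; the function name `components_for_variance` and the component variances are hypothetical, not values from the study.

```python
def components_for_variance(variances, threshold=0.95):
    """Return (k, shares, cumulative): k is the smallest number of components
    whose cumulative variance share exceeds the threshold."""
    total = sum(variances)
    ordered = sorted(variances, reverse=True)
    shares = [v / total for v in ordered]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    for k, value in enumerate(cumulative, start=1):
        if value > threshold:
            return k, shares, cumulative
    return len(ordered), shares, cumulative

# Hypothetical component variances (eigenvalues of the covariance matrix).
k, shares, cumulative = components_for_variance([4.0, 3.0, 2.0, 0.6, 0.4])
print(k)
```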

Page 24

User Clustering

Page 25

Cluster Analysis

• Apply clustering algorithms over Topic Probabilities Matrix to cluster users

• Multiple data slices

• Multiple algorithms

o K-means

o Hierarchical

o DBSCAN

[Figure: users plotted along the Topic 1, Topic 2, and Topic 3 axes]
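Of the algorithms listed, K-means is the simplest to sketch. The version below is a plain implementation of Lloyd's algorithm over rows of topic probabilities; the function name `kmeans`, the random initialization, and the toy user vectors are assumptions for illustration and do not reflect the study's actual configuration.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on rows of topic probabilities; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical topic-probability rows for six users (two clearly separated groups).
users = [
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10],
    [0.10, 0.10, 0.80], [0.20, 0.10, 0.70], [0.10, 0.20, 0.70],
]
labels, centroids = kmeans(users, k=2)
```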

Page 26

Cluster Validity

• How to evaluate the “goodness” of the resulting clusters?

• Validation Measures

o Internal: e.g. Silhouette Coefficient, Dunn’s Index, Calinski-Harabasz index, etc.

o External: e.g. Entropy, Purity, Rand index, etc.
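As an illustration of one internal and one external measure, the sketch below computes the mean silhouette coefficient (which needs only the points and their cluster labels) and purity (which needs reference classes). The function names and the toy data are illustrative; Euclidean distance is an assumption.

```python
import math
from collections import Counter

def silhouette(points, labels):
    """Mean silhouette coefficient (internal measure); needs at least two clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def purity(labels, truth):
    """Purity (external measure): share of points in their cluster's majority class."""
    by_cluster = {}
    for l, t in zip(labels, truth):
        by_cluster.setdefault(l, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in by_cluster.values()) / len(labels)

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
labels = [0, 0, 1, 1]
print(silhouette(points, labels), purity(labels, ["a", "a", "b", "a"]))
```

A silhouette close to 1 indicates compact, well-separated clusters; purity of 1 means every cluster contains a single reference class.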

Page 27

User Clustering

Page 28

Cluster Labeling

Users’ Biography Word Clouds

• Travel Lovers

• Art Lovers

• Internet & Tech Lovers

Page 29

Word Network for Clusters

Page 30

Hierarchical Clustering

[Figure: hierarchical clustering of users, labeled Travel Lovers, Art Lovers, and Tech Lovers]

Page 31

Impact of Demographics

• Language

• Gender

Page 32

Prediction

Page 33

Prediction

Predict the category or the interest area of potential new users for similar cultural or art events in the future

Decision Trees

o Prepare Required Data

o Grow Decision Tree

o Extract rules from the tree

o Predict using test data

o Evaluate

Page 34

Extracted Rules (prediction rules)

Rule 1: if (0.36 < Bio_score < 0.37 OR Bio_score < 0.35) then Travel Lover

Rule 2: if (0.35 < Bio_score < 0.36 AND Status_count > 14.5) OR (Bio_score > 0.37 AND Language != Italian) then Art Lover

Rule 3: if (Bio_score > 0.37 AND Language = Italian) then Tech Lover

Otherwise: Not Interested

accuracy = 62%
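The extracted rules translate directly into a small classifier. The sketch below copies the thresholds exactly as stated on the slide and checks the rules in order; the function name `predict_interest` and the argument names (mapped from Bio_score, Status_count, and Language) are illustrative choices, and how the original tree treats values that fall exactly on a threshold is not specified.

```python
def predict_interest(bio_score, status_count, language):
    """Apply the slide's extracted rules in order; thresholds are copied as stated."""
    # Rule 1
    if 0.36 < bio_score < 0.37 or bio_score < 0.35:
        return "Travel Lover"
    # Rule 2
    if (0.35 < bio_score < 0.36 and status_count > 14.5) or (
        bio_score > 0.37 and language != "Italian"
    ):
        return "Art Lover"
    # Rule 3
    if bio_score > 0.37 and language == "Italian":
        return "Tech Lover"
    # No rule matched (includes values exactly on a threshold).
    return "Not Interested"

print(predict_interest(0.40, 5, "Italian"))
```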

Page 35

Decision Tree

Page 36

Image Analysis

Page 37

Only Instagram

(from June 10th to July 30th)

              Twitter          Instagram
Items         14,062 tweets    30,256 posts
Users         23,916           94,666
  Authors      7,724           16,681
  Reacting    16,197           77,985

Page 38

Used Instagram Filters

Page 39

People in Pictures

Page 40

Visitor Analytics

• Age

• Sex: 50.4% female, 49.6% male

• Race

Bias of the medium?

Page 41

Image content analysis

Concept extraction (DNN-based third-party service)

Comparison with hashtags / text

Image low-level feature analysis

Page 42

Object Extraction from Pictures

[Figure panels: Concepts in Pictures; Hashtags]

Users tend not to report the actual content of the photos in their textual descriptions/hashtags
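One simple way to quantify how far hashtags diverge from image content is a set-overlap score between the concepts extracted from a photo and the hashtags of the same post. The slides do not state how the comparison was measured, so the Jaccard similarity used here, the function name `concept_hashtag_overlap`, and the sample values are assumptions.

```python
def concept_hashtag_overlap(concepts, hashtags):
    """Jaccard similarity between image concepts and the post's hashtags."""
    concept_set = {c.lower() for c in concepts}
    hashtag_set = {h.lower().lstrip("#") for h in hashtags}
    if not concept_set and not hashtag_set:
        return 0.0
    return len(concept_set & hashtag_set) / len(concept_set | hashtag_set)

# Hypothetical concepts and hashtags for one post.
score = concept_hashtag_overlap(
    ["water", "people", "bridge", "lake"],
    ["#floatingpiers", "#christo", "#lake"],
)
print(round(score, 3))
```

A score near 0 for most posts would be consistent with the observation that users rarely describe what the photo actually shows.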

Page 43

Color Detection for Subject Identification

Main color shades among all photos
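A basic approach to finding the main color shades is to quantize each pixel's RGB value into a coarse grid and count the most frequent bins. The sketch below is one such low-level technique; the function name `dominant_colors`, the bin size, and the sample pixels are assumptions, and the slides do not describe the method actually used.

```python
from collections import Counter

def dominant_colors(pixels, levels=4, top=3):
    """Quantize RGB pixels into coarse bins and return the centers of the most frequent bins."""
    step = 256 // levels
    bins = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    # Map each bin index back to the RGB value at the center of the bin.
    return [
        tuple(i * step + step // 2 for i in idx)
        for idx, _count in bins.most_common(top)
    ]

# Hypothetical pixels: mostly orange, some blue, one near-black.
pixels = [(250, 150, 10)] * 5 + [(20, 60, 200)] * 3 + [(10, 10, 10)]
main = dominant_colors(pixels, top=2)
print(main)
```

In practice the pixel list would come from decoding each photo with an image library, which is left out to keep the example self-contained.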

Page 44

Confusion Matrix

Simple techniques “good enough”?

Objects or Colors?
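To compare two classifiers (for example, object-based versus color-based), summary scores can be read off their confusion matrices. The sketch below assumes rows are actual classes and columns are predicted classes; the function names and the matrix values are hypothetical and are not results from the study.

```python
def accuracy(confusion):
    """Accuracy: correctly classified samples (the diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def per_class_recall(confusion):
    """Recall per actual class: diagonal entry over the row total."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(confusion)]

# Hypothetical 2x2 confusion matrix (rows: actual, columns: predicted).
matrix = [[40, 10], [5, 45]]
print(accuracy(matrix), per_class_recall(matrix))
```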

Page 45

Ongoing Challenges

Page 46

Future Challenges of KE

Determining exact positioning based on perspective

Page 47

Future Challenges of KE

Network structures and their temporal evolution

[Figure labels: Max graph perturbation; Daily graph variations]

Page 48

Future Challenges

Real cross-disciplinarity (cultural heritage, humanities, social science)

No visitors for the cultural part of the event! (exhibition at the museum)

[Image: Exhibit]

Page 49

Conclusions

• (Sometimes) Simple methods work just fine

• Interesting profiling and behaviour detection

• Still far from cross-disciplinary approaches

Page 50

Contacts: Marco Brambilla, @marcobrambi, [email protected]

http://datascience.deib.polimi.it

http://www.marco-brambilla.com

Analysis of Online User Behaviour for Art and Culture Events

Marco Brambilla, Tahereh Arabghalizi, Behnam Rahdari

UNIVERSITY OF PITTSBURGH