Page 1: Big Data Analysis

Big Data Analysis

Chin-Chih Chang 張欽智 [email protected]

Computer Science and Information Engineering

Chung Hua University

2014/03/24

1

Page 2: Big Data Analysis

Big Data Analysis

What is Big Data?
Why is Big Data important?
What to do with these data?
Example: A Recommender System Combining Social Networks for Tourist Attractions

2

Page 3: Big Data Analysis

What is Big Data?

Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data.

Big data in many sectors today will range from a few dozen terabytes (1 terabyte = 10^12 bytes) to multiple petabytes (1 petabyte = 10^15 bytes).

3

Page 4: Big Data Analysis

What is big data?

Big Data is not just about the size of data but also includes data variety and data velocity. Together, these three attributes form the three V’s of Big Data.

4

Page 5: Big Data Analysis

Data types

Structured data: This type describes data that is grouped into a relational schema (e.g., rows and columns within a standard database). The data's configuration and consistency allow it to respond to simple queries to arrive at usable information, based on an organization's parameters and operational needs.

5

Page 6: Big Data Analysis

Data Types

Semi-structured data: This is a form of structured data that does not conform to an explicit and fixed schema. The data is inherently self-describing and contains tags or other markers to enforce hierarchies of records and fields within the data. Examples include weblogs and social media feeds.

Unstructured data: This type of data consists of formats which cannot easily be indexed into relational tables for analysis or querying. Examples include images, audio and video files.
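The three data types can be illustrated with a small Python sketch. The column names, the JSON record, and the byte string are made up for illustration and do not come from the slides.

```python
import json

# Structured: a fixed schema, so every record has the same columns.
columns = ("user_id", "attraction", "rating")
row = (7, "A1", 10)
structured = dict(zip(columns, row))

# Semi-structured: self-describing tags; fields may differ from record to record.
feed_item = json.loads('{"user": "u7", "text": "Great view", "tags": ["hiking"]}')

# Unstructured: raw bytes with no record/field layout to query directly.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file

print(structured["rating"])        # value looked up by column name
print(feed_item.get("location"))   # optional field that this record lacks
print(len(image_bytes))            # only the size is known without decoding
```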

6

Page 7: Big Data Analysis

What is big data?

7

Page 9: Big Data Analysis

How much data?

9

[Figure: data stored per year, 2009 to 2020, in zettabytes (vertical axis 0 to 40); series: structured block-based data storage, unstructured file-based data storage]

Page 10: Big Data Analysis

How much data?

“We expect to create 12.6 exabytes of data every day in 2014, so much that 90% of the data in the world today has been created in the last two years alone.

This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few.

This data is ‘big data.’”

10

Page 11: Big Data Analysis

Big Data is everywhere!

Lots of data is being collected and warehoused:
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Social networks
Instant messaging
Internet of things

11

Page 12: Big Data Analysis

Type of Data

Relational data (tables/transactions/legacy data)
Text data (Web)
Semi-structured data (XML)
Graph data: social networks, Semantic Web (RDF), …
Streaming data: you can only scan the data once
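The single-scan constraint on streaming data can be sketched as follows. This is a minimal illustration, assuming a hypothetical stream of numeric readings; `running_stats` and `sensor_readings` are names invented here, not part of the slides.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, max) after one pass; no element is stored or revisited."""
    count = 0
    mean = 0.0
    largest = float("-inf")
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update of the mean
        if x > largest:
            largest = x
    return count, mean, largest


def sensor_readings() -> Iterator[float]:
    # Hypothetical stream: a generator can be consumed only once.
    yield from [3.0, 5.0, 4.0, 8.0]


print(running_stats(sensor_readings()))  # (4, 5.0, 8.0)
```

Because the statistics are updated per element, memory use stays constant no matter how long the stream is.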

12

Page 13: Big Data Analysis

Why is Big Data important?

Success stories: Netflix movies, supermarkets, …

13

Page 14: Big Data Analysis

What to do with these data?

Aggregation and statistics: data warehouse and OLAP

Indexing, searching, and querying: keyword-based search, pattern matching (XML/RDF)

Knowledge discovery: data mining, statistical modeling

14

Page 15: Big Data Analysis

What is Data Mining?

Discovery of useful, possibly unexpected, patterns in data

Non-trivial extraction of implicit, previously unknown and potentially useful information from data

Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

15

Page 16: Big Data Analysis

Data Mining Tasks

Classification [Predictive]

Clustering [Descriptive]

Association Rule Discovery [Descriptive]

Sequential Pattern Discovery [Descriptive]

Regression [Predictive]

Deviation Detection [Predictive]

Collaborative Filtering [Predictive]

16

Page 17: Big Data Analysis

Example: A Recommender System Combining Social Networks for Tourist Attractions

17

Page 18: Big Data Analysis

Outline

Abstract
Introduction
Related Work
System Design and Mechanism
System Implementation and Experiments
Experimental Results
Conclusion and Future Work

18

Page 19: Big Data Analysis

Abstract

In this paper we present a recommender system combining social networks for tourist attractions.

Three mechanisms are analyzed:
1. Using similarity among users and their trustability.
2. Using information collected from social networks.
3. Combining similarity and social networks.

19

Page 20: Big Data Analysis

Introduction

A recommender system is a system that suggests things which users might be interested in after learning their preferences.

A recommender system can help users cope with the problem of information overload.

Social networks have become a common platform for people to share their thoughts and extend their friendships into a virtual world.

20

Page 21: Big Data Analysis

Introduction

There is high potential to enhance recommender systems by incorporating social network information.

But how to effectively use social network information is still a research topic.

A tourist information system is convenient for those who are preparing to travel or are already on the road.

21

Page 22: Big Data Analysis

Introduction

Similar information overload could happen in these tourist information systems.

In this paper, we present a tourist information system that combines recommender systems and social networks.

22

Page 23: Big Data Analysis

Related Work: Recommender System

A recommender system helps users find the items they prefer faster and more accurately by suggesting the right things to them.

There are mainly four approaches for recommendation: content-based filtering, collaborative filtering, knowledge-based approaches, and hybrid approaches.

23

Page 24: Big Data Analysis

Related Work: Recommender System

Content-based filtering: The method recommends items that are similar to the ones that the user liked in the past.

Collaborative filtering: The method recommends items that are likely to be used by those who have interests similar to the user's.

Knowledge-based approaches: One example of this type of approach is to ask the user directly about her or his requirements. Items are then recommended based on the criteria the user provides.

Hybrid approaches: The method combines the methods above.
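As an illustration of content-based filtering, the sketch below recommends the attractions whose tags overlap most with those of attractions the user already liked. The tags, the Jaccard overlap measure, and the function names are assumptions chosen for this example; the slides do not specify how content similarity is measured.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two tag sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical attraction descriptions (not from the slides).
tags = {
    "A1": {"temple", "history"},
    "A2": {"beach", "surfing"},
    "A3": {"museum", "history"},
    "A4": {"night market", "food"},
}


def content_based(liked: list, k: int = 2) -> list:
    """Rank attractions the user has not liked yet by similarity to the liked ones."""
    profile = set().union(*(tags[a] for a in liked))
    candidates = [a for a in tags if a not in liked]
    ranked = sorted(candidates, key=lambda a: jaccard(profile, tags[a]), reverse=True)
    return ranked[:k]


print(content_based(["A1"]))  # "A3" ranks first because it shares the "history" tag
```

Collaborative filtering differs in that it would ignore the tags and look at the ratings of other users instead.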

24

Page 25: Big Data Analysis

Related Work: Recommender System

Comparison of Recommender Techniques

Recommendation technique | Advantages | Drawbacks
Content-based filtering | Effective in locating items that are relevant to the topic. | Capturing only certain aspects of the content; over-specialization.
Collaborative filtering | The items are recommended based on user’s rating. | The coverage of rating could be very sparse; the new items would not be recommended; algorithm is not so efficient.
Knowledge-based approaches | It does not rely on the existence of a purchase history. | Detailed knowledge about items might be required.
Hybrid approach | Efficient and more accurate. | Not so simple.

25

Page 26: Big Data Analysis

Related Work: Social Network Sites

Social network sites are Web-based services which enable online social networks or relationships.

Social network sites are one type of social media, that is, any platform where people can create, share, and exchange their activities, views, interests, experiences, or information.

26

Page 27: Big Data Analysis

Related Work: Social Network Sites

Social media have become a part of our daily life.

It is hard not to notice people focusing on their mobile devices to use Facebook or LINE, no matter where they are.

User profiles, friends, and comments are three key components of social network sites.

27

Page 28: Big Data Analysis

Related Work: Social Network Sites

The number of social network users has been growing dramatically.

There are approximately 800 million users on Facebook. Some even call it "Facebook country".

A social recommendation uses a user's social network and related information for recommendation.

28

Page 29: Big Data Analysis

Related Work: Social Network Sites

The number of social network users has been growing dramatically.

There are approximately 800 million users on Facebook. Some even call it "Facebook country".

A common technique for social recommendations is collaborative filtering.

It is based on two assumptions: people who are socially associated are more likely to share common interests, and users can be easily influenced by the friends they trust.

29

Page 30: Big Data Analysis

Related Work: Tourist Information Systems

A tourist information system is a system that provides travel guides, maps, and information on accommodation and transportation.

A system that can recommend tourist attractions will be very helpful to any tourist.

30

Page 31: Big Data Analysis

System Design and Mechanism

In our design we aim at building a tourist information system which lets users access the attraction information either from an information kiosk.

The system is associated with Facebook.

Whenever an interface device is equipped with an RFID reader, users can log into the system with an RFID card, without typing their account and password.

31

Page 32: Big Data Analysis

System Design and Mechanism

The interactions between users and the attraction information website are shown as follows.

32

Page 33: Big Data Analysis

System Design and Mechanism

The system operation is shown as follows:

33

Page 34: Big Data Analysis

System Design and Mechanism: System Operation

1. A Facebook App interface is available to users.

2. Users can access the Facebook App to share, like, comment on, and rate the attractions using their Facebook accounts. First-time users need to choose their interests and can register their RFID cards.

3. A Web server and a database management system (DBMS) run on a server machine.

4. Users who have registered an RFID card can log into the system directly with it.

34

Page 35: Big Data Analysis

Personalized Social Recommendation (PSR)

1. Acquire users’ appraisals of each attraction and their activities on the social network site.

2. Use collaborative filtering or keep track of activities on the social network site for recommendation.

3. Calculate the score for each attraction.

4. Rank attractions based on the score. If the scores are the same, check the appraisal time: the more recent appraisal obtains the higher rank.

5. Recommend the attractions with the top 3 scores and show the top-ranked attraction on the main page.

35
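Steps 4 and 5 can be sketched as a sort with a tie-break on appraisal time. This is a minimal illustration: the `Scored` record, the scores, and the timestamps are hypothetical, and how the score itself is computed is left to the recommendation methods that follow.

```python
from datetime import datetime
from typing import List, NamedTuple


class Scored(NamedTuple):
    attraction: str
    score: float
    appraised_at: datetime  # time of the most recent appraisal


def recommend(items: List[Scored], top_n: int = 3) -> List[str]:
    # Higher score first; on equal scores the more recent appraisal ranks higher.
    ranked = sorted(items, key=lambda s: (s.score, s.appraised_at), reverse=True)
    return [s.attraction for s in ranked[:top_n]]


# Hypothetical scores and appraisal times.
items = [
    Scored("A1", 0.82, datetime(2014, 3, 1)),
    Scored("A2", 0.40, datetime(2014, 3, 5)),
    Scored("A3", 0.82, datetime(2014, 3, 9)),
    Scored("A4", 0.65, datetime(2014, 2, 20)),
]

top3 = recommend(items)
print(top3)     # ['A3', 'A1', 'A4']: A3 beats A1 on the more recent appraisal
print(top3[0])  # the attraction shown on the main page
```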

Page 36: Big Data Analysis

List of Recommendation Methods

  Recommendation Method

1 Collaborative filtering based on users’ appraisals and trustability evaluations: Equations (1), (2), and (3).

2 Social recommendation based on users’ activities on social network sites: Equations (4) and (5).

3 Combination of 1 and 2: Equation (6).

36

Page 37: Big Data Analysis

Attraction appraisals of different users

Attraction User0 User1 User2 User3 User4 User5

A1 10 8 4 10 7 10

A2 4 2 4 2 2 6

A3 8 4 6 8 8 10

A4 4 3 10 8 5 5

A5 5 2 8 10 4 10

37

Page 38: Big Data Analysis

Recommendation Methods Collaborative filtering

First, calculate the average appraisal of the kth attraction over all users.

Then evaluate the difference between the appraisal of the kth attraction from user j and this mean value.

The average trustability of user j is calculated using Equation (1).
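The first two steps can be sketched with the appraisal table on page 37. This is only an illustration: it assumes the mean is taken over all six users (User0 to User5) and that the difference is an absolute difference. The trustability formula of Equation (1) is not reproduced here.

```python
# Appraisals from the table on page 37; list positions are User0..User5.
appraisals = {
    "A1": [10, 8, 4, 10, 7, 10],
    "A2": [4, 2, 4, 2, 2, 6],
    "A3": [8, 4, 6, 8, 8, 10],
    "A4": [4, 3, 10, 8, 5, 5],
    "A5": [5, 2, 8, 10, 4, 10],
}


def mean(values):
    return sum(values) / len(values)


# Step 1: average appraisal of each attraction (assumed: over all six users).
averages = {k: mean(v) for k, v in appraisals.items()}

# Step 2: difference between each user's appraisal and that average
# (assumed: absolute difference).
differences = {
    k: [abs(a - averages[k]) for a in v] for k, v in appraisals.items()
}

print(round(averages["A1"], 3))   # 8.167
print(differences["A5"])          # [1.5, 4.5, 1.5, 3.5, 2.5, 3.5]
```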

38

Page 39: Big Data Analysis

Recommendation Methods Collaborative filtering

In Equation (1):

k indicates the kth attraction;
m indicates the total number of appraisals that user j gave;
C is a constant used to control the degree of difference; its default value is set to 1.5. The larger C is, the lower the trustability.

39

Page 40: Big Data Analysis

Recommendation Methods Collaborative filtering

Trustability of users

40

Attraction User1 User2 User3 User4 User5

A1 0.922 0.132 0.420 0.559 0.410

A2 0.523 0.723 0.532 0.523 0.273

A3 0.198 0.523 0.730 0.723 0.273

A4 0.252 0.132 0.359 0.667 0.667

A5 0.112 0.482 0.191 0.296 0.182

Average 0.401 0.398 0.446 0.555 0.361

Page 41: Big Data Analysis

Recommendation Methods Collaborative filtering

Similarity matrix among users

41

i\j 1 2 3 4 5 6 7 8 9 10

1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1

2 0.9 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2

3 0.8 0.9 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3

4 0.7 0.8 0.9 1 0.9 0.8 0.7 0.6 0.5 0.4

5 0.6 0.7 0.8 0.9 1 0.9 0.8 0.7 0.6 0.5

6 0.5 0.6 0.7 0.8 0.9 1 0.9 0.8 0.7 0.6

7 0.4 0.5 0.6 0.7 0.8 0.9 1 0.9 0.8 0.7

8 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.9 0.8

9 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.9

10 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Page 42: Big Data Analysis

Recommendation Methods Collaborative filtering

In Equation (2):

Si,j is the similarity between user i and user j for an attraction;
its mean over the n attractions is the average similarity between user i and user j;
n is the number of attractions that both user i and user j recommend.

42

Page 43: Big Data Analysis

Recommendation Methods Collaborative filtering

Average similarity between User 0 and other users

43

  User1 User2 User3 User4 User5

User0 0.760 0.660 0.780 0.860 0.800
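The averages above can be reproduced from the appraisal table on page 37 under one reading of the similarity matrix on page 41: rows and columns are appraisal values from 1 to 10, and each entry equals 1 - |i - j| / 10. That reading is inferred from the tabulated numbers and is not stated on the slides. A minimal sketch:

```python
# Appraisals of A1..A5 per user, from the table on page 37.
appraisals = {
    "User0": [10, 4, 8, 4, 5],
    "User1": [8, 2, 4, 3, 2],
    "User2": [4, 4, 6, 10, 8],
    "User3": [10, 2, 8, 8, 10],
    "User4": [7, 2, 8, 5, 4],
    "User5": [10, 6, 10, 5, 10],
}


def similarity(a: int, b: int) -> float:
    # Assumed rule behind the similarity matrix: 1 on the diagonal,
    # dropping by 0.1 for each point of difference between two appraisals.
    return 1 - abs(a - b) / 10


def average_similarity(u: str, v: str) -> float:
    # Mean of the per-attraction similarities over the attractions both users rated.
    pairs = list(zip(appraisals[u], appraisals[v]))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


for user in ("User1", "User2", "User3", "User4", "User5"):
    print(user, round(average_similarity("User0", user), 3))
# User1 0.76, User2 0.66, User3 0.78, User4 0.86, User5 0.8
```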

Page 44: Big Data Analysis

Recommendation Methods Collaborative filtering

In Equation (3), Ri,j is the average appraisal weighting of user i for each user j.

Average appraisal weighting of User 0 for each user j

44

  User1 User2 User3 User4 User5

User0 R0,j 0.305 0.263 0.348 0.477 0.289
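The tabulated weightings agree, to three decimals, with the product of the average similarity (page 43) and the average trustability (last row of the table on page 40). The product form below is inferred from those numbers and is an assumption; Equation (3) may be defined differently.

```python
# Values copied from the tables on pages 43, 40, and 44.
avg_similarity = {"User1": 0.760, "User2": 0.660, "User3": 0.780, "User4": 0.860, "User5": 0.800}
avg_trustability = {"User1": 0.401, "User2": 0.398, "User3": 0.446, "User4": 0.555, "User5": 0.361}
tabulated = {"User1": 0.305, "User2": 0.263, "User3": 0.348, "User4": 0.477, "User5": 0.289}

# Assumed form: weighting = average similarity * average trustability.
weighting = {u: avg_similarity[u] * avg_trustability[u] for u in avg_similarity}

for user, value in weighting.items():
    print(user, round(value, 3), tabulated[user])
```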

Page 45: Big Data Analysis

Recommendation Methods: Social Network Activities

A user’s preference is evaluated using Equation (4).

The normalization of the user’s rating is calculated using Equation (5), where Ri is the appraisal of the kth attraction from user i.

P = R + S + L + I    (4)

(5)

45

Page 46: Big Data Analysis

Recommendation Methods: CF plus Social Recommendation

We then combine Equation (3) for collaborative filtering and Equation (4) for social recommendation into Equation (6), where each method is given a weight of 0.5.

T = 0.5R + 0.5P    (6)
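Equation (6) is an equal-weight blend of the two scores. The sketch below generalizes the weight as a parameter, which is an assumption made for illustration; the slides fix both weights at 0.5. The input values are hypothetical.

```python
def combined_score(r: float, p: float, weight_cf: float = 0.5) -> float:
    """T = w * R + (1 - w) * P; with w = 0.5 this is Equation (6)."""
    return weight_cf * r + (1 - weight_cf) * p


# Hypothetical collaborative-filtering score R and social score P.
print(combined_score(0.5, 0.25))  # 0.375
```

Keeping the weight as a parameter makes it easy to test how sensitive the ranking is to the balance between the two methods.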

46

Page 47: Big Data Analysis

System Implementation and Experiments

Development environment

Hardware
CPU: Intel Core i5-560M, 2.67GHz
Memory: 4GB DDRIII
Network Interface Card: Atheros AR8131 PCI-E Gigabit Ethernet Controller

Software
OS: Windows 7 Enterprise
Development: Zend Studio 8.0.1, Apache
Programming Languages: PHP
SDK: Facebook SDK for PHP
Database: PostgreSQL 9.0

47

Page 48: Big Data Analysis

System architecture

48

Page 49: Big Data Analysis

Map of Web pages

49

Page 50: Big Data Analysis

Main page

50

Page 51: Big Data Analysis

Page of an attraction

51

Page 52: Big Data Analysis

Experiment Results

In our experiments there are a total of 1,360 records (20 attractions × 68 participants). We test 3 methods.

The better solution needs to include more attractions without being affected by low activity on social network sites.

52

Page 53: Big Data Analysis

Experiment Results: Collaborative Filtering

53

Page 54: Big Data Analysis

Experiment Results: Social Recommendation

54

Page 55: Big Data Analysis

Experiment Results: Collaborative Filtering and Social Recommendation

55

Page 56: Big Data Analysis

Conclusions and Future Work

In this paper we present a recommendation mechanism that takes the user’s social network into consideration.

The system has three key features:
1. Social recommendation is integrated into the system.
2. Personalization is taken into account.
3. The system is practical, cost-effective, and expandable.

56

Page 57: Big Data Analysis

Conclusions and Future Work

In order to enhance the recommendation mechanism, more factors that could affect the recommendation should be investigated.

The other issue is to figure out more methods that can mine social network sites.

In the future, we can apply our design to other systems such as a museum information system.

57

Page 58: Big Data Analysis

58

Thanks for listening! Q & A

