Big Data Analysis
Chin-Chih Chang 張欽智 [email protected]
Computer Science and Information Engineering
Chung Hua University
2014/03/24
1
Big Data Analysis
What is Big Data?
Why is Big Data important?
What to do with these data?
Example: A Recommender System Combining Social Networks for Tourist Attractions
2
What is Big Data?
Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.
This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data.
Big data in many sectors today will range from a few dozen terabytes (10^12 bytes) to multiple petabytes (10^15 bytes).
3
What is big data?
Big Data is not just about the size (volume) of data but also includes data variety and data velocity. Together, these three attributes (volume, variety, and velocity) form the three V’s of Big Data.
4
Data types
Structured data: This type describes data which is grouped into a relational schema (e.g., rows and columns within a standard database). The data's configuration and consistency allow it to respond to simple queries to arrive at usable information, based on an organization's parameters and operational needs.
5
Data Types
Semi-structured data: This is a form of structured data that does not conform to an explicit and fixed schema. The data is inherently self-describing and contains tags or other markers to enforce hierarchies of records and fields within the data. Examples include weblogs and social media feeds.
Unstructured data: This type of data consists of formats which cannot easily be indexed into relational tables for analysis or querying. Examples include images, audio and video files.
6
What is big data?
7
How much big data? Multiples of bytes

Decimal
Value    Metric
1000     kB  kilobyte
1000^2   MB  megabyte
1000^3   GB  gigabyte
1000^4   TB  terabyte
1000^5   PB  petabyte
1000^6   EB  exabyte
1000^7   ZB  zettabyte
1000^8   YB  yottabyte

Binary
Value    JEDEC          IEC
1024     KB  kilobyte   KiB  kibibyte
1024^2   MB  megabyte   MiB  mebibyte
1024^3   GB  gigabyte   GiB  gibibyte
1024^4   -   -          TiB  tebibyte
1024^5   -   -          PiB  pebibyte
1024^6   -   -          EiB  exbibyte
1024^7   -   -          ZiB  zebibyte
1024^8   -   -          YiB  yobibyte

Orders of magnitude of data
8
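The difference between the decimal and binary multiples is easy to check with a few lines of code. The following is a minimal sketch; the function names `unit_size` and `convert` are illustrative and do not come from the slides.

```python
# Decimal (SI) and binary (IEC) byte multiples, as listed in the tables above.
SI_UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
IEC_UNITS = ["KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one `unit` (e.g. "TB" or "TiB")."""
    if unit in SI_UNITS:
        return 1000 ** (SI_UNITS.index(unit) + 1)
    if unit in IEC_UNITS:
        return 1024 ** (IEC_UNITS.index(unit) + 1)
    raise ValueError(f"unknown unit: {unit}")


def convert(value: float, src: str, dst: str) -> float:
    """Convert `value` expressed in unit `src` into unit `dst`."""
    return value * unit_size(src) / unit_size(dst)


if __name__ == "__main__":
    # One tebibyte is roughly 10% larger than one terabyte.
    print(convert(1, "TiB", "TB"))
```

The gap between the two conventions grows with each step: about 2.4% at the kilobyte level and about 10% at the terabyte level.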
How much data?
9
[Figure: data stored per year, 2009 to 2020, in zettabytes (axis from 0 to 40); series: structured block-based data storage and unstructured file-based data storage]
How much data?
“We expect to create 12.6 exabytes of data every day in 2014, so much that 90% of the data in the world today has been created in the last two years alone.
This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few.
This data is ‘big data.’”
10
Big Data is everywhere!
Lots of data is being collected and warehoused:
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Social networks
Instant messaging
Internet of things
11
Types of Data
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data: Social Network, Semantic Web (RDF), …
Streaming Data: You can only scan the data once
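The single-scan constraint on streaming data means that statistics must be updated incrementally rather than computed from a stored dataset. As a minimal sketch of that idea, the code below keeps a running count, mean, and variance (Welford's method) while consuming a generator exactly once. The `sensor_readings` stream and its values are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def one_pass_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after a single scan."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream: once a generator is consumed, it cannot be replayed."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


if __name__ == "__main__":
    print(one_pass_stats(sensor_readings()))
```

Only three numbers are held in memory regardless of how long the stream is, which is the property that matters when the data cannot be stored and rescanned.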
12
Why is Big Data important?
Success Stories:
Netflix Movies
Supermarkets
…
13
What to do with these data?
Aggregation and Statistics: Data warehouse and OLAP
Indexing, Searching, and Querying: Keyword-based search; Pattern matching (XML/RDF)
Knowledge discovery: Data Mining; Statistical Modeling
14
What is Data Mining?
Discovery of useful, possibly unexpected, patterns in data
Non-trivial extraction of implicit, previously unknown and potentially useful information from data
Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
15
Data Mining Tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
Collaborative Filtering [Predictive]
16
Example: A Recommender System Combining Social Networks for Tourist Attractions
17
Outline
Abstract
Introduction
Related Work
System Design and Mechanism
System Implementation and Experiments
Experimental Results
Conclusion and Future Work
18
Abstract
In this paper we present a recommender system combining social networks for tourist attractions.
Three mechanisms are analyzed:
1. Using similarity among users and their trustability.
2. Using information collected from social networks.
3. Combination of similarity and social networks.
19
Introduction
A recommender system is a system that suggests things which users might be interested in after learning their preferences.
A recommender system can help users cope with the problem of information overload.
Social networks have become a common platform for people to share their thoughts and extend their friendships into a virtual world.
20
Introduction
There is high potential to enhance recommender systems by incorporating social network information.
But how to effectively use social network information is still a research topic.
A tourist information system will be convenient for those who are preparing to travel or are already on the road.
21
Introduction
Similar information overload could happen in these tourist information systems.
In this paper, we will present a tourist information system that combines recommender systems and social networks.
22
Related Work: Recommender System
A recommender system is used to help users find items they prefer faster and more accurately by suggesting the right things to them.
There are mainly four approaches for recommendation: content-based filtering, collaborative filtering, knowledge-based approaches, and hybrid approaches.
23
Related Work: Recommender System
Content-based filtering: The method recommends items that are similar to the ones that the user liked in the past.
Collaborative filtering: The method recommends items that are liked by users whose interests are similar to the user's.
Knowledge-based approaches: One example of this type of approach is to ask the user directly about her or his requirements. Items are then recommended based on the criteria provided by the user.
Hybrid approaches: The method is a hybrid of the above methods.
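To make the content-based idea concrete, the sketch below recommends attractions whose descriptive tags overlap with those of attractions a user already liked, using Jaccard similarity. The catalog, tag names, and the choice of Jaccard similarity are all assumptions made for illustration; the slides do not specify how item similarity is measured.

```python
from typing import Dict, List, Set

# Hypothetical catalog: attraction name -> descriptive tags.
CATALOG: Dict[str, Set[str]] = {
    "Old Street": {"food", "history"},
    "Night Market": {"food", "shopping"},
    "Lake Trail": {"nature", "hiking"},
    "Temple": {"history", "culture"},
    "Mountain Park": {"nature", "hiking", "views"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_content_based(liked: List[str], top_n: int = 2) -> List[str]:
    """Recommend unseen items whose tags resemble the tags of liked items."""
    profile: Set[str] = set().union(*(CATALOG[name] for name in liked))
    scored = [
        (jaccard(profile, tags), name)
        for name, tags in CATALOG.items()
        if name not in liked
    ]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


if __name__ == "__main__":
    print(recommend_content_based(["Lake Trail"]))
```

A user who liked only "Lake Trail" is offered only other nature and hiking attractions, which illustrates the over-specialization drawback of content-based filtering.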
24
Related Work: Recommender System
Comparison of Recommender Techniques
25
Recommendation technique | Advantages | Drawbacks
Content-based filtering | Effective in locating items that are relevant to the topic | Captures only certain aspects of the content; over-specialization
Collaborative filtering | Items are recommended based on users' ratings. | Rating coverage can be very sparse; new items would not be recommended; the algorithm is not very efficient.
Knowledge-based approaches | Does not rely on the existence of a purchase history. | Detailed knowledge about items might be required.
Hybrid approach | Efficient and more accurate | Not so simple.
Related Work: Social Network Sites
Social network sites are Web-based services which enable online social networks or relationships.
Social network sites are one type of social media, that is, any platform where people can create, share, and exchange their activities, views, interests, experiences, or information.
26
Related Work: Social Network Sites
Social media have become a part of our daily life.
It is hard not to notice people focusing on their mobile devices to use Facebook or LINE wherever they are.
User profiles, friends, and comments are three key components of social network sites.
27
Related Work: Social Network Sites
The number of social network users has been growing dramatically.
There are approximately 800 million users on Facebook. Some even call it the Facebook country.
A social recommendation utilizes a user's social network and related information for recommendation.
28
Related Work: Social Network Sites
A common technique for social recommendations is collaborative filtering.
It is based on two assumptions: people who are socially associated are more likely to share common interests, and users can be easily influenced by the friends they trust.
29
Related Work: Tourist Information Systems
A tourist information system is a system that provides travel guides, maps, and information on accommodation and transportation.
A system that can recommend tourist attractions will be very helpful to any tourist.
30
System Design and Mechanism
In our design we aim at building a tourist information system which lets users access the attraction information from an information kiosk.
The system is associated with Facebook.
Whenever an interface device is equipped with an RFID reader, users can log into the system with an RFID card, without typing their account and password.
31
System Design and Mechanism
The interactions between users and attraction information website are shown as follows.
32
System Design and Mechanism
The system operation is shown as follows:
33
System Design and Mechanism: System Operation
1. Facebook App interface is available to users.
2. Users can access Facebook App to share, like, comment on, and rate the attractions using their Facebook account. First-time users need to choose their interests and can register their RFID cards.
3. A Web server and a database management system (DBMS) are running on a server machine.
4. Users can directly log into the system with an RFID card if they have registered their RFID card.
34
Personalized Social Recommendation (PSR)
1. Acquire users’ appraisals of each attraction and their activities on the social network site.
2. Use collaborative filtering or keep track of activities on the social network site for recommendation.
3. Calculate the score for each attraction.
4. Rank attractions based on the score. If the scores are the same, then check the appraisal time: the evaluation done more recently obtains the higher rank.
5. Recommend the attractions with the top 3 scores and show the top-ranked attraction on the main page.
35
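Steps 4 and 5 of the PSR procedure amount to a sort with a tie-break followed by a cut-off. The following is a minimal sketch under the assumption that each attraction carries one score and one appraisal timestamp. The `ScoredAttraction` type, the scores, and the dates are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple


class ScoredAttraction(NamedTuple):
    name: str
    score: float
    appraised_at: datetime  # time of the most recent appraisal (assumed field)


def rank_attractions(items: List[ScoredAttraction], top_n: int = 3) -> List[ScoredAttraction]:
    """Sort by score, highest first; on equal scores the more recent appraisal wins."""
    ordered = sorted(items, key=lambda a: (a.score, a.appraised_at), reverse=True)
    return ordered[:top_n]


# Hypothetical scores and appraisal times, for illustration only.
EXAMPLE = [
    ScoredAttraction("A1", 0.8, datetime(2014, 3, 1)),
    ScoredAttraction("A2", 0.9, datetime(2014, 2, 1)),
    ScoredAttraction("A3", 0.8, datetime(2014, 3, 10)),
    ScoredAttraction("A4", 0.5, datetime(2014, 3, 20)),
]

TOP3 = rank_attractions(EXAMPLE)
MAIN_PAGE = TOP3[0]  # the top-ranked attraction is shown on the main page

if __name__ == "__main__":
    print([a.name for a in TOP3], MAIN_PAGE.name)
```

Sorting on the tuple (score, appraisal time) in descending order handles both rules at once: A3 and A1 share a score, and A3 ranks higher because it was appraised more recently.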
List of Recommendation Methods
Recommendation Method
1. Collaborative filtering based on users’ appraisal and trustability evaluations: Equations (1), (2), and (3).
2. Social recommendation based on users’ activities in social network sites: Equations (4) and (5).
3. Combination of 1 and 2: Equation (6).
36
Attraction appraisals of different users
Attraction  User0  User1  User2  User3  User4  User5
A1          10     8      4      10     7      10
A2          4      2      4      2      2      6
A3          8      4      6      8      8      10
A4          4      3      10     8      5      5
A5          5      2      8      10     4      10
37
Recommendation Methods: Collaborative Filtering
First calculate the average appraisal of the kth attraction from all users.
Then evaluate the difference between user j's appraisal of the kth attraction and the mean value.
The average trustability of user j is calculated using Equation (1).
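The first two steps can be checked directly against the appraisal table. The sketch below computes the per-attraction mean and each user's difference from it. Taking the absolute value of the difference is an assumption, and the mapping from these differences to a trustability value (Equation (1), with the constant C) is not attempted here.

```python
from statistics import mean
from typing import List

# Appraisals copied from the table above; columns are User0..User5.
APPRAISALS = {
    "A1": [10, 8, 4, 10, 7, 10],
    "A2": [4, 2, 4, 2, 2, 6],
    "A3": [8, 4, 6, 8, 8, 10],
    "A4": [4, 3, 10, 8, 5, 5],
    "A5": [5, 2, 8, 10, 4, 10],
}


def mean_appraisal(attraction: str) -> float:
    """Average appraisal of one attraction over all users."""
    return mean(APPRAISALS[attraction])


def differences(attraction: str) -> List[float]:
    """Absolute difference between each user's appraisal and the attraction mean."""
    avg = mean_appraisal(attraction)
    return [abs(rating - avg) for rating in APPRAISALS[attraction]]


if __name__ == "__main__":
    for name in APPRAISALS:
        print(name, round(mean_appraisal(name), 3), [round(d, 3) for d in differences(name)])
```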
38
Recommendation Methods: Collaborative Filtering
In Equation (1):
k indicates the kth attraction;
m indicates the total number of appraisals that user j gave;
C is a constant used to control the degree of difference; its default value is 1.5. The larger C is, the lower the trustability.
39
Recommendation Methods: Collaborative Filtering
Trustability of users
40
Attraction  User1  User2  User3  User4  User5
A1          0.922  0.132  0.420  0.559  0.410
A2          0.523  0.723  0.532  0.523  0.273
A3          0.198  0.523  0.730  0.723  0.273
A4          0.252  0.132  0.359  0.667  0.667
A5          0.112  0.482  0.191  0.296  0.182
Average     0.401  0.398  0.446  0.555  0.361
Recommendation Methods: Collaborative Filtering
Similarity matrix among users
41
j \ i  1    2    3    4    5    6    7    8    9    10
1      1    0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1
2      0.9  1    0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2
3      0.8  0.9  1    0.9  0.8  0.7  0.6  0.5  0.4  0.3
4      0.7  0.8  0.9  1    0.9  0.8  0.7  0.6  0.5  0.4
5      0.6  0.7  0.8  0.9  1    0.9  0.8  0.7  0.6  0.5
6      0.5  0.6  0.7  0.8  0.9  1    0.9  0.8  0.7  0.6
7      0.4  0.5  0.6  0.7  0.8  0.9  1    0.9  0.8  0.7
8      0.3  0.4  0.5  0.6  0.7  0.8  0.9  1    0.9  0.8
9      0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1    0.9
10     0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1
Recommendation Methods: Collaborative Filtering
In Equation (2):
Si,j is the similarity between user i and user j for an attraction;
the result is the average similarity between user i and user j;
n is the number of attractions that both user i and user j recommend.
42
Recommendation Methods: Collaborative Filtering
Average similarity between User 0 and other users
43
       User1  User2  User3  User4  User5
User0  0.760  0.660  0.780  0.860  0.800
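These averages can be reproduced from the appraisal table under two assumptions: that the similarity of two appraisals a and b is the similarity-matrix entry 1 - |a - b| / 10, and that Equation (2) is a plain mean over the n attractions both users appraised. Both are readings of the tables rather than formulas stated in the slides. The names `RATINGS`, `similarity`, and `average_similarity` are illustrative.

```python
from typing import Dict, List

# Appraisals of attractions A1..A5 per user, copied from the appraisal table.
RATINGS: Dict[str, List[int]] = {
    "User0": [10, 4, 8, 4, 5],
    "User1": [8, 2, 4, 3, 2],
    "User2": [4, 4, 6, 10, 8],
    "User3": [10, 2, 8, 8, 10],
    "User4": [7, 2, 8, 5, 4],
    "User5": [10, 6, 10, 5, 10],
}


def similarity(a: int, b: int) -> float:
    """Entry of the similarity matrix for appraisals a and b (assumed form)."""
    return 1 - abs(a - b) / 10


def average_similarity(user_i: str, user_j: str) -> float:
    """Mean per-attraction similarity over the attractions both users appraised."""
    pairs = list(zip(RATINGS[user_i], RATINGS[user_j]))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    for other in ["User1", "User2", "User3", "User4", "User5"]:
        print(other, round(average_similarity("User0", other), 3))
```

Rounded to three decimals, the results are 0.76, 0.66, 0.78, 0.86, and 0.8, which agree with the table.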
Recommendation Methods: Collaborative Filtering
In Equation (3), Ri,j is the average appraisal weighting of user i for each user j.
Average appraisal weighting of User 0 for each user j
44
              User1  User2  User3  User4  User5
User0 (R0,j)  0.305  0.263  0.348  0.477  0.289
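The tabulated weightings are consistent with multiplying each user's average similarity to User 0 by that user's average trustability. The sketch below assumes that product form for Equation (3). This is an inference from the numbers in the tables, not a formula stated in the slides.

```python
from typing import Dict

# Average similarity between User0 and each other user (from the table above).
AVG_SIMILARITY: Dict[str, float] = {
    "User1": 0.760, "User2": 0.660, "User3": 0.780, "User4": 0.860, "User5": 0.800,
}

# Average trustability of each user (last row of the trustability table).
AVG_TRUSTABILITY: Dict[str, float] = {
    "User1": 0.401, "User2": 0.398, "User3": 0.446, "User4": 0.555, "User5": 0.361,
}


def appraisal_weight(user: str) -> float:
    """Assumed form of Equation (3): average similarity times average trustability."""
    return AVG_SIMILARITY[user] * AVG_TRUSTABILITY[user]


WEIGHTS = {user: round(appraisal_weight(user), 3) for user in AVG_SIMILARITY}

if __name__ == "__main__":
    print(WEIGHTS)
```

Under this assumption User4 receives the largest weight, because that user is both the most similar to User 0 and the most trustable.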
Recommendation Methods: Social Network Activities
A user’s preference is evaluated using Equation (4).
The normalization of the user’s rating is calculated using Equation (5), where Ri is the appraisal of the kth attraction from user i.
P = R + S + L + I    (4)
(5)
45
Recommendation Methods: CF plus Social Recommendation
We then combine Equation (3) for collaborative filtering and Equation (4) for social recommendation into Equation (6), where each method is given the weight 0.5.
T = 0.5R + 0.5P (6)
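Equation (6) is a weighted average of the two scores. A minimal sketch follows, assuming R and P have already been brought to comparable scales. The parameter name `cf_weight` and the input values are hypothetical.

```python
def combined_score(r: float, p: float, cf_weight: float = 0.5) -> float:
    """Equation (6): T = 0.5 R + 0.5 P when cf_weight keeps its default of 0.5.

    r is the collaborative-filtering score and p is the social-recommendation
    score; cf_weight is the share given to collaborative filtering.
    """
    if not 0.0 <= cf_weight <= 1.0:
        raise ValueError("cf_weight must lie between 0 and 1")
    return cf_weight * r + (1.0 - cf_weight) * p


if __name__ == "__main__":
    # Hypothetical scores for one attraction.
    print(combined_score(0.477, 0.300))
```

Exposing the weight as a parameter makes it easy to experiment with mixes other than the equal weighting used in the slides.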
46
System Implementation and Experiments
Development environment
47
Hardware
CPU: Intel Core i5-560M, 2.67GHz
Memory: 4GB DDRIII
Network Interface Card: Atheros AR8131 PCI-E Gigabit Ethernet Controller
Software
OS: Windows 7 Enterprise
Development: Zend Studio 8.0.1, Apache
Programming Languages: PHP
SDK: Facebook SDK for PHP
Database: PostgreSQL 9.0
System architecture
48
Map of Web pages
49
Main page
50
Page of an attraction
51
Experiment Results
In our experiments there are a total of 1360 records based on 20 attractions and 68 participants (20 × 68 = 1360). We test three methods.
The better solution needs to include more attractions without being affected by low activity on social network sites.
52
Experiment Results: Collaborative Filtering
53
Experiment Results: Social Recommendation
54
Experiment Results: Collaborative Filtering and Social Recommendation
55
Conclusions and Future Work
In this paper we present a recommendation mechanism that takes user’s social network into consideration.
The system has three key features:
1. Social recommendation is integrated into the system.
2. Personalization is taken into account.
3. The system is practical, cost-effective, and expandable.
56
Conclusions and Future Work
In order to enhance the recommendation mechanism, more factors that could affect the recommendation should be investigated.
The other issue is to figure out more methods that can mine social network sites.
In the future, we can apply our design to other systems such as a museum information system.
57
58
Thank you for listening! Q & A