Seen through a real-time URL UV/PV aggregation case study:
'Real-Time Big Data Analysis'
Daum Communications, 유대은
moongtook@daumcorp.com, moongtook@hanmail.net, moongtook@gmail.com
Big Data Analysis
Batch vs Real Time
Query = Function (All Data)
Big data - Nathan Marz and James Warren, http://www.manning.com/marz/
http://www.slideshare.net/Hadoop_Summit/realtime-analytics-with-storm
Big data analytics - Batch (Hadoop)
MapReduce Job = Function (All Data)
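To make "Function (All Data)" concrete: a batch job recomputes the answer from scratch over the entire log every time it runs. The following is a minimal single-process Python sketch of such a MapReduce job, not the author's Hadoop code; it assumes each log record is a hypothetical (url, visitor_id) pair, with PV counting every hit and UV counting distinct visitors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # hypothetical log record: (url, visitor_id)

def map_phase(record: Record) -> Iterator[Tuple[str, str]]:
    """Map: emit (url, visitor_id) for every hit."""
    url, visitor_id = record
    yield url, visitor_id

def reduce_phase(url: str, visitor_ids: List[str]) -> Tuple[str, Tuple[int, int]]:
    """Reduce: PV = number of hits, UV = number of distinct visitors."""
    return url, (len(visitor_ids), len(set(visitor_ids)))

def run_batch_job(all_logs: Iterable[Record]) -> Dict[str, Tuple[int, int]]:
    """Recompute every URL's (PV, UV) from the complete data set."""
    groups: Dict[str, List[str]] = defaultdict(list)  # shuffle: group values by key
    for record in all_logs:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(url, ids) for url, ids in groups.items())

logs = [("/a", "u1"), ("/a", "u2"), ("/a", "u1"), ("/b", "u3")]
print(run_batch_job(logs))  # {'/a': (3, 2), '/b': (1, 1)}
```

The cost of this approach is latency: a new hit only shows up in the result after the next full run over all data.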
Big data analytics - Real Time (Storm)
Query = Function (Data Stream)
Watch the data stream and analyze it immediately, in real time
Fast, incremental algorithm
Topology = Function (Data Stream)
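In contrast with the batch job, a topology keeps running state and updates it as each event arrives. A minimal Python sketch of such an incremental algorithm, using the same hypothetical (url, visitor_id) records; the class name is illustrative and not a Storm API. After consuming the same events it agrees with a full recomputation.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

class IncrementalCounter:
    """Keeps per-URL PV/UV state and updates it one event at a time."""

    def __init__(self) -> None:
        self.pv: Dict[str, int] = defaultdict(int)
        self.visitors: Dict[str, Set[str]] = defaultdict(set)

    def update(self, url: str, visitor_id: str) -> None:
        # Constant (amortized) work per event: no re-scan of historical data
        self.pv[url] += 1
        self.visitors[url].add(visitor_id)

    def query(self, url: str) -> Tuple[int, int]:
        # .get avoids creating empty entries for URLs never seen
        return self.pv.get(url, 0), len(self.visitors.get(url, set()))

counter = IncrementalCounter()
for url, visitor in [("/a", "u1"), ("/a", "u2"), ("/a", "u1"), ("/b", "u3")]:
    counter.update(url, visitor)
print(counter.query("/a"))  # (3, 2)
```

PV needs only an integer per URL, while an exact UV needs the set of visitor ids seen so far, which is why UV is the more expensive half to maintain.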
Storm is a good infrastructure for processing data in real time as it flows through the data stream
https://github.com/nathanmarz/storm
http://www.infoq.com/presentations/Storm
spout - A spout is a source of streams.
bolt - A bolt consumes any number of input streams and does some processing.
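The spout/bolt vocabulary can be imitated with plain Python generators to show the data flow: the spout emits a stream, and each bolt consumes a stream and emits a new one. This is only a single-process analogy; the function names and the "url visitor_id" line format are hypothetical, and it uses no Storm API (real topologies are built against Storm's own API and run distributed across a cluster).

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (url, visitor_id)

def log_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: a source of a stream (here, raw log lines)."""
    for line in lines:
        yield line

def parse_bolt(stream: Iterable[str]) -> Iterator[Record]:
    """Bolt: consumes raw lines and emits (url, visitor_id) tuples."""
    for line in stream:
        parts = line.split()
        if len(parts) >= 2:  # drop malformed lines
            yield parts[0], parts[1]

def count_bolt(stream: Iterable[Record]) -> Dict[str, int]:
    """Terminal bolt: consumes tuples and accumulates PV per URL."""
    pv: Dict[str, int] = {}
    for url, _visitor in stream:
        pv[url] = pv.get(url, 0) + 1
    return pv

# Wiring the components together plays the role of the topology
topology_result = count_bolt(parse_bolt(log_spout(["/a u1", "/b u2", "bad", "/a u3"])))
print(topology_result)  # {'/a': 2, '/b': 1}
```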
Storm - cluster
Distributed real-time computation infrastructure
Case study: real-time URL UV/PV aggregation
Log collection
https://github.com/moongtook/kestrel_tail
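Judging by its name, kestrel_tail tails log files and pushes new lines into a Kestrel queue; its code is not reproduced here. The sketch below only illustrates the tail-to-queue idea under that assumption: the hypothetical function tail_once remembers a byte offset between calls, and Python's standard queue.Queue stands in for Kestrel.

```python
import queue

def tail_once(path: str, offset: int, out_queue: "queue.Queue[str]") -> int:
    """Push every complete line appended to `path` since `offset` onto `out_queue`.

    Returns the new offset for the next call. A partial last line (no trailing
    newline yet) is left in place and picked up on a later call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line or not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            out_queue.put(line.rstrip(b"\n").decode("utf-8"))
            offset = f.tell()  # advance only past fully read lines
    return offset
```

Calling tail_once periodically with the returned offset delivers each line exactly once, even when the writer is in the middle of a line.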
Log analysis
Fetching one log entry
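In Storm, pulling the next item from a source is the job of a spout's nextTuple() method. A minimal Python sketch of the same step, assuming the log lines sit in a queue and use a hypothetical tab-separated format (ISO timestamp, URL, visitor id); the actual log format is not shown in the slides.

```python
import queue
from datetime import datetime
from typing import NamedTuple, Optional

class LogEntry(NamedTuple):
    ts: datetime
    url: str
    visitor_id: str

def next_log(q: "queue.Queue[str]") -> Optional[LogEntry]:
    """Take one raw line off the queue and parse it; return None if nothing is waiting.

    Assumed (hypothetical) format: 'YYYY-MM-DDTHH:MM:SS<TAB>url<TAB>visitor_id'.
    A malformed line raises ValueError.
    """
    try:
        line = q.get_nowait()  # never block: the caller simply tries again later
    except queue.Empty:
        return None
    ts, url, visitor_id = line.split("\t")
    return LogEntry(datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S"), url, visitor_id)
```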
Incrementing a URL's UV/PV counts
Inside Redis
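The slides show the Redis contents only as a picture, so the commands and key layout below are assumptions: PV as an INCR counter and UV as a set of visitor ids (SADD, read back with SCARD), one pair of keys per URL and per minute/hour/day window. A small in-memory stand-in keeps the sketch runnable without a server; a redis-py client (redis.Redis) has incr, sadd and scard methods that accept these same calls and could be passed to count_hit instead.

```python
from collections import defaultdict
from datetime import datetime

class InMemoryRedis:
    """Illustrative stand-in with Redis-like INCR / SADD / SCARD semantics."""

    def __init__(self) -> None:
        self.counters = defaultdict(int)
        self.sets = defaultdict(set)

    def incr(self, key: str) -> int:
        self.counters[key] += 1
        return self.counters[key]

    def sadd(self, key: str, member: str) -> int:
        before = len(self.sets[key])
        self.sets[key].add(member)
        return len(self.sets[key]) - before  # 1 if newly added, like SADD

    def scard(self, key: str) -> int:
        return len(self.sets.get(key, ()))

# Time buckets per window; the minute format matches keys like 2012-10-24_20_01
WINDOWS = {"minutely": "%Y-%m-%d_%H_%M", "hourly": "%Y-%m-%d_%H", "daily": "%Y-%m-%d"}

def count_hit(r, url_key: str, ts: datetime, visitor_id: str) -> None:
    """Bump PV and record the visitor for UV in every window.

    The key layout '<url_key>:<window>:<bucket>:pv|uv' is hypothetical.
    """
    for window, fmt in WINDOWS.items():
        bucket = ts.strftime(fmt)
        r.incr(f"{url_key}:{window}:{bucket}:pv")
        r.sadd(f"{url_key}:{window}:{bucket}:uv", visitor_id)
```

UV for a window is then scard on that window's set, while PV is the counter value itself.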
Storing a URL's UV/PV counts
Cassandra column family
row key 1 | super column 1                | super column 2                | ...
          | column name 1 | column name 2 | column name 1 | column name 2 | ...
          | column value  | column value  | column value  | column value  | ...
row key 2 | super column 1                | super column 2                | ...
          | column name 1 | column name 2 | column name 1 | column name 2 | ...
          | column value  | column value  | column value  | column value  | ...
...
Henessy column family schema
row key                                     | ... | 20:01                                                                      | 20:02             | ...
                                            | ... | minutely_pv | minutely_uv | hourly_pv | hourly_uv | daily_pv | daily_uv | minutely_pv | ... | ...
6ed6a80a162365e78e2716d49508d974_2012-10-24 | ... | 212         | 202         | 5220      | 4576      | 233997   | 155723   | 151         | ... | ...
bc2ed9981fae01adda327bcd7e2a3576_2012-10-24 | ... | 388         | 383         | 9839      | 8163      | 597338   | 299751   | 364         | ... | ...
...
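The schema nests three levels: row key (URL and date), super column (minute), and columns (the six counters). A minimal Python sketch of that shape using nested dicts rather than a Cassandra client; the type alias and write_minute are hypothetical, and reading each minute as "that minute's counts plus the hour and day counts as of that minute" is our interpretation of the example values.

```python
from typing import Dict

# Hypothetical in-memory model of a super column family:
# row key -> super column (minute) -> column name -> value
ColumnFamily = Dict[str, Dict[str, Dict[str, int]]]

COLUMNS = ("minutely_pv", "minutely_uv", "hourly_pv", "hourly_uv", "daily_pv", "daily_uv")

def write_minute(cf: ColumnFamily, row_key: str, minute: str, values: Dict[str, int]) -> None:
    """Store one minute's snapshot as a super column under the URL/date row."""
    missing = [name for name in COLUMNS if name not in values]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    cf.setdefault(row_key, {})[minute] = {name: values[name] for name in COLUMNS}

cf: ColumnFamily = {}
row = "6ed6a80a162365e78e2716d49508d974_2012-10-24"
write_minute(cf, row, "20:01", dict(zip(COLUMNS, (212, 202, 5220, 4576, 233997, 155723))))
print(cf[row]["20:01"]["daily_uv"])  # 155723
```

One row per URL per day keeps each row bounded at 1,440 minute super columns.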
Row key: md5(reversed URL) + date
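A minimal Python sketch of the row-key rule. The slide does not define "reversed url", so reverse_url is a hypothetical reading that reverses the host's labels (www.example.com becomes com.example.www), as is common in Bigtable-style stores; a plain character reversal is another possible reading. The "_" separator follows the example keys in the schema table. Because the exact URL normalization is unknown, these keys are not expected to reproduce the hashes shown in the slides.

```python
import hashlib
from datetime import date
from urllib.parse import urlsplit

def reverse_url(url: str) -> str:
    """Hypothetical reading of 'reversed url': reverse the host labels, keep path and query.

    e.g. http://www.example.com/a?b=1 -> com.example.www/a?b=1
    """
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split("."))) if parts.hostname else ""
    rest = parts.path + (f"?{parts.query}" if parts.query else "")
    return host + rest

def row_key(url: str, day: date) -> str:
    """md5(reversed url) as 32 hex characters, then '_' and the ISO date."""
    digest = hashlib.md5(reverse_url(url).encode("utf-8")).hexdigest()
    return f"{digest}_{day.isoformat()}"
```

Hashing gives every key the same fixed length regardless of how long the URL is, and appending the date starts a fresh row for each URL every day.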
For search, aggregation, and ranking: only the content whose UV/PV changed during the last minute...
Also storing in Greenplum
Secondary Index Pattern
row key          | ... | 6ed6a80a162365e78e2716d49508d974_2012-10-24 | bc2ed9981fae01adda327bcd7e2a3576_2012-10-24 | ...
2012-10-24_20_01 | ... | null                                        | null                                        |
2012-10-24_20_02 | ... | null                                        | null                                        |
2012-10-24_20_03 | ... | null                                        | null                                        |
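In this pattern the minute is the row key and the changed URLs' row keys are stored as column names with null values, so finding "what changed in minute X" is a single row read. A minimal Python sketch with hypothetical names, using dicts in place of Cassandra; how the author's loader then copies those rows into Greenplum is not shown in the slides.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional

def minute_bucket(ts: datetime) -> str:
    """Index row key, e.g. 2012-10-24_20_01."""
    return ts.strftime("%Y-%m-%d_%H_%M")

class ChangedIndex:
    """Secondary-index sketch: minute row -> {url row key: None}.

    The column names carry the information; the values stay null.
    """

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Optional[str]]] = defaultdict(dict)

    def mark_changed(self, ts: datetime, url_row_key: str) -> None:
        # Idempotent: repeated hits on the same URL in the same minute add nothing
        self.rows[minute_bucket(ts)][url_row_key] = None

    def changed_in(self, ts: datetime) -> List[str]:
        """URL row keys whose UV/PV changed during the minute containing ts."""
        return sorted(self.rows.get(minute_bucket(ts), {}))
```

A downstream job can then read only changed_in(previous minute) and fetch just those URL rows, instead of scanning every URL.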
Fault-tolerant
A fault-tolerant system is a system that keeps operating normally even when some of its components fail. - Wikipedia
http://ko.wikipedia.org/wiki/장애_허용_시스템
http://en.wikipedia.org/wiki/Fault-tolerant_design
Human Fault-tolerant
Big data - Nathan Marz and James Warren, http://www.manning.com/marz/
http://strataconf.com/strata2013/public/schedule/detail/27610
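In Marz and Warren's book, human fault tolerance comes from keeping the raw data immutable and append-only and deriving every view from it by recomputation, so a bug in the view logic can be fixed and the view rebuilt. A minimal Python sketch of that idea; the class, the record shape, and the deliberately buggy view function are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]  # hypothetical raw event: (url, visitor_id)

class MasterDataset:
    """Append-only store of raw events: records are never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> Tuple[Event, ...]:
        return tuple(self._events)  # read-only snapshot

def buggy_pv(events) -> Dict[str, int]:
    # Bug: deduplicating (url, visitor) pairs yields UV, not PV
    return dict(Counter(url for url, _ in set(events)))

def fixed_pv(events) -> Dict[str, int]:
    return dict(Counter(url for url, _ in events))

def rebuild_view(master: MasterDataset, view_fn: Callable) -> Dict[str, int]:
    """A view is a pure function of the raw data, so a bad view is simply recomputed."""
    return view_fn(master.events())
```

Had the system stored only the running counters, the wrong values written by buggy_pv could not be corrected; with the raw events intact, switching to fixed_pv and recomputing repairs the view.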
Lambda architecture
Big data - Nathan Marz and James Warren, http://www.manning.com/marz/
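In the Lambda architecture a query merges a batch view (everything up to the last batch run) with a realtime view (everything since), and realtime results are discarded once the batch layer has caught up. A minimal Python sketch of the merge step for this use case, with hypothetical function names; note that PV merges by addition but UV does not, because a visitor who appears in both views must be counted once.

```python
from typing import Dict, Set

def query_pv(url: str, batch_pv: Dict[str, int], realtime_pv: Dict[str, int]) -> int:
    # PV is additive: hits counted by the batch layer plus hits seen since
    return batch_pv.get(url, 0) + realtime_pv.get(url, 0)

def query_uv(url: str,
             batch_visitors: Dict[str, Set[str]],
             realtime_visitors: Dict[str, Set[str]]) -> int:
    # UV is not additive: both views keep visitor sets and the query takes their union
    return len(batch_visitors.get(url, set()) | realtime_visitors.get(url, set()))

print(query_pv("/a", {"/a": 10}, {"/a": 3}))                          # 13
print(query_uv("/a", {"/a": {"u1", "u2"}}, {"/a": {"u2", "u3"}}))     # 3
```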
Twitter summingbird - https://speakerdeck.com/sritchie/summingbird-streaming-mapreduce-at-twitter
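Summingbird's approach is to write the aggregation logic once and run it on both Hadoop and Storm, relying on monoids: an associative "plus" with an identity element, so that partial results can be combined in any grouping. The following is a minimal Python sketch of that idea for PV/UV; it is not the Summingbird API, and the Stats, lift and summarize names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Iterable

@dataclass(frozen=True)
class Stats:
    """PV/UV summary forming a monoid: `plus` is associative and ZERO is its identity."""
    pv: int = 0
    visitors: FrozenSet[str] = frozenset()

    def plus(self, other: "Stats") -> "Stats":
        return Stats(self.pv + other.pv, self.visitors | other.visitors)

    @property
    def uv(self) -> int:
        return len(self.visitors)

ZERO = Stats()

def lift(visitor_id: str) -> Stats:
    """Turn one hit into a Stats value (the per-event step shared by both layers)."""
    return Stats(1, frozenset([visitor_id]))

def summarize(visitor_ids: Iterable[str]) -> Stats:
    """The same fold serves batch (all data at once) and streaming (chunk by chunk)."""
    return reduce(lambda acc, v: acc.plus(lift(v)), visitor_ids, ZERO)
```

Because plus is associative, merging a batch summary with a streaming summary gives the same result as summarizing all the events together, which is exactly the guarantee the Lambda merge needs.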
The End!