Scaling Big Data Mining Infrastructure: The Smart Protection Network Experience
黃振修 (Chris Huang), SPN Proactive Cloud-Based Malware Interception Technology Architect



Page 2

About Me

• Architect, SPN proactive cloud-based malware interception technology
• Architect, SPN Hadoop computing infrastructure
• Speaker, Hadoop in Taiwan 2013
• Active member of Hadoop.TW

9/10/2013 Confidential | Copyright 2013 TrendMicro Inc. 2

Page 3

The Journey to Big Data

Page 4

Yesterday

• ~40 Hadoop nodes
• ~15 service/user accounts
• 3 teams
• <50 TB storage
• <100 jobs per day

Page 5

Today

• ~200 Hadoop nodes
• ~130 service/user accounts
• 11 teams
• ~500 TB storage
• >16,000 jobs per day

Page 6

One MapReduce job submitted every 5.4 seconds
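The 5.4-second figure follows from the daily job count on the previous slide. A minimal Python check, assuming submissions are spread evenly over 24 hours (real submissions are bursty); `seconds_between_jobs` is a hypothetical helper name:

```python
# Average spacing between MapReduce submissions, assuming jobs are spread
# evenly across a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def seconds_between_jobs(jobs_per_day: float) -> float:
    """Return the mean interval, in seconds, between job submissions."""
    if jobs_per_day <= 0:
        raise ValueError("jobs_per_day must be positive")
    return SECONDS_PER_DAY / jobs_per_day


if __name__ == "__main__":
    print(f"1 job every {seconds_between_jobs(16_000):.1f} s")  # 1 job every 5.4 s
```

Because the slide reports more than 16,000 jobs per day, 5.4 seconds is an upper bound on the average gap.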

Page 7

Why?

Raw Data → Actionable Intelligence

Page 8

Collaboration in the underground

Page 14

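Internet threats are growing explosively

New Unique Malware Discovered

Malware variants, spam, downloads from unknown sources, and other Internet threats evade detection by traditional security systems and keep growing explosively, posing a serious security risk.

1M unique malware samples every month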

Page 15

Reality Check

[Chart: New unique threats per hour (worldwide estimate*), 2007–2011; values shown: 2,400 and 12,600; a NEW threat every 0.28 seconds]

[Chart: Threats found in enterprises (real-world data from 150+ assessments*): network worms, data-stealing malware, IRC bots, targeting malware; shares shown: 42%, 56%, 77%, 100%. Axes: complexity, danger. Trends: dangerous risks, skyrocketing volume, avoiding detection]

• 52% of companies failed to report or remediate a cyber breach in 2011. (SAIC, 2011)

• Two new pieces of malware are created every second. (Trend Micro, 2012)

• A cyber intrusion occurs every 5 minutes. (US-CERT, 2012)
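The "NEW threat every 0.28 seconds" callout is consistent with reading 12,600 as the 2011 hourly figure (an assumption about the chart layout): 3,600 s ÷ 12,600 ≈ 0.286 s, which the slide truncates to 0.28. A minimal Python check; `seconds_per_threat` is a hypothetical helper name:

```python
# Convert an hourly threat count into the average gap between new threats.
SECONDS_PER_HOUR = 60 * 60  # 3,600 s


def seconds_per_threat(threats_per_hour: float) -> float:
    """Mean seconds between new threats, assuming a uniform arrival rate."""
    if threats_per_hour <= 0:
        raise ValueError("threats_per_hour must be positive")
    return SECONDS_PER_HOUR / threats_per_hour


if __name__ == "__main__":
    print(f"{seconds_per_threat(12_600):.4f} s")  # 0.2857 s
```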

Page 16

The traditional approach is no longer sufficient!

Page 17

Big Data Exploration

Page 18

A new approach to cyber-threat solutions

• Web Crawler
• Trend Micro Endpoint Protection
• Trend Micro Mail Protection
• Trend Micro Web Protection
• Honeypot
• CDN / xSP
• Researcher Intelligence

3+ billion sensors worldwide

Page 19

SPN: Smart Protection Network

Collects · Protects · Identifies

Big Data Analytics (data mining, machine learning, modeling, correlation)

Daily stats:
• 7.2 TB of data correlated
• 1B IP addresses
• 90K malicious threats identified
• 100M+ good files

Page 20

SPN High Level Architecture

• Data Sourcing: CDN/xSP Log, Honeypot, SPN Feedback
• Receiver; Trend Message Exchange (Message Bus)
• Hadoop Distributed File System (HDFS); HBase; MapReduce; Ad-hoc Query (Pig); Oozie
• Service Delivery: MySPN Platform, Solr Cloud, API Server/Portal, Service Platform, APP 1, APP 2
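The architecture decouples data sources from storage: records pass through a Receiver and the Trend Message Exchange bus before landing in HDFS. The toy Python sketch below only illustrates that decoupling pattern and is not Trend Micro's implementation; the function names are hypothetical, an in-memory `queue.Queue` stands in for the message bus, and the returned batches stand in for files flushed to HDFS.

```python
import queue
from typing import Iterable, List

# Hypothetical stand-ins: queue.Queue plays the role of the message bus, and the
# returned batches play the role of files that a writer would flush to HDFS.


def receive(records: Iterable[str], bus: "queue.Queue[str]") -> int:
    """Receiver: publish each incoming record to the bus; return how many were sent."""
    count = 0
    for record in records:
        bus.put(record)
        count += 1
    return count


def drain_to_batches(bus: "queue.Queue[str]", batch_size: int) -> List[List[str]]:
    """Consumer: pull everything currently on the bus and group it into batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches: List[List[str]] = []
    current: List[str] = []
    while True:
        try:
            current.append(bus.get_nowait())
        except queue.Empty:
            break
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # flush the final, partially filled batch
        batches.append(current)
    return batches


if __name__ == "__main__":
    bus: "queue.Queue[str]" = queue.Queue()
    receive(["log-1", "log-2", "log-3", "log-4", "log-5"], bus)
    print(drain_to_batches(bus, batch_size=2))
    # [['log-1', 'log-2'], ['log-3', 'log-4'], ['log-5']]
```

Batching before writing matters because HDFS handles a smaller number of large files far better than many small ones.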

Page 21

MySPN Ecosystem

[Diagram labels: Portal & API (single entry-point); SPN Infrastructure; APT KB Service; TopCVE Service; APT KB; VE DB; FB Logs; Census; MySPN Market Place; Service Platform; SSO; New App; OPS RD / Team; Monitor SDK; All My Guard; Threat Connect; Dashboard; Service Catalog; Profile Alert; Dispatcher; Access; Login; Trender; Customer; Need; Solution; Develop; Implement; Operate; Publish; solutions backed by a Data Catalogue]

Page 22

SPN Solution Architecture

• File · Web / URL · Email · Domain · IP
• File Reputation Service · Web Reputation Service · Email Reputation Service
• Sourcing → Processing & Analysis → Validate & Create Solution → Quality Assurance → Solution Distribution → Solution Adoption
• SPN Correlation
• Customer Smart Protection
• Community Intelligence (feedback loop)

Page 23

Big Data Case Study

Page 24

Case Study: Web Reputation Services

[Diagram: Internet, Web Server, SPN Cloud; steps shown: 1. Intercept URL, 4. Access page]

200K+ new URLs created every day

Page 25

8+ billion URLs processed daily

User Traffic / Sourcing (Trend Micro Products / Technology, CDN vendor) → Rating Server for Known Threats → Unknown & Prefilter → Page Download → Threat Analysis

8 billion/day → 4.8 billion/day (40% filtered) → 860 million/day (82% filtered) → 99.98% filtered → 25,000 malicious URLs/day

Technology: CDN Cache, High Throughput Web Service, Hadoop Cluster, Web Crawling, Machine Learning, Data Mining
(Technology · Process · Operation)

Block a malicious URL within 15 minutes of it going online!
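The first two stage volumes on the Web Reputation funnel can be reproduced from the stated filter rates: 8 billion × (1 − 0.40) = 4.8 billion, and 4.8 billion × (1 − 0.82) ≈ 864 million, which the slide rounds to 860 million. A minimal Python sketch of that arithmetic, assuming each percentage applies to the stage listed before it; `STAGES` and `run_funnel` are hypothetical names, and the 99.98% figure is left out because the slide does not make clear which volume it is measured against.

```python
from typing import List, Tuple

# Stage names and filter rates as read from the slide; pairing each rate with
# a stage is an assumption based on the slide's reading order.
STAGES: List[Tuple[str, float]] = [
    ("Rating Server for Known Threats", 0.40),
    ("Unknown & Prefilter", 0.82),
]


def run_funnel(daily_urls: float, stages: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (stage, surviving volume) after each stage removes its filtered fraction."""
    results = []
    volume = daily_urls
    for name, filtered_fraction in stages:
        volume *= 1.0 - filtered_fraction
        results.append((name, volume))
    return results


if __name__ == "__main__":
    for name, remaining in run_funnel(8e9, STAGES):
        print(f"{name}: {remaining:.3g} URLs/day")
    # Rating Server for Known Threats: 4.8e+09 URLs/day
    # Unknown & Prefilter: 8.64e+08 URLs/day
```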

Page 26

WRS Architecture Overview

Page 27

Big Data Lessons Learned

Page 28

How to Scale

• Start with unstructured data
• If you really need structured data, use
  – Google Protocol Buffers, or
  – JSON strings
• Cleanse your data before processing
• Leverage HBase more
  – Design row keys carefully to prevent hot-spotting
• Use MapReduce to create Lucene indexes
• Leverage SolrCloud for complex real-time use cases
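The slide does not say which row-key design SPN uses; salting keys with a hash-derived prefix is one common way to avoid HBase hot-spotting. A minimal Python sketch of that technique, where `NUM_BUCKETS`, the `|` separator, and `salted_row_key` are hypothetical choices:

```python
import zlib

# Hypothetical bucket count; in practice it is often matched to the number of
# regions the HBase table is pre-split into.
NUM_BUCKETS = 16


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a row key with a deterministic hash bucket.

    Keys that would otherwise sort next to each other (for example, keys that
    start with an increasing timestamp) are spread over `num_buckets` key ranges,
    so consecutive writes no longer all land on the same region server.
    """
    bucket = zlib.crc32(natural_key.encode("utf-8")) % num_buckets
    return f"{bucket:02x}|{natural_key}".encode("utf-8")


if __name__ == "__main__":
    for i in range(3):
        print(salted_row_key(f"20130910120000-url-{i}"))
```

Because the bucket is derived from the key itself, a reader can recompute the prefix for a point lookup. The trade-off is that a range scan in natural-key order now needs one scan per bucket, with the results merged on the client.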

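Two of the tips above, cleansing data before processing and storing structured records as JSON strings, fit naturally together. A minimal Python sketch, assuming URL log records with hypothetical `url` and `source` fields; the validation rules and the helper names `purify_url_record` and `to_json_line` are illustrative, not SPN's actual cleansing logic.

```python
import json
from typing import Optional
from urllib.parse import urlsplit


def purify_url_record(raw: dict) -> Optional[dict]:
    """Hypothetical cleansing step for a URL log record.

    Drops records whose URL is missing, malformed, not http(s), or has no
    host, and normalizes the fields that downstream jobs would group on.
    """
    url = str(raw.get("url", "")).strip()
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 literal
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    return {
        "url": url,
        "scheme": parts.scheme,  # urlsplit lower-cases the scheme
        "host": parts.hostname,  # .hostname is lower-cased as well
        "source": str(raw.get("source", "unknown")),
    }


def to_json_line(record: dict) -> str:
    """Serialize one record as a compact JSON string (one record per line)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    cleaned = purify_url_record({"url": " HTTP://Example.COM/a ", "source": "cdn"})
    if cleaned is not None:
        print(to_json_line(cleaned))
    print(purify_url_record({"url": "ftp://example.com/file"}))  # None
```

Sorting keys keeps the serialized form stable, so identical records produce identical strings, which simplifies comparison and de-duplication across jobs.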

Page 29

What We Learned

• Have a clear strategy first
• Start small, scale quickly
• Choose the right solution for the right problem

Page 30

Q&A

Page 31

Big Challenge, Big Opportunity

Thank You