So Much Big Data, and What About Security? ("Mucho Big Data ¿y la Seguridad para cuándo?")
July 9, 2013
Juan Carlos Vázquez
Sales Systems Engineer, LTAM
"Personal data is the oil of the 21st century"
A Mountain of Data

In 2015… greater demand on data centers:
• >1,000 million more netizens¹ (1B)
• >15,000 million connected devices² (15B)
• >1 zettabyte of Internet traffic³ (1,000 exabytes)

1. IDC "Server Workloads Forecast" 2009. 2. IDC "The Internet Reaches Late Adolescence" Dec 2009, extrapolation by Intel for 2015; ECG "Worldwide Device Estimates Year 2020 - Intel One Smart Network Work" forecast. 3. Source: http://www.cisco.com/assets/cdc_content_elements/networking_solutions/service_provider/visual_networking_ip_traffic_chart.html extrapolated to 2015
Source: IDC, 2011 Worldwide Enterprise Storage Systems 2011–2015 Forecast Update. Worldwide Enterprise Storage Consumption Capacity Shipped by Model, 2006–2015 (PB)
A Data Explosion

• 2.7 ZB of data in 2012; 15,000 million connected devices in 2015
• Around 24 petabytes of data processed by Google* per day in 2011
• 4,000 million pieces of content shared on Facebook* every day (July 2011)
• 250 million tweets per day in October 2011
• 5.5 million (legitimate) emails per second in 2011
More Data…

By 2020, the volume of digital information will reach 35.2 zettabytes (1 ZB equals 1 trillion GB), up from 1.8 ZB in 2010. That exponential growth in data makes Big Data the driving force of the information age, according to estimates from Sogeti, a Capgemini Group company. For its part, the consulting firm Gartner states that companies able to obtain the most valuable information, and to process and manage it, will achieve financial results 20% better than their competitors.
A Case

The New York Times used 100 Amazon EC2 instances and Hadoop to process 4 TB of TIFF image data into 11 million PDFs in 24 hours, at a cost of $240 USD.
http://en.wikipedia.org/wiki/Apache_Hadoop
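The pattern behind jobs like this one is "embarrassingly parallel" conversion: each input file is independent, so the work maps cleanly onto many workers (Hadoop mappers on EC2, or locally a process pool). A minimal sketch of that shape, with the actual TIFF→PDF conversion stubbed out and hypothetical file names:

```python
# Sketch of the parallel-map pattern: convert many independent files
# by fanning them out across worker processes. The convert() body is a
# placeholder for the real TIFF -> PDF step; file names are invented.
from multiprocessing import Pool

def convert(tiff_name: str) -> str:
    # Placeholder for the real TIFF -> PDF conversion of one file.
    return tiff_name.rsplit(".", 1)[0] + ".pdf"

def convert_all(tiff_names, workers=4):
    # Each file is handled independently, so order and results
    # are preserved by Pool.map.
    with Pool(workers) as pool:
        return pool.map(convert, tiff_names)

if __name__ == "__main__":
    inputs = [f"scan_{i:04d}.tiff" for i in range(8)]
    print(convert_all(inputs)[0])  # scan_0000.pdf
```

On a cluster the same `convert` becomes the map function of a Hadoop job; only the scheduling layer changes.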
Another Case

Yahoo!'s Hadoop clusters comprise 40,000 servers and store 40 petabytes of data; the largest single cluster has 4,000 servers.
http://www.aosabook.org/en/hdfs.html
Just One More Case

In 2010 Facebook declared that it had the largest Hadoop cluster in the world, at 21 PB. In 2011 it announced that the cluster had grown to 30 PB, and by mid-2012 it reached 100 PB. On November 8, 2012, Facebook announced that its data warehouse was growing by almost half a petabyte per day.
http://en.wikipedia.org/wiki/Apache_Hadoop
Big Data

A term applied to data sets that exceed the capacity of ordinary software to capture, manage, and process them in a reasonable time. The sizes that qualify as "Big Data" keep growing: in 2012 they ranged from a few dozen terabytes up to several petabytes in a single data set (compared with typical single-machine limits of roughly 256 GB of RAM and up to 24 TB of storage). The challenges include capture, processing, storage, intelligence sharing, analysis, and visualization. The beneficiaries span healthcare, finance, telcos, energy, traffic, marketing, manufacturing, security… but who will ask the right question?
The Four Vs
• Volume. When the term big data is used, data volume typically ranges from multiple terabytes to petabytes. This certainly fits the enterprise security model, as it is not uncommon for large organizations to collect tens of terabytes of security data on a monthly basis.
• Velocity. This term is used with respect to real-time data analysis requirements. In cybersecurity, velocity can refer to the need for immediate anomaly or incident detection. Real-time data analysis is critical here to minimize damages associated with a cybersecurity attack.
• Variety. Big data can be made up of multiple data types and feeds, including structured and unstructured data. From a security perspective, data variety could include log files, network flows, IP packet captures, external threat/vulnerability intelligence, click streams, network/physical access, and social networking activity. It is not unusual for enterprises to collect hundreds of different types of data feeds for security analysis.
• Veracity. Big data must be trustworthy and accurate. From a security perspective, this means trusting the confidentiality, integrity, and availability of data sources like log files and external data feeds.
The Big Security Data Challenge

[Figure: security event volumes have grown from thousands to billions of events, driven by perimeter, APT, cloud, data, and insider threats; the SIEM must correlate events and consolidate logs at that scale.]
The Security Dilemma

MONITORING TECHNIQUES MUST ADVANCE — from raw instrumentation toward visibility.

Instrumentation and data collection are still critical, but applying filters derived from intelligence is the path to achieving better security.
Big Data vs. Big Security Data

BIG DATA: datasets whose size and variety are beyond the ability of typical database software to capture, store, manage, and analyze.

BIG SECURITY DATA — understanding security data as big data:
• How do I gather security context?
• How do I manage big security information?
• How do I make security information management work?
• The size of security data is doubling annually
• Advanced threats demand collecting more data
• Legacy data management approaches are failing
• SIEM use is shifting from compliance to security

Security big data is about matching security intelligence with the right collected data.
Gartner says…
• The amount of data analyzed by enterprise information security organizations will double every year through 2016.
• By 2016, 40% of enterprises will actively analyze at least 10 terabytes of data for information security intelligence, up from less than 3% in 2011.
• By 2016, 40% of Type A enterprises will create and staff a security analytics role, up from less than 1% in 2011.
Goal…
One of the primary drivers of security analytics will be the need to identify when an advanced targeted attack has bypassed traditional preventative security controls and has penetrated the organization.
Needle in a Datastack

• Organizations are storing approximately 11-15 terabytes of security data a week.
• The ability to detect data breaches within minutes is critical in preventing data loss, yet only 35 percent of firms stated that they have the ability to do this.
• In fact, more than a fifth (22 percent) said they would need a day to identify a breach, and five percent said this process would take up to a week. On average, organizations reported that it takes 10 hours for a security breach to be recognized.
• Nearly three quarters (73 percent) of respondents claimed they can assess their security status in real time, and they also expressed confidence in their ability to detect in real time insider threats (74 percent), perimeter threats (78 percent), zero-day malware (72 percent) and compliance controls (80 percent). However, of the 58 percent of organizations that said they had suffered a security breach in the last year, just a quarter (24 percent) had recognized it within minutes. In addition, when it came to actually finding the source of the breach, only 14 percent could do so in minutes, while 33 percent said it took a day and 16 percent said a week.
The study, conducted by research firm Vanson Bourne, interviewed 500 senior IT decision makers in January 2013, including 200 in the USA and 100 each in the UK, Germany and Australia.
Useful Data… from Verizon 2012

• "84% of security incidents (successful intrusions) left traces in the logs"
• "Only 8% of security incidents detected by companies were found by mining their logs"
Normalization

[Figure: a raw log message surrounded by the context questions an analyst must answer.]

• What else happened at this time? Near this time? What is the time zone?
• What is this service? What other messages did it produce? What other systems does it run on?
• What is the host's IP address? Other names? Location on the network/datacenter? Who is the admin? Is this system vulnerable to exploits?
• What does this number mean? Is this documented somewhere?
• Who is this user? What is the user's access level? What is the user's real name, department, location? What other events came from this user?
• What is this port? Is this a normal port for this service? What else is this service being used for?
• DNS name, Windows name, other names? Whois info? Organization owner? Where does the IP originate from (geolocation)? What else happened on this host? Which other hosts did this IP communicate with?
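Normalization means turning a raw log line into named fields and then attaching the context those questions ask for. A minimal sketch, assuming a simple sshd-style syslog line; the asset and user directories are hypothetical stand-ins for a CMDB and identity store:

```python
# Parse one syslog-style line into named fields, then enrich it with
# the context an analyst would otherwise look up by hand.
import re

LINE = "Oct 11 22:14:15 web01 sshd[4721]: Failed password for jdoe from 203.0.113.7 port 55642"

PATTERN = re.compile(
    r"(?P<ts>\w{3} +\d+ [\d:]+) (?P<host>\S+) (?P<service>\w+)\[(?P<pid>\d+)\]: "
    r".*?for (?P<user>\S+) from (?P<src_ip>[\d.]+) port (?P<src_port>\d+)"
)

ASSETS = {"web01": {"ip": "10.0.1.5", "owner": "ops", "zone": "DMZ"}}       # hypothetical CMDB
USERS = {"jdoe": {"name": "Jane Doe", "dept": "Finance", "level": "user"}}  # hypothetical directory

def normalize(line):
    event = PATTERN.match(line).groupdict()
    # Enrichment step: attach asset and identity context to the event.
    event["asset"] = ASSETS.get(event["host"], {})
    event["identity"] = USERS.get(event["user"], {})
    return event

event = normalize(LINE)
```

A SIEM does the same thing at scale, with parsers per log source and enrichment feeds kept up to date automatically.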
SIEM Is Still Evolving… Beyond Logs

SEM + SIM = SIEM

SIEM is the evolution and integration of two distinct technologies:
• Security Event Management (SEM) — primarily focused on collecting and aggregating security events
• Security Information Management (SIM) — primarily focused on the enrichment, normalization, and correlation of security events

Security Information & Event Management (SIEM) is a set of technologies for:
• Log data collection
• Correlation
• Aggregation
• Normalization
• Retention
• Analysis and workflow
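The correlation step is where a SIEM earns its keep: individual events are unremarkable, but a sequence of them is not. A toy illustration, flagging a source that produces several failed logins followed by a success within a short window; the event shape and thresholds are invented for the sketch, not any vendor's rule format:

```python
# Toy correlation rule: N failed logins followed by a success from the
# same source inside a time window suggests a brute-force that worked.
from collections import defaultdict

THRESHOLD = 3   # failed attempts before the success becomes suspicious
WINDOW = 60     # seconds

def correlate(events):
    """events: time-ordered list of (timestamp, src_ip, outcome) tuples."""
    failures = defaultdict(list)
    alerts = []
    for ts, ip, outcome in events:
        if outcome == "fail":
            failures[ip].append(ts)
        elif outcome == "success":
            recent = [t for t in failures[ip] if ts - t <= WINDOW]
            if len(recent) >= THRESHOLD:
                alerts.append((ip, ts))  # possible successful brute force
            failures[ip].clear()
    return alerts
```

A real engine evaluates many such rules simultaneously over a normalized event stream, which is why the normalization step above matters.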
Three major factors drive the majority of SIEM implementations:
1. Real-time threat visibility
2. Security operational efficiency
3. Compliance and/or log management requirements
The State of SIEM

SIEM promise:
• Turns security data into actionable information
• Provides an intelligent investigation platform
• Supports management and demonstration of compliance

Legacy SIEM reality:
• Antiquated architectures force choices between time-to-data and intelligence
• Events alone do not provide enough context to combat today's threats
• Complex usability and implementation have caused costs to skyrocket
Shifting from Compliance to Security
Source: InformationWeek 2012 Security Information and Event Management Vendor Evaluation Survey of 322 business technology professionals, April 2012
SIEM as a Solution to Detect Cyberattacks
Global Threat Intelligence and SIEM

GTI with SIEM delivers even greater value:
• McAfee Labs IP reputation updates classify sources as GOOD, SUSPECT, or BAD
• The IP reputation check covers botnet/DDoS activity, mail/spam sending, web access, malware hosting, network probing, presence of malware, DNS hosting activity, and intrusion attacks
• Automatic identification
• Automatic risk analysis via the Advanced Correlation Engine
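The flow from reputation feed to SIEM decision can be sketched schematically. The reputation table and scoring thresholds below are invented for illustration; a real deployment queries a live threat-intelligence feed such as GTI rather than a local dict:

```python
# Schematic IP-reputation check feeding an escalation decision.
# Scores and thresholds are assumptions for the sketch.
REPUTATION = {            # hypothetical feed snapshot: ip -> risk score 0-100
    "203.0.113.7": 92,    # e.g. a known botnet node
    "198.51.100.2": 35,
}

def classify(ip):
    score = REPUTATION.get(ip, 0)   # unknown IPs default to low risk
    if score >= 70:
        return "BAD"
    if score >= 30:
        return "SUSPECT"
    return "GOOD"

def triage(event_ip):
    verdict = classify(event_ip)
    # Automatic identification: a bad-reputation source escalates the event
    # without waiting for an analyst.
    return {"ip": event_ip, "verdict": verdict, "escalate": verdict == "BAD"}
```

The point of pairing the feed with the SIEM is that the verdict arrives attached to the event, at correlation time, instead of being looked up after the fact.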
Sorting Through a Sea of Events…

[Figure: investigation funnel — 200M events → 18,000 alerts and logs → dozens of endpoints → a handful of users → specific files breached (if any) → optimized response.]

The questions an analyst works down through:
• Have I been communicating with bad actors?
• Which communication was not blocked?
• What specific servers/endpoints/devices were breached?
• Which user accounts were compromised?
• What occurred with those accounts?
• How should I respond?
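The funnel above can be sketched as successive filters, each stage discarding events that fail a cheap test before a more expensive question is asked. The predicates and event fields are placeholders:

```python
# Investigation funnel as a chain of filters: each stage narrows the
# set of events the next, costlier stage has to examine.
def funnel(events, stages):
    """Apply each stage's keep-predicate in order, narrowing the event set."""
    for name, keep in stages:
        events = [e for e in events if keep(e)]
    return events

BAD_IPS = {"203.0.113.7"}                       # hypothetical reputation hit list

stages = [
    ("bad actors",  lambda e: e["dst"] in BAD_IPS),   # talked to a bad actor?
    ("not blocked", lambda e: not e["blocked"]),      # and it got through?
]

events = [
    {"dst": "203.0.113.7", "blocked": False},   # survives both stages
    {"dst": "203.0.113.7", "blocked": True},
    {"dst": "198.51.100.2", "blocked": False},
]
remaining = funnel(events, stages)              # the one event worth investigating
```

Each of the slide's six questions becomes one such stage; the output of the last stage is what drives the response.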
Event Handling…
Prioritize security events
Work top-down…
"OK, but who do I talk to?"
Knowledge of my environment…
McAfee ESM

McAfee starts at the core — the McAfee DB:
• Real-time, complex analysis
• Indexing purpose-built for SIEM
• Massive context feeds with enrichment
• Historical retrieval and analytics
• Integrated log and event management
• No DBA required

SMART and FAST: scale, analytical flexibility, performance.
Malicious websites…
The malware is already here…
Spam and bots in decline…
Conclusions…
• Turn on and use your logs
  • Windows event logs • syslogs • DNS • application logs • context awareness (geolocation, users, VMs, asset management, etc.)
• Deploy log management first, before a SIEM
• There are no "silver bullets"
• Thinking beats technology
• Less is more
• Use cases, use cases, use cases!
• Big Data architectures
  • High-speed I/O — hours to get a report, or minutes to get a view?
• Security feeds (reputation systems)
• Interconnected security
  • A bad-reputation IP is automatically blocked by the IPS
  • A machine that contacted a malicious IP is analyzed from the SIEM
“If you’re in a fight, you need to know that while it’s happening, not after the fact”