
John Eberhardt NSTAC Testimony


Page 1: John Eberhardt NSTAC Testimony

Big Data Analysis for Network Resiliency – NSTAC – John S. Eberhardt III – 8 Dec 2015

Big Data Analytics for Network Resiliency

John S. Eberhardt III

Adjunct Professor - Volgenau School of Engineering, George Mason University

Partner - 3E Services, LLC

8 December 2015

Page 2: John Eberhardt NSTAC Testimony

Professional Background

• Adjunct Professor at the Volgenau School, George Mason University

• Partner and Founder, 3E Services (data consulting)

• Founder and former Chief Scientist at Decision Q Corp (machine learning)

• 48 publications and conference presentations

Disclaimer: This presentation represents the personal opinions of Mr. Eberhardt, based upon his professional experience. It should not be viewed as a complete overview of the sector, and does not represent the institutional views of either George Mason University or 3E Services, LLC.


Page 3: John Eberhardt NSTAC Testimony

Biography

John is a Data Scientist with nearly 20 years of experience in the Analytical Sector. John has led the development of multiple advanced analytical products and methods, managing teams of scientists and engineers to rapidly create customer-centered analytical solutions.

With one patent and five patent applications in process and over 35 publications, John is a thought leader in advanced analytics with experience in machine learning, statistical algorithms, and user interface design for decision support in Security, Healthcare, Financial Services, Life Sciences, and Consumer Products.

John has developed over 20 analytical solutions in clinical decision support, cyber security, molecular diagnostics, risk management, and product marketing, including award-winning healthcare quality applications. John has applied his expertise with the Department of Defense, Altamira, Roche, Genentech, Novartis, Walter Reed Army Medical Center, Memorial Sloan Kettering, the University of Wisconsin, University of Mississippi, and Thomas Jefferson University, among others. John has a BA Cum Laude from Duke University in Economics and History.


Page 4: John Eberhardt NSTAC Testimony

Understanding the NSTAC Scoping Report

• Explore how private sector data sets and infrastructural resources can be utilized in support of national security and emergency preparedness (NS/EP) activities

• Create policies to:
  • Identify current and emerging big data sets within the public and private sectors
  • Select and/or develop models that will further encourage information sharing
  • Access big data sets to support NS/EP capabilities, when appropriate


Page 5: John Eberhardt NSTAC Testimony

Key Questions I Will Focus On

• How is data being created and collected?
• What data is identified and made available for analysis?
• How do we get the data to work with the analytics?
• How do the analytic outputs support the mission?

Key Definition of a term I will be using in this briefing: An Ontology is a system of knowledge.

(My apologies if you already know this)


Page 6: John Eberhardt NSTAC Testimony

Big Data Process

• NIST Draft Special Publication 1500-6 describes a big data analytics value chain
  • Collection, Preparation/Curation, Analysis, Visualization, and Access
• This briefing is focused on the issues of Preparation/Curation, Analysis, and Access
  • How they relate to the NSTAC scoping questions of data collection, access, analysis, and use


Page 7: John Eberhardt NSTAC Testimony

Network Behavior and Resiliency

Key Aspects of Understanding Network Behavior and Resiliency

• Highly polymorphic, subject to emergent behavior
  • Emergent Behavior defined as: “the arising of novel and coherent structures, patterns and properties during the process of self-organization in complex systems” (1)
• e.g., Program Trading in the 1987 stock crash (2)
  • Facilitated by the DOT program of the NYSE
  • Systems behaved rationally locally but irrationally globally
  • Data was coming too quickly for humans: “One notable problem was the difficulty gathering information in the rapidly changing and chaotic environment.” (2)

• This is an extremely challenging data collection and analysis problem


Page 8: John Eberhardt NSTAC Testimony

Network Behavior and Resiliency – Cont.

Key Aspects of Understanding Network Behavior and Resiliency

• Data structures and formats in networking data are semantically inconsistent
  • This makes curating data for use in analytics extraordinarily labor intensive; and
  • Handicaps the ability to use computers to detect emergent patterns
• What does “RFC” mean?
  • While there is a great deal of established syntax, networking is continuously evolving
  • This makes creating knowledge structures difficult – biology changes, but slowly – the internet changes suddenly
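As a rough illustration of the curation burden described on this slide, the sketch below maps two flow records that describe the same traffic under different field names onto one shared vocabulary. Everything in it is an assumption made for illustration: the sources `sensor_a` and `sensor_b`, their field names, the target names, and the `normalize` helper are hypothetical and do not come from any real product or standard.

```python
# Hypothetical mapping from each source's field names to a shared vocabulary.
# None of these layouts comes from a real product; they only illustrate the problem.
FIELD_MAP = {
    "sensor_a": {"src": "src_ip", "dst": "dst_ip", "bytes": "byte_count"},
    "sensor_b": {"SourceAddress": "src_ip", "DestAddress": "dst_ip", "octets": "byte_count"},
}


def normalize(record, source):
    """Rename a source-specific record's fields to the shared vocabulary."""
    mapping = FIELD_MAP[source]
    unknown = set(record) - set(mapping)
    if unknown:
        # A field nobody has assigned a meaning to must be curated by hand.
        raise KeyError(f"no agreed meaning for fields: {sorted(unknown)}")
    return {mapping[name]: value for name, value in record.items()}


a = normalize({"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 1200}, "sensor_a")
b = normalize({"SourceAddress": "10.0.0.1", "DestAddress": "10.0.0.2", "octets": 1200}, "sensor_b")
```

Once both records use the same names they can be compared directly. Without an agreed vocabulary, a table like `FIELD_MAP` has to be written and maintained by hand for every new source, which is the labor this slide refers to.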


Page 9: John Eberhardt NSTAC Testimony

Bottom Line Up Front

We don’t have a technology problem. We have an understanding problem.

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”

Antoine de Saint-Exupéry


Page 10: John Eberhardt NSTAC Testimony

Where are we today? Cyber Response

• Forensic, Backward looking, heavily human dependent with a need for high technical competency

• Examples (Technology and Commercial Response)
  • Antivirus/Anti-Malware: Symantec, Intel/McAfee, Kaspersky, Microsoft, F5, Barracuda, Palo Alto Networks
    • Signature and forensic based
  • Threat Intelligence: Norse, FireEye, Barracuda, Palo Alto Networks
    • Focused on specific threats and activity and requires user subscription
    • Doesn’t provide countermeasure
  • Analytics: Palo Alto Networks, Splunk
    • A good start, but still very rudimentary


Page 11: John Eberhardt NSTAC Testimony

Where are we today? Big Data in Cyber

• Big data technologies are sufficient to address this problem
  • The proprietary and open source tools for data collection, storage/retrieval, and analysis are more than adequate in their own right – the challenge is access to data and semantic structuring
• Competing, limited standards
  • Message formats focused on sharing threat intel rather than providing a knowledge/semantic structure for raw data analysis
    • e.g., TAXII, STIX, CybOX
    • Useful for sharing threat intel but do not provide the knowledge structure needed to create common analysis of raw network traffic
  • Structure for moving raw analytical data limited (e.g., Cisco NetFlow)
    • Very high level
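To make "very high level" concrete, the sketch below collapses individual packets into flow-style summaries keyed by source, destination, ports, and protocol. It is a simplified illustration under stated assumptions, not Cisco's NetFlow record format: the packet tuples, the `summarize_flows` helper, and the choice of counters are all hypothetical.

```python
from collections import defaultdict

# Hypothetical packets: (src_ip, dst_ip, src_port, dst_port, protocol, size_bytes)
packets = [
    ("10.0.0.1", "10.0.0.2", 50000, 443, "tcp", 60),
    ("10.0.0.1", "10.0.0.2", 50000, 443, "tcp", 1500),
    ("10.0.0.1", "10.0.0.2", 50000, 443, "tcp", 1500),
    ("10.0.0.3", "10.0.0.2", 50001, 53, "udp", 80),
]


def summarize_flows(packets):
    """Collapse packets into per-flow packet and byte counts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for *key, size in packets:
        flow = flows[tuple(key)]  # one entry per (src, dst, sport, dport, proto)
        flow["packets"] += 1
        flow["bytes"] += size
    return dict(flows)


flows = summarize_flows(packets)
```

Four packets become two summary rows. Payload contents, packet ordering, and per-packet sizes are gone, which is why a flow-level summary alone gives limited material for raw data analysis.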


Page 12: John Eberhardt NSTAC Testimony

Where are we today? Big Data in Cyber Cont.

• Currently Human-to-Human Oriented Exchange
  • Cyber Threat Intelligence Integration Center still being stood up
  • DHS has a number of collaboratives and working groups – this is a great first step, but to my knowledge no one is working toward a common semantics (ontology)
• We need a common ontology to anchor analysis and research – a language of cyber that allows us to compare results objectively
• Technology is advanced, not accessible (cost and sophistication limit it to government and big corporations)
  • Cloud may help by moving basic information technology services to shared providers that have the scale to protect them (e.g., AWS)
  • However, individual devices cannot be protected this way – medical devices and IPv6/IoT are a great example of continued attack surface
  • Configuring and using current tools, especially in network security, requires an extraordinarily high level of technical and subject matter expertise


Page 13: John Eberhardt NSTAC Testimony

Gaps in the Current Architecture

• Availability of Raw Data for Research and Development
• Exchange Standards with Semantic Structure to make research results objectively comparable
• Accessible Tools to implement findings beyond large organizations


Page 14: John Eberhardt NSTAC Testimony

Lack of Raw Data

• data.gov
  • 188,420 data sets
  • Number with actual IP traffic data: 2 (3)
• “PCAP, PCAP everywhere, but not a drop to drink” (Borrowed from Samuel Taylor Coleridge)
• Other Data Sets?
  • NETRESEC – private company, data from CTF exercises
  • DHS PREDICT – useful, more current, but relatively narrow and focused on static problems
    • Only one intrusion detection data set from 2005-2010 (U Wisconsin – only log data) with attack events, not raw data
    • SKAION simulation
    • C-State and Merit Network – flow data
  • AFRL DARPA Intrusion Detection Data Set – 1998!
• Here’s the key: almost all of it is simulated!


Page 15: John Eberhardt NSTAC Testimony

Exchange Standards with Semantic Structure

• NetFlow is inadequate
• Threat Intel interchange standards do not support raw data analytical structures
• Current standards are like disease coding in healthcare
  • But we also have extensive languages (SNOMED, LOINC) for describing the components of the systems underlying the diagnosis to support systems research
• In network security, we have developed standards for the diagnosis, but not to describe the system (RFCs provide syntax, but no ontology)
  • This means that structuring real data for analysis is extremely time consuming and challenging
  • This also means that the terms of reference for different analysis projects can be radically different, making comparison of research conclusions extremely challenging


Page 16: John Eberhardt NSTAC Testimony

Accessible Tools

Tools are either too rudimentary, or require a high level of technical sophistication. We need to enable humans to detect patterns, and take action, sooner.


Page 17: John Eberhardt NSTAC Testimony

Recommendations

• Facilitate making data sets available through creating the conditions for safe, trusted sharing on a broad basis
  • This may mean letting a few shady characters under the tent!
• Facilitate data exchange standards so the data is meaningful – form an ontology working group like SNOMED
  • Make the results of data sharing comparable
• Facilitate the development and delivery of tools that are accessible
  • Support research in basic methodologies for making data analysis more broadly accessible


Page 18: John Eberhardt NSTAC Testimony

Thank You!

Thank you for your time today!

If you want to reach me: [email protected]

“And so, in the end, the only thing that fails to conform to our wishes is reality”

Vaclav Havel


Page 19: John Eberhardt NSTAC Testimony

References

1. Goldstein, Jeffrey, Emergence: Complexity and Organization 1 (1): 49–72, doi:10.1207/s15327000em0101_4

2. Mark Carlson, Board of Governors of the Federal Reserve, November 2006

3. Based on “OR” keyword search of data.gov using the following terms: PCAP, PCAP Data, PCAP Files, IP Traffic, IP Network Traffic, Internet traffic, Netflow, raw IP network data, internet protocol data, network log files, netflow data
