© 2015 IBM Corporation
What’s new in Toolkits
IBM Streams 4.1
Ankit Pasricha
Toolkits Team Lead
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
(New) Spark MLLib Toolkit
(New) Cybersecurity Toolkit
(New) Distributed Process Store (DPS) Toolkit
Messaging Toolkit
Geospatial Toolkit
Text Toolkit
Other updates
Spark MLLib Toolkit
Combines the power of Spark MLLib and the real-time streaming capabilities of Streams
Allows scoring of real-time streaming data using Spark models
GitHub project: http://ibmstreams.github.io/streamsx.sparkMLLib/
Support for a number of MLLib models
• Classification
  – Linear SVM
  – Naive Bayes
• Clustering
  – KMeans
• Collaborative Filtering
• Regression
  – Isotonic
  – Linear
  – Logistic
• Tree
  – Decision Tree
  – Gradient Boosted Trees
  – Random Forest
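To make "scoring streaming data with a model" concrete, here is a minimal plain-Java sketch for one of the supported model types, KMeans. It does not use Spark or the toolkit: the class name, the `predict` helper and the centroid values are all hypothetical, and the centroids stand in for a model assumed to have been trained offline. Scoring a point with a k-means model amounts to finding its nearest centroid, which is what the loop does for each incoming tuple.

```java
// Hypothetical sketch: scoring tuples against pre-trained KMeans centroids.
// This is NOT the Spark MLlib API or the toolkit's operator interface.
class KMeansScoringSketch {

    // Returns the index of the centroid closest to the point (squared Euclidean distance).
    static int predict(double[][] centroids, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = centroids[i][d] - point[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed result of an offline training run (made-up values).
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} };
        // Stand-in for tuples arriving on a stream.
        double[][] incoming = { {1.0, 2.0}, {9.0, 8.0}, {4.0, 4.0} };
        for (double[] tuple : incoming) {
            System.out.println("cluster = " + predict(centroids, tuple));
        }
    }
}
```

The point of the split is that training happens once on historical data, while scoring is a cheap per-tuple operation that fits a streaming pipeline.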
Streams + Spark Demo
[Diagram: historical city data sets (incidents, calls for service (911, etc.), 311, code violations, permits, buildings) are stored in HDFS and used by Apache Spark MLlib to build a model: "Is this call for service a false alarm?" IBM Streams applies the model to real-time calls for service and sends real-time predictions and relevant context to a real-time dashboard.]
Resources
Getting Started Guide: https://developer.ibm.com/streamsdev/docs/getting-started-with-the-spark-mllib-toolkit/
Documentation: http://ibmstreams.github.io/streamsx.sparkMLLib/com.ibm.streamsx.sparkmllib/doc/spldoc/html/
MLLib Guide: https://spark.apache.org/docs/latest/mllib-guide.html
Samples: https://github.com/IBMStreams/streamsx.sparkMLLib/tree/master/samples
Cybersecurity Toolkit
The toolkit can detect active threats occurring within a network in real time.
Contains 3 machine-learning cybersecurity models:
• DomainProfiling: analyzes DNS response records and reports whether any domains are behaving suspiciously
• HostProfiling: analyzes DNS response records and reports whether individual hosts are behaving suspiciously
• PredictiveBlacklisting: analyzes DNS response records and predicts whether a domain should be added to an internal blacklist
Resources
Introduction to Cybersecurity toolkit: https://developer.ibm.com/streamsdev/docs/detect-active-threats-in-real-time-streams-cybersecurity-toolkit/
Getting Started Guide: http://ibmstreams.github.io/streamsx.documentation/docs/4.1/cybersecurity/cybersecurity-getting-started/
Documentation: http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streams.cybersecurity/tk$com.ibm.streams.cybersecurity.html?lang=en
Starter Apps: https://github.com/IBMStreams/streamsx.cybersecurity.starterApps
Distributed Process Store (DPS) Toolkit
Allows sharing of data across operators, across Streams applications, and between Streams and other applications.
– Provides a collection of APIs in Java, C++ and SPL to read from and write to Redis
– Support for Redis 2.8.x and 3.0
Java Example: Creating a distributed store
Java Example: Acquiring a distributed lock
Java Example: Writing data
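The three operations named above (creating a store, acquiring a lock, writing data) can be illustrated without the toolkit itself. The sketch below is not the DPS Java API. It is a hypothetical in-memory stand-in, and every class and method name in it is an assumption made for illustration. A `ConcurrentHashMap` plays the role of the Redis back end, and a `ReentrantLock` plays the role of a distributed lock, so it only coordinates threads inside one JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory stand-in for a shared key-value store.
// None of these names come from the DPS toolkit.
class SharedStoreSketch {
    // store name -> (key -> value); stands in for the external back end
    private static final Map<String, Map<String, String>> STORES = new ConcurrentHashMap<>();
    // lock name -> lock; stands in for a distributed lock service
    private static final Map<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    // Creating a store: returns the existing store if the name is already in use.
    static Map<String, String> createOrGetStore(String name) {
        return STORES.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }

    // Acquiring a lock starts here: every caller asking for a name gets the same lock object.
    static ReentrantLock createOrGetLock(String name) {
        return LOCKS.computeIfAbsent(name, n -> new ReentrantLock());
    }

    // Writing data: a read-modify-write that is only safe while the lock is held.
    static int incrementCounter(String storeName, String key) {
        Map<String, String> store = createOrGetStore(storeName);
        ReentrantLock lock = createOrGetLock(storeName + ":" + key);
        lock.lock();
        try {
            int next = Integer.parseInt(store.getOrDefault(key, "0")) + 1;
            store.put(key, Integer.toString(next));
            return next;
        } finally {
            lock.unlock(); // always release, even if the update throws
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = createOrGetStore("demoStore");
        store.put("greeting", "hello");
        System.out.println(createOrGetStore("demoStore").get("greeting"));
        System.out.println(incrementCounter("demoStore", "hits"));
    }
}
```

The lock matters because two writers that both read a value, modify it and write it back can otherwise overwrite each other's update. With a real shared back end the same pattern applies across processes rather than threads.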
Resources
Documentation: https://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streamsx.dps/tk$com.ibm.streamsx.dps.html?lang=en
Samples: https://github.com/IBMStreams/streamsx.dps/tree/master/com.ibm.streamsx.dps/samples
Messaging Toolkit updates
Guaranteed processing support
• KafkaConsumer
  – On checkpoint: save the offset within the message log
  – On reset: replay messages from the saved offset
• JMSSource
  – Runs in a transacted session with MQ running in persistent mode
  – On checkpoint: acknowledge read messages so they are removed from the queue
  – On reset: start replaying any unacknowledged messages
Performance improvements in Kafka operators (pre-release)
• Using the new KafkaProducer API
• Developed on GitHub: https://github.com/IBMStreams/streamsx.messaging
RabbitMQ support (pre-release)
Kafka 0.9 and Message Hub support on Bluemix
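The checkpoint and reset behaviour described for KafkaConsumer can be sketched in a few lines of plain Java. This is not the Kafka client API or the operator's implementation. The class and its methods are hypothetical, and an in-memory list stands in for the message log. Only the offset bookkeeping is shown: a checkpoint remembers how far the consumer has read, and a reset rewinds to that point so later messages are delivered again.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of offset-based replay; not the Kafka API.
class ReplayableConsumerSketch {
    private final List<String> log;      // stand-in for an append-only message log
    private int offset = 0;              // position of the next message to read
    private int checkpointedOffset = 0;  // offset saved at the last checkpoint

    ReplayableConsumerSketch(List<String> log) {
        this.log = log;
    }

    // Returns the next message, or null when the consumer has caught up.
    String poll() {
        return offset < log.size() ? log.get(offset++) : null;
    }

    // On checkpoint: save the current offset.
    void checkpoint() {
        checkpointedOffset = offset;
    }

    // On reset: rewind so that messages after the checkpoint are replayed.
    void reset() {
        offset = checkpointedOffset;
    }

    public static void main(String[] args) {
        ReplayableConsumerSketch consumer =
                new ReplayableConsumerSketch(Arrays.asList("m0", "m1", "m2", "m3"));
        consumer.poll();                     // m0
        consumer.poll();                     // m1
        consumer.checkpoint();               // offset 2 is now saved
        consumer.poll();                     // m2 is read, then assume a failure occurs
        consumer.reset();                    // back to offset 2
        System.out.println(consumer.poll()); // prints "m2" again
    }
}
```

Replay implies that a message read after the last checkpoint can be seen twice, so downstream processing has to tolerate that or be rolled back to the same checkpoint.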
Geospatial Toolkit Update
New PointMapMatcher operator
• We have a map and a set of imprecise points arriving in real time from a GPS or some other source
  – The data may only have a certain inherent precision
  – There may be errors due to signal noise
  – The map itself may be imprecise or incorrect
• We want to clean and smooth this data one point at a time to lock the incoming points to the road network
• “Where is this entity right now?”
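The slides do not describe the algorithm that PointMapMatcher uses, so the following is only a sketch of the simplest form of the idea: snap each incoming point to the closest position on any road segment. The class, the method names and the sample coordinates are all hypothetical. It works on flat x/y coordinates and ignores things a production matcher would likely need, such as geodetic distances and the entity's earlier positions.

```java
// Hypothetical nearest-segment snapping on planar coordinates.
// Not the PointMapMatcher algorithm, which the slides do not specify.
class SnapToRoadSketch {

    // Projects point (px, py) onto the segment (ax, ay)-(bx, by) and
    // returns the closest position on that segment as {x, y}.
    static double[] projectOntoSegment(double px, double py,
                                       double ax, double ay,
                                       double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double lengthSquared = dx * dx + dy * dy;
        // Parameter t in [0, 1] locates the projection along the segment.
        double t = lengthSquared == 0.0
                ? 0.0
                : ((px - ax) * dx + (py - ay) * dy) / lengthSquared;
        t = Math.max(0.0, Math.min(1.0, t));
        return new double[] { ax + t * dx, ay + t * dy };
    }

    // Snaps a point to the nearest of several road segments.
    // Each row of roads is one segment: {ax, ay, bx, by}.
    static double[] snap(double px, double py, double[][] roads) {
        double[] best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (double[] r : roads) {
            double[] q = projectOntoSegment(px, py, r[0], r[1], r[2], r[3]);
            double dist = (q[0] - px) * (q[0] - px) + (q[1] - py) * (q[1] - py);
            if (dist < bestDist) {
                bestDist = dist;
                best = q;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two parallel east-west roads (made-up coordinates).
        double[][] roads = { {0, 0, 10, 0}, {0, 5, 10, 5} };
        double[] matched = snap(3.0, 1.0, roads); // a noisy reading near the first road
        System.out.println(matched[0] + ", " + matched[1]); // prints "3.0, 0.0"
    }
}
```

Treating each point independently like this can make a track jump between nearby parallel roads, which is one reason smoothing over successive points is part of the problem statement.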
Operator Details
[Diagram: the PointMapMatcher operator, with Entity Locations and Map Geometry Updates as inputs and Matches and Errors as outputs.]
Some use cases:
• Routing
• Traffic reports
• Transit scheduling
• Taxi/emergency dispatching
• Streams Dev article: https://developer.ibm.com/streamsdev/docs/realtime-map-matching-in-streams-v4-0-1/
Text Toolkit update
Added support for AQL extractors generated by the BigInsights (BI) 4.0+ web tooling
Two-step process
– Step 1: Create an extractor in the BI web tool
Step 2: Load the extractor in the TextExtract operator for execution
stream<DataToAnalyze, ReferencesFound> TextExtractOutput = TextExtract(InputFromSocialMedia)
{
    param
        moduleSearchPath : "etc/extractor" ;
        inputDoc : "text" ;
        outputViews : "ProductSearch" ;
        outputMode : "multiPort" ;
}
For more information: http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streams.text/tk$com.ibm.streams.text.html?lang=en
Other updates
Bluemix support
• HDFS Toolkit for Bluemix
• HBase Toolkit for Bluemix
• More information: https://developer.ibm.com/streamsdev/docs/integrating-streams-biginsights-hbase-service-bluemix/
Data Governance support
• HDFS Toolkit
• DB Toolkit
• Inet Toolkit
• Messaging Toolkit
• Webcast Replay: https://developer.ibm.com/streamsdev/docs/streams-v-4-1-0-developer-conference-replay/
Support for BI 4.1
• HDFS Toolkit
• HBase Toolkit