© 2013 IBM Corporation SmartCloud Application Performance Management and Analytics SmartCloud Analytics Log Analysis & Predictive Insights

SCAPM Technical Series Analytics

  • 2013 IBM Corporation

    SmartCloud Application Performance Management and Analytics

    SmartCloud Analytics Log Analysis & Predictive Insights

  • 2013 IBM Corporation 2

    Please note

    IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

    Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

  • 2013 IBM Corporation 3

    IBM Investment in Analytics

    - More than $16B in Acquisitions Since 2005
    - More than 10,000 Technical Professionals
    - More than 7,500 Dedicated Consultants
    - Largest Math Department in Private Industry
    - More than 27,000 Business Partner Certifications

    [Figure: capabilities added from 2005 to 2013: Autonomic Operations, Developer Productivity, Deep Compression, pureXML, pureScale, Pervasive Content, Stream Computing, Content Analytics, Advanced Case Management, Workload Optimized Systems, Social Analytics/Consumer Insight, Decision Management]

  • 2013 IBM Corporation 4

    Shifting market for IT Operations

    APM Digest survey* of Senior IT Ops @ Fortune 500
    - 50% report growing dissatisfaction with traditional performance management solutions for Production IT
    - 75% of them are dissatisfied with their Business management solutions
    - Inability to adapt to rapidly changing applications & workloads (Systems of Interaction)
    - 30% of them believe that they do not have a way to proactively detect problems
    - Looking to operate on raw data and gain actionable insights

    IT Analytics solutions can predict, detect and help solve problems by churning through piles of data and translating this to understandable, relevant information, and actionable insights.

    * Source: APMDigest: http://apmdigest.com/it-analytics-emerging-as-dissatisfaction-grows-with-apm-and-bsm-tools

    Operational Visibility

    IT Overwhelmed by data

  • 2013 IBM Corporation 5

    Business Value to IT Analytics Adoption

    Predict: Predictive Outage Avoidance

    Ensure availability of applications and services
    - Use learning tools to augment custom best practices
    - Leverage statistical methods to maximize predictive warning
    - Use past maintenance to predict part failures

    Search: Faster Problem Resolution

    Find & correct problems faster with tools that determine actions required to resolve issues
    - Identify problems quicker with insight to large unstructured repositories
    - Isolate problems quicker by bringing relevant unstructured data into problem investigations
    - Repair problems quicker with the right details quickly to hand.

    Perform: Optimized Performance

    Track, Optimize, and Predict capacity and performance needs over time
    - Track capacity and performance of applications and services in classic and cloud environments
    - Optimize resource deployment with what-if and best fit planning tools
    - Increase utilization of existing assets

    Know: Improved Insight

    Enhance visibility into systems resource relationships while increasing customer satisfaction
    - Determine what resources are interdependent to assess impact of failures
    - Gain insight into what is important to your customer
    - Decrease customer churn and acquisition costs while increasing customer retention and satisfaction

    Lower IT Administration Costs with Automated Analytics
    - Escalate performance and capacity issues automatically, reducing manual analysis efforts
    - Reduce manual customization using learning tools that automatically adjust to new normals
    - Detect and present problems with a proposed resolution, to be able to do more with less

  • 2013 IBM Corporation 6

    IT Data Requirements - Metrics, Events and Logs

    Logs

    Events

    Metrics

    When we need to resolve problems with workloads we typically look at three types of data:
    - Metrics - Structured performance data
    - Events - Discrete alerts
    - Logs - Unstructured/semi-structured data

    Metrics and events can tell you what is happening.

    To answer why, we often need to look at the logs.

    IT professionals need Metrics, Events and Logs to resolve IT issues.

  • 2013 IBM Corporation 7

    Operations / Performance Data is Exploding

    A typical enterprise with 5000 servers, running 125 applications across 2 to 3 data centers generates in excess of 1.3 TB of data per day

    Only 3% of the data generated is operations oriented metric data.

    97% is made up of unstructured/semi-structured data.

    Workloads are running on heterogeneous platforms.

    [Chart: Data Ratio. Metric Data 3%, Unstructured Data 97%]
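As a quick check of what the ratio means in absolute terms, the figures quoted on this slide can be turned into daily volumes. This is only a back-of-the-envelope sketch: decimal units (1 TB = 1000 GB) and the helper name `daily_breakdown_gb` are assumptions, and "in excess of 1.3 TB" is treated as exactly 1.3 TB.

```python
# Figures quoted on the slide. Decimal units (1 TB = 1000 GB) are an assumption.
DAILY_VOLUME_TB = 1.3      # "in excess of 1.3 TB of data per day"
METRIC_SHARE = 0.03        # operations oriented metric data
UNSTRUCTURED_SHARE = 0.97  # unstructured / semi-structured data
SERVERS = 5000


def daily_breakdown_gb(total_tb, metric_share, unstructured_share):
    """Split a daily volume in TB into metric and unstructured volumes in GB."""
    total_gb = total_tb * 1000
    return total_gb * metric_share, total_gb * unstructured_share


metric_gb, unstructured_gb = daily_breakdown_gb(
    DAILY_VOLUME_TB, METRIC_SHARE, UNSTRUCTURED_SHARE
)
# Average volume per server per day, in MB (1 TB = 1,000,000 MB).
per_server_mb = DAILY_VOLUME_TB * 1_000_000 / SERVERS

print(f"metric data:       {metric_gb:.0f} GB/day")
print(f"unstructured data: {unstructured_gb:.0f} GB/day")
print(f"average per server: {per_server_mb:.0f} MB/day")
```

Under these assumptions, the structured metric data that traditional monitoring works with is only a few tens of GB per day, while more than a terabyte per day sits in logs and other unstructured sources.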

  • 2013 IBM Corporation 8

    Network

    Systems

    Security

    Applications

    Voice

    Mainframe

    Storage

    Wireless

    Workloads Assets

    InfoSphere Streams

    InfoSphere BigInsights InfoSphere Information Server

    more . . . IBM Watson

    IT Operational Insight Pack Smarter Infrastructure Insight Pack

    BP / Customer driven Ecosystem

    Flexible Consumption Models SaaS Embed On Premise Digital Download

    Faster problem detection and resolution Plan and optimize Insight & Care

    Failure Risk Estimation and Avoidance

    End Customer Client value

    Search Predict Optimize

    SmartCloud Analytics Marketecture Overview

  • 2013 IBM Corporation 9

    SmartCloud Analytics

    Monitoring Solution

    SmartCloud Analytics delivers end-to-end problem resolution

    Add and search event data in Log Analytics

    Show events in the context of Log search

    Show log searches in the context of events

    Alerts generated from scheduled searches

    Detect and alert on anomalies based on trends observed in logs

    Search logs in the context of an anomaly event

    Link metrics in the context of Log search results.

    Add log data into APM and IT Dashboards

    Search metric data

    Integration with various types of solutions to accelerate end-to-end problem resolution and increase visibility into the IT systems*

    Metrics Events

    Logs, Support docs / Social data

    Problem / Anomaly Detection Solutions

    Event Management Solutions

    Config / Topology

    Discovery and APM solutions

    Refine / scope search in logs and docs using topology and configuration context

    Service Desk Solution

    Tickets

    Search and analyze service tickets

    Search events, logs, docs with ticket context

    *Planned roadmap items

  • 2013 IBM Corporation 10

    Examples: Analytics for Operational Environments

    Capacity Trending: Server, Process, Middleware and DB trending to automatically highlight risk while there is time available to avoid outages or slow-downs. Extend with SPSS Correlation for maximum confidence.

    Dynamic Thresholds: automatically recommend and set thresholds based on attribute performance.

    Event Thresholds: manage flows to highlight important alerts:
    - throttle floods
    - escalate threshold events that are important

    Dynamic Thresholds and Trending are built into Tivoli Monitoring for immediate value, and significantly reduce set-up and administration of monitoring environments.
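To illustrate the idea of a threshold that is derived from observed attribute performance rather than set by hand, here is a minimal sketch. It is not how Tivoli Monitoring computes its recommendations (the slides do not say); it assumes a simple "mean plus k standard deviations" rule, and the function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def dynamic_threshold(history, k=3.0):
    """Upper threshold derived from history: mean + k sample standard deviations.

    The mean + k * stdev rule is an assumption chosen for illustration.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return mean(history) + k * stdev(history)


def breaches(history, new_value, k=3.0):
    """True if new_value is above the threshold learned from history."""
    return new_value > dynamic_threshold(history, k)


# Hypothetical CPU utilisation samples (percent) for one attribute.
cpu_history = [41.0, 43.5, 40.2, 44.1, 42.7, 39.8, 43.0, 41.9]

print(f"recommended threshold: {dynamic_threshold(cpu_history):.1f}%")
print("45% breaches:", breaches(cpu_history, 45.0))
print("60% breaches:", breaches(cpu_history, 60.0))
```

Because the threshold is recomputed from recent history, it follows the attribute as its normal level changes, which is the administration saving the slide refers to.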

  • 2013 IBM Corporation 11

    Example: Capacity Analytics for Cloud & Virtualized Infrastructure

    Provides visibility of how Resources are Allocated to Applications and Services
    - Cloud Consumers are conservative in estimating their system needs (Over / Under Estimate)
    - Understanding Historical behavior helps optimize Capital Management

    Know what Resources are Available and Predict how they will be Used
    - Maintain awareness of total and available capacity
    - Predict physical and virtual resource capacity bottlenecks
    - Gain business agility by determining room for expansion via what-if analysis

    Optimize Resource Allocation
    - Right-size virtual machines
    - Policy-driven workload placement for performance and security optimization

    Visibility into the cloud infrastructure
    - See and Manage all major Hypervisor environments from one place
    - Leverage Perspective from the Past to the Future to Ensure Resource availability
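Predicting a capacity bottleneck from historical behavior can be sketched with a plain linear trend. This is an illustrative assumption, not the product's forecasting method: the sketch fits a least-squares line to evenly spaced usage samples and estimates how many periods remain before the line reaches the available capacity. The function names and sample data are hypothetical.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until_full(ys, capacity):
    """Periods after the last sample until the trend line reaches capacity.

    Returns None when usage is flat or shrinking (no projected bottleneck).
    """
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    x_full = (capacity - intercept) / slope
    return max(0.0, x_full - (len(ys) - 1))


# Hypothetical weekly storage usage in GB for one datastore.
weekly_disk_gb = [500, 520, 540, 560, 580]

print("weeks until 700 GB is reached:", periods_until_full(weekly_disk_gb, 700))
# A simple what-if: how much time would an expansion to 900 GB buy?
print("weeks until 900 GB is reached:", periods_until_full(weekly_disk_gb, 900))
```

Re-running the same projection with a different `capacity` value is a very small version of the what-if analysis mentioned above.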

  • 2013 IBM Corporation 12

    IBM SmartCloud Analytics Log Analysis

    Delivers Problem Isolation and Faster Problem Resolution

    - Search and index unstructured data to provide a consolidated view
    - Built on IBM's Big Data platform
    - Integrate structured and unstructured data for better problem identification and resolution
    - Extensible, with IBM and partner expertise built-in
    - Get the last critical piece of data for identifying, isolating, and correcting problems faster

    Search: Faster Problem Resolution

    Find & correct problems faster with tools that determine actions required to resolve issues
    - Identify problems quicker with insight to large unstructured repositories
    - Isolate problems quicker by bringing relevant unstructured data into problem investigations
    - Repair problems quicker with the right details quickly to hand.

  • 2013 IBM Corporation 13

    Collects large volumes of obscure unstructured data and transforms it through analytics into actionable intelligence.

    IBM SmartCloud Analytics Log Analysis

    GBs of Obscure Log Files

    Single Actionable Dashboard

    Intelligent Support Docs

    Integration through Advanced Text Analytics

    Insight Packs

  • 2013 IBM Corporation 14

    IBM SmartCloud Analytics - Log Analysis Client Value

    IBM SmartCloud Analytics Log Analysis helps IT Generalists and Application Specialists accelerate problem resolution through rapid analysis of unstructured data

    Faster Problem Identification and Isolation
    - Quickly search structured and unstructured data. Perform cross domain analysis on this data.

    Faster Problem Repair
    - By linking expert knowledge to log error/warning messages

    Improved Service Availability and Maintainability of Custom Apps
    - Provide users with advanced insights into custom applications quickly

    Value Highlights
    - Collection and Annotation of data
    - Generic Logs Support
    - Federation of Data
    - Advanced Text Analytics
    - Downloadable insight packs on the ISM Library starting with WebSphere and DB2
    - Tools to create custom insight packs for your own applications

  • 2013 IBM Corporation 15

    SmartCloud Analytics Roadmap Overview

    Release: SmartCloud Analytics Log Analysis v1.1, Workgroup Edition
    Key Dates: June 2013
    Key Capabilities:
    - Fast to install and download
    - Data collection, annotation and indexing
    - Search UI
    - WAS & DB2 insight packs
    - Generic log support
    - Insight pack tooling

    Release: SmartCloud Analytics Log Analysis V.Next, Enterprise Edition and Workgroup Edition
    Key Dates: Q4 2013
    Key Capabilities:
    - Enterprise scalability*
    - Integration with Tivoli Monitoring and Event Management Solutions
    - Additional Content and Tooling
    - Logstash support for data collection

    * Enterprise scalability only on Enterprise Edition

  • 2013 IBM Corporation 16

    Linking Information

    - Linking of search results with structured data
    - Supports linking indexed data with federated sources
    - Plan to provide out of the box linkages with key Tivoli/IBM products

  • 2013 IBM Corporation 17

    Expert Guidance

    Provides expert advice by searching support docs.

    E.g. when an error message is found in a log file, search the support documentation for relevant information on further explanation and/or a fix.

  • 2013 IBM Corporation 18

    A Healthcare Provider reduces time to diagnose system problems by providing a holistic view of all relevant data

    Need
    - Had too many tools across structured and unstructured datasets, making problem resolution difficult and time consuming
    - Desired a solution to time-correlate a view into many sources of data to perform problem detection, isolation and repair

    Benefits
    - Reduced time to determine root cause of problems by leveraging performance, event and log data
    - Skills required to diagnose problems were easily saved and repeated to reduce overall costs

  • 2013 IBM Corporation 19

    90-Day Free Trial Available

    http://www-01.ibm.com/software/tivoli/products/log-analysis

  • 20

    Proactive, Predictive and Preventative Management

    - Few companies are genuinely proactive or preventative
    - Most organizations react to service outages in progress
    - Diagnosis can be complicated by organizational silos, disparate tools, complexity and the sheer volume of data.
    - Outages and degradation can cost millions of dollars, impact brand, customer churn & retention
    - CxOs are challenging their management teams to prevent outages rather than just reacting after failure.

    Why aren't operations teams preventative today?

    - Too much data to analyze manually
    - Existing analytic techniques, such as standard thresholds, are not up to the task
    - They cannot detect problems while they are emerging (before business impact)
    - Set threshold too high: insufficient warning before total failure
    - Set threshold too low: too much noise, everything is ignored

  • 21

    SmartCloud Analytics Predictive Insights

    Proactive and self-learning Performance intelligence

    Real-time analytics for detecting and avoiding service disruption

    Uses advanced Watson research algorithms

    Correlates metrics across multiple domains and heterogeneous data sources.

    Leverages IBM Big Data technology

    Embeds InfoSphere Streams, IBM's unique streaming analytic engine

    Enables ultra-high scalability on commodity server computing clusters and large algorithm sizes to maximize machine intelligence value

    Leverages InfoSphere Datastage, IBM's market leading mediation solution

    Quickly integrate to any monitoring source using a large library of out-of-the-box connectors

    Leverages your Tivoli and non-Tivoli environments

  • 22

    Predictive Anomaly Detection using Behavior Learning

    Automated problem detection
    - Learn the environment through statistical analysis & correlation
    - Predict problems with high confidence based on changes in metric behavior
    - Augment manually applied thresholds by noticing when metric behavior deviates from normal behavior
    - Watch related groups of metrics for additional insight and maximum predictive warning

    Discover and Group Resources based on behavior

    Augments service and application modeling

    Powered by IBM BigData and Data Mediation for rapid delivery, performance & scalability

  • 23

    Example Scenario: Internet Banking Application

    Goal: Automatically learn normal mathematical relationships between metrics

    [Chart: Web Response Time and User Requests plotted over Time, with "WRT Good" and "WRT Bad" levels and the markers "Early Warning", "Anomaly Event" and "Business Impacted"]

    Learns Web Response Time has a normal causal relationship with User Requests - WRT gets slower as user load gets higher.

    If this healthy historical relationship breaks down, say due to a memory leak, an anomaly is raised immediately

    The problem is detected even while WRT service is good

    Emerging problems can be detected even while service levels are good in absolute terms
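The scenario can be sketched in a few lines. This is a simplified illustration, not the Predictive Insights algorithm: it assumes the learned relationship is a straight line between User Requests and Web Response Time, and it flags an anomaly when a new observation falls far from that line, even if the response time is still below a fixed service-level threshold. All names, sample values and the 2.0 second static threshold are hypothetical.

```python
from statistics import stdev


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def learn_relationship(requests, wrt):
    """Learn WRT as a linear function of load, plus the usual residual spread."""
    slope, intercept = fit_line(requests, wrt)
    residuals = [y - (slope * x + intercept) for x, y in zip(requests, wrt)]
    return slope, intercept, stdev(residuals)


def is_anomalous(model, requests, wrt, k=3.0):
    """True if the observation is more than k residual deviations off the line."""
    slope, intercept, residual_sd = model
    expected = slope * requests + intercept
    return abs(wrt - expected) > k * residual_sd


# Hypothetical healthy history: WRT (seconds) gets slower as user load rises.
history_requests = [100, 200, 300, 400, 500, 600]
history_wrt = [0.52, 0.71, 0.88, 1.12, 1.29, 1.51]
model = learn_relationship(history_requests, history_wrt)

STATIC_THRESHOLD_S = 2.0  # hypothetical fixed "WRT Bad" level

# High load, response time in line with the learned relationship.
print(is_anomalous(model, 550, 1.40))
# Low load but unusually slow (for example a memory leak): still under the
# static threshold, yet the relationship with load has broken down.
print(1.20 < STATIC_THRESHOLD_S, is_anomalous(model, 150, 1.20))
```

The second observation is the "early warning" case: service is still good in absolute terms, but the response time no longer matches what the current load would explain.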

    [Diagram: Internet Banking application, with the labels Core Banking Application, z/OS, ESB, AIX, Java / WAS, RHEL, Oracle, Windows]

  • 24

    Correlation of Multiple Metrics Statistical models can discover mathematical relationships between metrics

    The extent this can be achieved depends on a number of factors, such as: range and type of data, availability of data, and stability of environment. Analytics falls back to a single metric if metrics are unrelated.
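A minimal sketch of the "correlate where possible, otherwise fall back to a single metric" idea is shown below. It assumes Pearson correlation as the measure of a mathematical relationship and a cut-off of 0.8; neither is stated in the slides, and the metric names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def related_metrics(series, target, min_abs_r=0.8):
    """Names of metrics strongly correlated with target.

    An empty list means no relationship was found, so the target would be
    analysed on its own (the single-metric fallback).
    """
    return sorted(
        name
        for name, values in series.items()
        if name != target and abs(pearson(series[target], values)) >= min_abs_r
    )


# Hypothetical metric samples taken at the same five points in time.
series = {
    "web_response_time": [0.5, 0.7, 0.9, 1.1, 1.3],
    "user_requests": [100, 200, 300, 400, 500],
    "disk_free_gb": [80, 79, 81, 80, 80],
}

print(related_metrics(series, "web_response_time"))
print(related_metrics(series, "disk_free_gb"))
```

With so few samples the result is only illustrative; as the slide notes, how well relationships can be learned depends on the range, type and availability of data and on the stability of the environment.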

    [Diagram: the Internet Banking application stack (Core Banking Application, z/OS, ESB, AIX, Java / WAS, RHEL, Oracle, Windows) with related metrics labeled A to I]

  • 25

    Multiple Metrics Analysis - Value of this approach

    Learns normal operational behaviour across the infrastructure, including how metrics behave together.

    Maximize Advance Warning: Identifies metric relationship changes that signal a problem long before traditional thresholds

    Identifies problems before you know to look for them

    Detects service impacts that are not identifiable by fixed thresholds alone.

    Assists with root cause analysis by indicating the most offending metrics.

    Reduces expensive and time consuming false alerts.

    Provides a more intelligent real-time assessment of data, able to detect problems as they are emerging

  • 26

    Large retail bank increased online banking application availability through predictive analytics

    Need
    - Ensure critical retail banking applications were online 24x7 for high customer satisfaction
    - Proactive anomaly detection was needed to ensure adequate time to resolve major incidents before they became service impacting

    Benefits
    - Alerted 10 major incidents in a 4 week period in advance of customer detection
    - Simultaneously monitors over 80 servers and 40K metrics
    - Estimated savings with outage avoidance analytics was $600K for this 4 week period

  • 27

    Example: Field Trial at Large Retail Bank

    Retail Bank experiencing severe problems with their online banking application

    Trial Scope:
    - Online banking service with back end application
    - ITM AIX, Linux, Windows, ITCAM for WAS, ITCAM for WRT
    - ~80 servers
    - ~40k metrics

    Results:
    - 15 Major Incidents reported during the 4 week trial period
    - 10 major incidents were detected or predicted by SCA-Predictive Insights
    - 5 missed incidents were application code problems and not manifest in health metrics
    - 100% of detectable problems detected

    Prediction & Detection Intervals:
    - Report included a Problem Start Time, a Problem Detection Time and Problem Resolution Time
    - 6 out of the 10 detected incidents were predicted before the customer's Problem Start Time
    - All 10 detected problems were detected before or around the customer's Problem Detection Time interval

    Results for this Customer
    - Using industry average outage costs, potential outage avoidance savings for 4 weeks: $600k
    - Event reduction savings for 4 weeks: $53k
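The trial percentages follow directly from the incident counts quoted on this slide. The short calculation below only restates those numbers; the variable names are hypothetical, and adding the two savings figures together assumes they do not overlap, which the slide does not state.

```python
# Counts and savings as quoted for the 4 week field trial.
reported_incidents = 15
detected_or_predicted = 10
not_visible_in_metrics = 5   # application code problems
predicted_before_start = 6
outage_avoidance_savings = 600_000  # USD
event_reduction_savings = 53_000    # USD

detectable = reported_incidents - not_visible_in_metrics
share_of_all = detected_or_predicted / reported_incidents
share_of_detectable = detected_or_predicted / detectable
share_predicted_early = predicted_before_start / detected_or_predicted
# Assumption: the two savings figures are additive.
combined_savings = outage_avoidance_savings + event_reduction_savings

print(f"detected, of all incidents:        {share_of_all:.0%}")
print(f"detected, of detectable incidents: {share_of_detectable:.0%}")
print(f"predicted before problem start:    {share_predicted_early:.0%}")
print(f"combined savings over 4 weeks:     ${combined_savings:,}")
```

The "100% of detectable problems" figure therefore refers to the 10 incidents that showed up in health metrics, not to all 15 reported incidents.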

  • 2013 IBM Corporation 28

    Backup

  • 29

    Solution Architecture - Mediation

    Predictive Insights components:
    - Mediation: IBM InfoSphere Datastage
    - Analytic Engine: IBM InfoSphere Streams
    - Analytic Application
    - Post-Processing Rules: uses OMNIbus Rule Engine
    - Anomaly Consolidation
    - User Interface & Management: Tivoli Integrated Portal

    Market leading mediation - provided as option

    Proven rapid integration to new data sources.

    Productivity tooling & collaboration included

    High performance and scalability.

    Large framework of connectors.

    Fast integration to common monitoring data formats.

    Windows based development environment

  • 30

    Mediation Rapid Common Extraction

    Predictive Insights provides a quick setup Common Extractor feature that allows fast extraction from the most common interface types such as:

    - CSV

    - Databases and database connectors, e.g. JDBC

    Monitoring Suite                  Interface              Implemented in trials
    HP Sitescope                      JDBC                   Yes
    Quest Foglight                    Script dump to CSV     Yes
    CA Wily Introscope                JDBC                   Yes
    IBM ITM                           TDW DB2                Yes
    IBM TDW Proxy Agent (low lat)     CSV                    Yes
    IBM ITCAM                         TDW DB2                Yes
    Compuware VAM                     Script dump to CSV     Yes
    HP Mercury BAC                    JDBC                   Yes
    IBM Performance Manager           CSV                    Yes
    Brix                              CSV                    Yes
    IBM Service Quality Manager       CSV                    Yes

    Other extractions can be quickly built from a large library of Datastage connectors
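To give a feel for what extraction from a CSV interface involves, here is a small sketch using only the Python standard library. It is not the Common Extractor itself, and the column layout (`timestamp`, `resource`, `metric`, `value`) is a hypothetical example, since the slides do not describe the expected file format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV dump from a monitoring tool; the column names are assumptions.
SAMPLE_CSV = """timestamp,resource,metric,value
2013-06-01T00:00:00,server01,cpu_util,41.5
2013-06-01T00:00:00,server01,mem_util,63.0
2013-06-01T00:05:00,server01,cpu_util,44.0
2013-06-01T00:05:00,server02,cpu_util,12.5
"""


def extract_series(csv_text):
    """Group CSV rows into one time series per (resource, metric) pair."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["resource"], row["metric"])
        series[key].append((row["timestamp"], float(row["value"])))
    return dict(series)


series = extract_series(SAMPLE_CSV)
for (resource, metric), points in sorted(series.items()):
    print(resource, metric, points)
```

The output of a step like this, one time series per resource and metric, is the shape of data that a downstream analytic engine would consume.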

  • 31

    Mediation Connector Library: InfoSphere DataStage

    RDBMS: DB2 (on Z, I, P or X series); Oracle; Informix (IDS and XPS); Ingres; Netezza; Progress; RDB; RedBrick; SQL/DS; SQL Server; Sybase (ASE & IQ); Teradata; Universe; UniData; NonStop SQL; InfoSphere Federation Server; InfoSphere Classic Federation; and more

    General Access: Sequential File; Complex Flat File; File Set; Data Set; Named Pipe; iWay; FTP; SFTP; Compressed / Encoded Data; External Command Call; Parallel/wrapped 3rd party apps; EMC InfoMover; Web logs; Email

    Enterprise Applications: JDE/PeopleSoft OneWorld; Oracle Applications; PeopleSoft; SAS; SAP BW; SAP R/3; Siebel; Ariba; Manugistics; I2; etc.

    Standards & Real Time: WebSphere MQ; Java Messaging Services (JMS); Java; XML & XSL-T; EBXML; Web Services (SOAP); Enterprise Java Beans (EJB); EDI; FIX; SWIFT; HIPAA

    CDC: DB2 (on Z, I, P, X series); Oracle; SQL Server; Sybase; Informix; IMS; VSAM; ADABAS; IDMS; Datacom

    Legacy: Allbase/SQL; C-ISAM; D-ISAM; Datacom/DB; DS Mumps; Enscribe; Essbase; FOCUS; IDMS/SQL; ImageSQL; Infoman; KSAM; M204; MS Analysis; Nomad; Nucleus; RMS S2000; Supra; TOTAL; TurboImage; Unify; and many more

  • 32

    Solution Architecture Analytic Engine

    Real-time streaming analytic engine, provided as a component

    High volume and low latency.

    Supports server clustering and redundancy (next release)

    Enables large algorithm capacity: 80,000 metrics in a single algorithm instance (a typical banking application produces ~30,000 - 60,000 metrics)

    Allows multiple algorithm instances spread across commodity server computing clusters, taking maximum advantage of multi-core parallelism (next release)

  • 33

    Solution Architecture Analytics

    Automated anomaly detection and prediction on time-series performance metrics

    Behavioural learning to model not only one metric at a time, but the relationships between them for anomaly detection:
    - Single metric evaluation replacing many manual thresholds for any time series data
    - Multiple metric correlation enabling earlier detection than traditional thresholds with higher confidence

  • 34

    Solution Architecture Anomaly Consolidation

    Targeted for next release, the alarm consolidation framework reduces the events that are presented externally, allowing for efficient processing and accurate alerts.

    Different techniques will be selectable depending on the richness of the data processed.

    It reduces the volume of external alarms forwarded to event consoles or application/domain administrators, without removing any information that could be useful in prediction, detection or RCA.

    Internal Events:
    - UV: Node A: Metric 1
    - UV: Node B: Metric 2
    - UV: Node C: Metric 3
    - MV: Node B, C: Metric 2, 3
    - MV: Node A, B, C: Metric 1, 3, 2
    - UV: Node M: Metric 47

    External Events:
    - EXT: Node A, B, C, Metric 1, 2, 3
    - EXT: Node M, Metric 47
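One way to arrive at the two external events from the six internal events listed on this slide is to merge every internal event that shares a metric with another. The sketch below does exactly that. It is an assumed consolidation rule for illustration only (the slide says different techniques will be selectable and does not describe them), and the function name and data layout are hypothetical.

```python
def consolidate(internal_events):
    """Merge internal events that share at least one metric.

    Each internal event is (nodes, metrics). Returns one external event per
    group of overlapping internal events, as (sorted nodes, sorted metrics).
    """
    groups = []  # list of (set of nodes, set of metrics)
    for nodes, metrics in internal_events:
        nodes, metrics = set(nodes), set(metrics)
        remaining = []
        for group_nodes, group_metrics in groups:
            if group_metrics & metrics:
                # Overlap: absorb the existing group into the new event.
                nodes |= group_nodes
                metrics |= group_metrics
            else:
                remaining.append((group_nodes, group_metrics))
        remaining.append((nodes, metrics))
        groups = remaining
    return [(sorted(n), sorted(m)) for n, m in groups]


# The internal events from the slide, as (nodes, metrics).
internal = [
    (["A"], [1]),
    (["B"], [2]),
    (["C"], [3]),
    (["B", "C"], [2, 3]),
    (["A", "B", "C"], [1, 3, 2]),
    (["M"], [47]),
]

for nodes, metrics in consolidate(internal):
    print("EXT: Node", ", ".join(nodes), "Metric", ", ".join(map(str, metrics)))
```

Six internal events become two external ones, and each external event still names every node and metric involved, so nothing useful for root cause analysis is dropped.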

  • 35

    Solution Architecture Anomaly Post Processing

    The post-processing engine allows anomaly events to be modified, customized, or enriched.

    It can optionally be used to put some business/domain context around the domain agnostic analytic anomaly events. Typically this will be used to re-prioritize anomaly severity to Major if it is service impacting.

    For example, if Metric 2 represents Online Banking Web Response Time, then the anomaly severity can be changed to Major.

    This reuses OMNIbus Probe rules libraries, but is dependent on having a northbound OMNIbus object server to receive the anomaly events.

    Internal Events:
    - UV: Node A: Metric 1
    - UV: Node B: Metric 2
    - UV: Node C: Metric 3
    - MV: Node B, C: Metric 2, 3
    - MV: Node A, B, C: Metric 1, 3, 2
    - UV: Node M: Metric 47

    External Events:
    - EXT: Node A, B, C, Metric 1, 2, 3
    - EXT: Node M, Metric 47
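The re-prioritization example can be expressed as a small enrichment function. Note that this is plain Python written for illustration, not OMNIbus probe rules syntax; the event layout, the `SERVICE_IMPACTING_METRICS` table and the starting severity of "Minor" are all assumptions.

```python
# Hypothetical business context: which metrics are service impacting.
SERVICE_IMPACTING_METRICS = {
    "Metric 2": "Online Banking Web Response Time",
}


def post_process(event, service_metrics=SERVICE_IMPACTING_METRICS):
    """Return a copy of the anomaly event, raised to Major if service impacting."""
    enriched = dict(event)
    hits = [m for m in event["metrics"] if m in service_metrics]
    if hits:
        enriched["severity"] = "Major"
        enriched["business_context"] = [service_metrics[m] for m in hits]
    return enriched


# The two external events from the slide, with an assumed default severity.
ext_abc = {
    "nodes": ["A", "B", "C"],
    "metrics": ["Metric 1", "Metric 2", "Metric 3"],
    "severity": "Minor",
}
ext_m = {"nodes": ["M"], "metrics": ["Metric 47"], "severity": "Minor"}

print(post_process(ext_abc))
print(post_process(ext_m))
```

Only the first event touches a service-impacting metric, so only its severity changes; the second passes through untouched.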

  • 36

    Solution Architecture User Interface & Management

    TIP based anomaly visualization

    Allows all anomalous metrics to be visualized together

    Normalizes metric scales, and allows pan/zoom etc., so that anomalous conditions are more readily apparent.

    In-context linking between OMNIbus, TBSM, ITMM AEL and anomaly charts

    Inherits all TIP features for unified user management and permissions.
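Normalizing metric scales is what makes it possible to draw, say, a response time in seconds and a request count in the hundreds on one chart. The slides do not say which normalization the UI applies; the sketch below assumes simple min-max scaling to the range 0 to 1, with hypothetical sample values.

```python
def normalize(values):
    """Min-max scale values to the range 0..1 (all zeros for a flat series)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two hypothetical metrics on very different scales.
user_requests = [100, 200, 300]
web_response_time_s = [0.5, 0.7, 1.3]

print(normalize(user_requests))
print(normalize(web_response_time_s))
```

After scaling, both series occupy the same vertical range, so a metric that departs from the others stands out regardless of its original units.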

  • 2013 IBM Corporation 37

    IBM SmartCloud Analytics - Differentiators

    Expert Advice: Out of the box IBM expertise based on advanced text analytics

    Easily Extendable: For customers, OEM or value-added reseller to add their own Insight Packs

    Integration: Integration with Tivoli products

    Licensing Model: Based on average data consumption, not your worst day

    Data Federation: Easily link structured and unstructured data

    Multi Source: Ingests metrics, config, events, logs, traces, and topology to perform RCA, not just logs

    Platform: Core technology based on IBM's Big Data Platform, with potential to use a common Big Data platform for the entire business

    Text Analytics: Only product in industry with advanced text analytics engine able to extract insights from unstructured sources like service tickets & support documents

  • 2013 IBM Corporation 38

    IBM SmartCloud Analytics - Differentiators

    End to end Solution: End to end solution to predict, analyze and resolve problems, not just a point solution that analyzes a narrow spectrum of data

    Anomaly Detection: Anomaly detection and prediction on logs and metrics - helps users know what to search for and what is trending

  • 2013 IBM Corporation 39

    User Scenarios

  • 2013 IBM Corporation 40

    Targeted Users

    Application developer
    Develops and tests large distributed multi-component applications on a middleware stack. Debugs application during development.

    Analytic Content Creator
    Domain experts build content and leverage available support documentation

    Operations Teams (IT Operations)
    Continuously monitors for anomalies across the entire infrastructure. When an anomaly is detected, this user routes the problem to the right team.

    Application owner/Support Engineer
    Supports a business application that is built on middleware. Needs to solve problems quickly to avoid any business impact. Needs to capture best practices in a tool so that diagnosis is less dependent on skills availability.

  • 2013 IBM Corporation 41

    Example 1

    A developer validates a complex distributed application by running tests overnight. He wants to know which points in his application are causing exceptions through detailed analysis of the stack traces.

    - He uses text analytics of complex logs such as stack traces to find patterns
    - Searches for frequent problems and patterns to decide which portion of the code requires attention.

  • 2013 IBM Corporation 42

    Example 2

    Users of a healthcare application are facing problems. It's taking too much time to view patient data and at times they see errors on their browser. They complain to customer support.

    - The customer support engineer creates a ticket. The application support engineer now needs to find the root cause of the problem.
    - The application is distributed over multiple nodes and involves various middleware and legacy technologies. It generates different types and large volumes of metric and log data.
    - The support engineer searches the application logs to locate a period when transactions were slow for a specific set of users.
    - Using problem context, he searches expert advice for solutions

  • 2013 IBM Corporation 43

    Product Capabilities

  • 2013 IBM Corporation 44

    Key Product Capabilities to Meet Client Needs

    Multiple options to upload log data

    Search a massive amount of log files

    Link structured data with unstructured data

    Expert guidance

    Analytics for trends and anomaly detection

    Ability to build insight packs for domain/application

  • 2013 IBM Corporation 45

    Multiple Options to Upload Log Data

    Business Users

    Application/system

    Application Components

    Log Analytics Server

    Push logs (Log File Agent, REST interface)

    Pull logs using remote monitoring (agentless option)

    App Developer/ IT Ops Engineer

  • 2013 IBM Corporation 46

    Searching Information

    Log file

    [10/9/12 5:51:38:295 GMT+05:30] 0000006a servlet E com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: Uncaught exception created in one of the service methods of the servlet TradeAppServlet in application DayTrader2-EE5. Exception created : javax.servlet.ServletException: TradeServletAction.doSell(...) exception selling holding 3111 for user =uid:43 at org.apache.geronimo.samples.daytrader.web.TradeServletAction.doSell(TradeServletAction.java:708)

    Log Analytics Server

  • 2013 IBM Corporation 47

    Text Analytics on Logs

    Leverages Market leading Text Analytics solution

    Need to extract what are errors and class names from WebSphere logs

    Information extraction may involve complex context sensitive grammar that needs processing beyond simple regular expression parsing.

    Follows immediately

    Within a single log record

    Developer: Which are the top 10 Java classes that have most errors ?

    [7/25/12 8:27:09:391 CEST] 00000028 E com.ibm.bpe.util.Assert.assertion
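For the simplest cases, the developer's question ("which Java classes have the most errors?") can be approximated with a regular expression and a counter, as sketched below. This is deliberately the naive approach that the slide says is not sufficient in general: the pattern is an assumption fitted to the two WebSphere-style record layouts shown in this deck, the second sample line is a made-up variation of the first, and real logs would need the context sensitive handling that text analytics provides.

```python
import re
from collections import Counter

# Assumed layout: [timestamp] thread component level [class method] MSGID: text
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<thread>[0-9a-fA-F]{8})\s+"
    r"(?P<component>\S+)\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:(?P<cls>[\w$]+(?:\.[\w$]+)+)\s+(?P<method>\S+)\s+)?"
    r"(?P<msgid>[A-Z]{4,5}\d{4}[A-Z]):\s*(?P<message>.*)$"
)


def top_error_classes(lines, n=10):
    """Count level 'E' records per Java class and return the n most common."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") == "E" and match.group("cls"):
            counts[match.group("cls")] += 1
    return counts.most_common(n)


sample = [
    # From the "Searching Information" slide (message shortened).
    "[10/9/12 5:51:38:295 GMT+05:30] 0000006a servlet E "
    "com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: "
    "Uncaught exception created in one of the service methods of the servlet",
    # Hypothetical second occurrence, added for illustration only.
    "[10/9/12 5:51:39:101 GMT+05:30] 0000006b servlet E "
    "com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: "
    "Uncaught exception created in one of the service methods of the servlet",
    # From the "Ability to Build Insight Packs" slide: no class name present.
    "[07/25/12 02:38:25:295 GMT+05:30] 00000010 TraceResponse E DSRA1120E: "
    "Application did not explicitly close all handles to this Connection.",
]

for cls, count in top_error_classes(sample):
    print(count, cls)
```

Records without a class name, such as the DSRA1120E line, are parsed but not counted, and records that do not match the assumed layout at all are silently skipped, which is exactly where a grammar-aware annotator earns its keep.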

  • 2013 IBM Corporation 48

    Sample App Dashboard

  • 2013 IBM Corporation 49

    Example: Metadata Extraction from Tech Note

  • 2013 IBM Corporation 50

    Ability to Build Insight Packs

    [07/25/12 02:38:25:295 GMT+05:30] 00000010 TraceResponse E DSRA1120E: Application did not explicitly close all handles to this Connection. Connection cannot be pooled.

    Provides ability to build / deliver client, industry or scenario specific use cases. Insight Pack contains information such as:
    - How to interpret log data
    - How to link log data with metric data and application data
    - What to search for in log files
    - What are the sources of expert advice

  • 2013 IBM Corporation 51