jurijs-jefimovs
Oracle Advanced Analytics: insurance claim fraud detection
Oracle Innovation Days 2015, Riga
• Established in November, 2007
• 100+ employees
• Customers in the Nordics, Latvia, Russia and the USA
• Provide systems integration services (CRM, Decision Support Systems)
• Develop original products (Micromiles, Debessmana)
Who we are
• Defining needs
• Collecting data
• Generating and evaluating options
• Selecting the best possible
• Applying and using
• Getting feedback and following up
Decision-Making Process Is …
Data Mining is
• the computational process of discovering patterns in large data sets
• Knowledge Discovery in Databases
What is Data Mining?
Financial Services
- Credit risk analysis
- Cross-LOB up-selling
- Fraud detection
- Retail banking personalization
- “Best customer” prediction & profiling
Retail
- Product recommendations
- Customer segmentation
- Customer profiling
- Market Basket Analysis
Telecommunications
- Churn prevention
- Social network analysis
- Network monitoring
- Customer handling time reduction
Transportation and logistics
- Anticipate bottlenecks
- Proactive resource planning
- Improved preventative maintenance strategies
Data Mining use cases
Cross Industry Standard Process for Data Mining (CRISP-DM)
Business Understanding: Business Objectives • Success Criteria • Project Plan • Deliverables
Data Understanding: Initial Data Collection • Data Description • Data Exploration
Data Preparation: Data Cleaning • Sampling • Normalization • Feature Selection
Modeling: Select Modeling Techniques • Build/Train Model • Prediction
Evaluation: Model Validation • Review Results • Success Criteria Evaluation
Deployment: Results Visualization • Report Creation
Business Understanding
Fraud detection analysis for insurance claims (car insurance)
Business Objectives
The goal of this analysis is to create a tool that helps identify fraudulent claims in auto insurance (KASKO).
Deliverables
• Possible fraud prediction
• Descriptive analysis
Data Understanding
Initial Data Collection
• 250 attributes
• 404 k claims
• 4% fraud
[Chart: Fraud vs. Normal claims]
Source: Oracle Siebel CRM
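To make the class imbalance concrete, here is a small arithmetic sketch in Python. It treats "404 k" and "4%" as round figures (both are presumably rounded on the slide), so the counts are approximate. The variable names are illustrative only.

```python
# Figures from the slide, taken as round numbers (assumption: both are rounded)
total_claims = 404_000
fraud_rate = 0.04

fraud_claims = round(total_claims * fraud_rate)
normal_claims = total_claims - fraud_claims

# Accuracy of a trivial model that labels every claim as "normal"
baseline_accuracy = normal_claims / total_claims

print(fraud_claims, normal_claims, f"{baseline_accuracy:.0%}")
```

A model that never flags anything would already be right about 96% of the time, which is why raw accuracy on the original distribution says little about fraud detection quality.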
Data preprocessing
[Chart: Fraud vs. Normal]
Activities:
• normalization
• imputing missing data
• attribute selection
• stratified sampling
Data split:
• 70% training dataset
• 30% test dataset
Final data set: 150 of 250 attributes selected
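The preprocessing activities can be sketched in plain Python. This is a minimal illustration, not the Oracle Data Miner implementation: it assumes mean imputation, min-max normalization, and a stratified split that preserves the class ratio in both partitions (the slides do not say which methods were used, or whether classes were also rebalanced). All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def stratified_split(rows, label_key, train_fraction=0.7, seed=42):
    """Split rows so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Hypothetical toy data: 100 claims, 4 fraudulent, every 10th amount missing
claims = [
    {"id": i, "fraud": i < 4, "amount": None if i % 10 == 0 else 100.0 + i}
    for i in range(100)
]
amounts = min_max_normalize(impute_mean([c["amount"] for c in claims]))
for claim, amount in zip(claims, amounts):
    claim["amount"] = amount

train, test = stratified_split(claims, "fraud")
print(len(train), len(test))
```

Splitting per class, rather than over the whole data set at once, keeps the rare fraud cases represented in both the training and the test partition.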
Data Mining techniques
• Classification
• Clustering
Data mining tools: Oracle Data Miner
Modeling
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
– In-database data mining algorithms and open source R algorithms
– SQL, PL/SQL, R languages
– Scalable, parallel in-database execution
– Workflow GUI and IDEs
– Integrated component of Database
– Enables enterprise analytical applications
Key Features
Oracle Advanced Analytics: Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
Oracle Advanced Analytics Architecture
Oracle Advanced Analytics: Native SQL Data Mining/Analytic Functions + High-performance R Integration for Scalable, Distributed, Parallel Execution
[Diagram components: Oracle Database Enterprise Edition, SQL Developer, R Client, Applications, OBIEE]
Function | Algorithms | Applicability
Classification | Logistic Regression (GLM) | Classical statistical technique
Classification | Decision Trees | Popular / Rules / transparency
Classification | Naïve Bayes | Embedded app
Classification | Support Vector Machines (SVM) | Wide / narrow data / text
Regression | Linear Regression (GLM) | Classical statistical technique
Regression | Support Vector Machine (SVM) | Wide / narrow data / text
Anomaly Detection | One Class SVM | Unknown fraud cases or anomalies
Attribute Importance | Minimum Description Length (MDL), Principal Components Analysis (PCA) | Attribute reduction, Reduce data noise
Association Rules | Apriori | Market basket analysis / Next Best Offer
Clustering | Hierarchical k-Means, Hierarchical O-Cluster, Expectation-Maximization Clustering (EM) | Product grouping / Text mining, Gene and protein analysis
Feature Extraction | Nonnegative Matrix Factorization (NMF), Singular Value Decomposition (SVD) | Text analysis / Feature reduction
Oracle Advanced Analytics In-Database Data Mining Algorithms—SQL & R & GUI Access
• Automated data preprocessing (normalizing, cleaning)
• Workflow type modeling
• Build several models in parallel
Modeling
Classification modeling using Oracle Data Miner
Models comparison and validation (confusion matrix)
Classification modeling evaluation
Model | Actual value | Predicted Y | Predicted N | Accuracy
SVM | Y | 66% | 34% | 69%
SVM | N | 29% | 71% |
DT | Y | 66% | 34% | 66%
DT | N | 33% | 67% |
GLM | Y | 70% | 30% | 70%
GLM | N | 30% | 70% |
Where Y = fraud cases, N = normal cases
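The confusion matrices are row-normalized: each row gives the share of actual Y or N cases predicted as each class. Overall accuracy then depends on the class mix of the evaluation set. The sketch below computes it as a class-weighted mean of the per-class hit rates. The two class mixes shown (50% and 4% fraud) are assumptions for illustration; the slides do not state the mix of the test set.

```python
# Share of correctly predicted cases per actual class, from the slide
models = {
    "SVM": {"Y": 0.66, "N": 0.71},
    "DT": {"Y": 0.66, "N": 0.67},
    "GLM": {"Y": 0.70, "N": 0.70},
}

def accuracy(correct_rate, fraud_share):
    """Overall accuracy as the class-weighted mean of per-class hit rates."""
    return correct_rate["Y"] * fraud_share + correct_rate["N"] * (1 - fraud_share)

results = {}
for name, rates in models.items():
    balanced = accuracy(rates, fraud_share=0.5)    # assumed balanced test set
    original = accuracy(rates, fraud_share=0.04)   # assumed original 4% fraud mix
    results[name] = (balanced, original)
    print(f"{name}: balanced={balanced:.3f}, at 4% fraud={original:.3f}")
```

The reported accuracies (69%, 66%, 70%) are close to the balanced-mix values, which suggests, but does not prove, that evaluation was done on a roughly balanced sample.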
Cluster evaluation
% of fraud vs normal cases
The top left quadrant is our goal
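The per-cluster fraud share behind a chart like this can be computed as follows. This is a minimal sketch: the function name, the cluster IDs and the toy assignments are hypothetical, and the actual clusters came from Oracle Data Miner.

```python
from collections import Counter

def fraud_share_by_cluster(assignments):
    """Return the fraction of fraud cases in each cluster.

    assignments: iterable of (cluster_id, is_fraud) pairs.
    """
    totals, frauds = Counter(), Counter()
    for cluster_id, is_fraud in assignments:
        totals[cluster_id] += 1
        if is_fraud:
            frauds[cluster_id] += 1
    return {cluster_id: frauds[cluster_id] / totals[cluster_id] for cluster_id in totals}

# Hypothetical toy assignments: (cluster_id, is_fraud)
toy = [
    (1, True), (1, True), (1, False),
    (2, False), (2, False), (2, False), (2, True),
    (3, False),
]
shares = fraud_share_by_cluster(toy)
for cluster_id, share in sorted(shares.items()):
    print(f"cluster {cluster_id}: {share:.0%} fraud")
```

Clusters with a fraud share well above the overall rate are the ones worth describing and investigating further.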
Cluster analysis OBIEE dashboard
Fraudulent claims prediction
Output:
- List of possible fraudulent cases
- Probabilities
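One way to turn model probabilities into such a list is to keep the claims above a cut-off and sort them by descending probability. The sketch below assumes a threshold of 0.5; the function name, claim IDs and scores are hypothetical, and in practice the threshold would be tuned to investigator capacity.

```python
def rank_suspicious(scored_claims, threshold=0.5):
    """Return (claim_id, probability) pairs at or above the threshold,
    ordered from most to least suspicious."""
    flagged = [(claim_id, p) for claim_id, p in scored_claims if p >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical model output: (claim_id, predicted fraud probability)
scored = [("C-001", 0.12), ("C-002", 0.91), ("C-003", 0.55), ("C-004", 0.49)]
suspicious = rank_suspicious(scored)
for claim_id, probability in suspicious:
    print(claim_id, f"{probability:.0%}")
```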
Contacts
• Web: www.ideaportriga.lv
• Blog: blog.ideaportriga.lv
• Email: [email protected]
• LinkedIn: lv.linkedin.com/in/jurijsj
Find out more
Q&A