BigData in Banking: Challenges and Solutions
Arshavsky Andzhey, Director, Big Data dept., SberBank
Avarshavsky.sbt@sberbank.ru
2015


Innovations as killers: stages in the destruction of the standard banking system

① Internet & social networks: control and choice
② Screens and smartphones: anyplace, any time
③ Mobile wallet: no cash or plastic cards
④ Accounts without banks: no bank accounts
⑤ BigData: cross-system personalization and targeting

Source: Brett King, "Bank 3.0"

BIGDATA as the development of approaches to the use of data

[Diagram: the value of information for business (vertical axis) against the maturity of information usage methods (horizontal axis)]

Stages, from least to most mature:
• Data for business
• Information for business analysis
• Information as a strategic asset
• Information as an innovation enabler
• Information as a competitive differentiator

Stage captions: "Day-by-day operations", "Data warehousing", "Information in business context", "Business innovations based on information", "Adaptive business strategy"

Other labels: BIGDATA + Internet and open data; flexible information infrastructure; continuous business innovations; information in the management context; change management

Information challenges in large (XL) banks

• Data is the most valuable asset in all XL banks.
• Few know how to apply data to solve even present-day challenges.
• Few know how to leverage internet, external, or open data sources to understand clients better and attract new customers.

The key challenge with data analysis

By developing a Big Data infrastructure that solves the challenges of data pre-processing and attribution through an intelligent data-processing framework, the company will be able to optimize labor costs, reducing the data-preparation work needed to develop business applications by up to 70%.

Gartner estimates that 70% of the time spent on analytical projects goes to gathering, cleaning, and integrating data, mainly because of the following problems:

• Data is hard to locate because it is scattered among disparate business applications and business systems.
• To be suitable for analysis, data requires reengineering and reformatting.
• Acquiring data for analysis in a specified format places a huge burden on the teams that own the source systems.
• The same data is often requested or purchased by several departments and business units, which creates additional work and chaos.
• Processes for regular data exchange need to be set up.

Data and analytics tools as a shared resource

Shared data: Client, Product, Transactions, Location, …, Instruments

Consumers:
• RISKS Dept
• RETAIL Dept
• OPERATIONS Dept
• SECURITY Dept
• CORPORATE CLIENTS Dept
• HR

BIGDATA is to a lesser extent about data size and more about the opportunity to work with many different data types, formats, and applications with powerful analytic capabilities.

Sources of business growth and execution excellence

Client
• ACQUISITION
• RETENTION
• SALES: PRIMARY, SECONDARY
• LOANS, RISKS, DEBTS
• ANTI-FRAUD: INTERNAL, EXTERNAL
• HR
• PROCESS OPTIMIZATION

Data Factory concept

The Big Data Factory should enable data processing in a uniform manner for all platforms, functions, and customers. The goal is an easily changeable, easy-to-use operating model for data processing, with the required level of trust for both traditional and not-so-traditional data sources.

Tasks
• Information delivery
• Information integration (cleaning, transformation, mapping, improvement)
• Information search
• Access to information
• Studying hypotheses
• Model training and information analysis
• Backup, cleanup, restore
• Administration

Information trust
• Lifecycle management
• Data quality
• Reference data
• Record linkage and the resolution of contradictions
• Classification
• Reporting

Traditional and not-so-traditional data sources
• Internet data
• Data virtualization
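As a minimal sketch of the integration and record-linkage tasks listed above (not the bank's actual implementation), the Python below maps records from two hypothetical source systems to a common layout, cleans them, links them on a normalized key, and resolves contradictions by preferring the most recently updated record. All field names, the sample records, and the "latest update wins" rule are assumptions made for illustration.

```python
from datetime import date

# Hypothetical records from two source systems (illustrative data only).
CRM_ROWS = [
    {"full_name": "  Ivan PETROV ", "phone": "+7 (495) 123-45-67",
     "city": "Moscow", "updated": date(2015, 3, 1)},
]
CARD_ROWS = [
    {"client": "ivan petrov", "tel": "84951234567",
     "town": "Saint Petersburg", "changed": date(2015, 6, 15)},
]

# Mapping: source field name -> common field name, per system.
FIELD_MAPS = {
    "crm": {"full_name": "name", "phone": "phone",
            "city": "city", "updated": "updated"},
    "cards": {"client": "name", "tel": "phone",
              "town": "city", "changed": "updated"},
}


def clean_phone(raw):
    """Keep digits only and turn a leading 8 into 7 (assumed convention)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 11 and digits.startswith("8"):
        digits = "7" + digits[1:]
    return digits


def to_common(row, system):
    """Map a source row to the common layout and clean its values."""
    mapped = {FIELD_MAPS[system][key]: value for key, value in row.items()}
    mapped["name"] = " ".join(mapped["name"].split()).title()
    mapped["phone"] = clean_phone(mapped["phone"])
    mapped["source"] = system
    return mapped


def link_records(rows):
    """Group rows by (name, phone); on conflict the latest update wins."""
    linked = {}
    for row in rows:
        key = (row["name"], row["phone"])
        current = linked.get(key)
        if current is None or row["updated"] > current["updated"]:
            linked[key] = row
    return list(linked.values())


common = ([to_common(r, "crm") for r in CRM_ROWS]
          + [to_common(r, "cards") for r in CARD_ROWS])
golden = link_records(common)
print(golden)
```

Exact-key matching is the simplest possible linkage rule; a production system would more likely use fuzzy matching and per-field survivorship rules.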

BIGDATA platform: high-level concept (Big Data Competence Center)

Data Factory scenarios

Subject-matter experts across the bank's business areas need access to the organization's data for research, sampling, annotation, and modelling.

Daily activity
• Data scientists work on new models
• Marketing is looking for data for new campaigns
• Security services are looking for data to drill into a suspicious transaction
• The retail unit wants to make the best offer to the client
• …

Requirements
• Ad hoc access to diverse data
• Support for analysis and decision making
• Use of subject-matter experts' own terminology when accessing data
• Access as easy as working with data in spreadsheets, able to scale to huge volumes and a huge variety of information types while protecting sensitive information and optimizing IT storage systems
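One way to picture the last two requirements is a thin access layer that lets experts ask for data in business terms while masking sensitive fields. The Python sketch below is illustrative only: the glossary, the column names, the sample rows, and the masking rule are all hypothetical and are not taken from the presentation.

```python
# Hypothetical business glossary: expert-facing term -> physical column name.
GLOSSARY = {
    "client name": "cl_nm",
    "card number": "crd_no",
    "monthly spend": "spend_m",
}

# Columns treated as sensitive in this sketch (an assumption).
SENSITIVE = {"crd_no"}

# Illustrative rows as they might sit in a storage layer.
ROWS = [
    {"cl_nm": "Client A", "crd_no": "4000123412341234", "spend_m": 1520.0},
    {"cl_nm": "Client B", "crd_no": "4000567856785678", "spend_m": 310.5},
]


def mask(value):
    """Replace all but the last four characters with asterisks."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def query(terms, rows, can_see_sensitive=False):
    """Return rows keyed by business terms, masking sensitive columns."""
    result = []
    for row in rows:
        out = {}
        for term in terms:
            column = GLOSSARY[term]  # raises KeyError for unknown terms
            value = row[column]
            if column in SENSITIVE and not can_see_sensitive:
                value = mask(value)
            out[term] = value
        result.append(out)
    return result


report = query(["client name", "card number"], ROWS)
print(report)
```

Keeping the glossary and the sensitivity list as data rather than code means both can change without touching the query logic.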

Data 2 profit process

• TASK FORMALIZATION
• DATA PREPARATION
• DATA EXPLORATION
• ADDITIONAL INDICATORS
• ANALYTICS & MODELING
• MODEL VALIDATION
• MODEL PRODUCTIZATION
• EFFICIENCY MONITORING
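The stages above can be read as a sequential pipeline in which each stage consumes the output of the previous one. The following Python sketch is purely illustrative: the toy data, the rule-based "model", and every function name are assumptions. It covers preparation through validation only, and real productization and efficiency monitoring are out of scope here.

```python
# Toy data: monthly card spend and whether the client accepted an offer.
RAW = [
    {"spend": "1200", "accepted": 1},
    {"spend": "300", "accepted": 0},
    {"spend": None, "accepted": 0},  # incomplete record
    {"spend": "900", "accepted": 1},
    {"spend": "150", "accepted": 0},
    {"spend": "700", "accepted": 1},
]


def prepare(rows):
    """Data preparation: drop incomplete rows and convert types."""
    return [{"spend": float(r["spend"]), "accepted": r["accepted"]}
            for r in rows if r["spend"] is not None]


def explore(rows):
    """Data exploration: a single summary statistic for this sketch."""
    return sum(r["spend"] for r in rows) / len(rows)


def add_indicators(rows, mean_spend):
    """Additional indicators: flag clients who spend above the mean."""
    return [dict(r, high_spender=r["spend"] > mean_spend) for r in rows]


def build_model():
    """Modeling: a fixed rule that predicts acceptance for high spenders."""
    return lambda r: int(r["high_spender"])


def validate(model, rows):
    """Model validation: share of rows where the rule matches the outcome.

    A real validation would use held-out data, not the rows used to
    derive the indicator.
    """
    hits = sum(model(r) == r["accepted"] for r in rows)
    return hits / len(rows)


clean = prepare(RAW)
mean_spend = explore(clean)
enriched = add_indicators(clean, mean_spend)
model = build_model()
accuracy = validate(model, enriched)
print(f"mean spend: {mean_spend:.1f}, accuracy: {accuracy:.2f}")
```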

Possible architecture

[Diagram components]
• HDFS raw data
• Data exchange
• Data preparation, processing and analytical layer
• Analytical views
• Ad-hoc analytics
• Development factory
• Streaming
• Big Data applications
• Integration marts
• API

BI & BIGDATA

Traditional BI
• Based on DWH
• Precision is crucial
• Flat data scheme
• Long time to market
• High-end hardware

Big Data
• Based on Hadoop and Spark
• Any precision
• Complex and variable data schemes
• Ad-hoc analytics
• Short time to market
• New data sources
• Low cost

Both approaches are valid.
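To illustrate the difference between a flat data scheme and a complex, variable one, the Python sketch below flattens nested records of differing shape into rows with one common set of columns. The event layout and field names are hypothetical.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical events with differing structure (a variable scheme).
EVENTS = [
    {"client": 1, "channel": "atm", "amount": 500},
    {"client": 2, "channel": "mobile", "device": {"os": "ios", "model": "6s"}},
]

flat_rows = [flatten(event) for event in EVENTS]

# A flat scheme needs one fixed column list; missing values become None.
columns = sorted({key for row in flat_rows for key in row})
table = [[row.get(col) for col in columns] for row in flat_rows]

print(columns)
for line in table:
    print(line)
```

The variable records keep only the fields each event actually has, while the flat table pays for uniformity with empty cells.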

BIGDATA is not expensive: OPEN SOURCE does work

• Low cost
• No vendor lock-in
• Community support

Stack components: APPLICATION LAYER, Spark, Hadoop, SQL, NoSQL, DB

Thanks and good luck!

Page 7: BigData in Banking

8

Sources of business growth and execution excelenceBIGDATA in Banking

Client

ПРИВЛЕЧЕНИЕ

УДЕРЖАНИЕ

ПРОДАЖИ

ПЕРВИЧНЫЕ

ВТОРИЧНЫЕ

КРЕДИТЫРИСКИ

ЗАДОЛЖЕННОСТИ

АНТИФРОД

ВНУТРЕННИЙ

ВНЕШНИЙ

HR ОПТИМИЗАЦИЯ

ПРОЦЕССОВ①

③ ④

9

BIGDATA in Banking: Data Factory concept

The Big Data Factory should enable data processing in a uniform manner for all platforms, functions and customers. The goal is to build an easily changeable, easy-to-use operating model for data processing, with the required level of trust for both traditional and less traditional data sources.

Tasks

  • Information delivery
  • Information integration (cleaning, transformation, mapping, improvement)
  • Information search
  • Access to information
  • Studying hypotheses
  • Model training and information analysis
  • Backup, cleanup, restore
  • Administration

Information trust

  • Lifecycle management
  • Data quality
  • Reference data
  • Record linkage and the resolution of contradictions
  • Classification
  • Reporting

Traditional and not so traditional data sources

  • Internet data
  • Data virtualization
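
Of the tasks listed above, record linkage and the resolution of contradictions is the most algorithmic. A minimal sketch follows, assuming exact matching on a normalized name and phone number and a "most recent non-empty value wins" rule for conflicts. The field names, sample records and both rules are hypothetical; real linkage usually relies on fuzzy or probabilistic matching.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_name(name: str) -> str:
    # Lowercase, drop punctuation, collapse repeated whitespace.
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def normalize_phone(phone: str) -> str:
    # Keep digits only and compare on the last 10 of them.
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]


def link_records(records: List[dict]) -> List[List[dict]]:
    # Records sharing the same normalized key are treated as one entity.
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), normalize_phone(rec["phone"]))
        groups[key].append(rec)
    return list(groups.values())


def resolve(group: List[dict]) -> dict:
    # Contradiction rule: later updates override earlier ones,
    # but an empty value never overwrites a filled one.
    merged: dict = {}
    for rec in sorted(group, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if value not in (None, ""):
                merged[field] = value
    return merged


RECORDS = [
    {"source": "crm", "name": "Ivan  Petrov", "phone": "+7 (495) 123-45-67",
     "city": "Moscow", "updated": "2015-01-10"},
    {"source": "web", "name": "ivan petrov.", "phone": "8 495 123 45 67",
     "city": "Kazan", "updated": "2015-03-02"},
    {"source": "crm", "name": "Anna Sidorova", "phone": "+7 912 000-11-22",
     "city": "", "updated": "2015-02-01"},
]

linked = link_records(RECORDS)
golden = [resolve(group) for group in linked]
```

The first two records collapse into one entity whose city comes from the later update, while the third stays on its own and simply has no city value.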

Big Data Competence Center: BIGDATA PLATFORM HIGH-LEVEL CONCEPT

BIGDATA in Banking: Data Factory scenarios

Subject-matter experts across the bank's business need access to the organization's data for research, sampling, annotation and modelling

Data scientists work on new models

Marketing is looking for data for new campaigns

Security services are looking for data to drill into a suspicious transaction

The retail unit wants to make the best offer to the client…

Daily activity

The need for ad hoc access to diverse data

Support for analysis and decision making

Using the terminology of subject-matter experts when accessing data

Providing the same easy access to data as in spreadsheets, with the ability to scale to huge volumes and to distribute across a huge variety of information types, while protecting sensitive information and optimizing IT storage systems
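
Giving analysts spreadsheet-like access while protecting sensitive information usually means transforming sensitive fields before the data reaches them. The sketch below assumes two hypothetical rules: card numbers are masked down to the last four digits, and names are replaced by a salted hash so records can still be joined. The field names, the salt and the helper functions are illustrative; a salted hash alone is pseudonymization, not full anonymization.

```python
import hashlib


def mask_card(card_number: str) -> str:
    # Keep only the last four digits visible.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return "*" * (len(digits) - 4) + digits[-4:]


def pseudonymize(value: str, salt: str) -> str:
    # Stable token: the same input and salt always give the same output,
    # so analysts can group by client without seeing the name.
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def protect(record: dict, salt: str = "demo-salt") -> dict:
    # Work on a copy so the source record is left untouched.
    safe = dict(record)
    if "card_number" in safe:
        safe["card_number"] = mask_card(safe["card_number"])
    if "name" in safe:
        safe["name"] = pseudonymize(safe["name"], salt)
    return safe


row = {"name": "Ivan Petrov", "card_number": "4276 1234 5678 9012", "amount": 150.0}
safe_row = protect(row)
```

Non-sensitive fields such as `amount` pass through unchanged, so the protected view stays usable for ad hoc analysis.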
