Nguyễn Phạm Luân Tiến 50702449 Trần Đình Hương Trà 50702573 Dương Bách Tùng 50702839


Content

1. Introduction to OLAP Systems
2. Security Requirements in OLAP Systems
3. Some Security Issues


Introduction to OLAP Systems

Nowadays, databases are used in two main contexts:

1. OLTP: On-Line Transaction Processing
2. OLAP / DS: On-Line Analytical Processing / Decision Support


OLTP vs OLAP

                  | OLTP                                    | OLAP
Function          | Constant day-to-day transaction handling | Decision support
Database design   | Application-oriented                    | Subject-oriented
Data              | Current, up to date, detailed           | Historical, aggregated, multidimensional
Access            | Read / Write / Index                    | Read many times
Unit of work      | Short, simple transactions              | Complex queries
# Records accessed | k·10                                   | k·10^6
# Users           | k·10^3                                  | k·10^2
Database size     | 100 MB to GB                            | 100 GB to TB


Data Warehouse

A data warehouse (DW) is a database used for reporting. Data is offloaded from the operational systems into the warehouse, which collects it in support of managers' decision-making process.

A data warehouse is:
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile


Subject Oriented

Data is categorized and stored by business subject rather than by application.

[Figure: operational systems (Savings, Shares, Loans, Insurance, Equity Plans) feed a data warehouse subject area holding customer, product, and sales information.]


Integrated

[Figure: in the operational environment, the Savings, Current Accounts, and Loans applications each hold their own data; in the data warehouse this data is integrated under the subject "Customer", with no application flavor.]


Time Variant

Data is stored as a series of snapshots, each representing a period of time.

[Figure: data warehouse snapshots by time. 01/97: data for January; 02/97: data for February; 03/97: data for March.]


Non Volatile

Typically, data in the data warehouse is not updated or deleted.

[Figure: operational databases support INSERT, UPDATE, DELETE, and read operations; the warehouse database is only loaded and read.]


OLAP

In computing, online analytical processing (OLAP) is an approach to swiftly answering multidimensional analytical queries.

The OLAP database is usually updated in batch, often from multiple sources. What most people want from their applications is consistently fast response time.

OLAP is an approach to processing business data. It performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling.


OLAP SERVICES


OLAP ARCHITECTURES

Popular architectures of OLAP systems include ROLAP (relational OLAP) and MOLAP (multidimensional OLAP).

1) ROLAP provides a front-end tool that translates multidimensional queries into corresponding SQL queries to be processed by the relational backend.

2) MOLAP does not rely on the relational model but instead materializes the multidimensional views.

3) Using MOLAP for dense parts of the data and ROLAP for the others leads to a hybrid architecture, namely HOLAP (hybrid OLAP).
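As a minimal sketch of the ROLAP idea in item 1, the hypothetical helper `cuboid_sql` below turns a multidimensional request (a measure plus the dimensions to group by) into a SQL GROUP BY query and runs it on an in-memory SQLite table. The table name `sales`, its columns, and the sample rows are assumptions made for illustration; they do not come from any particular OLAP product.

```python
import sqlite3

def cuboid_sql(table, measure, dims):
    """Build the SQL for one cuboid: SUM(measure) grouped by the given dimensions."""
    if not dims:  # apex cuboid: a single grand total
        return f"SELECT SUM({measure}) FROM {table}"
    cols = ", ".join(dims)
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols} ORDER BY {cols}"

# Hypothetical fact table with three dimensions and one measure (in thousands).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (time TEXT, product TEXT, store TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Q1", "Toys", "Store 1", 1900),
        ("Q1", "Toys", "Store 2", 750),
        ("Q2", "Toys", "Store 1", 2000),
    ],
)

# A "sales by time" request becomes a relational GROUP BY query.
by_time = conn.execute(cuboid_sql("sales", "amount", ["time"])).fetchall()
total = conn.execute(cuboid_sql("sales", "amount", [])).fetchone()[0]
print(by_time, total)
```

A real ROLAP front end would also handle joins to dimension tables and hierarchy levels; this sketch covers only the grouping step.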


ROLAP


[Figure: a relational table made of rows and columns; tables are joined on key values.]


KEY IN ROLAP

[Figure: the Time, Product, and Store tables each have a single-column key (time key, product key, store key); the three keys together form a composite key.]


Star schema – 4 dimensions


Snowflake schema


MOLAP


Lattice of cuboids for the dimensions time, item, city, and supplier:

0-D (apex) cuboid: all
1-D cuboids: time; item; city; supplier
2-D cuboids: time,item; time,city; time,supplier; item,city; item,supplier; city,supplier
3-D cuboids: time,item,city; time,item,supplier; time,city,supplier; item,city,supplier
4-D (base) cuboid: time,item,city,supplier
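The lattice above can be enumerated mechanically: each cuboid is one subset of the dimensions, so four dimensions give 2^4 = 16 cuboids. The short sketch below illustrates this; the function name `cuboids` is a hypothetical helper, not part of any OLAP library.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return all cuboids grouped by level: level n holds the n-D cuboids."""
    return {
        n: list(combinations(dimensions, n))
        for n in range(len(dimensions) + 1)
    }

lattice = cuboids(["time", "item", "city", "supplier"])
for level, group in lattice.items():
    # Level 0 is the apex cuboid, level 4 the base cuboid.
    print(f"{level}-D: {len(group)} cuboid(s)")

total = sum(len(group) for group in lattice.values())
print("total cuboids:", total)
```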


Dimension hierarchies (levels listed from the most detailed to the most general):

Product: Item, Type, Category, All
Geography: City, State, Country, All
Time: Day, Week, Month, Quarter, Year, All


Sales, year to date ($ millions)

View 1: Products by Time (Store 1 in front, Store 2 behind)

Products    | Q1   | Q2
Electronics | 5.2  | 8.9
Toys        | 1.9  | 0.75
Clothing    | 2.3  | 4.6
Cosmetics   | 1.1  | 1.5

View 2: Products by Store, for Q1

Products    | Store 1 | Store 2
Electronics | 5.2     | 8.9
Toys        | 1.9     | 0.75
Clothing    | 2.3     | 4.6
Cosmetics   | 1.1     | 1.5


                    | Relational     | Multidimensional
Data representation | Two dimensions | Multiple dimensions
Data extraction     | Specific rows  | Specific dimensions
Computations        | Functions      | High-speed matrix
Results             | Tool specific  | Matrix


HOLAP


OLAP ARCHITECTURES

                          | MOLAP   | ROLAP            | HOLAP
Underlying data storage   | Cube    | Relational table | Relational table
Aggregated data storage   | Cube    | Relational table | Cube
Query performance         | Fastest | Slowest          | Fast
Storage space consumption | High    | Low              | Medium
Maintenance cost          | High    | Low              | Medium


Security Requirements in OLAP Systems

OLAP systems depend heavily on aggregates of data, which makes them very vulnerable to indirect inferences of protected data.

Threat of Inferences

The threat is illustrated through four examples:

1. One-dimensional inference (1-d inference).
2. Multi-dimensional inference (m-d inference) with SUM only.
3. M-d inference with MAX only.
4. M-d inference with SUM, MAX, and MIN.


One-Dimensional Inference (1-d Inference)

Suppose that the adversary:

• Cannot access the cuboid <Employee,Quarter> but is allowed to access <Department,Quarter>.
• Knows which cells are empty through out-of-band channels.

Then he can infer that <Bob,Q1> has exactly the same value as <A1,Q1>, since <Bob,Q1> is the only non-empty cell aggregated into <A1,Q1>.


Multi-Dimensional Inference (m-d Inference) with SUM

Suppose that the adversary:

• Can only access <Department,Quarter> and <Employee,Year>.
• Knows which cells are empty through out-of-band channels.

An m-d inference is then possible as follows:

• He first sums the cells <Bob,Y1> and <Alice,Y1>, then subtracts the cells <A1,Q2> and <A1,Q3>. The final result yields a sensitive cell: <Bob,Q1>.

<Bob,Q1> = 1500.
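The SUM-based inference can be replayed on a small table. This is a sketch under stated assumptions: the slide text gives only the result <Bob,Q1> = 1500, so every other value, the employee Mallory, and the pattern of empty cells are invented for illustration.

```python
# Hypothetical core cuboid <Employee,Quarter> for department A1.
# None marks an empty cell. Only <Bob,Q1> = 1500 is taken from the slide.
core = {
    "Bob":     {"Q1": 1500, "Q2": 2000, "Q3": 1800, "Q4": None},
    "Alice":   {"Q1": None, "Q2": 1200, "Q3": 2500, "Q4": None},
    "Mallory": {"Q1": None, "Q2": None, "Q3": None, "Q4": 4000},
}

def year_sum(employee):
    """<Employee,Y1> cell: SUM over the employee's non-empty quarters."""
    return sum(v for v in core[employee].values() if v is not None)

def dept_sum(quarter):
    """<A1,Quarter> cell: SUM over all non-empty cells of that quarter."""
    return sum(row[quarter] for row in core.values() if row[quarter] is not None)

# The adversary uses only the aggregates he may access. Knowing which
# cells are empty tells him that everything except <Bob,Q1> cancels out.
inferred_bob_q1 = (
    year_sum("Bob") + year_sum("Alice") - dept_sum("Q2") - dept_sum("Q3")
)
print(inferred_bob_q1)
```

The subtraction isolates <Bob,Q1> only because the other cells covered by the two yearly sums are either empty or also covered by <A1,Q2> and <A1,Q3>.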


Multi-Dimensional Inference (m-d Inference) with MAX

Now the adversary does not know which cells are empty (the core cuboid is full of unknown values), so the cube is free of inferences under SUM aggregations. He can still make an m-d inference with MAX aggregations as follows:

- He looks at the MAX values in cells <Janny,Y1> and <A1,Q4>, which are 6000 and 5000.
- From these, he can infer that one of the three cells <Janny,Q1>, <Janny,Q2>, or <Janny,Q3> is 6000.
- Neither <Janny,Q2> nor <Janny,Q3> can be 6000.

<Janny,Q1> = 6000
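A sketch of the MAX-based reasoning follows. The slide states only the MAX values 6000 for <Janny,Y1> and 5000 for <A1,Q4>; the remaining table values are hypothetical, and the sketch assumes that <Janny,Q2> and <Janny,Q3> are ruled out because the department maxima for those quarters are below 6000.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# Hypothetical core cuboid for department A1; None marks an empty cell.
# Only MAX(<Janny,Y1>) = 6000 and MAX(<A1,Q4>) = 5000 come from the slide.
core = {
    "Bob":   {"Q1": 1500, "Q2": 2000, "Q3": 1800, "Q4": None},
    "Alice": {"Q1": None, "Q2": 1200, "Q3": 2500, "Q4": None},
    "Janny": {"Q1": 6000, "Q2": None, "Q3": None, "Q4": 5000},
}

def year_max(employee):
    """MAX of <Employee,Y1> over the employee's non-empty quarters."""
    return max(v for v in core[employee].values() if v is not None)

def dept_max(quarter):
    """MAX of <A1,Quarter> over the non-empty cells of that quarter."""
    return max(row[quarter] for row in core.values() if row[quarter] is not None)

# Janny's yearly maximum must sit in one of her quarterly cells, and it
# can only sit in a quarter whose department maximum is at least as large.
target = year_max("Janny")
candidates = [q for q in QUARTERS if dept_max(q) >= target]
print(target, candidates)
```

When a single candidate quarter remains, the adversary has learned the exact value of a protected cell without ever reading the core cuboid.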


Multi-Dimensional Inference (m-d Inference) with SUM, MAX, and MIN

• Now suppose that the adversary can ask queries using SUM, MAX, and MIN on the data cube.

• Following the last example, he can infer <Janny,Q1> = 6000.

• The SUM, MAX, and MIN of <Janny,Y1> are 11000, 6000, and 5000.

• From these, he can infer that <Janny,Q2>, <Janny,Q3>, and <Janny,Q4> must be one 5000 and two zeros (but he does not know which cell holds the 5000).

• With the SUM, MAX, and MIN of <A1,Q2>, <A1,Q3>, and <A1,Q4>, he can conclude that <Janny,Q4> must be 5000 and the others are zeros.
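The last two steps can be checked by brute force. In this sketch the aggregates of <Janny,Y1> are the ones stated above, while the department MAX values and the search grid (multiples of 500) are assumptions introduced for illustration; a zero stands for an empty cell, which does not count toward MIN.

```python
from itertools import product

# Aggregates of <Janny,Y1> as stated in the example.
Y1_SUM, Y1_MAX, Y1_MIN = 11000, 6000, 5000
JANNY_Q1 = 6000  # inferred in the previous example

# Hypothetical MAX values of <A1,Q2>, <A1,Q3>, <A1,Q4>.
DEPT_MAX = {"Q2": 2000, "Q3": 2500, "Q4": 5000}

def consistent(values):
    """Check one assignment for Q2..Q4 (0 = empty) against all aggregates."""
    cells = [JANNY_Q1] + [v for v in values if v != 0]
    fits_year = (
        sum(cells) == Y1_SUM and max(cells) == Y1_MAX and min(cells) == Y1_MIN
    )
    # No cell of Janny can exceed the department maximum of its quarter.
    fits_dept = all(v <= DEPT_MAX[q] for q, v in zip(DEPT_MAX, values))
    return fits_year and fits_dept

# Assumed search grid: each cell is empty or a multiple of 500 up to 6000.
grid = range(0, 6001, 500)
solutions = [v for v in product(grid, repeat=3) if consistent(v)]
print(solutions)
```

With these assumed department maxima, the MAX values alone are enough to pin the 5000 to Q4; the slide's adversary also has SUM and MIN available for the same purpose.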


A security solution for OLAP systems must combine access control and inference control to remove these threats.

A practical solution must achieve a balance among the following objectives:

Security
- Sensitive data should be guarded from both unauthorized accesses and malicious inferences.

Applicability
- The solution should not rely on any unrealistic assumptions and should cover a wide range of scenarios without the need for significant modifications.


Efficiency
- Queries should be answered in a matter of seconds or minutes.
- A desired security solution must be computationally efficient, especially with respect to on-line overhead.

Availability
- Data should be available to legitimate users who have sufficient privileges.

Practicality
- The solution should not demand significant modifications to the existing infrastructure of an OLAP system.

The main challenge is the inherent tradeoff between the above objectives.


Some Security Issues

• Three-Tier Security Architecture
• Securing Data Cubes in OLAP Systems
  - Sum-Only Data Cubes
  - Generic Data Cubes


Three-Tier Security Architecture

Security in statistical databases usually has two tiers, the sensitive data and the aggregation queries, with inference control between them.

Inference control mechanisms check each aggregation query to decide whether it can be answered.

Through previously answered queries, much protected data may be disclosed.


Applying the two-tier architecture to OLAP has some inherent drawbacks:

• Checking queries for inferences at run time may cause unacceptable delay in query processing.
• The complexity of this checking is usually high.
• Inference control methods cannot take advantage of the special characteristics of OLAP systems.


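Three-Tier Security Architecture

This architecture has:
• 3 tiers.
• 3 relations.
• 3 properties satisfied by the aggregation tier.

[Figure: the three tiers are the user queries, the pre-defined aggregations, and the data set, with access control and inference control applied between the tiers.]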


Securing Data Cubes in OLAP Systems

SUM-Only Data Cubes:
• As an inherited limitation of statistical databases, only SUMs are considered.
• Only the core cuboid is considered sensitive.
• Two methods: the cardinality-based method and the parity-based method.


Cardinality-Based Method

The method is based on the number of empty cells.

1-d inferences: their existence can be determined only in two cases:
• The core cuboid has no empty cell.
• The core cuboid has fewer non-empty cells than the given upper bound $2^{k-1} \cdot d_{\max}$.


M-d inferences:
• The core cuboid has no empty cell.
• A data cube is free of inferences if it has fewer empty cells than the given upper bound.
• A data cube with more empty cells than the given bound always has inferences.
• Upper bound: $2(d_u - 4) + 2(d_v - 4) - 1$, where $d_u$ and $d_v$ are the two smallest of the $d_i$ values, and $d_i$ is the number of values of the $i$-th attribute in the core cuboid.
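As a small worked example of the m-d upper bound, the hypothetical helper below evaluates 2(d_u − 4) + 2(d_v − 4) − 1 from a list of dimension cardinalities. The function name and the 6 × 10 × 8 cube are assumptions chosen for illustration.

```python
def md_inference_free_bound(cardinalities):
    """Evaluate 2(d_u - 4) + 2(d_v - 4) - 1 for the two smallest cardinalities."""
    d_u, d_v = sorted(cardinalities)[:2]
    return 2 * (d_u - 4) + 2 * (d_v - 4) - 1

# Hypothetical cube whose three attributes have 6, 10, and 8 values.
# The two smallest cardinalities are 6 and 8: 2*2 + 2*4 - 1 = 11.
bound = md_inference_free_bound([6, 10, 8])
print(bound)
```

Under the rule stated above, such a cube would be free of m-d inferences as long as its core cuboid has fewer than 11 empty cells.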


The above results can be used to compute inference-free aggregations based on the three-tier architecture:

• The data tier corresponds to the core cuboid.
• The aggregation tier corresponds to a collection of cells in aggregation cuboids that are free of inferences.
• The query tier includes any query that can be rewritten using the cells in the aggregation tier.
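The role of the aggregation tier can be sketched as a lookup: the inference check happens offline when the safe cells are chosen, and at query time the system only tests whether a query can be rewritten over those cells. The cell names, values, and the `answer` function below are hypothetical.

```python
# Hypothetical aggregation tier: cells of aggregation cuboids that were
# precomputed offline and verified to be free of inferences.
SAFE_AGGREGATES = {
    ("A1", "Q2"): 3200,
    ("A1", "Q3"): 4300,
    ("A1", "Q4"): 4000,
}

def answer(query_cells):
    """Answer a SUM query only if every cell it needs is in the aggregation tier."""
    if not all(cell in SAFE_AGGREGATES for cell in query_cells):
        return None  # refused: the query is outside the query tier
    return sum(SAFE_AGGREGATES[cell] for cell in query_cells)

second_half = answer([("A1", "Q3"), ("A1", "Q4")])  # rewritable, so answered
refused = answer([("Bob", "Q1")])                   # core cell, so refused
print(second_half, refused)
```

Because the run-time check is a simple membership test, it avoids the on-line overhead of detecting inferences per query.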


Parity-Based Method

The method is based on the simple fact that even numbers are closed under addition and subtraction.

Suppose now that every query set includes an even number of cells.

Adding and subtracting these sets to isolate a single cell then becomes more difficult.


• Parity-Based Method (example)

Query sets, each with an even number of cells:

X1 + X2 + X3 + X4 + X5 + X6
X1 + X2
X4 + X5
X5 + X6
X3 + X5

X5 = <Q3,Alice> = 2500
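The five query sets above still allow X5 to be isolated: adding the four pair queries and subtracting the total leaves 2·X5. The sketch below checks this arithmetic; apart from X5 = 2500, the cell values are hypothetical.

```python
from fractions import Fraction

# Hypothetical cell values; only X5 = 2500 comes from the slide.
x = {"X1": 1000, "X2": 1500, "X3": 2000, "X4": 3000, "X5": 2500, "X6": 500}

# The five even-sized query sets from the example.
queries = [
    ("X1", "X2", "X3", "X4", "X5", "X6"),
    ("X1", "X2"),
    ("X4", "X5"),
    ("X5", "X6"),
    ("X3", "X5"),
]
answers = [sum(x[cell] for cell in q) for q in queries]

# (X1+X2) + (X4+X5) + (X5+X6) + (X3+X5) - (X1+...+X6) = 2 * X5,
# so half of that combination reveals X5 from the answers alone.
half = Fraction(1, 2)
weights = [-half, half, half, half, half]
inferred_x5 = sum(w * a for w, a in zip(weights, answers))
print(inferred_x5)
```

The combination needs the non-integer weight 1/2, which shows that restricting queries to even-sized sets makes inference harder but does not by itself rule it out.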


• Parity-Based Method

• If a set of queries (set 2) is derivable from another set (set 1), then the answers to set 2 can be computed from the answers to set 1. If set 1 is free of inferences, then so is set 2.

• To detect inferences caused by sets of multi-dimensional range (MDR) queries (Q*), we find another collection of queries that is equivalent to Q* and whose inferences are easier to detect.


This method can be enforced based on the three-tier inference control architecture described earlier:

• A partition of the core cuboid based on dimension hierarchies composes the data tier.

• The parity-based method is applied to each block in the partition to compute the aggregation tier.

• The query tier includes any query that is derivable from the aggregation tier.


Generic Data Cubes

This method does not detect inferences directly. Instead, it first prevents m-d inferences and then removes 1-d inferences.

It can deal with data cubes with generic aggregation types. It has two parts:
• Access control.
• Lattice-based inference control.


Access Control

• Limiting access control to the core cuboid is not always appropriate.
• Values in aggregation cuboids may also carry sensitive information.


Access Control

Describing an object:

• The function Below() partitions the data cube along the dependency lattice.
• The function Slice() partitions the data cube along the dimensions.
• An object is simply the intersection of the two.

Example: Object(L,S), where L = <Year,Employee> and S includes all the cells in the first four quarters of the core cuboid (<Quarter,Employee>).


Lattice-Based Inference Control

Given two sets of cells in a data cube (S and T):
• A cell c is redundant to T if S includes c and its ancestors in any single cuboid.
• A cell c is non-comparable to T if, for every c' ∈ T, c is neither an ancestor nor a descendant of c'.


Lattice-Based Inference Control

Consider an Object(L,S):
• This object is the union of the cuboids in Below(L).
• Let T be the object and S be its complement in the data cube.
• To remove inferences from S to T, we find a subset of S that is free of m-d inferences to T.


After m-d inferences are prevented, 1-d inferences still need to be removed.

Procedure to remove 1-d inferences:
• Check each cell and add those that cause 1-d inferences to the object, so that they will be prohibited by access control.
• Control m-d inferences to this new object by applying the previous results.
• Repeat these steps until all 1-d inferences are removed.
• The final set of cells is free of inferences to the object.


This method can be implemented based on the three-tier security model:

• The authorization object computed through the above process comprises the data tier.
• The complement of the object is the aggregation tier, because it does not cause any inferences to the data tier.
• Users are free to submit queries to the query tier.


THANK YOU !