
Nguyễn Phạm Luân Tiến 50702449

Trần Đình Hương Trà 50702573

Dương Bách Tùng 50702839

Content

1. Introduction to OLAP Systems
2. Security Requirements in OLAP Systems
3. Some Security Issues

Introduction to OLAP Systems

Nowadays, databases are used in two main contexts:

1. OLTP: On-Line Transaction Processing
2. OLAP / DS: On-Line Analytical Processing / Decision Support

OLTP vs OLAP

Feature             OLTP                            OLAP
Function            Constant transaction handling   Decision support
Database design     Application-oriented            Subject-oriented
Data                Current, updated, detailed      Historical, aggregated, multidimensional
Access              Read / write / index            Read many times
Unit of work        Short, single transactions      Complex queries
# Records accessed  k × 10                          k × 10^6
# Users             k × 10^3                        k × 10^2
Database size       100 MB to GB                    100 GB to TB

Data Warehouse

A data warehouse (DW) is a database used for reporting. Data is offloaded from the operational systems into the warehouse, which collects it in support of managers' decision-making process.

A data warehouse is:
- Subject-oriented
- Integrated
- Time-variant
- Non-volatile

Subject Oriented

Data is categorized and stored by business subject rather than by application.

[Figure: operational systems (Savings, Shares, Loans, Insurance, Equity Plans) mapped to a data warehouse subject area (Customer, Product, Sales Information).]

Integrated

[Figure: in the operational environment, the Savings, Current Accounts, and Loans applications are separate; in the data warehouse they are combined under Subject = Customer, with no application flavor.]

Time Variant

Data is stored as a series of snapshots, each representing a period of time.

[Figure: data warehouse snapshots by time: 01/97 = data for January, 02/97 = data for February, 03/97 = data for March.]

Non Volatile

Typically, data in the data warehouse is not updated or deleted.

[Figure: operational databases support INSERT, UPDATE, DELETE, and read; the warehouse database supports only load and read.]

OLAP

In computing, online analytical processing (OLAP) is an approach to swiftly answering multidimensional analytical queries.

An OLAP database is usually updated in batch, often from multiple sources. What most people want from their applications is a consistently fast response time.

OLAP is a technology for processing business data. It performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling.

OLAP SERVICES

OLAP ARCHITECTURES

Popular architectures of OLAP systems include ROLAP (relational OLAP) and MOLAP (multidimensional OLAP).

1) ROLAP provides a front-end tool that translates multidimensional queries into corresponding SQL queries to be processed by the relational backend.

2) MOLAP does not rely on the relational model but instead materializes the multidimensional views.

3) Using MOLAP for dense parts of the data and ROLAP for the others leads to a hybrid architecture, namely HOLAP (hybrid OLAP).
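As a rough illustration of the ROLAP idea, the sketch below turns a request for one cuboid into a SQL GROUP BY query and runs it on an in-memory SQLite database. The table name `sales`, its columns, the helper `cuboid_sql`, and all values are hypothetical; a real ROLAP front end is far more elaborate.

```python
import sqlite3

def cuboid_sql(dims, table="sales", measure="amount"):
    """Translate a cuboid request (a list of dimension columns) into SQL."""
    if not dims:
        # Apex cuboid: one grand total, no grouping.
        return f"SELECT SUM({measure}) FROM {table}"
    cols = ", ".join(dims)
    return (f"SELECT {cols}, SUM({measure}) FROM {table} "
            f"GROUP BY {cols} ORDER BY {cols}")

# Hypothetical fact table; every name and value here is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (quarter TEXT, product TEXT, store TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Q1", "Toys", "Store 1", 19),
        ("Q1", "Toys", "Store 2", 7),
        ("Q1", "Clothing", "Store 1", 23),
        ("Q2", "Toys", "Store 1", 10),
        ("Q2", "Clothing", "Store 2", 46),
    ],
)

by_store = conn.execute(cuboid_sql(["store"])).fetchall()
by_quarter_product = conn.execute(cuboid_sql(["quarter", "product"])).fetchall()
grand_total = conn.execute(cuboid_sql([])).fetchone()[0]
print(by_store, by_quarter_product, grand_total)
```

Each multidimensional view is computed on demand by the relational backend, which is why ROLAP needs little extra storage but answers queries more slowly than a materialized cube.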

ROLAP

[Figure: a relational table made of rows and columns, with key values used to join tables.]

KEY IN ROLAP

[Figure: the Time, Product, and Store tables each have a single-column key (time key, product key, store key); together these form a composite key.]

[Figure: star schema with 4 dimensions.]

[Figure: snowflake schema.]

MOLAP

[Figure: lattice of cuboids for the dimensions time, item, city, and supplier.]

0-D (apex) cuboid: all
1-D cuboids: time; item; city; supplier
2-D cuboids: (time, item); (time, city); (time, supplier); (item, city); (item, supplier); (city, supplier)
3-D cuboids: (time, item, city); (time, item, supplier); (time, city, supplier); (item, city, supplier)
4-D (base) cuboid: (time, item, city, supplier)
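The lattice above can be enumerated mechanically: every subset of the dimensions is one cuboid, so n dimensions (ignoring hierarchies) give 2^n cuboids. A minimal sketch, with the helper name `cuboid_lattice` chosen here only for illustration:

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group every subset of the dimensions by size (0-D apex up to n-D base)."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

lattice = cuboid_lattice(["time", "item", "city", "supplier"])
for k, cuboids in lattice.items():
    print(f"{k}-D: {len(cuboids)} cuboid(s)")

total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

MOLAP materializes some or all of these cuboids in advance, which explains both its fast queries and its high storage cost.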

[Figure: dimension hierarchies.]

Geography: City → State → Country → All
Product: Item → Type → Category → All
Time: levels Day, Week, Month, Quarter, Year, All

[Figure: multidimensional view of year-to-date sales ($ millions) by product (Electronics, Toys, Clothing, Cosmetics), time (Q1, Q2), and store (Store 1, Store 2). Values shown: $5.2, $1.9, $2.3, $1.1 and $8.9, $0.75, $4.6, $1.5.]

                     Relational       Multidimensional
Data representation  Two dimensions   Multiple dimensions
Data extraction      Specific rows    Specific dimensions
Computations         Functions        High-speed matrix
Results              Tool specific    Matrix

HOLAP

OLAP ARCHITECTURES

                           MOLAP     ROLAP             HOLAP
Underlying data storage    Cube      Relational table  Relational table
Aggregated data storage    Cube      Relational table  Cube
Query performance          Fastest   Slowest           Fast
Storage space consumption  High      Low               Medium
Maintenance cost           High      Low               Medium

Security Requirements in OLAP Systems

OLAP systems depend heavily on aggregates of data, which makes them very vulnerable to indirect inferences of protected data.

Threat of Inferences

The threat is illustrated through 4 examples:

1. One-dimensional inference (1-d inference).
2. Multi-dimensional inference (m-d inference) with SUM only.
3. M-d inference with MAX only.
4. M-d inference with SUM, MAX, and MIN.

One-Dimensional Inference (1-d Inference)

Suppose that the adversary:

• Cannot access the cuboid <Employee,Quarter> but is allowed to access <Department,Quarter>.

• Knows the empty cells through out-of-band channels.

Then, if <Bob,Q1> is the only non-empty cell aggregated into <A1,Q1>, he can infer that <Bob,Q1> has exactly the same value as <A1,Q1>.

[Figure: data cube over the Organization dimension.]
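The 1-d inference can be sketched as a check for department totals that are fed by exactly one non-empty cell. The small cuboid below is hypothetical (names, layout, and values are invented), and `None` marks an empty cell that the adversary is assumed to know about.

```python
# Hypothetical core cuboid <Employee,Quarter> for department A1.
# None marks an empty cell; all values are invented for illustration.
core = {
    "Bob":   {"Q1": 1500, "Q2": 2000},
    "Alice": {"Q1": None, "Q2": 2500},
}

def one_d_inferences(core):
    """Return {(employee, quarter): value} for cells exposed by a department total."""
    exposed = {}
    quarters = sorted({q for cells in core.values() for q in cells})
    for q in quarters:
        non_empty = [(emp, cells[q]) for emp, cells in core.items()
                     if cells.get(q) is not None]
        if len(non_empty) == 1:
            emp, _ = non_empty[0]
            # This is the value the permitted cell <A1,q> would report.
            department_total = sum(v for _, v in non_empty)
            exposed[(emp, q)] = department_total
    return exposed

exposed = one_d_inferences(core)
print(exposed)
```

In this toy cuboid only Q1 is exposed, because Q2 mixes two non-empty cells.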

Multi-Dimensional Inference (m-d Inference) with SUM

Suppose that the adversary:

• Can only access <Department,Quarter> and <Employee,Year>.

• Knows the empty cells through out-of-band channels.

An m-d inference is then possible as follows:

• He first sums the cells <Bob,Y1> and <Alice,Y1>, then subtracts the cells <A1,Q2> and <A1,Q3>. The final result yields a sensitive cell:

<Bob,Q1> = 1500

[Figure: data cube over the Time and Organization dimensions.]
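The arithmetic behind this m-d inference can be replayed on a toy cube. Everything below is an assumption made for illustration except the final value 1500: the third employee, the individual cell values, and the pattern of empty cells (`None`) are invented so that the slide's formula applies.

```python
# Hypothetical core cuboid; None marks an empty cell known to the adversary.
core = {
    "Bob":   {"Q1": 1500, "Q2": 2000, "Q3": 1800, "Q4": None},
    "Alice": {"Q1": None, "Q2": 2500, "Q3": 2200, "Q4": None},
    "Janny": {"Q1": 6000, "Q2": None, "Q3": None, "Q4": 5000},
}

def employee_year(emp):
    """SUM answer for the permitted cell <emp,Y1>."""
    return sum(v for v in core[emp].values() if v is not None)

def department_quarter(q):
    """SUM answer for the permitted cell <A1,q>."""
    return sum(cells[q] for cells in core.values() if cells[q] is not None)

# <Bob,Y1> + <Alice,Y1> - <A1,Q2> - <A1,Q3>
inferred_bob_q1 = (employee_year("Bob") + employee_year("Alice")
                   - department_quarter("Q2") - department_quarter("Q3"))
print(inferred_bob_q1)
```

Note that <A1,Q1> alone does not reveal Bob's value here, since it also contains Janny's Q1 cell; the leak only appears when aggregates along both dimensions are combined.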

Multi-Dimensional Inference (m-d Inference) with MAX

Now suppose the adversary does not know the values of the empty cells (the core cuboid is full of unknown values). The cube is then free of inferences under SUM aggregations, but an m-d inference is still possible with MAX aggregations:

- The MAX values of the cells <Janny,Y1> and <A1,Q4> are 6000 and 5000.

- Since <Janny,Q4> cannot exceed 5000, one of the three cells <Janny,Q1>, <Janny,Q2>, or <Janny,Q3> must be 6000.

- Neither <Janny,Q2> nor <Janny,Q3> can be 6000.

<Janny,Q1> = 6000
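One way to read the elimination step: an employee's value in a quarter can never exceed the department MAX for that quarter, so only quarters whose department MAX reaches 6000 can hold Janny's yearly maximum. The sketch below assumes this reading; the department MAX values for Q1 to Q3 are hypothetical, and only 6000 and 5000 come from the example.

```python
# Answers the adversary is allowed to see.
max_janny_year = 6000  # MAX of <Janny,Y1> (from the example)
max_dept_quarter = {   # MAX of <A1,q>; only the Q4 value is from the example
    "Q1": 6000,
    "Q2": 2500,
    "Q3": 2200,
    "Q4": 5000,
}

# Janny's yearly maximum must sit in a quarter whose department maximum
# is at least as large as that yearly maximum.
candidates = [q for q, dept_max in max_dept_quarter.items()
              if dept_max >= max_janny_year]
print(candidates)
```

When a single candidate quarter remains, the protected cell is disclosed exactly.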

Multi-Dimensional Inference (m-d Inference) with SUM, MAX, and MIN

• Now suppose that the adversary can ask queries using SUM, MAX, and MIN on the data cube.

• Following the last example, he can infer <Janny,Q1> = 6000.

• The SUM, MAX, and MIN of <Janny,Y1> are 11000, 6000, and 5000.

• From this, he can infer that <Janny,Q2>, <Janny,Q3>, and <Janny,Q4> must be one 5000 and two zeros (11000 − 6000 = 5000 remains), but he does not know which cell holds the 5000.

• With the SUM, MAX, and MIN of <A1,Q2>, <A1,Q3>, and <A1,Q4>, he can conclude that <Janny,Q4> must be 5000 and the others are zeros.
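The reasoning in this example can be checked by brute force. The sketch below makes several assumptions that are not stated on the slides: a zero stands for an empty cell and is ignored by MIN, candidate values lie on a grid of 500, and only the department MAX answers are used in the last step (with hypothetical values for Q2 and Q3).

```python
from itertools import product

KNOWN_Q1 = 6000                                   # inferred in the MAX-only example
YEAR_SUM, YEAR_MAX, YEAR_MIN = 11000, 6000, 5000  # <Janny,Y1> answers from the example
DEPT_MAX = {"Q2": 2500, "Q3": 2200, "Q4": 5000}   # hypothetical <A1,q> MAX answers
STEP = 500                                        # hypothetical search granularity

def consistent_with_year(q2, q3, q4):
    """Check a candidate (Q2, Q3, Q4) against SUM, MAX, and MIN of <Janny,Y1>."""
    cells = [KNOWN_Q1, q2, q3, q4]
    non_empty = [v for v in cells if v > 0]       # assumption: 0 means an empty cell
    return (sum(cells) == YEAR_SUM
            and max(non_empty) == YEAR_MAX
            and min(non_empty) == YEAR_MIN)

grid = range(0, YEAR_MAX + 1, STEP)
year_only = [c for c in product(grid, repeat=3) if consistent_with_year(*c)]
with_dept = [c for c in year_only
             if all(v <= DEPT_MAX[q] for q, v in zip(("Q2", "Q3", "Q4"), c))]
print(year_only)
print(with_dept)
```

The yearly answers alone leave three candidates (one 5000 and two zeros in some order); adding the department-level answers leaves a single one.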

A security solution for OLAP systems must combine access control and inference control to remove these threats.

Requirements

A practical solution must achieve a balance among the following objectives:

Security
- Sensitive data should be guarded from both unauthorized accesses and malicious inferences.

Applicability
- The solution should not rely on any unrealistic assumptions and should cover a wide range of scenarios without the need for significant modifications.

Efficiency
- Queries should be answered in a matter of seconds or minutes.
- A desired security solution must be computationally efficient, especially with respect to on-line overhead.

Availability
- Data should be available to legitimate users who have sufficient privileges.

Practicality
- The solution should not demand significant modifications to the existing infrastructure of an OLAP system.

The main challenge is the inherent tradeoff between the above objectives.

Some Security Issues

• Three-Tier Security Architecture
• Securing Data Cubes in OLAP Systems
  - Sum-Only Data Cubes
  - Generic Data Cubes

Three-Tier Security Architecture

Security in statistical databases usually has 2 tiers, sensitive data and aggregation queries, with inference control between them.

Inference control mechanisms check each aggregation query to decide whether answering it, in combination with the previously answered queries, may disclose protected data.

Applying the two-tier architecture to OLAP has some inherent drawbacks:
- Checking queries for inferences at run time may cause unacceptable delay in query processing.
- The complexity of this checking is usually high.
- Inference control methods cannot take advantage of the special characteristics of OLAP systems.

The three-tier architecture has:
- 3 tiers.
- 3 relations.
- 3 properties satisfied by the aggregation tier.

[Figure: three tiers (user queries, pre-defined aggregations, data set) connected by access control and inference control.]

Securing Data Cubes in OLAP Systems

SUM-Only Data Cubes

- As an inherited limitation of statistical databases, only SUMs are considered.
- Only the core cuboid is considered sensitive.
- 2 methods: the cardinality-based method and the parity-based method.

Cardinality-Based Method

This method is based on the number of empty cells.

1-d inferences: their existence can be determined from the cardinality in only 2 cases:
- The core cuboid has no empty cell; it is then free of 1-d inferences.
- The core cuboid has fewer non-empty cells than the bound $2^{k-1} \cdot d_{\max}$ (k is the number of dimensions, $d_{\max}$ the largest dimension size); it then always has 1-d inferences.

M-d inferences:
- A data cube whose core cuboid has no empty cell is free of m-d inferences.
- A data cube is free of m-d inferences if it has fewer empty cells than the upper bound below.
- Once the number of empty cells reaches the bound, the data cube can no longer be guaranteed free of inferences.
- Upper bound: $2(d_u - 4) + 2(d_v - 4) - 1$, where $d_u$ and $d_v$ are the 2 smallest among the $d_i$, and $d_i$ is the number of values of the i-th attribute in the core cuboid.
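Both bounds are simple to evaluate. The sketch below assumes the dimension sizes are given as a list `dims` (the function names and the example sizes are made up) and reads the 1-d bound as 2^(k-1) · d_max with k dimensions.

```python
def one_d_bound(dims):
    """2^(k-1) * d_max: non-empty-cell count used for the 1-d inference test."""
    k = len(dims)
    return 2 ** (k - 1) * max(dims)

def m_d_bound(dims):
    """2(d_u - 4) + 2(d_v - 4) - 1, with d_u and d_v the two smallest sizes."""
    d_u, d_v = sorted(dims)[:2]
    return 2 * (d_u - 4) + 2 * (d_v - 4) - 1

dims = [6, 5, 8]            # hypothetical sizes of a 3-dimensional core cuboid
print(one_d_bound(dims))    # compare with the number of non-empty cells
print(m_d_bound(dims))      # compare with the number of empty cells
```

Because both tests only count cells, they can be evaluated off-line, before any query is answered.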

The above results can be used to compute inference-free aggregations based on the three-tier architecture:

- The data tier corresponds to the core cuboid.

- The aggregation tier corresponds to a collection of cells in aggregation cuboids that are free of inferences.

- The query tier includes any query that can be rewritten using the cells in the aggregation tier.

Parity-Based Method

- Based on the simple fact that even numbers are closed under addition and subtraction.

- Suppose that every query set includes an even number of cells.

- Adding and subtracting these sets to obtain a single cell then becomes more difficult.

Example: the query sets

X1+X2+X3+X4+X5+X6
X1+X2
X4+X5
X5+X6
X3+X5

still allow the inference X5 = <Q3,Alice> = 2500.
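The example can be verified with a few lines of arithmetic: adding the four two-cell queries and subtracting the six-cell query leaves 2·X5. In the sketch below only X5 = 2500 comes from the slide; the other cell values are invented.

```python
# Hypothetical cell values; only X5 = 2500 is taken from the example.
x = {"X1": 1000, "X2": 2000, "X3": 3000, "X4": 4000, "X5": 2500, "X6": 1500}

queries = [
    ("X1", "X2", "X3", "X4", "X5", "X6"),
    ("X1", "X2"),
    ("X4", "X5"),
    ("X5", "X6"),
    ("X3", "X5"),
]

# Every query set covers an even number of cells.
all_even = all(len(q) % 2 == 0 for q in queries)

# The adversary only sees these SUM answers, not the individual cells.
total, a12, a45, a56, a35 = [sum(x[c] for c in q) for q in queries]

# (X1+X2) + (X4+X5) + (X5+X6) + (X3+X5) - (X1+...+X6) = 2 * X5
inferred_x5 = (a12 + a45 + a56 + a35 - total) // 2
print(all_even, inferred_x5)
```

Restricting users to even-sized query sets therefore makes inferences harder but does not rule them out, which is why the remaining inferences still have to be detected.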

• If a set of queries (set 2) is derivable from another set (set 1), then the answers to set 2 can be computed from the answers to set 1. If set 1 is free of inferences, then so is set 2.

• To detect inferences caused by sets of MDR (multi-dimensional range) queries (Q*), we find another collection of queries that is equivalent to Q* and whose inferences are easier to detect.

This method can be enforced based on the three-tier inference control architecture described earlier:

• A partition of the core cuboid based on dimension hierarchies composes the data tier.

• The parity-based method is applied to each block in the partition to compute the aggregation tier.

• The query tier includes any query that is derivable from the aggregation tier.

Generic Data Cubes

A method that does not directly detect inferences, but instead first prevents m-d inferences and then removes 1-d inferences.

It is able to deal with data cubes with generic aggregation types. It has 2 parts:
- Access control.
- Lattice-based inference control.

Access Control

- Limiting access control to the core cuboid is not always appropriate.
- Values in aggregation cuboids may also carry sensitive information.

Describing an object:

- The function Below() partitions the data cube along the dependency lattice.

- The function Slice() partitions the data cube along dimensions.

- An object is simply the intersection of the two.

Example: Object(L,S), where L = <Year,Employee> and S includes all the cells in the first four quarters of the core cuboid <Quarter,Employee>.

Lattice-Based Inference Control

Given two sets of cells in a data cube (S and T):

- Cell c is redundant to T if S includes c and its ancestors in any single cuboid.

- Cell c is non-comparable to T if, for every c' ∈ T, c is neither an ancestor nor a descendant of c'.

Consider an Object(L,S):

- This object is the union of the cuboids in Below(L).

- Let T be the object and S be its complement with respect to the data cube.

- To remove inferences from S to T, we find a subset of S that is free of m-d inferences to T.

After m-d inferences are prevented, 1-d inferences still need to be removed.

Procedure to remove 1-d inferences:

1. Check each cell and add those that cause 1-d inferences to the object, so that they will be prohibited by access control.

2. Control m-d inferences to this new object by applying the previous results.

3. Repeat these steps until all 1-d inferences are removed.

The final set of cells is free of inferences to the object.

This method can be implemented based on the three-tier security model:

The authorization object computed through the above process comprises the data tier.

The complement of the object is the aggregation tier because it does not cause any inferences to the data tier.

Users are then free to submit queries to the query tier.

THANK YOU !
