Departamento de Eletrónica, Telecomunicações e Informática

Universidade de Aveiro

EXPLORAÇÃO DE DADOS / DATA MINING

PRACTICE PROBLEMS

( Multidimensional Data Modeling )

1. Movies distribution association

A movie distribution association has set up contracts with several theatres for the exhibition of independent movies. To make communication with its partners easier, the association provides two desktop applications, one for Microsoft Windows and the other for Linux. Each application manages a local database with information about movie exhibitions, the number of spectators, and the amounts received in ticket sales for each theatre. The applications were developed several years ago by two different software houses (before the advent of web applications) and, as the schemas below show, their database schemas differ.

Schema 1: database for Microsoft Windows applications

WND_theaters: theaterId (PK), name, capacity, city

WND_exhibitions: theaterID (PK, FK1), movieId (PK, FK2), sessionDate, ticketsTotalAmount, numberOfSpectators

WND_movies: movieId (PK), title, year, budget, genre, studioId (FK1), directorId (FK2)

Estudios (studios): studioId (PK), name, country, ceo

Realizadores (directors): directorId (PK), name, email, cachet

Schema 2: database for Linux applications

LNX_theaters: theaterId (PK), name, capacity, city

LNX_exhibitions: theaterId (PK, FK1), movieId (PK, FK2), year, month, day, totalAmount, numberTicketsSold

LNX_movies: movieId (PK), title, year, budget, genre, studio, director
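The two exhibition tables record the same events under different column names and date layouts, so any centralization step has to map both onto one shape. The following is a minimal sketch of such a mapping, not part of the exercise statement. The `Exhibition` record, the `source` tag, and the helper names are hypothetical, and it assumes that `numberTicketsSold` counts the same thing as `numberOfSpectators` and that `sessionDate` is available as a calendar date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Exhibition:
    """Hypothetical common record for one exhibition row from either source."""
    source: str        # "WND" or "LNX"; ids from the two local databases may collide
    theater_id: int
    movie_id: int
    session_date: date
    total_amount: float
    spectators: int


def from_wnd(row: dict) -> Exhibition:
    # WND_exhibitions keeps the date in a single sessionDate column
    # (assumed here to already be a datetime.date).
    return Exhibition("WND", row["theaterID"], row["movieId"], row["sessionDate"],
                      row["ticketsTotalAmount"], row["numberOfSpectators"])


def from_lnx(row: dict) -> Exhibition:
    # LNX_exhibitions splits the date into year, month and day columns.
    # Assumption: numberTicketsSold is equivalent to numberOfSpectators.
    return Exhibition("LNX", row["theaterId"], row["movieId"],
                      date(row["year"], row["month"], row["day"]),
                      row["totalAmount"], row["numberTicketsSold"])


# Made-up sample rows, one per source schema.
wnd = from_wnd({"theaterID": 1, "movieId": 7, "sessionDate": date(2010, 5, 14),
                "ticketsTotalAmount": 350.0, "numberOfSpectators": 70})
lnx = from_lnx({"theaterId": 1, "movieId": 7, "year": 2010, "month": 5, "day": 14,
                "totalAmount": 200.0, "numberTicketsSold": 40})
```

Keeping the source tag on every row is one way to avoid merging unrelated theaters or movies that happen to share an id in the two local databases.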

The process of migrating and centralizing information from the local databases has shown many deficiencies, which prevent the delivery of reliable and timely information for selecting the movies to exhibit at the theatres. It was therefore decided to set up a centralized data repository accessible via web applications. Hence, it is required to design a Data Mart that centralizes the information from all partners and that can be maintained, using web forms, by the staff of the theatres or by the personnel of the association. The Data Mart must hold the information needed to:

• Follow the evolution of the number of spectators and the amounts received by theatre or city, categorized by movie genre.

• Determine which directors and genres attract the most spectators by region.

• Determine which seasons (year and month) and regions attract the most spectators.

Please answer the following requests:

1. Identify the facts and their granularity.

2. Name the fact table(s) and their measures.

3. Name the dimensions.

4. Design a multidimensional data model for the decision support system described above.

2. Administration of medical centers

A company that administers several medical centers maintains an operational system for registering the medical appointments at each center. To improve its business, the company wants to implement a decision support system and has identified the following needs:

• To understand billing by medical specialty or location (city) of the appointments, considering the sex and the age group of the patients.

• To analyze the data by month, quarter and year.

• To understand the evolution of drug prescriptions (quantity) by clinic, physician, sex and age group.

Design a multidimensional data model for this decision support system.
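The needs above refer to derived attributes (month, quarter, year, age group) that an operational appointments system usually does not store directly. As a rough sketch, they could be computed as follows. The function names and the age bands are hypothetical; the exercise does not define the age groups.

```python
from datetime import date


def date_attributes(d: date) -> dict:
    """Return the month, quarter and year of one calendar date."""
    return {"year": d.year, "quarter": (d.month - 1) // 3 + 1, "month": d.month}


def age_group(age: int) -> str:
    """Map an age in years to a band. The bands are an assumption for illustration."""
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


example = date_attributes(date(2024, 8, 15))
```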

3. Automobile insurance company

An automobile insurance company wants to implement a Data Mart (very simplified) and has identified the following dimensions of information:

• Insured party: identification and name. There is an average of two insured parties for each policy and covered item.

• Coverage item: coverage key and description. There is an average of 10 covered items per policy.

• Agent: identification and name. There is one agent for each policy and covered item.

• Policy: policy key and type. Currently, the company has approximately 500,000 policies.

• Period: date key and fiscal period.

Considering that the company also wants to keep track of the policy premium, the deductible, and the number of transactions, answer the following requests:

1. Design a multidimensional data model for this Data Mart.

2. Estimate the number of rows in the fact table, using the assumptions stated above.
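One possible way to approach the row estimate is to multiply the stated averages. The sketch below assumes a grain of one fact row per policy, covered item, and insured party for a single period; that grain is an assumption, and a different design choice (for example, one row per transaction) would give a different figure.

```python
def estimate_fact_rows(policies: int,
                       covered_items_per_policy: int,
                       insured_parties_per_item: int,
                       agents_per_item: int = 1) -> int:
    """Rows per period, assuming one row per policy x covered item x insured party."""
    return (policies * covered_items_per_policy
            * insured_parties_per_item * agents_per_item)


# Figures taken from the problem statement.
rows_per_period = estimate_fact_rows(policies=500_000,
                                     covered_items_per_policy=10,
                                     insured_parties_per_item=2)
print(rows_per_period)  # 10000000
```

Because there is exactly one agent for each policy and covered item, the agent dimension does not multiply the row count. The total over several periods would scale with the number of periods kept, which the statement does not give.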

4. Hotel chain

A hotel chain has 30 hotels in Europe, organized in two brand lines. The primary line features larger-than-average rooms, most of which are suites. Its target customers are business travelers and upscale vacationers. The second line features competitive rates, though with limited facilities. Its target customers are price-sensitive.

There are different types of rooms, such as standard rooms and suites. Rooms are also categorized by size as small, medium, or large. Each room may also include certain optional features, such as a refrigerator or kitchenette.

Management wants to analyze the use of the hotel chain's capacity, i.e. the occupancy rate. The biggest challenge they face is determining how to price the hotel rooms.

You are required to design a data mart for the hotel management based on the following requirements:

1. Analysis of the daily occupancy rate by room type or location.

Occupancy rate = occupied rooms / (occupied rooms + vacant rooms + unavailable rooms)

2. Analysis of average utilization levels for specific hotels or room types over time.

3. For each room type and hotel, capture the accommodation revenues for comparison to occupancy levels.
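The occupancy rate formula above is a ratio, so it behaves differently from the room counts when data are rolled up. The sketch below illustrates this with made-up figures for two hypothetical hotels on one day: averaging the two rates gives a different result from recomputing the rate over the summed counts.

```python
def occupancy_rate(occupied: int, vacant: int, unavailable: int) -> float:
    """Occupied rooms divided by all rooms (occupied + vacant + unavailable)."""
    total = occupied + vacant + unavailable
    if total == 0:
        raise ValueError("no rooms to compute a rate for")
    return occupied / total


# Made-up counts (occupied, vacant, unavailable) for two hypothetical hotels.
hotel_a = (90, 10, 0)   # 100 rooms
hotel_b = (5, 15, 0)    # 20 rooms

# Averaging the per-hotel rates ignores the difference in hotel size.
mean_of_rates = (occupancy_rate(*hotel_a) + occupancy_rate(*hotel_b)) / 2

# Summing the counts first and then dividing gives the rate for both hotels together.
combined = occupancy_rate(*(sum(counts) for counts in zip(hotel_a, hotel_b)))
```

Here `mean_of_rates` is 0.575 while `combined` is 95/120, roughly 0.79, which is why a design would typically store the room counts and derive the rate at query time.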

Design a multidimensional data model for this decision support system, keeping the above requirements in mind. Identify all dimensions and facts, and classify each fact as additive, semi-additive, or non-additive.
