Lecture 7: Data Normalization and Other Topics. EE4509, EE4253, EE6133, Semester 1, 2013/2014. Dr. Đào Trung Kiên, Hanoi University of Science and Technology

07. Data Normalization & More.pdf



  • Data normalization

  • Concept

    Designing a relational database means building a relational schema that:

    allows the desired data to be stored

    minimizes data redundancy

    allows information to be retrieved easily

    uses normal forms: a set of criteria for database design

    Data normalization is the process of structuring a relational database to minimize data redundancy and dependency, based on keys and functional dependencies.

  • Normal forms

    First normal form (1NF): 1970

    Second normal form (2NF): 1971

    Third normal form (3NF): 1971

    Boyce-Codd normal form (BCNF): 1974

    Fourth normal form (4NF): 1977

    Fifth normal form (5NF): 1979

    Sixth normal form (6NF): 2003

  • First normal form (1NF)

    An entity satisfies 1NF if it has no repeating group of attributes

    Multivalued attributes are eliminated

    Counterexample: the Order entity violates 1NF because the group (item_name, item_number, item_price) is repeated 9 times

    This is the simplest normal form

    Converting to 1NF:

    Split the repeating attribute groups into smaller relations

    Use key attributes and foreign keys

    Link the relations with a 1..n relationship

    Order

    id: int

    shipdate: date

    customer: string

    address: string

    item_name1: string

    item_number1: int

    item_price1: int

    item_name2: string

    item_number2: int

    item_price2: int

    ...

    item_name9: string

    item_number9: int

    item_price9: int

  • Example: converting an entity to 1NF

    Define an additional auxiliary relation: OrderItem

    1..n relationship between Order and OrderItem

    Only one extra foreign key is needed: order_id

    Order

    id: int

    shipdate: date

    customer: string

    address: string

    OrderItem

    order_id: int

    item_name: string

    item_number: int

    item_price: int
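The split into Order and OrderItem can be sketched as follows. This is a minimal illustration, assuming Python's standard `sqlite3` module in place of the MySQL server used later in the slides; the sample customer and items are hypothetical and do not come from the lecture.

```python
import sqlite3

# In-memory SQLite database (an assumption; the slides use MySQL elsewhere).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# "Order" is quoted because ORDER is a reserved word in SQL.
conn.executescript("""
CREATE TABLE "Order" (
    id       INTEGER PRIMARY KEY,
    shipdate TEXT,
    customer TEXT,
    address  TEXT
);
CREATE TABLE OrderItem (
    order_id    INTEGER NOT NULL REFERENCES "Order"(id),  -- the only extra attribute
    item_name   TEXT,
    item_number INTEGER,
    item_price  INTEGER
);
""")

# Hypothetical sample data: one order with three items (1..n relationship).
conn.execute('INSERT INTO "Order" VALUES (?, ?, ?, ?)',
             (1, "2013-10-01", "Alice", "Hanoi"))
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?, ?)",
                 [(1, "pen", 2, 5), (1, "notebook", 1, 20), (1, "ruler", 3, 4)])

order_count = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
item_count = conn.execute(
    "SELECT COUNT(*) FROM OrderItem WHERE order_id = ?", (1,)).fetchone()[0]
print(order_count, item_count)
```

With this layout an order can hold any number of items, instead of the fixed nine column groups of the original entity.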

  • Second normal form (2NF)

    A relation satisfies 2NF if and only if both of the following hold:

    it satisfies 1NF

    no non-key attribute is determined by a proper subset of the key

    student_id  class  full_name

    123         CSDL   Trần Khánh Linh
    123         KTLT   Trần Khánh Linh
    456         LTM    Bill Gates
    456         CSDL   Bill Gates
    456         KTLT   Bill Gates
    789         VXL    Lý Liên Kiệt
    789         LTM    Lý Liên Kiệt

    Counterexample: the relation above does not satisfy 2NF because:

    full_name is completely determined by student_id

    student_id is a proper subset of the key (student_id, class)

    Data redundancy: the full name is stored multiple times

  • Converting to 2NF

    To convert a relation to 2NF:

    Split it into smaller relations

    Use a 1..n relationship

    student_id  full_name

    123         Trần Khánh Linh
    456         Bill Gates
    789         Lý Liên Kiệt

    student_id  class

    123         CSDL
    123         KTLT
    456         LTM
    456         CSDL
    456         KTLT
    789         VXL
    789         LTM
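The decomposition above can be checked mechanically: joining the two smaller relations on student_id should give back exactly the original rows. The sketch below assumes Python's standard `sqlite3` module; the table names `enrollment_flat`, `student` and `enrollment` are illustrative choices, not names from the lecture.

```python
import sqlite3

# Rows of the original (non-2NF) relation: (student_id, class, full_name).
rows = [
    (123, "CSDL", "Trần Khánh Linh"),
    (123, "KTLT", "Trần Khánh Linh"),
    (456, "LTM", "Bill Gates"),
    (456, "CSDL", "Bill Gates"),
    (456, "KTLT", "Bill Gates"),
    (789, "VXL", "Lý Liên Kiệt"),
    (789, "LTM", "Lý Liên Kiệt"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment_flat (student_id INTEGER, class TEXT, full_name TEXT)")
conn.executemany("INSERT INTO enrollment_flat VALUES (?, ?, ?)", rows)

# Split into two smaller relations; each name is now stored only once.
# (CREATE TABLE ... AS copies data only; key constraints are omitted for brevity.)
conn.execute(
    "CREATE TABLE student AS "
    "SELECT DISTINCT student_id, full_name FROM enrollment_flat")
conn.execute(
    "CREATE TABLE enrollment AS SELECT student_id, class FROM enrollment_flat")

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# Joining on student_id (the 1..n link) reconstructs the original relation.
rejoined = conn.execute(
    "SELECT e.student_id, e.class, s.full_name "
    "FROM enrollment e JOIN student s ON s.student_id = e.student_id").fetchall()

print(student_count)
print(sorted(rejoined) == sorted(rows))
```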

  • Functional dependencies

    2NF relies on the notion of functional dependencies, which generalizes the notion of a key

    Definition: On a relation R, let $X \subseteq R$ and $Y \subseteq R$ be two sets of attributes of R. $Y$ is functionally dependent on $X$ (written $X \to Y$) if and only if each value of $X$ determines exactly one value of $Y$.

    In the previous example we have:

    student_id → full_name

    With this notion, 2NF can be restated: no set of non-key attributes is functionally dependent on a proper subset of the key attributes
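The definition can be tested against a concrete set of rows: a dependency is violated as soon as one left-hand value appears with two different right-hand values. The helper `fd_holds` below is a hypothetical name introduced for illustration. Note that a sample of rows can only refute a dependency; whether it holds in general is a design decision about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each value of lhs maps to one value of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # any later, different value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The student relation from the 2NF example.
students = [
    {"student_id": 123, "class": "CSDL", "full_name": "Trần Khánh Linh"},
    {"student_id": 123, "class": "KTLT", "full_name": "Trần Khánh Linh"},
    {"student_id": 456, "class": "LTM", "full_name": "Bill Gates"},
    {"student_id": 456, "class": "CSDL", "full_name": "Bill Gates"},
    {"student_id": 789, "class": "VXL", "full_name": "Lý Liên Kiệt"},
]

name_depends_on_id = fd_holds(students, ["student_id"], ["full_name"])
class_depends_on_id = fd_holds(students, ["student_id"], ["class"])
print(name_depends_on_id, class_depends_on_id)
```

Here student_id → full_name holds on the sample, while student_id → class does not, because one student takes several classes.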

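  • Properties of functional dependencies

    Armstrong's axioms

    Reflexivity (trivial dependency): if $Y \subseteq X$ then $X \to Y$

    Augmentation: if $X \to Y$ then $(X, Z) \to (Y, Z)$

    Transitivity: if $X \to Y$ and $Y \to Z$ then $X \to Z$

    Some other properties

    Union: if $X \to Y$ and $X \to Z$ then $X \to (Y, Z)$

    Decomposition: if $X \to (Y, Z)$ then $X \to Y$ and $X \to Z$

    Pseudo-transitivity: if $X \to Y$ and $(Y, Z) \to W$ then $(X, Z) \to W$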
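One common way to put these rules to work is to compute the closure of an attribute set, that is, every attribute that can be derived from it by applying the given dependencies repeatedly. The slides do not present this algorithm; the function `closure` below is a standard textbook technique shown as a minimal sketch.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, so is the right side
            # (this combines reflexivity, augmentation and transitivity).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependency from the student example: student_id -> full_name.
student_fds = [(("student_id",), ("full_name",))]

from_id = closure({"student_id"}, student_fds)
from_key = closure({"student_id", "class"}, student_fds)
print(sorted(from_id))
print(sorted(from_key))
```

A set of attributes whose closure contains every attribute of the relation is a superkey, which is why (student_id, class) works as the key in the earlier example.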

  • Third normal form (3NF)

    A relation satisfies 3NF if and only if both of the following hold:

    it satisfies 2NF

    no non-key attribute is functionally dependent on another non-key attribute

    Counterexample:

    (There is a functional dependency: author → author_birth_year)

    title                        year  author           author_birth_year

    The universe in a nutshell   2001  Stephen Hawking  1942
    The Da Vinci code            2003  Dan Brown        1964
    A brief history of time      1988  Stephen Hawking  1942
    Digital fortress             1998  Dan Brown        1964
    The lost symbol              2009  Dan Brown        1964

  • Converting to 3NF

    To convert a relation to 3NF (similar to 2NF):

    Split it into smaller relations

    Use a 1..n relationship

    Add the key attribute of the new relation

    title                        year  author

    The universe in a nutshell   2001  Stephen Hawking
    The Da Vinci code            2003  Dan Brown
    A brief history of time      1988  Stephen Hawking
    Digital fortress             1998  Dan Brown
    The lost symbol              2009  Dan Brown

    author           author_birth_year

    Stephen Hawking  1942
    Dan Brown        1964
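The two relations above can be written as tables linked by a foreign key, so that each birth year is stored once and recovered by a join when needed. This sketch assumes Python's standard `sqlite3` module, and assumes that title and author serve as the keys of their respective tables, which the slides imply but do not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Assumed keys: author for the author table, title for the book table.
conn.executescript("""
CREATE TABLE author (
    author            TEXT PRIMARY KEY,
    author_birth_year INTEGER
);
CREATE TABLE book (
    title  TEXT PRIMARY KEY,
    year   INTEGER,
    author TEXT REFERENCES author(author)
);
""")

conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [("Stephen Hawking", 1942), ("Dan Brown", 1964)])
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
    ("The universe in a nutshell", 2001, "Stephen Hawking"),
    ("The Da Vinci code", 2003, "Dan Brown"),
    ("A brief history of time", 1988, "Stephen Hawking"),
    ("Digital fortress", 1998, "Dan Brown"),
    ("The lost symbol", 2009, "Dan Brown"),
])

# Each birth year is stored once, however many books the author wrote.
stored_birth_years = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]

# The original four-column relation is recovered with a join.
joined = conn.execute(
    "SELECT b.title, b.year, b.author, a.author_birth_year "
    "FROM book b JOIN author a ON a.author = b.author").fetchall()

print(stored_birth_years, len(joined))
```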

  • Boyce-Codd normal form (BCNF)

    BCNF was defined as a refinement of 3NF and is also called 3.5NF

    Definition: a relation R satisfies BCNF if and only if, for every functional dependency $X \to Y$, one of the following two conditions holds:

    $X \to Y$ is a trivial functional dependency (that is, $Y \subseteq X$)

    $X$ is a key (more precisely, a superkey) of R

    Properties:

    If R satisfies BCNF then it satisfies 3NF

    The converse does not always hold, but it fails only in a small number of cases
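The definition translates almost directly into a check: for each given dependency, test whether it is trivial or whether its left side is a superkey. The sketch below uses a closure helper for the superkey test; `closure` and `bcnf_violations` are hypothetical names, and the dependency title → (year, author) is an assumption about the book example (the slides only state author → author_birth_year).

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the (lhs, rhs) pairs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the given dependencies that are neither trivial nor keyed by a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= set(all_attrs)
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


# Book relation before decomposition (title assumed to be the key).
book_attrs = {"title", "year", "author", "author_birth_year"}
book_fds = [
    (("title",), ("year", "author")),
    (("author",), ("author_birth_year",)),
]
before = bcnf_violations(book_attrs, book_fds)

# After decomposition, each relation keeps only its own dependency.
after_book = bcnf_violations({"title", "year", "author"},
                             [(("title",), ("year", "author"))])
after_author = bcnf_violations({"author", "author_birth_year"},
                               [(("author",), ("author_birth_year",))])
print(before, after_book, after_author)
```

Under these assumptions the only violation is author → author_birth_year, and both relations produced by the 3NF decomposition pass the check.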

  • Views

  • Concept

    Views are virtual relations that exist only at the logical level; they are built on top of real relations to make the data more convenient to use

    Examples:

    In an employee database, hide salary and home address information from ordinary users

    In a student database, combine the relations SinhVien, LopHoc and DangKy into one virtual relation that is easier to use

    Modifying or retrieving information through a view must give the same result as performing the operation on the underlying real relations

  • Main reasons for using views

    Work with only part of the data

    Several relations can be combined into one virtual relation

    Create relations that can be tailored closely to the needs of each use

    A view stores no additional data; it is evaluated against the real relations

    Support for information security

    Hide the parts of the data that should not be exposed
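Two of these points, hiding columns and storing no data of its own, can be seen in a small experiment. The sketch assumes Python's standard `sqlite3` module; the `employee` table, its columns and its rows are hypothetical, modelled on the employee example mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, home_address TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'An', 1000, 'Hanoi')")

# The view exposes only the non-sensitive columns.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_public")
columns = [d[0] for d in cursor.description]
before = cursor.fetchall()

# The view holds no copy of the data: a row added to the base table
# shows up the next time the view is queried.
conn.execute("INSERT INTO employee VALUES (2, 'Binh', 1200, 'Hue')")
after = conn.execute("SELECT * FROM employee_public ORDER BY id").fetchall()

print(columns)
print(before, after)
```

In a real deployment, ordinary users would be granted access to the view only and not to the base table; SQLite has no user permissions, so that part is not shown.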

  • SQL

    Create a view:

    create view view-name as select ...;

    The definition of the view is given by the select statement

    Drop a view:

    drop view view-name;

    Once created, a view is queried and updated in much the same way as an ordinary relation, subject to the DBMS's restrictions on updatable views

  • Example (MySQL)

    mysql> select * from t;
    +------+-------+
    | qty  | price |
    +------+-------+
    |    3 |    50 |
    |    5 |    60 |
    |    2 |    20 |
    +------+-------+
    3 rows in set (0.00 sec)

    mysql> create view t1 as select qty, price as value from t where qty>2;
    Query OK, 0 rows affected (0.02 sec)

    mysql> select * from t1;
    +------+-------+
    | qty  | value |
    +------+-------+
    |    3 |    50 |
    |    5 |    60 |
    +------+-------+
    2 rows in set (0.01 sec)

  • Indexing

  • Concept

    Searching sorted data is much faster than searching unsorted data

    Example: looking up a word in a dictionary

    Indexing means building auxiliary data structures (trees, hash tables, ...) that make searching the data faster

    Several indexes can be created for each relation

    Indexes should be created on attributes that are frequently used in search conditions (the where clause)

    Not every search condition can make use of an index (for example, substring search)
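The dictionary analogy can be made concrete by counting how many entries each strategy has to inspect. The sketch below is plain Python with hypothetical helper names; the list size of 40349 is chosen only to be comparable to a mid-sized table, and a real DBMS index (typically a B-tree) is more elaborate than a binary search over a list.

```python
def linear_steps(words, target):
    """Count entries inspected when scanning from the start (unsorted lookup)."""
    for steps, word in enumerate(words, start=1):
        if word == target:
            return steps
    return len(words)


def binary_steps(words, target):
    """Count entries inspected by binary search over a sorted list."""
    lo, hi, steps = 0, len(words), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if words[mid] == target:
            return steps
        if words[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


# Zero-padded names sort lexicographically in numeric order.
words = [f"word{i:05d}" for i in range(40349)]
target = words[-1]  # worst case for the linear scan

scan_cost = linear_steps(words, target)
search_cost = binary_steps(words, target)
print(scan_cost, search_cost)
```

The scan inspects every entry, while the sorted lookup needs at most 16 inspections for a list of this size, since each step halves the remaining range.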

  • SQL

    Create an index:

    create index index-name

    on relation-name(attribute-name);

    Drop an index:

    drop index index-name on relation-name;

    List the indexes:

    (MySQL) show indexes from relation-name;

    (SQL Server) exec sp_helpindex relation-name;

    After an index is created, the relations are used exactly as before. Whether an index is used for a given query is decided automatically by the DBMS.

  • Example (MySQL)

    mysql> select count(*) from thivien_poem where AUTHOR=20;
    +----------+
    | count(*) |
    +----------+
    |      158 |
    +----------+
    1 row in set (1.74 sec)

    mysql> create index thivien_poem_AUTHOR on thivien_poem(AUTHOR);
    Query OK, 40349 rows affected (1 min 14.00 sec)
    Records: 40349  Duplicates: 0  Warnings: 0

    mysql> select count(*) from thivien_poem where AUTHOR=20;
    +----------+
    | count(*) |
    +----------+
    |      158 |
    +----------+
    1 row in set (0.00 sec)
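The DBMS's automatic choice can be observed by asking it for its query plan. The sketch below assumes Python's standard `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` statement rather than MySQL; the `poem` table and its generated rows are hypothetical stand-ins for `thivien_poem`, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE poem (id INTEGER PRIMARY KEY, author INTEGER, title TEXT)")
# Hypothetical data: 40000 poems spread evenly over 250 authors.
conn.executemany(
    "INSERT INTO poem (author, title) VALUES (?, ?)",
    ((i % 250, f"poem {i}") for i in range(40000)))


def plan(sql):
    """Return SQLite's plan description (4th column of EXPLAIN QUERY PLAN)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


by_author = "SELECT * FROM poem WHERE author = 20"
plan_before = plan(by_author)  # no index yet: full table scan

conn.execute("CREATE INDEX poem_author ON poem(author)")
conn.execute("CREATE INDEX poem_title ON poem(title)")

plan_after = plan(by_author)  # the same query now uses poem_author
# A substring condition cannot be answered through the title index.
plan_substring = plan("SELECT * FROM poem WHERE title LIKE '%moon%'")

match_count = conn.execute(
    "SELECT COUNT(*) FROM poem WHERE author = 20").fetchone()[0]

print(plan_before)
print(plan_after)
print(plan_substring)
```

The query text is the same before and after `CREATE INDEX`; only the plan changes, which matches the point that index use is decided by the DBMS.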

  • Exercises

    1. Determine which normal form the following relation is in: ITEM (SKU, PromID, Vendor, Style, Price)

    (SKU, PromID) → (Vendor, Style, Price)

    SKU → (Vendor, Style)

    2. Normalize the relation above to a higher normal form

    3. Choose a key and list the functional dependencies for: ITEMS (PONum, ItemNum, PartNum, Desc, Price, Qty)