Page 1: SANtopia Design Features

SANtopia Design Features

Computer and Software Research Laboratory, ETRI

Page 2: SANtopia Design Features

Data Storage Systems Workshop

Background
• Explosive growth of data driven by the spread of the Internet
• Growing demand for large-capacity storage
• Scalable storage media are needed

[Figure: Forecast of storage capacity demand, 1998 to 2002, in petabytes (y-axis 0 to 1,600). Source: IDC]

Page 3: SANtopia Design Features

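Background
• Problems with the conventional server-centric environment
  - Performance bottlenecks
  - Limited scalability (storage devices, computing power)

[Figure: The conventional server-centric storage environment. Clients reach a Web Server, an Application Server, and a DB Server over the Internet.]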

Page 4: SANtopia Design Features

Background
• SAN-based storage: a new storage concept that attaches many storage devices to a dedicated high-speed network (Fibre Channel) to provide large-capacity shared storage

[Figure: Storage Area Network. Clients reach the Web Server, Appl. Server, and DB Server over the Internet; the servers connect through FC switches to shared RAID arrays, disks, and a tape drive.]

Page 5: SANtopia Design Features

Background
• Growing demand for SAN
  - Average annual growth rate: 85%
  - Similar to the growth rate of storage demand (87%)

[Figure: SAN (Storage Area Network) market forecast, 1996 to 2002, in millions of $ (y-axis 0 to 15,000). Source: IDC]

Page 6: SANtopia Design Features

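Background
• SAN is a new kind of storage hardware technology for supporting large-capacity storage
• SAN hardware technology
  - Data sharing
  - Removal of performance bottlenecks
  - Recovery after failures
  - Integrated management
• To raise the value of a SAN further, system software that supports SAN Virtualization must be provided
  - Support for a large-capacity shared file system
  - Support for hardware-independent logical storage
  - Support for centralized system management
• Support for large-capacity storage media
• Support for storage scalability
• Additional requirements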

Page 7: SANtopia Design Features

Background
• SAN Virtualization market forecast
  - Grows faster than SAN H/W (over 100%)
  - The SAN Virtualization market is about 10% the size of the SAN H/W market

[Figure: SAN Virtualization market forecast, 1997 to 2003, in millions of $ (y-axis 0 to 1,400). Source: IDC]

Page 8: SANtopia Design Features

What is SANtopia?
• S/W to provide SAN Virtualization
• Components on top of the SAN Infrastructure: Shared File System, Logical Volume Driver, System Management
• High Performance
  - Fast Accessible Directory Structure
  - Load Balancing
  - Global Buffer Sharing
• High Availability
  - Fast recovery
  - Online backup
  - Snapshot
• High Scalability
  - Dynamic Inode: no preallocated inode table
  - Dynamic Reconfiguration: online resizing

Page 9: SANtopia Design Features

Features of SANtopia
• 64-bit File and File System
• Global File Sharing
  - Provides a global buffer
• Open SAN File System
  - Storage Cluster File System
• Centralized Lock Manager with Load Balancing
  - Does not use device locks
• Integration of Buffer Manager and Lock Manager
• Software RAID (0, 1, 0+1, 5, Concatenation)
• Comprised of three parts
  - Logical Volume Manager
  - Global Shared File System
  - Lock and Buffer Manager

Page 10: SANtopia Design Features

SANtopia Architecture

[Figure: SANtopia architecture. Labels recovered from the diagram:]
• Interfaces: System Call Interface, VNODE Interface
• File Manager: Inode Management, Log Management, Recovery Management, BitMap Management, Transaction Management, File Operation Management
• Global Lock & Buffer Manager
• Logical Volume Manager: I/O Management, Mapping Management, Configuration Management
• System Management: Performance Monitor, Online Backup, Scalability Management
• Transports: IP over SAN, SCSI over SAN
• Disks

Page 11: SANtopia Design Features

SANtopia Logical Volume Manager

Page 12: SANtopia Design Features

Features of LVM
• Volume Create/Remove
• On-line Volume Resize
• Dynamic Reconfiguration
• Software RAID (0, 1, 0+1, 5, Concatenation)

[Figure: Eight disks (disk1 to disk8) carrying four volumes]
• Volume 1: Striping (RAID 0)
• Volume 2: Concatenation
• Volume 3: Striped parity (RAID 5)
• Volume 4: Striped mirroring (RAID 0+1)
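The difference between a striped and a concatenated volume comes down to how a logical block number is turned into a disk and an offset. The sketch below shows the textbook mapping for both; the function names, the stripe unit, and the disk sizes are illustrative assumptions, not SANtopia's actual interfaces.

```python
def map_striped(block, n_disks, stripe_unit):
    """RAID 0: map a logical block to (disk index, block offset on that disk)."""
    stripe_no, within = divmod(block, stripe_unit)  # which stripe unit, offset inside it
    row, disk = divmod(stripe_no, n_disks)          # stripe units rotate across the disks
    return disk, row * stripe_unit + within


def map_concatenated(block, disk_sizes):
    """Concatenation: fill the first disk completely, then the next, and so on."""
    for disk, size in enumerate(disk_sizes):
        if block < size:
            return disk, block
        block -= size
    raise ValueError("block beyond end of volume")
```

With four disks and a stripe unit of two blocks, consecutive stripe units land on consecutive disks, so large sequential reads are spread over all disks; concatenation keeps neighbouring blocks on one disk until it is full.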

Page 13: SANtopia Design Features

A Disk Layout

[Figure: Disk layout. Labels recovered from the diagram:]
• Label
• Private partition (physical partition)
• Public partition (physical partition)
• Metadata: Disk Identifier, Logical Partition Information, Information about Logical Volume, Allocation Bitmap, Mapping Info.
• Logical partitions (three shown)

Page 14: SANtopia Design Features

Volume Resize
• Extend/Shrink unit = Logical Partition
• When a volume is striped
  - Add Row
  - Add Column
  - Data relocation needed
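Why resizing a striped volume can force data relocation is easiest to see by counting which blocks change position. The sketch below assumes a plain striped layout with a one-block stripe unit; `locate` and `moved_blocks` are hypothetical helpers, not part of SANtopia. In this layout, adding a row leaves every existing block where it is, while adding a column changes the position of most blocks.

```python
def locate(block, n_columns):
    """Striped layout with a one-block stripe unit: (column, row) of a block."""
    row, column = divmod(block, n_columns)
    return column, row


def moved_blocks(n_blocks, old_columns, new_columns):
    """Blocks whose (column, row) changes when the column count changes."""
    return [b for b in range(n_blocks)
            if locate(b, old_columns) != locate(b, new_columns)]
```

The position of a block depends only on the number of columns, which is why growing by rows is cheap in this model and growing by columns is not.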

Page 15: SANtopia Design Features

Free Space Manager
• Physical Allocation Bitmap
  - Divided into fixed-size units
  - Each unit controlled by a separate lock
  - Entire bitmap is duplicated
• Effects
  - Increases parallelism
  - Improves scalability
  - Avoids bottlenecks
  - Reduces metadata search time

[Figure: Physical allocation bitmap covering the logical partitions]
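The idea of splitting the allocation bitmap into fixed-size units, each guarded by its own lock, can be sketched as follows. This is a single-process illustration using thread locks; the class name, the `start_unit` hint, and the first-fit search are assumptions, and the duplication of the bitmap mentioned on the slide is not modeled.

```python
import threading


class FreeSpaceBitmap:
    """Allocation bitmap split into fixed-size units with one lock per unit."""

    def __init__(self, n_blocks, unit_size):
        self.unit_size = unit_size
        self.bits = [False] * n_blocks  # True means allocated
        n_units = (n_blocks + unit_size - 1) // unit_size
        self.locks = [threading.Lock() for _ in range(n_units)]

    def allocate(self, start_unit=0):
        """Search unit by unit, holding only the lock of the unit being searched."""
        n_units = len(self.locks)
        for k in range(n_units):
            unit = (start_unit + k) % n_units
            lo = unit * self.unit_size
            hi = min(lo + self.unit_size, len(self.bits))
            with self.locks[unit]:
                for block in range(lo, hi):
                    if not self.bits[block]:
                        self.bits[block] = True
                        return block
        return None  # no free block anywhere

    def free(self, block):
        with self.locks[block // self.unit_size]:
            self.bits[block] = False
```

Allocators that start in different units do not contend for the same lock, and each search scans at most one unit at a time, which matches the parallelism and search-time effects listed above.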

Page 16: SANtopia Design Features

Mapping Manager
• Virtualization of physical storage
  - Provides flexibility
  - Enables data movement between logical partitions
  - Enables snapshots
• Each piece of mapping information
  - Covered by one host
  - Chained declustered for safety
  - Same effects as Free Space Manager
  - Flexible for fail-over

[Figure: Hosts A, B, C, and D, each covering the mapping information of one logical partition]
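Chained declustering, in its standard form, keeps the primary copy of fragment i on node i and a backup copy on the next node in the chain. A minimal sketch of that placement rule, applied to mapping information per logical partition, is shown below; how SANtopia actually assigns partitions to hosts is not stated in the slides, so the modulo assignment and the function names are assumptions.

```python
def placement(partition, n_hosts):
    """Return (primary host, backup host) under chained declustering."""
    primary = partition % n_hosts
    backup = (primary + 1) % n_hosts  # the next host in the chain holds the copy
    return primary, backup


def serving_host(partition, n_hosts, failed=frozenset()):
    """Pick the host that serves the mapping information, failing over if needed."""
    primary, backup = placement(partition, n_hosts)
    if primary not in failed:
        return primary
    if backup not in failed:
        return backup
    raise RuntimeError("both copies of the mapping information are unavailable")
```

Because each backup sits on a different neighbour, the loss of one host hands exactly one extra partition to its successor rather than doubling the load of a single mirror partner.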

Page 17: SANtopia Design Features

I/O Manager
• Load balancing of I/O
• Read policy
  - Round-Robin Policy: used when the capabilities are the same
  - Preferred-Plex Policy: used when the capabilities differ
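The two read policies can be sketched as interchangeable objects that pick which mirror copy (plex) serves the next read. The class names and the `choose` method are hypothetical; the slides name only the policies and when each applies.

```python
import itertools


class RoundRobinPolicy:
    """Rotate reads over all plexes; suits plexes of equal capability."""

    def __init__(self, plexes):
        self._cycle = itertools.cycle(list(plexes))

    def choose(self):
        return next(self._cycle)


class PreferredPlexPolicy:
    """Always read from one designated plex; suits plexes of unequal capability."""

    def __init__(self, plexes, preferred):
        if preferred not in plexes:
            raise ValueError("preferred plex is not part of the volume")
        self.preferred = preferred

    def choose(self):
        return self.preferred
```

For example, `RoundRobinPolicy(["plexA", "plexB"])` alternates between the two copies, while `PreferredPlexPolicy(["plexA", "plexB"], "plexB")` sends every read to the faster copy.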

Page 18: SANtopia Design Features

SANtopia File Manager

Page 19: SANtopia Design Features

Features of SANtopia File Mgr
• Extent-based 64-bit file system
  - 64-bit addressing
  - Support for large files
• Dynamic inode allocation
  - Multi-level inode
• Support for large directories
  - Extendible-hash-based directory management
• Fast recovery
  - Metadata journaling
• Inode stuffing

Page 20: SANtopia Design Features

SANtopia File System Layout

[Figure: Address space from 0 to 2^64-1: Boot Block, Super Block, Allocation Blocks (inode, directory, data block), Extent Bitmap]

• Extent-based allocation
• Super Block: SANtopia file system information
• Allocation Blocks
  - No preallocated area for inodes, directory entries, or data blocks
  - Extent-based allocation (4KB ~ 64KB)
• Extent bitmap
  - Located at the end of the address space (file system size)
  - Must distinguish the object type in the Extent Allocation Bitmap
  - Uses 2 bits: 00 = not used, 01 = inode, 10 = dir entry, 11 = data block
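A two-bit-per-extent bitmap with the encoding given above can be packed four entries to a byte. The sketch below is one way to do that; the class name, the method names, and the bit order within a byte are assumptions, and only the four code values come from the slide.

```python
NOT_USED, INODE, DIR_ENTRY, DATA_BLOCK = 0b00, 0b01, 0b10, 0b11


class ExtentBitmap:
    """Two bits per extent, four extents packed into each byte."""

    def __init__(self, n_extents):
        self.n_extents = n_extents
        self.raw = bytearray((n_extents + 3) // 4)

    def _check(self, extent):
        if not 0 <= extent < self.n_extents:
            raise IndexError("extent out of range")

    def get(self, extent):
        self._check(extent)
        byte, slot = divmod(extent, 4)
        return (self.raw[byte] >> (slot * 2)) & 0b11

    def set(self, extent, kind):
        self._check(extent)
        byte, slot = divmod(extent, 4)
        shift = slot * 2
        cleared = self.raw[byte] & ~(0b11 << shift) & 0xFF  # wipe the old 2-bit entry
        self.raw[byte] = cleared | (kind << shift)
```

Because the object type is stored in the bitmap itself, a scan can tell inodes, directory entries, and data blocks apart without a preallocated area for each.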

Page 21: SANtopia Design Features

inode
• Dynamically allocated inodes
  - No limit on the number of inodes
  - No preallocated inode area
  - Cf. ext2 file system: 1 inode per 4KB
• Each inode occupies 1 extent
  - Fragmentation
  - Stuffed inode for space efficiency
• 64-bit inode number
  - Uses a unique ID in SANtopia

[Figure: An inode extent holds the inode number (inode information), file or directory information, and either data block pointers or stuffed data]

Page 22: SANtopia Design Features

inode structure

[Figure: Dynamic multi-level inode allocation. The Dinode Info. leads to single indirect blocks and double indirect blocks; each block is an extent.]

Page 23: SANtopia Design Features

Directory (Extendible Hash)

[Figure: Extendible hash directory. The DirInfo. holds a root hash with 2-bit entries (00 to 11, labeled 2) and an indirect hash with 4-bit entries (0000 to 1111, labeled 4); entries point to directory nodes, each of which is an extent.]
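Extendible hashing lets a directory grow without rehashing every entry: a full bucket is split in two, and the index table doubles only when the split bucket already uses all index bits. The sketch below is a generic in-memory version of that technique, not SANtopia's on-disk format; the class names, the use of `zlib.crc32` as the name hash, and the bucket size are assumptions.

```python
import zlib


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of hash bits this bucket is responsible for
        self.items = {}     # file name -> inode number


class ExtendibleHashDir:
    def __init__(self, bucket_size=4):
        self.bucket_size = bucket_size
        self.global_depth = 1
        self.table = [Bucket(1), Bucket(1)]

    def _index(self, name):
        mask = (1 << self.global_depth) - 1
        return zlib.crc32(name.encode()) & mask  # low bits select the table slot

    def lookup(self, name):
        return self.table[self._index(name)].items.get(name)

    def insert(self, name, inode_no):
        while True:
            bucket = self.table[self._index(name)]
            if name in bucket.items or len(bucket.items) < self.bucket_size:
                bucket.items[name] = inode_no
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            self.table = self.table + self.table  # double the index table
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        bit = 1 << (bucket.depth - 1)
        for i, slot in enumerate(self.table):
            if slot is bucket and i & bit:
                self.table[i] = sibling  # half of the slots move to the sibling
        old, bucket.items = bucket.items, {}
        for name, inode_no in old.items():
            self.table[self._index(name)].items[name] = inode_no
```

A lookup touches one table slot and one bucket regardless of directory size, which is the property that makes this structure attractive for large directories.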

Page 24: SANtopia Design Features

Recovery
• Uses journaling
  - Writes the in-core log buffer to the log disk when metadata is updated
  - The log disk is a circular buffer
• Metadata modification operations (transactions)
  - create, remove, unlink, link, allocation, truncate, rename, …

[Figure: Recovery components: File Operation (transaction), Transaction Manager, Recovery Manager, Log Manager, Metadata Manager, System Manager (system recovery), Log, Metadata]
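Metadata journaling with a circular log can be sketched in a few lines: records are appended to a ring, a checkpoint marks older records as reusable, and recovery replays whatever was logged after the last checkpoint. The record format, the two operations handled, and all names below are illustrative assumptions rather than SANtopia's log format.

```python
class CircularLog:
    """Fixed-size log area reused as a ring, standing in for the log disk."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0  # sequence number of the next record to write
        self.tail = 0  # sequence number of the oldest record still needed

    def append(self, record):
        if self.head - self.tail == len(self.slots):
            raise RuntimeError("log full: checkpoint before reusing slots")
        self.slots[self.head % len(self.slots)] = record
        self.head += 1

    def checkpoint(self):
        """Metadata is safely on disk, so older records may be overwritten."""
        self.tail = self.head

    def pending(self):
        n = len(self.slots)
        return [self.slots[i % n] for i in range(self.tail, self.head)]


def flush(in_core_log, log):
    """Write the in-core log buffer to the log, then empty the buffer."""
    for record in in_core_log:
        log.append(record)
    in_core_log.clear()


def recover(log, metadata):
    """Replay records that were logged but not yet checkpointed."""
    for op, name, inode_no in log.pending():
        if op == "create":
            metadata[name] = inode_no
        elif op == "remove":
            metadata.pop(name, None)
    return metadata
```

Recovery time depends on the number of pending log records rather than on the size of the file system, which is the basis of the fast-recovery claim.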

Page 25: SANtopia Design Features

SANtopia Buffer and Lock Manager

Page 26: SANtopia Design Features

Features of Buffer Manager (I)
• Supports global file sharing
  - Reduces disk I/O
  - Shares each buffer
• Split, distributed buffer manager
  - GBMs are distributed (partitioned) over several nodes
• Manages a global buffer list and a local buffer list
  - Communication vs. space overhead
• Manages the logical global buffer
  - Weak correctness of the global buffer list
  - Safe but not up-to-date

Page 27: SANtopia Design Features

Features of Buffer Mgr (II)
• Integration of buffer and lock messages
  - Overlapped with the global lock manager
  - Piggybacks the buffer lists on lock messages
  - Reduces the number of messages
• Adopts a write-invalidation scheme
  - For the sake of simplicity
• Supports a buffer forwarding scheme
  - Improves performance by reducing disk I/O

Page 28: SANtopia Design Features

Structure of Buffer Manager
• Local and Global Buffer Manager
• Decision of GBM: inode hash

[Figure: SANtopia hosts attached to the SAN (Storage Area Network). Every host runs an LBM (Local Buffer Manager); hosts acting as Global Buffer Servers also run a GBM (Global Buffer Manager).]

Page 29: SANtopia Design Features

Operations between GBM and LBM
• Buffer list information
  - Local Buffer Manager to Global Buffer Manager: buffer list for GBM (lock message)
  - Global Buffer Manager to Local Buffer Manager: buffer list for LBM (lock message)
• GBM server failure
  - Local Buffer Manager sends its buffer list to the new GBM server
  - The Buffer Server Table for the LBM is modified
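Choosing the GBM by inode hash, and repointing hosts after a GBM server failure, can be sketched with a small lookup table. The slides do not give the hash function or the table layout, so the modulo hash, the class name, and the method names below are assumptions.

```python
class BufferServerTable:
    """Maps an inode number to the host acting as its Global Buffer Manager."""

    def __init__(self, servers):
        self.servers = list(servers)

    def gbm_for(self, inode_no):
        # Assumed hash: inode number modulo the number of table slots.
        return self.servers[inode_no % len(self.servers)]

    def replace_failed(self, failed, replacement):
        """Repoint every slot served by the failed host to its replacement."""
        self.servers = [replacement if s == failed else s for s in self.servers]
```

After `replace_failed`, each local buffer manager would resend its buffer list to the new server so that the global buffer list can be rebuilt; that exchange is not modeled here.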

Page 30: SANtopia Design Features

Features of Lock Manager
• Lock mode
  - Shared lock and exclusive lock
• Lock object
  - 64-bit inode: file lock
• Distributed (partitioned) over several nodes
  - Host-based locking
  - Overlapped with the global buffer manager
  - Global Lock Manager (GLM) vs. Local Lock Manager (LLM)
• Delayed lock free
  - Callback scheme for lock free
  - Callback by the lock server
  - No lock entrance after receiving a callback message
• Recovery on host failure
  - I/O fencing
  - Rebuild the lock table: take locks from the failed host
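The lock modes and the callback rule can be illustrated with two small classes: one for the per-inode state kept by the global lock manager, and one for the host-side lock that is kept after use and given up only when the server calls it back. All names are hypothetical, and queueing of waiters, lock upgrades, and failure recovery are left out.

```python
SHARED, EXCLUSIVE = "S", "X"


class GlobalLockEntry:
    """Server-side state for one 64-bit inode number, tracked per host."""

    def __init__(self):
        self.mode = None
        self.holders = set()

    def try_grant(self, host, mode):
        """Grant if compatible; a False result means holders need a callback."""
        if not self.holders:
            self.mode = mode
        elif not (self.mode == SHARED and mode == SHARED):
            return False
        self.holders.add(host)
        return True

    def release(self, host):
        self.holders.discard(host)
        if not self.holders:
            self.mode = None


class LocalLock:
    """Host-side lock that stays cached until the server sends a callback."""

    def __init__(self):
        self.users = 0
        self.callback_pending = False

    def enter(self):
        if self.callback_pending:
            return False  # no lock entrance after a callback was received
        self.users += 1
        return True

    def leave(self):
        self.users -= 1
        return self.callback_pending and self.users == 0  # True: release to server now

    def on_callback(self):
        self.callback_pending = True
        return self.users == 0  # True: nobody is inside, release immediately
```

Keeping the lock on the host after the last local user leaves avoids a round trip when the same host needs it again, at the cost of a callback when another host asks for a conflicting mode.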

Page 31: SANtopia Design Features

Integration of Lock Mgr and Buffer Mgr

[Figure: SANtopia hosts attached to the SAN (Storage Area Network). Every host runs an LBM (Local Buffer) and an LLM (Local Lock Table); hosts acting as Global Buffer & Lock Servers also run a GBM (Global Buffer) and a GLM (Global Lock Table).]

Page 32: SANtopia Design Features

Operational Design (I)

[Figure: A host's local buffer manager (buffers) and local lock manager (local lock 1, local lock 2, …) exchange messages with the global buffer manager (buffers) and the global lock manager (global lock 1 with host 1, host 2, …).]
• Host to server: lock(lock_id, mode, local_buffer_list), unlock(lock_id, local_buffer_list)
• Server to host: lock_grant(lock_id, mode, host_related_global_buffer_list)
• Server to host: call_back(lock_id, hrgbl)
• Buffer forwarding
• Invalidate at unlock

Page 33: SANtopia Design Features

Operational Design (II)

Global Lock Manager
• Upon receiving a lock request
  - Update the global buffer list using the local_buffer_list
• Upon receiving an unlock request
  - Grant lock before processing the unlock request
  - Update the global buffer list using the local_buffer_list
• Upon granting a lock
  - Piggyback the part of the global buffer list concerned with the host
• Upon sending a callback
  - Piggyback the part of the global buffer list concerned with the host
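The piggybacking described for the global lock manager can be sketched as follows. The slides do not define what "the part of the global buffer list concerned with the host" contains; this sketch assumes it is the set of blocks under the requested lock that other hosts have reported caching. The class and method names are hypothetical, and lock conflicts, callbacks, and grant ordering are omitted.

```python
class GlobalLockManager:
    def __init__(self):
        # host -> set of (lock_id, block_no) that the host last reported caching
        self.cached_by = {}

    def _update(self, host, local_buffer_list):
        """Refresh the global buffer list from a piggybacked local buffer list."""
        self.cached_by[host] = set(local_buffer_list)

    def _host_related(self, host, lock_id):
        """Blocks under lock_id cached on other hosts, as {block_no: host}."""
        related = {}
        for other, entries in self.cached_by.items():
            if other == host:
                continue
            for lid, block_no in entries:
                if lid == lock_id:
                    related.setdefault(block_no, other)
        return related

    def on_lock(self, host, lock_id, mode, local_buffer_list):
        self._update(host, local_buffer_list)
        # The grant carries the buffer information back on the same message.
        return ("lock_grant", lock_id, mode, self._host_related(host, lock_id))

    def on_unlock(self, host, lock_id, local_buffer_list):
        self._update(host, local_buffer_list)
```

Since the list is refreshed only when lock messages happen to flow, it can lag behind the real cache contents, which is why a host that acts on it must tolerate stale entries.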

Page 34: SANtopia Design Features

Operational Design (III)

Local Lock Manager
• Upon sending a lock request
  - Piggyback the local buffer list
• Upon sending an unlock request
  - Invalidate the buffers related to the lock
  - Piggyback the local buffer list
• Upon receiving a lock grant
  - Save the piggybacked global buffer list
• Upon receiving a callback
  - Prohibit the lock counter from being increased
  - Unlock as soon as possible

Page 35: SANtopia Design Features

Operational Design (IV)

Local Buffer Manager
• Upon receiving a forward request
  - Send the requested buffer without a validity check
  - Still check whether the requested block is cached
  - If the buffer has already been flushed, send an acknowledge signal
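Handling a forward request on the local buffer manager reduces to a cache lookup with two possible replies. In the sketch below the message tuples, the cache key, and the class name are assumptions; what the requester does after receiving the acknowledge signal is not stated in the slides and is not modeled.

```python
class LocalBufferManager:
    def __init__(self):
        self.cache = {}  # (inode_no, block_no) -> buffer contents

    def on_forward_request(self, key):
        """Reply with the cached buffer, or an acknowledge signal if it is gone."""
        data = self.cache.get(key)
        if data is None:
            return ("ack", key)       # the buffer was already flushed
        return ("buffer", key, data)  # forwarded without any validity check
```

Skipping the validity check keeps the forwarding path short; the only test is whether the block is still in the cache.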