Software Defined Storage based on OpenStack
Open Frontier Lab. Manseok (Mario) Cho
[email protected]

[OpenStack Days Korea 2016] Track2 - OpenStack 기반 소프트웨어 정의 스토리지 기술



Who am I?

Development Experience
◆ Bio-medical data processing based on HPC for human brain mapping
◆ Medical image reconstruction (computed tomography)
◆ Enterprise system architect
◆ Open source software developer

Open Source Development
◆ Linux kernel (ARM, x86, ppc)
◆ LLVM (x86, ARM, custom)
◆ OpenStack: Orchestration (Heat)
◆ SDN (OpenDaylight, OVS, DPDK)
◆ OPNFV (DPACC, Genesis, Functest, Doctor)

Technical Book
◆ UNIX V6 Kernel


Open Source S/W developer community

http://kernelstudy.net
- Linux Kernel (ARM, x86)
- LLVM Compiler
- SDN/NFV

What is it?

* http://www.containerstore.com/s/kitchen/1
** http://cool.conservation-us.org/coolaic/sg/bpg/annual/v11/bp11-38.html

Storage (storing, keeping, a warehouse)

Just one more than the rest combined

*http://www.funnyjunk.com/Computer+storage+throughout+time+part+2/funny-pictures/5465540/

The Technical Challenge

Era                   Period        Users                   Scale   Business value
MAINFRAME (OS/360)    1960s-1970s   Few employees           10^2    Back-office automation
CLIENT-SERVER         1980s         Many employees          10^4    Front-office productivity
WEB                   1990s         Customers/consumers     10^6    E-commerce
SOCIAL                2007          Communities & society   10^7    Social engagement
CLOUD                 2011          Business ecosystems     10^9    Line-of-business self-service
INTERNET OF THINGS    2016          Devices & machines      10^11   Real-time optimization

* http://www.slideshare.net/SanjeevKumar17/tech-mahindra-i5sanjeevdec2013/


The Technical Challenge

Processes + People + Products & Things:
- Stand-alone projects (corporate-IT driven)
- Data infrastructure (LOB driven)
- Data ecosystem

Data integration is becoming the barrier to business success.

* http://www.slideshare.net/SanjeevKumar17/tech-mahindra-i5sanjeevdec/

* http://www.toonpool.com/artists/toons_589

File System

* http://computerrepair-vancouver.org/deleted-file-recovery-dos-and-donts/
** http://www.ibm.com/developerworks/tivoli/library/t-tamessomid/
*** http://www.informatics.buzdo.com/p778-debian-root-boot-bin-lib-dev.htm

Operating System focus on Storage

[Diagram: applications in user space enter the kernel through system calls; inside the operating system (kernel space), the scheduler/process manager, memory manager, file system, logical block layer, I/O interface, and device drivers sit between the applications and the hardware (processor, memory, SSD/HDD, network).]

Redundant Arrays of Independent Disks

[Diagram: applications go through the resource manager (VFS) and file system to the logical block layer; below it, a hardware block layer (RAID controller) aggregates many SSDs/HDDs into a single logical device.]

RAID: the first Software Defined Storage, in 1988

* Source: 1988, Anil Vasudeva, "A Case for Disk Arrays", presented at conference, Santa Clara CA, Aug 1988

OpenStack

OpenStack is a collection of software for setting up a massive IaaS (Infrastructure as a Service) environment.

OpenStack consists of six main components.

OpenStack supports Block Storage (Cinder) and Object Storage (Swift).

* http://www.openstack.org/software/

Storage System on OpenStack

[Diagram: applications run inside VMs under the virtual computing machine manager (Nova), above the operating system's file system and logical block layer; the Block Storage manager (Cinder), Object Storage manager (Swift), and Shared File System (Manila) each manage their own pools of storage nodes.]

Comparison of OpenStack Storage

OpenStack component   Swift                       Cinder                    Manila
Storage type          Object                      Block                     File
Primary interface     REST API                    iSCSI                     NFS, CIFS/SMB
Use cases             - Large datasets            - High performance        - VM live migration
                      - Movies, images, sounds    - DBs                     - Storage of VM files
                      - Storage of VM files       - VM guest storage        - Use with legacy
                      - Archiving                 - Snapshots, VM clones      applications
Benefit               Scalability, durability     Manageability             Compatibility

* http://www.openstack.org/openstack-manuals/openstack-ops/content/storage_decision.html

Cinder provides persistent block storage resources to the virtual machines running on Nova compute.

Cinder uses plugins to support multiple types of backend storage.

Cinder: Block Storage Layer

[Diagram: one Cinder service fronts several Nova compute hosts (each running many VMs); it handles requests to create a volume, delete a volume, take a snapshot, and attach or detach a volume.]

Cinder: Volume Manage APIs

No.   Operation group             Function
(1)   Volume operation            Create volume
(2)                               Create volume from volume
(3)                               Extend volume
(4)                               Delete volume
(5)   Connection operation        Attach volume
(6)                               Detach volume
(7)   Volume snapshot operation   Create snapshot
(8)                               Create volume from snapshot
(9)                               Delete snapshot
(10)  Volume image operation      Create volume from image
(11)                              Create image from volume
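The eleven operations above can be sketched as a toy in-memory model. This is illustrative only — the class name and dict-based "storage" are invented here; real Cinder drives actual storage backends:

```python
import copy
import uuid


class MiniVolumeAPI:
    """Toy in-memory model of the eleven Cinder volume operations (not real Cinder code)."""

    def __init__(self):
        self.volumes = {}    # id -> {"size": GB, "data": bytes, "attached_to": vm or None}
        self.snapshots = {}  # id -> frozen copy of a volume
        self.images = {}     # id -> frozen copy of a volume (stand-in for Glance)

    def _new(self, record):
        vid = str(uuid.uuid4())
        record["attached_to"] = None   # new volumes always start detached
        self.volumes[vid] = record
        return vid

    def create_volume(self, size):                      # (1)
        return self._new({"size": size, "data": b""})

    def create_volume_from_volume(self, src_id):        # (2)
        return self._new(copy.deepcopy(self.volumes[src_id]))

    def extend_volume(self, vid, new_size):             # (3)
        if new_size < self.volumes[vid]["size"]:
            raise ValueError("volumes can only grow")
        self.volumes[vid]["size"] = new_size

    def delete_volume(self, vid):                       # (4)
        if self.volumes[vid]["attached_to"] is not None:
            raise RuntimeError("detach before delete")
        del self.volumes[vid]

    def attach_volume(self, vid, vm):                   # (5)
        self.volumes[vid]["attached_to"] = vm

    def detach_volume(self, vid):                       # (6)
        self.volumes[vid]["attached_to"] = None

    def create_snapshot(self, vid):                     # (7)
        sid = str(uuid.uuid4())
        self.snapshots[sid] = copy.deepcopy(self.volumes[vid])
        return sid

    def create_volume_from_snapshot(self, sid):         # (8)
        return self._new(copy.deepcopy(self.snapshots[sid]))

    def delete_snapshot(self, sid):                     # (9)
        del self.snapshots[sid]

    def create_volume_from_image(self, iid):            # (10)
        return self._new(copy.deepcopy(self.images[iid]))

    def create_image_from_volume(self, vid):            # (11)
        iid = str(uuid.uuid4())
        self.images[iid] = copy.deepcopy(self.volumes[vid])
        return iid
```

The snapshot and image operations are plain deep copies here; the point is only the lifecycle rules (volumes only grow, delete requires detach, clones start detached).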

[Diagram: using Nova VMs #1-#6, Glance images, and Cinder volumes/snapshots — 1) create volume; 2) create volume from volume; 3) extend volume; 5) attach a volume to each VM; 7) create snapshot; 8) create volume from snapshot; 10) create volume from image; 11) create image from volume.]

Cinder: Requirements of the backend

Life cycle of VM    Create VM / Launch VM          Running VM               Stop VM / Delete VM
Cinder work         Create / Attach                Extend / Snapshot        Detach / Delete
Technical           1. Quickly prepare block       1. Flexible addition     1. Preserve important data
requirement            space when needed           2. Automatic block       2. Safely delete unnecessary
                    2. Copy and reuse                 extension                confidential data
                       existing blocks

Cinder: Volume Manage (Scheduler)

[Diagram: volume services 1-5 pass through the Filters; the survivors (services 2, 4, 5) are ranked by the Weighers (weights 25, 20, 41), and the highest weight wins.]

Filters:
• AvailabilityZoneFilter
• CapabilitiesFilter
• JsonFilter
• CapacityFilter
• RetryFilter

Weighers:
• CapacityWeigher
• AllocatedVolumesWeigher
• AllocatedSpaceWeigher

* http://www.intel.com/
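The filter-then-weigh scheme can be sketched in a few lines of plain Python. The filter and weigher functions below are simplified stand-ins for Cinder's classes of the same names, and the backend dict fields (`free_gb`, `az`) are assumptions for the sketch:

```python
def availability_zone_filter(backend, request):
    """Keep backends in the requested AZ (stand-in for AvailabilityZoneFilter)."""
    return request.get("az") in (None, backend["az"])

def capacity_filter(backend, request):
    """Keep backends with enough free space (stand-in for CapacityFilter)."""
    return backend["free_gb"] >= request["size_gb"]

def capacity_weigher(backend):
    """More free space -> higher weight (stand-in for CapacityWeigher)."""
    return backend["free_gb"]

def schedule(backends, request, filters, weighers):
    # 1. Filters drop every backend that cannot host the volume.
    survivors = [b for b in backends
                 if all(f(b, request) for f in filters)]
    if not survivors:
        raise RuntimeError("no valid backend")
    # 2. Weighers rank the survivors; the highest total weight wins.
    return max(survivors, key=lambda b: sum(w(b) for w in weighers))

backends = [
    {"name": "vol1", "az": "az1", "free_gb": 25},
    {"name": "vol2", "az": "az1", "free_gb": 41},
    {"name": "vol3", "az": "az2", "free_gb": 90},
]
winner = schedule(backends, {"size_gb": 20, "az": "az1"},
                  [availability_zone_filter, capacity_filter],
                  [capacity_weigher])
print(winner["name"])  # vol2 (largest free space inside az1)
```

Real Cinder additionally normalizes weights and applies configurable multipliers, but the two-phase shape (filter, then weigh, then pick the winner) is the same.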

Cinder: Create Volume

Create volume
- User: POST http://volume1.server.net:8776/v2/{tenant_id}/volumes
- Cinder-API: CALL cinder.volume.API().create()
- cinder.volume.API: RPC CAST cinder.scheduler()
- cinder.scheduler: SCHEDULE volume host
- cinder.scheduler: RPC CAST cinder.volume.create_volume()
- cinder.volume.manager: CALL cinder.volume.driver.create_volume()
- cinder.volume.manager: CALL cinder.volume.driver.create_export()

Attach volume
- User: POST http://novacompute1.server.net:8776/v2/{tenant_id}/servers/{vm_uuid}/os-volume_attachments
- Nova-API: CALL nova.compute.API.attach_volume()
- nova.compute.API: RPC CAST nova.compute.manager.attach_volume()
- nova.compute.manager.attach_volume: RPC CALL cinder.volume.initialize_connection()
- nova.compute.manager.attach_volume: RPC CALL virt volume driver attach_volume()
  - libvirt.driver.attach_volume() -> volume_driver.connect_volume()
- nova.compute.manager.attach_volume: RPC CALL cinder.volume.attach()

* Source: https://tw.pycon.org/2013/site_media/media/proposal_files/cinder_2013.pdf
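The user-facing request that starts this flow can be sketched with the standard library. The host and tenant id are the slide's placeholders, the token is a dummy, and the request is built but deliberately never sent; the `{"volume": {...}}` body shape follows the Block Storage v2 API:

```python
import json
import urllib.request

# Placeholders from the slide, not a real cloud.
tenant_id = "demo-tenant"
url = f"http://volume1.server.net:8776/v2/{tenant_id}/volumes"

# Block Storage v2 create-volume body: a "volume" object with size in GB.
body = json.dumps({"volume": {"size": 10, "name": "data-vol"}}).encode()

req = urllib.request.Request(
    url,
    data=body,
    method="POST",
    headers={"Content-Type": "application/json",
             "X-Auth-Token": "dummy-token"},  # normally issued by Keystone
)

print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` against a real endpoint would return the new volume's JSON description, which the Cinder API then turns into the RPC chain listed above.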

Cinder: Plug-In

[Diagram: Cinder plugins split into software-based and hardware-based. Software-based plugins are block-based (LVM, exported over iSCSI) or file-system-based (NFS and distributed file systems such as GlusterFS, GPFS, Ceph). Hardware-based plugins reach arrays over Fibre Channel or iSCSI through storage-vendor-specific plug-ins such as EMC, Hitachi, HP, Dell…]

Cinder Plug-In: LVM case

[Diagram: the Cinder API and scheduler drive the volume service (LVM plugin), which creates/deletes/extends logical volumes (LV#1-LV#4) and exports them as iSCSI targets; on each Nova compute host, the hypervisor (KVM, VMware, …) attaches/detaches a volume through its iSCSI initiator, where it appears as /dev/sdx to the VMs.]

Cinder Plug-In: FC case

[Diagram: the Cinder API and scheduler drive the volume service (FC plugin), which manages LUNs (LUN1-LUN4) through the storage controller; on each Nova compute host, the hypervisor (KVM, VMware, …) attaches/detaches the LUNs over Fibre Channel, where they appear as /dev/sdx (LUN1), /dev/sdy (LUN2).]

Cinder: Compare LVM vs FC

                        LVM                   FC                          Remark
Volume implementation   Managed LVM           Managed storage controller
Volume operation        LVM (software)        FC (hardware)               LVM more flexible
Supported storage       Storage independent   Specific storage            LVM better support coverage
                                              (requires plug-in)
Access path             iSCSI (software)      Fibre Channel (hardware)    FC better performance

Swift: Object Storage

[Diagram: clients talk HTTP (REST API) to a Swift proxy node, which talks HTTP to the storage nodes (account node, container node, object node).]

Reliable / Highly Scalable / Hardware Proof
- Configurable replica model with zones & regions
- Easy-to-use HTTP API – developers don't shard
- High concurrency (supports lots of users)
- Multi-tenant: each account has its own namespace
- Tier & scale any component in the system
- No single point of failure (high availability)
- Assumes unreliable hardware
- Mix & match hardware vendors

* https://www.openstack.org/assets/presentation-media/Swift-Workshop-OSS-Atlanta-2014.pdf

Swift: Ring Hash

* Source: https://ihong5.wordpress.com/tag/consistent-hashing-algorithm/
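The idea behind the ring can be shown with a minimal consistent-hash implementation. This is a simplification for illustration — Swift's real ring pre-computes a fixed number of partitions and assigns replicas with zone awareness — but it demonstrates the key property: adding a node moves only a small fraction of the objects:

```python
import bisect
import hashlib

def _hash(key):
    # MD5 as an integer position on the ring (what Swift's ring also hashes with).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node gets many virtual points to smooth the distribution.
        self.ring = sorted((_hash(f"{n}-{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def get_node(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash.
        i = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node1", "node2", "node3", "node4", "node5"])
before = {f"obj-{i}": ring.get_node(f"obj-{i}") for i in range(1000)}

# Adding a sixth node remaps only the keys it takes over (about 1/6 of them),
# not all 1000 -- the property that makes rebalancing cheap.
ring2 = HashRing(["node1", "node2", "node3", "node4", "node5", "node6"])
moved = sum(1 for k, n in before.items() if ring2.get_node(k) != n)
print(f"{moved} of 1000 objects moved")
```

With a naive `hash(key) % node_count` scheme, adding a node would instead remap almost every object.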

Swift architecture

[Diagram: applications in user space hit an HTTP load balancer in front of the Swift proxy nodes, which fan out over the network to the storage nodes. Expand the proxy servers for "throughput"; expand the storage servers for "volume".]

Swift Replicator

[Diagram: across nodes #1-#5 — each node checks its peers; defective data is found; the data is copied to another node; the data is recovered to the original node; the temporary copy is deleted.]
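The check-and-repair pass can be sketched as a small simulation. The data model (one byte string per replica, majority checksum decides which copy is healthy) is an assumption for the sketch; real Swift compares per-suffix hashes and rsyncs whole partitions:

```python
import hashlib

def digest(blob):
    return hashlib.md5(blob).hexdigest()

def repair(replicas):
    """replicas: one byte string per node for the same object.
    Returns the list with any minority (corrupt) copy replaced."""
    # 1. Each node checks: compute every replica's checksum.
    sums = [digest(b) for b in replicas]
    # 2. Find defective data: the minority checksum is taken as corrupt.
    majority = max(set(sums), key=sums.count)
    good = replicas[sums.index(majority)]
    # 3./4. Copy the good data over the corrupt replica, restoring the node.
    return [b if digest(b) == majority else good for b in replicas]

nodes = [b"hello", b"hello", b"hXllo", b"hello", b"hello"]  # node #3 is corrupt
print(repair(nodes))
```

The temporary-copy step from the diagram is folded into the single `repair` call here; the point is that the cluster converges back to identical replicas without operator intervention.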

Swift: hash synchronize

[Diagram: node #1 holds Data #1 and Data #3 under hash ranges A and B; node #2 holds Data #2 in a tmp area. After syncing in each direction, both nodes hold Data #1, Data #2 (tmp), and Data #3.]

Swift: Object Update

[Diagram: an uploaded object Data #1 first lands in a tmp area on the node, is then moved into the hash-named directory, and when a newer version Data #1' arrives, the superseded copy is deleted.]

Swift: Object Update

[Diagram: when new data arrives for Data #1, the new object is written under the hash directory and the old version is deleted.]
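The tmp-then-move pattern in these two slides is the classic atomic-update idiom, shown here with POSIX rename semantics (Swift's actual on-disk layout of timestamped files differs; this sketch shows only the atomicity idea):

```python
import os
import tempfile

def put_object(directory, name, data):
    """Write an object so readers never see a half-written file."""
    # 1. The upload lands in a tmp file on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
    # 2. Move: rename() atomically replaces any old version, which is how
    #    the "delete old" step happens without a window of missing data.
    os.replace(tmp_path, os.path.join(directory, name))

with tempfile.TemporaryDirectory() as d:
    put_object(d, "obj", b"Data #1")
    put_object(d, "obj", b"Data #1 v2")  # the update replaces the old version
    with open(os.path.join(d, "obj"), "rb") as f:
        print(f.read())  # b'Data #1 v2'
```

Because `rename()` within one filesystem is atomic, a concurrent `GET` sees either the old object or the new one, never a mix.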

Swift: REST APIs

https://swift.server.net/v1/AUTH_account/container/object
= prefix / API version / account / container / object

Swift: Using REST for Object handling

Basic command
- http://swift.server.net/v1/account/container/object

Get a list of all containers in an account
- GET http://swift.server.net/v1/account/

Create a new container
- PUT http://swift.server.net/v1/account/new_container

List all objects in a container
- GET http://swift.server.net/v1/account/container

Create a new object
- PUT http://swift.server.net/v1/account/container/new_object

* Source: https://tw.pycon.org/2013/site_media/media/proposal_files/cinder_2013.pdf
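The commands above can be built with stdlib `urllib` alone. The host, account, and container names are the slide's placeholders, the token is a dummy, and the requests are constructed but not sent:

```python
import urllib.request

# Placeholder endpoint from the slide, not a real cluster.
BASE = "http://swift.server.net/v1/account"

def swift_request(method, *segments, token="dummy-token"):
    """Build (but do not send) a Swift account/container/object request."""
    url = "/".join((BASE,) + segments) if segments else BASE + "/"
    return urllib.request.Request(url, method=method,
                                  headers={"X-Auth-Token": token})

list_containers = swift_request("GET")                             # containers in the account
new_container   = swift_request("PUT", "new_container")            # create a container
list_objects    = swift_request("GET", "container")                # objects in a container
new_object      = swift_request("PUT", "container", "new_object")  # create an object

print(new_object.get_method(), new_object.full_url)
```

Sending any of these with `urllib.request.urlopen` (after obtaining a real token) exercises exactly the URL hierarchy shown in the previous slide: prefix, API version, account, container, object.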

Swift vs Ceph

* http://japan.zdnet.com/article/35072972/

[Diagram: Swift — clients go through a load balancer to proxy nodes, which talk to the storage nodes. Ceph — clients talk directly to the OSDs via placement groups, guided by the cluster map maintained by the monitor/metadata nodes.]

Data Science work flow

[Diagram: data sources (network elements, content, network logs, social media, external data, transactions) feed a collection layer (real-time streaming, batch, replication, CEP) into a staging layer; a data integration layer (Hadoop grid, MDM, data quality, EDW, NoSQL, Hadoop data archival) supports the report layer (business intelligence, data exploration, data distribution).]

* http://www.slideshare.net/SanjeevKumar17/tech-mahindra-i5sanjeevdec/

Thank you!

Q&A

The OpenStack® Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

• GPFS is a trademark of International Business Machines Corporation in the United States, other countries, or both.

• GlusterFS, the Gluster ant logo, and the Gluster Community logo are all trademarks of Red Hat, Inc. All other trademarks, registered trademarks, and product names may be trademarks of their respective owners.

• Dell is a trademark of Dell Inc.

• EMC and CLARiiON are registered trademarks of EMC Corporation.

• HP is a trademark of Hewlett-Packard Development Company, L.P. in the U.S. and other countries.

• Other company, product, or service names may be trademarks or service marks of others.