55
當當當當當當當當當 當當當當當當當當當 DRBL-Hadoop DRBL-Hadoop Jazz Wang Jazz Wang Yao-Tsung Wang Yao-Tsung Wang [email protected] [email protected]

當企鵝龍遇上小飛象 DRBL-Hadoop

  • Upload
    laurel

  • View
    126

  • Download
    0

Embed Size (px)

DESCRIPTION

當企鵝龍遇上小飛象 DRBL-Hadoop. Jazz Wang Yao-Tsung Wang [email protected]. Programmer v.s. System Admin. Source: http://www.funnyjunksite.com/wp-content/uploads/2007/08/programmer.jpg. Source: http://www.sysadminday.com/images/people/136-3697.JPG. Agenda. PART 1 :. - PowerPoint PPT Presentation

Citation preview

當企鵝龍遇上小飛象當企鵝龍遇上小飛象DRBL-HadoopDRBL-Hadoop

當企鵝龍遇上小飛象當企鵝龍遇上小飛象DRBL-HadoopDRBL-Hadoop

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

AgendaAgendaAgendaAgenda

What is What is Cluster Computing Cluster Computing ??How to deploy PC cluster ?How to deploy PC cluster ?

What is What is DRBLDRBL and and ClonezillaClonezilla ? ?Can DRBL help to Can DRBL help to deploy Hadoopdeploy Hadoop ? ?

Live Demo of Live Demo of DRBL Live DRBL Live and and Clonezilla LiveClonezilla Live

PART 3 :PART 3 :PART 3 :PART 3 :

PART 1 :PART 1 :PART 1 :PART 1 :

PART 2 :PART 2 :PART 2 :PART 2 :

PC Cluster 101PC Cluster 101PC Cluster 101PC Cluster 101

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

PART 1 :PART 1 :PART 1 :PART 1 :

At First, We have “ 4 + 1 ” PC ClusterAt First, We have “ 4 + 1 ” PC ClusterAt First, We have “ 4 + 1 ” PC ClusterAt First, We have “ 4 + 1 ” PC Cluster

It'd better beIt'd better be

22nn

It'd better beIt'd better be

22nnManageManage

SchedulerSchedulerManageManage

SchedulerScheduler

GiE SwitchGiE SwitchGiE SwitchGiE Switch

WANWANWANWAN

Then, We connect 5 PCs with Then, We connect 5 PCs with Gigabit EthernetGigabit Ethernet Switch Switch

Then, We connect 5 PCs with Then, We connect 5 PCs with Gigabit EthernetGigabit Ethernet Switch Switch

10/100/100010/100/1000MBpsMBps

10/100/100010/100/1000MBpsMBps

Add 1 NICAdd 1 NICfor WANfor WAN

Add 1 NICAdd 1 NICfor WANfor WAN

LAN SwitchLAN SwitchLAN SwitchLAN Switch

WANWANWANWAN

4 4 Compute NodesCompute Nodes will communicate via will communicate via LAN SwitchLAN Switch. Only . Only Manage NodeManage Node have have

Internet Access Internet Access for Security!for Security!

4 4 Compute NodesCompute Nodes will communicate via will communicate via LAN SwitchLAN Switch. Only . Only Manage NodeManage Node have have

Internet Access Internet Access for Security!for Security!

Compute NodesCompute NodesCompute NodesCompute Nodes

Manage NodeManage NodeManage NodeManage Node

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

MPICHMPICHMPICHMPICH

BashBashBashBash

PerlPerlPerlPerl

MessagingMessagingMessagingMessaging

YPYPYPYPNISNISNISNIS

Account Mgnt.Account Mgnt.Account Mgnt.Account Mgnt.

SSHDSSHDSSHDSSHD

GCCGCCGCCGCC

Compute NodesCompute NodesCompute NodesCompute Nodes

BasicBasicSystemSystemSetupSetup

forforClusterCluster

BasicBasicSystemSystemSetupSetup

forforClusterCluster

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

MPICHMPICHMPICHMPICHOpenPBSOpenPBSOpenPBSOpenPBS

BashBashBashBash

PerlPerlPerlPerl

MessagingMessagingMessagingMessaging

YPYPYPYPNISNISNISNIS

Account Mgnt.Account Mgnt.Account Mgnt.Account Mgnt.

SSHDSSHDSSHDSSHD

GCCGCCGCCGCC

Job Mgnt.Job Mgnt.Job Mgnt.Job Mgnt.

NFSNFSNFSNFS

File SharingFile SharingFile SharingFile Sharing

ExtraExtraExtraExtra

On On Manage NodeManage Node,,We need to install We need to install SchedulerScheduler and and Network File SystemNetwork File System for sharing for sharing

Files with Compute NodeFiles with Compute Node

On On Manage NodeManage Node,,We need to install We need to install SchedulerScheduler and and Network File SystemNetwork File System for sharing for sharing

Files with Compute NodeFiles with Compute Node

Research topics about PC ClusterResearch topics about PC ClusterResearch topics about PC ClusterResearch topics about PC Cluster

Ref: Cluster Computing in the Classroom: Topics, Guidelines, and Experienceshttp://www.gridbus.org/papers/CC-Edu.pdf

ClusterClusterComputingComputing

ClusterClusterComputingComputing

SystemSystemArchitectureArchitecture

SystemSystemArchitectureArchitecture

ParallelParallelComputingComputing

ParallelParallelComputingComputing

ParallelParallelAlgorithmsAlgorithms

AndAndApplicationsApplications

ParallelParallelAlgorithmsAlgorithms

AndAndApplicationsApplications

ProcessProcessArchitectureArchitecture

ProcessProcessArchitectureArchitecture

NetworkNetworkArchitectureArchitecture

NetworkNetworkArchitectureArchitecture

StorageStorageArchitectureArchitecture

StorageStorageArchitectureArchitecture

System-levelSystem-levelMiddlewareMiddleware

System-levelSystem-levelMiddlewareMiddleware

Share MemoryShare MemoryProgrammingProgramming

Share MemoryShare MemoryProgrammingProgramming

Distributed MemoryDistributed MemoryProgrammingProgramming

Distributed MemoryDistributed MemoryProgrammingProgramming

Application-levelApplication-levelMiddleware ProgrammingMiddleware Programming

Application-levelApplication-levelMiddleware ProgrammingMiddleware Programming

Challenges of Cluster ComputingChallenges of Cluster ComputingChallenges of Cluster ComputingChallenges of Cluster Computing

Hardware

Ethernet Speed / PC Density

Power / Cooling / Heat

Network and Storage Architecture

Software

Job Scheduler ( Cluster level )

Account Management

File Sharing / Package Management

Limitation

Shared Memory

Global Memory Management

Common Method to deploy ClusterCommon Method to deploy ClusterCommon Method to deploy ClusterCommon Method to deploy Cluster

1.1. Setup one Setup oneTemplateTemplatemachinemachine

1.1. Setup one Setup oneTemplateTemplatemachinemachine

2.2. CloningCloningtoto

multiplemultiplemachinemachine

2.2. CloningCloningtoto

multiplemultiplemachinemachine

3.3. ConfigureConfigureSettingsSettings

↓↓

4.4. InstallInstall

JobJobSchedulerScheduler

↓↓

5.5. Running Running

BenchmarkBenchmark

3.3. ConfigureConfigureSettingsSettings

↓↓

4.4. InstallInstall

JobJobSchedulerScheduler

↓↓

5.5. Running Running

BenchmarkBenchmark

Challenges of Common MethodChallenges of Common MethodChallenges of Common MethodChallenges of Common Method

Upgrade Software ?Upgrade Software ?Upgrade Software ?Upgrade Software ?

Add New User Account ?Add New User Account ?Add New User Account ?Add New User Account ?

Configuration SyncronizationConfiguration SyncronizationConfiguration SyncronizationConfiguration Syncronization

How to share user data ?

How to share user data ?

How to share user data ?

How to share user data ?

4 000 T o ta l N o des

3 000 0T o ta l co res

1 6P BD ata

4 000 T o ta l N o des

3 000 0T o ta l co res

1 6P BD ata

資料標題: Scaling Hadoop to 4000 nodes at Yahoo!資料日期: September 30, 2008

6 64 01 85 .8a v g. thro ughput (M B/s)

4422tasks per node

5 ,0 40 ,00 05 ,0 40 ,00 03 16 ,80 03 16 ,80 0to ta l M B pro cesses

3 603 603 203 20file s ize (M B )

1 4,0001 4,0009 909 90num ber o f files

rea dw riterea dw rite

4 000-no de c luster5 00-no de cluster

6 64 01 85 .8a v g. thro ughput (M B/s)

4422tasks per node

5 ,0 40 ,00 05 ,0 40 ,00 03 16 ,80 03 16 ,80 0to ta l M B pro cesses

3 603 603 203 20file s ize (M B )

1 4,0001 4,0009 909 90num ber o f files

rea dw riterea dw rite

4 000-no de c luster5 00-no de cluster

4 000 T o ta l N o des

3 000 0T o ta l co res

1 6P BD ata

4 000 T o ta l N o des

3 000 0T o ta l co res

1 6P BD ata

資料標題: Scaling Hadoop to 4000 nodes at Yahoo!資料日期: September 30, 2008

6 64 01 85 .8a v g. thro ughput (M B/s)

4422tasks per node

5 ,0 40 ,00 05 ,0 40 ,00 03 16 ,80 03 16 ,80 0to ta l M B pro cesses

3 603 603 203 20file s ize (M B )

1 4,0001 4,0009 909 90num ber o f files

rea dw riterea dw rite

4 000-no de c luster5 00-no de cluster

6 64 01 85 .8a v g. thro ughput (M B/s)

4422tasks per node

5 ,0 40 ,00 05 ,0 40 ,00 03 16 ,80 03 16 ,80 0to ta l M B pro cesses

3 603 603 203 20file s ize (M B )

1 4,0001 4,0009 909 90num ber o f files

rea dw riterea dw rite

4 000-no de c luster5 00-no de cluster

How to deploy 4000+ Nodes ????How to deploy 4000+ Nodes ????How to deploy 4000+ Nodes ????How to deploy 4000+ Nodes ????

Advanced Methods to deploy ClusterAdvanced Methods to deploy ClusterAdvanced Methods to deploy ClusterAdvanced Methods to deploy Cluster

SSI ( Single System Image )

Multiple PCs as Single Computing Resources

Image-based

homogeneous

ex. SystemImager, OSCAR, Kadeploy

Package-based

heterogeneous

easy update and modify packages

ex. FAI, DRBL

Other deploy tools

Rocks : RPM only

cfengine : configuration engine

Comparison of Cluster Deploy ToolsComparison of Cluster Deploy ToolsComparison of Cluster Deploy ToolsComparison of Cluster Deploy Tools

Distribution

Support

Diskless/

Sysmless

Type

Node

configuration

tools

Cluster

management

tools

Database

installation

System

ImagerALL Yes Image Yes No No

OSCARRPM-

basedYes Image Yes Yes No

Kadeploy ALL No Image Yes Yes Yes

DRBL ALL Yes Package Yes Yes No

FAIDebian-

BasedYes Package Yes No No

Hadoop Deployment ToolHadoop Deployment ToolHadoop Deployment ToolHadoop Deployment Tool

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

PART 2-1 :PART 2-1 :PART 2-1 :PART 2-1 :

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

Source: Deploying hadoop with smartfroghttp://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

工商服務時間工商服務時間工商服務時間工商服務時間企鵝龍與再生龍企鵝龍與再生龍企鵝龍與再生龍企鵝龍與再生龍

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

PART 2-2 :PART 2-2 :PART 2-2 :PART 2-2 :

Diskless Remote Boot in Linux

網路是便宜的,人的時間才是昂貴的。

企鵝龍簡單來說就是 .....

用網路線取代硬碟排線 所有學生的電腦都透過網路連接到一台伺服器主機

+ +=

ServerDisklessPC

source: http://www.mren.com.tw

DiskfullPC

何謂企鵝龍何謂企鵝龍 DRBL ??DRBL ??何謂企鵝龍何謂企鵝龍 DRBL ??DRBL ??

何謂再生龍何謂再生龍 Clonezilla ??Clonezilla ??何謂再生龍何謂再生龍 Clonezilla ??Clonezilla ?? Clone (複製 ) + zilla = Clonezilla (再生龍 )

裸機備分還原工具

Norton Ghost 的自由軟體版替代方案

Disk to DiskDisk to DiskDisk to DiskDisk to Disk

Image to N DisksImage to N DisksImage to N DisksImage to N Disks

Disk Disk to to

ImageImage

Disk Disk to to

ImageImage

需分別處理設定 (每班約 40台 )

如:電腦中毒、環境設定系統操作問題、開關機、備份還原等

教師 1人維護管理多組設備

教學同時分派或收集作業

需要「化繁為簡」的解決方案!

一般國內小學的電腦教室

人力、時間成本高

設備維護成本高

降低資訊教育管理成本降低資訊教育管理成本降低資訊教育管理成本降低資訊教育管理成本

知識和軟體都需要讓孩子「帶著走」!

在校學習,也需回家複習學校每台 (平均 ) 2萬學生家用 (平均 ) 4萬

教育知識,也需教育尊重尊重智財權觀念

商業軟體授權高成本

知識與法治的學習

平衡商業軟體與知識教育平衡商業軟體與知識教育平衡商業軟體與知識教育平衡商業軟體與知識教育

以個人叢集電腦 (PC Cluster)經驗發展 DRBL&Clonezilla

多元化資訊教學的新選擇!

企鵝龍DRBL

再生龍Clonezilla

適用完整系統備份、裸機還原或災難復原

…是自由!不是免費

分送、修改、存取、使用軟體的自由。免費是附加價值。

適合將整個電腦教室轉換成純自由軟體環境

(Diskless Remote Boot in Linux )

國網中心自由軟體開發國網中心自由軟體開發國網中心自由軟體開發國網中心自由軟體開發

電腦教室管理的新利器!

■以每班 40台電腦為估算單位

企鵝龍企鵝龍 DRBLDRBL與再生龍與再生龍 ClonezillaClonezilla企鵝龍企鵝龍 DRBLDRBL與再生龍與再生龍 ClonezillaClonezilla

節省龐大軟體授權費

降低台灣盜版率

提升台灣形象

降低管理維護成本

帶動自由軟體使用

節樽軟體授權成本 (估計 )

NT. 98,595,000 元以某商業獨家軟體每機 3000元授權費計,每班 35台電腦 (3000*35*939)

教育單位採用 DRBL

高速計算研究

資料儲存備援

擴至全國各單位

企鵝龍的開機原理企鵝龍的開機原理企鵝龍的開機原理企鵝龍的開機原理

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

PART 1-3 :PART 1-3 :PART 1-3 :PART 1-3 :

1st, We install Base System of 1st, We install Base System of GNU/Linux GNU/Linux on on Management NodeManagement Node. You . You

can choose:can choose:Redhat, Fedora, CentOS, Mandriva,Redhat, Fedora, CentOS, Mandriva,

Ubuntu, Debian, ...Ubuntu, Debian, ...

1st, We install Base System of 1st, We install Base System of GNU/Linux GNU/Linux on on Management NodeManagement Node. You . You

can choose:can choose:Redhat, Fedora, CentOS, Mandriva,Redhat, Fedora, CentOS, Mandriva,

Ubuntu, Debian, ...Ubuntu, Debian, ...

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

2nd, We install 2nd, We install DRBL packageDRBL package and and configure it as configure it as DRBL ServerDRBL Server. .

There are lots of service needed:There are lots of service needed:SSHD, DHCPD, TFTPD, NFS Server,SSHD, DHCPD, TFTPD, NFS Server,

NIS Server, YP Server ...NIS Server, YP Server ...

2nd, We install 2nd, We install DRBL packageDRBL package and and configure it as configure it as DRBL ServerDRBL Server. .

There are lots of service needed:There are lots of service needed:SSHD, DHCPD, TFTPD, NFS Server,SSHD, DHCPD, TFTPD, NFS Server,

NIS Server, YP Server ...NIS Server, YP Server ...

DHCPDDHCPDDHCPDDHCPDTFTPDTFTPDTFTPDTFTPDNFSNFSNFSNFS

BashBashBashBashPerlPerlPerlPerl

Network BootingNetwork BootingNetwork BootingNetwork Booting

YPYPYPYPNISNISNISNIS

Account Mgnt.Account Mgnt.Account Mgnt.Account Mgnt.

DRBL ServerDRBL Serverbased on existingbased on existingOpen SourceOpen Source and and

keep keep HackingHacking!!

DRBL ServerDRBL Serverbased on existingbased on existingOpen SourceOpen Source and and

keep keep HackingHacking!!

SSHDSSHDSSHDSSHD

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

After running “After running “drblsrv -idrblsrv -i” & ” & ““drblpush -idrblpush -i”, there will be ”, there will be pxelinux, pxelinux,

vmlinux-pex, initrd-pxevmlinux-pex, initrd-pxe in TFTPROOT, and in TFTPROOT, and different different configuration filesconfiguration files for for

each Compute Node in NFSROOTeach Compute Node in NFSROOT

After running “After running “drblsrv -idrblsrv -i” & ” & ““drblpush -idrblpush -i”, there will be ”, there will be pxelinux, pxelinux,

vmlinux-pex, initrd-pxevmlinux-pex, initrd-pxe in TFTPROOT, and in TFTPROOT, and different different configuration filesconfiguration files for for

each Compute Node in NFSROOTeach Compute Node in NFSROOT

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

DHCPDDHCPDDHCPDDHCPDTFTPDTFTPDTFTPDTFTPDNFSNFSNFSNFS YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHD

BIOS PXEBIOS PXEBIOS PXEBIOS PXE BIOS PXEBIOS PXEBIOS PXEBIOS PXE BIOS PXEBIOS PXEBIOS PXEBIOS PXE BIOS PXEBIOS PXEBIOS PXEBIOS PXE

3nd, We enable 3nd, We enable PXEPXE function in function in BIOSBIOS configuration. configuration.

3nd, We enable 3nd, We enable PXEPXE function in function in BIOSBIOS configuration. configuration.

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

DHCPDDHCPDDHCPDDHCPDTFTPDTFTPDTFTPDTFTPDNFSNFSNFSNFS YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHD

BIOS PXEBIOS PXEBIOS PXEBIOS PXE BIOS PXEBIOS PXEBIOS PXEBIOS PXE BIOS PXEBIOS PXEBIOS PXEBIOS PXE BIOS PXEBIOS PXEBIOS PXEBIOS PXE

While Booting, While Booting, PXEPXE will query will queryIP address from IP address from DHCPDDHCPD..

While Booting, While Booting, PXEPXE will query will queryIP address from IP address from DHCPDDHCPD..

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

TFTPDTFTPDTFTPDTFTPDNFSNFSNFSNFS YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHDDHCPDDHCPDDHCPDDHCPD

IP 1IP 1IP 1IP 1 IP 2IP 2IP 2IP 2 IP 3IP 3IP 3IP 3 IP 4IP 4IP 4IP 4

While Booting, While Booting, PXEPXE will query will queryIP address from IP address from DHCPDDHCPD..

While Booting, While Booting, PXEPXE will query will queryIP address from IP address from DHCPDDHCPD..

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

TFTPDTFTPDTFTPDTFTPDNFSNFSNFSNFS YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHDDHCPDDHCPDDHCPDDHCPD

IP 1IP 1IP 1IP 1 IP 2IP 2IP 2IP 2 IP 3IP 3IP 3IP 3 IP 4IP 4IP 4IP 4

After PXE get its IP address, it will After PXE get its IP address, it will download booting files from download booting files from TFTPDTFTPD..After PXE get its IP address, it will After PXE get its IP address, it will

download booting files from download booting files from TFTPDTFTPD..

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

NFSNFSNFSNFS YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHDDHCPDDHCPDDHCPDDHCPD

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

TFTPDTFTPDTFTPDTFTPD

IP 1IP 1IP 1IP 1 IP 2IP 2IP 2IP 2 IP 3IP 3IP 3IP 3 IP 4IP 4IP 4IP 4

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

NFSNFSNFSNFS YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHDDHCPDDHCPDDHCPDDHCPD

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

TFTPDTFTPDTFTPDTFTPD

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

initrdinitrdinitrdinitrd

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

initrdinitrdinitrdinitrd

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

initrdinitrdinitrdinitrd

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

initrdinitrdinitrdinitrd

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHDDHCPDDHCPDDHCPDDHCPD

initrdinitrdinitrdinitrd initrdinitrdinitrdinitrd initrdinitrdinitrdinitrd

IP 1IP 1IP 1IP 1 IP 2IP 2IP 2IP 2 IP 3IP 3IP 3IP 3 IP 4IP 4IP 4IP 4pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

initrdinitrdinitrdinitrd

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

TFTPDTFTPDTFTPDTFTPD

After downloading booting files, scripts After downloading booting files, scripts in in initrd-pxeinitrd-pxe will config will config NFSROOTNFSROOT for for

each Compute Node.each Compute Node.

After downloading booting files, scripts After downloading booting files, scripts in in initrd-pxeinitrd-pxe will config will config NFSROOTNFSROOT for for

each Compute Node.each Compute Node.

NFSNFSNFSNFS

Linux KernelLinux KernelLinux KernelLinux Kernel

Kernel ModuleKernel ModuleKernel ModuleKernel Module

GNU LibcGNU LibcGNU LibcGNU Libc

Boot LoaderBoot LoaderBoot LoaderBoot Loader

YPYPYPYPNISNISNISNISSSHDSSHDSSHDSSHDDHCPDDHCPDDHCPDDHCPD

initrdinitrdinitrdinitrd initrdinitrdinitrdinitrd initrdinitrdinitrdinitrd

IP 1IP 1IP 1IP 1 IP 2IP 2IP 2IP 2 IP 3IP 3IP 3IP 3 IP 4IP 4IP 4IP 4pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuzvmlinuzvmlinuzvmlinuz

initrdinitrdinitrdinitrd

pxelinuxpxelinuxpxelinuxpxelinuxvmlinuz-pxevmlinuz-pxevmlinuz-pxevmlinuz-pxe

initrd-pxeinitrd-pxeinitrd-pxeinitrd-pxe

TFTPDTFTPDTFTPDTFTPD

Config. FilesConfig. FilesEx. hostnameEx. hostname

Config. FilesConfig. FilesEx. hostnameEx. hostname

NFSNFSNFSNFS

Config. 1Config. 1Config. 1Config. 1 Config. 2Config. 2Config. 2Config. 2 Config. 3Config. 3Config. 3Config. 3 Config. 4Config. 4Config. 4Config. 4

DRBL ServerDRBL ServerDRBL ServerDRBL Server

YPYPYPYPNISNISNISNISDHCPDDHCPDDHCPDDHCPDTFTPDTFTPDTFTPDTFTPDNFSNFSNFSNFS

BashBashBashBashPerlPerlPerlPerl

SSHDSSHDSSHDSSHD

BashBashBashBashPerlPerlPerlPerl

SSHDSSHDSSHDSSHDBashBashBashBashPerlPerlPerlPerl

SSHDSSHDSSHDSSHDBashBashBashBashPerlPerlPerlPerl

SSHDSSHDSSHDSSHDBashBashBashBashPerlPerlPerlPerl

SSHDSSHDSSHDSSHD

ApplicationsApplications and and ServicesServices will also will also deployed to each Compute Node deployed to each Compute Node

via via NFSNFS .... ....

ApplicationsApplications and and ServicesServices will also will also deployed to each Compute Node deployed to each Compute Node

via via NFSNFS .... ....

DRBL ServerDRBL ServerDRBL ServerDRBL Server

DHCPDDHCPDDHCPDDHCPDTFTPDTFTPDTFTPDTFTPD

With the help of With the help of NISNIS and and YPYP,,You can login each Compute NodeYou can login each Compute Node

with the with the Same ID / PASSWORDSame ID / PASSWORDstored in DRBL Server! stored in DRBL Server!

With the help of With the help of NISNIS and and YPYP,,You can login each Compute NodeYou can login each Compute Node

with the with the Same ID / PASSWORDSame ID / PASSWORDstored in DRBL Server! stored in DRBL Server!

NFSNFSNFSNFS SSHDSSHDSSHDSSHD YPYPYPYPNISNISNISNIS

SSHDSSHDSSHDSSHD SSHDSSHDSSHDSSHD SSHDSSHDSSHDSSHD SSHDSSHDSSHDSSHD

SSH ClientSSH ClientSSH ClientSSH Client

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

PART 2 -1:PART 2 -1:PART 2 -1:PART 2 -1:

當企鵝龍遇上小飛象當企鵝龍遇上小飛象當企鵝龍遇上小飛象當企鵝龍遇上小飛象

使用使用 DRBLDRBL佈署佈署 HadoopHadoop使用使用 DRBLDRBL佈署佈署 HadoopHadoop 仍在開發中,待整理套件

drbl-hadoop – 掛載本機硬碟給 HDFS 用

svn co http://trac.nchc.org.tw/pub/grid/drbl-hadoop

hadoop-register – 註冊網站與 ssh applet

svn co http://trac.nchc.org.tw/pub/cloud/hadoop-register

關於關於 hadoop.nchc.org.twhadoop.nchc.org.tw關於關於 hadoop.nchc.org.twhadoop.nchc.org.tw DRBL Server - 1台 (hadoop),加大 /home與 /tftpboot 空間。

DRBL Client - 19台 (hadoop101~hadoop119)

使用 Cloudera的Debian套件

使用 drbl-hadoop 的設定跟 init.d script來協助部署

使用 hadoop-register 來提供使用者註冊與 ssh applet 介面

Lesson LearnLesson LearnLesson LearnLesson Learn Cloudera套件的好處:使用 init.d script 來啟動關閉

name node, data node, job tracker, task tracker

建立大量帳號:

可透過DRBL 內建指令完成 /opt/drbl/sbin/drbl-useradd

使用者預設HDFS家目錄

跑迴圈切換使用者,下 hadoop fs -mkdir tmp

設定使用者HDFS權限

跑迴圈切換使用者,下 hadoop dfs -chown $(id) /usr/$(id)

HDFS 會使用 /var/lib/hadoop/cache/hadoop/dfs

MapReduce 會使用 /var/lib/hadoop/cache/hadoop/mapred

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

PART 2 -2:PART 2 -2:PART 2 -2:PART 2 -2:

Live DemoLive DemoLive DemoLive Demo

WANWANWANWAN DRBL-Live

1. Boot Server with DRBL-Live CD1. Boot Server with DRBL-Live CDhttp://free.nchc.org.tw/drbl-live/stable/http://free.nchc.org.tw/drbl-live/stable/

2. Download DRBL-Hadoop Script2. Download DRBL-Hadoop Scripthttp://classcloud.org/drbl-hadoop-live.shhttp://classcloud.org/drbl-hadoop-live.shhttp://classcloud.org/drbl-hadoop-live-run.shhttp://classcloud.org/drbl-hadoop-live-run.sh

3. Follow the steps3. Follow the stepshttp://classcloud.org/drbl-hadoophttp://classcloud.org/drbl-hadoop

1. Boot Server with DRBL-Live CD1. Boot Server with DRBL-Live CDhttp://free.nchc.org.tw/drbl-live/stable/http://free.nchc.org.tw/drbl-live/stable/

2. Download DRBL-Hadoop Script2. Download DRBL-Hadoop Scripthttp://classcloud.org/drbl-hadoop-live.shhttp://classcloud.org/drbl-hadoop-live.shhttp://classcloud.org/drbl-hadoop-live-run.shhttp://classcloud.org/drbl-hadoop-live-run.sh

3. Follow the steps3. Follow the stepshttp://classcloud.org/drbl-hadoophttp://classcloud.org/drbl-hadoop

Demo with DRBL-Live CDDemo with DRBL-Live CDDemo with DRBL-Live CDDemo with DRBL-Live CD

Questions?Questions?Questions?Questions?

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw

Jazz WangJazz WangYao-Tsung WangYao-Tsung Wang

[email protected]@nchc.org.tw