Using High Performance Power System Efficiently
Date: 15/10/2009
DongJoon Cho ([email protected]), MTS, GTS, IBM Korea
© 2009 IBM Corporation


Agenda

• Concerns about Power System

• Summary of the solutions

• Architectures for effective computing

– H/W Architecture

– System Architecture

– S/W Architecture

• Case Study


Concerns about Power System

• Why can't we use 100% of a high-performance server after buying it?

• The CPU clock is higher, so why doesn't application performance follow?

• The clock is twice as fast, so why isn't performance doubled?

• We doubled the memory, so why doesn't utilization drop to ½?

• Why does IBM Power System score higher tpmC than other systems?

• IBM Power System has good response times, so why is its utilization high?

Will performance improve just by replacing the system, without changing the software?

What do we know about the system beyond its CPU clock?

Summary of the solutions

• Indirect methods

– Firmware update

– AIX update

– Software update

• Direct methods

– AIX configuration

– Hardware planning/selection

– System Architecture

– Software Architecture

Most software problems are hard to solve with the direct methods because of development time and cost.

Table of Contents

• Concerns about Power System

• Summary of the solutions

• Architectures for effective computing

– H/W Architecture

• Hardware Architecture – CPU

• Hardware Architecture – I/O

– System Architecture

– S/W Architecture

• Case Study


Hardware Architecture - CPU

• CISC

– Complex Instruction Set Computer Architecture

– Designed to provide every instruction that might be needed

– VAX, x86

• EPIC

– Explicitly Parallel Instruction Computing Architecture

– Jointly designed by HP and Intel; provides explicit parallelism

– IA64

• RISC

– Reduced Instruction Set Computer Architecture

– Reduces the instruction set to the most frequently used instructions so that most workloads run in less time

– SPARC, POWER, PA-RISC

Hardware Architecture – Power Chip


Hardware Architecture – Power5


Hardware Architecture - Power6


Hardware Architecture - CPU Instructions

• Computation Instructions

Arithmetic operations    Logical operations
ADD  Add                 AND    True if A and B true
SUB  Subtract            OR     True if A or B true
MUL  Multiply            NOT    True if A is false
DIV  Divide              XOR    True if only one of A and B is true
INC  Increment           SHL    Shift bits left
DEC  Decrement           SHR    Shift bits right
CMP  Compare             BSWAP  Reverse byte order

• Operands Types

Stack     Accumulator   Register          Memory
Push A    Ld A          Ld R1, A          Add C, B, A
Push B    Add B         Ld R2, B
Add       St C          Add R3, R2, R1
Pop C                   St C, R3
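The four operand styles on this slide all compute C = A + B. As a minimal sketch, the Python below interprets the stack and register sequences and checks that both leave the same value in C. The interpreter functions, the tuple encoding of instructions, and the `mem` dictionary are illustrative assumptions, not a model of any real instruction set.

```python
def run_stack(program, mem):
    """Run a stack-machine program: operands live on an implicit stack."""
    stack = []
    for op, *args in program:
        if op == "Push":                  # Push addr: memory -> top of stack
            stack.append(mem[args[0]])
        elif op == "Add":                 # Add: pop two values, push their sum
            stack.append(stack.pop() + stack.pop())
        elif op == "Pop":                 # Pop addr: top of stack -> memory
            mem[args[0]] = stack.pop()
    return mem


def run_register(program, mem):
    """Run a load/store program: arithmetic only touches registers."""
    regs = {}
    for op, *args in program:
        if op == "Ld":                    # Ld Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "Add":                 # Add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "St":                  # St addr, Rs
            mem[args[0]] = regs[args[1]]
    return mem


# The instruction sequences from the Stack and Register columns.
STACK_PROGRAM = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
REGISTER_PROGRAM = [("Ld", "R1", "A"), ("Ld", "R2", "B"),
                    ("Add", "R3", "R2", "R1"), ("St", "C", "R3")]
```

The register version needs one more instruction than the memory-to-memory form, but each of its instructions touches memory at most once, which is the trade-off a load/store (RISC) design makes.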

Hardware Architecture - CPU Instructions

• Data Transfer Instructions

LD Load value from memory to a register

ST Store value from a register to memory

MOV Move value from register to register

CMOV Conditionally move value from register to register if a condition is met

PUSH Push value onto top of stack

POP Pop value from top of stack


Hardware Architecture - CPU Instructions

• Control Flow Instructions

JMP   Unconditional jump to another instruction
BR    Branch to instruction if condition is met
CALL  Call a procedure
RET   Return from procedure
INT   Software interrupt

• Control Flow Relative Frequency

Instruction     Integer programs   Floating-point programs
Branch          75%                82%
Jump            6%                 10%
Call & return   19%                8%

Hardware Architecture - CPU Instructions

• Common Instructions

Instruction   Instruction type   Percent of instructions executed
Load          Data transfer      22%
Branch        Control flow       20%
Compare       Computation        16%
Store         Data transfer      12%
Add           Computation        8%
And           Computation        6%
Sub           Computation        5%
Move          Data transfer      4%
Call          Control flow       1%
Return        Control flow       1%
Total                            95%

Instruction type   Overall percentage
Data transfer      38%
Computation        35%
Control flow       22%
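The per-type percentages on this slide are the sums of the per-instruction rows. A short Python check, using only the figures from the slide (the variable and function names are illustrative):

```python
from collections import defaultdict

# (instruction, instruction type, percent of instructions executed)
INSTRUCTION_MIX = [
    ("Load", "Data transfer", 22),
    ("Branch", "Control flow", 20),
    ("Compare", "Computation", 16),
    ("Store", "Data transfer", 12),
    ("Add", "Computation", 8),
    ("And", "Computation", 6),
    ("Sub", "Computation", 5),
    ("Move", "Data transfer", 4),
    ("Call", "Control flow", 1),
    ("Return", "Control flow", 1),
]


def totals_by_type(mix):
    """Sum the executed-instruction percentages per instruction type."""
    totals = defaultdict(int)
    for _name, kind, percent in mix:
        totals[kind] += percent
    return dict(totals)
```

Ten instructions account for 95% of what is executed, and data transfer alone is 38%, which is why the following slides focus on the cost of moving data.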

Hardware Architecture - CPU and I/O

• CPU Speed versus I/O Speeds

• Several options to overcome I/O limitations

– Incorporate more I/O buses (parallelism)

– Extend current I/O technology (increase bandwidth, enhance operating modes)

– Develop new I/O technology

I/O is slower than the CPU

Several techniques are needed to reduce waits caused by I/O

Hardware Architecture - CPU and I/O

• CPU Efficiency and CPU Access Costs

Performance degradation caused by I/O


Hardware Architecture - I/O

• The elements of an I/O system


Hardware Architecture - I/O : InfiniBand

• Comparing InfiniBand to Existing Technology

– Differences and Benefits

Change from                 Change to                     Benefit
Memory mapped               Channel based                 CPU efficiency, scalability, isolation, recovery.
Parallel bus                Switched fabric               Scalability, isolation, redundancy, reduced pin-out, modularity, higher cross-sectional bandwidth.
Shared bus access           Point to point                Greater distance, higher speeds.
Load/store                  DMA scheduling                Improved CPU efficiency.
Single open address space   Independent address domains   Protection, isolation, recovery, reliability.

Hardware Architecture - I/O : InfiniBand

[Figure: InfiniBand Architecture. Traditional: shared bus topology (shared bus architecture). InfiniBand: switched fabric topology (InfiniBand switched architecture).]

Hardware Architecture - I/O : InfiniBand

Accessing InfiniBand Services - The Channel Interface : Work / Completion Queue Architecture


Hardware Architecture - I/O : InfiniBand

InfiniBand Queue Operations – Operations on the send queue fall into three subclasses

Queues minimize waiting and allow asynchronous processing
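The work queue / completion queue idea can be sketched with ordinary Python queues. This is only an analogy under stated assumptions: `channel_adapter`, `post_and_poll`, and the `"done"` status are hypothetical names, and nothing here is the InfiniBand verbs API. The point is that posting a request returns immediately and the caller collects completions later.

```python
import queue
import threading


def channel_adapter(work_queue, completion_queue):
    """Stand-in for the adapter: drain work requests, post completions."""
    while True:
        request = work_queue.get()
        if request is None:               # sentinel: no more work
            break
        completion_queue.put((request, "done"))


def post_and_poll(requests):
    """Post every request without waiting, then poll the completions."""
    work_queue = queue.Queue()
    completion_queue = queue.Queue()
    adapter = threading.Thread(target=channel_adapter,
                               args=(work_queue, completion_queue))
    adapter.start()
    for request in requests:
        work_queue.put(request)           # returns at once; no wait per request
    work_queue.put(None)
    adapter.join()                        # a real caller would do other work here
    completions = []
    while not completion_queue.empty():
        completions.append(completion_queue.get())
    return completions
```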

Hardware Architecture - I/O : InfiniBand

• VIA (Virtual Interface Architecture)

– Messages Model

– Direct, protected access by user level software to the communications hardware; the protection is effected by means of the virtual memory system.

Comparison of VIA and traditional communications

Send and receive packet descriptors that specify scatter-gather operations—specifying where data must be distributed to and collected up from—when sending and receiving

A send message queue and a receive message queue, comprising linked lists of packet descriptors

A means of notifying the network interface that packets have been placed on a queue

An asynchronous notification process for the status of the operations requested (completion of a send or receive operation is signaled by writing state information into a packet descriptor)

Registration of memory areas used for communications: before communications are started, the memory areas for each hardware unit are identified and noted, so that expensive operations, such as locking the pages and translating from virtual to real addresses, are done once, outside performance-critical data transfers

Hardware Architecture - I/O : InfiniBand

Logical processing steps in TCP/IP

White indicates per-message processing: it is the processing load imposed by the system call on the sockets interface, and is independent of the size of the message

Light gray indicates per-fragment processing (a long message is broken up into several fragments): this covers TCP, IP, media access and interrupt handling

Dark grey indicates per-byte processing (actually, per fragment plus per byte in fragment): this covers the data-copying overhead along with computation of the checksum

Checksum computation and memory management also add overhead
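The three shades in the figure correspond to three cost terms: one per message, one per fragment, and one per byte. A minimal cost model in Python, assuming hypothetical unit costs (the function name and all parameters are illustrative; the slide gives no numbers):

```python
import math


def tcp_send_cost(message_bytes, fragment_bytes,
                  per_message, per_fragment, per_byte):
    """Total cost = per-message + per-fragment * fragments + per-byte * bytes."""
    fragments = math.ceil(message_bytes / fragment_bytes)
    return (per_message                   # system call on the sockets interface
            + fragments * per_fragment    # TCP, IP, media access, interrupts
            + message_bytes * per_byte)   # data copy and checksum
```

With these terms, small messages are dominated by the fixed per-message cost, while large messages are dominated by the per-byte copy and checksum work.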

Hardware Architecture - I/O : InfiniBand

• Mechanisms to reduce the number of interrupts

Send, simple DMA

• set up the DMA registers (with buffer address and size)
• lock the page containing the buffers and purge corresponding addresses in the data cache
• activate the send command
• wait until the end of the operation
• interrupt upon completion of the operation, and free (unlock) the page

Send, improved DMA

• refill the free buffers with data to be sent
• lock the buffer page(s) and purge corresponding addresses in the data cache
• refill a descriptor with the addresses and sizes of the buffers just set up
• change the descriptor status indicator to "DMA"
• if the DMA was inactive, wake it up

Receive, simple DMA

• DMA interrupts processor
• allocate a page and purge the cache of its addresses
• set up the DMA registers (with buffer address and size)
• when the operation completes the DMA will raise an interrupt

Receive, improved DMA

• refill descriptor(s) for receiving
• purge corresponding addresses in the data cache
• when a receive operation completes, DMA sets the descriptor indicator to System; the OS can test the status of different descriptors
• if there are no free buffers, the DMA raises an interrupt

The improved DMA scheme lowers overhead by reducing the number of interrupts

Table of Contents

• Concerns about Power System

• Summary of the solutions

• Architectures for effective computing

– H/W Architecture

– System Architecture

• System Architecture (Hardware)

• System Architecture (System Software)

– S/W Architecture

• Case Study


System Architecture (Hardware)

• LPAR / DLPAR


System Architecture (Hardware)

• LPAR / DLPAR

– Hypervisor


System Architecture (Hardware)

• Micro Partitioning

– Up to 10 partitions can be created per processor

– Resources are shared among multiple partitions

System Architecture (Hardware)

• Micro Partitioning


System Architecture (Hardware)

• VIO

– Part of the Advanced POWER Virtualization feature

– Allows for sharing of physical devices, including storage and network

– Implemented as a customized AIX-based appliance

– Requires careful planning to maintain the VIO Server with minimal impact to VIO Clients

– Provides command line tools for maintenance or can be maintained with NIM

System Architecture (System Software)

• SMT (Simultaneous Multi-Threading)

– An enhanced hardware design in POWER5 that lets a processor execute two separate instruction streams (threads) at the same time

– By prioritizing hardware and software threads, it raises the utilization of hardware resources without hurting application performance

• WLM (Workload Manager)

– Dynamically allocates system resources among running workloads without partitioning the system

– Manages CPU by dividing CPU time rather than whole processors, which gives finer-grained control of CPU resources

– Controls CPU time, memory, and I/O volume individually, so that applications with different characteristics can be managed on a single server

System Architecture (System Software)

• WPARs (Workload Partitions)

– A workload partition (WPAR), new with the IBM® AIX® 6.1 operating system, expands on the traditional IBM AIX logical partitioning (LPAR) technology by further allowing AIX to be virtualized within a single operating-system image.

– A simple definition of a WPAR is that it is a virtualized AIX instance that runs within a single AIX operating-system image.


Table of Contents

• Architectures for effective computing

– H/W Architecture

– System Architecture

– S/W Architecture

• Software Architecture – OS

• Software Architecture – Application Architecture

– Multi-Process Model

» IPC

– Multi-Thread Model

– Process Scheduling / Context Switching / Cache Hit

– I/O Multiplexing Model

– Event based I/O Model through Real-time Signal

– epoll

– IOCP

– Parallel Programming

– Java

» Java pollset

– AIX I/O Model

» AIX IOCP

S/W Architecture - OS

• What is the OS?

• Relationship between the OS and a network program

– Components of a network program

• Socket API

• I/O

• Processes or threads for handling multiple connections

• IPC (Inter Process Communication) for synchronizing processes or threads

[Figure: Relationship between the OS and a network program. Layers shown: H/W (Disk, NIC …); OS; File System, Memory; Socket API; I/O, Process, Thread; IPC.]

S/W Architecture – File on the Unix

• What is a file on Unix?

• Ex) Checking the files a process has open

– Files a process opens by default when it is created

• 0 : standard input

• 1 : standard output

• 2 : standard error

office2@root/proc/9804/fd>ls -al
total 120
dr-x------ 1 root system 0 Sep 27 03:22 .
dr-xr-xr-x 1 root system 0 Sep 27 03:22 ..
lr-xr-xr-x 24 root system 1024 Sep 22 18:48 0 -> /
lr-xr-xr-x 24 root system 1024 Sep 22 18:48 1 -> /
lr-xr-xr-x 24 root system 1024 Sep 22 18:48 2 -> /
--w--w---- 1 root system 12506 Sep 15 18:13 7
--w--w---- 1 root system 12506 Sep 15 18:13 8
--w--w---- 1 root system 12506 Sep 15 18:13 9
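The same information is visible from inside a process. The sketch below, in Python, probes descriptor numbers with `os.fstat` rather than reading `/proc`, so it does not depend on the `/proc/<pid>/fd` layout shown in the listing. `open_descriptors` and `STANDARD_DESCRIPTORS` are illustrative names, and the default `limit` is an arbitrary assumption.

```python
import os

# The three descriptors every process normally starts with.
STANDARD_DESCRIPTORS = {0: "standard input",
                        1: "standard output",
                        2: "standard error"}


def open_descriptors(limit=256):
    """Return the descriptor numbers below `limit` that are currently open."""
    found = []
    for fd in range(limit):
        try:
            os.fstat(fd)                  # fails with EBADF if fd is not open
        except OSError:
            continue
        found.append(fd)
    return found
```

Anything a process opens afterwards (files, pipes, sockets) shows up as another small integer in the same table, which is why sockets can be read and written like files.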

S/W Architecture – Program and OS

• Application Programs and OS

– Type of Software (Conceptual Model)


S/W Architecture – Program and OS

• Application Programs and OS

– Application Programs


S/W Architecture – Program and OS

• Application Programs and OS

– Operating Systems


S/W Architecture – Program and OS

• Application Programs and OS

– Device Drivers


S/W Architecture – Program and OS

• Application Programs and OS

– AIX 5L Structure


Application Architecture

• Multi-Process Model

– Process : the flow of control that represents a program, created when the program runs, together with its system resources (memory, files, IPC …)

– Creating and controlling processes

• fork()

– Creates a copy of the process

– Creates a child process that shares code with the caller

• exec()

– Replaces the execution image of the current process with that of a program

– Loads and runs a new program

[Figure: the init process calls fork() to create copies of itself (Process′, Process″); each copy calls exec() to become Process A or Process B.]

The IPC that multi-processing requires increases kernel overhead
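The fork()-then-exec() sequence can be sketched in Python, whose `os.fork`, `os.execv`, and `os.waitpid` wrap the POSIX calls of the same names. `run_child` is a hypothetical helper, and the program it executes (another Python interpreter that exits with a chosen code) is only a stand-in for a real binary.

```python
import os
import sys


def run_child(exit_code):
    """fork() a copy, exec() a new program in it, and return its exit code."""
    pid = os.fork()                       # the child starts as a copy of the caller
    if pid == 0:
        try:
            # Replace the child's image with a new program.
            os.execv(sys.executable,
                     [sys.executable, "-c",
                      f"import sys; sys.exit({exit_code})"])
        finally:
            os._exit(127)                 # reached only if exec failed
    _pid, status = os.waitpid(pid, 0)     # the parent waits for the child
    return os.WEXITSTATUS(status)
```

After `execv` succeeds, nothing of the original child code remains; only the process ID and the inherited descriptors carry over.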

Application Architecture

• Multi-Processing Model

– Socket Program

Server: socket → bind → listen → accept → read → write
Client: socket → connect → write → read → close

[Figure: the client's connect is the connection request that the server's accept receives; the client's write is the data request read by the server; the client's read receives the data the server writes. The server calls fork() to handle the connection.]

Application Architecture

• Multi-Processing Model

1. fork() per request: ① the client connects to the server ② fork()

A fork() happens every time a request arrives.

2. Process pool: ① fork() ② the client connects to the server

Because fork() takes a long time, the child processes are created in advance with fork() and kept in a pool.

[Figure: client apps connecting to server app processes in each of the two models.]
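The two server models on this slide differ only in whether fork() comes after or before accept(). A minimal Python sketch of both follows; `handle`, `serve_forking`, `serve_prefork`, `request`, and `demo` are hypothetical helpers, the upper-casing reply stands in for real work, and the server exits after a fixed number of connections so that the example terminates.

```python
import os
import socket


def handle(conn):
    """Answer one request by echoing it back in upper case."""
    with conn:
        conn.sendall(conn.recv(1024).upper())


def serve_forking(listener, n_connections):
    """Model 1: accept first, then fork() one child per connection."""
    for _ in range(n_connections):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:                      # child: serve the request and exit
            try:
                handle(conn)
            finally:
                os._exit(0)
        conn.close()                      # parent: the child owns the connection
        os.waitpid(pid, 0)


def serve_prefork(listener, n_workers):
    """Model 2: fork() the pool first; each worker then calls accept()."""
    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            try:
                conn, _addr = listener.accept()
                handle(conn)
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)


def request(port, payload):
    """Client side: connect, write the request, read the reply."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(payload)
        return s.recv(1024)


def demo(serve):
    """Run `serve` in a child process and send it two requests."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(8)
    port = listener.getsockname()[1]
    server = os.fork()
    if server == 0:
        try:
            serve(listener, 2)
        finally:
            os._exit(0)
    listener.close()
    replies = [request(port, b"ping"), request(port, b"pong")]
    os.waitpid(server, 0)
    return replies
```

In the pool version the cost of fork() is paid once at start-up, so a client never waits for process creation; the price is idle processes when load is low.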

Application Architecture

• IPC (Inter Process Communication)

– What is IPC?

• A way for processes to share data and synchronize with one another

– Types of IPC

• Semaphore

– Semaphores synchronize and protect data shared between processes

• Shared Memory

– Multiple processes share virtual memory; the fastest means of sharing memory

• Message Queues

– A queue is a data structure in which the first item in is the first item out

– As an IPC mechanism, a message queue is very intuitive and simple to use compared with the other sharing methods

– It is quite tricky to control.

Application Architecture

• IPC

– Types of IPC

• Pipe

– Used to pass one process's data to another process. Data flows in one direction only (each end can only read or only write, never both: read only or write only), and a pipe can be used only between processes that have the same parent (the same PPID)

• FIFO (Named Pipe)

– A first-in, first-out stream for sequential I/O, similar to a pipe; the difference is that it is given a name, so unrelated processes can use it

– Created with mknod

• UDS (Unix Domain Socket)

– Can be used through the socket API without modification; unlike port-based Internet domain sockets, it uses the local file system and serves communication between processes on the same system.
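A pipe's one-way, related-process nature is easy to see in a short Python sketch using `os.pipe` and `os.fork`, which wrap the POSIX calls. `send_through_pipe` is a hypothetical helper: the child writes, the parent reads, and each side closes the end it does not use.

```python
import os


def send_through_pipe(message):
    """Have a forked child write `message`; return what the parent reads."""
    read_fd, write_fd = os.pipe()         # one read end, one write end
    pid = os.fork()
    if pid == 0:                          # child: writer only
        os.close(read_fd)
        try:
            os.write(write_fd, message)
        finally:
            os._exit(0)                   # exiting closes the write end
    os.close(write_fd)                    # parent: reader only
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                     # empty read: every writer has closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

The descriptor pair exists only in the process that created it and in its descendants, which is why an unnamed pipe cannot connect unrelated processes and a FIFO or Unix domain socket is needed instead.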

Application Architecture

• IPC

– IPC commands

– ipcs command

• ipcs -m ( shared memory )

• ipcs -q ( message queues )

• ipcs -s ( semaphore )

– ipcrm command

• Removes semaphores, message queues, and shared memory segments from the system

Function                                       Message queue    Semaphore   Shared memory
1. Allocating IPC                              msgget           semget      shmget
2. Controlling IPC (change state, release)     msgctl           semctl      shmctl
3. Operating IPC (send/receive)                msgsnd, msgrcv   semop       shmat, shmdt

Application Architecture

• IPC

– IPC Limits

Semaphores 4.3.0 4.3.1 4.3.2 5.1 5.2 5.3

Maximum number of semaphore IDs for 32-bit kernel 4096 4096 131072 131072 131072 131072

Maximum number of semaphore IDs for 64-bit kernel 4096 4096 131072 131072 131072 1048576

Maximum semaphores per semaphore ID 65535 65535 65535 65535 65535 65535

Maximum operations per semop call 1024 1024 1024 1024 1024 1024

Maximum undo entries per process 1024 1024 1024 1024 1024 1024

Size in bytes of undo structure 8208 8208 8208 8208 8208 8208

Semaphore maximum value 32767 32767 32767 32767 32767 32767

Adjust on exit maximum value 16384 16384 16384 16384 16384 16384


Application Architecture

• IPC

– IPC Limits

Message Queue 4.3.0 4.3.1 4.3.2 5.1 5.2 5.3

Maximum message size 4 MB 4 MB 4 MB 4 MB 4 MB 4 MB

Maximum bytes on queue 4 MB 4 MB 4 MB 4 MB 4 MB 4 MB

Maximum number of message queue IDs for 32-bit kernel 4096 4096 131072 131072 131072 131072

Maximum number of message queue IDs for 64-bit kernel 4096 4096 131072 131072 131072 1048576

Maximum messages per queue ID 524288 524288 524288 524288 524288 524288


Application Architecture

• IPC

– IPC Limits

Shared Memory 4.3.0 4.3.1 4.3.2 5.1 5.2 5.3

Maximum segment size (32-bit process) 256 MB 2 GB 2 GB 2 GB 2 GB 2 GB

Maximum segment size (64-bit process) for 32-bit kernel 256 MB 2 GB 2 GB 64 GB 1 TB 1 TB

Maximum segment size (64-bit process) for 64-bit kernel 256 MB 2 GB 2 GB 64 GB 1 TB 32 TB

Minimum segment size 1 1 1 1 1 1

Maximum number of shared memory IDs (32-bit kernel) 4096 4096 131072 131072 131072 131072

Maximum number of shared memory IDs (64-bit kernel) 4096 4096 131072 131072 131072 1048576

Maximum number of segments per process (32-bit process) 11 11 11 11 11 11

Maximum number of segments per process (64-bit process) 268435456 268435456 268435456 268435456 268435456 268435456

Application Architecture

• IPC

– IPC tunable parameters

– msgmax

Purpose: Specifies maximum message size.
Values: Dynamic with maximum value of 4 MB
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

– msgmnb

Purpose: Specifies maximum number of bytes on queue.
Values: Dynamic with maximum value of 4 MB
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

Application Architecture

• IPC

– IPC tunable parameters

– msgmni

Purpose: Specifies maximum number of message queue IDs.
Values: Dynamic with maximum value of 131072
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

– msgmnm

Purpose: Specifies maximum number of messages per queue.
Values: Dynamic with maximum value of 524288
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

Application Architecture

• IPC

– IPC tunable parameters

– semaem

Purpose: Specifies maximum value for adjustment on exit.
Values: Dynamic with maximum value of 16384
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

– semmni

Purpose: Specifies maximum number of semaphore IDs.
Values: Dynamic with maximum value of 131072
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

Application Architecture

• IPC

– IPC tunable parameters

– semmsl

Purpose: Specifies maximum number of semaphores per ID.
Values: Dynamic with maximum value of 65535
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

– semopm

Purpose: Specifies maximum number of operations per semop() call.
Values: Dynamic with maximum value of 1024
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

Application Architecture

• IPC

– IPC tunable parameters

– semume

Purpose: Specifies maximum number of undo entries per process.
Values: Dynamic with maximum value of 1024
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

– semvmx

Purpose: Specifies maximum value of a semaphore.
Values: Dynamic with maximum value of 32767
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

Application Architecture

• IPC

– IPC tunable parameters

– shmmax

Purpose: Specifies maximum shared memory segment size.
Values: Dynamic with maximum value of 256 MB for 32-bit processes and 0x80000000u for 64-bit
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

– shmmin

Purpose: Specifies minimum shared-memory-segment size.
Values: Dynamic with minimum value of 1
Display: N/A
Change: N/A
Diagnosis: N/A
Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.

Application Architecture

• IPC

– IPC tunable parameters

– shmmni

Purpose: Specifies maximum number of shared memory IDs.

Values: Dynamic with maximum value of 131072

Display: N/A

Change: N/A

Diagnosis: N/A

Tuning: Does not require tuning because it is dynamically adjusted as needed by the kernel.


Application Architecture

• Multi-Thread Model

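– Thread : a flow of control that exists within a process

– Socket Program

[Diagram: socket call sequence between server and client]

Server: socket → bind → listen → accept → read → write

Client: socket → connect → write → read → close

Connection request: client connect → server accept

Data request: client write → server read

Data receipt: server write → client read

The diagram also marks pthread_create() on the server side.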

Application Architecture

• Multi-Thread Model

[Diagram: two server designs. In each, several client applications connect to threads inside one server application.]

pthread_create() per request: ① the client connects to the server, ② pthread_create()

pthread_create() runs on every request, but it is much lighter than fork().

Thread Pool: ① pthread_create(), ② the client connects to the server

A thread is lighter than fork(), but a pool is used to cut even the thread creation time.
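The thread pool ordering on this slide (create threads first, accept work second) can be sketched in C as follows. This is a minimal illustration, not code from the deck: `POOL_SIZE`, `MAX_TASKS`, `struct pool`, and `run_pool` are hypothetical names, and plain integers stand in for client requests.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical number of pre-created workers */
#define MAX_TASKS 64  /* hypothetical queue capacity */

struct pool {
    int tasks[MAX_TASKS];  /* pending "requests" (plain integers here) */
    int head, tail;        /* dequeue at head, enqueue at tail */
    int closed;            /* set once no more requests will arrive */
    long total;            /* combined result of all handled requests */
    pthread_mutex_t lock;
    pthread_cond_t ready;
};

static void *worker(void *arg) {
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->head == p->tail && !p->closed)
            pthread_cond_wait(&p->ready, &p->lock);
        if (p->head == p->tail) {           /* closed and fully drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        int task = p->tasks[p->head++];
        pthread_mutex_unlock(&p->lock);

        long result = (long)task * 2;       /* stand-in for real request handling */

        pthread_mutex_lock(&p->lock);
        p->total += result;
        pthread_mutex_unlock(&p->lock);
    }
}

/* Step 1: create the workers. Step 2: hand them n requests (n <= MAX_TASKS). */
long run_pool(const int *requests, int n) {
    struct pool p = { .head = 0, .tail = 0, .closed = 0, .total = 0 };
    pthread_t tid[POOL_SIZE];
    assert(n >= 0 && n <= MAX_TASKS);
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.ready, NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &p);
        assert(rc == 0);
    }
    pthread_mutex_lock(&p.lock);
    for (int i = 0; i < n; i++)
        p.tasks[p.tail++] = requests[i];
    p.closed = 1;
    pthread_cond_broadcast(&p.ready);
    pthread_mutex_unlock(&p.lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&p.ready);
    pthread_mutex_destroy(&p.lock);
    return p.total;
}
```

No thread is created while requests are being served; the creation cost is paid once at startup, which is the point the slide makes about pools.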

Application Architecture

• Multi-Thread Model

– Multi-Process Model and Multi-Thread Model from a memory perspective

[Diagram: memory layout of the two models]

Multi-Process: each fork() creates a child process with its own Code Area, Data Area, Heap Area, and Stack Area, separate from those of the parent process.

Multi-Thread: each pthread_create() adds only a Thread Stack Area inside the parent process; the Code, Data, and Heap Areas are shared.
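The difference in memory sharing can be observed directly. The sketch below is an illustration under assumed names (`counter`, `counter_after_fork`, `counter_after_thread` are hypothetical): a child process increments its own copy of a global variable, while a thread increments the creator's copy.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;  /* lives in the data area */

static void *bump(void *arg) {
    (void)arg;
    counter += 1;        /* same address space as the creating thread */
    return NULL;
}

/* Returns the parent's view of counter after a child process increments it. */
int counter_after_fork(void) {
    counter = 0;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {      /* child: works on its own copy of the data area */
        counter += 1;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;      /* the parent's copy was never touched */
}

/* Returns counter after a thread in the same process increments it. */
int counter_after_thread(void) {
    pthread_t tid;
    counter = 0;
    int rc = pthread_create(&tid, NULL, bump, NULL);
    assert(rc == 0);
    pthread_join(tid, NULL);
    return counter;      /* the thread's write is visible here */
}
```

Sharing makes threads cheaper to create and lets them exchange data without IPC, at the price of needing synchronization for any shared state.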

Application Architecture

• Process Scheduling / Context Switching / Cache Hit

• Context Switching Speed : Threads > Processes

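[Diagram: process state transitions among Start, Ready, Running, Blocked, and End, numbered 1 to 6; and a hierarchy of CPU, Cache, Memory, and Disk with the labels Processor, Cache Hit, CPU Time or Clock, and Time Slice]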

Application Architecture

• I/O Multiplexing Model

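– Instead of each socket communicating through its own socket I/O, all communication goes through a single socket I/O path: sockets are registered in a file descriptor table, and the I/O on that table is monitored to handle multiple connections

– select / poll

[Diagram: three server/client exchanges]

Connection request: the file descriptor is registered

Data send/receive: the file descriptor is monitored

Connection close: the file descriptor is released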

Application Architecture

• I/O Multiplexing Model

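– Drawback

• I/O multiplexing relies on select / poll, so the program must loop over a wide file descriptor array, checking the entries one by one, to find which file descriptor raised an event

[Diagram: Drawback of the I/O Multiplexing Model. To find the registered file descriptor in the file descriptor table, every file descriptor must be examined.]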
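The scan that the slide describes looks roughly like this in C. It is a minimal sketch using the standard `poll()` call; `first_ready` and `demo_poll` are hypothetical helper names, and pipes stand in for client sockets.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms and returns the index of the first readable
 * descriptor, or -1 if none became ready. */
int first_ready(struct pollfd *fds, int nfds, int timeout_ms) {
    int n = poll(fds, (nfds_t)nfds, timeout_ms);
    if (n <= 0)
        return -1;
    for (int i = 0; i < nfds; i++) {   /* every entry must be inspected */
        if (fds[i].revents & POLLIN)
            return i;
    }
    return -1;
}

/* Demo helper: opens `count` pipes, makes pipe `hot` readable, and
 * returns the index that the scan reports. */
int demo_poll(int count, int hot) {
    enum { MAX = 16 };
    int pipes[MAX][2];
    struct pollfd fds[MAX];
    assert(count > 0 && count <= MAX && hot >= 0 && hot < count);
    for (int i = 0; i < count; i++) {
        int rc = pipe(pipes[i]);
        assert(rc == 0);
        fds[i].fd = pipes[i][0];
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }
    ssize_t w = write(pipes[hot][1], "x", 1);
    assert(w == 1);
    int idx = first_ready(fds, count, 1000);
    for (int i = 0; i < count; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return idx;
}
```

The cost of the loop grows with the number of registered descriptors, even when only one of them is active.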

Application Architecture

• Event based I/O Model through Real-time Signal

– Event-based socket handling

• UNIX/Linux : POSIX Real-time Signal, epoll

• Windows, AIX, iSeries OS : IOCP

• FreeBSD : kqueue (kernel queue)

Application Architecture

• Event based I/O Model through Real-time Signal

– Real-time Signal

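• Compensates for two drawbacks of ordinary signals: they are not queued, and as a result they deliver no accompanying information

• Real-time signals are queued, and events can be stored up to the size of the queue, so signal loss can be avoided.

• In addition, information such as the descriptor of the socket that raised the real-time signal can be passed along, so extra information can be carried.

• Unlike select / poll, there is no need to search the descriptor array of the file descriptor table.

[Diagram: Client1, Client2, and Client3 connect to Socket1, Socket2, and Socket3, which raise SIGRTMIN+1, SIGRTMIN+2, and SIGRTMIN+3 for Thread1, Thread2, and Thread3]

Real-time signals are used together with threads by means of a thread pool.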
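The queueing and payload properties of real-time signals can be sketched with the POSIX calls `sigqueue()` and `sigwaitinfo()`. This is an illustration only: `rt_signal_roundtrip` is a hypothetical helper, and the process signals itself rather than reacting to socket activity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Queues a real-time signal carrying `fd` as its payload to this process
 * and returns the payload read back from the signal queue. */
int rt_signal_roundtrip(int fd) {
    sigset_t set;
    siginfo_t info;
    union sigval val;
    int signo = SIGRTMIN + 1;

    sigemptyset(&set);
    sigaddset(&set, signo);
    /* Block the signal so it stays queued until sigwaitinfo() collects it. */
    int rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);

    val.sival_int = fd;                 /* e.g. the descriptor that has data */
    rc = sigqueue(getpid(), signo, val);
    assert(rc == 0);

    int got = sigwaitinfo(&set, &info); /* dequeues one pending signal */
    assert(got == signo);
    return info.si_value.sival_int;
}
```

The receiver learns which descriptor to service from the signal itself, so it never has to walk a descriptor table.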

Application Architecture

• epoll

– epoll : event poll

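– About 10% to 20% better performance than real-time signals

– Supported on HP-UX and Red Hat; not supported on AIX

– Descriptors are placed in an event poll and managed there, so when a read/write event occurs the related information is returned. The returned information identifies the descriptor, so there is no need to check in a loop as with poll.

[Diagram: Socket1, Socket2, and Socket3 are registered in the event poll, which returns the file descriptor that has an event]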
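For comparison with the poll() approach, a minimal Linux sketch of the event poll idea follows. It uses the standard `epoll_create1()`, `epoll_ctl()`, and `epoll_wait()` calls; `demo_epoll` is a hypothetical helper, and pipes stand in for client sockets.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers `count` pipe read ends with one epoll instance, makes pipe
 * `hot` readable, and returns the index that epoll_wait() reports. */
int demo_epoll(int count, int hot) {
    enum { MAX = 16 };
    int pipes[MAX][2];
    struct epoll_event ready;
    assert(count > 0 && count <= MAX && hot >= 0 && hot < count);

    int ep = epoll_create1(0);
    assert(ep >= 0);
    for (int i = 0; i < count; i++) {
        struct epoll_event ev = {0};
        int rc = pipe(pipes[i]);
        assert(rc == 0);
        ev.events = EPOLLIN;
        ev.data.u32 = (unsigned)i;   /* user data handed back with the event */
        rc = epoll_ctl(ep, EPOLL_CTL_ADD, pipes[i][0], &ev);
        assert(rc == 0);
    }
    ssize_t w = write(pipes[hot][1], "x", 1);
    assert(w == 1);

    /* Only descriptors with pending events are returned; no scan is needed. */
    int n = epoll_wait(ep, &ready, 1, 1000);
    int idx = (n == 1) ? (int)ready.data.u32 : -1;

    for (int i = 0; i < count; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    close(ep);
    return idx;
}
```

The kernel hands back the registered user data for the ready descriptor, so the work done per event does not depend on how many descriptors are registered.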

Application Architecture

• epoll

– httpd test result

[Charts: dphttpd symmetric multiprocessor result; dphttpd uniprocessor result]

Application Architecture

• epoll

– Pipetest

[Charts: Pipetest symmetric multiprocessor result; Pipetest uniprocessor result]

Application Architecture

• epoll

– Dead connection test

[Charts: dead connections test result with 128-byte context; dead connections test result with 1024-byte context]

Application Architecture

• IOCP (I/O Completion Ports)

– IOCP on iSeries

• Supported since AS/400. The i platform started in 1988 as AS/400 and evolved through AS/400, OS/400, i5/OS, and i6/OS

• AS/400 QMU 5.0.1.02 introduces asynchronous I/O completion ports (IOCP)

– IOCP on Windows NT

• Supported since Winsock2 on Windows NT

– IOCP on AIX

• I/O completion port support was first introduced in AIX 4.3 by APAR IY06351. An I/O completion port was originally a Windows NT scheduling construct that has since been implemented in other OS's. Domino uses these constructs to improve the scalability of the server. It allows one thread to handle multiple session requests, so that a Notes client session is no longer bound to a single thread for its duration. The completion port is tied directly to a device handle and any network I/O requests that are made to that handle.

Application Architecture

• Parallel Programming

– Fundamentals of Parallel Programming

• Multi-Process/Multi-Thread

• Asynchronous Procedure Calls

• Signal, Event

• Queuing Asynchronous Procedure Calls

• IOCP

Ex) File Finder Agent

Application Architecture

• Parallel Programming – OpenMP(Open Multi-Processing)

– An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism

– Comprised of three primary API components

• Compiler Directives

• Runtime Library Routines

• Environment Variables

– Portable

– Standardized
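A compiler directive is the most visible of the three components listed above. The following C sketch is an illustration only (`parallel_sum` is a hypothetical function): the `reduction` clause gives each thread a private partial sum and combines the partial sums when the loop ends.

```c
#include <assert.h>
#include <stddef.h>

/* Sums an array. With OpenMP enabled, the loop iterations are divided among
 * threads; without it, the pragma is ignored and the loop runs serially. */
long parallel_sum(const int *data, size_t n) {
    long sum = 0;
    assert(data != NULL || n == 0);
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += data[i];
    }
    return sum;
}
```

The directive takes effect only when the compiler's OpenMP option is switched on (for example `-fopenmp` with GCC). The number of threads can then be set through the `OMP_NUM_THREADS` environment variable, which is an instance of the third component in the list.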

Application Architecture

• Parallel Programming - MPI

– MPI (Message Passing Interface)

• A standard data communication library for message-passing parallel programming

• References

– http://www.mcs.anl.gov/mpi/index.html

– http://www.mpi-forum.org/docs/docs.html

– MPI goals

• Portability

• Efficiency

• Functionality

Application Architecture

• Parallel Programming

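– MPI basic concepts

• Work is assigned per process

• Processor : Process = 1:1 or 1:N

• Message = data + envelope

– Which process sends?

– Where is the data to be sent located?

– What data is sent?

– How much is sent?

– Which process receives?

– Where will the data be stored?

– How much must the receiver be prepared to accept?

• Tag

– Used to match and distinguish messages

– Allows arriving messages to be processed in order

– Wildcards can be used

• Communicator

– A set of processes that are allowed to communicate with one another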

Application Architecture

• Parallel Programming

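– MPI basic concepts

• Process Rank

– An identifier that distinguishes the processes within the same communicator

• Point to Point Communication

– Communication between two processes

– One sending process corresponds to one receiving process

• Collective communication

– Several processes participate at the same time

– 1:N, N:1, and N:N patterns are possible

– Replaces multiple P2P communications with a single collective communication

» Less error-prone, and faster because it is optimized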

Application Architecture

• Java

– Development and execution of Java applications

Application Architecture

• Java

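– How to use the system efficiently with a Java application

• NIO (New I/O)

• NIO pollset

• Leave the garbage collector to collect automatically

• Unless there is a specific reason not to, update the JRE to the latest version

• Keep source code current during development (replace APIs marked Deprecated with other APIs where possible)

• If a framework is used, keep the framework up to date

[Callout: certain to improve]

[Callout: may not improve, depending on the JRE or framework]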

Application Architecture

• Java

– pollset

• Java Source code

– DatagramChannel channel = DatagramChannel.open();

– channel.configureBlocking(false);

– Selector selector = Selector.open();

– channel.register(selector, SelectionKey.OP_READ);

– int poll(struct pollfd fds[], nfds_t nfds, int timeout);

• Native pollset interface C source code

– pollset_t ps = pollset_create(int maxfd);

– int rc = pollset_destroy(pollset_t ps);

– int rc = pollset_ctl(pollset_t ps, struct poll_ctl *pollctl_array, int array_length);

– int nfound = pollset_poll(pollset_t ps, struct pollfd *polldata_array, int array_length, int timeout);

Application Architecture

• Java

– pollset

• Traditional poll method

Application Architecture

• Java

– pollset

• pollset method

Application Architecture

• Java

– pollset

• pollcache internal

– pollcache control block

Application Architecture

• Java

– pollset

• pollset() – bulky update

Application Architecture

• Java

– pollset

• Throughput performance of two drivers (with poll() and with pollset())

– The pollset driver performs 13.3% better than the poll driver

Application Architecture

• Java

– pollset

• Time spent on CPU

Table of Contents

• Architectures for effective computing

– H/W Architecture

– System Architecture

– S/W Architecture

• Software Architecture – OS

• Software Architecture – Application Architecture

– Multi-Process Model

» IPC

– Multi-Thread Model

– Process Scheduling / Context Switching / Cache Hit

– I/O Multiplexing Model

– Event based I/O Model through Real-time Signal

– epoll

– IOCP

– Parallel Programming

– Java

» Java pollset

• AIX I/O Model

– AIX IOCP

AIX I/O Model

• select / poll

• pollset

• event

• Real-time Signal

• AIO

• IOCP

AIX IOCP

• IOCP

– I/O completion port support was first introduced in AIX 4.3 by APAR IY06351. An I/O completion port was originally a Windows NT scheduling construct that has since been implemented in other OS's.

– Software

• DB2

• WebSphere

• Lotus

AIX IOCP

• IOCP

– Synchronous I/O versus asynchronous I/O

AIX IOCP

• IOCP

– What is the Queue?

• An ordered list in which data is inserted at one end and removed at the opposite end

• FIFO (First In, First Out) List

– What is the Stack?

• LIFO (Last In, First Out) List

[Diagram: Queue and Stack]
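The two orderings can be contrasted with a few lines of C. This is a minimal array-based sketch; `CAP`, `struct fifo`, `struct lifo`, and the helper functions are hypothetical names chosen for the illustration.

```c
#include <assert.h>

#define CAP 8  /* hypothetical fixed capacity */

struct fifo { int items[CAP]; int head, count; };  /* queue */
struct lifo { int items[CAP]; int top; };           /* stack */

/* Queue: insert at the tail. */
void fifo_put(struct fifo *q, int v) {
    assert(q->count < CAP);
    q->items[(q->head + q->count) % CAP] = v;
    q->count++;
}

/* Queue: remove from the head, the end opposite to insertion. */
int fifo_get(struct fifo *q) {
    assert(q->count > 0);
    int v = q->items[q->head];
    q->head = (q->head + 1) % CAP;
    q->count--;
    return v;
}

/* Stack: insert and remove at the same end. */
void lifo_push(struct lifo *s, int v) {
    assert(s->top < CAP);
    s->items[s->top++] = v;
}

int lifo_pop(struct lifo *s) {
    assert(s->top > 0);
    return s->items[--s->top];
}
```

Feeding 1, 2, 3 into both structures yields 1 first from the queue and 3 first from the stack.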

AIX IOCP

• IOCP

– What is the IOCP?

[Diagram: the process registers a processing request with the IOCP; when the completion notification arrives, the work is handled in a worker thread]

AIX IOCP

• IOCP

– IOCP Operation

AIX IOCP

• IOCP

– How to configure IOCP on AIX

# smitty iocp

AIX IOCP

• IOCP

– How to configure IOCP on AIX

• fileset : bos.iocp.rte

$ lslpp -l bos.iocp.rte

The output from the lslpp command should be similar to the following :

  Fileset          Level     State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.iocp.rte     5.3.9.0   APPLIED    I/O Completion Ports API

Path: /etc/objrepos
  bos.iocp.rte     5.3.0.50  COMMITTED  I/O Completion Ports API

office2@root/>lsdev -Cc iocp
iocp0 Available I/O Completion Ports

office2@root/>lsattr -El iocp0
autoconfig available STATE to be configured at system restart True

AIX IOCP

• IOCP API

– CreateIoCompletionPort

– GetMultipleCompletionStatus

– GetQueuedCompletionStatus

– PostQueuedCompletionStatus

– AcceptIOCP

– ReadFile

– WriteFile

iSeries IOCP

• IOCP API

– QsoStartAccept

– QsoCreateIOCompletionPort

– QsoDestroyIOCompletionPort

– QsoPostIOCompletion

– QsoStartRecv

– QsoStartSend

– QsoCancelOperation

– QsoWaitForIOCompletion

Windows IOCP

• IOCP API

– CreateIoCompletionPort

– GetQueuedCompletionStatus

– GetQueuedCompletionStatusEx

– PostQueuedCompletionStatus

– ReadFileEx

– WriteFileEx

– Kernel Functions

• NtCreateIoCompletion, NtRemoveIoCompletion

• KeInitializeQueue, KeRemoveQueue

• KeInsertQueue

• KeWaitForSingleObject

• KeDelayExecutionThread

• KiActivateWaiterQueue

• KiUnwaitThread

• NtSetIoCompletion

AIX IOCP

• IOCP

– CreateIoCompletionPort Function

< IOCP on AIX >

#include <iocp.h>
int CreateIoCompletionPort (FileDescriptor, CompletionPort, CompletionKey, ConcurrentThreads)
HANDLE FileDescriptor, CompletionPort;
DWORD CompletionKey, ConcurrentThreads;

< IOCP on Windows >

HANDLE CreateIoCompletionPort (
  HANDLE FileHandle,               // handle to file (socket)
  HANDLE ExistingCompletionPort,   // handle to I/O completion port
  ULONG_PTR CompletionKey,         // completion key
  DWORD NumberOfConcurrentThreads  // number of threads to execute concurrently
);

Table of Contents

• Concerns about Power System

• Summary of the solutions

• Architectures for effective computing

– H/W Architecture

– System Architecture

– S/W Architecture

• Case Study

– DB Connection

– Network Socket Backlog

– Java Process – Signal 11 Received

– Multipage and Compile Option

Case Study 1 – DB Connection

• N:N DB Connection (Multi-Process Model)

[Diagram: the application process fork()s many child processes; each one opens its own connection, and Oracle fork()s a matching child process for every connection]

Connection n:n

DB connections are made n:n, but the fork() calls on the Oracle side waste system resources.

The largest loads in a DB query:

1. DB connect (from the network)

2. DB query parsing

Case Study 1 – DB Connection

• 1:1 DB Connection (Multi-Process Model)

[Diagram: the application process fork()s many child processes, but they share a single connection to one Oracle child process]

Connection 1:1

With a 1:1 DB connection, Oracle's fork() is limited to a single occurrence, so little system resource is wasted, but client connections may not be served smoothly.

The largest loads in a DB query:

1. DB connect (from the network)

2. DB query parsing

Case Study 1 – DB Connection

• DB Connection Pool (Multi-Process Model)

– Thread Pool or Process Pool

[Diagram: the application's child processes reach Oracle through a Connection Pool of threads; the pool holds n:n connections to Oracle child processes]

Connection Pool

Requests are served over connections that were established in advance inside the pool. The pool lends out its resources, and when they run short, pool resources can be allocated dynamically.

The largest loads in a DB query:

1. DB connect (from the network)

2. DB query parsing

Case Study 1 – DB Connection

• DB Connection Pool (Multi-Thread Model)

• Multi-Threading Model (①, ⑤)

• Thread Pool Model for DB Connection (①, ②, ③, ⑥)

[Diagram: client applications connect to threads in the server application, which connect to Oracle child processes; the paths are numbered ① to ⑥. Labels: Pre-Process Model (Process Pool), Pre-Thread Model (Thread Pool)]

Case Study 2 – Network Socket Backlog

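• What are socket connections and the backlog?

[Diagram: socket call sequence between server and client]

Server: socket → bind → listen → accept → read → write

Client: socket → connect → write → read → close

Connection request: client connect → server accept

Data request: client write → server read

Data receipt: server write → client read

int listen (SOCKET s, int backlog);

Maximum length of the socket wait queue = backlog x 1.5 (AIX fudge factor)

fork(): once fork() completes, the server goes back to listen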
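The backlog argument and the 1.5 factor quoted on this slide can be sketched as follows. The factor is taken from the slide, not verified here; `aix_queue_limit` and `open_listener` are hypothetical helper names, and the listener binds to an ephemeral loopback port purely for illustration.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Queue limit as stated on the slide: backlog x 1.5 (integer arithmetic). */
int aix_queue_limit(int backlog) {
    assert(backlog >= 0);
    return backlog + backlog / 2;
}

/* Opens a TCP listening socket on an ephemeral loopback port.
 * Returns the descriptor, or -1 on failure. */
int open_listener(int backlog) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, backlog) != 0) {    /* backlog bounds pending connections */
        close(fd);
        return -1;
    }
    return fd;
}
```

If the server spends a long time between accept() calls, for example inside fork(), connection requests pile up in this queue, and requests beyond the limit may be refused or dropped.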

Case Study 3 – Java Process – Signal 11 Received

• JRE

– Java version : J2RE 1.4.2 IBM AIX build ca142-20060421 (SR5)

– OS Version : AIX 5.3 TL05 CSP

– Heap Config : -Xms500m -Xmx500m

• Cause of thread dump : signal 11 received

– SIGSEGV received at 0x52b1a52c in <unknown>. Processing terminated.

– SIGSEGV raised in libjitc.a

• Cause

– The Java process crashes while running the JIT-compiled version of the java/io/UnixFileSystem.normalize() method in a Java thread

• Solution

– Update to Java 1.4 SR13

– Explicitly instruct the JIT to skip the java/io/UnixFileSystem.normalize() method

• export JITC_COMPILEOPT="SKIP{java/io/UnixFileSystem}{normalize}"

Case Study 4 – Multipage and Compile Option

# svmon -P 852128
-------------------------------------------------------------------------------
     Pid Command      Inuse     Pin   Pgsp  Virtual  64-bit  Mthrd  16MB
  852128 db2sysc     372534   65669      0   371671       Y      N     N

     PageSize     Inuse     Pin   Pgsp  Virtual
     s   4 KB      4521       0      0     3657
     m  64 KB    302477     133      0   302478

    Vsid      Esid  Type  Description             PSize  Inuse    Pin  Pgsp  Virtual
       0         0  work  kernel (lgpg_vsid=0)    L      65536  65536     0    65536
                          Addr Range: 0..65535
  2e845f  78000048  work  default shmat/mmap      m       4096      0     0     4096
                          Addr Range: 0..4095
  1987b1  78000021  work  default shmat/mmap      m       4096      0     0     4096

• Multiple Page Size Application Support

Case Study 4 – Multipage and Compile Option

• Supported page sizes

TLB : translation lookaside buffer CPI : clock cycles per instruction

Page Size   Required Hardware        Requires User Configuration   Restricted   Kernel
4KB         ALL                      No                            No           64 & 32
64KB        IBM POWER5+™ or later    No                            No           64 only
16MB        POWER4™ or later         Yes                           Yes          64 & 32
16GB        POWER5+ or later         Yes                           Yes          64 only

Case Study 4 – Multipage and Compile Option

• To configure the number of 16MB large pages on a system, a system administrator can use the vmo command. The following example configures 1GB of 16MB large pages:

– # vmo -r -o vmm_mpsize_support=1

– # vmo -r -o lgpg_regions=64 -o lgpg_size=16777216

• A user can set an application's preferred page sizes in its XCOFF/XCOFF64 binary via the ldedit or ld commands.

– ld -o mpsize.out -btextpsize:4K -bstackpsize:64K sub1.o sub2.o

– cc -o mpsize.out -btextpsize:4K -bstackpsize:64K sub1.o sub2.o

Region   ld / ldedit option   LDR_CNTRL environment variable   Description
Data     -bdatapsize          DATAPSIZE                        Initialized data, bss, heap
Stack    -bstackpsize         STACKPSIZE                       Initial thread stack
Text     -btextpsize          TEXTPSIZE                        Main executable text
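The region count in the vmo example on this slide follows from simple arithmetic: 64 regions of 16777216 bytes each make 1 GB. A small C sketch of that calculation, with `LGPG_SIZE` and `lgpg_regions_for` as hypothetical names:

```c
#include <assert.h>

#define LGPG_SIZE 16777216ULL  /* 16 MB, the lgpg_size value on the slide */

/* Number of 16 MB regions needed to cover `bytes` of large-page memory,
 * rounded up to a whole region. */
unsigned long long lgpg_regions_for(unsigned long long bytes) {
    assert(bytes > 0);
    return (bytes + LGPG_SIZE - 1) / LGPG_SIZE;
}
```

For 1 GB (1073741824 bytes) this gives 64, matching `lgpg_regions=64` in the example.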

Q & A