PHENIX Computing
Sangsu Ryu
Goals of the RHIC PHENIX Experiment
According to QCD, quarks cannot exist in isolation; they combine with other quarks into color-singlet hadrons => color confinement.
Lattice QCD predicts that nuclear matter under extreme conditions undergoes a phase transition to a deconfined quark-gluon plasma (QGP) state.
For the first few μs after the Big Bang, the matter of the early universe was in the quark-gluon plasma state => a laboratory re-creation of the early universe.
RHIC Configuration: two concentric superconducting magnet rings (3.8 km circumference) with 6 interaction regions
Ion beams: Au + Au (or p + A), √s = 200 GeV/nucleon, luminosity = 2 × 10^26 cm^-2 s^-1
Polarized protons: p + p, √s = 500 GeV, luminosity = 1.4 × 10^31 cm^-2 s^-1
Experiments: PHENIX, STAR, PHOBOS, BRAHMS
PHENIX Experiment
Physics Goals: Search for Quark-Gluon Plasma, Hard Scattering Processes, Spin Physics
Experimental Apparatus: PHENIX Central Arms (e, γ, hadrons), PHENIX Muon Arms (μ)
MVD (Multiplicity Vertex Detector)
• MVD
– Charged particle multiplicity
– Event characterization
– Reconstruction of the collision vertex with 100 μm accuracy
– dNch/dηdΦ
• Silicon Detectors
– Strips: 200 μm pitch, 64 cm long
– Pads at z = ±35 cm
– High granularity with 34,816 channels
MVD Silicon Detector
[Figure: layout of the MVD silicon pads and strips]
MVD Software
• Offline software
– Charged particle multiplicity
– dNch/dη
– Reconstruction of the collision vertex
• Simulation (GEANT3)
– Detector response
• Online monitor
PHENIX Data Size
Peak DAQ bandwidth in PHENIX is 20 MB/sec.
Ion Beams (Au + Au)
1. Minimum bias events (0.16 MB/event): raw event rate = 1400 Hz (224 MB/sec), trigger rate = 124 Hz (20 MB/sec)
2. Central events (0.4 MB/event): trigger rate = 50 Hz (20 MB/sec)
Polarized proton (p + p)
All events (25 KB/event): raw event rate = 250 kHz (6250 MB/sec), trigger rate = 800 Hz (20 MB/sec)
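These trigger rates are simply the 20 MB/sec DAQ bandwidth divided by the event size. A minimal sketch of that arithmetic (illustrative only, not PHENIX DAQ code):

    #include <cstdio>

    // Maximum trigger rate (Hz) that fits in a given DAQ bandwidth.
    double max_trigger_rate(double bandwidth_mb_s, double event_size_mb) {
        return bandwidth_mb_s / event_size_mb;
    }

    int main() {
        const double daq_bw = 20.0;  // MB/sec, peak PHENIX DAQ bandwidth
        std::printf("Au+Au min. bias (0.16 MB/event): %.0f Hz\n", max_trigger_rate(daq_bw, 0.16));   // ~125 Hz
        std::printf("Au+Au central (0.4 MB/event):    %.0f Hz\n", max_trigger_rate(daq_bw, 0.4));    // 50 Hz
        std::printf("p+p (0.025 MB/event):            %.0f Hz\n", max_trigger_rate(daq_bw, 0.025));  // 800 Hz
    }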
PHENIX Computing Software
• Simulation (PISA)
– GEANT3 (in Fortran)
• Reconstruction
– PHOOL (modified ROOT) framework, in C++
• Online monitor
• Software management
– CVS
• Rebuild
– Daily rebuild
• Batch server
– LSF
• Event display
• Database
– Calibration: Objectivity, PostgreSQL
– Data carousel: PostgreSQL
RCF
RHIC Computing Facility (RCF) provides computing facilities for four RHIC experiments (PHENIX, STAR, PHOBOS, BRAHMS).
Typically RCF receives ~30 MB/sec (a few TB/day) from the PHENIX counting house alone over a Gigabit network, so RCF must operate sophisticated data storage and data handling systems.
RCF has established an AFS cell for sharing files with remote institutions, and NFS is the primary means through which data is made available to users at RCF.
A similar facility has been established at RIKEN (CC-J) as a regional computing center for PHENIX.
A compact but effective system is also installed at Yonsei.
PHENIX Computing System
• Linux OS with ROOT
• C++ class library (PHOOL) built on top of ROOT
• GNU build system
• Database (Objectivity OODB, PostgreSQL) for calibration data, file catalog, run info, etc. (~100 GB of calibration data per year)

[Figure: PHENIX computing system data flow. Raw data from the Counting House is written to HPSS and a big disk; mining and staging feed raw data to the Reconstruction Farm; the resulting DST data lands on a pretty big disk and local disks for analysis jobs; the database and Tag DB supply calibrations and run info throughout.]
Data Carousel using HPSS
To handle an annual data volume of 500 TB from PHENIX alone, the High Performance Storage System (HPSS) is used as a hierarchical storage system combining tape robotics and disk.
An IBM server (AIX 4.2) organizes user retrieval requests so that tape access stays orderly.
PHENIX uses ten 9840 and eight 9940 tape drives from STK.
The tape media cost about $1/GB.
[Figure: Data carousel architecture. The "ORNL" carousel server software, driven by file lists in a mySQL database, stages files from HPSS tape to the HPSS cache; pftp clients on rmine0x and the CAS then act as data movers, writing to CAS local disks and NFS disks.]
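The point of the carousel is to batch pending file requests by tape volume, so each tape is mounted once and read sequentially instead of being remounted for every interleaved user request. A schematic sketch of that batching step (illustrative only; the Request fields and file names are assumptions, not the actual ORNL carousel code):

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // One user request for a file stored on tape (fields are illustrative).
    struct Request {
        std::string file;  // file name in the HPSS catalog
        std::string tape;  // tape volume holding the file
    };

    // Group requests by tape so each tape is mounted only once.
    std::map<std::string, std::vector<std::string>>
    batch_by_tape(const std::vector<Request>& pending) {
        std::map<std::string, std::vector<std::string>> plan;
        for (const auto& r : pending) plan[r.tape].push_back(r.file);
        return plan;
    }

    int main() {
        const std::vector<Request> pending = {
            {"run123.prdf", "VOL001"}, {"run456.prdf", "VOL002"}, {"run124.prdf", "VOL001"},
        };
        for (const auto& [tape, files] : batch_by_tape(pending)) {
            std::printf("mount %s: %zu file(s)\n", tape.c_str(), files.size());
            // The real carousel would now pftp each file from the HPSS cache to a CAS data mover.
        }
    }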
Disk Storage at RCF
The storage resources are provided by a group of Sun NFS servers with 60 TB of SAN-based RAID arrays, backed by a series of StorageTek tape libraries managed by HPSS.
Vendors of storage disks are Data Direct, MTI, ZZYZX, and LSI.
Linux Farms at RCF
CRS (Central Reconstruction Server) farms are dedicated to processing raw event data into reconstructed events (strictly batch systems, not open to general users).
CAS (Central Analysis Server) farms are dedicated to the analysis of the reconstructed events (a mix of interactive and batch systems).
LSF, the Load Sharing Facility, manages batch jobs.
There are about 600 machines (dual CPU, 1 GB memory, 30 GB local disks) at RCF, of which about 200 are allocated to PHENIX.
Offline Software Technology
• Analysis framework: C++ class library (PHOOL) built on top of ROOT (see the sketch after this list)
– base class for analysis modules
– "tree" structure for holding and organizing data; can contain raw data, DSTs, transient results
– uses ROOT I/O
• Database: Objectivity OODB for calibration data, file catalog, run info, etc.
– expecting ~100 GB/year of calibration data
• Code development environment: based heavily on GNU tools (autoconf, automake, libtool)
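To make the module idea concrete, here is a minimal sketch of an analysis module writing a ROOT tree. It is illustrative only: the AnalysisModule and MultiplicityModule classes and the process_event name are assumptions, not the actual PHOOL API; only the ROOT classes (TFile, TTree) are real.

    #include <TFile.h>
    #include <TTree.h>

    // Illustrative stand-in for a framework module base class;
    // the real PHOOL base class and method names are not reproduced here.
    class AnalysisModule {
    public:
        virtual ~AnalysisModule() = default;
        virtual int process_event() = 0;  // called once per event
    };

    // A trivial module that fills a ROOT tree, showing the "uses ROOT I/O" point.
    class MultiplicityModule : public AnalysisModule {
    public:
        explicit MultiplicityModule(TTree* t) : tree_(t) {
            tree_->Branch("nch", &nch_, "nch/I");  // one integer branch
        }
        int process_event() override {
            nch_ = 42;  // a real module would count reconstructed charged tracks
            tree_->Fill();
            return 0;
        }
    private:
        TTree* tree_;
        int nch_ = 0;
    };

    int main() {
        TFile out("dst.root", "RECREATE");    // DST-style output file
        TTree tree("T", "per-event summary");
        MultiplicityModule mod(&tree);
        for (int i = 0; i < 10; ++i) mod.process_event();  // stand-in event loop
        tree.Write();
        out.Close();
    }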
PHENIX CC-J
The PHENIX CC-J at RIKEN is intended to serve as the main site of computing for PHENIX simulations, a regional computing center for PHENIX in Asia, and a center for spin physics analysis.
Exchanging data between RCF and CC-J requires adequate WAN bandwidth between the two sites.
CC-J has CPU farms of 10K SPECint95, tape storage of 100 TB, disk storage of 15 TB, tape I/O of 100 MB/sec, disk I/O of 600 MB/sec, and 6 Sun SMP data server units.
A comparable mirror image is maintained at Yonsei by an "explicit" copy of the remote system.
Usage of the local cluster machines: similar operating environment (same OS, similar hardware specs); 1. disk sharing through NFS, with one installation of the analysis library shared by the other machines; 2. easy upgrade and management
Local clustering: ample network bandwidth between the cluster machines over 100 Mbps links; current number of cluster machines = 2 (2 CPUs) + 2 (as RAID)
File transfers from RCF: software updates by copying shared libraries (once per week, takes less than about an hour); raw data copied using scp or BBFTP (~1 GB/day)
Situation at Yonsei
Yonsei Computing Resources
Yonsei Linux boxes for PHENIX analysis:
• 4 desktop boxes behind a firewall (Pentium III/IV), running Linux (RedHat 7.3, kernel 2.4.18-3, GCC 2.95.3) and ROOT 3.01/05
• One machine has all software required for PHENIX analysis (event generation, reconstruction, analysis); the remaining desktops share one library directory via NFS
• 2 large RAID disk boxes with several IDE HDDs (~500 GB × 2) and several small disks (total ~500 GB) in 2 desktops
• A compact but effective system for a small user group
[Figure: Yonsei computing resources. Linux (RedHat 7.3, kernel 2.4.18-3, GCC 2.95.3) with ROOT 3.01/05; behind a firewall and gateway on 100 Mbps links sit a P4 2 GHz box with a 480 GB disk, two P3 1 GHz boxes, and a P4 1.3 GHz box with a 480 GB RAID disk (big disk: 480 GB × 2, RAID tools for Linux); the PHENIX library is shared via NFS along with Objectivity, and the database serves calibrations, run info, and the Tag DB to reconstruction and analysis jobs on raw data and DSTs.]
Belle Monte-Carlo Production
• Goals
– Monte Carlo event production
– Analysis
• Computing System
– Gateway and firewall
– 24 CPUs, ~5 TB disks
– 1 Gbps intra network
– Servers and clients
– Library, database, storage
[Figure: Belle MC cluster network. The Internet reaches a masquerading firewall/gateway (eth0/eth1) over a 1 Gbps fiber link; behind it, a 1 Gbps switching hub connects the Belle library server, Belle database server, system server, and the analysis servers to the intranet over 1 Gbps copper links.]
Hardware
Item                  | Quantity | Model               | Specification
Computer              | 24       | Dell Dimension 3000 | P4 2.8 GHz (total 67 GHz), 512 MB (333 MHz) memory, 160 GB HDD (total 3.8 TB), 3-year warranty
Gigabit Fiber NIC     | 1        | 3Com 3C-996SX       | 1 Gbps, optical fiber, 32/64-bit, 33/66 MHz PCI
Gigabit Copper NIC    | 24       | 3Com 3C-2000T       | 1 Gbps, copper
24-port Switching Hub | 1        | 3Com 3C16479        | 1 Gbps, copper
KVM                   | 1        | ATEN CS-9138        | 8 ports
Belle Cluster Servers
• Automated OS installation
– DHCP/BOOTP
– TFTP
– NFS or FTP
• System administration
– NIS for user account management
– NFS for file sharing
• Belle library
– shared via NFS
• Data files
– shared via NFS
• Belle database
– PostgreSQL server
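A client would talk to the Belle PostgreSQL server through the standard libpq interface; a minimal sketch (the host, database name, and mc_production table below are hypothetical, not taken from these slides):

    #include <libpq-fe.h>
    #include <cstdio>

    int main() {
        // Hypothetical connection parameters; the actual server settings are not given above.
        PGconn* conn = PQconnectdb("host=belledb dbname=belle user=belle");
        if (PQstatus(conn) != CONNECTION_OK) {
            std::fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }
        // Query a hypothetical table of MC production runs.
        PGresult* res = PQexec(conn, "SELECT run_number, n_events FROM mc_production LIMIT 5;");
        if (PQresultStatus(res) == PGRES_TUPLES_OK) {
            for (int i = 0; i < PQntuples(res); ++i)
                std::printf("run %s: %s events\n", PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
        }
        PQclear(res);
        PQfinish(conn);
        return 0;
    }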