KISTI-GSDC SITE REPORT
Sang-Un Ahn, Jin Kim
On behalf of KISTI GSDC
24 March 2015
HEPiX Spring 2015 Workshop
Oxford University, Oxford, UK
CONTENTS
• KISTI GSDC Overview
• Tier-1 operations
• Summary
KISTI GSDC OVERVIEW
KISTI Location
Map of South Korea: Seoul, Daejeon (KISTI), Daegu, Busan, Gwangju, Jeju Island; travel time shown: 2h by car
Daedeok R&D Innopolis
• 30 Government Research Institutes
• 11 Public Research Institutes
• 29 Non-profit Organizations
• 7 Universities
• Rare Isotope Accelerator (to be constructed)
KISTI GSDC
KISTI (Korea Institute of Science and Technology Information)
• Government-funded research institute for IT, founded in 1962
• 600 people working on the National Information Service (distribution & analysis), supercomputing and networking
• Operating supercomputing and NREN infrastructure
  • Supercomputer: 307.4 TFlops at peak (ranked 14th in the Top500 in 2009; 201st now)
  • NREN infrastructure: KREONet2
    • Domestic: Seoul ←(100G)→ Daejeon
    • International: HK ←(10G)→ Chicago/Seattle (member of GLORIAD)
GSDC (Global Science experiment Data hub Center)
• Government-funded project to promote research experiments by providing computing power and storage
  • HEP: ALICE, CMS, Belle, RENO
  • Others: LIGO, Bioinformatics
• Running a data-intensive computing facility (GSDC Facility)
  • 18 staff: sysadmin, experiment support, external relations, administration
  • Total 5,500 cores, 4,000 TB disk and 1,500 TB tape storage
History of GSDC
• 7 years of experience running a grid computing centre in collaboration with the ALICE experiment and WLCG
• Timeline axis: 2007, 2009, 2010, 2011, 2012, 2013, 2014
• Timeline milestones: ALICE T2 test-bed; ALICE T2 operation start; formation of GSDC; ALICE T1 test-bed; KISTI Analysis Facility; ALICE T1 candidate; full T1 for ALICE; CMS T3
GSDC System Overview
• Batch systems
  • Torque/MAUI: 3,000 slots (ALICE T1, Belle, RENO)
  • HTCondor: 2,000 slots (CMS T3, LIGO, KIAF)
• Diagram labels: Public, Private
• Storage (capacities shown in the diagram: 1.5 PB, 1.5 PB, 2.5 PB)
  • IBM TSM/GPFS
  • HITACHI USP/VSP, EMC
  • HITACHI HNAS, EMC Isilon
• 4 spine switches, 74 leaf switches
• 500+ servers in 22 racks, 14 storage racks, 4 tape frames
• 40 racks in total
System Management
• Services are defined in Puppet (manifests, profiles)
• Stash is used for Puppet code management
• Nodes are created/provisioned via Foreman with Puppet classes
• Any VMs are managed by the Red Hat solution
• Centralized authN/authZ is provided via IPA (SSO to be implemented)
• JIRA helps to track issues and manage projects
• Confluence is used for documentation and sharing
Diagram labels: project and issue tracking; Puppet code management (via Git); documentation & space; node definition and provisioning; manifests and profiles; v3.7.4
TIER-1 OPERATIONS
Jobs
Chart label: KISTI, 4.88%
Plot, Aug 2014 to Feb 2015: ~2,500 jobs, ~100 queued agents
Plot annotations:
• Proxy failure due to KISTI-CERN network down; automatic backup routing established afterwards
• Linux kernel security patch
• 2,688 concurrent jobs = 28 kHS06
  • 84 nodes, 32 (logical) cores per node, 10.5 HS06/core
  • 2015 pledges
• Stable and smooth running
  • KISTI-CERN 2G link down in September
  • Linux kernel security patch before Christmas 2014
• Completed 2.3M jobs in the last 6 months
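The capacity figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (84 nodes, 32 logical cores per node, 10.5 HS06 per core) and assumes one job slot per logical core; the variable names are illustrative.

```python
# Figures quoted on the slide
nodes = 84
logical_cores_per_node = 32
hs06_per_core = 10.5

# Assumption: one job slot per logical core
job_slots = nodes * logical_cores_per_node
total_hs06 = job_slots * hs06_per_core

print(f"{job_slots} slots, {total_hs06 / 1000:.1f} kHS06")
```

Under that assumption the slot count matches the 2,688 concurrent jobs, and the total rounds to the 28 kHS06 quoted.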
Storage
• Disk: 1,000 TB
  • Usage > 50%
  • Managed by XRootD
• Tape: 1,500 TB
  • 310 TB RAW data (p-Pb from ALICE)
  • Available tape buffer = 400 TB
  • Raw data on tape buffer for fast access
  • Managed by XRootD
Plots: 99% availability (last 6 months) for R/W; 3-year history of KISTI_GSDC::SE2, Apr 2012 to Feb 2015
Site Availability/Reliability
Monthly Availability/Reliability (%)
              Sep  Oct  Nov  Dec  Jan  Feb
Reliability   100  100  100  100  100  100
Availability  100  100  100   99  100  100
• 100% reliable for the last 6 months (Sep 2014 to Feb 2015)
  • Monthly reliability target for the ALICE test: 97%
• On track for a stable and reliable site
  • Participating in the WLCG operations meetings twice a week (Mon/Thu), reporting operation-related issues
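A month can show 99% availability alongside 100% reliability because scheduled downtime counts against availability but is excluded from reliability. The sketch below follows the usual WLCG-style definitions as we understand them (availability = uptime / (total − unknown), reliability = uptime / (total − scheduled downtime − unknown)). The function names and the 8-hour scheduled downtime are hypothetical, chosen only to illustrate the effect; they are not taken from the site's records.

```python
def availability(up_h, total_h, unknown_h=0.0):
    """Percentage of known time during which the site was up."""
    return 100.0 * up_h / (total_h - unknown_h)


def reliability(up_h, total_h, scheduled_down_h=0.0, unknown_h=0.0):
    """As availability, but scheduled downtime is excluded from the denominator."""
    return 100.0 * up_h / (total_h - scheduled_down_h - unknown_h)


# Hypothetical month: 31 days with 8 hours of scheduled downtime (assumed value)
total_h = 31 * 24
scheduled_down_h = 8.0
up_h = total_h - scheduled_down_h

print(round(availability(up_h, total_h)))                   # 99
print(round(reliability(up_h, total_h, scheduled_down_h)))  # 100
```

With these assumed inputs the month rounds to 99% available and 100% reliable, the same pattern as the December column.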
Plan
• Additional resources will be procured
  • ~900 CPU cores (Ivy Bridge)
  • ~700 TB of disk (NAS/SAN)
• 2016 pledges (31k HS06) for ALICE will be made by the end of this year
• Elasticsearch-Logstash-Kibana will be deployed to monitor the whole system
KISTI-CERN Network
10Gbps Upgrade
As-Is: 2G KREONET2 + 2G SURFnet
To-Be: Dedicated circuit 10G + 10G SURFnet
The contracted provider will allocate the dedicated 10G circuit.
31 April
Thank you