Computing for CDF: Status and Requests for 2003
Stefano Belforte, INFN – Trieste
CSN1, 18-sep-02 – Stefano Belforte, INFN Trieste
The CDF-Italy Computing Plan
- Presented on June 24, 2002
- Referees (and CSN1) postponed discussion/approval until November 2002: decide based on experience
- Collecting experience now; no reason to modify the plan so far
- Today:
  - Status report on the analysis farm at FNAL
  - Update on work toward de-centralization: GRID, CNAF
  - Progress toward MOU/MOF
  - Rationale for the 2003 requests
Status of CAF
- FNAL Central Analysis Farm (CAF): a big success so far
  - Easy to use, effective, convenient
- Measures of success:
  - 100% used now
  - Upgrade in progress
  - Many institutions spending their $$$ there
  - Cloning started (Korea)
CDF Central Analysis Farm
- Compile/link/debug everywhere
- Submit from everywhere, execute at FNAL
- Submission of N parallel jobs with a single command
- Access data from CAF disks (now)
- Access tape data via a transparent cache (soon → now)
- Get job output everywhere
- Store small output on a local scratch area for later analysis
- Access the scratch area from everywhere
IT WORKS NOW
[Diagram: CAF architecture at FNAL. The user's desktop ("my favorite computer") submits N jobs through a gateway to a pile of PCs behind a switch; local data servers feed the jobs via NFS and rootd; job logs and output come back via ftp; small outputs go to a scratch server, also accessible remotely via rootd.]
Tape to Disk to CPU
[Plot: data read per day, up to ~2 TB/day, split into "from disk" and "from tape", vs. days in September 2002.]
"Spec from the 2000 review": the disk cache should satisfy 80% of all data requests.
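The spec can be checked directly against the daily volumes behind the plot: the cache-hit fraction is just the disk-read volume over the total. A minimal sketch (the daily volumes below are illustrative placeholders, not the measured September numbers):

```python
# Fraction of data requests served by the disk cache rather than the tape robot.
# The example volumes are made up for illustration, not CDF measurements.
def cache_hit_fraction(tb_from_disk, tb_from_tape):
    total = tb_from_disk + tb_from_tape
    return tb_from_disk / total if total else 0.0

# e.g. 1.8 TB/day from disk, 0.2 TB/day from tape:
print(f"{cache_hit_fraction(1.8, 0.2):.0%}")  # 90% -> meets the 80% spec
```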
CAF promise fulfilled
- Giorgio Chiarelli runs 100-section jobs and integrates 120×7×24×3% ≈ 600 CPU hours in a few days, using up to more than half of the full CAF at the same time
- Goes through 1 TB of data in a few hours
- All of this with a single few-line script that automatically divides the input among the job sections
Made in Italy
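As a sanity check on "1 TB in a few hours", the aggregate read rate the farm must sustain, taking three hours as an assumed value for "a few" (my assumption, not a number from the slides):

```python
# Aggregate I/O rate needed to scan 1 TB of data in 3 hours (assumed duration).
data_bytes = 1e12                              # 1 TB
hours = 3.0                                    # assumption for "a few hours"
rate_mb_s = data_bytes / (hours * 3600) / 1e6  # MB/s summed over all sections
print(f"{rate_mb_s:.0f} MB/s")                 # ~93 MB/s aggregate
```

Spread over ~100 parallel sections this is about 1 MB/s per job, which is what makes the tuned disk access (striping, rootd) matter.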
Monitoring jobs and sections on the Web
Made in Italy
Managing users' areas on the CAF, O(100 GB)
Made in Italy
CAF this summer
- CAF stage 1 saved the day for the summer conferences: 61 duals (10 INFN, 16 Pitt/CMU), 15 fileservers (4 INFN, 1 MIT)
- CPU usage ~90% since June
- Users happy
Made in Italy
CAF today
- Wait times are getting longer; users want more
- Ready for Stage 2: new hardware ready this fall, in time for the ski conferences
Made in Italy
CAF Stage 2 (Stage 1 × 4)
- FNAL/CD centralized bid ~two times/year; CDF procurement for Stage 2 this summer
- Just in time to catch the INFN funds released in June (×3); bids are in
- Hope for hardware up and running in November (CSN1 funds → users in ~6 months)
- Many others will join the CAF in Stage 2:
  - KEK-Japan: 2 fileservers, 38 duals
  - Korea: 0.5 fileserver (+2 later)
  - Spain: 1 fileserver
  - Canada: 1 fileserver
  - US (8 universities): 10 fileservers, 4 duals
  - More to come
Why is CAF a success?
- CAF is more than a pile of PCs: an integrated hw/sw design for the farm and its access tools
- Designed for optimized access to data:
  - Lots of disk-resident data
  - Large transparent disk cache in front of the tape robot
  - Tuning of disk access (data striping, minimal NFS, …)
- Designed for users' convenience: simple GUIs, Kerberos-based authentication, large local user areas
- Professional system management and a closed loop with vendors:
  - Several hw/firmware/sw problems solved so far: RAID controllers, defective RAM, file system or kernel bugs, …
  - Plus the normal failure rate of disks, power supplies, etc.
- 2 FTE on CAF infrastructure
Will CAF success last?
- User community ramping up in these days: 20 → 200
  - From the pioneers to the masses: exposure to all kinds of access patterns
- Hardware expansion: up to a factor of 10 over the next 2 years
- Only experience will tell: the CAF is built with the cheapest hardware
  - Will have to learn to live with 10-20% of the hardware broken at any given time
Beyond CAF
- FERMILAB wants to join the GRID; FNAL will be a Tier1 for CMS-US
- Foreign CDF institutions want to integrate their local farms: Spain, Korea, UK, Germany, Canada, Italy
  - In many cases to exploit LHC/GRID hardware
- So far no big offer of help for common work, unlike D0
  - Exception: Canada, 224 nodes "now" for CDF MC
- No software tool yet to do this integration "transparently"
- Not clear how much this will help CDF analysis
Decentralizing analysis computing
- FNAL-CD is working hard to promote SAM for remote work
  - SAM: a metadata catalog + distributed disk caches
  - Run analysis locally, copy data as needed (only the first time)
  - Works in Trieste (as at other places)
- SAM is to become "the" CDF data access tool; SAM integration with the (EuroData)GRID is being tried
- CDF is working on "packaging the CAF for export": decentralized CAFs, each handling data independently
  - Cloning the FNAL CAF is the easiest way (Korea's choice)
- Remote farms = extra costs for FNAL
CDF computing outside the US (approx.)

              2002          2003
Site          TB    duals   TB     duals   Notes
Spain         -     -       10     50      Shared with CMS, plan for EDG tools. No plan for shared access.
Germany       3+1   20+10   20+20  50+40   Tier1 (shared with LHC) + Tier3 (CDF). No plan for shared access. Testing SAM on the Tier3.
UK (4 sites)  24    16      80     64      Maybe 5x the CPU if 8-way duals. No EDG; Kerberos for user access, SAM for data; maybe open.
Korea         1     20      7      40      Want to clone the CAF by end of 2002. Kerberos for user access, open to all. Start w/o SAM.
Canada        1     8       28     224     No GRID tools. Runs official CDF MC and copies it to FNAL.
Italy         1     5       7      29      No plan for shared access. Exploring SAM on a single node.
MOU/MOF
- Moving toward a way to recognize foreign contributions
  - IFC and Scrutiny Group to work on this; INFN is present in both
- Issues being talked about:
  - Computing will have to enter the MOF somehow
  - Allow and encourage contributions
  - Take into account history and the present situation
- No indication of a "crisis" that has to be dumped on the collaborators for help
2003 requests detailed: 5 items
Stick to the June plan:
1) Invest the majority of resources in the FNAL CAF
2) Modest growth in Italy for interactive work
- Summer experience: needs do not scale down with luminosity
  - No reason to expect large variations from the June numbers
  - Requested resources well within the June forecast
  - Nevertheless, a prudent, incremental approach (referees)
New in 2003:
3) Start MC
4) Interactive work at FNAL
5) Start the transition to CNAF
Tevatron keeps us busy
- By next summer, tune the analysis to the same level as Run 1: alignments, precision tracking, secondary vertexes, b-tag, jet energy corrections, underlying event
- Do interesting physics in the meanwhile
- Example: the all-Italian Dhh analysis. By end of year (100 pb-1):
  - 10^6 events in the mass peak, 10^7 in the histogram
  - 4 TB of data by spring, 16 TB by end of 2003: this channel alone saturates the disk financed so far (15 TB)
  - A learning field for Bhh
Monte Carlo
- CDF has talked about central production, but there is no overall estimate of the needs yet
- Next year the safe bet is everybody on his/her own, just the same as in Run 1
- Italian groups are starting on this now
  - Plan for a capacity of 10^7 events/month
  - Modest hw need: 10 dual-CPU nodes
  - Adequate for most analyses (10× a given dataset); future growth should be small
- Further requests only on the basis of clear "cases"
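A back-of-envelope check that 10 dual-CPU nodes can indeed deliver 10^7 events/month, assuming (my assumptions, not the slide's) a 30-day month and 100% duty cycle:

```python
# CPU time available per MC event with the requested hardware.
# Assumptions: 30-day month, 100% duty cycle (both illustrative).
nodes = 10                              # dual-CPU nodes requested
cpus = 2 * nodes
seconds_per_month = 30 * 24 * 3600
cpu_seconds = cpus * seconds_per_month  # total CPU budget per month
events = 1e7                            # target production per month
budget_per_event = cpu_seconds / events
print(f"{budget_per_event:.1f} CPU s/event")  # 5.2 CPU s/event
```

So the plan holds as long as generation plus simulation stays around a few CPU seconds per event on these nodes.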
Interactive work at FNAL
- When at FNAL one cannot run root on Italy's machines: need "some" "better than desktop" PCs (cf. June's talk)
- Referees asked for central management: a defined total cap of 10 "power PCs"; asked for 5 in 2003
- 4 full-time physicists doing analysis at FNAL: P. Azzi, R. Carosi, S. Giagu, M. Rescigno
- Explore a central alternative in 2003: an interactive login pool in the CAF; some ideas so far, will try and see
Moving CAF to CNAF
PROs:
- Spend money in Italy
- Join the INFN effort in building a world-class computing center
- Easier access to 3rd data and/or interactive resources (GARR vs WAN)
- Tap the GRID/LHC hardware pool for peak needs
- Import here the tools and experience learned on the CAF
CONs:
- Not an "experiment need": the FNAL CAF may be enough
- Costs more
- Poor access to the main data repository (the FNAL tapes)
- Need to replicate the ease of use and operation of the FNAL CAF
- Different hardware = different problems
- Have to divert time and effort from data analysis
Moving CAF to CNAF: the proposal
- Start with limited but significant hardware in 2003 at CNAF: ½ of the private share of the CAF in 2002
  - 7 TB of disk and 29 dual processors, estimated on the basis of the expected data needs for top6j and Zbbar
- Explore the effectiveness of the work environment
  - Don't give up on CAF features; look for added value; will need help (manpower)
- Will try and see: the decision to leave FNAL will have to be based on proof of the existence of a valid alternative here
Summary of requests
ANALYSIS FARM

June 24 "plan":
year   disk (TB)   CPU (duals)   cost/y (KEuro)
2001   0.6         -             0
2002   20          80            336
2003   40          140           266

After CSN1's June decision:
year   disk (TB)   CPU (duals)   cost/y (KEuro)
2001   0.6         -             0
2002   15          60            263
2003   37          130           305

[Luminosity chart (fb-1): "Planned (Church)" reaching ~2.0, vs "Target (adjusted)": 2001 commissioning, ~0.3 by 2002, ~1.2 by 2003.]
Analysis at FNAL:
- FNAL CAF: 22 TB disk + 63 dual nodes = 132 + 173 = 305 KEu
- Monte Carlo: 10 dual nodes = 28 KEu (FNAL price)
- CNAF: 7 TB disk + 29 dual nodes = 70 + 96 = 166 KEu
- Interactive FNAL: 5 "power PCs" = 22.5 KEu
- Interactive Italy: disk and cpu for Pd/Pi/Rm/Ts/… = 50 KEu total
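Adding up the request items as listed (using the component figures, e.g. 132 + 173 KEu for the CAF) gives the overall 2003 envelope; a quick tally:

```python
# Sum of the 2003 request items listed above, in KEuro.
items = {
    "FNAL CAF (22 TB disk + 63 duals)": 132 + 173,
    "Monte Carlo (10 duals)": 28,
    "CNAF (7 TB disk + 29 duals)": 70 + 96,
    "Interactive FNAL (5 power PCs)": 22.5,
    "Interactive Italy (Pd/Pi/Rm/Ts/...)": 50,
}
total = sum(items.values())
print(f"total: {total} KEu")  # total: 571.5 KEu
```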
SPARE
Spare slides from here on
Working on CDF CAF is easy
1. Pick a dataset by name
2. Decide how many parallel execution threads (sections)
3. Prepare 1 executable, 1 tcl and 1 script file
- Submit from anywhere via a simple GUI
- Query the CAF status at any time via the web monitor
- Retrieve logs/data anywhere via a simple GUI
Two-step submission of 100 sections:
1) In the script:
     setenv TOT_SECT 100
     @ section = $1 - 1
     setenv CAF_SECTION $section
2) In the tcl file (only one tcl file):
     module talk DHInput
     include dataset bhmu03
     setInput cache=DCACHE
     splitInput slots=$env(TOT_SECT) this=$env(CAF_SECTION)
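The splitInput directive partitions the dataset's file list among the sections, each section keeping only its own share. A minimal sketch of one such partitioning scheme (round-robin; the file names are hypothetical, and this is an illustration, not the actual DHInput implementation):

```python
def split_input(files, slots, this):
    """Return the share of `files` for section `this` (0-based),
    splitting round-robin among `slots` parallel sections."""
    return [f for i, f in enumerate(files) if i % slots == this]

# Hypothetical 10-file dataset split among 4 sections:
files = [f"bhmu03_{i:02d}.root" for i in range(10)]
shares = [split_input(files, 4, s) for s in range(4)]
print([len(s) for s in shares])  # [3, 3, 2, 2]
# Every file is assigned to exactly one section:
assert sorted(sum(shares, [])) == sorted(files)
```

Each CAF section runs the same executable and tcl; only the CAF_SECTION value differs, so the split needs no central coordination.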
Working on CAF is effective
- Quickly go through any CDF dataset (on disk or tape)
- Create personalized output and store it locally
- Run on that output (data file or root ntuple): locally on CAF nodes, or remotely via rootd (e.g. Root from your desktop)
CAF is convenient: can work from anywhere
- All the code and tools needed for CDF offline are available via anonymous ftp, or simply from /afs/infn.it
- Everything runs on plain RedHat 6.x, 7.x (even on the GRID testbed): no need for a customized system install
- A Kerberos ticket is needed to talk to FNAL, but:
  - One-click install of the Kerberos client from the web: no need for a system manager
  - Just type "kinit" and your Fermilab password
- Many people work from their laptops!
CAF future
Little data? No way!
- The DAQ runs at full speed; typical luminosity better than in Run 1
- The 2-track trigger from SVT is full of charm
- We are refocusing attention on samples that in the default scenario would have been limited in statistics: low-Pt jets (20 GeV) and leptons (8 GeV), charm
- Interesting for physics:
  - Improve on the PDG in the charm sector
  - Fundamental control samples
  - Particle ID on Dhh as a learning field for Bhh
  - Heavy-flavor content in jets, b-jet tagging, jet resolution, …