www.chameleoncloud.org
THE MANY COLORS OF CHAMELEON
Kate Keahey
Mathematics and CS Division, Argonne National Laboratory
CASE, University of Chicago
keahey@anl.gov
February 6, 2019, Chameleon User Meeting
CHAMELEON IN A NUTSHELL
- We like to change: testbed that adapts itself to your experimental needs
  - Deep reconfigurability (bare metal) and isolation (CHI), but also ease of use (KVM)
  - CHI: power on/off, reboot, custom kernel, serial console access, etc.
- We want to be all things to all people: balancing large-scale and diverse
  - Large-scale: large homogenous partition (~15,000 cores), 5 PB of storage distributed over 2 sites (now +1!) connected with 100G network…
  - …and diverse: ARMs, Atoms, FPGAs, GPUs, Corsa switches, etc.
- We want to last: cost-effective to deploy, operate, and enhance
  - Powered by OpenStack with bare metal reconfiguration (Ironic)
  - Chameleon team contribution recognized as official OpenStack component
- We live to serve: open, production testbed for Computer Science Research
  - Started in 10/2014, testbed available since 07/2015, renewed in 10/2017
  - Currently ~3,000 users, ~500 projects, ~100 institutions
CHAMELEON HARDWARE
[Diagram: Chameleon hardware at the Chicago and Austin sites]
- Chameleon core network: 100 Gbps uplink to public network (each site)
- Core services: 3.5 PB storage system; 0.5 PB storage system
- Heterogeneous cloud units: GPUs (K80, M40, P100), FPGAs, NVMe, SSDs, IB, ARM, Atom, low-power Xeon
- Haswell standard cloud units (42 compute, 4 storage each): x2 and x10
- SkyLake standard cloud units (32 compute, Corsa switch each): x2 and x1
- GENI and other partners
- Chameleon Associate Site: Northwestern
EXPERIMENTAL WORKFLOW
- Discover resources: fine-grained, complete, up-to-date, versioned, verifiable
- Allocate resources: advance reservations, on-demand, isolation, across resource types
- Configure and interact: deeply reconfigurable, appliance catalog, snapshotting, complex appliances, network isolation
- Monitor: hardware metrics, fine-grained data, aggregate, archive
CHI = 65%*OpenStack + 10%*G5K + 25%*“special sauce”
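The allocation step mixes advance reservations with on-demand requests, so a node has to be checked for competing leases before it is handed out. A minimal sketch of that idea as an interval-overlap test follows. The `Lease` class, the `is_free` helper, and the node names are hypothetical illustrations, not part of CHI or its reservation service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Lease:
    """Hypothetical record of one node reserved for a time window."""
    node: str
    start: datetime
    end: datetime  # exclusive


def is_free(node: str, start: datetime, end: datetime, leases: Iterable[Lease]) -> bool:
    """Return True if no existing lease on `node` overlaps [start, end)."""
    if start >= end:
        raise ValueError("lease must end after it starts")
    return not any(
        lease.node == node and lease.start < end and start < lease.end
        for lease in leases
    )


# One advance reservation already on the books (hypothetical data).
existing = [Lease("node-1", datetime(2019, 2, 6, 9), datetime(2019, 2, 6, 12))]

# A request that collides with the advance reservation is rejected...
morning_ok = is_free("node-1", datetime(2019, 2, 6, 11), datetime(2019, 2, 6, 13), existing)
# ...while a window that begins exactly when the lease ends is accepted.
afternoon_ok = is_free("node-1", datetime(2019, 2, 6, 12), datetime(2019, 2, 6, 14), existing)
print(morning_ok, afternoon_ok)
```

Treating the end time as exclusive lets back-to-back leases share a boundary without being reported as a conflict.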
IMPROVING THE PLATFORM: NETWORKING
- Multi-tenant networking allows users to provision isolated L2 VLANs and manage their own IP address space (since Fall 2017)
- Stitching dynamic VLANs from Chameleon to external partners (ExoGENI, ScienceDMZs) (since Fall 2017)
- VLANs + AL2S connection between UC and TACC for 100G experiments (since Spring 2018)
- BYOC (Bring Your Own Controller): isolated user-controlled virtual OpenFlow switches (since Summer 2018)
- Managing multiple stitches (since Fall 2018)
- VLAN reservations (since Winter 2019), floating IP reservations coming soon!
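Because each tenant network sits on its own isolated L2 VLAN, two projects can choose the same private address range without interfering with each other. The sketch below illustrates that property with Python's standard `ipaddress` module. The `TenantNetwork` class, the `conflicts` helper, the VLAN IDs, and the subnets are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantNetwork:
    """Hypothetical model of a tenant-provisioned network on an isolated VLAN."""
    tenant: str
    vlan_id: int
    cidr: str

    @property
    def subnet(self):
        return ipaddress.ip_network(self.cidr)


def conflicts(a: TenantNetwork, b: TenantNetwork) -> bool:
    """Address ranges only collide when both networks share the same L2 segment."""
    return a.vlan_id == b.vlan_id and a.subnet.overlaps(b.subnet)


net_a = TenantNetwork("tenant-a", vlan_id=3001, cidr="192.168.1.0/24")
net_b = TenantNetwork("tenant-b", vlan_id=3002, cidr="192.168.1.0/24")
net_c = TenantNetwork("tenant-a", vlan_id=3001, cidr="192.168.1.128/25")

print(conflicts(net_a, net_b))  # same addresses, different VLANs
print(conflicts(net_a, net_c))  # overlapping addresses on the same VLAN
```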
BRING-YOUR-OWN-CONTROLLER (BYOC)
- Software Defined Networking (SDN)
  - Corsa Virtual Forwarding Context (VFC)
  - OpenFlow 1.3
  - User defined controller
  - Within Chameleon or anywhere on the Internet
- Available on Skylake nodes
- Supported capabilities
  - SDN experiments
  - Experiments requiring non-standard networking capabilities
[Diagram: a Standard Cloud Unit with a Corsa switch; compute nodes of Tenant A and Tenant B attach to per-tenant VFCs, each paired with that tenant's OpenFlow controller (Ryu is shown)]
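With BYOC, the forwarding behavior of a tenant's VFC is decided by the tenant's own controller. The sketch below shows the classic learning-switch logic that such a controller might implement. It is a framework-free illustration, not Ryu code, and it does not speak OpenFlow. The `LearningSwitch` class, the shortened MAC strings, and the port numbers are hypothetical.

```python
from typing import Dict, Union

FLOOD = "flood"


class LearningSwitch:
    """Toy stand-in for controller logic; it does not speak OpenFlow."""

    def __init__(self) -> None:
        # Maps a source MAC address to the port it was last seen on.
        self.mac_to_port: Dict[str, int] = {}

    def packet_in(self, in_port: int, src: str, dst: str) -> Union[int, str]:
        """Learn where `src` lives, then return the output port or FLOOD."""
        self.mac_to_port[src] = in_port
        return self.mac_to_port.get(dst, FLOOD)


switch = LearningSwitch()
first = switch.packet_in(1, src="aa:aa", dst="bb:bb")  # destination not yet known
reply = switch.packet_in(2, src="bb:bb", dst="aa:aa")  # aa:aa was learned on port 1
again = switch.packet_in(1, src="aa:aa", dst="bb:bb")  # bb:bb was learned on port 2
print(first, reply, again)
```

A real controller would additionally install a flow rule on the switch after learning a destination, so that later packets are forwarded without a round trip to the controller.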
EXTERNAL STITCHING
- Layer 2 VLANs from Chameleon to external partners
  - ExoGENI, ScienceDMZs, ESnet, and AL2S
- VFCs with multiple L2 stitched links
  - Named VFCs
[Diagram: the Chameleon core network (100 Gbps uplink to public network) at Chicago and Austin, connected to Internet2 AL2S, GENI, and future partners; in a Standard Cloud Unit, compute nodes of Tenant A and Tenant B attach to per-tenant VFCs, each paired with that tenant's OpenFlow controller (Ryu is shown)]
NETWORKING PATTERNS MADE EASY
- Sharednet1
  - Pre-configured local shared network
- Sharedwan1
  - Stitched shared network
  - Pre-configured
  - Connects UC and TACC
  - Up to 100 Gbps
  - Ask how to add it to your project!
[Diagram: compute nodes in Standard Cloud Units at Chicago and Austin attach to sharednet1 at each site; sharedwan1 spans both sites over the Chameleon core network (100 Gbps uplink to public network)]
IMPROVING THE PLATFORM: OTHER FEATURES
- Lease management: adding/removing nodes to/from a lease, notifications of lease start and impending termination
- Advance reservation orchestration
- Power and temperature metrics
- Whole disk image boot for ARM nodes
- New appliances (Hadoop, ExoGENI, BYOC examples) and a richer set of appliance features: FUSE module and networking support
- Usability features: multi-region configuration, single login to all web interfaces, better access to information, better error handling, software self-updates, better appliance publishing, documentation overhaul, etc.
- Chameleon traces are now available at www.scienceclouds.org
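Lease notifications come down to computing two timestamps per lease: one at the start and one shortly before termination. The sketch below shows one way to schedule them. The `notification_times` helper and the 48-hour default warning window are assumptions for illustration, not documented Chameleon behavior.

```python
from datetime import datetime, timedelta


def notification_times(start, end, warn_before=timedelta(hours=48)):
    """Return (start_notice, termination_notice) for a lease window.

    The termination notice is never scheduled before the lease starts,
    so a short lease is warned about immediately.
    """
    if end <= start:
        raise ValueError("lease must end after it starts")
    return start, max(start, end - warn_before)


start = datetime(2019, 2, 6, 9, 0)
week_long = notification_times(start, start + timedelta(days=7))
short = notification_times(start, start + timedelta(hours=4))
print(week_long[1], short[1])
```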
BEYOND THE PLATFORM: BUILDING AN ECOSYSTEM
- Helping hardware providers interact
  - Bring Your Own Hardware (BYOH)
  - CHI-in-a-Box: deploy your own Chameleon site
- Helping our users interact, with us but primarily with each other
  - Facilitating contributions of appliances, tools, and other artifacts: appliance catalog, blog as a publishing platform, and eventually notebooks
  - Integrating tools for experiment management
  - Making reproducibility easier
  - Improving communication, not just with us but with our users as well
CHI-IN-A-BOX
- CHI-in-a-box: packaging a commodity-based testbed
  - First released in summer 2018, continuously improving
- CHI-in-a-box scenarios
  - Independent testbed: package assumes independent account/project management, portal, and support
  - Chameleon extension: join the Chameleon testbed (currently serving only selected users), and includes both user and operations support
  - Part-time extension: define and implement contribution models
  - Part-time Chameleon extension: like Chameleon extension but with the option to take the testbed offline for certain time periods (support is limited)
- Adoption
  - New Chameleon Associate Site at Northwestern since fall 2018 (new networking!)
  - Two organizations working on independent testbed configuration
REPRODUCIBILITY DILEMMA
Should I invest in making my experiments repeatable?
Should I invest in more new research instead?
- Reproducibility as side-effect: lowering the cost of repeatable research
  - Example: Linux “history” command
  - From a meandering scientific process to a recipe
- Reproducibility by default: documenting the process via interactive papers
REPEATABILITY MECHANISMS IN CHAMELEON
- Testbed versioning (collaboration with Grid’5000)
  - Based on representations and tools developed by G5K
  - >50 versions since public availability, and counting
  - Still working on: better firmware version management
- Appliance management
  - Configuration, versioning, publication
  - Appliance meta-data via the appliance catalog
  - Orchestration via OpenStack Heat
- Monitoring and logging
- However… the user still has to keep track of this information
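Testbed versioning makes it possible to ask what changed between the hardware description in force during one experiment and the description in force during a later rerun. The sketch below compares two snapshots of a node description. The dictionary layout and all values are hypothetical and do not reproduce the G5K representation.

```python
def diff_versions(old: dict, new: dict) -> dict:
    """Map each changed key to an (old_value, new_value) pair; None marks absence."""
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


# Two hypothetical snapshots of the same node description.
v1 = {"cpu": "Haswell", "ram_gb": 128, "bios": "1.0"}
v2 = {"cpu": "Haswell", "ram_gb": 128, "bios": "1.2", "nvme": True}

print(diff_versions(v1, v2))
```

A firmware change such as the `bios` entry here is the kind of difference that can explain why a rerun behaves differently from the original experiment.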
KEEPING TRACK OF EXPERIMENTS
- Everything in a testbed is a recorded event
  - The resources you used
  - The appliance/image you deployed
  - The monitoring information your experiment generated
  - Plus any information you choose to share with us: e.g., “start power_exp_23” and “stop power_exp_23”
- Experiment précis: information about your experiment made available in a “consumable” form
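User-supplied markers such as “start power_exp_23” and “stop power_exp_23” delimit which recorded events belong to a given experiment. The sketch below selects the events that fall between the two markers. The `precis` helper, the `(timestamp, message)` tuple layout, and the metric values are hypothetical and do not reflect Chameleon's actual event format.

```python
def precis(events, name):
    """Return the events recorded between 'start <name>' and 'stop <name>'."""
    selected, recording = [], False
    for timestamp, text in sorted(events):
        if text == f"start {name}":
            recording = True
        elif text == f"stop {name}":
            recording = False
        elif recording:
            selected.append((timestamp, text))
    return selected


# Hypothetical event log: (seconds since experiment setup, message).
log = [
    (5, "power_w=110"),
    (10, "start power_exp_23"),
    (12, "power_w=185"),
    (15, "power_w=190"),
    (20, "stop power_exp_23"),
    (25, "power_w=112"),
]

print(precis(log, "power_exp_23"))
```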
REPEATABILITY: EXPERIMENT PRÉCIS
[Diagram: the experiment précis, connected to OpenStack services, instance monitoring, infrastructure monitoring, user events, store and share, and the orchestrator (Heat)]
EXPERIMENT PRÉCIS: A CASE STUDY
Based on Wang et al., Understanding and Auto-Adjusting Performance-Sensitive Configurations. ASPLOS, 2018
INTERACTIVE PAPERS
- What does it mean to document a process?
- Some requirements
  - Easy to work with: human readable/modifiable format
  - Integrates well with ALL aspects of experiment management
  - Bit by bit replay: allows for bit by bit modification (and introspection) as well, an element of interactivity
  - Support storytelling: allows you to explain your experiment design and methodology choices
  - Has a direct relationship to the actual paper that gets written
  - Can be version controlled
  - Sustainable, a popular open source choice
- Implementation options
  - Orchestrators: Heat, the dashboard, and OpenStack Flame
  - Notebooks: Jupyter, NextJournal
CHAMELEON JUPYTER INTEGRATION
- Combining the ease of notebooks and the power of a shared platform
  - Storytelling with Jupyter: ideas/text, process/code, results
  - Chameleon shared experimental platform
- JupyterLab server for our users
  - Just go to jupyter.chameleoncloud.org and log in with your Chameleon credentials
- Chameleon/Jupyter integration
  - Alternative interface
  - All the main testbed functions
  - “Hello World” template
- Screencast of a complex experiment: https://vimeo.com/297210055
SHARING, EXPERIMENTING, LEVERAGING
- Sharing Jupyter notebooks in Chameleon
  - Today: from home directory to sharing via our Swift storage with your project members
  - Challenges ahead: more flexible sharing policy implementation, integrating with GitHub for better versioning and sharing support
- Automating experiments with Jupyter
PARTING THOUGHTS
- Physical environment: Chameleon is a rapidly evolving experimental platform
  - Originally: “Adapts to the needs of your experiment”
  - Now also: “Adapts to the needs of its community and the changing research frontier”
- Towards an Ecosystem: a meeting place of users and providers sharing resources and research
  - Testbeds are more than just experimental platforms
  - Common/shared platform is a “common denominator” that can eliminate much complexity that goes into systematic experimentation, sharing, and reproducibility
- Be part of the change: tell us what capabilities we should provide to help you share and leverage the contributions of others!