
Rafael Ferreira da Silva – [email protected]

Online and non-clairvoyant self-healing of workflow executions on grids

Rafael FERREIRA DA SILVA, Tristan GLATARD
University of Lyon, CNRS, INSERM, CREATIS
Villeurbanne, France

Frédéric DESPREZ
INRIA, University of Lyon, LIP, ENS Lyon
Lyon, France

Résumé

Rafael Ferreira da Silva
Brazilian, 29 years old, from João Pessoa – PB

PhD candidate at INSA-Lyon (France)
  Advisors: Frédéric Desprez and Tristan Glatard
MS in Computer Science at UFCG (Brazil, 2010)
  Advisors: Francisco Brasileiro and Raquel Lopes
BS in Computer Science at UFPB (Brazil, 2007)

Experience
  Software Engineer at CNRS (currently)
  Research of the OurGrid project
  Tutor and Task Activity Leader of the EELA-2 project
  University Campus Ambassador (Sun Microsystems)


Outline

The Virtual Imaging Platform

Self-healing of workflow executions on grids

Handling blocked activities

Optimizing task granularity

Controlling fairness among workflow executions

Conclusions




Platform goals

Multi-modality medical image simulators
  Computation time from 1 min to 1 year

Objectives
  Workflow execution on the European Grid Infrastructure (EGI)
  Access to storage resources
  High-level interface for non-experts

No IT required
  Software as a Service (SaaS)
  No client software installation
  New features automatically available
  Consolidated support and troubleshooting


VIP – Architecture


[Figure: VIP architecture. Labeled components: workflow engine, job generation (GASW), job scheduler, data management]

VIP – Web Portal


User Front-End
  Openly accessible web portal
  Access point to models and simulators
  User-friendly interface which assists users in using image simulators
  Modular code design (GWT + SmartGWT)

VIP – GRIDA


Grid Data Management Agent
  Handles file catalog and transfer operations by pooling
  Performs file replication on grid storage sites

[Figure: file transfers between User Machine, VIP Server and Grid Storage]
  User uploads file to VIP Server
  GRIDA uploads file to the grid (replication)
  GRIDA downloads file to VIP Server
  User downloads the file


VIP – Workflow Engine

MOTEUR workflow engine
  Applications described in a formal language
  http://modalis.i3s.unice.fr/softwares/moteur
  Bash scripts wrapped in grid jobs
  Self-healing of workflow execution

VIP – Task Management

Workload Management System with Pilot Jobs
  Distributed Infrastructure with Remote Agent Control (DIRAC) [CPPM-LHCb]
  http://diracgrid.org
  Hosted by CC-IN2P3
  French National Instance

Data Storage and Computing Back-End
  EGI infrastructure, Biomed VO
  http://www.egi.eu


Workflow Execution

1. Input data upload
2. User launches a simulation
3. MOTEUR generates invocations
4. GASW generates grid jobs
5. Jobs are submitted to DIRAC
6. Pilot jobs are submitted to EGI
7. Pilot jobs fetch grid jobs
8. Inputs download
9. Execution
10. Results upload
11. Download results

VIP – Facts

410 registered users, from 48 countries
Most used portal certificate in EGI (August 2012)
  https://wiki.egi.eu/wiki/EGI_robot_certificate_users
Consumed 260 CPU years from August 2012 to April 2013
  http://dirac.france-grilles.fr
1/10 of the total activity of the biomed international VO; one of the most active users

[Figure: Distribution of users per country]
[Figure: VIP consumption since August 2012]



Workflow Self-Healing


Problem: costly manual operations
  Rescheduling tasks, restarting services, killing misbehaving experiments or replicating data files

Objective: automated platform administration
  Autonomous detection of operational incidents
  Perform appropriate set of actions

Assumptions: online and non-clairvoyant
  Only partial information available
  Decisions must be fast
  Production conditions, no prediction of user activity or workloads

Fuzzy Finite State Machine

The healing process sets the degree of FuSM states from incident detection metrics

[Figure: workflow execution instances]
  Crisp states: possible values 0 or 1
  Fuzzy states: values between 0 and 1

General MAPE-K loop

[Figure: MAPE-K loop. Monitoring → Analysis → Planning → Execution around shared Knowledge; the loop is triggered by an event (job completion and failures) or a timeout, and is fed with monitoring data]

Incident 1: degree η = 0.8, quantified into level 1, level 2 or level 3

Roulette wheel selection: incident i is selected with probability
\[ \frac{\eta_i}{\sum_{j=1}^{n} \eta_j} \]
In the example, incident 1 is selected.

Association rules for incident 1

Rule     Confidence (ρ)   ρ × η
2 → 1    0.8              0.32
3 → 1    0.2              0.02
1 → 1    1.0              0.80

Roulette wheel selection based on association rules: in the example, incident 2 is selected

Set of actions
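The roulette wheel step over the association rules for incident 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the function names are hypothetical, and the degrees 0.4 and 0.1 for incidents 2 and 3 are back-computed from the ρ × η column.

```python
import random


def roulette_probabilities(weights):
    """Normalize non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {key: w / total for key, w in weights.items()}


def roulette_select(weights, rng=random):
    """Draw one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


# Association rules for incident 1: rule -> (confidence rho, degree eta).
# The eta values of incidents 2 and 3 are assumptions derived from rho x eta.
rules = {"2 -> 1": (0.8, 0.4), "3 -> 1": (0.2, 0.1), "1 -> 1": (1.0, 0.8)}
weights = {rule: rho * eta for rule, (rho, eta) in rules.items()}
probabilities = roulette_probabilities(weights)
selected = roulette_select(weights, random.Random(0))
```

With these weights the rule 1 → 1 is the most likely draw (0.80 out of a total of 1.14), but rules with a lower ρ × η can still be selected.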

Incident Levels and Actions

Incident degrees are quantified into discrete incident levels
Thresholds are determined from visual mode clustering or K-means
Thresholds cluster platform configurations into groups

[Figure: incident levels; one level triggers no actions, the other triggers a set of actions]
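Mapping a degree to a discrete level can be sketched as a threshold lookup. The function name is hypothetical, and treating a degree equal to a threshold as belonging to the lower level is an assumption; the thresholds themselves would come from the clustering step.

```python
from bisect import bisect_left


def incident_level(degree, thresholds):
    """Return the 1-based level of a degree, given ascending thresholds.

    A degree equal to a threshold stays in the lower level (assumption).
    """
    return bisect_left(thresholds, degree) + 1


# One threshold gives two levels: level 1 (no actions) and level 2 (actions).
level_low = incident_level(0.3, [0.5])
level_high = incident_level(0.8, [0.5])
```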

Workload for Case Studies

Based on the workload of VIP, January 2011 to April 2012

Case studies on:
  Pilot jobs
  User accounting
  Task analysis
  Bag of tasks
  Workflows

112 users
2,941 workflow executions
339,545 pilot jobs
680,988 tasks
  338,989 completed
  138,480 error
  105,488 aborted
  15,576 aborted replicas
  48,293 stalled
  34,162 queued

R. Ferreira da Silva, T. Glatard, A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps, and workflow executions, CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing (CGWS), Rhodes Island, Greece, 2012.



Incident: Activity Blocked

An invocation is late compared to the others

Possible causes
  Longer waiting times
  Lost tasks (e.g. killed by site due to quota violation)
  Resources with poor performance

[Figures: invocations completion rate for a simulation; job flow for a simulation]

Activity blocked: degree

Degree computed from all completed jobs of the activity
Job phases: setup, inputs download, execution, outputs upload
Assumption: bag of tasks (all jobs have equal durations)

Median-based estimation:

Phase                           setup   inputs download   execution   outputs upload   total
Median duration of job phases   50s     250s              400s        15s              Mi = 715s
Real job duration               42s     300s              20s         ?
Estimated job duration          42s     300s              400s*       15s              Ei = 757s

Setup and inputs download are completed; execution is the current phase.
*: max(400s, 20s) = 400s

Incident degree: job performance w.r.t. median
\[ \eta_b = 2 \cdot \max\left\{ p_i = p(t_i, \tilde{t}) = \frac{t_i}{\tilde{t} + t_i},\ i \in [1,n] \right\} - 1 \]
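The median-based estimation and the resulting degree can be reproduced with a short sketch. It assumes that t_i in the degree formula is the estimated job duration (Ei) and that the median term is the sum of the phase medians (Mi); the function names are hypothetical.

```python
def estimated_duration(median_phases, elapsed, current):
    """Estimate a job's total duration from per-phase medians.

    median_phases: median duration of each phase, in execution order.
    elapsed: observed durations of the phases started so far.
    current: index of the phase in progress (earlier phases are completed).
    """
    total = 0.0
    for k, med in enumerate(median_phases):
        if k < current:
            total += elapsed[k]            # completed phase: real duration
        elif k == current:
            total += max(med, elapsed[k])  # running phase: at least the median
        else:
            total += med                   # phase not started: median
    return total


def blocked_degree(estimates, median_total):
    """eta_b = 2 * max_i(t_i / (median + t_i)) - 1."""
    return 2 * max(t / (median_total + t) for t in estimates) - 1


medians = [50, 250, 400, 15]   # setup, inputs download, execution, outputs upload
m_i = sum(medians)
e_i = estimated_duration(medians, [42, 300, 20], current=2)
eta_b = blocked_degree([e_i], m_i)
```

For the job shown here, Mi = 715s and Ei = 757s, so the degree is (757 - 715) / (757 + 715), slightly below 0.03: the job is barely late. A job estimated at exactly the median yields a degree of 0.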

Activity blocked: levels and actions

Levels: identified from the platform logs
  Level 1 (no actions)
  Level 2 (action: replicate jobs)
  The two levels are separated by the threshold \( \tau_b \)

Actions
  Job replication
  Cancel replicas with bad performance
  Replicate only if all active replicas are running

[Figure: replication process for one task]

Experiment Conditions

Goal: Self-Healing vs No-Healing
  Cope with recoverable errors

Metrics
  Makespan of the activity execution
  Resource waste
\[ w = \frac{(CPU + data)_{self\text{-}healing}}{(CPU + data)_{no\text{-}healing}} - 1 \]
  For w < 0: self-healing consumed fewer resources
  For w > 0: self-healing wasted resources
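As a worked illustration of the resource waste metric, the sketch below assumes that CPU and data are both expressed as times in the same unit; the function name and the input numbers are hypothetical, not measurements from the experiments.

```python
def resource_waste(self_healing, no_healing):
    """w = (CPU + data)_self-healing / (CPU + data)_no-healing - 1.

    Each argument is a (cpu_time, data_transfer_time) pair.
    """
    return sum(self_healing) / sum(no_healing) - 1


# Hypothetical figures: 650 units consumed with healing, 1000 without.
w = resource_waste((600, 50), (900, 100))
```

A negative value such as this one means the self-healing run consumed fewer resources than the no-healing run.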


Results

FIELD-II/pasa: speeds up execution up to a factor of 4.5
Mean-Shift/hs3: speeds up execution up to a factor of 3.2

Resource waste w per repetition

Repetition   FIELD-II/pasa   Mean-Shift/hs3
1            -0.09           -0.01
2            -0.01           -0.35
3            -0.05           -0.01
4            -0.08           -0.17
5            -0.03           -0.02

Self-Healing process reduced resource consumption up to 35% when compared to the No-Healing execution


R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of operational workflow incidents on distributed computing infrastructures, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, 2012.

R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of operational workflow incidents on distributed computing infrastructures, Future Generation Computer Systems (FGCS), 2013.



Task Granularity

Low performance of lightweight (a.k.a. fine-grained) tasks:
  Communication overhead
  High queuing times

[Figure: execution of tasks t1 to t5 over time on resources R1 to R3]

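Fineness control: degree

Incident degree
\[ \eta_f = \max_{i \in [1,m]} \left\{ f_i = d_i \cdot r_i \right\} \]
where:
\[ d_i = \frac{\tilde{t}_{shared}}{\tilde{t}_{shared} + n_i(\tilde{t} - \tilde{t}_{shared})} \]
\[ r_i = \frac{\max_{j \in [1,n_i]} q_j}{\max_{j \in [1,n_i]} q_j + \tilde{t}_{shared} + n_i(\tilde{t} - \tilde{t}_{shared})} \]

[Figure: task timeline split into queued time, shared input data, other input data and application execution, annotated with \( q_j \), \( \tilde{t}_{shared} \) and \( \tilde{t} \)]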
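The fineness degree can be evaluated with a few lines of Python. This is a sketch under assumptions: each activity is a hypothetical dictionary holding the median shared-input transfer time, the median task duration and the queuing times q_j of its n_i tasks; the example numbers are made up.

```python
def fineness_degree(activities):
    """eta_f = max_i(d_i * r_i) over the activities of a workflow."""
    degrees = []
    for act in activities:
        n_i = len(act["queue_times"])
        q_max = max(act["queue_times"])
        # Shared transfer once, plus n_i times the non-shared part.
        grouped = act["t_shared"] + n_i * (act["t_median"] - act["t_shared"])
        d_i = act["t_shared"] / grouped
        r_i = q_max / (q_max + grouped)
        degrees.append(d_i * r_i)
    return max(degrees)


activities = [
    # Mostly shared input transfer and long queuing times: fine-grained.
    {"t_shared": 30.0, "t_median": 40.0, "queue_times": [600.0, 300.0]},
    # Long execution, short queuing time: coarse enough.
    {"t_shared": 5.0, "t_median": 500.0, "queue_times": [10.0]},
]
eta_f = fineness_degree(activities)
```

The first activity dominates the maximum: its tasks spend most of their time transferring shared input data and waiting in the queue, which is the situation task grouping is meant to address.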

Fineness control: levels and actions

Levels (incident degree), separated by the threshold \( \tau_f \)
  Level 1 (no actions)
  Level 2 (action: task grouping)

Actions
  Task grouping
  Grouped pairwise until \( \eta_f \le \tau_f \) or the amount of waiting groups Q is smaller than or equal to the amount of running groups R
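The pairwise grouping rule can be sketched as a loop. The callback `degree_fn` is a hypothetical stand-in for recomputing the fineness degree after each pass, and the extra stop when a single group remains is a guard added for this sketch.

```python
def group_pairwise(groups):
    """One grouping pass: merge waiting groups two by two."""
    return [sum(groups[i:i + 2], []) for i in range(0, len(groups), 2)]


def apply_task_grouping(waiting, running, degree_fn, tau_f):
    """Group pairwise until the degree is at most tau_f or Q <= R."""
    while (degree_fn(waiting) > tau_f
           and len(waiting) > running
           and len(waiting) > 1):
        waiting = group_pairwise(waiting)
    return waiting


# Five waiting single-task groups, two running groups, and a degree that
# stays above the threshold (hypothetical), so only Q <= R stops the loop.
grouped = apply_task_grouping([[1], [2], [3], [4], [5]], running=2,
                              degree_fn=lambda groups: 1.0, tau_f=0.5)
```

Here two passes are needed: five groups become three, then two, at which point the waiting groups no longer outnumber the running ones.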

Coarseness control

Non-stationary load: loss of parallelism

Incident degree
\[ \eta_c = \frac{R}{Q + R} \]
Threshold
\[ \tau_c = 0.5 \]

Task de-grouping

[Figure: tasks t1 to t5 over time on resources R1 to R3, with grouped tasks t2+t3 and t4+t5 showing loss of parallelism]
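A minimal sketch of the coarseness check follows. It assumes that de-grouping is triggered when the degree exceeds the threshold, i.e. when running groups outnumber waiting groups; the function names are hypothetical.

```python
TAU_C = 0.5  # threshold from the slide


def coarseness_degree(waiting, running):
    """eta_c = R / (Q + R); defined as 0 when there are no groups at all."""
    total = waiting + running
    return running / total if total else 0.0


def should_degroup(waiting, running, tau_c=TAU_C):
    """Assumption: de-group when eta_c is strictly above the threshold."""
    return coarseness_degree(waiting, running) > tau_c


degroup_now = should_degroup(waiting=1, running=3)
```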



Experiment 1
  Evaluate the fineness control process under stationary load

Experiment 2
  Evaluate the de-grouping control process under non-stationary load

[Table: Workflows characteristics]

Results: stationary load


Speeds up execution up to a factor of 2.6

R. Ferreira da Silva, T. Glatard, F. Desprez, On-line, non-clairvoyant optimization of workflow activity granularity task on grids, Euro-Par (Submitted), 2013.


Results: non-stationary load


R. Ferreira da Silva, T. Glatard, F. Desprez, On-line, non-clairvoyant optimization of workflow activity granularity task on grids, Euro-Par (Submitted), 2013.



Fairness among workflow executions

The demand for resources exceeds the supply
Workflows are slowed down by concurrent executions

[Figure: tasks of three workflows (a long workflow and very short workflows) scheduled over time on resources R1 to R3]

Very short workflow executions are delayed

Fairness control: degree

Unfairness degree
\[ \eta_u = W_{max} - W_{min} \]
where:
\[ W_i = \max_{j \in [1,n_i]} \left\{ w_{i,j} = \frac{Q_{i,j}}{Q_{i,j} + R_{i,j} \cdot P_{i,j}} \cdot T_{i,j} \right\} \]
  \( Q_{i,j} \) = number of waiting tasks
  \( R_{i,j} \) = number of running tasks

Relative observed duration
\[ T_{i,j} = \frac{\tilde{t}_{i,j}}{\max_{v \in [1,m],\, w \in [1,n_i^*]} (\tilde{t}_{v,w})} \]

Performance
\[ P_{i,j} = 2 \cdot \left( 1 - \max_{u \in [1,k_j]} \left\{ \frac{t_u}{\tilde{t}_{i,j} + t_u} \right\} \right) \]
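The unfairness degree can be computed as in the sketch below. The data layout (one tuple of Q, R, P, T per activity) and the function names are assumptions made for illustration, as is returning 0 for an activity with no waiting or running tasks.

```python
def performance(median, durations):
    """P = 2 * (1 - max_u(t_u / (median + t_u))); 1.0 when t_u == median."""
    return 2 * (1 - max(t / (median + t) for t in durations))


def relative_duration(median, largest_median):
    """T = median duration of the activity over the largest median."""
    return median / largest_median


def activity_w(waiting, running, perf, rel_duration):
    """w = Q / (Q + R * P) * T."""
    denom = waiting + running * perf
    return waiting / denom * rel_duration if denom else 0.0


def unfairness_degree(workflows):
    """eta_u = W_max - W_min, with W_i the maximum w over activities."""
    big_w = [max(activity_w(*activity) for activity in wf) for wf in workflows]
    return max(big_w) - min(big_w)


# Hypothetical snapshot: one activity per workflow, as (Q, R, P, T).
workflows = [
    [(8, 2, 1.0, 1.0)],   # many waiting tasks: W = 0.8
    [(1, 4, 1.0, 0.5)],   # mostly running, shorter tasks: W = 0.1
]
eta_u = unfairness_degree(workflows)
```

In this snapshot the first workflow has most of its tasks waiting while the second is mostly running, which gives a degree of about 0.7.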


Fairness control: levels and actions

Levels, separated by the threshold \( \tau_u \)
  Level 1 (no actions)
  Level 2 (action: task prioritization)

Actions
  Task prioritization
  Task priority is an integer initialized to 1
  Increase priority of \( \Delta_{i,j} \) tasks:
\[ \Delta_{i,j} = Q_{i,j} - \left\lfloor \frac{(\tau_u + W_{min})(Q_{i,j} + R_{i,j} P_{i,j})}{T_{i,j}} \right\rfloor \]



Experiment 1
  Tests whether unfairness among identical workflows is properly addressed

Experiment 2
  Tests whether the performance of very short workflow executions is improved by the fairness mechanism

Experiment 3
  Tests whether unfairness among different workflows is detected and properly handled

[Table: Workflows characteristics]

Experiments: metrics

Unfairness
  Is the area under the curve \( \eta_u \) during the execution:
\[ \mu = \sum_{i=2}^{M} \eta_u(t_i) \cdot (t_i - t_{i-1}) \]

Slowdown
\[ s = \frac{M_{multi}}{M_{own}} \]
  where:
\[ M_{own} = \max_{p \in \Omega} \sum_{u \in p} t_u \]
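Both metrics can be sketched in Python. The sketch assumes that Ω is the set of paths of the workflow graph and t_u the duration of task u, so that the maximum over paths is the longest path of an acyclic graph; the function names and the example workflow are hypothetical.

```python
from functools import lru_cache


def unfairness_area(times, degrees):
    """mu = sum over i >= 2 of eta_u(t_i) * (t_i - t_{i-1})."""
    return sum(degrees[i] * (times[i] - times[i - 1])
               for i in range(1, len(times)))


def own_makespan(successors, durations):
    """Longest path (sum of task durations) in an acyclic workflow graph."""
    @lru_cache(maxsize=None)
    def longest_from(task):
        tail = max((longest_from(s) for s in successors.get(task, ())),
                   default=0.0)
        return durations[task] + tail

    return max(longest_from(task) for task in durations)


def slowdown(m_multi, m_own):
    """s = M_multi / M_own."""
    return m_multi / m_own


# Hypothetical diamond-shaped workflow: a -> {b, c} -> d.
successors = {"a": ("b", "c"), "b": ("d",), "c": ("d",)}
durations = {"a": 10.0, "b": 20.0, "c": 5.0, "d": 10.0}
m_own = own_makespan(successors, durations)
s = slowdown(100.0, m_own)
mu = unfairness_area([0.0, 10.0, 30.0], [0.0, 0.5, 0.2])
```

In this example the longest path is a, b, d (40 time units), so a makespan of 100 under concurrent executions corresponds to a slowdown of 2.5.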

Results: identical workflows


R. Ferreira da Silva, T. Glatard, F. Desprez, Workflow fairness control on online and non-clairvoyant distributed computing platforms, Euro-Par (Submitted), 2013.

Makespans and unfairness degree values are significantly reduced: σm reduced up to a factor of 15, σs up to a factor of 7, and μ by a factor of about 2

Results: very short workflows

Makespans of very short workflow executions are significantly reduced: σs reduced up to a factor of 5.9, and μ up to a factor of 1.9

Results: very short workflows (2)

Speeds up executions up to a factor of 2.9, reduces task average waiting time up to a factor of 4.4 and slowdown up to a factor of 5.9

Results: different workflows

Reduced σs up to a factor of 3.8, and μ up to a factor of 1.9



Concluding remarks


VIP is an openly accessible web portal for multi-modality medical image simulators
  No IT required (Software as a Service)
  Workflow execution on EGI
  High-level interface for non-experts

Self-healing of workflow incidents
  Implements a generic MAPE-K loop
  Incident degrees computed online and quantified into levels
  Actions set based on incident level
  Non-clairvoyance and online

Handling blocked activities
  Properly detects and handles blocked activities
  Speeds up execution up to a factor of 4.5
  Reduced resource consumption up to 35%

Concluding remarks (2)


Optimizing task granularity
  Properly detects and handles lightweight tasks
  Stationary load: fineness control significantly reduces the makespan of all applications
  Non-stationary load: de-grouping algorithm compensates for the lack of adaptation of task grouping

Controlling fairness among workflow executions
  Properly detects and handles unfairness among workflow executions
  Significantly reduced the standard deviation of the slowdown and unfairness metric for:
    Identical workflows
    Very short workflow execution
    Different workflows


Thank you for your attention.

Questions?

http://vip.creatis.insa-lyon.fr

