Modeling a Self-Recoverable Grid-Based Task Scheduler
Matheus Costa Leite
Advisors: Arndt von Staa, Carlos Lucena
04/12/05 © LES/PUC-Rio
Agent Network (Grid)
[Figure: grid of agents; rows and columns are labeled 0, 1, 2.]
Adding a Task
[Figure: agent grid, rows and columns labeled 0, 1, 2. Legend: marker = Task.]
Task Structure
Task
- id
- current replication level
- max replication level
- description
- due date
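The fields listed above could be captured in a small record type. This is a minimal sketch: the slides do not give field types, so the integer levels, text description and datetime due date are assumptions, and the `can_replicate` helper is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    """Task record with the five fields named on the slide (types are assumed)."""
    id: int
    current_replication_level: int
    max_replication_level: int
    description: str
    due_date: datetime

    def can_replicate(self) -> bool:
        # Hypothetical helper: another copy is allowed only below the maximum level.
        return self.current_replication_level < self.max_replication_level


# Illustrative values only; none of them come from the slides.
task = Task(
    id=5,
    current_replication_level=0,
    max_replication_level=2,
    description="example task",
    due_date=datetime(2006, 1, 1),
)
```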
Task Handling
[Figure: agent grid; two agents comment on an incoming task.]
"I am too busy to accept this task. Maybe my right buddy can help me."
"I'll handle this task, but I'll also send it to my right buddy in case I die."
Incremented replication level
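The two behaviors quoted above (forward when busy, otherwise accept and replicate to the right buddy) might be sketched as follows. The slides do not say when replication stops or how a fully busy ring terminates, so the stop condition at the maximum replication level, the `capacity` field and the `hops_left` guard are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    id: int
    current_replication_level: int
    max_replication_level: int


@dataclass(eq=False)
class Agent:
    name: str
    capacity: int                              # assumed: max tasks held at once
    right_buddy: Optional["Agent"] = None
    accepted: List[Task] = field(default_factory=list)

    def is_busy(self) -> bool:
        return len(self.accepted) >= self.capacity

    def receive(self, task: Task, hops_left: int = 8) -> None:
        can_forward = self.right_buddy is not None and hops_left > 0
        if self.is_busy():
            # "I am too busy to accept this task. Maybe my right buddy can help me."
            if can_forward:
                self.right_buddy.receive(task, hops_left - 1)
            return
        # "I'll handle this task, but I'll also send it to my right buddy..."
        self.accepted.append(task)
        if can_forward and task.current_replication_level < task.max_replication_level:
            copy = replace(
                task,
                current_replication_level=task.current_replication_level + 1,
            )
            self.right_buddy.receive(copy, hops_left - 1)


# Three agents in one row, linked as a ring (an assumption about the topology).
a, b, c = Agent("a", capacity=0), Agent("b", capacity=1), Agent("c", capacity=1)
a.right_buddy, b.right_buddy, c.right_buddy = b, c, a
a.receive(Task(id=5, current_replication_level=0, max_replication_level=1))
```

In this run agent `a` is busy and passes the task on, `b` accepts it at level 0, and `c` holds the replica at level 1.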
Task Execution: Global View
HashBase
EntriesBase
[Figure: agent grid, rows and columns labeled 0, 1, 2.]
[Legend: message marker = "Here are some results from task id = 5"]
Fault Tolerance
• An important issue when building a grid is what to do if a node fails.
• To be effective, the grid must implement some form of fault tolerance mechanism.
• In our approach, the network is a self-healing structure. When a node dies, new links are formed to replace the old ones.
• When a node comes back, the original configuration is restored.
George, S; Evans, D; Marquette, S. A Biological Programming Model for Self-Healing
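One way to picture the self-healing behavior described above is to keep the original configuration fixed and derive the current links from the set of live nodes. Links then route around a dead node and return to the original layout as soon as it comes back. The sketch below covers one row of the grid. The `Row` class, its method names and the ring-shaped row are assumptions, not details taken from the slides.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]


class Row:
    """One row of the grid, treated as a ring (an assumption)."""

    def __init__(self, nodes: List[Coord]) -> None:
        self.nodes = list(nodes)             # original configuration, never mutated
        self.alive: Set[Coord] = set(nodes)

    def mark_dead(self, node: Coord) -> None:
        self.alive.discard(node)

    def mark_alive(self, node: Coord) -> None:
        if node in self.nodes:
            self.alive.add(node)

    def right_neighbor(self, node: Coord) -> Coord:
        # Walk right from `node`, skipping dead nodes, until a live one is found.
        start = self.nodes.index(node)
        count = len(self.nodes)
        for step in range(1, count + 1):
            candidate = self.nodes[(start + step) % count]
            if candidate in self.alive:
                return candidate
        raise LookupError("no live node in this row")


row = Row([(0, 1), (1, 1), (2, 1)])
before = row.right_neighbor((0, 1))   # (1, 1)
row.mark_dead((1, 1))
healed = row.right_neighbor((0, 1))   # (2, 1): the link skips the dead node
row.mark_alive((1, 1))
restored = row.right_neighbor((0, 1)) # (1, 1): original configuration again
```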
Fault Tolerance: Hello Messages
[Figure: agent grid, rows and columns labeled 0, 1, 2.]
[Legend: message marker = "Hello from (0, 2)"]
"(2, 2) is alive, so I won't remove it."
Fault Tolerance: Agent Death
[Figure: agent grid with the agent at (1, 1) no longer responding.]
"I haven't heard from (1, 1) for a while. It must be dead."
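The hello messages and the "haven't heard for a while" rule amount to a timeout-based failure detector. The sketch below passes timestamps in explicitly so the logic is easy to follow. The `HelloMonitor` class and the timeout value are hypothetical, since the slides do not state how long "a while" is.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


class HelloMonitor:
    """Tracks when each neighbor last said hello (hypothetical helper)."""

    def __init__(self, neighbors: List[Coord], timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        self.last_heard: Dict[Coord, float] = {n: now for n in neighbors}

    def on_hello(self, sender: Coord, now: float) -> None:
        # A hello refreshes the sender's timestamp; unknown senders are added.
        self.last_heard[sender] = now

    def presumed_dead(self, now: float) -> List[Coord]:
        # Neighbors that have been silent for longer than the timeout.
        return sorted(
            n for n, heard in self.last_heard.items() if now - heard > self.timeout
        )


monitor = HelloMonitor(neighbors=[(1, 1), (2, 2)], timeout=3.0)
monitor.on_hello((2, 2), now=4.0)        # (2, 2) is alive, so it is kept
suspects = monitor.presumed_dead(now=5.0)
```

Here only (1, 1) has been silent for longer than the timeout, so it is the only neighbor presumed dead.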
Fault Tolerance: Fixing the Network
[Figure: agent grid, rows and columns labeled 0, 1, 2.]
[Legend: message marker = "To the right neighbor of (1, 1): please add me as your left neighbor."]
Fault Tolerance: Restoring a Dead Agent
[Figure: agent grid, rows and columns labeled 0, 1, 2.]
Agent Structure
• Internally the agent is structured as a set of interconnected components:
– Worker
– Buddy Handler
– Replica Handler
– Router
– Neighbor Map
– NET
• Each component receives and sends tasks through the router to accomplish various goals.
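The router-centered structure above can be illustrated with a small dispatch table. The component names come from the slide, but the `Router` class, the message shape and the forwarding from the Replica Handler to the Worker are hypothetical choices made only to show components talking through the router rather than to each other.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Router:
    """Delivers messages to components registered by name (hypothetical API)."""

    def __init__(self) -> None:
        self._components: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._components[name] = handler

    def send(self, destination: str, message: dict) -> None:
        if destination not in self._components:
            raise KeyError(f"unknown component: {destination}")
        self._components[destination](message)


log: List[str] = []
router = Router()

# The Worker records the task it was asked to run.
router.register("Worker", lambda msg: log.append(f"worker runs task {msg['task_id']}"))
# The Replica Handler hands the task on to the Worker, via the router.
router.register("Replica Handler", lambda msg: router.send("Worker", msg))

router.send("Replica Handler", {"task_id": 5})
```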
References
• Magalhães, J.; Lucena, C., 2003. Using an Agent-Based Framework and Separation of Concerns for the Generation of Document Classification Tools. In AAAI Spring Symposium/AMKM-Agent Mediated Knowledge Management (to appear LNCS – Springer).
• Wang, Liu. Dynamics of Agent-Based Load-Balancing on Grids.
• Fedoruk, A.; Deters, R. Using Agent Replication to Enhance Reliability and Availability of Multi-Agent Systems.
• George, S.; Evans, D.; Marquette, S. A Biological Programming Model for Self-Healing.