TaskMan-Middleware 2011

Riccardo Pulvirenti, Giuseppe Ravidà & Andrea Tino

Università degli Studi di Catania - A.A. 2010/11

Corso di Laurea Specialistica in Ing. Informatica DIIT @ Università di Catania

Prof. Eng. A. Di Stefano, Eng. G. Morana

Distributed Systems 2010-11

TaskMan-Middleware 2011: A Standard C++ distributed middleware for workflow management over a P2P INET TCP/IP PUSH-Send-Model oriented network

Development Project for Distributed Systems


TaskMan-Middleware 2011 by Riccardo Pulvirenti, Giuseppe Ravidà and Andrea Tino Distributed Systems Project 2011

TaskMan-Middleware is a distributed kernel intended to support higher-level applications that manage large numbers of tasks over a network of peers exposing their available computational resources.

By taking advantage of Infrastructure-as-a-Service (IaaS), we can provide a high-level application with APIs to manage tasks and their execution over a P2P network while always guaranteeing a fair share of the shared computational resources. This approach might even lead to a grid computing system usable by anyone in the world.

A collection of tasks can be submitted to a peer for execution. A workflow contains a variable number of tasks, which are processed by a local entity.

The local routing entity is the Manager: it decides which peer each task should be sent to. Manager embodies complex logic.

Manager cannot decide on its own; Discovery helps it select the best peer.

Once a peer has been selected, the task can be sent to it. When a peer receives a task, the task is executed.

Conceptualizing on the fly...

TaskMan-Middleware Overview: Getting started with basic dynamics


A distributed kernel involves many elements. Let's consider some basic aspects and definitions that will recur throughout the exploration of the system.

Peer: A machine of the network. It is a node of our computational grid. Every machine can produce its own workflows, but must always let all other machines use its computational resources to execute their tasks (tasks not belonging to the current machine). This is needed to guarantee a fair share.

Task: An entity to be executed. There are two types: bash command tasks and executable tasks. The second type needs binaries in order to be executed.

Workflow: A collection of independent tasks to be executed.

Network: A collection of machines/peers connected to one another through TCP/IP, in any possible arrangement (no topology constraints are applied).

Connection: A one-way link from one machine to another. It states that the first machine knows the second. In our system a one-way connection implicitly becomes a bidirectional link because of the implemented knowledge dynamics.

Setting a common ground: Definitions and assumptions
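The Task and Workflow definitions above can be sketched as plain data types. This is a minimal illustration, not the project's code: the names TaskKind, Task, Workflow and count_executables are hypothetical, and C++11 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names: the slides do not show the fields of the real task record.
enum class TaskKind { Command, Executable };

struct Task {
    TaskKind kind;
    std::string payload;  // command line, or name of the binary to run
    // Executable tasks need their binaries shipped before they can run.
    bool needs_binaries() const { return kind == TaskKind::Executable; }
};

// A workflow is a collection of independent tasks.
using Workflow = std::vector<Task>;

// Counts how many tasks in a workflow require binaries to be sent.
inline std::size_t count_executables(const Workflow& wf) {
    std::size_t n = 0;
    for (const Task& t : wf)
        if (t.needs_binaries()) ++n;
    return n;
}
```

Keeping the task type inside the task record lets the receiving side decide, without extra messages, whether it has to wait for binaries.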


Every peer embodies four important entities, each meant to serve a specific purpose and reach specific goals. These entities interact through local connections and network connections.

UI: The User Interface produces workflows and sends them to Manager to be executed.

Manager: Gets every task in a workflow executed. It must also submit a task to its own Worker (residing in the same peer) when the task arrives from the network. Manager works closely with Discovery to find the right peer to send each task to.

Discovery: This component communicates locally with Manager to indicate the right peer to execute a given task. It also receives status notifications from Workers over the network. This enables smarter routing based on a performance index computed from the Workers' conditions.

Worker: This component executes a task when it arrives from its Manager. It also notifies the Discovery of every neighbour peer whenever its status changes.

Another component should be considered too: CORE. It initializes all components in the peer, allowing them to run independently as separate threads. CORE also configures the peer based on a configuration file.

Main actors: Introducing UI, Manager, Discovery and Worker


TaskMan-Middleware implements many services in order to reach its goals:

Task routing: To let the network execute a task, all workflows are managed and each task is assigned to a specific worker on a different peer, where it is later executed.

Peers' status notification: To route a task to the most suitable peer (suitable here means "the peer with the best current, time-variant performance"), all workers periodically send their status to all neighbours.

Multithreaded peers' status management: When a status notification is received from a peer, a thread is created to manage that single notification.

Multithreaded queue management: All queues are managed using threads. When a new workflow or task is dequeued, a new thread is created to manage it.

PUSH data sending model: When an executable task is sent, its binaries are sent immediately afterwards. This requires every worker to maintain a local collection of data to use when a task must be executed.

Safe sending model: When something must be sent, a control loop handles the case of an unreachable destination.

Logging: A robust logging system is used to audit every operation occurring in the kernel.

Main services: Introducing functionalities and capabilities


Several issues were solved while developing the middleware. We have already introduced the most important services in TaskMan-Middleware; now let's map each issue to the service implemented to solve it.

Fast computation -> Multithreading

Robustness -> Attempts to send

PUSH sending model -> Collected table for data

Intelligent routing -> Notification system

Event tracing -> Synchronized logging

Configurability -> Configuration system

Flexibility -> Proxy + Factory

Problems/Requirements to handle: Solving issues (easy to say, a little harder to do)


The development model and the architecture were decided at the beginning of the development process.

Development model: Development had to be completed very quickly. Considering time constraints, technology and application requirements, a "light" and easy-to-manage workflow was adopted. For this reason the team selected a prototype-based model, in particular a spiral model in which the creation and specialisation of a single initial prototype determined the evolution of the final software. Many branches were created from the original prototype, and others abandoned, in order to reach the final working application.

Software architecture: Given constraints and requirements, the application was designed with Scalability as its first target. Flexibility was also considered an important requirement, as was Modularity. Following a chain-of-responsibility pattern, the team chose to develop the middleware according to a modular/multilevel plain architecture. Thanks to this architecture, the final software defines different modules (components) able to operate as single entities with the fewest possible interactions with the others (loose coupling); furthermore, thanks to proxies and factories, all communications are transparent and easy to manage.

Development process: Development model and architecture


TaskMan-Middleware has been developed with the precise purpose of obtaining a fast, low-level system. Given these requirements, language and architecture had to be chosen very carefully.

Target language: Standard C++ (g++)
Rationale: Low-level language, object oriented, high performance.

Target architecture: Unix/x86
Rationale: High performance, Unix compatibility.

External resources: Boost Libraries 1.45.0
Rationale: C++ high-level library for networking, interprocess communication, threading, functors, binding.

Development & coding: Language and technical features


When running, the kernel exchanges data with all other peers. To build up a global picture, let's first consider what types of communication occur among the components.

WorkerDescriptor flows: All flows between peers that exchange a worker descriptor. These flows occur during kernel execution and typically involve the peers' Manager, Discovery and Worker.

  Network flows: Exchanges between different peers. Typically involve Discovery/Worker.

  Local flows: Exchanges within the same peer. Typically involve Manager/Discovery.

TaskDescriptor flows: All flows between peers that exchange a task descriptor. These flows occur during kernel execution and typically involve the peers' Manager and Discovery only.

  Network flows: Exchanges between different peers. Typically involve Manager/Manager.

  Local flows: Exchanges within the same peer. Typically involve Manager/Discovery.

Data flows: Describing data exchanged among components


All three types of flow are represented: line styles define the type of flow. The large dashed line represents data flows (binaries to be sent for exec tasks).

This diagram shows all flows occurring among peers. Only two peers are shown here (for simplicity).

[Figure: two peers, each containing UI, MANAGER, DISCOVERY and WORKER. Arrow labels: Send (Task Descriptor, T) and Notify (State: Worker Descriptor, W). Legend: proxies, output port, input port, Task flow, Worker flow.]

Data flows (2): Data flows at a closer look


Every flow goes from one peer's component to another. ALL COMMUNICATIONS happen through an intermediate entity: a proxy, whose purpose is to hide low-level communication dynamics so that the communicating entities do not need to know how they are communicating (over the network or locally). By doing so (and taking advantage of the factory creation pattern) it is possible to extend our model to a Client/Server one just by modifying the proxies. This approach provides simplicity, flexibility and scalability.

Having introduced flows, we are ready to analyze them more closely and inspect the dynamics that make all communications among peers possible.

[Figure: a communication flow from a source component (Src Comp) to a destination component (Dst Comp), passing through a PROXY.]

Inspecting interactions: Getting inside flows
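The idea that a component never knows whether it is talking locally or over the network can be sketched with one abstract proxy and two implementations. This is only an illustration under assumptions: TaskProxy, LocalTaskProxy, RemoteTaskProxy and dispatch are hypothetical names, tasks are reduced to strings, and the remote proxy records what it would put on the wire instead of opening a socket.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface: callers only see send(), never the transport.
class TaskProxy {
public:
    virtual ~TaskProxy() {}
    virtual bool send(const std::string& task) = 0;
};

// Local delivery: hand the task straight to an in-process queue.
class LocalTaskProxy : public TaskProxy {
    std::vector<std::string>& _queue;
public:
    explicit LocalTaskProxy(std::vector<std::string>& q) : _queue(q) {}
    bool send(const std::string& task) override {
        _queue.push_back(task);
        return true;
    }
};

// Remote delivery stand-in: a real proxy would write to a TCP socket here;
// this sketch only records what would be sent.
class RemoteTaskProxy : public TaskProxy {
    std::string _host;
    std::vector<std::string> _wire;
public:
    explicit RemoteTaskProxy(const std::string& host) : _host(host) {}
    bool send(const std::string& task) override {
        _wire.push_back(_host + "|" + task);
        return true;
    }
    const std::vector<std::string>& wire() const { return _wire; }
};

// The caller (for example a Manager) is written once against the interface.
inline bool dispatch(TaskProxy& proxy, const std::string& task) {
    return proxy.send(task);
}
```

Because dispatch depends only on TaskProxy, swapping the P2P transport for a Client/Server one would mean writing another subclass, with no change to the calling component.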


Let’s consider the first steps occurring when a workflow is created and then submitted for its execution.

[Figure: timeline across UI, MAN and DISC. Data exchanged: WF (workflow), TD (task descriptor), Addr (worker address). Legend: local communication, network communication.]

1. UI: A new workflow is generated with a random number of tasks inside it.

2. UI: The workflow is sent to Manager through a local proxy.

3. MAN: A new thread is created to manage the workflow after it has been dequeued.

4. MAN: Each task inside the workflow is considered.

5. DISC: A new request for a worker descriptor arrives from Manager. A task descriptor is provided along with this call.

6. DISC: A worker address is returned to Manager.

7. MAN: Ready to send the task to the obtained worker.

TaskDescriptor flow: Before sending


Manager now knows the destination peer. Manager also determines that a command task must be sent. Let's see what happens.

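[Figure: timeline across MAN1 (sending peer), MAN2 and W2 (receiving peer). Data exchanged: TD (task descriptor). Legend: local communication, network communication.]

1. MAN1: The task to send is recognized to be a command task.

2. MAN2: The task arrives at the Manager on the other peer.

3. MAN2: The task is enqueued in the task queue residing on Worker by a specific thread.

4. W2: The task is dequeued.

5. W2: The task type is evaluated: it is a command task.

6. W2: The task is EXECUTED.

TaskDescriptor flow (2.a): After sending - sending a command task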


What if the task were an executable one? Manager has to perform a different activity and a new dynamic comes into play.

[Figure: timeline across MAN1 (sending peer), MAN2 and W2 (receiving peer). Data exchanged: TD (task descriptor) and Data (binaries). Legend: local communication, network communication.]

1. MAN1: The task to send is recognized to be an executable task.

2. MAN1: Binaries (data) are sent directly to the destination peer's Worker.

3. W2: The data is handled by a specific, dedicated thread.

4. W2: The data is inserted in a local collection.

5. MAN2: The task arrives at the Manager on the other peer.

6. MAN2: The task is sent to the local Worker: it is enqueued in the task queue residing on Worker by a specific thread.

7. W2: The task is dequeued and a new thread is created to manage it.

8. W2: The task is recognized to be an executable task, so its data must be retrieved.

9. W2: The internal collection of task data is searched. When the data is found (many attempts are performed), the task is EXECUTED.

TaskDescriptor flow (2.b): After sending - sending an executable task


Let us focus on a Worker. Every Worker has an internal control loop that detects status changes. Status notifications are sent to neighbour peers.

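[Figure: timeline across W1 (notifying peer) and DISC2 (a neighbour's Discovery). Data exchanged: WD (worker descriptor), with further WDs going to other neighbours. Legend: local communication, network communication.]

1. W1: The status control loop activates.

2. W1: A status change has been detected.

3. W1: A status notification is formed.

4. W1: The status notification is sent to ALL neighbours.

5. DISC2: The notification reaches the Discovery of one neighbour peer.

6. DISC2: A new thread is created to manage the notification.

7. DISC2: The status of the peer that sent the notification is updated or added.

WorkerDescriptor flow: Sending and receiving notifications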


Summarizing everything, we can say the following:

As the kernel runs and the network is working, a peer communicates with its neighbours through flows. Each flow involves two different components in the same peer or in different peers, or the same component in two different peers.

All communications are performed using proxies. Each proxy is created by its factory.

Three flows ensure that a task can be successfully executed with the lowest possible effort, by taking advantage of network communications and worker status notifications.

When trying to send something, every component calls a proxy. All sending procedures in proxies are safe: the destination peer may be temporarily unreachable, so several attempts are made to deliver the payload. Only after a maximum number of attempts is the sending process aborted.

Discovery can choose the right peer thanks to the notification system. To find the best peer, the PIs (Performance Indexes) of the known peers are compared, so that the peer with the best current performance is selected to execute a given task.

Summarizing flows: To get the ball rolling...
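The safe sending behaviour (retry while the destination is unreachable, abort only after a maximum number of attempts) can be sketched as a small helper. This is a sketch under assumptions: safe_send is a hypothetical name, the actual network write is abstracted as a callable returning true on success, and C++11 is assumed rather than the Boost facilities the project uses.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Tries `attempt` up to max_attempts times, pausing between failures.
// Returns true on the first success, false once every attempt has failed.
// `attempt` stands in for the real socket write; its shape is an assumption.
inline bool safe_send(const std::function<bool()>& attempt,
                      int max_attempts,
                      std::chrono::milliseconds pause) {
    for (int i = 0; i < max_attempts; ++i) {
        if (attempt()) return true;              // delivered
        if (i + 1 < max_attempts)
            std::this_thread::sleep_for(pause);  // destination may come back
    }
    return false;  // give up: the send is aborted
}
```

A proxy could wrap its low-level send in a call such as safe_send(write_once, 5, std::chrono::milliseconds(200)), where write_once, the attempt count and the pause are illustrative values rather than the project's settings.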


Let's take a much closer look at everything we have introduced so far. We are going to consider the most important elements, focusing on the strategies and solutions chosen to address all issues and avoid code flooding.

ADDRESSING

RUNNABLE CLASSES

SYNQUEUE(ING)

MULTITHREADING

PROXY(ING)

TDC GCOLLECT(ING)

NOTIFICATION SYSTEM

SYNLOGGING

CONFIGURATION

PERFORMANCE INDEX

Deep diving in implementations: Getting started with code, design and algorithms


Every peer communicates with all its neighbours through three specific TCP ports on a common IP address. The IP address together with the three ports defines the peer network interface.

IP addr: Specifies the unique network location where the peer is reachable over the INet.

Man2Man port: Specifies the TCP port where the current peer's Manager listens for incoming TaskDescriptors to be executed by the local Worker.

Man2W port: Specifies the TCP port where the current peer's Worker listens for incoming data (binaries) sent by another peer's Manager after sending an executable task.

W2Disc port: Specifies the TCP port where the current peer's Discovery listens for incoming WorkerDescriptors in order to update the performance of all its neighbours.

The network interface is set by the CORE class at initialization time, based on the settings in the configuration file.

The Address class is responsible for holding all the information needed for the peer network interface. This class is an associated member of the Worker class.

ADDRESSING

namespace middleware {
typedef string InetIpAddr;
typedef _uint InetPort;

class Address {
    bool operator==(..) {..}
    bool operator!=(..) {..}
    InetIpAddr _ip;
    InetPort _port;
    InetPort _port_disc;
    InetPort _port_w;
};
}

Peer addressing: Network interfaces of every single peer


Every component runs as a thread in the context of the main application. Every component also creates its own threads in order to perform all tasks in the best (fastest) possible way (minding concurrency).

MULTITHREADING

[Figure: thread diagram. CORE starts the four components UI, MAN, DISC and W as threads. Thread labels: Create/Submit WF (UI), Deque WF (MAN), WorkerDescriptor arrival listener (DISC), Deque TD (W), Manage WF, TaskManager, DataSend, Manage WD, Manage Task, DataSender, Worker Status Manager. Legend: each box is a thread; * marks threads of which several instances are created.]

namespace middleware {
typedef struct {
    string man_ip_to_man_bind;
    string man_port_toman_bind;
    Worker* ptrto_worker;
    WorkerDiscovery* ptrto_discovery;
    string log_postfix;
} ManagerConfig;

class Manager : public Runner {
    void exec() {..}
    void exec_taskmanager() {..}
    void enqueuer(..) {..}
public:
    void run() {..}
    void join() {..}
};
}

Multithreaded components: Where threads are used


The kernel is initialized from a configuration file. If no configuration file is provided, the kernel fails to start and quits.

The main actor is the configuration file. It is a plain text file with a very simple line-oriented syntax. All settings for the current peer are stored in the file. The CORE class reads the configuration file and sets up all components using the settings in it. Thanks to this cascading configuration flow, the application gains scalability and flexibility. Typically, configuration files are named using the .config extension.

CONFIGURATION

C:Every unrecognized sequence (before :) is treated as a comment (by def, C).
C:Every line is a configuration entry, recognized sequences are processed.
C:--------------------------------------------------------------------------------
C:Configuring the current peer
CONFIG_ADDRME_IP:127.0.0.1
CONFIG_ADDRME_MAN_PORT:1040
CONFIG_ADDRME_DISC_PORT:1041
CONFIG_ADDRME_W_PORT:1042
C:--------------------------------------------------------------------------------
CONFIG_OTHERPEERS_NUM:1
C:--------------------------------------------------------------------------------
C:Configuring neighbour knowledge
CONFIG_ADDRPEER_1_IP:127.0.0.1
CONFIG_ADDRPEER_1_TASKS_PORT:2040
CONFIG_ADDRPEER_1_WRKRS_PORT:2041
CONFIG_ADDRPEER_1_DATA_PORT:2042

Configuring the kernel: Configuration file and syntax
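The line-oriented syntax above is simple enough to parse in a few lines. The following is a minimal sketch, not the CORE class's actual parser: parse_config is a hypothetical name, and treating only keys that start with CONFIG_ as "recognized sequences" is an assumption drawn from the sample file.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parses "KEY:VALUE" lines. Keys that do not start with "CONFIG_" (such as
// the conventional "C") are treated as comments and skipped.
// The CONFIG_ prefix test is an assumption drawn from the sample file.
inline std::map<std::string, std::string> parse_config(std::istream& in) {
    std::map<std::string, std::string> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;         // no separator
        std::string key = line.substr(0, colon);
        if (key.compare(0, 7, "CONFIG_") != 0) continue;  // comment line
        entries[key] = line.substr(colon + 1);            // text after the first ':'
    }
    return entries;
}
```

Splitting on the first colon only keeps values that themselves contain colons intact, which matters if an address were ever written as host:port in a single entry.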


To achieve good programming style and keep the code easy to read, all classes meant to run as a thread are provided with two methods inherited from an interface (a C++ class with only pure virtual methods, that is, an abstract class).

The interface.hpp file defines a nice trick to let C++ "recognize" the interface keyword, so all interfaces can be declared using the interface keyword. Interfaces are implemented simply by using the colon notation, as with inheritance. The Runner interface defines the two methods needed to make a class runnable.

RUNNABLE CLASSES

The interface keyword definition:

#ifndef _INTERFACE_HPP_
#define _INTERFACE_HPP_

// The trick
#define interface class

#endif

The Runner interface definition:

namespace middleware {
interface Runner {
public:
    // Runs thread
    virtual void run() = 0;
    // Joins thread
    virtual void join() = 0;
};
}

Definition of the Worker class; note how it is made runnable:

#include "runner.hpp"
namespace middleware {
class Worker : public Runner {
    // Runnable class body
};
}

Making things runnable: Pattern used to develop runnable classes
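To show how a class might fill in the two Runner methods, here is a minimal sketch. It makes several assumptions: std::thread (C++11) replaces the Boost threads the project relies on, the interface macro is written as a plain class, and Counter is a hypothetical component invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Same shape as the Runner interface above, written as a plain class.
class Runner {
public:
    virtual ~Runner() {}
    virtual void run() = 0;   // starts the thread
    virtual void join() = 0;  // waits for the thread to finish
};

// Hypothetical component: counts up to a limit on its own thread.
class Counter : public Runner {
    std::thread _thread;
    std::atomic<int> _count;
    int _limit;
    // Thread body, kept private like Manager::exec() in the listing above.
    void exec() { for (int i = 0; i < _limit; ++i) ++_count; }
public:
    explicit Counter(int limit) : _count(0), _limit(limit) {}
    void run() override { _thread = std::thread(&Counter::exec, this); }
    void join() override { if (_thread.joinable()) _thread.join(); }
    int count() const { return _count.load(); }
};
```

An initializer in the style of CORE would then call run() on every component and join() on each of them before exiting.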


As said before, all communications are performed by means of proxies; this gives the application scalability and flexibility.

When a component needs to communicate with another one, located in the same peer or in a different one, a proxy is used. This hides the communication implementation and logic from that component, which never knows whether the communication is a local call or a remote connection.

ALL PROXIES are created via a corresponding factory. ALL FACTORIES ARE FRIENDS OF THE CORRESPONDING PROXIES; this ensures that the proxy will be properly instantiated. As a consequence, ALL PROXIES HAVE PRIVATE CONSTRUCTORS.

THE POINTER-SAFE FACTORY PATTERN IS USED FOR FACTORIES. It means that a factory returns the pointer to the constructed proxy, which is dynamically allocated by the factory itself. ALL FACTORIES HOLD THE POINTER TO THE CREATED PROXY; in this way, WHEN A FACTORY GOES OUT OF SCOPE (its destructor is called), THE PROXY IS DESTROYED TOO. So a proxy must never outlive its factory.

PROXY(ING)

namespace middleware {
class TaskProxyFactory {
    TaskManagerProxy* _proxy;
public:
    // Constructors
    TaskProxyFactory();
    TaskProxyFactory(..);
    // Destructor
    ~TaskProxyFactory() {
        if (this->_proxy != 0) delete this->_proxy;
    }
};
}

Communications through proxies: Managing communications using proxy pattern
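The friend/private-constructor arrangement described above can be sketched end to end as follows. The class names come from the listing, but everything inside them is an assumption: the target string, the getProxy() accessor and the non-copyable factory are illustrative choices, not the project's actual members.

```cpp
#include <cassert>
#include <string>

class TaskProxyFactory;  // forward declaration for the friend clause

// Sketch of a proxy that only its factory may construct.
class TaskManagerProxy {
    std::string _target;
    explicit TaskManagerProxy(const std::string& target) : _target(target) {}
    friend class TaskProxyFactory;  // the factory can reach the private constructor
public:
    const std::string& target() const { return _target; }
};

// The factory allocates the proxy, hands out a non-owning pointer, and
// deletes the proxy when the factory itself is destroyed.
class TaskProxyFactory {
    TaskManagerProxy* _proxy;
    TaskProxyFactory(const TaskProxyFactory&);             // non-copyable, so the
    TaskProxyFactory& operator=(const TaskProxyFactory&);  // proxy is deleted once
public:
    explicit TaskProxyFactory(const std::string& target)
        : _proxy(new TaskManagerProxy(target)) {}
    ~TaskProxyFactory() { delete _proxy; }
    TaskManagerProxy* getProxy() { return _proxy; }  // hypothetical accessor name
};
```

Making the factory non-copyable is a design choice added here: with a raw owning pointer, a copied factory would otherwise delete the same proxy twice.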


Every peer has a good deal of time-variant information, for example the number of tasks in its queue waiting to be executed. A PI is assigned to every peer based on this information; the best peer to send a task to is the one with the highest PI.

NETWORK FACTOR: Defines a non-linear dependency, with negative second derivative, between bandwidth and hop distance.

SPEED FACTOR: Defines an algebraic dependency among CPU speed, number of processors and current queue size.

CAPACITY FACTOR: A non-linear factor that keeps cores and memory from weighing too much on the final result.

PERFORMANCE INDEX

Evaluating performance index: How to decide which peer is the best to execute a task?
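The slides do not give the PI formula, so the sketch below leaves the PI as an already-computed number and only illustrates the selection rule stated above: the peer with the highest PI wins. WorkerInfo and PeerTable are hypothetical names, and a real WorkerDescriptor presumably carries more fields.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical slice of a WorkerDescriptor: only the fields needed here.
struct WorkerInfo {
    std::string address;  // where the peer can be reached
    double pi;            // performance index, higher is better
};

// Discovery-side table holding the latest descriptor received from each neighbour.
class PeerTable {
    std::map<std::string, WorkerInfo> _peers;
public:
    // Adds a neighbour or overwrites its previous status.
    void update(const WorkerInfo& w) { _peers[w.address] = w; }

    // Returns the neighbour with the highest PI, or 0 if none is known.
    const WorkerInfo* best() const {
        const WorkerInfo* top = 0;
        std::map<std::string, WorkerInfo>::const_iterator it;
        for (it = _peers.begin(); it != _peers.end(); ++it)
            if (top == 0 || it->second.pi > top->pi) top = &it->second;
        return top;
    }
};
```

Since the table is keyed by address, a fresh notification from a known peer replaces its old PI, so the choice always reflects the latest status received.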


One idea: gather in a single object everything needed to collect many entities together (preserving their order) and to let many threads operate on the collection without data inconsistency.

SynQueue is an object developed to manage tasks and workflows, enqueuing them in a sequence collection that stays consistent even when many threads operate on it.

SynQueue is generic and accepts any type as input. An internal mutex and an internal condition variable are used to keep the collection consistent.

SYNQUEUE(ING)

using namespace middleware::queueing;
// Max 10 task descriptors
SynQueue<TaskDescriptor, 10> q;
// Infinite capacity queue
SynQueue<TaskDescriptor> qq;

Synchronized queue: A class to store tasks and workflows minding concurrency
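A queue with this interface could be built roughly as follows. This is a sketch rather than the project's SynQueue: it uses C++11 std::mutex and std::condition_variable instead of Boost, the push/pop/size names are assumptions, and using capacity 0 to mean "unbounded" is one possible way to get the two declaration forms shown above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Capacity 0 means "unbounded", mirroring SynQueue<T> versus SynQueue<T, 10>.
template <typename T, std::size_t Capacity = 0>
class SynQueue {
    std::deque<T> _items;
    mutable std::mutex _mutex;
    std::condition_variable _not_empty;
    std::condition_variable _not_full;
public:
    // Blocks while a bounded queue is full, then appends the item.
    void push(const T& item) {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_full.wait(lock, [this] { return Capacity == 0 || _items.size() < Capacity; });
        _items.push_back(item);
        _not_empty.notify_one();
    }
    // Blocks while the queue is empty, then removes and returns the oldest item.
    T pop() {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_empty.wait(lock, [this] { return !_items.empty(); });
        T item = _items.front();
        _items.pop_front();
        _not_full.notify_one();
        return item;
    }
    // Number of items currently stored.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _items.size();
    }
};
```

The predicate form of wait() re-checks the condition after every wake-up, which guards against spurious wake-ups without an explicit loop.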


When data (the binaries of an executable task) are sent to Worker, they are stored in a temporary collection, the Task Data Collection (TDC), where they wait to be extracted when the corresponding task descriptor arrives.

The TDC is prone to data inconsistency, because binary data may never be extracted (for example when the task descriptor is lost over the network after the data has been sent). For this reason, a control loop periodically checks for old entries. This loop runs in a thread created by Worker at initialization time. Every entry of the table has a TimeToLive (TTL), initialized to a given value and decreased at every cycle. When the TTL reaches 0, the entry is removed.

TDC GCOLLECT(ING)

[Figure: the TDC table before and after one cycle of the control loop.]

Before: TTL = 100, 11, 51, 1, 120, 23, 110, 25

After: TTL = 99, 10, 50, 0 (REMOVED), 119, 22, 109, 24

Task Data Collection GCollecting: Garbage collection policy for TDC in Worker
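One cycle of the TTL policy can be sketched as below. The class layout is an assumption (the slides do not show the TDC's interface): entries are keyed by a hypothetical task id, and the sketch leaves out the mutex that the real collection would need, since the collector runs in its own thread.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical Task Data Collection: task id -> (binary data, remaining TTL).
class TaskDataCollection {
    struct Entry { std::string data; int ttl; };
    std::map<std::string, Entry> _entries;
public:
    void insert(const std::string& task_id, const std::string& data, int ttl) {
        Entry e = {data, ttl};
        _entries[task_id] = e;
    }
    bool contains(const std::string& task_id) const {
        return _entries.count(task_id) != 0;
    }
    // One garbage-collection cycle: decrement every TTL and drop the entries
    // that reach 0. Returns how many entries were removed.
    int collect() {
        int removed = 0;
        std::map<std::string, Entry>::iterator it = _entries.begin();
        while (it != _entries.end()) {
            if (--it->second.ttl <= 0) { _entries.erase(it++); ++removed; }
            else ++it;
        }
        return removed;
    }
};
```

The erase(it++) idiom advances the iterator before the erased element is invalidated, which is the safe way to remove from a std::map while iterating.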


Inside Worker, a dedicated thread manages the Worker status. When a change in the Worker status (a change of PI) is detected, the system reacts by sending the new WorkerDescriptor to all neighbour peers, so that they update their knowledge and the PIs of their neighbours. The notification is sent to every peer's Discovery.

Notifications sent by a Worker reach the Discovery of each destination peer. Discovery, in each peer, has a control loop listening on a configured port and waiting for incoming WorkerDescriptors. When a WorkerDescriptor arrives, it means that a neighbour has sent a notification and that this peer's status must be updated.

The notification loop can be reprogrammed to send notifications regardless of status changes.

Notifications ensure that Discovery can pick a Worker for a task (and return it to Manager when asked) by choosing the most suitable peer. This makes the overall system more intelligent. If a peer has a very long queue of tasks to execute, the current task should perhaps be dispatched to a different peer. But if that peer has a very powerful CPU and many cores, that long queue may well be executed quickly.

The notification system and PI provide the kernel with an intelligent way to dispatch tasks among peers.

NOTIFICATION SYSTEM

Notifying status change: Intelligent P2P
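The Worker-side rule (notify every neighbour only when the PI changes) can be sketched as one iteration of the control loop. StatusNotifier and check are hypothetical names, the send callback stands in for the proxy call, and the WorkerDescriptor is reduced to a single PI value for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical status monitor living inside a Worker.
class StatusNotifier {
    std::vector<std::string> _neighbours;  // Discovery endpoints of the neighbours
    std::function<void(const std::string&, double)> _send;  // stand-in for the proxy call
    double _last_pi;
    bool _has_last;
public:
    StatusNotifier(const std::vector<std::string>& neighbours,
                   const std::function<void(const std::string&, double)>& send)
        : _neighbours(neighbours), _send(send), _last_pi(0.0), _has_last(false) {}

    // One iteration of the control loop: notify every neighbour only when
    // the PI differs from the last value sent. Returns true if it notified.
    bool check(double current_pi) {
        if (_has_last && current_pi == _last_pi) return false;
        for (std::size_t i = 0; i < _neighbours.size(); ++i)
            _send(_neighbours[i], current_pi);
        _last_pi = current_pi;
        _has_last = true;
        return true;
    }
};
```

Sending unconditionally, as the reprogrammed loop mentioned above would, amounts to dropping the first line of check().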


A well-structured logging system makes it possible to trace all the most important events in the system. The system is based on files.

SynLogging is a very versatile system for creating logs for one or more runs of the kernel. It is meant to create four log files, one for each component in the application. Many peers might run on the same machine, so a postfix is used to tell one peer from another and avoid collisions.

Logging can be performed in separate or "dense" mode: it is possible to create a log for each component, or a common log for all of them. Furthermore, if no postfix is specified for the current peer, all non-postfixed peers will operate on the same files and write their logs to a common location. This happens because the file writing policy is "open/append or create new".

SYNLOGGING

[Figure: the application (APP) writes either to four separate logs (MAN LOG, W LOG, DISC LOG, UI LOG) or to a single COMMON LOG.]

Logging system: Auditing and tracking events
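The postfix and "open/append or create new" ideas can be sketched as follows. The file naming scheme, log_file_name and SynLogger are assumptions made for illustration; only the postfix, the separate/dense modes, the append policy and the print() method name come from the slides.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <string>

// Builds a log file name such as "man_peerA.log" or, in dense mode,
// "common_peerA.log". The naming scheme itself is an assumption.
inline std::string log_file_name(const std::string& component,
                                 const std::string& postfix, bool dense) {
    std::string base = dense ? "common" : component;
    return postfix.empty() ? base + ".log" : base + "_" + postfix + ".log";
}

// Appends one line per call; the mutex serialises writers within this process.
class SynLogger {
    std::string _path;
    std::mutex _mutex;
public:
    explicit SynLogger(const std::string& path) : _path(path) {}
    void print(const std::string& message) {
        std::lock_guard<std::mutex> lock(_mutex);
        std::ofstream out(_path.c_str(), std::ios::app);  // open/append or create new
        out << message << '\n';
    }
};
```

With an empty postfix two peers on the same machine resolve to the same file name, which reproduces the shared-log behaviour described above.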


TaskMan-Middleware is not a complete piece of software; it is meant to be scaled in the future and extended with more functionality. There are also some issues to be aware of. Let's now look at the most important information.

Licensing system: GNU GPL (General Public License) v. 3.0
Rationale: Possibility to extend current implementations, add new ones and solve current issues.

Code location: Hosted on Google Code Project @ http://code.google.com/p/taskman-middleware.

Most important scaling target: The possibility to act on the kernel in order to support both P2P and Client/Server models simply by operating on proxies. Thanks to the proxy pattern, all proxies can be re-implemented so that the entire system works as a client, a server or a peer.

Final considerations: Enhancements, applications and issues


TaskMan-Middleware suffers from some issues due to time constraints and development resources.

Code style: There are some stylistic issues to be solved, especially regarding printing. No class yet supports the << operator for producing output. Logging classes use a print() method instead of a static stream like "cout" combined with a properly overloaded << operator.

Inheritance: Only some classes define an internal environment ready for future subclassing. Most classes do not use protected members and therefore do not anticipate future extension by new implementations.

Tasks: At present, tasks can be commands or executables but, for the purposes of the project, no real task execution is performed; it is only simulated. The system can be extended (without much effort) to actually execute a command or a binary when it reaches its final destination.

Task generation: At present, workflows are created randomly by UI. The user cannot directly assemble a workflow and submit it through a proper user interface. In the future, a front end could be created to let users create and submit workflows.

Issues and little pathologies: Elements to be solved
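As an illustration of the missing << support mentioned under code style, a stream insertion operator for a descriptor could look like this. The TaskDescriptor shown here is a hypothetical, trimmed-down stand-in; the real class's fields are not shown in the slides.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down descriptor used only for this example.
struct TaskDescriptor {
    int id;
    std::string command;
};

// Stream insertion so descriptors can be written to cout, files or log streams alike.
inline std::ostream& operator<<(std::ostream& os, const TaskDescriptor& td) {
    return os << "Task#" << td.id << "[" << td.command << "]";
}
```

Because the operator takes any std::ostream, the same code would serve console output and file-based logging without a separate print() method per class.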


TaskMan-Middleware offers many possibilities for scaling and extensibility. The current implementation can be modified to support new functionalities and new services.

GUI: There are many C++ libraries for graphical user controls and interface elements. Most of them are not open (like Borland, DevExpress), but many are free and open too. It would be possible to create a GUI for the kernel in order to present events in tables, or to take advantage of interactive controls for a better user experience.

Charting: Many companies specialise in charting and develop solutions and APIs that provide rendering and charting controls to developers (like DevExpress for Borland applications). It would be possible to take advantage of the logging classes and implement data structures for rendering all logs and getting statistics about network conditions, throughput, statistical approximation and much more.

QoS: It would be possible to act on proxies in order to provide information about the quality of service, especially regarding network communication, packet loss, turnaround time, timings, events and much more.

Applications and future projections: What can it be...

THANK YOU
