Order No.: 2010-ISAL-0052, Year 2010
Thesis
Information Sharing Aware of Users' Mobility and Interests in Mobile Ad-hoc Networks
Presented to the Institut National des Sciences Appliquées de Lyon
(INSA de Lyon)
To obtain the degree of Doctor
Doctoral school INFOMATHS: « Computer Science and Mathematics »
(Specialty: Computer Science)
By Addisalem Negash Shiferaw
Defended on 12 July 2010 before the examination committee
composed of:
Prof. Sylvain Lecomte, Université de Valenciennes, Reviewer
Prof. Jean-Marc Pierson, Université Paul Sabatier-Toulouse 3, Reviewer
Prof. Ernesto Damiani, University of Milan, Examiner
Dr. Richard Chbeir, Université de Bourgogne, Examiner
Dr. Dawit Bekele, Internet Governance in Africa, Examiner
Prof. Lionel Brunie, INSA de Lyon, Thesis supervisor
Dr. Marian Scuturici, INSA de Lyon, Thesis co-supervisor
Order No.: 2010-ISAL-0052, Year 2010
Thesis
Mobility and Interest Aware Information Sharing in MANETs
Submitted to the National Institute of Applied Sciences (INSA de Lyon)
In fulfillment of the requirements for the Doctoral Degree
Doctoral School INFOMATHS: « Computer Science and Mathematics »
(Affiliated Area: Computer Science)
Prepared by Addisalem Negash Shiferaw
Defended on 12 July 2010 before
the examination committee:
Prof. Sylvain Lecomte, University of Valenciennes, Reviewer
Prof. Jean-Marc Pierson, University Paul Sabatier-Toulouse 3, Reviewer
Prof. Ernesto Damiani, University of Milan, Examiner
Dr. Richard Chbeir, University of Bourgogne, Examiner
Dr. Dawit Bekele, African Regional Bureau, Internet Society, Examiner
Prof. Lionel Brunie, INSA de Lyon, Supervisor
Dr. Marian Scuturici, INSA de Lyon, Co-Supervisor
Acknowledgments
Many people contributed and offered their valuable help in the preparation and completion of this thesis. It is a great pleasure for me to take this opportunity to express my gratitude to all of them.
First of all, I would like to extend my warm thanks to my thesis supervisor, Prof. Lionel Brunie, for his encouragement, his advice, his unconditional support and the experience he passed on to me throughout these years of doctoral study. His perpetual energy and his enthusiasm for research made my stay in the laboratory pleasant and enriching. Moreover, he was always present and ready to help me overcome the challenges of academic and social life. I would also like to thank his family for their hospitality and welcome during my stay in France.
I would also like to thank my thesis co-supervisor, Dr. Marian Scuturici. He was always happy to interact, to discuss my research work and to provide constructive advice.
My thanks also go to the members of the jury who agreed to review and examine this work. I thank Prof. Ernesto Damiani for agreeing to chair the jury. I also express my gratitude to Prof. Jean-Marc Pierson and Prof. Sylvain Lecomte, who agreed to serve as reviewers. I thank them for their thorough reading of the manuscript and the many pertinent remarks they made. Finally, I thank Dr. Richard Chbeir and Dr. Dawit Bekele for their very interesting questions, which helped deepen my reflection.
I am grateful to the French Embassy in Ethiopia for agreeing to fund my research and my stay in France. In this regard, I cannot pass over the hospitality I received from the staff of CROUS de Lyon. I also wish to thank Dr. Dawit Bekele for facilitating the administrative process concerning my scholarship with the French Embassy in Ethiopia.
I wish to thank all the members of my family, especially Negash Shiferaw, Aregash Mamo, Yalemzewd Negash, Yelewtfrie Negash and Helen Negash, for their encouragement and indispensable support. I am lucky to have Shewangizaw Mengesha, my fiancé, by my side in the happiest and saddest moments. He has always been at my side and devoted a great deal of time to helping me solve the problems I encountered during my studies. I will never forget the support and help of my Ethiopian friends and colleagues, including Dejene Ejigu, Elizabeth Addis, Fana Belay, Netsanet Mitiku, Girma Berhe and Rahel Kifle. I would also like to thank the Ethiopian community of Lyon, who contributed in one way or another to the success of my work.
I am grateful to my friends Yaser Fawaz and Sonia Lajmi, with whom I had very good scientific discussions and spent unforgettable moments throughout the thesis. Above all, I will never forget their support in difficult moments such as article deadlines, the writing of the thesis, etc. In addition, I would like to thank Faiza Najjar for her support and advice during the identification of the research problem. I wish to thank all my colleagues and the staff of LIRIS/INSA, especially Valérie Lebey, Mabrouka Gheraissa, Talar Atéchian, Omar Hasan, Lyes Limam, Zeina Torbey, Armelle-Natacha Ndjafa-Yakou, Vanessa El-Khoury, Adel Ayara, Christian Vilsmaier, Tobias Mayer, Jingwei Miao, Sonia Ben Mokhtar, Nadia Bennani, Sylvie Calabretto and Elod Egyed-Zsigmond.
And finally, last but not least, I wish to thank God: may your name be honored and glorified!
Addisalem Negash Shiferaw, 12 July 2010, Lyon, France
Acknowledgments
Several people have contributed and extended their valuable assistance in the preparation and completion of this thesis. It is a pleasure to convey my gratitude to them all in my humble acknowledgment.
First and foremost, I would like to extend my heartfelt thanks to my supervisor, Prof. Lionel Brunie, for his encouragement, guidance and unconditional support throughout my doctoral study. Working with him gave me extraordinary and invaluable experience throughout the research work. His perpetual energy, intelligence and enthusiasm for research made my stay in the laboratory smooth and rewarding. In addition, he was always present and willing to help me overcome the challenges of academic and social life. I would also like to thank his family for the hospitality they provided me during my stay in France.
I would like to thank my co-advisor, Dr. Marian Scuturici. He was always delighted to interact with me and discuss my research work, and he provided me with valuable ideas and concepts to carry out my research.
My thanks also go to the examination committee members who agreed to examine and review this research work. I thank Prof. Ernesto Damiani for agreeing to chair the examination committee. I also express my gratitude to Prof. Jean-Marc Pierson and Prof. Sylvain Lecomte, who agreed to be reviewers. I am grateful for their thorough reading of the thesis and the pertinent remarks they made. Finally, I thank Dr. Richard Chbeir and Dr. Dawit Bekele for posing very interesting questions that helped me deepen my reflection.
I owe many thanks to the French Embassy in Ethiopia, Addis Ababa, for sponsoring my PhD study. In this regard, I cannot pass without mentioning the hospitality I received from the staff of CROUS de Lyon. I would also like to extend my special thanks to Dr. Dawit Bekele for facilitating the administrative process concerning my scholarship with the Embassy.
I would like to thank all members of my family, especially Negash Shiferaw, Aregash Mamo, Yalemzewd Negash, Yelewtfrie Negash and Helen Negash, for their indispensable encouragement and support. I am thankful to have Shewangizaw Mengesha, my fiancé, by my side during the happiest and saddest times. He always found time to help me resolve the problems I encountered during my study. I will never forget the support and help of my Ethiopian colleagues and friends, including Dejene Ejigu, Elizabeth Addis, Fana Belay, Girma Berhe, Netsanet Mitiku and Rahel Kifle. I want to use this opportunity to thank the Ethiopian community in Lyon, who contributed in one way or another to the success of my research work.
I am grateful to my friends Yaser Fawaz and Sonia Lajmi, with whom I had very good scientific discussions and a wonderful time throughout my study. Above all, I will never forget their support in difficult times, such as while proofreading articles and preparing the thesis manuscript. I would like to use this occasion to thank Faiza Najjar for her support and advice during the identification of the research problem. I would like to extend my thanks to all colleagues and staff of LIRIS/INSA, especially Valérie Lebey, Mabrouka Gheraissa, Talar Atéchian, Omar Hasan, Zeina Torbey, Armelle-Natacha Ndjafa-Yakou, Vanessa El-Khoury, Adel Ayara, Christian Vilsmaier, Tobias Mayer, Jingwei Miao, Sonia Ben Mokhtar, Nadia Bennani, Sylvie Calabretto and Elod Egyed-Zsigmond.
Last but not least, I would like to thank God: may your name be honored and glorified!
Addisalem Negash Shiferaw, July 12, 2010, Lyon, France
Summary
Information sharing within a mobile peer-to-peer network has become an important research topic thanks to rapid progress in wireless communication technologies and smart mobile devices. Information sharing means making data available to the people one is in contact with, so that they can view, modify or download it.
Users can share general information (for example, documents about education or tourism), personal information (for example, photos and personal profiles), or live broadcasts (for example, radio or television broadcasts). The information to be shared is generally presented in the form of a file; in this case, information sharing can be regarded as file sharing. This thesis mainly addresses the problem of file sharing.
In general, nomadic users communicate through wireless networks provided by their network operators (3G, and soon 4G) or through public access points distributed across the city. However, infrastructure-based networks are not always the most appropriate, because of (i) their partial unavailability, for example in public transport or in the countryside; (ii) their potentially high cost, especially for sharing multimedia documents; and (iii) the non-uniform distribution of their public access points. Mobile ad-hoc networks (MANETs) can therefore be a more efficient solution in places where installing an infrastructure is impossible. In the near future, MANETs will become more powerful thanks to Wi-Fi Direct technology.
The objective of our research is to design and implement an information sharing system for an ad-hoc environment. This system allows users to share information over a MANET wherever and whenever they have the opportunity. The thesis focuses particularly on the challenges related to users' mobility and interests.
In a MANET, information sharing is usually performed by distributing advertisements and queries. To avoid overloading the environment with unnecessary advertisements and queries, it is important to design an appropriate advertisement policy. An advertisement policy specifies the volume of information to advertise, the period after which an advertisement should be repeated, and the maximum number of peers (hops) an advertisement may traverse. It must take into account the consumption and provision of information, which depend on the users' connection time (i.e., the time they stay together in a MANET) and on their contexts. Consequently, an advertisement policy should be parameterized according to the users' connection time and their contexts.
Given the massive amount of information to be shared, files are controlled and filtered to avoid overloading the network, which could otherwise prevent the sharing activity from succeeding. Moreover, the tiny interface of mobile devices is not suited to browsing all the files available in the environment. We therefore propose that sharable files be selected according to users' interests.
In this thesis, we propose a middleware called SAMi that allows nomadic users to share information according to their interests, contexts and connection times. We propose an approach to parameterize advertisement policies according to users' profiles and contexts. The parameterization process is performed semi-automatically by analyzing information sharing activities.
SAMi classifies files hierarchically and presents them in a structure called a file tree. During the advertisement process, the middleware advertises files using either (i) a detailed description (located at a deep level of the file tree) or (ii) a general description (located at a shallow level). This approach lets a user know a peer's potential to provide information without receiving advertisements for every sharable file. Thus, the dissemination of a query is limited to the peers that have the potential to provide the requested files.
Users can reactively specify their interests in receiving or providing information. Users' interests can also be determined automatically using association rules, which associate users' interests with their context. We also propose using social networks to facilitate the interest identification process.
SAMi was tested in two environments: a simulated one, and a real one in which it was deployed on mobile devices interconnected via Bluetooth. The evaluations allowed us to conclude that SAMi has very good potential to help nomadic users share information according to their interests. Our main future work concerns context management and user privacy.
Keywords: data sharing, mobility awareness, interest awareness, file classification, context awareness, mobile computing, ad-hoc networks
Abstract
Mobile peer-to-peer information sharing has become an important research topic due to
the rapid advancement in wireless communication technologies and smart devices.
Information sharing is the practice of making information available for other individuals to
view, modify and download. Users may share general information (e.g., documents about
education and tourism), personal information (e.g., personal photos and profiles), or live
information (e.g., news being transmitted on the radio). The information to be shared is
usually presented in the form of a file. In this case, information sharing can be regarded as
file sharing. This thesis focuses specifically on issues related to file sharing.
Nowadays, nomadic users usually communicate by using infrastructure-based wireless
networks provided by wireless telecommunication networks (3G and soon 4G) and public
hotspots distributed in the city. However, infrastructure-based wireless networks are not
always adequate because (i) there are places where no infrastructure-based wireless
network exists; (ii) telecommunication networks are costly to use, especially for
multimedia data; and (iii) public hotspots are not uniformly distributed. Thus, an
infrastructure-less network, i.e., a mobile ad-hoc network (MANET), can provide a more
efficient solution in places where installing an infrastructure is not possible. In the near
future, MANETs will become more powerful with the use of Wi-Fi Direct.
The focus of our research is to build an information sharing system that allows users to
share information wherever and whenever they get the opportunity by using a MANET.
The thesis particularly focuses on the challenges related to the mobility and the interests of
users.
In a MANET, information sharing is usually performed by distributing advertisements
and queries. The preparation and distribution of an advertisement are guided by an
advertisement policy. An advertisement policy describes the volume of information to be
advertised, the period after which an advertisement can be repeated and the number of
hops that an advertisement traverses. In order not to overload the environment with
unnecessary advertisements and queries, an advertisement policy should be prepared
according to the information consumption and provision of users. The information
consumption and provision of users are affected by their stay time, i.e., the time that they
stay together in a MANET. Consequently, an advertisement policy should be parameterized
according to the users’ stay time. The users’ stay time is affected by their mobility patterns,
which are expressed by their speeds, movement directions and pause times.
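To make the idea of a stay-time-parameterized advertisement policy concrete, the following Python sketch maps an estimated stay time to the three policy parameters named above: volume, repetition period and hop count. The class AdvertisementPolicy, the function policy_for_stay_time and every constant in it (descriptions per second, four repetitions per stay, the 60 s and 300 s hop thresholds) are hypothetical illustrations; they are not the rules used by SAMi, whose parameterization is obtained semi-automatically by analyzing users' sharing activities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdvertisementPolicy:
    volume: int      # number of file descriptions carried by one advertisement
    period_s: float  # seconds after which the advertisement may be repeated
    max_hops: int    # maximum number of hops the advertisement may traverse


def policy_for_stay_time(stay_time_s: float,
                         descriptions_per_s: float = 0.5,
                         max_volume: int = 50,
                         hop_limit: int = 3) -> AdvertisementPolicy:
    """Map an estimated stay time to a policy (hypothetical rules and constants)."""
    if stay_time_s <= 0:
        # Peers are not expected to stay together: do not advertise at all.
        return AdvertisementPolicy(volume=0, period_s=0.0, max_hops=0)
    # Volume grows with the stay time, is at least 1 and is capped at max_volume.
    volume = min(max_volume, max(1, int(stay_time_s * descriptions_per_s)))
    # Repeat the advertisement four times within the expected stay time.
    period_s = stay_time_s / 4
    # Short encounters stay local (1 hop); beyond that, one extra hop per
    # full 300 s of expected stay, up to hop_limit.
    if stay_time_s < 60:
        hops = 1
    else:
        hops = min(hop_limit, 1 + int(stay_time_s // 300))
    return AdvertisementPolicy(volume, period_s, hops)


for t in (30, 120, 600):
    print(t, policy_for_stay_time(t))
```

The only design property the sketch is meant to show is monotonicity: a shorter expected encounter yields a smaller, more local advertisement, so little bandwidth is spent on peers that are likely to leave soon.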
Furthermore, users have a lot of information to share with each other. If the files to be
shared are not controlled, the resulting information overload will hinder the sharing activity.
Moreover, the input and output facilities of mobile phones do not allow nomadic users
to browse all of the sharable files in the vicinity. Therefore, we argue that sharable files
should be selected according to the users' interests.
In this thesis, we propose an advertisement-based middleware called SAMi to allow
nomadic users to share information according to their interests, contexts and stay times.
We propose an information discovery approach, which is used by SAMi, to parameterize
advertisement policies according to users’ profiles and contexts. The parameterization
process is performed semi-automatically by analyzing users’ information sharing
activities.
SAMi classifies files hierarchically and presents them in a file tree. Files are advertised
according to users' profiles and contexts. During advertisement, the middleware describes
files at shallow and deep levels of the file tree. This approach permits a user to know a
peer's potential for information provision without receiving advertisements for each
sharable file. Thus, the dissemination of a query is limited to those peers having the
potential to provide the required file.
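The following Python sketch illustrates, under simplifying assumptions, how advertising at a shallow or deep level of a file tree lets a peer restrict query dissemination. Files are modeled as hypothetical category paths such as ("photo", "travel", "lyon"); advertise, is_prefix and peers_to_query are illustrative names, and plain prefix matching stands in for whatever matching SAMi actually performs between queries and descriptions.

```python
from typing import Dict, Iterable, Set, Tuple

# A file is identified here by its path in a hypothetical category tree,
# e.g. ("photo", "travel", "lyon"); shorter tuples are more general categories.
Path = Tuple[str, ...]


def advertise(files: Iterable[Path], depth: int) -> Set[Path]:
    """Describe sharable files by their ancestor categories at the given depth.

    A small depth gives a short, general advertisement; a large depth gives
    a longer, more detailed one.
    """
    return {f[:depth] for f in files}


def is_prefix(a: Path, b: Path) -> bool:
    """True if category a is an ancestor of (or equal to) category b."""
    return b[:len(a)] == a


def peers_to_query(query: Path, adverts: Dict[str, Set[Path]]) -> Set[str]:
    """Return only the peers whose advertised categories may contain the query."""
    return {peer for peer, categories in adverts.items()
            if any(is_prefix(c, query) or is_prefix(query, c) for c in categories)}


adverts = {
    "p1": advertise([("photo", "travel", "lyon"), ("photo", "family")], depth=1),
    "p2": advertise([("doc", "tourism", "paris")], depth=2),
    "p3": advertise([("photo", "sport", "football")], depth=2),
}
print(peers_to_query(("photo", "travel"), adverts))  # prints {'p1'}
```

The trade-off is visible in the example: p1's shallow advertisement ("photo",) is compact but only says that p1 may hold the file, so a query can reach a peer that turns out not to have it, whereas p3's deeper advertisement is longer but lets the querying peer rule p3 out.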
Users can specify their interests to receive/provide information reactively. Users’
interests can also be automatically determined by using association rules, which associate
users’ interests with their context. We also propose to use the users’ social networks to
facilitate the interest identification process.
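As an illustration of rules that associate context with interests, the sketch below computes the standard support and confidence measures for single-attribute rules of the form "context attribute => interest" over a history of records. The record format, the attribute names ("bus", "morning", ...) and the thresholds are assumptions made for the example; the code is a plain counting pass rather than a full Apriori implementation, and it is not the rule-mining procedure of SAMi.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# One history record: the context attributes observed and the interest shown then.
Record = Tuple[FrozenSet[str], str]
# One rule: (context attribute, interest, support, confidence).
Rule = Tuple[str, str, float, float]


def mine_rules(history: List[Record], min_support: float,
               min_confidence: float) -> List[Rule]:
    """Rules 'context attribute => interest' filtered by support and confidence."""
    n = len(history)
    attribute_count: Counter = Counter()  # records containing the attribute
    pair_count: Counter = Counter()       # records containing attribute and interest
    for context, interest in history:
        for attribute in context:
            attribute_count[attribute] += 1
            pair_count[(attribute, interest)] += 1
    rules: List[Rule] = []
    for (attribute, interest), k in pair_count.items():
        support = k / n                              # fraction of all records
        confidence = k / attribute_count[attribute]  # P(interest | attribute)
        if support >= min_support and confidence >= min_confidence:
            rules.append((attribute, interest, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical history of one user.
history: List[Record] = [
    (frozenset({"bus", "morning"}), "news"),
    (frozenset({"bus", "evening"}), "music"),
    (frozenset({"bus", "morning"}), "news"),
    (frozenset({"campus", "morning"}), "lectures"),
]
for rule in mine_rules(history, min_support=0.25, min_confidence=0.6):
    print(rule)
```

With these thresholds, "bus => news" is kept (support 0.5, confidence 2/3) while "bus => music" is discarded (confidence 1/3), which is the kind of filtering that would let a context such as being on a bus trigger a default interest.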
SAMi has been deployed in a simulated environment. It has also been deployed over real
devices interconnected by Bluetooth. From the evaluations that have been made, we have
observed that SAMi has very good potential to help nomadic users share information
according to their interests. Our main future work concerns context management
and user privacy.
Keywords: data sharing, mobility awareness, interest awareness, classification of files,
mobile computing, context-aware computing, ad-hoc networks
Table of Contents
Chapter 1 Introduction
1.1 Background
1.2 Motivation and Requirements
1.3 Research Problem
1.4 Objective
1.5 Research Contributions
1.6 Structure of the Thesis
Chapter 2 Related Work
2.1 Information Sharing in Peer-to-Peer Systems
2.2 Information Sharing in MANETs
2.3 Service Discovery
2.4 Routing
2.5 Summary
Chapter 3 Interest Awareness
3.1 Motivation
3.2 Definitions
3.3 Interest-Aware Information Discovery
3.4 Interest Identification
3.5 Social Networking
3.6 Discussion
3.7 Conclusion
Chapter 4 Lifetime Awareness
4.1 Overview
4.2 Formalization
4.3 Mobility Class Generation
4.4 Mobility Class Identification
4.5 Conclusion
Chapter 5 File Classification and Organization
5.1 Motivation
5.2 Information Representation
5.3 Classification Algorithm
5.4 Information Sharing Based on File Organization
5.5 Discussion
5.6 Conclusion
Chapter 6 Implementation and Evaluation
6.1 SAMi: a Self-Adaptive Middleware
6.2 Implementation
6.3 Evaluation
6.4 Discussion
6.5 Conclusion
Chapter 7 Conclusion and Future Work
7.1 Summary of Contributions
7.2 Conclusion
7.3 Future Work
Glossary of Acronyms
Bibliography
Annex A. Extended Summary (Résumé Étendu)
Annex B. Detailed Design of SAMi
Annex C. Important Classes of SAMi
List of Figures
Figure 2-1: System Layers of P2P Information Management Systems
Figure 2-2: Transiently Shared Tuple Space
Figure 2-3: An example of the global virtual data structure managed by PeerWare
Figure 3-1: A MANET in Bus 37
Figure 3-2: Advertisement Distribution by p1
Figure 4-1: Augment-Volume
Figure 4-2: Reduce-Volume
Figure 4-3: An example of stay-time computation
Figure 5-1: Query resolution via advertisements about individual files
Figure 5-2: File organization and query resolution
Figure 5-3: Example of specialized metadata of a photo
Figure 5-4: An example of the metadata of a cluster
Figure 5-5: Vector representation of a cluster
Figure 5-6: An example of association between a file tree and mobility classes
Figure 5-7: The redundancy created by considering all mobility classes
Figure 5-8: A possible result of k-means classification
Figure 6-1: Architecture of SAMi
Figure 6-2: Examples of user agenda and habits
Figure 6-3: Context management in SAMi
Figure 6-4: Example of adaptation process
Figure 6-5: Implementation of ConAMi by the file adaptation module
Figure 6-6: SAMi deployment
Figure 6-7: Component diagram of SAMi
Figure 6-8: Core classes and their relationships
Figure 6-9: Class diagram to manage historical data
Figure 6-10: Classes for information classification
Figure 6-11: A test bed to simulate a MANET
Figure 6-12: Examples of representation of metadata in the local repository
Figure 6-13: Browsing photos by their directory organization
Figure 6-14: Browsing photos by their organization in a file tree
Figure 6-15: Querying
Figure 6-16: Collaboration during photo annotation
Figure 6-17: Deliverability of files for experiment one
Figure 6-18: Deliverability of files for experiment two
Figure 6-19: Performance of the interest extraction algorithm
Figure 6-20: Rules to identify information demand
Figure 6-21: Rules to identify mobility classes
Figure 6-22: Content-based classification performance
Figure 6-23: Metadata-based classification performance
Figure 6-24: Vector production in the PC
Figure 6-25: Performance of metadata-based classification
Figure 6-26: Advertisement content determination in a mobile phone
Figure 27: Architecture of SAMi
Figure B-1: State of a device
Figure B-2: States of a user
Figure B-3: Activity diagram of advertisement
Figure B-4: Searching information for a user query
Figure B-5: Activity diagram of information extraction
Figure B-6: Activity diagram of query treatment
Figure B-7: File searching from incoming advertisement
Figure B-8: Activity diagram of rule mining
Figure B-9: Activity diagram of file representation and classification
Figure B-10: A SAMi-Adaptor for Yahoo Messenger
Figure B-11: SAMi-thin
Figure B-12: SAMi GUI
Figure B-13: Classes in SAMi-GUI
Figure B-14: SAMi-core
Figure B-15: SAMi-ext
List of Tables Table 2-1: Analysis of P2P systems .................................................................................... 27 Table 2-2: Summary of the information sharing systems designed for MANETs.............. 36 Table 2-3: Analyzes of information sharing system of MANETs....................................... 38 Table 2-4: Analyzes of service discovery protocols............................................................ 45 Table 3-1: Examples of sharing contexts ............................................................................ 59 Table 3-2: Examples of Information demands of Pascal..................................................... 60 Table 3-3: Examples of queries........................................................................................... 70 Table 3-4: Similarity values calculated by using the formula presented in Definition 3.2 . 72 Table 3-5: Example of execution flows during the decomposition of queries.................... 73 Table 3-6: Interests produced from queries listed in ........................................................... 74 Table 3-7: Historical data of Pascal .................................................................................... 77 Table 3-8: Tie-Strengths between Pascal, Anne, Bob and Eve ........................................... 79 Table 4-1: example of a mobility class................................................................................ 90 Table 4-2: Examples of sharing statistics ............................................................................ 92 Table 4-3: Range-lifetimes and advertisement volumes of classes..................................... 92 Table 4-4: Merging of mobility classes............................................................................... 94 Table 4-5: Pascal’s sharing statistics ................................................................................. 
104 Table 5-1: Basic description of a file photo ...................................................................... 108 Table 5-2: Description of a cluster .................................................................................... 109 Table 6-1: The inputs of the test-bed................................................................................. 151 Table 6-2: Types of Environments .................................................................................... 152 Table 6-3: Constants for the query extraction algorithm................................................... 154 Table 6-4: Characteristics of information demands .......................................................... 155 Table 6-5: Characteristics of sharing-statistics.................................................................. 157 Table 6-6: Constants considered during rule mining evaluation....................................... 157 Table 6-7: Range-lifetimes of mobility classes designed for sharing context (“”,∅) ....... 158 Table 6-8: Parameters used classification algorithm in the first experimentation ............ 160 Table 6-9: Inputs for classification algorithm for the second type of experimentation .... 162 Table 6-10: Test data used during filtering advertisements .............................................. 164 Table 7-1: Comparing SAMi to existing information sharing systems............................. 172 Table B-1: Important activities to perform advertisement .................................................. xx Table B-2: Activities to extract and search information...................................................xxiii
Chapter 1 Introduction
1.1 Background
1.1.1 Information Sharing
Information sharing is the practice of making information available to other individuals to view, modify and download. Users may share general information such as documents about education and tourism. They may also share personal information such as personal photos and profiles. It is also possible to exchange live information, such as a radio broadcast.
The information to be shared is often presented in the form of a file, in which case information sharing can be regarded as file sharing. The information to be shared can also be presented as a stream of data. However, as the most popular information sharing applications are based on file sharing, this thesis focuses specifically on issues related to file sharing.
Information sharing is accomplished via three activities: information discovery, delivery and routing. These activities can be managed by a centralized, a partially centralized or a purely distributed architecture. In a centralized architecture, one or more dedicated servers manage the sharing activities. In a partially centralized architecture, one or more administrator peers, running on dedicated or non-dedicated devices, are responsible for managing the information sharing activities. In a purely distributed architecture, all peers are equal and share responsibilities equally. In this thesis, we consider a purely distributed architecture, since finding administrator peers is difficult in a MANET.
Finally, an information sharing system can be anonymous or social network based. In anonymous systems, information sharing is performed without considering users’ acquaintances; this feature characterizes early file sharing systems. Social network based systems allow users to share information according to their social relationships. In MANETs in particular, where resources are limited, exploiting social networks can facilitate collaboration between users.
1.1.2 Mobile Ad-hoc Networks
The initial step towards a MANET was the Packet Radio Network (PRNET). The architecture of PRNET was quite close to the current view of a MANET: a PRNET comprised mobile terminals and mobile repeaters (the forerunners of mobile routers). During the 1990s, a number of projects inspired by PRNET led to the development of ad-hoc routing algorithms and eventually to the creation of the IETF MANET group. This group focused mainly on routing algorithms with various goals but later broadened its research scope. Today, various applications and services can be implemented on MANETs. Users who are opportunistically co-located in places such as airports, train stations, coffee shops, pubs, malls and highways can use MANETs to share information instantly. Ad-hoc networks can also be used for entertainment, for example to provide instant connectivity for multi-user games.
Ad-hoc networks can be deployed to support emergency services when the existing network infrastructure has ceased to operate or has been damaged by a disaster such as an earthquake, a hurricane or a fire. Similarly, on a battlefield, a MANET can be deployed to facilitate communication among the soldiers in the field.
The following features [1, 2] characterize a MANET:
1. Mobility of nodes: The movement of peers cannot be controlled in a MANET. Peers can move freely from location to location and can therefore leave and join the network at any time.
2. Lack of infrastructure: As the name implies, a MANET is an infrastructure-less network. Because of the limited transmission radius, a message from a source peer to a destination peer may have to traverse multiple intermediate peers. As there is no centralized control, network management should be distributed across peers.
3. Scarce resources: Wireless links have limited bandwidth and variable capacity. In addition, peers participating in a MANET are battery-powered.
In summary, MANETs can provide solutions in situations where infrastructure-based networks cannot be accessed because they are unavailable or too costly. They can also be used to establish communication efficiently between co-located users. However, the characteristics of MANETs, i.e., the mobility of peers, the lack of infrastructure and the scarcity of computing resources, create challenges for their use. Thus, in this thesis, our goal is to design an information sharing middleware that takes these characteristics of MANETs into account.
1.1.3 Advancement in Mobile Phones and Communication Technologies
The production of mobile devices, mostly cell phones, is growing rapidly. The number of mobile subscriptions reached 3.3 billion worldwide in October 2008 and is forecast to reach 5.32 billion by 2013 [3].
Mobile devices have become capable of storing a large number of files and of performing complex computations that were previously processed only by personal computers. They are equipped with wireless network technologies, sensors and applications; their storage capacity increases with each passing day, and their processing power has improved dramatically. Cell phone battery life is also continuously improving: today, some devices can operate for more than 8 hours in active mode, i.e., talking without interruption [4].
The introduction of the iPhone [5] drastically changed people’s view of cell phones. A large number of applications and games have been produced for the iPhone, and people now use their iPhones to access e-mail and social network sites such as Facebook.
These days, most mobile phones are equipped with short-range wireless communication technologies; in most cases, Bluetooth or WiFi is integrated [6].
Bluetooth [12] allows devices to communicate over short distances at moderate transmission speeds. It provides wireless point-to-point connections between PDAs, notebooks, printers, mobile phones, audio components and other devices. Bluetooth operates in the frequency band from 2.400 GHz to 2.483 GHz (83 MHz). Typically, Bluetooth devices have a range of 10 to 100 meters and data transfer rates of up to 3 Mbps. A master device and one or more slave devices form a so-called piconet. In a piconet, the master can communicate with up to 7 active slaves, while up to 248 additional devices can remain in sleep mode; a sleeping device can become active when an active device goes into sleep mode. Multiple independent piconets can form a scatternet, in which some slaves act as bridges by participating in two or more piconets. The number of devices in a Bluetooth scatternet is not limited.
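The piconet membership rules described above can be made concrete with a small toy model. This is a minimal sketch, not a Bluetooth stack API: the class `Piconet`, its methods and the function `bridges` are hypothetical names, and the limits are simply the figures quoted in the text (7 active slaves, 248 devices in sleep mode).

```python
from dataclasses import dataclass, field

MAX_ACTIVE_SLAVES = 7   # active slaves per master, as stated in the text
MAX_SLEEPING = 248      # additional devices in sleep mode, as stated in the text


@dataclass
class Piconet:
    """Toy model of piconet membership (hypothetical, not a Bluetooth API)."""
    master: str
    active: list = field(default_factory=list)
    sleeping: list = field(default_factory=list)

    def join(self, device: str) -> str:
        """Admit a device as an active slave if a slot is free, else put it to sleep."""
        if len(self.active) < MAX_ACTIVE_SLAVES:
            self.active.append(device)
            return "active"
        if len(self.sleeping) < MAX_SLEEPING:
            self.sleeping.append(device)
            return "sleeping"
        raise RuntimeError("piconet is full")

    def put_to_sleep(self, device: str) -> None:
        """An active slave goes to sleep; the longest-waiting sleeper takes its slot."""
        self.active.remove(device)
        if self.sleeping:
            self.active.append(self.sleeping.pop(0))
        self.sleeping.append(device)

    def members(self) -> set:
        return set(self.active) | set(self.sleeping)


def bridges(a: Piconet, b: Piconet) -> set:
    """Slaves that belong to both piconets can bridge them into a scatternet."""
    return a.members() & b.members()


p1 = Piconet("phone-A")
states = [p1.join(f"dev{i}") for i in range(9)]
print(states.count("active"), states.count("sleeping"))  # 7 2
p2 = Piconet("laptop-B")
p2.join("dev3")
print(bridges(p1, p2))  # {'dev3'}
```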
In 1997, the IEEE ratified the 802.11 WLAN standard, establishing a global standard for implementing and deploying WLANs. The original IEEE 802.11, now obsolete, had a throughput of 2 Mbps. Today's WiFi devices, based on IEEE 802.11a and 802.11g, provide transmission rates of up to 54 Mbps [7]. A new standard, IEEE 802.11n [7], which can support up to 600 Mbps, is being standardized. In infrastructure mode, Wi-Fi devices communicate with each other through a controller device known as a wireless access point or "hot spot". Hot spots usually combine three primary functions: physical support for interfacing wireless and wired networking, routing between devices on the network, and service provisioning to add and remove devices from the network. The Wi-Fi Alliance is nearing completion of a new specification, named Wi-Fi Direct, that enables Wi-Fi devices to connect to one another without wireless access points [8]. It allows devices equipped with Wi-Fi communication technology (IEEE 802.11a, 802.11g or 802.11n) to take part in an ad-hoc network by embedding a software access point in these devices.
ZigBee is a low-power, low-cost, low-rate, short-range wireless technology built on top of the IEEE 802.15.4 WPAN standard [9]. ZigBee radios operate in three frequency bands (868 MHz, 915 MHz and 2.4 GHz) and support data rates of up to 250 kbps [10]. ZigBee protocols are intended for embedded applications requiring low data rates and low power consumption. ZigBee's current focus is to define a general-purpose, inexpensive, self-organizing mesh network that can be used for industrial control, embedded sensing, medical data collection, smoke and intruder warning, building automation, home automation, etc.
The maturity of communication and computing technologies shows that MANETs are a feasible means of letting mobile devices communicate with each other anywhere and at any time. Thus, in this thesis, we place particular emphasis on mobile phones. Our information sharing middleware does not depend on any specific communication technology; however, Bluetooth is used during the evaluation of the middleware.
1.2 Motivation and Requirements
1.2.1 Scenario
The following scenario will be used to discuss the requirements of an information sharing system in MANETs. We will also use it to illustrate our propositions throughout the thesis.
Pascal, a first-year Ph.D. student at INSA, uses MANETs to exchange information in different locations. On a bus, his PDA connects with the devices of fellow passengers via wireless network technologies. Passengers advertise sharable files to others in their surroundings. Pascal usually browses the advertisements he has received in order to discover the files he is looking for. If he does not find the files he needs, he formulates queries describing them. The required files are then searched for by querying the neighborhood.
In his office, Pascal searches the Internet for documents that help him advance his research. When he leaves the office, unresolved queries are transferred to his cell phone. During lunch at the university restaurant and while having coffee with colleagues in the university cafeteria, documents matching the queries are searched for in the neighborhood.
Pascal holds brainstorming sessions with colleagues and professors in his laboratory. In this laboratory, discussions are customarily held in a park during the summer and in a restaurant during the winter. Most of the participants use a laptop to take notes. Pascal is also required to take some courses at INSA. He notes doubts and questions during the brainstorming sessions and lectures. From these doubts and questions, queries are prepared and searched for in the neighborhood. When documents matching the queries are found in the neighborhood, they are downloaded and saved temporarily until Pascal approves the download.
On weekends, Pascal likes shopping and watching football matches, and he usually goes to nightclubs with his friends. In a shop, Pascal exchanges information about goods. In nightclubs, Pascal and his friends take photos and share them with each other. Pascal is a supporter of the football team “Olympique Lyonnais”. Whenever he goes to a stadium, he exchanges information about Olympique Lyonnais’ players and matches with other supporters.
1.2.2 Requirements
An information sharing middleware should fulfill the requirements listed below, which we illustrate using the scenario.
Pervasiveness: nomadic users should be able to share information anywhere, at any time and using any device. Pascal exchanges information in the morning, at midday, at night, etc. In the scenario, the middleware works in different places such as a nightclub, a restaurant, a cafeteria and a stadium. Laptops, mobile phones and PDAs can be involved in the information sharing process.
Mobility-awareness: the dynamicity of the environment, which is described by the
mobility of users, determines the quantity of information to be presented. For instance, it is
not necessary to present all the files stored in Pascal’s device to the users in a shop since
they do not have time to check and/or download all the presented files.
Interest-awareness: users’ interests vary with the context of their environment. Pascal is interested in sharing academic information at school and information about Olympique Lyonnais’ players and matches at a stadium. His interests are not the same when he is with his friends as when he is with his colleagues.
High-level semantics: advertising the descriptions of sharable files one by one consumes a lot of bandwidth and energy. It is therefore important to classify files in order to present them in groups. For instance, Pascal’s photos can be categorized as photos taken at a nightclub, on the campus, with fellow students, and so on.
Context-aware content delivery: content delivery should adapt to the context of users and their environment. Online or offline delivery can be applied according to the dynamicity of the environment. Delivery can be performed in offline mode, via e-mail for example, if the time during which the source and requester peers stay connected is too short to download the requested file. Distributed content delivery can be applied if there are several sources of the information.
Social awareness: information sharing should be conducted according to social
relationships and networking of users. For instance, photos taken by Pascal at a nightclub
should not be proposed to his co-workers but only to his best friends.
Data dissemination: unlike traditional networks, MANETs have no dedicated routing infrastructure to disseminate advertisements and queries or to route requests, data and responses. Therefore, efficient routing and dissemination algorithms are needed to share information in a MANET.
1.3 Research Problem
Despite the maturity of the technologies, information sharing over the mobile Internet does not work as nomadic users expect, for the following main reasons:
• Accessing the Internet from a cell phone (via GSM networks) is still expensive, especially for exchanging multimedia data and in developing countries;
• Public hotspots, directories, routing and other important services required for information exchange are not uniformly distributed.
To address these problems, MANETs can be used in places where the Internet cannot be accessed because infrastructure-based networks are unavailable or too costly. Moreover, the realization of Wi-Fi Direct will dramatically improve the usefulness of MANETs.
This thesis aims to propose an information sharing middleware for MANETs. Information sharing is a popular and mature research domain; it has been studied since the invention of computer networking. However, information sharing is taking on a new dimension because of the characteristics of MANETs and of nomadic users. In this new dimension, information sharing systems must overcome challenges arising from mobile devices and wireless network technologies. This thesis focuses on the following two main challenges of information sharing.
Mobility: In a MANET, information sharing is usually performed by distributing advertisements and queries. In order not to overload the environment with unnecessary advertisements and queries, an advertisement policy should be designed according to the information consumption and provision of users. Users’ information consumption and production are limited by their stay time (the time that they stay together). Users’ stay time is affected by their mobility patterns, which depend on their speeds, movement directions and pause times. Therefore, an information sharing middleware should be mobility-aware in order to overcome the challenges arising from the mobility of users. We define mobility awareness as the ability to design the advertisement policy according to the dynamicity of the network, which we measure by the users’ stay time.
Interests: Users have a lot of information to share with each other. If the files to be shared are not controlled, information overload will hinder the use of the information sharing middleware. Thus, a pervasive information sharing system should be interest-aware. We define interest awareness as the ability to adapt the information discovery approach to the users’ interests. In other words, the system needs to capture users’ interests and determine the information to be shared accordingly.
1.4 Objective
The main objective of the thesis is to design a middleware that allows co-located users to
exchange information anywhere and anytime by using MANETs according to their
interests and stay-time (the time they stay together).
The following are our specific objectives:
(1). Design a theoretical model that
• determines sharable files and queries according to the interests of users to
receive and provide information.
• disseminates queries and advertisements according to the users’ interests.
• determines the volume, the radius and the period of advertisements
according to the users’ context and their stay-time.
• identifies the users’ interests to receive and provide information according
to their context and their social networks.
• classifies files hierarchically in such a way that information discovery is
facilitated and simplified.
(2). Design and implement an information sharing middleware that realizes the theoretical model specified in (1) and satisfies the requirements stated in section 1.2.2.
1.5 Research contributions
The main contributions of this research work are:
Interest awareness: In this thesis, we introduce a concept called Interest that expresses the information that a user wants to receive or provide. We formalize the concept of interest as well as its use in information sharing.
Lifetime awareness: We introduce a concept called mobility class, which we use to describe MANETs according to the users’ stay time and their context, in such a way that similar advertisement policies are applied in MANETs described by the same mobility class. The formalization, computation and application of mobility classes are discussed in this thesis.
Classification of files: We propose that files be hierarchically classified into a “file tree”. The dimensions of a file tree, i.e., the height of the tree and the number of clusters at each depth, are computed from the mobility classes under consideration. The formation and use of file trees are discussed in this thesis.
Adaptable information sharing approach: SAMi (Self Adaptive Middleware) integrates an adaptable information sharing approach. In SAMi, information is delivered from one or more sources in order to minimize the delivery time and maximize the chance of obtaining the required information. SAMi can carry out file discovery with a “push” approach, a “pull” approach or a hybrid of the two. In the “push” approach, a data source makes other peers aware of its sharable files by disseminating advertisements; in the “pull” approach, a requester peer searches for the source(s) of a file by distributing queries.
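The difference between push, pull and hybrid discovery can be illustrated with a toy model. The `Peer` class and its methods below are hypothetical and only contrast the three approaches; they are not SAMi's actual design, which also takes users' interests and mobility into account.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer used only to contrast the discovery approaches."""
    name: str
    files: set = field(default_factory=set)    # files this peer shares
    heard: dict = field(default_factory=dict)  # file -> sources learnt from advertisements

    def advertise(self, neighbours) -> None:
        """Push: make the neighbours aware of this peer's sharable files."""
        for n in neighbours:
            for f in self.files:
                n.heard.setdefault(f, set()).add(self.name)

    def query(self, wanted: str, neighbours) -> set:
        """Pull: ask the neighbours which of them hold the wanted file."""
        return {n.name for n in neighbours if wanted in n.files}

    def find(self, wanted: str, neighbours) -> set:
        """Hybrid: use received advertisements first, fall back to a query."""
        known = self.heard.get(wanted)
        return set(known) if known else self.query(wanted, neighbours)


alice = Peer("alice", {"slides.pdf"})
bob = Peer("bob", {"match.mp4"})
pascal = Peer("pascal")
alice.advertise([pascal])                       # push step
print(pascal.find("slides.pdf", [alice, bob]))  # answered from the advertisement
print(pascal.find("match.mp4", [alice, bob]))   # nothing advertised: pull query
```

Push costs bandwidth even when nobody needs the advertised files, whereas pull costs a round of queries every time a file is needed; the hybrid only queries when advertisements do not already answer the request.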
1.6 Structure of the Thesis
This thesis is organized into seven chapters. In this chapter, we have discussed the motivation behind our research work, our research objectives and the core research issues of the thesis. Related work on data routing, service discovery and information sharing is discussed and analyzed in chapter 2. Interest-awareness and lifetime-awareness are discussed in chapters 3 and 4, respectively. File representation and organization are covered in chapter 5. SAMi, a middleware that implements the propositions of the thesis, is discussed in chapter 6, where we also present the implementation and evaluation of the middleware. Finally, we summarize our research contributions and highlight future work in chapter 7. The abbreviations used in the thesis and their descriptions are listed in the glossary, and a detailed view of the design of the middleware is provided in Annex A and Annex B.
Chapter 2 Related Work
Most of the research efforts in MANETs have been devoted to solving problems related to information dissemination and routing. In the last decade, information sharing and service discovery have also attracted increasing research attention. As MANETs are peer-to-peer (P2P) networks by nature, work on information sharing in traditional P2P networks can serve as a basis for designing information sharing systems for MANETs. In this chapter, we review and analyze the data routing protocols, information sharing systems and service discovery protocols designed for MANETs and traditional peer-to-peer networks.
This chapter is organized as follows. Section 2.1 discusses the information sharing
systems that have been proposed in traditional peer-to-peer systems and analyzes the
possibility of their adoption in MANETs. Section 2.2 presents information sharing systems
designed for MANETs. Section 2.3 discusses service discovery protocols with respect to
the objective of the thesis. Section 2.4 presents routing protocols. Finally, we conclude the
chapter by summarizing the important contributions of the reviewed approaches in section
2.5.
2.1 Information Sharing in Peer to Peer Systems
Peer-to-peer (P2P) systems are designed to facilitate the sharing of computing resources and data through the direct involvement of participants, which are usually referred to as peers or end-peers. P2P systems are closely associated with information sharing, although they can also be used to share other types of resources such as CPU cycles and storage space [11].
Peers are in most cases personal computers. They are autonomous and are assumed to have an equal chance to participate in the consumption and provision of resources.
As displayed in Figure 2-1, P2P systems are structured in three layers [12]: infrastructure, application and user. The infrastructure layer focuses on the construction of virtual (overlay) networks, i.e., computer networks built on top of the physical network. The application layer enables communication and collaboration between entities in the absence of centralized control. The user layer facilitates social interaction among users.
Figure 2-1: System Layers of P2P Information Management Systems (Layer 3: User; Layer 2: Application; Layer 1: Infrastructure)
According to how peers are linked in the overlay network specified by the infrastructure layer, peer-to-peer networks are classified [13] as unstructured, structured or loosely structured. In unstructured P2P networks, the overlay links are established arbitrarily. In contrast, the topologies of structured P2P networks are tightly controlled in order to resolve queries efficiently: contents and peers are placed systematically in the overlay network by using a hash function. Finally, loosely structured P2P networks are similar to unstructured P2P networks with respect to the links between peers and similar to structured P2P networks with respect to the placement of files.
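One common way of placing content by using a hash function is consistent hashing, as in the Chord DHT; it is used here only as an illustration, since the text does not name a specific scheme. In this minimal sketch (the ring size `RING_BITS` and the function names are assumptions), peers and file names are hashed onto the same identifier ring, and each file is assigned to the first peer whose identifier is at or after the file's identifier.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # hypothetical size of the identifier space (2**16 positions)


def ring_id(name: str) -> int:
    """Hash a peer address or a file name onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


def responsible_peer(file_name: str, peers: list) -> str:
    """Return the first peer at or after the file's identifier on the ring,
    wrapping around to the smallest identifier (Chord-style successor)."""
    ring = sorted((ring_id(p), p) for p in peers)
    ids = [pid for pid, _ in ring]
    i = bisect_left(ids, ring_id(file_name))
    return ring[i % len(ring)][1]


peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
for f in ["lecture1.pdf", "photo42.jpg", "song.mp3"]:
    print(f, "->", responsible_peer(f, peers))
```

Because every peer computes the same hash, a file can be located without flooding; the price is that the ring must be repaired whenever peers join or leave, which happens frequently in a MANET.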
According to the communication and collaboration of entities in the network, the topology of an unstructured peer-to-peer system can be centralized, pure or hybrid.
In centralized P2P systems, the content is separated from its description. The description of the content is stored in a centralized server, while the content itself is stored at the end-peer level. The central server performs file localization, but file delivery is performed in a peer-to-peer manner.
Unstructured pure peer-to-peer systems have a decentralized topology. In such systems, there is no central directory server. Indices of shared files are stored locally by each peer, and a requester peer is responsible for searching for the files it is looking for.
Unstructured hybrid peer-to-peer systems combine features of centralized and pure P2P systems. In such systems, some peers, called super peers, keep the indices of files owned by other devices. Super peers are responsible for file localization.
Finally, according to the social interactions of users, peer-to-peer systems can be further classified as anonymous or social network based. Anonymous peer-to-peer systems are designed for users who do not know each other (or are not interested in knowing each other). Social network based peer-to-peer systems are designed for users who have social relationships with each other.
This section is organized as follows. Centralized, pure and hybrid unstructured peer-to-peer systems are presented in sections 2.1.1 to 2.1.4. Section 2.1.5 discusses structured and loosely structured peer-to-peer systems. Section 2.1.6 discusses social network based peer-to-peer systems. Finally, section 2.1.7 compares, analyzes and summarizes the studied peer-to-peer systems.
2.1.1 Unstructured centralized peer to peer systems
The topology of a centralized P2P system is very similar to the traditional client/server model. Napster, which allowed users to exchange songs stored on their own computers (which are not full-time servers), is an example of a centralized P2P system [14].
The central server is a fundamental entity of any centralized P2P system; it manages the files of the end peers. An end peer logs onto the system by sending the central server its IP address and the indices of the files that it is willing to share. The central server maintains a directory of the end peers, which is updated every time a user logs on or off.
The server is also responsible for searching files on behalf of the peers [15]. A peer contacts the server when it needs a file. The server checks its directory and then sends the requester peer the list of addresses of the peers owning the required file. Afterwards, the file is downloaded from a peer that the requester selects from this list.
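The directory maintained by the central server can be sketched as a simple in-memory index. This is a minimal illustration of the logon, logoff and search steps described above, assuming that peers are identified by their IP address; `CentralDirectory` and its methods are hypothetical and do not correspond to Napster's actual protocol.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical Napster-style index kept by the central server."""

    def __init__(self):
        self.index = defaultdict(set)  # file name -> addresses of peers sharing it
        self.online = {}               # peer address -> files it announced

    def logon(self, address: str, files) -> None:
        """Register a peer's IP address and the indices of the files it shares."""
        self.logoff(address)           # re-registration replaces the old entry
        self.online[address] = set(files)
        for f in self.online[address]:
            self.index[f].add(address)

    def logoff(self, address: str) -> None:
        """Remove a departing peer so that searches no longer return it."""
        for f in self.online.pop(address, set()):
            self.index[f].discard(address)
            if not self.index[f]:
                del self.index[f]

    def search(self, file_name: str) -> list:
        """Return the addresses of the peers that currently own the file."""
        return sorted(self.index.get(file_name, ()))


server = CentralDirectory()
server.logon("10.0.0.1", ["song.mp3", "talk.mp3"])
server.logon("10.0.0.2", ["song.mp3"])
print(server.search("song.mp3"))  # the requester then downloads directly from one peer
server.logoff("10.0.0.1")
print(server.search("talk.mp3"))
```

The whole index lives on one machine, which is exactly what makes this design hard to carry over to a MANET without a dedicated server.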
Centralized P2P systems were the first to be used on the Internet, but recently they have been used mainly for non-file-sharing systems such as SETI@Home, BOINC and Skype. Moreover, it is very difficult to adopt centralized P2P systems in MANETs, as they require a dedicated server. The only possible way to use a centralized P2P system is to replace the central server with an ordinary peer chosen through an election process that takes into consideration the devices’ processing power and battery life, as well as the time that the peer stays in the vicinity. However, some MANETs are populated only by thin devices such as cell phones, PDAs and pagers. Even when heavyweight devices such as laptops are present in a MANET, they are usually battery-powered as well.
2.1.2 Unstructured pure peer to peer systems
A pure P2P system seeks to avoid the central server used in centralized P2P systems. In such a system, all the functionalities of the server are distributed to the end peers. Gnutella is an example of a pure (decentralized) P2P system [16].
Each Gnutella peer has a direct connection to a small number of other Gnutella peers, typically around four [16, 17]. A peer can, however, reach more peers with the help of intermediate peers. The number of intermediate peers between two peers is used to define the number of hops: if the number of intermediate peers is n, the number of hops is n+1.
When a peer searches for a file, it sends a query to each directly connected peer (one-hop peer). Upon receiving the query, peers forward it to their neighbors, which in turn forward it, and so on, until the query reaches a predetermined number of hops from the requester peer.
When a file matching the query is found, information about the file is routed back to the
requester peer [16]. As in a centralized P2P system, the requester peer then decides from
which peer it will download the file; finally, the transfer takes place directly between the
requester peer and the selected peer owning the file.
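The hop-limited flooding described above can be sketched as a breadth-first traversal of the overlay. This minimal sketch assumes the overlay is known as an adjacency dictionary; `flood_query`, `overlay` and `holdings` are hypothetical names. A real Gnutella peer instead relies on a TTL carried in each message and discards duplicate messages, which the `hops` dictionary emulates here. Hops are counted as in the text: a peer reached through n intermediate peers is n+1 hops away.

```python
from collections import deque


def flood_query(overlay, requester, wanted, holdings, max_hops):
    """Flood a query from `requester` and return {peer: hops} for every peer
    within `max_hops` that shares `wanted`.

    overlay  -- dict: peer -> directly connected peers
    holdings -- dict: peer -> set of files shared by that peer
    """
    hops = {requester: 0}          # a peer handles a given query only once
    frontier = deque([requester])
    hits = {}
    while frontier:
        peer = frontier.popleft()
        if peer != requester and wanted in holdings.get(peer, set()):
            hits[peer] = hops[peer]  # the hit is routed back to the requester
        if hops[peer] == max_hops:
            continue                 # hop limit reached: stop forwarding here
        for neighbour in overlay.get(peer, []):
            if neighbour not in hops:
                hops[neighbour] = hops[peer] + 1
                frontier.append(neighbour)
    return hits


overlay = {"R": ["A", "B"], "A": ["R", "C"], "B": ["R", "D"],
           "C": ["A", "E"], "D": ["B"], "E": ["C"]}
holdings = {"C": {"paper.pdf"}, "E": {"paper.pdf"}}
print(flood_query(overlay, "R", "paper.pdf", holdings, max_hops=2))  # {'C': 2}
print(flood_query(overlay, "R", "paper.pdf", holdings, max_hops=3))  # {'C': 2, 'E': 3}
```

Every peer within the hop limit receives the query whether or not it holds the file, so the message count grows quickly with the hop limit; this is the scalability problem of purely decentralized systems.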
Purely decentralized systems are relatively suitable for MANETs. However, they are not scalable because queries are forwarded indiscriminately. Scalability deteriorates further in a MANET, since devices are battery-powered and bandwidth is limited.
2.1.3 Unstructured hybrid peer to peer systems
Hybrid peer-to-peer systems have been proposed as a solution to the scalability problem
faced by both centralized and decentralized systems. They combine properties of
centralized and decentralized systems. As these systems organize sharing activities in a
similar manner, we describe hybrid P2P systems using FastTrack [18], OpenFT [19], JXTA [20] and
eDonkey [11]. FastTrack is the file sharing protocol used by Kazaa [21]. OpenFT (Open
FastTrack) is the file sharing protocol developed by the giFT project. JXTA (Juxtapose) is
a peer-to-peer platform specification initiated by Sun Microsystems in 2001. eDonkey was
a system that facilitated file sharing by using a number of servers.
In hybrid architectures, there are two types of peers. Super peers, called search peers in
OpenFT and rendezvous peers in JXTA, are responsible for searching files on behalf of
ordinary peers. In most hybrid peer-to-peer systems, super peers can be ordinary
peers that enter and leave the peer-to-peer network at will. Dedicated (i.e., static)
super peers are usually used to monitor and keep track of the network. Bootstrapping peers
in FastTrack [18] and index peers in OpenFT [19] are dedicated peers. In eDonkey, all of
the super peers were dedicated servers.
FastTrack and OpenFT build on Gnutella-style flooding for file localization [25, 26].
Localization of a file is accomplished by broadcasting queries between super
peers. When an ordinary peer prepares a query, it sends the query to the super peer to which
it is connected. The super peer in turn broadcasts the same query to all
other super peers to which it is currently connected, and these super peers forward the
query to the super peers to which they are connected. This process is repeated a fixed
number of times. Super peers are also responsible for gathering the results of the query and
transferring them back to the requester.
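The two-tier search can be sketched as follows. This is a simplified Python model, not FastTrack's or OpenFT's real protocol. It assumes that each super peer holds an index mapping file names to the ordinary peers that store them, and it models the "fixed number of times" as a hypothetical `rounds` parameter. All names are illustrative.

```python
def super_peer_search(super_links, index, entry_super_peer, keyword, rounds):
    """Flood a query among super peers for a fixed number of rounds.

    super_links: dict super peer -> list of connected super peers
    index: dict super peer -> dict {file name: ordinary peer holding it}
    Returns a sorted list of (file name, ordinary peer) results that the entry
    super peer gathers and transfers back to the requesting ordinary peer.
    """
    visited = {entry_super_peer}
    current = [entry_super_peer]
    results = set()
    for step in range(rounds + 1):
        for sp in current:                       # each reached super peer checks its index
            for name, owner in index.get(sp, {}).items():
                if keyword in name:
                    results.add((name, owner))
        if step == rounds:
            break                                # forwarding limit reached
        next_wave = []
        for sp in current:
            for other in super_links.get(sp, ()):
                if other not in visited:         # forward once to each new super peer
                    visited.add(other)
                    next_wave.append(other)
        current = next_wave
    return sorted(results)
```

Only super peers take part in the flooding. Ordinary peers send one message to their super peer and receive one answer, which is where the scalability gain over pure flooding comes from.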
JXTA provides a peer-to-peer infrastructure over which other peer-to-peer applications
can be built. JXTA provides protocols that allow peers to (1) organize themselves into peer groups,
(2) discover each other, (3) communicate with each other and (4) monitor each other.
In eDonkey, a protocol named BitTorrent [23] can be used to download a file. BitTorrent
works as follows. Initially, the file to be distributed is available from one server,
called the seed. In addition to the seed, there is a tracker server that keeps track of all the
clients of the file in the network. A client that wants to download the file first obtains the
so-called “torrent file”, which contains metadata about the file and the address of the
tracker for that file. The client then contacts the tracker and receives a list of peers that
are currently downloading the file or have already downloaded it. Finally, the client selects
some peers from this set and downloads the file block by block from the
selected peers.
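The tracker interaction and block-by-block download described above can be sketched in Python. The code is a deliberately simplified model, not the BitTorrent wire protocol. `Tracker`, `announce`, `download` and the torrent keys `info_hash` and `num_blocks` are hypothetical names. Real clients also use piece-selection and incentive strategies that are omitted here.

```python
import random

class Tracker:
    """Hypothetical tracker: remembers which peers take part in each file's swarm."""

    def __init__(self):
        self.swarms = {}   # file identifier (info hash) -> set of peer ids

    def announce(self, info_hash, peer_id):
        """Register peer_id in the swarm and return the other known peers."""
        swarm = self.swarms.setdefault(info_hash, set())
        others = sorted(swarm - {peer_id})
        swarm.add(peer_id)
        return others

def download(torrent, tracker, peer_id, peer_blocks, max_peers=3, rng=random):
    """Fetch a file block by block from a subset of the peers returned by the tracker.

    torrent: metadata read from the torrent file (hypothetical keys 'info_hash', 'num_blocks')
    peer_blocks: dict peer id -> dict {block index: bytes} that the peer can serve
    """
    candidates = tracker.announce(torrent["info_hash"], peer_id)
    selected = rng.sample(candidates, min(max_peers, len(candidates)))
    blocks = []
    for i in range(torrent["num_blocks"]):
        for peer in selected:                 # ask selected peers until one has block i
            if i in peer_blocks.get(peer, {}):
                blocks.append(peer_blocks[peer][i])
                break
        else:
            raise RuntimeError(f"block {i} unavailable from the selected peers")
    return b"".join(blocks)
```

Because every block can come from a different peer, a client that has only part of the file (here `p2`) can still serve the blocks it already holds.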
All the discussed hybrid P2P systems target traditional computing devices. There are,
however, some efforts to include Java ME enabled devices in JXTA
networks [26, 27]. JXTA for Java ME, also called JXME, allows a Java ME
compliant device to participate in the JXTA network. The Mobile Information Device
Profile (MIDP) of Java ME provides the needed APIs, which are already familiar to the
Java development community, and thus acts as a firm foundation for the creation of
wireless JXTA peers. However, MIDP has too many constraints to fully implement JXTA:
limited libraries, no XML parser and no security support. As a result, J2ME peers
can only act as edge peers.
The large majority of Internet file sharing systems are based on the hybrid
unstructured P2P architecture. However, adopting a hybrid architecture in MANETs is
difficult, since devices must be grouped. In MANETs, the only practical way of grouping
is geographical, i.e., devices located in the same area are grouped together. Such
techniques face two important challenges: the election of group leaders and the
communication between them.
2.1.4 Structured P2P systems
Structured P2P systems such as CAN [26], Chord [27], Pastry [28], and Tapestry [29, 30]
manage peer nodes with a logical structure that is formed by using the concept of
distributed hash tables. Distributed hash tables (DHT) are based on the same idea as
conventional hash tables. Each peer has a unique identifier which is calculated, by using a
hash function, from some properties of the peer (e.g., IP address). Similarly, each sharable
file is mapped to the same hash space, for example, by calculating the hash value of the
file’s name. A peer is responsible for files that are mapped “near” to its place in the hash
space. The definition of the proximity metric (i.e., what “near” means) depends on
the actual implementation of the DHT.
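One concrete choice of "near" is Chord's successor rule: a key belongs to the first peer whose identifier is greater than or equal to the key on an identifier circle. The Python sketch below illustrates that rule only. The 16-bit identifier space and the use of SHA-1 over IP addresses and file names are assumptions made for the example. CAN, Pastry and Tapestry define proximity differently.

```python
import bisect
import hashlib

def h(value, bits=16):
    """Map a string into a 2**bits identifier space using SHA-1 (illustrative choice)."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def responsible_peer(peer_addresses, file_name, bits=16):
    """Chord-style rule: the file belongs to the first peer whose identifier is
    >= the file key, wrapping around the identifier circle."""
    ring = sorted((h(addr, bits), addr) for addr in peer_addresses)
    ids = [pid for pid, _ in ring]
    key = h(file_name, bits)
    i = bisect.bisect_left(ids, key)
    return ring[i % len(ring)][1]      # wrap to the smallest id if key > all ids

peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
print(responsible_peer(peers, "thesis.pdf"))
```

The mapping depends only on hash values. Two peers that are adjacent on the ring may therefore be physically far apart, which is the mismatch with MANETs discussed below.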
On top of such a communication network, high-level protocols can be deployed, for
example, service discovery, multicast communication, information retrieval, and file
sharing.
Structured peer-to-peer systems are more scalable than unstructured systems. In
structured peer-to-peer systems, information discovery is a deterministic process.
However, these important properties (i.e., scalability and deterministic information
discovery) are obtained by arranging peers in an overlay network. State-of-the-art
systems built over structured peer-to-peer networks are too complex to be adopted in a
MANET. In addition, the hash functions of these systems do not take the physical
location of peers into account.
2.1.5 Loosely structured peer to peer networks
Loosely structured P2P systems systematically organize files in the network. To our
knowledge, Freenet [31, 32] is the only example of such a system. Freenet is a file sharing
system that allows peers to publish, replicate, and retrieve files in the peer-to-peer network.
Files, in Freenet, are represented by binary file-keys obtained by applying a hash
function [32]. Each peer keeps a local table that contains the list of the keys of the files
stored in neighboring peers. When a peer adds a file to the Freenet network, a key is
assigned to it and the file is forwarded to neighboring peers, which forward it
further. The forwarding process continues until a suitable location is found for
the file or the file has been forwarded a specified number of times. The key of a file is used to
determine a suitable location to store it: a peer is a suitable storage location for a
file with key k if it already stores files whose keys are similar to k.
When a peer searches for a file, it sends a query to the neighbor that it considers closest to the
target key. If this neighbor does not have a file matching the query, it forwards the query
in the same way. The query forwarding process is repeated a fixed number of times.
When a file matching the query is found, it is replicated along the search path.
Replicated files are deleted only when a peer runs out of space. Deletion is
performed in a probabilistic manner: rarely used files have a greater chance of being deleted
than popular ones.
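The key-closeness search and the replication along the search path can be sketched in Python. The code is a simplified model, not Freenet's actual routing (which also backtracks when a branch fails). It makes three assumptions. Keys are integers, closeness is the absolute difference between keys, and each peer's routing table maps a neighbor to the set of keys it is known to store. The data structure layout and function names are hypothetical.

```python
import hashlib

def file_key(name):
    """Binary file key from a hash of the file name (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def freenet_search(peers, start, key, max_hops):
    """Greedy key-closeness search with replication along the search path.

    peers: dict peer -> {"store": {key: data}, "table": {neighbor: set of keys}}
    Returns (data or None, search path).
    """
    path = [start]
    for hop in range(max_hops + 1):
        current = path[-1]
        store = peers[current]["store"]
        if key in store:
            for p in path[:-1]:                 # replicate on every peer of the path
                peers[p]["store"][key] = store[key]
            return store[key], path
        if hop == max_hops:
            break                               # forwarding limit reached
        table = peers[current]["table"]
        candidates = [n for n in table if n not in path and table[n]]
        if not candidates:
            break
        # Forward to the neighbor believed to store keys closest to the target
        path.append(min(candidates, key=lambda n: min(abs(k - key) for k in table[n])))
    return None, path
```

After a successful search every peer on the path holds a copy. Later requests for the same key are therefore answered closer to the requesters, which is how Freenet spreads popular files.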
Hash function based file indexing is an important contribution of Freenet. In a MANET,
this kind of indexing can be used to facilitate the search process. However, in MANETs,
peers should not be selected arbitrarily to store a file. Thus, to adopt Freenet in a MANET,
the hash function should be modified to take into account the stay time, storage
capacity and battery level of the devices.
2.1.6 Social Network based P2P system
A social network is a social structure between participants who are connected through
various social relationships [33]. Web based social networking occurs through a variety of
websites, which are usually called social network sites. Danah M. Boyd [34] defines a
social network site (SNS) as: “a web-based service that allows individuals to (1) construct
a public or semi-public profile within a bounded system, (2) articulate a list of other users
with whom they share a connection, and (3) view and traverse their list of connections and
those made by others within the system”.
In SNSs, users identify their relationships with others. The label for these relationships
depends on the terms predefined by the site, e.g., "Friends", "Contacts" and "Fans". A
social network site (SNS) allows users to share content, interact with each other and
develop communities around similar interests.
There are mobile-specific SNSs (e.g., Dodgeball [35, 36]), and some web-based SNSs
support limited mobile interactions (e.g., MySpace [37] and Facebook [38]). Mobile-specific
SNSs work by exchanging SMS messages.
Dodgeball is a commercial system that delivers SMS messages to users to alert them
about nearby friends [35]. Dodgeball does not use any positioning system such as GPS or
cell-based location. When a user wants to communicate with nearby friends, he/she sends an SMS
message specifying his/her location to the Dodgeball system. Dodgeball
distributes the message to all of his/her pre-selected friends, as well as to any friends of friends
within a ten-block radius. Today, Dodgeball is available in 22 cities within the United
States, including New York, San Francisco, Los Angeles, Chicago, Washington DC,
Boston and Seattle [36].
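The recipient selection described above can be expressed as a small Python function. The sketch rests on several assumptions that are not part of Dodgeball. Locations are modeled as (x, y) coordinates on a grid of city blocks, distance is measured as Manhattan distance in blocks, and each user's last reported location is known. Only friends of friends are filtered by distance, following the description above.

```python
def dodgeball_recipients(friends, last_location, sender, location, radius_blocks=10):
    """Recipients of a check-in message: all of the sender's friends, plus friends
    of friends whose last reported location is within radius_blocks of the sender.

    friends: dict user -> set of friends
    last_location: dict user -> (x, y) position in city blocks (hypothetical grid model)
    """
    def blocks_away(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])    # Manhattan distance in blocks

    direct = set(friends.get(sender, ()))
    friends_of_friends = {f2 for f in direct for f2 in friends.get(f, ())} - direct - {sender}
    nearby = {u for u in friends_of_friends
              if u in last_location
              and blocks_away(last_location[u], location) <= radius_blocks}
    return direct | nearby
```

Here the server, not the phones, evaluates the social and spatial filter. This matches the observation later in this section that such systems are peer-to-peer only at the user level.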
ImaHima is a location-specific application created in 1999 by a Japanese company
named ImaHima [39]. This system allows users to share their current personal status (such
as location, activity or mood) and pictures with nearby friends [35].
In summary, by exploiting the social network of a user, SNS-based peer-to-peer
networks can outshine traditional ones. However, social network based P2P systems
are at an early stage. Recall that a peer-to-peer system has three layers: user,
application and overlay. Currently, all SNSs are peer-to-peer only at the user level. In
addition, in these systems, users have to perform information sharing manually.
2.1.7 Summary
Peer-to-peer information sharing systems concentrate on (1) organization of files and (2)
the organization and interconnection of peers in the overlay network. However, these
systems do not take into consideration the location of peers, their capacities, the time that
they stay together and the bandwidth of the network technology.
Table 2-1 compares the P2P systems Gnutella, Freenet, CAN, eDonkey and Dodgeball
according to the requirements discussed in chapter 1. The requirement “pervasiveness” is not
fulfilled by most of the systems. Only Dodgeball fully satisfies the pervasiveness requirement,
by using SMS to interconnect nearby friends. The requirement “social awareness” is satisfied
only by Dodgeball, since in this system users communicate with their friends. Interest
awareness is not considered by the mentioned peer-to-peer systems.
As files are arranged systematically in their overlay networks, we consider that structured
and loosely structured peer-to-peer systems partially satisfy the requirement “High level
semantics”.
The requirements “mobility awareness” and “data dissemination” are not the concerns of
most existing peer-to-peer systems. Since routing in the Internet is performed by dedicated
routers, to our knowledge no peer-to-peer system integrates a routing protocol. Finally, as
peers in the Internet are less mobile, mobility awareness is not considered by most of these
systems.
Most of the discussed peer-to-peer systems help peers to discover files, but leave file
delivery to the peers themselves. Recent peer-to-peer systems (including
eDonkey), however, integrate content delivery protocols like BitTorrent to facilitate the file delivery
process.
Table 2-1: Analysis of P2P systems
Requirement | Gnutella | Freenet | CAN | eDonkey | Dodgeball
Pervasiveness | - | - | + | - | ++
Mobility awareness | X | X | X | X | -
High level semantics | - | + | + | - | X
Social awareness | - | - | - | - | ++
Context aware content delivery | X | X | X | ++ | X
Data dissemination | X | X | X | X | X
Interest-awareness | - | - | - | - | +
X: not applicable, -: not considered, +: considered partially or in a limited way, ++: fully considered
2.2 Information sharing in MANET
A number of information sharing systems such as ORION [40], Code-Torrent [41],
LIME [42, 43], Limeone [44], TOTA [45, 46], PeerWare [47], AdHocFS [48, 49], Ad-hoc
InfoWare [50] and XMIDDLE [51] have been proposed for MANETs in this decade. In
this section, we discuss these systems with respect to information discovery and delivery
activities. Section 2.2.1 discusses ORION and Code-Torrent. Section 2.2.2 presents Lime,
Limeone and TOTA. PeerWare, AdHocFS, Ad-hoc InfoWare and XMIDDLE are
discussed in sections 2.2.3 to 2.2.6. Finally, we compare and analyze the information
sharing systems in section 2.2.7 according to the requirements presented in chapter 1.
2.2.1 ORION and Code-Torrent
ORION [40] is a system that allows peers in a multi-hop MANET to share files. ORION
combines application layer tasks, which are used to search and download files, with
routing tasks, which are used to distribute queries, responses and data. A simple multicast
and broadcast protocol for a MANET [52] is used to disseminate queries.
ORION maintains two routing tables: a response routing table and a file routing table.
Similar to the routing tables used by AODV [53], the response routing table is used to store
the address of the peer from which a query message has been received as next hop on the
reverse path. Thus, a peer is able to return responses to the requesting peer without explicit
route discovery.
In ORION, a peer prepares a query as a list of keywords when it needs files and forwards
the query in the vicinity. A source peer prepares a response as a list of identifiers of files
and sends it in the direction of the requester. When an intermediate peer p1 gets a response
for a query q from a peer p2, it checks the response before forwarding it. If the response
contains files that the peer p1 does not have, p1 notes p2 as the next suitable peer for the
query q.
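ORION's two routing tables can be modeled as in the following Python sketch. It is not ORION's implementation. The class and field names are hypothetical. The file routing table is keyed by file identifier, an assumption made because the data-request step described next asks for the next suitable peer owning a given file.

```python
class OrionPeer:
    """Sketch of ORION's response routing table and file routing table."""

    def __init__(self, peer_id, files):
        self.peer_id = peer_id
        self.files = set(files)       # identifiers of locally stored files
        self.response_route = {}      # query id -> previous hop on the reverse path
        self.file_route = {}          # file id -> list of next suitable peers

    def on_query(self, query_id, from_peer):
        # As in AODV, remember where the query came from so responses can
        # return without an explicit route discovery.
        self.response_route.setdefault(query_id, from_peer)

    def on_response(self, query_id, from_peer, file_ids):
        """Record from_peer as a next suitable peer for files this peer lacks,
        then return (next hop towards the requester, files to forward)."""
        for f in file_ids:
            if f not in self.files:
                hops = self.file_route.setdefault(f, [])
                if from_peer not in hops:
                    hops.append(from_peer)
        return self.response_route.get(query_id), list(file_ids)
```

A peer therefore learns routes to files simply by relaying responses. This is how ORION couples the application-layer search with routing.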
In ORION, a file is divided into a fixed number of blocks and is downloaded block by
block. When a requester gets a response for a query, it chooses a file and sends a
data-request message to a peer from which it received a response containing the identifier of the
selected file. A data-request message indicates the block of the file that the peer wants to
download. If the receiving peer has this file, it sends the requested block to the requester;
otherwise, it forwards the message to the next suitable peer. An intermediate peer sends a
route-error message to the peer from which it received the request message when all its next
suitable peers owning the selected file are disconnected. When a peer receives a route-error
message, it forwards the request message to another next suitable peer. The requester
cancels the download of the file if all the peers owning the file are disconnected before the
download is completed.
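The failover behavior of the block-by-block download can be sketched from the requester's side. In this Python model, a hypothetical `serve_block` callable stands in for a data request and raises `ConnectionError` when the chosen peer is unreachable, which plays the role of a route-error message. The function name and parameters are illustrative.

```python
def fetch_file(num_blocks, suitable_peers, serve_block):
    """Download a file block by block, switching peers on route errors.

    suitable_peers: ordered list of next suitable peers owning the file
    serve_block: callable(peer, block index) -> bytes, raising ConnectionError
                 when the peer is unreachable (stand-in for a route-error message)
    Returns the file content, or None if every owner disconnected first.
    """
    remaining = list(suitable_peers)
    blocks = []
    i = 0
    while i < num_blocks:
        if not remaining:
            return None                  # all owners disconnected: cancel the download
        try:
            blocks.append(serve_block(remaining[0], i))
            i += 1
        except ConnectionError:
            remaining.pop(0)             # route error: retry the block at the next peer
    return b"".join(blocks)
```

Blocks already received are kept when a peer disappears, so only the missing blocks are requested from the next suitable peer.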
ORION is not scalable, as it performs file localization by query flooding. The main drawback
of ORION is its use of an overlay network. Even if the overlay is constructed by
considering the physical location of peers, it is still costly, as there is no guarantee that
peers will remain where they are. The important contribution of ORION is the delivery of
files from one or more sources.
Code-Torrent [41], a file sharing system designed for one hop MANETs, is similar to
ORION in the delivery of files, i.e., downloading of files is performed block by block from
one or more peers. The difference between the two protocols is on the file discovery
process. ORION uses a pull discovery approach (i.e., querying the neighborhood) while
Code-Torrent applies a push discovery approach (i.e., advertising sharable files). In Code-
Torrent, every peer distributes the descriptions of sharable files to its neighbors. A file
description contains the identifier of the file, its name and the number of blocks into which
the file is divided.
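Code-Torrent's push-based discovery can be sketched as a one-hop advertisement step. In this Python code the `FileDescription` fields follow the description above (identifier, name, number of blocks). The per-peer catalog structure and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileDescription:
    file_id: str
    name: str
    num_blocks: int

def advertise(neighbors, catalogs, sender, shared):
    """One-hop push: the sender sends the descriptions of its sharable files to
    each direct neighbor, which records which peers advertised each file.

    neighbors: dict peer -> list of peers in radio range
    catalogs: dict peer -> dict {FileDescription: set of advertising peers} (hypothetical)
    Returns the number of advertisement messages sent.
    """
    for n in neighbors.get(sender, ()):
        catalog = catalogs.setdefault(n, {})
        for desc in shared:
            catalog.setdefault(desc, set()).add(sender)
    return len(neighbors.get(sender, ()))
```

Every peer pays this cost periodically, whether or not anyone is interested in its files. This is the overhead discussed in section 2.2.7.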
2.2.2 Lime, Limeone and TOTA
LIME [42, 43], Limeone [44] and TOTA [45,46] inherit and adapt the communication
model proposed by Linda [54]. The coordination aspect in Linda is accomplished by using
a tuple space, which is globally shared by all participating peers.
Chapter 2: Related Work
30
A tuple is a record of one or more fields, each field having a value of a certain data type.
Any process can write a tuple into the tuple space. Another process can then take or read a
tuple from the space. Taking differs from reading in that a tuple is no longer
available after it has been taken. To find a tuple to take or read, a process must supply
a special type of tuple called a template.
Each field of a template is filled with a specific value or a “wildcard”. The template is
matched against all tuples in the tuple space. A template and a tuple match if they have
the same number of fields and the corresponding fields have the same data types and values (a wildcard
matches any value).
A process can request to take/read a tuple by sending a template; in this case, it receives
a tuple matching the template. If multiple tuples match the template, one tuple is chosen
arbitrarily; if no tuple matches the template, the process is blocked until there is a match.
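The tuple-space semantics described above can be captured in a compact Python sketch. The method names follow Linda's classic primitives: `out` writes, `rd` reads and `in` takes. `in` is spelled `in_` because `in` is a Python keyword. The `ANY` wildcard object and the optional `timeout` are additions for this example. Linda allows any matching tuple to be returned, whereas this sketch returns the oldest one.

```python
import threading

ANY = object()   # wildcard field usable in templates

def matches(template, tup):
    """Same number of fields, and each non-wildcard field has the same type and value."""
    return len(template) == len(tup) and all(
        t is ANY or (type(t) is type(v) and t == v) for t, v in zip(template, tup))

class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Write a tuple into the space and wake up blocked processes."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def _find(self, template, remove, timeout):
        with self._cond:
            # Block until a matching tuple exists (or the optional timeout expires)
            found = self._cond.wait_for(
                lambda: any(matches(template, t) for t in self._tuples), timeout)
            if not found:
                return None
            tup = next(t for t in self._tuples if matches(template, t))
            if remove:
                self._tuples.remove(tup)
            return tup

    def rd(self, template, timeout=None):
        """Read a matching tuple; it stays in the space."""
        return self._find(template, False, timeout)

    def in_(self, template, timeout=None):
        """Take a matching tuple; it is no longer available afterwards."""
        return self._find(template, True, timeout)
```

Example: after `ts.out(("temp", "room1", 21))`, both `ts.rd(("temp", ANY, ANY))` and a subsequent `ts.in_(("temp", "room1", ANY))` return the tuple. A second `rd` then blocks because the tuple has been taken.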
In Lime, the shared tuple space is replaced by a transiently shared tuple space. As
displayed in Figure 2-2, a transiently shared tuple space is formed from tuple spaces of
peers in the MANET. Each mobile entity is associated with a personal tuple space,
accessed through an interface tuple space (ITS). When mobile entities come into contact, their
ITSs are merged. Each mobile entity performs tuple operations over its personal tuple
space, which is updated with information from other personal tuple spaces when possible. The
transiently shared tuple space is recomputed upon the arrival or departure of a peer. Peers
cannot access the tuple space while the update is in progress. Consequently, this
system is not appropriate for a dynamic MANET.
Figure 2-2: Transiently Shared Tuple space (the local tuple spaces of the peers together form the transiently shared tuple space)
In Limeone [44] and TOTA [45, 46], no global tuple space is constructed. In both
systems, peers manage their own tuple spaces. In Limeone, mobile agents are used to
access the tuple spaces of neighbors, while in TOTA, peers make others aware of sharable
information by distributing tuples.
In Limeone, a software agent is attached to each tuple space. An agent can access the
tuple space of a neighbor upon connection. An agent can inform other agents in the
neighborhood about the tuples that it manages. However, this approach limits the sharing
of data since peers share data only when they have a direct connection.
TOTA works by propagating tuples; each tuple carries a propagation rule. The
propagation rule determines how the tuple should be distributed in the environment. This
includes determining the “scope” of the tuple (i.e., the distance to which the tuple should
be propagated and possibly the spatial direction of the propagation) and how the
propagation can be affected by the presence or the absence of other tuples in the system.
The propagation rules can also be used to determine how the tuple content should change
while it is propagated. Attaching a propagation rule to a tuple is an important contribution
of TOTA. This rule can be used to make TOTA work despite the network dynamicity.
However, the authors did not explicitly discuss the design of propagation rules according
to the network dynamicity.
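A hop-scoped propagation rule of the kind described above can be sketched in Python. This is not TOTA's API. The `scope` parameter stands for the distance bound of the rule, and the optional `update` callable stands for a rule that changes the tuple content as it travels. Both names are hypothetical, and directional propagation and reactions to other tuples are not modeled.

```python
from collections import deque

def propagate(topology, source, content, scope, update=None):
    """Spread a tuple breadth-first from source, at most `scope` hops away.

    topology: dict peer -> list of neighbors currently in range
    update: optional callable(content, hops) -> new content, applied at each hop
    Returns dict peer -> local copy of the tuple.
    """
    copies = {source: content}
    queue = deque([(source, 0)])
    while queue:
        peer, hops = queue.popleft()
        if hops == scope:
            continue                       # scope reached: stop propagating
        for n in topology.get(peer, ()):
            if n not in copies:
                new = update(copies[peer], hops + 1) if update else copies[peer]
                copies[n] = new
                queue.append((n, hops + 1))
    return copies

line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
field = propagate(line, "a", {"type": "alert", "dist": 0}, 2,
                  update=lambda c, h: {**c, "dist": h})
```

Each reached peer stores a copy annotated with its hop distance, here `a`, `b` and `c` but not `d`. Re-running the propagation after a topology change would rebuild this field, which is one way such rules could cope with network dynamicity.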
2.2.3 PeerWare
PeerWare [47] is a coordination model. It was developed as a core component of the
MOTION [55] platform, which is designed for a MANET containing fixed backbone
peers.
PeerWare exploits the notion of a global virtual data structure (GVDs). GVDs is a meta-
model of communication for mobile environments, centered around the idea of supporting
coordination among a set of peers through a data space that is transiently shared and
dynamically built out of the data spaces provided by the peers.
GVDs is represented as a hierarchy of nodes containing documents. As shown in Figure 2-3, a
document may be accessible from multiple nodes, i.e., a document can have
multiple parents. For example, the thesis you are reading can be classified in the mobile
computing domain as well as in the information management domain. Changes in the
connectivity of peers determine the content of the GVDs, as data held by disconnected
peers become inaccessible.
Figure 2-3: An example of the global virtual data structure managed by PeerWare
The model provides operations that allow peers to query the GVDs, to subscribe for
events and to receive corresponding notifications. The execute operation allows peers to
execute an arbitrary piece of code on a selected set of items held by connected peers. The
subscribe operation allows peers to subscribe to events occurring on a selected set of items.
The publish operation allows peers to notify the occurrence of events.
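The following Python code is a minimal, single-process sketch of a hierarchy in which a document can have several parents, together with the subscribe and publish operations. It is not PeerWare's API. All class and method names are hypothetical, and the execute operation is not modeled.

```python
class GVDS:
    """Nodes form a hierarchy; documents may be attached to several nodes."""

    def __init__(self):
        self.children = {}        # node -> set of child nodes
        self.docs = {}            # node -> set of documents attached to it
        self.subscriptions = []   # (predicate on documents, callback)

    def add_node(self, parent, node):
        self.children.setdefault(parent, set()).add(node)

    def attach(self, node, doc):
        self.docs.setdefault(node, set()).add(doc)   # the same doc may sit under many nodes

    def documents_under(self, node):
        """All documents reachable below node; each is reported once even if it
        is reachable through several parents."""
        found, stack, seen = set(), [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            found |= self.docs.get(n, set())
            stack.extend(self.children.get(n, ()))
        return found

    def subscribe(self, predicate, callback):
        self.subscriptions.append((predicate, callback))

    def publish(self, doc, event):
        """Notify every subscriber whose predicate selects the document."""
        for predicate, callback in self.subscriptions:
            if predicate(doc):
                callback(doc, event)
```

Following the thesis example, the same document can be attached under both a "mobile computing" node and an "information management" node. The traversal then has to suppress duplicates, which hints at why multiple parents complicate retrieval.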
Organizing documents in a GVDs is an important contribution of PeerWare. However,
allowing a document to have multiple parents makes information retrieval complicated.
Furthermore, the modification of a GVDs is costly for a MANET because it should be
done each time a peer joins/leaves the MANET.
In PeerWare, all peers except thin devices such as PDAs hold a local data structure. As a result, small
devices can only play the role of a client. However, many MANETs today are formed
without heavyweight devices, and the peers in these MANETs store important
information that can be shared with each other.
2.2.4 AdHocFS
AdHocFS [48, 49] is a file system that permits mobile devices to access information
stored in traditional file systems. In this system, mobile devices are considered as terminals
that cache files and exchange caches in MANETs. Every peer is assumed to
have a UID that uniquely identifies it. Peers organize themselves by using their
UIDs.
Initially, each peer is a group leader by itself. Group-leaders discover other peers by
broadcasting messages periodically. Upon discovering other groups, groups are merged
and the peer having the minimum UID becomes the leader of the enlarged group. The leader
has a connection to every peer in the group. Every peer is connected to a peer having a UID
lower than its own.
The leader broadcasts the hierarchical directory structure of each peer of the group to
every other peer of the same group. The group leader is responsible for ensuring that (i)
members access the most recent versions of data within a group and (ii) peers have the
same versions of data. In addition, a group leader performs file replication in order to avoid
loss of data due to a sudden disconnection of peers.
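The group formation rule can be simulated with a few lines of Python. The sketch assumes integer UIDs and a known set of pairs of peers within radio range. It repeats discovery rounds until no two groups can hear each other, and each merge keeps the minimum UID as leader. The function name and inputs are hypothetical, and the periodic broadcasts are abstracted into the `in_range` check.

```python
def form_groups(uids, in_range):
    """Merge groups whose members are in radio range; the minimum UID leads.

    uids: iterable of integer UIDs
    in_range: set of frozenset({a, b}) pairs of peers that can hear each other
    Returns dict leader UID -> set of member UIDs.
    """
    groups = {u: {u} for u in uids}          # initially every peer leads its own group
    merged = True
    while merged:                            # repeat discovery rounds until stable
        merged = False
        leaders = sorted(groups)
        for i, a in enumerate(leaders):
            for b in leaders[i + 1:]:
                if a in groups and b in groups and any(
                        frozenset((x, y)) in in_range
                        for x in groups[a] for y in groups[b]):
                    # a < b because leaders are sorted, so a keeps the leadership
                    groups[a] = groups.pop(a) | groups.pop(b)
                    merged = True
    return groups
```

The leader is chosen only by its UID. A weak device with a small UID therefore ends up managing the whole group, which is the limitation noted below.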
As a group leader must manage communications with all peers in the network,
using AdHocFS in a multi-hop MANET seems very difficult. Moreover, the leaders
are selected by their UIDs rather than by their capacity (e.g., storage and battery).
2.2.5 InfoWare
InfoWare [50] is a system designed for a MANET that contains gateways to the Internet.
Even if the network infrastructure is formed in an ad-hoc way, the members of the network
are pre-defined. The system consists of a knowledge manager, a resource manager,
watchdogs, a distributed event notification service and a security manager.
The knowledge manager module adds a layer of knowledge to the information shared in
the network. It is also used to relate metadata descriptions of information items to a
semantic context. Moreover, it enables querying and retrieval of information items and
resources available in the network.
The distributed event notification service (DEN) is employed to facilitate the exchange
of information. The DEN consists of three services: a publisher, a subscriber and an event
notifier. Any peer involved in the information sharing needs to implement at least one of
these services. Subscribers specify the content that they need. Publishers produce an event
if they have the information needed by the subscribers.
The watchdog module, implemented by every publisher, is used to detect events and to
check the fulfillment of the condition specified by subscribers.
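The interplay between subscriptions and watchdogs can be sketched in Python, reduced to a single process. Subscribers register a condition, and a publisher's watchdog checks each new information item against every condition and notifies the matching subscribers. The class, its methods and the item format are hypothetical. Distribution, replication and security are not modeled.

```python
class EventNotifier:
    """Single-process stand-in for a distributed event notification service."""

    def __init__(self):
        self.subscriptions = {}    # subscriber -> condition(item) -> bool
        self.inbox = {}            # subscriber -> list of delivered items

    def subscribe(self, subscriber, condition):
        self.subscriptions[subscriber] = condition
        self.inbox.setdefault(subscriber, [])

    def watchdog(self, item):
        """Run by a publisher when new information appears: deliver the item to
        every subscriber whose condition it fulfills. Returns the notified subscribers."""
        notified = []
        for subscriber, condition in self.subscriptions.items():
            if condition(item):
                self.inbox[subscriber].append(item)
                notified.append(subscriber)
        return notified
```

In an emergency setting, for example, a medical team could subscribe to casualty reports and a fire brigade to fire reports. A single watchdog check then routes each report to the right role.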
The resource manager module is responsible for delivering information in response to a
query and for controlling watchdogs. It is also in charge of managing the replication of
information and of performing predictions about the probability of network partitions.
The security manager is used to control the access to shared information and resources.
The system groups peers according to their roles in an emergency operation. It
coordinates resources, performs information sharing and provides support according to
these groupings.
The InfoWare system has two important contributions: (1) file replication and
resource management are performed with respect to the connectivity patterns of the
devices and (2) the management of information and resources is done according to the
peers’ roles in an emergency operation. However, the authors only vaguely discuss the
knowledge acquisition related to the connectivity pattern of devices. Moreover, they only
discuss the replication process at high level.
2.2.6 XMIDDLE
XMIDDLE [51] is a system that allows devices in a MANET to share structured
documents with each other. To use this system, peers should present their sharable files in
a tree form (“data tree”). Access points to a data tree are defined in order to present parts of
a file as sharable information.
Hosts having a direct connection can perform data sharing. Host A can modify data in
host B by creating a link to an access point in a data tree of host B. Such a link is
similar to remote directory mounting in a typical distributed file system.
XMIDDLE has a number of primitive operations to allow peers to share information.
The connect operation is used to connect two peers. During this operation, peers exchange
information about access points of sharable documents. The disconnect operation allows a
peer to explicitly decide to work offline. The link and the unlink operations allow peers to
connect/disconnect to/from an access point of a data tree.
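The data tree, access points, and the connect and link operations can be modeled in Python. The sketch represents a document as nested dictionaries and an access point as a path of node names. It shares the linked subtree by reference to mimic remote mounting, so a change made by host A is visible in host B's tree. These representations and the class and method names are assumptions, not XMIDDLE's implementation. Disconnection and reconciliation are omitted.

```python
class DataTree:
    """A document as nested dicts; selected paths are exported as access points."""

    def __init__(self, root):
        self.root = root                 # name -> subtree (dict) or leaf value
        self.access_points = set()       # exported paths, e.g. ("thesis",)

    def export(self, path):
        self.access_points.add(tuple(path))

    def subtree(self, path):
        node = self.root
        for name in path:
            node = node[name]
        return node

class Host:
    def __init__(self, name, tree):
        self.name = name
        self.tree = tree
        self.neighbors = set()
        self.links = {}                  # local mount name -> (host, path)

    def connect(self, other):
        """Connect two hosts in range; return the other host's access points."""
        self.neighbors.add(other.name)
        other.neighbors.add(self.name)
        return set(other.tree.access_points)

    def link(self, other, path, mount):
        """Mount an exported subtree of a directly connected host."""
        path = tuple(path)
        if other.name not in self.neighbors or path not in other.tree.access_points:
            raise PermissionError("host not connected or path is not an access point")
        self.links[mount] = (other, path)
        return other.tree.subtree(path)  # shared by reference, like a remote mount

    def unlink(self, mount):
        self.links.pop(mount, None)
```

Requiring the other host to be in `neighbors` before linking reflects the one-hop restriction noted below.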
Representing a document in a tree format and allowing only some parts of a document to
be sharable are important contributions of XMIDDLE. However, the system can only be
applied to one-hop MANETs.
2.2.7 Discussion
In previous sections, we have discussed the most representative information sharing
systems designed for MANETs. As shown in Table 2-2, these systems apply different
methods to discover files. The methods can be divided into three types: (1) distribution of
queries/advertisements (push/pull approaches), (2) publish/subscribe systems and (3)
shared/transient memory.
As the distribution of queries in ORION is not controlled systematically, it can create
unsustainable overhead on the participating peers. In Code-Torrent, information
sharing is performed in a one-hop MANET by distributing advertisements blindly. As
blind distribution of advertisements creates significant overhead on the participating peers,
this system cannot be adopted in multi-hop MANETs.
Table 2-2: Summary of the information sharing systems designed for MANETs
Information sharing activity | Code-Torrent | Lime | LimeOne | TOTA | ORION | PeerWare | AdHocFS | InfoWare | XMIDDLE
Information discovery | OPH | LM | OPL, OPH | CPH | CPL | SM, PS | GPH | PS | OPH
Information delivery | PBP | WC | WC | WC | PBP | WC | WC | WC | CP
Mobility control | - | - | - | C | - | - | - | R | R
Information discovery: CPL: hop count pull; OPL: one hop pull; OPH: one hop push; CPH: hop count push; GPH: group based push; SM: shared memory; PS: publish/subscribe
Information delivery: WC: whole content; PBP: part by part; CP: certain parts; OD: offline delivery
Mobility control: R: replication; C: change of sharing strategy
The distribution of queries/advertisements can be a simple, a hop-count-based, or a group-based
distribution process. In simple flooding [52, 56], a peer distributes a packet to the peers
having a direct connection with it. Each of those peers in turn redistributes the packet in
the same fashion, and this continues until all reachable peers have received the packet. This
kind of dissemination has an evident scalability problem and places a heavy
burden on devices. In hop-count-based flooding, a packet is distributed up to a certain number of hops
counted from the requester peer (for queries) or from the source peer (for
advertisements). In group-based dissemination, a requester peer disseminates a packet to the
peers in the group to which it belongs.
The need for brokers to establish a publish/subscribe system complicates the application
of PeerWare in a multi-hop MANET. Finally, the shared/transient memory proposed in
LIME is not suitable for dynamic environments, since peers cannot communicate while the
transient memory is changing.
In AdHocFS [48, 49], peers in the MANET form groups in such a way that there is a
path between any two peers in the same group. The system raises an important issue;
however, the number of peers in a group is not controlled and the group leader is not
elected according to its processing capacity and battery power. Other researchers have
proposed to use a geographical based hierarchical index (GHI) [57] to facilitate information
sharing in a MANET. GHI is formed according to the geographical location of peers. This
approach tries to adopt CAN [26] on a MANET by using a geographical indexing function
that hashes files into geographical coordinates. A normal distribution and high availability
of peers, however, are required in order to use the system.
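The idea of hashing files into geographical coordinates can be illustrated with a flat (non-hierarchical) Python sketch. This is not GHI itself. The rectangular area, the derivation of x and y from the first eight bytes of a SHA-1 digest, and the "closest peer stores the index entry" rule are all assumptions chosen for illustration.

```python
import hashlib
import math

def geo_hash(file_name, width, height):
    """Hash a file name to a point inside a width x height area (illustrative scheme)."""
    d = hashlib.sha1(file_name.encode()).digest()
    x = int.from_bytes(d[:4], "big") / 2**32 * width
    y = int.from_bytes(d[4:8], "big") / 2**32 * height
    return x, y

def home_peer(file_name, positions, width, height):
    """The peer physically closest to the hashed point holds the file's index entry.

    positions: dict peer -> (x, y) current location
    """
    point = geo_hash(file_name, width, height)
    return min(positions, key=lambda p: math.dist(positions[p], point))
```

If no peer is near the hashed point, the index entry lands on a distant peer. If that peer moves, the entry has to be relocated. This is consistent with the remark above that the approach depends on how peers are distributed and on their availability.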
Content delivery is a required functionality of any information sharing system. Content
can be delivered at once, as is done in traditional peer-to-peer systems. However, this
may not always be possible in MANETs. The alternative is to deliver the content
part by part from one or more peers, as is done in ORION [40], Code-Torrent [41] and
XMIDDLE [51]. However, only XMIDDLE allows users to download some parts of a file.
Most of the systems do not give any special attention to problems caused by the mobility
of peers (e.g., disconnection of a peer while downloading a file from it). The propagation
rules in TOTA can be used to cope with network dynamicity by adapting a tuple’s
propagation (i.e., the number of hops that the tuple traverses and the modification of the
tuple) according to events occurring in the environment (e.g., a time alarm or a change in
network structure). InfoWare and XMIDDLE try to handle problems caused by the mobility of
peers through data replication.
In Table 2-3, we compare the discussed systems according to the requirements identified
in chapter 1, i.e., pervasiveness, mobility awareness, high level semantics, social awareness,
context aware content delivery, data dissemination and interest-awareness.
Table 2-3: Analysis of information sharing systems for MANETs
Requirement | Code-Torrent | Lime | Limeone | TOTA | ORION | PeerWare | AdHocFS | Ad-Hoc InfoWare | XMIDDLE
Pervasiveness | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | ++
Mobility awareness | - | - | + | + | - | - | + | + | +
High level semantics | - | + | + | + | - | ++ | + | - | +
Social awareness | - | - | - | - | - | - | - | ++ | -
Context aware content delivery | + | - | - | - | + | - | - | - | +
Data dissemination | - | - | - | - | ++ | - | - | - | -
Interest-awareness | - | - | - | - | - | + | - | + | -
-: not considered, +: considered partially or in a limited way, ++: considered
As discussed in chapter one, the criterion “pervasiveness” describes the possibility of
exchanging information in any place, at any time and by using any device equipped with
wireless network communication facilities. As all the systems are designed for MANETs,
almost all of them satisfy this criterion. Ad-Hoc InfoWare satisfies it only partially,
since users must authenticate over the Internet in order to share information in a
MANET.
The mobility awareness criterion is satisfied if the information-sharing activities are
performed according to the network dynamicity¹. None of the systems fully considers this
criterion. Ad-Hoc InfoWare and AdHocFS use data replication to resist the disconnection
of devices. AdHocFS blindly replicates the data of a peer to every other peer. However,
blind replication cannot be a solution in an environment where devices are battery
powered and have limited storage capacity. InfoWare replicates the data of a device that
will soon be disconnected. However, replicating all the data of a disconnecting device
may not be necessary. TOTA adjusts the tuples that peers advertise according to a
propagation rule. However, how propagation rules are identified is not formally
discussed. As Limeone uses software agents to manipulate the tuple space, it can tolerate
disconnection. However, the authors do not explicitly demonstrate how agents are used to
tolerate disconnections.
High-level semantics refers to the classification and representation of sharable files
according to their semantic similarities. Lime, Limeone, TOTA and PeerWare use a tuple
space, a shared-memory coordination model, to express the semantic interdependencies of files.
However, the semantic richness of the file representation depends on the implementation of the
tuple space. As PeerWare considers a structure that represents files hierarchically, we consider
that it fully satisfies the criterion “High level semantics”. XMIDDLE uses a tree structure to
represent a file in order to exploit the sharable parts of the file; thus, we consider that it
satisfies the criterion “High level semantics” to a limited extent.
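Lime, Limeone and TOTA build on the tuple-space coordination model introduced by Linda. As an illustration only, the sketch below shows the core idea: tuples are written with out, and read (rdp) or removed (inp) by associative matching against a template, in which None is a wildcard and a type matches any value of that type. The class is a simplified, single-host, non-blocking assumption; it is not the API of LIME, Limeone or TOTA, which additionally distribute the space across mobile hosts.

```python
class TupleSpace:
    """Minimal, single-host, non-blocking Linda-style tuple space (illustrative)."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Write a tuple into the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _field_matches(pattern, value):
        if pattern is None:              # wildcard: matches any value
            return True
        if isinstance(pattern, type):    # formal field: matches by type
            return isinstance(value, pattern)
        return pattern == value          # actual field: matches by equality

    def _matches(self, template, tup):
        return len(template) == len(tup) and all(
            self._field_matches(p, v) for p, v in zip(template, tup))

    def rdp(self, template):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def inp(self, template):
        """Remove and return the first matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("file", "beach.jpg", "photo"))
space.out(("file", "q3-report.pdf", "finance"))
print(space.rdp(("file", str, "photo")))     # ('file', 'beach.jpg', 'photo')
print(space.inp(("file", None, "finance")))  # ('file', 'q3-report.pdf', 'finance')
print(space.rdp(("file", None, "finance")))  # None
```

The file names and the ("file", name, category) tuple layout are hypothetical; how rich such a representation can be depends entirely on the tuples an implementation chooses to store.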
The social-awareness criterion is satisfied if the social relationships of users are
considered during the selection of sharable files. Most of the systems do not satisfy this
criterion. The InfoWare system groups peers according to their roles in the emergency
operation. The system coordinates resources, manages the information flow and provides
support according to these groups.
¹ Network dynamicity describes the changes in the state of a MANET caused by the mobility of devices. In
this thesis, network dynamicity is measured by the stay-time of the peers participating in a MANET.
In most information sharing systems, the delivery of files and the routing of messages
are not discussed. ORION integrates the AODV routing protocol to disseminate queries.
Code-Torrent and ORION allow a file to be delivered part by part. This kind of delivery
process makes it possible to download a file from several sources and thus accelerates its
delivery. However, their delivery techniques are limited and do not consider all of the
important contexts that can be encountered during content delivery. For example, they do not
consider the following context: “there is only one source for a file, and the source and the
requester cannot stay connected until the download is completed”.
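To make part-by-part delivery concrete, here is a hedged simulation. It assumes, for illustration, that every source holds all the blocks of the file, that each connected source delivers one block per round, and that blocks already received are kept, so a download can resume after a disconnection instead of restarting. The function download and the online_at(source, round) connectivity oracle are hypothetical and do not reproduce the mechanisms of ORION or Code-Torrent.

```python
def download(num_blocks, sources, online_at, max_rounds=1000):
    """Simulate a block-by-block download from several sources.

    num_blocks: number of blocks in the file (block ids 0..num_blocks-1)
    sources:    ids of the peers holding the file (assumed to hold every block)
    online_at:  function (source, round) -> bool giving connectivity per round
    Returns (dict block -> source that delivered it, rounds used),
    or None if the file is still incomplete after max_rounds.
    """
    missing = list(range(num_blocks))
    received = {}
    for rnd in range(max_rounds):
        if not missing:
            return received, rnd
        for src in sources:
            # Each connected source delivers the next missing block this round;
            # a disconnected source is skipped and the others keep going.
            if missing and online_at(src, rnd):
                received[missing.pop(0)] = src
    return (received, max_rounds) if not missing else None


# Two always-connected sources share the work.
print(download(4, ["A", "B"], lambda s, r: True))
# ({0: 'A', 1: 'B', 2: 'A', 3: 'B'}, 2)

# A single source reachable only every other round: the download resumes.
print(download(3, ["A"], lambda s, r: r % 2 == 0))
# ({0: 'A', 1: 'A', 2: 'A'}, 5)
```

The second call corresponds to the context quoted above (one source, intermittent connectivity): the transfer still completes because progress is kept per block rather than per file.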
Finally, none of the systems pays special attention to the interest awareness criterion.
The publish/subscribe information discovery approaches used by PeerWare and InfoWare can
be used to take the interests of users into consideration. However, publish/subscribe
systems are not necessarily interest-aware: their interest awareness depends on the kinds
of events produced by publishers.
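The remark that publish/subscribe is not necessarily interest-aware can be illustrated with a minimal topic-based broker, written as a hypothetical sketch rather than the API of PeerWare or InfoWare. Subscribers only receive events whose topic they subscribed to, so filtering is only as fine-grained as the topics publishers use: if every event were published under one generic topic such as "new-file", every subscriber would receive everything.

```python
from collections import defaultdict


class TopicBroker:
    """Minimal topic-based publish/subscribe broker (illustrative sketch)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # topic -> ids of subscribed peers
        self.inboxes = defaultdict(list)     # peer id -> events delivered to it

    def subscribe(self, peer, topic):
        self.subscribers[topic].add(peer)

    def publish(self, topic, payload):
        """Deliver the event to the subscribers of its topic; return their ids."""
        recipients = sorted(self.subscribers.get(topic, set()))
        for peer in recipients:
            self.inboxes[peer].append((topic, payload))
        return recipients


broker = TopicBroker()
broker.subscribe("A", "finance")
broker.subscribe("B", "photos")
broker.subscribe("C", "photos")
print(broker.publish("photos", "eiffel-tower.jpg"))  # ['B', 'C']
print(broker.publish("finance", "rates.pdf"))        # ['A']
print(broker.publish("jokes", "joke.txt"))           # []
```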
From the above analysis, we can observe that information sharing in MANETs is still in
its infancy. Mobility awareness, one of the key criteria, is supported only in a limited
way by a few of the systems. Interest awareness, another key criterion, is not explicitly
addressed by any of them.
2.3 Service Discovery
Service discovery protocols are basic components of any information sharing system. In
this thesis, we focus on service discovery protocols for infrastructure-based and
infrastructure-less networks. The Service Location Protocol (SLP) [58] and Universal Plug
and Play (UPnP) [59] are examples of service discovery protocols (SDPs) designed for wired
networks. There are also a number of service discovery protocols designed for
infrastructure-less networks, including DEAPSpace [60], GSD [61, 62], Allia [63],
Konark [64] and Service Ring [65].
Service discovery protocols can be divided into three types: directory-based, directory-less,
and hybrids of the two. In directory-based SDPs, one or more devices provide directory
services to facilitate the service discovery process. Service providers advertise their
services to the directory. To access a service, a client first contacts the directory to obtain
the service description; then, it contacts the suitable service provider directly. In
directory-less SDPs, every device is required to provide some form of directory (a service
registry). Each service provider advertises its services to others in its vicinity. Any device
interested in the advertisements can store them in its local service registry. A service
consumer can use the cached advertisements to discover a service, or it can disseminate a
service discovery message throughout the environment. In Sections 2.3.1, 2.3.2 and 2.3.3, the
three types of service discovery protocols are discussed in detail. In Section 2.3.4, we discuss
the main contributions of the discussed SDPs with respect to the requirements pointed out in
chapter 1.
2.3.1 Directory-based Service Discovery
Directory-based SDPs such as Service Ring [65] and the protocols proposed in [66, 67] perform
the following activities:
• forming a virtual backbone (i.e., electing directories and creating interconnection
links between them),
• performing service discovery functionalities and
• maintaining the virtual backbone.
In directory-based SDPs, mobile peers are organized in groups, usually based on distance
or communication proximity. A peer in each group is elected as coordinator to handle
routing and service discovery activities. These coordinators establish connections with
each other to form a virtual backbone.
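The cited protocols use their own election rules. Purely as an illustration of how groups, coordinators and a virtual backbone can be formed, the following sketch applies a centralized, sequential simplification of the classic lowest-ID clustering heuristic (an assumption, not the rule of any protocol cited here): scanning peers by increasing identifier, each peer not yet covered becomes a coordinator and its uncovered one-hop neighbors join its group; two coordinators are linked in the backbone when members of their groups are neighbors. Peer identifiers and the adjacency map are hypothetical inputs.

```python
def lowest_id_clusters(graph):
    """Map each peer to its coordinator using the lowest-ID heuristic.

    graph: dict peer id -> list of one-hop neighbour ids (symmetric links).
    """
    coordinator = {}
    for peer in sorted(graph):
        if peer in coordinator:
            continue                      # already covered by a coordinator
        coordinator[peer] = peer          # lowest uncovered id becomes coordinator
        for neighbour in graph[peer]:
            coordinator.setdefault(neighbour, peer)
    return coordinator


def backbone_links(graph, coordinator):
    """Link two coordinators whose groups contain neighbouring peers."""
    links = set()
    for peer, neighbours in graph.items():
        for neighbour in neighbours:
            a, b = coordinator[peer], coordinator[neighbour]
            if a != b:
                links.add((min(a, b), max(a, b)))
    return links


# A chain of five peers: 1 - 2 - 3 - 4 - 5
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
groups = lowest_id_clusters(chain)
print(groups)                                 # {1: 1, 2: 1, 3: 3, 4: 3, 5: 5}
print(sorted(backbone_links(chain, groups)))  # [(1, 3), (3, 5)]
```

In a real MANET, this computation is distributed and must be repeated as peers move, which is the maintenance load discussed below.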
Peers can also be grouped according to the similarity of the services that they provide, as
in Service Ring [65]. A service ring groups together peers that are both physically close to
each other and offer similar services. Each ring has a designated service access point (SAP),
which maintains a summary of all the services offered within its ring. SAPs can, in turn, be
connected into another ring, which leads to a hierarchical structure.
Directory-based service discovery protocols have several advantages. First, they scale well
as the network grows, since many directories share the service discovery and routing work.
Second, the response time for locating a service is greatly reduced. However, the dynamic
assignment of directories imposes an extra load on the network because of frequent changes
in the network topology. Moreover, in a MANET, finding peers that can provide directory
services may be difficult, because these peers must have sufficient storage space, processing
power and battery power.
2.3.2 Directory-less SDPs
SDPs such as GSD [61, 62], Allia [63], Konark [64], MoGATU [68], and the protocols
proposed in [69-71] do not use directories; thus, they are called directory-less SDPs. In
such SDPs, every peer must perform the tasks of a directory service and discover resources
in a peer-to-peer manner.
These SDPs follow two different working models: a “push model” and a “pull model”. In the
push model, service providers disseminate service advertisements in their vicinity so that
others passively learn about the available services. Conversely, in the pull model, clients
are active: they flood service queries in the hope that the service provider will eventually
reply. Almost all directory-less SDPs use a hybrid of the two models.
In most of the protocols, for example in Allia [63] and in the protocols described in [69-71],
peers advertise each of their local services. In DEAPSpace [60], a peer advertises its
worldview, i.e., information about all the services of which it is aware. In GSD [61, 62],
services are classified into groups, and a peer advertises its services and the groups of
services that it has seen in its vicinity. In Konark [64], services are arranged in a tree. In
this system, an advertisement can be generic or specific, i.e., it can contain only the
services found at the shallow levels or only those at the bottom level of the service tree.
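The difference between generic and specific advertisements in a service tree can be sketched as follows. The nested-dictionary tree, the service names and the path format are hypothetical, and the depth limit is only one possible way of choosing the “shallow” levels; Konark's actual service description format and selection criteria are not reproduced here.

```python
def generic_advertisement(tree, max_depth, prefix=()):
    """Category paths down to max_depth levels (1 = top level only)."""
    paths = []
    if max_depth == 0:
        return paths
    for name, subtree in tree.items():
        path = prefix + (name,)
        paths.append("/".join(path))
        paths.extend(generic_advertisement(subtree, max_depth - 1, path))
    return paths


def specific_advertisement(tree, prefix=()):
    """Paths of the leaf services only (the bottom level of the tree)."""
    paths = []
    for name, subtree in tree.items():
        path = prefix + (name,)
        if subtree:
            paths.extend(specific_advertisement(subtree, path))
        else:
            paths.append("/".join(path))
    return paths


# Hypothetical service tree: inner nodes are categories, empty dicts are services.
services = {
    "media": {"photo": {"jpeg-viewer": {}}, "music": {}},
    "documents": {"pdf-printer": {}},
}
print(generic_advertisement(services, 1))  # ['media', 'documents']
print(specific_advertisement(services))
# ['media/photo/jpeg-viewer', 'media/music', 'documents/pdf-printer']
```

A generic advertisement is short but only tells neighbors which categories are available; a specific one names concrete services at the cost of a larger message.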
Service information can be updated in two main ways. The first is to update the service
information whenever an event occurs, for example when the route to the service provider
becomes unavailable. The second is to update the service information on a regular basis, for
example through periodic advertisements, as done by Konark [64], GSD [61, 62] and the
protocol proposed in [70]. A hybrid of the two can also be used, as in [69]. The advertisement
period can vary according to the mobility patterns of the peers, as in Allia [63] and
Konark [64].
An advertisement can be sent to neighbors that are one or more hops away. The
advertisement diameter is defined as the number of hops that the advertisement crosses. In
DEAPSpace [60] and MoGATU [68], the diameter is fixed to one. Some protocols change the
diameter of advertisements with respect to the mobility of the peers: if the peers are moving
faster, the advertisement rate increases and the diameter decreases. This technique is
implemented in GSD [61, 62] and Allia [63].
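Since the cited papers do not give a formula, the following is only a hypothetical illustration of the rule “faster peers advertise more often and over fewer hops”. The parameter names, the default values and the inverse-linear scaling with speed are assumptions chosen for readability, not the parameterization of GSD or Allia.

```python
def advertisement_parameters(speed, base_period=10.0, min_period=1.0,
                             max_diameter=3, reference_speed=1.0):
    """Return (advertisement period in seconds, diameter in hops).

    speed: current speed of the peer (same unit as reference_speed).
    A static peer advertises every base_period seconds up to max_diameter
    hops; both values shrink as the peer moves faster.
    """
    factor = 1.0 + speed / reference_speed
    period = max(min_period, base_period / factor)
    diameter = max(1, int(max_diameter / factor))
    return period, diameter


for speed in (0.0, 1.0, 9.0):
    print(speed, advertisement_parameters(speed))
# 0.0 (10.0, 3)
# 1.0 (5.0, 1)
# 9.0 (1.0, 1)
```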
Service discovery is generally done by consulting the cached advertisements and, if
necessary, by flooding a service request query. In the protocol proposed in [70], a peer sends
a query to a service query multicast group when it needs a service. A service query multicast
group is formed during the bootstrapping phase and consists of a service provider and its
possible consumers.
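The usual discovery strategy of directory-less SDPs, which looks in the local registry of cached advertisements first and then floods a query, can be sketched as follows. The data layout (registries maps each peer to the advertisements it has cached, as service name to provider), the TTL-limited breadth-first flood and all names are assumptions for illustration; real protocols also return the reply along the reverse path, which is omitted here.

```python
from collections import deque


def discover(graph, registries, requester, service, ttl):
    """Return (provider, hops to the peer that answered) or None.

    graph:      dict peer -> list of one-hop neighbours
    registries: dict peer -> {service name: provider}, i.e. the advertisements
                cached by each peer (a provider also lists its own services)
    ttl:        maximum number of hops the flooded query may travel
    """
    seen = {requester}                 # each peer handles the query only once
    queue = deque([(requester, 0)])    # hops == 0 is the requester's own cache
    while queue:
        peer, hops = queue.popleft()
        provider = registries.get(peer, {}).get(service)
        if provider is not None:
            return provider, hops
        if hops == ttl:
            continue                   # the query expires at this peer
        for neighbour in graph.get(peer, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
registries = {"D": {"print": "D"}}
print(discover(chain, registries, "A", "print", ttl=2))  # None
print(discover(chain, registries, "A", "print", ttl=3))  # ('D', 3)
registries["A"] = {"print": "D"}   # A has cached D's advertisement
print(discover(chain, registries, "A", "print", ttl=3))  # ('D', 0)
```

The last call shows why cached advertisements pay off: the answer costs no network traffic at all.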
2.3.3 Hybrid service discovery protocols
Scalability is the main drawback of directory-less service discovery protocols, since they
consume a lot of bandwidth. Hybrid service discovery protocols mitigate this drawback by
using directories or service brokers whenever possible. The Service Location Protocol
(SLP) [58], Universal Plug and Play (UPnP) [59] and the Java Enhanced Service Architecture
(JESA) [72] are examples of hybrid service discovery protocols. Except for JESA, all these
protocols are designed for wired networks.
SLP is a service discovery protocol designed for TCP/IP networks. SLP has two modes of
operation: (1) when a directory is present, it collects all the service information advertised
by service providers, and consumers send their requests to the directory; (2) when there is
no directory, consumers repeatedly disseminate their requests in the environment, and
service providers listen for these requests and send responses to the consumers.
Similarly, UPnP can operate with or without a lookup/directory service. When a service
wants to join the network, it first sends out an advertisement message to announce its
presence. If a lookup or directory service is present, it records such advertisements.
Meanwhile, other services in the network may also see these advertisements directly.
When a peer wants to discover a service, it can contact the service directly through the
URL provided in the service advertisement, or it can send out a multicast query request.
In the latter case, the client may receive a response either directly from the service or
from a lookup/directory service.
Finally, like SLP and UPnP, JESA can work with or without a service broker. Services are
advertised and searched for by broadcasting service advertisements and service requests,
respectively. Service providers reply to requesting peers via unicast messages. In larger
networks, service brokers register the services provided in the environment, and service
providers stop multicasting advertisements and replying to requests. Brokers are themselves
discoverable as services. If the broker with which a service provider has registered is
disconnected, the service is no longer available. Moreover, in this protocol, searching for a
broker is itself a source of complications.
2.3.4 Summary
Providing information is a service. Hence, discovering a service is logically equivalent to
locating information (a document or file). However, implementing a service for each sharable
file is resource-consuming. Nevertheless, service discovery protocols can be used as a basis
for designing information location algorithms.
Among the requirements discussed in chapter one, the requirements “Mobility
awareness” and “High-level semantics” are considered by some service discovery
protocols. Table 2-4 lists the service discovery protocols satisfying these requirements.
Table 2-4: Analysis of service discovery protocols
Requirement          | Considered by
Mobility awareness   | GSD, Allia
High-level semantics | GSD, Service Ring, Konark
Allia decides the diameter and the period of advertisement depending on the mobility
pattern of the peers. GSD uses the same concept to decide the diameter. However, the
authors of Allia and GSD do not explicitly specify how they measure the mobility patterns
of peers and how the diameter and the period of advertisement are parameterized.
Konark, GSD and Service Ring organize services according to their semantic similarities.
Thus, these protocols satisfy the requirement “high level semantics”. However, except for
Konark, they do not exploit this semantic arrangement of services when making
advertisements. In Konark, services are arranged in a tree, and the protocol selects some
branches for advertisement according to certain criteria. However, the authors of the
protocol describe these criteria only vaguely.
2.4 Routing
Routing queries, responses and data is a fundamental task of any information sharing
system. Indeed, an information sharing system is of little use without a reliable routing
protocol, especially in MANETs. In MANETs, the peers themselves are responsible for routing
data, although their processing power and storage capacity are limited and they are battery
powered. As a result, information sharing systems should be integrated with a routing
protocol that takes these specific conditions into account.
According to the responsibilities of peers, routing protocols can be classified into two
types: flat and hierarchical [73]. In flat routing protocols, every peer is equally responsible
for forming and maintaining the routing information. In hierarchical protocols, as the name
indicates, the network is structured into clusters whose cluster-heads form a virtual backbone
for routing; the peers forming the backbone perform the routing task. As our middleware is
based on a pure peer-to-peer architecture, we analyze only flat routing protocols.
According to the number of destinations, routing protocols can also be classified into two
types: unicast and multicast. Unicast routing protocols target a single destination; multicast
protocols target a group of destinations. In this thesis, we review only unicast protocols.
Finally, routing algorithms can be further classified as global-positioning-based protocols
and global-position-less protocols [73]. A global-positioning-based protocol takes the
positions of peers into account during routing. In contrast, global-position-less protocols do
not make any assumption about the positions of peers. In Sections 2.4.1 and 2.4.2, we present
these two types of routing protocols. Then, in Section 2.4.3, we discuss the possible
integration of routing protocols with an information sharing system for MANETs.
2.4.1 Global Position-Less routing protocols
AODV [53], WRP [74, 75], DSDV [76], DSR [77] and ZRP [78] are popular unicast MANET
routing protocols. DSDV and WRP are proactive routing protocols, in which devices
continuously exchange routing information with each other. DSR and AODV are reactive
routing protocols, in which routes are searched for on demand. ZRP is a hybrid protocol
that combines reactive and proactive features.
DSDV is based on the Bellman-Ford algorithm. In DSDV, every peer maintains a routing table
and periodically broadcasts it to its direct neighbors. For every possible destination in the
network, the routing table contains the address of the destination, the shortest distance to
reach it, the next hop through which the shortest path passes, and a sequence number that
indicates the freshness of the route and hence the presence or absence of the destination.
Sequence numbers are generated by the destination as even numbers and are incremented to an
odd number when the path leading to the destination is broken. In the context of a MANET,
the periodic broadcasting of routes drains the batteries of the participating devices and
drastically increases network traffic.
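The DSDV update rule described above can be sketched as follows: an advertised route replaces the current one if it carries a newer (higher) destination sequence number, or the same sequence number with a smaller hop count, and a peer that detects a broken link advertises an infinite metric with the sequence number incremented to an odd value. The data structures and function names are assumptions for illustration; settling times, incremental dumps and the packet formats of the real protocol are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Route:
    next_hop: str
    metric: float  # hop count; math.inf marks a broken route
    seq: int       # even: issued by the destination; odd: broken-link report


def is_better(current, candidate):
    """DSDV preference: newer sequence number first, then shorter distance."""
    if current is None:
        return True
    if candidate.seq != current.seq:
        return candidate.seq > current.seq
    return candidate.metric < current.metric


def merge_advertisement(table, neighbour, advertised):
    """Merge a neighbour's advertised table {dest: (metric, seq)} into table."""
    for dest, (metric, seq) in advertised.items():
        candidate = Route(neighbour, metric + 1, seq)  # one extra hop via neighbour
        if is_better(table.get(dest), candidate):
            table[dest] = candidate
    return table


def break_link(table, neighbour):
    """Invalidate every route through a neighbour whose link was lost."""
    for dest, route in table.items():
        if route.next_hop == neighbour and route.metric != math.inf:
            table[dest] = Route(neighbour, math.inf, route.seq + 1)
    return table


table = {}
merge_advertisement(table, "B", {"D": (2, 100)})
merge_advertisement(table, "C", {"D": (1, 100)})  # same seq, shorter: replaces
print(table["D"])  # Route(next_hop='C', metric=2, seq=100)
break_link(table, "C")
print(table["D"])  # Route(next_hop='C', metric=inf, seq=101)
merge_advertisement(table, "B", {"D": (3, 102)})  # fresh even seq from D
print(table["D"])  # Route(next_hop='B', metric=4, seq=102)
```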
WRP uses an improved Bellman-Ford distance-vector routing algorithm. To adapt to the
dynamic features of mobile ad-hoc networks, it introduces mechanisms that ensure the reliable
exchange of update messages and limit the routing loops created by the Bellman-Ford
algorithm. In WRP, in addition to the routing table, peers maintain two other tables: a
distance table and a link-cost table. The distance table stores the distance to each
destination through each one-hop neighbor, and the link-cost table stores the cost of the link
to each one-hop neighbor. The cost of a link can be the number of hops or the number of hops
plus a bias value that indicates the reliability of the link. In WRP, peers exchange routing
tables with their one-hop neighbors using update messages, which can be sent either
periodically or whenever a link state changes. Additionally, if there has been no change in its
routing table since the last update, a peer is required to send a “hello message” to confirm
its connectivity. When a peer receives a hello message from a new peer, it adds the new peer
to its routing table and sends a copy of its routing table to the new peer.
In general, proactive protocols like DSDV and WRP are not suitable for MANETs. As MANETs
are very dynamic networks, routing tables must be updated whenever a peer appears, disappears
or reappears. This updating process places a heavy burden on the peers participating in a
MANET.
AODV is based on the DSDV protocol. However, peers build routing tables by using route
request/route reply query cycles. When a source peer searches for a route to a destination,
it broadcasts a route request (RREQ) packet across the network. The request contains the
source address, the source sequence number, a broadcast ID, the destination address, the
destination sequence number, and a hop count. Peers receiving this packet update their
information about the source peer. A peer receiving an RREQ packet sends a route reply
(RREP) back to the source if it is the destination or if it has a route to the destination.
Otherwise, it rebroadcasts the RREQ. As a route is searched for only when a source needs to
send something to the destination, the route search process increases the data transfer time.
DSR is a reactive routing protocol. In this protocol, each peer caches the routes that it has
learnt; peers learn about possible routes during packet forwarding. Cached routes are deleted
after a predefined period if they are not updated. A host can obtain a suitable source route
by searching its route cache. If no route is found in its cache, it initiates a route discovery
process to find a route to the destination dynamically. Route discovery involves
broadcasting a route request packet, which contains the addresses of the source and the
destination and a unique identification number. Each intermediate peer checks whether it
knows a route leading to the destination. If it does not, it appends its address to the route
record of the packet and forwards the packet to its neighbors. If the route discovery is
successful, the source host that initiated it receives a route reply packet. Learning routes
during packet forwarding is an important contribution of this protocol.
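A hedged simulation of DSR route discovery and route caching on a static snapshot of the network: the route request is flooded breadth-first, each peer forwards a given request once and appends its address to the route record, and a peer that already has a cached route to the destination answers on the source's behalf. The graph representation, the function names and the choice of caching every sub-route of a discovered route are assumptions; route maintenance and route-error handling are left out.

```python
from collections import deque


def route_discovery(graph, source, dest, caches):
    """Return a source route [source, ..., dest] or None.

    graph:  dict peer -> list of neighbours (bidirectional links assumed)
    caches: dict peer -> {destination: cached route starting at that peer}
    """
    seen = {source}                # a peer forwards a given request only once
    queue = deque([[source]])      # each entry is the request's route record
    while queue:
        record = queue.popleft()
        peer = record[-1]
        if peer == dest:
            return record          # the destination replies with the record
        cached = caches.get(peer, {}).get(dest)
        if cached is not None:     # reply from the route cache
            return record + cached[1:]
        for neighbour in graph.get(peer, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(record + [neighbour])  # append own address
    return None


def learn_routes(caches, route):
    """Every peer on a route caches the sub-routes towards downstream peers."""
    for i, peer in enumerate(route):
        for j in range(i + 1, len(route)):
            caches.setdefault(peer, {})[route[j]] = route[i:j + 1]


net = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
       "D": ["C", "E"], "E": ["A", "D"]}
caches = {}
route = route_discovery(net, "A", "D", caches)
print(route)                                   # ['A', 'E', 'D']
learn_routes(caches, route)
print(caches["E"])                             # {'D': ['E', 'D']}
print(route_discovery(net, "A", "D", caches))  # ['A', 'E', 'D'] (from A's cache)
```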
ZRP divides the network into zones of variable size, where the size of a zone is measured in
hops. Each peer has its own zone and keeps routes to every destination within it; hence, a
packet can be delivered proactively when the source and the destination are in the same
zone. For destinations beyond the local zone, route discovery is performed reactively. The
source peer sends a route request to its border peers, i.e., the peers found exactly r hops
away from the source, where r is the radius of the source's zone. Each border peer checks its
own local zone for the destination. If the destination is not a member of that zone, the
border peer adds its own address to the route request packet and forwards the packet to its
own border peers. If the destination is a member of its local zone, it sends a route reply
along the reverse path back to the source. The source peer uses the path saved in the route
reply packet to send data packets to the destination. Dividing peers into zones is an
important contribution of this work.
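The zone concepts of ZRP can be sketched on a static snapshot of the network: a peer's zone is the set of peers within r hops, its border peers are those exactly r hops away, and a request for a destination outside the zone is relayed from border peer to border peer until some zone contains the destination. This is a simplified, centralized illustration under assumed data structures; the intrazone (proactive) and interzone (reactive) components, query control and reply path of the real protocol are not modelled.

```python
from collections import deque


def hop_distances(graph, source, max_hops):
    """Breadth-first hop counts from source, explored up to max_hops."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        peer = queue.popleft()
        if distance[peer] == max_hops:
            continue                     # do not explore beyond the zone radius
        for neighbour in graph.get(peer, []):
            if neighbour not in distance:
                distance[neighbour] = distance[peer] + 1
                queue.append(neighbour)
    return distance


def routing_zone(graph, peer, radius):
    """Peers within `radius` hops (the routes a peer keeps proactively)."""
    return set(hop_distances(graph, peer, radius))


def border_peers(graph, peer, radius):
    """Peers exactly `radius` hops away."""
    return {p for p, d in hop_distances(graph, peer, radius).items() if d == radius}


def bordercast(graph, source, dest, radius):
    """Relay a request from border peer to border peer until a zone holds dest.

    Returns the chain of peers that handled the request, or None.
    """
    visited = {source}
    queue = deque([[source]])
    while queue:
        chain = queue.popleft()
        peer = chain[-1]
        if dest in routing_zone(graph, peer, radius):
            return chain
        for border in sorted(border_peers(graph, peer, radius)):
            if border not in visited:
                visited.add(border)
                queue.append(chain + [border])
    return None


# Seven peers in a line: 0 - 1 - 2 - 3 - 4 - 5 - 6, zone radius r = 2
line = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(sorted(routing_zone(line, 0, 2)))  # [0, 1, 2]
print(sorted(border_peers(line, 2, 2)))  # [0, 4]
print(bordercast(line, 0, 6, 2))         # [0, 2, 4]
```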
2.4.2 Global Positioning based routing protocols
Position-based protocols use the geographic positions of peers, which are determined by a
localization service, to facilitate routing [79]. In these protocols, the routing decision
depends only on the position of the destination and the positions of the direct neighbors.
Thus, they may not need to establish or maintain routes. LAR [80], DREAM [81] and
DG-CastoR [82, 83] are examples of position-based routing protocols.
Location-Aided Routing (LAR) is an extension of DSR [77] in which routing is guided by the
physical locations of the destination and the source. If the source peer does not know the
position of the destination peer, it falls back on DSR route discovery. Otherwise, the source
peer sends packets to the peers located in the direction of the destination.
In LAR, an expected zone and a request zone are defined to facilitate the routing process.
The expected zone is the area in which the destination is expected to be located. It is
calculated from the previously known location of the destination and its velocity. The
request zone is the area in which query forwarding can be performed.
The protocol proposes two schemes to calculate the request zone. In the first scheme, the
request zone is the smallest rectangle that includes the source and the expected zone. In
this scheme, the source includes the four corners of the rectangle in the routing message,
and a peer forwards the message only if it lies within the rectangle. In the second scheme,
the request zone is defined by two constants, a and b. The source includes its distance from
the (last known position of the) destination, denoted DISTs, in the routing message. Let
DISTi be the distance from the peer receiving the message to the destination. This peer
forwards the message, after replacing DISTs by DISTi, only if a × DISTs + b ≥ DISTi.
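The two LAR request-zone tests described above can be sketched as follows. The expected zone is modelled as a circle of radius v × (t1 − t0) around the destination's last known position, the scheme-1 request zone as the smallest axis-parallel rectangle containing the source and that circle, and the scheme-2 test uses the constants a and b from the text. Function names, the coordinate representation and the default values a = 1, b = 0 are assumptions for illustration.

```python
import math


def expected_zone(last_position, speed, elapsed):
    """Circle (centre, radius) where the destination is expected to be."""
    return last_position, speed * elapsed


def request_zone_scheme1(source, last_position, speed, elapsed):
    """Smallest axis-parallel rectangle containing the source and the expected zone."""
    (xd, yd), r = expected_zone(last_position, speed, elapsed)
    xs, ys = source
    return (min(xs, xd - r), min(ys, yd - r), max(xs, xd + r), max(ys, yd + r))


def forwards_scheme1(peer, rectangle):
    """A peer forwards the request only if it lies inside the rectangle."""
    x, y = peer
    x_min, y_min, x_max, y_max = rectangle
    return x_min <= x <= x_max and y_min <= y <= y_max


def forwards_scheme2(peer, destination, dist_s, a=1.0, b=0.0):
    """Return (forward?, distance to write into the message)."""
    dist_i = math.dist(peer, destination)
    if a * dist_s + b >= dist_i:
        return True, dist_i      # forward, replacing DISTs by DISTi
    return False, dist_s


zone = request_zone_scheme1((0, 0), (10, 10), speed=1.0, elapsed=2.0)
print(zone)                                                   # (0, 0, 12.0, 12.0)
print(forwards_scheme1((5, 5), zone), forwards_scheme1((13, 5), zone))  # True False
print(forwards_scheme2((4, 0), (10, 0), dist_s=10.0))         # (True, 6.0)
print(forwards_scheme2((-2, 0), (10, 0), dist_s=10.0))        # (False, 10.0)
```

With a = 1 and b = 0, a peer forwards only if it is at least as close to the destination as the previous sender; increasing b tolerates small detours.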
In the Distance Routing Effect Algorithm for Mobility (DREAM) [81], each peer maintains a
position database that contains the geographical positions of destination peers. Every peer
periodically floods its position up to a fixed distance. When a peer S wants to send a packet
to a peer R, it sends the packet to its neighbors located in the direction of R. These
neighbors forward the packet to their own neighbors located in the direction of R. This
process continues until the packet arrives at R. DREAM uses a method similar to the second
scheme of LAR to determine the peers lying in the direction of R.
DG-CastoR [82, 83] is a routing protocol proposed to improve information exchange in
VANETs. In this algorithm, one-hop neighbors periodically exchange their trajectories (i.e.,
their current and future positions). A query used to search for information is distributed to
the peers that follow the same trajectory as the source. Taking trajectories into account is
an important contribution of the protocol.
2.4.3 Discussion
In this section, we have reviewed popular routing algorithms such as DSDV, AODV, DSR and
LAR. A proactive routing protocol like DSDV is not suitable for MANETs since it produces a
lot of traffic. The route discovery time of reactive routing protocols like AODV increases the
information discovery time.
The Location-Aided Routing protocol (LAR) adds location awareness to DSR in order to
facilitate the routing process. However, as it relies on DSR when a peer does not know the
position of the destination, the information discovery time increases when peers
communicate for the first time. Moreover, LAR offers no significant advantage over DSR in a
MANET where peers change position very frequently. DREAM avoids these problems of LAR
by proactively maintaining the positions of peers. However, as in all proactive protocols,
broadcasting update messages creates high traffic. DG-CastoR differs from DREAM in that it
considers the future positions (trajectories) of peers.
As our thesis focuses on indoor MANETs, where network dynamicity is limited, we have
chosen to integrate LAR into our middleware. In future work, we plan to integrate a routing
protocol that combines the features of LAR and DG-CastoR (see Section 7.3).
2.5 Summary
In this chapter, we have reviewed research work in the domains of information sharing,
service discovery and routing in MANETs and traditional peer-to-peer networks with respect
to the requirements presented in chapter one.
We have observed that traditional P2P systems are difficult to adopt in MANETs, since they
are not designed for mobile environments where peers usually use thin, battery-powered
devices. Nevertheless, we have identified important contributions of these systems. The
block-by-block content downloading performed by the BitTorrent protocol [23] should be
adopted in MANETs in order to facilitate the information delivery process. The file
rearrangement performed by Freenet [31, 32] is another interesting feature, even though it is
difficult to design a hash function that takes the battery power and processing power of
devices into account. The social awareness of Dodgeball [35, 36] and ImaHima [39] should be
exploited in MANETs in order to facilitate information exchange between nomadic users.
In MANETs, we have analyzed ORION [40], Code-Torrent [41], LIME [42, 43], Limeone [44],
TOTA [45, 46], PeerWare [47], AdHocFS [48, 49], InfoWare [50] and XMIDDLE [51]. Mobility
awareness is only partially addressed by InfoWare and XMIDDLE, which use file replication
to cope with the problems caused by the peers' mobility. We have also observed that
propagation rules in TOTA and mobile agents in Limeone could be used to cope with network
dynamicity. It is difficult to conclude that interest awareness is addressed by these
information-sharing systems. As PeerWare and InfoWare use a publish/subscribe information
discovery approach, they might work according to the interests of users. However, as the
authors do not discuss the kinds of events that publishers produce, it is difficult to conclude
that these systems are interest-aware.
We have also analyzed service discovery protocols such as Allia, GSD and Konark. Allia and
GSD consider the requirement “mobility awareness”: they propose to adapt the service
discovery strategy to the mobility patterns of users. However, the authors do not explicitly
discuss how this strategy is parameterized according to the users' mobility patterns. Konark
arranges services in a tree and makes advertisements by selecting some branches of the
service tree.
Data routing should be integrated with information sharing in order to facilitate the
information discovery and information delivery processes. ORION and most service discovery
protocols integrate a reactive routing protocol. However, a reactive routing protocol
increases the information discovery time. From our survey of the routing domain, we have
observed that integrating a location-aware routing protocol can improve the performance of
information sharing.
Despite all these research efforts, the requirement “mobility awareness” has not received
enough attention. We have also observed that the requirement “interest awareness” is
considered only to a very limited extent.
In this thesis, we propose approaches that enable an information sharing system to
discover files according to the interests and the mobility patterns of users. We also propose
a method to organize files in a tree in such a way that file discovery can be performed with
minimum overhead. Based on these proposals, we design and implement a self-adaptive
middleware called SAMi. In the following chapters, we discuss our proposals and the SAMi
middleware in detail.
Chapter 3 Interest Awareness
Peer-to-peer information sharing over MANETs has become an important research topic due
to advances in information and communication technologies. To avoid information overload,
information discovery in MANETs, which is a prerequisite for information sharing, should be
performed according to the users' interests. Moreover, interest awareness can be used to
increase the collaboration of users in the information discovery process.
In this chapter, we propose an interest-aware information discovery approach that discovers
files by disseminating advertisements and queries. In this approach, the users' interests are
used to filter the files to be advertised, so that a larger share of the advertisements is
actually used and their overhead decreases. The interests of users are also used during query
resolution. Furthermore, in order to increase the scalability of the information discovery
approach, advertisements and queries are disseminated according to the users' interests.
The proposed approach allows users to specify their interests explicitly. It is also possible
to compute the users' interests automatically by analyzing their information-sharing habits.
Finally, the users' social networks are used to facilitate the interest identification process.
The research work in this chapter was presented at the International Workshop on Mobile
P2P Data Management, Security and Trust (MP-DMST*), organized in conjunction with the
MDM 2010 conference [84], and at the Pervasive Computing and Communications Workshops
(PerComW 2010) [85].
This chapter is organized as follows. Section 3.1 presents the motivation behind the design
of an interest-aware information discovery approach. Section 3.2 defines the important
concepts and operations used throughout the chapter. Section 3.3 discusses the proposed
information discovery approach. Section 3.4 deals with the identification of the users'
interests. Section 3.5 illustrates the exploitation of users' social networks in the interest
identification process. Section 3.6 discusses other interest-aware approaches. Finally,
Section 3.7 concludes the chapter.
Chapter 3 Interest Awareness
3.1 Motivation
The effort and time needed to search for information will be minimized if the
information that users need is provided automatically according to their interests.
Filtering the files to be advertised according to the users’ interests reduces the
volume of advertisements. Similarly, considering the interests of users during query
resolution increases the satisfaction and the collaboration of users. Finally, the interests of
users can facilitate the routing of information and hence make the system scalable.
Therefore, we argue that the users’ interests should be taken into account when
designing an information discovery approach.
Recall the scenario discussed in Chapter 1: the buses and the cafeteria of INSA are among the
places where Pascal uses MANETs to share information. Let us examine in detail the
information needs of Pascal and of the people with whom he usually shares information.
Assume that Pascal can take either “Bus 27” or “Bus 37” to go to the office at 8 A.M. In
these buses, Pascal communicates with other passengers through a MANET formed via
Bluetooth. Most of his friends also take Bus 37, where they exchange jokes and
photos of touristic places. People working in banks take Bus 27. As Pascal is also
interested in financial affairs, he exchanges financial news in this bus. At INSA-Café, a
cafeteria of INSA, Pascal uses MANETs to exchange information about new research
areas and research issues with his colleagues.
According to the above scenario, Pascal and his friends are interested in exchanging jokes
and photos in Bus 37. Assume that the MANET formed in Bus 37 is as displayed in Figure 3-1:
Pascal is a neighbor of Eve and David, who are neighbors of Carol. Pascal
is also a neighbor of Bob and Anne. Let us observe the information exchange from
Pascal’s side. As Eve and Carol are interested in receiving photos, Pascal should send
advertisements about photos to Eve, and Eve should forward these advertisements to Carol.
Note that Bob and Anne are interested in providing jokes and photos, respectively. As a
result, Pascal should send queries concerning jokes to Bob and queries concerning photos
to Anne.
[Figure: a MANET whose nodes are Pascal, Anne, Bob, Carol, David and Eve, connected by wireless links; the legend distinguishes incoming and outgoing interests (joke, photo), advertisements (Adv) and queries.]
Figure 3-1: A MANET in Bus 37
As illustrated by the scenario, interest awareness facilitates the information discovery
process. Therefore, the interests of users should be considered during the selection of files to
be advertised and of queries to be resolved. Similarly, the dissemination of advertisements
and queries should be done according to the interests of users.
It is difficult for nomadic users to specify their interests each time they participate in
a MANET. Thus, the information discovery process should be able to identify the users’
interests automatically. We propose to analyze historical queries and advertisements
to identify the interests of users in receiving and providing information.
The location and time contexts influence the information that a user wants
to receive. For example, a student will be more interested in sharing research papers at
INSA than at other locations. Users may be more interested in exchanging news in the
morning than at night. Users may be more interested in providing songs during their
vacations than at other times.
The interests of users differ depending on whether they are with friends or with colleagues. A user
may be more interested in sharing information about his/her office environment with
colleagues than with friends. For example, Pascal is interested in exchanging jokes with his
friends and research issues with his colleagues.
The contributions of this chapter are: (1) a formalization of user interests, (2) an
illustration of the application of the users’ interests in information discovery, (3) the
identification of the users’ interests and (4) the application of the users’ social networks in
the interest identification process.
3.2 Definitions
In MANETs, users have sharable files to provide to others. We assume that each
file is described by a set of keywords. Users also want to receive files from others. They
search for the files they are looking for by disseminating queries, which are also represented
as sets of keywords.
Definition 3.1. Interest: A user-interest2 represents a set of files that a user wants to
receive or provide. An interest I is represented as ({k1,..,kn}, w) where
• k1,.., kn are keywords (referred to as Description(I)) and
• w ∈ (0, 1] is a weight (referred to as Weight(I)).
Description(I) expresses the files represented by the interest and Weight(I) indicates the
preference/capacity of a user to receive/provide the files represented by I compared to
his/her other interests.
An empty interest is defined to represent the files that a user is unable to describe. Like
any other interest, the files represented by an empty interest can be provided or received.
The description of an empty interest Ie contains nothing, i.e., Description(Ie) = ø.
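As an illustration, Definition 3.1 can be captured by a small immutable record. The following Python sketch assumes that keywords are plain strings held in a frozen set; the names `Interest` and `make_interest` are hypothetical and serve only this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interest:
    """A user-interest (Definition 3.1): a keyword description and a weight in (0, 1]."""

    description: frozenset  # Description(I); empty for an empty interest
    weight: float           # Weight(I)

    def __post_init__(self):
        # Definition 3.1 requires w in (0, 1].
        if not 0.0 < self.weight <= 1.0:
            raise ValueError(f"weight must be in (0, 1], got {self.weight}")

    @property
    def is_empty(self) -> bool:
        # An empty interest stands for files the user is unable to describe.
        return not self.description


def make_interest(keywords, weight):
    """Hypothetical helper: build an Interest from any iterable of keyword strings."""
    return Interest(frozenset(keywords), weight)


# Usage: one of Pascal's interests from Table 3-2 and an empty interest.
joke = make_interest(["joke", "fun", "quip", "buffoonery"], 0.5)
empty = make_interest([], 1.0)
print(joke.is_empty, empty.is_empty)  # False True
```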
2 We use the terms interest and user-interest interchangeably.
Definition 3.2. Similarity of Interests: The similarity of two interests Ii and Ij is defined
by the similarity of their descriptions. Let Similarity(Di, Dj) be the similarity value of two
descriptions Di and Dj.
Similarity(Di, Dj) can be computed by using a semantic similarity function such as the one
proposed in [86] or lexical similarity functions such as those proposed in [87]. Lexical
similarity functions are simple to implement and cheap to execute, especially on thin devices
(e.g., a mobile phone). Thus, in our experiments, we used a lexical similarity function
derived from the one introduced in [87]. However, our information discovery approach can
use other similarity functions without significant modifications. More precisely, we define
the similarity value of two descriptions Di and Dj as:
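$$
\mathrm{Similarity}(D_i, D_j) =
\begin{cases}
\dfrac{2\,|D_i \cap D_j|}{|D_i| + |D_j|} & \text{if } D_i \neq \emptyset \text{ and } D_j \neq \emptyset \\
0 & \text{otherwise}
\end{cases}
$$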
Note that Similarity(Di, Dj) is 0 if Di and Dj are disjoint and 1 if they are identical. As
described in Definition 3.1, Di = ø is the description of an empty interest. As the
descriptions of the files represented by an empty interest are unknown, it is not
possible to compare an empty interest with a non-empty interest. Nor can we compare
two empty interests, for the same reason. Taking a pessimistic view, we define
Similarity(Di, Dj) as 0 if Di = ø or Dj = ø.
The similarity value of interests Ii and Ij is computed as:
Similarity(Ii, Ij) = Similarity(Description(Ii), Description(Ij))
We define the similarity value of an interest and a file (or a query) in the same way. Let Df be
the description of a file f and let q be a query. Similarity(f, I) and Similarity(q, I) are defined
as:
Similarity(f, I) = Similarity(Df, Description(I))
Similarity(q, I) = Similarity(q, Description(I))
We say that interests Ii and Ij are similar (noted Ii ≈ Ij) if and only if
their similarity value is greater than or equal to a predefined threshold accSim, i.e.,
Ii ≈ Ij ⇔ Similarity(Ii, Ij) ≥ accSim
Similarly, we use the same threshold to determine whether an interest I is similar to a
file f (or a query q), i.e.,
I ≈ f ⇔ Similarity(f, I) ≥ accSim
I ≈ q ⇔ Similarity(q, I) ≥ accSim
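The similarity function and the threshold test can be sketched in a few lines of Python. The sketch assumes that the similarity of Definition 3.2 is the Dice-style ratio 2|Di ∩ Dj| / (|Di| + |Dj|) and uses an illustrative threshold accSim = 0.5 (the value used for Table 3-4); the function names are hypothetical.

```python
def similarity(d_i, d_j):
    """Similarity of two keyword descriptions (Definition 3.2).

    Returns 2|Di ∩ Dj| / (|Di| + |Dj|) for non-empty descriptions and 0 when
    either description is empty (the pessimistic rule for empty interests).
    """
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0
    return 2 * len(d_i & d_j) / (len(d_i) + len(d_j))


def is_similar(d_i, d_j, acc_sim=0.5):
    """The relation ≈: similarity greater than or equal to the threshold accSim."""
    return similarity(d_i, d_j) >= acc_sim


# Usage with queries q1 and q2 of Table 3-3.
q1 = {"tree", "bush", "grass", "sidewalk"}
q2 = {"tree", "bush", "sidewalk"}
print(round(similarity(q1, q2), 2))  # 0.86
print(is_similar(q1, q2), similarity(q1, set()))  # True 0.0
```

Descriptions are passed as sets of keywords; a bare string would be split into characters by `set()`.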
Definition 3.3. Collaboration tie strength: For users who habitually appear
together in MANETs, we define the function Tie-strength(p1, p2) to indicate the degree of
their collaboration. In our approach, the collaboration between users is expressed by their
sharing habits.
Let numberSh(p1, p2) be the number of files that the peers p1 and p2 have shared, let
Co-total-Time(p1, p2) be the total time that peers p1 and p2 have stayed connected in MANETs
and let total-numberPr(p) be the number of files that a peer p provides to others in
MANETs. The collaboration tie strength between p1 and p2 (denoted Tie-strength(p1, p2))
is computed as:
$$
\text{Tie-strength}(p_1, p_2) = \frac{\text{numberSh}(p_1, p_2)}{\text{total-numberPr}(p_1) + \text{total-numberPr}(p_2) + \text{Co-total-Time}(p_1, p_2)}
$$
Note that Tie-strength is symmetric, i.e., Tie-strength(p1, p2) = Tie-strength(p2, p1).
In a MANET, peers do not have global knowledge. Thus, it is difficult for a peer p to
compute the tie strength between an arbitrary pair of peers p1 and p2. However, in our
protocol (see Section 3.3), the peer p needs to know the degree of collaboration between p1
and p2 in order to forward the advertisements of p1 to p2. We propose that p estimates the
tie strength between p1 and p2 by combining his/her own degrees of collaboration with p1
and with p2. Thus, p computes Tie-strength(p1, p2) as:
$$
\text{Tie-strength}(p_1, p_2) = \frac{\text{Tie-strength}(p, p_1) + \text{Tie-strength}(p, p_2)}{2}
$$
Definition 3.4. Sharing-context: A sharing-context3 of a peer describes a situation in
which the peer allows others to download files from his/her machine. A sharing context is
expressed as a tuple (L, [Ts, Tf]) where L is a location and [Ts, Tf] is a time interval.
For example, (Bus 1, [8AM, 10AM]) is a sharing context stating that a peer allows
others to download files from his/her machine when he/she is in Bus 1 from 8AM to
10AM. Table 3-1 lists other examples of sharing contexts.
Table 3-1: Examples of sharing contexts
Context Description
(Bus,[8AM, 10AM]) any bus from 8AM to 10AM
(“”, [8AM, 10AM]) any place from 8AM to 10AM
(“”, ø) anywhere and anytime
We define two types of sharing contexts: abstract and actual. An abstract sharing
context describes when and where a peer allows others to download files from his/her
machine. For example, a user can specify that others can download files from his/her
machine wherever and whenever he/she is in a MANET by setting the sharing context to
(“”, ø). However, this does not mean that he/she is in a MANET 24 hours a day, 7
days a week. An actual sharing context is derived from an abstract sharing context by
considering the actual time and place in which data were shared. Assume that Pascal,
who has the abstract sharing context (“”, ø), is interconnected with other nomadic users via a
MANET in Bus 27 from 8 AM to 8:10 AM. Then (Bus 27, [8 AM, 8:10 AM]) is the
actual context derived from the abstract context (“”, ø).
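The derivation of an actual sharing context from an abstract one can be sketched as follows. The encoding is an assumption made for illustration only: times are minutes after midnight, an empty location matches any place, `None` stands for an unrestricted time interval, a location such as "Bus" matches any place whose name starts with it, and the actual interval is clipped to the abstract one. `SharingContext` and `derive_actual` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SharingContext:
    location: str                        # "" means any place
    interval: Optional[Tuple[int, int]]  # (start, end) in minutes; None means any time


def derive_actual(abstract, place, start, end):
    """Return the actual context for a MANET session at `place` during [start, end],
    or None if the abstract context does not allow sharing in that session."""
    # Assumption: a location pattern such as "Bus" matches "Bus 27", "Bus 37", ...
    if abstract.location and not place.startswith(abstract.location):
        return None
    if abstract.interval is None:
        return SharingContext(place, (start, end))
    # Keep only the part of the session that lies inside the allowed time window.
    lo, hi = max(start, abstract.interval[0]), min(end, abstract.interval[1])
    if lo >= hi:
        return None
    return SharingContext(place, (lo, hi))


# Usage: Pascal's abstract context ("", ø) and a session in Bus 27 from 8:00 to 8:10.
print(derive_actual(SharingContext("", None), "Bus 27", 8 * 60, 8 * 60 + 10))
# SharingContext(location='Bus 27', interval=(480, 490))
```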
3 In this chapter, we use the terms sharing-context and context interchangeably.
Definition 3.5. Sharing-interest: A sharing-interest is the set of interests of a user in a
given context. It can be used as a demand or a provision. An information demand of a peer
is a sharing interest that contains the interests of the peer in receiving information. An
information provision of a peer is a sharing interest that contains the interests of the peer in
providing information. Table 3-2 lists examples of Pascal’s information demands.
A sharing interest S has the following properties:
[1] |S| ≥ 1
[2] Description(I1) ≠ Description(I2) for all I1, I2 ∈ S with I1 ≠ I2
[3] $\sum_{I \in S} \mathrm{Weight}(I) = 1$
[4] Weight(I) ≥ minW for all I ∈ S
Table 3-2: Examples of information demands of Pascal
Context c Information-Demand(Pascal, c)
(Bus 27, [8AM-8:05AM]) {({accounts, banking, economics, financial affairs}, 0.3), ({fund, treasure, capital, currency, change}, 0.3), ({commerce, buying, selling, exchange, stock, trade}, 0.4)}
(Bus 37, [8AM-8:07AM]) {({joke, fun, quip, buffoonery}, 0.5), ({photo-places, photo-friends, photo-mountain, photo-nightclub}, 0.5)}
(INSA-Café, ø) {({Social-networking, social analysis, social software}, 0.7), ({iPhone-models, iPhone models, iPhone history, iPod}, 0.3)}
We assume that a user participating in a MANET is interested in receiving and providing
information. Thus, a sharing interest of a user contains at least one interest. Two
interests in a sharing interest should represent different kinds of files; as a result, their
descriptions should not be identical but may overlap. As the weight of an interest is
defined as a relative value (Definition 3.1), the sum of the weights of the interests in a
sharing interest should be one. Like any other data stored on a computer, a sharing interest
should contain a finite number of interests. Consequently, we define a minimum weight (minW)
to limit the number of interests, such that no interest can have a weight less than minW.
Since the weights sum to one, a sharing interest contains at most ⌊1/minW⌋ interests; for
example, if minW is 0.25, a sharing interest contains at most 4 interests.
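Properties [1] to [4] of Definition 3.5 translate directly into a validity check. The following Python sketch is illustrative only: it assumes a sharing interest is given as a list of (keywords, weight) pairs, uses a small tolerance for floating-point sums, and `is_valid_sharing_interest` is a hypothetical name.

```python
def is_valid_sharing_interest(interests, min_w, tol=1e-9):
    """Check properties [1]-[4] of Definition 3.5 for a list of (keywords, weight) pairs."""
    # [1] at least one interest
    if not interests:
        return False
    # [2] no two interests share an identical description (overlap is allowed)
    descriptions = [frozenset(keywords) for keywords, _ in interests]
    if len(set(descriptions)) != len(descriptions):
        return False
    weights = [w for _, w in interests]
    # [3] the weights sum to one
    if abs(sum(weights) - 1.0) > tol:
        return False
    # [4] every weight is at least minW
    return all(w >= min_w - tol for w in weights)


# Usage: Pascal's financial demand from Table 3-2.
finance = [
    ({"accounts", "banking", "economics", "financial affairs"}, 0.3),
    ({"fund", "treasure", "capital", "currency", "change"}, 0.3),
    ({"commerce", "buying", "selling", "exchange", "stock", "trade"}, 0.4),
]
print(is_valid_sharing_interest(finance, min_w=0.25))  # True
print(is_valid_sharing_interest(finance, min_w=0.35))  # False
```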
Empty sharing interest: An empty sharing-interest, denoted as {(ø, 1)}, is defined as a
set containing only an empty interest.
Information-Demand(p, pd, c): Information-Demand(p, pd, c) is an information
demand of a peer p observed by a peer pd in a context c. When p is the same as pd,
Information-Demand(p, pd, c) is referred to as Information-Demand(p, c).
Information-Provision(p, pd, c): Information-Provision(p, pd, c) is an information
provision of a peer p observed by a peer pd in a context c. When p is the same as pd,
Information-Provision(p, pd, c) is referred to as Information-Provision(p, c).
Overall-Demand(P) and Overall-Provision(P): For a set of peers P, their common
interests in receiving and providing information are referred to as Overall-Demand(P) and
Overall-Provision(P), respectively. The overall demand of the peers is computed by
aggregating their information demands via the operation described in Definition 3.7.
Similarly, the overall provision of the peers is computed by aggregating their information
provisions.
Definition 3.6. Similarity of sharing interests: The primary condition that we use to
decide the similarity of two sharing interests S1 and S2 is the similarity of the interests in
the two sets, i.e., for every interest Ii in S1, there should be an interest Ij in S2 such that Ii ≈ Ij,
and vice versa. The sharing interests are not similar if the primary condition is not
satisfied. We use cosine similarity to determine the similarity of two sharing interests
satisfying the primary condition.
Assume that S1 = {I11, .., I1n} and S2 = {I21, .., I2m} such that |S1| = n and |S2| = m. Let the
weights of I1i and I2i be W1i and W2i, respectively. Let P1 be the vector representation of S1
and P2 the vector representation of S2; we define these vectors as:
• P1 = (W11, .., W1n) and
• P2 = (W21, .., W2m)
Let W12i be the average weight of the interests in S1 similar to the interest I2i, and
W21i the average weight of the interests in S2 similar to the interest I1i. For a
sharing interest S, let Sim(S, I) be the set of interests in S similar to the interest I; W12i and
W21i are computed as:
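$$
W_{12i} = \frac{\sum_{I \in \mathrm{Sim}(S_1, I_{2i})} \mathrm{Weight}(I)}{|\mathrm{Sim}(S_1, I_{2i})|}
\qquad
W_{21i} = \frac{\sum_{I \in \mathrm{Sim}(S_2, I_{1i})} \mathrm{Weight}(I)}{|\mathrm{Sim}(S_2, I_{1i})|}
$$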
Let P12 be the vector representation of S1 with respect to S2 and P21 be the vector
representation of S2 with respect to S1; we define these vectors as:
• P12 = (W121,..,W12m)
• P21=(W211, ..,W21n)
The primary similarity condition is satisfied by S1 and S2 if and only if
∀ Ii ∈ S1, ∃ Ij∈ S2 such that Ii ≈ Ij and
∀ Ii ∈ S2, ∃ Ij∈ S1 such that Ii ≈ Ij
We define the similarity value between the sharing interests S1 and S2 as:
$$
\mathrm{Similarity}(S_1, S_2) =
\begin{cases}
\dfrac{\cos(P_1, P_{21}) + \cos(P_2, P_{12})}{2} & \text{if the primary similarity condition is satisfied} \\
0 & \text{otherwise}
\end{cases}
$$
Similarity(S1, S2) is symmetric. The sharing interests S1 and S2 are said to
be similar if and only if
S1 ≈ S2 ⇔ Similarity(S1, S2) ≥ accC
where accC is a predefined similarity threshold.
Example: Assume that S1 = {({Social-networking}, 0.3), ({iPhone-models}, 0.7)}, S2 =
{({Social-networking}, 0.7), ({iPhone-models}, 0.3)} and accC = 0.5.
Observe that P1 = (0.3, 0.7), P2 = (0.7, 0.3), P21 = P2 and P12 = P1. Thus, cos(P1, P21) =
cos(P2, P12) = (0.3 × 0.7 + 0.7 × 0.3) / (0.3² + 0.7²) = 0.42 / 0.58 ≈ 0.72, and hence
Similarity(S1, S2) ≈ 0.72 ≥ accC; therefore, S1 and S2 are similar. They would be judged
dissimilar only for a threshold accC above about 0.72.
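The computation of Definition 3.6 can be checked with a short Python sketch. It assumes sharing interests given as lists of (keywords, weight) pairs, the Dice-style ratio 2|Di ∩ Dj| / (|Di| + |Dj|) as interest similarity and accSim = 0.5; the function names are hypothetical. The usage lines evaluate the sharing interests S1 and S2 of the example.

```python
import math


def dice(d_i, d_j):
    """Interest similarity of Definition 3.2 (0 if either description is empty)."""
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0
    return 2 * len(d_i & d_j) / (len(d_i) + len(d_j))


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def matched_weights(src, dst, acc_sim):
    """For each interest of dst, the average weight of the interests of src similar to it.

    Returns None when some interest of dst has no similar interest in src,
    i.e. when the primary similarity condition fails.
    """
    result = []
    for d_dst, _ in dst:
        ws = [w for d_src, w in src if dice(d_src, d_dst) >= acc_sim]
        if not ws:
            return None
        result.append(sum(ws) / len(ws))
    return result


def sharing_similarity(s1, s2, acc_sim=0.5):
    p21 = matched_weights(s2, s1, acc_sim)  # (W21_1, ..., W21_n)
    p12 = matched_weights(s1, s2, acc_sim)  # (W12_1, ..., W12_m)
    if p21 is None or p12 is None:
        return 0.0
    p1 = [w for _, w in s1]
    p2 = [w for _, w in s2]
    return (cosine(p1, p21) + cosine(p2, p12)) / 2


s1 = [({"Social-networking"}, 0.3), ({"iPhone-models"}, 0.7)]
s2 = [({"Social-networking"}, 0.7), ({"iPhone-models"}, 0.3)]
print(round(sharing_similarity(s1, s2), 2))  # 0.72
```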
Definition 3.7. Aggregation of sharing interests: The aggregation of a set of sharing
interests is used to extract the common features of the users’ sharing interests. The
aggregation of a set of sharing interests T = {S1, S2, …, Sn}, denoted ⊕∑Si, is computed
in the following two steps:
Step 1: Decompose interests
Let TNI be the set of all non-empty interests in the sharing interests to be aggregated. The
interests in TNI are decomposed into GTNI = {T1, …, Tm} such that interests in the same
group are more similar to each other than to interests in different groups. We propose to
perform the decomposition of non-empty interests by using a method4 derived from
agglomerative hierarchical clustering approaches [88].
4 Interests are grouped in the same way as queries. The algorithm proposed to group queries is discussed in
Section 3.4.1.
More specifically, the sets in GTNI satisfy the properties:
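Ip, Iq ∈ Tk ∈ GTNI ⇒ Ip ≈ Iq
Tk, Ts ∈ GTNI and k ≠ s ⇒ ∃ Ip ∈ Tk and ∃ Iq ∈ Ts such that Ip ≉ Iq
Let Sim(Ts, Iq) be the set of interests in Ts similar to Iq; for Iq ∈ Tk and any Ts ∈ GTNI with
s ≠ k, the following property holds:
$$
\frac{\sum_{I_p \in T_k - \{I_q\}} \mathrm{Similarity}(I_p, I_q)}{|T_k| - 1} \ge \frac{\sum_{I_p \in \mathrm{Sim}(T_s, I_q)} \mathrm{Similarity}(I_p, I_q)}{|\mathrm{Sim}(T_s, I_q)|}
$$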
Step 2: Identifying interests in ⊕∑Si
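From each Tk, an interest Ik is computed such that
• $\mathrm{Weight}(I_k) = \sum_{I \in T_k} \mathrm{Weight}(I) \,/\, |T|$, where |T| = n is the number of sharing interests being aggregated
• $\mathrm{Description}(I_k) = \bigcap_{I \in T_k} \mathrm{Description}(I)$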
As discussed in Definition 3.5, every interest in a sharing interest should have a weight
of at least minW (the predefined threshold introduced in Definition 3.5). Consequently, the
interest Ik is added to ⊕∑Si only if Weight(Ik) ≥ minW.
Finally, according to Definition 3.5, the sum of the weights of the interests in a sharing
interest should be one. Let SumNI be the sum of the weights of the non-empty interests in
⊕∑Si. If 1 − SumNI ≥ minW, an empty interest Ie is added to ⊕∑Si such that Weight(Ie) =
1 − SumNI. If 1 − SumNI < minW but SumNI < 1, the weight of each interest I in ⊕∑Si is
normalized as:
Weight(I) = Weight(I) ÷ SumNI
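Step 2 of the aggregation, including the completion with an empty interest or the normalization, can be sketched as follows. The sketch assumes that Step 1 has already produced the groups T1, …, Tm as lists of (keywords, weight) pairs; `aggregate_groups` is a hypothetical name, and the sketch does not attempt to merge groups whose intersected descriptions coincide.

```python
def aggregate_groups(groups, n_sharing_interests, min_w):
    """Step 2 of Definition 3.7.

    groups: the groups T_k from Step 1, each a list of (keywords, weight) pairs.
    n_sharing_interests: |T|, the number of sharing interests being aggregated.
    Returns a list of (description, weight); frozenset() denotes the empty interest.
    """
    kept = []
    for group in groups:
        # Weight(I_k): sum of the group's weights divided by |T|.
        weight = sum(w for _, w in group) / n_sharing_interests
        # Description(I_k): keywords shared by every interest of the group.
        description = frozenset.intersection(*(frozenset(k) for k, _ in group))
        if weight >= min_w:
            kept.append((description, weight))
    sum_ni = sum(w for _, w in kept)
    if 1 - sum_ni >= min_w:
        # Complete the sharing interest with an empty interest.
        kept.append((frozenset(), 1 - sum_ni))
    elif sum_ni < 1:
        # Otherwise rescale so that the weights sum to one.
        kept = [(d, w / sum_ni) for d, w in kept]
    return kept


# Usage: two sharing interests whose non-empty interests were grouped into three groups.
groups = [
    [({"tree", "bush"}, 0.6), ({"tree", "bush", "grass"}, 0.7)],
    [({"sky"}, 0.4)],
    [({"clouds"}, 0.3)],
]
for description, weight in aggregate_groups(groups, n_sharing_interests=2, min_w=0.25):
    print(sorted(description), round(weight, 2))
# ['bush', 'tree'] 0.65
# [] 0.35
```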
3.3 Interest-Aware Information Discovery
In a MANET, information discovery can be performed using two approaches: push
and pull. In a push approach, data sources make others aware of their sharable files by
disseminating advertisements; in a pull approach, a requester peer searches for the source of a
file by distributing queries. As discussed in Section 3.1, both approaches should be
conducted according to the interests of users. Thus, data-sources should prepare and
disseminate advertisements about their sharable files according to the information demands
of data-requesters. Similarly, data-requesters should resolve queries according to the
information provisions of data-sources.
When joining a MANET, data-sources and data-requesters distribute their interests in
providing and receiving information to the peers in their vicinity. Let P be the set of peers in
the MANET of which a peer p is aware; we propose that p estimates the overall demand
of the peers in P and their overall provision by using the aggregation operation described in
Definition 3.7, i.e.,
$$
\text{Overall-Demand}(P) = \bigoplus\sum_{p \in P} \text{Information-Demand}(p)
$$
$$
\text{Overall-Provision}(P) = \bigoplus\sum_{p \in P} \text{Information-Provision}(p)
$$
Let Sod be the overall demand of the requester peers in a MANET. A data-source peer in
this MANET should preferably advertise files matching the interests of the overall
demand Sod. Let Adv-Volume5 be the number of metadata entries that the data source can use to
advertise sharable files. Let N(I) be the number of metadata entries that it can use to advertise
files matching the interest I ∈ Sod. N(I) is computed as:
N(I) = Weight(I) × Adv-Volume
5 We will discuss the computation of Adv-Volume in the next chapter.
Let F(I) be the set of files matching the interest I ∈ Sod and ADV(I) a container used to
store advertisements related to the interest I. The data source selects at most N(I) files
from F(I) and puts their metadata in ADV(I) via Algorithm 3-1.
For each non-empty interest I, Algorithm 3-1 places the files matching the interest I in
F(I) (lines 1 to 3). For an empty interest Ie, F(Ie) is filled with the files that do not match
any of the non-empty interests (lines 4 to 6). If the advertisement quota for I, i.e., N(I), is
sufficient to advertise all the files in F(I), the metadata of each file in F(I) is placed in
ADV(I) (lines 8 to 9). Otherwise, for each non-empty interest I, the files of F(I) to be
advertised are selected according to their similarity (relevance) to the interest (lines 10
to 11). If f1 and f2 are sharable files in F(I) such that Similarity(f1, I) is greater than
Similarity(f2, I), f1 is said to be more relevant to I than f2. In this case, f1 has a better
chance of matching the users’ needs. Therefore, the data source peer gives f1 priority over f2
for inclusion in ADV(I). If I is an empty interest and its advertisement quota is not
enough to advertise all the files in F(I), the files to be advertised are selected randomly
from F(I) (lines 12 to 13).
For any interest I, the dissemination of ADV(I) is performed according to: (1) the
direction of peers having an information demand matching the interest I and/or (2) the degree
of collaboration between the data source peer and the peer to which the advertisement will
be forwarded. An information demand S matches an interest I if and only if ∃ Ii ∈ S such
that Ii is similar to I.
The Tie-strength function, described in Definition 3.3, is used to calculate the degree
of collaboration between two peers. Let min-tie be the minimum tie strength that a peer must
have with a data source to receive its advertisements. A peer p is said to have a high
degree of collaboration with the data source peer ps if and only if
Tie-strength(ps, p) ≥ min-tie
Algorithm: Advertisement message preparation
Input: Sod, N(I) ∀ I ∈ Sod, F
  Sod: overall demand
  F: sharable files
  N(I): advertisement quota of an interest I
Output: ADV(I) ∀ I ∈ Sod
  ADV(I): metadata to be advertised w.r.t. an interest I
Begin
  // put the files matching a non-empty interest I in F(I)
  1. For any I ∈ Sod | Description(I) ≠ ø
  2.   F(I) ← {f ∈ F | f ≈ I and Similarity(f, I) ≥ Similarity(f, Ij) ∀ Ij ∈ Sod}
  3. End For
  /* put all files that are not similar to any of the non-empty interests in F(Ie), where Ie is an empty interest */
  4. If ∃ Ie ∈ Sod | Description(Ie) = ø
  5.   F(Ie) ← F − ∪_{I ∈ Sod − {Ie}} F(I)
  6. End If
  7. For any I ∈ Sod
  8.   If (|F(I)| ≤ N(I))
  9.     ADV(I) ← {metadata(f) | f ∈ F(I)}
  10.  Else if (Description(I) ≠ ø)
         /* Relevant(F, I, n): the n files of F most relevant (similar) to the interest I */
  11.    ADV(I) ← {metadata(f) | f ∈ Relevant(F(I), I, N(I))}
  12.  Else
         // Random(F, n): n files taken randomly from F
  13.    ADV(I) ← {metadata(f) | f ∈ Random(F(I), N(I))}
  14.  End If
  15. End For
End Algorithm
Algorithm 3-1: Advertisement message preparation
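A possible Python rendering of Algorithm 3-1 is sketched below, under stated assumptions: the overall demand is a list of (keywords, weight) pairs with frozenset() for the empty interest, files are given as a dictionary from hypothetical file identifiers to keyword sets, file identifiers stand in for their metadata, the Dice-style ratio 2|Di ∩ Dj| / (|Di| + |Dj|) is used as similarity with accSim = 0.5, and the quota N(I) = Weight(I) × Adv-Volume is rounded down, a rounding rule the text does not specify. The function name `prepare_advertisements` is hypothetical.

```python
import random


def dice(d_i, d_j):
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0
    return 2 * len(d_i & d_j) / (len(d_i) + len(d_j))


def prepare_advertisements(demand, files, adv_volume, acc_sim=0.5, rng=random):
    """Return {interest index: advertised file ids}, following Algorithm 3-1."""
    # N(I) = Weight(I) * Adv-Volume, rounded down (small epsilon absorbs float error).
    quota = [int(w * adv_volume + 1e-9) for _, w in demand]
    non_empty = [i for i, (d, _) in enumerate(demand) if d]
    f = {i: [] for i in range(len(demand))}
    # Lines 1-3: a file joins the non-empty interests it matches best.
    for fid, keywords in files.items():
        scores = {i: dice(keywords, demand[i][0]) for i in non_empty}
        best = max(scores.values(), default=0.0)
        for i, s in scores.items():
            if s >= acc_sim and s == best:
                f[i].append(fid)
    # Lines 4-6: files matching no non-empty interest go to the empty interest.
    matched = {fid for i in non_empty for fid in f[i]}
    for i, (d, _) in enumerate(demand):
        if not d:
            f[i] = [fid for fid in files if fid not in matched]
    adv = {}
    for i, (d, _) in enumerate(demand):
        n = quota[i]
        if len(f[i]) <= n:  # lines 8-9: the quota covers every file
            adv[i] = list(f[i])
        elif d:  # lines 10-11: keep the n most relevant files
            adv[i] = sorted(f[i], key=lambda fid: dice(files[fid], d), reverse=True)[:n]
        else:  # lines 12-13: random choice for the empty interest
            adv[i] = rng.sample(f[i], n)
    return adv


demand = [(frozenset({"joke", "fun"}), 0.5), (frozenset({"photo", "mountain"}), 0.3), (frozenset(), 0.2)]
files = {
    "f1": {"joke", "fun"}, "f2": {"joke"}, "f3": {"photo", "mountain"},
    "f4": {"bank"}, "f5": {"stock"}, "f6": {"joke", "fun", "quip"},
}
print(prepare_advertisements(demand, files, adv_volume=4))  # {0: ['f1', 'f6'], 1: ['f3'], 2: []}
```

With Adv-Volume = 4, the joke interest receives a quota of 2 for three matching files, so the two files most similar to it are kept, while the empty interest's quota rounds down to 0.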
The data source forwards ADV(I) to direct neighbors located in the direction of peers
having an information demand matching the interest I. The method proposed by LAR [80]
(described in Chapter 2) is used to select neighbors according to their locations. The data
source peer ps also forwards ADV(I) to his/her direct neighbors having a high degree of
collaboration with him/her. A peer accepting the advertisement forwards it
in a similar fashion.
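The source-side forwarding rule can be sketched as a filter over the direct neighbors. In this Python sketch, the location-based neighbor selection of LAR is not reproduced: it is replaced by a hypothetical precomputed flag `toward_interested`. The `Neighbor` record, its fields and `forward_targets` are illustrative names, and the Dice-style similarity with accSim = 0.5 is assumed.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    name: str
    demand: list             # information demand as (keywords, weight) pairs
    toward_interested: bool  # stand-in for the LAR-style direction test
    tie: float               # Tie-strength(source, neighbor)


def dice(d_i, d_j):
    d_i, d_j = set(d_i), set(d_j)
    if not d_i or not d_j:
        return 0.0
    return 2 * len(d_i & d_j) / (len(d_i) + len(d_j))


def forward_targets(interest, neighbors, min_tie, acc_sim=0.5):
    """Names of the direct neighbors that should receive ADV(I)."""
    targets = []
    for nb in neighbors:
        # The neighbor's own demand matches I (some interest is similar to I).
        interested = any(dice(k, interest) >= acc_sim for k, _ in nb.demand)
        if interested or nb.toward_interested or nb.tie >= min_tie:
            targets.append(nb.name)
    return targets


# Usage loosely modelled on the example of Figure 3-2 (values are illustrative).
photos = {"photo-places", "photo-friends"}
neighbors = [
    Neighbor("p2", [({"stock"}, 1.0)], False, 0.25),
    Neighbor("p3", [({"photo-places", "photo-friends"}, 1.0)], False, 0.1),
    Neighbor("p4", [({"stock"}, 1.0)], True, 0.2),
    Neighbor("p5", [({"stock"}, 1.0)], False, 0.9),
]
print(forward_targets(photos, neighbors, min_tie=0.8))  # ['p3', 'p4', 'p5']
```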
Example: Assume that min-tie is 0.8. In the MANET displayed in Figure 3-2, p1
advertises to p3, p4 and p5, since p3 is interested in the advertisement, p4 is located in the
direction of peers interested in the advertisement and p5 has a high degree of collaboration
with p1.
[Figure: peers p1 to p9 connected by wireless communication; links are labelled with the collaboration tie strength α (files/day); min-tie = 0.8; p3, p6 and p7 have information demands containing an interest similar to I; ADV(I) holds the sharable files matching I ∈ Overall-Demand(P), where Overall-Demand = ⊕Σ Information-Demand(P); arrows show the advertisement flow.]
Figure 3-2: Advertisement Distribution by p1
As discussed in Definition 3.3, the peer p5 estimates the tie strength between p1 and each
of his neighbors by averaging his own degrees of collaboration with p1 and with that
neighbor. With Tie-Strength(p5, p1) = 0.9, Tie-Strength(p5, p9) = 0.6 and
Tie-Strength(p5, p8) = 0.7, we obtain Tie-Strength(p1, p9) = (0.9 + 0.6)/2 = 0.75 and
Tie-Strength(p1, p8) = (0.9 + 0.7)/2 = 0.8. As min-tie is 0.8, p5 forwards the advertisement
of p1 to p8 but not to p9.
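The relay-side check performed by p5 in this example can be reproduced with a short Python sketch. Only the collaboration criterion is modelled; the dictionary of p5's own tie strengths uses the values of the example, and `estimated_tie` and `relay_targets` are hypothetical names. A small tolerance is used because the estimate can equal min-tie exactly, as it does for p8.

```python
def estimated_tie(own_ties, p1, p2):
    """Tie-strength(p1, p2) estimated by a relay: mean of its own ties to p1 and p2."""
    return (own_ties[p1] + own_ties[p2]) / 2


def relay_targets(own_ties, source, neighbors, min_tie, eps=1e-9):
    """Neighbors to which the relay forwards the source's advertisement."""
    # eps absorbs floating-point error when the estimate equals min_tie.
    return [n for n in neighbors if estimated_tie(own_ties, source, n) >= min_tie - eps]


ties_p5 = {"p1": 0.9, "p8": 0.7, "p9": 0.6}  # Tie-strength(p5, .) from the example
print(estimated_tie(ties_p5, "p1", "p9"))  # 0.75
print(relay_targets(ties_p5, "p1", ["p8", "p9"], min_tie=0.8))  # ['p8']
```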
A peer uses advertisements to identify potential sources of interesting files. It can also
search for a file by distributing queries. A query q is resolved if there is a data-source having
an information provision matching q. Let S be an information provision; we say that S
matches the query q if and only if ∃ I ∈ S such that I ≈ q.
We propose to disseminate queries in the same way as advertisements. A
data-requester forwards a query to its neighbors located in the direction of a data-source
having an information provision matching the query. The peers receiving the query
forward it to some of their neighbors in a similar way. The query can be forwarded up to a
fixed number of hops.
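Hop-limited query dissemination can be sketched as a breadth-first traversal. In this Python sketch, the forwarding policy (the choice of neighbors located toward a matching data-source) and the provision check are passed in as callbacks, because their concrete form depends on the location and interest information available; `disseminate_query` and its parameters are hypothetical names.

```python
from collections import deque


def disseminate_query(origin, neighbors, forward_ok, provides_match, max_hops):
    """Peers reached within max_hops whose information provision matches the query.

    neighbors(p)      -> direct neighbors of peer p
    forward_ok(p, n)  -> whether p forwards the query to neighbor n
    provides_match(p) -> whether p's information provision matches the query
    """
    seen = {origin}
    frontier = deque([(origin, 0)])
    hits = []
    while frontier:
        peer, hops = frontier.popleft()
        if peer != origin and provides_match(peer):
            hits.append(peer)
        if hops == max_hops:
            continue  # hop limit reached: stop forwarding along this path
        for n in neighbors(peer):
            if n not in seen and forward_ok(peer, n):
                seen.add(n)
                frontier.append((n, hops + 1))
    return hits


# Usage on a chain a - b - c - d where only d can resolve the query.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
args = (graph.__getitem__, lambda p, n: True, lambda p: p == "d")
print(disseminate_query("a", *args, max_hops=2))  # []
print(disseminate_query("a", *args, max_hops=3))  # ['d']
```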
The discovery approach introduced in this section performs file discovery by combining
a push method (i.e., the distribution of advertisements) and a pull method (i.e., the
distribution of queries). Both methods are performed according to the interests of users. In
addition to the interests of users, the users’ patterns of collaboration are considered during
the advertisement dissemination.
3.4 Interest Identification
3.4.1 Interest Identification from Historical Data
Users can specify their interests explicitly. In our scenario, Pascal
can state that he is interested in receiving jokes in Bus 37. The interests of users
can also be computed automatically from their queries and advertisements.
An information demand of a peer (i.e., the interests of the peer to receive information)
can be identified from his/her historical queries. Let Q be the set of queries distributed by a
peer p in a MANET in a context c. We propose to identify Information-demand(p, c) from
Q using the following two steps.
Step 1: Decomposing queries
Queries in Q are classified into different groups by using Algorithm 3-2, which is
derived from the agglomerative hierarchical clustering approach [88]. We will use the
queries in Table 3-3 and their similarity in Table 3-4 to illustrate Algorithm 3-2.
Table 3-3: Examples of queries
Query Description
q1 {tree, bush, grass, sidewalk}
q2 {tree, bush, sidewalk}
q3 {tree, bush, grass, ground}
q4 {tree, bush, grass, sidewalk, rock}
q5 {tree, bush, flower, grass}
q6 {clear, sky, tree, bush, ground}
q7 {overcast, sky, tree, bush, grass, sidewalk}
q8 {tree, grass, sky}
q9 {tree, grass, clouds, sky}
Algorithm: Decomposition of queries
Input: Q
  Q: a set of historical queries
Output: G
  G: a subset of the power set of Q such that queries in the same element of G are more similar than queries in different elements of G
Begin
  // initialize the grouping
  1. G ← ∅
  2. For all q ∈ Q
  3.   G ← G ∪ {{q}}
  4. End For
  5. Repeat
       // merge two similar sets of queries
  6.   Gnew ← ∅
  7.   While (G ≠ ∅)
  8.     Qc ← a randomly selected element of G
  9.     G ← G − {Qc}
         /* search for a set Qk such that every element of Qc is similar to every element of Qk */
  10.    If (∃ Qk ∈ G such that ∀ qi ∈ Qc, ∀ qj ∈ Qk, qi ≈ qj, and for any Qs ∈ G − {Qk} one of the following properties holds:
           (A) ∃ qi ∈ Qc, ∃ qj ∈ Qs such that qi ≉ qj   // there are dissimilar elements in Qc and Qs
           (B) $\frac{\sum_{q_i \in Q_c}\sum_{q_j \in Q_k} \mathrm{Similarity}(q_i, q_j)}{|Q_c| \times |Q_k|} \ge \frac{\sum_{q_i \in Q_c}\sum_{q_j \in Q_s} \mathrm{Similarity}(q_i, q_j)}{|Q_c| \times |Q_s|}$   // Qc is at least as similar to Qk as to Qs
  11.      G ← G − {Qk}
  12.      Gnew ← Gnew ∪ {Qc ∪ Qk}
  13.    Else
  14.      Gnew ← Gnew ∪ {Qc}
  15.    End If
  16.  End While
  17.  G ← Gnew
  18.  // repeat the above computations until any two sets contain dissimilar queries
  19. Until ∀ Qc, Qk ∈ G with Qc ≠ Qk, ∃ qi ∈ Qc, ∃ qj ∈ Qk such that qi ≉ qj
End Algorithm
Algorithm 3-2: Decomposition of queries
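Algorithm 3-2 can be rendered in Python as below. This is a sketch under stated assumptions: queries are hashable objects, `sim` is any symmetric similarity function (for instance the one of Definition 3.2), and the random choice of line 8 is made optional so that runs can be reproduced; `decompose` and its helpers are hypothetical names. Choosing, among the candidates whose queries are all similar to those of Qc, the one with the highest average similarity is equivalent to conditions (A) and (B) of line 10.

```python
import random
from itertools import product


def fully_similar(a, b, sim, acc_sim):
    """True if every query of a is similar to every query of b."""
    return all(sim(x, y) >= acc_sim for x, y in product(a, b))


def average_similarity(a, b, sim):
    """Mean pairwise similarity between the queries of a and those of b."""
    return sum(sim(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def decompose(queries, sim, acc_sim, rng=None):
    """Group queries following Algorithm 3-2; returns a list of frozensets."""
    groups = [frozenset([q]) for q in queries]  # lines 1-4: one singleton per query
    while True:  # line 5: Repeat
        pool, new_groups, merged = list(groups), [], False
        while pool:  # line 7
            # Lines 8-9: pick Qc at random if an rng is given, else the first group.
            index = rng.randrange(len(pool)) if rng is not None else 0
            qc = pool.pop(index)
            # Line 10: candidate groups whose queries are all similar to those of Qc.
            candidates = [g for g in pool if fully_similar(qc, g, sim, acc_sim)]
            if candidates:
                qk = max(candidates, key=lambda g: average_similarity(qc, g, sim))
                pool.remove(qk)  # line 11
                new_groups.append(qc | qk)  # line 12
                merged = True
            else:
                new_groups.append(qc)  # line 14
        groups = new_groups  # line 17
        if not merged:  # line 19: no two groups are fully similar any more
            return groups


# Usage with a small symmetric similarity table (values are illustrative).
table = {frozenset("ab"): 0.9, frozenset("ac"): 0.6, frozenset("bc"): 0.7}


def sim(x, y):
    return 1.0 if x == y else table.get(frozenset((x, y)), 0.1)


print(sorted(sorted(g) for g in decompose("abcd", sim, acc_sim=0.5)))  # [['a', 'b', 'c'], ['d']]
print(sorted(sorted(g) for g in decompose("abcd", sim, 0.5, rng=random.Random(7))))
```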
Table 3-4: Similarity values calculated with the formula presented in Definition 3.2 (similarity threshold: 0.5)
    q1   q2   q3   q4   q5   q6   q7   q8   q9
q1  1    0.9  0.8  0.9  0.8  0.2  0.6  0.6  0.3
q2  0.9  1    0.6  0.8  0.6  0.3  0.5  0.3  0
q3  0.8  0.6  1    0.7  0.8  0.5  0.4  0.6  0.3
q4  0.9  0.8  0.7  1    0.7  0.2  0.6  0.53 0.2
q5  0.75 0.6  0.8  0.7  1    0.2  0.4  0.6  0.3
q6  0.2  0.3  0.5  0.2  0.2  1    0.6  0.3  0.2
q7  0.6  0.5  0.4  0.6  0.4  0.6  1    0.5  0.4
q8  0.6  0.3  0.6  0.5  0.6  0.3  0.5  1    0.6
q9  0.2  0    0.2  0.2  0.3  0.2  0.4  0.7  1
Algorithm 3-2 starts the decomposition process by forming sets of queries such that each
set contains one query (lines 1-4), and places them in G. In our example, G = {{q1},
{q2}, {q3}, {q4}, {q5}, {q6}, {q7}, {q8}, {q9}}.
As described in lines 10 to 12, the algorithm merges two sets: Qc is merged with Qk
∈ G if and only if (i) any two elements of the two sets are similar and (ii) if another
set Qs in G satisfies condition (i), Qc is at least as similar to Qk as to Qs. Groups of
sets are merged repeatedly until G contains no two sets whose elements are all pairwise similar.
According to the example execution flow of the algorithm in Table 3-5, the queries in
our example are decomposed into G = {Q1, Q2, Q3}, where Q1 = {q1, q2, q3, q4, q5},
Q2 = {q6, q7} and Q3 = {q8, q9}.
Table 3-5: Example of execution flows during the decomposition of queries
Input Q={ q1, q2, q3, q4, q5, q6, q7, q8, q9}
Initialization G={{q1}, {q2}, {q3}, {q4}, {q5}, {q6}, {q7}, {q8}, {q9}}
The first iteration G={{q1, q4}, {q2, q3}, {q5}, {q6, q7}, {q8, q9}}
The second iteration G={{q1, q4, q2, q3}, {q5}, {q6 ,q7}, {q8 ,q9}}
The third iteration G={{q1, q4, q2, q3, q5}, {q6, q7}, {q8, q9}}
Output G={{q1, q4, q2, q3, q5}, {q6, q7}, {q8, q9}}
Step 2: Identifying information demand
In this step, the notation Occurrence(Qc, k) represents the number of queries in
Qc containing the keyword k, and the notation Keys(Qc) represents the union of
the queries in Qc, i.e., $\mathrm{Keys}(Q_c) = \bigcup_{q \in Q_c} q$.
Let G = {Q1, Q2, …, Qn} be the result of Step 1. Information-Demand(p, c) = {I1, I2, …, In}
is computed from G such that each interest Ic has the following properties:
Weight(Ic) = |Qc| / |Q|
Description(Ic) is the set of the most popular keywords6 in the queries of Qc, i.e.:
- |Description(Ic)| = min(maxK, |Keys(Qc)|), where maxK is the maximum number of
keywords used to describe an interest;
- Description(Ic) ⊆ Keys(Qc);
- Occurrence(Qc, ki) ≥ Occurrence(Qc, kj) ∀ ki ∈ Description(Ic) and
∀ kj ∈ Keys(Qc) − Description(Ic)
Example: Assume that maxK is 3 and consider the queries in Table 3-3 and their
decomposition in Table 3-5; the information demand consists of the interests displayed in
Table 3-6.
6 If two keywords have the same popularity, one of them is selected randomly.
Table 3-6: Interests produced from the queries listed in Table 3-3
Interest Description Weight
I1 {tree, bush, grass} 0.6 (≅ 5/9)
I2 {sky, tree, bush} 0.2 (≅ 2/9)
I3 {tree, grass, sky} 0.2 (≅ 2/9)
Interests with identical descriptions can be formed, since we fix the maximum
number of keywords and this maximum might be much smaller than the number of
keywords in the queries. If maxK were 2, for our example, a set containing the interests
({tree, bush}, 0.6), ({tree, bush}, 0.2) and ({grass, sky}, 0.2) could have been produced
(depending on how ties between equally popular keywords are broken).
According to Definition 3.5, the descriptions of two interests should not be identical. Thus,
interests with the same description should be combined as follows. If there
are interests I1, …, Ik in Information-Demand(p, c) having identical descriptions, they are
replaced by an interest I with the following properties:
• Description(I) = Description(I1)
• $\mathrm{Weight}(I) = \sum_{i=1}^{k} \mathrm{Weight}(I_i)$
An interest I is removed from Information-demand(p, c) if Weight(I) < minW, the
minimum threshold introduced in Definition 3.5. Assuming minW is 0.5, the second and the
third interests listed in Table 3-6 are removed.
Let sumNI be the sum of the weights of the remaining interests in Information-demand(p, c).
If 1 - sumNI ≥ minW, an empty interest Ie with weight 1 - sumNI is added. If
1 - sumNI < minW but sumNI < 1, the weight of each interest I is normalized as:
$\text{Weight}(I) \leftarrow \dfrac{\text{Weight}(I)}{\text{sumNI}}$
In our example, the information demand contains one interest with weight 0.6, so sumNI
is also 0.6. Since 1 - 0.6 = 0.4 < minW, no empty interest is added, and the normalization
(0.6/0.6) gives this interest a weight of 1.
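The merging, filtering and normalization steps can be sketched as a single Python function. This is an illustration only: an interest is assumed to be a (description, weight) pair with the description stored as a frozenset, the empty interest Ie is assumed to be encoded as an empty frozenset, and `finalize_demand` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Interest = Tuple[FrozenSet[str], float]  # (description, weight)
EMPTY_INTEREST: FrozenSet[str] = frozenset()  # assumed encoding of Ie


def finalize_demand(interests: List[Interest], min_w: float) -> List[Interest]:
    # 1. Merge interests that share a description by summing their weights.
    merged: Dict[FrozenSet[str], float] = defaultdict(float)
    for description, weight in interests:
        merged[description] += weight
    # 2. Remove interests whose weight is below minW.
    kept = [(d, w) for d, w in merged.items() if w >= min_w]
    sum_ni = sum(w for _, w in kept)
    if 1 - sum_ni >= min_w:
        # 3a. The leftover weight is large enough: add the empty interest Ie.
        kept.append((EMPTY_INTEREST, 1 - sum_ni))
    elif 0 < sum_ni < 1:
        # 3b. Otherwise rescale the remaining weights so that they sum to 1.
        kept = [(d, w / sum_ni) for d, w in kept]
    return kept


table_3_6 = [(frozenset({"tree", "bush", "grass"}), 0.6),
             (frozenset({"sky", "trees", "bush"}), 0.2),
             (frozenset({"tree", "grass", "sky"}), 0.2)]
print(finalize_demand(table_3_6, min_w=0.5))
```

Applied to the interests of Table 3-6 with minW = 0.5, the function keeps only {tree, bush, grass} and rescales its weight from 0.6 to 1.0, matching the worked example.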
In summary, this section discussed how the information demands of a peer are identified
from its historical queries. The information provisions of a peer can be determined from its
historical advertisements in a similar manner.
3.4.2 Interest Estimation via Association Rules
Identifying users' interests by analyzing historical data forces a data requester to wait
until enough queries have been produced. However, we argue that users often show similar
interests in similar contexts. We therefore propose to use association rules, which correlate
contexts with interests, to address this problem.
An association rule has two parts: an antecedent and a consequent. The antecedent of a
rule describes the condition that must be satisfied for the consequent of the rule to hold.
Two important values are attached to an association rule: support and confidence. The
support of a rule is the probability that a randomly selected element has both the property
described by its antecedent and the property described by its consequent. The confidence
of a rule is the probability that an element has the property specified by the consequent of
the rule, given that the element matches the antecedent.
Association rules can be produced from historical data. A produced rule is rejected if its
support is below a predefined constant called the minimum support, or if its confidence is
below a predefined constant called the minimum confidence.
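Support, confidence and the rejection test can be illustrated with a short Python sketch. It assumes that historical records are dictionaries and that the antecedent and consequent are predicates over a record; the record fields, the `history` data and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def support_confidence(records: List[Record], antecedent: Predicate,
                       consequent: Predicate) -> Tuple[float, float]:
    if not records:
        return 0.0, 0.0
    matching_antecedent = [r for r in records if antecedent(r)]
    matching_both = [r for r in matching_antecedent if consequent(r)]
    # support: fraction of all records satisfying antecedent AND consequent
    support = len(matching_both) / len(records)
    # confidence: the same count, relative to records matching the antecedent
    confidence = (len(matching_both) / len(matching_antecedent)
                  if matching_antecedent else 0.0)
    return support, confidence


def keep_rule(records: List[Record], antecedent: Predicate,
              consequent: Predicate, min_support: float,
              min_confidence: float) -> bool:
    """A rule is kept only if both thresholds are reached."""
    support, confidence = support_confidence(records, antecedent, consequent)
    return support >= min_support and confidence >= min_confidence


def in_bus_27(record: Record) -> bool:
    return record["location"] == "Bus 27"


def wants_d1(record: Record) -> bool:
    return record["demand"] == "D1"


history = [{"location": "Bus 27", "demand": "D1"},
           {"location": "Bus 27", "demand": "D1"},
           {"location": "Bus 27", "demand": "D4"},
           {"location": "Bus 37", "demand": "D2"}]
print(support_confidence(history, in_bus_27, wants_d1))  # (0.5, 0.6666666666666666)
```

Two of the four records satisfy both sides (support 0.5), and two of the three Bus 27 records carry D1 (confidence 2/3).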
An association rule with respect to an information demand is written as:
<Context=c> ⇒ <Information-Demand=D>
As discussed in Definition 3.5, a context is represented as (L, [Ts, Tf]), where Ts is the
start time, Tf is the end time and L is a location. For a sharing context c = (L, [Ts, Tf]), we
refer to Ts as Start-Time(c) and to Tf as End-Time(c). The antecedent <context=c> is
equivalent to:
• <Start-Time=Ts> and <End-Time=Tf> and <Location=L> when both time and
location are specified.
• <Location=L> when the time interval is ∅ but the location is specified
• <Start-Time=Ts> and <End-Time=Tf> when the location is not specified but
the time is specified
Temporal conditions of the form <Start-Time=Ts> can be mined by observing the patterns of
the start times of contexts, with the help of the method proposed by Sheng Ma et al. [89].
Spatial conditions (which we call location antecedents) of the form <Location=value> can
be produced by mining the location contexts with the help of the Apriori algorithm [90, 91].
Suppose a temporal condition <Start-Time=Ts> has already been identified, and let
C = {c1, c2, c3, …, cn} be the contexts that match it. A time antecedent of the form
<Start-Time=Ts> and <End-Time=Tf> can then be obtained by choosing Tf such that:
• End-Time(c) ≤ Tf, ∀c ∈ C, and
• ∃ c ∈ C such that End-Time(c) = Tf.
In other words, Tf is the latest end time among the contexts in C.
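The two conditions on Tf amount to taking the maximum end time of the matching contexts, which the following Python sketch makes explicit. The tuple encoding of a context and the function name `time_antecedent` are assumptions made for illustration, and the `history` data is hypothetical.

```python
from datetime import time
from typing import List, Optional, Tuple

Context = Tuple[str, time, time]  # assumed encoding: (location, start, end)


def time_antecedent(contexts: List[Context],
                    ts: time) -> Optional[Tuple[time, time]]:
    """Return (Ts, Tf) for <Start-Time=Ts> and <End-Time=Tf>, or None."""
    # C: the contexts that match the temporal condition <Start-Time=Ts>
    end_times = [end for _, start, end in contexts if start == ts]
    if not end_times:
        return None
    # Tf bounds every End-Time in C and is reached by at least one context,
    # so it is simply the latest end time.
    return ts, max(end_times)


history = [("Bus 27", time(8, 0), time(8, 7)),
           ("Bus 37", time(8, 0), time(8, 5)),
           ("INSA", time(9, 0), time(12, 0))]
print(time_antecedent(history, time(8, 0)))
# (datetime.time(8, 0), datetime.time(8, 7))
```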
Let ant be an antecedent representing a context, and let Demands(ant,p) be the set of
information demands of the peer p in the context described by ant (Demands(ant,p) can be
obtained from the history). Let min-conf be the minimum confidence, and let SD ⊆
Demands(ant,p) be a set satisfying the following condition:
Di ≈ Dj, ∀Di, Dj ∈ SD.
If |SD| / |Demands(ant,p)| > min-conf, we derive the following rule:
ant ⇒ <Information-Demand=Ds>
where Ds is an information demand computed as:
$D_s = \bigoplus_{D \in SD} D$
Illustration: In the scenario discussed in section 3.1, Pascal exchanges information with
people working in a bank in Bus 27 and with friends in Bus 37. Assume that the
antecedent <context=("", [8AM-8:10AM])> has already been identified and that the
information demands of Pascal are those listed in Table 3-7.
Table 3-7: Historical data of Pascal
Historical data   Start-Time   End-Time   Location context   Information-Demand
s1                8AM          8:07AM     Bus 27             D1
s2                8AM          8:06AM     Bus 27             D1
s3                8AM          8:05AM     Bus 37             D2
s4                8AM          8:07AM     Bus 27             D1
s5                8AM          8:05AM     Bus 37             D3
s6                8AM          8:07AM     Bus 37             D4
D1 = {({finance}, 0.70), ({news}, 0.30)}, D2 = {({joke}, 1)}
D3 = {({photo}, 1)}, D4 = {({news}, 1)}
D1 is the information demand of Pascal in the historical data s1, s2 and s4. Therefore, the
following rule⁷ is produced to indicate the information demand of Pascal in the context
("", [8AM-8:10AM]):
<context = ("", [8AM-8:10AM])> ⇒ <Information-Demand = D1>
Association rules can be refined to increase their confidence. For instance, the above rule
holds in three of the six historical records (a confidence of 0.5); to make its confidence
equal to 1, we can simply restrict the antecedent to <context = (Bus 27, [8AM-8:10AM])>.
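The confidence values of the illustration can be checked with the following Python sketch over the data of Table 3-7. It assumes that similarity ≈ reduces to equality of demands, so that SD is the largest group of identical demands and aggregating it with ⊕ simply returns that demand; `best_rule` is a hypothetical helper, not the thesis's mining algorithm.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Table 3-7 as (history id, location context, information demand); every
# record starts at 8AM and ends before 8:10AM, so all match ("", [8AM-8:10AM]).
TABLE_3_7 = [("s1", "Bus 27", "D1"), ("s2", "Bus 27", "D1"),
             ("s3", "Bus 37", "D2"), ("s4", "Bus 27", "D1"),
             ("s5", "Bus 37", "D3"), ("s6", "Bus 37", "D4")]


def best_rule(history: List[Tuple[str, str, str]],
              location: Optional[str] = None) -> Tuple[str, float]:
    """Most frequent demand under the antecedent and the rule's confidence."""
    # Demands(ant, p): demands observed in the records matching the antecedent
    demands = [d for _, loc, d in history if location is None or loc == location]
    if not demands:
        raise ValueError("no historical record matches the antecedent")
    # Assumption: Di ≈ Dj only when Di == Dj, so SD is the largest group of
    # identical demands and its aggregation with ⊕ is that demand itself.
    demand, count = Counter(demands).most_common(1)[0]
    return demand, count / len(demands)


print(best_rule(TABLE_3_7))            # ('D1', 0.5)
print(best_rule(TABLE_3_7, "Bus 27"))  # ('D1', 1.0)
```

Adding the location Bus 27 to the antecedent removes the three Bus 37 records, which is why the confidence rises from 0.5 to 1.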
7 To produce a rule, we should compare the confidence and the support of a rule with predefined constants. For the sake of simplicity, we skip this step in the illustration.
In this section, we discussed the mining of association rules with respect to users'
information demands. Association rules for the information provisions of users can be
mined in the same manner.
3.5 Social Networking
A requester peer can use association rules to identify his/her information demands, and a
data-source peer can produce association rules to identify the information demands of
requester peers. However, rule mining is too expensive to be performed for every requester
peer encountered in a MANET. Therefore, a data source should select the important peers
for which association rules are produced. We propose that the social links of a data source
be used to identify these important peers.
As a data source can have many social links, the reasoning required to identify the
interests of requester peers could be expensive. We propose to organize the peers that
habitually share information with a data-source peer into social groups according to the
similarity of their interests. These social groups are then used to identify the interests of
the peers.
Social networks, which include social groups and social links, are computed semi-
automatically based on the following assumption: "a social link exists between users who
frequently collaborate with each other". In the scenario discussed in section 3.1, Pascal,
David and Carol exchange jokes whenever they meet in Bus 37. These frequent
collaborations indicate that there are social links between them.
3.5.1 Social Link
A social link, denoted as L(pi,pj), is a relationship between two peers pi and pj. The
notation Valid-context(L(pi,pj)) represents the set of sharing contexts in which the social
link L(pi,pj) is valid. In our scenario, Pascal exchanges information with his colleagues;
assume that he communicates with them only at INSA. Thus, he has social links to his
colleagues, and (INSA, ∅) is the valid context of these social links.
As on social networking sites (e.g., Facebook), a user can manually specify his/her social
links. In the scenario, Pascal, David and Carol are friends. Assume that Pascal specifies
David and Carol as his friends; in this case, there will be a social link between Pascal and
David as well as between Pascal and Carol.
A social link can also be computed semi-automatically by analyzing the degree of users'
collaboration in MANETs. We argue that a high degree of collaboration between p1 and
p2 indicates the existence of a social link between these two peers. As discussed in section
3.3, the Tie-strength threshold min-tie characterizes a high degree of collaboration between
peers. As a result, a social link L(p1, p2) is formed if and only if Tie-strength(p1, p2) ≥
min-tie, and the context ("", ∅) is placed in Valid-context(L(p1, p2)). Users can also
specify the valid contexts of L(p1, p2) explicitly.
In the scenario discussed in section 3.1, Pascal, Anne, Bob and Eve exchange
information in Bus 37. Suppose they have not declared that they are friends. Their tie
strengths, measured in files/day, are listed in Table 3-8. Let min-tie be 0.1 files/day; we can
conclude that there is a social link between Pascal and Eve, and that Anne has social links
with all three other persons.
Table 3-8: Tie-strengths (files/day) between Pascal, Anne, Bob and Eve
          Pascal   Anne   Bob     Eve
Pascal    -        0.5    0.001   1
Anne      0.5      -      0.3     0.6
Bob       0.001    0.3    -       0.04
Eve       1        0.6    0.04    -
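The threshold rule for semi-automatic social links can be sketched in Python over the values of Table 3-8. The dictionary encoding of the symmetric matrix and the function name `social_links` are illustrative assumptions; valid contexts are omitted for brevity.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Table 3-8: tie strengths in files/day (the matrix is symmetric, so each
# unordered pair is stored once).
TIE_STRENGTH: Dict[Tuple[str, str], float] = {
    ("Pascal", "Anne"): 0.5, ("Pascal", "Bob"): 0.001, ("Pascal", "Eve"): 1.0,
    ("Anne", "Bob"): 0.3, ("Anne", "Eve"): 0.6, ("Bob", "Eve"): 0.04,
}


def social_links(tie_strength: Dict[Tuple[str, str], float],
                 min_tie: float) -> Set[FrozenSet[str]]:
    """A link L(p1, p2) exists iff Tie-strength(p1, p2) >= min-tie."""
    return {frozenset(pair) for pair, s in tie_strength.items() if s >= min_tie}


for link in sorted(social_links(TIE_STRENGTH, 0.1), key=sorted):
    print(sorted(link))
```

With min-tie = 0.1 it prints the four links Anne-Bob, Anne-Eve, Anne-Pascal and Eve-Pascal, which matches the conclusion drawn from the table.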
3.5.2 Social Grouping
A social group, denoted as G(P,C,p), is a set of peers P having similar links with a
peer p in the contexts c ∈ C; the peer p is called the "observer peer" and C is the set of valid
contexts of the group. Demand-In-Group(G(P,C,p),c) denotes the common information
demand of the peers in P as observed by p in a context c. The social group G(P,C,p)
satisfies the following properties:
• Information-Demand(pi,p,c) ≈ Information-Demand(pj,p,c) for all pi, pj ∈ P and
c ∈ C; and
• Demand-In-Group(G(P,C,p),c) is an empty-sharing interest if c ∉ C.
The common information demand of the peers in the social group G(P,C,p) in a context
c ∈ C is obtained by aggregating the information demands of the peers in P, i.e.,
$\text{Demand-In-Group}(G(P,C,p), c) = \bigoplus_{p_i \in P} \text{Information-Demand}(L(p_i, p), c)$
We propose to compute social groups as follows. Let P be the set of peers having social
links with a data source pp, let PG ⊆ P be a set of peers and let C be a set of contexts.
The social group G(PG,C,pp) is formed if and only if:
• $C = \bigcap_{p_i \in P_G} \text{Valid-context}(L(p_i, p_p))$, where L(pi, pp) is the social
link between the peers pi and pp; and
• Information-Demand(pj,pp,c) ≈ Information-Demand(pi,pp,c), ∀c ∈ C and ∀pi, pj
∈ PG.
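The two formation conditions can be sketched as a Python function. This is a minimal illustration, not the thesis's implementation: contexts are assumed to be hashable tuples, information demands are reduced to opaque values compared by a caller-supplied `similar` predicate, a group with an empty set of valid contexts is assumed to be useless and is rejected, and all names (`form_group`, `demand_of`) are hypothetical.

```python
from itertools import combinations
from typing import (Callable, Dict, FrozenSet, Hashable, Iterable, Optional,
                    Set, Tuple)

Context = Tuple[str, str, str]  # assumed encoding: (location, start, end)


def form_group(members: Iterable[str],
               valid_contexts: Dict[str, Set[Context]],
               demand_of: Callable[[str, Context], Hashable],
               similar: Callable[[Hashable, Hashable], bool]
               ) -> Optional[Tuple[FrozenSet[str], FrozenSet[Context]]]:
    """Return (PG, C) if G(PG, C, pp) can be formed, otherwise None."""
    members = list(members)
    if not members:
        return None
    # C is the intersection of Valid-context(L(pi, pp)) over all pi in PG.
    common = set.intersection(*(set(valid_contexts[p]) for p in members))
    if not common:  # assumption: a group without any valid context is useless
        return None
    # Every pair of members must have similar demands in every c in C.
    for c in common:
        demands = [demand_of(p, c) for p in members]
        if not all(similar(a, b) for a, b in combinations(demands, 2)):
            return None
    return frozenset(members), frozenset(common)


bus = ("Bus 37", "8:00", "8:05")
valid = {"Carol": {bus}, "David": {bus, ("INSA", "", "")}}
demand = {("Carol", bus): "jokes", ("David", bus): "jokes"}
print(form_group(["Carol", "David"], valid,
                 lambda p, c: demand[(p, c)], lambda a, b: a == b))
```

In this usage example, which mirrors the Carol and David scenario, the shared valid context is the Bus 37 morning ride and both demands are "jokes", so the group ({Carol, David}, {Bus 37 context}) is returned.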
In the scenario introduced in section 3.1, Carol and David are interested in receiving
jokes in a bus. According to section 3.5.1, there are social links L(Pascal, Carol) and
L(Pascal, David). Therefore, the following group can be formed:
G({Carol, David}, {(Bus 37, [8AM, 8:05AM])}, Pascal)
We propose to determine users' information demands by analyzing their social groups.
For a peer p and a context c, a data source pp computes the information demand of p by
analyzing the social groups of p in the context c. GPrt(p,c), the set of groups of p in the
context c, is computed by pp as follows:
$\text{GPrt}(p,c) = \{\, G(P,C,p_p) \mid p \in P \text{ and } \exists\, c_i \in C \text{ such that } c_i \approx c \,\}$
A context ci is similar to c if and only if both the time context and the location context of
ci are similar to the respective contexts of c. The time context of ci is similar to that of c
if (1) the time context is not specified in c, or (2) the time context of ci is included in the
time context of c. The location context of ci is similar to that of c if (1) the location context
is not specified in c, or (2) the location context of ci is the same as, or an instance of, the
location context of c.
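The similarity test ci ≈ c can be sketched in Python as follows. The location hierarchy `LOCATION_PARENT` (which encodes "is an instance of" as a child-to-parent map) is hypothetical, an unspecified location is assumed to be `None` or the empty string, an unspecified time is assumed to be `None`, and an unspecified interval in ci is assumed not to be included in a specified interval of c.

```python
from datetime import time
from typing import Dict, Optional, Tuple

Interval = Tuple[time, time]
SharingContext = Tuple[Optional[str], Optional[Interval]]  # (location, [Ts, Tf])

# Hypothetical "instance of" hierarchy for locations: child -> parent.
LOCATION_PARENT: Dict[str, str] = {"Bus 27": "Bus", "Bus 37": "Bus"}


def location_similar(loc_i: Optional[str], loc: Optional[str],
                     parent: Dict[str, str] = LOCATION_PARENT) -> bool:
    if not loc:            # (1) location not specified in c
        return True
    while loc_i:           # (2) same location, or an instance of it
        if loc_i == loc:
            return True
        loc_i = parent.get(loc_i)
    return False


def time_similar(t_i: Optional[Interval], t: Optional[Interval]) -> bool:
    if t is None:          # (1) time not specified in c
        return True
    if t_i is None:        # assumption: an unspecified interval is not included
        return False
    return t[0] <= t_i[0] and t_i[1] <= t[1]  # (2) t_i is included in t


def context_similar(ci: SharingContext, c: SharingContext) -> bool:
    """ci ≈ c iff both its location and time contexts are similar to c's."""
    return location_similar(ci[0], c[0]) and time_similar(ci[1], c[1])


ci = ("Bus 37", (time(8, 0), time(8, 5)))
print(context_similar(ci, ("Bus", (time(8, 0), time(8, 10)))))  # True
print(context_similar(ci, ("INSA", None)))                      # False
```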
If GPrt(p, c) is not empty, pp can compute the information demand of the data requester
p in the context c as:
$\text{Information-Demand}(p, p_p, c) = \bigoplus_{G \in \text{GPrt}(p,c)} \text{Demand-In-Group}(G, c)$
In this section, we discussed the computation and the usage of social groups with respect
to information demands. Social groups with respect to information provisions are
computed and used in the same manner.
3.6 Discussion
Publish/subscribe systems such as the one proposed in [92] can be considered interest-
aware systems. A publish/subscribe system consists of publishers, subscribers and a
delivery infrastructure (a sequence of brokers). In such a system, publishers produce
events, subscribers declare their interest in events, and the delivery infrastructure
forwards the events from a publisher to the corresponding subscribers. However, in
MANETs, building and modifying delivery infrastructures is expensive, since most of the
computing devices involved are mobile, battery-powered and resource-constrained.
Moreover, publish/subscribe systems are not necessarily interest-aware: their interest
awareness depends on the kind of events that publishers produce.
Lindemann and Waldhorst [25] proposed replicating the indices of popular queries on
several mobile devices in order to facilitate the localization of files for those queries.
Although popular queries represent common interests of users, identifying them is time-
consuming and difficult in MANETs, where users may not know the other participants in
advance.
Unlike in MANETs, the learning of users' interests has been extensively studied in Web
search and mining [93, 94]. Click history [95] and browsing history [96] have been
proposed to capture users' interests automatically.
Recommender systems such as Letizia [97] and Watson [98] make suggestions to users
based on inferences about their interests drawn from recently viewed Web pages or from
the contents of active desktop applications.
StumbleUpon (stumbleupon.com) is a recommender system that uses collaborative
filtering (CF), which is an automated process combining human opinions with machine
learning of personal preferences, to create virtual communities of like-minded Web surfers.
Rating Web sites updates a personal profile (a blog-style record of rated sites) and
generates peer networks of Web surfers linked by a common interest. These social
networks coordinate the distribution of Web content, so that users “stumble upon” pages
explicitly recommended by friends and peers. However, recommendations from CF
systems typically require explicit action from a large community of users [99].
Recommendation engines of e-commerce sites (e.g., Amazon and Netflix) work by
exploiting users' interests, which are derived both from their explicit actions (e.g.,
buying a product) and from their interaction logs (e.g., clicking on certain categories of
products). In Web search, users' interests derived from interaction logs can be used to
build automated evaluation facilities for Web search engines [100].
To design our approach for identifying users' interests in a MANET, we were inspired by
approaches that identify users' interests from click history, browsing history and
interaction logs. Our interest identification approach differs from these approaches in that
(1) we identify interests according to the context of users, and (2) we use the social
networks of users and association rules to facilitate the interest identification process.
3.7 Conclusion
In this chapter, we have proposed a comprehensive solution for discovering files according
to the interests of users. We have introduced two types of interests: information demand
and information provision. The information demand is the set of interests of a user in
receiving information, and the information provision is the set of interests of a user in
providing information. Users' information demands are used to determine the files to be
advertised and to guide the dissemination of advertisements, while information provisions
are used to determine where and how queries are resolved. In this chapter, we have also
proposed approaches to (1) identify users' interests by analyzing their information sharing
activities and (2) produce association rules that correlate users' interests with their
contexts. Finally, we have investigated how users' social networks can be used to
facilitate the interest identification process.
Chapter 4 Lifetime Awareness
As discussed in the previous chapter, in a Mobile Ad-hoc NETwork (MANET),
information discovery is usually performed by distributing advertisements and queries. To
avoid loading the environment with unnecessary traffic, an advertisement policy should be
designed according to users' information consumption and provision. Information
consumption and provision are limited by the users' context and stay-time (i.e., the time
that they stay together). Consequently, these parameters are of primary importance for the
design of an efficient advertisement policy.
In this chapter, we define a concept called mobility class to parameterize advertisements
according to the mobility profile of a data source. A peer can specify mobility classes
manually; we also propose an approach to generate mobility classes semi-automatically
according to the peers' sharing habits. Finally, we discuss the identification of mobility
classes using both the users' mobility patterns and habit rules.
The research work of this chapter was presented at the 11th IEEE International
Conference on Mobile Data Management (MDM 2010) [101].
The chapter is organized as follows. Section 4.1 discusses the motivation behind the
parameterization of advertisements according to users' profiles and contexts. Section 4.2
formalizes the concept of mobility class, and Section 4.3 deals with the semi-automatic
generation of mobility classes. Section 4.4 discusses the identification of mobility classes.
Finally, we conclude the chapter in Section 4.5.
4.1 Overview
In MANETs, file discovery is usually conducted using two approaches: pull and push. In
the former, requester peers discover files by querying peers in the vicinity. In the latter,
data sources make others aware of their sharable files by distributing advertisements. As in
most discovery protocols, information discovery in MANETs should be performed using
a hybrid of the two approaches.
An advertisement policy dictates the volume of information to be advertised, the period
after which another advertisement can be made and the number of hops that an
advertisement traverses. We argue that an appropriate advertisement policy must be
designed so that the use of the push approach is maximized without imposing
unsustainable overheads.
An advertisement policy should be designed with respect to the information consumption
and provision of users, which are limited by the users' stay-times and contexts. For
instance, in the scenario presented in chapter 1, Pascal shares information in a bus, a
stadium, a street, a shop and a restaurant. Assume that Pascal stays five to seven minutes in
a bus. As passengers do not have time to download all the advertised files, it is not
necessary to advertise all of Pascal's sharable files in such environments. Conversely,
Pascal's stay-time in a stadium is much longer than in a bus. Therefore, the volume of an
advertisement in a stadium should be greater than in a bus.
Furthermore, the shorter the stay-time, the higher the dynamicity of the network. A
MANET formed in a street is highly dynamic: every minute, a number of cars join and
leave it, so the stay-time of a peer is short. To make new peers aware of the sharable files,
advertisements should be made more frequently in such a MANET than in other kinds of
MANETs.
The peers' stay-times also indicate whether the dissemination of advertisements⁸ is
feasible with respect to the distance between peers. Assume that a data-source peer and a
target peer are at the two extreme ends of a big shop hall and that the data source sends an
advertisement about a file to the target peer. First, the target might leave the shop before
receiving the advertisement. Second, even if the target receives the advertisement, the data
source may leave the shop while the download is in progress, or even before receiving the
download request.
In this chapter, we introduce a concept called "mobility class" to describe categories of
MANETs with respect to users' stay-times and contexts. The same advertisement policy is
applied in all MANETs of the same mobility class. To take into account the evolution of
users' information needs and of their capacity for information provision, mobility classes
are continuously revised and updated.
The contributions of this chapter are (1) the formalization of the concept of mobility
class, (2) the semi-automatic generation of mobility classes and (3) the identification of
mobility classes.
4.2 Formalization
A MANET is defined as a collection of devices interconnected by wireless
communication technologies. At the application level, a MANET can be seen as a
collection of users (peers). Peers can have a direct connection or they can be connected
through other peers. They are called one-hop neighbors if they have a direct connection
and are multi-hop neighbors if they are connected with the help of other peers.
Every peer has a local view of a MANET. Assume that every passenger of a bus has a
computing device. There is always a MANET in the bus during its normal operation.
However, the MANET view of a particular user is bounded by the times at which he/she
enters and leaves the bus. In addition to the time and location contexts, a MANET view is
bounded by a peer's awareness of the others in the environment. A peer is aware of his/her
one-hop neighbors. He/she becomes aware of his/her multi-hop neighbors through the
communications that he/she has with his/her neighbors.
8 Remember that we implement the dissemination of advertisements with respect to the interests of users (cf. chapter 3).
Definition 4.1. MANET and MANET View: A MANET, denoted as V(P), is a set of
peers communicating via wireless communication technologies. A MANET view⁹,
denoted as V(P,p0), is a projection of a MANET defined by the local knowledge of a
peer p0. Thus, V(P,p0) is the collection of peers in a MANET V(P) that the peer p0 is
aware of, i.e.,
V(P,p0) = {p | p ∈ P such that p0 is aware of the existence of p}.
Definition 4.2. Connectivity-lifetime: For peers p1 and p2, let stay-time(p1, p2) denote
the time that p1 and p2 are expected to stay connected together. Connectivity-
lifetime(V(P,p0)) is defined as the average time that a peer p0 stays with the other peers in
a MANET view V(P,p0), i.e.,
$\text{Connectivity-lifetime}(V(P,p_0)) = \dfrac{\sum_{p \in V(P,p_0)} \text{stay-time}(p, p_0)}{|V(P,p_0)|}$
Definition 4.3. Adv-usage: Let us consider a MANET view V(P,p0) and a peer p0 ∈ P;
Adv-usage(p0) denotes the usage factor of the advertisements made by p0 in V(P,p0). It
describes the number of files provided by p0 relative to the number of files advertised by
the peer. It is computed as
Adv-usage(p0) = |DAF| / (|Adv-population| × |AF|), where
AF: collection of files that have been advertised by p0. AF may contain duplicated
elements; a file appears twice in AF if it has been advertised twice.
DAF: collection of files that have been advertised by p0 and downloaded from p0. By
definition, every file in DAF is also a member of AF. A file appears twice in DAF if it
has been downloaded twice.
Adv-population: peers in V(P,p0) that received the advertisements made by p0.
9 We use the terms view and MANET view interchangeably.
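Definitions 4.2 and 4.3 reduce to two small formulas, sketched below in Python. The function names, the representation of AF and DAF as lists (so that duplicates are counted) and the convention of returning 0.0 for an empty view or an empty AF are assumptions made for this illustration; the stay-times and file names are hypothetical.

```python
from typing import Dict, List


def connectivity_lifetime(stay_times: Dict[str, float]) -> float:
    """Definition 4.2: mean of stay-time(p, p0) over the peers p in V(P, p0)."""
    if not stay_times:
        return 0.0  # assumption: an empty view has zero connectivity lifetime
    return sum(stay_times.values()) / len(stay_times)


def adv_usage(af: List[str], daf: List[str], adv_population: int) -> float:
    """Definition 4.3: |DAF| / (|Adv-population| * |AF|)."""
    # AF and DAF are multisets (lists): a file advertised or downloaded twice
    # appears twice.
    if not af or adv_population == 0:
        return 0.0  # assumption: nothing advertised means zero usage
    return len(daf) / (adv_population * len(af))


print(connectivity_lifetime({"p1": 4.0, "p2": 6.0, "p3": 5.0}))  # 5.0
# p0 advertised f1 twice and f2 once to 2 peers; f1 was downloaded twice, f2 once.
print(adv_usage(["f1", "f1", "f2"], ["f1", "f1", "f2"], adv_population=2))  # 0.5
```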
Definition 4.4. Sharing statistics: Let us consider a MANET view V(P,p0) and a peer p0
∈ P; the sharing statistics, denoted as s(p0,c), describe the quantitative behavior of the
peers in V(P,p0) in the actual sharing context c, as observed by p0. The sharing statistics
s(p0,c) are composed of the following attributes:
Hop(s(p0,c)): average distance between the peer p0 and the other peers,
Files-provisioned(s(p0,c)): average number of files provided by p0 to a peer,
Queries-received(s(p0,c)): average number of queries received by p0,
Usage-factor(s(p0,c)): Adv-usage(p0) as described in Definition 4.3, and
Co-lifetime(s(p0,c)): Connectivity-lifetime(V(P,p0)) as described in Definition 4.2.
Definition 4.5. Mobility Class: A mobility class m(p,c) is a structure used by a data-
source peer p to describe a class of MANET views according to their connectivity lifetimes
in the abstract sharing context c, such that the same advertisement policy is applied in all
the MANET views described by the same mobility class. More precisely, a mobility class
m(p,c) is characterized by the abstract context c and a range of connectivity lifetimes,
referred to as range-lifetimes(m(p,c)). The MANET views described by the mobility class
m(p,c) satisfy the following properties:
• Their contexts are actual sharing contexts of the abstract context c.
• Their connectivity lifetimes lie in range-lifetimes(m(p,c)). If range-lifetimes(m(p,c))
is [tmn, tmx), the connectivity lifetime of a MANET view described by m(p,c) is
greater than or equal to tmn and less than tmx.
The advertisement policy attached to the mobility class, denoted as adv-policy(m(p,c)), is
the advertisement policy applied in the MANET views described by m(p,c). It is composed
of three attributes: Adv-volume(m(p,c)), the maximum volume of an advertisement;
Adv-radius(m(p,c)), the maximum number of hops that an advertisement traverses; and
Adv-period(m(p,c)), the time after which an advertisement should be repeated.
Two efficiency measures, Satisfactory-Factor(m(p,c)) and Overload-Factor(m(p,c)), are
attached to the mobility class. Satisfactory-Factor(m(p,c)) is the minimum advertisement
usage factor that a MANET view described by m(p,c) is targeted to achieve.
Overload-Factor(m(p,c)) is the targeted maximum number of queries received by p in a
MANET view described by m(p,c).
Example: Assume that there is a mobility class with the context, range-lifetimes and
advertisement policy displayed in Table 4-1. This mobility class states that a peer can
advertise at most two files to its one-hop neighbors every minute in a MANET view
formed in a bus from 8 AM to 10 AM and having a connectivity lifetime in the range
[0, 7) minutes.
Table 4-1: Example of a mobility class
Context                Range-lifetimes (minutes)   Adv-volume   Adv-radius   Adv-period (minutes)
(Bus, [8 AM, 10 AM])   [0, 7)                      2            1            1
An inactive mobility class describes a class of MANET views in which nothing is
advertised. Thus, for an inactive mobility class m(p,c), Adv-volume(m(p,c)) = 0, and
information discovery in the MANET views described by this mobility class is performed
using the pull approach only (i.e., query distribution).
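A mobility class and its membership test can be sketched with Python dataclasses. This is an illustrative data structure only: the abstract context is simplified to a string label, the relation "is an actual sharing context of" is passed in as a hypothetical predicate `is_actual_of` (plain equality in the usage example), and all class and field names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AdvPolicy:
    volume: float   # Adv-volume: maximum number of files per advertisement
    radius: int     # Adv-radius: maximum number of hops
    period: float   # Adv-period: minutes between two advertisements


@dataclass(frozen=True)
class MobilityClass:
    context: str    # abstract sharing context (simplified to a label)
    t_min: float    # range-lifetimes = [t_min, t_max), in minutes
    t_max: float
    policy: AdvPolicy

    def describes(self, view_context: str, lifetime: float,
                  is_actual_of: Callable[[str, str], bool]) -> bool:
        """True if a MANET view with this context and lifetime belongs here."""
        return (is_actual_of(view_context, self.context)
                and self.t_min <= lifetime < self.t_max)

    @property
    def inactive(self) -> bool:
        return self.policy.volume == 0  # pull only: nothing is advertised


def is_actual_of(actual: str, abstract: str) -> bool:
    return actual == abstract  # hypothetical stand-in for the context hierarchy


ctx = "(Bus, [8 AM, 10 AM])"
bus_morning = MobilityClass(ctx, 0, 7, AdvPolicy(2, 1, 1))  # Table 4-1
print(bus_morning.describes(ctx, 5, is_actual_of))  # True
print(bus_morning.describes(ctx, 7, is_actual_of))  # False: 7 is excluded
print(bus_morning.inactive)                         # False
```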
Definition 4.6. Sharing statistics of a mobility class: Let us consider a peer p and a
mobility class m(p,c); the sharing statistics of a mobility class m(p,c), denoted as S(m(p,c)),
is defined as the sharing statistics of p in the MANET-views that are described by the
mobility class m(p,c). More precisely, for a mobility-class m(p,c), S(m(p,c)) is the set of
sharing statistics s(p,ca) such that Co-lifetime(s(p,ca))∈ range-lifetimes(m(p,c)) and ca is an
actual sharing context of the abstract context c. S(m(p,c)) is described by the following
attributes.
Queries-received (S(m(p,c))) is the average number of queries received by the peer p
in the MANET-views described by m(p,c).
Usage-factor (S(m(p,c))) is the average usage factor of advertisements made by the
peer p in the MANET-views described by m(p,c).
4.3 Mobility Class Generation
4.3.1 Operation
Operation 4.1 Adv-characteristic-computation(m(p,c)): The operation is used to
compute the advertisement policy, the satisfaction factor and the overload-factor of a
mobility class m(p,c) by using the sharing statistics of m(p,c).
To avoid overloading the environment with advertisements, the advertisement volume is
set to at most the minimum number of queries seen in the history. The minimum number
of hops seen in the history is used as the maximum number of hops that an advertisement
traverses. For the period, the average of the connectivity lifetimes attached to the sharing
statistics is taken. More precisely, the advertisement policy of the mobility class m(p,c) is
computed as:
• Adv-volume(m(p,c)) = minimum(av-FP, min-QR),
• Adv-radius(m(p,c))= min-hops, and
• Adv-period(m(p,c)) = avg-cnt
where
• av-FP: average value of the “files-provisioned” attribute in S(m(p,c)).
• min-QR: minimum value of the “queries-received” attribute of the sharing
statistics in S(m(p,c)).
• min-hops: minimum hops in S(m(p,c)).
• avg-cnt: average connectivity lifetimes in S(m(p,c)).
Consider the sharing statistics in Table 4-2. If S(m(p,c)) contains s1, s2, s3 and s4, then
• Adv-volume(m(p,c)) = minimum((2+3+4+5)/4, 2) = 2,
• Adv-radius(m(p,c)) = 1, and
• Adv-period(m(p,c)) = (1+2+3.5+3)/4 ≈ 2.4.
In the same way, we obtain the mobility classes listed in Table 4-3.
Table 4-2: Examples of sharing statistics
No   Co-lifetime (minutes)   Hops   Files-provided   Queries-received   Usage factor
s1   1                       1      2                3                  0.6
s2   2                       1      3                2                  0.3
s3   3.5                     2      4                10                 0.2
s4   3                       2      5                12                 0.6
s5   4                       2      5                20                 0.8
s6   5                       2      10               15                 0.7
s7   7                       2      15               15                 0.7
s8   9                       2      20               25                 0.6
Table 4-3: Range-lifetimes and advertisement policies of mobility classes
Mobility class   Range-lifetimes (minutes)   Adv-volume   Adv-radius (hops)   Adv-period (minutes)
m1               [0, 4)                      2            1                   2.4
m2               [4, 6)                      7.5          2                   4.5
m3               [6, ∞)                      15           2                   8
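Operation 4.1 can be checked against Tables 4-2 and 4-3 with the following Python sketch. The `SharingStat` record, the helper names and the selection of S(m(p,c)) by Co-lifetime range alone (ignoring the context) are simplifying assumptions for this illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass(frozen=True)
class SharingStat:
    co_lifetime: float     # minutes
    hops: int
    files_provided: int
    queries_received: int
    usage_factor: float


TABLE_4_2 = [
    SharingStat(1, 1, 2, 3, 0.6), SharingStat(2, 1, 3, 2, 0.3),
    SharingStat(3.5, 2, 4, 10, 0.2), SharingStat(3, 2, 5, 12, 0.6),
    SharingStat(4, 2, 5, 20, 0.8), SharingStat(5, 2, 10, 15, 0.7),
    SharingStat(7, 2, 15, 15, 0.7), SharingStat(9, 2, 20, 25, 0.6),
]


def stats_of_class(stats: List[SharingStat], t_min: float,
                   t_max: float) -> List[SharingStat]:
    """S(m(p,c)): statistics whose Co-lifetime lies in [t_min, t_max)."""
    return [s for s in stats if t_min <= s.co_lifetime < t_max]


def adv_policy(stats: List[SharingStat]) -> Tuple[float, int, float]:
    """Operation 4.1: (Adv-volume, Adv-radius, Adv-period) from S(m(p,c))."""
    av_fp = mean(s.files_provided for s in stats)
    min_qr = min(s.queries_received for s in stats)
    min_hops = min(s.hops for s in stats)
    avg_cnt = mean(s.co_lifetime for s in stats)
    return min(av_fp, min_qr), min_hops, avg_cnt


for t_min, t_max in [(0, 4), (4, 6), (6, math.inf)]:
    print((t_min, t_max), adv_policy(stats_of_class(TABLE_4_2, t_min, t_max)))
```

The three printed policies reproduce the rows of Table 4-3; the exact period of m1 is 2.375 minutes, which the table rounds to 2.4.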
We propose to compute Satisfactory-Factor(m(p,c)) and Overload-Factor(m(p,c))
according to the following objectives:
1. At least half of the advertised files should be downloaded, and
2. The number of queries received in the mobility class should not exceed the average
number of queries seen in the history.
Thus, the satisfactory and overload factors of a mobility class m(p,c) are assigned as:
• Satisfactory-Factor(m(p,c)) = 0.5
• Overload-Factor(m(p,c)) = avg-QR, where avg-QR is the average value of the
attribute queries-received in S(m(p,c)).
The satisfaction-factor and the overload-factor of a mobility class can also be changed
manually.
Operation 4.2. Efficient(m(p,c)): This operation evaluates the efficiency of a mobility
class from its sharing statistics observed in the history. Thus, the efficiency of a mobility
class is computed if and only if S(m(p,c)) ≠ ∅. A mobility class is efficient, i.e.,
Efficient(m(p,c)) holds, if the following properties are true:
• The mobility class is not an inactive class, i.e., Adv-Volume(m(p,c)) > 0.
• The average number of queries received in the history does not exceed the targeted
number of queries for the mobility class, i.e., Queries-received(S(m(p,c))) ≤
Overload-Factor(m(p,c)).
• The average usage factor in the history is at least the targeted usage factor of the
mobility class, i.e., Usage-factor(S(m(p,c))) ≥ Satisfactory-Factor(m(p,c)).
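The efficiency test can be sketched as a Python predicate. It assumes each sharing statistic is reduced to a (queries-received, usage factor) pair, uses the default factors proposed above, and names the function `efficient` for illustration only. When Overload-Factor is left to its default (the average of the same statistics), the query condition always holds, so in practice it would be fixed when the class is created and compared with statistics collected later.

```python
from statistics import mean
from typing import List, Optional, Tuple

Stat = Tuple[int, float]  # (queries-received, usage factor) of one statistic


def efficient(stats: List[Stat], adv_volume: float,
              satisfactory: float = 0.5,
              overload: Optional[float] = None) -> bool:
    """Operation 4.2, with the default factors proposed in the text."""
    if not stats:
        raise ValueError("efficiency is only defined when S(m(p,c)) is not empty")
    avg_queries = mean(q for q, _ in stats)
    avg_usage = mean(u for _, u in stats)
    if overload is None:
        overload = avg_queries  # Overload-Factor = avg-QR
    return (adv_volume > 0                  # not an inactive class
            and avg_queries <= overload     # not overloaded
            and avg_usage >= satisfactory)  # advertisements are used enough


# Statistics of m1 and m2 taken from Table 4-2.
m1_stats = [(3, 0.6), (2, 0.3), (10, 0.2), (12, 0.6)]
m2_stats = [(20, 0.8), (15, 0.7)]
print(efficient(m1_stats, adv_volume=2))    # False: average usage is about 0.425
print(efficient(m2_stats, adv_volume=7.5))  # True
```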
Operation 4.3. Consecutive(mi(p,c), mj(p,c)): Two mobility classes are consecutive if
and only if the upper bound of the range-lifetimes of one of them is the lower bound of the
range-lifetimes of the other. Thus, Consecutive(mi(p,c), mj(p,c)) is true if one of the
following conditions is satisfied:
• $t_2^i = t_1^j$, or
• $t_2^j = t_1^i$,
where
• $[t_1^i, t_2^i)$ is range-lifetimes(mi(p,c)), and
• $[t_1^j, t_2^j)$ is range-lifetimes(mj(p,c)).
Operation 4.4. Included(mi(ps1,c), mj(ps2,c)): For mobility classes mi(ps1,c) and
mj(ps2,c) in a context c, let the range-lifetimes of mi(ps1,c) and mj(ps2,c) be
$[t_1^i, t_2^i)$ and $[t_1^j, t_2^j)$ respectively. The mobility class mi(ps1,c) is said to be
included in the mobility class mj(ps2,c) if and only if
• $t_1^j \le t_1^i$ and
• $t_2^i \le t_2^j$
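The two range tests can be sketched in Python, with a range-lifetimes [t1, t2) represented as a tuple (an assumption for illustration). Inclusion is read here as containment of [t1i, t2i) within [t1j, t2j); the function names are hypothetical.

```python
import math
from typing import Tuple

Range = Tuple[float, float]  # range-lifetimes [t1, t2)


def consecutive(ri: Range, rj: Range) -> bool:
    """Operation 4.3: one range starts exactly where the other one ends."""
    return ri[1] == rj[0] or rj[1] == ri[0]


def included(ri: Range, rj: Range) -> bool:
    """Operation 4.4, read as containment of [t1i, t2i) in [t1j, t2j)."""
    return rj[0] <= ri[0] and ri[1] <= rj[1]


m1, m2, m3 = (0, 4), (4, 6), (6, math.inf)  # Table 4-3
print(consecutive(m1, m2), consecutive(m1, m3))    # True False
print(included(m2, (0, 6)), included((0, 6), m2))  # True False
```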
Operation 4.5. Merge(mi(p,c), mj(p,c)): The merge operation can only be applied to two
consecutive mobility classes mi(p,c) and mj(p,c). Let the range-lifetimes of the mobility
class mi(p,c) be $[t_1^i, t_2^i)$ and the range-lifetimes of the mobility class mj(p,c) be
$[t_1^j, t_2^j)$. The operation Merge(mi(p,c), mj(p,c)) produces a new mobility class m(p,c)
that has all the properties of mj(p,c) except its range-lifetimes and sharing statistics. The
range-lifetimes of m(p,c) is calculated as:
$\text{Range-lifetimes}(m(p,c)) = \begin{cases} [t_1^i, t_2^j) & \text{if } t_2^i = t_1^j \\ [t_1^j, t_2^i) & \text{otherwise} \end{cases}$
The sharing statistics attached to the new mobility class are re-initialized, i.e., S(m(p,c)) = ∅.
The objective of applying the operation Merge is to enhance the efficiency of a non-
efficient mobility class by using the advertisement policy of an efficient mobility class.
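A minimal Python sketch of Operation 4.5, assuming a hypothetical immutable MobilityClass record whose advertisement policy is reduced to the three fields volume, radius and period. The merged class copies every field of mj, joins the two ranges according to the case analysis above and starts with an empty history.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MobilityClass:
    t1: float           # lower bound of range-lifetimes [t1, t2)
    t2: float           # upper bound of range-lifetimes
    adv_volume: float
    adv_radius: int
    adv_period: float
    stats: tuple = ()   # sharing statistics S(m(p,c))


def merge(mi: MobilityClass, mj: MobilityClass) -> MobilityClass:
    """Merge(mi, mj): properties of mj, joined range, re-initialized statistics."""
    if mi.t2 == mj.t1:        # mi ends where mj starts
        t1, t2 = mi.t1, mj.t2
    elif mj.t2 == mi.t1:      # mj ends where mi starts
        t1, t2 = mj.t1, mi.t2
    else:
        raise ValueError("Merge is only defined for consecutive mobility classes")
    return replace(mj, t1=t1, t2=t2, stats=())
```

Because the result takes the policy of the second argument, Merge(mi, mj) and Merge(mj, mi) cover the same range but generally carry different advertisement policies.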
Example: Consider the mobility classes in Table 4-3. Table 4-4 shows the result of merging the mobility class m2 with the mobility class m1 and vice versa.
Table 4-4: Merging of mobility classes
Merging    | Range-lifetimes | Adv-Volume | Adv-radius | Adv-period
m2 with m1 | [0, 6)          | 2          | 1          | 2.4
m1 with m2 | [0, 6)          | 7.5        | 2          | 4.5
Operation 4.6. Copy-adv(mi(p1,c), mj(p2,c)): For mobility classes mi(p1,c) and mj(p2,c), Copy-adv(mi(p1,c),mj(p2,c)) produces a mobility class m(p1,c) that has all the properties of mi(p1,c) except the advertisement policy and the sharing statistics. The advertisement policy of the new mobility class is the same as that of mj(p2,c), i.e., Adv-policy(m(p1,c)) = Adv-policy(mj(p2,c)). The sharing statistics attached to the new mobility class are re-initialized, i.e., S(m(p1,c)) = ∅.
The objective of the operation Copy-adv¹⁰ is to enhance a mobility class prepared for a peer p1 in a context c by using a mobility class prepared for a peer p2, where p1 and p2 have similar information provisions in the context c.
Operation 4.7. Divide(m(p,c)): The operation Divide returns nothing (i.e., ∅) if the mobility class m(p,c) is not divisible. The mobility class m(p,c) is said to be divisible if there are at least two sharing statistics in S(m(p,c)) having different connectivity lifetimes, i.e.,
∃s1, s2 ∈ S(m(p,c)) such that Co-lifetime(s1) ≠ Co-lifetime(s2)
Before describing how the division is performed, let us define the notation unique(S). For a set of sharing statistics S, unique(S) ⊆ S keeps exactly one sharing statistic for each distinct connectivity lifetime found in S, i.e.,
• ∀s1 ∈ S, ∃s2 ∈ unique(S) such that Co-lifetime(s1) = Co-lifetime(s2), and
• Co-lifetime(s1) ≠ Co-lifetime(s2) ∀s1 ≠ s2 ∈ unique(S)
For a divisible mobility class m(p,c) having range-lifetimes [t1,t2), the operation Divide(m(p,c)) produces two mobility classes m1(p,c) and m2(p,c). The range-lifetimes of m1(p,c) and m2(p,c) are [t1, tmd) and [tmd, t2), where tmd is the median of the connectivity lifetimes in unique(S(m(p,c))). The sharing statistics attached to the two mobility classes satisfy the following properties:
• Co-lifetime(si) < Co-lifetime(sj) ∀si ∈ S(m1(p,c)) and ∀sj ∈ S(m2(p,c)).
• $|\mathrm{unique}(S(m_k(p,c)))| = \frac{|\mathrm{unique}(S(m(p,c)))|}{2} \pm 1$, i.e., each resulting class keeps about half of the distinct connectivity lifetimes, and |S(mk(p,c))| > 1 for k = 1 or 2
• Co-lifetime(s) ≥ tmd ∀s ∈ S(m2(p,c)) and Co-lifetime(s) < tmd ∀s ∈ S(m1(p,c))
• S(m1(p,c)) ∪ S(m2(p,c)) = S(m(p,c))
10 For the sake of simplicity, we consider mobility classes defined for the same contexts. However, it is simple to extend the operation to
consider similar contexts.
The advertisement policies, the satisfaction factors and the overload factors of the resulting mobility classes are computed by using Operation 4.1.
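The split performed by Divide can be sketched on the connectivity lifetimes alone. In the Python sketch below, each sharing statistic is assumed to be represented only by its Co-lifetime value, and the median of an even number of distinct lifetimes is taken as the average of the two middle values (the convention of Python's statistics.median); the thesis does not fix that convention, so it is an assumption.

```python
from statistics import median


def divide(t1, t2, co_lifetimes):
    """Sketch of Operation 4.7 for a class with range-lifetimes [t1, t2).

    co_lifetimes holds Co-lifetime(s) for every s in S(m(p,c)).
    Returns None (the empty result) if the class is not divisible,
    otherwise ((range1, stats1), (range2, stats2)).
    """
    unique = sorted(set(co_lifetimes))        # distinct connectivity lifetimes
    if len(unique) < 2:
        return None                           # not divisible
    t_md = median(unique)                     # median of unique(S(m(p,c)))
    s1 = [x for x in co_lifetimes if x < t_md]
    s2 = [x for x in co_lifetimes if x >= t_md]
    return ((t1, t_md), s1), ((t_md, t2), s2)
```

For instance, a class [0, 6) whose history holds the lifetimes 5, 5.9, 5 and 2 has three distinct values, median 5, and is split into [0, 5) holding {2} and [5, 6) holding {5, 5.9, 5}. The policies of the two halves would then be recomputed with Operation 4.1, which this sketch leaves out.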
4.3.2 Computation
We propose a method to generate mobility classes for a data source peer ps in a given context according to the following objectives. Mobility classes should be formed in such a way that (1) the network traffic created by advertisements is manageable, (2) the discovery of information is facilitated, and (3) the number of queries to be distributed is reduced.
Assume that a peer ps has just started sharing information in a sharing-context c. If there
is a peer pi such that the information provisions of ps and pi are similar, ps can use the
characteristics of the mobility classes of pi to define its own mobility classes. Otherwise, a
mobility-class m(ps,c) with range-lifetimes(m(ps,c))= [0, ∞) and adv-volume(m(ps,c))=0
will be used as the only mobility class. In this case, data-requesters discover the sharable
files of ps by using the pull discovery approach in the context c.
We propose to enhance the efficiency of mobility classes by using three types of heuristics: optimistic, pessimistic and neutral. The optimistic heuristic modifies mobility classes based on the assumption: “inefficiency occurs because (1) the advertisement volume is too limited to include the files needed by users and/or (2) the advertisement radius is too short to reach the potential users.” The pessimistic heuristic performs modifications based on the assumption: “the inefficiency of the mobility class occurs due to an overestimation of the advertisement volume.” The neutral heuristic does not make any assumption but tries to enhance the efficiency of the mobility classes by merging or dividing them, as well as by using the behavior of the mobility classes computed by similar peers.
The optimistic heuristic increases the advertisement volume, while the pessimistic heuristic reduces it. Let [t1, t2) be the range-lifetimes of a mobility class m(ps,c). Let ∆ and α be predefined incremental factors of the advertisement volume and of the advertisement period respectively. Let β be the highest number of hops that an advertisement can traverse. The pseudo-code in Figure 4-1 and Figure 4-2 is used, respectively, to augment and to reduce the total number of metadata distributed in a mobility class m(ps,c).
1. If adv-volume(m(ps,c))+ ∆ ≤ Overload-Factor (m(ps,c)) then
adv-volume(m(ps,c))+=∆
2. if Adv-radius(m(ps,c))< β then Adv-radius(m(ps,c))++
3. if adv-period(m(ps,c)) > α then adv-period(m(ps,c))-= α
Figure 4-1. Augment-Volume
1. if adv-volume(m(ps,c)) ≤ ∆ && (Adv-radius(m(ps,c)) ≤ 1) && adv-period(m(ps,c)) + α ≥ (t2 - t1) then adv-volume(m(ps,c)) = 0
2. if adv-volume(m(ps,c)) > ∆ then adv-volume(m(ps,c)) -= ∆
3. if Adv-radius(m(ps,c)) > 1 then Adv-radius(m(ps,c))--
4. if adv-period(m(ps,c)) + α < (t2 - t1) then adv-period(m(ps,c)) += α
Figure 4-2. Reduce-Volume
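Figures 4-1 and 4-2 translate almost line for line into code. The Python sketch below assumes a hypothetical AdvPolicy record holding the advertisement volume, radius (in hops) and period of one mobility class; delta, alpha and beta play the roles of ∆, α and β, and the steps run in the same order as in the figures.

```python
from dataclasses import dataclass


@dataclass
class AdvPolicy:
    volume: float   # adv-volume of the mobility class
    radius: int     # Adv-radius, in hops
    period: float   # adv-period


def augment_volume(p: AdvPolicy, overload_factor, delta, alpha, beta):
    """Figure 4-1 (optimistic heuristic): more metadata, farther, more often."""
    if p.volume + delta <= overload_factor:
        p.volume += delta
    if p.radius < beta:
        p.radius += 1
    if p.period > alpha:
        p.period -= alpha


def reduce_volume(p: AdvPolicy, t1, t2, delta, alpha):
    """Figure 4-2 (pessimistic heuristic) for range-lifetimes [t1, t2)."""
    if p.volume <= delta and p.radius <= 1 and p.period + alpha >= t2 - t1:
        p.volume = 0            # nothing left to shrink: the class becomes inactive
    if p.volume > delta:
        p.volume -= delta
    if p.radius > 1:
        p.radius -= 1
    if p.period + alpha < t2 - t1:
        p.period += alpha
```

Note that the period never grows beyond the width of the range-lifetimes, so at least one advertisement can still be sent during a MANET view of that class.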
In Algorithm 4-1, we use the neutral heuristic as long as possible. However, the neutral
approach cannot be applied if the operations “Merge”, “Copy-adv” and “Divide” cannot be
performed. In this case, we propose the application of the optimistic heuristic.
Algorithm: Mobility Class Computation
Input: ps, M(p,c) for all p ∈ PA ∪ {ps}, Inf-Pr(p) for all p ∈ PA ∪ {ps}
   ps: data source peer under consideration
   M(p,c): mobility classes of the peer p in the sharing context c
   PA: set of peers that have an information provision similar to that of ps in the context c and a high degree of collaboration with ps
   Inf-Pr(p): denotes Information-Provision(p)
Output: M(ps,c)
Begin
1.  For each mi(ps,c) ∈ M(ps,c) such that S(mi(ps,c)) ≠ ∅
2.    If (Efficient(mi(ps,c)))
3.      #opt-mod(mi(ps,c)) ← 0
4.    Else
        // Merge: neutral heuristic
5.      Case 1: (∃ mj(ps,c) ∈ M(ps,c) | Efficient(mj(ps,c)) && Consecutive(mi(ps,c), mj(ps,c))) &&
            (if ∃ mk(ps,c) ∈ M(ps,c) | Consecutive(mi(ps,c), mk(ps,c)) && Efficient(mk(ps,c)) then Usage-Factor(mj(ps,c)) ≥ Usage-Factor(mk(ps,c)))
6.        M(ps,c) ← (M(ps,c) - {mi(ps,c), mj(ps,c)}) ∪ {m(ps,c)}, where m(ps,c) = Merge(mi(ps,c), mj(ps,c))
7.        #opt-mod(m(ps,c)) ← #opt-mod(mi(ps,c))
8.        Break
        // Copy-adv: neutral heuristic
9.      Case 2: (∃ p ∈ PA && ∃ mj(p,c) ∈ M(p,c) | Included(mi(ps,c), mj(p,c)) && Efficient(mj(p,c))) &&
            (if ∃ p' ∈ PA and ∃ mk(p',c) ∈ M(p',c) | Included(mi(ps,c), mk(p',c)) && Efficient(mk(p',c)) then Similarity(Inf-Pr(ps), Inf-Pr(p)) ≥ Similarity(Inf-Pr(ps), Inf-Pr(p')))
10.       M(ps,c) ← (M(ps,c) - {mi(ps,c)}) ∪ {m(ps,c)}, where m(ps,c) = Copy-adv(mi(ps,c), mj(p,c))
11.       #opt-mod(m(ps,c)) ← #opt-mod(mi(ps,c))
12.       Break
        // Divide: neutral heuristic
13.     Case 3: Efficient(m1(ps,c)) || Efficient(m2(ps,c)), where {m1(ps,c), m2(ps,c)} = Divide(mi(ps,c))
14.       M(ps,c) ← (M(ps,c) - {mi(ps,c)}) ∪ {m1(ps,c), m2(ps,c)}
15.       For j = 1 to 2
16.         #opt-mod(mj(ps,c)) ← #opt-mod(mi(ps,c))
17.       End For
18.       Break
        // Augment: optimistic heuristic
19.     Case 4: #opt-mod(mi(ps,c)) < opt-limit
20.       Augment-Volume(mi(ps,c))
21.       #opt-mod(mi(ps,c))++
22.       Break
        // Reduce: pessimistic heuristic
23.     Case 5: Usage-Factor(mi(ps,c)) < Satisfactory-Factor(mi(ps,c))
24.       Reduce-Volume(mi(ps,c))
25.     End Case
26. End For
End Algorithm
Algorithm 4-1: Mobility Class Computing
Let #opt-mod(m(ps,c)) be the number of consecutive optimistic modifications made on a mobility class m(ps,c), and let opt-limit be the maximum number of times that the optimistic heuristic can be applied consecutively. The optimistic heuristic is said to have failed if #opt-mod(m(ps,c)) is equal to opt-limit. The pessimistic heuristic is applied when the optimistic heuristic has failed.
Algorithm 4-1 is used to enhance the efficiency of the mobility classes of ps in a context c. The algorithm accepts as inputs the mobility classes of ps and those of the set of peers that have information provisions similar to that of ps and a high degree of collaboration with this peer. The algorithm processes only the mobility classes that have been applied in the history, for two reasons: (1) the efficiency of a mobility class is defined based on historical observations, and (2) it is not useful to process a mobility class that has never been used.
Let PA be the set of data-sources such that pi ∈ PA satisfies the following properties:
• tie-strength(pi, ps)>min-tie and
• Information-Provision(pi,c) ≈ Information-Provision(ps,c).
As discussed in section 3.3, min-tie, the minimum tie, is used to indicate a high degree of
collaboration between peers.
Let M(ps,c) be the mobility classes that a data source ps uses to determine the advertisement policy of MANET views in context c. The peer ps can use Algorithm 4-1 to remedy the inefficiency of a mobility class mi(ps,c) ∈ M(ps,c) via one of the five cases described below.
Case 1 Merging: The merge operation is applied if there is an efficient mobility class mj(ps,c) ∈ M(ps,c) satisfying the following conditions: (1) mj(ps,c) is consecutive to mi(ps,c), and (2) if the other consecutive mobility class of mi(ps,c), let us refer to it as mk(ps,c), is efficient, then the usage factor of the advertisements of mj(ps,c) is greater than or equal to that of mk(ps,c). In this case, mi(ps,c) and mj(ps,c) are replaced by the mobility class resulting from merging them (let us call this class m(ps,c)). As described in Operation 4.5, m(ps,c) has all the properties of mj(ps,c) except its range-lifetimes and its historical statistics.
Case 2 Copying: This case is applied if it is not possible to apply case 1 and there is a
peer p ∈ PA having an efficient mobility class mj(p,c) such that the following properties
are satisfied:
• The range-lifetimes of mi(ps,c) is included in the range-lifetimes of mj(p,c)
• The mobility class mj(p,c) is efficient, and
• If there is any other peer p’ satisfying the conditions (1) and (2), the information
provision of ps is more similar to the information provision of p than that of p’ in
the context c.
In this case, the mobility class mi(ps,c) is replaced by the mobility class resulting from copying the advertisement policy of mj(p,c) to mi(ps,c).
Case 3 Division: This case is applied if there is no efficient mobility class as described in case 1 and case 2, and one of the mobility classes resulting from the division operation is efficient. In this case, mi(ps,c) is replaced by the mobility classes resulting from the division operation.
Case 4 Advertisement augmentation: This case is applied if it is not possible to apply
the above cases. In this case, an optimistic approach is used to increase the volume of
advertisement via the augmentation method presented in Figure 4-1.
Case 5 Advertisement reduction: This case is applied if the inefficiency of a mobility class is due to an unsatisfactory usage of advertisements and the optimistic approach can no longer be applied to the mobility class. A pessimistic approach is then used to reduce the total volume of advertisements via the reduction method presented in Figure 4-2, with the objective of increasing the usage of advertisements.
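The priority among the five cases, and the way #opt-mod evolves, can be summarized as a small decision function. The Python sketch below is an illustration only: the boolean arguments are hypothetical placeholders for the conditions of Cases 1 to 3 (whose evaluation needs Operations 4.2 to 4.7) and for the comparison of Case 5; the function returns the case to apply and the counter value carried by the resulting class or classes.

```python
def algorithm_4_1_case(efficient, can_merge, can_copy, can_divide,
                       opt_mod, opt_limit, usage_below_target):
    """Case chosen by Algorithm 4-1 for one class with a non-empty history."""
    if efficient:
        return "keep", 0                   # lines 2-3: reset #opt-mod
    if can_merge:
        return "merge", opt_mod            # Case 1 (neutral): counter inherited
    if can_copy:
        return "copy-adv", opt_mod         # Case 2 (neutral): counter inherited
    if can_divide:
        return "divide", opt_mod           # Case 3 (neutral): both halves inherit it
    if opt_mod < opt_limit:
        return "augment", opt_mod + 1      # Case 4 (optimistic)
    if usage_below_target:
        return "reduce", opt_mod           # Case 5 (pessimistic)
    return "unchanged", opt_mod            # no case applies
```

The ordering reflects the text: neutral operations first, the optimistic heuristic while fewer than opt-limit consecutive attempts have been made, and the pessimistic heuristic only after the optimistic one has failed.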
To summarize, in this section, we have proposed to compute mobility classes semi-
automatically. In a given context, mobility classes with respect to a peer can be initialized
by the mobility classes of a similar peer or by an inactive mobility class. Mobility classes
are enhanced by operations presented in section 4.3.1.
4.4 Mobility class Identification
4.4.1 Stay time
A peer can identify a mobility class by using the connectivity lifetime of a MANET
view, which is determined by the stay times of the peers. The stay time of two peers p1 and
p2 is the time that they stayed together (Definition 4.2).
The stay time of direct neighbors having mobility patterns described by constant
velocities and directions can be calculated by using the method proposed in [102]. Let
(xi,yi), vi and θi be the position, the velocity and the direction of a peer pi respectively and r
be the transmission range of the communication technology. As discussed in [102], the
stay time of p1 and p2 can be computed as follows:
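$$\mathrm{Stay\text{-}time}(p_1, p_2) = \frac{-(ab + cd) + \sqrt{(a^2 + c^2)\,r^2 - (ad - bc)^2}}{a^2 + c^2}$$
where a = v1 cos θ1 - v2 cos θ2, b = x1 - x2, c = v1 sin θ1 - v2 sin θ2 and d = y1 - y2. This value is the positive root of $(b + a t)^2 + (d + c t)^2 = r^2$, i.e., the time t at which the distance between the two peers reaches the transmission range r.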
Figure 4-3 illustrates the computation for a peer p1 located at (1,3) and a peer p2 located at (4,1), moving with the velocities and directions shown in the figure and connected by a network technology with a transmission range of 10 meters: their stay time is approximately 0.25 minutes.
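The closed form above is easy to evaluate in code. In this Python sketch each peer is assumed to be given as a tuple (x, y, v, θ) with θ in degrees; the special cases (identical velocities, or a relative trajectory that never enters the range) are handling choices of the sketch, not part of the original method.

```python
import math


def stay_time(p1, p2, r):
    """Stay time of two directly connected peers moving at constant velocity.

    Each peer is (x, y, v, theta_deg); r is the transmission range.
    """
    x1, y1, v1, th1 = p1
    x2, y2, v2, th2 = p2
    a = v1 * math.cos(math.radians(th1)) - v2 * math.cos(math.radians(th2))
    b = x1 - x2
    c = v1 * math.sin(math.radians(th1)) - v2 * math.sin(math.radians(th2))
    d = y1 - y2
    if a == 0 and c == 0:                     # same velocity: distance never changes
        return math.inf if b * b + d * d <= r * r else 0.0
    disc = (a * a + c * c) * r * r - (a * d - b * c) ** 2
    if disc < 0:                              # the peers never come within range
        return 0.0
    t = (-(a * b + c * d) + math.sqrt(disc)) / (a * a + c * c)
    return max(t, 0.0)
```

With the inputs of Figure 4-3, stay_time((1, 3, 40, 0), (4, 1, 45, 315), 10) evaluates to about 0.25 minutes; moving both peers for that long places them exactly 10 meters apart, which is a quick way to check the result.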
Before defining the stay time of multi-hop neighbors, let us define the stay time of a path. Assume that ph is a path connecting peers p0 and pm. The path ph can be expressed as a sequence of peers p0, p1, …, pm such that there is a direct connection between pi and pi+1 for 0 ≤ i < m. The path ph is valid if and only if every two adjacent peers in ph stay connected. Therefore, the stay time of ph is calculated as follows:
$$\text{stay-time-path}(ph) = \min_{0 \le i \le m-1} \text{stay-time}(p_i, p_{i+1})$$
Figure 4-3: An example of stay-time computation (p1 at (1,3) with θ1 = 0° and v1 = 40 m/minute; p2 at (4,1) with θ2 = 315° and v2 = 45 m/minute; r = 10 meters)
In a MANET, multiple paths can be used to connect two peers. Let Ph be the set of paths connecting the peers p1 and p2. The two peers stay connected if and only if there is at least one valid path in Ph. Therefore, the stay time of p1 and p2 is computed as:
$$\text{stay-time}(p_1, p_2) = \max_{ph \in Ph} \text{stay-time-path}(ph)$$
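The two formulas combine a minimum along each path with a maximum over the alternative paths. In the Python sketch below, the stay times of the direct links are assumed to be known already (for instance from the closed form of Section 4.4.1) and are stored in a dictionary keyed by frozenset pairs of peer names; that representation is a choice of the sketch.

```python
def path_stay_time(path, link_stay):
    """stay-time-path(ph): a path lasts as long as its shortest-lived link."""
    return min(link_stay[frozenset((path[i], path[i + 1]))]
               for i in range(len(path) - 1))


def peers_stay_time(paths, link_stay):
    """stay-time(p1, p2): the best of the alternative paths in Ph."""
    return max((path_stay_time(ph, link_stay) for ph in paths), default=0.0)
```

For example, with links A-B: 5, B-D: 2, A-C: 3 and C-D: 4, the path A, B, D lasts 2, the path A, C, D lasts 3, so A and D stay connected for 3 time units. Each path must contain at least two peers.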
4.4.2 Association Rule Mining
Mobility classes are determined as a function of the peers' stay-times. As discussed in section 4.4.1, the stay-times of peers can be determined by analyzing their mobility patterns in a MANET. However, this way of determining stay-times is difficult, since peers may not follow predefined mobility patterns. We argue that this challenge can be overcome by analyzing the historical behaviors of peers.
The average stay-time of a peer p can be computed from the history by using the actual time that p stayed with other peers in a MANET in a given context. We propose to use association rules, which we also call “habit rules”, to estimate the average stay-time of a peer in a given context and to determine the mobility class of a MANET view.
The association rule below associates the mobility class m3 with the context (Bus 3, [8AM-8:10AM]).
<Context = (Bus 3, [8AM-8:10AM])> → <mobility class = m3>
A habit rule has two parts: an antecedent context (e.g., <Context = (Bus 3, [8AM-8:10AM])>) and a consequent mobility class (e.g., <mobility class = m3>). As discussed in chapter 3, antecedents can be produced by a method derived from those proposed in [89] and [90, 91]. Let minConf be the minimum confidence threshold of a rule and S(ant) be the set of sharing statistics matching the antecedent ant. Assume that there is a mobility class m(p,c) such that ant is an actual sharing context of the abstract context c. According to Definition 4.6, S(m(p,c)) is the set of sharing statistics matching the mobility class m(p,c). A rule ant → <mobility class = m(p,c)> is formed if and only if
$$\frac{|S(ant) \cap S(m(p,c))|}{|S(ant)|} \ge minConf$$
In the scenario presented in chapter 1 and in section 3.1, Pascal exchanges information in a bus with friends and with people working in a bank. Let m(Pascal, (Bus, φ)) be a mobility class with range-lifetimes [0, 6). Consider the sharing statistics of Pascal displayed in Table 4-5 and the antecedent <context = (“Bus”, [8AM-8:05AM])>.
The contexts attached to the sharing statistics in Table 4-5 are actual contexts of the abstract context (Bus, φ). The sharing statistics s1, s2, s4 and s5 have a connectivity lifetime within the range-lifetimes of the mobility class m(Pascal, (Bus, φ)), i.e., less than 6. Thus, the rule below can be formed from the above historical data¹¹.
<context = (“Bus”, [8AM-8:05AM])> → <mobility class = m(Pascal, (Bus, φ))>
Table 4-5: Pascal’s sharing statistics
Sharing statistics | Context                | Co-lifetime
s1                 | (Bus 27, [8AM-8:07AM]) | 5
s2                 | (Bus 27, [8AM-8:06AM]) | 5.9
s3                 | (Bus 37, [8AM-8:05AM]) | 7
s4                 | (Bus 27, [8AM-8:07AM]) | 5
s5                 | (Bus 27, [8AM-8:05AM]) | 2
s6                 | (Bus 37, [8AM-8:07AM]) | 7
In general, association rules can be used to estimate the mobility class of a MANET view according to the actual sharing contexts. These rules can be produced by mining contextual patterns in the historical sharing statistics, together with the similarity of the connectivity lifetimes attached to these historical data.
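The confidence test of a habit rule reduces to counting. Which sharing statistics match an antecedent depends on the context-matching method of chapter 3, which is not reproduced here, so the Python sketch below takes that test as a caller-supplied predicate (a hypothetical interface). It also assumes that a statistic belongs to S(m(p,c)) when its connectivity lifetime falls in the range-lifetimes [t1, t2), as in the example above.

```python
# Pascal's sharing statistics (Table 4-5); only the fields used below are kept.
TABLE_4_5 = [
    {"id": "s1", "vehicle": "Bus 27", "co_lifetime": 5},
    {"id": "s2", "vehicle": "Bus 27", "co_lifetime": 5.9},
    {"id": "s3", "vehicle": "Bus 37", "co_lifetime": 7},
    {"id": "s4", "vehicle": "Bus 27", "co_lifetime": 5},
    {"id": "s5", "vehicle": "Bus 27", "co_lifetime": 2},
    {"id": "s6", "vehicle": "Bus 37", "co_lifetime": 7},
]


def rule_confidence(stats, matches_antecedent, t1, t2):
    """|S(ant) ∩ S(m)| / |S(ant)|, with S(m) = statistics whose co-lifetime is in [t1, t2)."""
    s_ant = [s for s in stats if matches_antecedent(s)]
    if not s_ant:
        return 0.0
    both = [s for s in s_ant if t1 <= s["co_lifetime"] < t2]
    return len(both) / len(s_ant)


def forms_rule(stats, matches_antecedent, t1, t2, min_conf):
    """True when the rule ant -> <mobility class = m> reaches minConf."""
    return rule_confidence(stats, matches_antecedent, t1, t2) >= min_conf
```

If all six statistics are taken to match the antecedent, the confidence for the class [0, 6) is 4/6 ≈ 0.67; if only the Bus 27 statistics match, it is 4/4 = 1. The support check mentioned in footnote 11 is omitted here as well.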
4.5 Conclusion
Information discovery in a MANET can be performed by using the pull approach (via
querying) and by using the push approach (via advertisements). To maximize the usage of
the push approach, we introduce a novel concept called a mobility-class that parameterizes
the advertisement policy according to users’ stay-times and their context. Mobility classes
can also be computed semi-automatically by using the approach proposed in this chapter.
Peers can determine the current mobility classes by analyzing their stay-times or by using habit rules.
11 To produce a rule, we should compare the confidence and support of the rule with predefined constants. For the sake of simplicity, we skip this step in the illustration.
Chapter 5 File classification and Organization
In the previous two chapters, we have presented interest-aware and lifetime-aware information sharing methodologies. In these two chapters, sharable files are advertised by disseminating their metadata. This kind of advertisement can impose a high burden on devices. In this chapter, we propose an algorithm that organizes sharable files in a tree, named a file tree, so that files can be advertised either briefly or in detail by using their organization in the tree.
The research work in this chapter has been published in the International Journal on
Computer Science and Information Systems [103] and has been presented in the
conference on Pervasive Computing and Communications Workshops (PerComW 2010)
[85].
This chapter is organized as follows: we discuss our motivation in section 5.1; file representation and organization are covered in section 5.2; the classification of files is presented in section 5.3; we illustrate the application of file trees to file discovery in section 5.4; section 5.5 discusses the main contributions presented in this chapter; finally, we conclude this chapter in section 5.6.
5.1 Motivation
Organizing files into a tree facilitates the presentation of files and minimizes the load of
peers involved in information sharing. According to the scenario presented in chapter 1,
Pascal exchanges photos with his friends. Let’s consider the MANET displayed in Figure
5-1 where Pascal is connected to Bob, Carol, David and Eve. Assume that Pascal is
interested in receiving photos of vegetables and that the other participants of the MANET are interested in providing photos of vegetables. Suppose that the advertisement quota is 2 and the forwarding factor is also 2. As a result, each participant of the MANET can advertise two of its vegetable photos to Pascal.
Assume that Pascal wants to receive a photo of a Jerusalem artichoke. The advertisement quota is too small to indicate the peers owning the photo that Pascal is looking for. Most probably, he will be forced to search for the photo by distributing a query. As the forwarding factor is small, he will need to distribute the query repeatedly. This type of file discovery overloads the environment with queries and delays the satisfaction of users' information needs.
Now, assume that the participants of the MANET organize their files in file trees as shown in Figure 5-2. Bob informs Pascal that he has photos of tubers and seeds. Other participants advertise their files in a similar fashion. Pascal knows that the Jerusalem artichoke is a tuber vegetable. As a consequence, he learns that Bob has the potential to provide the required photo. Therefore, Pascal decides to send the query only to Bob.
Organizing files in a tree permits users to advertise files at a high level. This kind of file
advertisement allows users to know the potential of a peer to provide the required files and
so to limit the dissemination of his/her queries to potential peers.
In this chapter, we introduce a concept called “cluster” that organizes files hierarchically. In other words, a cluster represents a group of files or a group of other clusters. We then propose an algorithm that classifies files into clusters.
[Figure 5-1: Pascal, searching for a photo of a Jerusalem artichoke, is connected to Bob, Carol, David and Eve, who advertise individual vegetable photos (cabbage, carrot, potato, bean, onion, tomato, broccoli); Adv-Volume = 2, forwarding factor = 2]
Figure 5-1: Query resolution via advertisements about individual files
[Figure 5-2: each participant advertises the upper levels of its file tree (Bob: Vegetable → tubers, seeds; the other participants: Vegetable → leaves, roots; Vegetable → bulbs, leaves; Vegetable → fruits, flowers, leaves); Pascal, searching for a photo of a Jerusalem artichoke, sends his query only to Bob; Adv-Volume = 2, forwarding factor = 2]
Figure 5-2: File organization and Query resolution
5.2 Information Representation
Files are represented via their metadata. The metadata of a file are composed of basic metadata and specialized metadata. Basic metadata are described in Table 5-1. The attribute FileID is built from a sequential number and the MAC address of the device on which the file is created.
Table 5-1: Basic description of a file photo
Attributes  | Descriptions
FileID      | unique identifier of the file
Description | list of keywords that describes the file
FileSize    | the size of the file
Specialized metadata depend on the type of the file. As displayed in Figure 5-3, the specialized metadata of a photo, for example, can contain the objects of interest identified in the photo and the spatial/temporal context in which the photo was taken.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Description of the file format -->
<Photo>
  <actor>Pascal, Anne, Michael</actor>
  <Format>
    <Type>jpeg</Type>
  </Format>
  <location>Part Dieu</location>
  <Time>28/04/2010</Time>
</Photo>
Figure 5-3: Example of specialized metadata of a photo
Well-known content description metadata models like Dublin Core [104] and MPEG-7 [105] can be used to represent the metadata of a file. In this chapter, we use an abstract representation of metadata to present and discuss our work.
To facilitate file searching and categorization, files are mapped into a vector space via vector space modeling (VSM) techniques [87,106]. Vector space modeling is a standard technique in information retrieval to represent documents through their contents.
In VSM, a document di is represented by a vector di = {wi1, wi2, . . ., win}, where wij represents the weight of the term j in the document di. To produce this vector for a text document, the document is parsed into a series of words, and the parsing process removes stop words such as prepositions, conjunctions, common verbs, pronouns, articles and common adjectives. The documents are then represented in a term × frequency matrix [87,106]. A document vector can be considered as a vector in the term × frequency space, which is usually referred to as the vector space.
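The construction of term-frequency vectors can be sketched in a few lines. The Python example below is a simplification, not the thesis's implementation: it uses a small illustrative stop-word list, a lowercase alphabetic tokenizer, and raw term frequencies as the weights wij (no tf-idf or other weighting).

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; any standard list could be used).
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "in", "is",
              "of", "on", "or", "the", "to", "with"}


def term_frequency_vectors(docs):
    """Return the sorted vocabulary and one raw term-frequency vector per document."""
    tokenized = [
        [w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOP_WORDS]
        for doc in docs
    ]
    vocabulary = sorted({w for words in tokenized for w in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        # w_ij = number of occurrences of term j in document i
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors
```

For the two descriptions "Photo of a potato" and "Potato and carrot in the garden", the vocabulary is carrot, garden, photo, potato and the vectors are [0, 0, 1, 1] and [1, 1, 0, 1].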
In this thesis, we propose to organize files hierarchically into clusters. The structure
containing the clusters is called a file tree. The metadata of a cluster are described in Table
5-2. The metadata of a cluster contain its description as well as the IDs of files and sub-
clusters grouped under it. The metadata also contain the average size of files grouped in the
cluster. An example of metadata of a cluster is given in Figure 5-4.
Table 5-2: Description of a cluster
Attributes    | Descriptions
ClusterID     | unique identifier of the represented cluster
Description   | list of keywords that describes the cluster
FileIDs       | ids of the files found under the represented cluster
SubClusterIDs | ids of the clusters found under the represented cluster
AvgFileSize   | the average size of files grouped in the cluster
<?xml version="1.0" encoding="UTF-8"?>
<Cluster>
  <ClusterID>C0083blueD11</ClusterID>
  <Description>Pascal, Campus</Description>
  <FileIDs>F0083blueD333, F0083blueD343, F0083blueD355,
    F0083blueD356, F0083blueD356, F0083blueD360</FileIDs>
  <SubClusterIDs>C0083blueD21, C0083blueD22</SubClusterIDs>
  <AvgFileSize>544KB</AvgFileSize>
</Cluster>
Figure 5-4: An example of the metadata of a cluster
The description of a cluster c is computed by using the description of the files/clusters
under it. Assume that maxKey is the maximum number of keywords used to describe a
file/cluster. Suppose G contains the descriptions of the files/clusters grouped under the
cluster c. Let Occurrence(G,k) be the number of descriptions in G containing the keyword k, and let Keys(G) be the union of the descriptions in G. Description(c), the description of the cluster c, is defined as the set of the most popular keywords that appear in Keys(G), i.e.,
- | Keys(G) | < maxKey ⇒|Description(c)| = | Keys(G)|
- | Keys(G)| ≥ maxKey ⇒ |Description(c)| = maxKey
- Description(c) ⊂ Keys(G)
- Occurrence(G, ki) ≥ Occurrence(G, kj), ∀ki ∈ Description(c), ∀kj ∈ Keys(G) -
Description(c)
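The four conditions above amount to keeping the maxKey keywords that occur in the largest number of member descriptions. In this Python sketch each description is assumed to be a collection of keyword strings; ties between equally popular keywords are broken alphabetically, which is an arbitrary choice since the definition does not specify a tie rule.

```python
from collections import Counter


def cluster_description(descriptions, max_key):
    """Description(c): the max_key keywords found in the most member descriptions."""
    occurrence = Counter()
    for keywords in descriptions:
        occurrence.update(set(keywords))   # Occurrence(G, k) counts descriptions, not repetitions
    # Most popular first; ties broken alphabetically (an assumed rule)
    ranked = sorted(occurrence, key=lambda k: (-occurrence[k], k))
    return set(ranked[:max_key])
```

For G = {Pascal, Campus}, {Pascal, Bus}, {Pascal, Campus, Lyon} and maxKey = 2, the description is {Pascal, Campus}; when |Keys(G)| < maxKey, all keywords are kept, as the first condition requires.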
The vector representation of a cluster c is computed as follows. Let V be the set of vector representations of the files/clusters classified under c. The vector representation of c is the centroid vector¹² of V. In the example displayed in Figure 5-5, V contains the vectors in the circle, and the center of the circle is the vector representation of the cluster.
12 The centroid vector is the average vector of the vectors in V.
[Figure 5-5: the vectors classified under the cluster lie in a circle whose center is the vector representation of c]
Figure 5-5: Vector representation of a cluster
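The centroid is the component-wise average of the member vectors. The Python sketch below also skips members that have no vector yet (represented here as None), which matches how the hybrid approach of Section 5.3.1 derives a cluster's vector from only the files that have a vector representation; the None convention itself is an assumption of the sketch.

```python
def centroid(vectors):
    """Vector representation of a cluster: the average of its members' vectors.

    Members without a vector representation (None) are ignored.
    """
    present = [v for v in vectors if v is not None]
    if not present:
        raise ValueError("no member of the cluster has a vector representation")
    dim = len(present[0])
    return [sum(v[i] for v in present) / len(present) for i in range(dim)]
```

For example, the centroid of [0, 0, 1, 1] and [1, 1, 0, 1] is [0.5, 0.5, 0.5, 1.0].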
We propose to apply a VSM technique to construct a virtual vector space to represent
advertised files. As keywords in the textual descriptions of files are the most important
terms of the file, in this thesis, we propose to construct the virtual vector space from the
term statistics with respect to textual descriptions of files.
5.3 Classification Algorithm
5.3.1 File Classification
Clusters are organized into a structure called a file tree. The root of the tree is an artificial
cluster representing all sharable files.
A file-tree is constructed in a bottom up fashion; files are classified into clusters; the
resulting clusters are then classified into other clusters; the classification continues until a
tree of the required height is obtained.
New sharable files, which are added after the classification has been performed, can be
automatically added into clusters found at the leaves of the file tree.
Files/clusters can be classified by using a content-based approach, a metadata-based
approach or a hybrid of the two. A content-based approach performs classifications by
using the files’/clusters’ vector-representations; a metadata-based approach performs
classifications by using the textual-description of the files/clusters.
A content-based approach may not always be applicable, since some of the files may not have vector representations. A vector space cannot be recomputed every time a file is added, for the following reasons: (1) vector space computation is expensive; and (2) when a file is added on a thin device, this device may not immediately encounter a device capable of computing the vector space on its behalf. Consequently, recently added files may not have vector representations. We propose to apply a hybrid approach if there are files that do not have vector representations¹³.
In a hybrid classification approach, files are first classified using a metadata-based
approach into clusters such that each of the clusters contains some files that have a vector
representation. The vector representations of the resulted clusters are determined by using
only the files that have a vector representation. Afterwards, the classifications of the
clusters are performed by a content-based approach.
The k-means* algorithm (Algorithm 5-1) classifies files/clusters according to their similarities. Before presenting the algorithm, let us discuss the similarity of files and clusters.
The similarity between files and clusters can be computed based on the similarity of either their textual descriptions or their vector representations. Let E1 and E2 be files and/or clusters, and let the sets of keywords D1 and D2 be the textual descriptions of E1 and E2 respectively.
As discussed in chapter 3, D1 is similar to D2 if and only if Similarity(D1, D2) ≥ minSim, where minSim is the similarity threshold. The similarity value between E1 and E2 is equal to the similarity value between D1 and D2, i.e.,
Similarity(E1, E2) = Similarity(D1, D2),
13 In this chapter, all files are assumed to have metadata.
E1 and E2 are similar if and only if D1 and D2 are similar, i.e.,
$E_1 \approx E_2 \iff D_1 \approx D_2$
The similarity of files and clusters can also be derived from the similarity of their vector representations. Let $\gamma_{12}$ be the angle between the vector representations of E1 and E2; the similarity of E1 and E2 is calculated as:
$$\mathrm{Similarity}(E_1, E_2) = \cos(\gamma_{12})$$
The elements E1 and E2 are said to be similar if and only if
$$\mathrm{Similarity}(E_1, E_2) \ge minCosSim$$
where minCosSim is the minimum cosine similarity value.
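The cosine of the angle between two vectors is their dot product divided by the product of their norms. A minimal Python sketch follows; returning 0.0 when one of the vectors is all zeros (where the angle is undefined) is a convention chosen for the sketch.

```python
import math


def cosine_similarity(u, v):
    """cos(gamma) between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar(u, v, min_cos_sim):
    """E1 ≈ E2 for vector representations u and v."""
    return cosine_similarity(u, v) >= min_cos_sim
```

For the term-frequency vectors [0, 0, 1, 1] and [1, 1, 0, 1], the similarity is 1/√6 ≈ 0.41, so the two elements are similar only if minCosSim is at most that value. Since the measure depends only on direction, [1, 1] and [2, 2] have similarity 1.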
K-means* performs the classification process according to the similarity of files and clusters. Let k be the number of clusters required and S be the textual descriptions/vector representations of the files/clusters to be classified. The algorithm puts at most k mutually dissimilar elements in the set heads (lines 3-9), such that the elements in heads satisfy the following condition:
Similarity(si, sj) < minSim for all si ≠ sj ∈ heads
Note that the value of minSim is different for content-based and metadata-based classifications.
As described in lines 11-14, k-means* classifies the elements in S according to their similarities to the elements in the set heads. It then copies the content of the set heads into the set oldHeads and resets heads (lines 15 and 16). From each group, the algorithm selects as the new head the element of the group that is the most similar to the other elements of the group, i.e., the element with the largest sum of similarities to the members of the group (lines 17-19). Regrouping and recomputation of the group heads continue until the heads in two consecutive steps are identical or the loop has been performed a maximum number of times (maxIteration).
Algorithm: k-means*
INPUT: S, k, minSim, maxIteration
  S: files/clusters representations
  k: number of clusters
  minSim: a threshold indicating the similarity of elements. Note that minSim has different values for content and metadata based classifications.
  maxIteration: maximum number of iterations
OUTPUT: heads, members(e) ∀e ∈ heads
  heads: group-heads of the resulting clusters
  members(e): files/clusters members of the cluster headed by e
BEGIN
    //take dissimilar elements randomly
1.  heads = ø
2.  S' = S
    //find at most k heads
3.  WHILE (|heads| < k) && (S' ≠ ø)
4.    α = S'.randomSelect() //take an element randomly
5.    S' = S' - {α} //remove the element
      //add α in heads if it is dissimilar to the other elements in heads
6.    IF ((heads = ø) || (Similarity(α, β) < minSim, ∀β ∈ heads))
7.      heads.add(α)
8.    END IF
9.  END WHILE
10. i = 0
11. DO
      //map elements into clusters
12.   FOR each s ∈ S
        /*put s in the group headed by α if s is more similar to α than to other group heads*/
13.     members(α).add(s) for α ∈ heads such that Similarity(s, α) ≥ Similarity(s, β) ∀β ∈ heads
14.   END FOR
      //copy the content of heads into oldHeads
15.   oldHeads = heads
      //reset heads
16.   heads = ø
      //re-compute heads
17.   FOR each h ∈ oldHeads
        //determine the best head for the group currently headed by h
18.     heads.add(α) such that α ∈ members(h) and $\sum_{w \in members(h)} \mathrm{Similarity}(\alpha, w) \geq \sum_{w \in members(h)} \mathrm{Similarity}(\beta, w)$, ∀β ≠ α ∈ members(h)
19.   END FOR
20.   i++
21. WHILE ((oldHeads ≠ heads) && (i < maxIteration))
END ALGORITHM
Algorithm 5-1: file clustering based on k-means*
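A compact Python sketch of Algorithm 5-1 follows, assuming that elements are hashable and that `similarity` is any symmetric function returning larger values for more similar elements. Unlike the pseudocode, the sketch rebuilds the groups at every iteration and keeps a head whose group happens to be empty; the one-dimensional data and the similarity 1/(1 + |a - b|) are hypothetical and only make the example runnable. The head update is a medoid-style choice: the new head is always an existing element.

```python
import random


def assign(S, heads, similarity):
    """Lines 12-14: each element joins the group of its most similar head."""
    groups = {h: [] for h in heads}
    for s in S:
        best = max(heads, key=lambda h: similarity(s, h))
        groups[best].append(s)
    return groups


def k_means_star(S, k, similarity, min_sim, max_iteration=10, seed=0):
    """Sketch of Algorithm 5-1: group heads are always elements of S."""
    rng = random.Random(seed)
    # Lines 1-9: randomly pick at most k mutually dissimilar elements as heads.
    candidates = list(S)
    rng.shuffle(candidates)
    heads = []
    for alpha in candidates:
        if len(heads) == k:
            break
        if all(similarity(alpha, beta) < min_sim for beta in heads):
            heads.append(alpha)
    for _ in range(max_iteration):
        members = assign(S, heads, similarity)
        old_heads = heads
        # Lines 15-19: the new head of a group is the member with the largest
        # summed similarity to all members of that group.
        heads = [
            max(members[h] or [h], key=lambda a: sum(similarity(a, w) for w in members[h]))
            for h in old_heads
        ]
        # Line 21: stop once the heads no longer change.
        if set(heads) == set(old_heads):
            break
    return heads, assign(S, heads, similarity)


# Hypothetical 1-D "files" and similarity 1 / (1 + distance), for illustration only.
files = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
heads, members = k_means_star(files, k=2, similarity=sim, min_sim=0.5)
print(sorted(heads))  # [1.0, 5.0]
```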
Let the height of a file tree be h and the number of clusters at depth i be ni for 1 ≤ i ≤ h.
The file tree is computed as:
Step 1: files are classified into nh clusters by using the k-means* algorithm.
Step 2: for each depth i, i = h-1, h-2, …, 1, clusters found at depth i + 1 are classified
into ni clusters.
Step 3: all clusters at depth 1 are grouped into the root cluster, which is an artificial
cluster representing all sharable files.
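The three construction steps can be illustrated with the following sketch. `build_file_tree` and `Cluster` are hypothetical names, and `chunk_classify` is a toy stand-in for k-means* that merely splits a list into contiguous groups, so only the bottom-up shape of the procedure is meaningful here. With 16 files and n1 = 2, n2 = 8, it yields a tree shaped like the one in Figure 5-6.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    children: list = field(default_factory=list)  # files or sub-clusters


def build_file_tree(files, n_per_depth, classify):
    """Bottom-up file tree of height h = len(n_per_depth).

    n_per_depth[i - 1] is n_i, the number of clusters wanted at depth i.
    classify(elements, n) returns at most n groups (lists) of elements.
    """
    h = len(n_per_depth)
    level = list(files)
    # Step 1 (depth h), then Step 2 (depths h-1, ..., 1).
    for depth in range(h, 0, -1):
        groups = classify(level, n_per_depth[depth - 1])
        level = [Cluster(f"C{depth}{j + 1}", list(g)) for j, g in enumerate(groups)]
    # Step 3: the artificial root groups every depth-1 cluster.
    return Cluster("C0", level)


def chunk_classify(elements, n):
    """Toy stand-in for k-means*: split the list into n contiguous chunks."""
    size = max(1, -(-len(elements) // n))  # ceiling division
    return [elements[i:i + size] for i in range(0, len(elements), size)]


root = build_file_tree([f"f{i}" for i in range(1, 17)], [2, 8], chunk_classify)
print([c.name for c in root.children])              # ['C11', 'C12']
print([g.name for g in root.children[0].children])  # ['C21', 'C22', 'C23', 'C24']
```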
The next section studies how to determine the dimension of the tree (i.e., its height and the number of clusters at each depth) as a function of the mobility classes.
5.3.2 Computation of the File tree’s Dimension
We propose to compute the height of a file tree and the number of clusters at each depth according to the mobility classes that a source peer considers when determining its advertisement policies in MANETs. The number of clusters at a given depth of the file tree should be related to the advertisement volume attached to a mobility class m, so that the clusters at this depth are advertised in MANET-views described by the mobility class m.
Consider the file tree in Figure 5-6. Assume that there are two mobility classes m1 and m2 with advertisement volumes equal to 2 and 8 metadata, respectively. As a result, the clusters at depth 1 correspond to the mobility class m1 and those at depth 2 to the mobility class m2. Therefore, the clusters c11 and c12 will be advertised in MANET-views described by the mobility class m1, while the clusters c21, c22, c23, c24, c25, c26, c27 and c28 will be advertised in MANET-views described by the mobility class m2. We discuss advertisements of files/clusters in section 5.4.1.
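[Figure 5-6 shows the root cluster C0; depth 1, associated with mobility class m1: C11, C12; depth 2, associated with mobility class m2: C21 to C28]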
Figure 5-6: An example of association between a file-tree with mobility classes.
Not all mobility classes should be used to compute the dimension of the file tree (i.e., the height of the file tree and the number of clusters at each depth of the tree): when the advertisement volumes of mobility classes are equal or not significantly different, redundant clusters are created. Assume that there are mobility classes m1, m2 and m3 with advertisement volumes of 3, 4 and 8 metadata. If m1, m2 and m3 are all considered to compute the dimension of the file tree, the resulting file tree looks like the one in Figure 5-7. Note that some clusters at depths 1 and 2 represent the same group of files: the clusters c11 and c21, as well as the clusters c12 and c22, represent the same files. To avoid this kind of redundancy, we propose to identify representative mobility classes that are used to compute the dimension of the file tree.
Representative mobility classes are those mobility classes that show significant differences in terms of advertisement volumes. Let β > 1 be a significance factor. A mobility class mi is said to be significantly greater than a mobility class mj (denoted as mi > mj) if and only if
$$\frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m_j)} \geq \beta$$
[Figure 5-7 shows the root cluster C0; depth 1, associated with m1: C11, C12; depth 2, associated with m2: C21, C22, C23, C24; depth 3, associated with m3: C31 to C38]
Figure 5-7: The redundancy created by considering all mobility classes
A mobility class mi is said to be significantly less than a mobility class mj (denoted as mi < mj) if and only if mj > mi. The mobility classes mi and mj are called significantly different if and only if mi > mj or mj > mi.
Let nf be the number of sharable files and let M be the set of all mobility classes considered by a peer during information sharing. A set Mimp ⊂ M is called a set of representative classes if and only if it satisfies the following properties:
[1] mi < mj or mj < mi, ∀mi, mj ∈ Mimp with mi ≠ mj (representative classes are pairwise significantly different);
[2] for every m ∈ M - Mimp, one of the following properties is satisfied:
  a. ∃mi ∈ Mimp such that $\frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m)} < \beta$
  b. $\frac{nf}{\text{adv-volume}(m)} < \beta$
[3] for every m ∈ Mimp, one of the following conditions holds:
  a. the mobility class is found inside the list, i.e., ∃mi, mj ∈ Mimp - {m} such that mi < m < mj;
  b. the mobility class is found at the end of the list, i.e., mi < m, ∀mi ∈ Mimp - {m}, and β × adv-volume(m) ≤ nf;
  c. the mobility class is found at the beginning of the list, i.e., m < mi, ∀mi ∈ Mimp - {m}, and there is no mj ∈ M such that mj < m.
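Let us consider the mobility classes used to produce the file tree in Figure 5-7 (recall that m1, m2 and m3 have advertisement volumes of 3, 4 and 8 metadata in this example). Assume that nf is 16 and β is 2. Since 8/4 = 2 ≥ β, m3 > m2, whereas m1 and m2 are not significantly different (4/3 < β); moreover, β × adv-volume(m3) = 16 ≤ nf. Hence, m2 and m3 are the representative mobility classes.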
Algorithm 5-2 computes the representative mobility classes as follows. As described in line 2, all mobility classes in M whose advertisement volume is significantly less than nf (the number of sharable files) are placed in a set named M'. The algorithm then identifies, from M', the mobility class having the maximum advertisement volume as a representative class (step 1, lines 4 and 5). Let us call this mobility class mcp. The algorithm then re-initializes M' so that it contains only the mobility classes significantly less than mcp (step 2, line 6). Steps 1 and 2 are repeated until M' becomes empty.
Algorithm: representative mobility class computation
Input: M, β, nf
  M: list of mobility classes
  β: significance factor
  nf: the number of sharable files
Output: Mimp
  Mimp: list of representative classes
Begin
    /*initialization*/
1.  Mimp = ∅
    /*identify mobility classes that have advertisement volumes significantly less than nf*/
2.  M' = {m | m ∈ M and adv-volume(m) * β ≤ nf}
    /*compute representative mobility classes*/
3.  While (M' ≠ ∅)
      /*Step 1: identify the mobility class having the maximum adv-volume as a representative class*/
4.    Remove mcp ∈ M' such that adv-volume(mcp) ≥ adv-volume(m) ∀m ∈ M'
5.    Mimp += {mcp}
      /*Step 2: reinitialize M' to contain mobility classes significantly less than mcp*/
6.    M' = {m | m ∈ M' and m < mcp}
7.  End while
End Algorithm
Algorithm 5-2: representative mobility class computation
The condition Mimp = ∅ indicates that the number of sharable files is not significantly different from the advertisement volume attached to any of the mobility classes. In this case, advertisements can be made by using the metadata of all individual files; thus, classification is not needed. Otherwise, the height of the tree is |Mimp| and, with the classes of Mimp sorted in increasing order of advertisement volume, the number of clusters at depth i equals the advertisement volume attached to the ith mobility class (as in Figure 5-6, where the shallowest depth corresponds to the smallest volume).
5.4 Information Sharing Based on File Organization
5.4.1 Information Advertisement
As discussed in chapter 3, a data source can advertise by using the metadata of every sharable file. As discussed at the beginning of this chapter, this kind of advertisement overloads the environment with queries. In this chapter, we propose advertising files by using the descriptions of clusters that represent groups of sharable files. The advertisement message can contain only clusters found at the shallowest or the deepest level of a file tree. The current mobility class is used to determine the volume of advertisements (chapter 4), and the overall demand of the peers in the MANET view is used to determine their content (chapter 3).
Files and clusters are mapped to the users' interests in the overall demand according to their mutual similarities. Let F(I) and C(I) be the files and clusters matching the interest I. A file f and a cluster c are placed in the sets F(I) and C(I), respectively, if and only if (1) f and c are relevant to I and (2) for any other interest Ij in the overall demand, f and c are more relevant to I than to Ij. The relevance of files and clusters is computed according to their similarity to the interest.
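The mapping of files (or clusters) to interests can be sketched as follows. The thesis computes relevance with the similarity measures of chapter 3; here, purely as an assumption for a self-contained example, keyword sets are compared with the Jaccard coefficient, and "relevant" means a similarity of at least `min_sim`. Ties between interests are resolved in favor of the first one listed, which the text leaves unspecified.

```python
def jaccard(a, b):
    """Keyword-overlap similarity, used here as a stand-in for Similarity()."""
    return len(a & b) / len(a | b) if a | b else 0.0


def map_to_interests(elements, interests, similarity, min_sim):
    """Build F(I) (or C(I)) for every interest I of the overall demand."""
    matched = {i: [] for i in interests}
    for e in elements:
        if not interests:
            break
        # Condition (2): e goes to the interest it is most relevant to.
        best = max(interests, key=lambda i: similarity(e, i))
        # Condition (1): e must also be relevant to that interest.
        if similarity(e, best) >= min_sim:
            matched[best].append(e)
    return matched


music = frozenset({"music", "jazz"})
news = frozenset({"news", "politics"})
f1 = frozenset({"jazz", "music", "live"})
f2 = frozenset({"news", "sport"})
f3 = frozenset({"cooking"})  # matches no interest
F = map_to_interests([f1, f2, f3], [music, news], jaccard, min_sim=0.3)
```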
We propose to compute the content of advertisements with Algorithm 5-3, according to the interests of users, the mobility class of the MANET view and the arrangement of the sharable files in the file tree. Let m be the mobility class describing the current MANET view and let Sod be the overall demand of the peers in the MANET view. The data source peer prepares its advertisements according to Sod and the advertisement volume attached to the mobility class m. The total volume of advertisements over the interests in Sod should be adv-volume(m), and the weights of the interests in Sod sum to 1; thus, the advertisement quota for an interest I in Sod, denoted N(I), is computed as:
$$N(I) = \mathrm{weight}(I) \times \text{adv-volume}(m)$$
Let F be the set of files and Ck the set of clusters found at depth k of the file tree. Let F(I) ⊆ F and Ck(I) ⊆ Ck be the sets of files and clusters matching the interest I, and let ADV(I) be the advertisement container for the interest I. For an empty interest Ie, i.e., Description(Ie) = ∅, F(Ie) and Ck(Ie) are computed as follows:
• $F(I_e) = F - \bigcup_{I \in S_{od} - \{I_e\}} F(I)$
• $C_k(I_e) = C_k - \bigcup_{I \in S_{od} - \{I_e\}} C_k(I)$
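The quota formula and the sets of the empty interest can be computed as below. The sketch is illustrative only: interests and files are plain strings, and because the text does not say how a fractional N(I) is rounded, the quotas are left as floats.

```python
def advertisement_quotas(weights, adv_volume):
    """N(I) = weight(I) * adv-volume(m) for every interest I in Sod."""
    return {interest: w * adv_volume for interest, w in weights.items()}


def leftover(all_elements, matched):
    """F(Ie) (or Ck(Ie)): elements matching none of the non-empty interests."""
    covered = set().union(*matched.values())
    return [e for e in all_elements if e not in covered]


quotas = advertisement_quotas({"music": 0.5, "news": 0.25, "Ie": 0.25}, adv_volume=8)
print(quotas)  # {'music': 4.0, 'news': 2.0, 'Ie': 2.0}
print(leftover(["f1", "f2", "f3", "f4", "f5"], {"music": ["f1", "f2"], "news": ["f3"]}))
# ['f4', 'f5']
```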
Let E be a set of files/clusters; the set Relevant(E, I, n) represents the n most relevant
(similar) elements of E with respect to the interest I, i.e., similarity(ei, I) ≥ similarity(ej, I)
for ∀ei ∈ Relevant(E, I, n), ej ∈ E- Relevant(E, I, n).
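Relevant(E, I, n) is a top-n selection, which Python's `heapq.nlargest` expresses directly. The `similarity` argument and the scores below are hypothetical; `n` is truncated to an integer, an assumption for fractional quotas.

```python
import heapq


def relevant(elements, interest, n, similarity):
    """Relevant(E, I, n): the n elements of E most similar to I (all of E if |E| < n)."""
    return heapq.nlargest(int(n), elements, key=lambda e: similarity(e, interest))


scores = {"c1": 0.2, "c2": 0.9, "c3": 0.5}  # hypothetical similarities to one interest
sim = lambda e, interest: scores[e]
print(relevant(["c1", "c2", "c3"], "music", 2, sim))  # ['c2', 'c3']
```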
Algorithm 5-3 selects the metadata of files and clusters to be distributed in the environment as follows. As indicated in lines 3 to 6, the metadata of all files in F(I) are selected if N(I) is large enough to advertise these files one by one. Otherwise, starting from the leaves of the file tree, the algorithm searches for the deepest depth of the tree at which the number of clusters matching I is less than N(I) (lines 7-10).
Let us call this depth k. If the search is unsuccessful, the metadata of the N(I) most relevant clusters at depth 1 are placed in ADV(I) (lines 11-14). Otherwise, as described in lines 15 to 22, the metadata of the most relevant clusters from depth k down to depth h (the height of the tree) are placed in ADV(I) according to their depth in the file tree and their similarity to the interest I. Finally, if some slots of ADV(I) are still free, they are filled with the metadata of the most relevant individual files (lines 23-25).
Algorithm: Advertisement content determination
Input: h, Sod, F(I) ∀I ∈ Sod, Ck(I) for 0 < k ≤ h and every I ∈ Sod
  h: height of the file tree
  Sod: overall demand
  F(I): files matching the interest I
  Ck(I): clusters matching the interest I and found at depth k
Output: ADV(I) for all I ∈ Sod
  ADV(I): advertisement for every I ∈ Sod
Begin
1.  For each I ∈ Sod
2.    ADV(I) = ∅
      /*select all metadata of files if N(I) is large enough to advertise them one by one*/
3.    If (N(I) ≥ |F(I)|)
4.      ADV(I) = {metadata(f) | f ∈ F(I)}
5.      Continue with the next interest of Sod
6.    End If
      /*search the deepest depth where there are fewer than N(I) clusters*/
7.    k = h
8.    While ((k > 0) && (N(I) ≤ |Ck(I)|))
9.      k--
10.   End while
      /*if there is no depth with fewer than N(I) clusters, select some of the clusters at depth one and continue with the next interest*/
11.   If (k == 0)
12.     ADV(I) = {metadata(c) | c ∈ Relevant(C1(I), I, N(I))}
13.     Continue with the next interest of Sod
14.   End If
      //select clusters according to their depth in the file tree
15.   While ((|ADV(I)| < N(I)) && (k ≤ h))
16.     If (N(I) - |ADV(I)| ≥ |Ck(I)|)
17.       ADV(I) = {metadata(c) | c ∈ Ck(I)} ∪ ADV(I)
18.     Else
19.       ADV(I) = {metadata(c) | c ∈ Relevant(Ck(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
20.     End If
21.     k++
22.   End while
      //select some of the files if there are still free slots in ADV(I)
23.   If (|ADV(I)| < N(I))
24.     ADV(I) = {metadata(f) | f ∈ Relevant(F(I), I, N(I) - |ADV(I)|)} ∪ ADV(I)
25.   End If
26. End for
End Algorithm
Algorithm 5-3: Advertisement content determination
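The following Python sketch mirrors Algorithm 5-3 under stated assumptions: metadata are represented by element names, `clusters[I][k]` holds Ck(I) for k = 1..h, and hypothetical relevance scores replace the similarity measure of the thesis. Where the pseudocode stops processing an interest early, the sketch moves on to the next interest.

```python
import heapq


def top(elements, n, score):
    """Relevant(E, I, n) with a precomputed scoring function."""
    return heapq.nlargest(max(int(n), 0), elements, key=score)


def advertisement_content(h, quota, files, clusters, similarity):
    """quota[I] = N(I); files[I] = F(I); clusters[I][k] = Ck(I) for k = 1..h."""
    adv = {}
    for interest, n in quota.items():
        score = lambda e, I=interest: similarity(e, I)
        f_i, c_i = files[interest], clusters[interest]
        # Lines 3-6: advertise files one by one if the quota allows it.
        if n >= len(f_i):
            adv[interest] = list(f_i)
            continue
        # Lines 7-10: deepest depth k with fewer than N(I) matching clusters.
        k = h
        while k > 0 and n <= len(c_i[k]):
            k -= 1
        # Lines 11-14: no such depth, so advertise the best depth-1 clusters.
        if k == 0:
            adv[interest] = top(c_i[1], n, score)
            continue
        # Lines 15-22: fill the quota from depth k down to the leaves.
        selected = []
        while len(selected) < n and k <= h:
            free = n - len(selected)
            if free >= len(c_i[k]):
                selected += c_i[k]
            else:
                selected += top(c_i[k], free, score)
            k += 1
        # Lines 23-25: remaining slots go to the most relevant files.
        if len(selected) < n:
            selected += top(f_i, n - len(selected), score)
        adv[interest] = selected
    return adv


scores = {"f1": 0.1, "f2": 0.2, "f3": 0.9, "f4": 0.3, "f5": 0.4, "f6": 0.5,
          "f7": 0.6, "f8": 0.7, "C11": 0.8, "C21": 0.5, "C22": 0.6, "C23": 0.7, "C24": 0.4}
sim = lambda e, interest: scores[e]  # hypothetical relevance scores
F_I = [f"f{i}" for i in range(1, 9)]
C_I = {1: ["C11"], 2: ["C21", "C22", "C23", "C24"]}
adv = advertisement_content(2, {"music": 5, "news": 1},
                            {"music": F_I, "news": F_I},
                            {"music": C_I, "news": C_I}, sim)
print(adv["music"])  # ['C21', 'C22', 'C23', 'C24', 'f3']
print(adv["news"])   # ['C11']
```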
A source can make another advertisement after adv-period(m). In the meantime, the information discovery method tries to refine its knowledge about the stay-times of peers and the mobility class of the MANET view. Moreover, in addition to the mobility class, the volume of advertisement is affected by Adv-usage(ps), the advertisement usage factor of ps described in Definition 4.3 (chapter 4). Before sending its advertisements, a data source asks its neighbors for their advertisement usage factors and for the volume of advertisements¹⁴ that they will distribute in the next period.
Assume that the peers in Pn are direct neighbors, and let Adv-volume-total be the total volume of advertisements that the peers in Pn will distribute in the next period. The advertisement volume of ps ∈ Pn is zero if its advertisement usage factor is zero. Otherwise, it is calculated as follows:
$$\text{Adv-volume}(p_s) = \frac{\text{Adv-usage}(p_s)}{\sum_{p \in P_n} \text{Adv-usage}(p)} \times \text{Adv-volume-total}$$
Taking the usage factor into account in the advertisement volume computation gives more chances to popular peers and minimizes the unnecessary advertisements produced by less popular peers.
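The proportional allocation above amounts to a few lines. The usage factors and the total of 30 metadata below are made-up values, and returning zero for every peer when no neighbor has any usage is an assumption, since the formula is undefined in that case.

```python
def advertisement_volumes(usage, total_volume):
    """Share Adv-volume-total among the neighbours Pn in proportion to Adv-usage."""
    total_usage = sum(usage.values())
    if total_usage == 0:
        return {p: 0.0 for p in usage}  # assumption: nobody gets a share
    return {p: u * total_volume / total_usage for p, u in usage.items()}


shares = advertisement_volumes({"p1": 0.5, "p2": 0.25, "p3": 0.0}, total_volume=30)
print(shares)  # {'p1': 20.0, 'p2': 10.0, 'p3': 0.0}
```

A peer with a zero usage factor receives a zero volume, as required by the text.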
5.4.2 Information Discovery
In chapter 3, we discussed query resolution according to the information provisions of users. In this chapter, we discuss resolving queries by using the received advertisements. A query is resolved in two phases: an information discovery phase, which discovers the peers owning files matching the query, and an information delivery phase, which fetches the files.
Let F(q) be the files matching a user query q, which is expressed as a list of keywords, and let C(q) be the clusters matching q. A file f and a cluster c are placed in F(q) and C(q), respectively, if they are relevant to the query q. A file fi is relevant to a query q if and only if fi is similar to q. A file fi is more relevant to the query q than a file fk if
$$\mathrm{Similarity}(f_i, q) > \mathrm{Similarity}(f_k, q)$$
Clusters are compared with queries in the same way.
14 As a peer computes mobility classes independently, neighboring peers can have different advertisement volumes.
Let owners(e) be the set of peers owning an element e (which represents a cluster/file).
Let downloadTime(f) be the time needed to download a file f and let disAnddelTime(c) be
the average time needed to discover and deliver a file grouped under a cluster c.
downloadTime(f) is estimated from the attribute “FileSize” in the metadata of the file.
disAnddelTime(c) is estimated by using the average size of files represented by the cluster
c (i.e., by using the attribute “AvgSizeFile” in the metadata of a cluster).
Let pr be the data requester posing the query q. As discussed in section 4.5.1, stay-time(pi, pj) is the time that peers pi and pj stay together. For a set of elements E, let Relevant(E, k, q) be the subset of E containing its most relevant elements, i.e., satisfying the following properties:
• |Relevant(E, k, q)| = min(|E|, k)
• similarity(ei, q) ≥ similarity(ej, q) ∀ei ∈ Relevant(E, k, q) and ∀ej ∈ E - Relevant(E, k, q)
Algorithm 5-4 prepares the messages used to deliver or to discover files for the query q, from F(q) and C(q) respectively. More precisely, this algorithm prepares two sets, Delivery(q) and Discovery(q). The set Delivery(q) contains tuples of the form (p, f), where p is a peer owning a file f that matches the query q. Discovery(q) contains tuples of the form (p, q), where p is a peer owning a cluster matching q.
Algorithm 5-4 first removes from F(q) the files that cannot be delivered, i.e., those for which no owner is expected to stay with the requester long enough for the download (line 1). Similarly, it removes from C(q) the clusters whose files cannot be discovered and delivered in time (line 2). The algorithm then selects files from F(q) according to their relevance to q and to the required number of files (lines 4 to 8). The algorithm ends without preparing discovery messages if enough advertised files are found. When |Delivery(q)| < n (n is the number of responses displayed to the user), the owners of the most relevant clusters in C(q) are selected as potential sources of files matching q, and discovery messages are prepared for them (lines 9 to 15).
For each tuple (p, f) in Delivery(q), the metadata of the file f is displayed to the user. If the user approves the download, the file is delivered from the peer p. For each tuple (p, q) in Discovery(q), the query q is sent to the peer p. A peer receiving the query q searches for a file matching it. If the search is successful, the peer sends the description of the file to the requester, which may then decide to download the file from the peer p. We discuss file delivery in the next chapter.
Algorithm: Prepare delivery and discovery messages
Input: F(q), C(q), n, pr
  F(q): files matching the query q
  C(q): clusters matching the query q
  n: maximum number of files searched for a query
  pr: requester peer
Output: Discovery(q), Delivery(q)
  Discovery(q): set of tuples (p, q) where p is a peer owning a cluster matching q
  Delivery(q): set of tuples (p, f) where p is a peer owning a file f matching the query q
Begin
1.  Remove any f in F(q) such that stay-time(pr, pi) < downloadTime(f) for all pi ∈ owners(f)
2.  Remove any c in C(q) such that stay-time(pr, pi) < disAnddelTime(c) for all pi ∈ owners(c)
3.  //prepare delivery messages
4.  For all f ∈ Relevant(F(q), n, q)
5.    For all p ∈ owners(f)
6.      Put (p, f) in Delivery(q)
7.    End For
8.  End For
    //prepare discovery messages if not enough files were found
9.  If (|Delivery(q)| < n)
10.   For all c ∈ Relevant(C(q), n - |Delivery(q)|, q)
11.     For all pi ∈ owners(c)
12.       Put (pi, q) in Discovery(q)
13.     End For
14.   End For
15. End If
End Algorithm
Algorithm 5-4: Prepare delivery and discovery messages
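Algorithm 5-4 can be sketched in Python as follows. Stay-times, download times, scores and peer names are hypothetical inputs, and `stay_time[p]` stands for stay-time(pr, p) for the fixed requester pr. Following the pseudocode, Delivery(q) lists every owner of a selected file, even one whose own stay-time is too short.

```python
import heapq


def prepare_messages(q, files, clusters, n, owners, stay_time,
                     download_time, dis_and_del_time, similarity):
    """Sketch of Algorithm 5-4; returns (Delivery(q), Discovery(q)) as lists."""
    # Line 1: keep a file only if some owner stays long enough to deliver it.
    files = [f for f in files
             if any(stay_time[p] >= download_time[f] for p in owners[f])]
    # Line 2: same test for clusters, using the discovery-and-delivery time.
    clusters = [c for c in clusters
                if any(stay_time[p] >= dis_and_del_time[c] for p in owners[c])]
    score = lambda e: similarity(e, q)
    # Lines 4-8: delivery tuples (p, f) for the n most relevant files.
    delivery = [(p, f) for f in heapq.nlargest(n, files, key=score) for p in owners[f]]
    # Lines 9-15: discovery tuples (p, q) for owners of the best clusters.
    discovery = []
    if len(delivery) < n:
        for c in heapq.nlargest(n - len(delivery), clusters, key=score):
            discovery.extend((p, q) for p in owners[c])
    return delivery, discovery


scores = {"f1": 0.9, "f2": 0.8, "c1": 0.7, "c2": 0.6}
delivery, discovery = prepare_messages(
    "jazz", ["f1", "f2"], ["c1", "c2"], n=2,
    owners={"f1": ["p1"], "f2": ["p2"], "c1": ["p3"], "c2": ["p4"]},
    stay_time={"p1": 100, "p2": 5, "p3": 60, "p4": 60},
    download_time={"f1": 30, "f2": 30},
    dis_and_del_time={"c1": 40, "c2": 40},
    similarity=lambda e, q: scores[e],
)
print(delivery)   # [('p1', 'f1')]
print(discovery)  # [('p3', 'jazz')]
```

Here f2 is dropped because its only owner leaves before the download could finish, so one discovery message is sent to the owner of the best cluster.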
5.5 Discussion
In this thesis, we have proposed to organize files in a file tree in order to facilitate the file advertisement process. The file tree is built bottom-up: first, files are classified into clusters; the clusters are then repeatedly classified into higher-level clusters until a file tree with the required dimension is obtained. The dimension of the tree is computed according to the representative mobility classes, which simplifies the determination of the content of advertisements.
We have proposed an algorithm, named k-means*, to classify files and clusters. This algorithm is derived from the k-means classification algorithm [107,108]. The two algorithms differ in the selection of the group head, which in k-means is the centroid of a cluster. The modification is needed because of the following weaknesses of k-means.
• As group heads are initialized with points that may not represent any element, a cluster may end up containing nothing in k-means [109].
• When the number of files/clusters is small, the initial grouping strongly determines the resulting clusters [110].
Let us consider the content-based classification approach. K-means takes group heads randomly from the vector space. As a result, a group head may be selected in such a way that all files/clusters to be grouped are less similar to this group head than to the other group heads. In this case, the cluster represented by this group head contains nothing. Consider the example displayed in Figure 5-8. Assume that k-means selects v1, v2 and v3 as group heads; some of the files are similar to v1 and the others to v2, but none of the files is more similar to v3 than to v1 or v2; thus, the cluster headed by v3 contains nothing. The k-means* algorithm resolves this problem by initializing group heads with elements that represent the files/clusters to be grouped. As a result, in k-means*, every cluster is initialized in such a way that it contains at least one element.
[Figure 5-8 shows files f1 to f9 and the group heads v1, v2 and v3 in the vector space]
Figure 5-8: A possible result of k-means classification
In k-means, the initial grouping may determine the content of the resulting clusters. In the example displayed in Figure 5-8, the cluster headed by v3 always remains empty, regardless of the number of k-means iterations. As a result, since cluster heads are initially selected at random in k-means, the classification may have no semantic meaning when the number of files/clusters is small. To resolve this problem, our algorithm initializes group heads with mutually dissimilar elements.
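The empty-cluster effect of plain k-means can be reproduced with a few lines of Lloyd's algorithm. The coordinates are invented to mimic Figure 5-8, and keeping the old centroid of an empty cluster is one common convention among several; with the far-away initial head v3 = (20, 0), no file is ever assigned to v3, however many iterations are run.

```python
def lloyd_kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points from given initial centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x, y in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)
            clusters[nearest].append((x, y))
        # An empty cluster keeps its previous centroid (one common convention).
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


files = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (5, 5), (5, 6), (6, 5), (6, 6)]
heads = [(0, 0), (6, 6), (20, 0)]  # v3 = (20, 0) lies far from every file
centroids, clusters = lloyd_kmeans(files, heads)
print([len(c) for c in clusters])  # [5, 4, 0]
```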
5.6 Conclusion
In this chapter, we have discussed the organization of files in a file tree whose dimension is computed as a function of the mobility classes considered during information sharing. Data sources can determine the content of advertisements from the file tree according to the mobility class of the MANET view. We have also demonstrated the use of a file tree in the file discovery process.
Chapter 6 Implementation and Evaluation
In the previous three chapters, we proposed and discussed methods to conduct information discovery in MANETs according to the interests and the stay-times of users. Based on these approaches, we propose a self-adaptive information sharing middleware called SAMi, designed to fulfill the requirements described in chapter 1. In this chapter, we present the design, the implementation and the evaluation of this middleware.
The chapter is based on the research work presented in the previous chapters and in our papers at the International Conference on Wireless Applications and Computing [111,112] and at the fourth IEEE International Conference on Pervasive Services (ICPS’07) [113].
The chapter is organized as follows. The design of the middleware is presented in section
6.1. Section 6.2 discusses the implementation of SAMi in simulated and real environments.
The evaluation of SAMi is covered in section 6.3. Section 6.4 discusses the challenges
encountered during the design and the implementation of the middleware. Finally, we
conclude the chapter in section 6.5.
Chapter 6: Implementation and evaluation
6.1 SAMi: a Self-Adaptive Middleware
In this thesis, we propose a self-adaptive middleware called SAMi that satisfies the following requirements, specified in chapter 1.
• Pervasiveness: nomadic users should be allowed to share information anywhere,
anytime and by using any device.
• Mobility-awareness: the advertisement policy should be determined according to the
dynamicity of the environment, which is described by the mobility patterns of users.
• Interest-awareness: sharable files should be selected according to the users’ interests
to receive information and the users’ interests to provide information should be
considered during query resolution.
• High-level semantics: sharable files should be advertised at high level according to
their similarities.
• Context-aware content delivery: file delivery should be performed according to the
context of users and their environments.
• Social awareness: sharable files should be selected according to the social networks
of the users.
• Data dissemination: advertisements and queries should be disseminated according to
the users’ interests.
SAMi is a pure peer-to-peer middleware: every device participating in information sharing is required to install SAMi. However, thin devices can be helped by more powerful devices to perform complex operations.
Figure 6-1 displays the architecture of SAMi. The main input of the middleware is a query. Personal information, including basic information (age, name, address, etc.), agendas, habits, states (e.g., busy) and interests of users, can also be accepted as inputs.
[Figure 6-1 shows the data stores (local repository, advertisement data-store, personal data-store, rule base and MANET-View data store), the Advertisement Management, File Manager (File Discovery, File Delivery, File Adaptation) and Context Manager modules, and the inputs Query, Agenda, Habit, User basic data, State and Interest]
Figure 6-1: Architecture of SAMi
SAMi stores the data needed to perform information sharing efficiently in four data repositories, namely the local repository, the advertisement data-store, the MANET view data-store and the rule base. A device can contain zero or more data-stores.
The local repository and the advertisement data-store contain descriptions of the sharable files stored on the local machine and in the vicinity, respectively. In addition to the descriptions of sharable files, the advertisement data-store contains platform and service advertisements.
The MANET view data-store contains historical information about sharing activities: the sharing statistics and the mobility classes discussed in chapter 4, as well as previously received queries and the information demands and information provisions of users.
The rule base contains the association rules used to statistically associate the users' contexts with their interests and with the mobility classes.
The middleware is composed of three modules, namely the context manager, the advertisement manager and the file manager. A device can contain one or more modules, and every device that participates in information sharing is required to have the file manager module.
The context manager determines the mobility classes of MANET views and the interests of users from their contexts by using association rules. It also determines the users' information needs by analyzing their agendas, habits and past queries.
The file manager, the core of SAMi, carries out the file management functionalities, which include searching, delivering and classifying files. The file discovery, file delivery and file adaptation modules perform the information sharing activities: the file discovery module searches for information sources, the file delivery module downloads files, and the file adaptation module helps the file delivery module fetch files according to the context and the profiles of users.
The advertisement manager is responsible for making other peers aware of the sharable files stored on the device of a data source. It determines the content and the distribution of advertisements according to the mobility class of the MANET view and the interests of the peers participating in the MANET.
6.1.1 Design goals
The design goals of the SAMi middleware are listed below.
Flexibility: The system should be easy to use for anyone, with minimum effort. It should also adapt to the capacity of mobile devices. This flexibility is achieved by making the middleware use the interfaces of well-known messengers such as Yahoo Messenger. The system should also provide its own user interface when such messengers cannot be used.
Discovery Optimization: The system should decrease the time needed to search for a file. The search time can be decreased by optimizing the use of the push-based information discovery approach.
Fairness: All peers in the network should profit equally from the information exchange. This can be achieved by fixing a quota on the volume of information to be advertised.
Automatic computing: The interests of users and mobility classes of MANET views
should be computed automatically as much as possible. Moreover, to facilitate information
sharing, association rules should be produced for estimating the mentioned profile
information.
Scalability: the middleware should work regardless of the number of users and the
number of sharable files in a MANET.
6.1.2 User Profile
A user profile is the representation of a user in the virtual world. It describes persistent and context-dependent information about the user. Persistent personal information includes age, birthday and sex. Context-dependent personal information includes habits, preferences, agendas and so on.
The agenda and the habits of a user are used to determine his/her activities. A habit indicates repetitive activities of a user in a certain context. For example, while travelling by bus, a user may have the habit of reading news and listening to music. An agenda describes the planned activities of a user. In an agenda, a user can specify the documents that she/he needs to accomplish the planned activities. Examples of user agendas and user habits are given in Figure 6-2.
A preference of a user describes the format of the information that he/she is interested in. For instance, a user may prefer audio data while driving. Preferences of users can vary according to their spatial or temporal context.
The user profile can describe the information demands and provisions of a user. As
discussed in the previous chapter, the information demand describes the interests of a user
to receive information and the information provision describes his/her interests to provide
information.
A user profile can indicate the social groups of a user. A user can be a member of
different groups with respect to professional activities, social relationships and hobbies as
well as their information sharing habits.
User agenda
Start      End       Event     Activity                            Required documents
10 A.M.    12 P.M.   Meeting   Strategic plan preparation          Strategic plan preparation
12 P.M.    1 P.M.    Lunch     -                                   -
1 P.M.     3 P.M.    Meeting   Discussing with business persons    Efficient way of chairing a meeting; How to deal with business persons

User habits
Activity              When        Time needed    Habit
Shopping              Week end    30 minutes     -
Journey in train      Friday      2 hours        Listening music
Talking with friend   Night       10 minutes     Exchange jokes
Figure 6-2: Examples of user agenda and habits
6.1.3 Context Management Module
Dey [114] defines context as “any information that can be used to characterize the
situation of an entity. An entity is a user, a place, or a physical or computational object
that is considered relevant to the interaction between a user and an application, including
the user and application themselves.” According to Winograd [115], “something is context
because of the way it is used in interpretation, not due to its inherent properties. The
voltage on the power lines is a context if there is some action by the user and/or computer
whose interpretation is dependant on it, but otherwise is just part of the environment.”
From the above two definitions, Dejene Ejigu [116], a former Ph.D. student in our research
team, describes context as “an operational term whose definition depends on the intention
of the operations involved on an entity at a particular time and space rather than the
inherent characteristics of the entities and the operations themselves”.
The concept of “sharing context”, defined in chapter 3, is based on this definition of “context”.
A sharing context describes the situation where the user is willing to provide files to others
in the vicinity. There are two types of sharing context: abstract and actual. An abstract
sharing context is a sharing context which is manually specified by a user to describe when
and where he allows others to download files from his machine. An actual sharing context
is derived from an abstract sharing context by considering the actual time and place in
which data were shared.
In the scenario presented in chapter 1, Pascal has a habit of sharing information in a
classroom. Assume that he specifies (“Class-Room”, ø) as an abstract sharing context, where ø
denotes any time. Pascal is interconnected with other students via a MANET in Room 331,
where a course is taking place. According to the course schedule, the course is conducted
from 9 AM to 10 AM. Therefore, (Room-331, [9 AM, 10 AM]) is the actual context
derived from the abstract context (“Class-Room”, ø).
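The derivation of an actual sharing context from an abstract one can be sketched as follows. This is a minimal illustration, not SAMi's implementation (which relies on the RAID-Action Engine and the HCom model): the location-category table, the course schedule and the function name are hypothetical, and `ANY_TIME` stands for ø.

```python
from datetime import time

ANY_TIME = None  # stands for the wildcard ø ("any time")

# Hypothetical context data; in SAMi such data is managed through the HCom model.
LOCATION_CATEGORY = {"Room-331": "Class-Room"}
SCHEDULE = {"Room-331": (time(9, 0), time(10, 0))}  # e.g. the course timetable


def derive_actual_context(abstract, location):
    """Instantiate an abstract context (category, time window) for a concrete place.

    Returns (location, (start, end)) or None if the place does not fit.
    """
    category, window = abstract
    if LOCATION_CATEGORY.get(location) != category:
        return None  # the place is not of the requested category
    slot = SCHEDULE.get(location)
    if slot is None:
        return None  # no scheduled activity: the actual time cannot be derived
    if window is not ANY_TIME and not (window[0] <= slot[0] and slot[1] <= window[1]):
        return None  # the scheduled slot lies outside the allowed time window
    return (location, slot)


print(derive_actual_context(("Class-Room", ANY_TIME), "Room-331"))
```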
As displayed in Figure 6-3, the context manager module uses the RAID-Action Engine
proposed by Dejene Ejigu [116] to interpret sharing contexts. The engine uses the HCom
model, also proposed by Dejene Ejigu [116], to manage the context semantics and the context
data. The RAID-Action Engine uses the Jena reasoner [117] to produce actions, which are used to
identify the context dependent personal profile, the mobility classes of MANET views and
the information needs of the user.
SAMi identifies the mobility class of a MANET view as follows. The RAID-Action Engine
takes the actual sharing context of a data source peer as input and identifies the abstract
sharing context that matches it. Let us denote the actual context by cA and the data source
peer by p. The engine then determines the set of mobility classes defined by p for the
matching abstract sharing context; let us denote this set by M(p,cA). A mobility class is
selected from M(p,cA) by using the association rules and the actual sharing context of the
data source.
Figure 6-3: Context management in SAMi (the Context Manager, comprising the RAID-Action Engine and a rule mining component, uses the user agenda, user habits and actual sharing context as inputs, draws on the personal data-store, the MANET view data store and the rule base, and outputs the context dependent personal profile, the mobility class and the information needs)
Indeed, as discussed in section 4.4.2, mobility classes are determined by using rules
stored in the rule base data-store. For instance, the rule <context =
(Restaurant,∅)> <mobility class = m3> indicates that the MANETs observed in a
restaurant at any time15 are described by a mobility class m3.
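The rule lookup described above can be sketched with plain pattern matching. The sketch does not use Jena; the rule encoding, the representation of times as decimal hours and the function names are assumptions made for illustration.

```python
ANY_TIME = None  # the wildcard ∅ ("any time")

# Hypothetical rule base: (location category, hour interval or ANY_TIME) -> mobility class.
RULE_BASE = [
    (("Restaurant", ANY_TIME), "m3"),
    (("Bus", (8.0, 8.5)), "m1"),
]


def rule_matches(rule_context, category, hour):
    """True if the rule's abstract context covers the actual context (category, hour)."""
    rule_category, interval = rule_context
    if rule_category != category:
        return False
    return interval is ANY_TIME or interval[0] <= hour < interval[1]


def candidate_classes(category, hour):
    """Mobility classes of all rules whose context matches the actual context."""
    return [cls for ctx, cls in RULE_BASE if rule_matches(ctx, category, hour)]


print(candidate_classes("Restaurant", 13.0))  # -> ['m3']
```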
SAMi also uses the RAID-Action Engine to identify the interests of the user by analyzing
association rules. The association rules associate the contexts of users with their interests.
They can also be used to associate social networks of users with their interests.
As discussed in chapter 3, habit rules are used to determine the information provisions of
users. The following rule may be used to determine the information provision of a user on
a bus.
15 ∅ is used to represent any time.
<context = (Bus, ∅)> <information provision = {({Football}, 0.5), ({news}, 0.5)}>
The social groups present in the MANET view can also be used to determine the
information provision of a user. The following rules may be used to determine the
information provisions of users in these groups.
<Group= colleagues > <information provision = {({research}, 0.7), ({news},0.3)}>
<Group=friends > <information provision = {({news}, 0.9), ({music}, 0.10)}>
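How several applicable rules are combined is not specified in this section; the sketch below assumes one simple scheme, summing the weights that the applicable context and group rules give to each interest and renormalizing them to 1. The rule tables reproduce the example rules above; the merging scheme and the function name are hypothetical.

```python
from collections import defaultdict

# Example habit rules from the text (keywords lower-cased).
CONTEXT_RULES = {"Bus": [({"football"}, 0.5), ({"news"}, 0.5)]}  # context (Bus, ∅)
GROUP_RULES = {
    "colleagues": [({"research"}, 0.7), ({"news"}, 0.3)],
    "friends": [({"news"}, 0.9), ({"music"}, 0.1)],
}


def information_provision(location, groups):
    """Merge all applicable rules: sum the weights per interest, then normalize to 1."""
    applicable = []
    if location in CONTEXT_RULES:
        applicable.append(CONTEXT_RULES[location])
    applicable += [GROUP_RULES[g] for g in groups if g in GROUP_RULES]
    totals = defaultdict(float)
    for rule in applicable:
        for keywords, weight in rule:
            totals[frozenset(keywords)] += weight
    norm = sum(totals.values())
    return {interest: w / norm for interest, w in totals.items()} if norm else {}


print(information_provision("Bus", ["friends"]))
```

With both the bus rule and the friends rule applicable, news dominates (about 0.7) because both rules mention it.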
The RAID-Action Engine is also used to identify the information need of users from their
context dependent personal profile. As described in the previous section, the context
dependent personal profile contains the user agenda, habits, preferences and interests.
The information needs of a user can be determined from his agenda and habits. Assume
that Pascal has the habit of listening to music during long journeys, with a preference for
Whitney Houston’s songs. SAMi starts searching for these songs when he plans to
travel to another country.
A user can describe the information that he needs to perform the activities in his agenda;
for example, he can specify that documents about “How to deal with business
persons” are needed for a meeting with businesspersons. The information needs of a user
are also determined by his interests. For instance, sport news should be searched for a
user interested in such news.
To sum up, the context manager module determines the profile of a MANET and of the
users participating in the network as well as their information needs. The module works by
using habit rules. Habit rules with respect to mobility classes and user interests are
produced by the rule-mining component of the context manager module. This component
works as described in chapters 3 and 4.
6.1.4 Information Sharing in SAMi
In SAMi, information sharing is performed in two phases: information discovery and
information delivery. The information discovery phase is used to discover files, while the
information delivery phase is used to fetch the selected file.
The information discovery phase is guided by an advertisement policy. As discussed in
chapter 3, an advertisement policy is used to determine the volume of advertisement, its
radius and its frequency.
An advertisement policy can be determined from the mobility class that characterizes the
MANET view. Mobility classes are identified from the stay-time of the users and the
association rules. If it is impossible to determine the mobility class, an inactive mobility
class is considered as default class.
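The choice of an advertisement policy can be sketched as a lookup keyed by the mobility class. In this sketch the class is derived from the estimated stay-time alone, using the range-lifetimes of Table 6-7, and the association rules are left out; the `AdvertisementPolicy` fields and all numeric policy values are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdvertisementPolicy:
    volume: int      # max number of file/cluster descriptions per advertisement
    radius: int      # max number of hops an advertisement travels
    period_s: float  # seconds between two successive advertisements


BOUNDS = [15, 30]            # range-lifetime bounds in minutes (Table 6-7)
CLASSES = ["m1", "m2", "m3"]  # [0,15), [15,30), [30,inf)

# Hypothetical policy values, for illustration only.
POLICIES = {
    "m1": AdvertisementPolicy(volume=5, radius=1, period_s=10.0),
    "m2": AdvertisementPolicy(volume=20, radius=2, period_s=30.0),
    "m3": AdvertisementPolicy(volume=50, radius=3, period_s=60.0),
    "inactive": AdvertisementPolicy(volume=5, radius=1, period_s=120.0),
}


def mobility_class(stay_time_min: Optional[float]) -> str:
    """Map an estimated stay-time to a class; fall back to the inactive class."""
    if stay_time_min is None or stay_time_min < 0:
        return "inactive"  # the class cannot be determined -> default class
    return CLASSES[bisect.bisect_right(BOUNDS, stay_time_min)]


def advertisement_policy(stay_time_min: Optional[float]) -> AdvertisementPolicy:
    return POLICIES[mobility_class(stay_time_min)]


print(advertisement_policy(20))
```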
The information demands of users are used to determine the files to be advertised. The
information demands of users are identified by habit rules. If it is not possible to determine
the information demand of a user, an empty sharing interest is taken as default demand.
The advertisement manager module of SAMi advertises descriptions of files (metadata of
files and clusters) to other peers in the vicinity by using the algorithm proposed in section
5.3.2. A data requester can identify the files that he is looking for from the advertisements.
A data requester peer can also discover files by distributing queries as discussed in chapter
3.
After the information discovery phase is completed, the information delivery phase
starts. The purpose of this phase is to select one or more information sources to deliver a
file. Information delivery is performed as follows:
• SAMi identifies information sources that can deliver the whole file. Such a source is
identified by checking whether the file can be transferred within the time that the source
and the peer stay together16. If there are several such information sources, SAMi selects
one according to how far it is and how long it will stay around.
16 Remember that the stay-time of two peers is computed by taking the intermediate peers into consideration.
• If the search in the above step is not successful, SAMi searches for a combination of peers
(p1, p2, …, pk) such that each information source pi delivers a portion of the file,
called sfi, and merging (sf1, sf2, …, sfk) gives the required file.
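The two delivery steps above can be sketched as follows, assuming each candidate source reports its hop distance, its estimated stay-time and an effective throughput. The throughput field, the tie-breaking order (closest first, then longest stay) and the greedy split in the second step are assumptions of this sketch, not SAMi's documented policy.

```python
from dataclasses import dataclass


@dataclass
class Source:
    peer_id: str
    distance: int         # hops to the requester
    stay_time_s: float    # estimated time the source stays reachable
    bandwidth_bps: float  # hypothetical effective throughput


def deliverable_bytes(src):
    """Bytes the source can send before it leaves (stay-time x throughput)."""
    return src.stay_time_s * src.bandwidth_bps / 8


def select_sources(file_size, sources):
    """Return a plan [(peer_id, start_byte, end_byte), ...], or [] if none exists."""
    # Step 1: one source able to deliver the whole file; prefer close, long-staying ones.
    whole = [s for s in sources if deliverable_bytes(s) >= file_size]
    if whole:
        best = min(whole, key=lambda s: (s.distance, -s.stay_time_s))
        return [(best.peer_id, 0, file_size)]
    # Step 2: split the file over several sources, largest capacity first (greedy).
    plan, offset = [], 0
    for s in sorted(sources, key=deliverable_bytes, reverse=True):
        if offset >= file_size:
            break
        chunk = min(int(deliverable_bytes(s)), file_size - offset)
        if chunk > 0:
            plan.append((s.peer_id, offset, offset + chunk))
            offset += chunk
    return plan if offset >= file_size else []
```

For example, a 3 MB file and two sources that can send 2 MB and 1.25 MB yield a two-portion plan; if the total capacity is too small, the empty plan signals that delivery is not possible now.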
Faults can occur in both the information discovery and the information delivery phases. The
requester peer estimates the maximum time required to discover files for a query. The
information discovery for the query fails if the requester peer does not receive the required
number of responses to the distributed discovery messages within the estimated time. In that
case, discovery messages are sent to other peers.
When a peer requests an information source to deliver a file or a portion of it, SAMi
estimates the maximum time needed to get the requested file or portion. If the peer does not
receive it within that time, the middleware concludes that the delivery of the file or of the
portion has failed. In this case, it searches for other information sources that can deliver
the required file or portion.
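The timeout-based fault handling for delivery can be sketched as below. The safety margin of 1.5, the `fetch` callback and the candidate list format are assumptions of the sketch; how SAMi actually estimates the maximum delivery time is not specified here.

```python
def estimated_timeout(size_bytes, bandwidth_bps, margin=1.5):
    """Maximum expected transfer time in seconds, with an assumed safety margin."""
    return margin * size_bytes * 8 / bandwidth_bps


def deliver_portion(portion, candidates, fetch):
    """Try each (peer, bandwidth_bps) candidate in turn until one delivers the portion.

    fetch(peer, portion, timeout) must return the bytes, or None if the timeout
    expired; it is a hypothetical stand-in for the real transfer call.
    """
    start, end = portion
    for peer, bandwidth_bps in candidates:
        data = fetch(peer, portion, estimated_timeout(end - start, bandwidth_bps))
        if data is not None and len(data) == end - start:
            return peer, data  # delivered within the estimated time
    return None  # every candidate failed: the portion could not be delivered
```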
In the above paragraphs, we have discussed the normal information delivery process,
i.e., when the following conditions hold:
1. The context of the requester peer matches the format of the file that he is looking for.
2. The peers owning the required file stay connected with the requester peer until
the delivery process is completed.
In reality, these conditions may not always hold. Therefore, we apply offline
delivery and content adaptation when these conditions are not satisfied.
In pervasive computing environments, it is possible that a user discovers an important
file but he is not able to download the file. Offline information delivery can be performed
if a requester peer does not need the information right away. The delivery can be
performed offline, by using email for example in the following two cases:
• the source and the requester peers know each other
• the requester peer cannot wait until the download is completed, but the source
peer can deliver the file to another peer such that this peer and the requester peer
know each other
In the first case, the source and the target exchange the file later. In the second case, the
source delivers the file to an intermediate peer in a MANET and this peer will deliver the
file to the requester peer via offline delivery.
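The choice between the two offline delivery cases can be sketched as a small decision function. The `knows` predicate (true when two users can reach each other later, e.g. by email), the peer names and the return format are illustrative assumptions.

```python
def offline_plan(source, requester, manet_peers, knows):
    """Choose an offline delivery route when the requester cannot wait.

    knows(x, y) is True if x and y can reach each other later (e.g. by email).
    Returns ("direct", source), ("relay", peer) or None.
    """
    if knows(source, requester):
        return ("direct", source)  # case 1: source and requester exchange the file later
    for peer in manet_peers:
        if peer not in (source, requester) and knows(peer, requester):
            return ("relay", peer)  # case 2: hand the file now to a peer the requester knows
    return None  # no offline route exists
```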
Adaptation of content is a solution when the format of the information does not match
the context of the user. Assume that Pascal is at the airport waiting for a delayed flight and
looks for some jokes to pass the time. He finds jokes in text format, but he cannot read
them since he is taking care of his baby. Here, the middleware has to convert the text to
audio.
SAMi uses the ConAMi system proposed by Yaser Fawaz, a PhD student in our research
team, to perform the content adaptation process [112,113,118]. ConAMi determines the
adaptation process by comparing the format of the localized file with the format that fits
the context of the user and his environment. In ConAMi, the adaptation process is divided
into simple adaptation tasks in such a way that each of these tasks can be performed by a
single service. Figure 6-4 shows an example of an adaptation process.
Figure 6-4: Example of adaptation process (tasks: TextToAudioConversionTask, TextSummarizationTask, TextTranslationTask)
ConAMi performs content adaptation as follows. It identifies the tasks involved in the
adaptation process. It then searches for services that execute these tasks. Next, it
constructs the content adaptation tree, which shows the best service composition plans.
The services in the optimal path of the tree are used to execute the required content
adaptation. More precisely, content adaptation in SAMi is implemented as shown in
Figure 6-5.
Figure 6-5: Implementation of ConAMi by the file adaptation module (inputs: file to be adapted, context dependent personal profile, environmental context, adaptation ontology, adaptation rules and the adaptation services available in the MANET; output: adapted file)
The file adaptation module accepts as inputs: the file to be adapted, the environmental
context (e.g., screen dimensions, memory size, CPU speed, darkness, noisiness,
speechlessness and bandwidth), the context dependent personal profile (e.g., user’s
preferences), the adaptation rules, and the adaptation ontology. Adaptation ontology is
used to describe the entities involved in the adaptation process such as device, user,
network, location, adaptation service and data. The adaptation ontology is built based on
the EHRAM model proposed by Dejene Ejigu [116]. Adaptation rules are a set of predefined
rules identified by Yaser Fawaz [116]. They are used to determine the tasks
involved in the adaptation process according to the user’s preferences, the device
capabilities, and the network bandwidth [118].
The file adaptation module manages, analyzes and performs reasoning on the input data in
order to determine the tasks needed to perform content adaptation on behalf of the
requester peer. The identified tasks are then used to select the adaptation services that
perform the content adaptation process.
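Selecting services along the optimal path of the content adaptation tree can be illustrated with a small search. This is a sketch under simplifying assumptions, not ConAMi's algorithm: each service has a single additive cost, two services can be chained when the output format of one equals the input format of the next, and the task order (summarization, translation, then text-to-audio) and the services S1, S2, T1, A1 and A2 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    task: str     # adaptation task the service executes
    in_fmt: str   # format it accepts
    out_fmt: str  # format it produces
    cost: float   # hypothetical QoS cost (e.g. expected execution time)


def best_plan(tasks, services, start_fmt):
    """Depth-first search of the adaptation tree; level i offers the services for tasks[i].

    Returns (total_cost, [service names]) of the cheapest chain whose formats fit
    together, or (inf, None) if no chain exists.
    """
    best = (float("inf"), None)

    def explore(i, fmt, cost, path):
        nonlocal best
        if cost >= best[0]:
            return  # prune: this branch is already no better than the best plan
        if i == len(tasks):
            best = (cost, path)
            return
        for s in services:
            if s.task == tasks[i] and s.in_fmt == fmt:
                explore(i + 1, s.out_fmt, cost + s.cost, path + [s.name])

    explore(0, start_fmt, 0.0, [])
    return best


SERVICES = [
    Service("S1", "TextSummarizationTask", "text/fr", "text/fr", 2.0),
    Service("S2", "TextSummarizationTask", "text/fr", "text/fr", 1.0),
    Service("T1", "TextTranslationTask", "text/fr", "text/en", 3.0),
    Service("A1", "TextToAudioConversionTask", "text/en", "audio", 4.0),
    Service("A2", "TextToAudioConversionTask", "text/fr", "audio", 1.0),
]
TASKS = ["TextSummarizationTask", "TextTranslationTask", "TextToAudioConversionTask"]
print(best_plan(TASKS, SERVICES, "text/fr"))  # (8.0, ['S2', 'T1', 'A1'])
```

A2 is cheap but never selected, because after translation the text is no longer in the format it accepts.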
6.1.5 Deployment
Some of SAMi’s functionalities (e.g., classification of information and rule mining)
are too expensive to be performed in a MANET. Fortunately, it is enough to perform these
activities occasionally by using heavyweight devices. Therefore, we decompose the
middleware into two components: SAMi-basic and SAMi-ext. SAMi-basic
performs the basic functionalities of the middleware that are needed to perform a file
sharing activity in a MANET. SAMi-ext performs functionalities that are expensive but
need not be performed frequently. An example deployment of SAMi is displayed in
Figure 6-6.
As displayed in Figure 6-6, SAMi-basic is deployed in each mobile device. There are
different approaches to deploy SAMi-ext. It can be deployed in servers that are
accessible from the Internet. This solution, however, makes the middleware inflexible
and highly dependent on the Internet. To resolve this problem, we propose that a user
installs the SAMi-ext component on a PC equipped with a wireless network interface, so that
he/she accesses the SAMi-ext installed on his/her personal PC at home and the one installed
on remote servers in other places.
To lighten its load, SAMi-ext can use other services to perform advanced functionalities.
Rule mining and classification of files are implemented as services.
Figure 6-6: SAMi deployment (SAMi-basic runs on each mobile device in the MANET, e.g. those of Pascal, David and Anne; SAMi-ext instances use Internet services for social networking, rule mining and file classification)
As displayed in Figure 6-7, SAMi-basic is composed of four sub-systems: SAMi-core,
SAMi-GUI, SAMi-thin and SAMi-adaptor.
• SAMi-core is used to perform the fundamental information sharing activities (file
advertisement, file discovery and file delivery).
• SAMi-GUI provides a graphical user interface to accept inputs into the
middleware and to display its outputs.
• SAMi-adaptor enables SAMi to work with well-known messengers like Yahoo
messenger and Google-Talk.
• SAMi-thin allows thin devices with scarce resources to use SAMi. This
component consists of the functionalities used to discover files (via querying
neighborhood) and deliver files.
Figure 6-7: Component diagram of SAMi (SAMi-basic, composed of SAMi-core, SAMi-GUI, SAMi-adaptor and SAMi-thin, and SAMi-ext)
6.1.6 Core implementation Classes of SAMi
In this section, we discuss the main classes of SAMi-core and SAMi-ext. The remaining
classes and other components are presented in the annex. As shown in Figure 6-8, SAMi-core
is composed of the classes: Context-Manager, Adv-Manager, Info-Manager, Rule-
Manipulator and Env-Behavior. Context-Manager is used to capture the user and the
environmental contexts as well as to determine the profile of the user and his/her
neighbors. Adv-Manager prepares advertisement messages about files and platform (i.e.,
descriptions of devices and adaptation services). Info-Manager is used to fetch and provide
information from and to the devices in the vicinity. Rule-Manipulator manages the rules
that are used to determine the users’ interests and the MANET-views’ mobility classes.
Env-Behavior is used to compute the possible mobility classes that a peer uses during
information sharing.
Figure 6-8: Core classes and their relationships
The important classes of the SAMi-ext component are displayed in Figure 6-9 and Figure
6-10. Association rules, which we have named habit rules, are extracted by analyzing
historical data.
The class diagram displayed in Figure 6-9 shows the classes that are involved in
analyzing historical data. Historical data are stored in the form of sharing-statistics (the
data structure defined in chapter 4) and sharing-histories, structures that contain
the queries, the advertisements and the information demand and information provision of a
user in a given context. Mobility-Manager computes mobility-classes (chapter 3) from
sharing statistics (managed by the Sharing-Statistics class). Habit rules with respect to
mobility classes are identified by analyzing the same data. Interest-Manager extracts the
user’s sharing-interest, which can be either an information demand or an information
provision, by using sharing-histories managed by the sharing-history class. The mining of
rules with respect to sharing interest is done by using the same historical data.
Figure 6-9: Class diagram to manage historical data
As shown in Figure 6-10, the hierarchical classification algorithm implements the
Classification-Manager interface. In this thesis, we use k-means* repeatedly in order to
obtain a file tree of a required dimension (the height of the tree and the number of clusters
at each depth). The Hierarchical-k-mean* class performs this classification as discussed in
chapter 5.
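The repeated clustering that produces the file tree can be sketched as recursive k-means. This sketch substitutes standard k-means (Lloyd's algorithm) for the k-means* variant of chapter 5, represents files as numeric feature tuples, and encodes the tree dimension as a hypothetical `branching` list giving the number of clusters at each depth.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of numbers; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # assign p to the nearest center (squared Euclidean distance)
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return [g for g in groups if g]


def build_tree(points, branching):
    """Apply k-means recursively: branching[d] is the number of clusters at depth d."""
    if not branching or len(points) <= branching[0]:
        return points  # leaf cluster: the file vectors themselves
    return [build_tree(g, branching[1:]) for g in kmeans(points, branching[0])]


files = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(build_tree(files, [2]))
```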
Figure 6-10: Classes for information classification
6.2 Implementation
6.2.1 SAMi over a Simulated Environment
We have developed the test-bed depicted in Figure 6-11 to simulate a MANET and to
implement the proposed middleware. This test-bed has been designed to output the rate of
file delivery of the SAMi middleware, i.e., the number of delivered files with respect to the
number of files that have been requested.
Figure 6-11: A test bed to simulate a MANET (inputs: simulation parameters, resources, files, file-requests, mobility model and network characteristics; modules: Application Linker, Connectivity Manager, Mobility Manager, Info-Agent, Messenger and Mobile Node; output: rate of delivery)
The test-bed accepts the following inputs: Resources, Files, File-Requests, Simulation-
Parameters, Mobility Model and Network Characteristics.
Files and File-Requests are the most important inputs of the middleware. A file is
represented by its file-size (size of the file), file-ID (a number identifying the file uniquely)
and metadata-size (size of the file’s metadata). The users’ information-needs are
represented by file-requests. A file request contains the request time, the ID of the
requested file and the size of the request-message.
The Resources parameter presently allows specifying only the memory capacities of the
devices involved in the MANET. However, the test bed is open to include other important
resources like the CPU power.
Simulation Parameters consist of the area covered by the MANET and the duration of
the simulation. Mobility Model is the description of the mobility patterns of peers.
Currently, the test-bed implements the random waypoint model to determine the
distribution of the peers and their movement patterns; thus, it only accepts the
maximum and minimum values for the speed and the pause time of the peers.
Nevertheless, the test-bed can easily be modified to consider other mobility models.
Network Characteristics represent the characteristics of the network technology used to
connect peers in a MANET. The test-bed assumes peers are equipped with the same
network technology. Thus, it accepts the bandwidth and the line of sight of the network
technology.
The test bed is composed of 6 modules: Info-Agent, Mobile Node, Messenger,
Connectivity Manager, Mobility Manager and Application Linker. Info-Agent and Mobile
node modules implement the SAMi middleware and simulate a mobile peer respectively.
The other four modules are used to simulate a MANET. Messenger is responsible for
messages transactions. Connectivity Manager is in charge of checking the connectivity
between two peers according to the characteristics of the communication technology.
Mobility Manager is responsible for changing the location context of a peer according to the
mobility model. Application Linker is responsible for invoking Info-Agent when
internal/external events occur.
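The roles of Mobility Manager and Connectivity Manager can be sketched as a random waypoint step plus a range check. This is a simplified stand-in for the test-bed, not its code: the class and function names are hypothetical, time advances in fixed steps, leftover time after reaching a waypoint is discarded, and the parameter values in the usage lines are taken from Tables 6-1 and 6-2.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    x: float
    y: float
    dest: Optional[Tuple[float, float]] = None
    speed: float = 0.0
    pause_left: float = 0.0


class RandomWaypoint:
    """Random waypoint model: move at a random speed to a random point, pause, repeat."""

    def __init__(self, width, height, speed_range, pause_range, seed=0):
        self.width, self.height = width, height
        self.speed_range, self.pause_range = speed_range, pause_range
        self.rng = random.Random(seed)

    def step(self, node, dt):
        """Advance one node by dt seconds."""
        if node.pause_left > 0:
            node.pause_left = max(0.0, node.pause_left - dt)
            return
        if node.dest is None:  # pick the next waypoint and speed
            node.dest = (self.rng.uniform(0, self.width), self.rng.uniform(0, self.height))
            node.speed = self.rng.uniform(*self.speed_range)
        dx, dy = node.dest[0] - node.x, node.dest[1] - node.y
        dist = math.hypot(dx, dy)
        travel = node.speed * dt
        if travel >= dist:  # waypoint reached: stop there and start pausing
            node.x, node.y = node.dest
            node.dest = None
            node.pause_left = self.rng.uniform(*self.pause_range)
        else:
            node.x += dx / dist * travel
            node.y += dy / dist * travel


def connected(a, b, line_of_sight=10.0):
    """Connectivity check: two peers within radio range can communicate directly."""
    return math.hypot(a.x - b.x, a.y - b.y) <= line_of_sight


model = RandomWaypoint(33, 33, speed_range=(1, 45), pause_range=(0, 5))
peers = [Node(0, 0), Node(20, 20)]
for _ in range(600):  # 600 one-second steps
    for p in peers:
        model.step(p, 1.0)
print(connected(*peers))
```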
6.2.2 Application of SAMi in Photo Sharing and Annotation
A prototype has been developed to illustrate the application of the SAMi middleware in
photo sharing and annotation in MANETs. Java and J2ME were used to implement SAMi
on a standard PC and a mobile phone, respectively.
The prototype was deployed on top of the Sun Wireless Toolkit (an emulation environment),
Sony Ericsson w880i and w910i mobile phones, a laptop with a 2 GHz processor and
a desktop computer with a 3 GHz processor.
A simple data structure has been used to represent metadata in the local repository of the
mobile phones to allow fast processing and to avoid extra memory usage. Figure 6-12
displays an example representation of metadata in the local repository of a mobile phone.
The photo described in Figure 6-12 has the ID F0083blueD333, is described by the
keywords campus and Pascal, and was taken at the place called Part-Dieu on 13/04/2010
at 15:58:29 (encoded as 20100413155829).
F0083blueD333|campus-Pascal| Part Dieu | 20100413155829
Figure 6-12: Examples of representation of metadata in local repository
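Reading and writing such a record can be sketched as follows. The field layout (ID, keywords joined by '-', place, timestamp as yyyyMMddHHmmss) is inferred from the example above; the `PhotoMetadata` type and the function names are hypothetical, and a keyword that itself contains '-' would not survive this encoding.

```python
from datetime import datetime
from typing import List, NamedTuple

STAMP_FORMAT = "%Y%m%d%H%M%S"  # 20100413155829 -> 2010-04-13 15:58:29


class PhotoMetadata(NamedTuple):
    file_id: str
    keywords: List[str]
    place: str
    taken: datetime


def parse_metadata(line: str) -> PhotoMetadata:
    """Split a 'id|kw1-kw2|place|timestamp' record; blanks around fields are ignored."""
    file_id, keywords, place, stamp = (field.strip() for field in line.split("|"))
    return PhotoMetadata(file_id, keywords.split("-"), place,
                         datetime.strptime(stamp, STAMP_FORMAT))


def format_metadata(m: PhotoMetadata) -> str:
    """Inverse of parse_metadata (without the optional blanks)."""
    return "|".join([m.file_id, "-".join(m.keywords), m.place, m.taken.strftime(STAMP_FORMAT)])


print(parse_metadata("F0083blueD333|campus-Pascal| Part Dieu | 20100413155829"))
```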
In the prototype, a user can exploit the system to browse photos in his/her phone and
other phones in the vicinity according to the directory structure or their organization in a
file tree as displayed in Figure 6-13 and Figure 6-14.
Figure 6-13: Browsing photo by their directory organization
The first message that a peer exchanges with a neighbor is a ‘hello message’ containing the
interests of the peer to receive and provide information. When a peer joins a network, it
searches for Bluetooth-enabled devices and sends a ‘hello message’ to them. The peers
accepting the message try to discover the sender device and welcome the new neighbor by
sending a ‘hello message’ in return. After exchanging ‘hello messages’, peers advertise their sharable files.
A user can download the advertised files and request detailed information about the
advertised clusters. A peer that receives a detail request for a cluster sends the metadata
of some of the files under that cluster. The user can request more detailed information
again to obtain the metadata of other files in the cluster.
Figure 6-14: Browsing photos by their organization in a file-tree
If a user does not find the file that he/she is looking for, he/she can search the file by
using a query as displayed in Figure 6-15. The prototype applies a simple keyword-
matching algorithm to compare queries and files.
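The prototype's matching algorithm is only described as simple keyword matching; one plausible version, which counts the query keywords found in a file's metadata and ranks files by that count, is sketched below. The ranking rule, the case-insensitive comparison and the file IDs other than F0083blueD333 are assumptions.

```python
def keyword_match(query, files):
    """Rank files by the number of query keywords found in their metadata keywords.

    files maps a file ID to its keyword list; matching is case-insensitive and
    files sharing no keyword with the query are dropped.
    """
    terms = {t.lower() for t in query.split()}
    scored = []
    for file_id, keywords in files.items():
        score = len(terms & {k.lower() for k in keywords})
        if score:
            scored.append((-score, file_id))  # negative score: best matches sort first
    return [file_id for _, file_id in sorted(scored)]


library = {"F0083blueD333": ["campus", "Pascal"], "F0084": ["campus"], "F0085": ["beach"]}
print(keyword_match("Pascal campus", library))  # ['F0083blueD333', 'F0084']
```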
Figure 6-15: Querying
As annotation is a very important activity in photo management, the prototype
allows users to annotate the photos in their devices while browsing. As shown in Figure
6-16, users can perform annotations collaboratively.
Figure 6-16: Collaboration during photo annotation
6.3 Evaluation
In this section, we present the experiments conducted to evaluate the SAMi middleware.
A PC with a 3 GHz processor has been used to evaluate the data delivery of the
middleware. Its main functionalities were tested on the same PC and on a Sony Ericsson
w890i mobile phone.
6.3.1 Data Delivery Rate
The middleware was tested by fixing the simulation area (i.e., the area over which
devices involved in a MANET are distributed) and the simulation time (i.e., the length of
time that the MANET is valid).
We have assumed that the interests of users are unknown (i.e., information-demands and
information-provisions of users are empty sharing interests). Table 6-1 shows the input
parameters of the test-bed during the experimentation.
Table 6-1: The inputs of the test-bed
Parameters | Values
Simulation area | 33 meters by 33 meters
Simulation time (STime) | 10800 seconds
Network bandwidth | 1 Mbps
Line of sight | 10 meters
Overall demand / Overall provision | {(Ø,1)}
Number of files | 10 * Number of peers
Number of replicas of each file | Random(0, 8)
Request time (RT) | Random(0, STime)
Deadline for file delivery, experiment one | Random(RT + 1200 s, STime)
Deadline for file delivery, experiment two | Random(RT + 2400 s, STime)
Number of file requests per peer | Random(0, number of files)
Metadata size | Random(1 KB, 4 KB)
Storage capacity of a peer | Random(256 MB, 160 GB)
Speed | Random(1 m/s, 45 m/s)
We collected 557 abstracts of research papers in the domains of information
retrieval, multimedia, pervasive computing, GIS and e-learning. These abstracts are used
as sharable files that are distributed randomly to the peers. Two or more peers might own
identical files, i.e., the files are replicated. A peer, however, does not keep the files
received from other peers as sharable files. We set the network characteristics with
Bluetooth in mind. The minimum speed is set by assuming that the peers are walking, while
the maximum speed is set by assuming that the peers are using some means of transportation.
As described in Table 6-2, three types of mobile ad-hoc environments (mobility classes)
were considered: highly dynamic, dynamic and moderate.
Table 6-2: Types of environments
Pause time | Environment
[0,5) | Highly dynamic
[5,10) | Dynamic
[10,∞) | Moderate
We ran two experiments for each environment, varying the number of peers from
10 to 110. Figure 6-17 and Figure 6-18 show the results of experiments one and two,
respectively. The two experiments differ in the deadline for file
delivery.
Not all requested files could be delivered, for one of three reasons:
1. the requested file might be stored on a device that was far from the requester peer;
2. the information source might disappear before the file was completely delivered, and
the information source and the requester peer did not meet again;
3. the deadline for file delivery was reached before the file was completely delivered.
As observed in Figure 6-17 and Figure 6-18, the rate of delivery, i.e., the number of
files delivered out of the number of files requested, shows similar patterns regardless of the
environment. This stability comes from the fact that the middleware uses three or more
data sources to deliver a file when a single peer cannot deliver the whole file. Moreover,
different portions of a file can be delivered at different times. Adapting the advertisement
period also contributes to this stability: it keeps peers aware of the information around
them while imposing minimal overhead on the bandwidth.
Figure 6-17: Deliverability of files for experiment one (rate of delivery versus number of peers, from 10 to 110, for highly dynamic, dynamic and moderate environments)
Figure 6-18: Deliverability of files for experiment two (rate of delivery versus number of peers, from 10 to 110, for highly dynamic, dynamic and moderate environments)
6.3.2 Interest Awareness
We used the descriptions of the photos in [119] as queries to test the algorithm described
in section 3.4. We measured the execution time of the proposed method for identifying
users’ interests. The constants displayed in Table 6-3 were used during the evaluation of
the interest extraction algorithm presented in section 3.4.1. We used the algorithm to
produce an information demand from the list of queries.
We derive the minimum weight of an interest from the following design objective. A
sharing-interest is designed to contain at most 5 interests so that the effort required to
specify interests on a mobile phone is minimized. This implies that 0.2 (1/5) is the minimum
weight of an interest. For a nomadic user equipped with a mobile phone, it is not simple to
specify many keywords in an interest; thus, we limit the number of keywords in an interest
to 5. We set the minimum interest similarity value to 0.4, which means that two interests are
similar if they have at least two keywords in common (2/5 = 0.4). We set the minimum
cosine similarity to 0.4, slightly less than 0.5. Note that a cosine value of 0 indicates that
the interests are totally different and a cosine value of 1 indicates that they are identical.
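The similarity tests can be sketched as follows. Reading the interest similarity value as the number of shared keywords divided by maxKeys is an assumption that reproduces the "two keywords in common" rule, as is computing the cosine on binary keyword vectors; the threshold values are those of Table 6-3 and the function names are hypothetical.

```python
import math

MAX_KEYS = 5   # maxKeys: at most 5 keywords per interest
ACC_SIM = 0.4  # accSim: minimum interest similarity value
ACC_V = 0.5    # accV: minimum cosine similarity value (Table 6-3)


def interest_similarity(a, b):
    """Shared keywords relative to maxKeys (assumed reading of accSim)."""
    return len(set(a) & set(b)) / MAX_KEYS


def cosine(a, b):
    """Cosine of the binary keyword vectors of two interests (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar(a, b):
    return interest_similarity(a, b) >= ACC_SIM  # i.e. at least 2 keywords in common


def cosine_accepted(a, b):
    return cosine(a, b) >= ACC_V
```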
Figure 6-19 displays the performance of the algorithm versus the number of queries.
Processing 540 queries takes less than one minute on a mobile phone and 32 milliseconds
on a PC. Therefore, the performance of the algorithm is acceptable for both a PC and a
mobile phone.
Table 6-3: Constants for the query extraction algorithm
Constant | Value
Minimum weight of an interest (minW) | 0.2
Maximum keywords in an interest (maxKeys) | 5
Minimum interest similarity value (accSim) | 0.4
Minimum cosine similarity value (accV) | 0.5
Acceptable support | 4%
Acceptable confidence (minConf) | 80%
Figure 6-19: Performance of interest extraction algorithm (execution time in ms versus number of queries, for PC and mobile phone)
Rule mining is performed to produce rules that can be used to identify information
demand of a user. The historical information demands of a peer are generated by using the
data displayed in Table 6-4. Figure 6-20 shows the relationships between the volume of
sharing histories and the performance of the rule mining algorithm.
Table 6-4: Characteristics of information demands
Context | Interest | Description | Weight
(8 AM, Bus 27) | I01 | news | 30-40
(8 AM, Bus 27) | I02 | finance | 60-70
(8 AM, Bus 37) | I11 | research | 65-70
(8 AM, Bus 37) | I12 | joke | 30-35
(ø, Stadium) | I21 | football | 70-80
(ø, Stadium) | I22 | tennis | 20-30
(12 PM, INSA-Café) | I31 | research | 65-70
(12 PM, INSA-Café) | I32 | place | 35-40
It takes around 20 seconds for the PC to process 180 sharing-histories, whereas the mobile
phone takes around 20 seconds to process 120 historical information demands. The
algorithm is therefore acceptable for a PC, and it also performs acceptably on a mobile
phone as long as the number of sharing histories is less than 120. The rule mining process
becomes heavier for a mobile phone as the number of information demands increases.
However, since rule mining is performed only occasionally, the load on a mobile phone
remains moderate.
Assume that a user collects sharing-statistics 3 times a day. It then takes 60 days (two
months) to produce 180 sharing-histories. In reality, it is very rare for a user to stay away
from the Internet for 2 months. Therefore, heavyweight devices can perform rule mining on
behalf of a mobile phone.
Figure 6-20: Rules to identify information demand (execution time in s versus number of information demands, from 50 to 180, for PC and mobile phone)
6.3.3 Mobility Awareness
The rule-mining algorithm proposed in section 4.4.2 was run to produce rules with respect
to the mobility classes of MANET-views. Table 6-5 describes the data used to generate
sharing-statistics. Table 6-6 lists the values of the constants of the algorithm used during
the evaluation. The mobility-classes described in Table 6-7 are considered during the
evaluation of the algorithm.
The result of the experiment is displayed in Figure 6-21. It takes around 26 seconds to
process 200 sharing-statistics on a PC. Thus, the performance of the rule-mining algorithm
is acceptable for a PC.
Table 6-5: Characteristics of sharing-statistics
Actual sharing context (Location, Time) | Range-lifetime (in minutes)
(Bus, 8 AM - 8:10 AM) | 3-4
(Restaurant, 12 AM - 12:30 AM) | 22-28
(Stadium, ∅) | 90-120
(Café, ∅) | 11-15
Table 6-6: Constants considered during rule mining evaluation
Recent-Time | 0 (all sharing-statistics are considered)
Minimum number of networks referred to by a rule | 5
Acceptable support | 4%
Acceptable confidence | 80%
Table 6-7: Range-lifetimes of mobility classes designed for sharing context (“”, ∅)
Mobility class | Range-lifetime (in minutes)
1 | [0, 15)
2 | [15, 30)
3 | [30, ∞)
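Table 6-7 amounts to a lookup from an observed range-lifetime to a mobility class. The following is a minimal Java sketch. It assumes that lifetimes are measured in minutes and that the three classes of the table are the only ones. The class and method names are illustrative.

```java
/**
 * Sketch: assigning a range-lifetime (in minutes) to the mobility classes
 * of Table 6-7: class 1 = [0,15), class 2 = [15,30), class 3 = [30,inf).
 */
class MobilityClassSketch {

    // Inclusive lower bounds of classes 1, 2 and 3 (minutes), from Table 6-7.
    private static final double[] LOWER_BOUNDS = {0.0, 15.0, 30.0};

    static int mobilityClass(double rangeLifetimeMinutes) {
        if (rangeLifetimeMinutes < 0 || Double.isNaN(rangeLifetimeMinutes)) {
            throw new IllegalArgumentException("range-lifetime must be a non-negative number");
        }
        // Scan from the highest class down; the first bound not exceeding the lifetime wins.
        for (int i = LOWER_BOUNDS.length - 1; i >= 0; i--) {
            if (rangeLifetimeMinutes >= LOWER_BOUNDS[i]) {
                return i + 1;
            }
        }
        return 1; // not reached for non-negative input; required by the compiler
    }

    public static void main(String[] args) {
        // Midpoints of the Table 6-5 ranges: bus, restaurant, stadium, cafe.
        double[] lifetimes = {3.5, 25.0, 105.0, 13.0};
        for (double t : lifetimes) {
            System.out.println(t + " min -> class " + mobilityClass(t));
        }
    }
}
```

With the Table 6-5 data, bus sharing (3-4 minutes) falls in class 1, the restaurant (22-28 minutes) in class 2 and the stadium (90-120 minutes) in class 3. The café (11-15 minutes) falls in class 1, except at exactly 15 minutes.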
On the mobile phone, the algorithm took around 22 seconds to process 100 sharing-statistics. Its performance degrades as the volume of sharing-statistics increases: processing 200 sharing-statistics takes around 4.6 minutes. Assume that a user takes part in information sharing in a MANET 3 times a day. It then takes around 67 days (200/3 ≈ 66.7) to accumulate 200 sharing-statistics. A mobile phone will contact powerful devices several times within 67 days, so these devices can mine rules on its behalf. Moreover, rule mining is performed incrementally.
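Because rule mining is incremental, a device does not need to rescan all sharing-statistics each time a new one arrives. One common way to achieve this is to keep running counters per context and per (context, mobility class) pair. Support and confidence can then be recomputed from these counters at any time. This approach is assumed here for illustration and is not taken from section 4.4.2. The class name is hypothetical. The sketch targets desktop Java 8 (`Map.merge`, `Map.getOrDefault`); a J2ME version would need `Hashtable` and explicit boxing instead.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of incremental mining of "context -> mobility class" rules: each new
 * sharing-statistic only updates counters, so earlier statistics never have
 * to be rescanned. Names and the context encoding are hypothetical.
 */
class IncrementalRuleCounter {

    private final Map<String, Integer> contextCounts = new HashMap<>();
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private int total = 0;

    /** Records one sharing-statistic observed in a context and falling in a mobility class. */
    void add(String context, int mobilityClass) {
        total++;
        contextCounts.merge(context, 1, Integer::sum);
        pairCounts.merge(key(context, mobilityClass), 1, Integer::sum);
    }

    /** Fraction of all statistics observed in the context with this mobility class. */
    double support(String context, int mobilityClass) {
        return total == 0 ? 0.0 : (double) pairCount(context, mobilityClass) / total;
    }

    /** Among statistics observed in the context, the fraction with this mobility class. */
    double confidence(String context, int mobilityClass) {
        int inContext = contextCounts.getOrDefault(context, 0);
        return inContext == 0 ? 0.0 : (double) pairCount(context, mobilityClass) / inContext;
    }

    /** Rule test with thresholds such as those of Table 6-6 (4% support, 80% confidence). */
    boolean isRule(String context, int mobilityClass, double minSupport, double minConfidence) {
        return support(context, mobilityClass) >= minSupport
            && confidence(context, mobilityClass) >= minConfidence;
    }

    private int pairCount(String context, int mobilityClass) {
        return pairCounts.getOrDefault(key(context, mobilityClass), 0);
    }

    private static String key(String context, int mobilityClass) {
        return context + "#" + mobilityClass;
    }
}
```

The other constants of Table 6-6, Recent-Time and the minimum number of networks, are not modeled here. A real implementation would also need to age or filter the counters.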
[Plot: execution time (s) versus number of sharing-statistics (50 to 200), PC and mobile phone]
Figure 6-21: Rules to identify mobility classes
6.3.4 File Classification
We have made two types of experiments to evaluate our classification algorithm (section 5.3.1). The first type measures the execution time of the algorithm versus the number of representative mobility classes (cf. section 5.3.2). The second type measures the execution time of the algorithm versus the number of files.
As discussed in section 5.3.2, representative mobility classes are mobility classes with significantly different advertisement volumes. These classes determine the dimension of the file tree, i.e., its height and the number of clusters at each depth. The significance factor determines the number of representative mobility classes and therefore affects the dimension of the file tree. Consequently, evaluating file classification versus the number of representative classes is equivalent to evaluating it versus the significance factor.
During the evaluation, we produced the mobility classes so that all of them are also representative classes.
As discussed in section 5.3.1, the hierarchical classification algorithm proposed in this thesis can be implemented by using either the content of files or their metadata. The content-based classification uses the files’ vector representations. The metadata-based classification uses the files’ textual descriptions.
The content-based classification was evaluated by using 557 abstracts of research papers collected in the domains of information retrieval, multimedia, pervasive computing, GIS and e-learning. The algorithm does not accept the raw files, however, but only their vector descriptions. The metadata-based algorithm accepts the metadata of photos collected and prepared by the Department of Computer Science and Engineering of the University of Washington [119]. The classification algorithms were tested with the parameters described in Table 6-8. We used 0.3 as the lexical minimum similarity value. This value is reasonable for grouping files: if two files are each described by 10 keywords, they are considered similar when they have at least 3 keywords in common. For content-based similarity, 0.4 is a reasonable cosine similarity value for classifying files. Note that a cosine value of 1 indicates that the files are identical and 0 indicates that they are totally dissimilar.
Figure 6-22 and Figure 6-23 show the relationship between the execution time and the significance factor for the content-based and the metadata-based algorithms. The execution time of the algorithm does not depend much on the significance factor.
Table 6-8: Parameters used by the classification algorithm in the first experimentation
Input | Content based | Metadata based
Files | 557 | 200
File type | text | photo
minSim | 0.4 | 0.3
maxIteration | 100 | 30
File representation | vector | textual description
From the results of the experiments, we observed that the content-based classification algorithm performs better than the metadata-based classification algorithm. However, this is only true if the vectors are computed in advance. In reality, the vectors of files depend on one another, so the vector space must be recomputed whenever new files are added to the system. Vector production is not a simple process, as Figure 6-24 shows. The vector space was produced with the help of a library called Jama [120]. Jama has no version for mobile phones, so vector production was measured on the PC only. Given its cost there, it would clearly be expensive on a mobile phone.
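The two similarity thresholds used above can be made concrete. The sketch below makes two assumptions. First, lexical similarity is taken as the number of shared keywords divided by the size of the larger description, which gives 3/10 = 0.3 in the example above. Second, content similarity is taken as the cosine of two term-weight vectors. The exact measures defined in section 5.3.1 may differ. For vectors with non-negative weights, the cosine lies between 0 and 1.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of the two similarity measures discussed above (assumed definitions). */
class SimilaritySketch {

    /** Shared keywords divided by the size of the larger description. */
    static double keywordOverlap(Set<String> a, Set<String> b) {
        int larger = Math.max(a.size(), b.size());
        if (larger == 0) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a); // copy so the inputs are not modified
        common.retainAll(b);
        return (double) common.size() / larger;
    }

    /** Cosine of the angle between two term-weight vectors of equal length. */
    static double cosine(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double nx = 0.0;
        double ny = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        if (nx == 0.0 || ny == 0.0) {
            return 0.0; // a zero vector is treated as dissimilar to everything
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }
}
```

Under these assumptions, two files would be grouped together when `keywordOverlap` reaches 0.3 (metadata-based) or when `cosine` reaches 0.4 (content-based).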
[Plot: execution time (ms) versus significance factor (1 to 15), PC and mobile phone]
Figure 6-22: Content based classification performance
[Plot: execution time (ms) versus significance factor (2 to 16), PC and mobile phone]
Figure 6-23: Metadata based classification performance
[Plot: execution time (ms) versus number of files (50 to 540), PC]
Figure 6-24: Vector production on the PC
In the second type of experimentation, we considered only the metadata-based classification algorithm. As displayed in Table 6-9, the metadata-based classification was applied to produce a file tree from the test data downloaded from the website [119]. The file tree was produced for three mobility classes in which 2, 6 and 20 metadata items, respectively, can be advertised. Thus, the file tree has a height of 3, with two clusters at depth 1, six clusters at depth 2 and twenty clusters at depth 3.
Table 6-9: Inputs of the classification algorithm for the second type of experimentation
Data | data in [119]
Height of the file tree | 3
Number of clusters at the first depth | 2
Number of clusters at the second depth | 6
Number of clusters at the third depth | 20
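The link between advertisement volumes and file-tree dimension in this experiment can be sketched in Java. `clustersPerDepth` orders the volumes of the representative mobility classes (2, 6 and 20 here), so that depth d holds the d-th smallest volume. `deepestDepthWithin` is a hypothetical helper, not part of the thesis, that returns the deepest level whose clusters all fit within a given advertisement budget.

```java
import java.util.Arrays;

/**
 * Sketch: file-tree dimension derived from the advertisement volumes of the
 * representative mobility classes (volumes 2, 6, 20 -> height 3 with 2, 6
 * and 20 clusters per depth). deepestDepthWithin is a hypothetical helper.
 */
class FileTreeDimensionSketch {

    /** Cluster count per depth (index 0 = depth 1): the volumes in ascending order. */
    static int[] clustersPerDepth(int[] advertisementVolumes) {
        int[] depths = advertisementVolumes.clone(); // leave the caller's array untouched
        Arrays.sort(depths);
        return depths;
    }

    /**
     * Deepest depth (1-based) whose clusters all fit in the budget, or 0 if none
     * fits. Expects the ascending counts produced by clustersPerDepth.
     */
    static int deepestDepthWithin(int[] clustersPerDepth, int budget) {
        int depth = 0;
        for (int i = 0; i < clustersPerDepth.length; i++) {
            if (clustersPerDepth[i] <= budget) {
                depth = i + 1;
            }
        }
        return depth;
    }
}
```

For example, with the counts of Table 6-9 and a budget of 7 metadata items, the helper returns depth 2, whose 6 clusters fit within the budget.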
Figure 6-25 shows the execution time required to perform the metadata-based file classification versus the number of photos to be classified. The algorithm performs well on a PC. It also performs acceptably on a mobile phone as long as the number of photos is less than 250. However, its performance on a mobile phone deteriorates as the number of photos increases.
[Plot: execution time (s) versus number of photos (50 to 650), mobile phone and PC]
Figure 6-25: Performance of metadata based classification
The algorithm took around 5 minutes to process 650 photos on the mobile phone. In reality, the number of photos on a mobile phone rarely exceeds 250. Indeed, thin devices such as the Sony Ericsson w910i are not usually used to store more than 250 photos (more than 500 megabytes). Furthermore, since photo classification is performed only occasionally, heavyweight devices can perform it on behalf of thin devices.
6.3.5 Advertisement Selection
The algorithm in section 5.4.1, which is used to filter the advertisements according to the
users’ interests, was tested by using the metadata of photos collected and prepared by the
Department of Computer Science and Engineering of the University of Washington [119].
Table 6-10 describes the data used to filter the advertisements.
The advertisement policy allocates 3 metadata items to each of the first two interests (7 × 0.4 = 2.8, rounded to 3) and 1 metadata item to the third interest (7 × 0.2 = 1.4, rounded to 1). We use 0.3 as the minimum similarity between an interest and a file or cluster. Assume that the textual description of a file contains 7 keywords. A file then matches an interest if the interest and the file’s textual description have at least three keywords in common (3/7 ≈ 0.43 ≥ 0.3, whereas 2/7 ≈ 0.29 < 0.3).
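The quota and threshold arithmetic above can be reproduced with a short Java sketch. Each interest receives round(weight × volume) metadata items. The keyword threshold is the smallest k with k / size ≥ 0.3. Rounding to the nearest integer is an assumption consistent with 2.8 → 3 and 1.4 → 1. The class name is illustrative.

```java
/** Sketch of the advertisement quota and keyword-threshold arithmetic (assumed rounding). */
class AdvertisementQuotaSketch {

    /** Metadata items per interest: weight * volume rounded to the nearest integer. */
    static int[] quotas(double[] weights, int volume) {
        int[] q = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            q[i] = (int) Math.round(weights[i] * volume);
        }
        return q;
    }

    /** Smallest number k of shared keywords such that k / descriptionSize >= threshold. */
    static int minCommonKeywords(int descriptionSize, double threshold) {
        // The small epsilon guards against a floating-point product landing just above an integer.
        return (int) Math.ceil(threshold * descriptionSize - 1e-9);
    }

    public static void main(String[] args) {
        int[] q = quotas(new double[] {0.4, 0.4, 0.2}, 7);
        System.out.println(q[0] + " " + q[1] + " " + q[2]);
        System.out.println(minCommonKeywords(7, 0.3));
    }
}
```

With other weights, the rounded quotas need not add up to the advertisement volume. For example, three weights of 1/3 and a volume of 7 give 2 + 2 + 2 = 6. A largest-remainder allocation would be one way to correct this.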
Figure 6-26 illustrates the relationship between the execution time of the algorithm and the number of sharable photos. For 200 sharable photos, the algorithm takes around 68 ms on the mobile phone, which is clearly acceptable.
Table 6-10: Test data used during filtering advertisements
Overall demand | {({tree, grass, sky, bench}, 0.4), ({flower, tree, bush, sky, car}, 0.4), ({trunk, sidewalk, rock, sky}, 0.2)}
Advertisement volume | 7 metadata
Data | data in [119]
File tree dimension | height 3; 2, 6 and 20 clusters at depths 1, 2 and 3
Similarity threshold | 0.3
[Plot: execution time (ms) versus number of photos (50 to 200), mobile phone]
Figure 6-26: Advertisement content determination in a mobile phone
6.4 Discussion
In this chapter, we have discussed the design and the implementation of the SAMi middleware. Flexibility is an important design goal of SAMi. One way of achieving this goal could be to integrate the middleware with well-established messengers such as Google Talk and Yahoo Messenger. Most of these messengers permit such integration via a plug-in, but to our knowledge none of them works without Internet access. Therefore, we designed our own interface for the middleware.
We tried to use third-party simulation software such as ns-2. However, such simulators are designed to test routing protocols, and it is difficult to use them to deploy an advanced information sharing system like SAMi. As a result, we developed our own simulator.
The main functionalities of the middleware are implemented in a real environment. J2ME has been used as the programming language. As it is a high-level language, determining the load of a device and the state of users was difficult. In J2ME, almost every computation has to be implemented using elementary operations. Advanced operations (e.g., string manipulation such as splitting and merging) are not supported.
The current version of MIDP does not support processing metadata represented with content description metadata models such as Dublin Core [104] and MPEG-7 [105]. It does not even provide an XML parser. As a result, simple strings are used to represent messages and describe files.
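Without `String.split` or an XML parser, a message can still be encoded as a single delimited string. It can then be decoded with `indexOf` and `substring`, both of which are available in CLDC. The message layout used below (`ADV;photo12;tree,sky`) is hypothetical. The sketch also does not escape separators that occur inside fields. It avoids generics and collections so that the same code would compile under CLDC.

```java
/**
 * Sketch: encoding and decoding flat, delimiter-separated messages using
 * only String methods available in CLDC (no String.split, no generics).
 * The message layout is a hypothetical example.
 */
class SimpleMessageSketch {

    /** Splits a message at every occurrence of separator; empty fields are kept. */
    static String[] fields(String message, char separator) {
        // First pass: count fields so the array can be sized exactly.
        int count = 1;
        for (int i = 0; i < message.length(); i++) {
            if (message.charAt(i) == separator) {
                count++;
            }
        }
        // Second pass: cut the message between consecutive separators.
        String[] out = new String[count];
        int start = 0;
        for (int k = 0; k < count - 1; k++) {
            int sep = message.indexOf(separator, start);
            out[k] = message.substring(start, sep);
            start = sep + 1;
        }
        out[count - 1] = message.substring(start);
        return out;
    }

    /** Joins fields with the separator (the inverse of fields for unescaped input). */
    static String join(String[] parts, char separator) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```

For example, `fields("ADV;photo12;tree,sky", ';')` yields the type, a file identifier and a comma-separated keyword list. The keyword list could then be split again on `','`.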
We used Sony Ericsson w910i and w890i mobile phones as computing devices during the implementation of the middleware. These phones, like most ordinary phones, have no database management system. In addition, J2ME has no library to access one. As a result, the data stores are implemented by using the record management system (RMS)¹⁷ provided by MIDP.
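Since an RMS record is just an array of bytes, structured data must be serialized before it is stored. The following minimal sketch uses `DataOutputStream` and `DataInputStream`. The field layout of `FileEntry` is hypothetical. The actual `RecordStore.addRecord(data, 0, data.length)` call is only mentioned in a comment, so the code runs on desktop Java. Note that the exception constructors taking a cause are not available on CLDC.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Sketch: turning a file description into the byte array held by one RMS
 * record and reading it back. The field layout is hypothetical. On a MIDP
 * device the bytes would be stored with RecordStore.addRecord(data, 0, data.length).
 */
class RecordCodecSketch {

    static final class FileEntry {
        final String name;
        final String keywords;
        final long size;

        FileEntry(String name, String keywords, long size) {
            this.name = name;
            this.keywords = keywords;
            this.size = size;
        }
    }

    /** Writes name, keywords and size, in that order, into a byte array. */
    static byte[] encode(FileEntry entry) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(entry.name);     // length-prefixed, modified UTF-8
            out.writeUTF(entry.keywords);
            out.writeLong(entry.size);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Writing to an in-memory stream is not expected to fail.
            throw new IllegalStateException("in-memory serialization failed", e);
        }
    }

    /** Reads the fields back in the same order they were written. */
    static FileEntry decode(byte[] record) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            String name = in.readUTF();
            String keywords = in.readUTF();
            long size = in.readLong();
            return new FileEntry(name, keywords, size);
        } catch (IOException e) {
            throw new IllegalArgumentException("malformed record", e);
        }
    }
}
```

Keeping the field order fixed in `encode` and `decode` is what makes the byte array self-describing enough for this purpose. Adding a version byte at the front would allow the layout to change later.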
The prototype works correctly on the Sun Wireless Toolkit emulator, on Sony Ericsson w880i and w910i mobile phones, on a PC with a 3 GHz processor and on a laptop with a 2 GHz processor. We used the prototype to enable photo sharing among 4 Sun Wireless Toolkit emulators. We also used it to perform photo sharing between the laptop and the desktop computer. However, we faced some difficulties with communications involving mobile phones due to the instability of Bluetooth.
6.5 Conclusion
In this chapter, we have discussed a self-adaptive middleware named SAMi. The
middleware uses the approaches discussed in the previous three chapters to perform
information discovery according to the profile of the MANET and the peers participating
in the network. In the middleware, information delivery is performed according to the
profile of the users and the deadline of the file delivery. The middleware is decomposed into two components, named SAMi-Basic and SAMi-ext. SAMi-Basic performs the basic functionalities of SAMi and is installed on every device participating in the information sharing. SAMi-ext performs the expensive activities of SAMi, such as rule mining and file classification.
17 The record management system stores data as a list of records. A record is an array of bytes.
This chapter has also presented the design, the implementation and the evaluation of SAMi. The test-bed that simulates MANETs and implements the middleware was designed by assuming that peers are equipped with a uniform network technology. The test-bed was used to evaluate the data delivery rate of the middleware.
Furthermore, the prototype was developed to illustrate the application of SAMi to photo sharing and annotation. It uses mobile phones as computing devices and Bluetooth as the network technology. It was used to evaluate the major functionalities of the SAMi middleware on Sun Wireless Toolkit emulators, a PC with a 3 GHz processor and a Sony Ericsson w910i mobile phone.
Chapter 7 Conclusion and Future Work
In the first two chapters, we identified interest awareness and mobility awareness as important research problems in information sharing in a MANET. We also reviewed the main research work in the fields of information sharing, service discovery and data routing with respect to these problems. In the previous four chapters, we presented and evaluated the information sharing middleware that we propose to resolve these problems. In this chapter, we summarize our main contributions. We also analyze our work against the requirements stated in the first chapter and the research work presented in the second chapter. Finally, we conclude the thesis by pointing out the main future work envisaged to extend our research.
The chapter is organized as follows: section 7.1 summarizes the contributions of the thesis; section 7.2 analyses our middleware; finally, section 7.3 concludes the chapter and the thesis by pointing out future work.
7.1 Summary of Contributions
In this thesis, we have proposed a novel middleware named SAMi to allow nomadic
users to share information in MANETs. The middleware works by distributing
advertisements and queries. The advertisement of files and the resolution of queries are
performed according to the users’ profile and their context.
The middleware is designed to fulfill the requirements stated in the first chapter, i.e., “Pervasiveness”, “Mobility awareness”, “Interest-awareness”, “High level semantics”, “Social awareness”, “Context aware content delivery” and “Routing of data”. Existing information-sharing systems for MANETs give little emphasis to the challenges related to the mobility of users and their interests. Our work therefore focuses more on the requirements “Mobility awareness” and “Interest-awareness”.
To facilitate the advertisement process, we have studied how files are hierarchically
classified into clusters. Clusters are organized in a file tree. The dimension of the tree (i.e.,
its height and the number of clusters at each depth) is determined from the mobility classes
considered by the data source to share information in MANETs.
A mobility class is a concept used to describe a category of MANET-views according to
the users’ stay-times and their contexts. The same advertisement policy is applied in
MANET-views described by the same mobility class. Mobility classes can be determined
from the peers’ stay times or by using habit rules. We have proposed an approach to
compute mobility classes semi-automatically by analyzing the peers’ historical information
sharing behaviors.
SAMi determines the content of advertisements and their dissemination according to the
users’ interests so that the advertisements’ volumes and the load of their routing are
minimized. Similarly, the resolution of a query (i.e., where and how the query is posed) is
performed by observing the users’ interests to provide information. Historical query
analysis, habit rules and social groups are involved in the interest identification process.
In SAMi, a file is delivered from one or more information sources. Information sources
are selected according to their profile, the time they stay with the requester and how far
they are found from the requester in terms of distance and number of hops.
SAMi has been deployed in a simulated environment where devices are assumed to be interconnected by a wireless network technology with uniform bandwidth. It has also been deployed on real devices (mobile phones and PCs) interconnected by Bluetooth. The simulation environment was used to evaluate the data delivery rate of the middleware. Experimentation on real devices was used to evaluate the main functionalities of the middleware. These evaluations show that SAMi has good potential to help nomadic users share information according to their interests.
7.2 Conclusion
This thesis is designed especially to tackle the challenges related to the mobility of users and their interests. Table 7-1, a copy of Table 2-3 extended with the column “SAMi”, compares SAMi to the existing information sharing systems. As discussed in chapter 2, none of these systems gives special attention to the requirements “mobility awareness” and “interest awareness”. The column “SAMi” indicates that our middleware addresses all of the requirements. It is also the only system that fully considers (++) both mobility awareness and interest awareness.
As described below, SAMi deals with all of the requirements specified in the first chapter, especially mobility awareness and interest awareness.
Pervasiveness: SAMi can be deployed in computing devices ranging from mobile phones
to PCs. As it is designed for a MANET, the middleware can be used anywhere and any
time.
Mobility awareness: SAMi adjusts its information discovery strategies according to the
network dynamicity, which is measured by the connectivity lifetime of MANET views.
Mobility classes are used to parameterize the advertisement policy, which determines the
extent to which the push discovery approach is applied, according to the connectivity
lifetime of MANET-views.
Interest-awareness: SAMi is designed to work according to the interests of users. Users,
first, exchange their interests to receive and provide information. Data requesters and data
sources resolve queries and make advertisements based on the received interests. We have
proposed an approach that can be used by data-sources to identify and facilitate the data
requesters’ interests.
Table 7-1: Comparing SAMi to existing information sharing systems
Requirement | CodeTorrent | Lime | LimeOne | TOTA | ORION | PeerWare | AdHocFS | Ad-Hoc InfoWare | MIDDLE | SAMi
Pervasiveness | ++ | ++ | ++ | ++ | ++ | ++ | ++ | + | ++ | ++
Mobility awareness | - | - | + | + | - | - | + | + | + | ++
Interest-awareness | - | - | - | - | - | + | - | + | - | ++
High-level semantics | - | + | + | + | - | ++ | + | - | + | +
Social awareness | - | - | - | - | - | - | - | ++ | - | +
Context aware content delivery | + | - | - | - | + | - | - | - | + | ++
Data dissemination | - | - | - | - | ++ | - | - | - | - | +
Legend: - not considered; + considered partially or in a limited way; ++ considered
High-level semantics: In SAMi, files are hierarchically classified into clusters. Clusters are used to advertise files at a high level. However, the semantic meaning of a cluster is limited since we have used an unsupervised classification approach.
Social awareness: In SAMi, the users’ social networks are used to facilitate the interest
identification process. We have proposed an algorithm that identifies the implicit social
networks of users by analyzing their collaborations in a MANET. However, SAMi does not assist users in identifying their explicit social networks (i.e., friends, neighbors, families and so on).
Context aware content delivery: In SAMi, files are delivered block by block from one or
more sources. In order to facilitate the downloading of rare files, SAMi applies offline
delivery (by using email for example).
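Block-by-block delivery from several sources can be pictured with a small Java sketch. The fixed block size and the round-robin assignment of blocks to sources are assumptions for illustration only. The thesis selects sources by profile, stay time and distance, which this sketch does not model.

```java
/**
 * Sketch: splitting a file into fixed-size blocks and spreading the block
 * requests over several sources in round-robin order. Block size and the
 * round-robin policy are illustrative assumptions.
 */
class BlockScheduleSketch {

    /** Number of blocks needed for the file; the last block may be shorter. */
    static int blockCount(long fileSize, int blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid size");
        }
        return (int) ((fileSize + blockSize - 1) / blockSize); // ceiling division
    }

    /** owner[i] is the index of the source asked for block i. */
    static int[] assignRoundRobin(int blocks, int sources) {
        if (sources <= 0) {
            throw new IllegalArgumentException("need at least one source");
        }
        int[] owner = new int[blocks];
        for (int i = 0; i < blocks; i++) {
            owner[i] = i % sources;
        }
        return owner;
    }
}
```

Requesting blocks independently means that a source leaving the MANET-view only costs the blocks it had not yet sent. Those blocks can then be reassigned to the remaining sources.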
Data dissemination: In SAMi, queries and advertisements are disseminated according to the interests of users. The LAR (Location-Aided Routing) protocol is employed to determine the peers located in the direction of the peers interested in an advertisement and of the peers able to resolve a query.
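LAR limits the flooding of a route request to a request zone. This section does not state which LAR scheme SAMi uses, so the sketch below assumes LAR scheme 1. In that scheme, the expected zone is a circle of radius v(t1 - t0) around the destination's last known position. The request zone is the smallest axis-aligned rectangle containing both the source and that circle, and a node forwards the request only if it lies inside the rectangle. Planar coordinates and a known destination speed are assumed.

```java
/**
 * Sketch of the LAR scheme 1 request-zone test (assumed scheme).
 * A zone is represented as {minX, minY, maxX, maxY} in planar coordinates.
 */
class LarRequestZoneSketch {

    // (sx, sy): current source position; (dx, dy): destination's last known position;
    // speed: destination's assumed speed; elapsed: time since that position was recorded.
    static double[] requestZone(double sx, double sy, double dx, double dy,
                                double speed, double elapsed) {
        double r = speed * elapsed; // radius of the expected zone around (dx, dy)
        return new double[] {
            Math.min(sx, dx - r), Math.min(sy, dy - r),
            Math.max(sx, dx + r), Math.max(sy, dy + r)
        };
    }

    /** A node at (x, y) forwards the route request only if it lies in the zone. */
    static boolean inZone(double[] zone, double x, double y) {
        return x >= zone[0] && x <= zone[2] && y >= zone[1] && y <= zone[3];
    }
}
```

Nodes outside the rectangle simply drop the request. This is how LAR saves messages compared with flooding the whole MANET.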
7.3 Future Work
Information sharing in MANETs touches many issues and integrates several domains.
An information sharing middleware for MANETs should deal with issues related to data
routing, content delivery, information discovery, information classification and social
networking. As these fields are too numerous to be covered in a single thesis, we concentrated on the main aspects of information sharing and left the others as open issues. The main envisaged directions for future work are the following:
Privacy: The problem of privacy is more challenging in MANETs than in traditional
networks. Access rights can be used to protect the privacy of users. Nomadic users can use SAMi as it is to share files with public access rights. SAMi can also be easily extended to advertise files whose access rights are limited to some individuals or groups. However, assigning access rights manually is a tedious task for users. In the future, we will investigate the design of a method that assists users in assigning access rights to files.
File classification: In this thesis, we have tested an unsupervised classification algorithm
to produce a file tree. However, an ontology based classification approach may give a more
semantically meaningful file tree. The problems of an ontology-based classification
approach are related to the creation of the domain knowledge and the formation of a
balanced file tree. Investigating hybrid classification techniques is important future work that could yield a more meaningful and balanced file tree. Another important direction is the production of a file tree according to the interests of users. Considering the interests of users during the production of a file tree will facilitate the advertisement content selection process.
Data routing: In SAMi, we have used the LAR routing protocol. To allow LAR to work in a dynamic environment, it can be hybridized with DG-CastoR, a routing algorithm developed in our research team. In the future, we plan to study the hybridization of LAR and DG-CastoR in detail.
Context Management: In SAMi, the context of users is used to determine the mobility
class of MANET-views and the users’ interests. However, we consider only the time and
the location contexts. Moreover, we have not considered complex manipulation of contexts. For example, we consider two location contexts similar only if they are identical or related by inheritance. Thus, in our approach, the contexts (“Bus”, ∅) and (“Tram”, ∅) are not considered similar. In the future, we plan to use the work of Dejene Ejigu, a former PhD student in our research group, to equip the SAMi middleware with an advanced context management feature.
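The location-context rule described above (identical, or related by inheritance) can be sketched with a child-to-parent map. The location names and the hierarchy used with it are hypothetical. Sibling locations such as "Bus" and "Tram" under a common parent come out as not similar, which is the limitation the text notes.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the location-context similarity rule: two locations are similar
 * if they are identical or one inherits from the other. The hierarchy
 * entries are hypothetical examples.
 */
class LocationContextSketch {

    // child location -> parent location
    private final Map<String, String> parentOf = new HashMap<>();

    void addIsA(String child, String parent) {
        parentOf.put(child, parent);
    }

    boolean similar(String a, String b) {
        return a.equals(b) || isAncestor(a, b) || isAncestor(b, a);
    }

    // Walks up from location; the step bound stops the walk if a cycle was entered by mistake.
    private boolean isAncestor(String ancestor, String location) {
        String current = parentOf.get(location);
        for (int steps = 0; current != null && steps <= parentOf.size(); steps++) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }
}
```

A richer context model would replace the boolean with a graded similarity. For example, it could score siblings by the distance to their common ancestor, so that "Bus" and "Tram" become partially similar.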
Glossary of Acronyms
AODV Ad-hoc On Demand Distance Vector
DEN Distributed Event Notification services
DREAM Distance Routing Effect Algorithm for Mobility
GHI Geographical based Hierarchical Index
GSM Global System for Mobile Communications
GVDs Global Virtual Data structure
ITS Interface Tuple Space
JESA Java Enhanced Service Architecture
JXME JXTA for Java ME
JXTA JuXTApose
MANET Mobile Ad-hoc NETwork
MIDP Mobile Information Device Profile
NAT Network Address Translation
P2P Peer to Peer
PDA Personal Digital Assistant
PRNET Packet Radio NETwork
SAMi Self-Adaptive Middleware
SAP Service Access Point
SDP Service Discovery Protocols
SLP Service Location Protocol
SNS Social-Network Sites
UPnP Universal Plug and Play
VANET Vehicular Ad-Hoc NETwork
VSM Vector Space Modeling
WLAN Wireless Local Area Networks
WPAN Wireless Personal Area Networks
WWAN Wireless Wide Area Networks
XML eXtensible Markup Language
Bibliography
[1] R. Prasad and L. Deneire, “Chapter 7 - Mobile Ad Hoc Networks (MANET),” From
WPANs to Personal Networks: Technologies and Applications, Artech House, 2006,
available on http://common.books24x7.com/book/id_14823/book.asp, last accessed
on 15 April 2010.
[2] S.K. Sarkar, T.G. Basavaraju and C. Puttamadappa, “Chapter 1 - Introduction,” Ad Hoc Mobile Wireless Networks: Principles, Protocols and Applications, Auerbach Publications, 2008, available on http://common.books24x7.com/book/id_26393/book.asp, last accessed on 15 April 2010.
[3] S. Churchil, “Cellular’s 25th Anniversary,” Oct. 2008, available on http://www.dailywireless.org/2008/10/10/cellulars-25th-anniversary/, last accessed on 15 April 2010.
[4] K. German, “Cell phone battery life charts - CNET Reviews,” Feb. 2010, available
on http://reviews.cnet.com/cell-phone-battery-life-charts/, last accessed on 15 April
2010.
[5] “iPhone 3GS,” available on http://www.apple.com/iphone/, last accessed on 15 April 2010.
[6] M. Mühlhäuser and I. Gurevych, “Chapter IX - Opportunistic Networks,” Handbook of Research on Ubiquitous Computing Technology for Real Time Enterprises, IGI Global, 2008, available on http://common.books24x7.com/book/id_24561/book.asp, last accessed on 15 April 2010.
[7] “IEEE 802.11 - Wikipedia, the free encyclopedia,” available on http://en.wikipedia.org/wiki/Wi-Fi, last accessed on 21 Jan 2009.
[8] Wi-Fi Alliance, “Wi-Fi Direct,” Oct. 2009, available on http://www.wi-fi.org/news_articles.php?f=media_news&news_id=909, last accessed on 24 Feb 2010.
[9] ZigBee Alliance, “ZigBee and Wireless Radio Frequency Coexistence,” 2009, available on http://www.zigbee.org/LearnMore/WhitePapers/tabid/257/Default.aspx, last accessed on 2 Feb 2010.
[10] K. Tuan Le, “ZigBee SoCs provide cost-effective solutions,” 2005, available on
http://www.wirelessnetdesignline.com/howto/173500576, last accessed on 15 May
2010.
[11] P. Piccard, “Chapter 10 - eDonkey and eMule,” Securing IM and P2P Applications
for the Enterprise, Syngress Publishing, 2006, available on
http://common.books24x7.com/book/id_10710/book.asp, last accessed on 15 April 2010.
[12] R. Subramanian and D. Brian, “Chapter I - Core Concepts in Peer-to-Peer
Networking,” Peer-to-Peer Computing: The Evolution of a Disruptive Technology,
IGI Publishing, 2005, available on http://common.books24x7.com/book/id_9175/book.asp, last accessed on 15 April 2010.
[13] M. Mühlhäuser and I. Gurevych, “Chapter VIII - Peer-to-Peer Systems,” Handbook
of Research on Ubiquitous Computing Technology for Real Time Enterprises, IGI
Global, 2008, available on http://common.books24x7.com/book/id_24561/book.asp,
last accessed on 20 April 2010.
[14] S. Liebowitz, “Chapter 7 - Copyright and the Internet,” Rethinking the Network
Economy: The True Forces that Drive the Digital Marketplace, AMACOM, 2002,
available on http://common.books24x7.com/book/id_5192/book.asp, last accessed on
20 April 2010.
[15] I.J. Taylor, “Chapter 2 - Peer-2-Peer Systems,” From P2P to Web Services and
Grids: Peers in a Client/Server World, Springer, 2005, available on http://common.books24x7.com/book/id_16228/book.asp, last accessed on 12 April 2010.
[16] M. Miller, “Chapter 10 - The Gnutella Network: The Next Napster?,” Discovering
P2P, Sybex, 2001, available on http://common.books24x7.com/book/id_3239/book.asp, last accessed on 15 April 2010.
[17] I.J. Taylor, “Chapter 6 - Gnutella,” From P2P to Web Services and Grids: Peers in a
Client/Server World, Springer, 2005, available on http://common.books24x7.com/book/id_16228/book.asp, last accessed on 15 April 2010.
[18] giFT-FastTrack, “Documentation of the known parts of the FastTrack protocol,” 2004, available on http://cvs.berlios.de/cgi-bin/viewcvs.cgi/gift-fasttrack/giFT-FastTrack, last accessed on 10 April 2010.
[19] R. Subramanian and D.G. Brian, “Chapter II - Peer-to-Peer Networks for Content
Sharing,” Peer-to-Peer Computing: The Evolution of a Disruptive Technology, IGI
Publishing, 2005, available on http://common.books24x7.com/book/id_9175/book.asp, last accessed on 15 April 2010.
[20] J.D. Gradecki, “Chapter 2 - An Overview of JXTA,” Mastering JXTA: Building
Java Peer-to-Peer Applications, John Wiley & Sons, 2002, available on
http://common.books24x7.com/book/id_5393/book.asp, last accessed on 15 April
2010.
[21] M. Miller, “Chapter 12 - The KaZaA/ MusicCity Network,” Discovering P2P,
Sybex, 2001, available on http://common.books24x7.com/book/id_3239/book.asp,
last accessed on 15 April 2010.
[22] H. Balakrishnan, M.F. Kaashoek, D. Karger, R. Morris, and I. Stoica, “Looking up
data in P2P systems,” Commun. ACM, vol. 46, 2003, pp. 43-48.
[23] BitTorrent, available on http://www.bittorrent.com/, last accessed on 26 May 2010.
[24] A. Arora, C. Haywood, and K. Pabla, “JXTA for J2ME Extending the Reach of
Wireless With JXTA Technology,” Sun Microsystems, UK, 3 pages, 2002.
[25] C. Lindemann and O.P. Waldhorst, “A Distributed Search Service for Peer-to-Peer
File Sharing in Mobile Applications,” IEEE Computer Society, 2002, 8 pages.
[26] S. Ratnasamy, P. Francis, S. Shenker, R. Karp, and M. Handley, “A Scalable
Content-Addressable Network,” Proceedings of ACM SIGCOMM, 2001, pp. 161-172.
[27] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, “Chord: A
Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proceedings of the
2001 conference on Applications, technologies, architectures, and protocols for
computer communications, San Diego, California, USA, 2001, pp. 149-160.
[28] A.I.T. Rowstron and P. Druschel, “Pastry: Scalable, Decentralized Object Location,
and Routing for Large-Scale Peer-to-Peer Systems,” Springer-Verlag, 2001, pp. 329-350.
[29] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz,
“Tapestry: A Resilient Global-scale Overlay for Service Deployment,” IEEE Journal on Selected Areas in Communications, vol. 22, 2004, pp. 41-53.
[30] B.Y. Zhao, J.D. Kubiatowicz, and A.D. Joseph, “Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing,” Technical Report CSD-01-1141, University of California at Berkeley, 28 pages, 2001.
[31] I.J. Taylor, “Chapter 9 - Freenet,” From P2P to Web Services and Grids: Peers in a
Client/Server World, Springer, 2005, available on
http://common.books24x7.com/book/id_16228/book.asp, last accessed on 15 April
2010.
[32] L.T. Yang and M. Guo, “Chapter 29 - Resource Discovery in Peer-to-Peer
Infrastructure,” High Performance Computing: Paradigm and Infrastructure, John
Wiley & Sons, 2006, available on http://common.books24x7.com/book/id_22774/book.asp, last accessed on 15 April 2010.
[33] D. Goh and F. Schubert, “Chapter VIII - Adaptive Peer-to-Peer Social Networks for
Distributed Content-Based Web Search,” Social Information Retrieval Systems:
Emerging Technologies and Applications for Searching the Web Effectively, IGI
Publishing, 2008, available on http://common.books24x7.com/book/id_23246/book.asp, last accessed in April 2010.
[34] D. Boyd and N. Ellison, “Social network sites: Definition, history, and scholarship,”
Journal of Computer-Mediated Communication, vol. 13, 2007, available on
http://jcmc.indiana.edu/vol13/issue1/boyd.ellison.html, last accessed on 12 April
2010.
[35] D. Martinez, Introduction location related aspects to mobile multimedia
environments, Reports from MSI, University of Växjö, 52 pages, 2006.
[36] N.D. Ziv and B. Mulloth, “An Exploration on Mobile Social Networking: Dodgeball
as a Case in Point,” Proceedings of the International Conference on Mobile Business,
IEEE Computer Society, Washington, DC, USA, 21 pages, 2006.
[37] MySpace, http://www.myspace.com, last accessed on 26 May 2010.
[38] Facebook, www.facebook.com, last accessed on 26 May 2010.
[39] N. Jhanji, “ImaHima,” Sep. 2001, available on
http://90.146.8.18/en/archives/prix_archive/prix_projekt.asp?iProjectID=10954, accessed Dec 10, 2010.
[40] A. Klemm, E. Klemm, C. Lindemann, and O.P. Waldhorst, “A Special-Purpose
Peer-to-Peer File Sharing System for Mobile Ad Hoc Networks,” Proceeding of the
IEEE Semiannual Vehicular Technology Conference (VTC2003-Fall), Orlando, FL,
USA, 6 pages, October 2003.
[41] U. Lee, J. Park, J. Yeh, G. Pau, and M. Gerla, “CodeTorrent: content distribution
using network coding in VANET,” ACM Press, 6 pages, 2006.
[42] A.L. Murphy, G.P. Picco, and G. Roman, “LIME: A coordination model and
middleware supporting mobility of hosts and agents,” ACM Trans. Softw. Eng.
Methodol., vol. 15, 2006, p. 279-328.
[43] G.P. Picco, A.L. Murphy, and G. Roman, “LIME: Linda meets mobility,” In
Proceedings of the 21st International Conference on Software Engineering, Los
Angeles, USA, 1999, p. 368-377.
[44] C. Fok, G. Roman, and G. Hackmann, “A lightweight coordination middleware for
mobile computing,” In Proceedings of the 6th International
Conference on Coordination Models and Languages, vol. 2949,
2004, p. 135-151.
[45] M. Mamei and F. Zambonelli, “Programming pervasive and mobile computing
applications with the TOTA middleware,” Proceedings of the Second IEEE
Annual Conference on Pervasive Computing and Communications (PerCom), 2004, p. 263-273.
[46] M. Mamei and F. Zambonelli, “Self-Maintained Distributed Tuples for Field-based
Coordination in Dynamic Networks,” Concurrency and Computation: Practice and
Experience, vol. 18, 2004, p. 427-443.
[47] G. Cugola and G.P. Picco, PeerWare: Core Middleware Support for
Peer-to-Peer and Mobile Systems, Technical report, Politecnico di Milano, Italy, 11
pages, 2001.
[48] M. Boulkenafed and V. Issarny, “AdHocFS: Sharing Files in WLANs,” 2nd Int.
Symp. on Network Computing and Applications, 2003, p. 156-163.
[49] M. Boulkenafed and V. Issarny, “A Middleware Service for Mobile Ad Hoc Data
Sharing, Enhancing Data Availability,” Proceedings of ACM/IFIP
International Middleware Conference, Rio de Janeiro, 2003, p.
493-511.
[50] T. Plagemann, J. Andersson, O. Drugan, V. Goebel, C. Griwodz, P. Halvorsen, E.
Munthe-Kaas, M. Puzar, N. S, and K. Steml, “Middleware services for information sharing
in mobile ad-hoc networks - challenges and approach,” In Workshop on Challenges
of Mobility, IFIP TC6 World Computer Congress, 12 pages, 2004.
[51] C. Mascolo, L. Capra, S. Zachariadis, and W. Emmerich, “XMIDDLE: A Data-Sharing
Middleware for Mobile Computing,” Int. Journal on Personal
and Wireless Communications, vol. 21, 2001, p. 77-103.
[52] J. Jetcheva, Y. Hu, D. Maltz, and D. Johnson, “A Simple Protocol for Multicast and
Broadcast in Mobile Ad Hoc Networks,” IETF Internet Draft, 2001, available on
http://www.monarch.cs.rice.edu/internet-drafts/draft-ietf-manet-simple-mbcast-
01.txt, accessed 24 Nov 2009.
[53] C. Perkins, E. Belding-Royer, and S. Das, “Ad hoc On-Demand Distance Vector
(AODV) Routing,” RFC 3561, Nokia Research Center, July 2003, available on
http://www.ietf.org/rfc/rfc3561.txt, last accessed on 26 Nov 2009.
[54] D. Gelernter, “Generative communication in Linda,” ACM Transactions on
Programming Languages and Systems, vol. 7, 1985, p. 80-112.
[55] G. Cugola and G.P. Picco, “Peer-to-Peer for Collaborative Applications,” IEEE
Computer Society, 2002, p. 359-364.
[56] C. Ho, K. Obraczka, G. Tsudik, and K. Viswanath, “Flooding for Reliable Multicast
in Multi-Hop Ad Hoc Networks,” In Proceedings of the 3rd
International Workshop on Discrete Algorithms and
Methods for Mobile Computing and Communications, vol. 7,
1999, p. 64-71.
[57] H. Wu, H. Peng, Q. Zhou, M. Yang, B. Sun, and B. Yu, “P2P Multimedia Sharing
over MANET,” Advances in Multimedia Modeling, 2006, p. 635-642.
[58] E. Guttman, “Service Location Protocol: Automatic Discovery of IP Network
Services,” IEEE Internet Computing, vol. 3, 1999, p. 71-80.
[59] E. Meshkova, J. Riihijarvi, M. Petrova, and P. Mahonen, “A survey on resource
discovery mechanisms, peer-to-peer and service discovery frameworks,” Computer
Networks, vol. 52, August 2008, p. 2097-2128.
[60] R. Hermann, D. Husemann, M. Moser, M. Nidd, C. Rohner, and A. Schade,
“DEAPspace: transient ad-hoc networking of pervasive devices,” IEEE Press,
Boston, Massachusetts, 2000, p. 133-134.
[61] D. Chakraborty, A. Joshi, Y. Yesha, and T. Finin, “GSD: A Novel Group-based
Service Discovery Protocol for MANETs,” In 4th IEEE Conference on
Mobile and Wireless Communications Networks (MWCN), 2002,
p. 140-144.
[62] D. Chakraborty, A. Joshi, Y. Yesha, and T. Finin, “Toward Distributed Service
Discovery in Pervasive Computing Environments,” IEEE Transactions on Mobile
Computing, vol. 5, 2006, p. 97-112.
[63] O. Ratsimor, D. Chakraborty, A. Joshi, and T. Finin, “Allia: Alliance-based Service
Discovery for Ad-Hoc Environments,” In Proc. of ACM Mobile
Commerce Workshop, 2002, p. 1-9.
[64] S. Helal, N. Desai, V. Verma, and C. Lee, “Konark - A Service Discovery and
Delivery Protocol for Ad-Hoc Networks,” In Proceedings of the Third IEEE
Conference on Wireless Communication Networks (WCNC), New Orleans, USA, 7
pages, 2003.
[65] M. Klein, B. König-Ries, and P. Obreiter, “Service Rings - A Semantic Overlay for
Service Discovery in Ad hoc Networks,” Proceedings of the 14th International
Workshop on Database and Expert Systems Applications, IEEE Computer Society,
7 pages, 2003.
[66] U.C. Kozat and L. Tassiulas, “Network Layer Support for Service
Discovery in Mobile Ad Hoc Networks,” Proc. of IEEE/INFOCOM-2003, San
Francisco, USA, 11 pages, 2003.
[67] F. Sailhan and V. Issarny, “Proceedings of the Third IEEE International Conference
on Pervasive Computing and Communications,” IEEE Computer Society, 2005, p.
235-244.
[68] F. Perich, A. Joshi, T. Finin, and Y. Yesha, “On Data Management in Pervasive
Computing Environments,” IEEE Trans. on Knowl. and Data Eng., vol. 16, 2004, p.
621-634.
[69] L. Cheng and I. Marsic, “Service Discovery and Invocation for Mobile Ad Hoc
Networked Appliances,” IEEE Second International Workshop on Networked
Appliances (IWNA'2000), New Jersey, USA, 5 pages, 2000.
[70] A. Varshavsky, B. Reid, and E. de Lara, “A cross-layer approach to service
discovery and selection in MANETs,” IEEE International Conference on Mobile
Adhoc and Sensor Systems Conference, Washington DC, USA, 2005, p. 459-466.
[71] V. Lenders, M. May, and B. Plattner, “Service discovery in mobile ad hoc networks:
A field theoretic approach,” Pervasive and Mobile Computing, vol. 1, Sep. 2005, p.
343-370.
[72] S. Preu, “ESA Service Discovery Protocol,” Proceedings of Networking, Pisa, Italy,
LNCS, 2002, p. 1196-1201.
[73] N. Nikaein, H. Labiod, and C. Bonnet, “Distributed Dynamic Routing Algorithm for
Mobile Ad Hoc Networks,” First Annual Workshop on Mobile and Ad Hoc Networking
and Computing (MobiHOC), Boston, USA, 2000, p. 19-27.
[74] S. Murthy and J.J. Garcia-Luna-Aceves, “A routing protocol for packet radio
networks,” Proceedings of the 1st annual international conference on Mobile
computing and networking, Berkeley, California, United States: ACM, 1995, p. 86-
95.
[75] S. Murthy and J.J. Garcia-Luna-Aceves, “An efficient routing protocol for wireless
networks,” Mob. Netw. Appl., vol. 1, 1996, p. 183-197.
[76] C.E. Perkins and P. Bhagwat, “DSDV routing over a multihop wireless network of
mobile computers,” Ad hoc networking, Addison-Wesley Longman Publishing Co.,
Inc., 2001, p. 53-74.
[77] D. Johnson, D. Maltz, and J. Broch, “DSR: the dynamic source routing protocol for
multihop wireless ad hoc networks,” Ad hoc networking, Addison-Wesley Longman
Publishing Co., Inc., 2000, p. 139-172.
[78] Z. Haas, “A New Routing Protocol For The Reconfigurable Wireless Networks,” In
Proceedings of the 6th International Conference on Universal Personal
Communications, vol. 2, p. 562-566, 1997.
[79] M. Mauve, J. Widmer, and H. Hartenstein, “A Survey on Position-Based Routing in
Mobile Ad-Hoc Networks,” IEEE Network, vol. 15, 2001, p. 30-39.
[80] Y. Ko and N.H. Vaidya, “Location-aided routing (LAR) in mobile ad hoc
networks,” Journal of wireless networks, Kluwer Academic Publishers, vol. 6, 2000,
p. 307-321.
[81] S. Basagni, I. Chlamtac, V. Syrotiuk, and B. Woodward, “A distance routing effect
algorithm for mobility (DREAM),” ACM Press, 1998, p. 76-84.
[82] T. Atechian and L. Brunie, “DG-CastoR : Direction-based Geocast Routing protocol
for VANET,” IADIS International Conference on Telecommunications, Networks and
Systems (TNS), Amsterdam, Netherlands, 8 pages, 2008.
[83] T. Atechian and L. Brunie, “DG-CastoR for query packets dissemination in
VANET,” 5th IEEE Mobile Ad hoc and Sensor Networks MASS, Atlanta, USA, 6
pages, 2008.
[84] A. Shiferaw, L. Brunie, and V. Scuturici, “Interest-Awareness for Information
Sharing in MANETs,” International workshop on Mobile P2P Data Management,
Security and Trust (MP-DMST*) in conjunction with the 11th IEEE International
Conference on Mobile Data Management (MDM), Kansas City, USA, May, 6 pages,
2010.
[85] A. Shiferaw, S. Lajmi, V. Scuturici, and L. Brunie, “PASMi: self-adaptive Photo
Annotation and Sharing Middleware of Mobile Ad-hoc Networks,” Conference on
Pervasive Computing and Communications Workshops (PerComW) 2010,
Mannheim, Germany, 6 pages, 2010.
[86] L. Limam, D. Coquil, H. Kosch, and L. Brunie, “Extracting user interests from
search query logs: A clustering approach,” In the 7th International Workshop on
Text-based Information Retrieval (TIR '10) in conjunction with the 21st
International Conference on Database and Expert Systems Applications (DEXA
'10), IEEE, Bilbao, Spain, 5 pages, 2010.
[87] D. Metzler, S. Dumais, and C. Meek, “Similarity Measures for Short Segments of
Text,” Advances in Information Retrieval, 2007, p. 16-27.
[88] R. Xu and D. Wunsch, “Chapter 3 - Hierarchical Clustering,” Clustering, 2009,
available on http://common.books24x7.com/book/id_27271/book.asp, last accessed
on 15 April 2010.
[89] S. Ma and J.L. Hellerstein, “Mining Partially Periodic Event Patterns With
Unknown Periods,” Proceedings of the International Conference on Data
Engineering (ICDE), Heidelberg, Germany, 2000, p. 205-214.
[90] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,”
Proceeding of the Very Large Data Bases (VLDB) Conference, Santiago de Chile,
Chile, 1994, p. 487-499.
[91] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets
of Items in Large Databases,” Proceedings of the ACM SIGMOD International
Conference on Management of Data, Washington, D.C., USA, 1993, p. 207-216.
[92] M. Denko, “PUSMAN: Publish-Subscribe Middleware for Ad Hoc Networks”,
Proceeding of the Canadian Conference on Electrical and Computer Engineering,
Ottawa, ON, Canada: 2006, p. 1677-1681.
[93] B. Mobasher, R. Cooley, and J. Srivastava, “Automatic personalization based on
Web usage mining,” Commun. ACM, vol. 43, 2000, p. 142-151.
[94] M. Eirinaki and M. Vazirgiannis, “Web mining for web personalization,” ACM
Trans. Internet Technology, vol. 3, 2003, p. 1-27.
[95] T. Joachims, “Optimizing search engines using clickthrough data,” Proceedings of
the eighth ACM SIGKDD international conference on Knowledge discovery and
data mining, Edmonton, Alberta, Canada, 2002, p. 133-142.
[96] K. Sugiyama, K. Hatano, and M. Yoshikawa, “Adaptive web search based on user
profile constructed without any effort from users,” Proceedings of the 13th
international conference on World Wide Web, New York, USA, 2004, p. 675-684.
[97] H. Lieberman, “Letizia: An Agent That Assists Web Browsing,” International Joint
Conference on Artificial Intelligence, Montreal, Quebec, Canada, 1995, p. 924-929.
[98] J. Budzik and K. Hammond, “Watson: Anticipating and Contextualizing
Information Needs,” In 62nd annual meeting of the American society for information
science, Washington, DC, USA, 1999, p. 727-740.
[99] D. Goldberg, D. Nichols, B.M. Oki, and D. Terry, “Using collaborative filtering to
weave an information tapestry,” Communications of the ACM, vol. 35, 1992, p. 61-70.
[100] G. Dupret, “Web search engine evaluation using click-through data and a user
model,” Proceeding of the workshop on query log analysis (WWW), Banff, Canada,
8 pages, 2007.
[101] A. Shiferaw, L. Brunie, V. Scuturici, and Y. Fawaz, “Mobility Awareness for
Information Sharing in MANETs,” the 11th IEEE International Conference on
Mobile Data Management (MDM), Kansas City, USA, May, 3 pages, 2010.
[102] W. Su, S. Lee, and M. Gerla, “Mobility Prediction and Routing in Ad Hoc Wireless
Networks,” International Journal of Network Management, vol. 11, 31 pages, 2001.
[103] A. Negash, L. Brunie, and V. Scuturici, “A context aware Information sharing
Middleware for a dynamic pervasive computing environment,” The International
Journal on Computer Science and Information Systems, 2007, p. 65-82.
[104] C. White, L. Quin, and L. Burman, “Chapter 23 - Introducing the Dublin Core,”
Mastering XML Premium Edition, 2001, available on
http://common.books24x7.com/book/id_2783/book.asp, last accessed on 15 April
2010.
[105] H. Kosch, “Chapter 2 - MPEG-7: The Multimedia Content Description Standard,”
Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21,
Auerbach Publications, 2004, available on
http://common.books24x7.com/book/id_7367/book.asp, last accessed on 15 April
2010.
[106] E. Chisholm and T.G. Kolda, “New term weighting formulas for the vector space
method in information retrieval,” Technical report, Oak Ridge National Laboratory,
20 pages, 1999.
[107] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, “An
Efficient k-Means Clustering Algorithm: Analysis and Implementation,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 24, July 2002, p. 881-892.
[108] Y. Zhao and G. Karypis, “Empirical and Theoretical Comparisons of Selected
Criterion Functions for Document Clustering,” Machine Learning, vol. 55, June
2004, p. 311-331.
[109] “A Tutorial on Clustering Algorithms,” Feb. 2010, available on
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/, last accessed on 17 Jan
2010.
[110] K. Teknomo, “K-Mean Clustering Tutorials”, available on
http://people.revoledu.com/kardi/tutorial/kMean/index.html, last accessed on 25 Feb
2010.
[111] A. Negash, L. Brunie, and V. Scuturici, “A Self-Adaptive Information sharing
Middleware for a dynamic pervasive computing environment,” The 3rd International
Conference on Wireless Applications and Computing, p. 35-42, Lisbon, Portugal,
July, 2007.
[112] Y. Fawaz, A. Negash, L. Brunie, and V. Scuturici, “Service Composition-Based
Content Adaptation for Pervasive Computing Environment,” The 4th IEEE
International Conference on Pervasive Services (ICPS’07), p. 189-192, Istanbul,
Turkey, July, 2007.
[113] Y. Fawaz, A. Negash, L. Brunie, and V. Scuturici, “ConAMi: Collaboration Based
Content Adaptation Middleware for Pervasive Computing Environment,” The 3rd
International Conference on Wireless Applications and Computing, p. 35-42,
Lisbon, Portugal, July, 2007.
[114] A.K. Dey, “Understanding and Using Context,” Personal Ubiquitous Computing,
vol. 5, 2001, p. 4-7.
[115] T. Winograd, “Architectures for context,” Human-Computer Interaction,
vol. 16, 2001, p. 401-419.
[116] D. Ejigu, “Context Modeling and Collaborative Context-Aware Services for
Pervasive Computing,” PhD Thesis, INSA de Lyon - France, 245 pages, 2007.
[117] HP Labs, “Jena - a Semantic Web Framework for Java,” available on:
http://jena.sourceforge.net/, last accessed on 20 March 2010.
[118] Y. Fawaz, “Context-Aware Service Composition and Execution for Pervasive
Computing: a data driven approach,” PhD Thesis, INSA de Lyon - France, 213
pages, 2010.
[119] “Testdata,” Department of Computer Science and Engineering, University of
Washington, http://www.cs.washington.edu/research/imagedatabase/groundtruth,
last accessed on 20 April 2010.
[120] Jama, “A Java Matrix Package,” available on
http://math.nist.gov/javanumerics/jama/, last accessed on 16 Sept 2009.
Annex A. Extended Summary
Information sharing within a mobile peer-to-peer network has become an important research
topic thanks to rapid progress in wireless communication technologies and smart mobile
devices. Users can share general information (for example, documents about education or
tourism), personal information (for example, photos and personal profiles), or live
broadcasts (for example, radio or television programs).
The goal of our research is to design and implement an information sharing system for
ad-hoc environments. Information sharing means making data available to the people one is
in contact with, so that they can view, modify or download it. The system lets users share
information wherever and whenever they have the opportunity to do so over a MANET. This
thesis focuses in particular on the following requirements:
• Ubiquity: nomadic users should be able to share information anywhere, anytime, and
using any device.
• Mobility awareness: the mechanisms that implement information sharing must take
users' mobility into account.
• Interest awareness: shareable files must be selected according to users' interests.
• High-level semantics: shareable files must be advertised through high-level
descriptions.
• Context-aware content delivery: files must be delivered according to the context of
users and of their environment.
• Social network awareness: shareable files must be selected by considering the social
networks to which users belong.
• Data forwarding/routing: advertisements and queries must be forwarded according to
users' interests.
To address these challenges, we propose a middleware called SAMi that enables nomadic
users to share information.
This chapter is organized as follows. Interest awareness and mobility awareness are first
examined in Section 1. SAMi is then presented in Section 2. Finally, Section 3 concludes
the chapter with some research perspectives.
1 Interest and Mobility Awareness
1.1 Interest Awareness
In a MANET, information sharing is carried out by distributing advertisements and queries.
Peers own files that they intend to share with others. We assume that a file is described by
a set of keywords. Peers that wish to receive files search for them by broadcasting queries,
which are in turn represented by sets of keywords.
To avoid overloading the environment with useless advertisements and queries, file
advertisement and query resolution must be performed according to users' interests.
An interest represents a set of files that the user intends to receive or provide. An
interest, denoted I, is represented by ((k1, .., kn), w) such that
• k1, .., kn are keywords (called Description(I)), and
• w ∈ (0, 1] is a weight (called Weight(I)).
Description(I) describes the files represented by the interest I. Weight(I) indicates a
user's preference for, or capacity to, receive or provide the files represented by I.
An empty interest is defined to represent the files that the user cannot describe.
The description of the empty interest Ie is the empty set, i.e., Description(Ie) = ø.
The similarity between two interests Ii and Ij is defined as the similarity of their
descriptions. Let Similarity(Di, Dj) denote the similarity value of two descriptions Di and
Dj. Similarity(Di, Dj) can be computed using the semantic similarity function
proposed in [86] or one of the lexical similarity functions proposed in [87].
The similarity value of interests Ii and Ij is computed as follows:
Similarity(Ii, Ij) = Similarity(Description(Ii), Description(Ij))
We define the similarity value between an interest and a file or a query in the
same way. Let Df be the description of a file f and let q be a query.
Similarity(f, I) and Similarity(q, I) are defined by:
Similarity(f, I) = Similarity(Df, Description(I))
Similarity(q, I) = Similarity(q, Description(I))
Let Ei and Ej be two elements, each of which can be an interest, a file or a query.
The two elements are similar (denoted Ei ≈ Ej) if and only if:
Similarity(Ei, Ej) ≥ accSim, where accSim is a predefined similarity threshold.
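The definitions above leave the similarity function open ([86], [87]). As a minimal sketch, assuming a simple lexical measure (the Jaccard overlap of keyword sets) and a hypothetical threshold accSim = 0.5, interests, files and queries can all be compared through their keyword descriptions:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Union

ACC_SIM = 0.5  # hypothetical value for the threshold accSim


@dataclass(frozen=True)
class Interest:
    description: FrozenSet[str]  # keywords k1..kn; empty set for the empty interest Ie
    weight: float                # Weight(I), expected in (0, 1]


Element = Union[Interest, Iterable[str]]  # an interest, a file description or a query


def keywords(element: Element) -> FrozenSet[str]:
    """Return the keyword description of an interest, a file or a query."""
    if isinstance(element, Interest):
        return element.description
    return frozenset(element)


def similarity(e1: Element, e2: Element) -> float:
    """Jaccard overlap of two descriptions, used here as a stand-in lexical measure."""
    d1, d2 = keywords(e1), keywords(e2)
    union = d1 | d2
    if not union:
        return 1.0  # convention chosen here: two empty descriptions are identical
    return len(d1 & d2) / len(union)


def similar(e1: Element, e2: Element, acc_sim: float = ACC_SIM) -> bool:
    """Ei ≈ Ej if and only if Similarity(Ei, Ej) >= accSim."""
    return similarity(e1, e2) >= acc_sim
```

For instance, the interest `Interest(frozenset({"tourism", "lyon"}), 0.3)` and the query `{"tourism", "lyon", "museum"}` share two of three distinct keywords, giving a similarity of 2/3, so they are similar under accSim = 0.5. A semantic measure such as the one in [86] would replace only the `similarity` function.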
The sharing interest of a peer is the set of that peer's interests in a given sharing
context. A sharing context¹⁸ of a peer describes a situation in which the peer allows
others to download files from its device. A sharing context is expressed by a tuple
(L, [Ts, Tf]), where L is a location and [Ts, Tf] is a time interval.
For example, (Bus 1, [8AM, 10AM]) is a sharing context stating that a peer allows other
peers to download files from its device when it is in Bus 1 between 8AM and 10AM. Other
examples of sharing contexts are listed in Table 1.
Table 1: Examples of sharing contexts
Context               Description
(Bus, [8AM, 10AM])    Any bus from 8AM to 10AM
("", [8AM, 10AM])     Any location from 8AM to 10AM
("", ø)               Anywhere and anytime
18 We use the terms "sharing context" and "context" interchangeably.
We define two types of sharing contexts: abstract and real. An abstract sharing context
describes when and where a peer allows other peers to download files from its device. For
example, a user can specify that others may download files from his device anywhere and
whenever he is in a MANET by setting the sharing context to ("", ø). This does not mean,
however, that he is in a MANET 24 hours a day, 7 days a week. A real sharing context is
derived from an abstract sharing context by considering the actual time and place in which
data were shared. As an illustration, suppose that a peer with the abstract sharing context
("", ø) is connected with other nomadic users through a MANET in Bus 27 from 8:00 to
8:10. Then (Bus 27, [8:00, 8:10]) is a real context derived from the abstract context
("", ø).
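Deriving a real context from an abstract one amounts to checking that the actual place and time fall within the abstract context. The sketch below is one possible reading, under two stated assumptions: `None` stands for ø (any time), and a location category such as "Bus" matches any concrete location of the form "Bus <n>". The class name `SharingContext` is hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Tuple


@dataclass(frozen=True)
class SharingContext:
    location: str                          # "" stands for any location
    interval: Optional[Tuple[time, time]]  # None stands for ø (any time)

    def covers(self, real: "SharingContext") -> bool:
        """True if a real context (actual place and time of sharing) lies within this abstract one."""
        # Assumption: a location category such as "Bus" matches "Bus 27" but not "Bus2".
        if self.location and not (real.location == self.location
                                  or real.location.startswith(self.location + " ")):
            return False
        if self.interval is None:
            return True
        if real.interval is None:
            return False
        start, end = self.interval
        real_start, real_end = real.interval
        return start <= real_start and real_end <= end
```

With these assumptions, the abstract context ("", ø), i.e. `SharingContext("", None)`, covers the real context `SharingContext("Bus 27", (time(8, 0), time(8, 10)))`, and so does `SharingContext("Bus", (time(8), time(10)))`, the "any bus from 8AM to 10AM" row of Table 1.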
A sharing interest S satisfies the following conditions:
[1] |S| ≥ 1,
[2] Description(I1) ≠ Description(I2) for all I1, I2 ∈ S with I1 ≠ I2,
[3] $\sum_{I \in S} Weight(I) = 1$,
[4] Weight(I) ≥ minW for all I ∈ S, where minW denotes the minimal weight of an interest.
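Conditions [1] to [4] can be checked mechanically. The sketch below assumes a sharing interest is given as a list of (keywords, weight) pairs and uses a hypothetical minW of 0.05. The sum in [3] is compared with a small floating-point tolerance, since weights such as 0.7 and 0.3 are not exact in binary.

```python
import math
from typing import Iterable, List, Tuple

MIN_W = 0.05  # hypothetical value for minW

SharingInterest = List[Tuple[Iterable[str], float]]


def is_valid_sharing_interest(s: SharingInterest, min_w: float = MIN_W) -> bool:
    """Check conditions [1]-[4] on a sharing interest given as (keywords, weight) pairs."""
    if len(s) < 1:                                          # [1] |S| >= 1
        return False
    descriptions = [frozenset(k) for k, _ in s]
    if len(set(descriptions)) != len(descriptions):         # [2] pairwise distinct descriptions
        return False
    weights = [w for _, w in s]
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):   # [3] weights sum to 1
        return False
    return all(w >= min_w for w in weights)                 # [4] Weight(I) >= minW
```

For example, `[({"finance"}, 0.7), ({"tourism"}, 0.3)]` is valid, whereas `[({"finance"}, 0.98), ({"tourism"}, 0.02)]` violates [4] under the assumed minW.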
To decide whether two sharing interests S1 and S2 are similar, we first require that their
interests match: for each interest Ii in S1 there must be an interest Ij in S2 such that
Ii ≈ Ij, and conversely. Sharing interests that do not satisfy this condition are not
similar. For sharing interests that satisfy this main condition, we use the cosine measure
to determine their similarity.
Consider two sharing interests S1 = {I11, .., I1n} and S2 = {I21, .., I2m} such that |S1| = n and
|S2| = m.
Let W1i and W2i be the relative weights of I1i and I2i respectively; the vector
representations of S1, denoted P1, and of S2, denoted P2, are given by
• P1 = (W11, .., W1n) and
• P2 = (W21, .., W2m)
Let W12i denote the average weight of the interests of S1 that are similar to interest I2i.
Let W21i denote the average weight of the interests of S2 that are similar to interest I1i.
For a sharing interest S, let Sim(S, I) be the set of interests of S that are similar to
interest I, i.e., Sim(S, I) = {Ij | Ij ∈ S and Ij ≈ I}. W12i and W21i are computed as follows:
$$W_{12i} = \frac{\sum_{I \in Sim(S_1, I_{2i})} Weight(I)}{|Sim(S_1, I_{2i})|}$$
$$W_{21i} = \frac{\sum_{I \in Sim(S_2, I_{1i})} Weight(I)}{|Sim(S_2, I_{1i})|}$$
Let P12 be the vector representation of S1 relative to S2, and let P21 be the vector
representation of S2 relative to S1; these vectors are defined as follows:
• P12 = (W121, .., W12m)
• P21 = (W211, .., W21n)
The main similarity condition is satisfied by S1 and S2 if and only if:
∀ Ii ∈ S1, ∃ Ij ∈ S2 such that Ii ≈ Ij, and
∀ Ii ∈ S2, ∃ Ij ∈ S1 such that Ii ≈ Ij
We define the similarity value between sharing interests S1 and S2 as follows:
$$Similarity(S_1, S_2) = \begin{cases} \dfrac{\cos(P_{12}, P_2) + \cos(P_{21}, P_1)}{2} & \text{if the main similarity condition is satisfied} \\ 0 & \text{otherwise} \end{cases}$$
Similarity(S1, S2) is commutative. Two sharing interests S1 and S2 are similar if and
only if:
S1 ≈ S2 ⇔ Similarity(S1, S2) ≥ accC, where accC is a predefined similarity threshold.
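The computation of P12, P21 and Similarity(S1, S2) can be sketched as follows. The sketch assumes that sharing interests are lists of (keyword set, weight) pairs, that interest similarity is the Jaccard overlap of keyword sets, and that accSim = 0.5 (a hypothetical value); any similarity function from [86] or [87] could be substituted in `similar`.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

ACC_SIM = 0.5  # hypothetical accSim
SharingInterest = List[Tuple[FrozenSet[str], float]]


def similar(d1: FrozenSet[str], d2: FrozenSet[str]) -> bool:
    """Interest similarity: Jaccard overlap compared with accSim (an assumption)."""
    union = d1 | d2
    return (len(d1 & d2) / len(union) if union else 1.0) >= ACC_SIM


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def relative_vector(s_a: SharingInterest, s_b: SharingInterest) -> List[float]:
    """Vector of S_a relative to S_b: for each interest of S_b, the average weight of
    the interests of S_a similar to it (so P12 = relative_vector(S1, S2))."""
    vector = []
    for d_b, _ in s_b:
        weights = [w_a for d_a, w_a in s_a if similar(d_a, d_b)]
        vector.append(sum(weights) / len(weights) if weights else 0.0)
    return vector


def main_condition(s1: SharingInterest, s2: SharingInterest) -> bool:
    """Every interest of each sharing interest has a similar interest in the other one."""
    return (all(any(similar(d1, d2) for d2, _ in s2) for d1, _ in s1)
            and all(any(similar(d2, d1) for d1, _ in s1) for d2, _ in s2))


def sharing_similarity(s1: SharingInterest, s2: SharingInterest) -> float:
    if not main_condition(s1, s2):
        return 0.0
    p1 = [w for _, w in s1]
    p2 = [w for _, w in s2]
    p12 = relative_vector(s1, s2)  # m components, compared with P2
    p21 = relative_vector(s2, s1)  # n components, compared with P1
    return (cosine(p12, p2) + cosine(p21, p1)) / 2
```

For S1 = {({finance}, 0.7), ({tourism}, 0.3)} and S2 = {({finance}, 0.6), ({tourism}, 0.4)}, each relative vector reproduces the other set's weights and the result is about 0.983; swapping the arguments yields the same value, consistent with the commutativity noted above.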
A sharing interest can be used as an information demand or as an information provision. A
peer's information demand is a sharing interest containing the interests that describe the
information this peer wishes to receive. An information provision contains the interests
describing the information this peer is ready to provide.
Information-Demand(p, pd, c) represents the information demand of peer p as observed
by peer pd in context c. When p equals pd, Information-Demand(p, pd, c) is
written Information-Demand(p, c).
Information-Provision(p, pd, c) represents the information provision of peer p as
observed by peer pd in context c. When p equals pd, Information-Provision(p,
pd, c) is written Information-Provision(p, c).
Overall-Demand(P) represents the interests of a set of peers P describing the
information these peers wish to receive.
Overall-Provision(P) represents the interests of a set of peers P describing the
information these peers are ready to provide.
Users' interests can be expressed manually by the users themselves. For
example, a user can declare that he is interested in receiving jokes in the
context Bus 37. Users' interests can also be computed automatically
from the queries and advertisements exchanged in the past. Interests can also be determined using association rules.
An association rule related to an information demand is written as follows:
<Context = c> ⇒ <Information-Demand = D>
Example: the rule below states that from 8:00 to 8:10, at any location, the user's
information demand contains an interest related to finance (70%) and another interest
related to tourism (30%).
<Context = ("", [8:00AM, 8:10AM])> ⇒
<Information-Demand = {({finance}, 0.70), ({tourism}, 0.30)}>
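Applying such rules amounts to a lookup from the current context to an information demand. As a minimal sketch, assuming rules are stored as ((location, start, end), demand) pairs with each interest reduced to a single keyword, the example rule above and the Bus 37 jokes example can be encoded as follows; the 17:00-18:00 window for the jokes rule and the names `RULES` and `information_demand` are hypothetical.

```python
from datetime import time
from typing import Dict, List, Optional, Tuple

# A rule maps a context (location, start, end) to an information demand {keyword: weight}.
# "" means any location. These rules are illustrative examples, not mined data.
Rule = Tuple[Tuple[str, time, time], Dict[str, float]]

RULES: List[Rule] = [
    (("", time(8, 0), time(8, 10)), {"finance": 0.70, "tourism": 0.30}),
    (("Bus 37", time(17, 0), time(18, 0)), {"jokes": 1.0}),
]


def information_demand(location: str, at: time,
                       rules: List[Rule] = RULES) -> Optional[Dict[str, float]]:
    """Return the demand of the first rule whose context matches the current place and time."""
    for (rule_location, start, end), demand in rules:
        if rule_location in ("", location) and start <= at <= end:
            return demand
    return None
```

This sketch simply returns the first matching rule; when several rules match, a real system would need a way to choose between them, for example by rule confidence.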
A requesting peer can use association rules to identify its own information demands. A
data source can produce association rules to identify the information demands of
requesting peers. However, rule mining is too costly to be performed for every requesting
peer encountered in a MANET. A data source should therefore select the important peers for
which association rules will be produced. We propose using the social links of a data
source to identify these important peers.
Since a data source may have many social links, identifying the interests of all these
peers could still be costly. We therefore propose to identify the peers that habitually
share information with a data source, to place them in social groups, and to sort them
according to the similarity of their sharing interests. The social groups thus created
are then used to identify the peers' interests.
1.2 Mobility Awareness
As discussed in the previous section, information sharing in a MANET is generally carried
out by disseminating advertisements and queries. The selection and dissemination of file
advertisements are governed by an advertisement policy, which determines the volume of
information in an advertisement, the period after which another advertisement should be
made, and the number of peers an advertisement traverses. To avoid overloading the
environment with useless traffic, an advertisement policy must be designed according to
users' information demands and provisions. These, in turn, depend on the users' context,
their connection time, and the time they remain together in a MANET.
A MANET is a collection of devices connected through wireless communication technologies.
In practice, a MANET can be considered as a set of peers. Two peers may have a direct
connection, or they may be connected indirectly through other peers. They are called direct
neighbours if they have a direct connection, and multi-hop neighbours if they are connected
indirectly through other peers.
Each peer has a local view of a MANET, called its MANET-View. Suppose that every user on a
bus has a mobile device: there is then always a MANET on the bus. However, the MANET-View
of a particular user is bounded by the moments when he gets on and off the bus. Besides
time and location, a peer's MANET-View is also limited by its knowledge: it contains the
set of peers with which the peer can communicate (directly or over multiple hops).
A MANET, denoted V(P), is a set of peers P communicating through wireless
communication technologies. A MANET View, denoted V(P, p0), is a projection of a
MANET defined using the knowledge of a peer p0.
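Because a MANET View contains the peers that p0 can reach directly or over several hops, it can be sketched as a breadth-first search over the links known to p0. The adjacency-map input and the function name `manet_view` are assumptions for illustration. The returned hop counts are the kind of per-peer distances that a hop-based statistic would average.

```python
from collections import deque
from typing import Dict, Hashable, Set


def manet_view(links: Dict[Hashable, Set[Hashable]], p0: Hashable) -> Dict[Hashable, int]:
    """Peers reachable from p0 (directly or multi-hop), mapped to their hop distance."""
    hops = {p0: 0}
    queue = deque([p0])
    while queue:
        peer = queue.popleft()
        for neighbour in links.get(peer, set()):
            if neighbour not in hops:
                hops[neighbour] = hops[peer] + 1
                queue.append(neighbour)
    del hops[p0]  # the view lists the other peers only
    return hops
```

With links `{"p0": {"a"}, "a": {"p0", "b"}, "b": {"a"}, "c": {"d"}}`, the view of p0 is `{"a": 1, "b": 2}`: a is a direct neighbour, b a multi-hop neighbour, while c and d belong to the MANET but not to p0's view.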
For peers p1 and p2, let stay-time(p1, p2) be the estimated time during which p1 and p2
remain connected. Connectivity-lifetime(V(P, p0)) is defined as the average time that peer
p0 remains connected with the other peers in the MANET View V(P, p0).
Consider a MANET View V(P, p0) with p0 ∈ P; a sharing statistic, denoted
s(p0, c), describes the quantitative behaviour of the peers in the MANET View V(P, p0) in the
sharing context c. A sharing statistic s(p0, c) consists of the following
attributes:
Hop(s(p0, c)): average distance (in hops) between peer p0 and the other peers,
Files-provisioned(s(p0, c)): average number of files provided by p0 to a peer,
Queries-received(s(p0, c)): average number of queries received by p0,
Usage-factor(s(p0, c)): number of files discovered and downloaded from p0
thanks to advertisements made by p0,
Co-lifetime(s(p0, c)): connectivity-lifetime(V(P, p0)) as described above.
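A sharing statistic is essentially a record of five averages. The sketch below mirrors the attributes listed above and computes the connectivity lifetime as the mean of the estimated stay-times; it assumes the stay-time estimates are already available (how they are estimated is not shown), and the unit (minutes) and the names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Hashable


def connectivity_lifetime(stay_times: Dict[Hashable, float]) -> float:
    """Average stay-time(p0, p) over the other peers p of the MANET View V(P, p0)."""
    return mean(stay_times.values()) if stay_times else 0.0


@dataclass
class SharingStatistic:
    hop: float                # Hop: average hop distance between p0 and the other peers
    files_provisioned: float  # Files-provisioned: average number of files p0 provides to a peer
    queries_received: float   # Queries-received: average number of queries received by p0
    usage_factor: int         # Usage-factor: files found and downloaded thanks to p0's advertisements
    co_lifetime: float        # Co-lifetime: connectivity-lifetime(V(P, p0))
```

For example, if p0 is expected to stay connected 10, 20 and 30 minutes with three peers, `connectivity_lifetime` returns 20, which becomes the co_lifetime attribute of the statistic.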
A mobility class, denoted m(p, c), is a structure used by a peer p to describe a group of
MANET Views according to their connectivity lifetime in the abstract sharing context c.
The idea behind the notion of mobility class is that the same advertisement policy is
applied in all MANET Views described by the same mobility class. The important attributes
of a mobility class are range-lifetimes(m(p, c)) and adv-policy(m(p, c)).
range-lifetimes(m(p, c)), denoted [tmn, tmx), indicates that the connectivity lifetime of a
MANET View described by m(p, c) is greater than or equal to tmn and less than tmx.
adv-policy(m(p, c)) is the advertisement policy applied in the MANET Views described by
m(p, c). adv-policy(m(p, c)) is defined by:
• Adv-volume(m(p, c)): maximum volume of an advertisement,
• Adv-radius(m(p, c)): maximum number of hops an advertisement traverses, and
• Adv-period(m(p, c)): time after which an advertisement should be repeated.
ix
A peer can identify a mobility class using the connectivity lifetime of a MANET View, which is determined by the connection times of the peers.
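The lookup described above can be sketched as follows: each mobility class carries its half-open interval [tmn, tmx) and its advertisement policy, and a peer picks the class whose interval contains the measured connectivity lifetime. All names and the numeric bounds and policies in `CLASSES` are made-up illustrations, not values from the thesis.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdvPolicy:
    volume: int    # Adv-volume: maximum number of entries in one advertisement
    radius: int    # Adv-radius: maximum number of hops
    period: float  # Adv-period: seconds before the advertisement is repeated


@dataclass(frozen=True)
class MobilityClass:
    name: str
    t_min: float   # range-lifetimes lower bound tmn (inclusive)
    t_max: float   # range-lifetimes upper bound tmx (exclusive)
    policy: AdvPolicy


def classify(lifetime: float, classes: list[MobilityClass]) -> Optional[MobilityClass]:
    """Return the mobility class whose [tmn, tmx) interval contains the lifetime."""
    for m in classes:
        if m.t_min <= lifetime < m.t_max:
            return m
    return None


# Illustrative classes; the bounds and policies are hypothetical values.
CLASSES = [
    MobilityClass("m1", 0, 60, AdvPolicy(volume=5, radius=1, period=10)),
    MobilityClass("m2", 60, 600, AdvPolicy(volume=20, radius=2, period=60)),
    MobilityClass("m3", 600, math.inf, AdvPolicy(volume=50, radius=3, period=300)),
]
```

Because the intervals are half-open, a lifetime of exactly 60 s falls into m2, never into both m1 and m2.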
In the manuscript, we describe how to use association rules to determine the mobility classes in a given context. For example, the association rule below associates mobility class m3 with the context “bus 3 between 8:00 and 8:10”.
<context = (Bus 3, [8:00-8:10])> <mobility class = m3>
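Once mined, such rules can be applied with a simple lookup. The sketch below assumes a hypothetical in-memory rule base in which a context is a (location, time interval) pair and a missing interval stands for “any time”; the thesis does not prescribe this representation.

```python
from datetime import time
from typing import Optional

# Hypothetical rule base: (location, start, end, mobility class).
# start = end = None encodes "any time" (written ∅ in the rules).
Rule = tuple[str, Optional[time], Optional[time], str]

RULES: list[Rule] = [
    ("Bus 3", time(8, 0), time(8, 10), "m3"),
]


def mobility_class_for(location: str, now: time, rules: list[Rule] = RULES) -> Optional[str]:
    """Return the mobility class of the first rule whose context matches, else None."""
    for loc, start, end, cls in rules:
        if loc != location:
            continue
        if start is None or end is None or start <= now <= end:
            return cls
    return None
```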
1.3 File classification
In this thesis, we propose to organize files hierarchically in structures called clusters. The structure containing the clusters is called the file tree. The root of the tree is an artificial cluster representing all shareable files.
A file tree is built bottom-up. First, the files are classified into clusters; the resulting groups are then classified into other clusters. Classification thus continues until a tree of the requested depth is obtained. Shareable files added after the classification has been computed are inserted into level-1 clusters.
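The bottom-up construction can be sketched as repeated clustering: files are grouped first, then the centroids of those groups are grouped, and so on. The thesis only states that an unsupervised classification technique is used; the small k-means below (with deterministic seeding) is a stand-in chosen for illustration, and the function names are hypothetical. Files are assumed to be represented as numeric vectors.

```python
Vector = list[float]


def dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: list[Vector]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: list[Vector], k: int, iters: int = 10) -> list[list[int]]:
    """Tiny k-means seeded with the first k points; returns non-empty groups of indices."""
    k = min(k, len(points))
    centers = [list(p) for p in points[:k]]
    groups: list[list[int]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: dist2(p, centers[j]))
            groups[nearest].append(i)
        centers = [centroid([points[i] for i in g]) if g else centers[j]
                   for j, g in enumerate(groups)]
    return [g for g in groups if g]


def build_file_tree(files: list[Vector], counts_bottom_up: list[int]):
    """Cluster the files, then the resulting clusters, and so on (bottom-up).
    counts_bottom_up gives the number of clusters per level, leaves first.
    Each level is a list of (centroid, indices of members in the level below)."""
    levels = []
    items = files
    for k in counts_bottom_up:
        level = [(centroid([items[i] for i in g]), g) for g in kmeans(items, k)]
        levels.append(level)
        items = [c for c, _ in level]  # the next level groups these centroids
    return levels
```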
We propose to compute the height of a file tree and the number of clusters defined at each depth according to the representative mobility classes of the user's sharing activity. The representative mobility classes are the mobility classes that show significant differences in terms of advertisement volumes.
Let β > 1 be a threshold indicating that there is a significant difference between two advertisement volumes. A mobility class mi is said to be significantly larger than a mobility class mj (denoted mi > mj) if and only if:
$$\frac{\text{adv-volume}(m_i)}{\text{adv-volume}(m_j)} \geq \beta$$
A mobility class mi is said to be significantly smaller than a mobility class mj (denoted mi < mj) if and only if mj > mi. Let nf be the number of shareable files, and let M be the set of mobility classes considered by a peer during information sharing. The list Mimp ⊂ M is called the set of representative mobility classes if and only if:
1. mi < mj or mj < mi, ∀ mi, mj ∈ Mimp with mi ≠ mj
2. for every m ∈ M − Mimp, one of the following conditions holds:
d. ∃ mi ∈ Mimp such that $\frac{\text{adv-volume}(m)}{\text{adv-volume}(m_i)} < \beta$
e. $\frac{nf}{\text{adv-volume}(m)} < \beta$
3. For every m ∈ Mimp, one of the following conditions holds:
a. the mobility class lies inside the list, i.e., ∃ mi, mj ∈ Mimp − {m} such that mi < m < mj,
b. the mobility class lies at the end of the list, i.e., mi < m ∀ mi ∈ Mimp − {m} and β * adv-volume(m) ≤ nf,
c. the mobility class lies at the beginning of the list, i.e., m < mi ∀ mi ∈ Mimp − {m} and ∃ mj ∈ M such that m > mj.
The height of the file tree is |Mimp|. The number of clusters at each depth k equals the advertisement volume attached to the mobility class at the k-th position of Mimp.
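One way to obtain such a list is the greedy procedure sketched below. It is an illustration, not the procedure given in the thesis. It sorts the classes by adv-volume and takes the smallest as a baseline, so that the first selected class is significantly larger than some class of M (condition 3c). It then keeps each class that is at least β times larger than the last class kept, provided β · adv-volume(m) ≤ nf (condition 3b). All adv-volumes are assumed to be positive, and the function names are hypothetical.

```python
def representative_classes(volumes: dict[str, int], beta: float, nf: int) -> list[str]:
    """Greedy construction of Mimp, ordered by increasing adv-volume.
    volumes maps each mobility class of M to its (positive) adv-volume."""
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    ordered = sorted(volumes, key=volumes.__getitem__)
    if not ordered:
        return []
    last = volumes[ordered[0]]  # smallest class: baseline, never selected itself
    selected: list[str] = []
    for m in ordered[1:]:
        v = volumes[m]
        # keep m if it is significantly larger than the last kept class and beta * v <= nf
        if v / last >= beta and beta * v <= nf:
            selected.append(m)
            last = v
    return selected


def tree_shape(volumes: dict[str, int], beta: float, nf: int) -> tuple[int, list[int]]:
    """Height |Mimp| and the number of clusters at depths 1..height."""
    m_imp = representative_classes(volumes, beta, nf)
    return len(m_imp), [volumes[m] for m in m_imp]
```

With adv-volumes 5, 8, 20, 50 and 90, β = 2 and nf = 100, classes 20 and 50 are kept. The result is a tree of height 2 with 20 clusters at depth 1 and 50 at depth 2.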
2 SAMi
In this thesis, we propose a self-adaptive middleware called SAMi. Figure 27 illustrates the SAMi architecture. Every peer wishing to participate in the MANET must run the middleware. SAMi stores its management data in four databases: “local repository”, “advertisement data-store”, “MANET View data-store” and “rule base”.
The “local repository” and the “advertisement data-store” contain file descriptions. The “MANET View” data-store contains historical information about sharing activities. The “rule base” contains the association rules used to associate a context with a sharing interest or with a mobility class.
The middleware is composed of three modules: (i) the “context manager”; (ii) the “advertisement manager”; and (iii) the “file manager”. A device can run one or more modules. The “file manager” module is, however, mandatory.
The “context manager” module determines mobility classes and sharing interests. It also determines users' information needs by analyzing their agendas, their habits and their query histories.
Figure 27: Architecture of SAMi (Context Manager, Advertisement Management, and File Manager with File Discovery, File Delivery and File Adaptation; data stores: advertisement data-store, local repository, personal data-store, rule base, MANET-View data store)
The “File manager” module performs file management through the “file discovery”, “file delivery” and “file adaptation” modules. The “file discovery” module is in charge of searching for information sources; the “file delivery” module is used to download files; and the “file adaptation” module is used to modify file formats according to the context and the users' preferences.
Finally, the “Advertisement Manager” module is in charge of announcing to other peers the shareable files stored on the device of a data source. It determines the content and the distribution of advertisements according to the mobility class of a MANET-View and the interests of the peers participating in the MANET.
2.1 The ‘Context manager’ module
The “Context manager” module is in charge of determining the mobility class corresponding to a MANET View. The mobility class describing a MANET View is determined by analyzing the connectivity lifetime of the MANET View. Mobility classes can also be determined using association rules. The rule below indicates that the MANET-View observed in a bus at any time¹⁹ is described by mobility class m3.
<context = (Bus,∅)> <mobility class = m3>
The “Context manager” module is responsible for detecting information needs. A user's information needs are determined from the user's agenda and habits. For example, for a user who usually listens to songs during long trips and prefers songs by the singer Whitney Houston, SAMi starts searching for this singer's songs as soon as the user schedules a trip.
A user can also describe the information needed to carry out the activities in his or her agenda. For example, the user can specify documentation on “How to deal with business people”, which is needed for an agenda activity concerning a meeting with business people.
19 ∅ is used to represent any time.
Finally, a user's information needs are determined from his or her interests. For example, sports news is searched for if the user is identified as being interested in this type of information.
The “Context manager” module is also in charge of determining users' information provisions and information demands. Association rules (such as the rules below) are used to determine users' information provisions and information demands with respect to their context.
<context = (Bus, ∅)> <information provision = {({Football}, 0.5), ({news}, 0.5)}>
<context = (Bus, ∅)> <information demand = {({film}, 0.5), ({music}, 0.5)}>
The social groups present in the MANET View can be used to determine users' information provisions and information demands.
2.2 The ‘Advertisement Manager’ module
The “Advertisement manager” module is responsible for distributing advertisements to a peer's neighbors. An advertisement message contains clusters taken from the file tree at a greater or smaller depth. The current mobility class is used to determine the volume of the advertisement. The overall demand of the peers is used to determine the content of the advertisements (i.e., so as to offer information that is, a priori, of interest to them).
We propose to compute the content of advertisements according to:
• the interests of the peers present in the MANET,
• the mobility class describing the MANET View,
• and the location of the files in the file tree.
Let m be a mobility class describing the current MANET View. Let Sod be the overall demand of the peers in the MANET View (i.e., Sod is Overall-Demand(P) defined in Section 1.1, where P is the set of peers participating in the MANET View). The advertisement quota for interest I in the set Sod, denoted N(I), is computed as:
$$N(I) = \text{weight}(I) \times \text{adv-volume}(m)$$
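As a worked example of this formula, the sketch below computes the quotas for every interest of the overall demand. Rounding down to a whole number of advertisement entries is our assumption; the formula itself does not specify rounding.

```python
def advertisement_quotas(weights: dict[str, float], adv_volume: int) -> dict[str, int]:
    """N(I) = weight(I) * adv-volume(m) for every interest I of the overall demand,
    rounded down to a whole number of advertisement entries (an assumption)."""
    return {interest: int(w * adv_volume) for interest, w in weights.items()}
```

For instance, with weights 0.75 for film and 0.25 for music and an adv-volume of 20, the quotas are 15 and 5 entries.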
Let F be a set of files and let Ck be the set of clusters found at depth k of the file tree. Let F(I) ⊆ F be the set of files corresponding to interest I and let Ck(I) ⊆ Ck be the set of clusters corresponding to interest I. A file f and a cluster c are placed in F(I) and Ck(I), respectively, if and only if (i) c and f are similar to I; and (ii) for any other interest Ij in the overall demand, c and f are more similar to I than to Ij.
For an empty interest Ie, i.e., Description(Ie) = ∅, F(Ie) and Ck(Ie) are computed as follows:
• $F(I_e) = F - \bigcup_{I \in S_{od} - \{I_e\}} F(I)$
• $C_k(I_e) = C_k - \bigcup_{I \in S_{od} - \{I_e\}} C_k(I)$
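The assignment rule, including the empty interest that collects everything left over, can be sketched as below. Items and interests are represented by keyword sets, and Jaccard similarity with a 0.2 threshold stands in for the thesis's similarity measure; the threshold value, the function names and the `EMPTY` key are all hypothetical.

```python
EMPTY = ""  # key used for the empty interest Ie (hypothetical convention)


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_to_interests(items: dict[str, set[str]], interests: dict[str, set[str]],
                        threshold: float = 0.2) -> dict[str, set[str]]:
    """Build F(I) (or Ck(I)) for every interest. An item goes to interest I only if it is
    similar enough to I and strictly more similar to I than to any other interest;
    everything left over forms F(Ie) (or Ck(Ie))."""
    result: dict[str, set[str]] = {name: set() for name in interests}
    result[EMPTY] = set()
    for item, keywords in items.items():
        scores = sorted(((jaccard(keywords, kw), name) for name, kw in interests.items()),
                        reverse=True)
        best_is_unique = len(scores) < 2 or scores[0][0] > scores[1][0]
        if scores and scores[0][0] >= threshold and best_is_unique:
            result[scores[0][1]].add(item)
        else:
            result[EMPTY].add(item)
    return result
```

Ties between interests send the item to the empty interest, which mirrors the strict “more similar to I than to Ij” condition.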
Algorithm 5 determines the metadata of the files and clusters to be distributed in the environment. The principle of the algorithm is as follows. In lines 3 to 6, the metadata of all the files in F(I) are selected if N(I) is large enough to advertise the files using the metadata of each file. Otherwise, the algorithm searches for a depth of the file tree at which the number of clusters is less than N(I). This depth is called k. If the search fails, the metadata of the most similar clusters at depth 1 are placed in ADV(I). Otherwise, as described in lines 15 to 22, the metadata of the most relevant clusters found from depth k to depth h are placed in the set ADV(I), according to their position in the file tree and their similarity to interest I. After all the above clusters have been examined, some file metadata may still be placed in ADV(I) if room remains (lines 23 to 25).
Algorithm: Preparation of advertisement messages
Input: h, Sod, F(I) ∀ I ∈ Sod, Ck(I) for 0 < k ≤ h and each I ∈ Sod
  h: height of the file tree
  Sod: the overall demand
  F(I): files corresponding to interest I
  Ck(I): clusters corresponding to interest I and found at depth k
Output: ADV(I) for every I ∈ Sod
  ADV(I): advertisement metadata with respect to interest I ∈ Sod
Begin
1. For each I ∈ Sod
2.   ADV(I) = ∅
     /* Select the metadata of all files if N(I) is large enough to advertise them one by one */
3.   If (N(I) ≥ |F(I)|)
4.     ADV(I) = {metadata(f) | f ∈ F(I)}
5.     Continue with the next I
6.   End If
     /* Search for the depth at which there are fewer than N(I) clusters */
7.   k = h
8.   While ((N(I) ≤ |Ck(I)|) && (k > 0))
9.     k--
10.  End While
     /* If no depth has fewer than N(I) clusters, select some clusters at depth 1.
        Relevant(C, I, n): the n clusters in C most similar to interest I */
11.  If (k == 0)
12.    ADV(I) = {metadata(c) | c ∈ Relevant(C1(I), I, N(I))}
13.    Continue with the next I
14.  End If
     /* Select clusters according to their depth in the file tree */
15.  While ((|ADV(I)| < N(I)) && (k ≤ h))
16.    If (N(I) − |ADV(I)| ≥ |Ck(I)|)
17.      ADV(I) = {metadata(c) | c ∈ Ck(I)} ∪ ADV(I)
18.    Else
19.      ADV(I) = {metadata(c) | c ∈ Relevant(Ck(I), I, N(I) − |ADV(I)|)} ∪ ADV(I)
20.    End If
21.    k++
22.  End While
     /* Select files if there is still room in ADV(I).
        Relevant(F, I, n): the n files in F most relevant (i.e., similar) to interest I */
23.  If (|ADV(I)| < N(I))
24.    ADV(I) = {metadata(f) | f ∈ Relevant(F(I), I, N(I) − |ADV(I)|)} ∪ ADV(I)
25.  End If
26. End For
End Algorithm
Algorithm 5: Preparation of advertisement messages
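For readers who want to experiment with Algorithm 5, the Python sketch below transcribes it, with comments keyed to the pseudocode line numbers. The representation is our assumption. Metadata are plain identifiers, `clusters[k]` stands for Ck(I), `files` for F(I) and `quota` for N(I), and the caller supplies the similarity function. Returning from `prepare_adv` corresponds to moving on to the next interest.

```python
from typing import Callable, Hashable, Sequence

Similarity = Callable[[Hashable], float]  # similarity of a file or cluster to interest I


def relevant(items: Sequence[Hashable], sim: Similarity, n: int) -> list:
    """Relevant(C, I, n): the n items of C that are most similar to the interest."""
    return sorted(items, key=sim, reverse=True)[:max(n, 0)]


def prepare_adv(h: int, quota: int, files: Sequence[Hashable],
                clusters: dict[int, Sequence[Hashable]], sim: Similarity) -> list:
    """Algorithm 5 for one interest I; returns ADV(I) as a list of identifiers."""
    # Lines 3-6: advertise every matching file if the quota is large enough.
    if quota >= len(files):
        return list(files)
    # Lines 7-10: move up from depth h until a depth has fewer than N(I) clusters.
    k = h
    while k > 0 and quota <= len(clusters.get(k, ())):
        k -= 1
    # Lines 11-14: no such depth, so keep the N(I) most similar depth-1 clusters.
    if k == 0:
        return relevant(clusters.get(1, ()), sim, quota)
    adv: list = []
    # Lines 15-22: add clusters from depth k down to depth h while room remains.
    while len(adv) < quota and k <= h:
        level = list(clusters.get(k, ()))
        room = quota - len(adv)
        adv.extend(level if room >= len(level) else relevant(level, sim, room))
        k += 1
    # Lines 23-25: fill any remaining room with the most relevant files.
    if len(adv) < quota:
        adv.extend(relevant(files, sim, quota - len(adv)))
    return adv


def prepare_all(h, quotas, files_by_interest, clusters_by_interest, sims):
    """Outer loop (line 1): one ADV(I) per interest I of the overall demand."""
    return {i: prepare_adv(h, quotas[i], files_by_interest[i], clusters_by_interest[i], sims[i])
            for i in quotas}
```

With two depth-1 clusters, four depth-2 clusters and a quota of 3, the sketch advertises both depth-1 clusters plus the single most similar depth-2 cluster.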
After computing ADV(I), the data source transfers it to those of its direct neighbors that satisfy one of two conditions: (i) the neighbor is located in the direction of peers having an information demand corresponding to interest I; or (ii) the neighbor has a high degree of collaboration with the data source. In the first case, the method proposed by the LAR routing algorithm [81] is used to select neighbors according to their location. In the second case, neighbors are determined from their sharing history. A peer accepting the advertisement retransmits it in the same way.
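A sketch of this neighbor filter is given below. For the location test, it uses the request zone of LAR scheme 1: the smallest axis-aligned rectangle containing the sender and the “expected zone”, a circle around the last known position of the demanding peer. In LAR, the circle's radius is v·(t1 − t0), where v is the peer's speed and t0 the time of its last known position. The collaboration score, its 0.5 threshold and all names are hypothetical stand-ins for the sharing-history criterion.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Neighbor:
    peer_id: str
    pos: Point
    collaboration: float  # hypothetical score in [0, 1] derived from sharing history


def request_zone(src: Point, dest: Point, radius: float) -> tuple[float, float, float, float]:
    """LAR scheme 1: smallest axis-aligned rectangle containing the sender and the
    expected zone (a circle of the given radius around the destination)."""
    (sx, sy), (dx, dy) = src, dest
    return (min(sx, dx - radius), min(sy, dy - radius),
            max(sx, dx + radius), max(sy, dy + radius))


def select_neighbors(src: Point, neighbors: list[Neighbor], demand_positions: list[Point],
                     radius: float, min_collaboration: float = 0.5) -> list[str]:
    """Keep neighbors inside some request zone or with a high collaboration score."""
    zones = [request_zone(src, d, radius) for d in demand_positions]
    chosen = []
    for n in neighbors:
        x, y = n.pos
        towards_demand = any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in zones)
        if towards_demand or n.collaboration >= min_collaboration:
            chosen.append(n.peer_id)
    return chosen
```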
A data-source peer eventually retransmits the advertisement after a period denoted adv-period(m). In the meantime, the peer tries to improve its knowledge of the connection times of the peers in the MANET View and to refine the mobility class describing the MANET View.
2.3 The ‘File Manager’ module
The “File-Manager” module is in charge of discovering and downloading the files matching a query, in two phases: (i) information discovery; and (ii) information download. The information discovery phase is used to find the peers holding the files that match the query. The download phase, in turn, is used to retrieve the files.
F(q) and C(q) denote, respectively, the files and the clusters corresponding to a query q. The query q is described by a list of keywords. A file f and a cluster c are placed in F(q) and C(q), respectively, if they are similar to the query q.
First, files in F(q) are removed if they cannot be provided. Likewise, clusters in C(q) are removed if the files grouped in them cannot be discovered. The files in F(q) are ranked according to their relevance to q. If the number of selected files is not sufficient, the most relevant clusters in C(q) are selected as potential sources of files matching q, and discovery messages are sent to these potential sources. Discovery messages are also sent to peers whose information provision matches q.
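The filtering and fallback steps can be sketched as follows. `Candidate`, the `available` flag, `wanted` (the number of files sought) and `max_clusters` are hypothetical names. The sketch omits the messages sent to peers whose information provision matches q.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    ident: str
    relevance: float  # similarity to the query q
    available: bool   # file can be provided / files of the cluster can be discovered


def plan_discovery(files: list[Candidate], clusters: list[Candidate],
                   wanted: int, max_clusters: int) -> tuple[list[str], list[str]]:
    """Return (files selected from F(q), clusters of C(q) to send discovery messages to)."""
    usable = sorted((f for f in files if f.available), key=lambda f: f.relevance, reverse=True)
    selected = [f.ident for f in usable[:wanted]]
    targets: list[str] = []
    if len(selected) < wanted:  # not enough files: fall back on the most relevant clusters
        ranked = sorted((c for c in clusters if c.available),
                        key=lambda c: c.relevance, reverse=True)
        targets = [c.ident for c in ranked[:max_clusters]]
    return selected, targets
```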
Once the information discovery phase is complete, the information download phase starts. The goal of this phase is to choose one or more information sources from which to download a file. Information download is performed as follows:
• SAMi searches for information sources that can deliver the entire file. If several peers can do so, SAMi selects one according to its distance from the requesting peer and the time they will remain together.
• If no peer is able to transfer the entire file, SAMi searches for a combination of peers (p1, p2, …, pk) such that each pi provides a portion of the file (called sfi) and merging (sf1, sf2, …, sfk) yields the requested file.
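Both cases can be sketched in a few lines. The file is assumed to be split into numbered parts, and each candidate peer advertises which parts it holds. How distance and stay time are combined is not specified, so the sketch simply prefers the closest peer and breaks ties by the longer stay time. For the second case, it uses a greedy set cover, which yields a valid combination but not necessarily the smallest one. All names are hypothetical.

```python
from typing import Optional

Plan = list[tuple[str, set[int]]]


def choose_sources(n_parts: int, offers: dict[str, set[int]], distance: dict[str, int],
                   stay_time: dict[str, float]) -> Optional[Plan]:
    """offers maps a peer to the part indices (0..n_parts-1) it can deliver."""
    whole = set(range(n_parts))
    full = [p for p, parts in offers.items() if parts >= whole]
    if full:
        # Closest peer first; among equally close peers, the one expected to stay longest.
        best = min(full, key=lambda p: (distance[p], -stay_time[p]))
        return [(best, whole)]
    # Greedy set cover: repeatedly take the peer contributing the most missing parts.
    missing, plan = set(whole), []
    while missing:
        peer = max(offers, key=lambda p: len(offers[p] & missing), default=None)
        if peer is None or not offers[peer] & missing:
            return None  # the parts on offer cannot rebuild the whole file
        gained = offers[peer] & missing
        plan.append((peer, gained))
        missing -= gained
    return plan
```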
3 Conclusion and perspectives
In this thesis, we propose a theoretical model of an information-sharing system adapted to MANETs that is able to discover files according to users' interests and the dynamics of the network. We also propose a method for organizing files in a tree that facilitates file discovery. To implement the proposed theoretical model, we describe a self-adaptive middleware called SAMi.
Currently, SAMi can be used to let nomadic users share files subject to conditions on access rights. In the future, we plan to extend SAMi to distribute file advertisements according to the peers' rights.
In this thesis, a file tree is built using an unsupervised classification technique in order to facilitate file discovery. In the future, we plan to use an ontology-based technique to enrich the proposed unsupervised technique. We also plan to use users' interests to optimize file classification.
Annex B. Detailed Design of SAMi
State Diagram
In the middleware, a user and a device have states as shown in the state diagrams displayed
in Figure B-1 and Figure B-2.
Figure B-1: State of a device
A device has four main states: isolated-idle, isolated-busy, inMANET-idle and inMANET-busy. The prefixes isolated and inMANET indicate that a device is not in a MANET and in a MANET, respectively. The suffix idle indicates that no program is running on the device, while the suffix busy indicates that programs are running.
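Since the four device states are the product of two independent dimensions (MANET membership and activity), they can be derived from two observations, as in the sketch below. The enum and function names are hypothetical, and the transitions of Figure B-1 are not modeled.

```python
from enum import Enum


class DeviceState(Enum):
    ISOLATED_IDLE = "isolated-idle"
    ISOLATED_BUSY = "isolated-busy"
    INMANET_IDLE = "inMANET-idle"
    INMANET_BUSY = "inMANET-busy"


def device_state(in_manet: bool, running_programs: int) -> DeviceState:
    """Combine MANET membership (prefix) and activity (suffix) into one of the four states."""
    if in_manet:
        return DeviceState.INMANET_BUSY if running_programs else DeviceState.INMANET_IDLE
    return DeviceState.ISOLATED_BUSY if running_programs else DeviceState.ISOLATED_IDLE
```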
There are two important states for a user: idle and busy. A user can be interrupted in the idle state.
Figure B-2: States of a user
Activity Diagram
Advertisement (Figure B-3) is performed when a peer enters a MANET, using the activities listed in Table B-1. The advertisement policy (period, content and radius of the advertisement) is determined by the advManager, which also determines the advertisement time and prepares the advertisement message. When the advertisement time arrives, the advertisement message is distributed by the messenger object. This process is repeated until the device leaves the network.
In SAMi, the information needs of a user can be identified from the user's queries, as shown in Figure B-4, and from his or her agenda, as shown in Figure B-5, using the activities listed in Table B-2. An entered query is searched directly if the device is in a MANET and idle; otherwise, it is passed to the query manager for later treatment.
When a user enters an agenda item, the information manager extracts a query in order to search for the documents needed to accomplish it. If the device state is inMANET-idle and either the query must be treated urgently (the agenda item is planned within a few hours) or the query matches the context of the environment (the interests of the user in the MANET match the query), the query is treated directly; otherwise, it is treated later.
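The decision rule above can be written as a small predicate. The three-hour urgency window is a hypothetical value for “a few hours”. Matching the environmental context is approximated by a non-empty overlap between the query keywords and the interests observed in the MANET, which is our assumption.

```python
from datetime import datetime, timedelta

URGENCY_WINDOW = timedelta(hours=3)  # "a few hours": hypothetical value


def treat_now(device_state: str, agenda_time: datetime, now: datetime,
              query_keywords: set[str], manet_interests: set[str]) -> bool:
    """True if an agenda-derived query is searched immediately, False if kept for later."""
    if device_state != "inMANET-idle":
        return False
    urgent = agenda_time - now <= URGENCY_WINDOW
    contextual = bool(query_keywords & manet_interests)
    return urgent or contextual
```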
Figure B-3: Activity diagram of advertisement (swimlanes: Messenger, ContextManager, AdvManager, Device; flow: [enters in a MANET] → determine profile → determine adv-policy → prepare advMessage → calculateTimeToAdv → [it is TimeToAdv] sendAdvMessage)
Table B-1: Important activities to perform advertisement
Activity | Description
Determine profile | Calculates the interests and the mobility class of users
Determine Adv policy | Determines the period, the content and the radius of advertisement with respect to users' interests and mobility class
Prepare advMessage | Prepares the advertisement message
calculateTimeToAdv | Determines the time to make the advertisement as the current time plus a random number between zero and the period of advertisement
sendAdvMessage | Distributes the advertisement in the vicinity
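The calculateTimeToAdv rule from Table B-1 amounts to adding a uniformly drawn delay to the current time. In the sketch below, the function name and the optional seeded generator (useful for reproducible tests) are our own additions.

```python
import random
from typing import Optional


def calculate_time_to_adv(now: float, period: float,
                          rng: Optional[random.Random] = None) -> float:
    """Advertisement time = current time + a random delay drawn uniformly from [0, period]."""
    if rng is None:
        rng = random.Random()
    return now + rng.uniform(0.0, period)
```

Drawing the delay at random presumably keeps neighboring peers that join the MANET together from advertising at the same instant.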
Figure B-4: Searching information for a user query (swimlanes: User, InfManager, Device; activities: search file, treat a query later; guards: [Enter a query], [state = inMANET-idle], [state != inMANET-idle], search is successful, search isn't successful)
As shown in Figure B-6, queries that have been kept for later treatment are searched when the device enters another MANET. The query whose deadline is approaching is treated first. The query that matches the information provision capacity of the users in the vicinity is treated next.
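This ordering can be sketched as a sort key with three groups. Urgent queries would expire before the next MANET is expected. Contextual queries share a keyword with the provisions of nearby peers. The remaining queries come last. Within each group, earlier deadlines come first. `next_manet_in` (an estimate of the time until the next MANET) and the keyword matching are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingQuery:
    text: str
    deadline: float           # time after which the query is useless
    keywords: frozenset[str]


def order_pending(queries: list[PendingQuery], now: float, next_manet_in: float,
                  provisions: set[str]) -> list[PendingQuery]:
    """Urgent queries first, then queries matching nearby provisions, then the rest."""
    def rank(q: PendingQuery) -> tuple[int, float]:
        if q.deadline <= now + next_manet_in:
            return (0, q.deadline)
        if q.keywords & provisions:
            return (1, q.deadline)
        return (2, q.deadline)
    return sorted(queries, key=rank)
```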
Figure B-5: Activity diagram of information extraction (swimlanes: InfManager, Device, User; activities: extract query, search file, treat a query later; guards: [agenda is entered], [need information for the agenda], [nothing is needed], [state = inMANET-idle], [state != inMANET-idle], [query goes with environmental context], [urgent query], [other queries], [search is successful], [search is not successful])
Figure B-6: Activity diagram of query treatment (swimlanes: InfManager, Device; activities: take an urgent query, take a contextual query, search a file, treat a query later; guards: [state = inMANET-idle], [state != inMANET-idle], there are queries, no urgent query, [no query to treat], [search is successful], [search is not successful])
Table B-2: Activities to extract and search information
Activity | Description
Search file | Searches for a file expressed by a query
Treat a query later | Puts a query aside for later treatment
Extract a query | Extracts a query from a user's agenda or habit
Take urgent query | Selects a query that will expire before the peer joins another MANET
Take contextual query | Selects a query that matches the information provisions of users in the vicinity
Advertisements can be used to identify information sources for a query, as shown in Figure B-7. The file indicated by an advertisement is downloaded if it does not exist locally and matches a query.
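The download rule can be sketched as a filter over incoming advertisements. Representing advertised files and queries as keyword sets, and treating any shared keyword as a match, are simplifying assumptions. The function name is hypothetical.

```python
def files_to_download(advertised: dict[str, set[str]], local_files: set[str],
                      queries: list[set[str]]) -> list[str]:
    """Return the advertised files that are missing locally and share at least one
    keyword with a pending or historical query."""
    return [f for f, keywords in advertised.items()
            if f not in local_files and any(keywords & q for q in queries)]
```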
Figure B-7: File searching from incoming advertisement (swimlanes: Messenger, InfManager; [advertisement is accepted] → if the advertisement matches a query, or is for a file that matches a historical query → download file)
Rule identification is done offline, as shown in Figure B-8. When a device is in the isolated-idle state, the rule-miner class estimates the time that the peer will stay in this state (calculate life-in-state). If this time is long enough to mine rules, rule mining is performed.
Figure B-8: Activity diagram of rule mining (swimlanes: Rule Miner, Device; [device is isolated-idle] → calculate life-in-state, the time that a node stays in this state → mine-rules; guard: [life-in-state < mining-time])
Like rule identification, file classification and representation are done offline. As shown in Figure B-9, when the middleware starts working, it represents the files, classifies them into clusters and then represents the clusters. A new file is grouped under the leaf cluster most similar to it. When the tree becomes unbalanced, it is modified to create a balanced one. The tree can also be modified when the environmental context changes. Like file classification, tree modification is done offline.
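Inserting a new file can be sketched as choosing the leaf cluster whose centroid has the highest cosine similarity with the file's vector. Cosine similarity over vector-space representations is our assumption, and so is the dictionary layout of `leaves`. Rebalancing and centroid updates are left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def insert_file(file_id: str, vector: list[float], leaves: dict[str, dict]) -> str:
    """Attach a new file to the leaf cluster whose centroid is most similar to the
    file's vector; returns the chosen cluster id. Centroids are not updated here."""
    best = max(leaves, key=lambda c: cosine(vector, leaves[c]["centroid"]))
    leaves[best]["members"].append(file_id)
    return best
```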
Figure B-9: Activity diagram of file representation and classification
Sub-system Decomposition
The component SAMi-adaptor (Figure B-10) contains only one package. It passes inputs entered through another messenger to SAMi-basic and displays the output produced by SAMi-basic using the interface provided by that messenger. The main component of SAMi-adaptor is the interface plug-in.
Figure B-10: A SAMi-Adaptor for Yahoo Messenger
SAMi-thin (Figure B-11) is used to allow thin devices to participate in the information exchange. It is composed of three packages: Login, Collaborator and UserInterface. Note that these packages are not specific to this component; they can also be used, with or without a mediator, by the other components.
The login package is used to verify that an authorized user accesses the middleware. The
UserInterface package is used to accept basic inputs of SAMi, i.e., a query, a user profile
and an agenda. The Collaborator package is used to ask other peers in the vicinity to search for information on behalf of a user owning a thin device.
Figure B-11: SAMi-thin
As shown in the diagram displayed in Figure B-12, the SAMi-GUI component contains a package called UserInterface, which is also part of the component SAMi-thin. The package contains four interfaces and four classes that implement them. The class guiFileIO is used to accept a query and to display query recommendations, advertisements and recently downloaded files. The class uiUserIO is used to accept a user's preferences, state, agenda and profile. The class envIO is used by the administrator to configure mobility classes. The class guiMain is used by a user to navigate from one interface to another.
Figure B-12: SAMi GUI
SAMi-GUI is implemented by extending the user interface classes of J2ME. It consists of the classes displayed in Figure B-13.
Classes: mainMenu, browseMenu, aboutForm, settingMenu, searchForm, advForm, ruleForm, tempStoreForm, HistForm, habitForm, agendaForm, prefForm, mobilityForm
Figure B-13: Classes in SAMi-GUI
The component SAMi-core (Figure B-14) is composed of three packages that access the advertisement data-store, the MANET-View data-store and the local repository. The advertiser package is responsible for distributing and managing advertisements according to the context of the user and of the environment. The inf-Manager package searches for and downloads files according to a user's queries, agenda and habits.
Figure B-14: SAMi-core
SAMi-ext (Figure B-15) is composed of two packages: ruleExtractor and extInfMang. The ruleExtractor package identifies rules by analyzing the historical data-store and puts the resulting rules in the rule base. The extInfManager package is used to classify files into clusters, to represent files and clusters in vector space, and to manage file adaptation.
Figure B-15: SAMi-ext
Annex C. Important classes of SAMi
Inf-Manager
lstFiles: the list of metadata of shareable files.
lstCluster: the list of metadata of clusters found at each depth of the file tree.
THeight: the height of the file tree.
resp-limit: the maximum number of files returned for a query.
numUploads: the number of files sent to the neighbors in the current session.
processQuery(): identifies files that match a query and prepares a response.
prepareReponse(): prepares a response for a query.
removeException(): removes the files that are identified as exceptions by a query.
searchByTitle(): searches files according to their title.
searchbyCategory(): searches files according to their category.
getTHeight (): returns the height of the file-tree.
mapFileToInterest(): identifies files that match with the interest of a user.
mapFileByCategory(): identifies files that have, or are similar to, a given category.
mapFileByTitle(): identifies files that have, or are similar to, a title passed as an argument.
mapClusterToInterest(): identifies clusters that match the interest of a user.
mapClusterByCategory(): identifies clusters containing files that have, or are similar to, a given category.
mapClusterByTitle(): identifies clusters that have, or are similar to, a title passed as an argument.
nearer(): returns the interest that is most similar to a given file.
uploadFile(): sends all or part of a file.
isExist(): returns a file having the given metadata.
Discovery-Manager
lstDiscovery: a list of discovery objects that can be used to discover files for a query.
maxFile: the maximum number of files searched for a file.
searchFile(): searches files and their sources by creating a discovery object.
cleanDiscovery(): removes a discovery object and registers the query handled by the object in the historical data-store.
accpetResponse(): accepts a response and hands it to an appropriate discovery object.
searchDiscovery(): searches a discovery object that searches a response for a given query.
File Discovery
maxfile: the maximum number of files that can be discovered for a query.
discoveryDeadline: the time by which files should be discovered.
q: a query for which information is discovered.
lstInf: a list of files discovered for a query.
lstRsp: a list of responses accepted for a query.
searchFile(): searches for files matching a query by distributing discovery messages and by consulting the advertisement data-store.
setMetafile(): adds the metadata of a file and the source of the file to lstInf.
distributeMessage(): distributes discovery messages to potential sources.
getInfDis(): returns the attribute lstInf.
searchAgain(): performs a further search.
approvalDelivery(): passes lstInf to the deliveryManager object.
getQueryID(): returns the id of the query that the object is dealing with.
acceptResponse(): accepts a response to the distributed message.
DeliveryManager
download: assigns objects to download a given list of files.
cleanDelivery: removes a delivery object.
acceptDelivery: transfers a portion of a file to an appropriate delivery object.
searchDelObject: searches a delivery object that deals with the file specified by a given
metadata.
File Delivery
maxSource: the maximum number of sources from which the file can be downloaded.
Meta: the metafile considered by the object.
lstReq: the number of delivery-requests prepared by the object.
lstInf: the downloaded parts of a file with their owners.
lstSources: a list of the profiles of sources of the file with the metafile referred.
downloadFile(): searches the list of sources to download the required file.
downloadPartially(): downloads some parts of the file.
acceptFilePortions(): accepts a portion of a file.
distributeMessageDelivery(): distributes delivery messages to the sources of the file that the object is dealing with.
mergResult(): merges the portions of the file.
canDownLoadFull(): checks whether the file that the object deals with can be downloaded using the given list of sources.
searchRequest(): searches for the request that was sent to a given source.
Response
queryID: the identifier of the query to which the response relates.
numFile: the number of files matching the query.
lstFiles: metadata of the files matching the query.
simValues: list of similarity values, where the ith value indicates the similarity between the ith file and the query
Query
queryID: the identifier of a query.
title: the title of the file that the query is about.
catagories: the categories of the file to be searched.
deadline: the time after which a file should no longer be searched for or downloaded for the query.
exceptions: the identifiers of the files that the query excludes
setDeadline(): assigns the deadline of the query
Download Request
meta: the metadata of the file to be downloaded.
sourceID: the identifier of the peer from which parts of the file will be downloaded.
requestTime: the time when the request is distributed.
divisionBy: the number of parts into which the file is divided.
requestedPart: the part of the file that will be downloaded from the peer referred to by the object.
FileAdv
meta: the description of a file
owners: the identifiers of peers that have advertised the file referred to by the object
ClusterAdv
meta: the metadata of a cluster
owners: the identifiers of peers that have advertised the cluster referred to by the object
Adv-Manager
radius: the number of hops that the advertisement traverses
period: the time interval between two successive advertisements
numAdv: the volume of the advertisements
advCont: the content of the advertisement
advTimeThershold: the time up to when the advertisement distribution can be delayed
IDAdvFile: the identifiers of files that have been advertised to the current neighbors
IDAdvCluster: the identifiers of clusters that have been advertised to the current neighbors
prevAdv: advertisements that have been made in recent history
intializeAdv(): sets the attributes of the object according to the mobility class
setNumAdv(): recalculates the volume of advertisement with respect to the usefulness of the previous advertisement
scheduleAdv(): schedules the advertisement
prepareCont(): initializes the content of advertisement
setCont( ): selects the files and clusters to be advertised
sendAdv(): sends advertisement to a neighbor
setIDFilesClustersAdvertised(): identifies the identifiers of files and clusters that have
been advertized to the current neighbors
addFile(): adds files to be advertised
addIsolatedFile(): adds a file that is not classified under any cluster known to the adv-manager
addCluster(): adds clusters to be advertised
addNonIsolatedFile(): adds a file that is classified under a cluster known to the adv-manager
createBalanceDoc(): ensures, as far as possible, that the numbers of files/clusters selected for the different interests do not differ significantly
resolveConflict(): makes the sets of files matching two interests disjoint
setIDFileCluster(): determines the identifiers of the files/clusters advertised to the neighbors
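resolveConflict() and createBalanceDoc() together decide which files are advertised for each interest. The sketch below shows one possible policy and is not the one implemented in the middleware. Files shared by several interests go to the interest with the fewest files so far, and the final selection takes one file per interest in turn until numAdv is reached. The function names and the dictionary representation are assumptions.

```python
from itertools import zip_longest
from typing import Dict, List, Set


def resolve_conflict(matches: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Make the per-interest file sets disjoint.

    Assumed policy: a file matching several interests is kept only for the
    matching interest that currently holds the fewest files (ties broken by
    interest name), which also keeps the sets roughly balanced."""
    result: Dict[str, Set[str]] = {interest: set() for interest in matches}
    for f in sorted(set().union(*matches.values())):
        candidates = [i for i in sorted(matches) if f in matches[i]]
        target = min(candidates, key=lambda i: len(result[i]))
        result[target].add(f)
    return result


def create_balanced_doc(disjoint: Dict[str, Set[str]], num_adv: int) -> List[str]:
    """Pick up to num_adv files, taking one file per interest in turn."""
    per_interest = [sorted(disjoint[i]) for i in sorted(disjoint)]
    selected: List[str] = []
    for row in zip_longest(*per_interest):
        for f in row:
            if f is not None and len(selected) < num_adv:
                selected.append(f)
    return selected
```

With `{"music": {"a", "b", "c"}, "sport": {"c", "d"}}`, file c goes to sport, which has fewer files at that point. A budget of 3 then yields `["a", "c", "b"]`.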
PositiveDoc
isoFiles: files that match an interest and are not classified under any cluster in the attribute clusters
nonIsoFiles: files that match the same interest as the files in isoFiles and are classified under at least one cluster in the attribute clusters
clusters: clusters matching the same interest as the files in isoFiles
THeight: the height of the file tree
numAdv: the volume of advertisement
intialize(): initializes the attributes isoFiles, THeight and numAdv
addClusters(): adds clusters to the attribute clusters and moves the files classified under these clusters from isoFiles to nonIsoFiles
getNumAdv(): returns the volume of advertisements
getNumFiles(): returns the number of files referred to by the object
getNumCluster(): returns the number of clusters referred to by the object
getFiles(): returns the IDs of the files referred to by the object
getNumIsoFiles(): returns the number of isolated files
getNumNonIsoFiles(): returns the number of non-isolated files
getIsoFiles(): returns the IDs of the isolated files
getNonIsoFiles(): returns the IDs of the non-isolated files
getNumIsoClusters(): returns the number of isolated clusters, i.e. the clusters that are referred to by the object and are not classified under any cluster referred to by the object
getNumNonIsoClusters(): returns the number of non-isolated clusters, i.e. the clusters that are referred to by the object and are classified under at least one cluster referred to by the object
getIsolatedClusters(): returns the IDs of the isolated clusters referred to by the object
getNonIsolatedClusters(): returns the IDs of the non-isolated clusters referred to by the object
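The effect of PositiveDoc.addClusters() on isoFiles and nonIsoFiles can be shown with a small Python sketch. It assumes clusters are passed as a mapping from cluster ID to the IDs of the files classified under them. It omits the isolated/non-isolated cluster accounting, and its names are adapted to Python conventions rather than taken from the implementation.

```python
from typing import Dict, Iterable, Set


class PositiveDoc:
    """Sketch of PositiveDoc for one interest (assumed representation)."""

    def __init__(self, iso_files: Iterable[str], t_height: int, num_adv: int):
        self.iso_files: Set[str] = set(iso_files)  # matching files under no known cluster
        self.non_iso_files: Set[str] = set()       # matching files covered by a cluster
        self.clusters: Dict[str, Set[str]] = {}    # cluster ID -> files classified under it
        self.t_height = t_height
        self.num_adv = num_adv

    def add_clusters(self, clusters: Dict[str, Iterable[str]]) -> None:
        """Add clusters and move the files they cover from iso_files to non_iso_files."""
        for cid, files_under in clusters.items():
            self.clusters[cid] = set(files_under)
            covered = self.iso_files & self.clusters[cid]
            self.iso_files -= covered
            self.non_iso_files |= covered

    def get_num_files(self) -> int:
        return len(self.iso_files) + len(self.non_iso_files)

    def get_num_clusters(self) -> int:
        return len(self.clusters)
```

Adding a cluster c1 that covers f1, f2 and f9 to a document whose isolated files are f1, f2 and f3 leaves only f3 isolated. File f9 is not counted because it did not match the interest in the first place.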
PositiveCluster
isolatedClusters: a list of clusters that match an interest, are found at the same depth, and are not classified under any cluster matching the same interest
nonIsolatedClusters: a list of clusters that match the same interest as the clusters in isolatedClusters, are found at the same depth, and are classified under another cluster matching the same interest
addCluster(): adds the ID of a cluster to isolatedClusters
getIsolated(): returns the IDs of the isolated clusters in isolatedClusters
removeIsoCluster(int at): removes a cluster from isolatedClusters
addNonIsoCluster(): adds the ID of a cluster to nonIsolatedClusters
getNumClusters(): returns the number of clusters
getNumIsolated(): returns the number of isolated clusters
getNumNonIsolated(): returns the number of non-isolated clusters
ProQueryManager
maxNum: the maximum number of proactive queries.
lstProQuery: the list of proactive queries.
maxTime: the maximum time that a proactive query can be kept for approval.
setQueryForApproval(): adds a proactive query to lstProQuery.
deleteQuery(): deletes a proactive query.
cleanProQuery(): deletes the proactive queries that have been waiting for approval longer than maxTime.
approvesQuery(): starts searching for files for a query approved by the user.
TempFileManager
tempFolderPath: the path of the folder where files can be stored temporarily.
totalSize: the maximum size of memory that can be occupied by temporary files.
occupiedSize: the actual size of memory occupied by the temporary files.
lstTemfile: the metadata of the files stored temporarily.
maxTime: the maximum lifetime of a temporary file.
downloadFileTemporarly(): downloads files temporarily.
deleteTemporarly(): deletes temporary files that have been rejected by the user.
approvedTempory(): moves a file to the location indicated by the user when approving it.
cleanTempory(): deletes unapproved files that were downloaded more than maxTime ago.
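TempFileManager combines a space budget (totalSize/occupiedSize) with a lifetime limit (maxTime). Below is a bookkeeping-only Python sketch under those semantics. It performs no real file I/O and uses an injectable clock so it can be tested. At approval it records the destination chosen by the user instead of moving the file. The method names mirror the ones above, but the signatures are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TempFileManager:
    """Sketch of TempFileManager: bookkeeping only, no real file I/O (assumption)."""

    def __init__(self, total_size: int, max_time: float,
                 clock: Callable[[], float] = time.monotonic):
        self.total_size = total_size    # space budget for temporary files
        self.max_time = max_time        # maximum lifetime of an unapproved file
        self.clock = clock
        self.occupied_size = 0
        self.files: Dict[str, Tuple[int, float]] = {}  # file ID -> (size, download time)
        self.approved: Dict[str, str] = {}             # file ID -> destination chosen by the user

    def download_file_temporarily(self, file_id: str, size: int) -> bool:
        self.clean_temporary()
        if file_id in self.files:
            return True                 # already held temporarily
        if self.occupied_size + size > self.total_size:
            return False                # not enough space left for temporary files
        self.files[file_id] = (size, self.clock())
        self.occupied_size += size
        return True

    def _forget(self, file_id: str) -> None:
        size, _ = self.files.pop(file_id)
        self.occupied_size -= size

    def delete_temporary(self, file_id: str) -> None:
        """The user rejected the file."""
        self._forget(file_id)

    def approve_temporary(self, file_id: str, destination: str) -> None:
        """The user approved the file and chose where to keep it."""
        self._forget(file_id)
        self.approved[file_id] = destination

    def clean_temporary(self) -> None:
        """Drop unapproved files older than max_time."""
        now = self.clock()
        for file_id in [f for f, (_, t) in self.files.items() if now - t > self.max_time]:
            self._forget(file_id)
```

Cleaning expired files before checking the budget means that an old, never-approved download does not block a new one.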
AdvStore
lstFileAdv: a list of the advertised files.
lstClstAdv: a list of the advertised clusters.
lstPlatAdv: a list of the advertised devices’ profiles.
timeTolerance: the time during which the advertisements of expired/disconnected peers are tolerated.
uselessTime: the minimum stay-time that a peer should have in order to be considered during query resolution.
memLimit: the maximum volume of the memory allowed to store advertisement.
addAdv(): adds advertisements to the advertisement data store.
modifyDeviceProfile(): modifies a device’s profile.
adjustAdv(): deletes some advertisements according to the time and memory constraints.
deleteFileAdv(): deletes the file advertisements belonging to the peer with a given ID.
deleteClusters(int dID): deletes the cluster advertisements belonging to the peer with the given ID dID.
searchFile(): searches for files that match a given query.
searchPotential(): searches for potential sources of the file targeted by a query.
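AdvStore filters advertisements on two time criteria. timeTolerance sets how long a departed peer's advertisements are kept, and uselessTime sets the minimum remaining stay-time for a peer to be considered during query resolution. Below is a Python sketch of the file-advertisement part. It assumes that memLimit is counted in advertisements, that the oldest advertisements are dropped first, and that a query is a list of keywords matched as case-insensitive substrings of the title.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileAdv:
    file_id: str
    title: str
    owner: int        # ID of the advertising peer
    received: float   # time at which the advertisement was stored


class AdvStore:
    """Sketch of the file part of AdvStore; mem_limit counts advertisements (assumption)."""

    def __init__(self, time_tolerance: float, useless_time: float, mem_limit: int):
        self.time_tolerance = time_tolerance
        self.useless_time = useless_time
        self.mem_limit = mem_limit
        self.file_advs: List[FileAdv] = []
        self.leave_time: Dict[int, float] = {}  # peer ID -> expected disconnection time

    def add_adv(self, adv: FileAdv, owner_leave_time: float) -> None:
        self.file_advs.append(adv)
        self.leave_time[adv.owner] = owner_leave_time

    def adjust_adv(self, now: float) -> None:
        # forget peers that left more than time_tolerance ago
        self.file_advs = [a for a in self.file_advs
                          if now <= self.leave_time[a.owner] + self.time_tolerance]
        # if still over the limit, keep only the most recently received advertisements
        if len(self.file_advs) > self.mem_limit:
            self.file_advs.sort(key=lambda a: a.received)
            self.file_advs = self.file_advs[len(self.file_advs) - self.mem_limit:]

    def delete_file_adv(self, peer_id: int) -> None:
        self.file_advs = [a for a in self.file_advs if a.owner != peer_id]

    def search_file(self, keywords: List[str], now: float) -> List[str]:
        """IDs of files whose title contains every keyword and whose owner is
        expected to stay at least useless_time longer."""
        wanted = [k.lower() for k in keywords]
        return [a.file_id for a in self.file_advs
                if all(k in a.title.lower() for k in wanted)
                and self.leave_time[a.owner] - now >= self.useless_time]
```

A peer that is about to leave is excluded from search results before its advertisements are purged. This matches the distinction between uselessTime and timeTolerance above.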
Sharing-history
t: the time context of a MANET-view.
loc: the location context of a MANET-view.
lstLT: a list of peers’ stay-times.
lstDemands: a list of users’ information demands in the MANET-View.
lstProvisions: a list of users’ information provisions in the MANET-View.
lstFileDist: the metadata of files that have been advertised in the view.
lstQueries: queries that have been distributed in the view.
getInterestsFromInfExc(): extracts interests from the files exchanged.
getInterestsFromQueries(): extracts interests from queries.
HistDataDistributed
distID: an identifier of the file that has been distributed.
title: the title of the file.
category: the category of the file.
frequency: the number of times that the file has been distributed.
MANETID: the identifiers of the MANET-Views in which the file has been distributed.
addFrequency(): increments the distribution frequency.
mergUnder(): merges a given historical data record with the one referred to by the object.
getSimValue(): calculates the similarity between the file that the object refers to and a
given file.
histMANET
MID: an identifier of a MANET-View.
location: the location context of the view.
time: the time context of the view.
Overall-demand: the overall information demand of peers in the view.
Overall-provision: the overall information provision of peers in the view.
avgLT: the average stay-time of peers in the view.
MetaFile
ID: the identifier of a file.
title: the title of the file.
categories: the categories of the mentioned file.
Device Profile
Id: the identifier of the device referred by the object.
Stay-time: the time during which the device remains connected to the peer under consideration.
modTime: the time at which the device sent its stay-time.
X: the x position of the device.
Y: the y position of the device.
getTTL(): returns the time after which the device is unreachable.
modifyStayTime(): changes the values of the Stay-time and modTime attributes.
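A device profile stores a stay-time together with the time (modTime) at which it was sent. The absolute time returned by getTTL() can therefore be derived by adding the two. Below is a Python sketch. It assumes the stay-time is a duration counted from modTime and that both are expressed on the same clock. `remaining()` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    id: int
    stay_time: float  # announced connection time left, measured from mod_time (assumption)
    mod_time: float   # time at which the device sent its stay-time
    x: float
    y: float

    def get_ttl(self) -> float:
        """Time after which the device is expected to be unreachable."""
        return self.mod_time + self.stay_time

    def remaining(self, now: float) -> float:
        """Connection time left at `now`, never negative (hypothetical helper)."""
        return max(0.0, self.get_ttl() - now)

    def modify_stay_time(self, stay_time: float, now: float) -> None:
        """Record a new stay-time announcement and when it was received."""
        self.stay_time = stay_time
        self.mod_time = now
```

Storing the announcement time instead of decrementing the stay-time avoids updating every profile on each clock tick.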
User habit
rTime: the time at which the user usually performs the habit referred to by the object.
duriation: the duration of the activity referred to by the habit.
lstReqDocs: the documents needed to perform the habit.
location: the location where the user performs the activities referred to by the habit.
User Agenda
rTime: the time at which the user carries out the agenda referred to by the object.
duriation: the duration of the agenda.
activities: activities included in the agenda.
lstReqDocs: the documents needed to perform the agenda.
location: the location where the agenda is performed.
Interest
description: the description of a user interest.
location: the locations on which the interest depends.
now: marks if the interest is only used for the current MANET-View.
always: marks if the interest is applicable anywhere and anytime.
TimeFrom: the time from which the interest is applicable.
TimeTo: the time after which the interest is no longer applicable.
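The Interest attributes define when an interest is taken into account. Below is a Python sketch of one plausible applicability check. Three details are assumptions: the precedence (always, then now, then location and time window), the treatment of an empty location set as "any location", and the name `is_applicable`. Time windows that cross midnight are not handled.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Set


@dataclass
class Interest:
    description: str
    locations: Set[str] = field(default_factory=set)  # empty set = any location (assumption)
    now: bool = False                 # valid only for the current MANET-view
    always: bool = False              # valid anywhere, anytime
    time_from: Optional[time] = None  # daily start of validity
    time_to: Optional[time] = None    # daily end of validity

    def is_applicable(self, location: str, at: time, current_view: bool = True) -> bool:
        if self.always:
            return True
        if self.now:
            return current_view
        if self.locations and location not in self.locations:
            return False
        if self.time_from is not None and at < self.time_from:
            return False
        if self.time_to is not None and at > self.time_to:
            return False
        return True
```

For example, an interest limited to "campus" between 09:00 and 17:00 applies on campus at 10:00. It does not apply at the station at 10:00 or on campus at 18:00.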
histQuery
queryID: the identifier of a query.
title: titles of files that have been discovered for the query.
catagory: categories of files that have been discovered for the query.
frequency: the number of times that a user poses the query.
MANETID: the identifiers of the MANET-views where the query has been posed.
addFrequency(): increments the usage frequency of the query.
mergUnder(): merges a given query with the one referred by the object.
getSimValue(): calculates the similarity between the query that the object refers to and a
given query.
getProQuery(): prepares a proactive query
DocDescriptors
Description: a list of words describing a cluster or a file.
maxKeywords: the maximum number of keywords used to describe a cluster/file.
addWord(): adds a word to lstWords.
getSimValue(): calculates the similarity between the description and a given title, which can be a list of strings/keywords.
adjustList(): adjusts the keywords in the description according to the quota by removing
less important words.
Keyword
word: a word that describes a cluster/file.
stemWord: the stemmed form of the word.
Freq: the number of times that a word appears in the title of the file or the cluster.
increaseFreq(): increases the value of the attribute freq by one
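DocDescriptors and Keyword suggest a frequency-weighted keyword list capped at maxKeywords. The Python sketch below makes three substitutions. The stemmer is a deliberately crude placeholder, where a real system would use, for example, a Porter stemmer. adjustList() keeps the most frequent stems. getSimValue() uses Jaccard similarity between stem sets, and the measure actually used in the thesis may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


def naive_stem(word: str) -> str:
    """Placeholder stemmer (assumption): lowercase and strip a trailing plural 's'."""
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("s") else w


@dataclass
class Keyword:
    word: str       # the word as first seen
    stem_word: str  # its stemmed form
    freq: int = 1   # occurrences in the titles

    def increase_freq(self) -> None:
        self.freq += 1


class DocDescriptors:
    def __init__(self, max_keywords: int):
        self.max_keywords = max_keywords
        self.description: Dict[str, Keyword] = {}  # stem -> Keyword

    def add_word(self, word: str) -> None:
        stem = naive_stem(word)
        if stem in self.description:
            self.description[stem].increase_freq()
        else:
            self.description[stem] = Keyword(word, stem)

    def adjust_list(self) -> None:
        """Keep the max_keywords most frequent stems (ties broken alphabetically)."""
        ranked = sorted(self.description.values(), key=lambda k: (-k.freq, k.stem_word))
        self.description = {k.stem_word: k for k in ranked[:self.max_keywords]}

    def get_sim_value(self, title_words: List[str]) -> float:
        """Jaccard similarity between the description stems and the title stems."""
        mine = set(self.description)
        other = {naive_stem(w) for w in title_words}
        union = mine | other
        return len(mine & other) / len(union) if union else 0.0
```

After adding "Jazz", "concerts", "concert", "Lyon", "jazz" and "live" with a quota of two, only the stems "concert" and "jazz" (two occurrences each) survive adjust_list().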
MetaCluster
CID: the identifier of a cluster.
depth: the depth of the cluster in the file tree.
IDFlsUnder: the files classified under the cluster.
IDClstUnder: the clusters classified under the cluster.
lstKeywords: a list of keywords describing the cluster.
addElement(): groups a given file/cluster under the cluster referred by the object.
getSimValue(): calculates the similarity between the cluster referred to by the object and a given file/cluster.
ClusterAtDept
Clusters: the clusters found at the same depth.
get(): returns the ith cluster
infClassifier
lstCollection: the clusters at each depth of the file tree.
lstFile: a list of files' metadata.
THeight: the height of the file tree.
lstNBClusteres: the number of clusters at each depth of the file tree.
balanceTree(): balances the file tree.
createTree(): creates a file tree.
classifyFiles(): classifies files into different clusters.
classifyClusters(): classifies clusters into other clusters.
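infClassifier builds the file tree by classifying files into clusters and then clusters into higher-level clusters. The Python sketch below substitutes a simple greedy, single-pass Jaccard clustering for the thesis's classification method. This is an assumption, not the original algorithm, and balanceTree() is not covered. The clusters of each level become the items of the next level. Construction stops when a level has at most one cluster or no longer reduces the number of items, so the tree height corresponds to `len(levels)`.

```python
from typing import Dict, List, Set, Tuple

Cluster = Tuple[List[str], Set[str]]  # (member IDs, union of their keywords)


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def classify(items: Dict[str, Set[str]], threshold: float) -> List[Cluster]:
    """One level of classification (assumed greedy single-pass scheme)."""
    clusters: List[Cluster] = []
    for item_id in sorted(items):
        words = items[item_id]
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(words, cluster[1])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].append(item_id)
            best[1].update(words)   # the cluster's keyword set grows with its members
        else:
            clusters.append(([item_id], set(words)))
    return clusters


def create_tree(files: Dict[str, Set[str]], threshold: float) -> List[List[Cluster]]:
    """Cluster files, then clusters of clusters, until a level stops shrinking."""
    levels: List[List[Cluster]] = []
    current = files
    while True:
        level = classify(current, threshold)
        levels.append(level)
        if len(level) <= 1 or len(level) == len(current):
            break
        # the clusters of this level become the items of the next one
        current = {f"c{len(levels)}_{i}": kw for i, (_, kw) in enumerate(level)}
    return levels
```

The stopping rule guarantees termination, because each iteration either stops or strictly reduces the number of items.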
List of Publications
I. International Journals
• A. Negash, L. Brunie, V. Scuturici, “A context aware Information sharing Middleware for a dynamic pervasive computing environment”, The International Journal on Computer Science and Information Systems, Vol. 2, No. 2, pp. 65-82, ISSN: 1646-3692, 2007
II. International Conferences
• A. Negash, L. Brunie, V. Scuturici, “SAMi: A Self-Adaptive Information sharing Middleware for a dynamic pervasive computing environment”, The International Conference on Wireless Applications and Computing, 6 pages, Lisbon, Portugal, 2007
• Y. Fawaz, A. Negash, Lionel Brunie, and Vasile-Marian Scuturici, “ConAMi:
Collaboration-based content Adaptation Middleware for Pervasive Computing
Environment”. The 4th IEEE International Conference on Pervasive Services
(ICPS’07), pp 189-192, Istanbul, Turkey, July, 2007.
• Y. Fawaz, A. Negash, Lionel Brunie, and Vasile-Marian Scuturici, “Service
Composition-based Content Adaptation for Pervasive Computing Environment”. The
3rd International Conference on Wireless Applications and Computing. pp 35-42,
Lisbon, Portugal, July, 2007 (chosen as a best paper).
• A. Shiferaw, S. Lajmi, V. Scuturici and L. Brunie, “PASMi: self-adaptive Photo Annotation and Sharing Middleware of Mobile Ad-hoc Networks”, Conference on Pervasive Computing and Communications Workshops (PerComW 2010), 6 pages, Mannheim, Germany, 2010.
• A. Shiferaw, L. Brunie, V. Scuturici, Y. Fawaz, “Mobility Awareness for Information Sharing in MANETs”. The 11th IEEE International Conference on Mobile Data Management (MDM 2010), 3 pages, Kansas City, USA, May 2010 (in press).
• A. Shiferaw, L. Brunie and V. Scuturici, “Interest-Awareness for Information Sharing in MANETs”, International Workshop on Mobile P2P Data Management, Security and Trust (MP-DMST*), 6 pages, Kansas City, USA, May 2010 (in press).
Curriculum Vitae
Personal Information
Name: Addisalem Negash Shiferaw
Date and place of birth: 11 March 1977, Addis Ababa (Ethiopia)
Marital status: Single
Nationality: Ethiopian
Languages: English, French, Amharic (language of Ethiopia)
Education
• PhD candidate in computer science at the LIRIS laboratory, INSA de Lyon (Jan. 2006 - July 2010)
• Master's degree in computer science, Department of Computer Science, Addis Ababa University, Addis Ababa, Ethiopia (Sept. 2002 - July 2004).
• Bachelor of Science (BSc) in computer science, Department of Mathematics, Addis Ababa University, Addis Ababa, Ethiopia (Sept. 1996 - July 2000).
• E.S.L.C.E. (secondary school leaving certificate), Addis Ababa, Ethiopia (May 1996).
Professional Experience
• Teaching in computer science, Department of Mathematics and Computer Science, Addis Ababa University (Sept. 2000 - Oct. 2004).
Administrative Experience
• Instructor and coordinator for office-automation and application-software courses, continuing education, Department of Mathematics and Computer Science, Addis Ababa University (June 2002 - Sept. 2002).