Semantic User Profile
Marius Barat1 – [email protected]
Adrian Stefan Popescu1 –[email protected]
Alin Alexandru1 – [email protected]
1 Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iasi, 16 Berthelot Street, 700483 Iasi,
Romania
Abstract. Due to the growing number of web social networks and digital identity management
services, a single individual can have a large number of accounts and sets of authentication
information. The risk of forgetting authentication information, the time lost checking the news
on every social network, and the lack of connection between the well-known social networks led
us to create a web service with a single authentication method, a single web link and a single
web page from which a large number of social networks and other identity management web services
can be reached. Using a large amount of identity information led to computing scores for a user
(like Klout) and for the people the user knows. Access to a SPARQL database was also mandatory,
given the growth of the Semantic Web and of SPARQL endpoints.
Keywords. Semantic Web, Windows Azure, Protégé, WCF, Cloud Service, WebService, Virtuoso,
EndPoint, RDF, SPARQL.
1 Introduction
This application grew out of the need for a social web platform able to merge all
the information that belongs to a user across the many social websites available today.
The platform should be smart enough to build hierarchies between users and to provide
statistics about their activities and interests.
We built this application using .NET Framework 4.0, which ships with a much improved
version of Entity Framework (a data access library that lives in the System.Data.Entity
namespace) compared with the one in .NET Framework 3.5.
Entity Framework offers a more code-centric option called “code first development”.
With this approach, developers can work without ever having to open an XML mapping file
or to create and configure a database by hand.
During development, Entity Framework was used to store web users and their profiles
(from social networks such as Facebook or Twitter) and to support authentication. We also
store the access tokens for profile information in the database.
Besides .NET Framework 4.0, we used many other tools and technologies to give the
application strong modularity: stable, independent modules that can be easily modified
and updated without any impact on the others. Because some of them are exposed as web
services, they also offer strong code reusability.
The core of the application works like this: a user wants to create a new account on
the platform; thanks to OAuth, the user can access the platform with an existing Facebook
or Twitter account. Once the user is logged in, processing starts in the cloud, and all the
information gathered is stored as an RDF graph on a remote server. This remote server runs
a Virtuoso open source distribution, which stores RDF graphs and supports SPARQL queries
over them.
In the next sections we describe some of the most important technologies we used
while developing this application, a workflow for the application, and some directions
for future work.
2 Overview
We present an overview of our application and the connection between the main
parts in Fig. 1.
3 Technologies
From the start we realized the magnitude of the project and that a sound software
engineering approach was needed to manage it well. Following that approach, and to give
an overview of the workflow, we present the technologies in the order in which they
appear in the engineering scheme, which corresponds to the normal workflow.
3.1 Source control
In order to work on the same project at the same time without conflicts between
sources, and to keep track of all the modifications made to the project, we used source
control.
Fig. 1. Workflow of the SUP application
Source control is the management of changes to documents, programs, large websites
and other information stored as computer files. It is most commonly used in software
development, where a team of people may modify the same files.
Changes are usually identified by a number or letter code, called the “revision number”
or simply “revision”, which makes finding an older version of a source file much simpler.
We used VisualSVN for source control. The project and all its sources are available
at the following address: https://svn.info.uaic.ro/repos/sup/. The repository is not public:
read/write access is allowed only after a user has been granted access and has entered the
credentials of an info.uaic.ro account.
3.2 OAuth
The SUP project supports both OAuth 1.0a and 2.0 and is fully configurable, so new
authentication endpoints can be added. In the current version of the solution we have
configured Twitter (which uses OAuth 1.0a) and Facebook (which uses OAuth 2.0).
Fig. 2. Source control using VisualSVN
OAuth allows users to hand out tokens instead of credentials to their data hosted by
a given service provider. Each token grants access to a specific site (e.g., a video editing
site) for specific resources (e.g., just videos from a specific album) and for a defined dura-
tion (e.g., the next 2 hours).
This allows a user to grant a third party site access to their information stored with
another service provider, without sharing their access permissions or the full extent of their
data.
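The idea of a token limited to one site, specific resources and a fixed duration can be sketched in Python as follows. This is an illustrative model only: the `AccessToken` class, its fields and the example scope name are assumptions made for this sketch, not part of any OAuth library or of the SUP code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessToken:
    """Hypothetical model of a token: which site, which resources, until when."""
    site: str
    scopes: frozenset
    expires_at: datetime

    def allows(self, site: str, scope: str, now: datetime) -> bool:
        # Access is granted only for the right site, a granted scope,
        # and only while the token has not expired.
        return site == self.site and scope in self.scopes and now < self.expires_at


# Example values are made up: a video editing site gets two hours of
# access to the videos of one album and nothing else.
issued = datetime(2012, 1, 1, 12, 0)
token = AccessToken(
    site="video-editor.example",
    scopes=frozenset({"album:holiday:videos"}),
    expires_at=issued + timedelta(hours=2),
)
```

The user's password never appears in this structure; the third party site only ever holds the token.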
During the development process we faced two major difficulties:
1. Allowing a user to add multiple data sources and to log in with any of the added providers.
2. Because of the load balancer used in the cloud, the authentication process could start on
one instance and finish on another, so we had to store all the temporary information in a
table. We also had no way to determine the current URI in order to set the callback used in
the process. The solution was to take advantage of the different configuration files that
Visual Studio and Windows Azure use: we could set one callback URL in the web.config file for
development and another for deployment.
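The second difficulty and its workaround can be illustrated with a minimal Python sketch. It is not the SUP implementation: SQLite stands in for the shared table, and the class name, table layout, environment names and callback URLs are all assumptions made for the example.

```python
import sqlite3

# Hypothetical callback URLs; the real values would live in the
# per-environment configuration files.
CALLBACK_URLS = {
    "development": "http://localhost:8080/oauth/callback",
    "deployment": "https://sup.example/oauth/callback",
}


def callback_url(environment: str) -> str:
    """Pick the callback URL from configuration instead of the request URI."""
    return CALLBACK_URLS[environment]


class PendingAuthStore:
    """Temporary OAuth state kept in a shared table, not in instance memory."""

    def __init__(self, connection: sqlite3.Connection):
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending_auth ("
            "request_token TEXT PRIMARY KEY, token_secret TEXT NOT NULL)"
        )

    def save(self, request_token: str, token_secret: str) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO pending_auth VALUES (?, ?)",
            (request_token, token_secret),
        )
        self.db.commit()

    def take(self, request_token: str):
        """Return the stored secret and delete the row, or None if unknown."""
        row = self.db.execute(
            "SELECT token_secret FROM pending_auth WHERE request_token = ?",
            (request_token,),
        ).fetchone()
        if row is None:
            return None
        self.db.execute(
            "DELETE FROM pending_auth WHERE request_token = ?", (request_token,)
        )
        self.db.commit()
        return row[0]


# Two stores over one database stand in for two web role instances:
# the flow starts on instance A and is completed on instance B.
shared = sqlite3.connect(":memory:")
instance_a = PendingAuthStore(shared)
instance_b = PendingAuthStore(shared)
instance_a.save("req-123", "secret-abc")
secret_seen_by_b = instance_b.take("req-123")
```

Because the state lives in the shared table, it does not matter which instance the load balancer picks for the callback request.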
3.3 Windows Azure
Windows Azure is the platform used for hosting the project because it makes it
possible to scale each role individually. Another advantage for us was that Visual Studio
offers an emulator for this cloud platform.
We used a web role to host the website and two worker roles: one for getting the
data from the sources and one that processes the information into RDF format using an OWL
schema.
For storing data we used SQL Azure and Azure Tables. The first was used for storing
persistent data such as user access tokens. Azure Tables is used for storing temporary
data for the DataDigest worker role.
We could give other developers access to the Azure Table store, so that they could
use that information to create their own ontology. The information could be accessed using
the OData protocol.
3.4 Ontologies and Protégé
In computer science, and particularly in our case, an ontology represents knowledge
formalized as a set of concepts, properties and rules within a domain, together with the
relationships between the concepts.
An ontology was needed for good management and storage of the information that is
sent by the Windows Azure worker roles from the cloud and then classified in a consistent
way in the Virtuoso database. To create the ontology we chose the OWL/RDF model because of
its well-known compatibility with a large number of web services, and especially because of
the simple correspondence between RDF and the triples used by the Virtuoso storage model.
To build the OWL/RDF model we used Protégé, a well-known tool whose user-friendly
graphical interface let us create classes and properties and connect them.
Fig. 3. Protégé GUI
The ontology was created using a top-down engineering approach. The first step was
to ask ourselves what questions the ontology should be able to answer. Once we had all the
answers, we modeled the basic classes (User, Information, Profile) and then refined the
model with more detailed specifications.
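As a rough illustration of how the three basic classes map onto RDF triples, the following Python sketch writes a tiny ontology in N-Triples syntax. The namespace `http://example.org/sup#` and the property names `hasProfile` and `hasInformation` are hypothetical; the actual SUP ontology is not reproduced here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"
SUP = "http://example.org/sup#"  # hypothetical namespace


def class_triples(name):
    """Declare one OWL class as a single (subject, predicate, object) triple."""
    return [(SUP + name, RDF_TYPE, OWL_CLASS)]


def property_triples(name, domain, range_):
    """Declare an object property linking two classes."""
    prop = SUP + name
    return [
        (prop, RDF_TYPE, OWL_OBJECT_PROPERTY),
        (prop, RDFS_DOMAIN, SUP + domain),
        (prop, RDFS_RANGE, SUP + range_),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


ontology = []
for cls in ("User", "Profile", "Information"):
    ontology += class_triples(cls)
# Assumed relationships, chosen only for illustration.
ontology += property_triples("hasProfile", "User", "Profile")
ontology += property_triples("hasInformation", "Profile", "Information")
print(to_ntriples(ontology))
```

Each class or property declaration is itself just a set of triples, which is why an OWL/RDF model fits a triple store directly.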
Fig. 4. Ontology graph
The relationships between classes and properties were designed at the same time as
the database tables, based on the information that we can retrieve from the different
social networks integrated in our system. We encountered various problems in designing the
ontology because the rules and classes were constantly changing to keep up with the new
requirements imposed by the social networks.
3.5 Virtuoso
OpenLink Virtuoso is a cross-platform universal server that implements web, file,
and database server functionality alongside native XML storage and universal data access
middleware as a single server solution. It supports key Internet, web, and data access
standards such as XML, XPath, XSLT, SOAP, WSDL, UDDI, SMTP, ODBC, and JDBC.
It provides a high-performance virtual database engine and acts as a universal data
access technology, giving transparent access to existing data sources, which are typically
databases from different vendors.
Fig. 5. Virtuoso endpoint service
We installed a Virtuoso open source distribution on a remote server in order to
store there the RDF graphs with information about all the users. Virtuoso provides an
endpoint for SPARQL queries, which can be run against any of the graphs stored on the
remote server. Because this remote server is dedicated only to this service (storing graphs
and answering SPARQL queries through its endpoint), it can be used not only by the social
platform that we developed, but also by any other application that needs RDF data storage
and SPARQL queries over that data.
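A client of such an endpoint only needs to send an HTTP request that follows the SPARQL protocol. The Python sketch below builds one with the standard library; the host name, graph name and query are placeholders chosen for illustration, and the helper `build_sparql_request` is not part of the SUP code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query, default_graph=None):
    """Build a SPARQL protocol GET request for the given endpoint."""
    params = {"query": query}
    if default_graph is not None:
        # Restrict the query to one named graph stored on the server.
        params["default-graph-uri"] = default_graph
    return Request(
        endpoint + "?" + urlencode(params),
        headers={"Accept": "application/sparql-results+json"},
    )


ENDPOINT = "http://virtuoso.example:8890/sparql"  # hypothetical host
GRAPH = "http://example.org/sup/users"            # hypothetical graph name
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

request = build_sparql_request(ENDPOINT, QUERY, GRAPH)
# urllib.request.urlopen(request) would send it; that call is omitted
# here because the endpoint above is only a placeholder.
```

Since the request is plain HTTP, any application that can reach the server can query the stored graphs, regardless of the language it is written in.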
We encountered some problems when installing this software, because the remote
server has a 64-bit processor architecture and, in our experience, the Virtuoso
distributions for 64-bit architectures were not very stable. Once installed, the
distribution we worked with (version 6.3) provided many sample test databases and many
useful usage examples.
3.6 WCF WebService
As mentioned in the previous section, Virtuoso is installed on a remote server
machine, and its functionality is available to any other application, not only to the
social platform we set out to build.
All this functionality is accessed via web services built using WCF (Windows
Communication Foundation). WCF is a unified programming model for building
service-oriented applications. It enables developers to build secure, reliable, transacted
solutions that integrate across platforms and interoperate with existing investments.
The following functionality is implemented through WCF web services:
Create a new RDF graph
Insert a new RDF triple into an existing RDF graph
Delete an RDF triple from an existing RDF graph
Initialize an RDF graph
Execute SPARQL queries against an existing RDF graph
Communication between the WCF web service code and Virtuoso is handled by a publicly
available library, dotNetRDF (dotNetRDF.dll), which makes it possible to insert a new RDF
graph into the database, update an existing one, or run a SPARQL query.
All these web services are published using Internet Information Services (IIS) Manager.
Being public web services, they can be used by any client, not only by the worker roles in
the cloud.
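The set of operations listed in this section can be pictured with a small in-memory stand-in written in Python. It is only a sketch: the class and method names are hypothetical, "initialize" is interpreted here as resetting a graph to empty, and a simple pattern match replaces real SPARQL evaluation. The actual services are WCF services backed by Virtuoso through dotNetRDF.

```python
class GraphStore:
    """Hypothetical in-memory stand-in for the operations the services expose."""

    def __init__(self):
        self._graphs = {}

    def create_graph(self, graph_uri):
        if graph_uri in self._graphs:
            raise ValueError(f"graph already exists: {graph_uri}")
        self._graphs[graph_uri] = set()

    def insert_triple(self, graph_uri, s, p, o):
        self._graphs[graph_uri].add((s, p, o))

    def delete_triple(self, graph_uri, s, p, o):
        # discard() does nothing if the triple is not present.
        self._graphs[graph_uri].discard((s, p, o))

    def initialize_graph(self, graph_uri):
        """Reset an existing graph to empty (an assumed interpretation)."""
        self._graphs[graph_uri].clear()

    def match(self, graph_uri, s=None, p=None, o=None):
        """Tiny stand-in for a query: None acts as a wildcard."""
        return sorted(
            t for t in self._graphs[graph_uri]
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = GraphStore()
g = "http://example.org/sup/users"  # hypothetical graph name
store.create_graph(g)
store.insert_triple(g, "user:1", "knows", "user:2")
store.insert_triple(g, "user:1", "knows", "user:3")
store.delete_triple(g, "user:1", "knows", "user:3")
friends = store.match(g, s="user:1", p="knows")
```

Keeping the interface this small is what lets other applications reuse the same server for their own graphs.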
4 Workflow
In this section we present a usage scenario. Whenever a new user wants to join our
social web platform, all the user has to do is enter their Facebook or Twitter credentials.
The user then waits a few moments while the worker roles in the cloud inspect the Facebook
and Twitter accounts for friends, posts, likes, and statuses. All this information is
gathered and stored, via a web service, as an RDF graph in a database on the Virtuoso
server. After this step, a score is computed for the user by means of SPARQL queries run
through another web service published on the server where the RDF graph is stored. The user
can then see this score as well as the scores of other users, namely friends who already
have an account on our platform.
In addition, again through the web service that provides access to the endpoint on
the server where the Virtuoso distribution is installed, the user has access to several
statistics about their activity on the social networks where they have an account.
5 Scalability
The SUP project is designed so that each component can be scaled as needed. For
example, at some point there may be a large amount of data to transform into RDF format.
In this case we need to increase the number of instances of the DataDigest worker role.
Windows Azure offers an API with which we can monitor the load on each module and
increase or decrease the number of instances programmatically.
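A scaling decision of this kind can be reduced to a small rule. The Python sketch below derives a target instance count from the size of the backlog; the function name, the per-instance capacity and the limits are assumptions for illustration, and the call that would actually change the instance count through the Azure API is deliberately left out.

```python
import math


def target_instances(pending_items, items_per_instance=500, minimum=1, maximum=10):
    """Return how many worker instances to run for the current backlog."""
    if pending_items < 0:
        raise ValueError("pending_items must be non-negative")
    # One instance per full or partial batch of pending items...
    needed = math.ceil(pending_items / items_per_instance)
    # ...kept within the configured lower and upper limits.
    return max(minimum, min(maximum, needed))
```

For example, with the default (assumed) capacity of 500 items per instance, a backlog of 1200 items to convert to RDF would call for three DataDigest instances.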
In the presented solution the Virtuoso server is the only component that is not
scalable, but we could install it on Amazon EC2, the cloud solution from Amazon.
6 Future work
The project is open to improvements and new features in any of the main parts of
the application.
One way to improve the SUP application is to make suggestions to a user from
different perspectives. We can add artificial intelligence algorithms to build groups of
users who may have common interests, based on the tags from their posts and from the posts
they “like”.
Regarding authentication, in future work a LinkedIn authentication method will be
implemented, and the information from this social network will also be gathered.
To extend our platform quickly and increase the number of users, it is necessary to
implement a module for inviting our users' friends from the other social networks to join
our platform.
7 Bibliography
Hosting a web service using IIS web server: http://beyondrelational.com/blogs/dhananjaykumar/archive/2011/02/11/walkthrough-on-creating-wcf-4-0-service-and-hosting-in-iis-7-5.aspx
Module for communication between C# and Virtuoso: http://www.dotnetrdf.org/content.asp?pageID=Using%20Virtuoso%20Universal%20Server
Virtuoso documentation: http://wikis.openlinksw.com/dataspace/owiki/wiki/MetaWiki/
OAuth login: http://oauth.net/
Azure cloud: http://www.windowsazure.com/en-us/
OWL modeling: http://protege.stanford.edu/
WCF services: http://msdn.microsoft.com/en-us/library/ms731082.aspx