Semantic User Profile

Marius Barat¹ – [email protected]

Adrian Stefan Popescu¹ – [email protected]

Alin Alexandru¹ – [email protected]

¹ Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iasi, 16 Berthelot Street, 700483 Iasi, Romania

Abstract. Because of the growing number of social networks and digital identity management services, a single individual can have many accounts and many sets of authentication information. The risk of forgetting authentication information, the time lost checking the news on every social network, and the lack of connection between the well-known social networks led us to create a web service with a single authentication method, a single web link and a single web page through which a large number of social networks and other identity management web services can be reached. Aggregating this much identity information allowed us to compute scores for a user (similar to Klout) and for the people the user knows. Given the growth of the Semantic Web and of SPARQL endpoints, access to a SPARQL database was also a requirement.

Keywords. Semantic Web, Windows Azure, Protégé, WCF, Cloud Service, WebService, Virtuoso, EndPoint, RDF, SPARQL.

1 Introduction

This application grew out of the need for a social web platform able to merge all the information that belongs to a user across the many social websites available today. The platform should be smart enough to build hierarchies between users and to provide statistics about their activities and interests.

We built this application on the .NET Framework 4.0, which ships with a much improved version of the Entity Framework (a data access library that lives in the System.Data.Entity namespace) compared with the one in the .NET Framework 3.5.

The Entity Framework enables a more code-centric option called “Code First” development, which makes it possible to develop without ever opening an XML mapping file or creating and configuring the database by hand.

During development, the Entity Framework was used to store web users and their profiles (from social networks such as Facebook or Twitter) and to handle authentication. We also store in the database the access tokens used to retrieve profile information.
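The Code First idea, where the database schema is derived from ordinary classes by convention, can be illustrated outside .NET. The Python sketch below is only an analogy and not the Entity Framework API: the `WebUser` and `SocialProfile` classes and their fields are hypothetical stand-ins for the entities described above, and an in-memory SQLite database replaces the real one. As with Code First's default conventions, a field named `id` becomes the primary key.

```python
import sqlite3
from dataclasses import dataclass, fields


# Hypothetical entities: the real SUP model classes are not shown in this paper.
@dataclass
class WebUser:
    id: int
    display_name: str


@dataclass
class SocialProfile:
    id: int
    user_id: int        # refers to WebUser.id
    provider: str       # e.g. "facebook" or "twitter"
    access_token: str   # OAuth token kept for later data retrieval


# Map Python annotation types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT"}


def create_table_sql(cls) -> str:
    """Derive a CREATE TABLE statement from a dataclass, Code First style."""
    columns = []
    for f in fields(cls):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == "id":  # convention: a field named "id" is the primary key
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {cls.__name__} ({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
for entity in (WebUser, SocialProfile):
    conn.execute(create_table_sql(entity))
```

Adding a field to one of the classes and regenerating the statements changes the schema without touching any mapping file, which is the property of Code First that made it attractive here.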

Besides the .NET Framework 4.0, we used many other tools and technologies to give the application a strongly modular structure: stable, independent modules that can be easily modified and updated without affecting the others. Because some of these modules are exposed as web services, they also offer strong code reusability.

The core of the application works as follows. A user wants to create a new account on the platform; because the platform uses OAuth, he can sign in with his Facebook or Twitter account. Once he is logged in, processing starts in the cloud, and all the information it gathers is stored as an RDF graph on a remote server. This remote server runs the open-source distribution of Virtuoso, which stores RDF graphs and supports SPARQL queries over them.

In the following sections we describe some of the most important technologies used to develop the application, the application's workflow, its scalability, and some directions for future work.

2 Overview

Fig. 1 presents an overview of our application and the connections between its main parts.

3 Technologies

From the very start we realized the size of the project and that a sound software engineering approach was needed to manage it well. Following that approach, and to give an overview of the workflow, we present the technologies in the order in which they appear in the engineering process, correlated with the normal workflow.

3.1 Source control

To work on the same project at the same time without conflicting source changes, and to keep track of every modification made to the project, we used a source control system.

Fig. 1. Workflow of the SUP application

Source control is the management of changes to documents, programs, large websites and other information stored as computer files. It is most commonly used in software development, where a team of people may modify the same files.

Changes are usually identified by a number or letter code called the “revision number”, or simply the “revision”, which makes it easy to retrieve an older version of a source file.

We used VisualSVN as our Subversion source control. The project and all its sources are available at https://svn.info.uaic.ro/repos/sup/. The repository is not public: read/write access is allowed only after a user has been granted access, and only after entering the credentials of an info.uaic.ro account.

3.2 OAuth

The SUP project supports both OAuth 1.0a and OAuth 2.0 and is fully configurable, so new authentication endpoints can be added. In the current version of the solution we have configured Twitter (which uses OAuth 1.0a) and Facebook (which uses OAuth 2.0).
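Under OAuth 1.0a, as used by Twitter, every request must be signed. As a sketch of what the protocol requires (not the SUP code, which is not shown here), the Python function below computes an HMAC-SHA1 signature following RFC 5849, section 3.4. It assumes parameter names are unique and that `base_url` is already the base string URI (lowercase scheme and host, no query string).

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value: str) -> str:
    """RFC 3986 percent-encoding: only A-Z a-z 0-9 - . _ ~ stay unescaped."""
    return quote(value, safe="")


def oauth1_signature(method: str, base_url: str, params: dict,
                     consumer_secret: str, token_secret: str = "") -> str:
    """Compute an OAuth 1.0a HMAC-SHA1 signature (RFC 5849, section 3.4)."""
    # 1. Normalize parameters: encode names and values, sort, join with "&".
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join([method.upper(), pct(base_url), pct(param_string)])
    # 3. Signing key: encoded consumer secret and token secret joined by "&".
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

The result is sent as the `oauth_signature` parameter. The caller is expected to include the other protocol parameters (`oauth_consumer_key`, `oauth_nonce`, `oauth_timestamp`, `oauth_token` and so on) in `params`, since they are part of what gets signed.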

Fig. 2. Source control using VisualSVN

OAuth allows users to hand out tokens, instead of credentials, for their data hosted by a given service provider. Each token grants access to a specific site (e.g., a video editing site) for specific resources (e.g., just the videos from a specific album) and for a defined duration (e.g., the next 2 hours).

This allows a user to grant a third-party site access to their information stored with another service provider, without sharing their access permissions or the full extent of their data.

During development we faced two major difficulties:

1. Allowing a user to add multiple data sources and to log in with any of the added providers.

2. Because of the load balancer in the cloud, an authentication process could start on one instance and finish on another, so we had to store all the temporary authentication information in a table. We also had no way to determine the current URI in order to set the callback used in the process. The solution was to take advantage of the different configuration files used by Visual Studio and Windows Azure, which let us set one callback URL in the web.config file for development and another for deployment.
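The second difficulty can be sketched as follows. In this Python sketch, SQLite stands in for the shared table, and two connections to the same in-memory database play the role of two load-balanced instances. The table layout, the `SharedAuthStateStore` class and the `CALLBACK_URLS` values are all hypothetical; in SUP the callback URL came from the configuration file of each environment.

```python
import secrets
import sqlite3

# Hypothetical callback URLs; SUP kept one per configuration file.
CALLBACK_URLS = {
    "development": "http://localhost:81/Account/OAuthCallback",
    "deployment": "https://sup.example.org/Account/OAuthCallback",
}


class SharedAuthStateStore:
    """Temporary OAuth state kept in a table that every instance can reach."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS auth_state ("
            "state TEXT PRIMARY KEY, provider TEXT, request_secret TEXT)"
        )

    def begin(self, provider: str, request_secret: str) -> str:
        """Called by the instance that redirects the user to the provider."""
        state = secrets.token_urlsafe(16)
        self.conn.execute(
            "INSERT INTO auth_state VALUES (?, ?, ?)",
            (state, provider, request_secret),
        )
        self.conn.commit()
        return state

    def finish(self, state: str):
        """Called by whichever instance receives the callback.

        The row is deleted once read, so a state value cannot be reused.
        """
        row = self.conn.execute(
            "SELECT provider, request_secret FROM auth_state WHERE state = ?",
            (state,),
        ).fetchone()
        self.conn.execute("DELETE FROM auth_state WHERE state = ?", (state,))
        self.conn.commit()
        return row


# Two connections to one shared in-memory database stand in for two instances.
SHARED_DB = "file:sup_auth_state?mode=memory&cache=shared"
instance_a = SharedAuthStateStore(sqlite3.connect(SHARED_DB, uri=True))
instance_b = SharedAuthStateStore(sqlite3.connect(SHARED_DB, uri=True))

state = instance_a.begin("twitter", "request-token-secret")
result = instance_b.finish(state)  # the login completes on the other instance
print(CALLBACK_URLS["development"], result)
```

Because nothing about the pending login lives in the memory of the first instance, the load balancer is free to route the callback anywhere.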

3.3 Windows Azure

We host the project on Windows Azure because it allows each role to be scaled individually. Another advantage for us was that Visual Studio offers an emulator for this cloud platform.

We used a web role to host the website and two worker roles: one that retrieves the data from the sources, and one (the DataDigest role) that converts the information into RDF using an OWL schema.
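The split between the two worker roles can be sketched as a producer/consumer pipeline. As an assumption for illustration, two threads and an in-process `queue.Queue` stand in here for the separate role instances and for the Azure storage they would communicate through; the namespace, the record layout and the `has<Kind>` property names are hypothetical.

```python
import queue
import threading

SUP = "http://example.org/sup#"  # hypothetical ontology namespace


def fetch_worker(raw_items, out_q):
    """Plays the role of the data-fetching worker: pushes raw records."""
    for item in raw_items:
        out_q.put(item)
    out_q.put(None)  # sentinel: no more work


def digest_worker(in_q, triples):
    """Plays the role of the DataDigest worker: turns records into triples."""
    while (item := in_q.get()) is not None:
        user = f"{SUP}user/{item['user']}"
        obj = f"{SUP}{item['kind']}/{item['id']}"
        triples.append((user, f"{SUP}has{item['kind'].capitalize()}", obj))


raw = [
    {"user": "u1", "kind": "post", "id": "p1"},
    {"user": "u1", "kind": "like", "id": "l7"},
]
q = queue.Queue()
triples = []
workers = [
    threading.Thread(target=fetch_worker, args=(raw, q)),
    threading.Thread(target=digest_worker, args=(q, triples)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Keeping retrieval and conversion in separate stages is what allows each role to be scaled on its own.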

For data storage we used SQL Azure and Azure Tables. SQL Azure stores persistent data such as user access tokens, while Azure Tables stores temporary data for the DataDigest role.

We could also give other developers access to the Azure Table store, so that they could use the information to create their own ontologies; the data can be accessed through the OData protocol.
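Exposed this way, a table can be queried with plain OData query options over HTTPS. The Python helper below only builds the query URL for the Azure Table service REST interface. The account and table names are hypothetical, and a real request must also be authorized (for example with a Shared Key header or a shared access signature), which is not shown.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def table_query_url(account: str, table: str, partition_key: str,
                    top: Optional[int] = None) -> str:
    """Build an OData query URL for the Azure Table service REST interface."""
    # OData string literals are single-quoted; an embedded quote is doubled.
    literal = partition_key.replace("'", "''")
    params = {"$filter": f"PartitionKey eq '{literal}'"}
    if top is not None:
        params["$top"] = str(top)  # limit the number of returned entities
    # Keep "$" readable in the option names; percent-encode everything else.
    query = urlencode(params, safe="$", quote_via=quote)
    return f"https://{account}.table.core.windows.net/{table}()?{query}"
```

For instance, `table_query_url("supaccount", "DigestData", "u1", top=50)` selects at most 50 entities from the partition of user `u1`.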

3.4 Ontologies and Protégé

In computer science, and particularly in our case, an ontology is a formal representation of knowledge as a set of concepts, properties and rules within a domain, together with the relationships between those concepts.

An ontology was needed to properly manage and store the information sent by the Windows Azure worker roles from the cloud, which is then classified consistently in the Virtuoso database. We chose the OWL/RDF model for the ontology because it is compatible with a large number of web services and, above all, because RDF maps directly onto the triples that Virtuoso uses as its storage model.

To build the OWL/RDF model we used Protégé, a well-known ontology editor whose user-friendly graphical interface let us create classes and properties and connect them.

Fig. 3. Protégé GUI

The ontology was engineered top-down. The first step was to ask which questions the ontology should be able to answer; once we had all the answers, we modeled the basic classes (User, Information, Profile) and then refined the model into more specific classes.
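A skeleton of the top-down model can be written directly as RDF triples. The sketch below, in Python with no external libraries, declares the three basic classes as OWL classes and links them through two object properties. The namespace and the property names (`hasProfile`, `hasInformation`) are hypothetical, since the actual properties are not listed here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"
SUP = "http://example.org/sup#"  # hypothetical namespace


def ontology_triples():
    """Basic classes plus hypothetical object properties connecting them."""
    classes = ["User", "Profile", "Information"]
    # (property, domain, range)
    properties = [("hasProfile", "User", "Profile"),
                  ("hasInformation", "Profile", "Information")]
    triples = [(SUP + c, RDF_TYPE, OWL_CLASS) for c in classes]
    for name, domain, range_ in properties:
        triples += [(SUP + name, RDF_TYPE, OWL_OBJECT_PROPERTY),
                    (SUP + name, RDFS_DOMAIN, SUP + domain),
                    (SUP + name, RDFS_RANGE, SUP + range_)]
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(ontology_triples()))
```

Refining the model then amounts to adding subclasses and properties to these lists, which mirrors how the ontology was taken deeper step by step.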

Fig. 4. Ontology graph

The relationships between classes and properties were designed at the same time as the database tables, and together with the information we can retrieve from the different social networks integrated into our system. We encountered various problems while designing the ontology because the rules and classes kept changing to accommodate new requirements imposed by the social networks.

3.5 Virtuoso

OpenLink Virtuoso is a cross-platform universal server that implements Web, File and Database server functionality alongside native XML storage and universal data access middleware, as a single server solution. It supports key Internet, Web and data access standards such as XML, XPath, XSLT, SOAP, WSDL, UDDI, SMTP, ODBC and JDBC.

It provides a high-performance virtual database engine that gives transparent access to existing data sources, which are typically databases from different vendors.

Fig. 5. Virtuoso endpoint service

We installed the open-source distribution of Virtuoso on a remote server to store the RDF graphs with information about all users. Virtuoso provides a SPARQL endpoint through which queries can be run against any graph stored on the server. Because this remote server is dedicated to a single service (storing graphs and answering SPARQL queries through its endpoint), it could be used not only by our social platform but also by any other application that needs RDF storage and SPARQL querying.
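Any client can reach such an endpoint with a plain HTTP request. The sketch below follows the SPARQL Protocol (the `query` and `default-graph-uri` parameters sent with GET) and asks for results in the SPARQL JSON results format. `http://localhost:8890/sparql` is Virtuoso's default endpoint address; the graph URI is a hypothetical example.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_request(endpoint: str, query: str,
                         graph_uri: Optional[str] = None) -> Request:
    """Prepare a SPARQL Protocol query sent with HTTP GET."""
    params = {"query": query}
    if graph_uri:
        params["default-graph-uri"] = graph_uri  # restrict the query to one graph
    return Request(
        f"{endpoint}?{urlencode(params)}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run_select(endpoint: str, query: str, graph_uri: Optional[str] = None) -> list:
    """Execute a SELECT query and return each row as a {variable: value} dict."""
    with urlopen(build_sparql_request(endpoint, query, graph_uri)) as response:
        data = json.load(response)
    return [
        {name: term["value"] for name, term in row.items()}
        for row in data["results"]["bindings"]
    ]


# Example (requires a reachable endpoint):
# rows = run_select("http://localhost:8890/sparql",
#                   "SELECT ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
#                   "http://example.org/sup/graph/u1")
```

Since the protocol is plain HTTP, this is also the path other applications would use to share the server, as suggested above.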

We encountered some problems during installation, because the remote server has a 64-bit processor architecture and the Virtuoso distributions for 64-bit systems were not very stable. Once installed, the distribution we worked with (version 6.3) provided many sample test databases as well as many useful usage examples.

3.6 WCF WebService

As mentioned in the previous section, Virtuoso is installed on a remote server machine, and its functionality is available to any other application, not only to the social platform we built.

All this functionality is accessed through web services built with Windows Communication Foundation (WCF). WCF is a unified programming model for building service-oriented applications. It enables developers to build secure, reliable, transacted solutions that integrate across platforms and interoperate with existing investments.

The following operations are implemented as WCF web services:

1. Create a new RDF graph.

2. Insert a new RDF triple into an existing graph.

3. Delete an RDF triple from an existing graph.

4. Initialize an RDF graph.

5. Execute SPARQL queries against an existing graph.
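These five operations correspond closely to SPARQL 1.1 Update and query forms. The WCF service performed them through dotNetRDF, whose API is not reproduced here; the Python sketch below only builds the request text. It treats “initialize” as emptying the graph (an assumption, since its exact meaning is not documented) and assumes the Virtuoso release accepts SPARQL 1.1 Update syntax.

```python
def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def create_graph(graph: str) -> str:
    return f"CREATE GRAPH {iri(graph)}"


def insert_triple(graph: str, s: str, p: str, o: str) -> str:
    # s, p, o must already be formatted terms (see iri() and literal()).
    return f"INSERT DATA {{ GRAPH {iri(graph)} {{ {s} {p} {o} . }} }}"


def delete_triple(graph: str, s: str, p: str, o: str) -> str:
    return f"DELETE DATA {{ GRAPH {iri(graph)} {{ {s} {p} {o} . }} }}"


def initialize_graph(graph: str) -> str:
    # Assumption: "initialize" is read as "remove all triples from the graph".
    return f"CLEAR GRAPH {iri(graph)}"


def select_all(graph: str, limit: int = 100) -> str:
    return f"SELECT ?s ?p ?o FROM {iri(graph)} WHERE {{ ?s ?p ?o }} LIMIT {limit}"
```

Escaping literals in one place matters because post texts from social networks routinely contain quotes and line breaks.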

The WCF service code communicates with Virtuoso through dotNetRDF (dotNetRDF.dll), a freely available .NET library that can insert a new RDF graph into the database, update an existing one, or run a SPARQL query.

All these web services are published with the Internet Information Services (IIS) Manager. Because they are public, they can be used by anyone, not only by the worker roles in the cloud.

4 Workflow

In this section we present a usage scenario. Whenever a new user wants to join our social web platform, all he has to do is sign in with his Facebook or Twitter credentials. He then waits a few moments while the worker roles in the cloud inspect his Facebook and Twitter accounts for friends, posts, likes and statuses. All this information is gathered and stored, through a web service, as an RDF graph in the database of the Virtuoso server. Once this step is complete, a score is computed for the user by SPARQL queries run through another web service published by the server that stores the RDF graph. The user can then see his own score, as well as the scores of his friends who already have an account on our platform.
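The scoring formula is not given here, so the following is only a hypothetical sketch of how counts returned by SPARQL queries could be combined into a Klout-like score. The property names, the weights and the saturation constant 50 are assumptions chosen for illustration.

```python
SUP = "http://example.org/sup#"  # hypothetical namespace

# Hypothetical mapping from activity kind to ontology property, and weights.
PROPERTIES = {"friends": "hasFriend", "posts": "hasPost", "likes": "hasLike"}
WEIGHTS = {"friends": 1.0, "posts": 2.0, "likes": 0.5}


def activity_count_query(user_iri: str, kind: str) -> str:
    """SPARQL query counting the objects linked to a user by one property."""
    prop = SUP + PROPERTIES[kind]
    return f"SELECT (COUNT(?x) AS ?n) WHERE {{ <{user_iri}> <{prop}> ?x }}"


def score(counts: dict) -> float:
    """Weighted activity, squashed into the 0-100 range."""
    raw = sum(WEIGHTS[kind] * counts.get(kind, 0) for kind in WEIGHTS)
    # raw / (raw + 50) approaches 1 as activity grows, so scores saturate.
    return round(100 * raw / (raw + 50), 1)
```

With 30 friends, 10 posts and 20 likes, the weighted activity is 60 and the score is 54.5. The saturating curve keeps very active users from dominating the comparison with their friends.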

Again through the web service that exposes the endpoint of the server running Virtuoso, the user can also see several statistics about his activity on the social networks where he has an account.

5 Scalability

The SUP project is designed so that each component can be scaled as needed. For example, at some point there may be a large amount of data to transform into RDF; in that case we need to increase the number of instances of the DataDigest worker role.

Windows Azure offers an API through which we can monitor the load on each module and increase or decrease the number of instances programmatically.
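A scaling decision for the DataDigest role can be reduced to a small rule. In this hypothetical Python sketch, the load metric is the number of items waiting to be converted to RDF and the limits are illustrative; applying the result through the Windows Azure management API is not shown.

```python
import math


def desired_instances(pending_items: int, per_instance: int = 100,
                      minimum: int = 1, maximum: int = 10) -> int:
    """Hypothetical rule: one DataDigest instance per `per_instance` pending
    items, clamped between `minimum` and `maximum`."""
    needed = math.ceil(pending_items / per_instance)
    return max(minimum, min(maximum, needed))


def scaling_delta(current: int, pending_items: int, **limits) -> int:
    """Positive: add instances; negative: remove instances; zero: leave as is."""
    return desired_instances(pending_items, **limits) - current
```

The upper bound caps cost during bursts, and the lower bound keeps one instance ready for new users.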

In the presented solution, the Virtuoso server is the only component that is not scalable, but it could be installed on Amazon EC2, Amazon's cloud solution.

6 Future work

The project is open to improvements and new features in any of its main components.

One way to improve the SUP application is to make suggestions to users from different perspectives. We could add artificial intelligence algorithms that build groups of users with potentially common interests, based on the tags of their own posts and of the posts they “like”.
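As one possible technique among many (not a design from the project), users can be grouped by the overlap of their tag sets. The sketch below assumes each user's tags from his own and liked posts are already merged into one set. It measures overlap with Jaccard similarity and groups users by single linkage using a union-find structure; the threshold of 0.5 is an assumption.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Fraction of tags two users share: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_users(tags_by_user: dict, threshold: float = 0.5) -> list:
    """Single-linkage grouping: users are linked when their Jaccard similarity
    reaches `threshold`; each connected component becomes one group."""
    parent = {user: user for user in tags_by_user}

    def find(user):
        # Follow parent links to the group representative (with path halving).
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for u, v in combinations(tags_by_user, 2):
        if jaccard(tags_by_user[u], tags_by_user[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for user in tags_by_user:
        groups.setdefault(find(user), []).append(user)
    return sorted(sorted(members) for members in groups.values())
```

Comparing every pair is quadratic in the number of users, which is acceptable for a sketch; a larger user base would need a blocking or indexing step first.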

As for authentication, future work will add a LinkedIn authentication method and gather information from this social network as well.

To grow the platform quickly and increase the number of users, we need a module that lets users invite their friends from other social networks to join our platform.
