15
Studia Ekonomiczne. Zeszyty Naukowe Uniwersytetu Ekonomicznego w Katowicach ISSN 2083-8611 Nr 234 · 2015 Krzysztof Węcel Uniwersytet Ekonomiczny w Poznaniu Katedra Informatyki Ekonomicznej [email protected] LINKED GEODATA FOR PROFILING OF TELCO USERS Summary: There is a growing interest in location-based profiling of users de-fined as com- bining geo-data with anonymous on-line profiles. The profile of an entity usually consists of concepts accompanied by a weight specifying a relative importance of the given concept for making an analysed entity distinct. The proposed profiling method of telco users is a two-step approach. First, profiles of mobile tower stations (BTS) are created based on crowdsourced geographical information. Second, they are used to generalise the behaviour of a calling user, which is determined from Call Detail Records (CRD). The linked data cloud is considered as an additional knowledge source in the user modelling process. Keywords: linked data, user profiling, linked geodata, call detail record, mo-bile user, telco, cdr, bts, lgd, osm. Introduction There is a growing interest in location-based profiling defined as combining geo-data with anonymous on-line profiles. New methods for capturing informa- tion on where are the users and how their position changes over time are con- stantly developed. This information is becoming increasingly more valuable for a growing number of location-based or location-aware services. Some research- ers try to estimate value of mobile data information utilising proximity-based advertising valuation (Baccelli, Bolot, 2011). Tourists have been identified as the most rewarding target group. Call Detail Record (CDR) is the most widely used source of mobile loca- tion data in academic research (Song et al., 2010). Presented location is not very precise as it is “rounded” to the co-ordinates of the nearest base transceiver sta- tion (BTS). Various granularities of location impact the value of location infor-

LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Studia Ekonomiczne. Zeszyty Naukowe Uniwersytetu Ekonomicznego w Katowicach ISSN 2083-8611 Nr 234 · 2015

Krzysztof Węcel

Uniwersytet Ekonomiczny w Poznaniu Katedra Informatyki Ekonomicznej [email protected]

LINKED GEODATA FOR PROFILING OF TELCO USERS

Summary: There is a growing interest in location-based profiling of users de-fined as com-bining geo-data with anonymous on-line profiles. The profile of an entity usually consists of concepts accompanied by a weight specifying a relative importance of the given concept for making an analysed entity distinct. The proposed profiling method of telco users is a two-step approach. First, profiles of mobile tower stations (BTS) are created based on crowdsourced geographical information. Second, they are used to generalise the behaviour of a calling user, which is determined from Call Detail Records (CRD). The linked data cloud is considered as an additional knowledge source in the user modelling process. Keywords: linked data, user profiling, linked geodata, call detail record, mo-bile user, telco, cdr, bts, lgd, osm. Introduction

There is a growing interest in location-based profiling defined as combining geo-data with anonymous on-line profiles. New methods for capturing informa-tion on where are the users and how their position changes over time are con-stantly developed. This information is becoming increasingly more valuable for a growing number of location-based or location-aware services. Some research-ers try to estimate value of mobile data information utilising proximity-based advertising valuation (Baccelli, Bolot, 2011). Tourists have been identified as the most rewarding target group.

Call Detail Record (CDR) is the most widely used source of mobile loca-tion data in academic research (Song et al., 2010). Presented location is not very precise as it is “rounded” to the co-ordinates of the nearest base transceiver sta-tion (BTS). Various granularities of location impact the value of location infor-

Page 2: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Krzysztof Węcel

200

mation (Baccelli, Bolot, 2011). CDR has been identified to be sufficient for drawing conclusions at the area level (Qu, Zhang, 2013), hence it is also suffi-cient for our purposes.

Linked data cloud is very often considered as additional knowledge source in user modelling process. Concepts defined in various ontologies can be used to characterise entities. Profile of the entity consists then of the concept accompa-nied by a weight specifying relative importance of the given concept for making analyse entity distinct.

In this paper we focus on cell tower granularity of location information and annotate it with geographical ontology derived from OpenStreetMap. The goal of the method is to provide profiles of the users based on profiles of the BTS sta-tions. Therefore, the profiling process has been split into several steps. First, the information about BTS location and its neighbourhood has to be retrieved and analysed. Then, based on this information a summary of BTS profiles is pre-pared. We propose an improvement in profiling process by leveraging TF-IDF ranking to address the issue of uneven distribution of categories describing mo-bile tower locations (skewness). In the last step we can characterise users that log-in into specific BTS stations.

Section 2 presents related research. Section 3 explains our general approach to profiling of entities based on geographical context. Section 4 introduces a method for characterisation of BTS stations, from data collection, through analy-sis, to data aggregation. Section 5 provides a method for profiling of telco users. 1. Related research

In the literature, there are various approaches to profiling of mobile users. Some authors base purely on telco data, i.e. data available for mobile operators. Majority of methods leverage the social media where users manifest their opin-ions, feelings, reveal location etc. The most sophisticated approaches add mining for generalisation of patterns and classification of users. Having access to ano-nymised call data, we base our method on this data.

Most methods base on social data like Twitter, Foursquare, Flickr, or Insta-gram as data is relatively easily retrievable (API available). These services are then widely used and generate large volumes of data. One of the challenges is how to model the use of such social (and mobile) applications by various users. It is essential to understand the semantics of messages posted by them. Two trends are observed here: extraction of meaning and location.

Page 3: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Linked geodata for profiling of telco users

201

In many studies in order to enrich and disambiguate information gathered from user, semantic technologies are considered. They are particularly useful for providing context, and geographical context is one of the most important. Abel et al. introduced a user modelling framework that utilises semantic background knowledge and use it for point of interest (POI) specification (Abel et al., 2012). Two knowledge sources of linked data are considered: GeoNames and DBpedia. They demonstrate that user modelling quality improves when LOD-based back-ground knowledge is considered. DBpedia is unfortunately too coarse-grained for our purposes.

Instead of analysing separate check-ins, some approaches build activity-travel profile – a spatial trajectory is built from mobile phone call records only. The tricky part is in classification of trajectories, where data mining methods can be applied (Görnerup, 2012). Not only the sequences have to be derived but in order to make sense they have to be classified into typical activity-travel pat-terns. Their relative frequencies constitute an activity travel profile (Liu et al., 2014). Our method also bases on a number of BTSes visited. However, we do not consider sequences of visits as our experiments have shown that this would not produce meaningful results.

Trajectories, or sequences of visited locations, are not very useful unless they are confronted with activities of the users. Lie et al. investigated to what ex-tent the behavioural routines could reveal the activities being performed at mo-bile phone call locations (Liu et al., 2013). The real value is in annotating loca-tions with activity purposes but for this additional information is required. Although they have devised mathematical models to quantitatively characterize travel patterns, the motivating activities “were still in a less-explored stage” (Liu et al., 2013). Our method provides just another context for reasoning about pos-sible activities of the users, e.g. shopping, playing tennis.

According to (Cano et al., 2013) little has been done in modelling location entities. Therefore, they proposed to profile geographical areas by providing topical categorisation. Cano et al. used additional source – linked data to inter-link information from social stream with geographical objects (Cano et al., 2013). They have introduced LinkedPOI ontology, which uses DBpedia catego-ries to profile geographic space and proposed geo-lattice Awareness Stream model as one of the ways to represent location. The process consisted of filter-ing, enriching, structuring and interlinking microposts from Twitter, Facebook, and TripAdvisor. Our method in fact models location entities but as we do not analyse microposts we do not have to guess the correct location – it is provided in CDR.

Page 4: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Krzysztof Węcel

202

Qu proposed a framework and corresponding analytic methods to use User Generated Mobile Location Data (UGMLD) for Trade Area Analysis (Qu, Zhang, 2013). They have defined three key processes: “identifying the activity centre of a mobile user, profiling users based on their location history, and mod-elling users’ preference probability.” Application of the method is meant for analysis of customers’ visits to business venues. However, they rejected CDR as the data source in their research as it was too coarse. Our method is able to util-ise CDR in order to provide profiles of certain locations although we cannot provide profiles of certain venues belonging to bigger chains.

When specific locations are considered, Chen et al. also presented a method for profiling businesses at specific locations that was based on mining informa-tion from social media (Chen et al., 2014). They matched geo-tagged tweets against locations from Foursquare to build a profile of mentioned businesses.

Going back to the user, Ostuni et al. presented Cinemappy – a location-based application that computes film recommendations by exploiting contextual information related to current location of the user, leveraging information from DBpedia (Ostuni et al., 2013). Similarly, DBpedia was used by (Cano et al., 2013) who proposed a semantic travel mash-up as possible application. Ap-proach for museums was presented in (Ruotsalo et al., 2013).

One of the obstacles by user modelling is uneven distribution of categories of business, e.g. there are much more visits to a cinema than to second-hand and there are much more shops than theatres. It was first noted by Qu who explained this by social motivation, not necessarily by the differences in the number of various categories (Qu, Zhang, 2013). In their work the categories were very fine grained, for example Foursquare has a hierarchical category structure with 9 top categories and ca. 400 sub-categories. For the clarity of interpretation the catego-ries have been later collapsed to 6 groups. Our methods addresses this issue by applying specific methods from information retrieval domain. 2. Approach to geographical linked data-based profiling

This section describes user profiling based on BTS characteristics derived from the geographical linked data. We follow the idea of location-based user profiling – one of the approaches is geoprofiling, a commonly used method to approximate user characteristics based on neighbourhood demographic data.

In most approaches, there is a venue or a place given and authors are look-ing for coordinates. This is particularly important when text is analysed, espe-cially in social media, e.g. Twitter. In order to simplify disambiguation, some

Page 5: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Linked geodata for profiling of telco users

203

portals allow the so called check-ins where users select precise location, e.g. Foursquare, Facebook. We take the reverse process: starting from coordinates we are interested in the objects nearby, thus describing the geographical context, further referred to as location profile.

Our experiments have been carried out on anonymised data, where the only reasonable data for linking was location of BTS towers. There was just one type of information that was widely used and could supplement our records – geo-graphical information. There are several open data sources concerning geo-graphical information that we could use, DBpedia and OpenStreetMap being the most prominent. Taking into account the granularity of available data and the re-quirement to display results on the map we made a decision to base our method on OpenStreetMap and its triplified counterpart – LinkedGeoData (Auer, Leh-mann, Hellmann, 2009). As a crowdsourced data, it is kept relatively up to date and but without breaking fluctuations.

LinkedGeoData (LGD) provides an ontology for classification of locations. There are ca. 1200 categories grouped into ca. 45 top-level categories. Compar-ing LGD to Foursquare, the latter has ten top-level, 436 second-level, and 266 third-level categories1. Foursquare’s maps in fact use OpenStreetMap2. In our approach more general categories are advantageous as they can make interpreta-tion of generalisation results easier. Sub-categories could be more valuable as they can distinguish users better but the solution is then less stable from the sta-tistical point of view. This is a well-known trade-off of specificity vs. sensitivity (Fawcett, 2006). In order to best characterise the BTS stations, we have re-stricted our further analysis to some predefined objects (see Fig. 3).

The reasoning behind our profiling approach is presented in the following user story. A user often visits sport amenities. They are in the scope of some BTS stations. Profiles of such stations contain sport amenities with higher frequency than an average station. This is further reflected in a user profile where sport amenities gain higher weight when user trajectory is aggregated. Some visited venues are additionally annotated with a kind of sport, for example tennis3. Looking into calendars of sport events we can even reason further in which kind of sport user is interested or whom the user is supporting (whether team or indi-vidual).

1 https://developer.foursquare.com/categorytree. 2 https://foursquare.com/about/osm. 3 This depends on data availability in OpenStreetMap.

Page 6: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Krzysztof Węcel

204

3. Characteristics of BTS 3.1. Retrieval

In our experiments we have used BTS towers located in Poland, with ca. 8000 unique locations, stored in MySQL. At the beginning the information about BTS locations has been retrieved. Using a Python script, for each location a SPARQL query was prepared to retrieve list of objects in the neighbourhood along with their categories. As a source of data for our queries we have used LinkedGeoData which is a derivative of OpenStreetMap. Two main categories of objects are distinguished therein: nodes (just a point according to GIS termi-nology) and ways (lines or polygons). Separate queries for nodes and ways had to be prepared because the Virtuoso’s built-in distance function has different be-haviour for nodes and ways. As there were ca. 8000 locations, two kinds of ob-jects, two means for object capturing (bounding box and circle) and 3 various distances, we had to post ca. 80 thousand queries.

Below sample SPARQL query is presented:

PREFIX lgdm:<http://linkedgeodata.org/meta/> PREFIX geom:<http://geovocab.org/geometry#> PREFIX ogc:<http://www.opengis.net/ont/geosparql#> SELECT distinct ?class ?way WHERE { ?way a lgdm:Way . ?way a ?class . ?way geom:geometry [ogc:asWKT ?geo ] . filter(bif:st_within( ?geo,bif:st_point(%f,%f),%f )) }

Listing 1. Sample SPARQL query using geospatial functions

We have decided to use LGD’s endpoint instead of OSM’s API as it was possible to use Virtuoso built-in SPARQL functions for spatial queries. For the retrieval of nodes, it was possible to provide detailed query and the radius was in fact expressed in kilometres. For example, Figure 1a presents various venues lo-cated in a circle of 1 km diameter.

Retrieval of ways was more complicated. Function bif:st_within was not returning what it was expected for Way objects when the other parameter was a point. Several methods to get satisfactory results have been tested, includ-ing generation of boxes to simulate containment (see Fig. 1b and 1c). Finally, the circle overlap after toleration parameter tuning to 0.01 has been chosen as a method to query for neighbouring ways (Fig. 1d).

Page 7: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

F

3

atiT

ct(u

(a)

(c)

Fig.

3.2

andtheiinteThe

catition(0.0user

) no

) wa

. 1.

. A

Vad quir nerese Fi

Aion.n an06).r pr

des

ays,

Vartion

Ana

Variouantneigstingg. 2

Anot. Gend Surofi

, cir

box

riounal

lys

ous tita

ghbog to2b stherene1.2uch les.

rcle

x, 5

us vFai

sis

aspativeourho obshor aseral reasy

.

e, 1k

km

venur

pece. Fhoobserws pecfinstauym

L

km d

ues s

ts oFor od. rve a c

ct anndinuran

mme

Link

diam

sele

of loexThtha

closnaly

ngs nts;try

ked

mete

ecte

ocaampe siat ther lyseare; onne

geo

er

d ba

ationpleize he clooked ise as n theds

odat

ased

n ch Fiof

cityk. s nufol

he os to

ta fo

d on

harg. 2the

y mo

umbllowoth be

for p

n ob

ract2a e ciost

ber ws: er e

e ad

profi

(b

(d)

bjec

erisshorclepac

of on end

ddre

filing

b) w

) wa

ct ty

sticowse spcke

venav

d aresse

g of

ways

ays,

ype

s cas Bpecied w

nueerare ued w

f tel

, bo

circ

and

an bTS ifie

with

es oge univwhe

co u

ox, 1

cle,

d dis

be asta

es thh ho

f githeversen b

user

1km

dis

stan

anaatiohe notel

ivenre asitiebui

rs

m

stanc

nce,

alysons numls in

n caare es (ildin

ce 0

nea

sed,tha

mben Po

ateg4.0

(0.0ng

0.01

ar P

boat her oolan

gory0 sh04) the

1 Pozn

oth haveof hnd i

y pehops

an loc

nań

quae hohoteis G

er Bs pe

nd ccati

Inte

alitaotel

els. Gda

BTSer lcineion

20

erna

ativls iIt i

ansk

S lolocaema

an

5

a-

ve in is k.

o-a-as nd

Page 8: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

2

F

raaOhamst

app

206

(a)

Fig.

resua coandObjhena wmarsentthe

a typrofpark

6

Pol

. 2. N

Aults ompd GIject

nce way rkedted moA

ypicfileking

land

Num

Althopr

prehIS tts lithein

d aas

ost bAftercal e is gs a

d

mbe

ougrovihentypike ey a alms a nod

balar exBTpre

and

er o

gh nidednsive apar

are mono

desancxperTS lesend 15

f ho

nodd bve loand rkinmost

ode. aned rimlocanted5.3%

otels

des y tocathe

ng oore 110. On

nd oenti

mentatiod in% le

s lo

andthesatione Opor lpop

0 thn th

onlyity:ts won, n theisu

cate

d wse tn prpeneisupulhouhe

y 4 24

we cinc

he Fure-

ed w

waystworofinStrure lar usanothas a

4899conccludFig. -rela

K

with

s wo tyile. reetare

as wnds her a w9 wclud

ding3.

ated

Krzy

hin B

wereypes

ThtMaeas waycas

endway. waysdedg inOn

d.

yszto

(b)

BTS

e dis ofhereap careys. ses;d ar

Ths vsd thanputn av

of W

Gd

S sta

istinf obe is come be

Fo; onre Ahe ss. 2at itt frvera

Węc

dańs

ation

ngubjeca s

mmuetteror enly

ATMsam887t is omage,

cel

k

ns

uishcts tronunitr re

examy 3.Ms

me ap71 nuse

m bo, 22

ed hav

ng pty heprempl5 t– opplnodefuloth 2.1%

dueve prehas esenle, pthouoverlies des.l tonod

% o

e toto feread

ntedparusanr 6for

predesof o

o tebe enc

daptd byrkinnds500r tra

epa an

obje

echnme

ce bted y sh

ng is pa0 enam

are cnd wects

nicaergebetw

cerhows rearkintitisto

chaways wi

al lied tweenrtainwingepring ies

ops.

aracys. ithi

imito pn cn pg thesearareSh

teriComn B

itatiproateg

pattehe ante

rease rehops

isticmb

BTS

ionsvidgorernsaread a

s arepres ar

cs oine

S ar

s, de ry s. a, as re e-re

of ed re

Page 9: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

F 4 4

fssok

pTd

d

w

Fig.

4. P

4.1

ficiesolusultof lkind

posTF-doc

deci

whe

. 3.

Pro

. T

Sient ute ts wlocad of

Ine t-IDFcum

TFided

ere

Ave

ofile

F-I

impto val

wouatiof ob

n orto uF is

mentF cad to

ni,j

erag

es o

IDF

ple corluesld b

ons. bjecrderuses act frean b

o use

is n

ge d

of l

F-in

aggrrecs. Fbe b

In cts mr toe Tctuaeqube ce re

num

distr

loca

nsp

gregctly For biasfac

mor all

TF-Ially

uenccalcelati

mbe

L

ribut

atio

pire

gatiproexa

sed ct, wre oleviIDFy a pcy (culaive f

r of

Link

tion

ons

ed

ion ofilampif w

we ofteiate

F wpro

(IDFated freq

f tim

ked

n of

s an

me

of e lople,we are

en the theweigoducF).

d in quen

mes

geo

f obj

nd

eth

geoocat as had

e inhane efghtict o

diffncy

s th

odat

jects

us

od

ogrationthe

d nonteren anffecing of tw

fferey (w

at t

ta fo

s am

ers

for

aphns. Rere ot ineste

n avct o

scwo

ent wwhic

the t

for p

mon

s

r lo

hicaRelare

nclued iveraof uchem

sta

waych is

term

profi

ng p

oca

l calative mudein i

age unevma atist

ys, es no

m ti

filing

rede

atio

ategve v

muched thinfouse

venkn

tics

e.g.orma

i occ

g of

efin

on p

gorivaluh mhis

ormer. n disnows: te

. boalis

cur

f tel

ned

pro

ies aues

morecor

matio

stribwn erm

ooleed t

rred

co u

30 c

ofili

assiare

e shrrecon,

butfro

m fre

ean oto 1

d in

user

cate

ing

igne mhopctioif g

tionm equ

or r1.0),

doc

rs

egor

g

ned moreps thon ingiv

n ofinf

uenc

raw, ex

cum

ries

to pe imhann then

f caformcy (

w frexpre

men

of o

placmpon libhe cuse

tegmati(TF

equeesse

nt dj

obje

ces ortanbrarcharers

orieion

F) an

encyd as

j.

ects

is nnt thries ractvisi

es wn rend

y. Ws fo

s

not hanthe

teriit s

we etrieinv

We hollow

20

sufn abe resticom

proevavers

havws:

7

f-b-e-cs

me

o-al. se

ve

Page 10: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

2

n

wb

oPnTs

F T

208

nati

wheber

of tPitcnot TF-stric

Fig.

Tab

8

IDion

ere of A

the ch –ver

-IDFcted

. 4.

ble 1

DF ipow

|D|doc

An emo

– isry uF md lis

Per

1. S

is a wer

is cumentryost ps 4.2usefmethst o

rcen

amp

a meas

a nmeny topop234ful hod

of 3

ntage

ple

easuthey

numts c

o IDpula4. Cto

d ar0 ob

e of

pro

ure y ca

mbercontDF car cCondist

re gbje

f loc

file

of tan b

r oftaincalcateg

ncluting

givects.

catio

s of

termbett

f doningculagor

usionguisen in.

ons

f BT

m spter c

ocug thatiory –n: ssh ln th

con

TS l

K

pecichar

umehe teon i– Shsholocahe T

ntain

loca

Krzy

ificiract

entserms prhopps, atioTab

ning

ation

yszto

ity –teris

in m ti.

resep – i

preons. ble

g ve

ns c

of W

– lese a

cor

enteis 0esenSa

1. T

enue

calcu

Węc

ess fa do

rpu

ed i0.76nt i

ampThe

es (n

ulat

cel

freqocum

us, a

in F66 ain aple rey h

nod

ted w

quenmen

and

Fig.andalmresu

have

des)

with

nt tnt. I

d de

4. d forost ultse be

of g

h TF

ermt is

enom

TFr leha

s obeen

give

F-ID

ms hcal

min

-IDeast lf o

btainn bu

en c

DF

havelcula

nato

DF fpo

of thned

uilt

categ

e biated

or c

factopulhe

d frobas

gory

igged as

cont

tor ilar clocom sed

y

er ds fol

tain

in tcateatiotheon

discrllow

ns n

the egoonse ab

n the

rimiws:

num

casory , arbove re

i-

m-

se –

re ve e-

Page 11: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

naSImqi5tT

mnbt

F 4

t1

4

neigare SomIn bmosqueinte5 mthosThi

meaneigbluethe

(a)

Fig.

4.2

tain10 B

4 Pl

ri

Thghbnod

me lboldst c

ent (erestmonuse cs is

Fiasurghbe – wh

Pol

. 5.

. U

FoningBTS

leasod c

heybourdes locad wchar(loctingumcate to ig. re.

bourhis

hole

land

Mo

User

or eg daS lo

e no

can b

y prrhoo

(poatio

we hractcatiog toents

egorsom5 pEa

rhostore Po

d

ost p

r pr

evalata focat

ote thbe c

rofiod ointns hhavterison io os anries me epresch od.

ricaolan

popu

rof

luatfor tion

hat t

consi

filesof tts), hav

ve mstic id 3bsend 8wi

extesentBT Th

al obnd (

ular

filin

tion3 m

ns a

this ider

L

inthe inc

ve jumark

for32),erve8 bth sent ts a

TS lhe mbjec(Fig

r ann

ng

n, wmonand

is il

red, a

Link

n thloc

cludust kedr a , thee thanksmajus

a vilocamoscts,

g. 5a

nota

we hnthsthe

llustany

ked

he catiodingone

d thgiv

e onhat ks. Wallertifieisuaatiost im oraa) a

ation

haves. Fen r

tratioothe

geo

Tabon g 2 e obhe cvenne w13 Weir nued –alisaon impangand

ns o

e usrom

rand

on oer se

odat

ble id 3bu

bjeccaten locwith

shightumb– suatiois aorta

ge –Gd

of B

sedm thdom

of thet of

ta fo

1 34 t

us stct (eegorcatih thopsts cber uch on oannoant – scdańs

BTS

d thehis dmly

he mf obj

for p

shothertop

e.g. ry wion.he hs incan of ccat

of totatobj

choosk (

loc

e dadatasel

methoject

profi

oulre as, 1id 3

with. W

highn lo

be catetegotop ted

bjecols,(Fig

(b)

catio

atababalect

od antype

filing

d bare 1 pl36)h th

Whenhest ocat

comegoorie

clawi

ts a, grg. 5

Gda

ons

basease wted u

nd ses ca

g of

be 4 glace, othe hn mt IDtionmparies

es classeith are reenb) a

ańsk

e wwe use

somean b

f tel

integeoge ofther high

manDF mn idareds geleares rthecod

n –are

k

with hav

ers4.

e debe us

co u

erprgrapf wor havhest

ny cmakd 41d beet hirly pranke todedpubatta

a sve q. Th

ecisised.

user

retephicorshve mt mcatekes 1 aretwighproked

op-rd asblicach

amquehe c

ons

rs

ed cal hipman

measegortopre

weener w

ofiled acranks foc trahed.

mpleeriedcalc

are

as obj, anny msureries

p of lessn loweige theccorked ollowansp.

d 1d ucula

arbi

foljectnd 1more, th are

f thes imocatghte lordinca

ws:por

0.0sersatio

itrar

llowts, a1 fure (huse eqe ranmpotions (c

ocatng tateg red

rt. F

000 s wn o

ry. A

ws: all ouel (e.gs bequankinortans, bcasetionto T

goryd –Figu

usewith of th

Any

in of tsta. id eingally ng.

ant but e id n. TF-y in– shures

ers at l

he c

othe

20

ththemation

41g th

freIt ithathe36)

-IDFn thhopss fo

conleaschar

er pe

9

he m n. ).

he e-is

an en ).

F he s, or

n-st r-

e-

Page 12: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

2

aanv

F

bwhvis

210

acteas anumvisu

Fig.

bettwithhavvaluin Fshou

0

erisa wmbeuali

. 6.

Foter eh th

ve nue oFig.uld

ticsweiger oise p

Geo

or cemphis knormof g 7.

d fur

s ofghtef copro

ogra

comphakindmalgive

Grrthe

f usd sonnfile

aphi

mparasizd oliseden orouper im

sersum

nectes o

ical

risoze df chd thobjepingmpr

s ism oftionof us

Lin

on odiffeharthe vect g orov

s raf prons insers

nked

of uerents isvalutypf ce

ve th

atheofilnitias as

d D

userncess thues

pe inertahe r

r stles ateds a p

Data-

r prs an

hat as in n prain read

K

traiof vd wpie

-Ba

rofilnd saxes

chroficatedab

Krzy

ghtvisi

with cha

sed

les sims sh

hartsile hegoility

yszto

tforiteda g

art.

pro

mumilarhous inhasoriey of

of W

rward BTgiveFig

ofile

uch ritie

uld hn su vas orf th

Węc

rd. TS

en Bg. 6

e of

moes bhavuch aluer ev

he ch

cel

Thloc

BTS pre

f the

ore betwve th

a w 1.0venhar

he pcatiS. Sesen

e sam

suitweehe sway0. Tn redrt.

profons

Simnts

mpl

tabln usamy, t

The duc

file s, wilarpro

e us

le iuserme mthatsam

cing

of wherrly tofile

sers

s a s. O

meat usme g nu

f a re tto les f

s

radOneasurser use

umb

usethe ocafor

dar e of re. Twit

ers ber

er iswe

atiosam

chaf theTheth tare of

s preighns,

mple

art te prerefthe

comcat

repht iswe

e us

thatroblforehigmptego

ares the casers

t caleme, wghesareorie

ed he an s.

an ms we

st ed es

Page 13: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

F C

Lueio

pJtbtttca

Fig.

Con

Linusererating or r

profJordtionblesthatto pthe catean e

. 7.

ncl

Inkedrs. Ttorsany

reveSo

filedann is s out areprephie

egorexte

Com

lusi

n thdGeThes. Ny buenuo fa

e. Inn, 20

mour ae pr

pareerarriesensi

mpa

ion

his eoDe m

Neveusine esar wn th012odelapprove aprchys whion

ariso

ns a

paDatameth

erthnesstimwe e fu

2). Illed

proavidepproy ofhenof

on o

and

apea anhod heles re

matihav

uturIt isd asach ed boprif can anthe

L

of u

d fu

er wnd a

hasess, equiionve re ws a gs a f

forby liateategnnote me

Link

user

utur

we alsos be theirin. app

we pgenfinir proca

e migoritatietho

ked

pro

re w

ha theeene el

ng p

plieplan

neraite mrofiatioixtues. ng od t

geo

ofile

wo

avee TF

n aplaboprof

d an toativemixilingns. ure.Asspeto h

odat

es on

ork

e cF-ID

pplieoratfile

a simo use prxturg: uWh Th of

ecifhier

ta fo

n ra

ontDF-ed ited of

mpse Lrobre ouserhat he mf nofic vrarc

for p

adar

tribu-bain tmenei

ple aLateabi

overrs awe

methw Lvenuchic

profi

cha

uteasedhe pethoighb

appent listr anare e nehodLinkue

cal L

filing

arts

d d meparod cbou

proaDirtic mn un

moeed d caked(e.gLDA

g of

– re

theethorticucan

urho

ach richmodnderodeto

an adGeg. bA (h

f tel

estr

mod fularn beood

forhlet del rlyi

elleddet

alsoeoDbothhLD

co u

ricte

methfor r see us, e.

r agAllwh

ing d astermo beDatah shDA

user

ed to

hodpro

ettinsed g. f

ggrlocahere

sets a

minee ima cohop A).

rs

o tw

d foofilings

unfor m

regaatioe eat of

mie ar

mproontaand

wo u

for ing of

nivemar

atioon (ach f topixture wovedainsd am

users

exof mo

ersarke

n toLDitem

picsure weigd b

s allmen

s

ploloc

obilelly

eting

o oDA) m os. Tof ghty cl intnity

oitatcatioe tefor

g pu

obta(B

of a This

catts alonsterm

y). T

tiononselcor prurp

ain lei,cores

tegollowsidemedThe

21

n os ano oprofilose

use Ngllecsemoriewinerindiatere i

1

of nd p-l-es

er g, c-

m-es ng ng te is

Page 14: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Krzysztof Węcel

212

References Abel F., Hauff C., Houben G.-J., Tao K. (2012), Leveraging User Modeling on the So-

cial Web with Linked Sata [in:] Web Engineering, Springer-Verlag, Berlin-Heidelberg, pp. 378-385.

Auer S., Lehmann J., Hellmann S. (2009), LinkedGeoData: Adding a Spatial Dimension to the Web of Data, ISWC 2009, Vol. 5823, Springer, Heidelberg, pp. 731-746.

Baccelli F., Bolot J. (2011), Modeling the Economic Value of the Location Data of Mo-bile Users, INFOCOM, IEEE, pp. 1467-1475.

Blei D.M., Ng A.Y., Jordan M.I. (2012), Latent Dirichlet Allocation, “Journal of Machine Learning Research”, Vol. 3(4-5), pp. 993-1022, doi:10.1162/jmlr.2003.3.4-5.993.

Cano A.E., Dadzie A.-S., Burel G., Ciravegna F. (2013), Topica-Profiling Locations through Social Streams. Semantic Technology, Springer-Verlag, Berlin-Heidelberg, pp. 290-305.

Chen F., Joshi D., Miura Y., Ohkuma T. (2014), Social Media-based Profiling of Busi-ness Locations, Proceedings of the 3rd ACM Multimedia Workshop on Geotagging and Its Applications in Multimedia, Orlando, FL, pp. 1-6.

Fawcett T. (2006), An Introduction to ROC Analysis, “Pattern Recognition Letters”, Vol. 27(8), pp. 861-874, doi:10.1016/j.patrec.2005.10.010.

Görnerup O. (2012), Scalable Mining of Common Routes in Mobile Communication Network Traffic Data [in:] J. Kay, P. Lukowicz, H. Tokuda, P. Olivier, A. Krüger (eds.), “Pervasive Computing”, Vol. 7319, Springer-Verlag London, pp. 99-106, doi:10.1007/978-3-642-31205-2_7.

Liu F., Janssens D., Cui J., Wang Y., Wets G., Cools M. (2014), Building a Validation Measure for Activity-based Transportation Models Based on Mobile Phone Data, “Expert Systems with Applications”, Vol. 41(14), pp. 6174-6189, doi: 10.1016/ j.eswa.2014.03.054.

Liu F., Janssens D., Wets G., Cools M. (2013), Annotating Mobile Phone Location Data with Activity Purposes Using Machine Learning Algorithms, “Expert Systems with Applications”, Vol. 40(8), pp. 3299-3311. doi:10.1016/j.eswa.2012.12.100.

Ostuni V.C., Gentile G., Di Noia T., Mirizzi R., Romito D., Di Sciascio E. (2013), Mobile Movie Recommendations with Linked Data [in:] Availability, Reliability, and Security in Information Systems and HCI, Springer, Berlin-Heidelberg, pp. 400-415.

Qu Y., Zhang J. (2013), Trade Area Analysis Using User Generated Mobile Location Data, Proceedings of the 22nd International Conference on World Wide Web, Re-public and Canton of Geneva, Switzerland, International World Wide Web Confer-ences Steering Committee, pp. 1053-1064, http://dl.acm.org/citation.cfm?id= 2488388.2488480 (accessed: 30.08.2015).

Ruotsalo T., Haav K., Stoyanov A., Roche S., Fani E., Deliai R., Mäkelä E., Kauppinen T., Hyvönen E. (2013), SMARTMUSEUM: A Mobile Recommender System for the Web of Data. “Web Semantics: Science, Services and Agents on the World Wide Web”, Vol. 20(0), pp. 50-67, doi:10.1016/j.websem.2013.03.001.

Song C., Qu Z., Blumm N., Barabási A.-L. (2010), Limits of Predictability in Human Mobility, “Science”, Vol. 327(5968), pp. 1018-1021.

Page 15: LINKED GEODATA FOR PROFILING OF TELCO USERS · 2016-02-02 · Linked geodata for profiling of telco users 201 In many studies in order to enrich and disambiguate information gathered

Linked geodata for profiling of telco users

213

POWIĄZANE GEODANE DLA PROFILOWANIA UŻYTKOWNIKÓW TELCO

Streszczenie: Obserwuje się rosnące zainteresowanie geograficznym profilowaniem użytkowników, rozumianym jako łączenie danych geograficznych z anonimowymi pro-filami użytkowników. Profil jednostki zazwyczaj składa się z pojęć geograficznych oznaczonych wagami, odzwierciedlającymi względną ważność poszczególnych pojęć dla odróżniania użytkowników. Proponowana metoda profilowania użytkowników sieci komórkowych jest dwuetapowa. W pierwszej kolejności tworzone są profile stacji prze-kaźnikowych (BTS) na podstawie społecznie dostarczonych informacji geograficznych. Następnie te profile są wykorzystywane do uogólnienia zachowania użytkownika, wyni-kającego z analizy logów jego połączeń (CDR). Chmura danych powiązanych (linked data) jest wykorzystywana jako dodatkowe źródło wiedzy w procesie modelowania użytkownika. Słowa kluczowe: dane powiązane, profilowanie użytkownika, powiązane geodane, logi połączeń, użytkownik mobilny, telco, cdr, bts, lgd, osm.