Yandex Academic Initiatives
Павел Браславский
Academic initiatives
• School of Data Analysis
• Yandex seminars
• Internet Mathematics
• ROMIP
• School in information retrieval (RuSSIR)
• The book «Введение в информационный поиск» (Russian edition of "Introduction to Information Retrieval")
Yandex School of Data Analysis
Two-year master's program, http://shad.yandex.ru
Teachers
Scientific seminars
Monthly seminars on data analysis & information retrieval
Organized by Microsoft Research + Yandex
http://company.yandex.ru/public/seminars/schedule/
IMAT 2009
• Learning to rank
• 245 features for query-document pairs
• Graded relevance judgments (0..4)
• Pure numeric data (i.e., no original queries, documents, or feature semantics)
• Learning set: 97 290 feature vectors (9 124 queries)
• Test set: 115 643 vectors (21 103 for public evaluation; 94 540 for final evaluation)
• Evaluation measure: DCG
• http://imat2009.yandex.ru
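The slide names DCG as the evaluation measure but does not give its formula. A minimal Python sketch follows, assuming the widely used gain 2^label - 1 and discount log2(rank + 1) over the graded labels 0..4; IMAT 2009 may have used a different gain, discount, or cutoff, and the helper names `dcg` and `dcg_of_ranking` are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(labels: Sequence[int], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of graded labels listed in ranked order.

    Assumed formulation (not necessarily IMAT's exact variant):
    gain = 2**label - 1, discount = log2(rank + 1), ranks starting at 1.
    If k is given, only the top-k positions are counted.
    """
    top = list(labels) if k is None else list(labels)[:k]
    return sum((2 ** label - 1) / math.log2(rank + 1)
               for rank, label in enumerate(top, start=1))


def dcg_of_ranking(scores: Sequence[float], labels: Sequence[int],
                   k: Optional[int] = None) -> float:
    """Rank one query's documents by model score (highest first), then score that ranking.

    scores[i] is the model output for document i; labels[i] is its 0..4 judgment.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return dcg([labels[i] for i in order], k)


# Hypothetical query with three judged documents.
print(dcg([4, 0, 2]))                              # 16.5 (labels already in ranked order)
print(dcg_of_ranking([0.9, 0.1, 0.5], [4, 0, 2]))  # about 16.89: the 2-labeled doc moves up
```

Moving the 2-labeled document above the 0-labeled one raises the value, which is what a graded, position-discounted measure is meant to reward.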
IMAT 2010
• Traffic congestion prediction
• (Rough) data:
  – Modified graph of Moscow streets
  – Observed traffic speed 4-10 pm (4-min intervals) for 30 consecutive days + 4-6 pm on the 31st day
• Task: predict traffic speed 6-10 pm on the 31st day
• Public/final evaluation
• http://imat2010.yandex.ru
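The task description implies a simple time grid. The Python sketch below enumerates the 4-minute slots between 4 pm and 10 pm and splits the 31st day into the observed part (before 6 pm) and the part to predict. It assumes slots start on the hour and are identified by their start time in minutes since midnight; the actual encoding of `<time>` in the IMAT 2010 files may differ, and all constant and function names here are hypothetical.

```python
SLOT_MINUTES = 4     # 4-minute observation intervals
START = 16 * 60      # 4 pm, in minutes since midnight
CUTOFF = 18 * 60     # 6 pm: the 31st day is observed before this, predicted from here on
END = 22 * 60        # 10 pm (exclusive)


def slot_starts(start: int = START, end: int = END, step: int = SLOT_MINUTES) -> list:
    """Start times (minutes since midnight) of all slots in [start, end)."""
    return list(range(start, end, step))


def split_last_day() -> tuple:
    """Split the 31st day's slots into (observed, to_predict) around CUTOFF."""
    slots = slot_starts()
    observed = [s for s in slots if s < CUTOFF]
    to_predict = [s for s in slots if s >= CUTOFF]
    return observed, to_predict


def hhmm(minutes: int) -> str:
    """Format minutes since midnight as HH:MM."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


observed, to_predict = split_last_day()
print(len(slot_starts()), len(observed), len(to_predict))  # 90 30 60
print(hhmm(to_predict[0]), hhmm(to_predict[-1]))            # 18:00 21:56
```

Under this assumption each learning day spans 90 slots, and the prediction window on the 31st day covers the last 60 of them.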
Modified graph of streets
IMAT 2010 Data
• Graph: vertices (139 241/33 029) and edges (206 260/86 249)
  – <id_vertex> <id_group>
  – <id_edge> <id_edge_group> <start_vert> <end_vert>
  – <id_edge_group> <length> <avg_speed>
• Observations (learning set, 29 226 208 lines)
  – <id_edge_group> <day> <time> <speed>
• Task (691 641 lines)
  – <id_edge_group> <day> <time> ??
• Evaluation
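The record layouts above are enough for a naive baseline. The Python sketch below is hypothetical and is not the competition's reference solution: it averages the speeds observed for each (edge group, time) pair over the learning days and falls back to the edge group's `<avg_speed>` from the graph file when no observation exists. It assumes whitespace-separated fields, that the edge-group records are read on their own, and treats `<time>` as an opaque key; it does not implement the evaluation, whose measure the slide does not specify.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (id_edge_group, time)


def read_observations(lines: Iterable[str]) -> Dict[Key, List[float]]:
    """Collect speeds per (edge group, time) from '<id_edge_group> <day> <time> <speed>' lines."""
    history: Dict[Key, List[float]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        group, _day, t, speed = parts
        history[(group, t)].append(float(speed))
    return history


def read_group_speeds(lines: Iterable[str]) -> Dict[str, float]:
    """Map edge group -> avg_speed from '<id_edge_group> <length> <avg_speed>' lines."""
    speeds: Dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            group, _length, avg_speed = parts
            speeds[group] = float(avg_speed)
    return speeds


def predict(task_lines: Iterable[str], history: Dict[Key, List[float]],
            group_speeds: Dict[str, float]) -> Iterator[Tuple[str, str, str, float]]:
    """Yield (group, day, time, speed) for each '<id_edge_group> <day> <time> ??' line.

    Uses the historical mean for that group and time if any exists, otherwise the
    group's avg_speed; NaN marks groups with no data at all.
    """
    for line in task_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        group, day, t = parts[:3]  # ignore the '??' placeholder if present
        past = history.get((group, t))  # .get does not add keys to the defaultdict
        if past:
            yield group, day, t, mean(past)
        else:
            yield group, day, t, group_speeds.get(group, float("nan"))


# Hypothetical toy input; real files follow the layouts listed above.
history = read_observations(["g1 1 16:00 40", "g1 2 16:00 50", "g2 1 16:00 30"])
speeds = read_group_speeds(["g1 120.5 45", "g3 80 25"])
for row in predict(["g1 31 16:00 ??", "g3 31 16:00 ??", "g2 31 18:00 ??"], history, speeds):
    print(row)
```

A per-slot historical mean ignores trends within the 31st evening; it only serves as a reference point that any learned model should beat.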
IMAT 2011
Competition starts in February 2011
An interesting task, and a prize for the winner ☺
ROMIP
• TREC-like Russian initiative
• Started in 2002
• Several text and image collections
• 10-15 participants per year (50+ in total)
• Academia and industry; support for students
• ~3 000 man‐hours of evaluation (2009)
• Remote participation + live meeting
• Collections are freely available
• Popular testbed for IR research in Russia
ROMIP: largest text collections
Collection | Documents | Size (compressed) | Topics | Evaluated within ad-hoc search track
Legal | ~300 000 | 2 GB | 14 794 | 220
By.Web | 1 524 676 | 8 GB | ~60 000 | 1 500+
KM.RU | 3 010 455 | 13 GB | ~60 000 | ~250
Image collections
• Photo collection: 20 000 images from Flickr
• Dups collection: 15 hours of video, 37 800 frames
RuSSIR
• Yekaterinburg, 5-12 September 2007: http://romip.ru/russir2007
• Taganrog, 1-5 September 2008: http://romip.ru/russir2008/
• Petrozavodsk, 11-16 September 2009: http://romip.ru/russir2009/
• Voronezh, 13-18 September 2010: http://romip.ru/russir2010/
• Saint Petersburg, 15-19 August 2011: http://romip.ru/edbt-russir2011/
RuSSIR
• Annual event
• 100+ participants
• 4th RuSSIR: Voronezh, 13-18 September 2010
• http://romip.ru/russir2010/
Information retrieval in Russian
Original English version: http://informationretrieval.org
Павел Браславский, pb@yandex-team.ru