The Semantic Core of the Runet


2011

XII
CEE-SECR / 28 - 29


28.10.2016


Content-based Amazon Kinesis/Lucene


Best-sellers

-

Mining of Massive Datasets, 9.1.2: Leskovec, Rajaraman, Ullman (Stanford University)

/

/, (User-User)

(Item-Item):

/

( )

// ()

Apache Spark MLlib (ALS), Apache Mahout (Taste) + ,
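The item-item variant of collaborative filtering named above can be illustrated with a minimal sketch. It is plain Python, not the Spark MLlib or Mahout API, and it assumes implicit feedback stored as a hypothetical `{user: {item: rating}}` dictionary with cosine similarity between item vectors; the function names and the sample data are illustrative only.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between items, given {user: {item: rating}}."""
    # Invert the data to item -> {user: rating}.
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating

    norms = {
        item: sqrt(sum(r * r for r in users.values()))
        for item, users in by_item.items()
    }

    sims = defaultdict(dict)
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            common = by_item[a].keys() & by_item[b].keys()
            if not common:
                continue  # no shared users, so no evidence of similarity
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sims[a][b] = dot / (norms[a] * norms[b])
    return sims


def recommend(ratings, user, top_n=3):
    """Rank unseen items by similarity to the items the user already has."""
    sims = item_similarities(ratings)
    seen = ratings[user]
    scores = defaultdict(float)
    for item, rating in seen.items():
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim * rating
    # Highest score first; ties broken by item name for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical purchase data: 1 means "bought".
ratings = {
    "alice": {"camera": 1, "tripod": 1},
    "bob": {"camera": 1, "tripod": 1, "lens": 1},
    "carol": {"camera": 1, "lens": 1},
    "dave": {"kettle": 1},
}
print(recommend(ratings, "alice"))
```

The pairwise loop is quadratic in the number of items, which is acceptable for a sketch; at production scale a matrix-factorization approach such as ALS is the usual substitute.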

/

Content-based

.

Toyota, , Toyota

: Sphinx, Lucene (Solr)

: . , .
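The content-based idea (a visitor who read about Toyota is shown other items that mention Toyota) can be sketched without a search engine. The code below is an assumption-laden stand-in for what Sphinx or Lucene would do: it builds simple TF-IDF vectors in memory and ranks documents by cosine similarity to the text the user has viewed. The function names, the smoothing formula, and the sample catalogue are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer; a real engine would also stem."""
    return [w for w in text.lower().split() if w.isalnum()]


def build_index(docs):
    """Return ({doc_id: tf-idf vector}, idf table) for {doc_id: text}."""
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    n = len(docs)
    # Smoothed idf, so a term present in every document keeps a positive weight.
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vectors = {
        doc_id: {w: c * idf[w] for w, c in tf.items()}
        for doc_id, tf in tfs.items()
    }
    return vectors, idf


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_by_content(profile_text, docs, top_n=2):
    """Rank documents by similarity to the text the user has viewed."""
    vectors, idf = build_index(docs)
    # Words unknown to the index get zero weight.
    profile = {
        w: c * idf.get(w, 0.0)
        for w, c in Counter(tokenize(profile_text)).items()
    }
    scored = [(cosine(profile, vectors[d]), d) for d in docs]
    ranked = sorted((s, d) for s, d in scored if s > 0)
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [d for _, d in ranked][:top_n]


# Hypothetical catalogue and viewing history.
docs = {
    "p1": "toyota corolla sedan",
    "p2": "toyota floor mats",
    "p3": "electric kettle",
}
print(recommend_by_content("toyota camry review", docs))
```

A full-text engine provides the same ranking through an inverted index, which avoids scanning every document on each request.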

/

(, , )

/

, , ,

/


/

- RabbitMQ

http://www.rabbitmq.com

AMQP

Erlang

/

- Apache Kafka

http://kafka.apache.org/

LinkedIn

!

Scala

/

- Apache Storm

http://storm.apache.org

Task parallel

, workflow

Clojure/JVM

/

- Pinba

http://pinba.org

. MySQL PHP

,

Badoo.com

/

Apache Hadoop

:

- (MapReduce)

- (HDFS)

- SQL- (Hive)

/

Apache Spark

!

/

MapReduce

Mining of Massive Datasets: Leskovec, Rajaraman, Ullman (Stanford University)
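As a reminder of the MapReduce model, here is a minimal single-process sketch of the three phases (map, shuffle, reduce) applied to word counting. It runs in plain Python and does not use the Hadoop API; the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(word_count(["spark hadoop", "Hadoop hive"]))
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network, which is typically the expensive step.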

/

Apache Spark

/

Online , !

SQL ;-)

Mining of Massive Datasets: Leskovec, Rajaraman, Ullman (Stanford University)

/

SQL on MapReduce: Hive, Pig, Spark SQL

SQL on MPP (massively parallel processing):

Impala, Presto, Amazon Redshift, Vertica

NoSQL: Cassandra, HBase, Amazon DynamoDB

: MySQL, MS SQL, Oracle,

/

Lambda-
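A toy sketch of the Lambda architecture, under the assumption that the quantity of interest is a per-item view counter: an append-only master log feeds a batch layer that periodically recomputes a complete view, a speed layer keeps incremental counts for events that arrived since the last batch run, and a query merges the two. The class and method names are hypothetical.

```python
from collections import Counter


class LambdaViews:
    def __init__(self):
        self.master_log = []         # append-only log of raw events
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, item_id):
        """New event: store it and update the real-time view."""
        self.master_log.append(item_id)
        self.speed_view[item_id] += 1

    def run_batch(self):
        """Batch layer: rebuild the full view, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, item_id):
        """Serving layer: merge the batch and real-time results."""
        return self.batch_view[item_id] + self.speed_view[item_id]


views = LambdaViews()
views.ingest("a")
views.ingest("a")
views.run_batch()
views.ingest("a")
views.ingest("b")
print(views.query("a"), views.query("b"))
```

Because the batch view is always rebuilt from the raw log, an error in the incremental logic is corrected on the next batch run.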

/

BigData . .

ID

ID


/

BigData . .

~1000 /

bitrix.info

,

Batch

On-line

analytics.bitrix.info

/

Amazon DynamoDB

nginx+Lua

Amazon Kinesis

BigData . .

~1000 /

bitrix.info

workers cluster

worker (PHP) × 8

~100 /

/

Amazon DynamoDB

BigData . , , .

Apache Spark

(spot) × 4

Amazon S3

Apache Tomcat

Apache Mahout

analytics.bitrix.info

/

Apache Lucene

Doug Cutting: Nutch, Hadoop (Yahoo!) Cloudera

Lucene: Solr, Elasticsearch

Lucene:

/

Apache Lucene: +/-

(-)

(-)

(-)

(-) 100%

(+) ()

(+) API

(+)

(+) Thread-safety

/

Redis

:

(word2vec, GloVe, ...)

/

Amazon Kinesis

Java indexing workers (16)

~1000 /

content-based

Index (disk)

Index (disk)

Redis (profiles)

Servlet

/

, java/lucene

Amazon Kinesis

,

Servlet

/

: -

: , 100

: , h1

: , ,

: ,

/

/


Thank you! Questions?
