2011
XII
CEE-SECR / 28 - 29
28.10.2016
Content-based recommendations with Amazon Kinesis/Lucene
/
Best-sellers
Mining of Massive Datasets, 9.1.2: Leskovec, Rajaraman, Ullman (Stanford University)
/
Collaborative filtering: user-based (User-User) and item-based (Item-Item)
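As a rough illustration of the item-based (Item-Item) variant, the sketch below computes cosine similarity between items from a small user-to-ratings dictionary and scores unseen items for a user. The function names `item_item_similarity` and `recommend` and the data layout are assumptions made for this example; the slides do not show the author's implementation.

```python
from collections import defaultdict
from math import sqrt


def item_item_similarity(ratings):
    """Cosine similarity between items.

    ratings: {user: {item: rating}} (hypothetical layout for this sketch).
    Returns {(item_a, item_b): similarity} for items sharing at least one user.
    """
    # Invert the data: item -> {user: rating}
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating

    norms = {i: sqrt(sum(r * r for r in users.values()))
             for i, users in by_item.items()}

    sims = {}
    items = sorted(by_item)
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            common = by_item[a].keys() & by_item[b].keys()
            if not common or norms[a] == 0 or norms[b] == 0:
                continue
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sims[(a, b)] = sims[(b, a)] = dot / (norms[a] * norms[b])
    return sims


def recommend(ratings, sims, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    seen = ratings[user]
    scores = defaultdict(float)
    for (a, b), sim in sims.items():
        if a in seen and b not in seen:
            scores[b] += sim * seen[a]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

The all-pairs loop is quadratic in the number of items, which is one reason large catalogs usually move this computation to a batch framework.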
/
Apache Spark MLlib (ALS), Apache Mahout (Taste)
/
Content-based
Toyota, , Toyota
Search engines: Sphinx, Lucene (Solr)
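To make the content-based idea concrete, here is a minimal sketch of what a full-text engine does at its core: an inverted index with TF-IDF scoring, queried with the text a user has just viewed. `TinyIndex` and `tokenize` are hypothetical names for this illustration; this is not the Sphinx or Lucene API, and real engines add analyzers, length normalization, and on-disk segments.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric terms (a deliberately naive analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """In-memory inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        # Assumes each doc_id is added once; re-adding would inflate doc_count.
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf
        self.doc_count += 1

    def search(self, query, exclude=(), top_n=5):
        """Return doc ids ranked by summed TF-IDF over the query terms."""
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                if doc_id not in exclude:
                    scores[doc_id] += tf * idf
        return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```

A page about one Toyota model used as the query would surface other Toyota pages, with the page just viewed passed in `exclude`.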
/
- RabbitMQ
http://www.rabbitmq.com
AMQP
Erlang
/
- Apache Kafka
http://kafka.apache.org/
Scala
/
- Apache Storm
http://storm.apache.org
Task parallel
workflow
Clojure/JVM
/
- Pinba
http://pinba.org
MySQL storage engine for PHP statistics
Badoo.com
/
Apache Hadoop
Components:
- computation (MapReduce)
- storage (HDFS)
- SQL-like queries (Hive)
/
Apache Spark
/
MapReduce
Mining of Massive Datasets: Leskovec, Rajaraman, Ullman (Stanford University)
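The MapReduce model referenced here can be sketched in a few lines of single-process Python: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step folds each group. The names `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical, and word counting is only the customary example, not something taken from the slides.

```python
from collections import defaultdict


def word_mapper(record):
    """Map step: emit (word, 1) for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in groups.items())
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the structure of the computation is the same.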
/
Apache Spark
/
Online , !
SQL ;-)
Mining of Massive Datasets: Leskovec, Rajaraman, Ullman (Stanford University)
/
SQL on MapReduce: Hive, Pig, Spark SQL
SQL on MPP (massively parallel processing): Impala, Presto, Amazon Redshift, Vertica
NoSQL: Cassandra, HBase, Amazon DynamoDB
RDBMS: MySQL, MS SQL, Oracle
/
Lambda architecture
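A toy sketch of the Lambda idea, under the usual reading of the term: a batch layer periodically recomputes a view from the full event log, a speed layer keeps increments for events that arrived since the last batch run, and queries merge the two. The class `LambdaCounter` and its methods are hypothetical; in a real deployment the batch recomputation runs asynchronously on a separate cluster.

```python
from collections import Counter


class LambdaCounter:
    """Counts events per key using a batch view plus a real-time view."""

    def __init__(self):
        self.master_log = []         # append-only event log (batch layer input)
        self.batch_view = Counter()  # recomputed from the whole log
        self.speed_view = Counter()  # increments since the last batch run

    def ingest(self, key):
        """Record an event in the log and update the real-time view."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from scratch and reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, key):
        """Serving layer: merge batch and real-time results."""
        return self.batch_view[key] + self.speed_view[key]
```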
/
BigData . .
ID
ID
/
BigData . .
~1000 /
bitrix.info
Batch
On-line
analytics.bitrix.info
/
Amazon DynamoDB
nginx+Lua
Amazon Kinesis
BigData . .
~1000 /
bitrix.info
workers cluster
worker (PHP) × 8, ~100 /
/
Amazon DynamoDB
BigData . , , .
Apache Spark
(spot) × 4
Amazon S3
Apache Tomcat
Apache Mahout
analytics.bitrix.info
/
Apache Lucene
Doug Cutting: Nutch, Hadoop (Yahoo!) Cloudera
Built on Lucene: Solr, Elasticsearch
Lucene:
/
Apache Lucene: +/-
(-)
(-)
(-)
(-) 100%
(+) ()
(+) API
(+)
(+) Thread-safety
/
Redis
:
(word2vec, GloVe, ...)
/
Amazon Kinesis
Java indexing workers (16)
~1000 /
content-based
Index (disk)
Index (disk)
Redis (profiles)
Servlet
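The diagram stores user profiles in Redis and serves recommendations from a servlet. One plausible way to connect the two, shown purely as an assumption, is to keep a small bag of weighted terms per user, decay it on every page view, and turn the heaviest terms into a search query against the index. `ProfileStore`, the decay factor, and the term limit are all hypothetical, and a plain dictionary stands in for Redis.

```python
import re


class ProfileStore:
    """Per-user term weights; a dict stands in for Redis in this sketch."""

    def __init__(self, decay=0.9, max_terms=5):
        self.decay = decay          # assumed: older interests fade per view
        self.max_terms = max_terms  # assumed: cap on stored terms per user
        self.profiles = {}          # user_id -> {term: weight}

    def update(self, user_id, page_text):
        """Decay existing weights, add terms from the viewed page, trim."""
        profile = dict(self.profiles.get(user_id, {}))
        for term in profile:
            profile[term] *= self.decay
        for term in re.findall(r"[a-z0-9]+", page_text.lower()):
            profile[term] = profile.get(term, 0.0) + 1.0
        top = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
        self.profiles[user_id] = dict(top[: self.max_terms])

    def query_for(self, user_id):
        """Build a search query from the user's terms, heaviest first."""
        profile = self.profiles.get(user_id, {})
        ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
        return " ".join(term for term, _ in ranked)
```

The resulting query string would then be run against the content index; with Lucene that step would go through its query API rather than the plain string shown here.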
/
, java/lucene
Amazon Kinesis
Servlet
/
: -
: , 100
: , h1
: , ,
: ,
/
Thank you! Questions?