, . : , , , , . : BigData ;: .
: Rack 10 Units
. , 7 ( !). , . , .. , , . , - , , .
7 Blade: GPU SuperBlade SBI-7127RG; 2 CPU Intel Xeon E5-2650; 32 GB RAM; 2x Tesla M2075 6 GB RAM; InfiniBand 4x QDR (40Gbps); Network 2x Gigabit Ethernet
1 Blade: GPU SuperBlade SBI-7127RG; 2 CPU Intel Xeon E5606; 24 GB RAM; 2x SSD 80 GB; 4x HDD 300 GB; InfiniBand 4x QDR (40Gbps); Network 2x Gigabit Ethernet
, HPC , .. ...
1. Intel Xeon processor E5-2600 family; QPI up to 8.0 GT/s
2. Intel C602 Chipset
3. Up to 256GB RDIMM or 64GB UDIMM; 8x DIMM slots
4. Intel i350 Dual port Gigabit Ethernet
5. 4x QDR (40Gb) InfiniBand or 10GbE mezzanine HCA
6. IPMI 2.0, KVM over IP, Virtual Media
7. 1x SATA DOM up to 64GB
8. Integrated Matrox G200eW Graphics
Infiniband 40Gbit, IPv4 (.. IPoverIB) 10Gbit. , K-means .
CPU Intel Xeon E5-2650
Intel Xeon E5-2600
Sandy Bridge
2012
8
2
2000
2800 (1 or 2 cores), 2700 (3 cores), 2500 (4, 5 or 6 cores), 2400 (7 or 8 cores)
L3 20 MB
4 DDR3 channels
AVX, SSE1-4, EM64T, AES .
(double)~150 Gflops
GPU Nvidia Tesla M2075
Fermi
2011
448
1215
6
144 /
- (double)~500 Gflops
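The two double-precision figures above (~150 Gflops for the CPU, ~500 Gflops for the GPU) can be roughly cross-checked from core counts and clock rates. The sketch below is only an estimate under stated assumptions: the per-cycle FLOP counts are not given on the slides, and the value 1215 from the GPU table is taken here to be a clock in MHz.

```python
def peak_gflops(cores: int, clock_mhz: int, flops_per_cycle: float) -> float:
    """Theoretical peak: cores x clock x FLOPs per core per cycle, in Gflops."""
    return cores * clock_mhz * flops_per_cycle / 1000.0


# Xeon E5-2650: 8 cores, 2000 base clock, 2400 with 7 or 8 cores active.
# Assumption: 8 double-precision FLOPs per core per cycle with AVX
# (one 4-wide add plus one 4-wide multiply).
cpu_base = peak_gflops(cores=8, clock_mhz=2000, flops_per_cycle=8)
cpu_all_core_turbo = peak_gflops(cores=8, clock_mhz=2400, flops_per_cycle=8)

# Tesla M2075: 448 cores, clock value 1215 as listed in the table.
# Assumption: 1 double-precision FLOP per core per cycle on average
# (a fused multiply-add counts as 2 FLOPs, issued at half rate).
gpu = peak_gflops(cores=448, clock_mhz=1215, flops_per_cycle=1)

print(f"CPU base:           {cpu_base:.1f} Gflops")
print(f"CPU all-core turbo: {cpu_all_core_turbo:.1f} Gflops")
print(f"GPU:                {gpu:.1f} Gflops")
```

Under these assumptions the CPU lands between roughly 128 and 154 Gflops and the GPU near 544 Gflops, the same order as the rounded figures on the slides.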
2 : 14 Tesla. 14 x 448 x 32 => 200704 , 65535 !
High performance computing (HPC)
( ) . .
( ) . .
HPC-
HPC . .
( ) . .
Alt Linux 7.0
TORQUE
gcc
OpenMP
OpenMPI
OpenCL
Nvidia CUDA Toolkit
OpenSUSE 13.2 (SLES 11.4)
, .. AltLinux . OpenSUSE SLES.
0OpenMP + MPI + CUDA
1 CPU GPU
OpenMP + MPI
CUDA
: 60%
2 CPU
: 10 15%
Deep learning
(EDA)
,
- . .. . , Mathematica 2014. LIGO, ,
...
... : (Albert@home, Asteroids@home, Cosmology@Home, Einstein@Home)
(ATLAS@Home)
(BURP, Electric Sheep)
(CAS@home)
(Climate Prediction)
(Collatz Conjecture)
(DENIS@Home)
(DistributedDataMining)
(Distributed.net)
(DreamLab)
(Folding@home) : https://en.wikipedia.org/wiki/List_of_distributed_computing_projects
.
!!!!
Infrastructure-as-a-Service (IaaS)
Platform-as-a-Service (PaaS)
, .
(IaaS, . Infrastructure-as-a-Service) , ,
(PaaS, . Platform-as-a-Service) ,
, - . . - , , , - . (SaaS, . Software-as-a-Service), . IaaS PaaS.
OpenStack
OpenStack , .
(IaaS, . Infrastructure-as-a-Service) , , , , , , . (PaaS, . Platform-as-a-Service) , Amazon EC2, Microsoft Azure, ElasticHosts...
GlusterFS , , ...
IaaS , . . PaaS ssh ( .) RemoteDesktop/TeamViewer
BigData
. . .
. . ... , , .
LHPChadoop
2011 Yahoo , Hadoop, Hortonworks. Hortonworks . Ambari, , . . ...
HDFS
HDFS (Hadoop Distributed File System) , , , GoogleFS
BigData . HPC BigData , . ... , ., GoogleFS, HDFS (Hadoop Distributed File System) , , . HDFS ( ) , , ( , ) . . HDFS ( ), . : , , , .
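As an illustration of how HDFS stores a file, the sketch below splits a file into fixed-size blocks and assigns each block to several DataNodes. It is a simplified model, not the HDFS implementation: the 128 MB block size and replication factor 3 are the usual Hadoop 2 defaults, the node names are hypothetical, and the round-robin placement stands in for the real rack-aware policy.

```python
MIB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MIB):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks: int, datanodes: list, replication: int = 3):
    """Assign each block to `replication` distinct nodes (round-robin).

    Real HDFS uses a rack-aware policy; round-robin is an assumption
    made here only to keep the example short.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# Hypothetical 300 MiB file on a hypothetical 4-node cluster.
blocks = split_into_blocks(300 * MIB)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

for index, (offset, length) in enumerate(blocks):
    print(index, offset // MIB, length // MIB, placement[index])
```

Losing any single node in this model still leaves two copies of every block, which is the property that lets HDFS run on unreliable commodity hardware.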
HDFS LHPChadoop
GlusterFS , HDFS
HDFS , , Hadoop HDFS, Amazon S3 CloudStore[en] . , HDFS MapReduce-, , , NoSQL- HBase, Apache Mahout. HDFS Hadoop, ...- , ;- GlusterFS ;- ;- HDFS ;- Ethernet Infiniband.
Hadoop MapReduce
Hadoop HDFS MapReduce
MapReduce , Google .
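The MapReduce model can be shown in miniature with a word count. This is a single-process sketch of the three phases (map, shuffle, reduce), assuming in-memory lists instead of HDFS files; the function names are illustrative and are not part of the Hadoop API.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


docs = ["big data on HPC", "big data on Hadoop"]
counts = word_count(docs)
print(counts)
```

In Hadoop the map and reduce calls run on different nodes and the shuffle moves data over the network; only the two user-supplied functions have to be written by the programmer.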
YARN
Yet Another Resource Negotiator
YARN , .
YARN ,
YARN (. Yet Another Resource Negotiator ) , 2.0 (2013), . MapReduce, (JobTracker), YARN (ResourceManager), . YARN MapReduce-, , ; YARN .YARN , , .
Hive
Apache Hive Hadoop (.. HDFS+MapReduce) , . HiveQL SQL- HDFS
Apache Hive Hadoop (.. HDFS+MapReduce) , . Facebook, . Netflix, Amazon, Amazon Elastic MapReduce Amazon Web Services. SQL- HDFS HiveQL, MapReduce . . Bitmap index .
Pig
Pig Latin
User Defined Functions in Java, Python, JavaScript, Ruby or Groovy
lazy evaluation
extract, transform, load (ETL)
is able to store data at any point during a pipeline
declares execution plans
supports pipeline splits, thus allowing workflows to proceed along DAGs instead of strictly sequential pipelines
Pig , MapReduce Hadoop. Yahoo 2006. Pig Latin, Java MapReduce SQL. Java, Python, JavaScript, Ruby or Groovy. : ; extract, transform, load (ETL) , , ; ; ; .
Mahout
Distributed Row Matrix API with R and Matlab like operators
Similarity Analysis
Collaborative Filtering
Classification
Clustering
Dimensionality Reduction
Frequent itemset mining
etc.
Mahout . MapReduce, .
Mahout . MapReduce, . Mahout: Basic Linear Algebra; ; Collaborative filtering ; ; Frequent itemset mining .
Giraph
Giraph MapReduce.
Facebook: 200 4
Giraph MapReduce. Giraph: Facebook G 200 4
HBase
HBase features compression
in-memory operation
Bloom filters on a per-column basis
Replication across the data center
Atomic and strongly consistent row-level operations
Near real time lookups
cells no larger than 10 MB
between 1 and 3 column families per table
Time based versions
HBase NoSQL , Google BigTable. HDFS BigTable- Hadoop.
HBase NoSQL ; Java; Google BigTable. HDFS BigTable- Hadoop, . Facebook . , , , , , CAP consistency, availability, partition tolerance
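The Bloom filters listed among the HBase features let a read skip files that cannot contain the requested key. The sketch below shows the data structure itself, not HBase's implementation: the bit-array size, the number of hash functions, the use of SHA-256, and the row keys are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter()
bf.add("row-001")
bf.add("row-002")
print(bf.might_contain("row-001"))
print(bf.might_contain("row-999"))
```

A negative answer is always correct, so a store can safely skip the file; a positive answer only means the file has to be read to find out.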
Kafka
Apache Kafka .
Apache Kafka . LinkedIn , , . , -, , .
Storm
Fast
Scalable
Fault-tolerant
Reliable
Easy to operate
Apache Storm near real-time . MISD ( ).
Apache Storm real-time . Twitter MISD , .. .
Fast 100 . -
Scalable
Fault-tolerant , ..
Reliable , .
Easy to operate
Spark speed
Logistic regression in Hadoop and Spark
.
100 Hadoop MapReduce 10
, Hadoop. Hadoop, MapReduce , , , , . 100 Hadoop MapReduce 10
Spark Ease of Use
Word count in Spark's Python API
Java, Scala, Python, R.
Scala, Python R.
Java, Scala, Python, R. 80 , Map Reduce, . Scala, Python R.
Spark Generality
Streaming, SQL, Graph processing and machine learning
SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming.
SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. . Spark Streaming , Storm, MISD , SIMD
Spark Runs Everywhere
Access diverse data sources including HDFS, Cassandra, HBase, S3, Hive, Tachyon, and any Hadoop data source
Zeppelin
Hadoop Spark. , Scala, Hive, SparkSQL, Linux Shell,
Z Hadoop Spark. . , Scala, Hive, SparkSQL, Linux Shell, iPython. , HDFS, NFS S3, Twitter ..
Zeppelin
|grep http,GET,POST,CONNECT...
Kafka
, ,
BigData
BD. . , , ,
- ..!
HP
1. BigData
2. BigData: Data Computing, Data Science
3.
, . . , . . BigData . , , .. - , , . , , . : , . , .