An all-in-one NoSQL with KVS performance, RDBMS indexes, and MapReduce: MongoDB

DESCRIPTION

Hiroaki Kubota, Rakuten, Inc. Presentation slides from Mongo Tokyo 2012. Introduces MongoDB's characteristics and how to take advantage of them, through examples from Rakuten Infoseek News. Also shares feature and performance verification results, operational know-how, and weaknesses(!), and publishes patches for the PHP driver.

Rakuten, Inc., Development Unit, Architect Group. Hiroaki Kubota | January 18, 2012


Agenda

• Introduction
• How to use mongo on news.infoseek.co.jp

Introduction

Who am I?

Name: 窪田 博昭 Hiroaki Kubota

Company: Rakuten Inc.

Unit: ACT = Development Unit Architect Group

Mail: hiroaki.kubota@mail.rakuten.com

Hobby: Futsal, Golf

Recent: My physical strength has gradually declined...

Twitter: crumbjp

GitHub: crumbjp


How to take advantage of Mongo for Infoseek News

An example of our page

Page structure: Layout / Components

Albatross structure

[Diagram: Internet → WEB → API. The WEB tier gets the page layout from the LayoutDB (MongoDB ReplSet) and calls the APIs; the APIs retrieve data from the ContentsDB (MongoDB ReplSet) and get the components; sessions live in the SessionDB (MongoDB ReplSet) and Memcache.]

Albatross structure (CMS side)

[Diagram: Developers use the CMS to set page layouts (HTML markup & API settings) into the LayoutDB (MongoDB ReplSet), deploy APIs to the API servers, and set components; Batch servers insert data into the ContentsDB (MongoDB ReplSet).]

CMS: Layout editor (screenshots)

MapReduce

Our usage

We have never used MapReduce as a regular operation. However, we have used it for some irregular cases:

• To search for invalid articles that should be removed because of someone's mistakes...

• To analyze the number of new articles posted per day.

• To analyze the number of updates per article.

• We will start considering using it regularly for social data analysis before long...

Structure & Performance

Structure

We are using very poor machines (virtual machines)!!

• Intel(R) Xeon(R) CPU X5650 2.67GHz, 1 core!!

• 4GB memory

• 50GB disk space (iSCSI)

• CentOS 5.5 64bit

• mongodb 1.8.0

  – ReplicaSet 5 nodes (+ 1 Arbiter)

  – Oplog size 1.2GB

  – Average object size 1KB

Researched environments

We've also researched the following environments...

• Virtual machine, 1 core

  – 1KB data, 6,000,000 documents

  – 8KB data, 200,000 documents

• Virtual machine, 3 cores

  – 1KB data, 6,000,000 documents

  – 8KB data, 200,000 documents

• EC2 large instance

  – 2KB data, 60,000,000 documents (100GB)

Performance

I found a formula for making a rough estimation of QPS.

1–8KB documents + 1 unique index

C = number of CPU cores (Xeon 2.67GHz)

DD = score of the 'dd' command (bytes/sec)

S = document size (bytes)

• GET qps = 4500 × C

• SET (fsync) qps = 0.05 × DD ÷ S

• SET (nsync) qps = 4500, BUT... there is a chance of STALE
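As a sketch, the estimation formulas above translate directly into JavaScript. The sample figures below (3 cores, a 100 MB/s dd score, 1KB documents) are invented for illustration; the constants 4500 and 0.05 are the empirical values from the slide:

```javascript
// Rough QPS estimation using the slide's empirical formulas.
// cores = C, ddBytesPerSec = DD ('dd' throughput), docSizeBytes = S
function estimateGetQps(cores) {
  return 4500 * cores;
}
function estimateFsyncSetQps(ddBytesPerSec, docSizeBytes) {
  // 0.05 * DD is the effective write throughput; dividing by the
  // document size turns it into documents per second.
  return 0.05 * ddBytesPerSec / docSizeBytes;
}

// Example: 3 cores, dd measured at 100 MB/s, 1KB documents
console.log(estimateGetQps(3));                // 13500
console.log(estimateFsyncSetQps(100e6, 1024)); // 4882.8125
```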


Performance example (on EC2 large)

Environment and amount of data

EC2 large instance

  – 2KB data, 60,000,000 documents (100GB)

  – 1 unique index

Data type:

{
  shop: 'someone',
  item: 'something',
  description: 'item explanation sentences...'
}

Performance example (on EC2 large)

Batch insert (1000 documents), fsync=true:

  17906 sec (= 298 min) (= 3358 docs/sec)

Ensure index (background=false): 4049 sec (= 67 min)

  1. primary: 2101 sec (= 35 min)

  2. secondary: 1948 sec (= 32 min)

Performance example (on EC2 large)

Add one node: 5833 sec (= 97 min)

  1. Get files (2GB × 48): 2120 sec (= 35 min)

  2. _id indexing: 1406 sec (= 23 min)

  3. unique indexing: 2251 sec (= 38 min)

  4. other processes: 56 sec (= 1 min)

Performance example (on EC2 large)

Group by

• Reduce by unique index & map & reduce

  – 368 msec

db.data.group({
  key: { shop: 1 },
  cond: { shop: 'someone' },
  reduce: function (o, p) { p.sum++; },
  initial: { sum: 0 }
});

Performance example (on EC2 large)

MapReduce

• Scan all data: 3116 sec (= 52 min)

  – number of keys = 39092

db.data.mapReduce(
  function () { emit(this.shop, 1); },
  function (k, v) {
    var ret = 0;
    v.forEach(function (value) { ret += value; });
    return ret;
  },
  { query: {}, inline: 1, out: 'Tmp' });
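The map and reduce functions above are plain JavaScript, so their logic can be sanity-checked outside mongod. A minimal sketch, with an invented in-memory emit/reduce harness and sample documents:

```javascript
// Simulate the slide's map/reduce pair over in-memory documents.
var docs = [
  { shop: 'someone', item: 'a' },
  { shop: 'someone', item: 'b' },
  { shop: 'other',   item: 'c' }
];

var emitted = {};                       // key -> array of emitted values
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

var map = function () { emit(this.shop, 1); };
var reduce = function (k, v) {
  var ret = 0;
  v.forEach(function (value) { ret += value; });
  return ret;
};

docs.forEach(function (doc) { map.call(doc); }); // mongo calls map with `this` = doc
var result = {};
for (var key in emitted) result[key] = reduce(key, emitted[key]);

console.log(result);                    // { someone: 2, other: 1 }
```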

Major problems...

Indexing

Index problem

Indexing is a blocking operation by default.

An indexing operation can run in the background on the primary. But...

It CANNOT run in the background on the secondaries. Moreover, all the secondaries' index builds run at the same time!!

The result of the above: all slaves freeze! orz...

Online indexing is completely useless, even in the latest version (2.0.2).

Present indexing (default)

[Diagrams: a Batch client saves to the Primary while other clients read from the Secondaries. When ensureIndex runs, the Primary locks while indexing, so the Batch cannot write. When the Primary completes and the operation syncs, all Secondaries lock and index at the same time, so clients cannot read!]

Ideal indexing (default)

[Diagram: the Primary and each Secondary complete the index build in turn, so clients are served throughout.]

Present indexing (background)

[Diagrams: with ensureIndex(background), the Primary only slows down while indexing, so the Batch can still write, slowly. But when the operation syncs, the Secondaries still lock and index all at once, so clients cannot read! Background indexing doesn't work on the secondaries.]

Ideal indexing (background)

[Diagram: the Primary and all Secondaries complete the index build without locking, and clients are served throughout.]

Probable 2.1.X indexing

According to mongodb.org, this problem will be fixed in 2.1.0. But that is not formally released yet, so I checked out the up-to-date source code.

Certainly, it will be fixed!

Moreover, it sounds like indexing will run in the foreground when the slave's status isn't SECONDARY (meaning it is RECOVERING).

Probable 2.1.X indexing

[Diagrams: ensureIndex(background) makes the Primary slow down instead of locking while indexing. When the operation syncs, each Secondary also slows down instead of locking, so reads keep working, and finally every node completes.]

Background indexing in 2.1.X

But I think it's not enough. It can be fatal for a system if all the secondaries slow down at the same time!!

So...

Ideal indexing

[Diagrams: the Primary builds the index in the background (slowdown only). Then each Secondary in turn goes into RECOVERING and builds the index while the other Secondaries keep serving clients; when it completes, the next one starts, until all nodes have completed.]

It would be great if I could run indexing manually on each secondary. But... I can easily guess it's difficult to make that fit the current Oplog design.

I suggest manual indexing.

Manual indexing

[Diagrams: ensureIndex(manual, background) makes only the Primary build the index (slowdown only); the Secondaries do NOT sync the index build automatically. Then ensureIndex(manual) is issued on each Secondary in turn, which builds the index in RECOVERING state while the others keep serving. Background operation also needs to be supported, just in case the ReplSet has only one Secondary. Finally, all nodes complete.]

That's all about the indexing problem.

Struggle to control the sync

STALE

Unknown log & the ReplSet out of control

We often suffered from the Secondaries going out of control...

• Secondaries changed status repeatedly, in a moment, between Secondary and Recovering (1.8.0).

• Then we found a strange line in the log...

[rsSync] replSet error RS102 too stale to catch up

What's Stale?

stale [stéil] (powered by goo.ne.jp)

• (of food or drink) not fresh (⇔ fresh)

• flat, (of coffee) having lost its aroma

• (of bread) dried out, gone hard

• (of air or a smell) stuffy, musty

• smelling bad

...it sounds like something very, very bad.

Mechanism of becoming stale

ReplicaSet

[Diagram: a Client talks to the Primary mongod (Database + Oplog), which replicates to a Secondary mongod (Database + Oplog).]

Replication (simple case)

[Diagrams: the Client inserts A; the Primary writes A to its Database and appends 'Insert A' to its Oplog. The Secondary reads the Primary's Oplog, applies 'Insert A' to its own Database, and records it in its own Oplog.]

Replication (busy case)

[Diagrams: while the Secondary still only has A, the Client keeps writing: Insert B, Insert C, Update A pile up in the Primary's Database and Oplog. The Secondary checks the Primary's Oplog, finds its last applied entry ('Insert A') still there, and can sync the remaining operations (Insert B, Insert C, Update A) to catch up.]

Replication (more busy case): Stale

[Diagrams: the writes keep coming (Insert B, Insert C, Update A, Update C, Insert D) and the Primary's fixed-size Oplog wraps around, discarding its oldest entry ('Insert A'). When the lagging Secondary checks the Primary's Oplog, 'Insert A' is not found, so it cannot get the information about 'Insert B' and cannot sync. The node drops to RECOVERING. This is called STALE.]
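A minimal sketch of the wrap-around above (all names invented here, with a deliberately tiny capacity standing in for an undersized oplog):

```javascript
// Toy model: the oplog is a capped ring that drops its oldest entries.
function Oplog(capacity) {
  this.capacity = capacity;
  this.entries = [];                    // oldest first
}
Oplog.prototype.append = function (op) {
  this.entries.push(op);
  if (this.entries.length > this.capacity) this.entries.shift(); // discard oldest
};
Oplog.prototype.contains = function (op) {
  return this.entries.indexOf(op) !== -1;
};

var primaryOplog = new Oplog(4);        // tiny oplog, like an undersized one
primaryOplog.append('Insert A');
var secondaryLastApplied = 'Insert A';  // the secondary synced to here, then lagged

// Heavy write load while the secondary is behind:
['Insert B', 'Insert C', 'Update A', 'Update C', 'Insert D']
  .forEach(function (op) { primaryOplog.append(op); });

// The secondary looks for its last applied op in the primary's oplog:
var canSync = primaryOplog.contains(secondaryLastApplied);
console.log(canSync ? 'can catch up' : 'STALE: too stale to catch up');
```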

Stale

We have to understand the importance of adjusting the oplog size.

We can specify the oplog size as a command line option, but it only takes effect the first time for a given dbpath (which is also specified on the command line). We cannot change the oplog size afterwards without clearing the dbpath. Be careful!
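For reference, the oplog size is given in megabytes on the mongod command line and is only applied when the dbpath is first created; the paths and numbers below are examples only:

```shell
# First start for this dbpath: a ~1.2GB oplog is allocated.
mongod --replSet myset --dbpath /data/rs1 --oplogSize 1200

# Restarting later with a different --oplogSize has no effect
# unless /data/rs1 is cleared (and the node resynced) first.
```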

Replication (join as a new node)

InitialSync

[Diagrams: a new node starts up empty and goes into RECOVERING. It records the Primary's last Oplog entry ('Insert D'), then clones the Database (A, B, C, D) while the Client keeps writing (Insert E, Update B). When the clone completes, it checks the Primary's Oplog, finds 'Insert D' still there, applies everything after it (Insert E, Update B), and becomes a Secondary.]

Additional information

A Secondary will try to sync from other Secondaries when it cannot reach the Primary, or when it might be stale against the Primary.

There is a small chance the sync problem does not occur, if a secondary has older Oplog entries, or a larger Oplog space, than the Primary.

(From the source code. I've never examined these...)

Sync from another secondary

[Diagrams: the lagging Secondary checks the Primary's Oplog and cannot find 'Insert A'. But another Secondary with a longer Oplog still has it, so the lagging node is able to sync from that Secondary instead and catch up.]

That's all about sync

Others...

Disk space

Disk space

Data fragments sparsely across the DB files...

We met this unfavorable circumstance in our DBs. It appeared in some of our collections around 3 months after we launched the services:

db.ourcol.storageSize() = 16200727264 (15GB)

db.ourcol.totalSize() = 16200809184

db.ourcol.totalIndexSize() = 81920

db.ourcol.dataSize() = 2032300 (2MB)

What is happening to them!!
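Plugging the stats above into a quick calculation shows how little of the allocated space was live data:

```javascript
// Fragmentation check using the collection stats from the slide.
var storageSize = 16200727264;   // bytes allocated on disk (~15GB)
var dataSize    = 2032300;       // bytes of live data (~2MB)

var utilization = dataSize / storageSize;
console.log((utilization * 100).toFixed(4) + '% of the allocated space is live data');
// 0.0125% of the allocated space is live data
```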

Disk space

It seems to be caused by a specific access pattern that inserts, updates and deletes over and over.

Anyway, we have to shrink the used disk space regularly, just like PostgreSQL's VACUUM. But how?

Disk space

Shrinking the used disk space

MongoDB offers some functions for this case, but we couldn't use them in our case!

repairDatabase:
  Only runnable on the Primary.
  It takes a long time and BLOCKS all operations!!

compact:
  Only runnable on the Secondary.
  It zero-fills the blank space instead of shrinking the disk space, so it cannot shrink...

Disk space

Our measures

For temporary collections:
  Issue a drop command regularly.

For other collections:
  1. Remove one secondary from the ReplSet.
  2. Shut it down.
  3. Remove all its DB files.
  4. Rejoin it to the ReplSet.
  5. Repeat these operations on each secondary, one after another.
  6. Step down the Primary (change the Primary node).
  7. At last, do operations 1-4 on the prior Primary.

PHP client

We tried 1.4.4 and 1.2.2.

1.4.4:
  There are some critical bugs around the connection pool.
  We struggled to invalidate broken connections.
  I think you should use 1.2.X instead of 1.4.X.

1.2.2:
  The connection pool issues seem to be fixed, but there are 2 critical bugs!

  – Socket handle leak

  – Useless sleep

  However, this version is relatively stable as long as you fix these bugs.

Our patches for 1.2.2 are published at:

https://github.com/crumbjp/Personal

  - mongo1.2.2.non-wait.patch

  - mongo1.2.2.sock-leak.patch


Closing

What's MongoDB?

It has very good READ performance. We can use mongo instead of memcached, if we can accept its limited write performance.

It dies hard! MongoDB keeps high availability even under severe stress.

It can be used easily, without deep consideration. We can manage to do anything after getting started, and forget the awkward trivial questions that have bothered us:

• How to handle huge data?

• How to fit in a cache system?

• How to keep availability?

• And so on...

Closing

Keep in mind

Sharding is challenging... it's a last resort! It's hard to operate; in particular, maintaining the config servers. Mongos is also difficult to keep alive, and I want a way to fail over Mongos.

Mongo is able to run in a poor environment, but you should still set aside plenty of disk space.

Heavy writes are sensitive: adjust the oplog size carefully.

The indexing functionality is unfinished: indexes cannot be built online.

All right, have fun!!

Thank you for listening.