
MySQL 5.7 NF – Using the JSON Datatype

MySQL 5.7 JSON datatype

2015.11.29

정지원

Index

1. Why JSON

2. About JSON datatype

3. DDL & DML with JSON

4. Indexing JSON data

5. Data performance

6. Use cases

7. ROADMAP

1. Why JSON

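A convenient notation for listing objects

JSON data needs to be processed efficiently

Integration of RDB & schemaless data

Stronger support in existing databases for new kinds of applications

Reference: http://www.w3schools.com/json/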

2. About JSON data type

Supported since MySQL 5.7

Binary format
- Parse and validation on insert only
- Dictionary
- Sorted objects’ keys
- Fast access to array cells by index

Supported types
- All JSON types are supported
  numbers, strings, booleans
  objects, arrays
- Extended
  date, time, datetime, timestamp, and so on

Ex1> ["12:18:29.000000", "2015-07-29", "2015-07-29 12:18:29.000000"]

Ex2> SELECT JSON_ARRAY('a', 1, NOW());
+----------------------------------------+
| JSON_ARRAY('a', 1, NOW())              |
+----------------------------------------+
| ["a", 1, "2015-07-27 09:43:47.000000"] |
+----------------------------------------+
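
The "parse and validation on insert only" point can be sketched as follows. This is a minimal illustration, not taken from the slides: the table name json_validation_demo and its column doc are hypothetical.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE json_validation_demo (doc JSON);

-- Well-formed JSON text is parsed once and stored in the binary format.
INSERT INTO json_validation_demo VALUES ('{"b": 2, "a": [1, 2, 3]}');

-- Malformed JSON text is rejected at insert time with an "Invalid JSON text" error:
-- INSERT INTO json_validation_demo VALUES ('{"b": 2, "a": [1, 2, 3');

-- JSON_VALID checks a string without inserting it (1 = valid, 0 = invalid).
SELECT JSON_VALID('{"a": 1}') AS ok, JSON_VALID('{"a": 1') AS broken;

-- JSON_TYPE reports the JSON type of a stored value or of an extracted member.
SELECT JSON_TYPE(doc) AS doc_type,
       JSON_TYPE(JSON_EXTRACT(doc, '$.a')) AS a_type
FROM json_validation_demo;
```

Because validation happens when the row is written, later reads can work on the stored binary form without parsing the text again.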

JSON column length limit

The size of a document stored in a JSON column is limited by max_allowed_packet.
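
A short sketch of how the limit might be inspected and raised. The 64 MB value is an arbitrary example, not a recommendation from the slides.

```sql
-- Current limit in bytes; it also bounds the size of a stored JSON document.
SELECT @@max_allowed_packet;
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise the limit for new connections (needs the SUPER privilege).
-- 64 MB is an arbitrary example value.
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;
```

The client side has its own max_allowed_packet setting, so large documents may require raising it there as well.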

Function List

https://dev.mysql.com/doc/refman/5.7/en/json-functions.html
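
A few of the functions from that list, as a quick orientation. The literals are made up for illustration; the comments show the value each call returns.

```sql
SELECT JSON_VALID('{"a": 1}');                     -- 1
SELECT JSON_TYPE('[1, 2]');                        -- ARRAY
SELECT JSON_KEYS('{"a": 1, "b": 2}');              -- ["a", "b"]
SELECT JSON_LENGTH('[1, 2, 3]');                   -- 3
SELECT JSON_EXTRACT('{"a": {"b": 5}}', '$.a.b');   -- 5
SELECT JSON_SET('{"a": 1}', '$.b', 2);             -- {"a": 1, "b": 2}
SELECT JSON_REMOVE('{"a": 1, "b": 2}', '$.b');     -- {"a": 1}
```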

3. DDL & DML with JSON

CREATE & INSERT

create table t1
(
data JSON -- data type (JSON)
);

insert into t1(data)
values
('{"series":1}')
,('{"series":7}')
,('{"series":3}')
,(JSON_QUOTE('some, might be formatted,{text} with "quotes"'))
;

select * from t1;
+---------------------------------------------------+
| data                                              |
+---------------------------------------------------+
| {"series": 1}                                     |
| {"series": 7}                                     |
| {"series": 3}                                     |
| "some, might be formatted,{text} with \"quotes\"" |
+---------------------------------------------------+
12 rows in set (0.00 sec)

SELECT

select * from t1 where json_extract(data,"$.series") >= 3;
+----------------+
| data           |
+----------------+
| {"series": 3}  |
| {"series": 7}  |
+----------------+

select * from t1 where data -> "$.series" >= 3; -- [5.7.9~] inlined json path
+----------------------------------+------+
| data                             | id   |
+----------------------------------+------+
| {"series": 3}                    |    7 |
| {"series": 7}                    |    3 |
+----------------------------------+------+

select * from t1 where data >= json_object("series",3);
+----------------------------------+------+
| data                             | id   |
+----------------------------------+------+
| {"series": 3}                    |    7 |
| {"series": 7}                    |    3 |
| {"a": "valid", "json": ["text"]} | NULL | -- ??
+----------------------------------+------+
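
The row marked "-- ??" is likely explained by the last query comparing whole JSON documents rather than the series value. As far as the MySQL reference manual describes it, two objects are equal when they have the same keys with the same values, while the ordering of unequal objects is unspecified but deterministic. A hedged sketch of the safer pattern, reusing the t1 table from the slides:

```sql
-- Whole-object equality is well defined: same keys with the same values.
SELECT JSON_OBJECT('series', 3) = CAST('{"series": 3}' AS JSON) AS same_object;

-- For range filters, compare the extracted scalar instead of the whole document.
SELECT data
FROM t1
WHERE JSON_EXTRACT(data, '$.series') >= 3;
```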

UPDATE

create table gm_friends
(
uid bigint primary key
,friend_uid json -- friend list
);

set @friend := '[113]'; -- add a friend

insert into gm_friends values (111 , @friend)
on duplicate key update friend_uid = json_merge(friend_uid,@friend);

select * from gm_friends where uid=111;
+-----+------------+
| uid | friend_uid |
+-----+------------+
| 111 | [112, 113] | -- friend list of user 111
+-----+------------+
1 row in set (0.00 sec)
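
The merge step can be tried in isolation, as sketched below. JSON_ARRAY_APPEND is shown as an alternative that is not in the slides. Note also that JSON_MERGE was deprecated in MySQL 5.7.22 in favor of JSON_MERGE_PRESERVE, which behaves the same way.

```sql
-- JSON_MERGE concatenates two arrays.
SELECT JSON_MERGE('[112]', '[113]');           -- [112, 113]

-- JSON_ARRAY_APPEND adds one value to the array at the given path.
SELECT JSON_ARRAY_APPEND('[112]', '$', 113);   -- [112, 113]

-- The same upsert on gm_friends, written with JSON_ARRAY_APPEND.
INSERT INTO gm_friends VALUES (111, JSON_ARRAY(113))
ON DUPLICATE KEY UPDATE friend_uid = JSON_ARRAY_APPEND(friend_uid, '$', 113);
```

Neither variant removes duplicates, so adding the same friend twice stores the id twice.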

CTAS

create table friend_list

as

select 100 user_id, 200 friend_id union all

select 100 user_id, 300 friend_id union all

select 200 user_id, 100 friend_id union all

select 200 user_id, 300 friend_id union all

select 200 user_id, 400 friend_id;

select * from friend_list;

+---------+-----------+

| user_id | friend_id |

+---------+-----------+

| 100 | 200 |

| 100 | 300 |

| 200 | 100 |

| 200 | 300 |

| 200 | 400 |

+---------+-----------+

create table t2

as

select user_id

, json_object('lst'

,json_array(group_concat(friend_id)))

as friend_lst

from friend_list

group by user_id;

select * from t2;

+---------+--------------------------+

| user_id | friend_lst |

+---------+--------------------------+

| 100 | {"lst": ["200,300"]} |

| 200 | {"lst": ["100,300,400"]} |

+---------+--------------------------+

select JSON_SEARCH(friend_lst, 'all', '200,300')

from t2

where user_id = 100;

+-------------------------------------------+

| JSON_SEARCH(friend_lst, 'all', '200,300') |

+-------------------------------------------+

| "$.lst[0]" |

+-------------------------------------------+

select user_id

, friend_lst

, JSON_EXTRACT(friend_lst, "$.lst") as s1

, JSON_EXTRACT(friend_lst, "$.lst[0]") as s2

, JSON_UNQUOTE(JSON_EXTRACT(friend_lst, "$.lst[0]")) as s3

from t2

where user_id = 100;

+---------+----------------------+-------------+-----------+---------+

| user_id | friend_lst | s1 | s2 | s3 |

+---------+----------------------+-------------+-----------+---------+

| 100 | {"lst": ["200,300"]} | ["200,300"] | "200,300" | 200,300 |

+---------+----------------------+-------------+-----------+---------+
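
In the output above, json_array(group_concat(...)) yields an array holding a single string ("200,300"), because GROUP_CONCAT returns one string. If an array of individual numbers is wanted, one possible approach (an alternative, not taken from the slides) is sketched here against the same friend_list table:

```sql
-- Assemble the array text, then convert it to JSON.
-- CAST(... AS CHAR) keeps the concatenated string non-binary so it can be parsed as JSON.
SELECT user_id,
       JSON_OBJECT('lst',
                   CAST(CONCAT('[',
                               GROUP_CONCAT(CAST(friend_id AS CHAR) ORDER BY friend_id),
                               ']') AS JSON)) AS friend_lst
FROM friend_list
GROUP BY user_id;
-- user_id 100 -> {"lst": [200, 300]}
-- user_id 200 -> {"lst": [100, 300, 400]}

-- On MySQL 5.7.22 or later, JSON_ARRAYAGG builds the array directly
-- (the order of the elements is not guaranteed).
SELECT user_id, JSON_OBJECT('lst', JSON_ARRAYAGG(friend_id)) AS friend_lst
FROM friend_list
GROUP BY user_id;
```

With real array elements, JSON_SEARCH and path expressions such as $.lst[1] can address each friend id separately.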

JOIN

create table t2

(

data JSON

);

insert into t2(data)

values

('{"series":[11, 1, 100]}')

,('{"series":[22, 7 ]}')

,('{"series":[33, 3, 200]}');

select * from t2;

+--------------------------+

| data |

+--------------------------+

| {"series": [11, 1, 100]} |

| {"series": [22, 7]} |

| {"series": [33, 3, 200]} |

+--------------------------+

select *

from t1, t2

where t1.data -> "$.series"

= t2.data -> "$.series[1]";

+---------------+--------------------------+

| data | data |

+---------------+--------------------------+

| {"series": 1} | {"series": [11, 1, 100]} |

| {"series": 7} | {"series": [22, 7]} |

| {"series": 3} | {"series": [33, 3, 200]} |

+---------------+--------------------------+

4. Indexing JSON data

JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column. See "Secondary Indexes and Virtual Generated Columns" for a detailed example.

Generated Column (=Virtual Column)

MySQL supports indexes on generated columns. For example:

CREATE TABLE t1

(

f1 INT

, gc INT AS (f1 + 1) STORED

, INDEX (gc)

);

The generated column, gc, is defined as the expression f1 + 1.

The column is also indexed and the optimizer can take that index into account during execution plan construction.

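GENERATED COLUMN: VIRTUAL vs STORED

VIRTUAL
- The column's data is not physically stored
  => fast insert / update
- SELECT
  the value is computed every time the column is read
- Index
  only secondary indexes can be created
  only btree is supported
- Adding a column
  no table rebuild

STORED
- The column's data is physically stored
- Index
  both primary and secondary indexes are possible
  btree, fts, gis are supported
- Adding a column
  requires a table rebuild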
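
The comparison above can be tried with a small sketch. The table gc_demo, its columns, and the index names are hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TABLE gc_demo (data JSON) ENGINE=InnoDB;

-- VIRTUAL: values are computed on read. Requesting ALGORITHM=INPLACE makes the
-- statement fail instead of silently falling back to a table copy.
ALTER TABLE gc_demo
  ADD COLUMN series_v INT GENERATED ALWAYS AS (JSON_EXTRACT(data, '$.series')) VIRTUAL,
  ALGORITHM=INPLACE;

-- STORED: values are materialized in each row, so the table is rebuilt.
ALTER TABLE gc_demo
  ADD COLUMN series_s INT GENERATED ALWAYS AS (JSON_EXTRACT(data, '$.series')) STORED;

-- Secondary B-tree indexes are allowed on either kind of column.
CREATE INDEX idx_series_v ON gc_demo (series_v);
CREATE INDEX idx_series_s ON gc_demo (series_s);
```

On a large table the difference shows up mainly in the duration of the two ALTER statements and in the table size afterwards.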

Creating indexes with a GENERATED COLUMN

create table `t1` (

`data` json,

`id` int(11) AS (JSON_EXTRACT(data,"$.id")) STORED,

`id2` int(11) AS (JSON_EXTRACT(data,"$.series")) VIRTUAL

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

alter table t1 add primary key (id);

Create index id_idx on t1(id2);

show create table t1\G

*************************** 1. row ***************************

Table: t1

Create Table: CREATE TABLE `t1` (

`data` json DEFAULT NULL,

`id` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.id")) STORED NOT NULL,

`id2` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.series")) VIRTUAL,

PRIMARY KEY (`id`),

KEY `id_idx` (`id2`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

explain select data from t1 where JSON_EXTRACT(data,"$.series") between 3 and 5;

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

| 1 | SIMPLE | t1 | NULL | ALL | id_idx | NULL | NULL | NULL | 10 | 11.11 | Using where |

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

explain select data from t1 where id between 3 and 5;

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

| 1 | SIMPLE | t1 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 3 | 100.00 | Using where |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Creating indexes with a GENERATED COLUMN: execution plan

desc t1;

+-------+---------+------+-----+---------+-------------------+

| Field | Type | Null | Key | Default | Extra |

+-------+---------+------+-----+---------+-------------------+

| data | json | YES | | NULL | |

| id | int(11) | NO | PRI | NULL | STORED GENERATED |

| id2 | int(11) | YES | MUL | NULL | VIRTUAL GENERATED |

+-------+---------+------+-----+---------+-------------------+

select * from t1;

+-------------------------+----+------+

| data | id | id2 |

+-------------------------+----+------+

| {"id": 0, "series": 11} | 0 | 11 |

| {"id": 1, "series": 10} | 1 | 10 |

| {"id": 3, "series": 8} | 3 | 8 |

| {"id": 4, "series": 7} | 4 | 7 |

+-------------------------+----+------+
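
In the first plan above, the predicate is written against JSON_EXTRACT(data,"$.series") and no key is chosen, even though id_idx is listed as a possible key. A hedged sketch of the alternative is to filter on the indexed generated column id2 directly. Whether the optimizer then picks id_idx depends on the server version and table statistics, so no expected plan is given here.

```sql
-- Filter on the indexed generated column rather than on the JSON expression.
EXPLAIN SELECT data FROM t1 WHERE id2 BETWEEN 3 AND 5;

-- The same filter as an ordinary query.
SELECT data, id2 FROM t1 WHERE id2 BETWEEN 3 AND 5;
```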

5. Data performance

Plain (column-based) table

desc log_col;

+----------+---------------+------+-----+---------+----------------+

| Field | Type | Null | Key | Default | Extra |

+----------+---------------+------+-----+---------+----------------+

| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |

| user_id | bigint(20) | NO | MUL | NULL | |

| world_id | tinyint(4) | NO | | NULL | |

| log_date | datetime | NO | | NULL | |

| col1 | bigint(20) | YES | | NULL | |

| col2 | bigint(20) | YES | | NULL | |

| col3 | bigint(20) | YES | | NULL | |

| col4 | bigint(20) | YES | | NULL | |

| col5 | bigint(20) | YES | | NULL | |

| str1 | varchar(50) | YES | | NULL | |

| str2 | varchar(50) | YES | | NULL | |

| str3 | varchar(100) | YES | | NULL | |

| str4 | varchar(100) | YES | | NULL | |

| str5 | varchar(1000) | YES | | NULL | |

+----------+---------------+------+-----+---------+----------------+

14 rows in set (0.04 sec)

JSON table

desc log_json;

+----------+------------+------+-----+---------+----------------+

| Field | Type | Null | Key | Default | Extra |

+----------+------------+------+-----+---------+----------------+

| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |

| user_id | bigint(20) | NO | MUL | NULL | |

| world_id | tinyint(4) | NO | | NULL | |

| log_date | datetime | NO | | NULL | |

| jdata | JSON | YES | | NULL | |

+----------+------------+------+-----+---------+----------------+

5 rows in set (0.00 sec)

Table size
+--------------+------------------+------------+---------------+
| table_schema | table_name       | table_rows | DB Size in MB |
+--------------+------------------+------------+---------------+
| test         | log_col          |     994788 |         111.2 | # plain table
| test         | log_json         |     992943 |         163.3 | # JSON table (over 40% larger)
+--------------+------------------+------------+---------------+

INSERT

Table  Time
Plain  4 min 6.55 sec
JSON   4 min 14.62 sec

SELECT

Plain (0.24 sec):
select count(col1) from log_col where col1 between 3336 and 5990;

JSON (2.13 sec):
select count(json_extract(jdata,"$.col1")) from log_json where json_extract(jdata,"$.col1") >= 3336 and json_extract(jdata,"$.col1") <= 5990;

SELECT after creating an index

create index idx01 on log_col(col1); -- 1.07 sec

Plain (0.2 sec):
select count(col1) from log_col where col1 between 3336 and 5990;

JSON: an index cannot be created

STORED table

desc log_json_store;

+----------+------------+------+-----+---------+------------------+

| Field | Type | Null | Key | Default | Extra |

+----------+------------+------+-----+---------+------------------+

| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |

| user_id | bigint(20) | NO | MUL | NULL | |

| world_id | tinyint(4) | NO | | NULL | |

| log_date | datetime | NO | | NULL | |

| id | bigint(20) | YES | | NULL | STORED GENERATED |

| jdata | json | YES | | NULL | |

+----------+------------+------+-----+---------+------------------+

6 rows in set (0.01 sec)

VIRTUAL table

desc log_json_virtual;

+----------+------------+------+-----+---------+-------------------+

| Field | Type | Null | Key | Default | Extra |

+----------+------------+------+-----+---------+-------------------+

| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |

| user_id | bigint(20) | NO | MUL | NULL | |

| world_id | tinyint(4) | NO | | NULL | |

| log_date | datetime | NO | | NULL | |

| id | bigint(20) | YES | | NULL | VIRTUAL GENERATED |

| jdata | json | YES | | NULL | |

+----------+------------+------+-----+---------+-------------------+

6 rows in set (0.00 sec)

Table size
+--------------+------------------+------------+---------------+
| table_schema | table_name       | table_rows | DB Size in MB |
+--------------+------------------+------------+---------------+
| test         | log_json         |     992943 |         163.3 |
| test         | log_json_store   |     991134 |         197.8 | # STORED table
| test         | log_json_virtual |     989866 |         168.8 | # VIRTUAL table
+--------------+------------------+------------+---------------+

INSERT (1 million rows)

Table    Time
STORED   4 min 27.99 sec
VIRTUAL  4 min 12.83 sec

SELECT before creating an index

STORED (0.21 sec):
select count(id) from log_json_store where id between 3336 and 5990;

VIRTUAL (1.93 sec):
select count(id) from log_json_virtual where id between 3336 and 5990;

create index idx01 on log_json_store(id); -- 0.81 sec
create index idx01 on log_json_virtual(id); -- 1.38 sec

SELECT after creating an index

STORED (0.0 sec):
select count(id) from log_json_store where id between 3336 and 5990;

VIRTUAL (0.0 sec):
select count(id) from log_json_virtual where id between 3336 and 5990;

WHY JSON RATHER THAN TEXT/VARCHAR?

VIRTUAL tables

desc log_text_virtual;
+----------+------------+------+-----+---------+-------------------+
| Field    | Type       | Null | Key | Default | Extra             |
+----------+------------+------+-----+---------+-------------------+
| log_idx  | bigint(20) | NO   | PRI | NULL    | auto_increment    |
| user_id  | bigint(20) | NO   | MUL | NULL    |                   |
| world_id | tinyint(4) | NO   |     | NULL    |                   |
| log_date | datetime   | NO   |     | NULL    |                   |
| id       | bigint(20) | YES  |     | NULL    | VIRTUAL GENERATED |
| jdata    | text       | YES  |     | NULL    |                   |
+----------+------------+------+-----+---------+-------------------+
6 rows in set (0.00 sec)

desc log_json_virtual; shows the same columns, except that jdata is of type json.

select sum(id) from log_text_stored;

Table         Time
JSON STORED   0.54 sec
JSON VIRTUAL  2.43 sec
TEXT STORED   0.66 sec
TEXT VIRTUAL  8.02 sec

TEXT/VARCHAR does not keep position information for the object keys and array elements inside the document
=> on every select, the position within the row has to be located again

6. Use cases

Column Based Table

Using the JSON type

* Items excluded from the JSON document
1) Columns that can be predicted in advance
2) Columns likely to be important in queries
3) Columns that serve as a Dimension in analysis

“HYBRID TABLE”
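
A minimal sketch of such a hybrid table. The name log_hybrid and the JSON keys col1 and str1 are hypothetical, while the fixed columns mirror the log_json layout shown earlier in the deck.

```sql
-- Hypothetical hybrid table: predictable, frequently queried columns stay relational,
-- everything else goes into one JSON document.
CREATE TABLE log_hybrid (
  log_idx  BIGINT   NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id  BIGINT   NOT NULL,
  world_id TINYINT  NOT NULL,
  log_date DATETIME NOT NULL,
  jdata    JSON,
  KEY idx_user (user_id)
) ENGINE=InnoDB;

-- Variable attributes are packed into the JSON column.
INSERT INTO log_hybrid (user_id, world_id, log_date, jdata)
VALUES (1001, 1, NOW(), JSON_OBJECT('col1', 3500, 'str1', 'login'));

-- Filter on a relational column, read a detail from the JSON document.
SELECT log_idx, user_id, JSON_EXTRACT(jdata, '$.col1') AS col1
FROM log_hybrid
WHERE user_id = 1001;
```

New attributes can then be added as new JSON keys without an ALTER TABLE, which is the extensibility argument the deck returns to at the end.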

JSON

Column based table

Provided through a View for query convenience

Guideline: JSON data should not be stored in nested structures [arrays]
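
One way such a view might look, sketched against the log_json table from the performance section. The view name v_log_json and the str1 key are assumptions, and col1 is the key used in the earlier benchmark query.

```sql
-- Hypothetical view that exposes JSON members as ordinary columns.
CREATE VIEW v_log_json AS
SELECT log_idx, user_id, world_id, log_date,
       JSON_EXTRACT(jdata, '$.col1') AS col1,
       JSON_UNQUOTE(JSON_EXTRACT(jdata, '$.str1')) AS str1
FROM log_json;

-- Consumers query the view as if it were a column-based table.
SELECT user_id, col1 FROM v_log_json WHERE col1 BETWEEN 3336 AND 5990;
```

Keeping the documents flat, as the guideline suggests, keeps each path a simple $.key and lets every member map to one view column.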

JSON SELECT

More than 7 times slower
(presumably due to disk I/O load plus JSON internal search load)

[Charts: Column based vs. JSON based]

JSON WRITE

Speed: at most 20~30% slower than the column table
(presumably due to disk I/O load caused by the row length)

Size: the JSON based table takes 30% more space
(because of the per-row object keys plus the internal object key index)

[Charts: Column based vs. JSON based]

COLUMN? or JSON?

We need to be able to extend the schema with new columns! (minimize downtime)

Write performance is so-so?

Isn't the 'read' performance dropping too much?

Your Choice!!!

7. ROADMAP

Partial streaming for JSON/BLOB replication

Full-text / GIS indexes on VIRTUAL generated columns as well

In-place update of JSON/BLOB

(an update method in which the affected rows on the same page are not moved and their rowid does not change)

Performance improvements through Condition Pushdown
