MySQL 5.7 NF – Using the JSON Datatype

MySQL 5.7 JSON datatype

2015.11.29

정지원


Index

1. Why JSON

2. About JSON datatype

3. DDL & DML with JSON

4. Indexing JSON data

5. Data performance

6. Case study

7. ROADMAP

1. Why JSON

A convenient notation for listing objects

JSON data needs to be processed efficiently

Integration of RDB and schemaless data

Stronger support from existing databases for new applications

Reference: http://www.w3schools.com/json/


2. About JSON data type

Supported since MySQL 5.7

Binary format

Parsed and validated on insert only

Dictionary

Sorted object keys

Fast access to array cells by index

Supported types

All JSON types are supported

Numbers, strings, booleans

Objects, arrays

Extended

date, time, datetime, timestamp, and so on

Ex1> ["12:18:29.000000", "2015-07-29", "2015-07-29 12:18:29.000000"]

Ex2> SELECT JSON_ARRAY('a', 1, NOW());
+----------------------------------------+
| JSON_ARRAY('a', 1, NOW())              |
+----------------------------------------+
| ["a", 1, "2015-07-27 09:43:47.000000"] |
+----------------------------------------+
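The storage properties listed for the JSON type can be observed from a client session. A minimal sketch, assuming a scratch table called json_demo (a hypothetical name, not from the slides):

```sql
-- json_demo is a hypothetical scratch table used only for illustration.
CREATE TABLE json_demo (doc JSON);

-- Valid JSON text is parsed once and stored in the binary format.
INSERT INTO json_demo VALUES ('{"b": 1, "a": [true, null, "x"]}');

-- Malformed text is rejected at insert time with an "Invalid JSON text" error:
-- INSERT INTO json_demo VALUES ('{"b": 1');

-- Keys are normalized on storage, so they may come back in a
-- different order than they were written.
SELECT doc FROM json_demo;

-- JSON_VALID() checks text without storing it;
-- JSON_TYPE() reports the type of a JSON value, including temporal types.
SELECT JSON_VALID('{"b": 1}')              AS ok,
       JSON_VALID('{"b": 1')               AS broken,
       JSON_TYPE(JSON_EXTRACT(doc, '$.a')) AS type_of_a,
       JSON_TYPE(CAST(NOW() AS JSON))      AS type_of_now
FROM json_demo;
```

Because validation happens on insert, later reads do not need to re-parse the document.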

JSON column length limit

Limited by max_allowed_packet
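Since the size of a stored JSON document is bounded by max_allowed_packet, it can help to check the value before loading large documents. A minimal sketch; the 64 MB figure is an arbitrary example value, not a recommendation from the slides:

```sql
-- Check the current limit (in bytes).
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it to 64 MB for connections opened after this statement.
-- Requires the SUPER privilege; 67108864 = 64 * 1024 * 1024.
SET GLOBAL max_allowed_packet = 67108864;
```

The client program has its own max_allowed_packet setting, so both sides may need to be adjusted.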


Function List

https://dev.mysql.com/doc/refman/5.7/en/json-functions.html

3. DDL & DML with JSON

CREATE & INSERT

create table t1
(
  data JSON  -- data type (JSON)
);

insert into t1(data)
values
 ('{"series":1}')
,('{"series":7}')
,('{"series":3}')
,(JSON_QUOTE('some, might be formatted,{text} with "quotes"'))
;

select * from t1;
+---------------------------------------------------+
| data                                              |
+---------------------------------------------------+
| {"series": 1}                                     |
| {"series": 7}                                     |
| {"series": 3}                                     |
| "some, might be formatted,{text} with \"quotes\"" |
+---------------------------------------------------+
12 rows in set (0.00 sec)

SELECT

select * from t1 where json_extract(data,"$.series") >= 3;
+----------------+
| data           |
+----------------+
| {"series": 3}  |
| {"series": 7}  |
+----------------+

select * from t1 where data -> "$.series" >= 3; -- [5.7.9~] inlined json path
+----------------------------------+------+
| data                             | id   |
+----------------------------------+------+
| {"series": 3}                    |    7 |
| {"series": 7}                    |    3 |
+----------------------------------+------+

select * from t1 where data >= json_object("series",3);
+----------------------------------+------+
| data                             | id   |
+----------------------------------+------+
| {"series": 3}                    |    7 |
| {"series": 7}                    |    3 |
| {"a": "valid", "json": ["text"]} | NULL | -- ??
+----------------------------------+------+
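On the row marked "-- ??": the MySQL reference manual describes the ordering between two non-equal JSON objects as unspecified but deterministic, so `>=` against a whole object is not a meaningful range filter. A minimal sketch of the safer pattern, reusing the t1 table from the slides:

```sql
-- Compare the extracted scalar instead of the whole document.
-- Rows that lack a "series" key extract to NULL and drop out of the result.
SELECT data
FROM t1
WHERE JSON_EXTRACT(data, '$.series') >= 3;

-- To see why a row matched or not, inspect what each document extracts to.
SELECT data,
       JSON_TYPE(data)                AS doc_type,
       JSON_EXTRACT(data, '$.series') AS series
FROM t1;
```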

UPDATE

create table gm_friends
(
  uid bigint primary key
 ,friend_uid json -- friend list
);

set @friend := '[113]'; -- add a friend

insert into gm_friends values (111 , @friend)
on duplicate key update friend_uid = json_merge(friend_uid,@friend);

select * from gm_friends where uid=111;
+-----+------------+
| uid | friend_uid |
+-----+------------+
| 111 | [112, 113] | -- friend list of user 111
+-----+------------+
1 row in set (0.00 sec)
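json_merge appends whatever it is given, so adding the same friend twice produces a duplicate entry. A hedged sketch of two related functions that may help, run against the gm_friends table from the slide (the friend id 114 is an arbitrary example value):

```sql
-- Append a single friend without building a JSON array in a user variable.
UPDATE gm_friends
SET friend_uid = JSON_ARRAY_APPEND(friend_uid, '$', 114)
WHERE uid = 111;

-- is_friend is 1 if 113 is already in the list, 0 otherwise;
-- friend_count is the number of elements in the array.
SELECT JSON_CONTAINS(friend_uid, '113') AS is_friend,
       JSON_LENGTH(friend_uid)          AS friend_count
FROM gm_friends
WHERE uid = 111;
```

Checking JSON_CONTAINS first is one way to avoid duplicates before appending.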

CTAS

create table friend_list
as
select 100 user_id, 200 friend_id union all
select 100 user_id, 300 friend_id union all
select 200 user_id, 100 friend_id union all
select 200 user_id, 300 friend_id union all
select 200 user_id, 400 friend_id;

select * from friend_list;
+---------+-----------+
| user_id | friend_id |
+---------+-----------+
|     100 |       200 |
|     100 |       300 |
|     200 |       100 |
|     200 |       300 |
|     200 |       400 |
+---------+-----------+

create table t2
as
select user_id
     , json_object('lst'
                  ,json_array(group_concat(friend_id)))
       as friend_lst
from friend_list
group by user_id;

select * from t2;
+---------+--------------------------+
| user_id | friend_lst               |
+---------+--------------------------+
|     100 | {"lst": ["200,300"]}     |
|     200 | {"lst": ["100,300,400"]} |
+---------+--------------------------+

select JSON_SEARCH(friend_lst, 'all', '200,300')
from t2
where user_id = 100;
+-------------------------------------------+
| JSON_SEARCH(friend_lst, 'all', '200,300') |
+-------------------------------------------+
| "$.lst[0]"                                |
+-------------------------------------------+

select user_id
     , friend_lst
     , JSON_EXTRACT(friend_lst, "$.lst") as s1
     , JSON_EXTRACT(friend_lst, "$.lst[0]") as s2
     , JSON_UNQUOTE(JSON_EXTRACT(friend_lst, "$.lst[0]")) as s3
from t2
where user_id = 100;
+---------+----------------------+-------------+-----------+---------+
| user_id | friend_lst           | s1          | s2        | s3      |
+---------+----------------------+-------------+-----------+---------+
|     100 | {"lst": ["200,300"]} | ["200,300"] | "200,300" | 200,300 |
+---------+----------------------+-------------+-----------+---------+
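As the output shows, json_array(group_concat(...)) yields a single string element ("200,300"), not an array of numbers. If one element per friend is wanted, one option on MySQL 5.7 is to assemble the array text and cast it to JSON. A minimal sketch; t3 is a hypothetical table name:

```sql
-- t3 is a hypothetical table; friend_list is the table from the slide.
CREATE TABLE t3 AS
SELECT user_id,
       JSON_OBJECT(
         'lst',
         -- Build the text [200,300] and parse it as a JSON array.
         CAST(CONCAT('[', GROUP_CONCAT(friend_id), ']') AS JSON)
       ) AS friend_lst
FROM friend_list
GROUP BY user_id;

-- Each friend is now an addressable array element.
SELECT user_id,
       JSON_EXTRACT(friend_lst, '$.lst[0]')      AS first_friend,
       JSON_CONTAINS(friend_lst, '300', '$.lst') AS has_300
FROM t3;
```

GROUP_CONCAT output is truncated at group_concat_max_len, so long lists may need that variable raised before the cast succeeds.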

JOIN

create table t2
(
  data JSON
);

insert into t2(data)
values
 ('{"series":[11, 1, 100]}')
,('{"series":[22, 7 ]}')
,('{"series":[33, 3, 200]}');

select * from t2;
+--------------------------+
| data                     |
+--------------------------+
| {"series": [11, 1, 100]} |
| {"series": [22, 7]}      |
| {"series": [33, 3, 200]} |
+--------------------------+

select *
from t1, t2
where t1.data -> "$.series"
    = t2.data -> "$.series[1]";
+---------------+--------------------------+
| data          | data                     |
+---------------+--------------------------+
| {"series": 1} | {"series": [11, 1, 100]} |
| {"series": 7} | {"series": [22, 7]}      |
| {"series": 3} | {"series": [33, 3, 200]} |
+---------------+--------------------------+


4. Indexing JSON data

JSON columns cannot be indexed.

You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column. See "Secondary Indexes and Virtual Generated Columns" for a detailed example.

Generated Column (=Virtual Column)

MySQL supports indexes on generated columns. For example:

CREATE TABLE t1
(
  f1 INT
, gc INT AS (f1 + 1) STORED
, INDEX (gc)
);

The generated column, gc, is defined as the expression f1 + 1. The column is also indexed, and the optimizer can take that index into account during execution plan construction.

https://dev.mysql.com/doc/refman/5.7/en/create-table.html#create-table-secondary-indexes-virtual-columns
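To see the indexed generated column at work, the manual's example can be extended with a few rows and an EXPLAIN. A minimal sketch; the table is renamed to gc_demo (a hypothetical name) so it does not clash with the JSON table t1 used elsewhere in these slides:

```sql
-- gc_demo is a hypothetical name; the definition mirrors the manual's example.
CREATE TABLE gc_demo
(
  f1 INT
, gc INT AS (f1 + 1) STORED
, INDEX (gc)
);

-- Only base columns are written; gc is computed by the server.
INSERT INTO gc_demo (f1) VALUES (1), (5), (9);

-- Filtering on the generated column lets the optimizer consider INDEX (gc).
EXPLAIN SELECT f1, gc FROM gc_demo WHERE gc > 5;

-- The optimizer can also match an expression that is identical
-- to the generated column definition.
EXPLAIN SELECT f1 FROM gc_demo WHERE f1 + 1 > 5;
```

Whether the index is actually chosen depends on table size and statistics, so the plans are best checked on realistic data.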

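GENERATED COLUMN: VIRTUAL vs STORED

VIRTUAL

- Generated column values are not physically stored
  => fast insert / update
- SELECT
  the value is computed every time the column is read
- Index
  only secondary indexes can be created
  only btree is supported
- Adding a column
  no table rebuild

STORED

- Generated column values are physically stored
- Index
  both primary and secondary indexes are possible
  btree, fts, and gis are supported
- Adding a column
  table rebuild required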
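The rebuild difference between the two kinds shows up when a column is added to an existing table. A minimal sketch, assuming a hypothetical table log_demo with a JSON column jdata and a "col1" key inside it:

```sql
-- log_demo, col1_v, and col1_s are hypothetical names.
CREATE TABLE log_demo (
  log_idx BIGINT PRIMARY KEY AUTO_INCREMENT,
  jdata   JSON
);

-- VIRTUAL: nothing is materialized, so the change can be done in place.
ALTER TABLE log_demo
  ADD COLUMN col1_v BIGINT AS (JSON_EXTRACT(jdata, '$.col1')) VIRTUAL,
  ALGORITHM = INPLACE;

-- STORED: the value must be computed and written for every row,
-- so the table is rebuilt.
ALTER TABLE log_demo
  ADD COLUMN col1_s BIGINT AS (JSON_EXTRACT(jdata, '$.col1')) STORED,
  ALGORITHM = COPY;
```

Stating ALGORITHM explicitly makes the server return an error instead of silently falling back to a table copy.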


Creating an index using a GENERATED COLUMN

create table `t1` (
  `data` json,
  `id` int(11) AS (JSON_EXTRACT(data,"$.id")) STORED,
  `id2` int(11) AS (JSON_EXTRACT(data,"$.series")) VIRTUAL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

alter table t1 add primary key (id);

create index id_idx on t1(id2);

show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `data` json DEFAULT NULL,
  `id` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.id")) STORED NOT NULL,
  `id2` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.series")) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `id_idx` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
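The slide extracts numeric values. For string values, JSON_EXTRACT returns a JSON string that still carries its double quotes, so the generated column usually needs JSON_UNQUOTE as well. A minimal sketch; user_doc, name, and name_idx are hypothetical names:

```sql
-- user_doc, name, and name_idx are hypothetical names.
CREATE TABLE user_doc (
  data JSON,
  name VARCHAR(50) AS (JSON_UNQUOTE(JSON_EXTRACT(data, '$.name'))) VIRTUAL,
  INDEX name_idx (name)
);

INSERT INTO user_doc (data)
VALUES ('{"name": "kim"}'), ('{"name": "lee"}');

-- Without JSON_UNQUOTE the column would hold "kim" including the quotes,
-- and this comparison would not match.
SELECT data FROM user_doc WHERE name = 'kim';
```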


Creating an index using a GENERATED COLUMN: execution plan

explain select data from t1 where JSON_EXTRACT(data,"$.series") between 3 and 5;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | id_idx        | NULL | NULL    | NULL |   10 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

explain select data from t1 where id between 3 and 5;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    3 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

desc t1;
+-------+---------+------+-----+---------+-------------------+
| Field | Type    | Null | Key | Default | Extra             |
+-------+---------+------+-----+---------+-------------------+
| data  | json    | YES  |     | NULL    |                   |
| id    | int(11) | NO   | PRI | NULL    | STORED GENERATED  |
| id2   | int(11) | YES  | MUL | NULL    | VIRTUAL GENERATED |
+-------+---------+------+-----+---------+-------------------+

select * from t1;
+-------------------------+----+------+
| data                    | id | id2  |
+-------------------------+----+------+
| {"id": 0, "series": 11} |  0 |   11 |
| {"id": 1, "series": 10} |  1 |   10 |
| {"id": 3, "series": 8}  |  3 |    8 |
| {"id": 4, "series": 7}  |  4 |    7 |
+-------------------------+----+------+
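In the first plan, possible_keys lists id_idx even though the query only names the JSON_EXTRACT expression, which suggests the optimizer matched the expression to id2 and then preferred a full scan, plausible for a table this small. A hedged sketch of follow-up checks one could run against the same t1; no particular plan is claimed here:

```sql
-- Name the generated column directly.
EXPLAIN SELECT data FROM t1 WHERE id2 BETWEEN 3 AND 5;

-- Ask for the secondary index explicitly to compare the resulting plan.
EXPLAIN SELECT data FROM t1 FORCE INDEX (id_idx) WHERE id2 BETWEEN 3 AND 5;

-- Select only indexed columns; an InnoDB secondary index also carries
-- the primary key, so id and id2 can both be read from id_idx.
EXPLAIN SELECT id, id2 FROM t1 WHERE id2 BETWEEN 3 AND 5;
```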

5. Data performance

Regular table

desc log_col;
+----------+---------------+------+-----+---------+----------------+
| Field    | Type          | Null | Key | Default | Extra          |
+----------+---------------+------+-----+---------+----------------+
| log_idx  | bigint(20)    | NO   | PRI | NULL    | auto_increment |
| user_id  | bigint(20)    | NO   | MUL | NULL    |                |
| world_id | tinyint(4)    | NO   |     | NULL    |                |
| log_date | datetime      | NO   |     | NULL    |                |
| col1     | bigint(20)    | YES  |     | NULL    |                |

| str1     | varchar(50)   | YES  |     | NULL    |                |

+----------+---------------+------+-----+---------+----------------+

JSON table

desc log_json;
+----------+------------+------+-----+---------+----------------+
| Field    | Type       | Null | Key | Default | Extra          |
+----------+------------+------+-----+---------+----------------+

| jdata    | JSON       | YES  |     | NULL    |                |
+----------+------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

Table size

+--------------+------------------+------------+---------------+
| table_schema | table_name       | table_rows | DB Size in MB |
+--------------+------------------+------------+---------------+
| test         | log_col          |     994788 |         111.2 | # regular table
| test         | log_json         |     992943 |         163.3 | # JSON table (about 47% larger)
+--------------+------------------+------------+---------------+
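The slides do not show the query behind the size table. A minimal sketch of one common way to produce such figures, assuming the size is data plus index length from information_schema:

```sql
-- Assumption: "DB Size in MB" = (data_length + index_length) in megabytes.
SELECT table_schema,
       table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS `DB Size in MB`
FROM information_schema.tables
WHERE table_schema = 'test'
  AND table_name IN ('log_col', 'log_json');
```

For InnoDB, table_rows is an estimate, which would explain why tables loaded with the same data report slightly different row counts.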

INSERT

Table     Time
Regular   4 min 6.55 sec
JSON      4 min 14.62 sec

SELECT

Table     Query / Time
Regular   select count(col1) from log_col where col1 between 3336 and 5990;
          0.24 sec
JSON      select count(json_extract(jdata,"$.col1")) from log_json where json_extract(jdata,"$.col1") >= 3336 and json_extract(jdata,"$.col1") <= 5990;
          2.13 sec

SELECT after creating an index

create index idx01 on log_col(col1); -- 1.07 sec

Table     Query / Time
Regular   select count(col1) from log_col where col1 between 3336 and 5990;
          0.2 sec
JSON      an index cannot be created
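The slides do not show how log_json was loaded. One plausible way to build a comparable JSON table from the column-based one is JSON_OBJECT in an INSERT ... SELECT. A sketch only: the column list of log_json and the choice of keys (col1, str1) are assumptions based on the partial desc output:

```sql
-- Assumption: log_json has user_id, world_id, log_date, and jdata columns,
-- and only col1 and str1 are folded into the JSON document.
INSERT INTO log_json (user_id, world_id, log_date, jdata)
SELECT user_id,
       world_id,
       log_date,
       JSON_OBJECT('col1', col1, 'str1', str1)
FROM log_col;
```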

STORED table

desc log_json_store;
+----------+------------+------+-----+---------+------------------+
| Field    | Type       | Null | Key | Default | Extra            |
+----------+------------+------+-----+---------+------------------+

| id       | bigint(20) | YES  |     | NULL    | STORED GENERATED |
| jdata    | json       | YES  |     | NULL    |                  |
+----------+------------+------+-----+---------+------------------+

VIRTUAL table

desc log_json_virtual;
+----------+------------+------+-----+---------+-------------------+
| Field    | Type       | Null | Key | Default | Extra             |
+----------+------------+------+-----+---------+-------------------+

| id       | bigint(20) | YES  |     | NULL    | VIRTUAL GENERATED |
| jdata    | json       | YES  |     | NULL    |                   |
+----------+------------+------+-----+---------+-------------------+

Table size

+--------------+------------------+------------+---------------+
| table_schema | table_name       | table_rows | DB Size in MB |
+--------------+------------------+------------+---------------+
| test         | log_json         |     992943 |         163.3 |
| test         | log_json_store   |     991134 |         197.8 | # STORED table
| test         | log_json_virtual |     989866 |         168.8 | # VIRTUAL table
+--------------+------------------+------------+---------------+

INSERT (1 million rows)

Table     Time
STORED    4 min 27.99 sec
VIRTUAL   4 min 12.83 sec

SELECT

Table     Query / Time
STORED    select count(id) from log_json_store where id between 3336 and 5990;
          0.21 sec
VIRTUAL   select count(id) from log_json_virtual where id between 3336 and 5990;
          1.93 sec

SELECT after creating an index

create index idx01 on log_json_store(id); -- 0.81 sec
create index idx01 on log_json_virtual(id); -- 1.38 sec

Table     Query / Time
STORED    select count(id) from log_json_store where id between 3336 and 5990;
          0.0 sec
VIRTUAL   select count(id) from log_json_virtual where id between 3336 and 5990;
          0.0 sec

Why JSON rather than TEXT/VARCHAR?

select sum(id) from log_text_stored;

Table          Time
JSON STORED    0.54 sec
JSON VIRTUAL   2.43 sec
TEXT STORED    0.66 sec
TEXT VIRTUAL   8.02 sec

VIRTUAL tables

desc log_text_virtual;
+----------+------------+------+-----+---------+-------------------+
| Field    | Type       | Null | Key | Default | Extra             |
+----------+------------+------+-----+---------+-------------------+
| log_idx  | bigint(20) | NO   | PRI | NULL    | auto_increment    |
| user_id  | bigint(20) | NO   | MUL | NULL    |                   |
| world_id | tinyint(4) | NO   |     | NULL    |                   |
| log_date | datetime   | NO   |     | NULL    |                   |
| id       | bigint(20) | YES  |     | NULL    | VIRTUAL GENERATED |
| jdata    | text       | YES  |     | NULL    |                   |
+----------+------------+------+-----+---------+-------------------+

desc log_json_virtual;
+----------+------------+------+-----+---------+-------------------+
| Field    | Type       | Null | Key | Default | Extra             |
+----------+------------+------+-----+---------+-------------------+
| log_idx  | bigint(20) | NO   | PRI | NULL    | auto_increment    |
| user_id  | bigint(20) | NO   | MUL | NULL    |                   |
| world_id | tinyint(4) | NO   |     | NULL    |                   |
| log_date | datetime   | NO   |     | NULL    |                   |
| id       | bigint(20) | YES  |     | NULL    | VIRTUAL GENERATED |
| jdata    | json       | YES  |     | NULL    |                   |
+----------+------------+------+-----+---------+-------------------+

TEXT/VARCHAR does not separately maintain position information for the object keys and array elements inside the value

=> on select, the position within each row has to be located again

6. Case study

Column Based Table

Using the JSON type

* Items excluded from the JSON content

1) Predictable columns

2) Columns that may be important in queries

3) Columns that correspond to dimensions in analysis

“HYBRID TABLE”

JSON

Column based table

Provided as a view for query convenience

Guideline: do not store JSON data in nested structures (arrays)
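A view of this kind can hide the JSON path syntax from people querying the hybrid table. A minimal sketch; the view name v_log_json, the base table columns, and the extracted keys are assumptions for illustration, not the schema used in the case study:

```sql
-- v_log_json and the extracted keys (col1, str1) are hypothetical.
CREATE VIEW v_log_json AS
SELECT log_idx,
       user_id,
       world_id,
       log_date,
       JSON_EXTRACT(jdata, '$.col1')               AS col1,
       JSON_UNQUOTE(JSON_EXTRACT(jdata, '$.str1')) AS str1
FROM log_json;

-- Consumers query plain column names; regular columns stay filterable as usual.
SELECT user_id, col1, str1
FROM v_log_json
WHERE user_id = 111;
```

Keeping the JSON flat, as the guideline asks, means every exposed column maps to a simple `$.key` path like the ones above.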

JSON SELECT

More than 7 times slower

(∵ presumably disk I/O load plus JSON internal search overhead)

[Charts: Column based vs. JSON based]

JSON WRITE

Speed: slower than the column table, but within 20~30%

(∵ presumably disk I/O load due to row length)

Size: the JSON based table takes 30% more space

(∵ object keys stored per row + internal object key index)

[Charts: Column based vs. JSON based]

We need extensibility for adding columns! (minimize downtime)

Write performance is so-so?

Isn't “read” performance dropping too much?

COLUMN? or JSON?

Your Choice!!!

7. ROADMAP

Partial streaming for JSON/BLOB replication

Full-text / GIS indexes on VIRTUAL generated columns as well

In-place update support for JSON/BLOB

(an update method in which the affected rows on the same page are not moved and their rowid does not change)

Performance improvement through condition pushdown
