MySQL 5.7 JSON datatype
2015.11.29
정지원
Index
1. Why JSON
2. About JSON datatype
3. DDL & DML with JSON
4. Indexing JSON data
5. Data performance
6. Use case
7. ROADMAP
1. Why JSON
A convenient format for listing objects
JSON data needs to be processed efficiently
Integration of RDB and schemaless data
Better support from the existing database for new applications
Reference: http://www.w3schools.com/json/
2. About JSON data type
Supported since MySQL 5.7
Binary format
Parse and validation on insert only
Dictionary
Sorted objects' keys
Fast access to array cells by index
Supported types
All JSON types are supported
Numbers, strings, booleans
Objects, arrays
Extended
date, time, datetime, timestamp, etc.
Ex1> ["12:18:29.000000", "2015-07-29", "2015-07-29 12:18:29.000000"]
Ex2> SELECT JSON_ARRAY('a', 1, NOW());
+----------------------------------------+
| JSON_ARRAY('a', 1, NOW())              |
+----------------------------------------+
| ["a", 1, "2015-07-27 09:43:47.000000"] |
+----------------------------------------+
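The properties listed on this slide (validation at insert time, normalized object keys, extended types) can be tried out with a few statements. This is only a minimal sketch: the table name json_demo is hypothetical and not part of the deck, and the statements assume a MySQL 5.7 server with the JSON type available.

```sql
-- hypothetical demo table, not part of the original deck
CREATE TABLE json_demo (doc JSON);

-- valid text is parsed once at insert time and stored in the binary format
INSERT INTO json_demo VALUES ('{"b": 2, "a": 1}');

-- object keys come back in the server's normalized order, not the input order
SELECT doc FROM json_demo;

-- invalid JSON text is rejected at insert time instead of being stored;
-- uncomment to see the error
-- INSERT INTO json_demo VALUES ('{"a": 1');

-- JSON_VALID() checks text without inserting it: returns 1, then 0
SELECT JSON_VALID('{"a": 1}'), JSON_VALID('{"a": 1');

-- JSON_TYPE() reports how the temporal value is typed inside the document
SELECT JSON_TYPE(JSON_EXTRACT(JSON_ARRAY('a', 1, NOW()), '$[2]'));
```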
2. About JSON data type
JSON column length limit
A JSON document stored in a JSON column cannot exceed max_allowed_packet
2. About JSON data type
Function List
https://dev.mysql.com/doc/refman/5.7/en/json-functions.html
3. DDL & DML with JSON
CREATE & INSERT
create table t1
(
data JSON -- data type (JSON)
);
insert into t1(data)
values
('{"series":1}')
,('{"series":7}')
,('{"series":3}')
,(JSON_QUOTE('some, might be formatted,{text} with "quotes"'))
;
select * from t1;
+---------------------------------------------------+
| data                                              |
+---------------------------------------------------+
| {"series": 1}                                     |
| {"series": 7}                                     |
| {"series": 3}                                     |
| "some, might be formatted,{text} with \"quotes\"" |
+---------------------------------------------------+
12 rows in set (0.00 sec)
3. DDL & DML with JSON
SELECT
select * from t1 where json_extract(data,"$.series") >= 3;
+----------------+
| data           |
+----------------+
| {"series": 3}  |
| {"series": 7}  |
+----------------+
select * from t1 where data -> "$.series" >= 3; -- [5.7.9~] inline JSON path
+----------------------------------+------+
| data                             | id   |
+----------------------------------+------+
| {"series": 3}                    |    7 |
| {"series": 7}                    |    3 |
+----------------------------------+------+
select * from t1 where data >= json_object("series",3);
+----------------------------------+------+
| data                             | id   |
+----------------------------------+------+
| {"series": 3}                    |    7 |
| {"series": 7}                    |    3 |
| {"a": "valid", "json": ["text"]} | NULL | -- ??
+----------------------------------+------+
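A likely explanation for the row marked "??": when two whole JSON objects are compared with >= or <=, MySQL documents equality between objects as well defined, but the ordering of unequal objects as unspecified (though deterministic). An unrelated object can therefore pass the filter. A sketch of the safer forms, reusing the deck's t1 table:

```sql
-- compare the extracted scalar rather than the whole document;
-- rows without a "series" key yield NULL and are filtered out
SELECT data FROM t1 WHERE JSON_EXTRACT(data, '$.series') >= 3;

-- equality between whole objects is well defined (same keys, same values)
SELECT data FROM t1 WHERE data = JSON_OBJECT('series', 3);
```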
3. DDL & DML with JSON
UPDATE
create table gm_friends
(
uid bigint primary key
,friend_uid json -- friend list
);
set @friend := '[113]'; -- add a friend
insert into gm_friends values (111 , @friend)
on duplicate key update friend_uid = json_merge(friend_uid,@friend);
select * from gm_friends where uid=111;
+-----+------------+
| uid | friend_uid |
+-----+------------+
| 111 | [112, 113] | -- friend list of user 111
+-----+------------+
1 row in set (0.00 sec)
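Two related functions may be useful next to the upsert above; this is a sketch against the deck's gm_friends table, not something the deck itself shows. JSON_ARRAY_APPEND adds one element without building a JSON array by hand, and JSON_CONTAINS tests membership. Note also that JSON_MERGE was deprecated in later 5.7 releases (5.7.22) in favor of JSON_MERGE_PRESERVE, which keeps the same behavior.

```sql
-- append a single friend id to the end of the top-level array
UPDATE gm_friends
   SET friend_uid = JSON_ARRAY_APPEND(friend_uid, '$', 114)
 WHERE uid = 111;

-- is 113 already in the friend list of user 111? (1 = yes, 0 = no)
-- the candidate is passed as JSON text, so '113' is the number 113
SELECT JSON_CONTAINS(friend_uid, '113') AS is_friend
  FROM gm_friends
 WHERE uid = 111;
```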
3. DDL & DML with JSON
CTAS
create table friend_list
as
select 100 user_id, 200 friend_id union all
select 100 user_id, 300 friend_id union all
select 200 user_id, 100 friend_id union all
select 200 user_id, 300 friend_id union all
select 200 user_id, 400 friend_id;
select * from friend_list;
+---------+-----------+
| user_id | friend_id |
+---------+-----------+
| 100 | 200 |
| 100 | 300 |
| 200 | 100 |
| 200 | 300 |
| 200 | 400 |
+---------+-----------+
create table t2
as
select user_id
, json_object('lst'
,json_array(group_concat(friend_id)))
as friend_lst
from friend_list
group by user_id;
select * from t2;
+---------+--------------------------+
| user_id | friend_lst |
+---------+--------------------------+
| 100 | {"lst": ["200,300"]} |
| 200 | {"lst": ["100,300,400"]} |
+---------+--------------------------+
select JSON_SEARCH(friend_lst, 'all', '200,300')
from t2
where user_id = 100;
+-------------------------------------------+
| JSON_SEARCH(friend_lst, 'all', '200,300') |
+-------------------------------------------+
| "$.lst[0]" |
+-------------------------------------------+
select user_id
, friend_lst
, JSON_EXTRACT(friend_lst, "$.lst") as s1
, JSON_EXTRACT(friend_lst, "$.lst[0]") as s2
, JSON_UNQUOTE(JSON_EXTRACT(friend_lst, "$.lst[0]")) as s3
from t2
where user_id = 100;
+---------+----------------------+-------------+-----------+---------+
| user_id | friend_lst | s1 | s2 | s3 |
+---------+----------------------+-------------+-----------+---------+
| 100 | {"lst": ["200,300"]} | ["200,300"] | "200,300" | 200,300 |
+---------+----------------------+-------------+-----------+---------+
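In the output above, group_concat() yields one comma-separated string, so "lst" holds a single string element ("200,300") rather than an array of ids. If a real array is wanted, one option is JSON_ARRAYAGG, which was added in MySQL 5.7.22 and is therefore newer than this deck. The table name t2_arr below is hypothetical.

```sql
-- requires MySQL 5.7.22 or later (JSON_ARRAYAGG is newer than this deck)
CREATE TABLE t2_arr
AS
SELECT user_id
     , JSON_OBJECT('lst', JSON_ARRAYAGG(friend_id)) AS friend_lst
  FROM friend_list
 GROUP BY user_id;

-- with one array element per friend, membership can be tested per id
SELECT user_id
  FROM t2_arr
 WHERE JSON_CONTAINS(friend_lst, '200', '$.lst');
```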
3. DDL & DML with JSON
JOIN
create table t2
(
data JSON
);
insert into t2(data)
values
('{"series":[11, 1, 100]}')
,('{"series":[22, 7 ]}')
,('{"series":[33, 3, 200]}');
select * from t2;
+--------------------------+
| data |
+--------------------------+
| {"series": [11, 1, 100]} |
| {"series": [22, 7]} |
| {"series": [33, 3, 200]} |
+--------------------------+
select *
from t1, t2
where t1.data -> "$.series"
= t2.data -> "$.series[1]";
+---------------+--------------------------+
| data | data |
+---------------+--------------------------+
| {"series": 1} | {"series": [11, 1, 100]} |
| {"series": 7} | {"series": [22, 7]} |
| {"series": 3} | {"series": [33, 3, 200]} |
+---------------+--------------------------+
4. Indexing JSON data
JSON columns cannot be indexed.
You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column. See "Secondary Indexes and Virtual Generated Columns" for a detailed example.
Generated Column (=Virtual Column)
MySQL supports indexes on generated columns. For example:
CREATE TABLE t1
(
f1 INT
, gc INT AS (f1 + 1) STORED
, INDEX (gc)
);
The generated column, gc, is defined as the expression f1 + 1.
The column is also indexed and the optimizer can take that index into account during execution plan construction.
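4. Indexing JSON data
GENERATED COLUMN
VIRTUAL
- The generated column's data is not actually stored
=> fast insert / update
- SELECT
the column value is computed every time it is read
- Index
only secondary indexes can be created
only B-tree is supported
- Adding a column
no table rebuild
VS
STORED
- The generated column's data is actually stored
- Index
both primary & secondary indexes are possible
B-tree, FTS, GIS supported
- Adding a column
table rebuild required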
4. Indexing JSON data
Creating an index using a GENERATED COLUMN
create table `t1` (
`data` json,
`id` int(11) AS (JSON_EXTRACT(data,"$.id")) STORED,
`id2` int(11) AS (JSON_EXTRACT(data,"$.series")) VIRTUAL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
alter table t1 add primary key (id);
Create index id_idx on t1(id2);
show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`data` json DEFAULT NULL,
`id` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.id")) STORED NOT NULL,
`id2` int(11) GENERATED ALWAYS AS (JSON_EXTRACT(data,"$.series")) VIRTUAL,
PRIMARY KEY (`id`),
KEY `id_idx` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
4. Indexing JSON data
explain select data from t1 where JSON_EXTRACT(data,"$.series") between 3 and 5;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | id_idx | NULL | NULL | NULL | 10 | 11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
explain select data from t1 where id between 3 and 5;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 3 | 100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Creating an index using a GENERATED COLUMN - execution plan
desc t1;
+-------+---------+------+-----+---------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------------------+
| data | json | YES | | NULL | |
| id | int(11) | NO | PRI | NULL | STORED GENERATED |
| id2 | int(11) | YES | MUL | NULL | VIRTUAL GENERATED |
+-------+---------+------+-----+---------+-------------------+
select * from t1;
+-------------------------+----+------+
| data | id | id2 |
+-------------------------+----+------+
| {"id": 0, "series": 11} | 0 | 11 |
| {"id": 1, "series": 10} | 1 | 10 |
| {"id": 3, "series": 8} | 3 | 8 |
| {"id": 4, "series": 7} | 4 | 7 |
+-------------------------+----+------+
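In the first plan above, id_idx is listed under possible_keys but is not chosen (type ALL), while the filter on the STORED column id uses PRIMARY. A natural follow-up, not shown in the deck, is to filter on the VIRTUAL column id2 directly. With only a handful of rows the optimizer may still prefer a full scan, so the plan should be checked rather than assumed.

```sql
-- filter on the indexed generated column itself instead of repeating
-- the JSON_EXTRACT expression; inspect whether key = id_idx is chosen
EXPLAIN SELECT data FROM t1 WHERE id2 BETWEEN 3 AND 5;
```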
5. Data performance
Regular table
desc log_col;
+----------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------------+------+-----+---------+----------------+
| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |
| user_id | bigint(20) | NO | MUL | NULL | |
| world_id | tinyint(4) | NO | | NULL | |
| log_date | datetime | NO | | NULL | |
| col1 | bigint(20) | YES | | NULL | |
| col2 | bigint(20) | YES | | NULL | |
| col3 | bigint(20) | YES | | NULL | |
| col4 | bigint(20) | YES | | NULL | |
| col5 | bigint(20) | YES | | NULL | |
| str1 | varchar(50) | YES | | NULL | |
| str2 | varchar(50) | YES | | NULL | |
| str3 | varchar(100) | YES | | NULL | |
| str4 | varchar(100) | YES | | NULL | |
| str5 | varchar(1000) | YES | | NULL | |
+----------+---------------+------+-----+---------+----------------+
14 rows in set (0.04 sec)
JSON table
desc log_json;
+----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+----------------+
| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |
| user_id | bigint(20) | NO | MUL | NULL | |
| world_id | tinyint(4) | NO | | NULL | |
| log_date | datetime | NO | | NULL | |
| jdata | JSON | YES | | NULL | |
+----------+------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
Table size
+--------------+------------------+------------+---------------+
| table_schema | table_name       | table_rows | DB Size in MB |
+--------------+------------------+------------+---------------+
| test         | log_col          |     994788 |         111.2 | # regular table
| test         | log_json         |     992943 |         163.3 | # JSON table (about 47% larger)
+--------------+------------------+------------+---------------+
5. Data performance
INSERT
| Table   | Time            |
| Regular | 4 min 6.55 sec  |
| JSON    | 4 min 14.62 sec |
SELECT
| Table   | Query | Time |
| Regular | select count(col1) from log_col where col1 between 3336 and 5990; | 0.24 sec |
| JSON    | select count(json_extract(jdata,"$.col1")) from log_json where json_extract(jdata,"$.col1") >= 3336 and json_extract(jdata,"$.col1") <= 5990; | 2.13 sec |
create index idx01 on log_col(col1); -- 1.07 sec
| Table   | Query | Time |
| Regular | select count(col1) from log_col where col1 between 3336 and 5990; | 0.2 sec |
| JSON    | an index cannot be created | |
STORED table
desc log_json_store;
+----------+------------+------+-----+---------+------------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+------------------+
| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |
| user_id | bigint(20) | NO | MUL | NULL | |
| world_id | tinyint(4) | NO | | NULL | |
| log_date | datetime | NO | | NULL | |
| id | bigint(20) | YES | | NULL | STORED GENERATED |
| jdata | json | YES | | NULL | |
+----------+------------+------+-----+---------+------------------+
6 rows in set (0.01 sec)
VIRTUAL table
desc log_json_virtual;
+----------+------------+------+-----+---------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------------------+
| log_idx | bigint(20) | NO | PRI | NULL | auto_increment |
| user_id | bigint(20) | NO | MUL | NULL | |
| world_id | tinyint(4) | NO | | NULL | |
| log_date | datetime | NO | | NULL | |
| id | bigint(20) | YES | | NULL | VIRTUAL GENERATED |
| jdata | json | YES | | NULL | |
+----------+------------+------+-----+---------+-------------------+
6 rows in set (0.00 sec)
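The deck shows only the desc output for these two tables, not their DDL. A sketch of a definition that would produce the VIRTUAL layout follows; the assumption that id is extracted from the "col1" key of jdata is mine (it matches the range used in the earlier col1 queries) and is not stated in the source. Replacing VIRTUAL with STORED would give the log_json_store variant.

```sql
-- assumption: id is taken from the "col1" key inside jdata
CREATE TABLE log_json_virtual (
  log_idx  BIGINT   NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id  BIGINT   NOT NULL,
  world_id TINYINT  NOT NULL,
  log_date DATETIME NOT NULL,
  -- computed on read; a generated column may reference a base column
  -- (jdata) that is declared later in the table
  id       BIGINT AS (JSON_EXTRACT(jdata, '$.col1')) VIRTUAL,
  jdata    JSON,
  KEY (user_id)
);
```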
5. Data performance
Table size
+--------------+------------------+------------+---------------+
| table_schema | table_name       | table_rows | DB Size in MB |
+--------------+------------------+------------+---------------+
| test         | log_json         |     992943 |         163.3 |
| test         | log_json_store   |     991134 |         197.8 | # STORED table
| test         | log_json_virtual |     989866 |         168.8 | # VIRTUAL table
+--------------+------------------+------------+---------------+
5. Data performance
INSERT (1 million rows)
| Table   | Time            |
| STORED  | 4 min 27.99 sec |
| VIRTUAL | 4 min 12.83 sec |
SELECT
| Table   | Query | Time |
| STORED  | select count(id) from log_json_store where id between 3336 and 5990; | 0.21 sec |
| VIRTUAL | select count(id) from log_json_virtual where id between 3336 and 5990; | 1.93 sec |
create index idx01 on log_json_store(id); -- 0.81 sec
create index idx01 on log_json_virtual(id); -- 1.38 sec
| Table   | Query | Time |
| STORED  | select count(id) from log_json_store where id between 3336 and 5990; | 0.0 sec |
| VIRTUAL | select count(id) from log_json_virtual where id between 3336 and 5990; | 0.0 sec |
5. Data performance
WHY JSON RATHER THAN TEXT/VARCHAR?
select sum(id) from log_text_stored;
| Table        | Time     |
| JSON STORED  | 0.54 sec |
| JSON VIRTUAL | 2.43 sec |
| TEXT STORED  | 0.66 sec |
| TEXT VIRTUAL | 8.02 sec |
VIRTUAL tables
desc log_text_virtual;
+----------+------------+------+-----+---------+-------------------+
| Field    | Type       | Null | Key | Default | Extra             |
+----------+------------+------+-----+---------+-------------------+
| log_idx  | bigint(20) | NO   | PRI | NULL    | auto_increment    |
| user_id  | bigint(20) | NO   | MUL | NULL    |                   |
| world_id | tinyint(4) | NO   |     | NULL    |                   |
| log_date | datetime   | NO   |     | NULL    |                   |
| id       | bigint(20) | YES  |     | NULL    | VIRTUAL GENERATED |
| jdata    | text       | YES  |     | NULL    |                   |
+----------+------------+------+-----+---------+-------------------+
desc log_json_virtual;
+----------+------------+------+-----+---------+-------------------+
| Field    | Type       | Null | Key | Default | Extra             |
+----------+------------+------+-----+---------+-------------------+
| log_idx  | bigint(20) | NO   | PRI | NULL    | auto_increment    |
| user_id  | bigint(20) | NO   | MUL | NULL    |                   |
| world_id | tinyint(4) | NO   |     | NULL    |                   |
| log_date | datetime   | NO   |     | NULL    |                   |
| id       | bigint(20) | YES  |     | NULL    | VIRTUAL GENERATED |
| jdata    | json       | YES  |     | NULL    |                   |
+----------+------------+------+-----+---------+-------------------+
6 rows in set (0.00 sec)
TEXT/VARCHAR: no position information is kept for the object keys / array elements inside the value
=> on select, the position within the row has to be located again
6. Use case
Column Based Table
6. Use case
Using the JSON type
* Items kept out of the JSON content
1) Predictable columns
2) Columns likely to matter in lookups
3) Columns that act as dimensions in analysis
"HYBRID TABLE"
6. Use case
JSON
Column based table
6. Use case
Exposed through a View for query convenience
Guideline: JSON data should not be stored in nested structures [arrays]
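A view of this kind could look like the sketch below. It reuses the log_json table from the performance section; the view name v_log_json and the JSON keys col1 and str1 are assumptions for illustration, since the deck does not show the actual view.

```sql
-- hypothetical view: fixed columns pass through, JSON keys become columns
CREATE VIEW v_log_json AS
SELECT log_idx
     , user_id
     , world_id
     , log_date
     , JSON_EXTRACT(jdata, '$.col1')               AS col1  -- numeric key (assumed)
     , JSON_UNQUOTE(JSON_EXTRACT(jdata, '$.str1')) AS str1  -- string key (assumed), quotes stripped
  FROM log_json;

-- consumers then query it like a column based table
SELECT user_id, col1, str1 FROM v_log_json WHERE user_id = 111;
```

Keeping the documents flat, as the guideline above suggests, is what lets each key map to one view column with a simple path.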
6. Use case
JSON SELECT
More than 7x slower
(∵ presumably Disk IO load + JSON internal search load)
[Charts: Column based vs. JSON based]
6. Use case
JSON WRITE
Speed: within 20~30% slower than the column table
(∵ presumably Disk IO load due to row length)
Size: the JSON based table takes 30% more space
(∵ per-row object keys + internal object key index)
[Charts: Column based vs. JSON based]
6. Use case
COLUMN? or JSON?
We need extensibility for adding columns! (minimize downtime)
Write performance is so-so?
Isn't "read" performance dropping too much?
Your Choice!!!
7. ROADMAP
Partial streaming for JSON/BLOB replication
Full-text / GIS indexes on VIRTUAL generated columns as well
In-place update support for JSON/BLOB
(an update method in which the affected rows on the same page are not moved and the rowid does not change)
Performance improvements through condition pushdown