New dict implementation in Python 3.6 Inada Naoki (@methane)

Compact ordered dict__k_lab_meeting_




About me

@methane

K-Labo, KLab Inc.

Python core developer

C, Go, Network (server) programming, MySQL clients

ISUCON 6 winner (see http://isucon.net/)


Table of contents

● dict in Python
● Python 3.5 implementation
● Python 3.6 implementation
● Toward Python 3.7


Dict in Python


Dict

Key-value storage, a.k.a. associative array, map, or hash.

x = {"foo": 42, "bar": 84}

print(x["foo"])  # => 42

Key features:

● Constant-time lookup
● Amortized constant-time insertion
● Support for custom (user-defined) key types
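Supporting user-defined keys means any object with consistent `__hash__` and `__eq__` methods can be stored in a dict. A minimal sketch; the `Point` class is a hypothetical example, not from the talk:

```python
class Point:
    """Hypothetical user-defined key type: equal points must have equal hashes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Compare by value, so a freshly built Point matches a stored key.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.x, self.y))


labels = {Point(1, 2): "a", Point(3, 4): "b"}
print(labels[Point(1, 2)])  # a: a new but equal object finds the stored key
```

The dict first narrows the search with the hash and then confirms a match with `__eq__`, which is why the two methods must agree.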


Dicts are everywhere in Python

x = 5          # the global namespace is a dict; this inserts 'x' into it
def add(a):    # inserts 'add' into the global dict
    return a + x   # looks up 'x' in the global dict
print(add(7))  # looks up 'print' and 'add' in the global dict ('print' is then found in the builtins dict)

A Python program contains many dicts.

Lookup speed is critical.

Insertion speed and memory usage are very important too.
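The claim that namespaces are dicts can be checked directly with `globals()`. A minimal sketch, assuming it runs as a top-level script; the `add` function mirrors the namespace example above:

```python
import builtins

x = 5  # stored in the module's global dict

def add(a):
    return a + x  # 'x' is looked up in the global dict at call time

g = globals()
print(type(g) is dict)          # the global namespace is a plain dict
print(g["x"], g["add"] is add)  # both names live in that dict
print(add(7))                   # 12

# 'print' is not stored in the global dict; when the global lookup misses,
# Python falls back to the builtins module's dict.
print(vars(builtins)["print"] is print)
```

Every global name access in `add` and at module level is therefore a dict lookup, which is why lookup speed matters so much.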


Python 3.5 implementation


d["foo"] = "spam"  # insert a new item
hash("foo") = 42   # suppose the hash value is 42 (an illustrative value)
42 % 8 = 2         # hash value % hash table size = 2

Hash table (slots 0-7; entry = key, hash, value): all slots empty
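The slot arithmetic for the three keys used in this example can be checked directly. The hash values are the illustrative ones from the slides (real string hashes are randomized per process). The `h & (SIZE - 1)` check is background, not from the slides: for a power-of-two table size the modulo equals a bit mask of the low bits, which is how CPython computes the starting slot.

```python
# Illustrative hash values used on the slides (the real hash("foo") varies per run).
FAKE_HASH = {"foo": 42, "bar": 52, "baz": 58}
SIZE = 8  # hash table size, a power of two

slots = {}
for key, h in FAKE_HASH.items():
    slot = h % SIZE                # hash value % hash table size
    assert slot == h & (SIZE - 1)  # same result as masking the low 3 bits
    slots[key] = slot

print(slots)  # {'foo': 2, 'bar': 4, 'baz': 2}: "baz" will conflict with "foo"
```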


d["foo"] = "spam"
hash("foo") = 42, 42 % 8 = 2

Hash table (slots 0-7; entry = key, hash, value):
  slot 2: "foo", 42, "spam"
  other slots: empty


d["bar"] = "ham"
hash("bar") = 52, 52 % 8 = 4

Hash table (slots 0-7; entry = key, hash, value):
  slot 2: "foo", 42, "spam"
  slot 4: "bar", 52, "ham"
  other slots: empty


d["baz"] = "egg"
hash("baz") = 58, 58 % 8 = 2  # "baz" conflicts with "foo"

Hash table (slots 0-7; entry = key, hash, value):
  slot 2: "foo", 42, "spam"
  slot 4: "bar", 52, "ham"
  other slots: empty


"Open addressing" uses another slot in the table. (Another strategy is "chaining".)

For example, the "linear probing" algorithm uses the next entry.
※ Python uses more complex probing, but I use a simpler one in this example.

Hash table (slots 0-7; entry = key, hash, value):
  slot 2: "foo", 42, "spam"
  slot 3: "baz", 58, "egg"
  slot 4: "bar", 52, "ham"
  other slots: empty


del d["foo"]
hash("foo") = 42, 42 % 8 = 2

Hash table (slots 0-7; entry = key, hash, value):
  slot 2: "foo", 42, "spam"
  slot 3: "baz", 58, "egg"
  slot 4: "bar", 52, "ham"
  other slots: empty


del d["foo"]
hash("foo") = 42, 42 % 8 = 2

Hash table after simply emptying slot 2:
  slot 3: "baz", 58, "egg"
  slot 4: "bar", 52, "ham"
  other slots: empty


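x = d["baz"]
hash("baz") = 58, 58 % 8 = 2 (!!?)  # slot 2 is empty, so the lookup stops there and "baz" is not found, although it is in slot 3

Hash table (slots 0-7; entry = key, hash, value):
  slot 3: "baz", 58, "egg"
  slot 4: "bar", 52, "ham"
  other slots: empty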
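The failure above can be reproduced with a toy lookup that treats an empty slot as "key not present". A minimal sketch, assuming the slides' illustrative hashes (`FAKE_HASH` is a hypothetical stand-in for `hash()`) and simple linear probing; the table is filled in by hand to match the slides:

```python
FAKE_HASH = {"foo": 42, "bar": 52, "baz": 58}  # illustrative hashes from the slides
SIZE = 8

def lookup(table, key):
    """Probe from hash % SIZE; an empty slot (None) ends the search."""
    i = FAKE_HASH[key] % SIZE
    while table[i] is not None:
        if table[i][0] == key:
            return table[i][2]
        i = (i + 1) % SIZE  # linear probing
    raise KeyError(key)

# State after inserting "foo", "bar", "baz" with linear probing.
table = [None] * SIZE
table[2] = ("foo", 42, "spam")
table[3] = ("baz", 58, "egg")
table[4] = ("bar", 52, "ham")

table[2] = None  # naive `del d["foo"]`: just empty the slot
try:
    lookup(table, "baz")
except KeyError:
    print("baz not found")  # the probe stops at the now-empty slot 2
```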

del d["foo"] leaves a DUMMY key in its slot

Hash table (slots 0-7; entry = key, hash, value):
  slot 2: DUMMY
  slot 3: "baz", 58, "egg"
  slot 4: "bar", 52, "ham"
  other slots: empty


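x = d["baz"]
hash("baz") = 58, 58 % 8 = 2  # slot 2 holds DUMMY, so linear probing continues and finds "baz" in slot 3

Hash table (slots 0-7; entry = key, hash, value):
  slot 2: DUMMY
  slot 3: "baz", 58, "egg"
  slot 4: "bar", 52, "ham"
  other slots: empty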
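Putting the steps together, a toy table with linear probing and a DUMMY tombstone reproduces the whole sequence. A minimal sketch under stated assumptions: `FAKE_HASH` is a hypothetical stand-in for `hash()` using the slides' values, the table has a fixed 8 slots and never fills or resizes, and probing is the simple linear kind from this example rather than CPython's real probe sequence.

```python
FAKE_HASH = {"foo": 42, "bar": 52, "baz": 58}  # illustrative hashes from the slides
SIZE = 8
DUMMY = object()  # tombstone marker left behind by deletion


class OpenAddressingTable:
    """Toy legacy-style table: linear probing plus DUMMY tombstones (never resizes)."""

    def __init__(self):
        self.slots = [None] * SIZE  # None = never used, DUMMY = deleted, else (key, hash, value)

    def _find(self, key):
        """Return (slot holding key or None, first reusable slot on the probe path)."""
        free = None
        i = FAKE_HASH[key] % SIZE
        for _ in range(SIZE):
            entry = self.slots[i]
            if entry is None:  # a never-used slot ends the probe: key is absent
                return None, (i if free is None else free)
            if entry is DUMMY:  # deleted slot: remember it, but keep probing
                if free is None:
                    free = i
            elif entry[0] == key:
                return i, free
            i = (i + 1) % SIZE  # linear probing
        return None, free

    def __setitem__(self, key, value):
        found, free = self._find(key)
        slot = found if found is not None else free  # assumes the table never fills
        self.slots[slot] = (key, FAKE_HASH[key], value)

    def __getitem__(self, key):
        found, _ = self._find(key)
        if found is None:
            raise KeyError(key)
        return self.slots[found][2]

    def __delitem__(self, key):
        found, _ = self._find(key)
        if found is None:
            raise KeyError(key)
        self.slots[found] = DUMMY  # not None, so later lookups probe past it


d = OpenAddressingTable()
d["foo"] = "spam"
d["bar"] = "ham"
d["baz"] = "egg"  # slot 2 is taken by "foo", so "baz" goes to slot 3
del d["foo"]      # slot 2 now holds DUMMY
print(d["baz"])   # egg: the probe skips the DUMMY in slot 2
```

Insertion reuses the first DUMMY slot it passes, so tombstones do not accumulate forever; lookups must still probe past them.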


Problems in the classical open addressing hash table

● Large memory usage
  ○ At least 1/3 of entries are empty
    ■ Otherwise, "probing" can be too slow
  ○ One entry uses 3 words
    ■ word = 8 bytes on recent machines
  ○ Minimum size = 192 bytes
    ■ 8 (bytes/word) * 3 (words/entry) * 8 (table width)
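The 192-byte minimum is plain arithmetic on the numbers above. The sketch also derives the capacity implied by keeping at least 1/3 of the slots empty; the 2/3 fraction with integer rounding matches CPython's sizing rule, but treat the exact formula as illustrative.

```python
WORD = 8             # bytes per word on a 64-bit machine
WORDS_PER_ENTRY = 3  # hash, key pointer, value pointer
TABLE_WIDTH = 8      # slots in the smallest table

min_size = WORD * WORDS_PER_ENTRY * TABLE_WIDTH
print(min_size)      # 192

# If at least 1/3 of the slots must stay empty, at most 2/3 can hold items.
usable = TABLE_WIDTH * 2 // 3
print(usable)        # 5: an 8-slot table holds at most 5 items
```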


Python 3.6 implementation


Compact and ordered dict

PyPy implemented it in 2015: https://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.html

The Python 3.6 dict is almost the same as PyPy's.

Ruby 2.4 and PHP 7 have similar ones.


d["foo"] = "spam"  # hash("foo") = 42, 42 % 8 = 2

index (slots 0-7): [-, -, 0, -, -, -, -, -]   # slot 2 -> entry 0
entries (dense; key, hash, value):
  0: "foo", 42, "spam"


d["foo"] = "spam"
d["bar"] = "ham"  # hash("bar") = 52, 52 % 8 = 4

index (slots 0-7): [-, -, 0, -, 1, -, -, -]
entries (dense; key, hash, value):
  0: "foo", 42, "spam"
  1: "bar", 52, "ham"


d["foo"] = "spam"
d["bar"] = "ham"
d["baz"] = "egg"
del d["foo"]

index (slots 0-7): [-, -, DUMMY, 2, 1, -, -, -]
entries (dense; key, hash, value):
  0: (deleted)
  1: "bar", 52, "ham"
  2: "baz", 58, "egg"
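The same four operations can be replayed on a toy compact dict: a sparse index table of small integers pointing into a dense, append-only entries list. A minimal sketch, assuming the slides' illustrative hashes (`FAKE_HASH` is hypothetical), a fixed 8-slot index that never fills, and simple linear probing; `EMPTY = -1` and `DUMMY = -2` are markers chosen here for illustration.

```python
FAKE_HASH = {"foo": 42, "bar": 52, "baz": 58}  # illustrative hashes from the slides
SIZE = 8
EMPTY, DUMMY = -1, -2  # special values stored in the index table


class CompactDict:
    """Toy compact dict: a sparse index table pointing into a dense entries list."""

    def __init__(self):
        self.index = [EMPTY] * SIZE  # small ints, so each slot could fit in 1 byte
        self.entries = []            # (key, hash, value) in insertion order; None = deleted

    def _lookup(self, key):
        """Return (index slot, entry position); position is None if key is absent."""
        i = FAKE_HASH[key] % SIZE
        while self.index[i] != EMPTY:  # assumes the index table never fills
            ix = self.index[i]
            if ix != DUMMY and self.entries[ix][0] == key:
                return i, ix
            i = (i + 1) % SIZE  # linear probing, as in the slides' example
        return i, None  # i is the first EMPTY slot on the probe path

    def __setitem__(self, key, value):
        slot, ix = self._lookup(key)
        if ix is not None:
            self.entries[ix] = (key, FAKE_HASH[key], value)  # update in place
        else:
            self.entries.append((key, FAKE_HASH[key], value))  # new keys go at the end
            self.index[slot] = len(self.entries) - 1

    def __getitem__(self, key):
        _, ix = self._lookup(key)
        if ix is None:
            raise KeyError(key)
        return self.entries[ix][2]

    def __delitem__(self, key):
        slot, ix = self._lookup(key)
        if ix is None:
            raise KeyError(key)
        self.index[slot] = DUMMY  # keep probe chains intact
        self.entries[ix] = None   # leave a hole; later entries are not shifted

    def keys(self):
        # Walking the dense entries yields keys in insertion order.
        return [e[0] for e in self.entries if e is not None]


d = CompactDict()
d["foo"] = "spam"
d["bar"] = "ham"
d["baz"] = "egg"
del d["foo"]
print(d.index)   # [-1, -1, -2, 2, 1, -1, -1, -1]
print(d.keys())  # ['bar', 'baz']
```

The index ends up as `[-, -, DUMMY, 2, 1, -, -, -]`, matching the diagram, and iteration order falls out of the append-only entries list.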


New dict vs Legacy dict

● Less memory usage
  ○ Index can be 1 byte for a small dict
  ○ 3*8*5 (entries) + 8 (index table) = 128 bytes
    ■ It was 192 bytes in the legacy implementation
● Faster iteration (dense entries)
● Preserves insertion order
● (cons) One more indirect memory access
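The 128-byte figure can be recomputed from its parts, assuming (as in CPython) that an 8-slot index allows 2/3 of 8, rounded down, i.e. 5 entries. The second half shows insertion order with a real dict; in CPython 3.6 this was an implementation detail, and from Python 3.7 it is a language guarantee.

```python
WORD = 8              # bytes per word on a 64-bit machine
ENTRY_WORDS = 3       # hash, key, value
USABLE = 8 * 2 // 3   # entries for an 8-slot index: 5
INDEX_BYTES = 8       # 8 index slots, 1 byte each

new_size = WORD * ENTRY_WORDS * USABLE + INDEX_BYTES
legacy_size = WORD * ENTRY_WORDS * 8
print(new_size, legacy_size)  # 128 192

# Iteration walks the dense entries, so keys come back in insertion order.
d = {}
d["foo"] = "spam"
d["bar"] = "ham"
d["baz"] = "egg"
del d["foo"]
d["foo"] = "spam"  # a re-inserted key goes to the end
print(list(d))     # ['bar', 'baz', 'foo']
```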


Toward Python 3.7


Working on ...

● Remove redundant code that was written to optimize the legacy implementation.

● OrderedDict based on the new dict
  ○ Remove the doubly linked list used to keep order
  ○ About 1/2 the memory usage!
  ○ Faster creation and iteration.
  ○ (cons) Slower .move_to_end() method
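Any new-dict-based OrderedDict must keep the semantics of `move_to_end`, shown here with the standard `collections.OrderedDict`. The slide does not say how the new version implements it, so this only illustrates the required behavior, not its cost.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys(["a", "b", "c"])
od.move_to_end("a")              # move "a" to the end
print(list(od))                  # ['b', 'c', 'a']
od.move_to_end("c", last=False)  # move "c" to the front
print(list(od))                  # ['c', 'b', 'a']
```

With a doubly linked list both moves are a constant-time relink, which is what the linked-list design buys at the cost of extra memory.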


We're looking for new contributors

Contributing to Python is easier now, thanks to GitHub.

● Read the devguide (https://devguide.python.org/)
● Find an easy bug on https://bugs.python.org/ and fix it
● Review others' code
● Translate the documentation on Transifex
  ○ See https://docs.python.org/ja/


Future ideas

● Specialized dict for namespaces
  ○ All keys are interned strings
  ○ Only pointer comparison
  ○ No "hash" in entry -> more compact
● Implement set like dict
  ○ The current set is larger than dict...
● functools.lru_cache
  ○ Use `od.move_to_end(key)` instead of a linked list
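The lru_cache idea can be sketched in plain Python: keep entries in an OrderedDict, move a key to the end on every hit, and evict from the front. The `LRUCache` class is hypothetical; it is not how `functools.lru_cache` is written, only an illustration of replacing the linked list with `move_to_end`.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache: the OrderedDict keeps the least recently used key first."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now the most recently used key
cache.put("c", 3)        # evicts "b"
print(list(cache.data))  # ['a', 'c']
```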


PEP 412: Key sharing dict


Introduced in Python 3.3

Instances of the same class can share a keys object


class A:
    def __init__(self, a, b):
        self.foo = a
        self.bar = b

a = A("spam", "ham")
b = A("bacon", "egg")


Keys object (shared by class A):
  index (slots 0-7): [-, -, 0, -, 1, -, -, -]
  entries (key, hash):
    0: "foo", 42
    1: "bar", 52

instance a, values: ["spam", "ham"]
instance b, values: ["bacon", "egg"]
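The split layout can be modeled with one shared list of keys and a per-instance list of values. A minimal sketch; `SharedKeys` and `SplitDict` are hypothetical names, and raising `ValueError` stands in for what a real implementation would do instead, which is switching that instance to an ordinary dict.

```python
class SharedKeys:
    """Hypothetical keys object shared by all instances of one class."""

    def __init__(self):
        self.keys = []  # attribute names in the one permitted insertion order


class SplitDict:
    """Per-instance part: only the values; keys live in the shared object."""

    def __init__(self, shared):
        self.shared = shared
        self.values = []

    def set(self, key, value):
        keys = self.shared.keys
        n = len(self.values)
        if key in keys[:n]:
            self.values[keys.index(key)] = value  # existing attribute: overwrite
        elif n < len(keys) and keys[n] == key:
            self.values.append(value)             # next key in the shared order
        elif n == len(keys):
            keys.append(key)                      # extend the shared keys
            self.values.append(value)
        else:
            raise ValueError("different insertion order: stop key sharing")

    def get(self, key):
        keys = self.shared.keys[:len(self.values)]
        if key not in keys:
            raise KeyError(key)
        return self.values[keys.index(key)]


keys = SharedKeys()
a = SplitDict(keys)
a.set("foo", "spam")
a.set("bar", "ham")
b = SplitDict(keys)
b.set("foo", "bacon")
b.set("bar", "egg")
print(keys.keys, a.values, b.values)  # ['foo', 'bar'] ['spam', 'ham'] ['bacon', 'egg']
```

Each extra instance costs only its values list, which is where the memory saving comes from.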


Problem

● Two instances can have different insertion orders
  ○ Drop key sharing dict?
    ■ Key sharing dict can save more memory.
      ● But __slots__ can be used for such cases!
    ■ Performance improvements in some microbenchmarks
      ● Does it matter in real cases? __slots__?
    ■ Needs consensus
      ● It's more difficult than the implementation


Keep key sharing dict support

● Only exactly the same order is permitted
  ○ "Skipped" keys are prohibited
  ○ Deletion is also prohibited

● Otherwise, stop "key sharing"
  ○ `self.x = None` is faster than `del self.x`
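The ordering side of this rule is observable from Python: deleting and re-adding an attribute moves it to the end of the instance dict, so its order no longer matches the other instances, while assigning `None` keeps the original order. The `Example` class is hypothetical, the snippet assumes Python 3.7+ (where insertion order is guaranteed), and it does not measure the speed claim.

```python
class Example:
    def __init__(self):
        self.x = 1
        self.y = 2


a = Example()
del a.x               # removes "x" from the instance dict
a.x = 1               # re-inserting appends "x" after "y"
print(list(vars(a)))  # ['y', 'x']: order now differs from other instances

b = Example()
b.x = None            # overwriting keeps the key in place
print(list(vars(b)))  # ['x', 'y']: same order as a fresh instance
```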