Weird stuff with hashes.key

Weird stuff with hashes

http://tenderlovemaking.com/2015/02/11/weird-stuff-with-hashes.html

So I put together a summary.

About the original author

Aaron Patterson

Rails Core Team, Ruby Core Team

RedHat

He found that integration tests were spending a huge amount of time in garbage collection.

The cause was a Hash that used strings as keys.

Hash?

A set of key-value pairs.

f(Key) -> Address: the key is used to compute an address.

Address <- Value: the value is stored at that address.

So it's fast: O(1) on average.

Easy to use: hash['key'] = value
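The "f(Key) -> Address" idea can be sketched with a toy bucket table. This is only an illustration: the helper names (bucket_index, toy_store, toy_fetch) and the fixed bucket count of 8 are assumptions made up for this example, and Ruby's real Hash is considerably more sophisticated.

```ruby
# Toy illustration only; not how Ruby's Hash is actually implemented.
# bucket_count = 8 is an arbitrary, hypothetical table size.
def bucket_index(key, bucket_count = 8)
  key.hash % bucket_count          # f(key) -> address
end

# Store the pair in the bucket chosen by the key's hash value.
def toy_store(buckets, key, value)
  buckets[bucket_index(key, buckets.size)] << [key, value]
end

# Look only in the bucket the key maps to, then compare keys with eql?.
def toy_fetch(buckets, key)
  pair = buckets[bucket_index(key, buckets.size)].find { |k, _| k.eql?(key) }
  pair && pair[1]
end

buckets = Array.new(8) { [] }
toy_store(buckets, 'key', 'value')
found = toy_fetch(buckets, 'key')
missing = toy_fetch(buckets, 'other')
puts found.inspect
puts missing.inspect
```

Because a lookup only inspects one bucket, its cost does not grow with the number of entries as long as keys spread evenly across buckets.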

How does this work?

x = 'key'
hash = {}
hash[x] = 'value'

x == hash.keys.first
>> true

x.object_id == hash.keys.first.object_id
>> false

What actually happens…

x = 'key'
hash = {}
# Note: this is illustrative code only.
# temp_x = x.dup
# temp_x.freeze
# hash[temp_x] = 'value'

In other words, every time a string is used as a key, one extra object is created.
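The copy can be observed directly. The sketch below uses String.new so that the strings are mutable regardless of any frozen_string_literal setting; the variable names are made up for this example. It also shows that a string which is already frozen is stored as-is, so no extra object is needed in that case.

```ruby
# An explicitly mutable string used as a key.
x = String.new('key')
hash = {}
hash[x] = 'value'
stored = hash.keys.first

same_content  = (x == stored)     # true: same characters
same_object   = x.equal?(stored)  # false: the hash stored a copy
stored_frozen = stored.frozen?    # true: the copy is frozen

# A key that is already frozen does not need to be copied.
frozen_key = String.new('frozen key').freeze
hash2 = {}
hash2[frozen_key] = 'value'
reused = hash2.keys.first.equal?(frozen_key)  # true: same object

puts [same_content, same_object, stored_frozen, reused].inspect
```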

Why?

Case 1. Modification from outside

The behavior we expect

x = 'string'
y = {}
y[x] = :value
x = 'str'
y.key? x
>> false

What if there were no dup and freeze?

x = 'string'
y = {}
y[x] = :value
x = 'str'
y.key? x
>> false

…str :value…

x = 'string'
y = {}
y[x] = :value
x = 'str'
y.key? 'str'
>> true

…string :value

…
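The slides reassign x, which leaves the original string untouched. A sketch that mutates the string in place makes the protective effect of dup and freeze easier to see. It assumes String#replace as the mutation and String.new to guarantee a mutable string; neither appears in the original slides.

```ruby
x = String.new('string')   # explicitly mutable
y = {}
y[x] = :value

# Mutate the caller's string in place after it was used as a key.
x.replace('str')

found_mutated  = y.key?(x)          # false: the hash holds its own copy, "string"
found_original = y.key?('string')   # true: the stored key was not affected
stored_key     = y.keys.first       # "string"

puts [found_mutated, found_original, stored_key].inspect
```

Because the hash owns a frozen copy, nothing the caller does to x afterwards can corrupt the table.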

Case 2. The hash value problem

class Foo
  attr_accessor :hash
end

x = Foo.new
x.hash = 10
hsh = {}
hsh[x] = :hello

hash: computes and returns the address from the key
eql?: used to tell keys apart when their hash values collide

puts hsh.key?(x)
>> true
puts hsh.keys.include?(x)
>> true

KEY  HASH  VALUE
x    10    :hello
     11

x.hash = 11
puts hsh.key?(x)
>> false
puts hsh.keys.include?(x)
>> true

KEY  HASH  VALUE
     10    :hello
x    11

x.hash = 11
hsh.rehash
puts hsh.key?(x)
>> true
puts hsh.keys.include?(x)
>> true

KEY  HASH  VALUE
x    10    :hello
     11

KEY  HASH  VALUE
     10
x    11    :hello
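The same effect shows up without overriding hash by hand. Arrays are not copied when used as keys, and their hash value depends on their contents, so mutating an Array key can make its entry unreachable until rehash is called. The variable names below are made up for this sketch.

```ruby
key = [1, 2]
h = { key => :hello }

before = h.key?(key)              # true

key << 3                          # changes the value returned by key.hash

# False unless the old and new hash values happen to land in the same slot.
after_mutation = h.key?(key)
in_keys = h.keys.include?(key)    # true: the entry is still in the table

h.rehash                          # re-index entries by their current hash values
after_rehash = h.key?(key)        # true

puts [before, after_mutation, in_keys, after_rehash].inspect
```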

To sum up the previous 34 slides in one sentence:

"Don't do anything that could change the hash value of a key while it is in a hash."

Bonus Performance P. S.

Two ways to clone a hash

Hash[] vs Hash#dup

Which is faster?

Calculating -------------------------------------
            Hash#dup     7.705k i/100ms
              Hash[]    15.978k i/100ms
-------------------------------------------------
            Hash#dup     93.648k (± 4.9%) i/s -    470.005k
              Hash[]    230.497k (±11.2%) i/s -      1.150M

Hash#dup -> calls 'rehash'
Hash[] -> doesn't call 'rehash'

That doesn't mean you should always use Hash[], though.

Only if your benchmarks can prove that it’s a bottleneck.
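To check whether this matters in your own code, a rough timing harness is enough to start with. The sketch below is an assumption-laden stand-in, not the benchmark that produced the numbers above: it uses a monotonic clock instead of a benchmarking library, and the hash size, iteration count, and time_block helper are arbitrary choices for illustration. Results will vary with Ruby version and hash contents.

```ruby
# Build a sample hash; 1_000 entries is an arbitrary size.
source = {}
1_000.times { |i| source["key#{i}"] = i }

# Hypothetical helper: run the block `iterations` times, return elapsed seconds.
def time_block(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 1_000
dup_seconds     = time_block(iterations) { source.dup }
bracket_seconds = time_block(iterations) { Hash[source] }

puts format('Hash#dup: %.4fs', dup_seconds)
puts format('Hash[]:   %.4fs', bracket_seconds)
```

Both calls produce a hash with the same entries, so the comparison is only about copying cost.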

Q.A.

- END -
