Comparing += and join in Python

高國棟

Speaking experience

● 2013/04: talk at taipei.py on the implementation of pdb. Slides: http://www.slideshare.net/ya790026/recoverpdb
● 2013/05: talk at pyconf.tw analyzing the CPython source code. Slides: http://www.slideshare.net/ya790026/c-python-23247730
● 2013/08: talk at taipei.py on how Python executes code. Slides: http://www.slideshare.net/ya790026/python-27854881

Experiments and observations

● https://gist.github.com/ya790206/7496787

On Windows, Linux, and Mac the results vary: sometimes join is faster, sometimes += is faster. Why?
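A minimal benchmark sketch for reproducing this kind of comparison. It is not the gist linked above: the function names `concat_plus` and `concat_join` and the sizes are made up for illustration, and the timings will depend on the platform, the C library allocator, and the Python version.

```python
import timeit


def concat_plus(pieces):
    """Build the result with repeated +=."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def concat_join(pieces):
    """Build the result by appending to a list, then joining once."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


if __name__ == "__main__":
    for n in (1000, 10000, 100000):
        pieces = ["x" * 10] * n
        t_plus = timeit.timeit(lambda: concat_plus(pieces), number=5)
        t_join = timeit.timeit(lambda: concat_join(pieces), number=5)
        print("n=%d  +=: %.4fs  join: %.4fs" % (n, t_plus, t_join))
```

Running it several times, and on more than one operating system, is the point: the ordering of the two columns is not guaranteed to be stable.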

PEP 8

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

What is fragile?

+=

● The opcode for += is INPLACE_ADD.
● The opcode for + is BINARY_ADD.
● String addition is carried out by x = string_concatenate(v, w, f, next_instr);
● See Python/ceval.c.
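The opcodes can be inspected with the standard `dis` module. This is a small sketch assuming Python 3.4 or later (for `dis.get_instructions`); the helper names are hypothetical. On the Python 2.7 interpreter discussed in these slides, `dis.dis(add_inplace)` shows INPLACE_ADD and BINARY_ADD; Python 3.11 and later report a single BINARY_OP opcode for both forms instead.

```python
import dis


def add_inplace(a, b):
    a += b
    return a


def add_binary(a, b):
    return a + b


def opnames(func):
    """Return the list of opcode names in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


inplace_ops = opnames(add_inplace)
binary_ops = opnames(add_binary)
print(inplace_ops)
print(binary_ops)
```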

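string_concatenate

    if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {
        /* We hold the only reference to v, so it can be resized in place. */
        if (_PyString_Resize(&v, new_len) != 0) {
            return NULL;
        }
        /* Copy w into the newly grown tail of v. */
        memcpy(PyString_AS_STRING(v) + v_len,
               PyString_AS_STRING(w), w_len);
        return v;
    }
    else {
        /* In-place resizing is not possible: build a new string. */
        PyString_Concat(&v, w);
        return v;
    }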

https://github.com/ya790206/CPython/blob/master/Python/ceval.c#L4836
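The refcount test in the excerpt above is what makes the optimization fragile. The sketch below tries to show the effect from Python: holding one extra reference to the string should push every += onto the PyString_Concat path. The function names are hypothetical, the behavior is specific to CPython, and the size of the gap (if any) depends on the interpreter version and allocator.

```python
import timeit


def append_sole_owner(n):
    """s is the only reference, so the in-place path may be taken."""
    s = ""
    for _ in range(n):
        s += "x"
    return s


def append_with_alias(n):
    """A second reference keeps the refcount above 1 on every step."""
    s = ""
    for _ in range(n):
        alias = s  # extra reference to the current string
        s += "x"
    return s


if __name__ == "__main__":
    n = 20000
    print("sole owner:", timeit.timeit(lambda: append_sole_owner(n), number=3))
    print("with alias:", timeit.timeit(lambda: append_with_alias(n), number=3))
```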

PyString_Concat

    void
    PyString_Concat(register PyObject **pv, register PyObject *w)
    {
        register PyObject *v;
        if (*pv == NULL)
            return;
        if (w == NULL || !PyString_Check(*pv)) {
            Py_CLEAR(*pv);
            return;
        }
        /* Allocate a new string holding *pv followed by w. */
        v = string_concat((PyStringObject *) *pv, w);
        Py_DECREF(*pv);
        *pv = v;
    }

https://github.com/ya790206/CPython/blob/master/Objects/stringobject.c#L3856

string_concat

    /* Allocate a fresh object large enough for both strings. */
    op = (PyStringObject *)PyObject_MALLOC(PyStringObject_SIZE + size);
    if (op == NULL)
        return PyErr_NoMemory();
    PyObject_INIT_VAR(op, &PyString_Type, size);
    op->ob_shash = -1;
    op->ob_sstate = SSTATE_NOT_INTERNED;
    /* Two copies: all of a, then all of b. */
    Py_MEMCPY(op->ob_sval, a->ob_sval, Py_SIZE(a));
    Py_MEMCPY(op->ob_sval + Py_SIZE(a), b->ob_sval, Py_SIZE(b));
    op->ob_sval[size] = '\0';
    return (PyObject *) op;

https://github.com/ya790206/CPython/blob/master/Objects/stringobject.c#L1014

_PyString_Resize

● Defined in Objects/stringobject.c.
● It calls PyObject_REALLOC.
● Call chain: _PyString_Resize -> PyObject_REALLOC (Include/objimpl.h) -> PyObject_Realloc (Objects/obmalloc.c)

PyObject_Realloc

    return realloc(p, nbytes);

https://github.com/ya790206/CPython/blob/master/Objects/obmalloc.c#L1176

uClibc's realloc

    if (new_size > size)
      /* Grow the block. */
      {
        size_t extra = new_size - size;

        __heap_lock (&__malloc_heap_lock);
        extra = __heap_alloc_at (&__malloc_heap, base_mem + size, extra);
        __heap_unlock (&__malloc_heap_lock);

        if (extra)
          /* Record the changed size. */
          MALLOC_SET_SIZE (base_mem, size + extra);
        else
          /* Our attempts to extend MEM in place failed, just allocate-and-copy. */
          {
            void *new_mem = malloc (new_size - MALLOC_HEADER_SIZE);
            if (new_mem)
              {
                memcpy (new_mem, mem, size - MALLOC_HEADER_SIZE);
                free (mem);
              }
            mem = new_mem;
          }
      }

https://github.com/ya790206/ext_c_lib/blob/master/uClibc-0.9.33/libc/stdlib/malloc/realloc.c#L24

glibc's realloc (malloc/malloc.c)

    if (chunk_is_mmapped(oldp))
    {
        void* newmem;

    #if HAVE_MREMAP
        newp = mremap_chunk(oldp, nb);
        if(newp) return chunk2mem(newp);
    #endif
        /* Note the extra SIZE_SZ overhead. */
        if(oldsize - SIZE_SZ >= nb) return oldmem; /* do nothing */
        /* Must alloc, copy, free. */
        newmem = __libc_malloc(bytes);
        if (newmem == 0) return 0; /* propagate failure */
        MALLOC_COPY(newmem, oldmem, oldsize - 2*SIZE_SZ);
        munmap_chunk(oldp);
        return newmem;
    }

https://github.com/ya790206/ext_c_lib/blob/master/glibc-2.18/malloc/malloc.c#L2908

glibc's realloc (malloc/hooks.c)

    /* Note the extra SIZE_SZ overhead. */
    if(oldsize - SIZE_SZ >= nb)
        newmem = oldmem; /* do nothing */
    else {
        /* Must alloc, copy, free. */
        if (top_check() >= 0)
            newmem = _int_malloc(&main_arena, bytes+1);
        if (newmem) {
            MALLOC_COPY(newmem, oldmem, oldsize - 2*SIZE_SZ);
            munmap_chunk(oldp);
        }
    }

https://github.com/ya790206/ext_c_lib/blob/master/glibc-2.18/malloc/hooks.c#L290
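The realloc excerpts share one pattern: keep or grow the block where it is if possible, otherwise allocate, copy, and free. The sketch below is a toy model of that decision, not real allocator code. `toy_realloc`, `bytes_copied_by_appends`, and the `can_extend` flag are hypothetical names; in a real process, whether a block can grow in place depends on the state of the heap.

```python
def toy_realloc(old_size, new_size, can_extend):
    """Return the number of bytes a grow request would have to copy.

    can_extend stands for "the allocator found free space right after
    the block" (an assumption of this toy model).
    """
    if new_size <= old_size or can_extend:
        return 0          # grown (or kept) in place, nothing copied
    return old_size       # allocate-and-copy: the old contents move


def bytes_copied_by_appends(n, m, can_extend):
    """Total bytes moved by realloc while appending n strings of length m."""
    copied = 0
    size = 0
    for _ in range(n):
        copied += toy_realloc(size, size + m, can_extend)
        size += m
    return copied


print(bytes_copied_by_appends(1000, 10, can_extend=True))   # never moves
print(bytes_copied_by_appends(1000, 10, can_extend=False))  # moves every time
```

With these made-up sizes the first call reports 0 bytes moved and the second reports 4995000, which is the gap between the two extremes that += can fall into.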

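Complexity analysis: worst case

Q: What is the worst-case complexity of concatenating n strings, each of length m?

A:
● With join, the complexity is O(nm).
● With +=, the n-th concatenation copies the whole result so far, so f(n) = f(n-1) + nm, which gives O(n^2 m).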
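Unrolling the recurrence makes the quadratic term explicit. This assumes, as in the worst case described here, that the k-th concatenation copies all km bytes of the result built so far, and that f(0) = 0.

```latex
\begin{aligned}
f(n) &= f(n-1) + nm \\
     &= \sum_{k=1}^{n} km \\
     &= m \cdot \frac{n(n+1)}{2} \\
     &= O(n^2 m)
\end{aligned}
```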

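Complexity analysis: best case

Q: What is the best-case complexity of concatenating n strings, each of length m?

A:
● With join, the complexity is O(nm).
● With +=, the complexity is also O(nm).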

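Why is join slower at first?

1. Executing INPLACE_ADD on strings is much faster than CALL_FUNCTION (for append).
2. Python's list is implemented much like a C++ vector: when the list runs out of room, it is moved to a location that can hold the new size.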

list resize

● list_resize (Objects/listobject.c) -> PyMem_RESIZE (Include/pymem.h) -> PyMem_REALLOC (Include/pymem.h) -> realloc

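Why does join catch up later?

● A list asks for more and more room each time it grows: 4, 8, 16, 25, 35, 46, 58, 72, 88, …
● A list stores only pointers, so each move copies just (current list size * pointer size) bytes.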
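The growth sequence can be reproduced with a few lines of Python. This is a sketch modeled on the over-allocation rule in list_resize as found in CPython 2.7; the function names are hypothetical, and newer CPython releases may use a different rule, so the numbers need not match what a current interpreter does.

```python
def grown_capacity(newsize):
    """Over-allocation rule modeled on list_resize in CPython 2.7."""
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def capacities_while_appending(n):
    """Capacities a list passes through while n items are appended."""
    capacity = 0
    seen = []
    for size in range(1, n + 1):
        if size > capacity:          # no room left: the list must grow
            capacity = grown_capacity(size)
            seen.append(capacity)
    return seen


print(capacities_while_appending(80))
# [4, 8, 16, 25, 35, 46, 58, 72, 88]
```

Because each step grows the capacity by roughly one eighth, the number of resizes for n appends grows only logarithmically, which is why the cost of moving the pointer array stays small.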

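Conclusions

● If realloc does not return a new memory address, += is very efficient: it saves one memcpy, and calling string addition is faster than calling list's append.
● As long as the process's memory is not yet badly fragmented, realloc is less likely to return a new address.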

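Conclusions (continued)

● join is the better choice over +=, because its performance is predictable (with +=, the longer your process runs, the worse performance may get).
● If you want to benefit from the += optimization, consider running in a newly created process. A new process has no memory fragmentation, but creating one has its own cost.
● += and + perform the same.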
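In practice, the join recommendation usually looks like the following sketch: collect the pieces in a list and join once at the end. `build_report` and its input format are hypothetical, chosen only to illustrate the idiom.

```python
def build_report(rows):
    """Format (name, value) pairs, one per line, with a single join."""
    lines = []
    for name, value in rows:
        lines.append("%s=%s" % (name, value))
    return "\n".join(lines)


print(build_report([("a", 1), ("b", 2)]))
```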

Announcements

● PyConf is recruiting event staff.

Question

Thank you