29

PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Embed Size (px)

Citation preview

Pythran: Static Compilation ofParallel Scientific Kernels

a.k.a. Python/Numpy compiler for themass

Proudly made in Namek by & serge-sans-paille pbrunet

/usSerge « sans paille » Guelton

$ w h o a m is g u e l t o n

R&D engineer at on compilation for securityAssociate researcher at Télécom Bretagne

QuarksLab

Pierrick Brunet$ w h o a m ip b r u n e t

R&D engineer at on parallelismINRIAlpes/MOAIS

Pythran in a snake shell

A Numpy-centric Python-to-C++ translatorA Python code optimizerA Pythonic C++ library

Core conceptsFocus on high-level constructsGenerate clean high level codeOptimize Python code before generated codeVectorization and ParalllelismTest, test, testBench, bench, bench

Ask StackOverflowwhen you're looking for test cases

http://stackoverflow.com/[...]numba-or-cython-acceleration-in-reaction-diffusion-algorithm

i m p o r t n u m p y a s n pd e f G r a y S c o t t ( c o u n t s , D u , D v , F , k ) : n = 3 0 0 U = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t 3 2 ) V = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t 3 2 ) u , v = U [ 1 : - 1 , 1 : - 1 ] , V [ 1 : - 1 , 1 : - 1 ]

r = 2 0 u [ : ] = 1 . 0 U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0 V [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 2 5 u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) v + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) )

f o r i i n r a n g e ( c o u n t s ) : L u = ( U [ 0 : - 2 , 1 : - 1 ] + U [ 1 : - 1 , 0 : - 2 ] - 4 * U [ 1 : - 1 , 1 : - 1 ] + U [ 1 : - 1 , 2 : ] + U [ 2 : , 1 : - 1 ] ) L v = ( V [ 0 : - 2 , 1 : - 1 ] + V [ 1 : - 1 , 0 : - 2 ] - 4 * V [ 1 : - 1 , 1 : - 1 ] + V [ 1 : - 1 , 2 : ] +

Thread SummaryOP

My code is slow with Cython and NumbaBest Answer

You need to make all loops explicit

Cython Versionc i m p o r t c y t h o ni m p o r t n u m p y a s n pc i m p o r t n u m p y a s n p

c p d e f c y t h o n G r a y S c o t t ( i n t c o u n t s , d o u b l e D u , d o u b l e D v , d o u b l e F , d o u b l e k ) : c d e f i n t n = 3 0 0 c d e f n p . n d a r r a y U = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t _ ) c d e f n p . n d a r r a y V = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t _ ) c d e f n p . n d a r r a y u = U [ 1 : - 1 , 1 : - 1 ] c d e f n p . n d a r r a y v = V [ 1 : - 1 , 1 : - 1 ]

c d e f i n t r = 2 0 u [ : ] = 1 . 0 U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0 V [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 2 5 u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) v + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) )

c d e f n p . n d a r r a y L u = n p . z e r o s _ l i k e ( u ) c d e f n p . n d a r r a y L v = n p . z e r o s _ l i k e ( v ) c d e f i n t i , c , r 1 , c 1 , r 2 , c 2 c d e f d o u b l e u v v

Pythran versionAdd this line to the original kernel:

# p y t h r a n e x p o r t G r a y S c o t t ( i n t , f l o a t , f l o a t , f l o a t , f l o a t )

Timings$ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 8 , 0 . 0 4 , 0 . 0 6 ) '1 0 l o o p s , b e s t o f 3 : 5 2 . 9 m s e c p e r l o o p$ c y t h o n g r a y s c o t t . p y x$ g c c g r a y s c o t t . c ` ` - s h a r e d - f P I C - o g r a y s c o t t . s o - O 3 - m a r c h =$ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 8 , 0 . 0 4 , 0 . 0 6 ) '1 0 l o o p s , b e s t o f 3 : 3 6 . 4 m s e c p e r l o o p$ p y t h r a n g r a y s c o t t . p y - O 3 - m a r c h = n a t i v e$ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 8 , 0 . 0 4 , 0 . 0 6 ) '1 0 l o o p s , b e s t o f 3 : 2 0 . 3 m s e c p e r l o o p

p y t h o n - c o n f i g - - c f l a g s - - l i b s

Lessons learntExplicit is not always better than implicitMany ``optimization hints'' can be deduced by the compilerHigh level constructs carry valuable informations

I am not saying Cython is bad. Cython does a great job. It is justpragmatic where Pythran is idealist

Compilation Challengesu = U [ 1 : - 1 , 1 : - 1 ]U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) )

Array viewsValue broadcastingTemporary arrays creationExtended slices compositionNumpy API calls

Optimization OpportunitiesL u = ( U [ 0 : - 2 , 1 : - 1 ] + U [ 1 : - 1 , 0 : - 2 ] - 4 * U [ 1 : - 1 , 1 : - 1 ] + U [ 1 : - 1 , 2 : ] + U [ 2 : , 1 : - 1 ] )

Many useless temporariesLu could be forward-substitutedSIMD instruction generation opportunitiesParallel loop opportunities

Pythran Usage$ p y t h r a n - - h e l pu s a g e : p y t h r a n [ - h ] [ - o O U T P U T _ F I L E ] [ - E ] [ - e ] [ - f f l a g ] [ - v ] [ - p p a s s ] [ - m m a c h i n e ] [ - I i n c l u d e _ d i r ] [ - L l d f l a g s ] [ - D m a c r o _ d e f i n i t i o n ] [ - O l e v e l ] [ - g ] i n p u t _ f i l e

p y t h r a n : a p y t h o n t o C + + c o m p i l e r

p o s i t i o n a l a r g u m e n t s : i n p u t _ f i l e t h e p y t h r a n m o d u l e t o c o m p i l e , e i t h e r a . p y o r a . c p p f i l e

o p t i o n a l a r g u m e n t s : - h , - - h e l p s h o w t h i s h e l p m e s s a g e a n d e x i t - o O U T P U T _ F I L E p a t h t o g e n e r a t e d f i l e - E o n l y r u n t h e t r a n s l a t o r , d o n o t c o m p i l e - e s i m i l a r t o - E , b u t d o e s n o t g e n e r a t e p y t h o n g l u e - f f l a g a n y c o m p i l e r s w i t c h r e l e v a n t t o t h e u n d e r l y i n g C + + c o m p i l e r - v b e v e r b o s e - p p a s s a n y p y t h r a n o p t i m i z a t i o n t o a p p l y b e f o r e c o d e g e n e r a t i o n - m m a c h i n e a n y m a c h i n e f l a g r e l e v a n t t o t h e u n d e r l y i n g C + + c o m p i l e r

Sample Usage$ p y t h r a n i n p u t . p y # g e n e r a t e s i n p u t . s o$ p y t h r a n i n p u t . p y - E # g e n e r a t e s i n p u t . c p p$ p y t h r a n i n p u t . p y - O 3 - f o p e n m p # p a r a l l e l !$ p y t h r a n i n p u t . p y - m a r c h = n a t i v e - O f a s t # E s o d M u m i x a m !

Type AnnotationsOnly for exported functions

# p y t h r a n e x p o r t f o o 0 ( )# p y t h r a n e x p o r t f o o 1 ( i n t )

# p y t h r a n e x p o r t f o o 2 ( f l o a t 3 2 [ ] [ ] )# p y t h r a n e x p o r t f o o 2 ( f l o a t 6 4 [ ] [ ] )# p y t h r a n e x p o r t f o o 2 ( i n t 8 [ ] [ ] [ ] )

# p y t h r a n e x p o r t f o o 3 ( ( i n t , f l o a t ) , i n t l i s t , s t r : s t r d i c t )

Pythran Compilation Flowinput.py

AST

IR

outout.py input.cpp

input.so

Optimization loop

import ast

gcc [...]

Front End100% based on the ast moduleSupports

Several standard module (incl. partial Numpy)Polymorphic functionsndarray, list, tuple, dict, str, int, long, floatNamed parameters, default argumentsGenerators…

Does not SupportNon-implicitely typed codeGlobal variableMost Python modules (no CPython mode!)User-defined classes…

Middle EndIteratively applies high level, Python-aware optimizations:

Interprocedural Constant FoldingFor-Based-Loop UnrollingForward SubstitutionInstruction SelectionDeforestationScalar Renamingdead Code Elimination

Fun Facts: can evaluate pure functions at compile time ☺

Back EndsPython Back End

Useful for debugging!

C++11 Back EndC++11 implementation of __builtin__ numpyitertools…Lazy evaluation through Expression TemplatesRelies on OpenMP, nt2 and boost::simd for theparallelization / vectorization

C++11: Typingthe W.T.F. slide

Pythran translates Python implicitly statically typed polymorphiccode into C++ meta-programs that are instanciated for the user-

given types, and specialize them for the target architecture

C++11: ParallelismImplicit

Array operations and several numpy functions are written usingOpenMP and Boost.simd

ExplicitOpenMP 3 support, ported to Python

# o m p p a r a l l e l f o r r e d u c t i o n ( + : r )f o r i , v i n e n u m e r a t e ( l ) : r + = i * v

Benchmarks

A collection of high-level benchmarkshttps://github.com/serge-sans-paille/numpy-benchmarks

Code gathered from StackOverflow + other compiler code baseMostly high-level codeGenerate results for CPython, PyPy, Numba, Parakeet, Hopeand Pythran

Most kernels are too high level for Numba and Hope…

Benchmarksno parallelism, no vectorisation (, no fat)

(Num)Focus: growcutFrom the Numba codebase!

# p y t h r a n e x p o r t g r o w c u t ( f l o a t [ ] [ ] [ ] , f l o a t [ ] [ ] [ ] , f l o a t [ ] [ ] [ ] , i n t )i m p o r t m a t hi m p o r t n u m p y a s n pd e f w i n d o w _ f l o o r ( i d x , r a d i u s ) : i f r a d i u s > i d x : r e t u r n 0 e l s e : r e t u r n i d x - r a d i u s

d e f w i n d o w _ c e i l ( i d x , c e i l , r a d i u s ) : i f i d x + r a d i u s > c e i l : r e t u r n c e i l e l s e : r e t u r n i d x + r a d i u s

d e f g r o w c u t ( i m a g e , s t a t e , s t a t e _ n e x t , w i n d o w _ r a d i u s ) : c h a n g e s = 0 s q r t _ 3 = m a t h . s q r t ( 3 . 0 )

h e i g h t = i m a g e . s h a p e [ 0 ] w i d t h = i m a g e . s h a p e [ 1 ]

(Num)Focus: growcut

Academic ResultsPythran: Enabling Static Optimization of Scientific PythonPrograms, S. Guelton, P. Brunet et al. in CSD, 2015Exploring the Vectorization of Python Constructs UsingPythran and Boost SIMD, S. Guelton, J. Falcou and P. Brunet,in WPMVP, 2014Compiling Python modules to native parallel modules usingPythran and OpenMP Annotations, S. Guelton, P. Brunetand M. Amini, in PyHPC, 2013Pythran: Enabling Static Optimization of Scientific PythonPrograms, S. Guelton, P. Brunet et al. in SciPy, 2013

Powered by Strong EngineeringPreprequisite for reproductible science

2773 test cases, incl. unit testing, doctest, Continuousintegration (thx !)Peer-reviewed codePython2.7 and C++11Linux, OSX (almost okay), Windows (on going)User and Developer doc: Hosted on Releases on PyPi: $ pip install pythran

: $ apt-get install pythran

Travis

http://pythonhosted.org/pythran/https://github.com/serge-sans-paille/pythran

Custom Debian repo

We need more peonsPythonic needs a serious cleanupTyping module needs better error reportingOSX support is partial and Windows support is on-goingnumpy.random and numpy.linalg

THE ENDAUTHORS

Serge Guelton, Pierrick Brunet, Mehdi Amini, AdrienMerlini, Alan Raynaud, Eliott Coyac…

INDUSTRIAL SUPPORT, , →You←

CONTRIBUTE#pythran on FreeNode, [email protected], GitHub repo

Silkan NumScale