Upload
pole-systematic-paris-region
View
164
Download
0
Embed Size (px)
Citation preview
Pythran: Static Compilation ofParallel Scientific Kernels
a.k.a. Python/Numpy compiler for themass
Proudly made in Namek by & serge-sans-paille pbrunet
/usSerge « sans paille » Guelton
$ w h o a m is g u e l t o n
R&D engineer at on compilation for securityAssociate researcher at Télécom Bretagne
QuarksLab
Pierrick Brunet$ w h o a m ip b r u n e t
R&D engineer at on parallelismINRIAlpes/MOAIS
Pythran in a snake shell
A Numpy-centric Python-to-C++ translatorA Python code optimizerA Pythonic C++ library
Core conceptsFocus on high-level constructsGenerate clean high level codeOptimize Python code before generated codeVectorization and ParalllelismTest, test, testBench, bench, bench
Ask StackOverflowwhen you're looking for test cases
http://stackoverflow.com/[...]numba-or-cython-acceleration-in-reaction-diffusion-algorithm
i m p o r t n u m p y a s n pd e f G r a y S c o t t ( c o u n t s , D u , D v , F , k ) : n = 3 0 0 U = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t 3 2 ) V = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t 3 2 ) u , v = U [ 1 : - 1 , 1 : - 1 ] , V [ 1 : - 1 , 1 : - 1 ]
r = 2 0 u [ : ] = 1 . 0 U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0 V [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 2 5 u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) v + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) )
f o r i i n r a n g e ( c o u n t s ) : L u = ( U [ 0 : - 2 , 1 : - 1 ] + U [ 1 : - 1 , 0 : - 2 ] - 4 * U [ 1 : - 1 , 1 : - 1 ] + U [ 1 : - 1 , 2 : ] + U [ 2 : , 1 : - 1 ] ) L v = ( V [ 0 : - 2 , 1 : - 1 ] + V [ 1 : - 1 , 0 : - 2 ] - 4 * V [ 1 : - 1 , 1 : - 1 ] + V [ 1 : - 1 , 2 : ] +
Thread SummaryOP
My code is slow with Cython and NumbaBest Answer
You need to make all loops explicit
Cython Versionc i m p o r t c y t h o ni m p o r t n u m p y a s n pc i m p o r t n u m p y a s n p
c p d e f c y t h o n G r a y S c o t t ( i n t c o u n t s , d o u b l e D u , d o u b l e D v , d o u b l e F , d o u b l e k ) : c d e f i n t n = 3 0 0 c d e f n p . n d a r r a y U = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t _ ) c d e f n p . n d a r r a y V = n p . z e r o s ( ( n + 2 , n + 2 ) , d t y p e = n p . f l o a t _ ) c d e f n p . n d a r r a y u = U [ 1 : - 1 , 1 : - 1 ] c d e f n p . n d a r r a y v = V [ 1 : - 1 , 1 : - 1 ]
c d e f i n t r = 2 0 u [ : ] = 1 . 0 U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0 V [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 2 5 u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) ) v + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) )
c d e f n p . n d a r r a y L u = n p . z e r o s _ l i k e ( u ) c d e f n p . n d a r r a y L v = n p . z e r o s _ l i k e ( v ) c d e f i n t i , c , r 1 , c 1 , r 2 , c 2 c d e f d o u b l e u v v
Pythran versionAdd this line to the original kernel:
# p y t h r a n e x p o r t G r a y S c o t t ( i n t , f l o a t , f l o a t , f l o a t , f l o a t )
Timings$ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 8 , 0 . 0 4 , 0 . 0 6 ) '1 0 l o o p s , b e s t o f 3 : 5 2 . 9 m s e c p e r l o o p$ c y t h o n g r a y s c o t t . p y x$ g c c g r a y s c o t t . c ` ` - s h a r e d - f P I C - o g r a y s c o t t . s o - O 3 - m a r c h =$ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 8 , 0 . 0 4 , 0 . 0 6 ) '1 0 l o o p s , b e s t o f 3 : 3 6 . 4 m s e c p e r l o o p$ p y t h r a n g r a y s c o t t . p y - O 3 - m a r c h = n a t i v e$ p y t h o n - m t i m e i t - s ' f r o m g r a y s c o t t i m p o r t G r a y S c o t t ' ' G r a y S c o t t ( 4 0 , 0 . 1 6 , 0 . 0 8 , 0 . 0 4 , 0 . 0 6 ) '1 0 l o o p s , b e s t o f 3 : 2 0 . 3 m s e c p e r l o o p
p y t h o n - c o n f i g - - c f l a g s - - l i b s
Lessons learntExplicit is not always better than implicitMany ``optimization hints'' can be deduced by the compilerHigh level constructs carry valuable informations
I am not saying Cython is bad. Cython does a great job. It is justpragmatic where Pythran is idealist
Compilation Challengesu = U [ 1 : - 1 , 1 : - 1 ]U [ n / 2 - r : n / 2 + r , n / 2 - r : n / 2 + r ] = 0 . 5 0u + = 0 . 1 5 * n p . r a n d o m . r a n d o m ( ( n , n ) )
Array viewsValue broadcastingTemporary arrays creationExtended slices compositionNumpy API calls
Optimization OpportunitiesL u = ( U [ 0 : - 2 , 1 : - 1 ] + U [ 1 : - 1 , 0 : - 2 ] - 4 * U [ 1 : - 1 , 1 : - 1 ] + U [ 1 : - 1 , 2 : ] + U [ 2 : , 1 : - 1 ] )
Many useless temporariesLu could be forward-substitutedSIMD instruction generation opportunitiesParallel loop opportunities
Pythran Usage$ p y t h r a n - - h e l pu s a g e : p y t h r a n [ - h ] [ - o O U T P U T _ F I L E ] [ - E ] [ - e ] [ - f f l a g ] [ - v ] [ - p p a s s ] [ - m m a c h i n e ] [ - I i n c l u d e _ d i r ] [ - L l d f l a g s ] [ - D m a c r o _ d e f i n i t i o n ] [ - O l e v e l ] [ - g ] i n p u t _ f i l e
p y t h r a n : a p y t h o n t o C + + c o m p i l e r
p o s i t i o n a l a r g u m e n t s : i n p u t _ f i l e t h e p y t h r a n m o d u l e t o c o m p i l e , e i t h e r a . p y o r a . c p p f i l e
o p t i o n a l a r g u m e n t s : - h , - - h e l p s h o w t h i s h e l p m e s s a g e a n d e x i t - o O U T P U T _ F I L E p a t h t o g e n e r a t e d f i l e - E o n l y r u n t h e t r a n s l a t o r , d o n o t c o m p i l e - e s i m i l a r t o - E , b u t d o e s n o t g e n e r a t e p y t h o n g l u e - f f l a g a n y c o m p i l e r s w i t c h r e l e v a n t t o t h e u n d e r l y i n g C + + c o m p i l e r - v b e v e r b o s e - p p a s s a n y p y t h r a n o p t i m i z a t i o n t o a p p l y b e f o r e c o d e g e n e r a t i o n - m m a c h i n e a n y m a c h i n e f l a g r e l e v a n t t o t h e u n d e r l y i n g C + + c o m p i l e r
Sample Usage$ p y t h r a n i n p u t . p y # g e n e r a t e s i n p u t . s o$ p y t h r a n i n p u t . p y - E # g e n e r a t e s i n p u t . c p p$ p y t h r a n i n p u t . p y - O 3 - f o p e n m p # p a r a l l e l !$ p y t h r a n i n p u t . p y - m a r c h = n a t i v e - O f a s t # E s o d M u m i x a m !
Type AnnotationsOnly for exported functions
# p y t h r a n e x p o r t f o o 0 ( )# p y t h r a n e x p o r t f o o 1 ( i n t )
# p y t h r a n e x p o r t f o o 2 ( f l o a t 3 2 [ ] [ ] )# p y t h r a n e x p o r t f o o 2 ( f l o a t 6 4 [ ] [ ] )# p y t h r a n e x p o r t f o o 2 ( i n t 8 [ ] [ ] [ ] )
# p y t h r a n e x p o r t f o o 3 ( ( i n t , f l o a t ) , i n t l i s t , s t r : s t r d i c t )
Pythran Compilation Flowinput.py
AST
IR
outout.py input.cpp
input.so
Optimization loop
import ast
gcc [...]
Front End100% based on the ast moduleSupports
Several standard module (incl. partial Numpy)Polymorphic functionsndarray, list, tuple, dict, str, int, long, floatNamed parameters, default argumentsGenerators…
Does not SupportNon-implicitely typed codeGlobal variableMost Python modules (no CPython mode!)User-defined classes…
Middle EndIteratively applies high level, Python-aware optimizations:
Interprocedural Constant FoldingFor-Based-Loop UnrollingForward SubstitutionInstruction SelectionDeforestationScalar Renamingdead Code Elimination
Fun Facts: can evaluate pure functions at compile time ☺
Back EndsPython Back End
Useful for debugging!
C++11 Back EndC++11 implementation of __builtin__ numpyitertools…Lazy evaluation through Expression TemplatesRelies on OpenMP, nt2 and boost::simd for theparallelization / vectorization
C++11: Typingthe W.T.F. slide
Pythran translates Python implicitly statically typed polymorphiccode into C++ meta-programs that are instanciated for the user-
given types, and specialize them for the target architecture
C++11: ParallelismImplicit
Array operations and several numpy functions are written usingOpenMP and Boost.simd
ExplicitOpenMP 3 support, ported to Python
# o m p p a r a l l e l f o r r e d u c t i o n ( + : r )f o r i , v i n e n u m e r a t e ( l ) : r + = i * v
Benchmarks
A collection of high-level benchmarkshttps://github.com/serge-sans-paille/numpy-benchmarks
Code gathered from StackOverflow + other compiler code baseMostly high-level codeGenerate results for CPython, PyPy, Numba, Parakeet, Hopeand Pythran
Most kernels are too high level for Numba and Hope…
(Num)Focus: growcutFrom the Numba codebase!
# p y t h r a n e x p o r t g r o w c u t ( f l o a t [ ] [ ] [ ] , f l o a t [ ] [ ] [ ] , f l o a t [ ] [ ] [ ] , i n t )i m p o r t m a t hi m p o r t n u m p y a s n pd e f w i n d o w _ f l o o r ( i d x , r a d i u s ) : i f r a d i u s > i d x : r e t u r n 0 e l s e : r e t u r n i d x - r a d i u s
d e f w i n d o w _ c e i l ( i d x , c e i l , r a d i u s ) : i f i d x + r a d i u s > c e i l : r e t u r n c e i l e l s e : r e t u r n i d x + r a d i u s
d e f g r o w c u t ( i m a g e , s t a t e , s t a t e _ n e x t , w i n d o w _ r a d i u s ) : c h a n g e s = 0 s q r t _ 3 = m a t h . s q r t ( 3 . 0 )
h e i g h t = i m a g e . s h a p e [ 0 ] w i d t h = i m a g e . s h a p e [ 1 ]
Academic ResultsPythran: Enabling Static Optimization of Scientific PythonPrograms, S. Guelton, P. Brunet et al. in CSD, 2015Exploring the Vectorization of Python Constructs UsingPythran and Boost SIMD, S. Guelton, J. Falcou and P. Brunet,in WPMVP, 2014Compiling Python modules to native parallel modules usingPythran and OpenMP Annotations, S. Guelton, P. Brunetand M. Amini, in PyHPC, 2013Pythran: Enabling Static Optimization of Scientific PythonPrograms, S. Guelton, P. Brunet et al. in SciPy, 2013
Powered by Strong EngineeringPreprequisite for reproductible science
2773 test cases, incl. unit testing, doctest, Continuousintegration (thx !)Peer-reviewed codePython2.7 and C++11Linux, OSX (almost okay), Windows (on going)User and Developer doc: Hosted on Releases on PyPi: $ pip install pythran
: $ apt-get install pythran
Travis
http://pythonhosted.org/pythran/https://github.com/serge-sans-paille/pythran
Custom Debian repo
We need more peonsPythonic needs a serious cleanupTyping module needs better error reportingOSX support is partial and Windows support is on-goingnumpy.random and numpy.linalg
THE ENDAUTHORS
Serge Guelton, Pierrick Brunet, Mehdi Amini, AdrienMerlini, Alan Raynaud, Eliott Coyac…
INDUSTRIAL SUPPORT, , →You←
CONTRIBUTE#pythran on FreeNode, [email protected], GitHub repo
Silkan NumScale