[PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

How to Integrate Python into a Scala Stack to Build Realtime Predictive Models
Jerry Chou, Lead Research Engineer, jerry@fliptop.com

Stories Beforehand
  • Product pivoted: data search => data analysis
  • Build on top of existing infrastructure (hosted on AWS & Azure)
  • Need tools for scientific computation
      Mahout (Java)
      Weka (Java)
      Scikit-learn (Python)

Agenda
  • Requirements and high level concepts
  • Tools for calling Python from Scala
  • Decision making

High Level Concept - Before
  [Diagram: existing business logic (in both Scala & Java); modeling logic (in Python) on Node 1, Node 2, ..., Node N]

Requirements
  • APIs to exploit Python's modeling power
      Train, predict, model info query, etc.
  • Scalability
      On-demand Python serving nodes

Tools for Scala-Python Integration
  • Reimplementation of Python
      Jython (JPython)
  • Communication through JNI
      Jepp
  • Communication through IPC
      Thrift
  • Communication through REST API calls
      Bottle

Jython (JPython)
  • Reimplementation of Python in Java
  • Compiles to Java bytecode, either on demand or statically
  • Can import and use any Java class

Jython
  [Diagram: JVM containing Scala code and Python code, with the Python code running on Jython]

Jython
  • Lacks support for many scientific computing extensions
      NumPy, SciPy, etc.
  • JyNI to the rescue?
      Not ready yet, even for NumPy

Communication through JNI
  • Jepp (Java Embedded Python)
  • Embeds CPython in Java
  • Runs Python code in CPython
  • Leverages both JNI and the Python/C API for integration

Jepp
  [Diagram: Scala code in the JVM and Python code in the Python interpreter, bridged by Jepp over JNI]

Jepp

  python_util.py

      def python_add(a, b):
          return a + b

  TestJepp.scala

      import jep.Jep

      object TestJepp extends App {
        val jep = new Jep()
        // Load the Python file so its functions become callable by name
        jep.runScript("python_util.py")
        // invoke takes Objects, so box the Ints first
        val a = (2).asInstanceOf[AnyRef]
        val b = (3).asInstanceOf[AnyRef]
        val sumByPython = jep.invoke("python_add", a, b)
      }
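The python_util.py example exposes a single toy function. As a rough sketch of how the train / predict / model info operations from the Requirements slide could be laid out in the same style (module-level functions that a caller such as Jepp loads once and then calls by name), consider the module below. Everything in it is an assumption made for illustration: the file name, the function names, and the tiny nearest-centroid classifier, which stands in for a real scikit-learn model so the example runs with the standard library only. How Jepp would marshal lists across the JNI boundary is not covered by the slides and is not addressed here.

```python
# model_util.py (hypothetical file name; not from the talk)
from collections import defaultdict

# label -> centroid. Module-level state stays alive inside the interpreter
# between calls, which is what lets train() and predict() be separate calls.
_model = {}


def train(rows, labels):
    """Fit a nearest-centroid classifier (a stand-in for a real model)."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] += 1
    _model.clear()
    for label, acc in sums.items():
        _model[label] = [total / counts[label] for total in acc]
    return len(_model)  # number of classes learned


def predict(row):
    """Return the label whose centroid is closest to row."""
    if not _model:
        raise RuntimeError("train() must be called before predict()")

    def distance(centroid):
        # squared Euclidean distance is enough for picking the minimum
        return sum((x - c) ** 2 for x, c in zip(row, centroid))

    return min(_model, key=lambda label: distance(_model[label]))


def model_info():
    """Report the known labels and the number of features."""
    n_features = len(next(iter(_model.values()))) if _model else 0
    return {"labels": sorted(_model), "n_features": n_features}
```

Keeping the model in module state mirrors the python_add pattern: the caller never holds a Python object, it only names a function and passes plain values.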
Communication through IPC
  • Thrift
      Developed & open sourced by Facebook
      IDL-based (Interface Definition Language)
      Generates server/client code in specified languages
      Takes care of protocol and transport layer details
      Comes with generators for Java, Python, C++, etc.
  • No Scala generator
      Scrooge to the rescue!

Thrift IDL

  TestThrift.thrift

      namespace java python_service_test
      namespace py python_service_test

      service PythonAddService {
        i32 pythonAdd (1:i32 a, 2:i32 b),
      }

      $ thrift --gen java --gen py TestThrift.thrift

Thrift Python Server

  PythonAddService.py

      # Interface class produced by the thrift compiler
      class Iface:
          def pythonAdd(self, a, b):
              pass

  PythonAddServer.py

      from thrift.protocol import TBinaryProtocol
      from thrift.server import TServer
      from thrift.transport import TSocket, TTransport

      from python_service_test import PythonAddService

      class ExampleHandler(PythonAddService.Iface):
          def pythonAdd(self, a, b):
              return a + b

      handler = ExampleHandler()
      # The processor comes from the generated service module
      processor = PythonAddService.Processor(handler)
      # Pass the port by keyword; the positional order differs across Thrift versions
      transport = TSocket.TServerSocket(port=9090)
      tfactory = TTransport.TBufferedTransportFactory()
      pfactory = TBinaryProtocol.TBinaryProtocolFactory()
      server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
      server.serve()
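To make "takes care of protocol and transport layer details" a little more concrete, here is a hand-written sketch of roughly what the binary protocol selected by TBinaryProtocolFactory does with the two arguments of pythonAdd. It is an illustration under stated assumptions, not the Thrift library's code: the function names are made up, only the argument struct is covered (the message header with the method name and sequence id is left out), and the layout reflects my understanding of the binary protocol: each field is a 1-byte type code, a 2-byte field id, then the value, all big-endian, with a zero byte closing the struct.

```python
import struct

T_STOP = 0  # type code that marks the end of a struct
T_I32 = 8   # type code for a 32-bit signed integer


def encode_add_args(a, b):
    """Encode pythonAdd's arguments (field 1: a, field 2: b) as one struct."""
    out = b""
    for field_id, value in ((1, a), (2, b)):
        # field header (type, id) followed by the 4-byte value, network byte order
        out += struct.pack("!bhi", T_I32, field_id, value)
    return out + struct.pack("!b", T_STOP)


def decode_add_args(data):
    """Decode bytes produced by encode_add_args into {field_id: value}."""
    fields = {}
    offset = 0
    while True:
        (field_type,) = struct.unpack_from("!b", data, offset)
        offset += 1
        if field_type == T_STOP:
            return fields
        if field_type != T_I32:
            raise ValueError("unexpected field type %d" % field_type)
        field_id, value = struct.unpack_from("!hi", data, offset)
        offset += 6  # 2 bytes of field id + 4 bytes of value
        fields[field_id] = value
```

Because fields are identified by the numeric ids from the IDL (1 and 2) rather than by name, both sides only need to agree on the .thrift file, which is what lets a Scala client talk to a Python server.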
Thrift Scala Client

  PythonAddClient.scala

      import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
      import org.apache.thrift.transport.{TSocket, TTransport}
      import python_service_test.PythonAddService

      object PythonAddClient extends App {
        val transport: TTransport = new TSocket("localhost", 9090)
        val protocol: TProtocol = new TBinaryProtocol(transport)
        val client = new PythonAddService.Client(protocol)
        transport.open()
        // Method name matches the IDL definition (pythonAdd)
        val sumByPython = client.pythonAdd(3, 5)
        println("3 + 5 = " + sumByPython)
        transport.close()
      }

Thrift
  [Diagram: Scala code in the JVM calls, through Thrift, several Python interpreters that each run Python code behind Thrift. Annotations: Auto Balancing, Built-in Encryption]

REST API Architecture
  [Diagram: Scala code in the JVM calls several Bottle servers that each run Python code. Annotations: Auto Balancer? Encoding?]

Thrift vs. REST
  • Criteria: load balancer, encode / decode, low learning curve, no dependency
  • Does it matter?
      Load balancer: No (AWS & Azure)
      Encode / decode: No (we're already doing it)
      Low learning curve: Maybe
      No dependency: Yes
  [The per-criterion marks for Thrift and REST were slide graphics and are not in the transcript]

Fliptop's Architecture
  [Diagram: Scala code in the JVM, a load balancer, and several Bottle servers running Python code]
  • 5 Python servers
  • ~4,500 requests/sec

Summary
  • Jython
      (+) Tight integration with Scala/Java
      (-) Lacks support for C extensions (JyNI might help in the future)
  • Jepp
      (+) Access to high-quality Python extensions at CPython speed
      (-) Two runtime environments
  • Thrift, REST
      (+) Language-independent development
      (-) Bigger communication overhead

Thank You
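The deck shows code for Jepp and Thrift but not for the REST option that Fliptop ended up using. The sketch below shows the shape of such a service using the same add example. It is not the talk's code: Python's standard-library http.server stands in for Bottle so that the example runs without extra packages, and the /add route, the a and b query parameters, the JSON response shape, and all function names are assumptions made for illustration.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class AddHandler(BaseHTTPRequestHandler):
    """Serves GET /add?a=2&b=3 and answers with a JSON body."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/add":
            self.send_error(404)
            return
        try:
            query = parse_qs(url.query)
            result = int(query["a"][0]) + int(query["b"][0])
        except (KeyError, ValueError):
            self.send_error(400)  # missing or non-integer parameter
            return
        body = json.dumps({"sum": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def start_server(port=0):
    """Serve on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), AddHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(port, a, b):
    """What any HTTP client, including one on the Scala side, would do."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/add?a=%d&b=%d" % (a, b))
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))["sum"]
    finally:
        conn.close()
```

Each request is self-contained, so several identical servers can sit behind an ordinary HTTP load balancer, which is the arrangement the Fliptop's Architecture slide describes.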
Other tools
  • JyNI (Jython Native Interface)
      A compatibility layer that enables Jython to use native CPython extensions like NumPy or SciPy
      Binary compatible with existing builds
  • Cython
      A compiler, written largely in Python, that translates Python code to C
  • JNA (Java Native Access)
      JNI-based wrapper providing Java programs access to native shared libraries
  • JPE (Java-Python Extension)
      JNI-based wrapper integrating Java and standard Python
      Last updated: 2013-03-22