How to Integrate Python into a Scala Stack to Build Real-Time Predictive Models
Jerry Chou
Lead Research Engineer
Background
• Product pivoted
  • Data search => data analysis
  • Build on top of existing infrastructure (hosted on AWS & Azure)
• Need tools for scientific computation
  • Mahout (Java)
  • Weka (Java)
  • Scikit-learn (Python)
Agenda
• Requirements and high level concepts
• Tools for calling Python from Scala
• Decision making
High Level Concept - Before
[Diagram: existing business logic (in both Scala & Java) connected to modeling logic (in Python) running on Node 1, Node 2, …, Node N]
Requirements
• APIs to exploit Python's modeling power
  • Train, predict, model info query, etc.
• Scalability
  • On-demand Python serving nodes
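The train / predict / model-info requirement can be pictured as a small service interface on the Python side. This is a hypothetical sketch: the ModelService class and its method names are illustrative assumptions, and a one-feature least-squares line stands in for a real scikit-learn model so the example runs with the standard library only.

```python
class ModelService:
    """Hypothetical train / predict / info API for a Python serving node.

    A one-feature least-squares line stands in for a real model
    (e.g. one built with scikit-learn).
    """

    def __init__(self):
        self.slope = None
        self.intercept = None
        self.n_samples = 0

    def train(self, xs, ys):
        # Ordinary least squares fit of y = slope * x + intercept
        if len(xs) != len(ys) or len(xs) < 2:
            raise ValueError("need at least two (x, y) pairs")
        x_mean = sum(xs) / len(xs)
        y_mean = sum(ys) / len(ys)
        cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        var = sum((x - x_mean) ** 2 for x in xs)
        if var == 0:
            raise ValueError("x values must not all be equal")
        self.slope = cov / var
        self.intercept = y_mean - self.slope * x_mean
        self.n_samples = len(xs)

    def predict(self, x):
        if self.slope is None:
            raise RuntimeError("model has not been trained")
        return self.slope * x + self.intercept

    def info(self):
        # Metadata the caller (e.g. the Scala side) can query
        return {
            "trained": self.slope is not None,
            "n_samples": self.n_samples,
            "slope": self.slope,
            "intercept": self.intercept,
        }


service = ModelService()
service.train([0, 1, 2, 3], [1, 3, 5, 7])  # points on y = 2x + 1
print(service.predict(10))  # 21.0
print(service.info())
```

Whichever integration tool is chosen, the Scala side would reach these three operations through it.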
Tools for Scala-Python Integration
• Reimplementation of Python
  • Jython (JPython)
• Communication through JNI
  • Jepp
• Communication through IPC
  • Thrift
• Communication through REST API calls
  • Bottle
Jython (JPython)
• Re-implementation of Python in Java
• Compiles to Java bytecode, either on demand or statically
• Can import and use any Java class
[Diagram (Jython): Scala code and Python code run in the same JVM, with Jython executing the Python code]
Jython
• Lacks support for many C extensions used in scientific computing
  • NumPy, SciPy, etc.
• JyNI to the rescue?
  • Not yet ready, even for NumPy
Terrible. Let's redo everything.
Communication through JNI
• Jepp (Java Embedded Python)
  • Embeds CPython in Java
  • Runs Python code in CPython
  • Leverages both JNI and the Python/C API for integration
[Diagram (Jepp): Scala code in the JVM calls, through JNI and Jepp, into a CPython interpreter that runs the Python code]
Jepp
python_util.py

def python_add(a, b):
    return a + b

TestJepp.scala

import jep.Jep

object TestJepp extends App {
  val jep = new Jep()
  jep.runScript("python_util.py")
  // Jep passes Java objects to Python, so box the Scala Ints
  val a = (2).asInstanceOf[AnyRef]
  val b = (3).asInstanceOf[AnyRef]
  val sumByPython = jep.invoke("python_add", a, b)
  println("2 + 3 = " + sumByPython)
  jep.close() // release the embedded interpreter
}
Communication through IPC
• Thrift
  • Developed and open-sourced by Facebook
  • IDL-based (Interface Definition Language)
  • Generates server/client code in the specified languages
  • Takes care of protocol and transport layer details
  • Comes with generators for Java, Python, C++, etc.
• No Scala generator
  • Scrooge to the rescue!
Thrift – IDL
TestThrift.thrift

namespace java python_service_test
namespace py python_service_test

service PythonAddService {
  i32 pythonAdd(1: i32 a, 2: i32 b),
}

$ thrift --gen java --gen py TestThrift.thrift
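Part of Thrift's value is the protocol layer it hides. As an illustration only, the sketch below hand-builds the bytes that Thrift's binary protocol (strict mode, as written by TBinaryProtocol) would put on the wire for the call pythonAdd(3, 5) defined in the IDL above. It is a simplified reconstruction using Python's struct module, not Thrift library code; the generated Java and Python stubs do all of this for you.

```python
import struct

# Constants from the Thrift binary protocol
VERSION_1 = 0x80010000  # strict-mode version marker (high bit set)
CALL = 1                # TMessageType.CALL
T_I32 = 8               # TType.I32
T_STOP = 0              # TType.STOP


def encode_python_add_call(a, b, seqid=0):
    """Bytes a strict binary-protocol client would send for pythonAdd(a, b)."""
    name = b"pythonAdd"
    out = bytearray()
    # Message header: version | message type, length-prefixed name, sequence id
    out += struct.pack(">I", VERSION_1 | CALL)
    out += struct.pack(">i", len(name)) + name
    out += struct.pack(">i", seqid)
    # Argument struct: each field is (type byte, i16 field id, i32 value)
    out += struct.pack(">bhi", T_I32, 1, a)  # 1: i32 a
    out += struct.pack(">bhi", T_I32, 2, b)  # 2: i32 b
    out += struct.pack(">b", T_STOP)         # end of the argument struct
    return bytes(out)


message = encode_python_add_call(3, 5)
print(len(message), message.hex())
```

Because the example server below uses a buffered (not framed) transport, no extra length prefix is wrapped around this message.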
Thrift – Python Server
PythonAddServer.py

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

# Generated by `thrift --gen py`; the gen-py directory must be on the Python path
from python_service_test import PythonAddService

class ExampleHandler(PythonAddService.Iface):
    def pythonAdd(self, a, b):
        return a + b

handler = ExampleHandler()
processor = PythonAddService.Processor(handler)
transport = TSocket.TServerSocket(port=9090)
tfactory = TTransport.TBufferedTransportFactory()
pfactory = TBinaryProtocol.TBinaryProtocolFactory()

server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
server.serve()

PythonAddService.py (generated, excerpt)

class Iface:
    def pythonAdd(self, a, b):
        pass
Thrift – Scala Client
PythonAddClient.scala

import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
import org.apache.thrift.transport.{TSocket, TTransport}
import python_service_test.PythonAddService

object PythonAddClient extends App {
  val transport: TTransport = new TSocket("localhost", 9090)
  val protocol: TProtocol = new TBinaryProtocol(transport)
  val client = new PythonAddService.Client(protocol)

  transport.open()
  // The method name must match the IDL definition (pythonAdd)
  val sumByPython = client.pythonAdd(3, 5)
  println("3 + 5 = " + sumByPython)
  transport.close()
}
Thrift
[Diagram (Thrift): Scala code in the JVM communicates through Thrift with multiple Python processes (…), each running Python code in its own Python interpreter]
Auto Balancing, Built-in Encryption
Oh ~ not bad.
REST API Architecture
[Diagram (REST): Scala code in the JVM calls multiple Bottle servers (…), each serving Python code. Open questions: auto balancer? encoding?]
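The original stack used Bottle for this layer. As a dependency-free sketch of the same idea, the block below uses only Python's standard library (http.server, json, urllib) to expose the python_add function from the Jepp example as a JSON-over-HTTP endpoint, and calls it the way the Scala side would. The /python_add route and the {"a", "b"} / {"result"} JSON shapes are assumptions for illustration; a Bottle app would declare the same route more concisely.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def python_add(a, b):
    return a + b


class AddHandler(BaseHTTPRequestHandler):
    """Handles POST /python_add with a JSON body like {"a": 3, "b": 5}."""

    def do_POST(self):
        if self.path != "/python_add":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        params = json.loads(self.rfile.read(length))
        body = json.dumps({"result": python_add(params["a"], params["b"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example


def call_python_add(base_url, a, b):
    """Client side: what the Scala code would do with any HTTP client."""
    request = urllib.request.Request(
        base_url + "/python_add",
        data=json.dumps({"a": a, "b": b}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["result"]


if __name__ == "__main__":
    # Port 0 asks the OS for any free port
    server = HTTPServer(("127.0.0.1", 0), AddHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(call_python_add(f"http://127.0.0.1:{server.server_port}", 3, 5))
    server.shutdown()
    server.server_close()
```

Note that the JSON encoding and the choice of route are the developer's responsibility here, which is exactly the "Encoding?" question on the slide.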
Thrift vs. REST
Feature | Thrift | REST | Does it matter?
Load Balancer | ✔ | | No (AWS & Azure)
Encode / Decode | ✔ | | No (we're already doing it)
Low Learning Curve | | ✔ | Maybe
No Dependency | | ✔ | Yes
Fliptop’s Architecture
[Diagram: Scala code in the JVM sends requests through a load balancer to multiple Bottle servers (…), each running Python code]
• 5 Python servers, ~4,500 requests/sec
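The load balancer spreads the Scala side's requests across the Python servers; at the quoted rate, an even split over 5 servers comes to about 900 requests/sec each. The sketch below shows the simplest policy, round-robin, as a client-side Python class. It is illustrative only: the class and hostnames are hypothetical, the slides do not say which policy the production balancer uses, and managed load balancers typically also run health checks.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hypothetical client-side round-robin over Python serving nodes."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(list(backends))
        self._lock = threading.Lock()  # guard the shared iterator across threads

    def next_backend(self):
        # Hand out backends in a fixed rotation: 1, 2, ..., N, 1, 2, ...
        with self._lock:
            return next(self._cycle)


# Hypothetical hostnames for five Python (Bottle) servers
nodes = [f"http://python-{i}:8080" for i in range(1, 6)]
balancer = RoundRobinBalancer(nodes)
picks = [balancer.next_backend() for _ in range(4500)]
print({node: picks.count(node) for node in nodes})
```

Round-robin assumes every node is equally fast; a weighted or least-connections policy would be the usual next step if some nodes were slower.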
Summary
• Jython
  • (✓) Tight integration with Scala/Java
  • (✗) Lacks support for C extensions (JyNI might help in the future)
• Jepp
  • (✓) Access to high-quality Python extensions at CPython speed
  • (✗) Two runtime environments
• Thrift, REST
  • (✓) Language-independent development
  • (✗) Higher communication overhead
Thank You
Other tools
• JyNI (Jython Native Interface)
  • A compatibility layer that enables Jython to use native CPython extensions such as NumPy or SciPy
  • Binary compatible with existing builds
• Cython
  • An optimizing static compiler for Python and the extended Cython language that translates Python code to C
• JNA (Java Native Access)
  • JNI-based library giving Java programs access to native shared libraries without hand-written JNI code
• JPE (Java-Python Extension)
  • JNI-based wrapper integrating Java and standard Python
  • Last updated: 2013-03-22