Transcript
Page 1: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

How to Integrate Python into a Scala Stack to Build Realtime Predictive Models

Jerry Chou

Lead Research Engineer


Page 2: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Stories Beforehand

• Product pivoted
  • Data search => data analysis
  • Build on top of existing infrastructure (hosted on AWS & Azure)

• Need tools for scientific computation
  • Mahout (Java)
  • Weka (Java)
  • Scikit-learn (Python)

Page 3: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Agenda

• Requirements and high level concepts

• Tools for calling Python from Scala

• Decision making


Page 4: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

High Level Concept - Before

[Diagram: existing business logic (in both Scala & Java); modeling logic (in Python) on Node 1, Node 2, …, Node N]

Page 5: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Requirements

• APIs to exploit Python’s modeling power
  • Train, predict, model info query, etc.

• Scalability
  • On-demand Python serving nodes
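A minimal sketch of what a train / predict / model-info API could look like on the Python side. Everything here is hypothetical: the `ModelService` and `LinearModel` names, the method signatures, and the single-feature least-squares fit, which only stands in for a real scikit-learn estimator so that the example runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinearModel:
    slope: float
    intercept: float
    n_samples: int


class ModelService:
    """In-memory model registry exposing train / predict / info (illustrative names)."""

    def __init__(self) -> None:
        self._models: Dict[str, LinearModel] = {}

    def train(self, model_id: str, xs: List[float], ys: List[float]) -> None:
        # Ordinary least squares on one feature; a placeholder for a real estimator.
        if len(xs) != len(ys) or len(xs) < 2:
            raise ValueError("need at least two (x, y) pairs of equal length")
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("x values must not all be identical")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        self._models[model_id] = LinearModel(slope, mean_y - slope * mean_x, n)

    def predict(self, model_id: str, x: float) -> float:
        model = self._models[model_id]
        return model.slope * x + model.intercept

    def info(self, model_id: str) -> dict:
        model = self._models[model_id]
        return {
            "slope": model.slope,
            "intercept": model.intercept,
            "n_samples": model.n_samples,
        }


service = ModelService()
service.train("demo", [0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
prediction = service.predict("demo", 3.0)
```

Keeping the three operations behind one small interface means the Scala side only needs to know the operation names and payloads, whichever transport ends up carrying them.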

Page 6: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Tools for Scala-Python Integration

• Reimplementation of Python
  • Jython (JPython)

• Communication through JNI
  • Jepp

• Communication through IPC
  • Thrift

• Communication through REST API calls
  • Bottle

Page 7: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Jython (JPython)

• Re-Implementation of Python in Java

• Compiles to Java bytecode
  • Either on demand or statically

• Can import and use any Java class

Page 8: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Jython

[Diagram: Scala code and Python code both run inside the JVM, with the Python code running on Jython]

Page 9: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Jython

• Lacks support for many scientific computing extensions
  • NumPy, SciPy, etc.

• JyNI to the rescue?
  • Not yet ready, even for NumPy

Page 10: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Terrible. Redo everything.

Page 11: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Communication through JNI

• Jepp (Java Embedded Python)
  • Embeds CPython in Java
  • Runs Python code in CPython
  • Leverages both JNI and the Python/C API for integration

Page 12: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Jepp

[Diagram: Scala code in the JVM and Python code in the Python interpreter, connected through JNI by Jepp]

Page 13: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Jepp

TestJepp.scala

    import jep.Jep

    object TestJepp extends App {
      val jep = new Jep()
      // Load the Python helper so python_add is defined in the embedded interpreter
      jep.runScript("python_util.py")
      // invoke takes Object arguments, so box the Ints as AnyRef
      val a = (2).asInstanceOf[AnyRef]
      val b = (3).asInstanceOf[AnyRef]
      val sumByPython = jep.invoke("python_add", a, b)
    }

python_util.py

    def python_add(a, b):
        return a + b

Page 14: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Communication through IPC

• Thrift
  • Developed and open sourced by Facebook
  • IDL-based (Interface Definition Language)
  • Generates server/client code in the specified languages
  • Takes care of protocol and transport layer details
  • Comes with generators for Java, Python, C++, etc.

• No Scala generator
  • Scrooge to the rescue!

Page 15: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Thrift – IDL

TestThrift.thrift

    namespace java python_service_test
    namespace py python_service_test

    service PythonAddService {
      i32 pythonAdd(1: i32 a, 2: i32 b),
    }

    $ thrift --gen java --gen py TestThrift.thrift

Page 16: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Thrift – Python Server

PythonAddService.py (generated by Thrift)

    class Iface:
        def pythonAdd(self, a, b):
            pass

PythonAddServer.py

    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer
    from thrift.transport import TSocket, TTransport

    from python_service_test import PythonAddService


    class ExampleHandler(PythonAddService.Iface):
        def pythonAdd(self, a, b):
            return a + b


    handler = ExampleHandler()
    # The processor class lives in the generated PythonAddService module
    processor = PythonAddService.Processor(handler)
    # Pass the port by keyword: the first positional argument of TServerSocket is the host
    transport = TSocket.TServerSocket(port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()

    server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
    server.serve()

Page 17: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Thrift – Scala Client

PythonAddClient.scala

    import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
    import org.apache.thrift.transport.{TSocket, TTransport}
    import python_service_test.PythonAddService

    object PythonAddClient extends App {
      val transport: TTransport = new TSocket("localhost", 9090)
      val protocol: TProtocol = new TBinaryProtocol(transport)
      val client = new PythonAddService.Client(protocol)

      transport.open()
      // The method name matches the IDL definition: pythonAdd
      val sumByPython = client.pythonAdd(3, 5)
      println("3 + 5 = " + sumByPython)
      transport.close()
    }

Page 18: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Thrift

[Diagram: Scala code in the JVM talks through Thrift to multiple Python interpreters, each running Python code behind Thrift. Annotation: auto balancing, built-in encryption]

Page 19: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Oh, not bad.

Page 20: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

REST API Architecture

[Diagram: Scala code in the JVM calls multiple Bottle servers, each wrapping Python code. Open questions: auto balancer? Encoding?]
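The REST approach can be sketched as a small JSON-over-HTTP prediction endpoint. This sketch uses the standard library's `http.server` in place of Bottle so that it runs without extra dependencies. The `/predict` path, the `{"features": [...]}` payload shape, and the `predict` placeholder are all assumptions made for illustration, not details from the talk.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Placeholder for a real model call (e.g. a scikit-learn estimator).
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Decode the JSON request body, run the model, encode the JSON reply.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = {"prediction": predict(payload["features"])}
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_predict(port, features):
    """Roughly what the Scala side would do: POST JSON, read JSON back."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"features": features}),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())["prediction"]
    finally:
        conn.close()
```

To try it, call `start_server()`, read the chosen port from `server.server_address[1]`, and pass it to `call_predict`. The encode and decode steps are explicit here, which is the work the "Encoding?" question on the slide refers to.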

Page 21: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Thrift vs. REST

Feature              Thrift   REST   Does it matter?
Load balancer        ✔               No (AWS & Azure)
Encode / decode      ✔               No (we’re already doing it)
Low learning curve            ✔      Maybe
No dependency                 ✔      Yes

Page 22: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Fliptop’s Architecture

[Diagram: Scala code in the JVM sends requests through a load balancer to multiple Bottle servers, each wrapping Python code. 5 Python servers, ~4,500 requests/sec]
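The load balancer's role in this architecture can be illustrated with a round-robin sketch. The slides do not say which balancing policy the AWS or Azure load balancers applied, so round-robin is an assumption, and the `RoundRobinBalancer` class and node addresses are hypothetical.

```python
from itertools import cycle
from typing import Iterable, List


class RoundRobinBalancer:
    """Hands out backend addresses in a fixed rotating order."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._backends: List[str] = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(self._backends)

    def next_backend(self) -> str:
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical addresses for five Python serving nodes.
balancer = RoundRobinBalancer([f"python-node-{i}:8080" for i in range(1, 6)])
targets = [balancer.next_backend() for _ in range(7)]
```

With five backends, the sixth request wraps around to the first node again, so load spreads evenly as long as requests cost roughly the same.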

Page 23: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Summary

• Jython
  • (✓) Tight integration with Scala/Java
  • (✗) Lacks support for C extensions (JyNI might help in the future)

• Jepp
  • (✓) Access to high-quality Python extensions at CPython speed
  • (✗) Two runtime environments

• Thrift, REST
  • (✓) Language-independent development
  • (✗) Higher communication overhead

Page 24: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Thank You


Page 25: [PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Other tools

• JyNI (Jython Native Interface)
  • A compatibility layer that enables Jython to use native CPython extensions like NumPy or SciPy
  • Binary compatible with existing builds

• Cython
  • A compiler, written largely in Python, that translates Python code (and the Cython language, a superset of Python) to C

• JNA (Java Native Access)
  • JNI-based wrapper giving Java programs access to native shared libraries

• JPE (Java-Python Extension)
  • JNI-based wrapper integrating Java and standard Python
  • Last updated: 2013-03-22