20160708 Python as a Data Processing Platform (Sapporo)

Python

Sky

• Python 2000

(**)

• db tech showcase MongoDB

• FB: Ryuji Tamagawa

• Twitter : tamagawa_ryuji

2015

2016

• Python

• Python

• Python

• Python

• NumPy, SciPy, matplotlib, Pandas • Python

• scikit-learn • TensorFlow

• Python IPython, Jupyter notebook, Spyder, VisualStudio

• Python

• Python

• Pandas

• Spark - PySpark DataFrame API

• matplotlib

Part 1 : Python

Python•

• GoogleGuido GoogleGoogle 1

NumPy, SciPy, matplotlib → Pandas

-2000Linux

-2010 Web Trac

Google

Python•

Python

• pyODBC

• Web WSGI

Python• 2.x 3.x 32bit 64bit

64bit

• 2.x

• 3.x3

• 2.x3.x

• Ruby?

• R?

• Java?

• Scala?

Python• Python ’CPython’ JIT

PyPy JVM Jython .Net IronPython

• CPython

• CPython 2

• C

• processingPySpark

Python• Python

• 1 Linux Mac OS PythonPython Mac

• Python pip 3.x Python 2.7.9 2.xPython pip Linux Python

pip yum apt

• Python Anaconda Pythonconda

• python 2016 http://qiita.com/y__sama/items/5b62d31cb7e6ed50f02c

NumPy, SciPy, matplotlib, Pandas•

• NumPy SciPy

• PandasPandas Pandas NumPy

• Anaconda Python

Python•

scikit-learn http://scikit-learn.org/stable/

Python• TensorFlow

Python

Python

IPython

Jupyter, …

IDE: Spyder, Rodeo

Visual Studio, PyCharm, PyDev

• GUI IDLE

OK

• IPython

• Anaconda

• pip

• Jupyter Notebook

• Python

• IPython NotebookPython

• Apache Zeppelin http://zeppelin.apache.org

IDE

• R RStudio

• IDE

• 2 Spyder Rodeo

Spyder

• Visual Studio

• Eclipse PyDev

• PyCharm

Part 2 :

Python

1  1.2  1000000L (Python 2)

'abc'  u' ' (Python 2)

[1, 2, 3, 'foo', 'bar', 'foo']

(1, 2, 3, 'foo', 'bar', 'foo')

{'k1': 'value1', 'k2': 'value2'}

set([1, 2, 3, 'foo', 'bar'])

• split

s = 'foo, bar, baz'

items = s.split(',')

print(items[0])

print(items[-1])

print(items[0][-2:])
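As a small sketch of what the split example above produces (the variable `clean` and the `strip()` step are additions for illustration, not part of the slide): splitting on ',' alone keeps the space that follows each comma, so the later items start with a blank.

```python
s = 'foo, bar, baz'

# Splitting on ',' keeps the space after each comma.
items = s.split(',')

first = items[0]        # first element
last = items[-1]        # negative index counts from the end
tail = items[0][-2:]    # slice: last two characters of the first element

# Illustrative addition: strip surrounding whitespace from every item.
clean = [item.strip() for item in items]

print(items)
print(clean)
```

If the separator is known to be a comma followed by a space, `s.split(', ')` would avoid the extra cleanup step.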

• list comprehension

• dictionary comprehension

• lambda map, reduce, filter

sList = ['foo', 'bar', 'baz']

lList = [len(s) for s in sList]

lList = map(lambda s: len(s), sList)

lDict = {s: len(s) for s in sList}
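The slide lists `map`, `reduce` and `filter` but only shows `map`. A minimal sketch covering all three, written so that it runs on both Python 2.7 and 3.x, might look like this (the names `s_list`, `with_a` and `total` are illustrative, not from the slide):

```python
from functools import reduce  # a builtin in Python 2, must be imported in Python 3

s_list = ['foo', 'bar', 'baz']

# List comprehension: length of each string.
lengths = [len(s) for s in s_list]

# map() returns an iterator in Python 3, so wrap it in list() to materialise it.
lengths_via_map = list(map(len, s_list))

# filter() keeps only the elements for which the function returns True.
with_a = list(filter(lambda s: 'a' in s, s_list))

# reduce() folds the sequence into a single value, here a running sum.
total = reduce(lambda acc, n: acc + n, lengths, 0)

# Dictionary comprehension: string -> length.
length_by_word = {s: len(s) for s in s_list}

print(lengths, with_a, total, length_by_word)
```

In everyday code the comprehension forms are usually preferred over `map` and `filter` with a `lambda`, and `sum(lengths)` would replace the `reduce` call.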

Pandas• Pandas

matplotlib / seaborn

• NumPy, SciPy

Python

• Pandas + matplotlibOK Pandas NumPy

NumPy / SciPy

Pandas• Pandas

DataFrame

• R

• RDB2

• index Series Columns

Columns

Series Series SeriesIndex

Pandas I/O• CSV JSON RDB Excel

• column

• RDB

import pandas as pd

df = pd.read_csv(<filename>)

df = pd.read_json(<filename>)

df.to_csv(<filename>)

df.to_excel(<filename>)

# copy the DataFrame to the clipboard

df.to_clipboard()

• http://sinhrks.hatenablog.com/entry/2015/01/28/073327
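To make the I/O calls above concrete, here is a minimal round-trip sketch. It assumes an in-memory CSV (via `io.StringIO`) instead of a real file so that it runs anywhere; the sample rows are borrowed from the table used later in this deck, and the names `csv_text`, `restored` and `records` are illustrative.

```python
import io
import json

import pandas as pd

# In-memory stand-in for a CSV file on disk.
csv_text = (
    "id,value,color\n"
    "sapporo,43,red\n"
    "osaka,42,pink\n"
    "matsumoto,40,green\n"
)

# Readers are module-level functions and return a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Writers are DataFrame methods; with no path, to_csv returns a string.
csv_out = df.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_out))

# to_json with orient='records' yields one JSON object per row.
records = json.loads(df.to_json(orient='records'))

print(df)
print(records[0])
```

The pattern to remember is `pd.read_*` for input and `df.to_*` for output; passing a file name in place of the `StringIO` object reads from or writes to disk.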

0 1

import pandas as pd

df['nValue'] = df['value'] / sum(df['value'])

id         value  color
sapporo    43     red
osaka      42     pink
matsumoto  40     green

id         value  color  nValue
sapporo    43     red    0.344
osaka      42     pink   0.336
matsumoto  40     green  0.32
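The normalisation shown in the tables above can be reproduced with a short self-contained sketch. The DataFrame is built by hand here because the slide does not show where `df` comes from, and `df['value'].sum()` is used in place of the builtin `sum()` as a vectorised equivalent.

```python
import pandas as pd

# Hand-built copy of the input table from the slide.
df = pd.DataFrame({
    'id': ['sapporo', 'osaka', 'matsumoto'],
    'value': [43, 42, 40],
    'color': ['red', 'pink', 'green'],
})

# Divide each value by the column total so the new column sums to 1.
df['nValue'] = df['value'] / df['value'].sum()

print(df)
```

The division is applied to the whole column at once, so no explicit loop over rows is needed.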

Python

Spark - PySpark DataFrame API

Python

• Spark PySparkfindSpark

Spark

• Python Spark APIDataFrame API

• Spark PandasSpark

[Figure: PySpark diagram showing a driver and four Spark nodes]

matplotlib / seaborn

• Python NumPy / Pandas

• Jupyter Notebook, Spyder

Questions ?
