«iPython & Jupyter: 4 fun & profit», Лев Тонких, Rambler&Co

Our projects

Best of 2015

«Афиша-Рестораны» and «Рамблер.Почта» made the list of the 25 best iOS apps of 2015 chosen by the editors of the App Store, Apple's official online store. «Афиша-Рестораны» is ranked 15th and «Рамблер.Почта» 24th.

http://lenta.ru/news/2015/12/09/rambler/

Contacts

The Rambler&Co group always has open vacancies for people who want to grow and develop professionally while doing what they truly enjoy.

[email protected]

www.rambler-co.ru/jobs

What is the difference between IPython & Jupyter?

IPython, 2001

Everything lives in the console, plots open in a separate window, and so on.

IPython Notebook (pre-1.0), December 18, 2011

August 8, 2013 - IPython 1.0

April 1, 2014 - IPython 2.0

Widgets were introduced

February 27, 2015 - IPython 3.0

- 54 IPython kernels for other languages (https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages)

Problem

The name IPython

It is no longer just Python

Jupyter Notebook

And one more problem...

One big monolithic repository

"Where to put the sources", a talk by Григорий Петров

Jupyter 4.0 "BigSplit"

ipython, ipykernel, ipywidgets, notebook, jupyter_core, etc.

January 8, 2016 - Jupyter 4.1

First release after the BigSplit
Smaller releases will be frequent

Multi-cells

Select: Shift-Up / Shift-Down
Merge: Shift-M

Command palette

Cmd-Shift-P / Ctrl-Shift-P

Restart kernel and run-all

Find & Replace

Hotkey - F

In [ ]: ! jupyter notebook

A bit of magic

In [14]: %lsmagic

Out[14]: Available line magics: %alias %alias_magic %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %install_default_config %install_ext %install_profiles %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %popd %pprint %precision %profile %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode

Available cell magics: %%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%latex %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

In [15]: %quickref

pwd

Current working directory path

In [16]: %pwd

Out[16]: '/Users/l.tonkikh/ipython'

What is inside?

In [17]: import os
os.getcwd(), os.path.realpath('.')  # or smth else?

Out[17]: ('/Users/l.tonkikh/ipython', '/Users/l.tonkikh/ipython')

Docstrings & examples

In [18]: %pinfo %pwd

In [19]: from IPython.core.magics.osm import OSMagics
magic = OSMagics()
magic.pwd()

Out[19]: '/Users/l.tonkikh/ipython'

In [20]: %psource magic.pwd

In [21]: %psource %psource

In [22]: %psource %pwd

env

Environment variables

In [ ]: %env

In [24]: env = %env
%page env

True or False?

In [1]: env = %env
%set_env lol lol
q = 'lol' in env.keys()

env: lol=lol

In [2]: q

Out[2]: False
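The answer is False most likely because `%env` without arguments returns a plain dictionary copied from the process environment at that moment, so the later `%set_env` never reaches it. A minimal plain-Python sketch of the same effect, assuming `%env` behaves like `dict(os.environ)` and `%set_env` like an assignment into `os.environ` (the names `env_snapshot`, `in_snapshot` and `in_live_env` are illustrative only):

```python
import os

# Assumption: this mirrors `env = %env`, a snapshot copy of the environment.
os.environ.pop("lol", None)          # start from a clean state
env_snapshot = dict(os.environ)

# Assumption: this mirrors `%set_env lol lol`.
os.environ["lol"] = "lol"

in_snapshot = "lol" in env_snapshot  # the copy was taken before the change
in_live_env = "lol" in os.environ    # the live mapping sees the new variable
print(in_snapshot, in_live_env)
```

Calling `%env` again after `%set_env` would give a fresh snapshot that does contain the new variable.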

In [27]: # %mkdir test
# %cd test

In [ ]: # %load test.py

test.py

print('Hello world')

In [31]: %pycat test.py

Shell capture

In [32]: %sc files = ls

In [33]: files

Out[33]: 'Untitled.ipynb\ndocs\nenv\nhtml\nimg\nipython.ipynb\nipython.slides.html\nipython_log.py\nnot_track\noverwritten.py\nrequirements.txt\ntest.png\ntest.py\n'

In [34]: ! ls

Untitled.ipynb env img ipython.slides.html not_track
docs html ipython.ipynb ipython_log.py overwritten.py test.png

In [35]: files = ! ls
files, files.n, files.s

Out[35]: (['Untitled.ipynb', 'docs', 'env', 'html', 'img', 'ipython.ipynb', 'ipython.slides.html', 'ipython_log.py', 'not_track', 'overwritten.py', 'requirements.txt', 'test.png', 'test.py'], 'Untitled.ipynb\ndocs\nenv\nhtml\nimg\nipython.ipynb\nipython.slides.html\nipython_log.py\nnot_track\noverwritten.py\nrequirements.txt\ntest.png\ntest.py', 'Untitled.ipynb docs env html img ipython.ipynb ipython.slides.html ipython_log.py not_track overwritten.py requirements.txt test.png test.py')
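The three views in the output above (a list of lines, the `.n` newline-joined string and the `.s` space-joined string) can be imitated in plain Python. This is only a sketch of the idea with the standard `subprocess` module, not how IPython builds its result; the child command prints three made-up file names so that it runs on any platform, where the slide uses `ls`.

```python
import subprocess
import sys

# Run a child process that prints three lines; `ls` would work the same way.
cmd = [sys.executable, "-c", "print('docs'); print('env'); print('test.py')"]
raw = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

lines = raw.splitlines()        # like the list itself: one item per output line
as_newlines = "\n".join(lines)  # like `files.n`
as_spaces = " ".join(lines)     # like `files.s`
print(lines, repr(as_newlines), repr(as_spaces))
```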

Shell execute

In [36]: files = %sx ls
# == %sc -l files = ls
files
# files.n
# files.s

Out[36]: ['Untitled.ipynb', 'docs', 'env', 'html', 'img', 'ipython.ipynb', 'ipython.slides.html', 'ipython_log.py', 'not_track', 'overwritten.py', 'requirements.txt', 'test.png', 'test.py']

In [37]: !! ls

Out[37]: ['Untitled.ipynb', 'docs', 'env', 'html', 'img', 'ipython.ipynb', 'ipython.slides.html', 'ipython_log.py', 'not_track', 'overwritten.py', 'requirements.txt', 'test.png', 'test.py']

One more example

In [38]: ! pip freeze | grep ipython

ipython==4.0.3 ipython-genutils==0.1.0

In [39]: !! pip freeze | grep ipython

Out[39]: ['ipython==4.0.3', 'ipython-genutils==0.1.0']

Bash

In [40]: %%bash
for i in {1..3}; do
    echo "$i"
done

1
2
3

Who

In [41]: %who

OSMagics a env files magic os q

In [42]: %who_ls

Out[42]: ['OSMagics', 'a', 'env', 'files', 'magic', 'os', 'q']

In [43]: %who_ls dict

Out[43]: ['env']

In [44]: %whos

Variable   Type            Data/Info
-------------------------------------
OSMagics   MetaHasTraits   <class 'IPython.core.magics.osm.OSMagics'>
a          SList           ['You are using pip versi<...>ipython-genutils==0.1.0']
env        dict            n=39
files      SList           ['Untitled.ipynb', 'docs'<...>', 'test.png', 'test.py']
magic      OSMagics        <IPython.core.magics.osm.<...>cs object at 0x106353160>
os         module          <module 'os' from '/Libra<...>3.5/lib/python3.5/os.py'>
q          bool            True

psearch

Search by variable name

In [63]: a1 = 1
a2 = 'a2'

In [64]: %psearch a*

In [66]: %psearch -e builtin a*
# a1
# a2

In [67]: %psearch -e builtin a* int
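The wildcard search above can be pictured as a filter over the names in a namespace, optionally narrowed by type as in the last cell. The sketch below is a rough stand-in built on the standard `fnmatch` module, not IPython's implementation; the `search` helper and the `namespace` dictionary are hypothetical.

```python
from fnmatch import fnmatchcase

def search(namespace, pattern, type_=None):
    """Return sorted names matching a shell-style pattern, optionally filtered by type."""
    return sorted(
        name
        for name, value in namespace.items()
        if fnmatchcase(name, pattern) and (type_ is None or isinstance(value, type_))
    )

namespace = {"a1": 1, "a2": "a2", "b": 3.0}  # hypothetical user namespace
all_a = search(namespace, "a*")        # every name starting with "a"
int_a = search(namespace, "a*", int)   # only the integer-valued ones
print(all_a, int_a)
```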

Logging

In [69]: %logstate

Logging has not been activated.

In [70]: %logstart

Activating auto-logging. Current session state plus future input saved.
Filename       : ipython_log.py
Mode           : rotate
Output logging : False
Raw input log  : False
Timestamping   : False
State          : active

In [71]: %logoff

Switching logging OFF

In [72]: %logon

Switching logging ON

In [73]: %logstop

In [ ]: %cat ipython_log.py

Python2 & Python3

In [69]: %%html

<iframe scrolling="no" style="border:none;" width="640" height="330" src="http://www.google.com/trends/fetchComponent?hl=en-US&q=python+2,+python+3&cmpt=q&tz=Etc/GMT-3&tz=Etc/GMT-3&content=1&cid=TIMESERIES_GRAPH_0&export=5&w=640&h=330"></iframe>

[Google Trends chart: interest over time for "python 2" vs "python 3", web search, worldwide, 2004 - present]

In [70]: %%python2
import sys
print(sys.version)

Couldn't find program: 'python2'

In [71]: %%python3
import sys
print(sys.version)

3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

In [6]: %%perl

print(sqrt(4))

2

In [3]: %%ruby

include Math

puts sqrt(4)

2.0

Writing & Sharing

In [254]: %%writefile -a overwritten.py
a = 'Hi'
# You are in Good Company

Writing overwritten.py

In [255]: %pastebin -d 'test file' overwritten.py

Out[255]: 'https://gist.github.com/baaa3be4614b1c26d6f1'

Time

Execution time

In [279]: %time x = sum(range(10000))

CPU times: user 365 µs, sys: 2 µs, total: 367 µs Wall time: 370 µs

In [278]: %timeit x = sum(range(10000))

1000 loops, best of 3: 208 µs per loop

In [42]: %%timeit -n 1000
x = range(10000)
max(x)

1000 loops, best of 3: 320 µs per loop
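Outside a notebook, the standard `timeit` module gives a comparable measurement. As the "best of 3" output above shows, `%timeit` loops over the statement several times and reports the best run, while `%time` runs it once. The sketch below assumes the same loop count and repeat count as that output; the numbers it prints depend on the machine.

```python
import timeit

number = 1000   # loops per measurement, like `-n 1000`
repeat = 3      # measurements taken; the best one is reported

timings = timeit.repeat("sum(range(10000))", number=number, repeat=repeat)
best_per_loop = min(timings) / number   # seconds per loop for the best run
print(f"{number} loops, best of {repeat}: {best_per_loop * 1e6:.0f} µs per loop")
```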

LaTeX

In [3]: %%latex
\begin{equation}
\label{eq:normal_dist}
\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\end{equation}

[Rendered output: the normal distribution density, numbered (1)]

In [323]: %%latex
\begin{align}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{align}

[Rendered output: Maxwell's equations, numbered (2) to (5)]

Demos

Where to hide from a nuclear explosion?

In [32]: import os
import random
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import display, IFrame

In [33]: cities = {
    'Washington': (38.890366, -77.031955),
    'New York': (40.714545, -74.007112),
    'Los Angeles': (34.053485, -118.245313),
    'Las Vegas': (36.171906, -115.139963),
}

In [34]: city = widgets.Dropdown(
    options=cities,
    value=random.choice(list(cities.values())),
    description='Choose city:',
)

display(city)

In [35]: button = widgets.ToggleButton(
    description='Merry Christmas!',
    tooltip='123',
    value=False,
)

In [36]: button

In [36]: import folium
my_map = folium.Map(location=city.value, zoom_start=12)
my_map.circle_marker(location=city.value, radius=1900, popup='Laurelhurst Park', line_color='#3186cc', fill_color='#3186cc')
my_map.simple_marker(city.value, popup='Merry Christmas here', marker_color='red', marker_icon

my_map.create_map(path='html/map.html')

# L.control.scale().addTo(map);

In [38]: IFrame(src='html/map.html', width=1000, height=350)

Out[38]: [Interactive Leaflet (http://leafletjs.com) map; map data (c) OpenStreetMap (http://openstreetmap.org) contributors]

Github Commits

Introducing the New GitHub Graphs - April 25, 2012 (https://github.com/blog/1093-introducing-the-new-github-graphs)

In [4]: from github import Github
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
%matplotlib inline

In [3]: GITHUB_USER = os.getenv('GITHUB_USER')
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
REPO_PATH = 'python/cpython'

In [8]: g = Github(GITHUB_USER, GITHUB_TOKEN)
repo = g.get_repo(REPO_PATH)
df = pd.DataFrame({'date': pd.Series((i.commit.committer.date for i in repo.get_commits()))})
stats = df.groupby(pd.Grouper(key='date', freq='M')).size()

In [35]: plt.figure(figsize=(16,4))
stats.plot(title='Stats of commits', label='Commits', legend=True)
plt.xlabel('Date'); plt.ylabel('Commits')
plt.suptitle(REPO_PATH, y=1.05, fontsize=14, fontweight='bold')

Out[35]: <matplotlib.text.Text at 0x1789e32b0>
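The grouping step above counts commits per calendar month. The same aggregation can be sketched with the standard library alone, which makes the idea easy to check without GitHub credentials or pandas. This swaps `pd.Grouper` for a plain `Counter`, and the timestamps are made up to stand in for `i.commit.committer.date`.

```python
from collections import Counter
from datetime import datetime

# Hypothetical commit timestamps standing in for `i.commit.committer.date`.
commit_dates = [
    datetime(2015, 11, 3, 10, 0),
    datetime(2015, 11, 28, 18, 30),
    datetime(2015, 12, 9, 9, 15),
    datetime(2016, 1, 8, 12, 0),
    datetime(2016, 1, 20, 7, 45),
    datetime(2016, 1, 31, 23, 59),
]

# Bucket every commit by (year, month) and count the bucket sizes.
per_month = Counter((d.year, d.month) for d in commit_dates)

for (year, month), count in sorted(per_month.items()):
    print(f"{year}-{month:02d}: {count}")
```

One difference worth keeping in mind: the `Counter` only holds months that have at least one commit, whereas a time-based grouper in pandas also yields the empty months in between.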

ipyparallel

Powerful architecture for parallel and distributed computing

Single program, multiple data (SPMD) parallelism.
Multiple program, multiple data (MPMD) parallelism.
Message passing using MPI.
Task farming.
Data parallel.
Combinations of these approaches.
Custom user-defined approaches.

Engine

An IPython instance that takes commands
Handles incoming and outgoing Python objects

Controller

Provides an interface for working with a set of engines (Direct or LoadBalanced)
A collection of processes to which IPython engines and clients can connect
Controller = Hub + Schedulers

Hub

The center of a cluster
A process that keeps track of engine connections, schedulers, and clients

Schedulers

All actions that can be performed on an engine go through a Scheduler
Provide a fully asynchronous interface to a set of engines

In [ ]: ! ipcluster start -n 4

In [3]: %%bash
python not_track/habraparse.py save_favs_list stleon not_track/test.txt
python not_track/habraparse.py save_favs_list --gt stleon not_track/test1.txt
python not_track/habraparse.py save_favs_list --mm stleon not_track/test2.txt
cat not_track/test1.txt not_track/test2.txt >> not_track/test.txt

In [47]: from ipyparallel import Client, require

client = Client()
dview = client[:]

In [48]: dview

Out[48]: <DirectView [0, 1, 2, 3]>

In [49]: links = (link.strip() for link in open('not_track/test.txt'))

In [50]: dview.scatter('links', list(links))

Out[50]: <AsyncResult: finished>

In [51]: len(dview['links'])

Out[51]: 4
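The length is 4 because every engine now holds its own part of the list, and indexing the view returns one entry per engine. How the list is cut up can be pictured with the sketch below, which assumes contiguous chunks of near-equal size; `scatter_chunks` is a hypothetical helper written for illustration and is not part of ipyparallel.

```python
def scatter_chunks(items, n_engines):
    """Split items into n_engines contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_engines)
    chunks, start = [], 0
    for engine in range(n_engines):
        size = base + (1 if engine < extra else 0)  # first `extra` engines get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks

links = [f"link-{i}" for i in range(10)]   # hypothetical stand-ins for the URLs
chunks = scatter_chunks(links, 4)
print([len(c) for c in chunks])
```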

In [52]: @dview.remote(block=False)
@require('requests', 'bs4')
def tag_maker():
    tags = {}
    for link in links:
        soup = bs4.BeautifulSoup(requests.get(link).text, 'html.parser')
        for i in soup.findAll("a", rel="tag"):
            tag = i.string.lower()
            tags[tag] = 1 + tags.get(tag, 0)
    return tags

In [53]: tags = tag_maker().result

In [1]: # tags

In [55]: from collections import Counter
from functools import reduce
from operator import add

new_tags = dict(reduce(add, (Counter(tag) for tag in tags)))

In [56]: len(new_tags.keys())

Out[56]: 827
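The merge step works because adding two `Counter` objects sums their counts key by key, so reducing with `add` folds the per-engine dictionaries into one. A small self-contained run of the same expression, with made-up tag counts standing in for the real results from the engines:

```python
from collections import Counter
from functools import reduce
from operator import add

# Hypothetical per-engine results, one dict of tag counts per engine.
tags = [
    {"python": 3, "jupyter": 1},
    {"python": 2, "ipython": 4},
    {"jupyter": 5},
]

# Adding Counters sums the counts key by key.
new_tags = dict(reduce(add, (Counter(t) for t in tags)))
print(new_tags)
```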

In [ ]: from wordcloud import WordCloud

wordcloud = WordCloud(width=1920, height=1080, scale=1, font_path='/Library/Fonts/Verdana.ttf', max_words=len(new_tags.keys()))

wordcloud.generate_from_frequencies(new_tags.items())

In [ ]: plt.imshow(wordcloud)
plt.axis("off")
plt.show()
wordcloud.to_file('test.png')

In [ ]: ! ipcluster stop

nbconvert

https://github.com/jupyter/nbconvert

http://nbconvert.readthedocs.org/en/latest/usage.html

In [ ]: ! jupyter nbconvert --to slides ipython.ipynb --post serve

Links

https://github.com/stleon/ipython-slides
https://talkpython.fm/episodes/show/44/project-jupyter-and-ipython
http://blog.jupyter.org/2016/01/08/notebook-4-1-release/
https://jupyter.readthedocs.org/en/latest/
https://folium.readthedocs.org/en/latest/
http://matplotlib.org
http://stanford.edu/~mwaskom/software/seaborn/
https://github.com/PyGithub/PyGithub
http://ipyparallel.readthedocs.org/en/latest/index.html

Thank you!