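Course Schedule
Stage – Main topics
• Python basics – Python overview and installation
– Variables, statements, conditionals and loops
– Functions, modules and programs, example programs
• Python programming (1) – String
– List/Dictionary/Set
– Modules and packages
• Python programming (2) – File & I/O
– OOP
– Exception handling
• Python programming (3) – Regular expressions, using databases
– Standard Library, other useful features
• Data analysis with Python (1) – Data analysis with Python (1)
– Data analysis with Python (2)
• Data analysis with Python (2) – Data analysis with Python (3)
– Python and big data
• Wrap-up
Big Data Analysis Training (2015-11)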
I-1. Introduction
• General overview
• Installation
• A first taste of Python programs
General overview
• Background
• Features
• Implementations
• Scope of this course
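Background
• Developed by Guido van Rossum in the Netherlands in the late 1980s; first released in 1991
• Open source and cross-platform; distributed under the Python Software Foundation License (GPL-compatible, but not the GPL itself)
• Influenced by ABC, Modula-3, C, C++, Algol-68, Smalltalk, the Unix shell, and others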
Python features (1)
• High-level language
• Interpreted language
• Interactive language
• Object-oriented language
• Scripting language
Python features (2)
• Easy to learn
• Easy to read
• Easy to maintain
• Mathematical concepts
• Comparison with other languages – C, C++
– Java
– R
Python implementations
• "Python" or "CPython", written in C – Version 2.7: released mid-2010
– Version 3.1.2: released early 2010
• "Jython": Python for the JVM, written in Java
• "IronPython": Python for the .NET environment, written in C#
• Domain-specific packages
Development environments
• Python interactive shell – IDLE
• Others – 1. PyDev with Eclipse – 2. Komodo – 3. Emacs – 4. Vim – 5. TextMate – 6. Gedit – …
• IPython
Ways to use Python
• (1) Interactively, in the interpreter
• (2) Running script programs from the command line
• http://www.python.org/
• https://docs.python.org/3.5/
• https://docs.python.org/2/tutorial/
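Scope of this course
• Covered – Python language basics
• Functions, string processing, OOP, regular expressions (RE), using databases, and more • Packages for data analysis, such as NumPy
– Applications • Data analysis (NumPy, pandas, …) • Machine learning and text analysis
– Development environment: • Python 3.4 as the baseline (2.7 in places) • MS Windows (standard IDLE installation)
• Not covered – GUI – Networking/Web/Hacking/Unicode, etc.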
Installation
• Basic Python installation
– Installing on MS Windows
• Eclipse with PyDev
• Installing packages for data analysis
– (covered in the corresponding session)
Basic Python installation – hands-on
• Download from www.python.org
• Running Python
– python on the command line
– Python IDLE (Integrated Development Environment)
– (Linux) start the script with a shebang line:
– #!/usr/bin/python
Eclipse with PyDev
• Install Java – Java for developers > JDK
• Eclipse – download > run – change the path
• Create a JAVA_HOME environment variable • Add %JAVA_HOME%\bin; at the very beginning of PATH
– Check: cmd > java
• PyDev – Help > Install New Software > PyDev (http://pydev.org/updates)
– Preferences – Perspectives
• Window > Preferences > – Run/Debug > Always launch … – General > Content Types > … UTF-8 – PyDev > Editor > ..
A first taste
• Getting a feel for Python
A first taste – hands-on
• Number handling – Numbers & Expressions
• >>> 2+2
• >>> 1.0 / 2.0
• >>> 1/2
– Octal and hexadecimal
• >>> 0xAF (hexadecimal)
• >>> 100000000000000000000
• A program – >>> print("Hello World!")
– Using comments
– #
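The shell experiments above can also be run as a small script. This is a sketch assuming Python 3 (the course baseline is 3.4); the variable names are only illustrative. It shows that hexadecimal, octal, and binary literals all produce ordinary int values, that int() can parse digits in a given base, that integers do not overflow, and that / and // differ.

```python
# Integer literals in different bases all produce ordinary int objects
hex_value = 0xAF      # hexadecimal
oct_value = 0o17      # octal (Python 3 requires the 0o prefix)
bin_value = 0b1010    # binary

# int() parses a string in a given base; hex() converts back to a hex string
parsed = int('AF', 16)
as_hex = hex(175)

# Python 3 integers have arbitrary precision, so this does not overflow
big = 100000000000000000000 * 100000000000000000000

# / always performs true division; // performs floor division
half = 1 / 2
floor_half = 1 // 2

print(hex_value, oct_value, bin_value, parsed, as_hex)  # 175 15 10 175 0xaf
print(big == 10 ** 40)                                  # True
print(half, floor_half)                                 # 0.5 0
```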
I-2. Python program basics
• Variables
• Data types
• Statements
• Modules
• Strings
• Saving and running programs
• Classes and libraries
• Example program – Python Turtle Graphics
Data types and variables
• Basic (built-in) data types – Numbers
• Integers
• Floats
• Complex numbers
• Booleans
– List, Tuple • List: a kind of array, with extra features
• Tuple: immutable
– String
– Dictionary
– Sets
– File objects
– # Data types
>>> type(20)
<class 'int'>
>>> type("17")
<class 'str'>
>>> type("3.2")
<class 'str'>
>>> type('this is a string')
<class 'str'>
>>> type("""and this""")
<class 'str'>
>>> print('''"Oh no", she said''')
"Oh no", she said
• # Numbers
>>> 2+2
4
>>> 1/2
0.5
>>> 10/3
3.3333333333333335
>>> 2**3
8
>>> (-3)**4
81
>>> 0xAF
175
• # List
>>> []
[]
>>> [1]
[1]
>>> [1,2,3,4]
[1, 2, 3, 4]
>>> [1, "two", 3, 4.0, ["a","b"],(5,6)]
[1, 'two', 3, 4.0, ['a', 'b'], (5, 6)]
>>> x = ["first","second","third","fourth"]
>>> x[0]
'first'
>>> x[2]
'third'
>>> x[-2]
'third'
>>> x[1:-1]
['second', 'third']
>>> x[:3]
['first', 'second', 'third']
>>> x[-2:]
['third', 'fourth']
• # Tuple
>>> ()
()
>>> (1,)
(1,)
>>> (1,2,3,4)
(1, 2, 3, 4)
>>> (1,2,"three",["a","b"], (5,6))
(1, 2, 'three', ['a', 'b'], (5, 6))
– # Strings
>>> 'Let's go'
SyntaxError: invalid syntax
>>> 'Let\'s go!'
"Let's go!"
>>> "\"Hello, world\" she said"
'"Hello, world" she said'
>>> '"Hello, world" she said'
'"Hello, world" she said'
– # String concatenation
>>> "Let's say " '"Hello, world"'
'Let\'s say "Hello, world"'
>>> "Hello, " + "world!"
'Hello, world!'
>>> x = "Hello, "
>>> y = "world!"
>>> x + y
'Hello, world!'
• Long strings – """ ... """ or ''' ... '''
>>> '''This is a very long string.
It continues here.
And it's not over yet.
"Hello, world!"
Still here.'''
'This is a very long string.\nIt continues here.\nAnd it\'s not over yet.\n"Hello, world!"\nStill here.'
>>> print("hello \
world and again \
you too")
hello world and again you too
• Unicode • # In Python 3.4
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> type('파이썬')
<class 'str'>
>>> type(u'파이썬')
<class 'str'>
>>> s='파이썬'
>>> u=u'파이썬'
>>> s==u
True
• # In Python 2.6
>>> type(u'파이썬')
<type 'unicode'>
>>> type('파이썬')
<type 'str'>
>>> s='파이썬'
>>> u=u'파이썬'
>>> s==u    # byte string vs. unicode string: Python 2 issues a UnicodeWarning and treats them as unequal
False
• # Dictionary
>>> x = {1:"one", 2:"two"}
>>> x[1]
'one'
>>> x[2]
'two'
>>> x.get(4, "not available")
'not available'
• # Sets
>>> x = set([1,2,3,1,3,5])
>>> x
{1, 2, 3, 5}
>>> 1 in x
True
>>> 4 in x
False
• Variables
>>> x = 3
>>> x * 2
6
>>> message = "What's up?"
>>> message
"What's up?"
>>> day = "Monday"
>>> day
'Monday'
>>> day = "Friday"
>>> day
'Friday'
>>> day = 21
>>> day
21
• Variable names and keywords – do not use keywords as names – underscores (_) are allowed, but – a leading underscore has a special meaning
• Usual conventions (PEP 8) – class names start with a capital letter – function and method names are lowercase, with words separated by underscores (not camelCase) – constants are all uppercase – other variables are lowercase
• Python Style Guide – http://legacy.python.org/dev/peps/pep-0008/
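A quick way to check the keyword rule from Python itself is the standard keyword module. The other names defined below (MAX_SIZE, circle_area, Circle, _cache) are purely illustrative, chosen to follow the PEP 8 conventions listed above.

```python
import keyword

# Keywords are reserved and cannot be used as variable names
print(keyword.iskeyword('for'))    # True
print(keyword.iskeyword('day'))    # False

MAX_SIZE = 100                     # constant: all uppercase

def circle_area(radius):           # function: lowercase_with_underscores
    return 3.14159 * radius ** 2

class Circle:                      # class: starts with a capital letter
    def __init__(self, radius):
        self.radius = radius       # ordinary variable/attribute: lowercase

_cache = {}                        # leading underscore: "internal use" by convention

print(circle_area(2))              # 12.56636
```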
Statements
• Basics
>>> 2+2
4
>>> print(2*2)
4
• Input from the user
>>> input("How old are you? ")
How old are you? 35
'35'
>>> x = input("x: ")
x: 30
>>> y = input("y: ")
y: 25
>>> x+y    # input() always returns a string, so + concatenates
'3025'
>>> int(x)+int(y)
55
>>> response = input("What is the radius? ")
What is the radius? 4
>>> r = float(response)
>>> area = 3.14159 * r ** 2
>>> print("The area is: ", area)
The area is:  50.26544
• Order of operations (precedence) – (when in doubt, use parentheses)
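The precedence rules can be checked directly. A minimal sketch, with values chosen only for illustration, of the cases that most often surprise beginners:

```python
a = 2 + 3 * 4       # * binds tighter than +: 14
b = (2 + 3) * 4     # parentheses override precedence: 20
c = -3 ** 2         # ** binds tighter than unary minus: -(3 ** 2) = -9
d = (-3) ** 2       # 9
e = 2 ** 3 ** 2     # ** groups right to left: 2 ** (3 ** 2) = 512
f = 10 - 4 - 3      # - groups left to right: (10 - 4) - 3 = 3
print(a, b, c, d, e, f)   # 14 20 -9 9 512 3
```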
Functions
• Overview
• Examples
>>> 2**3
8
>>> pow(2,3)
8
>>> 10+pow(2, 5) / 5.0
16.4
>>> abs(-10)
10
>>> int(3.14)
3
>>> int(3.99)
3
>>> int(-3.99)
-3
>>> int("23 bottles")
Traceback (most recent call last):
…
Module
• A module is a file containing extensions (additional functionality)
>>> import math
>>> math.floor(32.9)
32
>>> from math import sqrt
>>> sqrt(9)
3.0
>>> from math import sqrt
>>> sqrt(-1)
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
sqrt(-1)
ValueError: math domain error
>>>
>>> import cmath
>>> cmath.sqrt(-1)
1j
>>> (1+3j) * (9+4j)
(-3+31j)
Saving and running programs
• Using IDLE
• From the command line
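As an illustration of a program saved to a file, here is a minimal script sketch; the file name hello.py and the functions greet() and main() are hypothetical. In IDLE it can be run with Run > Run Module (F5); from the command line with python hello.py (python3 on some systems). The `if __name__ == "__main__"` test is a common idiom that runs main() only when the file is executed directly, not when it is imported as a module.

```python
#!/usr/bin/env python3
# hello.py (hypothetical file name)

def greet(name):
    """Return a greeting for the given name."""
    return "Hello, " + name + "!"

def main():
    print(greet("World"))

# Runs only when this file is executed as a script, not when it is imported
if __name__ == "__main__":
    main()
```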
Classes and libraries
• What is OOP?
• Classes and instances
• Python programming styles
– Procedural
– OOP
• Libraries
• The Standard Library
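Python environment variables
Variable | Description
PYTHONPATH | Similar in role to PATH: additional directories to search for modules (they are added to sys.path)
PYTHONSTARTUP | File run each time the interactive interpreter starts, for initialization; similar to the Unix .profile or .login file
PYTHONCASEOK | If set, import ignores the case of module names (Windows and OS X only)
PYTHONHOME | Changes the location of the standard Python libraries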
Python Turtle Graphics
# turtle_test1.py
import turtle
wn = turtle.Screen()
alex = turtle.Turtle()
alex.forward(150)
alex.left(90)
alex.forward(75)
wn.exitonclick()    # keep the window open until it is clicked

# turtle_test2.py
import turtle
wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.color("blue")
tess.pensize(3)
tess.forward(50)
tess.left(120)
tess.forward(50)
wn.exitonclick()
# turtle_test3.py
import turtle
wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.color("hotpink")
tess.pensize(5)
alex = turtle.Turtle()
tess.forward(80)
tess.left(120)
tess.forward(80)
tess.left(120)
tess.forward(80)
tess.left(120)
tess.right(180)
tess.forward(80)
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
wn.exitonclick()
# turtle_test4.py
import turtle
wn = turtle.Screen()
alex = turtle.Turtle()
for i in [0, 1, 2, 3]:    # repeat 4 times
    alex.forward(50)
    alex.left(90)
wn.exitonclick()

# turtle_test5.py
import turtle
wn = turtle.Screen()
alex = turtle.Turtle()
for aColor in ["yellow", "red", "purple", "blue"]:
    alex.color(aColor)
    alex.forward(50)
    alex.left(90)
wn.exitonclick()
# turtle_test6.py
import turtle
wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.color("blue")
tess.shape("turtle")
print(range(5, 60, 2))
tess.up()
for size in range(5, 60, 2):
    tess.stamp()
    tess.forward(size)
    tess.right(24)
wn.exitonclick()
• up() – stops all drawing. Until down() is called, nothing is drawn to the screen; cursor movement still takes effect, however.
• stamp() – leaves an impression of the turtle shape on the canvas
• https://docs.python.org/2/library/turtle.html
II. Python programming (1)
• Lists and tuples
• String handling
• Dictionaries
• Conditionals and loops
Overview
• Data structure – a collection of data elements, structured in some way
– (Kinds) built-in + extensions – container-type data structures
• Sequence = mapping by element position (i.e., index) • Mapping = mapping by element name (i.e., key)
• Built-in sequences – List – Tuple – String – Buffer objects – xrange objects (both are Python 2.x types: in Python 3, buffer was replaced by memoryview and xrange by range)
• Main operations on sequences – Indexing
– Slicing
– Adding
– Multiplying
– Checking membership
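A short sketch showing that the five operations listed above behave the same way on the three most common built-in sequences (a list, a tuple, and a string):

```python
for seq in ([1, 2, 3], (1, 2, 3), 'abc'):
    first = seq[0]              # indexing
    rest = seq[1:]              # slicing returns the same type as seq
    doubled = seq + seq         # adding (concatenation)
    repeated = seq * 2          # multiplying (repetition)
    member = seq[-1] in seq     # membership check
    print(type(seq).__name__, first, rest, doubled == repeated, member)

# Output:
# list 1 [2, 3] True True
# tuple 1 (2, 3) True True
# str a bc True True
```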
List
• Concept
• Key property: "Mutable!"
• Indexing
• Slicing
• Adding, multiplying, membership
• Functions that work on lists
• List methods – method = a function tightly coupled to some object
– object.method(arguments)
List operations (common to all sequences)
• Indexing
– Each element of a sequence is numbered by position (its index), so that elements can be accessed individually
>>> greeting = 'Hello'
>>> greeting[0]
'H'
>>> greeting[-1]
'o'
>>> fourth = input('Year: ')[3]
Year: 2014
>>> fourth
'4'
>>> fourth = input("year: ")[2:3]
• # II01_index.py (below)
# Program: II01_index.py
# Print out a date, given year, month, and day as numbers
months = [
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
]

# A list with one ending for each number from 1 to 31
endings = ['st', 'nd', 'rd'] + 17 * ['th'] \
        + ['st', 'nd', 'rd'] + 7 * ['th'] \
        + ['st']

year = input('Year: ')
month = input('Month (1-12): ')
day = input('Day (1-31): ')

month_number = int(month)
day_number = int(day)

# Remember to subtract 1 from month and day to get a correct index
month_name = months[month_number-1]
ordinal = day + endings[day_number-1]

print(month_name + ' ' + ordinal + ', ' + year)
• Slicing
– Accessing the elements within a range
– Note: in [index1:index2], index1 is inclusive and index2 is exclusive
>>> tag = '<a href="http://www.python.org">Python web site </a>'
>>> tag[9:30]
'http://www.python.org'
>>> tag[32:-4]
'Python web site '
# Split up a URL of the form http://www.something.com
url = input('Please enter the URL: ')
domain = url[11:-4]
print("Domain name: " + domain)
# Copying with [:]
>>> numbers = [1,2,3,4,5,6,7,8]
>>> numbers[2:5]
[3, 4, 5]
>>> numbers[:]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> numbers2 = numbers[:]
>>> numbers2
[1, 2, 3, 4, 5, 6, 7, 8]
>>> numbers
[1, 2, 3, 4, 5, 6, 7, 8]
# Using a step
>>> numbers[0:10:1]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> numbers[0:10:2]
[1, 3, 5, 7]
>>> numbers[3:6:3]
[4]
>>> numbers[::2]
[1, 3, 5, 7]
>>> numbers[8:0:-2]
[8, 6, 4, 2]
>>> numbers[::-3]
[8, 5, 2]
>>> numbers[:5:-2]
[8]
• Adding sequences – the sequences must be of the same type
>>> [1,2,3]+[4,5,6]
[1, 2, 3, 4, 5, 6]
>>> 'Hello ' + "world!"
'Hello world!'
>>> 'Hello ' + '2014'
'Hello 2014'
>>> 'Hello ' + 2014
Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    'Hello ' + 2014
TypeError: Can't convert 'int' object to str implicitly
• Multiplication
>>> 'python' * 3
'pythonpythonpython'
>>> 'python is a ' * 3
'python is a python is a python is a '
• None, empty lists, and list initialization: reserving space up front
>>> sequence = [None] * 10
>>> sequence
[None, None, None, None, None, None, None, None, None, None]
• Example program: II01_multiply.py (below)
# Program: II01_multiply.py
# Prints a sentence in a centered "box"
sentence = input("Sentence: ")

screen_width = 80
text_width = len(sentence)
box_width = text_width + 6
left_margin = (screen_width - box_width) // 2

print()
print(' ' * left_margin + '+' + '-' * (box_width-2) + '+')
print(' ' * left_margin + '|  ' + ' ' * text_width + '  |')
print(' ' * left_margin + '|  ' + sentence + '  |')
print(' ' * left_margin + '|  ' + ' ' * text_width + '  |')
print(' ' * left_margin + '+' + '-' * (box_width-2) + '+')
print()
• Membership
>>> permission = 'rw'
>>> 'w' in permission
True
>>> 'x' in permission
False
>>> # useful in a spam filter
>>> subject = '$$$ Get rich now!! $$$'
>>> '$$$' in subject
True
>>> users = ['hky','foo','bar']
>>> input('Enter your name: ') in users
Enter your name: hky
True

# II01_membership.py
# Check a user name & PIN code
database = [
    ['albert', '1234'],
    ['dilbert', '4242'],
    ['smith', '7524'],
    ['jones', '9843']
]
username = input('User name: ')
pin = input('PIN code: ')
if [username, pin] in database:
    print('Access granted')
• Length, minimum, maximum
>>> numbers = [100,34,567]
>>> len(numbers)
3
>>> max(numbers)
567
>>> min(numbers)
34
>>> min(3,5,7,2)
2
The list() function
>>> list('Hello')
['H', 'e', 'l', 'l', 'o']
>>> a = list('Hello')
>>> a
['H', 'e', 'l', 'l', 'o']
>>> ''.join(a)
'Hello'
Operations on lists
• Overview – indexing, slicing, concatenating, multiplying – and, in addition to these …
• Modifying: item assignment
>>> x = [1,1,1]
>>> x[1]
1
>>> x[1] = 2
>>> x
[1, 2, 1]
– One caveat: you cannot assign to an index that does not exist yet
>>> x[20] = 5
Traceback (most recent call last):
  File "<pyshell#90>", line 1, in <module>
    x[20] = 5
IndexError: list assignment index out of range
>>> x[20] = 'None'
Traceback (most recent call last):
  File "<pyshell#91>", line 1, in <module>
    x[20] = 'None'
IndexError: list assignment index out of range
>>> x = [None] * 20    # valid indices are 0 to 19
>>> x[20] = 5
Traceback (most recent call last):
  File "<pyshell#93>", line 1, in <module>
    x[20] = 5
IndexError: list assignment index out of range
>>> x = [None] * 21
>>> x[20] = 5
>>> x
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 5]
– del
>>> names = ['Alice', 'Beth', 'Cecil','Dee-Dee','Earl']
>>> del names[2]
>>> names
['Alice', 'Beth', 'Dee-Dee', 'Earl']
• Assigning to slices
>>> name = list('Perl')
>>> name
['P', 'e', 'r', 'l']
>>> name[2:] = list('ar')
>>> name
['P', 'e', 'a', 'r']
>>> name[2:] = list('rl')
>>> name
['P', 'e', 'r', 'l']
>>> numbers = [1,5]
>>> numbers[1:1] = [2,3,4]
>>> numbers
[1, 2, 3, 4, 5]
>>> numbers[1:3] = []
>>> numbers
[1, 4, 5]
– # effectively the same as del numbers[1:3]
List Methods
• append()
>>> lst = [1,2,3]    # mind the variable name: calling it list would shadow the built-in list()
>>> lst.append(4)
>>> lst
[1, 2, 3, 4]
• count()
>>> ['to', 'be', 'or', 'not', 'to', 'be'].count('to')
2
>>> x =[[1,2],1,1,[2,1,[1,2]]]
>>> x.count(1)
2
>>> x.count([1,2])
1
• extend()
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a.extend(b)
>>> a
[1, 2, 3, 4, 5, 6]
– # Note: this differs from concatenation; extend() modifies the original sequence in place,
– # whereas ordinary concatenation returns a completely new sequence
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a+b
[1, 2, 3, 4, 5, 6]
>>> a
[1, 2, 3]
– # If you really need to, rebind the name (less efficient than extend(), since a new list is built)
>>> a = a + b
>>> a
[1, 2, 3, 4, 5, 6]
– # (Note) slice assignment can achieve the same effect as extend()
>>> a[len(a):] = b
>>> a
[1, 2, 3, 4, 5, 6, 4, 5, 6]
• index() – returns the index of the first matching item
>>> heroes = ['We', 'are', 'the', 'heroes', 'who', 'say', 'of', 'course!']
>>> heroes.index('who')
4
>>> heroes.index('knights')
Traceback (most recent call last):
  File "<pyshell#147>", line 1, in <module>
    heroes.index('knights')
ValueError: list.index(x): x not in list
>>> heroes[4]
'who'
• insert()
>>> numbers = [1,2,3,5,6,7]
>>> numbers.insert(3,'four')
>>> numbers
[1, 2, 3, 'four', 5, 6, 7]
>>> # as with extend(), slice assignment can also produce the effect of insert()
>>> numbers[4:4] = ['five']
>>> numbers
[1, 2, 3, 'four', 'five', 5, 6, 7]
• pop()
>>> x = [1,2,3]
>>> x.pop()
3
>>> x
[1, 2]
>>> x.pop(0)
1
>>> x
[2]
• remove()
>>> heroes
['We', 'are', 'the', 'heroes', 'who', 'say', 'of', 'course!']
>>> heroes.remove('the')
>>> heroes
['We', 'are', 'heroes', 'who', 'say', 'of', 'course!']
– (Note 1) only the first matching item is removed – (Note 2) it is an in-place change, so it cannot be undone
• reverse()
>>> y
[1, 2, 3, 4]
>>> y.reverse()
>>> y
[4, 3, 2, 1]
>>> reversed(y)
<list_reverseiterator object at 0x00000000035CB630>
>>> y
[4, 3, 2, 1]
>>> list(reversed(y))
[1, 2, 3, 4]
– # reversed() returns an iterator and leaves y unchanged
• sort() – sorts lists in place
>>> x = [4,6,2,1,7,9]
>>> x.sort()
>>> x
[1, 2, 4, 6, 7, 9]
– >>> # Careful when you want a sorted copy!
>>> x = [4,6,2,1,7,9]
>>> y = x.sort()
>>> y
>>> print(y)
None
– # i.e., sort() changes x but returns nothing (None)
– >>> # To get a sorted copy...
>>> x = [4,6,2,1,7,9]
>>> y = x[:]
>>> x
[4, 6, 2, 1, 7, 9]
>>> y
[4, 6, 2, 1, 7, 9]
>>> y.sort()
>>> y
[1, 2, 4, 6, 7, 9]
– # or...
>>> x = [4,6,2,1,7,9]
>>> y = sorted(x)
>>> x
[4, 6, 2, 1, 7, 9]
>>> y
[1, 2, 4, 6, 7, 9]
– # sorted() works on any sequence, but always returns a list
>>> sorted('Python')
['P', 'h', 'n', 'o', 't', 'y']
– sort() options
>>> heroes = ['We', 'are', 'the', 'heroes', 'who', 'say', 'of', 'course!']
>>> heroes.sort(key=len)
>>> heroes
['We', 'of', 'are', 'the', 'who', 'say', 'heroes', 'course!']
>>> heroes.sort(key=len, reverse=True)
>>> heroes
['course!', 'heroes', 'are', 'the', 'who', 'say', 'We', 'of']
– # Advanced sorting:
– https://wiki.python.org/moin/HowTo/Sorting
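Building on the key= option, a key function may return a tuple to sort by several criteria at once. A sketch using the heroes list from above: sort by length, then alphabetically without regard to case. The second variant relies on the fact that Python's sorts are stable.

```python
heroes = ['We', 'are', 'the', 'heroes', 'who', 'say', 'of', 'course!']

# One pass: the key is a (length, lowercase word) tuple, compared item by item
by_key = sorted(heroes, key=lambda w: (len(w), w.lower()))

# Two passes: sort by the secondary criterion first, then by the primary one;
# the stable sort keeps equal-length words in their alphabetical order
two_pass = sorted(heroes, key=str.lower)
two_pass.sort(key=len)

print(by_key)
# ['of', 'We', 'are', 'say', 'the', 'who', 'heroes', 'course!']
print(by_key == two_pass)   # True
```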
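Tuples
• Concept – an immutable sequence
– Syntax: values separated by commas, optionally in parentheses: 1, 2, 3 or (1, 2, 3)
• Operations
• Methods
• Main uses – as keys in a mapping (cf. a list cannot be a key)
– as return values of some built-in functions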
• Basic usage
>>> 1,2,3
(1, 2, 3)
>>> (1,2,3)
(1, 2, 3)
>>> ()
()
>>> 42
42
>>> 42,
(42,)
>>> (42,)
(42,)
>>> 3 * (40+2)
126
>>> 3 * (40+2,)
(42, 42, 42)
• The tuple() function
>>> x = [1,2,3]
>>> tuple(x)
(1, 2, 3)
>>> tuple('abc')
('a', 'b', 'c')
>>> tuple((1,2,3))
(1, 2, 3)
>>> y = 1,2,3
>>> y
(1, 2, 3)
>>> y[1]
2
>>> y[0:2]
(1, 2)
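A sketch of the main uses of tuples mentioned above; the dictionary contents are made up for illustration. Tuples (of hashable items) can be dictionary keys while lists cannot, tuples cannot be changed in place, and some built-ins such as divmod() return a tuple.

```python
# A tuple can be a dictionary key
grid = {(0, 0): 'origin', (1, 2): 'point A'}
print(grid[(1, 2)])               # point A

# A list cannot: it is mutable and therefore unhashable
try:
    {[1, 2]: 'x'}
except TypeError as err:
    print('TypeError:', err)

# Tuples are immutable
point = (1, 2)
try:
    point[0] = 10
except TypeError as err:
    print('TypeError:', err)

# divmod() returns its two results as a tuple
result = divmod(17, 5)
print(result)                     # (3, 2)
quotient, remainder = result
```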
String handling
• Overview
• Basics – "..." '...' """...""" '''...''' – concatenation – input()
• Raw strings • str() and repr()
– str(): human-readable – repr(): a representation meant to be read by the interpreter – both sometimes return the same value, e.g. for numbers, lists, dictionaries – strings, however, have two clearly different representations (in Python 2, floating-point numbers did too)
• Formatting; … as well as the string module • Methods
• Raw strings
>>> print('Hello, \nworld!')
Hello, 
world!
>>> path = "c:\temp"    # \t is interpreted as a tab!
>>> path
'c:\temp'
>>> print(path)
c:      emp
>>> print('c:\\temp')
c:\temp
>>> path = 'c:\\Program Files\\Python\\'
>>> path
'c:\\Program Files\\Python\\'
>>> print(path)
c:\Program Files\Python\
>>> # in a raw string, backslashes are kept as they are
>>> print(r'c:\Program Files\Python')
c:\Program Files\Python
>>> print(r'Let\'s go')
Let\'s go
>>> print(r'Let's go')
SyntaxError: invalid syntax
>>> print("Let's go")
Let's go
• str() and repr()
>>> x = 10 * 3.25
>>> y = 200*200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print(s)
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
>>> hello = 'Hello, world\n'
>>> hellos = repr(hello)
>>> hellos
"'Hello, world\\n'"
>>> print(hellos)
'Hello, world\n'
>>> repr((x, y, ('ham','eggs')))
"(32.5, 40000, ('ham', 'eggs'))"
Formatted printing
>>> for x in range(1,11):
        print(repr(x).rjust(2), repr(x*x).rjust(3), end=' ')
        print(repr(x*x*x).rjust(4))

 1   1    1
 2   4    8
…
 9  81  729
10 100 1000
>>> for x in range(1,11):
        print('{0:2d} {1:3d} {2:4d}'.format(x, x*x, x*x*x))

 1   1    1
 2   4    8
…
 9  81  729
10 100 1000
>>> for x in range(1,11):
        print(x, "\t", end='')

1 2 3 4 5 6 7 8 9 10    (tab-separated, all on one line)
>>> for x in range(1,11):
        print(x, "\t")

1
2
3
…
8
9
10
String Methods
• find
>>> title = "Monty Python's Flying Circus"
>>> title.find('Monty')
0
>>> title.find('Python')
6
>>> title.find('database')
-1
>>> subject ='$$$ Get rich now!!! $$$'
>>> subject.find('$$$')
0
>>> subject.find('$$$',1)
20
>>> subject.find('!!!',0,16)
-1
• join – a method of the separator string; the items being joined must be strings
>>> seq = [1,2,3,4,5]
>>> sep = '+'
>>> seq.join(seq)
Traceback (most recent call last):
  File "<pyshell#376>", line 1, in <module>
    seq.join(seq)
AttributeError: 'list' object has no attribute 'join'
>>> seq = ['1','2','3','4','5']
>>> sep.join(seq)
'1+2+3+4+5'
>>> dirs = '', 'usr','bin','env'
>>> dirs
('', 'usr', 'bin', 'env')
>>> '/'.join(dirs)
'/usr/bin/env'
>>> print('c:'+'\\'.join(dirs))
c:\usr\bin\env
• lower
>>> """Monty Python (sometimes known as The Pythons) was a British surreal comedy group … BBC on 5 October 1969. """.lower()
"monty python (sometimes known as the pythons) was … october 1969. "
>>> if 'Gumby' in ['gumby','smith','jones']: print('Found it')

>>> # nothing is printed: the comparison is case-sensitive
>>> name = 'Gumby'
>>> names = ['gumby','smith','jones']
>>> if name.lower() in names: print('Found it!')

Found it!
• replace
>>> 'This is a test'.replace('is', 'eez')
'Theez eez a test'
• split
>>> '1+2+3+4'.split('+')
['1', '2', '3', '4']
>>> '/usr/bin/env'.split('/')
['', 'usr', 'bin', 'env']
>>> 'Using the default'.split()
['Using', 'the', 'default']
• strip
>>> ' internal white space is not to be '.strip()
'internal white space is not to be'
>>> names = ['gumby','smith','jones']
>>> name = 'gumby '
>>> if name in names: print('Found it')

>>> if name.strip() in names: print('Found it!')

Found it!
>>> '*** SPAM * for* everyone!!! ***'.strip('*!')
' SPAM * for* everyone!!! '
• translate – (omitted)
Dictionary
Overview
• Concept
• Uses
• How to create one
• Basics
>>> phonebook = {'Alice':'2341', 'Beth':'9102', 'Cecil':'3258'}
>>> phonebook['Beth']
'9102'
• The dict() function
>>> items = [('name','Gumby'), ('age',42)]
>>> d = dict(items)
>>> d
{'age': 42, 'name': 'Gumby'}
>>> d['name']
'Gumby'
>>> # dict() can also be called with keyword arguments
>>> d = dict(name='Gumby', age=42)
>>> d
{'age': 42, 'name': 'Gumby'}
• Basic operations
– len(d)
– d[k]
– d[k] = v
– del d[k]
– k in d
>>> x =[]
>>> x[42]='Foobar'
Traceback (most recent call last):
File "<pyshell#443>", line 1, in <module>
x[42]='Foobar'
IndexError: list assignment index out of range
>>> x = {}
>>> x[42] = 'Foobar'
>>> x
{42: 'Foobar'}
# Program: II01_database.py
# A simple database
people = {
    'Alice': {'phone': '2341', 'addr': 'Foo drive 23'},
    'Beth': {'phone': '9102', 'addr': 'Bar street 42'},
    'Cecil': {'phone': '3158', 'addr': 'Baz avenue 90'}
}

labels = {
    'phone': 'phone number',
    'addr': 'address'
}

name = input('Name: ')

# Are we looking for a phone number or an address?
request = input('Phone number (p) or address (a)? ')

# Use the correct key
# (if request is neither 'p' nor 'a', key stays undefined; II01_database2.py handles that case)
if request == 'p': key = 'phone'
if request == 'a': key = 'addr'

# Only try to print the information if the name is a valid key in our dictionary:
if name in people:
    print("%s's %s is %s." % (name, labels[key], people[name][key]))
Dictionary Methods
• clear
>>> x = {}
>>> y=x
>>> x['key'] = 'value'
>>> x
{'key': 'value'}
>>> y
{'key': 'value'}
>>> x={}
>>> y
{'key': 'value'}
>>>
>>>
>>> #
>>> x = {}
>>> y=x
>>> x['key']='value'
>>> y
{'key': 'value'}
>>> x.clear()
>>> x
{}
>>> y
{}
• copy
>>> x = {'username':'admin', 'machines':['foo','bar','baz']}
>>> y = x.copy()
>>> y['username'] = 'tom'
>>> y['machines'].remove('bar')
>>> y
{'username': 'tom', 'machines': ['foo', 'baz']}
>>> x
{'username': 'admin', 'machines': ['foo', 'baz']}
>>> # copy() is shallow: replacing a value in y leaves x alone, but both share the same 'machines' list
>>> from copy import deepcopy
>>> d = {}
>>> d['names'] = ['Alfred','Bertrand']
>>> c = d.copy()
>>> dc = deepcopy(d)
>>> d
{'names': ['Alfred', 'Bertrand']}
>>> c
{'names': ['Alfred', 'Bertrand']}
>>> dc
{'names': ['Alfred', 'Bertrand']}
>>> d['names'].append('Clive')
>>> d
{'names': ['Alfred', 'Bertrand', 'Clive']}
>>> c
{'names': ['Alfred', 'Bertrand', 'Clive']}
>>> dc
{'names': ['Alfred', 'Bertrand']}
• fromkeys
>>> {}.fromkeys(['name','age'])
{'age': None, 'name': None}
>>> # same effect as below
>>> dict.fromkeys(['name','age'])
{'age': None, 'name': None}
>>> # specifying a default value
>>> dict.fromkeys(['name','age'], '(unknown)')
{'age': '(unknown)', 'name': '(unknown)'}
• get
>>> d = {}    # get() gives "forgiving" access to items
>>> d.get('name')
>>> print(d.get('name'))
None
>>> # specifying a default value
>>> d.get('name','NA')
'NA'
>>> d['name'] = 'Eric'
>>> d.get('name')
'Eric'

# Program: II01_database2.py
# (uses the people dictionary defined in II01_database.py)
labels = {
    'phone': 'phone number',
    'addr': 'address'
}
name = input('Name: ')
# Are we looking for a phone number or an address?
request = input('Phone number (p) or address (a)? ')
# Use the correct key:
key = request    # In case the request is neither 'p' nor 'a'
if request == 'p': key = 'phone'
if request == 'a': key = 'addr'
# Use get to provide default values:
person = people.get(name, {})
label = labels.get(key, key)
result = person.get(key, 'not available')
print("%s's %s is %s." % (name, label, result))
• has_key – d.has_key(k) is equivalent to k in d, but it was removed in Python 3
>>> d = {}
>>> d.has_key('name')    # Python 2.x only; in Python 3 use: 'name' in d
• items and iteritems
>>> # items(): returns all the items of the dictionary as (key, value) pairs (a list in Python 2, a view in Python 3)
>>> # note: in Python 3.4 the order is arbitrary (dictionaries keep insertion order from Python 3.7 on)
>>> d = {'title':'Python web site', 'url':'www.python.org', 'spam':0}
>>> d.items()
dict_items([('url', 'www.python.org'), ('spam', 0), ('title', 'Python web site')])
>>> # iteritems(): returns an iterator; removed in Python 3
>>> it = d.iteritems()
• keys and iterkeys
>>> d.keys()
dict_keys(['url', 'spam', 'title'])
• pop
>>> d = {'x':1,'y':2}
>>> d.pop('x')
1
• popitem
>>> d = {'title':'Python web site', 'url':'www.python.org', 'spam':0}
>>> d.popitem()
('url', 'www.python.org')
>>> # note: an arbitrary item is removed, since a dictionary is not a list and has no "last item"
>>> # (from Python 3.7 on, popitem() removes the most recently inserted item)
>>> # also, dictionaries have no append method
• setdefault
>>> d2 = {}
>>> d2.setdefault('name','N/A')
'N/A'
>>> d2
{'name': 'N/A'}
>>> d2['name'] = 'Charles'
>>> d2.setdefault('name','N/A')
'Charles'
>>> d2
{'name': 'Charles'}
• update
>>> d3 = {
    'title': 'Python web site',
    'url': 'www.python.org',
    'changed': 'Mar 14 22:05 MET 2014'
    }
>>> x = {'title':'Python language web site'}
>>> d3.update(x)
>>> d3
{'url': 'www.python.org', 'changed': 'Mar 14 22:05 MET 2014', 'title': 'Python language web site'}
• values and itervalues
>>> d = {}
>>> d[1] = 1
>>> d[2] = 2
>>> d[3] = 3
>>> d[4] = 1
>>> d[0] = 5
>>> d
{0: 5, 1: 1, 2: 2, 3: 3, 4: 1}
>>> d.values()
dict_values([5, 1, 2, 3, 1])
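get() and setdefault() are especially handy when counting or grouping. A minimal sketch on a made-up sentence: get() supplies 0 for words not seen yet, and setdefault() creates an empty list the first time a key appears.

```python
text = 'the quick brown fox jumps over the lazy dog the end'

# Counting: get() with a default avoids a KeyError for new words
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Grouping by first letter: setdefault() inserts [] for a new key, then returns the list
by_initial = {}
for word in text.split():
    by_initial.setdefault(word[0], []).append(word)

print(counts['the'])        # 3
print(by_initial['t'])      # ['the', 'the', 'the']
```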
Conditionals and loops
• Defining blocks – indentation – 4 spaces per level – conditions: Boolean values
• Conditional statements – if ~ elif ~ else – assertions
• Loops – while loop, for loop – various ways to iterate …
• List comprehensions • Supplementary topics (1)
– pass, del, exec, eval
• Supplementary topics (2) – sequence unpacking, …
Conditions: Boolean values
• False – False, None, 0, "", {}, (), []
• True – True, and everything else that is not false
• Under the hood: True == 1 and False == 0
>>> True
True
>>> False
False
>>> True==1
True
>>> False==0
True
>>> True+False+50
51
• True and False belong to the bool type
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
>>> bool('')
False
>>> bool(0)
False
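A small check of the truth-value rules above, written as a sketch with arbitrary sample values: every object in the first list is false in a condition, everything in the second list is true, and because bool is a subclass of int, summing Booleans counts the True values.

```python
falsy = [False, None, 0, 0.0, '', (), [], {}]
truthy = [True, 1, -1, 'False', ' ', (0,), [None], {'a': 0}]

print(all(not v for v in falsy))    # True
print(all(truthy))                  # True: all() tests each item's truth value

# bool is a subclass of int, so True == 1 and False == 0 in arithmetic
print(isinstance(True, int))        # True
print(sum([True, False, True]))     # 2: counts the True values
```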
Conditional execution
• if, elif, else • Nested blocks
var = 100
if var < 200:
    print("Expression value is less than 200")
    if var == 150:
        print("Which is 150")
    elif var == 100:
        print("Which is 100")
    elif var == 50:
        print("Which is 50")
    elif var < 50:
        print("Expression value is less than 50")
else:
    print("Could not find true expression")

print("Good bye!")
• Comparison operators – == – <, > – <=, >= – != – is and is not – in and not in
• Identity vs. equality – is tests identity (the same object), == tests equality (the same value) – keep the two apart!
>>> x = y = [1,2,3]
>>> z = [1,2,3]
>>> x==y
True
>>> x==z
True
>>> x is y
True
>>> x is z
False
• Membership operator – in
>>> name = input('your name? ')
your name? hk yoon
>>> if 'h' in name: print('Your name contains the character "h"')

Your name contains the character "h"
• String comparison
>>> 'alpha' < 'beta'
True
>>> # related:
>>> 'FnOrD'.lower() == 'fnOrd'.lower()
True
>>> [1,2] < [2,1]    # sequences are compared element by element
True
• Boolean operators
number = input('Enter a number between 1~10: ')
if int(number) <= 10 and int(number) >= 1:
    print('Well done!')
else:
    print('Watch out for the range!')
– Short-circuit logic – …
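Short-circuit logic, sketched: and stops at the first false operand, or stops at the first true one, and both return an operand rather than a plain True/False. The helper check() is hypothetical; it only records which operands actually get evaluated. The last lines show a chained comparison, a compact alternative to the range test in the program above.

```python
evaluated = []

def check(label, result):
    """Record that this operand was evaluated, then return result."""
    evaluated.append(label)
    return result

print(check('A', False) and check('B', True))   # False; B is never evaluated
print(check('C', True) or check('D', False))    # True; D is never evaluated
print(evaluated)                                # ['A', 'C']

# and/or return one of their operands
name = '' or 'unknown'       # 'unknown'
value = 0 and 1 / 0          # 0; the division by zero never runs
print(name, value)           # unknown 0

# A chained comparison, equivalent to 1 <= number and number <= 10
number = 7
print(1 <= number <= 10)     # True
```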
Loop
• while loop
>>> x = 1
>>> while x <= 100:
        print(x)
        x += 1

# Program: II02_whiletest01.py
name = ''
while not name:
    name = input('You have to enter a name: ')
print('Hello, %s' % name)

# Caution: a name consisting only of spaces is still accepted above
# Program: II02_whiletest02.py
name = ''
while not name or name.isspace():    # or: while not name.strip()
    name = input('You have to enter a name: ')
print('Hello, %s' % name)
• for loop – executes a code block for each element of a set (or sequence, or other iterable object)
– iterable object = an object that can be iterated over
>>> words = ['this','is','a','wonder']
>>> for word in words:
        print(word)

this
is
a
wonder
• range
>>> range(1,10,3)
range(1, 10, 3)
>>> print(range(1,10,3))
range(1, 10, 3)
>>> list(range(1,10,3))
[1, 4, 7]
>>> for number in range(1,10,3):
        print(number)

1
4
7
– # Python 2.x's xrange became range in 3.x
• Iterating over a dictionary
>>> d = {'x':1, 'y':2, 'z':3}
>>> for key in d:
        print(key, 'corresponds to', d[key])

y corresponds to 2
x corresponds to 1
z corresponds to 3
# equivalent to:
>>> for key, value in d.items():
        print(key, 'corresponds to', value)
• Iteration utilities – parallel iteration
>>> names = ['anne','beth','george','damon']
>>> ages = [12,30,35,55]
>>> for i in range(len(names)):
        print(names[i], 'is', ages[i], 'years old')

anne is 12 years old
beth is 30 years old
george is 35 years old
damon is 55 years old
>>> # zip
>>> list(zip(names,ages))
[('anne', 12), ('beth', 30), ('george', 35), ('damon', 55)]
– Numbered iteration
>>> # enumerate() gives you the index along with each item
>>> for i,v in enumerate(['tic','tac','toe']):
        print(i,v)

0 tic
1 tac
2 toe
– Reversed and sorted iteration
# reversed() and sorted() are similar to reverse() and sort(),
# but they work on any sequence or iterable object;
# instead of modifying the object in place, they return reversed and sorted versions
# (sorted() returns a list, reversed() returns an iterator)
>>> sorted([4,3,7,6,9])
[3, 4, 6, 7, 9]
>>> sorted('hello, world!')
[' ', '!', ',', 'd', 'e', 'h', 'l', 'l', 'l', 'o', 'o', 'r', 'w']
>>> list(reversed('hello'))
['o', 'l', 'l', 'e', 'h']
>>> ''.join(reversed('hello,world'))
'dlrow,olleh'
• Leaving a loop – break
from math import sqrt
for n in range(99, 0, -1):
    root = sqrt(n)
    if root == int(root):
        print(n)
        break
– continue – skips the rest of the current iteration and moves on to the next one
– the while True/break idiom
## word = input('Enter a word: ')
## while word:
##     # do something
##     print('The word was ' + word)
##     word = input('Enter a word: ')

while True:
    word = input('Enter a word: ')
    if not word: break
    # do something
    print('The word was ' + word)
– while True on its own would loop forever, so the body exits with an if/break test
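A minimal continue example with arbitrary numbers: continue skips the rest of the current iteration, so only the odd numbers reach append().

```python
odd_squares = []
for n in range(10):
    if n % 2 == 0:
        continue               # skip even numbers; go straight to the next n
    odd_squares.append(n * n)
print(odd_squares)             # [1, 9, 25, 49, 81]
```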
List comprehensions
• Concept – concise yet powerful!
>>> [x*x for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> # only the numbers divisible by 3?
>>> [x*x for x in range(10) if x % 3==0]
[0, 9, 36, 81]
>>> [(x,y) for x in range(3) for y in range(2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
– ## the same thing written out as (more verbose) loops:
result = []
for x in range(3):
    for y in range(2):
        result.append((x,y))
>>> girls = ['alice','bernice','clarice']
>>> boys = ['chris','arnold','bob']
>>> [b+':' for b in boys for g in girls if b[0]==g[0]]
['chris:', 'arnold:', 'bob:']
>>> [b+':'+g for b in boys for g in girls if b[0]==g[0]]
['chris:clarice', 'arnold:alice', 'bob:bernice']
– >>> # Improvement: a dictionary with each girl's first letter as the key and the names as the value avoids checking every boy/girl pair (the Cartesian product)
>>> grouping = {}
>>> for girl in girls:
        grouping.setdefault(girl[0],[]).append(girl)

>>> print([b+':'+g for b in boys for g in grouping.get(b[0], [])])
['chris:clarice', 'arnold:alice', 'bob:bernice']
보충내용 (1) pass, del, exec, eval
• pass() ; do nothing – if name=='park': – print('welcome') – elif name=='bush': – pass – else: – print('we are waiting')
• del(): delete – >>> x=1 – >>> x – 1 – >>> del x – >>> x – Traceback (most recent call last): – File "<pyshell#925>", line 1, in
<module> – x – NameError: name 'x' is not
defined
– >>> x = ['hello','world'] – >>> y=x – >>> y[1] – 'world' – >>> y[1] = 'python' – >>> x – ['hello', 'python'] – >>> del x – >>> x – Traceback (most recent call last): – File "<pyshell#934>", line 1, in
<module> – x – NameError: name 'x' is not
defined – >>> y – ['hello', 'python'] – # Why? – Del은 이름만 삭제, list (즉 value)는
그대로 존재 – 사실상 삭제 불가. 이 작업은 Python interpreter가 실시
• exec()
– Executes a string containing a series of Python statements
– …
• eval()
– Evaluates a Python expression and returns the resulting value
>>> eval(input("Enter an arithmetic expression: "))
Enter an arithmetic expression: 5 + 3*2
11
– # Caution: beware of security problems, because eval() runs whatever code the user types
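A short sketch of both functions, plus a safer option for the security concern above. Passing an explicit dictionary as the namespace keeps exec() from creating or overwriting names in your own scope, and ast.literal_eval() (standard library) accepts only Python literals. The variable names are illustrative.

```python
import ast

# exec() runs statements; the dictionary 'scope' serves as their namespace
scope = {}
exec('sqrt = 1\ntotal = sqrt + 41', scope)
print(scope['total'])                      # 42

# eval() evaluates a single expression, here in an explicitly given namespace
print(eval('x * y', {'x': 6, 'y': 7}))     # 42

# For untrusted input, ast.literal_eval() accepts only literals
# (numbers, strings, tuples, lists, dicts, ...), never function calls
print(ast.literal_eval('[1, 2, (3, 4)]'))  # [1, 2, (3, 4)]
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print('rejected:', type(err).__name__)  # rejected: ValueError
```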
Supplement (2): Sequence unpacking, …
• Sequence unpacking
>>> x, y, z = 1, 2, 3
>>> print(x, y, z)
1 2 3
>>> x, y = y, x
>>> x, y
(2, 1)
>>> x, y, z
(2, 1, 3)
>>> x, y = y, z
>>> x, y, z
(1, 3, 3)
>>> x, y, z = z, x, y
>>> x, y, z
(3, 1, 3)
• Chained assignment and augmented assignment
x = y = somefunction()
>>> x = 2
>>> x += 3
>>> x *= 5
>>> x
25
>>> sname = 'foo'
>>> sname += 'bar'
>>> sname *= 3
>>> sname
'foobarfoobarfoobar'
• assert
– Brings risky conditions to the surface early
– Used as a kind of checkpoint
>>> age =10
>>> assert 0<age<100
>>> age =-5
>>> assert 0<age<100
Traceback (most recent call last):
File "<pyshell#979>", line 1, in <module>
assert 0<age<100
AssertionError
Module
• Concept – A program file (Python, C, C++, …)
– Role:
• Writing and using modules – Writing a module
– The import statement
– Search path
– Private names
• Scoping Rule
• Concept – A program file (Python, C, C++, …) – Role:
• … • Prevents name clashes by means of namespaces
– Namespace = a kind of dictionary of identifiers
• Writing and using modules – Writing a module – The import statement
• After an import, names must be qualified (module.name)
– Search path • Watch out for this • Install the module inside the Python path, modify sys.path, use PYTHONPATH, or write a .pth file
– Private names
• Scoping Rule – Lookup order:
– Local > Enclosing (for nested functions) > Global > Built-in
• locals()
• globals()
• dir(__builtins__)
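The lookup order can be observed directly. This sketch (all function names are hypothetical) binds the same name x at several levels; each function sees the innermost binding available to it. It uses the builtins module rather than `__builtins__`, whose type differs depending on how the code is run.

```python
import builtins

x = 'global'                      # global scope

def outer():
    x = 'enclosing'               # enclosing scope for the functions below

    def local_lookup():
        x = 'local'               # local scope: checked first
        return x

    def enclosing_lookup():
        return x                  # no local x, so the enclosing x is found

    return local_lookup(), enclosing_lookup()

def global_lookup():
    return x                      # no local or enclosing x, so the global x is found

def show_locals(a, b=2):
    c = a + b
    return sorted(locals())       # names in this function's local namespace

print(outer())                    # ('local', 'enclosing')
print(global_lookup())            # global
print(show_locals(1))             # ['a', 'b', 'c']
print(len('abc'))                 # 3: len is found last, in the built-in scope
print('len' in dir(builtins))     # True
```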
• # Writing and using a module
>>> import II02_mymath
>>> II02_mymath.area(2)
12.56636
>>> II02_mymath.pi
3.14159
>>> II02_mymath.area(5)
78.53975
>>> from II02_mymath import area
>>> area(10)
314.159
• # Module search path
>>> import sys
>>> sys.path
['C:/Python31/DBnet', 'C:\\Python31\\Lib\\idlelib', 'C:\\Windows\\system32\\python31.zip', 'C:\\Python31\\DLLs', 'C:\\Python31\\lib', 'C:\\Python31\\lib\\plat-win', 'C:\\Python31', 'C:\\Python31\\lib\\site-packages']
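The slides do not show the contents of II02_mymath.py. Judging from the session output, it probably defines pi = 3.14159 and an area(r) function; the sketch below assumes that reconstruction, writes it into a temporary directory, and puts that directory on sys.path so the search path can be seen at work.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical reconstruction of II02_mymath.py, inferred from the session output
MODULE_SOURCE = '''\
"""II02_mymath: a tiny example module."""
pi = 3.14159

def area(r):
    """Return the area of a circle with radius r."""
    return pi * r * r
'''

# Write the module into a new directory and put that directory on the search path
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'II02_mymath.py'), 'w') as f:
    f.write(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()         # let the import system notice the new file

import II02_mymath                    # found by searching sys.path
from II02_mymath import area          # copies the name 'area' into this namespace

print(II02_mymath.pi)                 # 3.14159 (qualified name)
print(round(II02_mymath.area(5), 5))  # 78.53975
print(round(area(10), 3))             # 314.159 (unqualified, thanks to from-import)
```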
II. Python Programming (2)
• Functions
• Files & I/O
• OOP
• Exception handling
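Functions
• Concept – What is a function? One level of abstraction
• Functions, classes, design patterns, …
• Writing a function – the def statement
• Function parameters • Scoping issues
• Local: visible only inside the function … • Nonlocal: a previously bound variable in the closest enclosing scope • Global: exists outside the function; a function that wants to rebind it must declare it with global
• Recursion • Lambda expressions
– Small anonymous functions defined inline (a single expression, with no return statement)
• Generator functions – Define your own iterator – using the yield statement
• Decorator functions – Functions are first-class objects, so they can be assigned to variables or passed as parameters.
Writing functions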
• def – defines a function
>>> def hello(name):
        return 'Hello, ' + name + '!'
>>> hello('world')
'Hello, world!'
>>> def fibs(num):
        result = [0, 1]
        for i in range(num-2):
            result.append(result[-2] + result[-1])
        return result
>>> fibs(10)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
>>> fibs(20)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
• Documenting
>>> def square(x):
        'Calculates the square of x'
        return x*x
>>> square(2)
4
>>> square(7.5)
56.25
>>> square.__doc__
'Calculates the square of x'
>>> help(square)
Help on function square in module __main__:

square(x)
    Calculates the square of x
Parameter
• Terminology – In a definition: formal parameter – In a call: actual parameter = argument
• Local variables – A local variable is used only within its local scope (the function in which it is assigned) – Inside a function, a parameter is a new local name bound to the argument object; rebinding the parameter does not affect the caller's variable
>>> def inc(x):
        return x+1
>>> inc(10)
11
>>> inc(11)
12
>>> x = 20
>>> inc(x)
21
>>> def try_to_change(n):
        n = "Me"
>>> name = 'You'
>>> try_to_change(name)
>>> name
'You'
>>> def change(n):
        n[0] = "Me"
>>> names = ["You","Her","Him"]
>>> change(names)
>>> names
['Me', 'Her', 'Him']
• However, when a function changes its parameter object, pay attention to whether the object is immutable or mutable
>>> def inc(x):
        return x+1
>>> inc(10)
11
>>> inc(11)
12
>>> x = 20
>>> inc(x)
21
>>>
>>> y = [10]
>>> inc(y)
Traceback (most recent call last):
  File "<pyshell#147>", line 1, in <module>
  …
TypeError: can only concatenate list (not "int") to list
>>> def inc2(x):
        x[0] = x[0]+1
>>> inc2(y)
>>> y
[11]
• Positional parameters, keyword parameters, default parameters
>>> def hello1(greeting, name):
        print('%s, %s' % (greeting, name))
>>> def hello2(greeting, name):
        print('%s, %s' % (name, greeting))
>>> hello1('Hi', 'Seoul')
Hi, Seoul
>>> hello2('Hi', 'Seoul')
Seoul, Hi
>>> hello1(name='hk', greeting="good morning")
good morning, hk
>>> def hello3(greeting="Good day", name="everybody"):
        print('%s, %s' % (greeting, name))
>>> hello3()
Good day, everybody
>>> hello3('Wonderful', 'universe')
Wonderful, universe
– When several kinds of parameters are used together, the positional parameters must come first.
• Variable number of parameters
>>> def print_params(*params):
        print(params)
>>> print_params(1)
(1,)
>>> print_params(2,3,4)
(2, 3, 4)
>>> print_params('hk', 'john','obama')
('hk', 'john', 'obama')
>>>
>>> def print_params2(title, *params):
        'parameter: title is required, but you can add others'
        print(title)
        print(params)
>>> print_params2('Nice weather', 'yesterday','today','tomorrow')
Nice weather
('yesterday', 'today', 'tomorrow')
>>> help(print_params2)
…
• Collecting multiple keyword parameters
>>> def print_params3(**params):
        print(params)
>>> print_params3(name='kim', age=35, gender='male')
{'gender': 'male', 'age': 35, 'name': 'kim'}
• Putting it together
>>> def print_params(x, y, z=3, *position_par, **keyword_par):
        print(x, y, z)
        print(position_par)
        print(keyword_par)
>>> print_params(1, 5, 10, 'my function', hobby1='game', hobby2='climbing')
1 5 10
('my function',)
{'hobby2': 'climbing', 'hobby1': 'game'}
>>> storage = {}
>>> storage['firstname'] = {}
>>> storage['middlename'] = {}
>>> storage['lastname'] = {}
>>> me1 = 'wolfgang amadeus mozart'
>>> me2 = 'john f kennedy'
>>> storage['firstname']['wolfgang'] = me1
>>> storage['middlename']['wolfgang'] = me1
>>> storage['lastname']['mozart'] = me1
>>> storage['middlename']['amadeus'] = me1
>>> storage['firstname']['john'] = me2
>>> storage['middlename']['f'] = me2
>>> storage['lastname']['kennedy'] = me2
>>> storage
{'middlename': {'wolfgang': 'wolfgang amadeus mozart', 'amadeus': 'wolfgang amadeus mozart', 'f': 'john f kennedy'}, 'lastname': {'kennedy': 'john f kennedy', 'mozart': 'wolfgang amadeus mozart'}, 'firstname': {'wolfgang': 'wolfgang amadeus mozart', 'john': 'john f kennedy'}}
>>> def lookup(data, label, name):
        return data[label].get(name)
>>> lookup(storage, 'lastname', 'kennedy')
'john f kennedy'
>>> def init(data):
        data['firstname'] = {}
        data['middlename'] = {}
        data['lastname'] = {}
>>> def store(data, fullname):
        names = fullname.split()
        if len(names) == 2:
            names.insert(1, '')
        labels = 'firstname', 'middlename', 'lastname'
        for label, name in zip(labels, names):
            people = lookup(data, label, name)
            if people:
                people.append(fullname)
            else:
                data[label][name] = [fullname]
>>> def store2(data, *fullnames):
        for fullname in fullnames:
            names = fullname.split()
            if len(names) == 2:
                names.insert(1, '')
            labels = 'firstname', 'middlename', 'lastname'
            for label, name in zip(labels, names):
                people = lookup(data, label, name)
                if people:
                    people.append(fullname)
                else:
                    data[label][name] = [fullname]
>>> store(storage, 'chol su kim')
>>> lookup(storage, 'lastname', 'kim')
['chol su kim']
>>> store(storage, 'ki moon ban')
>>> lookup(storage, 'firstname', 'ki')
['ki moon ban']
>>> from operator import add
>>> params = (1,2)
>>> add(*params)
3
>>>
>>> def hello3(greeting="Good day", name="everybody"):
        print('%s, %s' % (greeting, name))
>>> params2 = {'name':'Dan gun', 'greeting':'father of father'}
>>> hello3(**params2)
father of father, Dan gun
>>> # Using * (or **) both when you define and when you call the function simply passes the tuple (or dictionary) through
>>> def with_stars(**keywords):
        print(keywords['name'], 'is', keywords['age'], 'years old')
>>> def without_stars(keywords):
        print(keywords['name'], 'is', keywords['age'], 'years old')
>>> arguments = {'name':'Mr. Knowall', 'age':40}
>>> with_stars(**arguments)
Mr. Knowall is 40 years old
>>> without_stars(arguments)
Mr. Knowall is 40 years old
# So the stars are really useful only if you use them either
# when defining a function (to allow a varying number of arguments)
# or when calling a function (to 'splice in' a dictionary or a sequence)
• It may be useful to use these splicing operators to 'pass through' parameters, without worrying too much about how many there are, and so forth.
>>> def foo(x, y, z, m=0, n=0):
        print(x, y, z, m, n)
>>> def call_foo(*args, **kwds):
        print('Calling foo! ...')
        foo(*args, **kwds)
• # Putting it together
>>> def story(**kwds):
        return 'Once upon a time, there was a ' \
               '%(job)s called %(name)s. ' % kwds
>>> def power(x, y, *others):
        if others:
            print('Received redundant parameters:', others)
        return pow(x, y)
>>> def interval(start, stop=None, step=1):
        'Imitates range() for step > 0'
        if stop is None:              # only one argument given:
            start, stop = 0, start    # treat it as stop and start from 0
        result = []
        i = start
        while i < stop:
            result.append(i)
            i += step
        return result
>>> story(job='king', name='Arthur')
'Once upon a time, there was a king called Arthur. '
>>> story(name = 'robinhood', job='righteous outlaw')
'Once upon a time, there was a righteous outlaw called robinhood. '
>>> params = {'job':'language','name':'python'}
>>> params1 = {'job':'language','name':'python'}
>>> story(**params1)
'Once upon a time, there was a language called python. '
>>> del params['job']
>>> params1
{'job': 'language', 'name': 'python'}
>>> params
{'name': 'python'}
>>> del params1['job']
>>> params1
{'name': 'python'}
>>> story(job='miracle in our age', **params)
'Once upon a time, … called python. '
>>>
>>> power(2,4)
16
>>> params2 = (5,) *2
>>> params2
(5, 5)
>>> power(*params2)
3125
>>> 5*5*5*5*5
3125
>>> power(3,3,'Hello, world')
Received redundant parameters: ('Hello, world',)
27
>>> interval(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> interval(1,5)
[1, 2, 3, 4]
>>> interval(3,12,4)
[3, 7, 11]
>>> power(*interval(3,7))
Received redundant parameters: (5, 6)
81
• Scoping issues – What is a namespace?
• The place where names (of variables, etc.) live • A kind of invisible dictionary = scope
>>> x = 1
>>> scope = vars()
>>> scope
{'square': <function square at 0x00000000033309C8>, 'print_params': <function print_params at 0x00000000033982C8>, 'call_foo': ... 'y': [11], 'x': 1, 'z': (10,), 'scope': {...}}
>>> scope['x']
1
>>> scope['x'] + 1
2
>>> x
1
>>> scope['x'] += 1
>>> x
2
– Local, nonlocal, and global variables • Local: exists only inside the function … • Nonlocal: a variable of the closest enclosing scope • Global:
– Rebinding a global variable • Making it refer to some new value • A variable assigned inside a function automatically becomes a local variable (unless declared otherwise with global or nonlocal).
# http://www.python-course.eu/python3_global_vs_local_variables.php
>>> def f():
        print(s)
>>> s = "I love Paris in the summer!"
>>> f()
I love Paris in the summer!
>>>
>>> def f():
        s = "I love London!"
        print(s)
>>> s = "I love Paris!"
>>> f()
I love London!
>>> print(s)
I love Paris!
>>>
>>> def f():
        print(s)
        s = "I love London!"
        print(s)
>>> s = "I love Paris!"
>>> f()
Traceback (most recent call last):
  File "<pyshell#526>", line 1, in <module>
    f()
  File "<pyshell#524>", line 2, in f
    print(s)
UnboundLocalError: local variable 's' referenced before assignment
>>> def f():
        global s
        print(s)
        s = "Only in spring, but London is great as well!"
        print(s)
>>> s = "I am looking for a course in Paris!"
>>> f()
I am looking for a course in Paris!
Only in spring, but London is great as well!
>>> print(s)
Only in spring, but London is great as well!
>>> s = "new sentence"
>>> print(s)
new sentence
>>> def f():
        s = "I am globally not known"
        print(s)
>>> f()
I am globally not known
>>> print(s)
new sentence
>>> def f():
        s1 = "I am globally not known"
        print(s1)
>>> f()
I am globally not known
>>> print(s1)
Traceback (most recent call last):
  File "<pyshell#546>", line 1, in <module>
    print(s1)
NameError: name 's1' is not defined
------------
# save the following as scopetest01.py
def f():
    s = "I am globally not known"
    print(s)

f()
print(s)
-------------
c:\Python31>python scopetest01.py
I am globally not known
Traceback (most recent call last):
  File "scopetest01.py", line 6, in <module>
    print(s)
NameError: name 's' is not defined
c:\Python31>
-------------
>>> def foo(x, y):
        global a
        a = 42
        x, y = y, x
        b = 33
        b = 17
        c = 100
        print(a, b, x, y)
>>> a, b, x, y = 1, 15, 3, 4
>>> foo(17, 4)
42 17 4 17
>>> print(a, b, x, y)
42 15 3 4
• Nested scopes: a function inside a function – e.g. using one function to create another.
>>> def multiplier(factor):
        def multiplyByFactor(number):
            return number * factor
        return multiplyByFactor
# The outer function returns the inner function itself; it does not call it.
# Key point: the returned function still has access to the scope where it was defined,
# i.e. it carries its environment (and the associated local variables) with it!
>>> double = multiplier(2)
>>> double(5)
10
>>> triple = multiplier(3)
>>> triple(3)
9
>>> multiplier(5)(4)
20
A function such as multiplyByFactor, which stores its enclosing scope, is called a closure.
# Normally you cannot rebind variables of an outer scope.
# In Python 3, however, 'nonlocal' lets you assign to variables in an outer (but non-global) scope.
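A minimal sketch of nonlocal (make_counter and increment are hypothetical names). Without the nonlocal declaration, count += 1 would make count a local variable of increment and raise UnboundLocalError.

```python
def make_counter():
    count = 0                   # variable in the enclosing (non-global) scope

    def increment():
        nonlocal count          # rebind the enclosing variable instead of creating a local one
        count += 1
        return count

    return increment            # return the inner function (a closure), not its result

counter = make_counter()
print(counter(), counter(), counter())   # 1 2 3

other = make_counter()                   # each call creates a fresh enclosing scope
print(other())                           # 1
```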
• Recursive functions
>>> def recursion():
        return recursion()    # never terminates; Python stops it with a "maximum recursion depth exceeded" error
>>> def factorial(n):
        result = n
        for i in range(1, n):
            result *= i
        return result
# Example 1
# Definition of factorial: (a) the factorial of 1 is 1
# (b) the factorial of n > 1 is n x factorial(n-1)
>>> def factorial(n):
        if n == 1:
            return 1
        else:
            return n * factorial(n-1)
>>> factorial(10)
3628800
>>>
# Example 2
>>> def power(x, n):
        result = 1
        for i in range(n):
            result *= x
        return result
>>>
>>> # Definition of power
>>> # power(x, 0) = 1
>>> # power(x, n) for n > 0 = x * power(x, n-1)
>>> def power(x, n):
        if n == 0:
            return 1
        else:
            return x * power(x, n-1)
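Binary search, listed later in these slides as another classic but omitted there, follows the same recursive pattern: shrink the problem until a base case is reached. This sketch assumes a list sorted in ascending order; the name search and its lower/upper parameters are illustrative choices.

```python
def search(sequence, number, lower=0, upper=None):
    """Return the index of number in the sorted sequence (recursive binary search)."""
    if upper is None:
        upper = len(sequence) - 1
    if lower > upper:                       # base case: the range is empty
        raise ValueError('{} is not in the sequence'.format(number))
    middle = (lower + upper) // 2
    if sequence[middle] == number:          # base case: found it
        return middle
    elif sequence[middle] < number:         # the number must be in the right half
        return search(sequence, number, middle + 1, upper)
    else:                                   # the number must be in the left half
        return search(sequence, number, lower, middle - 1)

seq = [4, 8, 34, 67, 95, 100, 123]
print(search(seq, 34))    # 2
print(search(seq, 100))   # 5
```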
• Lambda functions
>>> t2 = {'FtoK': lambda deg_f: 273 + (deg_f - 32) * 5 / 9,
          'CtoK': lambda deg_c: 273 + deg_c}
>>> t2['FtoK'](32)
273.0
• Generator functions
>>> def four():
        x = 0
        while x < 4:
            print("in generator, x =", x)
            yield x
            x += 1
>>> for i in four():
        print(i)
in generator, x = 0
0
in generator, x = 1
1
in generator, x = 2
2
in generator, x = 3
3
>>> 2 in four()
in generator, x = 0
in generator, x = 1
in generator, x = 2
True
• Decorators
– Wrapping function (= the decorator)
– + Wrapped function (written first, then passed as the argument of the wrapping function)
• Syntax: @decorate on the line just above the function definition
>>> def decorate(func):
        print("in decorate function, decorating", func.__name__)
        def wrapper_func(*args):
            print("Executing", func.__name__)
            return func(*args)
        return wrapper_func
>>> def myfunction(parameter):
        print(parameter)
>>> myfunction = decorate(myfunction)
in decorate function, decorating myfunction
>>> myfunction("hello")
Executing myfunction
hello
>>>
>>> @decorate
    def myfunction(parameter):
        print(parameter)
in decorate function, decorating myfunction
>>> myfunction('hello')
Executing myfunction
hello
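• Another classic: Binary Search
– (omitted)
• Throwing functions around – In Python, a function is just another object, so it can be
• assigned to another variable,
• passed as a parameter,
• returned as the result of another function
– Related special functions: • map(), filter(), reduce(), … (in Python 3, reduce() must be imported from functools)
– (to be supplemented)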
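The map(), filter() and reduce() functions mentioned above can be sketched as follows. apply_twice and operations are hypothetical names, used only to show functions being passed around and stored like any other object.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map() applies a function to every element (it returns an iterator, hence list())
squares = list(map(lambda n: n * n, numbers))

# filter() keeps only the elements for which the function returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce() folds the sequence into a single value: ((((1+2)+3)+4)+5)
total = reduce(lambda acc, n: acc + n, numbers)

def apply_twice(func, value):
    """A function received as a parameter is called like any other."""
    return func(func(value))

# Functions stored as dictionary values
operations = {'square': lambda n: n * n, 'negate': lambda n: -n}

print(squares)                               # [1, 4, 9, 16, 25]
print(evens)                                 # [2, 4]
print(total)                                 # 15
print(apply_twice(operations['square'], 3))  # 81
```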
Files and I/O
• File – data stored in a defined format.
• Opening a file: file_object = open(file_name [, access_mode][, buffering])
– File mode
– Buffering
• File methods – Read, Write
– Pipe
– Read, Write Lines
– Closing a file
– Key methods
• Iterating over file contents (omitted) – Reading byte by byte, line by line, or all at once
– File Iterator
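Opening a file
• File mode – the mode argument of open()
• Binary mode avoids problems such as the translation of \r\n line endings on MS Windows
• Buffering – data is kept in memory and read from or written to the disk in larger chunks
• 0 (False): unbuffered (allowed only in binary mode in Python 3)
• 1 (True): line buffered (text mode only)
• A larger number gives the buffer size in bytes (-1 means the default buffer size)
• Standard streams – the sys module
– sys.stdin, sys.stdout, sys.stderr
• File-like objects – Streams
• The io module
• (three types of I/O)
• text I/O, binary I/O, raw I/O.
• Each type can be backed by various storage devices; a concrete object of any of these types is a stream
– urllib.request.urlopen
– …
Value   Description
'r'     Read
'w'     Write
'a'     Append
'b'     Binary (added to another mode)
'+'     Read/write (added to another mode)
File methods
• Reading and writing files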
>>> f = open('somefile.txt','w')
>>> f.write('hello, ')
7
>>> f.write('World')
5
>>> f.close()
>>>
>>> f=open('somefile.txt','r')
>>> f.read(4)
'hell'
>>> f.read()
'o, World'
• # Piping
– Linux/Unix
– # cat somefile.txt | python somescript.py | sort
• stdin: a script in the middle of a pipe reads its input from sys.stdin (see II02_wordcount01.py)
• Reading and writing lines
– readlines()
– writelines()
• Closing a file – close()
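II02_wordcount01.py itself is not shown. A hypothetical minimal word-count filter that could sit in the middle of the pipe above might look like this; count_words accepts any iterable of lines, so it works on sys.stdin as well as on a plain list.

```python
import sys

def count_words(lines):
    """Count whitespace-separated words in an iterable of text lines."""
    return sum(len(line.split()) for line in lines)

def main():
    # In a pipe such as:  cat somefile.txt | python wordcount.py | sort
    # the script's standard input is the previous command's output.
    print('Wordcount:', count_words(sys.stdin))

# In the real script, finish with:
#     if __name__ == '__main__':
#         main()
print(count_words(['hello, World\n', 'this is a test\n']))   # 6
```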
>>> f2 = open(r'c:\Python31\DBnet\somefile2.txt')
>>> f2.readline()
"Pro-democracy demonstrators seized … after a night of scuffles.\n"
>>> f2.readline()
"Spurred on by police … according to Hong Kong police."
>>> f2.readlines()
[]
>>> f2.seek(0,0)
0
>>> f2.readlines()
["Pro-democracy demonstrators seized … to Hong Kong police."]
>>> f2.close()
>>> f3 = open(r'c:\Python31\DBnet\somefile2.txt', 'w')
>>> f3.write('this is a new line')
18
>>> f3.close()
>>> f3 = open(r'c:\Python31\DBnet\somefile2.txt', 'r')
>>> f3.readlines()
['this is a new line']
# urllib
>>> import urllib.request
>>> url = 'http://www.openwith.net'
>>> urllib.request.urlretrieve(url, 'somefile.txt')
('somefile.txt', <http.client.HTTPMessage object at 0x0000000003673E48>)
<Open 'somefile.txt' to check the result>
>>> response = urllib.request.urlopen('http://www.openwith.net')
>>> html = response.read()
>>> html
b'<!DOCTYPE html>\n<html lang="ko-KR">\n<head>\n<meta charset="UTF-8" />\n<title>\x…
Iterating over file contents (omitted)
• Reading byte by byte, line by line, or all at once
• File Iterator
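Since this part is omitted in the slides, here is a short sketch of the three reading styles plus the file iterator. It creates its own sample file in the temporary directory (the file name is made up) and uses with so each file is closed automatically.

```python
import os
import tempfile

# Create a small sample file to iterate over
path = os.path.join(tempfile.gettempdir(), 'iteration_demo.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

# 1) One character at a time: read(1) returns '' at the end of the file
chars = []
with open(path) as f:
    while True:
        ch = f.read(1)
        if not ch:
            break
        chars.append(ch)

# 2) Line by line: a file object is itself an iterator over its lines
with open(path) as f:
    lines = [line.rstrip('\n') for line in f]

# 3) Everything at once
with open(path) as f:
    whole = f.read()

print(len(chars))   # 34
print(lines)        # ['first line', 'second line', 'third line']
os.remove(path)
```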
OOP
• Overview – Classes and objects
– Characteristics
• Classes and types – Creating a class
– Attributes and methods
– Class namespace
– Superclasses and subclasses
• OOD
Overview
• Classes and objects – Class = a user-defined prototype for an object
• Characteristics – Encapsulation
• Hiding unnecessary details
– Polymorphism • The operation performed changes automatically with the type of the object (even when the user does not know the type)
– Inheritance • (Super)class and subclass, i.e. parent and child
• Note – Classes and types
# polymorphism
>>> 2+2
4
>>> 'good' + 'morning'
'goodmorning'
>>> 'good ' + 'morning'
'good morning'
>>> from operator import add
>>> add(1,2)
3
>>> add('good', ' morning')
'good morning'
>>> def length_of_message(n):
        print('The length of', repr(n), 'is', len(n))
>>> length_of_message('good morning')
The length of 'good morning' is 12
>>> length_of_message([3,5,7])
The length of [3, 5, 7] is 3
Class와 Type
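• Creating a class – Constructor – initializer (__init__)
• Attributes and methods – Attribute = a variable that represents a property or characteristic of an object – Method = a function that belongs (is bound) to an object
= A special kind of function that is defined in a class definition. – Operator overloading: giving an operator (such as +) different behavior for objects of different classes, by defining special methods such as __add__
• Class namespace – Private: an attribute or method that is recognized only inside the object (Python achieves this by name-mangling names that start with two underscores)
In that case it can be used only through accessor methods
• Superclasses and subclasses – class SubClassName(ParentClass1[, ParentClass2, ...]):
    'Optional class documentation string'
    class_suite
• Interfaces and introspection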
• Built-In Class Attributes
• Sameness (is) vs. '==' • Copying
– With aliasing, it can be unclear whether a change made through one name affects the other. – Copying is one alternative: import copy, then use copy.copy() (or copy.deepcopy() for nested objects)
Class attribute   Description
__dict__ Dictionary containing the class's namespace.
__doc__ Class documentation string or None if undefined.
__name__ Class name
__module__ Module name in which the class is defined. This attribute is "__main__" in interactive mode.
__bases__ A possibly empty tuple containing the base classes, in the order of their occurrence in the base class list.
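A sketch contrasting aliasing, copy.copy() and copy.deepcopy() on a nested list (the variable names are illustrative): is compares identity, while == compares values.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                 # the same object under a second name
shallow = copy.copy(original)    # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0].append(99)           # change one inner list through 'original'

print(alias is original, alias == original)      # True True
print(shallow is original, shallow == original)  # False True
print(shallow[0])                # [1, 2, 99]  (inner list shared with original)
print(deep[0])                   # [1, 2]      (fully independent copy)
```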
>>> class Person01:
        def setName(self, name):
            self.name = name
        def getName(self):
            return self.name
        def greet(self):
            print("Hello, I am %s." % self.name)
>>> person1 = Person01()
>>> person2 = Person01()
>>> person1.setName('Ki Chul Kim')
>>> person2.setName('Young Hee Park')
>>> person1.greet()
Hello, I am Ki Chul Kim.
>>> person2.greet()
Hello, I am Young Hee Park.
# Private
>>> class PrivateTest():
        def __inaccessible(self):
            print('Not permitted to use from outside the object')
        def accessible(self):
            print('Permitted to access')
            self.__inaccessible()
>>> p1 = PrivateTest()
>>> p1.__inaccessible()
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    p1.__inaccessible()
AttributeError: 'PrivateTest' object has no attribute '__inaccessible'
>>> p1.accessible()
Permitted to access
Not permitted to use from outside the object
# Program II02_class.py:
class Person:
    population = 0

    def __init__(self, name, age):
        self.name = name
        self.age = age
        print('{0} has been born!'.format(self.name))
        Person.population += 1

    def __str__(self):
        return '{0} is {1} years old'.format(self.name, self.age)

    def __del__(self):
        print('{0} is dying!'.format(self.name))
        Person.population -= 1

    @staticmethod
    def totalPop():
        print('There are {} people in the world.'.format(Person.population))

p1 = Person("jonny", 20)
print(Person.population)
p2 = Person("mary", 25)
print(Person.population)
print(p1)
print(p2)
>>> class Point:
        """ Point class: represents and manipulates x,y coords."""
        def __init__(self):
            """ Create a new point at the origin"""
            self.x = 0
            self.y = 0
>>> p1 = Point()
>>> p2 = Point()
>>> p2.x = 3
>>> p2.y = 5
>>> print(p2.x, p2.y)
3 5
>>> print("p2's coordinates:", p2.x, p2.y)
p2's coordinates: 3 5
# Program II02_point.py
class Point:
    """ Point class: represents and manipulates x,y coords."""
    def __init__(self, x=0, y=0):
        """ Create a new point (at the origin by default)"""
        self.x = x
        self.y = y
    def distance_org(self):
        """ Compute distance from origin """
        return ((self.x ** 2) + (self.y ** 2)) ** 0.5
    def print_location(self):
        print('({0},{1})'.format(self.x, self.y))
    def __str__(self):
        return 'Point of ({0},{1})'.format(self.x, self.y)
    def halfway(self, target):
        """ Return the halfway point between myself and the target"""
        mx = (self.x + target.x) / 2
        my = (self.y + target.y) / 2
        return Point(mx, my)
#----------
>>> p1 = Point(5,10)
>>> p1.print_location()
(5,10)
>>> print(p1)
Point of (5,10)
>>> str(p1)
'Point of (5,10)'
>>> p = Point(3,4)
>>> q = Point(5,12)
>>> r = p.halfway(q)
>>> print(r)
Point of (4.0,8.0)
class Employee:
    'Common base class for all employees'
    empCount = 0

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
        Employee.empCount += 1

    def displayCount(self):
        print("Total Employee %d" % Employee.empCount)

    def displayEmployee(self):
        print("Name :", self.name, ", Salary:", self.salary)

>>> emp1 = Employee("Zara", 2000)
>>> emp2 = Employee("Manni", 5000)
>>> emp1.displayEmployee()
Name : Zara , Salary: 2000
>>> emp2.displayEmployee()
Name : Manni , Salary: 5000
>>> print("Total Employee %d" % Employee.empCount)
Total Employee 2
>>> # You can add, remove or modify attributes of classes and objects at any time
>>> emp1.age = 30
>>> emp1.age = 35
>>> emp1.age
35
>>> hasattr(emp1, 'age')
True
>>> getattr(emp1, 'age')
35
>>> setattr(emp1, 'age', 35)
>>> delattr(emp1, 'age')
# Class attributes
>>> Employee.__doc__
'Common base class for all employees'
>>> Employee.__name__
'Employee'
>>> Employee.__module__
'__main__'
>>> Employee.__dict__
dict_proxy({'displayEmployee': <function displayEmployee at … 0x0000000003512EC8>})
• # Inheritance
#!/usr/bin/python
class Parent:
    parentAttr = 100
    def __init__(self):
        print("Calling parent constructor")
    def parentMethod(self):
        print('Calling parent method')
    def setAttr(self, attr):
        Parent.parentAttr = attr
    def getAttr(self):
        print("Parent attribute :", Parent.parentAttr)

>>> class Child(Parent):
        def __init__(self):
            print("Calling child constructor")
        def childMethod(self):
            print("Calling child method")
>>> c = Child()
Calling child constructor
>>> c.childMethod()
Calling child method
>>> c.parentMethod()
Calling parent method
>>> c.setAttr(200)
>>> c.getAttr()
Parent attribute : 200
# operator overloading
>>> class Vector:
        def __init__(self, a, b):
            self.a = a
            self.b = b
        def __str__(self):
            return 'Vector (%d, %d)' % (self.a, self.b)
        def __add__(self, other):
            return Vector(self.a + other.a, self.b + other.b)
>>> v1 = Vector(2,10)
>>> v2 = Vector(5,-2)
>>> v1 + v2
<__main__.Vector object at 0x00000000034CE860>
>>> print(v1 + v2)
Vector (7, 8)
# multiple inheritance
>>> class Calculator:
        def calculate(self, expression):
            self.value = eval(expression)
>>> class Talker:
        def talk(self):
            print("Hi, my value is", self.value)
>>> class TalkingCalculator(Calculator, Talker):
        pass
>>> tc = TalkingCalculator()
>>> tc.calculate('10+3*4')
>>> tc.talk()
Hi, my value is 22
OOD
Exception handling
• Two techniques for dealing with unexpected errors – Exception handling: exceptions.
– Assertions: covered earlier.
• What is an exception? – An event that interrupts the normal execution of a program.
– That is, a Python object that represents an error (an exceptional condition).
– When such a condition occurs, the program raises an exception.
– (Important) Every exception is an instance of some class.
– Exception Hierarchy • https://docs.python.org/3/library/exceptions.html#exception-hierarchy
• Custom Exception
• Exceptions and functions – If an exception is not handled (caught), as defensive programming would do,
the program stops executing and exits with an error message (a traceback)
– If an exception raised inside a function is not handled there, it propagates (bubbles up) to the place where the function was called
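A sketch of a custom exception and of propagation (NegativeAgeError, set_age and register are hypothetical names). A custom exception is an ordinary class derived from Exception or one of its subclasses, and an exception that set_age does not handle bubbles up to its caller.

```python
class NegativeAgeError(ValueError):
    """Hypothetical custom exception: raised when an age is negative."""
    def __init__(self, age):
        super().__init__('age must not be negative: {}'.format(age))
        self.age = age

def set_age(age):
    # The exception is raised here but not handled here...
    if age < 0:
        raise NegativeAgeError(age)
    return age

def register(age):
    # ...so it propagates (bubbles up) to this caller, which handles it
    try:
        return set_age(age)
    except NegativeAgeError as e:
        print('Caught:', e)
        return None

print(register(30))   # 30
print(register(-5))   # prints "Caught: age must not be negative: -5", then None
```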
• Handling an exception
– Catching an exception (trapping), processed in this order:
try:
    intended operations
except ExceptionI:
    …
except ExceptionII:
    …
else:
    If there is no exception, this block is executed.
– raise: deliberately raises an exception
>>> raise Exception
Traceback (most recent call last):
  File "<pyshell#165>", line 1, in <module>
    raise Exception
Exception
while True:
    try:
        n = input('Enter an integer: ')
        n = int(n)
        print('Successful!')
        break
    except ValueError:
        print('No valid integer! Please try again...')
        print('again')
>>> x = 5+'ham'
Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    x = 5+'ham'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> try:
        x = 5+'ham'
    except:
        print('sorry, a mistake!')
sorry, a mistake!
>>> try:
        x = 5+'ham'
    except:
        pass
>>> try:
        x = 1/0
    except ZeroDivisionError:
        print('mistake: div by zero)
SyntaxError: EOL while scanning string literal
>>> try:
        x = 1/0
    except ZeroDivisionError:
        print('mistake: div by zero')
        print("routines to handle such error HERE")
    finally:
        print('Print this anyway')
mistake: div by zero
routines to handle such error HERE
Print this anyway
#!/usr/bin/python
try:
    fh = open('testfile', 'w')
    fh.write("this is my test file for exception handling!")
except IOError:
    print("Error: can't open the file or write data")
else:
    print("Written successfully")
    fh.close()
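A variant of the program above (a sketch, not the original) that uses a with statement, so the file is closed even if write() fails, and a finally clause that runs in every case. In Python 3, IOError is an alias of OSError; the function name write_test_file is hypothetical.

```python
import os
import tempfile

def write_test_file(path, text):
    """Write text to path; return True on success, False on an I/O error."""
    try:
        # 'with' closes the file automatically, even if write() raises
        with open(path, 'w') as fh:
            fh.write(text)
    except OSError as e:              # IOError is an alias of OSError in Python 3
        print("Error: can't write the file:", e)
        return False
    else:
        print("Written successfully")
        return True
    finally:
        print('Done (this runs in every case)')

path = os.path.join(tempfile.gettempdir(), 'testfile')
write_test_file(path, 'this is my test file for exception handling!')
```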
II. Python Programming (3)
• Standard Library
• Regular expressions
• Using databases
Standard Library
• Concept
• Main contents
• Built-in functions (examples)
• Examples – sys module
– os module
– fileinput module
– random module
– shelve module
– pickle module
• Python standard library – https://docs.python.org/3/library/
– A library that holds the built-in modules • Written in Python
• C, C++, …
– Besides the standard library, there is a vast number of 3rd-party libraries.
– Python Package Index (50,284 packages as of October 2014) • A software repository for Python
• https://pypi.python.org/pypi
Python Standard Library: main contents
• (1) Built-in functions and constants – Built-in functions: see the next slide – True/False/None/...
• (3) Built-in types – Data types – Exceptions
• (4) Text Processing Services – string: string-related methods and formatting specification – ** re: regular expressions – Others (difflib, unicodedata (Unicode Database), etc.)
• (5) Numeric and mathematical modules – math, cmath, random, etc.
• (6) Data persistence / files – ** pickle (Python object serialization) – marshal: internal Python object serialization – ** sqlite3 – File compression (zlib, gzip, etc.), file formats (csv, configparser, etc.) – Cryptographic services (hashlib, etc.)
• (7) OS services – Files and directories (pathlib, os.path, fileinput, glob, etc.) – argparse, getopt, logging, etc. – platform, errno, etc. – ctypes – Concurrent execution (threading, multiprocessing, etc.)
• (8) Networking – IPC and networking (socket, ssl, asyncio, signal, etc.)
– Internet data handling (email, json, mailbox, etc.)
– HTML and XML processing
– webbrowser, cgi, urllib, http, etc.
• (9) Python Runtime Services – sys, sysconfig, Built-in objects
– __main__ (top-level script environment)
• (10) Others – Multimedia services
– Internationalization (gettext, locale, ...)
– Program Frameworks
– GUI (tkinter, etc.)
– IDLE
– Development tools (pydoc, doctest, unittest, bdb, pdb, ...)
– Software packaging and distribution (distutils, etc.)
– Custom Python Interpreters
– Importing Modules
– Python Language Services (parser, formatter, ...)
Example: built-in functions
abs() dict() help() min() setattr()
all() dir() hex() next() slice()
any() divmod() id() object() sorted()
ascii() enumerate() input() oct() staticmethod()
bin() eval() int() open() str()
bool() exec() isinstance() ord() sum()
bytearray() filter() issubclass() pow() super()
bytes() float() iter() print() tuple()
callable() format() len() property() type()
chr() frozenset() list() range() vars()
classmethod() getattr() locals() repr() zip()
compile() globals() map() reversed() __import__()
complex() hasattr() max() round()
delattr() hash() memoryview() set()
• Key sys module functions and attributes
– argv
– exit([arg])
– modules
– path
– platform
– stdin
– stdout
– stderr
• Usage examples – sys.argv
– sys.exit
– …
>>> import sys
>>> print (sys.version)
3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)]
>>> print (sys.platform)
win32
>>> print(sys.path)
['C:/Python34', 'C:\\Python34\\Lib\\idlelib', …
>>> sys.version_info
sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
>>> if sys.version_info < (2,6,0):
        sys.stderr.write("You need python 2.6 or later to run this script\n")
        sys.exit(1)
# II03_systest.py
import sys
args = sys.argv[1:]
args.reverse()
print('~'.join(args))
--
$ python II03_systest.py this is a test
test~a~is~this
• Key os module functions and attributes – environ
– system(command)
– startfile(command)
– sep
– pathsep
– linesep
– urandom(n)
• Note: – On MS Windows, os.startfile() returns immediately, so the Python program keeps running
– os.system() (e.g. on Linux) waits until the command has finished before the Python program continues
>>> import os
>>> os.startfile(r'C:\Program Files\Internet Explorer\iexplore.exe')
• Key fileinput module functions – input([files[, inplace[, backup]]])
– filename()
– lineno()
– filelineno()
– isstdin()
– nextfile()
– close()
# II03_numberlines.py
import fileinput
for line in fileinput.input(inplace=1):   # with inplace=1, print() output replaces the file's contents
    line = line.rstrip()
    num = fileinput.lineno()
    print('%-40s # %2i' % (line, num))
C:\python34>..\python II03_numberlines.py II03_numberlines.py
• type…
• Key random module functions – random()
– getrandbits()
– uniform(a,b)
– randrange([start], stop, [step])
– choice(seq)
– shuffle(seq[, random])
– sample(seq, n)
# Program II03_randrange.py
from random import randrange
num = input('How many dice? ')
sides = input('How many sides per die? ')
total = 0                      # avoid shadowing the built-in sum()
for i in range(int(num)):
    total += randrange(int(sides)) + 1
print('The result is: ', total)
>>> values = list(range(1,11)) + 'Jack Queen King'.split()
>>> values
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']
>>> suits = 'diamonds clubs hearts spades'.split()
>>> suits
['diamonds', 'clubs', 'hearts', 'spades']
>>> from pprint import pprint
>>> deck = ['%s of %s' % (v,s) for v in values for s in suits]
>>> pprint(deck[:12])
['1 of diamonds', '1 of clubs', ... '3 of hearts', '3 of spades']
>>> from random import shuffle
>>> shuffle(deck)
>>> pprint(deck[:12])
['4 of clubs', '1 of diamonds', ... '2 of diamonds', '5 of spades']
>>> # the dealer deals one card at a time (press Enter for the next card)
>>> while deck: input(deck.pop())
Queen of clubs
''
Jack of clubs
''
3 of diamonds
...
• shelve module
– open(filename[, flag='c'[, protocol=None[, writeback=False]]])
• Concept
– “shelf” = a persistent, dictionary-like object.
– The values in a shelf can be arbitrary objects (cf. a “dbm” database, which stores only strings/bytes)
– That is, anything the pickle module can handle (e.g. class instances, recursive data types, etc.)
– Keys, however, must be ordinary strings
– See: II03_shelf.py
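A minimal shelve sketch, assuming a hypothetical shelf file in a temporary directory. It also shows a common pitfall: with the default writeback=False, mutating a value fetched from the shelf changes only a temporary copy, so the value must be stored back under its key.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_shelf")   # hypothetical file name

with shelve.open(path) as db:                  # flag='c': create if missing
    db["person"] = {"name": "Alice", "age": 42, "tags": ["x"]}
    # writeback=False: this append modifies a temporary copy and is lost
    db["person"]["tags"].append("lost")
    # Correct pattern: read, modify, and assign back to the key
    person = db["person"]
    person["tags"].append("kept")
    db["person"] = person

with shelve.open(path, flag="r") as db:        # reopen read-only
    stored = db["person"]
print(stored["tags"])
```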
• pickle module – dump
– load
– dumps
– loads
– ...
• Concept – serializing and de-serializing an object structure.
– “Pickling”: converting an object into a byte stream
– “Unpickling”: converting a byte stream back into an object
– Also known as “serialization”, “marshalling”, or “flattening”
# pickle
>>> import pickle
>>> help(pickle)
~~
>>> movieList = ['Monty Python', 'Inception', 'Star Wars', 'Lord of the Rings']
>>> print(movieList)
['Monty Python', 'Inception', 'Star Wars', 'Lord of the Rings']
>>> outFile = open('pickle.txt', 'wb')    # pickle data is binary: open in 'wb' mode
>>> pickle.dump(movieList, outFile)
>>> outFile.close()
====== RESTART ==========
>>> import pickle
>>> inFile = open('pickle.txt', 'rb')
>>> newList = pickle.load(inFile)
>>> print(newList)
['Monty Python', 'Inception', 'Star Wars', 'Lord of the Rings']
>>> inFile.close()
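The in-memory counterparts dumps()/loads() and a with-statement version of the file round trip can be sketched as follows; the temporary file path is an illustrative assumption.

```python
import os
import pickle
import tempfile

movieList = ['Monty Python', 'Inception', 'Star Wars', 'Lord of the Rings']

# dumps()/loads() work with bytes in memory instead of a file.
data = pickle.dumps(movieList)
restored = pickle.loads(data)

# dump()/load() with a file; `with` closes the file even if an error occurs.
path = os.path.join(tempfile.mkdtemp(), 'movies.pickle')   # hypothetical file name
with open(path, 'wb') as outFile:
    pickle.dump(movieList, outFile)
with open(path, 'rb') as inFile:
    fromFile = pickle.load(inFile)

print(type(data), restored == movieList, fromFile == movieList)
```

Only unpickle data from trusted sources: loading a pickle can execute arbitrary code.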
Regular Expressions
• Concept
• Main functions
• Main methods
• Identifiers
• Modifiers
• White space
• Others
• Examples
• Regular expressions? – also called re, regular expression, regex, or regexp
• Key topics – escaping special characters – alternatives and subpatterns
– p(ython|erl)
– Optional and repeated subpatterns – r'(http://)?(www\.)?python\.org' – = 'http://' is optional and 'www.' is optional
• (pattern)* : zero or more times • (pattern)+ : one or more times • (pattern){m,n} : m to n times
– Start and end of the string • ^ and $
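The alternative, optional-subpattern, repetition, and anchor examples above can be checked with a short sketch. It uses re.fullmatch (Python 3.4+) so that the whole string has to match; the test strings are illustrative.

```python
import re

lang = re.compile(r'p(ython|erl)')
url = re.compile(r'(http://)?(www\.)?python\.org')

for s in ['python', 'perl', 'pascal']:
    m = lang.fullmatch(s)              # the whole string must match
    print(s, '->', m.group(1) if m else None)

url_ok = {s: bool(url.fullmatch(s))
          for s in ['http://www.python.org', 'www.python.org',
                    'python.org', 'http://python.org', 'wwwXpython.org']}
print(url_ok)

laugh = re.fullmatch(r'(ha){2,3}', 'hahaha')      # {m,n}: 2 to 3 repetitions
anchored = [bool(re.search(r'^\d+$', s)) for s in ['2015', '2015-11']]
print(bool(laugh), anchored)
```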
• re module
– Main functions
– Main methods of the match object
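The tables of functions and match-object methods were not preserved here; the following sketch exercises the commonly used ones (compile, search, match, findall, finditer, split, sub, and the group/groups/start/end/span methods) on an illustrative string.

```python
import re

text = 'alpha 1, beta 22, gamma 333'
pat = re.compile(r'(\w+) (\d+)')     # compile once, reuse many times

m = pat.search(text)                 # first match anywhere in the string
print(m.group(0), m.group(1), m.group(2))   # whole match, then each group
print(m.groups(), m.start(), m.end(), m.span())

all_pairs = pat.findall(text)                          # list of group tuples
numbers = [mm.group(2) for mm in pat.finditer(text)]   # iterator of match objects
parts = re.split(r',\s*', text)                        # split on commas
masked = re.sub(r'\d+', '#', text)                     # replace every number
at_start = re.match(r'\d', text)                       # match() only tries position 0
print(all_pairs, numbers, parts, masked, at_start, sep='\n')
```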
• Identifiers:
– \d = any digit
– \D = anything but a digit
– \s = any whitespace character (space, tab, newline, ...)
– \S = anything but whitespace
– \w = any word character (letter, digit, or underscore)
– \W = anything but a word character
– . = any character except a newline
– \b = word boundary (the edge of a whole word)
– \. = a literal period (a bare ‘.’ means any character, so it must be escaped)
• Modifiers:
– {1,3} = 1 to 3 repetitions of the preceding item (e.g. \d{1,3} for a 1- to 3-digit number)
– + = match 1 or more repetitions
– ? = match 0 or 1 repetitions
– * = match 0 or more repetitions
– $ = matches at the end of the string
– ^ = matches at the start of the string
– | = matches either/or; for example, x|y matches either x or y
– [] = a character class: any one of the listed characters or ranges
– {x} = exactly x repetitions of the preceding item
– {x,y} = x to y repetitions of the preceding item
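Each identifier and modifier above can be tried with re.findall; the sample strings are illustrative.

```python
import re

s = 'Order 66 shipped 2015-11-03 to user_42 at 10:05.'

chunks = re.findall(r'\d{1,3}', '12345')       # {1,3}: greedy runs of 1-3 digits
numbers = re.findall(r'\d+', s)                # +: one or more digits
words = re.findall(r'\w+', 'user_42 ok!')      # \w: letter, digit or underscore
whole = re.findall(r'\bat\b', 'at cat at')     # \b: 'at' only as a whole word
spell = re.findall(r'colou?r', 'color colour') # ?: the 'u' is optional
periods = re.findall(r'\.', s)                 # \.: a literal period only
either = re.findall(r'x|y', 'axbyc')           # |: either x or y
spaces = re.findall(r'\s', 'a b\tc')           # \s: space, tab, newline, ...
print(chunks, numbers, words, whole, spell, periods, either, spaces, sep='\n')
```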
• White space characters:
– \n = newline
– \s = any whitespace character
– \t = tab
– \f = form feed
– \r = carriage return
– (\e for escape, supported by some other regex engines, is an error in Python's re; use \x1b instead)
• Special characters that must be escaped: . + * ? [ ] $ ^ ( ) { } | \
• Brackets:
– [] = quant[ia]tative matches either quantitative or quantatative
– [a-z] = any lowercase letter a-z
– [1-5a-qA-Z] = any one digit 1-5, lowercase letter a-q, or uppercase letter A-Z
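A short sketch of bracket classes and escaping. re.escape() escapes special characters for an arbitrary string; the price string is an illustrative assumption.

```python
import re

variants = re.findall(r'quant[ia]tative', 'quantitative quantatative quantotative')
lower_runs = re.findall(r'[a-z]+', 'Hello World')
picked = re.findall(r'[1-5a-qA-Z]', 'z9Q3r')

# Special characters must be escaped to match literally; re.escape()
# does this for an arbitrary string ('$', '.', '(' and ')' below).
literal = re.escape('$5.00 (approx.)')
exact = re.fullmatch(literal, '$5.00 (approx.)')
loose = re.fullmatch('$5.00 (approx.)', '$5.00 (approx.)')   # unescaped: '$' acts as an anchor
print(variants, lower_runs, picked, bool(exact), loose, sep='\n')
```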
>>> import re
>>> exampleString = ''' Jessica is 15 years old, and Daniel is 27 years old. Edward is 97, and his grandfather, Oscar , is 102.'''
>>>
>>> ages = re.findall(r'\d{1,3}', exampleString)
>>> names = re.findall(r'[A-Z][a-z]*', exampleString)
>>> print(ages)
['15', '27', '97', '102']
>>> print(names)
['Jessica', 'Daniel', 'Edward', 'Oscar']
>>> ageDict = {}
>>> x = 0
>>> for eachName in names:
        ageDict[eachName] = ages[x]
        x += 1
        print(ageDict)

• Result (Python 3.4; dicts had no guaranteed key order before Python 3.7, so the display order may vary)
{'Jessica': '15'}
{'Jessica': '15', 'Daniel': '27'}
{'Edward': '97', 'Jessica': '15', 'Daniel': '27'}
{'Edward': '97', 'Jessica': '15', 'Oscar': '102', 'Daniel': '27'}
>>>
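The manual counter in the loop above can be replaced by zip(), which pairs the i-th name with the i-th age. This sketch assumes, as the example does, that the text contains exactly one age per capitalized name and in the same order.

```python
import re

exampleString = ''' Jessica is 15 years old, and Daniel is 27 years old. Edward is 97, and his grandfather, Oscar , is 102.'''

ages = re.findall(r'\d{1,3}', exampleString)
names = re.findall(r'[A-Z][a-z]*', exampleString)

# zip() stops at the shorter list, so a stray capitalized word would shift the pairing.
ageDict = dict(zip(names, ages))
print(ageDict)
```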
# x = urllib.request.urlopen('https://www.google.com')
# print(x.read())
# del x
# combining the urllib module with re
import urllib.request
import urllib.parse
import re
url = 'http://www.python.org'
values = {'s': 'python', 'submit': 'search'}
data = urllib.parse.urlencode(values)    # form fields as a query string
data = data.encode('utf-8')              # request data must be bytes
req = urllib.request.Request(url, data)  # passing data makes this a POST request
resp = urllib.request.urlopen(req)
respData = resp.read()                   # the response body, as bytes
print(respData)
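The snippet above imports re but does not use it yet. The re half of the combination can be sketched offline, assuming a small hypothetical HTML body in place of a real response. The <p> pattern is illustrative; a real HTML parser is more robust than regular expressions for this job.

```python
import re
import urllib.parse

# What urlencode produces for the form values used above.
data = urllib.parse.urlencode({'s': 'python', 'submit': 'search'}).encode('utf-8')

# Hypothetical response body; resp.read() also returns bytes like this.
respData = b'<html><body><p>First result</p><p>Second result</p></body></html>'

# Decode bytes to str, then pull out the text of each <p>...</p> element.
# The non-greedy .*? stops at the first closing tag.
paragraphs = re.findall(r'<p>(.*?)</p>', respData.decode('utf-8'))
print(data)
print(paragraphs)
```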
#!/usr/bin/env python3
import re
line = "Cats are smarter than dogs"
# re.M (MULTILINE) and re.I (IGNORECASE) are combined with |
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")
• Output
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
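Two related points, sketched on the same sentence: match() only tries the start of the string while search() scans all of it, and named groups ((?P<name>...)) make patterns self-documenting. The group names are illustrative.

```python
import re

line = "Cats are smarter than dogs"

m1 = re.match(r'smarter', line)    # None: 'smarter' is not at position 0
m2 = re.search(r'smarter', line)   # found anywhere in the string
print(m1, m2.span())

m3 = re.match(r'(?P<subject>\w+) are (?P<adj>\w+)', line, re.I)
print(m3.group('subject'), m3.group('adj'), m3.groupdict())
```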
Using Databases (omitted)
• Overview
• Minimal workflow in the DB API:
• MySQL example (Python 2.x)
• The SQLite3 API
• SQLite3 example
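Although the database slides are omitted, the minimal DB-API workflow (connect, cursor, execute, commit, fetch, close) can be sketched with the standard sqlite3 module. The in-memory database and the movie table are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # in-memory database, nothing written to disk
cur = conn.cursor()
cur.execute('CREATE TABLE movie (title TEXT, year INTEGER)')
# ? placeholders let the driver quote values safely (no string formatting).
cur.executemany('INSERT INTO movie VALUES (?, ?)',
                [('Inception', 2010), ('Star Wars', 1977)])
conn.commit()
cur.execute('SELECT title FROM movie WHERE year > ? ORDER BY title', (2000,))
rows = cur.fetchall()
conn.close()
print(rows)
```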
III. Data Analysis with Python (1)
• Overview
• Environment setup
Overview
The data analysis process with Python
Main packages for data analysis
• ipython
• numpy
• matplotlib
• scipy
• pandas
ipython
• Concept – a command shell for interactive computing (supports multiple programming languages)
• Features – introspection – additional shell features (terminal and Qt-based)
• tab completion & history, rich media
• IPython Notebook – browser-based code, math, inline plots, and rich media – interactive data visualization and use of GUI toolkits – flexible, embeddable interpreters to load into one's own projects – performance tools for parallel computing
• Profiling & optimization – %time, %timeit in IPython – %prun: profile a statement with cProfile – %run -p: profile a whole program – line_profiler module: line-by-line timing
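Outside IPython, roughly the same measurements are available from the standard library: timeit for %timeit-style timing and cProfile/pstats for %prun-style profiling. The slow_sum function is an illustrative assumption.

```python
import cProfile
import io
import pstats
import timeit

def slow_sum(n):
    """A deliberately naive function to time and profile (illustrative)."""
    total = 0
    for i in range(n):
        total += i
    return total

# Roughly what %timeit does: run a statement many times and time it.
seconds = timeit.timeit(lambda: slow_sum(10000), number=100)

# Roughly what %prun does: profile one statement with cProfile.
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
report = out.getvalue()

print('%.4f s for 100 runs' % seconds)
print(report)
```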
numpy
• Concept – extends Python with numerical data processing
• Main features – support for large, multi-dimensional arrays and matrices, and for high-level mathematical functions
• Background – evolved from Numeric
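A minimal sketch of a multi-dimensional array and whole-array math, assuming NumPy is installed (the @ matrix-product operator needs Python 3.5+).

```python
import numpy as np

# A 2 x 3 array: one homogeneous block of data with a shape and a dtype.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape, a.ndim, a.dtype)

# High-level math operates on whole arrays without Python loops.
col_sums = a.sum(axis=0)      # column sums
mean = a.mean()               # mean of all elements
gram = a @ a.T                # matrix product (2x3)(3x2) -> 2x2
print(col_sums, mean, gram, sep='\n')
```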
• Array creation functions in numpy
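The table of creation functions was not preserved; this sketch shows several commonly used ones, assuming NumPy is installed.

```python
import numpy as np

zeros = np.zeros((2, 3))               # 2 x 3 array of 0.0
ones = np.ones(4, dtype=int)           # four integer 1s
evens = np.arange(0, 10, 2)            # like range(): 0, 2, 4, 6, 8
points = np.linspace(0, 1, 5)          # 5 evenly spaced values, both ends included
identity = np.eye(3)                   # 3 x 3 identity matrix
sevens = np.full((2, 2), 7)            # filled with a given value
converted = np.asarray([1.5, 2.5])     # from existing Python data
reshaped = np.arange(6).reshape(2, 3)  # the values 0..5 viewed as 2 x 3
print(evens, points, reshaped, sep='\n')
```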
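• Universal functions – perform element-wise operations on the data in an ndarray
– ufunc
– a kind of vectorized wrapper around a function that takes a fixed number of inputs and produces a fixed number of outputs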
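A few built-in ufuncs, plus np.vectorize to wrap a scalar Python function. np.vectorize is a convenience that still loops in Python, so it is not a true compiled ufunc. The clipped_log function is an illustrative assumption.

```python
import math
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])
roots = np.sqrt(x)            # unary ufunc, applied element by element
shifted = np.add(x, 1)        # same as x + 1; the scalar is broadcast
capped = np.maximum(x, 2)     # binary ufunc: element-wise maximum

def clipped_log(v):
    """Scalar function: log(v) for positive v, else 0.0 (illustrative)."""
    return math.log(v) if v > 0 else 0.0

vlog = np.vectorize(clipped_log)   # now accepts whole arrays
logs = vlog(x)
print(roots, shifted, capped, logs, sep='\n')
```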
matplotlib
• Concept – a plotting library for Python and its numerical extension NumPy
• Main features – an object-oriented API for embedding plots into applications
• using general-purpose GUI toolkits (wxPython, Qt, or GTK+)
• pylab – based on a state machine (like OpenGL)
– closely resembles MATLAB
– SciPy makes use of matplotlib
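A sketch of the object-oriented API, assuming matplotlib 3.1 or later: a Figure is built directly without pyplot's global state and rendered into memory, which is how plots are typically embedded in applications or web servers.

```python
import io
import numpy as np
from matplotlib.figure import Figure

fig = Figure(figsize=(4, 3))
ax = fig.subplots()                  # one Axes object
x = np.linspace(0, 2 * np.pi, 100)
ax.plot(x, np.sin(x), label='sin(x)')
ax.set_xlabel('x')
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format='png')       # render to bytes instead of a window
png_bytes = buf.getvalue()
print(len(png_bytes), 'bytes of PNG')
```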
scipy
• Concept – an open-source Python library for scientific and technical computing
• NumPy and SciPy – built on top of the NumPy array object
– part of the NumPy stack • which also includes Matplotlib, pandas, and SymPy
• Main contents – optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers
– other tools for science and engineering tasks
• License – BSD license
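Two of the listed areas, integration and optimization, in a minimal sketch assuming SciPy is installed. The functions are chosen so that the exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: the area under sin(x) from 0 to pi is exactly 2.
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: (x - 2)^2 + 1 has its minimum value 1 at x = 2.
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

print(round(area, 6), round(res.x, 6), round(res.fun, 6))
```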
pandas
• Concept – a software library for data analysis with Python
– data munging/preparation/cleaning/integration
– rich data manipulation tools (built on NumPy)
– fast, intuitive data structures
– sits between general-purpose Python and a DSL such as R (?)
– similar to R's data.frame
– easy-to-use, highly consistent API
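A minimal DataFrame sketch, assuming pandas is installed: like an R data.frame, it has named columns with their own dtypes plus a row index. The data reuses the names and ages from the regular expression example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jessica', 'Daniel', 'Edward'],
    'age': [15, 27, 97],
    'member': [True, False, True],
})
print(df.dtypes)                   # one dtype per column

adults = df[df['age'] > 20]        # boolean row selection
print(adults)
print(df['age'].describe())        # quick summary statistics
```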
Details
• Main features
– DataFrame object
– data analysis with integrated indexing
– support for many formats (CSV, text files, Excel, SQL databases, HDF5)
– integrated handling of data alignment and missing data
– reshaping and pivoting of datasets
– label-based slicing, indexing, and subsetting of large datasets
– aggregating/transforming data with the group by engine: split-apply-combine operations on data sets
– hierarchical axis indexing
– time series functionality
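Two of these features, missing-data handling and pivoting, in one sketch. The long-format sales data is hypothetical.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    'region': ['east', 'east', 'west', 'west', 'west'],
    'year':   [2014, 2015, 2014, 2015, 2015],
    'amount': [10.0, 12.0, np.nan, 7.0, 3.0],
})

# Missing data: reductions skip NaN by default ...
total = sales['amount'].sum()
# ... or NaN can be replaced explicitly.
filled = sales.fillna({'amount': 0.0})

# Reshaping: regions as rows, years as columns, duplicates summed.
table = filled.pivot_table(index='region', columns='year',
                           values='amount', aggfunc='sum')
print(total)
print(table)
```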
pandas.core
• Data structures – Series (1D)
– DataFrame (2D)
– Panel (3D; deprecated in pandas 0.20 and removed in 0.25)
• NA-friendly statistics
• Index implementations/label-indexing
• GroupBy engine
• Time series tools – Date range generation
– Extensible date offsets
• Hierarchical indexing
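Date range generation, NA-friendly statistics, and a simple time-series resampling step in one sketch; the dates and values are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2015-11-01', periods=5, freq='D')   # five consecutive days
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0], index=idx)

mean = s.mean()                   # NaN values are skipped
count = s.count()                 # number of non-missing values
two_day = s.resample('2D').sum()  # regroup into 2-day bins
print(mean, count)
print(two_day)
```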
The pandas data model
• Series: – 1D labeled array – historically a subclass of numpy.ndarray (no longer since pandas 0.13) – Data: any dtype – Index labels need not be ordered – Duplicates are possible (but result in reduced functionality)
• DataFrame – 2D table with row and column labels – potentially heterogeneous columns: each column can have a different dtype – ndarray-like, but not an ndarray – Row and column index – Size mutable: insert and delete columns
Index
• Index – Every axis has an index – fast lookups, data alignment, and join operations – Hierarchical indexes
• Semantics: a tuple at each tick • Enables easy group selection • Terminology: "multiple levels" • Natural part of GroupBy and reshape operations
• Data Alignment – Binary operations are joins! – Outer join by default – DataFrame joins/aligns on both axes – Irregularly-indexed data
Series
• Historically a subclass of numpy.ndarray (no longer since pandas 0.13)
• Data: any dtype
• Index labels need not be ordered
• Duplicates are possible (but result in reduced functionality)
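A small sketch of Series labels: they need not be ordered, duplicates are allowed (a duplicated label returns a Series rather than a scalar), and a modern Series is not an ndarray, although to_numpy() returns one. The data is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['b', 'a', 'c', 'a'])

single = s['b']              # unique label -> scalar
dup = s['a']                 # duplicated label -> Series of all matches
print(single, dup.tolist(), s.index.is_unique)
print(s.sort_index())        # ordering by label is explicit
print(isinstance(s, np.ndarray), type(s.to_numpy()))
```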
DataFrame
• ndarray-like, but not ndarray
• Each column can have a different dtype
• Row and column index
• Size mutable: insert and delete columns
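Size mutability with per-column dtypes can be sketched as follows; the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
df['c'] = df['a'] * 1.5                # append a new float column
df.insert(1, 'flag', [True, False])    # insert a column at position 1
del df['b']                            # delete a column in place
print(df.dtypes)                       # each column keeps its own dtype
print(df.columns.tolist())
```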
Hierarchical indexes
• Semantics: a tuple at each tick
• Enables easy group selection
• Terminology: "multiple levels"
• Natural part of GroupBy and reshape operations
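A two-level index in which each row label is a (year, quarter) tuple, sketched with illustrative values: selection by either level, GroupBy on a level, and reshaping with unstack().

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(2014, 'Q1'), (2014, 'Q2'), (2015, 'Q1'), (2015, 'Q2')],
    names=['year', 'quarter'])
s = pd.Series([1, 2, 3, 4], index=idx)

year_2015 = s.loc[2015]                    # select by the outer level
q1 = s.xs('Q1', level='quarter')           # cross-section on the inner level
per_year = s.groupby(level='year').sum()   # GroupBy on a level
wide = s.unstack()                         # inner level becomes columns
print(year_2015, q1, per_year, wide, sep='\n\n')
```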
Data Alignment
• Binary operations are joins!
• Outer join by default
• DataFrame joins/aligns on both axes
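Alignment in a sketch: arithmetic matches labels, the result index is the union (outer join), and labels present on only one side become NaN unless a fill_value is given. DataFrames align on both rows and columns. The data is illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = s1 + s2                        # union of labels; 'a' and 'd' become NaN
filled = s1.add(s2, fill_value=0)      # treat a missing side as 0

d1 = pd.DataFrame({'x': [1, 2]}, index=[0, 1])
d2 = pd.DataFrame({'x': [5, 5], 'y': [1, 1]}, index=[1, 2])
both = d1 + d2                         # aligned on rows AND columns
print(total, filled, both, sep='\n\n')
```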
• Irregularly-indexed data
• Axis metadata
GroupBy
• Splitting axis into groups – DataFrame columns
– Arrays of labels
– Functions, applied to axis labels
• There are several ways to work with grouped data – Iterate: "for key, group in grouped"
– Aggregate: grouped.agg(f)
– Transform: grouped.transform(f)
– Apply: grouped.apply(f)
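The three ways of splitting listed above (by a column, by an array of labels, by a function of the index labels) and iteration over groups, sketched on illustrative data.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'a'],
                   'val': [1, 2, 3, 4, 5]},
                  index=['r1', 'r2', 'r3', 'r4', 'r5'])

# 1) by a DataFrame column
by_col = df.groupby('key')['val'].sum()
# 2) by an array of labels with the same length as the axis
labels = ['low', 'low', 'high', 'high', 'high']
by_array = df['val'].groupby(labels).sum()
# 3) by a function applied to each index label
by_func = df['val'].groupby(lambda label: 'first' if label in ('r1', 'r2') else 'rest').sum()

# Iterate: each item is a (key, sub-DataFrame) pair.
for key, group in df.groupby('key'):
    print(key, group['val'].tolist())
```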
Other features
• Agg, Transform, Apply – Agg/Transform are specialized, faster
• Agg: produces a single aggregated value per column per group
• Transform: alters values, but not their size
– Apply: completely generic, but slower
• Join/concatenation algorithms
• Sparse version of Series, DataFrame,…
• IO tools: csv files, HDF5, Excel
• Moving window statistics (rolling mean, …)
• Pivot tables
• High-level matplotlib interface
• Better integration with statsmodels and scikit-learn
• R integration via rpy2
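The contrast between Agg, Transform, and Apply above, plus a rolling mean from the moving-window statistics, in one sketch on illustrative data.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'b'],
                   'val': [1.0, 3.0, 2.0, 4.0, 9.0]})
g = df.groupby('key')['val']

agg = g.agg('mean')                            # one value per group
demeaned = g.transform(lambda v: v - v.mean()) # same length as the input
top = g.apply(lambda v: v.nlargest(1))         # generic: any shape per group

rolling = df['val'].rolling(window=2).mean()   # 2-point moving average
print(agg, demeaned, top, rolling, sep='\n\n')
```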
pandas roadmap
• Integration with JavaScript visualization frameworks – D3, Flot, others
• Alternate DataFrame “backends” – Memory maps
– HDF5/PyTables
– SQL or NoSQL-backed
• Tighter integration with the IPython Notebook
• ggplot2 for Python
• pandas for Big Data – Alternate DataFrame backends
– Integration with MapReduce framework
Environment setup
• Packages
– ipython
– numpy
– matplotlib
– scipy
– pandas
• Individual installation – from each project's site
• http://www.ipython.org/ • http://www.numpy.org/ • http://pandas.pydata.org/ • http://matplotlib.org/
• Bundled distributions – Enthought Canopy
• https://store.enthought.com/
– Python(x,y) • https://code.google.com/p/pythonxy/
– Anaconda • https://store.continuum.io/cshop/anaconda/
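After installation, a quick check of which packages are present can be sketched with importlib.metadata. Note that this needs Python 3.8+, newer than the 3.4 environment assumed in these slides. The helper name installed_versions is hypothetical.

```python
from importlib import metadata

PACKAGES = ['ipython', 'numpy', 'matplotlib', 'scipy', 'pandas']

def installed_versions(names):
    """Map each distribution name to its installed version, or None (hypothetical helper)."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

for name, version in installed_versions(PACKAGES).items():
    print('%-12s %s' % (name, version or 'not installed'))
```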
Hands-on practice
• Basics
• Applications
Basics
• ipython
• numpy
• matplotlib
• scipy
• pandas
Practice – code and data
• scikit-learn site
• Python for Data Analysis
– by Wes McKinney
III. Data Analysis with Python (2)
• Data analysis case studies
• Wrap-up
Data analysis case studies
Wrap-up
• Python and the scope of data analysis
• Python and R
• Python and big data
Python and the scope of data analysis
• https://pypi.python.org/pypi?%3Aaction=browse
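• Analysis of structured data – databases (SQL, NoSQL) – machine learning, mining – mathematical and statistical analysis
• Analysis of unstructured data – re, pawk, etc. – www.nltk.org – …
• Big data
– various approaches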
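Python and R
• Strengths of R: – specialized for mathematics and statistics (a DSL)
– about 5,000 specialized packages
• Strengths of Python: – a powerful general-purpose language
• a fully object-oriented language (OOL),
• dynamic typing
• …
– about 50,000 packages and a vast range of frameworks
• Difficulties of integration – different design philosophies
– from the Python side: implementing a more pythonic approach
– namespaces, etc.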
• Various approaches – (1) Rserve
• = a TCP/IP server for R • lets a variety of clients access R (e.g. C/C++/C#/Ruby, ...) • through pyRserve, a Python client can call R directly
– R code can call back into Python
– (2) rPython • calls Python from R
• python.call( "len", 1:3 )
• a <- 1:4
• b <- 5:8
• python.exec( "def concat(a,b): return a+b" )
• python.call( "concat", a, b)
– (3) rpy2 • calls R from Python • evolved from rpy
Python and big data
• Background – Big Data & Hadoop
– using Jython programs
– using Hadoop Streaming
– source: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
• Overview of MapReduce programming in Python – environment: Hadoop on Linux
– data: Gutenberg texts (https://www.gutenberg.org/) used with the WordCount example
• Method – Hadoop Streaming
• Every Hadoop job is treated as reading standard input and writing standard output (stdin, stdout).
• (http://hadoop.apache.org/docs/r1.1.2/streaming.html#Hadoop+Streaming)
• A utility included with Hadoop – lets a program written in any language be used as a Hadoop Map/Reduce job. In Python, this means reading the input data from sys.stdin and writing the results to sys.stdout.
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc
• Linux shell – = command interpreter
– Standard I/O
• Once a command runs, a process is created, and this process opens 3 streams:
• stdin,
– standard input reads the input data.
• stdout, – standard output writes the output data.
• stderr, – standard error writes the error messages.
– Redirection and pipes
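The three streams and a pipe can be sketched from Python with the standard subprocess module. A child process reads its stdin and writes separately to stdout and stderr, and the parent feeds the input and captures both outputs, much like `echo 'hello hadoop' | child` in the shell. The child's behavior is an illustrative assumption.

```python
import subprocess
import sys

# Child: read all of stdin, upper-case it to stdout, report on stderr.
child_code = (
    "import sys\n"
    "data = sys.stdin.read()\n"
    "sys.stdout.write(data.upper())\n"
    "sys.stderr.write('read %d chars\\n' % len(data))\n"
)

result = subprocess.run([sys.executable, '-c', child_code],
                        input='hello hadoop\n',
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)   # text mode for all three streams
print(repr(result.stdout))
print(repr(result.stderr))
```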
Python MapReduce Code
• Map step (/home/hduser/mapper.py)
#!/usr/bin/env python
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the Reduce step,
        # i.e. a tab-separated "word<TAB>1" line per word
        print('%s\t%s' % (word, 1))
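• Reduce step (/home/hduser/reducer.py)
#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN: the mapper output, sorted by key (word) by Hadoop
for line in sys.stdin:
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently ignore this line
        continue
    # this works only because the input is sorted by word
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write the result for the previous word to STDOUT
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# output the last word, if any
if current_word == word:
    print('%s\t%s' % (current_word, current_count))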
• Testing – test the mapper and the reducer separately first, and run them as a MapReduce job only after they work
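Before submitting the job, the map, sort, and reduce pipeline can be tested in-process. This sketch mirrors the logic of mapper.py and reducer.py, but the reducer here uses itertools.groupby (a swapped-in technique) instead of tracking current_word by hand. Both versions rely on key-sorted input, which sorted() stands in for here. The input sentence is illustrative.

```python
from itertools import groupby

def mapper(lines):
    """Same output as mapper.py: one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)

def reducer(lines):
    """Same result as reducer.py, using groupby over key-sorted input."""
    pairs = (line.rstrip('\n').split('\t', 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield '%s\t%s' % (word, sum(int(count) for _, count in group))

# Like: echo "foo foo quux labs foo bar quux" | mapper.py | sort | reducer.py
mapped = list(mapper(['foo foo quux labs foo bar quux\n']))
shuffled = sorted(mapped)            # stands in for Hadoop's sort by key
output = list(reducer(shuffled))
print('\n'.join(output))
```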
• Running the Python code on Hadoop – (1) copy the data into HDFS
$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
– (2) run the MR job
$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
  -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py \
  -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py \
  -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output
• Execution log
$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -mapper /home/hduser/mapper.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/app/hadoop/tmp/hadoop-unjar54543/] [] /tmp/streamjob54544.jar tmpDir=null
[...] INFO mapred.FileInputFormat: Total input paths to process : 7
[...] INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
[...] INFO streaming.StreamJob: Running job: job_200803031615_0021
[...]
[...] INFO streaming.StreamJob: map 0% reduce 0%
…
[...] INFO streaming.StreamJob: map 100% reduce 100%
[...] INFO streaming.StreamJob: Job complete: job_200803031615_0021
[...] INFO streaming.StreamJob: Output: /user/hduser/gutenberg-output
• Results
• References on Python programming:
– M. L. Hetland, “Beginning Python”, Apress, 2008
– V. L. Ceder, “The Quick Python Book”, Manning, 2010
– P. Wentworth et al., “How to Think Like a Computer Scientist”, 2011
– M. Lutz, “Learning Python” (5th ed.), O’Reilly, 2013
– Others
• related articles and websites
• Reference book – Python for Data Analysis
• Data – 2012 US Presidential Election FEC disclosure data (CSV)
– Baby names: top 1000 US boy and girl names 1880~2008 (CSV)
– USDA Food Nutrient database (JSON)
– https://github.com/pydata/pydata-book
scikit-learn
• http://scikit-learn.org/