Data Analysis with Python

2015.11

윤형기 ([email protected])

Big Data Analysis Training (2015-11)

Agenda

Stage: Main topics

Python basics: Python overview and installation; variables, statements, conditionals and loops; functions, modules and programs, example programs

Python programming (1): String; List/Dictionary/Set; Module and Package

File & I/O; OOP; exception handling

Regular expressions, working with databases; Standard Library, other useful features

Data analysis with Python (1)

Data analysis with Python (2)

Python and big data

Wrap-up

I-1. Introduction

• General overview

• Installation

• A first taste of Python


General overview

• Background

• Features

• Implementations

• Scope of this course


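Background

• Developed by Guido van Rossum (Netherlands) in the late 1980s; first released in 1991

• Open source and cross-platform, distributed under the Python Software Foundation License (GPL-compatible)

• Draws on ABC, Modula-3, C, C++, Algol-68, SmallTalk, Unix shell, and others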


Features of Python (1)

• High-level language

• Interpreted language

• Interactive language

• Object-oriented language

• Scripting language


Features of Python (2)

• Easy to learn

• Easy to read

• Easy to maintain

• Mathematical concepts

• Comparison with other languages
– C, C++
– Java
– R


Python implementations

• “Python” or “CPython”: written in C
– Version 2.7: mid-2010
– Version 3.1.2: early 2010

• “Jython”: written in Java for the JVM

• “IronPython”: written in C# for the .NET environment

• Domain-specific packages


Development environment

• Python Interactive Shell
– IDLE

• Others
– 1. PyDev with Eclipse
– 2. Komodo
– 3. Emacs
– 4. Vim
– 5. TextMate
– 6. Gedit
– …

• IPython


How to use Python

• (1) Interactively (Interactive Interpreter)

• (2) Running script programs from the command line

• http://www.python.org/

• https://docs.python.org/3.5/

• https://docs.python.org/2/tutorial/



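Scope of this course

• Covered
– Basics of the Python programming language
  • Functions, string processing, OOP, regular expressions (RE), database use, and so on
  • Data analysis packages such as NumPy
– Applications
  • Data analysis (NumPy, Pandas, …)
  • Machine learning and text analysis
– Development environment
  • Version 3.4 by default (2.7 in places)
  • MS Windows (IDLE, installed by default)

• Not covered
– GUI
– Networking/Web/Hacking/Unicode, etc.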


Installation

• Basic Python installation
– Installation for MS Windows

• Eclipse with pydev

• Installing data analysis packages
– (covered in the relevant session)


Basic Python installation – hands-on

• Download from www.python.org

• Run
– python from the command line
– Python IDLE (Integrated Development and Learning Environment)
– (Linux) shebang line at the top of a script: #!/usr/bin/python

Eclipse with pydev

• Install Java
– Java for developers > JDK

• Eclipse
– Download > run
– Change the path settings
  • Create JAVA_HOME
  • Put %JAVA_HOME%\bin; at the very front of PATH
– Check: cmd > java

• pydev
– Help > Install New Software > pydev (http://pydev.org/updates)
– Preferences
– Perspectives
  • Window > Preferences >
  – Run > debug > Always launch …
  – General > Content Types > … UTF-8
  – Pydev > editor > ..

A first taste

• A first taste of using Python


A first taste – hands-on

• Working with numbers
– Numbers & Expressions
  • >>> 2+2
  • >>> 1.0 / 2.0
  • >>> 1/2
– Octal and hexadecimal
  • >>> 0xAF (hexadecimal)
  • >>> 100000000000000000000

• Programs
– >>> print("Hello World!")
– Using comments
– #


I-2. Python Programming Basics

• Variables

• Data types

• Statements

• Module

• String

• Saving and running programs

• Class and library

• Example program
– Python Turtle Graphics


Data types and variables

• Basic (built-in) data types
– Numbers
  • Integers
  • Floats
  • Complex numbers
  • Booleans
– List, Tuple
  • List: a kind of array, and more
  • Tuple: immutable
– String
– Dictionary
– Sets
– File Objects

– # Data types

>>> type(20)

<class 'int'>

>>> type("17")

<class 'str'>

>>> type("3.2")

<class 'str'>

>>> type('this is a string')

<class 'str'>

>>> type("""and this""")

<class 'str'>

>>> print('''"Oh no", she said''')

"Oh no", she said


• # Numbers

>>> 2+2
4
>>> 1/2
0.5
>>> 10/3
3.3333333333333335
>>> 2**3
8
>>> (-3)**4
81
>>> 0xAF
175


• # List

>>> []

[]

>>> [1]

[1]

>>> [1,2,3,4]

[1, 2, 3, 4]

>>> [1, "two", 3, 4.0, ["a","b"],(5,6)]

[1, 'two', 3, 4.0, ['a', 'b'], (5, 6)]

>>> x = ["first","second","third","fourth"]

>>> x[0]

'first'

>>> x[2]
'third'
>>> x[-2]
'third'
>>> x[1:-1]
['second', 'third']
>>> x[:3]
['first', 'second', 'third']
>>> x[-2:]
['third', 'fourth']


• # Tuple

>>> ()
()
>>> (1,)
(1,)
>>> (1,2,3,4)
(1, 2, 3, 4)
>>> (1,2,"three",["a","b"], (5,6))
(1, 2, 'three', ['a', 'b'], (5, 6))


– # Strings

>>> 'Let's go'

SyntaxError: invalid syntax

>>> 'Let\'s go!'

"Let's go!"

>>> "\"Hello, world\" she said"

'"Hello, world" she said'

>>> '"Hello, world" she said'

'"Hello, world" she said'

– # String concatenation

>>> "Let's say " '"Hello, world"'

'Let\'s say "Hello, world"'

>>> "Hello, " + "world!"

'Hello, world!'

>>> x = "Hello, "

>>> y = "world!"

>>> x + y

'Hello, world!'


• Long string
– """ … """ or ''' … '''

>>> '''This is very long string,
It continues here.
And it's not over yet.
"Hello world!"
still here.'''
'This is very long string,\nIt continues here.\nAnd it\'s not over yet.\n"Hello world!"\nstill here.'
>>> print("hello \
world and again \
you too")
hello world and again you too


• Unicode

• # In 3.4

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> type('파이썬')
<class 'str'>
>>> type(u'파이썬')
<class 'str'>
>>> s='파이썬'
>>> u=u'파이썬'
>>> s==u
True

• # In 2.6

>>> type(u'파이썬')
<type 'unicode'>
>>> type('파이썬')
<type 'str'>
>>> s='파이썬'
>>> u=u'파이썬'
>>> s==u


• # Dictionary

>>> x = {1:"one", 2:"two"}
>>> x[1]
'one'
>>> x[2]
'two'
>>> x.get(4, "not available")
'not available'

• # Sets

>>> x = set([1,2,3,1,3,5])
>>> x
{1, 2, 3, 5}
>>> 1 in x
True
>>> 4 in x
False


• Variables

>>> x = 3
>>> x * 2
6
>>> message = "What's up?"
>>> message
"What's up?"
>>> day = "Monday"
>>> day
'Monday'
>>> day = "Friday"
>>> day
'Friday'
>>> day = 21
>>> day
21

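• Variable names and keywords
– Do not use keywords as names
– Underscores may be used, but at the start of a name they carry a special meaning

• Common conventions (PEP 8)
– Class names use CapWords (they start with a capital letter)
– Functions and methods use lowercase_with_underscores
– Constants are written in all capitals
– Other variables are lowercase

• Python Style Guide
– http://legacy.python.org/dev/peps/pep-0008/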
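The naming conventions above can be illustrated with a short sketch. Every name in it (MAX_RETRIES, BankAccount, format_owner, and so on) is hypothetical and chosen only to show the styles PEP 8 recommends.

```python
# Naming styles recommended by PEP 8 (all names below are hypothetical).

MAX_RETRIES = 3                        # constant: all capitals with underscores


class BankAccount:                     # class: CapWords
    def __init__(self, owner_name):
        self.owner_name = owner_name   # attribute: lowercase_with_underscores
        self._balance = 0              # leading underscore: intended for internal use

    def deposit_money(self, amount):   # method: lowercase_with_underscores
        self._balance += amount
        return self._balance


def format_owner(account):             # function: lowercase_with_underscores
    return account.owner_name.title()


account = BankAccount("alice")
print(format_owner(account), account.deposit_money(100))
```

The leading underscore in _balance is only a convention: Python does not prevent access to the attribute, it merely signals intent to the reader.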





Statements

• Basics
>>> 2+2
4
>>> print(2*2)
4

• Input from the user
>>> input("How old are you? ")
How old are you? 35
'35'
>>> x = input("x: ")
x: 30
>>> y = input("y: ")
y: 25
>>> x+y
'3025'
>>> int(x)+int(y)
55
>>> response = input("What is the radius? ")
What is the radius? 4
>>> r = float(response)
>>> area = 3.14159 * r ** 2
>>> print("The area is: ", area)
The area is:  50.26544

• Order of operations (precedence)
– (when in doubt, use parentheses)
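A few expressions where precedence is easy to misread may help here. This is a minimal sketch; the parenthesized form next to each expression shows how Python groups it.

```python
# ** binds tighter than unary minus, and * binds tighter than +
print(-3 ** 2)        # grouped as -(3 ** 2)
print((-3) ** 2)
print(2 + 3 * 4)      # grouped as 2 + (3 * 4)
print((2 + 3) * 4)

# ** is right-associative
print(2 ** 3 ** 2)    # grouped as 2 ** (3 ** 2)

# comparisons bind tighter than 'not'
print(not 1 == 2)     # grouped as not (1 == 2)
```

Writing the parentheses explicitly costs nothing at run time and removes the need to remember the table.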


Functions

• Overview

• Examples

>>> 2**3

8

>>> pow(2,3)

8

>>> 10+pow(2, 5) / 5.0

16.4

>>> abs(-10)

10

>>> int(3.14)

3

>>> int(3.99)

3

>>> int(-3.99)

-3

>>> int("23 bottles")

Traceback (most recent call last):

…


Module

• A module is (a file containing) extension functionality

>>> import math
>>> math.floor(32.9)
32
>>> from math import sqrt
>>> sqrt(9)
3.0
>>> sqrt(-1)
Traceback (most recent call last):
  File "<pyshell#80>", line 1, in <module>
    sqrt(-1)
ValueError: math domain error

>>> import cmath

>>> cmath.sqrt(-1)

1j

>>> (1+3j) * (9+4j)

(-3+31j)


Saving and running programs

• Using IDLE

• Running from the command line


Class and Library

• What is OOP?

• Class and Instance

• Python programming
– Procedural
– OOP

• Library

• Standard Library
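To make the class/instance distinction and the procedural/OOP contrast concrete, here is a minimal sketch. The Dog class and the speak_procedural function are hypothetical examples, not part of the course material.

```python
class Dog:
    """A class is a blueprint; every instance carries its own data."""
    species = "Canis familiaris"      # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name              # instance attribute

    def speak(self):                  # method: a function bound to an instance
        return self.name + " says woof"


def speak_procedural(name):
    """Procedural style: the data is passed to a plain function."""
    return name + " says woof"


rex = Dog("Rex")      # two separate instances of the same class
fido = Dog("Fido")
print(rex.speak(), "/", fido.speak(), "/", speak_procedural("Rex"))
```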


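Python environment variables

Variable: Description

PYTHONPATH: Plays a role similar to PATH; lists directories that are searched for modules.

PYTHONSTARTUP: Path of an initialization file that runs each time the interactive interpreter starts; similar to the Unix .profile or .login file.

PYTHONCASEOK: On Windows, tells Python to ignore case when matching names in import statements.

PYTHONHOME: Changes the location of the standard Python libraries (an alternative search prefix); sometimes set alongside PYTHONSTARTUP or PYTHONPATH.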
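The effect of PYTHONPATH can be inspected from inside a program. The sketch below only reads the variable and sys.path; the directory name my_modules is a hypothetical placeholder.

```python
import os
import sys

# PYTHONPATH entries (if any) are added to sys.path when the interpreter starts.
python_path = os.environ.get("PYTHONPATH", "")
entries = [p for p in python_path.split(os.pathsep) if p]
print("PYTHONPATH entries:", entries)
print("First few sys.path entries:", sys.path[:3])

# Appending to sys.path at run time has a similar effect,
# but only for the current process.
extra_dir = os.path.join(os.getcwd(), "my_modules")   # hypothetical directory
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
```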


Python Turtle Graphics

# turtle_test1.py

import turtle

wn = turtle.Screen()

alex = turtle.Turtle()

alex.forward(150)

alex.left(90)

alex.forward(75)

# turtle_test2.py
import turtle
wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.color("blue")
tess.pensize(3)
tess.forward(50)
tess.left(120)
tess.forward(50)
wn.exitonclick()

# turtle_test3.py
import turtle
wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.color("hotpink")
tess.pensize(5)
alex = turtle.Turtle()
# tess draws a triangle
tess.forward(80)
tess.left(120)
tess.forward(80)
tess.left(120)
tess.forward(80)
tess.left(120)
# tess turns around and moves away
tess.right(180)
tess.forward(80)
# alex draws a square
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
wn.exitonclick()


# turtle_test4.py
import turtle
wn = turtle.Screen()
alex = turtle.Turtle()
for i in [0, 1, 2, 3]:      # repeat 4 times
    alex.forward(50)
    alex.left(90)
wn.exitonclick()

# turtle_test5.py
import turtle
wn = turtle.Screen()
alex = turtle.Turtle()
for aColor in ["yellow", "red", "purple", "blue"]:
    alex.color(aColor)
    alex.forward(50)
    alex.left(90)
wn.exitonclick()


# turtle_test5.py
import turtle
wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.color("blue")
tess.shape("turtle")
print(range(5, 60, 2))
tess.up()                   # lift the pen: move without drawing
for size in range(5, 60, 2):
    tess.stamp()            # leave an imprint of the turtle shape
    tess.forward(size)
    tess.right(24)
wn.exitonclick()

• up()
– Stops all drawing: until down() is called, nothing is drawn on the screen, although the turtle still moves.

• stamp()
– Leaves an impression of the turtle shape on the canvas

• https://docs.python.org/2/library/turtle.html




II. Python Programming (1)

• List and Tuple

• String processing

• Dictionary

• Conditionals and loops


Overview

• Data structure
– A collection of data elements, structured in some way
– (Kinds) built-in + extensions
– Container-type data structures
  • Sequence = mapping by element position (that is, index)
  • Mapping = mapping by element name (that is, key)

• Built-in sequences
– List
– Tuples
– String
– Buffer objects (Python 2; Python 3 offers bytes, bytearray, and memoryview)
– Xrange objects (Python 2; range in Python 3)


• Main operations on sequences
– Indexing
– Slicing
– Adding
– Multiplying
– Membership testing
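Because these five operations belong to sequences in general, the same code works on a list, a tuple, and a string. The helper demo below is a hypothetical name used only for this sketch.

```python
def demo(seq):
    """Apply the five common sequence operations to any sequence."""
    return {
        "indexing": seq[0],
        "slicing": seq[1:3],
        "adding": seq + seq[:1],
        "multiplying": seq[:2] * 2,
        "membership": seq[0] in seq,
    }


for seq in ([10, 20, 30, 40], (10, 20, 30, 40), "abcd"):
    print(type(seq).__name__, demo(seq))
```

Note that slicing, adding, and multiplying always return a new object of the same type as the input.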


List

• Concept

• Key property
– “Mutable!”

• Indexing

• Slicing

• Adding, Multiplying, Membership

• List-related functions

• List methods
– Method = a function tightly coupled to some object
– object.method(arguments)


List operations (common to all sequences)

• Indexing
– The elements of a sequence are numbered by position (index) so that each one can be accessed individually

>>> greeting = 'Hello'
>>> greeting[0]
'H'
>>> greeting[-1]
'o'
>>> fourth = input('Year: ')[3]
Year: 2014
>>> fourth
'4'
>>> fourth = input("year: ")[2:3]

• # II01_index.py (next slide)


# Program: II01_index.py
# Print out a date, given year, month, and day as numbers
months = [
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
]

# A list with one ordinal ending for each day number from 1 to 31
endings = ['st', 'nd', 'rd'] + 17 * ['th'] \
        + ['st', 'nd', 'rd'] + 7 * ['th'] \
        + ['st']
year = input('Year: ')
month = input('Month (1-12): ')
day = input('Day (1-31): ')
month_number = int(month)
day_number = int(day)
# Remember to subtract 1 to get a correct index
month_name = months[month_number-1]
ordinal = day + endings[day_number-1]
print(month_name + ' ' + ordinal + ', ' + year)


• Slicing
– Accesses the elements within a given range
– Note [index1:index2]: index1 is inclusive, index2 is exclusive

>>> tag = '<a href="http://www.python.org">Python web site </a>'
>>> tag[9:30]
'http://www.python.org'
>>> tag[32:-4]
'Python web site '

# Extract the domain from a URL of the form http://www.something.com
url = input('Please enter the URL: ')
domain = url[11:-4]
print("Domain name: " + domain)


# copy
>>> numbers=[1,2,3,4,5,6,7,8]
>>> numbers[2:5]
[3, 4, 5]
>>> numbers[:]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> numbers2 = numbers[:]
>>> numbers2
[1, 2, 3, 4, 5, 6, 7, 8]
>>> numbers
[1, 2, 3, 4, 5, 6, 7, 8]

# Using a step index
>>> numbers=[1,2,3,4,5,6,7,8]
>>> numbers[0:10:1]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> numbers[0:10:2]
[1, 3, 5, 7]
>>> numbers[3:6:3]
[4]
>>> numbers[::2]
[1, 3, 5, 7]
>>> numbers[8:0:-2]
[8, 6, 4, 2]
>>> numbers[::-3]
[8, 5, 2]
>>> numbers[:5:-2]
[8]


• Adding sequences
– Both operands must be of the same type

>>> [1,2,3]+[4,5,6]
[1, 2, 3, 4, 5, 6]
>>> 'Hello ' + "world!"
'Hello world!'
>>> 'Hello ' + '2014'
'Hello 2014'
>>> 'Hello ' + 2014
Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    'Hello ' + 2014
TypeError: Can't convert 'int' object to str implicitly
>>> 'Hello ' + ''2014'
SyntaxError: invalid syntax

• Multiplication
>>> 'python' * 3
'pythonpythonpython'
>>> 'python is a ' * 3
'python is a python is a python is a '
– None, empty lists, and list initialization: a way to reserve space in advance
>>> sequence = [None] * 10
>>> sequence
[None, None, None, None, None, None, None, None, None, None]


# Program: II01_multiply.py
# Prints a sentence in a centered "box"
sentence = input("Sentence: ")
screen_width = 80
text_width = len(sentence)
box_width = text_width + 6      # two border characters plus two spaces of padding per side
left_margin = (screen_width - box_width) // 2
print()
print(' ' * left_margin + '+' + '-' * (box_width-2) + '+')
print(' ' * left_margin + '|  ' + ' ' * text_width + '  |')
print(' ' * left_margin + '|  ' + sentence + '  |')
print(' ' * left_margin + '|  ' + ' ' * text_width + '  |')
print(' ' * left_margin + '+' + '-' * (box_width-2) + '+')
print()


• Membership
>>> permission = 'rw'
>>> 'w' in permission
True
>>> 'x' in permission
False
# Used in spam filters
>>> subject = '$$$ Get rich now!! $$$'
>>> '$$$' in subject
True
>>> users = ['hky','foo','bar']
>>> input('Enter your name: ') in users
Enter your name: hky
True

# II01_membership.py
# Check a user name & PIN code
database = [
    ['albert', '1234'],
    ['dilbert', '4242'],
    ['smith', '7524'],
    ['jones', '9843']
]
username = input('User name: ')
pin = input('PIN code: ')
if [username, pin] in database:
    print('Access granted')


• Length, Minimum, Maximum

>>> numbers = [100,34,567]

>>> len(numbers)

3

>>> max(numbers)

567

>>> min(numbers)

34

>>> min(3,5,7,2)

2


The list() function

>>> list('Hello')

['H', 'e', 'l', 'l', 'o']

>>> a = list('Hello')

>>> a

['H', 'e', 'l', 'l', 'o']

>>> ''.join(a)

'Hello'


List operations

• Overview
– Indexing, slicing, concatenating, multiplying
– and, in addition …

• Modification: item assignment
>>> x = [1,1,1]
>>> x[1]
1
>>> x[1] = 2
>>> x
[1, 2, 1]

– A point to watch out for: you cannot assign to an index that does not exist

>>> x[20] = 5
Traceback (most recent call last):
  File "<pyshell#90>", line 1, in <module>
    x[20] = 5
IndexError: list assignment index out of range
>>> x[20] = 'None'
Traceback (most recent call last):
  File "<pyshell#91>", line 1, in <module>
    x[20] = 'None'
IndexError: list assignment index out of range
>>> x = [None] * 20
>>> x[20] = 5
Traceback (most recent call last):
  File "<pyshell#93>", line 1, in <module>
    x[20] = 5
IndexError: list assignment index out of range
>>> x = [None] * 21
>>> x[20] = 5
>>> x
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 5]


– del
>>> names = ['Alice', 'Beth', 'Cecil','Dee-Dee','Earl']
>>> del names[2]
>>> names
['Alice', 'Beth', 'Dee-Dee', 'Earl']

• Assigning to slices
>>> name = list('Perl')
>>> name
['P', 'e', 'r', 'l']
>>> name[2:] = list('ar')
>>> name
['P', 'e', 'a', 'r']
>>> name[2:] = list('rl')
>>> name
['P', 'e', 'r', 'l']
>>> numbers = [1,5]
>>> numbers[1:1] = [2,3,4]
>>> numbers
[1, 2, 3, 4, 5]
>>> numbers[1:3] = []
>>> numbers
[1, 4, 5]
– # effectively the same as del numbers[1:3]


List Methods

• append()

>>> lst = [1,2,3]   # mind the variable name: calling it 'list' would shadow the built-in

>>> lst.append(4)

>>> lst

[1, 2, 3, 4]

• count()

>>> ['to', 'be', 'or', 'not', 'to', 'be'].count('to')

2

>>> x =[[1,2],1,1,[2,1,[1,2]]]

>>> x.count(1)

2

>>> x.count([1,2])

1


• extend()
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a.extend(b)
>>> a
[1, 2, 3, 4, 5, 6]

– # Note: this differs from concatenation; extend modifies the sequence it is called on,
– # whereas ordinary concatenation returns a completely new sequence

>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a+b
[1, 2, 3, 4, 5, 6]
>>> a
[1, 2, 3]

– # If you really need the same effect with concatenation
>>> a = a + b
>>> a
[1, 2, 3, 4, 5, 6]

– # (Note) The effect of extend can also be achieved through slice assignment.
>>> a[len(a):] = b
>>> a
[1, 2, 3, 4, 5, 6, 4, 5, 6]


• index()
– Returns the index of the first matching item

>>> heroes = ['We', 'are', 'the', 'heroes', 'who', 'say', 'of', 'course!']
>>> heroes.index('who')
4
>>> heroes.index('knights')
Traceback (most recent call last):
  File "<pyshell#147>", line 1, in <module>
    heroes.index('knights')
ValueError: list.index(x): x not in list
>>> heroes[4]
'who'

• insert()
>>> numbers = [1,2,3,5,6,7]
>>> numbers.insert(3,'four')
>>> numbers
[1, 2, 3, 'four', 5, 6, 7]
# (as with extend) the effect of insert can also be achieved through slice assignment.
>>> numbers[4:4] = ['five']
>>> numbers
[1, 2, 3, 'four', 'five', 5, 6, 7]

• pop()
>>> x = [1,2,3]
>>> x.pop()
3
>>> x
[1, 2]
>>> x.pop(0)
1
>>> x
[2]


• remove()
>>> heroes
['We', 'are', 'the', 'heroes', 'who', 'say', 'of', 'course!']
>>> heros.remove('the')
Traceback (most recent call last):
  File "<pyshell#176>", line 1, in <module>
    heros.remove('the')
NameError: name 'heros' is not defined
>>> heroes.remove('the')
>>> heroes
['We', 'are', 'heroes', 'who', 'say', 'of', 'course!']

– (Note 1) Only the first matching item is removed
– (Note 2) This is an “in-place change”, so it cannot be undone.

• reverse()
>>> y
[1, 2, 3, 4]
>>> y.reverse()
>>> y
[4, 3, 2, 1]
>>> reversed(y)
<list_reverseiterator object at 0x00000000035CB630>
>>> y
[4, 3, 2, 1]
>>> list(reversed(y))
[1, 2, 3, 4]


• sort()
– Sorts lists ‘in place’
>>> x=[4,6,2,1,7,9]
>>> x.sort()
>>> x
[1, 2, 4, 6, 7, 9]
– # Take care when you want a sorted copy!
>>> x=[4,6,2,1,7,9]
>>> y = x.sort()
>>> y
>>> print(y)
None
– # That is, sort modifies x but returns nothing (None)
– # To get a sorted copy, do this instead...
>>> x=[4,6,2,1,7,9]
>>> y=x[:]
>>> x
[4, 6, 2, 1, 7, 9]
>>> y
[4, 6, 2, 1, 7, 9]
>>> y.sort()
>>> y
[1, 2, 4, 6, 7, 9]
– # or...
>>> x=[4,6,2,1,7,9]
>>> y=sorted(x)
>>> x
[4, 6, 2, 1, 7, 9]
>>> y
[1, 2, 4, 6, 7, 9]
– # sorted() can be used on any sequence, but it always returns a list
>>> sorted('Python')
['P', 'h', 'n', 'o', 't', 'y']


– Options for sort()

>>> heroes = ['We', 'are', 'the', 'heroes', 'who', 'say', 'of', 'course!']

>>> heroes.sort(key=len)

>>> heroes

['We', 'of', 'are', 'the', 'who', 'say', 'heroes', 'course!']

>>> heroes.sort(key=len, reverse=True)

>>> heroes

['course!', 'heroes', 'are', 'the', 'who', 'say', 'We', 'of']

– # Advanced sorting:

– https://wiki.python.org/moin/HowTo/Sorting



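Tuples

• Concept
– Immutable sequence
– Syntax: values separated by commas, optionally enclosed in parentheses, e.g. a, b or (a, b, c)

• Operations

• Methods

• Main uses
– As keys in a mapping (cf. a list cannot be a key)
– As the return value of some built-in functions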
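Both uses can be shown briefly. In the sketch below the dictionary name and the coordinates are illustrative values only; divmod() is one built-in function that returns a tuple.

```python
# A (latitude, longitude) tuple works as a dictionary key because it is immutable.
city_by_location = {
    (37.57, 126.98): "Seoul",      # illustrative coordinates
    (35.18, 129.08): "Busan",
}
print(city_by_location[(37.57, 126.98)])

# A list cannot be used as a key: it is mutable and therefore unhashable.
try:
    city_by_location[[37.57, 126.98]] = "Seoul"
except TypeError as err:
    print("TypeError:", err)

# divmod() returns a tuple, which can be unpacked directly.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)
```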


• Basic usage
>>> 1,2,3
(1, 2, 3)
>>> (1,2,3)
(1, 2, 3)
>>> ()
()
>>> 42
42
>>> 42,
(42,)
>>> (42,)
(42,)
>>> 3 * (40+2)
126
>>> 3 * (40+2,)
(42, 42, 42)

• The tuple() function
>>> x = [1,2,3]
>>> tuple(x)
(1, 2, 3)
>>> tuple('abc')
('a', 'b', 'c')
>>> tuple((1,2,3))
(1, 2, 3)
>>> y=1,2,3
>>> y[1]
2
>>> y
(1, 2, 3)
>>> y[0:2]
(1, 2)


String processing

• Overview

• Basics
– "~~", '~~', """~~""", '''~~'''
– concatenation
– input()

• Raw strings
• str() and repr()
– str(): human-readable
– repr(): a representation meant to be read by the interpreter
– Sometimes both return the same value, e.g. for numbers, lists, dictionaries
– For strings, however, the results differ clearly (in Python 2 this also holds for floating point numbers)

• Formatting; … note the string module
• Methods


• Raw string
>>> print('Hello, \nworld!')
Hello,
world!
>>> path = "c:\temp"
>>> path
'c:\temp'
>>> print(path)
c:      emp
>>> print('c:\\temp')
c:\temp
>>> path = 'c:\\Program Files\\Python\\'
>>> path
'c:\\Program Files\\Python\\'
>>> print(path)
c:\Program Files\Python\

>>> # with the r prefix, backslashes are kept as written
>>> print(r'c:\Program Files\Python')
c:\Program Files\Python
>>> print(r'Let\'s go')
Let\'s go
>>> print(r'Let's go')
SyntaxError: invalid syntax
>>> print("Let's go")
Let's go


• str() and repr()
>>> x=10 * 3.25
>>> y=200*200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print(s)
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
>>> hello = 'Hello, world\n'
>>> hellos = repr(hello)
>>> hellos
"'Hello, world\\n'"
>>> print(hellos)
'Hello, world\n'
>>> repr((x,y, ('ham','eggs')))
"(32.5, 40000, ('ham', 'eggs'))"


Formatted printing
>>> # with end='' the third column follows the second without a separating space
>>> for x in range(1,11):
...     print(repr(x).rjust(2), repr(x*x).rjust(3), end='')
...     print(repr(x*x*x).rjust(4))
...
 1   1   1
 2   4   8
…
 9  81 729
10 1001000
>>> for x in range(1,11):
...     print('{0:2d} {1:3d} {2:4d}'.format(x, x*x, x*x*x))
...
 1   1    1
 2   4    8
…
 9  81  729
10 100 1000

>>> # end='' keeps everything on one line; the values are separated by tabs
>>> for x in range(1,11):
...     print(x, "\t", end='')
...
1 2 3 4 5 6 7 8 9 10
>>> for x in range(1,11):
...     print(x, "\t")
...
1
2
3
…
8
9
10


String Methods

• find

>>> title = "Monty Python's Flying Circus"

>>> title.find('Monty')

0

>>> title.find('Python')

6

>>> title.find('database')

-1

>>> subject ='$$$ Get rich now!!! $$$'

>>> subject.find('$$$')

0

>>> subject.find('$$$',1)

20

>>> subject.find('!!!',0,16)

-1

• join
>>> seq = [1,2,3,4,5]
>>> sep = '+'
>>> seq.join(seq)
Traceback (most recent call last):
  File "<pyshell#376>", line 1, in <module>
    seq.join(seq)
AttributeError: 'list' object has no attribute 'join'
>>> seq = ['1','2','3','4','5']
>>> sep.join(seq)
'1+2+3+4+5'
>>> dirs = '', 'usr','bin','env'
>>> dirs
('', 'usr', 'bin', 'env')
>>> '/'.join(dirs)
'/usr/bin/env'
>>> print('c:'+'\\'.join(dirs))
c:\usr\bin\env


• lower
>>> """Monty Python (sometimes known as The Pythons) was a British surreal comedy group … BBC on 5 October 1969. """.lower()
"monty python (sometimes known as the pythons) was … october 1969. "
>>> if 'Gumby' in ['gumby','smith','jones']:
...     print('Found it')
...
>>> name = 'Gumby'
>>> names = ['gumby','smith','jones']
>>> if name.lower() in names:
...     print('Found it!')
...
Found it!

• replace
>>> 'This is a test'.replace('is', 'eez')
'Theez eez a test'

• split
>>> '1+2+3+4'.split('+')
['1', '2', '3', '4']
>>> '/usr/bin/env'.split('/')
['', 'usr', 'bin', 'env']
>>> 'Using the default'.split()
['Using', 'the', 'default']


• strip
>>> ' internal white space is not to be '.strip()
'internal white space is not to be'
>>> names=['gumby','smith','jones']
>>> name = 'gumby '
>>> if name in names:
...     print('Found it')
...
>>> if name.strip() in names:
...     print('Found it!')
...
Found it!
>>> '*** SPAM * for* everyone!!! ***
SyntaxError: EOL while scanning string literal
>>> '*** SPAM * for* everyone!!! ***'.strip('*!')
' SPAM * for* everyone!!! '

• translate
– (omitted)
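Since the slide skips translate, here is a minimal Python 3 sketch of how it is commonly used together with str.maketrans. The sample sentences are arbitrary.

```python
# str.maketrans builds a translation table; str.translate applies it
# character by character.
table = str.maketrans("cs", "kz")          # c -> k, s -> z
translated = "this is an incredible test".translate(table)
print(translated)

# A third argument to maketrans lists characters to delete.
no_punct = str.maketrans("", "", "!?.,")
cleaned = "Hello, world!".translate(no_punct)
print(cleaned)
```

Unlike replace, which substitutes one substring at a time, translate handles many single-character substitutions in one pass.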


Dictionary


Overview

• Concept

• Uses

• How to create one


• Basics
>>> phonebook = {'Alice':'2341', 'Beth':'9102', 'Cecil':'3258'}
>>> phonebook['Beth']
'9102'

• The dict() function
>>> items = [('name','Gumby'), ('age',42)]
>>> d = dict(items)
>>> d
{'age': 42, 'name': 'Gumby'}
>>> d['name']
'Gumby'
>>> # keyword arguments can be used as well
>>> d = dict(name='Gumby', age=42)
>>> d
{'age': 42, 'name': 'Gumby'}


• Basic operations
– len(d)
– d[k]
– d[k] = v
– del d[k]
– k in d

>>> x = []
>>> x[42] = 'Foobar'
Traceback (most recent call last):
  …
    x[42]='Foobar'
IndexError: list assignment index out of range
>>> x = {}
>>> x[42] = 'Foobar'
>>> x
{42: 'Foobar'}


# Program: II01_database.py
# A simple database
people = {
    'Alice': {
        'phone': '2341',
        'addr': 'Foo drive 23'
    },
    'Beth': {
        'phone': '9102',
        'addr': 'Bar street 42'
    },
    'Cecil': {
        'phone': '3158',
        'addr': 'Baz avenue 90'
    }
}

labels = {
    'phone': 'phone number',
    'addr': 'address'
}
name = input('Name: ')
# Are we looking for a phone number or an address?
request = input('Phone number (p) or address (a)? ')
# Use the correct key:
if request == 'p': key = 'phone'
if request == 'a': key = 'addr'
# Only try to print the information if the name is a valid key in our dictionary:
if name in people:
    print("%s's %s is %s." % (name, labels[key], people[name][key]))

Dictionary Methods

• clear

>>> x = {}

>>> y=x

>>> x['key'] = 'value'

>>> x

{'key': 'value'}

>>> y

{'key': 'value'}

>>> x={}

>>> y

{'key': 'value'}

>>> # Using clear() instead empties the dictionary in place, so y sees the change too

>>> x = {}

>>> y=x

>>> x['key']='value'

>>> y

{'key': 'value'}

>>> x.clear()

>>> x

{}

>>> y

{}


• copy
>>> x = {'username':'admin', 'machines':['foo','bar','baz']}
>>> y = x.copy()
>>> y['username'] = 'tom'
>>> y['machines'].remove('bar')
>>> y
{'username': 'tom', 'machines': ['foo', 'baz']}
>>> x
{'username': 'admin', 'machines': ['foo', 'baz']}
>>> # copy() is shallow: the inner list is shared. Use deepcopy for a full copy.
>>> from copy import deepcopy
>>> d={}
>>> d['names'] = ['Alfred','Betrand']
>>> c = d.copy()
>>> dc = deepcopy(d)
>>> d
{'names': ['Alfred', 'Betrand']}
>>> c
{'names': ['Alfred', 'Betrand']}
>>> dc
{'names': ['Alfred', 'Betrand']}
>>> d['names'].append('Clive')
>>> d
{'names': ['Alfred', 'Betrand', 'Clive']}
>>> c
{'names': ['Alfred', 'Betrand', 'Clive']}
>>> dc
{'names': ['Alfred', 'Betrand']}


• fromkeys
>>> {}.fromkeys(['name','age'])
{'age': None, 'name': None}
>>> # same effect as below
>>> dict.fromkeys(['name','age'])
{'age': None, 'name': None}
>>> # specifying a default value
>>> dict.fromkeys(['name','age'], '(unknown)')
{'age': '(unknown)', 'name': '(unknown)'}

• get
>>> d={}   # forgiving access to items
>>> d.get('name')
>>> print(d.get('name'))
None
>>> # specifying a default value
>>> d.get('name','NA')
'NA'
>>> d['name']='Eric'
>>> d.get('name')
'Eric'

# Program: II01_database2.py
# (uses the people dictionary defined in II01_database.py)
labels = {
    'phone': 'phone number',
    'addr': 'address'
}
name = input('Name: ')
request = input('Phone number (p) or address (a)? ')
key = request   # In case the request is neither 'p' nor 'a'
if request == 'p': key = 'phone'
if request == 'a': key = 'addr'
# Use get to provide default values:
person = people.get(name, {})
label = labels.get(key, key)
result = person.get(key, 'not available')
print("%s's %s is %s." % (name, label, result))


• has_key
– Equivalent to k in d; removed in Python 3
>>> # d.has_key(k)   (Python 2 only)
>>> d = {}
>>> d.has_key('name')

• items and iteritems
>>> # items() returns all the items of the dictionary as (key, value) pairs
>>> # (a list in Python 2, a view object in Python 3)
>>> # note that the order is not guaranteed
>>> d = {'title':'Python web site', 'url':'www.python.org', 'spam':0}
>>> d.items()
dict_items([('url', 'www.python.org'), ('spam', 0), ('title', 'Python web site')])
>>> # iteritems() returns an iterator; removed in Python 3
>>> it = d.iteritems()

• keys and iterkeys
>>> d.keys()
dict_keys(['url', 'spam', 'title'])

• pop
>>> d = {'x':1,'y':2}
>>> d.pop('x')
1

• popitem
>>> d = {'title':'Python web site', 'url':'www.python.org', 'spam':0}
>>> d.popitem()
('url', 'www.python.org')
>>> # note: an arbitrary item is removed, because a dictionary, unlike a list, has no notion of a 'last item'
>>> # also, dictionaries have no append method


• setdefault
>>> d2 = {}
>>> d2.setdefault('name','N/A')
'N/A'
>>> d2
{'name': 'N/A'}
>>> d2['name'] = 'Charles'
>>> d2.setdefault('name','N/A')
'Charles'
>>> d2
{'name': 'Charles'}

• update
>>> d3 = {
...     'title': 'Python web site',
...     'url': 'www.python.org',
...     'changed': 'Mar 14 22:05 MET 2014'
... }
>>> x = {'title':'Python language web site'}
>>> d3.update(x)
>>> d3
{'url': 'www.python.org', 'changed': 'Mar 14 22:05 MET 2014', 'title': 'Python language web site'}

• values and itervalues
>>> d = {}
>>> d[1] = 1
>>> d[2] = 2
>>> d[3] = 3
>>> d[4] = 1
>>> d[0] = 5
>>> d
{0: 5, 1: 1, 2: 2, 3: 3, 4: 1}
>>> d.values()
dict_values([5, 1, 2, 3, 1])


Conditionals and Loops

• Defining blocks
– Indentation
– 4 spaces per level
– Conditions: Boolean values

• Conditional statements
– if ~ elif ~ else
– assertion

• Loops
– while loop, for loop
– Various ways to iterate …

• List Comprehension
• Supplement (1)
– pass, del, exec, eval

• Supplement (2)
– Sequence unpacking, …


Conditions: Boolean values

• False
– False, None, 0, "", {}, (), []

• True
– True, and everything else that is not false

• How it works internally: True=1, False=0
>>> True
True
>>> False
False
>>> True==1
True
>>> False==0
True
>>> True+False+50
51

• True and False belong to the bool type
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
>>> bool('')
False
>>> bool(0)
False


Conditional execution

• if, elif, else
• Nested blocks

var = 100
if var < 200:
    print("Expression value is less than 200")
    if var == 150:
        print("Which is 150")
    elif var == 100:
        print("Which is 100")
    elif var == 50:
        print("Which is 50")
    elif var < 50:
        print("Expression value is less than 50")
else:
    print("Could not find true expression")
print("Good bye!")

• Comparison operators
– ==
– <, >
– <=, >=
– !=
– is and is not
– in and not in

• Equality operator (==) vs. identity operator (is)
– Keep the two apart: identity vs. equality
>>> x = y = [1,2,3]
>>> z = [1,2,3]
>>> x==y
True
>>> x==z
True
>>> x is y
True
>>> x is z
False


• Membership operator
– in
>>> name = input('your name? ')
your name? hk yoon
>>> if 'h' in name:
...     print('Your name contains the character "h"')
...
Your name contains the character "h"

• String comparison
>>> 'alpha' < 'beta'
True
>>> # related to this
>>> 'FnOrD'.lower() == 'fnOrd'.lower()
True
>>> [1,2] < [2,1]
True

• Boolean operators
number = input('Enter a number between 1~10: ')
if int(number) <= 10 and int(number) >= 1:
    print('Well done!')
else:
    print('Watch out the range!')
– Short-circuit logic
– …
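Short-circuit logic means that and and or stop evaluating as soon as the result is decided. The sketch below makes this visible by recording which operands were evaluated; check is a hypothetical helper written for this illustration.

```python
def check(label, value, calls):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value


calls = []
result_and = check("a", False, calls) and check("b", True, calls)
# 'b' is never evaluated: a false left operand already decides an 'and'
print(result_and, calls)

calls_or = []
result_or = check("a", True, calls_or) or check("b", False, calls_or)
# 'b' is never evaluated: a true left operand already decides an 'or'
print(result_or, calls_or)

# Common idiom: 'or' returns the first truthy operand, which gives a default value.
name = "" or "<unknown>"
print(name)
```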


Loop

• while loop
>>> x = 1
>>> while x <= 100:
...     print(x)
...     x += 1

# Program: II02_whiletest01.py
name = ''
while not name:
    name = input('You have to enter name: ')
print('Hello, %s' % name)
# Caution: input consisting only of spaces is still accepted

# Program: II02_whiletest02.py
name = ''
while not name or name.isspace():   # or: while not name.strip()
    name = input('You have to enter name: ')
print('Hello, %s' % name)

• for loop
– Runs a code block for each element of a set (or sequence, or other iterable object)
– iterable object = an object that can be iterated over

>>> words = ['this','is','a','wonder']
>>> for word in words:
...     print(word)
...
this
is
a
wonder


• Range
>>> range(1,10,3)
range(1, 10, 3)
>>> print(range(1,10,3))
range(1, 10, 3)
>>> list(range(1,10,3))
[1, 4, 7]
>>> for number in range(1,10,3):
...     print(number)
...
1
4
7
– # xrange from Python 2.x was folded into range in 3.x

• Iterating over a dictionary
>>> d = {'x':1, 'y':2, 'z':3}
>>> for key in d:
...     print(key, 'corresponds to', d[key])
...
y corresponds to 2
x corresponds to 1
z corresponds to 3
# equivalent to the following
>>> for key, value in d.items():
...     print(key, 'corresponds to', value)


• Iteration utilities
– Parallel iteration
>>> names = ['anne','beth','george','damon']
>>> ages = [12,30,35,55]
>>> for i in range(len(names)):
...     print(names[i], 'is ', ages[i], 'years old')
...
anne is  12 years old
beth is  30 years old
george is  35 years old
damon is  55 years old
>>> # zip
>>> list(zip(names,ages))
[('anne', 12), ('beth', 30), ('george', 35), ('damon', 55)]
– Numbered iteration
# enumerate() makes the index available.
>>> for i,v in enumerate(['tic','tac','toe']):
...     print(i,v)
...
0 tic
1 tac
2 toe

– Reversed & sorted iteration
# reversed() and sorted() are similar to reverse() and sort(), but
# they work on any sequence or iterable object, and
# they return a reversed or sorted version instead of changing the object in place
>>> sorted([4,3,7,6,9])
[3, 4, 6, 7, 9]
>>> sorted('hello, world!')
[' ', '!', ',', 'd', 'e', 'h', 'l', 'l', 'l', 'o', 'o', 'r', 'w']
>>> list(reversed('hello'))
['o', 'l', 'l', 'e', 'h']
>>> ''.join(reversed('hello,world'))
'dlrow,olleh'


• Ways to leave a loop
– break
from math import sqrt
for n in range(99,0,-1):
    root = sqrt(n)
    if root == int(root):
        print(n)
        break
– continue
– example omitted

– while True/break idiom
##word = input('Enter a word: ')
##while word:
##    # do something
##    print('The word was ' + word)
##    word = input('Enter a word: ')
while True:
    word = input('Enter a word: ')
    if not word: break
    # do something
    print('The word was ' + word)
– while True loops forever, so the exit is handled with an if/break condition inside the loop body
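The continue statement, for which the slide gives no example, skips the rest of the current iteration and moves on to the next one. A minimal sketch with a hypothetical input list:

```python
words = ["spam", "", "eggs", "   ", "ham"]   # hypothetical input
kept = []
for word in words:
    if not word.strip():
        continue          # skip blank entries and go on to the next item
    kept.append(word.upper())
print(kept)
```

In contrast, break would leave the loop entirely at the first blank entry.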


List Comprehension

• Concept
– Concise yet powerful!
>>> [x*x for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> # only the numbers divisible by 3?
>>> [x*x for x in range(10) if x % 3==0]
[0, 9, 36, 81]
>>> [(x,y) for x in range(3) for y in range(2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

– ## The equivalent program below is more verbose
result = []
for x in range(3):
    for y in range(2):
        result.append((x,y))

>>> girls = ['alice','bernice','clarice']
>>> boys = ['chris','arnold','bob']
>>> [b+':' for b in boys for g in girls if b[0]==g[0]]
['chris:', 'arnold:', 'bob:']
>>> [b+':'+g for b in boys for g in girls if b[0]==g[0]]
['chris:clarice', 'arnold:alice', 'bob:bernice']
– # Improvement: build a dictionary keyed by the first letter of each girl's name, with the names as values; this avoids the Cartesian product

>>> grouping = {}
>>> for girl in girls:
...     grouping.setdefault(girl[0],[]).append(girl)
...
>>> print([b+':'+g for b in boys for g in grouping[b[0]]])
['chris:clarice', 'arnold:alice', 'bob:bernice']


Supplement (1): pass, del, exec, eval

• pass: do nothing
if name == 'park':
    print('welcome')
elif name == 'bush':
    pass
else:
    print('we are waiting')

• del: delete
>>> x = 1
>>> x
1
>>> del x
>>> x
Traceback (most recent call last):
  File "<pyshell#925>", line 1, in <module>
    x
NameError: name 'x' is not defined

>>> x = ['hello','world']
>>> y = x
>>> y[1]
'world'
>>> y[1] = 'python'
>>> x
['hello', 'python']
>>> del x
>>> x
Traceback (most recent call last):
  File "<pyshell#934>", line 1, in <module>
    x
NameError: name 'x' is not defined
>>> y
['hello', 'python']
– # Why? del removes only the name; the list (the value) still exists. Values cannot be deleted directly: the Python interpreter reclaims them once nothing refers to them.


• exec()
– Executes a series of Python statements
– …

• eval()
– Evaluates a Python expression and returns the resulting value

>>> eval(input("Enter an arithmetic expression: "))
Enter an arithmetic expression: 5 + 3*2
11

– # Caution: be aware of the security implications
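One common way to limit the risk is to give exec and eval their own namespace dictionary, and to use ast.literal_eval when only literal values are expected. The sketch below assumes the input strings shown; it is an illustration, not a complete sandbox.

```python
import ast

# exec runs statements; passing a dictionary keeps the names it creates
# out of the current namespace.
scope = {}
exec("x = 2\ny = x ** 3", scope)
print(scope["y"])

# eval evaluates a single expression, here in the same dictionary.
total = eval("x + y", scope)
print(total)

# For untrusted input, ast.literal_eval accepts only literals
# (numbers, strings, tuples, lists, dicts, sets, booleans, None).
numbers = ast.literal_eval("[1, 2, 3]")
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True       # function calls are not literals
print(numbers, rejected)
```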


Supplement (2): Sequence unpacking, …

• Sequence unpacking
>>> x, y, z = 1,2,3
>>> print(x,y,z)
1 2 3
>>> x,y = y,x
>>> x,y
(2, 1)
>>> x,y,z
(2, 1, 3)
>>> x,y = y,z
>>> x,y,z
(1, 3, 3)
>>> x,y,z = z,x,y
>>> x,y,z
(3, 1, 3)

• Chained assignment and augmented assignment
x = y = somefunction()
>>> x = 2
>>> x += 3
>>> x *= 5
>>> x
25
>>> sname = 'foo'
>>> sname += 'bar'
>>> sname *= 3
>>> sname
'foobarfoobarfoobar'


• assert

– Surfaces potential problems early

– Used as a kind of checkpoint

>>> age =10

>>> assert 0<age<100

>>> age =-5

>>> assert 0<age<100



assert 0<age<100

AssertionError


Module

• Concept
– A program file (Python, C, C++, …)
– Role:
  • …
  • prevents name clashes by means of namespaces
– Namespace = a kind of dictionary of identifiers

• Writing and using modules
– Writing a module
– The import statement
  • after import, names must be qualified with the module name
– Search path
  • needs care
  • install inside the Python path, modify sys.path, set PYTHONPATH, or write a .pth file
– Private names


• Scoping rule
– Lookup order:
– Local > Global > Built-in
  • locals()
  • globals()
  • dir(__builtins__)


• # Writing and using a module

>>> import II02_mymath

>>> II02_mymath.area(2)

12.56636

>>> II02_mymath.pi

3.14159

>>> II02_mymath.area(5)

78.53975

>>> from II02_mymath import area

>>> area(10)

314.159

• # Module search path

>>> import sys

>>> sys.path

['C:/Python31/DBnet', 'C:\\Python31\\Lib\\idlelib', 'C:\\Windows\\system32\\python31.zip', 'C:\\Python31\\DLLs', 'C:\\Python31\\lib', 'C:\\Python31\\lib\\plat-win', 'C:\\Python31', 'C:\\Python31\\lib\\site-packages']
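The contents of II02_mymath.py are not shown in the slides. As a rough sketch, assuming the module simply defines pi and area(r), the following writes a comparable module (under the hypothetical name mymath_demo) into a temporary directory, adds that directory to sys.path, and imports it:

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module contents; the original II02_mymath.py is not shown in the slides.
MODULE_SOURCE = '''
pi = 3.14159

def area(r):
    """Area of a circle with radius r."""
    return pi * r * r
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    pathlib.Path(tmp_dir, "mymath_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)           # make the directory importable
    importlib.invalidate_caches()         # make sure the new file is noticed
    try:
        mymath_demo = importlib.import_module("mymath_demo")
        print(mymath_demo.pi)             # qualified access after import
        print(mymath_demo.area(2))
    finally:
        sys.path.remove(tmp_dir)          # restore the search path
```

Changing sys.path at run time is only one of the options listed above; setting PYTHONPATH or adding a .pth file achieves the same without touching the code.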



• Functions

• Files & I/O

• OOP

• Exception handling


Functions

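• Concept – What is a function? One level of abstraction
  • functions, classes, design patterns, …

• Writing a function – the def statement

• Function parameters
• Scoping
  • Local – visible only inside the function
  • Nonlocal – a previously bound variable in the closest enclosing scope
  • Global – defined outside the function; to rebind it, declare it with global inside the function (reading it needs no declaration)

• Recursion
• Lambda expression
– A small anonymous function defined in-line (a single expression, with no return statement)

• Generator functions – define your own iterator – use the yield statement

• Decorator functions – functions are first-class objects too: they can be assigned to variables or passed as parameters.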


Writing a function

• def – defines a function
>>> def hello(name):
        return 'Hello, ' + name + '!'
>>> hello('world')
'Hello, world!'
>>> def fibs(num):
        result = [0,1]
        for i in range(num-2):
            result.append(result[-2]+result[-1])
        return result
>>> fibs(10)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
>>> fibs(20)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]

• Documenting
>>> def square(x):
        'Compute the square of x'
        return x*x
>>> square(2)
4
>>> square(7.5)
56.25
>>> square.__doc__
'Compute the square of x'
>>> help(square)
Help on function square in module __main__:
square(x)
    Compute the square of x


Parameter

• Terminology – at definition: formal parameter – at call: actual parameter = argument

• Local variables
– A local variable is usable only in its local scope (the function in which it is assigned)
– Assigning to a parameter inside a function rebinds only the local name, so the caller's variable is unaffected; in-place changes to a mutable argument (e.g. a list) are visible to the caller
>>> def inc(x):
        return x+1
>>> inc(10)
11
>>> inc(11)
12
>>> x =20
>>> inc(x)
21

>>> def try_to_change(n):
        n="Me"
>>> name='You'
>>> try_to_change(name)
>>> name
'You'
>>> def change(n):
        n[0] = "Me"
>>> names = ["You","Her","Him"]
>>> change(names)
>>> names
['Me', 'Her', 'Him']


• Note: when a function changes the object passed as a parameter, what happens depends on whether that object is immutable
>>> def inc(x):
        return x+1
>>> inc(10)
11
>>> inc(11)
12
>>> x =20
>>> inc(x)
21
>>>
>>> y=[10]
>>> inc(y)
Traceback (most recent call last):
  File "<pyshell#147>", line 1, in <module>
  …
TypeError: can only concatenate list (not "int") to list
>>> def inc2(x):
        x[0] = x[0]+1
>>> inc2(y)
>>> y
[11]


• Positional parameter, keyword parameter, default parameter
>>> def hello1(greeting, name):
        print('%s, %s' % (greeting, name))
>>> def hello2(greeting, name):
        print('%s, %s' % (name, greeting))
>>> hello1('Hi', 'Seoul')
Hi, Seoul
>>> hello2('Hi', 'Seoul')
Seoul, Hi

>>> hello1(name='hk', greeting="good morning")
good morning, hk
>>> def hello3(greeting="Good day", name="everybody"):
        print('%s, %s' % (greeting, name))
>>> hello3()
Good day, everybody
>>> hello3('Wonderful', 'universe')
Wonderful, universe
– Note: when several kinds of parameters are used together, the positional ones must come first


• Variable number of parameters
>>> def print_params(*params):
        print(params)

>>> print_params(1)

(1,)

>>> print_params(2,3,4)

(2, 3, 4)

>>> print_params('hk', 'john','obama')

('hk', 'john', 'obama')

>>>

>>> def print_params2(title, *params):

'parameter: title is required, but you can add others'

print(title)

print(params)

>>> print_params2('Nice weather', 'yesterday','today','tomorrow')

Nice weather

('yesterday', 'today', 'tomorrow')

>>> help(print_params2)

…


• Collecting any number of keyword parameters
>>> def print_params3(**params):
        print(params)
>>> print_params3(name='kim', age=35, gender='male')
{'gender': 'male', 'age': 35, 'name': 'kim'}

• Putting it together
>>> def print_params(x, y, z=3, *position_par, **keyword_par):
        print(x,y,z)
        print(position_par)
        print(keyword_par)
>>> print_params(1,5,10, 'my function', hobby1='game', hobby2='climbing')
1 5 10
('my function',)
{'hobby2': 'climbing', 'hobby1': 'game'}


>>> storage={}
>>> storage['firstname']={}
>>> storage['middlename']={}
>>> storage['lastname']={}
>>> me1 ='wolfgang amadeus mozart'
>>> me2 = 'john f kennedy'
>>> storage['firstname']['wolfgang']=me1
>>> storage['middlename']['amadeus']=me1
>>> storage['lastname']['mozart']=me1
>>> storage['firstname']['john'] = me2
>>> storage['middlename']['f'] = me2
>>> storage['lastname']['kennedy'] = me2

>>> storage
{'middlename': {'amadeus': 'wolfgang amadeus mozart', 'f': 'john f kennedy'}, 'lastname': {'kennedy': 'john f kennedy', 'mozart': 'wolfgang amadeus mozart'}, 'firstname': {'wolfgang': 'wolfgang amadeus mozart', 'john': 'john f kennedy'}}
>>>
>>> def lookup(data, label, name):
        return data[label].get(name)
>>> lookup(storage, 'lastname','kennedy')
'john f kennedy'
>>>
>>> def init(data):
        data['firstname']={}
        data['middlename']={}
        data['lastname']={}


>>> def store(data, fullname):
        names = fullname.split()
        if len(names) ==2:
            names.insert(1, '')
        labels = 'firstname','middlename','lastname'
        for label, name in zip(labels, names):
            people = lookup(data, label, name)
            if people:
                people.append(fullname)
            else:
                data[label][name] = [fullname]

>>> def store2(data, *fullnames):
        for fullname in fullnames:
            names = fullname.split()
            if len(names) ==2:
                names.insert(1,'')
            labels = 'firstname','middlename','lastname'
            for label, name in zip(labels,names):
                people=lookup(data, label,name)
                if people:
                    people.append(fullname)
                else:
                    data[label][name] =[fullname]
>>> store(storage, 'chol su kim')
>>> lookup(storage, 'lastname','kim')
['chol su kim']
>>> store(storage, 'ki moon ban')
>>> lookup(storage,'firstname','ki')
['ki moon ban']


>>> def add(x, y):
        return x + y
>>> params=(1,2)
>>> add(*params)
3
>>>
>>> def hello3(greeting="Good day", name="everybody"):
        print('%s, %s' % (greeting, name))
>>> params2={'name':'Dan gun', 'greeting':'father of father'}
>>> hello3(**params2)
father of father, Dan gun
>>> # Using * (or **) both when you define and when you call the function just passes the tuple (or dictionary) through

>>> def with_stars(**keywords):
        print(keywords['name'], 'is ', keywords['age'], 'years old')
>>> def without_stars(keywords):
        print(keywords['name'], 'is ', keywords['age'], 'years old')
>>> arguments = {'name':'Mr. Knowall','age':40}
>>> with_stars(**arguments)
Mr. Knowall is 40 years old
>>> without_stars(arguments)
Mr. Knowall is 40 years old
# So, * (stars) are really useful only if you use them either
# when defining a function (to allow a varying number of arguments)
# or when calling a function (to "splice in" a dictionary or a sequence)


• It may be useful to use these splicing operators to ‘pass through’ parameters, without worrying too much about how many there are, and so forth.

>>> def foo(x,y,z, m=0, n=0):
        print(x,y,z,m,n)

>>> def call_foo(*args, **kwds):
        print('Calling foo! ...')
        foo(*args, **kwds)

• # Putting it together

>>> def story(**kwds):
        return 'Once upon a time, there was a ' \
               '%(job)s called %(name)s. ' % kwds

>>> def power(x,y, *others):
        if others:
            print('Received redundant parameters: ', others)
        return pow(x,y)

>>> def interval(start, stop=None, step=1):
        'Imitates range() for step >0'
        if stop is None:
            start, stop=0, start
        result=[]
        i = start
        while i < stop:
            result.append(i)
            i += step
        return result


>>> story(job='king', name='Arthur')

'Once upon a time, there was a king called Arthur. '

>>> story(name = 'robinhood', job='righteous outlaw')

'Once upon a time, there was a righteous outlaw called robinhood. '

>>> params = {'job':'language','name':'python'}

>>> params1 = {'job':'language','name':'python'}

>>> story(**params1)

'Once upon a time, there was a language called python. '

>>> del params['job']

>>> params1

{'job': 'language', 'name': 'python'}

>>> params

{'name': 'python'}

>>> del params1['job']

>>> params1

{'name': 'python'}

>>> story(job='miracle in our age', **params)

'Once upon a time, … called python. '

>>>

>>> power(2,4)

16

>>> params2 = (5,) *2

>>> params2

(5, 5)

>>> power(*params2)

3125

>>> 5*5*5*5*5

3125

>>> power(3,3,'Hello, world')

Received redundant parameters: ('Hello, world',)

27

>>> interval(10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> interval(1,5)

[1, 2, 3, 4]

>>> interval(3,12,4)

[3, 7, 11]

>>> power(*interval(3,7))

Received redundant parameters: (5, 6)

81


• Scoping – What is a namespace?
  • the place where names (of variables, etc.) live
  • a kind of invisible dictionary = scope

>>> x=1
>>> scope = vars()
>>> scope
{'square': <function square at 0x00000000033309C8>, 'print_params': <function print_params at 0x00000000033982C8>, 'call_foo': ... 'y': [11], 'x': 1, 'z': (10,), 'scope': {...}}
>>> scope['x']
1
>>> scope['x']+1
2
>>> x
1
>>> scope['x']+=1
>>> x
2

– Local, nonlocal and global variables
  • Local – visible only inside the function
  • Nonlocal – a variable in the closest enclosing scope
  • Global – exists outside the function

– Rebinding a global variable
  • making it refer to some new value
  • a variable assigned inside a function automatically becomes local (this can be overridden with global or nonlocal)


# http://www.python-course.eu/python3_global_vs_local_variables.php
>>> def f():
        print(s)
>>> s = "I love Paris in the summer!"
>>> f()
I love Paris in the summer!
>>>
>>> def f():
        s = "I love London!"
        print(s)
>>> s = "I love Paris!"
>>> f()
I love London!
>>> print(s)
I love Paris!
>>>
>>> def f():
        print(s)
        s = "I love London!"
        print(s)

>>> s = "I love Paris!"
>>> f()
Traceback (most recent call last):
  File "<pyshell#526>", line 1, in <module>
    f()
  File "<pyshell#524>", line 2, in f
    print(s)
UnboundLocalError: local variable 's' referenced before assignment
>>> def f():
        global s
        print(s)
        s = "Only in spring, but London is great as well!"
        print(s)
>>> s = "I am looking for a course in Paris!"
>>> f()
I am looking for a course in Paris!
Only in spring, but London is great as well!
>>> print(s)
Only in spring, but London is great as well!
>>> s = "new sentence"
>>> print(s)
new sentence







>>> def f():
        s = "I am globally not known"
        print(s)
>>> f()
I am globally not known
>>> print(s)
new sentence
>>> def f():
        s1 = "I am globally not known"
        print(s1)
>>> f()
I am globally not known
>>> print(s1)
Traceback (most recent call last):
  File "<pyshell#546>", line 1, in <module>
    print(s1)
NameError: name 's1' is not defined
------------
# save following as: scopetest01.py
def f():
    s = "I am globally not known"
    print(s)
f()
print(s)
-------------

c:\Python31>python scopetest01.py
I am globally not known
Traceback (most recent call last):
  File "ex.py", line 6, in <module>
    print(s)
NameError: name 's' is not defined
c:\Python31>
-------------
>>> def foo(x, y):
        global a
        a = 42
        x,y = y,x
        b = 33
        b = 17
        c = 100
        print(a,b,x,y)
>>> a,b,x,y = 1,15,3,4
>>> foo(17,4)
42 17 4 17
>>> print(a,b,x,y)
42 15 3 4


• Nested scope: a function inside a function – e.g. using one function to create another.
>>> def multiplier(factor):
        def multiplyByFactor(number):
            return number * factor
        return multiplyByFactor

# The outer function returns the inner function itself; it does not call it
# Key point: the returned function still has access to the scope where it was defined
# i.e. it carries its environment (and the associated local variables) with it!
>>> double = multiplier(2)
>>> double(5)
10
>>> triple=multiplier(3)
>>> triple(3)
9
>>> multiplier(5)(4)
20
# A function such as multiplyByFactor, which stores its enclosing scope, is called a closure
# Normally a variable in an outer scope cannot be rebound.
# In Python 3, however, 'nonlocal' lets you assign to a variable in an outer (but non-global) scope.
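The remark about nonlocal can be illustrated with a small counter. This is a minimal sketch rather than material from the slides; make_counter and increment are hypothetical names.

```python
def make_counter():
    count = 0                  # lives in the enclosing (outer) scope

    def increment():
        nonlocal count         # rebind the outer variable instead of creating a local
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())   # first call
print(counter())   # second call continues from the stored value
```

Without the nonlocal line, count += 1 would treat count as a new local variable and raise UnboundLocalError, the same error shown in the global/local examples earlier.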


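• Recursive functions

>>> def recursion():
        return recursion()    # no base case, so this never terminates normally

# Example 1: iterative version
>>> def factorial(n):
        result=n
        for i in range(1,n):
            result *= i
        return result

# Recursive version
# Definition of factorial: (a) the factorial of 1 is 1
# (b) for n greater than 1, the factorial of n is n x factorial(n-1)
>>> def factorial(n):
        if n==1:
            return 1
        else:
            return n * factorial(n-1)

>>> factorial(10)
3628800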

# Example 2: iterative version
>>> def power(x,n):
        result=1
        for i in range(n):
            result *= x
        return result

>>> # Definition of power
>>> # power(x,0)= 1
>>> # power(x,n) for n >0 = x * power(x, n-1)
>>> def power(x,n):
        if n==0:
            return 1
        else:
            return x * power(x, n-1)


• Lambda functions
>>> t2 = {'FtoK': lambda deg_f: 273 + (deg_f -32) * 5 / 9,
          'CtoK': lambda deg_c: 273 + deg_c}
>>> t2['FtoK'](32)
273.0

• Generator functions
>>> def four():
        x = 0
        while x <4:
            print("in generator, x = ", x)
            yield x
            x+=1

>>> for i in four():
        print(i)

in generator, x = 0

0

in generator, x = 1

1

in generator, x = 2

2

in generator, x = 3

3

>>> 2 in four()

in generator, x = 0

in generator, x = 1

in generator, x = 2

True
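Because a generator produces values only on demand, it can even describe an endless sequence. The sketch below (fib_gen is a hypothetical name, not from the slides) yields Fibonacci numbers indefinitely and uses itertools.islice to take just the first ten:

```python
from itertools import islice

def fib_gen():
    """Yield Fibonacci numbers indefinitely, one at a time."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fib_gen(), 10))
print(first_ten)
```

The result matches fibs(10) from the earlier list-based version, but no list longer than what is requested is ever built.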


• Decorators
– Wrapping function (= decorator)
– +
– Wrapped function (written first, then passed as the argument to the wrapping function)

• Syntax: @decorate on the line directly above the function definition


• Decorators
>>> def decorate(func):
        print("in decorate function, decorating", func.__name__)
        def wrapper_func(*args):
            print("Executing", func.__name__)
            return func(*args)
        return wrapper_func

>>> def myfunction(parameter):
        print(parameter)

>>> myfunction = decorate(myfunction)
in decorate function, decorating myfunction
>>> myfunction("hello")
Executing myfunction
hello
>>>
>>> @decorate
    def myfunction(parameter):
        print(parameter)

in decorate function, decorating myfunction
>>> myfunction('hello')
Executing myfunction
hello
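One practical detail the example above leaves out: after decoration, myfunction.__name__ reports wrapper_func. A common remedy, sketched here as an optional refinement rather than the slides' own code, is functools.wraps, which copies the wrapped function's name and docstring onto the wrapper. The greet function is a hypothetical example.

```python
import functools

def decorate(func):
    @functools.wraps(func)                 # copy __name__ and __doc__ onto the wrapper
    def wrapper_func(*args, **kwargs):     # accept keyword arguments as well
        print("Executing", func.__name__)
        return func(*args, **kwargs)
    return wrapper_func

@decorate
def greet(name, punctuation="!"):
    """Return a greeting."""
    return "hello, " + name + punctuation

print(greet("world", punctuation="?"))
print(greet.__name__)
```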


• Another classic: Binary Search

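– (omitted)

• Throwing functions around
– In Python a function is just another object, so it can be
  • assigned to another variable
  • passed as a parameter
  • returned as the result of another function

– Related special functions:
  • map(), filter(), reduce(), …

– (to be supplemented)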
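As a small supplement for map(), filter() and reduce(), here is a minimal sketch on a hypothetical list of numbers. In Python 3, map() and filter() return lazy iterators (hence the list() calls), and reduce() lives in the functools module.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

squares = list(map(lambda n: n * n, numbers))         # apply a function to every item
evens = list(filter(lambda n: n % 2 == 0, numbers))   # keep items for which the test is true
total = reduce(lambda acc, n: acc + n, numbers, 0)    # fold all items into one value

print(squares, evens, total)
```

The first two are often written as list comprehensions instead, e.g. [n * n for n in numbers].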


Files and I/O


• File – data stored in a defined format.

• Opening a file: file object = open(file_name [, access_mode][, buffering])

– File mode

– Buffering

• File-related methods – Read, Write

– Pipe

– Read, Write Lines

– Closing a file

– Key methods

• Iterating over file contents (omitted) – by byte, by line, or all at once

– File Iterator


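Opening a file

• File mode – the mode argument of open()
  • Binary mode sidesteps problems such as the automatic conversion of \r\n line endings on MS Windows

• Buffering – uses memory instead of going to the disk on every access
  • 0 (False): unbuffered (in Python 3, allowed only in binary mode)
  • 1 (True): buffered (line-buffered in text mode)
  • a larger number gives the buffer size (-1 means the default buffer size)

• Standard streams – sys module
– sys.stdin, sys.stdout, sys.stderr

• File-like objects – Streams
  • io module
  • (three kinds of I/O)
  • text I/O, binary I/O, raw I/O.
  • each can sit on top of various kinds of storage; the concrete objects are the streams
– urllib.request.urlopen
– …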

Value   Description
'r'     Read
'w'     Write
'a'     Append
'b'     Binary (added to another mode)
'+'     Read/write (added to another mode)


File-related methods

• Reading and writing a file

>>> f = open('somefile.txt','w')

>>> f.write('hello, ')

7

>>> f.write('World')

5

>>> f.close()

>>>

>>> f=open('somefile.txt','r')

>>> f.read(4)

'hell'

>>> f.read()

'o, World'

• # Piping

– Linux/Unix

– # cat somefile.txt | python somescript.py | sort

• stdin - ### II02_wordcount01.py ???

• Read, Write Lines

– readlines()

– writelines()

• Closing a file – close()


>>> f2 = open(r'c:\Python31\DBnet\somefile2.txt')

>>> f2.readline()

"Pro-democracy demonstrators seized … after a night of scuffles.\n"

>>> f2.readline()

"Spurred on by police … according to Hong Kong police."

>>> f2.readlines()

[]

>>> f2.seek(0,0)

0

>>> f2.readlines()

["Pro-democracy demonstrators seized … to Hong Kong police."]

>>> f2.close()

>>> f3 = open(r'c:\Python31\DBnet\somefile2.txt', 'w')

>>> f3.write('this is a new line')

18

>>> f3.close()

>>> f3 = open(r'c:\Python31\DBnet\somefile2.txt', 'r')

>>> f3.readlines()

['this is a new line']
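The sessions above call close() by hand. A commonly used alternative, shown here as a sketch with a hypothetical temporary file name, is the with statement, which closes the file automatically even if an error occurs inside the block:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "with_demo.txt")   # hypothetical file name

with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])   # writelines() adds no newlines itself

with open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()

print(lines)
print(f.closed)    # the file was closed when the with block ended
os.remove(path)    # clean up the demo file
```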


# urllib
>>> import urllib.request
>>> url = 'http://www.openwith.net'
>>> urllib.request.urlretrieve(url, 'somefile.txt')
('somefile.txt', <http.client.HTTPMessage object at 0x0000000003673E48>)
# (open 'somefile.txt' to inspect the downloaded page)
>>> response = urllib.request.urlopen(url)
>>> html = response.read()
>>> html
b'<!DOCTYPE html>\n<html lang="ko-KR">\n<head>\n<meta charset="UTF-8" />\n<title>\x…


Iterating over file contents (omitted)

• By byte, by line, or all at once

• File Iterator
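Although the slides omit this topic, the three styles of iteration named above can be sketched briefly. The file name and contents below are hypothetical, and the byte-wise style is shown with small fixed-size chunks:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "iter_demo.txt")   # hypothetical file name
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("alpha\nbeta\ngamma\n")

# Line by line: the file object is its own iterator.
with open(path, encoding="utf-8") as f:
    line_lengths = [len(line.rstrip("\n")) for line in f]

# Fixed-size chunks of bytes, read in binary mode.
chunks = []
with open(path, "rb") as f:
    while True:
        chunk = f.read(4)
        if not chunk:          # an empty result means end of file
            break
        chunks.append(chunk)

# All at once.
with open(path, encoding="utf-8") as f:
    whole = f.read()

os.remove(path)
print(line_lengths, len(chunks), repr(whole))
```

Iterating over the file object is usually preferred for large files, since only one line is held in memory at a time.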


OOP

• Overview – Class and Object

– Characteristics

• Class and Type – Creating a class

– Attributes and methods

– Class namespace

– Superclass and subclass

• OOD


Overview

• Class and Object – Class = a user-defined prototype for an object

• Characteristics
– Encapsulation
  • hiding unnecessary detail

– Polymorphism
  • the work performed changes automatically according to the object's type (without the user needing to know)

– Inheritance
  • (Super)class and subclass, i.e. parent and child

• Note – class vs. type


# polymorphism
>>> 2+2
4
>>> 'good' + 'morning'
'goodmorning'
>>> 'good ' + 'morning'
'good morning'
>>> add(1,2)
3
>>> add('good', ' morning')
'good morning'
>>> def length_of_message(n):
        print('The length of ', repr(n), 'is', len(n))
>>> length_of_message('good morning')
The length of 'good morning' is 12
>>> length_of_message([3,5,7])
The length of [3, 5, 7] is 3


Class and Type

• Creating a class – Constructor – initializer

• Attributes and methods
– Attribute = a variable that expresses a property of the object
– Method = a function bound to an object = a special kind of function that is defined in a class definition.
– Operator overloading: giving one operator different implementations for different types

• Class namespace
– Private: (an attribute or method) made visible only inside the object; it can then be used only through accessor methods

• Superclass and subclass
– class SubClassName (ParentClass1[, ParentClass2, ...]):
      'Optional class documentation string'
      class_suite

• Interface and introspection


• Built-in class attributes

• Sameness (identity) vs. '==' (equality)
• Copying
– With aliasing, a change made through one name also shows up through the other, which can be confusing.
– Copying is one alternative: import copy, then copy.copy()

Class attribute   Description
__dict__          Dictionary containing the class's namespace.
__doc__           Class documentation string or None if undefined.
__name__          Class name
__module__        Module name in which the class is defined. This attribute is "__main__" in interactive mode.
__bases__         A possibly empty tuple containing the base classes, in the order of their occurrence in the base class list.
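The difference between aliasing and copying mentioned above can be shown on a nested list. This is a minimal sketch with hypothetical data; it also includes copy.deepcopy, which the slide does not mention, to contrast it with the shallow copy.copy:

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                   # same object under a second name
shallow = copy.copy(original)      # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)     # fully independent copy

original[0][0] = 99                # change an inner list in place
original.append([5, 6])            # change the outer list

print(alias is original)           # identity: both names refer to one object
print(shallow)                     # sees the inner change, not the append
print(deep)                        # sees neither change
```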


class Person:
    def setName(self, name):
        self.name = name
    def getName(self):
        return self.name
    def greet(self):
        print("Hello, I am %s." % self.name)

>>> person1 = Person()
>>> person2 = Person()
>>> person1.setName('Ki Chul Kim')
>>> person2.setName('Young Hee Park')
>>> person1.greet()
Hello, I am Ki Chul Kim.
>>> person2.greet()
Hello, I am Young Hee Park.


# Private
class PrivateTest():
    def __inaccessible(self):
        print('Not permitted to use from outside the object')
    def accessible(self):
        print('Permitted to access')
        self.__inaccessible()

>>> p1 = PrivateTest()
>>> p1.__inaccessible()
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    p1.__inaccessible()
AttributeError: 'PrivateTest' object has no attribute '__inaccessible'
>>> p1.accessible()
Permitted to access
Not permitted to use from outside the object


# Program II02_class.py:
class Person:
    population = 0
    def __init__(self, name, age):
        self.name = name
        self.age =age
        print('{0} has been born!'.format(self.name))
        Person.population +=1
    def __str__(self):
        return '{0} is {1} years old'.format(self.name, self.age)
    def __del__(self):
        print('{0} is dying! :'.format(self.name))
        Person.population -=1
    def totalPop():
        print('There are {} population in the world.'.format(Person.population))

p1 = Person("jonny",20)
print(Person.population)
p2 = Person("mary",25)
print(Person.population)
print(p1)
print(p2)


>>> class Point:

""" Point class: represents and manipulates x,y coord."""

def __init__(self):

""" Create a new point at the origin"""

self.x = 0

self.y =0

>>> p1 = Point()

>>> p2 = Point()

>>> p2.x =3

>>> p2.y = 5

>>> print(p2.x, p2.y)

3 5

>>> print("p2's coordinates: ", p2.x, p2.y)

p2's coordinates: 3 5


# Program II02_point.py
class Point:
    """ Point class: represents and manipulates x,y coord."""
    def __init__(self, x=0, y=0):
        """ Create a new point at (x, y); defaults to the origin"""
        self.x = x
        self.y = y
    def distance_org(self):
        """ Compute distance from origin """
        return ((self.x **2)+(self.y **2)) ** 0.5
    def print_location(self):
        print('({0},{1})'.format(self.x, self.y))
    def __str__(self):
        return 'Point of ({0},{1})'.format(self.x, self.y)

    def halfway(self, target):
        """ return the halfway point between myself and the target"""
        mx = (self.x + target.x)/2
        my = (self.y + target.y)/2
        return Point(mx,my)
#----------
>>> p1=Point(5,10)
>>> p1.print_location()
(5,10)
>>> print(p1)
Point of (5,10)
>>> str(p1)
'Point of (5,10)'
>>> p = Point(3,4)
>>> q = Point(5,12)
>>> r = p.halfway(q)
>>> print(r)
Point of (4.0,8.0)


class Employee:
    'Common base class for all employees'
    empCount = 0
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
        Employee.empCount += 1
    def displayCount(self):
        print ("Total Employee %d" % Employee.empCount)
    def displayEmployee(self):
        print ("Name : ", self.name, ", Salary: ", self.salary)

>>> emp1 = Employee("Zara", 2000)
>>> emp2 = Employee("Manni", 5000)
>>> emp1.displayEmployee()
Name : Zara , Salary: 2000
>>> emp2.displayEmployee()
Name : Manni , Salary: 5000
>>> print("Total Employee %d" % Employee.empCount)
Total Employee 2
>>> # You can add, remove or modify attributes of classes and objects at any time
>>> emp1.age = 30
>>> emp1.age = 35
>>> emp1.age
35
>>> hasattr(emp1, 'age')
True
>>> getattr(emp1, 'age')
35
>>> setattr(emp1, 'age', 35)
>>> delattr(emp1, 'age')


# Class Attributes
>>> Employee.__doc__
'Common base class for all employees'
>>> Employee.__name__
'Employee'
>>> Employee.__module__
'__main__'
>>> Employee.__dict__
dict_proxy({'displayEmployee': <function displayEmployee at ‘… 0x0000000003512EC8>})
>>>


• # Inheritance

#!/usr/bin/python
class Parent:
    parentAttr = 100
    def __init__(self):
        print ("Calling parent constructor")
    def parentMethod(self):
        print ('Calling parent method')
    def setAttr(self, attr):
        Parent.parentAttr = attr
    def getAttr(self):
        print ("Parent attribute :", Parent.parentAttr)

>>> class Child(Parent):
        def __init__(self):
            print("Calling child constructor")
        def childMethod(self):
            print("Calling child method")
>>> c = Child()
Calling child constructor
>>> c.childMethod()
Calling child method
>>> c.parentMethod()
Calling parent method
>>> c.setAttr(200)
>>> c.getAttr()
Parent attribute : 200


# operator overloading

>>> class Vector:
        def __init__(self, a,b):
            self.a = a
            self.b = b
        def __str__(self):
            return 'Vector (%d, %d)' % (self.a, self.b)
        def __add__(self,other):
            return Vector(self.a + other.a, self.b + other.b)

>>> v1 = Vector(2,10)

>>> v2 = Vector(5,-2)

>>> v1 + v2

<__main__.Vector object at 0x00000000034CE860>

>>> print (v1+v2)

Vector (7, 8)
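In the session above, v1 + v2 at the prompt shows only an object address because the class defines __str__ but not __repr__. As a possible extension (not part of the original slides), the sketch below adds __repr__ for the interactive display and __eq__ so that vectors can be compared with ==:

```python
class Vector:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        # Used by the interactive prompt and by repr()
        return 'Vector(%r, %r)' % (self.a, self.b)

    def __add__(self, other):
        return Vector(self.a + other.a, self.b + other.b)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented          # let Python try the other operand
        return (self.a, self.b) == (other.a, other.b)

v1 = Vector(2, 10)
v2 = Vector(5, -2)
print(repr(v1 + v2))
print(v1 + v2 == Vector(7, 8))
```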


# multiple inheritance
>>> class Calculator:
        def calculate(self, expression):
            self.value = eval(expression)
>>> class Talker:
        def talk(self):
            print("Hi, my value is ", self.value)
>>> class TalkingCalculator(Calculator, Talker):
        pass
>>> tc = TalkingCalculator()
>>> tc.calculate('10+3*4')
>>> tc.talk()
Hi, my value is 22


OOD


Exception handling

• Two techniques for dealing with unexpected errors
– Exception handling: exceptions.

– Assertions: covered earlier.

• What is an exception?
– An event that interrupts the normal flow of a program.

– That is, a Python object that represents an error = an exceptional condition

– When this happens, Python usually raises an exception.

– (Important) Every exception is an instance of some class.

– Exception hierarchy • https://docs.python.org/3/library/exceptions.html#exception-hierarchy

• Custom exceptions

• Exceptions and functions
– Unless an exception is handled promptly (doing so is defensive programming), the program stops running and terminates with an error message (= traceback)

– If an exception occurs inside a function and is not handled there, it propagates (bubbles up) to the caller





• Handling an exception:
– Catching exceptions (trapping): order of processing
try:
    the intended work
except ExceptionI:
    …
except ExceptionII:
    ...........
else:
    If there is no exception then execute this block.

– raise: deliberately raises an exception

>>> raise Exception
Traceback (most recent call last):
  File "<pyshell#165>", line 1, in <module>
    raise Exception
Exception


while True:

try:

n=input('Enter an integer: ')

n=int(n)

print('Successful!')

break

except ValueError:

print('No valid integer! Please try again...')

print('again')


>>> x = 5+'ham'
Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    x = 5+'ham'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> try:
        x = 5+'ham'
    except:
        print('sorry, a mistake!')

sorry, a mistake!
>>> try:
        x = 5+'ham'
    except:
        pass


>>> try:
        x= 1/0
    except ZeroDivisionError:
        print('mistake: div by zero)

SyntaxError: EOL while scanning string literal
>>> try:
        x= 1/0
    except ZeroDivisionError:
        print('mistake: div by zero')
        print("routines to handle such error HERE")
    finally:
        print('Print this anyway')

mistake: div by zero
routines to handle such error HERE
Print this anyway


#!/usr/bin/python
try:
    fh=open('testfile','w')
    fh.write("this is my test file for exception handling!")
except IOError:
    print("Error: can't open the file or write data")
else:
    print("Written successfully")
    fh.close()
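The outline above mentions custom exceptions and states that every exception is an instance of a class. A minimal sketch of both ideas follows; InvalidAgeError, parse_age and describe are hypothetical names chosen for illustration.

```python
class InvalidAgeError(ValueError):
    """Hypothetical custom exception for an out-of-range age."""

def parse_age(text):
    age = int(text)                      # may raise the built-in ValueError
    if not 0 < age < 100:
        raise InvalidAgeError("age out of range: %d" % age)
    return age

def describe(text):
    try:
        age = parse_age(text)
    except InvalidAgeError as err:       # the more specific handler comes first
        return "invalid age (%s)" % err
    except ValueError:                   # the base class catches everything else
        return "not an integer"
    else:                                # runs only if no exception occurred
        return "age is %d" % age

print(describe("35"))
print(describe("-5"))
print(describe("abc"))
```

Because InvalidAgeError inherits from ValueError, the order of the except clauses matters: listing ValueError first would swallow the custom exception too.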



• Standard Library

• Regular expressions

• Using databases


Standard Library

• Concept

• Main contents

• Built-in functions (example)

• Examples – sys module

– os module

– fileinput module

– random module

– shelve module

– pickle module


• Python standard library – https://docs.python.org/3/library/

– A library of built-in modules
  • written in Python
  • written in C, C++, …

– Beyond the standard library there is a vast collection of third-party libraries.

– Python Package Index (50,284 packages as of October 2014)
  • a repository of Python-related software
  • https://pypi.python.org/pypi



Python Standard Library: main contents

• (1) Built-in functions and constants
– built-in functions (see the next slide)
– True/False/None/...

• (3) Built-in types
– data types
– exceptions

• (4) Text Processing Services
– string: string-related methods and formatting spec.
– ** re: regular expressions
– others (difflib, unicodedata (Unicode Database), ...)

• (5) Numeric and Mathematical Modules
– math, cmath, random, ...

• (6) Data persistence / files
– ** pickle (Python object serialization)
– marshal: internal Python object serialization
– ** sqlite3
– file compression (zlib, gzip, ...), file formats (csv, configparser, ...)
– cryptographic services (hashlib, ...)

• (7) OS services
– files and directories: pathlib, os.path, fileinput, glob, ...
– argparse, getopt, logging, ...
– platform, errno, ...
– ctypes
– concurrent execution: threading, multiprocessing, ...


• (8) Networking
– IPC and networking: socket, ssl, asyncio, signal, ...
– network data handling: email, json, mailbox, ...
– HTML and XML processing
– webbrowser, cgi, urllib, http, ...

• (9) Python Runtime Services
– sys, sysconfig, built-in objects
– __main__ (top-level script environment)

• (10) Others
– Multimedia services
– Internationalization (gettext, locale, ...)
– Program frameworks
– GUI (tkinter, ...)
– IDLE
– Development tools (pydoc, doctest, unittest, bdb, pdb, ...)
– Software packaging and distribution (distutils, ...)
– Custom Python interpreters
– Importing modules
– Python language services (parser, formatter, ...)


Example: built-in functions

abs()          dict()        help()         min()        setattr()
all()          dir()         hex()          next()       slice()
any()          divmod()      id()           object()     sorted()
ascii()        enumerate()   input()        oct()        staticmethod()
bin()          eval()        int()          open()       str()
bool()         exec()        isinstance()   ord()        sum()
bytearray()    filter()      issubclass()   pow()        super()
bytes()        float()       iter()         print()      tuple()
callable()     format()      len()          property()   type()
chr()          frozenset()   list()         range()      vars()
classmethod()  getattr()     locals()       repr()       zip()
compile()      globals()     map()          reversed()   __import__()
complex()      hasattr()     max()          round()
delattr()      hash()        memoryview()   set()

https://docs.python.org/3/library/functions.html










































































• Key members of the sys module

– argv

– exit([arg])

– modules

– path

– platform

– stdin

– stdout

– stderr

• Usage examples – sys.argv

– sys.exit

– …

>>> import sys

>>> print (sys.version)

3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)]

>>> print (sys.platform)

win32

>>> print(sys.path)

['C:/Python34', 'C:\\Python34\\Lib\\idlelib', …

>>> sys.version_info

sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)

>>> if sys.version_info<(2,6,0):
        sys.stderr.write("You need python 2.6 or later to run this script\n")
        sys.exit(1)


# II03_systest.py

import sys

args = sys.argv[1:]

args.reverse()

print('~'.join(args))

--

$ python II03_systest.py this is a test



• Key members of the os module – environ

– system(command)

– startfile(command)

– sep

– pathsep

– linesep

– urandom(n)

• Note (when launching an external program):
– On MS Windows, the Python program keeps running

– On Linux, the Python program waits until the os call finishes

>>> import os

>>> os.startfile(r'C:\Program Files\Internet Explorer\iexplore.exe')



• Key functions of the fileinput module
– input([files[, inplace[, backup]]])

– filename()

– lineno()

– filelineno()

– isfirstline()

– isstdin()

– nextfile()

– close()

# II03_numberlines.py

import fileinput

for line in fileinput.input(inplace=1):

line = line.rstrip()

num = fileinput.lineno()

print('%-40s # %2i' % (line, num))

C:\python34>..\python II03_numberlines.py II03_numberlines.py

• type…



• Key functions of the random module
– random()

– getrandbits(n)

– uniform(a,b)

– randrange([start], stop, [step])

– choice(seq)

– shuffle(seq[, random])

– sample(seq, n)

# Program II03_randrange.py
from random import randrange

num = input('How many dice? ')
sides = input('How many sides per dice? ')
total = 0      # named total so the built-in sum() is not shadowed
for i in range(int(num)):
    total += randrange(int(sides))+1
print('The result is: ', total)


>>> values = list(range(1,11)) + 'Jack Queen King'.split()
>>> values
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']
>>> suits = 'diamonds clubs hearts spades'.split()
>>> suits
['diamonds', 'clubs', 'hearts', 'spades']
>>>
>>> from pprint import pprint
>>> deck = ['%s of %s' % (v,s) for v in values for s in suits]
>>> pprint(deck[:12])
['1 of diamonds', '1 of clubs', ... '3 of hearts', '3 of spades']
>>> from random import shuffle
>>> shuffle(deck)

>>> pprint(deck[:12])
['4 of clubs', '1 of diamonds', .. '2 of diamonds', '5 of spades']
>>>
>>> # dealer distributes a card
>>> while deck:
        input(deck.pop())

Queen of clubs
''
Jack of clubs
''
3 of diamonds
...
>>>


• shelve module

– open(filename[, flag='c'[, protocol=None[, writeback=False]]])

• Concept

– "shelf" = a persistent, dictionary-like object.

– A value in a shelf can be any object (cf. a "dbm" database)

– i.e. anything that the pickle module can handle (e.g. class instances, recursive data types, etc.)

– However, keys must be ordinary strings

– See: II03_shelf.py

• pickle module – dump

– load

– dumps

– loads

– ...

• Concept – serializing and de-serializing an object structure.

– "Pickling": converting an object into a byte stream

– "Unpickling": converting a byte stream back into an object

– Also known as "serialization", "marshalling", "flattening"
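The slides refer to II03_shelf.py without showing it. As a rough stand-in, the sketch below round-trips a hypothetical record through pickle.dumps()/loads() and then stores it in a shelf created in a temporary directory (the shelf file name is an assumption):

```python
import os
import pickle
import shelve
import tempfile

record = {"name": "kim", "scores": [90, 85]}    # hypothetical data

# pickle: object -> bytes -> object
data = pickle.dumps(record)
restored = pickle.loads(data)

# shelve: a persistent dictionary whose keys are strings
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_shelf")  # hypothetical file name
    with shelve.open(path) as shelf:
        shelf["kim"] = record                   # the value is pickled behind the scenes
    with shelve.open(path) as shelf:            # reopen to show that it persisted
        from_shelf = shelf["kim"]

print(type(data).__name__, restored == record, from_shelf == record)
```

As with eval(), unpickling data from an untrusted source is unsafe, because a crafted byte stream can execute code while it is loaded.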


# pickle

>>> import pickle

>>> help(pickle)

~~

>>> movieList=['Monty Python', 'Inception','Star Wars','Lord of the Rings']

>>> print(movieList)

['Monty Python', 'Inception', 'Star Wars', 'Lord of the Rings']

>>> outFile=open('pickle.txt','wb')

>>> pickle.dump(movieList, outFile)

>>> outFile.close()

====== RESTART ==========

>>> import pickle

>>> inFile = open('pickle.txt','rb')

>>> newList= pickle.load(inFile)

>>> print(newList)

['Monty Python', 'Inception', 'Star Wars', 'Lord of the Rings']

>>> inFile.close()


정규표현식 (Regular Expression)

• 개념

• 주요함수

• 주요 method

• Identifier

• Modifier

• White Space

• 기타

• 예제


• What is a regular expression? – also called re, regex, or regexp
• Main topics
– Escaping special characters
– Alternatives and subpatterns
• p(ython|erl) ; matches “python” or “perl”
– Optional and repeated subpatterns
• r'(http://)?(www\.)?python\.org' ; “http://” and “www.” may each be present or absent
• (pattern)* ; zero or more times
• (pattern)+ ; one or more times
• (pattern){m,n} ; m to n times
– Start and end of a string
• ^ and $
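The patterns on this slide can be tried out directly with the `re` module. The candidate strings below are invented for illustration; `^` and `$` are added to the URL pattern so that it must cover the whole string.

```python
import re

# Optional subpatterns: "http://" and "www." may each be present or absent.
url_pattern = re.compile(r'^(http://)?(www\.)?python\.org$')
candidates = [
    'http://www.python.org',
    'www.python.org',
    'python.org',
    'http://python.org',
    'pythonXorg',          # the escaped dots must be literal periods
    'ftp://python.org',    # not anchored at "http://" or the bare host
]
matches = [c for c in candidates if url_pattern.match(c)]

# Alternatives inside a subpattern: "python" or "perl", but not "php".
langs = [m.group(0) for m in re.finditer(r'p(ython|erl)', 'python, perl, php')]

# Repetition: (ab){2,3} accepts two or three copies of "ab".
three_ok = re.fullmatch(r'(ab){2,3}', 'ababab') is not None
one_ok = re.fullmatch(r'(ab){2,3}', 'ab') is not None

print(matches)
print(langs)
print(three_ok, one_ok)
```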


• re module
– Key functions
– Key methods of the match object


• Identifiers:
– \d = any digit
– \D = anything but a digit
– \s = any whitespace character
– \S = anything but whitespace
– \w = any word character (letter, digit, or underscore)
– \W = anything but a word character
– . = any character, except for a newline
– \b = word boundary (the edge of a whole word)
– \. = a literal period (a bare ‘.’ normally means any character, so it must …)

• Modifiers:
– {1,3} = for digits, expect 1 to 3 of them in a row (“places”)
– + = match 1 or more repetitions
– ? = match 0 or 1 repetitions
– * = match 0 or more repetitions
– $ = matches at the end of a string
– ^ = matches at the start of a string
– | = matches either/or; for example, x|y will match either x or y
– [] = a character set or range (“variance”)
– {x} = expect exactly x repetitions of the preceding element
– {x,y} = expect x to y repetitions of the preceding element


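• White space characters:
– \n = new line
– \s = any whitespace character
– \t = tab
– \e = escape (common in other regex dialects; Python's re module does not recognize \e, so use \x1b there)
– \f = form feed
– \r = carriage return

• Special characters that must be escaped
– . + * ? [ ] $ ^ ( ) { } | \

• Brackets:
– [] = a set of alternatives: quant[ia]tative matches either “quantitative” or “quantatative”
– [a-z] = any lowercase letter a-z
– [1-5a-qA-Z] = the digits 1-5, lowercase letters a-q, and uppercase letters A-Z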
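A few of the identifiers, modifiers, and bracket expressions listed above, applied to made-up sample text (the strings are assumptions for illustration only):

```python
import re

text = "Order #42 shipped to J. Smith on 2015-11-03.\tTotal: 19.99"

digits = re.findall(r'\d+', text)        # runs of digits
price = re.findall(r'\d+\.\d+', text)    # \. is a literal period
spellings = re.findall(r'quant[ia]tative',
                       'quantitative quantatative quantotative')
whole_words = re.findall(r'\bcat\b', 'cat concat cat.')  # \b = word boundary
pieces = re.split(r'\s+', 'a  b\tc\nd')  # \s covers space, tab, and newline

# re.escape backslash-escapes the special characters in a literal string.
escaped = re.escape('a.b*c')

print(digits)
print(price)
print(spellings)
print(whole_words)
print(pieces)
print(escaped)
```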


>>> import re

>>> exampleString = ''' Jessica is 15 years old, and Daniel is 27 years old. Edward is 97, and his grandfather, Oscar , is 102.'''

>>>

>>> ages = re.findall(r'\d{1,3}', exampleString)

>>> names = re.findall(r'[A-Z][a-z]*', exampleString)

>>> print(ages)

['15', '27', '97', '102']

>>> print(names)

['Jessica', 'Daniel', 'Edward', 'Oscar']

>>> ageDict= {}

>>> x=0

>>> for eachName in names:
        ageDict[eachName] = ages[x]
        x += 1
        print(ageDict)

• Result (key order as printed by Python 3.4; from Python 3.7 on, dicts print in insertion order)

{'Jessica': '15'}

{'Jessica': '15', 'Daniel': '27'}

{'Edward': '97', 'Jessica': '15', 'Daniel': '27'}

{'Edward': '97', 'Jessica': '15', 'Oscar': '102', 'Daniel': '27'}

>>>


# x = urllib.request.urlopen('https://www.google.com')
# print(x.read())
# del x

# Combining the urllib module with re
import urllib.request
import urllib.parse
import re

url = 'http://www.python.org'
values = {'s': 'python', 'submit': 'search'}
data = urllib.parse.urlencode(values)   # build the query string
data = data.encode('utf-8')             # the request body must be bytes
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()
print(respData)


#!/usr/bin/python
import re

line = "Cats are smarter than dogs"

# re.M: multi-line mode, re.I: ignore case
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)

if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")

• Output when run

matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter


Using Databases (omitted)
• Overview
• Minimal workflow with the DB API:
• MySQL example (Python 2.x)
• The SQLite3 API
• SQLite3 examples
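Since this part of the course was omitted, the following is only a generic sketch of the usual DB API sequence (connect, get a cursor, execute, commit, fetch, close) using the standard-library `sqlite3` module. The table name, columns, and rows are invented for illustration.

```python
import sqlite3

# An in-memory database avoids touching the file system.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, chosen only for the example.
cur.execute("CREATE TABLE people (name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Jessica", 15), ("Daniel", 27), ("Edward", 97), ("Oscar", 102)],
)
conn.commit()

# "?" placeholders let the driver handle quoting of parameters.
cur.execute("SELECT name FROM people WHERE age > ? ORDER BY age", (20,))
adults = [row[0] for row in cur.fetchall()]

conn.close()
print(adults)
```

Other DB API drivers follow the same sequence, although the placeholder style (`?`, `%s`, and so on) differs between drivers.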


III. Data Analysis with Python (1)

• Overview

• Environment setup


Overview


The data analysis process with Python


Key packages for data analysis

• ipython,

• numpy,

• matplotlib,

• scipy

• pandas


ipython

• Concept – a command shell for interactive computing (supports several programming languages)

• Features
– Introspection
– Additional shell features (terminal and Qt-based)
• Tab completion & history, rich media

• IPython Notebook
– Browser-based coding, formulas, inline plots, rich media
– Interactive data visualization and use of GUI toolkits
– Flexible, embeddable interpreters to load into one's own projects
– Performance tools for parallel computing

• Profiling & optimization
– %time, %timeit in IPython
– %prun ; to profile a statement with cProfile
– %run -p ; to profile whole programs
– line_profiler module, for line-by-line timing
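The `%timeit` and `%prun` magics only work inside IPython. As a rough sketch of what they measure, the standard-library `timeit` and `cProfile` modules can be called from plain Python. The two functions being compared are made up for the example.

```python
import cProfile
import io
import pstats
import timeit

def squares_loop(n):
    """Build a list of squares with an explicit loop."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def squares_comp(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]

# Roughly what %timeit does: run the call many times and measure wall time.
loop_time = timeit.timeit(lambda: squares_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_comp(1000), number=200)
print("loop: %.4fs  comprehension: %.4fs" % (loop_time, comp_time))

# Roughly what %prun does: profile a statement with cProfile.
profiler = cProfile.Profile()
profiler.enable()
squares_loop(100000)
profiler.disable()

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The timings depend on the machine, so only the relative difference between the two calls is meaningful.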


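numpy

• Concept – extends Python's numerical data processing capabilities

• Main features – large, multi-dimensional arrays and matrices, plus high-level mathematical functions

• Background – originated from Numeric


• Array creation functions in numpy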
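The table of array creation functions did not survive on this slide. As a stand-in, here is a short sketch of some commonly used ones; this selection is an assumption, not the original list.

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]])  # from nested Python lists
zeros = np.zeros((2, 3))                      # 2x3 array of 0.0
ones = np.ones(4)                             # 1-D array of 1.0
steps = np.arange(0, 10, 2)                   # like range(), returns an ndarray
grid = np.linspace(0.0, 1.0, 5)               # 5 evenly spaced points, ends included
identity = np.eye(3)                          # 3x3 identity matrix

print(from_list.shape, from_list.dtype)
print(steps)
print(grid)
```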


• Universal functions
– Element-wise operations on the data in an ndarray
– ufunc
– A kind of vectorized wrapper
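A brief sketch of ufuncs applied element by element, with no explicit Python loop. The arrays are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([1.0, 2.0, 3.0])

roots = np.sqrt(x)              # unary ufunc, applied to each element
sums = np.add(x, y)             # binary ufunc, same as x + y
larger = np.maximum(x, y * 2)   # element-wise maximum of two arrays

# The functions themselves are instances of numpy.ufunc.
is_ufunc = isinstance(np.sqrt, np.ufunc)

print(roots, sums, larger, is_ufunc)
```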



matplotlib

• Concept – a plotting library for Python and its numerical extension NumPy

• Main features
– An object-oriented API for embedding plots into applications
• Uses general-purpose GUI toolkits (wxPython, Qt, or GTK+)

• pylab
– A procedural interface based on a state machine (like OpenGL)
– Resembles MATLAB
– SciPy makes use of matplotlib
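A minimal sketch of the object-oriented API, where the figure and axes are explicit objects rather than hidden state. The non-interactive "Agg" backend and the in-memory PNG buffer are choices made for this example so that it runs without a GUI toolkit.

```python
import io

import matplotlib
matplotlib.use("Agg")            # render off-screen; no GUI toolkit required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Object-oriented style: work on explicit Figure and Axes objects.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Write the PNG into memory instead of a file on disk.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
png_bytes = png_buffer.getvalue()
plt.close(fig)

print(len(png_bytes), "bytes of PNG data")
```

The state-machine style would call `plt.plot(...)` and `plt.xlabel(...)` directly and let pyplot keep track of the current figure.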


scipy

• Concept – an open-source Python library for scientific and analytical computing

• NumPy and SciPy
– Built on top of the NumPy array object
– Part of the NumPy stack
• The stack also includes Matplotlib, pandas, and SymPy

• Main contents
– Optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers
– Other tools for science and engineering work

• License – BSD license


pandas

• Concept
– A software library for data analysis in Python
– Data munging / preparation / cleaning / integration
– Rich data manipulation tool (built on NumPy)
– Fast, intuitive data structures
– Sits somewhere between Python and a DSL such as R (?)
– Similar to R's data.frame
– Easy-to-use, highly consistent API


Details

• Main features
– DataFrame object
– Data analysis using integrated indexing
– Support for many formats (CSV, text files, Excel, SQL databases, HDF5)
– Integrated handling of data alignment and missing data
– Reshaping and pivoting of data sets
– Label-based slicing, indexing, and subsetting of large data sets
– Aggregating and transforming data (group by engine): split-apply-combine operations on data sets
– Hierarchical axis indexing
– Time series functionality


pandas.core

• Data structures – Series (1D)

– DataFrame (2D)

– Panel (3D)

• NA-friendly statistics

• Index implementations/label-indexing

• GroupBy engine

• Time series tools
– Date range generation

– Extensible date offsets

• Hierarchical indexing stuff


The pandas data model

• Series:
– 1D labeled numpy array
– Subclass of numpy.ndarray
– Data: any dtype
– Index labels need not be ordered
– Duplicates are possible (but result in reduced functionality)

• DataFrame
– 2D table with row and column labels
– Potentially heterogeneous columns
– ndarray-like, but not ndarray
– Each column can have a different dtype
– Row and column index
– Size mutable: insert and delete columns


index

• Index
– Every axis has an index
– Fast lookups, plus data alignment and join operations
– Hierarchical indexes
• Semantics: a tuple at each tick
• Enables easy group selection
• Terminology: "multiple levels"
• Natural part of GroupBy and reshape operations

• Data Alignment
– Binary operations are joins!
– Outer join by default
– DataFrame joins/aligns on both axes
– Irregularly-indexed data


Series

• Subclass of numpy.ndarray

• Data: any dtype

• Index labels need not be ordered

• Duplicates are possible (but result in reduced functionality)


DataFrame

• ndarray-like, but not ndarray

• Each column can have a different dtype

• Row and column index

• Size mutable: insert and delete columns
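A small sketch of the two structures described on these slides. The names and numbers are invented for the example.

```python
import pandas as pd

# Series: one-dimensional data with an index of labels.
ages = pd.Series([15, 27, 97], index=["Jessica", "Daniel", "Edward"])

# DataFrame: a 2-D table whose columns may have different dtypes.
people = pd.DataFrame(
    {
        "name": ["Jessica", "Daniel", "Edward"],
        "age": [15, 27, 97],
        "height": [1.62, 1.80, 1.71],
    },
    index=["a", "b", "c"],
)

# Size mutable: columns can be inserted and deleted in place.
people["age_next_year"] = people["age"] + 1
del people["height"]

print(ages["Daniel"])
print(people.dtypes)
print(people)
```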


Hierarchical indexes

• Semantics: a tuple at each tick

• Makes group selection easy

• Terminology: "multiple levels"

• Natural part of GroupBy and reshape operations
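To make "a tuple at each tick" concrete, here is a sketch with a two-level index. The year and quarter labels and the sales figures are assumptions for illustration.

```python
import pandas as pd

# Each index entry is a tuple: (year, quarter).
index = pd.MultiIndex.from_tuples(
    [("2014", "Q1"), ("2014", "Q2"), ("2015", "Q1"), ("2015", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([10, 12, 15, 18], index=index)

# Group selection: pick every row under one outer label.
sales_2015 = sales.loc["2015"]

# Cross-section on the inner level.
q1_all_years = sales.xs("Q1", level="quarter")

# Reshape: move the inner level into columns.
wide = sales.unstack("quarter")

print(sales_2015)
print(q1_all_years)
print(wide)
```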


Data Alignment

• Binary operations are joins!

• Outer join by default

• DataFrame joins/aligns on both axes


• Irregularly-indexed data

• Axis metadata
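"Binary operations are joins" can be seen by adding two Series whose indexes only partly overlap. The labels and values are made up for the example.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# The result is indexed by the union of both indexes (an outer join);
# labels present on only one side produce NaN.
total = a + b

# fill_value treats a label missing on one side as 0 instead.
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```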


GroupBy

• Splitting axis into groups – DataFrame columns

– Arrays of labels

– Functions, applied to axis labels

• There are several ways to work with grouped data – Iterate: "for key, group in grouped"

– Aggregate: grouped.agg(f)

– Transform: grouped.transform(f)

– Apply: grouped.apply(f)
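The four ways of working with a grouped object listed above can be sketched on a toy table. The team and score data are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [1, 3, 2, 4, 6],
})
grouped = df.groupby("team")

# Iterate: each step yields the group key and that group's sub-frame.
sizes = {key: len(group) for key, group in grouped}

# Aggregate: one value per group.
means = grouped["score"].agg("mean")

# Transform: same length as the input, computed within each group.
centered = grouped["score"].transform(lambda s: s - s.mean())

# Apply: an arbitrary function per group (here, the score range).
ranges = grouped["score"].apply(lambda s: s.max() - s.min())

print(sizes)
print(means)
print(centered.tolist())
print(ranges)
```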


Miscellaneous

• Agg, Transform, Apply
– Agg/Transform are specialized, faster
• Agg: produce a single aggregated value per column per group
• Transform: alter values, but not their size
– Apply: completely generic, but slower

• Join/concatenation algorithms

• Sparse version of Series, DataFrame,…

• IO tools: csv files, HDF5, Excel

• Moving window statistics (rolling mean, …)

• Pivot tables

• High-level matplotlib interface

• Better integration with statsmodels and scikit-learn

• R integration via rpy2


pandas roadmap

• Integration with JavaScript visualization frameworks – D3, Flot, others

• Alternate DataFrame “backends” – Memory maps

– HDF5/PyTables

– SQL or NoSQL-backed

• Tighter integration with the IPython Notebook

• ggplot2 for Python

• pandas for Big Data – Alternate DataFrame backends

– Integration with MapReduce framework


Environment setup

• Packages

– ipython,

– numpy,

– matplotlib,

– scipy

– pandas


• Individual installation – from each project's site
• http://www.ipython.org/
• http://www.numpy.org/
• http://pandas.pydata.org/
• http://matplotlib.org/

• Bundled installation
– Enthought Canopy
• https://store.enthought.com/
– Python(x,y)
• https://code.google.com/p/pythonxy/
– Anaconda
• https://store.continuum.io/cshop/anaconda/



Hands-on practice

• Basics

• Applications


Basics

• ipython

• numpy

• matplotlib

• scipy

• pandas


Hands-on practice – code and data

• scikit-learn site

• Python for Data Analysis

– By Wes McKinney


III. Data Analysis with Python (2)

• Data analysis case studies

• Wrap-up


Data analysis case studies


Wrap-up

• Python and the areas of data analysis

• Python and R

• Python and big data


Python and the areas of data analysis

• https://pypi.python.org/pypi?%3Aaction=browse



• Analysis of structured data
– Databases (SQL, NoSQL)
– Machine learning, mining
– Mathematical and statistical analysis

• Analysis of unstructured data
– re, pawk, etc.
– www.nltk.org
– …

• Big data
– Various attempts



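Python and R

• Strengths of R:
– Specialized for mathematics and statistics (a DSL)
– Some 5,000 specialized packages

• Strengths of Python:
– A powerful general-purpose language
• A thorough OOL (Object-Oriented Language)
• Dynamic typing
• …
– Some 50,000 packages and extensive frameworks

• Difficulty of integration
– Differences in design philosophy
– From the Python side: the problem of implementing a more pythonic approach
– Namespaces, etc.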


• Various attempts
– (1) Rserve
• = a TCP/IP server for R
• Lets various clients access R (e.g. C/C++/C#/Ruby, ...)
• Through pyRserve, a Python client can call R directly
– R code can call back into Python

– (2) rPython
• Calls Python from R
• python.call( "len", 1:3 )
• a <- 1:4
• b <- 5:8
• python.exec( "def concat(a,b): return a+b" )
• python.call( "concat", a, b)

– (3) rpy2
• Calls R from Python
• Grew out of rpy


Python and Big Data

• Background
– Big Data & Hadoop
– Using Jython programs
– Using Hadoop Streaming
– source: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

• Overview of Python MapReduce programming
– Environment: Hadoop on Linux
– Data: the WordCount example applied to Gutenberg data (https://www.gutenberg.org/)



• Approach – Hadoop Streaming
• Treats every Hadoop job in terms of standard input and output (stdin, stdout).
• (http://hadoop.apache.org/docs/r1.1.2/streaming.html#Hadoop+Streaming)
• A utility shipped with Hadoop
– A program written in any language can be used as a Hadoop Map/Reduce job. In Python, this means reading the input data from sys.stdin and writing the results to sys.stdout.

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc





• Linux shell
– = command interpreter
– Standard I/O
• When a command runs, a process is created, and that process opens three streams:
• stdin – standard input reads the input data.
• stdout – standard output writes the output data.
• stderr – standard error writes the error messages.
– Redirection and pipes
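The three standard streams can also be observed from Python. This sketch starts a child Python process, feeds its stdin, and captures stdout and stderr separately, which is roughly what a shell does with redirection and pipes. The child program and its input are made up for the example.

```python
import subprocess
import sys

# A tiny child program: upper-case stdin to stdout, then report on stderr.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "sys.stderr.write('done\\n')\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="hello\nworld\n",       # becomes the child's stdin
    stdout=subprocess.PIPE,       # capture the child's stdout
    stderr=subprocess.PIPE,       # capture the child's stderr separately
    universal_newlines=True,      # exchange text rather than bytes
)

print(repr(result.stdout))
print(repr(result.stderr))
```

Hadoop Streaming relies on this same contract: the framework writes records to the script's stdin and reads results from its stdout.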


Python MapReduce Code

• Map step (/home/hduser/mapper.py)

#!/usr/bin/env python
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the Reduce step:
        # tab-delimited, with a trivial word count of 1
        print('%s\t%s' % (word, 1))


• Reduce step (/home/hduser/reducer.py)

#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN (the sorted output of mapper.py)
for line in sys.stdin:
    line = line.strip()
    # parse the tab-separated "word<TAB>count" pair
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so skip this line
        continue
    # this comparison only works because Hadoop sorts the map output by key
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# do not forget to output the last word
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
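Before submitting a job to Hadoop, the map, sort, and reduce sequence can be simulated in a single Python process. The sketch below is not the mapper.py/reducer.py pair itself: it uses `itertools.groupby` in place of the reducer's manual bookkeeping, and the sample sentence is an assumption. It mirrors the same idea, namely that the reducer can add up consecutive counts because the pairs arrive sorted by key.

```python
from itertools import groupby

def map_words(lines):
    """Map step: emit a (word, 1) pair for every word on every line."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts of consecutive pairs sharing a word.

    Like reducer.py, this relies on the input being sorted by word.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)

# Stand-in for the input files; on a cluster Hadoop feeds these lines via stdin.
lines = ["foo foo quux labs foo bar quux"]

# sorted() plays the role of Hadoop's shuffle/sort between map and reduce.
shuffled = sorted(map_words(lines))
counts = list(reduce_counts(shuffled))

for word, count in counts:
    print('%s\t%s' % (word, count))
```

On a shell, the equivalent check is to pipe a test sentence through the mapper script, `sort`, and then the reducer script.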


• Testing – test the mapper and the reducer separately, and run them as a MapReduce job only after both are verified

• Running the Python code on Hadoop
– (1) Copy the data to HDFS

$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg

– (2) Run the MR job

$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
    -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py \
    -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py \
    -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output


• Run log

$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -mapper /home/hduser/mapper.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/app/hadoop/tmp/hadoop-unjar54543/] [] /tmp/streamjob54544.jar tmpDir=null
[...] INFO mapred.FileInputFormat: Total input paths to process : 7
[...] INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
[...] INFO streaming.StreamJob: Running job: job_200803031615_0021
[...]
[...] INFO streaming.StreamJob: map 0% reduce 0%
…
[...] INFO streaming.StreamJob: map 100% reduce 100%
[...] INFO streaming.StreamJob: Job complete: job_200803031615_0021
[...] INFO streaming.StreamJob: Output: /user/hduser/gutenberg-output


• Results


• References consulted for Python programming:
– M.L. Hetland, “Beginning Python”, Apress, 2008
– V.L. Ceder, “The Quick Python Book”, Manning, 2010
– P. Wentworth et al., “How to Think Like a Computer Scientist”, 2011
– Mark Lutz, “Learning Python” (5th ed.), O’Reilly, 2013
– Others

• Related articles, websites, etc.


http://www.tutorialspoint.com/python/


• Reference book – Python for Data Analysis

• Data
– 2012 US Presidential Election FEC disclosure data (CSV)
– Baby names: top 1000 US boy and girl names 1880~2008 (CSV)
– USDA Food Nutrient database (JSON)
– https://github.com/pydata/pydata-book






scikit-learn.org

• http://scikit-learn.org/






