Programming in Python
Michael Schroeder, Sebastian Salentin
Updates by Andreas Henschel
Lecture 1: Data types, Conditions, and Loops
Slides derived from Ian Holmes, Department of Statistics, University of Oxford
Objectives of this course
• Concepts of computer programming
• Rudimentary Python (widely-used language)
• Introduction to Bioinformatics file formats
• Practical data-handling algorithms
• Exposure to bioinformatics libraries
Motivation
mkdntvplkliallangefhsgeqlgetlgmsraainkhiqtlrdwgvdvftvpgkgyslpep mktvrqerlksivrilerskepvsgaqlaeelsvsrqvivqdiaylrslgynivatprgyvlagg kaltarqqevfdlirdhisqtgmpptraeiaqrlgfrspnaaeehlkalarkgvieivsgasrgirllqee mrssakqeelvkafkallkeekfssqgeivaalqeqgfdninqskvsrmltkfgavrtrnakmemvyclpaelgvptt gqrhikireiimsndietqdelvdrlreagfnvtqatvsrdikemqlvkvpmangrykyslpsdqrfnplqklkr kgqrhikireiitsneietqdelvdmlkqdgykvtqatvsrdikelhlvkvptnngsykyslpadqrfnplsklkr dvtgriaqtllnlakqpdamthpdgmqikitrqeigqivgcsretvgrilkmledqnlisahgktivvygt dikqriagffidhanttgrqtqggvivsvdftveeianligssrqttstalnslikegyisrqgrghytipnlvrlkaaa iderdkiileilekdartpfteiakklgisetavrkrvkaleekgiiegytikinpkklg elqaiapevaqslaeffavladpnrlrllsllarselcvgdlaqaigvsesavshqlrslrnlrlvsyrkqgrhvyyqlqdhhivalyqnaldhlqec mntlkkafeildfivknpgdvsvseiaekfnmsvsnaykymvvleekgfvlrkkdkryvpgyklieygsfvlrrf lfneiiplgrlihmvnqkkdrllneylsplditaaqfkvlcsircaacitpvelkkvlsvdlgaltrmldrlvckgwverlpnpndkrgvlvklttggaaiceqchqlvgqdlhqeltknltadevatleyllkkvlp nypvnpdlmpalmavfqhvrtriqseldcqrldltppdvhvlklideqrglnlqdlgrqmcrdkalitrkirelegrnlvrrernpsdqrsfqlfltdeglaihqhaeaimsrvhdelfapltpveqatlvhlldqclaaq tdilreigmiaraldsisniefkelsltrgqylylvrvcenpgiiqekiaelikvdrttaaraikrleeqgfiyrqedasnkkikriyatekgknvypiivrenqhsnqvalqglseveisqladylvrmrknvsedwefvkkg mskindindlvnatfqvkkffrdtkkkfnlnyeeiyilnhilrsesneisskeiakcsefkpyyltkalqklkdlkllskkrslqdertvivyvtdtqkaniqkliseleeyikn aitkindcfellsmvtyadklkslikkefsisfeefavltyisenkekeyylkdiinhlnykqpqvvkavkilsqedyfdkkrnehdertvlilvnaqqrkkiesllsrvnkrit miimeeakkliielfselakihglnksvgavyailylsdkpltisdimeelkiskgnvsmslkkleelgfvrkvwikgerknyyeavdgfssikdiakrkhdliaktyedlkkleekcneeekefikqkikgiermkkisekilealndld aqspagfaeeyiiesiwnnrfppgtilpaerelseligvtrttlrevlqrlardgwltiqhgkptkvnnfwets eekrsstgflvkqraflklymitmteqerlyglkllevlrsefkeigfkpnhtevyrslhellddgilkqikvkkegaklqevvlyqfkdyeaaklykkqlkveldrckkliekalsdnf hmqaeilltlklqqklfadprrisllkhialsgsisqgakdagisyksawdainemnqlsehilveratggkggggavltrygqrliqlydllaqiqqkafdvlsdddalplnsllaaisrfslqts 
skvtyiikasndvlnektatilitiakkdfitaaevrevhpdlgnavvnsnigvlikkglveksgdgliitgeaqdiisnaatlyaqenapellk sprivqsndlteaayslsrdqkrmlylfvdqirksdgtlqehdgiceihvakyaeifgltsaeaskdirqalksfagkevvfyrpeedagdekgyesfpwfikpahspsrglysvhinpylipffiglq nrftqfrlsetkeitnpyamrlyeslcqyrkpdgsgivslkidwiieryqlpqsyqrmpdfrrrflqvcvneinsrtpmrlsyiekkkgrqtthivfsfrdit lglekrdreilevlilrfgggpvglatlatalsedpgtleevhepylirqgllkrtprgrvatelarrhl lglekrdreilevlilrfgggpvglatlatalsedpgtleevhepylirqgllkrtprgrvatelayrhlgypppv egldefdrkilktiieiyrggpvglnalaaslgveadtlsevyepyllqagflartprgrivtekaykhlkyevp iseevliglplheklfllaivrslkishtpyitfgdaeesykivceeygerprvhsqlwsylndlrekgivetrqnkrgegvrgrttlisigtepldtleavitklikeelr kyeltlqrslpfiegmltnlgamklhkihsflkitvpkdwgynritlqqlegylntladegrlkyiangsyeiv pmkteqkqeqetthknieedrklliqaaivrimkmrkvlkhqqllgevltqlssrfkprvpvikkcidiliekeylervdgekdtysyla gspekilaqiiqehregldwqeaatraslsleetrkllqsmaaagqvtllrvendlyaist eryqawwqavtraleefhsryplrpglareelrsryfsrlparvyqalleewsregrlqlaantvalagftps fsetqkkllkdledkyrvsrwqppsfkevagsfnldpseleellhylvregvlvkindefywhr qalgeareviknlastgpfglaeardalgssrkyvlplleyldqvkftrrvgdkrvvvgn vpkrvywemlatnltdkeyvrtrralileilikagslkieqiqdnlkklgfdevietiendikglintgifieikgrfyqlkdhilqfvipnrgvtkqlv irtfgwvqnpgkfenlkrvvqvfdrnskvhnevknikiptlvkeskiqkelvaimnqhdliytykelvgtgtsirseapcdaiiqatiadqgnkkgyidnwssdgflrwahalgfieyinksdsfvitdvglaysksad gsaiekeilieaissyppairiltlledgqhltkfdlgknlgfsgesgftslpegilldtlanampkdkgeirnnwegssdkyarmiggwldklglvkqgkkefiiptlgkpdnkefishafkitgeglkvlrrakgstkftr
All these sequences are winged helix DNA-binding domains. How can we group them into families?
Motivation: Let's rebuild SCOP families!
• Given a SCOP superfamily and its sequences -> how can we divide it into families?
• First, we need dynamic programming to determine the sequence similarity
• Then we do the following:
  – For all pairs of sequences, call the sequence similarity algorithm and record the similarity in a distance matrix
  – Next, run hierarchical clustering to cluster the sequences.
Which programming steps are necessary?
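The pairwise step above can be sketched as follows. This is only an illustration, assuming plain Levenshtein edit distance (computed by dynamic programming) as a stand-in for a real alignment-based similarity score; `edit_distance` and `distance_matrix` are hypothetical names, and the clustering step is not shown.

```python
# Hypothetical helpers for illustration; Levenshtein distance stands in
# for a proper alignment score.
def edit_distance(a, b):
    """Levenshtein distance between strings a and b via dynamic programming."""
    # prev[j] = distance between the first i-1 characters of a and first j of b
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[len(b)]


def distance_matrix(seqs):
    """Symmetric matrix holding the distance for every pair of sequences."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(seqs[i], seqs[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


seqs = ["mkdntvplk", "mktvrqerl", "mkdntvplr"]
for row in distance_matrix(seqs):
    print(row)
```

The resulting matrix is what a hierarchical clustering routine would take as input.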
Why Python?
• Well suited for scripting, easy syntax
• Capable of object-oriented programming
• Complex data types and large projects feasible
• Availability and reuse of code (e.g. Biopython)
• Universal language, applications in and beyond bioinformatics: Galaxy, PyMOL, TopHat, etc.
• Compatible with most software technologies: GUI, MPI, OpenGL, RDB
• Test complicated expressions in the Python shell
Style of this lecture
• The color scheme for programs, output and text files:
[Figure: color legend for the main program, the program output, and files (files are shown in yellow), each labelled with its file name]
• Interaction with the Python shell: very handy for quick tests. Helps beginners to overcome the psychological barrier: Go ahead, try things out!
>>> (Python expression)
(immediate Python result)
The prompt >>> means Python expects input here; press Enter to see the result.
General principles of programming
• Make incremental changes
• Test everything you do
  – use the Python shell for testing expressions/functions interactively
  – the edit-run-revise cycle
• Write so that others can read it
  – (when possible, write with others)
  – Document code
• Think before you write
• Use a good text editor/IDE
  – vim, emacs, Atom
  – (IPython notebooks)
You can use gedit on the PC Pool Workstations.
Data types
Python basics
• Basic syntax of a Python program:
# Elementary Python program
print "Hello World"
Output:
Hello World
The print statement tells Python to print the following stuff to the screen.
Single or double quotes enclose a "string literal".
Lines beginning with "#" are comments, and are ignored by Python.
Variables
• We can tell Python to "remember" a particular value, using the assignment operator "=":
x = 3
print x
Output:
3
x = "ACGCGT"      # Binding site for yeast transcription factor MCB
print x
Output:
ACGCGT
• Here x is called a variable: a name that refers to a value. Variable names can contain alphabetic characters, numbers (but not at the start of the name), and underscore symbols "_"
Variables and Objects
• Everything in Python is an object
• An object models a real-world entity
• Objects possess methods (functions that belong to the object) that are typically applied to the object, possibly with parameters
• Objects can also possess variables (attributes) that describe their state
• e.g. x.upper() is a parameter-less method that works on the string object x
Syntax: object.method or object.variable
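A short sketch of the method and attribute syntax on built-in objects; the variable names are illustrative only. `upper()` takes no parameters, `count()` takes one, and `real`/`imag` are attributes (state variables) of a complex number, accessed without parentheses. `print(...)` with a single argument behaves the same in Python 2 and 3.

```python
x = "acgcgt"

# Parameter-less method: returns a new upper-case string; x itself is unchanged
upper_x = x.upper()

# Parameterized method: count non-overlapping occurrences of "cg" in x
cg_count = x.count("cg")

# Attributes (variables describing the object's state) take no parentheses
z = 3 + 4j
real_part = z.real
imag_part = z.imag

print(upper_x)
print(cg_count)
print(real_part)
```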
Built-in data types and operations I/II
• Truth values and Boolean
  – None, True, False (True and False are a special type of integer, with values 1 and 0; None represents "no value")
  – Boolean operations: and, or, not
• Comparisons
  – <, ==, !=, is not, …
• Numeric types
  – Integers and floating point numbers
  – Arithmetic operations: x+y, x/y, abs(x), x**y, …
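The checks below illustrate, with arbitrary sample values, that `bool` is a subclass of `int` while `None` is a separate object, plus a few of the listed arithmetic operations.

```python
# True and False behave like the integers 1 and 0 in arithmetic
true_sum = True + True            # 1 + 1
is_int = isinstance(False, int)   # bool is a subclass of int

# None is not an integer; test for it with "is"
nothing = None
is_none = nothing is None

# A few of the arithmetic operations listed above
power = 2 ** 10
absolute = abs(-7.5)

print(true_sum)
print(is_int)
print(is_none)
```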
Built-in data types and operations II/II
• Sequence types
  – Strings, lists, tuples, …
  – Operations: concatenation (+), len(x), x in s, …
  – Strings with additional methods: capitalize, endswith, find, islower, lstrip, …
• Set types
  – set, frozenset (a frozenset can't be changed: immutable)
  – Operations: len(s), s.issubset(other), s1.union(s2), s.add(x) (set only), …
• Mapping types
  – dict: maps hashable keys to values
  – Operations: del d[key], key in d, …
• …and some more
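A small sketch of the set and dict operations listed above. The three-entry codon table is a hypothetical excerpt for illustration, not a full genetic code.

```python
# Sets: unordered collections without duplicates
bases = set("acgtacgt")           # duplicates collapse to a, c, g, t
purines = frozenset(["a", "g"])   # immutable: has no add() method

bases.add("u")                    # a normal set can grow
is_subset = purines.issubset(bases)
combined = purines.union(["n"])   # returns a new frozenset

# Dictionaries: map hashable keys to values
# (hypothetical three-codon excerpt, for illustration only)
codon_table = {"atg": "M", "tgg": "W", "taa": "*"}
has_start = "atg" in codon_table
del codon_table["taa"]            # remove a key together with its value

print(len(bases))
print(sorted(combined))
print(codon_table)
```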
Arithmetic operations…
• Basic operators are + - / * %
x = 14
y = 3
print "Sum:", x + y
print "Product:", x * y
print "Remainder:", x % y
Output:
Sum: 17
Product: 42
Remainder: 2

x = 5
print "x started as", x
x = x * 2        # Could write x *= 2
print "Then x was", x
x = x + 1        # Could write x += 1
print "Finally x was", x
Output:
x started as 5
Then x was 10
Finally x was 11
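The remainder operator % pairs with floor division //. This sketch, reusing x = 14 and y = 3, checks that quotient and remainder recombine to x. It uses // rather than / because / on two integers gives 4 in Python 2 but 4.666… in Python 3.

```python
x = 14
y = 3

quotient = x // y     # floor division
remainder = x % y     # remainder

# Quotient and remainder always recombine to the original number
recombined = quotient * y + remainder

# divmod() returns both values at once as a tuple
q, r = divmod(x, y)
print(recombined)
```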
…Or interactively
>>> x = 14
>>> y = 3
>>> x + y
17
>>> x * y
42
>>> x % y
2
>>> x = 5
>>> print "x started as", x
x started as 5
>>> x *= 2
>>> print "Then x was", x
Then x was 10
>>> x += 1
>>> print "Finally x was", x
Finally x was 11
>>>
• This way, you can use Python as a calculator
• Can also use += -= /= *=
String operations
• Concatenation: + +=
• Can find the length of a string using the function len(x)
a = "pan"
b = "cake"
a = a + b
print a
Output:
pancake

a = "soap"
b = "dish"
a += b
print a
Output:
soapdish

mcb = "ACGCGT"
print "Length of %s is" % mcb, len(mcb)
Output:
Length of ACGCGT is 6
String formatting
• Strings can be formatted with placeholders for inserted strings (%s) and numbers (%d for integers and %f for floats)
• Use the operator % on strings: FormattedString % InsertionTuple
>>> "aaaa%saaaa%saaa" % ("gcgcgc", "tttt")
'aaaagcgcgcaaaattttaaa'
>>> "A range written like this: (%d - %d)" % (2, 5)
'A range written like this: (2 - 5)'
>>> "Or with preceding 0's: (%03d - %04d)" % (2, 5)
"Or with preceding 0's: (002 - 0005)"
>>> import math
>>> "Rounding floats %.3f" % math.pi
'Rounding floats 3.142'
>>> "Scientific notation: %.3e" % 0.0000002345
'Scientific notation: 2.345e-07'
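As a further sketch, the hypothetical helper `format_hit` combines %s, %d and %.2f placeholders into one tab-separated line, e.g. for reporting where a motif was found. The field layout is an assumption chosen for illustration.

```python
# Hypothetical helper for illustration; the column layout is an assumption
def format_hit(name, start, end, score):
    """Return a tab-separated line: name, start, end, score (2 decimals)."""
    # One placeholder per value; the tuple supplies them left to right
    return "%s\t%d\t%d\t%.2f" % (name, start, end, score)

line = format_hit("MCB", 4, 10, 0.98765)
print(line)
```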
More string operations
x = "A simple sentence"
print x
print x.upper()           # Convert to uppercase
print x.lower()           # Convert to lowercase
xl = list(x)              # Convert the string to a list
xl.reverse()              # Reverse the list
print "".join(xl)         # Join all list members
x = x.replace("i", "a")   # Translate "i"s into "a"s
print x
print len(x)              # Calculate the length of the string
Output:
A simple sentence
A SIMPLE SENTENCE
a simple sentence
ecnetnes elpmis A
A sample sentence
17
Concatenating DNA fragments
dna1 = "accacgt"
dna2 = "taggtct"
print dna1 + dna2
Output:
accacgttaggtct

"Transcribing" DNA to RNA
dna = "accACgttAGGTct"                # DNA string is a mixture of upper & lower case
rna = dna.lower().replace("t", "u")   # Make it all lower case, then replace "t" with "u"
print rna
Output:
accacguuaggucu
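Combining these string methods, a GC-content calculation (the fraction of g and c bases) could look like the sketch below. `gc_content` is a hypothetical helper, and float() keeps the division from being truncated in Python 2.

```python
# Hypothetical helper for illustration
def gc_content(seq):
    """Fraction of G and C bases in a DNA or RNA string (case-insensitive)."""
    seq = seq.lower()
    if len(seq) == 0:
        return 0.0  # assumption: an empty sequence counts as 0% GC
    gc = seq.count("g") + seq.count("c")
    return float(gc) / len(seq)

dna1 = "accacgt"
dna2 = "taggtct"
print(gc_content(dna1 + dna2))
```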
Searching in strings
rna = "accacguuaggucu"
pattern = "cguu"
print rna.find(pattern)
Output:
4
Index positions:
accacguuaggucu
012345678…

rna = "accacguuaggucu"
result = rna.endswith("gg")
print result
Output:
False

rna = "accacguuaggucu"
print "cguu" in rna
Output:
True
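find() only reports the first match. A sketch for listing every match, including overlapping ones, passes the optional start argument of find() inside a while loop; `find_all` is a hypothetical name.

```python
# Hypothetical helper for illustration
def find_all(seq, pattern):
    """Start index of every (possibly overlapping) occurrence of pattern."""
    positions = []
    start = seq.find(pattern)
    while start != -1:                        # find() returns -1 if nothing is found
        positions.append(start)
        start = seq.find(pattern, start + 1)  # resume one position further on
    return positions

rna = "accacguuaggucu"
print(find_all(rna, "c"))
```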
Conditions
Conditional blocks
• The ability to execute an action contingent on some condition is what distinguishes a computer from a calculator. In Python, this looks like this:
x = 149
y = 100
if x > y:
    print x, "is greater than", y
else:
    print x, "is less than or equal to", y
Output:
149 is greater than 100
These indentations tell Python which piece of code is contingent on the condition.
General form:
if condition:
    action
else:
    alternative
Consistent, level-wise indenting is important.
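When there are more than two cases, elif (else-if, listed in the summary) adds further conditions that are tested in order. A minimal sketch; the helper `compare` is hypothetical and returns its message instead of printing it.

```python
# Hypothetical helper for illustration
def compare(x, y):
    """Describe how x relates to y, testing each condition in turn."""
    if x > y:
        return "%s is greater than %s" % (x, y)
    elif x < y:
        return "%s is less than %s" % (x, y)
    else:
        # Only reached when neither condition above was true
        return "%s equals %s" % (x, y)

print(compare(149, 100))
print(compare(20, 20))
```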
Conditional operators
• Numeric: > >= < <= != (does not equal) ==
• The same operators work on strings as alphabetical comparisons (strictly, character by character by character code, so uppercase letters sort before lowercase ones)
x = 5 * 4
y = 17 + 3
if x == y:
    print x, "equals", y
Output:
20 equals 20
Note that the test for "x equals y" is x == y, not x = y
(x, y) = ("Apple", "Banana")   # Shorthand syntax for assigning more than one variable at a time
if y > x:
    print y, "after", x
Output:
Banana after Apple
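A quick check of how string comparison orders words, using arbitrary sample strings: comparison proceeds character by character, and a capital letter such as "Z" (code 90) comes before any lowercase letter such as "a" (code 97).

```python
# Comparison proceeds character by character from the left
apple_first = "Apple" < "Banana"     # "A" < "B"
prefix_first = "Ban" < "Banana"      # a prefix sorts before the longer string

# Uppercase letters have smaller character codes than lowercase ones
upper_first = "Zebra" < "apple"
codes = (ord("Z"), ord("a"))

# To compare case-insensitively, convert both sides first
same_ignoring_case = "ACGT".lower() == "acgt"

print(sorted(["banana", "Apple", "cherry", "Date"]))
```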
Logical operators
• Logical operators: and, or
• The keyword not is used to negate what follows. Thus not x < y means the same as x >= y
• The keyword False (or the value zero, None, or an empty string or container) is used to represent falsehood, while True (or any non-zero number or non-empty value, e.g. 1) represents truth. Thus:
if True:
    print "True is true"
if False:
    print "False is true"
if -99:
    print "-99 is true"
Output:
True is true
-99 is true

x = 222
if x % 2 == 0 and x % 3 == 0:
    print x, "is an even multiple of 3"
Output:
222 is an even multiple of 3
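To see which values count as false, bool() can be applied directly. The sketch also shows that and/or return one of their operands rather than strictly True or False; the lists hold arbitrary sample values.

```python
# bool() shows how a value is treated inside an if or while condition
falsy = [False, 0, 0.0, None, "", [], (), {}]
truthy = [True, -99, 1, "a", [0], (None,)]

all_false = all(not bool(v) for v in falsy)
all_true = all(bool(v) for v in truthy)

# "and" / "or" stop as soon as the result is known and return that operand
first_true = 0 or "default"   # 0 is false, so the right operand is returned
both = 3 and 5                # 3 is true, so the right operand is returned

print(first_true)
```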
Loops
Loops
• Here's how to print out the numbers 0 to 9:
x = 0
while x < 10:
    print x,
    x += 1        # Equivalent to x = x + 1
Output:
0 1 2 3 4 5 6 7 8 9
The indented code is repeatedly executed as long as the condition x < 10 remains true.
• This is a while loop. The code is executed while the condition is true.
A common kind of loop
• Let's dissect the code of the while loop again:
x = 0             # Initialisation
while x < 10:     # Test for completion
    print x,
    x += 1        # Continuation
• Alternatively, the for loop construct iterates through a sequence (e.g. list, tuple, string)
for x in range(10):   # x is the iteration variable; range(10) generates a list [0, 1, …, 9]
    print x,
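To confirm that both loops visit the same values, this sketch collects them into lists instead of printing them.

```python
# Collect the values produced by the while loop ...
while_values = []
x = 0                  # initialisation
while x < 10:          # test for completion
    while_values.append(x)
    x += 1             # continuation

# ... and by the equivalent for loop over range(10)
for_values = []
for x in range(10):    # range() takes care of all three steps
    for_values.append(x)

print(while_values == for_values)
```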
For loop features
• Loops can be used with all iterable types, i.e.: lists, strings, tuples, iterators, sets, file handles
• Step sizes can be specified with the third argument of range() or of a slice (negative values iterate backwards)
>>> for nucleotide in "actgc":
...     print nucleotide,
...
a c t g c
>>> for number in range(0, 50, 7):
...     print number,
...
0 7 14 21 28 35 42 49
>>> for nucleotide in "actgc"[::-1]:
...     print nucleotide,
...
c g t c a
>>> print "HtaEdsLfgLdfOf"[::3]
HELLO
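A typical use of a step size in sequence analysis is splitting a reading frame into codons. A sketch, assuming the sequence starts in frame and that an incomplete trailing codon should be dropped; `codons` is a hypothetical helper.

```python
# Hypothetical helper for illustration
def codons(seq):
    """Split seq into consecutive triplets, ignoring 1-2 leftover bases."""
    triplets = []
    # Step through start positions 0, 3, 6, ... with range's third argument
    for start in range(0, len(seq) - 2, 3):
        triplets.append(seq[start:start + 3])
    return triplets

print(codons("atggcctaa"))
```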
Enumerate
• enumerate is a handy built-in function to track the index when looping over a sequence
• Instead of…
>>> i = 0
>>> for nuc in "actgc":
...     print "%i: %s" % (i+1, nuc)
...     i += 1
...
• We can let enumerate do the work for us…
>>> for i, nuc in enumerate("actgc"):
...     print "%i: %s" % (i+1, nuc)
...
Both versions print:
1: a
2: c
3: t
4: g
5: c
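enumerate() also accepts an optional second argument giving the first index, which removes the need for i+1. A sketch; `numbered` is a hypothetical helper that collects the lines instead of printing them.

```python
# Hypothetical helper for illustration
def numbered(seq):
    """Return "position: item" lines, counting from 1."""
    lines = []
    for i, item in enumerate(seq, 1):   # second argument sets the start index
        lines.append("%i: %s" % (i, item))
    return lines

for line in numbered("actgc"):
    print(line)
```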
Reading Data from Files
• To read from a file, we can conveniently iterate through it line by line with a for loop and the open function.
  – Internally, a file handle is maintained during the loop.
This code snippet opens a file called "sequence.txt" in the current directory, and iterates through it line by line:
for line in open("sequence.txt"):
    print line,    # The comma prevents print's automatic newline (each line already ends with one)
Contents of sequence.txt (the loop prints exactly these lines):
>CG11604
TAGTTATAGCGTGAGTTAGT
TGTAAAGGAACGTGAAAGAT
AAATACATTTTCAATACC
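The file above is in FASTA format: a header line starting with ">" followed by sequence lines. A minimal parsing sketch, assuming every sequence line follows a header and using the first word of the header as the key; `parse_fasta` is a hypothetical name, and for real work Biopython's `Bio.SeqIO` module provides a full parser.

```python
# Hypothetical helper for illustration
def parse_fasta(lines):
    """Map the first word of each FASTA header to its sequence.
    `lines` can be any iterable of lines, e.g. an open file."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()               # drop the trailing newline and spaces
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)     # a sequence may span several lines
    return dict((key, "".join(parts)) for key, parts in chunks.items())

# Usage with the file from the slide:
# with open("sequence.txt") as handle:
#     records = parse_fasta(handle)
example = [">CG11604\n", "TAGTTATAGCGTGAGTTAGT\n",
           "TGTAAAGGAACGTGAAAGAT\n", "AAATACATTTTCAATACC\n"]
print(parse_fasta(example))
```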
How to…
• sum up all the numbers in [3, 7, 10, 4, -1, 0]?
• print all words in ['Oranges', 'Bananas', 'Cucumbers', 'Apples'] starting with an O or A?
• check which elements of [1, 5, -1, 8, 7] are also inside [8, 7, 10, 5, 0, 0]?
• check if the ratio of two integers a and b is larger than 0.5?
• check for all tuples in [(3, 2), (10, 5), (1, -1)] whether the square of the first number can be divided by the second without remainder?
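One possible set of answers, using only the loops and conditions from this lecture. The function names are hypothetical, and the ratio check assumes b is not zero; in Python 2, float() is needed so that a / b is not truncated by integer division.

```python
# Hypothetical helper functions, one per question
def total(numbers):
    """Add up all numbers (the built-in sum() does the same)."""
    result = 0
    for n in numbers:
        result += n
    return result

def words_starting_with_o_or_a(words):
    selected = []
    for word in words:
        if word.startswith("O") or word.startswith("A"):
            selected.append(word)
    return selected

def common_elements(first, second):
    """Elements of first that also occur in second, in the order of first."""
    found = []
    for item in first:
        if item in second:
            found.append(item)
    return found

def ratio_above_half(a, b):
    # float() avoids integer division in Python 2; assumes b != 0
    return float(a) / b > 0.5

def square_divisible(pairs):
    """Pairs (x, y) for which x squared divides evenly by y."""
    matches = []
    for x, y in pairs:
        if x ** 2 % y == 0:
            matches.append((x, y))
    return matches

print(total([3, 7, 10, 4, -1, 0]))
print(words_starting_with_o_or_a(["Oranges", "Bananas", "Cucumbers", "Apples"]))
print(common_elements([1, 5, -1, 8, 7], [8, 7, 10, 5, 0, 0]))
print(ratio_above_half(2, 3))
print(square_divisible([(3, 2), (10, 5), (1, -1)]))
```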
Summary
• Everything in Python is an object
• Built-in data types: numeric, sequences, sets, mappings
• Control structures: loops and conditions
  – for loop & while loop
  – if, else, elif, <, >, ==, !=, etc.
• Python is well suited for string manipulation
  – Lots of string-specific methods