Lecture: Introduction to Bioinformatics - U. Scholz & M. Lange
Slide #3z-1: Practical BLASTing & BLAST Outputs
Slide #3z-2: A Practical Example
1,506 mapped barley genes (query sequences such as ATGCTGTGGCAGCGTGCAGTCCAGTCTCGTACTGCAT)
searched against NRPEP: 2,869,704 annotated proteins
BlastX result: 905 annotations
Runtime: 17.5 h
Slide #3z-3: Solution: Distributing the Analyses
[Figure: the query sequences are divided into groups, distributed across several NRPEP database instances]
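The distribution idea can be sketched in Python: read the query FASTA file and cut it into chunks of at most n sequences, so that each chunk can be searched as a separate job. This is a minimal sketch assuming plain FASTA input; the function names are hypothetical, and it is not a reconstruction of the splitFas2.py helper used in the lecture.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header line without '>', sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_records(records: List[Record], chunk_size: int) -> List[List[Record]]:
    """Cut the record list into consecutive chunks of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def to_fasta(chunk: List[Record], width: int = 70) -> str:
    """Render a chunk back to FASTA, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in chunk:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = ">q1\nACGT\n>q2\nGGCC\nTT\n>q3\nATG\n"
    chunks = split_records(list(read_fasta(demo)), 2)
    for n, chunk in enumerate(chunks):
        print(f"# chunk {n}")
        print(to_fasta(chunk), end="")
```

Each rendered chunk would then be written to its own file and handed to one BLAST job. Chunking by record count (rather than by file size) keeps every sequence intact.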
Slide #3z-4: IPK Cluster BROCKEN
Result: 905 annotations
72 nodes -> runtime: 16 min
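The two runtimes give the speedup and the parallel efficiency directly. This small worked calculation assumes that the 17.5 h figure is the undistributed run and that both numbers are wall-clock times for the same 905-annotation BlastX job.

```python
def speedup(serial_minutes: float, parallel_minutes: float) -> float:
    """Classic speedup: serial runtime divided by parallel runtime."""
    return serial_minutes / parallel_minutes


def efficiency(s: float, nodes: int) -> float:
    """Parallel efficiency: achieved speedup relative to the ideal speedup of `nodes`."""
    return s / nodes


serial = 17.5 * 60   # 17.5 h without distribution, converted to minutes
parallel = 16.0      # 16 min on the 72-node cluster
s = speedup(serial, parallel)
e = efficiency(s, 72)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.0%}")
```

The script reports a speedup of about 65.6 and an efficiency of about 91%. The remaining gap to the ideal 72-fold speedup is plausibly due to job scheduling overhead and uneven chunk runtimes.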
Slide #3z-5: CEF: Cluster Execution Framework

#!/bin/bash
# Project directory on the shared file server (visible to all cluster nodes).
# Assumption: Clones.fasta is located here, and split/ is created here.
projdir=/data/pdw-16/agbi/projects/
cd "$projdir" || exit 1
mkdir -p script result log

# Split the query file; the pieces are expected in split/
python2.3 /data/pdw-20/python_scripts/splitFas2.py -i Clones.fasta -o "$projdir" -n 500

blast_db=$projdir/wheat_consensus.txt
mergescript=$projdir/domerge.sh

# Generate a merge script that later concatenates all partial results
echo '#!/bin/sh' > "$mergescript"
echo "cat \\" >> "$mergescript"

z=0
for i in split/*
do
    # One job script, result file and log file per chunk; $$ (PID of this shell) keeps names unique per run
    script_file=$projdir/script/blastjob_$$_$z.sh
    result_file=$projdir/result/blastresult_$$_$z.txt
    log_file=$projdir/log/joblog_$$_$z

    echo '#!/bin/sh' > "$script_file"
    echo "/usr/bin/blastall -i $projdir/$i -p blastn -d $blast_db -m0 -e 1E-10 -v 10 -b 10 -o $result_file" >> "$script_file"
    echo "$result_file \\" >> "$mergescript"

    # Submit the job to the "long" queue and log the submitted command
    qsub -o "$log_file.out" -e "$log_file.err" -q long "$script_file"
    echo "qsub -o $log_file.out -e $log_file.err -q long $script_file"

    z=$((z + 1))
done

# Finish the merge script: write the combined output, then clean up
echo ">final_result.txt" >> "$mergescript"
echo "rm log/* script/*" >> "$mergescript"
Architecture diagram:
file server /data/pdw-16/
file server /data/pdw-20/
metadata about tools (NCBI BLAST, Spidey, …), tool parameters (-i FASTA query, …), files (FASTA, blastable, …) and jobs/sub jobs (progress, finished, …)
…
master/head node pdw-22
22 nodes
CEF GUI, CEF SOAP web services
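The job bookkeeping listed above, with jobs split into sub jobs and each sub job carrying a progress state, can be modelled with a few small classes. This is a hypothetical sketch. The class names, the Status values and the progress/finished methods are illustrative assumptions, not CEF's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    """Hypothetical lifecycle states of a sub job."""
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class SubJob:
    chunk_file: str                # e.g. one split FASTA file
    result_file: str               # where this sub job writes its BLAST output
    status: Status = Status.QUEUED


@dataclass
class Job:
    tool: str                      # e.g. "blastall"
    parameters: Dict[str, str]     # e.g. {"-p": "blastn", "-e": "1E-10"}
    sub_jobs: List[SubJob] = field(default_factory=list)

    def progress(self) -> float:
        """Fraction of sub jobs that have finished (0.0 for a job without sub jobs)."""
        if not self.sub_jobs:
            return 0.0
        done = sum(1 for s in self.sub_jobs if s.status is Status.FINISHED)
        return done / len(self.sub_jobs)

    def finished(self) -> bool:
        """A job counts as finished once every one of its sub jobs has finished."""
        return bool(self.sub_jobs) and all(s.status is Status.FINISHED for s in self.sub_jobs)


job = Job("blastall", {"-p": "blastn", "-e": "1E-10"},
          [SubJob(f"split/part{i}", f"result/part{i}.txt") for i in range(4)])
for s in job.sub_jobs[:3]:
    s.status = Status.FINISHED
print(job.progress(), job.finished())
```

With three of four sub jobs finished, this prints 0.75 and False. A GUI or web service could poll exactly this kind of aggregate value instead of inspecting every sub job.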
Slide #3z-6: CEF: APEX GUI
Slide #3z-7: Input EST Sequence
>HY01A03T
GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT
AGCCAACTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGCGTCGACAACAAGTTCGAGAAG
GGCGACGAGATCAGGGCGCAGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATAGACGTCT
GGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTACAAGCAGGTCTTCGACCT
GGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTCCACCAGTGCGGTGGCAACGTCGGCGAC
GTAGTCAACATCCCCATCCCACAGTGGGTGCGGGATGTCGGCGCTACCGACCCCGACATTTTCTACACGA
ACCGCAGAGGGACGAGGAACATCGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAG
AACTGCCGTCCAGATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCC
GGTACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTATCCTCAGA
GCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTACCTGGAAGCAGACTTCAA
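BlastX compares a nucleotide query against a protein database by translating the query in all six reading frames. The following is a minimal sketch of the three forward frames using the standard genetic code, where frame +k starts at nucleotide k (1-based). The function names are illustrative. The reverse frames (-1 to -3) would additionally require the reverse complement.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate codon by codon: '*' marks stop codons, 'X' marks codons with
    ambiguous bases, and a trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def forward_frame(seq: str, frame: int) -> str:
    """Translate forward reading frame +1, +2 or +3 (frame +k starts at base k, 1-based)."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    return translate(seq[frame - 1:])


# First 70 bases of the EST HY01A03T shown above
est_start = "GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT"
print(forward_frame(est_start, 3))
```

In frame +3, the translation of this first line ends in MAGNM. The ATG coding for that M sits at nucleotides 54-56, matching the start of the BlastX alignment (Query 54, Frame = +3) on the next slide.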
Slide #3z-8: BlastN Result

>HY01A03T
        Length = 700

  Plus Strand HSPs:

 Score = 2595 (395.4 bits), Expect = 3.0e-112, P = 3.0e-112
 Identities = 573/618 (92%), Positives = 573/618 (92%), Strand = Plus / Plus

Query:    77 CTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGC--GT-CGACAACAAGTT 133
             ||| ||| | | || |  |  | |  || ||   ||||  | |  || ||| ||
Sbjct:    89 CTACGTC-ATG-CTCCCGCTGGATGTCG-TGAGCGTCGACAACAAGTTCGAGAAGGGCGA 145

Query:   134 CGAGA--AGGGCGACGAGATCAGGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 191
             |||||  |||||| | || | | |||||||||||||||||||||||||||||||||||||
Sbjct:   146 CGAGATCAGGGCG-C-AGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 203

Query:   192 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 251
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   204 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 263

Query:   252 AAGCAGGTCTTCGACCTGGTACACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 311
             |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Sbjct:   264 AAGCAGGTCTTCGACCTGGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 323

Query:   312 CACCCCGTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 371
             ||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   324 CACCA-GTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 382

Query:   372 GGATGTCGGCGCTACCGACCCCGACATTTTCCACACGAACCTCAGAGGGACGAGGAACAT 431
             ||||||||||||||||||||||||||||||| ||||||||| ||||||||||||||||||
Sbjct:   383 GGATGTCGGCGCTACCGACCCCGACATTTTCTACACGAACCGCAGAGGGACGAGGAACAT 442

Query:   432 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 491
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   443 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 502

Query:   492 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 551
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   503 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 562

Query:   552 TACCATCGTGGACA---A-GTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 607
             ||||||||||||||   | |||||||||||||||||||||||||||||||||||||||||
Sbjct:   563 TACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 622

Query:   608 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 667
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   623 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 682

Query:   668 CCTGGAAGCAGACTTCAA 685
             ||||||||||||||||||
Sbjct:   683 CCTGGAAGCAGACTTCAA 700
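The identity figures follow directly from the alignment columns. This is a minimal sketch that assumes gapped query and subject rows of equal length, as in the BlastN output above. It rebuilds the nucleotide midline ('|' for identical bases) and counts identities. The reported percentages are consistent with truncating rather than rounding: 573/618 = 92.7% is shown as 92%, and 191/215 = 88.8% appears as 88% in the BlastX result.

```python
from typing import Tuple


def midline(query: str, sbjct: str) -> str:
    """BlastN-style midline: '|' where both rows show the same base, blank otherwise."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return "".join("|" if q == s and q != "-" else " " for q, s in zip(query, sbjct))


def identities(query: str, sbjct: str) -> Tuple[int, int]:
    """Return (identical columns, alignment length); gap columns count towards the length."""
    line = midline(query, sbjct)
    return line.count("|"), len(line)


def percent(matches: int, length: int) -> int:
    """Integer percentage, truncated (e.g. 573/618 -> 92)."""
    return 100 * matches // length


# First block of the BlastN alignment above
q = "CTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGC--GT-CGACAACAAGTT"
s = "CTACGTC-ATG-CTCCCGCTGGATGTCG-TGAGCGTCGACAACAAGTTCGAGAAGGGCGA"
print(midline(q, s))
print(identities(q, s))
print(percent(573, 618))
```

Summing the identical columns over all eleven blocks in the same way gives the 573 of 618 columns reported in the header.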
Slide #3z-9: BlastX Result
>dbj|BAC83773.1| putative beta-amylase [Oryza sativa (japonica cultivar-group)]
 gb|EAZ40178.1| hypothetical protein OsJ_023661 [Oryza sativa (japonica cultivar-group)]
Length=488

 Score = 403 bits (1036), Expect = 4e-111
 Identities = 191/215 (88%), Positives = 200/215 (93%), Gaps = 0/215 (0%)
 Frame = +3

Query  54   MAGNMLANYVQVYVMLPLDVVSVDNKFEKGDEIRAQLKKLTEAGVDGVMIDVWWGLVEGK  233
            MAGN+LANYVQV VMLPLDVV+VDNKFEK DE RAQLKKLTEAGVDGVM+DVWWGLVEGK
Sbjct  1    MAGNLLANYVQVNVMLPLDVVTVDNKFEKVDETRAQLKKLTEAGVDGVMVDVWWGLVEGK  60

Query  234  GPKAYDWSAYKQVFDLVHEARLKLQAIMSFHQCGGNVGDVVNIPIPQWVRDVGATDPDIF  413
            GP +YDW AYKQ+F LV EA LKLQAIMSFHQCGGNVGD+VNIPIPQWVRDVGA+DPDIF
Sbjct  61   GPGSYDWEAYKQLFRLVQEAGLKLQAIMSFHQCGGNVGDIVNIPIPQWVRDVGASDPDIF  120

Query  414  YTNRRGTRNIEYLTLGVDDQPLFHGRTAVQMYHDYMASFRENMKKFLDAGTIVDIEVGLG  593
            YTNR G RNIEYLTLGVDDQPLFHGRTA+QMY DYM SFRENM +FLD G IVDIEVGLG
Sbjct  121  YTNRGGARNIEYLTLGVDDQPLFHGRTAIQMYADYMKSFRENMAEFLDTGVIVDIEVGLG  180

Query  594  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  698
            PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF
Sbjct  181  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  215
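The Score and Expect values in both outputs are linked by the standard Karlin-Altschul statistics. Here S is the raw score, λ and K are parameters of the scoring system, and m and n are the effective query and database lengths. Only S and the bit score are printed on the slides; the other quantities are not shown.

```latex
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m \, n \, 2^{-S'} = K \, m \, n \, e^{-\lambda S}, \qquad
P = 1 - e^{-E}
```

For the BlastX hit, S = 1036 corresponds to S' = 403 bits. Because P is practically equal to E whenever E is small, the BlastN output reports the same value, 3.0e-112, for both Expect and P.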