Please fill out the teaching survey

13.1 Please fill out the teaching survey

The survey opens on 06.01.13 and can be filled out for three weeks.

The survey will be available in the personal information system.

print "It is your chance to give us feedback\n";

use strict;

13.2

BioPerl

12.3 BioPerl

BioPerl is a collection of Perl modules for bioinformatics applications.

It is an active open source software project founded in 1996.

BioPerl provides software modules for many of the typical bioinformatics tasks. Among these are:

  • Format conversions
  • Manipulating biological sequences
  • Searching for similar sequences
  • Creating sequence alignments
  • Searching for ORFs on genomic DNA
  • And more…

12.4 BioPerl

BioPerl modules are called Bio::XXX

The BioPerl wiki has documentation and examples of how to use them, which is the best way to learn:

http://bio.perl.org/

We recommend beginning with the "How-tos":

http://www.bioperl.org/wiki/HOWTOs

For a more hard-core inspection of BioPerl modules:

BioPerl 1.6.1 Module Documentation

12.5 Object-oriented use of packages

Many packages are meant to be used as objects.

In Perl, an object is a data structure that can use subroutines that are associated with it.

We will not learn object-oriented programming, but we will learn how to create and use objects defined by BioPerl packages.

[Diagram: the variable $obj holds a reference (0x225d14) to an object with the associated subroutines func() and anotherFunc()]
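To make the idea of "a data structure with subroutines associated with it" concrete, here is a minimal sketch in plain Perl. The package ToySeq and its subroutines are hypothetical names invented for this illustration; they are not part of BioPerl, and BioPerl's own objects are built in a more elaborate way.

```perl
use strict;
use warnings;

# A toy package (hypothetical, not part of BioPerl) whose objects hold a DNA string.
package ToySeq;

# Constructor: stores the arguments in a hash and associates ("blesses")
# that hash with this package, which turns it into an object.
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => $args{seq} };
    return bless $self, $class;
}

# Subroutines associated with the object.
# The object itself always arrives as the first argument.
sub id         { my ($self) = @_; return $self->{id}; }
sub seq_length { my ($self) = @_; return length($self->{seq}); }

package main;

my $obj = ToySeq->new(id => "demo1", seq => "ATGCCGTA");

# The arrow calls a subroutine "on" the object...
print "ID: ", $obj->id(), "\n";
print "Length: ", $obj->seq_length(), "\n";

# ...which you could think of as an ordinary call with the object as first argument.
print "Same length: ", ToySeq::seq_length($obj), "\n";
```

When using BioPerl we only need the last part: create an object with new and call its subroutines with the arrow.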

12.6 BioPerl: the SeqIO module

BioPerl modules are named Bio::xxxx

The Bio::SeqIO module deals with sequence input and output.

We pass the file name and the format as arguments to the new subroutine:

use Bio::SeqIO;
my $in = Bio::SeqIO->new("-file"   => "<seq.gb",    # file argument (file name as it would be in open)
                         "-format" => "GenBank");   # format argument

A list of all the sequence formats BioPerl can read is in:

http://www.bioperl.org/wiki/HOWTO:SeqIO#Formats

[Diagram: the variable $in holds a reference (0x25e211) to an object with the subroutines next_seq() and write_seq()]

12.7 BioPerl: the SeqIO module

use Bio::SeqIO;
my $in = Bio::SeqIO->new("-file"   => "<seq.gb",
                         "-format" => "GenBank");
my $seqObj = $in->next_seq();

next_seq() returns the next sequence in the file as a Bio::Seq object (we will talk about them soon).

$in->next_seq() performs the next_seq() subroutine on $in. You could think of it as: SeqIO::next_seq($in)
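To watch next_seq() step through a file record by record, the following sketch reads two made-up FASTA records from an in-memory filehandle, so no file on disk is needed. It assumes BioPerl is installed; the sequences and names are invented for the illustration, and it uses the "-fh" argument (an already opened filehandle) in place of "-file".

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Made-up FASTA text with two records, read through an in-memory filehandle.
my $fasta = ">seq1 first demo record\nATGCCGTA\n"
          . ">seq2 second demo record\nGGGTTTAAACCC\n";
open(my $fh, "<", \$fasta) or die "Cannot open in-memory FASTA";

my $in = Bio::SeqIO->new("-fh" => $fh, "-format" => "Fasta");

# Each call to next_seq() returns the next record, or undef after the last one.
my $first  = $in->next_seq();
my $second = $in->next_seq();
my $third  = $in->next_seq();

print "First: ",  $first->id(),  "\n";
print "Second: ", $second->id(), "\n";
print "No third record\n" unless defined $third;
```

Because next_seq() returns undef at the end of the input, it can be used directly as the condition of a while loop.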

12.8 BioPerl: the SeqIO module

use Bio::SeqIO;
my $in  = Bio::SeqIO->new("-file"   => "<adeno12.gb",
                          "-format" => "GenBank");
my $out = Bio::SeqIO->new("-file"   => ">adeno12.out.fas",
                          "-format" => "Fasta");
my $seqObj = $in->next_seq();
while ( defined($seqObj) ) {
    $out->write_seq($seqObj);
    $seqObj = $in->next_seq();
}

write_seq() writes a Bio::Seq object to $out according to its format.

12.9 BioPerl: the Seq module

use Bio::SeqIO;
my $in = Bio::SeqIO->new("-file"   => "<Ecoli.prot.fasta",
                         "-format" => "Fasta");
my $seqObj = $in->next_seq();
while (defined($seqObj)) {
    print "ID:".$seqObj->id()."\n";           # 1st word in header
    print "Desc:".$seqObj->desc()."\n";       # rest of header
    print "Sequence:".$seqObj->seq()."\n";    # seq string
    print "Length:".$seqObj->length()."\n";   # seq length
    $seqObj = $in->next_seq();
}

You can read more about the Bio::Seq subroutines in:

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object
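The Bio::Seq subroutines can also be tried without an input file, by creating a Bio::Seq object directly. The sketch below assumes BioPerl is installed; the ID, description and sequence are made-up values, and subseq() is one further Bio::Seq subroutine not shown on the slide.

```perl
use strict;
use warnings;
use Bio::Seq;

# Made-up example record (not taken from the course files).
my $seqObj = Bio::Seq->new("-id"   => "demo_prot",
                           "-desc" => "hypothetical example protein",
                           "-seq"  => "MSTTAWQKCLGLLQDEFSAQ");

print "ID: ",     $seqObj->id(),     "\n";
print "Desc: ",   $seqObj->desc(),   "\n";
print "Length: ", $seqObj->length(), "\n";

# subseq(start, end) returns part of the sequence as a string;
# coordinates are 1-based and inclusive.
print "First 5 aa: ", $seqObj->subseq(1, 5), "\n";
```

Note that subseq() counts from 1, unlike Perl's substr(), which counts from 0.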

12.10 Print last 30aa of each sequence (no BioPerl)

open(my $in, "<", "seq.fasta") or die "Cannot open seq.fasta...";
my $fastaLine = <$in>;
while (defined $fastaLine) {
    chomp $fastaLine;
    my $header = "";
    # Read first word of header
    if ($fastaLine =~ m/^>(\S*)/) {
        $header = $1;
        $fastaLine = <$in>;
    }
    # Read seq until next header
    my $seq = "";
    while ((defined $fastaLine) and (substr($fastaLine,0,1) ne ">")) {
        chomp $fastaLine;
        $seq = $seq.$fastaLine;
        $fastaLine = <$in>;
    }
    # Print last 30aa
    my $subseq = substr($seq,-30);
    print "$header\n"."$subseq\n";
}
close($in);

12.11 Now using BioPerl

use Bio::SeqIO;
my $in = Bio::SeqIO->new("-file"=>"<seq.fasta","-format"=>"Fasta");
my $seqObj = $in->next_seq();
while (defined($seqObj)) {
    # Read first word of header
    my $header = $seqObj->id();
    # Print last 30aa
    my $seq = $seqObj->seq();
    my $subseq = substr($seq,-30);
    print "$header\n";
    print "$subseq\n";
    $seqObj = $in->next_seq();
}

Note: BioPerl warnings of the form

Subroutine ... redefined at ...

should not trouble you. This is a known issue: it is not your fault and it won't affect your script's performance.

13.13 Class exercise 12a

1. Use Bio::SeqIO to read a FASTA file and print to an output FASTA file only sequences shorter than 3,000 bases. (use the EHD nucleotide FASTA from the webpage)

2. Use Bio::SeqIO to read a FASTA file, and print (to the screen) header lines that contain the words "Mus musculus".

3. Write a script that uses Bio::SeqIO to read a GenPept file and convert it to FASTA. (use preProInsulinRecords.gp from the course’s webpage)

4*. Same as Q1, but print to the FASTA the reverse complement of each sequence. (Do not use the reverse or tr// functions! BioPerl can do it for you - read the BioPerl documentation).

12.14 BioPerl: downloading files from the web

The Bio::DB::GenBank module allows us to download a specific record from the NCBI website:

use Bio::DB::GenBank;
my $gb = Bio::DB::GenBank->new;
my $seqObj = $gb->get_Seq_by_acc("J00522");
print $seqObj->seq();

See more options in:

http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database

http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html

12.15

BLAST

Congrats, you just sequenced yourself some DNA.

And you want to see if it exists in any other organism#$?!?

12.16

BLAST

BLAST helps you find similarity between your

sequence and other sequences

BLAST: Basic Local Alignment Search Tool

12.19

BLAST

[Diagram: a query is compared against a database; the labels mark the query, a hit, and a high scoring pair (HSP)]

12.21 BioPerl: reading BLAST output

First we need to have the BLAST results in a text file BioPerl can read.

Here is one way to achieve this (using NCBI BLAST):

[Screenshot: NCBI BLAST results page with the "Download" and "Text" options marked]

An alternative is to use BLASTALL on your computer.

12.22 BioPerl: reading BLAST output

Query:

Query= gi|52840257|ref|YP_094056.1| chromosomal replication initiator
       protein DnaA [Legionella pneumophila subsp. pneumophila str.
       Philadelphia 1]
       (452 letters)

Database: Coxiella.faa
          1818 sequences; 516,956 total letters

Searching..................................................done

Results info:

                                                                    Score   E
Sequences producing significant alignments:                         (bits)  Value

gi|29653365|ref|NP_819057.1| chromosomal replication initiator p...   633   0.0
gi|29655022|ref|NP_820714.1| DnaA-related protein [Coxiella burn...    72   4e-14
gi|29654861|ref|NP_820553.1| Holliday junction DNA helicase B [C...    32   0.033
gi|29654871|ref|NP_820563.1| ATPase, AFG1 family [Coxiella burne...    27   1.4
gi|29654481|ref|NP_820173.1| hypothetical protein CBU_1178 [Coxi...    25   3.1
gi|29654004|ref|NP_819696.1| succinyl-diaminopimelate desuccinyl...    25   3.1

12.23 BioPerl: reading BLAST output

Results info (continued):

gi|215919162|ref|NP_820316.2| threonyl-tRNA synthetase [Coxiella...    25   5.3
gi|29655364|ref|NP_821056.1| transcription termination factor rh...    24   9.0
gi|215919324|ref|NP_821004.2| adenosylhomocysteinase [Coxiella b...    24   9.0
gi|29653813|ref|NP_819505.1| putative phosphoribosyl transferase...    24   9.0

Result header:

>gi|29653365|ref|NP_819057.1| chromosomal replication initiator protein [Coxiella burnetii RSA 493]
          Length = 451

High scoring pair (HSP) data:

 Score = 633 bits (1632), Expect = 0.0
 Identities = 316/452 (69%), Positives = 371/452 (82%), Gaps = 5/452 (1%)

HSP alignment:

Query: 1   MSTTAWQKCLGLLQDEFSAQQFNTWLRPLQAYMDEQR-LILLAPNRFVVDWVRKHFFSRI 59
           + T+ W KCLG L+DE   QQ+NTW+RPL A   +Q  L+LLAPNRFV+DW+ + F +RI
Sbjct: 3   LPTSLWDKCLGYLRDEIPPQQYNTWIRPLHAIESKQNGLLLLAPNRFVLDWINERFLNRI 62

Query: 60  EELIKQFSGDDIKAISIEVGSKPVEAVDTPAETIVTSSSTAPLKSAPKKAVDYKSSHLNK 119
            EL+ + S D    I +++GS+  E     +       + AP        + +  +++N
Sbjct: 63  TELLDELS-DTPPQIRLQIGSRSTEMPTKNSHEPSHRKAAAPPAGT---TISHTQANINS 118

Query: 120 KFVFDSFVEGNSNQLARAASMQVAERPGDAYNPLFIYGGVGLGKTHLMHAIGNSILKNNP 179
            F FDSFVEG SNQLARAA+ QVAE PG AYNPLFIYGGVGLGKTHLMHA+GN+IL+ +
Sbjct: 119 NFTFDSFVEGKSNQLARAAATQVAENPGQAYNPLFIYGGVGLGKTHLMHAVGNAILRKDS 178

Note: There could be more than one HSP for each result, in case of homology in different parts of the protein.

12.24 Bio::SearchIO: reading BLAST output

The Bio::SearchIO module can read and parse BLAST output:

use Bio::SearchIO;
my $blast_report = Bio::SearchIO->new("-file"   => "<LegCox.blastp",
                                      "-format" => "blast");
my ($resultObj, $hitObj, $hspObj);
while( defined($resultObj = $blast_report->next_result()) ) {
    print "Checking query ".$resultObj->query_name()."\n";
    while( defined($hitObj = $resultObj->next_hit()) ) {
        print "Checking hit ".$hitObj->name()."\n";
        $hspObj = $hitObj->next_hsp();
        print "Best score: ".$hspObj->score()."\n";
    }
}

(See the BLAST output example on the course's website)
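The same nested loop can be used to filter the report. As a sketch, assuming BioPerl is installed, the subroutine below prints only the hits whose E-value does not exceed a threshold. The subroutine name print_significant_hits and the threshold 1e-5 are arbitrary choices made for this illustration; significance() is the hit subroutine that returns the E-value listed for the hit in the summary table.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Print, for every query in a BLAST report, the hits whose E-value
# is at most $max_evalue. Returns the number of hits printed.
sub print_significant_hits {
    my ($file, $max_evalue) = @_;
    my $report = Bio::SearchIO->new("-file"   => "<$file",
                                    "-format" => "blast");
    my $count = 0;
    while (defined(my $resultObj = $report->next_result())) {
        print "Query: ", $resultObj->query_name(), "\n";
        while (defined(my $hitObj = $resultObj->next_hit())) {
            # Skip hits that are less significant than the threshold
            next if $hitObj->significance() > $max_evalue;
            print "  ", $hitObj->name(), "\t", $hitObj->significance(), "\n";
            $count++;
        }
    }
    return $count;
}

# Usage: run on the course's example file if it is in the current directory.
if (-e "LegCox.blastp") {
    my $n = print_significant_hits("LegCox.blastp", 1e-5);
    print "$n significant hits\n";
}
```

Putting the loop in a subroutine keeps the file name and the threshold as parameters, so the same code can be reused on other BLAST reports.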

12.25 BioPerl: reading BLAST output

You can send parameters to the subroutines of the objects:

# Get length of HSP (including gaps)
$hspObj->length("total");

# Get length of hit part of alignment (without gaps)
$hspObj->length("hit");

# Get length of query part of alignment (without gaps)
$hspObj->length("query");

See more about what you can do with query, hit and HSP objects in:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Table_of_Methods

13.26 Class exercise 12b

1. Use Bio::SearchIO to parse the BLAST results (LegCox.blastp, provided on the course's website):

a) For each query print out its name and the name of its first hit.

b*) Print the percent identity of each HSP of the first hit of each query.

c*) Print the e-value of each HSP of the first hit of each query.