SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING
Y.Ph. Kartavtsev, A.V. Zhirmunsky Institute of Marine Biology of Far Eastern Branch of Russian Academy of Sciences, Vladivostok 690041, Russia, e-mail: [email protected]

Page 1: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Y.Ph. Kartavtsev

A.V. Zhirmunsky Institute of Marine Biology of Far Eastern Branch of Russian Academy of Sciences, Vladivostok 690041, Russia, e-mail: [email protected]

Page 2: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

MAIN QUESTIONS

1. Sequence editing and registration in GenBank.
2. Data formats and available gene banks.
3. Sequence alignment.
4. Finding an optimal model of nucleotide substitution.
5. Tree building with the software package MEGA-3 (MEGA-4).
6. Notes on PAUP, MrBayes and some other programs.

Page 3: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

APPLICABILITY OF DIFFERENT DNA TYPES IN PHYLOGENETICS AND TAXONOMY

[Figure: applicability of DNA types (nDNA, rDNA; spacers ITS-1, 2; mtDNA) across taxonomic ranks: Species, Genus, Family, Order, Class, Phylum. Legend: most statistically substantiated results; statistically significant results.]

Page 4: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

MATERIAL AND METHODS

1. DNA isolation
2. PCR DNA amplification
3. Determination of the primary nucleotide sequence
4. Phylogenetic analysis

Page 5: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

1. SEQUENCE EDITING AND REGISTRATION IN GENBANK, NCBI (1)

The original sequence obtained from a sequencing machine requires editing. Many editing requirements are met by program packages (PP) such as MEGA-3 or MEGA-4 (http://www.megasoftware.net/), GeneDOC (http://www.nrbsc.org/) etc. The most suitable PP for primary editing is Chromas (Chromas-pro, available at http://www.flu.org.cn/en or http://www.technelysium.com.au/chromas.html). The current version (Chromas-pro 2.31) offers a number of editing options:

Opens chromatogram files from Applied Biosystems and Amersham MegaBace DNA sequencers.
Opens SCF format chromatogram files created by ALF, Li-Cor, Visible Genetics OpenGene, Beckman CEQ 2000XL and CEQ 8000, and other sequencers.
Views Genescan genotype files.
Saves in SCF or Applied Biosystems format.
Prints chromatograms with options to zoom or fit to one page.
Exports sequences in plain text, formatted with base numbering, FASTA, EMBL, GenBank or GCG formats.
Copies the sequence to the clipboard in plain text or FASTA format for pasting into other applications.
Exports sequences from batches of chromatogram files, with automatic removal of vector sequence.
Reverses and complements the sequence and chromatogram.
Searches for sequences by exact matching or optimal alignment.
Displays translations in 3 frames along with the sequence.
Copies an image of a chromatogram section for pasting into documents or presentations.

Page 6: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

1. SEQUENCE EDITING AND REGISTRATION IN GENBANK, NCBI (2)

The main tasks that Chromas can perform are comparison of sequences, removal of vector sequences at the beginning and end of the chains, inversion of the anti-parallel sequences (chains), creation of a consensus sequence, and recording of all information in a form convenient for further calculations.

Fig. 1.1 presents a view of sequences in the Chromas PP editor.

Fig. 1.1. A graphic and symbolic representation of a sequence fragment of the cytochrome oxidase 1 (Co-1) gene in the flounder Liopsetta pinifasciata.

Sequencing was performed on an ABI-3100 (Applied Biosystems, USA) machine. Four replicate sequences obtained with different primers (1K_F2 etc., left) are shown as peaks together with their letter translation. After inversion and complementation of the anti-parallel chains (1KR1_L_p, 1K_R2 etc.), the sequences were aligned automatically. The consensus sequence being edited is shown above. Chromatogram lines and the letters of the four nucleotides are shown in different colors for better visual perception.
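The two operations described for Fig. 1.1, complementing and reversing the anti-parallel reads and then deriving a consensus, can be illustrated outside Chromas. The following is only a minimal sketch, not the algorithm Chromas uses: the function names are hypothetical, the reads are assumed to be already aligned and of equal length, and the consensus is a plain majority vote that writes N where the top count is tied.

```python
from collections import Counter

# Complement table for the four bases plus IUPAC ambiguity codes.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVNacgtrykmswbdhvn",
    "TGCAYRMKSWVHDBNtgcayrmkswvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus(reads):
    """Majority-vote consensus of equal-length, already aligned reads.

    A column whose most frequent symbol is tied with another is written as 'N'.
    """
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    out = []
    for column in zip(*reads):
        counts = Counter(base.upper() for base in column).most_common()
        top_base, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        out.append("N" if tied else top_base)
    return "".join(out)


if __name__ == "__main__":
    forward = "ACGTTGCA"
    # A reverse read covers the same fragment from the opposite strand.
    reverse_read = reverse_complement(forward)
    print(reverse_complement(reverse_read))           # back to the forward orientation
    print(consensus(["ACGT", "ACGA", "ACGT"]))
```

In practice a chromatogram editor also weighs peak quality when resolving disagreements; the vote above ignores quality entirely.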

Page 7: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

1. SEQUENCE EDITING AND REGISTRATION IN GENBANK, NCBI (3)

After editing in Chromas or any other editor, the nucleotide sequence has to be registered in a gene bank. For registration of single genes or their segments the BankIt utility is convenient. This utility lets you submit a sequence, or a set of sequences, in interactive mode; the sequences first receive preliminary codes and, after checking, GenBank accession numbers. Fig. 1.2 shows part of the information provided on request at the GenBank site.

Fig. 1.2. Fragment of the GenBank window.

Data are shown for the complete mtDNA genome of one flatfish species (Pleuronectiformes).

Page 8: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

2. DATA FORMAT AND GENE BANKS AVAILABLE

The submitted sequences become accessible for general use after an agreed date, usually after 1 year and publication of a paper. A particular sequence is accessible in different formats: GenBank, FASTA etc. In the first case it looks as shown below (Fig. 2.1).

        1 gtgcctgagc cggaatagtc ggggacaggc ctaagtctgc tcattcgagc agagctaagc
       61 caacctgggt gctctcctgg gagacgacca aatttataac gtaatcgtca ccgcacacgc
      121 ctttgtaata atcttcttta tagtaatacc aattatgatn cggagggttc ggaaactgac
      181 ttattccatt aataattggg gcccccgnat atggccttcc ctcgaataaa taacatgagt
      241 ttctgacttc tacccccatc ctttctcctc cttctagcct cttcaggncg tcgaagctgg
      301 ggcagggaca ggatgaaccg tgtatccccc actagctgga aatctagcac acgccggagc
      361 atcggtagac ctcaccattt tctctcttca ccttgccgga atttcatcaa ttctaggggc
      421 aatcaacttt attactacta tcatcaacat gaaaccaaca gcagtcacta tgtaccaaat
      481 cccactattt gtctgagccg tactaatcac cgcacgtcct tcttcttctt tcacactacc
      541 acgtcactgg ccgctggcat tacaatgcta ctgactagac cgcaacacta aacacaaaca
      601 cttctttgac cctgcyg

Fig. 2.1. Partial nucleotide sequence of the Co-1 gene in the flounder Pseudopleuronectes obscurus. The left column gives the position number of the first nucleotide in each row. Nucleotides are grouped by 10, with 60 per row. Other information in the NCBI window was shown above (Fig. 1.2).
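To move between the two formats mentioned above, the numbered rows of Fig. 2.1 can be converted to FASTA by dropping the position numbers and the spaces between the groups of 10. The sketch below is only an illustration under that assumption; the function names and the record name "Co1_fragment" are hypothetical, and the sample holds just the first two rows of the figure.

```python
def origin_to_sequence(text):
    """Join a GenBank-style numbered sequence block into one plain string.

    Tokens made only of digits are treated as row position numbers and skipped.
    """
    groups = []
    for line in text.splitlines():
        for token in line.split():
            if not token.isdigit():
                groups.append(token)
    return "".join(groups)


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with `width` symbols per line."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


# First two rows of Fig. 2.1, used here only as sample input.
SAMPLE = """
        1 gtgcctgagc cggaatagtc ggggacaggc ctaagtctgc tcattcgagc agagctaagc
       61 caacctgggt gctctcctgg gagacgacca aatttataac gtaatcgtca ccgcacacgc
"""

if __name__ == "__main__":
    sequence = origin_to_sequence(SAMPLE)
    print(to_fasta("Co1_fragment", sequence))  # record name is hypothetical
```

Ambiguity symbols such as n or y in the full sequence pass through unchanged, since only purely numeric tokens are discarded.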

For sequence registration, three widely recognized gene banks are available: NCBI (USA), DDBJ (Japan), and EMBL (EU). These three banks are connected and exchange data. Thus, having registered (submitted) a sequence, for instance in GenBank (http://www.ncbi.nlm.nih.gov), the author is granted confidentiality from unwanted access for an agreed period, after which the sequences become available to any Internet user.

You are also free to submit your data to the European DNA bank, EMBL (http://www.ebi.ac.uk/embl/), or to the DNA Data Bank of Japan, DDBJ (http://www.ddbj.nig.ac.jp/searches-e.html). There are also local DNA data banks, e.g. the Japan Center of BioResources, RIKEN (http://www.brc.riken.jp/lab/dna/en/), the Nordic Gene Bank, NGB (http://www.ngb.se) etc.

Page 9: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

3. SEQUENCE ALIGNMENT (1)

Sequence alignment is a very important procedure that precedes quantitative analysis of the sequences, including calculation of similarity and distance measures, homology estimates, and finally the building of molecular phylogenetic trees (dendrograms). There are several alignment algorithms, implemented in different sequence processors (editors). For brevity we consider here only the alignment performed by CLUSTAL W, a program adapted for Windows.

For the alignment you should first load the sequences into the editor. There are 3 ways to do this: (1) typing the nucleotide sequences directly, one by one, into successive windows of the editor; (2) importing the sequences from a previously prepared file; and (3) copying a sequence via the clipboard from another editor into the CLUSTAL W window. Fig. 3.1 shows the interface of the CLUSTAL W editor (Thompson et al. 1994) integrated with MEGA, before (A) and after (B) alignment.

[Fig. 3.1, panel A]

Page 10: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

3. SEQUENCE ALIGNMENT (2)

Fig. 3.1. Windows of the CLUSTAL W alignment editor (Alignment Explorer) in MEGA, with fragments of Cyt-b gene nucleotide sequences from several fish species before (A) and after (B) alignment.

Similar sites are shown in the same color. An asterisk marks sites with 100% nucleotide homology, i.e. sites where the nucleotides are identical in all sequences of the set. Other identifiers (lab codes or GenBank accession numbers) are given after the species names.

[Fig. 3.1, panel B]
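The asterisk row described for Fig. 3.1 can be reproduced for any set of aligned sequences. The sketch below is a simplified illustration rather than the CLUSTAL W code: the function name is hypothetical, the input sequences are invented, and a column containing a gap is assumed never to receive an asterisk.

```python
def conservation_line(aligned):
    """Return a string with '*' under columns identical in all sequences.

    `aligned` is a list of equal-length strings; gaps are written as '-'.
    Columns that contain a gap are left unmarked.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    marks = []
    for column in zip(*aligned):
        residues = {symbol.upper() for symbol in column}
        fully_identical = len(residues) == 1 and "-" not in residues
        marks.append("*" if fully_identical else " ")
    return "".join(marks)


if __name__ == "__main__":
    # Invented toy alignment, not data from the figure.
    alignment = ["ACG-T", "ACGAT", "ATGAT"]
    for row in alignment:
        print(row)
    print(conservation_line(alignment))
```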

Page 11: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

3. SEQUENCE ALIGNMENT (3)

In the above case the sequences were loaded via the clipboard (Fig. 3.1). After starting MEGA-3 (MEGA-4), we can choose in the main menu: Alignment → Alignment Explorer/CLUSTAL → Create a new alignment.

The last step actually offers 3 possibilities: Create a new alignment, Open a saved alignment session, and Retrieve sequence from a file.

When the sequences are loaded, the user usually faces a dimension problem: the sequences are of unequal length and their starts and ends do not match; moreover, many sequences have deletions/insertions (gaps), which do not coincide among individuals and species. Alignment solves all these problems.

Page 12: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

3. SEQUENCE ALIGNMENT (4)

Technically, to start CLUSTAL W you have to select all sequences and run the “Alignment” option of the main menu. A special dialog box then appears (Fig. 3.2). Fig. 3.2 shows two dialog boxes with the settings used for the alignment, which proceeds in two steps.

Fig. 3.2. Dialog boxes of the CLUSTAL W editor integrated in MEGA, which allow alignment to be performed in an appropriate, user-specified mode. The open windows are for setting the penalty options (Penalties) for pairwise alignment (Pairwise Parameters) and multiple alignment (Multiple Parameters).

Page 13: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

3. SEQUENCE ALIGNMENT (5)

Pressing the execute button (OK) runs the alignment. Alignment is a delicate art and may take patience. Different sets of sequences require specific, empirically chosen penalty values for the best alignment results.

The alignment algorithm is such that an increase of the penalty score produces an increase in gaps (caused by deletions and insertions, as we remember) and high homology of the remaining part of the nucleotide (or other) sequences. However, penalties that are too large lead to the loss of some fraction of nucleotides that are actually homologous but are represented only at certain sites of the sequences. Our own and other authors' experience with mtDNA nucleotide sequences shows that penalties in the range 15-30 for gap opening and 0.5-8 for gap extension are quite satisfactory for the first step of the alignment.

When the CLUSTAL W program has finished [it was run with the settings shown in our example (Fig. 3.2, A): Gap Opening Penalty of 15 units and Gap Extension Penalty of 5 units, for both the pairwise and the multiple alignment steps], a window appears containing the homologously placed (aligned) sequences with gaps, which look like blank spaces with dashes (Fig. 3.3). The largest gaps appear at this step, and the sequences look as shown in Fig. 3.3.
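To make the two penalty settings concrete, the cost charged for a single gap under a simple affine scheme can be computed directly. This is a minimal sketch under a stated assumption: the cost is taken as opening penalty plus extension penalty times gap length, whereas programs differ in the exact convention and CLUSTAL W additionally adjusts penalties by position and sequence context. The function name is hypothetical; the default values 15 and 5 are the settings quoted above.

```python
def affine_gap_cost(gap_length, gap_open=15.0, gap_extend=5.0):
    """Cost of one contiguous gap under a simple affine scheme.

    Assumed convention: cost = gap_open + gap_extend * gap_length.
    A gap of length 0 costs nothing.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gap_open + gap_extend * gap_length


if __name__ == "__main__":
    # One long gap is cheaper than two short gaps of the same total length,
    # because the opening penalty is paid only once.
    print(affine_gap_cost(6))                        # single gap of 6 positions
    print(affine_gap_cost(3) + affine_gap_cost(3))   # two gaps of 3 positions
```

The comparison in the usage lines shows why the opening penalty mainly controls how many separate gaps are introduced, while the extension penalty controls how long each one may grow.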

Page 14: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

3. SEQUENCE ALIGNMENT (6)

Fig. 3.3. Window of the CLUSTAL W editor in MEGA showing fragments of Cyt-b gene nucleotide sequences after executing the “Alignment” option and completing the first step of the alignment.

Gaps (blank spaces with dashes) in the aligned sequences are visible. After gap removal the sequences take the final form shown in Fig. 3.1, B.

The sequences are inspected and large gaps are removed manually. Gaps can be removed by means of editor (processor) software. After the first step the CLUSTAL W dialog box is opened again and the alignment is started with decreased penalty values (Fig. 3.2, B). When the program finishes, all gaps have been removed and the resulting file is in an appropriate format for further examination.

Page 15: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

4. FINDING AN OPTIMAL MODEL OF NUCLEOTIDE SUBSTITUTION (1)

To choose the model most suitable for a particular empirical data set you need some tool. The MODELTEST 3.06 program (Posada, Crandall, 1998) and later versions 3.6 - 3.7 are very convenient for that. I cannot present information about the models here, but you can easily learn about model properties from the program manual and from the literature (Nei, Kumar, 2000; Hall, 2001; Sanderson, Shaffer, 2003; Felsenstein, 2004); there is also brief information in my book (Kartavtsev, 2005). To use MODELTEST you first have to learn the PAUP PP, because this program uses some PAUP modules. Work with the program is basically simple and includes 5 steps.

Page 16: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

4. FINDING AN OPTIMAL MODEL OF NUCLEOTIDE SUBSTITUTION (2)

1. First, make a working file in Nexus (.nex) format with the nucleotide sequences and the necessary identifiers of the program parameters, in accordance with PAUP requirements;

2. Next, go to the MODELTEST website, download all recommended modules, and copy the file “modelblockPAUPb10.txt”, which is distributed with MODELTEST (it suits PAUP version 4b10 for Windows), into the Nexus file made before;

3. Then run the previously installed PAUP 4b10 (it is better to rename the original data file) and execute the working file;

4. When the program stops normally, a new file named “model.scores” will appear in the same directory (folder) from which the working file was executed;

5. Now run MODELTEST (version 3.7 is best) from a DOS window, preferably from the directory that contains the executable file “modeltest3.7.win.exe”. The command line is as follows: “modeltest3.7.exe < model.scores > test.out” (the output file may have an arbitrary name).
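Step 1 of the list above asks for a working file in Nexus format. As a rough illustration of what such a file contains, the sketch below writes a minimal DATA block from aligned sequences. It is an assumption-laden example, not a PAUP requirement list: the function name, taxon labels and sequences are invented, labels are assumed to contain no spaces, and any PAUP-specific command blocks still have to be added by hand.

```python
def to_nexus(aligned):
    """Build a minimal NEXUS DATA block from a dict {taxon_label: sequence}.

    All sequences must have the same length (i.e. be already aligned).
    Taxon labels are assumed to contain no spaces or punctuation.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    nchar = lengths.pop()
    pad = max(len(name) for name in aligned) + 2
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        "  DIMENSIONS NTAX=%d NCHAR=%d;" % (len(aligned), nchar),
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name, seq in aligned.items():
        lines.append("    " + name.ljust(pad) + seq)
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented labels and sequences, for illustration only.
    example = {"taxon_A": "ACGT-ACGT", "taxon_B": "ACGTTACGA"}
    print(to_nexus(example))
```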

The output file presents all necessary information, including the parameters of the one or two best-fit models among the 57 estimated model types; testing is performed by the Maximum Likelihood (ML) algorithm and by the Akaike Information Criterion.
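The Akaike Information Criterion mentioned above is computed as AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the smallest AIC is preferred. The sketch below only illustrates that arithmetic. The log-likelihood values and parameter counts are invented for the example and are not MODELTEST output, and the function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(models):
    """Return the name of the model with the smallest AIC.

    `models` maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(models, key=lambda name: aic(*models[name]))


if __name__ == "__main__":
    # Invented values for illustration only; real values come from the program output.
    candidates = {
        "JC": (-2100.0, 0),
        "HKY+G": (-2050.0, 5),
        "GTR+I+G": (-2048.0, 10),
    }
    for name, (lnl, k) in candidates.items():
        print(name, aic(lnl, k))
    print("best:", best_by_aic(candidates))
```

In this invented example the richest model gains only two log-likelihood units for five extra parameters, so the criterion prefers the intermediate one.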

Page 17: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5. TREE BUILDING WITH SOFTWARE PACKAGE MEGA-3 (MEGA-4) (1)

Options and model parameters, as well as the models themselves, for calculating molecular phylogenetic trees are provided by different programs: PAUP* (Swofford, 2000), MEGA-3 (MEGA-4) (Kumar et al., 1993; 2000) etc. The book by Hall (2001) is a very good manual for molecular phylogenetic analysis. This manual focuses mainly on PAUP*. However, the book also gives worked examples and recommendations for the PP CLUSTAL X, MrBayes etc.

Analytical work in MEGA-3 and MEGA-4 may begin right after the alignment is completed. Close the saved file in the Alignment Explorer (it has the extension .mas). A window then appears with the notice: “Save data to MEGA file: Yes, No, Cancel”. Choosing “Yes” opens the next window with the file name ready to be saved on the hard disk. By default the file name is the same as that of the alignment file, but with a different extension: “.meg”. By choosing Save we start the MEGA PP itself. Before the meg-file is opened for execution, you have to indicate in the window that opens what kind of sequence is processed: “Protein-coding nucleotide sequence data”, with the alternatives Yes or No. Finally, a dialog box appears with the question “Open Data File in MEGA”: Yes, No. Choosing Yes gives the MEGA working file, followed by the opening of a special editor, the “Sequence Data Explorer” (Fig. 5.1).

Page 18: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5.5. TREE BUILDING WITH SOFTWARE TREE BUILDING WITH SOFTWARE PACKAGEPACKAGE MEGAMEGA-3 (-3 (MEGAMEGA-4) (-4) (22))

Fig. 5.1. View of the working file in MEGA-3 (MEGA-4) with the Sequence Data Explorer opened.

Dots mark nucleotides identical to those in the first sequence. Undefined nucleotides are denoted by R, T, M, W.
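The dot notation of Fig. 5.1 can be sketched in a few lines of Python. This is only an illustration, not MEGA's own code: the function name `dot_display` and the toy sequences are invented here, and the sketch assumes that the first row serves as the reference and that all rows have the same aligned length.

```python
def dot_display(sequences):
    """Return alignment rows in which sites identical to the first row are dots.

    Assumption: sequences are already aligned and of equal length;
    the first sequence is used as the reference row.
    """
    reference = sequences[0]
    rows = [reference]
    for seq in sequences[1:]:
        # Keep the character only where it differs from the reference.
        row = "".join("." if a == b else a for a, b in zip(seq, reference))
        rows.append(row)
    return rows


# Toy aligned sequences (invented for illustration); R is an ambiguity code.
aligned = ["ATGCGTAC", "ATGCRTAC", "ATACGTAT"]
for row in dot_display(aligned):
    print(row)
```

Only the differing sites remain visible, which makes variable positions and ambiguity codes easy to spot by eye.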

Page 19: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5. TREE BUILDING WITH SOFTWARE PACKAGE MEGA-3 (MEGA-4) (3)

Closing the Sequence Data Explorer returns us to the main menu of MEGA. The main menu contains the following options: File, Data, Distances, Phylogeny, Pattern, Selection, Alignment. The Alignment option was considered earlier (see 5.3). There are two more options in the main menu (Windows, Help), whose functions are obvious. The main menu starts with the File option, which allows several operations on files (Fig. 5.2).

Fig. 5.2. The main menu window of MEGA-3 (MEGA-4) with its options. The dialog box for the File option is opened, showing some of its functions. The command line below gives the location of the working file (Data File) on the disk and the task title (Title).

Page 20: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5. TREE BUILDING WITH SOFTWARE PACKAGE MEGA-3 (MEGA-4) (4)

Fig. 5.4. The main menu window of MEGA-3 (MEGA-4) with its options. The dialog box for the Distances option is opened, showing several functions.

Distances → Choose Model: (1) Pattern among Lineages, either Same (Homogeneous) or Different (Heterogeneous); (2) Rates among Sites. The Phylogeny option allows an appropriate model to be chosen.

Page 21: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5. TREE BUILDING WITH SOFTWARE PACKAGE MEGA-3 (MEGA-4) (5)

The next option in the main menu is Phylogeny (Fig. 5.5). Its actions, Construct Phylogeny or Bootstrap Test of Phylogeny, give access to four different tree-building programs. From top to bottom these are: (1) Neighbor Joining (NJ), (2) Minimum Evolution, (3) Maximum Parsimony and (4) UPGMA. Comments.

Page 22: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5. TREE BUILDING WITH SOFTWARE PACKAGE MEGA-3 (MEGA-4) (6)

Fig. 5.5. The main menu window of MEGA-3 (MEGA-4) with its options. The dialog boxes of Phylogeny and Bootstrap Test of Phylogeny are opened; the submenu shows the main trees that can be built: (1) Neighbor Joining (NJ), (2) Minimum Evolution, (3) Maximum Parsimony and (4) UPGMA.

Page 23: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5. TREE BUILDING WITH SOFTWARE PACKAGE MEGA-3 (MEGA-4) (7)

Tree building: Bootstrap Test of Phylogeny → Neighbor Joining → Analysis Preferences → Phylogeny Test of Evolution (options: Bootstrap, Replications = 1000 and Random Seed = 20044 (a random number)), Model (K2P, Fig. 5.6). Run the Compute option. The resulting tree appears in the TreeExplorer (Fig. 5.7).
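To make the K2P (Kimura two-parameter) model chosen above more concrete, here is a minimal Python sketch of the pairwise distance it defines, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is not MEGA's implementation: the function name `k2p_distance` is invented here, and skipping sites that contain gaps or ambiguity codes (pairwise deletion) is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Assumption: sites where either sequence has a gap or an ambiguity
    code are skipped (pairwise deletion).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites: the corrected distance exceeds the raw 0.1.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The correction grows with divergence, which is why the model matters more for distant sequences than for close ones.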

Fig. 5.6. The main menu window of MEGA-3 (MEGA-4) with its options. The dialog box contains: Bootstrap Test of Phylogeny → Neighbor Joining → Phylogeny Test of Evolution.

Page 24: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

5. TREE BUILDING WITH SOFTWARE PACKAGE MEGA-3 (MEGA-4) (8)

Fig. 5.7. TreeExplorer of MEGA-3 (MEGA-4).

An NJ tree file is opened. Drosophila species are at the tips of the branches. The tree is built on nucleotide sequences of the Mdh gene (MEGA, Examples). The branch-length scale is at the bottom. Numbers at the nodes are bootstrap support levels (%).
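The bootstrap support levels at the nodes come from resampling alignment columns with replacement and asking how often a grouping reappears. The Python sketch below illustrates only that resampling idea. It does not build NJ trees: as a loudly simplified stand-in, the hypothetical helper `first_two_are_closest` checks whether the first two sequences are each other's nearest pair by raw difference count. All names and the toy data are invented for illustration; the defaults of 1000 replications and seed 20044 echo the settings mentioned in the slides.

```python
import random


def bootstrap_alignment(sequences, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(sequences[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in columns) for seq in sequences]


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def first_two_are_closest(seqs):
    """Toy stand-in for 'the tree contains the group (seq0, seq1)'."""
    d01 = hamming(seqs[0], seqs[1])
    others = [
        hamming(seqs[i], seqs[j])
        for i in range(len(seqs))
        for j in range(i + 1, len(seqs))
        if (i, j) != (0, 1)
    ]
    return d01 < min(others)


def bootstrap_support(sequences, has_group, replications=1000, seed=20044):
    """Percentage of bootstrap replicates in which has_group(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(
        has_group(bootstrap_alignment(sequences, rng)) for _ in range(replications)
    )
    return 100.0 * hits / replications


# Invented toy alignment: the first two sequences are nearly identical.
toy = ["ACGTACGTAC", "ACGTACGTAT", "TGCATGCATG", "GTACGTACGT"]
print(bootstrap_support(toy, first_two_are_closest))
```

In a real analysis the `has_group` callback would rebuild the tree from each replicate and test for the clade of interest; fixing the seed makes the percentage reproducible.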

Page 25: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

6. ANNOTATION ON PAUP, MRBAYES AND SOME OTHER PROGRAMS

Other widely used PP are PAUP 4.0, MrBayes, PHYLIP etc. PAUP 4.0 (Swofford, 2002): Macintosh. This version of PAUP 4.0 is explained in Hall (2001; 2003). For Windows there is PAUP 4.0b10. PAUP 4.0 is a very important tool (MODELTEST!). Its main programs: Maximum Likelihood (ML), NJ and MP trees. The stability of tree quality in PAUP is good. Computation time for ML is a weak point of PAUP: 67 sequences of Cyt-b (Kartavtsev et al., 2007a) took 3 weeks. There is PAUP for Linux/Unix.

MrBayes (Huelsenbeck, Ronquist, 2001; Ronquist, Huelsenbeck, 2003) is a relatively small PP. It is very effective: the set of 67 sequences was processed in 2 days. Bayesian trees are MCMC-based trees. MrBayes provides other opportunities, for example phylogenetic trees based on morphology. MrBayes is not able to draw a tree; the PP TreeView (Page, 1996) is necessary to view a tree and build a consensus tree.

The PP PHYLIP (Felsenstein, 1995) is a very good tool too. Its theoretical background is sound (Felsenstein, 2004). PHYLIP makes it possible to build the main trees. Its interface is for DOS and is not very convenient.

Page 26: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

THANKS!

Page 27: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Few Terms

[Figure: a tree with the following parts labelled]
Terminal taxa: A B C D E F G H
Outgroup
Ingroup
Sister groups
Root
Nodes (speciation events)
Internal nodes
Branches

Page 28: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Dichotomy and Polychotomy

[Figure: three tree topologies for taxa A, B, C, D, E]
Unresolved or star-like topology
Partly unresolved topology
Polytomy and multifurcations
Fully resolved bifurcation tree
Bifurcation

Page 29: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Unrooted Tree

[Figure: unrooted tree with tips Chimp, Monkey, Fly, Rice, Cabbage]
An unrooted tree does not allow us to speak about the direction of change or about ancestors and descendants.

Page 30: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Rooted Tree

On a rooted tree one can infer ancestor-descendant relationships. The exact estimate of a common hypothetical ancestor depends on where the root is placed.

[Figure: unrooted tree with tips Human, Monkey, Mosquito, Rice, Spinach and a branch labelled "If rooted here"; the resulting rooted tree with tips Spinach, Rice, Mosquito, Monkey, Human and the Root]

Page 31: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Difference between the Species Tree and the Gene Tree: the Gene Duplication Case

[Figure: species tree for Species A, Species B, Species C; gene tree for genes a, b, c]

Page 32: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Reproductive Isolation

Shortly after speciation, the sister taxa are highly likely to exhibit a polyphyletic gene-tree status.

After about 4N generations, sister taxa appear reciprocally monophyletic with high probability.

Page 33: SOME TRAINING ON NUCLEOTIDE SEQUENCES: EDITION, REGISTRATION, ALIGNMENT AND TREE BUILDING

Sequence Submission to GenBank (NCBI)

What are Barcodes?

Barcodes are short nucleotide sequences from a standard genetic locus for use in species identification. Currently, the Barcode sequence being accepted for animals is a 5' 650 base pair region of the mitochondrial cytochrome oxidase subunit I (COI) gene.

What does the Barcode Submission tool do?

The Barcode Submission tool provides for streamlined online submission of Barcode sequences into GenBank. With this tool, one can:

submit new Barcode sets
complete your most recent incomplete submission
download a flat file summary of completed submissions

How does the Barcode Submission tool work with My NCBI?

My NCBI is a central place to customize NCBI Web services. The Barcode Submission tool associates your Barcode submissions with your My NCBI user name and remembers your contact information to expedite future Barcode submissions. Barcode also associates your most recent incomplete submission with your My NCBI username so that if you're interrupted while submitting a Barcode set, you can complete the submission later.

To register for My NCBI, follow the link at the bottom of this page to Sign in to Use Barcode Submission Tool and click register for an account on the My NCBI Sign In page. Read My NCBI Help for more information about My NCBI.

In order to ensure that the My NCBI user currently using the Barcode Submission tool is the person submitting the Barcode set, you will be prompted for your My NCBI user name and password before you begin a Barcode submission.

What is needed to submit a Barcode set?

A My NCBI Account (register on My NCBI Sign In page)
A web browser that supports both JavaScript and cookies
The title of a published or in-press paper that discusses the Barcode Set
A text file of the set of nucleotide sequences in FASTA format
The names or sequences of forward and reverse primers
A tab-delimited table of source modifier data for the set
A text file of the set of protein sequences in FASTA format (optional)
A tab-delimited table of trace attributes and a compressed archive containing the traces (optional)
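Before a submission it can help to check that the nucleotide file really is in FASTA format, that is, a header line starting with ">" followed by one or more sequence lines per record. The Python sketch below is a minimal reader for that layout, not part of any NCBI tool: the function name `parse_fasta`, the record names and the sequences are all invented for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence.

    The ID is taken as the first word after '>'; sequence lines of a
    record are joined and upper-cased.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("header line without a sequence ID")
            name = words[0]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line.upper())
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


# Hypothetical two-record file content.
example = """>specimen1 hypothetical record
ATGCGTAC
ggta
>specimen2
ATGC
"""
for seq_id, seq in parse_fasta(example).items():
    print(seq_id, len(seq))
```

Listing each ID with its length gives a quick sanity check, for example against the roughly 650 base pair COI region mentioned above.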