RNA 3D Structure Prediction with NAST


Xinpei Liu

刘欣培

Content

1. Background
2. Introduction to NAST
3. Test Simulations with NAST
4. Effect of Secondary Structure
5. System Consistency

Background

RNA folding vs. protein folding

RNA 3D structure prediction tools:

• Manual
• Automatic
• Full atomic
• Coarse grained
• Physics based
• Knowledge based

Introduction to NAST

Nucleic Acid Simulation Toolkit (NAST)

• Funded by the Simbios National Center for Biomedical Computing
• A knowledge-based, coarse-grained tool for modeling RNA structures. It produces a diverse set of plausible 3D structures that satisfy user-provided constraints based on:
  1. Primary sequence
  2. Known or predicted secondary structure
  3. Known or predicted tertiary contacts (optional)

Requirements:

• Python 2.6.x
• PyOpenMM 2.0.0 (3.0.0 won't work!)

https://simtk.org/home/nast

Jonikas MA, Radmer RJ, Laederach A, Das R, Pearlman S, Herschlag D, Altman RB. Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters. RNA. 2009 Feb;15(2):189-99.

Advantages

• Provides information about the likely topology of a molecule
• Provides a good starting point for higher-resolution atomic models
• Can handle large molecules (> 76 nt)
• Much faster than full-atomic simulation tools
  • 1,000,000 steps within 138 s
• Tolerates uncertainty in the secondary structure (within a certain level)


How to use NAST?

• Primary Sequence File
  • Go to http://www.rnasoft.ca/strand/
  • Search for your structure and get a BPSEQ file
  • Use the "parseBPseq.py" file in the package to generate a sequence file
• Secondary Structure File
  • Use a secondary structure prediction tool
  • e.g., McGenus: http://eole2.lsce.ipsl.fr/ipht/tt2ne/mcgenus.php
• Tertiary Contacts File (optional)
  • From experiments or phylogenetic analysis
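To make the first step concrete, here is a minimal sketch of reading a BPSEQ file, where each data line holds a residue index, a base, and the index of its pairing partner (0 if unpaired). The function name `parse_bpseq` and the toy hairpin are illustrative assumptions; this is standalone Python 3, not the "parseBPseq.py" script shipped with NAST, and it does not reproduce NAST's own input file layout.

```python
def parse_bpseq(text):
    """Parse BPSEQ text into (sequence, pairs).

    Hypothetical helper for illustration only; it is not NAST's parseBPseq.py.
    Data lines look like "index base partner", with partner == 0 when unpaired.
    """
    sequence = []
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and header lines that are not "int base int".
        if len(fields) != 3 or not fields[0].isdigit() or not fields[2].isdigit():
            continue
        index, base, partner = int(fields[0]), fields[1], int(fields[2])
        sequence.append(base)
        # Record each pair once, from its 5' side.
        if partner > index:
            pairs.append((index, partner))
    return "".join(sequence), pairs


# Toy 8-nt hairpin (made-up data, not a real STRAND entry).
example = """\
Filename: toy.bpseq
1 G 8
2 G 7
3 C 0
4 A 0
5 A 0
6 A 0
7 C 2
8 C 1
"""

seq, pairs = parse_bpseq(example)
print(seq)
print(pairs)
```

The sequence string and the pair list correspond to the primary-sequence and secondary-structure inputs described above; how they must be written to disk for NAST is left to the package's own scripts.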


Test Molecule Used

PDB ID: 1ZIH (389 atoms, 12 residues)

Simulations: 1ZIH from an Unfolded Circle State (1,000,000 steps)

Definition of q value

q is a normalized measure of similarity between a reference and comparison structure:

Simulations: 1ZIH from an Unfolded Circle State (1,000,000 steps)

[Plots: RMSD; q value (ref.: crystal structure)]

RMSD: mean 2.683, SD 0.449
q value (ref.: crystal structure): mean 0.250, variance 0.00686

RMSD: mean 2.704, SD 0.454
q value: mean 0.246, variance 0.00686

Reference value:

Definition of GDT_TS Score

The Global Distance Test Total Score (GDT_TS) of Cα atoms is used to assess the correctness of the predicted model. GDT_TS has been commonly used in modeling studies and in the CASP community. GDT_TS is defined as:

$$\mathrm{GDT\_TS} = \frac{GDT_1 + GDT_2 + GDT_4 + GDT_8}{4N}$$

where N is the total number of residues of a target; GDT_d is the number of aligned residues whose Cα-atom distance between the native structure and the predicted model is less than d Å after superposition of the two structures; and d is 1, 2, 4, and 8 Å.

• Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003;31:3370-3374.
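The definition above can be sketched in a few lines of Python. This sketch assumes the two structures are already superposed and given as equal-length lists of one (x, y, z) coordinate per residue; the function name `gdt_ts` and the toy coordinates are illustrative. It evaluates a single fixed superposition and returns the score as a percentage, so it is a simplification and not the LGA program cited above.

```python
import math


def gdt_ts(ref_coords, model_coords, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS (in percent) for two pre-superposed, residue-aligned structures.

    Simplifying assumption: one fixed superposition is used; no search over
    alternative superpositions is performed.
    """
    if len(ref_coords) != len(model_coords) or not ref_coords:
        raise ValueError("structures must be non-empty and of equal length")
    n = len(ref_coords)
    # Per-residue distance between reference and model positions (angstroms).
    dists = [math.dist(a, b) for a, b in zip(ref_coords, model_coords)]
    # GDT_d: number of residues closer than d angstroms, summed over cutoffs.
    total = sum(sum(1 for d in dists if d < c) for c in cutoffs)
    return 100.0 * total / (len(cutoffs) * n)


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
ref = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
model = [(0, 0, 0.5), (1, 0, 1.5), (2, 0, 3.0), (3, 0, 9.0)]
score = gdt_ts(ref, model)
print(score)  # (1 + 2 + 3 + 3) / (4 * 4) = 56.25
```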

Simulations: 1ZIH from an Unfolded Circle State (1,000,000 steps)

GDT_TS: mean 57.656%, SD 7.223%

Test Molecule Used

PDB ID: 4JF2 (1829 atoms, 77 residues)

Simulations: 4JF2 from an Unfolded Circle State (1,000,000 steps)

Simulations: 4JF2 from an Unfolded Circle State (1,000,000 steps)

[Plots: RMSD; q value]

RMSD: mean 11.830, SD 2.591
q value: mean 0.128, variance 0.000788

Reference value: RMSD 10.3 ± 2.3

q value: mean 0.125, variance 0.000964

Simulations: 4JF2 from an Unfolded Circle State (1,000,000 steps)

GDT_TS: mean 9.620%, SD 4.681%

Reference value: 14% ± 5% (the best cluster)

Effect of Secondary Structure: 1ZIH from Crystal Structure (1,000,000 steps)

• RMSD
  • Without Secondary Structure Constraints: mean 8.364, SD 1.710
  • With Secondary Structure Constraints: mean 2.704, SD 0.454

Effect of Secondary Structure: 1ZIH from Crystal Structure (1,000,000 steps)

• q value (ref: crystal)
  • Without Secondary Structure Constraints: mean 0.130, variance 0.00458
  • With Secondary Structure Constraints: mean 0.246, variance 0.00686

Effect of Secondary Structure: 4JF2 from Crystal Structure (1,000,000 steps)

• RMSD
  • Without Secondary Structure Constraints: mean 22.860, SD 4.798
  • With Secondary Structure Constraints: mean 11.378, SD 1.176

Effect of Secondary Structure: 4JF2 from Crystal Structure (1,000,000 steps)

• q value (ref: crystal)
  • Without Secondary Structure Constraints: mean 0.0761, variance 0.00136
  • With Secondary Structure Constraints: mean 0.125, variance 0.000964

Effect of Secondary Structure

• Simulations with different percentages of wrong pairs in the secondary structure (600,000 steps)

Wrong pairs   Mean     Std.
0%            6.0969   2.5971
15%           5.4951   1.8054
25%           4.4746   1.2748
35%           2.6558   2.0450
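One way such inputs could be prepared is sketched below: a chosen fraction of the native base pairs is re-paired to residues that were originally unpaired. The slides do not say how the wrong pairs were actually generated, so the function `corrupt_pairs`, its re-pairing rule, and the toy pair list are all assumptions made for illustration.

```python
import random


def corrupt_pairs(pairs, n_residues, fraction, seed=0):
    """Return a copy of `pairs` in which about `fraction` of them are wrong.

    Hypothetical scheme: each selected pair keeps its 5' residue, which is
    re-paired with a residue that was unpaired in the native structure.
    """
    rng = random.Random(seed)
    result = list(pairs)
    n_wrong = round(fraction * len(result))
    paired = {i for pair in result for i in pair}
    free = [i for i in range(1, n_residues + 1) if i not in paired]
    if n_wrong > len(free):
        raise ValueError("not enough unpaired residues to build wrong pairs")
    rng.shuffle(free)
    for k in rng.sample(range(len(result)), n_wrong):
        i, _ = result[k]
        new_partner = free.pop()  # each free residue is used at most once
        result[k] = (min(i, new_partner), max(i, new_partner))
    return result


# Toy 20-nt hairpin with four native pairs (made-up data).
native = [(1, 20), (2, 19), (3, 18), (4, 17)]
noisy = corrupt_pairs(native, n_residues=20, fraction=0.25)
changed = sum(a != b for a, b in zip(native, noisy))
print(noisy, changed)
```

With four native pairs and a fraction of 0.25, exactly one pair is replaced; fixing `seed` keeps the corrupted structure reproducible across runs.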

System Consistency: 1ZIH from an Unfolded Circle State

Reference Model: structure resulting from the simulation started from the crystal structure (1,000,000 steps)
q value: mean 0.3969, variance 0.02268

Reference Model: Crystal Structure (1,000,000 steps)
q value: mean 0.246, variance 0.00686

System Consistency: 4JF2 from an Unfolded Circle State

Reference Model: structure resulting from the simulation started from the crystal structure (1,000,000 steps)
q value: mean 0.223, variance 0.00276

Reference Model: Crystal Structure
q value: mean 0.128, variance 0.000788

Conclusion

• Folding results from NAST provide a basic idea of the structure for a given sequence.
• A small proportion of mistakes in the secondary structure does not substantially change the folding result, but this holds only up to a certain level.
• A simulation is more likely to produce a fold that resembles other simulated models (run for the same number of steps) than the crystal structure.
• More tests with GDT_TS may be needed.

Any Questions or Comments?