
Assoc. Prof. · 2019. 9. 5.


  • Assoc. Prof. Dr. Kiattawee Choowongkomon, E-mail: [email protected], Phone: 085-555-1480

  • Assoc. Prof. Kiattawee Choowongkomon, Ph.D.

    Department of Biochemistry

    Faculty of Science

    Kasetsart University

    Email: [email protected]

    Mobile Phone: 085-555-1480


  • Central Dogma of Biochemistry

  • Central Paradigm of Bioinformatics

  • 3 billion base pairs => 6 G letters &

    1 letter => 1 byte

    The whole genome can be recorded in just 10 CD-ROMs!

    A genome is the complete set of genetic material, including all genes, of a living thing.

    In 2003, the human genome sequencing was completed.

    The human genome contains about 3 billion base pairs.

    The number of genes is estimated to be between 20,000 and 25,000.

    The difference between the human genome and the chimpanzee genome is only 1.23%!
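
    The storage estimate on this slide can be checked with a few lines of arithmetic. This is a rough sketch: the 650 MB CD-ROM capacity is an assumption (the slide does not state one), and the 2-bit packing at the end is an extra observation, not part of the slide.

```python
# Back-of-the-envelope storage estimate for the human genome.
BASE_PAIRS = 3_000_000_000      # about 3 billion base pairs (from the slide)
LETTERS = 2 * BASE_PAIRS        # two letters per base pair -> 6 G letters
BYTES_PER_LETTER = 1            # 1 letter => 1 byte (from the slide)
CD_ROM_BYTES = 650 * 10**6      # assumption: one CD-ROM holds 650 MB


def cds_needed(total_bytes: int, cd_bytes: int = CD_ROM_BYTES) -> int:
    """Number of whole CD-ROMs needed (ceiling division)."""
    return -(-total_bytes // cd_bytes)


total_bytes = LETTERS * BYTES_PER_LETTER
print(total_bytes / 10**9, "GB")            # 6.0 GB
print(cds_needed(total_bytes), "CD-ROMs")   # 10 CD-ROMs

# With only four possible letters, 2 bits per letter would be enough.
packed_bytes = LETTERS * 2 // 8
print(packed_bytes / 10**9, "GB when packed at 2 bits per letter")
```

    Under these assumptions the result matches the slide's "10 CD-ROMs".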

  • Biological Diversity

    Bacteria

    Fruit Fly

    Human

    Yeast

  • Escherichia coli, Methanococcus jannaschii, Yeast, Human

                                    E. coli   M. jannaschii   S. cerevisiae   H. sapiens
    Characterized experimentally       2046              97            3307        10189
    Characterized by similarity        1083            1025            1055        10901
    Unknown, conserved                  285             211            1007         2723
    Unknown, no similarity              874             411             966         7965

  • Proteins and Diseases

  • Proteins and Diseases (2)

    Distribution of PDB structure according to diseases

  • Protein Structure and Function

    Proteins are polymers consisting of amino acids linked by covalent peptide bonds.

    Native conformation: the folded form in which a protein has biological activity.

    Because proteins are complex, their structure is described at four levels.

  • Major examples of protein

    functions

    ⚫Binding

    ⚫Catalysis

    ⚫Molecular switches

    ⚫Structural components

  • Protein Function and Architecture

    Binding

    Catalysis

    Switching

    Structure

  • DNA

    Protein

    Nucleotides

    sequence

    Gene expression = Protein production

  • Bioinformatics: storage of biological

    information

    DNA (Genome)

    RNA (Transcriptome)

    Protein (Proteome)

  • Computational applications for

    DNA (genome)

    DNA Simple sequence analysis

    Database searching

    Pairwise analysis

    Regulatory analysis

    Gene Finding

    Whole genome annotation and analysis

    Comparative genomic analysis

  • Information of DNA (genome)

    DNA sequence:

    Bases: A, G, T, C

    Coding or non-coding sequence?

    Contain regulatory elements?

    >gi|109637932:950-1310 Hepatitis B virus (SUBTYPE ADW2), genotype A, complete genome

    GAAAACTTCCTGTTAACAGGCCTATTGATTGGAAAGTATGTCAAAGAATTGTGG

    GTCTTTTGGGCTTTGCTGCTCCATTTACACAATGTGGATATCCTGCCTTAATGC

    CTTTGTATGCCTGTATACAAGCTAAACAGGCTTTCACTTTCTCGCCAACTTACA

    AGGCCTTTCTAAGTAAACAGTACATGAACCTTTACCCCGTTGCTCGGCAACGGC

    CTGGTCTGTGCCAAGTGTTTGCTGACGCAACCCCCACTGGCTGGGGCTTGGCCA

    TAGGCCATCAGCGCATGCGTGGAACCTTTGTGGCTCCTCTGCCGATCCATACTG

    CGGAACTCCTAGCCGCTTGTTTTGCTCGCAGCCGGTC
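
    A sequence in this FASTA layout (one ">" header line followed by sequence lines) can be read and summarized with a short script. The sketch below is illustrative only: `read_fasta` and `gc_content` are hypothetical helper names, and the sample record uses just the first sequence line of the fragment above.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a dict {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C among the A, C, G, T letters of seq."""
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / total if total else 0.0


sample = """>HBV fragment (first line only)
GAAAACTTCCTGTTAACAGGCCTATTGATTGGAAAGTATGTCAAAGAATTGTGG
"""
for name, seq in read_fasta(sample).items():
    print(name, len(seq), "bases, GC fraction", round(gc_content(seq), 3))
```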

  • Information of protein (proteome)

    Protein:

    20 amino acids

    Primary sequence structure

    domain binding analysis

    Multiple sequence analysis

    Secondary and tertiary structure

    Active site

    Binding analysis

    Functional analysis including mutation analysis

    "middle surface protein (HepB)"

    MQWNSTAFHQALQDPKVRGLYFPAGGSSSGTV

    NPAPNIASHISSISARTGDPVTNMENITSGFL

    GPLLVLQAGFFLLTRILTIPQSLDSWWTSLNF

    LGGSPVCLGQNSQSPTSNHSPTSCPPICPGYR

    WMCLRRFIIFLFILLLCLIFLLVLLDYQGMLP

    VCPLIPGTTTTSTGPCKTCTTPAQGNSMFPSC

    CCTKPSDGNCTCIPIPSSWAFAKYLWEWASV

    RFSWLSLLVPFVQWFVGLSPTVWLSAIWMMWY

    WGPSLYSIVSPFIPLLPIFFCLWVYI

  • Structure levels of protein

  • Primary Structure

    Sequential order of amino acids in a polypeptide

    Written left to right (N-terminus to C-terminus)

  • Protein Structure - Primary

    Protein: chain of amino acids joined by

    peptide bonds

    Amino Acid

    Central carbon (Cα) attached to:

    ○ Hydrogen (H)

    ○ Amino group (-NH2)

    ○ Carboxyl group (-COOH)

    ○ Side chain (R)

  • General Amino Acid Structure

    H2N-CαH(R)-COOH

  • 20 amino acids

  • Amino Acid R-groups

    Polar, uncharged: Cysteine, Proline, Serine, Glutamine, Asparagine

    Polar, charged: Arginine (+), Glutamic acid (-), Aspartic acid (-), Lysine (+), Histidine (+)

    Non-polar, hydrophobic: Tryptophan, Phenylalanine, Isoleucine, Tyrosine, Leucine, Valine, Methionine

    Non-polar, ambivalent: Glycine, Threonine, Alanine

  • Linear polymers of proteins

    Amino acids are connected by amide bonds, often called peptide bonds.

    A peptide bond is a covalent bond formed between a carboxyl group and an amino group, with the loss of a water molecule.
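
    Because each peptide bond releases one water molecule, the mass of a peptide is the sum of its residue masses plus a single water for the free ends. The sketch below illustrates this; the average residue masses are standard reference values and are not taken from the slides, and `peptide_mass` is a hypothetical helper name.

```python
# Average residue masses in daltons (free amino acid minus one water).
# Standard reference values, not part of the original slides.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015


def peptide_mass(sequence):
    """Average molecular weight of a linear peptide given in one-letter codes.

    n residues joined by n - 1 peptide bonds have lost n - 1 waters, which is
    the same as summing the residue masses and adding one water back.
    """
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print(round(peptide_mass("GG"), 2))  # glycylglycine: 132.12
```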

  • Peptide bond

    Stability of peptide bond is due to resonance

    (delocalization of electrons over several atoms)

    Increase the polarity of the peptide bond

    ○ Also generate dipole moment

    Partial double-bond character

    ○ coplanar / non-rotatable

  • Primary Structure of Proteins

    The amino acid sequence (the primary structure) of a protein determines its three-dimensional structure, which, in turn, determines its properties.

    In every protein, the correct three-dimensional structure is needed for correct functioning.

    Determining the sequence of amino acids in a protein is a routine, but not trivial, operation in classical biochemistry.

  • Christian B. Anfinsen: Nobel Prize in Chemistry (1972)

    Residues 1-124:

    KETAAAKFERQHMDSSTSAASSSNYCNQMMKS

    RNLTKDRCKPVNTFVHES

    LADVQAVCSQKNVACKNGQTNCYQSYSTMSITD

    CRETGSSKYPNCAYKTT

    QANKHIIVACEGNPYVPVHFDASV

    Sequence Determines Structure

    All of the information necessary for folding the peptide chain into its "native" structure is contained in the primary amino acid sequence of the peptide.

  • A case study of the effects of

    mutation: Sickle cell anemia

  • Secondary Structure

    Organization of the local conformation of the polypeptide chain

  • Backbone Torsion Angles

    • Dihedral angle ω (omega): rotation about the peptide bond, namely Cα1-{C-N}-Cα2

    • Dihedral angle φ (phi): rotation about the bond between N and Cα

    • Dihedral angle ψ (psi): rotation about the bond between Cα and the carbonyl carbon
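
    Each of these angles is a dihedral defined by four consecutive backbone atoms, so it can be computed directly from coordinates. The following is a minimal sketch in plain Python; the function name `dihedral` and the test points are illustrative, not taken from the slides.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# For phi the four atoms are C(i-1), N(i), CA(i), C(i);
# for psi they are N(i), CA(i), C(i), N(i+1).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis: 0.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans: 180.0
```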

  • Ramachandran Plot

    White = sterically

    disallowed

    conformations (atoms

    come closer than sum of

    van der Waals radii)

    Blue = sterically

    allowed conformations

  • Secondary Structure Prediction

    One of the first fields to emerge in

    bioinformatics (~1967)

    Grew from the simple observation that certain amino acids, or combinations of amino acids, seemed to prefer certain secondary structures

    Subject of hundreds of papers and dozens

    of books, many methods…

  • PSSP Algorithms

    There are three generations of protein secondary structure prediction (PSSP) algorithms

    • First generation: based on statistical information about single amino acids

    • Second generation: based on windows (segments) of amino acids. A window typically contains 11-21 amino acids

    • Third generation: based on the use of windows on evolutionary information

  • PSSP: First Generation

    First-generation PSSP systems are based on statistical information about single amino acids

    The most relevant algorithms:

    Chou-Fasman, 1974

    GOR, 1978

    Both algorithms claimed 74-78% predictive accuracy, but when tested on better-constructed datasets their accuracy proved to be ~50% (Nishikawa, 1983)

  • Chou & Fasman

    Determined the frequency of occurrence of each amino acid in helices and sheets.

    Calculated from a survey of 15 proteins of known structure.

    http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1

  • Chou-Fasman parameters

    Note: The parameters given in the textbook are 100*Pi

  • Simplified C-F Algorithm

    Select a window of 7 residues

    Calculate the average Pa over this window and assign that value to the central residue

    Repeat the calculation for Pb and Pc

    Slide the window down one residue and repeat until the sequence is complete

    Analyze the resulting “plot” and assign to each residue the secondary structure (H, B, C) with the highest value.

  • Exercise

    Predict the secondary structure of the following

    protein sequence:

          Ala  Pro  Ala  Phe  Ser  Val  Ser  Leu  Ala  Ser  Gly  Ala
    Pa    142   57  142  113   77  106   77  121  142   77   57  142
    Pb     83   55   83  138   75  170   75  130   83   75   75   83
    Pc     66  152   66   60  143   50  143   59   66  143  156   66
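
    The simplified Chou-Fasman procedure can be applied to this exercise with a short script. This is a sketch under stated assumptions: the three rows of numbers are read as helix, beta, and coil propensities in that order, and residues too close to either end to have a full 7-residue window are left unassigned ("-"), a choice the slides do not specify.

```python
SEQ = ["Ala", "Pro", "Ala", "Phe", "Ser", "Val",
       "Ser", "Leu", "Ala", "Ser", "Gly", "Ala"]

# residue: (P_helix, P_beta, P_coil), copied from the exercise rows
PROPENSITY = {
    "Ala": (142, 83, 66),
    "Pro": (57, 55, 152),
    "Phe": (113, 138, 60),
    "Ser": (77, 75, 143),
    "Val": (106, 170, 50),
    "Leu": (121, 130, 59),
    "Gly": (57, 75, 156),
}
STATES = "HBC"  # helix, beta, coil


def predict(seq, window=7):
    """Assign H, B, or C to each residue with a full centered window."""
    half = window // 2
    result = []
    for i in range(len(seq)):
        if i < half or i + half >= len(seq):
            result.append("-")  # assumption: no full window, no assignment
            continue
        residues = seq[i - half:i + half + 1]
        averages = [sum(PROPENSITY[r][k] for r in residues) / window
                    for k in range(3)]
        result.append(STATES[averages.index(max(averages))])
    return "".join(result)


print(predict(SEQ))  # ---HBHBCH---
```

    With only 12 residues, six positions have a full window, so the prediction is noisy; on a real protein the averaged profile is normally smoothed further before assigning segments.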

  • Simplified C-F Algorithm

    [Figure: helix, beta, and coil profiles plotted against residue number (10 to 60)]

  • Prediction Performance

    [Bar chart: prediction scores (%), axis from 45 to 75, for the methods CF, GOR I, LIM, LEVIN, PTIT, JASEP7, GOR III, ZHANG, and PHD]

  • Download DNA/protein sequence

    Translate DNA to protein

    Comparing Protein sequences

    Extract information

    pI / MW

    secondary structure prediction
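
    The "Translate DNA to protein" step can be sketched with the standard genetic code. The compact table construction below is a common idiom; `translate` is a hypothetical helper that reads the first forward frame only and writes stop codons as "*" and unrecognized codons as "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA*
```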

  • Tertiary Structure of Proteins

    The tertiary structure of a protein is the three-dimensional arrangement of all the atoms in the molecule.

    The conformations of the side chains and the positions of any prosthetic groups are parts of the tertiary structure, as is the arrangement of helical and pleated-sheet sections with respect to one another.

  • Important bonds for protein folding and stability

    Disulfide bond: oxidation of the sulfhydryl groups of two cysteine residues (intramolecular: ribonuclease; intersubunit: the dimeric protein insulin)

    Hydrogen bond: weak (2-5 kcal/mol vs. covalent: 70-100 kcal/mol), but numerous

    Ionic bond: weak (3 kcal/mol), affected by pH value

    Van der Waals force: dipole molecules attract each other (transient and weak: 0.1-0.2 kcal/mol)

    Hydrophobic interaction: the tendency of hydrophobic groups or molecules to be excluded from interaction with the hydrophilic environment

  • Structural Modeling Methods

    1. Experimental Methods

    X-ray crystallography

    NMR spectroscopy

    Cryo-electron microscopy

    Other Biophysical Methods

    2. Computer modeling Methods

    Comparative methods

    Ab initio methods

  • X-ray Crystallography

    Diffraction pattern

    X-ray source Crystal

    Intensities

  • Nuclear Magnetic Resonance (NMR)

    Uniformly 15N- and/or 13C-labeled peptides

    Pulse sequences

    Magnet

    NMR spectra

    Analysis, assignment, and structure calculation

    Structure

  • Nuclear Overhauser Effect (NOE)

    [Figure: protons (H) connected by NOEs]

  • Cryo-Electron Microscope

    Single particle image reconstruction

    Koning et al. (2003)

    Bacteriophage MS2

  • Cryo-EM density at 7 Å of Adenovirus PIIIA

  • Three complex techniques

    ❑ X-ray crystallography: synchrotron radiation X-ray crystallography (~1 Å)

    ❑ NMR spectroscopy: high-field NMR spectrometer (~1-2.5 Å)

    ❑ Cryo-electron microscopy: electron microscope for cryo-EM (~10-15 Å)

  • Structure representation of Proteins

    Ribbon mesh, Surface contour

    Ribbon with sidechain, Space-filling model

  • Protein Data Bank

    http://www.rcsb.org/pdb/home/home.do

  • How to solve the structures

  • Rate of success to determine the

    structures

    http://targetdb.pdb.org/statistics/TargetStatistics.html

  • PDB File Format

    REMARK FILENAME="/usr/people/nonella/xplor/benchmark1/ALANIN.PDB"

    REMARK PARAM11.PRO ( from PARAM6A )

    ...

    REMARK JACS 103:3976-3985 WITH 1-4 RC=1.80/0.1

    REMARK DATE:16-Feb-89 11:21:32 created by user: nonella

    ATOM 1 CA ACE 1 -2.184 0.591 0.910 1.00 7.00 MAIN

    ATOM 2 C ACE 1 -0.665 0.627 0.966 1.00 0.00 MAIN

    ATOM 3 O ACE 1 -0.069 1.213 1.868 1.00 0.00 MAIN

    ...

    ATOM 64 N CBX 12 8.610 8.962 9.714 1.00 0.00 MAIN

    ATOM 65 H CBX 12 8.050 8.324 9.225 1.00 0.00 MAIN

    ATOM 66 CA CBX 12 9.223 8.571 11.014 1.00 0.00 MAIN

    END

    Example: alanin.pdb in the VMD Distribution

    REMARK lines: comments

    ATOM lines: coordinate information

  • PDB File Format (2)

    RTyp  Num  Atm  Res  Ch  ResN      X      Y      Z   Occ   Temp   PDB  Line
    ATOM    1  N    ASP  L      1  4.060  7.307  5.186  1.00  51.58  1FDL    93
    ATOM    2  CA   ASP  L      1  4.042  7.776  6.553  1.00  48.05  1FDL    94

    RTyp: Record Type

    Num: Serial number of the atom. Each atom has a unique serial number.

    Atm: Atom name (IUPAC format).

    Res: Residue name (IUPAC format).

    Ch: Chain to which the atom belongs (in this case, L for light chain of an

    antibody).

    ResN: Residue sequence number.

    X, Y, Z: Cartesian coordinates specifying atomic position in space.

    Occ: Occupancy factor

    Temp: Temperature factor (atoms disordered in the crystal have high

    temperature factors).

    PDB: The PDB data file unique identifier.

    Line: Line (record) number in the data file.

    (http://www.umass.edu/microbio/rasmol/pdb.htm)
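
    The field list above maps directly onto a small record type. The sketch below splits the two example lines on whitespace, which works for the lines as printed here; it is only an illustration, since real PDB files use fixed column positions and adjacent fields can run together. `Atom`, `parse_atom`, and `distance` are hypothetical names.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int         # Num
    name: str           # Atm
    res_name: str       # Res
    chain: str          # Ch
    res_seq: int        # ResN
    x: float
    y: float
    z: float
    occupancy: float    # Occ
    temp_factor: float  # Temp


def parse_atom(line):
    """Parse one whitespace-separated ATOM line as shown on the slide."""
    f = line.split()
    if f[0] != "ATOM":
        raise ValueError("not an ATOM record")
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]),
                float(f[9]), float(f[10]))


def distance(a, b):
    """Euclidean distance between two atoms, in the units of the file."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


n = parse_atom("ATOM 1 N ASP L 1 4.060 7.307 5.186 1.00 51.58 1FDL 93")
ca = parse_atom("ATOM 2 CA ASP L 1 4.042 7.776 6.553 1.00 48.05 1FDL 94")
print(round(distance(n, ca), 2))  # N-CA bond length, about 1.45
```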

  • How to Represent a Molecule?

    H

    H

    H

    H

    H

    H

    Small

    Molecules

    Example: C6H6

    Lines, Sticks, Ball & Sticks

    Dots, Spacefill, B&S&D

  • An Example: Alanine Peptide

    Lines Sticks (Bonds)

    Ribbons Cartoon

    Surface Balls & Sticks & Ribbons

  • More Examples: Proteins

    Cytochrome P450cam

    Avidin-Biotin Complex

    http://www.ks.uiuc.edu/Research/vmd/gallery/

  • 3D Visualization Tools

    Web-based tools

    JMOL (http://jmol.sourceforge.net/)

    CHIME (http://www.mdli.com/)

    Free Programs

    Rasmol (http://www.OpenRasMol.org/)

    Deep Viewer (http://www.expasy.org/spdbv/)

    Pymol (http://pymol.sourceforge.net/)

    Commercial Programs

    Discovery Studio (Insight)

    SYBYL

  • Chime/ JMOL

  • [ http://www.expasy.org/spdbv/ ]

    Deep Viewer (Spdb viewer)

  • Pymol

  • SYBYL

  • Discovery Studio

  • RCSB web

    Swiss PDB Viewer

    Pymol

    Discovery Studio

  • Flow Chart for Protein Modeling

  • Homology modeling (Comparative Modeling)

    Sequence with unknown structure (?), 1alc:

    KQFTKCELSQNLYDIDGYGRIALPELICTMF

    HTSGYDTQAIVENDESTEYGLFQISNALWCK

    SSQSPQSRNICDITCDKFLDDDITDDIMCAK

    KILDIKGIDYWIAHKALCTEKLEQWLCEKE

    Homologous sequence with known structure, 8lyz:

    KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK

    FESNFNTQATNRNTDGSTDYGILQINSRWWCND

    GRTPGSRNLCNIPCSALLSSDITASVNCAKKIV

    SDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

    The two proteins share a similar sequence (homologous): use 8lyz as the template and model 1alc.

  • Homology Modeling Steps

    1. Search for homologous proteins: use sequence search tools such as dot plot or BLAST

    2. Alignment (key step): find Structurally Conserved Regions (SCRs) and Structurally Variable Regions (SVRs)

    3. Core modeling: copy backbone coordinates from the homologous protein with known structure

    4. Loop modeling: search a fragment library

    5. Side chain modeling: search a rotamer library

    6. Optimizing the model: energy minimization

    7. Evaluating the model: tools such as WHAT IF, PROCHECK, and Verify3D can be used
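
    The alignment step can be illustrated with a minimal global alignment (Needleman-Wunsch) and the percent identity it yields. This is a teaching sketch, not what BLAST or the modeling servers do: the match, mismatch, and gap scores are arbitrary assumptions, and real tools use substitution matrices such as BLOSUM with affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical aligned positions divided by the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


top, bottom = needleman_wunsch("KVFGRCELAAAMK", "KVFGRELAAAMK")
print(top)
print(bottom)
print(round(percent_identity(top, bottom), 1), "% identity")
```

    Percent identity is defined here over the full alignment length including gaps; other conventions (for example, dividing by the shorter sequence length) give slightly different numbers.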

  • Searching Sequence Homologues in PDB

    PRTEINSEQENCEPRTEINSEQUENC

    EPRTEINSEQNCEQWERYTRASDFHG

    TREWQIYPASDFGHKLMCNASQERWW

    PRETWQLKHGFDSADAMNCVCNQWER

    GFDHSDASFWERQWK

    Query Sequence PDB

    [Figure: the query sequence is compared with the sequences stored in the PDB; matching entries are reported as Hit #1 and Hit #2]

  • Why Homology Modeling?

    Value in structure based drug design

    Find common catalytic sites/molecular recognition sites

    Use as a guide to planning and interpreting experiments

    There is a 70-80% chance that the target protein has a fold similar to a protein whose structure has already been determined by X-ray crystallography or NMR spectroscopy

    Sometimes it’s the only option or best guess

  • Homology Modeling Limitations

    Cannot study conformational changes

    Cannot find new catalytic/binding sites

    Large Bias towards structure of template

    Three exceptional cases to keep in mind

    Same fold but not similar sequence

    ○ Myoglobin & hemoglobin, only 20% similarity

    Different structures but similar functions

    ○ subtilisin

    Similar sequences but different functions

    ○ Chymotrypsinogen, trypsinogen and plasminogen

    ○ 40% homologous

    ○ 2 active, 1 no activity, cannot explain why

  • Homology modeling

    Web Servers

    ○ SWISS-MODEL (http://swissmodel.expasy.org/SWISS-MODEL.html)

    ○ CPHmodels (http://www.cbs.dtu.dk/services/CPHmodels/)

    ○ ESyPred3D (http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/)

    ○ 3Djigsaw (http://www.bmm.icnet.uk/servers/3djigsaw/)

    ○ Geno3D (http://geno3d-pbil.ibcp.fr/)

    Free Programs

    ○ MODELLER (http://salilab.org/modeller/)

    ○ TINKER (http://dasher.wustl.edu/tinker/)

    Commercial Programs

    ○ COMPOSER (www.tripos.com/data/SYBYL/)

  • Protein Threading

    The word threading implies that one drags the

    sequence (ACDEFG...) step by step through each

    location on each template

  • Protein Threading or Fold Recognition

    KQFTKCELSQNLYDIDG

    YGRIALPELICTMFHTS

    GYDTQAIVENDESTEYG

    LFQISNALWCKSSQSPQ

    SRNICDITCDKFLDDDI

    TDDIMCAKKILDIKGID

    YWIAHKALCTEKLEQWL

    CEKE

  • What do we need for protein

    threading ?

    Accurate prediction of secondary structure

    All known protein folding patterns

    A scoring function to get the best result

  • http://zhanglab.ccmb.med.umich.edu/I-TASSER/

  • 3D modeling

    SWISS-MODEL

    https://swissmodel.expasy.org/interactive/Kw3ETX/models/

    https://swissmodel.expasy.org/interactive/RdDYfp/models/

    I-TASSER

    http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S460254/

    https://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S460261/

  • A Good Protein Structure..

    Minimizes disallowed

    torsion angles

    Maximizes number of

    hydrogen bonds

    Maximizes buried

    hydrophobic ASA

    Maximizes exposed

    hydrophilic ASA

    Minimizes interstitial

    cavities or spaces

  • A Good Protein Structure..

    Minimizes number of

    “bad” contacts

    Minimizes number of

    buried charges

    Minimizes radius of

    gyration

    Minimizes covalent and

    noncovalent (van der

    Waals and coulombic)

    energies
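
    One of these criteria, the radius of gyration, is easy to compute from atomic coordinates: it is the root-mean-square distance of the atoms from their centroid. The sketch below treats all atoms as equally weighted, which is a simplifying assumption (a mass-weighted version is also common), and `radius_of_gyration` is a hypothetical helper name.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords) / n
    return math.sqrt(mean_sq)


# Two points one unit either side of the origin: Rg is exactly 1.
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))  # 1.0
```

    For the same chain length, a compact fold gives a smaller value than an extended one, which is why the slide lists a small radius of gyration as a sign of a good structure.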

  • Error Detections

    Geometrical error detection

    ○ PROCHECK

    ○ WHATCHECK

    Error detection using mean-force/statistical approaches

    ○ PROVE

    ○ PROSA II

    ○ ANOLEA

  • Assessment of Φ,Ψ values: the Ramachandran plot

    Ramachandran et al. drew a simple scatterplot of Φ,Ψ for a set of proteins. Because of clashes between backbone atoms (N, Cα, C, O) and the Cβ or other parts of the side chain, only a small part of the Φ,Ψ plot is actually populated.

    The good thing is that the “Ramachandran constraints” are not included in refinement programs, which makes the Ramachandran statistics an orthogonal indicator. In addition, there is an excellent empirical correlation between the quality of structural models (as measured by the resolution) and their compliance with the “Ramachandran constraints” (cf. the plot of “what makes a good quality indicator”).

    Φ

    Ψ

    Example: PROCHECK output

    most favored region

    allowed region

    generously allowed region

    disallowed region

  • Plot 1. Ramachandran plot

    phi-psi torsion angles for all residues

    Glycine residues are shown as triangles, as these are not restricted to the regions of the plot

    The coloring on the plot represents the different regions:

    Red areas = most favourable core

    Yellow areas = additional allowed regions

    Bright yellow areas = generously allowed regions

    White area = disallowed region

    Less than 2% in disallowed region

  • Example of Prove Output

    The average Z-score of atoms in well-resolved structures tends to be between -0.10 and 0.10.

    The Z-score RMS of atoms in well-resolved structures tends to be less than 1.0.

    Outlier atoms: > 3.0 SD

  • Summary of Validation

    Programs

    PROCHECK http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

    WHATCHECK/WHATIF http://swift.cmbi.kun.nl/swift/whatcheck/

    PROSA http://www.came.sbg.ac.at/Services/prosa.html

    VERIFY3D http://nihserver.mbi.ucla.edu/Verify_3D/

    ANOLEA http://swissmodel.expasy.org/anolea/

    PROTABLE http://www.tripos.com (Commercial license)

  • Web services

    http://www.jcsg.org/prod/scripts/validation/sv2.cgi

    PROCHECK

    SFCHECK

    PROVE

    ERRAT

    WASP

    DDQ

    WHATCHECK