Structure, Volume 24 Supplemental Information Structure of the EndoMS-DNA Complex as Mismatch Restriction Endonuclease Setsu Nakae, Atsushi Hijikata, Toshiyuki Tsuji, Kouki Yonezawa, Ken-ichi Kouyama, Kouta Mayanagi, Sonoko Ishino, Yoshizumi Ishino, and Tsuyoshi Shirai


  • SUPPLEMENTARY INFORMATION

    Structure of the EndoMS-DNA complex as mismatch-restriction endonuclease

    Setsu Nakae, Atsushi Hijikata, Toshiyuki Tsuji, Koki Yonezawa, Ken-ichi Kouyama, Kouta

    Mayanagi, Sonoko Ishino, Yoshizumi Ishino, Tsuyoshi Shirai

    Contents

    Detailed Materials and Methods

    Supplementary Figure S1. Related to Figure 2. Alignment of putative EndoMS orthologs

    Supplementary Figure S2. Related to Figure 2. Molecular phylogeny and genome structure of putative EndoMS orthologs

    Supplementary Figure S3. Related to Figure 2. Structures of EndoMS and type II restriction enzymes

    Supplementary Figure S4. Related to Figure 3. Sequence and affinity of DNAs for EndoMS substrates

    Supplementary Figure S5. Related to Figure 3. DNA bound-structures of EndoMS

    Supplementary Figure S6. Related to Figure 4. Electron microscopic images and class averages of the TkoEndoMS-TkoPCNA-dsDNA complex

    Supplementary Table S1. Related to Figure 2. Structures similar to N- or C-terminal domains of EndoMS in the PDB

    Supplementary Table S2. Related to Figure 3. DNA base-pair and base-step parameters of EndoMS-bound and canonical B-form DNAs

    Supplementary References

  • Detailed Materials and Methods

    Database survey. Homologs of TkoEndoMS were screened using amino acid sequence (UniProt) and structure (PDB)

    databases 1,2. The amino acid sequence of TkoEndoMS (UniProt entry Q5JER9) was used for the query, and the

    UniProtKB database was searched with BLAST 3. A total of 1,336 sequences were found to have E-values lower than

    0.01 and were aligned with ClustalW; their molecular phylogenies were constructed with the neighbor-joining method

    4,5. Multiple sequence alignment and the phylogenetic tree of the 42 EndoMS/NucS homologs, which were selected

    according to clustering on the tree and overall conservation with TkoEndoMS, are shown in Figures S1 and S2,

    respectively. Putative EndoMS/NucS orthologs were detected in Archaea (Euryarchaeota, Crenarchaeota, and

    Thaumarchaeota) and a group of Eubacteria (Actinobacteria and Deinococcus-Thermus).

    The structural similarities between EndoMS/NucS and other proteins were examined in the PDB. Subunit A

    of PabNucS (PDB: 2VLD) was divided into N-terminal (residues Lys3 to Glu120) and C-terminal (residues Ser126 to

    Pro233) domains, and each domain was separately used for a query in a structural search with the fast structure

    superposition application in the SIRD database system (http://sird.nagahama-i-bio.ac.jp/sird/). Subunits/domains that

    showed less than 4.0 Å root mean square deviation (RMSD) for more than 50 residues were retrieved (Table S1).

    Redundant hits on similar structures (more than 20% sequence identity) were discarded.
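    The retrieval criteria above can be sketched as a small filter. This is only an illustration: the `Hit` record, the greedy way of choosing a representative among redundant hits, and the `hypo*` entries with their identity values are assumptions made for the example and are not part of the SIRD system or the original analysis. Only the 3n78A and 2fokB rows use values from Table S1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    pdb_chain: str   # PDB code plus chain ID, e.g. "3n78A"
    rmsd: float      # RMSD to the query domain, in angstroms
    n_aligned: int   # number of superposed residues


def passes_cutoffs(hit, max_rmsd=4.0, min_residues=50):
    """Criterion stated in the text: RMSD below 4.0 A over more than 50 residues."""
    return hit.rmsd < max_rmsd and hit.n_aligned > min_residues


def drop_redundant(hits, identity, max_identity=20.0):
    """Greedy filter (an assumption): visit hits from lowest RMSD upward and keep a
    hit only if it shares at most `max_identity` % identity with every kept hit."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.rmsd):
        if all(identity(hit.pdb_chain, k.pdb_chain) <= max_identity for k in kept):
            kept.append(hit)
    return kept


# Two rows from Table S1 plus three made-up hits (hypothetical IDs and values).
hits = [
    Hit("3n78A", 3.4, 91),
    Hit("2fokB", 3.5, 98),
    Hit("hypoA", 4.6, 120),   # fails the RMSD cutoff
    Hit("hypoB", 3.2, 40),    # too few superposed residues
    Hit("hypoC", 3.6, 95),    # passes the cutoffs but is redundant with 3n78A
]
made_up_identity = {frozenset(("hypoC", "3n78A")): 45.0}


def identity(a, b):
    # Pairs not listed are treated as unrelated (0 % identity) in this illustration.
    return made_up_identity.get(frozenset((a, b)), 0.0)


retrieved = [h for h in hits if passes_cutoffs(h)]
representatives = drop_redundant(retrieved, identity)
print([h.pdb_chain for h in representatives])
```

    Note that Table S1 reports RMSD values rounded to one decimal place, so a strict comparison against the rounded values would not necessarily reproduce the original selection.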

    The results revealed that a few proteins showed structures similar to that of the N-terminal domain.

    Although the proteins appeared to bind RNA, none of the structures of these proteins had been determined in complex

    with RNA; thus, these proteins did not provide useful information for modeling of EndoMS-dsDNA interactions. In

    contrast, many DNA-binding proteins were detected for the C-terminal domain. This was expected because the

    C-terminal domain of EndoMS belongs to a fold family, which contains a large variety of restriction, recombination,

    and repair nucleases 6,7. A previous study showed that the C-terminal domain of PabNucS showed structural similarities

    with RecB 8. However, the current results emphasized the similarities between EndoMS and type II restriction enzymes.

    The type II restriction enzyme SgrAI was superposed with the C-terminal domain of EndoMS with an RMSD of 3.4 Å

    for 91 residues; this was the most similar structure observed among known structures. Additionally, the restriction

    enzymes FokI, AspBHI, Cfr10I, Bse634I, R.BspD6I, and BsoBI were detected. From this analysis, all of the detected

  • structures in complex with DNA were those of restriction enzymes (Figure S3). Therefore, these structures were

    considered in modeling the EndoMS-dsDNA complex structure, as explained below.

    The UCSC Archaeal Genome Browser (http://archaea.ucsc.edu/) and ProOpDB

    (http://operons.ibt.unam.mx/OperonPredictor/) were referenced in order to investigate genomic structures proximal to

    the EndoMS gene 9,10. In ProOpDB, operons of the endoMS gene were searched, and operons consisting of the gene and

    radA-related genes were conserved only in the archaeal Thermococcus genus. The genome structures of the predicted

    operons were retrieved from the UCSC Archaeal Genome Browser for T. kodakaraensis KOD1, T. gammatolerans, T.

    onnurineus, T. sp. strain 4557, T. sibiricus, and T. barophilus MP (Figure S2) 11-15. STRING (http://string-db.org/) and

    IntAct (http://www.ebi.ac.uk/intact/) databases were referenced for the experimentally detected protein-protein

    interactions and for prediction of the functional relationships with EndoMS, respectively 16,17.

    Knowledge-based modeling of the TkoEndoMS-dsDNA complex. According to the results of a database survey, two

    of the EndoMS C-terminal domains were assumed to tether dsDNA in a manner similar to that of restriction enzymes.

    BsoBI was employed as the reference structure for the modeling because its catalytic domain exhibited the highest

    similarity to that of EndoMS in terms of motif conservation and sizes of insertions/deletions (Figure S3a). The

    N-terminal domains were thought to take part in dimer formation. The C-terminal domain of PabNucS (residues

    Ser126 to Pro233, PDB: 2VLD) was superposed onto the corresponding domains of BsoBI (PDB: 1DC1) in order to

    construct a C-terminal domain dimer on dsDNA (Figure 6). The dsDNA was truncated, and nucleotides were replaced

    according to the cocrystallized structure (G-T-mismatch-DNA1 in Figure S4a). The N-terminal domains of PabNucS

    (residues Lys3 to Glu120) were dimerized as in the crystal structure of PabNucS. These N- and C-terminal models were

    used as search models in molecular replacement, as detailed below.

    Crystal structure analyses. The crystal structures of TkoEndoMS were determined with X-ray crystallography. The

    mutant TkoEndoMS gene harboring Asp165Ala (D165A) inactive mutation was cloned, expressed, and purified as

    described previously 18.

    DNA-bound TkoEndoMS crystals were obtained under initial conditions using 80 mM sodium cacodylate

  • buffer (pH 6.5) containing 0.16 M calcium acetate, 14.4% (w/v) PEG8000, and 20% (w/v) glycerol for a 0.5-mL

    reservoir and a mixture of 2 µL of the reservoir solution and 2 µL of protein solution in a 50 mM Tris-HCl (pH 8.0)

    buffer containing 0.1 mM EDTA, 0.5 mM DTT, 10% (w/v) glycerol, 0.6 M NaCl, 200 µM TkoEndoMS dimer, and 200

    µM dsDNA (T-G-mismatch-DNA1 in Figure S4a) for a hanging drop.
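    As a quick check of the drop composition, the starting concentrations in a 2 µL + 2 µL hanging drop are the volume-weighted averages of the two solutions. The sketch below uses the values quoted above; the helper name `mix` and the dictionary keys are illustrative, and the result describes the drop only at the moment of mixing, before vapor diffusion against the reservoir changes it.

```python
def mix(solution_a, vol_a, solution_b, vol_b):
    """Volume-weighted concentrations after mixing two solutions.

    A component missing from one solution is taken as zero in that solution.
    """
    total = vol_a + vol_b
    names = sorted(set(solution_a) | set(solution_b))
    return {
        name: (solution_a.get(name, 0.0) * vol_a
               + solution_b.get(name, 0.0) * vol_b) / total
        for name in names
    }


reservoir = {
    "sodium cacodylate (mM)": 80.0,
    "calcium acetate (M)": 0.16,
    "PEG8000 (% w/v)": 14.4,
    "glycerol (% w/v)": 20.0,
}
protein_solution = {
    "Tris-HCl (mM)": 50.0,
    "EDTA (mM)": 0.1,
    "glycerol (% w/v)": 10.0,
    "NaCl (M)": 0.6,
    "TkoEndoMS dimer (uM)": 200.0,
    "dsDNA (uM)": 200.0,
}

drop = mix(reservoir, 2.0, protein_solution, 2.0)   # volumes in microlitres
for name, value in drop.items():
    print(f"{name}: {value:g}")
```

    With equal volumes every component that occurs in only one solution is halved (for example, 100 µM dimer and 100 µM dsDNA), while glycerol, present in both, averages to 15% (w/v).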

    Crystals grew at 18°C to approximate maximum dimensions of 0.3 × 0.3 × 0.01 mm³ within a few weeks.

    X-ray diffraction data were collected from loop-mounted crystals under cryogenic conditions with a CCD detector

    Quantum315 (ADSC) at BL38B1, SMART6500 (Bruker AXS) at BL44XU in SPring-8 (Hyogo, Japan), or Quantum

    270 (ADSC) at BL17A in Photon Factory (Tsukuba, Japan). The crystals were soaked for 10–30 s in a crystal growth

    buffer containing 15% (v/v) 2-methyl-2,4-pentanediol (MPD) for cryoprotection. The diffraction images were

    processed using the MOSFLM program 19,20.

    The crystal structure was solved with the molecular replacement method using the Phaser-MR application

    of PHENIX or MOLREP of CCP4 suites 21,22. A solution of the EndoMS-dsDNA crystal structure was initially

    attempted using the crystal structure of PabNucS (PDB: 2VLD) for search models. TkoEndoMS and PabNucS

    demonstrated 69.7% identity in amino acid sequences. PabNucS as dimers, monomers, or separated N-terminal and

    C-terminal domains was used in both all-atom and Cβ models for molecular replacements. However, no promising

    solution was obtained.

    Since the database survey clearly indicated the similarities between EndoMS and type II restriction

    enzymes, knowledge-based models of C-terminal domain dimer with dsDNA and N-terminal domain dimer were

    examined as search models. The domains in the models were rendered into Cβ models and were used in Phaser-MR.

    Under these conditions, a solution that was reasonable in terms of crystal packing, R-values, and electron density map

    was obtained. The initial R/free R factors of the model were 0.377/0.366, while those after three cycles of rigid body,

    coordinates, and simulated annealing refinements were 0.355/0.386 for reflections between 25.0 and 2.4 Å resolution.

    In the map after refinement, the electron densities for most of the side chains were clearly observed as residual densities

    (Figure S5a).
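    For reference, the R and free R factors quoted above follow the conventional crystallographic definition. The expression below uses standard textbook notation, which is assumed here and is not spelled out in the original text:

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

    Here F_obs and F_calc are the observed and model-derived structure-factor amplitudes. The free R factor is the same quantity evaluated only over a test set of reflections that is excluded from refinement.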

    The model refinements were conducted by using COOT and the phenix.refine application of PHENIX 21,23.

    Because EndoMS was in homodimer form with two-fold symmetry, dsDNA might bind to the protein in two different

  • (opposing) directions. Since the dual-conformation was actually implied in the electron density map as a residual

    density, the dsDNA model was duplicated and those in opposite directions were built in as alternative coordinates of

    equal occupancy (Figures S5c, d, e, f, and g). In the initial refinement process, high residual densities were observed

    proximal to many of the bases, where only one of the alternative dsDNA models was considered (an example electron

    density is shown in Figure S5f). A high electron density proximal to Glu179 residues was interpreted to be a

    magnesium ion (Figure S5b), because the atoms positioned on this density showed interatomic distances of 2.15 Å on

    average (s.d. 0.19 Å) to the coordinating oxygen atoms (values are those in the final models). Blobs of electron

    density observed between proteins and DNA were modeled as MPD, the cryoprotectant used in diffraction experiments,

    because corresponding densities were not observed when data were collected without the cryoprotectant but in the

    presence of a higher (~35%) glycerol concentration (data not shown). Final R/free R factors of 0.190/0.244 were

    obtained for this crystal structure (Table I).

    Structural analyses of the TkoEndoMS-dsDNA complex were also attempted for DNAs with different

    mismatched base pairs (G-G, T-T, A-C, or T-C) or background sequences (DNA1, DNA2, or DNA3; Figure S4a).

    DNA1, DNA2, and DNA3 have no consensus base except for those in the mismatch pairs. Crystallization experiments

    were carried out using the same conditions as described above, and crystals were obtained for mismatch base pairs of

    G-T, G-G, or T-T regardless of the background sequence. On the other hand, no crystals grew for DNAs containing

    A-C or T-C mismatches, or a normal A-T base pair. Although screening for crystallization conditions was

    repeated, no crystals were obtained for complexes containing these DNAs. The structures of T-T-mismatch-DNA1,

    G-G-mismatch-DNA1, G-T-mismatch-DNA2, and G-T-mismatch-DNA3 complexes were solved via molecular

    replacement using the structure of the G-T-mismatch-DNA1 complex as a search model and were refined with the same

    procedures as described above.

    The crystals of apo TkoEndoMS grew in a stock solution (50 mM Tris-HCl, pH 8.0, 0.1 mM EDTA, 0.5 mM

    DTT, 10% [w/v] glycerol, 0.6 M NaCl, and 445 µM TkoEndoMS) stored at 4°C. X-ray diffraction data were collected

    with a CCD detector Quantum 270 at BL17A in the Photon Factory under cryogenic conditions and processed as

    described above. Analysis of reflection data with the Xtriage tool suggested that the crystal was twinned with an

    operator (-h, l, k). Data collection from other apo-form crystals was carried out, and the crystals were always twinned

  • with fractions of 0.05–0.08. The crystal structure was solved with the molecular replacement method by separately

    using N- and C-terminal domains of DNA-bound TkoEndoMS as search models. Model refinement was conducted as

    mentioned above by applying the twin operator.
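    To make the twin operator concrete, the sketch below applies (-h, l, k) to a Miller index and evaluates the usual hemihedral-twinning intensity model, in which an observed intensity is a mixture of a reflection and its twin mate weighted by the twin fraction. The function names and the two intensities are illustrative assumptions; refinement programs handle this internally, and this is not the procedure used by phenix.refine.

```python
def twin_mate(hkl):
    """Apply the twin operator (-h, l, k) to a Miller index (h, k, l)."""
    h, k, l = hkl
    return (-h, l, k)


def twinned_intensity(true_intensity, hkl, twin_fraction):
    """Hemihedral twinning model: (1 - a) * I(hkl) + a * I(twin mate of hkl)."""
    mate = twin_mate(hkl)
    return ((1.0 - twin_fraction) * true_intensity[hkl]
            + twin_fraction * true_intensity[mate])


# Applying the operator twice returns the original index, as expected for a two-fold twin law.
index = (1, 2, 3)
assert twin_mate(twin_mate(index)) == index

# Made-up intensities for one reflection and its twin mate (arbitrary units).
true_intensity = {(1, 2, 3): 100.0, (-1, 3, 2): 20.0}
for alpha in (0.05, 0.08):   # range of twin fractions reported in the text
    observed = twinned_intensity(true_intensity, index, alpha)
    print(f"twin fraction {alpha:.2f}: observed intensity {observed:.1f}")
```

    At twin fractions of 0.05–0.08 the contribution of the twin mate is small but systematic, which is why the operator is supplied during refinement.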

    The quality of the models was evaluated by using the PROCHECK program 24. The parameters of dsDNA

    were evaluated by using the w3DNA web tool and compared with those of a canonical B-form DNA (PDB: 1BNA) (Table

    S2) 25,26. The crystallographic parameters, data collection and refinement statistics, and PDB codes are summarized in

    Table I. The molecular graphics were prepared with CHIMERA 27.

    Evaluating binding affinities of TkoEndoMS for various mismatch-containing DNAs.

    Electrophoretic mobility shift assays (EMSAs) were performed to quantify the DNA-binding affinity of TkoEndoMS as

    described previously 18. Various concentrations of TkoEndoMS were incubated with 5 nM 5'-Cy5-labeled dsDNA (15

    bp) which had no mismatch base pair or contained single base-pair mismatches (G-G, G-T, T-T, A-A, T-C, A-C, A-A,

    and C-C) in a reaction solution (20 mM Tris-HCl, pH 8.0, 6 mM (NH4)2SO4, 2 mM MgCl2, 100 mM NaCl, 0.1 mg/ml

    BSA, and 0.1 % Triton X-100) at 37°C for 5 min. Relatively low concentrations of protein (0.5, 1, 2.5, 5, and 10 nM as

    a dimer) were examined for preferred mismatches (G-G, G-T, and T-T), and higher concentrations were tested for

    non-preferred mismatches (A-A, T-C, A-C, A-A, and C-C) (Figure S4c). The protein-DNA complexes were

    fractionated by 8% native PAGE in 0.5× TBE buffer. A Typhoon Trio+ image analyzer and ImageQuant TL software

    (GE Healthcare) were used to quantify the fluorescent signal in each DNA band. The apparent Kd values were

    calculated by plotting the fraction of bound DNA against the EndoMS concentration and fitting the data from

    three independent experiments by non-linear regression.
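    The original text does not state which binding model was fitted. As a hedged illustration only, the sketch below fits an apparent Kd with a 1:1 binding model that accounts for depletion of free protein (relevant when the 5 nM DNA is comparable to the protein concentration), using a golden-section search in place of a statistics package. The model choice, the function names, and the synthetic data points are all assumptions; the data are generated from a known Kd and are not measurements.

```python
import math


def fraction_bound(protein_nm, kd_nm, dna_nm=5.0):
    """Fraction of DNA bound for 1:1 binding with ligand depletion (all values in nM)."""
    s = protein_nm + dna_nm + kd_nm
    return (s - math.sqrt(s * s - 4.0 * protein_nm * dna_nm)) / (2.0 * dna_nm)


def sum_squared_error(kd_nm, conc, observed, dna_nm=5.0):
    return sum((fraction_bound(p, kd_nm, dna_nm) - y) ** 2
               for p, y in zip(conc, observed))


def fit_kd(conc, observed, low=1e-3, high=1e3, dna_nm=5.0, iterations=200):
    """Golden-section search over log10(Kd); assumes one minimum within [low, high]."""
    a, b = math.log10(low), math.log10(high)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if (sum_squared_error(10.0 ** c, conc, observed, dna_nm)
                < sum_squared_error(10.0 ** d, conc, observed, dna_nm)):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# Dimer concentrations used for the preferred mismatches (nM), with synthetic,
# noise-free binding fractions generated from an arbitrary Kd of 2 nM.
concentrations = [0.5, 1.0, 2.5, 5.0, 10.0]
synthetic_fractions = [fraction_bound(p, 2.0) for p in concentrations]

fitted_kd = fit_kd(concentrations, synthetic_fractions)
print(f"fitted apparent Kd: {fitted_kd:.3f} nM")
```

    With real gel data, replicate measurements would be fitted together or averaged first, and a dedicated regression routine would also provide confidence intervals.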

    Knowledge-based modeling of the TkoEndoMS-TkoPCNA-dsDNA complex. The structure of the

    TkoEndoMS-TkoPCNA-dsDNA complex was modeled by assembling the structures from PDB as has been previously

    applied for the P. furiosus DNA polymerase B-PCNA-dsDNA complex 28. Interface information for EndoMS-dsDNA,

    PCNA-dsDNA, and EndoMS-PCNA was retrieved from the crystal structures determined in this study, the E. coli DNA

    polymerase III β subunit-DNA complex (PDB: 3BEP), and the P. furiosus PCNA-RFCL PIP-box complex (PDB: 1ISQ),

  • respectively 29,30.

    First, the crystal structure of the TkoEndoMS-dsDNA complex was assembled onto that of the

    PCNA-dsDNA complex by superposing the dsDNAs by gradually shifting the nucleotide segments for superposition

    (Figure 6). The dsDNA on PCNA was extended in advance by 5 base pairs on the PIP-binding side of the trimer to

    increase the overlap available for DNA superposition. The superposed structure, in which EndoMS was placed in contact with

    PCNA, but not so close as to cause steric hindrance, was selected for the next step. Second, the PIP-box of RFCL was

    assembled to the model by superposing the PCNA subunits from the PCNA-dsDNA and PIP-box-PCNA complexes.

    Finally, the C-terminal residues, which were disordered in the crystal structures, were added to the model to complete

    the EndoMS structure by connecting the PIP-box to the C-terminus of the TkoEndoMS crystal structure (Arg240).

    Electron microscopy and single particle image analysis. The stock solutions of purified TkoEndoMS (5 µM, i.e., 2.5

    µM dimer), TkoPCNA (7.5 µM, i.e., 2.5 µM trimer), and synthetic dsDNA (2.5 µM, T-G-mismatch-DNA1’ in Figure

    S4a) were mixed in a buffer solution (50 mM Tris-HCl, 0.1 mM EDTA, 0.5 mM DTT, 10% glycerol, 0.6 M NaCl, pH

    8.0) and incubated at room temperature for 10 min.

    The sample solution was diluted 20-fold in a buffer (50 mM Tris-HCl, 50 mM NaCl, pH 8.0), and 3 µL was

    applied to a copper grid supporting a continuous thin-carbon film. The sample was left for 1 min and then stained with

    three drops of 2% (w/v) uranyl acetate. Images of molecules were recorded with a TemCam-F216 CMOS camera

    (TVIPS) with a pixel size of 3.0 Å/pixel (a total of 38 images) using a JEM1010 EM (JEOL), operated at an

    accelerating voltage of 100 kV. A minimum dose system (MDS) was employed to reduce the electron radiation damage

    of the sample.
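    The stoichiometry of the EM sample can be checked with a few lines of arithmetic. The sketch assumes that the quoted concentrations refer to the mixture before dilution; the variable names are illustrative.

```python
def oligomer_concentration(monomer_um, subunits_per_oligomer):
    """Convert a monomer concentration (µM) to the concentration of assembled oligomers."""
    return monomer_um / subunits_per_oligomer


endoms_dimer_um = oligomer_concentration(5.0, 2)    # TkoEndoMS, homodimer
pcna_trimer_um = oligomer_concentration(7.5, 3)     # TkoPCNA, homotrimer
dsdna_um = 2.5

dilution_factor = 20.0
on_grid_um = {
    "EndoMS dimer": endoms_dimer_um / dilution_factor,
    "PCNA trimer": pcna_trimer_um / dilution_factor,
    "dsDNA": dsdna_um / dilution_factor,
}
print(endoms_dimer_um, pcna_trimer_um, dsdna_um)
print(on_grid_um)
```

    The three components are therefore present at an equimolar 1:1:1 ratio (2.5 µM each), which corresponds to 0.125 µM each after the 20-fold dilution.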

    Image analyses were carried out using the EMAN2 program suite 31. A total of 5,130 particle images of the

    assumed TkoEndoMS-TkoPCNA-DNA complex were manually selected with the e2boxer tool. No filter was applied to

    the individual images prior to image analysis. Alignment, classification, and averaging of the particle images were

    performed using the e2refine2d tool. The average number of particles per class average was 65 (Figure S6c). The model

    of the TkoEndoMS-TkoPCNA-DNA complex was used to generate density map projections with e2pdb2em and

    e2proc2d tools for comparisons.
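    The particle count and the mean class occupancy quoted above imply an approximate number of class averages. The estimate below assumes that every selected particle was assigned to exactly one class, which the original text does not state explicitly.

```python
total_particles = 5130          # manually selected particle images
mean_particles_per_class = 65   # average class occupancy reported in the text

estimated_classes = total_particles / mean_particles_per_class
print(f"estimated number of class averages: about {round(estimated_classes)}")
```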

  • Table S1. Related to Figure 2. Structures similar to N- or C-terminal domains of EndoMS in PDB

    PDB/Chain*  Schematic alignment**                               RMSD (Å)  Identity (%)  No. res  Coverage (%)  Note

    N-terminal
    2vldA       ==================================================  -         -             -        -             Query
    1vu2K       ______=========-_____===--======-__________======-  3.5       7             63       81            Small nuclear ribonucleoprotein E
    3sb2E       _____-=========-_____-==-_________-=====-__=======  3.5       15            60       90            RNA chaperone Hfq

    C-terminal
    2vldA       ==================================================  -         -             -        -             Query
    3n78A++     _======-_--========_==========================____  3.4       8             91       27            Type II restriction endonuclease SgrAI
    2fokB+      _-======__-===-===================_==============-  3.5       11            98       17            Type II restriction endonuclease FokI
    3qp9D       __======_======-____=====-=======================-  3.5       8             90       19            C2-type Ketoreductase
    4oc8A+      =-===============================__-=============_  3.5       12            102      26            Restriction endonuclease AspBHI
    1cfrA+      _=======_=====-==================================_  3.6       13            102      36            Restriction endonuclease Cfr10I
    3v21D++     _======-__====-==================================_  3.6       10            99       34            Type IIF restriction endonuclease Bse634I
    1z22A       __=====-_-======____=====-========================  3.7       15            90       54            Rab23 GTPase
    4impA       _-======_-=====--___=====--======================-  3.7       10            90       17            Polyketide synthase
    2p14A+      =-======_-===-_-=================================_  3.8       10            99       53            Type IIS restriction endonuclease R.BspD6I
    1a2kE       __=====-_=====--=___=====--=======================  3.9       6             90       43            Ras-family GTPase Ran
    1xmxA       ========_-======================================-_  3.9       7             104      27            Hypothetical protein VC1899
    3agpA       _-=====-_======-____====--========================  3.9       7             90       7             Probable SecDF protein-export membrane protein
    3svtA       __=====-__-==--==-__-=============================  3.9       6             90       32            Short-chain type dehydrogenase/reductase
    3u2qA       __======_-======-___-===--========================  3.9       10            90       23            Elongation factor Tu 1
    4bxzE       --_=========____=================-_==============_  3.9       7             91       42            RNA polymerase subunit RPABC1
    1dc1D++     _=======--=========-=============================_  4.0       8             102      32            Restriction enzyme BsoBI
    1wtdB+      ========____-======================-==========____  4.0       17            90       33            Type II restriction endonuclease EcoO109I
    2okfA       __===============================================-  4.0       13            106      82            FdxN element excision controlling factor protein
    2reqC       __======-========--_=====_-======================_  4.0       6             93       12            Methylmalonyl CoA mutase

    * PDB code and asym ID (chain ID) are indicated with ‘+’ (DNA unbound) or ‘++’ (DNA bound) for type II restriction enzymes.

    ** The regions of the proteins superposed to EndoMS domains are indicated with ‘=’ (without gap) or ‘-’ (including gap).

  • Supplementary Table S2. Related to Figure 3. DNA base-pair and base-step parameters of EndoMS-bound and canonical B-form DNAs*

    EndoMS-bound dsDNA (5gke)

    Conf.  Watson  Crick   Shear     Stretch   Stagger   Buckle    Propeller Opening   Shift     Slide     Rise      Tilt      Roll      Twist     Form
    A      C1C     G15D    -1.13     -0.21     0.22      -5.08     -14.38    2.22      -         -         -         -         -         -         -
    A      G2C     C14D    -0.53     -0.20     -0.52     -14.51    -15.33    -6.30     -0.29     0.42      3.66      7.15      8.91      43.71     B
    A      C3C     G13D    -1.60     0.17      0.29      -6.61     -3.32     8.08      0.65      -0.25     3.11      -7.50     1.79      27.29     B
    A      T4C     A12D    -0.11     -0.39     0.10      1.64      -9.91     -6.11     -0.71     0.00      3.13      -1.70     2.98      35.52     B
    A      A5C     T11D    -0.17     -0.37     0.09      7.11      0.20      -3.88     0.39      -0.01     3.24      0.71      3.79      33.51     B
    A      C6C     G10D    -0.51     -0.34     -0.10     6.38      -11.90    -2.06     -0.65     -0.72     3.31      -1.77     1.07      34.52     B
    A      A7C     T9D     0.58      -0.37     0.29      8.52      -16.26    -6.81     0.21      -0.10     3.47      -2.86     6.62      44.52     B
    A      T8C     G8D
    A      G9C     C7D     0.48      -0.51     0.44      -1.27     -13.66    -3.91     0.02      -1.56     3.58      -2.40     1.44      17.07
    A      T10C    A6D     -0.77     -0.24     -0.25     1.80      -15.84    1.80      0.13      -0.59     3.12      7.96      8.84      31.28
    A      C11C    G5D     -0.31     0.12      -0.13     -7.02     -7.51     3.26      0.96      0.14      3.60      1.69      1.23      41.03     B
    A      G12C    C4D     -0.90     0.09      0.24      -0.19     -8.89     3.23      -0.44     0.01      3.13      -5.84     6.99      27.40     B
    A      T13C    A3D     -0.02     -0.14     -0.01     11.06     -8.95     -4.56     -0.26     -0.42     3.05      1.56      6.07      32.95     B
    A      C14C    G2D     0.17      -0.29     -0.04     0.68      -7.05     -5.65     -0.02     -0.33     3.52      -0.41     3.73      35.77     B
    A      C15C    G1D     0.22      -0.42     -0.02     7.83      -17.29    2.66      1.24      -0.23     3.16      3.66      7.84      38.28
    B      G1C     C15D    0.88      -0.16     0.15      -16.59    -7.85     -10.70    -         -         -         -         -         -         -
    B      G2C     C14D    -0.85     -0.38     -0.45     -16.64    -16.11    -7.55     0.06      -0.50     3.44      9.19      8.43      29.44
    B      A3C     T13D    0.12      -0.25     0.18      -3.52     -11.32    6.30      0.29      0.10      2.97      -6.12     0.41      37.50     B
    B      C4C     G12D    0.58      0.06      0.12      -3.12     -0.76     3.49      0.10      -0.30     3.35      0.45      2.10      33.79     B
    B      G5C     C11D    -0.41     0.02      0.09      5.70      -5.31     5.95      0.50      0.01      3.08      2.60      8.58      28.19     B
    B      A6C     T10D    0.07      0.18      -0.15     5.60      -11.81    0.00      -1.07     0.18      3.42      0.10      -0.93     38.67     B
    B      C7C     G9D     0.34      0.14      0.44      2.97      -13.28    1.64      0.26      -0.43     3.39      -6.21     7.68      38.10
    B      G8C     T8D
    B      T9C     A7D     0.12      0.13      0.11      -7.70     -18.01    -2.61     0.16      -1.45     3.55      4.33      3.18      15.36
    B      G10C    C6D     0.18      -0.08     -0.11     -4.79     -9.37     0.01      -0.23     0.07      3.36      1.99      8.10      40.50     B
    B      T11C    A5D     0.49      -0.37     0.00      -3.53     -3.61     -4.16     0.50      -0.53     3.30      0.11      -0.88     37.26     B
    B      A12C    T4D     0.43      -0.33     -0.10     3.28      -6.62     -3.23     -0.41     -0.27     3.13      1.60      9.30      31.49     B
    B      G13C    C3D     -0.72     -0.18     0.45      6.25      -3.56     2.51      0.16      -0.72     3.13      -5.45     2.71      26.21     B
    B      C14C    G2D     0.65      -0.10     0.28      -1.70     -8.10     4.10      0.33      -0.48     3.53      1.79      2.12      40.44
    B      G15C    C1D     -0.33     -0.11     0.44      13.00     -15.65    3.35      -0.22     -0.40     2.89      -1.72     12.19     25.95

    Canonical B-form dsDNA (1bna)

    Watson  Crick   Shear     Stretch   Stagger   Buckle    Propeller Opening   Shift     Slide     Rise      Tilt      Roll      Twist     Form  Aligned 5gke pair (Conf. A)
    C1A     G24B    -0.42     -0.27     0.06      2.76      -14.20    -3.67     -         -         -         -         -         -         -     G2C-C14D
    G2A     C23B    -0.02     -0.27     0.25      -4.46     -10.85    -4.02     -0.36     0.15      3.52      -3.40     6.42      40.31     B     C3C-G13D
    C3A     G22B    0.00      -0.25     0.21      -6.94     -3.93     -2.35     0.50      0.23      3.52      0.80      -4.73     38.15     B     T4C-A12D
    G4A     C21B    -0.37     -0.44     -0.18     9.31      -10.39    -1.30     -0.32     0.69      3.04      3.63      7.95      24.47     B     A5C-T11D
    A5A     T20B    0.27      -0.22     0.03      5.03      -16.36    1.84      0.01      0.07      3.36      -2.68     3.16      40.90     B     C6C-G10D
    A6A     T19B    -0.09     -0.04     0.17      3.54      -18.13    5.56      0.10      -0.31     3.32      -0.70     0.95      35.35     B     A7C-T9D
    T7A     A18B    0.32      -0.12     0.13      0.83      -17.70    7.93      0.33      -0.60     3.34      1.83      -2.75     34.76     B     G9C-C7D
    T8A     A17B    0.25      -0.21     -0.10     -1.33     -17.67    0.83      -0.31     -0.18     3.32      2.96      0.73      35.39     B     T10C-A6D
    C9A     G16B    -0.02     -0.25     -0.06     -10.18    -17.25    -0.87     0.02      -0.03     3.39      0.33      -0.05     39.27     B     C11C-G5D
    G10A    C15B    0.09      -0.28     0.27      1.67      -5.31     -1.13     0.38      0.86      3.24      -3.29     3.86      29.40     B     G12C-C4D
    C11A    G14B    0.07      -0.28     0.59      -3.96     -18.05    -5.62     -1.30     0.42      3.68      -4.68     -12.20    40.78     B     T13C-A3D
    G12A    C13B    -0.53     -0.11     0.26      6.60      1.96      -3.86     0.77      0.06      3.23      3.14      -3.09     32.62     B     C14C-G2D

    * Base pairs superposed in Figures 3c and S3e are aligned; the aligned 5gke pair (Conf. A) is listed for each 1bna pair. Conf. shows alternative conformations A and B. Bases (in Watson and Crick strands) are indicated as base, residue number, and chain ID. Form indicates A, B, or other (blank) form of double-helix.

  • Supplementary Figure S1. Related to Figure 2. Alignment of putative EndoMS orthologs

    Sequence alignment of the selected EndoMS homologs. The sequences are indicated as gene ID, genus, and species,

    followed by higher-order taxonomy, where Arc and Bac represent the kingdoms Archaea and Eubacteria, respectively. Eur,

    Cre, Tha, Act, and DeiThe indicate the phyla Euryarchaeota, Crenarchaeota, Thaumarchaeota, Actinobacteria, and

    Deinococcus-Thermus, respectively (species names are truncated in the alignment). Invariant amino acids are shaded,

    and the residues consistent with the signature motif of type II restriction enzymes are underlined in the alignment.


  • Supplementary Figure S2. Related to Figure 2. Molecular phylogeny and genome structure of putative EndoMS

    orthologs

    (a) Molecular phylogeny of the selected EndoMS homologs. The sequences are indicated as gene ID, genus, and species,

    followed by higher-order taxonomy, where Arc and Bac represent the kingdoms Archaea and Eubacteria, respectively. Eur,

    Cre, Tha, Act, and DeiThe indicate the phyla Euryarchaeota, Crenarchaeota, Thaumarchaeota, Actinobacteria, and

    Deinococcus-Thermus, respectively (species names are truncated in the alignment). Invariant amino acids are shaded,

    and the residues consistent with the signature motif of type II restriction enzymes are underlined in the alignment. (b)

    Probable EndoMS operons in Thermococcus genomes are shown for T. kodakaraensis KOD1, T. gammatolerans, T.

    onnurineus, T. sp. strain 4557, T. sibiricus, and T. barophilus MP. Genes are shown as arrows, and possible operons are

    boxed. Names of the proteins encoded by the genes are abbreviated as MTAP (5'-methylthioadenosine phosphorylase),

    MTAD (S-adenosylhomocysteine deaminase), SK channel (calcium-gated potassium channel), GK (glycerate kinase),

    and CDC6 (cell division control protein 6).

  • Supplementary Figure S3. Related to Figure 2. Structures of EndoMS and type II restriction enzymes

    (a) Amino acid sequence alignment of TkoEndoMS and type II restriction enzymes with similar catalytic domain

    structures, namely, the type II restriction endonuclease SgrAI (3n78_A), type IIF restriction endonuclease Bse634I

    (3v21_D), and restriction enzyme BsoBI (1dc1_D). The secondary structures of each protein are indicated with yellow

    and green boxes for β-sheets and α-helices, respectively. The residues consistent with the signature motif of the type II

    restriction enzyme are indicated with asterisks. (b) The quaternary structures of type II restriction enzymes and

    TkoEndoMS are shown on top. The structures are superposed at the catalytic domains. The corresponding C-terminal

    catalytic domains (green and pink), N-terminal dimerizing domains (blue and tan), and dsDNA (red and orange) are

    colored for each protein. The superpositions of the catalytic domains of EndoMS (blue) and the restriction enzymes

    (green, tan, and sky blue for 3n78, 3v21, and 1dc1, respectively) are shown at the bottom. Catalytic residues of

    EndoMS are shown as ball-and-stick models.

  • Supplementary Figure S4. Related to Figure 3. Sequence and affinity of DNAs for EndoMS substrates

    (a) Nucleotide sequences of the dsDNAs cocrystallized in X-ray analyses or complexed in EM analyses with

    TkoEndoMS. The mismatched bases are indicated as bases not in line with the rest of the sequence, and the cleavage

    positions are indicated with vertical lines. A comparison among the background sequences of DNA1, DNA2, and

    DNA3 is shown on the right. The strands were selected such that base identities were as high as possible. (b)

    Representative gel image for evaluating binding affinities of TkoEndoMS for DNA containing a G-G, G-T, or T-T

    mismatch, which are preferentially recognized by the enzyme. The mismatched base pair and the protein concentrations

    are shown above, and apparent Kd values are indicated below the gel image. (c) Representative gel image for

    evaluating binding affinities of TkoEndoMS for DNA containing A-G, T-C, A-C, A-A, or C-C mismatches, which were

    not bound by the enzyme.
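
    The caption for panel b reports apparent Kd values but does not state how they were derived. As an illustration only, the sketch below assumes a single-site binding isotherm (fraction bound = [P] / (Kd + [P])) fitted by least squares to band-shift fractions. The function names, search bounds, and all numbers are hypothetical and are not taken from the gel data.

```python
import math


def fraction_bound(protein_nM, kd_nM):
    """Single-site binding isotherm: theta = [P] / (Kd + [P])."""
    return protein_nM / (kd_nM + protein_nM)


def fit_apparent_kd(protein_nM, bound_fraction, lo=1e-2, hi=1e5, iters=200):
    """Least-squares estimate of Kd by golden-section search over log10(Kd).

    Assumes the squared error has a single minimum between lo and hi (in nM);
    these bounds are arbitrary illustration values.
    """
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((fraction_bound(p, kd) - f) ** 2
                   for p, f in zip(protein_nM, bound_fraction))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - inv_phi * (b - a)  # lower interior point
        d = a + inv_phi * (b - a)  # upper interior point
        if sse(c) < sse(d):
            b = d                  # minimum lies in [a, d]
        else:
            a = c                  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# Hypothetical titration (nM protein) and band-shift fractions.
protein = [5, 10, 25, 50, 100, 250, 500]
shifted = [0.08, 0.17, 0.35, 0.49, 0.68, 0.82, 0.91]
apparent_kd = fit_apparent_kd(protein, shifted)
```

    Searching in log10(Kd) keeps the fit stable across several orders of magnitude of protein concentration. A real analysis would also need replicate gels and an error estimate.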

  • Supplementary Figure S5. Related to Figure 3. DNA bound-structures of EndoMS

    (a) The electron density map of TkoEndoMS in complex with G-T-mismatch-DNA1 contoured at 1.2 σ is shown on the

    model. The atoms are colored as in Figure 1a. (b) The omit (Fo - Fc) maps of TkoEndoMS in complex with

    G-T-mismatch-DNA1 contoured at 5.0 σ are shown on the model. The nucleotides on the 5′ and 3′ sides of the cleavage bond

    and the Mg2+ ion were omitted for the map shown in green. The mismatched nucleotides (only one of them is visible) were

    omitted for the map shown in magenta. (c) 2Fo - Fc map contoured at 1.2 σ around the binding site of flipped bases

    (G8A and T8B in alternative dsDNA models). (d) The omit (Fo - Fc) map of the same region as panel c contoured at

    3.7 σ, where T8B was omitted. (e) The omit (Fo - Fc) map of the same region as panel c contoured at 3.7 σ, where G8A

    was omitted. (f) The map of residual electron density contoured at 3.5 σ around the T9A-A7A or G9B-C7B pair. The

    carbon atoms in the bases in alternative dsDNA model A (T9A-A7A) and B (G9B-C7B, which were not modeled at this

    point) are shown in gray and white, respectively. (g) Alternative binding of dsDNA on TkoEndoMS. The same dsDNA

    models were bound in opposite directions to each other on TkoEndoMS homo-dimer, and colored in red (alternative

    model A) and yellow (alternative model B). (h) Superposition of TkoEndoMS-dsDNA complex structures. The proteins

    in T-G-mismatch-DNA1, T-T-mismatch-DNA1, G-G-mismatch-DNA1, T-G-mismatch-DNA2, and

    T-G-mismatch-DNA3 complexes are shown in light blue, tan, light green, pink, and gray, respectively. (i) A model of

    dsDNA in B-form (PDB: 1BNA, colored in blue and sky blue) was superposed onto the bound dsDNA

    (T-G-mismatch-DNA1, red and yellow) by ignoring the flipped-out mismatched bases. The base-pair and base-step

    parameters were compared between the dsDNAs and are summarized in Table S2, which shows that the dsDNA bound to

    TkoEndoMS is in B-form overall.
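
    The comparison in panel i can be pictured as a per-parameter difference over aligned base steps. The sketch below is a minimal illustration of that idea, not the procedure used to build Table S2. The function name and all numbers are hypothetical placeholders, not values from the table.

```python
from statistics import mean


def mean_abs_difference(reference, query):
    """Mean absolute difference per parameter over aligned base steps.

    Both arguments map a parameter name (e.g. "rise", "twist") to a list of
    per-step values; the lists must already be aligned step by step.
    """
    result = {}
    for name, ref_values in reference.items():
        query_values = query[name]
        if len(ref_values) != len(query_values):
            raise ValueError(f"unaligned steps for parameter {name!r}")
        result[name] = mean(abs(r - q) for r, q in zip(ref_values, query_values))
    return result


# Hypothetical base-step values (rise in angstroms, twist in degrees).
canonical_steps = {"rise": [3.3, 3.4, 3.4, 3.3], "twist": [36.0, 35.0, 37.0, 36.0]}
bound_steps = {"rise": [3.4, 3.3, 3.5, 3.2], "twist": [34.0, 37.0, 33.0, 38.0]}
differences = mean_abs_difference(canonical_steps, bound_steps)
```

    Small mean differences in rise and twist would be consistent with a duplex that stays close to B-form. Assigning the helical form itself requires a dedicated nucleic-acid analysis tool.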

  • Supplementary Figure S6. Related to Figure 4. Electron microscopic images and class averages of the

    TkoEndoMS-TkoPCNA-dsDNA complex

    (a) An electron micrograph of the negatively stained TkoEndoMS-TkoPCNA-dsDNA complex (scale bar represents

    100 nm). A large aggregate and several isolated particles (enclosed in boxes with white lines) are noticeable. (b) Close-up

    image of the boxed area (black line) in panel a. (c) Two-dimensional class averages of the selected particles. The

    size of each individual image is 19.2 × 19.2 nm². The class averages indicated with white dots were compared with the

    model projections in Figure 4c.

  • SUPPLEMENTARY REFERENCES

    1. UniProt Consortium. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42, D191-8 (2013).

    2. Berman, H., Henrick, K. & Nakamura, H. Announcing the worldwide Protein Data Bank. Nat Struct Biol 10,

    980 (2003).

    3. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J Mol

    Biol 215, 403-10 (1990).

    4. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol

    Biol Evol 4, 406-25 (1987).

    5. Thompson, J.D., Higgins, D.G. & Gibson, T.J. CLUSTAL W: improving the sensitivity of progressive

    multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix

    choice. Nucleic Acids Res 22, 4673-80 (1994).

    6. Aggarwal, A.K. Structure and function of restriction endonucleases. Curr Opin Struct Biol 5, 11-9 (1995).

    7. Aravind, L., Makarova, K.S. & Koonin, E.V. SURVEY AND SUMMARY: Holliday junction resolvases and

    related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic

    Acids Res 28, 3417-32 (2000).

    8. Ren, B. et al. Structure and function of a novel endonuclease acting on branched DNA substrates. EMBO J 28,

    2479-89 (2009).

    9. Chan, P.P., Holmes, A.D., Smith, A.M., Tran, D. & Lowe, T.M. The UCSC Archaeal Genome Browser: 2012

    update. Nucleic Acids Res 40, D646-52 (2011).

    10. Taboada, B., Ciria, R., Martinez-Guerrero, C.E. & Merino, E. ProOpDB: Prokaryotic Operon DataBase.

    Nucleic Acids Res 40, D627-31 (2011).

    11. Fukui, T. et al. Complete genome sequence of the hyperthermophilic archaeon Thermococcus kodakaraensis

    KOD1 and comparison with Pyrococcus genomes. Genome Res 15, 352-63 (2005).

    12. Lee, H.S. et al. The complete genome sequence of Thermococcus onnurineus NA1 reveals a mixed

    heterotrophic and carboxydotrophic metabolism. J Bacteriol 190, 7491-9 (2008).

    13. Mardanov, A.V. et al. Metabolic versatility and indigenous origin of the archaeon Thermococcus sibiricus,

    isolated from a Siberian oil reservoir, as revealed by genome analysis.

    Appl Environ Microbiol 75, 4580-8 (2009).

    14. Vannier, P., Marteinsson, V.T., Fridjonsson, O.H., Oger, P. & Jebbar, M. Complete genome sequence of the

    hyperthermophilic, piezophilic, heterotrophic, and carboxydotrophic archaeon Thermococcus barophilus MP. J

    Bacteriol 193, 1481-2 (2011).

    15. Zivanovic, Y. et al. Genome analysis and genome-wide proteomics of Thermococcus gammatolerans, the most

    radioresistant organism known amongst the Archaea. Genome Biol 10, R70 (2009).

    16. Franceschini, A. et al. STRING v9.1: protein-protein interaction networks, with increased coverage and

    integration. Nucleic Acids Res 41, D808-15 (2012).

    17. Orchard, S. et al. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction

    databases. Nucleic Acids Res 42, D358-63 (2013).

    18. Ishino, S. et al. Identification of a mismatch-specific endonuclease in hyperthermophilic archaea. Nucleic

    Acids Res 44, 2977-2986 (2016).

    19. Otwinowski, Z. & Minor, W. Processing of X-ray Diffraction Data Collected in Oscillation Mode. in Methods

    in Enzymology, Volume 276: Macromolecular Crystallography, part A, Vol. 276 (eds. Carter, C.W., Jr. & Sweet,

    R.M.) 307-326 (1997).

    20. Battye, T.G., Kontogiannis, L., Johnson, O., Powell, H.R. & Leslie, A.G. iMOSFLM: a new graphical

    interface for diffraction-image processing with MOSFLM. Acta Crystallogr D Biol Crystallogr 67, 271-81

    (2011).

    21. Adams, P.D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution.

    Acta Crystallogr D Biol Crystallogr 66, 213-21 (2010).

    22. Vagin, A. & Teplyakov, A. Molecular replacement with MOLREP. Acta Crystallogr D Biol Crystallogr 66,

    22-5 (2010).

    23. Emsley, P., Lohkamp, B., Scott, W.G. & Cowtan, K. Features and development of Coot. Acta Crystallogr D

    Biol Crystallogr 66, 486-501 (2010).

    24. Laskowski, R.A., Moss, D.S. & Thornton, J.M. Main-chain bond lengths and bond angles in protein structures.

    J Mol Biol 231, 1049-67 (1993).

    25. Zheng, G., Lu, X.J. & Olson, W.K. Web 3DNA--a web server for the analysis, reconstruction, and

    visualization of three-dimensional nucleic-acid structures. Nucleic Acids Res 37, W240-6 (2009).

    26. Drew, H.R. et al. Structure of a B-DNA dodecamer: conformation and dynamics. Proc Natl Acad Sci U S A 78,

    2179-83 (1981).

    27. Pettersen, E.F. et al. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput

    Chem 25, 1605-12 (2004).

    28. Mayanagi, K. et al. Architecture of the DNA polymerase B-proliferating cell nuclear antigen (PCNA)-DNA

    ternary complex. Proc Natl Acad Sci U S A 108, 1845-9 (2011).

    29. Georgescu, R.E. et al. Structure of a sliding clamp on DNA. Cell 132, 43-54 (2008).

    30. Matsumiya, S., Ishino, S., Ishino, Y. & Morikawa, K. Physical interaction between proliferating cell nuclear

    antigen and replication factor C from Pyrococcus furiosus. Genes Cells 7, 911-22 (2002).

    31. Tang, G. et al. EMAN2: an extensible image processing suite for electron microscopy. J Struct Biol 157, 38-46

    (2007).