1
Solution structure of Gaussia Luciferase with five disulfide bonds 1
and identification of a putative coelenterazine binding cavity by 2
heteronuclear NMR 3
4
Nan Wu1,$, Naohiro Kobayashi2,$, Kengo Tsuda3, Satoru Unzai4, Tomonori Saotome5, 5
Yutaka Kuroda5,* and Toshio Yamazaki2,* 6
7
1 College of Food and Biological Engineering, Zhengzhou University of Light Industry, 136 8
Kexue Road, Zhengzhou 450002, P. R. China. 9
2 NMR Science and Development Division, RSC, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, 10
Yokohama City, Kanagawa 230-0045, Japan. 11
3 Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 12
1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan. 13
4 Department of Frontier Bioscience, Faculty of Bioscience and Applied Chemistry, Hosei 14
University, 3-7-2, Kajino-cho, Koganei-shi, Tokyo, 184-8584, Japan. 15
5 Department of Biotechnology and Life Science, Graduate School of Engineering, 16
Tokyo University of Agriculture and Technology, 2-24-16 Nakamachi, Koganei-shi, Tokyo 17
184-8588, Japan. 18
$ Equal contribution 19
* Correspondence: YK: [email protected] (Tel: +81-42-388-7794) and TY : 20
[email protected] (Tel: +81-45-503-9262) 21
2
Keywords: SEP tag, in vitro refolding, disulfide bonds, helical protein, hydrophobic cavity, 22
intrinsically disordered region (IDR) 23
24
Database: The chemical shifts have been deposited in the Biological Magnetic Resonance 25
Bank (BMRB) under the accession No.36288, and the atomic coordinates are deposited in the 26
Protein Data Bank under accession number PDB-ID: 6KYN. The expression vector for 27
GLuc-TG (p21GLucTG) is deposited in Addgene (ID:124660). 28
29
Conflicts of interests: The authors declare no conflicts of interests. 30
31
3
Abstract 32
Gaussia luciferase (GLuc) is the smallest luciferase (18.2kDa; 168 residues) reported so 33
far and is thus attracting much attention as a reporter protein, but the lack of structural 34
information is hampering further application. Here, we report the first solution structure of a 35
fully active, recombinant GLuc determined by heteronuclear multidimensional NMR. We 36
obtained a natively folded GLuc by bacterial expression and efficient refolding using a 37
solubility tag. Almost perfect assignments of GLuc’s 1H, 13C and 15N backbone signals were 38
obtained. GLuc structure was determined using CYANA, which automatically identified over 39
2500 NOEs of which > 570 were long-range. GLuc is an all-alpha-helix protein made of nine 40
helices. The region spanning residues 10–18, 36-81, 96-145 and containing eight out of the 41
nine helices was determined with a Cα-atom RMSD of 1.39 ű 0.39 Å. The structure of GLuc 42
is novel and unique. Two homologous sequential repeats form two anti-parallel bundles made 43
by 4 helices and tied together by three disulfide bonds. The N-terminal helix 1 is grabbed by 44
these 4 helices. Further, we found a hydrophobic cavity where several residues responsible 45
for bioluminescence were identified in previous mutational studies, and we thus hypothesize 46
that this is a catalytic cavity, where the hydrophobic coelenterazine binds and the 47
bioluminescence reaction takes place. 48
4
Introduction 49
Luciferase (Luc) is a generic term for bioluminescent enzymes that catalyze the 50
oxidation of a substrate, often termed luciferin [1]. Together with GFP, Luc is widely 51
employed as a reporter protein [2–4]. Gaussia Luciferase (GLuc) is a luciferase isolated from 52
the marine Gaussia princeps [5], which catalyzes a bright blue light by oxidizing 53
coelenterazine. GLuc is the smallest luciferase reported so far with a molecular mass of 18.2 54
kDa (excluding the secretion tag). Nonetheless, its bioluminescence intensity is strong (200 55
fold higher than Firefly Luciferase and Renilla Luciferase, the two most widely used 56
luciferase), and it is thus considered as a potential ideal reporter protein [6]. Attempts to 57
improve or redesign GLuc’s bioluminescence characteristics included the lengthening of its 58
half-life luminescence [7–9], and the redshift of its light emission peak at 480 nm [10,11], 59
which is absorbed by tissues during in vivo applications [12]. However, structural information 60
at atomic resolution is still not available, making the redesign process tedious. 61
GLuc contains 10 cysteines, and previous studies demonstrated that the natively folded 62
GLuc contains five disulfide bonds. The presence of 5 disulfide bonds increases the risks of 63
misfolding when GLuc is bacterially produced, resulting in a low yield [13]. In order to 64
overcome this misfolding problem, several methods including fusion with pelB leader 65
sequence [7,14], cell-free systems [15], low-temperature expression [16] were reported, but 66
the yield of natively folded GLuc remained insufficient for high-resolution structural studies. 67
We previously developed a Solubility Enhancement Peptide tag (SEP tag [17–19]). We 68
showed that by attaching a SEP tag containing nine aspartic acids to GLuc’s C-terminus, we 69
5
could increase the solubility of GLuc, resulting in a spontaneous refolding and the formation 70
of native SS-bonds. Indeed, we obtained nearly 1mg of soluble and functional GLuc from a 71
200 ml of E.coli cultured in Luria-Bertani (LB) [11,13,20]. 72
Here, we used the SEP-tag fused GLuc construct to produce a sufficient amount of 15N 73
and 13C uniformly labeled GLuc for NMR studies. Heteronuclear multidimensional NMR 74
spectroscopy enabled over 99% backbone 1H, 13C, and 15N chemical shifts of GLuc to be 75
assigned. Flexible regions and highly stable regions were identified by 1H-15N heteronuclear 76
NOE [21] and H/D exchange experiments [22]. The three-dimensional structure calculated by 77
using CYANA (ver 3.98 [23]) were determined with a backbone (Cα ) RMSD of 78
1.39ű0.39Š(excluding residues in the flexible regions). 79
6
Results 80
Expression and purification of GLuc 81
The natively folded GLuc possesses ten cysteines that form five disulfide bonds, which 82
can be easily misformed when the protein is expressed in E.coli, and the cysteines are 83
air-oxidized in vitro. Here, we used a SEP-Tag, C9D, which solubilizes the protein during 84
air-oxidization and refolding, thereby increasing the yield of natively folded and active GLuc 85
[20]. The final yield of GLuc after tags cleavage and two times HPLC purification (Fig. S1) 86
was 1.5 mg per liter of M9 minimal medium culture, which was sufficient for NMR analysis. 87
GLuc’s identity was confirmed by MALDI-TOF mass (15N labeled GLuc, calculated=88
19055.8 Da, experimental=19062.5 Da, Fig. S2). To date, the yield of natively folded active 89
GLuc is almost nil when expressed without the C9D tag [20], and the solubilization tag was 90
thus essential to achieve the present amount of protein, though it was removed once the 91
protein was folded into its native conformation. 92
93
NMR analysis 94
The 1H-15N HSQC spectrum exhibited dispersed and sharp peaks (Fig. 1), indicating a 95
stable and well-folded structure. Almost all backbone chemical shifts were visible in the 96
heteronuclear NMR experiments, and over 99% of backbone 1H, 13C and 15N resonances of 97
non-proline residues were unambiguously assigned. C136 was the only un-assigned backbone 98
H-N chemical shifts. The broadened signals around residue C136 suggested that the region 99
encompassing the C136/C148 SS-bond was subjected to structural exchange, as suggested by 100
7
the 15N relaxation dispersion of D138 and L140 (Table 1), and thus the C136 H-N pair was 101
undetected. 102
103
104Fig. 1. 2D 1H-15N HSQC spectrum of GLuc. The peak assignments are shown using the 105one-letter code followed by the residue number. Resonance assignments are numbered starting at the 106first residue (lysine) behind the secretion tag, which was removed without affecting the 107bioluminescence activity. Mutations at E100A and G103R do not affect activity and are described in 108our prior paper [13]. The inset in the right bottom (marked AUC) shows the result of sedimentation 109equilibrium experiments of GLuc protein (concentration at 0.3 mg/mL). Scans from three different 110rotor speeds (●: 12,000 rpm; ■: 22,000 rpm; ▼: 37,000 rpm) monitored at 280nm. The lines represent 111the fit to a single species model. The determined molecular weight was 22 kDa, corresponding to a 112monomer. 113
114
115
116
117
118
8
Residue R2(50 Hz)-R2(1 kHz)
1/s
V12 6.66
S16 4.84 T21 2.47
D26 2.12
G28 5.60
L37 2.88
A47 2.61
S61 6.93
M69 4.87
K70 2.07
G75 4.78
T79 6.43
T125 2.26
D138 4.46 L140 4.07
T150 4.46
A152 10.85
119Table 1. R2-dispersion experiments on GLuc. Differences of effective 15N transverse relaxation 120rates at CPMG rates of 50 Hz and 1 kHz are listed only for peaks that are resolved and their rate 121difference > 2 1/s. Larger difference was observed for residues in the C-terminal flexible region 122(indicated by bold letters) indicative of its structural dynamics. The two-state model analysis of CPMG 123rates of 50, 100, 150, 200, 250, 300, 400, 500, 600, 800, 1000 Hz showed that the estimated exchange 124rate was 2500 +/- 800 1/s. Because the residues showing R2 dispersion spread around several blocks, 125we judged farther residue-specific analysis using single exchange rate is unreliable. 126 127
128
129
130
131
132
133
9
The side-chain atoms were automatically assigned by FLYA [24] (a function of 134
CYANA) using the aliphatic atoms identified in the 3D HCCH-TOCSY, 15N- and 13C-edited 135
NOESY spectra. The assignments were confirmed by visual inspection and when necessary 136
corrected manually using the NMR spectra viewer and analyzer MagRO [25,26].We assigned 137
over 82.4% of 1H, 13C and 15N atoms of entire GLuc molecule. 138
The secondary structure elements were analyzed by TALOS+ using the 1H, 13C, 15N 139
chemical shifts (Fig. 2A). TALOS+ indicated that GLuc contains 36.9% helix and 4.7% 140
sheets, in reasonable agreement with our previous prediction based on the consensus of seven 141
publicly available secondary structure predictors (30% helix and 4% sheets) as well as with 142
the results of our Circular Dichroism (CD) analysis (30% helix and 12% sheets) [13]. In 143
addition, the location of helices calculated by TALOS+ and the secondary structure prediction 144
mostly overlapped (Fig. S3). 145
146
147
148
149
150
151
152
153
10
154Fig. 2. Residue-resolved structural and dynamics features of GLuc. (A) GLuc’s Secondary 155structure predicted by TALOS+: α-helix and β-sheet tendency are shown with solid and open bars, 156respectively. (B) H/D exchange experiments. Residues that retained resonance signal after incubation 157in D2O after 20 minutes and 18 hours were marked with open bars and solid bars, respectively (1H-15N 158HSQC figures are shown in Fig. S5). (C) 1H-15N heteronuclear NOE experiment data used to assess 159GLuc backbone flexibility: 1H-15N heteronuclear NOE are shown with solid bars. The NOE values of 160residues that were not identified were assumed using the average value of the preceding and following 161residues are shown with open bars. Flexible regions of GLuc were identified with the threshold value 162of 0.5. (D) Backbone Cα displacement from the representative structures is calculated using nineteen 163NMR-derived structures, and the error bars show standard deviations. (E) GLuc’s amino acid sequence 164and secondary structure that identified from the representative structure. 165
166
167
168
11
Structure calculation and disulfide bond determination 169
Since GLuc was a monomer as demonstrated by AUC, all NOEs were used as 170
intramolecular NOEs (Fig. 1). The statistics of NOEs assigned during the 19 CYANA runs 171
are shown in Table 2. Even though distance constraints for hydrogen bonds, disulfide bonds, 172
and some manually assigned NOEs were included in the CYANA calculations in addition to 173
the standard automatically assigned NOEs, the target functions were reasonably small 174
(4.07+/-0.59), indicating that the resulting structures were consistent with the experimental 175
data. 176
Ellman’s assay indicated that all ten cysteines (C52, C56, C59, C65, C77, C120, C123, 177
C127, C136, and C148) are oxidized in the active GLuc, and thus that they should form five 178
disulfide bonds. Three disulfide bonds C59/C120, C65/C77, and C136/C148 were 179
unambiguously visible in the NMR structures, but the pairing of the remaining four cysteines 180
(C52, C56, C123, and C127), which were close to each other, was less straightforward to 181
determine. In order to determine the remaining two disulfide bonds, we set the distance 182
between the gamma sulfur (Sγ) to > 2Å so that any cysteine could freely combine with any of 183
the remaining three cysteines. We then identified cysteine pairs with Sγ distance < 3Å in the 184
380 structures obtained from 19 rounds. As a result, C52/C127 and C56/C123 were the most 185
favored pairs and were observed in 92.4% and 56.1% of the calculated structures, respectively. 186
On the other hand, C52/C56 and C123/C127 were observed in only 13.7% and 19.5% of the 187
structures. 188
189
12
190 191 192
193Table 2. Structural statistics for the nineteen best NMR-derived GLuc structures. 19419 rounds of CYANA calculation with different random seed [36]. * The averaged numbers of NOEs 195and their standard deviations are calculated over the 19 rounds of CYANA calculations. ** Averaged 196over the 18 structures (except for the representative structure). 197
198
199
200
201
202
NOE distance restraints*
All 2573.4 +/- 42.4 Intra residue (|i-j|=0) 565.9 +/- 14.7 Sequential (|i-j|=1) 728.6 +/- 6.6
Medium-range (2≤|i-j|≤4) 700.5 +/- 18.1 Long-range (|i-j|>5) 579.7 +/- 32.2
Dihedral angle 183 Hydrogen bonds 25 Disulfide bonds 3
RMSD to the representative structure**
Backbone Cα (residues 10–18, 36-81, 96-145) 1.39 Å +/- 0.39 Å Heavy atoms (residues 10–18, 36-81, 96-145) 1.84 Å +/- 0.20 Å
Backbone Cα (helices without α2) 1.30 Å +/- 0.36 Å Heavy atoms (helices without α2) 1.74 Å +/- 0.45 Å
Ramachandran plot (representative structure)
Residues in most favored regions 71.8 % Residues in additionally allowed regions 20.4 % Residues in generously allowed regions 4.9 %
Residues in disallowed regions 2.8 %
13
Overall fold and dynamic features of GLuc 203
The structure with the lowest overall target function among the 20 NMR-derived 204
structures in each round was selected. Nineteen structures from 19 rounds were used for 205
further analysis. Among the nineteen structures, seven structures formed the putatively 206
correct disulfide bonds (C52/C127, C56/C123, C59/C120, C65/C77, C136/C148). We finally 207
selected the structure with the lowest average pairwise RMSD (against all other eighteen 208
structures) as the representative structure. This structure also forms the putatively correct 209
disulfide bonds. 210
The nineteen superimposed NMR-derived structures with the lowest overall target 211
function show that GLuc has nine helices (α1-α9, Fig. 2E and Fig. 3), and the location of all 212
helices essentially corroborate the TALOS+ prediction except for α2 (Fig. 2A and Fig. 2E). 213
The N- (residues 1-9) and C-terminus (residues 146-168) of GLuc are highly disordered. 214
GLuc’s main structure is formed by residues 10-145, in which the structure of residues10-18, 215
35-81 and 97-145 were well-defined with an average backbone RMSD to the representative 216
structure for all other eighteen structures of 1.39Å (Table 2). Residues 19-34 and residues 217
82-96 are highly disordered and can be considered as intrinsically disordered regions (IDR 218
[27], Fig. 2D, and Fig. 3). The structures of helices α1 and α3-α9 were well-defined with an 219
average backbone RMSD of 1.30Å (Table 2). It has been reported that α3-loop-α4-loop-α5 220
(α3-α5, residues 37-72) and α7-loop-α8-loop-α9 (α7-α9, residues 109-143) are repeat 221
sequences [13,28]. The structure analysis shows that GLuc’s two repeat sequences are 222
connected by the second IDR (residues 82-96) and form an anti-parallel bundle (α3+α8 pair 223
14
and α4+α7 pair) that surrounds the N-terminal α1 helix. The anti-parallel bundles are firmly 224
tied by three disulfide bonds (C52/C127, C56/C123, and C59/C120), resulting in a high local 225
stability (Fig. 3). Moreover, all residues in the well-defined region exhibited 1H-15N 226
heteronuclear NOE values larger and more uniform than residues in the N- and C- terminus or 227
in the two IDRs, confirming that the well-defined regions obtained by calculation were 228
consistent with the rigid regions determined by HN NOE values (Fig. 2C and Fig. 2D). 229
230
15
231Fig. 3. Overall fold of GLuc (residues 10-148) determined by NMR. (A) Wire model of 232nineteen superimposed NMR-derived structures with the lowest target function. Helices were shown in 233red and loops in black. (B) Ribbon model of the representative structure. The nine helices of GLuc are 234marked from α1 to α9. (C) Ribbon model of the representative structure with the five disulfide bonds 235colored in green. Two IDRs are in cyan. (D) Ribbon model of the representative structure with its two 236moieties. The tightly packed moiety (residues 52-123) is shown in orange, whereas the loosely packed 237moiety (residues 19-51, 124-151) is shown in yellow. The central helix, α1 (residues 1-18), is in white. 238Residues R76, Q112 and Q116 are in blue, F104 and F113 are in magenta, and T66 and T124 are in 239purple. 240
16
Discussion 241
The structure of GLuc is novel, as we detected no similar structures in the Protein Data 242
Bank using DALI [29] (Fig. S4). It is even quite different from the structures of Renilla 243
luciferase (RLuc) [30], Oplophorus Luciferase (OLuc) [31] and apoaequorin [32], which like 244
GLuc uses coelenterazine as a substrate and are ATP independent luciferases. The 245
anti-parallel bundle of helices, which exhibits pseudo 2-fold symmetry, in the GLuc fold can 246
be divided into two moieties. Though both showed well-defined backbone structures, the 247
experimental data indicated differences in the side chain packing stability. The side chains of 248
residues 52-123 are tightly packed whereas those of residues19-51 and 124-151 are loosely 249
packed (Fig. 3D). The high stability of the former one reveals good agreement with the 250
residues showing extreme low H/D exchange rates, whereas residues in the latter one 251
exhibited high H/D exchange rate indicative of a low stability (Fig. 2B and Fig. S5). In the 252
tightly packed moiety, we found several hydrophilic residues with well-determined side-chain 253
structures. For instance, the chemical shifts of R76-Hε, Q112-Hε 1/2, and Q116-Hε 1/2 were 254
clearly different from averaged values observed in a flexible side chain. Furthermore, many 255
NOEs were assigned to these atoms corroborating the fact that these side chains are involved 256
in hydrogen bonds stabilizing the tightly packed moiety. Interestingly the side-chains of R76 257
and Q112 are stacked to the aromatic rings of F113 and F104, respectively, apparently 258
shifting NMR signals of these protons from ring current effect (Fig. 3D). Finally, the 259
hydroxyl protons of T66 and T124 were also clearly visible, suggesting that they are involved 260
17
in hydrogen bonds and thus in the N-terminal capping of helices α5 and α8, respectively (Fig. 261
3D). 262
Surface accessible analysis of the representative structure indicated a noticeable cavity 263
located among the central α1, α4 and α7 (Fig. 4A, Fig. 4B and Fig. S6). The cavity was made 264
by 19 residues: N10, V12, A13, V14, S16, N17, F18, L60, S61, I63, K64, C65, R76, C77, 265
H78, T79, F113, I114, V117 (For reader’s convenience, we underlined the hydrophobic 266
residues; Fig. 4C). Similar cavities formed by these 19 residues were identified in all other 267
eighteen NMR-derived structures, though the sizes and shapes of their cavities showed some 268
variation because of the limited resolution of the NMR structures. The 19 residues are 269
distributed on three structural segments: α4+α7 most rigid block, R76-T79 short loop, and the 270
central α1. The α4+α7 most rigid block was stabilized by the three disulfide bonds as 271
mentioned above (C52/C127, C56/C123, and C59/C120, Fig. 3C). In addition to the rigid 272
structure of the cavity, we observed a structural exchange suggested by 15N relaxation 273
dispersion (Table 1). S61 was close to both V12 and T79 (Fig. 4C), which exhibited the 274
second-largest dispersions. We hypothesized that the structural exchange is related to the 275
opened and closed form of this cavity. 276
277
278
279
280
281
18
282
283
Fig. 4. The cavity shown using the representative structure (residues 10-148). (A) Surface 284representation of GLuc: positive residues (Arg, Lys and His) are colored in blue; negative 285residues (Glu and Asp) are colored in red; and hydrophobic residues are colored in yellow. The 286entrance to cavity is indicated by an arrow. (B) The cavity representation (colored in transparent light 287blue) of GLuc shown from the same direction as in (A). Residues that retained 1H-15N HSQC signals 288after 20 minutes and 18 hours H/D exchanging are shown in pink and purple, respectively; residues 289located in the activity-related loop R76-T79 are in orange; two IDRs are shown in cyan. (C) Residue 290composition around the interior cavity. The cavity wall and its contributing residues are colored using 291the same color code as in (B). The insets show ribbon models of GLuc with the cavity colored in light 292blue and viewed from the same direction as in the main panel. The entrance to the cavity is indicated 293by an arrow. 294
295
19
A flexible docking simulation indicated that the cavity was large enough to 296
accommodate coelenterazine; and this was verified for all seven models that formed five 297
disulfide bonds (Fig. S7). H/D exchange indicated that the amide protons of N17, L60, S61, 298
F113, I114 and V117, which are located around the cavity, were visible after 20min 299
incubation in D2O, and among them, L60, S61, I114, and V117 signals were visible even after 300
18hrs (Fig. 2B and Fig. S5). The four residues are located in the α4+α7 most rigid block (Fig. 301
2B and 2E), indicating that the cavity wall is rigid. The hydrophobic character of the cavity’s 302
interior suggests a putative role in recruiting coelenterazine, a small poorly soluble molecule. 303
Furthermore, three activity-related residues R76, C77, H78 [10] are located in the short 304
R76-T79 loop that is near the α4+α7 most rigid block and stabilized through the C65/C77 305
disulfide bond. C65 and C77 are also associated with bioluminescence activity. In particular, 306
the mutation of C77 resulted in a vanishing luminescence [10]. This can be rationalized by 307
hypothesizing that the destruction of C65/C77 disulfide bond ruins the entire cavity structure 308
and inactivate GLuc. 309
Sequence alignment also points to the role of the cavity as a binding pocket for 310
coelentarzine. First, the seven cavity forming residues: L60, S61, V117 (in the hydrophobic 311
region); R76, H78 (in the activity-related loop); and C65, C77 (the disulfide bond, see Fig. 312
4C) are highly conserved in 12 luciferases (MoLuc, MpLuc, etc. see Fig. S8A). C65, R76, 313
C77, V117 were fully conserved, and L60, S61, H78 had a 92% conservation ratio. 314
Furthermore, the structures of OLuc, RLuc and apoaequorin also contain a similar 315
hydrophobic cavity. Altogether, these observations strongly suggest that the cavity constitutes 316
20
the coelenterazine’s binding site and is thus essential for GLuc’s bioluminescence activity. 317
Additionally, we noticed that several residues in the C-terminal region (K141-D168) are 318
remarkably conserved (Fig. S8A) despite their high flexibility as assessed by heteronuclear 319
NOE analysis (Fig. 2), which may suggest that they are functionally or perhaps structurally 320
important. The sequence alignment of residues 27-97 with residues 98-168 indicates that 321
K141-F151 in the C-terminal region has a high similarity with K70-Y80 where the 322
aforementioned activity-related loop (R76-T79) is located (Fig. S8B). Furthermore, our 323
previous mutational analysis demonstrated that W143, L144 and F151 also play an important 324
role in GLuc’s activity [11], and it is of interest to note that these conserved residues are 325
disordered in our NMR structure and can be defined as IDRs. 326
Finally, let us note that several lines of evidence suggested that these residues are not 327
completely disordered. First, residues around F151 exhibited low 1H -15N NOE values (0.0 ~ 328
-0.2, Fig. 2C), suggesting a disordered state in the nano- or pico-second time scale, and the 329
R2-dispersion experiments indicated that these residues experience a micro- or milli-second 330
time scale exchange between the folded and unfolded states rather than in a perfectly flexible 331
state (Table 1). This exchange between a folded and a less folded state was further 332
corroborated by the observation of strong intra-residues and sequential NOEs in the 3D 333
15N-edited NOESY, and the relatively broad line shapes of the peaks in the 2D 1H-15N HSQC 334
(data not shown). Taken together, our results raise the possibility of an active participation of 335
flexible regions (that can be considered as IDRs) in the coelenterazine oxidation reaction. 336
337
21
Conclusion 338
We produced a recombinant 13C, 15N labeled GLuc in E.coli, and assigned nearly all of 339
the backbone and most of the side chain chemical shifts. The N- and C-termini, as well as the 340
segment located between α1 and α3 (encompassing α2) and the loop between α5 and α6 were 341
flexible. GLuc’s structure is unique and is made of nine helices, constituting two anti-parallel 342
bundles, which are formed by, respectively, helices α3-α4 and α7-α8 of parts of homologous 343
sequential repeats. The helices are tied together by disulfide bonds to form a 4-finger 344
structure with a pseudo-2-fold symmetry surrounding the N-terminal helix 1. Finally, we 345
identified a hydrophobic cavity where coelenterazine is most likely to bind and the catalytic 346
reaction occurs. The fold of GLuc is novel, and we believe that the above reported 347
structural/dynamic information will open an avenue for redesigning the bioluminescence 348
activity of GLuc and thereby widen its scope of application. 349
22
Materials and methods 350
Expression system 351
A DNA sequence encoding the wild-type GLuc gene (UniProtKB ID: Q9BLZ2) without 352
the 17 residues secretion tag and with an E100A and G103R mutations that increased protein 353
expression was synthesized as reported previously [13]. The GLuc sequence was flanked with 354
an N terminal His-tag and a C terminal SEP-tag (Solubility Enhancement Peptide tag, C9D) 355
to facilitate protein expression, refolding, and purification [20]. Two Factor Xa cleavage sites 356
were inserted between GLuc and His-tag/SEP-Tag. The GLuc gene named GLuc-TG [13] 357
was inserted into pET21c (Novagen) at the NdeI/BamHI site to construct p21GLucTG with 358
ampicillin resistance. 359
360
Protein expression and purification 361
p21GLucTG was transformed into BL21(DE3), and pre-cultured in 1 L Luria-Bertani 362
(LB) medium at 37°C and 250 rpm shaking. When OD590nm reached 1.0, E.coli cells were 363
collected by soft centrifugation and transferred to a 1 L M9 medium containing 13C-glucose 364
and 15NH4Cl. Isopropyl β-D-Thiogalactoside (IPTG) was added at 1 mM final concentration 365
for inducing protein expression, and the temperature was lowered to 25°C for minimizing the 366
formation of inclusion bodies. After 4 hours with shaking at 250 rpm, the cells were harvested 367
by centrifugation and sonicated. GLuc was purified from the supernatant fraction using a 368
Nickel Nitrilotriacetic Acid (NTA) column followed with overnight dialysis at 4°C against 50 369
mM Tris-HCl, pH 8.0. GLuc was then air-oxidized for three days at the same conditions in 370
23
order to form the five disulfide bridges. Residual misfolded GLuc was removed using a 371
reversed phase High-Performance Liquid Chromatography (HPLC). The protein 372
concentration was determined using a Bradford assay [33], and Factor Xa was added to GLuc 373
dissolved in 50 mM Tris-HCl, 100 mM NaCl, and 5 mM CaCl2 at a ratio of 1:100 (w/w), and 374
the sample was again incubated for 8 hours at 37°C, 100 rpm for enzymatic cleavage of the 375
His- and the SEP-Tags. Uncleaved GLuc was removed using, again, reversed phase HPLC. 376
GLuc identity was confirmed by MALDI-TOF mass spectroscopy on an ABI SCIEX 377
TOF/TOF 5800 (Thermo Fisher Scientific Inc., Massachusetts, USA). GLuc was freeze-dried 378
and kept as a powder at -30°C until use. 379
380
Analytical Ultracentrifugation (AUC) 381
Sedimentation equilibrium experiments were carried out using an Optima XL-A 382
analytical ultracentrifuge (Beckman-Coulter, Inc., Brea, California, USA) with a four-hole 383
An60Ti rotor at 20°C. Before centrifugation, GLuc samples were dialyzed overnight against 384
50 mM MES and 100 mM NaCl at pH 4.7. The solvent density (1.006983 g/cm3) was 385
determined using DMA 5000 (Anton Paar). Each sample was then transferred into a cell with 386
a six-channel centerpiece. The sample concentrations were 1.2, 0.6, and 0.3 mg/mL. Data 387
were obtained at 12,000, 22,000, and 37,000 rpm. A total equilibration time of 24 hours was 388
used for each speed, with absorbance scans at 280 nm taken every 4 hours to ensure that 389
equilibrium had been reached. Data analysis was performed by global analysis of all of the 390
data sets obtained at different concentrations and rotor speeds using SEDPHAT [34]. 391
24
NMR analysis and structure calculation 392
NMR experiments for resonance assignments and 1H-15N heteronuclear NOE experiment 393
were conducted using 0.2 mM 15N single or 15N, 13C double labeled GLuc protein dissolved in 394
50 mM MES buffer pH 6.0 and 2 mM NaN3, at 293 K with 8%(v/v) D2O in a 5 mm Shigemi 395
microtube (Shigemi co., Ltd, Tokyo, Japan). NMR spectra were acquired on a Bruker 396
Avance-III 700 MHz spectrometer, equipped with a 5 mm CPTXI cryoprobe. 397
Two-dimensional and three-dimensional NMR experiments (1H-15N HSQC, HNCACB, 398
CBCA(CO)NH, HNCA, HNCO, HN(CA)CO) were performed for the backbone 15N and 13C 399
assignments. 15N-TOCSY-HSQC, 15N-NOESY-HSQC, and HCCH-TOCSY were used for 400
backbone and side-chain signal assignments. H/D exchange 1H-15N-HSQC experiment was 401
performed under the above-described conditions but by dissolving GLuc’s freeze-dried 402
powder in D2O instead of H2O. The transverse relaxation rate (R2) dispersion experiments for 403
backbone 15N atoms were performed on a Bruker Avance-III 900 MHz spectrometer, using 404
pulse scheme including constant time relaxation compensated CPMG pulse sequences [35]. A 405
series of 2D experiments were acquired with various rates of CPMG pulse (50, 100, 150, 200, 406
250, 300, 400, 500, 600, 800, 1000 Hz) in the two of 20 ms CPMG blocks. The 15N CPMG 407
irradiation intensity was set to 3125 Hz (~35 ppm). To avoid the error in peak intensity arisen 408
from the offset effect, two sets of relaxation experiments with different 15N irradiation centers 409
(111 ppm and 125 ppm) were carried out. For each signal, we read the series of peak 410
intensities using the spectrum giving the smaller errors by the offset effect. 411
412
25
Three-dimensional structure determination 413
Automated NOE assignments and structure calculations were performed using CYANA 414
(ver. 3.98) on a PC-cluster equipped with 20-core Intel Xeon E5-4627v3 (3.0 GHz) and using 415
the manually assigned chemical shifts and a list of NOE chemical shifts derived from the 3D 416
15N-, 13C-edited NOESY spectra of the aliphatic and aromatic regions. For each cycle of 417
CYANA calculation, 20 out of 100 structures were selected after 10,000 steps of simulated 418
annealing using distance constraints derived from automatically assigned NOEs. The detailed 419
algorithm and strategy are described [23]. For the automated NOE assignments of the NOEs 420
peaks, the tolerances were set to 0.04, 0.4 and 0.4 ppm for 1H, 15N and 13C signals, 421
respectively. 19 rounds of CYANA calculations with different random seeds were performed. 422
It should be noted that, for a minor number of the NOEs, the automated assignment was 423
ambiguous and depended on the random seed. We thus calculated the structures using 424
restraints that were slightly different from set to set, and we selected the best structure from 425
the structures generated using each of the sets and reported the ensemble of structures. The 426
treatment using many random seeds was previously discussed and the ensemble of structures 427
calculated using ambiguous assignments becomes more diverse and safer. We can reduce the 428
risk to make structure ensemble affected by wrong NOE assignments [36]. According to 429
experimentally determined slow-exchanging backbone amide protons and threonine (Thr) 430
side chain OH atoms, we applied 24 sets of distance constraints, two of them for hydrogen 431
bonds including Thr-OH related to N-terminal capping of α-helices and 21 of them for 432
backbone amide protons to the 19 rounds of CYANA calculations. 433
26
Other softwares 434
The secondary structure elements of GLuc were predicted by submitting the backbone 435
chemical shifts into TALOS+ (https://spin.niddk.nih.gov/bax/nmrserver/talos/) [37]. 436
Root-Mean-Square Deviation (RMSD) was calculated using a Biopython.PDB module 437
(https://biopython.org) [38]. Three-dimensional images of the NMR structures were generated 438
using PyMOL. The flexible docking simulation was calculated using AutoDock (Ver. 4.2.6) 439
and AutoDockTools (Ver.1.5.6) [39] 440
441
442
27
Acknowledgments 443
We thank members of the Kuroda Laboratory for discussion, and help and advice with 444
the experimental operations. We are especially thankful to Dr. Tetsuya Kamioka and Mr. 445
Fumiya Suzuki for advice and kind help with protein expression and purification. We are 446
grateful to Prof. Russel Hopcroft, University of Alaska–Fairbanks (UAF), for kind permission 447
to reproduce his photograph of Gaussia princeps (used in the graphical abstract; R. Hopcroft 448
copyright), and Prof. Yanhong Bai, Zhengzhou University of Light Industry, for her generous 449
support of this international joint research. 450
451
Funding sources 452
The work has financially supported by a grant-in-aid from the Japan Society for the 453
Promotion of Science (JSPS) KAKENHI- 23651213 and 26560432, the institute for Global 454
Innovation Research at TUAT, and the Doctoral Scientific Research Foundation of 455
Zhengzhou University of Light Industry (No. 2018BSJJ020). 456
457
Authors’ contribution 458
W.N, Y.K. and Y.T. conceived the project, analyzed the structure, and wrote the manuscript. 459
W.N., K.T., and T.Y. performed the NMR experiments and analyzed the data, N.K and T.Y. 460
performed and analyzed the structure calculation with W.N’s assistance, S.U. and T.S. 461
performed the AUC experiments. All authors contributed to finalize the manuscript and 462
approved it. 463
28
Competing Interests 464
The authors declare no conflicts of interests. 465
466
References 467
1 Shimomura O (2006) Bioluminescence: chemical principles and methods. World Scientific 468
Publishing, Singapore. 469
2 Oyama H, Morita I, Kiguchi Y, Miyake S, Moriuchi A, Akisada T, Niwa T & Kobayashi N 470
(2015) Gaussia Luciferase as a Genetic Fusion Partner with Antibody Fragments for 471
Sensitive Immunoassay Monitoring of Clinical Biomarkers. Anal Chem 87, 12387–472
12395. 473
3 Kaskova ZM, Tsarkova AS & Yampolsky I V (2016) 1001 lights: luciferins, luciferases, 474
their mechanisms of action and applications in chemical analysis, biology and medicine. 475
Chem Soc Rev 45, 6048–6077. 476
4 Yi S, Liu N-N, Hu L, Wang H & Sahni N (2017) Base-resolution stratification of cancer 477
mutations using functional variomics. Nat Protoc 12, 2323. 478
5 C.S. Sent-Gyorgyi BJB (2001) Luciferases, fluorescent proteins, nucleic acids encoding the 479
luciferases and fluorescent proteins and the use thereof in diagnostics, high throughput 480
screening and novelty items. U. S. Patent 6232107-B. 481
6 Tannous BA, Kim DE, Fernandez JL, Weissleder R & Breakefield XO (2005) 482
Codon-optimized Gaussia luciferase cDNA for mammalian gene expression in culture 483
and in vivo. Mol Ther 11, 435–443. 484
7 Maguire CA, Deliolanis NC, Pike L, Niers JM, Tjon-Kon-Fat LA, Sena-Esteves M & 485
Tannous BA (2009) Gaussia luciferase variant for high-throughput functional screening 486
applications. Anal Chem 81, 7102–7106. 487
29
8 Welsh JP, Patel KG, Manthiram K & Swartz JR (2009) Multiply mutated Gaussia 488
luciferases provide prolonged and intense bioluminescence. Biochem Biophys Res 489
Commun 389, 563–568. 490
9 Degeling MH, Bovenberg MS, Lewandrowski GK, de Gooijer MC, Vleggeert-Lankamp CL, 491
Tannous M, Maguire CA & Tannous BA (2013) Directed molecular evolution reveals 492
Gaussia luciferase variants with enhanced light output stability. Anal Chem 85, 3006–493
3012. 494
10 Kim SB, Suzuki H, Sato M & Tao H (2011) Superluminescent variants of marine 495
luciferases for bioassays. Anal Chem 83, 8732–8740. 496
11 Wu N, Kamioka T & Kuroda Y (2016) A novel screening system based on VanX‐497
mediated autolysis-Application to Gaussia luciferase. Biotechnol Bioeng 113, 1413–498
1420. 499
12 Gheysens O & Mottaghy FM (2009) Method of bioluminescence imaging for molecular 500
imaging of physiological and pathological processes. Methods 48, 139–145. 501
13 Wu N, Rathnayaka T & Kuroda Y (2015) Bacterial expression and re-engineering of 502
Gaussia princeps luciferase and its use as a reporter protein. BBA-Proteins Proteom 503
1854, 1392–1399. 504
14 Tannous BA (2009) Gaussia luciferase reporter assay for monitoring biological processes 505
in culture and in vivo. Nat Protoc 4, 582–591. 506
15 Goerke AR, Loening AM, Gambhir SS & Swartz JR (2008) Cell-free metabolic 507
engineering promotes high-level production of bioactive Gaussia princeps luciferase. 508
Metab Eng 10, 187–200. 509
16 Rathnayaka T, Tawa M, Sohya S, Yohda M & Kuroda Y (2010) Biophysical 510
characterization of highly active recombinant Gaussia luciferase expressed in 511
Escherichia coli. BBA-Proteins Proteom 1804, 1902–1907. 512
30
17 Islam MM, Khan MA & Kuroda Y (2012) Analysis of amino acid contributions to protein 513
solubility using short peptide tags fused to a simplified BPTI variant. BBA-Proteins 514
Proteom 1824, 1144–1150. 515
18 Khan MA, Islam MM & Kuroda Y (2013) Analysis of protein aggregation kinetics using 516
short amino acid peptide tags. BBA-Proteins Proteom 1834, 2107–2115. 517
19 Nautiyal K & Kuroda Y (2018) A SEP tag enhances the expression, solubility and yield of 518
recombinant TEV protease without altering its activity. N Biotechnol. 42, 77–84. 519
20 Rathnayaka T, Tawa M, Nakamura T, Sohya S, Kuwajima K, Yohda M & Kuroda Y 520
(2011) Solubilization and folding of a fully active recombinant Gaussia luciferase with 521
native disulfide bonds by using a SEP-Tag. BBA-Proteins Proteom 1814, 1775–1778. 522
21 Bah A, Vernon RM, Siddiqui Z, Krzeminski M, Muhandiram R, Zhao C, Sonenberg N, 523
Kay LE & Forman-Kay JD (2014) Folding of an intrinsically disordered protein by 524
phosphorylation as a regulatory switch. Nature 519, 106. 525
22 Yi Q & Baker D (2008) Direct evidence for a two‐state protein unfolding transition from 526
hydrogen‐deuterium exchange, mass spectrometry, and NMR. Protein Sci 5, 1060–527
1066. 528
23 Güntert P & Buchner L (2015) Combined automated NOE assignment and structure 529
calculation with CYANA. J Biomol NMR 62, 453–471. 530
24 Schmidt E & Güntert P (2012) A New Algorithm for Reliable and General NMR 531
Resonance Assignment. J Am Chem Soc 134, 12817–12829. 532
25 Johnson BA & Blevins RA (1994) NMR View: A computer program for the visualization 533
and analysis of NMR data. J Biomol NMR 4, 603–614. 534
26 Kobayashi N, Iwahara J, Koshiba S, Tomizawa T, Tochio N, Güntert P, Kigawa T & 535
Yokoyama S (2007) KUJIRA, a package of integrated modules for systematic and 536
interactive analysis of NMR data directed to high-throughput NMR structure studies. J 537
31
Biomol NMR 39, 31–52. 538
27 Ota M, Koike R, Amemiya T, Tenno T, Romero PR, Hiroaki H, Dunker AK & Fukuchi S 539
(2013) An assignment of intrinsically disordered regions of proteins based on NMR 540
structures. J Struct Biol 181, 29–36. 541
28 Inouye S & Sahara Y (2008) Identification of two catalytic domains in a luciferase 542
secreted by the copepod Gaussia princeps. Biochem Biophys Res Commun 365, 96–101. 543
29 Hasegawa H & Holm L (2009) Advances and pitfalls of protein structural alignment. Curr 544
Opin Struc. Biol 19, 341–348. 545
30 Loening AM, Fenn TD & Gambhir SS (2007) Crystal structures of the luciferase and 546
green fluorescent protein from Renilla reniformis. J Mol Biol 374, 1017–1028. 547
31 Tomabechi Y, Hosoya T, Ehara H, Sekine S, Shirouzu M & Inouye S (2016) Crystal 548
structure of nanoKAZ: The mutated 19 kDa component of Oplophorus luciferase 549
catalyzing the bioluminescent reaction with coelenterazine. Biochem Biophys Res 550
Commun 470, 88–93. 551
32 Head JF, Inouye S, Teranishi K & Shimomura O (2000) The crystal structure of the 552
photoprotein aequorin at 2.3 A resolution. Nature 405, 372–376. 553
33 Bradford MM (1976) A rapid and sensitive method for the quantitation of microgram 554
quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72, 555
248–254. 556
34 Vistica J, Dam J, Balbo A, Yikilmaz E, Mariuzza RA, Rouault TA & Schuck P (2004) 557
Sedimentation equilibrium analysis of protein interactions with global implicit mass 558
conservation constraints and systematic noise decomposition. Anal Biochem 326, 234–559
256. 560
35 Tollinger M, Skrynnikov NR, Mulder FAA, Forman-Kay JD & Kay LE (2001) Slow 561
Dynamics in Folded and Unfolded States of an SH3 Domain. J Am Chem Soc 123, 562
32
11341–11352. 563
36 Buchner L & Güntert P (2015) Increased Reliability of Nuclear Magnetic Resonance 564
Protein Structures by Consensus Structure Bundles. Structure 23, 425–434. 565
37 Shen Y, Delaglio F, Cornilescu G & Bax A (2009) TALOS+: a hybrid method for 566
predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR 567
44, 213–223. 568
38 Manderick B & Hamelryck T (2003) PDB file parser and structure class implemented in 569
Python. Bioinformatics 19, 2308–2310. 570
39 Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS & Olson AJ 571
(2009) AutoDock4 and AutoDockTools4: Automated docking with selective receptor 572
flexibility. J Comput Chem 30, 2785–2791. 573
574
33
575
576