

Biochimica et Biophysica Acta 1804 (2010) 2054–2062

Contents lists available at ScienceDirect

Biochimica et Biophysica Acta

journal homepage: www.elsevier.com/locate/bbapap

Family 42 carbohydrate-binding modules display multiple arabinoxylan-binding interfaces presenting different ligand affinities☆

Teresa Ribeiro a,1, Teresa Santos-Silva b,1, Victor D. Alves a, Fernando M.V. Dias a, Ana S. Luís a, José A.M. Prates a, Luís M.A. Ferreira a, Maria J. Romão b, Carlos M.G.A. Fontes a,⁎

a CIISA-Faculdade de Medicina Veterinária, Pólo Universitário do Alto da Ajuda, Avenida da Universidade Técnica, 1300-477 Lisboa, Portugal
b REQUIMTE/CQFB, Departamento de Química, FCT-UNL, 2829-516 Caparica, Portugal

Article info

Article history:
Received 29 March 2010
Received in revised form 29 June 2010
Accepted 3 July 2010
Available online 14 July 2010

Keywords:
Protein:carbohydrate interactions
Carbohydrate-Binding Module (CBM)
β-trefoil fold
Clostridium thermocellum

Abstract

Enzymes that degrade plant cell wall polysaccharides display a modular architecture comprising a catalytic domain bound to one or more non-catalytic carbohydrate-binding modules (CBMs). CBMs display considerable variation in primary structure and are grouped into 59 sequence-based families organized in the Carbohydrate-Active enZYme (CAZy) database. Here we report the crystal structure of CtCBM42A together with the biochemical characterization of two other members of family 42 CBMs from Clostridium thermocellum. CtCBM42A, CtCBM42B and CtCBM42C bind specifically to the arabinose side-chains of arabinoxylans and arabinan, suggesting that various cellulosomal components are targeted to these regions of the plant cell wall. The structure of CtCBM42A displays a β-trefoil fold comprising three sub-domains, designated α, β and γ. Each of the three sub-domains presents a putative carbohydrate-binding pocket in which a centrally located aspartate residue dominates ligand recognition. Intriguingly, the γ sub-domain of CtCBM42A is pivotal for arabinoxylan binding, while the concerted action of the β and γ sub-domains of CtCBM42B and CtCBM42C is apparently required for ligand sequestration. Thus, this work reveals that the binding mechanism of CBM42 members contrasts with that of homologous CBM13s, where recognition of complex polysaccharides results from the cooperative action of three protein sub-domains presenting similar affinities.

© 2010 Elsevier B.V. All rights reserved.

☆ Data deposition: Coordinates and observed structure factor amplitudes for CtCBM42A have been deposited in the Protein Data Bank with the PDB ID code 3KMV.
⁎ Corresponding author. Tel.: +351 213652876; fax: +351 213652889.
E-mail address: [email protected] (C.M.G.A. Fontes).
1 Equal contribution.

1570-9639/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.bbapap.2010.07.006

1. Introduction

Plant cell wall polysaccharides represent the most abundant reservoir of organic carbon within the biosphere. Recycling of photosynthetically fixed carbon through the action of microbial plant cell wall hydrolases is, therefore, a fundamental biological process that has recently acquired considerable industrial importance [1]. The development of second-generation biofuels derived from lignocellulosic biomass highlights the need to understand the biological processes that result in the production of soluble sugars from plant cell wall structural polysaccharides. It is well established that the complex and intricate nature of plant cell walls restricts the access of enzymes to their target substrates, primarily cellulose and hemicellulose. To overcome the limited accessibility of plant carbohydrates, microbial cellulases and hemicellulases have acquired complex molecular architectures, generally comprising catalytic domains and non-catalytic carbohydrate-binding modules (CBMs). The primary role of CBMs is to target the appended catalytic module to the proximity of its substrate, thereby potentiating catalysis and reducing accessibility constraints [2]. Carbohydrate-modifying enzymes and their associated modules, which include CBMs, have been classified into sequence-based families in the CAZy database [3]. Currently there are more than 50 sequence-based families of CBMs (March 2010), which recognize a variety of microbial, plant and mammalian glycans. Based on the topology of the carbohydrate-binding site, which complements the conformation of the target ligand, CBMs have been classified into three types [2]. In type A modules, which interact with the flat surfaces of crystalline polysaccharides, the binding site comprises a planar hydrophobic platform that contains three exposed aromatic amino acids [4]. These CBMs show no significant affinity for soluble polysaccharides, and the ligand specificity of CBM families that contain type A modules is usually invariant. In contrast, type B and type C CBMs recognize single carbohydrate chains either internally or at the termini, respectively, and present a ligand specificity that reflects the substrate specificity of the appended catalytic domain [5–7]. Structural studies revealed that type B and C CBMs accommodate their target ligands in clefts or pockets, respectively [2,8,9].

The three-dimensional structure of most CBMs conforms to a β-sandwich fold in which a single ligand-binding site lies in a cleft located on the concave surface of the protein [2]. Ligand plasticity in CBMs built on a β-sandwich platform usually results from subtle variations at the binding interface that confer the capacity to accommodate heterogeneity in the composition and linkage of the sugar backbone per se, or in the branches that may decorate the carbohydrate polymers [10,11]. In contrast, a variety of CBMs have evolved the capacity to recognize their target ligands at multiple binding sites, as exemplified by members of families CBM13 and CBM42, which adopt a β-trefoil fold [12,13]. These modules, which are typical type C CBMs, show a sequential 3-fold internal repeat of ~45 amino acid residues comprising three sub-domains, denoted α, β and γ, each containing a discrete ligand-binding site. Although CBM13 and CBM42 are built on a similar scaffold, the ligand-binding sites of the two structurally related families display different topologies and locations within the protein. Nevertheless, the three type C binding interfaces of both CBM13 and CBM42 have a pocket-like topology, which is particularly suited to recognizing small sugars [14,15]. For example, the fungal family 42 CBM (AkCBM42) of the Aspergillus kawachii GH54 arabinofuranosidase, termed AkAbf54 [12,15], was shown to bind the arabinose side-chains of arabinoxylans [12]. Asp435 and Asp488, located in AkCBM42 binding pockets β and γ, respectively, form two pivotal hydrogen bonds with the O-2 and O-3 atoms of an arabinose molecule captured in complex with the CBM, and thus play a key role in ligand recognition [12]. In addition, AkCBM42 His416 (pocket β) and His463 (pocket γ) each form an additional hydrogen bond with the O-5 atom of the arabinose moiety. Furthermore, aromatic stacking is provided by the Tyr417–Phe419–Tyr456 triad in pocket β and by Tyr464–Tyr359 in pocket γ. The pocket of the AkCBM42 α sub-domain was, apparently, non-functional, and it was suggested that this is due to the replacement of an aspartate by a glutamate at position 387. Members of family CBM42 were shown to bind specifically to the arabinose side-chains of arabinoxylans, while not interacting directly with the xylan backbone [12,16]. In contrast, members of CBM13 found in xylanases were shown to display a higher degree of affinity and specificity for insoluble xylan [13,17]; subtle variations in the ligand-binding sites allow the α, β and γ sub-domains to bind cooperatively three different xylan strands of the insoluble macromolecule. Both CBM13s found in xylanases and CBM42s located in arabinofuranosidases were shown to promote the activity of the appended catalytic domains against insoluble xylans [12,16].

Fig. 1. Modular architecture of C. thermocellum enzymes containing CBM42 modules. Abbreviations: Dk, type I dockerin domain; GH5, family 5 glycoside hydrolase domain; GH43, family 43 glycoside hydrolase domain; CBM42, family 42 carbohydrate-binding module.

Table 1
Data collection and refinement statistics.

Crystal
  Space group: P3221
  Unit cell parameters (Å): a = b = 106.37, c = 237.56
  Matthews coefficient (Å³/Da): 2.93

Data collection statistics
  X-ray source: ESRF, ID29
  Wavelength (Å): 0.976
  No. of observed reflections: 1414710
  No. of unique reflections: 144547 (20870)
  Resolution limits (Å): 45.22–1.80 (1.90–1.80)
  Completeness (%): 99.9 (99.8)
  Redundancy (multiplicity): 9.8 (9.2)
  Average I/σ(I): 19.8 (4.9)
  Rsym: 0.089 (0.381)

Refinement statistics
  Resolution limits (Å): 45.22–1.80
  R-factor (No. of reflections): 0.169 (144547)
  R-free (No. of reflections): 0.197 (7239)
  No. of protein residues in the asymmetric unit: 1103
  No. of water molecules in the asymmetric unit: 864
  No. of atoms in the asymmetric unit: 10307
  rmsd bond lengths (Å): 0.013
  rmsd bond angles (°): 1.416
  Average temperature factor (Å²)
    Protein main-chain atoms: 7.1
    Protein side-chain atoms: 9.0
    Water molecules: 15.7
    Ca2+: 29.6
  Ramachandran plot
    Residues in most favoured regions (%): 87.6
    Residues in additionally allowed regions (%): 11.6
    Residues in generously allowed regions (%): 0.8
  Overall G-factor: 0.01

Values in parentheses refer to the highest-resolution shell (1.90–1.80 Å).

Clostridium thermocellum produces a remarkably complex functional nanomachine, termed the cellulosome, which efficiently degrades plant cell wall polysaccharides [18,19]. Cellulosome assembly results from the interaction of type I dockerin domains, present in cellulosomal cellulases and hemicellulases, with the cohesin domains of a large non-catalytic integrating protein that acts as a molecular scaffold, termed CipA [20,21]. CipA contains a family 3 CBM that binds crystalline cellulose, thus anchoring the enzyme complex onto the plant cell wall [22]. In addition, most cellulosomal enzymes also contain CBMs that bind a variety of carbohydrates, allowing the individual catalytic units to interact with their specific target substrates [18]. The C. thermocellum proteome includes 72 polypeptides containing type I dockerins, which identifies those proteins as cellulosomal components [23]. Inspection of the primary sequences of these enzymes revealed that proteins with accession numbers Cthe_0015, Cthe_2138 and Cthe_2139 contain family 42 CBMs. The three enzymes are putative GH43 arabinofuranosidases, although the precise role of their CBM42s in the function of the associated enzymes remains unknown (see Fig. 1 for the molecular architecture of the three proteins). Here we report the structural and biochemical characterization of the C. thermocellum cellulosomal CBM42s. The structure of one of these proteins was solved and used to guide a mutagenesis study of the ligand specificity of the three CBM42s. The data suggest that cellulosomal CBM42s display a restricted specificity for arabinose side-chains located on complex polysaccharides. In addition, the various CBM42 sub-domains recognize the target ligand with distinct affinities.

Table 2
Polysaccharide specificity of C. thermocellum CBM42A, CBM42B and CBM42C determined by affinity polyacrylamide gel electrophoresis.

Ligandᵃ                                          CBM42Aᵇ   CBM42B   CBM42C
Hydroxyethylcellulose (HEC)                      −         −        −
Lichenan (Icelandic moss)                        −         −        −
Laminarin                                        −         −        −
Curdlan                                          −         −        −
Barley β-glucan                                  −         −        −
Oat spelt xylan                                  +         +        +
Wheat arabinoxylan                               +++       +++      ++
Rye arabinoxylan                                 ++        ++       ++
Glucuronoxylan                                   −         −        −
Xyloglucan (tamarind)                            +         +        +
Mannan                                           −         −        −
Galactomannan                                    −         −        −
Galactomannan (Gal:Man, 21:79) (carob)           −         −        −
Galactomannan (Gal:Man, 38:40) (guar)            −         −        −
Galactan (potato)                                −         −        −
Arabinogalactan                                  +         +        +
Galactan (lupin)                                 −         −        −
Arabinan                                         ++        ++       +++
Glucomannan                                      −         −        −
Rhamnogalacturonan (soya bean pectic fibre)      −         −        −
Rhamnogalacturonan I                             −         −        −
Pectic galactan (potato)                         −         −        −
Pustulan                                         −         −        −
Pullulan                                         −         −        −

ᵃ Ligands were screened at a concentration of 0.1 mg/ml.
ᵇ No detectable binding, −; clear binding, +; strong binding, ++; very strong binding, +++.

2. Materials and methods

2.1. Cloning, protein expression and purification

Genes encoding the family 42 CBMs of C. thermocellum Cthe_0015 (CtCBM42A, residues 26 to 170), Cthe_2138 (CtCBM42B, residues 19 to 166) and Cthe_2139 (CtCBM42C, residues 477 to 614) were amplified by PCR from C. thermocellum YS genomic DNA, using the primers listed in Table 1s. The PCR employed the thermostable DNA polymerase NZYPremium (NZYTech Ltd), and the primer pairs contained engineered NheI and XhoI, or NheI and SalI, restriction sites (Table 1s). Amplified DNA was cloned directly into pNZY28 (NZYTech Ltd) and sequenced to ensure that no mutations had accumulated during amplification. The genes were subsequently sub-cloned into NheI/XhoI-restricted pET21a (Novagen), generating pCBM42A, pCBM42B and pCBM42C, respectively; because SalI and XhoI produce compatible cohesive ends, SalI-digested inserts can be ligated into the XhoI site. All recombinant derivatives contained a C-terminal His6-tag.

Fig. 2. Structural alignment of family 42 CBMs. The secondary structure displayed in each sub-domain is based on the CtCBM42A structure. The grey highlighted residues compose the hydrophobic nucleus. Crucial residues for ligand binding are highlighted with inverted colors, and black boxes surround the residues providing an aromatic stacking environment. Residues in positions 18, 65 and 113, involved in aromatic stacking, display an extra α, β or γ symbol indicating the binding pocket to which they belong.

Escherichia coli BL21(DE3) cells harbouring the recombinant expression vectors were cultured in Luria–Bertani broth at 37 °C to mid-exponential phase (A600nm of 0.6), and recombinant protein expression was induced by the addition of 1 mM isopropyl 1-thio-β-D-galactopyranoside followed by incubation for a further 16 h at 25 °C. The His6-tagged recombinant proteins and their mutant derivatives (see below) were purified from cell-free extracts by immobilized metal ion affinity chromatography (IMAC) as described previously [23,24]. Purified proteins were buffer-exchanged into 20 mM Tris–HCl buffer, pH 7.5, containing 100 mM NaCl and 5 mM CaCl2. SDS-PAGE followed by Coomassie Blue staining showed that all recombinant proteins were more than 95% pure. Wild-type CtCBM42B and its mutant derivatives underwent limited proteolysis after expression in E. coli (data not shown). Since the truncated derivatives still bound the affinity column through their C-terminal His6-tag, proteolysis presumably occurs at the N-terminus of the protein. Adding several protease inhibitors and changing the growth conditions had no effect on proteolysis. For crystallization, CtCBM42A was further purified by size exclusion chromatography. Following IMAC, the protein was buffer-exchanged into 50 mM Hepes–HCl buffer, pH 7.5, containing 200 mM NaCl and 2 mM CaCl2, and then subjected to gel filtration on a HiLoad 16/60 Superdex 75 column (GE Healthcare) at a flow rate of 1 ml/min. Purified CtCBM42A was concentrated using an Amicon 10 kDa molecular weight cut-off centrifugal concentrator and washed three times into 2 mM CaCl2.

2.2. Source of sugars used

All soluble polysaccharides were purchased from Megazyme International (Bray, County Wicklow, Ireland), except oat spelt xylan, laminarin, galactomannan and hydroxyethylcellulose (HEC), which were obtained from Sigma. Catalogue numbers of polysaccharides for which more than one version exists are: wheat arabinoxylan, P-WAXYM; rye arabinoxylan, P-RAXY; galactomannan, carob (P-GALML); galactomannan, guar (P-GGMMV). Avicel (PH101) was obtained from Serva, while acid-swollen cellulose was prepared as described previously [9].

2.3. Site-directed mutagenesis

Site-directed mutagenesis was carried out using the PCR-based NZYMutagenesis site-directed mutagenesis kit (NZYTech Ltd) according to the manufacturer's instructions, with plasmids pCBM42A, pCBM42B and pCBM42C as templates. The sequences of the primers used to generate these mutants are displayed in Table 1s. The mutated genes were sequenced to ensure that only the intended mutations had been introduced.

Fig. 3. Structure of C. thermocellum CtCBM42A. In panel A, the CtCBM42A β-trefoil is displayed, revealing its internal three-fold pseudo-symmetry. The three trefoil sub-domains, each composed of four β strands (β1, β2, β3 and β4) and one α-helix (α1), are distinctively colored: α sub-domain (salmon), β sub-domain (blue) and γ sub-domain (green). A calcium ion (magenta) is located at the center of the hairpin triplet cap. In panel B, the CtCBM42A α, β and γ sub-domains are superimposed. The picture was prepared using UCSF Chimera [41].

Fig. 4. Overlay of CtCBM42A and AkCBM42A CBM sub-domains. CtCBM42A is colored as follows: α sub-domain (salmon), β sub-domain (blue) and γ sub-domain (green); AkCBM42 is shown in yellow. The β and γ pockets display the arabinotriose ligands of 2D44 (AkCBM42A). The picture was prepared using UCSF Chimera [41].

Table 3
Apparent affinities of C. thermocellum cellulosomal CBM42 modules and their mutant derivatives for arabinoxylan and arabinan. Data represent the mean ± standard deviation of three separate experiments.

Protein                   Sub-domain   Ka, arabinan (%⁻¹)   Ka, wheat arabinoxylan (%⁻¹)
CBM42A                    wild type    114 ± 5              137 ± 4
CBM42A_D41A               α            14 ± 1               18 ± 2
CBM42A_S91A               β            86 ± 2               96 ± 3
CBM42A_D138A              γ            NB                   NB
CBM42A_Y121A              γ            128 ± 6              155 ± 8
CBM42A_D138A_A26Y         α+γ          NB                   NB
CBM42A_D138A_F28Y         α+γ          NB                   NB
CBM42A_D138A_A26YF28Y     α+γ          NB                   NB
CBM42A_D138A_S73Y         β+γ          NB                   NB
CBM42A_D138A_S91D         β+γ          82 ± 5               106 ± 6
CBM42A_D138A_S73YS91D     β+γ          59 ± 3               57 ± 2
CBM42B                    wild type    122 ± 7              145 ± 5
CBM42B_D41A               α            114 ± 6              162 ± 8
CBM42B_D91A               β            21 ± 4               29 ± 2
CBM42B_D138A              γ            59 ± 1               17 ± 2
CBM42B_D91A_D138A         β+γ          8 ± 1                NB
CBM42C                    wild type    222 ± 8              139 ± 6
CBM42C_D41A               α            227 ± 9              79 ± 3
CBM42C_D91A               β            102 ± 4              26 ± 2
CBM42C_D138A              γ            96 ± 6               12 ± 1
CBM42C_D91A_D138A         β+γ          12 ± 2               NB

NB, affinity too low to allow calculation of Ka.
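Table 3 is easier to compare across proteins when each mutant is expressed relative to its own wild type. The Python sketch below does only that arithmetic on the mean Ka values transcribed from Table 3 (standard deviations are ignored, and "NB" entries are kept as None). The dictionary layout and the function name `percent_of_wild_type` are illustrative choices, not part of the original analysis.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]

# Mean Ka values (%^-1) transcribed from Table 3 as (arabinan, wheat arabinoxylan).
# None stands for "NB" (affinity too low to calculate Ka).
KA: Dict[str, Pair] = {
    "CBM42A": (114, 137),
    "CBM42A_D41A": (14, 18),
    "CBM42A_S91A": (86, 96),
    "CBM42A_D138A": (None, None),
    "CBM42A_Y121A": (128, 155),
    "CBM42A_D138A_A26Y": (None, None),
    "CBM42A_D138A_F28Y": (None, None),
    "CBM42A_D138A_A26YF28Y": (None, None),
    "CBM42A_D138A_S73Y": (None, None),
    "CBM42A_D138A_S91D": (82, 106),
    "CBM42A_D138A_S73YS91D": (59, 57),
    "CBM42B": (122, 145),
    "CBM42B_D41A": (114, 162),
    "CBM42B_D91A": (21, 29),
    "CBM42B_D138A": (59, 17),
    "CBM42B_D91A_D138A": (8, None),
    "CBM42C": (222, 139),
    "CBM42C_D41A": (227, 79),
    "CBM42C_D91A": (102, 26),
    "CBM42C_D138A": (96, 12),
    "CBM42C_D91A_D138A": (12, None),
}


def percent_of_wild_type(variant: str) -> Pair:
    """Ka of a variant as a percentage of its parent wild-type Ka (None where NB)."""
    parent = variant.split("_", 1)[0]  # e.g. "CBM42C_D138A" -> "CBM42C"
    mutant_ka, wild_type_ka = KA[variant], KA[parent]
    return tuple(
        None if ka is None else 100.0 * ka / wt
        for ka, wt in zip(mutant_ka, wild_type_ka)
    )


def as_text(value: Optional[float]) -> str:
    """Format a percentage, or 'NB' when no Ka could be determined."""
    return "NB" if value is None else f"{value:.0f}%"


if __name__ == "__main__":
    for name in KA:
        if "_" in name:  # skip the wild-type reference rows
            arabinan, arabinoxylan = percent_of_wild_type(name)
            print(f"{name:24s} {as_text(arabinan):>6s} {as_text(arabinoxylan):>6s}")
```

For example, this expresses CBM42C_D138A as roughly 43% (arabinan) and 9% (wheat arabinoxylan) of wild-type CtCBM42C affinity.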


2.4. Affinity gel electrophoresis

The affinity of CtCBM42A, CtCBM42B and CtCBM42C and their mutant derivatives for a range of soluble polysaccharides was determined by affinity gel electrophoresis (AGE). The method used was essentially that described by Tomme et al. [25], with the polysaccharide ligands at a concentration of 0.1% (w/v), unless stated otherwise. Electrophoresis was carried out for 4 h at room temperature in native 10% (w/v) polyacrylamide gels. Bovine serum albumin (BSA) was used as the non-binding negative control reference protein. Quantitative assessment of binding was carried out as described previously [26], using polysaccharide concentrations ranging from 0.002 to 0.1% (w/v). Briefly, the migration distances of the CBMs and the reference protein were measured from the bottom of the protein bands on the gels, and these data were used to determine the dissociation constants (KD) from plots of 1/(R0 − r) versus 1/C, according to the affinity equation shown in Eq. (1):

$$\frac{1}{R_0 - r} = \frac{1}{R_0 - R_C}\left(1 + \frac{K_D}{C}\right) \qquad (1)$$

where r is the relative migration distance of the CBM in the presence of ligand in the gel, R0 is the relative migration distance of the free CBM in the absence of ligand, RC is the relative migration distance of the complex at a large excess of ligand (where all CBM molecules are fully complexed), C is the concentration of the ligand in the gel and KD is the dissociation constant of the CBM for the macromolecular ligand. Because a plot of 1/(R0 − r) against 1/C is linear and crosses the abscissa at 1/C = −1/KD, KD values were determined as the inverse of the absolute value of that intercept. The apparent association constants reported in Table 3 (Ka, in %⁻¹) are the reciprocals of these KD values. The data show the mean and standard deviation of three independent experiments.
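The graphical procedure above amounts to fitting a straight line to Eq. (1). A minimal Python sketch follows. It assumes ordinary least squares (the text does not say how the line was fitted) and uses synthetic, noise-free migration values generated from Eq. (1) itself. The helper names `fit_affinity_equation` and `simulate_migration` and the example numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_affinity_equation(conc: Sequence[float], r: Sequence[float],
                          r0: float) -> Tuple[float, float]:
    """Fit Eq. (1), 1/(R0 - r) = (1/(R0 - RC)) * (1 + KD/C), by least squares.

    conc: ligand concentrations C in the gel, % (w/v)
    r:    relative migration distances of the CBM at each concentration
    r0:   relative migration distance of the CBM without ligand
    Returns (KD in % (w/v), RC).
    """
    xs = [1.0 / c for c in conc]          # abscissa: 1/C
    ys = [1.0 / (r0 - ri) for ri in r]    # ordinate: 1/(R0 - r)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals KD / (R0 - RC)
    intercept = mean_y - slope * mean_x   # equals 1 / (R0 - RC)
    # The fitted line crosses the abscissa at 1/C = -intercept/slope = -1/KD.
    x_intercept = -intercept / slope
    kd = 1.0 / abs(x_intercept)
    rc = r0 - 1.0 / intercept
    return kd, rc


def simulate_migration(conc: Sequence[float], r0: float, rc: float,
                       kd: float) -> List[float]:
    """Relative migration distances predicted by Eq. (1) for given R0, RC and KD."""
    return [r0 - (r0 - rc) / (1.0 + kd / c) for c in conc]


if __name__ == "__main__":
    # Hypothetical values for illustration only; not data from this study.
    concentrations = [0.002, 0.005, 0.01, 0.05, 0.1]  # % (w/v), range used above
    r_obs = simulate_migration(concentrations, r0=1.0, rc=0.2, kd=0.008)
    kd, rc = fit_affinity_equation(concentrations, r_obs, r0=1.0)
    print(f"KD = {kd:.4f} % (w/v), Ka = {1 / kd:.0f} %^-1, RC = {rc:.3f}")
```

Because the synthetic points lie exactly on the line, the fit recovers KD = 0.008% (w/v) (Ka = 125 %⁻¹) and RC = 0.2. With real gels, scatter at the lowest concentrations dominates the 1/C axis, which is one reason replicate experiments are averaged.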

2.5. Crystallization and data collection

Crystallization of CtCBM42A was performed using the hanging-drop vapour-diffusion method. Initial crystallization conditions were determined using Crystal Screens I and II from Hampton Research, with drops of 1 μl of 40 mg/ml protein and 1 μl of precipitating agent. Small crystals grew in the presence of sodium acetate (pH 4.6) and sodium formate, and further experiments were performed to improve crystal quality. The optimized crystallization conditions contained 0.1 M sodium acetate and 2.1 M sodium formate, and crystals of 0.1 × 0.1 mm were obtained within two weeks. Crystals were cryo-cooled with 30% glycerol prior to data collection and diffracted beyond 1.7 Å resolution. A complete data set was collected at beamline ID29 of the European Synchrotron Radiation Facility (ESRF, Grenoble, France). Data were integrated with MOSFLM [27] in space group P3221, with cell constants a = b = 106.37 Å and c = 237.56 Å, and scaled with SCALA [28] from the CCP4 suite [29] to a maximum resolution of 1.8 Å with 99.9% completeness. Data collection statistics are summarized in Table 1. Matthews coefficient calculations [30] suggested the presence of several molecules in the asymmetric unit, from six (3.55 Å³/Da and 65% solvent content) up to nine (2.36 Å³/Da and 48% solvent content).
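The solvent contents quoted above follow from Matthews' relation between V_M and solvent fraction. The Python sketch below assumes the commonly used approximation solvent fraction = 1 − 1.23/V_M (corresponding to a protein partial specific volume of about 0.74 cm³/g) and the six asymmetric units of a P3221 cell. The molecular mass argument of `matthews_coefficient` is a placeholder, because the mass of CtCBM42A is not stated here.

```python
import math

# Space group P3221 has six symmetry operators, i.e. six asymmetric units per cell.
P3221_ASYM_UNITS = 6


def trigonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume in Å^3 for a trigonal/hexagonal cell (a = b, gamma = 120°)."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume: float, mw_da: float, n_mol_asu: int,
                         n_asu: int = P3221_ASYM_UNITS) -> float:
    """V_M (Å^3/Da) for n_mol_asu molecules of mass mw_da in each asymmetric unit."""
    return cell_volume / (mw_da * n_mol_asu * n_asu)


def solvent_fraction(vm: float) -> float:
    """Matthews' approximation: solvent fraction = 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    volume = trigonal_cell_volume(106.37, 237.56)
    print(f"cell volume ~ {volume:.3e} Å^3")
    # V_M values quoted in the text for six and nine molecules per asymmetric unit.
    for n_mol, vm in ((6, 3.55), (9, 2.36)):
        print(f"{n_mol} molecules: V_M = {vm} Å^3/Da, "
              f"{100 * solvent_fraction(vm):.0f}% solvent")
```

Running it reproduces the 65% and 48% solvent contents quoted for six and nine molecules per asymmetric unit, from a cell volume of about 2.33 × 10⁶ Å³.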

2.6. Structure determination and refinement

Structure determination was performed with the program PHASER [31], using as a search model the CBM42 module of the α-L-arabinofuranosidase from A. kawachii (AkCBM42, PDB accession code 1WD3), which was first edited with CHAINSAW [32]. The high similarity between the two proteins (36% sequence identity) allowed the program to find a total of eight molecules in the asymmetric unit. Density modification, together with non-crystallographic symmetry averaging, was performed with NCSREF [29]; the final electron density maps showed clear solvent boundaries and good-quality electron density that allowed model building (figure of merit of 0.7). ARP/wARP [33] was used to build the protein model automatically. Model completion, editing and initial validation were carried out in COOT [34].

Restrained refinement of the molecular model was done using REFMAC 5.5 [35], and water molecules were added using COOT. Apart from a few residues at the N- and C-termini of the polypeptide chains, all atoms in the protein could be properly assigned and refined. In the last cycle of refinement, temperature factors were refined isotropically for protein atoms and anisotropically for the calcium atoms. R-work and R-free converged to 15.8% and 19.2%, respectively, and geometrical validation was carried out with PROCHECK [36], STAN [37–39] and MOLPROBITY [40]. According to these programs, the final model has 99.2% of the residues in the most favoured and additionally allowed regions of the Ramachandran plot and only 0.8% in generously allowed regions. Refinement details are summarized in Table 1.

Fig. 5. Quantitative AGE analysis of the interaction of C. thermocellum CtCBM42A, CtCBM42B and CtCBM42C and their mutant derivatives with arabinan (AA) and arabinoxylan (AX). Bovine serum albumin (lane 1), wild-type CBM42s (lane 2), D41A (lane 3), D91A (lane 4) and D138A (lane 5) were subjected to AGE in gels containing a range of different concentrations of barley β-glucan (panel A). The percentage concentrations of the soluble polysaccharides are displayed above the figures. Gels containing CtCBM42B present a non-interacting band that corresponds to the proteolytic derivative of CtCBM42B that apparently lacks CBM activity. In panel B, the plot of the AGE data used to determine the affinity of CBM42C (•) and its mutant derivatives D41A (■), D91A (▲) and D138A (▼) for arabinan is displayed as an example.

3. Results and discussion

3.1. C. thermocellum family 42 cellulosomal CBMs bind to arabinoxylan

To investigate the biological function of cellulosomal CBM42 modules, recombinant derivatives of the proteins were expressed in soluble form in E. coli and purified to electrophoretic homogeneity. CtCBM42B suffered some proteolysis, as judged by SDS-PAGE analysis (data not shown), although full-length CtCBM42B retained intact CBM function (see below). The capacity of CtCBM42A, CtCBM42B and CtCBM42C to bind a range of polysaccharides was assessed by affinity gel electrophoresis (AGE). The data, presented in Table 2, showed that all three CBM42s have similar ligand specificities and preferentially bind arabinoxylans and arabinan. The CtCBM42s displayed a lower affinity for xyloglucan, arabinogalactan and oat spelt xylan, a heteropolysaccharide that is less substituted than arabinoxylans and arabinan. In contrast, the CBMs did not interact with hydroxyethylcellulose, lichenan, laminarin, curdlan, barley β-glucan, mannan, galactomannan, galactan, glucomannan or a variety of pectins. Furthermore, the CBMs were unable to interact with insoluble forms of cellulose (data not shown). Thus, cellulosomal CBM42 domains are functional CBMs with a specificity restricted to arabinose-containing polysaccharides. The primary sequences of the three C. thermocellum CBM42s were aligned with the sequence of the fungal family 42 CBM AkCBM42 [12,15]. The data, presented in Fig. 2, revealed that all cellulosomal CBM42s display considerable conservation at the putative residues involved in ligand recognition. However, the CtCBM42B α sub-domain contains an asparagine in the position usually occupied by the conserved histidine, and in the β sub-domain of CtCBM42A the conserved aspartate is replaced by a serine. Thus, not all sub-domains of CBM42 family members display exact conservation of the putative ligand-interacting residues. The significance of these differences for CBM function is explored further below.


3.2. The crystal structure of CtCBM42A

To gain further insight into the molecular determinants of ligand specificity within CBM42 domains that recognize the arabinose side-chains of complex hemicelluloses, the crystal structure of CtCBM42A was determined (Fig. 3A). As expected, CtCBM42A adopts a β-trefoil architecture, and its three sub-domains (α, β and γ) superimpose with an rmsd of 0.76–1.22 Å (Fig. 3B). CtCBM42A pockets α and γ are lined by Asp41, His25, Phe28 and Tyr65 (pocket α) and by Asp138, His120, Tyr121 and Tyr18 (pocket γ), and may constitute functional binding sites (Fig. 3B). In contrast, the β sub-domain of CtCBM42A presents marked changes in the residues positioned at the pocket interface, which may affect its capacity to recognize carbohydrates. Thus, the potentially critical residues for carbohydrate binding are not conserved in all three CtCBM42A pockets (Fig. 3B), suggesting that not all binding sites are functional.

The architecture of the three pockets of CtCBM42A was compared with those of the fungal AkCBM42 (Fig. 4). In the α sub-domain, one of the tyrosine residues has been replaced by a threonine in AkCBM42 and by an alanine in CtCBM42A. The conserved aspartate present in the CtCBM42A α sub-domain is missing from the AkCBM42 α sub-domain. In the β sub-domain, the two CBM42s have a similar arrangement, although CtCBM42A lacks the conserved aspartate, which is replaced at this position by Ser91. Analysis of the γ sub-domain revealed that the putative sugar-interacting residues of CtCBM42A occupy approximately the same positions as the corresponding residues of AkCBM42. However, the distance separating the phenolic oxygen atoms of the tyrosines is slightly larger in CtCBM42A (12.5 Å) than in AkCBM42 (10.2 Å). In addition, CtCBM42A has a third tyrosine residue, Tyr123, in the vicinity of the binding cleft. This residue, located 14.9 Å and 8.5 Å from the two other tyrosines, confines the ligand-binding arrangement and may influence the specificity of the site. Taken together, these data suggest that the β sub-domain is inactive in CtCBM42A, in clear contrast with the fungal AkCBM42, where the β sub-domain participates in ligand recognition but the α sub-domain is inactive [12].

Fig. 6. CBM42 family phylogenetic tree. Phylogenetic tree based on a CBM42 family multiple sequence alignment. CtCBM42A, CtCBM42B and CtCBM42C are highlighted in yellow, while AkCBM42 and four other related Aspergillus sequences are highlighted in red. The sequence name is color coded according to the catalytic Glycoside Hydrolase module family to which the CBM is attached: GH2 (green), GH43 (blue), GH54 (red), GH93 (magenta) and Not Classified (black). The small phylogram insert is based on an alignment considering α (red), β (blue) and γ (green) sub-domains as individual sequences. A grey box surrounds eukaryotic sequences (the figure was produced based on MAFFT/NJ-UPGMA [42]).

A calcium ion, coordinated by the main-chain carbonyls of Ala30, Leu77 and Leu125, by the side chains of Asp29 and Asp76 and by one water molecule, has been identified in each of the eight molecules of the asymmetric unit. This is the first time a calcium ion has been observed in a member of the CBM42 family. Owing to its internal position within the protein, the ion is likely to have a structural role and thus should not participate directly in ligand recognition. This was confirmed by the observation that all the clostridial CBM42s analyzed retained the capacity to bind their target polysaccharides in the presence of EDTA (data not shown). AkCBM42 contains one disulfide bond that is not present in CtCBM42A, although this difference has no consequence for the overall structure.

3.3. Mapping CBM42 ligand binding sites by site-directed mutagenesis

As described above, inspection of the three putative pockets of CtCBM42A, CtCBM42B and CtCBM42C reveals the presence of a highly conserved aspartate residue that was suggested to play a pivotal role in ligand binding [12] (Fig. 2). To evaluate the importance of this residue for arabinan and arabinoxylan recognition, mutant derivatives of CtCBM42A, CtCBM42B and CtCBM42C at each of the three sub-domains were produced and their biochemical properties compared. The data for CtCBM42A, presented in Table 3 and Fig. 5, revealed that the D138A mutation in the γ pocket abolishes binding to both arabinoxylan and arabinan. The S91A mutation in sub-domain β resulted in a modest decrease in the apparent affinity of CtCBM42A for both polysaccharides, while replacement of the acidic residue in the α sub-domain (D41A) led to a considerable reduction in relative affinity. Thus, the data suggest that Asp138 dominates ligand recognition by CtCBM42A and that, in contrast with what was previously reported for AkCBM42, the CtCBM42A α sub-domain has a role in carbohydrate binding through its Asp41 residue. In general, no significant differences were found when comparing affinities for arabinan and arabinoxylan. Mutation of Tyr121, which potentially establishes a hydrophobic platform for stacking arabinose residues, had a slight positive effect on ligand affinity (Table 3). This could be related to relief of the steric hindrance imposed by the aromatic triad in the γ pocket.

The data presented above revealed that CtCBM42A sub-domains α and β have a much lower binding capacity than sub-domain γ. To identify the residue substitutions responsible for this reduced affinity, Ala26 and Phe28 in sub-domain α and Ser73 and Ser91 in sub-domain β were replaced by the equivalent residues present in sub-domain γ (Fig. 2). These mutant derivatives were generated using the gene encoding the D138A protein as the PCR template, since this CBM has a non-functional γ sub-domain. The S91D mutation partially restored the affinity of the D138A mutant, suggesting that the lack of an aspartate residue in the β pocket explains, in part, its reduced affinity. Introducing a tyrosine to create an aromatic triad in this pocket did not improve ligand affinity. The equivalent mutations in the α sub-domain had no effect on ligand recognition, suggesting that other, as yet unidentified, topological differences are responsible for the impaired affinity of these two sub-domains.
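The combined mutants described here (for example, S91D built on the D138A background) use the standard notation of wild-type residue, position and substituted residue. A minimal sketch of how such designations can be applied to a sequence in silico is shown below, with a check that the stated wild-type residue is actually present. The function name and the toy sequence are hypothetical, and residue numbers are assumed to be 1-based positions in the supplied sequence, which may not match the numbering used for the real constructs.

```python
import re
from typing import Sequence

# Wild-type residue, 1-based position, substituted residue (e.g. "S91D").
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_mutations(sequence: str, mutations: Sequence[str]) -> str:
    """Apply point mutations in order, verifying each wild-type residue first."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical toy sequence with Ser at 91 and Asp at 138 (not a real CBM42).
    toy = ["G"] * 140
    toy[90], toy[137] = "S", "D"
    double_mutant = apply_mutations("".join(toy), ["D138A", "S91D"])
    print(double_mutant[90], double_mutant[137])
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are easy to introduce when construct numbering and full-length protein numbering differ.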

To evaluate which of the CtCBM42B and CtCBM42C sub-domains interact with arabinoxylan and arabinan, the apparent affinities of the wild-type proteins and their various mutant derivatives for the two polysaccharides were determined by AGE. The data, presented in Table 3 and Fig. 5, revealed that CtCBM42B (like CtCBM42A) recognizes both arabinan and arabinoxylan, while CtCBM42C seems to bind arabinan with higher affinity. Similarly to what was described for CtCBM42A, Asp138 and Asp91 play a key role in the recognition of arabinoxylan and arabinan by both CtCBM42B and CtCBM42C. Intriguingly, Asp41 seems to play no direct role in the recognition of either polysaccharide by CtCBM42B. In contrast, this residue participates in ligand recognition by CtCBM42C, since CtCBM42C D41A displays reduced affinity for arabinoxylan. Thus, the data suggest that both pockets β and γ contribute to the recognition of carbohydrates by the CtCBM42B and CtCBM42C modules. A considerable reduction in ligand affinity is observed only when the β and γ sub-domains are mutated.
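Apparent affinities in AGE experiments are commonly derived from how strongly the CBM is retarded as the polysaccharide concentration in the gel increases [25,26]. This section does not spell out the fitting procedure, so the sketch below assumes the widely used Takeo-Nakamura relation, 1/r = (1/R0)(1 + c/Kd), where r is the relative mobility at ligand concentration c and R0 the mobility without ligand. A straight-line fit of 1/r against c then gives Kd = intercept/slope (and Ka = 1/Kd). The function name is hypothetical and the demo uses synthetic data, not values from Table 3.

```python
import numpy as np


def takeo_kd(concentrations, mobilities) -> float:
    """Estimate Kd from AGE data assuming 1/r = (1/R0) * (1 + c/Kd).

    concentrations: ligand concentrations c in the gel (Kd is returned in the same unit).
    mobilities: relative mobility r of the CBM measured at each concentration.
    """
    c = np.asarray(concentrations, dtype=float)
    r = np.asarray(mobilities, dtype=float)
    if c.shape != r.shape or c.size < 2:
        raise ValueError("need at least two paired (c, r) measurements")
    # Linear least squares of 1/r on c: slope = 1/(R0*Kd), intercept = 1/R0.
    slope, intercept = np.polyfit(c, 1.0 / r, 1)
    if slope <= 0:
        raise ValueError("no retardation with increasing ligand; Kd cannot be estimated")
    # The fitted line crosses the c axis at -Kd.
    return float(intercept / slope)


if __name__ == "__main__":
    # Synthetic data generated with R0 = 0.8 and Kd = 0.05 (e.g. % w/v).
    c = [0.0, 0.01, 0.025, 0.05, 0.1]
    r = [0.8 / (1.0 + ci / 0.05) for ci in c]
    kd = takeo_kd(c, r)
    print(f"Kd = {kd:.4f}, Ka = {1.0 / kd:.1f}")
```

Because polysaccharide concentrations are usually expressed per mass rather than per mole, the resulting Kd and Ka are apparent values in the concentration unit used, which is one reason such affinities are typically compared relative to the wild-type protein.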

Taken together, these data suggest that CBM42 presents a different mechanism of ligand recognition when compared with the recently described members of the CBM13 family that bind xylan. CBM13s from Streptomyces lividans and Streptomyces olivaceoviridis bind the xylan backbone internally. Their three binding sites (α, β and γ) cooperate during ligand recognition, although they display slightly different affinities for xylohexaose. Accommodation of xylooligosaccharides in a pocket-like binding site results from subtle changes in the amino acid residues that decorate the three binding interfaces. Data reported by Fujimoto et al. [14], Notenboom et al. [13] and Scharpf et al. [17] suggest that the structure of a xylan molecule precludes the binding of a single carbohydrate chain by the three sub-domains at the same time, indicating that the cooperative effects are mainly important for binding insoluble xylan clusters. It is thus intriguing that a molecular scaffold that originally displayed three functional binding sites, such as CBM42, has evolved a binding mechanism that restricts ligand recognition primarily to a single binding pocket, as described here for CtCBM42A, CtCBM42B and CtCBM42C. Ligand recognition through a single interface may allow the correct positioning of the associated catalytic domain on the arabinoxylan molecule, thus efficiently directing the catalytic site of the enzyme onto the substrate. In addition, the α and β sub-domains of CtCBM42A may have an unknown ligand specificity, which could increase the ligand plasticity of CBM42s. Finally, the location of CtCBM42A, CtCBM42B and CtCBM42C in modular enzymes that are part of a highly elaborate multi-enzyme complex, the cellulosome, may have generated positional and steric constraints that resulted in the evolution of higher ligand affinities in specific CBM42 sub-domains.

3.4. Evolutionary relationships among CBM42 family members

The evolutionary relationships among the family 42 CBMs deposited in the CAZy database are depicted in the phylogenetic tree presented in Fig. 6. Currently there are 50 polypeptide sequences available in CBM42, which include representatives from eubacteria and eukaryotes. Overall, the analysis indicates that polypeptides originating from eubacteria and from eukaryotes are grouped in different clusters. In addition, sequence clustering also parallels the nature of the catalytic domain to which the module is appended. Thus, there are two clearly distinct groups of CBM42 modules: those appended to bacterial GH43 enzymes and those belonging to eukaryotic GH54 enzymes. Although GH54 enzymes comprise mainly arabinofuranosidases (EC 3.2.1.55), the GH43 enzymes that are bound to CBM42 modules remain to be characterized biochemically. GH43 is a large and diverse GH family, currently containing more than 1100 entries. Most GH43 members remain to be characterized, although the family contains β-xylosidase (EC 3.2.1.37), β-1,3-xylosidase (EC 3.2.1.-), α-L-arabinofuranosidase (EC 3.2.1.55), arabinanase (EC 3.2.1.99), xylanase (EC 3.2.1.8) and galactan 1,3-β-galactosidase (EC 3.2.1.145) activities. The Cthe_0015, Cthe_2138 and Cthe_2139 GH43 domains display strong identity with uncharacterized GH43 domains, which precludes any speculation about their putative substrate specificities. In addition, two small clusters containing GH2 and unclassified catalytic domains were detected, while one of the enzymes is appended to a GH93 module. The small phylogram inserted in Fig. 6 depicts a phylogeny based on an alignment that considers the CBM42 α, β and γ sub-domains as individual sequences.
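The Fig. 6 tree was produced with MAFFT's NJ/UPGMA option [42]. To illustrate only the UPGMA half of that option, the sketch below clusters a small, hypothetical distance matrix (the labels and distances are invented, not CBM42 data). It repeatedly merges the closest pair of clusters, averages distances weighted by cluster size, and returns a rooted tree in Newick format with branch lengths taken as height differences. It is not MAFFT's implementation; neighbour joining (NJ) uses a different merging criterion and does not assume a molecular clock.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Build a rooted UPGMA tree from a symmetric distance matrix; return Newick."""
    n = len(names)
    if n == 0:
        raise ValueError("need at least one sequence")
    # Active clusters: id -> (Newick label, number of leaves, node height).
    clusters: Dict[int, Tuple[str, int, float]] = {i: (names[i], 1, 0.0) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        label_a, size_a, h_a = clusters.pop(a)
        label_b, size_b, h_b = clusters.pop(b)
        height = d_ab / 2.0
        label = f"({label_a}:{height - h_a:.3f},{label_b}:{height - h_b:.3f})"
        # Size-weighted average distance from the new cluster to each remaining one.
        new_d = {
            k: (size_a * d[(min(a, k), max(a, k))] + size_b * d[(min(b, k), max(b, k))])
            / (size_a + size_b)
            for k in clusters
        }
        # Drop distances involving the merged clusters, then add the new ones.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for k, v in new_d.items():
            d[(k, next_id)] = v
        clusters[next_id] = (label, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # Hypothetical four-taxon example: A/B and C/D form two tight pairs.
    taxa = ["A", "B", "C", "D"]
    matrix = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
    print(upgma(taxa, matrix))
```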

4. Conclusions

Here, the ligand specificities of the three family 42 CBMs of the C. thermocellum cellulosome (CtCBM42A, CtCBM42B and CtCBM42C) were evaluated and the crystal structure of CtCBM42A was determined at 1.8 Å resolution. Cellulosomal CBM42s are specifically located in GH43 enzymes, an enzyme family that usually displays arabinofuranosidase activity, removing the arabinose side chains of complex hemicelluloses. Cellulosomal CBM42s were shown to share identical ligand specificities and to bind arabinose-containing polysaccharides, particularly arabinoxylan and arabinan, thus reflecting the putative substrate specificity of the associated catalytic modules. Hence, C. thermocellum CBM42s target the cellulosomal arabinofuranosidase catalytic components to the plant cell wall regions presenting arabinose-containing polysaccharides. Thus, in contrast with the variation in ligand specificity observed in most CBM families, CBM42 members seem to present a much more restricted capacity for recognizing plant cell wall polysaccharides. The structure of CtCBM42A revealed a β-trefoil fold containing three putative ligand-binding sub-domains with a pocket-like architecture. However, mutagenesis experiments revealed that although CtCBM42A, CtCBM42B and CtCBM42C hold a topology with three potential ligand-binding sites, binding of arabinoxylan and arabinan by CtCBM42A is orchestrated primarily by the γ sub-domain. On the other hand, CtCBM42B and CtCBM42C bind arabinose-containing polysaccharides through the concerted action of both the β and γ pockets. Thus, this work confirms that CBM42 modules display a platform that could sustain three potential ligand-binding pockets, although sequestration of arabinose-containing hemicelluloses is dominated primarily by the γ sub-domain. The work presented here exemplifies how similar CBM scaffolds presenting multiple ligand-binding interfaces have evolved different mechanisms of polysaccharide recognition.

Acknowledgments

This work was supported by the grant PTDC/BIA-PRO/69732/2006 from the FCT, Portugal, and by the individual fellowships SFRH/BD/32326/2006 (T.R.), SFRH/BPD/26991/2006 (T.S.S.) and SFRH/BPD/26508/2006 (F.M.V.D.).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.bbapap.2010.07.006.

References

[1] M.E. Himmel, E.A. Bayer, Lignocellulose conversion to biofuels: current challenges, global perspectives, Curr. Opin. Biotechnol. 20 (2009) 316–317.

[2] A.B. Boraston, D.N. Bolam, H.J. Gilbert, G.J. Davies, Carbohydrate-binding modules: fine-tuning polysaccharide recognition, Biochem. J. 382 (2004) 769–781.

[3] B.L. Cantarel, P.M. Coutinho, C. Rancurel, T. Bernard, V. Lombard, B. Henrissat, The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics, Nucleic Acids Res. 37 (2009) D233–D238.

[4] S. Raghothama, P.J. Simpson, L. Szabo, T. Nagy, H.J. Gilbert, M.P. Williamson, Solution structure of the CBM10 cellulose binding module from Pseudomonas xylanase A, Biochemistry 39 (2000) 978–984.

[5] A.B. Boraston, T.J. Revett, C.M. Boraston, D. Nurizzo, G.J. Davies, Structural and thermodynamic dissection of specific mannan recognition by a carbohydrate binding module, TmCBM27, Structure 11 (2003) 665–675.

[6] S.J. Charnock, D.N. Bolam, J.P. Turkenburg, H.J. Gilbert, L.M. Ferreira, G.J. Davies, C.M. Fontes, The X6 “thermostabilizing” domains of xylanases are carbohydrate-binding modules: structure and biochemistry of the Clostridium thermocellum X6b domain, Biochemistry 39 (2000) 5013–5021.

[7] L. Szabo, S. Jamal, H. Xie, S.J. Charnock, D.N. Bolam, H.J. Gilbert, G.J. Davies, Structure of a family 15 carbohydrate-binding module in complex with xylopentaose. Evidence that xylan binds in an approximate 3-fold helical conformation, J. Biol. Chem. 276 (2001) 49061–49065.

[8] C. Montanier, A.L. van Bueren, C. Dumon, J.E. Flint, M.A. Correia, J.A. Prates, S.J. Firbank, R.J. Lewis, G.G. Grondin, M.G. Ghinet, T.M. Gloster, C. Herve, J.P. Knox, B.G. Talbot, J.P. Turkenburg, J. Kerovuo, R. Brzezinski, C.M. Fontes, G.J. Davies, A.B. Boraston, H.J. Gilbert, Evidence that family 35 carbohydrate binding modules display conserved specificity but divergent function, Proc. Natl. Acad. Sci. USA 106 (2009) 3065–3070.

[9] S. Najmudin, C.I. Guerreiro, A.L. Carvalho, J.A. Prates, M.A. Correia, V.D. Alves, L.M. Ferreira, M.J. Romao, H.J. Gilbert, D.N. Bolam, C.M. Fontes, Xyloglucan is recognized by carbohydrate-binding modules that interact with beta-glucan chains, J. Biol. Chem. 281 (2006) 8815–8828.

[10] J.L. Henshaw, D.N. Bolam, V.M. Pires, M. Czjzek, B. Henrissat, L.M. Ferreira, C.M. Fontes, H.J. Gilbert, The family 6 carbohydrate binding module CmCBM6-2 contains two ligand-binding sites with distinct specificities, J. Biol. Chem. 279 (2004) 21552–21559.

[11] V.M. Pires, J.L. Henshaw, J.A. Prates, D.N. Bolam, L.M. Ferreira, C.M. Fontes, B. Henrissat, A. Planas, H.J. Gilbert, M. Czjzek, The crystal structure of the family 6 carbohydrate binding module from Cellvibrio mixtus endoglucanase 5a in complex with oligosaccharides reveals two distinct binding sites with different ligand specificities, J. Biol. Chem. 279 (2004) 21560–21568.

[12] A. Miyanaga, T. Koseki, Y. Miwa, Y. Mese, S. Nakamura, A. Kuno, J. Hirabayashi, H. Matsuzawa, T. Wakagi, H. Shoun, S. Fushinobu, The family 42 carbohydrate-binding module of family 54 alpha-L-arabinofuranosidase specifically binds the arabinofuranose side chain of hemicellulose, Biochem. J. 399 (2006) 503–511.

[13] V. Notenboom, A.B. Boraston, S.J. Williams, D.G. Kilburn, D.R. Rose, High-resolution crystal structures of the lectin-like xylan binding domain from Streptomyces lividans xylanase 10A with bound substrates reveal a novel mode of xylan binding, Biochemistry 41 (2002) 4246–4254.

[14] Z. Fujimoto, A. Kuno, S. Kaneko, H. Kobayashi, I. Kusakabe, H. Mizuno, Crystal structures of the sugar complexes of Streptomyces olivaceoviridis E-86 xylanase: sugar binding structure of the family 13 carbohydrate binding module, J. Mol. Biol. 316 (2002) 65–78.

[15] A. Miyanaga, T. Koseki, H. Matsuzawa, T. Wakagi, H. Shoun, S. Fushinobu, Crystal structure of a family 54 alpha-L-arabinofuranosidase reveals a novel carbohydrate-binding module that can bind arabinose, J. Biol. Chem. 279 (2004) 44907–44914.

[16] H. Ichinose, M. Yoshida, Z. Fujimoto, S. Kaneko, Characterization of a modular enzyme of exo-1,5-alpha-L-arabinofuranosidase and arabinan binding module from Streptomyces avermitilis NBRC14893, Appl. Microbiol. Biotechnol. 80 (2008) 399–408.

[17] M. Scharpf, G.P. Connelly, G.M. Lee, A.B. Boraston, R.A. Warren, L.P. McIntosh, Site-specific characterization of the association of xylooligosaccharides with the CBM13 lectin-like xylan binding domain from Streptomyces lividans xylanase 10A by NMR spectroscopy, Biochemistry 41 (2002) 4255–4263.

[18] E.A. Bayer, J.P. Belaich, Y. Shoham, R. Lamed, The cellulosomes: multienzyme machines for degradation of plant cell wall polysaccharides, Annu. Rev. Microbiol. 58 (2004) 521–554.

[19] A.L. Carvalho, F.M. Dias, J.A. Prates, T. Nagy, H.J. Gilbert, G.J. Davies, L.M. Ferreira, M.J. Romao, C.M. Fontes, Cellulosome assembly revealed by the crystal structure of the cohesin–dockerin complex, Proc. Natl. Acad. Sci. USA 100 (2003) 13809–13814.

[20] A.L. Carvalho, F.M. Dias, T. Nagy, J.A. Prates, M.R. Proctor, N. Smith, E.A. Bayer, G.J. Davies, L.M. Ferreira, M.J. Romao, C.M. Fontes, H.J. Gilbert, Evidence for a dual binding mode of dockerin modules to cohesins, Proc. Natl. Acad. Sci. USA 104 (2007) 3089–3094.

[21] B.A. Pinheiro, M.R. Proctor, C. Martinez-Fleites, J.A. Prates, V.A. Money, G.J. Davies, E.A. Bayer, C.M. Fontes, H.P. Fierobe, H.J. Gilbert, The Clostridium cellulolyticum dockerin displays a dual binding mode for its cohesin partner, J. Biol. Chem. 283 (2008) 18422–18430.

[22] J. Tormo, R. Lamed, A.J. Chirino, E. Morag, E.A. Bayer, Y. Shoham, T.A. Steitz, Crystal structure of a bacterial family-III cellulose-binding domain: a general mechanism for attachment to cellulose, EMBO J. 15 (1996) 5739–5751.

[23] B.A. Pinheiro, H.J. Gilbert, K. Sakka, K. Sakka, V.O. Fernandes, J.A. Prates, V.D. Alves, D.N. Bolam, L.M. Ferreira, C.M. Fontes, Functional insights into the role of novel type I cohesin and dockerin domains from Clostridium thermocellum, Biochem. J. (2009).

[24] A.L. Carvalho, A. Goyal, J.A. Prates, D.N. Bolam, H.J. Gilbert, V.M. Pires, L.M. Ferreira, A. Planas, M.J. Romao, C.M. Fontes, The family 11 carbohydrate-binding module of Clostridium thermocellum Lic26A-Cel5E accommodates beta-1,4- and beta-1,3-1,4-mixed linked glucans at a single binding site, J. Biol. Chem. 279 (2004) 34785–34793.

[25] P. Tomme, A. Boraston, J.M. Kormos, R.A. Warren, D.G. Kilburn, Affinity electrophoresis for the identification and characterization of soluble sugar binding by carbohydrate-binding modules, Enzyme Microb. Technol. 27 (2000) 453–458.

[26] K. Takeo, Affinity electrophoresis: principles and applications, Electrophoresis 5 (1984) 187–195.

[27] A.G.W. Leslie, Recent changes to the MOSFLM package for processing film and image plate data, Newslett. Protein Crystallogr. 26 (1992).

[28] W. Kabsch, Automatic indexing of rotation diffraction patterns, J. Appl. Crystallogr. 21 (1988) 67–71.

[29] Collaborative Computational Project, Number 4, The CCP4 suite: programs for protein crystallography, Acta Crystallogr. D50 (1994) 760–763.

[30] B.W. Matthews, Solvent content of protein crystals, J. Mol. Biol. 33 (1968) 491–497.

[31] A.J. McCoy, R.W. Grosse-Kunstleve, P.D. Adams, M.D. Winn, L.C. Storoni, R.J. Read, Phaser crystallographic software, J. Appl. Crystallogr. 40 (2007) 658–674.

[32] N. Stein, CHAINSAW: a program for mutating pdb files used as templates in molecular replacement, J. Appl. Crystallogr. 41 (2008) 3.

[33] A. Perrakis, R. Morris, V.S. Lamzin, Automated protein model building combined with iterative structure refinement, Nat. Struct. Biol. 6 (1999) 458–463.

[34] P. Emsley, K. Cowtan, Coot: model-building tools for molecular graphics, Acta Crystallogr. D Biol. Crystallogr. 60 (2004) 2126–2132.

[35] G.N. Murshudov, A.A. Vagin, E.J. Dodson, Refinement of macromolecular structures by the maximum-likelihood method, Acta Crystallogr. D Biol. Crystallogr. 53 (1997) 240–255.

[36] R.A. Laskowski, M.W. MacArthur, D.S. Moss, J.M. Thornton, PROCHECK: a program to check the stereochemical quality of protein structures, J. Appl. Crystallogr. 26 (1993) 283–291.

[37] G.J. Kleywegt, T.A. Jones, Phi/psi-chology: Ramachandran revisited, Structure 4 (1996) 1395–1400.

[38] M. Nayal, E. Di Cera, Valence screening of water in protein crystals reveals potential Na+ binding sites, J. Mol. Biol. 256 (1996) 228–234.

[39] M.S. Weiss, R. Hilgenfeld, A method to detect nonproline cis peptide bonds in proteins, Biopolymers 50 (1999) 536–544.

[40] I.W. Davis, A. Leaver-Fay, V.B. Chen, J.N. Block, G.J. Kapral, X. Wang, L.W. Murray, W.B. Arendall III, J. Snoeyink, J.S. Richardson, D.C. Richardson, MolProbity: all-atom contacts and structure validation for proteins and nucleic acids, Nucleic Acids Res. 35 (2007) W375–W383.

[41] E.F. Pettersen, T.D. Goddard, C.C. Huang, G.S. Couch, D.M. Greenblatt, E.C. Meng, T.E. Ferrin, UCSF Chimera—a visualization system for exploratory research and analysis, J. Comput. Chem. 25 (2004) 1605–1612.

[42] K. Katoh, H. Toh, Recent developments in the MAFFT multiple sequence alignment program, Brief. Bioinform. 9 (2008) 286–298.