Transcript
Page 1: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

J Grid Computing (2012) 10:743–767DOI 10.1007/s10723-012-9246-z

WeNMR: Structural Biology on the Grid

Tsjerk A. Wassenaar · Marc van Dijk · Nuno Loureiro-Ferreira · Gijs van der Schot · Sjoerd J. de Vries ·Christophe Schmitz · Johan van der Zwan · Rolf Boelens · Andrea Giachetti · Lucio Ferella ·Antonio Rosato · Ivano Bertini · Torsten Herrmann · Hendrik R. A. Jonker · Anurag Bagaria ·Victor Jaravine · Peter Güntert · Harald Schwalbe · Wim F. Vranken · Jurgen F. Doreleijers ·Gert Vriend · Geerten W. Vuister · Daniel Franke · Alexey Kikhney · Dmitri I. Svergun ·Rasmus H. Fogh · John Ionides · Ernest D. Laue · Chris Spronk · Simonas Jurkša · Marco Verlato ·Simone Badoer · Stefano Dal Pra · Mirco Mazzucato · Eric Frizziero · Alexandre M. J. J. Bonvin

Received: 12 December 2011 / Accepted: 18 September 2012 / Published online: 30 November 2012© The Author(s) 2012. This article is published with open access at Springerlink.com

Abstract The WeNMR (http://www.wenmr.eu)project is a European Union funded interna-tional effort to streamline and automate analy-

T. A. Wassenaar · M. van Dijk · N. Loureiro-Ferreira ·G. van der Schot · S. J. de Vries · C. Schmitz ·J. van der Zwan · R. Boelens · A. M. J. J. Bonvin (B)Bijvoet Center for Biomolecular Research, Faculty ofScience, Utrecht University, Padualaan 8, 3584 CH,Utrecht, The Netherlandse-mail: [email protected]

A. Giachetti · L. Ferella · A. Rosato · I. BertiniMagnetic Resonance Center, University of Florence,50019 Sesto Fiorentino, Italy

T. HerrmannCentre de RMN à très Hauts Champs, Institut desSciences Analytiques, Université de Lyon, UMR-5280CNRS, ENS Lyon, UCB Lyon 1, 5 rue de la Doua,69100 Villeurbanne, France

H. R. A. Jonker · H. SchwalbeInstitute of Organic Chemistry and ChemicalBiology and Biomolecular Magnetic ResonanceCenter, Goethe University Frankfurt,60438 Frankfurt am Main, Germany

A. Bagaria · V. Jaravine · P. GüntertInstitute of Biophysical Chemistry and BiomolecularMagnetic Resonance Center, Goethe UniversityFrankfurt, 60438 Frankfurt am Main, Germany

W. F. VrankenEuropean Bioinformatics Institute, Hinxton,Cambridge, CB10 1SD, UK

sis of Nuclear Magnetic Resonance (NMR) andSmall Angle X-Ray scattering (SAXS) imag-ing data for atomic and near-atomic resolution

J. F. DoreleijersProtein Biophysics/IMM, Radboud UniversityNijmegen, Geert Grooteplein 26-28,Nijmegen, The Netherlands

G. VriendCMBI, Radboud University Nijmegen MedicalCentre, Geert Grooteplein 26-28,Nijmegen, The Netherlands

G. W. VuisterDepartment of Biochemistry, School of BiologicalSciences, Henry Wellcome Building, University ofLeicester, Lancaster Road, Leicester LE1 9HN, UK

D. Franke · A. Kikhney · D. I. SvergunEuropean Molecular Biology Laboratory,Hamburg Outstation, Notkestrasse 85,D22603 Hamburg, Germany

R. H. Fogh · J. Ionides · E. D. LaueDepartment of Biochemistry, University ofCambridge, 80 Tennis Court Road, CB2 1GA,Cambridge, UK

C. Spronk · S. JurkšaUAB “Spronk NMR Consultancy”, Palangos gatvë 4,01402, Vilnius, Lithuania

M. Verlato · S. Badoer · S. Dal Pra ·M. Mazzucato · E. FrizzieroIstituto Nazionale di Fisica Nucleare,Sez. di Padova, 35131 Padova, Italy

Page 2: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

744 T.A. Wassenaar et al.

molecular structures. Conventional calculation ofstructure requires the use of various softwarepackages, considerable user expertise and amplecomputational resources. To facilitate the use ofNMR spectroscopy and SAXS in life sciencesthe WeNMR consortium has established standardcomputational workflows and services througheasy-to-use web interfaces, while still retainingsufficient flexibility to handle more specific re-quests. Thus far, a number of programs often usedin structural biology have been made availablethrough application portals. The implementationof these services, in particular the distributionof calculations to a Grid computing infrastruc-ture, involves a novel mechanism for submissionand handling of jobs that is independent of thetype of job being run. With over 450 registeredusers (September 2012), WeNMR is currently thelargest Virtual Organization (VO) in life sciences.With its large and worldwide user community,WeNMR has become the first Virtual ResearchCommunity officially recognized by the EuropeanGrid Infrastructure (EGI).

Keywords Web portals · Nuclear magneticresonance · Small angle x-ray scattering ·Structural biology · Proteins ·Virtual research community

Present Address:T. A. WassenaarBiocomputing Group, Department of BiologicalSciences, University of Calgary, 2500 University DriveNW, AB T2N 1N4 Calgary, Canada

Present Address:N. Loureiro-FerreiraEuropean Grid Infrastructure (EGI),140 Science Park, 1098 XG Amsterdam,The Netherlands

Present Address:W. F. VrankenDepartment of Structural Biology, VIB, andStructural Biology Brussels, Vrije Universiteit Brussel,Pleinlaan 2, 1050 Brussels, Belgium

Present Address:S. Dal PraIstituto Nazionale di Fisica Nucleare, CNAF,40127 Bologna, Italy

1 Introduction

1.1 NMR Spectroscopy

NMR Spectroscopy is one of two main techniquesthat allow determining three dimensional (3D)structures of biomacromolecules, such as proteins,RNA, DNA, and their complexes, at atomic reso-lution. Knowledge of their 3D structures is vitalfor understanding functions and mechanisms ofaction of macromolecules, and for elucidating andpredicting the effect of mutations. 3D structuresare also important as guides for the design of newexperimental studies and as starting points forrational drug design. An advantage of NMR overX-ray crystallography is that it also allows investi-gation of time-dependent chemical and conforma-tional phenomena, including reaction and foldingkinetics and intramolecular dynamics. For thesereasons, NMR plays an important role within thelife sciences.

The principles underlying NMR are modula-tion of the natural magnetic moment of atomicnuclei, and measurements of how the system re-laxes back to the initial state [1, 2]. The signal thusobtained is a fading wave consisting of many in-dividual frequency contributions: the Free Induc-tion Decay, FID. Typically, up to 27000 differentfrequencies can be resolved at the highest mag-netic fields that are nowadays available. To in-vestigate the frequency contributions and theirdecays, such measurements have to be repeatedmany times, due to the low signal-to-noise ratio.To obtain structural information from NMR data,many more, but also more complex measurementshave to be run, yielding substantial amounts ofdata that need processing.

Processing data from NMR to obtain a 3Dstructure typically involves the following steps,summarized graphically in Fig. 1. First the rawdata have to be processed, more specificallyFourier-transformed, to obtain spectra revealingthe different frequency contributions and theirrelations. These frequencies are the resonances ofthe atoms measured, but to infer structural infor-mation from them, these resonances subsequentlyhave to be assigned to individual contributors(atoms/residues). If the assignment is sufficientlycomplete, structural restraints can be determined

Page 3: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 745

Fig. 1 NMR data processing from signal to 3D struc-ture. After acquisition of the primary NMR data, theseare Fourier transformed to obtain spectra in which theindividual frequency contributions or resonances of spinsystems, and their relations, are revealed. The resonancessubsequently have to be assigned to individual atoms. Ifsufficient resonances have been assigned, restraints can beinferred from the data, pertaining to distances between

atoms, dihedral angles, domain orientations, etc. When anadequate number of restraints is available, these can beused to calculate a set of three-dimensional structures op-timally satisfying these restraints. The resulting structuresrepresent the structure of the protein in solution, whichis validated against the available experimental data. Al-though the process is here depicted linearly, intermediatestages may involve iterative cycles of refinement

from the spectra, including inter-atomic distancerestraints, dihedral angle restraints, and orienta-tion restraints. These structural restraints are thenused to calculate a number of structures using avariety of molecular modeling approaches, afterwhich structure validation checks are performedto assert the quality of the results.

For each of the steps involved, specialized com-puter programs are available, each with its owncharacteristics and often with its own data for-mat. Processing of NMR data has thus become atask for specialists, who can understand the dataand their formats, as well as the programs, withinstallation requirements and usage details. Fur-thermore, NMR data processing requires consid-erable data storage and computational resources.These factors together currently represent a bar-rier for groups in life sciences to employ thefull power of NMR. Against this background, theeNMR project was run as a European initiativefunded under the Framework 7 e-Infrastructureprogramme to considerably facilitate this process[3]. It is now carried on by the WeNMR (aWorldwide e-Infrastructure for NMR and struc-

tural biology) project since November 2010. Theproject aims at allowing groups lacking the re-sources to add NMR to their toolbox, as well asallowing dedicated NMR groups to improve theirstandard from basic practice towards cutting-edgeresearch.

1.2 Small Angle X-Ray Scattering

Small Angle X-Ray Scattering (SAXS) is a widelyused method for the low resolution structuralanalysis of biological macromolecules in solution[4, 5]. The sample is exposed to a collimated andpossibly focused beam of X-rays, where the elasticscattering of the illuminated sample volume isrecorded. The observed scattering is anisotropicfunction of the scattering angle thus yielding aone-dimensional scattering pattern after radiallyaveraging the scattering data (typically recordedby a two-dimensional detector) around the beamcenter. From these scattering patterns, a varietyof important overall parameters, e.g. informationabout size, volume or shape, can readily be de-rived. Although sample requirements of SAXS

Page 4: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

746 T.A. Wassenaar et al.

are mostly similar to those of NMR or crystallog-raphy, i.e. the need of highly pure, monodisperseparticles that remain solvable at high (startingfrom ca 1 mg/ml) concentration, there are dis-tinct advantages of SAXS over the other methods.Firstly, no additional sample preparation step, e.g.crystallization, is required; secondly, on modernsynchrotrons data collection and initial process-ing can be performed within a few seconds orminutes, allowing for an almost instant assessmentof sample quality and its invariant parameterssuch as the radius of gyration, molecular massand volume. Further, ab initio modeling can oftenbe performed directly at the beamline, so thatpreliminary shapes of the measured protein areavailable within minutes, not hours or even daysas required with other methods.

SAXS is highly complementary to NMR. In-deed, SAXS covers a broad range of macromolec-ular sizes but has a limited resolution (typically,10–20 Angstrom), whereas NMR has a limitedapplicability to large objects. For multisubunitcomplexes or multidomain proteins SAXS pro-vides the information on intersubunit distances,but might be less sensitive to rotations; NMR datacan yield local distances and orientations but doesnot probe the global arrangement of the com-ponents. The atomic models obtained by NMRcan be cross-validated using SAXS or employedas rigid bodies in quaternary structure analysis ofmulticomponent particles (e.g. [6, 7]). In additionto the hybrid approaches, methods have alreadybeen developed to directly incorporate SAXSconstraints during NMR structure refinement [8–10]. The two methods can also be jointly usedto analyze flexible systems like partially unfoldedproteins providing an ensemble description ofsuch systems which provides more insight than asingle conformation solution in the rigid case [11].

1.3 Objectives of WeNMR

The main objectives of the WeNMR project thusare:

• to provide integrated protocols for NMR andSAXS data processing

• to provide access to end users through user-friendly web interfaces

• to exploit Grid technology for computation-ally demanding tasks in structural biology

• to lower the barriers for access to Grid re-sources in life sciences, notably in structuralbiology

• to build a Virtual Research Communityaround a web portal

Considering the background sketched, these ob-jectives set the challenges to be met within theproject. The first of these has been the imple-mentation of a new NMR Grid infrastructure.Historically, due to the requirements for process-ing of large amounts of data, NMR spectroscopyhas always been intimately linked with high per-formance computing. Therefore, sites with high-end facilities for performing NMR measurementscommonly also have considerable computationalresources. For the WeNMR partners it thus cameas a natural first step to integrate the existingresources into a Grid, offering a single standardfor deployment and use of applications across thecontributing sites, as well as a natural mecha-nism to share resources. Currently, the WeNMRproject involves an operational Grid that was ini-tially based on the gLite middleware, developedin the context of the EGEE project series [12].Since May 2010 gLite has been supported by theEMI project (European Middleware Initiative),aimed to deliver a consolidated set of middle-ware components based on the four major mid-dleware providers in Europe: ARC, dCache, gLiteand UNICORE. The WeNMR Grid infrastruc-ture (latter referred as ‘the Grid’) consists of thecomputing and mass storage resources of the fourWeNMR core partner sites operating a numberof central EMI Grid services like WMS [13], LB,Top-BDII, LFC and locally the EMI Grid servicesCREAM-CE and SE, plus a number of Grid sitespart of the EGI infrastructure operated by thoseNational Grid Initiatives that agreed to supportour Virtual Organization. During 2011 WeNMR,in collaboration with the SBGrid Consortium, car-ried out a program to achieve interoperabilitywith the Open Science Grid (OSG) infrastructurein US, that is based on the Globus middleware(see Section 4.3). Having an operational Grid, theprograms involved in the different steps, whichoften require direct user interaction, have to be

Page 5: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 747

interfaced in such a way that they can be runautomatically. Focus has been initially placed onthe CPU intensive programs, which have to beoperated remotely as Grid enabled applications.This has to be done in such a way that they can becombined in automated workflows for protocol-ized processing of data, raising the issue of inter-operability. In addition, web interfaces should beset up to be easy to use, yet sufficiently flexible forexpert users. At the same time a mechanism is re-quired to handle job traffic to and from the Grid.In the following sections, these different aspectsare discussed in more detail, providing an accountof the state of the project thus far. But beforediscussing the more technical details regarding theimplementation, the application portals that areavailable are discussed in more detail.

1.4 Related Work

WeNMR is clearly not the first and only projectoffering web portals and science gateways. Thelatter are defined as a community-specific setof tools, applications, and data collections thatare integrated together via a web portal or adesktop application, providing access to resourcesand services [14]. The concept introduced by theformer TeraGrid project, XSEDE nowadays,is gaining momentum as several projects ande-infrastructures like WeNMR, Gisela and theEuropean Grid Infrastructure promote the samegateway model to their communities (https://gisela-gw.ct.infn.it/science-gateways, http://go.egi.eu/sciencegateways). The gateways support avariety of capabilities including workflows, vir-tualisation software and hardware, visualisationas well as resource discovery, job execution ser-vices, access to data collections, applications,and tools for data analysis. The science gatewayparadigm has been introduced to overcome thedifficulties met by virtual research communitiesto access distributed resources. Their main goalis to hide the underlying complexity of a system,which gathers together heterogeneous resources,while enforcing access security to distributedcomputing infrastructures (DCIs) [15]. Whenit comes to developing gateways, there is nounique solution but typically various philosophiesand implementations are followed. For instance,

next to the WeNMR portals described here,gUSE/WS-PGRADE [16, 17] and Genius [18]are two Grid portal technologies with a rich setof features, well known for their ability to buildcomplex application workflows. Other examplesof workflow-enabled Grid portals are OGCE[19] and Vine Toolkit [20]. All these tools offerfeatures for managing the whole life cycle ofgeneric workflows on diverse DCIs, like Gridsand clouds. Collaborative environments, likeInSilicoLab (http://insilicolab.Grid.cyfronet.pl/)and SOMA2 (http://www.csc.fi/english/pages/soma)offer modeling environments for researchers.

In the case of WeNMR most of the web por-tals have been developed prior to the WeNMRproject, and independently of the use of the Grid.This explains the heterogeneity of the web portals.WeNMR main focus was not to define a perfectmodel to apply it to all our web portals, but toprovide solutions that are attractive to the enduser. In that context, allowing some flexibilityfor the WeNMR developers seems reasonable.Maybe more important than the choice of an ar-chitecture, a model, and its implementation, weconsider that the success of the WeNMR projectshould be measured by its number of users, andtheir satisfaction.

2 The WeNMR Web Portals for StructuralBiology

The web portals developed within WeNMRare among the most important elements of theproject, as these form the points of entry forthe end users. The ultimate goal is to offer tousers registered with the eNMR/WeNMR VirtualOrganization (VO) complete online protocols forprocessing NMR and SAXS data. The steps of atypical NMR data analysis are shown in Fig. 1.It is to note that each of the steps shown, andevery program involved, has value by itself asweb based service. For this reason, a piece-wiseimplementation has been adopted and programsthat are ported to the Grid are simultaneouslybeing made available as a web portal. Currently,nineteen application portals are operational andcan be accessed on http://www.wenmr.eu. Theterm ‘application portals’ in this text is used

Page 6: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

748 T.A. Wassenaar et al.

to describe the access to a specific softwarepackage via a web interface giving access insome instance to various pre-defined workflowsor protocols settings within the application. Theworkflow are thus integral part of the softwarepackage and not implemented at a higher levelusing workflow managers like Taverna [21] orShiwa (http://www.shiwa-workflow.eu/) for exam-ple. The WeNMR website, next to offering alarge variety of documentation and a user sup-port center, presents an aggregation of severalapplications portals (or simply ‘portals’). Theseportals provide access, among other services, toHADDOCK [22, 23] for the prediction of bio-molecular complexes, Xplor-NIH [24], CYANA[25, 26] and CS-ROSETTA [27, 28] for calcu-lating structures from NMR data, AMBER [29]and GROMACS [30] for structure refinementand molecular dynamics (MD) simulations, Ccp-Nmr [31] for data conversion, MARS [32] forbackbone assignment, TALOS+ [33] for torsionangle prediction, MDDNMR [34] for NUS(Non-Uniform Sampling) spectral processing, andselected applications from the SAXS data process-ing suite ATSAS [35]. Next to these availableportals, several new ones for various NMR ap-plications have been recently introduced, includ-ing the UNIO program [36, 37] that providescomputational routines for each individual stepdepicted in Fig. 1. The main WeNMR portalfor NMR applications is shown in Fig. 2. Thecentral access point for all WeNMR portals is:http://www.wenmr.eu/wenmr/nmr-services.

2.1 HADDOCK

HADDOCK [22, 23] is an acronym for High Am-biguity Driven DOCKing and is a program to pre-dict structures of biomolecular complexes fromindividual components. As the full name indicates,this approach in docking of biomolecules distin-guishes itself from other methods by using exter-nal information to guide the docking process. Suchinformation can be empirical, theoretical or both,pertaining to the residues or atoms involved in thebinding interface. From this information ambigu-ous restraints are derived that are used to drivethe docking. HADDOCK is particularly useful inpredicting complexes from known experimental

structures of the partners using NMR data, such aschemical shift perturbations and residual dipolarcouplings (RDCs). Chemical shift perturbationsand RDCs can be obtained relatively easily andalso for macromolecules of increasing size, mak-ing the large applicability of HADDOCK as atool for cutting edge structural biology appar-ent. HADDOCK has proven its value within theCAPRI (Critical Assessment of PRediction of In-teractions) experiment, a blind evaluation of theperformance of current docking methods [22, 38–40].

The initial rigid-body docking process gener-ates typically a large number of structures, typ-ically in the order of thousands. From these anumber of structures, typically several hundred,are selected for further flexible refinement. Theresults are scored, analyzed and returned to theuser. The structure calculations are the CPU in-tensive part of the process typically resulting in400–1,000 individual jobs being sent to the Grid,depending on the exact parameter settings.

Offering the full functionality of HADDOCKthrough a web portal requires putting forth a com-plicated form, contrasting with the objective ofhaving a simple interface. To avoid compromisesregarding user friendliness and functionality theportal is divided in four interfaces, correspondingto different levels of control and user experience:

• The Easy Interface requires no more than pro-viding the two components of a complex andthe residues of each that are involved in theinteraction.

• The Expert Interface allows the user to providehis own customized restraints to be includedin the docking process and to specify certainaspects of the system setup, sampling andanalysis.

• The Guru Interface offers almost full controlof parameters, allowing e.g. specification ofsymmetry and relaxation anisotropy restraintsand RDCs as well as of parameters pertainingto the energy, the scoring and the analysis ofresults.

• Finally, for complete control a File UploadInterface is available, where a HADDOCKrun parameter file can be provided. This isparticularly useful for those who have their

Page 7: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 749

Fig. 2 The WeNMR webportal (http: //www.wenmr.eu/wenmr/nmr-services)

Page 8: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

750 T.A. Wassenaar et al.

own standard protocol or who want to repli-cate a previous run with minor modifications.This option also offers a simple way to buildpipelines from other applications.

To facilitate the user’s task and keep the formsmanageable, foldable menus were introduced thatgroup related parameters under a single header.In this way, users only need to unfold groups ofoptions that should be changed from their de-fault values. Except for the File Upload Inter-face, the various HADDOCK portals share thedata structure, albeit that part of the variablesis fixed to predefined values for the Easy andExpert interfaces. This has the advantage thatthey can all couple to a single back end CGI(Common Gateway Interface) script to handle therequest.

After issuing a request, the user is presented alink to a site where the progress can be followed.After the run is finished, the results can be viewedonline and selected complexes or the completeoutput data of the run can be downloaded to alocal machine.

The use of the HADDOCK portal requiresregistration with a valid Grid certificate, givinga username and password. These are thereafterused to sign service requests. The requests them-selves are handled using an eToken-based robotcertificate, as is explained in more detail in theimplementation details.

2.2 Xplor-NIH

Xplor-NIH [24, 41] is one of the programs forstructure calculations that have been ported tothe Grid and are available through the WeNMRweb portal. It is a versatile program that canbe operated through a command line interfaceor with scripts in the specific Xplor language. Itcan incorporate several experimental and empir-ical information, such as NMR or SAXS data, tobe used as restraints. These restraints are addedto the topological description of the system andused to drive it to a folded state using simulatedannealing. This procedure is typically repeatedmany times to obtain sufficient statistics regardingthe goodness of fit of the structures determinedagainst the experimental data. Since the different

annealing runs are independent of each other,they can be easily sent to the Grid as individualGrid jobs.

The portal uses a design that is different fromthe HADDOCK portal, and is aimed at moredirect user interaction during the process. Userslog in with their Grid certificate loaded in theweb browser, gaining access to an environmentwhere projects can be started, stored and man-aged. Structure calculation projects are initiatedby filling in a form and providing files for thestructures and topological descriptions of themolecules, as well as for the different restraints tobe included in the calculations.

When the structure calculations have finished,the user can view and download the results. Inaddition, it is possible to select a number of struc-tures for further energy refinement using the AM-BER package [29] for MD simulations.

2.3 CYANA

Another widely used program for calculat-ing structures from conformational restraints isCYANA (Combined Assignment and dYnamicsAlgorithm for NMR Applications) [25, 26]. Itsmain characteristics are the ability for iterativeassignment of NOE peaks, and structure calcula-tions through simulated annealing in torsion anglespace. Like with Xplor-NIH, the structure calcu-lations involve many simulated annealing runs,divided over several iterative cycles.

The design of the web portal for CYANA issimilar to that of HADDOCK. Foldable menusare used to hide optional sets of parameters, bydefault presenting an intuitive menu offering astandard structure calculation protocol. The por-tal allows three modes of invocation of the ser-vice. Users can request structure calculation (i)using a set of upper distance bound restraints,(ii) providing a list of assigned peaks, or (iii)providing a list of unassigned peaks, in whichcase the automated peak assignment will beperformed.

Use of the service requires having a license forCYANA and registering for use of the portal,presenting a valid Grid certificate. This will givea username and password that can be used to signservice requests.

Page 9: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 751

2.4 CS-ROSETTA

Chemical-Shift ROSETTA or CS-ROSETTA [27,28] unlike Xplor-NIH and CYANA, allows struc-ture determination of proteins, based on chemicalshift information alone. It thus bypasses the needfor NOE (nuclear Overhauser effect) based dis-tance restraints, which usually require consider-able time to obtain. Further advantages of usingchemical shifts are that these are among the mostreliable parameters that can be obtained fromNMR spectroscopy and that they can potentiallybe obtained for larger macromolecules for whichNOEs become impractical. Chemical shift-basedstructure determination is computationally muchmore expensive than structure calculations usingdistance restraints. The most time consuming partof a CS-ROSETTA run consists of a large num-ber (500 to 2,500) of independent Monte Carlocalculations that can be easily distributed over theGrid to calculate in the order of 10,000 to 50,000structures.

Structure determination using CS-ROSETTArequires as only input the amino acid sequenceand a list of chemical shifts and a number of para-meters to control the process that can be changedfrom the default values. The chemical shifts arefirst used to select a set of protein fragmentsfrom a structure database, e.g. the Protein DataBank (PDB) [42], based on the list of chemicalshifts as predicted with SPARTA. Then the reg-ular ROSETTA protocol [43] for Monte Carloassembly and relaxation is used to reassemblethe protein from the fragments. For the resultingmodels the chemical shifts are again predicted us-ing SPARTA [44] and the deviations between thepredicted and target values are used as a pseudo-energy term in the scoring of the models, yieldinga ranking based on both overall structural qualityas well as on the match with the experimentaldata.

The computational cost involved in chemi-cal shift based structure determination makesCS-ROSETTA a typical example of a programthat is beyond the capacity of most local sites.Here, the access to Grid resources througha web-portal, combining computational powerand ease of use, clearly demonstrates its addedvalue.

To use the service, users have to register witha valid Grid certificate to obtain a user nameand password that can subsequently be used tosign service requests. The web interface for CS-ROSETTA itself is straightforward and only re-quires uploading a file containing chemical shiftsin TALOS [45] format, and modifying the para-meters to control the calculations. When the jobhas finished, the user receives an e-mail containinga link to the result page that gives an overviewof the run, including some statistics and imagesto assess the overall quality of the results. Theuser can then select a number of structures toview in more detail or to download, or choose todownload the whole set of results as an archive.

2.5 MARS

The fifth portal, MARS performs automatic back-bone assignment of 13C/15N labeled proteins andis applicable to a wide variety of NMR data, in-cluding RDCs [32].

Its advanced features compare favorably withother assignment tools and include:

• simultaneous optimization of the local andglobal quality of assignment to minimize prop-agation of initial assignment errors and thusproviding robustness against missing chemicalshift information; applicable to proteins above15 kDa using only Ca and Cb chemical shiftinformation with connectivity thresholds ashigh as 0.5 ppm;

• applicable to proteins with very high degener-acy such as partially or fully unfolded proteins;

• combination of the secondary structure pre-diction program PSIPRED [46] with statisticalchemical shift distributions, which were cor-rected for neighboring residue effects [47], toimprove identification of likely positions inthe primary sequence;

• assessment of the reliability of fragment map-ping by performing multiple assignment runswith noise-disturbed chemical shifts.

Registration and a valid Grid certificate areneeded to use the service. Its interface is similar toCYANA: after uploading peak lists a file contain-ing assigned chemical shifts is downloaded afterjob has finished.

Page 10: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

752 T.A. Wassenaar et al.

2.6 TALOS+

TALOS+ [33], like its predecessor TALOS [45],is a program to predict torsion angles for aminoacids given information regarding chemical shiftand the probable regions in the Ramachandranplot for each type of amino acid. TALOS+ distin-guishes itself by the inclusion of a neural networkcomponent, the output of which is added as anempirical term in the conventional TALOS database search. To prevent assignment of torsion an-gles to the backbone of flexible regions, TALOS+first identifies such regions using the flexibilityprediction program RCI developed by Berjanskiiand Wishart [48].

The TALOS+ portal offers a simple interfacewhere an input file in either TALOS or BMRB(Biological Magnetic Resonance Data Bank [49])format can be uploaded. In addition a number ofPDB ID codes can be given, indicating structuresthat should be excluded from the calculations.Unlike most of the other portals, the TALOS+portal can be used without a Grid certificate, asthe calculations are run on a local server.

2.7 MDDNMR

The MDDNMR [34] portal can process individualNUS (Non-Uniform Sampling) multidimensionalNMR spectra. The interface supports the NUSdata recorded by two major spectrometer brands:Varian and Bruker. The main advantage of usageof NUS data is the substantially higher resolu-tion in the indirect spectral dimensions. The NUSacquisition mode of both Vnmr and TopSpinspectrometers controller software makes use ofstandard NMR experiments, except that only afraction of a full data set is recorded. This meansthat it can be used with virtually any pulse se-quence available. After acquisition, MDDNMRreplenishes the missing data points in the fullmatrix. The resulting regular spectra are thenprocessed conventionally with FFT (fast Fouriertransform), LP (Linear Programming), windowfunctions etc. The current portal allows thisto be repeated for each experiment in thedataset; several such high-resolution experimentsare processed sequentially as single matrices, andthe resulting high-resolution FT-domain spectra,

after peak-picking, are amenable for analysis,e.g. automated assignment using MARS. Mostof the experimental data types supported byMDDNMR can be processed via the portal, in-cluding constant-time acquisition (CT), J-couplingsplitting etc. Use of the service requires register-ing for use of the portal, presenting a valid Gridcertificate.

2.8 CcpNmr

The CcpNmr portal, for CcpNmr [31] based dataconversions, is not directly related to NMR dataprocessing, but is an important element withinthe WeNMR project. The reason for this is thefact that the programs already ported to the Gridoften have their own data formats, making itimpossible to combine these as steps in a directpipeline. Establishing interoperability of such pro-grams requires conversion of output from one stepto match the input of a next step. This is whatCcpNmr FormatConverter was designed for. Ithas a comprehensive internal data model [50] thatcan contain all different types of NMR relateddata. These data can be imported from files in alarge number of formats. Likewise, the data storedin a CcpNmr project can be exported in any of thefile formats required, provided that the data arepresent in the model. In this way, CcpNmr pro-vides a straightforward approach in meeting oneof the challenges of the WeNMR project, namelyestablishing interoperability of the programs in-volved in NMR data processing and building au-tomated protocols for complex tasks.

The portal for CcpNmr FormatConverter wasdeveloped to offer the WeNMR members an easysolution for matching program output and inputduring the steps involved in processing of theirdata. At present, the portal allows conversionbetween several file formats, aimed at facilitatingthe use of the other portals. The file conversionsfor the portal are performed locally, as these arenot computationally intensive. Use of the CcpNmrportal thus does not require a Grid certificate.

The CcpNmr FormatConverter portal greatlysimplifies data transfer between programs, butit is still a separate, user-initiated step. Work iscurrently in progress towards truly seamless in-teroperability, where data conversion happens in

Page 11: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 753

the background without the user being aware of it.This requires storing data in the more comprehen-sive and precise CcpNmr internal data model andintegrating the conversion to and from program-specific data formats with the calculation launch.Once the input data for a calculation are knownand precisely defined, it is much easier to interpretthe calculation results in a fully automatic man-ner. In particularly difficult cases, additional datanecessary for interpreting the results can be gen-erated as part of the program launch. The use of asingle data format along a processing pipeline alsomakes it possible to create an alternative, semi-standardized pipeline interface, and simplifies thehandling of workflows where a single set of start-ing data is used to launch different calculations inparallel.

The first result along these lines is the alter-native HADDOCK interface created as part ofthe Extend-NMR project (www.extend-nmr.eu).The interface allows complete control over HAD-DOCK input data and parameters, and works en-tirely within the CcpNmr data model. The result-ing set-up is exported as a HADDOCK parameterfile, which can be executed on the HADDOCKportal through the file upload interface (2.2). An-other example is the facility for running the Cingvalidation server on CcpNmr project files directlyfrom the CcpNmr Analysis program.

2.9 AnisoFit

The AnisoFIT web portal allows the deter-mination of the magnetic susceptibility ten-sor anisotropy of a paramagnetic center (usingpseudocontact shift data or RDCs) or of theanisotropy of the diffusion tensor (for RDCs in-duced by orienting media). This is done by fittingthe input experimental data to the protein coor-dinates, which the user must provide in the formof a PDB file. In addition, it is necessary for theuser to provide information on the experimentalconditions (temperature, magnetic field) and todefine a value for the experimental error (if notspecified in the input data file). The results areprovided in a graphical as well as tabular form,which can be saved locally. Unlike most of theother portals, the AnisoFIT portal can be used

without a Grid certificate, as the calculations arerun on a local server.

2.10 AMBER

AMBER [29] is a suite of programs for use inmolecular modeling and molecular simulations ofproteins and nucleic acids that is now availablethrough the WeNMR web portal. The AMBER-based Portal Server for NMR structures (AMPS-NMR) has been developed to provide an user-friendly entry point mainly designed for the en-ergy refinement of NMR structures based on re-strained molecular dynamics (rMD), but permit-ting also free MD simulations [51].

AMPS-NMR offers a personal workspace forthe creation and management of calculations. Anew user with her/his personal certificate installedin the web browser can access to the portal fromWeNMR web page and straightforwardly createa new user login and password. The portal allowsusers to create a new single calculation or a so-called project in which it is possible to save anumber of different calculations that belong to acommon project (e.g. MD refinements of proteinstructures generated with different ensembles ofrestraints).

The setup of a rMD refinement with AMPS-NMR comprises four steps:

• In the first step the protein structure to beoptimized is uploaded. Here it is possible toupload one PDB file, whose format will be au-tomatically validated and converted into theformat recognized by AMBER. During thisstep the user can add explicit water moleculesto solvate the protein, add counter ions, insertnew bonds for selected atoms (e.g. protein tometal, disulfide bridges). For NMR structures,which are typically represented by a bundleof 20–40 different conformers, this first stepis carried out only for the first conformer andthen automatically applied to all the structuresin the bundle. For each structure in the bun-dle, an individual job is sent to the Grid.

• The second step manages the NMR restraints.Four types of restraint are allowed: NOE, di-hedral angles, RDCs and pseudocontact shifts(PCS). The last restraints are the so-called

Page 12: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

754 T.A. Wassenaar et al.

paramagnetic restraints. For all restraints it ispossible to upload Xplor, Dyana, or Cyanafiles. For paramagnetic restraints, it is possibleto fit the anisotropy tensor directly in the website, since all the functionalities of ANISOFITare embedded in AMPS-NMR.

• The third step manages the setting of rMDcalculations. The user can set manually allMD parameters or select a pre-configured setof parameters, ranging from a simple energy-minimization to a multi-step MD refinementprotocol.

• The fourth and final step allows the user togive a name to the calculation and submit itto the Grid.

After the job has been submitted, the user caneasily check the status of the job (i.e. Sched-uled, Running and Completed) through the user-friendly jobs management web page, as shown inFig. 3. Each completed job is automatically re-trieved from the Grid. All output data are storedin the personal workspace of the user. Withinthis workspace, the user can browse a completed

Fig. 3 AMPS-NMR jobsmanagement. Thiswebpage allows users tomanage their jobssubmitted to/running onthe Grid (box A). Inparticular, it is possible tocheck the status (e.g.scheduled, running,completed) of eitherindividual or allcalculations (box B), andto erase calculationseither individually or forall in a user-selectedstatus (box C).Completed jobs areautomatically retrievedfrom the Grid and theirstatus is changed to ‘E’.

Page 13: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 755

calculation and download all its various files, orjust extract the bundle of refined conformers. En-ergy and violation statistics can also be obtainedautomatically.

AMPS-NMR computed more than 4,500 jobs inthe last year.

2.11 GROMACS

Like AMBER, GROMACS [30, 52] is a suite ofprograms for molecular simulations. GROMACSis freely available, open source and known for itsversatility and performance. Unlike most otherpackages, GROMACS is not linked to a specificforce field, but can perform simulations with anyforce field that has been ported to the file for-mat required. Available force fields include recentversions from GROMOS [53–56], AMBER [57],CHARMM [58] and OPLS [59].

The GROMACS portal [60] now made op-erational within WeNMR interfaces to a proto-colized, yet flexible GROMACS workflow. Thisworkflow has originally been developed to facili-tate setting up and running simulations for para-metric studies, including statistical comparison ofsets of simulations [61, 62]. Such studies typicallyinvolve many simulations, requiring robust, au-tomated processing, yet with sufficient flexibilityto specify particular settings. The workflow startswith generating a topology for protein and nucleicacid chains, according to the force field specified.The system is then solvated and equilibrated invarious steps, followed by a short run productionrun. A second interface allows users to uploadpre-equilibrated systems for production runs. Fora more detailed description of the workflow andits various options please refer to [60].

The user front end of the portal resemblesother WeNMR portals, using foldable menus toorganize the form. However, this portal, shownschematically in Fig. 4 represents a new class,together with the UNIO portal discussed next.The new design aims to provide a more dynamicand user-friendly experience, both for the end-user, as well as for the server administrator. Tothis end, the portal comprises of three front-endpages, two for regular users and one for admin-

istrators, which are derived from a single portaldescription. From the user perspective, the firstline of entry is the submission page, where a filecan be indicated for uploading and the simulationconditions can be specified. Following submission,the user is redirected to the results page. This isa PHP enabled page that updates every minute,dynamically loading data representing the actualstate of the simulation, allowing the user stayingup-to-date. Major events are also communicatedthrough e-mail.

The dynamic nature of the user and adminis-trator front-ends results from a combination ofPHP code and Python CGI scripts. The scriptsinterface to a global server configuration file orto a specific job configuration file, which are alsoactively used by the server processes. This allowsan active flow of data between the front-end pagesand the server as well as individual processes.

A new job is processed initially by a singlepython CGI script, based on the Spyder datamodel also used in the HADDOCK server [63].First the user credentials are validated, the inputfile format is checked and the input is inspectedfor the presence of non-standard amino acids andligands. Then a project directory is created, to-gether with a job description file that is placed inthe server job pool, which is under control of thejobmanager and Gridmanager daemon processes.

The jobmanager process regularly checks eachjob in the server job pool. New jobs are submittedto the Grid by placing them in the Grid pool, andactive jobs are inspected for a change in status,which will invoke a specified action. The jobman-ager also performs sanity checks to assert consis-tency between the files in the project directoryand those in the Grid pool. If an irregularity isdetected a recovery is attempted, but if that failsthe job will be terminated. When a job finishes, apost-processing function is invoked that extractsstatistics to display on the results page, cleansthe job and makes the job archive available fordownload. The jobmanager is also responsible forkeeping the user informed by e-mail about majorchanges in the status.

The jobs put in the Grid pool come undercontrol of the Gridmanager, which takes care

Page 14: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

756 T.A. Wassenaar et al.

Fig. 4 The GROMACSweb portal, accessible viahttp://www.wenmr.eu/wenmr/molecular-dynamics-software. Threecategories of input formare displayed to theend-user: (i) the requiredparameters, to supply aPDB file containing thecoordinate of thestructure, (ii) the optionalparameters, to control thesimulation time, theforcefield, the solventmodel and theelectrostatic treatment,and (iii) the advancedparameters to controladditional expert settings

of trafficking to and from the Grid: submittingjobs, querying statuses, and retrieving results fromfinished runs. The jdl files submitted to the Gridby the Gridmanager use functionality from thenew EMI CREAM-CE enabled sites on the Grid.

This involves the CPUNumber argument whichallows GROMACS to run on 6 CPU’s usingmultithreading, speeding up the simulation sig-nificantly. The presence of this argument instructs,the workload management system (WMS) to

Page 15: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 757

select only those sites. The Gridmanager will at-tempt submitting, querying or retrieving a job apreset number of times, before reporting the jobas failed to the user.

The GROMACS server can be used for proto-colized single runs, but also allows simultaneousexecution of multiple molecular dynamics simula-tions on Grid infrastructure for parametric stud-ies. A more detailed description of this server andthe GROMACS workflow involved, expandingon the various options available is now published[60].

2.12 UNIO

UNIO (latin: to combine into one, large preciouspearl) is a comprehensive NMR data analysissuite that assembles expert systems for allsequentially applied parts of the conventionalNOE-based NMR structure determinationworkflow. UNIO interconnects the MATCHalgorithm for automated sequence-specificbackbone assignment [37], the ASCAN algorithmfor automated side-chain assignment [36],the CANDID algorithm for automated NOEassignment [26], and the ATNOS algorithm forautomated NMR signal identification [64]. Thesefour major data analysis components form thecore of a UNIO NMR structure determination.Crucial key elements in order to seamlesslyintegrate the four principal data analysiscomponents are routines for automated chemicalshift referencing and automated chemical shiftadaptation between the different input NMRspectra used. The standard UNIO data analysisprotocol requires only a minimal set of sixNMR spectra, namely 3 APSY and 3 NOESY.Notably, UNIO promotes the use of projectionNMR spectroscopy of high-dimensional spectra.Conventional triple-resonance NMR experimentscan alternatively be used instead of APSY data forobtaining the backbone resonance assignmentsusing MATCH. The UNIO application suite isthoroughly road-tested as attested by more than30 de novo protein structures determined usingthe entire four-step unsupervised protocol (withminor human intervention), and by more thanhundreds to thousands of NMR structuresdetermined using individual components

implemented in the UNIO NMR analysissuite (as for the NOE assignment routinein CYANA ported also to WeNMR). Thepower of UNIO for unsupervised protein NMRstructure determination is well documentedas proven by the results obtained after thefirst round of the eNMR-organized NMRcommunity-wide blind testing initiative CASD-NMR (first ever performed objective NMRsoftware comparison) for assessing the potentialof different computational approaches claiming todetermine three-dimensional protein structuresin an unattended manner. Notably, it is worthmentioning that all the programs ported to theWeNMR Grid infrastructure performed very well,confirming that the WeNMR portals (CYANA,CS-ROSETTA, UNIO) present indeed cutting-edge research for protein NMR structuredetermination.

The UNIO portal is designed as a multiple com-ponent web portal, similar to the HADDOCK andGROMACS portals. The UNIO portal consists offour principal web sub-interfaces, correspondingto the essential parts of a NMR structure deter-mination project. It offers expert systems for thecomputationally demanding tasks of NMR sig-nal identification, sequence-specific backbone andside-chain resonance assignment, and comprehen-sive collection of distance restraints with the lattertask focusing on the primary source of NMR-based protein modeling. UNIO is compatible withthe most commonly used and extremely power-ful NMR structure calculation programs, such asCYANA and CNS, which are also operational onthe WeNMR Grid infrastructure. This interfac-ing to other programs represents a template ex-ample of smoothly interconnecting different webportals in WeNMR that is provided “under thehood” and do not require any intervention bythe web portal user. The developed interfacing tothe two other abovementioned WeNMR portalsconveniently equips the structural biology com-munity with all computational processing toolsnecessary for pursuing their own, complete pro-tein structure determination by NMR. All UNIOsub-portals support all commonly used file for-mats for different input data (AMBER, AN-SIG, BMRB, BRUKER, CARA, CNS, CYANA,FASTA, NMRVIEW, PDB, SPARKY, XEASY,

Page 16: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

758 T.A. Wassenaar et al.

Xplor, Xplor-NIH) assuring the ease of use ofthe individual UNIO portals. In addition, theentrance UNIO portal provides a newly devel-oped expert system for detecting errors in chem-ical shift referencing and sequence-specific res-onance assignments (UNIO-ShiftInspector avail-able as fifth sub-portal). Again, the interplaybetween the result of this UNIO subtask andother WeNMR portals (CS-Rosetta, TALOS+)shows the tight interconnection between differentWeNMR-developed portals.

Concerning the practical use of the UNIOportal, for each of the UNIO sub-interfaces(MATCH, ASCAN, ATNOS/CANDID, CAN-DID, and UNIO-ShiftInspector), folded menusare provided, in which standard default valuesfor controlling the calculation process are alreadyfilled in. After job submission, the user is pre-sented a link to a web site that enables moni-toring the progress of the calculation. After thecalculation of a subtask has finished, the user willbe informed by email providing a web address inorder to store the complete output data at a localcomputer. The use of the UNIO portal requiresregistration with a valid Grid certificate, and sub-sequently the user will be provided with a loginname and password for use of the UNIO portal.The UNIO portal is free-of-charge for non-profitorganizations.

2.13 iCing

Validation of the structural models constitutes anessential step in the whole process of structuredetermination, not only by NMR but also all othermethods, such as X-ray crystallography. Its impor-tance is underscored by the recent establishmentby the wwPDB of validation taskforces, that willdraft recommendations for the scientific commu-nity. The WeNMR Grid is designed to incorporatea validation server as integral part of the flow ofdata, allowing for users to examine their struc-tural results prior to final submission to wwPDBdatabases.

The iCing web portal (https://nmr.cmbi.ru.nl/icing/) presents a front-end to the CING vali-dation suite. The software tools implemented in

CING provide for a comprehensive validationanalysis, which includes the results of as manyas eleven [65] different programs and integrationof all results into a single, comprehensive andinteractive HTML/Web2.0 report.

In addition to its own CING project files, theiCing server also accepts as input ‘plain’ PDBfiles, Cyana, or CcpNmr project files containing alldata generated in the typical course of a structurecalculation project, e.g. shifts, peaks, restraints,coordinate data, etc. The iCing server also al-lows direct communication with programs in theWeNMR environment as a service. The CcpNmrprogram Analysis for example, directly uploads itsdata and retrieves the iCing results and a similarmechanism is currently under implementation forthe UNIO and Xplor-NIH structure calculationprograms. The iCing front-end is based on thepopular Google Web Toolkit (GWT) technology.The iCing server routinely processes on a weeklybasis all new NMR entries released by the ww-PDB, which are accessible via the NRG-CINGportal [66]. Use of the iCing portal does not re-quire a Grid certificate.

2.14 ATSAS

Since its initial release in 2003, the ATSAS soft-ware suite for Small Angle X-Ray and Neu-tron Scattering (SAS), became the de-facto stan-dard in SAS data analysis of biological solutions.The package provides tools for data reductionand analysis, for ab initio shape determinationand rigid body modeling, it provides means toadd missing fragments to complexes, can provideinformation on oligomeric mixtures and hierarchi-cal and flexible systems. Since 2003 it was down-loaded over 30,000 times by over 5,500 users from1,500 institutions worldwide. It helped to facilitateabout 1,100 publications in the time from 2007 to2011, of which roughly 700 were devoted to bio-logical solution SAS. This represents more than50 % of all publications in this field.

ATSAS is a suite containing over 20 major pro-grams, and each individual application of the suitewas ported to be usable on Grid nodes. They areas such available in a package for download and

Page 17: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 759

individual use, for direct usage on Grid CEs as-sociated with the enmr.eu VO where the packagewas installed on, and through the associated Gridportals. Although development of the Grid por-tals for ATSAS began only recently, progress wasquickly made due to the experiences accumulatedwith the ATSAS Online service, a community ser-vice that allowed running of ATSAS programs ona small-scale cluster facility of EMBL Hamburg.With its rapidly growing user base, relocation tothe Grid is the logical next step.

The first available ATSAS Grid portal providesan easy to use interface for the ab initio beadmodeling program DAMMIF [67]. Contrary toNMR methods that allow the reconstruction ofmodels at atomic resolution, bead modeling withSAS is meant to provide information of the overallshape of the macromolecule. DAMMIF achievesthis goal by filling a search space with denselypacked dummy atoms, or beads, which initiallyare randomly assigned to be part of the moleculespace or part of the solvent space. Employing asimulated annealing search procedure, DAMMIFthen attempts to find a compact confirmationof interconnected beads whose simulated scat-tering pattern fits the recorded scattering datawell; other conditions besides compactness canbe enforced simultaneously. It is to note that thedummy atom positions in the final ab initio modeldo not represent atoms as they do in NMR results,but only scattering centers at defined locations.These must not be interpreted in any other waybut as “filled space”.

Although DAMMIF is not too computationallydemanding (the execution time of on a typical PCcomputer, depending on the search parameters,generally varies from 30 s to a couple of hours),one needs to run DAMMIF multiple times toverify the stability of the solution [68]. This re-quirement, which holds for any ab initio shape de-termination application, makes ab initio modelinga perfect use case for the scalability of the Grid.

Besides DAMMIF it is planned to provide por-tals for DAMMIN, MONSA [69] and GASBOR[70] within a short time frame. Other parts ofthe ATSAS suite, including rigid body modelingprograms utilizing also the NMR information, willbe made available on portals in due course.

2.15 Accessing the WeNMR Web Portals

Access to the WeNMR portals is granted to a userupon completion of the following steps:

• The user needs to obtain a Grid certificate.The procedure is country-dependent, it usu-ally involve identification (passport, ID card)with the registration authority. The WeNMRwebsite provides a support page to guideas much as possible users to the properregistration authorities and procedures (seehttp://wenmr.eu/wenmr/access/registration).

• The Grid certificate needs to be installed in aweb browser.

• The user has to join the enmr.eu VirtualOrganization (https://voms2.cnaf.infn.it:8443/voms/enmr.eu) using the Grid certificate en-abled web browser. The ‘Acceptable UsagePolicy’ of EGI must be accepted.

• Because each software package used by theWeNMR portals have different license poli-cies, the user need to register separately foreach of the portals to be used. All web portalsare accessible free of charge for users workingin non-profit academic institutions.

• The user will receive a username and pass-word for each portal to be used, once aWeNMR admin has checked the validity ofthe enmr.eu membership. The WeNMR teamis in a process to adopt a user-friendlierauthentication process such as Single SignOn, allowing users to manage their registra-tion with the various portals from a centrallocation.

Every run launched via one of the portals iseffectively submitted using an eToken-based ro-bot certificate. In this way, users do not needto have their Grid certificate loaded into aweb browser to access the application portals.The application portals keep track of all usersubmissions.

Most of the portals are documented with tu-torials (to explain the use of the web interface)and with ‘command line use cases’ for advancedusers who want to submit directly (i.e. not throughthe web portals) their job to the Grid, using

Page 18: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

760 T.A. Wassenaar et al.

the software packages that are already deployed.WeNMR also offers a help center (http://www.wenmr.eu/wenmr/support/help-center-forums) tohelp users with specific questions.

3 Structural Biology on the Grid: DesignStrategies and Implementation

Successfully running web portals requires a propermachinery to handle requests. This machinery in-volves various steps that can be categorized inthree layers of operation: The server level involveshandling of service requests, either by direct hu-man interaction or through requests from anothermachine. This stage includes input type checking.The next level involves preparation, traffickingand monitoring of jobs between the server and theGrid. The third layer is the core layer involving theprocess(es) to be run on a worker node. The tasksassociated with these different levels are concep-tually unrelated and allow for a component baseddevelopment approach, in which distinct tasks areprogrammed in a most generic form. This has the

Fig. 5 Process model. To facilitate implementation andmanagement, processes are represented and, if needed,rewritten in a manner adhering to a simple five nodemodel, with the process itself as the central block thathas four connectors: an input and an output connector,a connector for dependencies and one for logging. Theoutput of one process can be connected to the input ofanother to build larger sequences or workflows. Obviously,each connection can involve several components, e.g. aprocess’ input can consist of several files and/or optionsettings. The process itself may be regarded a black box,as long as the connections are well-defined

advantage that such building blocks can be easilymaintained, adapted and reused.

To facilitate the component-based implementa-tion, a single, simple model, illustrated in Fig. 5,was designed within the WeNMR consortium forthe representation of processes. This model char-acterizes any process as a block with four connec-tors: input, output, dependencies and logging. Theinput and output connectors allow building largersequences or complex workflows. The dependen-cies are considered static to a process and thelogging is for messaging and provenance. Logginginformation can also be used to check the statusand react to errors. Processes are all shaped to ad-here to this simple model, which can be achievedby rewriting programs or by wrapping them insidea script. Doing so, a set of process modules isobtained that can easily be combined in a pipeline.

Process pipelines are often built imperatively,using one of the standard scripting languages. Butan imperative approach has the drawback that itis inflexible: e.g. a failure will cause the wholepipeline to fail and a process has to be startedanew. For this reason, a partial declarative ap-proach was designed, in which direct communi-cation between processes from the different lay-ers of operation are eliminated. Rather, outputfrom one level that forms the input for anotheris ‘pooled’ on disk. Processes from the next layerthat depend on these data are run periodically,scanning the pool for data matching the input re-quirements. This has the advantage that the stateof all processes is naturally check pointed and thatthe use of computational resources can be bettercontrolled. How this approach is used to connectthe web portals to Grid calculations is illustratedin Fig. 6 and explained in more detail below. Notethat this description is relatively general: whilethe GROMACS, HADDOCK, UNIO and CS-ROSETTA portals strictly adhere to this model,the remaining portals currently only partially fol-low the separation into the three layers.

3.1 The First Level: Request and Data Handling,Invocation of the Service, Reporting

The first level of operation involves the interac-tion with the user, both processing the request, aswell as presenting the results. A service request is

Page 19: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 761

Fig. 6 Grid job submission management using job pool-ing. The figure shows a general scheme for managing jobtrafficking to and from the Grid, using server side jobpooling. This scheme is characterized by a separation ofthree layers of operation, between which there is no di-rect communication. Green boxes indicate user interaction,whereas yellow boxes indicate jobs that are running pe-riodically as daemon jobs and that use an eToken-basedrobot certificate for generating a Grid proxy. The blueellipses represent ‘pools’, which are used for storage of jobor result packages. User service requests are processed onthe server, up to the point of generating a job package

that is stored on disk. On the Grid UI (User Interface) adaemon job (Grid-submission) is running on a scheduledbase scanning the ‘job pool’ for job packages and sub-mitting these to the Grid when found. Another daemonjob (Grid-polling) is periodically checking running jobs fortheir status, retrieving the results when ready and plac-ing these in a result pool. Finally, results are presentedback to the user, possibly after post-processing (results-processing). Currently the HADDOCK, CS-ROSETTA,UNIO, CYANA, MARS, GROMACS and MDD-NMRportals, which all send jobs to the Grid, are implementedfollowing this model

made by filling in a form that is parsed by a CGIscript. This script also performs type checking andvalidation of the user input and presents the user aunique ID with which the results can be retrieved.

Both the web form and the CGI script dependprimarily on the data to be provided, and it ispossible to generate these automatically from adescription of these data. To this purpose, the

Page 20: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

762 T.A. Wassenaar et al.

Spyder framework (S.J. de Vries, unpublished)was designed, initially to facilitate setting up andmanaging the portals for HADDOCK. Next to thegeneration of web forms, Spyder natively supportsdata validation for known types and can convertbetween data types if all intermediary conver-sions are defined. Thus, Spyder offers a singleframework for managing most of the elements in-volved in the first stage of processing of requests.

The data parsed and validated are thenprocessed by a request specific script that com-bines all data and control parameters requiredfor further processing into a self-contained jobpackage. This package is subsequently placed intoa job pool directory, which ends processing of therequest at the first level.

3.2 The Second Level: Grid Job Preparation,Trafficking and Monitoring

At the second level, a daemon job is runningperiodically, scanning the job pool for jobs thatare ready to be run on the Grid and submittingthese when found. In principle, this daemon jobdoes not require information regarding the natureof the job, although in practice different instancesare run, each linked to one type of job to bettercontrol the work load associated with the differenttasks.

A separate daemon job is running, also period-ically, checking the status of the jobs running onthe Grid and retrieving the results when finished.Alternatively, this process can resubmit the jobwhen it has failed. The results are put back, aftervalidation, in a place where they can be accessedthrough a web page. Like the submission process,the polling and retrieval process is in principleindependent, since all information regarding thejob, such as the directory to place the results in,are contained in the job package.

Submission, polling and retrieval of outputare handled using a standard toolbox for Gridoperation, which, in the case of WeNMR, is theEMI middleware suite (using standard commandssuch as glite-wms-job-submit, glite-wms-job-statusand glite-wms-job-output). Accordingly, the jobsthat operate at this level require the use of avalid proxy. To facilitate proxy management,all of the processes at the second level of

operation are running using an eToken-basedrobot certificate, in accordance with the securityrequirements for data portals formulated by theJoint Security Policy Group (https://www.jspg.org/wiki/VO_Portal_Policy). The submission andpolling daemons are documented and can bedownloaded at http://www.wenmr.org/wenmr/automated-Grid-submission-and-polling-daemons.

3.3 The Third Level: Primary Tasks

The third level of operation involves the tasksrunning on the Grid. This requires programs to beported to the Grid, but that process is relativelystraightforward. The only aspect that is differentfrom more common strategies is that the processeshave to adhere to the process model discussed pre-viously, facilitating automation and provenance.

3.4 Software Deployment

Software required by the various portals is man-aged by the WeNMR team. Jobs sent to theGrid by portals are not including the executables.Instead, they use pre-deployed software on thevarious sites supporting the enmr.eu VO. Soft-ware deployment is done using standard Grid jobswith software manager role. Once an applicationhas been deployed and shown to give proper re-sults from test Grid jobs, a software tag with theapplication name is added to the particular site(using the lcg-tags command). The web portals, ontheir side, create jdl scripts that use requirementsto target the sites offering a particular software.For example, to target a site supporting the CNSsoftware (version1.2) the following requirement isadded the the jdl script:

Member(“VO-enmr.eu-CNS1.2”,other.GlueHostApplicationSoftwareRunTimeEnvironment)

This mechanism allows flexibility in softwaredeployment and site tagging for specific applica-tions and has the advantage that no specific siteneeds to be defined in the jdl script at submissiontime.

Page 21: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 763

4 Results

4.1 Status and Statistics

Since the start of the eNMR project in fall 2007,considerable progress has been made, both inthe deployment and the utilization of an in-frastructure for structural biology. Currently, theWeNMR-dedicated infrastructure is distributedover four partner sites, which together provide abody of 320 dedicated CPUs, and 3.4 TB of dedi-cated storage. Resources are shared with 29 othersites, giving access to over 42,000 shared CPUcores and over 200 TB of shared storage. Whilethe storage space is available for all enmr.eumembers, it is barely used by the web portalsapplication, for two reasons: Firstly because thevarious applications used are deployed on thedifferent Grid sites by WeNMR administrators.This means that the statically compiled binarieswith the necessary auxiliary data needed for ex-ecution are installed in a software directory ofeach Grid site. Once a site has been validatedfor a specific application, it is “tagged” with theappropriate software tag. WeNMR jobs JDL’sinclude a reference to the tag so that the WMSconsider only the relevant Grid sites for job ex-ecution. Secondly, most of the input and outputdata transferred via network are relatively small(several MB). Finally, the results are retrievedlocally on the specific site where a given portal isrunning where they are further analyzed and post-processed. Upon successful completion of theirjob, users are notified by email. Results are storedfor a limited period of time so that the user canretrieve it.

About 99 % of the CPU resources availableto the enmr.eu VO are shared with other VO’s.However, there are no assigned CPU usage lim-its, implying that potentially there could be com-petition among different VO’s for resources. Infact, WeNMR already accounts for more than30 % of Grid usage in the Life Sciences do-main. Over the last year, around 2 million jobshave been run on the Grid, corresponding toabout 1,000 years of normalized CPU time. Theoverall CPU efficiency, the total CPU time di-vided by the total wall time, of all jobs (exclud-ing software manager role) was around 89.3 %

(statistics taken from the EGI accounting por-tal, https://accounting.egi.eu/egi.php). In terms ofconsumption of Grid resources, WeNMR has ac-cess to 42 k CPU-cores and has reached a maxi-mum of 4.1 k jobs running simultaneously, i.e. ca.10 % of the total capacity.

Including the nineteen applications that havealready been made available through a web portal,twenty programs have been ported to the Grid sofar and are accessible for advanced users with acomputer configured with a EMI User Interfaceservice. In addition, we have successfully tested anew setup based on cloud computing for softwarethat is hard, if not impossible to install on theGrid. For the NRG-CING portal [66], the CING(G. Vuister, et al., CING; an integrated residue-based structure validation program suite, manu-script in preparation) structure validation programsuite was run on Virtual Machine instances calledVirtualCing, on all 9,000+ NMR projects in thePDB.

Currently there are more than 450 users regis-tered, several of which use the portals on a regularbasis. All portals are collecting Grid accountinginformation (how many runs submitted by a user,how many jobs submitted to the Grid etc. . . ). Themost active portals are the ones for HADDOCKand for CS-ROSETTA, which have processedover 3,750 and 775 requests thus far. Togetherthese requests account for over 85 % of the jobsthat have been run on the Grid, as a result of thetask farming (delegation by a master computer ofa large numbers of independent jobs on variouscomputers/clusters/Grid working nodes, followedby a collection and integration of the results by themaster) approach involved.

The WeNMR webportals are accessible to anyuser with a valid Grid certificate. Although theWeNMR project is founded by the Europeancommunity, anyone with a valid Grid certificate,and attached to a public research institution,can join the enmr.eu Virtual Organization andaccess our web portal to run their calculationon the Grid. Currently, our user base spans 41countries, 11 of them outside Europe, repre-senting 27 % of our users (for user statisticsand worldwide distribution see http://wenmr.eu/wenmr/wenmr-Grid-statistics). This implies thatthe three Grid sites operated by the WeNMR

Page 22: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

764 T.A. Wassenaar et al.

partners, and all other Grid sites supporting ourVO, are allowing jobs on their facilities indepen-dently of the nationality of the users, as long asthey have been approved for the use of a specificapplication portal (most of which using a robotcertificate).

4.2 The WeNMR Virtual Research Community(VRC)

WeNMR groups different research teams intoa worldwide Virtual Research Community. TheWeNMR VRC represents the largest Virtual Or-ganization in the life sciences, accounting forabout 33 % of the CPU time on the Gridwithin the life science area. It has been thefirst VRC officially recognized by the EGI. Toserve its various communities, from the gen-eral public to experienced users and softwaresdeveloppers, WeNMR operates a VRC portal(www.wenmr.eu) that offers a variety of services,information channels and communication tools.User support is provided by documentation pagesand tutorials, maintained by the WeNMR part-ners, as well as Wiki pages and a forum forwhich any registered user can contribute to. TheWeNMR web site also provides a chat and a videoconferencing tools. Important events (courses,workshops and conferences) are announced onthe web sites, and event reports are available viaa blog. Altogether, those community services areaimed to facilitate the communication betweenall WeNMR members, life sciences or structuralbiologist experts, e-Science developers, and thegeneral public.

4.3 Collaboration with OSG and SBGrid

WeNMR has been in contact since its beginningwith the SBGrid Consortium (http://sbGrid.org)which is serving the structural biology usercommunity in the US. SBGrid is also a Vir-tual Organization of the Open Science Grid(OSG, http://www.openscienceGrid.org), the USnational distributed computing Grid for data-intensive research, which is based on the Globusmiddleware and is interoperable with the Euro-pean EGI Grid. At the beginning of 2011 it hasbeen agreed between WeNMR and SBGrid that

every US user registering with SBGrid is giventhe option to register with the enmr.eu VO too,allowing them to use the WeNMR services.

Moreover, OSG, SBGrid and WeNMR rep-resentatives designed a technical plan aiming atsetting up a test-bed for allowing the submissionof WeNMR Grid jobs towards the OSG resourcecenters supporting the SBGrid VO.

The first proof of concept has been achievedthrough the use of dedicated EMI middlewareservices components deployed in Europe andconfigured to submit WeNMR jobs to OSG. Afurther step has started to enable the job sub-mission from Europe making use of the Con-dor based GlideinWMS [71] system services de-ployed in OSG and properly configured to receiveWeNMR jobs. A detailed description of the cur-rent achievements is available in [72].

The testing phase is expected to last until theend of 2012. The submission and polling daemonsdeveloped by WeNMR will be adapted to inter-act with the GlideinWMS system. After that, ifsuccessful, the WeNMR production portals willhave the technical possibility to route their work-load in US, exploiting potentially a few addi-tional tens of thousands of CPU-cores provided byOSG resource centers and making the extensionof WeNMR Grid infrastructure to OSG Grid areality.

5 Conclusions

Since the beginning of the eNMR project, theeNMR/WeNMR consortium has managed to set upan operational Grid (http://www.wenmr.eu/wenmr/wenmr-Grid-statistics), to port over twentyapplications and bring up nineteen web portals(http://www.wenmr.eu/wenmr/nmr-services), withseveral others being finalized. At the time ofwriting, WeNMR has already grown to bethe largest virtual organization within the lifesciences. This successful start has been underlinedby the award for the best demonstration of anapplication, received at the EGEE (EnablingGrids for E-sciencE) 2009 User Forum. With thepresent-day momentum, the WeNMR project israpidly evolving into a factor of importance withinstructural biology, and life sciences in general. It

Page 23: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 765

has been the first Virtual Research Communityofficially recognized by the EGI. WeNMRalso aims at serving the ESFRI INSTRUCTstructural biology community and as such hasbeen recognized by INSTRUCT (http://www.structuralbiology.eu/support/funding/wenmr-facilities).

Currently, efforts include writing WSDL (WebService Definition Language) definitions for theportals that will allow calling services remotely,e.g. from a workflow-manager. At the next stageof the project the different elements will be com-bined, providing comprehensive, yet easy-to-usetools for integrated analysis of NMR data. Incor-poration of SAXS services has already begun withthe first pilot project offering an ab initio shapedetermination, and a number of other SAXSmethods are presently being added to support awider user community in structural biology. Theyalso include an innovative Grid-based computa-tional approach that integrates NMR and SAXSdata which has recently been developed by someof us [73].

Up-to-date information, regarding the stateof the project, the available services, and howto join WeNMR and its virtual organizationenmr.eu, can be found on the project web page athttp://www.wenmr.eu.

Acknowledgements The WeNMR project is funded bythe European Commission under an FP7 e-Infrastructuregrant, contract no. 261572 and builds on the previousFP7 e-Infrastructure project e-NMR, contract no. 213010.Support from the former EGEE and the current EGIin terms of expertise and recognition is also acknowl-edged. The national Grid Initiatives of Belgium, France,Germany, Italy, the Latin America Grid infrastructure viathe GISELA project, the Netherlands (via the Dutch BigGrid project), Portugal, Spain, South Africa, Taiwan andUK is acknowledged for the use of web portals, computingand storage facilities. Finally, the authors like to thankthose that have expressed their interest in and support tothe project.

Open Access This article is distributed under the terms ofthe Creative Commons Attribution License which permitsany use, distribution, and reproduction in any medium,provided the original author(s) and the source are credited.

References

1. Bloch, F.: Nuclear induction. Phys. Rev. 70, 460 (1946)

2. Purcell, E.M., Torrey, H.C., Pound, R.V.: Resonanceabsorption by nuclear magnetic moments in a solid.Phys. Rev. 69, 37 (1946)

3. Bonvin, A.M.J.J., Rosato, A., Wassenaar, T.A.: TheeNMR platform for structural biology. J. Struct. Funct.Genomics 11, 1–8 (2010)

4. Feigin, L.A., Svergun, D.I.: Structure Analysis bySmall-angle X-ray and Neutron Scattering. PlenumPress, New York (1987)

5. Mertens, H.D.T., Svergun, D.I.: Structural character-ization of proteins and complexes using small-angleX-ray solution scattering. J. Struct. Biol. 172, 128–141(2010)

6. Morgan, H.P., et al.: Structural basis for engagementby complement factor H of C3b on a self surface. Nat.Struct. Mol. Biol. 18, 463–470 (2011)

7. Prischi, F., et al.: Structural bases for the interactionof frataxin with the central components of iron-sulphurcluster assembly. Nat. Commun. 1, 95 (2010)

8. Gabel, F., et al.: A structure refinement protocol com-bining NMR residual dipolar couplings and small an-gle scattering restraints. J. Biomol. NMR 41, 199–208(2008)

9. Grishaev, A., Tugarinov, V., Kay, L.E., Trewhella, J.,Bax, A.: Refined solution structure of the 82-kDa en-zyme malate synthase G from joint NMR and syn-chrotron SAXS restraints. J. Biomol. NMR 40, 95–106(2008)

10. Grishaev, A., Wu, J., Trewhella, J., Bax, A.:Refinement of multidomain protein structures bycombination of solution small-angle X-ray scatteringand NMR data. J. Am. Chem. Soc. 127, 16621–16628(2005)

11. Bernado, P., Svergun, D.I.: Structural insights into in-trinsically disordered proteins by small-angle X-rayscattering. In: Uversky, V.N., Longhi, S. (eds.) Instru-mental Analysis of Intrinsically Disordered Proteins:Assessing Structure and Conformation, pp. 451–476Wiley, Hoboken (2010)

12. Ferrari, T., Gaido, L.: Resources and services of theEGEE production infrastructure. J. Grid Computing 9,119–133 (2011)

13. Avellino, G., et al.: The dataGrid workload manage-ment system: challenges and results. J. Grid Computing2, 353–367 (2004)

14. Wilkins-Diehr, N.: Special issue: science gateways—common community interfaces to Grid resources: ed-itorials. Concurr. Comput.: Pract. Exp. 19, 743–749(2007)

15. Fargette, M., Barbera, R., Rotondo, R.: A simplifiedaccess to Grid resources by science gateways. In: In-ternational Symposium on Grids and Clouds and theOpen Grid Forum. Taipei (2011)

16. Farkas, Z., Kacsuk, P.: P-GRADE portal: a genericworkflow system to support user communities. FutureGener. Comput. Syst. 27, 454–465 (2011)

17. Kacsuk, P.: P-GRADE portal family for Grid in-frastructures. Concurr. Comput.: Pract. Exp. 23, 235–245 (2011)

18. Barbera, R., Falzone, A., Ardizzone, V., Scardaci,D.: The GENIUS Grid portal: its architecture,

Page 24: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

766 T.A. Wassenaar et al.

improvements of features, and new implementationsabout authentication and authorization. In: Proceed-ings of the 16th IEEE International Workshops onEnabling Technologies: Infrastructure for Collabora-tive Enterprises, pp. 279–283. IEEE Computer Society,(2007)

19. Zhang, C., Kelley, I., Allen, G.: Grid portal solutions:a comparison of GridPortlets and OGCE. Concurr.Comput.: Pract. Exp. 19, 1739–1748 (2007)

20. Szejnfeld, D., et al.: Vine toolkit—towards por-tal based production solutions for scientific andengineering communities with Grid-enabled resourcessupport. Scalable Comput.: Pract. Exp. 11, 161–172(2010)

21. Hull, D., et al.: Taverna: a tool for building and runningworkflows of services. Nucleic Acids Res. 34, W729–W732 (2006)

22. De Vries, S.J., et al.: HADDOCK versus HADDOCK:New features and performance of HADDOCK2.0 onthe CAPRI targets. Proteins 69, 726–733 (2007)

23. Dominguez, C., Boelens, R., Bonvin, A.M.J.J.:HADDOCK: A protein-protein docking approachbased on biochemical or biophysical information. J.Am. Chem. Soc. 125, 1731–1737 (2003)

24. Schwieters, C.D., Kuszewski, J.J., Tjandra, N., Clore,G.M.: The Xplor-NIH NMR molecular structure de-termination package. J. Magn. Reson. 160, 65–73(2003)

25. Güntert, P., Mumenthaler, C., Wüthrich, K.: Torsionangle dynamics for NMR structure calculation withthe new program DYANA. J. Mol. Biol. 273, 283–298(1997)

26. Herrmann, T., Güntert, P., Wüthrich, K.: Protein NMRstructure determination with automated NOE assign-ment using the new software CANDID and the torsionangle dynamics algorithm DYANA. J. Mol. Biol. 319,209–227 (2002)

27. Shen, Y., et al.: Consistent blind protein structure gen-eration from NMR chemical shift data. Proc. Natl.Acad. Sci. USA. 105, 4685–4690 (2008)

28. Shen, Y., Vernon, R., Baker, D., Bax, A.: De novoprotein structure generation from incomplete chemicalshift assignments. J. Biomol. NMR 43, 63–78 (2009)

29. Case, D.A., et al.: The Amber biomolecular simulationprograms. J. Comput. Chem. 26, 1668–1688 (2005)

30. Van Der Spoel, D., et al.: GROMACS: fast, flexible,and free. J. Comput. Chem. 26, 1701–1718 (2005)

31. Vranken, W.F., et al.: The CCPN data model for NMRspectroscopy: development of a software pipeline.Proteins 59, 687–696 (2005)

32. Jung, Y.S., Zweckstetter, M.: Mars—robust automaticbackbone assignment of proteins. J. Biomol. NMR 30,11–23 (2004)

33. Shen, Y., Delaglio, F., Cornilescu, G., Bax, A.: TALOSplus: a hybrid method for predicting protein backbonetorsion angles from NMR chemical shifts. J. Biomol.NMR 44, 213–223 (2009)

34. Jaravine, V.A., Zhuravleva, A.V., Permi, P.,Ibraghimov, I., Orekhov, V.Y.: HyperdimensionalNMR spectroscopy with nonlinear sampling. J. Am.Chem. Soc. 130, 3927–3936 (2008)

35. Petoukhov, M.V., Konarev, P.V., Kikhney, A.G.,Svergun, D.I.: ATSAS 2.1—towards automated andweb-supported small-angle scattering data analysis. J.Appl. Crystallogr. 40, S223–S228 (2007)

36. Fiorito, F., Herrmann, T., Damberger, F.F., Wüthrich,K.: Automated amino acid side-chain NMR assignmentof proteins using 13C- and 15N-resolved 3D [ 1H, 1H]-NOESY. J. Biomol. NMR 42, 23–33 (2008)

37. Volk, J., Herrmann, T., Wüthrich, K.: Automatedsequence-specific protein NMR assignment using thememetic algorithm MATCH. J. Biomol. NMR 41, 127–138 (2008)

38. Lensink, M.F., Mendez, R., Wodak, S.J.: Docking andscoring protein complexes: CAPRI 3rd edition. Pro-teins 69, 704–718 (2007)

39. Mendez, R., Leplae, R., Lensink, M.F., Wodak, S.J.:Assessment of CAPRI predictions in rounds 3–5 showsprogress in docking procedures. Proteins 60, 150–169(2005)

40. van Dijk, A.D., et al.: Data-driven docking: HAD-DOCK’s adventures in CAPRI. Proteins 60, 232–238(2005)

41. Schwieters, C.D., Kuszewski, J.J., Clore, G.M.: UsingXplor-NIH for NMR molecular structure determina-tion. Prog. NMR Spectrosc. 48, 47–62 (2006)

42. Berman, H.M., et al.: The protein data bank. NucleicAcids Res. 28, 235–242 (2000)

43. Rohl, C.A., Strauss, C.E., Misura, K.M., Baker, D.:Protein structure prediction using Rosetta. MethodsEnzymol. 383, 66–93 (2004)

44. Shen, Y., Bax, A.: Protein backbone chemical shiftspredicted from searching a database for torsion angleand sequence homology. J. Biomol. NMR 38, 289–302(2007)

45. Cornilescu, G., Delaglio, F., Bax, A.: Protein backboneangle restraints from searching a database for chemicalshift and sequence homology. J. Biomol. NMR 13, 289–302 (1999)

46. McGuffin, L.J., Bryson, K., Jones, D.T.: The PSIPREDprotein structure prediction server. Bioinformatics 16,404–405 (2000)

47. Wang, Y.J., Jardetzky, O.: Investigation of the neigh-boring residue effects on protein chemical shifts. J. Am.Chem. Soc. 124, 14075–14084 (2002)

48. Berjanskii, M.V., Wishart, D.S.: A simple method topredict protein flexibility using secondary chemicalshifts. J. Am. Chem. Soc. 127, 14970–14971 (2005)

49. Ulrich, E.L., et al.: BioMagResBank. Nucleic AcidsRes. 36, D402–D408 (2008)

50. Fogh, R.H., et al.: MEMOPS: data modelling and au-tomatic code generation. J. Integr. Bioinform. 7, 123(2010)

51. Bertini, I., Case, D.A., Ferella, L., Giachetti, A.,Rosato, A.: A Grid-enabled web portal for NMRstructure refinement with AMBER. Bioinformatics 27,2384–2390 (2011)

52. Lindahl, E., Hess, B., van der Spoel, D.: GROMACS3.0: a package for molecular simulation and trajectoryanalysis. J. Mol. Model. 7, 306–317 (2001)

53. Oostenbrink, C., Villa, A., Mark, A.E., Van Gunsteren,W.F.: A biomolecular force field based on the free

Page 25: WeNMR: Structural Biology on the GridWeNMR: Structural Biology on the Grid 745 Fig. 1 NMR data processing from signal to 3D struc- ture. After acquisition of the primary NMR data,

WeNMR: Structural Biology on the Grid 767

enthalpy of hydration and solvation: the GROMOSforce-field parameter sets 53A5 and 53A6. J. Comput.Chem. 25, 1656–1676 (2004)

54. Schuler, L.D., Daura, X., Van Gunsteren, W.F.: Animproved GROMOS96 force field for aliphatic hydro-carbons in the condensed phase. J. Comput. Chem. 22,1205–1218 (2001)

55. Scott, W.R.P., et al.: The GROMOS biomolecular sim-ulation program package. J. Phys. Chem. A 103, 3596–3607 (1999)

56. van Gunsteren, W.F., et al.: Biomolecular Simulation:The Gromos 96 Manual and User Guide. BIOMOSBV, Zürich, Groningen (1996)

57. Ponder, J.W., Case, D.A.: Force fields for protein sim-ulations. Adv. Protein Chem. 66, 27–85 (2003)

58. Brooks, B.R., et al.: Charmm—a program for macro-molecular energy, minimization, and dynamics calcula-tions. J. Comput. Chem. 4, 187–217 (1983)

59. Jorgensen, W.L., Maxwell, D.S., TiradoRives, J.: De-velopment and testing of the OPLS all-atom force fieldon conformational energetics and properties of organicliquids. J. Am. Chem. Soc. 118, 11225–11236 (1996)

60. Van Dijk, M., Wassenaar, T., Bonvin, A.M.J.J.: Aflexible, Grid-enabled web portal for GROMACSmolecular dynamics simulations. J. Chem. TheoryComput. (2012). doi:10.1021/ct300102d

61. Villa, A., Fan, H., Wassenaar, T., Mark, A.E.: How sen-sitive are nanosecond molecular dynamics simulationsof proteins to changes in the force field? J. Phys. Chem.B 111, 6015–6025 (2007)

62. Wassenaar, T.A., Mark, A.E.: The effect of box shapeon the dynamic properties of proteins simulated underperiodic boundary conditions. J. Comput. Chem. 27,316–325 (2006)

63. De Vries, S.J., van Dijk, M., Bonvin, A.M.J.J.: TheHADDOCK web server for data-driven biomoleculardocking. Nat. Protoc. 5, 883–897 (2010)

64. Herrmann, T., Güntert, P., Wüthrich, K.: ProteinNMR structure determination with automated NOE-identification in the NOESY spectra using the newsoftware ATNOS. J. Biomol. NMR 24, 171–189 (2002)

65. Doreleijers, J.F., Sousa da Silva, A.W., Krieger,E., Nabuurs, S.B., Spronk, C.A.E.M., Stevens, T.J.,Vranken, W.F., Vriend, G., Vuister, G.W.: CING: anintegrated residue-based structure validation programsuite. J. Biomol. NMR 54, 267–283 (2012)

66. Doreleijers, J.F., et al.: NRG-CING: integrated valida-tion reports of remediated experimental biomolecularNMR data and coordinates in wwPDB. Nucleic AcidsRes. (2011). doi:10.1093/nar/gkr1134

67. Franke, D., Svergun, D.I.: DAMMIF, a programfor rapid ab-initio shape determination in small-angle scattering. J. Appl. Crystallogr. 42, 342–346(2009)

68. Volkov, V.V., Svergun, D.I.: Uniqueness of ab initioshape determination in small-angle scattering. J. Appl.Crystallogr. 36, 860–864 (2003)

69. Svergun, D.I.: Restoring low resolution structure ofbiological macromolecules from solution scattering us-ing simulated annealing. Biophys. J. 76, 2879–2886(1999)

70. Svergun, D.I., Petoukhov, M.V., Koch, M.H.: Deter-mination of domain structure of proteins from X-raysolution scattering. Biophys. J. 80, 2946–2953 (2001)

71. Bradley, D., Sfiligoi, I., Padhi, S., Frey, J.,Tannenbaum, T.: Scalability and interoperabilitywithin glideinWMS. J. Phys.: Conf. Ser. 219, 062036(2010)

72. Verlato, M.: Extending WeNMR e-Infrastructure out-side Europe. In: EGI Community Forum 2012/EMISecond Technical Conference, Munich (2012)

73. Bertini, I., et al.: Conformational space of flexible bi-ological macromolecules from average data. J. Am.Chem. Soc. 132, 13553–13558 (2010)


Recommended