ThalInd, a β-thalassemia and hemoglobinopathies database for India: defining a model country-specific and disease-centric bioinformatics resource

Human MutationDATABASES

ThalInd, a b-Thalassemia and HemoglobinopathiesDatabase for India: Defining a Model Country-Specificand Disease-Centric Bioinformatics Resource

Sujata Sinha,1,2 Michael L. Black,1 Sarita Agarwal,3 Reena Das,4 Alan H. Bittles,1,5 and Matthew Bellgard1�

1Centre for Comparative Genomics, Murdoch University, Perth, Australia; 2Thalassemia Working Group, Varanasi, India; 3Sanjay Gandhi Post

Graduate Institute of Medical Sciences, Lucknow, India; 4Postgraduate Institute of Medical Education and Research, Chandigarh, India; 5Edith

Cowan University, Perth, Australia

Communicated by Richard G.H. CottonReceived 19 November 2010; accepted revised manuscript 29 March 2011.

Published online 21 April 2011 in Wiley Online Library (www.wiley.com/humanmutation). DOI 10.1002/humu.21510

ABSTRACT: Web-based informatics resources for geneticdisorders have evolved from genome-wide databases likeOMIM and HGMD to Locus Specific databases(LSDBs) and National and Ethnic Mutation Databases(NEMDBs). However, with the increasing amenability ofgenetic disorders to diagnosis and better management,many previously underreported conditions are emergingas disorders of public health significance. In turn, thegreater emphasis on noncommunicable disorders hasgenerated a demand for comprehensive and relevantdisease-based information from end-users, includingclinicians, patients, genetic epidemiologists, healthadministrators and policymakers. To accommodate thesedemands, country-specific and disease-centric resourcesare required to complement the existing LSDBs andNEMDBs. Currently available preconfigured Web-basedsoftware applications can be customized for this purpose.The present article describes the formulation andconstruction of a Web-based informatics resource forb-thalassemia and other hemoglobinopathies, initially foruse in India, a multiethnic, multireligious country with apopulation approaching 1,200 million. The resourceThalInd (http://ccg.murdoch.edu.au/thalind) has beencreated using the LOVD system, an open sourceplatform-independent database system. The system hasbeen customized to incorporate and accommodate datapertinent to molecular genetics, population genetics,genotype–phenotype correlations, disease burden, andinfrastructural assessment. Importantly, the resource alsohas been aligned with the administrative health systemand demographic resources of the country.Hum Mutat 32:887–893, 2011. & 2011 Wiley-Liss, Inc.

KEY WORDS: bioinformatics resource; database; b-tha-lassemia; hemoglobinopathies; India ThalInd

Introduction

The evolution of Web-based informatics resources for geneticdisorders can be traced through three distinct phases: (1) genome-wide mutation databases, (2) locus-specific databases (LSDBs),and (3) national and ethnic mutation databases (NEMDBs).Genome-wide databases, such as OMIM, HGMD, and Ensembl,contain pooled information on all genes and incorporateadvanced tools for gene analysis with user interface. LSDBs weredesigned so that researchers dealing with a specific disease canretrieve current data from a single source and thus need to searchno further than an LSDB [Cotton, 2009]. The majority of LSDBsincorporate tools for the analysis of gene expression and thephenotype in normal and disease conditions [Patrinos, 2006].NEMDBs represent the third phase and were devised to provideinformation on disease-causing mutations and their frequencies indifferent population groups within a country. They can help in theoptimization of molecular diagnostic services and the creation ofappropriate awareness among clinicians, scientists, and the generalpublic about genetic disorders that may be prevalent in differentpopulations and communities [Patrinos, 2006]. The developmentof disease-specific national resources could therefore be consid-ered the next critical phase in the evolution of Web-basedinformatics resources on genetic disorders.

With the increasing amenability of many genetic disordersto prevention, early diagnosis, and better management throughearly intervention and increasing curative options, informaticsresources also need to be extended and up-scaled to providerelevant information to a wide range of potential users, includingclinicians, patients, genetic epidemiologists, health administrators,and policymakers. It also is desirable that data and issues of thisnature can be accommodated within the purview of a nationalhealth system. Disease-specific national resources can accommo-date the requirements of information on disease burden, treat-ment options, and existing facilities for diagnosis and trackingutilization of these services would greatly enhance their relevanceto society.

In this article we introduce and discuss the design of a country-specific bioinformatics resource for a genetic disorder of wide-spread, major public health significance. The highlight of theconceptual model is alignment with national demographic,administrative, and health systems illustrated by its the applicationto b-thalassemia and other hemoglobinopathies, autosomalrecessive disorders adversely affecting the health of large numbersof people worldwide. The resource has been specifically designed

OFFICIAL JOURNAL

www.hgvs.org

& 2011 WILEY-LISS, INC.

Additional Supporting Information may be found in the online version of this article.�Correspondence to: Matthew Bellgard, Centre for Comparative Genomics,

Murdoch University, South Street, Perth, WA 6150, Australia.

E-mail: [email protected]

for adoption in India, a large and demographically complexcountry, but it has the additional potential to serve as a prototypefor other genetic disorders that impact on health in all low- andmiddle-income countries.

Bioinformatics Requirements of b-Thalassemiaand Hemoglobinopathies in the Indian Context

It has been suggested that in the near future b-thalassemia andrelated disorders are likely to emerge as the category of geneticdisease that will have the most widespread impact on public healthand health resources in India [Agarwal, 2005; Petrou, 2010;Weatherall, 2010]. The autosomal recessive disease b-thalassemiais the most complex disease among the larger group of inheritedhemoglobin disorders. The b-globin gene itself is located onchromosome 11p15.5, with 242 mutations reported [Giardineet al., 2007]. However, expression of the b-globin gene isinfluenced by secondary and tertiary genetic modifiers of thedisease phenotype, resulting in extensive phenotypic diversity[Weatherall, 2001]. The prevalence of symptomatic or clinicallysilent hemoglobin variants such as HbE, HbS, and HbD within thesame population subsets further contribute to diverse phenotypes,resulting in thalassemic hemoglobinopathies and homozygousand compound hemoglobinopathies. Stem cell transplantation isthe only curative option currently available to patients, but in low-income countries its adoption has been restricted because oflimited donor availability, the high costs involved, and the smallnumber of specialist centers. As a result, a large majority ofpatients remain reliant on chronic management regimensinvolving regular blood transfusions and iron chelation therapy,which places a huge burden on the national health resources andon the resources of patients, their families, and communities.Given these circumstances, and the large numbers of patientsinvolved [Sinha et al., 2009], it is envisaged that a comprehensivenational information resource on thalassemia could greatly aidhealthcare delivery and control strategies [IUSSTF, 2007].

The occurrence and prevalence of recessive mutations in aparticular population may be dependent on marriage andreproductive practices [Sinha et al., 2009]. The complex, highlystratified structure of the Indian population, characterized by theunique, long-established caste system, has been further compli-cated by multiple waves of immigration and subdivisions based onsix major religions and 22 major spoken languages [Black et al.,2010]. With a multifaceted population history of this nature, athorough knowledge and understanding of local communitystructure is required in order to devise an effective and relevantinformatics resource for b-thalassemia.

Health policies in India are formulated at national level andimplemented by individual states according to the directives of thenational government with each state organizing its own healthinfrastructure via a Department for Health and Family Welfare.The decennial Census of India (www.censusindia.gov.in) is themajor national demographic resource, and serves as a referencepoint for most national planning and policy decisions. TheNational Rural Health Mission (NRHM) (http://mohfw.nic.in/nrhm.htm), a major initiative of the national government focuseson improving healthcare delivery to the rural areas where 470%of the population reside.

A national bioinformatics resource aligned with the healthadministrative system should therefore facilitate: (1) the imple-mentation of prevention and control programs by public healthauthorities; (2) an improvement in the availability and accessibility

of thalassemia care services in all areas rural and urban; (3) clinicalresearch to improve chronic management regimens; and (4) researchin related disciplines, including population genetics, public health,and clinical medicine.

Structuring the Web-Based National Resourcefor b-Thalassemia and Hemoglobinopathies

Given the immense size of the population, the very significantlevel of regional diversity, and the current health administrativesystem, the creation of a pan-Indian Web-based resource created bymerging information generated at State level is a logical progression.The key components required of an effective b-thalassemia Web-based resource are outlined in Figure 1.

The core component of this resource, called ThalInd, is a centralWeb-based curatable database of unique b-globin gene variants,derived from the entry records of patient(s) carrying a pathogenicmutation on one or both alleles. This curatable resource capturesdata from published studies and direct submissions from clinical andresearch laboratories, identifying them as ‘‘submitters’’ and ‘‘sub-mission centers,’’ respectively. Data mined from published studiesand those received through direct submission would be analyzed,verified, and curated to transform them into a State level repositoryof information capable of ensuring data output in a user-friendlymanner to meet the needs of a wide and heterogeneous user base.

In creating country specific bioinformatics resources, it isimportant to recognize that, for inherited disorders of publichealth significance, the impact of planning and strategies for healthoutcomes may be evident only over quite lengthy time spans. Ittherefore is imperative that the design of such resources is modularand scalable to accommodate future issues and challenges.

LOVD (Leiden Open Variation Database system), a popularWeb application for LSDBs that was chosen for the currentresource is freely available, platform-independent and is installedon an HTTP server [Fokkema et al., 2005].

The major advantage offered by LOVD for the present definedpurposes is its flexibility for customization and extension, thusenabling the identification and addition of data items required forthe extraction of a wide range of information at the user end. Theprovision of five different access levels, that is, Administrator,Manager, Curators, Submitters, and General User interface, makesit readily applicable to a multicenter system that can be managedby a consortium of expert advisors or curators facilitating datacollection, verification, and submission from various centers andfor dissemination of information to the general public. Further, asand when deemed necessary, the entire dataset can be retrievedand transferred to another suitable Web-based application.

A major limitation in adapting LOVD for an autosomalrecessive disorder like b-thalassemia is in representing both allelesfrom affected individual(s) to retrieve information on genotype.For variant data, LOVD offers a complex system of assigning two-letter codes to each unique variant, which when suffixed to variantallele, denotes the other allele. Due to the lack of a ready alter-native we have adopted this system in the current resource. For thepatient data we have devised a simpler system, which is elaboratedbelow.

Sources of Start-Up Data

The publication of molecular data on thalassemia andhemoglobinopathies in Indian population commenced in the1980s and during the last three decades descriptive publications

888 HUMAN MUTATION, Vol. 32, No. 8, 887–893, 2011

from various centers and groups located in different parts of thecountry have followed. A systematic meta-analysis of publisheddata on b-thalassemia undertaken by the authors [Sinha et al.,2009] is the major source of validated and authenticated data forthe current initiative. Unpublished anonymized data fromindividual patients and carriers submitted by one of thesubmission centers (Centre for Genetic Disorders, Banaras HinduUniversity, Varanasi, India) have been included to demonstrate thecapacity of the database to accommodate both published andunpublished data from different sources. Data on the structuralhemoglobin variants HbS, HbE, and HbD from these publishedstudies have also been included as variant alleles. The start-up datatherefore comprise 61 unique variants collated from 9,401 allelesderived from published studies, together with 142 alleles from 105individuals (Supp. Fig. S1).

Customizing the Lovd System

Customizing and formatting a database requires the establish-ment of methods and protocols for data capture, and datacuration and enabling data output in user-friendly formats.

Data capture is facilitated by evolving a format for merginginformation from different sources, to enable both the high-throughput transfer of data generated in research and referrallaboratories and individual entries from clinical laboratories.A spreadsheet consisting of required ‘‘data items’’ as columns andindividual ‘‘entries’’ corresponding to alleles as rows provided anappropriate template for the current LOVD installation. The priorassessment of bioinformatics requirements provided a guide forfiltering available information on disease and incorporatingrelevant information as data items by the addition of specific

custom columns to the variant and patient tables in the LOVDformat.

‘‘State’’ is the only mandatory data item that has to be specified forevery entry to be uploaded in the database, other than those requiredby the LOVD system. Variant molecular data have been limited tobasic information on specific variation, including the DNA change,its type, location, pathogenicity, and frequency. Provision has beenmade to record the national and regional frequencies of variantalleles, in addition to those present at State level.

To facilitate queries from a wide set of users, the important dataitems in the patient table include all of the terms commonly usedto refer to disease types: clinical disease phenotype, hematologicphenotype, and genotype. Associated a-genotypes, if available, canbe entered in the provided column to assist in the correlation andprediction of phenotypes.

Geographical origin can be recorded down to district level forurban areas and to village level in rural areas. In addition, theRural or Urban origin of the individual is recorded in a separatecolumn and rural origin is ascertained by entering the village and/or town name in the designated column. To accommodatecomplex ethnicity, the data on ethnic origin is recorded in threecolumns for religion, caste, and community. To enable theincorporation of information on burden of disease and clinicaloutcomes, columns for birth year, mortality, and morbidity havebeen created. The columns for registration with thalassemiacenters and patient support groups have been included to generateinformation on infrastructure and the availability and accessibilityof care services. A column on prenatal diagnosis is created togenerate data on the availability and acceptability of this effectivecontrol option. This information is recorded in entries from adultcarrier females as whether they have availed the option or not.

Figure 1. Key components of ThalInd: the Web-based national resource for b-thalassemia and hemoglobinopathies in India.

HUMAN MUTATION, Vol. 32, No. 8, 887–893, 2011 889

Official land area codes for States and districts, and identifica-tion codes for individuals, families, and published studies areentered as separate data items, to help build a data collectionsystem that minimizes the risk of duplication of data on variantalleles and also assist in assessing carrier prevalence and diseaseburden in specific administrative units of districts and States. Lasta reference Patient_ID assigned to each entry and created bycombining these land area and identification codes constitutes oneof the most important data items in the current resource, aselaborated in sections on data curation and data output. Figure 2is a schematic representation of information flow across thedatabase.

Linkages to External Resources

Linkages to genome-wide databases such as OMIM, EntrezGene, HGMD, and GeneTests are preconfigured on the homepageof the LOVD installation. HbVar (http://globin.cse.psu.edu/hbvar)is the globin gene LSDB providing information on all variantsresulting from mutations in the a- and b-globin gene clusters, andit incorporates detailed descriptions of laboratory and clinicalphenotypes related to specific genotypes, along with the structuraland functional properties of hemoglobins. A separate flat file, XPRbase, provides detailed experimental protocols for DNA analysis[Patrinos et al., 2004]. Each entry uploaded in the ThalInddatabase features a customized link to HbVar that on clickingopens the respective page in HbVar, where the user can accessinformation on clinical and laboratory information associatedwith that particular variant.

The decennial Census of India (www.censusindia.gov.in)provides substantial health-related data. To facilitate the demo-graphic mapping of thalassemia mutations, geographical landcodes for states and districts used by the national Census authori-ties were incorporated into the specific reference Patient_IDassigned to individual patient entries. The links to land areacodes and population data have been incorporated in the homepage to facilitate curation, and they also can lead interested

end-users to the cited demographic resource. The HealthManagement Information System (http://www.nrhm-mis.nic.in)initiated by the Government of India in October 2008 underNRHM is intended to serve as the single window for all health-related information, and can be accessed from the ThalInd homepage. It is envisaged that in time this link can be used for thefurther two-way integration, transfer, and display of information.

Curation and Management of the Database

Curation is the key to maintaining the integrity and quality of adatabase. Issues in curation for the current installation relate to:(1) data mining and analysis, validation, and authentication ofsubmitted data; (2) assigning reference Patient_ID to all entrieswhether from publications or through direct submission;(3) updating national, regional, and state level frequencies basedon published studies; and (4) maintenance of the anonymity andconfidentiality of personal data.

Published data are mined by the curators, and analyzed andverified for any duplication, before entry into the database on aregular basis. Each publication is assigned an identification codeand archived in a file. The data are then analyzed and divided intogroups with all records sharing the same parameters forminga single group, whether based on alleles or patients, and addedas a single entry with ‘‘n’’ number of alleles. Patient-based datasubmitted by individual submitters will be verified at thesubmission center level to eliminate any error or duplicationbefore inclusion in the database. The availability of patient- andfamily-specific details should substantially decrease the risk ofduplications.

After authentication each entry, whether from a publication orindividual patients and carriers from submission centers andsubmitters, is manually assigned a reference ‘‘Patient_ID’’ by thecurators, which is in addition to the autogenerated serialnumerical ‘‘patient_id.’’ This reference Patient_ID is an aggrega-tion of four codes, indicating the state of origin and sourcepublication in the case of entries from publications. For individual

Figure 2. Schematic of the ThalInd depicting flow of information across the database.


patient-based entries, the four codes used to construct the specificPatient_IDs are the State code, district code, family code, andindividual code. Only the variant allele is entered for hetero-zygotes, whereas for homozygotes and compound heterozygotesone allele is entered per row. The Patient_ID has ‘‘a’’ as suffix inboth alleles in case of homozygous patients. In the compoundheterozygous state one allele has suffix ‘‘a’’ and the other alleles hassuffix ‘‘b.’’

The assigned Patient_ID serves multiple purposes in curation,output, and analysis of data. In curation, it facilitates easy trackingto respective publication or to laboratory sources for verificationand additional information if required, and it is provided to thesubmission centers for reference use as a reference link betweenThalInd and the submission center database. Confidential data onindividuals, carriers, or patients are restricted to the submissioncenters, which have direct contact with the patients and theirfamilies and are under a legal obligation to protect their privacy.Detailed clinical and laboratory data are also to be retained by thesubmission centers, whereas the analytical derivatives as diseasephenotypes and morbidity and mortality figures will beexportable. Importantly, to facilitate and optimize the regularupdate of mortality and morbidity data from thalassemia centersor support groups, the Patient ID is so constructed that whilerequisitioning these data from submission centers the identity ofthe patient/carrier remains confidential from the curator.

ThalInd has been designed as a resource that can be managed bya group or a consortium of researchers and clinicians engaged inthe diagnosis, management, or research of thalassemia in variousparts of the country. Initially, three submission centers have beenidentified and the registered submitters from each center, alongwith curator, manager, and database administrator, comprise theinitial consortium managing the database. The consortium itself isexpected to expand as more submitters from submission centers

are registered. The database was designed and is currently hosted atthe Centre for Comparative Genomics server (http://ccg.murdoch.edu.au/thalind) at Murdoch University, Australia. However, thesystem is capable of installation on any collaborating institutionalserver within India. Staff from the submission centers can bothaccess and edit the data that they submit. The submission centerswill be encouraged and helped to collect and store data incompatible software formats, to facilitate the direct transfer of largevolumes of data by exporting text files to the database. If thesubmitters wish to publish data they submitted to the database,where it will be withheld from public view until publication of thestudy and released only after communication with the relevantsubmitters.

Data Output and Analysis

This disease-centric and country-specific ThalInd database(http://ccg.murdoch.edu.au/thalind) has been designed to provideinformation across a wide spectrum of areas related to inheritedhemoglobin disorders and is operable at national level. The dataitems can broadly be classified into four groups addressing:molecular genetics, population genetics, clinical phenotype–genotype correlations, and a clutch of parameters to assess thedisease burden, infrastructure, and services. The latter are collec-tively referred to as health informatics and consist of informationutilized to improve healthcare delivery (Fig. 3).

The querying process in this LOVD-based installation is simple,with each data item or column having a search box. Figure 3indicates some of the key data items that can be combined withother data items for generating useful information at State level,for micromapping at district and village levels, and for data onhealth informatics.

Figure 3. Columns constituting the ThalInd database. The schema shows conceptual groupings of data items to highlight the wide spectrumof issues addressed by the database.


Variant-based searches can be conducted using this DBID, orby using an HGVS variation code or common variation name.The users can navigate to the locus-specific database HbVar andother genome-wide databases for molecular genetics. Data onvariants can be combined with different geographical and ethnicorigins to yield data on population genetics (Supp. Fig. S2). Thesearch can additionally be focused to yield results on specificdisease phenotypes, genotypes, the numbers of homozygotes,compound heterozygotes, and specific compound heterozygotesin different ethnic groups within or across States, and it can befurther refined to include the effect of associated a-genemutations on the phenotype.

The judicious use of combinations of disease phenotypes andgenotypes can in itself provide useful indicators. For instance, theclinical phenotype ‘‘Thalassemia Intermedia’’ with a hematologicphenotype ‘‘Heterozygous Beta Thalassemia’’ will display patientswho may indicate the presence of dominant b-thalassemia allelesor the need to analyze them for a gene triplications or othermodifiers.

An important feature of the database is that the storedinformation can be split into data obtained from publishedstudies and that from individual patients. Using ‘‘State Code_Publication’’ or ‘‘State Code_Individual’’ in combination with‘‘Geographic origin/state’’ will retrieve entries from publicationsor individual patients, respectively.

The individual patient data can then further be queried at Stateand district levels to yield information on parameters of healthinformatics. Clinical disease phenotypes, like Thalassemia Majorand Intermedia, can be combined with birth year to generate datathat can indicate the disease burden by estimation of the numberof affected births by year in particular States or districts. Theavailability, quality, and effectiveness of care services in prolonginglife and improving quality of life can also be assessed through theanalysis of data provided in the mortality and morbidity columns.Data indicating nonutilization of prenatal diagnosis as a controloption in females from an area reporting a relatively high numberof affected cases may serve as a pointer to explore the availabilityof this facility in the area. An analysis of State and district codesembedded in Patient_IDs, displayed as search results for particularvariant alleles can readily help in identifying districts with a highprevalence of affected individuals. ‘‘Patient_ID’’ can also be usedto track the outcomes of individual patients and the informationobtained can then be used to assess and direct care and controloptions as indicated.

Discussion

The primary objective of this study was to design a country-specific and disease-centric resource capable of moving beyondthe limits of a LSDB and creating an information system that alsocan be a tool for clinical and population-based research and forthe formulation and implementation of a public health approachto b-thalassemia. Despite recognition that thalassemias present anincreasing global health burden, the true magnitude of the burdenremains unknown in most low- and middle-income countries dueto the unavailability of comprehensive and representative data[Weatherall, 2010]. This gap in information is in part due to a lackof uniformity in the study formats of the available published data,and an inability to collate and utilize data on carriers and patientsgenerated in small care centers and clinical laboratories. Theformat and protocols established for data capture and curationthrough the current resource aim to achieve the increasegeneration of data that are relevant, authentic, and representative

of the respective geographical regions and ethnic groups, as well asbeing amenable to utilization by healthcare agencies.

To achieve alignment with the existing health administrativesystem in India, it was necessary to select ‘‘State’’ as a mandatoryfield in the database, which led to the exclusion of previouslypublished molecular data where such information was missing[Sinha et al., 2009]. However, this minimum criterion creates alogical basis for data uniformity and provides a commondenominator for the calculation of mutation frequencies, carrierprevalence rates in different ethnic population groups, diseasephenotypes, mortality and morbidity data, the disease burden inrural and urban areas, the estimation of diagnostic, managementand support services, and the utilization and acceptance of optionssuch as prenatal diagnosis. One of the custom columns includedin the database records information on Rural or Urban origin ofthe individual. This column was specifically created to providesegregated data on rural and urban populations, in response to theadministrative health policy on rural healthcare delivery establishedthrough the National Rural Health Mission in 2005.

In the current resource, the start-up data from publishedstudies mainly provide a spectrum of mutations and frequencies atregional and state level, whereas the unpublished patient data on142 alleles from 105 individuals from one of the three submissioncenters provide information on specific ethnic and geographicalorigins, clinical phenotypes, disease burden, and clinical data onmanagement and associated mutations. Thus, it is believed thatthe submission of further individual patient data from submissioncenters and individual submitters related to patient supportgroups will greatly enrich the database and provide valuableinformation that can be utilized to improve the health status ofthe population and, more especially, individuals affected by thisgroup of disorders.

The frequency of a variant allele in different population subsetsis one of the most significant data items in population genetics.In ThalInd, the percentage frequencies of variants at State levelsderived from published studies are presently displayed, and thesealso can be compared with ten common variants identified atnational and regional levels [Black et al., 2010]. To avoidmisrepresentation, to date, only the frequencies of those variantsfor which a minimum of 100 alleles have been reported from aState, and where the frequency of the variant allele is at least 1%or more have been presently uploaded in the database. Thus,in addition to State level frequencies of variants, a comparativeprofile of common variants at national, regional, and state levelcan also be obtained by the end user (Supp. Fig. S3). In the future,with increased submission of individual data, variant frequenciesare expected to be more representative of specific geographicalareas and ethnic groups. Enabling the dynamic generation ofvariant frequencies with graphical and map-based displays can bevery advantageous and will be undertaken in future up-grading ofthe database.

Future modifications will also focus on improving the captureof data related to clinical care and management. The section of thedatabase containing information on individual patients registeredwith thalassemia centers and support groups has the potential toexpand into a clinical research base, with suitable expansion andcustomization to accommodate follow-up information on patientsby recording their response to treatment regimens and relevantlaboratory data monitoring clinical events. With the availabilityof such a resource, which merges data from different centers,systematic multicenter studies could be undertaken on a cohort ofpatients for research and evaluation, with the potential of greatlyimproving the clinical management of these disorders.


It is envisaged that the attributes of the database discussedabove and described in earlier sections, together with its capacityto expand and support a network of submitters from variouscenters, are likely to encourage and promote its growth among theintended user-base.

Conclusion

We have outlined the formulation and construction of aWeb-based bioinformatics resource for b-thalassemia and hemo-globinopathies, inherited hemoglobin disorder of increasingpublic health significance in India, and many other low- andmiddle-income countries of the world [Weatherall, 2010; WHO,2006]. Our goal was to create a common format for the collectionof thalassemia-related data, thereby enabling interoperabilitybetween different centers across the country and leading to theformation of an information resource with the capacity to supporta nationwide network. The LOVD-based ThalInd can easily beinstalled on any institutional server in India and entire datasetscan safely be uploaded in their existing format. It is anticipatedthat in the future more detailed analyses of clinical andtherapeutic data, statistical summaries of genotpyes and phenop-types, and dynamic generation of allele frequencies will need to beaccommodated.

Acknowledgments

This study was made possible through an Endeavour Executive Award by

the Commonwealth Government of Australia to Sujata Sinha, Thalassemia

Working Group, Varanasi, India, and the generous financial contribution

provided by the Western Australian State Government in the establishment

of the WA Centre of Excellence for Comparative Genomics and support

of this project. The authors gratefully acknowledge the provision

of anonymized unpublished patient data by Professor Rajiva Raman,

Coordinator, Centre for Genetic Disorders, Banaras Hindu University,

Varanasi, India, and Research Coordinator, Thalassemia Working Group,

Varanasi.

References

Agarwal MB. 2005. The burden of hemoglobinopathies in India—time to wake up?

J Assoc Physician India 53:1017–1018.

Black ML, Sinha S, Agarwal S, Colah R, Das R, Bellgard M, Bittles AH. 2010.

A descriptive profile of b-thalassemia mutations in India, Pakistan and Sri Lanka.

J Commun Genet 1:149–157.

Cotton RGH. 2009. Toward the ideal mutation update and locus specific database for

disease genes. Hum Mutat 30:v.

Fokkema IFAC, den Dunnen JT, Taschner PEM. 2005. LOVD: easy creation of a locus

specific sequence variation database using an ‘‘LSDB-in-a-Box’’ approach.

Hum Mutat 26:63–68.

Giardine B, Van Baal S, Kaimakis P, Riemer C, Miller W, Samara M, Kollia P,

Anagnou NP,Chui DH, Wajcman H, Hardison RC, Patrinos GP. 2007. HbVar

database of human hemoglobin variants and thalassemia mutations: 2007

update. Hum Mutat 28:206.

IUSSTF. 2007. Genetic disorders: hemoglobinopathies Indo–US science & technology

forum, New Delhi, India. P40. Annual Report 2006–2007.

Patrinos GP. 2006. National and ethnic mutation databases: recording populations’

genography. Hum Mutat 27:879–887.

Patrinos GP, Giardine B, Riemer C, Miller W, Chui DHK, Anagnou NP, Wajcman H,

Hardison RC. 2004. Improvements in the HbVar database of human

hemoglobin variants and thalassemia mutations for population and sequence

variation studies. Nucleic Acids Res 32:D537–D541.

Petrou M. 2010. Screening of beta thalassaemia. Ind J Hum Genet 16:1–5.

Sinha S, Black ML, Agarwal S, Colah R, Das R, Ryan K, Bellgard M, Bittles AH. 2009.

Profiling b-thalassaemia mutations in India at state and regional levels:

implications for genetic education, screening and counselling programmes.

HUGO J 3:51–62.

Weatherall DJ. 2001. Phenotype–genotype relationships in monogenic disease:

lessons from the thalassaemias. Nat Rev Genet 2:245–255.

Weatherall DJ. 2010. The inherited diseases of hemoglobin are an emerging global

health burden. Blood 115:4331–4336.

WHO. 2006. Thalassemia and other haemoglobinopathies. World Health Organization

Resolutions, May 2006, EB118.R1 and WHA59.20.


Documents

ThalInd, a β-thalassemia and hemoglobinopathies database for India: defining a model country-specific and disease-centric bioinformatics resource