IRIDA: Canada’s federated platform for genomic epidemiology William Hsiao, Ph.D. [email protected] @wlhsiao BC Centre for Disease Control Public Health Laboratory and University of British Columbia

Page 2: Irida immemxi hsiao

IRIDA Platform Overview

• IRIDA = Integrated Rapid Infectious Disease Analysis

• A free, open-source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time disease outbreak investigations

Core Functions:

• Management of strain and genomic sequence data

• Rapid processing and analysis of genomic data

• Informative display of genomic results

• Sample, Case, and aggregate data (“metadata”) Management

Target audience:

• Public health agencies who need a platform to manage and process genomic data

• Public health agencies who need a platform to use genomics for outbreak investigations

[Diagram: IRIDA overview. Labelled components: Sequencing Instruments, Web Application, Data Management, Built-in Analytical Tools, External Galaxy, Command-line Tools]

Page 3: Irida immemxi hsiao

10 simple rules (wish list) to build a better public health microbiology genomic epidemiology analysis system

Download: latest version at https://github.com/phac-nml/irida

Page 4: Irida immemxi hsiao

1: Engage the Users Through the Entire Software Development Cycle

- Project Team has direct access to state of the art research in academia

- Project Team is directly embedded in user organization

Page 5: Irida immemxi hsiao

2: Have A Simple User Interface

[Screenshots: Line List View (under testing) and Timeline View (conceptualization). Visible labels: Selectable fields, Travel, Symptoms and Onset, Exposure Types, Hospitalization, Launch a pipeline, Be Like]

Page 6: Irida immemxi hsiao

3: Build a Robust, Extensible Platform

• IRIDA uses Galaxy to manage workflows

• Adding additional pipelines is relatively easy

• A standard API allows third-party tools to obtain data from IRIDA (e.g. IslandViewer and GenGIS)

[Diagram: IRIDA architecture. Labelled components: Servlet Container, REST API, Central File Storage, Web Interface, Application Logic, Compute Cluster, Galaxy]

http://www.pathogenomics.sfu.ca/islandviewer/

http://kiwi.cs.dal.ca/GenGIS/Main_Page

Page 7: Irida immemxi hsiao

4: Have Extensive Documentation

• Documentation should be available for:

- Users – step-by-step tutorial with screenshots / FAQ

- System Administrators – installation instructions / issue trackers

- Developers – open source, collaborative development / IRC channel

• Easily Accessible at https://irida.corefacility.ca/documentation/

Page 8: Irida immemxi hsiao

5: Implement QC Throughout the Whole Application

• Genomics is sensitive and sequence data are inherently noisy

• Genomics is a rapidly advancing technology

- Standardizing pipelines is difficult and can stifle innovation

- Better to standardize the performance and reporting metrics and ensure any validated pipelines meet the testing criteria

- Developing a general QC testing module (RCQC) that uses ontology to standardize QC metrics (https://github.com/Public-Health-Bioinformatics/rcqc)

• Data provenance and version control (data + pipelines) are musts for diagnostic labs
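The idea of standardizing metrics rather than pipelines can be sketched roughly as follows. This is a minimal illustration only, not the RCQC module: the metric identifiers (`EXAMPLE:...`), the thresholds, and the `QCRule` and `evaluate` names are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCRule:
    metric_id: str  # hypothetical shared identifier for the metric
    minimum: float  # smallest acceptable value (illustrative, not a recommendation)


# Hypothetical acceptance criteria shared by every pipeline under comparison.
RULES = [
    QCRule("EXAMPLE:mean_coverage", 30.0),
    QCRule("EXAMPLE:fraction_genome_covered", 0.95),
]


def evaluate(report: dict[str, float], rules: list[QCRule]) -> dict[str, str]:
    """Return PASS, FAIL, or MISSING for every rule, keyed by metric ID."""
    results = {}
    for rule in rules:
        value = report.get(rule.metric_id)
        if value is None:
            results[rule.metric_id] = "MISSING"
        elif value >= rule.minimum:
            results[rule.metric_id] = "PASS"
        else:
            results[rule.metric_id] = "FAIL"
    return results


# Two made-up reports from different pipelines, using the same metric IDs.
pipeline_a = {"EXAMPLE:mean_coverage": 42.0, "EXAMPLE:fraction_genome_covered": 0.98}
pipeline_b = {"EXAMPLE:mean_coverage": 18.5}

print(evaluate(pipeline_a, RULES))
print(evaluate(pipeline_b, RULES))
```

Because both reports use the same metric identifiers, any pipeline can be checked against the same criteria without prescribing how the pipeline itself is built.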

Page 9: Irida immemxi hsiao

6: Build to Enable Collaboration

• Be able to compare pipelines

- Pipelines implemented using Galaxy – transparent and shareable

- Define QC criteria using ontology to compare the different pipelines of the same purpose

• Be able to share data in standard formats to minimize data re-entry from one platform to another

• Federation of platforms using standard API to share data and analysis results

Page 10: Irida immemxi hsiao

7: Use Compatible Data Standards

• Sequence data are more compatible / shareable, but metadata are currently siloed and incompatible

• Collaboration and Sharing are difficult when data are incompatible

• Compatibility != Sameness

• Use ontology to allow customization of term lists, but all terms with the same meaning (semantics) should have the same universal ID (e.g. a URL) to facilitate mapping of terms
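A small sketch of "compatibility != sameness": two labs keep their own term lists, and terms are mapped through a shared ID. The URLs, lab vocabularies, and the `translate` helper are hypothetical placeholders, not terms from any published ontology.

```python
# Hypothetical universal IDs; real ones would come from a published ontology.
FEVER = "http://example.org/ontology/TERM_0001"
DIARRHEA = "http://example.org/ontology/TERM_0002"

# Each lab customizes its own labels but points them at the shared IDs.
LAB_A_TERMS = {"Fever": FEVER, "Diarrhoea": DIARRHEA}
LAB_B_TERMS = {"pyrexia": FEVER, "diarrhea": DIARRHEA}


def translate(label, source_terms, target_terms):
    """Translate a label between two local vocabularies via the shared ID.

    Raises KeyError if the label is unknown to the source vocabulary and
    returns None if the target vocabulary has no label for that ID.
    """
    term_id = source_terms[label]
    for target_label, target_id in target_terms.items():
        if target_id == term_id:
            return target_label
    return None


print(translate("Fever", LAB_A_TERMS, LAB_B_TERMS))  # prints "pyrexia"
```

Neither lab has to adopt the other's wording; only the identifier behind each term needs to agree.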

Page 11: Irida immemxi hsiao

8: Implement Fine Grained Access Control

[Screenshots: Detailed View and Restricted View]

E.g. User role permissions control visibility and editing of content

Authorization

• Industry-standard authentication and authorization mechanisms

• Local authorization per instance.

• Method-level authorization.

• Object-level authorization.
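The difference between method-level and object-level authorization can be illustrated with a short sketch. This is not IRIDA's implementation: the `User` and `Project` classes, the `analyst` role, and the `requires_role` decorator are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


@dataclass
class Project:
    name: str
    members: set = field(default_factory=set)
    samples: list = field(default_factory=list)


def requires_role(role):
    """Method-level check: the caller must hold `role` to invoke the function."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise PermissionError(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires_role("analyst")
def list_samples(user, project):
    # Object-level check: the role alone is not enough; the user must also
    # be a member of this particular project.
    if user.name not in project.members:
        raise PermissionError(f"{user.name} is not a member of {project.name}")
    return list(project.samples)


alice = User("alice", {"analyst"})
bob = User("bob", {"analyst"})
carol = User("carol")
project = Project("outbreak-demo", members={"alice"}, samples=["sample-1", "sample-2"])

for user in (alice, bob, carol):
    try:
        print(user.name, "->", list_samples(user, project))
    except PermissionError as err:
        print(user.name, "-> denied:", err)
```

Here carol is stopped at the method level (no role), bob at the object level (right role, wrong project), and only alice sees the samples.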

Page 12: Irida immemxi hsiao

9: Use Technology to Safeguard Patient Privacy

It’s easy to lose control of the Excel Line List - someone can make a copy of the content and pass it around without your knowledge; typos are common and cumulative!

Technology can control who sees what and when

Separate out sensitive patient data from pathogen sequence data but be able to bring them together when necessary without resorting to emailing of line lists!
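One way to picture this separation is two stores keyed by the same sample ID, joined only for authorized viewers. The sketch below is illustrative: the store names, field names, records, and the `line_list_row` function are hypothetical, and a real system would use access-controlled databases rather than in-memory dictionaries.

```python
# Pathogen sequence metadata, stored separately from anything about the patient.
SEQUENCE_RECORDS = {
    "sample-1": {"organism": "example organism", "run": "run-001"},
}

# Sensitive patient data, held in its own store and linked only by sample ID.
PATIENT_RECORDS = {
    "sample-1": {"age_group": "30-39", "region": "Region A"},
}


def line_list_row(sample_id, can_view_patient_data):
    """Build one line-list row, adding patient fields only when permitted."""
    row = {"sample_id": sample_id, **SEQUENCE_RECORDS[sample_id]}
    if can_view_patient_data:
        row.update(PATIENT_RECORDS.get(sample_id, {}))
    return row


print(line_list_row("sample-1", can_view_patient_data=False))  # restricted view
print(line_list_row("sample-1", can_view_patient_data=True))   # detailed view
```

The combined row is assembled on demand for a permitted viewer, so no merged line list needs to be copied or emailed.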

Page 13: Irida immemxi hsiao

10: Have Multiple, Flexible Access Options

• No one-size-fits-all solution; having many platforms to choose from is a good thing (but data should be portable across platforms!)

• IRIDA is available in several different flavours:

Local Install

- Advantages: Full control of the system; your data never leave your centre

- Disadvantages: Computing infrastructure and IT support needed to maintain the resource

Virtual Machine

- Advantages: Full control of the system; easy to set up

- Disadvantages: Not really scalable if run on your own desktop; some performance loss

Cloud Instance

- Advantages: Full control of the system; does not require local computing infrastructure

- Disadvantages: Data go into a cloud environment; uploading to a cloud environment can be slow

Public Version

- Advantages: No setup required; upload your data and have it processed using Compute Canada resources

- Disadvantages: Data go into a public instance (data remain private to your account); upload can be slow

Page 14: Irida immemxi hsiao

Acknowledgements

Project Leaders: Fiona Brinkman – SFU; Will Hsiao – PHMRL; Gary Van Domselaar – NML

University of Lisbon: João Carriço

National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer

Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall

Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo

BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella

University of Maryland: Lynn Schriml

Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert

Dalhousie University: Rob Beiko, Alex Keddy

McMaster University: Andrew McArthur, Daim Sardar

European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid

European Food Safety Agency: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina

Page 15: Irida immemxi hsiao

IRIDA Annual General Meeting

Winnipeg, April 8-9, 2015