Upload
dinhkiet
View
216
Download
2
Embed Size (px)
Citation preview
1
Digital Library Development in the Asia Pacific
Digital Library Future Research
Case Study in International Disease Surveillance
Hsinchun Chen, Ph.D.
美國亞歷桑那大學, 陳炘鈞 博士
McClelland Professor,
Director, Artificial Intelligence Lab
NSF COPLINK Center
Dept. of Management Information Systems
University of Arizona
Acknowledgement:
*NSF DLI1, DLI2, NSDL
*NIH, NLM, NCI
*NSF DG, DOJ, DOD,
DHS
2
Outline
• DL After the First Decade – Asian Digital Library Development
– NSF Cyberinfrastructure Program and NSF Chatham Digital Library Workshop
• Case Study: International Disease Surveillance – BioPortal (from FMD to Avian Influenza)
3
Introduction
• Digital libraries represent a form of information technology in which social impact matters as much as technological advancement.
• Over the past decade the development of digital library activities has been steadily increasing.
• International conferences in digital library have proliferated from their roots of ACM and IEEE Digital Conferences (and then the Joint Conference on Digital Libraries, JCDL) to the European version of ECDL (European Conference on Digital Libraries) and the Asian version of ICADL (International Conference of Asian Digital Libraries).
4
DLI Program Implementation History
Digital Libraries Initiative – Phase 1 (1994-98) Sponsors: NSF, DARPA, NASA
76 Proposals, 6 Awards, $25M total
Digital Libraries Initiative – Phase 2 (1999-03) Sponsors: NSF, DARPA, NASA, NIH/NLM, NEH
Partners: IMLS, Smithsonian Institution, NARA
~300 Proposals, 34 Awards, $48M total
International Digital Libraries Cooperative Research Initiative (1999-03) Sponsors: NSF, JISC, DFG
~60 Proposals, 16 Awards (6 with JISC, 4 with DFG), $6M NSF total
Int’l DL Cooperative Research Initiative and Applications (2002- ) Sponsors: NSF, JISC, DFG
~50 Proposals, Awards in process (4 with JISC, 2 with DFG)
5
Select Digital Library Development Milestones
1994 NSF Digital Library Initiative Phase 1 (DLI-1)
The First Annual Conference on the Theory and Practice of Digital Libraries, College Station,
Texas
1995 First IEEE Advances in Digital Libraries Conference, McClean, Virginia
1996 First ACM Conference on Digital Libraries, Bethesda, Maryland
1997 First European Conference on Research and Advanced Technology for Digital Libraries (ECDL),
Pisa, Italy
1998 First International Conference on Asian Digital Libraries (ICADL 1998), Hong Kong, China
1999 President’s Information Technology Advisory Committee (PITAC) Report
NSF Digital Library Initiative Phase 2 (DLI-2)
NSF National Science, Mathematics, Engineering, and Technology Digital Library (NSDL) Program
ICADL 1999, Taipei, Taiwan
2000 ICADL 2000, Seoul, Korea
2001 ICADL 2001, Bangalore, India
First ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2001), Roanoke, Virginia
2002 ICADL 2002, Singapore
2003 ICADL 2003, Kuala Lumpur, Malaysia
2004 JCDL 2004, Tucson, Arizona
ICADL 2004, Shanghai, China
7
Overview of ICADL
• ICADL (International Conference of Asian Digital Libraries)
• Overview of ICADL – 80 participants in Hong Kong in 1998 (host: CS)
– 150+ participants in Taipei, Taiwan in 1999 (host: LIS)
– 300+ participants in Seoul, Korea in 2000 (host: CS)
– 600+ participants in Bangalore, India in 2001 (host: LIS)
– 400+ participants in Singapore in 2002 (host: LIS)
– 350+ participants in Kuala Lumpur, Malaysia in 2003 (host: NLM)
– 350+ participants in Shanghai, China in 2004 (host: LIS, CS)
• ICADL 2005, Bangkok, Thailand in December 2005
8
Summary on Participation of ICADL Conferences
ICADL
1998
ICADL
1999
ICADL
2000
ICADL
2001
ICADL
2002
ICADL
2003
ICADL
2004
# of Papers 23 18 37 34 54 71+ 73
# of Papers from Asia 18 14 26 25+ 31+ 53 54
# of Countries (regions) 7 7 12 12 20 16 17
# of Countries from Asia 6 5 7 9 12 11 12
# of Institutions 17+ 14+ 31+ 33+ 55+ 54+ 61+
# of Institutions from
Asia
12+ 10+ 21+ 22+ 23+ 39+ 44+
# of Academic
Departments/Disciplines
6+ 6+ 8+ 8+ 11+ 17+ 18+
9
Increase of Papers Accepted in ICADL
0
10
20
30
40
50
60
70
80
ICADL
1998
ICADL
1999
ICADL
2000
ICADL
2001
ICADL
2002
ICADL
2003
ICADL
2004
# of Papers
# of Papers from Asia
10
Increase of Countries Represented in ICADL
0
5
10
15
20
25
ICADL
1998
ICADL
1999
ICADL
2000
ICADL
2001
ICADL
2002
ICADL
2003
ICADL
2004
# of Countries
# of Countries from
Asia
11
Topical Analysis in ICADL
• Digital library research is not restricted to only technical aspects; it involves social aspects as well.
• From a technological perspective, digital libraries are a set of electronic resources that are built to help create, search, and use information.
• From a sociological perspective, digital libraries are constructed by a community of users who use the system to better support their informational needs and applications. (Borgman, 1998)
12
Topical Analysis: Social Aspect
• Multicultural Issues
– In Asian digital library applications, there are
countless scenarios that involve creating and
distributing locally produced information collections.
– e.g.
• INFLIBNT project aimed at creating a digital library of theses
and dissertations from India (Vijayakumar and Murthy, 2001).
• The Tsinghua University Architecture Digital Library
developed a prototype system to provide rich, valuable
resources for traditional Chinese architecture research and
education (Xing et al., 2002).
13
Topical Analysis: Social Aspect
• Asian Languages and Cross-lingual Issues
– A crucial feature of Asian digital libraries is the ability to work in various local languages.
– Chinese, Japanese, Korean, Indian, Malaysian, and Thai language processing techniques have been reported.
– e.g. • Wong and Li (1998) and Yang et al. (1998) both studied
Chinese information retrieval and discussed issues related to Chinese language indexing techniques.
• Theeramunkong et al. (2002) investigated using n-gram and HMM approaches for Thai OCR application.
14
Topical Analysis: Social Aspect
• Asian Languages and Cross-lingual Issues (cont’d)
– Cross-lingual information retrieval between English
and Asian languages has been more widely studied in
ICADL conferences than in other western digital
library conferences.
– e.g.
• Qin et al. (2003) presented an English-Chinese cross-lingual
Web retrieval system in the business domain.
• Sugimoto (2001) presented a multilingual document browsing
tool and its metadata creation carried out at ULIS.
15
Samples of Significant Digital Library
Research in Asia Pacific: Capturing
Cultural Heritage and Indigenous
Knowledge
16 http://www.iidl.net
• Focus
– To provide information on Islam and Muslims around the world
– To act as a referral centre to direct information enquiries on Islam to the appropriate sources
– To promote sharing and exchange of knowledge among scholars of Islam and those interested in it
– To enable the world to understand Islam better
• Partners – National Library of Malaysia
– Multimedia Development Corporation
– International Islamic
University Malaysia Library
International Islamic Digital Library – Malaysia
17 http://www.iidl.net
• Contents
– Books,
– Manuscripts
– Special collections,
– Theses and articles,
– Journals and conferences papers,
– Pictures, audios and videos
• Service
– Both in Arabic and English
– Category browse
– Browse search
– Keyword search
– Expert search
– Broadcast search
International Islamic Digital Library – Malaysia
18
• Focus
– To develop information processing tools to facilitate human machine interaction in Indian languages and multi-lingual knowledge resources.
– To support R&D efforts in the area of information processing in Indian Languages and to support research on knowledge tools: representation, integration, compression and learning methodologies.
– To consolidate technologies thus developed for Indian languages and integrate these to develop innovative user products and services.
http://www.tdil.mit.gov.in
Technology Development for Indian Languages – India
19
• Funding
– Ministry of Information Technology, India
• Partners
– Indian Institute of Technology, Kanpur Hindi, Nepali
– Indian Institute of Technology, Mumbai Marathi, Konkani
– Indian Institute of Technology, Guwahati Assamese,
Manipuri
http://www.tdil.mit.gov.in
Technology Development for Indian Languages – India
20
Technology Development for Indian Languages – India
• Contents
– Multi-lingual dictionaries,
– Thesauri,
– Educational software,
– Encyclopedia,
– Gyan-nidhi creative writing system,
– Translation support systems,
– OCR,
– Text-to-speech & speech recognition system,
– Pocket translator,
– Personal digital assistants,
– Reading machine for blinds & deaf,
– Portals,
– e-governance / e-commerce / e-skills.
http://www.tdil.mit.gov.in
21
• Focus
– Strengthen and protect the cultural tradition
and heritage
– Enhance the usage and sharing of
information resource
– Serve the national projects and related
researches
http://www.nlc.gov.cn
China Digital Library – China
22
• Funding – 10th “Five-year Project”
– Ministry of Culture, China
• Partners – National Library of China
– Tsinghua University
– Peking University
– China Academy of Science
– China Academy of Social Science
– etc. (more than 100 different types of libraries and partners)
http://www.nlc.gov.cn
China Digital Library – China
24
Research Directions for Digital Libraries
(The Next Decade):
- JCDL 2004, 2005, 2006
- NSF Cyberinfrastructure Program
- NSF Chatham Digital Library Workshop
25
Overview of JCDL 2004
• Joint IEEE-CS/ACM Conference on Digital
Libraries (JCDL), Tucson, AZ
• Theme: Global Reach and Diverse Impact.
– Co-Chairs:
• Hsinchun Chen, Howard Wactlar, Ching-chih Chen
26
Strong Participation in JCDL’04
JCDL 2004
summary
Participation
number Details
# of Papers 61 50 papers are from USA and 11 from other
countries. 200+ submissions.
# of Countries
12 Participating countries include: USA,
Canada, the Netherlands, Portugal, the
United Kingdom, Brazil, Australia, New
Zealand, China, Taiwan, Japan, etc.
# of Institutions 78 61 out of 78 are US institutions and 17 are
institutions from other countries.
# of Departments
/ Disciplines
19 CS, Information Science, Library, MIS, Arts,
Biostatics, Medicine, Religion…
# of Participants 450 Panels, paper sessions, tutorials, workshops
27
JCDL 2005 and 2006
• JCDL 2005, Denver, Colorado
• JCDL 2006, Charlotte, North Carolina
28
Social Networks,
Cyberinfrastructure (CI)
and Meta CI
Daniel E. Atkins
School of Information & Dept. of Elec. Engr. & Comp. Sci.
University of Michigan Ann Arbor
[email protected] November 2005
30
NSF Blue Ribbon Advisory Panel on
Cyberinfrastructure
“a new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology, and pulled by the expanding complexity, scope, and scale of today’s challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive “cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.”
32
NSF states intent to
“play a leadership role”
• NSF will play a leadership role in the development and support of a comprehensive cyberinfrastructure essential to 21st century advances in science and engineering research and education.
• NSF is the only agency within the U.S. government that funds research and education across all disciplines of science and engineering. ... Thus, it is strategically placed to leverage, coordinate and transition cyberinfrastructure advances in one field to all fields of research. – From NSF Cyberinfrastructure Vision for the 21st Century
Discovery
33
Cyberinfrastructure-enhanced
Knowledge Communities (Networks)
Outcomes: New Ideas, New Tools, Education &
Career Development, Outreach*
Attributes: Collaborative, Multidisciplinary,
Geographically Distributed, Inter-institutional*
Specific Cyber Environments:
collaboratories, grids, e-science community,
virtual teams, community portal, ...
Cyber-infrastructure Services
Equipment, Software, People, Institutions
Computation, Storage, Communication and
Interface Technologies
Bro
ader A
pplic
atio
n to
oth
er d
iscip
lines a
nd
types o
f activ
ity.
* From Cummings & Kiesler (2003) report on KDI Initiative: ultidisciplinary scientific
collaborations, see http://www.p2design.com/papers/kdi.pdf
34
Cyberinfrastructure
includes both
• Technology Infrastructure (creation and
provisioning) - middleware, portals, HPC,
hybrid (IP & lambda) networks, ....
• Social Infrastructure (competition &
cooperation, IP policies, incentive
structure, cost, etc.)
36
Transforming the Information Landscape:
Research Directions for Digital Libraries
Ronald L. Larsen
School of Information
Sciences
University of Pittsburgh
Report of the NSF Workshop on Research Directions for
Digital Libraries*
*Knowledge Lost in Information, NSF Award No. IIS-0331314
37
Workshop Background
• DLI over, DLI2 winding down
• What is next?
– Did we finish the job? Are we done?
• What have we learned?
– What constitutes DL research?
– Does it influence other disciplines?
• Should DL change from
“Initiative” to “Program”?
38
Emerging U.S. Vision for DL’s
Next generation digital libraries will be:
• A confluence of resources, technology and infrastructure
• An intersection of national priorities and scientific goals
• A common testbed for all computer and information science research (sub)disciplines
• Federated resources serving individuals, institutions and governments simultaneously
• A progression from information to knowledge
39
Users
• Cognitive Completion
– MS spell-checker for facts and knowledge
– Task and user context sensitive
• Do what I mean
– Find what I need
– Be aware of what I know
• Collaboration
– Identifying the collegial context
– Providing contextual guidance
• Managing personal libraries
– All that is seen and heard
– Personal memory assistant
40
National priorities influence IT research agenda
Advances in Science and Engineering
Economic Prosperity and
Vibrant Civil Society
National and Homeland Security
Information
Technology
Research
Digital Libraries form the Enabling Resources, Technology & Infrastructure
41
National priorities influence IT research agenda
• Advances in Science and Engineering – Advance the frontiers of science and engineering research and education
– Examples include those that collect, disseminate, and analyze observational or experimental data, or data from models or simulations
• Economic Prosperity and Vibrant Civil Society – Human and socio-technical aspects of current and future distributed
information
– Topics include business, work, health, government, learning, and community, and their related policy implications.
• National and Homeland Security – Robust Information Technology to protect critical infrastructures and support
the understanding of threats to national security
– Examples include collaborative knowledge environments, knowledge discovery, medical informatics, information extraction and fusion, cross-linguality, spoken language and imagery, social network analysis
42
Recommendations
• $20M / year for new U.S. research
– Search, context, extraction, ubiquity, productivity
• $40M /year for sustaining evolving resources in the U.S.
– Acquisition, access, usage, stewardship, management
• Coordinate with Advanced Cyberinfrastructure Program
43
DL Research After the First Decade –
Global Reach and Diverse Impact!
Can DL help with international security?
Can DL help save human life and
improve patient well-being?
45
BioPortal: Disease and Bioagent Information Sharing,
Surveillance, Analysis, and Visualization
• Research Team
University of Arizona
University of California, Davis
Kansas State University
Arizona Department of Public Health
University of Utah
New York State Department of Health/HRI
California Department of Health Services/PHFE
U.S. Geological Survey
The SIMI Group
• Acknowledgements: NSF, ITIC, DHS, DOD/AFMIC, IDIWC, AZDPS
46
Research Partners and Supports
• University of Arizona
• University of California, Davis
• Kansas State University
• University of Utah
• Arizona Department of Public Health
• New York State Department of Health/HRI
• California Department of Health Services/PHFE
• U.S. Geological Survey
• The SIMI Group
• NSF
• CIA/ITIC
• DHS
• DOD/AFMIC
• CDC
• AZDPS
47
Outline
• Project Background
• BioPortal V1.0 Achievements
– System Architecture
– System Functionalities
– BioPortal Collaboration Framework
• New Developments
– International Foot-and-mouth Disease Monitoring
– Syndromic Surveillance
– Livestock Health Surveillance
49
Background (I)
• In September, 2002, representatives of 18 different agencies, including DOD, DOE, DOJ, DHS, NIH/NLM, CDC, CIA, NSF, and NASA, were convened to discuss “disease surveillance.”
• An interagency working group called Disease Informatics Senior Coordinating Committee (DISCC) was established.
• DISCC established an Infectious Disease Informatics Working Committee (IDIWC) to survey the field and identify gaps.
• IDIWC developed “requirements” for a National Infectious Disease Informatics Infrastructure (NIDII).
50
Background (II)
• In June, 2003, IDIWC was charged with the task of developing one or more rapid prototype systems to demonstrate interoperability and innovation across species and jurisdictions.
• Botulism and West Nile virus were selected as diseases.
• States of New York and California were selected as partners.
• The University of Arizona was chosen as integrator and was provided with a supplement to an existing NSF grant.
51
BioPortal Project Goals
• Demonstrate and assess the technical feasibility and
scalability of an infectious disease information
sharing (across species and jurisdictions),
alerting, and analysis framework.
• Develop and assess advanced data mining and
visualization techniques for infectious disease data
analysis and predictive modeling.
• Identify important technical and policy-related
challenges in developing a national infectious
disease information infrastructure.
52
BioPortal V1.0 Accomplishments
• Prototype system design and development – Initial design and implementation of interoperable
messaging backbones
– Live prototype systems
– Preliminary user evaluation
• Information sharing – Data sharing agreements/memoranda of understanding
(MOUs) developed
– Many disease datasets integrated into the portal
• Analysis and visualization – Hotspot analysis research
– Spatial-Temporal Visualizer (STV)
53
Data Ingest Control
Module Cleansing / Normalization
New
Info
-Sh
ari
ng
In
fra
str
uctu
re
NYSDOH CADHS
XML/HL7
Network
PHINMS
Network
Adaptor Adaptor Adaptor
Portal Data Store
(MS SQL 2000)
SS
L/R
SA
SS
L/R
SA
Information Sharing Infrastructure Design
54
Data Access Infrastructure Design
Web Server (Tomcat 4.21 / Struts 1.2)
Data Store
(MS SQL 2000)
Da
ta S
tore
W
NV
-BO
T P
ort
al
Data Search
and Query
Spatial-
Temporal
Visual-
ization
HAN or
Personal
Alert
Management
Analysis /
Prediction
User Access Control API (Java)
Dataset
Privileges
Management
Browser (IE/Mozilla/…)
Public health
professionals,
researchers, policy
makers, law enforcement
agencies & other users
Access
Privilege
Def.
SSL connection
55
BioPortal Collaboration Framework
• A Memorandum of Understanding (MOU) is used to document the relationship between parties that will be sharing data: – Who the entities are and how they will act
independently and cooperatively
– What the mutual interests, benefits, and purposes of sharing data are
– How each party will maintain control over and share their resources, and what each party shall provide to the other (e.g., system accounts, portal access)
– Which types of data are to be shared (e.g., dead bird surveillance)
56
Datasets Integrated: WNV, BOT Index Dataset Test Data Available
Data Duration
(MM/YY) Data Size
Spatial
Granularity
Temporal
Granularity
1 [NY] WNV Human Yes Test Data 574 Zip Date
2 [NY] Dead Bird Yes Test Data 942 Lat/Long shifted
3 [NY] Mosquito Yes Test Data 815 County Date
4 [NY] WNV Captive Animal Yes Test Data 39 Zip Date
5 [NY] Botulism Human Yes Test Data 10 Zip Date
6 [CA] WNV Human Yes 09/03–10/03 186 County Date
7 [CA] Dead Bird Yes 01/03–10/03 3032 City/zip Minutes
8 [CA] Chicken Sera Yes 04/03–10/03 18887 Site Date
9 [CA] Mosquito Pool Yes 01/98–10/03 3518 Site Date
10 [CA] Botulism Yes 01/01–12/02 53 Zip Date
11 [USGS] EPIZOO - Preliminary Yes 07/99–09/03 46 events County Date
12 [USGS] EPIZOO – WNV Yes 08/1999-07/2004 113 events County Date
13 [USGS] EPIZOO - Botulism Yes 12/1989-12/2004 702 events County Date
14 [UC Davis] FMD Yes 1996 - 2003 3288 Site/Province Date/Month
15 International FMD Yes 01/1982-03/2005 6789 Province Non-temporal
16 BioWatch Yes 1/10- 1/17 2004 480 Exact Site Date
17 [CA] Mosquito Treatment Yes 1/14-11/30 2004 6194 Exact Site Date
18 National Infant Botulism Yes 1/16-11/25 2004 15 Zip Date
57
Communications/Messaging
• Scalable, flexible, light-weight, and extendible. Easy to include: – New diseases
– New jurisdictions
– New techniques!
• Messaging infrastructure – installed and tested – NYSDOH-UA: PHIN MS
– CADHS-UA: Regional message broker
– NWHC-UA: PHIN MS
• XML generation/conversion – NY_DeadBird, NY_Alerts, NY_BotHuman, NY_WNVHuman,
NY_CaptiveAnimal, NY_Mosquito
– CA_BotHuman, CA_WNVHuman, CA_DeadBird, CA_Chicken, CA_Mosquito
– USGS_Epizoo
58
BioPortal Research Framework
• BioPortal – Demo: Develop the system for
demonstration purposes using scrubbed data.
Refine system functionality and performance
based on user feedback.
• BioPortal – Operation: Develop the system for
production mode with real data and real users.
• BioPortal – Research: Continue to develop
advanced technologies and practical sharing
policies. Expand to new diseases and jurisdictions.
60
Spatio-Temporal Data Mining & Hotspot Analysis
• A hotspot is a condition indicating some form of clustering in a spatial and temporal distribution (Rogerson & Sun 2001; Theophilides et al. 2003;
Patil & Tailie 2004).
• For WNV, localized clusters of dead birds typically identify high-risk disease areas (Gotham et
al. 2001).
• Automatic detection of dead bird clusters using hotspot analysis can help predict disease outbreaks and aid in effective allocation of prevention/control resources.
61
Existing Hotspot Analysis Approach: SaTScan
• The spatial scan statistical techniques implemented in SaTScan are widely used to detect and evaluate disease outbreaks (Kulldorff 2001). – NYSDOH has used SaTScan to develop an early
warning system for WNV (Gotham et al. 2001).
• An important factor considered by spatial scan statistical analysis is the baseline. – The significance of the density of dead birds depends
on the historical distribution of bird deaths, human population, and so on.
62
Other Hotspot Analysis Approaches: CrimeStat and RSVC
• Hotspot analysis techniques applied to crime analysis: CrimeStat (Levine 2002).
• CrimeStat’s Risk-Adjusted Nearest Neighbor Hierarchical Clustering (RNNH): Uses a kernel density estimation obtained from baseline data to adjust the threshold that controls whether data points can be grouped together.
• Risk-Adjusted Support Vector Machine Clustering (RSVC): It combines the power and flexibility of support vector machine-based clustering and the risk adjustment idea of RNNH.
63
Case Study (NY WNV)
– On May 26, 2002, the first dead bird with
WNV was found in NY • Based on NY’s test dataset
March 5 May 26 July 2
baseline new cases
140 records 224 records
64
NY Dead bird 2002
RNNH
SaTScan #1
SaTScan #2
Analysis results from SaTScan and RNNH
high density
area
Zoom in
Baseline cases in zoomed-in area Baseline + new cases in zoomed-in area Hotspot analysis results
SaTScan
RNNH
Hotspots Zoom in
Close-up of the hotspots
SaTScan picks
large cluster
- 71 new
- 7 baseline
RNNH picks
small cluster
- 53 new
- 6 baseline
66
Hotspot Analysis Findings
• RSVC delivers similar recall levels and
higher precision than SaTScan.
• RNNH matches RSVC precision, but has
very low recall.
• RSVC significantly outperforms other
methods in the F-measure.
• Techniques could be complementary for
different hotspot analysis tasks.
67
Spatial-Temporal Visualization
• Integrates four visualization techniques – GIS View
– Periodic Pattern View
– Timeline View
– Central Time Slider
• Visualizes the events in multiple dimensions to identify hidden patterns – Spatial
– Temporal
– Hotspot analysis
– Phylogenetic (planned)
68
User Login
Choose WNV disease
data
User main page Available dataset list
Select CA dead bird,
chicken and NY dead
bird data
Select CA dead bird,
chicken and NY dead
bird data
Advanced Search criteria
Positive cases
Positive cases
Positive cases
Specify bird
species
Dataset name
Spatial / Temporal
Time range
County / State
Results listed in table Select background maps
Select NY / CA
population, river and
lakes
Start STV
69
Zoom in
NY
Timeline
Periodic
Pattern
GIS
Control
panel View all 3 year
data
Overall pattern
NY dead bird temporal
distribution pattern
1 year window
in 3 year span
Concentrated
in May / Jun
NY dead bird temporal
distribution pattern
Move time
slider, year 2 Similar time
pattern
NY dead bird temporal
distribution pattern
Move time
slider, year 3 Similar time
pattern
NY dead bird temporal
distribution pattern
Zoom in
Close Close
Year 2001
data
Spatial distribution
pattern
Spatial distribution
pattern
2 weeks
window
Spatial distribution
pattern
70
Move time
slider
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Spatial distribution
pattern
Season end
Dead bird cases
migrate from long island
Into upstate NY
Enable
population
map
Overlay population map
Dead bird cases
distribute along
populated areas near
Hudson river
71
BioPortal New Developments
• NSF Infectious Disease Informatics Grant (2004-
9)
• International Foot-and-Mouth Disease BioPortal
(2005-6); FMD Lab, UC Davis
• Human Syndromic Surveillance System; Arizona
State Department of Health (2005-6)
• Livestock Syndromic Surveillance System;
Kansas State University RSVP-A (2005-6)
72
New Research Directions
• Analytical Algorithms – Prospective hotspot analysis & auto baseline discovery
– Spatial-Temporal correlation analysis
– Dynamic Network Analysis
• Visualization – International FMD news visualization
– Phylogenetic Spatial-Temporal visualization
• Syndromic Surveillance – Syndromic surveillance system survey
– Emergency room chief complaint syndromic classification
– Livestock syndromic surveillance
73
Extended BioPortal Research Framework
• BioPortal – Demo
• BioPortal – Operation
• BioPortal – Research
• FMD – BioPortal: A dedicated instance of BioPortal customized for International Foot-and-Mouth disease monitoring. Additional functionalities such as gene sequence analysis and FMD News are added
• BioPortal – Syndromic Surveillance: A specialized BioPortal instance that processes chief complainants using a hybrid method of ontology and knowledge rules
• BioPortal – Livestock: A BioPortal instance devoted in Livestock syndromic surveillance case management and data analysis
75
Introduction
• Foot-and-mouth disease (FMD) is the top disease on the Office International des Epizooties (OIE) List A, which can infect all cloven-hoofed animals.
• FMD is the most contagious infectious diseases of livestock animals:
– Massive shedding of virus and contamination of the environment.
– Transmitted by direct or indirect contact (droplets), animate vectors (humans), inanimate vectors (vehicles
– Serologically diverse with seven distinct types (A, O, C, SAT1, SAT2, SAT3, Asia1), which makes diagnosis and vaccination problematic, and genetic diversity likely.
• Endemic in Africa, Asia, Middle East and South America
• Potential cost for U.S. outbreaks: >$10 billion
• Broader economic impact: trade and travel restrictions.
77
International FMD BioPortal Goals
• Real-time, web-based situational awareness of FMD
outbreaks worldwide through the establishment of an
international information sharing and analysis system
• FMDv characterization at the genomic level integrated
with associated epidemiological information and
modeling tools to forecast national, regional, and/or
international spread and the prospect of import into the
U.S. and the rest of North America
• Web-based crisis management of resources—
facilities, personnel, diagnostics, and therapeutics
78
Research Plans
• Global FMD epidemiological data
– (Near) real-time data collection
– Web-based information sharing and analysis
• International FMD news
– Indexed collection of global FMD news
– Search and visualization of the FMD news via the web
• FMD genetic/sequence data
– Predictive model using phylogenetic, spatial, and temporal
information to stop FMD at the boarder
– Visualization for FMD event in time, space, and genetic
space
79
Preliminary Global FMD Dataset
• Provider: UC Davis FMD Lab
• Information sources: reference labs and OIE
• Coverage: 28 countries globally
• Time span: May, 1905 – March, 2005
• Dataset size: 30,000+ records of which 6789 records are complete
• Host species: Cattle, Caprine, Ovine, Bovine, Swine, NK, Elephant, Buffalo, Sheep, Camelidae, Goat
Ovine
37%
Bovine
37%
Caprine
4%
Cattle
5%
Sheep
3% Camelidae
0%
Goats
0%
Swine
11%
Elephant
0%
Buffaloes
3%
Regionwise Distribution of FMD Data
South America
66%
Central and South
Asia
15%
Africa
1%
Middle East Asia
4%Europe
14%
81
International FMD News
• Provider: UC Davis FMD Lab
• Information sources: Google, Yahoo, and
open Internet sources
• Time span: Oct 4, 2004 – present (real-time
messaging under development)
• Data size: 460 events (6/21/05)
• Coverage: 51 countries (Africa:11, Asia:16,
Europe:12, Americas:12)
Africa
11% Aisa
1%
Asia
15%
Australia
14%
Europe
27%
America
27%
UNDEFINED
5%
84
FMD Genetic Information Analysis
• Genome clustering analysis
– Phylogenetic clustering
– Spatial clustering
– Temporal clustering
• Hotspot detection among gene sequences
– Create a tree structure based on semantic distance
between gene sequences.
– Automatically detect the dense portion of the tree.
– Identify the connection between the semantic cluster and
the geographic pattern of gene sequences.
85
FMD Genetic Visualization
• Goal: Extend STV to incorporate 3rd dimension, phylogenetic distance – Include a phylogenetic tree.
– Identify phylogenetic groups and color-code the isolate points on the map.
– Leverage available NCBI tools such as BLAST.
• Proof of concept: SAT 2 & 3 analysis – Data: 54 partial DNA sequence records in South
Africa received from UC Davis FMD Lab (Bastos,A.D. et al. 2000, 2003)
– Date range: 1978-1998
– Countries covered: South Africa, Zimbabwe, Zambia, Namibia, Botswana
87
Phylogenetic Trees
AY168791
AY168790
AY168797
AY168796
AY168795
AY258044
AY168794
AY258045
AY258048
AY168792
AY258052
AY258051
AY258050
AY258043
AY168800
AY168799
AY258047
AY168793
AY258046
AY168801
0.02
AY
168791
AY
168790
AY
168797
AY168796
AY16
8795
AY258044
AY168794
AY
2580
45
AY
258048
AY
1687
92
AY
258052A
Y25
8051AY258050
AY258043
AY
1688
00
AY168799
AY258047
AY168793
AY
258046
AY
168801
0.02
AY
168791
AY
168790
AY16
8797
AY168796
AY168795
AY258044
AY168794
AY258045AY258048A
Y168792
AY
2580
52
AY
258051
AY25
8050
AY258043
AY168800
AY168799
AY258047
AY168793
AY258046
AY
168801
0.02
88
Phylogenetic Tree of Sample FMD Data
Identify 6 groups within 2 major families (MEGA3; based on sequence similarity)
Group4
Group2
Group3
Group1
Group5
Group6
89
Genetic, Spatial, and Temporal Visualization of FMD
Data
Isolates’
locations color coded
Phylogenetic tree color coded
Isolates’
appearances in time
90
FMD Time Sequence Analysis
2nd family cases exist before 1993 and a comeback lately
First family cases appeared throughout
the period
Second family cases existed before 1993 and reappeared later after
1997
94
BioPortal: Future Work
• Complete open source, generic BioPortal
architecture and system
• Develop multi-lingual BioPortal
• Incorporate other diseases, e.g., avian
influenza, SARS
• Solicit partners and expand test sites
• Continue infectious disease informatics
research
97
“Library as the Place”
“Librarian in Context”
Role of a librarian in a digital world?
Challenges and opportunities facing the
library and information profession?
98
For more information:
Hsinchun Chen
AI Lab web site:
http://ai.arizona.edu