Networked Biodiversity Data and Credibility: Citizen Science and Occurrence Data in CalFlora
Nancy Van House
SIMS, UC Berkeley
www.sims.berkeley.edu/~vanhouse

Argument
  • Networked info >> ready access to unpublished information
      • Information from outside own epistemic community
      • Accessed by people from outside own epistemic community
  • Issues of trust and credibility
      • Of info
      • Of sources
      • Of users
  • This paper: empirical study of a user-designed, state-level biodiversity digital library
      • How has a consortium of users and producers of info addressed problems of networked data?
      • Practical development of knowledge spaces

Biodiversity Data
  • Broad range of datasets: biological, geographical, meteorological, geological…
  • Many, varied data producers and users
  • Created, used for different purposes
  • Large quantities of data that vary in
      • Precision and accuracy
      • Methods of data collection, description, storage
  • Politically, economically sensitive data
  • Old data particularly valuable
      • Change over time
      • How things used to be before…

Biodiversity Data, Epistemic Communities, Knowledge Spaces
  • Boundary crossings
      • Scientific specialties
      • Planners, governmental agency decision-makers
      • Plant enthusiasts
      • Resource-extraction industries
      • Environmental activists
  • Boundary objects (Star)
      • Users
      • Designers
      • Managers
      • Technologists

Botanical Occurrence Data

  • Specimen in hand (herbarium)
  • Report of sighting, no specimen
  • Literature – published, scientific
  • Literature – e.g. flora: ‘this species occurs in Marin County…’
  • List – e.g., all species observed on the Bootjack Trail in Tamalpais State Park, Nov. 4, 2002, by Joe Smith
  • List – “Common plants of Tamalpais State Park”

Observers in the Field as Source of Fine-Grained Data
  • How far north does California Sun Cup grow?
  • Is Desert Sand Verbena still growing in Los Angeles county?
      (The last observation was recorded in 1935)
  • Does Five-finger Fern grow in Ventura county?
      (There are no direct observations, but it has been reported from surrounding counties)
  • Does anyone know about that new patch of invasive Artichoke Thistle growing on my local hillside?
      (Early alerts to new infestations while they are small & are easier to eradicate than well established ones)
  • What local biodiversity will be lost if the city decides to allow that new housing development on the edge of town?
      (You can record the plants that are growing there now as a record for history)

Sources of Data
  • Academic researchers
  • Professionals in government agencies, environmental organizations
      • Park rangers, forest service botanists, professionals in environmental groups…
  • Consultants
      • Much government research/planning done by consultants
      • Land developers, resource extraction industries hire them to prepare environmental impact documents
  • Native plant enthusiasts
      • “Expert amateurs” -- “citizen science”
      • California Native Plant Society
  • People belong to multiple communities; roles overlap

Risks
  • Unreliable info
      • Erroneous info
          • Undetected duplication > belief that a species is prevalent >> not preserving a population of a rare species
          • Chasing after erroneous reported sighting of a rare species
          • Confusing naturally-occurring and cultivated populations
      • Accurate but not credible info
          • Discounting significant sighting as amateur’s error
  • Inappropriate use of info
      • Private landowners destroying specimens of a rare plant to avoid legal limits on land development
      • Collectors (over-)collecting specimens of rare or valuable species
          • Cacti, orchids, floristic materials, mushrooms
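
The duplication risk listed above can be illustrated with a small sketch. It assumes, purely for illustration, that each report carries a species name, an observation date, coordinates, and a source label; none of these field names come from CalFlora. Reports that agree on species and date and fall in the same coarse grid cell are flagged as possible duplicates for a person to review. This is a heuristic only: it cannot decide whether two reports really describe the same plants.

```python
from collections import defaultdict
from typing import NamedTuple


class Report(NamedTuple):
    # Hypothetical fields for illustration; not CalFlora's schema.
    species: str
    date: str       # ISO date string, e.g. "2002-11-04"
    lat: float
    lon: float
    source: str


def possible_duplicates(reports, decimals=2):
    """Group reports that share species, date, and a coarse grid cell.

    decimals controls the grid size (2 decimals is roughly a kilometre);
    the value is an assumption, not a recommendation.
    """
    groups = defaultdict(list)
    for r in reports:
        key = (r.species.lower(), r.date,
               round(r.lat, decimals), round(r.lon, decimals))
        groups[key].append(r)
    # Only groups with more than one report are worth a human look.
    return [g for g in groups.values() if len(g) > 1]
```

Two nearby reports can still land in different cells if they straddle a grid boundary, so a result of "no duplicates" from this sketch is not evidence that none exist.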

CalFlora

  • http://www.calflora.org
  • Comprehensive web-accessible database of plant distribution information for California
  • Independent non-profit organization
  • Designed/managed by people from botanical community, not librarians or technologists
  • Contributors and users: a coalition of public and private organizations and individuals
  • To exist, has to be responsive to users’ needs, concerns, practices and negotiate differences across epistemic and organizational boundaries
  • In conjunction with UC Berkeley Digital Library (http://elib.cs.berkeley.edu)

CalFlora Priorities

Focus on people; put technology in the back seat

Pay attention to how the world works for the people who produce and use information

Honor existing traditions of data exchange

CalFlora Components

Occurrence Database

Synonyms Database

Photos Database

CalFlora Occurrence Database
  • > 850,000 geo-referenced reports of observations
      • Specimens in collections
      • Reports from literature
      • Reports from field
      • Checklists
  • Sources
      • 19 institutions
      • Recently began accepting reports from registered contributors via Internet

Changing Emphases in Occurrence Data

  • Existing data - emphasis on
      • Unusual taxa
      • Interesting locations, or where observers happened to be
  • Surprisingly small #s
      • Most Calif species distributions based on <100 obs
  • Data collection methods
      • Some emphasize rare taxa, underestimate common
      • Some emphasize common taxa, underestimate rare
  • New emphasis on common plants
      • Preserving species requires preserving community
      • Better understanding of current distributions
      • Baseline data for future developments

CalFlora Occurrence Database: Significance

  • Most comprehensive source by far (for Calif)
  • Data from many sources ‘synoptically present’
  • Adding data from the public:
      • “When you have 5 million little trail lists for the whole state of Calif…all of a sudden you have a real density of observations [that] would be meaningful.”
  • Common as well as rare taxa
  • Reasonably easy to use
  • Data downloadable, manipulable
  • Updated quickly
  • Remote access via Internet

CalFlora Tensions
  • Dangers, benefits of info about rare taxa
      • Controversy over photos, location info – benefits outweigh dangers?
  • Data Quality
      • Accuracy, (undetectable) duplication
      • Inclusiveness of observations vs. selectivity, quality
  • Trusting users
      • Benefits vs. dangers of wide access to information
      • Users’ abilities to understand info, use appropriately vs. guidance from CalFlora, e.g. re quality

Tensions, cont.
  • Quality, precision of mapping
      • County level too gross; Not too specific for rarities
  • Who bears the cost?
      • If free, no one has incentive to support it
      • Fee may discourage frivolous use
      • State: if they charge for their data, even $1, they can deny people access
  • Archiving
      • Deletion of modalities?
          • Track data back to source, definitions, conditions of collection
      • Stability of electronic media
      • Stability of independent organization

Tensions, Cont.

  • Between technologists and information creators and users
      • Techies not understanding the complex social, organizational, epistemic issues around creation, maintenance, curation, use of digital libraries
      • Discussed elsewhere – http://www.sims.berkeley.edu/~vanhouse/p84vanhouse.pdf

Assessing Trustability of Data from Expert Amateurs

How (Some) Experts Assess Occurrence Reports

  • The evidence:
      • Type of report (specimen, field observation, list)
      • Type of search (casual, directed)
  • The source:
      • Personal knowledge of contributor’s expertise
      • Examination of other contributions, same person
      • Annotations by trusted others
  • Ancillary conditions:
      • Likelihood of that species appearing at that time, habitat, geographical location
      • Other, similar reports

Current Practice

  • Know the individuals:
      • “If they are active in CNPS [Calif Native Plant Society], the people in CNPS know each another…That’s where you get to that really personal level of quality control and assurance and data reliability.”
      • “We have a collection of the usual suspects.”
      • “When I started my job I went to lots of meetings but I know everybody now.”
  • Review the observations one by one
      • “That’s why we have a fairly large concern about any sort of automated library like CalFlora. No one is looking at those kind of things.”

How CalFlora Presents Occurrence Data

  • Links to data source(s) – personal and institutional
  • Compliance with institutional source’s requirements
      • Fuzzed locations
      • Links to institutional source’s caveats, explanations
  • Publicly-contributed observations
      • Info about observer
      • Info about observation
      • Annotations by experts
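
"Fuzzed locations" means publishing a less precise location than the one recorded. The slides do not say how CalFlora or its institutional sources do this, so the following is only a sketch of one common approach: snapping a coordinate to the centre of a grid cell. The function name and the cell size are assumptions made for illustration.

```python
import math


def fuzz_location(lat, lon, cell_deg=0.1):
    """Return the centre of the grid cell that contains (lat, lon).

    cell_deg is a hypothetical cell size in degrees; a real system would
    pick it according to the data source's requirements.
    """
    def snap(value):
        # floor() picks the cell; +0.5 moves to the middle of that cell.
        return (math.floor(value / cell_deg) + 0.5) * cell_deg

    # Round away floating-point noise so equal cells compare equal.
    return (round(snap(lat), 6), round(snap(lon), 6))
```

Every observation inside a cell is published at the same point, so a reader can tell roughly where a rare plant grows without being led to the exact spot.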

Data from the public -- How to identify ‘expert amateurs’?
  • May be expert in
      • Particular place
          • Know the common flora
          • Know when something unusual shows up (not necessarily what it is)
      • Particular taxon
          • Know this taxon and its species and subspecies (but not necessarily others)
      • Wide range of common taxa
          • But not unusual ones

Contributor Registration

  • Biography, credentials (free text)
  • Expertise/interests (free text)
  • Affiliation
  • Contact info/web site
  • Vows
      • “I will submit only my own observations of wild plants. I realize that this system is only for first-hand reports about plants, native and introduced, that are growing without deliberate planting or cultivation.”
      • “I will…make sure I have the correct scientific name…I will submit uncertain identifications only if I believe them to be very important and time sensitive, and will label such reports ‘uncertain.’”

Contributor Registration (cont)
  • Experience level (self-assessment)
      • I am a professional biologist/botanist, or have professional training in botany.
      • Although I do not have formal credentials, I am recognized as a peer by professional botanists.
      • Although I do not consider myself to have professional-level knowledge, I am quite experienced in the use of keys and descriptions, and/or have expertise with the plants for which I will be submitting observations.
      • I do not have extensive experience or background in botany, but I am confident that I can accurately identify the plants for which I will be submitting observations.

Occurrence Report

  • Species identification, habitat, location, date
  • Method of identification
      • “I recognize …from prior determinations and experience”
      • “I compared this plant with herbarium specimens”
      • “I keyed this plant in a botanical reference”
      • “I compared … with published taxonomic descriptions”
      • “An expert reviewed and confirmed this identification”
  • Certainty of identification
      • “I am confident of this identification, and submit this as a positive observation.”
      • “I am not certain of this identification but believe it to be a significant observation and submit it here as an alert only.”
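
The registration and report fields listed above could be held in records along the following lines. This is a sketch, not CalFlora's schema: the class and field names are invented, the enum values paraphrase the slide wording, and `name` and the vow check are assumptions added so the example is complete.

```python
from dataclasses import dataclass
from enum import Enum


class ExperienceLevel(Enum):
    PROFESSIONAL = "professional biologist/botanist or professionally trained"
    PEER_RECOGNIZED = "no formal credentials, recognized as a peer"
    EXPERIENCED = "experienced with keys, descriptions, or these plants"
    CONFIDENT = "limited background, confident about these plants"


class IdMethod(Enum):
    PRIOR_EXPERIENCE = "recognized from prior determinations and experience"
    HERBARIUM_COMPARISON = "compared with herbarium specimens"
    KEYED = "keyed in a botanical reference"
    PUBLISHED_DESCRIPTION = "compared with published taxonomic descriptions"
    EXPERT_CONFIRMED = "reviewed and confirmed by an expert"


class Certainty(Enum):
    POSITIVE = "positive observation"
    ALERT_ONLY = "uncertain, submitted as an alert only"


@dataclass
class Contributor:
    name: str                    # assumed field, not listed on the slide
    biography: str               # free text
    expertise: str               # free text
    affiliation: str
    contact: str
    experience: ExperienceLevel  # self-assessed
    accepted_vows: bool = False


@dataclass
class OccurrenceReport:
    contributor: Contributor     # keeps the link back to the observer
    species: str
    habitat: str
    location: str
    date: str
    method: IdMethod
    certainty: Certainty

    def __post_init__(self):
        # Assumption: a report is only accepted once the vows are taken.
        if not self.contributor.accepted_vows:
            raise ValueError("contributor has not accepted the vows")
```

Keeping the contributor inside each report, rather than only an identifier-free summary, mirrors the point made elsewhere in the talk that the link to the observer should not be lost.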

Observation Contribution Process

  • Data entered
  • Photo appears (if available) – i.e., “Are you sure?”
  • If new county record, notice appears
      • I.e., “Are you sure?”
      • Lists who will be notified – record likely to be reviewed
  • If new county record, notice sent to county agricultural officials
  • If listed as rare species, notice sent to appropriate state agency
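
The notification rules in this process can be sketched as a small function. The inputs are hypothetical stand-ins for lookups the slides do not describe (which species are already recorded in a county, and which are listed as rare), and the message strings are invented.

```python
def notices_for(species, county, known_county_records, rare_species):
    """Return the notices a newly entered report would trigger, in order.

    known_county_records: set of (species, county) pairs already on file.
    rare_species: set of species names listed as rare.
    Both are assumptions standing in for real database lookups.
    """
    notices = []
    if (species, county) not in known_county_records:
        # Shown to the contributor as an "Are you sure?" prompt.
        notices.append("contributor: new county record, please confirm")
        # Sent on to the county once the report is submitted.
        notices.append("county agricultural officials: new county record")
    if species in rare_species:
        notices.append("state agency: report of a listed rare species")
    return notices
```

A report of a species already recorded in that county and not listed as rare returns an empty list, which matches the slide: only new county records and rare species trigger notices.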

Annotations

  • Herbarium practice: experts annotate records with corrections, comments
  • CalFlora: registered experts can annotate photos and occurrence records
  • Annotation by an expert raises the credibility of a record.
      • Actually – how often?
  • Annotation history viewable

Current Developments: CalFlora Meeting Tomorrow

Invited wide range of interested parties to come discuss future of CalFlora
  • Services
  • Funding

Seeking to create an engaged user group

Seeking to create a community around CalFlora

Attendees: many people no one seems to already know

Knowledge Spaces

“Knowledge is not simply local, it is located....It has place and creates a space….

“Knowledge spaces have a wide diversity of components: people, skills, local knowledge and equipment … linked by social strategies and technical devices …

“To move knowledge from the local site and moment of its production and application to other places and times, knowledge producers deploy a variety of social strategies and technical devices for creating the equivalences and connections between otherwise heterogeneous and isolated knowledges….

“Knowledge spaces acquire their … seemingly unchallengeable naturalness thru the suppression and denial of work involved in their construction.”

--David Turnbull, Masons, Tricksters, and Cartographers, p. 19-20

CalFlora and Local Knowledge

  • Not as opposed to scientific, but intimate, specific
  • In biodiversity:
      • Baseline data
      • Early warning of subtle changes
  • How to collect, report, evaluate?
  • CalFlora: retain the modalities
      • Retain link to observer, info about observer

CalFlora as a Knowledge Space

  • Links layers of data, knowledge
  • Allows user flexibility in moving local knowledge, combining, filtering different kinds of data, different sources, making linkages, equivalences
  • Seeks to preserve the work and multiple voices behind the data
  • Seeks to create a knowledge space, epistemic community by making linkages among CalFlora users and contributors
  • Moving from small-scale and personal to large-scale and impersonal
      • To large-scale and personal?

Conclusion

  • Trust as always a critical issue in knowledge
  • Networking as
      • Foregrounding taken-for-granted practices
      • Making new practices possible
      • Creating new knowledge spaces
      • Making linkages and equivalences across different kinds of knowledge
      • Empowering users to make own linkages, assessments for different purposes
  • Information systems as sociotechnical networks
      • Often invisible to the participants who see them as merely technical
  • Using concepts of knowledge spaces, epistemic cultures to help understand and contribute to system design and use