Databases (מאגרי מידע)


Databases: storage (אחסון) and retrieval (שליפה)

Different kinds of databases deal with biological information obtained by various means:

DNA / RNA
• DNA sequences (individual genes or complete genomes)
• cDNA
• ESTs
• Non-coding RNA

Protein
• Protein sequences
• Translated nucleotide sequences
• Protein domains
• Protein structure

Phenotype
• Diseases
• Polymorphism
• Gene expression
• Protein-protein interactions

Common to all databases

• A database is a structured collection of information.

• A database is composed of basic objects called records or entries (רשומות).

• Each record is composed of fields (שדות), which hold defined data that is related to that record.

Let's consider the following database of students studying bioinformatics at HUJI:

A database can be thought of as a large table, where the rows represent records and the columns represent fields.


ID         First Name  Last Name  Gender  Comments
0775523/7  Sharon      Asulin     female  Likes scuba diving
020304/4   Nurit       Niv        female  Comes from Cuba
03321/3    Nurit       Sharon     female  -
88924/5    Yossi       Yarkon     male    Father of Sharon, must go home earlier

Notes on this table:
• ID (accession number): each record has a unique identifier.
• Some records contain only partial information: some fields hold no data (a matter of database quality).
• Some records contain similar data in some of their fields.
• Some fields hold well-defined data (female, male), while others hold less-defined free text (comments).
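To make the table concrete, here is a minimal sketch in Python that stores it as a list of records, one dictionary per record and one key per field. The field names (id, first_name, ...), the use of None for an empty field, and the helper functions are illustrative assumptions, not part of any real database system.

```python
# Hypothetical in-memory copy of the student table: one dict per record,
# one key per field. None marks a field that holds no data.
FIELDS = ("id", "first_name", "last_name", "gender", "comments")
ROWS = [
    ("0775523/7", "Sharon", "Asulin", "female", "Likes scuba diving"),
    ("020304/4", "Nurit", "Niv", "female", "Comes from Cuba"),
    ("03321/3", "Nurit", "Sharon", "female", None),
    ("88924/5", "Yossi", "Yarkon", "male", "Father of Sharon, must go home earlier"),
]
STUDENTS = [dict(zip(FIELDS, row)) for row in ROWS]


def ids_are_unique(records):
    """True if no two records share an ID (accession number)."""
    ids = [r["id"] for r in records]
    return len(ids) == len(set(ids))


def empty_fields(record):
    """Names of the fields that hold no data in this record."""
    return [field for field, value in record.items() if value is None]


def index_by_id(records):
    """Map each ID to its record; refuse to build the index if IDs repeat."""
    if not ids_are_unique(records):
        raise ValueError("IDs must be unique identifiers")
    return {r["id"]: r for r in records}


print(ids_are_unique(STUDENTS))                         # True
print(empty_fields(STUDENTS[2]))                        # ['comments']
print(index_by_id(STUDENTS)["020304/4"]["last_name"])   # Niv
```

Because the ID is unique, it is the only field guaranteed to pick out exactly one record. A first name cannot do this, since two students are called Nurit.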

Data Retrieval

• The purpose of databases is not merely to collect and organize data, but mainly to allow advanced data retrieval.

• A query (שאילתא) is a method for retrieving information from the database.

• Because each record is organized into predetermined fields, we can run queries on specific fields.
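A query on a single field can be sketched as a filter over records that all share the same predetermined fields. The function name `query` and the case-insensitive exact-match rule are assumptions made for illustration only.

```python
# Hypothetical in-memory version of the student table (comments omitted).
FIELDS = ("id", "first_name", "last_name", "gender")
ROWS = [
    ("0775523/7", "Sharon", "Asulin", "female"),
    ("020304/4", "Nurit", "Niv", "female"),
    ("03321/3", "Nurit", "Sharon", "female"),
    ("88924/5", "Yossi", "Yarkon", "male"),
]
STUDENTS = [dict(zip(FIELDS, row)) for row in ROWS]


def query(records, field, value):
    """Return the IDs of records whose `field` equals `value` (case-insensitive).

    Because every record has the same fields, the query looks at exactly
    one column instead of scanning the whole record.
    """
    return [r["id"] for r in records if (r.get(field) or "").lower() == value.lower()]


print(query(STUDENTS, "gender", "male"))        # ['88924/5']
print(query(STUDENTS, "first_name", "nurit"))   # ['020304/4', '03321/3']
```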

The best search strategy…

1. Think: phrase your scientific question.

2. Choose an appropriate database.

3. Phrase your query: keywords, fields, Boolean operators, syntax.

4. Access additional entries discussing the same or similar entities by following links to additional databases.

5. Think and evaluate. The computer is just a machine; you are (hopefully) a thinking organism.

Phrasing a query…

General form: term[field] BOOLEAN OPERATOR term[field]
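The general form above can be sketched as a small string builder that joins (term, field) pairs with one Boolean operator. The function `build_query` is hypothetical. The field names are the ones from the student table, and real NCBI databases define their own field tags. The sketch only assembles text and does not submit anything to a database.

```python
def build_query(terms, operator="AND"):
    """Join (term, field) pairs into 'term[field] OP term[field] ...'.

    A field of None leaves the term untagged.
    """
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported Boolean operator: {operator}")
    parts = [f"{term}[{field}]" if field else term for term, field in terms]
    return f" {operator} ".join(parts)


print(build_query([("female", "gender"), ("*cuba*", "comments")]))
# female[gender] AND *cuba*[comments]
print(build_query([("Sharon", "first name")]))
# Sharon[first name]
```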

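Boolean Operators

(Figure: Venn diagrams of two overlapping sets, 1 and 2, shading 1 AND 2, 1 OR 2 and 1 NOT 2.)

• cell AND cycle: records containing both terms (1 AND 2, the intersection).
• cell OR cycle: records containing at least one of the terms (1 OR 2, the union).
• cell NOT cycle: records containing "cell" but not "cycle" (1 NOT 2).
• "cell cycle": records containing the exact phrase.
• Cell*: truncation wildcard, matching cell, cells, cellular, etc.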
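The operators can be sketched with Python sets. The records matching each term form a set, and AND, OR and NOT become intersection, union and difference. The five records are made-up examples. The whole-word matching rule and the treatment of a trailing `*` as a prefix match are simplifying assumptions, not the exact rules of any particular search engine.

```python
import re

# Hypothetical mini-database: record ID -> text of the record.
DOCS = {
    1: "Regulation of the cell cycle",
    2: "Cell membrane transport",
    3: "The carbon cycle in soil",
    4: "Cellular stress and cycle arrest",
    5: "How long does one cycle of a dividing cell take",
}


def words(text):
    """Lower-case word tokens of a record."""
    return re.findall(r"[a-z]+", text.lower())


def has_term(text, term):
    """Whole-word match; a trailing * acts as a truncation wildcard (cell*)."""
    term = term.lower()
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words(text))
    return term in words(text)


def has_phrase(text, phrase):
    """True if the words of `phrase` appear adjacent and in order."""
    target = phrase.lower().split()
    toks = words(text)
    n = len(target)
    return any(toks[i:i + n] == target for i in range(len(toks) - n + 1))


def hits(predicate):
    """Set of record IDs whose text satisfies `predicate`."""
    return {doc_id for doc_id, text in DOCS.items() if predicate(text)}


cell = hits(lambda t: has_term(t, "cell"))    # set 1
cycle = hits(lambda t: has_term(t, "cycle"))  # set 2

print("cell AND cycle:", sorted(cell & cycle))  # [1, 5]
print("cell OR cycle: ", sorted(cell | cycle))  # [1, 2, 3, 4, 5]
print("cell NOT cycle:", sorted(cell - cycle))  # [2]
print('"cell cycle":  ', sorted(hits(lambda t: has_phrase(t, "cell cycle"))))  # [1]
print("cell*:         ", sorted(hits(lambda t: has_term(t, "cell*"))))         # [1, 2, 4, 5]
```

Note that record 5 satisfies cell AND cycle but not the phrase "cell cycle", because the two words are not adjacent.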

The secretary wants to locate the record of the student Sharon Asulin but does not remember her last name, so she searches for: Sharon

ID         First Name  Last Name  Gender  Comments
0775523/7  Sharon      Asulin     female  Likes scuba diving
020304/4   Nurit       Niv        female  Comes from Cuba
03321/3    Nurit       Sharon     female  Receives scholarship
88924/5    Yossi       Yarkon     male    Proud father of Sharon

The search was not limited to a particular field: Sharon[all fields]

OOPS!!

The search retrieved too many records that don't match the required data: too much noise.
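The noise can be reproduced with a small sketch. Searching every field for the word "Sharon" also catches a last name and a comment. The `search` function and its whole-word, case-insensitive matching are assumptions for illustration. `field=None` stands in for [all fields].

```python
FIELDS = ("id", "first_name", "last_name", "gender", "comments")
ROWS = [
    ("0775523/7", "Sharon", "Asulin", "female", "Likes scuba diving"),
    ("020304/4", "Nurit", "Niv", "female", "Comes from Cuba"),
    ("03321/3", "Nurit", "Sharon", "female", "Receives scholarship"),
    ("88924/5", "Yossi", "Yarkon", "male", "Proud father of Sharon"),
]
STUDENTS = [dict(zip(FIELDS, row)) for row in ROWS]


def search(records, keyword, field=None):
    """IDs of records containing `keyword` as a whole word (case-insensitive).

    field=None plays the role of keyword[all fields]; otherwise only that
    one field is examined, as in keyword[first name].
    """
    fields = FIELDS if field is None else (field,)
    keyword = keyword.lower()
    return [
        r["id"]
        for r in records
        if any(keyword in r[f].lower().split() for f in fields)
    ]


print(search(STUDENTS, "Sharon"))                # ['0775523/7', '03321/3', '88924/5']
print(search(STUDENTS, "Sharon", "first_name"))  # ['0775523/7']
```

Only the first hit of the all-fields search is the record the secretary wants. Restricting the same keyword to one field removes the other two.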

Evaluating Search Results

           Found (+)       Not found (-)
Related    True positive   False negative
Unrelated  False positive  True negative

Rows give the "scientific truth" (is the record really related to the question?). Columns give the search results.
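The four cells of the table can be counted from two sets of record IDs: the truly related ones and the retrieved ones. This sketch applies that to the Sharon[all fields] example. It assumes the only related record is Sharon Asulin's. The function name `confusion_counts` is hypothetical.

```python
def confusion_counts(all_ids, relevant, retrieved):
    """Count true/false positives and negatives for one search."""
    all_ids, relevant, retrieved = set(all_ids), set(relevant), set(retrieved)
    return {
        "TP": len(relevant & retrieved),           # related and found
        "FN": len(relevant - retrieved),           # related but not found
        "FP": len(retrieved - relevant),           # found but unrelated
        "TN": len(all_ids - relevant - retrieved), # unrelated and not found
    }


ALL = ["0775523/7", "020304/4", "03321/3", "88924/5"]
RELEVANT = ["0775523/7"]                         # Sharon Asulin's record
RETRIEVED = ["0775523/7", "03321/3", "88924/5"]  # result of Sharon[all fields]
print(confusion_counts(ALL, RELEVANT, RETRIEVED))
# {'TP': 1, 'FN': 0, 'FP': 2, 'TN': 1}
```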

Results of Sharon[all fields]:

ID         First Name  Last Name  Gender  Comments                Evaluation
0775523/7  Sharon      Asulin     female  Likes scuba diving      True positive
020304/4   Nurit       Niv        female  Comes from Cuba         True negative
03321/3    Nurit       Sharon     female  Receives scholarship    False positive
88924/5    Yossi       Yarkon     male    Proud father of Sharon  False positive

What can we do to reduce or eliminate false positives without reducing true positives?

Sensitivity and Selectivity

Sensitivity: the ability of a method to detect positives, irrespective of how many false positives it reports.

Selectivity: the ability of a method to reject negatives, irrespective of how many positives it misses (false negatives).
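Both properties can be expressed as fractions of the confusion-table counts. Sensitivity is TP / (TP + FN). Selectivity, read here as what is usually called specificity, is TN / (TN + FP). That reading is an assumption based on the definition above. The counts below come from the Sharon[all fields] example on the four-student table.

```python
def sensitivity(tp, fn):
    """Fraction of truly related records that the search found: TP / (TP + FN)."""
    return tp / (tp + fn)


def selectivity(tn, fp):
    """Fraction of unrelated records that the search rejected: TN / (TN + FP).

    Taken here to mean what is usually called specificity.
    """
    return tn / (tn + fp)


# Sharon[all fields]: TP=1, FN=0, FP=2, TN=1
print(sensitivity(tp=1, fn=0))  # 1.0
print(selectivity(tn=1, fp=2))  # 0.3333333333333333
```

The search is perfectly sensitive, since it found the one record wanted. Its selectivity is poor, since it rejected only one of three unrelated records. This is why the next step is to refine the query.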

Let's refine our search

Find all students whose first name is Sharon: Sharon[first name]

(NCBI keyword syntax: "Sharon" is the keyword and "[first name]" is the field definition.)
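The split between keyword and field definition can be illustrated by a tiny parser for the term[field] notation. The regular expression, the function names and the convention that a term without [field] means "all fields" are assumptions for this example. Real NCBI query processing is more elaborate.

```python
import re

# One search term: a keyword, optionally followed by [field].
TERM = re.compile(r"^\s*(?P<keyword>[^\[\]]+?)\s*(?:\[(?P<field>[^\[\]]+)\])?\s*$")


def parse_term(term):
    """Split 'keyword[field]' into (keyword, field); field is None if absent."""
    m = TERM.match(term)
    if m is None:
        raise ValueError(f"cannot parse search term: {term!r}")
    return m.group("keyword"), m.group("field")


def parse_query(query):
    """Split a flat query into parsed terms and the Boolean operators between them."""
    parts = re.split(r"\s+(AND|OR|NOT)\s+", query.strip())
    return [parse_term(p) for p in parts[0::2]], parts[1::2]


print(parse_term("Sharon[first name]"))  # ('Sharon', 'first name')
print(parse_term("Sharon"))              # ('Sharon', None)
print(parse_query("female[gender] AND *cuba*[comments]"))
# ([('female', 'gender'), ('*cuba*', 'comments')], ['AND'])
```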

Sharon[first name] retrieves only the record of Sharon Asulin:

ID         First Name  Last Name  Gender  Comments
0775523/7  Sharon      Asulin     female  Likes scuba diving
020304/4   Nurit       Niv        female  Comes from Cuba
03321/3    Nurit       Sharon     female  Receives scholarship
88924/5    Yossi       Yarkon     male    Father of Sharon, must go home earlier

Now suppose the first name was mistyped when the record was entered (Sharom instead of Sharon):

ID         First Name  Last Name  Gender  Comments
0775523/7  Sharom      Asulin     female  Likes scuba diving
020304/4   Nurit       Niv        female  Comes from Cuba
03321/3    Nurit       Sharon     female  Receives scholarship
88924/5    Yossi       Yarkon     male    Father of Sharon, must go home earlier

With the typo, Sharon[first name] retrieves nothing (a false negative), although we are still not distracted by noise.

The original search, Sharon[all fields], would have retrieved all the noise but not the required record.
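Both outcomes can be checked with a sketch that runs the same searches on the correct table and on a copy with the mistyped first name. The `search` function, its whole-word, case-insensitive matching and the `TYPO` copy are illustrative assumptions. `field=None` again stands in for [all fields].

```python
FIELDS = ("id", "first_name", "last_name", "gender", "comments")
ROWS = [
    ("0775523/7", "Sharon", "Asulin", "female", "Likes scuba diving"),
    ("020304/4", "Nurit", "Niv", "female", "Comes from Cuba"),
    ("03321/3", "Nurit", "Sharon", "female", "Receives scholarship"),
    ("88924/5", "Yossi", "Yarkon", "male", "Father of Sharon, must go home earlier"),
]
CORRECT = [dict(zip(FIELDS, row)) for row in ROWS]

# Same table, but Sharon Asulin's first name was entered as "Sharom".
TYPO = [dict(r) for r in CORRECT]
TYPO[0]["first_name"] = "Sharom"


def search(records, keyword, field=None):
    """IDs of records containing `keyword` as a whole word (case-insensitive)."""
    fields = FIELDS if field is None else (field,)
    keyword = keyword.lower()
    return [
        r["id"]
        for r in records
        if any(keyword in r[f].lower().replace(",", " ").split() for f in fields)
    ]


print(search(CORRECT, "Sharon", "first_name"))  # ['0775523/7']
print(search(TYPO, "Sharon", "first_name"))     # []  (false negative)
print(search(TYPO, "Sharon"))                   # ['03321/3', '88924/5']  (noise only)
```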

The secretary wants to locate the record of the female student who comes from Cuba but does not remember her name. Search: female[gender] AND *cuba*[comments]

(NCBI keyword syntax: "female" and "*cuba*" are keywords, [gender] and [comments] are field definitions, and AND is a Boolean operator.)

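ID         First Name  Last Name  Gender  Comments                Evaluation
0775523/7  Sharon      Asulin     female  Likes scuba diving      False positive
020304/4   Nurit       Niv        female  Comes from Cuba         True positive
03321/3    Nurit       Sharon     female  Receives scholarship
88924/5    Yossi       Yarkon     male    Proud father of Sharon

The wildcard pattern *cuba* also matches "scuba", so Sharon Asulin's record is retrieved as a false positive.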
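The scuba/Cuba false positive can be reproduced with Python's standard `fnmatch` module, where `*` matches any run of characters. The `search_and` helper and the two matching modes are assumptions for illustration. Lower-casing both sides makes the match case-insensitive. The whole-word alternative shows one possible way to drop the false positive.

```python
from fnmatch import fnmatchcase

FIELDS = ("id", "first_name", "last_name", "gender", "comments")
ROWS = [
    ("0775523/7", "Sharon", "Asulin", "female", "Likes scuba diving"),
    ("020304/4", "Nurit", "Niv", "female", "Comes from Cuba"),
    ("03321/3", "Nurit", "Sharon", "female", "Receives scholarship"),
    ("88924/5", "Yossi", "Yarkon", "male", "Proud father of Sharon"),
]
STUDENTS = [dict(zip(FIELDS, row)) for row in ROWS]


def wildcard_match(value, pattern):
    """Case-insensitive match where * stands for any run of characters."""
    return fnmatchcase(value.lower(), pattern.lower())


def word_match(value, word):
    """Case-insensitive whole-word match."""
    return word.lower() in value.lower().split()


def search_and(records, criteria, matcher):
    """IDs of records satisfying every (pattern, field) criterion (Boolean AND)."""
    return [
        r["id"]
        for r in records
        if all(matcher(r[field], pattern) for pattern, field in criteria)
    ]


print(search_and(STUDENTS, [("female", "gender"), ("*cuba*", "comments")], wildcard_match))
# ['0775523/7', '020304/4']
print(search_and(STUDENTS, [("female", "gender"), ("cuba", "comments")], word_match))
# ['020304/4']
```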

And most importantly, most importantly:

Don't be afraid at all!