
Databases מאגרי מידע

Page 1: Databases מאגרי מידע

Databases (מאגרי מידע)

Storage (אחסון)

Retrieval (שליפה)

Page 2: Databases מאגרי מידע

Different kinds of DBs dealing with biological information retrieved by various means

• DNA: DNA sequences (individual genes or complete genomes)

• RNA: cDNA, ESTs, non-coding RNA

• Protein: protein sequences, translated nucleotide sequences, protein domains, protein structure

• Phenotype: diseases, polymorphism, gene expression, protein-protein interactions

Page 3: Databases מאגרי מידע

Common to all databases

• A database is a structured collection of information.

• A database is composed of basic objects called records or entries (רשומות).

• Each record is composed of fields (שדות), which hold defined data that is related to that record.

Let’s consider the following database of students learning bioinfo in HUJI

Page 4: Databases מאגרי מידע

Databases

A database can be thought of as a large table, where the rows represent records and the columns represent fields.

ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | -
88924/5 | Yossi | Yarkon | male | Father of sharon - must go home earlier

• ID (Accession Numbers): unique identifiers of the database records. Each record has a unique identifier.

• For some records there is only partial information: some fields contain no data (quality of DB).

• Some records contain similar data in some of the fields.

• Fields hold more defined data (female, male) or less defined data (comments).
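
The table view above can be sketched in Python as a list of records, each one a dictionary keyed by field name. This is only an illustration: the names `students`, `ids_are_unique`, and `records_with_missing_fields` are assumptions introduced here, and a comment of "-" is assumed to mean that the field holds no data.

```python
# Each record is a dict; each key is a field. Data copied from the table above.
students = [
    {"id": "0775523/7", "first_name": "Sharon", "last_name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first_name": "Nurit", "last_name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first_name": "Nurit", "last_name": "Sharon",
     "gender": "female", "comments": "-"},
    {"id": "88924/5", "first_name": "Yossi", "last_name": "Yarkon",
     "gender": "male", "comments": "Father of sharon - must go home earlier"},
]


def ids_are_unique(records):
    """True if no two records share the same ID (accession number)."""
    ids = [r["id"] for r in records]
    return len(ids) == len(set(ids))


def records_with_missing_fields(records):
    """IDs of records in which at least one field holds no data."""
    empty = ("", "-")  # assumption: "-" marks an empty field
    return [r["id"] for r in records if any(v in empty for v in r.values())]


print(ids_are_unique(students))               # True
print(records_with_missing_fields(students))  # ['03321/3']
```

The first names and genders repeat across records, which is fine; only the ID has to be unique.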
Page 5: Databases מאגרי מידע

Data Retrieval

• The purpose of databases is not merely to collect and organize data, but mainly to allow advanced data retrieval.

• A query (שאילתא) is a method to retrieve information from the database.

• The organization of each record into predetermined fields allows us to use queries on fields.

Page 6: Databases מאגרי מידע

The best search strategy…

Page 7: Databases מאגרי מידע

1. Think: phrase your scientific question.

2. Choose an appropriate database.

3. Phrase your query: keywords, fields, Boolean operators, syntax.

4. Access additional entries discussing the same or similar entities by links to additional databases.

5. Think, evaluate. The computer is just a machine. You are (hopefully) a thinking organism.

Page 8: Databases מאגרי מידע

Phrasing a query…

Terms/words for search [field] + (BOOLEAN OPERATORS) + terms/words for search [field]

Page 9: Databases מאגרי מידע

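Boolean Operators

• cell AND cycle (1 AND 2): records that contain both terms

• cell OR cycle (1 OR 2): records that contain at least one of the terms

• cell NOT cycle (1 NOT 2): records that contain the first term but not the second

• “cell cycle”: records that contain the exact phrase

• cell*: cell, cells, cellular, etc.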
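
The Boolean operators above correspond to set operations on the records that match each term. The sketch below is a toy illustration, not how any particular database implements its search: the four example sentences in `docs` and the helper names `matching` and `phrase` are invented for this example.

```python
# Toy "database": record number -> free text (invented for illustration).
docs = {
    1: "the cell cycle is tightly regulated",
    2: "cells divide by mitosis",
    3: "the krebs cycle produces energy",
    4: "cellular respiration and the cell membrane",
}


def matching(term):
    """Record numbers whose text contains `term` as a whole word.

    A trailing * turns the term into a prefix (truncation) search.
    """
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return {i for i, text in docs.items()
                if any(word.startswith(prefix) for word in text.split())}
    return {i for i, text in docs.items() if term in text.split()}


def phrase(words):
    """Record numbers containing the exact phrase (naive substring test)."""
    return {i for i, text in docs.items() if words.lower() in text}


cell, cycle = matching("cell"), matching("cycle")
print(cell & cycle)           # cell AND cycle -> {1}
print(cell | cycle)           # cell OR cycle  -> {1, 3, 4}
print(cell - cycle)           # cell NOT cycle -> {4}
print(phrase("cell cycle"))   # "cell cycle"   -> {1}
print(matching("cell*"))      # cell*          -> {1, 2, 4}
```

AND narrows the result, OR widens it, and the wildcard widens it further by also picking up "cells" and "cellular".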

Page 10: Databases מאגרי מידע

The secretary wants to locate the record of the student Sharon Asulin but does not remember the last name, so she searches for Sharon.

Query: Sharon[all fields] (the search was not limited to a certain field)

Fields (columns):

ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Proud father of sharon

Page 11: Databases מאגרי מידע

OOPS!!

Retrieved too many records that don’t match the required data - too much noise.
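
The noisy result can be reproduced with a short sketch of an unrestricted search. The function name `search_all_fields` and the case-insensitive substring matching are assumptions made for this illustration; real databases may match differently.

```python
# Student table as shown on the slide for the Sharon[all fields] search.
students = [
    {"id": "0775523/7", "first name": "Sharon", "last name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first name": "Nurit", "last name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first name": "Nurit", "last name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first name": "Yossi", "last name": "Yarkon",
     "gender": "male", "comments": "Proud father of sharon"},
]


def search_all_fields(records, keyword):
    """Return records in which ANY field contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [r for r in records if any(kw in value.lower() for value in r.values())]


hits = search_all_fields(students, "Sharon")
print([r["id"] for r in hits])  # ['0775523/7', '03321/3', '88924/5']
```

Three records come back for one wanted student: the keyword also matches a last name and a comment.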

Page 12: Databases מאגרי מידע

Evaluating Search Results

Rows: the “scientific truth”. Columns: the search results.

Scientific truth | Not found (-) | Found (+)
Related | False negative | True positive
Unrelated | True negative | False positive

Page 13: Databases מאגרי מידע

Fields (columns):

ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon (true positive) | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon (false positive) | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Proud father of sharon (false positive)

What can we do to reduce/eliminate false positives without reducing true positives?

Page 14: Databases מאגרי מידע

Sensitivity

Ability of a method to detect positives, irrespective of how many false positives are reported.

Selectivity

Ability of a method to reject negatives, irrespective of how many positives are wrongly rejected (false negatives).

Sensitivity Selectivity
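
The two definitions can be turned into numbers for the Sharon[all fields] search. The slides give no formulas, so the ratios below are an assumption: sensitivity is taken as TP / (TP + FN), and "selectivity" is read as what is more commonly called specificity, TN / (TN + FP). The counts follow from the student table if Sharon Asulin is the only related record.

```python
def sensitivity(tp, fn):
    """Fraction of related records that the search found."""
    return tp / (tp + fn)


def selectivity(tn, fp):
    """Fraction of unrelated records that the search rejected.

    Assumption: this is the quantity usually called specificity.
    """
    return tn / (tn + fp)


# Sharon[all fields]: 1 wanted record found, 2 unwanted records found,
# 1 unwanted record rejected, no wanted record missed.
tp, fp, tn, fn = 1, 2, 1, 0

print(sensitivity(tp, fn))            # 1.0
print(round(selectivity(tn, fp), 2))  # 0.33
```

The search is fully sensitive but poorly selective, which is the numerical form of "too much noise".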

Page 15: Databases מאגרי מידע

Let’s refine our search

Find all students whose first name is Sharon:

Sharon[first name]

Labels: keyword (Sharon), syntax (NCBI square brackets), field definition (first name).

ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Father of sharon - must go home earlier
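
A minimal sketch of how a `keyword[field]` term might be evaluated against the student table. The parser `run_query` is hypothetical and handles a single term only; it imitates the bracket notation from the slide and is not NCBI's actual implementation.

```python
import re

students = [
    {"id": "0775523/7", "first name": "Sharon", "last name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first name": "Nurit", "last name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first name": "Nurit", "last name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first name": "Yossi", "last name": "Yarkon",
     "gender": "male", "comments": "Father of sharon - must go home earlier"},
]


def run_query(records, query):
    """Evaluate one `keyword[field]` term and return the matching record IDs.

    `[all fields]` looks for the keyword anywhere in any field; a named
    field requires the whole field value to equal the keyword.
    Matching is case-insensitive.
    """
    match = re.fullmatch(r"\s*(.+?)\s*\[(.+?)\]\s*", query)
    if match is None:
        raise ValueError(f"expected keyword[field], got {query!r}")
    keyword, field = match.group(1).lower(), match.group(2).lower()
    if field == "all fields":
        return [r["id"] for r in records
                if any(keyword in value.lower() for value in r.values())]
    return [r["id"] for r in records if r.get(field, "").lower() == keyword]


print(run_query(students, "Sharon[first name]"))  # ['0775523/7']
print(run_query(students, "Sharon[all fields]"))  # ['0775523/7', '03321/3', '88924/5']
```

Restricting the keyword to one field removes both false positives while keeping the true positive.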

Page 16: Databases מאגרי מידע

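In this version of the table, the first name in record 0775523/7 is misspelled as Sharom.

ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharom | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Father of sharon - must go home earlier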

Now we don’t retrieve any answer (false negative?) and we are still not distracted by the noise.

The original search phrase sharon[all fields] would have retrieved all the noise but not the required info.

Page 17: Databases מאגרי מידע

The secretary wants to locate the record of the female student who comes from Cuba but does not remember her name.

Search: female[gender] AND *cuba*[comments]

Labels: keyword (female, *cuba*), syntax (NCBI square brackets), field definition (gender, comments), Boolean operator (AND).

Fields (columns):

ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving (false positive)
020304/4 | Nurit | Niv | female | Comes from Cuba (true positive)
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Proud father of sharon
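
The false positive above comes from the wildcards: *cuba* also matches "scuba". The sketch below reproduces that with Python's `fnmatch` module and then shows one possible tightening, a whole-word match. The helper names `wildcard_hits` and `whole_word_hits` are invented for this example, and the whole-word variant is a suggestion rather than something the slides prescribe.

```python
import re
from fnmatch import fnmatchcase

students = [
    {"id": "0775523/7", "first name": "Sharon", "last name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first name": "Nurit", "last name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first name": "Nurit", "last name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first name": "Yossi", "last name": "Yarkon",
     "gender": "male", "comments": "Proud father of sharon"},
]


def wildcard_hits(records, gender, pattern):
    """gender[gender] AND pattern[comments], where * matches any characters."""
    pattern = pattern.lower()
    return [r["id"] for r in records
            if r["gender"].lower() == gender.lower()
            and fnmatchcase(r["comments"].lower(), pattern)]


def whole_word_hits(records, gender, word):
    """Same AND query, but the comment must contain `word` as a whole word."""
    rx = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [r["id"] for r in records
            if r["gender"].lower() == gender.lower()
            and rx.search(r["comments"])]


print(wildcard_hits(students, "female", "*cuba*"))  # ['0775523/7', '020304/4']
print(whole_word_hits(students, "female", "cuba"))  # ['020304/4']
```

The trade-off is the usual one: the stricter match drops the "scuba" record, but it would also miss a comment such as "Cuban origin" that the wildcard version would catch.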

Page 18: Databases מאגרי מידע

And the main thing, the main thing:

do not be afraid at all.