Analysis of gene expression

BS(H) Botany 3rd M, Introduction to computer
1005, Maha tariq
University of Education, Okara

Overview

Processes:
- SAGE (Serial Analysis of Gene Expression)
- DDD (Digital Differential Display)
- RNA-Seq
- SDS-PAGE
- Microarrays

References

Genome analysis tells us which genes are present, but to understand the organism’s phenotype we also need to know how those genes are expressed: under what conditions, in which tissues, and how much gene product is made. Understanding and curing diseases likewise depends on knowing which genes are expressed in disease states.

Serial Analysis of Gene Expression (SAGE)

The technique is based on the fact that a gene can usually be identified uniquely from a short sequence tag (10-30 nt, depending on the SAGE variant) taken from a defined position near the 3’ end of its transcript, often in the untranslated region.

The tags are cut out of cDNA and joined in pairs (ditags) that are amplified by PCR; the ditags are then concatenated into long molecules that are cloned and sequenced.

The number of times each tag appears is proportional to the amount of its mRNA in the sample.

Much SAGE data is available at NCBI.
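
The tag-counting step can be sketched in Python as follows. This is a simplified model, not the full SAGE protocol: it assumes every tag is preceded by CATG (the NlaIII recognition site used as the anchoring site in classic SAGE) and is exactly 10 nt long, and it ignores the ditag structure and tag orientation of real concatemers. The function names and sequences are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII site separating tags in this simplified model
TAG_LEN = 10     # assumed tag length (classic SAGE); other variants use longer tags


def extract_tags(concatemer, tag_len=TAG_LEN):
    """Split a simplified concatemer of the form CATG<tag>CATG<tag>... into tags."""
    pieces = concatemer.upper().split(ANCHOR)
    # The leading anchor produces an empty piece; keep only full-length tags.
    return [p for p in pieces if len(p) == tag_len]


def tag_frequencies(concatemers):
    """Count tags across all sequenced concatemers and return relative abundances."""
    counts = Counter()
    for seq in concatemers:
        counts.update(extract_tags(seq))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical concatemer containing one tag twice and another tag once.
    read = "CATG" + "A" * 10 + "CATG" + "T" * 10 + "CATG" + "A" * 10
    print(tag_frequencies([read]))  # A-tag: 2/3 of tags, T-tag: 1/3
```

Because each tag's share of the total count stands in for its mRNA's share of the transcript pool, the relative frequencies are what get compared between samples.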

Digital Differential Display (DDD)

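DDD (Digital Differential Display) is based on data from EST (expressed sequence tag) experiments. The NCBI UniGene database groups the ESTs from each gene into a separate cluster, so the proportion of ESTs coming from a given gene can be compared between libraries from different experimental treatments.

This approach is limited to well-studied species with large EST collections.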
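
A minimal Python sketch of the comparison DDD makes: given the number of ESTs assigned to one gene and the total number of ESTs in each of two libraries, it computes both proportions and a two-sided p-value. The two-proportion z-test is an illustrative choice and is not necessarily the statistic NCBI's DDD tool applies; the function name and counts are hypothetical.

```python
import math


def compare_est_proportions(gene_a, total_a, gene_b, total_b):
    """Compare the share of ESTs from one gene in library A and library B.

    Returns (proportion_a, proportion_b, two_sided_p) from a two-proportion z-test.
    """
    p_a = gene_a / total_a
    p_b = gene_b / total_b
    # Pooled proportion under the null hypothesis that both libraries share one rate.
    pooled = (gene_a + gene_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Pooled rate of 0 or 1 means both proportions are identical.
        return p_a, p_b, 1.0
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, p_value


if __name__ == "__main__":
    # Hypothetical counts: 50 of 10,000 ESTs in library A, 5 of 10,000 in library B.
    print(compare_est_proportions(50, 10_000, 5, 10_000))
```

The normal approximation behind the z-test becomes unreliable when a gene has only a handful of ESTs, so an exact test is the safer choice for rare transcripts.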

RNA-Seq

RNA-Seq, also called whole transcriptome shotgun sequencing, was first published in 2008. It is probably the method of choice today for analyzing RNA content.

The procedure is simple in outline: isolate messenger RNA, break it into fragments of 200-300 bases, reverse transcribe them into cDNA, then sequence on a large scale using 454, Illumina, or another massively parallel sequencing technology. The resulting sequences are then compared with the genomic sequence to find which genes are expressed and where the exon boundaries lie.

Exon boundaries are a problem with very short reads: a read that spans two exons may overlap one of them by only a few bases, too few to place that part of the read reliably.
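
The junction problem can be illustrated with a toy Python example. The gene below (two exons separated by an intron) is entirely made up. A read spanning the exon 1 / exon 2 junction does not occur contiguously in the genome, and its 3-base overhang onto exon 2 matches more than one genomic position, so the read alone cannot show where the second exon starts.

```python
def find_all(text, pattern):
    """Return every start index of pattern in text, overlapping matches included."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


# Hypothetical toy gene: exon 1, an intron (starting GT, ending AG), exon 2.
EXON1 = "ATGGCGCGTACCGATCGG"
INTRON = "GTAAGTTGATTTTTTTCAG"
EXON2 = "TGACCCAAAGGGCTAA"
GENOME = EXON1 + INTRON + EXON2
MRNA = EXON1 + EXON2  # the intron has been spliced out

# A read spanning the exon 1 / exon 2 junction with only 3 bases on exon 2.
READ = EXON1[-12:] + EXON2[:3]

if __name__ == "__main__":
    print(READ in MRNA)                   # True: the read comes from the spliced mRNA
    print(READ in GENOME)                 # False: no contiguous genomic match
    print(find_all(GENOME, EXON1[-12:]))  # the 12-base part maps to a single position
    print(find_all(GENOME, EXON2[:3]))    # the 3-base overhang maps to two positions
```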

As with all RNA methods, which RNAs are present depends on the tissue analyzed and external conditions like environmental stress or disease state.

RNA-Seq gives information on transcript copy number over a much wider range than microarrays. It also detects SNPs.
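
Because RNA-Seq produces read counts, expression values are usually normalized for transcript length and sequencing depth before genes or samples are compared. The Python sketch below uses RPKM (reads per kilobase of transcript per million mapped reads), one common normalization that the slide itself does not mention; the gene names and counts are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    length_kb = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (length_kb * millions_mapped)


if __name__ == "__main__":
    total = 10_000_000  # hypothetical number of mapped reads in the library
    # Hypothetical genes: name -> (mapped reads, transcript length in bp)
    genes = {"geneA": (500, 2_000), "geneB": (5, 1_000), "geneC": (120_000, 3_000)}
    for name, (count, length) in genes.items():
        print(name, round(rpkm(count, length, total), 2))
```

Counts spanning several orders of magnitude (here from 5 to 120,000 reads) illustrate the wide dynamic range that read counting provides.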

SDS-PAGE

SDS-PAGE is a method for separating proteins according to their molecular weight. SDS is sodium dodecyl sulfate (also known as sodium lauryl sulfate), a detergent that unfolds proteins and coats them with negatively charged molecules, so that all proteins end up with essentially the same charge-to-mass ratio and migrate according to size. “Native” gel electrophoresis uses undenatured proteins, which vary greatly in charge-to-mass ratio.

SDS denaturation isn’t perfect: some proteins behave anomalously and migrate at an apparent molecular weight that differs from their true one.

PAGE = polyacrylamide gel electrophoresis
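
In practice, molecular weight is read off an SDS-PAGE gel with a standard curve: marker proteins of known size are run alongside the samples, and log10(MW) is approximately linear in the relative mobility Rf (distance migrated by a band divided by distance migrated by the dye front). The Python sketch below fits that line by ordinary least squares. The ladder values and distances are hypothetical, and the linear relationship holds only over part of a gel's range.

```python
import math


def relative_mobility(band_distance, dye_front_distance):
    """Rf: distance migrated by a band divided by distance migrated by the dye front."""
    return band_distance / dye_front_distance


def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) = intercept + slope * Rf.

    markers: list of (rf, molecular_weight) pairs for proteins of known size.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(rf, slope, intercept):
    """Apparent molecular weight of an unknown band from its Rf."""
    return 10 ** (intercept + slope * rf)


if __name__ == "__main__":
    # Hypothetical ladder: (Rf, molecular weight in kDa)
    ladder = [(0.15, 100), (0.30, 70), (0.45, 50), (0.60, 35), (0.75, 25)]
    slope, intercept = fit_standard_curve(ladder)
    unknown_rf = relative_mobility(3.1, 6.0)  # band at 3.1 cm, dye front at 6.0 cm
    print(round(estimate_mw(unknown_rf, slope, intercept), 1), "kDa")
```

The result is an apparent molecular weight, so proteins that migrate anomalously will be misestimated by this method.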

DNA microarrays

DNA microarrays and DNA chips are essentially the same thing: a set of DNA molecules attached to a solid substrate in an array of very small spots. Affymetrix is a company that sells microarray chips built on a quartz (silica) substrate. Many microarrays are homemade, with DNA spotted onto glass microscope slides.

Microarrays work by hybridization: cDNA made from mRNA is labelled with a fluorescent tag, then hybridized with the array. After washing, only complementary sequences remain bound. A laser scanner excites each spot in turn and the amount of fluorescence is read. The fluorescence level is approximately proportional to the amount of the corresponding mRNA in the original preparation.

Originally, cDNA from each gene was used to make the array. Later, synthetic oligonucleotides were used, and today synthetic oligonucleotides of 50-60 nt based on the gene sequences seem to be the standard.

In most cases, RNAs from two different conditions are compared (experimental vs. control). The two cDNAs derived from these RNAs are labelled with Cy3 and Cy5, fluorescent dyes that are conventionally displayed as green and red, respectively. If the two RNAs are present in equal amounts, the spot appears yellow; otherwise red or green predominates.
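
A minimal Python sketch of how a two-color spot might be read. It assumes the experimental sample carries Cy5 (red) and the control carries Cy3 (green), and compares their intensity ratio with a twofold cutoff; the cutoff, the function name, and the intensities are hypothetical choices for illustration. Real analyses first apply background subtraction and normalization.

```python
def spot_color(cy5_red, cy3_green, fold_cutoff=2.0):
    """Describe a two-color spot from its red (Cy5) and green (Cy3) intensities.

    Returns "red" if Cy5 exceeds Cy3 by at least fold_cutoff, "green" for the
    reverse, and "yellow" when neither channel predominates.
    """
    if cy5_red <= 0 or cy3_green <= 0:
        raise ValueError("intensities must be positive to form a ratio")
    ratio = cy5_red / cy3_green
    if ratio >= fold_cutoff:
        return "red"
    if ratio <= 1 / fold_cutoff:
        return "green"
    return "yellow"


if __name__ == "__main__":
    # Hypothetical spots: (Cy5 intensity, Cy3 intensity)
    for red, green in [(1200, 1150), (4000, 900), (300, 2500)]:
        print(red, green, spot_color(red, green))
```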

Microarray data are subject to many potential errors. The main issues fall into three categories: replication, background subtraction, and data normalization.

Replication of each experimental data point is essential, because spot intensities vary considerably within a typical experiment, especially on homemade microarrays.
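
With replicate spots, each gene's measurements can be summarized by their mean and by the coefficient of variation (standard deviation divided by the mean), which flags genes whose replicates disagree. A Python sketch using the standard `statistics` module; the gene names and intensities are hypothetical.

```python
import statistics


def summarize_replicates(replicates):
    """Return {gene: (mean, coefficient_of_variation)} for replicate intensities."""
    summary = {}
    for gene, values in replicates.items():
        if len(values) < 2:
            raise ValueError(f"{gene}: need at least two replicates")
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        summary[gene] = (mean, sd / mean if mean else float("inf"))
    return summary


if __name__ == "__main__":
    # Hypothetical replicate intensities for two genes.
    spots = {
        "geneA": [1020.0, 980.0, 1000.0],
        "geneB": [400.0, 1500.0, 650.0],
    }
    for gene, (mean, cv) in summarize_replicates(spots).items():
        print(f"{gene}: mean={mean:.0f}, CV={cv:.2f}")
```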

The background fluorescence level needs to be subtracted from every data point. Because the background is not uniform across the array, subtraction can leave some spots with negative intensities; these can be set to zero.
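
A minimal Python sketch of local background subtraction with negative values set to zero. It assumes each spot comes with its raw foreground intensity and a few background pixel values sampled around it, and it uses their median as the local background estimate; that choice, the function name, and the numbers are assumptions for illustration.

```python
import statistics


def subtract_background(foreground, background_pixels):
    """Subtract the median local background; clip negative results to zero."""
    background = statistics.median(background_pixels)
    return max(foreground - background, 0.0)


if __name__ == "__main__":
    # Hypothetical spots: (foreground intensity, nearby background pixels)
    spots = [(850.0, [120.0, 135.0, 128.0]), (110.0, [120.0, 135.0, 128.0])]
    for fg, bg in spots:
        print(fg, "->", subtract_background(fg, bg))
```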

Data normalization here means trying to make the variance of the measured expression level constant across signal intensities. The variance has been observed to increase with signal strength. One way to correct for this is to include a multiplicative error term as well as an additive error term in the statistical model.
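
One common way to write such a model (a sketch of the general idea, not necessarily the exact form the source had in mind) treats the measured intensity y of a gene with true expression level μ as the sum of a background offset α, a multiplicative error η and an additive error ε, assumed independent:

```latex
y = \alpha + \mu e^{\eta} + \varepsilon, \qquad
\eta \sim N(0, \sigma_\eta^2), \quad \varepsilon \sim N(0, \sigma_\varepsilon^2)

\operatorname{Var}(y) = \mu^2 e^{\sigma_\eta^2}\left(e^{\sigma_\eta^2} - 1\right) + \sigma_\varepsilon^2
```

For weak signals (small μ) the additive term dominates, so the variance is roughly constant; for strong signals the μ² term dominates, which reproduces the observed increase in variance with signal strength.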

Most microarray experiments compare two conditions, using red and green dyes, so each gene sequence yields a ratio of red to green intensity. The problem is that on an ordinary linear scale the distance between ½ and 1 is only half the distance between 1 and 2, even though both represent a twofold change (in opposite directions).

The solution is to take the base-2 logarithm of the red/green ratio. Because log2(x) = -log2(1/x), a twofold increase becomes +1 and a twofold decrease becomes -1, so increases and decreases span the same range, symmetric around 0.
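
A short Python check of that symmetry; the function name `log2_ratio` and the intensities are hypothetical.

```python
import math


def log2_ratio(red, green):
    """Base-2 logarithm of the red/green intensity ratio."""
    return math.log2(red / green)


if __name__ == "__main__":
    # Twofold up, no change, twofold down: prints +1.0, +0.0 and -1.0.
    for red, green in [(200, 100), (100, 100), (50, 100)]:
        print(f"R/G = {red / green}: log2 = {log2_ratio(red, green):+.1f}")
```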

Similarly, the overall expression level of a spot can be expressed as the geometric mean of the red and green signals, that is, the square root of red times green, where R and G are the red and green intensities:

$$\text{geometric mean} = \sqrt{R \times G}$$

Taking the logarithm of this value spreads the data out better.

Other data manipulations can further improve how the data are displayed.
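
Putting the two quantities together, each spot can be described by its log ratio M = log2(R/G) and its log-scale average intensity A = log2 √(R·G) = ½ log2(R·G); plotting M against A is the common "MA plot". A Python sketch with hypothetical spot values; the function name is an assumption.

```python
import math


def ma_values(red, green):
    """Return (M, A): log2 ratio and log2 of the geometric mean of R and G."""
    m = math.log2(red / green)
    a = math.log2(math.sqrt(red * green))  # equals 0.5 * log2(red * green)
    return m, a


if __name__ == "__main__":
    # Hypothetical spots: (red intensity, green intensity)
    spots = [(512.0, 128.0), (256.0, 256.0), (90.0, 1440.0)]
    for red, green in spots:
        m, a = ma_values(red, green)
        print(f"R={red:g} G={green:g}  M={m:+.2f}  A={a:.2f}")
```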

References

Shi L (Dr. Leming Shi), National Center for Toxicological Research. "Microarray Quality Control (MAQC) Project."

Wilson CL, Miller CJ (2005). "Simpleaffy: a BioConductor package for Affymetrix Quality Control and data analysis."