Analyzing Data from a Splice Array Experiment
Jean Yee Hwa Yang
Yuanyuan Xiao
Mark Segal
http://www.biostat.ucsf.edu/jean/
University of California, San Francisco
[Figure: the life cycle of a microarray experiment. Stages: biological question; experimental design; microarray experiment; image analysis; quality measurement (pass/failed); normalization; analysis (estimation, testing, clustering, discrimination); biological verification and interpretation.]
Outline
Background
Experimental design of the splice arrays
Preprocessing
Stepwise normalization
Comparison study
Finding differentially expressed genes
Differential expression via distance summary (DEDS)
Summary
Background
Nuclear RNA processing events
-- 5' capping
-- 3' cleavage and polyadenylation
-- Intron removal (splicing)
-- mRNA transport to the cytoplasm for translation
For yeast: ~6000 genes, of which ~250 contain introns.
Taken from http://www.accessexcellence.org/
Biological background
Mutants:
-- Spt4-5: chromatin-specific elongation factors (Spt4d, Spt5.4, Spt5.194 and Spt5.242).
-- Ceg1: capping enzyme mutant.
Long-term question:
-- How does the Spt4-Spt5 complex affect transcription elongation?
-- Investigate the role of the Spt4-Spt5 complex in splicing.
Specific question:
-- Identify genes with splicing defects in the mutant strains, i.e., identify DE genes in the splice array.
Fabrication of spotted arrays
Arrayed library (96- or 384-well plates of bacterial glycerol stocks)
-> PCR amplification directly from colonies with SP6-T7 primers in 96-well plates
-> Consolidate into 384-well plates
-> Spot as microarray on glass slides
Expression profiling with DNA microarrays
[Figure: cDNA A (Cy5-labeled) + cDNA B (Cy3-labeled) -> hybridization -> scanning (Laser 1, Laser 2) -> image capture -> analysis.]
Fabrication of splicing-specific microarrays
[Figure: probe design. Intron-containing genes: intron (Int) and exon 2 (Ex2) probes on the unspliced transcript (exon 1, intron, exon 2), and a splice-junction (SJ) probe spanning exon 1 and exon 2 after splicing. Intronless genes: Intronless probe.]
Clark, et al., Science 2002, 298:907-910
Print layout: 4 × 4 print tips; 15 × 24 probes per print tip; 5760 probes total.
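To make the print layout concrete, the sketch below maps a probe index to its print tip and to its row and column on the slide. It assumes probes are numbered print tip by print tip and row by row within each tip; this ordering and the function name are hypothetical, and GenePix output may order spots differently. The whole slide is then a 60 × 96 grid of spots, which is presumably where the 60 and 96 in the ANOVA df of the stepwise normalization tables come from.

```python
# Hypothetical layout helper: 4 x 4 print tips, 15 x 24 spots per tip.
TIP_ROWS, TIP_COLS = 4, 4
SPOT_ROWS, SPOT_COLS = 15, 24
SPOTS_PER_TIP = SPOT_ROWS * SPOT_COLS                  # 360
N_PROBES = TIP_ROWS * TIP_COLS * SPOTS_PER_TIP         # 5760


def probe_position(index):
    """Map a 0-based probe index to its place on the slide.

    Assumes probes are numbered print tip by print tip (row-major over the
    4 x 4 tips) and row-major within each tip. Returns
    (tip_row, tip_col, spot_row, spot_col, slide_row, slide_col).
    """
    if not 0 <= index < N_PROBES:
        raise ValueError(f"index must be in [0, {N_PROBES})")
    tip, spot = divmod(index, SPOTS_PER_TIP)
    tip_row, tip_col = divmod(tip, TIP_COLS)
    spot_row, spot_col = divmod(spot, SPOT_COLS)
    slide_row = tip_row * SPOT_ROWS + spot_row   # 0..59
    slide_col = tip_col * SPOT_COLS + spot_col   # 0..95
    return tip_row, tip_col, spot_row, spot_col, slide_row, slide_col


print(N_PROBES)                                       # 5760
print(TIP_ROWS * SPOT_ROWS, TIP_COLS * SPOT_COLS)     # 60 96
print(probe_position(5759))                           # (3, 3, 14, 23, 59, 95)
```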
Experimental design
Target samples: wt, spt4d, spt5.194, spt5.242, spt5.4 and ceg1.
[Figure: design diagram linking the target samples; each of the six links is labeled x2.]
These mutants are defective for transcription elongation.
22 arrays were hybridized, scanned and quantified using GenePix.
Normalization
Normalization is the process of identifying and removing systematic variation that is not due to real differences between RNA treatments, i.e., not due to differential gene expression.
This systematic variation can be observed in the dependence of the ratios on:
-- fluorescent intensity (A);
-- spatial (S) heterogeneity;
-- print tip;
-- 384-well plate;
-- time order of printing.
Often, these dependencies are correlated with each other.
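The intensity dependence is usually examined through the log-ratio M = log2(R/G) and the average log-intensity A = ½ log2(R·G) of each spot. A minimal sketch, assuming R and G are the background-corrected Cy5 and Cy3 intensities (the slides do not state this convention); `ma_values` is a hypothetical name:

```python
import math


def ma_values(red, green):
    """Return lists (M, A) for paired red (Cy5) and green (Cy3) intensities.

    M = log2(R / G) is the log-ratio and A = 0.5 * log2(R * G) is the
    average log-intensity. Intensities must be positive (e.g. after
    background correction), otherwise the logarithm is undefined.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive")
        m_vals.append(math.log2(r / g))
        a_vals.append(0.5 * math.log2(r * g))
    return m_vals, a_vals


M, A = ma_values([1024.0, 200.0], [256.0, 200.0])
print(M)     # [2.0, 0.0]
print(A[0])  # 9.0
```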
Preprocessing steps and options
Which genes to use:
§ All
§ Intronless
§ Exon
Normalization methods:
§ Ratios [two channels]
-- Median
-- Loess
-- Print-tip / pins
§ Intensities [single channel]
-- ANOVA
-- Quantile normalization
-- VSN
Within-slide normalization: adjusting A
To correct for the dye biases that commonly occur in cDNA microarrays:
-- Global normalization: median shift.
-- Robust linear normalization (local regression model) [Kepler et al., Genome Biology 2003].
-- An intensity (A)-dependent loess fit to log-ratios.
[Figure: adjusting A, before and after normalization.]
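The median shift and the intensity-dependent fit can be sketched in plain Python. This is a minimal illustration under stated assumptions: `median_normalize`, `loess_fit` and `loess_normalize` are hypothetical names, the span of 0.4 is arbitrary, and the fit is a single-pass, tricube-weighted local linear regression rather than the robust loess implementations in Bioconductor.

```python
import math
import statistics


def median_normalize(m_vals):
    """Global median shift: subtract the median log-ratio from every spot."""
    med = statistics.median(m_vals)
    return [m - med for m in m_vals]


def loess_fit(a_vals, m_vals, span=0.4):
    """Loess-style local linear fit of M on A (no robustness iterations).

    For each spot, the ceil(span * n) spots nearest in A receive tricube
    weights and a weighted straight line is fitted; its value at that
    spot's A is the fitted intensity trend.
    """
    n = len(a_vals)
    k = min(n, max(3, math.ceil(span * n)))
    fitted = []
    for a0 in a_vals:
        # Bandwidth: distance to the k-th nearest spot, slightly widened so
        # that all k neighbours get a positive weight (fallback 1.0 if zero).
        h = sorted(abs(a - a0) for a in a_vals)[k - 1] * 1.0001 or 1.0
        w = [max(0.0, 1 - (abs(a - a0) / h) ** 3) ** 3 for a in a_vals]
        sw = sum(w)
        a_bar = sum(wi * a for wi, a in zip(w, a_vals)) / sw
        m_bar = sum(wi * m for wi, m in zip(w, m_vals)) / sw
        sxx = sum(wi * (a - a_bar) ** 2 for wi, a in zip(w, a_vals))
        sxy = sum(wi * (a - a_bar) * (m - m_bar)
                  for wi, a, m in zip(w, a_vals, m_vals))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))
    return fitted


def loess_normalize(a_vals, m_vals, span=0.4):
    """Normalized log-ratio: M minus the fitted intensity-dependent trend."""
    return [m - f for m, f in zip(m_vals, loess_fit(a_vals, m_vals, span))]


# Made-up data with a purely linear dye bias in A.
a_vals = [6.0 + i for i in range(10)]
m_vals = [0.5 + 0.1 * a for a in a_vals]
normalized = loess_normalize(a_vals, m_vals)
print(max(abs(x) for x in normalized))  # close to 0: the trend is removed
```

A local linear fit reproduces a straight-line trend exactly, so the made-up bias above is removed completely; with real data the span controls how closely the fit follows curvature in the M-A plot.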
Within-slide normalization: adjusting S
To correct for the spatial imbalance that commonly occurs in cDNA microarrays:
-- Adjustment within print-tip groups.
-- 2D-loess: local spatial smoothing. [These are implemented in Bioconductor.]
-- ANOVA adjusting for row and column effects.
-- A median filter to estimate and adjust for the spatial trend; the smoothing element is a 3 × 3 block of spots. [Ref: Wilson et al., Bioinformatics 2003; implemented in the R package tRMA, available at http://www.pi.csiro.au/gena/tRMA/]
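The median-filter idea can be sketched as follows: the spatial trend at each spot is the median of the 3 × 3 block of spots centred on it, and the adjusted value is the log-ratio minus that trend. This illustrates the idea only and is not the tRMA code; the edge handling (using only the neighbours that exist) and the function names are assumptions.

```python
import statistics


def median_filter_3x3(grid):
    """Spatial trend: median of each spot's 3 x 3 neighbourhood.

    `grid` is a list of equal-length rows of log-ratios laid out as on the
    slide. At the edges only the neighbours that exist are used.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    trend = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            block = [grid[r][c]
                     for r in range(max(0, i - 1), min(n_rows, i + 2))
                     for c in range(max(0, j - 1), min(n_cols, j + 2))]
            row.append(statistics.median(block))
        trend.append(row)
    return trend


def spatial_adjust(grid):
    """Subtract the median-filter trend from every spot."""
    trend = median_filter_3x3(grid)
    return [[m - t for m, t in zip(g_row, t_row)]
            for g_row, t_row in zip(grid, trend)]


# Made-up 5 x 5 patch: a constant spatial offset of 1.0 and one outlying spot.
grid = [[1.0] * 5 for _ in range(5)]
grid[2][2] = 5.0
adjusted = spatial_adjust(grid)
print(adjusted[2][2])  # 4.0: the offset is removed, the single spot stands out
```

Because the median ignores a single extreme value in the block, an isolated spot with a real change is not absorbed into the estimated trend, while a smooth regional offset is.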
[Figure: illustration of spatial adjustment, Wilson et al. (2003) and 2D-loess; panels show fitted and normalized values.]
Between-slide normalization: adjusting scale
Here, we are concerned with making the single channels comparable between slides.
Quantile normalization extends the idea of normalizing for equal medians or quartiles: it requires every quantile to be equal across channels, forcing all channels to have the same distribution. This common distribution is estimated by the average of each quantile across all channels.
[Ref: Natalie Thorne and Gordon Smyth have implemented this method in the Bioconductor package limma.]
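A minimal sketch of quantile normalization as described above: sort each channel, average the sorted values across channels to obtain the reference distribution, then give each value the reference value of its rank. Ties are broken by position rather than averaged, which is a simplification; the function name is hypothetical, and for real data the limma implementation should be preferred.

```python
def quantile_normalize(columns):
    """Give every channel (column) the same empirical distribution.

    `columns` is a list of channels, each a list of intensities of equal
    length. Each value is replaced by the mean, across channels, of the
    values that share its rank. Ties are broken by position.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all channels must have the same number of spots")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: average of each quantile across channels.
    reference = [sum(vals) / len(vals) for vals in zip(*sorted_cols)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


cols = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(cols))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```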
Stepwise normalization
Motivation: different slides within an experiment are similar but distinct from each other; we therefore propose a data-specific normalization that avoids overfitting and introducing too much noise.
A
Model        df
Null         0
Median       1
rlm          2
loess        ~5
Print-tips
Model        df
Null         0
Median       1×16
rlm          2×16
loess        5×16
Plate
Model        df
Null         0
Median       1×22
rlm          2×22
loess        5×22
2D-Spatial
Model        df
Null         0
ANOVA        1+60+96
rlm          4
loess        ~20
Med filter   >30
At each step, select the best model based on
$$\mathrm{BIC} = -2\log L + K\log N,$$
where L is the likelihood, K the number of parameters (df) and N the number of observations.
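A sketch of the BIC selection rule at one step, assuming Gaussian errors with the variance set to its maximum-likelihood estimate RSS/N. The candidate fits and their df are supplied by the caller, and `gaussian_bic` and `select_model` are hypothetical names. The example shows how the K log N penalty lets a richer model win only when it reduces the residual variance enough.

```python
import math
import statistics


def gaussian_bic(residuals, k):
    """BIC = -2 log(L) + K log(N) for residuals assumed i.i.d. Gaussian.

    The error variance is set to its maximum-likelihood estimate RSS / N.
    Every candidate also estimates that variance, so it is left out of K,
    which keeps df = 0 for the null model as in the tables.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = max(rss / n, 1e-300)  # guard against log(0) for a perfect fit
    neg2loglik = n * (math.log(2 * math.pi * sigma2) + 1)
    return neg2loglik + k * math.log(n)


def select_model(m_vals, candidates):
    """Pick the candidate with the smallest BIC.

    `candidates` maps a model name to (fitted_values, df).
    """
    scores = {}
    for name, (fitted, df) in candidates.items():
        residuals = [m - f for m, f in zip(m_vals, fitted)]
        scores[name] = gaussian_bic(residuals, df)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up log-ratios with a constant dye bias of about 0.8.
m_vals = [0.8 + 0.05 * (-1) ** i for i in range(40)]
med = statistics.median(m_vals)
candidates = {
    "Null": ([0.0] * len(m_vals), 0),
    "Median": ([med] * len(m_vals), 1),
}
best, scores = select_model(m_vals, candidates)
print(best)  # Median
```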
[Figure: different degrees of spatial adjustment.]
This is an example of a print-tip median normalization
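Print-tip median normalization, the Median row of the Print-tips table (1×16 df), subtracts the median log-ratio of each print-tip group. A minimal sketch, assuming each spot's print-tip label is known from the layout; the function name and data are hypothetical:

```python
import statistics
from collections import defaultdict


def print_tip_median_normalize(m_vals, tips):
    """Subtract the median log-ratio of each print-tip group.

    `tips` gives the print-tip label of every spot (e.g. 0..15 for a
    4 x 4 print head), so one parameter is estimated per group.
    """
    groups = defaultdict(list)
    for m, tip in zip(m_vals, tips):
        groups[tip].append(m)
    medians = {tip: statistics.median(vals) for tip, vals in groups.items()}
    return [m - medians[tip] for m, tip in zip(m_vals, tips)]


m_vals = [1.0, 1.2, 1.4, -0.5, -0.3]
tips = [0, 0, 0, 1, 1]
print(print_tip_median_normalize(m_vals, tips))  # approx. [-0.2, 0.0, 0.2, -0.1, 0.1]
```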
Experimental design of splice arrays
Comparison
Criteria for comparison
It is often hard to use DE genes as the comparison criterion unless we have a set of spike-ins.
Splice arrays, by their construction, can be used to compare different normalization methods.
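Array layout: splicing-specific microarrays
[Figure: probe positions. Intron-containing genes: intron (Int) and exon (Exon) probes on the unspliced transcript (exon 1, intron, exon 2), and a splice-junction (SJ) probe spanning exon 1 and exon 2 after splicing. Intronless genes: Intronless probe.]
Clark, et al., Science 2002, 298:907-910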
Array layout
Probes:
-- ~260 genes examined, using 40-mer oligonucleotides from SJ, Int, Exon and Intronless regions, with 4 replicates for each gene:
~1100 SJ; ~1100 Int; ~1100 Exon; ~800 Intronless.
Print layout: 4 × 4 print tips; 15 × 24 probes per print tip; 5760 probes total.
[Figure legend: probe of interest; used for self-normalization; constantly expressed genes.]
Without using exon information
$$M = \log_2\left(\frac{SJ_{mut}}{SJ_{wt}}\right)$$
[Figure: exon 1/exon 2 splice-junction probes in the mutant (Mt) and wild type (wt).]
Clark, et al., Science 2002, 298:907-910
Assumption: we assume that the probes are close to each other on the slides.
Self normalization and index forming
$$M_{mt} = \text{SJ index} = \log_2\left(\frac{SJ_{mut}/Ex_{mut}}{SJ_{wt}/Ex_{wt}}\right)$$
[Figure: splice-junction (SJ) and exon (Ex) probes on exon 1/exon 2 in the mutant (Mt) and wild type (wt).]
Clark, et al., Science 2002, 298:907-910
Assumption: we assume that the probes are close to each other on the slides.
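Both quantities can be computed from the four probe intensities of one gene. The intensities below are made up to show the difference: the gene's expression halves in the mutant while the fraction of spliced transcript is unchanged, so M suggests a change but the SJ index does not. The function names are hypothetical.

```python
import math


def m_without_exon(sj_mut, sj_wt):
    """M = log2(SJ_mut / SJ_wt): uses the splice-junction probe only."""
    return math.log2(sj_mut / sj_wt)


def sj_index(sj_mut, ex_mut, sj_wt, ex_wt):
    """SJ index = log2((SJ_mut / Ex_mut) / (SJ_wt / Ex_wt)).

    Each channel's junction signal is divided by the exon signal of the
    same gene in the same channel before the mutant/wild-type ratio is taken.
    """
    for x in (sj_mut, ex_mut, sj_wt, ex_wt):
        if x <= 0:
            raise ValueError("intensities must be positive")
    return math.log2((sj_mut / ex_mut) / (sj_wt / ex_wt))


# Made-up intensities: expression halves in the mutant, splicing unchanged.
print(m_without_exon(300.0, 600.0))           # -1.0
print(sj_index(300.0, 600.0, 600.0, 1200.0))  # 0.0
```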
Criteria for comparison
We use the SJ index as the standard and compare the various normalizations to it based on Euclidean distance.
Assume there are no exon probes (or gene-specific controls) on the arrays. This is the case for most experiments: only the probes of interest are spotted (i.e., SJ probes).
For each gene:
$$SJ_{obs} = SJ \cdot C_1, \qquad Ex_{obs} = Ex \cdot C_2.$$
We assume
$$C_1 = C_2 \quad\text{and}\quad E\left\{\log_2\left(\frac{Ex_{MT}}{Ex_{WT}}\right)\right\} = 0.$$
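Under the assumption C1 = C2 for each gene and channel, the multiplicative bias cancels in the SJ index but not in M, which is what makes the SJ index usable as a reference. The sketch below uses made-up signals and bias factors for three genes, then compares unnormalized M and globally median-normalized M with the SJ index by Euclidean distance; all names and numbers are hypothetical.

```python
import math
import statistics


def sj_index(sj_mut, ex_mut, sj_wt, ex_wt):
    """log2((SJ_mut / Ex_mut) / (SJ_wt / Ex_wt))."""
    return math.log2((sj_mut / ex_mut) / (sj_wt / ex_wt))


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Made-up true signals per gene: (SJ_mut, Ex_mut, SJ_wt, Ex_wt).
true = [(300.0, 600.0, 600.0, 600.0),
        (500.0, 500.0, 500.0, 500.0),
        (800.0, 400.0, 400.0, 400.0)]
# Hypothetical bias factors per gene: (C_mut, C_wt), shared by SJ and Ex
# probes of the same gene in the same channel (C1 = C2).
bias = [(2.0, 1.0), (1.0, 0.5), (1.5, 1.5)]

observed = [(sm * cm, em * cm, sw * cw, ew * cw)
            for (sm, em, sw, ew), (cm, cw) in zip(true, bias)]

reference = [sj_index(*g) for g in observed]        # bias cancels here
raw_m = [math.log2(g[0] / g[2]) for g in observed]  # no normalization
median_m = [m - statistics.median(raw_m) for m in raw_m]

print(reference)                                # [-1.0, 0.0, 1.0]
print(euclidean_distance(raw_m, reference))     # about 1.414
print(euclidean_distance(median_m, reference))  # 1.0
```

A normalization method whose M values lie closer to the SJ index is judged better under this criterion; here the median shift is closer than no normalization.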
Normalization methods
No normalization.
Median = global median.
Loess = global loess fit.
PrintTip = print-tip loess.
CSIRO = spatial method proposed by Wilson et al.
VSN = VSN method proposed by Huber et al.
Quantile = quantile normalization (this method adjusts between arrays).
Step = stepwise normalization.
Control spots are essential for validating the assumptions before applying any individual normalization.
Results may change if we consider only control spots.
We look at the average effects for each mutant.
Back to splice array
Which genes to use: Intronless.
Normalization method: print-tip loess.
Step 1: Adjust for potential spatial problems.
Step 2: Since we have the exon information, we performed self normalization and combined the replicates.
Normalization method: self normalization via construction of the SJ index and the IA index:
$$SJ_{mt} = \text{SJ index} = \log_2\left(\frac{SJ_{mut}/Ex_{mut}}{SJ_{wt}/Ex_{wt}}\right)$$
$$IA_{mt} = \text{IA index} = \log_2\left(\frac{IA_{mut}/Ex_{mut}}{IA_{wt}/Ex_{wt}}\right)$$
[Figure: exon 1/exon 2 probes in the mutant (Mt) and wild type (wt).]
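A sketch of step 2 for one gene: compute the SJ and IA indices for each replicate spot and then combine them. The slides do not say how the replicates were combined; taking the median is an assumption made here, as are the dictionary keys, the function names and the made-up intensities.

```python
import math
import statistics


def log_index(probe_mut, ex_mut, probe_wt, ex_wt):
    """log2((P_mut / Ex_mut) / (P_wt / Ex_wt)) for a probe P (SJ or IA)."""
    return math.log2((probe_mut / ex_mut) / (probe_wt / ex_wt))


def combine_replicates(replicates, probe):
    """Median index over the replicate spots of one gene.

    `replicates` is a list of dicts with hypothetical keys such as
    'SJ_mut', 'SJ_wt', 'IA_mut', 'IA_wt', 'Ex_mut', 'Ex_wt'.
    """
    values = [log_index(r[probe + "_mut"], r["Ex_mut"],
                        r[probe + "_wt"], r["Ex_wt"])
              for r in replicates]
    return statistics.median(values)


# Four made-up replicate spots of one gene.
replicates = [
    {"SJ_mut": 400, "SJ_wt": 400, "IA_mut": 200, "IA_wt": 100, "Ex_mut": 800, "Ex_wt": 800},
    {"SJ_mut": 300, "SJ_wt": 300, "IA_mut": 300, "IA_wt": 150, "Ex_mut": 600, "Ex_wt": 600},
    {"SJ_mut": 380, "SJ_wt": 380, "IA_mut": 400, "IA_wt": 100, "Ex_mut": 800, "Ex_wt": 800},
    {"SJ_mut": 200, "SJ_wt": 400, "IA_mut": 200, "IA_wt": 100, "Ex_mut": 800, "Ex_wt": 800},
]
print(combine_replicates(replicates, "SJ"))  # 0.0
print(combine_replicates(replicates, "IA"))  # 1.0
```

The median keeps the single discordant replicate (the fourth spot for SJ, the third for IA) from pulling the combined value, which is one reason it is a common choice for small numbers of replicates.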
Expression data in splice array experiments
Express