# More than 99% of human DNA sequence is the same across the population.
# For a variation to be classified as a SNP, it must occur in at least 1% of the population.
# SNPs, which make up about 90% of all human genetic variation, occur every 100 to 300 bases along the 3-billion-base human genome.
# Two of every three SNPs involve the replacement of cytosine (C) with thymine (T).
# SNPs can occur in both coding (gene) and noncoding regions of the genome.
# Many SNPs have no effect on cell function, but scientists believe others could predispose people to disease or influence their response to a drug.
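The numbers above can be checked with a little arithmetic. This is a minimal R sketch. It assumes the 1% rule refers to the minor allele frequency of a biallelic site, which is the usual convention. `is_common_snp()` is a hypothetical helper.

```r
# Expected number of SNPs in a 3-billion-base genome with one SNP
# every 300 (sparse) to 100 (dense) bases
genome_length <- 3e9
spacing <- c(300, 100)
expected_snps <- genome_length / spacing   # 1e7 to 3e7 SNPs

# Hypothetical helper: a biallelic variant counts as a SNP here if its
# minor allele frequency is at least 1% (assumed reading of the rule)
is_common_snp <- function(allele_counts, threshold = 0.01) {
  freqs <- allele_counts / sum(allele_counts)
  min(freqs) >= threshold
}

is_common_snp(c(C = 1900, T = 100))  # minor allele frequency 0.05   -> TRUE
is_common_snp(c(C = 1995, T = 5))    # minor allele frequency 0.0025 -> FALSE
```

So the spacing quoted above implies roughly 10 to 30 million SNPs genome-wide.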
Saturday, October 27, 2007
Normalization
# Background Correction (Oligonucleotide arrays)
- The array is split into 16 rectangular zones
- Zone background is estimated from the lowest 2% of intensities in each zone
- The background for each probe is computed as a weighted sum of the backgrounds of all zones.
- The corrected probe values are calculated by subtracting the background
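The zone-based steps above can be sketched in R. This is a minimal sketch under several assumptions:

- The zone background is the mean of the lowest 2% of intensities.
- Each zone is weighted by inverse squared distance from its centre, plus a smoothing constant of 100. This is my reading of the MAS 5.0 approach, not a confirmed implementation.
- No noise floor is applied, so corrected values can go negative.

`background_correct()` is a hypothetical name.

```r
# Hypothetical sketch of zone-based background correction
background_correct <- function(intensity, zones_per_side = 4,
                               low_fraction = 0.02, smooth = 100) {
  nr <- nrow(intensity)
  nc <- ncol(intensity)
  stopifnot(nr >= zones_per_side, nc >= zones_per_side)
  # Split rows and columns into bands: 4 x 4 = 16 rectangular zones
  row_band <- cut(seq_len(nr), zones_per_side, labels = FALSE)
  col_band <- cut(seq_len(nc), zones_per_side, labels = FALSE)
  n_zones <- zones_per_side^2
  zone_bg <- zone_r <- zone_c <- numeric(n_zones)
  k <- 0
  for (i in seq_len(zones_per_side)) {
    for (j in seq_len(zones_per_side)) {
      k <- k + 1
      rows <- which(row_band == i)
      cols <- which(col_band == j)
      vals <- sort(intensity[rows, cols])
      n_low <- max(1, floor(low_fraction * length(vals)))
      zone_bg[k] <- mean(vals[seq_len(n_low)])  # mean of the lowest 2%
      zone_r[k] <- mean(rows)                   # zone centre (row)
      zone_c[k] <- mean(cols)                   # zone centre (column)
    }
  }
  # Background per probe: weighted sum of zone backgrounds, with weights
  # normalized to sum to 1 (assumed inverse-squared-distance form)
  probe_r <- row(intensity)
  probe_c <- col(intensity)
  bg <- wsum <- matrix(0, nr, nc)
  for (k in seq_len(n_zones)) {
    w <- 1 / ((probe_r - zone_r[k])^2 + (probe_c - zone_c[k])^2 + smooth)
    bg <- bg + w * zone_bg[k]
    wsum <- wsum + w
  }
  intensity - bg / wsum  # no noise floor here, so values may go negative
}

set.seed(1)
raw <- matrix(runif(400, 50, 500), nrow = 20)  # simulated 20 x 20 array
corrected <- background_correct(raw)
```

The loop runs over the 16 zones rather than over every probe, so the distance weights are computed for the whole array at once.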
# Normalization is necessary because the raw intensities of labeled targets vary among arrays due to sources of experimental variability independent of the level of expression.
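One simple way to remove such array-wide differences is global scaling, sketched minimally in R below. Each array is multiplied so that its trimmed mean hits a common target. The target of 500 and the 2% trim are assumed MAS 5.0-style defaults, and `scale_normalize()` is a hypothetical name.

```r
# Global scaling: multiply each array so that its 2%-trimmed mean
# equals a common target intensity (assumed defaults)
scale_normalize <- function(signal, target = 500, trim = 0.02) {
  # signal: matrix with probe sets in rows and arrays in columns
  trimmed_means <- apply(signal, 2, mean, trim = trim)
  scale_factors <- target / trimmed_means
  sweep(signal, 2, scale_factors, `*`)
}

set.seed(2)
arrays <- matrix(rexp(3000, rate = 1 / 200), ncol = 3)
arrays[, 3] <- arrays[, 3] * 4   # third array is systematically brighter
normalized <- scale_normalize(arrays)
apply(normalized, 2, mean, trim = 0.02)  # 500 500 500
```

Scaling preserves the order of values within an array, so only the overall brightness changes.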
Twin Study
C: the number of concordant pairs
D: the number of discordant pairs
Pairwise concordance = C/(C+D)
Probandwise concordance = 2C/(2C+D)
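Here are both formulas in R, with hypothetical counts. Probandwise concordance counts each concordant pair twice, once per affected twin, so it is never smaller than the pairwise figure.

```r
# C = number of concordant pairs, D = number of discordant pairs
pairwise_concordance <- function(C, D) C / (C + D)
probandwise_concordance <- function(C, D) 2 * C / (2 * C + D)

# Hypothetical counts: 10 concordant and 20 discordant pairs
pairwise_concordance(10, 20)     # 0.3333333
probandwise_concordance(10, 20)  # 0.5
```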
Pedigree Analysis
library(kinship2)  # the pedigree functions of the older 'kinship' package are available in 'kinship2'
# generate example data
id <- 1:14
dadid <- c(NA, NA, 1, 1, 1, 3, 5, NA, NA, 8, 8, NA, NA, 11)
momid <- c(NA, NA, 2, 2, 2, 12, 13, NA, NA, 9, 9, NA, NA, 4)
sex <- c(1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1)       # 1 = male, 2 = female
affected <- c(1, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1)  # affection status
status <- c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0)    # 0 = alive, 1 = deceased
# make it a data frame
ped <- data.frame(id, dadid, momid, sex, affected, status)
# build the pedigree object
pp <- with(ped, pedigree(id = id, dadid = dadid, momid = momid, sex = sex,
                         affected = affected, status = status))
# pedigree plot
plot(pp)
Microarray in general
### cDNA microarray - 2 color
Cy5: Red (experimental, mutant)
Cy3: Green (reference, wild-type)
M = log2(R/G)
A = {log2(R) + log2(G)}/2
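M and A follow directly from the two channel intensities. This minimal R sketch uses hypothetical spot values, with R as Cy5 and G as Cy3 as defined above.

```r
# M = log ratio, A = average log intensity, for each two-color spot
ma_values <- function(R, G) {
  data.frame(M = log2(R / G), A = (log2(R) + log2(G)) / 2)
}

ma <- ma_values(R = c(400, 100, 50), G = c(100, 100, 200))
ma$M  # 2, 0, -2 (4-fold up, unchanged, 4-fold down)
# plot(ma$A, ma$M) draws the usual MA plot
```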
### "Long" oligo arrays
- two color and double-stranded
- 60-80 bp long
### Affymetrix arrays
- one color array and single stranded
- 25 bp
# probe: material that is purposefully placed on the array before the experiment
# target: the material that is gathered from a sample
# hybridization: the target material is applied to the array, and targets bind to their complementary probes
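Hybridization can be illustrated in an idealized form. Because the two strands pair antiparallel, a target can bind a probe when it contains the probe's reverse complement. This minimal R sketch checks only perfect Watson-Crick pairing and ignores mismatches and binding energy. The function names are hypothetical.

```r
# Reverse complement of a DNA sequence (A<->T, C<->G, then reverse)
reverse_complement <- function(seq) {
  comp <- chartr("ACGT", "TGCA", toupper(seq))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Idealized check: does the target contain a perfect complement of the probe?
hybridizes <- function(probe, target) {
  grepl(reverse_complement(probe), toupper(target), fixed = TRUE)
}

hybridizes("AACGTG", "GGCACGTTAA")  # TRUE: contains CACGTT
hybridizes("AACGTG", "GGAACGTGAA")  # FALSE: identical, not complementary
```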
Comparison Analysis (Experimental vs Baseline arrays)
# Compare the difference values (PM-MM) of each probe pair in the baseline array to its matching probe pair on the experimental array.
# Before comparing two arrays, variations between the two experiments caused by technical and biological factors must be corrected by scaling, normalization, or robust normalization.
# Change p-value
Using the PM - MM differences and the PM - background differences, the Change p-value is calculated with the Wilcoxon signed-rank test.
# Change Call
Increase (I): p-value < gamma1
Marginal Increase (MI): gamma1 < p-value < gamma2
No Change (NC): gamma2 < p-value < 1-gamma2
Marginal Decrease (MD): 1-gamma2 < p-value < 1-gamma1
Decrease (D): p-value > 1-gamma1
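The five change calls map onto intervals of the p-value, as in this minimal R sketch. The list above leaves the exact boundary values unassigned, so assigning each boundary to the lower interval is an assumption. The gamma values in the example are illustrative only, and `change_call()` is a hypothetical name.

```r
# Map change p-values to I / MI / NC / MD / D calls
change_call <- function(p, gamma1, gamma2) {
  stopifnot(0 < gamma1, gamma1 < gamma2, gamma2 < 0.5)
  breaks <- c(-Inf, gamma1, gamma2, 1 - gamma2, 1 - gamma1, Inf)
  labels <- c("I", "MI", "NC", "MD", "D")
  # cut() uses right-closed intervals, so a boundary value falls in the lower class
  as.character(cut(p, breaks = breaks, labels = labels))
}

change_call(c(0.001, 0.0028, 0.5, 0.9972, 0.999),
            gamma1 = 0.0025, gamma2 = 0.003)
# "I" "MI" "NC" "MD" "D"
```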
# Signal Log Ratio Algorithm
One-step Tukey's Biweight method
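The one-step Tukey biweight is a weighted mean. Points far from the median, measured in units of the median absolute deviation, get zero weight. The constants `k = 5` and `eps = 1e-4` follow my reading of the Affymetrix algorithm description and are assumptions.

`signal_log_ratio()` is a hypothetical, simplified helper. It assumes you already have background-adjusted, scaled values for each probe pair on both arrays. It skips the idealized-mismatch step.

```r
# One-step Tukey biweight estimate (assumed constants k = 5, eps = 1e-4)
tukey_biweight <- function(x, k = 5, eps = 1e-4) {
  m <- median(x)
  s <- median(abs(x - m))          # median absolute deviation (unscaled)
  u <- (x - m) / (k * s + eps)
  w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)
  sum(w * x) / sum(w)
}

# Hypothetical signal log ratio for one probe set: per-pair log2 ratios
# (experiment vs baseline) summarized robustly
signal_log_ratio <- function(experimental, baseline) {
  tukey_biweight(log2(experimental) - log2(baseline))
}

base <- c(120, 340, 95, 410, 220, 180)
expt <- 2 * base
expt[6] <- 5000                  # one aberrant probe pair
signal_log_ratio(expt, base)     # about 1, i.e. a two-fold increase
```

The aberrant pair gets zero weight, so it does not pull the estimate away from 1 the way a plain mean would.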
Single Array Analysis (Oligonucleotide expression arrays)
# Single stranded DNA, 25 bp
# 14 to 20 probe pairs for each gene
# Each probe pair has a Perfect Match (PM) and a Mismatch (MM) probe
# MAS 4.0 (Average difference) = average of the PM - MM differences
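As stated above, the average difference is the plain mean of PM - MM over a probe set. Here it is in R with hypothetical intensities. `average_difference()` is a hypothetical name, and no outlier handling is attempted.

```r
# Average difference for one probe set: mean of the PM - MM differences
average_difference <- function(pm, mm) mean(pm - mm)

pm <- c(520, 610, 480, 700, 555)
mm <- c(300, 280, 510, 350, 305)
average_difference(pm, mm)  # 224
```

Pair 3 has MM > PM, so its difference (-30) is negative but is still averaged in.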
# Low-level analysis: feature extraction, normalization, computation of expression indexes
# High-level analysis: t-test, ANOVA
# Discrimination Score
R = (PM - MM) / (PM + MM)
# Detection p-value by one-sided Wilcoxon signed-rank test
H0: E(R) = tau (default = 0.015)
Ha: E(R) > tau
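The discrimination score and the one-sided test can be combined with base R's `wilcox.test()`. This is a minimal sketch using the default tau = 0.015 from above. Strictly, the signed-rank test is about the location of R rather than its expectation. The intensities are hypothetical, and `detection_p_value()` is a hypothetical name.

```r
detection_p_value <- function(pm, mm, tau = 0.015) {
  R <- (pm - mm) / (pm + mm)   # discrimination score for each probe pair
  # One-sided signed-rank test: is R located above tau?
  wilcox.test(R, mu = tau, alternative = "greater")$p.value
}

pm <- c(900, 850, 1200, 760, 980, 1100, 640, 1300, 870, 1010, 720)
mm <- c(200, 310, 150, 400, 260, 330, 290, 180, 350, 240, 300)
detection_p_value(pm, mm)  # 1/2^11, about 0.00049: every R exceeds tau
detection_p_value(mm, pm)  # 1: every R is below tau
```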
# Detection Call
Present: p-value <= alpha1
Marginal: alpha1 < p-value <= alpha2
Absent: p-value > alpha2
defaults: alpha1=0.04, alpha2 = 0.06
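The detection call thresholds translate directly into R. `detection_call()` is a hypothetical name, and the defaults are the alpha values listed above.

```r
# Map detection p-values to Present / Marginal / Absent
detection_call <- function(p, alpha1 = 0.04, alpha2 = 0.06) {
  ifelse(p <= alpha1, "Present",
         ifelse(p <= alpha2, "Marginal", "Absent"))
}

detection_call(c(0.0005, 0.05, 0.3))  # "Present" "Marginal" "Absent"
```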
# Signal Algorithm
One-step Tukey's biweight estimate
If PM > MM: informative (MM is used as is)
If PM < MM: uninformative; an imputed value called the Idealized Mismatch (IM) is used in place of MM
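The informative/uninformative split can be sketched in R as follows. The IM rule used here is a simplification I am assuming for illustration: PM divided by 2 raised to the median log2(PM/MM) of the informative pairs. MAS 5.0's actual rule has extra cases. The `delta` floor is also an assumed value. The resulting per-pair values would then be summarized with the one-step biweight. Function names are hypothetical.

```r
# Hypothetical simplified IM: informative pairs keep MM; the others get
# IM = PM / 2^sb, which lies below PM whenever sb > 0
idealized_mismatch <- function(pm, mm) {
  informative <- pm > mm
  sb <- if (any(informative)) median(log2(pm[informative] / mm[informative])) else 0
  ifelse(informative, mm, pm / 2^sb)
}

# Per-pair values: log2(PM - IM), floored at delta (assumed) so the log is defined
probe_values <- function(pm, mm, delta = 2^-20) {
  im <- idealized_mismatch(pm, mm)
  log2(pmax(pm - im, delta))
}

pm <- c(500, 300, 800, 200)
mm <- c(250, 400, 200, 100)
idealized_mismatch(pm, mm)  # 250 150 200 100 (pair 2 is uninformative)
probe_values(pm, mm)
```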