Hosted @RenLab
Protocols used in ChIP-on-chip experiments

Microarray analysis and identification of in vivo DNA binding sites:

Microarray Scanning and analysis

The microarray slides are scanned using a GenePix 4000B scanner from Axon Instruments, and each microarray image is first analyzed with the GenePix Pro 4.0 image analysis software to derive the Cy3 and Cy5 fluorescence intensities and background noise for every spot on the array. The intensities for both the Cy3 and Cy5 channels are first adjusted by subtracting the background intensity of each spot using the formula $I_{\text{channel}} - B_{\text{channel}}$, where channel represents either Cy5 or Cy3. The intensities are further adjusted by subtracting the median intensity of all the blank spots on the array. Any resulting value lower than 10 is automatically raised to 10. A normalization factor is calculated from the intensities of spots that are considered good (more than 65% of pixels have intensities higher than the background intensity plus one standard deviation). The median of the intensity ratios, $I_{635}/I_{532}$, is then used to adjust the Cy3 channel intensity to the same level as that of the Cy5 channel.

The quantitative amplification of small amounts of DNA generates some uncertainty in the values for low-intensity spots. To estimate that uncertainty and to average repeated experiments with appropriate relative weights, a single array error model is used. The significance of a measured ratio at a spot is defined by a statistic X, which is formulated as:

$$X = \frac{a_2 - a_1}{\sqrt{\sigma_1^2 + \sigma_2^2 + f^2 \left(a_1^2 + a_2^2\right)}} \qquad (1)$$
where $a_{1,2}$ are the intensities measured in the two channels for each spot, $\sigma_{1,2}$ are the uncertainties due to background subtraction, and $f$ is a fractional multiplicative error such as would come from hybridization non-uniformities, fluctuations in the dye incorporation efficiency, and scanner gain fluctuations. The distribution of X is found to be close to a Gaussian distribution in experiments where the Cy3 and Cy5 samples are identical. The significance of a change of magnitude X is then calculated using a one-sided probability model as follows:

$$P = 1 - \mathrm{Erf}\left(\frac{X - \mu}{\sigma}\right) \qquad (2)$$
where μ is the average of X, and σ is the standard deviation of X. Since the intensities are normalized, μ should be near 0. The Erf(x) function is the standard normal cumulative distribution function, corresponding to areas under the standard normal curve.

If the Cy3 and Cy5 samples are not identical, the Gaussian distribution can be skewed, because the ChIP can result in many DNA spots with significantly higher intensities in one channel than the other. Since the input DNA is always present, the intensity distribution of the non-enriched spots can be used to obtain the parameters of the X distribution. First, the spots whose X is less than 0 are identified. These spots should lie on the left half of the Gaussian distribution. Their mirror spots are then generated using $X_{+} = -X_{-}$. The mean and standard deviation of these new X values can then be calculated.

When the genomic binding sites of a protein are investigated using the above method, it is routine to perform several independent experiments so that the results are more reliable. To combine replicate data sets, each sample is first analyzed individually using the above single array error model. The average binding ratio and associated P values from these multiple experiments are then calculated using a weighted averaging analysis method. For each spot, the uncertainty in the log(Ratio) is defined as:
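$$\sigma = \frac{\log(a_2/a_1)}{X_{\text{norm}}} \qquad (3)$$

where $a_{1,2}$ are the intensities measured in the two channels for each spot, and $X_{\text{norm}}$ is the normalized X for each spot. The weights for each spot are then defined as

$$w_i = \frac{1}{\sigma_i^2} \qquad (4)$$

The average log(Ratio) is then calculated using the following formula

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \qquad (5)$$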
where n is the total number of experiments, and $x_i$ and $w_i$ are the log(Ratio) and weight, respectively, for a particular spot in each experiment. The weighted average
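
The analysis steps described in this protocol can be sketched in a few lines of Python. This is only an illustrative outline under several assumptions, not the lab's actual software. All function names are hypothetical. X is assumed to take the form (a2 - a1) / sqrt(σ1² + σ2² + f²(a1² + a2²)). The one-sided P value is taken as one minus the standard normal cumulative distribution evaluated at (X - μ)/σ. The standard deviation of the mirrored X values is computed as a population standard deviation. The normalized X passed to the averaging step is assumed to be supplied by the caller, for example as (X - μ)/σ. The base of the logarithm is left to the caller, since the weighted mean is reported in whatever base the log ratios use.

```python
import math
import statistics


def adjust_intensity(raw, background, blank_median, floor=10.0):
    """Subtract the spot background and the blank-spot median; floor at 10."""
    return max(raw - background - blank_median, floor)


def normalization_factor(cy5_good, cy3_good):
    """Median Cy5/Cy3 (I635/I532) ratio over spots flagged as good.

    Multiplying the Cy3 intensities by this factor brings them to the
    Cy5 level.
    """
    return statistics.median(c5 / c3 for c5, c3 in zip(cy5_good, cy3_good))


def x_statistic(a1, a2, sigma1, sigma2, f):
    """Assumed form: X = (a2 - a1) / sqrt(s1^2 + s2^2 + f^2 (a1^2 + a2^2))."""
    return (a2 - a1) / math.sqrt(
        sigma1**2 + sigma2**2 + f**2 * (a1**2 + a2**2)
    )


def mirrored_null_parameters(x_values):
    """Estimate (mu, sigma) of X from the negative half and its mirror image."""
    negatives = [x for x in x_values if x < 0]
    if not negatives:
        raise ValueError("no spots with X < 0 to mirror")
    mirrored = negatives + [-x for x in negatives]  # X+ = -X-
    mu = statistics.fmean(mirrored)
    # Population standard deviation is an assumption of this sketch.
    sigma = statistics.pstdev(mirrored, mu)
    return mu, sigma


def one_sided_p(x, mu, sigma):
    """Upper-tail probability of a normal distribution with mean mu, sd sigma."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def weighted_mean_log_ratio(log_ratios, x_norms):
    """Weighted average of replicate log ratios with w_i = 1 / sigma_i^2,

    where sigma_i = log_ratio_i / x_norm_i. Assumes every log ratio and
    every normalized X is nonzero.
    """
    weights = [1.0 / (lr / xn) ** 2 for lr, xn in zip(log_ratios, x_norms)]
    return sum(w * lr for w, lr in zip(weights, log_ratios)) / sum(weights)
```

For a single array, one would typically compute `x_statistic` for every spot, pass the resulting list to `mirrored_null_parameters`, and then call `one_sided_p` per spot. Replicates with a log ratio of exactly zero would need separate handling, because their weight is undefined in this sketch.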