|
Motivation:
Identifying patterns of co-expression in microarray data by cluster
analysis has been a productive approach to uncovering molecular mechanisms
underlying biological processes under investigation. Using experimental
replicates can generally improve the precision of the cluster analysis by
reducing the experimental variability of measurements made under different
experimental conditions. In such situations, Bayesian mixtures allow for
an efficient use of information by precisely modeling between-replicate
variability.
Results:
We developed a Bayesian mixture-based procedure for clustering
gene expression data with experimental replicates. In this approach,
a Bayesian mixture model is extended to accommodate experimental
replicates. Clusters of co-expressed genes are created from the
posterior distribution of clusterings, which is estimated by a
Gibbs sampler. Previously we established that this approach to
clustering microarray data with experimental replicates outperforms
alternatives based on traditional clustering methods.
The utility of both finite and infinite mixture models in this setting
was investigated. By analyzing synthetic and real-world
datasets we established that precise modeling of intra-gene
variability is important for accurate identification of
co-expressed genes. Such modeling is possible only when replicated
data are available. We also introduce a heuristic modification to the
Gibbs sampler based on the "reverse annealing" principle. This
modification effectively overcame the tendency of the Gibbs
sampler to converge to different modes of the posterior distribution
when started from different initial positions in high experimental
variability situations. Finally, we demonstrate that the Bayesian
infinite mixture model with "elliptical" variance structure
is capable of identifying the underlying structure of the data
without knowing the "correct" number of clusters.
|