Breaking the curse of dimensionality in genomic data
Rapid technological advances now permit RNA sequencing at the single-cell level, from thousands of cells in parallel, accelerating progress in the biomedical sciences. But quantifying RNA from such tiny amounts of material poses great technical challenges. Even small errors across a large number of genes can quickly add up, so that any useful information is lost in the noise.
Now, a team from the Kyoto University Institute for Advanced Study of Human Biology (WPI-ASHBi) has developed a new mathematical method that can eliminate the noise and thus enable the extraction of clear signals from single-cell RNA sequencing data. The new method successfully decreases random sampling noise in the data to enable a precise and complete understanding of a cell's activity. The research has recently been published in the journal Life Science Alliance.
The lead author of the paper, Yusuke Imoto from ASHBi, explains, "Each gene represents a different dimension in RNA sequencing data, which means that tens of thousands of dimensions must be collected across multiple cells and analyzed. Even the slightest noise in one dimension can majorly impact the downstream data analyses so that potentially important signals are lost. This is why we call this the 'curse of dimensionality.'"
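The effect described here can be illustrated with a small simulation. The sketch below is not RECODE and is not taken from the paper. It assumes two hypothetical cells that truly differ in only 10 genes, and it adds independent Gaussian noise to every gene, which is a simplification of the count-based sampling noise in real sequencing data. The function name `noisy_distance` and all parameter values are illustrative assumptions. As the number of measured genes grows, the noise contributed by each extra dimension accumulates in the distance between the cells and eventually swamps the true difference.

```python
import math
import random


def noisy_distance(n_genes, n_signal=10, signal=1.0, noise_sd=0.3, seed=0):
    """Euclidean distance between two simulated cells.

    The cells truly differ by `signal` in the first `n_signal` genes only.
    Every gene in both cells also receives independent Gaussian noise
    (an assumed noise model, chosen for simplicity).
    """
    rng = random.Random(seed)
    total = 0.0
    for g in range(n_genes):
        true_diff = signal if g < n_signal else 0.0
        # Difference of the two cells' independent noise terms for this gene.
        noise_diff = rng.gauss(0.0, noise_sd) - rng.gauss(0.0, noise_sd)
        total += (true_diff + noise_diff) ** 2
    return math.sqrt(total)


# Distance the two cells would have with no noise at all.
true_distance = math.sqrt(10 * 1.0 ** 2)

for n_genes in (10, 100, 1_000, 20_000):
    d = noisy_distance(n_genes)
    print(f"{n_genes:>6} genes: observed {d:7.2f} (noise-free {true_distance:.2f})")
```

With a few genes the observed distance stays close to the noise-free value. With tens of thousands of genes it is dominated by accumulated noise, even though the noise in any single gene is small.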
To break the curse of dimensionality, the Kyoto team has developed a new noise reduction method, RECODE (short for "resolution of the curse of dimensionality"), to remove the random sampling noise from single-cell RNA sequencing data. RECODE applies high-dimensional statistical theory to recover accurate results, even for genes expressed at very low levels.
First, the team tested their method on data from a well-studied cell population, human peripheral blood. They confirmed that RECODE successfully resolves the curse of dimensionality, revealing expression patterns for individual genes that are close to their expected values.
Next, when compared against other state-of-the-art analysis methods, RECODE outperformed them, giving more accurate representations of gene activation. Moreover, RECODE is simpler to use than other methods: it is parameter-free and does not rely on machine learning.
Finally, the team tested RECODE on a complex dataset from mouse embryo cells containing many different types of cells with unique gene expression patterns. Whereas other methods blurred the results, RECODE clearly resolved gene expression levels, even for rare cell types.
Yusuke Imoto et al., Resolution of the curse of dimensionality in single-cell RNA sequencing data analysis, Life Science Alliance, 9-Aug-2022