Method Descriptions

The methods used by Scopecell are briefly introduced below, each with a workflow diagram or an experimental result figure so that you can quickly understand how the method works. The images are taken from the article corresponding to each method. To learn more about a method, follow the hyperlink to its GitHub home page.

scPROTEIN

scPROTEIN consists of peptide uncertainty estimation based on a multitask heteroscedastic regression model and cell embedding generation based on graph contrastive learning. scPROTEIN can estimate the uncertainty of peptide quantification, denoise protein data, remove batch effects and encode single-cell proteomic-specific embeddings in a unified framework.

Figures imported from Wei Li, et al. scPROTEIN: a versatile deep graph contrastive learning framework for single-cell proteomics embedding, Nature Methods, 2024.

MAGIC

MAGIC takes an observed count matrix and recovers an imputed count matrix representing the likely expression for each individual cell, based on data diffusion between similar cells. For a given cell, MAGIC first identifies the cells that are most similar and aggregates gene expression across these highly similar cells to impute gene expression that corrects for dropout and other sources of noise.

Figures imported from David van Dijk, et al. Recovering Gene Interactions from Single-Cell Data Using Data Diffusion, Cell 174, 2018.
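
The diffusion idea described above can be illustrated with a minimal NumPy sketch. This is not the MAGIC package or its API: the function name `diffusion_impute`, the binary k-nearest-neighbour graph, and the parameters `k` and `t` are assumptions chosen for illustration, and the published method uses a more refined adaptive kernel.

```python
import numpy as np

def diffusion_impute(X, k=2, t=2):
    """Toy diffusion smoothing over a kNN graph (hypothetical helper, not MAGIC itself)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between cells (rows of X).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Connect each cell to itself and its k most similar cells.
    idx = np.argsort(d, axis=1)[:, :k + 1]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k + 1), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)  # make the graph symmetric
    # Row-normalise into a Markov transition matrix.
    M = A / A.sum(axis=1, keepdims=True)
    # Diffuse for t steps, then average expression across similar cells.
    return np.linalg.matrix_power(M, t) @ X

# Two groups of cells; cell 1 has a dropout (0) in the third gene.
X_demo = np.array([
    [10.0, 10.0, 10.0],
    [10.0, 10.0, 0.0],
    [10.0, 9.0, 10.0],
    [0.0, 0.0, 50.0],
    [0.0, 1.0, 50.0],
    [1.0, 0.0, 50.0],
])
imputed = diffusion_impute(X_demo, k=2, t=2)
print(np.round(imputed, 2))
```

In this toy input, the zero in cell 1 is replaced by a positive value borrowed from its two nearest neighbours, while the second group of cells is left essentially untouched because it is not connected to the first group in the graph.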

BBKNN

BBKNN is a fast and intuitive batch effect removal tool. It actively combats batch effects by identifying, for each cell, a (smaller) set of k nearest neighbours in each batch separately, rather than in the dataset as a whole. The nearest neighbours from each batch are then merged into a final neighbour list for the cell.

Figures imported from Krzysztof Polański, et al. BBKNN: fast batch alignment of single cell transcriptomes, Bioinformatics 36, 2020.
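
The per-batch neighbour search described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the BBKNN package: the function name `batch_balanced_neighbours` is hypothetical, distances are plain Euclidean, and excluding a cell from its own neighbour list is a simplifying choice made here.

```python
import numpy as np

def batch_balanced_neighbours(X, batches, k=2):
    """For each cell, find k nearest neighbours within every batch and merge them.

    Hypothetical helper for illustration; not the BBKNN API.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = []
    for i in range(X.shape[0]):
        merged = []
        for b in np.unique(batches):
            # Candidate neighbours: cells of batch b, excluding the cell itself.
            members = np.flatnonzero(batches == b)
            members = members[members != i]
            # Keep the k closest candidates from this batch only.
            closest = members[np.argsort(d[i, members])[:k]]
            merged.extend(closest.tolist())
        neighbours.append(merged)
    return neighbours

# Two batches measuring the same population, offset by a batch effect.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.2]])
batch_demo = np.array(["a", "a", "a", "b", "b", "b"])
nbrs = batch_balanced_neighbours(X_demo, batch_demo, k=2)
for cell, lst in enumerate(nbrs):
    print(cell, lst)
```

An ordinary k-nearest-neighbour search on this toy data would only link cells within the same batch. Searching each batch separately guarantees that every cell's final neighbour list contains cells from both batches.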

AutoClass

AutoClass is a deep learning-based method for thorough cleaning of scRNA-Seq data. It integrates two neural network components, an autoencoder and a classifier. This composite network architecture is essential for filtering out noise and retaining signal effectively. Unlike many other scRNA-Seq imputation methods, AutoClass does not rely on any distribution assumption and fully accounts for the non-linear interactions between genes.

Figures imported from Hui Li, et al. A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data, Nature Communications 13, 2022.

Harmony

Harmony is an algorithm for robust, scalable and flexible multi-dataset integration. It is designed to meet four key challenges of unsupervised scRNA-seq joint embedding: scaling to large datasets, identification of both broad populations and fine-grained subpopulations, flexibility to accommodate complex experimental design, and the power to integrate across modalities.

Figures imported from Ilya Korsunsky, et al. Fast, sensitive and accurate integration of single-cell data with Harmony, Nature Methods 16, 2019.

Scanorama

Scanorama is an effective tool for combining multiple scRNA-seq datasets, addressing technical variation introduced by differences in sample preparation, sequencing depth, and experimental batches that can confound the analysis of diverse datasets.

Figures imported from Brian Hie, et al. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama, Nature Biotechnology 37, 2019.

KNN-combat

KNN-combat is a combined method that addresses both missing values and batch effects in data. First, it uses K-nearest neighbor (KNN) imputation to fill missing values so that the data matrix is complete. Second, the ComBat method is applied to the completed data to correct systematic errors introduced by experimental batches while preserving biological differences between samples.

Figures adapted from the combined workflow principles of ComBat (Johnson et al., 2007) and KNN imputation literature.
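
The two-step workflow described above can be sketched with NumPy as follows. This is a simplified sketch under stated assumptions, not Scopecell's implementation: `knn_impute` and `center_batches` are hypothetical helper names, and the second step only shifts each batch to a common mean. ComBat itself also adjusts batch-specific scale and shrinks the batch parameters with empirical Bayes, which is omitted here.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs with the mean of the k nearest rows that observe the feature."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n = X.shape[0]
    for i in range(n):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        # Distance to every other row, using only features observed in both rows.
        dists = np.full(n, np.inf)
        for j in range(n):
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if j != i and shared.any():
                dists[j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
        order = np.argsort(dists)
        for f in np.flatnonzero(missing):
            donors = [j for j in order
                      if np.isfinite(dists[j]) and not np.isnan(X[j, f])][:k]
            if donors:  # the value stays NaN if no row can donate
                filled[i, f] = X[donors, f].mean()
    return filled

def center_batches(X, batches):
    """Location-only batch adjustment (a simplified stand-in for ComBat)."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    grand_mean = X.mean(axis=0)
    out = X.copy()
    for b in np.unique(batches):
        mask = batches == b
        # Shift every feature of this batch onto the overall mean.
        out[mask] = X[mask] - X[mask].mean(axis=0) + grand_mean
    return out

X_demo = np.array([
    [1.0, 2.0, 3.0],
    [1.1, np.nan, 3.1],
    [0.9, 2.1, 2.9],
    [6.0, 7.0, 8.0],
    [6.1, 7.1, np.nan],
    [5.9, 6.9, 8.1],
])
batch_demo = np.array([0, 0, 0, 1, 1, 1])

filled = knn_impute(X_demo, k=2)          # step 1: complete the matrix
corrected = center_batches(filled, batch_demo)  # step 2: remove batch shift
print(np.round(corrected, 3))
```

The order of the steps matters in this sketch: the batch adjustment computes per-batch means, so it needs a matrix without missing values.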

Squidpy

Squidpy is an open-source Python library designed for the analysis and visualization of spatial omics data, built upon the Scanpy ecosystem. It enables the integration of molecular profiles with spatial coordinates to construct spatial neighborhood graphs and extract image features. Squidpy provides a comprehensive suite of spatial statistics, including Moran’s I, neighborhood enrichment, co-occurrence, and ligand–receptor interaction analysis, allowing systematic characterization of cellular spatial organization.

Figure imported from Palla, G., et al. Squidpy: a scalable framework for spatial omics analysis, Nature Methods 19, 2022.
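
To make one of the spatial statistics mentioned above concrete, the sketch below computes Moran's I on a k-nearest-neighbour spatial graph using only NumPy. It does not call Squidpy: the helper names `knn_spatial_weights` and `morans_i` are hypothetical, and the binary symmetric weights are an assumption made for this illustration. The statistic follows the standard definition $I = \frac{n}{\sum_{ij} w_{ij}} \cdot \frac{\sum_{ij} w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$.

```python
import numpy as np

def knn_spatial_weights(coords, k=2):
    """Binary, symmetric kNN weight matrix built from spatial coordinates."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a spot is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)

def morans_i(values, W):
    """Moran's I of one feature given a spatial weight matrix W."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Six spots on a line; one feature is spatially clustered, the other alternates.
coords_demo = np.array([[i, 0.0] for i in range(6)])
W_demo = knn_spatial_weights(coords_demo, k=2)
i_clustered = morans_i([1, 1, 1, 5, 5, 5], W_demo)
i_alternating = morans_i([1, 5, 1, 5, 1, 5], W_demo)
print(round(i_clustered, 3), round(i_alternating, 3))
```

A positive value indicates that neighbouring spots tend to have similar expression, while a negative value indicates that neighbours tend to differ. The clustered feature yields a positive score and the alternating feature a negative one.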