library(Canek)
# Functions
## Function to plot the pca coordinates
plotPCA <- function(pcaData = NULL, label = NULL, legPosition = "topleft"){
  # Integer codes of the factor levels are used as plotting colors
  col <- as.integer(label)
  plot(x = pcaData[,"PC1"], y = pcaData[,"PC2"],
       col = col, cex = 0.75, pch = 19,
       xlab = "PC1", ylab = "PC2")
  # Level k is drawn with color k, so the legend colors follow the level order
  legend(legPosition, pch = 19,
         legend = levels(label),
         col = seq_along(levels(label)))
}
In this toy example we use the two simulated batches included in the SimBatches data from Canek’s package. SimBatches is a list containing:
batches: Simulated scRNA-seq datasets with genes (rows) and cells (columns). Simulations were performed using Splatter.
cell_types: A factor containing the cell-type labels of the batches.
lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
batch <- factor(c(rep("Batch-1", ncol(lsData[[1]])),
                  rep("Batch-2", ncol(lsData[[2]]))))
celltype <- SimBatches$cell_types
table(batch)
#> batch
#> Batch-1 Batch-2
#> 631 948
table(celltype)
#> celltype
#> Cell Type 1 Cell Type 2 Cell Type 3 Cell Type 4
#> 1451 53 38 37
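The batch factor above is written out by hand for two batches. As a minimal sketch, the same kind of labels could be derived from any named list of matrices. The helper name makeBatchLabels and the toy matrices are hypothetical and are not part of Canek.

```r
# Hypothetical helper (not part of Canek): one batch label per cell (column),
# in the same order as the columns of Reduce(cbind, lsData).
makeBatchLabels <- function(lsData) {
  nCells <- vapply(lsData, ncol, integer(1))
  factor(rep(names(lsData), times = nCells), levels = names(lsData))
}

# Toy stand-ins for real batches: 5 genes, with 3 and 4 cells.
toy <- list("Batch-1" = matrix(0, nrow = 5, ncol = 3),
            "Batch-2" = matrix(0, nrow = 5, ncol = 4))
toyBatch <- makeBatchLabels(toy)
table(toyBatch)
```

Deriving the labels from the list keeps them aligned with the column order of the joined matrix when batches are added or reordered.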
We run Principal Component Analysis (PCA) on the joined datasets and draw a scatter plot of the first two principal components (PCs). The batch effect causes cells to group by batch.
data <- Reduce(cbind, lsData)
pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "bottomleft")
plotPCA(pcaData = pcaData, label = celltype, legPosition = "bottomleft")
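To see why a batch effect makes cells group by batch in PCA, the following sketch simulates the situation on toy data. It assumes a very simple batch effect (a constant gene-wise shift applied to one batch), which is only an illustration and not how SimBatches was generated. All object names are hypothetical.

```r
set.seed(1)
nGenes <- 50

# Two toy batches drawn from the same distribution (genes x cells).
b1 <- matrix(rnorm(nGenes * 30), nrow = nGenes)
b2 <- matrix(rnorm(nGenes * 40), nrow = nGenes)

# Assumed batch effect: each gene gets one shift, added to every cell of batch 2.
b2 <- b2 + rnorm(nGenes, mean = 0, sd = 2)

toyData <- cbind(b1, b2)
toyBatch <- factor(rep(c("Batch-1", "Batch-2"), times = c(30, 40)))

# Same PCA call as in the vignette: cells as rows, genes as columns.
toyPca <- prcomp(t(toyData), center = TRUE, scale. = TRUE)

# Share of the total variance captured by the first two PCs.
varExplained <- toyPca$sdev^2 / sum(toyPca$sdev^2)
round(varExplained[1:2], 2)

# Mean PC1 coordinate per batch: well-separated means indicate grouping by batch.
pc1Means <- tapply(toyPca$x[, "PC1"], toyBatch, mean)
pc1Means
```

Because the shift is shared by all cells of one batch, it becomes the dominant direction of variation, so PC1 mostly separates the batches.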
We correct the toy batches using the function RunCanek, which accepts several input types. In this example we use the list of matrices created before.
data <- RunCanek(lsData)
We perform PCA of the corrected datasets and plot the first two PCs. After correction, the cells group by their corresponding cell type.
pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "topleft")
plotPCA(pcaData = pcaData, label = celltype, legPosition = "topleft")
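The visual impression from the plots can be complemented with a number. The sketch below computes the fraction of the variance in a set of coordinates that lies between group means, a simple between-group sum-of-squares ratio. This is not a metric provided by Canek; the helper name groupVarFraction and the toy coordinates are hypothetical.

```r
# Hypothetical helper (not part of Canek): fraction of the total variance in the
# coordinates that lies between the group means (0 = none, 1 = all).
groupVarFraction <- function(coords, label) {
  centered <- scale(as.matrix(coords), center = TRUE, scale = FALSE)
  totalSS <- sum(centered^2)
  # Replace every point by its group mean, column by column.
  groupMeans <- apply(centered, 2, function(v) ave(v, label))
  sum(groupMeans^2) / totalSS
}

# Toy coordinates: two clusters separated along PC1.
set.seed(1)
toyCoords <- cbind(PC1 = c(rnorm(20, mean = -3), rnorm(20, mean = 3)),
                   PC2 = rnorm(40))
separated <- factor(rep(c("A", "B"), each = 20))   # label matches the clusters
mixed     <- factor(rep(c("A", "B"), times = 20))  # label is spread over both

fracSeparated <- groupVarFraction(toyCoords, separated)
fracMixed     <- groupVarFraction(toyCoords, mixed)
c(separated = fracSeparated, mixed = fracMixed)
```

With the objects from this vignette, one could compare groupVarFraction(pcaData[, c("PC1", "PC2")], batch) before and after correction. A lower value after RunCanek would be consistent with the batches being better mixed.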
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#>
#> Matrix products: default
#> BLAS/LAPACK: /Users/martin/miniconda3/envs/Canek/lib/libopenblasp-r0.3.18.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] Canek_0.2.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8.3 highr_0.9 DEoptimR_1.0-11
#> [4] bslib_0.3.1 compiler_4.1.3 bluster_1.4.0
#> [7] jquerylib_0.1.4 class_7.3-20 prabclus_2.3-2
#> [10] BiocNeighbors_1.12.0 numbers_0.8-2 tools_4.1.3
#> [13] mclust_5.4.9 digest_0.6.29 jsonlite_1.8.0
#> [16] evaluate_0.15 lattice_0.20-45 pkgconfig_2.0.3
#> [19] rlang_1.0.2 Matrix_1.4-1 igraph_1.3.0
#> [22] cli_3.2.0 rstudioapi_0.13 yaml_2.3.5
#> [25] parallel_4.1.3 xfun_0.30 fastmap_1.1.0
#> [28] stringr_1.4.0 knitr_1.38 cluster_2.1.3
#> [31] sass_0.4.1 S4Vectors_0.32.4 fpc_2.2-9
#> [34] diptest_0.76-0 nnet_7.3-17 stats4_4.1.3
#> [37] grid_4.1.3 robustbase_0.95-0 R6_2.5.1
#> [40] flexmix_2.3-17 BiocParallel_1.28.3 rmarkdown_2.13
#> [43] irlba_2.3.5 kernlab_0.9-30 magrittr_2.0.3
#> [46] matrixStats_0.61.0 modeltools_0.2-23 MASS_7.3-56
#> [49] htmltools_0.5.2 BiocGenerics_0.40.0 stringi_1.7.6
#> [52] FNN_1.1.3