Running a simulation based on unpaired expression and spatial data
Source:vignettes/Rpackage_unpaired.Rmd
Rpackage_unpaired.Rmd
This tutorial will run a simulation with sCCIgen
without
the interactive interface based on unpaired expression and spatial
data.
2 Load and clean sample data
Download sample data from https://github.com/songxiaoyu/sCCIgen_data/tree/main/input_data.
# This is a subset of MERFISH OV cancer dataset with 1000 cells' expression and 500 cells spatial map.
load("MERFISH_OV_2025_expr.Rdata")
load("MERFISH_OV_2025_spatial.Rdata")
anno <- colnames(expr)
dim(expr)
# [1] 550 248065
dim(spatial)
# [1] 209173 3
expr[1:3, 1:3]
# Others Others Others
# PDK4 0 0 0
# CCL26 0 0 0
# CX3CL1 0 0 0
spatial[1:3, ]
# Labels center_x center_y
# 1 Others 9483.241 -48.53505
# 2 Tumor 9489.692 -42.11135
# 3 Tumor 9497.020 -37.24292
# Note: Number of cells in expression data and spatial data are not the same, as they are used to mimic unpaired data from different experiments.
3 Analysis of the existing data to provide insights into the parameters of the simulation
This is part is to fit the expression data. When sim_method=“copula”, it will fit both the gene marginal distribution and gene-gene correlation. When sim_method=“ind”, it will only fit the gene marginal distribution.
Note: If the number of genes or cells are large, model fitting may take some time. It is suggested to select a reasonable sample size (e.g. <2500 per cell type) before the model fitting, as more cells may not be needed improve the estimation. Similarly, if some genes are extremely zero-inflated, narrowing the simulation to reasonably variable genes is an option.
3.1 Task 1: Estimate model parameters from the unpaired data for simulation
# model fitting
ModelEst <- Est_ModelPara(expr = expr,
anno = anno,
sim_method = "ind",
ncores = 10)
saveRDS(ModelEst, file = "Unpaired_fit_wo_cor.RDS")
4 Create a parameter file
Users need to create a parameter file. The sample parameter file for snRNAseq based simulation is here for downloading and filling in to perform simulations.
5 Perform the entire simulation and save the results
Assuming you already have a parameter file, you can run the entire simulation using codes like this:
# Simulate default data - using existing cells but simulate expression with ground truth
input <- "MERFISH_unpaired_default_param.yml"
model_param_path <- "Unpaired_fit_wo_cor.RDS"
ParaSimulation(input = input, ModelFitFile = model_param_path)