Skip to contents

This tutorial will run a simulation with sCCIgen without the interactive interface based on unpaired expression and spatial data.

1 Load R package

2 Load and clean sample data

Download sample data from https://github.com/songxiaoyu/sCCIgen_data/tree/main/input_data.

# This is a subset of MERFISH OV cancer dataset with 1000 cells' expression and 500 cells spatial map.
load("MERFISH_OV_2025_expr.Rdata")
load("MERFISH_OV_2025_spatial.Rdata")
anno <- colnames(expr)

dim(expr)
# [1]    550 248065
dim(spatial)
# [1] 209173      3

expr[1:3, 1:3]
#        Others Others Others
# PDK4        0      0      0
# CCL26       0      0      0
# CX3CL1      0      0      0
spatial[1:3, ]
#   Labels center_x  center_y
# 1 Others 9483.241 -48.53505
# 2  Tumor 9489.692 -42.11135
# 3  Tumor 9497.020 -37.24292

# Note: Number of cells in expression data and spatial data are not the same, as they are used to mimic unpaired data from different experiments.

3 Analysis of the existing data to provide insights into the parameters of the simulation

This is part is to fit the expression data. When sim_method=“copula”, it will fit both the gene marginal distribution and gene-gene correlation. When sim_method=“ind”, it will only fit the gene marginal distribution.

Note: If the number of genes or cells are large, model fitting may take some time. It is suggested to select a reasonable sample size (e.g. <2500 per cell type) before the model fitting, as more cells may not be needed improve the estimation. Similarly, if some genes are extremely zero-inflated, narrowing the simulation to reasonably variable genes is an option.

3.1 Task 1: Estimate model parameters from the unpaired data for simulation

# model fitting
ModelEst <- Est_ModelPara(expr = expr, 
                          anno = anno, 
                          sim_method = "ind", 
                          ncores = 10)

saveRDS(ModelEst, file = "Unpaired_fit_wo_cor.RDS")

4 Create a parameter file

Users need to create a parameter file. The sample parameter file for snRNAseq based simulation is here for downloading and filling in to perform simulations.

5 Perform the entire simulation and save the results

Assuming you already have a parameter file, you can run the entire simulation using codes like this:

# Simulate default data - using existing cells but simulate expression with ground truth
input <- "MERFISH_unpaired_default_param.yml"
model_param_path <- "Unpaired_fit_wo_cor.RDS"


ParaSimulation(input = input, ModelFitFile = model_param_path)