This tutorial runs a simulation with sCCIgen from snRNAseq data, without using the interactive interface.

1 Load R package

library(sCCIgen)

2 Load and clean sample data

Download sample data from https://github.com/songxiaoyu/sCCIgen_data/tree/main/input_data

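# Load the sample data; this creates the expression matrix `expr`
load("snRNAseq_breast_2025_expr.Rdata")

# Rows are genes and columns are cells
dim(expr)
#[1] 4751 5990
expr[1:3, 1:3]
#        Epithelial Adipocyte Adipocyte
# NOC2L           0         0         0
# KLHL17          1         0         0
# ISG15           0         0         0

# Cell type annotations are stored as the column names
anno <- colnames(expr)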
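
Because the cell type labels are carried as column names, it can help to tabulate them before fitting to see how many cells each type contributes. The sketch below uses a small toy matrix (`expr_toy` and `anno_toy` are hypothetical stand-ins, not the tutorial data) so that it runs without the sample file; with the real data, the same `table()` call would be applied to `anno`.

```r
# Toy stand-in for `expr`: 3 genes x 5 cells, cell types as column names.
# The values and the number of cells are made up for illustration only.
expr_toy <- matrix(
  0L, nrow = 3, ncol = 5,
  dimnames = list(
    c("NOC2L", "KLHL17", "ISG15"),
    c("Epithelial", "Adipocyte", "Adipocyte", "Epithelial", "Adipocyte")
  )
)

# Annotation vector, extracted the same way as in the tutorial
anno_toy <- colnames(expr_toy)

# There should be exactly one label per cell (column)
stopifnot(length(anno_toy) == ncol(expr_toy))

# Number of cells per cell type
cell_counts <- table(anno_toy)
print(cell_counts)
```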

3 Analyze the existing data to inform the simulation parameters

To expedite simulations, users can split an sCCIgen simulation into two steps: (1) model fitting and (2) simulation using the fitted model and user-provided parameters.

This is especially helpful when the number of genes and/or cells is very large and users want to run the simulation more than once. By splitting the simulation into these two steps, users estimate the model parameters only once and save the results for repeated use.

3.1 Task 1: Estimate model parameters from the snRNAseq for simulation

This part fits the expression data. When sim_method = "copula", it fits both the gene marginal distributions and the gene-gene correlations. When sim_method = "ind", it fits only the gene marginal distributions.

Note: If the number of genes or cells is large, model fitting may take some time. We suggest selecting a reasonable sample size (e.g. <2500 cells per cell type) before model fitting, as additional cells may not improve the estimation. Similarly, if some genes are extremely zero-inflated, narrowing the simulation to reasonably variable genes is an option.
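
The subsampling and gene filtering suggested in the note can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not part of sCCIgen: the toy matrix, the cap `max_cells`, and the threshold `max_zero_prop` are all hypothetical, and the zero proportion is used here only as a simple stand-in for "extremely zero-inflated".

```r
# Toy stand-in for `expr` (genes x cells), cell types as column names.
set.seed(1)
expr_toy <- matrix(
  rpois(6 * 10, lambda = 0.5), nrow = 6,
  dimnames = list(
    paste0("gene", 1:6),
    rep(c("Epithelial", "Adipocyte"), times = c(7, 3))
  )
)
anno_toy <- colnames(expr_toy)

max_cells     <- 5    # hypothetical cap per cell type (the note suggests < 2500)
max_zero_prop <- 0.9  # hypothetical cutoff for the fraction of zero counts

# Keep at most `max_cells` randomly chosen cells from each cell type
keep_cells <- unlist(
  lapply(split(seq_along(anno_toy), anno_toy), function(idx) {
    if (length(idx) > max_cells) sample(idx, max_cells) else idx
  }),
  use.names = FALSE
)
keep_cells <- sort(keep_cells)

expr_sub <- expr_toy[, keep_cells, drop = FALSE]
anno_sub <- anno_toy[keep_cells]

# Drop genes whose fraction of zero counts exceeds the cutoff
zero_prop <- rowMeans(expr_sub == 0)
expr_sub  <- expr_sub[zero_prop <= max_zero_prop, , drop = FALSE]
```

With the real data, `expr_sub` and `anno_sub` would take the place of `expr` and `anno` in the model fitting call.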

# model fitting
ModelEst <- Est_ModelPara(expr = expr, 
                          anno = anno, 
                          sim_method = "copula", 
                          ncores = 10)

saveRDS(ModelEst, file = "snRNAseq_breast_2025_fit_w_cor.RDS")
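
To avoid refitting on every run, the saved fit can be reloaded with `readRDS()`. The helper below is a hypothetical convenience wrapper (`fit_or_load` is not an sCCIgen function): it returns the saved object if the file exists and otherwise calls the supplied fitting function and saves its result.

```r
# Hypothetical helper: reuse a saved fit if present, otherwise fit and save.
# `fit_fun` is any zero-argument function that returns the fitted object.
fit_or_load <- function(path, fit_fun) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  fit <- fit_fun()
  saveRDS(fit, file = path)
  fit
}

# Intended usage with the tutorial's objects (commented out because it
# requires sCCIgen and the sample data):
# ModelEst <- fit_or_load(
#   "snRNAseq_breast_2025_fit_w_cor.RDS",
#   function() Est_ModelPara(expr = expr, anno = anno,
#                            sim_method = "copula", ncores = 10)
# )
```

Note that this sketch does not detect changes to the input data or fitting options; delete the saved file to force a refit.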

Note: When snRNAseq is the only input dataset, the cells' spatial patterns and interactions cannot be estimated from the data. Users can instead specify parameters in the simulation to build in this additional variation.

4 Create a parameter file

Users need to create a parameter file. A sample parameter file for snRNAseq-based simulation is available here; download it and fill it in to perform simulations.

5 Run simulation

# path to the parameter file
input <- "snRNAseq_default_param.yml"

# The default parameter file does not include estimated model parameters, so the model is fitted during this call
ParaSimulation(input = input)

# Alternatively, pass previously estimated model parameters through ModelFitFile (optional, for speed)
model_param_path <- "snRNAseq_breast_2025_fit_w_cor.RDS"
ParaSimulation(input = input, ModelFitFile = model_param_path)

6 Run nested functions to obtain simulation byproducts

6.1 Task 1: Plot the spatial regions simulated by sCCIgen

To obtain the simulated regions, users can call the nested function RandomRegionWindow as follows:

# The parameter file specifies that nRegion=2 and seed=1234
win <- RandomRegionWindow(nRegion = 2, seed = 1234)

plot(win$window[[1]], col = "pink")
plot(win$window[[2]], col = "blue", add = TRUE)