Skip to contents

This tutorial will run a simulation with sCCIgen without the interactive interface based on SRT data.

1 Load R package

2 Load and clean sample data

Download sample data from https://github.com/songxiaoyu/sCCIgen_data/tree/main/input_data

load("SeqFishPlusCortex_2025_expr.Rdata")
load("SeqFishPlusCortex_2025_region_spatial.Rdata")

dim(expr)
# [1] 2500  511
dim(spatial)
# [1] 511   4
expr[1:3, 1:3]
#       ExcitatoryNeuron ExcitatoryNeuron ExcitatoryNeuron
# Aatf                 0                0                0
# Abat                 1                0                0
# Abhd2                2                2                0

spatial[1:3,]
#               anno X_final  Y_final field_col
# 1 ExcitatoryNeuron 1632.02 -1305.70         0
# 2 ExcitatoryNeuron 1589.47  -669.51         0
# 3 ExcitatoryNeuron 1539.89 -1185.90         0

anno <- colnames(expr)
region <- spatial[, 4]

3 Analysis of the existing data to provide insights into the parameters of the simulation

Users can split the sCCIgen simulation into (1) model fitting and (2) simulation using fitted model and user-provided parameters steps to expedite the simulations.

It is especially helpful if the number of genes and/or cells are very large and users want to run simulation for more than once. By splitting the simulation into these two steps, users can estimate model parameters only once and save the results for multiple use.

3.1 Task 1: Estimate model parameters from the SRT for simulation

This is part is to fit the expression data. When sim_method=“copula”, it will fit both the gene marginal distribution and gene-gene correlation. When sim_method=“ind”, it will only fit the gene marginal distribution.

Note: If the number of genes or cells are large, model fitting may take some time. It is suggested to select a reasonable sample size (e.g. <2500 cells per cell type) before the model fitting, as more cells may not be needed improve the estimation. The user interface automatically uses a maximum of 2500 cells per cell type in model fitting. In the meantime, if some genes are extremely zero-inflated, narrowing the simulation to reasonably variable genes is an option.

# model fitting
ModelEst <- Est_ModelPara(expr = expr, 
                          anno = anno, 
                          sim_method = "ind", 
                          region = region, 
                          ncores = 10)

saveRDS(ModelEst, file = "SeqFishPlusCortex_fit_wo_cor_region.RDS")

3.2 Task 2: Estimate CCIs in the input data following the existing Giotto pipeline

library(Giotto)

# Preprocess data with Giotto pipeline
## Three types of networks are built in to determine cell neighborhood: Delaunay, kNN, and distance-based. Here we use Delaunay network as an example. 
db <- preprocessGiotto(expr_data = expr, 
                       spatial_data = spatial, 
                       run_hvg = TRUE,
                       run_kNN_network = FALSE,
                       run_Delaunay_network = TRUE,
                       run_Dist_network=FALSE)

# Estimate cell-cell attraction and inhibition patterns, and save in 
# pre-defined folder
cellProximityTable(gobject = db, 
                   abs_enrichm = 0.3, 
                   p_adj = 0.05,
                   spatial_network_name = "Delaunay_network",
                   output_file = "est_CCI_dist_dist_DN_region.csv", 
                   seed=123)

# Estimate gene expressions of cells impacted by their neighbors, and save in
# pre-defined folder
ExprDistanceTable(gobject = db, 
                  in_hvg = TRUE, 
                  region_specific = TRUE, 
                  abs_log2fc_ICG = 0.3, 
                  p_adj = 0.05,
                  spatial_network_name = "Delaunay_network",
                  output_file = "est_CCI_dist_expr_DN_region.csv", seed=123)

# Estimate gene expressions of cells impacted by gene expressions of neighboring # cells, narrow to known pairs, such as ligand and receptor pairs, and save in
# pre-defined folder
ExprExprTable(gobject = db, 
              database = "mouse", 
              region_specific = TRUE, 
              spatial_network_name= "Delaunay_network",
              p_adj = 0.05, 
              abs_log2fc_LR = 0.3,
              output_file = "est_CCI_expr_expr_DN_region.csv", seed=123)

3.3 Task 3: Estimate spatial region specific genes

# Estimate spatial region specific genes.
SpatialTable(gobject = db, 
             top_num = 2, 
             fdr_cut = 0.05, 
             output_file = "est_CCI_expr_expr_region.csv")

4 Create a parameter file

Users need to create a parameter file. The sample parameter file for SRT based simulation is here for downloading and filling in to perform simulations.

5 Perform the entire simulation and save the results

Assuming you already have a parameter file, you can run the entire simulation using codes like this:

# We select the parameter file developed for SeqFISH+ data with region information. This file simulates 10000 cells, with no gene-gene correlation (ind), and have an output of 400 spots.
input <- "SeqFishPlus_Region_n10000_ind_multi400_param.yml"

ParaSimulation(input = input)

# Simulate new cells and genes where genes have region specific expressions
input <- "SeqFishPlus_RegionDiffGenes.yml"
model_param_path="SeqFishPlusCortex_fit_wo_cor_region.RDS"
ParaSimulation(input = input, ModelFitFile = model_param_path)