Run (differential) intercellular communication analysis
run_interaction_analysis.Rd
Perform (differential) cell type to cell type communication analysis from a Seurat object, using an internal database of ligand-receptor interactions (LRIs). It infers biologically relevant cell-cell interactions (CCIs) and how they change between two conditions of interest. Over-representation analysis is automatically performed to determine dominant differential signals at the level of the genes, cell types, GO Terms and KEGG Pathways.
Usage
run_interaction_analysis(
  seurat_object,
  LRI_species,
  seurat_celltype_id,
  seurat_condition_id,
  iterations = 1000,
  scdiffcom_object_name = "scDiffCom_object",
  seurat_assay = "RNA",
  seurat_slot = "data",
  log_scale = FALSE,
  score_type = "geometric_mean",
  threshold_min_cells = 5,
  threshold_pct = 0.1,
  threshold_quantile_score = 0.2,
  threshold_p_value_specificity = 0.05,
  threshold_p_value_de = 0.05,
  threshold_logfc = log(1.5),
  return_distributions = FALSE,
  seed = 42,
  verbose = TRUE,
  custom_LRI_tables = NULL
)
Arguments
- seurat_object
Seurat object that must contain normalized data and relevant meta.data columns (see below). Gene names must be MGI (mouse) or HGNC (human) approved symbols.
- LRI_species
Either "mouse", "human", "rat" or "custom". Indicates which LRI database to use and corresponds to the species of the seurat_object. Use "custom" at your own risk to use your own LRI table (see custom_LRI_tables).
- seurat_celltype_id
Name of the meta.data column in seurat_object that contains cell-type annotations (e.g. "CELL_TYPE").
- seurat_condition_id
List that contains information regarding the two conditions on which to perform differential analysis. Must contain the following three named items:
  - column_name: name of the meta.data column in seurat_object that indicates the condition of each cell (e.g. "AGE")
  - cond1_name: name of the first condition (e.g. "YOUNG")
  - cond2_name: name of the second condition (e.g. "OLD")
Can also be set to NULL to only perform a detection analysis (see Details).
- iterations
Number of permutations used for the statistical analysis. The default (1000) is a good compromise for an exploratory analysis, giving reasonably accurate p-values in a short time. Otherwise, we recommend using 10000 iterations and running the analysis in parallel (see Details). Can also be set to 0 for debugging and quickly returning partial results without statistical significance.
- scdiffcom_object_name
Name of the scDiffCom S4 object that will be returned ("scDiffCom_object" by default).
- seurat_assay
Assay of seurat_object from which to extract data. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.
- seurat_slot
Slot of seurat_object from which to extract data. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.
- log_scale
When FALSE (the default, recommended), data are treated as normalized but not log1p-transformed. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.
- score_type
Metric used to compute cell-cell interaction (CCI) scores. Can be either "geometric_mean" (default) or "arithmetic_mean". It is strongly recommended to use the geometric mean, especially when performing differential analysis. The arithmetic mean might be used when only performing a detection analysis or if the results are to be compared with those of another package.
- threshold_min_cells
Minimal number of cells (of a given cell type and condition) required to express a gene for this gene to be considered expressed in the corresponding cell type. Incidentally, cell types with fewer cells than this threshold are removed from the analysis. Set to 5 by default.
- threshold_pct
Minimal fraction of cells (of a given cell type and condition) required to express a gene for this gene to be considered expressed in the corresponding cell type. Set to 0.1 by default.
- threshold_quantile_score
Threshold value used in conjunction with threshold_p_value_specificity to establish if a CCI is considered "detected". The default (0.2) indicates that CCIs with a score among the lowest 20% of scores are not considered detected. Can be modified without the need to re-perform the permutation analysis (see Details).
- threshold_p_value_specificity
Threshold value used in conjunction with threshold_quantile_score to establish if a CCI is considered "detected". CCIs with a (BH-adjusted) specificity p-value above the threshold (0.05 by default) are not considered detected. Can be modified without the need to re-perform the permutation analysis (see Details).
- threshold_p_value_de
Threshold value used in conjunction with threshold_logfc to establish how CCIs are differentially expressed between cond1_name and cond2_name. CCIs with a (BH-adjusted) differential p-value above the threshold (0.05 by default) are not considered to change significantly. Can be modified without the need to re-perform the permutation analysis (see Details).
- threshold_logfc
Threshold value used in conjunction with threshold_p_value_de to establish how CCIs are differentially expressed between cond1_name and cond2_name. CCIs with an absolute logFC below the threshold (log(1.5) by default) are considered "FLAT". Can be modified without the need to re-perform the permutation analysis (see Details).
- return_distributions
FALSE by default. If TRUE, the distributions obtained from the permutation test are returned alongside the other results. May be used for testing or benchmarking purposes. Can only be enabled when iterations is less than 1000 in order to avoid out-of-memory issues.
- seed
Set a random seed (42 by default) to obtain reproducible results.
- verbose
If TRUE (default), print progress messages.
- custom_LRI_tables
A list containing an LRI table and, if known, tables with annotations supplied by the user. Overrides LRI_species and the corresponding internal LRI table. Use at your own risk! Must contain at least the following named item:
  - LRI: a data.table of LRIs
The data.table of LRIs must be in the same format as the internal LRI_tables, namely with the columns "LRI", "LIGAND_1", "LIGAND_2", "RECEPTOR_1", "RECEPTOR_2", "RECEPTOR_3". Other named data.tables can be supplied for over-representation analysis (ORA) purposes.
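To make the expected column layout concrete, here is a minimal sketch of a custom LRI table. The gene names are placeholders, and both the "ligand:receptor" string format of the LRI column and the use of NA for unused ligand/receptor slots are assumptions made for illustration; check them against the internal LRI tables before relying on them.

```r
# The six column names required by the documentation above
required_cols <- c("LRI", "LIGAND_1", "LIGAND_2",
                   "RECEPTOR_1", "RECEPTOR_2", "RECEPTOR_3")

# Two placeholder interactions (hypothetical gene names, not real LRIs):
# a simple ligand-receptor pair and a ligand with a two-subunit receptor
lri_df <- data.frame(
  LRI        = c("GeneA:GeneB", "GeneC:GeneD_GeneE"),  # string format is an assumption
  LIGAND_1   = c("GeneA", "GeneC"),
  LIGAND_2   = c(NA_character_, NA_character_),        # NA for unused slots is an assumption
  RECEPTOR_1 = c("GeneB", "GeneD"),
  RECEPTOR_2 = c(NA_character_, "GeneE"),
  RECEPTOR_3 = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Check the layout before handing the table to the analysis
stopifnot(identical(colnames(lri_df), required_cols))

# Wrap the table in a list with the required named item "LRI"
if (requireNamespace("data.table", quietly = TRUE)) {
  custom_LRI_tables <- list(LRI = data.table::as.data.table(lri_df))
}
```

The resulting list would then be passed as custom_LRI_tables, together with LRI_species = "custom".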
Value
An S4 object of class scDiffCom-class.
Details
The primary use of this function (and of the package) is to perform differential intercellular communication analysis. However, it is also possible to only perform a detection analysis (by setting seurat_condition_id to NULL), e.g. if one wants to infer cell-cell interactions from a dataset without having conditions on the cells.
By convention, when performing differential analysis, LOGFC values are computed as log(score(cond2_name)/score(cond1_name)). In other words, "UP"-regulated CCIs have a larger score in cond2_name.
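The sign convention can be checked with a few lines of base R. This is a minimal sketch with made-up scores, not the package's implementation: the vector names and the "DOWN" label are assumptions for illustration, and the p-value criterion (threshold_p_value_de) is deliberately left out.

```r
# Hypothetical CCI scores for three interactions in each condition
score_cond1 <- c(cci_a = 0.8, cci_b = 1.2, cci_c = 1.0)  # e.g. "YOUNG"
score_cond2 <- c(cci_a = 1.6, cci_b = 0.6, cci_c = 1.1)  # e.g. "OLD"

# LOGFC as defined above: log(score(cond2_name) / score(cond1_name))
logfc <- log(score_cond2 / score_cond1)

# Default fold-change threshold of the function
threshold_logfc <- log(1.5)

# Direction based on the fold change only (p-values ignored in this sketch)
direction <- ifelse(
  abs(logfc) < threshold_logfc, "FLAT",
  ifelse(logfc > 0, "UP", "DOWN")
)
print(direction)
```

With these values, cci_a doubles from cond1 to cond2 and is labelled "UP", cci_b halves and is labelled "DOWN", and cci_c changes by only 10% and stays "FLAT".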
Parallel computing. If possible, it is recommended to run this function in parallel in order to speed up the analysis for large datasets and/or to obtain better accuracy on the p-values by setting a higher number of iterations. This is as simple as loading the future package and setting an appropriate plan (see also our vignette).
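A parallel run might be set up as sketched below. The worker count is a hypothetical value, the choice of the multisession plan is an assumption (any plan supported by the future package could be used), and the analysis call is wrapped in if (FALSE), like the example at the end of this page, because it needs the package and its sample data.

```r
n_workers <- 2  # hypothetical worker count; adjust to the machine

if (requireNamespace("future", quietly = TRUE)) {
  # Switch to background R sessions; plan() returns the previous plan
  old_plan <- future::plan(future::multisession, workers = n_workers)

  if (FALSE) {
    # More permutations become affordable once a parallel plan is set
    scdiffcom_object <- run_interaction_analysis(
      seurat_object = seurat_sample_tms_liver,
      LRI_species = "mouse",
      seurat_celltype_id = "cell_type",
      seurat_condition_id = list(
        column_name = "age_group",
        cond1_name = "YOUNG",
        cond2_name = "OLD"
      ),
      iterations = 10000
    )
  }

  # Restore the plan that was active before
  future::plan(old_plan)
}
```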
Data extraction. The UMI or read counts matrix is extracted from the assay seurat_assay and the slot seurat_slot. By default, it is assumed that seurat_object contains log1p-transformed normalized data in the slot "data" of its assay "RNA". If log_scale is FALSE (as recommended), the data are expm1() transformed in order to recover normalized values not in log scale.
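The effect of the expm1() step can be illustrated in base R. The values below are made up for illustration; the point is only that expm1() is the inverse of log1p(), so applying it to log1p-transformed data recovers the normalized values.

```r
# Hypothetical normalized expression values for one gene across four cells
normalized <- c(0, 1.5, 10, 250)

# What the "data" slot is assumed to contain: log1p-transformed values
log_normalized <- log1p(normalized)

# What is done when log_scale = FALSE: undo the log1p transformation
recovered <- expm1(log_normalized)

# The round trip returns the original values (up to floating-point error)
stopifnot(isTRUE(all.equal(recovered, normalized)))
```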
Modifying filtering parameters (differential analysis only). As long as the slot cci_table_raw of the returned scDiffCom object is not erased, filtering parameters can be modified to recompute the slots cci_table_detected and ora_table, without re-performing the time-consuming permutation analysis. This may be useful if one wants a fast way to analyze how the results behave as a function of, say, different LOGFC thresholds. In practice, this can be done by calling the functions FilterCCI or RunORA (see also our vignette).
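The reason re-filtering is cheap is that the thresholds are applied to values that are already stored. The base R sketch below illustrates this idea only: it does not call FilterCCI or RunORA, and the table, its column names and the helper count_changed() are all hypothetical.

```r
# Hypothetical stand-in for stored raw results: one row per CCI
raw_table <- data.frame(
  cci   = c("cci_1", "cci_2", "cci_3", "cci_4"),
  logfc = c(0.9, -0.5, 0.3, -1.2),
  p_adj = c(0.01, 0.03, 0.20, 0.04)
)

# Count CCIs passing both the fold-change and the p-value thresholds
count_changed <- function(tab, threshold_logfc, threshold_p_value_de = 0.05) {
  sum(abs(tab$logfc) >= threshold_logfc & tab$p_adj <= threshold_p_value_de)
}

# Try several LOGFC thresholds without recomputing logfc or p-values
thresholds <- c(log(1.2), log(1.5), log(2))
n_changed <- sapply(thresholds, function(thr) count_changed(raw_table, thr))
print(n_changed)
```

Here the two looser thresholds both keep three CCIs, while log(2) keeps two; cci_3 never passes because of its p-value.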
Examples
if (FALSE) {
  run_interaction_analysis(
    seurat_object = seurat_sample_tms_liver,
    LRI_species = "mouse",
    seurat_celltype_id = "cell_type",
    seurat_condition_id = list(
      column_name = "age_group",
      cond1_name = "YOUNG",
      cond2_name = "OLD"
    )
  )
}