Pan-cancer quantification of neoantigen-mediated immunoediting in cancer evolution
Dependencies
This project is depended on R software, R packages, shell bash, Julia software and some Julia packages :
- NeoEnrichment - do neoantigen enrichment for this project.
- tidyverse - tidy data
- data.table - read and tidy data
- survival - survival analysis
- survminer - plot survival fit
- DT - show data table as a table in html
- patchwork - arrange figures into complex compound figures
- maftools - summarize, Analyze and Visualize MAF files
- ComplexHeatmap - make heatmap
- ezcox - operate a batch of univariate or multivariate Cox models and return tidy result
- ggcharts - plot pyramid bar plot
- parallel - Parallel Computing
- Julia - do simulation
- knitr, rmdformats - used to compile this file
library(tidyverse)
library(ggpubr)
library(cowplot)
library(NeoEnrichment)
library(parallel)
library(survminer)
library(survival)
library(data.table)
library(patchwork)
library(ComplexHeatmap)
library(maftools)
library(ezcox)
library(ggcharts)
Data download and preprocessing
This part will describe how and where the data that this project used is downloaded and pre-processed.
TCGA pancancer data download and clean
Mutation data
Pre-compiled curated somatic mutations (called by MuTect2) for TCGA cohorts were downloaded from Xena and only keep missense variants for following analysis
<- data.table::fread("~/useful_data/GDC-PANCAN.mutect2_snv.tsv") %>%
mutect2 ::filter(effect=="missense_variant" & filter=="PASS")
dplyrsaveRDS(mutect2,file = "../data/pancancer_mutation.rds")
view this file:
<- readRDS("../data/pancancer_mutation.rds")
pancancer_mutation ::datatable(head(pancancer_mutation),
DToptions = list(scrollX = TRUE, keys = TRUE,
columnDefs = list(list(className = 'dt-center', targets = 2))), rownames = FALSE
)
To predict neoantigen, mutation files were first transformed into single sample VCF format by maf2vcf tools (This Code can be found in code/shell/TCGA_neoantigen
folder).
ABSOLUTE-annotated MAF file which contains cancer cell fraction (CCF) information of mutations was downloaded from GDC PanCanAtlas publications (TCGA_consolidated.abs_mafs_truncated.fixed.txt.gz), and then we used liftover function from R package “rtracklayer” to convert the this hg37 genome coordinates file to hg38 coordinates:
<- data.table::fread("~/useful_data/TCGA_consolidated.abs_mafs_truncated.fixed.txt")
absolute_maf
<- absolute_maf %>%
absolute_maf ::select(Hugo_Symbol,Chromosome,Start_position,End_position,ccf_hat,sample,
dplyr
Protein_Change,Variant_Classification,Variant_Type,Reference_Allele,
Tumor_Seq_Allele2,ref,alt,purity)library(GenomicRanges)
$Chromosome <- paste0("chr",absolute_maf$Chromosome)
absolute_maf
<- makeGRangesFromDataFrame(absolute_maf,
absolute_granges keep.extra.columns=TRUE,
ignore.strand=TRUE,
seqinfo=NULL,
seqnames.field="Chromosome",
start.field="Start_position",
end.field="End_position",
starts.in.df.are.0based=FALSE)
library(rtracklayer)
<- import.chain("~/useful_data/hg19ToHg38.over.chain")
chainObject
<- as.data.frame(liftOver(absolute_granges, chainObject))
results <- results %>%
results filter(width==1)
<- results %>%
results filter(Variant_Classification=="Missense_Mutation") %>%
mutate(index=paste(substr(sample,1,16),seqnames,start,Reference_Allele,Tumor_Seq_Allele2,sep = ":")) %>%
select(sample,index,Hugo_Symbol,Protein_Change,Reference_Allele,Tumor_Seq_Allele2,purity,ccf_hat)
saveRDS(results,file = "../data/all_mut_mis_ccf.rds")
Driver mutation data was downloaded from Bailey M H et.al study (Mutation.CTAT.3D.Scores.txt). This study used three different categories of tools to find driver mutations:
- tools distinguishing benign versus pathogenic mutations using sequence (CTAT population);
- tools distinguishing driver versus passenger mutations using sequence (CTAT cancer);
- tools discovering statistically significant three-dimensional clusters of missense mutations (structure based).
We keep mutations identified by ≥2 approaches as finial high confident driver mutations, including 3437 unique mutations:
<- data.table::fread("~/data/Mutation.CTAT.3D.Scores.txt")
driver <- driver %>%
driver_mutations mutate(nmethod=`New_Linear (functional) flag`+`New_Linear (cancer-focused) flag`+`New_3D mutational hotspot flag`) %>%
filter(nmethod>=2)
saveRDS(driver_mutations,file = "../data/driver_mutations.rds")
Expression data
The normalized gene-level RNA-seq data (TPM, transcripts per million) for 31 TCGA cohorts were downloaded from Xena (tcga_RSEM_gene_tpm) and transformed from log-form to non-log form (This file is too large to store in Github):
<- data.table::fread("~/useful_data/xena_RSEM_TPM/tcga_RSEM_gene_tpm.gz")
TPM_log2 <- data.table::fread("~/useful_data/xena_RSEM_TPM/mapping_probe")
mapping <- mapping[,1:2]
mapping <- left_join(TPM_log2 %>% rename(id=sample),mapping) %>%
TPM_log2_mapping select(id,gene,everything())###log2(tpm+0.001)
<- as.data.frame(TPM_log2_mapping)
TPM_log2_mapping <- TPM_log2_mapping
tpm_trans 3:ncol(tpm_trans)] <- apply(TPM_log2_mapping[,3:ncol(TPM_log2_mapping)],2,
tpm_trans[,function(x){(2^x) - 0.001})
saveRDS(tpm_trans,file = "~/useful_data/xena_RSEM_TPM/tpm_trans.rds")
HLA typing and immune cell infiltration data
HLA typing data was downloaded from Thorsson et.al study and transformed to the format required by the neoantigen prediction tools:
<- read.table("~/useful_data/panCancer_hla.tsv",sep = "\t")
hla <- hla %>%
hla separate(col = V2,into = c("HLA-A_1","HLA-A_2",
"HLA-B_1","HLA-B_2",
"HLA-C_1","HLA-C_2"),sep = ",")
###no stars
<- t(base::apply(hla,1,function(x){
hla which(duplicated(x))] <- NA
x[<- gsub("\\*","",x)
x
x%>% as.data.frame()
}))
colnames(hla)[1] <- "Patient"
write.table(hla,file = "~/useful_data/TCGA_HLA_typing.txt",sep = "\t",quote = F,col.names = T,row.names = F)
Immune cell infiltration data for all TCGA tumors was downloaded from ImmuneCellAI study (Miao et al., 2020), which estimates the abundance of 24 immune cells comprised of 18 T-cell subtypes and 6 other immune cells. We can only download indivial cancer type data from the web server, so we need combine them all:
<- list.files("~/test/immune_score/immuneCellAI/")
files <- vector("list",length(files))
re for(i in seq_along(re)){
<- data.table::fread(paste0("~/test/immune_score/immuneCellAI/",files[i]),data.table = F) %>%
dt rename(sample=V1) %>%
mutate(sample=substr(sample,1,16)) %>%
mutate(sample=gsub("\\.","-",sample))
<- dt
re[[i]]
}<- bind_rows(re)
results saveRDS(results,file = "../data/pancancer_subtcells.rds")
We can view this file:
<- readRDS("../data/pancancer_subtcells.rds")
pancancer_immune ::datatable(head(pancancer_immune),
DToptions = list(scrollX = TRUE, keys = TRUE), rownames = FALSE
)
Other Immune cell infiltration data including CIBERSORT (abs mode), Quantiseq were downloaded from the TIMER2.0 study.
TCGA pancancer data processing
This part describes HLA typing and neoantigen prediction of TCGA data.
HLA typing
For samples which don’t have corresponding HLA infromation data in Thorsson et.al study, HLA genotyping was performed with Optitype (Szolek et al., 2014), using default parameters. This corresponding code can be found in code/python/HLA_typing
(using snakemake pipeline tools).
Neoantigen prediction
Mutect2 mutation files were first transformed into VCF format by maf2vcf tools, and we used NeoPredPipe to predict neoantigen (Schenck et al., 2019). Single-nucleotide variants leading to a single amino acid change are the focus of this study. From the output results, if the IC50 of a novel peptide is less than 50, the bind level is SB (strong binder, rank is less than 0.5%), and the expression level (TPM) is greater than 1, then this peptide is labeled as neoantigen. A mutation is considered neoantigenic if there is at least one peptide derived from the mutated site is predicted as neoantigen. The neoantigen prediction code using NeoPrePipe can be found in code/shell/TCGA_neoantigen
. Then we add CCF and mRNA expression information to neoantigen prediction data:
library(dplyr)
setwd("/public/slst/home/wutao2/TCGA_neopredpipe/batch_results")
<- read.table("files")
files for (i in 1:100){
<- data.table::fread(paste(files$V1[i],"/batch_",files$V1[i],".neoantigens.unfiltered.txt",sep = ""),fill=TRUE)
mut
<- mut %>%
mut select(c(1,4:11,21:28))
colnames(mut) <- c("sample","chr","position",
"ref","alt","gene","exp","res_pos",
"hla","score_el","rank_el","score_ba",
"rank_ba","IC50","candidate","bindlevel",
"novelty")
<- mut %>%
mut mutate(novelty=ifelse(is.na(novelty),0,novelty)) %>%
mutate(index=paste(sample,chr,position,ref,alt,sep = ":"))
saveRDS(mut,file = paste(files$V1[i],".rds",sep = ""))
}
<- list.files("~/test/data/2021_03_31/")
files <- vector("list",100)
re
for (i in 1:100){
<- readRDS(paste("~/test/data/2021_03_31/",files[i],sep = ""))
mut <- mut %>%
neo filter(novelty == 1) %>%
filter(IC50<50 & bindlevel=="SB" & novelty==1 ) %>%
distinct(index,.keep_all = T)
<- mut %>%
mut distinct(index,.keep_all = T) %>%
mutate(neo = ifelse(index %in% neo$index , "neo","not_neo"))
<- mut
re[[i]]
}
<- bind_rows(re)
all_mut <- all_mut %>%
all_mut select(sample,chr,position,ref,alt,neo,gene,exp)
<- all_mut %>%
all_mut mutate(gene=gsub("\\:.+","",gene))
saveRDS(all_mut,file = "~/test/data/2021_04_17/all_mut.rds")
<- readRDS("~/useful_data/xena_RSEM_TPM/tpm_trans.rds")
tpm <- tpm[!duplicated(tpm$gene),]
tpm $tpm_exp <- mapply(function(sample,gene){
all_mut$gene==gene,substr(sample,1,15)]
tpm[tpm$sample,all_mut$gene)
},all_mut<- all_mut %>%
all_mut1 filter(lengths(tpm_exp)!=0)
$tpm_exp <- as.numeric(all_mut1$tpm_exp)
all_mut1saveRDS(all_mut1,file = "../data/all_mut_tpm_not_filter.rds")##this file is used to calculte EXP-ES
<- all_mut1 %>%
all_mut1 mutate(neo2=ifelse(neo=="neo" & tpm_exp>1,"neo","not_neo"))
<- all_mut1 %>%
all_mut1 select(-neo,-exp) %>%
rename(neo=neo2)
saveRDS(all_mut1,file = "~/test/all_mut_tpm.rds")
##add ccf
<- readRDS("~/test/data/2021_04_05/all_mut_mis_ccf.rds")
results
<- readRDS("~/test/all_mut_tpm.rds")
all_mut_tpm #all_mut_tpm <- readRDS("~/test/all_mut_tpm_not_filter.rds")
<- all_mut_tpm %>%
all_mut mutate(index=paste(sample,chr,position,ref,alt,sep = ":")) %>%
::rename(ref_allele=ref,alt_allele=alt)
dplyr
<- inner_join(
all_mut_ccf
all_mut,%>% select(-sample),
results by="index"
)<- all_mut_ccf[!duplicated(all_mut_ccf$index),]
all_mut_ccf saveRDS(all_mut_ccf,file = "../data/all_mut_ccf_tpm.rds")##This file is used to calculate CCF-ES
Immunotherapy data download and clean
We collected three cohorts of immunotherapy datasets for our analysis. The Hugo et al. (2016) dataset
was related to anti-PD-1 therapy in metastatic melanoma. This dataset has 37 samples with WES data, 26 were also analyzed by RNA sequencing (RNA-seq). The Riaz et al. (2017) dataset
was related to anti-PD-1 therapy in metastatic melanoma. This dataset has 56 samples with WES data, 40 with RNA-seq. The David Liu et.al (2019) cohort
has patients with melanoma treated with anti-PD1 ICB, which 119 samples has WES data, 112 with RNA-seq.
Immune therapy response for patients was defined by RECIST v1.1(CR/PR/SD/PD), responding tumors were derived from patients who have complete or partial responses (CR/PR) in response to anti-PD-1 therapy; non-responding tumors were derived from patients who had progressive disease or stable disease (PD/SD).
The details of mutation calling, producing copy number segment file (using GATK4) and RNA_seq process can be found in Methods. Code can be found in code/shell/ICI
. The HLA typing and neoantigen prediction was the same as the process in TCGA data as previously described. The code for caculating CCF by ABSOLUTE can be found in code/R/ICI_ABSOLUTE.R
After got neoantiogen prediction and ABSOLUTE CCF information, we can combine these data for downstream analysis:
####combine neoantigen prediction data
library(dplyr)
library(readr)
<- list.files("immunetherapy/")
files <- files[grepl("neoantigens.unfiltered.txt",files)]
files <- vector("list",25)
re for (i in 1:25){
<- read_table2(paste("immunetherapy/",files[i],sep = ""),col_names = paste0(rep("V",27),c(1:27)))
mut
<- mut %>%
mut select(c(1,3:10,20:27))
colnames(mut) <- c("sample","chr","position",
"ref","alt","gene","exp","res_pos",
"hla","score_el","rank_el","score_ba",
"rank_ba","IC50","candidate","bindlevel",
"novelty")
<- mut %>%
mut mutate(novelty=ifelse(is.na(novelty),0,novelty)) %>%
mutate(index=paste(sample,chr,position,ref,alt,sep = ":"))
<- mut %>%
neo filter(IC50<50 & bindlevel=="SB" & novelty==1 & exp>1) %>%
distinct(index,.keep_all = T)
<- mut %>%
mut distinct(index,.keep_all = T) %>%
mutate(neo = ifelse(index %in% neo$index , "neo","not_neo"))
<- mut %>%
mut select(sample,chr,position,gene,exp,neo) %>%
mutate(gene=gsub("\\:.+","",gene))
$sample <- paste(gsub("_.+","",files[i]),mut$sample,sep = "_")
mut<- mut
re[[i]]
}
<- bind_rows(re)
all_mut saveRDS(all_mut,file = "../data/Immunotherapy/all_mut_ici.rds")
###ccf in maf files of ABSOLUTE output
##nadeem
<- read.table("immunetherapy/nadeem_absolute/Nad_sample_run",sep = ",",stringsAsFactors = F)
sample_run <- list.files("immunetherapy/nadeem_absolute/")[-1]
files <- vector("list",57)
re
for (i in 1:57){
<- read.table(paste0("immunetherapy/nadeem_absolute/",files[i]),sep = "\t",header = T) %>%
test select(sample,Hugo_Symbol,Chromosome,Start_position,cancer_cell_frac,purity)
$sample <- paste("nadeem",as.character(sample_run[which(sample_run$V2==unique(test$sample)),"V1"]),
testsep = "_")
<- test
re[[i]]
}<- bind_rows(re)
nadeem_ccf
##willy
<- read.table("immunetherapy/willy_absolute/clinical.txt",sep = ",",stringsAsFactors = F)
sample_run <- list.files("immunetherapy/willy_absolute/")[-1]
files <- vector("list",38)
re
for (i in 1:38){
<- read.table(paste0("immunetherapy/willy_absolute/",files[i]),sep = "\t",header = T) %>%
test select(sample,Hugo_Symbol,Chromosome,Start_position,cancer_cell_frac,purity)
$sample <- paste("willy",as.character(sample_run[which(sample_run$V2==unique(test$sample)),"V1"]),
testsep = "_")
<- test
re[[i]]
}<- bind_rows(re)
willy_ccf
###liu
<- read.table("immunetherapy/liu_absolute/liu_sample_run",sep = ",",stringsAsFactors = F)
sample_run <- list.files("immunetherapy/liu_absolute/")[-1]
files <- vector("list",121)
re
for (i in 1:121){
<- read.table(paste0("immunetherapy/liu_absolute/",files[i]),sep = "\t",header = T) %>%
test select(sample,Hugo_Symbol,Chromosome,Start_position,cancer_cell_frac,purity)
$sample <- paste("liu",as.character(sample_run[which(sample_run$V2==unique(test$sample)),"V1"]),
testsep = "_")
<- test
re[[i]]
}<- bind_rows(re)
liu_ccf
<- bind_rows(liu_ccf,nadeem_ccf,willy_ccf)
all_ccf saveRDS(all_ccf,file = "../data/Immunotherapy/all_ccf.rds")
###merge ccf and neoantigen data
<- left_join(
all_mut_ccf %>%
all_mut_ici mutate(index=paste(sample,chr,position,sep = ":")),
%>%
all_ccf mutate(Chromosome=paste0("chr",Chromosome)) %>%
mutate(index=paste(sample,Chromosome,Start_position,sep = ":")) %>%
select(index,cancer_cell_frac,purity),
by="index"
)
<- all_mut_ccf %>% filter(!is.na(cancer_cell_frac))
all_mut_ccf saveRDS(all_mut_ccf,file = "../data/Immunotherapy/all_mut_ccf_ici.rds")
TCGA pancancer analysis
The existence of significant immunoediting signal
We investigate the status of this immunoediting signal with the new method developed in this study using TCGA pan-cancer dataset. The method detail showed in Method part of manuscript, and implemented in the package NeoEnrichment
which can be found in Github
We filterd samples with at least 1 neoantigenic and 1 subclonal mutation (CCF<0.6) and calculated CCF-NES value:
<- readRDS("../data/all_mut_ccf_tpm.rds")
all_mut_ccf <- all_mut_ccf %>%
all_mut_ccf rename(ccf=ccf_hat) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
<- all_mut_ccf %>%
samples_has_subclonal filter(ccf<0.6) %>%
select(sample) %>%
distinct(sample)
%>%
all_mut_ccf filter(sample %in% samples_has_subclonal$sample) %>%
group_by(sample) %>%
summarise(c_n=sum(neo=="yes"),c_m=sum(neo=="no")) %>% filter(c_n>=1 & c_m >=1) -> summ
<- all_mut_ccf %>%
neo_missense filter(sample %in% summ$sample)
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 16),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- dt %>% filter(sample == x)
data <- NeoEnrichment::cal_nes_new_test(dt = data,
a sample_counts = 1000,
need_p = FALSE)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- neo_missense %>% select(sample,neo,ccf) %>% filter(!is.na(ccf))
neo_missense <- cal_nes_warp(neo_missense)
neo_nes saveRDS(neo_nes,file = "../data/neo_nes_ccf06_1.rds")
For each sample, we permutate the neoantigen labeling (ie. randomly select the same number of mutations as the actual neoantigen number in the selected sample and label them as neoantigenic mutations) and calculate ES value. For pan-cancer or cancer type dataset, we can obtain the same number of simulated samples and corresponding ES values, then calculate the median ES of these simulated samples. This process was repeated many times (usually 2000 times) to get the simulated distribution of median ES. The actual pan-cancer or cancer type median ES values are compared with this simulated ES distribution, and p values are then reported:
<- readRDS("/public/slst/home/wutao2/cal_nes/ccf/neo_nes_ccf06_1.rds")
neo_nes <- readRDS("/public/slst/home/wutao2/cal_nes/ccf/all_mut_ccf_tpm.rds")
all_mut_ccf <- all_mut_ccf %>%
all_mut_ccf rename(ccf=ccf_hat) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
<- all_mut_ccf %>%
neo_missense filter(sample %in% neo_nes$sample)
<- neo_missense %>%
neo_missense select(sample,neo,ccf) %>%
filter(!is.na(ccf))
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 30),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- dt %>% filter(sample == x)
data <- NeoEnrichment::cal_nes_new_test(dt = data,
a sample_counts = 1000,
need_p = FALSE)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- vector("list",2000)
res for (i in 1:2000){
<- neo_missense %>%
neo_missense_sim group_by(sample) %>%
mutate(neo_sim=sample(neo,length(neo)))
<- neo_missense_sim %>%
neo_missense_sim select(-neo) %>%
rename(neo=neo_sim)
<- cal_nes_warp(neo_missense_sim)
neo_nes_sim $sim_num <- i
neo_nes_sim<- neo_nes_sim
res[[i]] print(paste0("Complete ",i," sim. "))
}saveRDS(res,file = "/public/slst/home/wutao2/cal_nes/ccf/sim_2000_not_filter_driver.rds")
<- readRDS("~/test/data/2021_09_26/sim_2000_not_filter.rds")
sim_2000_not_filter <- bind_rows(sim_2000_not_filter)
sim_2000_not_filter $cancer <- get_cancer_type(sim_2000_not_filter$sample,
sim_2000_not_filterparallel = T,cores = 20)
saveRDS(sim_2000_not_filter,file = "../../tmp/sim_2000_not_filter_all_ccf.rds")##this file is too large to push to Github
We can visualize the simulation using WVPlots
package :
<- readRDS("../data/neo_nes_ccf06_1.rds")
neo_nes <- readRDS("../../tmp/sim_2000_not_filter_all_ccf.rds")
sim_all %>%
sim_all group_by(cancer,sim_num) %>%
summarise(median_es=median(es)) -> summ2
#> `summarise()` has grouped output by 'cancer'. You can override using the
#> `.groups` argument.
$cancer <- get_cancer_type(neo_nes$sample)
neo_nes<- neo_nes %>%
neo_nes_summ group_by(cancer) %>% summarise(median_es=median(es))
#The following code can be used to plot cancer type simulation which showd in FigS4
#res <- vector("list",30)
# for (i in 1:30){
# dt <- sim_all %>% filter(cancer==neo_nes_summ$cancer[i])
# dt_summ <- dt %>%
# group_by(sim_num) %>%
# summarise(median_es=median(es))
# p <- WVPlots::ShadedDensity(frame = dt_summ,
# xvar = "median_es",
# threshold = neo_nes_summ$median_es[i],
# title = neo_nes_summ$cancer[i],
# tail = "left")
# p$layers[[1]]$aes_params$colour <- "red"
# p$layers[[1]]$aes_params$size <- 1
# p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
# p$layers[[3]]$aes_params$colour <- "black"
# p$layers[[3]]$aes_params$size <- 1
# p1 <- p + labs(x="Simulation median es")+
# theme_prism()
# res[[i]] <- p1
# }
#
# plot_grid(plotlist = res)
%>%
sim_all group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ
<- WVPlots::ShadedDensity(frame = summ,
p xvar = "median_es",
threshold = median(neo_nes$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p#p$layers[[4]]$aes_params$label <- "Actual median ES" #geom_text
<- p + labs(x="Simulation median es")+
p1 theme_prism()
p1
##save
# neo_nes_summ <- neo_nes_summ %>%
# rowwise() %>%
# mutate(p=mean(summ2$median_es[summ2$cancer==cancer] <= median_es))
# saveRDS(neo_nes_summ,file = "neo_nes_summ_ccf_061_not_filter.rds")
Then we can compare the median actual CCF-ES with median simulation CCF-ES in pan-cancer and cancer-type:
<- function(dt,pancancer_p,dt2,median_es){
get_f1 <- ggplot(data=dt,aes(x=1,y=es))+
p1 geom_violin(alpha=0.7,width=0.5)+
geom_boxplot(width=0.2)+
theme_prism()+
labs(x=paste0("median es = ",round(median_es,digits = 3),"\n n = ",nrow(dt)),size=1)+
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank())+
theme(text = element_text(size = 7))+
annotate(geom="text", x=1, y=1.1, label=paste0("p = ",pancancer_p),
color="red",size=4)
$cancer <- get_cancer_type(dt$sample)
dt%>% group_by(cancer) %>%
dt summarise(median_est=median(es),c=n()) %>%
arrange(median_est) %>%
mutate(label=paste0(cancer,"\n(n=",c,")"))-> summ1
<- left_join(summ1,dt2 %>% select(cancer,p))
summ1 $p <- signif(summ1$p,digits = 1)
summ1<- summ1 %>%
summ1 mutate(sig=case_when(
<= 0.05 & p > 0.01 ~ "*",
p < 0.01 ~ "**",
p TRUE ~ "ns"
))<- left_join(dt,summ1)
dt $label <- factor(dt$label,levels = summ1$label)
dt<- data.frame(x = 1:nrow(summ1), y = 1.1, family = summ1$sig)
df2 <- ggplot(data=dt,aes(x=label,y=es))+
p2 geom_boxplot()+
theme_prism()+
theme(axis.title.x = element_blank())+
theme(axis.text.x = element_text(angle = 45,vjust = 1, hjust = 1))+
theme(text = element_text(size = 6))+
geom_text(data=df2,aes(x=x,y=y,label = family))+
geom_hline(yintercept=0,
color = "red", size=1)
<- p1 + p2 + plot_layout(widths = c(1, 8))
p3
}
<- readRDS("../data/neo_nes_ccf06_1.rds")
neo_nes4 <- readRDS("../data/neo_nes_summ_ccf_061_not_filter.rds")
neo_nes_summ <- get_f1(neo_nes4,pancancer_p = 0.051,dt2 = neo_nes_summ,median_es = median(neo_nes4$es))
p3 #> Joining, by = "cancer"Joining, by = "cancer"
p3
The observed ESCCF is -0.017 (p=0.051). In PAAD and LUAD, the observed ESCCF values are significant lower compared with the random simulations, suggesting the existence of immunoediting-elimination signal. Since some neoantigenic mutations can be cancer drivers, which are known to undergo positive selection during the evolution of cancer. Neoantigens that happens to be cancer drivers are not undergoing immune based negative selection, probably because the driving force could override the negative selection, so we compared ES between the samples which neoantigenic and driver lie in the same gene with other samples:
<- all_mut_ccf %>%
all_mut_ccf mutate(gene_protein_change=paste(Hugo_Symbol,Protein_Change,sep = "-"))
<- driver_mutations %>%
driver_mutations mutate(gene_protein_change=paste(gene,protein_change,sep = "-"))
<- all_mut_ccf %>%
all_mut_ccf mutate(is_driver=ifelse(gene_protein_change %in% driver_mutations$gene_protein_change,"yes","no"))
sum(all_mut_ccf$is_driver=="yes" & all_mut_ccf$neo=="yes") / sum(all_mut_ccf$neo=="yes")##0.01029878
%>% group_by(sample) %>%
all_mut_ccf summarise(inter_gene=intersect(Hugo_Symbol[neo=="yes"],
=="yes"])) -> aaa##701 samples
Hugo_Symbol[is_driver<- all_mut_ccf %>%
all_mut_ccf mutate(sample_neo_index=paste(sample,neo,Hugo_Symbol,sep = ","))
<- aaa %>% mutate(sample_neo_index=paste(sample,"yes",inter_gene,sep = ","))
aaa
%>%
all_mut_ccf mutate(in_aaa = ifelse(sample_neo_index %in% aaa$sample_neo_index,"yes","no")) %>%
group_by(sample) %>%
summarise(need_sample=ifelse(any(in_aaa=="yes"),"no","yes")) %>%
filter(need_sample=="yes") -> summ2
<- intersect(samples_has_subclonal$sample,summ2$sample)
need_samples %>%
all_mut_ccf filter(sample %in% need_samples) %>%
group_by(sample) %>%
summarise(c_n=sum(neo=="yes"),c_m=sum(neo=="no")) %>% filter(c_n>=1 & c_m >=1) -> summ
<- all_mut_ccf %>% filter(sample %in% summ$sample)
neo_missense <- neo_missense %>% select(sample,neo,ccf) %>% filter(!is.na(ccf))
neo_missense <- cal_nes_warp(neo_missense)##addp
neo_nes median(neo_nes$es)
saveRDS(neo_nes,file = "../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds")
<- readRDS("../data/neo_nes_ccf06_1.rds")
neo_nes <- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds")
neo_nes_remove_drivers <- neo_nes %>%
neo_nes2 filter(!(sample %in% neo_nes_remove_drivers$sample))
<- data.frame(es=c(neo_nes_remove_drivers$es,neo_nes2$es),
dt type=c(rep("neo not in driver",nrow(neo_nes_remove_drivers)),
rep("neo in driver",nrow(neo_nes2))))
$type <- factor(dt$type,levels = c("neo not in driver","neo in driver"))
dtggplot(data = dt,aes(x=type,y=es))+
geom_boxplot()+
stat_compare_means()+
theme_prism()+
theme(axis.title.x = element_blank())
We can see that samples which neoantigenic and driver mutation lie in the same gene have the higher ES than other samples. So we filtered these sample to remove the effect that driver mutation cound couse.
When samples with neoantigenic and driver mutations lied on same genes are not included, the observed median ESCCF is -0.023 (n=5295, P=0.0055) (the simulation procedure is the same as above):
<- readRDS("../../tmp/sim_2000_not_filter_all_ccf.rds")##this file is too large to push to Github
sim_all <- sim_all %>%
sim_filter_driver filter(sample %in% neo_nes_remove_drivers$sample)
%>%
sim_filter_driver group_by(cancer,sim_num) %>%
summarise(median_es=median(es)) -> summ2
#> `summarise()` has grouped output by 'cancer'. You can override using the
#> `.groups` argument.
$cancer <- get_cancer_type(neo_nes_remove_drivers$sample)
neo_nes_remove_drivers<- neo_nes_remove_drivers %>%
neo_nes_summ group_by(cancer) %>% summarise(median_es=median(es))
#The following code can be used to plot cancer type simulation which showd in FigS6
#res <- vector("list",30)
# for (i in 1:30){
# dt <- sim_filter_driver %>% filter(cancer==neo_nes_summ$cancer[i])
# dt_summ <- dt %>%
# group_by(sim_num) %>%
# summarise(median_es=median(es))
# p <- WVPlots::ShadedDensity(frame = dt_summ,
# xvar = "median_es",
# threshold = neo_nes_summ$median_es[i],
# title = neo_nes_summ$cancer[i],
# tail = "left")
# p$layers[[1]]$aes_params$colour <- "red"
# p$layers[[1]]$aes_params$size <- 1
# p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
# p$layers[[3]]$aes_params$colour <- "black"
# p$layers[[3]]$aes_params$size <- 1
# p1 <- p + labs(x="Simulation median es")+
# theme_prism()
# res[[i]] <- p1
# }
#
# plot_grid(plotlist = res)
%>%
sim_filter_driver group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ
<- WVPlots::ShadedDensity(frame = summ,
p xvar = "median_es",
threshold = median(neo_nes_remove_drivers$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p#p$layers[[4]]$aes_params$label <- "Actual median ES" #geom_text
<- p + labs(x="Simulation median es")+
p1 theme_prism()
p1
##save
# neo_nes_summ <- neo_nes_summ %>%
# rowwise() %>%
# mutate(p=mean(summ2$median_es[summ2$cancer==cancer] <= median_es))
# saveRDS(neo_nes_summ,file = "neo_nes_summ_ccf_061.rds")
<- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds") %>%
neo_nes3 select(-p)
<- readRDS("../data/neo_nes_summ_ccf_061.rds")
neo_nes_summ <- get_f1(neo_nes3,pancancer_p = 0.0055,dt2 = neo_nes_summ,median_es = median(neo_nes3$es))
p3 #> Joining, by = "cancer"Joining, by = "cancer"
p3
Several cancer types including ACC, PAAD, UCEC, LUAD show significant low ESCCF values. This data demonstrates the existence of immunoediting-elimination signal in TCGA dataset.
We then calculated the EXP-ES and corresponding simulation median ES distribution:
<- readRDS("../data/all_mut_tpm_not_filter.rds")
all_mut_exp <- all_mut_exp %>%
all_mut_exp select(sample,tpm_exp,neo,chr,position,gene) %>%
rename(exp=tpm_exp)
%>%
all_mut_exp group_by(sample) %>%
summarise(n_a=sum(neo=="not_neo"),n_c=sum(neo=="neo")) %>%
filter(n_a>=1,n_c>=1) -> summ
<- readRDS("../data/pancancer_mutation.rds") %>%
pancancer_mutation mutate(index=paste(Sample_ID,chrom,start,sep = ":")) %>%
select(index,Amino_Acid_Change)
<- left_join(
all_mut_exp %>% mutate(index=paste(sample,chr,position,sep = ":")),
all_mut_exp
pancancer_mutation
)<- all_mut_exp %>%
all_mut_exp mutate(gene_protein_change=paste(gene,Amino_Acid_Change,sep = "-"))
<- readRDS("../data/driver_mutations.rds")
driver_mutations <- driver_mutations %>%
driver_mutations mutate(gene_protein_change=paste(gene,protein_change,sep = "-"))
<- all_mut_exp %>%
all_mut_exp mutate(is_driver=ifelse(gene_protein_change %in% driver_mutations$gene_protein_change,"yes","no"))
sum(all_mut_exp$is_driver=="yes" & all_mut_exp$neo=="neo") / sum(all_mut_exp$neo=="neo")##0.004702473
%>% group_by(sample) %>%
all_mut_exp summarise(inter_gene=intersect(gene[neo=="neo"],
=="yes"])) -> aaa##778 samples
gene[is_driver<- all_mut_exp %>%
all_mut_exp mutate(sample_neo_index=paste(sample,neo,gene,sep = ","))
<- aaa %>% mutate(sample_neo_index=paste(sample,"neo",inter_gene,sep = ","))
aaa
%>%
all_mut_exp mutate(in_aaa = ifelse(sample_neo_index %in% aaa$sample_neo_index,"yes","no")) %>%
group_by(sample) %>%
summarise(need_sample=ifelse(any(in_aaa=="yes"),"no","yes")) %>%
filter(need_sample=="no") -> summ2
<- all_mut_exp %>% filter(sample %in% summ$sample) %>%
neo_exp filter(!(sample %in% summ2$sample))
<- neo_exp %>%
neo_exp select(sample,neo,exp)
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- NeoEnrichment::cales_t(data = dt,barcode = x,type = "II",
a calp = TRUE,sample_counts = 1000,
cal_type = "exp")
#df <- data.frame(sample=x,es=a)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- cal_nes_warp(neo_exp)
nes_exp saveRDS(nes_exp,file = "../data/es_exp_filter_driver.rds")
<- all_mut_exp %>% filter(sample %in% summ$sample) %>% select(sample,neo,exp)
neo_exp <- cal_nes_warp(neo_exp)
nes_exp saveRDS(nes_exp,file = "../data/es_exp_not_filter_driver.rds")
##sim
<- readRDS("../data/es_exp_not_filter_driver.rds")
neo_nes <- readRDS("../data/all_mut_tpm_not_filter.rds")
all_mut_exp <- all_mut_exp %>%
all_mut_exp select(sample,tpm_exp,neo) %>%
rename(exp=tpm_exp)
<- all_mut_exp %>% filter(sample %in% neo_nes$sample)
neo_missense <- neo_missense %>% filter(!is.na(exp))
neo_missense
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 40),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- NeoEnrichment::cales_t(data = dt,barcode = x,type = "II",
a calp = FALSE,sample_counts = 1000,
cal_type = "exp")
<- data.frame(sample=x,es=a)
df return(df)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- vector("list",2000)
res for (i in 1:2000){
<- neo_missense %>%
neo_missense_sim group_by(sample) %>%
mutate(neo_sim=sample(neo,length(neo))) %>%
ungroup()
<- neo_missense_sim %>%
neo_missense_sim select(-neo) %>%
rename(neo=neo_sim)
<- cal_nes_warp(neo_missense_sim)
neo_nes_sim $sim_num <- i
neo_nes_sim<- neo_nes_sim
res[[i]] print(paste0("Complete ",i," sim. "))
}saveRDS(res,file = "../data/sim_2000_not_filter_driver_exp.rds")
<- readRDS("../data/sim_2000_not_filter_driver_exp.rds")
sim_2000_not_filter <- bind_rows(sim_2000_not_filter)
sim_2000_not_filter $cancer <- get_cancer_type(sim_2000_not_filter$sample,
sim_2000_not_filterparallel = T,cores = 20)
saveRDS(sim_2000_not_filter,file = "../../tmp/sim_2000_not_filter_driver_exp_all.rds")
<- readRDS("../../tmp/sim_2000_not_filter_driver_exp_all.rds")
sim_all <- readRDS("../data/es_exp_filter_driver.rds")
nes_exp <- sim_all %>%
sim_filter_driver filter(sample %in% nes_exp$sample)
%>%
sim_filter_driver group_by(cancer,sim_num) %>%
summarise(median_es=median(es)) -> summ2
#> `summarise()` has grouped output by 'cancer'. You can override using the
#> `.groups` argument.
$cancer <- get_cancer_type(nes_exp$sample)
nes_exp<- nes_exp %>%
neo_nes_summ group_by(cancer) %>% summarise(median_es=median(es))
#The following code can be used to plot cancer type simulation which showd in FigS7
#res <- vector("list",30)
# for (i in 1:30){
# dt <- sim_all %>% filter(cancer==neo_nes_summ$cancer[i])
# dt_summ <- dt %>%
# group_by(sim_num) %>%
# summarise(median_es=median(es))
# p <- WVPlots::ShadedDensity(frame = dt_summ,
# xvar = "median_es",
# threshold = neo_nes_summ$median_es[i],
# title = neo_nes_summ$cancer[i],
# tail = "left")
# p$layers[[1]]$aes_params$colour <- "red"
# p$layers[[1]]$aes_params$size <- 1
# p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
# p$layers[[3]]$aes_params$colour <- "black"
# p$layers[[3]]$aes_params$size <- 1
# p1 <- p + labs(x="Simulation median es")+
# theme_prism()
# res[[i]] <- p1
# }
#
# plot_grid(plotlist = res)
%>%
sim_all group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ
<- WVPlots::ShadedDensity(frame = summ,
p xvar = "median_es",
threshold = median(nes_exp$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p#p$layers[[4]]$aes_params$label <- "Actual median ES" #geom_text
<- p + labs(x="Simulation median es")+
p1 theme_prism()+
scale_y_continuous(breaks=c(0,25,50,75,100))
p1
##save
# neo_nes_summ <- neo_nes_summ %>%
# rowwise() %>%
# mutate(p=mean(summ2$median_es[summ2$cancer==cancer] <= median_es))
# saveRDS(neo_nes_summ,file = "../data/neo_nes_summ_exp.rds")
<- readRDS("../data/neo_nes_summ_exp.rds")
neo_nes_summ <- get_f1(nes_exp,pancancer_p = "< 0.0005",dt2 = neo_nes_summ,median_es = median(nes_exp$es))
p3 #> Joining, by = "cancer"Joining, by = "cancer"
p3
In majority of cancer types (including KICH, DLBC, CHOL, SARC, LUAD, PAAD, UCS, STAD, LGG, LUSC, HNSC, OV, UCEC, BRCA, LIHC, COAD), a significant low ESRNA values are observed . This study demonstrates that the immunoediting escape through down-regulating the expression of neoantigenic alteration is prevalent in human cancer.
When we compared the median cancer type CCF-ES and EXP-ES, we found the significant negative corrlation between this two type ES value:
##these add p data can be calculated by change the parameter of need_p to TRUE
<- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds")
nes_ccf <- readRDS("../data/es_exp_filter_driver.rds")
nes_exp
<- inner_join(
nes_ccf_exp %>% rename(ccf_es=es),
nes_ccf %>% rename(exp_es=es)
nes_exp
)#> Joining, by = "sample"
$cancer <- get_cancer_type(nes_ccf_exp$sample)
nes_ccf_exp%>%
nes_ccf_exp group_by(cancer) %>%
summarise(median_ccf_es=median(ccf_es),
median_exp_es=median(exp_es)) -> summ_ccf_exp
cor.test(summ_ccf_exp$median_ccf_es,summ_ccf_exp$median_exp_es)
#>
#> Pearson's product-moment correlation
#>
#> data: summ_ccf_exp$median_ccf_es and summ_ccf_exp$median_exp_es
#> t = -3.5673, df = 28, p-value = 0.001323
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> -0.7651667 -0.2488362
#> sample estimates:
#> cor
#> -0.5589928
ggplot(data = summ_ccf_exp,aes(x=median_ccf_es,y=median_exp_es))+
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
geom_point()+
geom_label_repel(aes(label = cancer), size = 3,max.overlaps=35)+
stat_cor(label.y = 0.2,label.x = 0,
aes(label = paste(..r.label.., ..p.label.., sep = "~`,`~")),
size=4)+
theme_prism()+
labs(x="Median CCF ES",y="Median EXP ES")
%>%
nes_ccf_exp group_by(cancer) %>%
summarise(per_escape=mean(exp_es<0 & p_value<0.05),
median_es=median(ccf_es)) -> summ
cor.test(summ$per_escape,summ$median_es)
#>
#> Pearson's product-moment correlation
#>
#> data: summ$per_escape and summ$median_es
#> t = 3.5254, df = 28, p-value = 0.001476
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> 0.2426552 0.7624262
#> sample estimates:
#> cor
#> 0.5544535
ggplot(data = summ,aes(x=median_es,y=per_escape))+
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
geom_point()+
geom_label_repel(aes(label = cancer), size = 3,max.overlaps=35)+
stat_cor(label.y = 0.12,label.x = 0.1,
aes(label = paste(..r.label.., ..p.label.., sep = "~`,`~")),
size=4)+
theme_prism()+
labs(x="Median CCF ES",y="Escape sample percent")
Neoantigen enrichment score and immune negative selection strength quantification
Here we investigate the connections between neoantigen enrichment score (ESCCF) and immune negative selection strength s using a stochastic branching cancer evolution model as previously described (Lakatos et al., 2020). The detail of simulation process was described in the Method part and code can be found in code/julia/simulation.jl
.
Mutations harbored in at least 5 cells out of 105 were collected at the end of each simulation and the CCF was calculated. To account for imperfect sequencing measurements, CCF values were computed via a simulated sequencing step introducing noise to these frequencies with the indicated read depth (see method):
<- function(depth){
cal_ccf function(ccf){
map_dbl(ccf,function(x,y){rbinom(1,y,x)/y},depth)
}
}<- cal_ccf(depth = 200)
dep_ccf <- list.files("data/merge/")
files <- vector("list",length = 50)
res for (i in 1:50){
<- readRDS(paste0("~/test/data/merge/",files[i]))
s <- s %>%
s mutate(mut_neo=ifelse(mut_neo=="false","no","yes")) %>%
rename(neo=mut_neo)
$ccf_noise <- dep_ccf(s$ccf)
s<- s %>%
s filter(ccf_noise>0)
<- s %>% select(-ccf) %>% rename(ccf=ccf_noise)
s <- cal_nes_warp(s)##calculate ES
tt <- tt
res[[i]] print(paste0("complete ",i))
}<- bind_rows(res)
nes_sim <- nes_sim %>%
nes_sim mutate(selection = substr(sample,3,6))
saveRDS(nes_sim,file = "../data/nes_sim_growth_200_addp.rds")
We can plot the distribution of ES of 100 simulated tumors (read depth 200×) across different selection strength and the precent of significant tumors (FDR adjust p value <0.1):
<- readRDS("../data/nes_sim_growth_200_addp.rds")
nes_sim $s <- as.numeric(nes_sim$selection)
nes_sim<- seq(0,0.24,0.02)
need_s <- nes_sim %>% filter(s %in% need_s)
nes_sim_s ggplot(data=nes_sim_s,aes(x=paste0("-",selection),y=es,fill=selection))+
geom_violin()+
theme_prism()+
theme(axis.text.x = element_text(angle = 45,vjust = 1, hjust = 1))+
geom_hline(yintercept=(-0.023),
color = "red", size=1,linetype = "dashed")+
labs(x="Selection",y="ES")+
guides(fill=F)+
stat_summary(fun="median", geom="point",color="yellow",size=2)
#> Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
#> "none")` instead.
%>%
nes_sim_s group_by(selection) %>%
summarise(adj_p=p.adjust(p,method = "fdr"),sig_per=mean(adj_p<0.1)) -> summ
#> `summarise()` has grouped output by 'selection'. You can override using the
#> `.groups` argument.
<- summ %>% group_by(selection) %>% summarise(per=unique(sig_per))
summ
ggplot(data=summ,aes(x=paste0("-",selection),y=per*100))+
geom_bar(stat = "identity")+
theme_prism()+
theme(axis.text.x = element_text(angle = 45,vjust = 1, hjust = 1))+
labs(x="Selection",y="Percent of significant samples (%)")
ES-CCF show near linear correlation with s values. This analysis suggests that the quantified ES-CCF can be used to infer the immune selection strength in patient. The median ES-CCF in TCGA datasets is -0.023, suggesting a median immune negative selection strength s=-0.08.
Using this simulated data, we can also compare our method with other method. To compare the power of our method with method used in this study (Lakatos et al., 2020), we used two side K-S test to detect the difference between the VAF distribution of all mutations and neoantigen-associated mutations and reported K-S D statistic and corresponding p value:
<- list.files("data/merge/")
files <- vector("list",length = 50)
res
for (i in 1:50){
<- readRDS(paste0("~/test/data/merge/",files[i]))
s <- s %>%
s mutate(mut_neo=ifelse(mut_neo=="false","no","yes")) %>%
rename(neo=mut_neo)
$ccf_noise <- dep_ccf(s$ccf)
s<- s %>%
s filter(ccf_noise>0)
<- s %>% select(-ccf) %>% rename(ccf=ccf_noise)
s
<- vector("list",length = length(unique(s$sample)))
results_ccf names(results_ccf) <- unique(s$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- s %>% filter(sample == x)
data <- data %>% filter(neo=="yes")
neo <- ks.test(neo$ccf,data$ccf)
ks_t <- data.frame(D=ks_t$statistic,p=ks_t$p.value,sample=x)
ttt return(ttt)
simplify = FALSE)
},stopCluster(cl)
<- bind_rows(results_ccf)
res_ones <- res_ones
res[[i]] print(paste0("complete ",i))
}<- bind_rows(res)
res_ks $selection <- substr(res_ks$sample,3,6)
res_kssaveRDS(res_ks,file = "../data/ks_sim_depth200.rds")
Plot the distribution of KS D statics of 100 simulated tumors (read depth 200×) across different selection strength and the precent of significant tumors (FDR adjust p value <0.1)
<- readRDS("../data/ks_sim_depth200.rds")
res_ks $s <- as.numeric(res_ks$selection)
res_ks<- seq(0,0.24,0.02)
need_s <- res_ks %>% filter(s %in% need_s)
res_ks_s ggplot(data=res_ks_s,aes(x=paste0("-",selection),y=D,fill=selection))+
geom_violin()+
theme_prism()+
theme(axis.text.x = element_text(angle = 45,vjust = 1, hjust = 1))+
labs(x="Selection",y="K-S statistic")+
guides(fill=F)+
stat_summary(fun="median", geom="point",color="yellow",size=1)
#> Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
#> "none")` instead.
%>%
res_ks_s group_by(selection) %>%
summarise(adj_p=p.adjust(p,method = "fdr"),sig_per=mean(adj_p<0.1)) -> summ
#> `summarise()` has grouped output by 'selection'. You can override using the
#> `.groups` argument.
ggplot(data=summ,aes(x=paste0("-",selection),y=sig_per))+
geom_bar(stat = "identity")+
theme_prism()+
theme(axis.text.x = element_text(angle = 45,vjust = 1, hjust = 1))+
labs(x="Selection",y="Percent of significant samples")
Of note, no tumors show significant signal under the same simulated conditions.
To futher compare the power of methods, in addition to sequencing limitations, we also added different proportions of false positive neoantigen: we randomly sampled nonantigenic mutations of simulated tumors (varied between 5 and 500% of the number of true neoantigen) that were falsely labeled as neoantigen. Then we calculated the number of significant samples in each read depth and proportions of false positive neoantigen combination as the power of method to detect selection strength:
###power analysis
##es
<- function(depth){
cal_ccf function(ccf){
map_dbl(ccf,function(x,y){rbinom(1,y,x)/y},depth)
}
}
<- c(0,5,20,50,100,200,500)*0.01
neo_per <- c(50,100,200,500,800,1000,2000,5000,10000)
depth <- vector("list",63)
re
<- expand.grid(neo_per,depth)
com $lable <- paste(com$Var1,com$Var2,sep = "_")
com
names(re) <- com$lable
<- readRDS(paste0("~/test/data/merge/0.24.rds"))
s <- s %>%
s mutate(mut_neo=ifelse(mut_neo=="false","no","yes")) %>%
rename(neo=mut_neo) %>%
mutate(mut_sample_id=paste(sample,mut,sep = "-"))
set.seed(20211024)
for (i in neo_per){
for (j in depth){
<- cal_ccf(depth = j)
dep_ccf <- s %>% filter(neo =="yes")
neo <- neo %>% group_by(sample) %>% summarise(neo_c=n())
neo_summ <- s %>% filter(neo != "yes")
not_neo <- left_join(not_neo,neo_summ)
not_neo
<- not_neo %>%
false_neo group_by(sample) %>%
summarise(sample_id=sample(mut_sample_id,unique(neo_c)*i,replace = F))
<- not_neo %>%
not_neo mutate(neo = ifelse(mut_sample_id %in% false_neo$sample_id,"yes",neo))
<- rbind(neo,not_neo %>% select(-neo_c))
s_new
$ccf_noise <- dep_ccf(s_new$ccf)
s_new<- s_new %>%
s_new filter(ccf_noise>0)
<- s_new %>% select(-ccf) %>% rename(ccf=ccf_noise)
s_new <- cal_nes_warp(s_new)
a paste(i,j,sep = "_")]] <- a
re[[cat(paste0("FP ",i," & depth ",j," complted \n"))
}
}saveRDS(re,file = "../data/sim_growth_power_alt.rds")
##KS
for (i in neo_per){
for (j in depth){
<- cal_ccf(depth = j)
dep_ccf <- s %>% filter(neo =="yes")
neo <- neo %>% group_by(sample) %>% summarise(neo_c=n())
neo_summ <- s %>% filter(neo != "yes")
not_neo <- left_join(not_neo,neo_summ)
not_neo
#false_neo <- sample(not_neo$mut,nrow(neo)*i,replace = F)
<- not_neo %>%
false_neo group_by(sample) %>%
summarise(sample_id=sample(mut_sample_id,unique(neo_c)*i,replace = F))
<- not_neo %>%
not_neo mutate(neo = ifelse(mut_sample_id %in% false_neo$sample_id,"yes",neo))
<- rbind(neo,not_neo %>% select(-neo_c))
s_new
$ccf_noise <- dep_ccf(s_new$ccf)
s_new<- s_new %>%
s_new filter(ccf_noise>0)
<- s_new %>% select(-ccf) %>% rename(ccf=ccf_noise)
s_new <- vector("list",length = length(unique(s_new$sample)))
results_ccf names(results_ccf) <- unique(s_new$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- s_new %>% filter(sample == x)
data <- data %>% filter(neo=="yes")
neo <- ks.test(neo$ccf,data$ccf)
ks_t <- data.frame(D=ks_t$statistic,p=ks_t$p.value,sample=x)
ttt return(ttt)
simplify = FALSE)
},stopCluster(cl)
<- bind_rows(results_ccf)
res_ones paste(i,j,sep = "_")]] <- res_ones
re[[cat(paste0("FP ",i," & depth ",j," complted \n"))
}
}saveRDS(re,file = "../data/sim_growth_power_alt_ks.rds")
We used heatmap to display the power:
##es
<- readRDS("../data/sim_growth_power_alt.rds")
re <- data.frame(neo_depth=names(re),fre=NA)
results for (i in 1:nrow(results)){
<- re[[i]]
dt $padj <- p.adjust(dt$p,method = "fdr")
dtprint(paste0(names(re)[i]," ",sum(dt$padj < 0.1)))
$fre[i] <- sum(dt$padj < 0.1)
results
}#> [1] "0_50 38"
#> [1] "0.05_50 33"
#> [1] "0.2_50 21"
#> [1] "0.5_50 0"
#> [1] "1_50 7"
#> [1] "2_50 0"
#> [1] "5_50 0"
#> [1] "0_100 83"
#> [1] "0.05_100 73"
#> [1] "0.2_100 63"
#> [1] "0.5_100 44"
#> [1] "1_100 13"
#> [1] "2_100 0"
#> [1] "5_100 4"
#> [1] "0_200 90"
#> [1] "0.05_200 89"
#> [1] "0.2_200 70"
#> [1] "0.5_200 53"
#> [1] "1_200 28"
#> [1] "2_200 12"
#> [1] "5_200 0"
#> [1] "0_500 99"
#> [1] "0.05_500 99"
#> [1] "0.2_500 83"
#> [1] "0.5_500 71"
#> [1] "1_500 51"
#> [1] "2_500 11"
#> [1] "5_500 1"
#> [1] "0_800 98"
#> [1] "0.05_800 97"
#> [1] "0.2_800 92"
#> [1] "0.5_800 79"
#> [1] "1_800 44"
#> [1] "2_800 19"
#> [1] "5_800 1"
#> [1] "0_1000 100"
#> [1] "0.05_1000 99"
#> [1] "0.2_1000 92"
#> [1] "0.5_1000 75"
#> [1] "1_1000 46"
#> [1] "2_1000 34"
#> [1] "5_1000 7"
#> [1] "0_2000 100"
#> [1] "0.05_2000 100"
#> [1] "0.2_2000 97"
#> [1] "0.5_2000 88"
#> [1] "1_2000 70"
#> [1] "2_2000 16"
#> [1] "5_2000 0"
#> [1] "0_5000 100"
#> [1] "0.05_5000 100"
#> [1] "0.2_5000 98"
#> [1] "0.5_5000 95"
#> [1] "1_5000 72"
#> [1] "2_5000 31"
#> [1] "5_5000 7"
#> [1] "0_10000 100"
#> [1] "0.05_10000 100"
#> [1] "0.2_10000 98"
#> [1] "0.5_10000 95"
#> [1] "1_10000 79"
#> [1] "2_10000 48"
#> [1] "5_10000 9"
<- results %>%
results separate(col = neo_depth,into = c("neo_per","depth"),sep = "_")
%>%
results mutate(depth=paste0("depth",depth)) %>%
pivot_wider(names_from = depth,values_from = fre) %>% as.data.frame() -> mat
rownames(mat) <- mat$neo_per
<- mat %>% select(-neo_per)
mat <-c("#551A8B", "#8B3A62", "#FFFF00")
cols
colnames(mat) <- gsub("depth","",colnames(mat))
rownames(mat) <- as.character(100*as.numeric(rownames(mat)))
Heatmap(mat,cluster_columns = F,cluster_rows = F,row_names_side = "left",
name = "Power(%)",col = cols,
row_title = "Ratio of false positive \n neoantigens added (%)",
column_title = "Read depth",column_names_rot = 0,column_title_side = "bottom")
#> Warning: The input is a data frame, convert it to the matrix.
##ks
<- readRDS("../data/sim_growth_power_alt_ks.rds")
re <- data.frame(neo_depth=names(re),fre=NA)
results for (i in 1:nrow(results)){
<- re[[i]]
dt $padj <- p.adjust(dt$p,method = "fdr")
dtprint(paste0(names(re)[i]," ",sum(dt$padj < 0.1)))
$fre[i] <- sum(dt$padj < 0.1)
results
}#> [1] "0_50 0"
#> [1] "0.05_50 0"
#> [1] "0.2_50 0"
#> [1] "0.5_50 0"
#> [1] "1_50 0"
#> [1] "2_50 0"
#> [1] "5_50 0"
#> [1] "0_100 0"
#> [1] "0.05_100 0"
#> [1] "0.2_100 0"
#> [1] "0.5_100 0"
#> [1] "1_100 0"
#> [1] "2_100 0"
#> [1] "5_100 0"
#> [1] "0_200 0"
#> [1] "0.05_200 0"
#> [1] "0.2_200 0"
#> [1] "0.5_200 0"
#> [1] "1_200 0"
#> [1] "2_200 0"
#> [1] "5_200 0"
#> [1] "0_500 66"
#> [1] "0.05_500 62"
#> [1] "0.2_500 30"
#> [1] "0.5_500 16"
#> [1] "1_500 1"
#> [1] "2_500 0"
#> [1] "5_500 0"
#> [1] "0_800 95"
#> [1] "0.05_800 85"
#> [1] "0.2_800 68"
#> [1] "0.5_800 47"
#> [1] "1_800 13"
#> [1] "2_800 0"
#> [1] "5_800 0"
#> [1] "0_1000 97"
#> [1] "0.05_1000 97"
#> [1] "0.2_1000 95"
#> [1] "0.5_1000 55"
#> [1] "1_1000 19"
#> [1] "2_1000 0"
#> [1] "5_1000 0"
#> [1] "0_2000 100"
#> [1] "0.05_2000 100"
#> [1] "0.2_2000 99"
#> [1] "0.5_2000 96"
#> [1] "1_2000 73"
#> [1] "2_2000 0"
#> [1] "5_2000 0"
#> [1] "0_5000 100"
#> [1] "0.05_5000 100"
#> [1] "0.2_5000 100"
#> [1] "0.5_5000 100"
#> [1] "1_5000 98"
#> [1] "2_5000 65"
#> [1] "5_5000 0"
#> [1] "0_10000 100"
#> [1] "0.05_10000 100"
#> [1] "0.2_10000 100"
#> [1] "0.5_10000 100"
#> [1] "1_10000 100"
#> [1] "2_10000 96"
#> [1] "5_10000 0"
<- results %>%
results separate(col = neo_depth,into = c("neo_per","depth"),sep = "_")
%>%
results mutate(depth=paste0("depth",depth)) %>%
pivot_wider(names_from = depth,values_from = fre) %>% as.data.frame() -> mat
rownames(mat) <- mat$neo_per
<- mat %>% select(-neo_per)
mat <-c("#551A8B", "#8B3A62", "#FFFF00")
cols
colnames(mat) <- gsub("depth","",colnames(mat))
rownames(mat) <- as.character(100*as.numeric(rownames(mat)))
Heatmap(mat,cluster_columns = F,cluster_rows = F,row_names_side = "left",
name = "Power(%)",col = cols,
row_title = "Ratio of false positive \n neoantigens added (%)",
column_title = "Read depth",column_names_rot = 0,column_title_side = "bottom")
#> Warning: The input is a data frame, convert it to the matrix.
Pan-cancer features and correlations of immunoediting signal
Then we compared the immune cell infiltration level between tumors with detectable immunoediting-elimination signal (ESCCF<0, P<0.05) or immunoediting-escape signal (ESRNA<0, P<0.05) and other samples:
<- readRDS("../data/pancancer_subtcells.rds")
pancancer_subtcells <- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds")
nes_ccf <- readRDS("../data/es_exp_filter_driver.rds")
nes_exp
<- left_join(
ccf_immune
nes_ccf,
pancancer_subtcells%>%
) mutate(es_type=ifelse(es<0 & p <0.05,"yes","no"))
#> Joining, by = "sample"
<- left_join(
exp_immune
nes_exp ,
pancancer_subtcells%>%
) mutate(es_type=ifelse(es<0 & p_value <0.05,"yes","no"))
#> Joining, by = "sample"
<- ggplot(data=ccf_immune,aes(x=es_type,y=CD8_T+NK))+
p1 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Elimination",y="CD8 T + NK")
<- ggplot(data=ccf_immune,aes(x=es_type,y=iTreg+nTreg))+
p2 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Elimination",y="iTreg + nTreg")
+ p2
p1 #> Warning: Removed 70 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 70 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 70 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 70 rows containing non-finite values (stat_compare_means).
<- ggplot(data=exp_immune,aes(x=es_type,y=CD8_T+NK))+
p3 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Escape",y="CD8 T + NK")
<- ggplot(data=exp_immune,aes(x=es_type,y=iTreg+nTreg))+
p4 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Escape",y="iTreg + nTreg")
+ p4
p3 #> Warning: Removed 92 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 92 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 92 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 92 rows containing non-finite values (stat_compare_means).
In tumors with detectable immunoediting-elimination signal, a slightly increased CD8+ T cell infiltration status compared with the remaining samples were observed, but the difference does not reach statistical significance (P=0.2). This data suggests that historically happened immunoediting-elimination process may not be reflected in the current immune cell infiltration status.
The immunoediting-escape signal quantified as ES-RNA also do not show statistically significant difference between samples with detectable immunoediting-escape signal and the remaining samples in CD8+ T cell infiltration. However we observe a significant up-regulated regulatory T cell (Treg) percentage in samples with detectable immunoediting-escape signal, and up-regulated Treg has been reported to stimulate tumor immune escape.
Immunotherapy dataset analysis
Quantified immunoediting-elimination signal predicts the clinical response of cancer immunotherapy
To investigate the predictive performance of the quantified immunoediting-elimination signal (ESCCF) in ICI response prediction for individual patient, we searched for public ICI datasets with raw WES data and RNA-seq data available, and three melanoma ICI datasets have been identified. Then we calculated the immunoediting-elimination signal (ES-CCF) for each patient:
<- readRDS("../data/Immunotherapy/all_mut_ccf_ici.rds")
all_mut_ccf_ici <- all_mut_ccf_ici %>%
all_mut_ccf_ici rename(ccf=cancer_cell_frac) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
<- all_mut_ccf_ici %>% filter(ccf<0.6) %>% select(sample) %>%
samples_has_subclonal distinct(sample)
%>% filter(sample %in% samples_has_subclonal$sample) %>%
all_mut_ccf_ici group_by(sample) %>%
summarise(c_n=sum(neo=="yes"),c_m=sum(neo=="no")) %>% filter(c_n>=1 & c_m >=1) -> summ
<- all_mut_ccf_ici %>% filter(sample %in% summ$sample)
neo_missense <- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 18),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- dt %>% filter(sample == x)
data <- NeoEnrichment::cal_nes_new_test(dt = data,
a sample_counts = 1000,
need_p = FALSE)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- neo_missense %>% select(sample,neo,ccf) %>% filter(!is.na(ccf))
neo_missense <- cal_nes_warp(neo_missense)
neo_nes $es <- round(neo_nes$es,digits = 3)
neo_nessaveRDS(neo_nes,file = "../data/Immunotherapy/nes_immunetherapy.rds")
In univariate Cox proportional hazards regression analysis, quantified ESCCF value is significantly associated with cancer patients’ survival (p=0.03), and low ESCCF value (suggest the presence of high immunoediting-elimination signal) is associated with improved ICI clinical response (Hazard ratio (HR)=3.74, 95%CI=1.11-12.6) :
<- readRDS("../data/Immunotherapy/all_clinical.rds")
all_clinical
<- readRDS("../data/Immunotherapy/nes_immunetherapy.rds")
neo_nes <- left_join(neo_nes,all_clinical,by="sample")
neo_nes <- neo_nes %>% filter(!is.na(response2))
neo_nes
##cox analysis
<- show_forest(neo_nes,covariates = "es",time = "OS.time",status = "OS")
p1 #> => Processing variable es
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> Resized limits to included dashed line in forest panel
p1
Patients are divided into three groups based on ES-CCF value cutoff determined by surv_cutpoint
function, patients with lowest ES-CCF values (indicate the presence of immunoediting-elimination signal) show the best survival after ICI:
<- readRDS("../data/Immunotherapy/all_mut_ccf_ici.rds")
all_mut_ccf_ici <- all_mut_ccf_ici %>%
all_mut_ccf_ici rename(ccf=cancer_cell_frac) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
surv_cutpoint(neo_nes,"OS.time","OS","es")
#> cutpoint statistic
#> es -0.222 2.443914
<- neo_nes %>%
neo_nes mutate(es_type = case_when(
<= (-0.222) ~ "low",
es > (-0.222) ~ "high"
es
))
<- neo_nes %>% filter(es_type=="low")
low <- neo_nes %>% filter(es_type=="high")
high <- all_clinical %>%
all_clinical filter(sample %in% all_mut_ccf_ici$sample) %>%
mutate(type=case_when(
%in% low$sample ~ "low",
sample %in% high$sample ~ "high",
sample TRUE ~ "others"
))<- show_km(all_clinical,"type")
p3 p3
we use logistic regression to compare the efficiency of ESCCF, TMB and neoantigenic mutation count in predicting immunotherapy clinical response. Relationship between prognosis (patients with clinical response or without clinical response) and ESCCF, TMB and neoantigenic mutation count was analyzed. The goodness of fit was performed by Hosmer–Lemeshow test (H-L test):
<- neo_nes %>%
neo_nes mutate(obj=ifelse(response2=="response",1,0))
<- glm(obj ~ es, data = neo_nes, family = "binomial")
model summary(model)
#>
#> Call:
#> glm(formula = obj ~ es, family = "binomial", data = neo_nes)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.1562 -0.8402 -0.7737 1.4243 1.7095
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.8676 0.2247 -3.862 0.000113 ***
#> es -1.1293 0.9067 -1.246 0.212942
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 118.34 on 96 degrees of freedom
#> Residual deviance: 116.79 on 95 degrees of freedom
#> AIC: 120.79
#>
#> Number of Fisher Scoring iterations: 4
::performance_hosmer(model)
performance#> # Hosmer-Lemeshow Goodness-of-Fit Test
#>
#> Chi-squared: 4.874
#> df: 8
#> p-value: 0.771
#> Summary: model seems to fit well.
%>%
all_mut_ccf_ici group_by(sample) %>%
summarise(tmb=n()/38,nb=sum(neo=="yes")/38) -> tmb_nb
<- left_join(neo_nes,tmb_nb)
neo_nes #> Joining, by = "sample"
<- glm(obj ~ tmb, data = neo_nes, family = "binomial")
model2 summary(model2)
#>
#> Call:
#> glm(formula = obj ~ tmb, family = "binomial", data = neo_nes)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.5311 -0.7640 -0.7319 1.3947 1.7305
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -1.30753 0.29590 -4.419 9.92e-06 ***
#> tmb 0.06035 0.02584 2.335 0.0195 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 118.34 on 96 degrees of freedom
#> Residual deviance: 110.98 on 95 degrees of freedom
#> AIC: 114.98
#>
#> Number of Fisher Scoring iterations: 4
::performance_hosmer(model2)
performance#> # Hosmer-Lemeshow Goodness-of-Fit Test
#>
#> Chi-squared: 15.447
#> df: 8
#> p-value: 0.051
#> Summary: model seems to fit well.
<- glm(obj ~ nb, data = neo_nes, family = "binomial")
model3 summary(model3)
#>
#> Call:
#> glm(formula = obj ~ nb, family = "binomial", data = neo_nes)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.5523 -0.7890 -0.7279 1.1717 1.7284
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -1.2858 0.3006 -4.278 1.89e-05 ***
#> nb 0.8816 0.4001 2.203 0.0276 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 118.34 on 96 degrees of freedom
#> Residual deviance: 112.47 on 95 degrees of freedom
#> AIC: 116.47
#>
#> Number of Fisher Scoring iterations: 4
::performance_hosmer(model3)
performance#> # Hosmer-Lemeshow Goodness-of-Fit Test
#>
#> Chi-squared: 9.389
#> df: 8
#> p-value: 0.311
#> Summary: model seems to fit well.
<- ggplot(neo_nes, aes(x=es, y=obj)) +
p4 geom_point(alpha=.5) +
stat_smooth(method="glm", se=FALSE, fullrange=TRUE,
method.args = list(family=binomial),color="red")+
theme_bw()+
theme(axis.title.y = element_blank())+
labs(x="ES",title = "H-L test P value = 0.771")
<- ggplot(neo_nes, aes(x=tmb, y=obj)) +
p5 geom_point(alpha=.5) +
stat_smooth(method="glm", se=FALSE, fullrange=TRUE,
method.args = list(family=binomial),color="red")+
theme_bw()+
theme(axis.title.y = element_blank())+
labs(x="TMB",title = "H-L test P value = 0.051")
<- ggplot(neo_nes, aes(x=nb, y=obj)) +
p6 geom_point(alpha=.5) +
stat_smooth(method="glm", se=FALSE, fullrange=TRUE,
method.args = list(family=binomial),color="red")+
theme_bw()+
theme(axis.title.y = element_blank())+
labs(x="Neoantigen burden",title = "H-L test P value = 0.311")
<- p4 + p5 + p6
p7
p7#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
The H-L test P-value of TMB is 0.051, close to 0.05, implicate the difference between prediction and expectation is close to significant. The H-L test P-value of ESCCF is 0.771 , suggesting that the quantified immunoediting-elimination signal can be biomarker for ICI clinical response prediction.
ICI clinical response are known to be influenced by variables like gender and drug type, so we performed multivariate Cox regression including treament (Nivolumab or Pembrolizumab) and gender as covariate:
<- readRDS("../data/Immunotherapy/all_treatment.rds")
all_tre <- left_join(
neo_nes
neo_nes,all_tre
)#> Joining, by = "sample"
show_forest(neo_nes,covariates = c("es"),
time = "OS.time",status = "OS",
controls = "Treatment")
#> => Processing variable es
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
show_forest(neo_nes,covariates = c("es"),
time = "OS.time",status = "OS",
controls = "Gender")
#> => Processing variable es
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
Based on this analysis, the effects of ESCCF is not influence by ICI type and gender, however more samples are needed to further validated the clinical effects of ESCCF in ICI clinical response prediction.
Supplementary analyses
Sample statistics and clinical features
Number of the patient with CCF information and number of patient with at least one neoantigenic mutation and subclonal mutation (CCF<0.6) are shown for each cancer type and number of the patient with mRNA expression information and number of patient with at least one neoantigenic mutation and accompanied mRNA expression information are shown for each cancer type:
###sample statistics
<- readRDS("../data/all_mut_ccf_tpm.rds")
all_mut_ccf <- all_mut_ccf %>%
all_mut_ccf rename(ccf=ccf_hat) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
<- all_mut_ccf %>% filter(ccf<0.6) %>% select(sample) %>%
samples_has_subclonal distinct(sample)
%>%
all_mut_ccf group_by(sample) %>%
summarise(c_n=sum(neo=="yes"),c_m=sum(neo=="no")) %>%
mutate(type=ifelse(sample %in% samples_has_subclonal$sample,"yes","no")) -> sample_summ
$cancer <- get_cancer_type(sample_summ$sample)
sample_summ%>%
sample_summ group_by(cancer) %>%
summarise(sample_counts=n(),
sample_with_neoantigen_counts=sum(c_n>=1 & c_m >=1 & type=="yes")) -> cancer_summ
<- cancer_summ %>% arrange(sample_counts)
cancer_summ $cancer <- factor(cancer_summ$cancer,levels = cancer_summ$cancer)
cancer_summ
%>%
cancer_summ ::rename(`All samples`=sample_counts,
dplyr`Samples with >= 1 neoantigen and have subclonal mutations`=sample_with_neoantigen_counts) %>%
pivot_longer(cols = c("All samples","Samples with >= 1 neoantigen and have subclonal mutations"),
names_to="type",
values_to="counts") -> cancer_summ_longer
<- pyramid_chart(data = cancer_summ_longer, x = cancer, y = counts, group = type)
p1 #> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
#> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
#> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
#> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
<- readRDS("../data/all_mut_tpm_not_filter.rds")
all_mut_exp %>%
all_mut_exp group_by(sample) %>%
summarise(n_a=sum(neo=="not_neo"),n_c=sum(neo=="neo")) -> sample_summ
$cancer <- get_cancer_type(sample_summ$sample)
sample_summ%>%
sample_summ group_by(cancer) %>%
summarise(sample_counts=n(),
sample_with_neoantigen_counts=sum(n_a>=1 & n_c >=1)) -> cancer_summ
<- cancer_summ %>% arrange(sample_counts)
cancer_summ $cancer <- factor(cancer_summ$cancer,levels = cancer_summ$cancer)
cancer_summ
%>%
cancer_summ ::rename(`All samples`=sample_counts,
dplyr`Samples with >= 1 neoantigen`=sample_with_neoantigen_counts) %>%
pivot_longer(cols = c("All samples","Samples with >= 1 neoantigen"),
names_to="type",
values_to="counts") -> cancer_summ_longer
<- pyramid_chart(data = cancer_summ_longer, x = cancer, y = counts, group = type)
p2 #> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
#> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
#> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
#> Warning: `expand_scale()` is deprecated; use `expansion()` instead.
p1
p2
We also calculated the significant sample in each cancer type:
<- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds") %>%
pancancer_ccf mutate(es_type=ifelse(es<0 & p <0.05,"sig_neg","others"))
<- readRDS("../data/es_exp_filter_driver.rds") %>%
pancancer_exp mutate(es_type=ifelse(es<0 & p_value <0.05,"sig_neg","others"))
$cancer <- get_cancer_type(pancancer_ccf$sample)
pancancer_ccf$cancer <- get_cancer_type(pancancer_exp$sample)
pancancer_exp
<- function(dt,total_sig_sample){
get_plot <- dt %>%
sig filter(es_type=="sig_neg") %>%
::group_by(cancer) %>%
dplyrsummarise(counts=n()) %>%
arrange(counts) %>%
mutate(cancer=factor(cancer,levels = cancer)) %>%
mutate(`Sample proportion`= counts/total_sig_sample) %>%
mutate(label=paste("frac(",counts,",",total_sig_sample,")",sep = ""))
<- ggplot(data=sig,aes(x=cancer,y=`Sample proportion`))+
p1 geom_bar(mapping = aes(x=cancer,y=`Sample proportion`),stat = "identity")+
theme_classic()+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))+
scale_y_continuous(expand = expansion(mult = c(0, .1)))+##remove blank in the bottom
theme(axis.title.x=element_blank())+
geom_text(aes(label=label), position=position_dodge(width=0.9), vjust=-0.25,size=2,parse=TRUE)
return(p1)
}
get_plot(pancancer_ccf,sum(pancancer_ccf$es_type=="sig_neg"))
get_plot(pancancer_exp,sum(pancancer_exp$es_type=="sig_neg"))
Pan-cancer sample distribution in pie plot:
<- pancancer_ccf %>%
pancancer_ccf mutate(es_type2=ifelse(es<0 & p<0.05,"ES < 0 & P < 0.05","Others"))
<- pancancer_exp %>%
pancancer_exp mutate(es_type2=ifelse(es<0 & p_value<0.05,"ES < 0 & P < 0.05","Others"))
<- pancancer_ccf$es
ccf_es <- pancancer_ccf$es_type2
ccf_es1 par(mfrow=c(4,1))
<- par(mar = rep(0, 4))
op plot_pie(ccf_es,expression = "ccf_es<0",c("ES (CCF) < 0","ES (CCF) >= 0"))
#> NULL
plot_pie(ccf_es1,expression = "ccf_es1=='ES < 0 & P < 0.05'",c("ES (CCF) < 0 & P < 0.05","Others"))
#> NULL
<- pancancer_exp$es
exp_es <- pancancer_exp$es_type2
exp_es1 plot_pie(exp_es,expression = "exp_es<0",c("ES (EXP) < 0","ES (EXP) >= 0"))
#> NULL
plot_pie(exp_es1,expression = "exp_es1=='ES < 0 & P < 0.05'",c("ES (EXP) < 0 & P < 0.05","Others"))
#> NULL
The immune-thearpy dataset sample distribution and clinical parameters:
<- readRDS("../data/Immunotherapy/all_mut_ccf_ici.rds")
all_sample_exp_ccf <- all_sample_exp_ccf %>% filter(!is.na(cancer_cell_frac))
ccf <- all_sample_exp_ccf %>% filter(!is.na(exp))
exp
<- union(names(table(ccf$sample)),names(table(exp$sample)))
all_sample <- intersect(names(table(ccf$sample)),names(table(exp$sample)))
both <- setdiff(names(table(ccf$sample)),names(table(exp$sample)))
WES <- setdiff(names(table(exp$sample)),names(table(ccf$sample)))
RNA <- readRDS("../data/Immunotherapy/all_clinical.rds")
all_clinical
<- all_clinical %>% filter(response2=="response")
response <- all_clinical %>% filter(response2!="response")
no_response
<- read_csv("../data/Immunotherapy/willy_clinical.csv") %>%
willy_tre select(`Patient ID`,Treatment,Gender,Age) %>%
rename(sample=`Patient ID`) %>% mutate(sample=paste("willy_",sample,sep = "")) %>% na.omit()
##nadeem Nivolumab
<- data.frame(sample=all_sample[grepl("nadeem_",all_sample)],Treatment="Nivolumab",
nadeem_tre Gender=NA,Age=NA,stringsAsFactors = F)
<- readRDS("../data/Immunotherapy/liu_clinical.rds") %>%
liu_clinical select(X,Tx,`gender..Male.1..Female.0.`) %>%
mutate(Gender=ifelse(`gender..Male.1..Female.0.`==1,"M","F")) %>%
select(-`gender..Male.1..Female.0.`) %>% mutate(Age=NA) %>%
rename(sample=X,Treatment=Tx) %>% mutate(sample=paste0("liu_",sample)) %>% filter(grepl("_Patient",sample))
<- bind_rows(list(willy_tre,nadeem_tre,liu_clinical))
all_tre saveRDS(all_tre,file = "../data/Immunotherapy/all_treatment.rds")
<- data.frame(sample=all_sample,
dt stringsAsFactors = F) %>%
mutate(`Data type`= case_when(
%in% both ~ "Both",
sample %in% WES ~ "WES",
sample %in% RNA ~ "RNA-Seq"
sample %>%
)) mutate(Response=ifelse(sample %in% response$sample,"Responder","Non-Responder")) %>%
mutate(cohort=gsub("_.*","",sample)) %>%
left_join(.,all_tre,by="sample")
<- dt %>% arrange(sample)
dt
<- left_join(dt,all_clinical,by="sample")
dt_cli
<- RColorBrewer::brewer.pal(3,"Accent")
cohort_col <- RColorBrewer::brewer.pal(3,"Dark2")
data_type_col <- RColorBrewer::brewer.pal(3,"Set1")[1:2]
Response_col <- RColorBrewer::brewer.pal(3,"Paired")
Treatment_col = HeatmapAnnotation(
ha Cohort=dt$cohort,
`Data type` = dt$`Data type`,
Response = dt$Response,
Treatment=dt$Treatment,
col = list(Cohort=c("liu"=cohort_col[1],"nadeem"=cohort_col[2],"willy"=cohort_col[3]),
`Data type` = c("WES" =data_type_col[1], "RNA-Seq" = data_type_col[2],"Both"=data_type_col[3]),
Response = c("Responder"=Response_col[1],"Non-Responder"=Response_col[2]),
Treatment=c("Nivolumab"=Treatment_col[1],"Pembrolizumab"=Treatment_col[2])),
annotation_name_side="left",height = unit(300, "cm"),width = unit(20, "cm")
)
= Legend(labels = c("liu","nadeem","willy"), legend_gp = gpar(fill=c(cohort_col[1],cohort_col[2],cohort_col[3])),
lgd1 title = "Cohort", ncol = 3)
= Legend(labels = c("WES","RNA-Seq","Both"), legend_gp = gpar(fill=c(data_type_col[1],data_type_col[2],data_type_col[3])),
lgd2 title = "Data type", ncol = 3)
= Legend(labels = c("Responder","Non-Responder"), legend_gp = gpar(fill=c(Response_col[1],Response_col[2])),
lgd3 title = "Response", ncol = 2)
= Legend(labels = c("Nivolumab","Pembrolizumab"), legend_gp = gpar(fill=c(Treatment_col[1],Treatment_col[2],Treatment_col[3])),
lgd4 title = "Treatment", ncol = 2)
draw(ha, test = T)
= packLegend(lgd1, lgd2,lgd3,lgd4)
pd draw(pd,x = unit(14, "cm"), y = unit(6, "cm"), just = c("left", "bottom"))
::include_graphics("sample_statistics.png") knitr
<- left_join(all_sample_exp_ccf,dt_cli,by="sample")
all_sample_exp_ccf %>% group_by(sample) %>% summarise(TMB=log((n()/38)+1)) %>%
all_sample_exp_ccf left_join(.,dt_cli,by="sample") %>% filter(!is.na(cohort)) -> all_tmb
%>% group_by(cohort) %>% summarise(tmb_median=median(TMB),tmb_min=min(TMB),max_tmb=max(TMB))
all_tmb %>% filter(!is.na(cohort)) %>%
all_sample_exp_ccf group_by(sample) %>% summarise(neo_counts=sum(neo=="neo")) %>%
left_join(.,dt_cli,by="sample")-> all_neo
%>% group_by(cohort) %>% summarise(median_neo=median(neo_counts))
all_neo ########表格####
<- tibble(
gt_input cohort=c("Liu et.al \n (2019)","Hogo et.al \n (2016)","Riaz et.al \n (2017)"),
`Tumor type`=c("melanoma","melanoma","melanoma"),
Treatment=c("monotherapy","monotherapy","monotherapy"),
Strategy=c("WES RNA-seq","WES RNA-seq","WES RNA-seq"),
Patients=c(130,37,56),
`Number of men(%)`=c("77(59.2%)","26(70.3%)",NA),
`Number of women(%)`=c("53(40.8%)","11(29.7%)",NA),
`Median age, years`=c(NA,61,NA),
`Median survial time(OS,month)`=c(19.4,18.3,17.4),
`TMB median`=c(1.42,2.49,1.86),
`Neoantigen counts median`=c(28.5,77,52)
)
<- gt(data = gt_input)
gt_tbl %>%
gt_input gt(rowname_col = "cohort") %>%
cols_align(
align = "center"
-> gt_tbl
) ::grid.table(gt_input) gridExtra
::include_graphics("sample_table.png") knitr
#####oncoprint############
##liu
<- readRDS("../data/Immunotherapy/liu_mutations.rds")
liu_mutations <- liu_mutations %>% mutate(change=str_extract(cDNA_Change,"[A-Z]>[A-Z]")) %>%
liu_mutations mutate(Tumor_Seq_Allele2=gsub(">","",str_extract(change,">[A-Z]"))) %>%
mutate(Reference_Allele=gsub(">","",str_extract(change,"[A-Z]>"))) %>%
rename(Tumor_Sample_Barcode=Patient)
<- readRDS("../data/Immunotherapy/liu_clinical.rds") %>%
liu_clinical filter(grepl("Patient",X)) %>%
select(X,OS,dead) %>%
rename(Tumor_Sample_Barcode=X,Overall_Survival_time=OS,Overall_Survival_Status=dead)
<- read.maf(maf = liu_mutations,
liu clinicalData = liu_clinical,
verbose = FALSE)
oncoplot(maf = liu,top=10,draw_titv = TRUE,showTitle = FALSE,removeNonMutated=T)
#> Warning in titv(maf = maf, useSyn = TRUE, plot = FALSE): Non standard Ti/Tv
#> class: 1540TRUE
##willy
<- list.files("../data/Immunotherapy/willy/",full.names = T)
files <- merge_mafs(mafs = files)
willy_maf #> Merging 38 MAF files
#> -Validating
#> -Silent variants: 225813
#> -Summarizing
#> --Possible FLAGS among top ten genes:
#> TTN
#> MUC16
#> FLG
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 5.520s elapsed (1.690s cpu)
oncoplot(maf = willy_maf,top=10,draw_titv = TRUE,showTitle = FALSE,removeNonMutated=T)
###nadeem
<- list.files("../data/Immunotherapy/nadeem/",full.names = T)
files <- merge_mafs(mafs=files)
nadeem_maf #> Merging 66 MAF files
#> -Validating
#> -Silent variants: 84078
#> -Summarizing
#> --Possible FLAGS among top ten genes:
#> TTN
#> MUC16
#> OBSCN
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 1.580s elapsed (0.610s cpu)
oncoplot(maf = nadeem_maf,top=10,draw_titv = TRUE,showTitle = FALSE,removeNonMutated=T)
Robust of method
Minimum number of non-neoantigenic mutations to calculate significant P
The calculated sample-level specific P values are dependent on the mutation rank, and also the number of total mutations and the number of antigenic mutations. So we done the simulation analysis to get the minimum number of antigenic and non-antigenic mutations to get confident quantification of ES. For each number of neoantigenic mutation, we put the neoantigenic mutations in the position of CCF interval or TPM expression with the smallest rank, and put the non-neoantigenic mutations in the position of CCF interval or TPM expression with the largest rank. Then we can get the minimum number of non-neoantigenic mutations to calculate significant p values in each number of neoantigenic mutation.
##CCF
set.seed(202202181)
<- vector("list",10)
neo_mut for (i in 1:10){
<- i
neo_c1 <- 0
mut_c1 <- 1
j <- data.frame(sample=rep("sim",neo_c1+mut_c1),
dt neo=c(rep("yes",neo_c1),rep("no",mut_c1)),
ccf=c(rep(0.005,neo_c1),rep(1,mut_c1)))
<- NeoEnrichment::cal_nes_new_test(dt = dt,
a sample_counts = 1000,
need_p = TRUE)
<- data.frame(neo_c=neo_c1,mut_c=mut_c1,es=a$es,p=a$p)
dt_current while (a$p >= 0.05){
<- mut_c1 + 1
mut_c1 <- j + 1
j <- data.frame(sample=rep("sim",neo_c1+mut_c1),
dt neo=c(rep("yes",neo_c1),rep("no",mut_c1)),
ccf=c(rep(0.005,neo_c1),rep(1,mut_c1)))
<- NeoEnrichment::cal_nes_new_test(dt = dt,
a sample_counts = 1000,
need_p = TRUE)
<- c(neo_c1,mut_c1,a$es,a$p)
dt_current[j,] print(dt_current)
}<- dt_current
neo_mut[[i]]
}
<- bind_rows(neo_mut)
neo_mut saveRDS(neo_mut,file = "../data/ccf_neo_mut_counts.rds")
###exp
<- vector("list",10)
neo_mut_exp for (i in 1:10){
<- i
neo_c1 <- 0
mut_c1 <- 1
j <- data.frame(sample=rep("sim",neo_c1+mut_c1),
dt neo=c(rep("neo",neo_c1),rep("not_neo",mut_c1))) %>% mutate(exp=row_number())
<- NeoEnrichment::cales_t(data = dt,barcode = "sim",type = "II",
a calp = TRUE,sample_counts = 1000,
cal_type = "exp")
<- data.frame(neo_c=neo_c1,mut_c=mut_c1,es=a$es,p=a$p_value)
dt_current while (a$p_value >= 0.05){
<- mut_c1 + 1
mut_c1 <- j + 1
j <- data.frame(sample=rep("sim",neo_c1+mut_c1),
dt neo=c(rep("neo",neo_c1),rep("not_neo",mut_c1))) %>%
mutate(exp=row_number())
<- NeoEnrichment::cales_t(data = dt,barcode = "sim",type = "II",
a calp = TRUE,sample_counts = 1000,
cal_type = "exp")
<- c(neo_c1,mut_c1,a$es,a$p_value)
dt_current[j,] print(dt_current)
}<- dt_current
neo_mut_exp[[i]]
}<- bind_rows(neo_mut_exp)
neo_mut_exp saveRDS(neo_mut_exp,file = "../data/exp_neo_mut_counts.rds")
<- readRDS("../data/ccf_neo_mut_counts.rds")
neo_mut_ccf <- neo_mut_ccf %>% filter(p<0.05)
neo_mut_sig <- neo_mut_sig %>%
neo_mut_sig mutate(all_mut_counts=neo_c+mut_c)
<- ggplot(data=neo_mut_sig,aes(x=neo_c,y=all_mut_counts))+
p1 geom_bar(stat = "identity")+
theme_prism()+
labs(x="neoantigen counts",y="all mutation counts")+
scale_x_continuous(breaks=neo_mut_sig$neo_c, labels = neo_mut_sig$neo_c)
<- readRDS("../data/exp_neo_mut_counts.rds")
neo_mut_exp <- neo_mut_exp %>% filter(p<0.05)
neo_mut_exp_sig <- neo_mut_exp_sig %>%
neo_mut_sig mutate(all_mut_counts=neo_c+mut_c)
<- ggplot(data=neo_mut_sig,aes(x=neo_c,y=all_mut_counts))+
p2 geom_bar(stat = "identity")+
theme_prism()+
labs(x="neoantigen counts",y="all mutation counts")+
scale_x_continuous(breaks=neo_mut_sig$neo_c, labels = neo_mut_sig$neo_c)
/p2 p1
For example, if a patient sample only have one neoantigenic mutations, then at least 20 total mutations are required for reliable quantification of the immunoediting signal. This is a bottom line requirement, and indicates that any samples with 1 neoantigenic mutation and less than 20 total mutations will not have the chance to have significant (P<0.05) immunoediting signal.
Threshold for neoantigen prediction
We used the less stringent threshold: IC50 < 500 and %Rank < 2 and TPM > 1 (TPM cutoff only for calculating ESccf) to filter neoantigens. This increases the precent of antigenic mutatoins from 9% to 19% for dataset used to calculate ESccf and from 16% to 31% for dataset used to calculated ESrna:
<- list.files("~/test/data/2021_03_31/")
files <- vector("list",100)
re
for (i in 1:100){
<- readRDS(paste("~/test/data/2021_03_31/",files[i],sep = ""))
mut <- mut %>%
neo filter(novelty == 1) %>%
filter(IC50 < 500 & bindlevel=="WB" & novelty==1 ) %>%
distinct(index,.keep_all = T)
<- mut %>%
mut distinct(index,.keep_all = T) %>%
mutate(neo = ifelse(index %in% neo$index , "neo","not_neo"))
<- mut
re[[i]]
}
<- bind_rows(re)
all_mut <- all_mut %>%
all_mut select(sample,chr,position,ref,alt,neo,gene,exp)
<- all_mut %>%
all_mut mutate(gene=gsub("\\:.+","",gene))
saveRDS(all_mut,file = "../data/all_mut_rank2_IC50.rds")
##add tmp
<- readRDS("~/useful_data/xena_RSEM_TPM/tpm_trans.rds")
tpm <- tpm[!duplicated(tpm$gene),]
tpm $tpm_exp <- mapply(function(sample,gene){
all_mut$gene==gene,substr(sample,1,15)]
tpm[tpm$sample,all_mut$gene)
},all_mut<- all_mut %>%
all_mut1 filter(lengths(tpm_exp)!=0)
$tpm_exp <- as.numeric(all_mut1$tpm_exp)
all_mut1saveRDS(all_mut1,file = "data/all_mut_tpm_not_filter_Rank2_IC50.rds")##this file is used to calculte EXP-ES
<- all_mut1 %>%
all_mut1 mutate(neo2=ifelse(neo=="neo" & tpm_exp>1,"neo","not_neo"))
<- all_mut1 %>%
all_mut1 select(-neo,-exp) %>%
rename(neo=neo2)
saveRDS(all_mut1,file = "data/all_mut_tpm_Rank2_IC50.rds")
##add ccf
<- readRDS("~/test/data/2021_04_05/all_mut_mis_ccf.rds")
results <- readRDS("data/all_mut_tpm_Rank2_IC50.rds")
all_mut_tpm <- all_mut_tpm %>%
all_mut mutate(index=paste(sample,chr,position,ref,alt,sep = ":")) %>%
::rename(ref_allele=ref,alt_allele=alt)
dplyr
<- inner_join(
all_mut_ccf
all_mut,%>% select(-sample),
results by="index"
)<- all_mut_ccf[!duplicated(all_mut_ccf$index),]
all_mut_ccf saveRDS(all_mut_ccf,file = "data/all_mut_ccf_tpm_Rank2_IC50.rds")##This file is used to calculate CCF-ES
Caculate ESccf and ESrna:
##cal
<- readRDS("data/all_mut_ccf_tpm_Rank2_IC50.rds")
all_mut_ccf <- readRDS("~/Immunoediting/data/driver_mutations.rds")
driver_mutations <- all_mut_ccf %>%
all_mut_ccf rename(ccf=ccf_hat) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
<- all_mut_ccf %>% filter(ccf<0.6) %>% select(sample) %>%
samples_has_subclonal distinct(sample)
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- dt %>% filter(sample == x)
data <- NeoEnrichment::cal_nes_new_test(dt = data,
a sample_counts = 1000,
need_p = FALSE)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}
<- all_mut_ccf %>%
all_mut_ccf mutate(gene_protein_change=paste(Hugo_Symbol,Protein_Change,sep = "-"))
<- driver_mutations %>%
driver_mutations mutate(gene_protein_change=paste(gene,protein_change,sep = "-"))
<- all_mut_ccf %>%
all_mut_ccf mutate(is_driver=ifelse(gene_protein_change %in% driver_mutations$gene_protein_change,"yes","no"))
sum(all_mut_ccf$is_driver=="yes" & all_mut_ccf$neo=="yes") / sum(all_mut_ccf$neo=="yes")##0.01388538
%>% group_by(sample) %>%
all_mut_ccf summarise(inter_gene=intersect(Hugo_Symbol[neo=="yes"],
=="yes"])) -> aaa##701 samples
Hugo_Symbol[is_driver<- all_mut_ccf %>%
all_mut_ccf mutate(sample_neo_index=paste(sample,neo,Hugo_Symbol,sep = ","))
<- aaa %>% mutate(sample_neo_index=paste(sample,"yes",inter_gene,sep = ","))
aaa
%>%
all_mut_ccf mutate(in_aaa = ifelse(sample_neo_index %in% aaa$sample_neo_index,"yes","no")) %>%
group_by(sample) %>%
summarise(need_sample=ifelse(any(in_aaa=="yes"),"no","yes")) %>%
filter(need_sample=="yes") -> summ2
<- intersect(samples_has_subclonal$sample,summ2$sample)
need_samples %>%
all_mut_ccf filter(sample %in% need_samples) %>%
group_by(sample) %>%
summarise(c_n=sum(neo=="yes"),c_m=sum(neo=="no")) %>% filter(c_n>=1 & c_m >=1) -> summ
<- all_mut_ccf %>% filter(sample %in% summ$sample)
neo_missense <- neo_missense %>% select(sample,neo,ccf) %>% filter(!is.na(ccf))
neo_missense <- cal_nes_warp(neo_missense)
neo_nes median(neo_nes$es)##0.19的新抗原
saveRDS(neo_nes,file = "../data/neo_es_ccf_IC50_Rank2.rds")
###模拟
<- readRDS("../data/neo_es_ccf_IC50_Rank2.rds")
neo_nes <- readRDS("../data/all_mut_ccf_tpm_Rank2_IC50.rds")
all_mut_ccf <- all_mut_ccf %>%
all_mut_ccf rename(ccf=ccf_hat) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
<- all_mut_ccf %>% filter(sample %in% neo_nes$sample)
neo_missense <- neo_missense %>% select(sample,neo,ccf) %>% filter(!is.na(ccf))
neo_missense
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- dt %>% filter(sample == x)
data <- NeoEnrichment::cal_nes_new_test(dt = data,
a sample_counts = 1000,
need_p = FALSE)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- vector("list",2000)
res for (i in 1:2000){
<- neo_missense %>%
neo_missense_sim group_by(sample) %>%
mutate(neo_sim=sample(neo,length(neo)))
<- neo_missense_sim %>%
neo_missense_sim select(-neo) %>%
rename(neo=neo_sim)
<- cal_nes_warp(neo_missense_sim)
neo_nes_sim $sim_num <- i
neo_nes_sim<- neo_nes_sim
res[[i]] print(paste0("Complete ",i," sim. "))
}saveRDS(res,file = "../data/sim_2000_filter_driver_IC50_Rank2.rds")
<- readRDS("../data/sim_2000_filter_driver_IC50_Rank2.rds")
sim_2000 <- bind_rows(sim_2000)
sim_2000 $cancer <- get_cancer_type(sim_2000$sample,
sim_2000parallel = T,cores = 20)
saveRDS(sim_2000,file = "../../tmp/sim_2000_filter_driver_IC50_Rank2_all_ccf.rds")
##cal rna
<- readRDS("~/tmp/all_mut_tpm_not_filter_Rank2_IC50.rds")
all_mut_exp <- all_mut_exp %>% select(sample,tpm_exp,neo,chr,position,gene) %>% rename(exp=tpm_exp)
all_mut_exp %>%
all_mut_exp group_by(sample) %>%
summarise(n_a=sum(neo=="not_neo"),n_c=sum(neo=="neo")) %>%
filter(n_a>=1,n_c>=1) -> summ
<- readRDS("data/pancancer_mutation.rds") %>%
pancancer_mutation mutate(index=paste(Sample_ID,chrom,start,sep = ":")) %>%
select(index,Amino_Acid_Change)
<- left_join(
all_mut_exp %>% mutate(index=paste(sample,chr,position,sep = ":")),
all_mut_exp
pancancer_mutation
)<- all_mut_exp %>%
all_mut_exp mutate(gene_protein_change=paste(gene,Amino_Acid_Change,sep = "-"))
<- readRDS("~/Immunoediting/data/driver_mutations.rds")
driver_mutations <- driver_mutations %>%
driver_mutations mutate(gene_protein_change=paste(gene,protein_change,sep = "-"))
<- all_mut_exp %>%
all_mut_exp mutate(is_driver=ifelse(gene_protein_change %in% driver_mutations$gene_protein_change,"yes","no"))
sum(all_mut_exp$is_driver=="yes" & all_mut_exp$neo=="neo") / sum(all_mut_exp$neo=="neo")##0.01029878
%>% group_by(sample) %>%
all_mut_exp summarise(inter_gene=intersect(gene[neo=="neo"],
=="yes"])) -> aaa##1949 samples
gene[is_driver<- all_mut_exp %>%
all_mut_exp mutate(sample_neo_index=paste(sample,neo,gene,sep = ","))
<- aaa %>% mutate(sample_neo_index=paste(sample,"neo",inter_gene,sep = ","))
aaa
%>%
all_mut_exp mutate(in_aaa = ifelse(sample_neo_index %in% aaa$sample_neo_index,"yes","no")) %>%
group_by(sample) %>%
summarise(need_sample=ifelse(any(in_aaa=="yes"),"no","yes")) %>%
filter(need_sample=="no") -> summ2
<- all_mut_exp %>% filter(sample %in% summ$sample) %>%
neo_exp filter(!(sample %in% summ2$sample))
<- neo_exp %>%
neo_exp select(sample,neo,exp)
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- NeoEnrichment::cales_t(data = dt,barcode = x,type = "II",
a calp = TRUE,sample_counts = 1000,
cal_type = "exp")
#df <- data.frame(sample=x,es=a)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- cal_nes_warp(neo_exp)
nes_exp saveRDS(nes_exp,file = "../data/nes_exp_Rank2_IC50.rds")
##sim
<- readRDS("../data/nes_exp_Rank2_IC50.rds")
neo_nes <- readRDS("~/tmp/all_mut_tpm_not_filter_Rank2_IC50.rds")
all_mut_exp <- all_mut_exp %>%
all_mut_exp select(sample,tpm_exp,neo) %>%
rename(exp=tpm_exp)
<- all_mut_exp %>% filter(sample %in% neo_nes$sample)
neo_missense <- neo_missense %>% filter(!is.na(exp))
neo_missense
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 35),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- NeoEnrichment::cales_t(data = dt,barcode = x,type = "II",
a calp = FALSE,sample_counts = 1000,
cal_type = "exp")
<- data.frame(sample=x,es=a)
df return(df)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- vector("list",2000)
res for (i in 1:2000){
<- neo_missense %>%
neo_missense_sim group_by(sample) %>%
mutate(neo_sim=sample(neo,length(neo))) %>%
ungroup()
<- neo_missense_sim %>%
neo_missense_sim select(-neo) %>%
rename(neo=neo_sim)
<- cal_nes_warp(neo_missense_sim)
neo_nes_sim $sim_num <- i
neo_nes_sim<- neo_nes_sim
res[[i]] print(paste0("Complete ",i," sim. "))
}saveRDS(res,file = "../data/sim_2000_es_exp_Rank2_IC50.rds")
<- readRDS("../data/sim_2000_es_exp_Rank2_IC50.rds")
sim_2000 <- bind_rows(sim_2000)
sim_2000 $cancer <- get_cancer_type(sim_2000$sample,
sim_2000parallel = T,cores = 20)
saveRDS(sim_2000,file = "../../tmp/sim_2000_es_exp_Rank2_IC50_all.rds")
Then we can get the pan-cancer simulation p values:
<- readRDS("../data/neo_es_ccf_IC50_Rank2.rds")
neo_nes <- readRDS("../../tmp/sim_2000_filter_driver_IC50_Rank2_all_ccf.rds")
sim_all $cancer <- NeoEnrichment::get_cancer_type(neo_nes$sample)
neo_nes<- neo_nes %>%
neo_nes_summ group_by(cancer) %>% summarise(median_es=median(es))
%>%
sim_all group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ
<- WVPlots::ShadedDensity(frame = summ,
p xvar = "median_es",
threshold = median(neo_nes$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p<- p + labs(x="Simulation median es")+
p1 theme_prism()
##rna
<- readRDS("../data/nes_exp_Rank2_IC50.rds")
neo_nes <- readRDS("../../tmp/sim_2000_es_exp_Rank2_IC50_all.rds")
sim_all $cancer <- NeoEnrichment::get_cancer_type(neo_nes$sample)
neo_nes<- neo_nes %>%
neo_nes_summ group_by(cancer) %>% summarise(median_es=median(es))
%>%
sim_all group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ
<- WVPlots::ShadedDensity(frame = summ,
p xvar = "median_es",
threshold = median(neo_nes$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p<- p + labs(x="Simulation median es")+
p2 theme_prism()
/p2 p1
Under this new cutoff, a trend in ESCCF signal (ES=-0.013, P=0.054) can still be observed. The pan-cancer ESRNA signal (ES=-0.056, P<0.005) is still significant.
Another method for neoantigen prediction
We also used MHCfurry implemented in pVACtools to predict neoantigen (cutoff settings: IC50<50 and %Rank <0.5 and TPM >1, and 16% of mutations are predicted as neoantigen), and calculted corresponding ESccf and ESrna. The shell script of predicting neoantigen using pVACseq can be found in code/shell/pVACseq.sh
:
##process output
library(dplyr)
<- data.table::fread("/public/slst/home/wutao2/TCGA_pvacseq/out_files_names",data.table = F,header = F)
samples for (i in samples$V1){
<- data.table::fread(paste0("/public/slst/home/wutao2/TCGA_pvacseq/out_files/",i),
dt data.table = F)
<- dt %>% select(Chromosome,Start,Stop,Reference,
dt `Gene Name`,`HLA Allele`,
Variant,`MHCflurry MT Score`,`MHCflurry MT Percentile`,
`MHCnuggetsI MT Score`,`Sample Name`)
write.table(dt,file = paste0("/public/slst/home/wutao2/TCGA_pvacseq/out_reorganized/",i),
sep = "\t",col.names = F,row.names = F,quote = F)
}
##get neoantigen
library(dplyr)
<- list.files("/public/slst/home/wutao2/TCGA_pvacseq/out_reorganized",full.names = T)
files <- vector("list",length(files))
re
for (i in 1:length(files)){
<- data.table::fread(files[i],data.table = F)
mut <- mut %>% mutate(
mut index = paste(V1,V2,V3,V4,V5,sep = ":")
)<- mut %>%
neo_mhcfurry filter(V8<50 & V9 <0.5) %>%
distinct(index,.keep_all = T)
<- mut %>%
neo_mhcnuggets filter(V10 < 50) %>%
distinct(index,.keep_all = T)
<- mut %>%
mut distinct(index,.keep_all = T) %>%
mutate(neo_MHCfurry = ifelse(index %in% neo_mhcfurry$index , "neo","not_neo")) %>%
mutate(neo_MHCnuggets = ifelse(index %in% neo_mhcnuggets$index , "neo","not_neo"))
<- mut
re[[i]]
}
<- bind_rows(re)
all_mut saveRDS(all_mut,file = "/public/slst/home/wutao2/TCGA_pvacseq/res/all_mut_neo.rds")
##add tpm and ccf
colnames(all_mut)[1:2] <- c("gene","sample")
##add tpm
<- readRDS("~/data/tpm_trans.rds")
tpm <- tpm[!duplicated(tpm$gene),]
tpm $tpm_exp <- mapply(function(sample,gene){
all_mut$gene==gene,substr(sample,1,15)]
tpm[tpm$sample,all_mut$gene)
},all_mut<- all_mut %>%
all_mut1 filter(lengths(tpm_exp)!=0)
$tpm_exp <- as.numeric(all_mut1$tpm_exp)
all_mut1<- all_mut1 %>%
all_mut1 ::separate(col = index,sep = ":",into = c("chr","p","position","ref","alt"))
tidyr<- all_mut1 %>% select(-p)
all_mut1 saveRDS(all_mut1,file = "../data/another_methods/all_mut_neo_exp.rds")
<- readRDS("../data/all_mut_mis_ccf.rds")
results <- all_mut1 %>%
all_mut1 mutate(index=paste(sample,chr,position,ref,alt,sep = ":")) %>%
::rename(ref_allele=ref,alt_allele=alt)
dplyr
<- inner_join(
all_mut_ccf
all_mut1,%>% select(index,ccf_hat,Protein_Change),
results by="index"
%>% rename(ccf=ccf_hat) %>% distinct(index,.keep_all = T) %>% select(-index)
) saveRDS(all_mut_ccf,file = "../data/another_methods/all_mut_neo_ccf.rds")
##cal ccf
##cal es
##mhcfurry
<- readRDS("../data/another_methods/all_mut_neo_ccf.rds")
all_mut_ccf <- all_mut_ccf %>%
all_mut_ccf mutate(neo=ifelse(neo_MHCfurry=="neo" & tpm_exp>1,"yes","no"))
<- all_mut_ccf %>%
all_mut_ccf mutate(gene_protein_change=paste(gene,Protein_Change,sep = "-"))
<- readRDS("../data/driver_mutations.rds")
driver_mutations <- driver_mutations %>%
driver_mutations mutate(gene_protein_change=paste(gene,protein_change,sep = "-"))
<- all_mut_ccf %>%
all_mut_ccf mutate(is_driver=ifelse(gene_protein_change %in% driver_mutations$gene_protein_change,"yes","no"))
sum(all_mut_ccf$is_driver=="yes" & all_mut_ccf$neo=="yes") / sum(all_mut_ccf$neo=="yes")## 0.0134365
%>% group_by(sample) %>%
all_mut_ccf summarise(inter_gene=intersect(gene[neo=="yes"],
=="yes"])) -> aaa##1560 samples
gene[is_driver<- all_mut_ccf %>%
all_mut_ccf mutate(sample_neo_index=paste(sample,neo,gene,sep = ","))
<- aaa %>% mutate(sample_neo_index=paste(sample,"yes",inter_gene,sep = ","))
aaa
%>%
all_mut_ccf mutate(in_aaa = ifelse(sample_neo_index %in% aaa$sample_neo_index,"yes","no")) %>%
group_by(sample) %>%
summarise(need_sample=ifelse(any(in_aaa=="yes"),"no","yes")) %>%
filter(need_sample=="yes") -> summ2
<- all_mut_ccf %>%
samples_has_subclonal filter(ccf<0.6) %>%
select(sample) %>%
distinct(sample)
<- intersect(samples_has_subclonal$sample,summ2$sample)
need_samples %>%
all_mut_ccf filter(sample %in% need_samples) %>%
group_by(sample) %>%
summarise(c_n=sum(neo=="yes"),c_m=sum(neo=="no")) %>% filter(c_n>=1 & c_m >=1) -> summ
<- all_mut_ccf %>% filter(sample %in% summ$sample)
neo_missense <- neo_missense %>% select(sample,neo,ccf) %>% filter(!is.na(ccf))
neo_missense
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 40),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- dt %>% filter(sample == x)
data <- NeoEnrichment::cal_nes_new_test(dt = data,
a sample_counts = 1000,
need_p = FALSE)
return(a)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}
<- cal_nes_warp(neo_missense)
neo_nes saveRDS(neo_nes,file = "../data/another_methods/es_ccf_mhcfurry_filter_driver.rds")
saveRDS(neo_missense,file = "../data/another_methods/mut_cal_ccf.rds")
##sim
<- vector("list",2000)
res for (i in 1:2000){
<- neo_missense %>%
neo_missense_sim group_by(sample) %>%
mutate(neo_sim=sample(neo,length(neo))) %>%
ungroup()
<- neo_missense_sim %>%
neo_missense_sim select(-neo) %>%
rename(neo=neo_sim)
<- cal_nes_warp(neo_missense_sim)
neo_nes_sim $sim_num <- i
neo_nes_sim<- neo_nes_sim
res[[i]] print(paste0("Complete ",i," sim. "))
}saveRDS(res,file = "../data/sim_2000_es_ccf_mhcfurry.rds")
<- readRDS("../data/sim_2000_es_ccf_mhcfurry.rds")
sim_2000 <- bind_rows(sim_2000)
sim_2000 $cancer <- get_cancer_type(sim_2000$sample,
sim_2000parallel = T,cores = 20)
saveRDS(sim_2000,file = "../../tmp/sim_2000_filter_all_ccf_mhcfurry.rds")
##exp
<- readRDS("../data/another_methods/all_mut_neo_exp.rds")
all_mut_exp <- all_mut_exp %>%
all_mut_exp rename(neo=neo_MHCfurry) %>%
rename(exp=tpm_exp)
%>%
all_mut_exp group_by(sample) %>%
summarise(n_a=sum(neo=="not_neo"),n_c=sum(neo=="neo")) %>%
filter(n_a>=1,n_c>=1) -> summ
<- readRDS("../data/pancancer_mutation.rds") %>%
pancancer_mutation mutate(index=paste(Sample_ID,chrom,start,sep = ":")) %>%
select(index,Amino_Acid_Change)
<- left_join(
all_mut_exp %>% mutate(index=paste(sample,chr,position,sep = ":")),
all_mut_exp
pancancer_mutation
)<- all_mut_exp %>%
all_mut_exp mutate(gene_protein_change=paste(gene,Amino_Acid_Change,sep = "-"))
<- readRDS("../data/driver_mutations.rds")
driver_mutations <- driver_mutations %>%
driver_mutations mutate(gene_protein_change=paste(gene,protein_change,sep = "-"))
<- all_mut_exp %>%
all_mut_exp mutate(is_driver=ifelse(gene_protein_change %in% driver_mutations$gene_protein_change,"yes","no"))
sum(all_mut_exp$is_driver=="yes" & all_mut_exp$neo=="neo") / sum(all_mut_exp$neo=="neo")##0.006262794
%>% group_by(sample) %>%
all_mut_exp summarise(inter_gene=intersect(gene[neo=="neo"],
=="yes"])) -> aaa##1707 samples
gene[is_driver<- all_mut_exp %>%
all_mut_exp mutate(sample_neo_index=paste(sample,neo,gene,sep = ","))
<- aaa %>% mutate(sample_neo_index=paste(sample,"neo",inter_gene,sep = ","))
aaa
%>%
all_mut_exp mutate(in_aaa = ifelse(sample_neo_index %in% aaa$sample_neo_index,"yes","no")) %>%
group_by(sample) %>%
summarise(need_sample=ifelse(any(in_aaa=="yes"),"no","yes")) %>%
filter(need_sample=="no") -> summ2
<- all_mut_exp %>% filter(sample %in% summ$sample) %>%
neo_exp filter(!(sample %in% summ2$sample))
<- neo_exp %>%
neo_exp select(sample,neo,exp)
<- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- NeoEnrichment::cales_t(data = dt,barcode = x,type = "II",
a calp = FALSE,sample_counts = 1000,
cal_type = "exp")
<- data.frame(sample=x,es=a)
df return(df)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- cal_nes_warp(neo_exp)
nes_exp saveRDS(nes_exp,file = "../data/another_methods/es_exp_filter_driver_mhcfurry.rds")
saveRDS(neo_exp,file = "../data/another_methods/mut_call_exp_mhcfurry.rds")
##sim
<- readRDS("../data/another_methods/mut_call_exp_mhcfurry.rds")
neo_exp <- function(dt){
cal_nes_warp <- vector("list",length = length(unique(dt$sample)))
results_ccf names(results_ccf) <- unique(dt$sample)
<- makeCluster(getOption("cl.cores", 20),type="FORK")
cl <- parSapply(cl=cl,names(results_ccf),
results_ccf function(x){
<- NeoEnrichment::cales_t(data = dt,barcode = x,type = "II",
a calp = FALSE,sample_counts = 1000,
cal_type = "exp")
<- data.frame(sample=x,es=a)
df return(df)
simplify = FALSE)
},stopCluster(cl)
<- Filter(function(x){length(x)>1},results_ccf)
results_ccf <- bind_rows(results_ccf)
pancancer_nes_ccf return(pancancer_nes_ccf)
}<- vector("list",2000)
res for (i in 1:2000){
<- neo_exp %>%
neo_missense_sim group_by(sample) %>%
mutate(neo_sim=sample(neo,length(neo))) %>%
ungroup()
<- neo_missense_sim %>%
neo_missense_sim select(-neo) %>%
rename(neo=neo_sim)
<- cal_nes_warp(neo_missense_sim)
neo_nes_sim $sim_num <- i
neo_nes_sim<- neo_nes_sim
res[[i]] print(paste0("Complete ",i," sim. "))
}saveRDS(res,file = "data/sim_2000_filter_driver_exp_mhcfurry.rds")
<- readRDS("data/sim_2000_filter_driver_exp_mhcfurry.rds")
sim2000 <- bind_rows(sim2000)
sim2000 $cancer <- get_cancer_type(sim2000$sample,parallel = T,cores = 20)
sim2000saveRDS(sim2000,file = "../../tmp/sim_2000_filter_all_exp_mhcfurry.rds")
Then we can get the simulation p values for MHCfurry dataset:
##ccf mhcfurry
<- readRDS("../data/another_methods/es_ccf_mhcfurry_filter_driver.rds")
neo_nes <- readRDS("../../tmp/sim_2000_filter_all_ccf_mhcfurry.rds")
sim_all %>%
sim_all group_by(cancer,sim_num) %>%
summarise(median_es=median(es)) -> summ2
#> `summarise()` has grouped output by 'cancer'. You can override using the
#> `.groups` argument.
$cancer <- get_cancer_type(neo_nes$sample)
neo_nes<- neo_nes %>%
neo_nes_summ group_by(cancer) %>% summarise(median_es=median(es))
%>%
sim_all group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ
<- WVPlots::ShadedDensity(frame = summ,
p xvar = "median_es",
threshold = median(neo_nes$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p<- p + labs(x="Simulation median es")+
p1 theme_prism()
###exp mhcfurry
<- readRDS("../data/another_methods/es_exp_filter_driver_mhcfurry.rds")
neo_nes <- readRDS("../../tmp/sim_2000_filter_all_exp_mhcfurry.rds")
sim_all %>%
sim_all group_by(cancer,sim_num) %>%
summarise(median_es=median(es)) -> summ2
#> `summarise()` has grouped output by 'cancer'. You can override using the
#> `.groups` argument.
$cancer <- get_cancer_type(neo_nes$sample)
neo_nes<- neo_nes %>%
neo_nes_summ group_by(cancer) %>% summarise(median_es=median(es))
%>%
sim_all group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ
<- WVPlots::ShadedDensity(frame = summ,
p xvar = "median_es",
threshold = median(neo_nes$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p<- p + labs(x="Simulation median es")+
p2 theme_prism()
/p2 p1
With this new neoantigen prediction tool, we still see a trends of immunoediting signal (ESCCF=-0.017, P=0.015). Similarly a significant ESRNA signal can also been detected (ES=-0.029, P<0.005). This data further demonstrate the original conclusion of this study.
Immune escape analysis
We consider the following immune escape mechanisms:
- Suppress the transcription of genome alterations encoding high antigenicity (quantified as ESRNA);
- Antigen presentation pathway gene alterations;
- PD-L1 or CTLA-4 overexpression.
- HLA LOH
Antigen presentation pathway genes were selected based on the list of antigen processing and presentation machinery (APM) from the Gene Ontology Consortium (GO:0002474). Gene level non-silent mutation file was downloaded from UCSC Xena. If one sample has non-silent mutations in APM genes, then this sample was labled as escape by APM mutation.
Immune checkpoint gene overexpression was assessed using RNA-seq data. Normal expression values (in transcripts per million mapped reads (TPM)) of PD-L1 and CTLA-4 were established from the TCGA based on RNA-seq expression of the two proteins in normal samples. Checkpoint overexpression was called if either PD-L1 or CTLA-4 expression in the tumor was higher than the mean plus two standard deviations of normal expression.
The HLA LOH status data was obtained from Xiangyong Li et al. If at least one HLA allele is subject to loss by LOHHLA, then the sample is labled as HLA LOH.
library(tidyr)
####apm
<- readRDS("../data/ap_pathway.rds")
apm_gene <- data.table::fread("../data/mc3.v0.2.8.PUBLIC.nonsilentGene.xena",data.table = F)
non_slient_mutation <- apm_gene$V1
apm_gene which(!(apm_gene %in% non_slient_mutation$sample))
<- non_slient_mutation %>%
non_slient_mutation ::filter(sample %in% apm_gene)
dplyr<- apply(non_slient_mutation[,2:ncol(non_slient_mutation)],2,function(x){any(x==1)}) %>% as.data.frame()
check colnames(check) <- "apm_mut"
$sample <- rownames(check)
checksaveRDS(check,file = "../data/apm_mut_sample.rds")
###checkpoint over expression CD274 CTLA-4
<- readRDS("../data/tpm_trans.rds")
tpm_gene <- tpm_gene %>% filter(gene == "CD274") %>%
CD274 select(c(3:ncol(tpm_gene))) %>% t() %>% as.data.frame()
$sample <- rownames(CD274)
CD274colnames(CD274)[1] <- "PDL1"
<- tpm_gene %>% filter(gene == "CTLA4") %>%
CTLA4 select(c(3:ncol(tpm_gene))) %>% t() %>% as.data.frame()
$sample <- rownames(CTLA4)
CTLA4colnames(CTLA4)[1] <- "CTLA4"
<- left_join(CD274,CTLA4,by="sample")
check_point saveRDS(check_point,file = "../data/checkpoint_exp.rds")
table(substr(check_point$sample,14,15))
<- data.frame(sample=colnames(tpm_gene)[which(substr(colnames(tpm_gene),14,15)==11)])
normal <- left_join(normal,check_point,by="sample")
normal $cancer <- EasyBioinfo::get_cancer_type(normal$sample)
normal%>%
normal group_by(cancer) %>%
summarise(mean_pdl1 = mean(PDL1),
sd_pdl1 = sd(PDL1),
mean_ctla4 = mean(CTLA4),
sd_ctla4 = sd(CTLA4)) -> mean_sd
<- check_point %>%
cancer filter(as.numeric(substr(sample,14,15)) < 11)
$cancer <- EasyBioinfo::get_cancer_type(cancer$sample)
cancer<- left_join(cancer,mean_sd,by="cancer")
cancer <- na.omit(cancer)
cancer
<- cancer %>%
cancer mutate(PDL1_over = ifelse(PDL1 > (mean_pdl1+2*sd_pdl1),"yes","no"),
CTLA4_over = ifelse(CTLA4 > (mean_ctla4+2*sd_ctla4),"yes","no"))
saveRDS(cancer,file = "../data/checkpoint_over.rds")
<- readxl::read_xlsx("../data/TCGA_HLA_benchmark_20200810.xlsx",sheet = 2)
loh <- loh %>%
loh filter(LOH_cal == TRUE) %>%
mutate(LOH_status = ifelse(nchar(LossAllele)==1,"no","yes"))
table(loh$LOH_status)
<- loh %>% select(Sample_Barcode,LOH_status)
loh saveRDS(loh,file = "../data/HLA_loh.rds")
We can show the percentage of samples that have differnet escape mechanisms in each cancer type:
<- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds")
nes_ccf <- readRDS("../data/es_exp_filter_driver.rds")
nes_exp <- readRDS("../data/checkpoint_over.rds")
checkpoint_over <- readRDS("../data/apm_mut_sample.rds")
apm_mut_sample <- readRDS("../data/HLA_loh.rds")
LOH
<- apm_mut_sample %>%
apm_mut_sample mutate(sample=substr(sample,1,12))
<- checkpoint_over %>%
checkpoint_over mutate(sample=substr(sample,1,12)) %>%
mutate(over_exp=ifelse(PDL1_over == "yes" | CTLA4_over == "yes" ,"yes","no")) %>%
select(sample,cancer,over_exp) %>%
distinct_all(.keep_all = T)
<- checkpoint_over[!duplicated(checkpoint_over$sample),]
checkpoint_over <- nes_exp %>%
nes_exp mutate(sample=substr(sample,1,12))
<- LOH %>% rename(sample=Sample_Barcode)
LOH <- Reduce(left_join,list(apm_mut_sample,checkpoint_over,nes_exp,LOH))
all_escape <- na.omit(all_escape)
all_escape <- all_escape %>%
all_escape mutate(es_down=ifelse(es<0 & p_value<0.05,"yes","no"))
saveRDS(all_escape,file = "../data/escape_all.rds")
<- readRDS("../data/escape_all.rds")
all_escape <- all_escape %>%
dt group_by(cancer) %>%
summarise(`APM mutation` = mean(apm_mut == TRUE),
`Checkpoint over-expression` = mean(over_exp=="yes"),
`Rna downregulation` = mean(es_down=="yes"),
`LOH`=mean(LOH_status=="yes")) %>% as.data.frame()
rownames(dt) <- dt$cancer
<- dt %>% select(-cancer)
dt
library(ComplexHeatmap)
library(circlize)
#> ========================================
#> circlize version 0.4.13
#> CRAN page: https://cran.r-project.org/package=circlize
#> Github page: https://github.com/jokergoo/circlize
#> Documentation: https://jokergoo.github.io/circlize_book/book/
#>
#> If you use it in published research, please cite:
#> Gu, Z. circlize implements and enhances circular visualization
#> in R. Bioinformatics 2014.
#>
#> This message can be suppressed by:
#> suppressPackageStartupMessages(library(circlize))
#> ========================================
<- dt * 100
dt = colorRamp2(c(0, 50,100), c("green", "white", "red"))
col_fun Heatmap(dt, name = "Sample proportion (%)",
cell_fun = function(j, i, x, y, width, height, fill) {
grid.text(sprintf("%.1f", dt[i, j]), x, y, gp = gpar(fontsize = 10))
cluster_rows = F,cluster_columns = F)
},#> Warning: The input is a data frame, convert it to the matrix.
Then we can compare the ESccf between escape samples and no escape samples:
<- readRDS("../data/checkpoint_over.rds") %>%
checkpoint_over mutate(type=ifelse(PDL1_over=="yes" | CTLA4_over=="yes","yes","no"))
<- checkpoint_over %>% filter(type=="yes")
checkpoint_over_escape <- readRDS("../data/apm_mut_sample.rds")
apm_mut_sample <- apm_mut_sample %>% filter(apm_mut)
apm_mut_escape <- readRDS("../data/HLA_loh.rds")
LOH <- LOH %>% filter(LOH_status=="yes")
LOH_escape <- readRDS("../data/es_exp_filter_driver.rds")
nes_exp <- nes_exp %>% filter(es<0 & p_value<0.05)
es_escape
<- Reduce(union,list(substr(checkpoint_over_escape$sample,1,12),
escape_samples substr(apm_mut_escape$sample,1,12),
substr(LOH_escape$Sample_Barcode,1,12),
substr(es_escape$sample,1,12)))
<- nes_ccf %>%
es_ccf mutate(escape=ifelse(substr(sample,1,12) %in% escape_samples,"yes","no"))
table(es_ccf$escape)
#>
#> no yes
#> 1704 3591
$cancer <- get_cancer_type(es_ccf$sample)
es_ccf<- es_ccf %>% filter(escape=="yes")
escape <- es_ccf %>% filter(escape=="no")
no_escape ggplot(data=es_ccf,aes(x=escape,y=es))+
geom_boxplot()+
stat_compare_means()+
theme_prism()
We can get the simulation p value for pancancer and cancer types as previously described:
<- readRDS("../../tmp/sim_2000_not_filter_all_ccf.rds")
sim_all <- sim_all %>%
sim_escape filter(sample %in% escape$sample)
<- sim_all %>%
sim_no_escape filter(sample %in% no_escape$sample)
%>%
sim_escape group_by(cancer,sim_num) %>%
summarise(median_es=median(es)) -> summ1
#> `summarise()` has grouped output by 'cancer'. You can override using the
#> `.groups` argument.
<- escape %>%
escape_summ group_by(cancer) %>% summarise(median_es=median(es))
%>%
sim_escape group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ2
<- WVPlots::ShadedDensity(frame = summ2,
p xvar = "median_es",
threshold = median(escape$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p<- p + labs(x="Simulation median es")+
p1 theme_prism()+
labs(y="Density")
p1
<- no_escape %>%
noescape_summ group_by(cancer) %>% summarise(median_es=median(es))
%>%
sim_no_escape group_by(sim_num) %>%
summarise(median_es=median(es)) -> summ3
<- WVPlots::ShadedDensity(frame = summ3,
p xvar = "median_es",
threshold = median(no_escape$es),
title = "",
tail = "left")
$layers[[1]]$aes_params$colour <- "red"
p$layers[[1]]$aes_params$size <- 1
p$layers[[2]]$aes_params$fill <- "blue" #geom_ribbon
p$layers[[3]]$aes_params$colour <- "black"
p$layers[[3]]$aes_params$size <- 1
p<- p + labs(x="Simulation median es")+
p2 theme_prism()+
labs(y="Density")
p2
%>%
sim_no_escape group_by(cancer,sim_num) %>%
summarise(median_es=median(es)) -> summ4
#> `summarise()` has grouped output by 'cancer'. You can override using the
#> `.groups` argument.
<- noescape_summ %>%
noescape_summ rowwise() %>%
mutate(p=mean(summ4$median_es[summ4$cancer==cancer] <= median_es))
<- escape_summ %>%
escape_summ rowwise() %>%
mutate(p=mean(summ1$median_es[summ1$cancer==cancer] <= median_es))
<- escape %>% select(-p)
escape <- no_escape %>% select(-p)
no_escape <- get_f1(escape,pancancer_p = 0.13,dt2 = escape_summ,median_es = median(escape$es))
p3 #> Joining, by = "cancer"Joining, by = "cancer"
p3
<- get_f1(no_escape,pancancer_p = 0.034,dt2 = noescape_summ,median_es = median(no_escape$es))
p4 #> Joining, by = "cancer"Joining, by = "cancer"
p4
The result shows that immune elimination signal can be detected in samples without escape signal, while in samples with immunoediting escape mechanisms this signal is much weaker. ESCCF values of patients without immune escape mechanisms are significantly lower than patients with immune escape mechanisms. This data is consistent with our previous observation that there is anti-correlation between immunoediting elimination and escape signal.
CCF distribution analysis
As our method is relied on the distribution of CCF, we can show the CCF distribution of pancancer and all cancer types, plotting as well the CCF of neoantigen and non-neoantigens:
<- readRDS("../data/all_mut_ccf_tpm.rds") %>% as.data.frame()
all_mut_ccf <- all_mut_ccf %>%
all_mut_ccf rename(ccf=ccf_hat) %>%
mutate(neo=ifelse(neo=="neo","yes","no"))
<- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds")
es <- all_mut_ccf %>%
neo_missense filter(sample %in% es$sample) %>%
select(sample,neo,ccf) %>% filter(!is.na(ccf))
$cancer <- get_cancer_type(neo_missense$sample)
neo_missense
ggplot(data = neo_missense,aes(x=ccf,..scaled..,color=neo))+
geom_density(size=1)+
theme_bw()+
labs(y="Density",x="CCF")+
facet_wrap(. ~ cancer)
It is apparent that in many cancer types there are clear shift of CCF curves of neoantigenic mutations compared with non-neoantigenic mutations, and this difference highlight the immunoediting elimination signal quantified in this study.
Other immune cell infiltration quantification methods
Comparisons between TCGA cancer patients with detectable immunoediting-elimination or escape signal (ES<0, P<0.05) and the remaining patients in infiltration status quantified by using CD8 T cells puls NK cells, the ratio CD8/Treg and NK/Treg cells. We use two other methods: cibersort and quantiseq:
<- readRDS("../data/es_exp_filter_driver.rds")
es_exp <- readRDS("../data/neo_nes_ccf06_1_remove_driver_samples_addp.rds")
es_ccf <- readRDS("../data/pancancer_subtcells.rds")
pancancer_subtcells <- left_join(
exp_immune
es_exp,
pancancer_subtcells%>%
) mutate(escape=ifelse(es<0 & p_value <0.05,"yes","no")) %>% na.omit(.)
#> Joining, by = "sample"
<- left_join(
ccf_immune
es_ccf,
pancancer_subtcells%>%
) mutate(es_type=ifelse(es<0 & p <0.05,"yes","no")) %>% na.omit(.)
#> Joining, by = "sample"
<- ggplot(data=ccf_immune,aes(x=es_type,y=exp(CD8_T)/exp(iTreg+nTreg)))+
p1 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Elimination",y="CD8_T / Treg")
<- ggplot(data=ccf_immune,aes(x=es_type,y=exp(NK)/exp(iTreg+nTreg)))+
p2 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Elimination",y="NK / Treg")
<- ggplot(data=exp_immune,aes(x=escape,y=exp(CD8_T)/exp(iTreg+nTreg)))+
p3 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Escape",y="CD8_T / Treg")
<- ggplot(data=exp_immune,aes(x=escape,y=exp(NK)/exp(iTreg+nTreg)))+
p4 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Escape",y="NK / Treg")
+p2)/(p3+p4) (p1
<- data.table::fread("../data/infiltration_estimation_for_tcga.csv",check.names = F,data.table = F) %>%
immune rename(sample=cell_type)
<- immune %>%
cibersort select(sample,ends_with("CIBERSORT-ABS"))
colnames(immune)
#> [1] "sample"
#> [2] "B cell_TIMER"
#> [3] "T cell CD4+_TIMER"
#> [4] "T cell CD8+_TIMER"
#> [5] "Neutrophil_TIMER"
#> [6] "Macrophage_TIMER"
#> [7] "Myeloid dendritic cell_TIMER"
#> [8] "B cell naive_CIBERSORT"
#> [9] "B cell memory_CIBERSORT"
#> [10] "B cell plasma_CIBERSORT"
#> [11] "T cell CD8+_CIBERSORT"
#> [12] "T cell CD4+ naive_CIBERSORT"
#> [13] "T cell CD4+ memory resting_CIBERSORT"
#> [14] "T cell CD4+ memory activated_CIBERSORT"
#> [15] "T cell follicular helper_CIBERSORT"
#> [16] "T cell regulatory (Tregs)_CIBERSORT"
#> [17] "T cell gamma delta_CIBERSORT"
#> [18] "NK cell resting_CIBERSORT"
#> [19] "NK cell activated_CIBERSORT"
#> [20] "Monocyte_CIBERSORT"
#> [21] "Macrophage M0_CIBERSORT"
#> [22] "Macrophage M1_CIBERSORT"
#> [23] "Macrophage M2_CIBERSORT"
#> [24] "Myeloid dendritic cell resting_CIBERSORT"
#> [25] "Myeloid dendritic cell activated_CIBERSORT"
#> [26] "Mast cell activated_CIBERSORT"
#> [27] "Mast cell resting_CIBERSORT"
#> [28] "Eosinophil_CIBERSORT"
#> [29] "Neutrophil_CIBERSORT"
#> [30] "B cell naive_CIBERSORT-ABS"
#> [31] "B cell memory_CIBERSORT-ABS"
#> [32] "B cell plasma_CIBERSORT-ABS"
#> [33] "T cell CD8+_CIBERSORT-ABS"
#> [34] "T cell CD4+ naive_CIBERSORT-ABS"
#> [35] "T cell CD4+ memory resting_CIBERSORT-ABS"
#> [36] "T cell CD4+ memory activated_CIBERSORT-ABS"
#> [37] "T cell follicular helper_CIBERSORT-ABS"
#> [38] "T cell regulatory (Tregs)_CIBERSORT-ABS"
#> [39] "T cell gamma delta_CIBERSORT-ABS"
#> [40] "NK cell resting_CIBERSORT-ABS"
#> [41] "NK cell activated_CIBERSORT-ABS"
#> [42] "Monocyte_CIBERSORT-ABS"
#> [43] "Macrophage M0_CIBERSORT-ABS"
#> [44] "Macrophage M1_CIBERSORT-ABS"
#> [45] "Macrophage M2_CIBERSORT-ABS"
#> [46] "Myeloid dendritic cell resting_CIBERSORT-ABS"
#> [47] "Myeloid dendritic cell activated_CIBERSORT-ABS"
#> [48] "Mast cell activated_CIBERSORT-ABS"
#> [49] "Mast cell resting_CIBERSORT-ABS"
#> [50] "Eosinophil_CIBERSORT-ABS"
#> [51] "Neutrophil_CIBERSORT-ABS"
#> [52] "B cell_QUANTISEQ"
#> [53] "Macrophage M1_QUANTISEQ"
#> [54] "Macrophage M2_QUANTISEQ"
#> [55] "Monocyte_QUANTISEQ"
#> [56] "Neutrophil_QUANTISEQ"
#> [57] "NK cell_QUANTISEQ"
#> [58] "T cell CD4+ (non-regulatory)_QUANTISEQ"
#> [59] "T cell CD8+_QUANTISEQ"
#> [60] "T cell regulatory (Tregs)_QUANTISEQ"
#> [61] "Myeloid dendritic cell_QUANTISEQ"
#> [62] "uncharacterized cell_QUANTISEQ"
#> [63] "T cell_MCPCOUNTER"
#> [64] "T cell CD8+_MCPCOUNTER"
#> [65] "cytotoxicity score_MCPCOUNTER"
#> [66] "NK cell_MCPCOUNTER"
#> [67] "B cell_MCPCOUNTER"
#> [68] "Monocyte_MCPCOUNTER"
#> [69] "Macrophage/Monocyte_MCPCOUNTER"
#> [70] "Myeloid dendritic cell_MCPCOUNTER"
#> [71] "Neutrophil_MCPCOUNTER"
#> [72] "Endothelial cell_MCPCOUNTER"
#> [73] "Cancer associated fibroblast_MCPCOUNTER"
#> [74] "Myeloid dendritic cell activated_XCELL"
#> [75] "B cell_XCELL"
#> [ reached getOption("max.print") -- omitted 45 entries ]
<- function(ccf_es_dt,exp_es_dt,immune_dt,sample_len,cd8_nk_names,treg_names){
get_immune_plot <- left_join(
ccf_immune %>% mutate(sample=substr(sample,1,sample_len)),
ccf_es_dt
immune_dt%>%
) mutate(es_type=ifelse(es<0 & p <0.05,"yes","no"))
<- left_join(
exp_immune %>% mutate(sample=substr(sample,1,sample_len)),
exp_es_dt
immune_dt%>%
) mutate(es_type=ifelse(es<0 & p_value <0.05,"yes","no"))
<- ggplot(data=ccf_immune,aes(x=es_type,y=CD8_NK))+
p1 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Elimination",y=cd8_nk_names)
<- ggplot(data=ccf_immune,aes(x=es_type,y=Treg))+
p2 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Elimination",y=treg_names)
<- ggplot(data=exp_immune,aes(x=es_type,y=CD8_NK))+
p3 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Escape",y=cd8_nk_names)
<- ggplot(data=exp_immune,aes(x=es_type,y=Treg))+
p4 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Escape",y=treg_names)
<- ggplot(data=ccf_immune,aes(x=es_type,y=CD8_treg))+
p5 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Elimination",y="CD8/Treg")
<- ggplot(data=exp_immune,aes(x=es_type,y=CD8_treg))+
p6 geom_boxplot()+
stat_compare_means()+
theme_prism()+
labs(x="Escape",y="CD8/Treg")
<- list(p1,p2,p3,p4,p5,p6)
res return(res)
}
<- cibersort %>%
cibersort mutate(CD8_NK=`T cell CD8+_CIBERSORT-ABS`+`NK cell activated_CIBERSORT-ABS`,
Treg=`T cell regulatory (Tregs)_CIBERSORT-ABS`,
CD8_treg=exp(`T cell CD8+_CIBERSORT-ABS`)/exp(`T cell regulatory (Tregs)_CIBERSORT-ABS`))
<- get_immune_plot(es_ccf,es_exp,cibersort,sample_len = 15,cd8_nk_names = "CD8 T + NK",treg_names = "Treg")
res #> Joining, by = "sample"Joining, by = "sample"
1]] + res[[2]] + res[[3]] + res[[4]] + res[[5]] + res[[6]] + plot_layout(ncol = 2,nrow = 3)
res[[#> Warning: Removed 65 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 65 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 65 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 65 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 88 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 88 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 88 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 88 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 65 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 65 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 88 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 88 rows containing non-finite values (stat_compare_means).
<- immune %>%
quantiseq select(sample,ends_with("QUANTISEQ"))
colnames(quantiseq)
#> [1] "sample"
#> [2] "B cell_QUANTISEQ"
#> [3] "Macrophage M1_QUANTISEQ"
#> [4] "Macrophage M2_QUANTISEQ"
#> [5] "Monocyte_QUANTISEQ"
#> [6] "Neutrophil_QUANTISEQ"
#> [7] "NK cell_QUANTISEQ"
#> [8] "T cell CD4+ (non-regulatory)_QUANTISEQ"
#> [9] "T cell CD8+_QUANTISEQ"
#> [10] "T cell regulatory (Tregs)_QUANTISEQ"
#> [11] "Myeloid dendritic cell_QUANTISEQ"
#> [12] "uncharacterized cell_QUANTISEQ"
<- quantiseq %>%
quantiseq mutate(CD8_NK=`T cell CD8+_QUANTISEQ`+`NK cell_QUANTISEQ`,
Treg=`T cell regulatory (Tregs)_QUANTISEQ`,
CD8_treg=exp(`T cell CD8+_QUANTISEQ`)/exp(`T cell regulatory (Tregs)_QUANTISEQ`))
<- get_immune_plot(es_ccf,es_exp,quantiseq,sample_len = 15,cd8_nk_names = "CD8 T + NK",treg_names = "Treg")
res #> Joining, by = "sample"Joining, by = "sample"
1]] + res[[2]] + res[[3]] + res[[4]] + res[[5]] + res[[6]] + plot_layout(ncol = 2,nrow = 3)
res[[#> Warning: Removed 65 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 65 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 65 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 65 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 88 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 88 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 88 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 88 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 65 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 65 rows containing non-finite values (stat_compare_means).
#> Warning: Removed 88 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 88 rows containing non-finite values (stat_compare_means).
Comparison with other biomarkers of ICI
The performance of ESCCF has been compared with 15 biomarkers reported to have significant association with immune checkpoint inhibitor (ICI) response, including tumor mutation burden (TMB), clonal TMB, indel mutation burden, burden of indels escaping nonsense mediated decay (NMD), SERPINB3 mutations, CD274 (PD-L1) expression, CD38 expression, CD8A expression, CXCL13 expression, CXCL9 expression, T cell inflamed gene expression signature, IMPRES, CD8 T effector from the POPLAR trial, cytolytic score, and UV signature. The detail of calculation of these biomarkers can refer to the methods part of our manuscript or corresponding reference:
####Indel mutation
<- data.table::fread("data/Immunotherapy/liu_indel_counts",data.table = F)
liu_indel <- liu_indel %>%
liu_indel mutate(sample=gsub("_pass.vcf","",V1)) %>%
mutate(sample=paste0("liu_",sample)) %>%
mutate(indel_counts=V2+V3) %>%
select(sample,indel_counts)
<- data.table::fread("data/Immunotherapy/nadeem_indel_counts",data.table = F)
nadeem_indel <- nadeem_indel %>%
nadeem_indel mutate(sample=gsub("_pass.vcf","",V1)) %>%
mutate(sample=paste0("nadeem_",sample)) %>%
mutate(indel_counts=V2+V3) %>%
select(sample,indel_counts)
<- data.table::fread("data/Immunotherapy/willy_hugo_indel_counts",data.table = F)
willy_indel <- willy_indel %>%
willy_indel mutate(sample=gsub("_pass.vcf","",V1)) %>%
mutate(sample=paste0("willy_",sample)) %>%
mutate(indel_counts=V2+V3) %>%
select(sample,indel_counts)
<- bind_rows(liu_indel,nadeem_indel,willy_indel)
all_indel saveRDS(all_indel,file = "../data/Immunotherapy/all_indel_counts.rds")
##NMD-escape indel TMB
<- readRDS("../data/Immunotherapy/nes_immunetherapy.rds")
neo_nes
<- data.table::fread("~/tmp/hg38_NMDetectiveA_Lindeboom_et_al.v2.gtf",data.table = F)
gtf <- gtf %>% rename(start=V4,end=V5)
gtf <- data.table::setDT(gtf)
gtf ::setkey(gtf, V1,start, end)
data.table
$escape_nmd_indels <- NA
neo_nesfor (i in 1:nrow(neo_nes)){
<- gsub("_.+","",neo_nes$sample[i]) %>% stringr::str_to_title(.)
cohort <- gsub(".+_","",neo_nes$sample[i])
pt <- data.table::fread(paste0("~/immune_thearpy/",cohort,"_indel/",pt),data.table = F,sep = "\t")
t_s <- t_s %>%
t_s mutate(type=ifelse(nchar(V6)<nchar(V7),"In","Del")) %>%
mutate(start=V2+1,
end=ifelse(type=="Del",V3+1,V2+1+(nchar(V7)-1))) %>%
select(V1,start,end)
<- data.table::setDT(t_s)
t_s ::foverlaps(t_s, gtf, type="any") -> a
data.table$is_nmd <- NA
afor (j in 1:nrow(a)){
<- strsplit(a$V9[j],split = ";")
b <- gsub(" nmd_score ","",b[[1]][length(b[[1]])]) %>% strsplit(.,split = ",") %>% unlist() %>% as.numeric()
nmd_score $is_nmd[j] <- ifelse(sum(nmd_score>0.52,na.rm = T) >= 1,"yes","no")
a
}$index <- paste(a$V1,a$i.start,a$i.end,sep = ":")
a<- a %>%
a group_by(index) %>%
summarise(NMD_escape=ifelse(all(is_nmd=="no"),"yes","no"))
$escape_nmd_indels[i] <- sum(a$NMD_escape=="yes")
neo_nes
}saveRDS(neo_nes,file = "../data/Immunotherapy/nes_ici_escapeNMD_indel.rds")
###TPM
<- readRDS("~/immune_thearpy/TPM/liu/RNA_seq_TPM.rds")
RNA_seq_TPM rownames(RNA_seq_TPM) <- RNA_seq_TPM[,1]
<- RNA_seq_TPM[,-1]
RNA_seq_TPM <- as.data.frame(t(RNA_seq_TPM))
liu_tpm $gene <- rownames(liu_tpm)
liu_tpmcolnames(liu_tpm)[1:121] <- paste0("liu_",colnames(liu_tpm)[1:121])
saveRDS(liu_tpm,file = "../data/Immunotherapy/liu_tpm.rds")
<- data.table::fread("~/immune_thearpy/TPM/gencode.v29.annotation.gtf",data.table = F)
anno <- anno %>% select(V9) %>% distinct(V9)
anno <- anno %>%
anno rowwise() %>%
mutate(gene_id=strsplit(V9,split = ";")[[1]] %>%
grep("gene_id",.,value = T) %>%
strsplit(.,split = "\"") %>% `[[`(1) %>% `[`(2) %>% gsub("[.].+","",.))
<- anno %>%
anno distinct(gene_id,.keep_all = T)
<- anno %>%
anno rowwise() %>%
mutate(gene_name=strsplit(V9,split = ";")[[1]] %>%
grep("gene_name",.,value = T) %>%
strsplit(.,split = "\"") %>% `[[`(1) %>% `[`(2) %>% gsub("[.].+","",.))
<- anno %>% select(gene_id,gene_name) %>% distinct_all()
anno saveRDS(anno,file = "../tmp/ici_gene_anno.rds")
<- function(path,cohort){
get_tpm <- list.files(path = path)
files <- data.table::fread(paste0(path,files[1]),data.table = F) %>%
dt rename(gene_id=V1,tpm=V2)
<- left_join(
dt
dt,anno%>% na.omit() %>% select(gene_name,tpm) %>% distinct(gene_name,.keep_all = T)
) colnames(dt)[2] <- paste0(cohort,"_",gsub(".tsv","",files[1]))
for (i in 2:length(files)){
<- data.table::fread(paste0(path,files[i]),data.table = F) %>%
dt1 rename(gene_id=V1,tpm=V2)
<- left_join(
dt1
dt1,anno%>% na.omit() %>% select(gene_name,tpm) %>% distinct(gene_name,.keep_all = T)
) colnames(dt1)[2] <- paste0(cohort,"_",gsub(".tsv","",files[i]))
<- left_join(dt,dt1,by = "gene_name")
dt
}#re_tpm <- bind_rows(res)
return(dt)
}
<- get_tpm("~/immune_thearpy/TPM/willy/",cohort = "willy")
willy_tpm saveRDS(willy_tpm,file = "../data/Immunotherapy/willy_tpm.rds")
<- get_tpm("~/immune_thearpy/TPM/nadeem/",cohort = "nadeem")
nadeem_tpm saveRDS(nadeem_tpm,file = "../data/Immunotherapy/nadeem_tpm.rds")
<- Reduce(intersect,list(liu_tpm$gene,willy_tpm$gene_name,nadeem_tpm$gene_name))
all_gene <- left_join(
all_tpm %>% filter(gene %in% all_gene),
liu_tpm %>% filter(gene_name %in% all_gene) %>% rename(gene=gene_name)
willy_tpm %>% left_join(.,nadeem_tpm %>% filter(gene_name %in% all_gene) %>% rename(gene=gene_name)) %>% select(gene,everything())
)
saveRDS(all_tpm,file = "../data/Immunotherapy/all_tpm_ici.rds")
###cal_cyt
<- readRDS("../data/Immunotherapy/all_tpm_ici.rds")
all_tpm = function(x, na.rm=TRUE, zero.propagate = FALSE){
gm_mean if(any(x < 0, na.rm = TRUE)){
return(NaN)
}if(zero.propagate){
if(any(x == 0, na.rm = TRUE)){
return(0)
}exp(mean(log(x), na.rm = na.rm))
else {
} exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}
}<- all_tpm %>%
two_gene filter(gene %in% c("GZMA","PRF1"))
%>%
two_gene select(-gene) %>%
t() %>% as.data.frame() -> two_gene
$sample <- rownames(two_gene)
two_gene$cyt <- apply(two_gene[,1:2],1,gm_mean)
two_gene<- two_gene %>%
two_gene select(sample,cyt)
saveRDS(two_gene,file = "../data/Immunotherapy/cyt_ici.rds")
###Ayers gene exp signature
c("CD3D","IDO1","CIITA","CD3E",
"GZMK","CD2","HLA-DRA","CXCL13",
"IL2RG","NKG7","HLA-E","CXCR6","LAG3","TAGAP","CXCL10","STAT1","GZMB") %in% all_tpm$gene
<- all_tpm %>%
ayer_gene filter(gene %in% c("CD3D","IDO1","CIITA","CD3E",
"GZMK","CD2","HLA-DRA","CXCL13",
"IL2RG","NKG7","HLA-E","CXCR6","LAG3","TAGAP","CXCL10","STAT1","GZMB"))
rownames(ayer_gene) <- ayer_gene$gene
%>%
ayer_gene select(-gene) %>%
t() %>% as.data.frame() -> ayer_gene
$sample <- rownames(ayer_gene)
ayer_gene$ayer_score <- apply(ayer_gene[,1:17],1,mean)
ayer_gene<- ayer_gene %>%
ayer_gene select(sample,ayer_score)
saveRDS(ayer_gene,file = "../data/Immunotherapy/ayer_score_ici.rds")
###IMPRES
<- data.frame(sample=colnames(all_tpm)[2:ncol(all_tpm)],
dt PDCD1_TNFSF4=NA,CD27_PDCD1=NA,CTLA4_TNFSF4=NA,
CD40_CD28=NA, CD86_TNFSF4=NA, CD28_CD86=NA, CD80_TNFSF9=NA,
CD274_VSIR=NA, CD86_HAVCR2=NA, CD40_PDCD1=NA, CD86_CD200=NA, CD40_CD80=NA,
CD28_CD276=NA, CD40_CD274=NA,TNFRSF14_CD86=NA)
for (i in dt$sample){
for(j in colnames(dt)[2:ncol(dt)]){
<- strsplit(j,split = "_")[[1]]
genes <- all_tpm %>%
df filter(gene %in% genes) %>%
select(i,gene)
$sample==i,j] <- ifelse(df[df$gene==genes[1],1] < df[df$gene==genes[2],1],1,0)
dt[dt
}
}$IMPRES <- apply(dt[,2:ncol(dt)],1,sum)
dt<- dt %>% select(sample,IMPRES)
dt saveRDS(dt,file = "../data/Immunotherapy/IMPRES_ici.rds")
##POPLAR
<- all_tpm %>%
POPLAR_gene filter(gene %in% c("CD8A", "GZMA", "GZMB","IFNG", "EOMES", "CXCL9", "CXCL10", "TBX21"))
rownames(POPLAR_gene) <- POPLAR_gene$gene
%>%
POPLAR_gene select(-gene) %>%
t() %>% as.data.frame() -> POPLAR_gene
$sample <- rownames(POPLAR_gene)
POPLAR_gene$POPLAR_score <- apply(POPLAR_gene[,1:8],1,mean)
POPLAR_gene<- POPLAR_gene %>%
POPLAR_gene select(sample,POPLAR_score)
saveRDS(POPLAR_gene,file = "../data/Immunotherapy/POPLAR_score_ici.rds")
##some gene
<- all_tpm %>%
gene_exp filter(gene %in% c("CXCL9","CD8A","CD274","CD38","CXCL13","PDCD1"))
rownames(gene_exp) <- gene_exp$gene
%>%
gene_exp select(-gene) %>%
t() %>% as.data.frame() -> gene_exp
$sample <- rownames(gene_exp)
gene_expsaveRDS(gene_exp,file = "../data/Immunotherapy/need_gene_exp_ici.rds")
##bind all
<- readRDS("../data/Immunotherapy/all_clinical.rds")
all_clinical <- readRDS("../data/Immunotherapy/nes_immunetherapy.rds")
neo_nes <- readRDS("../../tmp/all_mut_ici_withrefalt.rds")
all_mut_ici <- left_join(neo_nes,all_clinical,by="sample")
neo_nes <- neo_nes %>% filter(!is.na(response2))
neo_nes <- readRDS("../data/Immunotherapy/all_ccf.rds") %>%
all_ccf filter(!is.na(cancer_cell_frac))
#clonal TMB
<- readRDS("../data/Immunotherapy/all_mut_ccf_ici.rds")
all_mut_ccf_ici %>%
all_mut_ccf_ici group_by(sample) %>%
summarise(all_tmb=n()/38,clonal_tmb=sum(cancer_cell_frac>=0.9)/38) -> tmb
<- left_join(neo_nes,tmb)
neo_nes
##indel tmb
<- readRDS("../data/Immunotherapy/all_indel_counts.rds")
all_indel_counts <- left_join(neo_nes,all_indel_counts)
neo_nes
##escape NMD indel TMB
<- readRDS("../data/Immunotherapy/nes_ici_escapeNMD_indel.rds")
nes_ici_escapeNMD_indel <- left_join(neo_nes,nes_ici_escapeNMD_indel %>% select(-es))
neo_nes
##signature
#smoke sig4
<- readRDS("../data/sig_ici_need.rds")
sigs <- sigs %>%
sigs ::rename(smoke=Signature.4,UV=Signature.7) %>%
dplyrmutate(APOBEC=Signature.2+Signature.13) %>%
::select(-Signature.13,-Signature.2)
dplyr<- left_join(
neo_nes
neo_nes,
sigs
)
###SERPINB3 mut
<- all_mut_ici %>%
SERPINB3_mut group_by(sample) %>%
summarise(SERPINB3_mut=ifelse("SERPINB3" %in% gene,"yes","no"))
<- left_join(
neo_nes
neo_nes,
SERPINB3_mut
)
##gene exp
<- readRDS("../data/Immunotherapy/need_gene_exp_ici.rds")
need_gene_exp_ici <- left_join(
neo_nes
neo_nes,
need_gene_exp_ici
)
###gene exp signature
#Ayers IMPRES POPLAR CYT
<- readRDS("../data/Immunotherapy/ayer_score_ici.rds")
ayer_score_ici <- readRDS("../data/Immunotherapy/IMPRES_ici.rds")
IMPRES_ici <- readRDS("../data/Immunotherapy/POPLAR_score_ici.rds")
POPLAR_score_ici <- readRDS("../data/Immunotherapy/cyt_ici.rds")
cyt_ici
<- left_join(
neo_nes
neo_nes,ayer_score_ici%>% left_join(.,IMPRES_ici) %>% left_join(.,POPLAR_score_ici) %>%
) left_join(.,cyt_ici)
colnames(neo_nes)
##add rna
<- readRDS("../data/Immunotherapy/nes_immunetherapy_exp.rds")
neo_nes_exp <- left_join(
neo_nes
neo_nes,%>% select(sample,es) %>% rename(es_exp=es)
neo_nes_exp
)
saveRDS(neo_nes,file = "../data/Immunotherapy/all_factors_ici.rds")
Univariate Cox regression analysis was performed to estimate the hazard ratio (HR) associated with different factors:
<- readRDS("../data/Immunotherapy/all_factors_ici.rds")
neo_nes show_forest(neo_nes,covariates = c("es","es_exp",colnames(neo_nes)[7:10],
colnames(neo_nes)[14:19],
colnames(neo_nes)[21:24],"UV"),
time = "OS.time",status = "OS",merge_models = T)
#> => Processing variable es
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable es_exp
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable all_tmb
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable clonal_tmb
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable indel_counts
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable escape_nmd_indels
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable SERPINB3_mut
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable CD274
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable CD38
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable CD8A
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable CXCL13
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable CXCL9
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable ayer_score
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable IMPRES
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable POPLAR_score
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable cyt
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
#> => Processing variable UV
#> ==> Building Surv object...
#> ==> Building Cox model...
#> ==> Done.
In this melanoma dataset, only ESCCF and UV signature show significant HR. These new data further validate the performance of ESCCF in immune checkpoint inhibitor therapy clinical response prediction.