Ziyu Tao1-3* Shixiang Wang1-3* Chenxu Wu1-3* Huimin Li1 Tao Wu1 Xiangyu Zhao1 Wei Ning1 Guangshuai Wang1 Xue-Song Liu1

1 School of Life Science and Technology, ShanghaiTech University, Shanghai 201203, China
2 Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
3 University of Chinese Academy of Sciences, Beijing, China

Abstract

DNA alteration signatures are recurring patterns that are the imprints of mutagenic processes accumulated during the evolution of cancer cell. Despite the importance of copy number alteration (CNA) in cancer progression,comprehensive understanding about the mutational processes and signatures of CNA is still lacking. Here we developed a method to categorize CNA based on various fragment properties, which reflect the consequences of mutagenic processes and can be extracted from different types of data, including whole genome sequencing (WGS), whole exome sequencing (WES) and SNP array. The signature of CNA has been extracted from 2778 pan-cancer analysis of whole genomes (PCAWG) WGS samples, and further validated using 10851 the cancer genome atlas (TCGA) SNP array dataset. Novel copy number signatures associated with haploid chromosome have been identified. The activities of some copy number signatures consistently predict cancer patients’ prognosis. This study provides a repertoire for understanding the signatures and mutational processes of CNA.

Methodology

CNA classification strategy for signature analysis.

Figure 1: CNA classification strategy for signature analysis.

For each CNA segment, the following features have been considered: 1, segment context, including segment shape and copy number change number; 2, Absolute copy number; 3, LOH status; 4, segment size. In total 176 types of CNA segments have been defined accordingly.

De novo CNA signature extraction with sigprofiler.

Figure 2: De novo CNA signature extraction with sigprofiler.

We selected the signature extraction solution as the maximum signature number which meets the following criteria: 1. No over fit. 2. Stability should be at least local maximal. 3. Mean cosine distance should be as small as possible. Based on the rules, we selected 14 signatures for PCAWG copy number data.

Results

Proportion of tumors with the signature and the median activity of the signature are shown for 32 PCAWG cancer types. For each individual tumor, only signatures that contribute to >=5% of the total are counted.

Figure 3: Proportion of tumors with the signature and the median activity of the signature are shown for 32 PCAWG cancer types. For each individual tumor, only signatures that contribute to >=5% of the total are counted.

Representative CNA profile, prominent features and potential mechanisms for each identified CNA signatures extracted in PCAWG dataset.

Figure 4: Representative CNA profile, prominent features and potential mechanisms for each identified CNA signatures extracted in PCAWG dataset.

An important purpose of CNA signature analysis is to identify the underlying mutational processes for CNA, and classify CNA based on mutational processes. Potential mutational processes for CNA include intrinsic inducers and extrinsic inducers. Intrinsic CNA inducers include: double-strand break repair defect (HRD etc.); cell cycle defect; DNA replication defects; telomere loss, etc.

CNA signature activity and cancer patients’ prognosis.

Figure 5: CNA signature activity and cancer patients’ prognosis.

The CNA signatures extracted from cancer patients could be cancer prognosis biomarkers.In pan-cancer level, activities of CNS11, CNS12 are significantly associated with poor OS.

Associations between the relative activities of CNA signatures and tumor stages.

Figure 6: Associations between the relative activities of CNA signatures and tumor stages.

Conclusion

Here we developed a unified and comprehensive method for copy number signature analysis. Our method can be applied in cancer patients with copy number profiles generated with WGS, WES or SNP array data. Our copy number signature analysis method is based on a novel and comprehensive method to catalog copy number segments. New CNA patterns have been identified, and the activities of some CNA signature have been demonstrated to have prognostic power in pan-cancer datasets, suggesting copy number signatures could be biomarkers to guide cancer precision medicine.

More

You can scan QR code at bottom center to see online analysis report. All code and related data are published at https://github.com/XSLiuLab/Pan-cancer_CNA_signature.

Acknowledgments

We thank ShanghaiTech University High Performance Computing Public Service Platform for computing services. This work was supported by The National Natural Science Foundation of China (31771373), Shanghai Science and Technology Commission (21ZR1442400), and startup funding from ShanghaiTech University.

©Cancer Biology Group 2021

Research group led by Xue-Song Liu in ShanghaiTech. University. Lab website is shown in QR code at bottom right.

The repertoire of copy number alteration signatures in human cancer