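Contents:
Research Tools
Reference Management
- EndNote - available for download from the ShanghaiTech Library and Information Center
- Zotero + Nutstore (坚果云) - highly recommended!
- For setup, see the section "How do I set up a WebDAV connection to Nutstore in Zotero?" under Recommended Tools - Nutstore.
- For an introduction to Zotero, see "Reference Manager Zotero: Basic and Advanced Demonstration", which also covers using it together with Dropbox, a cloud drive popular outside China.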
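Academic Search
Literature Tracking
Document Writing
- Typora - convenient Markdown writing with export to many formats
- LaTeX - tutorials are available from the ShanghaiTech Library and Information Center
- Word
Unrestricted Internet Access
Figure Creation and Polishing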
- Adobe Photoshop
- Adobe Illustrator
- PPT
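All three programs are available in the ShanghaiTech Library and Information Center software repository.
Programming lets you draw figures quickly, with full customization, and in batches, but the output sometimes does not meet a journal's format requirements or still needs layout and polishing. The three programs above cover those needs.
Note that Illustrator (AI) produces vector graphics, Photoshop produces bitmap (raster) images, and PPT supports both.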
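To make the vector/bitmap distinction above concrete, here is a minimal sketch using only the Python standard library. The function names (`circle_svg`, `circle_bitmap`, `to_pbm`) are hypothetical helpers written for this illustration, and they are not part of Illustrator, Photoshop, or PPT. The same circle is described once as a shape (SVG, a vector format) and once as a grid of pixels (PBM, a simple bitmap format).

```python
def circle_svg(size, radius):
    """Vector form: the circle is stored as a shape description.

    The drawing lives in a fixed 100 x 100 coordinate system (viewBox),
    so changing the output size does not change the description itself.
    """
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}" viewBox="0 0 100 100">'
        f'<circle cx="50" cy="50" r="{radius}" fill="black"/></svg>'
    )


def circle_bitmap(size, radius):
    """Bitmap form: the same circle sampled onto a size x size pixel grid."""
    scale = 100 / size  # width of one pixel in drawing units
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            # Distance of the pixel centre from the circle centre (50, 50).
            dx = (x + 0.5) * scale - 50
            dy = (y + 0.5) * scale - 50
            row.append(1 if dx * dx + dy * dy <= radius * radius else 0)
        rows.append(row)
    return rows


def to_pbm(rows):
    """Serialize a 0/1 pixel grid as a plain-text PBM (P1) image."""
    header = f"P1\n{len(rows[0])} {len(rows)}\n"
    body = "\n".join(" ".join(str(p) for p in row) for row in rows)
    return header + body + "\n"


if __name__ == "__main__":
    for size in (10, 100):
        svg = circle_svg(size, 40)
        pixels = circle_bitmap(size, 40)
        print(size, len(svg), sum(len(r) for r in pixels))
```

The SVG text stays almost the same length at any output size, while the pixel count grows with the square of the resolution. This is why vector figures can be scaled for print without losing sharpness.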
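Recommended tutorials:
- 《生命科学插图从入门到精通 —— Adobe Illustrator使用技巧》 ("Life Science Illustration from Beginner to Expert: Adobe Illustrator Techniques"; the lab owns a copy)
- 《史上最全PPT科研作图教程及素材(第二版)》 ("The Most Complete PPT Tutorial and Materials for Scientific Figures, 2nd Edition")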
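Data Analysis and Bioinformatics Software (Packages)
The software (packages) below are grouped by function. Basic toolkits typically provide low-level computation and support for basic data analysis operations.
Basic Tools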
- R - R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
- Bioconductor - Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development. It has two releases each year, and an active user community.
- Python - Python is a programming language that lets you work quickly and integrate systems more effectively.
- IPython - IPython provides a rich architecture for interactive computing with:
- A powerful interactive shell.
- A kernel for Jupyter.
- Support for interactive data visualization and use of GUI toolkits.
- Flexible, embeddable interpreters to load into your own projects.
- Easy to use, high performance tools for parallel computing.
- Biopython - Biopython is a set of freely available tools for biological computation written in Python by an international team of developers.
- NumPy - NumPy is the fundamental package for scientific computing with Python. It contains among other things:
- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities
- Pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
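The NumPy capabilities listed above (N-dimensional arrays, broadcasting, linear algebra, Fourier transforms, and random numbers) can be sketched in a few lines. This is only an illustrative example that assumes NumPy is installed. The array values are made up for the demonstration.

```python
import numpy as np

# N-dimensional array object: a 2 x 3 matrix of floats.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Broadcasting: the column means have shape (3,) and are subtracted
# from every row of the (2, 3) array without an explicit loop.
centered = a - a.mean(axis=0)

# Linear algebra: solve the system m @ x = b.
m = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(m, b)

# Fourier transform: a constant signal has all its energy at frequency 0.
spectrum = np.fft.fft(np.ones(4))

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
sample = rng.normal(size=5)

if __name__ == "__main__":
    print(centered)
    print(x)
    print(spectrum)
    print(sample)
```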
Data Download
- UCSCXenaTools - UCSCXenaTools is an R package to download and explore data from UCSC Xena data hubs.
- TCGAbiolinks - An R/Bioconductor package for integrative analysis with TCGA data.
- GEOquery - Get data from NCBI Gene Expression Omnibus (GEO).
Data Analysis
TCGA
- TCGAbiolinks - An R/Bioconductor package for integrative analysis with TCGA data.
Clustering
- Clusternomics - Integrative clustering for heterogeneous biomedical datasets.
Heterogeneity and Clonal Evolution
- sciclone - An R package for inferring the subclonal architecture of tumors
- pyclone - Probabilistic model for inferring clonal population structure from deep NGS sequencing
Plotting and Interaction
Web App
Databases
Basic Databases
- NCBI
- EGA - The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects.
- GEO
- ENCODE - Encyclopedia of DNA Elements.
- GENCODE - The goal of the GENCODE project is to identify and classify all gene features in the human and mouse genomes with high accuracy based on biological evidence, and to release these annotations for the benefit of biomedical research and genome interpretation.
- Roadmap - The NIH Roadmap Epigenomics Mapping Consortium was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The Consortium leverages experimental pipelines built around next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts in stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.
- IGSR: The International Genome Sample Resource - Providing ongoing support for the 1000 Genomes Project data.
- Ensembl - Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.
- IEDB - The Immune Epitope Database (IEDB) is a freely available resource funded by NIAID. It catalogs experimental data on antibody and T cell epitopes studied in humans, non-human primates, and other animal species in the context of infectious disease, allergy, autoimmunity and transplantation. The IEDB also hosts tools to assist in the prediction and analysis of epitopes.
- UniProt - The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
- OMIM - An Online Catalog of Human Genes and Genetic Disorders.
- Synapse - Synapse is a collaborative, open-source research platform that allows teams to share data, track analyses, and collaborate.
Cancer Databases
- COSMIC - the Catalogue Of Somatic Mutations In Cancer, is the world’s largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
- CCLE - Cancer Cell Line Encyclopedia.
- TCGA GDC
- ICGC - Cancer genomics data sets visualization, analysis and download.
- UCSC Xena
- Broad Firehose
- cBioPortal - The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets.
- TCIA - The Cancer Immunome Atlas: The Cancer Immunome Database (TCIA) provides results of comprehensive immunogenomic analyses of next generation sequencing (NGS) data for 20 solid cancers from The Cancer Genome Atlas (TCGA) and other data sources.
- TSNAdb - TSNAdb is developed based on pan-cancer immunogenomic analyses of somatic mutation data and human leukocyte antigen (HLA) allele information for 16 tumor types with 7748 tumor samples from The Cancer Genome Atlas (TCGA) and The Cancer Immunome Atlas (TCIA).
- Tumor Fusion Gene Data Portal - Landscape of cancer-associated fusions using the Pipeline for RNA sequencing Data Analysis. Transcript fusion resulting from genomic rearrangement is an important class of somatic alteration, both as a cancer-initiating event and as a molecular therapeutic target for specific tumors. The Pipeline for RNA sequencing Data Analysis (PRADA) detects fusion transcripts comprehensively and with high confidence. Based on integrated analysis of paired-end RNA sequencing and DNA copy number data from The Cancer Genome Atlas (TCGA), the Tumor Fusion Gene Data Portal provides a bona fide fusion list across many tumor types.
- TSGene - Tumor suppressor gene database.
- oncogene database - the first literature database for oncogenes.
Analysis Databases
- The MEME Suite - Motif-based sequence analysis tools.
- MEXPRESS - MEXPRESS is a data visualization tool designed for the easy visualization of TCGA expression, DNA methylation and clinical data, as well as the relationships between them.
- ARCHS - Mining of human and mouse RNA-seq data.
- GEPIA - Gene Expression Profiling Interactive Analysis.
- MiPanda - Michigan Portal for the Analysis of NGS Data.
- TIP - Tracking Tumor Immunophenotype.
- IOExplorer - An Immune-Oncology data analysis portal brought to you by MSKCC IPOP.
- CGI - Cancer Genome Interpreter.
- TIDE - Tumor Immune Dysfunction and Exclusion: TIDE is a gene expression biomarker to predict the clinical response to immune checkpoint blockade.
- OncoLnc - Here you can link TCGA survival data to mRNA, miRNA, or lncRNA expression levels. To get started simply input either a Tier 3 TCGA mRNA, miRNA, or MiTranscriptome beta lncRNA.
Sequencing Projects
- UK Biobank - UK Biobank is a national and international health resource with unparalleled research opportunities, open to all bona fide health researchers. UK Biobank aims to improve the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses – including cancer, heart diseases, stroke, diabetes, arthritis, osteoporosis, eye disorders, depression and forms of dementia. It is following the health and well-being of 500,000 volunteer participants and provides health information, which does not identify them, to approved researchers in the UK and overseas, from academia and industry.
Project and Funding Databases
- CDE Browser - NCBI project lookup
- Keynote - Keynote is a series of clinical trials to determine whether an investigational immunotherapy may help in the treatment of cancer. The investigational immunotherapy is pembrolizumab (MK-3475).
Other Databases
- Codon Usage Database
- ExAC Browser - The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.
- TANRIC - Enable Scientific Discovery Through ncRNA
Tools Developed by Research Groups
This section collects relevant research groups and the tools they develop and use.
Li Ding’s Lab
Li Ding’s Lab at the Washington University School of Medicine
- msisensor - microsatellite instability detection using tumor only or paired tumor-normal data
- MuSiC2 - identifying mutational significance in cancer genomes
- gatk4wxscnv - Pipeline for WXS CNV using GATK4
- hotspot3d - 3D hotspot mutation proximity analysis tool. This 3D proximity tool can be used to identify mutation hotspots from linear protein sequence and correlate the hotspots with known or potentially interacting domains, mutations, or drugs. Mutation-mutation and mutation-drug clusters can also be identified and viewed.
- Pindel - Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
- pindel2 - Detecting breakpoints of large deletions and medium sized insertions from paired-end short reads.
- queryGDC - Command line client for Genomic Data Commons GraphQL queries
- importGDC - Command line client for Genomic Data Commons data downloads
- gdc_qc_analysis - Somatic mutation pipeline comparison of TCGA samples between Genomic Data Commons (GDC) and MC3
- outlier - Outlier analysis module to identify aberrantly highly expressed genes.
- VariantQC - Variant quality checking scripts for complex indel variant discovery and filtering from Pindel-C outputs. Referenced in Systematic discovery of complex insertions and deletions in human cancers (doi:10.1038/nm.4002).
- cmds - Cohort DNA Copy Number Analysis. A population-based method for DNA copy number analysis: recurrent copy number aberration identification in multiple samples (with no need of single-sample calling). Developed for a quick analysis of high resolution and large population data.
- vcf2maf - Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms.
- parse-cosmic - Script to carefully parse through and standardize somatic variant lists from COSMIC
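As a rough illustration of what converting a VCF record into MAF-style fields involves (the task vcf2maf addresses in the list above), here is a minimal standard-library sketch. It is not vcf2maf's implementation: the function `vcf_to_maf_fields` is a hypothetical helper, it only converts coordinates and alleles, and it makes no attempt at the gene isoform annotation that vcf2maf performs. It assumes a single sample-independent record, keeps only the first ALT allele, and rejects complex substitutions.

```python
def vcf_to_maf_fields(vcf_line):
    """Convert one tab-separated VCF data line into MAF-style fields.

    Assumptions (for illustration only): only the first ALT allele is
    used, and substitutions whose REF and ALT differ in length after
    trimming are not supported.
    """
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    pos = int(pos)
    alt = alt.split(",")[0]  # assumption: ignore additional ALT alleles

    # VCF indels carry a shared leading "anchor" base; MAF drops it.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    if not ref and not alt:
        raise ValueError("REF and ALT are identical")
    if not ref:
        # Insertion: MAF reports the two bases flanking the inserted sequence.
        variant_type = "INS"
        start, end = pos - 1, pos
    elif not alt:
        # Deletion: MAF reports the span of the deleted bases.
        variant_type = "DEL"
        start, end = pos, pos + len(ref) - 1
    elif len(ref) == len(alt):
        variant_type = {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")
        start, end = pos, pos + len(ref) - 1
    else:
        raise ValueError("complex substitutions are not handled in this sketch")

    return {
        "Chromosome": chrom,
        "Start_Position": start,
        "End_Position": end,
        "Variant_Type": variant_type,
        "Reference_Allele": ref or "-",
        "Tumor_Seq_Allele2": alt or "-",
    }


if __name__ == "__main__":
    for line in ("1\t100\t.\tA\tG", "1\t100\t.\tAT\tA", "1\t100\t.\tA\tAT"):
        print(vcf_to_maf_fields(line))
```

For real analyses, use vcf2maf itself, since choosing one isoform per variant requires a transcript annotation source that this sketch does not model.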