Introduction

The Bioinformatics Core (BC) provides computational solutions for high-throughput MS-based and NGS-based data analysis, advanced biostatistics, data mining, and omics data integration. The core has built and installed platforms, workflows, and tools on servers with the capacity to handle big data and parallel computing. Because bioinformatics technologies are constantly evolving, the core is committed to providing the latest technologies to meet the needs of investigators in IBC. In addition, the core offers on-site consultation and training for common bioinformatics tools and analyses.

fig2

1. Servers

The core maintains several servers to meet different bioinformatics needs. The in-house Galaxy server is a web-based platform for small-scale data analysis, with a user-friendly interface and a wide range of bioinformatics tools. The Linux server is specified for high-throughput analysis of next-generation sequencing data and for genome assembly. Major tools installed on the server include SPAdes/MaSuRCA for genome assembly, MAKER for genome annotation, Salmon for RNA-seq quantification, BLAST+ against the NCBI nr database, antiSMASH for gene cluster analysis in fungi and bacteria, and mothur for 16S rRNA sequence data analysis. The web server hosts in-house web applications, and the R/Shiny server supports R-based statistics and interactive graphs for those applications.

fig3

2. Advanced Data Analysis

The core provides advanced data analysis services, including differential expression analysis for proteomics/genomics data, survival analysis, multivariate survival analysis, heatmaps, clustering, interactome analysis, pathway analysis, functional enrichment analysis, genotype-phenotype association analysis, Circos plots, parametric/non-parametric statistics, logistic regression, time series analysis, and machine learning.

fig4

3. Data Mining and Integration

The core provides data mining services on public-domain databases, including cancer genomic data from the Genomic Data Commons (GDC) data portal, the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA) at NCBI, the somatic mutation database in COSMIC, proteomics data from the PRIDE Archive at EBI, and post-translational modification databases. The core also helps build web tools for data visualization and information integration.

fig5 1

4. Streamlining and Automating Data Analysis

The core collaborates with investigators in IBC to streamline data analysis workflows for complex, computation-intensive projects such as genomic analysis for pan-cancer studies, microbiome analysis, alternative splicing identification and quantitation, and automated genome assembly/annotation.