Computational Biology and Bioinformatics Consultation
Computational Biology and Bioinformatics Consultation provides researchers with:
- Analysis of next-generation sequencing data, including Exome-sequencing, RNA-sequencing, ChIP-sequencing and whole genome re-sequencing
- Analysis of microarray datasets, including mRNA (Affymetrix), SNP, and micro-RNA
- Analysis of nCounter NanoString data
- Analysis of publicly available datasets using search parameters defined by the OSUCCC client; for example, from the Gene Expression Omnibus (GEO) database of microarray results, The Cancer Genome Atlas (TCGA), and International Cancer Genome Consortium (ICGC) data portals that allow access to results of thousands of deep sequencing projects
- Pathway analysis of results from microarray and sequencing data
- Image analysis, including processing images to reconstruct three-dimensional tissue, or improving image quality using specialized computer programs
- Additional bioinformatics analysis as needed (including protein structure prediction, Genbank search, BLAST, PDB search and motif analysis)
Custom Research Management and Support Tools
The OSUCCC has developed several novel data management tools designed to integrate with the data sharing service, creating an optimal flow of data.
The BISR-created program QUEST organizes data, allows the addition of metadata and controls security access. The directory can be made visible to OSUCCC investigators, who can also request access in collaboration with the data owner.
Additionally, the BISR develops novel data management interfaces for the CALGB consortium and for the OSUCCC Mouse Phenotype Shared Resource.
Massively Parallel Sequence Analysis
The sequence output from the Illumina HiSeq 2500 consists of about 200 million sequence tags of 50-100 bp in length per sample.
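As a rough sense of scale, the figures above imply on the order of 10 to 20 gigabases of raw sequence per sample. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the calculation assumes every tag has the same length and ignores any quality trimming.

```python
def raw_bases(num_tags: int, tag_length_bp: int) -> int:
    """Total number of bases in a set of equal-length sequence tags."""
    return num_tags * tag_length_bp


TAGS_PER_SAMPLE = 200_000_000  # "about 200 million" tags, taken from the text

for length in (50, 100):  # the two ends of the stated 50-100 bp range
    gigabases = raw_bases(TAGS_PER_SAMPLE, length) / 1e9
    print(f"{length} bp tags: about {gigabases:.0f} Gb per sample")
```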
The raw sequencer output becomes useful to biologists only after several standard data transformations, which the BISR provides. The BISR offers three classes of service for sequence analysis:
- Running an automated pipeline from the sequencing machine to compile sequence reads and their quality assignments
- Running best practice workflows for the analysis of RNA-sequencing, ChIP-sequencing, Exome-sequencing, and whole genome re-sequencing (besides alignment results, final output includes: gene expression quantification for RNA-sequencing, variant analysis for exome-sequencing and whole genome re-sequencing, and peak detection for ChIP-sequencing datasets)
- Performing downstream analysis, including sample comparisons, pathway analysis and integration with private and publicly available datasets
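To illustrate what "sequence reads and their quality assignments" look like in practice, the sketch below reads records in FASTQ format and decodes their quality strings. It is not the BISR pipeline: it assumes four-line FASTQ records with Phred+33 quality encoding, and the function names and example reads are made up for illustration.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Assumed Phred+33 encoding: score = ASCII code of the character - 33
        scores = [ord(c) - 33 for c in qual]
        yield header[1:].strip(), seq, scores


def mean_quality(scores: List[int]) -> float:
    """Average Phred score of one read (0.0 for an empty read)."""
    return sum(scores) / len(scores) if scores else 0.0


# Two made-up reads: 'I' encodes Phred 40, '!' encodes Phred 0
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
for read_id, seq, scores in read_fastq(example):
    print(read_id, seq, f"mean Q = {mean_quality(scores):.1f}")
```

A real workflow would typically rely on an established toolkit rather than a hand-written parser; the point here is only to show the relationship between a read and its per-base quality values.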
Analysis of Microarray Methods
Microarray analysis covers mRNA (Affymetrix), SNP and micro-RNA platforms. Microarray results are first analyzed by the Biostatistics Shared Resource to identify genes/probes whose expression level changed significantly in the assay. The BISR then provides downstream analysis to identify pathways enriched in the results, and it also helps with data presentation and submission of results to public databases.
Publicly Available Datasets Analysis
The Gene Expression Omnibus (GEO) is one of the world’s largest databases in which primary data from nearly all published microarray experiments are stored. The BISR staff analyzes this publicly available data with the biologist.
The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) provide data portals that allow access to the results of thousands of deep sequencing projects. BISR staff implements and runs custom workflows and integrates these publicly available datasets to generate new hypotheses or draw new conclusions.
Pathway and Network Analysis of Results from Sequencing and Microarray
Typical analysis of sequencing and microarray data results in a set of genes that have altered expression or carry functional mutations. The BISR staff are expert users of the Ingenuity Pathway Analysis program, which enables the identification of biological pathways and networks that are enriched in the results. The pathway analysis helps the biologist navigate the large number of gene hits, and it can generate new hypotheses for the biologist to test in the laboratory.
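The idea of a pathway being "enriched" can be made concrete with a generic over-representation test. The sketch below is not the Ingenuity Pathway Analysis implementation; it is a plain right-tailed hypergeometric test, and the function name, parameter names and gene counts are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(genome_size: int, pathway_size: int,
                       hits: int, hits_in_pathway: int) -> float:
    """Right-tailed hypergeometric p-value.

    Probability of seeing at least `hits_in_pathway` pathway genes when
    `hits` genes are drawn at random, without replacement, from a genome of
    `genome_size` genes of which `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, hits)
    total = comb(genome_size, hits)
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a 100-gene pathway, 200 gene hits,
# 8 of which fall in the pathway (about 1 would be expected by chance).
p = enrichment_p_value(20_000, 100, 200, 8)
print(f"enrichment p-value: {p:.2e}")
```

In practice many pathways are tested at once, so the resulting p-values would normally be corrected for multiple testing before any pathway is reported as enriched.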
Additionally, the BISR can provide advanced network analysis, such as co-expression network analysis, gene regulatory and interaction network analysis, and dense network motif discovery.
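As one example of the network analyses mentioned above, a co-expression network can be sketched by linking genes whose expression profiles are strongly correlated across samples. This is a simplified illustration under stated assumptions, not the BISR's method: it uses Pearson correlation with a fixed threshold, and the gene names and expression values are invented.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(expression: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Invented expression values for three genes across five samples
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
for g1, g2, r in coexpression_edges(expression):
    print(f"{g1} -- {g2} (r = {r:.3f})")
```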
Image Analysis
The BISR offers a variety of image analysis tools, including 3D reconstruction, deconvolution and de-noising, to facilitate the presentation of microscopic images or to enable interpretations that would not be apparent from the original image because of technical limitations.
Additional Bioinformatics Analysis
A variety of other bioinformatics analyses are available to OSUCCC investigators, including mapping of amino acid residues on 3D structures, threading polypeptides onto known structures of homologous proteins, Genbank searches, BLAST and motif searches.
Modeling, Annotation and Curation of Core Data Sources
Modeling, semantic annotation and curation of core data sources covers databases and data sources associated with Shared Resources and with investigator projects and laboratories, as well as enterprise systems such as Electronic Health Records, Clinical Trial Management Systems, bio-specimen management systems and data warehouses.
Data Sharing Infrastructure
The Data Sharing Infrastructure provides OSUCCC investigators with services, expertise and access to technology platforms in support of heterogeneous and multi-dimensional biomedical data management requirements.
The BISR provides consultative services, including:
- Project planning and feasibility analyses
- Human-mediated access to the OSUWMC enterprise data warehouse (EDW)
- Research database design and management
- Development of custom plug-ins and/or extensions to the tools listed below
Services and software include:
- REDCap (a flexible and end-user configurable electronic data capture system)
- REDCap Survey (a locally controlled web-based survey engine)
- StudySearch (a semantically anchored clinical trial registry, with investigator, regulatory and patient-facing interfaces)
- caTissue Suite (a comprehensive biospecimen management system)
- TRITON (an integrative translational research data management platform tailored for multi-site studies that incorporate clinical trials, correlative basic science data capture and bio-specimen disposition/tracking)
- the TRIAD service-oriented architecture (SOA) for the purposes of distributed data resource discovery, integration, and exchange
- Microsoft Office SharePoint (for web content management as well as team-oriented collaboration sites that incorporate document and calendar sharing, discussion forums and wiki functionality)
- laboratory information management system (LIMS) development, including a novel data management system for the CALGB Consortium (now part of the National Clinical Trials Network) that allows clinicians across the nation to upload, share, discuss and annotate cases and their cytogenetic images