Sign in to eRamp to place orders for the Biomedical Informatics Shared Resource
Computational Biology and Bioinformatics Consultation
Computational Biology and Bioinformatics Consultation provides researchers with:
- Analysis of next-generation sequencing data, including exome-sequencing, RNA-sequencing, ChIP-sequencing and whole genome re-sequencing
- Analysis of microarray datasets, including mRNA (Affymetrix), SNP, and micro-RNA
- Analysis of nCounter NanoString data
- Analysis of publicly available datasets using search parameters defined by the OSUCCC client; for example, from the Gene Expression Omnibus (GEO) database of microarray results, The Cancer Genome Atlas (TCGA), and International Cancer Genome Consortium (ICGC) data portals that allow access to results of thousands of deep sequencing projects
- Pathway analysis of results from microarray and sequencing data
- Image analysis, including processing images in order to reconstruct a three-dimensional tissue, or improvement of image quality using specialized computer programs
- Additional bioinformatics analysis as needed (including protein structure prediction, Genbank search, BLAST, PDB search and motif analysis)
Custom Research Management and Support Tools
The OSUCCC has developed several novel data management tools designed to integrate with the data sharing service, creating an optimal flow of data.
The BISR-created program QUEST organizes data, allows the addition of metadata and controls security access. The directory can be made visible to OSUCCC investigators, who can also request access in collaboration with the data owner.
Additionally, the BISR develops novel data management interfaces for the CALGB consortium and for the OSUCCC Mouse Phenotype Shared Resource.
Massively Parallel Sequence Analysis
The sequence output from the Illumina HiSeq 2500 consists of about 200 million sequence tags of 50-100 bp in length per sample.
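As a rough illustration of what those figures mean in data volume, the short Python sketch below multiplies the quoted read count by the quoted read lengths. The function names and the `target_size_bp` parameter are hypothetical helpers introduced only for this example, and the coverage estimate assumes, unrealistically, that every read maps to the target region.

```python
# Back-of-the-envelope yield per sample, using only the figures quoted above.
READS_PER_SAMPLE = 200_000_000   # "about 200 million sequence tags"
READ_LENGTHS_BP = (50, 100)      # "50-100 bp in length"


def total_bases(reads: int, read_length_bp: int) -> int:
    """Total sequenced bases for a given read count and read length."""
    return reads * read_length_bp


def mean_coverage(reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Average depth over a target region (assumption: every read maps to it)."""
    return total_bases(reads, read_length_bp) / target_size_bp


low, high = (total_bases(READS_PER_SAMPLE, n) for n in READ_LENGTHS_BP)
print(f"{low / 1e9:.0f}-{high / 1e9:.0f} gigabases per sample")  # 10-20 gigabases per sample
```

In practice the usable depth is lower than this estimate, because some reads fail quality filters or do not align to the region of interest.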
The sequencer output becomes useful to biologists once several standard data transformations have been applied. The BISR provides three classes of service for sequence analysis:
- Running an automated pipeline from the sequencing machine to compile sequence reads and their quality assignments
- Running best-practice workflows for the analysis of RNA-sequencing, ChIP-sequencing, exome-sequencing and whole genome re-sequencing (in addition to alignment results, the final output includes gene expression quantification for RNA-sequencing, variant analysis for exome-sequencing and whole genome re-sequencing, and peak detection for ChIP-sequencing datasets)
- Performing downstream analysis, including sample comparisons, pathway analysis and integration with private and publicly available datasets
Analysis of Microarray Methods
Microarray analysis covers mRNA (Affymetrix), SNP and micro-RNA arrays. Microarray results are first analyzed by the Biostatistics Shared Resource to identify genes/probes whose expression level changed significantly in the assay. The BISR then provides downstream analysis to identify pathways enriched in the results, and it also helps with data presentation and with submission of results to public databases.
Publicly Available Datasets Analysis
The Gene Expression Omnibus (GEO) is one of the world’s largest databases in which primary data from nearly all published microarray experiments are stored. The BISR staff analyzes this publicly available data with the biologist.
The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) maintain data portals that allow access to the results of thousands of deep sequencing projects. BISR staff implement and run custom workflows and integrate these publicly available datasets to generate new hypotheses or draw new conclusions.
Pathway and Network Analysis of Results from Sequencing and Microarray
A typical analysis of sequencing or microarray data results in a set of genes that have altered expression or carry functional mutations. The BISR staff are expert users of the Ingenuity Pathway Analysis program, which enables the identification of biological pathways and networks that are enriched in the results. Pathway analysis helps the biologist navigate large numbers of gene hits, and it can generate new hypotheses for the biologist to test in the laboratory.
Additionally, the BISR can provide advanced network analysis, such as co-expression network analysis, gene regulatory and interaction network analysis, and dense network motif discovery.
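To make the idea of a pathway being "enriched" in a result set more concrete, the sketch below scores over-representation with an upper-tail hypergeometric test, which is one common approach. It is an illustration only and is not a description of how Ingenuity Pathway Analysis computes its scores; the function name and the toy gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_pathway: int, hits_total: int,
                       pathway_size: int, background_size: int) -> float:
    """Probability of seeing at least `hits_in_pathway` pathway genes when
    `hits_total` genes are drawn without replacement from a background of
    `background_size` genes, `pathway_size` of which belong to the pathway."""
    max_overlap = min(hits_total, pathway_size)
    all_draws = comb(background_size, hits_total)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, max_overlap + 1)
    )
    return tail / all_draws


# Toy numbers (hypothetical): 20 background genes, a 5-gene pathway,
# 4 significant genes, 3 of which fall in the pathway.
p = enrichment_p_value(hits_in_pathway=3, hits_total=4,
                       pathway_size=5, background_size=20)
print(f"p = {p:.4f}")
```

A small p-value suggests the overlap is larger than expected by chance. When many pathways are tested, the resulting p-values would normally be corrected for multiple testing before interpretation.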
Image Analysis
The BISR offers a variety of image analysis tools, including 3D reconstruction, deconvolution and de-noising, to facilitate the presentation of microscopic images or to support interpretations that would not be apparent from the original image because of technical limitations.
Additional Bioinformatics Analysis
A variety of other bioinformatics analyses are available to OSUCCC investigators, including mapping of amino acid residues on 3D structures, threading polypeptides onto known structures of homologous proteins, Genbank searches, BLAST and motif searches.
Modeling, Annotation and Curation of Core Data Sources
Modeling, semantic annotation and curation of core data sources includes databases and data sources associated with Shared Resources and investigator projects and laboratories, as well as enterprise systems such as Electronic Health Records, Clinical Trial Management Systems, bio-specimen management systems and data warehouses.
Data Sharing Infrastructure
The Data Sharing Infrastructure provides OSUCCC investigators with services, expertise and access to technology platforms in support of heterogeneous and multi-dimensional biomedical data management requirements.
The BISR provides consultative services, including:
- Project planning and feasibility analyses
- Human-mediated access to the OSUWMC enterprise data warehouse (EDW)
- Research database design and management
- Development of custom plug-ins and/or extensions to the tools listed below
Services and software include:
- REDCap (a flexible and end-user configurable electronic data capture system)
- REDCap Survey (a locally controlled web-based survey engine)
- StudySearch (a semantically anchored clinical trial registry, with investigator, regulatory and patient-facing interfaces)
- caTissue Suite (a comprehensive biospecimen management system)
- TRITON (an integrative translational research data management platform tailored for multi-site studies that incorporate clinical trials, correlative basic science data capture and bio-specimen disposition/tracking)
- TRIAD (a service-oriented architecture (SOA) for distributed data resource discovery, integration and exchange)
- Microsoft Office SharePoint (for web content management and for team-oriented collaboration sites that incorporate document and calendar sharing, discussion forums and wiki functionality)
- Laboratory information management system (LIMS) development, including a novel data management system developed for the CALGB Consortium (now the National Cancer Trials Network) that allows clinicians across the nation to upload, share, discuss and annotate cases and their cytogenetic images
During the last grant period, the BISR supported customized data management by providing Electronic Data Capture (EDC) platforms for researcher use (e.g., REDCap, StudyTRAX and TeleForm) as well as custom code extensions to these EDC tools for specific use cases (e.g., asynchronous data collection methodology and dynamic audio integration).
Analysis and execution/implementation of data requests, queries, and project-specific data-marts corresponding to the contents of an Enterprise Data Warehouse (EDW) as well as specific reporting tools associated with OSUWMC’s Electronic Health Record (EHR) system
During the last grant period, the BISR facilitated querying of the OSUWMC EDW by providing human-mediated access to the electronic health record for IRB-approved secondary-use purposes, serving as the honest broker that safeguards protected health information.
Formulation and application of novel algorithms, software, and methodologies to enable the syntactic and semantic integration of heterogeneous and large-scale biomedical data sets
The BISR supports custom software engineering where commercial and/or existing open-source data management tools do not meet investigator needs. These software engineering projects use industry-standard technologies and agile project management best practices.
Instrumentation and Equipment
All Biomedical Informatics Shared Resource (BISR) projects use the Jira Task Management System (TMS) for improved project tracking. The TMS allows BISR leadership to monitor projects with a simple dashboard. A single screen can be customized to graph the total tasks accomplished (burn-down chart) as well as team member assignments for all projects.
Equipment includes a variety of desktop computers with site licenses for specific software. No other special equipment is needed. Site licenses include: Ingenuity Pathway Analysis.
A comprehensive computational infrastructure, maintained by Ohio State’s Department of Biomedical Informatics and Ohio State’s Wexner Medical Center Information Technology Department, includes:
The Multiscale Computing Laboratory, located in the Biomedical Informatics Department, houses two data-centric clusters with 1.25 TB and 16 TB storage space.
The first data-centric cluster is a 5-node IA32 cluster with dual Xeon CPUs. Each cluster node, interconnected via a dedicated Gigabit switch, has 250 GB of RAID disk space, 2 GB of memory and dual Gigabit network cards.
The second data-centric cluster has 8 nodes with dual 64-bit AMD Opteron 240 CPUs. Each node, interconnected via a dedicated Gigabit switch, has 8 GB of memory, dual Gigabit network cards and a SATA RAID system with eight 250 GB disks.
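The quoted cluster capacities follow directly from the per-node figures. The short Python check below reproduces them; the helper name is hypothetical, and it assumes decimal units (1 TB = 1000 GB) and raw capacity before any RAID overhead.

```python
def cluster_storage_tb(nodes: int, gb_per_node: float) -> float:
    """Raw cluster capacity in TB (assumption: 1 TB = 1000 GB, no RAID overhead)."""
    return nodes * gb_per_node / 1000


# First cluster: 5 nodes with 250 GB of RAID disk space each.
first_cluster_tb = cluster_storage_tb(nodes=5, gb_per_node=250)

# Second cluster: 8 nodes, each with eight 250 GB disks.
second_cluster_tb = cluster_storage_tb(nodes=8, gb_per_node=8 * 250)

print(first_cluster_tb, second_cluster_tb)  # 1.25 16.0
```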
The laboratory also has several PCs, used as workstations for code development and testing.
The Ohio Supercomputer Center (OSC) maintains a large-scale mass storage system. The system has multiple levels of storage, each at a different conceptual distance from the compute hardware and each with its own usage and management patterns.
The overall storage system comprises a core of 50 TB of high-throughput Fibre Channel RAID disk arrays, a 320 TB pool of IDE RAID disk arrays, 128 TB of tape storage and a cluster of 20 PCs dedicated to providing direct access to storage. All components are connected to each other and to other OSC machines via Fibre Channel switches.