
What is Tertiary Analysis and How it Works?

Tertiary analysis applies a series of downstream investigations to understand the effects of variation. Sequencing data contains variants regardless of their loci or functional status, so tertiary analyses draw on annotation data, functional prediction data, and population frequency data to interpret them. The genetics community has begun to create public data repositories to support this growing field, and efforts are underway to catalog these datasets so that future studies can compare variant frequencies against reference populations.
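
To make the role of population frequency data concrete, here is a minimal Python sketch that joins called variants with a frequency catalogue. The file layout, column names, and function names are illustrative assumptions for this example, not a real dataset or tool.

```python
# Minimal sketch: joining called variants with a population-frequency table.
# The TSV layout (chrom, pos, ref, alt, af columns) is assumed for illustration.

import csv

def load_population_frequencies(path):
    """Map (chrom, pos, ref, alt) -> allele frequency from a TSV catalogue."""
    freqs = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            key = (row["chrom"], int(row["pos"]), row["ref"], row["alt"])
            freqs[key] = float(row["af"])
    return freqs

def annotate(variants, freqs):
    """Attach the catalogue frequency (or None if unseen) to each variant dict."""
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        v["population_af"] = freqs.get(key)
        yield v
```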

One example of tertiary analysis is the interpretation of rare sequence variants, which are categorized by their frequency. A variant is typically classified as rare when it appears at very low frequency in the 1000 Genomes Project catalogue or in dbSNP build 129, the last "pure" dbSNP release. Regions carrying a high load of rare variants are usually related to the study focus, so it is helpful to compare several samples and identify which variants recur most frequently across them.
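
As a rough illustration of frequency-based classification, the sketch below labels a variant from its catalogue allele frequency. The 1% and 0.1% cut-offs are common conventions assumed for the example, not thresholds defined by the 1000 Genomes Project or dbSNP.

```python
# Illustrative frequency-based classification; the cut-offs are assumptions.

def classify_by_frequency(population_af, rare_cutoff=0.01, very_rare_cutoff=0.001):
    """Label a variant using its catalogue allele frequency (None = not catalogued)."""
    if population_af is None:
        return "novel"          # absent from the reference catalogue
    if population_af < very_rare_cutoff:
        return "very rare"
    if population_af < rare_cutoff:
        return "rare"
    return "common"

print(classify_by_frequency(0.0004))  # very rare
print(classify_by_frequency(0.2))     # common
```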

A common way to manage the complexity of tertiary analysis is a step-tree approach. Such approaches elicit domain expertise and are validated through a literature review. They can also be generalized to broader scopes and applied beyond tertiary analysis, including technology design and other cognitively complex, domain-specific activities.

The next step in the analysis workflow is variant detection. This step is more flexible and customizable and is sometimes referred to as variant calling. Variant detection entails determining the differences between the sample and the reference genome. Variants may be single nucleotide variants (SNVs), small insertions and deletions (indels), structural changes of the genome, or copy number variants.
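
The sketch below shows how the simple variant classes can be distinguished from a record's REF and ALT alleles. Structural variants and copy number variants are reported through dedicated caller output, so this toy function only separates SNVs from small indels.

```python
# Rough classification of a variant by its REF/ALT alleles (SNV vs. small indel).

def variant_type(ref, alt):
    """Return 'SNV', 'insertion', 'deletion', or 'complex' for simple REF/ALT pairs."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "complex"

print(variant_type("A", "G"))    # SNV
print(variant_type("A", "AT"))   # insertion
print(variant_type("ATG", "A"))  # deletion
```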

The process of NGS includes three stages: primary analysis, secondary analysis, and tertiary analysis. Primary analysis prepares the reads for further processing. Secondary analysis includes alignment, UMI analysis, and gene/transcript quantification. Tertiary analysis combines the outputs of the primary and secondary steps to identify signaling pathways, regulated targets, and interaction partners. For NGS projects, QIAGEN's software platform is designed to support the entire process.
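
Purely as a schematic, the three stages can be pictured as a chain of functions in which each stage consumes the previous stage's output. The function bodies below are placeholders, not the actual algorithms used by any platform.

```python
# Schematic of the three-stage NGS workflow as a simple function chain.
# Each stage is a placeholder for real tools (base calling, alignment,
# quantification, pathway and interaction analysis).

def primary_analysis(raw_signals):
    """Produce quality-scored reads (FASTQ-like records) from instrument output."""
    return [{"read": s, "qual": 30} for s in raw_signals]

def secondary_analysis(reads, reference):
    """Align reads and call variants against the reference (placeholder)."""
    return {"alignments": len(reads), "variants": []}

def tertiary_analysis(secondary_output, knowledge_bases):
    """Interpret variants against annotation and pathway resources (placeholder)."""
    return {"pathways": [], "inputs": secondary_output, "sources": knowledge_bases}

result = tertiary_analysis(
    secondary_analysis(primary_analysis(["ACGT", "TTAG"]), reference="GRCh38"),
    knowledge_bases=["annotation", "population frequencies"],
)
```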

Tertiary analysis requires appropriate software that is easy to use and provides an effective user interface. The input data is typically in BAM or VCF format. Tertiary analysis does not demand the same level of computing power as the previous phase: results from whole genome sequencing can be processed on a computer only slightly above standard specifications, and targeted re-sequencing and exome analysis require even less. So, if you plan to do tertiary analysis, start by choosing a tool that provides a comprehensive solution.
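
For orientation, a VCF file is a tab-separated text file with "#"-prefixed header lines followed by one record per variant. The minimal reader below only illustrates the format; a production pipeline would use an established library such as pysam or cyvcf2 instead.

```python
# A minimal VCF reader in plain Python, shown only to illustrate the input
# format that tertiary tools consume.

import gzip

def read_vcf(path):
    """Yield dicts for the fixed VCF columns, skipping '##' and '#' header lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
            yield {"chrom": chrom, "pos": int(pos), "id": vid,
                   "ref": ref, "alt": alt, "qual": qual,
                   "filter": flt, "info": info}
```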

CompStor Insight is the first tertiary analysis appliance to hit the market. It offers a low per-subject cost and push-button workflows for variant data analysis, and it uses proprietary MemStac™ tiered memory technology to process up to several thousand genome datasets in a single run. In addition, CompStor Insight can interface with various knowledge databases.

Secondary analysis is typically resource intensive and consists of applying a series of algorithms to each sample. However, it can be automated using a server-oriented connector package. One such platform, Galaxy from Penn State, provides a graphical interface for running a comprehensive secondary analysis on individual samples. With a commercially available software package, it is possible to perform the secondary analysis in a single run, and the two pipelines can even be integrated into a single analysis workflow.
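
A per-sample batch run can be sketched as a simple loop over input files. The `my_variant_caller` command and the bam/vcf directory names below are hypothetical placeholders for whatever caller or connector your pipeline actually invokes.

```python
# Sketch of automating a per-sample secondary-analysis step.
# "my_variant_caller" is a placeholder command, not a real tool.

import subprocess
from pathlib import Path

samples = sorted(Path("bam").glob("*.bam"))  # one aligned BAM per sample

for bam in samples:
    out_vcf = Path("vcf") / (bam.stem + ".vcf")
    cmd = ["my_variant_caller", "--input", str(bam), "--output", str(out_vcf)]
    subprocess.run(cmd, check=True)  # fail fast if any sample errors out
```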

While no studies have explicitly explored the tertiary analysis process, many bioinformaticians recognize the need to build their tools around explicit process models. This helps them better understand the complexity and utility of tertiary bioinformatics analysis. It is also important to note that tertiary analysis tools are not the only way to understand the results of a bioinformatics analysis.