Glossary

This file contains a glossary of the technical terms and concepts used in the documentation.

Alignment

File formats

An aligned file can be stored in several formats, described below:

| Format | Content | Compressed | Description | Specification |
|--------|---------|------------|-------------|---------------|
| SAM | Textual | No | Base file format for aligned reads | SAM format |
| BAM | Binary | Yes | Binary version of SAM (BGZF-compressed) | BAM format |
| CRAM | Binary | Yes | Highly compressed version of BAM | CRAM format |
| FASTA | Textual | No | Sequence data: each sequence is written as a string of single letters, one per nucleotide position | No formal specification; more information here |
| GFF3 | Textual | No | Annotation data | GFF3 format |
| VCF | Textual | No | Variant data | |
| BCF | Binary | Yes (optional) | Variant data | BCF format |
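To make the FASTA row more concrete, here is a minimal sketch of reading FASTA text in Python. The function name `read_fasta` and the toy records are assumptions made for illustration; they do not come from any particular library. The sketch relies only on the common convention that a line starting with `>` opens a new record and the following lines hold its sequence.

```python
from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text (hypothetical helper)."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy input invented for this example.
example = StringIO(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n")
records = dict(read_fasta(example))
```

Each letter of a record is the nucleotide at one position, so `records["seq1"][0]` is the first base of `seq1`.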

NOTE: The reality is more nuanced than this table suggests: a SAM file can also contain unaligned reads, CRAM files can store uncompressed data, and many other details are omitted here for the sake of brevity. The table covers the most common uses of these files; you can find more information in their respective specifications.
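Because file extensions can be misleading, it is sometimes useful to guess a format from the first bytes of a file. The sketch below is a rough heuristic, not a validator: `sniff_format` is a hypothetical helper, and it assumes the usual leading signatures (`CRAM`, `BAM\1`, `BCF`, `##fileformat=VCF`, `##gff-version 3`, `>` for FASTA, and `@HD`/`@SQ`/`@RG`/`@PG`/`@CO` header lines for SAM). It returns the guessed format and whether the data was wrapped in gzip/BGZF.

```python
import gzip
import zlib

# SAM header lines start with one of these tags followed by a tab.
SAM_HEADER_TAGS = (b"@HD\t", b"@SQ\t", b"@RG\t", b"@PG\t", b"@CO\t")


def _sniff_plain(data: bytes) -> str:
    """Match uncompressed leading bytes against known signatures."""
    if data.startswith(b"CRAM"):
        return "CRAM"
    if data.startswith(b"BAM\x01"):
        return "BAM"
    if data.startswith(b"BCF"):
        return "BCF"
    if data.startswith(b"##fileformat=VCF"):
        return "VCF"
    if data.startswith(b"##gff-version 3"):
        return "GFF3"
    if data.startswith(b">"):
        return "FASTA"
    if data.startswith(SAM_HEADER_TAGS):
        return "SAM"
    # Header-less SAM and other inputs cannot be told apart this way.
    return "unknown"


def sniff_format(head: bytes) -> tuple[str, bool]:
    """Return (format guess, gzip_wrapped) for the first bytes of a file."""
    if head.startswith(b"\x1f\x8b"):  # gzip magic, also used by BGZF
        # wbits=31 tells zlib to expect a gzip header; only the first
        # 32 decompressed bytes are needed to check the signature.
        inner = zlib.decompressobj(wbits=31).decompress(head, 32)
        return _sniff_plain(inner), True
    return _sniff_plain(head), False


# Toy inputs invented for this example.
print(sniff_format(gzip.compress(b"BAM\x01" + b"\x00" * 8)))
print(sniff_format(b"##fileformat=VCFv4.3\n"))
```

A line starting with `@` is deliberately not treated as SAM unless it carries a known header tag, because other formats (FASTQ, for instance) also begin records with `@`.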

TBD: add unaligned formats. FASTQ, Textual, DNA sequences with quality scores, Yes, FASTQ format