Glossary
This file contains a glossary of technical terms and concepts used in the documentation.
Alignment
File formats
An aligned file can be stored in different formats described below:
Format | Content | Compressed | Description | Specification |
---|---|---|---|---|
SAM | Textual | No | Base file format for aligned reads | SAM format |
BAM | Binary | Yes | Compressed binary version of SAM (BGZF blocks) | BAM format |
CRAM | Binary | Yes | Highly compressed version of BAM | CRAM format |
FASTA | Textual | No | Nucleotide (or amino acid) sequences, one letter per position, each preceded by a header line starting with `>`. | No formal specification exists; more information here |
GFF3 | Textual | No | Annotation data | GFF3 format |
VCF | Textual | No | Variant data | |
BCF | Binary | Yes (optional) | Variant data | BCF format |
NOTE: The reality is more nuanced than what is described in this table: a SAM file can also contain unaligned reads, CRAM files can store uncompressed data, and many more details are omitted here for the sake of brevity. The table above covers the most common usages of these files; you can find more information in their respective specifications.
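As a rough illustration of the textual/binary split in the table, the sketch below guesses a format from the first bytes of a file. It is not part of any tool described here: `sniff_format` is a hypothetical helper, it relies only on the leading magic strings of each format (`CRAM`, `BAM\1`, `BCF`, `##fileformat=VCF`, `##gff-version 3`, `>`), and it assumes that a SAM file starts with a header line, which is not guaranteed in practice.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # BGZF blocks are gzip-compatible, so they share this magic

# Common SAM header record types; a headerless SAM file will not be recognised.
SAM_HEADER_TAGS = (b"@HD", b"@SQ", b"@RG", b"@PG", b"@CO")


def sniff_format(head: bytes) -> str:
    """Guess a file format from its first bytes (hypothetical helper, best effort)."""
    if head.startswith(b"CRAM"):
        return "CRAM"
    if head.startswith(GZIP_MAGIC):
        # BAM (and BGZF-compressed BCF) hide their magic inside the first
        # compressed block, so decompress a few bytes to look at it.
        try:
            inner = zlib.decompressobj(wbits=31).decompress(head, 16)
        except zlib.error:
            return "unknown"
        if inner.startswith(b"BAM\x01"):
            return "BAM"
        if inner.startswith(b"BCF"):
            return "BCF"
        return "gzip-compressed (unknown content)"
    if head.startswith(b"BCF"):
        # Assumption: BCF data that is not wrapped in BGZF starts with the magic directly.
        return "BCF"
    if head.startswith(b"##fileformat=VCF"):
        return "VCF"
    if head.startswith(b"##gff-version 3"):
        return "GFF3"
    if head.startswith(b">"):
        return "FASTA"
    if head.startswith(SAM_HEADER_TAGS):
        return "SAM"
    return "unknown"
```

A typical call would read a small prefix of the file, for example `sniff_format(open(path, "rb").read(4096))`, where `path` is whatever file you want to inspect. Content sniffing of this kind is only a heuristic; the specifications remain the authoritative reference.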
TBD: add unaligned formats. FASTQ, Textual, DNA sequences with quality scores, Yes, FASTQ format