Bacterial Genomics

Adapter clipping and quality trimming using Trimmomatic

Introduction

This process uses Trimmomatic to perform some of the read-cleaning steps on FastQ sequences: adapter clipping, quality trimming, length filtering, and sister read pairing.
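
As a rough illustration of how these steps can map onto a single Trimmomatic paired-end run, the Python sketch below only assembles the argument list and does not run anything. Trimmomatic applies trimming steps in the order they are given, and in PE mode it writes four outputs (paired and unpaired reads for each mate), which is where sister read pairing happens. The jar path, file names, thread count, and every numeric threshold except the 2-mismatch allowance are hypothetical placeholders, not this workflow's actual settings.

```python
from typing import List


def build_trimmomatic_pe_cmd(jar: str, r1: str, r2: str, out_prefix: str,
                             adapters: str, threads: int = 4) -> List[str]:
    """Assemble (but do not run) a hypothetical Trimmomatic PE command."""
    # PE mode: paired and unpaired output files for each mate.
    outputs = [
        f"{out_prefix}_R1.paired.fq.gz",
        f"{out_prefix}_R1.unpaired.fq.gz",
        f"{out_prefix}_R2.paired.fq.gz",
        f"{out_prefix}_R2.unpaired.fq.gz",
    ]
    steps = [
        # ILLUMINACLIP:<adapter FastA>:<seedMismatches>:<palindromeClipThreshold>:<simpleClipThreshold>
        # 2 = mismatch allowance; 30 and 10 are illustrative values.
        f"ILLUMINACLIP:{adapters}:2:30:10",
        # Quality trimming: 4-base window, mean quality 20 (illustrative).
        "SLIDINGWINDOW:4:20",
        # Length filter: drop reads shorter than 50 bp (illustrative).
        "MINLEN:50",
    ]
    return ["java", "-jar", jar, "PE", "-threads", str(threads), "-phred33",
            r1, r2, *outputs, *steps]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping it as a list rather than a shell string avoids quoting problems with unusual file names.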

Adapter clipping

Trimmomatic supplies multi-record FastA files of known Illumina adapter sequences, and some users concatenate all of them into a single adapter FastA file so that any known adapter can be identified and removed from the sequence reads. The sequences Trimmomatic provides are the bare adapters, without barcodes. However, it does not supply sequences for some other kits I have dealt with (e.g., Rubicon Genomics ThruPLEX, NEBNext).
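
One way to do that concatenation is sketched below in Python, assuming each input is plain multi-record FastA text. Records are copied in order, and any sequence already seen (for example, the same adapter listed in two kit files) is skipped, keeping its first defline. The function names and the de-duplication rule are illustrative choices, not part of Trimmomatic.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (defline, sequence) pairs from multi-record FastA text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # normalize case for comparison
    if header is not None:
        yield header, "".join(chunks)


def merge_adapter_fastas(texts: Iterable[str]) -> str:
    """Concatenate FastA texts, keeping only the first record for each sequence."""
    seen = set()
    out: List[str] = []
    for text in texts:
        for header, seq in parse_fasta(text):
            if seq in seen:
                continue  # duplicate adapter from another kit file
            seen.add(seq)
            out.append(f">{header}\n{seq}\n")
    return "".join(out)
```

A typical (hypothetical) call reads every kit file in a directory: `Path("all_adapters.fa").write_text(merge_adapter_fastas(p.read_text() for p in Path("adapters").glob("*.fa")))`, with `Path` imported from `pathlib`.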

I used NCBI’s UniVec database, which contains Illumina adapter sequences as well as other, unrelated sequences, to build a more comprehensive multi-FastA adapter file. Its deflines also carry the exact barcode names, so that the specific adapter(s) removed can be identified. This file is then used for adapter removal.
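
Building such a file could start by pulling the adapter-related records out of a local UniVec FastA download. The Python sketch below keeps every record whose defline matches a regular expression and writes the defline through unchanged, so the adapter and barcode names survive. The example pattern, and the assumption that the relevant entries can be recognized from defline text alone, are hypothetical. Inspect the actual UniVec deflines before relying on any pattern.

```python
import re
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (defline, sequence) pairs from multi-record FastA text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_defline(text: str, pattern: str) -> str:
    """Return FastA text holding only records whose defline matches `pattern`."""
    rx = re.compile(pattern, re.IGNORECASE)
    # Deflines are copied verbatim so adapter/barcode names are preserved.
    return "".join(f">{h}\n{s}\n" for h, s in parse_fasta(text) if rx.search(h))
```

Calling `select_by_defline(univec_text, r"illumina")` would keep only records whose deflines mention Illumina. Here `univec_text` is a hypothetical variable holding the downloaded file's contents.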

Illumina instruments should detect and remove perfect adapter matches, but an adapter carrying one or two sequencing errors can slip through into the FastQ output. This adapter identification and removal step therefore allows up to 2 mismatches against each adapter sequence (each roughly 50-70 bp long).
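
To make the mismatch allowance concrete, the Python sketch below scans a read for an adapter while tolerating up to 2 substitutions, including a partial adapter that runs off the 3' end of the read. This is a simplification, not Trimmomatic's algorithm. In Trimmomatic the mismatch count is presumably ILLUMINACLIP's seedMismatches field, which applies to short seeds before a score-based alignment, and paired reads also get a separate palindrome mode. Here mismatches are counted over the whole overlap (plain Hamming distance, no indels). The `min_overlap` guard and its default are hypothetical, chosen so that very short 3' overlaps are not clipped on chance matches.

```python
from typing import Optional


def find_adapter(read: str, adapter: str, max_mismatches: int = 2,
                 min_overlap: int = 8) -> Optional[int]:
    """Return the leftmost read position where `adapter` starts, allowing up to
    `max_mismatches` substitutions, or None if there is no such position.

    An adapter running off the 3' end is compared on the overlapping bases
    only, provided at least `min_overlap` bases overlap.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        mismatches = 0
        for a, b in zip(read[start:start + overlap], adapter[:overlap]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences at this offset
        if mismatches <= max_mismatches:
            return start
    return None


def clip_adapter(read: str, adapter: str, **kwargs: int) -> str:
    """Trim the read at the first adapter hit; return it unchanged if none."""
    pos = find_adapter(read, adapter, **kwargs)
    return read if pos is None else read[:pos]
```

With the Illumina TruSeq adapter prefix `AGATCGGAAGAGC`, a read ending in `AGATCGGTAGAGC` (one sequencing error) is still clipped at the start of the adapter.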

Quality trimming
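
Quality trimming in Trimmomatic is commonly done with its SLIDINGWINDOW step, which cuts a read once the average quality within a window drops below a threshold. The Python sketch below shows one simplified version of that idea. It is not a port of Trimmomatic's code and not necessarily the setting used here. The 4-base window, the mean-quality threshold of 20, and Phred+33 quality encoding are assumptions.

```python
from typing import List, Tuple


def phred33_scores(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_mean: float = 20.0) -> Tuple[str, str]:
    """Cut the read at the start of the first window (scanning 5' to 3')
    whose mean quality is below `min_mean`. Reads shorter than one window
    are returned unchanged."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred33_scores(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], qual[:start]
    return seq, qual
```

Cutting at the start of the failing window is deliberately conservative: it may discard a few acceptable bases inside that window, in exchange for never keeping the low-quality tail.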

Length filter
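
A length filter discards reads that are too short after clipping and trimming, which is what Trimmomatic's MINLEN step does. The Python sketch below applies the same rule to FastQ records parsed from text. The 50 bp cutoff is a hypothetical value, and the parser assumes standard four-line records without wrapped sequence lines.

```python
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (read name, sequence, quality)


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from four-line FastQ text."""
    lines = text.splitlines()
    if len(lines) % 4:
        raise ValueError("FastQ text does not contain whole four-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FastQ record starting at line {i + 1}")
        yield header[1:], seq, qual


def length_filter(records: Iterable[FastqRecord],
                  min_len: int = 50) -> List[FastqRecord]:
    """Keep only records whose sequence is at least `min_len` bases long."""
    return [rec for rec in records if len(rec[1]) >= min_len]
```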

Sister read pairing
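
In paired-end mode Trimmomatic keeps sister reads synchronized. Pairs in which both mates survive go to the paired output files, and a mate whose partner was discarded goes to an unpaired file. The Python sketch below mirrors that routing on in-memory lists, with `None` standing for a mate removed by an earlier step. Holding everything in memory is an assumption made for brevity; real tools stream records.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality)


def route_pairs(r1_reads: Sequence[Optional[Read]],
                r2_reads: Sequence[Optional[Read]]) -> Dict[str, List[Read]]:
    """Split post-trimming mates into paired and unpaired bins.

    Index i of each list holds mate 1 and mate 2 of the same fragment.
    """
    if len(r1_reads) != len(r2_reads):
        raise ValueError("R1 and R2 must contain the same number of records")
    bins: Dict[str, List[Read]] = {
        "R1_paired": [], "R2_paired": [], "R1_unpaired": [], "R2_unpaired": [],
    }
    for m1, m2 in zip(r1_reads, r2_reads):
        if m1 is not None and m2 is not None:
            bins["R1_paired"].append(m1)
            bins["R2_paired"].append(m2)
        elif m1 is not None:
            bins["R1_unpaired"].append(m1)  # mate 2 was dropped
        elif m2 is not None:
            bins["R2_unpaired"].append(m2)  # mate 1 was dropped
        # If both mates were dropped, the fragment is discarded entirely.
    return bins
```

Writing the two paired bins in the same order keeps mate 1 and mate 2 at matching positions, which downstream aligners and assemblers expect.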

Log information

Each of these trimming outcomes is tallied and logged:

[!NOTE] Trimmomatic does not report which specific adapters (by name) were removed, but there is a feature request to implement this within Trimmomatic.
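
The tallied outcomes can also be collected programmatically, for example across many samples. The Python sketch below extracts the counts from the summary line Trimmomatic prints at the end of a paired-end run. It then checks that the four outcome categories (both surviving, forward only, reverse only, dropped) add up to the input total. The exact line format encoded in the regular expression is an assumption based on typical Trimmomatic output. Compare it with the log your version writes before use.

```python
import re
from typing import Dict

# Assumed shape of Trimmomatic's paired-end summary line.
PE_SUMMARY = re.compile(
    r"Input Read Pairs:\s+(?P<input>\d+)\s+"
    r"Both Surviving:\s+(?P<both>\d+)\s+\([\d.]+%\)\s+"
    r"Forward Only Surviving:\s+(?P<forward_only>\d+)\s+\([\d.]+%\)\s+"
    r"Reverse Only Surviving:\s+(?P<reverse_only>\d+)\s+\([\d.]+%\)\s+"
    r"Dropped:\s+(?P<dropped>\d+)\s+\([\d.]+%\)"
)


def parse_pe_summary(log_text: str) -> Dict[str, int]:
    """Return the outcome counts from a Trimmomatic PE log."""
    match = PE_SUMMARY.search(log_text)
    if match is None:
        raise ValueError("no paired-end summary line found")
    counts = {key: int(value) for key, value in match.groupdict().items()}
    # Every input pair must land in exactly one outcome category.
    outcomes = ("both", "forward_only", "reverse_only", "dropped")
    if sum(counts[k] for k in outcomes) != counts["input"]:
        raise ValueError("outcome counts do not sum to the input pair count")
    return counts
```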