Overview
To create a non-redundant set of likely protein-coding sequences, all sequences were translated into their longest open reading frames using the software package TransDecoder (version 2.01). Sequences with open reading frames of at least 300 nucleotides (100 amino acids) were collected and oriented in the direction of the open reading frame. For each species, these oriented sequences were clustered using the CD-HIT software package (version 4.6), with the longest sequence from each cluster chosen as the representative sequence.
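
The filtering, orientation, and representative-selection logic described above can be illustrated with a minimal Python sketch. It is not the TransDecoder or CD-HIT implementation, and it does not call either tool. The input layout (tuples of identifier, sequence, ORF length, and strand, plus a mapping from cluster to member identifiers) and the function names are assumptions made for illustration only; in practice these values would be parsed from the tools' output files.

```python
from typing import Dict, Iterable, List, Tuple

# Threshold stated in the text: 300 nucleotides (100 amino acids).
MIN_ORF_NT = 300

# Hypothetical record layout: (sequence id, nucleotide sequence,
# longest ORF length in nucleotides, ORF strand "+" or "-").
Record = Tuple[str, str, int, str]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def orient_and_filter(records: Iterable[Record]) -> Dict[str, str]:
    """Keep sequences whose ORF meets the threshold, oriented along the ORF."""
    kept: Dict[str, str] = {}
    for seq_id, seq, orf_len, strand in records:
        if orf_len < MIN_ORF_NT:
            continue  # ORF too short to be retained
        # Sequences with a minus-strand ORF are flipped so the ORF reads forward.
        kept[seq_id] = reverse_complement(seq) if strand == "-" else seq
    return kept


def pick_representatives(
    clusters: Dict[str, List[str]], sequences: Dict[str, str]
) -> Dict[str, str]:
    """For each cluster, choose the id of its longest member sequence."""
    return {
        cluster_id: max(members, key=lambda m: len(sequences[m]))
        for cluster_id, members in clusters.items()
    }
```

For example, passing the output of `orient_and_filter` together with a cluster mapping to `pick_representatives` yields one sequence identifier per cluster. When two members have equal length, this sketch keeps whichever is listed first, which is an arbitrary choice and not something the text specifies.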