Hi Brian,
This isn't so much an issue as a question about how TransDecoder works. I have received a transcriptome assembled with rnaSPAdes from a combination of short-read and long-read sequencing. The transcriptome has already had CD-HIT and TransDecoder applied, but TransDecoder was run with -m 50. This has resulted in a very large transcriptome (~400,000 transcripts), which is very unlikely to be realistic in this case: using the default -m 100 on transcriptomes of closely related species, I get much smaller transcriptomes (~30,000 - 90,000 transcripts). When I applied TransDecoder a second time to this large processed transcriptome (the one that had already had CD-HIT and TransDecoder at -m 50 applied), it shrank drastically to ~17,000 transcripts. Unfortunately I don't have access to the unprocessed transcriptome, so I can't test how a second run of TransDecoder affects its size.
I am wondering what is happening in this second run of TransDecoder: is it sound to run it a second time with a higher -m cutoff? I assumed that, since I am only raising the minimum amino acid length of the ORFs, it would simply remove those between 50 and 100 amino acids long. But it seems unlikely that over 350,000 transcripts in my transcriptome fall in that length range. Could the reduction instead be related to the increased rate of false-positive ORF predictions at the lower length cutoff?
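To make my assumption concrete, this is roughly the behaviour I expected from raising -m (a toy sketch with made-up sequences; TransDecoder's actual ORF selection also involves coding-potential scoring, not just a length cutoff):

```python
# Sketch of a pure length filter: keep only peptides with at
# least min_len amino acids, which is what I assumed -m does.

def filter_by_min_length(peptides, min_len=100):
    """Return only peptide sequences with at least min_len residues."""
    return {name: seq for name, seq in peptides.items() if len(seq) >= min_len}

# Hypothetical example ORFs
peptides = {
    "orf1": "M" * 60,   # 60 aa: would pass -m 50 but not -m 100
    "orf2": "M" * 150,  # 150 aa: would pass both cutoffs
}

kept = filter_by_min_length(peptides, min_len=100)
print(sorted(kept))  # ['orf2']
```

Under this assumption, re-running with -m 100 should only drop the 50-99 aa ORFs, which doesn't explain a loss of ~350,000 transcripts.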
Many thanks for your help,
Molly