The command
scTurtle32 -f ../1Mreads.fa.gz -o kmer_counts -k 31 -n 6000000 -t 3
causes Turtle 0.3.1 to seg fault almost immediately.
Uncompress 1Mreads.fa.gz first, and use the -i switch to tell Turtle the file is in fasta format.
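The preparation step can be sketched in Python. This is only an illustration: prepare_fasta is a hypothetical helper, not part of Turtle, and the demonstration file reads.fa.gz is made up. It gunzips a .gz input next to the original, checks that the result starts with '>' as a fasta file should, and prints a command line using the -i switch with the same options as above.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def prepare_fasta(path):
    """Return the path of an uncompressed FASTA file, gunzipping if needed.

    Hypothetical helper for illustration; it is not part of Turtle.
    """
    src = Path(path)
    if src.suffix == ".gz":
        dst = src.with_suffix("")  # e.g. 1Mreads.fa.gz -> 1Mreads.fa
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        src = dst
    with open(src, "rb") as fh:
        if fh.read(1) != b">":  # FASTA records start with '>'
            raise ValueError(f"{src} does not look like FASTA")
    return src


# Demonstration on a throwaway file (made-up contents).
workdir = Path(tempfile.mkdtemp())
gz_path = workdir / "reads.fa.gz"
with gzip.open(gz_path, "wt") as fh:
    fh.write(">read1\nACGTACGT\n")

fasta_path = prepare_fasta(gz_path)
# Print the command rather than run it, since scTurtle32 may not be installed.
print("scTurtle32 -i", fasta_path, "-o kmer_counts -k 31 -n 6000000 -t 3")
```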
scTurtle32 -i ../1Mreads.fa -o kmer_counts -k 31 -n 6000000 -t 3
should be fine and give output in about ten seconds, like:
Turtle Copyright (C) 2014 Rajat Shuvro Roy, Alexander Schliep.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it under certain conditions.
For details see the document COPYING.
Parameters received:
fasta input ../1Mreads.fa
ouput prefix kmer_counts
k-mer length 31
Freq. k-mers 6000000
no of threads 3
STATs:
No of reads: 1000000
No of k-mers: 45613374
no of frequent k-mers found :5590403
-t
) right.
kmer_counts0
>2 GGGGTGGACCCAAAAACTCCCCACGCCCCCC
GGGGTGGACCCAAAAACTCCCCACGCCCCCC does not appear in ../1Mreads.fa at all, let alone twice.
But Turtle includes complementary matches, i.e. the reverse complement GGGGGGCGTGGGGAGTTTTTGGGTCCACCCC does occur twice in ../1Mreads.fa.
gawk -f complement.awk GGGGTGGACCCAAAAACTCCCCACGCCCCCC
will generate the complementary strand.
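complement.awk itself is not shown here, so as a stand-in the same transformation can be sketched in Python. The function name reverse_complement is an assumption of this sketch. Note that the "complementary strand" is the reverse complement: each base is swapped (A with T, C with G) and the string is reversed.

```python
# Translation table swapping each base with its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse, giving the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


kmer = "GGGGTGGACCCAAAAACTCCCCACGCCCCCC"
print(reverse_complement(kmer))
# prints GGGGGGCGTGGGGAGTTTTTGGGTCCACCCC
```

Applying the function twice returns the original k-mer, which is a quick sanity check on any such script.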
TGTGTGGGGGGCGTGGGGAGTTTTTGGGTCC not reported
Neither TGTGTGGGGGGCGTGGGGAGTTTTTGGGTCC nor its complement is reported in Turtle's output file kmer_counts0.
scTurtle does not report unique k-mers, i.e. those with a count of exactly one.
Also, Turtle treats separate sequences as separate: it does not consider the tail of the previous sequence to be adjacent to the start of the next, even though they are in the same file.
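The reporting behaviour described so far (a k-mer and its complement counted together, unique k-mers dropped, each sequence handled on its own) can be modelled with a small exact counter in Python. This is a toy model and not Turtle's algorithm. The names canonical and count_kmers and the three short reads are made up for illustration. Using the lexicographically smaller strand as the representative is an arbitrary choice that need not match the strand Turtle prints.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement.

    The lexicographically smaller string is an arbitrary choice made
    for this sketch; Turtle may report the other strand.
    """
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(sequences, k):
    """Count canonical k-mers per sequence and drop those seen once."""
    counts = Counter()
    for seq in sequences:
        # Windows never run past the end of a sequence, so no k-mer
        # spans the boundary between two sequences.
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return {kmer: n for kmer, n in counts.items() if n > 1}


reads = ["ACGTT", "AACGA", "GGGCC"]  # made-up example reads
frequent = count_kmers(reads, 3)
print(frequent)
# prints {'ACG': 3, 'AAC': 2, 'GCC': 2}
```

In this example CGA and CCC each occur once (counting both strands) and so are left out, just as unique k-mers are missing from kmer_counts0.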
For the purposes of experiment only, if sequences 2_489329 and 1_489330 are run together, then Turtle will find two cases where GGGGTGGACCCAAAAACTCCCCACGCCCCCC or its complement match, and so report two matches for it.
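To see which k-mers such an experiment adds, one can list the k-mers that exist only because two sequences were joined. The Python sketch below does this for two short made-up strings. junction_kmers is a hypothetical helper, and the real sequences 2_489329 and 1_489330 are not reproduced here.

```python
def junction_kmers(left, right, k):
    """Return the k-mers of left+right that overlap both sequences.

    These are the k-mers that appear only when the two sequences are
    run together; counting each sequence separately never sees them.
    """
    joined = left + right
    # A window starting at i covers joined[i:i+k]. It spans the junction
    # when it starts inside `left` and ends inside `right`.
    first = max(0, len(left) - k + 1)
    last = min(len(left) - 1, len(joined) - k)
    return [joined[i:i + k] for i in range(first, last + 1)]


print(junction_kmers("ACGT", "TTGA", 3))
# prints ['GTT', 'TTT']
```

When both sequences are at least k - 1 bases long, joining them adds k - 1 windows, so with k = 31 up to 30 extra k-mers can appear per join.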