proTRAC QuickStart and Troubleshooting

QuickStart

  1. Download the proTRAC folder to the desired directory on your machine.
  2. Map your sequence reads (FASTA format) to a genome with SeqMap (Jiang and Wong 2008), using the option /output_all_matches. SeqMap is freely available, and many genomes can be downloaded from NCBI or Ensembl. Map your reads with:
     seqmap 0 reads.fas genome.fas ELAND3_output.txt /output_all_matches
  3. Start proTRAC and select ELAND3_output.txt as the input file.
  4. Adjust the settings and start the analysis.

 

Troubleshooting

My ELAND3 file is huge

Check your sequence dataset. Do not map homo- or dipolymeric stretches such as GCGCGCGCGCGCGCGC or AAAAAAAAAAAAAAA to the genome. These sequences map to microsatellites and therefore produce millions of hits. In some cases it also helps to filter out sequence reads that correspond to rRNA, tRNA or similar sequences. Useful Perl scripts such as filter_simple_repeats.pl and map_sequences.pl are part of the "NGS tools for the novice" toolkit, which is freely available.
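The filtering idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the logic of filter_simple_repeats.pl: the function names are hypothetical, and it is assumed here that a read counts as a simple repeat only if it consists entirely of one repeated base or one repeated pair of bases.

```python
def is_simple_repeat(seq):
    """Return True if seq is a pure homo- or dipolymer (e.g. AAAA or GCGCGC)."""
    seq = seq.upper()
    for unit_len in (1, 2):
        unit = seq[:unit_len]
        # Require at least two copies of the unit, then compare the read
        # with the unit repeated to the same length.
        if len(seq) >= 2 * unit_len and (unit * len(seq))[:len(seq)] == seq:
            return True
    return False


def filter_reads(records):
    """Yield (header, sequence) pairs whose sequence is not a simple repeat."""
    for header, seq in records:
        if not is_simple_repeat(seq):
            yield header, seq


# Hypothetical example reads, given as (FASTA header, sequence) pairs.
reads = [
    ("1", "GCGCGCGCGCGCGCGC"),
    ("2", "AAAAAAAAAAAAAAA"),
    ("3", "TTGTACTACTTCCATT"),
]
kept = list(filter_reads(reads))
```

Only the third read would be kept for mapping. A real filter would probably also tolerate a few deviating bases within a repeat, which this strict sketch does not.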

 

I cannot execute proTRAC.pl

Executing Perl scripts requires an installed Perl distribution. You may also need to install additional Perl modules such as Tk and GD. Missing modules are listed in the error message, and modules are freely available at the Comprehensive Perl Archive Network (CPAN). If you are not on a Windows system, you can also try to run proTRAC.exe in an emulated Windows environment. We ran proTRAC.exe without any problems on a Mac with Windows XP emulated by Parallels Desktop®.

 

proTRAC returns an error message when reading the input file

Ensure that your input file is in ELAND3 format. Your file should look something like this:

trans_id  trans_coord  target_seq        probe_id  probe_seq         num_mismatch  strand
Chr1      2549081      TTGTACTACTTCCATT  3         TTGTACTACTTCCATT  0             -
Chr1      3743045      TGAGGCCATGTTTCA   1         TGAGGCCATGTTTCA   0             -
Chr1      3785722      TCAATTCTTGACTTCT  2         TCAATTCTTGACTTCT  0             +
Chr1      3797369      TTTCTTATCGTGCATG  1         TTTCTTATCGTGCATG  0             -
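
A quick sanity check of the input file can help locate the offending line. The following Python sketch is an assumption-laden illustration, not part of proTRAC: it assumes whitespace-separated columns with no spaces inside any field, seven columns as in the example above, and the function name check_eland3_lines is hypothetical.

```python
EXPECTED_HEADER = ["trans_id", "trans_coord", "target_seq", "probe_id",
                   "probe_seq", "num_mismatch", "strand"]


def check_eland3_lines(lines):
    """Return (line_number, problem) pairs for lines that do not look like ELAND3."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields == EXPECTED_HEADER:
            continue  # skip blank lines and the header line
        if len(fields) != 7:
            problems.append((number, f"expected 7 columns, found {len(fields)}"))
        elif not fields[1].isdigit():
            problems.append((number, "trans_coord is not a number"))
        elif not fields[5].isdigit():
            problems.append((number, "num_mismatch is not a number"))
        elif fields[6] not in ("+", "-"):
            problems.append((number, "strand is neither + nor -"))
    return problems


# Hypothetical input: a header, one valid hit and one truncated line.
sample = [
    "trans_id trans_coord target_seq probe_id probe_seq num_mismatch strand",
    "Chr1 2549081 TTGTACTACTTCCATT 3 TTGTACTACTTCCATT 0 -",
    "Chr1 3743045 TGAGG",
]
problems = check_eland3_lines(sample)
```

For a real file, the lines could be read with open("ELAND3_output.txt") and passed to the same function. An empty result only means that the basic column layout looks plausible.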

proTRAC does not assemble cluster candidates

Cluster candidates are assembled on the basis of hit density. Your ELAND3 input file may not contain any accumulation of hits that satisfies your settings. Try reducing p for hit density or the minimum hit density. If the assigned sliding window size seems too high (it is usually ~10), one of the simple settings or probabilistic settings is probably too strict.

 

proTRAC assembles cluster candidates but verifies 0 clusters

The cluster candidates do not meet the requirements. Probably one or more of the simple settings or probabilistic settings are too strict. IMPORTANT: Keep in mind what you mapped to the genome. If all of your sequences are 26-32 nt in length, there cannot be an accumulation of reads with typical length within clusters when typical length is set to 26-32. Likewise, if you mapped piRNAs, do not expect an accumulation of loci starting with T compared with the entirety of mapped reads. In this case, use the option based on random base composition.
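
To see whether these criteria can discriminate at all in a given dataset, it can help to summarise the mapped reads first. The Python sketch below is a hypothetical helper, not a proTRAC function. The name summarize_reads and the default length range of 26-32 nt are assumptions taken from the example in the text.

```python
def summarize_reads(sequences, min_len=26, max_len=32):
    """Report the share of reads with typical length and with a 5' T."""
    total = len(sequences)
    if total == 0:
        return {"total": 0, "typical_length_fraction": 0.0, "t_start_fraction": 0.0}
    typical = sum(1 for s in sequences if min_len <= len(s) <= max_len)
    t_start = sum(1 for s in sequences if s.upper().startswith("T"))
    return {
        "total": total,
        "typical_length_fraction": typical / total,
        "t_start_fraction": t_start / total,
    }


# Hypothetical reads: three of four have typical length, two of four start with T.
example = ["T" * 28, "A" * 28, "T" * 20, "G" * 30]
summary = summarize_reads(example)
```

If typical_length_fraction is already close to 1.0 for the whole dataset, clusters cannot show an enrichment of typically sized reads against that background. The same reasoning applies to t_start_fraction.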

 

Validation of results takes a very long time

Probably quite a few clusters have been detected. You can abort the computation at this point, because the detected clusters and the optional cluster summary file have already been saved.

 

How do I include transcriptional information in proTRAC?

This information must be included in the sequence file that is mapped to the genome. Each FASTA title must state the abundance (read count) of the respective sequence read:

>73
GCTAGCTAGCGTAGCTAGCTGCGCTA
>2
AATGCGCTATATACGGCTCTTATAGCGCAT
>12
TCTCTAGAGATCTCTTTTTTAAGTC

A Perl script (discard_redundant_sequences.pl) that converts FASTA files with redundant sequences to the required format is part of the "NGS tools for the novice" toolkit, which is freely available.
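
The conversion itself is simple and can be sketched in Python as follows. This is an illustration of the idea only, not the code of discard_redundant_sequences.pl. The function names are hypothetical, and it is assumed that identical sequences are to be merged into one record whose title is the number of occurrences.

```python
from collections import Counter


def parse_fasta(lines):
    """Yield sequences from FASTA lines, joining multi-line records."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def collapse_reads(lines):
    """Return FASTA lines with one record per unique sequence, titled by its count."""
    counts = Counter(parse_fasta(lines))
    collapsed = []
    for seq, count in counts.most_common():  # most abundant sequences first
        collapsed.append(f">{count}")
        collapsed.append(seq)
    return collapsed


# Hypothetical redundant input with arbitrary titles.
redundant = [">read_a", "ACGT", ">read_b", "TTTT", ">read_c", "ACGT"]
collapsed = collapse_reads(redundant)
```

Here the two identical ACGT reads are merged into a single record with the title 2. The original titles are discarded, since only the counts are needed in the collapsed file.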

 

Any more questions or problems?

Contact David Rosenkranz: rosenkrd@uni-mainz.de  

 

 
