
PAINT: Leishmania Sexual Reproductive Strategies As Resolved Through Computational Methods Designed for Aneuploid Genomes

Jahangheer S. Shaik, Deborah E. Dobson, David L. Sacks and Stephen M. Beverley





Generate List File

This page walks you through the steps required to generate a list file using the PAINT package.

Prerequisites

1) Align the FASTQ files to the reference genome of interest using your favourite aligner (Bowtie, BWA, Novoalign, etc.).
2) Sort the resulting binary alignment map (BAM) file using the samtools 'sort' function.
3) Generate an allele file using the findAlleles utility of PAINT.

How to Run it?

The listAlleles utility of the PAINT package takes the allele file generated in step 3 of the prerequisites and summarizes it in terms of frequencies. Two filters can be applied at this stage. Type java -jar PAINT.jar listAlleles -h for options.
  • The -m parameter is the minimum frequency required to call a base. A threshold of 0.4 means that a nucleotide at a given position is called if its frequency is greater than or equal to 40%. If more than one nucleotide meets the threshold at a genomic position, which is typical of heterozygous sites in aneuploid genomes, all of them are listed. With a low enough threshold, more than two nucleotides may be listed at a position.
  • The -n parameter is the minimum read depth (coverage) required to call nucleotides at a position. If the coverage is less than the number specified, a "-" is reported.
  • Finally, -i is the input allele file and -o is the output list file.
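
The two filters can be illustrated with a short Python sketch. This is not PAINT's implementation: the function name call_bases, the dictionary input, and the choice to return "-" when nothing passes the frequency filter are assumptions made for illustration, based only on the parameter descriptions above. The frequencies used are those of position 34 in the Output section of this page.

```python
def call_bases(freqs, depth, min_freq, min_depth):
    """Return the list of nucleotides called at one position.

    freqs     -- dict mapping nucleotide to normalized frequency (0.0 to 1.0)
    depth     -- read depth (coverage) at the position
    min_freq  -- analogue of the -m threshold
    min_depth -- analogue of the -n threshold
    """
    # Coverage filter: positions below the depth threshold are reported as "-".
    if depth < min_depth:
        return ["-"]
    # Frequency filter: keep every nucleotide at or above the threshold.
    called = [base for base, freq in freqs.items() if freq >= min_freq]
    # Assumption: report "-" when no nucleotide passes the frequency filter.
    return called or ["-"]


# Position 34 of the example output: A at about 5.3%, C at about 94.7%, depth 19.
site = {"A": 0.05263157894736842, "T": 0.0, "C": 0.9473684210526315, "G": 0.0}
print(call_bases(site, depth=19, min_freq=0.15, min_depth=10))

# A hypothetical heterozygous site: two nucleotides pass a 0.4 threshold.
het = {"A": 0.5, "T": 0.0, "C": 0.0, "G": 0.5}
print(call_bases(het, depth=20, min_freq=0.4, min_depth=10))
```

Lowering min_freq far enough lets three or four nucleotides through at a noisy site, which is why the threshold should be chosen with the expected ploidy in mind.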

Command Example

java -jar PAINT.jar listAlleles -i "../DemoData/alleleFiles/FV1_SAT_srt.allele" -o "../DemoData/listFiles/FV1_SAT_srt.list" -m 0.15 -n 10

Output

The output of the listAlleles program looks as follows:
LmjF01
21    A    13.0    1.0    0.0    0.0    0.0    4    9    129.69230769230768
22    C    15.0    0.0    0.0    1.0    0.0    5    10    132.4
23    C    15.0    0.0    0.0    1.0    0.0    5    10    132.4
24    C    15.0    0.0    0.0    1.0    0.0    5    10    132.4
25    T    16.0    0.0    1.0    0.0    0.0    6    10    133.5
26    A    16.0    1.0    0.0    0.0    0.0    6    10    133.5
27    A    16.0    1.0    0.0    0.0    0.0    6    10    133.5
28    C    16.0    0.0    0.0    1.0    0.0    6    10    133.5
29    C    16.0    0.0    0.0    1.0    0.0    6    10    133.5
30    C    16.0    0.0    0.0    1.0    0.0    6    10    133.5
31    T    16.0    0.0    1.0    0.0    0.0    6    10    133.5
32    A    17.0    1.0    0.0    0.0    0.0    7    10    134.47058823529412
33    A    18.0    1.0    0.0    0.0    0.0    7    11    135.33333333333334
34    C    19.0    0.05263157894736842    0.0    0.9473684210526315    0.0    8    11    136.10526315789474
35    C    19.0    0.0    0.0    1.0    0.0    8    11    136.10526315789474
36    C    19.0    0.0    0.0    1.0    0.0    8    11    136.10526315789474
........

LmjF02
.........
The list file consists of the chromosome name followed by information corresponding to each genomic position. To generate this list file, a minimum read depth of 10 and minimum frequency of 15% were specified as shown above.
For example, " 34    C    19.0    0.05263157894736842    0.0    0.9473684210526315    0.0    8    11    136.10526315789474" in the list file means the following:
  • 34 - Genomic position.
  • C - Nucleotide called. C is called because it occurs with 94.7% frequency. A is not called because its frequency (5.3%) is below the 15% threshold that was specified when running the utility.
  • 19.0 - Read depth (coverage) at that position. The information for genomic positions 9 through 20 that was present in the allele file (see the output of findAlleles) is lost because those positions did not meet the minimum read depth criterion of 10 specified in the command.
  • 0.05263157894736842    0.0    0.9473684210526315    0.0 - Normalized allele frequencies.
  • 8    11 - Forward and reverse reads at that genomic position.
  • 136.10526315789474 - Average mapping quality.
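
For downstream scripting, the field layout described above can be read with a small parser. The following Python sketch is not part of PAINT, and the names ListRecord, parse_record and parse_list_text are hypothetical. It assumes that the four frequency columns are ordered A, T, C, G (the example rows are consistent with A, T and C in the first three columns; G in the fourth is a guess) and that any line holding a single token is a chromosome name.

```python
from typing import Dict, List, NamedTuple


class ListRecord(NamedTuple):
    position: int
    called: str            # kept as text; may hold "-" or several nucleotides
    depth: float
    freqs: Dict[str, float]
    forward_reads: int
    reverse_reads: int
    mean_mapq: float


# Assumed column order of the normalized frequencies (G is a guess).
BASE_ORDER = ("A", "T", "C", "G")


def parse_record(line: str) -> ListRecord:
    """Split one per-position line into named fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    return ListRecord(
        position=int(fields[0]),
        called=fields[1],
        depth=float(fields[2]),
        freqs=dict(zip(BASE_ORDER, map(float, fields[3:7]))),
        forward_reads=int(fields[7]),
        reverse_reads=int(fields[8]),
        mean_mapq=float(fields[9]),
    )


def parse_list_text(text: str) -> Dict[str, List[ListRecord]]:
    """Group records under the chromosome name line that precedes them."""
    by_chrom: Dict[str, List[ListRecord]] = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1:
            # Assumption: a single-token line is a chromosome name.
            current = fields[0]
            by_chrom[current] = []
        elif current is None:
            raise ValueError("record found before any chromosome name")
        else:
            by_chrom[current].append(parse_record(raw))
    return by_chrom


example = """LmjF01
33    A    18.0    1.0    0.0    0.0    0.0    7    11    135.33333333333334
34    C    19.0    0.05263157894736842    0.0    0.9473684210526315    0.0    8    11    136.10526315789474
"""
records = parse_list_text(example)["LmjF01"]
rec = records[-1]
print(rec.position, rec.called, rec.freqs["C"], rec.forward_reads + rec.reverse_reads)
```

In the example rows the forward and reverse read counts add up to the read depth, which makes a convenient sanity check when parsing.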