PAINT Package
Generate List File
This page walks you through the steps required to generate a list file using the PAINT package.
Prerequisites
1) Align the FASTQ files to the reference genome of interest using your preferred aligner (Bowtie, BWA, Novoalign, etc.).
2) Sort the resulting binary alignment map (BAM) file using the samtools 'sort' function.
3) Generate an allele file using the findAlleles utility of PAINT.
How to Run It?
The listAlleles utility of the PAINT package takes the allele file generated in step 3 of the prerequisites and summarizes it in terms of allele frequencies. Two different filters can be applied at this stage. Type java -jar PAINT.jar listAlleles -h for options.
- The -m parameter is the minimum frequency required to call a base. A frequency threshold of 0.4 means that a nucleotide at any given position with a frequency greater than or equal to 40% will be called. If more than one nucleotide meets the threshold at a given genomic position, which is typical of heterozygous sites in aneuploid genomes, then all of them will be listed. With a lower threshold, more than two nucleotides may be called at a given position.
- The -n parameter is the minimum read depth (coverage) required to call nucleotides at a position. Nucleotides are called if the read depth at that position is greater than or equal to the number specified. If the coverage is less than the number specified, a "-" will be reported.
- Finally, -o is the output file.
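The two filters described above can be illustrated with a minimal Python sketch. This is not PAINT's actual implementation. The function name call_bases and its arguments are hypothetical, and the base labels for the second and fourth frequency values are assumed for illustration. The sketch follows the rule as described: report "-" when the depth is below the -n value, otherwise list every base whose frequency is at or above the -m value.

```python
def call_bases(freqs, depth, min_freq, min_depth):
    """Illustrative filter logic, not PAINT's own code.

    freqs     -- dict mapping base to normalized frequency at one position
    depth     -- read depth (coverage) at that position
    min_freq  -- plays the role of the -m threshold
    min_depth -- plays the role of the -n threshold
    """
    # Depth filter: positions with too little coverage are reported as "-".
    if depth < min_depth:
        return ["-"]
    # Frequency filter: keep every base at or above the threshold.
    return [base for base, freq in freqs.items() if freq >= min_freq]


# Frequencies for a position covered by 19 reads (hypothetical base labels
# for the zero-valued entries).
site = {"A": 0.05263157894736842, "T": 0.0, "C": 0.9473684210526315, "G": 0.0}

print(call_bases(site, depth=19, min_freq=0.15, min_depth=10))  # ['C']
print(call_bases(site, depth=19, min_freq=0.05, min_depth=10))  # ['A', 'C']
print(call_bases(site, depth=9, min_freq=0.15, min_depth=10))   # ['-']
```

What PAINT reports when the depth is sufficient but no base reaches the threshold is not stated here; the sketch simply returns an empty list in that case.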
Command Example
java -jar PAINT.jar listAlleles -i "../DemoData/alleleFiles/FV1_SAT_srt.allele" -o "../DemoData/listFiles/FV1_SAT_srt.list" -m 0.15 -n 10
Output
The output of the listAlleles program looks as follows:
LmjF01
21 A 13.0 1.0 0.0 0.0 0.0 4 9 129.69230769230768
22 C 15.0 0.0 0.0 1.0 0.0 5 10 132.4
23 C 15.0 0.0 0.0 1.0 0.0 5 10 132.4
24 C 15.0 0.0 0.0 1.0 0.0 5 10 132.4
25 T 16.0 0.0 1.0 0.0 0.0 6 10 133.5
26 A 16.0 1.0 0.0 0.0 0.0 6 10 133.5
27 A 16.0 1.0 0.0 0.0 0.0 6 10 133.5
28 C 16.0 0.0 0.0 1.0 0.0 6 10 133.5
29 C 16.0 0.0 0.0 1.0 0.0 6 10 133.5
30 C 16.0 0.0 0.0 1.0 0.0 6 10 133.5
31 T 16.0 0.0 1.0 0.0 0.0 6 10 133.5
32 A 17.0 1.0 0.0 0.0 0.0 7 10 134.47058823529412
33 A 18.0 1.0 0.0 0.0 0.0 7 11 135.33333333333334
34 C 19.0 0.05263157894736842 0.0 0.9473684210526315 0.0 8 11 136.10526315789474
35 C 19.0 0.0 0.0 1.0 0.0 8 11 136.10526315789474
36 C 19.0 0.0 0.0 1.0 0.0 8 11 136.10526315789474
........
LmjF02
.........
The list file consists of the chromosome name followed by one record for each genomic position. To generate this list file, a minimum read depth of 10 and a minimum frequency of 15% were specified, as shown above. For example, the record "34 C 19.0 0.05263157894736842 0.0 0.9473684210526315 0.0 8 11 136.10526315789474" in the list file means the following:
- 34: Genomic position.
- C: Nucleotide called. C is called because it occurs with 94.7% frequency. A is not called because its frequency (5.3%) is below the 15% threshold that was specified when running the utility.
- 19.0: Read depth (coverage) at that position. The information corresponding to genomic positions 9 through 20 that was present in the allele file (see the output of findAlleles) is lost because those positions did not meet the minimum read depth of 10 that was specified in the command.
- 0.05263157894736842 0.0 0.9473684210526315 0.0: Normalized allele frequencies.
- 8 11: Forward and reverse reads at that genomic position.
- 136.10526315789474: Average mapping quality.
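The field layout described above can be read programmatically. The following is a minimal Python sketch, not part of PAINT. The names ListRecord, parse_record, and BASE_ORDER are hypothetical. The frequency column order A, T, C, G is an assumption inferred from the example output (A, T, and C each appear with frequency 1.0 in the first, second, and third frequency column respectively; the fourth is assumed to be G), so verify it against your own data.

```python
from typing import Dict, NamedTuple

# Assumed order of the four frequency columns, inferred from the example output.
BASE_ORDER = ("A", "T", "C", "G")


class ListRecord(NamedTuple):
    position: int            # genomic position
    called: str              # nucleotide(s) called, or "-"
    depth: float             # read depth (coverage)
    freqs: Dict[str, float]  # normalized allele frequencies
    forward: int             # forward reads
    reverse: int             # reverse reads
    mean_mapq: float         # average mapping quality


def parse_record(line: str) -> ListRecord:
    """Split one whitespace-separated position record into named fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    freqs = {base: float(value) for base, value in zip(BASE_ORDER, fields[3:7])}
    return ListRecord(
        position=int(fields[0]),
        called=fields[1],
        depth=float(fields[2]),
        freqs=freqs,
        forward=int(fields[7]),
        reverse=int(fields[8]),
        mean_mapq=float(fields[9]),
    )


record = parse_record(
    "34 C 19.0 0.05263157894736842 0.0 0.9473684210526315 0.0 "
    "8 11 136.10526315789474"
)
print(record.position, record.called, record.depth)  # 34 C 19.0
print(record.forward + record.reverse)               # 19
```

In this example record the forward and reverse read counts (8 and 11) add up to the reported read depth of 19. Chromosome name lines such as LmjF01 have a single field and would need to be handled separately before calling parse_record.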