
PAINT: Leishmania Sexual Reproductive Strategies As Resolved Through Computational Methods Designed for Aneuploid Genomes

Jahangheer S. Shaik, Deborah E. Dobson, David L. Sacks and Stephen M. Beverley





Generate a Merged Allele File

This page walks you through the steps required to generate a merged allele file using the AGELESS package.

Prerequisites

1) Align the FASTQ files to the reference genome of interest using your favourite aligner (Bowtie, BWA, Novoalign, etc.).
2) Sort the resulting binary alignment map (BAM) file using the samtools 'sort' function.
3) Generate allele files using the findAlleles utility of PAINT.
4) Generate list files using the listAlleles utility of PAINT.
5) Find parental SNVs using the findSNPsSomy utility of PAINT. Extract the homozygous SNVs from the parental lines and place them in a directory.

How to Run it?

The findAllelesSNPpositions utility of the PAINT package takes the parental homozygous SNV files and finds the markers at which the parental lines differ from each other. It then reports the allele composition of every file in the listFiles directory at those markers. Type java -jar PAINT.jar findAllelesSNPpositions -h to list the options:
  • -i is the directory where the list files are placed. List files are generated by the listAlleles utility of PAINT.
  • -j is the directory where the parental homozygous SNVs are placed. The SNVs are found using the findSNPsSomy utility of PAINT. Alternatively, you may use your favourite SNV caller.
  • -k is the format of the SNV files, either "vcf" or "tab". Set -k to "vcf" if the SNV files are in VCF format, and to "tab" otherwise.
  • -o is the output merged allele file.
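
The marker selection described above can be illustrated with a short Python sketch. This is not PAINT's code; it only shows the idea of keeping positions where the two parents carry different homozygous alleles. The function name, the toy data, and the simplifying assumption that a position absent from a parent's SNV set matches the reference are all hypothetical.

```python
# Illustrative sketch only, not PAINT's implementation.
# Each parent is a dict mapping (chromosome, position) -> homozygous alternate allele.
# ASSUMPTION: a position absent from a parent's SNV set carries the reference allele.

def find_markers(parent_a, parent_b, reference):
    """Return (site, allele_a, allele_b) for sites where the parents differ."""
    markers = []
    for site in sorted(set(parent_a) | set(parent_b)):
        allele_a = parent_a.get(site, reference[site])
        allele_b = parent_b.get(site, reference[site])
        if allele_a != allele_b:
            markers.append((site, allele_a, allele_b))
    return markers


# Hypothetical toy data (not taken from the demo data set).
reference = {("LmjF.01", 20): "A", ("LmjF.01", 57): "G", ("LmjF.02", 103): "C"}
parent_a = {("LmjF.01", 20): "T", ("LmjF.01", 57): "C"}
parent_b = {("LmjF.01", 57): "C", ("LmjF.02", 103): "T"}

markers = find_markers(parent_a, parent_b, reference)
for (chrom, pos), allele_a, allele_b in markers:
    print(chrom, pos, allele_a, allele_b)
```

In this toy example the site at LmjF.01 position 57 is dropped because both parents carry the same alternate allele there, so it cannot distinguish them.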

Command Example

java -jar PAINT.jar findAllelesSNPpositions -i "../DemoData/listFiles" -j "../DemoData/snpFile/homozygous/" -o "../DemoData/otherFiles/mergedAlleles.txt" -k "vcf"
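
If you launch the utility from a script, building the argument list programmatically avoids shell quoting mistakes. The following Python sketch assembles the command above using only the -i, -j, -k and -o options described on this page; the helper name build_command is hypothetical, and PAINT.jar is assumed to be in the working directory.

```python
import shlex

def build_command(list_dir, snv_dir, snv_format, output_file, jar="PAINT.jar"):
    """Assemble the findAllelesSNPpositions call as an argument list (hypothetical helper)."""
    if snv_format not in ("vcf", "tab"):
        raise ValueError('-k must be "vcf" or "tab"')
    return [
        "java", "-jar", jar, "findAllelesSNPpositions",
        "-i", list_dir,
        "-j", snv_dir,
        "-k", snv_format,
        "-o", output_file,
    ]


cmd = build_command(
    "../DemoData/listFiles",
    "../DemoData/snpFile/homozygous/",
    "vcf",
    "../DemoData/otherFiles/mergedAlleles.txt",
)
print(shlex.join(cmd))  # shell-safe rendering of the command
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```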

Output

The output of the findAllelesSNPpositions utility program looks as follows:
[Image: example merged allele file]
Markers are the genomic positions at which the two parental lines differ from each other. The reference and alternate alleles are also listed in the merged allele file. The subsequent columns contain the allele composition of each file in the listFiles directory (the -i option of findAllelesSNPpositions). FV1_SAT and LV39c5HYG are the parental lines; as can be seen, they are homozygous for different alleles at these loci. The hybrids, as expected, have contributions from both parental lines and are therefore heterozygous.

Missing data is represented by "-". Users can filter out markers where a parental line has missing data (for example, LmjF.01 position 20 in FV1_SAT), for instance with the "filter" option in Excel. If you filter, apply the same filtering to both the merged allele file and the merged allele frequency file.
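
As an alternative to filtering in Excel, the same step can be scripted. The Python sketch below drops markers where any parental column contains "-". It assumes a tab-delimited file with a header row; the column names Chromosome, Position, Ref, Alt and Hybrid1, as well as the genotype strings in the cells, are hypothetical and should be adjusted to match your actual file.

```python
import csv
import io

def drop_missing_parent_markers(rows, parent_columns, missing="-"):
    """Keep only the rows in which every parental column has data."""
    return [
        row for row in rows
        if all(row[col] != missing for col in parent_columns)
    ]


# Hypothetical excerpt; the real header names and cell contents may differ.
example = (
    "Chromosome\tPosition\tRef\tAlt\tFV1_SAT\tLV39c5HYG\tHybrid1\n"
    "LmjF.01\t20\tA\tT\t-\tTT\tAT\n"
    "LmjF.01\t57\tG\tC\tGG\tCC\tGC\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
kept = drop_missing_parent_markers(rows, ["FV1_SAT", "LV39c5HYG"])

# Reuse the retained sites to filter the merged allele frequency file identically.
kept_sites = {(row["Chromosome"], row["Position"]) for row in kept}
for row in kept:
    print(row["Chromosome"], row["Position"])
```

Collecting the retained sites in kept_sites makes it straightforward to apply the identical selection to the merged allele frequency file, so the two files stay in step.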