DECA User Guide
Introduction
DECA is a copy number variant (CNV) caller built on Apache Spark for rapid variant calling in cluster and cloud computing environments. DECA is built on ADAM’s APIs and is a reimplementation of the XHMM copy number variant caller. On a single machine, DECA provides an order of magnitude performance improvement over XHMM. On a 1,024 core cluster, DECA can call copy number variants from the 1,000 Genomes exome reads in approximately 5 hours. DECA is highly concordant with XHMM, with >93% exact breakpoint concordance and <0.01% discordant CNV calls.
Running DECA
DECA is run through the deca-submit command line:
./bin/deca-submit
Using SPARK_SUBMIT=/usr/local/bin/spark-2.2.1-bin-hadoop2.7/bin/spark-submit
Usage: deca-submit [<spark-args> --] <deca-args> [-version]
Choose one of the following commands:
normalize : Normalize XHMM read-depth matrix
coverage : Generate XHMM read depth matrix from read data
discover : Call CNVs from normalized read matrix
normalize_and_discover : Normalize XHMM read-depth matrix and discover CNVs
cnv : Discover CNVs from raw read data
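The usage line above places optional Spark arguments before a `--` separator and the DECA command after it. The following is a minimal dry-run sketch of that layout, not output from DECA itself. The `deca_cmd` helper and the `SPARK_ARGS` values are hypothetical names introduced for illustration; `--master` and `--driver-memory` are standard spark-submit options, and the arguments of the `cnv` command are left out because they are not listed here.

```shell
#!/bin/sh
# Path to the launcher, as invoked above; override via the environment if needed.
DECA_SUBMIT="${DECA_SUBMIT:-./bin/deca-submit}"

# Spark arguments go BEFORE the "--" separator (placeholder values, an assumption).
SPARK_ARGS="--master local[4] --driver-memory 8g"

# Hypothetical helper: print the command line that would be run for a DECA
# command and its arguments, following
#   deca-submit [<spark-args> --] <deca-args>
deca_cmd() {
  printf '%s\n' "$DECA_SUBMIT $SPARK_ARGS -- $*"
}

# Dry run for the "cnv" command; its own arguments are omitted here.
deca_cmd cnv
```

Printing the command rather than executing it makes it easy to check that the Spark options and the DECA command end up on the correct side of the separator before submitting a job.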
The deca-submit script follows the same conventions as the adam-submit command line, which is described in the ADAM documentation. As a result, just like ADAM, DECA can be deployed on a local machine, on AWS, on an in-house cluster running YARN or SLURM, or using Toil.
We provide a Toil workflow for running DECA as part of the bdgenomics.workflows package. bdgenomics.workflows can be installed with pip.