Pathogenicity of Mutation ANalyzer (Beta)
What is PathoMAN?
Cancer care professionals are confronted with interpreting results from multiplexed gene sequencing of patients at hereditary risk for cancer. Assessments for variant classification now require orthogonal data searches, requiring aggregation of multiple lines of evidence from diverse resources. The burden of evidence for each variant to meet thresholds for pathogenicity or actionability now poses a growing challenge for those seeking to counsel patients and families following germline genetic testing. A computational algorithm that automates, provides uniformity and significantly accelerates this interpretive process is needed. The tool described here, Pathogenicity of Mutation Analyzer (PathoMAN) automates germline genomic variant curation from clinical sequencing based on ACMG guidelines. PathoMAN aggregates multiple tracks of genomic, protein and disease specific information from public sources. We compared expert manually curated variant data from studies on (i) prostate cancer (ii) breast cancer and (iii) ClinVar to assess performance. PathoMAN achieves high concordance (83.1% pathogenic, 75.5% benign) and negligible discordance (0.04% pathogenic, 0.9% benign) when contrasted against expert curation. Some loss of resolution (8.6% pathogenic, 23.64% benign) and gain of resolution (6.6% pathogenic, 1.6% benign) was also observed. We highlight the advantages and weaknesses related to the programmable automation of variant classification. We also propose a new nosology for the five ACMG classes to facilitate more accurate reporting to ClinVar. The proposed refinements will enhance utility of ClinVar to allow further automation in cancer genetics. PathoMAN will reduce the manual workload of domain level experts. It provides a substantial advance in rapid classification of genetic variants by generating robust models using a knowledge-base of diverse genetic data.
Towards automation of germline variant curation in clinical cancer genetics
Vignesh Ravichandran, Zarina Shameer, Yelena Kemel, Michael Walsh, Karen Cadoo, Steven Lipkin, Diana Mandelker, Liying Zhang, Zsofia Stadler, Mark Robson, Kenneth Offit, Vijai Joseph
Genetics in Medicine 2019; https://doi.org/10.1038/s41436-019-0463-8
A preprint is available on bioRxiv.
What should my input file look like?
The input file for the Batch upload option is a comma-separated values file with a .csv extension, such as one exported from Microsoft Excel. Make sure the last line of your input file ends with a carriage return and line feed (CR LF). You can use a free text editor such as Notepad++ to check the line endings and confirm that the file is well formed.
PathoMAN requires six comma-separated columns in the following order, without a header row: chromosome, position, reference allele, alternate allele, allele count, allele number. Neither the reference nor the alternate allele may be "-" to mark deletions.
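As a rough illustration of the format described above, the following Python sketch checks a batch file before upload. The function name validate_rows and the individual checks (integer position, allele count not larger than allele number) are our assumptions for illustration only; they are not PathoMAN's own validation code.

```python
import csv
import io


def validate_rows(text):
    """Return a list of (line_number, message) problems found in CSV text.

    Hypothetical helper: the checks below are assumptions derived from the
    six-column description, not PathoMAN's actual input validation.
    """
    problems = []
    for n, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # ignore completely blank lines
        if len(row) != 6:
            problems.append((n, "expected 6 columns, found %d" % len(row)))
            continue
        chrom, pos, ref, alt, ac, an = (field.strip() for field in row)
        if not pos.isdecimal():
            # This also catches a header row, which the format does not allow.
            problems.append((n, "position is not an integer"))
        for label, allele in (("reference", ref), ("alternate", alt)):
            if allele == "-" or not allele:
                problems.append((n, label + " allele must not be '-' or empty"))
        if not (ac.isdecimal() and an.isdecimal()):
            problems.append((n, "allele count and allele number must be integers"))
        elif int(ac) > int(an):
            problems.append((n, "allele count exceeds allele number"))
    if not text.endswith("\n"):
        # Line number 0 marks a file-level problem.
        problems.append((0, "last line is not terminated by a line break"))
    return problems


# Made-up example row: chromosome, position, ref, alt, allele count, allele number
print(validate_rows("1,12345,A,G,2,1000\r\n"))
```

An empty list means that no problems were found; each tuple otherwise names the offending line.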
Determination of PVS1: null variants
Curated lists of cancer-causing genes from the literature, various genetic testing panels, and OMIM genes that cause autosomal dominant disease were aggregated. If the variant in a gene from this list was a Tier 1 mutation (frameshift, truncating, essential splice site or initiation codon variant) and was not present in the last exon, then PVS1 was scored 1. A gene with a functional domain encoded by the last exon, such as ATM, was an exception to the last-exon criterion. PVS1 was not scored for BRCA2 mutations observed after the polymorphic stop rs11571833 (K3326X).
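The PVS1 rule above can be sketched as a small decision function. This is a minimal illustration under our own assumptions: the consequence labels, the function name score_pvs1 and the way the last-exon and codon information are passed in are hypothetical and do not reflect PathoMAN's internal data model.

```python
# Hypothetical consequence labels standing in for the Tier 1 classes in the text.
TIER1 = {"frameshift", "truncating", "essential_splice", "initiation_codon"}

# Genes whose last exon encodes a functional domain (ATM is the example given).
LAST_EXON_EXCEPTIONS = {"ATM"}

# Codon of the polymorphic BRCA2 stop rs11571833 (K3326X).
BRCA2_POLYMORPHIC_STOP = 3326


def score_pvs1(gene, consequence, in_last_exon, codon, curated_genes):
    """Return 1 if PVS1 applies under the rule described above, else 0."""
    if gene not in curated_genes or consequence not in TIER1:
        return 0
    # BRCA2 null variants downstream of K3326X are not scored.
    if gene == "BRCA2" and codon is not None and codon > BRCA2_POLYMORPHIC_STOP:
        return 0
    # Last-exon null variants are not scored unless the gene is an exception.
    if in_last_exon and gene not in LAST_EXON_EXCEPTIONS:
        return 0
    return 1


genes = {"ATM", "BRCA2", "TP53"}  # toy stand-in for the aggregated gene list
print(score_pvs1("ATM", "truncating", True, 3000, genes))
```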
Determination of PS1 and PM5: known pathogenic missense
If a missense variant was reported as pathogenic by multiple submitters with no conflicts and had a gold star rating of 2 or more in ClinVar, irrespective of the alternate allele but leading to the same amino acid change, then PS1 was scored 1. If a missense variant was not seen in ClinVar but another pathogenic missense variant was reported at the same amino acid position with a different amino acid change, then PM5 was coded 1.
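A minimal sketch of the PS1/PM5 logic, assuming ClinVar has already been reduced to protein-level records. The record fields (gene, position, alt_aa, significance, stars) and the function name are hypothetical, and we assume that PM5 accepts any pathogenic record at the residue because the text does not state a star requirement for it.

```python
def score_ps1_pm5(gene, position, alt_aa, clinvar_records):
    """Return (PS1, PM5) for a missense variant described at the protein level."""
    at_residue = [r for r in clinvar_records
                  if r["gene"] == gene and r["position"] == position]
    same_change = [r for r in at_residue if r["alt_aa"] == alt_aa]
    other_change = [r for r in at_residue if r["alt_aa"] != alt_aa]

    # PS1: same amino acid change, pathogenic, gold star rating of 2 or more.
    ps1 = int(any(r["significance"] == "pathogenic" and r["stars"] >= 2
                  for r in same_change))
    # PM5: this change is not in ClinVar, but a different pathogenic change
    # is reported at the same residue.
    pm5 = int(not same_change and
              any(r["significance"] == "pathogenic" for r in other_change))
    return ps1, pm5


# Toy record for a made-up gene; not real ClinVar content.
records = [{"gene": "GENE1", "position": 175, "alt_aa": "H",
            "significance": "pathogenic", "stars": 2}]
print(score_ps1_pm5("GENE1", 175, "C", records))
```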
Determination of PS3 or BS3: strong prior evidence of pathogenic or benign
An aggregated, select list of reported pathogenic and benign variants from the literature was used as a knowledge-base for PS3/BS3. If the variant was in the curated list (missense variants in BRCA1/2 reported by ENIGMA), or if it was a truncating variant that ClinVar reported as pathogenic or benign with a gold star rating of 2 or more, then PS3 or BS3 was coded 1. Our selective use of ClinVar assigns higher confidence to the truncating variants and to select missense variants reported by domain experts as either pathogenic or benign. We also include published saturation genome editing experimental evidence for BRCA1.
Determination of PS4, PM2, BA1, BS1 and BS2: rarity and enrichment of variant in cases
If a variant was present in aggregated public controls such as the ExAC-noTCGA dataset and gnomAD with an allele frequency greater than 5%, then the variant was coded 1 for BA1. If the variant had an allele frequency in public controls between 1% and 5%, then BS1 was coded 1. If the variant was also present in homozygous form in the public controls, then BS2 was coded 1. If the variant was absent from the ExAC-noTCGA data or the gnomAD general population data, then PM2 was coded 1. For variants not scored as BA1, BS1, BS2 or PM2, Fisher's exact test was performed against the user-defined population (ExAC-noTCGA or gnomAD). The population included all the major groups (NFE, FIN, SAS, AMR, AFR, EAS) in ExAC and the ASJ population in the gnomAD database. If the odds ratio was greater than 3 and the p-value was less than 0.05, then the variant was given a score of 1 for PS4. This was a robust measure for weighting pathogenicity in uncommon variants and non-singletons.
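The frequency thresholds and the case-control test above can be sketched with the Python standard library alone. Several points are our assumptions rather than PathoMAN's documented behaviour: the function names, scoring BS2 whenever any homozygote is seen in controls, and the use of a one-sided Fisher's exact test for enrichment in cases (the text does not say whether the test is one- or two-sided).

```python
from math import comb


def score_frequency(control_ac, control_an, control_hom):
    """Score BA1, BS1, BS2 and PM2 from public control counts."""
    scores = {"BA1": 0, "BS1": 0, "BS2": 0, "PM2": 0}
    if control_ac == 0 or control_an == 0:
        scores["PM2"] = 1  # absent from the public controls
        return scores
    af = control_ac / control_an
    if af > 0.05:
        scores["BA1"] = 1
    elif af >= 0.01:
        scores["BS1"] = 1
    if control_hom > 0:  # assumption: any homozygote in controls scores BS2
        scores["BS2"] = 1
    return scores


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of tables with a carrier count in
    cases at least as large as the observed value a.
    """
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)
    upper = min(n1, k)
    return sum(comb(n1, x) * comb(n2, k - x) for x in range(a, upper + 1)) / total


def score_ps4(case_ac, case_an, control_ac, control_an):
    """Return 1 if odds ratio > 3 and p < 0.05, as described in the text.

    Intended only for variants not already scored BA1, BS1, BS2 or PM2.
    """
    a, b = case_ac, case_an - case_ac
    c, d = control_ac, control_an - control_ac
    if b * c == 0:
        odds = float("inf") if a * d > 0 else 0.0
    else:
        odds = (a * d) / (b * c)
    p = fisher_exact_greater(a, b, c, d)
    return int(odds > 3 and p < 0.05)


# Made-up counts: 10 of 1,000 case alleles versus 20 of 100,000 control alleles.
print(score_ps4(10, 1000, 20, 100000))
```

A library routine such as scipy.stats.fisher_exact could replace the hand-written test; the standard-library version is used here only to keep the sketch free of dependencies.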
Determination of PM1: membership in a protein domain of functional significance
If the amino acid altered by the mutation was located in a protein domain, or was a residue involved in signalling, in binding with other proteins, or in an active site, then PM1 was coded 1. Currently, we use UniProt for annotation of protein features. We acknowledge the incremental value of a curated somatic hotspot list (http://cancerhotspots.org) to aid in this classification.
Determination of PM4 and BP3: genomic complexity and context of the variant
If the mutation was an in-frame insertion/deletion or a stop loss in a non-repetitive region, then the variant was coded 1 for PM4. If instead it was an in-frame insertion/deletion in a repetitive region, then BP3 was coded 1. The RepeatMasker track from the UCSC Genome Browser was used for these criteria.
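The PM4/BP3 rule reduces to a two-way decision on variant consequence and repeat context. The sketch below assumes hypothetical consequence labels and a precomputed flag saying whether the variant overlaps a RepeatMasker interval; neither is taken from PathoMAN's code.

```python
# Hypothetical consequence labels for in-frame insertions and deletions.
INFRAME = {"inframe_insertion", "inframe_deletion"}


def score_pm4_bp3(consequence, in_repeat_region):
    """Return (PM4, BP3) following the rule described above."""
    is_inframe = consequence in INFRAME
    is_stop_loss = consequence == "stop_loss"
    if (is_inframe or is_stop_loss) and not in_repeat_region:
        return 1, 0  # protein length change outside a repeat
    if is_inframe and in_repeat_region:
        return 0, 1  # in-frame indel inside a repetitive region
    return 0, 0


print(score_pm4_bp3("inframe_deletion", in_repeat_region=True))
```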
Determination of PP3 and BP4: in silico prediction of deleteriousness
We used ANNOVAR to annotate the variants with the dbNSFP track to obtain deleteriousness predictions from 12 in silico algorithms: CADD, FATHMM, LRT, MutationAssessor, MutationTaster, PROVEAN, Polyphen2-HDIV, Polyphen2-HVAR, RadialSVM, SIFT, VEST3 and M-CAP. Use of an ensemble of in silico prediction algorithms improves prediction across a wide range of genes and cancer types. Hence, if more than 7 (>50%) algorithms called a variant deleterious, then the variant was coded 1 for PP3. Otherwise, BP4 was coded 1. In contrast, many of the older ClinVar records relied on only SIFT or Polyphen.
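The ensemble vote can be sketched as a simple count over the 12 tools. The function name and the boolean input format are hypothetical. The default cutoff of 8 reads "more than 7" literally; because the text also describes the cutoff as more than 50% of the tools, it is exposed as a parameter rather than fixed.

```python
TOOLS = ["CADD", "FATHMM", "LRT", "MutationAssessor", "MutationTaster",
         "PROVEAN", "Polyphen2-HDIV", "Polyphen2-HVAR", "RadialSVM",
         "SIFT", "VEST3", "M-CAP"]


def score_pp3_bp4(predictions, min_deleterious=8):
    """Return (PP3, BP4) from per-tool deleteriousness calls.

    predictions maps a tool name to True when that tool calls the variant
    deleterious. Tools missing from the mapping count as not deleterious,
    which is an assumption of this sketch.
    """
    votes = sum(1 for tool in TOOLS if predictions.get(tool, False))
    if votes >= min_deleterious:
        return 1, 0
    return 0, 1


# Made-up calls: the first eight tools predict a deleterious effect.
calls = {tool: index < 8 for index, tool in enumerate(TOOLS)}
print(score_pp3_bp4(calls))
```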
Determination of PP5 and BP6: known variant with insufficient details
If the variant was in ClinVar with a gold star rating of less than 2 and was pathogenic, then PP5 was scored 1. If it was benign, then BP6 was scored 1. Thus, a variant reported once in ClinVar does not command high value in the pathogenicity determination, but can be upgraded depending on other ancillary information tagged to it.
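As a small sketch of this rule, assuming the ClinVar assertion has been reduced to a significance label and a star count (both hypothetical inputs, with None meaning the variant is not in ClinVar):

```python
def score_pp5_bp6(significance, stars):
    """Return (PP5, BP6) for a ClinVar record with fewer than 2 gold stars."""
    if significance is None or stars >= 2:
        return 0, 0  # absent from ClinVar, or handled by higher-confidence criteria
    if significance == "pathogenic":
        return 1, 0
    if significance == "benign":
        return 0, 1
    return 0, 0


print(score_pp5_bp6("pathogenic", 1))
```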
Determination of BP7: synonymous variants
For synonymous (silent) mutations, we used the adaptive boosting and random forest scores from dbscSNV; if the scores were less than 0.6, then BP7 was scored 1. dbscSNV is a database of precomputed prediction scores for SNVs that may occur in splice consensus regions. Higher scores reflect a stronger predicted effect of the variant on splicing.
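A minimal sketch of the BP7 check follows. Two points are assumptions on our part because the text leaves them open: both dbscSNV scores must fall below the cutoff, and a variant with no dbscSNV score is left unscored.

```python
def score_bp7(is_synonymous, ada_score, rf_score, cutoff=0.6):
    """Return 1 if a synonymous variant has low predicted splicing impact.

    ada_score and rf_score are the dbscSNV adaptive boosting and random
    forest scores; None means no precomputed score was available.
    """
    if not is_synonymous:
        return 0
    if ada_score is None or rf_score is None:
        return 0  # assumption: do not score BP7 without both predictions
    return int(ada_score < cutoff and rf_score < cutoff)


print(score_bp7(True, 0.1, 0.2))
```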
Determination of PP2 and BP1: missense driven disease genes
We used ClinVar to collect all reported missense variants per gene. We then selected the confident (gold star 2 or more) pathogenic and benign calls. Genes with a higher ratio of pathogenic to benign variants were called missense-driven pathogenic genes, and genes with a lower ratio of pathogenic to benign variants were called missense-driven benign genes. Any missense variant in the missense-driven pathogenic gene list was scored PP2 and any missense variant in the missense-driven benign gene list was scored BP1. Classic examples of missense-driven pathogenic genes are PTEN and TP53. Almost all PTEN missense mutations were pathogenic.
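The gene lists can be derived by counting confident calls per gene. In this sketch the record layout, the function name and the two ratio cutoffs are hypothetical; the text does not give the cutoffs, so they are required arguments rather than defaults.

```python
from collections import Counter


def build_gene_lists(records, high_ratio, low_ratio, min_stars=2):
    """Split genes into missense-driven pathogenic and benign sets.

    records is an iterable of (gene, significance, stars) tuples for
    missense variants. high_ratio and low_ratio are caller-chosen cutoffs
    on the pathogenic-to-benign ratio.
    """
    pathogenic, benign = Counter(), Counter()
    for gene, significance, stars in records:
        if stars < min_stars:
            continue  # keep only confident calls
        if significance == "pathogenic":
            pathogenic[gene] += 1
        elif significance == "benign":
            benign[gene] += 1

    pathogenic_genes, benign_genes = set(), set()
    for gene in set(pathogenic) | set(benign):
        if benign[gene]:
            ratio = pathogenic[gene] / benign[gene]
        else:
            ratio = float("inf")  # only pathogenic calls were seen
        if ratio >= high_ratio:
            pathogenic_genes.add(gene)
        elif ratio <= low_ratio:
            benign_genes.add(gene)
    return pathogenic_genes, benign_genes


# Toy data for two made-up genes; not real ClinVar counts.
toy = ([("GENE_A", "pathogenic", 2)] * 9 + [("GENE_A", "benign", 2)] +
       [("GENE_B", "pathogenic", 2)] + [("GENE_B", "benign", 3)] * 9)
print(build_gene_lists(toy, high_ratio=3, low_ratio=1 / 3))
```

A missense variant would then be scored PP2 if its gene is in the first set and BP1 if it is in the second.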
Determination of PS2, PM6, PP1 and BS4: de novo and co-segregation
ACMG criteria require both paternity and maternity to be confirmed for de novo variants. PathoMAN requires user input for de novo status and segregation information for classification. The three options for de novo evidence are: de novo with both paternity and maternity confirmed, de novo without paternity or maternity confirmed, and no de novo evidence at all. Similarly, for co-segregation evidence, the options provided are co-segregation with the disease, lack of co-segregation with the disease, and no co-segregation evidence. For larger trio studies, we expect in the future to include a module that looks for de novo variants computationally from a multisample VCF and pedigree file information.
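The user-supplied options map onto the four criteria in the heading. The sketch below follows the ACMG definitions (PS2 for confirmed de novo, PM6 for unconfirmed de novo, PP1 for co-segregation, BS4 for lack of segregation); the option strings and the function name are hypothetical and are not PathoMAN's form values.

```python
# Hypothetical option keys; None means the option scores no criterion.
DE_NOVO_OPTIONS = {
    "confirmed": "PS2",    # de novo, paternity and maternity confirmed
    "unconfirmed": "PM6",  # de novo, parentage not confirmed
    "none": None,          # no de novo evidence
}
SEGREGATION_OPTIONS = {
    "cosegregates": "PP1",  # co-segregation with the disease
    "lack": "BS4",          # lack of co-segregation with the disease
    "none": None,           # no co-segregation evidence
}


def score_family_evidence(de_novo, segregation):
    """Translate the two user selections into PS2, PM6, PP1 and BS4 scores."""
    scores = {"PS2": 0, "PM6": 0, "PP1": 0, "BS4": 0}
    for choice, table in ((de_novo, DE_NOVO_OPTIONS),
                          (segregation, SEGREGATION_OPTIONS)):
        criterion = table[choice]  # raises KeyError for an unknown option
        if criterion is not None:
            scores[criterion] = 1
    return scores


print(score_family_evidence("unconfirmed", "cosegregates"))
```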
Determination of PM3, BP2: recessive inheritance
These two categories apply to variants associated with recessive disease. Germline cancer variants are generally associated with cancer predisposition syndromes that follow an autosomal dominant inheritance pattern. Hence, PM3 and BP2 do not apply to the current PathoMAN variant classification and were scored 0. We expect to add compound heterozygosity to the next version of the algorithm. We will also incorporate into the knowledge-base select variants in mismatch repair genes that can be classified by this category.
Determination of PP4, BP5: disease specific conditions
PP4 and BP5, per ACMG, are two criteria that consider a patient's predisposition to a specific disease with a single-gene aetiology. Cancer is a disease with multi-gene aetiology, although certain genes such as RB1 may be strong candidates for PP4. Once we incorporate a pedigree file and cancer phenotype variables, we should be able to apply these criteria. In the current iteration, these were scored 0. The final classification schema was based on the original ACMG scoring, and pathogenicity was predicted for the datasets described.
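For orientation, the combining rules of the original ACMG guidelines can be sketched as a function over the 0/1 criterion scores described in the sections above. This is our reading of the published ACMG rules, written for illustration; the function name is hypothetical and PathoMAN's own implementation may differ in detail.

```python
def classify(scores):
    """Combine 0/1 criterion scores into one of the five ACMG classes."""
    def count(prefix, last):
        return sum(scores.get("%s%d" % (prefix, i), 0) for i in range(1, last + 1))

    pvs, ps, pm, pp = count("PVS", 1), count("PS", 4), count("PM", 6), count("PP", 5)
    ba, bs, bp = count("BA", 1), count("BS", 4), count("BP", 7)

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence defaults to uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


print(classify({"PVS1": 1, "PM2": 1}))
```

Criteria that are not scored in the current iteration, such as PM3, BP2, PP4 and BP5, simply contribute 0 to the counts.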