Using a VCF File

Last updated: 4 years ago, Time to read: 25 mins

Using VCF file on ProteinPaint

Current implementation works for SNV/indel data only.

Preparing a VCF file

Input file: file.vcf

bgzip file.vcf
tabix -p vcf file.vcf.gz

If successful, this will produce two files “file.vcf.gz” and “file.vcf.gz.tbi”. The latter is the TBI index file. CSI index also works.

Hosting a VCF file

Host it on the ProteinPaint server:

1. Put both the .gz file and the index file in the same directory, inside the directory of the ProteinPaint server.
   a. The directory is specified in the file “serverconfig.json”.
2. Obtain the path to the .gz file relative to that directory, not including the directory itself.
   a. E.g. for /path/to/file.vcf.gz, the path would be “path/to/file.vcf.gz”.

Host it on a web server:

Put the .gz and index files on the web server, and obtain the URL of the .gz file for submission. In most cases ProteinPaint expects to find the index file at the same location as the .gz file.
If the index file does not share a URL with the .gz file, provide its URL via the .indexURL attribute in the JSON definition.

Showing a VCF file on ProteinPaint

Via URL parameters

Examples:

http://server?block=on&genome=[genome]&vcffile=[name],[file path]
http://server?block=on&genome=[genome]&vcfurl=[name],[file URL]

The “vcfurl” parameter cannot accept a separate URL for the index file. In that case, use the embedding API instead.

This will launch a genome browser view showing the VCF track.

Via embedding API

If the index file is at a different URL from the .gz file, the embedding API must be used; its “indexURL” attribute provides an additional URL for the index file.
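As a minimal sketch, a track object with a separately hosted index might look like the following. The “indexURL” attribute comes from the text above; the “url” attribute name and both example.org addresses are assumptions made for illustration and should be checked against the track JSON documentation.

```javascript
// Sketch of a VCF track whose index file lives at a different URL.
// ASSUMPTION: "url" is the attribute naming the remote .gz file.
// ASSUMPTION: both URLs below are placeholders, not real data.
const vcfTrack = {
    type: 'vcf',
    name: 'Remote VCF',
    url: 'https://data.example.org/vcf/file.vcf.gz',
    indexURL: 'https://index.example.org/tbi/file.vcf.gz.tbi'
}

// The object would then be placed in the "tracks" array passed to runproteinpaint().
const embedArgs = { block: true, genome: 'hg38', tracks: [vcfTrack] }
```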

General introduction of embedding API.

General introduction on JSON definition for tracks.

See section Summary of VCF track object for an overview of the VCF track object.

Variant annotation formats

This section describes supported formats for functional annotation of variants. The annotation is optional and mostly for coding variants. The annotation will color-code variants using ProteinPaint’s built-in protein mutation classes.

A shortcoming is that the built-in classes are hardcoded and cannot be customized.

ProteinPaint fully supports custom defined classes through an alternative method. See the section on highlighting and filtering with INFO fields.

VEP format

VCF output from VEP is accepted by ProteinPaint. Details are in the “VCF output” section of http://useast.ensembl.org/info/docs/tools/vep/vep_formats.html.

The annotation is encoded in an INFO field called CSQ. It supports functional annotation of multiple isoforms on a given allele.

The Description of the CSQ field must follow a fixed format, so that the names and order of the sub-fields can be identified. The following is one example:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|ALLELE_NUM|DISTANCE|STRAND|SYMBOL|SYMBOL_SOURCE|HGNC_ID|BIOTYPE|CANONICAL|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|SIFT|PolyPhen|EXON|INTRON|DOMAINS|HGVSc|HGVSp|GMAF|AFR_MAF|AMR_MAF|ASN_MAF|EUR_MAF|AA_MAF|EA_MAF|CLIN_SIG|SOMATIC|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|LoF_info|LoF_flags|LoF_filter|LoF">
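To illustrate why the Description format matters, here is a small sketch of how sub-field names could be read from the meta line and then used to label one CSQ entry. It is an illustrative parser, not ProteinPaint's own code; the function names, the shortened meta line, and the sample entry are all hypothetical.

```javascript
// Read the sub-field names that follow "Format: " in the CSQ Description.
function parseCsqFieldNames(metaLine) {
    const m = metaLine.match(/Format: ([^"]+)"/)
    if (!m) return null
    return m[1].trim().split('|')
}

// Pair each '|'-separated value of one CSQ entry with its sub-field name.
function labelCsqEntry(fieldNames, entry) {
    const values = entry.split('|')
    const out = {}
    fieldNames.forEach((name, i) => { out[name] = values[i] || '' })
    return out
}

// Shortened, hypothetical meta line and entry for demonstration only.
const meta = '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence">'
const names = parseCsqFieldNames(meta)
const csq = labelCsqEntry(names, 'T|TUBB8|NM_177987.2|Transcript|missense_variant')
console.log(csq.Feature_type, csq.Consequence)
```

Because the values are positional, a Description that omits or reorders sub-field names would cause values to be attached to the wrong names.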

Required CSQ sub-fields:

Feature_type
- When the value of Feature_type is “Transcript” for a variant, the value of “Feature” should be an isoform name, e.g. a RefSeq or ENSEMBL isoform. If multiple isoform names are joined by commas, the first is used.
Consequence
- The VEP consequence notation (http://useast.ensembl.org/info/genome/variation/predicted_data.html) is converted to the internal notation used by ProteinPaint. A match table is shown below.
HGVSp
Protein_position
Amino_acids
HGVSc
Existing_variation
- If any of the fields above is available, its value is used as the label of the variant.

VEP notation	ProteinPaint notation	Note
transcript_ablation	deletion, intragenic	Not yet supported.
splice_acceptor_variant	splice
splice_donor_variant	splice
stop_gained	nonsense
frameshift_variant	frameshift
stop_lost	nonsense
start_lost	nonsense
transcript_amplification	nonstandard	Not yet supported.
inframe_insertion	proteinins
inframe_deletion	proteindel
missense_variant	missense
protein_altering_variant	nonsense	Not explicitly supported.
splice_region_variant	splice_region
incomplete_terminal_codon_variant	nonsense
stop_retained_variant	silent	Not explicitly supported.
synonymous_variant	silent
coding_sequence_variant	nonstandard	Not explicitly supported.
mature_miRNA_variant	exon	Not yet supported.
5_prime_UTR_variant	utr_5
3_prime_UTR_variant	utr_3
non_coding_transcript_exon_variant	exon
intron_variant	intron
NMD_transcript_variant	silent	Not equivalent.
non_coding_transcript_variant	exon
upstream_gene_variant	noncoding	Not explicitly supported.
downstream_gene_variant	noncoding	Not explicitly supported.
TFBS_ablation	noncoding	Not yet supported.
TFBS_amplification	noncoding	Not yet supported.
TF_binding_site_variant	noncoding	Not yet supported.
regulatory_region_ablation	noncoding	Not yet supported.
regulatory_region_amplification	noncoding	Not yet supported.
feature_elongation	noncoding	Not yet supported.
regulatory_region_variant	noncoding	Not yet supported.
feature_truncation	noncoding	Not yet supported.
intergenic_variant	noncoding	Not yet supported.

ANN format

As generated by SnpEff: http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf

This format uses the “ANN” INFO field. The meta line:

##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">

Example annotation data:

ANN=T|5_prime_UTR_variant|MODIFIER|TUBB8|TUBB8|transcript|XM_011519459.2|protein_coding|1/4|c.-177G>A|||||750|,T|upstream_gene_variant|MODIFIER|TUBB8|TUBB8|transcript|NM_177987.2|protein_coding||c.-187G>A|||||187|,T|upstream_gene_variant|MODIFIER|TUBB8|TUBB8|transcript|XM_011519460.2|protein_coding||c.-1248G>A|||||961|,T|upstream_gene_variant|MODIFIER|TUBB8|TUBB8|transcript|XM_017016193.1|protein_coding||c.-1370G>A|||||967|,T|upstream_gene_variant|MODIFIER|TUBB8|TUBB8|transcript|XM_017016192.1|protein_coding||c.-1370G>A|||||955|

The “Annotation” notations are converted to the built-in mutation class using the same set of VEP rules.
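The sketch below shows one way an ANN value could be split into per-isoform annotations and mapped to a mutation class. It uses the sub-field order from the meta line above and only three rows of the VEP table; the function name and the fallback to “nonstandard” for unlisted terms are assumptions, not documented ProteinPaint behavior.

```javascript
// Three rows taken from the VEP-to-ProteinPaint table in this document.
const classByTerm = {
    missense_variant: 'missense',
    '5_prime_UTR_variant': 'utr_5',
    upstream_gene_variant: 'noncoding'
}

// Split an ANN value into entries (','), then into sub-fields ('|').
// Positions follow the meta line: 0 Allele, 1 Annotation, 6 Feature_ID, 9 HGVS.c
function parseAnn(annValue) {
    return annValue.split(',').map(entry => {
        const f = entry.split('|')
        return {
            allele: f[0],
            annotation: f[1],
            isoform: f[6],
            hgvsc: f[9],
            // ASSUMPTION: unlisted terms fall back to 'nonstandard'
            mclass: classByTerm[f[1]] || 'nonstandard'
        }
    })
}

// First two entries of the example annotation data above.
const annotations = parseAnn(
    'T|5_prime_UTR_variant|MODIFIER|TUBB8|TUBB8|transcript|XM_011519459.2|protein_coding|1/4|c.-177G>A|||||750|,' +
    'T|upstream_gene_variant|MODIFIER|TUBB8|TUBB8|transcript|NM_177987.2|protein_coding||c.-187G>A|||||187|'
)
console.log(annotations.map(a => a.isoform + ' ' + a.mclass))
```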

In-house format

This format uses multiple INFO fields to provide per-alt allele values:

##INFO=<ID=AV_POS,Number=A,Type=Integer,Description="Annovar+: adjusted allele position in order of ALT column">
##INFO=<ID=AV_REF,Number=A,Type=String,Description="Annovar+: adjusted reference allele in order of ALT column">
##INFO=<ID=AV_ALT,Number=A,Type=String,Description="Annovar+: adjusted variant allele in order of ALT column">
##INFO=<ID=AV_ACC,Number=A,Type=String,Description="Annovar+: isoform accession in order of ALT column">
##INFO=<ID=AV_AA,Number=A,Type=String,Description="Annovar+: amino acid change in order of ALT column">
##INFO=<ID=AV_CLASS,Number=A,Type=String,Description="Annovar+: variant class in order of ALT column">

Example annotation:

AV_POS=69655;AV_REF=G;AV_ALT=C;AV_ACC=NM_001005484;AV_AA=D189H;AV_CLASS=missense
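Since each AV_* field is declared with Number=A, its values are comma-separated in ALT order. A minimal sketch of collecting them per ALT allele follows; the function name and the output attribute names are hypothetical.

```javascript
// Split an INFO string into key/value pairs, then gather the AV_* values
// that belong to each ALT allele (values are listed in ALT order).
function parseInHouse(info, altCount) {
    const kv = {}
    for (const pair of info.split(';')) {
        const i = pair.indexOf('=')
        if (i > 0) kv[pair.slice(0, i)] = pair.slice(i + 1).split(',')
    }
    const alleles = []
    for (let a = 0; a < altCount; a++) {
        alleles.push({
            pos: Number(kv.AV_POS?.[a]),
            ref: kv.AV_REF?.[a],
            alt: kv.AV_ALT?.[a],
            isoform: kv.AV_ACC?.[a],
            aachange: kv.AV_AA?.[a],
            mclass: kv.AV_CLASS?.[a]
        })
    }
    return alleles
}

// The example annotation above, for a locus with one ALT allele.
const parsed = parseInHouse(
    'AV_POS=69655;AV_REF=G;AV_ALT=C;AV_ACC=NM_001005484;AV_AA=D189H;AV_CLASS=missense', 1)
console.log(parsed[0])
```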

Combining multiple VCF files in one track

This is only achievable by declaring the tracks in JSON and calling ProteinPaint’s embedding API.

The following example shows the JSON part of a VCF track that combines multiple VCF files.

{
    "type":"vcf",
    "name":"track name",
    "tracks":[
        {
            "file":"path/to/sample1.gz"
        },
        {
            "file":"path/to/sample2.gz"
        },
        ... more files ...
    ]
}

In the above example, the “tracks” attribute provides a list of member tracks.

Each member track can have 0 or multiple samples.

Highlighting and filtering variants with INFO fields

Variants from a VCF track can be highlighted by other types of annotation, in addition to the default mutation class.

To recreate this display, save the following text into an HTML file and open it in a web browser.

<!DOCTYPE html>
<html>
<head><meta charset="utf-8"></head>
<body>
<script src="https://proteinpaint.stjude.org/bin/proteinpaint.js" charset="utf-8"></script>
<div id=aaa style="margin:10px"></div>
<script>
runproteinpaint({
    host:'https://proteinpaint.stjude.org',
    holder:document.getElementById('aaa'),
    parseurl:true,
    block:true,
    genome:'hg38',
    position:'chr12:25245155-25246060',
    nativetracks:'RefGene',
    tracks:[   {
        type:'vcf',
        name:'Clinvar',
        file:'hg38/clinvar.hg38.vcf.gz',
        vcfinfofilter:{lst:[
            {
                name:'Clinical significance',
                locusinfo:{
                    key:'CLNSIG',
                },
                categories:{
                    'Uncertain_significance':{color:'#aaa',label:'Uncertain significance',textcolor:'white'},
                    'not_provided': {color:'#ccc',label:'Not provided'},
                    '_not_provided':{color:'#ccc',label:'Not provided'},
                    'Benign':{color:'#43ac6a',label:'Benign',textcolor:'white'},
                    'Benign/Likely_benign':{color:'#43ac6a',label:'Benign/Likely benign',textcolor:'white'},
                    'Likely_benign':{color:'#5bc0de',label:'Likely benign',textcolor:'white'},
                    'Likely_pathogenic':{color:'#e99002',label:'Likely pathogenic',textcolor:'white'},
                    'Pathogenic':{color:'#f04124',label:'Pathogenic',textcolor:'white'},
                    'Pathogenic/Likely_pathogenic':{color:'#f04124',label:'Pathogenic/Likely pathogenic',textcolor:'white'},
                    'drug_response':{color:'gold',label:'Drug response',textcolor:'white'},
                    '_drug_response':{color:'gold',label:'Drug response',textcolor:'white'},
                    'Conflicting_interpretations_of_pathogenicity':{color:'#90C3D4',label:'Conflicting interpretations of pathogenicity'},
                    'other':{color:'#ccc',label:'Other'},
                    '_other':{color:'#ccc',label:'Other'},
                    'risk_factor':{color:'#ccc',label:'Risk factor'},
                    '_risk_factor':{color:'#ccc',label:'Risk factor'},
                    'association':{color:'#ccc',label:'Association'},
                    '_association':{color:'#ccc',label:'Association'},
                    'Affects':{color:'#ccc',label:'Affects'},
                    '_Affects':{color:'#ccc',label:'Affects'},
                    'protective':{color:'#ccc',label:'Protective'},
                    '_protective':{color:'#ccc',label:'Protective'},
                }
            },
            {
                name:'ExAC frequency',
                locusinfo:{ key:'AF_EXAC' },
                numericfilter:[ {side:'<',value:0.0001}, {side:'<',value:0.001},  {side:'<',value:.01} ],
            },
            {
                name:'GO-ESP frequency',
                locusinfo:{ key:'AF_ESP' },
                numericfilter:[ {side:'<',value:0.0001}, {side:'<',value:0.001},  {side:'<',value:.01} ],
            },
            {
                name:'1000 Genomes frequency',
                locusinfo:{ key:'AF_TGP' },
                numericfilter:[ {side:'<',value:0.0001}, {side:'<',value:0.001},  {side:'<',value:.01} ],
            },
        ],
            setidx4mclass:0,
            setidx4numeric:3,
        }
    }
    ]
})
</script>
</body>
</html>

The example declares the VCF track as a JavaScript object (with the attribute “vcfinfofilter” explained below) and submits it to the public ProteinPaint server through the embedding API.

The attribute “vcfinfofilter” specifies which INFO fields can be used to highlight variants. It must contain the “lst” attribute, an array in which each element is an object describing one INFO field.

In each object describing an INFO field, use “locusinfo” to indicate that the field provides one value for each locus, or “altalleleinfo” to indicate that the field provides one value for each alternative allele at a given locus.

In a VCF file, an INFO field may describe categorical information about variants. In this case, the attribute “categories” lists a set of keys from that categorical information and provides a color and text label for each key. Keys found in the VCF file but not listed here are shown in black. Variants not annotated by this INFO field are aggregated under the category “Not annotated” and shown in black.

A numerical INFO field can be used to provide cutoff values and filter variants, through “numericfilter”. The attribute “categories” should not be used for numerical INFO fields.
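The category lookup described above can be sketched as follows. This only restates the stated rules (listed key, unlisted key, missing annotation) in code; the function name is hypothetical, and using the raw key as the label of an unlisted key is an assumption.

```javascript
// Resolve the display color and label for one categorical INFO value.
function resolveCategory(categories, value) {
    // Variant not annotated by this INFO field
    if (value === undefined || value === null) {
        return { color: 'black', label: 'Not annotated' }
    }
    const c = categories[value]
    // Key found in the VCF file but not listed in "categories"
    // ASSUMPTION: the raw key is used as its label
    if (!c) return { color: 'black', label: value }
    return { color: c.color, label: c.label }
}

// Two entries copied from the ClinVar example above.
const clnsigCategories = {
    'Benign': { color: '#43ac6a', label: 'Benign', textcolor: 'white' },
    'Pathogenic': { color: '#f04124', label: 'Pathogenic', textcolor: 'white' }
}
console.log(resolveCategory(clnsigCategories, 'Benign'))
console.log(resolveCategory(clnsigCategories, 'Unlisted_key'))
console.log(resolveCategory(clnsigCategories, undefined))
```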

Categorical INFO field, locus-level

{
    name:'Variant origin',
    locusinfo:{ 
             key:'SAO'
       },
    categories:{
        '0':{color:'#aaa',label:'unspecified'},
        '1':{color:'orange',label:'Germline'},
        '2':{color:'blue',label:'Somatic'},
        '3':{color:'magenta',label:'Both'}
    }
},

Its META line:

##INFO=<ID=SAO,Number=1,Type=Integer,Description="Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both">

Annotation to a variant:

#CHROM  POS     ID    REF     ALT     QUAL    FILTER  INFO
1       1014143 .     C       T       .       .       ...;SAO=1;...

Categorical INFO field, alt-allele-level

{
    name:'Review status',
    altalleleinfo:{
        key:'CLNREVSTAT',
        separator:'|',
    },
    categories:{
        'no_assertion':{color:'#545454',label:'no assertion'},
        'no_criteria':{color:'#858585',label:'no criteria'},
        'mult':{color:'blue',label:'multiple submitters no conflicts'},
        'conf':{color:'red',label:'conflicting'},
        'exp':{color:'orange',label:'expert'},
        'guideline':{color:'green',label:'Practice guideline'}
    }
}

Its META line:

##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="no_assertion - No assertion provided, no_criteria - No assertion criteria provided, single - Criteria provided single submitter, mult - Criteria provided multiple submitters no conflicts, conf - Criteria provided conflicting interpretations, exp - Reviewed by expert panel, guideline - Practice guideline">

Annotation to a variant:

1       1051336 rs141463750     C       G,T     .       .       ...;CLNREVSTAT=single;...

(This may not be a good example of an alt-allele-level INFO field, given the particular way that ClinVar selects an allele to describe; see the CLNALLE field.)

Categorical field without predefined categories and automatic color assignment

This can apply to both locus-level and alt-allele level INFO fields.

{
    name:'...',
    altalleleinfo:{
        key:'CLNREVSTAT'
    },
      autocategory:true
}

When “autocategory” is set to true, “categories{}” is no longer required to predefine the list of types and colors. These are assigned dynamically each time the mutation data is loaded. Colors from a fixed list are assigned to the displayed categories in order of abundance.
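A rough sketch of such dynamic assignment is shown below: values are counted, sorted by abundance, and given colors from a fixed list. The palette, the wrap-around when categories outnumber colors, and the function name are all assumptions; the actual color list used by ProteinPaint is not given here.

```javascript
// ASSUMPTION: placeholder palette, not the list ProteinPaint uses.
const palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']

// Count each observed value, order by abundance (most common first),
// and assign colors from the fixed palette in that order.
function autoCategories(values) {
    const counts = new Map()
    for (const v of values) counts.set(v, (counts.get(v) || 0) + 1)
    const ordered = [...counts.entries()].sort((a, b) => b[1] - a[1])
    const categories = {}
    ordered.forEach(([key, count], i) => {
        // ASSUMPTION: colors are reused once the palette is exhausted
        categories[key] = { color: palette[i % palette.length], label: key, count }
    })
    return categories
}

const cats = autoCategories(['single', 'mult', 'single', 'exp', 'single', 'mult'])
console.log(cats)
```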

Support for supplying a custom color list may be added.

Numerical INFO field

The value of a numerical INFO field can be used to filter variants, in the style of a population frequency filter. As with categorical INFO fields, it can be either alt-allele-level or locus-level.

{
    vcfinfofilter:{lst:[
        {
            name:'Population frequency filter',
            altalleleinfo:{
                key:'AF'
            },
            numericfilter:[.000001, 0.00001, 0.0001, 0.001],
            numericCutoff:0.0001,
            numericCutoffSide:'>'
        }
    ]}
}

The META line:

##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">

Annotation to a variant:

1       13528   .       C       G,T     1771.54 .      ...;AF=5.438e-04,3.738e-04;...

By default the filter will retain variants with values lower than the cutoff. To change the side of comparison, use the optional “side” attribute:

    numericfilter:[ {side:">",value:1}, {side:">", value:10} ... ]

To set the default cutoff value of a numeric filter, use the “numericCutoff” and “numericCutoffSide” attributes as shown in the example above.

Missing value: if the numeric INFO field is missing for a variant, its value will default to 0. This may work for population frequency (AF), but may not make sense for other types of data. Possible solutions:

- Specify a value that a missing variant will default to.
- Underneath the Y scale, show a new row labeled “missing” to bin all the missing variants.
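The cutoff comparison and the missing-value default can be sketched as below. The function name and the sample variants are hypothetical, and strict comparison (“<” rather than “<=”) is an assumption.

```javascript
// Decide whether a variant passes one numeric cutoff.
// A missing value is treated as 0, as described above.
function passesCutoff(value, cutoff, side = '<') {
    const v = (value === undefined || Number.isNaN(value)) ? 0 : value
    // ASSUMPTION: comparisons are strict
    return side === '>' ? v > cutoff : v < cutoff
}

// Hypothetical variants; "c" has no AF value.
const variants = [
    { id: 'a', AF: 5.438e-04 },
    { id: 'b', AF: 3.738e-05 },
    { id: 'c' }
]
const keptBelow = variants.filter(v => passesCutoff(v.AF, 0.0001)).map(v => v.id)
const keptAbove = variants.filter(v => passesCutoff(v.AF, 0.0001, '>')).map(v => v.id)
console.log(keptBelow, keptAbove)
```

Note how the variant without an AF value is retained by the “<” filter: this is the behavior that may be undesirable for data types other than population frequency.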

Additional features

These features will draw certain charts based on numeric attributes of the underlying samples.

Population frequency filter

The population frequency filter can be applied to any multi-sample VCF track, and requires appropriate configuration.

As above, this requires declaring the track as a JSON object and working with the embedding API. The following is one example:

{
    type:'vcf',
    name:'SCD 143 samples',
    file:'path/to/vcf.gz',
    populationfrequencyfilter:{
         name:"Population frequency filter",
         lst:[0.01, 0.05, 0.1]
       }
},

The attribute “populationfrequencyfilter” defines a set of fraction values between 0 and 1. Each value serves as a cutoff, e.g. the value 0.01 selects variants with less than 1% frequency in the population (the denominator is the total number of samples in the VCF track).
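As a small worked example of the arithmetic, the sketch below computes a variant's frequency and reports which cutoffs from “lst” it would pass. It assumes the numerator is the number of samples carrying the variant, which the text above does not state explicitly; the function name is hypothetical.

```javascript
// Return the cutoffs from populationfrequencyfilter.lst that a variant passes.
// ASSUMPTION: frequency = samples carrying the variant / total samples in the track.
function passingCutoffs(carrierCount, totalSamples, lst) {
    const frequency = carrierCount / totalSamples
    return lst.filter(cutoff => frequency < cutoff)
}

// 3 carriers out of 143 samples is about 2.1%, so only the 5% and 10% cutoffs pass.
const passing = passingCutoffs(3, 143, [0.01, 0.05, 0.1])
console.log(passing)
```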

In ProteinPaint, it’s displayed as below:

The cutoff values (1%, 5%, 10%) are listed in the legend section at the bottom. Clicking ‘<5%’ enables filtering; in this example, 28 out of 35 variants remain visible in the track display. Click the same cutoff value again to disable filtering.

Coverage-VAF plot

The coverage-VAF plot is available for any multi-sample track with allele read count information. The plot is generated upon clicking a mutation that is present in 2 or more samples.

This plot requires the AD FORMAT field:

##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
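For orientation, the two plotted quantities can be derived from one sample's AD value roughly as follows. This is a sketch, not ProteinPaint's implementation: the function name is hypothetical, and taking coverage as the sum of all listed allele depths is an assumption.

```javascript
// Derive coverage and variant allele fraction (VAF) from an AD value such as "30,10".
// AD lists the ref depth first, then one depth per ALT allele.
function vafFromAD(adValue, altIndex = 1) {
    const depths = adValue.split(',').map(Number)
    // ASSUMPTION: coverage is the sum of all allele depths
    const coverage = depths.reduce((sum, d) => sum + d, 0)
    // Missing ("." gives NaN) or zero coverage cannot be plotted
    if (!coverage || Number.isNaN(coverage)) return null
    return { coverage, vaf: depths[altIndex] / coverage }
}

console.log(vafFromAD('30,10'))
console.log(vafFromAD('.'))
```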

Declare the track as the following JSON object with the “vaf2coverageplot” attribute.

{
    type:'vcf',
    name:'SCD 143 samples',
    file:'path/to/vcf.gz',
    vaf2coverageplot:{}
},

Optionally, specify following attributes in vaf2coverageplot to distinguish samples by color in the plot.

Sample line plots

The track shows a numerical Y axis. For each variant, all samples are shown as strokes.

Each sample has one numerical value, as provided by a metadata attribute.

              "samplebynumericvalue":{
                     "attrkey":"HbFlevel",
                     "axislabel":"HbF level"
               },

Genotype by numeric value

The track shows a numerical Y axis. For each variant, one circle is shown for each genotype. Each circle represents a group of samples carrying that genotype. The vertical position of a circle is determined by a consensus value (mean) across the samples. This value should be provided as INFO fields. Circle sizes are scaled by the number of samples in that group.

The following configuration will require a multi-sample VCF file for getting the total number of samples for each genotype.

            "genotypebynumericvalue":{
                "axislabel":"Genotype HbF",
                "refref":{
                    "infokey":"Hom_Ref_HbF",
                    "color":"gray"
                },
                "refalt":{
                    "infokey":"Het_HbF",
                    "color":"blue"
                },
                "altalt":{
                    "infokey":"Hom_Alt_HbF",
                    "color":"red"
                }
            },

The following configuration also provides the sample count in each genotype group as an INFO field. This requires two INFO fields for each genotype: one to provide the Y-axis value, the other to provide the number of samples for sizing the circle.

    "genotypebynumericvalue":{
          "axislabel":"Genotype HbF",
        "refref":{
              "infokey":"Hom_Ref_HbF",
                  "genotyepCountInfokey": STR,
            "color":"gray"
        },
        "refalt":{
            "infokey":"Het_HbF",
                  "genotyepCountInfokey": STR,
            "color":"blue"
        },
        "altalt":{
            "infokey":"Hom_Alt_HbF",
                  "genotyepCountInfokey": STR,
            "color":"red"
        }
    },

Boxplots by genotype

Upon clicking a variant, this will show a set of boxplots, one for each group of samples carrying one genotype from this variant. The boxplot reflects distribution of samples with respect to a specified numeric attribute.

{
    "genotype2boxplot":{
      "axislabel": STR,
      "sampleannotationkey": STR,
      "boxplotvaluekey": STR  // alternative to sampleannotationkey
    },
}

If “sampleannotationkey” is set, “sampleannotation” must also be provided. See the section on “sample metadata annotation”.

If “boxplotvaluekey” is provided, its value should be the name of an INFO field that provides the box-and-whisker values for creating boxplots for each genotype.

The INFO field should be defined as:

##INFO=<ID=boxplotbygenotype,Number=1,Type=String,Description="For each possible genotype, define a boxplot with 5 numbers: low whisker, 1st quartile, median, 3rd quartile, high whisker. Genotypes are represented as refref, refalt, altalt. Separate genotypes by |, separate values by /">

Example value to this INFO field:

boxplotbygenotype=refref/0.1/1/4/6/8|refalt/1/2/3/4/5|altalt/2/3/4/5/6
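The format defined by the meta line above can be parsed with a few lines of code. The sketch below is illustrative; the function name and the output attribute names (w1, q1, median, q3, w2) are hypothetical.

```javascript
// Parse a value such as "refref/0.1/1/4/6/8|refalt/1/2/3/4/5|altalt/2/3/4/5/6".
// Genotypes are separated by '|'; within each, values are separated by '/':
// genotype, low whisker, 1st quartile, median, 3rd quartile, high whisker.
function parseBoxplotByGenotype(value) {
    const out = {}
    for (const part of value.split('|')) {
        const [genotype, ...nums] = part.split('/')
        const [w1, q1, median, q3, w2] = nums.map(Number)
        out[genotype] = { w1, q1, median, q3, w2 }
    }
    return out
}

const boxes = parseBoxplotByGenotype('refref/0.1/1/4/6/8|refalt/1/2/3/4/5|altalt/2/3/4/5/6')
console.log(boxes.refref, boxes.refalt, boxes.altalt)
```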

Sample metadata annotation

Sample annotation is an optional feature. If provided, it can be used to improve the coverage-VAF plot, and to divide samples into groups by the “sampletype” attribute so that a separate coverage-VAF plot is made for each group.

To provide sample annotation, add a nested attribute “sampleannotation.annotation” to the VCF track object:

{
    ... VCF definitions ... 

    "sampleannotation":{
      "annotation":{
         "sample1_D":{
            "diagnosis":"SCD",
            "sampletype":"Tumor",
            "color":"blue"
         },
         "sample1_G":{
            "diagnosis":"SCD",
            "sampletype":"Normal",
            "color":"blue"
         },
         ... more samples ... 
       },
       "key4annotation":"sample"
    }
}

“annotation” provides a hash in which sample names are keys and each value holds the annotations for that sample.

“key4annotation” names the attribute that keys the annotation. It lets the user specify whether the annotation is at sample level or patient level.

!!! todo further clarify

Additional enhancements/description can be added to “sampleannotation”:

    sampleannotation:{
      levels:[
              {k:'diagnosis',label:'Diagnosis',full:'diagnosis_full_name'},
              {k:'subtype',label:'Subtype',full:'subtype_full_name'}
         ],
      variantsunburst:true,
      key2color:{"root...SCD":"green"}
    },

“levels” provides a hierarchy of attributes for stratifying samples. Levels are arranged from high to low in the hierarchy. In each level, the string values of “k” and “full” should be found as attributes in “sampleannotation.annotation” of any sample.

“variantsunburst” enables a sunburst chart of samples when clicking on a variant.

“key2color” is a temporary fix for defining the color of an annotation value from any level. In the example, “SCD” is the actual value of the first level, and it must have the prefix “root...”. This defines the display color in the sunburst chart.

Live example: https://proteinpaint.stjude.org/pcgp.html in which the track “PCGP somatic mutation” is a multi-sample VCF track and is equipped with metadata annotation to its samples. Its track legend shows the annotation which can be used to filter samples:

All the metadata annotation and track JSON declaration can be found in the source of the HTML page, around following lines:

Speeding up multi-sample VCF track loading

Given a multi-sample VCF track (cohortVCF), prepare a variant-only version of this track (variantVCF), and combine them in the track definition like below:

{
    type:'vcf',
    name:'lots of samples',
    file:'path/to/variant_VCF',
    vcfcohorttrack:{
        file:'path/to/cohort_VCF',
    },
}

When displaying this track in ProteinPaint, the variantVCF will be loaded for showing all variants in the view range. Upon clicking a variant, the cohortVCF will be queried to retrieve sample information on this variant, for rendering coverage-VAF plot or other uses.

This design cuts down data transfer and can greatly improve performance for loading large-cohort VCF track.

The “samplenamemap()” function can be applied in such a track setup. Add this function to the “vcfcohorttrack” object.

Linking an image to each variant

{
    type:'vcf',
    name:'lots of samples',
    file:'path/to/variant_VCF',
    variant2img:{
        path:'path/to/imagefolder/',
            ext:'png'
    },
}

Setting “variant2img” links an image to each variant. An “Image” button will appear in the details panel of the variant; clicking the button displays the image associated with this variant below.

For each variant, the image is located in the following way. Note that the position is 1-based, and the path is relative to the TP directory.

variant2img.path + chromosome.position.REFallele.ALTallele + . + variant2img.ext
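The rule above can be written out as a short function. This is a sketch with a hypothetical function name and variant; whether chromosome names carry the “chr” prefix in file names is an assumption.

```javascript
// Compose the image path for one variant from the variant2img settings.
// m.pos is expected to be 1-based, as noted above.
function variantImagePath(cfg, m) {
    return cfg.path + [m.chr, m.pos, m.ref, m.alt].join('.') + '.' + cfg.ext
}

const imgPath = variantImagePath(
    { path: 'path/to/imagefolder/', ext: 'png' },
    // Hypothetical variant; ASSUMPTION: chromosome name includes "chr"
    { chr: 'chr1', pos: 69655, ref: 'G', alt: 'C' }
)
console.log(imgPath) // path/to/imagefolder/chr1.69655.G.C.png
```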

Table display of multi-column INFO Field

For a complex INFO field such as:

##INFO=<ID=INFOKEY,Number=.,Type=String,Description="Consequence annotations. Format: Allele|Consequence|SYMBOL|Entrez Gene ID|Feature_type|Feature|HGVSc|HGVSp|CIViC Variant Name|CIViC Variant ID|CIViC Variant Aliases|CIViC HGVS|Allele Registry ID|ClinVar IDs|CIViC Variant Evidence Score|CIViC Entity Type|CIViC Entity ID|CIViC Entity URL|CIViC Entity Source|CIViC Entity Variant Origin|CIViC Entity Status">

Through configuration, contents of such INFO field for a variant can be displayed as a table:

CIViC Entity Type	CIViC Entity ID	CIViC Entity URL	CIViC Entity Source	CIViC Entity Variant Origin	CIViC Entity Status
evidence	3766	https://civicdb.org/links/evidence/3766	PUBMED 22621641	Somatic	submitted
evidence	7485	https://civicdb.org/links/evidence/7485	PUBMED 31151904	Somatic	submitted
evidence	7077	https://civicdb.org/links/evidence/7077	PUBMED 25527633	Somatic	submitted
evidence	3768	https://civicdb.org/links/evidence/3768	PUBMED 23278307	Somatic	submitted
evidence	3781	https://civicdb.org/links/evidence/3781	PUBMED 23733758	Somatic	submitted

`info2table:{}`

Key: name of the CSQ-like INFO field.

Value: {} with keys of “col_separator” and “fields” array.

The “fields” array should contain all the sub-fields in the INFO field, including hidden ones. Each element gives the display and formatting instructions for one sub-field.

info2table: {
    INFOKEY: {
       col_separator: '|',
       separate_tables:[ ... ], // optional
       fields: [
           { name: 'Allele' },
           { name: 'Consequence' },
           { name: 'SYMBOL' },
           { name: 'Entrez Gene ID' },
           { name: 'Feature_type' },
           { name: 'Feature' },
           { name: 'HGVSc' },
           { name: 'HGVSp' },
           ... more fields ... 
       ]
    }
}

`info2singletable:{}`

Same setting as the “info2table”.

Will only parse the first item of the CSQ field. This is useful in such cases when the given sub-fields are duplicated across all items.

A “vertical” two-column table will be displayed, with a row for each sub-field.


Allele	T
Consequence	missense_variant
SYMBOL	BRAF
Entrez Gene ID	673
Feature_type	transcript
Feature	ENST00000288602.6
HGVSc	ENST00000288602.6:c.1799T>A

`.hide:true`

If set, the field will be hidden from the table.

`.name:STR`

Required. A display name of the field.

`.ampersand2br:true`

Will convert all ‘&’ to HTML line breaks.

`.isurl:true`

The column value is a full URL and will be shown as a link.

`.appendUrl: STR`

Appends the value to the end of the given partial URL and displays the composed URL.

`.urlMatchLst:{}`

Example config:

urlMatchLst: {
   separator: '_',
   idIndex: 0,
   types: [
      { type: 'pubmed', appendUrl: 'https://www.ncbi.nlm.nih.gov/pubmed/' },
      { type: 'asco', appendUrl: 'https://meetinglibrary.asco.org/record/' }
   ]
}

This allows the following values to be parsed into their respective URL links.

31566309_(PubMed)
168986_(ASCO)
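One plausible reading of this configuration is sketched below: the value is split by “separator”, the element at “idIndex” is taken as the ID, and the remaining text is matched against each “type”. The case-insensitive substring match and the function name are assumptions; the exact matching rule is not specified above.

```javascript
const urlMatchLst = {
    separator: '_',
    idIndex: 0,
    types: [
        { type: 'pubmed', appendUrl: 'https://www.ncbi.nlm.nih.gov/pubmed/' },
        { type: 'asco', appendUrl: 'https://meetinglibrary.asco.org/record/' }
    ]
}

// Turn a value such as "31566309_(PubMed)" into a URL, or null if no type matches.
function matchUrl(cfg, value) {
    const parts = value.split(cfg.separator)
    const id = parts[cfg.idIndex]
    const rest = parts.filter((_, i) => i !== cfg.idIndex).join(cfg.separator).toLowerCase()
    // ASSUMPTION: a type matches when its name occurs in the non-ID part, ignoring case
    const hit = cfg.types.find(t => rest.includes(t.type))
    return hit ? hit.appendUrl + id : null
}

console.log(matchUrl(urlMatchLst, '31566309_(PubMed)'))
console.log(matchUrl(urlMatchLst, '168986_(ASCO)'))
```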

`separate_tables:[]`

Optional, and only applies to “info2table” but not “info2singletable”.

When multiple “rows” could be contained in one INFO field, the rows are to be divided into groups based on criteria of the “groupers” array, and shown in separate tables. Each table has the same set of columns, as defined in the “fields[]” array.

separate_tables: [
   {
       headhtml: 'Accepted evidence',
       groupers: [
           { field: 'CIViC Entity Type', value: 'evidence' },
           { field: 'CIViC Entity Status', value: 'accepted' }
       ]
   },
   { ... another table ... }
]
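The grouping can be pictured with the sketch below, in which a row goes into a table only when it satisfies every grouper of that table. Requiring all groupers to match, the row representation as objects keyed by field name, and the function name are assumptions made for illustration.

```javascript
// Divide parsed rows into tables according to each table's groupers.
function splitIntoTables(rows, separateTables) {
    return separateTables.map(t => ({
        headhtml: t.headhtml,
        // ASSUMPTION: a row must match all groupers of a table
        rows: rows.filter(r => t.groupers.every(g => r[g.field] === g.value))
    }))
}

// Hypothetical rows, already split into named sub-fields.
const civicRows = [
    { 'CIViC Entity Type': 'evidence', 'CIViC Entity ID': '3766', 'CIViC Entity Status': 'submitted' },
    { 'CIViC Entity Type': 'evidence', 'CIViC Entity ID': '1', 'CIViC Entity Status': 'accepted' },
    { 'CIViC Entity Type': 'assertion', 'CIViC Entity ID': '2', 'CIViC Entity Status': 'accepted' }
]
const tables = splitIntoTables(civicRows, [
    {
        headhtml: 'Accepted evidence',
        groupers: [
            { field: 'CIViC Entity Type', value: 'evidence' },
            { field: 'CIViC Entity Status', value: 'accepted' }
        ]
    }
])
console.log(tables[0].headhtml, tables[0].rows.length)
```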

URL links for variants

A URL can be made for SNV/indel variants from a VCF track, so that when the user clicks a variant to show its details, a link to the variant on another website is displayed.

To enable this, add the “url4variant” attribute in the track definition:

{
    type:'vcf',
    name:'Clinvar',
    file:'hg38/clinvar.hg38.201612.vcf.gz',
    url4variant:[
        {
        label:'PeCAN-PIE',
        makeurl:(m)=>{
            return 'https://pecan.stjude.org/#/variant/view/'
                +m.chr.replace('chr','')
                +'/'
                +(m.pos+1)
                +'/'
                +m.ref
                +'/'
                +m.alt
        }
        }
    ],
}

Each element in “url4variant” describes a method to produce a URL for the variant. This allows multiple URL links to be produced on the same variant.

Each element has the “makeurl” attribute, a JavaScript function that takes the variant object as input and returns the fully formed URL string, parameterized by the attributes of the variant.

The method may not work for mutation types other than SNV/indel.

Summary of VCF track object

A schematic overview of all available options that you can use to customize a VCF track.
