Splice junction with read count
This track represents RNA splice junction data from RNA-Seq assays. Each file can contain data for one or more samples.
File format of single sample data
The file has 6 required columns:
- Chromosome name, e.g. “chr1”
- Start, 0-based position of the last nucleotide of the upstream exon
- Stop, 0-based position of the first nucleotide of the downstream exon
- Strand, +/-
- Type of junction; use an empty string if not available. It can be any text value, e.g. “known” or “novel”. Every type used should be declared in the .categories{} attribute of the track so the types can be distinguished by color.
- Read count, integer value
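As an illustration of the six columns above, the following sketch writes a tiny single-sample file and checks its column count. The coordinates, types, and read counts are made up, and the file is assumed to be tab-delimited (which tabix expects); the name "textfile" simply matches the conversion steps in this document.

```shell
# Write a tiny single-sample junction file (tab-delimited; all values are made up).
# Columns: chromosome, start, stop, strand, type, read count
printf 'chr1\t14829\t14969\t-\tknown\t152\n'  > textfile
printf 'chr1\t15038\t15795\t-\tnovel\t7\n'   >> textfile
printf 'chr1\t12227\t12612\t+\t\t31\n'       >> textfile   # empty type column

# Sanity check: every line should have exactly 6 tab-separated fields.
awk -F'\t' 'NF != 6 { print "line " NR " has " NF " fields"; bad = 1 } END { exit bad }' textfile
```

The third row is deliberately out of order, so the file still needs the sorting step before it can be indexed.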
Steps to convert the tabular text file to a junction track
sort -k1,1 -k2,2n textfile > textfile.sorted
bgzip textfile.sorted
tabix -p bed textfile.sorted.gz
This generates two files:
textfile.sorted.gz
textfile.sorted.gz.tbi
Put both files in the same directory on the server, and use the file path (or URL) of the .gz file when submitting the track.
Refer to this document for declaring the junction track in JSON format.
File format for multiple samples
The first five columns are the same as in the single-sample file. The 6th column holds the read count for the first sample, the 7th column for the second sample, and so on, so that any number of samples can be represented.
Optionally, provide a header line to denote sample names, e.g.:
#chr start stop strand type sample1 sample2 ...
The header line must begin with “#”. Compress this file with bgzip in the same way. To index it, run tabix with an additional parameter:
tabix -p bed -c "#" multisample.gz
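The multi-sample preparation could be sketched end to end as below. The file names multisample.txt and multisample.sorted and all data values are hypothetical. Splitting the header from the data rows before sorting is an assumption made here, since where a plain sort places a "#" line can depend on the locale. bgzip -c is used so the sorted text file is kept, and the compression and indexing steps are skipped when the tools are not installed.

```shell
# Hypothetical two-sample input; header first, data rows unsorted. Values are made up.
printf '#chr\tstart\tstop\tstrand\ttype\tsample1\tsample2\n'  > multisample.txt
printf 'chr2\t5000\t6200\t+\tnovel\t3\t0\n'                  >> multisample.txt
printf 'chr1\t14829\t14969\t-\tknown\t152\t98\n'             >> multisample.txt

# Keep the "#" header on top and sort only the data rows by chromosome, then start.
{
  grep '^#' multisample.txt
  grep -v '^#' multisample.txt | sort -k1,1 -k2,2n
} > multisample.sorted

# Compress and index only when both tools are installed.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
  bgzip -c multisample.sorted > multisample.gz
  tabix -f -p bed -c "#" multisample.gz
fi
```

When bgzip and tabix are available, this leaves multisample.gz and multisample.gz.tbi next to the sorted text file.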
Sample names such as “sample1” and “sample2” in the header of the example above can be replaced by JSON object strings, as a way of encoding additional sample information in the track file, for example:
{"patient":"SJACT001","sample":"SJACT001_D","sampletype":"DIAGNOSIS","diagnosis_group_short":"ST","diagnosis_group_full":"Solid Tumor","diagnosis_short":"ACT","diagnosis_full":"Adrenocortical Carcinoma","diagnosis_subtype_short":"TP53-mut","diagnosis_subtype_full":"TP53-mut"}
{"patient":"SJACT002","sample":"SJACT002_D","sampletype":"DIAGNOSIS","diagnosis_group_short":"ST","diagnosis_group_full":"Solid Tumor","diagnosis_short":"ACT","diagnosis_full":"Adrenocortical Carcinoma","diagnosis_subtype_short":"TP53-mut","diagnosis_subtype_full":"TP53-mut"}
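To show where such JSON strings sit in the file, here is a rough sketch that builds a header line with one JSON string per sample column. It assumes the header is tab-delimited like the data rows, shortens the two objects above to three keys for readability, and writes to a hypothetical file named header.txt.

```shell
# Abbreviated JSON descriptors taken from the two sample objects above.
s1='{"patient":"SJACT001","sample":"SJACT001_D","diagnosis_group_short":"ST"}'
s2='{"patient":"SJACT002","sample":"SJACT002_D","diagnosis_group_short":"ST"}'

# Each JSON string takes the place of one sample name in the header line.
printf '#chr\tstart\tstop\tstrand\ttype\t%s\t%s\n' "$s1" "$s2" > header.txt

# The header should still have 5 fixed columns plus one column per sample.
awk -F'\t' '{ print NF }' header.txt
```

The JSON strings must not contain tab characters, otherwise they would be split across columns.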
This allows samples to be plotted in different colors. To do so, add the “cohortsetting” attribute to the track object when using the embedding API:
runproteinpaint({
    // ... other parameters ...
    tracks:[
        {
            type:'junction',
            name:'track name',
            file:'path/to/file.gz',
            cohortsetting:{
                uselevelidx:0,
                cohort:{
                    levels:[
                        {
                            k:'diagnosis_group_short',
                            label:'cancer'
                        }
                    ]
                }
            }
        }
    ],
})