Frequently Asked Questions


Will I be charged for using St. Jude Cloud Genomics Platform?

Within St. Jude Cloud Genomics Platform, any St. Jude data you receive through a data request is sponsored, meaning that you do not have to pay a fee to store this data in St. Jude Cloud. You will not incur any costs except in the following situations:

  • Any other files, such as input files uploaded to or results produced by St. Jude Cloud, will incur a monthly fee. See your DNAnexus billing information for the cost per GB.
  • Any analysis workflows, including those provided by St. Jude or your own that you have uploaded and packaged into the cloud, will incur a charge. The charge depends on the underlying compute resources used and the amount of time taken. Documentation for specific workflows we provide should contain guidance on how much the workflow costs. See your DNAnexus billing information for the price of each VM size per hour.
  • Downloading data is currently free to end users. Downloading is not free for St. Jude, however: cloud providers charge an egress fee whenever data is transferred out of the cloud, and St. Jude has chosen to sponsor these fees for now. We reserve the right to change this policy in the future.
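As a rough worked example of how these charges add up: compute cost is the VM's hourly price multiplied by the run time, and storage cost is the per-GB monthly price multiplied by the amount of non-sponsored data and the number of months it is stored. The minimal sketch below only does that arithmetic. The function names are hypothetical, and any rates you pass in should come from your DNAnexus billing information rather than from this example.

```shell
# Rough cost arithmetic for St. Jude Cloud usage.
# NOTE: function names are hypothetical and every rate passed in is an
# assumption; use the prices shown in your DNAnexus billing information.

# compute_cost RATE_PER_HOUR HOURS
# Prints the compute cost (hourly VM price x run time), rounded to cents.
compute_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# storage_cost RATE_PER_GB_MONTH GB MONTHS
# Prints the storage cost for non-sponsored files (price x size x duration).
storage_cost() {
  awk -v rate="$1" -v gb="$2" -v months="$3" \
    'BEGIN { printf "%.2f\n", rate * gb * months }'
}
```

For instance, at a hypothetical rate of 0.50 per hour, compute_cost 0.50 3 prints 1.50 for a three-hour run, and storage_cost 0.025 200 1 prints 5.00 for 200 GB kept for one month at a hypothetical 0.025 per GB-month.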

How can I set up billing for my lab?

Billing setup differs depending on whether you are an internal user (you work at St. Jude) or an external user. If you are a St. Jude employee, please refer to the dedicated intranet page for instructions. Otherwise, please refer to our Create an Account guide.

Does St. Jude Cloud allow for-profit companies to access genomics data?

We do not allow for-profit companies to access any of our restricted-access genomics data. We are actively working with our institution to create a path forward for companies. If you work for a for-profit company and would like to be notified if this rule changes, feel free to email us at support@stjude.cloud.

Why do I need to sign the Data Access Agreement (DAA)?

The data access agreement serves many purposes. Ultimately, its terms are in place to protect our patients. We take patient privacy very seriously, and we require that requesters commit to protecting that privacy to the fullest extent.

How do I submit edits/revisions to the DAA?

We do not alter the terms of the data access agreement for any reason, except when a term is found to directly conflict with state or national law. In that case, please send a reference to the relevant law and a short description of the conflict to support@stjude.cloud. Otherwise, please understand that we simply cannot manage the operational overhead of maintaining different agreements with different parties.

Can I get a Microsoft Word version of the DAA?

We do not provide any editable format of the DAA, as we do not accept edits or revisions from external parties.

Where can I find the latest version of the Data Access Agreement (DAA)?

You can download the latest version of the DAA here.

Where do I submit the Data Access Agreement (DAA)?

You can submit your Data Access Agreement in the drag and drop box on the last step of the data request process.

What if I did not fill out the Data Download Permission section of the original DAA, but now I want to download data?

This would be a change in terms from the original agreement, so you would need to fill out a new DAA, including the Data Download Permission section for any data sets you want to download.

Is there another mechanism for downloading the data besides using the dx command line tool?

For an out-of-the-box solution, the dx download command is the canonical way to download files from St. Jude Cloud Genomics Platform. If you are downloading a large number of files, you may also try the DNAnexus Download Agent, which users have at times reported to be faster or more reliable.
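As a sketch of the dx download route for a whole folder, the helper below (the name fetch_folder is hypothetical) downloads a folder recursively with dx download --recursive into a local directory of your choice. It assumes dx-toolkit is installed and that you are logged in with access to the project.

```shell
# fetch_folder PROJECT_ID FOLDER [DEST]
# Hypothetical helper: recursively downloads FOLDER from the given project
# into DEST (default: the current directory) using dx download.
fetch_folder() {
  project="$1"
  folder="$2"
  dest="${3:-.}"
  mkdir -p "$dest" || return 1
  # Run in a subshell so the caller's working directory is left unchanged.
  (cd "$dest" && dx download --recursive "${project}:${folder}")
}
```

Usage: fetch_folder project-xxxx /data ./downloads, where project-xxxx stands in for your own project ID.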

St. Jude Cloud data is made available via signed links that are generated by Azure Storage and expire after a set amount of time. This is the standard and most broadly supported method for accessing data stored in cloud environments, and the files can be downloaded with a number of open-source tools. To generate a signed link to any of the files, use the dx make_download_url command, then pass the link to any of the major command-line tools (aria2c, wget, curl, etc.). Note that if you choose to generate signed links to our data, it is your responsibility to keep the links secure. This includes, but is not limited to, ensuring that the links are known only to you (anyone who has a link can use it, so sharing it would allow unauthorized access to the data) and that they expire as soon as possible (to minimize the time each link is live).
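A minimal sketch of the signed-link route, assuming dx-toolkit and curl are installed and you are logged in. The helper name fetch_signed and its default one-hour lifetime are illustrative choices. We believe dx make_download_url accepts a --duration option for shortening the link's lifetime, but please confirm with dx make_download_url -h for your dx-toolkit version. The link is held in a local variable and never printed, in keeping with the security responsibilities described above.

```shell
# fetch_signed FILE OUTPUT [DURATION]
# Hypothetical helper: creates a short-lived signed link for a St. Jude Cloud
# file and downloads it with curl. DURATION defaults to 1h (an assumption;
# check "dx make_download_url -h" for the accepted duration syntax).
fetch_signed() {
  file_id="$1"
  output="$2"
  duration="${3:-1h}"
  # Keep the signed link in a variable only; never log or share it.
  url="$(dx make_download_url --duration "$duration" "$file_id")" || return 1
  # --fail: non-zero exit on HTTP errors; --retry 5: retry transient failures;
  # -C -: resume a partially downloaded OUTPUT file.
  curl --fail --location --retry 5 -C - -o "$output" "$url"
}
```

Usage: fetch_signed file-xxxx sample.bam 30m, where file-xxxx stands in for the file you want.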

No commercial transfer solution (such as Aspera) or custom transfer process is available.

We note that downloading data at a large scale is inherently unpredictable and unreliable, no matter which tool is used. This is a key reason why so many in the community (including ourselves with the St. Jude Cloud project) have adopted the model of bringing tools to the data in the cloud. While we allow users to download data from our platform, we recommend that users who have trouble downloading data either analyze the data in the cloud or try the download again later with retries enabled.
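Retries can be added around any download command. The following is a minimal, tool-agnostic sketch. The helper name with_retries and the exponential backoff schedule (2, 4, 8 seconds, and so on) are illustrative choices, not St. Jude Cloud or DNAnexus defaults.

```shell
# with_retries MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, doubling the wait between attempts. Returns the last exit status.
with_retries() {
  max="$1"
  shift
  attempt=1
  delay=2
  while true; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempt(s): $*" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

Usage: with_retries 5 dx download project-xxxx:/data/sample.bam, where the project and path stand in for your own.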

What clinical information is available about samples in St. Jude Cloud?

You can view the basic clinical and phenotypic information we currently provide here.

Can I gain access to further clinical information than what is currently available?

We are working towards being able to provide additional clinical annotations such as treatment, outcome, and survival data in the future. Unfortunately, we do not offer it today and we do not have a timeline for when it will be available.

We do not provide individual consent forms or blank consent forms for any samples on St. Jude Cloud. We have chosen to remain consistent with the requirements of the other major genomic data repositories in that (1) there is an internal vetting process by the St. Jude IRB to ensure samples may be shared with the research community, but (2) we do not share the informed consents with data requesters.

Can I request FASTQ files on St. Jude Cloud?

We do not store FASTQ files in St. Jude Cloud because doing so would double the storage cost without any benefit. Several tools can revert BAM files to FASTQ files; we recommend Picard SamToFastq. You can efficiently revert BAMs to FASTQs in the cloud by wrapping the conversion tool of your choice into a Cloud App.
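A minimal sketch of the reversion step for paired-end data, which you could run locally or inside a Cloud App. It assumes Java and a local copy of picard.jar, located through a hypothetical PICARD_JAR variable. The -I, -F, -F2, and -FU options are Picard's short names for INPUT, FASTQ, SECOND_END_FASTQ, and UNPAIRED_FASTQ, but please check SamToFastq --help for the exact arguments in your Picard version.

```shell
# bam_to_fastq INPUT_BAM OUTPUT_PREFIX
# Hypothetical helper: reverts a paired-end BAM to FASTQ with Picard
# SamToFastq. PICARD_JAR is an assumed variable pointing at picard.jar.
bam_to_fastq() {
  bam="$1"
  prefix="$2"
  java -jar "${PICARD_JAR:?set PICARD_JAR to the path of picard.jar}" SamToFastq \
    -I "$bam" \
    -F "${prefix}_R1.fastq" \
    -F2 "${prefix}_R2.fastq" \
    -FU "${prefix}_unpaired.fastq"
}
```

Usage: after export PICARD_JAR=/path/to/picard.jar, running bam_to_fastq sample.bam sample writes sample_R1.fastq, sample_R2.fastq, and sample_unpaired.fastq.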

How can I work with genomics data in the cloud?

You can view this guide to learn how to create a cloud application.

How can I run an analysis workflow on multiple sample files at the same time?

The DNAnexus interface does have a batch tool available; however, it is in early testing, so we recommend using dx-toolkit on the command line as the most reliable and user-friendly approach to batching and submitting jobs. You can find our documentation on how to install and get started with dx-toolkit here. You may also refer to the sample script below, which loops through all the BAM files in the /data folder and submits a job for each BAM and its matching index file.

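# Submit one job per BAM in /data, pairing each BAM with its .bai index.
# Reading names line by line keeps file names that contain spaces intact.
dx ls '/data/*.bam' | while IFS= read -r bam; do
  # No trailing spaces after the backslashes, or line continuation breaks.
  # </dev/null keeps dx run from consuming the list of names on stdin.
  dx run \
    --yes \
    --input "0.BAM=/data/$bam" \
    --input "0.BAM_INDEX=/data/$bam.bai" \
    "$PROJECT_ID:/Rapid RNA-Seq (BAM)" < /dev/null
done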

Note that this sample script assumes that the BAM and index files are in the /data folder and that the Rapid RNA-Seq analysis workflow is in the project. Set $PROJECT_ID to your project's dxid, and change Rapid RNA-Seq (BAM) to the workflow you want to run; the --input names (0.BAM and 0.BAM_INDEX) must match that workflow's inputs.
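Before submitting a large batch, it can help to preview exactly what will be run. The minimal sketch below, using a hypothetical helper named preview_batch, reads BAM names on stdin and prints the dx run command for each one without submitting anything. It follows the same /data layout and .bai naming assumptions as the loop above.

```shell
# preview_batch WORKFLOW
# Hypothetical helper: reads BAM file names (one per line) on stdin and
# prints, without running, the dx run command that would be submitted.
preview_batch() {
  workflow="$1"
  while IFS= read -r bam; do
    printf 'dx run --yes --input "0.BAM=/data/%s" --input "0.BAM_INDEX=/data/%s.bai" "%s"\n' \
      "$bam" "$bam" "$workflow"
  done
}
```

Usage: dx ls '/data/*.bam' | preview_batch "$PROJECT_ID:/Rapid RNA-Seq (BAM)". Once the printed commands look right, run the submission loop.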

Why am I getting a connectivity error when connecting to DNAnexus API via SSH?

If you run a command such as dx run --ssh <executable> and get a connectivity error, your firewall may be too restrictive. Can you run the same command from an unrestricted network (such as a home network)? If so, you can resolve the issue by asking your network administrator to whitelist connections to Azure US West. All subnets (Region Name=“uswest”) are listed here.

How can I delete my account?

Today, a St. Jude Cloud Genomics Platform account is simply a DNAnexus account. Thus, if you’d like to delete your account, you’ll need to ask DNAnexus to remove it. You can do so by emailing DNAnexus support at support@dnanexus.com with the following message.

Subject: St. Jude Cloud account deletion

Hi DNAnexus,

Would you please assist me in deleting my St. Jude Cloud account? My username is _.

Thank you!