Kiran V. Garimella from the Broad Institute spoke at the Nanopore Community Meeting 2019 about “Long-read genomes and transcriptomes on the cloud.” Garimella shared photos of the PromethION 48 at the Broad and a graphic showing the amount of data produced at the institute over the last decade. At one point, the amount of data generated by sequencing at the Broad outpaced the ability to provide storage space. The on-premises HPC cluster at the Broad is robust, but it is limited by the volume of work and the number of people sharing it. The goal with long-read data has therefore been to migrate the tools to the cloud.

Garimella explained that the group built their pipelines in the Workflow Description Language (WDL) and packaged their applications into Docker images and containers. In the new setup, some development remains local, while most production work runs in the cloud. The ability to spin compute resources up on demand and shut them down afterwards reduces costs. One project supported by this infrastructure is the Rare Genomes Project, where long-read approaches are used when short-read data do not yield plausible variants. For example, Garimella’s group has adapted assemblers so that they can be launched on Google Cloud.

Transcriptome sequencing with long reads is particularly powerful. Direct RNA sequencing proved too challenging because of its low throughput, whereas PCR-amplified cDNA produced sufficient yield. The team selected samples with high RNA quality and leveraged cloud-based GPUs; after re-basecalling the data, they used FLAIR for isoform discovery. This process resulted in the discovery of a large number of candidate isoforms. With this infrastructure, the group has generated transcriptome data and workflows that benefit the wider community.
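To give a sense of what such a cloud-ready pipeline looks like, below is a minimal WDL sketch of a single containerized task wrapping FLAIR isoform collapse, of the kind a workflow engine such as Cromwell can dispatch to Google Cloud. The Docker image name, resource requests, exact FLAIR flags, and output file names are illustrative assumptions, not details from the talk.

```wdl
version 1.0

task FlairCollapse {
  input {
    File genome_fasta
    File corrected_bed
    File reads_fastq
  }

  command <<<
    set -euo pipefail
    # Hypothetical invocation; exact FLAIR subcommands and flags vary by version.
    flair collapse \
      -g ~{genome_fasta} \
      -q ~{corrected_bed} \
      -r ~{reads_fastq} \
      -o isoforms
  >>>

  runtime {
    # Placeholder image; in practice this would point at the group's own registry.
    docker: "us.gcr.io/example-project/flair:1.5"
    cpu: 4
    memory: "16 GB"
    disks: "local-disk 100 SSD"
    # Preemptible VMs are one common way short-lived cloud jobs keep costs down (assumption).
    preemptible: 2
  }

  output {
    # Output names assume FLAIR's "<prefix>.isoforms.*" naming convention.
    File isoform_bed   = "isoforms.isoforms.bed"
    File isoform_fasta = "isoforms.isoforms.fa"
  }
}

workflow IsoformDiscovery {
  input {
    File genome_fasta
    File corrected_bed
    File reads_fastq
  }

  call FlairCollapse {
    input:
      genome_fasta = genome_fasta,
      corrected_bed = corrected_bed,
      reads_fastq = reads_fastq
  }

  output {
    File isoform_bed   = FlairCollapse.isoform_bed
    File isoform_fasta = FlairCollapse.isoform_fasta
  }
}
```

When a workflow like this runs on a Google Cloud backend, each task is executed on a virtual machine that is created for the job and deleted when it finishes, which is the spin-up/spin-down behaviour Garimella described as the main source of cost savings.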
