Tiling Assembly: a new tool for reference annotation-independent transcript assembly and novel gene identification by RNA-sequencing

Kenneth A. Watanabe, Arielle Homayouni, Tara Tufano, Jennifer Lopez, Patricia Ringler, Paul Rushton and Qingxi J. Shen

DNA Research, 22(5): 319-329, 2015, doi: 10.1093/dnares/dsv015, First published online: September 3, 2015

Abstract: Annotation of the rice (Oryza sativa) genome has evolved significantly since release of its draft sequence, but it is far from complete. Several published transcript assembly programmes were tested on RNA-sequencing (RNA-seq) data to determine their effectiveness in identifying novel genes to improve the rice genome annotation. Cufflinks, a popular assembly software, did not identify all transcripts suggested by the RNA-seq data. Other assembly software was CPU intensive, lacked documentation, or lacked software updates. To overcome these shortcomings, a heuristic ab initio transcript assembly algorithm, Tiling Assembly, was developed to identify genes based on short read and junction alignment. Tiling Assembly was compared with Cufflinks to evaluate its gene-finding capabilities. Additionally, a pipeline was developed to eliminate false-positive gene identification due to noise or repetitive regions in the genome. By combining Tiling Assembly and Cufflinks, 767 unannotated genes were identified in the rice genome, demonstrating that combining both programmes proved highly efficient for novel gene identification. We also demonstrated that Tiling Assembly can accurately determine transcription start sites by comparing the Tiling Assembly genes with their corresponding full-length cDNA. We applied our pipeline to additional organisms and identified numerous unannotated genes, demonstrating that Tiling Assembly is an organism-independent tool for genome annotation.

Full article

Quick Start Guide

This guide is not all-inclusive. Please refer to the documentation for more detailed instructions.

This software is released under the open-source GNU General Public License, version 2.0, and is free to use for academic purposes. If you use this software, please cite our publication.

When installing this software, please put all scripts into the same directory.

Modify the connectDB.pl script so that it contains a valid MySQL username and password.

The SAM file of the aligned RNA-seq data must be loaded into a MySQL database before running the Tiling Assembly software. The load_sam_file.pl script is provided to parse and load your SAM file of RNA-seq data. The provided header.sam file contains the field names of the sam table and is required for load_sam_file.pl to run.
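
To illustrate what parsing a SAM file involves, the following Python sketch splits one alignment line into the eleven mandatory columns defined by the SAM specification. It is not the code used by load_sam_file.pl. The lowercase field names below are taken from the SAM specification and are assumptions here; the column names actually used in the sam table are those listed in header.sam and may differ.

```python
# The eleven mandatory SAM columns, in order, per the SAM specification.
# These names are illustrative; header.sam defines the real table fields.
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")
INT_FIELDS = {"flag", "pos", "mapq", "pnext", "tlen"}


def parse_sam_line(line):
    """Return a dict for one alignment line, or None for a header line."""
    if line.startswith("@"):
        return None  # header lines (@HD, @SQ, ...) carry no alignment
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-separated columns")
    record = dict(zip(SAM_FIELDS, columns))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    record["tags"] = columns[len(SAM_FIELDS):]
    return record
```

Each returned dictionary corresponds to one row that a loader could insert into a database table.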

A typical order for running Tiling Assembly is as follows:

1. Create the necessary database tables in MySQL.
2. load_sam_file.pl
3. load_junctions.pl
4. exon_builder2.pl
5. exons_from_junctions.pl
6. link_exons.pl
7. scan_exons.pl
8. transcript_builder2.pl
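
The order above can be scripted. The following is a minimal Python sketch, not part of the Tiling Assembly distribution, that runs steps 2 through 8 in sequence and stops at the first failure. It assumes each script is invoked as `perl <script>` from the directory that holds the scripts. The `extra_args` mapping is hypothetical: the arguments each script actually requires are described in the documentation. Step 1 (creating the database tables) is not covered.

```python
import subprocess
from pathlib import Path

# Script order taken from the Quick Start Guide (steps 2-8).
PIPELINE = [
    "load_sam_file.pl",
    "load_junctions.pl",
    "exon_builder2.pl",
    "exons_from_junctions.pl",
    "link_exons.pl",
    "scan_exons.pl",
    "transcript_builder2.pl",
]


def build_commands(script_dir, extra_args=None):
    """Return one argv list per pipeline step, in the documented order.

    extra_args is a hypothetical mapping {script name: [arguments]};
    consult the Tiling Assembly documentation for the real arguments.
    """
    extra_args = extra_args or {}
    script_dir = Path(script_dir)
    return [
        ["perl", str(script_dir / name), *extra_args.get(name, [])]
        for name in PIPELINE
    ]


def run_pipeline(script_dir, extra_args=None, runner=subprocess.run):
    """Run each step in order; check=True aborts on a non-zero exit."""
    for cmd in build_commands(script_dir, extra_args):
        print("running:", " ".join(cmd))
        # Run from the script directory, where all scripts live together.
        runner(cmd, check=True, cwd=str(script_dir))
```

A call such as `run_pipeline("/path/to/scripts")` (placeholder path) would then execute the seven scripts in order, raising `subprocess.CalledProcessError` if any step fails.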

Documentation

File MD5 checksum: 3637267B3F7F0399BF32769BB6F93EE9
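
A downloaded file can be checked against the checksum above before installation. The sketch below uses Python's standard `hashlib` module; the file path passed to it is a placeholder, since the name of the download is not stated on this page.

```python
import hashlib

# Checksum published on this page.
EXPECTED_MD5 = "3637267B3F7F0399BF32769BB6F93EE9"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Compare case-insensitively, since hashlib returns lowercase hex."""
    return md5_of_file(path).lower() == expected.lower()
```

If `matches_checksum("downloaded_file")` (placeholder name) returns `False`, the download is likely incomplete or corrupted and should be repeated.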

Notice: Tiling Assembly is freely available for academic research only. For commercial usage, please contact us for a license.



Contact Information

Professor Jeffery Q. Shen
School of Life Sciences, MS4004
University of Nevada, Las Vegas
4505 S. Maryland Parkway
Las Vegas, NV 89154-4004
jeffery.shenATunlv.edu
Office: (702) 895-4704
Lab: (702) 895-1529



© 2015 All Rights Reserved for Shen Laboratory UNLV