Combining DNA Methylation with Deep Learning Improves Sensitivity and Accuracy of Eukaryotic Genome Annotation

Zynda, Gregory J.

dc.contributor	Dalkilic, Mehmet
dc.creator	Zynda, Gregory J.
dc.date	2020-04-28T19:23:14Z
dc.date	2020-04
dc.date.accessioned	2023-02-24T18:22:05Z
dc.date.available	2023-02-24T18:22:05Z
dc.identifier	http://hdl.handle.net/2022/25389
dc.identifier.uri	http://localhost:8080/xmlui/handle/CUHPOERS/260033
dc.description	Thesis (Ph.D.) - Indiana University, School of Informatics, Computing, and Engineering, 2020
dc.description	The genome assembly process has significantly decreased in computational complexity since the advent of third-generation long-read technologies. However, genome annotation still requires significant manual effort from scientists to produce the trustworthy annotations required for most bioinformatic analyses. Current methods for automatic eukaryotic annotation rely on sequence homology, structure, or repeat detection, and each method requires a separate tool, making the workflow for a final product a complex ensemble. Beyond the nucleotide sequence, one important component of genetic architecture is the presence of epigenetic marks, including DNA methylation. However, no automatic annotation tools currently use this valuable information. As methylation data becomes more widely available from nanopore sequencing technology, tools that take advantage of patterns in this data will be in demand. The goal of this dissertation was to improve the annotation process by developing and training a recurrent neural network (RNN) on trusted annotations to recognize multiple classes of elements from both the reference sequence and DNA methylation. We found that our proposed tool, RNNotate, detected fewer coding elements than GlimmerHMM and Augustus, but those predictions were more often correct. When predicting transposable elements, RNNotate was more accurate than both RepeatMasker and RepeatScout. Additionally, we found that RNNotate was significantly less sensitive when trained and run without DNA methylation, validating our hypothesis. To the best of our knowledge, we are not only the first group to use recurrent neural networks for eukaryotic genome annotation, but we also innovated in the data space by using DNA methylation patterns for prediction.
dc.language	en
dc.publisher	[Bloomington, Ind.] : Indiana University
dc.subject	genome annotation
dc.subject	deep learning
dc.subject	rnn
dc.subject	dna methylation
dc.subject	epigenetics
dc.title	Combining DNA Methylation with Deep Learning Improves Sensitivity and Accuracy of Eukaryotic Genome Annotation
dc.type	Doctoral Dissertation

Files in this item

Files	Size	Format	View
zynda_dissertation_20-04-18.pdf	7.015Mb	application/pdf	View/Open

This item appears in the following Collection(s)

IUScholarWorks [635]
Indiana University Bloomington
