Classification of long non-coding RNAs

Internship Description

Long non-coding RNAs (lncRNAs) have been found to perform various functions in a wide variety of important biological processes. To ease the interpretation of lncRNA functions and enable deeper mining of these transcribed sequences, it is important to classify lncRNAs into distinct groups. lncRNA classification has recently attracted considerable attention. The main technical difficulties are 1) the limited number of known lncRNAs (a small training sample size), and 2) the widely varying lengths of lncRNA sequences. This project aims to apply the string kernel algorithms developed in Prof. Gao's group to the lncRNA classification problem and to improve them further.

Deliverables/Expectations

The visiting student for this project is expected to complete the following deliverables:
1. Give a thorough literature review of lncRNA classification methods and of potential machine learning methods that can be applied to this problem.
2. Become familiar with the string kernel algorithms developed in Prof. Gao's group.
3. Gather an lncRNA dataset to be used as the benchmark set for this research.
4. Conduct a comprehensive comparative study of the state-of-the-art methods on the benchmark set.
5. Apply the string kernel algorithms to lncRNA classification and evaluate their performance.
6. If necessary, improve the string kernel algorithms to achieve better performance.
7. Write a report summarizing the results.

Faculty Name

Xin Gao

Field of Study

Computer science, bioinformatics, electrical engineering, applied mathematics