Institutional-Repository, University of Moratuwa.  

A Model based approach for cluster traditional rice varieties of Sri Lanka

dc.contributor.advisor Wickramarachchi, N
dc.contributor.author Silva, MDRL
dc.date.accessioned 2016-10-24T15:10:07Z
dc.date.available 2016-10-24T15:10:07Z
dc.description.abstract Highly developed modern techniques now produce enormous volumes of data, and biologists have shown great interest in clustering biological data to detect its underlying patterns, since biological experiments alone have failed to correctly identify the hidden information and divergence patterns that exist in the data. This study aims to (1) assist in clustering biologically similar sequences to detect divergence patterns in rice genomic data, by developing a program that uses a model-based clustering algorithm based on the Chinese restaurant process, which was originally proposed for clustering gene expression data; and (2) evaluate the performance of a pairwise distance matrix of rice genome sequences, calculated from the 12-dimensional natural vector of each DNA sequence, as the similarity measure in cluster analysis. The developed program, based on the proposed model-based clustering method, was executed on an AFLP profile data set containing features of 53 Sri Lankan traditional and wild rice varieties in order to identify the genetic divergence among them. Both a statistical and a biological cluster evaluation were carried out to validate the results obtained. The statistical evaluation used the Bayes ratio to measure the tightness of the clusters formed. The biological evaluation was conducted with the help of domain experts and research work done by the institute of rice of Sri Lanka. The results showed that the proposed algorithm is capable of identifying highly similar varieties of rice and showing their divergence patterns. To assess how well the natural vector method captures the information encoded in rice genome sequences, 10 rice disease resistance genes belonging to three different protein families from the Rice Genome Annotation Project database were used. The results showed that the pairwise distance matrix calculated with the 12-dimensional natural vector method gives efficient results compared to traditional proximity matrices. It also revealed that fixed-length subsequences no longer than the minimum total length of the selected sequences are also highly capable of capturing the information encoded in the full-length sequences, regardless of the subsequence length. en_US
dc.language.iso en en_US
dc.subject Model-Based clustering, Genetic Diversity en_US
dc.title A Model based approach for cluster traditional rice varieties of Sri Lanka en_US
dc.type Thesis-Abstract en_US
dc.identifier.faculty Engineering en_US MPhil en_US
dc.identifier.department Department of Computer Science & Engineering en_US 2015
dc.identifier.accno 109892 en_US
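
The 12-dimensional natural vector named in the abstract can be illustrated with a short sketch. It is not the thesis program: it assumes the commonly used definition of the natural vector (for each nucleotide, its count, its mean 1-based position, and its normalized second central moment of position), and it assumes Euclidean distance between vectors, which the abstract does not specify. The function names natural_vector and distance_matrix are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """Return a 12-dimensional natural vector for a DNA sequence.

    Layout (assumed): [n_A, n_C, n_G, n_T, mu_A, mu_C, mu_G, mu_T,
                       D2_A, D2_C, D2_G, D2_T]
    where n_k is the count of nucleotide k, mu_k its mean 1-based
    position, and D2_k = sum((pos - mu_k)^2) / (n_k * n).
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in NUCLEOTIDES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            # Nucleotide absent: use zeros for all three components.
            counts.append(0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(n_k)
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def distance_matrix(seqs):
    """Pairwise Euclidean distances between natural vectors (assumed metric)."""
    vectors = [natural_vector(s) for s in seqs]
    return [[math.dist(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the thesis.
    for row in distance_matrix(["ACGTACGT", "AACCGGTT", "ACGTACGA"]):
        print(["%.3f" % d for d in row])
```

Because every sequence maps to a vector of the same length, sequences of different lengths can be compared without alignment, and the resulting matrix can be passed to any distance-based clustering routine.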