dc.contributor.advisor Ranathunga S
dc.contributor.author Sewwandi KAU
dc.date.accessioned 2022
dc.date.available 2022
dc.date.issued 2022
dc.identifier.citation Sewwandi, K.A.U. (2022). Duplicate bug report detection using pre-trained language models [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21592
dc.identifier.uri http://dl.lib.uom.lk/handle/123/21592
dc.description.abstract Software testing and defect reporting are significant parts of software development and maintenance. Defects are identified and reported in a bug tracking system such as JIRA or Bugzilla. The reported defects are then triaged by an expert who understands the repository, the system, and the developers, and who assigns each defect to a developer to fix. During defect reporting, the same bug may be reported more than once, so identifying duplicate bug reports is a crucial task. Manual labeling of duplicate defects is time-consuming, may wrongly identify defects as duplicate bug reports, and increases the cost of software maintenance. Automated duplicate bug report detection is therefore important. This research proposes a duplicate bug report classification methodology that uses the pre-trained language models BERT and XLNet, with a Multi-Layer Perceptron (MLP) as the deep learning classifier, for duplicate bug detection. We evaluated the approach on publicly available bug report datasets from Eclipse, NetBeans, and OpenOffice. The selected models outperformed previously proposed systems for the same task, and among them the approach using BERT embeddings gave the best results. Further experiments showed that BERT is capable of domain adaptation: even when BERT was fine-tuned on a different bug report dataset, it could still detect duplicate bugs in an unseen dataset. Finally, multi-stage classification was performed with a Convolutional Neural Network (CNN) model and a BERT model on the Eclipse and NetBeans datasets and on a combined Eclipse and NetBeans dataset. The approach using the combined dataset outperformed the baseline approach. en_US
dc.language.iso en en_US
dc.subject DUPLICATE BUG DETECTION en_US
dc.subject BERT en_US
dc.subject XLNET en_US
dc.subject MLP en_US
dc.subject CNN en_US
dc.subject DOMAIN ADAPTATION en_US
dc.subject MULTI-STAGE CLASSIFICATION en_US
dc.subject COMPUTER SCIENCE & ENGINEERING -Dissertation en_US
dc.subject COMPUTER SCIENCE -Dissertation en_US
dc.subject INFORMATION TECHNOLOGY -Dissertation en_US
dc.title Duplicate bug report detection using pre-trained language models en_US
dc.type Thesis-Abstract en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree MSc In Computer Science and Engineering en_US
dc.identifier.department Department of Computer Science and Engineering en_US
dc.date.accept 2022
dc.identifier.accno TH4977 en_US
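
The abstract outlines a pair-classification pipeline: each bug report is encoded with a pre-trained language model, and an MLP decides whether a pair of reports is a duplicate. The following is a minimal sketch of the MLP stage only, assuming each report has already been turned into a fixed-size embedding vector (for instance a BERT-base [CLS] vector of size 768). The pair-feature scheme [u, v, |u - v|, u * v], the hidden size, and the names `pair_features` and `PairMLP` are hypothetical choices for illustration, not details taken from the thesis, and the weights are random, so the printed probability carries no meaning until the network is trained.

```python
import numpy as np


def pair_features(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Combine two report embeddings into one pair vector.

    Hypothetical scheme: raw vectors, absolute difference, element-wise product.
    """
    return np.concatenate([u, v, np.abs(u - v), u * v])


class PairMLP:
    """One-hidden-layer MLP scoring a bug report pair (weights are untrained)."""

    def __init__(self, embed_dim: int, hidden_dim: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        in_dim = 4 * embed_dim  # pair_features concatenates four vectors
        self.w1 = rng.normal(0.0, 0.02, size=(in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.02, size=(hidden_dim, 1))
        self.b2 = np.zeros(1)

    def predict_proba(self, u: np.ndarray, v: np.ndarray) -> float:
        """Return the estimated probability that the two reports are duplicates."""
        x = pair_features(u, v)
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        logit = h @ self.w2 + self.b2               # single output unit
        return float(1.0 / (1.0 + np.exp(-logit[0])))  # sigmoid


if __name__ == "__main__":
    dim = 768  # BERT-base hidden size
    rng = np.random.default_rng(1)
    report_a = rng.normal(size=dim)  # placeholder for a real report embedding
    report_b = rng.normal(size=dim)  # placeholder for a real report embedding
    clf = PairMLP(embed_dim=dim)
    p = clf.predict_proba(report_a, report_b)
    print(0.0 <= p <= 1.0)  # True
```

Concatenating the absolute difference and element-wise product alongside the raw vectors is a common way to let a small classifier compare two sentence embeddings; the thesis may combine or fine-tune the embeddings differently.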