Duplicate bug report detection using pre - trained language models

Sewwandi KAU

dc.contributor.advisor	Ranathunga S
dc.contributor.author	Sewwandi KAU
dc.date.accessioned	2022
dc.date.available	2022
dc.date.issued	2022
dc.identifier.citation	Sewwandi, K.A.U. (2022). Duplicate bug report detection using pre - trained language models [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21592
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/21592
dc.description.abstract	Software testing and defect reporting are essential parts of software development and maintenance. Defects are identified and reported in a bug tracking system such as JIRA or Bugzilla. Reported defects are then triaged by an expert who understands the repository, the system, and the developers, and who assigns each defect to a developer to fix. During defect reporting, duplicate bugs can be reported, and identifying them is a crucial task. Manual labeling of duplicate defects is time-consuming, can mislabel defects as duplicate bug reports, and increases the cost of software maintenance. Automated duplicate bug report detection is therefore very valuable. This research proposes a duplicate bug report classification methodology that uses the pre-trained language models BERT and XLNet, with a Multi-Layer Perceptron as the deep learning classifier, for duplicate bug detection. We tested it on publicly available bug report datasets for Eclipse, NetBeans, and OpenOffice. The selected models were shown to outperform the previously proposed systems for the same task. Among them, the approach using BERT embeddings showed the best results. Further experiments showed that BERT is capable of domain adaptation: even when BERT was fine-tuned on different bug report datasets, it was still capable of detecting duplicate bugs in an unseen dataset. Finally, multi-stage classification was performed with a Convolutional Neural Network model and a BERT model on the Eclipse and NetBeans datasets and on a combined dataset of Eclipse and NetBeans. The approach using the combined dataset outperformed the baseline approach.	en_US
dc.language.iso	en	en_US
dc.subject	DUPLICATE BUG DETECTION	en_US
dc.subject	BERT	en_US
dc.subject	XLNET	en_US
dc.subject	MLP	en_US
dc.subject	CNN	en_US
dc.subject	DOMAIN ADAPTATION	en_US
dc.subject	MULTI-STAGE CLASSIFICATION	en_US
dc.subject	COMPUTER SCIENCE & ENGINEERING -Dissertation	en_US
dc.subject	COMPUTER SCIENCE -Dissertation	en_US
dc.subject	INFORMATION TECHNOLOGY -Dissertation	en_US
dc.title	Duplicate bug report detection using pre - trained language models	en_US
dc.type	Thesis-Abstract	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	MSc In Computer Science and Engineering	en_US
dc.identifier.department	Department of Computer Science and Engineering	en_US
dc.date.accept	2022
dc.identifier.accno	TH4977	en_US