dc.contributor.author Sumanaweera, DN
dc.contributor.author Doole, FF
dc.contributor.author Pathiraja, DP
dc.contributor.author Deshapriya, GGK
dc.contributor.author Dias, G
dc.date.accessioned 2018-09-19T21:08:16Z
dc.date.available 2018-09-19T21:08:16Z
dc.identifier.uri http://dl.lib.mrt.ac.lk/handle/123/13577
dc.description.abstract Many computer systems, especially in corporations, contain large numbers of documents such as letters, reports and presentations. Many of these documents exist in several versions. Such data needs to be synchronized with branch offices and mobile devices, often over slow and expensive connections. However, because many documents are stored in an already compressed format, it is difficult to compress them further by exploiting their hidden redundancies. We present a novel approach named RepoZip, which improves the compression achieved by an existing compression algorithm over a document collection by exploiting inter-document meta-data and content-level redundancies. It concentrates on compressing OOXML documents, which are constructed by archiving a hierarchy of meta-data files, and PDF documents, which include deflated content streams. As a result, the RepoZip approach achieves larger compression gains over OOXML or PDF document collections by exploiting meta-data-level similarities that usually go undetected. en_US
dc.language.iso en en_US
dc.subject lossless compression; meta-data similarity; OOXML; PDF; clusters; generalized suffix tree en_US
dc.title RepoZip : a technique for lossless compression of document collections en_US
dc.type Conference-Abstract en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Department of Computer Science and Engineering en_US
dc.identifier.year 2015 en_US
dc.identifier.conference Moratuwa Engineering Research Conference - MERCon 2015 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.email dinithi.10@cse.mrt.ac.lk en_US
dc.identifier.email fahima.10@cse.mrt.ac.lk en_US
dc.identifier.email daham.10@cse.mrt.ac.lk en_US
dc.identifier.email kelum.10@cse.mrt.ac.lk en_US
dc.identifier.email gihan@uom.lk en_US
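
The premise of the abstract, that already-compressed documents hide redundancy shared across documents, can be illustrated with a small experiment. The sketch below is not the RepoZip algorithm (the record does not describe its clustering or suffix-tree steps). It only compares two ways of compressing a tiny collection with zlib: the packages as stored, and their member contents inflated first. The helper names (`make_package`, `inflate_members`, `compare`) and the synthetic two-version document are assumptions made for illustration.

```python
import io
import random
import zipfile
import zlib


def make_package(members):
    """Build an in-memory ZIP with deflated members (a stand-in for an OOXML file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def inflate_members(package):
    """Return the uncompressed member contents of a ZIP package, concatenated."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return b"".join(zf.read(name) for name in sorted(zf.namelist()))


def compare(packages):
    """Compressed collection size: packages as stored vs. members inflated first."""
    as_is = len(zlib.compress(b"".join(packages), 9))
    inflated = len(zlib.compress(b"".join(inflate_members(p) for p in packages), 9))
    return as_is, inflated


# Synthetic data (hypothetical): two versions of one document with similar metadata.
rng = random.Random(0)
letters = "abcdefghijklmnopqrstuvwxyz"
vocab = ["".join(rng.choice(letters) for _ in range(6)) for _ in range(64)]
body = " ".join(rng.choice(vocab) for _ in range(600))
meta = "<props><company>Example Corp</company><template>report</template></props>"

v1 = make_package({
    "docProps/app.xml": (meta + "<rev>1</rev>").encode(),
    "word/document.xml": ("<doc>" + body + "</doc>").encode(),
})
v2 = make_package({
    "docProps/app.xml": (meta + "<rev>2</rev>").encode(),
    "word/document.xml": ("<doc>REVISED " + body + " appended text</doc>").encode(),
})

as_is, inflated = compare([v1, v2])
print(f"packages as stored: {as_is} bytes; members inflated first: {inflated} bytes")
```

Because each package member is already deflated, a general-purpose compressor run over the stored packages sees mostly high-entropy bytes and finds little to match between the two versions. Once the members are inflated, the shared text falls within the compressor's window and the second version costs far less to encode.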