Sumanaweera, DN; Doole, FF; Pathiraja, DP; Deshapriya, GGK; Dias, G 2018-09-19T21:08:16Z 2018-09-19T21:08:16Z
dc.description.abstract Many computer systems, especially in corporations, contain large numbers of documents such as letters, reports and presentations. Many of these documents exist in several versions. Such data needs to be synchronized with branch offices and mobile devices, often over slow and expensive connections. However, because many documents are stored in an already compressed format, it is difficult to compress them further by exploiting their hidden redundancies. We present a novel approach named RepoZip, which improves the compression achieved by an existing compression algorithm over a document collection by exploiting inter-document meta-data and content-level redundancies. It concentrates on compressing OOXML documents, which are constructed by archiving a hierarchy of meta-data files, and PDF documents, which include deflated content streams. The RepoZip approach thereby achieves larger compression gains over OOXML or PDF document collections by exploiting meta-data level similarities that usually go undetected. en_US
dc.language.iso en en_US
dc.subject Keywords: lossless compression; meta-data similarity; OOXML; PDF; clusters; generalized suffix tree en_US
dc.title RepoZip : a technique for lossless compression of document collections en_US
dc.type Conference-Abstract en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Department of Computer Science and Engineering en_US
dc.identifier.year 2015 en_US
dc.identifier.conference Moratuwa Engineering Research Conference - MERCon 2015 en_US Moratuwa, Sri Lanka en_US
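
The abstract's starting observation, that documents stored in an already compressed format resist further compression even when they share content, can be illustrated with a small sketch. This is not the RepoZip algorithm: the record gives no implementation details beyond the keywords, and the clustering and generalized suffix tree steps are not attempted here. The sketch assumes Python's standard `zlib` module as a stand-in for deflated document streams, and the vocabulary, document sizes and variable names are all hypothetical.

```python
import random
import string
import zlib

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical vocabulary standing in for real document text.
vocab = ["".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 8)))
         for _ in range(50)]


def text(n_words: int) -> bytes:
    """Return n_words random vocabulary words as ASCII bytes."""
    return " ".join(rng.choices(vocab, k=n_words)).encode("ascii")


# Five synthetic "versions": a unique part followed by shared content.
shared = text(1000)
documents = [text(200 + 50 * i) + b"\n" + shared for i in range(5)]

# Each document is stored already deflated, as PDF content streams are.
stored = [zlib.compress(doc, 9) for doc in documents]

# Naive approach: compress the already-compressed blobs together.
naive_size = len(zlib.compress(b"".join(stored), 9))

# Alternative: inflate first, so shared content is visible to the compressor.
inflated = b"".join(zlib.decompress(blob) for blob in stored)
joint_size = len(zlib.compress(inflated, 9))

print("sum of stored sizes:      ", sum(len(blob) for blob in stored))
print("recompressed stored blobs:", naive_size)
print("inflated, then compressed:", joint_size)
```

In this toy setting the shared text only becomes usable once the streams are inflated, which is the kind of hidden redundancy the abstract refers to. The documents are kept small on purpose, because deflate can only reference the preceding 32 KB of input.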
