Compressing small payloads

When you’re contemplating about using lossless compression in a part of your application, your probably more than aware of the compression algorithms that are available today. Historically, deflate (as used in plain old ZIP files) has been very popular with many free and commercial implementations available. Since .NET 2.0 the DeflateStream class was added to the .NET Framework. This allowed the developer to include compression in an application in a relatively simple manner without introducing extra dependencies.

More recently, a whole plethora of algorithms are popular: bzip2, PPMd, LZMA (used in 7zip), and so forth. These more recent algorithms provide greater compression ratios, often trading in some speed to allow more number crunching.

But these characteristics apply to large payloads: backups, lossless image compression, installation media and so forth. When you need to compress small payloads (anywhere from possibly a couple of bytes to a few hundred bytes) the picture changes. Let’s take a look at what happens.

In some particular cases, compressing such small payloads may have a large impact on the performance of an application. Why save a couple of bytes, you say? Well, for example, compressing the ViewState in an ASP.NET application may be beneficial for the overall performance of the web application. The ViewState is (or should be) encrypted, making it impossible for any server-side page result compression to achieve a good compression ratio on that part of the page (you know, the base64 formatted string in a hidden input element). The ViewState should always be kept to a minimum, and compressing even what’s left may have a great impact on overall application performance. Add all those bytes together in a typical browsing session, combined with a relatively slow network connection (think 3G on a mobile platform) and you’ll see the benefit.

I wrote a small console application, attached to this blog post, that performs some testing on various implementations of popular algorithms that are available at no cost. I’m using pretty nonsensical payloads, but still it will prove my point.

Here are the results:

For an input length of 4 bytes:

Method NameOutput LengthRatioTime
.NET DeflateStream1092725%0,50ms
#ZipLib Deflate12300%56ms
#ZipLib Bzip2421050%17ms
7Zip LZMA22550%57ms

For a rather repetitive input with length of 61 bytes:

Method NameOutput LengthRatioTime
.NET DeflateStream115189%0,02ms
#ZipLib Deflate1830%1,5ms
#ZipLib Bzip25387%0,12ms
7Zip LZMA2948%21ms

For an input length of 57 bytes:

Method NameOutput LengthRatioTime
.NET DeflateStream151265%0,06ms
#ZipLib Deflate61107%1,7ms
#ZipLib Bzip284147%0,2ms
7Zip LZMA72126%22ms

For an input length of 472 bytes:

Method NameOutput LengthRatioTime
.NET DeflateStream41087%0,05ms
#ZipLib Deflate27659%0,22ms
#ZipLib Bzip230364%0,40ms
7Zip LZMA33270%34ms

For an input length of 1052 bytes:

Method NameOutput LengthRatioTime
.NET DeflateStream66864%0,1ms
#ZipLib Deflate51849%0,32ms
#ZipLib Bzip254452%0,96ms
7Zip LZMA58956%22ms

Our first attention must be drawn to the testing method: each compression method for each input is only performed once. This may have some impact in the speed comparison, especially in the result for test with smallest input as I suspect JIT compilation still needed to be done.

Next, my biggest conclusion would be that for small payloads (even up to a kilobyte), 7zip provides no additional compression benefit over the other algorithms. On the contrary, it never exceeds #ZipLib’s deflate implementation with these input samples and it takes considerably longer to complete.

As a surprise, the .NET implementation of deflate adds considerable overhead in the smallest payloads. Using Reflector, I was able to find out why: the compressor always adds a large Huffman dictionary at the front (called the ‘preamble’) that contains little useful information for the compression. #ZipLib’s implementation incurs the smallest overhead: just 12 bytes for an input size of 4. Granted, it seems like such a waste at 300% increase. If it matters, it’s usually most beneficial to compress anyway, and determine which payload (the compressed or the uncompressed one) has the smallest size and include a small token to indicate whether decompression is required.

My final conclusion: always test using real-world data and measure the benefits, or you might be in for a little surprise in the end!

Sample Code

The article contains sample code project(s).
You must be logged in to view or download sample code.
Sign in now

Comments

Leave a comment
You must be logged in to post comments.
Sign in now
 
 
Technical
Business
rss feed