When you’re contemplating about using lossless compression in a part of your application, your probably more than aware of the compression algorithms that are available today. Historically, deflate (as used in plain old ZIP files) has been very popular with many free and commercial implementations available. Since .NET 2.0 the DeflateStream class was added to the .NET Framework. This allowed the developer to include compression in an application in a relatively simple manner without introducing extra dependencies.
More recently, a whole plethora of algorithms are popular: bzip2, PPMd, LZMA (used in 7zip), and so forth. These more recent algorithms provide greater compression ratios, often trading in some speed to allow more number crunching.
But these characteristics apply to large payloads: backups, lossless image compression, installation media and so forth. When you need to compress small payloads (anywhere from possibly a couple of bytes to a few hundred bytes) the picture changes. Let’s take a look at what happens.
In some particular cases, compressing such small payloads may have a large impact on the performance of an application. Why save a couple of bytes, you say? Well, for example, compressing the ViewState in an ASP.NET application may be beneficial for the overall performance of the web application. The ViewState is (or should be) encrypted, making it impossible for any server-side page result compression to achieve a good compression ratio on that part of the page (you know, the base64 formatted string in a hidden input element). The ViewState should always be kept to a minimum, and compressing even what’s left may have a great impact on overall application performance. Add all those bytes together in a typical browsing session, combined with a relatively slow network connection (think 3G on a mobile platform) and you’ll see the benefit.
I wrote a small console application, attached to this blog post, that performs some testing on various implementations of popular algorithms that are available at no cost. I’m using pretty nonsensical payloads, but still it will prove my point.
Here are the results:
For an input length of 4 bytes:
| Method Name | Output Length | Ratio | Time |
| .NET DeflateStream | 109 | 2725% | 0,50ms |
| #ZipLib Deflate | 12 | 300% | 56ms |
| #ZipLib Bzip2 | 42 | 1050% | 17ms |
| 7Zip LZMA | 22 | 550% | 57ms |
For a rather repetitive input with length of 61 bytes:
| Method Name | Output Length | Ratio | Time |
| .NET DeflateStream | 115 | 189% | 0,02ms |
| #ZipLib Deflate | 18 | 30% | 1,5ms |
| #ZipLib Bzip2 | 53 | 87% | 0,12ms |
| 7Zip LZMA | 29 | 48% | 21ms |
For an input length of 57 bytes:
| Method Name | Output Length | Ratio | Time |
| .NET DeflateStream | 151 | 265% | 0,06ms |
| #ZipLib Deflate | 61 | 107% | 1,7ms |
| #ZipLib Bzip2 | 84 | 147% | 0,2ms |
| 7Zip LZMA | 72 | 126% | 22ms |
For an input length of 472 bytes:
| Method Name | Output Length | Ratio | Time |
| .NET DeflateStream | 410 | 87% | 0,05ms |
| #ZipLib Deflate | 276 | 59% | 0,22ms |
| #ZipLib Bzip2 | 303 | 64% | 0,40ms |
| 7Zip LZMA | 332 | 70% | 34ms |
For an input length of 1052 bytes:
| Method Name | Output Length | Ratio | Time |
| .NET DeflateStream | 668 | 64% | 0,1ms |
| #ZipLib Deflate | 518 | 49% | 0,32ms |
| #ZipLib Bzip2 | 544 | 52% | 0,96ms |
| 7Zip LZMA | 589 | 56% | 22ms |
Our first attention must be drawn to the testing method: each compression method for each input is only performed once. This may have some impact in the speed comparison, especially in the result for test with smallest input as I suspect JIT compilation still needed to be done.
Next, my biggest conclusion would be that for small payloads (even up to a kilobyte), 7zip provides no additional compression benefit over the other algorithms. On the contrary, it never exceeds #ZipLib’s deflate implementation with these input samples and it takes considerably longer to complete.
As a surprise, the .NET implementation of deflate adds considerable overhead in the smallest payloads. Using Reflector, I was able to find out why: the compressor always adds a large Huffman dictionary at the front (called the ‘preamble’) that contains little useful information for the compression. #ZipLib’s implementation incurs the smallest overhead: just 12 bytes for an input size of 4. Granted, it seems like such a waste at 300% increase. If it matters, it’s usually most beneficial to compress anyway, and determine which payload (the compressed or the uncompressed one) has the smallest size and include a small token to indicate whether decompression is required.
My final conclusion: always test using real-world data and measure the benefits, or you might be in for a little surprise in the end!
Sample Code
The article contains sample code project(s).
You must be logged in to view or download sample code.
Sign in now