I’m a big fan of 7-Zip. It isn’t the best-looking application ever written, but that could be because its creator, Igor Pavlov, is concerned much more with its compression methods than its interface. 7-Zip has its own container format, but more important is the LZMA compression algorithm that Igor wrote and put into the public domain.

I decided to do some quick and dirty benchmarks to track the progress of LZMA/7-Zip over time. I went back as far as Igor supplied binaries, including one from the very old 3.x series. Rather than test every single release between then and now, I used only “stable” releases, with two exceptions: version 4.65, which is the latest version of any sort, and 4.66, which uses an alpha version of Igor’s new LZMA2 codec (and, as you’ll see, provides a definite performance improvement).

I used Igor’s Timer utility to time the process (global time was reported). The corpus in this case was the Linux kernel source, v2.6.28. I conducted these tests on a RAM disk to eliminate hard disk latency issues (especially for decompression, which improved by about 25% from my initial HDD-based tests). My rig is an Intel Core 2 Quad Q6600 (2.4 GHz) with 4 GB of RAM (1 GB dedicated to the RAM disk), running Vista SP1 x64.

The command line setup was an approximation of the 7-Zip GUI’s “ultra” settings: -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on, letting the archiver auto-choose the number of threads to spawn.
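For anyone who wants to repeat the test without Igor’s Timer utility, here is a minimal sketch of the same invocation wrapped in a Python timer. The switches are the ones listed above; the executable name `7z`, the archive and directory names, and the helper function names are all assumptions for illustration, and wall-clock timing from Python will not match Timer’s “global time” exactly.

```python
import subprocess
import time

# Switches copied from the post: an approximation of the GUI's "ultra" preset.
ULTRA_SWITCHES = ["-t7z", "-m0=lzma", "-mx=9", "-mfb=64", "-md=32m", "-ms=on"]


def build_compress_command(archive, source, exe="7z"):
    """Return the argument list for adding `source` to `archive`.

    `exe` is an assumed executable name; point it at whichever
    7-Zip build you want to measure.
    """
    return [exe, "a", *ULTRA_SWITCHES, archive, source]


def build_extract_command(archive, out_dir, exe="7z"):
    """Return the argument list for extracting `archive` into `out_dir`."""
    return [exe, "x", archive, f"-o{out_dir}", "-y"]


def time_command(args):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical paths; substitute your own corpus and RAM disk location.
    cmd = build_compress_command("linux.7z", "linux-2.6.28")
    print(" ".join(cmd))
    # Uncomment to actually run the benchmark:
    # print(f"encode: {time_command(cmd):.3f} s")
```

Keeping the command construction separate from the timing makes it easy to swap in a different 7-Zip binary per run, which is all the comparison needs.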

The Data

LZMA Efficiency

7-Zip version    Encoding time (s)    Decoding time (s)
3.13             541.271              43.379
4.20             531.457              44.040
4.23             527.871              42.425
4.32             341.290              42.126
4.42             219.451              42.211
4.57             174.064              44.163
4.62             170.973              42.836
4.65             170.917              43.058
4.66 (LZMA2)     126.259              46.663
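To put the raw timings in perspective, the short sketch below turns them into speedup factors relative to 3.13. The numbers are copied straight from the table; the function names are my own, and this is just one way to summarize the data.

```python
# (version, encoding seconds, decoding seconds), copied from the table.
RESULTS = [
    ("3.13", 541.271, 43.379),
    ("4.20", 531.457, 44.040),
    ("4.23", 527.871, 42.425),
    ("4.32", 341.290, 42.126),
    ("4.42", 219.451, 42.211),
    ("4.57", 174.064, 44.163),
    ("4.62", 170.973, 42.836),
    ("4.65", 170.917, 43.058),
    ("4.66 (lzma2)", 126.259, 46.663),
]


def encode_speedups(results, baseline="3.13"):
    """Map each version to baseline_encode_time / its_encode_time."""
    base = next(enc for ver, enc, _ in results if ver == baseline)
    return {ver: base / enc for ver, enc, _ in results}


def decode_spread(results):
    """Return the (fastest, slowest) decoding times across all versions."""
    times = [dec for _, _, dec in results]
    return min(times), max(times)


if __name__ == "__main__":
    for ver, factor in encode_speedups(RESULTS).items():
        print(f"{ver:<14} {factor:5.2f}x the encoding speed of 3.13")
    fastest, slowest = decode_spread(RESULTS)
    print(f"decoding ranged from {fastest} s to {slowest} s")
```

By this measure 4.65 encodes about 3.2 times as fast as 3.13 and 4.66 about 4.3 times as fast, while decoding stays within a band of roughly 42 to 47 seconds.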

The Analysis

LZMA efficiency graph

Without conducting a more thorough battery of tests on a variety of configurations, it’s difficult to say with certainty just where the performance improvements came from, be it better use of threading or multiple processors, general algorithmic improvements, or something else. I also don’t know whether the gains come from improvements to LZMA itself as Igor was finalizing it, or just from the code quality of 7-Zip, which implements LZMA.

In any case, the improvements since 3.13 are very clear (remember that lower is better), at least for compression, and for “ultra” settings. Decompression remained largely similar, which surprised me. Some of these results might be directly tied to the number and type of files that were compressed in this case: 4.66, for instance, improves decompression speed for incompressible files, but no such files exist here since the corpus is source code.

Hats off to Igor Pavlov for his steady improvement on both a really great compression standard and one of my favorite pieces of software for Windows.

§3580 · February 9, 2009

1 Comment to “Tracking LZMA efficiency”

  1. everling says:

    I followed your link from Coding Horror.

    Thank you for the great work in revealing Igor’s continuing optimisation efforts. It gives me warm and fuzzy feelings to use such a well loved tool. ^_^
