<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>A Modest Construct &#187; Linux</title>
	<atom:link href="http://heliologue.com/tag/linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://heliologue.com</link>
	<description></description>
	<lastBuildDate>Fri, 03 Feb 2012 17:18:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Linux Command-Line Compressors Compared</title>
		<link>http://heliologue.com/2011/05/03/linux-command-line-compressors-compared/</link>
		<comments>http://heliologue.com/2011/05/03/linux-command-line-compressors-compared/#comments</comments>
		<pubDate>Tue, 03 May 2011 13:22:31 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=7042</guid>
		<description><![CDATA[A long time ago, I ran a comparison of various command-line compressors in Linux. Recently, intrigued by the rise of parallel computing and the emergence of multi-processor versions of old *nix favorites like gzip and bzip2, I thought I&#8217;d give the benchmark another go. The Setup My machine is an Intel Core 2 Quad Q6600 [...]]]></description>
			<content:encoded><![CDATA[<p>A long time ago, I ran a <a href="http://heliologue.com/2007/02/01/linux-command-line-compressor-benchmarks/">comparison</a> of various command-line compressors in Linux.  Recently, intrigued by the rise of parallel computing and the emergence of multi-processor versions of old *nix favorites like gzip and bzip2, I thought I&#8217;d give the benchmark another go.</p>
<p><span id="more-7042"></span></p>
<h3>The Setup</h3>
<p>My machine is an Intel Core 2 Quad Q6600 [2.4GHz] on a Gigabyte GA-N680SLI-DQ6 (nVidia 680i), with 4GB of Corsair XMS2 (PC2 6400).  The tests were run on a vanilla installation of Ubuntu 10.10 x64, fully updated.  I compiled the compressors myself from source code, so that I would have the latest and greatest at the time of testing.</p>
<p>I did not use a RAM disk, since I wanted all of my RAM available for compression (some of the tests used aggressive settings).  I did, however, use a freshly-formatted SATA disk for the purpose.  For further information, see the methodology and results.</p>
<p>Timing was done with the GNU <code>time</code> command.</p>
<h3>The Compressors</h3>
<ul>
<li><a rel="external" href="http://www.gnu.org/software/gzip/"><b>gzip</b></a>. Perhaps the most well-known compressor on Linux, gzip is a DEFLATE-based compressor which offers moderate compression at a very fast speed. Note: I made the mistake of retrieving the sources from <a href="http://www.gzip.org">www.gzip.org</a>, which has links to v1.2.4, from 1999.  The most recent stable version is 1.4, from <a href="http://ftp.gnu.org/gnu/gzip/">GNU&#8217;s servers</a>. The version reflected in the benchmark, therefore, is out of date.</li>
<li><a rel="external" href="http://bzip.org/"><b>bzip2</b></a>. An old favorite, bzip2 is a basic block-level compressor which is more aggressive than its cousin, gzip, at the cost of speed.</li>
<li><a rel="external" href="http://tukaani.org/xz/"><b>xz</b></a>.  A relatively new compressor, xz arose from the ashes of <a href="http://tukaani.org/lzma/">LZMAUtils</a>, an implementation of the LZMA compression spec from Igor Pavlov&#8217;s 7-zip project. xz is the command-line equivalent, and similar in usage to gzip and bzip2; more to the point, it&#8217;s been accepted by the GNU toolchain and now has its own filter in tar (-J).</li>
<li><a rel="external" href="http://rzip.samba.org/"><b>rzip</b></a>.  Originally written by Andrew Tridgell (of Samba fame, and who indirectly caused <i>git</i>, when you think about it) as a doctoral project.  It functions similarly to gzip and bzip2, according to its author, but attempts to take advantage of &#8220;long-distance redundancies&#8221; in larger files; that is, where bzip2 may only process 900K chunks at a time, rzip will seek to look beyond that in order to eke out additional compression.</li>
<li><a rel="external" href="http://www.lzop.org/"><b>lzop</b></a>. One of the Lempel-Ziv-Oberhumer (LZO) compressors, it focuses on decompression speed rather than pure compression.</li>
<li><a rel="external" href="http://ncompress.sourceforge.net/"><b>compress</b></a>. Though not really a contender in any real sense, I included the old <i>compress</i> program for comparative purposes.  Compress is a fast LZW compressor which creates .Z files, and can be triggered as a filter in tar with -Z.  The implementation used is <a href="http://ncompress.sourceforge.net/">ncompress</a>.</li>
<li><a rel="external" href="http://www.info-zip.org/"><b>zip</b></a>.  An old standby, zip creates, well, .zip files, using the DEFLATE algorithm. One difference between zip and gzip is that zip creates a structured archive (no tar necessary).</li>
<li><a rel="external" href="http://freshmeat.net/projects/lzip/"><b>lzip</b></a>. Another LZMA-based compressor, lzip is heavily asymmetric, emphasizing small decompression times at the expense of larger initial compression time.</li>
<li><a rel="external" href="http://p7zip.sourceforge.net/"><b>p7zip</b></a>. The *nix version of Igor Pavlov&#8217;s 7-Zip for Windows, p7zip has a slew of options (including different compressors, such as ppmd), but we&#8217;ll be focusing on lzma2.</li>
<li><a rel="external" href="http://www.zlib.net/pigz/"><b>pigz</b></a>. A parallel implementation of gzip. Like most parallel implementations, pigz sacrifices a small amount of compression efficiency for large gains in speed.
</li>
<li><a rel="external" href="http://freshmeat.net/projects/lbzip2"><b>lbzip2</b></a>. A parallel implementation of bzip2. </li>
<li><a rel="external" href="http://www.compression.ca/pbzip2/"><b>pbzip2</b></a>. Another parallel implementation of bzip2. </li>
<li><a rel="external" href="http://www.quicklz.com/"><b>quicklz (qpress)</b></a>. A very fast LZ-family compression library, QuickLZ was at version 1.5.0 at the time of the benchmark, but its companion compressor program, QPress, was linked against 1.4.1.</li>
<li><a rel="external" href="http://freshmeat.net/projects/lxz"><b>lxz</b></a>.  A parallel implementation of the <code>xz</code> compressor.  By the same creator as <code>lbzip2</code>.  It supports compression <em>only</em>, so the benchmark will not contain decompression numbers for this tool.  The author considers it a stopgap until xz gets multithreading support.</li>
<li><a rel="external" href="http://freshmeat.net/projects/lrzip"><b>lrzip</b></a>. Long ago, a fork of <code>rzip</code>, Con Kolivas&#8217; <code>lrzip</code> has become a full-fledged compressor with multiple possible algorithms to choose from; the only one tested was LZMA.  Note that the version tested here (0.571) was replaced shortly after I ran my benchmarks by v0.600, which was a large rewrite ostensibly improving decompression speed.</li>
<li><a rel="external" href="http://www.nongnu.org/lzip/plzip.html"><b>plzip</b></a>. A straightforward parallelization of the <code>lzip</code> compressor.</li>
</ul>
<div class="info">
<p><strong>A note on options.</strong>  </p>
<p>Most of the compressors tested follow the same basic pattern for specifying compression strength, modeled after the options of gzip and bzip2.  <code>-#</code> specifies compression strength, where # is a number.  On some compressors, <code>0</code> is the lowest; on others, it&#8217;s <code>1</code>.  My approach was to test the lowest strength, the highest strength, and the default strength (either 5 or 6, depending).</p>
<p>On some multithreaded compressors, it was necessary to specify the number of threads or processors to use.  Where this was required, I used a value of <code>4</code>, the number of cores in my processor.
</p></div>
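<p>As a rough illustration of the lowest/default/highest pattern, here is a minimal Python sketch.  It uses the standard-library <code>zlib</code> module (DEFLATE, the algorithm behind gzip) rather than the command-line tools that were actually benchmarked, and both the sample data and the helper name <code>compressed_sizes</code> are made up for the example.</p>

```python
import zlib

def compressed_sizes(data, levels=(1, 6, 9)):
    """Return {level: compressed size in bytes} for each DEFLATE level."""
    return {level: len(zlib.compress(data, level)) for level in levels}

# Hypothetical sample: repetitive, text-like data (not the Wikipedia corpus).
sample = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(20000)
)

sizes = compressed_sizes(sample)
for level, size in sizes.items():
    percent = 100.0 * size / len(sample)
    print("level %d: %d bytes (%.2f%% of original)" % (level, size, percent))
```

<p>Levels 1, 6 and 9 here stand in for the minimum, default and maximum settings; other compressors use different ranges and defaults.</p>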
<h3>The Methodology</h3>
<p><b>Corpus</b>. I downloaded the November 3, 2009 <code>pages-articles.xml.bz2</code> from Wikipedia&#8217;s <a  rel="external" href="http://dumps.wikimedia.org/enwiki/20091103/">mirror</a> and uncompressed it.  23+ GB. Uh oh.  Realizing that I didn&#8217;t have all the time in the world to spend compressing things, I truncated the file to its first 1&#8217;073&#8217;741&#8217;824 bytes (1GB) using the <code>truncate</code> command.</p>
<p>For each compressor/setting, the compression was run three times; the compressed size was recorded; then the decompression was run three times.  Each run&#8217;s time was recorded, and the average and standard deviation were calculated for both compression and decompression.</p>
<p>Note that in some cases, operation speed was limited by the speed of the hard drive (I&#8217;ll cover this in detail in the next section); there isn&#8217;t much to be done about this, other than to note it. Multiple runs should tease out inconsistencies and show where there is a real discrepancy as opposed to a consistent bottleneck.</p>
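<p>For readers without the <code>truncate</code> command to hand, the same kind of corpus can be approximated with a short Python sketch.  Unlike <code>truncate</code>, which shortens a file in place, this copies the first bytes to a new file; the function name, file names and the scaled-down sizes in the demonstration are assumptions for illustration only.</p>

```python
import os
import tempfile

GIB = 1024 ** 3  # 1,073,741,824 bytes, the corpus size used in the article

def truncate_copy(src_path, dst_path, size, chunk=1024 * 1024):
    """Copy at most the first `size` bytes of src_path to dst_path."""
    remaining = size
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while remaining:
            block = src.read(min(chunk, remaining))
            if not block:
                break  # source is shorter than the requested size
            dst.write(block)
            remaining -= len(block)
    return size - remaining  # number of bytes actually written

# Small demonstration on a throwaway file (sizes scaled down from 1 GiB).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dump.xml")
    dst = os.path.join(tmp, "corpus.xml")
    with open(src, "wb") as f:
        f.write(b"x" * 5000)
    written = truncate_copy(src, dst, 1024)
    result_size = os.path.getsize(dst)
    print(written, result_size)
```

<p>For the real corpus one would pass <code>GIB</code> as the size.</p>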
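<p>The three-runs-then-average procedure can be sketched in Python as follows.  This is not the script used for the benchmark: it measures wall-clock time inside one process with <code>time.perf_counter</code> instead of GNU <code>time</code>, and the <code>zlib</code> workload is a hypothetical stand-in for a real compressor invocation.</p>

```python
import statistics
import time
import zlib

def time_runs(func, runs=3):
    """Call func() `runs` times; return (mean, sample standard deviation) in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

# Hypothetical workload standing in for a real compressor invocation.
payload = b"some repetitive sample text " * 200000
mean_s, stdev_s = time_runs(lambda: zlib.compress(payload, 6))
print("mean %.4f s, stdev %.4f s" % (mean_s, stdev_s))
```

<p>With only three samples the standard deviation is a coarse indicator, but it is enough to flag runs that disagree badly with one another.</p>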
<h3>The Results</h3>
<p>The data table itself is far too large to fit in this template, <a rel="external" href="https://spreadsheets.google.com/spreadsheet/lv?key=0AjhZYyrcZ50idGdQaGdPYmF3RGk5emJYOG13ZkYzSHc&#038;hl=en&#038;f=0&#038;rm=full">but you can see it as a Google Spreadsheet here</a>.</p>
<p>There&#8217;s no way to crown a clear winner, since all compressors mean tradeoffs.  If I were as smart and capable as <a href="http://www.maximumcompression.com/">Werner Bergmans</a>, I could have a formula worked up for <a href="http://www.maximumcompression.com/data/summary_mf2.php">compressor efficiency</a>, but I&#8217;m more curious to see the differences between single-threaded compressors and their multithreaded variants.</p>
<p>To get it out of the way:  <code>xz</code> at maximum compression had the best compression ratio of all tested configurations.  At 243&#8217;350&#8217;864 bytes, it shrank the test corpus to 22.7% of its original size.  This came at a heavy cost, though, as compression at this level took an average of 27 minutes, give or take 200 seconds.</p>
<p>The <em>fastest</em> compression, by far, came from QuickLZ, whose lowest setting managed to process the entire 1GB file in an average of <em>6.3 seconds</em>.  Its maximum setting doubled the time to 12 seconds and shaved off an additional 75MB, but the price of such fantastic speed is much less compression: the maximum setting reduced our test corpus to only 44.4% of its original size.</p>
<p class="info">
Remember that there is no <b>best</b> compressor; there is only a compressor that is <b>best</b> for your needs. Compression is all about diminishing returns:  it takes a mere 12 seconds to get from 100% to 44% with QuickLZ.  It takes an additional <em>27 minutes</em> to get from 44% to 22% with <code>xz</code>.
</p>
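<p>To put rough numbers on the diminishing returns, the following Python sketch redoes the arithmetic using only figures quoted in this article.  Treating the whole of the <code>xz</code> run as the &#8220;additional&#8221; time is a simplification, and the variable names are my own.</p>

```python
ORIGINAL = 1073741824          # corpus size in bytes (1 GiB)
XZ_MAX_SIZE = 243350864        # xz at maximum compression, from the article
XZ_MAX_SECONDS = 1643.367      # average xz (max) compression time, from the results
QUICKLZ_MAX_PERCENT = 44.4     # QuickLZ maximum setting, from the article
QUICKLZ_MAX_SECONDS = 12.0     # approximate, as quoted in the article

xz_percent = 100.0 * XZ_MAX_SIZE / ORIGINAL

# Percentage points of the original size removed per second of work.
quicklz_rate = (100.0 - QUICKLZ_MAX_PERCENT) / QUICKLZ_MAX_SECONDS
xz_extra_rate = (QUICKLZ_MAX_PERCENT - xz_percent) / XZ_MAX_SECONDS

print("xz (max): %.2f%% of original" % xz_percent)
print("QuickLZ: %.2f points/s" % quicklz_rate)
print("extra gain from xz: %.4f points/s" % xz_extra_rate)
```

<p>By this crude measure, each second spent in QuickLZ buys several hundred times more size reduction than each second spent going further with <code>xz</code>.</p>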
<p>When looking at the decompression times, one might notice that the scores bottom out at between 12 and 15 seconds.  My guess is this does not reflect the decompression speed as much as the write speed of the hard drive.  In theory, one might see even lower numbers with a RAM disk, but in practice, decompression speeds which outpace storage write speeds are unlikely to be taken advantage of outside a benchmark scenario.</p>
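<p>A quick back-of-the-envelope check of that guess: writing the 1GB corpus in 12 to 15 seconds implies a sustained output rate that can be computed directly.  The sketch below is only arithmetic on the figures above; the disk itself was not measured separately, and the function name is an assumption.</p>

```python
CORPUS_BYTES = 1073741824  # 1 GiB test file from the article
MIB = 1024 * 1024

def throughput_mib_per_s(num_bytes, seconds):
    """Average output rate in MiB/s for writing num_bytes in `seconds`."""
    return num_bytes / MIB / seconds

for seconds in (12.0, 15.0):
    rate = throughput_mib_per_s(CORPUS_BYTES, seconds)
    print("%.0f s corresponds to %.1f MiB/s" % (seconds, rate))
```

<p>Rates in that range are at least consistent with a disk-bound floor rather than a limit of the decompressors themselves.</p>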
<p>The numbers for <code>bzip2</code> vary wildly, for reasons I haven&#8217;t figured out.  I ran a second iteration of three tests for each setting and got similar numbers, so I stuck with my initial measurements.  Other compressors don&#8217;t have such high standard deviations relative to their runtimes, so I don&#8217;t <em>think</em> it was a problem with my setup. </p>
<p>Generally speaking, the parallel implementations of single-threaded compressors fare extremely well.  Let&#8217;s look at the average times for some of the standard ones. Note that compression ratios are given as the percentage of the original filesize, not the percent reduction, so lower is better.</p>
<table>
<caption>bzip2</caption>
<thead>
<tr>
<th>
			Compressor
		</th>
<th>
			Compression Time (s)
		</th>
<th>
			Compression Ratio
		</th>
<th>
			Decompression Time (s)
		</th>
</tr>
</thead>
<tbody>
<tr>
<td>bzip2 (min)</td>
<td>186.509</td>
<td>31.59%</td>
<td>63.342</td>
</tr>
<tr>
<td>lbzip2 (min)</td>
<td>38.852</td>
<td>31.63%</td>
<td>18.520</td>
</tr>
<tr>
<td>pbzip2 (min)</td>
<td>38.072</td>
<td>31.61%</td>
<td>13.533</td>
</tr>
<tr>
<td>bzip2 (default)</td>
<td>169.474</td>
<td>28.14%</td>
<td>64.432</td>
</tr>
<tr>
<td>lbzip2 (default)</td>
<td>42.085</td>
<td>28.15%</td>
<td>19.140</td>
</tr>
<tr>
<td>pbzip2 (default)</td>
<td>40.698</td>
<td>28.33%</td>
<td>14.630</td>
</tr>
<tr>
<td>bzip2 (max)</td>
<td>206.307</td>
<td>27.18%</td>
<td>66.460</td>
</tr>
<tr>
<td>lbzip2 (max)</td>
<td>52.091</td>
<td>27.19%</td>
<td>23.092</td>
</tr>
<tr>
<td>pbzip2 (max)</td>
<td>52.438</td>
<td>27.19%</td>
<td>20.617</td>
</tr>
</tbody>
</table>
<table>
<caption>gzip</caption>
<thead>
<tr>
<th>
			Compressor
		</th>
<th>
			Compression Time (s)
		</th>
<th>
			Compression Ratio
		</th>
<th>
			Decompression Time (s)
		</th>
</tr>
</thead>
<tbody>
<tr>
<td>gzip (min)</td>
<td>36.146</td>
<td>40.09%</td>
<td>19.195</td>
</tr>
<tr>
<td>pigz (min)</td>
<td>8.892</td>
<td>40.06%</td>
<td>12.199</td>
</tr>
<tr>
<td>gzip (default)</td>
<td>82.452</td>
<td>34.65%</td>
<td>16.912</td>
</tr>
<tr>
<td>pigz (default)</td>
<td>16.673</td>
<td>34.69%</td>
<td>12.852</td>
</tr>
<tr>
<td>gzip (max)</td>
<td>134.889</td>
<td>34.24%</td>
<td>16.513</td>
</tr>
<tr>
<td>pigz (max)</td>
<td>25.051</td>
<td>34.28%</td>
<td>12.129</td>
</tr>
</tbody>
</table>
<table>
<caption>xz</caption>
<thead>
<tr>
<th>
			Compressor
		</th>
<th>
			Compression Time (s)
		</th>
<th>
			Compression Ratio
		</th>
<th>
			Decompression Time (s)
		</th>
</tr>
</thead>
<tbody>
<tr>
<td>xz (min)</td>
<td>138.272</td>
<td>33.82%</td>
<td>50.916</td>
</tr>
<tr>
<td>lxz (min)</td>
<td>29.890</td>
<td>33.80%</td>
<td></td>
</tr>
<tr>
<td>xz (default)</td>
<td>1164.772</td>
<td>24.45%</td>
<td>34.114</td>
</tr>
<tr>
<td>lxz (default)</td>
<td>39.997</td>
<td>24.96%</td>
<td></td>
</tr>
<tr>
<td>xz (max)</td>
<td>1643.367</td>
<td>22.66%</td>
<td>31.968</td>
</tr>
<tr>
<td>lxz (max)</td>
<td>699.268</td>
<td>23.07%</td>
<td></td>
</tr>
</tbody>
</table>
<p>As you can see, most parallel implementations decrease compression <em>and</em> decompression times by a factor of between 3 and 5, while taking, at most, a few tenths of a percent increase in final file size.</p>
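<p>As a spot check on those factors, here is a small Python sketch that divides single-threaded times by parallel times for a few rows copied from the tables above.  The selection of rows and the helper name <code>speedup</code> are my own choices, not part of the original benchmark.</p>

```python
# (single-threaded seconds, parallel seconds) copied from the tables above.
COMPRESSION_TIMES = {
    "bzip2 vs pbzip2 (default)": (169.474, 40.698),
    "bzip2 vs lbzip2 (default)": (169.474, 42.085),
    "gzip vs pigz (default)": (82.452, 16.673),
    "xz vs lxz (max)": (1643.367, 699.268),
}

def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run was."""
    return serial_seconds / parallel_seconds

speedups = {name: speedup(*pair) for name, pair in COMPRESSION_TIMES.items()}
for name, factor in speedups.items():
    print("%s: %.2fx" % (name, factor))
```

<p>In this sample the bzip2 and gzip pairs land between 4x and 5x, while the xz/lxz pair at maximum settings comes out noticeably lower.</p>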
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2011/05/03/linux-command-line-compressors-compared/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GNOME Audio Player Shootout v3.0</title>
		<link>http://heliologue.com/2010/08/29/gnome-audio-player-shootout-v3-0/</link>
		<comments>http://heliologue.com/2010/08/29/gnome-audio-player-shootout-v3-0/#comments</comments>
		<pubDate>Sun, 29 Aug 2010 17:08:05 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[GNOME]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[reviews]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=5650</guid>
		<description><![CDATA[In January 2007 I published the GNOME Audio Player Shootout, a simple comparison of the options available to GNOME users for handling their day-to-day playback needs. It proved to be so popular that in December of 2008 I did a followup, excluding some abandoned players and adding some new ones. Though it hasn&#8217;t been quite [...]]]></description>
			<content:encoded><![CDATA[<p><img class="right" alt="GNOME logo" src="/img/tech/gnome.png"></p>
<p>In January 2007 I published the <a href="http://heliologue.com/2007/01/18/gnome-audio-player-shootout/">GNOME Audio Player Shootout</a>, a simple comparison of the options available to GNOME users for handling their day-to-day playback needs.  It proved to be so popular that in December of 2008 I did a followup, excluding some abandoned players and adding some new ones.  Though it hasn&#8217;t been quite two years yet, I thought it was time for another look at the state of audio players in the GNOME ecosystem.</p>
<p>This time around, I&#8217;ve got a heavy focus on new players, as there have been a number of new arrivals since my last shootout that show a lot of promise.  This article will cover (in no particular order): </p>
<ul>
<li>Rhythmbox (0.12.8)</li>
<li>Exaile (3.2.0)</li>
<li>Banshee (1.7.4)</li>
<li>Quod Libet (2.2.1)</li>
<li>Guayadeque (0.2.6-svn1186)</li>
<li>DeaDBeeF (0.4.1)</li>
<li>aTunes (2.0.1)</li>
<li>xnoise (0.1.10)</li>
<li>GMusicBrowser (1.1.5-git)</li>
<li>Aqualung (0.9~beta11)</li>
</ul>
<p>All testing was done using an up-to-date Ubuntu Lucid x64 with all necessary repositories added, including some PPAs for the latest versions of these players.  Considered but not reviewed were Decibel Audio Player (hasn&#8217;t changed appreciably since last time), Gejengel (so unstable as to be unusable), and Bluemindo (still too simple to be useful).  </p>
<p>Please note that this article necessarily incorporates some of my own biases.  I am an avowed <a rel="external" href="http://foobar2000.org">foobar2000</a> fan and you will notice that I tend to favor the utility-minded players over the media centers and iTunes clones.  This article should still be useful, even if your own inclinations are different from mine.</p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2010/08/29/gnome-audio-player-shootout-v3-0/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Tracking LZMA efficiency</title>
		<link>http://heliologue.com/2009/02/09/tracking-lzma-efficiency/</link>
		<comments>http://heliologue.com/2009/02/09/tracking-lzma-efficiency/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 04:51:26 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[codecs]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=3580</guid>
		<description><![CDATA[I&#8217;m a big fan of 7-Zip. It isn&#8217;t the best-looking application ever written, but that could be because its creator, Igor Pavlov, is concerned much more with its compression methods than its interface. 7-Zip has its own container format, but more important is the LZMA compression algorithm that Igor wrote and put into the public [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a big fan of <a href="http://7-zip.org">7-Zip</a>. It isn&#8217;t the best-looking application ever written, but that could be because its creator, Igor Pavlov, is concerned much more with its compression methods than its interface.  7-Zip has its own container format, but more important is the <a href="http://en.wikipedia.org/wiki/LZMA">LZMA</a> compression algorithm that Igor wrote and put into the public domain.</p>
<p>I decided to do some quick and dirty benchmarks to track the progress of LZMA/7-Zip over time.  I went back as far as Igor supplied binaries, including one from the very old 3.x series.  Rather than test every single release between then and now, I used only &#8220;stable&#8221; releases, with the exception of version 4.65, which is the latest version of any sort, as well as 4.66, which uses an alpha version of Igor&#8217;s new LZMA2 codec (and, as you&#8217;ll see, provides a definite performance improvement).</p>
<p>I used Igor&#8217;s Timer utility to time the process (global time was reported).  The corpus in this case was the <a href="http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.28.tar.bz2">Linux kernel source, v2.6.28</a>.  I conducted these tests on a RAM disk to eliminate hard disk latency issues (especially for decompressions, which improved by about 25% from my initial HDD-based tests). My rig is an Intel Core 2 Quad Q6600 [2.4GHz], with 4GB of RAM (1GB of it dedicated to the RAM disk), running Vista SP1 x64. </p>
<p>The command line setup was an approximation of the 7-Zip GUI&#8217;s &#8220;ultra&#8221; settings:  <code>-t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on</code>, letting the archiver auto-choose the number of threads to spawn.  <span id="more-3580"></span></p>
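<p>For anyone wanting to script a similar run, the switches above can be assembled into an argument list like this.  It is only a sketch: the <code>7z</code> executable name and the archive and source names are assumptions, and the command is built but not executed.</p>

```python
import shlex

# Switches quoted in the article (approximating the GUI's "ultra" preset).
ULTRA_SWITCHES = "-t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on"

def build_7z_command(archive, source, executable="7z"):
    """Build an argument list for `7z a` (add to archive) without running it."""
    return [executable, "a"] + shlex.split(ULTRA_SWITCHES) + [archive, source]

# Hypothetical file names, for illustration only.
command = build_7z_command("kernel.7z", "linux-2.6.28")
print(" ".join(command))
```

<p>The resulting list could be handed to <code>subprocess.run</code> and wrapped in whatever timer one prefers.</p>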
<h3>The Data</h3>
<table class="sortable rowstyle-even">
<caption>
		LZMA Efficiency<br />
	</caption>
<thead>
<tr>
<th class="sortable-text">
				7-zip version
			</th>
<th class="sortable-numeric">
				encoding time (s)
			</th>
<th class="sortable-numeric">
				decoding time (s)
			</th>
</tr>
</thead>
<tbody>
<tr>
<td>
				3.13
			</td>
<td>
				541.271
			</td>
<td>
				43.379
			</td>
</tr>
<tr>
<td>
				4.20
			</td>
<td>
				531.457
			</td>
<td>
				44.040
			</td>
</tr>
<tr>
<td>
				4.23
			</td>
<td>
				527.871
			</td>
<td>
				42.425
			</td>
</tr>
<tr>
<td>
				4.32
			</td>
<td>
				341.290
			</td>
<td>
				42.126
			</td>
</tr>
<tr>
<td>
				4.42
			</td>
<td>
				219.451
			</td>
<td>
				42.211
			</td>
</tr>
<tr>
<td>
				4.57
			</td>
<td>
				174.064
			</td>
<td>
				44.163
			</td>
</tr>
<tr>
<td>
				4.62
			</td>
<td>
				170.973
			</td>
<td>
				42.836
			</td>
</tr>
<tr>
<td>
				4.65
			</td>
<td>
				170.917
			</td>
<td>
				43.058
			</td>
</tr>
<tr>
<td>
				4.66 (lzma2)
			</td>
<td>
				126.259
			</td>
<td>
				46.663
			</td>
</tr>
</tbody>
</table>
<h3>The Analysis</h3>
<p><a href="/img/albums/Software/lzma_compression_graph.png" class="right" rel="lightbox" title="Tracking LZMA efficiency"><img src="/img/albums/Software/lzma_compression_graph_thumb.png" alt="LZMA efficiency graph" /></a></p>
<p>Without conducting a more thorough battery of tests on a variety of different configurations, it&#8217;s difficult to say with certainty just <em>where</em> the performance improvements came from, be it better use of threading or multiprocessors, general algorithmic improvements, or something else.  I also don&#8217;t know if the performance increases we see reside in improvements to LZMA itself as Igor was finalizing it, or just the code quality of 7-Zip, which <em>implements</em> LZMA.</p>
<p>In any case, the improvements since 3.13 are very clear (remember that lower is better), at least for compression, and for &#8220;ultra&#8221; settings.  Decompression remained largely similar, which surprised me.  Some of these results might be directly tied to the number and type of files that were compressed in this case:  4.66, for instance, improves decompression speed for incompressible files, but no such files exist here since it&#8217;s source code.</p>
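<p>To quantify &#8220;very clear&#8221;, the sketch below expresses each version&#8217;s times relative to 3.13, using the numbers from the table above.  The layout of the data and the variable names are my own.</p>

```python
# (version, encoding seconds, decoding seconds) copied from the table above.
RESULTS = [
    ("3.13", 541.271, 43.379),
    ("4.20", 531.457, 44.040),
    ("4.23", 527.871, 42.425),
    ("4.32", 341.290, 42.126),
    ("4.42", 219.451, 42.211),
    ("4.57", 174.064, 44.163),
    ("4.62", 170.973, 42.836),
    ("4.65", 170.917, 43.058),
    ("4.66 (lzma2)", 126.259, 46.663),
]

baseline_encode, baseline_decode = RESULTS[0][1], RESULTS[0][2]

# Times relative to 3.13: values below 1.0 mean faster than the baseline.
relative = {
    version: (encode / baseline_encode, decode / baseline_decode)
    for version, encode, decode in RESULTS
}
for version, (enc, dec) in relative.items():
    print("%-13s encode %.2f  decode %.2f" % (version, enc, dec))
```

<p>Encoding with 4.65 takes a little under a third of the time 3.13 needed, while decoding stays within roughly 8% of the baseline throughout.</p>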
<p>Hats off to Igor Pavlov for his steady improvement on both a really great compression standard and one of my favorite pieces of software for Windows.</p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2009/02/09/tracking-lzma-efficiency/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>GNOME Audio Player Shootout Revisited</title>
		<link>http://heliologue.com/2008/12/19/gnome-audio-player-shootout-revisited/</link>
		<comments>http://heliologue.com/2008/12/19/gnome-audio-player-shootout-revisited/#comments</comments>
		<pubDate>Fri, 19 Dec 2008 16:13:56 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[codecs]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[FLAC]]></category>
		<category><![CDATA[GNOME]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=2709</guid>
		<description><![CDATA[It&#8217;s been close to two years since I wrote GNOME Audio Player Shootout, a visual and textual comparison of some the best available audio players for the GNOME desktop. As is usually the case in the world of free software, a lot has happened since then (and yet, in a strange way, things have stayed [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/img/tech/gnome.png" alt="GNOME logo" class="right" /></p>
<p>It&#8217;s been close to two years since I wrote <a href="http://heliologue.com/2007/01/18/gnome-audio-player-shootout/">GNOME Audio Player Shootout</a>, a visual and textual comparison of some of the best available audio players for the GNOME desktop.</p>
<p>As is usually the case in the world of free software, a lot has happened since then (and yet, in a strange way, things have stayed exactly the same).  I decided to revisit some of those players and see how they&#8217;ve progressed.  Some of them listed last time haven&#8217;t seen any appreciable development, and have been left off.</p>
<p class="alert">
I realize that I am totally ignoring the daemon-based players (read: Music Player Daemon, XMMS2);  this is by design, since those players open up a whole new can of worms.  Suffice it to say that if you&#8217;ve decided on an XMMS2 or MPD-based player and successfully configured it, you probably don&#8217;t need any advice on choosing software.
</p>
<p>The following programs will be covered in this review (development versions):</p>
<ul>
<li>BMPx (0.40.14)</li>
<li>Rhythmbox (0.11.6)</li>
<li>Exaile (2.99.1-svn)</li>
<li>Banshee (1.4.1)</li>
<li>Quod Libet (2.0)</li>
<li>Decibel (1.00)</li>
<li>Songbird (1.0)</li>
<li>Listen (0.6~svn1044)</li>
</ul>
<p>All of the testing was done on a fresh install (and update) of Ubuntu 8.10 in VirtualBox, using a small representative sample of my music collection (some modern, some classical, in Vorbis, MP3, and FLAC).</p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2008/12/19/gnome-audio-player-shootout-revisited/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>In defense of open models</title>
		<link>http://heliologue.com/2008/10/22/in-defense-of-open-models/</link>
		<comments>http://heliologue.com/2008/10/22/in-defense-of-open-models/#comments</comments>
		<pubDate>Wed, 22 Oct 2008 20:50:54 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[stupidity]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[Web 2.0]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=2907</guid>
		<description><![CDATA[Andrew Keen has no idea how open models work. In his latest article, he pontificates that the recent economic downturn is a death knell for community-supported or community-built programs/sites/&#38;c. So how will today&#8217;s brutal economic climate change the Web 2.0 &#8220;free&#8221; economy? It will result in the rise of online media businesses that reward their [...]]]></description>
			<content:encoded><![CDATA[<p>Andrew Keen has no idea how open models work.</p>
<p>In his latest article, he pontificates that the recent economic downturn is a death knell for community-supported or community-built programs/sites/&amp;c.</p>
<blockquote cite="http://www.internetevolution.com/author.asp?section_id=556&#038;doc_id=166342" title="Andrew Keen • Economy to Give Open-Source a Good Thumping">
<p>So how will today&#8217;s brutal economic climate change the Web 2.0 &#8220;free&#8221; economy? It will result in the rise of online media businesses that reward their contributors with cash; it will mean the success of Knol over Wikipedia, Mahalo over Google, TheAtlantic.com over the HuffingtonPost.com, iTunes over MySpace, Hulu over YouTube Inc., Playboy.com over Voyeurweb.com, TechCrunch over the blogosphere, CNN&#8217;s professional journalism over CNN&#8217;s iReporter citizen-journalism&#8230; The hungry and cold unemployed masses aren&#8217;t going to continue giving away their intellectual labor on the Internet in the speculative hope that they might get some &#8220;back end&#8221; revenue. &#8220;Free&#8221; doesn&#8217;t fill anyone&#8217;s belly; it doesn&#8217;t warm anyone up. </p>
</blockquote>
<p>There are really two broad fallacies that need addressing here.  The first is Keen&#8217;s use of the term &#8220;open source,&#8221; which here is a misnomer.  He never mentions Linux, Apache, or other open source programs which always have had and will continue to have a dedicated base of programmers, most of whom work on them in their spare time, without any remuneration except personal pride and the esteem of their peers.  It need hardly be noted that an economic downturn is likely to <em>increase</em> interest in open-source software, as it likely reduces operating costs for businesses.</p>
<p><span id="more-2907"></span></p>
<p>No, what Keen means when he says &#8220;open source&#8221; is free-as-in-beer services, often serving liberally-licensed content;  Wikipedia&#8217;s content is not open source (there&#8217;s no source to open), but it <em>is</em> available under the GNU Free Documentation License, which is something like a liberal Creative Commons license.  Perhaps Keen has a sheet of words vaguely associated with Web 2.0 and just likes to throw them around in case his readers are too stupid to know better.</p>
<p>But then comes the bigger fallacy, namely that in an economic depression, the things that motivated people to contribute to social sites and content servers will vanish entirely.  Never mind the fact that most of these services don&#8217;t necessarily imply the forfeiture of copyright; or that many already include ways to monetize one&#8217;s content.  No, Keen fundamentally misunderstands why people contribute to things like Wikipedia.  This isn&#8217;t a recent phenomenon borne on the largess of the Web 2.0 bubble;  people didn&#8217;t start contributing to Wikipedia simply because they were so rich from their day jobs that they felt like giving something back.  No, people like being a part of something.  They like attaching their name to good work, free or not.</p>
<p>This is all a very roundabout way of saying that Keen couldn&#8217;t be more wrong;  he apparently is crass enough to believe that anything one does can and should be tied to monetary compensation.  I imagine he gets paid for his articles for Internet Evolution (if he were doing them <i>pro bono</i>, it would certainly speak volumes about his argument);  perhaps he overestimates the value of his labor.</p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2008/10/22/in-defense-of-open-models/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Desktop Linux revisited</title>
		<link>http://heliologue.com/2008/06/20/desktop-linux-revisited/</link>
		<comments>http://heliologue.com/2008/06/20/desktop-linux-revisited/#comments</comments>
		<pubDate>Fri, 20 Jun 2008 21:15:49 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[Firefox]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=2078</guid>
		<description><![CDATA[About 2 years ago I wrote a piece called Five things that Desktop Linux really needs, attempting to air out my five biggest grievances with Desktop Linux. If you follow FOSS news, every year is heralded as &#8220;The Year of the Linux Desktop,&#8221; although such a thing clearly hasn&#8217;t happened yet. Now, two years later, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/img/tech/tux.png" alt="Tux" class="right" /></p>
<p>About 2 years ago I wrote a piece called <a href="http://heliologue.com/2006/08/03/five-things-that-desktop-linux-really-needs/">Five things that Desktop Linux really needs</a>, attempting to air out my five biggest grievances with Desktop Linux.  If you follow FOSS news, every year is heralded as &#8220;The Year of the Linux Desktop,&#8221; although such a thing clearly hasn&#8217;t happened yet.  Now, two years later, I thought it would be interesting to revisit those five problems and see what kind of progress has been made.</p>
<p><span id="more-2078"></span></p>
<h3>Linux needs a good CD ripper</h3>
<p>When I last wrote, the favorite CD ripper for the GNOME environment was <a href="http://nostatic.org/grip/">grip</a>, which had the benefit of being extremely customizable, even if its ripping was plain-Jane <code>cdparanoia</code>.  Grip is still used, I imagine, but the default ripper with the GNOME desktop is Sound Juicer, which is a gstreamer-based ripper/encoder that abstracts everything quite heavily and gives damn few options.</p>
<p>The good news is that the semi-abandoned <code>cdparanoia</code> project at least saw a maintenance release that fixed some bugs;  the bad news is that the promised further revisions have failed to materialize, meaning that there&#8217;s still no compelling CD ripper available for Linux.  The Exact Audio Copys and the dBpoweramps remain Windows-only tools.</p>
<p><a href="/img/albums/Software/rubyripper.png" title="A screenshot of RubyRipper" rel="lightbox"><img src="/img/albums/Software/rubyripper_thumb.png" alt="A screenshot of RubyRipper" class="right" /></a></p>
<p>Also in these past two years, however, a new ripper has emerged:  RubyRipper, a Ruby/GTK2 program that tries to emulate EAC&#8217;s approach to ripping;  that is, it reads segments multiple times and compares them using Ruby&#8217;s checksum matcher.  While there are some audiophiles at <a href="http://hydrogenaudio.org">HydrogenAudio</a> who insist that this isn&#8217;t a perfect approach to ripping (of course it isn&#8217;t), it&#8217;s still far more than any of the desktop-standard rippers can come up with.  Ideally, it will eventually feature AccurateRip support, though this is tentative.</p>
<p>So, I&#8217;m happy to report that <em>some</em> progress has been made in this area, though Linux is still a second-class citizen when it comes to CD ripping.</p>
<h3>Linux needs good and consistent font rendering</h3>
<p>You wouldn&#8217;t think it, compared to issues like multimedia codecs, but font rendering is awash in legal issues.  Be it Apple&#8217;s patents on the TrueType bytecode interpreter (BCI) or Microsoft&#8217;s patents on ClearType subpixel rendering (both of which are legally dubious), it&#8217;s legal threats and not technical problems that keep the default font smoother for most distros from producing nice, clean, antialiased fonts.  The code already exists in the upstream source code for FreeType, but it&#8217;s disabled by default.  Ubuntu finally made the decision to enable it by default in their distribution, for which I applaud them.  Packages exist for many other distributions, which is still a damn sight better than the typical &#8220;compile it yourself&#8221; response, which always strikes me as utterly absurd.</p>
<p>In my previous post, I highlighted the discrepancy between various <em>types</em> of programs in Linux when it comes to font rendering.  I can say without hesitation that the situation has improved since then, though I&#8217;m not entirely sure where the responsibility for the fixes lies.</p>
<p><a href="/img/albums/Software/font_rendering.png" title="Font rendering in Ubuntu Linux 8.04" rel="lightbox"><img src="/img/albums/Software/font_rendering_thumb.png" alt="Font rendering in Ubuntu Linux 8.04" class="center" /></a></p>
<p>What you see is my blog in Firefox 3.0, some source code in Netbeans 6.1, and the template picker in OpenOffice 2.4.  Notice that the font rendering is pretty similar in all three of them.  I can tell you that Netbeans looks that good because I&#8217;m running it with the Java 6 JDK, which finally added decent font antialiasing.  Running it with Java 5 produces some pretty obnoxious font quality.  As to OpenOffice, they either fixed font rendering on their end, or else OpenOffice benefits from the larger system font smoothing included in Ubuntu.</p>
<h3>Linux needs better inter-distro compatibility and less dependence on repositories</h3>
<p>My choice of Linux is Ubuntu;  this decision is spurred largely by some &aelig;sthetic choices, and the truly orgasmic package management system.  If it were not for this, I might very well be running openSUSE, which has greatly improved its package management with v11.0.  One of openSUSE&#8217;s more compelling features is that they&#8217;re more willing (and the community is more willing) to add new software to the repositories.  It&#8217;s significantly easier for me to get the latest and greatest software for openSUSE, often by dint of either Packman&#8217;s wonderful repository or the openSUSE build service, which Ubuntu has responded to with the Personal Package Archive (read: build service).</p>
<p>The one benefit of Ubuntu&#8217;s approach is that packages <em>tend</em> to play nicely with each other, whereas with openSUSE and <em>its</em> build service, there are sometimes overlapping dependencies.</p>
<p>But there are still a bunch of different package types and package managers;  even among package types, there are incompatible versions.  And because of the shared nature of Linux libraries, each distribution&#8217;s release will likely have a narrow slice of software versions that will work for that particular library.  Say what you will about the Windows approach, but when I install the latest FileZilla on Windows, I don&#8217;t get bitched at by my system for needing newer wxWidgets libraries (and therefore necessitating that I either compile my own version or wait 6 months until somebody does it for me).  Similarly, installing new graphics drivers doesn&#8217;t mean I have to also set up the latest kernel headers and reconfigure my display configuration file so that it doesn&#8217;t fail spectacularly when I reboot.</p>
<p>Initiatives like LSB, FreeDesktop, and PackageKit have made bold steps to make Linux play well with itself.  But there&#8217;s no middle ground with Linux:  you either let distributions do all the work for you, and limit yourself to the particular software, and the particular versions, that they feel like offering you, <em>or</em> you can do everything yourself, compiling and installing your software manually.</p>
<h3>Linux needs better multimedia</h3>
<p>OK, multimedia on Linux still sucks, and it still sucks hard.  Even provided that you&#8217;ve enabled extra (legally dubious) repositories for your installation and downloaded all of the plugins and codecs that are available to you (after, of course, you&#8217;ve decided to use either GStreamer or Xine as a video engine), you still have the unfortunate issue of video in Linux being slower and of an inferior quality to video on Windows.  Is there a reason that rendering with Totem-GStreamer is blocky and awful, and rendering with The KMPlayer is picture-perfect?  Even <code>xine</code> is far from perfect.  </p>
<p>Then there&#8217;s the age-old problem of the X server and the graphics stack on Linux being shit to begin with.  Compiz is great, and I&#8217;ve spent a fair amount of time watching my windows wobble and painting fire on my screen, but is there a reason that emulators perform so much worse in Linux?  Is there a reason that I can&#8217;t switch tabs in Firefox without a several-second delay before the page contents are written to the screen?  Is there a reason that the X server breaks at the slightest provocation?  Is there a reason that the graphics stack offers so little to developers when compared to DirectX on Windows?  Is there a reason that all the good-looking audio players on Linux can&#8217;t offer me anywhere near the same functionality that foobar2000 does on Windows?  </p>
<p>I see individual programs making great progress, but there are fundamental flaws in the Linux approach to multimedia that aren&#8217;t going to be solved no matter how many widgets we give our apps.  Multimedia on Linux still has a long way to go.</p>
<h3>Linux needs disk image mounters</h3>
<p>I&#8217;m pleased to report that since I last talked about this issue, there have been a couple of programs written to provide just this functionality.  Programs like Daemon Tools or Alcohol 52% on Windows provide a way to mount virtual copies of CD or DVD images, allowing the computer to interact with them as though they were physical discs sitting in the drive.</p>
<p>On GNOME, there&#8217;s the <a href="http://www.marcus-furius.com/?page_id=14">Furius ISO Mount</a>;  on KDE, there&#8217;s <a href="http://www.acetoneiso.netsons.org/">AcetoneISO</a>, which is also gaining burning support <i>a la</i> Alcohol 120%.  Both of these programs function pretty much exactly how you would expect them to.  Of the five issues I highlighted last time, this appears to be the most completely resolved;  unfortunately, it was also the least important of the issues, and has gotten even less important to me personally since my previous writeup.</p>
<h3>Conclusion</h3>
<p>My biggest problem is that although the open source community continues to produce extremely compelling software, it suffers from the same fundamental flaws it did years ago:  its X server and graphics layer are slow and difficult to work with, which is why you&#8217;ll find much better software emulators, games, and video playback on Windows than you will on Linux;  its distro-centric repositories are both a boon to the end-user and a version lock-in that eliminates the &#8220;Go to the vendor site and download an .exe&#8221; ease of Windows.  Linux is still a &#8220;Do It Yourself&#8221; operating system, meaning that despite all of the work being done, there&#8217;s not necessarily a complete and versatile environment for developers to program against.  Certainly, there isn&#8217;t a <em>consistent</em> environment, and things are constantly changing in the world of Linux.</p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2008/06/20/desktop-linux-revisited/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Fatuous criticisms of Linux</title>
		<link>http://heliologue.com/2008/04/04/fatuous-criticisms-of-linux/</link>
		<comments>http://heliologue.com/2008/04/04/fatuous-criticisms-of-linux/#comments</comments>
		<pubDate>Fri, 04 Apr 2008 20:27:05 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=2031</guid>
		<description><![CDATA[I love Jeff Atwood&#8217;s blog, and can even accept that he&#8217;s drank of the Microsoft Kool-Aid seemingly for both desktop and server because he&#8217;s a great writer and a great programmer. But I admit to being troubled by his recent post. I might think it to be an April Fool&#8217;s Day joke, except the post [...]]]></description>
			<content:encoded><![CDATA[<p>I love Jeff Atwood&#8217;s blog, and can even accept that he&#8217;s drunk of the Microsoft Kool-Aid seemingly for both desktop and server because he&#8217;s a great writer and a great programmer.</p>
<p>But I admit to being troubled by his recent post.  I might think it to be an April Fool&#8217;s Day joke, except the post is dated 31 March 2008. After quoting a couple of Linux upgrade horror stories from a software-engineer-turned-club-owner, he concludes:</p>
<blockquote cite="http://www.codinghorror.com/blog/archives/001089.html" title="Coding Horror &sect; Let That Be a Lesson To You, Son: Never Upgrade.">
<p>I can&#8217;t fault Jamie&#8217;s approach. A clean install of an operating system on a new hard drive &#8212; for kiosks running controlled hardware, no less &#8212; that&#8217;s as good as it gets.</p>
<p>Apparently, <strong>Linux is so complex that even a world class software engineer can&#8217;t always get it to work.</strong></p>
<p>I find it highly disturbing that a software engineer of Jamie&#8217;s caliber would give up on upgrading software. Jamie lives and breathes Linux. It is his platform of choice. If he throws in the towel on Linux upgrades, then what possible hope do us mere mortals have? </p>
</blockquote>
<p><span id="more-2031"></span></p>
<p>I see a couple of mistaken assumptions here.  The first is that kiosks are a simple platform for Linux to work on.  In fact, I have no <em>idea</em> what hardware is in these kiosks, and have no idea if there are proprietary hardware bits in them that aren&#8217;t supported well in Linux.  The second poor assumption is that the only necessary step here is simply installing the new O/S.  In fact, the engineer does a <em>lot</em> of custom configuration to his kiosks, and it may be <em>that</em> which is causing the crashiness;  or, it could be bad hardware.  I simply don&#8217;t have enough information to know.  </p>
<p>Atwood places the blame squarely on Linux&#8217;s &#8220;complexity.&#8221;  If Linux was that awful, or if Windows was that great, why hasn&#8217;t Jamie simply gotten the kiosks to run Windows?  I don&#8217;t know for sure, but I&#8217;m guessing they might be just as crashy.  And certainly more expensive.</p>
<p>Finally, as many commenters on Atwood&#8217;s entry have pointed out, a good software engineer != a good sysadmin.  And using Fedora Core is not a good path to stability.  And, more importantly, <strong>anecdotes from one guy, however smart, do not a comprehensive criticism make.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2008/04/04/fatuous-criticisms-of-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OpenOffice 2.4.0</title>
		<link>http://heliologue.com/2008/03/25/openoffice-240/</link>
		<comments>http://heliologue.com/2008/03/25/openoffice-240/#comments</comments>
		<pubDate>Tue, 25 Mar 2008 19:13:58 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[aside]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=2022</guid>
		<description><![CDATA[After a number of delays, OpenOffice 2.4.0 has been officially released. Get it here. Check the mirrors for your own OS and localization. OpenOffice 2.4.0 has not quite been released yet. Some major new features include OpenGL transitions for Impress, some major charting improvements for Calc, and block selection for Writer.]]></description>
			<content:encoded><![CDATA[<p><img src="/img/tech/openoffice_tango.png" alt="OpenOffice.org" class="right" /></p>
<p><ins datetime="2008-03-27T13:24:09+00:00">After a number of delays, OpenOffice 2.4.0 has been officially released.  Get it <a href="http://download.openoffice.org/">here</a>.  Check the mirrors for your own OS and localization.</ins>  <del datetime="2008-03-27T13:24:09+00:00">OpenOffice 2.4.0 has not <em>quite</em> been released yet. </del> Some major new features include <a href="http://www.oooninja.com/2008/02/eye-candy-3d-opengl-transitions-impress.html">OpenGL transitions</a> for Impress, some major <a href="http://wiki.services.openoffice.org/wiki/Chart2/Features2.4">charting</a> improvements for Calc, and <a href="http://www.oooninja.com/2007/12/block-selection-mode-new-feature.html">block selection</a> for Writer.  </p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2008/03/25/openoffice-240/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Linux 2.6 filesystem benchmarks</title>
		<link>http://heliologue.com/2008/03/21/linux-26-filesystem-benchmarks/</link>
		<comments>http://heliologue.com/2008/03/21/linux-26-filesystem-benchmarks/#comments</comments>
		<pubDate>Fri, 21 Mar 2008 21:30:33 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[aside]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[filesystems]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://heliologue.com/?p=2015</guid>
		<description><![CDATA[A recent Linux filesystem benchmark, using modern filesystems, takes a look at some hard numbers. Looks like JFS usually comes in on top.]]></description>
			<content:encoded><![CDATA[<p>A recent Linux <a href="http://www.techyblog.com/linux-news/linux-filesystem-benchmark.html">filesystem benchmark</a>, using modern filesystems, takes a look at some hard numbers.  Looks like <a href="http://en.wikipedia.org/wiki/IBM_Journaled_File_System_2_(JFS2)">JFS</a> usually comes in on top.</p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2008/03/21/linux-26-filesystem-benchmarks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Common compression and corpuses</title>
		<link>http://heliologue.com/2008/01/19/common-compression-and-corpuses/</link>
		<comments>http://heliologue.com/2008/01/19/common-compression-and-corpuses/#comments</comments>
		<pubDate>Sun, 20 Jan 2008 04:43:21 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://heliologue.com/blog/2008/01/19/common-compression-and-corpuses/</guid>
		<description><![CDATA[Every so often, I dink around with benchmarking common lossless compressors. One of the best sites for it is, I think, Werner Bergman&#8217;s Maximum Compression, which is a rather comprehensive running benchmark of just about every lossless compression benchmark under the sun. Really, there&#8217;s a lot. What you have to understand about the world of [...]]]></description>
			<content:encoded><![CDATA[<p>Every so often, I dink around with benchmarking common lossless compressors.  One of the best sites for it is, I think, Werner Bergmans&#8217; <a href="http://www.maximumcompression.com/">Maximum Compression</a>, which is a rather comprehensive running benchmark of just about every lossless compressor under the sun.  Really, there&#8217;s a <em>lot</em>.  What you have to understand about the world of compressors is that they are very often academic projects or toys that very smart people play with in their free time.  There are also companies (but not many) who invest in their own proprietary algorithms for lossless compression.</p>
<p>Here&#8217;s the catch, though:  the quality of a compressor isn&#8217;t measured by its final compression ratio alone.  The <a href="http://en.wikipedia.org/wiki/PAQ">PAQ</a> series of compressors, for instance, offers great compression and really, truly awful compression times.  The same goes for the highest compression levels of WinRK (a proprietary Win32 format with an accompanying GUI).  But disk is cheap:  nobody <em>really</em> cares about a fraction of a percentage of compression efficiency, do they?  What people really want is for their (inevitable) <a href="http://heliologue.com/free-software/archivers-and-compressors/">archiving GUI</a> to take less time doing what it does.</p>
<p>In this spirit, I have compiled not so much an exhaustive list of possible compression algorithms (I&#8217;ll leave that to Werner, who is very good at what he does), but rather a short list of the most common formats, tested on three different (relatively well-known) corpuses:  the <a href="http://en.wikipedia.org/wiki/Calgary_Corpus">Calgary Corpus</a>, the newer <a href="http://en.wikipedia.org/wiki/Canterbury_Corpus">Canterbury Corpus</a>, and Andrew Tridgell&#8217;s 1999 <a href="http://samba.org/ftp/tridge/large-corpus/">Large Corpus</a>.  The first two of these are corpuses used to test the very kind of academic project which I&#8217;ve avoided.  I dislike using them because they are small in size, which means that there is significantly less opportunity for variations in compression formats to manifest themselves.  In the interest of verifiability, however, I have used them.  I also included Andrew Tridgell&#8217;s large corpus because it&#8217;s been my experience that results on small test corpuses tend to vary too much due to disk I/O latency and other vagaries of compression algorithms.</p>
<p>What will follow is a data table for each corpus, followed by some brief observations about each.</p>
<p><span id="more-1954"></span></p>
<p>First, a note about the test environment:</p>
<ul>
<li>Windows Vista x64</li>
<li>Intel Q6600 Quad-Core</li>
<li>4GB Corsair PC2 6400</li>
<li>Western Digital Caviar WD1600YS SATAII, 160GB (system drive)</li>
<li><strong>Timer</strong>: Igor Pavlov&#8217;s <a href="http://www.7-zip.org/dl/utils/timer301.zip">timer.exe</a> (times reported are &#8220;Process&#8221; times).</li>
</ul>
<p>Next, a note about compressor versions:</p>
<table class="sortable rowstyle-even" summary="Compressors and versions">
<caption>
Compressor versions used<br />
</caption>
<thead>
<tr>
<th class="sortable-text">Compressor</th>
<th class="sortable-numeric">Version</th>
<th>Source</th>
</tr>
</thead>
<tbody>
<tr>
<td>
tar
</td>
<td>
1.13
</td>
<td>
<a href="http://gnuwin32.sf.net">GnuWin32</a>
</td>
</tr>
<tr>
<td>
gzip
</td>
<td>
1.3.12
</td>
<td>
<a href="http://gnuwin32.sf.net">GnuWin32</a>
</td>
</tr>
<tr>
<td>
bzip2
</td>
<td>
1.0.4
</td>
<td>
<a href="http://gnuwin32.sf.net">GnuWin32</a>
</td>
</tr>
<tr>
<td>
zip/unzip
</td>
<td>
2.32/5.52
</td>
<td>
<a href="http://gnuwin32.sf.net">GnuWin32</a>
</td>
</tr>
<tr>
<td>
7z (32-bit)
</td>
<td>
4.57
</td>
<td>
<a href="http://www.7-zip.org">7-Zip</a>
</td>
</tr>
<tr>
<td>
rar
</td>
<td>
3.71
</td>
<td>
<a href="http://rarlabs.com">RarLabs</a>
</td>
</tr>
</tbody>
</table>
<p>Now on to the benchmarks&#8230;</p>
<table class="sortable rowstyle-even" id="calgary-corpus">
<caption>
                Calgary Corpus<br />
            </caption>
<thead>
<tr>
<th class="sortable-text" scope="col">
                        Codec
                    </th>
<th class="sortable-text" scope="col">
                        Setting
                    </th>
<th class="sortable-numeric" scope="col">
                        Enc. Time (s)
                    </th>
<th class="sortable-numeric" scope="col">
                        Dec. Time (s)
                    </th>
<th class="sortable-numeric" scope="col">
                        Size (bytes)
                    </th>
<th class="sortable-numeric" scope="col">
                        Ratio
                    </th>
</tr>
</thead>
<tbody>
<tr>
<td>
                        tar
                    </td>
<td>
                    </td>
<td>
                        0.000
                    </td>
<td>
                        0.000
                    </td>
<td>
                        3,265,536
                    </td>
<td>
                        1.000
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                        fast
                    </td>
<td>
                        0.171
                    </td>
<td>
                        0.062
                    </td>
<td>
                        1,244,763
                    </td>
<td>
                        0.381
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                    </td>
<td>
                        0.312
                    </td>
<td>
                        0.093
                    </td>
<td>
                        1,070,276
                    </td>
<td>
                        0.328
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                        best
                    </td>
<td>
                        0.561
                    </td>
<td>
                        0.062
                    </td>
<td>
                        1,062,584
                    </td>
<td>
                        0.325
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                        fast
                    </td>
<td>
                        0.499
                    </td>
<td>
                        0.218
                    </td>
<td>
                        961,633
                    </td>
<td>
                        0.294
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                    </td>
<td>
                        0.514
                    </td>
<td>
                        0.202
                    </td>
<td>
                        891,321
                    </td>
<td>
                        0.273
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                        best
                    </td>
<td>
                        0.483
                    </td>
<td>
                        0.218
                    </td>
<td>
                        891,321
                    </td>
<td>
                        0.273
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                        -1
                    </td>
<td>
                        0.187
                    </td>
<td>
                        0.078
                    </td>
<td>
                        1,244,985
                    </td>
<td>
                        0.381
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                    </td>
<td>
                        0.358
                    </td>
<td>
                        0.078
                    </td>
<td>
                        1,070,495
                    </td>
<td>
                        0.328
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                        -9
                    </td>
<td>
                        0.516
                    </td>
<td>
                        0.046
                    </td>
<td>
                        1,062,803
                    </td>
<td>
                        0.325
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        1
                    </td>
<td>
                        0.436
                    </td>
<td>
                        0.171
                    </td>
<td>
                        962,460
                    </td>
<td>
                        0.295
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        6
                    </td>
<td>
                        1.996
                    </td>
<td>
                        0.140
                    </td>
<td>
                        856,273
                    </td>
<td>
                        0.262
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        9
                    </td>
<td>
                        2.152
                    </td>
<td>
                        0.140
                    </td>
<td>
                        853,686
                    </td>
<td>
                        0.261
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m1
                    </td>
<td>
                        0.265
                    </td>
<td>
                        0.140
                    </td>
<td>
                        1,167,991
                    </td>
<td>
                        0.358
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m3
                    </td>
<td>
                        1.950
                    </td>
<td>
                        0.140
                    </td>
<td>
                        935,499
                    </td>
<td>
                        0.286
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m5
                    </td>
<td>
                        1.762
                    </td>
<td>
                        0.891
                    </td>
<td>
                        788,671
                    </td>
<td>
                        0.242
                    </td>
</tr>
</tbody>
</table>
<p>The Calgary Corpus dates back to the late 1980s.  It&#8217;s become <em>the</em> test to perform, but it may or may not adequately represent the standard compressor workload in 2008.  You&#8217;ll notice that WinRAR&#8217;s maximum setting produces the smallest archive, and does so more quickly than the neighboring 7-Zip runs.  Notice, too, that the fastest times tend to bottom out at a floor where the compressor is limited by the speed of the disk rather than by the CPU.</p>
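<p>As an aside, the Ratio column in the table above appears to be each archive&#8217;s size divided by the size of the plain tar file, so lower is better.  Here is a minimal Python sketch of that arithmetic, assuming that definition;  the function and variable names are made up for illustration, and the byte counts are copied from the table.</p>

```python
def compression_ratio(compressed_bytes: int, original_bytes: int) -> float:
    """Return the compressed size as a fraction of the original (lower is better)."""
    return compressed_bytes / original_bytes


# Sizes in bytes, copied from the Calgary Corpus table.
TAR_SIZE = 3_265_536
CALGARY_SIZES = {
    "gzip fast": 1_244_763,
    "bzip2 best": 891_321,
    "7z 9": 853_686,
    "rar m5": 788_671,
}


def ratio_table(sizes: dict, baseline: int) -> dict:
    """Map each compressor setting to its ratio, rounded to three places."""
    return {
        name: round(compression_ratio(size, baseline), 3)
        for name, size in sizes.items()
    }


if __name__ == "__main__":
    for name, ratio in ratio_table(CALGARY_SIZES, TAR_SIZE).items():
        print(f"{name:12s} {ratio:.3f}")
```

<p>Run against those four rows, this reproduces the ratios listed in the table, which is at least consistent with the assumed definition.</p>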
<table class="sortable rowstyle-even" id="canterbury-corpus">
<caption>
                Canterbury Corpus<br />
            </caption>
<thead>
<tr>
<th class="sortable-text" scope="col">
                        Codec
                    </th>
<th class="sortable-text" scope="col">
                        Setting
                    </th>
<th class="sortable-numeric" scope="col">
                        Enc. Time (s)
                    </th>
<th class="sortable-numeric" scope="col">
                        Dec. Time (s)
                    </th>
<th class="sortable-numeric" scope="col">
                        Size (bytes)
                    </th>
<th class="sortable-numeric" scope="col">
                        Ratio
                    </th>
</tr>
</thead>
<tbody>
<tr>
<td>
                        tar
                    </td>
<td>
                    </td>
<td>
                        0.000
                    </td>
<td>
                        0.000
                    </td>
<td>
                        2,821,120
                    </td>
<td>
                        1.000
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                        fast
                    </td>
<td>
                        0.140
                    </td>
<td>
                        0.062
                    </td>
<td>
                        872,570
                    </td>
<td>
			0.309
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                    </td>
<td>
                        0.249
                    </td>
<td>
                        0.062
                    </td>
<td>
                        739,066
                    </td>
<td>
			0.262
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                        best
                    </td>
<td>
                        1.138
                    </td>
<td>
                        0.062
                    </td>
<td>
                        736,223
                    </td>
<td>
			0.261
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                        fast
                    </td>
<td>
                        0.390
                    </td>
<td>
                        0.156
                    </td>
<td>
                        584,964
                    </td>
<td>
			0.207
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                    </td>
<td>
                        0.514
                    </td>
<td>
                        0.171
                    </td>
<td>
                        570,856
                    </td>
<td>
			0.202
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                        best
                    </td>
<td>
                        0.390
                    </td>
<td>
                        0.156
                    </td>
<td>
                        570,856
                    </td>
<td>
			0.202
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                        -1
                    </td>
<td>
                        0.140
                    </td>
<td>
                        0.078
                    </td>
<td>
                        872,795
                    </td>
<td>
			0.309
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                    </td>
<td>
                        0.343
                    </td>
<td>
                        0.062
                    </td>
<td>
                        739,286
                    </td>
<td>
			0.262
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                        -9
                    </td>
<td>
                        1.170
                    </td>
<td>
                        0.062
                    </td>
<td>
                        736,443
                    </td>
<td>
			0.261
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        1
                    </td>
<td>
                        0.280
                    </td>
<td>
                        0.930
                    </td>
<td>
                        569,953
                    </td>
<td>
			0.202
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        6
                    </td>
<td>
                        1.950
                    </td>
<td>
                        0.124
                    </td>
<td>
                        487,919
                    </td>
<td>
			0.173
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        9
                    </td>
<td>
                        2.199
                    </td>
<td>
                        0.124
                    </td>
<td>
                        485,391
                    </td>
<td>
			0.172
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m1
                    </td>
<td>
                        0.218
                    </td>
<td>
                        0.124
                    </td>
<td>
                        772,369
                    </td>
<td>
			0.274
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m3
                    </td>
<td>
                        1.232
                    </td>
<td>
                        0.093
                    </td>
<td>
                        515,831
                    </td>
<td>
			0.183
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m5
                    </td>
<td>
                        1.170
                    </td>
<td>
                        0.561
                    </td>
<td>
                        427,178
                    </td>
<td>
			0.151
                    </td>
</tr>
</tbody>
</table>
<p>I&#8217;m still not entirely able to figure out the Canterbury Corpus; it&#8217;s ostensibly an &#8220;update&#8221; to the aging Calgary Corpus. One would think that, having been created more than a decade after its predecessor with the express purpose of more accurately representing the compressor workload of 2001, it would at least be <em>larger</em> (hard disks and file sizes <em>have</em> grown since 1989, believe it or not). In fact it&#8217;s not, which was something of a disappointment to me, as I saw exactly the same trends as with the previous corpus. Is that an accurate assessment of the algorithms in question? Maybe not; read on.</p>
<table class="sortable rowstyle-even" id="tridge-large-corpus">
<caption>
                Large-Corpus<br />
            </caption>
<thead>
<tr>
<th class="sortable-text" scope="col">
                        Codec
                    </th>
<th class="sortable-text" scope="col">
                        Setting
                    </th>
<th class="sortable-numeric" scope="col">
                        Enc. Speed (s)
                    </th>
<th class="sortable-numeric" scope="col">
                        Dec. Speed (s)
                    </th>
<th class="sortable-numeric" scope="col">
                        Size (b)
                    </th>
<th class="sortable-numeric" scope="col">
                        Ratio
                    </th>
</tr>
</thead>
<tbody>
<tr>
<td>
                        tar
                    </td>
<td>
                    </td>
<td>
                        0.000
                    </td>
<td>
                        0.000
                    </td>
<td>
                        247,933,952
                    </td>
<td>
                        1.000
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                        fast
                    </td>
<td>
                        7.347
                    </td>
<td>
                        2.698
                    </td>
<td>
                        65,782,177
                    </td>
<td>
			0.265
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                    </td>
<td>
                        13.072
                    </td>
<td>
                        3.151
                    </td>
<td>
                        53,870,968
                    </td>
<td>
			0.217
                    </td>
</tr>
<tr>
<td>
                        gzip
                    </td>
<td>
                        best
                    </td>
<td>
                        21.855
                    </td>
<td>
                        2.449
                    </td>
<td>
                        53,536,722
                    </td>
<td>
			0.216
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                        fast
                    </td>
<td>
                        40.591
                    </td>
<td>
                        9.360
                    </td>
<td>
                        52,791,871
                    </td>
<td>
			0.213
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                    </td>
<td>
                        54.506
                    </td>
<td>
                        10.567
                    </td>
<td>
                        39,372,759
                    </td>
<td>
			0.159
                    </td>
</tr>
<tr>
<td>
                        bzip2
                    </td>
<td>
                        best
                    </td>
<td>
                        54.228
                    </td>
<td>
                        10.935
                    </td>
<td>
                        39,372,759
                    </td>
<td>
			0.159
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                        -1
                    </td>
<td>
                        6.349
                    </td>
<td>
                        2.208
                    </td>
<td>
                        65,782,411
                    </td>
<td>
			0.265
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                    </td>
<td>
                        12.682
                    </td>
<td>
                        2.527
                    </td>
<td>
                        53,871,197
                    </td>
<td>
			0.217
                    </td>
</tr>
<tr>
<td>
                        zip
                    </td>
<td>
                        -9
                    </td>
<td>
                        21.529
                    </td>
<td>
                        2.433
                    </td>
<td>
                        53,536,951
                    </td>
<td>
			0.216
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        1
                    </td>
<td>
                        19.578
                    </td>
<td>
                        6.608
                    </td>
<td>
                        47,343,400
                    </td>
<td>
			0.191
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        6
                    </td>
<td>
                        128.645
                    </td>
<td>
                        4.035
                    </td>
<td>
                        26,373,931
                    </td>
<td>
			0.106
                    </td>
</tr>
<tr>
<td>
                        7z
                    </td>
<td>
                        9
                    </td>
<td>
                        172.677
                    </td>
<td>
                        3.712
                    </td>
<td>
                        24,722,887
                    </td>
<td>
			0.100
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m1
                    </td>
<td>
                        9.016
                    </td>
<td>
                        4.446
                    </td>
<td>
                        48,939,730
                    </td>
<td>
			0.197
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m3
                    </td>
<td>
                        125.128
                    </td>
<td>
                        3.868
                    </td>
<td>
                        31,916,951
                    </td>
<td>
			0.129
                    </td>
</tr>
<tr>
<td>
                        rar
                    </td>
<td>
                        m5
                    </td>
<td>
                        138.435
                    </td>
<td>
                        23.852
                    </td>
<td>
                        29,200,310
                    </td>
<td>
			0.118
                    </td>
</tr>
</tbody>
</table>
<p>Most interestingly, in Tridgell&#8217;s &#8220;large-corpus&#8221; we finally see 7-Zip spring ahead of WinRAR in terms of pure compression ratio (and in speed, too, in some cases). I&#8217;m not an expert on compression, so I can&#8217;t tell you why certain efficiencies only manifest themselves over large datasets, but clearly 7-Zip wins in more modern cases where large datasets (mostly text, if Tridgell&#8217;s description is accurate) are present.</p>
<p>Clearly, the LZMA algorithm (the heart of 7-Zip) is something to be proud of; not only is it free software (7-Zip is released under the LGPL), but it often outperforms the popular WinRAR in both pure compression and efficiency. I&#8217;m a little surprised that the 7-Zip *nix port, p7zip, hasn&#8217;t gained more traction in Linux, but I suppose that old ways die hard. The cheapness of disk and bandwidth nowadays points to more transparent compression as the ideal, rather than whichever archiving format has the best compression in purely numeric terms.</p>
<p>For those of you looking for a decent free archiving program, check <a href="http://7-zip.org">7-Zip</a> out; for those of you who lust after data tables of compression benchmarks, give <a href="http://maximumcompression.com">Werner&#8217;s</a> a look: it&#8217;ll satiate your desire for tabular results in ways you never thought possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://heliologue.com/2008/01/19/common-compression-and-corpuses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

