A Modest Construct

Tag: Windows

Desktop Linux revisited

About two years ago I wrote a piece called Five things that Desktop Linux really needs, attempting to air out my five biggest grievances with Desktop Linux. If you follow FOSS news, every year is heralded as “The Year of the Linux Desktop,” although such a thing clearly hasn’t happened yet. Now, two years later, I thought it would be interesting to revisit those five problems and see what kind of progress has been made.

Read the full article »

Windows XP SP3

Eh. It’s good, I suppose, and I’m sure its much-vaunted performance is there, but this is very much a service pack dealing with O/S guts, and not a massive feature pack a la SP2. I can’t immediately tell any difference.

In other news, Hardy Heron is released! And its software is already out of date.

Fatuous criticisms of Linux

I love Jeff Atwood’s blog, and because he’s a great writer and a great programmer, I can even accept that he’s drunk the Microsoft Kool-Aid, seemingly for both desktop and server.

But I admit to being troubled by his recent post. I might have taken it for an April Fool’s Day joke, except the post is dated 31 March 2008. After quoting a couple of Linux upgrade horror stories from a software-engineer-turned-club-owner, he concludes:

I can’t fault Jamie’s approach. A clean install of an operating system on a new hard drive — for kiosks running controlled hardware, no less — that’s as good as it gets.

Apparently, Linux is so complex that even a world class software engineer can’t always get it to work.

I find it highly disturbing that a software engineer of Jamie’s caliber would give up on upgrading software. Jamie lives and breathes Linux. It is his platform of choice. If he throws in the towel on Linux upgrades, then what possible hope do us mere mortals have?

Read the full article »

OpenOffice 2.4.0

After a number of delays, OpenOffice 2.4.0 has been officially released. Get it here. Check the mirrors for your own OS and localization. Some major new features include OpenGL transitions for Impress, some major charting improvements for Calc, and block selection for Writer.

Size curves for office file formats

Just a few days ago, I compared the relative sizes of Microsoft’s Office Open XML (OOXML) and OASIS’s OpenDocument format (ODF). I noticed that while OOXML was smaller for smaller amounts of text, ODF was smaller for larger documents. I was curious as to the turning point for this curve, which I hypothesize has to do with the complexity of OOXML’s markup.

I ran a brief test using generated Lorem Ipsum text in approximate amounts (the leftmost column), and recorded its size (in bytes) when pasted into Notepad, and then as OpenDocument Text (OpenOffice.org 2.3.1), and then as OOXML (Office 2007 SP1).

Below the data table is a graphical representation of the results. It’s clear that ODF slips below OOXML somewhere between 300 KB and 400 KB of raw textual data.

Comparison of file format sizes (all sizes in bytes)

Approx. size   Plain text   OOXML     ODF
5k             5030         12209     29408
25k            25158        14173     29715
50k            50318        15116     30039
100k           100638       18020     30616
200k           201276       24901     31670
300k           301918       31238     32676
400k           402558       37594     33634
800k           805118       61805     37418
1600k          1610238      110468    44881

file sizes
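
For anyone who wants to narrow down the turning point, here is a minimal Python sketch that interpolates linearly between the two rows where ODF overtakes OOXML. The function name crossover_text_size is made up for illustration, and straight-line growth between neighbouring measurements is an assumption, not something the test itself established.

```python
# Rows copied from the table above: (plain text bytes, OOXML bytes, ODF bytes).
ROWS = [
    (5030, 12209, 29408),
    (25158, 14173, 29715),
    (50318, 15116, 30039),
    (100638, 18020, 30616),
    (201276, 24901, 31670),
    (301918, 31238, 32676),
    (402558, 37594, 33634),
    (805118, 61805, 37418),
    (1610238, 110468, 44881),
]


def crossover_text_size(rows):
    """Estimate the plain-text size at which ODF becomes smaller than OOXML.

    Assumes both formats grow linearly between adjacent measurements.
    Returns None if OOXML never overtakes ODF in the given rows.
    """
    for (t0, x0, o0), (t1, x1, o1) in zip(rows, rows[1:]):
        d0 = x0 - o0  # negative while OOXML is still the smaller file
        d1 = x1 - o1  # zero or positive once ODF is the smaller file
        if d0 < 0 <= d1:
            fraction = -d0 / (d1 - d0)
            return t0 + fraction * (t1 - t0)
    return None


if __name__ == "__main__":
    estimate = crossover_text_size(ROWS)
    print(f"Estimated crossover: about {estimate:,.0f} bytes of plain text")
```

With only one measurement on either side of the crossing, the result is a rough estimate; more samples between 300k and 400k would pin it down better.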

Common compression and corpuses

Every so often, I dink around with benchmarking common lossless compressors. One of the best sites for it is, I think, Werner Bergmans’s Maximum Compression, which is a rather comprehensive running benchmark of just about every lossless compressor under the sun. Really, there are a lot. What you have to understand about the world of compressors is that they are very often academic projects or toys that very smart people play with in their free time. There are also companies (but not many) who invest in their own proprietary algorithms for lossless compression.

Here’s the catch, though: the quality of a compressor isn’t measured by its final compression ratio alone. The PAQ series of compressors, for instance, offers great compression and really, truly awful compression times. The same goes for the highest compression levels of WinRK (a proprietary Win32 format with an accompanying GUI). But disk is cheap: nobody really cares about a fraction of a percentage of compression efficiency, do they? What people really want is for their (inevitable) archiving GUI to take less time doing what it does.
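
To make the ratio-versus-time trade-off concrete, here is a rough Python sketch that times three compressors from the standard library (zlib, bz2 and lzma) on the same input. These are stand-ins chosen because they ship with Python, not the PAQ or WinRK compressors mentioned above, and the sample data is synthetic rather than any of the well-known corpuses. The names benchmark and sample_data are hypothetical.

```python
import bz2
import lzma
import time
import zlib

# Stand-in compressors from the Python standard library (an assumption made
# for illustration; these are not the formats discussed in the post).
COMPRESSORS = {
    "zlib (level 9)": lambda data: zlib.compress(data, 9),
    "bz2 (level 9)": lambda data: bz2.compress(data, 9),
    "lzma (preset 6)": lambda data: lzma.compress(data, preset=6),
}


def sample_data(lines=2000):
    """Build synthetic, repetitive text; a real test would read corpus files."""
    sentence = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    return b"".join(sentence + str(i).encode() + b"\n" for i in range(lines))


def benchmark(data):
    """Return (name, compressed size, ratio, seconds) for each compressor."""
    results = []
    for name, compress in COMPRESSORS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results.append((name, len(packed), len(packed) / len(data), elapsed))
    return results


if __name__ == "__main__":
    data = sample_data()
    for name, size, ratio, seconds in benchmark(data):
        print(f"{name:16} {size:8d} bytes  ratio {ratio:.3f}  {seconds:.4f} s")
```

A single timed run like this is noisy, especially on small inputs, so repeating each measurement and taking the best or median time would give steadier numbers.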

In this spirit, I have compiled not so much an exhaustive list of possible compression algorithms (I’ll leave that to Werner, who is very good at what he does), but rather a short list of the most common formats, tested on three different (relatively well-known) corpuses: the Calgary Corpus, the newer Canterbury Corpus, and Andrew Tridgell’s 1999 Large Corpus. The first two of these are corpuses used to test the very kind of academic project which I’ve avoided. I dislike using them because they are small in size, which means that there is significantly less opportunity for variations in compression formats to manifest themselves. In the interest of verifiability, however, I have used them. I also included Andrew Tridgell’s large corpus because it’s been my experience that results on small test corpuses tend to vary too much due to disk I/O latency and other vagaries of compression algorithms.

What will follow is a data table for each corpus, followed by some brief observations about each.

Read the full article »