A while ago, as OpenOffice.org 2.0 approached completion, I compared the file sizes of Microsoft Office’s binary format against OpenOffice’s new OpenDocument format. Recall that OpenDocument is an XML-based storage format that is ultimately compressed into a zip file, which produces smaller files. Microsoft’s new Office Open XML takes essentially the same approach (zipped XML), but with an entirely different XML schema.
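To show why zipping verbose XML pays off, here is a minimal Python sketch. It is an illustration, not how either office suite actually writes its files. It builds a hypothetical, highly repetitive XML body, stores it as a single deflated member named `content.xml` (the member name ODF packages use for document content) in an in-memory zip archive, and compares the sizes. The helper `zipped_size` and the sample markup are made up for this example.

```python
import io
import zipfile


def zipped_size(name: str, data: bytes) -> int:
    """Return the size in bytes of an in-memory zip holding one deflated member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return len(buf.getvalue())


# Hypothetical XML body: the same marked-up paragraph repeated many times,
# standing in for the redundancy typical of document markup.
paragraph = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>\n"
xml = (
    '<?xml version="1.0" encoding="UTF-8"?>\n<text>\n'
    + paragraph * 2000
    + "</text>\n"
).encode("utf-8")

raw = len(xml)
packed = zipped_size("content.xml", xml)
print(f"raw XML: {raw} bytes, zipped: {packed} bytes")
```

Repetitive tags and text are exactly what DEFLATE compresses well, so the archive comes out far smaller than the raw XML. Less redundant content, like random data, would shrink much less.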
I decided to revisit this kind of test, and used four test files:
- The text of Ulysses, in HTML format. I chose HTML to test how each format handles the extra markup, since it should, in theory, produce a more complex document.
- A very large generated Lorem Ipsum block (205’000+ characters), which is pseudo-random, but with a lot of redundancy.
- A one-page block of Lorem Ipsum text, to test the handling of small files.
- A randomly generated CSV with multiple kinds of text and 5’000 records, converted using OpenOffice Calc and Microsoft Excel.
Read on for the data table and observations.
