A few days ago, I compared the file sizes of Microsoft's Office Open XML (OOXML) and OASIS's OpenDocument Format (ODF). I noticed that OOXML files were smaller for short documents, while ODF files were smaller for longer ones. I was curious where the crossover point lies, and I suspect it has to do with the complexity of OOXML's markup.

I ran a brief test with generated Lorem Ipsum text in approximate amounts (the leftmost column of the table). For each amount, I recorded the file size in bytes three ways: as plain text pasted into Notepad and saved, as OpenDocument Text (OpenOffice.org 2.3.1), and as OOXML (Office 2007 SP1).
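Both .docx and .odt files are ZIP packages of XML parts, so one way to probe the markup hypothesis is to look inside them and see which parts grow. The following is a minimal sketch, not part of the original test, using Python's standard-library `zipfile` module; `describe_package` is a hypothetical helper name, and it only reports sizes without interpreting them.

```python
import os
import sys
import zipfile


def describe_package(path):
    """Return the on-disk size of a ZIP-based document (.docx or .odt)
    and a list of (part name, uncompressed size, compressed size)."""
    total = os.path.getsize(path)
    parts = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            parts.append((info.filename, info.file_size, info.compress_size))
    # Largest compressed parts first, since they dominate the file size.
    parts.sort(key=lambda p: p[2], reverse=True)
    return total, parts


if __name__ == "__main__":
    # Usage: python describe_package.py sample.docx sample.odt
    for path in sys.argv[1:]:
        total, parts = describe_package(path)
        print(f"{path}: {total} bytes on disk")
        for name, raw, packed in parts:
            print(f"  {name:40} {raw:>10} -> {packed:>10}")
```

Comparing the main text part of each package (word/document.xml in OOXML, content.xml in ODF) across the test files would show whether the body markup, rather than fixed overhead such as styles or thumbnails, accounts for the different growth rates.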

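After the data table is a graphical representation of the results. ODF drops below OOXML somewhere between 300 KB and 400 KB of raw text: at roughly 300 KB the ODF file is still larger (32,676 vs. 31,238 bytes), but at roughly 400 KB it is smaller (33,634 vs. 37,594 bytes).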

Comparison of file format sizes (in bytes)
Text      Plain text       OOXML         ODF
(approx)   (Notepad)     (.docx)      (.odt)
5k              5030       12209       29408
25k            25158       14173       29715
50k            50318       15116       30039
100k          100638       18020       30616
200k          201276       24901       31670
300k          301918       31238       32676
400k          402558       37594       33634
800k          805118       61805       37418
1600k        1610238      110468       44881
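The crossover can be estimated a little more precisely by linear interpolation between the two rows where the ordering flips. This is a minimal sketch that assumes the second size column is OOXML and the third is ODF (the reading consistent with the 300 KB to 400 KB crossover described above) and assumes straight-line growth between measurements; `crossover` is a hypothetical helper name.

```python
# Measured sizes in bytes, copied from the table; x values are the
# approximate text amounts in KB from the leftmost column.
TEXT_KB = [5, 25, 50, 100, 200, 300, 400, 800, 1600]
OOXML = [12209, 14173, 15116, 18020, 24901, 31238, 37594, 61805, 110468]
ODF = [29408, 29715, 30039, 30616, 31670, 32676, 33634, 37418, 44881]


def crossover(xs, a, b):
    """Return the x at which series a first becomes larger than series b,
    interpolating linearly between adjacent measurements, or None."""
    for i in range(len(xs) - 1):
        d0 = a[i] - b[i]          # difference at the left sample
        d1 = a[i + 1] - b[i + 1]  # difference at the right sample
        if d0 <= 0 < d1:
            # Solve for the point where the straight line from d0 to d1 hits zero.
            return xs[i] + (xs[i + 1] - xs[i]) * (-d0) / (d1 - d0)
    return None


if __name__ == "__main__":
    x = crossover(TEXT_KB, OOXML, ODF)
    print(f"ODF becomes smaller at about {x:.0f} KB of text")
```

With these numbers it reports a crossover near 327 KB of text. The samples are sparse, so this is only an estimate; a finer sweep between 300 KB and 400 KB would pin the point down better.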

