Real World Computing
Ship of fools
Office XML
I recently read about senior Microsoft staff being disappointed at the way their OOXML standards process was being handled, with suggestions that some parties might be being underhanded. This given existing accusations about Microsoft stuffing the voting panels, but never mind.
I dashed off an email to the Microsoft UK PR team asking how Microsoft felt justified in seeking ISO standard status for OOXML when it wasn't even capable of storing numbers correctly. Go back a few issues for the full blood and gore on this matter, but suffice it to say here that a number such as 1234.1234 is a problem for Excel because of the way the IEEE floating-point number system works.
Type 1234.1234 into Excel and it displays as 1234.1234 correctly. However, save the file as XML and a nasty little secret gets revealed: Excel actually stores it as 1234.1233999999999 in the XML file. I understand that Excel has to deal with IEEE quirks, but I'd like the XML file to interoperate without requiring me to fudge the issue manually, thank you.
No less than the great Jean Paoli replied to my email, and he started off by explaining the issues with IEEE number formats, which was kind but not really necessary. He then continued with the following (I've inserted my own comments):
"Excel does have the ability though to store 1234.1234 as 1234.1233999999999 or as 1234.1234 and Open XML of course allows both."
Woah, holy smoke, Batman. Open XML allows both? Maybe there's a "Store as what I meant" format for the cell I've overlooked? Or some "XML for compatibility reasons with other applications" checkbox in the File/Save dialog:
"If the user decides that the number is a 'floating point', then Excel stores and interprets 1234.1234 and 1234.1233999999999 as the same floating-point numbers. A very important point is to note that in order to ensure the highest level of precision across calculations, all floating-point cell values must be stored the same way. If an Open XML file contains 1234.1234, Excel will read it and interpret it exactly the same way it interprets 1234.1233999999999."
Yes, I understand that, too, but when you're working in some regulated application area, such as auditing, for example, it's arguable that Excel should never arbitrarily change one number into another just for convenience of display.
"If the user prefers to have a 'string representation' of the number, then the user can directly enter '1234.1234', causing Excel to store 1234.1234 as 1234.1234."
At this point, my head fell off. The answer was there in front of us all along: if you want your spreadsheet to interoperate in a mixed environment using XML as the transport layer, then make sure you pass all data in and out as strings. In other words, don't you dare think about using the default number formats for transporting... numbers.
