Office "12" XML format session (OFF304)

Brian Jones presented the new and improved (default) XML formats coming in Office “12.” A couple of slides used a font that was too small, simply because so much information was packed into so little real estate. Other than that, and perhaps a couple of demo glitches, Brian gave a nice presentation.

Here are my notes:

  • XML is the default format in “12” (not new information).
  • The new XML formats are actually a Zip “package” of compressed XML documents, or “parts,” along with relationships and other bits. (Aside: this sure does smell like OpenOffice and the OASIS Open Document Format (ODF) for Office Applications under development.)
  • Unlike their binary (legacy) counterparts, the XML formats are compressed. Current storage measurements show savings of around 50% or more (e.g. .docx vs. .doc).
  • Patches will be available to give down-level Office apps support for the new XML formats. This is critical if the new formats are to be adopted more rapidly and pervasively. ISVs must know that they’re not investing entirely in the future with no regard for the present enterprise landscape.
  • The openness of the XML formats means that any compression library, any XML parser, on any system (e.g. Linux) may be employed.
  • The current bits (not made available at PDC) make a binary format copy due to schema volatility in the XML formats (i.e. the schemas are still undergoing work in fluid fashion); this will change in the final release (or before), of course.
  • The ordering of parts in a package is still in flux (e.g. important parts up front, etc.). Of course, this raises the question: what makes a part important, and who decides the definition (e.g. Microsoft, an ISV and/or our customers)?
  • It was good to hear confirmation of Office “12” and XPS (“Metro”) interop (e.g. a shared rationale for using Zip).
  • Word “12” format is similar to Word 2003 format; Excel “12” format is a significant revision; Excel 2003 was not a full fidelity format (e.g. not everything was saved to XML); and PPT “12” is a brand new format.
  • The “Open Packaging Conventions” need to be studied, understood and followed. Not adhering to these rules will result in data loss (e.g. a part with no reference is flagged as corrupt and dropped on the next save).
  • There is custom-defined XML schema support in O12: reference schemas provide formatting while custom schemas provide meaning, both applied to the same content.
  • I see these new XML formats as the new public, open API for OLE Structured Storage, which was previously only widely available via COM and Office binary formats. This transition is huge!
  • The new formats should make solutions development much cleaner (e.g. InfoPath-Word integration, custom XML/XSL and Word XML, metadata exchange across different stores and systems, including offline scenarios, etc.).
  • The slide entitled “Sample Solution Scenarios” is particularly interesting when considering potential applications that add value to the new XML formats in Office “12” (e.g. inbound and outbound criteria/validation scans, inject/remove agents, taxonomy/ontology formation/refinement agents, etc.). Knowing that the packaged result is open and extensible (e.g. the VSTO manifest, where Office “12” is concerned, will just be another part in the package) should prove to be a powerful development opportunity.
  • You should expect to see a ton of samples (based on significant feedback from the 2003 release), i.e. not just docs that state what, but samples that show how and why.
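
To make the packaging notes above a bit more concrete, here is a minimal Python sketch (my own, not from Brian's session) of the "part with no reference" check. It builds a toy package in memory with the standard zipfile module, reads the relationships with a stock XML parser, and reports any part that no relationship points to. The part names, the `_rels/.rels` location, the relationship namespace and the `find_unreferenced_parts` helper are all assumptions for illustration; the preview schemas and the actual Office "12" behavior may well differ.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Assumed namespace for relationship markup; illustrative only.
RELS_URI = "http://schemas.openxmlformats.org/package/2006/relationships"
RELS_NS = "{" + RELS_URI + "}"


def build_toy_package() -> bytes:
    """Build a tiny, hypothetical package in memory (not a real Word file)."""
    root_rels = (
        '<Relationships xmlns="' + RELS_URI + '">'
        '<Relationship Id="rId1" Type="urn:example:document" '
        'Target="word/document.xml"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("_rels/.rels", root_rels)
        zf.writestr("word/document.xml", "<document><p>Hello</p></document>")
        # No relationship targets this part, so it is an "orphan".
        zf.writestr("custom/orphan.xml", "<data>nobody points at me</data>")
    return buf.getvalue()


def rels_source_dir(rels_name: str) -> str:
    """Folder that relative targets in a relationships part resolve against.

    '_rels/.rels' -> '' and 'word/_rels/document.xml.rels' -> 'word'.
    """
    return posixpath.dirname(posixpath.dirname(rels_name))


def find_unreferenced_parts(package_bytes: bytes) -> list:
    """Return the sorted names of parts that no internal relationship targets."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        names = zf.namelist()
        rels_parts = [n for n in names if n.endswith(".rels")]
        referenced = set()
        for rels_name in rels_parts:
            base = rels_source_dir(rels_name)
            root = ET.fromstring(zf.read(rels_name))
            for rel in root.iter(RELS_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # external targets are not parts in the package
                target = rel.get("Target", "")
                if target.startswith("/"):
                    resolved = target.lstrip("/")
                else:
                    resolved = posixpath.normpath(posixpath.join(base, target))
                referenced.add(resolved)
    # Relationship parts and the content-types stream are bookkeeping, not content.
    bookkeeping = set(rels_parts) | {"[Content_Types].xml"}
    return sorted(n for n in names if n not in bookkeeping and n not in referenced)


print(find_unreferenced_parts(build_toy_package()))  # → ['custom/orphan.xml']
```

A solution that injects its own part into a package would presumably want to run a check like this before handing the file back to Office, since an unreferenced part is exactly what Brian warned would be dropped on the next save.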

Brian advises that we stay tuned to his blog. Later this week (perhaps as early as tomorrow), Microsoft will release the current (preview) reference schemas for Office “12,” at which point Brian promises to blog in more detail about this development canvas.

9/14/2005 update: Brian posts information to download the preview schemas.

Tags: PDC05