EOOXML Objections Clearinghouse
This page is a clearinghouse for ongoing work to document problems with the Ecma 376 ("EOOXML") document-format proposal. It is intended to be used to collect information and reasoning for future versions of the EOOXML objections document and similar documents. As such, any (correct) information on flaws in Ecma 376 is welcome; err on the side of inclusiveness.
Be sure to peruse the EOOXML project lessons learned page for important information on avoiding edit conflicts and communicating with other editors, EOOXML at JTC-1 for further links, and EOOXML Contacts for ISO-member contact information.
Notice of related cases
Reviewers of Ecma 376 should be aware that issues raised by the Ecma 376 proposal—Microsoft's refusal to release the specifications for its legacy file formats and/or its refusal to support the OpenDocument standard—may conceivably become involved in antitrust cases, thus raising a need for JTC-1's heightened scrutiny of the Ecma 376 proposal. Particulars are given in the Related cases section, below.
Conventions used in this document
1. Ecma 376
"Ecma 376" is used to denote the ECMA-376 Office Open XML File Formats as submitted to the ISO JTC-1 committee.
The original PDF file of the document submitted to JTC-1 has been removed from the Ecma web site, but a mirror copy is available.
2. Page numbers
All page numbers are with respect to the single PDF file version of Ecma 376, as submitted to JTC-1 and mirrored as noted above. The page numbers printed in the page footers of the specification do not match the page positions in the PDF file. For example, the page labeled "304" is actually page 620 in the PDF. In this document we cite the PDF page numbers, e.g., "page 620". All page numbers not specifically identified as being citations to a particular document are citations to the document version discussed in this paragraph.
3. Compatibility Note
The sole market requirement given for Ecma 376 is that of compatibility with previously existing Microsoft Office documents and the need to migrate them to XML. In this document we include several notes on the topic of compatibility, and they look like this:
- Compatibility Note
- Information pertinent to compatibility with pre-existing Microsoft Office documents goes here.
Questions to be decided
- Does the proposed standard contradict, conflict with or duplicate existing international standards, in particular ISO standards?
- Would the existence of OOXML as an International Standard in addition to ISO/IEC 26300 cause user confusion?
- Can national bodies reasonably be expected to competently evaluate a proposed standard exceeding 6,000 pages within the time and constraints of fast-track procedures, particularly given the host of issues raised by the particular proposal at issue?
- Would the proposed standard create 'obstacles to international trade' within the meaning of the Agreement on Technical Barriers to Trade?
- If so, is the proposed standard necessary nonetheless, i.e., is there a market requirement for a duplicative standard that contradicts ISO/IEC 26300, the OpenDocument standard?
- Would fast track processing of the proposed standard place JTC-1's reputation at risk?
For all questions, along with an answer and an argument, some degree of impact analysis would be useful. For example, how confused would users be? What is the practical impact?
Introduction and Summary
This document reports grounds for objection to the preparation of Ecma 376 (the Ecma Office Open XML standard) as an ISO/IEC standard by Joint Technical Committee 1 ("JTC-1"). Specifically, Ecma 376 should be diverted from its present fast-track processing and should be remanded to Ecma International for: (i) harmonization with ISO/IEC 26300:2006, the OpenDocument standard, and with the numerous other standards that it contradicts; and (ii) development of more suitable intellectual property documents that actually grant rights to implement the specification.
The overarching controlling law is the international Agreement on Technical Barriers to Trade ("TBT"). It provides in Article 2:
2.1 Members shall ensure that in respect of technical regulations, products imported from the territory of any Member shall be accorded treatment no less favourable than that accorded to like products of national origin and to like products originating in any other country.
2.2 Members shall ensure that technical regulations are not prepared, adopted or applied with a view to or with the effect of creating unnecessary obstacles to international trade. For this purpose, technical regulations shall not be more trade-restrictive than necessary to fulfill a legitimate objective, taking account of the risks non-fulfilment would create. Such legitimate objectives are, inter alia: national security requirements; the prevention of deceptive practices; protection of human health or safety, animal or plant life or health, or the environment. In assessing such risks, relevant elements of consideration are, inter alia: available scientific and technical information, related processing technology or intended end-uses of products.
ISO, IEC and JTC-1 have faithfully executed those instructions over the years, as reflected in the ISO primary goal of "one standard, one test, and one conformity assessment procedure accepted everywhere," producing a large body of international and open standards that have stimulated competition.
The JTC-1 decision whether Ecma 376 should remain on the fast-track procedure under which it was submitted by Ecma presents issues of monumental importance within the standards development establishment, the software development industry, and for software users worldwide. Pared to its essentials, the issue is whether powerful vendors with established monopolies are thereby entitled to their own private and non-interoperable international standards, standards that cannot feasibly be implemented by any other vendor.
Viewed from another perspective, the issue is whether ISO/IEC and JTC-1 are willing to accept the loss of reputation that would inevitably accompany a decision to process and adopt Ecma 376 on the fast track.
Viewed from yet another perspective, the issue is whether the Digital Divide will widen even further because the amazingly successful ISO/IEC 26300:2006 (OpenDocument) standard already widely supported by both free and proprietary software is rendered non-interoperable with an international standard that can only be implemented by a single vendor and any hand-chosen partners it may decide to favor.
There can be no blinking past the fact that adoption of Ecma 376 as an international standard would dramatically tilt the economic playing field for competitors. How then can it not be trade restrictive, an obstacle to international trade within the meaning of the TBT? Viewed most narrowly, the relevant legal question is whether an enormous obstacle to international trade is nonetheless necessary, per TBT section 2.2.
Having deliberately boycotted the multiple-vendor effort to develop the OpenDocument standard, Microsoft now petitions JTC-1 through Ecma International for its own duplicative and contradictory standard, offering a massive specification for review in the 30-day period provided for filing contradictions, a specification that was not made publicly available until less than 30 days before it was submitted to JTC-1 and for which there is still no full-featured reference implementation available for testing and evaluation purposes. Quite simply, it appears Microsoft is attempting to game the system to obtain standardization of a specification that flies in the face of the TBT and the very purposes of standardization.
So that Microsoft's leverage is clear, Ecma 376 offered only a single justification for a specification that duplicates the functionality of the ISO OpenDocument standard—except for OpenDocument's interoperability features. That justification is the claimed need for compatibility with billions of documents stored worldwide in Microsoft Office's legacy file formats. Microsoft attempts to hold those documents hostage, insisting on a right to maintain its vendor lock on its existing customer base through a new file format that is incompatible with the OpenDocument standard.
The specifications for the binary file formats, compatibility with which is the very reason given for the proposed standard, are nowhere disclosed. Worryingly, this nondisclosure appears to enable Microsoft to extend its existing office software monopoly to the migration of those binary formats to XML. That nondisclosure of binary format specifications is itself at issue in antitrust litigation or under investigation.
This document assembles objections to the fast track processing of the Ecma 376 specification by JTC-1 and concludes with a discussion of applicable law and recommendations.
It is planned to continue refining this document during the period provided for JTC-1 P-member bodies to submit contradictions in regard to Ecma 376. This version is being released at this time so that the general thrust of the document can be reviewed and so that information in the document can begin circulating to those who need the information.
The fast track process forces a rapid and unfinished response, which is one more reason why Ecma 376 ought to be removed from that fast track, to give everyone time to fully evaluate it.
- Detailed analysis of controlling law regarding JTC-1 objections: The case for a valid contradiction of Microsoft Office Open XML at ISO has not been rebutted (4 Feb 2007).
Points & Authorities
Microsoft seeks through Ecma and ISO to obtain international standard status for its latest Microsoft Office file formats. This document seeks to demonstrate that no development company other than Microsoft could feasibly implement the specifications. Microsoft in effect argues that there is no choice but to award it an exclusive monopoly on the migration of legacy Microsoft Office formats to XML, and only to its own flavor of XML.
There has been insufficient disclosure and too little time to do a comprehensive review.
Ecma 376 contradicts numerous international standards
The Gregorian Calendar
The Gregorian calendar is the most widely used calendar in the world. A modification of the Julian calendar, it was decreed by Pope Gregory XIII on 24 February 1582. The Gregorian calendar forms the basis of many international standards such as ISO 8601.
Ecma 376 section 184.108.40.206 page 3305, “Date Representation”, conflicts with the Gregorian calendar in the calculation of dates. Specifically, it requires spreadsheet implementations to incorrectly treat the year 1900 as a leap year. This contradicts the Gregorian calendar, ISO 8601 and the civil calendar adopted by most nations of the world.
- Compatibility Note
- There is a known bug in Microsoft Excel that treats the year 1900 as a leap year. Changing the Gregorian calendar is not necessary (or the best way) to achieve compatibility with spreadsheets that depend on this bug.
- A better solution is to define the spec correctly, and when converting old binary files to the new format, Microsoft Office would (for example) replace WEEKDAY() by WEEKDAY()+1 for any dates affected by this bug. Alternatively, since they have compatibility flags for several other legacy bugs, this could be handled that way as well, e.g., when importing a legacy Excel document, set a flag "LeapYearBug=true", but when creating a new OOXML document this flag would not be set and dates would be described correctly.
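The leap-year disagreement described above can be sketched in a few lines. This is only an illustration: the function names and the `emulate_1900_bug` flag are hypothetical and are not taken from Ecma 376 or from any Microsoft product.

```python
from datetime import date


def is_gregorian_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def ordinal_of_march_first(year: int, emulate_1900_bug: bool = False) -> int:
    """Day-of-year number of 1 March.

    With emulate_1900_bug=True (a hypothetical flag), 1900 is wrongly
    counted as a leap year, as the text above says the proposal requires.
    """
    leap = is_gregorian_leap_year(year) or (emulate_1900_bug and year == 1900)
    return 31 + (29 if leap else 28) + 1


# Python's own calendar follows the Gregorian rule, so it can serve as a check.
gregorian = date(1900, 3, 1).timetuple().tm_yday
print(ordinal_of_march_first(1900), gregorian)
print(ordinal_of_march_first(1900, emulate_1900_bug=True))
```

Every date after 28 February 1900 is shifted by one day between the two conventions, which is why the choice cannot be left implicit in a file format.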
ISO 8601 (Representation of dates and times)
ISO 8601 is the ISO standard for date and time representations.
Ecma 376 section 220.127.116.11 page 3305, “Date Representation” stipulates that dates must be represented as numeric codes counting from 1900 or 1904. This is in conflict with ISO 8601.
This section also forbids applications from supporting years before 1900, also in conflict with ISO 8601.
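To make the contrast concrete, here is a minimal sketch that converts a numeric date code into an ISO 8601 string. It assumes, for illustration only, a 1900-based system in which serial 1 is 1 January 1900 and serial 60 is the nonexistent 29 February 1900; that mapping is an assumption of this sketch, not a quotation from the specification, and `serial_to_iso8601` is a hypothetical name.

```python
from datetime import date, timedelta


def serial_to_iso8601(serial: int) -> str:
    """Convert an assumed 1900-based serial date code to an ISO 8601 date.

    Assumption (not quoted from the specification): serial 1 is 1900-01-01
    and serial 60 stands for 1900-02-29, a day that never existed.
    """
    if serial < 1:
        raise ValueError("serial date codes in this sketch start at 1")
    if serial == 60:
        raise ValueError("serial 60 maps to 1900-02-29, which is not a real date")
    # Serials above 60 are one too large because of the phantom leap day.
    offset = serial - 1 if serial < 60 else serial - 2
    return (date(1900, 1, 1) + timedelta(days=offset)).isoformat()


print(serial_to_iso8601(59))     # last real day before the phantom leap day
print(serial_to_iso8601(61))     # first day after it
print(date(1776, 7, 4).isoformat())  # ISO 8601 has no trouble before 1900
```

An ISO 8601 string carries its own meaning, whereas a serial number is meaningful only together with the epoch and the leap-year convention used to produce it.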
ISO 639 (Codes for the Representation of Names of Languages)
Ecma 376 section 2.18.51 page 2530, ST_LangCode requires support for a fixed list of numeric language codes as an alternative to the already existing set provided by ISO 639. This is a conflict with ISO 639. The codes standardized by ISO 639 include the use of a Registration Authority to process requests for new language codes. This is preferable to a fixed list attached to a document standard.
(Although 2.18.51 also allows ISO 639 to be used, the practical effect of specifying two standards is that an implementation must support both ISO 639 and the nonstandard ST_LangCode in order to fully interoperate.)
(ST_LangCode also seems to be incorrectly specified as two hexadecimal digits, which on its face would limit Ecma 376 to supporting 256 languages. However, four digits seems to be meant, part of a general inconsistency in specifying the lengths of hexadecimal numbers, as described below.)
- Blog: Standardizing Away the Word's Languages, Bill Poser, Language Log
ISO/IEC 8632 (Computer Graphics Metafile) and W3C SVG (Scalable Vector Graphics)
ISO/IEC 8632 (CGM) is the ISO standard for computer graphics metafiles: "2D graphical (pictorial) information" consisting of "vector graphics", "raster graphics", and "text" (NIST, 1998). Another possibility is SVG, the W3C standard "for describing two-dimensional vector and mixed vector/raster graphics in XML". (CGM is a binary format designed for large technical schematics, whereas SVG is an XML format designed for creative graphics and illustrations.) Much of SVG was incorporated by reference into ISO 26300.
Ecma 376 uses neither of these international standard formats for vector graphics, and instead defines two new incompatible formats. Ecma 376 "DrawingML" (section 14 page 132) defines a vector drawing XML format in conflict with the industry standard W3C SVG. Ecma 376 "VML" (section 8.6.2 page 24) requires support for another drawing XML format in conflict with W3C SVG. (Note that VML was proposed by Microsoft as a W3C standard in 1998, but was rejected in favour of SVG.)
Ecma 376 section 18.104.22.168 page 5679, "Embedded Object Alternate Image Requests Types" and section 22.214.171.124 page 5738, "Clipboard Format Types" also refer to Windows Metafiles or Enhanced Metafiles, a nonstandard vector-graphics format originating in Microsoft Windows.
- Compatibility Note
- Defining a new standard for vector drawings is not necessary for compatibility with existing Microsoft Office documents. Even if Ecma 376 legitimately needs drawing functions not available in SVG, it should reference SVG for all features that are provided by SVG.
- ISO 26300 OpenDocument illustrates how to solve this problem:
- There are functions in ISO 26300 that are not present in SVG (3D drawings).
- There are functions in SVG not present in ISO 26300 (cubic bezier curves).
- Therefore, ISO 26300 uses an SVG-compatible namespace for all drawing functions that can be provided by SVG, and a separate namespace for 3D drawing functions, thereby avoiding any conflict with SVG.
ISO/IEC 26300:2006 (OpenDocument Format for Office Applications)
ISO/IEC 26300 OpenDocument is the ISO/IEC standard for office productivity applications. It covers the functionality needed for text documents, spreadsheets, drawings and presentations for office applications.
Ecma 376 duplicates the functionality of the existing OpenDocument standard as its core purpose is to support text documents, spreadsheets, drawings and presentations for office applications. Ecma 376 contradicts ISO 26300.
- Compatibility Note
- As far as we can see, all the functionality in Ecma 376 can be represented (better) in ISO 26300, and the much greater length of Ecma 376 is due to its poor design and failure to generalize and use existing standards. For example:
- Where Ecma 376 specifies attributes like useWord97LineBreakRules, ISO 26300 uses the more generic (and hence more powerful) style:line-break to select the set of line breaking rules for text.
- Where Ecma 376 specifies 50 pages of clip art (pages 2415 to 2465), ISO 26300 allows the insertion of an arbitrary image into the document.
- However, as the specification is over 6,000 pages long, it is impossible to say this authoritatively.
- Compatibility Note
- If there really is functionality in Ecma 376 that legitimately needs to be standardized at the ISO level (which we doubt), it should be standardized as an extension of the existing ISO 26300 and not as a new, incompatible standard.
- This point is discussed further in the section Ecma 376 is not 'necessary' as defined by the Agreement on TBT
W3C MathML (Mathematical Markup Language)
MathML is the W3C standard for "describing mathematical notation and capturing both its structure and content".
Ecma 376 section 7.1 "Math" (page 747) covers mathematical expressions, and defines a format in conflict and incompatible with the W3C Recommendation MathML.
Note: MathML is included in the ISO/IEC 26300 standard (OpenDocument Format) in section 12.5 "Mathematical Content". As a result, Ecma 376 conflicts with an ISO specification for mathematical notation.
- Compatibility Note
- If there is functionality in Ecma 376 that legitimately cannot be represented in MathML, Ecma 376 must still use MathML-compatible tags for all the features that can be expressed in MathML. See the previous section ("W3C SVG") for an illustration of how ISO 26300 solves a similar situation with the SVG standard.
ISO/IEC 10118-3, W3C XML-ENC, and other cryptographic hash standards
Ecma 376 ignores accepted standards for cryptographic hashes and defies expert standards for cryptography, by proposing its own hash algorithms which are almost certainly flawed.
Cryptography, including the construction of secure hash functions, is very difficult. Weaknesses are regularly discovered even in publicly vetted cryptographic algorithms long thought secure, whereas proprietary cryptographic methods not subjected to intensive public scrutiny are nearly always found to be seriously flawed (see e.g. Schneier).
Several government agencies and standards bodies with expertise in encryption have made lists of recommended hash functions, all of which have received extensive scrutiny by cryptographers. For example, in the area of secure hash functions:
- ISO has chosen the "Whirlpool" algorithm as standard ISO 10118-3.
- The W3C, in its XML-ENC standard, includes a list of algorithms: SHA1, SHA256, SHA512, RIPEMD-160.
- The European NESSIE project recommends: ISO 10118-3 ("Whirlpool"), SHA-256, SHA-384 and SHA-512.
- In the USA, NIST recommends SHA1, SHA224, SHA256, SHA384, and SHA512.
- In Japan, CRYPTREC recommends: MD5, RIPEMD-160, SHA1, SHA256, SHA384, and SHA512.
Ecma 376 section 126.96.36.199 (page 1941) does not follow the advice of any of these organizations. Instead, it defines new hashing algorithms that have not undergone scrutiny by the cryptographic community.
Section 188.8.131.52 (page 1941) defines one; Sections 184.108.40.206 (page 2786) "protectedRange" and 3.2.29 (page 2698) define another very similar algorithm. Nowhere is there clear notification that these algorithms are likely to be extremely flawed and thus should not be used in new applications.
The Ecma 376 hash functions are almost guaranteed to be flawed and insecure. This poses two security risks:
- The immediate risk is that hashed document passwords may be determinable from the hashed value. Since users often reuse document passwords for other documents and other systems (whether they should or not), including an inadequately reviewed hash function risks enabling forgery and identity theft of many other systems by attackers.
- Defining a new hash function inside an ISO standard (giving it the ISO seal of approval) creates the expectation that this hash function has received proper scrutiny by the cryptographic community (as ISO 10118-3 has) and is secure. This is likely to lead the industry into using the new insecure hash function(s) in a variety of security-critical applications, making many other security-critical applications directly vulnerable as well.
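For contrast, the sketch below shows what relying on a publicly vetted primitive might look like, using SHA-256 (which appears on several of the lists above) from Python's standard library. It is not what Ecma 376 specifies. The use of PBKDF2 as the construction, the salt handling, the iteration count, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import os


def make_verifier(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a password verifier with PBKDF2-HMAC-SHA-256.

    PBKDF2 and the iteration count are illustrative choices made for this
    sketch; the surrounding text only names the underlying hash functions.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def check_password(candidate: str, salt: bytes, expected: bytes,
                   iterations: int = 100_000) -> bool:
    """Compare in constant time to avoid leaking how many bytes matched."""
    return hmac.compare_digest(make_verifier(candidate, salt, iterations), expected)


salt = os.urandom(16)  # a fresh random salt would be stored with the document
verifier = make_verifier("correct horse", salt)
print(check_password("correct horse", salt, verifier))
print(check_password("wrong guess", salt, verifier))
```

The point of the sketch is only that a document format can reference an existing, reviewed algorithm by name rather than defining a new one.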
W3C SMIL (Synchronized Multimedia Integration Language)
SMIL is the W3C standard for "synchronized multimedia presentation". As the Recommendation states, with SMIL an author can:
- Describe the temporal behavior of the presentation.
- Describe the layout of the presentation on a screen.
- Associate hyperlinks with media objects.
Ecma 376 section 4.4 "Animation" (page 565) covers presentation animations (slide transitions), in conflict with the W3C Recommendation SMIL.
- Compatibility Note
- It is not necessary to define a new standard for slide transitions to achieve compatibility with existing Microsoft Office documents. Even if Ecma 376 legitimately needs slide transition functions not available in SMIL, it should reference SMIL for all features that are provided by SMIL.
- ISO 26300 OpenDocument illustrates this point. ISO 26300 uses SMIL-compatible attributes for slide transitions whenever such an attribute exists. In a similar way, if there is functionality in Ecma 376 that legitimately cannot be represented in SMIL, Ecma 376 must still use SMIL-compatible tags for all the features that can be expressed in SMIL.
Ecma 376 is immature and inconsistent
Even in the limited time available for public review of more than 6,000 pages, a large number of inconsistencies and flaws have become apparent in the Ecma 376 specification, in addition to the major omissions and disregard for existing standards described elsewhere in this document. Although any one of these flaws, taken individually, is easily corrected, together they demonstrate the undue haste and lack of care that went into the rapid drafting of this proposed standard.
Invents units of measurement
Many attributes throughout the Ecma 376 spec take values in "English Metric Units" (EMU). For example, attributes of type ST_PositiveCoordinate (220.127.116.11, page 4505) are measured in EMUs. This is not a known unit in existing literature. It is defined only in passing, inside a paragraph in section 18.104.22.168 page 655, as "91440 EMUs/U.S. inch, 36000 EMUs/cm". Similarly, section 2.18.105 (page 1836) specifies "twips": twentieths of a point (1/1440th of an inch).
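The arithmetic behind these units can be sketched as follows. The EMU figures are taken verbatim from the quotation above and are not independently checked against the specification here; the constant and function names are hypothetical.

```python
from fractions import Fraction

# Twips: twentieths of a point, with 72 points to the inch.
TWIPS_PER_POINT = 20
POINTS_PER_INCH = 72
TWIPS_PER_INCH = TWIPS_PER_POINT * POINTS_PER_INCH

# EMU figures exactly as quoted in the text above (unverified in this sketch).
EMU_PER_INCH_AS_QUOTED = 91440
EMU_PER_CM_AS_QUOTED = 36000


def twips_to_points(twips: int) -> Fraction:
    """Convert twips to typographic points."""
    return Fraction(twips, TWIPS_PER_POINT)


def twips_to_inches(twips: int) -> Fraction:
    """Convert twips to inches."""
    return Fraction(twips, TWIPS_PER_INCH)


def implied_cm_per_inch() -> Fraction:
    """Centimetres per inch implied by the two quoted EMU figures."""
    return Fraction(EMU_PER_INCH_AS_QUOTED, EMU_PER_CM_AS_QUOTED)


print(TWIPS_PER_INCH)            # matches the 1/1440 inch given in the text
print(twips_to_points(240))      # 240 twips expressed in points
print(float(implied_cm_per_inch()))
```

Both units are simply scaled inches, so an implementation that already handles points, inches, or centimetres gains nothing from them except extra conversion code.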
Internal inconsistencies: the w:sz element
The w:sz element is an example of major internal inconsistencies in the specification's measurements:
- For fonts, the w:sz element specifies the size in half points (22.214.171.124, page 1013).
- For frameset, the w:sz element has a string value that could be a relative value, a percentage, or a number of pixels (126.96.36.199, page 2136). The examples on page 2138 do not refer to w:sz at all.
- However, as the child of rPr (3.4.11, page 2846), its value is in points.
Note that in the Spreadsheet section (section 3), none of the examples have any namespace prefixes.
The w:sz attribute is also internally inconsistent:
- For table borders, the w:sz attribute is specified in eighths of a point, unless the border style is an art border, in which case the width is in points (188.8.131.52, page 824).
- When used as an attribute of restoredLeft (184.108.40.206, page 3786), it specifies the size of a dimension in normal view as a percentage of the screen.
- In presentations, as an attribute of the ph element (220.127.116.11, page 3825), it is an enumerated value with choices "full", "half", and "quarter" (4.8.13, page 3957).
- When sz is used as an attribute of defRPr (default character properties; 18.104.22.168.2, page 4197), it is the size of a font in hundredths of a point.
Such inconsistencies dramatically increase the complexity of implementing the specification. All measurement attributes should be reviewed and adapted as necessary so that a single coherent system of units is applied consistently throughout the specification.
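The practical cost for an implementer can be sketched as a lookup table: the same number means a different physical size depending on where sz appears. The context labels below are hypothetical names invented for this sketch; only the units come from the lists above, and the non-numeric uses (frameset strings, percentages, enumerations) are left out.

```python
from fractions import Fraction

# Divisor that turns an sz value into points, per context described above.
# The context keys are hypothetical labels, not names from Ecma 376.
SZ_DIVISOR_BY_CONTEXT = {
    "font_size": 2,         # half points
    "table_border": 8,      # eighths of a point
    "art_border": 1,        # points
    "spreadsheet_rPr": 1,   # points
    "defRPr": 100,          # hundredths of a point
}


def sz_to_points(value: int, context: str) -> Fraction:
    """Interpret a numeric sz value as points for the given context."""
    return Fraction(value, SZ_DIVISOR_BY_CONTEXT[context])


for context in SZ_DIVISOR_BY_CONTEXT:
    print(context, float(sz_to_points(24, context)))
```

A parser therefore cannot interpret sz without first knowing which part of which schema it is reading, which is exactly the kind of hidden coupling a coherent unit system would avoid.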
Internal inconsistencies and omissions: ST_Border
Section 2.18.4 page 2414 lists numerous styles such as apples, scaredCat, heebieJeebies, etc. However, the specification does not fully define these styles (e.g., height, width, color depth, and orientation are missing). The style basicThinLine describes behavior for horizontal, vertical and corner scenarios, but many styles (e.g., babyRattle, balloonsHotair) provide no such details. As a result, a single style can be interpreted differently by different vendors or implementors. These styles also provide no generality.
Confusing and inconsistent definitions of lengths of hexadecimal numbers
Ecma 376 is self-contradictory in its description of the lengths of several hexadecimal number types, such as ST_LangCode (2.18.52, page 2531), ST_ShortHexNumber (2.18.86, page 2591), ST_LongHexNumber (2.18.57, page 2542), ST_HexColorRGB (2.18.45, page 2520), ST_Panose (2.18.72, page 2569 and 22.214.171.124, page 4502), ST_UcharHexNumber (2.18.106, page 2620), ST_UnsignedIntHex (3.18.86, page 3712), ST_UnsignedShortHex (3.18.87, page 3713), and ST_HexBinary3 (126.96.36.199, page 4483).
For example, section 2.18.52 page 2531, ST_LangCode, is defined in the text as a "two digit hexadecimal code". But the values given cannot be represented by only two hexadecimal digits; they need four.
One possible interpretation is that by "digit" the spec sometimes means "octet" (i.e., a byte). An octet/byte is equivalent to two hexadecimal digits. If this interpretation is correct, then the specification clearly needs repair, since this is very likely to cause serious confusion among developers trying to implement Ecma 376. However, in other places (such as the definition for ST_LongHexNumber), it notes that 4 octets can store 8 hexadecimal digits (which is correct), so it is not simply a matter of defining "digit" oddly. This problem also suggests a lack of review, since clearly 4-digit values cannot fit in fields where only 2 digits are permitted.
The two types ST_ShortHexNumber and ST_LongHexNumber are described as representing a "Two Digit Hexadecimal Number Value" and a "Four Digit Hexadecimal Number Value", respectively, with their contents restricted to "have a length of exactly 2 characters" and "exactly 4 characters", respectively. However, all of the examples given show them as using 4 and 8 hexadecimal digits, respectively, and the description of ST_LongHexNumber explicitly calls it a "four octet (eight digit) hexadecimal number". When these quantities are used to represent bitmasks (see the section on bitmasks, below), they represent 16-bit and 32-bit quantities, respectively, which require 4 and 8 hexadecimal digits, not 2 and 4. For instance, section 2.4.51 (p. 1211) uses ST_ShortHexNumber to represent numbers as large as 0x0400 (decimal 1024), which requires at least three hexadecimal digits, and shows examples with four digits. Likewise, section 188.8.131.52 (p. 1541) uses ST_LongHexNumber to represent "a four digit hexadecimal encoding of the first 32 bits of the 64-bit code-page bit field": although the description says 4 digits, 8 hexadecimal digits are required to represent a 32-bit quantity, and the examples shown use 8 digits.
Similar problems are found in the definitions of numerous other hexadecimal types:
- ST_HexColorRGB: the description states that the "contents must have a length of exactly 3 characters", but the XML Schema fragment defines it as 3 hexadecimal octets, or 6 hexadecimal digits.
- ST_Panose: the description states that the "contents must have a length of exactly 10 characters", but the XML Schema fragment defines it as 10 hexadecimal octets, or 20 hexadecimal digits. (The XML Schema definition is consistent with the example in 2.18.72; there is no example in 184.108.40.206.)
- ST_UcharHexNumber: the description states both that the "contents must have a length of exactly 1 characters" and that it is "specified as a two digit (one octet) hexadecimal number"; the example and the XML Schema definition both support the latter interpretation.
- ST_UnsignedIntHex: the description states that the "contents must have a length of exactly 4 characters", but the XML Schema fragment defines it as 4 hexadecimal octets, or 8 hexadecimal digits. (ST_UnsignedIntHex is meant to hold a hexadecimal representation of an unsigned integer, implying a length of 4 bytes (8 hexadecimal digits) on all modern architectures.)
- ST_UnsignedShortHex: the description states that the "contents must have a length of exactly 2 characters", but the XML Schema fragment defines it as 2 hexadecimal octets, or 4 hexadecimal digits.
- ST_HexBinary3: the description states that the "contents must have a length of exactly 3 characters", but the XML Schema fragment defines it as 3 hexadecimal octets, or 6 hexadecimal digits. (Separately, this type is used for srgbClr@val and sysClr@lastClr, where in both cases it stores an RGB value. ST_HexColorRGB already exists specifically for this purpose.)
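The digit counts discussed in this section follow from simple arithmetic: one hexadecimal digit encodes 4 bits, so an octet needs two digits. A minimal sketch, with hypothetical helper names:

```python
def hex_digits_for_bits(bits: int) -> int:
    """Hexadecimal digits needed for a field of the given bit width."""
    return (bits + 3) // 4  # ceiling of bits / 4


def min_hex_digits(value: int) -> int:
    """Fewest hexadecimal digits that can represent a non-negative value."""
    return max(1, (value.bit_length() + 3) // 4)


print(hex_digits_for_bits(8))    # one octet
print(hex_digits_for_bits(16))   # a 16-bit bitmask
print(hex_digits_for_bits(32))   # a 32-bit bitmask
print(min_hex_digits(0x0400))    # the value cited from section 2.4.51
print(16 ** 2)                   # distinct values available in two hex digits
```

Read this way, every "N characters" restriction in the descriptions above is off by a factor of two from what the schema fragments and examples require.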
Unspecified terms: plain text
Ecma 376 section 11.3.1 (page 38), "Alternative Format Import Part", allows content in "plain text". The encoding for "plain text" is not specified (is it 7-bit ASCII? ISO 8859-1? UTF-8?). As specified, it will not allow interoperable international use.
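Why the missing encoding matters can be shown with a few bytes. The sketch below decodes the same payload under the three encodings mentioned above; the function name and the sample text are made up for illustration.

```python
def decode_with_candidates(data: bytes,
                           encodings=("ascii", "iso-8859-1", "utf-8")) -> dict:
    """Try each candidate encoding; None marks bytes that cannot be decoded."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# "cafe" with an acute accent on the final letter, stored as UTF-8 bytes.
payload = "caf\u00e9".encode("utf-8")
for encoding, text in decode_with_candidates(payload).items():
    print(encoding, repr(text))
```

The identical bytes are rejected as ASCII, read correctly as UTF-8, and silently turned into different characters as ISO 8859-1, so two conforming implementations could disagree about the content of the same "plain text" part.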
Poor names and inconsistent naming conventions for elements and attributes
Ecma 376 contradicts the goals of XML which are:
- 6. XML documents should be human-legible and reasonably clear.
- 10. Terseness in XML markup is of minimal importance.
Instead, Ecma 376 often uses unclear names and inconsistent naming conventions. These include unnecessary vowel removals, name truncations, and unusual abbreviations.
These bizarre names:
- Reduce readability, making such files, and the tools that process them, unnecessarily difficult to understand.
- Greatly increase the risk of misunderstandings and confusion among developers, increasing the risk of incorrectly generated or processed documents.
- Make the specification unnecessarily hard to understand for developers whose native language is not English, which is inappropriate for an international standard.
There is no benefit to these bizarre names. In particular, they do not reduce the size of the file being saved or transmitted, since the file is compressed first.
Here are some examples:
- In VML (220.127.116.11, page 4413), "outerShdw (Outer Shadow Effect)" has its second word devoid of vowels, and yet its child elements and attributes follow different naming conventions, e.g. scrgbClr, algn, blurRad, dir, dist, rotWithShape.
- In WordprocessingML (18.104.22.168, page 2020), "settings (Document Settings)" has a large list of child elements with contradictory naming conventions, e.g. ActiveWritingStyle, attachedSchema, documentType, docVars, endnotePr, hdrShapeDefaults.
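The claim that abbreviated names save little once a file is compressed can be checked with a rough experiment. The sketch below uses zlib (DEFLATE) as a stand-in for the package compression, which is an assumption, and the spelled-out names (outerShadow, blurRadius, direction, distance) are hypothetical alternatives invented for the comparison; the attribute values are synthetic.

```python
import zlib
from typing import Tuple


def build_document(element: str, attributes: Tuple[str, str, str],
                   count: int = 500) -> bytes:
    """Build a synthetic XML-like body that repeats one element many times."""
    a, b, c = attributes
    lines = [
        f'<{element} {a}="{i}" {b}="{i * 7 % 360}" {c}="{i % 3}"/>'
        for i in range(count)
    ]
    return "\n".join(lines).encode("utf-8")


terse = build_document("outerShdw", ("blurRad", "dir", "dist"))
verbose = build_document("outerShadow", ("blurRadius", "direction", "distance"))

raw_overhead = len(verbose) - len(terse)
compressed_overhead = len(zlib.compress(verbose)) - len(zlib.compress(terse))

print("extra bytes before compression:", raw_overhead)
print("extra bytes after compression: ", compressed_overhead)
```

Because a repeated name is stored as a short back-reference after its first occurrence, the readable spelling costs far fewer bytes after compression than before it; the exact figures depend on the data and the compressor.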
Inconsistent and inflexible notation for percentages
Ecma 376 uses four inconsistent notations for percentage units, at least one of which is particularly inflexible:
- Section 2.18.85 (p. 2583) uses predefined symbols (like "pct15" for 15%) in 5 or 2.5 percent increments (which is inflexible and difficult to process with standard XML tools, compared to a generic number-valued field)
- Section 22.214.171.124 (p. 2053) uses a decimal number giving the percentage
- Section 2.18.97 (p. 2608) uses a number in 50ths of a percent
- Section 126.96.36.199 (p. 4505) uses a number in 1000ths of a percent
In contrast, for example, the W3C SVG and W3C CSS standards both consistently use a single notation—decimal percentages followed by the "%" symbol—as described in section 7.10 of the W3C SVG 1.1 specification and section 4.3.3 of the CSS 2.1 specification.
- Compatibility Note
- There is no need for this inconsistency or inflexibility to achieve compatibility with pre-existing Microsoft Office documents. A generic decimal percentage would suffice to express all of the above numeric values, while being much easier to process with existing XML tools. The precision with which a number is expressed in the file can easily be independent of the precision with which it is implemented (e.g. a particular implementation may be limited to distinguishing 5-percent increments, and could achieve this by rounding the percentage internally as needed).
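How the four notations listed above reduce to one generic decimal percentage can be sketched as follows. The function names are hypothetical, and the symbol parser handles only whole-number symbols of the "pct15" form shown in the text.

```python
from fractions import Fraction


def pct_from_symbol(symbol: str) -> Fraction:
    """Predefined symbol such as 'pct15' (whole-number symbols only)."""
    if not symbol.startswith("pct"):
        raise ValueError(f"not a percentage symbol: {symbol!r}")
    return Fraction(int(symbol[3:]))


def pct_from_decimal(text: str) -> Fraction:
    """Plain decimal number giving the percentage directly."""
    return Fraction(text)


def pct_from_fiftieths(value: int) -> Fraction:
    """Number expressed in 50ths of a percent."""
    return Fraction(value, 50)


def pct_from_thousandths(value: int) -> Fraction:
    """Number expressed in 1000ths of a percent."""
    return Fraction(value, 1000)


def to_css_percentage(percent: Fraction) -> str:
    """Render in the single CSS/SVG style notation: decimal number plus '%'."""
    return f"{float(percent):g}%"


same_value = [
    pct_from_symbol("pct15"),
    pct_from_decimal("15"),
    pct_from_fiftieths(750),
    pct_from_thousandths(15000),
]
print([to_css_percentage(p) for p in same_value])
```

Four parsers are needed to read what a single decimal notation would express directly, and nothing in the numeric forms requires more than a generic number-valued field.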
Inappropriate non-document settings (application settings)
Ecma 376 section 188.8.131.52 "doNotLeaveBackslashAlone" (page 2180) states: "This element specifies whether applications should automatically convert the backslash character into the yen character when it is added through user keyboard input". This is an application setting, not a document setting.
Section 184.108.40.206 (p. 2084) defines a "doNotUseLongFileNames" attribute to "ensure that the file names for all files generated when saving this document as a web page do not exceed eight characters with a three character extension." Similarly, section 220.127.116.11 (p. 2083, "doNotSaveAsSingleFile") also controls how the document is exported to HTML file(s). Again, these kinds of options regarding how files are exported into different formats are arguably application settings, not document settings, especially since other aspects of the HTML-export process (or export to many other formats) are not covered by Ecma 376.
Inappropriate user-interface specifications: Clip Art
In several places, Ecma 376 specifies information that is better left to the user interface of the implementation, rather than being defined by the file format.
Perhaps most egregious is Ecma 376's inclusion of long lists of images (clip art) that can be employed in the document. Instead of specifying fixed lists of images, it would be much better to simply allow the use of an arbitrary embedded image in W3C SVG or another format, with any presupplied images determined only by the implementation. Examples of lists of fixed images and text paths in Ecma 376 include sections 2.18.4 (p. 2414, "Border Styles"), 18.104.22.168 (p. 4557, "Preset Shape Types"), and 22.214.171.124 (p. 4645, "Preset Text Shape Types"). In all, this constitutes over 100 pages of required clip art.
Non-XML formatting codes
Section 126.96.36.199 (page 2355), "XE" (full name not defined), defines "\b" and "\i" switches at the end of a line to make the preceding text bold and italic, which is contrary to XML syntax. The same applies to other sections in 2.16.5, such as 188.8.131.52–184.108.40.206 (p. 2353–2354), which define "\* Caps", "\* FirstCap", "\* Lower", and "\* Upper" to format the capitalization of preceding text.
- Compatibility Note
- There is no need to implement a non-XML formatting syntax in order to import data from legacy programs, or even to present a legacy non-XML syntax as a user interface. The implementation can simply convert between syntaxes as needed when saving or loading a file.
Mismatched detailed description
The text that describes USERINITIALS, section 220.127.116.11 (p. 2353–2354), instead discusses USERNAME.
Inflexible numbering format
Section 2.18.66 (page 2554) defines ST_NumberFormat, the numbering format used for numbered lists (2.9.18, page 1581), footnotes (2.11.17, page 1645), endnotes (2.11.18, page 1646), captions (18.104.22.168, page 1912), and page numbers (2.6.12, page 1412). It has the following problems:
- Fixed to a few countries. Many regions are not included.
- Contradicts W3C XSLT which ISO 26300 uses.
- Contradicts Unicode ISO 10646.
Uses a Microsoft-specific namespace
Section 22.214.171.124 page 5197 Attribute "href" (Hyperlink Target) uses a Namespace "urn:schemas.microsoft.com:office:office". An ISO standard must not reference company-specific namespaces.
- Compatibility Note
- Ecma 376 gives the rationale (section 6.1, p. 5126) that "all VML namespaces defined in this specification maintain the legacy namespace structure already used by millions of documents" and "VML should be considered a deprecated format included in Office Open XML for legacy reasons only and new applications that need a file format for drawings are strongly encouraged to use preferentially DrawingML". This merely raises the question of why VML is included at all, rather than simply using DrawingML and converting legacy documents as needed, or better yet using an existing standard file format such as W3C SVG as described above.
Ecma 376 redefines standard color values
Ecma 376 section 2.18.46 (page 2521) assigns hexadecimal RGB values to color names that contradict the values given for the same names in the standard SVG Color Keyword Names.
Independent of Ecma 376's failure to adopt the SVG standard, its subtle redefinition of existing standardized terms will only lead to further confusion.
- Compatibility Note
- There is no need to redefine color names in order to achieve compatibility with existing Microsoft Office documents. Microsoft is free to use whatever color names it wishes on its application interface, and store the hexadecimal color value in the file.
In contrast, section 126.96.36.199 (p. 4531) "ST_PresetColorVal" (Preset Color Value) matches SVG colors well. Unfortunately, it renames "darkGray" to "dkGray" to avoid self-contradiction at the cost of reducing agreement with the SVG standard.
- Blog: MSOOXML contradicts W3C SVG Colour definitions
- Blog: MSOOXML's disregard for existing standards.
Nonstandard, inflexible paper-size naming
Sections 188.8.131.52 (page 2770) and 184.108.40.206 (page 2774), both of which involve printer settings, define a "paperSize" attribute whose value is an integer representing one of 68 fixed paper sizes. These paper-size codes are apparently based on corresponding paper-size registry codes in Microsoft Windows, rather than using the standard paper-size names as defined in ISO 216, ANSI Y14.1, and similar standards. In contrast, ISO 26300 OpenDocument employs a much more flexible scheme: it simply describes the paper size by recording the physical width and height of the page, leaving the assignment of symbolic paper-size names to the user interface.
Sections of Ecma 376 were not properly submitted to ISO
Ecma 376 improperly incorporates by reference sections that were not included in Ecma's submission to JTC1, and were only available electronically on the Ecma web site in a format not permissible for JTC1 review.
In particular, pages 191 and 263 list an "Annex D" which contains "normative definitions" of XML schema "distributed in electronic form only" and which are the "definitive version" if "discrepancies exist between the electronic version of a schema and its corresponding representation as published in this part". These schemas can be found only in the file "OpenPackagingConventions-XMLSchema.zip" on the Ecma 376 web page, which was reportedly not a part of Ecma's official JTC1 submission, and is in a format (a zip-compressed DrawingML XML file) that is not one of the allowable formats for JTC1 review. The "normative" definitions of Annex D are referenced by Ecma 376 in sections 3.8.7 (p. 2101, "Cell Style"), 3.8.40 (p. 2931, "Table Style"), 220.127.116.11 (p. 4557, "Preset Shape Types"), and 18.104.22.168 (p. 4645, "Preset Text Shape Types").
Ecma 376 uses bitmasks, inhibiting extensibility and use of standard XML tools
The boolean (yes/no) type in most programming languages, such as C and C++, or other types used for the same purpose (such as "char" in ISO/IEC 9899:1990), corresponds to at least a single byte (8 bits) on all modern systems. In memory-constrained situations (as were common in the past), however, the inefficiency of using an 8-bit type to store 1 bit of information was a problem.
A bitmask is a technique to encode multiple values inside a single variable, by assigning a meaning to each individual bit of the variable. For example, the binary 10110001 (decimal 177) would mean Yes/No/Yes/Yes/No/No/No/Yes and contain the answers to 8 different yes/no questions.
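The example above can be reproduced with a short Python sketch. The function names are hypothetical, and the most-significant-bit-first ordering is simply the reading order used in the example; it is not a claim about how any particular Ecma 376 attribute orders its bits.

```python
def decode_bitmask(value: int, width: int = 8) -> list[bool]:
    """Return the yes/no answers stored in value, most significant bit first."""
    return [bool((value >> shift) & 1) for shift in range(width - 1, -1, -1)]


def encode_bitmask(answers: list[bool]) -> int:
    """Pack a list of yes/no answers into one integer, first answer highest."""
    result = 0
    for answer in answers:
        result = (result << 1) | int(answer)
    return result
```

Calling decode_bitmask(177) yields the eight answers Yes/No/Yes/Yes/No/No/No/Yes as booleans, and encode_bitmask reverses the process. Note that both directions depend on shift and mask operations on the integer value.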
Bitmasks in Ecma 376
Many element attributes in Ecma 376 are defined as bitmasks. For example:
Ecma 376 section 22.214.171.124 (page 1541) "sig (Supported Unicode Subranges and Code Pages)" describes the <w:sig> element whose attributes are all bitmasks. For example, take the attribute csb1:
"Specifies a four digit hexadecimal encoding of the upper 32 bits of the 64-bit code-page bit field that identifies which specific character sets or code pages are supported by the parent font"
This attribute takes the following values:
|0-15||Reserved for OEM||24||IBM Turkish|
|16||IBM Greek||25||IBM Cyrillic|
|17||MS-DOS Russian||26||Latin 2|
|18||MS-DOS Nordic||27||MS-DOS Baltic|
|19||Arabic||28||Greek (former 437G)|
|20||MS-DOS Canadian French||29||Arabic (ASMO 708)|
The other attributes of <w:sig> have similar definitions as bitmasks.
Many other element attributes in Ecma 376 have similar definitions as bitmasks. For example:
- Section 126.96.36.199, Paragraph conditional formatting (page 842).
- Section 2.4.7, Table cell conditional formatting (page 1085).
- Section 2.4.8, Table row conditional formatting (page 1087).
- Section 2.4.51, Table style conditional formatting settings (page 1211).
- Section 2.4.52, Table style conditional formatting settings exceptions (page 1213)
- Section 188.8.131.52, Suggested filtering for list of document styles (page 2034)
- Section 184.108.40.206, Suggested sorting for list of document styles (page 2036)
- Section 220.127.116.11, tableproperties attribute of shape group (page 5227)
Bitmasks are not extensible
The bitmasks specified by Ecma 376 are mostly of fixed length (a fixed number of bits). For example, the bitmasks used in sections 2.4.51, 2.4.52, 18.104.22.168, and 22.214.171.124 are all of type ST_ShortHexNumber (2.18.86, p. 2591), which is defined as consisting of exactly 4 hexadecimal digits (16 bits, see above regarding conflicting definitions). The bitmasks in section 126.96.36.199 are of type ST_LongHexNumber (2.18.57, p. 2542) which is defined as consisting of exactly 8 hexadecimal digits (32 bits, see above regarding conflicting definitions). The bitmasks in sections 188.8.131.52, 2.4.7, and 2.4.8 are of type ST_Cnf (2.18.11, p. 2478), which is defined as consisting of exactly 12 binary digits (12 bits). The bitmask in section 184.108.40.206 (p. 5227) consists of exactly "three bits".
Because it is not possible to add new bits to a fixed-length bitmask, extensibility is extremely limited.
Also, bitmasks require that some other data be encoded into numbers to be used in the bitmasks. For example, see the language encodings discussed earlier: every language must be assigned an arbitrary numeric code before it can be used. Keeping this mapping up-to-date requires constant maintenance by some body. If not carefully handled, a single vendor could end up having de facto control over this mapping, and as a result that vendor could determine what could be done or not by the format (by refusing to assign mappings useful to a competitor).
- Compatibility Note
- XML formats have no need for bitmasks. XML provides much richer structures, and the original benefits of bitmasks do not apply to XML formats. The theoretical memory saving is irrelevant if you are encoding the number in ASCII and surrounding it by text tags. In addition, Ecma 376 documents (like ISO 26300) are compressed anyway.
Bitmasks cause significant validation problems
Using bitmasks creates a new data model, separate from the XML data model. In particular, the bitmask cannot be described in or validated by XML Schema, Relax NG, Schematron or any standard XML schema language or current validator.
Bitmasks defeat XSLT manipulation
XSLT is the W3C standard for manipulating and converting XML documents, and is by far the most popular tool for working with XML. XSLT provides no bitwise operators, since bitmasks are not part of the XML data model.
Bitmasks conflict with the Ecma TC45 charter
The TC45 is the Ecma Technical Committee charged with developing the Ecma 376 specification. The charter of the TC45 includes the specific goal of:
"...enabling the implementation of the Office Open XML Formats by a wide set of tools and platforms in order to foster interoperability across office productivity applications and with line-of-business systems"
Since bitmasks cannot be implemented in any of the standard tools for XML data formats, their use is in conflict with the TC45's charter.
Bitmasks in Ecma 376 are not internally consistent
The formats used to describe bitmasks are not internally consistent within Ecma 376.
- Several bitmasks (sections 220.127.116.11, 2.4.7, and 2.4.8) use ST_Cnf (2.18.11 p. 2478), expressed as a string of 12 binary digits.
- Several bitmasks (sections 2.4.51, 2.4.52, 18.104.22.168, and 22.214.171.124) use ST_ShortHexNumber (2.18.86 p. 2591), expressed as a string of 4 hexadecimal digits (although the specification contradicts itself: it says that this "type's contents must have a length of exactly 2 characters", but all the examples shown have 4 characters).
- Section 126.96.36.199 uses ST_LongHexNumber (2.18.57, p. 2542), expressed as a string of 8 hexadecimal digits (although the specification contradicts itself: it says that this "type's contents must have a length of exactly 4 characters" and describes it as a "Four Digit Hexadecimal Number Value" but also says it is a "four octet (eight digit) hexadecimal number" and all of the examples shown have 8 digits).
- At least one bitmask, in section 188.8.131.52 (p. 5227), uses an unspecified "string" format "represented as an integer", with a "decimal" number given as an example.
This internal inconsistency in not only the length, but the format (binary, hexadecimal, or decimal) of bitmasks makes uniform processing of Ecma 376 bitmasks even more difficult, in addition to the problems mentioned above.
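As a rough illustration of the extra work this causes, the Python sketch below shows that a consumer needs a separate parsing rule for each of the four representations listed above before it can even begin to inspect individual bits. The format labels ("binary12", "hex4", "hex8", "decimal") and the function name are hypothetical; the digit counts follow the examples in the specification as described above, not its contradictory length statements.

```python
# Hypothetical labels mapped to (numeric base, required number of digits).
FIXED_WIDTH_FORMATS = {
    "binary12": (2, 12),   # ST_Cnf: 12 binary digits
    "hex4": (16, 4),       # ST_ShortHexNumber: 4 hexadecimal digits
    "hex8": (16, 8),       # ST_LongHexNumber: 8 hexadecimal digits
}


def parse_bitmask(text: str, fmt: str) -> int:
    """Parse a bitmask attribute value into an integer, according to fmt."""
    if fmt == "decimal":
        # The "string represented as an integer" case, with no fixed width.
        return int(text, 10)
    base, length = FIXED_WIDTH_FORMATS[fmt]
    if len(text) != length:
        raise ValueError(f"{fmt} value must have exactly {length} digits: {text!r}")
    return int(text, base)
```

The same set of flags therefore has up to four different textual spellings, and the caller must know from context which one applies to a given attribute.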
Field codes, formulas, and relationships
Field codes (such as INCLUDEPICTURE and INCLUDETEXT) and formulas are neither cross-platform nor cross-machine compatible.
Field codes and formulas do not specify a single path separator: forwards and backwards slashes are permissible. This may render content referred to by field codes and formulas inaccessible when documents are opened on a system that uses a different operating system than the creating system did. This may happen, for instance, when:
- A Windows user saves a master document and linked files to a network share. The master document is then opened on a Linux system from that share.
- A Windows user e-mails the master document (AND all linked files) to a Linux user, who then saves them to disk and attempts to open the master document.
Broken fields and formulas may result. It may be impossible to render faithfully on the consuming system.
Behavior regarding the use of forward and backward slashes is inconsistent. Unix and its clones indicate directories with a forward slash; Windows, with a backslash. Furthermore, on certain operating systems, the symbol that is not in use as a path separator is also NOT reserved: it may be used in valid filenames. Because of this, the onus is on the consuming application to determine what exactly these symbols mean. (For instance, does the backslash in "the\file.docx" refer to the "file.docx" in the directory "the", or is it part of the name of a file in the current directory?) Ideally, the specification would use ONE path separator. This would simplify interpretation of paths: only producers and consumers on systems that use a different path separator would have to perform any conversion, and this conversion would be far more certain.
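The ambiguity of "the\file.docx" can be demonstrated with Python's standard pathlib module, which models Windows and POSIX path rules separately. This is only an illustration of the two operating-system conventions; the helper name is hypothetical and nothing here is taken from Ecma 376 itself.

```python
from pathlib import PurePosixPath, PureWindowsPath


def path_components(raw: str) -> dict[str, tuple[str, ...]]:
    """Split the same raw string under Windows rules and under POSIX rules."""
    return {
        "windows": PureWindowsPath(raw).parts,
        "posix": PurePosixPath(raw).parts,
    }
```

Under Windows rules, path_components("the\\file.docx") yields the two components "the" and "file.docx"; under POSIX rules the whole string is a single filename that happens to contain a backslash. A consuming application has no way to tell from the string alone which reading the producer intended.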
Applications are free to encode paths as they please in field codes and formulas. Both relative and absolute paths may be used. Both types of paths have their uses. However, the specification leaves it up to the producing application to determine whether absolute or relative paths are used. The result is twofold:
- First, consuming applications may not be able to interpret the paths correctly (Microsoft Word, for instance, does not understand relative paths.)
- Second, encoding filenames and locations as absolute paths makes files unportable (they cannot be transported to another machine, let alone be opened remotely or moved to a different directory.)
In both cases--whether the consuming application cannot interpret the path, or when the path is invalid due to changes in directory structure--the field codes and formulas break, thus breaking the master document. For instance, if the user creates c:\dir\master.docx and adds an INCLUDETEXT field code to a file slave.docx (that resides in the same directory), but then renames dir to folder or, alternately, moves these files to a Linux-based system (with no drive letters), the document will break. The included text will not display. The only way to work around this is for the consuming application to heuristically search for the files referred to by the field codes and formulas.
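The renaming scenario can be sketched as follows, assuming a hypothetical consumer that resolves relative targets against the directory of the master document (a behavior the specification does not mandate). The function name is invented for this example, and Windows path rules are used because the example path has a drive letter.

```python
from pathlib import PureWindowsPath


def resolve_target(master: str, target: str) -> PureWindowsPath:
    """Resolve a field-code target against the master document's directory.

    Absolute targets are returned unchanged; relative targets are joined
    to the directory that currently contains the master document.
    """
    target_path = PureWindowsPath(target)
    if target_path.is_absolute():
        return target_path
    return PureWindowsPath(master).parent / target_path
```

After "dir" is renamed to "folder", a relative target "slave.docx" still resolves next to the master document, whereas a stored absolute target "c:\dir\slave.docx" continues to point at the old, no longer existing location.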
Deviation from rels structure
Finally, it should be noted that field codes and formulas constitute an exception from the rels structure defined by the specification. The aim of the rels structure is to spare consumers the need to iterate through all packaged files, scanning for links. Instead, they need only look in rels, which serves as a central repository with the details of all internal and external relationships. Excluding field codes and formulas from this structure defeats its entire purpose, as applications will nevertheless be enjoined to scan through every part of the package for relationships not mentioned in rels.
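The difference between the two lookup strategies can be sketched in Python. This is a simplified illustration under stated assumptions: relationship parts are taken to be the ZIP entries whose names end in ".rels", the namespace URI is the Open Packaging Conventions relationships namespace as commonly documented, and the field code is modeled as plain text inside a part rather than in its real WordprocessingML markup. The function names and the demo package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed relationships namespace (Open Packaging Conventions).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(package: zipfile.ZipFile) -> list[str]:
    """Collect external targets by reading only the .rels parts."""
    targets = []
    for name in package.namelist():
        if not name.endswith(".rels"):
            continue
        root = ET.fromstring(package.read(name))
        for rel in root.iter(RELS_NS + "Relationship"):
            if rel.get("TargetMode") == "External":
                targets.append(rel.get("Target"))
    return targets


def parts_mentioning(package: zipfile.ZipFile, keyword: bytes) -> list[str]:
    """Brute-force scan of every part for a field-code keyword."""
    return [name for name in package.namelist() if keyword in package.read(name)]


def build_demo_package() -> io.BytesIO:
    """Build a tiny in-memory package for demonstration (not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(
            "word/_rels/document.xml.rels",
            '<Relationships xmlns="http://schemas.openxmlformats.org/'
            'package/2006/relationships">'
            '<Relationship Id="rId1" Type="urn:example:link" '
            'Target="http://example.com/" TargetMode="External"/>'
            "</Relationships>",
        )
        package.writestr("word/document.xml", '<doc>INCLUDETEXT "slave.docx"</doc>')
    return buffer
```

In the demo package, external_targets finds only the hyperlink declared in the relationship part; the dependency on "slave.docx" is invisible to it and turns up only when parts_mentioning scans every part, which is exactly the full scan the rels structure was meant to avoid.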
Ecma 376 relies on undisclosed information
Undisclosed proprietary specifications
Section 184.108.40.206 "Embedded Object Alternate Image Requests Types" (page 5679) requires implementors to support the proprietary Windows Metafiles.
Section 220.127.116.11 (page 4077) defines a "quicktimeFile" element that "specifies the existence of a QuickTime file" which can be played as specified "within the timing node list". QuickTime is a multimedia container file format originating with Apple Computer, which was the basis of ISO/IEC 14496-14:2003. Not only does the Ecma 376 standard not specify a version of QuickTime that is to be supported, but it does not specify which of the many audio and video encoding formats ("codecs") that can be found in QuickTime containers are to be supported. Many of these codecs, such as the Sorenson Video codecs, are undocumented proprietary formats that may not be implementable by an independent software application.
Cloning the behaviour of proprietary applications
Several sections require the implementor to clone the behaviour of a proprietary product, where the behaviour to clone is not specified. For example:
- Section 18.104.22.168 page 2161, autoSpaceLikeWord95.
- Section 22.214.171.124 page 2199, footnoteLayoutLikeWW8.
- Section 126.96.36.199 page 2209, lineWrapLikeWord6.
- Section 188.8.131.52 page 2210, mwSmallCaps.
- Section 184.108.40.206 page 2225, shapeLayoutLikeWW8.
- Section 220.127.116.11 page 2245, suppressTopSpacingWP.
- Section 18.104.22.168 page 2250, truncateFontHeightsLikeWP6.
- Section 22.214.171.124 page 2252, uiCompat97To2003.
- Section 126.96.36.199 page 2264, useWord2002TableStyleRules.
- Section 188.8.131.52 page 2265, useWord97LineBreakRules.
- Section 184.108.40.206 page 2266, wpJustification.
- Section 220.127.116.11 page 2268, wpSpaceWidth.
More can be found by searching Ecma 376 for the word "Guidance".
Specifications that say "clone this product," instead of explicitly stating what behavior is required, have no place in an international standard. It may also be illegal in some jurisdictions to determine what such a non-specification means, as discussed below regarding end-user license agreements (EULAs).
- Compatibility Note
- Attributes like these have no place in an international standard, and are not needed for compatibility with existing documents. The correct way to achieve compatibility is through generic tags. For example:
- autoSpaceLikeWord95 should be replaced by a generic character-spacing attribute that takes a numeric value or set of numeric values.
- wpSpaceWidth should be replaced by a generic space-width tag that takes a numeric value or set of numeric values.
- Even attributes as obscure as lineWrapLikeWord6 can be generalized into a line-wrap-style attribute. Using a more general solution offers far more extensibility and flexibility.
Relies on application-defined behaviors
Ecma 376 often relies on "application-defined" behaviors to support important functionality that should be documented or supported via existing standards. The reliance upon application-defined formats inhibits the goal of interoperability and prevents the exchange of valuable information contained within a document.
- Section 18.104.22.168 p. 5436 defines the "equationxml" attribute of "shape" elements, "used to rehydrate an equation using the Office Open XML Math syntax". This information is apparently intended to allow mathematical equations in drawings to be edited and interpreted based on their underlying mathematical structure rather than as simple graphical objects, a critically important feature for users of equations in illustrations and presentations. However, the "actual format of the contents of this attribute are application-defined", which makes them impossible to exchange between applications. (Even though "they shall contain Office Open XML Math", this could be arbitrarily and unnecessarily obfuscated by the presence of other application-specific information, application-specific encodings, and so on.)
- Section 22.214.171.124 p. 5438 defines a "gfxdata" attribute for the "shape" elements, which "contains DrawingML content" that is "base-64 encoded". However, the "contents of this package are application-defined", so even though they "shall use the Parts defined by this Standard whenever possible" there is not sufficient information for an independent implementation to read this data or display the "DrawingML content" contained therein. (The stated rationale for this attribute is to allow "VML to represent graphical content while still persisting DrawingML for consuming applications that support DrawingML" — but this only highlights the duplicative nature of Ecma 376, which defines two new vector-graphics XML formats, VML and DrawingML, instead of using a single standard one such as W3C SVG.)
- Section 126.96.36.199 on p. 5596 defines an "ink" element which stores "ink annotations in an application-defined format." This is apparently intended to store Microsoft Ink annotations, used with tablet input devices to add hand-written annotations to documents. These annotations are often a vital part of documents and their specification is undefined in Ecma 376. Moreover, the use of unspecified formats is entirely unnecessary, as the W3C PNG specification could be used for transparent raster image data and the W3C SVG specification could be used for vector or mixed vector/raster data. Microsoft, in contrast, reports that it uses one of two proprietary formats for Ink content: an Ink Serialized Format (ISF) encoding the user's pen-stroke information as well as other metadata (using an undocumented compressed format), as well as a "fortified" GIF format including ISF meta-data.
- Numerous elements are not required by the standard, but if omitted lead to "application-defined" default behaviors. This is a completely unnecessary barrier to interchange between applications (causing the same document with "default" styles to appear completely different in two conforming programs), when the defaults could simply be defined in the standard. For example, section 2.7.4 (p. 1482) defines elements to specify default paragraph and run properties (docDefaults, pPr, pPrDefault, rPr, and rPrDefault); if these are omitted "the defaults are therefore application-defined". Similarly, section 188.8.131.52 (p. 2280) defines a date-and-time formatting switch whose absence means that "a date or time result is formatted in an implementation-defined manner."
Relies on unspecified multimedia file formats
In several places, Ecma 376 specifies ways in which a document can incorporate external graphics, audio, and video files, without specifying even a minimal set of file formats that should be supported. This immediately creates a barrier to interoperability, because there is no reason to expect that different implementations of Ecma 376 will support the same multimedia file types.
- Section 184.108.40.206 (page 2320) defines an "INCLUDEPICTURE" field that "retrieves the picture contained in" a named document. However, it does not specify what "picture" formats should be supported, despite the fact that there are many standard graphics formats that could be reasonably supported (such as W3C PNG, ISO 10918-1 JPEG, or W3C SVG).
- Section 220.127.116.11 (page 4075) defines an "audioFile" element that "specifies the existence of an audio file" that can be played as specified "within the timing node list". However, it does not specify what "audio" formats should be supported (as opposed to specifying ISO/IEC 11172-3 "MP3" files, and/or some other well-documented formats such as Ogg Vorbis or FLAC).
- Section 18.104.22.168 (page 4079) defines a "videoFile" element that "specifies the existence of a video file" which can be played as specified "within the timing node list". However, it does not specify what "video" formats should be supported (as opposed to specifying ISO/IEC 14496 MPEG-4 and/or some other documented standard formats).
Ironically, Ecma's Open XML White Paper touts Ecma 376's "independence from any particular type of source content" as a "requirement" for "interoperability": "OpenXML contains no restriction on image, audio or video types. For example, images can be in GIF, PNG, TIFF, PICT, JPEG or any other image type." The white paper does not say how different implementations can possibly interoperate fully if this would require them to support "any" conceivable image, audio, or video type.
Ecma 376 cannot be adequately evaluated within the 30-day evaluation period
At over 6,000 pages, the Ecma 376 specification is 10 times larger than the ISO/IEC 26300 OpenDocument specification (at least in part because it fails to reuse many pre-existing standards). It is not possible to review over 200 pages per day with any hope of finding all the major problems in the specification.
The Ecma 376 specification lacks any pre-existing industry review with the exception of this document:
- Ecma 376 was prepared over-hastily, with a mean page review/edit/approve rate of more than 18 pages per day, approximately 20 times faster than other markup standards.
- Insufficient time was available for review of the enormous specification; it was finalized by Ecma on December 7 and submitted to JTC-1 less than 30 days later.
- When submitted to JTC-1, there were no full-featured reference applications available for testing and evaluation purposes.
- Ecma 376 was developed behind closed doors, severely limiting external review during its development cycle and making it unlikely to be the result of an industry consensus.
- The work plan of the Ecma technical committee that developed Ecma 376 specifically required compatibility with pre-existing proprietary file formats of a single vendor (Microsoft) that are incorporated by reference but whose specifications are not available. This restriction, the unavailability of the specifications for those formats, and the lack of suitable reference applications block review and evaluation of Ecma 376's success in achieving its core goal of compatibility with those legacy binary file formats.
- No reference applications that implement even a majority of the features of Ecma 376 were available for testing and evaluation purposes at the commencement of the period for JTC-1 review, nor are they available to the present day.
In spite of the short time available and the other constraints, this document preparation process has already found many problematic aspects of Ecma 376. This review is far from comprehensive. A comprehensive review less constrained by a short review period will undoubtedly uncover many more flaws.
Ecma 376 has not met the stability requirement
ISO/IEC JTC 1 Directives, Edition 5, Version 2.0 states that in relation to PAS submissions: "The specification shall have had sufficient review over an extended time period to characterise it as being stable." (JTC1 Directives, Annex M The Transposition of Publicly Available Specifications into International Standards - A Management Guide, M.22.214.171.124)
Since the specification was submitted for fast-track resolution almost immediately after its development, and its development was behind closed doors, this requirement has not been met.
Ecma 376 cannot be fully implemented by other vendors
For a variety of reasons, noted below, it is not reasonable for vendors other than Microsoft to fully implement Ecma 376.
Ecma 376 requires implementation of undisclosed specifications
See the section Ecma 376 relies on undisclosed information above.
The "compatibility with legacy formats" can only be implemented by Microsoft
- As indicated above, Ecma 376 requires implementors to emulate the behaviour of previous Microsoft products. As the behaviour is not specified, and the products are proprietary, only Microsoft can implement those portions of the specification.
- As indicated above, Ecma 376 requires implementors to support Windows Metafiles instead of ISO 8632 or W3C SVG. As Windows Metafiles are a proprietary technology, only Microsoft can implement this portion of the specification reliably.
- Ecma 376 section 11.3.1 "Alternative Format Import Part" allows implementations to insert content in alternate file formats such as RTF. RTF is a Microsoft proprietary format. Microsoft can support old binary documents simply by embedding the RTF content. But other implementors cannot reliably support those documents because the specification for RTF is not included in Ecma 376.
Patent rights to implement the Ecma 376 specification have not been granted
Read literally, the intellectual property ("IP") documents accompanying Ecma 376 grant no rights for vendors other than Microsoft Corp. to implement the specification. Even ignoring that problem, the IP documents are at best ambiguous as to the extent of rights granted and convey no rights for any other version of the proposed standard such as an improved version reflecting JTC-1 criticism. A single vendor in effect retains veto rights over any changes to the specification. Such defects render Ecma 376 unsuitable as an international standard candidate.
The bottom line is that the relevant intellectual property documents present legal quicksand of a depth that could only be determined through litigation. They are an unsuitable legal foundation for an international standard.
Rights to implement Ecma 376 are governed by two Microsoft Corp. covenants not to sue, the Microsoft Open Specification Promise ("OSP") and an earlier Microsoft Covenant Regarding Office 2003 XML Reference Schemas ("CNS"). See Microsoft Open Specification Promise page: "We are giving potential implementers of Ecma Office Open XML the ability to take advantage of either the CNS or the OSP, at their choice."
The Microsoft covenants not to sue grant no rights
Both the OSP and the CNS are worded using what are for practical purposes identical grammatical constructs that as far as we can tell grant no rights whatsoever whilst leaving the superficial appearance of a grant of rights. In the OSP, Microsoft states that the rights granted are for "patents that are necessary to implement [the specification]." In the CNS, the rights granted are for "patent claims necessary to conform to the technical specifications[.]" It would make equal sense to say, "apples necessary to conform to the technical specifications." The problem is that no patents or patent claims are necessary to implement or conform to a software specification and the rights thus granted consist of an empty set.
Software is written in code and implemented using methods and concepts. Software is not written or implemented in patents or in patent claims. An implementation of a software specification can fully conform to that specification regardless of whether or not patents are thereby infringed.
A patent is a legal instrument analogous to a deed of ownership for real property. Patent claims are analogous to the description of real property in a deed. But neither a deed nor its property description is what is actually owned; a deed is a legal instrument, not the property owned, which is on a separate plane of existence. The property identified in a deed's description may have a real house or tree upon it; the deed does not. Just so, software may employ methods and concepts described in a patent's claims, but the patent claims are not the methods and concepts described therein. The patent claims are only a description of those methods and concepts. The methods and concepts described in patent claims may be necessary to implement or conform to a specification; however, their mere description in the patent claims is not "necessary" to the implementation of the specification. The patent claims and the methods and concepts exist on separate planes.
Therefore, the enabling language of the OSP and the CNS, read literally, describes empty sets of rights expressly granted. Recognizing the unfairness of misleading language in such documents, courts will often remedy an otherwise unjust result by implying corrective language from established norms or industry practices or by recognizing a right by way of a waiver or estoppel. But both the OSP and the CNS conclude with a sentence stating:
"[n]o other rights except those expressly stated in this promise [respectively, covenant] shall be deemed granted, waived or received by implication, or estoppel, or otherwise."
Because the rights "expressly stated" are an empty set, the sentence just quoted has the effect of blocking any judicial attempt to prevent an unjust result. Therefore, a court would most likely be forced to rest a waiver or estoppel on Microsoft's public statements about the openness of Ecma 376 rather than on the IP documents themselves.
Moreover, the problem with the grammatical construct was brought to Microsoft's attention in a critique of the CNS. See also footnote accompanying the linked text. That the same grammatical construct was nonetheless carried over to the later OSP is therefore cause for concern.
The fact that the relevant IP documents, read literally, appear to grant no rights whatsoever provides an unacceptable legal foundation for an international standard. The grant of rights to implement Ecma 376 should be made explicit. Ecma 376 should be diverted from fast-track processing so that revised legal documents, if any are forthcoming, can be more carefully reviewed.
Microsoft intellectual property documents are ambiguous
The apparent lack of a grant of any rights whatsoever is not the only serious defect. Ambiguities in the language are compounded by conflicting language and ambiguities in the specification itself in regard to conformance. However, as drafted, neither the OSP nor the CNS allows resort to a separate document such as the Ecma 376 specification as a source of rights, because both IP documents prohibit the recognition of rights that are not expressly stated in the IP documents themselves. These are issues that can only be resolved by Microsoft.
Both the OSP and the CNS extend patent protection only to conformant implementations of the specification, with further ambiguous qualifiers discussed in later sections below. See OSP ("to the extent it conforms to a Covered Specification"); CNS ("claims necessary to conform to the technical specifications ... against those conforming parts of software products"). However, neither document defines the variants of the word conform that they use, apparently leaving that to be defined in the specification itself.
The documents should be redrafted to indicate unmistakably where the relevant definitions of conformance can be found, particularly given both covenants' inclusion of a sentence forbidding the implication of any rights not expressly stated in those documents. Both documents say:
No other rights except those expressly stated 'in this [covenant/promise]' shall be deemed granted, waived or received by implication, or estoppel, or otherwise.
As discussed in the preceding section, the rights granted are to our understanding actually a null set. However, glossing over that problem by assuming arguendo that some rights were granted nonetheless, the existence and scope of those rights can only be determined by examining the specification itself to determine the meaning of conform and its variants. But those are rights identified in another document, the specification, not "in this [covenant/promise]." That is fatal to any attempt to use the Ecma 376 specification as a source of rights granted because the reader is forbidden from looking to another document as a source of implementer's rights not expressly stated in the IP documents.
Therefore, any revised IP documents should also unambiguously declare where the relevant definition of conform and its variants is located.
The Microsoft Open Specification Promise is ambiguous
Moreover, in the OSP we find additional language limiting rights:
“Microsoft Necessary Claims” are those claims of Microsoft-owned or Microsoft-controlled patents that are necessary to implement only the required portions of the Covered Specification that are described in detail and not merely referenced in such Specification.
That sentence contains a four-step reduction of rights. First, one must somehow penetrate the "patents that are necessary to implement" phrase, which is forbidden by the sentence that prohibits the implication of rights not expressly stated. Ignoring that barrier and implying that which is forbidden, one could refashion the subject phrase into something like the more typical "patents that are necessarily infringed by implementing the specification."
However, the putative patents that would necessarily be infringed by implementation are nowhere identified so that developers—or reviewers of the draft specification—might determine whether implementation would infringe a Microsoft patent. See ISO/IEC Patent Policy ("The originator of a proposal for a document shall draw the attention of the committee to any patent rights of which the originator is aware and considers to cover any item of the proposal. Any party involved in the preparation of a document shall draw the attention of the committee to any patent rights of which it becomes aware during any stage in the development of the document.")
That problem is exacerbated by the fact that Microsoft has not granted rights to implement the entire Ecma 376 specification. In the second step of the OSP's narrowing of rights, the phrase "only the required portions" excludes from patent protection any implementation of any portion of the Ecma 376 specification that is not mandatory. The Microsoft Open Specification Promise appears to offer no patent protection whatsoever for implementation of the multitude of optional features in the Ecma 376 specification.
In the third step of narrowing implementers' rights, the OSP carves off patent protection for implementations of any specification features that are required by the specification unless they "are described in detail." How much detail? The term is not defined. Viewed one way, a single alphanumeric character is a detail. Is that sufficient? Viewed another way, one can purchase books at nearly any bookstore that describe how to write software programs in detail. Is that enough detail? Absent far less vague definitions or identification of specific portions of the specification excluded from patent protection, one can only obtain answers to such questions through litigation.
In the fourth step of narrowing rights, the OSP carves off patent protection for all implementations of any mandatory requirement that is "merely referenced in such Specification." Thereby, would Microsoft not be denying any patent protection for the required implementation of, e.g.:
- the Unicode Standard (pg. 13)
- the UTF-8 and UTF-16 encoding form, as required by XML 1.0 (pg. 13)
- W3C XML 1.0 (passim)
- ISO B4 (pg. 2772)
- ISO B5 (pg. 2772)
- ISO 639-1 (pg. 1040)
- ISO 690 (pg. 5965)
- ISO 3166-1 (pg. 2530)
- ISO 8061 (pg. 575) (Alpine Ski Bindings!)
- ISO 8601 (pg. 184)
- ISO 10646 (pg. 1544)
- ISO 8859-15 (pg. 2699)
- ISO/IEC 2382.1:1993 (pg. 6002)
- ISO/IEC 9594-8 (pg. 184)
- ISO/IEC 10646 (pg. 184)
- ISO/IEC 10646-1 (pg. 13)
There are many other, non-ISO published standards that are "merely referenced" in the specification.
Closer to home, many Microsoft legacy file formats are also required by the specification to be implemented and are "merely referenced." Rob Weir of IBM has collected and referenced several such instances and discussed them in the context of conflicting provisions of the specification that both require and forbid their implementation. While it would be difficult for developers to implement those requirements because the formats' specifications have not for the most part been disclosed, should they succeed in doing so, e.g., through reverse engineering the formats, they appear to be given no patent protection by the OSP because they are "merely referenced" in Ecma 376. Moreover, they are apparently granted no rights to reverse engineer the applications to determine required behavior.
In the same vein, Ecma 376 is replete with a series of tags for supporting deprecated features of Microsoft applications. Examples are discussed in earlier portions of this document. Each such tag is accompanied by the following boilerplate guidance and can be quickly identified by searching the specification for portions of the guidance's text:
Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance
Because such tags "merely reference" behavior of older applications rather than specifying the required behavior in detail, Microsoft has apparently granted no patent rights to study and replicate their unidentified behavior, despite the use of the mandatory must in the "guidance".
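The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presumes the specification's text has already been extracted to a plain string, and the function name `find_guidance_hits` and the sample text are hypothetical, not part of Ecma 376 or any tool.

```python
import re

# Distinctive fragment of the boilerplate guidance quoted above.
GUIDANCE_FRAGMENT = "applications must imitate the behavior of that application"


def find_guidance_hits(text, fragment=GUIDANCE_FRAGMENT):
    """Return the character offsets where the fragment occurs.

    Matching ignores case and tolerates line breaks between words,
    since text extracted from a PDF is often re-wrapped.
    """
    words = [re.escape(word) for word in fragment.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


# Hypothetical sample standing in for text extracted from the specification.
sample = (
    "autoSpaceLikeWord95 ... Guidance: To faithfully replicate this behavior, "
    "applications must imitate\nthe behavior of that application, which involves ...\n"
    "useWord97LineBreakRules ... Guidance: To faithfully replicate this behavior, "
    "applications must imitate the behavior of that application, which ...\n"
)

hits = find_guidance_hits(sample)
print(len(hits), "occurrences of the boilerplate guidance found")
```

Each offset can then be mapped back to the surrounding tag name to build a list of the deprecated-behavior tags at issue.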
The Microsoft Covenant Not to Sue is irrelevant and ambiguous in any event
Despite Microsoft's statement on its Open Specification Promise page that developers can take their choice of implementing Ecma 376 under the patent protection of the OSP or the earlier Covenant Not to Sue, that supposed grant of rights is wholly ineffective and irrelevant. The CNS by its own terms is plainly limited to implementations of the Microsoft Office 2003 XML Reference Schemas. The CNS also bluntly states:
No other rights except those expressly stated in this covenant shall be deemed granted, waived or received by implication, or estoppel, or otherwise.
(Emphasis added.) Microsoft's statement on another web page that this CNS can be relied upon as a source of rights to implement Ecma 376 is ineffective. That would require that rights not expressly stated in the CNS (the right to implement a different specification) be "otherwise" granted, waived, or received. The CNS forbids its amendment as to rights granted by resort to another document. The CNS is irrelevant in determining what rights are granted to developers who implement Ecma 376.
Even ignoring that problem and those discussed earlier, the CNS is also highly ambiguous as to the extent of rights to implement, at least as ambiguous as the Open Specification Promise. We refer the reader again to a detailed analysis of the CNS for a discussion of ambiguities within it.
End-User License Agreements (EULAs) may forbid full implementation
As noted above, many portions of the specification inappropriately require duplication of the functionality of various proprietary products, without a definition of exactly what that behavior is. Even worse, in some jurisdictions it may be illegal for competitors to try to determine what the specification actually means.
Many of these products' End-User License Agreements (EULAs) forbid attempts to determine exactly what these products do. It is difficult to find the EULA for Word 6, but later versions are instructive. For example, the EULA for "Microsoft Office Standard Edition 2003" (retrieved January 22, 2007) states in "LIMITATIONS ON REVERSE ENGINEERING, DECOMPILATION, AND DISASSEMBLY" that, "You may not reverse engineer, decompile, or disassemble the Software, except and only to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation."
Note that these involve copyright and/or contract issues, and thus Microsoft's patent grants do not appear to provide any relief from these provisions.
In some jurisdictions, these EULA statements are probably enforceable. Indeed, Virginia and Maryland in the United States have passed a law called "UCITA", and UCITA essentially gives EULAs the force of law. Some jurisdictions do permit reverse engineering for interoperability purposes, but this is not universally true, and in some cases it is not clear that these exceptions are enough to permit legal use. The U.S. Digital Millennium Copyright Act (DMCA) includes an exception permitting reverse engineering for interoperability purposes from its prohibitions, but it is unclear that this DMCA provision would override EULAs in this case. In any case, this is a legal issue that must be resolved before this specification can even be considered. It is inappropriate to consider an international standard in which some suppliers might be forbidden by law to determine what the specification is.
Ecma 376 is a vendor lock-in specification
- Adoption of Ecma 376 in its current state would frustrate the ISO goal [PDF] of "one standard, one test, and one conformity assessment procedure accepted everywhere." Yet Microsoft's Alan Yates has freely admitted that the primary goal of Ecma 376's sponsor is to have two standards instead of one: "What I'm really going to be talking about is Massachusetts actually opening up to more choice and more competition than the current policy has. That's, I think that's the fundamental decision that's before us. Can Massachusetts open up to more choice, additional standards, in order to enable greater value over a period of time?"
- Ecma 376 adoption would in effect grant Microsoft a monopoly on the conversion of its binary formats to XML
- Ecma 376 is at least arguably violative of an existing antitrust injunction issued by the European Commission DG Competition*
- Ecma 376 is at least arguably violative of an antitrust injunction issued in U.S. v. Microsoft
- Microsoft's refusal to disclose specifications for its binary file formats is under antitrust investigation by the European Commission
Ecma 376's full name, "Office Open XML", confuses the marketplace
The name "Office Open XML" is often confused with "Open Office XML," the original name of the OpenDocument format in its early development. This will confuse many readers and software users into thinking this is the ISO/IEC 26300:2006 (OpenDocument) standard, or that it relates to the OpenOffice.org product (one of several products that implement the ISO/IEC 26300 standard and a competing office suite that has one of the largest market shares after Microsoft Office).
Indeed, several web applications that have implemented ISO/IEC 26300 refer to it in their user interfaces with labels such as "Save to OpenOffice." It may be that their developers have decided that because of the popularity of the OpenOffice.org office suite, more users will recognize the name of the most popular implementing application than will recognize the name OpenDocument. An example that uses the "Save to OpenOffice" label is Google Docs, a free web application service. Even Microsoft press releases are confused as to the proper name of Office Open XML.
For further relevant detail see:
- Amusing but Confusing, by Rob Weir, An Antic Disposition (collecting examples of press confusion, and a Microsoft Office Open XML expert's blog profile, mistakenly referring to Office Open XML as Open Office XML).
In a similar situation JTC-1 national bodies alleged a contradiction against C++/CLI when it was submitted under Fast Track last year. See:
Microsoft may or may not desire such confusion to exist, but it does. A vendor's choice of names is not necessarily appropriate for a standard. JTC-1 is urged to require that Ecma 376 be given a less confusing name.
Ecma development process not open nor the result of industry consensus
Ecma Technical Committee 45 included the following goal in its terms of reference: “produce a formal standard for office productivity applications within the Ecma International standards process which is fully compatible with the Office Open XML Formats”. In other words, the technical committee was specifically limited to producing a format that was a subset of a single vendor's own proprietary format. Addressing the needs of users of other products, accommodating the functionality of other products, improving the format, and other such efforts to ensure the widest degree of interoperability were not permitted. The Ecma development process was neither an open standards process nor the product of industry consensus, but instead was clearly dominated by a single vendor.
In contrast, ISO/IEC 26300 was developed and modified by cooperative work of many independent parties, including a number of significant improvements through technical changes in the format itself:
As noted earlier, reviewers of Ecma 376 should be aware that issues raised by the Ecma 376 proposal (Microsoft's refusal to release the specifications for its legacy file formats and/or its refusal to support ISO 26300) variously are or may become involved in antitrust cases on two continents, thus raising a need for JTC-1's heightened scrutiny of the legal landscape before further processing of the Ecma 376 proposal. National Standard Bodies are not the only parties involved in the issues Ecma 376 raises; the courts and antitrust regulators are also involved. Thus, reviewers would be prudent to involve legal counsel in their review.
European Commission antitrust investigation
Acting on a February 22, 2006 complaint from the European Committee for Interoperable Systems ("ECIS") (ECIS press announcement), the European Commission's DG Competition is reportedly investigating whether Microsoft violated European Union antitrust laws by its refusal to support ISO 26300 in Microsoft Office and its refusal to divulge to competitors the specifications for its legacy Microsoft Office file formats. Both of those refusals are manifested in Ecma 376, as discussed in more detail below.
According to a transcript that was published by the press on July 12, 2006, Competition Director General Neelie Kroes was asked by a reporter:
[Y]ou've got another complaint before the Commission right now which looks at interoperability issues in other areas including, as I believe, Office. Have you made any decision on that complaint which was filed in February by the ECIS group [ED: http://www.e-c-i-s.org/archives/2006/02/brussels_22_feb.html] and do you have any feeling that the March 2004 Decision establishes a broader principle of interoperability that will animate the Commission's decisions as it goes forward?
The answer on the last part of your question is Yes, and the answer on the first part of your question is No, I don't have yet—made a decision.
No later statement on the status of that investigation has been found. The matter is apparently still pending.
European Union antitrust litigation
The European Commission's DG Competition previously found [PDF] that Microsoft's refusal to disclose its interoperability protocols for Windows and Windows Server to competitors was an antitrust violation. But DG Competition did not limit its remedial order to just the Windows communications protocols; it ordered Microsoft to refrain "from any act or conduct having … equivalent object or effect." Id., pg. 299.
The press release describing the ECIS complaint discussed above calls for the limitations on Microsoft's conduct established in the 2004 order to be "rapidly and broadly enforced" by DG Competition. It is not clear whether DG Competition, should it decide to pursue the matter, would do so as part of the original proceedings or begin a new legal proceeding. Presumably, the refusal to support ISO 26300 would require a separate proceeding, since the refusal to support a standard within the meaning of the Agreement on Technical Barriers to Trade was not an issue in the original proceeding, now on appeal in the European Court of First Instance.
U.S. v. Microsoft
Microsoft is operating under a broad antitrust consent decree (injunction) in the case of U.S. v. Microsoft, in which the court has retained jurisdiction to supervise Microsoft's compliance. Microsoft Office is at least arguably within the scope of middleware as defined by section VI(K)(2)(b) of that decree. If it is deemed to be such, Microsoft is required to disclose documentation for the Office file formats to competitors and others. It is known that the Plaintiffs in the antitrust litigation currently occurring in Iowa, USA, Comes v. Microsoft, have been granted leave to inform the U.S. Justice Department of certain materials, obtained in part in discovery in that case, that they believe are evidence of Microsoft failing to disclose APIs (application programming interfaces) to competitors in violation of the 2002 Final Judgment in United States v. Microsoft.
Ecma 376 raises serious issues as to the Agreement on Technical Barriers to Trade
Ecma 376 is not a typical proposal for an international standard. It arrives on the JTC-1 agenda at the behest of a vendor with an enduring monopoly share of the relevant world-wide market. That vendor chose not to participate in the multivendor collaboration that produced what became ISO 26300, the OpenDocument standard. Article 2 of the Agreement on Technical Barriers to Trade, section 2.2, provides:
Members shall ensure that technical regulations are not prepared, adopted or applied with a view to or with the effect of creating unnecessary obstacles to international trade.
If Ecma 376 becomes an ISO standard, that would create a significant barrier to trade
See the earlier section, Ecma 376 cannot be reasonably implemented by other vendors.
The ISO standardization of Ecma 376 in its present state would result in an international standard that no vendor other than Microsoft could fully implement. For that reason, Ecma 376 would have "the effect of" granting Microsoft an exclusive monopoly over the high-fidelity migration of documents stored in its legacy file formats to Ecma 376 formats, a very substantial obstacle to international trade. For example, should a government procurement tender request bids for "an office software suite fully implementing Ecma 376 and capable of full fidelity in migrating any Microsoft Office file format from versions 97 through 2007 to Ecma 376," no vendor other than Microsoft could or would have a product meeting the tender's specification.
A new and separate standard may not be 'necessary'
Unquestionably, there is a market requirement for the high fidelity migration of legacy Microsoft Office files to XML. The pivotal issue is whether Ecma/Microsoft's claim is accurate that an entirely new, duplicative and overlapping office productivity file format standard is necessary to achieve the high fidelity migration to XML that the market requires. A related issue is whether a Microsoft-sponsored plugin for its office suite will be able to provide sufficient fidelity in conversions/transformations between Ecma 376 and ODF. The efficacy of that tool bears heavily on Microsoft's claim that Ecma 376 does not contradict ISO 26300, on the theory that the plugin provides sufficient interoperability.
The truth of Microsoft's claims cannot be definitively determined at this time. There has been insufficient time to review the specification adequately, there are major omissions in the specification necessary for its evaluation, and suitable reference applications are not available for testing. We believe that JTC-1 stands in little better position.
We suggest that JTC-1 would benefit greatly from demonstrations of two plug-ins for Microsoft Office presently under development. One is the Microsoft plug-in just mentioned. Whether the Microsoft plugin will provide sufficient fidelity going between ISO 26300 and Ecma 376 to ease concerns about a contradiction of the former is an open question. No less an authority than Microsoft Chief Executive Officer Steve Ballmer reportedly said that Microsoft will not:
"attempt to build file converters that can make files 100 percent compatible between the two file formats. ... But it will achieve the level of interoperability that customers can work with, he said."
We believe that the Microsoft plug-in takes a problematic approach in that it does not add native support for ISO/IEC 26300 (OpenDocument) to Microsoft Office. It is based on an external XSL transformer supplemented by C# routines. Because it is an external process, it does not enable OpenDocument to be set as the default file save format in Microsoft Office, which is a significant limitation. The Microsoft plugin is due to be released this month, coinciding with the release of Microsoft Office 2007.
The second plug-in that we suggest JTC-1 arrange to have demonstrated is the da Vinci plug-in under development by the OpenDocument Foundation in aid of OpenDocument support in Microsoft Office for the Commonwealth of Massachusetts. The da Vinci plug-in provides native read-write support for OpenDocument to Microsoft Word, reportedly using Word's APIs for adding native file format support. Interestingly, its developers claim an ability to achieve 100 per cent fidelity in round-tripping documents between OpenDocument and Word's various binary format versions. We have contacted its developers and they have indicated willingness to arrange a demonstration for JTC-1. The Foundation's President, Gary Edwards, who is also one of the developers of the OpenDocument standard, can be contacted using contact information on the Foundation's web site.
Further information on the Foundation's methods and concepts:
- ODF as the Perfect MS Office File Format: How to: Add native file support for OpenDocument to Microsoft Office.
- Running on OpenDocument Inside of MS Office: Perfect Conversion Fidelity & The daVinci OpenDocument Plugin for Microsoft Office.
Mr. Edwards has also indicated his willingness to demonstrate the plug-in using test files selected by JTC-1 and suggests that both the Microsoft and the Foundation plug-ins be demonstrated using the same test files, allowing an objective comparison of fidelity. The plug-in has previously been demonstrated for Massachusetts, the State of California, and the European Commission's IDABC.
With such demonstrations, we believe that JTC-1 will be in a far better position to evaluate: [i] whether there actually is a need for a separate standard for migration of legacy binary files to XML, specifically to the existing standard OpenDocument XML; [ii] whether full fidelity native OpenDocument support can be added to Microsoft Office via an installable plug-in despite any Microsoft refusal to harmonize its specification with the OpenDocument standard; and [iii] whether high fidelity transformations/conversions can be achieved going between OpenDocument and Ecma 376.
We also suggest that Ecma 376 be diverted from the Fast Track to allow for more thorough review of its specification and to allow its testing in appropriate reference applications. We believe that the market requirement (necessity) of a second standard cannot be rationally determined at this time. However, we offer an alternate, standards-based resolution.
Alternate, standards-based resolution
We stated earlier that we are unaware of any functionality in the Ecma 376 specification that cannot be expressed using the present ISO/IEC 26300 OpenDocument standard. But since Ecma 376 is over 6,000 pages long, it is impossible to know this with absolute certainty without additional time for review and the availability of appropriate reference applications for evaluation and testing purposes.
In this section we propose an alternate solution in the event that Ecma 376 does have a functionality that cannot be implemented in ISO 26300. We propose that Ecma 376 be replaced by a specification that extends ISO 26300 with whatever new tags are needed to cover the additional functionality.
The ISO/IEC 26300 OpenDocument standard is designed to be extensible. It is designed to admit new namespaces to adopt new functionality (see section 1.5 of ISO 26300). ISO 26300 was designed by a wide variety of organizations, with a wide variety of needs. Assuming that Ecma 376 truly contains functionality that can't be represented in ISO 26300, the correct way to specify this functionality is by defining an extension of ISO 26300.
Just as ISO 26300 references existing standards (SVG, SMIL, MathML, Dublin Core, ISO 8601, etc.) whenever they provide overlapping functionality, so too should Ecma 376 reference ISO 26300 for any and all functionality that can be represented in ISO 26300. Ecma 376 should be limited to specifying only functionality that is not already provided by an existing international standard.
- Compatibility Note
- There should be very little or no need to extend the ISO 26300 specification to support existing Microsoft Office documents fully, except for Microsoft-specific functionality such as Information Rights Management (IRM), Object Linking and Embedding (OLE) or Visual Basic for Applications (VBA). For example:
- There is no need to add an attribute like autoSpaceLikeWord95 because ISO 26300 already includes the generic attribute style:font-independent-line-spacing.
- There is no need to add an attribute like useWord97LineBreakRules because ISO 26300 already includes a generic style:line-break property to select the set of line breaking rules to use for text.
- There is no need to specify 7 pages of clip art, because ISO 26300 already allows the inclusion of any image in the document.
- Compatibility Note
- It should be pointed out that even for additional functionality, a new standard may not be strictly necessary. ISO 26300 already allows vendors to add their own custom tags to an OpenDocument file as long as they do so in their own namespace.
- For example, nothing prevents Microsoft from adding an attribute called microsoft:useWord97LineBreakRules or microsoft:VBAData or microsoft:OLEData to their files if they really want.
- Of course, if an extension to ISO 26300 is to be made, it is highly preferable that said extension be specified as an international standard as well.
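The namespace-based extension mechanism described in the notes above can be illustrated with a minimal Python sketch. It is not taken from either specification: the `vendor` prefix, its namespace URI, and the helper `split_attributes` are hypothetical names chosen for illustration, and the fragment is a hand-written example rather than an excerpt from a real document. Only the OpenDocument style namespace URI and the two attribute names mentioned in this section come from existing material.

```python
import xml.etree.ElementTree as ET

# Namespace of OpenDocument style attributes.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
# Hypothetical vendor namespace, invented for this illustration.
VENDOR_NS = "urn:example:vendor:extensions"

# A paragraph-properties element carrying one standard attribute and
# one vendor-specific attribute in the vendor's own namespace.
fragment = f"""
<style:paragraph-properties
    xmlns:style="{STYLE_NS}"
    xmlns:vendor="{VENDOR_NS}"
    style:font-independent-line-spacing="true"
    vendor:useWord97LineBreakRules="true"/>
""".strip()


def split_attributes(element, known_namespaces):
    """Separate attributes in known namespaces from foreign ones."""
    known, foreign = {}, {}
    for qname, value in element.attrib.items():
        # ElementTree stores namespaced names as "{uri}local".
        if qname.startswith("{"):
            uri, local = qname[1:].split("}", 1)
        else:
            uri, local = "", qname
        target = known if uri in known_namespaces else foreign
        target[local] = value
    return known, foreign


element = ET.fromstring(fragment)
known, foreign = split_attributes(element, {STYLE_NS})
print("standard attributes:", known)
print("vendor extensions:  ", foreign)
```

An application that does not understand the vendor namespace can process the standard attributes and skip or preserve the foreign ones, which is the separation between portable and vendor-specific content that the notes describe.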
Benefits of the alternate solution
This alternate solution (reworking Ecma 376 as an extension of ISO 26300 that only specifies additional functionality) has several important benefits:
- It avoids duplication. This drastically reduces the barrier to trade, and better matches ISO's primary goal of "one standard, one test, and one conformity assessment procedure accepted everywhere."
- It removes conflicts with existing standards. ISO/IEC 26300 is already compatible with ISO 8601, ISO 639, W3C SVG, W3C MathML, W3C SMIL, W3C XML-ENC and the Gregorian Calendar. Reworking Ecma 376 as an extension of ISO 26300 would remove almost all conflicts with existing standards.
- It would reduce the spec size considerably. Much of the Ecma 376 specification merely duplicates functionality already specified in ISO and W3C standards. The considerable size reduction would make the specification much more accessible to scrutiny and implementation.
- It encourages multi-supplier interoperability. Many suppliers already implement ISO/IEC 26300, as it has been available for quite some time; some implementations even provide publicly readable source code. Building on a base of an international standard specifically designed for inter-application interoperability, which has been around much longer, is a more sensible starting point for any international standard.
Feasibility of the alternate solution
The alternate standards-based solution proposed is feasible. First, no less a software luminary than Tim Bray, co-lead developer of the W3C XML 1.0 recommendation, has advocated the same alternate solution:
The ideal outcome would be a common shared office-XML dialect for the basics—and it should be ODF [ISO 26300] (or a subset), since that's been designed and debugged—than another extended vocabulary to support Microsoft features, whether they're cool new whizzy features or mouldy old legacy features (XML Namespaces are designed to support exactly this kind of thing). That way, if you stayed with the basic stuff you'd never need to worry about software lock-in; the difference between portable and proprietary would be crystal-clear. And, for the basic stuff that everybody uses, there'd be only one set of tags. This outcome is technically feasible. Who could possibly be against it?
Second, Microsoft developers have told the Commonwealth of Massachusetts it would be "trivial" for Microsoft to implement OpenDocument in Microsoft Office:
But [former MA secretary of administration and finance Eric] Kriss insisted that the ODF [ISO 26300] policy wasn’t intended to be anti-Microsoft. He said technical people at Microsoft told him it would be “trivial” to add support for ODF to the new Office 2007. The resistance to doing so came from the vendor’s business side, according to Kriss.
Adding that support would give software users interoperability for every Microsoft Office feature that can be mapped to ISO/IEC 26300 OpenDocument, and thus immediate inter-vendor application interoperability for a large portion of the functionality available in Microsoft Office. Many vendors' applications already support OpenDocument, and Corel recently announced that it is developing ISO 26300 support for WordPerfect Office, the last major office suite other than Microsoft's that does not already support OpenDocument.
Combining the market shares of the office suites that already support OpenDocument, those that have ISO 26300 support under development, and Microsoft's suite would provide broad interoperability for nearly 100 percent of all office productivity software users.
Third, Microsoft's Alan Yates, General Manager, Microsoft Information Worker Business Strategy, has predicted that ISO 26300 and Ecma 376 will in fact be harmonized:
I would say, in the future, some time, you know, at some point, there will be convergence [of ISO 26300 and Ecma 376]. Convergence does happen over a period of time. Or there will be incorporation, there will be subsetting, supersetting. You know, the wireless standard, the A version merged into the B version, merged into the G version over a period of time to give better performance and functionality over a period of time. ... So, good news, I think, on that front is that this problem will be solved in time. It is not an easy, sort of snap-your-fingers sort of problem.
(Transcript of meeting audio; audio downloadable here in mp3 format.)
We cannot overstress the importance of Mr. Yates' statement. From it one may infer that Microsoft recognizes:
- that Ecma 376 in fact duplicates and/or overlaps with the ISO 26300 standard;
- that Ecma 376 and ISO 26300 are at present insufficiently compatible;
- that the contradiction of ISO 26300 thereby acknowledged will require a longer period of time to resolve than is available within the limitations of fast-track processing by JTC-1; and
- that it is in fact feasible for the ODF standard to be adopted/adapted to meet Microsoft's software requirements.
Microsoft needs an incentive to execute the alternate solution
To be blunt, the present situation amply demonstrates that Microsoft lacks sufficient incentive to fully support ISO/IEC 26300 and to begin adapting that standard to serve any legitimate software requirements that it may not presently support (an issue that need not be decided here). JTC-1 is in the unique position of being able to provide Microsoft with the needed incentive by insisting that all Ecma 376 contradictions with the OpenDocument standard be removed before continuing the standardization process for Ecma 376.
Doing so would also fulfill JTC-1's duty under the Agreement on Technical Barriers to Trade, Article 2, section 2.2:
Members shall ensure that technical regulations are not prepared, adopted or applied with a view to or with the effect of creating unnecessary obstacles to international trade.
(Emphasis added.) Ecma 376 presents a situation in which the "preparation" of a proposed standard has "the effect of creating unnecessary obstacles to international trade." Microsoft has a monopoly position in the relevant software market. Any delay in informing the world's populace that Ecma 376 must not contradict the OpenDocument standard will result in millions more people, businesses, and other institutions being persuaded to obtain and use software that contradicts the existing standard. That poses a rapidly worsening non-interoperability barrier to trade and weakens the adoption rate of the existing standard.
JTC-1 has the word of Microsoft developers that it would be trivial for Microsoft to fully support OpenDocument and thereby resolve the interoperability barrier. A public announcement by JTC-1 that Microsoft will be required to work within the framework of the existing OpenDocument standard if it wishes to use standardized formats would dramatically mitigate the harm to competition being inflicted every day that Microsoft is able to say that its file formats are under consideration as an ISO standard.