Managing Evolution in the Web: MIME, Registries, Extensibility

Larry Masinter for W3C TAG, 12/17/2011, draft for discussion, intended to become one or more TAG findings.

This document is circulated as part of TAG

[ACTION-595] Create a report on MIME and the Web
[ACTION-531] Draft document on architectural good practice related to registries
[ACTION-350] Revise http://lists.w3.org/Archives/Public/www-tag/2009Oct/0075.html based on feedback on www-tag and the feedback from TAG f2f 2009-12-09 discussion
[ACTION-636] Update product page for Mime and the Web
[ISSUE-41] What are good practices for designing extensible languages and for handling versioning
[ISSUE-66] The role of MIME in the Web Architecture

History [ACTION-241] Review TAG versioning situation and report back to TAG and HTML

Please discuss this document on www-tag@w3.org (archived).

(This document covers a lot of ground, and the above actions envision taking them on a small piece at a time; this represents a new strategy of Write first, modularize later.)

Introduction

This document discusses evolution of the web, and makes (will make, when completed) recommendations for best practices around managing evolution. In particular, it recommends practices for the use of MIME and MIME types in the Web, the establishment and use of registries, and managing references among technical specifications which require stability in the face of evolution of other components.

The value of the Internet and the Web is global communication among unrelated parties. Different implementations need to agree on the protocols, languages, and protocol elements in their communication for them to interoperate. Unmanaged evolution results in diminishing the interoperability of components, a cost to all which must be weighed against the benefits of evolution.

A number of issues need to be addressed to achieve careful evolution and global interoperability in an evolving world. The recommendations cover the way in which standards allow for extensibility by adding values and new meaning to the protocol elements used within them, guidelines for establishing and using registries, and a model for evolution in which a standards organization can lead the managed evolution of the technology available to its implementations.

Typographical convention

In a number of cases, terms used with a particular (narrow) definition within this document are used in bold face. Editorial comments are in parentheses and in italic. (Perhaps a subsequent edition will turn bold face into hyperlinks to the definition of this term within this document.)

Terminology and concepts: Evolution and Standards in the Web

It is useful to consider evolution of different aspects of how the Web works and evolves, and to be precise in the terminology when discussing evolution. This section primarily establishes a framework for these aspects, through careful use of terminology.

Definitions:

protocol: a general term for the way in which agents interact.
language: a component of interaction in which one party sends some data which is then interpreted by the receiver. (For simplicity, this document uses language to cover what is also known as a file format or file type).
protocol element: a component of a protocol or language, where the syntax and semantics of the protocol element are described independently. Multiple protocols and languages may use the same protocol element.

implementation: Software installed by agents to manage the interaction with others.

specification: a technical document which describes (some part of) a protocol, language, or protocol element, and gives rules for how implementations of them are expected to behave.
standard: a specification which represents some level of agreement among those planning to build, maintain, or use the implementations of the protocols, languages and protocol elements.

There are relationships between these, for example:

A protocol includes transmission from one party (the sender) to another (the receiver), where the transmission is intended to be interpreted by the receiver as a protocol element or according to a language.
A language can be thought of as a particularly complex protocol element.
Using the same protocol element in multiple languages and protocols often facilitates linking multiple applications together.
In a distributed system (consisting of multiple agents on a network), each agent uses implementations of the protocols and languages to interact with other agents.
In the Web, some content (instances of a language) is created by a person typing at a keyboard.
HTTP is a protocol. RFC 2616 is a standard that describes it.
HTTP supports transmission of data in a language where the language to be used is indicated by the “content-type” protocol element.
HTML is a language, the primary language used in the web. Other languages used in the Web are JPEG, GIF, and CSS.

Evolution is a process where all of these aspects change over time, to adapt to new requirements, add new features, and fix problems that arise as circumstances, applications, and user needs change.

Managing evolution of standards in a world where there are multiple implementations involves coordinating the evolution of these different aspects:

protocols, languages, protocol elements: evolve as their common implementations evolve.
implementations: evolve as their implementers create or adopt new features; in many cases, that evolution requires evolution of, or additions to, the protocols, languages and protocol elements the implementations use to communicate.
standards: evolve as new specifications (versions, editions) are written, made available, reviewed, discussed, revised, and eventually agreed to. A new specification might lead implementations (proposing additions or changes) or follow implementations, where the specification has been changed to match implementation behavior. Standards evolve by accepting new specifications as a new edition or version of previous specifications.

The process by which specifications become standards involves coordination between multiple parties, and significant review. Various means are used to assist the evolution of implementations while maintaining interoperability, without being unduly held back by the standards process.

Extensibility and evolution can be both positive and negative. [IAB-extension]. While extensibility allows innovation and deployment of new features, there is always a risk of unintended consequences, such as interoperability problems or security vulnerabilities. This risk is especially high if the extension is performed by a different team than the original designers, who may stray outside implicit design constraints or assumptions. Extensions should be done carefully and with a full understanding of the base protocol or language, existing implementations, and current operational practice.

Web Evolution and Identifiers

(This section is intended to lay out the design choices for whether to use a registry or a URI or some other means of associating values with symbols/names/codes/values. Is there a better word than 'identifier'?)

One way a standard can facilitate evolution is to allow for extensibility in some of the protocol elements it uses, through the ability to add new values for those protocol elements that were not part of the protocol or language at the time the specification was written. Some protocol elements used in a language or protocol may allow values which act as identifiers: determining the meaning of those values requires information which is specified independently. In some cases, an identifier might come from a fixed set of values (e.g., identifiers for the days of the week or months of the year). But in many cases, evolution and extensibility are accomplished by changing the meaning of existing values or adding new values and associating meaning with those values. The Web uses many protocol elements which are identifiers; for example, character entities in HTML, content-types, URI schemes, color names, host names, HTTP headers. (See Appendix A for a more complete list.)

Implementations evolve by changing or extending the behavior of the implementation when identifiers are encountered, most commonly by adding new values.

Identifier methods

There are a variety of ways for allocating identifiers and associating meaning with them:

in specification:: Many specifications limit the set of identifiers allowed in a protocol element to those explicitly listed in the specification: extending or changing the meaning of the identifiers allowed requires a new specification. This still allows implementations to evolve (through private extensions), with the standard following. (Example: element names in HTML)
use a registry:: A registry for a protocol element has a list of identifiers and corresponding information about each identifier, including references to specifications. A registry is maintained by a registrar (an organization or individual), and has associated processes for adding to or updating the registry. (Example: Internet media types, HTTP response codes).
use a URI (IRI) as the identifier:: Some protocol elements use a URI to name an extensibility point, where the URI itself provides a mechanism for determining the "meaning" of the extension [httpRange-14]. (Example: RDF)
use a "vendor prefix":: A "vendor prefix" is a short string which identifies an organization which controls one or more implementations. The organization maintaining an implementation uses prefixed identifiers for its own unique extensions. As extensions are made part of the standard, the unprefixed identifier is then substituted. (Example: CSS)

use URI-named namespace:: The protocol element uses an identifier in a way (with prefixes or scoped contexts or otherwise) where there is a URI-identified namespace, and the meaning of individual identifiers is understood with respect to that namespace. This allows linking together multiple namespace values, and short identifiers. (Example: RDF with #).
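
(As a rough illustration of two of the methods above, the following Python sketch models a registry as a table kept by a registrar, and splits a CSS-style vendor prefix from an identifier. The class and function names, and the idea that a registry entry is just an identifier plus a list of specification pointers, are assumptions made for this sketch; real registries carry more fields and have review processes that are not modeled here.)

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RegistryEntry:
    identifier: str
    specifications: List[str] = field(default_factory=list)  # pointers to specifications


class Registry:
    """Hypothetical registry for one protocol element, maintained by a registrar."""

    def __init__(self, registrar: str) -> None:
        self.registrar = registrar
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, identifier: str, specification: str) -> RegistryEntry:
        # A new identifier must not collide with one that is already assigned.
        if identifier in self._entries:
            raise ValueError(f"{identifier!r} is already registered")
        entry = RegistryEntry(identifier, [specification])
        self._entries[identifier] = entry
        return entry

    def lookup(self, identifier: str) -> Optional[RegistryEntry]:
        # Discovering the meaning of an identifier: None means "not registered".
        return self._entries.get(identifier)


def split_vendor_prefix(name: str) -> Tuple[Optional[str], str]:
    """Split a CSS-style prefixed name such as '-webkit-transition'."""
    if name.startswith("-"):
        vendor, sep, rest = name[1:].partition("-")
        if vendor and sep and rest:
            return vendor, rest
    return None, name
```

With the vendor-prefix method, an implementation that later adopts the standardized identifier has to recognize both the prefixed and the unprefixed spelling for some period, which is the transition cost discussed later in this document.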

Considerations for choosing an identifier method

(This section is intended to discuss the considerations when designing a protocol element which uses identifiers and the way those are managed. The basic problem is how to do this so that the specification and implementation and language evolution are kept in sync while not making old conforming implementations non conforming, etc. (still very sketchy). [IAB-extension] has a discussion of costs & benefits, but it tries to separate 'routine' and 'major' extension categories, based on the impact adding a new identifier has on the base protocol/language)

What are the processes for designing and deploying extensions?

assigning an identifier: Inventing a new identifier requires obtaining one that is not already used, and making information about the identifier available to others who need to know it
discovering the meaning of an identifier: finding out from the identifier information about it

using an identifier in a protocol or language: Identifiers are added to other languages and protocols; extracting metadata from the identifier

merging two identifiers: making one identifier obsolete
merging private identifier into public identifier space: avoiding

Some extensibility points have requirements that are not obvious, well-documented, or well-understood, and that could affect proper functioning in some way. For those, the process should record whether an extension has passed meaningful review, and whether someone other than the inventor of the registry item can update its specification. Considerations to weigh:

lower the cost of evolution
preserve interoperability
match reality
allow for private extensions
give implementers guidance about what is actually needed to be interoperable with other deployed systems
allow discovery of what is meaningful and important
ensure the information is timely, and does not go out of date or disappear
make sure that it is stable and evolvable at the same time
manage the transition from experimental to stable to standard
make sure the process for extending a standard has similar characteristics to the standard itself, in terms of being "fair" and "transparent"
make sure the registry is as long-lived as the specifications that use it
avoid problems of trademark, spam, denial of service, ...
identifier length (some protocols and languages are sensitive to the space usage and compressibility of long strings used as identifiers; in such cases, identifier length is a consideration)

Evaluation of different identifier methods

(this section is intended to give a broad evaluation of the categories against the evaluation criteria; also still an outline of notes ...)

In Specification: Low cost of implementation extension, higher cost of specification update, fairness depends on same standards process as anything else, long lifetime, transition from implementation to specification is painful but that's what standards are about.

Registry: (see the section on registries below for 'best practices' when this is the right choice). Cost of setting up the registry, managing it, and expert review; benefit of avoiding interactions; fairness issues; trademark and spam. Allows using numbers and meaningless values to avoid trademark, spam, and the difficulties of internationalization.

Using URIs: Example: RDF. Meaning is discovered via httpRange-14. Low cost (no registration process, though it might require maintaining the URI). Very timely. Transition unnecessary. Lifetime up to the lifetime of the URI. Very fair. Hard to misuse because there is no registry. Preferred method, modulo longevity of URIs. Note that URN allows naming a registry as a URI.

Vendor Prefix: Example from CSS. Transition path difficulties are outlined in [ref].

URI-named namespace: XML namespaces, RDF (?).

Findings

(Some things the TAG might 'find'? These are useless if nobody believes in them.):

Finding.DefineExtensibility: Extensibility and evolution must be planned and provided for in specifications that become standards. Standards that use identifiers should also specify the expected behavior of compliant implementations when confronted with unrecognized identifiers; for example, to distinguish between "must understand" and "must ignore" for unrecognized identifiers. Without also constraining implementation behavior, the fact that the specification might be extensible will not translate into an effective way of allowing implementations to evolve.
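
(A minimal sketch of the "must understand" versus "must ignore" distinction, in Python. The `Extension` record, the `must_understand` flag, and the set of known identifiers are all hypothetical; the point is only that the behavior for an unrecognized identifier is decided by the specification, not left to each implementation.)

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical identifiers that this implementation recognizes.
KNOWN_EXTENSIONS = {"color", "size"}


@dataclass
class Extension:
    name: str
    value: str
    must_understand: bool = False  # set by the sender, per the (hypothetical) spec


class UnsupportedExtension(Exception):
    """Raised when a must-understand extension is not recognized."""


def process(extensions: Iterable[Extension]) -> Dict[str, str]:
    accepted: Dict[str, str] = {}
    for ext in extensions:
        if ext.name in KNOWN_EXTENSIONS:
            accepted[ext.name] = ext.value
        elif ext.must_understand:
            # "Must understand": refuse rather than risk misinterpreting the message.
            raise UnsupportedExtension(ext.name)
        # Otherwise "must ignore": skip the unrecognized identifier and continue.
    return accepted
```

Under these assumptions, an old implementation keeps working when a sender adds an optional extension, and fails loudly only when the sender marks the extension as essential.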

Finding.AvoidRegistries: (originally I drafted 'avoid registries'. Certainly non-IANA registries have a problem with the long-term viability and control of the registry over, say, a 20 year period. Avoiding registries or using IANA seems preferable for the long term. In the short term, using a Wiki seems like it's ok? Whether or not you use a registry with some gatekeeping and review may depend on the cost of extensibility... if it's low, then use a URI or vendor prefix. If it's high, use In Specification. In the middle, use a Registry with review.)

Best Practices for Establishing and Using Registries

In the case where a registry is used for maintaining a list of identifiers and their meaning or pointers to their specifications, there are some common practices that will enhance the reliability and interoperability of the Web while allowing rapid evolution.

The Internet Assigned Numbers Authority (IANA)[IANADEF] is the primary organization whose charter and purpose is to maintain registries of values needed for Internet protocols and languages as specified in the IETF.[ref BCP from which this was quoted]. IANA administers the registries for many protocol elements in the core of the Web, including URI scheme names, Internet media type identifiers ("MIME types"), HTTP protocol header values, HTTP result codes, names of character sets. (See Appendix A.)

Detailed analysis of registries

(The idea is to extend the very brief analysis in the previous section to a more complete discussion of what makes a registry work or not work. So far this is just notes, very incomplete.)

Criteria: lower the cost of evolution; preserve interoperability; match reality; allow for private extensions; give implementers guidance about what is actually needed to be interoperable with other deployed systems; allow discovery of what is meaningful and important; ensure the information is timely, and does not go out of date or disappear; make sure that it is stable and evolvable at the same time; manage the transition from experimental to stable to standard; make sure the process for extending a standard has similar characteristics to the standard itself, in terms of being "fair" and "transparent"; make sure the registry is as long-lived as the specifications that use it; avoid problems of trademark, spam, denial of service, ...; and identifier length (some protocols and languages are sensitive to the space usage and compressibility of long strings used as identifiers; in such cases, identifier length is a consideration).

Update: A registry has a specific update policy.
Matching reality: Registries tend to go out of step with reality unless costs of registration or registry update are low and benefits are high to at least one of the parties authorized to make a registration or update. (See "Matching Reality" below).
Discovery: Manual discovery is hindered by many alternative places to find a registry, and the possibility of alternative locations (Wikipedia for MIME types, for example.)
Timeliness: In particular for registries, there is a tension in establishing a long-lived meaning for an extensibility point, between cementing the value too soon (before consensus is reached) and too late (after widespread deployment), especially when those overlap (extensions are widely deployed before consensus is reached). The policy needs to move toward "registration before deployment" independent of where in the standards cycle that holds. If the standard differs from early deployment, the registry should be updated to point not only to the standard but also to the facts about what one might encounter "in the wild".
Transition: If registries encode status in registered names (as the MIME registry does), transition and grandfathering are issues.
Lifetime: The documents pointed to by an IANA registry are not as long-lived as the registry itself, and much of the information is obsolete. See "Registry Stability" below.
Fairness: IANA is capable of administering a "fair" process with a reasonable dispute resolution mechanism, if those are specified at the time the registry is established. Wikis and other methods for maintaining a registry have more (or at least different) potential for abuse.

The IETF best current practice specification [BCP 26][RFC 5226] gives guidelines to protocol designers for establishing the registry rules associated with an IANA registry. Note that IANA acts as the operator of each registry but does not itself evaluate registry requests; it merely administers a process in which requests are reviewed or approved by the organizations or individuals authorized to do so. These guidelines apply to IANA namespaces established or requested by W3C working groups or task forces.

Matching Reality

(want to summarize the HappIana problem space)

In some cases, implementations have evolved and the registries have not followed: the registries have not tracked the use of identifiers. In some cases, the registry process is perceived as a bottleneck. If there is a registry, it is only useful if values are registered. A registry which does not match actual use (as is currently the case with URI schemes and Media Types) is not very useful.

Over time, divergence of meaning of identifiers used in the same protocol element but in different languages or protocols is harmful to interoperability. We need some way of creating a positive force so that, over the long run, divergence is reduced.

(try to address willful violations) Technical specifications that wish to override an existing registry for some values and use it for another should (a) attempt to correct the existing registry; in cases where it cannot, (b) a new "override" registry should be established with new values, where the spec points to the new registry. (????)

References in Registry Values

(see section References below)

Often, a registry will not contain the complete definition needed to understand the meaning of an identifier, and contains a pointer to a specification or specification series. For example, the Internet Media Type registry defining file formats and languages often contains a pointer to a specification. In those cases, the registry entry for an identifier might be updated, or the registry value itself established in a way that notes the possible evolution of the specification indicated.

Forking is a situation where there are multiple specifications or specification series which are associated with the same identifier used in a single protocol element.

An identifier in a registry which identifies a protocol or language may contain a pointer to a specification, but the use of that identifier in an implementation needs to identify the language or protocol intended, even when those have evolved beyond what the specification referenced has described.

(can we make recommendations for references in registries consistent with recommendations for references in specifications and standards?)

Finding.Series:: Registries should allow updates, and note warnings. In particular, documents rarely change without making a change which is incompatible in at least one direction (old content is invalid under the new definition, or new content is invalid or not processed interoperably under the old definition).
Finding.Forking:: If specifications are "forked" in incompatible ways, then use separate names for the forks. If the same name is used for multiple forks (specifications which diverge technically) and where different implementations are widely deployed, the registry should contain pointers to all of the different specification branches. This means that in those cases, the registry entry cannot be in the same document as a description of only one of the forks.

Indicating Standards Status of Registry Entries

Registry values and the specifications they point to typically go through a life-cycle, where a parameter is introduced experimentally, deployed in a limited or vendor-specific context, and then adopted more broadly.

Frequently, groups with registries or registered values attempt to convey the status of a registered value in the name chosen within the registry, e.g., using an "x-" prefix for experimental names, "vnd." prefixes in Internet media types, etc. In practice, these conventions are failures and counter-productive, because there is no simple deployment path when status changes, e.g., when vendor-proposed extensions become public standards or experiments succeed.

Extensibility requires review against criteria, but some identifiers haven't been reviewed and are unsafe. Registries at least give you the option of noting the review status of the proposed extension as a warning to implementors, if it is an extensibility point that requires careful review.

Finding.noStatusInName:: Do NOT attempt to encode parameter status in the name; do not use "vnd.", or "x-".
Finding.registrationEase:: There is a tradeoff between requiring that registry entries contain complete information and the goal of ensuring the registry contains at least some information about identifiers likely to be encountered. In general, the cost of using unregistered values must be non-negligible to the organizations allowed or encouraged to register a value, if a distributed development community is to use the registry.
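
(To illustrate why status encoded in the name is costly, here is a small Python sketch of the alias table a recipient ends up maintaining once an "x-" name has been deployed and the unprefixed name is later standardized. The two legacy labels shown are examples of labels that have been seen in deployed content; they are not taken from this document, and the table and function name are assumptions of the sketch.)

```python
# Hypothetical alias table: legacy "x-" label -> label without the status prefix.
LEGACY_ALIASES = {
    "image/x-png": "image/png",
    "application/x-javascript": "application/javascript",
}


def canonical_media_type(label: str) -> str:
    """Map a deployed legacy label to its current name; other labels pass through."""
    key = label.strip().lower()
    return LEGACY_ALIASES.get(key, key)
```

Every recipient needs the same table, indefinitely, because senders that adopted the experimental name have no incentive to change it. Had the name carried no status, only the registry entry would have needed an update.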

Organizational support

(this section is left here for the cases where a W3C spec needs a registry and IANA and its processes don't fit the needs... it's intended to be specific to W3C,)

Open questions are how to encourage W3C staff and working group participants to manage the registration information, and how to update the chair document on establishing and managing registries and extensibility points. Other registrations have their own administrative procedures. One option is to add a regular "have obligations related to registration been met?" check to the W3C document publication/advancement procedure.

The following conclusions were reached:

Finding.use-IANA:: W3C specifications SHOULD use IANA registration methods for those extensibility points which are shared with other (IETF-managed) application protocols, rather than inventing their own registries.
Finding.explicit:: Any extensibility points in a W3C specification MUST be explicit about the method and management of the registration of new values in a public, fair, and transparent way.

MIME and the Web

In the web, two important protocol elements whose values are identifiers come from MIME (the Multipurpose Internet Mail Extensions set of specifications): the "Internet Media Type" and the "charset".

MIME: a framework for transmitting content within protocols
Internet Media Type: A protocol element used in MIME as an identifier of languages. There is an Internet Media Type registry which includes several values, including a pointer to a specification of the language.
content-type: A protocol element (used in the HTTP protocol, email protocols and many others) which uses the Internet Media Type protocol element, and (in some cases) additional parameters associated with that.
charset: A protocol element (used in many Internet protocols and languages, including in the content-type parameter of HTTP and within XML and HTML) which is an identifier for a character encoding (a character set and the way it is encoded as bytes).
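
(The relationship between content-type, Internet Media Type and charset can be seen by taking a content-type value apart. The Python sketch below uses the standard library's email package, which implements MIME header parsing; the function name `parse_content_type` is an assumption of this sketch.)

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a content-type value into (Internet Media Type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns the lowercased type/subtype;
    # get_content_charset() returns the lowercased charset parameter, if any.
    return msg.get_content_type(), msg.get_content_charset()
```

For a value such as `text/html; charset=UTF-8`, the media type identifies the language (HTML) and the charset parameter identifies how the bytes map to characters. Each identifier is looked up in a different registry.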

The contexts of email and the web are sufficiently different that their requirements for registration, together with practices in the deployment of implementations of agents that use MIME, have led to a mismatch between the properties each context wants from the Internet Media Type protocol element.

The typical use pattern of email is that the transmission of data is unanticipated, and often between parties where the sender has no knowledge of the capabilities of the recipient. The typical use pattern of the web is that data is requested explicitly, and often much is known about the requirements and expected content for a retrieval.

Other Ways of determining language (without using content-type)

Often it is possible to determine the language of data, with relatively high accuracy, by examining the data itself. In some cases in the web (retrieval by FTP or file system access), there is no independent channel for communicating content-type, so other indicators and sniffing are used instead.

file extensions: A common practice in many systems was to use the end of the name of a file in the file system (the "file extension" ) as identifying the type of file. This practice has now extended to other systems.

sniffing: in many contexts, language can be guessed by looking for some unique string, number or pattern, which only appears in files of that language. In circumstances where this was a unique number, it was called a "magic number", although this concept has been extended to other textual patterns. In some cases, sniffing will be employed to override a (syntactically correct) content-type label, because of previous experience with mis-labeled content.
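
(A minimal sketch of magic-number sniffing in Python. The signature table lists a few well-known leading byte sequences and is far from complete; the function names and the `override` flag, which models the case where sniffing overrides a declared content-type, are assumptions of this sketch and not a description of any particular browser's algorithm.)

```python
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff(data: bytes) -> Optional[str]:
    """Guess a media type from the first bytes of the data, or return None."""
    for magic, media_type in MAGIC_NUMBERS:
        if data.startswith(magic):
            return media_type
    return None


def effective_type(declared: Optional[str], data: bytes,
                   override: bool = False) -> Optional[str]:
    """Combine a declared content-type with sniffing.

    Without a declared type, the sniffed type is used. With one, the
    sniffed type wins only if the caller has opted in to overriding.
    """
    sniffed = sniff(data)
    if sniffed is None:
        return declared
    if declared is None or override:
        return sniffed
    return declared
```

The `override` case is the contentious one: it trades respect for the sender's label against experience with mislabeled content.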

Information about these other ways of determining language was gathered for the Internet Media Type registry; those registering types are encouraged to also describe 'magic numbers', Mac file type, and common file extensions. However, since there was no formal use of that information, the quality of that information in the Internet Media Type registry is haphazard.

In some applications, implementations of some languages and protocols have interpreted identifiers in ways inconsistent with their registry entries, to the point where the specifications of those languages have needed to provide for "willful violations" of the registry entries (which cannot change if they are used differently by other languages and protocols which use the same protocol element).

Internet Media Type registry problems

(this section is intended to summarize the problems with the MIME registry because fixing it and keeping it up to date will be a big job, if that's what we want to do).

Internet Media Types suffered from "poor registration performance":

Lots of file types aren't registered (no entry in IANA), even for file types that have been deployed for over a decade. For example, "image/svg+xml", "image/jp2" and "video/mp4" are not registered.
For many file types that are registered, the registration is incomplete or incorrect (people doing registration didn't understand 'magic number' or other fields).
The actual content deployed or created by deployed software doesn't match the registration.

Content Negotiation

(this section may belong somewhere else, but it's a use case for finer granularity descriptions of languages and file formats, and possibly some additional information not related to language at all....)

When two parties communicate and share more than one language (or version of a language) they might use for communication, "content negotiation" is an exchange in which a choice among these is made. Content negotiation is common: fax machines chirp to each other when initially connecting, negotiating resolution, compression methods and so forth. In Internet mail, "content negotiation" consists of the sender preparing multiple alternative versions of the content and including them all in the same message.

For example, HTML email often also contains an alternative in plain text, with the two parts labeled with content-type "text/html" and "text/plain" respectively. In HTTP, a request might include "Accept" or "Accept-Charset" header fields, which allow the responding web server to match the language of the response to the capabilities of the client. The "User-Agent" header field in an HTTP request is often used for this purpose as well.
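
(A simplified Python sketch of server-side selection based on "Accept". It handles media ranges and the "q" quality value, and lets the most specific matching range decide the quality for each available type. It ignores other media type parameters and is not a complete implementation of the HTTP rules. All function names are assumptions of this sketch.)

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept value into (media range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def specificity(media_range: str) -> int:
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])  # keep the "type/" part
    return media_range == media_type


def choose(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick the available media type with the highest q, or None if none is acceptable."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        candidates = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not candidates:
            continue
        _, q = max(candidates)  # the most specific matching range decides q
        if q > best_q:
            best, best_q = media_type, q
    return best
```

The choice is made entirely from identifiers in the Internet Media Type registry, which is why negotiation can be no finer-grained than the registered types.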

Content negotiation based on "Accept" and content-type has only been successful in limited contexts. Negotiating the content-type (e.g. HTML vs. Word vs. PDF) doesn't really happen: people want to make an explicit choice of downloading an MS Office or PDF depending on the goals they have that moment, instead of letting software pick a format for them. Negotiation of HTML vs. XHTML happens but is rare in the big picture and rarely offers true value to users.

Polyglot, Multiview, Specializations

(this section might also belong somewhere else, but it's one of the problems with MIME types and sniffing, that the same content might properly be considered to be in two different languages)

There are some interesting cases where the same content can be viewed as being in multiple languages:

Polyglot: A 'polyglot' document is data which can be treated as being in more than one language, where the meaning of the data is not significantly different in the two languages. Developing new languages in such a way that there are significant use cases for polyglot content is part of a transition strategy: it allows content providers (senders) to manage, produce, store, and deliver the same data with two different labels, and have it work equivalently with two different kinds of implementations (one of which knows one language, and the other of which knows the other). This use case was part of the transition strategy from HTML to an XML-based XHTML, and was also a way for a single service to offer both HTML-based and XML-based processing (e.g., the same content useful for news articles and Web pages).
Multiview: This use case seems similar, but it is quite different. In this case, the same data has very different meaning when served as two different content-types, but that difference is intentional; for example, the same data served as text/html is a document, and served as an RDFa type is some specific data.
Specialization: In these cases, there is a class-subclass hierarchy of languages: the same document is both a general XML document and a document of a specific +xml type, or both a general JSON data structure and a specific +json type, or it is stored using ZIP but used for a particular purpose with a manifest. DNG and TIFF have a similar relationship.
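
The class-subclass relationship in the Specialization case can be sketched with structured-syntax suffixes. The following is a minimal sketch: the function name generalizations and the SUFFIX_BASE table are illustrative assumptions, not a copy of any registry, and only the "+suffix" convention is handled (a relationship such as DNG and TIFF is not expressed in the type name and would need separate information).

```python
# Hypothetical mapping from a structured-syntax suffix to the generic
# media type that a recipient could fall back to.
SUFFIX_BASE = {
    "xml": "application/xml",
    "json": "application/json",
    "zip": "application/zip",
}


def generalizations(media_type):
    """Return the media type followed by the generic type its suffix implies."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = media_type.split(";")[0].strip().lower()
    result = [media_type]
    subtype = media_type.partition("/")[2]
    if "+" in subtype:
        suffix = subtype.rsplit("+", 1)[1]
        base = SUFFIX_BASE.get(suffix)
        if base and base != media_type:
            result.append(base)
    return result
```

A recipient that does not know "image/svg+xml" could use generalizations("image/svg+xml") to discover that the content can at least be processed as "application/xml".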

(Additional considerations... MIME assumes the sets of language uses are partitioned. PNG and its use in Fireworks. Google re-using JPEG for new Google image format.)

Fragment identifiers

The Web added the notion of being able to address part of a content and not the whole content by adding a 'fragment identifier' to the URL that addressed the data. Of course, this made sense for the original Web with just HTML, but how would it apply to other content? The URL spec glibly noted that "the definition of the fragment identifier meaning depends on the Internet Media Type", but unfortunately, few of the Internet Media Type definitions included this information, and practices diverged greatly.

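If the interpretation of fragment identifiers depends on the MIME type, though, this really limits the use of fragment identifiers when content negotiation is wanted: the same fragment identifier may be interpreted differently in each of the representations that negotiation can return.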
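
The dependence of fragment interpretation on media type can be sketched as a dispatch on the type of the retrieved representation. This is an illustration only: the function name interpret_fragment and the returned labels are hypothetical, and just a few fragment syntaxes are covered (element identifiers for HTML, "line=" for text/plain, "page=" for PDF).

```python
from urllib.parse import urldefrag


def interpret_fragment(url, media_type):
    """Classify a URL's fragment according to the representation's media type."""
    fragment = urldefrag(url).fragment
    if not fragment:
        return ("whole", None)
    if media_type in ("text/html", "application/xhtml+xml"):
        # In HTML the fragment names an element in the document.
        return ("element-id", fragment)
    if media_type == "text/plain" and fragment.startswith("line="):
        return ("line-range", fragment[len("line="):])
    if media_type == "application/pdf" and fragment.startswith("page="):
        return ("page", fragment[len("page="):])
    # No fragment semantics known to this sketch for the media type.
    return ("undefined", fragment)
```

With the hypothetical URL "http://example.org/doc#page=3", the result is ("page", "3") if negotiation returns application/pdf, but ("element-id", "page=3") if it returns text/html, which is the conflict described above.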

Sniffing security uses scriptability info

If the Internet Media Type registry is more explicit about which kinds of content contain what kind of scriptability access, then the specifications for sniffing can reference the Internet Media Type registry to determine what kinds of sniffing constitute a 'privilege upgrade'.

Note that any sniffing can be a privilege upgrade if the recipient is buggy. Bugs can be fixed, however; spec violations are the real problem.
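
The idea of checking sniffing against registry-supplied scriptability information can be sketched as follows. The SCRIPTABILITY table, its numeric levels, and the function name is_privilege_upgrade are all assumptions made for illustration; no registry currently publishes such levels, and the values shown are not authoritative.

```python
# Hypothetical scriptability levels, as a registry might record them:
# 0 = no script execution, 1 = limited scripting, 2 = full scripting
# with access to the embedding origin. Values are illustrative only.
SCRIPTABILITY = {
    "text/plain": 0,
    "image/png": 0,
    "application/pdf": 1,
    "text/html": 2,
    "image/svg+xml": 2,
}


def is_privilege_upgrade(declared, sniffed, table=SCRIPTABILITY):
    """True if treating content as `sniffed` grants more than `declared` does."""
    # Types missing from the table are assumed to be maximally scriptable,
    # which is a conservative choice for the sniffed type.
    worst = max(table.values())
    return table.get(sniffed, worst) > table.get(declared, worst)
```

Under these assumed levels, sniffing content labeled "text/plain" as "text/html" is flagged as an upgrade, while sniffing in the other direction is not.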

Findings

(This section is still open, as events are happening outside of the TAG; it is still just an outline.)

Version numbers in-band
Use of references in MIME to point to (evolving) specifications?
"Happy Iana": make registration easier -- we support it
"Update MIME spec": review and make sure it meets web's needs
"Fix sniffing"
move "willful violations" out of specification and into registry or shadow registry

Specification References to other Evolving Specifications and Standards

(This section is derived from [SpecUpdate]. )

Specifications and registry entries frequently include references to other specifications. Sometimes those references are intended to reference a specific version or edition of the specification; in other cases, the reference is intended to allow for update and evolution.

Specifications evolve as new editions of them are written. Standards evolve as groups agree that a particular edition of a specification represents their common agreement. Specifications may have versions, editions, and forks.

Often, one specification will reference another specification. In some cases the reference is given informally; in others the reference consists of a URI which identifies the specification.

Discuss status, editions, versions, standards, specifications, in IETF, W3C, other organizations. IETF has internet-draft, RFC. RFC can be Informational, Standards track, Experimental. W3C has editor's draft, working draft, proposed, candidate, rec. Explain W3C spec.

Discuss considerations for what you might want to optimize: evolution, stability.

Implementations sometimes lag behind specifications, yet implementations of new editions of referenced specifications should be encouraged.

Discuss need for clarity against goals

Define "Implementation-dependent"

If a choice is described as 'implementation-dependent', then conformant implementations must document which choice they make.

(following text from HT and CMSMQ needs review -- do we really want to impose this in MIME registrations? Goes against the registry rules of "put MIME reg in spec and update it there too".)

When citing a W3C specification in the normative references section of another specification or a registry, care should be taken to be clear about the status of editions of the referenced specification other than the then-current one.

Left-Handed Sewer Flutes 1.0 (Second edition), P.D.Q. Bach and Peter Schickele, Editors. World Wide Web Consortium, 29 February 2009. The edition cited (http://www.w3.org/TR/2009/REC-lhsf-20090229/) is the earliest appropriate for use with this specification. Conformant implementations may follow the edition cited and/or any later edition(s). The latest edition of LHSF 1.0 is available at http://www.w3.org/TR/lhsf/. It is implementation-defined which editions of LHSF 1.0 are supported.

The appropriateness of this approach is based on the W3C rules regarding what constitutes an acceptable new edition of an existing W3C Recommendation. [cite]

For references to publications from other standards bodies with similar expectations regarding backwards compatibility, for example IETF or ISO, a similar approach to citation is also called for, along the following lines:

The Extension of MIME Content-Types to a New Medium, N. Borenstein and M. Linimon. Internet Engineering Task Force RFC 1437, 1 April 1993. RFC 1437 was current at the date of publication of this specification, but may be updated or obsoleted by later RFCs. Conformant implementations may follow the RFC cited and/or any later RFCs which update or obsolete it. It is implementation-defined which RFCs are supported.

Intelligent transport systems -- Physical characterisation of vehicles and equipment -- International airline seat pitch measurements. Part 1: Measurement architecture. International Standard ISO 314159-1:2009, 29 February 2009. The referenced specification may from time to time be amended, replaced by a new edition, or expanded by the addition of new parts. See http://www.iso.org/iso/home.htm for up-to-date information. Conformant implementations may follow the edition cited and/or any amendments etc. It is implementation-defined which amendments etc. are supported.

Or a general caveat:

Dated references below are to the earliest known or appropriate edition of the referenced work. The referenced works may be subject to revision, and conformant implementations may follow, and are encouraged to investigate the appropriateness of following, some or all more recent editions or replacements of the works cited. It is in each case implementation-defined which editions are supported.

and then simply

Left-Handed Sewer Flutes 1.0 (Second edition), P.D.Q. Bach and Peter Schickele, Editors. World Wide Web Consortium, 29 February 2009 (http://www.w3.org/TR/2009/REC-lhsf-20090229/). The latest edition of LHSF 1.0 is available at http://www.w3.org/TR/lhsf/.

The Extension of MIME Content-Types to a New Medium, N. Borenstein and M. Linimon. Internet Engineering Task Force RFC 1437, 1 April 1993.

Intelligent transport systems -- Physical characterisation of vehicles and equipment -- International airline seat pitch measurements. Part 1: Measurement architecture. International Standard ISO 314159-1:2009, 29 February 2009. See http://www.iso.org/iso/home.htm for up-to-date information.
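
The W3C citations above pair a dated edition URI with a "latest edition" URI. As a minimal sketch of how the two relate, the following splits a dated URI of the form used in the examples and derives the undated one. The function name split_dated_tr_uri and the regular expression are assumptions for illustration; the pattern covers only the URI layout shown in the examples and does not validate the date.

```python
import re

# Pattern for dated URIs of the shape
#   http://www.w3.org/TR/<year>/<STATUS>-<shortname>-<yyyymmdd>/
DATED_TR = re.compile(
    r"^(https?://www\.w3\.org/TR)/(\d{4})/([A-Za-z]+)-(.+)-(\d{8})/?$"
)


def split_dated_tr_uri(uri):
    """Split a dated edition URI; return None if it is not in dated form."""
    m = DATED_TR.match(uri)
    if m is None:
        return None
    base, _year, status, shortname, date = m.groups()
    return {
        "status": status,          # e.g. "REC"
        "shortname": shortname,    # e.g. "lhsf"
        "date": date,              # kept as a string, not validated
        "latest": f"{base}/{shortname}/",
    }
```

Applied to http://www.w3.org/TR/2009/REC-lhsf-20090229/, the "latest" entry is http://www.w3.org/TR/lhsf/, matching the pair of URIs given in the citation.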

Acknowledgements

(keep this section up to date? )

References

[BCP26]: Guidelines for Writing an IANA Considerations Section in RFCs, BCP 26, RFC ...

[IABext] Design Considerations for Protocol Extensions work in progress, Internet Draft

[Friendly] Friendly Registries, work in progress, Wiki Page, requirements and a place to gather explicit proposals

[HappyIana] https://www.ietf.org/mailman/listinfo/happiana

[mime-web-info] http://tools.ietf.org/html/draft-masinter-mime-web-info

[LinkRelation] http://lists.w3.org/Archives/Public/www-tag/2011May/0006.html

[sniff] http://tools.ietf.org/html/draft-ietf-websec-mime-sniff

[MediaTypeFinding] Internet Media Type registration, consistency of use TAG Finding 3 June 2002 (Revised 4 September 2002)

[MIMEGuidelines] Register an Internet Media Type for a W3C Spec (W3C guidelines on registering types)

[MediaRegUpdate] Media Type Specifications and Registration Procedures, Internet Draft, work in progress

[NoX] X- parameters harmful (Peter St. Andre)

[SpecUpdate] Best Practice for Referring to Specifications Which May Update [email draft, H. Thompson, C.M. Sperberg-McQueen]

[VendorFlap]

[HTML5-charset]	Hickson, I., “HTML5: A vocabulary and associated APIs for HTML and XHTML (8.2.2.1 Determining the character encoding).”
[RFC1521]	Borenstein, N. and N. Freed, “MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies,” RFC 1521.
[RFC1522]	Moore, K., “MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text,” RFC 1522, September 1993.

http://www.w3.org/2005/10/Process-20051014/tr

Comments welcome. ht [1] http://www.w3.org/2001/tag/2009/09/23-minutes#item03 [2] http://www.w3.org/2001/tag/group/track/actions/303

text/html MIME type RFC

[IAB-extension] Design Considerations for Protocol Extensions, B. Carpenter, B. Aboba, S. Cheshire, Internet Architecture Board (Internet Draft, work in progress).

[IAB-success] What Makes for a Successful Protocol?, RFC 5218, D. Thaler, B. Aboba, Internet Architecture Board, July 2008.

[IETF-extensions] Procedures for Protocol Extensions and Variations, RFC 4775, BCP 125, S. Bradner, B. Carpenter, T. Narten, December 2006.

[tag-versioning] Review of TAG work on versioning (email Larry Masinter 4 2009)

Appendix A: Identifiers used in Web Protocols and Languages

(The idea for this appendix is to make a list of the identifiers used in web protocols and languages and their identifier method, including, for registries, the name, location, and properties of the registry used. If possible, it would be great to note whether the versioning and extensibility process is (a) specified and (b) used in practice.)

HTTP
- methods (GET, PUT, POST, DELETE, PATCH, BREW)
- hosts (server types)
- paths (schemes, well-known locations, headers, header values)
- return codes
- content type main type
  - content type parameter values for some parameters (e.g., charset)
HTML
- link relations
CSS
JavaScript
...
URI
- URI schemes
XSLT
XML
...

Appendix B: Left Over bits

(Some things to talk about more or in related specs -- this whole section should go away or be turned into independent specs.)

Discussions: forking: two specs, different languages, same Internet Media Type, what goes in the registry? Is this covered?

Evolution toward normalization -- how to get registries to match reality over time

URI schemes identify protocols (do they?)

Precision of specifications vs. range of allowable implementations (google+ discussion w/hixie)

Normative requirements in specifications and their relationship to the behavior of implementations

Roles within a specification (reader, writer, client, server, proxy), Balance of power between implementors of different roles

Agents and user agents

Is there more about "living standard" ?

"follow-your-nose" for using URIs vs. registries? Or else convincing registries to have stable URIs for registered values.

Reasons for a "registry":

to avoid conflict (main purpose for all of the methods)
to set a bar and set review - you want to have a quality of anything introduced
to provide look-up
limit the number because there is a cost of introducing each one

For example, some protocol designers thought a new URI scheme could cause a lot of extra work. For HTML tags, when you introduce a new section, everyone who implements browsers needs to understand it.

But if you add metadata, it's no skin off anyone's nose. So you have two situations: one in which you need the whole community to get involved, and one in which anyone outside a sub-community can ignore the extension.

The discussion at http://lists.w3.org/Archives/Public/www-tag/2011Dec/0049.html cites Roy Fielding as calling mustUnderstand-based approaches "socially reprehensible". We need a decision tree: questions to answer to understand what kind of extension you're doing and which of these techniques you should use.

Compound extensibility points: when a new version of an extensibility point defines a new context in which old extensibility points are interpreted. (This is "willful violation" territory, if not also "sniffing" territory.) See the discussion following http://lists.w3.org/Archives/Public/www-archive/2011Nov/0009.html

User-Agent is a protocol element which identifies implementations and their versions

Extensibility through Modularity ... instead of one big spec you have multiple specs, so that the individual parts can evolve without having to review revisions of the whole thing... Good if you manage cross-references and make sure the modules are aware of requirements.

Implementations, Roles, Conformance, and Evolution: A specification describes a protocol, language, or protocol element, and rules for how implementations of the specification are to behave.

Following sections may also appear in other documents

Compatibility under evolution

This section would talk about forward and backward compatibility and requirements... Sometimes new evolutions are compatible with old ones, sometimes not. Compatibility has many meanings: for example, for a language, it is desirable if new documents work reasonably (with some fallback) with old readers and that old documents work reasonably with new readers. The meaning of "reasonably" varies: "works reasonably" is softened to "either works reasonably or gives clear warning about nature of problem (version mismatch)."
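
The softened notion of "works reasonably" can be sketched for a reader of a hypothetical dictionary-based document format. Everything here is an assumption for illustration: the field names, the integer "version" field, and the function read_document are invented, and the sketch takes no position on whether an in-band version indicator is advisable.

```python
SUPPORTED_VERSION = 1              # newest version this reader was written for
KNOWN_FIELDS = {"title", "body"}   # fields this reader understands


def read_document(doc):
    """Read what is understood; warn clearly about everything else."""
    warnings = []
    version = doc.get("version", SUPPORTED_VERSION)
    if version > SUPPORTED_VERSION:
        # Clear warning about the nature of the problem (version mismatch).
        warnings.append(
            f"document version {version} is newer than "
            f"supported version {SUPPORTED_VERSION}"
        )
    # Fallback: keep the fields we understand and ignore the rest.
    understood = {k: v for k, v in doc.items() if k in KNOWN_FIELDS}
    ignored = sorted(k for k in doc if k not in KNOWN_FIELDS and k != "version")
    if ignored:
        warnings.append("ignored unknown fields: " + ", ".join(ignored))
    return understood, warnings
```

An old document such as {"title": "t", "body": "b"} is read with no warnings, while a newer document with an extra field is still read, with the mismatch and the ignored field reported.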

Versions, Editions, Variants, Forking

(This section would note the issues of compatibility (backward, forward) and the relationship to version, edition, errata, corrigenda; discuss the debate about using and assigning version indicators, in-band or out-of-band, the DOCTYPE controversy, the HTML5 vs. HTML forever, JavaScript, etc.; and summarize the various pros and cons against the considerations of widespread deployment, motivation to 'reverse engineer', the 'quirks mode' problem, race-to-the-bottom.) See [IAB-extension] section 4.1 for version number discussion. Modes (quirks mode, near standards mode) in receivers attempting to adapt to evolution by mis-using version indicator. History of bad versioning practices?

Languages don't have versions: Specifications have versions. Most languages used on the web don't have versions, in that most implementations of readers of the language are written to try to adapt whatever data they get to whatever the implementors believe is the best they can do to satisfy users' expectations, as well as or better than any other implementation, subject to the internal constraints and architecture of the implementation. In these situations, where features are implemented incrementally and are not orthogonal extensions, using a version indicator to distinguish author's intent is unacceptable. The version indicator at best gives you some (but not a very good) idea of who to blame.

Implementations have versions: Implementations have versions, and, in particular, what authors of content might want to know (or select among) is what set of language or protocol features (or versions of those features) are supported (correctly) by the receiving implementations. This leads to doing content negotiation based on User-Agent.

Implementations, Roles, Conformance

(This section would talk about how a protocol has "roles": client, server, proxy, user agent; specifications describe a language used by many of the roles, and a protocol between multiple parties, each of which has a role in a particular transaction. Discuss the relationship between strict and loose conformance requirements; specs intended for multiple roles but reviewed only by one; the difference between "ease of implementation" and "breadth of allowable implementations". See http://intertwingly.net/blog/2009/04/08/HTML-Reunification)

Evolution of Protocols and Languages vs. Evolution of Natural Languages

This section would draw analogies and distinctions between how formal and natural languages evolve (quite a literature on this).

War Stories

Here lie some past controversies

The JavaScript version pragma

The HTML version indicator (The HTML doctype)