LTRU Canonicalization

A. Original
http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-21.html#canonical
B. Suggested
(I started with Addison's formulation, and then tried to combine all the suggestions that did not materially change the content; I also separated the examples and explanations into sub-bullets to keep them from muddling the main rules.)
C. Single canonical form

default canonical form => canonical form
extlang canonical form => extlang form
Since a particular language tag is sometimes used by many processes, language tags SHOULD always be created or generated in a canonical form.
Since a particular language tag is sometimes used by many processes, language tags SHOULD always be created and processed in a canonical form.

There are two canonical forms for language tags: the 'default' canonical form has no extlang subtag, while the 'extlang' canonical form has an extlang where possible. Normally, the default canonicalization is preferred. However, the extlang canonical form may be useful in environments where the presence of the macrolanguage subtag is considered beneficial in matching or selection (see <xref target="choiceUsingExtlang"></xref>).
[same as B]

[Remove from B: "There are two canonical forms... ...considered beneficial in matching or selection."]
A language tag is in canonical form when:

1. The tag is well-formed according to the rules in Section 2.1 (Syntax) and Section 2.2 (Language Subtag Sources and Interpretation)
A language tag is in a canonical form, either default or extended, when the tag is well-formed according to the rules in <xref target="syntax"/> and <xref target="sources"/> and it has been canonicalized by applying each of the following steps in order, using data from the IANA registry (see <xref target="ianaformat"/>):
A language tag is in a canonical form when the tag is well-formed according to the rules in <xref target="syntax"/> and <xref target="sources"/> and it has been canonicalized by applying each of the following steps in order, using data from the IANA registry (see <xref target="ianaformat"/>):
5. If more than one extension subtag sequence exists, the extension sequences are ordered into case-insensitive ASCII order by singleton subtag (that is, the subtag sequence '-a-babble' comes before '-b-warble')
[ moved to #1, as Kent did, because it makes the exposition of the option simpler. ]
1. Extension sequences are ordered into case-insensitive ASCII order by singleton subtag.
  • That is, the subtag sequence '-a-babble' comes before '-b-warble'.
[same as B]
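As an illustration of step 1, here is a minimal Python sketch. It assumes the input tag is already well-formed, treats every single-character subtag after the first position as a singleton that starts an extension sequence, and keeps any private-use sequence ('-x-...') unchanged at the end. The function name order_extensions is hypothetical.

```python
def order_extensions(tag):
    """Sort extension sequences by singleton, case-insensitively (step 1).

    Assumes `tag` is well-formed. A private-use sequence ('-x-...') is not
    an extension: it is kept, unchanged, at the end of the tag.
    """
    subtags = tag.split("-")
    if subtags[0].lower() == "x":  # the whole tag is private use
        return tag
    head, extensions, private = [], [], []
    current = head
    for i, sub in enumerate(subtags):
        if current is private:
            private.append(sub)  # everything after 'x' stays as-is
        elif i > 0 and sub.lower() == "x":
            current = private
            private.append(sub)
        elif i > 0 and len(sub) == 1:
            current = [sub]  # a singleton starts a new extension sequence
            extensions.append(current)
        else:
            current.append(sub)
    # Order sequences by their singleton; lower() makes the ASCII
    # comparison case-insensitive.
    extensions.sort(key=lambda seq: seq[0].lower())
    ordered = head + [sub for seq in extensions for sub in seq] + private
    return "-".join(ordered)


print(order_extensions("en-b-warble-a-babble"))  # en-a-babble-b-warble
```

Only the sequences move; the subtags inside each sequence keep their original order and case.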
2. Redundant or grandfathered tags that have a Preferred-Value mapping in the IANA registry (see Section 3.1 (Format of the IANA Language Subtag Registry)) MUST be replaced with their mapped value. These items are either deprecated mappings created before the adoption of this document (such as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are the result of later registrations or additions to this document (for example, "zh-hakka" was deprecated in favor of the ISO 639-3 code 'hak' when this document was adopted). These mappings SHOULD be done before additional processing, since there can be additional changes to subtag values. The field-body of the Preferred-Value for grandfathered and redundant tags is an "extended language range" ([RFC4647], Phillips, A. and M. Davis, “Matching of Language Tags,” September 2006) and might consist of more than one subtag.
2. Redundant or grandfathered tags are replaced by their Preferred-Value, if there is one.
  • These items are either deprecated mappings created before the adoption of this document (such as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are the result of later registrations or additions to this document (for example, "zh-hakka" was deprecated in favor of the ISO 639-3 code 'hak' when this document was adopted).
  • The field-body of the Preferred-Value for grandfathered and redundant tags is an "extended language range" (<xref target="RFC4647"></xref>) and might consist of more than one subtag.
[same as B]
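A minimal Python sketch of step 2 follows. The lookup table is a hypothetical excerpt containing only the mappings cited above, not data loaded from the IANA registry, and the sketch assumes the whole input tag is compared, case-insensitively, against the grandfathered and redundant tags. The names TAG_PREFERRED_VALUE and replace_whole_tag are hypothetical.

```python
# Hypothetical excerpt of grandfathered/redundant Preferred-Value mappings,
# limited to the examples quoted above (not the full IANA registry).
TAG_PREFERRED_VALUE = {
    "no-nyn": "nn",
    "i-klingon": "tlh",
    "zh-hakka": "hak",
}


def replace_whole_tag(tag):
    """Replace a grandfathered or redundant tag by its Preferred-Value (step 2).

    Language tags are case-insensitive, so the lookup uses the lowercased tag.
    Tags without a mapping are returned unchanged.
    """
    return TAG_PREFERRED_VALUE.get(tag.lower(), tag)


for t in ["no-nyn", "I-Klingon", "zh-hakka", "en-US"]:
    print(t, "->", replace_whole_tag(t))
```

Because the Preferred-Value is an extended language range, a fuller table could hold values of more than one subtag; the sketch simply returns whatever string is stored.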
3. Subtags of type 'extlang' SHOULD be mapped to their Preferred-Value. The field-body of the Preferred-Value for extlangs is an "extended language range" and typically maps to a primary language subtag. For example, the subtag sequence "zh-hak" (Chinese, Hakka) would be replaced with the tag "hak" (Hakka).

4. Other subtags that have a Preferred-Value field in the IANA registry (see Section 3.1 (Format of the IANA Language Subtag Registry)) MUST be replaced with their mapped value. Most of these are either Region subtags where the country name or designation has changed or clerical corrections to ISO 639-1.
3. Subtags are replaced by their Preferred-Value, if there is one. For extlangs, the original primary language subtag is also replaced if there is a primary language subtag in the Preferred-Value.
  • The field-body of the Preferred-Value for extlangs is an "extended language range" and typically maps to a primary language subtag. For example, the subtag sequence "zh-hak" (Chinese, Hakka) would be replaced with the tag "hak" (Hakka).
  • Most of the non-extlang subtags are either Region subtags where the country name or designation has changed or clerical corrections to ISO 639-1.
[same as B]
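A minimal Python sketch of step 3 follows. Both tables are hypothetical excerpts: the extlang record for 'hak' mirrors the example above, and the region mapping of 'BU' (Burma) to 'MM' (Myanmar) is included only to illustrate a region whose designation changed. To stay short, the sketch classifies subtags by position and length, which is only safe for well-formed tags that have already been through step 2, and it handles only extlang and region subtags. The function name replace_subtags is hypothetical.

```python
# Hypothetical registry excerpts (not the full IANA registry).
EXTLANG_RECORDS = {"hak": {"prefix": "zh", "preferred": "hak"}}
REGION_PREFERRED = {"bu": "MM"}  # Burma -> Myanmar


def replace_subtags(tag):
    """Replace subtags by their Preferred-Value (step 3).

    Assumes a well-formed tag that has already been through step 2, so
    subtags can be classified by position and length.
    """
    subtags = tag.split("-")
    # An extlang can only be the second subtag and is exactly three letters.
    if len(subtags) > 1 and len(subtags[1]) == 3 and subtags[1].isalpha():
        record = EXTLANG_RECORDS.get(subtags[1].lower())
        if record and record["prefix"] == subtags[0].lower():
            # The Preferred-Value contains a primary language subtag, so it
            # replaces both the original primary language and the extlang.
            subtags = record["preferred"].split("-") + subtags[2:]
    for i, sub in enumerate(subtags):
        if i > 0 and len(sub) == 1:
            break  # extensions and private use are not touched
        if i > 0 and len(sub) == 2 and sub.isalpha():
            # A two-letter subtag after the first position is a region.
            subtags[i] = REGION_PREFERRED.get(sub.lower(), sub)
    return "-".join(subtags)


for t in ["zh-hak", "zh-hak-CN", "my-BU", "en-u-bu"]:
    print(t, "->", replace_subtags(t))
```

The last example shows why the loop stops at the first singleton: 'bu' inside an extension is not a region subtag and must not be remapped.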

4. In the extlang canonical form (but not the default canonical form), primary language subtags that are also extlang subtags are prepended with the extlang's Prefix.
  • For example, "hak-CN" (Hakka, China) has primary language subtag 'hak', which in turn has an 'extlang' record with a Prefix 'zh' (Chinese). The 'extlang' canonical form would be "zh-hak-CN" (Chinese, Hakka, China).
  • Note that Step 4 may restore a subtag that was removed by Step 3.
[Replace B: "4. In the extlang canonical form (but not... ...Step 3." with the following]

The canonical form has no extlang subtag. There is an alternate 'extlang form' that modifies the canonical form so that primary language subtags that are also extlang subtags are prepended with the extlang's Prefix. This form may be useful in environments where the presence of the Prefix subtag is considered beneficial in matching or selection (see <xref target="choiceUsingExtlang"></xref>).
  • For example, "hak-CN" (Hakka, China) has primary language subtag 'hak', which in turn has an 'extlang' record with a Prefix 'zh' (Chinese). The extlang form would be "zh-hak-CN" (Chinese, Hakka, China).
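The extlang form can be sketched as a post-processing step applied to a tag that is already canonical. The one-entry table is a hypothetical excerpt built from the 'hak'/'zh' example above; a fuller implementation would collect the Prefix of every extlang record in the registry. The names EXTLANG_PREFIX and to_extlang_form are hypothetical.

```python
# Hypothetical excerpt: primary language subtags that also have an extlang
# record, mapped to that record's Prefix.
EXTLANG_PREFIX = {"hak": "zh"}


def to_extlang_form(canonical_tag):
    """Convert a canonical tag to the extlang form.

    If the primary language subtag is also an extlang subtag, the extlang's
    Prefix is prepended; otherwise the tag is returned unchanged.
    """
    subtags = canonical_tag.split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0].lower())
    if prefix is None:
        return canonical_tag
    return "-".join([prefix] + subtags)


print(to_extlang_form("hak-CN"))  # zh-hak-CN
print(to_extlang_form("en-US"))   # en-US
```

Applied to "hak-CN", the canonical form of "zh-hak-CN", it restores the 'zh' subtag that extlang replacement removed.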




Comments