A. Original http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-21.html#canonical
|
B. Suggested
(I started with Addison's
formulation, and then tried to combine all the suggestions that did not
materially change the content; I also separated out the examples and
explanation into sub-bullets to keep them from muddling the main
rules.) |
C. Single canonical form
default canonical form => canonical form
extlang canonical form => extlang form |
| Since a particular language tag is sometimes used by many processes,
language tags SHOULD always be created or generated in a canonical
form.
|
Since a particular language tag is sometimes used by
many processes, language tags SHOULD always be created and processed in
a canonical form.
There are two canonical forms for language tags: the
'default' canonical form has no extlang subtag, while the 'extlang'
canonical form has an extlang where possible. Normally, the default
canonicalization is preferred. However, the extlang canonical form
may be useful in environments where the presence of the macrolanguage
subtag is considered beneficial in matching or selection (see <xref
target="choiceUsingExtlang"></xref>).
|
[same as B]
[Remove from B: "There are two canonical forms... ...considered beneficial in matching or selection."]
|
A language tag is in canonical form when:
1. The tag is well-formed according to the rules in Section 2.1 (Syntax) and
Section 2.2 (Language Subtag Sources and Interpretation)
|
A language tag is in a canonical form, either default or
extlang, when the tag is well-formed according to the rules in <xref
target="syntax"/> and <xref target="sources"/> and it has been
canonicalized by applying each of the following steps in order, using
data from the IANA registry (see <xref target="ianaformat"/>): |
A language tag is in a canonical form when the tag is well-formed according to the rules in <xref
target="syntax"/> and <xref target="sources"/> and it has been
canonicalized by applying each of the following steps in order, using
data from the IANA registry (see <xref target="ianaformat"/>): |
5. If more than one extension subtag sequence exists, the extension
sequences are ordered into case-insensitive ASCII order by singleton
subtag (that is, the subtag sequence '-a-babble' comes before '-b-warble').
[ moved to #1, as Kent did, because it makes the exposition of the option simpler. ]
|
1. Extension sequences are ordered into case-insensitive ASCII
order by singleton subtag.
- That is, the subtag sequence '-a-babble'
comes before '-b-warble'.
|
[same as B] |
| 2. Redundant or grandfathered tags that have a Preferred-Value mapping
in the IANA registry (see Section 3.1 (Format of the IANA Language Subtag Registry))
MUST be replaced with their mapped value. These items either are
deprecated mappings created before the adoption of this document (such
as the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are the
result of later registrations or additions to this document (for
example, "zh-hakka" was deprecated in favor of the ISO 639-3 code 'hak'
when this document was adopted). These mappings SHOULD be done before
additional processing, since there can be additional changes to subtag
values. The field-body of the Preferred-Value for grandfathered and
redundant tags is an "extended language range" ([RFC4647] (Phillips, A. and M. Davis, “Matching of Language Tags,” September 2006.)) and might consist of more than one subtag.
|
2. Redundant or grandfathered tags are replaced by their Preferred-Value, if there is one.
- These
items are either deprecated mappings created before the adoption of
this document (such as the mapping of "no-nyn" to "nn" or "i-klingon"
to "tlh") or are the result of later registrations or additions to this
document (for example, "zh-hakka" was deprecated in favor of the ISO
639-3 code 'hak' when this document was adopted).
- The
field-body of the Preferred-Value for grandfathered and redundant tags
is an "extended language range" (<xref
target="RFC4647"></xref>) and might consist of more than one
subtag.
|
[same as B] |
3. Subtags of type 'extlang' SHOULD be mapped to their
Preferred-Value. The field-body of the Preferred-Value for extlangs is
an "extended language range" and typically maps to a primary language
subtag. For example, the subtag sequence "zh-hak" (Chinese, Hakka)
would be replaced with the tag "hak" (Hakka).
4. Other subtags that have a Preferred-Value field
in the IANA registry (see Section 3.1 (Format of the IANA Language Subtag Registry))
MUST be replaced with their mapped value. Most of these are either
Region subtags where the country name or designation has changed or
clerical corrections to ISO 639-1.
|
3. Subtags are replaced by their Preferred-Value, if there is one. For extlangs, the original primary language
subtag is also replaced if there is a primary language subtag in the Preferred-Value.
- The field-body of the Preferred-Value for extlangs is
an "extended language range" and typically maps to a primary language
subtag. For example, the subtag sequence "zh-hak" (Chinese,
Hakka) would be replaced with the tag "hak" (Hakka).
- Most of the non-extlang subtags that have a Preferred-Value are
either Region subtags whose country name or designation has changed, or
clerical corrections to ISO 639-1.
|
[same as B] |
|
4. In the extlang canonical form (but not the default canonical form), primary language subtags that are also extlang subtags are prepended with the extlang's Prefix.
- For example, "hak-CN" (Hakka, China) has primary language subtag 'hak', which in turn has an 'extlang' record with a Prefix 'zh' (Chinese). The 'extlang' canonical form would be "zh-hak-CN" (Chinese, Hakka, China).
- Note that Step 4 may restore a subtag that was removed by Step 3.
|
[Replace B: "4. In the extlang canonical form (but not... ...Step 3." by the following]
The canonical form has no extlang subtag. There is an alternate 'extlang form'
that modifies the canonical form so that primary language subtags that are also
extlang subtags are prepended with the extlang's Prefix. This form
may be useful in environments where the presence of the Prefix
subtag is considered beneficial in matching or selection (see <xref
target="choiceUsingExtlang"></xref>).
- For example, "hak-CN" (Hakka, China) has primary language subtag 'hak', which in turn has an 'extlang' record with a Prefix 'zh' (Chinese). The extlang form would be "zh-hak-CN" (Chinese, Hakka, China).
|