The following is a non-overlapping categorization of all labels composed of characters from
[\-A-Za-z0-9], with examples. It elaborates on the distinctions made in
defs.
| # | Label Term | Pattern | Definition | Examples |
|---|---|---|---|---|
| 1 | A-Label | xn--* | The * is valid Punycode and passes the IDN tests | xn--bcker-gra ("bäcker") |
| 2 | Fails-IDN5 | xn--* | The * is valid Punycode, <= 59 characters long, and fails the IDN Domain Name Lookup Protocol (Sec 5) | xn--g6h ("♥"), xn--Bcker-gra ("Bäcker") |
| 3 | Fails-IDN4-only | xn--* | The * is valid Punycode, <= 59 characters long, and fails the IDN Registration Protocol (Sec 4) but not Domain Name Lookup (Sec 5) | xn--a-0hc ("aא") |
| 4 | Overlong Punycode | xn--* | The * is valid Punycode but 60 bytes or more long (invalid in DNS) | xn--o39a20gda89ku8a4mt2wnra67lzvaw9qrno41a245bf6am0w14sdib7zvppbz309c6da ("가낗나뇲다댯라럈마먔ᄇ뱟사샷악얐ᄌ쟛차챴카컀") |
| 5 | Invalid Punycode | xn--* | The * is invalid Punycode | xn--a, xn-- |
| 6 | Invalid ACE Prefix | !x*--*, *!n--*, !x!n--* | The label has hyphens in positions 3 and 4 but does not start with "xn" | ab--g6h |
| 7 | Valid LDH | RFC 952 except above | length < 64,... | abc |
| 8 | Other ASCII | all but above | | $a3& |
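The table can be read as a decision procedure. The following is a minimal Python sketch under several assumptions:

- Python's built-in "punycode" codec (RFC 3492) serves as the Punycode decoder.
- The IDNA Sec 4 and Sec 5 tests are supplied by the caller as hypothetical predicates, `passes_registration` and `passes_lookup`.
- Row 7 uses the common hostname syntax: letters, digits and hyphens, with no leading or trailing hyphen, 1-63 characters. The table elides that row's full definition, so this is an assumption.
- What counts as "invalid Punycode" depends on the decoder. Here any decoder error, or an empty result, is treated as row 5.

```python
import re

# Assumed LDH syntax for row 7 (the table elides the full definition):
# letters, digits and hyphens, no leading or trailing hyphen, 1-63 chars.
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify(label, passes_registration, passes_lookup):
    """Return the table row (1-8) for an ASCII label.

    passes_registration and passes_lookup are hypothetical, caller-supplied
    predicates standing in for the IDNA Sec 4 and Sec 5 tests; each takes
    the decoded Unicode label.
    """
    if label[2:4] == "--":
        if label[:2].lower() != "xn":
            return 6  # Invalid ACE Prefix
        try:
            # Python's built-in RFC 3492 codec raises UnicodeError on
            # digit strings it cannot decode.
            decoded = label[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            return 5  # Invalid Punycode
        if not decoded:
            return 5  # "xn--" with nothing after it
        if len(label) - 4 >= 60:
            return 4  # Overlong Punycode (label would exceed 63 octets)
        if not passes_lookup(decoded):
            return 2  # Fails-IDN5
        if not passes_registration(decoded):
            return 3  # Fails-IDN4-only
        return 1  # A-Label
    if LDH_RE.fullmatch(label):
        return 7  # Valid LDH
    return 8  # Other ASCII
```

Rows 2 and 3 are checked only after the length test, which mirrors the "<= 59" condition in their definitions.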
Names for various subgroupings are also useful. For example, Terms
1-5 are all "putative A-Labels" or "ACE Prefix" labels. Terms 4-6 could
be called "Broken IDN". Terms 2-6 could be called "Invalid IDN".
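Unlike the eight terms themselves, these subgroupings overlap. The following small Python sketch records each name with the set of term numbers it covers. The dictionary and the `subgroups_of` helper are illustrative only.

```python
# Subgroup names from the text, mapped to the term (row) numbers they cover.
SUBGROUPS = {
    "putative A-Label (ACE Prefix)": {1, 2, 3, 4, 5},
    "Broken IDN": {4, 5, 6},
    "Invalid IDN": {2, 3, 4, 5, 6},
}


def subgroups_of(term):
    """Return, sorted, the names of every subgroup containing the term."""
    return sorted(name for name, terms in SUBGROUPS.items() if term in terms)
```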
Relation between Unicode and Punycode
All Unicode strings are mapped (reversibly) by Punycode, with the ACE prefix added, to one of the following:
- A-Label
- Fails-IDN5
- Fails-IDN4-only
- Overlong Punycode
Thus, for each of 1-4 there is a corresponding kind of Unicode string (label):
- U-Label
- Unicode-Fails-IDN5
- Unicode-Fails-IDN4-only
- Overlong-Unicode
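The round trip can be checked with Python's built-in "punycode" codec. That codec implements RFC 3492 but performs none of the IDNA tests. The helper names `to_ace`, `from_ace` and `is_overlong` are hypothetical.

The sketch shows that Punycode alone fixes only whether a label is overlong (category 4). Telling categories 1-3 apart still requires the IDNA tests.

```python
def to_ace(unicode_label):
    """Punycode-encode a label and prepend the ACE prefix (no IDNA checks)."""
    return "xn--" + unicode_label.encode("punycode").decode("ascii")


def from_ace(ace_label):
    """Drop the 4-character ACE prefix and Punycode-decode the remainder."""
    return ace_label[4:].encode("ascii").decode("punycode")


def is_overlong(ace_label):
    """Category 4: the part after the prefix is 60 characters or more."""
    return len(ace_label) - 4 >= 60


# Expected: xn--bcker-gra, xn--Bcker-gra, xn--g6h, xn--a-0hc
examples = [to_ace(w) for w in ["b\u00e4cker", "B\u00e4cker", "\u2665", "a\u05d0"]]
```

Punycode copies basic (ASCII) code points literally, so "Bäcker" keeps its capital B in "xn--Bcker-gra".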
Note that some apparent Punycode strings, such as the "a" in "xn--a", might not map to Unicode.
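A decoder reports such strings as errors. The following minimal sketch uses Python's built-in "punycode" codec, which raises `UnicodeError` on digit strings it cannot decode. The function name `maps_to_unicode` is hypothetical. Exactly which edge cases a decoder rejects varies between implementations, so the result reflects this one codec rather than a definition.

```python
def maps_to_unicode(ace_label):
    """True if the text after "xn--" decodes as Punycode (RFC 3492).

    Relies on Python's built-in codec; other decoders may accept or reject
    different edge cases.
    """
    try:
        ace_label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    return True
```

For example, "xn--g6h" decodes (to "♥"). "xn--9" does not, because its variable-length integer is left unfinished.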
Inconsistency in current defs
The term "LDH-label" is defined in defs:
2.3.1.2. LDH-label and Internationalized Label
These specifications use the term "LDH-label" strictly to refer to an
all-ASCII label that obeys the preferred syntax (often known as
"hostname" (from RFC 952 [RFC0952]) or "LDH") conventions and that is
not an IDN.
That implies that an LDH-label is any label with valid LDH syntax that is not an A-label. However, the diagram
below (section 2.3.1.6 in defs) shows an LDH-label as being neither an A-label nor a Broken IDN.
```
 _______________________     _______________________
| ASCII Labels          |   | Non-ASCII             |
|                       |   |                       |
|  ___________________  |   |  ___________________  |
| |LDH-conforming (1) | |   | | U-label (2)       | |
| |                   | |   | |___________________| |
| |  ________________ | |   | |                   | |
| | | LDH-label      || |   | | Binary Label      | |
| | |________________|| |   | | (including        | |
| | | A-label        || |   | |  high bit on)     | |
| | |________________|| |   | |___________________| |
| | | Broken IDN     || |   | |                   | |
| | | e.g., xn--?,   || |   | | Bit String        | |
| | |       abc--def || |   | | Label             | |
| | |________________|| |   | |___________________| |
| |___________________| |   |_______________________|
|  ___________________  |
| |Not-LDH-Conforming | |
| |                   | |
| |  ________________ | |
| | | SRV & SRV-like || |
| | | e.g., _tcp     || |
| | |________________|| |
| | | Leading or     || |
| | | trailing       || |
| | | hyphens        || |
| | |________________|| |
| | | Other non-LDH  || |
| | | ASCII chars    || |
| | | e.g., #$%&_    || |
| | |________________|| |
| |___________________| |
|_______________________|
```
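The two readings of "LDH-label" diverge for labels such as "ab--g6h" (an Invalid ACE Prefix label).

The following minimal Python sketch contrasts them under these assumptions:

- LDH syntax is taken to be the usual hostname rule: letters, digits and hyphens, with no leading or trailing hyphen, 1-63 characters.
- `is_a_label` is a hypothetical caller-supplied predicate.
- As the diagram's examples suggest ("xn--?", "abc--def"), A-labels and Broken IDN together cover every LDH-conforming label with hyphens in positions 3 and 4.

```python
import re

# Assumed hostname (LDH) syntax: letters, digits, hyphens; no leading or
# trailing hyphen; 1-63 characters.
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def ldh_label_per_text(label, is_a_label):
    """Sec 2.3.1.2 wording: LDH syntax and not an A-label (is_a_label is a
    hypothetical predicate supplied by the caller)."""
    return bool(LDH_RE.fullmatch(label)) and not is_a_label(label)


def ldh_label_per_diagram(label):
    """Sec 2.3.1.6 diagram: LDH syntax and neither A-label nor Broken IDN,
    assumed here to mean no hyphens in positions 3 and 4."""
    return bool(LDH_RE.fullmatch(label)) and label[2:4] != "--"
```

Under the text reading, "ab--g6h" is an LDH-label because it is not an A-label. Under the diagram reading it is not an LDH-label, since it falls in the Broken IDN box.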
Inconsistency in protocol
The following statement uses the term "U-label". This is incorrect: applying
sections 5.1-5.5 does not guarantee that the result is a U-label, since those
sections do not require the BIDI or Context rules to be applied. Similarly, the
term "A-label" cannot be used in Sec 5.6 and 5.7, since the putative A-label
may not be one.
5.6. Punycode Conversion
The validated string, a U-label, is converted to an A-label using the
Punycode algorithm with the ACE prefix added.
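The following minimal Python sketch shows the conversion step as written, using the built-in "punycode" codec. The function name `putative_a_label` is hypothetical. The 63-character check is an added assumption based on the DNS label limit, not text quoted from protocol.

Feeding it "aא", which fails only the Registration tests (Sec 4), still yields the well-formed "xn--a-0hc". The conversion cannot tell whether its input was a true U-label, so its output is only a putative A-label.

```python
def putative_a_label(validated):
    """Punycode-encode a lookup-validated string and add the ACE prefix.

    The result is an A-label only if `validated` is really a U-label, which
    Secs 5.1-5.5 alone do not guarantee (no BIDI or Context rules).
    """
    ace = "xn--" + validated.encode("punycode").decode("ascii")
    if len(ace) > 63:  # assumed check: DNS labels are at most 63 octets
        raise ValueError("label exceeds 63 octets")
    return ace
```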