Label Categorization
The following is a set of non-overlapping categories covering all labels composed of characters from [\-A-Za-z0-9], with examples. It elaborates on the distinctions made in defs.
Names for various subgroupings are also useful. For example, Terms 1-5 are all "putative A-Labels" or "ACE Prefix" labels. Terms 4-6 could be called "Broken IDN". Terms 2-6 could be called "Invalid IDN".
Relation between Unicode and Punycode
Punycode, with the ACE prefix added, maps every Unicode string (reversibly) to one of the following:
1. A-Label
2. Fails-IDN5
3. Fails-IDN4-only
4. Overlong Punycode
Thus for each of 1-4 there is a corresponding Unicode String (Label):
1. U-Label
2. Unicode-Fails-IDN5
3. Unicode-Fails-IDN4-only
4. Overlong-Unicode
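The mapping above can be sketched with Python's built-in "punycode" codec, which implements raw RFC 3492 Punycode and performs no IDNA validation. That is the point here: the ACE result can land in any of categories 1-4, and decoding recovers the original string. The helper names (`to_ace`, `from_ace`, `is_overlong`) are hypothetical, and reading "Overlong" as "longer than the 63-octet DNS label limit" is an assumption about what that category means.

```python
# Sketch: raw Punycode mapping between Unicode strings and ACE-prefixed labels.
# Helper names are hypothetical; no IDNA2008 validation is performed.

ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS limit on a single label (RFC 1035)


def to_ace(u_string: str) -> str:
    """Encode any Unicode string with RFC 3492 Punycode and add the ACE prefix.

    The result may be an A-label, may fail IDNA checks, or may be overlong.
    """
    return ACE_PREFIX + u_string.encode("punycode").decode("ascii")


def from_ace(ace: str) -> str:
    """Invert to_ace: strip the ACE prefix and decode the Punycode part."""
    if not ace.lower().startswith(ACE_PREFIX):
        raise ValueError(f"{ace!r} lacks the ACE prefix")
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_overlong(ace: str) -> bool:
    """Assumption: "Overlong" means the ACE form exceeds 63 octets."""
    return len(ace.encode("ascii")) > MAX_LABEL_OCTETS


print(to_ace("bücher"))                       # xn--bcher-kva
print(from_ace("xn--bcher-kva") == "bücher")  # True
print(is_overlong(to_ace("ü" * 60)))          # True
```

Sixty non-basic code points need at least one Punycode digit each, so the ACE form of "ü" * 60 is at least 64 characters long, which is why it lands in the overlong category.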
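Note that an apparent Punycode string might not map to Unicode at all: for example, the "9" in "xn--9" is an incomplete Punycode integer, so decoding fails. (The "a" in "xn--a", by contrast, does decode, to U+0080, a control character that IDNA2008 disallows.)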
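A decoder therefore has to treat failure as a possible outcome. The sketch below, again using Python's raw RFC 3492 "punycode" codec, returns `None` when the part after the ACE prefix is not well-formed Punycode. The function name `try_decode_ace` is hypothetical.

```python
from typing import Optional

ACE_PREFIX = "xn--"


def try_decode_ace(label: str) -> Optional[str]:
    """Return the Unicode string behind an ACE-prefixed label, or None if the
    part after the prefix is not well-formed Punycode (RFC 3492)."""
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError(f"{label!r} lacks the ACE prefix")
    try:
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # The codec raises UnicodeError (or a subclass) for malformed Punycode,
        # e.g. an incomplete integer or an invalid digit; non-ASCII input ends up here too.
        return None


print(try_decode_ace("xn--9"))                  # None
print(try_decode_ace("xn--bcher-kva") is None)  # False
```

Here "9" has digit value 35, which tells the decoder another digit follows; none does, so decoding fails.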
The term "LDH label" is defined in:
2.3.1.2. LDH-label and Internationalized Label
These specifications use the term "LDH-label" strictly to refer to an
all-ASCII label that obeys the preferred syntax (often known as
"hostname" (from RFC 952 [RFC0952]) or "LDH") conventions and that is
not an IDN.
That implies that an LDH-label is any LDH-conforming label that is not an A-label. The diagram below (section 2.3.1.6 in defs), however, shows an LDH-label as being neither an A-label nor a Broken IDN.
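To make the discrepancy concrete, here is a sketch of the two readings side by side. Full A-label validation is out of scope, so `is_a_label` is a caller-supplied predicate, and "Broken IDN" is approximated (an assumption) as an ACE-prefixed label that is not an A-label; this ignores non-ACE examples such as "abc--def" in the diagram. All function names and the toy predicate are hypothetical.

```python
import string
from typing import Callable

ACE_PREFIX = "xn--"
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_conforming(label: str) -> bool:
    """Letters, digits and hyphens, with no leading or trailing hyphen.

    Length limits are deliberately ignored in this sketch.
    """
    return (
        bool(label)
        and all(c in LDH_CHARS for c in label)
        and not label.startswith("-")
        and not label.endswith("-")
    )


def ldh_label_per_text(label: str, is_a_label: Callable[[str], bool]) -> bool:
    """Reading of the quoted text: LDH-conforming and not an A-label."""
    return is_ldh_conforming(label) and not is_a_label(label)


def ldh_label_per_diagram(label: str, is_a_label: Callable[[str], bool]) -> bool:
    """Reading of the diagram: LDH-conforming, neither an A-label nor Broken IDN."""
    # Assumption: Broken IDN approximated as ACE-prefixed but not a valid A-label.
    broken_idn = label.lower().startswith(ACE_PREFIX) and not is_a_label(label)
    return ldh_label_per_text(label, is_a_label) and not broken_idn


KNOWN_A_LABELS = {"xn--bcher-kva"}


def toy_is_a_label(label: str) -> bool:
    """Hypothetical stand-in for full IDNA2008 A-label validation."""
    return label.lower() in KNOWN_A_LABELS


print(ldh_label_per_text("xn--9", toy_is_a_label))     # True
print(ldh_label_per_diagram("xn--9", toy_is_a_label))  # False
```

The two readings agree on ordinary labels such as "example" and differ exactly on Broken IDN labels like "xn--9".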
 _______________________      _______________________
|     ASCII Labels      |    |      Non-ASCII        |
|                       |    |                       |
|  ___________________  |    |  ___________________  |
| |LDH-conforming (1) | |    | |    U-label (2)    | |
| |                   | |    | |___________________| |
| |  ________________ | |    | |                   | |
| | |   LDH-label    || |    | |   Binary Label    | |
| | |________________|| |    | |   (including      | |
| | |    A-label     || |    | |   high bit on)    | |
| | |________________|| |    | |___________________| |
| | |                || |    | |                   | |
| | |   Broken IDN   || |    | |   Bit String      | |
| | |  e.g., xn--?,  || |    | |   Label           | |
| | |  abc--def      || |    | |___________________| |
| | |________________|| |    |_______________________|
| |___________________| |
|  ___________________  |
| |Not-LDH-Conforming | |
| |                   | |
| |  ________________ | |
| | | SRV & SRV-like || |
| | |   e.g., _tcp   || |
| | |________________|| |
| | |   Leading or   || |
| | |   trailing     || |
| | |   hyphens      || |
| | |________________|| |
| | |  Other non-LDH || |
| | |  ASCII chars   || |
| | |  e.g., #$%&_   || |
| | |________________|| |
| |___________________| |
|_______________________|
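The ASCII half of the diagram can be read as a decision procedure. The sketch below assumes that "SRV-like" means a leading underscore followed only by LDH characters, ignores label-length limits, and stops at the ACE-prefix test, because separating A-labels from Broken IDN requires full IDNA2008 validation (so a non-ACE label like "abc--def" lands in the "no ACE prefix" bucket). The function name and category strings are hypothetical.

```python
import string

LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def classify_ascii_label(label: str) -> str:
    """Map a label to a leaf of the ASCII side of the diagram (sketch)."""
    if not label:
        raise ValueError("empty label")
    if not label.isascii():
        return "Non-ASCII"
    rest = label[1:]
    # Assumption: SRV-like = leading underscore followed only by LDH characters.
    if label.startswith("_") and rest and all(c in LDH_CHARS for c in rest):
        return "SRV & SRV-like"
    if all(c in LDH_CHARS for c in label):
        if label.startswith("-") or label.endswith("-"):
            return "Leading or trailing hyphens"
        # Splitting A-label from Broken IDN needs full IDNA2008 validation.
        if label.lower().startswith("xn--"):
            return "LDH-conforming, ACE prefix"
        return "LDH-conforming, no ACE prefix"
    return "Other non-LDH ASCII chars"


for example in ["_tcp", "-abc", "a#b", "example", "xn--bcher-kva"]:
    print(example, "->", classify_ascii_label(example))
```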
The following statement uses the term "U-label", which is incorrect. Applying sections 5.1-5.5 does not guarantee that the result is a U-label, since those sections do not require the Bidi or contextual rules to be applied. Similarly, the term "A-label" cannot be used in sections 5.6 and 5.7, since the putative A-label may not be one.
5.6. Punycode Conversion
The validated string, a U-label, is converted to an A-label using the
Punycode algorithm with the ACE prefix added.
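The gap can be illustrated with a label that a bare Punycode step accepts but the Bidi rule (RFC 5893) rejects. "a\u05d0" (LATIN SMALL LETTER A followed by HEBREW LETTER ALEF) consists of ordinary PVALID letters, yet it fails whichever way its direction is determined: read as LTR (first character has Bidi class L) it breaks Rule 5, and read as RTL (it contains an R character) it breaks Rule 2. The sketch checks only Rules 1, 2 and 5, a simplification rather than a full Bidi-rule implementation, and the function names are hypothetical.

```python
import unicodedata

# Allowed Bidi classes from RFC 5893 Rule 2 (RTL labels) and Rule 5 (LTR labels).
# Rules 3, 4 and 6 are not checked in this sketch.
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def passes_bidi_rules_1_2_5(label: str) -> bool:
    """Partial check: Rule 1, then Rule 2 or Rule 5 chosen by the first character."""
    classes = [unicodedata.bidirectional(c) for c in label]
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False  # Rule 1
    allowed = LTR_ALLOWED if classes[0] == "L" else RTL_ALLOWED
    return all(cls in allowed for cls in classes)


def punycode_step(validated: str) -> str:
    """A bare section-5.6-style step: Punycode plus the ACE prefix, no further checks."""
    return "xn--" + validated.encode("punycode").decode("ascii")


label = "a\u05d0"  # LATIN SMALL LETTER A (class L) + HEBREW LETTER ALEF (class R)
print(passes_bidi_rules_1_2_5(label))  # False
print(punycode_step(label))            # an ACE string is produced anyway
```

Because the conversion step succeeds regardless, its output is only a putative A-label until the Bidi and contextual rules have also been applied.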