Label Categorization

The following is a non-overlapping categorization of all labels consisting of characters from [\-A-Za-z0-9], with examples. It elaborates on the distinctions made in defs.

Names for various subgroupings are also useful. For example, Terms 1-5 are all "putative A-Labels" or "ACE Prefix" labels. Terms 4-6 could be called "Broken IDN". Terms 2-6 could be called "Invalid IDN".

Relation between Unicode and Punycode

All Unicode strings are mapped (reversibly) by Punycode to one of the following (adding the ACE prefix):

    1. A-Label

    2. Fails-IDN5

    3. Fails-IDN4-only

    4. Overlong Punycode

Thus for each of 1-4 there is a corresponding Unicode String (Label):

    1. U-Label

    2. Unicode-Fails-IDN5

    3. Unicode-Fails-IDN4-only

    4. Overlong-Unicode

Note that apparent Punycode strings might not map to Unicode, such as the "a" in "xn--a".
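
As a rough illustration of the mapping described above, the following Python sketch uses the standard library's "punycode" codec to convert between a Unicode label and its ACE-prefixed form. The helper names to_ace and from_ace are hypothetical, and the sketch performs no IDNA validation, so it says nothing about which of the four categories a given result falls into.

```python
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Punycode-encode a Unicode label and prepend the ACE prefix.

    Assumption: no IDNA validation is done here, so the result is
    only a putative A-Label.
    """
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str):
    """Strip the ACE prefix and Punycode-decode the remainder.

    Returns None when the label has no ACE prefix or when the codec
    rejects the remainder.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return None
    try:
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return None


if __name__ == "__main__":
    ace = to_ace("bücher")
    print(ace)            # xn--bcher-kva
    print(from_ace(ace))  # bücher
```

Returning None rather than raising keeps "no Unicode counterpart" distinct from a successful decode; whether a decoded string is actually a U-Label would still need the separate validation steps.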

Inconsistency in current defs

The term "LDH label" is defined in:

    2.3.1.2. LDH-label and Internationalized Label

    These specifications use the term "LDH-label" strictly to refer to an
    all-ASCII label that obeys the preferred syntax (often known as
    "hostname" (from RFC 952 [RFC0952]) or "LDH") conventions and that is
    not an IDN.

That implies that an LDH-label is any LDH-conforming label that is not an A-Label. The diagram below (section 2.3.1.6 in defs), however, shows LDH-label as being neither an A-Label nor a Broken IDN.

    _______________________    ________________________
    | ASCII Labels        |    | Non-ASCII            |
    |                     |    |                      |
    |  ___________________|    |   ___________________|
    |  |LDH-conforming (1)|    |   | U-label (2)      |
    |  |                  |    |   |__________________|
    |  |  ________________|    |   |                  |
    |  |  | LDH-label     |    |   | Binary Label     |
    |  |  |_______________|    |   |  (including      |
    |  |  | A-label       |    |   |  high bit on)    |
    |  |  |_______________|    |   |__________________|
    |  |  |               |    |   |                  |
    |  |  | Broken IDN    |    |   | Bit String       |
    |  |  |  e.g., xn--?, |    |   |  Label           |
    |  |  |  abc--def     |    |   |__________________|
    |  |  |_______________|    |______________________|
    |  |__________________|
    |  ___________________|
    |  |Not-LDH-Conforming|
    |  |                  |
    |  |  ________________|
    |  |  |SRV & SRV-like |
    |  |  | e.g., _tcp    |
    |  |  |_______________|
    |  |  | Leading or    |
    |  |  |  trailing     |
    |  |  |  hyphens      |
    |  |  |_______________|
    |  |  | Other non-LDH |
    |  |  | ASCII chars   |
    |  |  | e.g., #$%&_   |
    |  |  |_______________|
    |  |__________________|
    |_____________________|
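
To make the coarse boxes on the ASCII side of the diagram concrete, here is a minimal Python sketch. The function name classify_ascii_label and the returned strings are hypothetical, and treating a leading underscore as the marker for "SRV-like" is an assumption. The sketch deliberately does not try to separate LDH-label, A-label, and Broken IDN inside the LDH-conforming box, since that needs IDNA validation; it only reports whether the ACE prefix is present. Length limits are not checked either.

```python
import re

# Letters, digits, and hyphen: the LDH repertoire.
LDH_CHARS = re.compile(r"[A-Za-z0-9-]+")


def classify_ascii_label(label: str) -> str:
    """Place an ASCII label into one of the diagram's coarse boxes."""
    if not label or not label.isascii():
        raise ValueError("expected a non-empty ASCII label")
    # Assumption: a leading underscore marks an SRV-like label (e.g. _tcp).
    if label.startswith("_"):
        return "not LDH-conforming: SRV-like"
    if LDH_CHARS.fullmatch(label) is None:
        return "not LDH-conforming: other non-LDH ASCII chars"
    if label.startswith("-") or label.endswith("-"):
        return "not LDH-conforming: leading or trailing hyphen"
    # The prefix alone does not make the label an A-label.
    if label.lower().startswith("xn--"):
        return "LDH-conforming, ACE prefix"
    return "LDH-conforming, no ACE prefix"


if __name__ == "__main__":
    for sample in ["example", "xn--bcher-kva", "_tcp", "-abc", "a#b"]:
        print(sample, "->", classify_ascii_label(sample))
```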

Inconsistency in protocol

The following statement uses the term "U-Label". This is incorrect: applying sections 5.1-5.5 does not guarantee that the result is a U-Label, since those sections do not require the application of the BIDI or Context rules. Similarly, we can't use the term "A-Label" (Sec 5.6, 5.7), since the putative A-Label may not be one.

    5.6. Punycode Conversion

    The validated string, a U-label, is converted to an A-label using the
    Punycode algorithm with the ACE prefix added.