Mapping Differences

Here is a comparison of the mapping used in the current IDNA (IDNA2003) with the proposed IDNA2008.
  1. The first page shows characters whose mappings differ, where the 2003 mapping gives a valid 2003 string and the 2008 mapping gives a valid 2008 string. These are the truly problematic cases.
    1. The columns indicating validity after each sample column are marked with an abbreviation consisting of a letter and digit:
      1. V=valid vs x=not valid
      2. 3 for 2003 vs 8 for 2008.
    2. The first column shows the Unicode version in which the character was added. Note that characters added after Unicode 3.2 (U3.2) are valid in IDNA2003 clients, but unmapped.
      1. The first 4 are exceptions
      2. The next 5 are due to changes in NFC, before it was made completely stable by Unicode.
      3. The rest are due to the additions of case mappings, before case folding was made completely stable by Unicode.
    3. ~ means that the mapping is the same as the source.
  2. The second page shows the strings clustered into groups according to their validity characteristics under mapping. A sample from each group is provided (the highest frequency character on the web), together with the count of characters.
    1. All of the items on the previous page fall under one of three groups:
      1. 03C2 (ς) GREEK SMALL LETTER FINAL SIGMA (the first 4), or
      2. 2F9BF (䗗) CJK COMPATIBILITY IDEOGRAPH-2F9BF (the next 5), or
      3. 04C0 (Ӏ) CYRILLIC LETTER PALOCHKA (the rest)
    2. The orange items (⇹) in the Eq? column are those with different mappings under IDNA2003 vs IDNA2008. Many of them don't matter because they are not valid under either scheme.
    3. The character counts are the number of Unicode characters in each group. Counts marked in red have lower frequency on the web.
  3. The third page shows more samples of each group, with the top ten characters in the group (measured by web frequency).
For a larger view, see IDNA Mapping Differences.
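
The IDNA2003 side of the first group (03C2, GREEK SMALL LETTER FINAL SIGMA) can be illustrated with a minimal sketch. It assumes Python's standard `encodings.idna` module, which implements IDNA2003 (RFC 3490 with Nameprep); the helper name `idna2003_map` is hypothetical and used only for illustration. The standard library has no IDNA2008 implementation, so the 2008 side, where ς stays valid as itself, is not shown.

```python
import encodings.idna  # stdlib IDNA2003 (RFC 3490 / Nameprep) implementation

FINAL_SIGMA = "\u03c2"  # ς GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # σ GREEK SMALL LETTER SIGMA


def idna2003_map(label: str) -> str:
    """Hypothetical helper: apply the IDNA2003 (Nameprep) mapping to one label."""
    return encodings.idna.nameprep(label)


# Under IDNA2003 the final sigma is mapped to the ordinary sigma.
mapped = idna2003_map(FINAL_SIGMA)

# Consequently both spellings produce the same ASCII-compatible encoding.
same_wire_form = FINAL_SIGMA.encode("idna") == SIGMA.encode("idna")

print(f"U+{ord(FINAL_SIGMA):04X} maps to U+{ord(mapped):04X}")
print("same ACE form as sigma:", same_wire_form)
```

Because IDNA2003 folds ς into σ while IDNA2008 keeps ς as a distinct valid character, the same typed label can resolve to two different domain names depending on which scheme a client implements.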