Mapping Differences

IDNA Mapping Differences

Here is a comparison of the mapping used in the current IDNA (IDNA2003) with the proposed IDNA2008.
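To make the IDNA2003 side of the comparison concrete, here is a minimal sketch that uses the Nameprep implementation in Python's standard library (`encodings.idna.nameprep`). The helper name `map_2003` and the sample strings are illustrative assumptions, not taken from the comparison pages. The standard library has no IDNA2008 mapping, so only the 2003 side is shown.

```python
# Sketch only: Python's stdlib Nameprep stands in for "the IDNA2003 mapping".
from encodings.idna import nameprep


def map_2003(label: str) -> str:
    """Return the IDNA2003 (Nameprep) mapping of a label (hypothetical helper)."""
    return nameprep(label)


# Illustrative samples: ASCII case folding, sharp s, final sigma, zero-width joiner.
for sample in ["ABC", "stra\u00dfe", "\u03c2", "a\u200db"]:
    print(ascii(sample), "->", ascii(map_2003(sample)))
```

Only characters that already existed in Unicode 3.2 are used here, because Python's implementation relies partly on current case data and may not leave later characters unmapped the way a strict IDNA2003 client would.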

1. The first page shows characters where the mappings are not equal, the 2003 mapping gives a valid 2003 string, and the 2008 mapping gives a valid 2008 string. These are the truly problematic cases.
  1. The columns indicating validity after each sample column are marked with an abbreviation consisting of a letter and digit:
    1. Valid vs x=not valid
    2. 3 for 2003 vs 8 for 2008.
  2. The first column shows the Unicode version in which the character was added. Note that characters added after Unicode 3.2 (U3.2) are valid in IDNA2003 clients, but unmapped.

  3. The first 4 characters are exceptions.
  4. The next 5 are due to changes in NFC, before it was made completely stable by Unicode.
  5. The rest are due to the addition of case mappings, before case folding was made completely stable by Unicode.
  6. ~ means that the mapping is the same as the source.
2. The second page shows the strings clustered into groups according to their validity characteristics under mapping. A sample from each group is provided (the highest frequency character on the web), together with the count of characters.

3. The third page shows more samples of each group, with the top ten characters in the group (measured by web frequency).
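The selection and grouping rules described for the three pages can be sketched roughly as follows. This is not the code that produced the pages. The `Row` record, the function names, the choice of the two validity flags as the grouping key, and the frequency values are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One character of the comparison (hypothetical record layout)."""
    char: str           # source character
    mapped_2003: str    # result of the IDNA2003 mapping
    valid_2003: bool    # is mapped_2003 a valid IDNA2003 string?
    mapped_2008: str    # result of the proposed IDNA2008 mapping
    valid_2008: bool    # is mapped_2008 a valid IDNA2008 string?
    web_frequency: int  # placeholder number, not a measured value


def display(source: str, mapped: str) -> str:
    """Show '~' when the mapping is the same as the source."""
    return "~" if mapped == source else mapped


def is_problematic(row: Row) -> bool:
    """Page 1 rule: mappings differ and both results are valid."""
    return (row.mapped_2003 != row.mapped_2008
            and row.valid_2003 and row.valid_2008)


def cluster(rows, top=10):
    """Pages 2 and 3: group by validity flags, then count each group and
    list its members by descending web frequency (at most `top`)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row.valid_2003, row.valid_2008)].append(row)
    summary = {}
    for key, members in groups.items():
        ranked = sorted(members, key=lambda r: r.web_frequency, reverse=True)
        summary[key] = (len(members), [r.char for r in ranked[:top]])
    return summary


# Illustrative rows; the frequencies are made up.
ROWS = [
    Row("\u00df", "ss", True, "\u00df", True, 900),       # sharp s
    Row("a", "a", True, "a", True, 5000),                 # unchanged in both
    Row("\u2665", "\u2665", True, "\u2665", False, 40),   # symbol
]

for row in ROWS:
    print(ascii(row.char), display(row.char, row.mapped_2003),
          display(row.char, row.mapped_2008), is_problematic(row))
print(cluster(ROWS))
```

The first element of each sample list corresponds to the single highest-frequency sample shown on the second page; the full list (up to ten) corresponds to the third page.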
