Here is a comparison of the mapping used in the current IDNA (IDNA2003) with the proposed IDNA2008.
- The first page shows characters where the mappings are not equal, where the 2003 mapping gives a valid 2003 string, and where the 2008 mapping gives a valid 2008 string. These are the truly problematic cases.
- The columns indicating validity after each sample column are marked with an abbreviation consisting of a letter and digit:
  - V=valid vs x=not valid
  - 3 for 2003 vs 8 for 2008.
- The first column shows the Unicode version in which the character was added. Note that characters added after Unicode 3.2 (U3.2) are valid in IDNA2003 clients, but unmapped.
- The first 4 are exceptions.
- The next 5 are due to changes in NFC, before it was made completely stable by Unicode.
- The rest are due to the additions of case mappings, before case folding was made completely stable by Unicode.
- ~ means that the mapping is the same as the source.
- The second page shows the strings clustered into groups according to their validity characteristics under mapping. A sample from each group is provided (the highest frequency character on the web), together with the count of characters.
- All of the items on the previous page fall into one of three groups:
  - 03C2 (ς) GREEK SMALL LETTER FINAL SIGMA (the first 4), or
  - 2F9BF (䗗) CJK COMPATIBILITY IDEOGRAPH-2F9BF (the next 5), or
  - 04C0 (Ӏ) CYRILLIC LETTER PALOCHKA (the rest)
- The orange items (⇹) in the Eq? column are those with different mappings under IDNA2003 vs IDNA2008. Many of them don't matter because they are not valid under either scheme.
- The character counts are the number of Unicode characters in each group. Counts marked in red are for groups with lower frequency on the web.
- The third page shows more samples of each group, with the top ten characters in the group (measured by web frequency).
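
The three sample characters above can be poked at with Python's standard library. This is only a rough sketch, not the method used to build the spreadsheet: it assumes CPython's `encodings.idna.nameprep` as a stand-in for the IDNA2003 mapping, and it uses the interpreter's current Unicode data versus its bundled Unicode 3.2 snapshot (`unicodedata.ucd_3_2_0`) as a stand-in for "before" and "after" stabilization. The standard library has no IDNA2008 mapping, so none is shown. The function name `compare_samples` and the dictionary keys are made up for illustration.

```python
import unicodedata
from encodings.idna import nameprep  # stdlib Nameprep, used here as the IDNA2003 mapping step

# Unicode 3.2 snapshot shipped with CPython (the version IDNA2003 is tied to).
ucd32 = unicodedata.ucd_3_2_0

FINAL_SIGMA = "\u03C2"      # GREEK SMALL LETTER FINAL SIGMA
CJK_COMPAT = "\U0002F9BF"   # CJK COMPATIBILITY IDEOGRAPH-2F9BF
PALOCHKA = "\u04C0"         # CYRILLIC LETTER PALOCHKA
SMALL_PALOCHKA = "\u04CF"   # its lowercase partner, added after Unicode 3.2


def compare_samples():
    """Hypothetical helper: collect one observation per sample character."""
    return {
        # Exception group: what Nameprep does to final sigma.
        "sigma_2003": nameprep(FINAL_SIGMA),
        # NFC group: NFC of the same character under the 3.2 data and the current data.
        "nfc_3_2": ucd32.normalize("NFC", CJK_COMPAT),
        "nfc_now": unicodedata.normalize("NFC", CJK_COMPAT),
        # Case-mapping group: case folding with the current data, plus the
        # general category of the lowercase form in the 3.2 data.
        "palochka_fold_now": PALOCHKA.casefold(),
        "small_palochka_category_3_2": ucd32.category(SMALL_PALOCHKA),
    }


if __name__ == "__main__":
    for key, value in compare_samples().items():
        shown = " ".join(f"U+{ord(ch):04X}" for ch in value) if len(value) <= 2 and key != "small_palochka_category_3_2" else value
        print(f"{key}: {shown}")
```

On a recent CPython, the final sigma should come back from Nameprep as an ordinary sigma (U+03C3), the two NFC results for U+2F9BF should differ, and the lowercase palochka should be reported as unassigned (`Cn`) in the 3.2 data, which is why an IDNA2003 client leaves U+04C0 unmapped. Note that the stdlib `nameprep` is not a strict Unicode 3.2 implementation for every character, so treat it as an approximation.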
For a larger view, see http://spreadsheets.google.com/ccc?key=tmJdiOVz8RncSt83g30rIxg
IDNA Mapping Differences