I put together a table that shows the assignments made by IDNA2008 and
IDNA2003, respectively. I would urge people to review these for
problems before Stockholm.
The file is at: http://macchiato.com/idna/idna-info.html
There are plaintext versions in that directory also. The tab and HTML
versions allow copy and paste into your favorite spreadsheet for
comparison, so that you can filter and sort by different values.
Key
| Field | Contents | Comments |
|---|---|---|
| 1 | difference status | w = warning, X = ambiguous mapping, blank = no problem |
| 2 | codepoint or codepoint..codepoint | |
| 3 | character(s) | |
| 4 | idna2008 status | PVALID, CONTEXTO, CONTEXTJ, REMAP, UNASSIGNED, DISALLOWED; REMAP if the code point is DISALLOWED but has a mapping |
| 5 | idna2008 mapping | "n/a" unless REMAP; "delete" unless non-empty |
| 6 | idna2003 status | "~" if same as idna2008 status |
| 7 | idna2003 mapping | "~" if same as idna2008 mapping |
| 8 | script | |
| 9 | general category/ies | |
| 10 | description(s) | name(s), or designations for those that aren't characters |
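For anyone who would rather script the comparison than use a spreadsheet, here is a minimal sketch of reading the tab version into records that follow the key. It assumes one record per line with exactly the ten fields in the order listed, separated by tabs; I have not checked that against the actual file. The names `IdnaRow`, `parse_row` and `flagged` are hypothetical, and the sample lines are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class IdnaRow(NamedTuple):
    diff_status: str       # field 1: "w", "X", or blank
    codepoints: str        # field 2: "codepoint" or "codepoint..codepoint"
    characters: str        # field 3
    idna2008_status: str   # field 4
    idna2008_mapping: str  # field 5
    idna2003_status: str   # field 6 ("~" expanded by parse_row)
    idna2003_mapping: str  # field 7 ("~" expanded by parse_row)
    script: str            # field 8
    general_category: str  # field 9
    description: str       # field 10


def parse_row(line: str) -> IdnaRow:
    """Split one tab-separated line into the ten fields of the key."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(IdnaRow._fields):
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    row = IdnaRow(*fields)
    # "~" means "same as the IDNA2008 value"; expand it for easier filtering.
    if row.idna2003_status == "~":
        row = row._replace(idna2003_status=row.idna2008_status)
    if row.idna2003_mapping == "~":
        row = row._replace(idna2003_mapping=row.idna2008_mapping)
    return row


def flagged(rows: Iterable[IdnaRow]) -> List[IdnaRow]:
    """Keep rows whose difference status is "w" (warning) or "X" (ambiguous)."""
    return [r for r in rows if r.diff_status.strip() in ("w", "X")]


# Made-up sample lines for illustration only; not copied from the real file.
SAMPLE_LINES = [
    "\t0061\ta\tPVALID\tn/a\t~\t~\tLatin\tLl\tLATIN SMALL LETTER A\n",
    "w\t0041\tA\tREMAP\t0061\t~\t~\tLatin\tLu\tLATIN CAPITAL LETTER A\n",
]

rows = [parse_row(line) for line in SAMPLE_LINES]
for row in flagged(rows):
    print(row.codepoints, row.idna2008_status, row.description)
```

Expanding "~" up front means the IDNA2003 and IDNA2008 columns can be compared directly, at the cost of no longer seeing at a glance which rows were identical in the source.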
Please let me know of any problems!
Firefox Characters
Here is the set: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[\u01C3\u02D0\u0337\u0338\u3033]
My take is that all of these are legitimate characters.
| Item | Example | Comments |
|---|---|---|
| \u01C3 | aǃb vs a!b | typically identical, but ! isn't allowed in domain names anyway. |
| \u02D0 | aːb vs a:b | similar, but not the same appearance. Could be confused with : used in URL password or port, so UIs should probably warn. |
| \u0337 | a̸b vs a/b | not really confusable because of positioning |
| \u0338 | a̷b vs a/b | not really confusable because of positioning |
| \u3033 | a〳b vs a/b | not really confusable because of positioning. |
| \u05F4 | a״b vs a"b | similar, but not the same, and " isn't allowed in domain names anyway. |
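If you want to look at the basic properties of these characters yourself, a rough sketch using Python's standard `unicodedata` module follows. The helper name `describe` and the `CANDIDATES` list are mine, not part of any tool mentioned here, and the output depends on the Unicode version bundled with your Python. A nonzero combining class for \u0337 and \u0338 is consistent with the "positioning" remark, since those marks overlay the preceding letter rather than sitting between letters.

```python
import unicodedata

# The code points discussed in the table above.
CANDIDATES = ["\u01C3", "\u02D0", "\u0337", "\u0338", "\u3033", "\u05F4"]


def describe(ch: str) -> dict:
    """Collect a few basic Unicode properties for a single character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),     # e.g. "Mn" = nonspacing mark
        "combining": unicodedata.combining(ch),   # 0 means not a combining mark
    }


for ch in CANDIDATES:
    info = describe(ch)
    print(info["codepoint"], info["category"], info["combining"], info["name"])
```

This only reports character properties; it does not measure visual similarity, which still needs a human eye or confusables data.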