I regenerated the confusables for UTS #39: Unicode Security Mechanisms, adding some characters from Mozilla and some related characters. I'd appreciate any feedback on additions or corrections. I have not really looked at any of the 5.1/5.2 additions, so help there would be especially appreciated; we will want to release the new data soon after Unicode 5.2 is released.

The draft source is at:

Adding x ≈ y (x is visually confusable with y) to that source adds a new confusable to the data. (The x ≈ y relationship is expressed in the file by the standard ";" delimiter between the two items, as in standard Unicode data files.)

Note that confusability means that in some common fonts, at common UI font sizes, the characters look similar enough to be mistaken for one another. So, for example, look at the following:

( י ) 05D9 HEBREW LETTER YOD

The YOD doesn't look much like the apostrophe in a traditional font, but it does in some modern fonts, so it gets added.

The draft summary result is at:

This is the result after generating equivalence classes. That means that if the source has x ≈ y and y ≈ z, then the data also gets y ≈ x, z ≈ y, x ≈ z, and z ≈ x. The equivalence classes apply symmetry and transitivity not only to whole strings but also to substrings: if the source has x ≈ y, yw ≈ z, and q ≈ ym, then the equivalence class also adds xw ≈ z and q ≈ xm.

The file displays these equivalence classes by picking a representative and mapping all other members to it. For example:

( ! ) 0021 EXCLAMATION MARK

This means that { ! ǃ !} are all in the same equivalence class. Similarly, all of the following are in the same equivalence class:

( / ) 002F SOLIDUS

The comments (after #) indicate that the mapping is indirect. That is, the above comment indicates that the mapping is added because 丿 (in some fonts) looks like ⼃, which (in some fonts) looks like /.
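To make the equivalence-class step concrete, here is a minimal Python sketch of how symmetric and transitive closure over single characters could be computed with a union-find structure. It is not the generator actually used for the data: the function names `build_classes` and `skeleton` are hypothetical, the rule of choosing the lowest code point as the representative is an assumption made for illustration, and multi-character sources such as yw ≈ z are not handled.

```python
from collections import defaultdict


def build_classes(pairs):
    """Map each character to a class representative, given x ≈ y pairs.

    Union-find gives symmetry and transitivity for free: x ≈ y and
    y ≈ z put x, y, and z in one class.
    """
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for x, y in pairs:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    classes = defaultdict(set)
    for c in list(parent):
        classes[find(c)].add(c)

    # Assumption for this sketch: the representative is the member with
    # the lowest code point. The real data may choose differently.
    mapping = {}
    for members in classes.values():
        representative = min(members)
        for c in members:
            mapping[c] = representative
    return mapping


def skeleton(text, mapping):
    """Replace every character by its class representative."""
    return "".join(mapping.get(ch, ch) for ch in text)


pairs = [
    ("\u4e3f", "\u2f03"),  # 丿 ≈ ⼃
    ("\u2f03", "/"),       # ⼃ ≈ /
    ("\u01c3", "!"),       # ǃ ≈ !
]
mapping = build_classes(pairs)
print(skeleton("a\u4e3fb", mapping))  # a/b
```

Because every character in a string is replaced independently, two strings that differ only by confusable characters end up with the same skeleton, which covers the substring case for single-character sources.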
New Script and Character Proposals

I'm also thinking that it would be a good idea to ask explicitly that script proposals include a list of possible confusables. That would help to keep the confusables data up to date for new scripts, especially historic ones.
