This page provides a simple illustration of how a GUI can visually indicate boundaries between different scripts, to help avoid spoofing. The code is rough and is meant only for illustration.
The boundaries are determined using roughly the following pseudocode:
lastScript = COMMON;
for i = 0 .. n-1 {
    script = getScript(source[i]);
    // Certain characters take on the script of an adjacent character
    if (script == COMMON) script = lastScript;
    else if (lastScript == COMMON) lastScript = script;
    // Neutral Han (HAN_N) takes on the variant of an adjacent Traditional (HAN_T) or Simplified (HAN_S) character
    if (script == HAN_N && (lastScript == HAN_T || lastScript == HAN_S)) script = lastScript;
    else if (lastScript == HAN_N && (script == HAN_T || script == HAN_S)) lastScript = script;
    // After these adjustments, check whether there is a boundary
    if (lastScript != script) {
        showBoundary(); // show the boundary with a color change, lines, or some other device
    }
    lastScript = script; // remember for the next character
}
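A minimal Java sketch of the same loop follows. It assumes a hypothetical `Script` enum holding a few UCD-style values plus the three Han categories, takes the per-character script values as input (so `getScript()` is left out), and returns the indices where a boundary would be shown instead of calling `showBoundary()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptBoundaries {
    // Hypothetical script values: a few UCD scripts plus the
    // Traditional (HAN_T), Simplified (HAN_S) and neutral (HAN_N) Han split.
    enum Script { COMMON, LATIN, GREEK, HAN_T, HAN_S, HAN_N }

    /** Returns each index i such that a boundary should be shown before scripts[i]. */
    static List<Integer> findBoundaries(Script[] scripts) {
        List<Integer> boundaries = new ArrayList<>();
        Script lastScript = Script.COMMON;
        for (int i = 0; i < scripts.length; i++) {
            Script script = scripts[i];
            // A COMMON character takes on the script of the preceding character;
            // a leading run of COMMON takes on the script of the first non-COMMON character.
            if (script == Script.COMMON) {
                script = lastScript;
            } else if (lastScript == Script.COMMON) {
                lastScript = script;
            }
            // Neutral Han takes on the variant of an adjacent Traditional or Simplified character.
            if (script == Script.HAN_N
                    && (lastScript == Script.HAN_T || lastScript == Script.HAN_S)) {
                script = lastScript;
            } else if (lastScript == Script.HAN_N
                    && (script == Script.HAN_T || script == Script.HAN_S)) {
                lastScript = script;
            }
            if (lastScript != script) {
                boundaries.add(i); // stands in for showBoundary()
            }
            lastScript = script; // remember for the next character
        }
        return boundaries;
    }

    public static void main(String[] args) {
        Script[] sample = { Script.LATIN, Script.LATIN, Script.COMMON, Script.GREEK };
        System.out.println(findBoundaries(sample)); // prints [3]
    }
}
```

With this input, the COMMON character at index 2 takes on the preceding LATIN, so the only boundary falls before the Greek character at index 3.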
The getScript() call can use the Script value from the Unicode Character Database (see UTR #24: Script Names), with a few additional modifications:
One could certainly refine this to call out more characters that are visually confusable. For example, many CJK Radicals are identical in appearance to CJK Ideographs.
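As one possible refinement (an assumption, not part of this demo), a rough check can use NFKC normalization via `java.text.Normalizer`: a character whose NFKC form differs from itself is a compatibility variant of other text. This catches the Kangxi Radicals block (U+2F00..U+2FD5), whose characters map to the corresponding ideographs, and look-alikes such as fullwidth Latin letters. Most of the CJK Radicals Supplement block has no such mapping, though, so a fuller treatment would need dedicated confusable data such as that in UTS #39. The class and method names below are hypothetical.

```java
import java.text.Normalizer;

public class CompatibilityLookalikes {
    /** Rough heuristic: true if NFKC maps this code point to different text. */
    static boolean isCompatibilityVariant(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return !Normalizer.normalize(s, Normalizer.Form.NFKC).equals(s);
    }

    public static void main(String[] args) {
        int kangxiOne = 0x2F00; // KANGXI RADICAL ONE, which looks like U+4E00
        String folded = Normalizer.normalize(
                new String(Character.toChars(kangxiOne)), Normalizer.Form.NFKC);
        System.out.printf("U+%04X -> U+%04X%n", kangxiOne, folded.codePointAt(0)); // U+2F00 -> U+4E00
        System.out.println(isCompatibilityVariant(kangxiOne)); // true
        System.out.println(isCompatibilityVariant(0x4E00));    // false: an ordinary ideograph
    }
}
```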
Note: In this demo, the script values are stubbed out and are provided only for basic ASCII Latin and for Greek. The HAN values are generated by a rough pass over the Unihan fields in the UCD: any character with a kSimplifiedVariant is counted as Traditional (HAN_T), any character with a kTraditionalVariant is counted as Simplified (HAN_S), and all others are counted as neutral (HAN_N).
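The Han classification described in the note can be sketched as a `getScript()` function. This version is an assumption rather than the demo's code: it uses Java's built-in `Character.UnicodeScript` instead of the demo's ASCII/Greek stub, treats INHERITED (combining marks) like COMMON, and replaces the Unihan data with two tiny hypothetical sets containing only 漢 (U+6F22, which has a kSimplifiedVariant) and 汉 (U+6C49, which has a kTraditionalVariant).

```java
import java.util.Set;

public class HanScriptClassifier {
    // Hypothetical script values, as in the pseudocode, plus a catch-all.
    enum Script { COMMON, LATIN, GREEK, HAN_T, HAN_S, HAN_N, OTHER }

    // Hypothetical stand-ins for Unihan data; a real version would load every
    // character that has a kSimplifiedVariant / kTraditionalVariant field.
    static final Set<Integer> HAS_SIMPLIFIED_VARIANT = Set.of(0x6F22);  // 漢
    static final Set<Integer> HAS_TRADITIONAL_VARIANT = Set.of(0x6C49); // 汉

    static Script getScript(int codePoint) {
        Character.UnicodeScript sc = Character.UnicodeScript.of(codePoint);
        switch (sc) {
            case COMMON:
            case INHERITED: // assumption: combining marks adopt neighbors, like COMMON
                return Script.COMMON;
            case LATIN:
                return Script.LATIN;
            case GREEK:
                return Script.GREEK;
            case HAN:
                // Has a simplified variant, so it is Traditional; and vice versa.
                if (HAS_SIMPLIFIED_VARIANT.contains(codePoint)) return Script.HAN_T;
                if (HAS_TRADITIONAL_VARIANT.contains(codePoint)) return Script.HAN_S;
                return Script.HAN_N;
            default:
                return Script.OTHER;
        }
    }
}
```

Checking kSimplifiedVariant first means a character with both fields counts as Traditional; the note does not say how such characters are handled, so that ordering is also an assumption.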