Erik van der Poel, Markus Scherer and I were discussing having a test suite for UTS #46. Here are some thoughts we had about how it could be constructed.
Have a table that includes a series of lines of the form:
source_string ; options ; ToASCII-result ; ToASCII-status ; ToUnicode-result ; ToUnicode-status # comment
- source_string and results are space-separated hex code points (eg "0061 0062")
- options are space-delimited options: "noSTD3", "Transitional", "noContextJ", "noBidi"
- status is "error" or blank
For example:
0061 0062 ; noBidi ; 0061 0062 ; ; 0061 0062 ; # ab
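To make the format concrete, here is a rough sketch of how a test harness might read one such line. The names (TestCase, parse_line, decode_code_points) are made up for illustration and are not part of any existing tool. It assumes that a status of "error" means failure and that anything else (blank, or a word such as "ok") means success.

```python
from dataclasses import dataclass

# Option names as listed in the format description above.
KNOWN_OPTIONS = {"noSTD3", "Transitional", "noContextJ", "noBidi"}


@dataclass
class TestCase:  # hypothetical record type, for illustration only
    source: str
    options: frozenset
    to_ascii: str
    to_ascii_error: bool
    to_unicode: str
    to_unicode_error: bool
    comment: str


def decode_code_points(field):
    """Turn '0061 0062' into 'ab'; an empty field yields an empty string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_line(line):
    """Parse one 'source ; options ; ... # comment' line into a TestCase."""
    # Everything after the first '#' is the comment; the data fields are
    # hex digits, option names and status words, so they never contain '#'.
    data, _, comment = line.partition("#")
    fields = [f.strip() for f in data.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    source, options, a_result, a_status, u_result, u_status = fields

    opts = frozenset(options.split())
    unknown = opts - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    return TestCase(
        source=decode_code_points(source),
        options=opts,
        to_ascii=decode_code_points(a_result),
        # Assumption: only the literal word "error" marks a failure.
        to_ascii_error=(a_status == "error"),
        to_unicode=decode_code_points(u_result),
        to_unicode_error=(u_status == "error"),
        comment=comment.strip(),
    )
```

A harness would then call parse_line on each non-empty line of the table and compare the to_ascii and to_unicode fields against whatever the implementation under test returns.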
The source strings would come from the following:
- Add all of the examples that are used in the spec.
- For every place an error condition could result, have an example that is just enough to trigger the error (eg a label of length 64), plus one that is just at the limit (eg a label of length 63).
- Randomly generate a large number of strings using interesting classes of characters, eg a mixture of representative characters from the classes (a)-(k) in http://unicode.org/draft/reports/tr46/tr46.html#Table_IDNA_Comparisons, both in normalized and denormalized form.
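The boundary and random cases above could be produced along these lines. This is only a sketch: the function names are invented, and SAMPLE_CLASSES is a small hand-picked stand-in, not the actual classes (a)-(k) from the comparison table, which would need to be filled in from the spec. "Denormalized" is approximated here by emitting each string in both NFC and NFD.

```python
import random
import unicodedata


def to_field(s):
    """Format a string as space-separated hex code points, eg 'ab' -> '0061 0062'."""
    return " ".join(f"{ord(c):04X}" for c in s)


def label_length_cases():
    """One label exactly at the 63-character limit, and one just over it."""
    return ["a" * 63, "a" * 64]


# Assumption: illustrative placeholder characters only. A real suite would
# draw representatives from each class in the spec's comparison table.
SAMPLE_CLASSES = {
    "ascii_lower": "az",
    "ascii_upper": "AZ",
    "digit_hyphen": "0-",
    "sharp_s_final_sigma": "\u00DF\u03C2",
    "joiners": "\u200C\u200D",
    "full_stops": ".\u3002",
    "precomposed": "\u00E9\u00FC",
    "combining_marks": "\u0301\u0308",
    "right_to_left": "\u05D0\u0627",
}


def random_strings(count, max_len=8, seed=0):
    """Yield `count` random strings, each in NFC and then in NFD form."""
    rng = random.Random(seed)  # fixed seed keeps the generated file reproducible
    pool = "".join(SAMPLE_CLASSES.values())
    for _ in range(count):
        length = rng.randint(1, max_len)
        s = "".join(rng.choice(pool) for _ in range(length))
        yield unicodedata.normalize("NFC", s)
        yield unicodedata.normalize("NFD", s)
```

Each generated string would be written out with to_field as the source_string column. The expected results would still have to come from a reference implementation or be checked by hand, since the generator only supplies inputs.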