
Additional UCA Files

L2/09-xxxx
Subject: Additional UCA Files
Author: Mark Davis
Date: 2009-10-23
To: UTC

In the process of developing the UCA, I generate a number of files, listed below. The two collation test files are included in the release, but I think we should consider releasing others as well, perhaps in a subdirectory (/extracted?).
[TXT] CheckCollationValidity.html           23-Oct-2009 19:22  270K 

This is a log of the validity tests. We probably don't want this one, since it is mostly used to check consistency during the development process. 


[TXT] CollationTest_NON_IGNORABLE.txt       23-Oct-2009 19:23   13M  
[TXT] CollationTest_NON_IGNORABLE_SHORT.txt 23-Oct-2009 19:22  1.5M
[TXT] CollationTest_SHIFTED.txt             23-Oct-2009 19:23   14M
[TXT] CollationTest_SHIFTED_SHORT.txt       23-Oct-2009 19:23  1.5M

The short versions are useful for people who need to include the test files in their own releases but don't want the burden of all the comments. As the listing shows, they are considerably smaller: about 1/10 the size of the full versions.
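To illustrate what "dropping the comments" amounts to, here is a minimal Python sketch that reduces test lines to their data portion. It assumes a simplified line layout (code points first, an optional `;`, then a `#` comment) and uses a hypothetical helper name, `shorten`; it is not the tool that actually generates the _SHORT files, and the real file format may differ.

```python
def shorten(lines):
    """Keep only the data portion of collation test lines.

    Assumption (simplified, hypothetical layout): each line holds hex code
    points, optionally followed by ';' and a '#' comment.
    """
    short = []
    for line in lines:
        # Drop everything from '#' onward, then anything after ';'.
        data = line.split("#", 1)[0].split(";", 1)[0].strip()
        if data:  # skip blank and comment-only lines
            short.append(data)
    return short

# Illustrative input only; not copied from the released files.
sample = [
    "# header comment",
    "0009 0021;\t# ('\\u0009!') <CHARACTER TABULATION, EXCLAMATION MARK>",
    "",
    "0021 0021;\t# ('!!') <EXCLAMATION MARK, EXCLAMATION MARK>",
]
print(shorten(sample))  # ['0009 0021', '0021 0021']
```

Because the comments carry the character names and computed keys, removing them is where most of the size reduction comes from; the ordering of the remaining lines, which is what a conformance test checks, is untouched.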


[TXT] FractionalUCA.txt                     23-Oct-2009 19:23  1.7M  
[TXT] FractionalUCA_SHORT.txt               23-Oct-2009 19:23  587K
[TXT] FractionalUCA_summary.txt             23-Oct-2009 19:23   67K

The fractional UCA files recast the UCA data into a byte format, as described in the implementation notes of UTS #10. Using them can yield substantial storage savings for implementations.


[TXT] UCARules-log.txt                      23-Oct-2009 19:23  1.6M 

This file is of no particular interest, and shouldn't be included.


[TXT] UCA_Rules.txt                         23-Oct-2009 19:23  1.4M  
[TXT] UCA_Rules.xml                         23-Oct-2009 19:23  1.6M
[TXT] UCA_Rules_NoCE.txt                    23-Oct-2009 19:23  1.0M
[TXT] UCA_Rules_SHORT.txt                   23-Oct-2009 19:23  213K
[TXT] UCA_Rules_SHORT.xml                   23-Oct-2009 19:23  355K

The rules files recast the UCA as a series of rules in the CLDR format. The UCA_Rules_NoCE.txt file is particularly useful for comparing versions. Diffing allkeys.txt between versions is hopeless, because inserting a single element changes the weights of every element that follows it. The rules files give only the ordering relationships, so they can be usefully diffed.
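A toy Python sketch makes the diffing argument concrete. It does not use the real allkeys.txt or UCA_Rules_NoCE.txt syntax: the one-line weight listing, the "& a" / "< b" rule layout, and the helper names `weight_lines`, `rule_lines` and `changed_lines` are hypothetical simplifications. Inserting one element shifts every later weight, so the weight listing shows many changed lines, while the rule listing shows only the one new relation.

```python
import difflib

def weight_lines(order):
    # Toy allkeys-style listing (hypothetical format): each element gets a
    # sequential primary weight based on its position in the ordering.
    return [f"{ch} ; [.{i + 1:04X}]" for i, ch in enumerate(order)]

def rule_lines(order):
    # Toy CLDR-style rules: a reset on the first element, then one
    # "< x" relation per line, recording only relative order.
    return [f"& {order[0]}"] + [f"< {ch}" for ch in order[1:]]

def changed_lines(old, new):
    # Count added/removed lines in a unified diff, ignoring the file headers.
    diff = difflib.unified_diff(old, new, lineterm="")
    return sum(1 for line in diff
               if line.startswith(("+", "-"))
               and not line.startswith(("+++", "---")))

old = ["a", "b", "c", "d", "e"]
new = ["a", "b", "x", "c", "d", "e"]  # insert "x" between "b" and "c"

print(changed_lines(weight_lines(old), weight_lines(new)))  # 7
print(changed_lines(rule_lines(old), rule_lines(new)))      # 1
```

In this five-element example the weight diff touches 7 lines against 1 for the rules; the gap grows with the number of elements that follow the insertion point.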