Mark Davis, 2009-07-08

I was working on the following action. We were not able to come to consensus as to how to proceed in the ed committee, so I'm bringing this to the UTC.

117 A040 Mark Davis Update PropertyAliases.txt and PropertyValueAliases.txt with the Unihan properties. L2/08-352 UCD 2008-11-12 2008-11-12
The document is: http://www.unicode.org/L2/L2008/08352-stability-prop.html.

That document didn't specify the property names or aliases, and I also needed default values. I followed the analogy of kRSUnicode, which shows up as:

# Unicode_Radical_Stroke (URS)
# @missing: 0000..10FFFF; Unicode_Radical_Stroke; <none>

CompatibilityVariant and the numerics, however, take the usual defaults for String and Numeric properties (<code point> and NaN, respectively) rather than <none>.

Note that we probably should list CompatibilityVariant as a derived property in #44, since it -- according to #38 -- is derived from the compat decomp in UnicodeData; and is thus (I presume) just filtered to only be CJK_Ideographs.

<BTW> We really ought to
have a FAQ explaining what the heck the difference is between the
properties Ideographic and Unified_Ideograph, since it is pretty much
impossible to guess from the names. It appears that Ideographic is really
a derived property and equal to the following. Is this relationship by
intent or accident? </BTW>

Unified Ideograph + HANGZHOU numerals + compat ideographs + 3006 + 3007

cf:
http://unicode.org/cldr/utility/unicodeset.jsp?a=[:Ideographic:]&b=[:Unified_Ideograph:]
http://unicode.org/cldr/utility/unicodeset.jsp?a=[:Ideographic:]&b=[\u3006\u3007[:Unified_Ideograph:][:name=/HANGZHOU|CJK%20COMPATIBILITY%20IDEOGRAPH/:]]

On that basis, here is what I came up with.

--- Aliases ---

# ================================================
# Numeric Properties
# ================================================
CJK_AC ; CJK_AccountingNumeric ; kAccountingNumeric
CJK_ON ; CJK_OtherNumeric ; kOtherNumeric
CJK_PN ; CJK_PrimaryNumeric ; kPrimaryNumeric
...
# ================================================
# String Properties
# ================================================
...
CJK_CV ; CJK_CompatibilityVariant ; kCompatibilityVariant
...
# ================================================
# Miscellaneous Properties
# ================================================
IIC ; IICore ; kIICore
IRG_G ; IRG_GSource ; kIRG_GSource
IRG_H ; IRG_HSource ; kIRG_HSource
IRG_J ; IRG_JSource ; kIRG_JSource
IRG_K ; IRG_KSource ; kIRG_KSource
IRG_KP ; IRG_KPSource ; kIRG_KPSource
IRG_T ; IRG_TSource ; kIRG_TSource
IRG_U ; IRG_USource ; kIRG_USource
IRG_V ; IRG_VSource ; kIRG_VSource
...
URS ; Unicode_Radical_Stroke ; kRSUnicode

--- ValueAliases ---

# CJK_AccountingNumeric (CJK_AC)
# @missing: 0000..10FFFF; CJK_AccountingNumeric; NaN

# CJK_CompatibilityVariant (CJK_CV)
# @missing: 0000..10FFFF; CJK_CompatibilityVariant; <code point>

# CJK_OtherNumeric (CJK_ON)
# @missing: 0000..10FFFF; CJK_OtherNumeric; NaN

# CJK_PrimaryNumeric (CJK_PN)
# @missing: 0000..10FFFF; CJK_PrimaryNumeric; NaN

# IICore (IIC)
# @missing: 0000..10FFFF; IICore; <none>

# IRG_GSource (IRG_G)
# @missing: 0000..10FFFF; IRG_GSource; <none>

# IRG_HSource (IRG_H)
# @missing: 0000..10FFFF; IRG_HSource; <none>

# IRG_JSource (IRG_J)
# @missing: 0000..10FFFF; IRG_JSource; <none>

# IRG_KPSource (IRG_KP)
# @missing: 0000..10FFFF; IRG_KPSource; <none>

# IRG_KSource (IRG_K)
# @missing: 0000..10FFFF; IRG_KSource; <none>

# IRG_TSource (IRG_T)
# @missing: 0000..10FFFF; IRG_TSource; <none>

# IRG_USource (IRG_U)
# @missing: 0000..10FFFF; IRG_USource; <none>

# IRG_VSource (IRG_V)
# @missing: 0000..10FFFF; IRG_VSource; <none>

Here are some back and forths on this, edited heavily for brevity.

I would really really prefer that we don't invent any new names, long
or short. For the long names, the kFoo names are perfectly adequate in my opinion; they are what the Unihan users know (at the very least, they have to be the "preferred" long names). For the short names, with 88 properties, they are bound to be impossible to remember (e.g. for me CJK_ON suggests kJapaneseOn rather than kOtherNumeric).

...

I can understand that, but we also need to be consistent with what
we've done already with kRSUnicode, and other properties. Note that the
kxxx names are retained as aliases, and there'd be no problem with your
continuing to use them in the XML. Having the k form (which can be very long) be the short alias is simply bizarre.

As far as the UCD properties were concerned, the tags in Unihan were just gorp; this is the point at which we are really fully recognizing (some of) them as UCD properties, and we should give them consistent names, as we have *already done* with kRSUnicode. Note that with Unicode_Radical_Stroke, we didn't even use the k form as an *alias*; the official names were only the first two fields below. In the above, I added the 'k' form as an alias.

URS ; Unicode_Radical_Stroke ; kRSUnicode
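To make the "kxxx names are retained as aliases" point concrete, here is a rough Python sketch of how a consumer might read entries in the format proposed above and resolve any of the three names to the long name. It is only an illustration: the helper names (loose_key, parse_aliases, parse_missing, resolve) are made up for this message, the sample lines are copied from the proposal rather than from a released PropertyAliases.txt, and the matching is a simplified stand-in for the loose matching rule in #44, not a full implementation of it.

```python
import re

# Sample lines copied from the proposal in this message. They are a proposal,
# not necessarily what ends up in PropertyAliases.txt.
PROPOSED_ALIASES = """\
# Numeric Properties
CJK_AC ; CJK_AccountingNumeric ; kAccountingNumeric
CJK_ON ; CJK_OtherNumeric ; kOtherNumeric
CJK_PN ; CJK_PrimaryNumeric ; kPrimaryNumeric
# String Properties
CJK_CV ; CJK_CompatibilityVariant ; kCompatibilityVariant
# Miscellaneous Properties
IIC ; IICore ; kIICore
IRG_G ; IRG_GSource ; kIRG_GSource
URS ; Unicode_Radical_Stroke ; kRSUnicode
"""

PROPOSED_DEFAULTS = """\
# CJK_AccountingNumeric (CJK_AC)
# @missing: 0000..10FFFF; CJK_AccountingNumeric; NaN
# CJK_CompatibilityVariant (CJK_CV)
# @missing: 0000..10FFFF; CJK_CompatibilityVariant; <code point>
# IICore (IIC)
# @missing: 0000..10FFFF; IICore; <none>
"""

# Matches "# @missing: START..END; Property; default".
MISSING_RE = re.compile(
    r"#\s*@missing:\s*([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+)\s*;\s*([^;]+?)\s*;\s*(.+?)\s*$"
)


def loose_key(name):
    """Simplified loose match (assumption): ignore case, '_', '-', spaces."""
    return re.sub(r"[\s_\-]", "", name).lower()


def parse_aliases(text):
    """Map every name on a line (short, long, extra aliases) to the long name."""
    lookup = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        long_name = fields[1]  # second field is the long name
        for name in fields:
            lookup[loose_key(name)] = long_name
    return lookup


def parse_missing(text):
    """Collect the default value given on each @missing line, keyed by property."""
    defaults = {}
    for raw in text.splitlines():
        m = MISSING_RE.match(raw.strip())
        if m:
            defaults[m.group(3)] = m.group(4)
    return defaults


def resolve(lookup, name):
    """Return the long property name for any known alias, else None."""
    return lookup.get(loose_key(name))


aliases = parse_aliases(PROPOSED_ALIASES)
defaults = parse_missing(PROPOSED_DEFAULTS)

print(resolve(aliases, "kRSUnicode"))
print(resolve(aliases, "cjk_on"))
print(defaults[resolve(aliases, "kCompatibilityVariant")])
```

With the k form kept as a third field, existing Unihan users can keep writing kRSUnicode or kCompatibilityVariant and still land on the same property as URS or CJK_CV.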