Clarification of implicit weights for ideographs in UCA

L2/xxx
To: UTC
From: Mark Davis, Markus Scherer
Re: Clarification of implicit weights for ideographs in UCA
Date: 2009-02-20

UCA has the following description of how to generate implicit weights:

The value for BASE depends on the type of character:

FB40 CJK Ideograph
FB80 CJK Ideograph Extension A/B
FBC0 Any other code point

http://unicode.org/reports/tr10/#Implicit_Weights

This is unfortunately not crystal-clear. It will also need to be updated when we add additional CJK Ideographic blocks.
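
To make concrete what depends on the choice of BASE, here is a minimal sketch of how the two implicit weights are derived from a code point, following the formula in the UCA section linked above (AAAA = BASE + (CP >> 15), BBBB = (CP & 0x7FFF) | 0x8000). The constant and function names are hypothetical, chosen only for illustration; they are not taken from UCA or ICU. Which BASE applies to a given code point is exactly the open question, so the sketch leaves that choice to the caller.

```python
# Illustrative sketch only. Names are hypothetical, not from UCA or ICU.
BASE_CJK = 0xFB40      # "CJK Ideograph"
BASE_CJK_EXT = 0xFB80  # "CJK Ideograph Extension A/B"
BASE_OTHER = 0xFBC0    # "Any other code point"


def implicit_weights(cp: int, base: int) -> tuple:
    """Return the pair (AAAA, BBBB) of implicit primary weights for cp.

    The caller supplies the BASE value; this function does not decide
    which category a code point belongs to.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    aaaa = base + (cp >> 15)        # BASE plus the high bits of the code point
    bbbb = (cp & 0x7FFF) | 0x8000   # low 15 bits, with the top bit forced on
    return aaaa, bbbb


# Example: U+4E00 with the "CJK Ideograph" base.
print([hex(w) for w in implicit_weights(0x4E00, BASE_CJK)])
```

Because AAAA is dominated by BASE, every code point assigned FB40 sorts before every code point assigned FB80, which in turn sorts before everything assigned FBC0. That is why the category boundaries need to be stated precisely.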

The goal was to include all the Unified Ideographs. The issue is what counts as "CJK Ideograph", and what counts as "CJK Ideograph Extension A/B". Our presumption is that:
This is what we have followed for some time in the ICU implementation. We'd like to fix the text to correspond to the above.

Other Questions:
  1. Is it worth adding a note that the CJK Unified Ideographs block grew in Unicode 4.1 and Unicode 5.1?
  2. Should we "future-proof" the second category by making it Extension A block + Plane 2 (minus noncharacters)? What about Plane 3?
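
As a sketch of what the range-based definition in question 2 would amount to, assuming the Extension A block is U+3400..U+4DBF and that "minus noncharacters" excludes U+2FFFE and U+2FFFF from Plane 2. The function name is hypothetical, and Plane 3 is left out because that part of the question is still open.

```python
# Illustrative sketch only. The function name is hypothetical, and this is
# one reading of question 2, not an agreed definition.
def in_future_proof_extension_category(cp: int) -> bool:
    """True if cp falls in Extension A block + Plane 2 (minus noncharacters)."""
    in_extension_a_block = 0x3400 <= cp <= 0x4DBF
    # Plane 2 is U+20000..U+2FFFF; its last two code points are noncharacters.
    in_plane_2_minus_nonchars = 0x20000 <= cp <= 0x2FFFD
    return in_extension_a_block or in_plane_2_minus_nonchars
```

A definition of this shape would cover unassigned code points as well, so future ideograph additions in Plane 2 would get the second BASE without a further change to the text.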

The text also needs a bit of editing because it sometimes uses "character" when "code point" is meant.