Clarification of implicit weights for ideographs in UCA

L2/xxx

To: UTC

From: Mark Davis, Markus Scherer

Re: Clarification of implicit weights for ideographs in UCA

Date: 2009-02-20

UCA has the following description of how to generate implicit weights:

The value for BASE depends on the type of character:

    FB40    CJK Ideograph
    FB80    CJK Ideograph Extension A/B
    FBC0    Any other code point

http://unicode.org/reports/tr10/#Implicit_Weights
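For reference, the computation that uses BASE can be sketched as follows. The two formulas (AAAA = BASE + (CP >> 15), BBBB = (CP & 0x7FFF) | 0x8000) are those given in the UCA section linked above. The function names and, more importantly, the range tests in base_for are assumptions made only for illustration: they read the table rows as whole blocks (4E00..9FFF, 3400..4DBF, 20000..2A6DF), which is one possible reading and not necessarily the intended one.

```python
# BASE values from the UCA table.
BASE_CJK = 0xFB40        # "CJK Ideograph"
BASE_CJK_EXT = 0xFB80    # "CJK Ideograph Extension A/B"
BASE_OTHER = 0xFBC0      # "Any other code point"


def base_for(cp):
    """Pick BASE for a code point.

    ASSUMPTION: categories are tested by whole-block ranges. This is an
    illustrative reading of the table, not a statement of what UCA intends.
    """
    if 0x4E00 <= cp <= 0x9FFF:            # CJK Unified Ideographs block
        return BASE_CJK
    if 0x3400 <= cp <= 0x4DBF:            # Extension A block
        return BASE_CJK_EXT
    if 0x20000 <= cp <= 0x2A6DF:          # Extension B block
        return BASE_CJK_EXT
    return BASE_OTHER


def implicit_primary_weights(cp):
    """Return the pair (AAAA, BBBB) of implicit primary weights for cp."""
    aaaa = base_for(cp) + (cp >> 15)      # BASE plus the high bits
    bbbb = (cp & 0x7FFF) | 0x8000         # low 15 bits with the top bit set
    return aaaa, bbbb


if __name__ == "__main__":
    for cp in (0x4E00, 0x3400, 0x20000, 0x10FFFF):
        aaaa, bbbb = implicit_primary_weights(cp)
        print(f"U+{cp:04X} -> {aaaa:04X} {bbbb:04X}")
```

Only base_for depends on how the categories are defined; the weight arithmetic itself is unaffected by the clarification requested here.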

Unfortunately, this is not entirely clear. It will also need to be updated whenever we add further CJK ideograph blocks.

The goal was to include all the Unified Ideographs. The issue is what counts as "CJK Ideograph", and what counts as "CJK Ideograph Extension A/B". Our presumption is that:

This is what we have followed for some time in the ICU implementation. We'd like to fix the text to correspond to the above.

Other Questions:

    1. Is it worth adding a note that the set of assigned ideographs in the CJK Unified Ideographs block grew in Unicode 4.1 and Unicode 5.1?

    2. Should we "future-proof" the second category by making it the Extension A block + Plane 2 (minus noncharacters)? What about Plane 3?
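To make question 2 concrete, here is a minimal sketch of what such a future-proofed test for the second category might look like. The function name is hypothetical, and the sketch covers only the Extension A block plus Plane 2; whether Plane 3 should be included is left open, as in the question.

```python
def in_future_proof_second_category(cp):
    """Hypothetical test: Extension A block, or Plane 2 minus noncharacters."""
    in_ext_a_block = 0x3400 <= cp <= 0x4DBF
    in_plane_2 = 0x20000 <= cp <= 0x2FFFF
    # The last two code points of every plane (xxFFFE, xxFFFF) are noncharacters.
    is_plane_end_nonchar = (cp & 0xFFFE) == 0xFFFE
    return in_ext_a_block or (in_plane_2 and not is_plane_end_nonchar)
```

Under this test, any code point later assigned in Plane 2 would fall into the second category without a further change to the text.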

The text also needs a bit of editing because it uses "character" sometimes when "code point" is meant.