Additional Derived Properties for Action 115A008

L2/09-219R2

Subject: Operational Properties for Action 115A008

Date: 2009-05-14

From: Mark Davis

To: UTC

I had the following action from the UTC:

115 A008 Mark Davis Produce updated proposal for the "operationally X-cased" properties, with more background. L2/08-157 2008-05-20 2008-05-20

Here is the proposal, with the names and comments revised per the discussion in the UTC on May 13.

(Link to working doc: http://www.macchiato.com/unicode/action-115a008)

DerivedCoreProperties.txt

Add the following 7 properties (the short name is in parens).

# Derived Property: Cased (Cased)

# As defined by Unicode Standard Definition D120

# C has the Lowercase or Uppercase property or has a General_Category value of Titlecase_Letter.

0041..005A ; Cased # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z

0061..007A ; Cased # L& [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z

00AA ; Cased # L& FEMININE ORDINAL INDICATOR

00B5 ; Cased # L& MICRO SIGN

00BA ; Cased # L& MASCULINE ORDINAL INDICATOR

00C0..00D6 ; Cased # L& [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS

...
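
As a rough illustration of the Cased derivation above, the following Python sketch approximates it with the standard unicodedata module. It is only an approximation: the module exposes General_Category but not the Lowercase or Uppercase binary properties, so code points that get those properties through Other_Lowercase or Other_Uppercase are missed. The name is_cased_approx is hypothetical, and results depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Stand-in for "Lowercase or Uppercase or General_Category = Titlecase_Letter".
# Assumption: Ll approximates Lowercase and Lu approximates Uppercase;
# Other_Lowercase and Other_Uppercase code points are not covered.
_CASED_GC = {"Lu", "Ll", "Lt"}


def is_cased_approx(ch: str) -> bool:
    """Approximate the Cased property for a single code point."""
    return unicodedata.category(ch) in _CASED_GC
```

For example, is_cased_approx("A") and is_cased_approx("\u00B5") hold, while is_cased_approx("1") does not.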

# Derived Property: Case_Ignorable (CI)

# As defined by Unicode Standard Definition D121

# C is defined to be case-ignorable if

# Word_Break(C) = MidLetter or MidNumLet, or

# General_Category(C) = Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).

0027 ; Case_Ignorable # Po APOSTROPHE

002E ; Case_Ignorable # Po FULL STOP

003A ; Case_Ignorable # Po COLON

005E ; Case_Ignorable # Sk CIRCUMFLEX ACCENT

0060 ; Case_Ignorable # Sk GRAVE ACCENT

00A8 ; Case_Ignorable # Sk DIAERESIS

...
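
The Case_Ignorable derivation above can be sketched in Python along the following lines. The General_Category clause maps directly onto unicodedata.category. The Word_Break clause does not, because the standard library does not expose Word_Break, so this sketch substitutes a hand-copied sample containing only the three Po code points from the listing above (they are in none of the listed categories, so they must qualify through the Word_Break clause). The names is_case_ignorable_approx and _MID_SAMPLE are hypothetical, and the sample is deliberately incomplete.

```python
import unicodedata

# General_Category values named in the definition: Mn, Me, Cf, Lm, Sk.
_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Incomplete stand-in for Word_Break = MidLetter or MidNumLet:
# only APOSTROPHE, FULL STOP, and COLON from the listing above.
_MID_SAMPLE = {0x0027, 0x002E, 0x003A}


def is_case_ignorable_approx(ch: str) -> bool:
    """Approximate Case_Ignorable for a single code point."""
    return ord(ch) in _MID_SAMPLE or unicodedata.category(ch) in _IGNORABLE_GC
```

A fuller version would load the MidLetter and MidNumLet sets from WordBreakProperty.txt instead of using the sample.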

# Derived Property: Is_Lowercase (ILC)

# As defined by Unicode Standard Definition D124

# isLowercase(X) is true when toLowercase(toNFD(X)) = toNFD(X)

0000..001F ; Is_Lowercase # Cc [32] <control-0000>..<control-001F>

0020 ; Is_Lowercase # Zs SPACE

0021..0023 ; Is_Lowercase # Po [3] EXCLAMATION MARK..NUMBER SIGN

0024 ; Is_Lowercase # Sc DOLLAR SIGN

0025..0027 ; Is_Lowercase # Po [3] PERCENT SIGN..APOSTROPHE

0028 ; Is_Lowercase # Ps LEFT PARENTHESIS

0029 ; Is_Lowercase # Pe RIGHT PARENTHESIS

002A ; Is_Lowercase # Po ASTERISK

002B ; Is_Lowercase # Sm PLUS SIGN

...

# Derived Property: Is_Uppercase (IUC)

# As defined by Unicode Standard Definition D125

# isUppercase(X) is true when toUppercase(toNFD(X)) = toNFD(X)

0000..001F ; Is_Uppercase # Cc [32] <control-0000>..<control-001F>

0020 ; Is_Uppercase # Zs SPACE

0021..0023 ; Is_Uppercase # Po [3] EXCLAMATION MARK..NUMBER SIGN

0024 ; Is_Uppercase # Sc DOLLAR SIGN

0025..0027 ; Is_Uppercase # Po [3] PERCENT SIGN..APOSTROPHE

0028 ; Is_Uppercase # Ps LEFT PARENTHESIS

0029 ; Is_Uppercase # Pe RIGHT PARENTHESIS

...

# Derived Property: Is_Titlecase (ITC)

# As defined by Unicode Standard Definition D126

# isTitlecase(X) is true when toTitlecase(toNFD(X)) = toNFD(X)

0000..001F ; Is_Titlecase # Cc [32] <control-0000>..<control-001F>

0020 ; Is_Titlecase # Zs SPACE

0021..0023 ; Is_Titlecase # Po [3] EXCLAMATION MARK..NUMBER SIGN

0024 ; Is_Titlecase # Sc DOLLAR SIGN

0025..0027 ; Is_Titlecase # Po [3] PERCENT SIGN..APOSTROPHE

...

# Derived Property: Is_Casefolded (ICF)

# As defined by Unicode Standard Definition D127

# isCasefolded(X) is true when toCasefold(toNFD(X)) = toNFD(X)

0000..001F ; Is_Casefolded # Cc [32] <control-0000>..<control-001F>

0020 ; Is_Casefolded # Zs SPACE

0021..0023 ; Is_Casefolded # Po [3] EXCLAMATION MARK..NUMBER SIGN

0024 ; Is_Casefolded # Sc DOLLAR SIGN

0025..0027 ; Is_Casefolded # Po [3] PERCENT SIGN..APOSTROPHE

0028 ; Is_Casefolded # Ps LEFT PARENTHESIS

0029 ; Is_Casefolded # Pe RIGHT PARENTHESIS

002A ; Is_Casefolded # Po ASTERISK

002B ; Is_Casefolded # Sm PLUS SIGN

...
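
The four definitions above (Is_Lowercase, Is_Uppercase, Is_Titlecase, Is_Casefolded) share one shape: normalize to NFD, apply the case operation, and compare. A minimal Python sketch of that shape follows. It assumes that Python's built-in str.lower, str.upper, str.title, and str.casefold are acceptable stand-ins for toLowercase, toUppercase, toTitlecase, and toCasefold. In particular, str.title uses Python's own notion of word boundaries, which is not guaranteed to match the standard's, and all results depend on the Unicode version bundled with the Python build.

```python
import unicodedata


def _nfd(s: str) -> str:
    """toNFD(X): canonical decomposition."""
    return unicodedata.normalize("NFD", s)


def is_lowercase(s: str) -> bool:
    d = _nfd(s)
    return d.lower() == d      # toLowercase(toNFD(X)) = toNFD(X)


def is_uppercase(s: str) -> bool:
    d = _nfd(s)
    return d.upper() == d      # toUppercase(toNFD(X)) = toNFD(X)


def is_titlecase(s: str) -> bool:
    d = _nfd(s)
    return d.title() == d      # toTitlecase(toNFD(X)) = toNFD(X), approximated


def is_casefolded(s: str) -> bool:
    d = _nfd(s)
    return d.casefold() == d   # toCasefold(toNFD(X)) = toNFD(X)
```

This also shows why the listings above begin with controls, spaces, and punctuation: a character with no case mappings is unchanged by every operation, so each predicate is true for it.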

# Derived Property: Is_Cased (IC)

# As defined by Unicode Standard Definition D128

# isCased(X) is true when isLowercase(X) is false, or isUppercase(X) is false, or isTitlecase(X) is false

0041..005A ; Is_Cased # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z

0061..007A ; Is_Cased # L& [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z

00B5 ; Is_Cased # L& MICRO SIGN

00C0..00D6 ; Is_Cased # L& [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS

00D8..00F6 ; Is_Cased # L& [31] LATIN CAPITAL LETTER O WITH STROKE..LATIN SMALL LETTER O WITH DIAERESIS

00F8..0137 ; Is_Cased # L& [64] LATIN SMALL LETTER O WITH STROKE..LATIN SMALL LETTER K WITH CEDILLA

0139..018C ; Is_Cased # L& [84] LATIN CAPITAL LETTER L WITH ACUTE..LATIN SMALL LETTER D WITH TOPBAR

...
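
The Is_Cased definition above says, in effect, that X is cased if at least one of the three case operations changes its NFD form. A small self-contained Python sketch follows, again assuming that the built-in str.lower, str.upper, and str.title approximate the standard's toLowercase, toUppercase, and toTitlecase (the function name is_cased is hypothetical).

```python
import unicodedata


def is_cased(s: str) -> bool:
    """True if lowercasing, uppercasing, or titlecasing changes NFD(s)."""
    d = unicodedata.normalize("NFD", s)
    return d.lower() != d or d.upper() != d or d.title() != d
```

Under this sketch is_cased("\u00B5") is true, because MICRO SIGN has an uppercase mapping, while is_cased("\u00AA") is false, because FEMININE ORDINAL INDICATOR has no case mappings. That is consistent with the listings, where 00AA appears under Cased but not under Is_Cased.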

DerivedNormalizationProperties.txt

Add the following 2 properties:

# Derived Property: NFKC_Casefold (NFKCCF)

# As defined by CaseFolding, removing Default_Ignorable_Code_Points, then transforming by NFKC; then repeating

# TODO: flesh out description so Eric can understand it. ;-)

# Clarify that NFC must be applied to any string.

# All code points not explicitly listed for NFKC_Casefold

# have a value equal to the code point.

0041 ; NFKC_Casefold; 0061 # L& LATIN CAPITAL LETTER A

0042 ; NFKC_Casefold; 0062 # L& LATIN CAPITAL LETTER B

0043 ; NFKC_Casefold; 0063 # L& LATIN CAPITAL LETTER C

0044 ; NFKC_Casefold; 0064 # L& LATIN CAPITAL LETTER D

0045 ; NFKC_Casefold; 0065 # L& LATIN CAPITAL LETTER E

0046 ; NFKC_Casefold; 0066 # L& LATIN CAPITAL LETTER F

0047 ; NFKC_Casefold; 0067 # L& LATIN CAPITAL LETTER G

...

005A ; NFKC_Casefold; 007A # L& LATIN CAPITAL LETTER Z

00A0 ; NFKC_Casefold; 0020 # Zs NO-BREAK SPACE

00A8 ; NFKC_Casefold; 0020 0308 # Sk DIAERESIS

00AA ; NFKC_Casefold; 0061 # L& FEMININE ORDINAL INDICATOR

00AD ; NFKC_Casefold; # Cf SOFT HYPHEN

00AF ; NFKC_Casefold; 0020 0304 # Sk MACRON

00B2 ; NFKC_Casefold; 0032 # No SUPERSCRIPT TWO

00B3 ; NFKC_Casefold; 0033 # No SUPERSCRIPT THREE

00B4 ; NFKC_Casefold; 0020 0301 # Sk ACUTE ACCENT

...

# Derived Property: Is_NFKC_Casefold (isNFKCCF)

# As defined by X = NFKC_Casefold(X)

0000..001F ; Is_NFKC_Casefold # Cc [32] <control-0000>..<control-001F>

0020 ; Is_NFKC_Casefold # Zs SPACE

0021..0023 ; Is_NFKC_Casefold # Po [3] EXCLAMATION MARK..NUMBER SIGN

0024 ; Is_NFKC_Casefold # Sc DOLLAR SIGN

0025..0027 ; Is_NFKC_Casefold # Po [3] PERCENT SIGN..APOSTROPHE

0028 ; Is_NFKC_Casefold # Ps LEFT PARENTHESIS

0029 ; Is_NFKC_Casefold # Pe RIGHT PARENTHESIS

002A ; Is_NFKC_Casefold # Po ASTERISK

...
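
The two normalization properties above can be sketched together in Python. This is a sketch under stated assumptions, not a reference implementation: "then repeating" is read here as "repeat until the result stops changing"; str.casefold stands in for CaseFolding; and, because the standard library does not expose Default_Ignorable_Code_Point, a small hand-picked sample of such code points is used instead of the full set. The names nfkc_casefold, is_nfkc_casefold, and _DEFAULT_IGNORABLE_SAMPLE are hypothetical.

```python
import unicodedata

# Incomplete sample of Default_Ignorable_Code_Point (assumption: illustration only).
_DEFAULT_IGNORABLE_SAMPLE = {0x00AD, 0x034F, 0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def nfkc_casefold(s: str) -> str:
    """Case fold, drop default ignorables, apply NFKC; repeat until stable."""
    while True:
        folded = s.casefold()
        kept = "".join(c for c in folded if ord(c) not in _DEFAULT_IGNORABLE_SAMPLE)
        result = unicodedata.normalize("NFKC", kept)
        if result == s:
            return result
        s = result


def is_nfkc_casefold(s: str) -> bool:
    """Is_NFKC_Casefold: X = NFKC_Casefold(X)."""
    return s == nfkc_casefold(s)
```

Checked against the sample rows above, this sketch maps 0041 to 0061, 00A0 to 0020, 00A8 to 0020 0308, 00AA to 0061, 00B2 to 0032, and 00AD to the empty string.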

Text

Add references to these properties under the corresponding definitions in the Unicode Standard, and also in UAX #31.