Additional Derived Properties for Action 115A008
L2/09-219R2
Subject: Operational Properties for Action 115A008
Date: 2009-05-14
From: Mark Davis
To: UTC
I had the following action from the UTC:
115 A008 Mark Davis Produce updated proposal for the "operationally X-cased" properties, with more background. L2/08-157 2008-05-20 2008-05-20
Here is the proposal, after revising the names and comments as per discussion in the UTC on May 13.
(Link to working doc: http://www.macchiato.com/unicode/action-115a008)
DerivedCoreProperties.txt
Add the following 7 properties (short names are in parentheses).
# Derived Property: Cased (Cased)
# As defined by Unicode Standard Definition D120
# C has the Lowercase or Uppercase property or has a General_Category value of Titlecase_Letter.
0041..005A ; Cased # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A ; Cased # L& [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA ; Cased # L& FEMININE ORDINAL INDICATOR
00B5 ; Cased # L& MICRO SIGN
00BA ; Cased # L& MASCULINE ORDINAL INDICATOR
00C0..00D6 ; Cased # L& [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
...
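The derivation rule for Cased above can be sketched in Python. This is an illustration only, not the tool used to generate the listing. It assumes that, in the running Python, single-character `str.islower()` and `str.isupper()` reflect the Lowercase and Uppercase properties; the function name `is_cased_char` is hypothetical.

```python
import unicodedata


def is_cased_char(c: str) -> bool:
    """Approximate the Cased property (D120) for one code point."""
    if len(c) != 1:
        raise ValueError("expected a single code point")
    # Assumption: for a one-character string, islower()/isupper() track
    # the Lowercase and Uppercase properties of the bundled Unicode data.
    has_lowercase = c.islower()
    has_uppercase = c.isupper()
    # General_Category = Titlecase_Letter is available directly.
    is_titlecase_letter = unicodedata.category(c) == "Lt"
    return has_lowercase or has_uppercase or is_titlecase_letter
```

Results depend on the Unicode version bundled with the interpreter, so they may differ from the listing for individual code points.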
# Derived Property: Case_Ignorable (CI)
# As defined by Unicode Standard Definition D121
# C is defined to be case-ignorable if
# Word_Break(C) = MidLetter or MidNumLet, or
# General_Category(C) = Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).
0027 ; Case_Ignorable # Po APOSTROPHE
002E ; Case_Ignorable # Po FULL STOP
003A ; Case_Ignorable # Po COLON
005E ; Case_Ignorable # Sk CIRCUMFLEX ACCENT
0060 ; Case_Ignorable # Sk GRAVE ACCENT
00A8 ; Case_Ignorable # Sk DIAERESIS
...
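A minimal Python sketch of the Case_Ignorable rule above follows. The Python standard library does not expose Word_Break, so the sketch substitutes a hand-copied, deliberately incomplete set containing only the three Po characters shown in the listing; that set and the names used here are assumptions for illustration.

```python
import unicodedata

# General_Category values named in the definition.
IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Hypothetical stand-in for Word_Break = MidLetter or MidNumLet:
# only U+0027, U+002E and U+003A from the listing, not the full set.
WORD_BREAK_MID_SAMPLE = {"\u0027", "\u002E", "\u003A"}


def is_case_ignorable(c: str) -> bool:
    """Approximate Case_Ignorable (D121) for one code point."""
    if len(c) != 1:
        raise ValueError("expected a single code point")
    if c in WORD_BREAK_MID_SAMPLE:
        return True
    return unicodedata.category(c) in IGNORABLE_CATEGORIES
```

A real derivation would read the Word_Break values from WordBreakProperty.txt instead of the sample set.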
# Derived Property: Is_Lowercase (ILC)
# As defined by Unicode Standard Definition D124
# isLowercase(X) is true when toLowercase(toNFD(X)) = toNFD(X)
0000..001F ; Is_Lowercase # Cc [32] <control-0000>..<control-001F>
0020 ; Is_Lowercase # Zs SPACE
0021..0023 ; Is_Lowercase # Po [3] EXCLAMATION MARK..NUMBER SIGN
0024 ; Is_Lowercase # Sc DOLLAR SIGN
0025..0027 ; Is_Lowercase # Po [3] PERCENT SIGN..APOSTROPHE
0028 ; Is_Lowercase # Ps LEFT PARENTHESIS
0029 ; Is_Lowercase # Pe RIGHT PARENTHESIS
002A ; Is_Lowercase # Po ASTERISK
002B ; Is_Lowercase # Sm PLUS SIGN
...
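The isLowercase test above can be sketched in Python as follows. The sketch assumes that `str.lower()` is an acceptable approximation of toLowercase; the function name is hypothetical. It also shows why control characters, spaces, and punctuation appear in the listing: lowercasing leaves them unchanged.

```python
import unicodedata


def is_lowercase(x: str) -> bool:
    """isLowercase(X): toLowercase(toNFD(X)) = toNFD(X)."""
    y = unicodedata.normalize("NFD", x)
    # Assumption: str.lower() stands in for toLowercase.
    return y.lower() == y


# Characters without case mappings satisfy the test trivially.
print(is_lowercase("\u0000"), is_lowercase(" !#$"), is_lowercase("Abc"))
```

The same pattern, with `str.upper()` in place of `str.lower()`, gives a corresponding sketch for isUppercase.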
# Derived Property: Is_Uppercase (IUC)
# As defined by Unicode Standard Definition D125
# isUppercase(X) is true when toUppercase(toNFD(X)) = toNFD(X)
0000..001F ; Is_Uppercase # Cc [32] <control-0000>..<control-001F>
0020 ; Is_Uppercase # Zs SPACE
0021..0023 ; Is_Uppercase # Po [3] EXCLAMATION MARK..NUMBER SIGN
0024 ; Is_Uppercase # Sc DOLLAR SIGN
0025..0027 ; Is_Uppercase # Po [3] PERCENT SIGN..APOSTROPHE
0028 ; Is_Uppercase # Ps LEFT PARENTHESIS
0029 ; Is_Uppercase # Pe RIGHT PARENTHESIS
...
# Derived Property: Is_Titlecase (ITC)
# As defined by Unicode Standard Definition D126
# isTitlecase(X) is true when toTitlecase(toNFD(X)) = toNFD(X)
0000..001F ; Is_Titlecase # Cc [32] <control-0000>..<control-001F>
0020 ; Is_Titlecase # Zs SPACE
0021..0023 ; Is_Titlecase # Po [3] EXCLAMATION MARK..NUMBER SIGN
0024 ; Is_Titlecase # Sc DOLLAR SIGN
0025..0027 ; Is_Titlecase # Po [3] PERCENT SIGN..APOSTROPHE
...
# Derived Property: Is_Casefolded (ICF)
# As defined by Unicode Standard Definition D127
# isCasefolded(X) is true when toCasefold(toNFD(X)) = toNFD(X)
0000..001F ; Is_Casefolded # Cc [32] <control-0000>..<control-001F>
0020 ; Is_Casefolded # Zs SPACE
0021..0023 ; Is_Casefolded # Po [3] EXCLAMATION MARK..NUMBER SIGN
0024 ; Is_Casefolded # Sc DOLLAR SIGN
0025..0027 ; Is_Casefolded # Po [3] PERCENT SIGN..APOSTROPHE
0028 ; Is_Casefolded # Ps LEFT PARENTHESIS
0029 ; Is_Casefolded # Pe RIGHT PARENTHESIS
002A ; Is_Casefolded # Po ASTERISK
002B ; Is_Casefolded # Sm PLUS SIGN
...
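A corresponding sketch for isCasefolded, assuming Python's `str.casefold()` approximates toCasefold (the function name is hypothetical):

```python
import unicodedata


def is_casefolded(x: str) -> bool:
    """isCasefolded(X): toCasefold(toNFD(X)) = toNFD(X)."""
    y = unicodedata.normalize("NFD", x)
    # Assumption: str.casefold() stands in for toCasefold (full folding).
    return y.casefold() == y
```

Note that a lowercase string is not necessarily case-folded: `is_casefolded("stra\u00dfe")` is false in this sketch, because full case folding maps U+00DF to "ss".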
# Derived Property: Is_Cased (IC)
# As defined by Unicode Standard Definition D128
# isCased(X) is true when isLowercase(X) is false, or isUppercase(X) is false, or isTitlecase(X) is false
0041..005A ; Is_Cased # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A ; Is_Cased # L& [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00B5 ; Is_Cased # L& MICRO SIGN
00C0..00D6 ; Is_Cased # L& [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6 ; Is_Cased # L& [31] LATIN CAPITAL LETTER O WITH STROKE..LATIN SMALL LETTER O WITH DIAERESIS
00F8..0137 ; Is_Cased # L& [64] LATIN SMALL LETTER O WITH STROKE..LATIN SMALL LETTER K WITH CEDILLA
0139..018C ; Is_Cased # L& [84] LATIN CAPITAL LETTER L WITH ACUTE..LATIN SMALL LETTER D WITH TOPBAR
...
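The isCased definition above combines the three preceding tests: a string is cased if at least one of the case conversions changes it. The Python sketch below assumes `str.lower()`, `str.upper()`, and `str.title()` approximate toLowercase, toUppercase, and toTitlecase; `str.title()` in particular uses a simpler word-boundary rule than the standard, so this is only indicative. The helper names are hypothetical.

```python
import unicodedata


def _unchanged_by(transform, x: str) -> bool:
    """True when transform(toNFD(X)) = toNFD(X)."""
    y = unicodedata.normalize("NFD", x)
    return transform(y) == y


def is_cased(x: str) -> bool:
    """isCased(X): some case conversion changes the NFD form of X."""
    is_lower = _unchanged_by(str.lower, x)   # isLowercase(X)
    is_upper = _unchanged_by(str.upper, x)   # isUppercase(X)
    is_title = _unchanged_by(str.title, x)   # isTitlecase(X), approximate
    return not (is_lower and is_upper and is_title)
```

Under these assumptions U+00B5 MICRO SIGN is cased, because uppercasing changes it, while U+00AA FEMININE ORDINAL INDICATOR is not, because no conversion changes it. That matches the listing above, which includes 00B5 but omits 00AA.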
DerivedNormalizationProperties.txt
Add the following 2 properties:
# Derived Property: NFKC_Casefold (NFKCCF)
# Derived by applying CaseFolding, removing Default_Ignorable_Code_Points, then transforming by NFKC; the process is repeated until the result is stable
# TODO: flesh out description so Eric can understand it. ;-)
# Clarify that NFC must be applied to any string.
# All code points not explicitly listed for NFKC_Casefold
# have a value equal to the code point.
0041 ; NFKC_Casefold; 0061 # L& LATIN CAPITAL LETTER A
0042 ; NFKC_Casefold; 0062 # L& LATIN CAPITAL LETTER B
0043 ; NFKC_Casefold; 0063 # L& LATIN CAPITAL LETTER C
0044 ; NFKC_Casefold; 0064 # L& LATIN CAPITAL LETTER D
0045 ; NFKC_Casefold; 0065 # L& LATIN CAPITAL LETTER E
0046 ; NFKC_Casefold; 0066 # L& LATIN CAPITAL LETTER F
0047 ; NFKC_Casefold; 0067 # L& LATIN CAPITAL LETTER G
...
005A ; NFKC_Casefold; 007A # L& LATIN CAPITAL LETTER Z
00A0 ; NFKC_Casefold; 0020 # Zs NO-BREAK SPACE
00A8 ; NFKC_Casefold; 0020 0308 # Sk DIAERESIS
00AA ; NFKC_Casefold; 0061 # L& FEMININE ORDINAL INDICATOR
00AD ; NFKC_Casefold; # Cf SOFT HYPHEN
00AF ; NFKC_Casefold; 0020 0304 # Sk MACRON
00B2 ; NFKC_Casefold; 0032 # No SUPERSCRIPT TWO
00B3 ; NFKC_Casefold; 0033 # No SUPERSCRIPT THREE
00B4 ; NFKC_Casefold; 0020 0301 # Sk ACUTE ACCENT
...
# Derived Property: Is_NFKC_Casefold (isNFKCCF)
# As defined by X = NFKC_Casefold(X)
0000..001F ; Is_NFKC_Casefold # Cc [32] <control-0000>..<control-001F>
0020 ; Is_NFKC_Casefold # Zs SPACE
0021..0023 ; Is_NFKC_Casefold # Po [3] EXCLAMATION MARK..NUMBER SIGN
0024 ; Is_NFKC_Casefold # Sc DOLLAR SIGN
0025..0027 ; Is_NFKC_Casefold # Po [3] PERCENT SIGN..APOSTROPHE
0028 ; Is_NFKC_Casefold # Ps LEFT PARENTHESIS
0029 ; Is_NFKC_Casefold # Pe RIGHT PARENTHESIS
002A ; Is_NFKC_Casefold # Po ASTERISK
...
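The NFKC_Casefold mapping and the Is_NFKC_Casefold test can be sketched together in Python. This is a rough illustration under stated assumptions, not the derivation used for the listings: `str.casefold()` stands in for CaseFolding, the steps are repeated until the result stops changing, and, because the Python standard library does not expose Default_Ignorable_Code_Point, a one-element sample set (U+00AD, taken from the listing) stands in for that property. All names are hypothetical.

```python
import unicodedata

# Hypothetical, incomplete stand-in for Default_Ignorable_Code_Point:
# only U+00AD SOFT HYPHEN, which the listing maps to the empty string.
DEFAULT_IGNORABLE_SAMPLE = {"\u00ad"}


def nfkc_casefold(x: str) -> str:
    """Case fold, drop ignorables, apply NFKC; repeat until stable."""
    while True:
        y = x.casefold()
        y = "".join(ch for ch in y if ch not in DEFAULT_IGNORABLE_SAMPLE)
        y = unicodedata.normalize("NFKC", y)
        if y == x:
            return y
        x = y


def is_nfkc_casefold(x: str) -> bool:
    """Is_NFKC_Casefold: X = NFKC_Casefold(X)."""
    return nfkc_casefold(x) == x
```

With these assumptions the sketch reproduces the sample rows above, for example U+00A8 DIAERESIS mapping to 0020 0308 and U+00B2 SUPERSCRIPT TWO mapping to 0032.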
Text
Add references to these properties under the corresponding definitions, and also in UAX #31.