
Additional Derived Properties for Action 115A008

L2/09-219R2
Subject: Operational Properties for Action 115A008
Date: 2009-05-14
From: Mark Davis
To: UTC


I had the following action from the UTC:

115 | A008 | Mark Davis | Produce updated proposal for the "operationally X-cased" properties, with more background. | L2/08-157 | 2008-05-20 | 2008-05-20

Here is the proposal, with the names and comments revised according to the discussion at the UTC meeting on May 13.

(Link to working doc: http://www.macchiato.com/unicode/action-115a008)

DerivedCoreProperties.txt

Add the following 7 properties (the short name is in parentheses).

# Derived Property:   Cased (Cased)
#  As defined by Unicode Standard Definition D120
#  C has the Lowercase or Uppercase property or has a General_Category value of Titlecase_Letter.

0041..005A    ; Cased # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Cased # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Cased # L&       FEMININE ORDINAL INDICATOR
00B5          ; Cased # L&       MICRO SIGN
00BA          ; Cased # L&       MASCULINE ORDINAL INDICATOR
00C0..00D6    ; Cased # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
...
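All of the excerpts below use the same line format: a code point or `start..end` range, a semicolon, the property name, and a trailing `#` comment. As a minimal Python sketch (the helper name `parse_binary_property` is hypothetical, not part of any UCD tooling), the Cased lines above can be read into a set of code points like this:

```python
# Sample input: the Cased lines from the excerpt above, including the "..." elision.
SAMPLE = """\
0041..005A    ; Cased # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Cased # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Cased # L&       FEMININE ORDINAL INDICATOR
00B5          ; Cased # L&       MICRO SIGN
00BA          ; Cased # L&       MASCULINE ORDINAL INDICATOR
00C0..00D6    ; Cased # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
...
"""


def parse_binary_property(text, prop):
    """Collect the code points assigned to `prop` in 'range ; property # comment' lines."""
    code_points = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if ";" not in data:  # skip blank lines and "..." elisions
            continue
        field, name = (part.strip() for part in data.split(";", 1))
        if name != prop:
            continue
        first, _, last = field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # a single code point is a one-element range
        code_points.update(range(start, end + 1))
    return code_points


cased = parse_binary_property(SAMPLE, "Cased")
```

With this sample, `cased` contains the 78 code points covered by the six listed lines; the elided remainder of the real file would be read the same way.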

# Derived Property:   Case_Ignorable (CI)
#  As defined by Unicode Standard Definition D121
#  C is defined to be case-ignorable if
#    Word_Break(C) = MidLetter or MidNumLet, or
#    General_Category(C) = Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).

0027          ; Case_Ignorable # Po       APOSTROPHE
002E          ; Case_Ignorable # Po       FULL STOP
003A          ; Case_Ignorable # Po       COLON
005E          ; Case_Ignorable # Sk       CIRCUMFLEX ACCENT
0060          ; Case_Ignorable # Sk       GRAVE ACCENT
00A8          ; Case_Ignorable # Sk       DIAERESIS
...
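Definition D121 combines a Word_Break test with a General_Category test. Python's `unicodedata.category` supplies the General_Category value, but the standard library has no Word_Break lookup, so this sketch takes one as a parameter. The `toy_word_break` function is a hypothetical stand-in that knows only the three punctuation characters in the excerpt above; a real implementation would load WordBreakProperty.txt.

```python
import unicodedata
from typing import Callable

CASE_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}
CASE_IGNORABLE_WB = {"MidLetter", "MidNumLet"}


def is_case_ignorable(ch: str, word_break: Callable[[str], str]) -> bool:
    """D121 as stated above; word_break is a caller-supplied Word_Break lookup."""
    return (word_break(ch) in CASE_IGNORABLE_WB
            or unicodedata.category(ch) in CASE_IGNORABLE_GC)


def toy_word_break(ch: str) -> str:
    """HYPOTHETICAL stand-in: covers only APOSTROPHE, FULL STOP and COLON,
    using the values that the excerpt above relies on; all else is 'Other'."""
    return {"'": "MidNumLet", ".": "MidNumLet", ":": "MidLetter"}.get(ch, "Other")
```

For example, `is_case_ignorable("^", toy_word_break)` is true through its General_Category (Sk), while APOSTROPHE qualifies only through its Word_Break value.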

# Derived Property:   Is_Lowercase (ILC)
#  As defined by Unicode Standard Definition D124
#  isLowercase(X) is true when toLowercase(toNFD(X)) = toNFD(X)

0000..001F    ; Is_Lowercase # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Lowercase # Zs       SPACE
0021..0023    ; Is_Lowercase # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Lowercase # Sc       DOLLAR SIGN
0025..0027    ; Is_Lowercase # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_Lowercase # Ps       LEFT PARENTHESIS
0029          ; Is_Lowercase # Pe       RIGHT PARENTHESIS
002A          ; Is_Lowercase # Po       ASTERISK
002B          ; Is_Lowercase # Sm       PLUS SIGN
...
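D124 is a fixed-point test: X qualifies if lowercasing its NFD form changes nothing. This is why code points with no case at all, such as the C0 controls and SPACE, are listed. A minimal Python sketch, assuming `str.lower()` is an acceptable stand-in for toLowercase (it applies the full default lowercase mappings, including the Final_Sigma context):

```python
import unicodedata


def is_lowercase(x: str) -> bool:
    """D124: toLowercase(toNFD(X)) == toNFD(X), with str.lower() as toLowercase."""
    d = unicodedata.normalize("NFD", x)
    return d.lower() == d
```

Because the definition is stated for strings, the same function works on single code points (`is_lowercase("!")`) and on longer text (`is_lowercase("stra\u00DFe")`).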

# Derived Property:   Is_Uppercase (IUC)
#  As defined by Unicode Standard Definition D125
#  isUppercase(X) is true when toUppercase(toNFD(X)) = toNFD(X)

0000..001F    ; Is_Uppercase # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Uppercase # Zs       SPACE
0021..0023    ; Is_Uppercase # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Uppercase # Sc       DOLLAR SIGN
0025..0027    ; Is_Uppercase # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_Uppercase # Ps       LEFT PARENTHESIS
0029          ; Is_Uppercase # Pe       RIGHT PARENTHESIS
...
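D125 is the uppercase counterpart of D124. In this Python sketch, `str.upper()` stands in for toUppercase; it uses full mappings, so U+00DF LATIN SMALL LETTER SHARP S uppercases to "SS" and is therefore not Is_Uppercase. Uncased characters such as digits satisfy both Is_Lowercase and Is_Uppercase.

```python
import unicodedata


def is_uppercase(x: str) -> bool:
    """D125: toUppercase(toNFD(X)) == toNFD(X), with str.upper() as toUppercase."""
    d = unicodedata.normalize("NFD", x)
    return d.upper() == d
```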

# Derived Property:   Is_Titlecase (ITC)
#  As defined by Unicode Standard Definition D126
#  isTitlecase(X) is true when toTitlecase(toNFD(X)) = toNFD(X)

0000..001F    ; Is_Titlecase # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Titlecase # Zs       SPACE
0021..0023    ; Is_Titlecase # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Titlecase # Sc       DOLLAR SIGN
0025..0027    ; Is_Titlecase # Po   [3] PERCENT SIGN..APOSTROPHE
...
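For a single code point, D126 can be approximated in Python with `str.title()`. This is an assumption, not an exact equivalent: `str.title()` uses Python's own word-boundary rule rather than Unicode word boundaries, so the sketch below accepts only one code point at a time (the function name `is_titlecase_cp` is hypothetical).

```python
import unicodedata


def is_titlecase_cp(ch: str) -> bool:
    """Approximates D126 for one code point, using str.title() as toTitlecase."""
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    d = unicodedata.normalize("NFD", ch)
    return d.title() == d
```

The digraphs show the distinction: U+01C5 (Dž) is already titlecase, while U+01C4 (DŽ) and U+01C6 (dž) both titlecase to U+01C5. Plain "A" also qualifies, because titlecasing leaves it unchanged.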

# Derived Property:   Is_Casefolded (ICF)
#  As defined by Unicode Standard Definition D127
#  isCasefolded(X) is true when toCasefold(toNFD(X)) = toNFD(X)

0000..001F    ; Is_Casefolded # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Casefolded # Zs       SPACE
0021..0023    ; Is_Casefolded # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Casefolded # Sc       DOLLAR SIGN
0025..0027    ; Is_Casefolded # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_Casefolded # Ps       LEFT PARENTHESIS
0029          ; Is_Casefolded # Pe       RIGHT PARENTHESIS
002A          ; Is_Casefolded # Po       ASTERISK
002B          ; Is_Casefolded # Sm       PLUS SIGN
...
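D127 follows the same pattern using case folding. Python's `str.casefold()` applies full case folding, so this short sketch is assumed to track toCasefold closely. Is_Casefolded is stricter than Is_Lowercase: U+017F LATIN SMALL LETTER LONG S is unchanged by lowercasing but folds to "s", and U+00B5 MICRO SIGN folds to Greek small mu.

```python
import unicodedata


def is_casefolded(x: str) -> bool:
    """D127: toCasefold(toNFD(X)) == toNFD(X), with str.casefold() as toCasefold."""
    d = unicodedata.normalize("NFD", x)
    return d.casefold() == d
```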

# Derived Property:   Is_Cased (IC)
#  As defined by Unicode Standard Definition D128
#  isCased(X) is true when isLowercase(X) is false, or isUppercase(X) is false, or isTitlecase(X) is false

0041..005A    ; Is_Cased # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Is_Cased # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00B5          ; Is_Cased # L&       MICRO SIGN
00C0..00D6    ; Is_Cased # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6    ; Is_Cased # L&  [31] LATIN CAPITAL LETTER O WITH STROKE..LATIN SMALL LETTER O WITH DIAERESIS
00F8..0137    ; Is_Cased # L&  [64] LATIN SMALL LETTER O WITH STROKE..LATIN SMALL LETTER K WITH CEDILLA
0139..018C    ; Is_Cased # L&  [84] LATIN CAPITAL LETTER L WITH ACUTE..LATIN SMALL LETTER D WITH TOPBAR
...
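Is_Cased (D128) is not the same as Cased (D120). U+00AA FEMININE ORDINAL INDICATOR and U+00BA MASCULINE ORDINAL INDICATOR are Cased, but no case operation changes them, so they are absent from the Is_Cased excerpt. This Python sketch scans Latin-1 and reproduces that result. As with the Is_Titlecase sketch, the titlecase test relies on `str.title()` as a single-code-point approximation, and the name `is_cased_cp` is hypothetical.

```python
import unicodedata


def is_cased_cp(ch: str) -> bool:
    """D128 for one code point: true if lowercasing, uppercasing,
    or titlecasing its NFD form changes anything."""
    d = unicodedata.normalize("NFD", ch)
    is_lower = d.lower() == d
    is_upper = d.upper() == d
    is_title = d.title() == d  # approximation of toTitlecase for one code point
    return not (is_lower and is_upper and is_title)


latin1_is_cased = {cp for cp in range(0x100) if is_cased_cp(chr(cp))}
```

The resulting set matches the listed ranges that fall inside Latin-1: 0041..005A, 0061..007A, 00B5, 00C0..00D6, 00D8..00F6, and the start of 00F8..0137.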


DerivedNormalizationProperties.txt

Add the following 2 properties:

# Derived Property:   NFKC_Casefold (NFKCCF)
#  Derived by applying CaseFolding, removing Default_Ignorable_Code_Points, and then transforming by NFKC, repeating these steps until the result is stable

# TODO: flesh out description so Eric can understand it. ;-)
# Clarify that, when the mapping is applied to a string rather than a single code point, NFC must be applied to the result.

#  All code points not explicitly listed for NFKC_Casefold
#  have a value equal to the code point.

0041  ; NFKC_Casefold; 0061           # L&  LATIN CAPITAL LETTER A
0042  ; NFKC_Casefold; 0062           # L&  LATIN CAPITAL LETTER B
0043  ; NFKC_Casefold; 0063           # L&  LATIN CAPITAL LETTER C
0044  ; NFKC_Casefold; 0064           # L&  LATIN CAPITAL LETTER D
0045  ; NFKC_Casefold; 0065           # L&  LATIN CAPITAL LETTER E
0046  ; NFKC_Casefold; 0066           # L&  LATIN CAPITAL LETTER F
0047  ; NFKC_Casefold; 0067           # L&  LATIN CAPITAL LETTER G
...
005A  ; NFKC_Casefold; 007A           # L&  LATIN CAPITAL LETTER Z
00A0  ; NFKC_Casefold; 0020           # Zs  NO-BREAK SPACE
00A8  ; NFKC_Casefold; 0020 0308      # Sk  DIAERESIS
00AA  ; NFKC_Casefold; 0061           # L&  FEMININE ORDINAL INDICATOR
00AD  ; NFKC_Casefold;                # Cf  SOFT HYPHEN
00AF  ; NFKC_Casefold; 0020 0304      # Sk  MACRON
00B2  ; NFKC_Casefold; 0032           # No  SUPERSCRIPT TWO
00B3  ; NFKC_Casefold; 0033           # No  SUPERSCRIPT THREE
00B4  ; NFKC_Casefold; 0020 0301      # Sk  ACUTE ACCENT
...
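The iterative definition can be sketched in Python as follows. `str.casefold()` provides full case folding and `unicodedata.normalize("NFKC", ...)` provides NFKC. The standard library does not expose Default_Ignorable_Code_Point, so `DEFAULT_IGNORABLE_SUBSET` below is a deliberately partial, hypothetical stand-in. A real implementation would load the full property from DerivedCoreProperties.txt.

```python
import unicodedata

# HYPOTHETICAL, deliberately partial stand-in for Default_Ignorable_Code_Point.
DEFAULT_IGNORABLE_SUBSET = (
    {0x00AD, 0x034F, 0x2060, 0xFEFF}
    | set(range(0x200B, 0x2010))  # ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK
    | set(range(0xFE00, 0xFE10))  # VARIATION SELECTOR-1 .. VARIATION SELECTOR-16
)


def nfkc_casefold(s: str) -> str:
    """Case-fold, drop default ignorables, apply NFKC; repeat until nothing changes."""
    while True:
        folded = s.casefold()  # full case folding
        kept = "".join(ch for ch in folded if ord(ch) not in DEFAULT_IGNORABLE_SUBSET)
        result = unicodedata.normalize("NFKC", kept)
        if result == s:  # stable: another pass would not change it
            return result
        s = result
```

The repetition matters when NFKC produces new cased characters. U+2122 TRADE MARK SIGN has no case folding, but NFKC turns it into "TM", so a second pass is needed to reach "tm".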

# Derived Property:   Is_NFKC_Casefold (isNFKCCF)
#  isNFKCCasefold(X) is true when NFKC_Casefold(X) = X

0000..001F    ; Is_NFKC_Casefold # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_NFKC_Casefold # Zs       SPACE
0021..0023    ; Is_NFKC_Casefold # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_NFKC_Casefold # Sc       DOLLAR SIGN
0025..0027    ; Is_NFKC_Casefold # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_NFKC_Casefold # Ps       LEFT PARENTHESIS
0029          ; Is_NFKC_Casefold # Pe       RIGHT PARENTHESIS
002A          ; Is_NFKC_Casefold # Po       ASTERISK
...
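The per-code-point NFKC_Casefold table and the string-level behavior are related as follows. Unlisted code points map to themselves, and a mapped string must be put into NFC afterwards because adjacent characters can interact. This Python sketch reads three mapping lines in the format shown above. The helper names `parse_mappings`, `map_string`, and `is_nfkc_casefold` are hypothetical, and the sketch assumes the file lists only code points whose mapping differs from themselves.

```python
import unicodedata

# A few lines in the NFKC_Casefold mapping format shown above.
SAMPLE = """\
0041  ; NFKC_Casefold; 0061           # L&  LATIN CAPITAL LETTER A
00A8  ; NFKC_Casefold; 0020 0308      # Sk  DIAERESIS
00AD  ; NFKC_Casefold;                # Cf  SOFT HYPHEN
"""


def parse_mappings(text):
    """Map code point -> replacement string; an empty target means 'delete'."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if ";" not in data:
            continue
        cp, _prop, target = (field.strip() for field in data.split(";"))
        table[int(cp, 16)] = "".join(chr(int(h, 16)) for h in target.split())
    return table


def map_string(s, table):
    """Map each code point (unlisted ones map to themselves), then apply NFC."""
    mapped = "".join(table.get(ord(ch), ch) for ch in s)
    return unicodedata.normalize("NFC", mapped)


def is_nfkc_casefold(s, table):
    return map_string(s, table) == s


table = parse_mappings(SAMPLE)
```

With only these three entries, "A\u0301" maps to "a\u0301", which NFC composes to U+00E1. For the same reason, "a\u0301" is not Is_NFKC_Casefold, while precomposed U+00E1 is.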

Text

Add references to these properties under the corresponding definitions in the text of the standard, and in UAX #31.

