*TODO Misc property proposal
1. Possible stability policy invariant
Z = whitespace - [\u0009-\u000D \u0085]
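The invariant above can be checked mechanically. The following is a minimal sketch, assuming the White_Space set is the one listed in PropList.txt for Unicode 6.3 and later (hardcoded here, since Python's standard library does not expose that property) and that "Z" means General_Category Zs, Zl, or Zp. The names WHITE_SPACE, CONTROLS, separators, and invariant_holds are illustrative only.

```python
import sys
import unicodedata

# Assumption: White_Space code points transcribed by hand from PropList.txt
# (Unicode 6.3 and later, where U+180E is no longer White_Space).
WHITE_SPACE = (
    set(range(0x0009, 0x000E))
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200B))
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

# The control characters subtracted in the proposed invariant.
CONTROLS = set(range(0x0009, 0x000E)) | {0x0085}


def separators():
    """Return all code points whose General_Category is Zs, Zl, or Zp."""
    return {
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith("Z")
    }


def invariant_holds():
    """Check Z == White_Space - [U+0009-U+000D, U+0085] for this Python's Unicode data."""
    return separators() == WHITE_SPACE - CONTROLS
```

The result depends on the Unicode version bundled with the interpreter (see unicodedata.unidata_version), so a failure may indicate either a broken invariant or a stale hardcoded list.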
2. Add to Default_Ignorable_Code_Point
U+FFFC (  ) OBJECT REPLACEMENT CHARACTER
3. Change Scripts from Common to Specific for the following Other_Symbols
U+0CF1 ( ೱ ) KANNADA SIGN JIHVAMULIYA
U+0CF2 ( ೲ ) KANNADA SIGN UPADHMANIYA
U+FDFD ( ﷽ ) ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
About 1000 symbols have specific scripts:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:s:]-[:script=common:]]
[༔ ፠ ᥀ ႞ ႟ ͵ ᎐-᎙ ҂ ؈ ؎ ؏ ۩ ߶ ৺ ୰ ௳-௸ ௺ ౿ ൹ ꠨-꠫ ༁-༃ ༓ ༕-༗ ༚-༟ ༴ ༶ ༸ ྾-࿅ ࿇-࿌ ࿎ ࿏ ᧠-᧿ ᭡-᭪ ᭴-᭼ ϶ ؆ ؇ ⳥-⳪ ⠀-⣿ ꒐-꓆ 𐅹-𐆉 𝈀-𝉁 𝉅 ؋ ৲ ৳ ૱ ௹ ៛ ۽ ۾ ⺦ ⺀ ⺄ ⺃ ⺂ ⺅-⺋ ⺁ ⺌ ⺍ ⺐⺎ ⺏⺑-⺓ ⺕ ⺔ ⺗ ⺖ ⺘ ⺙ ⺛-⺞ ⺠-⺥ ⺧-⺰ ⺲⺵ ⺱⺳ ⺴ ⺶-⺹ ⺻ ⺺ ⺼-⻂ ⻄ ⻃ ⻅-⻝ ⻟⻞ ⻠-⻲]
with decomposition (dt!=none):
[` ΄´ ΅῭῁ ᾽᾿῎῍῏ ῾῞῝῟ ῀ ㈀ ㈎ ㈁ ㈏ ㈂ ㈐ ㈃ ㈑ ㈄ ㈒ ㈅ ㈓ ㈆ ㈔ ㈇ ㈕ ㈝ ㈞ ㈈ ㈖ ㈜ ㈉ ㈗ ㈊ ㈘ ㈋ ㈙ ㈌ ㈚ ㈍ ㈛ ﬩ ﷼ ㉠ ㉮ ㉡ ㉯ ㉢ ㉰ ㉣ ㉱ ㉤ ㉲ ㉥ ㉳ ㉦ ㉴ ㉧ ㉵ ㉾ ㉨ ㉶ ㉽ ㉩ ㉷ ㉼ ㉪ ㉸ ㉫ ㉹ ㉬ ㉺ ㉭ ㉻ ㋐ ㌃ ㌀-㌂ ㋑ ㌄ ㌅ ㋒ ㌆ ㋓ ㌈ ㌇ ㋔ ㌊ ㌉ ㋕ ㌋-㌏ ㋖ ㌐-㌗ ㋗ ㌘-㌛ ㋘ ㌜ ㋙ ㌞ ㌝ ㋚ ㌟ ㌠ ㋛ ㌡ ㋜ ㋝ ㌢ ㌣ ㋞ ㋟ ㌤ ㋠-㋢ ㌥ ㋣ ㌦ ㌧ ㋤ ㌨ ㋥-㋨ ㌩ ㋩ ㌫-㌭ ㌪ ㋪ ㌮-㌱ ㋫ ㌲-㌵ ㋬ ㌻ ㌼ ㌶-㌺ ㋭ ㍁ ㍂ ㌽-㍀ ㋮ ㍃-㍇ ㋯ ㍈-㍊ ㋰ ㋱ ㍍ ㍋ ㍌ ㋲ ㋳ ㍎ ㍏ ㋴ ㍐ ㋵-㋷ ㍑ ㍒ ㋸ ㍔ ㍓ ㋹ ㍕ ㍖ ㋺ ㋻ ㍗ ㋼-㋾ ⼀-⽏ ⺟ ⽐-⿔ ⻳ ⿕]
4. We are inconsistent regarding manufactured symbols
㉠ is Script=Hangul, Other_Symbol
㋐ is Script=Katakana, Other_Symbol
⒜ is Script=Common, Other_Symbol
㊏ is Script=Common, Other_Symbol
㈹ is Script=Common, Other_Symbol
ⓐ is Script=Common, Other_Symbol
U+32CD ( ㋍ ) SQUARE ERG is Script=Common, Other_Symbol
U+3300 ( ㌀ ) SQUARE APAATO is Script=Katakana, Other_Symbol
U+337F ( ㍿ ) SQUARE CORPORATION is Script=Common, Other_Symbol
µ is Script=Common, Lowercase_Letter
U+1D400 ( 𝐀 ) MATHEMATICAL BOLD CAPITAL A is Script=Common, Uppercase_Letter
There is no principled reason why ㉠ or ㌀ should not have Script=Common, like the others. Either they should change to Common, or the others should be given their specific scripts.
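The General_Category and decomposition type of the examples above can be inspected as follows. This is only a sketch: Python's standard unicodedata module does not expose the Script property, so the Script values quoted above are not verified here, and the helper name describe is hypothetical.

```python
import unicodedata

# Code points for the examples listed above.
SAMPLES = [
    "\u3260",      # CIRCLED HANGUL KIYEOK
    "\u32D0",      # CIRCLED KATAKANA A
    "\u249C",      # PARENTHESIZED LATIN SMALL LETTER A
    "\u24D0",      # CIRCLED LATIN SMALL LETTER A
    "\u32CD",      # SQUARE ERG
    "\u3300",      # SQUARE APAATO
    "\u00B5",      # MICRO SIGN
    "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
]


def describe(ch):
    """Return (code point, General_Category, decomposition type) for one character."""
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        dt = "none"
    elif decomp.startswith("<"):
        dt = decomp.split()[0]  # compatibility tag such as <circle> or <font>
    else:
        dt = "canonical"
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), dt)


if __name__ == "__main__":
    for ch in SAMPLES:
        print(*describe(ch), unicodedata.name(ch))
```

Every sample has a compatibility decomposition, which is what makes these "manufactured" symbols easy to find with a dt!=none filter.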
About 1000 letters have Common Script, typically where they are functionally symbols:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:Letter:]-[:^script=common:]]
[ـ ʹ ʺ ˆ-ˏ ˬ ꜗ-ꜟ ꞈ ː ˑ 〱-〵 ー ʻ ʽ ˀ ʼ ˮ ʾ ʿ ˁ ⸯ 〼 〆]
with decomposition:
[µʹℂℇℊ-ℓℕℙ-ℝℤℨℬℭℯ-ℱℳ-ℹ ℼ-ℿⅅ-ⅉー゙゚𝐀-𝑔𝑖-𝒜𝒞𝒟𝒢𝒥𝒦𝒩-𝒬 𝒮-𝒹𝒻𝒽-𝓃𝓅-𝔅𝔇-𝔊𝔍-𝔔𝔖-𝔜𝔞-𝔹 𝔻-𝔾𝕀-𝕄𝕆𝕊-𝕐𝕒-𝚥𝚨-𝛀𝛂-𝛚𝛜-𝛺 𝛼-𝜔𝜖-𝜴𝜶-𝝎𝝐-𝝮𝝰-𝞈𝞊-𝞨𝞪-𝟂 𝟄-𝟋]
5. I was using the subblocks (from NamesList) to help organize, and found the following:
Singular plus Plural forms.
In many cases, we have a subblock whose name is the singular form of a plural name used elsewhere. I'd suggest we always use the plural form for nouns or noun phrases, even if there is only a single character in the subblock. That is, "XXX symbol" => "XXX symbols". Adjectival phrases ("Arabic") would be left alone.
This would affect the following (plus perhaps more: I only checked singular forms where there was a regular plural also).
Additional consonant => Additional consonants
Additional letter => Additional letters
Circle => Circles
Circled Hangul syllable => Circled Hangul syllables
Consonant => Consonants
Currency symbol => Currency symbols
Dependent vowel sign => Dependent vowel signs
Double diacritic => Double diacritics
Extended Arabic letter => Extended Arabic letters
Format character => Format characters
Gender symbol => Gender symbols
Keyboard symbol => Keyboard symbols
Latin letter => Latin letters
Letter => Letters
Letterlike symbol => Letterlike symbols
Medievalist addition => Medievalist additions
Miscellaneous addition => Miscellaneous additions
Miscellaneous arrow => Miscellaneous arrows
Miscellaneous mark => Miscellaneous marks
Miscellaneous mathematical operator => Miscellaneous mathematical operators
Miscellaneous mathematical symbol => Miscellaneous mathematical symbols
Miscellaneous symbol => Miscellaneous symbols
Modifier letter => Modifier letters
Operator => Operators
Relation => Relations
Rest => Rests
Sign => Signs
Space => Spaces
Special => Specials
Special character extension => Special character extensions
Squared Latin abbreviation => Squared Latin abbreviations
Subjoined consonant => Subjoined consonants
Syllable => Syllables
Symbol => Symbols
Tamil symbol => Tamil symbols
Variant letterform => Variant letterforms
Vertical line operator => Vertical line operators
Vowel => Vowels
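The check described above (singular names for which a regular plural also exists) can be sketched as below. The input is assumed to be a plain list of subblock names already extracted from NamesList; the function names are hypothetical, and only the regular "+s" plural is handled, which covers every entry in the list above.

```python
def singular_plural_pairs(subblock_names):
    """Return the names N for which the regular plural N + 's' is also present."""
    names = set(subblock_names)
    return sorted(n for n in names if n + "s" in names)


def pluralize_subblocks(subblock_names):
    """Rewrite each singular name that has an attested plural to that plural form."""
    singulars = set(singular_plural_pairs(subblock_names))
    return [n + "s" if n in singulars else n for n in subblock_names]
```

Adjectival names such as "Arabic" pass through unchanged, because no "Arabics" subblock exists to pair them with.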
There are a number of cases where the subblock equals the block.
Singletons: If it is a singleton (the only subblock in the block), add more of a gloss.
Braille Patterns
CJK Radicals Supplement
CJK Strokes
Currency Symbols
Ideographic Description Characters
Kanbun
Kangxi Radicals
Small Form Variants
Variation Selectors
Yi Radicals
Yijing Hexagram Symbols
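Cases where a subblock name equals its block name can be found with a sketch like the following. It assumes a hypothetical mapping from block name to the list of its subblock names (not an actual NamesList parser) and compares names case-insensitively, which is also an assumption.

```python
def blocks_named_by_subblock(subblocks_by_block):
    """Split blocks that contain a same-named subblock into singleton and non-singleton cases."""
    result = {"singleton": [], "non_singleton": []}
    for block, subblocks in sorted(subblocks_by_block.items()):
        lowered = {s.lower() for s in subblocks}
        if block.lower() not in lowered:
            continue  # no subblock repeats the block name
        kind = "singleton" if len(subblocks) == 1 else "non_singleton"
        result[kind].append(block)
    return result
```

Blocks reported as singletons have that one subblock as their only subblock; all other matches have at least one sibling subblock.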
Non-Singletons: also change these, either by narrowing the description or by changing it to just "Miscellaneous". It doesn't make sense for block X to contain subblocks X and Y; that would imply both that Y's are X's and that Y's are not X's.
Block Elements
CJK Symbols And Punctuation
Combining Diacritical Marks For Symbols
Combining Half Marks
General Punctuation
Geometric Shapes
IPA Extensions
Letterlike Symbols
Miscellaneous Symbols
Miscellaneous Technical