Root locale and fallback
Issues
ULocale#getFallback() never reaches ULocale.ROOT (ticket #6673); instead, the final locale is the empty locale (new ULocale("")), followed by null.
Question from Markus: Should any of the fallbacks ever get to null? Should they not stop at the root locale?
ULocale.getFallback(String) never reaches "root"; the final fallback is "".
ULocale.getFallback(String) may return a locale string ending with an empty segment. For example, "en__POSIX" -> "en_" -> "en" -> "".
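To make the trailing-underscore problem concrete, here is a simplified Java model of the current string fallback. It assumes the fallback simply truncates at the last '_'; the real ICU4J code also handles '@' keywords, which this sketch ignores. The class and method names are hypothetical.

```java
// Simplified, hypothetical model of the current ULocale.getFallback(String)
// behavior: chop everything from the last '_' on. '@' keywords are ignored.
public final class CurrentFallbackModel {

    // Returns the ID truncated at its last '_', or "" if there is none.
    static String fallback(String id) {
        int last = id.lastIndexOf('_');
        return last < 0 ? "" : id.substring(0, last);
    }

    public static void main(String[] args) {
        String id = "en__POSIX";
        // Walk four steps of the chain, quoting each ID so "" is visible.
        for (int step = 0; step < 4; step++) {
            System.out.print("\"" + id + "\"" + (step < 3 ? " -> " : "\n"));
            id = fallback(id);
        }
    }
}
```

This prints "en__POSIX" -> "en_" -> "en" -> "". The second step yields "en_" because only the variant segment is removed, leaving the empty country segment behind.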
Design Questions
What is the canonical representation of root locale?
Three possible options: "" (empty string), "root", or "und" (undetermined).
JDK 1.6 added Locale.ROOT, which uses an empty language: new Locale("", "", "").
{Yoshito} I prefer "" (empty string) for several reasons:
Logical (no special handling)
Same as Java
However, there is a backward-compatibility concern: can we change ULocale.ROOT from ULocale("root") to ULocale("") now?
Normalization
What should be done in the locale constructor?
What is the expected behavior of ULocale.canonicalize(String)?
{Yoshito} canonicalize should normalize casing and apply the following mappings:
Deprecated ICU locales
fr_FR_PREEURO -> fr_FR@currency=FRF
hi__DIRECT -> hi@collation=direct
other variants mapped to keywords
Grandfathered BCP 47 tags
art_LOJBAN -> jbo
zh_HAKKA / zh__HAKKA -> hak
other BCP47 grandfathered tags: preferred mapping
POSIX
C -> en_US_POSIX
.NET names
az_AZ_CYRL -> az_Cyrl_AZ
zh_CHS -> zh_Hans
Common mistakes
three-letter language codes (eng) that have two-letter equivalents
three-letter region codes (xxx) that have two-letter equivalents
three-digit codes (813) that have two-letter equivalents
swapping script and region code (see also .NET names above)
Deprecated codes
iw -> he
some others
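The mappings listed above lend themselves to a table-driven approach. The following is a minimal Java sketch, not ICU4J's implementation: the class name, the two tables, and the casing rules (lowercase language, title-case four-letter script in second position, uppercase everything else) are assumptions, and the tables contain only examples taken from the list above.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical table-driven canonicalizer; not the ICU4J implementation.
// Tables hold only the examples from the notes above.
public final class CanonicalizeSketch {

    // Whole-ID replacements (deprecated ICU IDs, grandfathered tags, POSIX, .NET).
    private static final Map<String, String> ID_MAP = Map.of(
            "fr_FR_PREEURO", "fr_FR@currency=FRF",
            "hi__DIRECT", "hi@collation=direct",
            "art_LOJBAN", "jbo",
            "zh_HAKKA", "hak",
            "zh__HAKKA", "hak",
            "C", "en_US_POSIX",
            "az_AZ_CYRL", "az_Cyrl_AZ",
            "zh_CHS", "zh_Hans");

    // Language-subtag replacements (three-letter and deprecated codes).
    private static final Map<String, String> LANG_MAP = Map.of(
            "eng", "en",
            "iw", "he");

    static String canonicalize(String id) {
        // Exact-match lookup first; this sketch does not case-fold table keys.
        String mapped = ID_MAP.get(id);
        if (mapped != null) {
            return mapped;
        }
        String[] parts = id.split("_", -1); // -1 keeps empty segments
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i];
            if (i == 0) {
                // Language: lowercase, then replace if listed.
                p = p.toLowerCase(Locale.ROOT);
                p = LANG_MAP.getOrDefault(p, p);
            } else if (i == 1 && p.length() == 4) {
                // Four-letter second segment treated as a script: title case.
                p = p.substring(0, 1).toUpperCase(Locale.ROOT)
                        + p.substring(1).toLowerCase(Locale.ROOT);
            } else {
                // Region and variant segments: uppercase.
                p = p.toUpperCase(Locale.ROOT);
            }
            parts[i] = p;
        }
        return String.join("_", parts);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"iw_IL", "ENG_us", "zh_hant_tw", "zh_CHS", "C"}) {
            System.out.println(id + " -> " + canonicalize(id));
        }
    }
}
```

Three-letter and three-digit region codes, and script/region swaps other than the listed .NET names, are not covered; they would need further tables of the same shape.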
Proposed Changes
ULocale.ROOT
current: new ULocale("root");
proposed: new ULocale("");
ULocale#getFallback()
current: ULocale("en__POSIX") -> ULocale("en_") -> ULocale("en") -> ULocale("") -> null
proposed: ULocale("en__POSIX") -> ULocale("en") -> ULocale("") -> null
ULocale.getFallback(String)
current: "en__POSIX" -> "en_" -> "en" -> "" -> ""
proposed: "en__POSIX" -> "en" -> "" -> null?
Conclusions
A conference call was held for discussing these design questions on 2009-11-17. Attendees: Mark, Markus, Doug, Umesh and Yoshito.
Our conclusions are below:
ULocale.ROOT.toString() == "", not "root"
BCP47
ULocale.ROOT to BCP47 "und"
Locale.ROOT to BCP47 "und"
BCP47 "und" to ULocale ""
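The JDK's java.util.Locale already behaves this way for its own root locale, which gives a reference point for the ULocale mapping. The check below uses only standard JDK methods (toLanguageTag and forLanguageTag need Java 7 or later; the three-argument Locale constructor is deprecated in recent JDKs but still available). The class name is arbitrary, and the check says nothing about ICU4J's ULocale itself.

```java
import java.util.Locale;

// Standard JDK calls only (Java 7+ for the BCP 47 methods).
public final class RootTagCheck {
    public static void main(String[] args) {
        // Locale.ROOT is the locale with empty language, country and variant.
        System.out.println(Locale.ROOT.equals(new Locale("", "", ""))); // true
        System.out.println("[" + Locale.ROOT + "]");                     // []
        // Root locale -> BCP 47 "und".
        System.out.println(Locale.ROOT.toLanguageTag());                 // und
        // BCP 47 "und" -> empty language.
        Locale und = Locale.forLanguageTag("und");
        System.out.println("[" + und.getLanguage() + "]");               // []
    }
}
```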
getFallback() truncates the nominal form, not the canonical form; it never leaves a trailing underscore and works purely on '_'-separated strings.
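A minimal Java sketch of this string fallback, assuming it operates only on the nominal '_'-separated ID and, following the conclusion for getFallback(String) recorded below, maps "" to "". Keywords after '@' are ignored, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the agreed string fallback; not ICU4J code.
// Works only on the nominal '_'-separated ID; '@' keywords are ignored.
public final class NominalFallbackSketch {

    // Drops the last '_'-separated segment, then any '_' left at the end,
    // so the result never ends with an empty segment. "" falls back to "".
    static String fallback(String id) {
        int end = id.lastIndexOf('_');
        if (end < 0) {
            return "";
        }
        while (end > 0 && id.charAt(end - 1) == '_') {
            end--;
        }
        return id.substring(0, end);
    }

    public static void main(String[] args) {
        String id = "en__POSIX";
        StringBuilder chain = new StringBuilder("\"" + id + "\"");
        // Follow the chain until the empty (root) ID is reached.
        while (!id.isEmpty()) {
            id = fallback(id);
            chain.append(" -> \"").append(id).append('"');
        }
        System.out.println(chain); // "en__POSIX" -> "en" -> ""
    }
}
```

Stripping the extra underscores is what removes the "en_" step seen in the current behavior.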
class ULocale {
...
static ULocale getCanonicalInstance(String); // Factory
ULocale getCanonicalEquivalent(); // Uses cached internal pointer
...
};
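One way to read "uses cached internal pointer" is a lazily computed, memoized reference from a nominal instance to its canonical equivalent. The Java sketch below illustrates that reading only. The class LocaleIdSketch, its fields, the of() factory, and the toy canonicalize() rule (lowercase the language, map "und" to "") are hypothetical; only the two method names mirror the sketch above.

```java
import java.util.Locale;

// Hypothetical illustration of getCanonicalInstance()/getCanonicalEquivalent()
// with a memoized canonical reference; not the ICU4J API or implementation.
public final class LocaleIdSketch {
    private final String nominal;      // ID exactly as the caller supplied it
    private LocaleIdSketch canonical;  // cached canonical equivalent, set lazily

    private LocaleIdSketch(String nominal) {
        this.nominal = nominal;
    }

    // Ordinary construction path: keeps the nominal form.
    static LocaleIdSketch of(String id) {
        return new LocaleIdSketch(id);
    }

    // Factory: builds an already-canonical instance that is its own equivalent.
    static LocaleIdSketch getCanonicalInstance(String id) {
        LocaleIdSketch c = new LocaleIdSketch(canonicalize(id));
        c.canonical = c;
        return c;
    }

    // Computes the canonical equivalent once, then returns the cached instance.
    // Not thread-safe as written; fine for a single-threaded illustration.
    LocaleIdSketch getCanonicalEquivalent() {
        if (canonical == null) {
            canonical = getCanonicalInstance(nominal);
        }
        return canonical;
    }

    @Override
    public String toString() {
        return nominal;
    }

    // Toy rule for this sketch only: lowercase the language and map "und" to "".
    private static String canonicalize(String id) {
        int sep = id.indexOf('_');
        String lang = (sep < 0 ? id : id.substring(0, sep)).toLowerCase(Locale.ROOT);
        String rest = sep < 0 ? "" : id.substring(sep);
        return (lang.equals("und") ? "" : lang) + rest;
    }

    public static void main(String[] args) {
        LocaleIdSketch loc = LocaleIdSketch.of("und_DE");
        System.out.println(loc);                          // und_DE
        System.out.println(loc.getCanonicalEquivalent()); // _DE
        System.out.println(loc.getCanonicalEquivalent() == loc.getCanonicalEquivalent()); // true
    }
}
```

Keeping the nominal string intact while caching the canonical instance lets getFallback() keep working on the nominal form.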
Canonicalize "und" to "". ULocale("und-DE") will have
lang = "und", region = "DE"
canonical lang = "", canonical region = "DE"
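The JDK's java.util.Locale applies the same "und" rule when parsing a BCP 47 tag, which gives a concrete reference for the canonical fields listed above. Only standard JDK calls are used; note that the JDK keeps only the canonical form, so it has no counterpart to the nominal lang = "und".

```java
import java.util.Locale;

// Standard JDK calls only (Java 7+).
public final class UndRegionCheck {
    public static void main(String[] args) {
        Locale l = Locale.forLanguageTag("und-DE");
        System.out.println("[" + l.getLanguage() + "]"); // []  ("und" becomes "")
        System.out.println(l.getCountry());              // DE
        System.out.println(l);                           // _DE
        System.out.println(l.toLanguageTag());           // und-DE
    }
}
```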
Resource bundles
root.res
de.res
de-DE.res
root-DE.res
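The bundle files above suggest a lookup chain in which the empty root locale ID is stored under the file name root.res. The hypothetical Java sketch below assumes the nominal-string fallback agreed earlier, the '_' separator (the list above writes de-DE.res), and a name mapping of "" to "root". None of these names come from ICU, and how region-only IDs such as the root-DE.res entry are named is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bundle lookup chain; file naming and fallback rule are assumptions.
public final class BundleChainSketch {

    // Drops the last '_'-separated segment and any trailing '_'; "" stays "".
    static String fallback(String id) {
        int end = id.lastIndexOf('_');
        if (end < 0) {
            return "";
        }
        while (end > 0 && id.charAt(end - 1) == '_') {
            end--;
        }
        return id.substring(0, end);
    }

    // Lists candidate files from most to least specific; the empty (root)
    // locale ID is stored under the name "root".
    static List<String> candidateFiles(String localeId) {
        List<String> files = new ArrayList<>();
        String id = localeId;
        while (!id.isEmpty()) {
            files.add(id + ".res");
            id = fallback(id);
        }
        files.add("root.res");
        return files;
    }

    public static void main(String[] args) {
        System.out.println(candidateFiles("de_DE")); // [de_DE.res, de.res, root.res]
    }
}
```

The loop stops once the empty ID has been mapped to root.res, so the lookup itself never needs a null step.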
en -> "" -> null?
Yes in Java ULocale#getFallback() - e.g. ULocale("en") -> ULocale("") -> null
No in Java ULocale#getFallback(String) - e.g. "en" -> "" -> ""
No in C++
Document this behavior well.