Representation of alternate data values in ICU
We are looking for a uniform way to give access to CLDR alternate values (@alt="something") within ICU's resource bundles. There are several different scenarios, which differ both in how the values are used and in the desired lookup style.
Current places in CLDR where we have alternate values:
1. Short or variant spellings of language names - such as:
<language type="az">Azerbaijani</language>
<language type="az" alt="short">Azeri</language>
2. Stand-alone forms of script names - such as:
<script type="Hans">Simplified</script>
<script type="Hans" alt="stand-alone">Simplified Han</script>
3. Short or variant forms of country names - such as:
<territory type="HK">Hong Kong SAR China</territory>
<territory type="HK" alt="short">Hong Kong</territory>
4. Variant forms of the day period names - such as:
<dayPeriodWidth type="wide">
  <dayPeriod type="am">AM</dayPeriod>
  <dayPeriod type="am" alt="variant">a.m.</dayPeriod>
  <dayPeriod type="noon">noon</dayPeriod>
  <dayPeriod type="pm">PM</dayPeriod>
  <dayPeriod type="pm" alt="variant">p.m.</dayPeriod>
</dayPeriodWidth>
5. Short forms of unit names - such as:
<unit type="second">
  <unitPattern count="one">{0} second</unitPattern>
  <unitPattern count="one" alt="short">{0} sec</unitPattern>
  <unitPattern count="other">{0} seconds</unitPattern>
  <unitPattern count="other" alt="short">{0} secs</unitPattern>
</unit>
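As a rough illustration of how the alt attribute separates otherwise identical items, the sketch below uses Python's standard xml.etree.ElementTree to read the unit example above and group the patterns by count and alt. Values with no alt attribute are filed under the key "default", a placeholder name chosen for this sketch; this is not how the CLDR tooling (e.g. LDML2ICUConversion) actually processes the data.

```python
import xml.etree.ElementTree as ET

# The unit example from the list above, as a standalone XML fragment.
UNIT_XML = """
<unit type="second">
  <unitPattern count="one">{0} second</unitPattern>
  <unitPattern count="one" alt="short">{0} sec</unitPattern>
  <unitPattern count="other">{0} seconds</unitPattern>
  <unitPattern count="other" alt="short">{0} secs</unitPattern>
</unit>
"""


def group_by_alt(xml_text):
    """Map count -> {alt -> pattern}; elements without alt go under "default"."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for elem in root.findall("unitPattern"):
        count = elem.get("count")
        alt = elem.get("alt", "default")  # "default" is a placeholder for "no alt"
        result.setdefault(count, {})[alt] = elem.text
    return result


patterns = group_by_alt(UNIT_XML)
print(patterns["other"]["short"])  # {0} secs
```

Each item thus carries at most one plain value plus any number of named alternates, which is the shape every proposal below has to encode.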
Possible ways to do the lookup for an item in locale foo-bar-zip with alt=short:
A. ("phonebook" style) This is what we get from the existing internal lookup function, which takes a path to the requested item, if we call it once with "foo/bar/zip%short" and, when that fails, call it again with "foo/bar/zip":
foo-bar-zip alt=short
foo-bar alt=short
foo alt=short
foo-bar-zip
foo-bar
foo
B. ("nice to have" style) The alternate value is nice to have, but getting a value from the requested locale matters more:
foo-bar-zip alt=short
foo-bar-zip
foo-bar alt=short
foo-bar
foo alt=short
foo
C. ("exact match" style) For places where the non-alt version is not a good fallback for the alt version, i.e. the alt version has specific semantics that the lookup must enforce:
foo-bar-zip alt=short
foo-bar alt=short
foo alt=short
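The three orders can be generated mechanically. This minimal sketch assumes simple truncation fallback (foo-bar-zip, foo-bar, foo), ignores root and any non-truncation parent locales, and models the data as a Python dict keyed by (locale, alt) pairs; parent_chain, lookup_order and lookup are hypothetical helpers, not ICU functions.

```python
def parent_chain(locale):
    """Truncation fallback: "foo-bar-zip" -> ["foo-bar-zip", "foo-bar", "foo"]."""
    parts = locale.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup_order(locale, alt, style):
    """Return the (locale, alt) pairs in the order they are tried.

    An alt of None stands for the plain (non-alt) value.
    """
    chain = parent_chain(locale)
    if style == "A":  # phonebook: whole chain with alt, then whole chain without
        return [(loc, alt) for loc in chain] + [(loc, None) for loc in chain]
    if style == "B":  # nice to have: alt, then plain, at each locale level
        return [pair for loc in chain for pair in ((loc, alt), (loc, None))]
    if style == "C":  # exact match: alt values only, never the plain value
        return [(loc, alt) for loc in chain]
    raise ValueError(f"unknown lookup style: {style}")


def lookup(data, locale, alt, style):
    """Return the first value found along the lookup order, or None."""
    for key in lookup_order(locale, alt, style):
        if key in data:
            return data[key]
    return None


# Hypothetical data: a plain value in foo-bar and a short value only in foo.
data = {("foo-bar", None): "plain from foo-bar", ("foo", "short"): "short from foo"}
print(lookup(data, "foo-bar-zip", "short", "A"))  # short from foo
print(lookup(data, "foo-bar-zip", "short", "B"))  # plain from foo-bar
print(lookup(data, "foo-bar-zip", "short", "C"))  # short from foo
```

Styles A and C agree here because a short value exists somewhere in the chain. With no short value at all, A would fall back to the plain foo-bar value, while C would return None.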
Likely desired lookups for each scenario above:
Short or variant spellings of language names - A or B
Stand-alone forms of script names - C
Short or variant forms of country names - A or B
Variant forms of the day period names - B
Short forms of unit names - A
PROPOSED structures for the data:
Original Proposal (John & Yoshito):
en{
  Countries{
    GY{"Guyana"}
    HK{
      default{"Hong Kong SAR China"}
      short{"Hong Kong"}
    }
    HM{"Heard Island and McDonald Islands"}
  }
}
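Downside: Lack of forward compatibility. When an alternate is added for an item, that item changes from a string (Countries/HK) into a table whose default value lives at a different path (Countries/HK/default), so code that reads Countries/HK as a string no longer works, and mapping an XPath to its value in LDML2ICUConversion becomes difficult.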
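To show what callers would face with this structure, here is a sketch that models the en bundle as nested Python dicts (an assumption standing in for ICU resource tables; get_country is a hypothetical helper, not an ICU API). Because HK is a table while GY is a string, every caller has to handle both shapes.

```python
# "Original Proposal" data: a plain string when an item has no alternates,
# a table with "default" plus one key per alt value when it does.
EN = {
    "Countries": {
        "GY": "Guyana",
        "HK": {"default": "Hong Kong SAR China", "short": "Hong Kong"},
        "HM": "Heard Island and McDonald Islands",
    }
}


def get_country(bundle, code, alt=None):
    """Return the alt value if present, else the default (single locale only)."""
    value = bundle["Countries"].get(code)
    if isinstance(value, dict):  # item has alternates
        if alt is not None and alt in value:
            return value[alt]
        return value["default"]
    return value  # plain string: no alternates exist for this item


print(get_country(EN, "HK", "short"))  # Hong Kong
print(get_country(EN, "HK"))  # Hong Kong SAR China
print(get_country(EN, "GY", "short"))  # Guyana
# A naive reader of Countries/HK gets a table, not a string:
print(type(EN["Countries"]["HK"]).__name__)  # dict
```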
MARK's idea:
en{
  Countries{
    GY{"Guyana"}
    HK{"Hong Kong SAR China"}
    HK_alt{
      short{"Hong Kong"}
    }
    HM{"Heard Island and McDonald Islands"}
  }
}
Downside: Difficult to count/enumerate values, because you have to "sift out" the alternate values.
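A sketch of the same lookups under this layout, again with a Python dict standing in for the resource table and hypothetical helper names: the alt lookup only needs to probe a companion HK_alt table, but enumerating the countries has to skip every key ending in _alt.

```python
# MARK's layout: alternates live in a companion "<code>_alt" table.
COUNTRIES = {
    "GY": "Guyana",
    "HK": "Hong Kong SAR China",
    "HK_alt": {"short": "Hong Kong"},
    "HM": "Heard Island and McDonald Islands",
}


def get_country(countries, code, alt=None):
    """Prefer countries[code + "_alt"][alt]; otherwise use the plain value."""
    if alt is not None:
        alternates = countries.get(code + "_alt", {})
        if alt in alternates:
            return alternates[alt]
    return countries.get(code)


def country_codes(countries):
    """Enumerate real items by sifting out the "_alt" companion entries."""
    return [key for key in countries if not key.endswith("_alt")]


print(get_country(COUNTRIES, "HK", "short"))  # Hong Kong
print(len(COUNTRIES), len(country_codes(COUNTRIES)))  # 4 3
```

The raw table size (4) no longer equals the number of countries (3), which is the enumeration cost noted above.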
MARKUS's idea:
en{
  Countries{
    GY{"Guyana"}
    HK{"Hong Kong SAR China"}
    HK%short{"Hong Kong"}
    HM{"Heard Island and McDonald Islands"}
  }
}
Downside: Difficult to count/enumerate values, because you have to "sift out" the alternate values.
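Under this layout the alt name is encoded directly in the key, so an alternate can be probed by building the key "<code>%<alt>". The sketch below (Python dict for the resource table, hypothetical helper names) shows that lookup, the sifting needed for enumeration, and how the available alternates for one code can be recovered from the key suffixes.

```python
# MARKUS's layout: alternates are sibling keys of the form "<code>%<alt>".
COUNTRIES = {
    "GY": "Guyana",
    "HK": "Hong Kong SAR China",
    "HK%short": "Hong Kong",
    "HM": "Heard Island and McDonald Islands",
}


def get_country(countries, code, alt=None):
    """Try the "<code>%<alt>" key first, then the plain key."""
    if alt is not None and f"{code}%{alt}" in countries:
        return countries[f"{code}%{alt}"]
    return countries.get(code)


def country_codes(countries):
    """Enumerate real items by sifting out keys that carry a "%alt" suffix."""
    return [key for key in countries if "%" not in key]


def alternates_of(countries, code):
    """List the alt names available for one code, e.g. ["short"] for "HK"."""
    prefix = code + "%"
    return [key[len(prefix):] for key in countries if key.startswith(prefix)]


print(get_country(COUNTRIES, "HK", "short"))  # Hong Kong
print(country_codes(COUNTRIES))  # ['GY', 'HK', 'HM']
print(alternates_of(COUNTRIES, "HK"))  # ['short']
```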
John's UPDATED Proposal:
en{
  Countries{
    GY{"Guyana"}
    HK{"Hong Kong SAR China"}
    HM{"Heard Island and McDonald Islands"}
  }
  Countries%short{
    HK{"Hong Kong"}
  }
}
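In this layout the alternates move to a parallel table, so the plain Countries table enumerates without any sifting and the alt lookup probes Countries%short before Countries. The sketch below covers a single locale only; Python dicts stand in for resource tables and get_item is a hypothetical helper name. Across a locale chain, the same two-table probe would still have to be combined with one of the lookup styles A, B or C described earlier.

```python
# John's updated layout: one parallel table per alt value, named "<table>%<alt>".
EN = {
    "Countries": {
        "GY": "Guyana",
        "HK": "Hong Kong SAR China",
        "HM": "Heard Island and McDonald Islands",
    },
    "Countries%short": {
        "HK": "Hong Kong",
    },
}


def get_item(bundle, table, key, alt=None):
    """Try the alt table first (e.g. "Countries%short"), then the plain table."""
    if alt is not None:
        alt_table = bundle.get(f"{table}%{alt}", {})
        if key in alt_table:
            return alt_table[key]
    return bundle.get(table, {}).get(key)


print(get_item(EN, "Countries", "HK", "short"))  # Hong Kong
print(get_item(EN, "Countries", "GY", "short"))  # Guyana
print(sorted(EN["Countries"]))  # ['GY', 'HK', 'HM']
```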