Representation of alternate data values in ICU

This page tries to define a uniform way to access alternate values ("@alt=something") from CLDR within ICU's resource bundles. The scenarios differ both in how the values are used and in the lookup style each one needs.

Places in CLDR that currently have alternate values:

1. Short or variant spellings of language names - such as:

<language type="az">Azerbaijani</language>

<language type="az" alt="short">Azeri</language>

2. Stand-alone forms of script names - such as:

<script type="Hans">Simplified</script>

<script type="Hans" alt="stand-alone">Simplified Han</script>

3. Short or variant forms of country names - such as:

<territory type="HK">Hong Kong SAR China</territory>

<territory type="HK" alt="short">Hong Kong</territory>

4. Variant forms of the day period names - such as:

<dayPeriodWidth type="wide">

<dayPeriod type="am">AM</dayPeriod>

<dayPeriod type="am" alt="variant">a.m.</dayPeriod>

<dayPeriod type="noon">noon</dayPeriod>

<dayPeriod type="pm">PM</dayPeriod>

<dayPeriod type="pm" alt="variant">p.m.</dayPeriod>

</dayPeriodWidth>

5. Short forms of unit names - such as:

<unit type="second">

<unitPattern count="one">{0} second</unitPattern>

<unitPattern count="one" alt="short">{0} sec</unitPattern>

<unitPattern count="other">{0} seconds</unitPattern>

<unitPattern count="other" alt="short">{0} secs</unitPattern>

</unit>

Possible lookup orders for an item in locale foo-bar-zip with alt=short:

A. "Phonebook" style. This is what we get from the existing internal lookup function that takes a path to the requested item: call it once with "foo/bar/zip%short" and, if that fails, call it again with "foo/bar/zip".

foo-bar-zip alt=short

foo-bar alt=short

foo alt=short

foo-bar-zip

foo-bar

foo

B. "Nice to have" style, where the alternate is nice to have but not as important as getting a value from the locale you want.

foo-bar-zip alt=short

foo-bar-zip

foo-bar alt=short

foo-bar

foo alt=short

foo

C. "Exact match" style, for places where the non-alt version is not a good fallback for the alt version, i.e. the alt version has specific semantics that you want to enforce in the lookup.

foo-bar-zip alt=short

foo-bar alt=short

foo alt=short
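
The three orders above can be sketched in a few lines. This is only an illustration, not ICU code: the function names (parent_chain, candidates, lookup) and the dictionary keyed by (locale, alt) are assumptions made for the example, and the root bundle is left out, as it is in the lists above.

```python
def parent_chain(locale):
    """Return the locale followed by its parents: foo-bar-zip, foo-bar, foo."""
    parts = locale.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def candidates(locale, alt, style):
    """List (locale, alt) pairs in the order a lookup style tries them.

    An alt of None stands for the plain (non-alternate) value.
    """
    chain = parent_chain(locale)
    if style == "A":  # phonebook: all alternates first, then all plain values
        return [(loc, alt) for loc in chain] + [(loc, None) for loc in chain]
    if style == "B":  # nice to have: alternate, then plain, locale by locale
        return [(loc, a) for loc in chain for a in (alt, None)]
    if style == "C":  # exact match: alternates only, never the plain value
        return [(loc, alt) for loc in chain]
    raise ValueError(f"unknown style: {style!r}")


def lookup(data, locale, alt, style):
    """Return the first value present in data, or None if nothing matches."""
    for key in candidates(locale, alt, style):
        if key in data:
            return data[key]
    return None
```

With data that holds only a plain value in foo-bar-zip and a short value in foo, styles A and C return the short value from foo, while style B returns the plain value from foo-bar-zip.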

Likely desired lookups for each scenario above:

1. Short or variant spellings of language names - A or B

2. Stand-alone forms of script names - C

3. Short or variant forms of country names - A or B

4. Variant forms of the day period names - B

5. Short forms of unit names - A

PROPOSED structures for the data:

Original Proposal ( John & Yoshito )

en{

Countries{

GY{"Guyana"}

HK{

default{"Hong Kong SAR China"}

short{"Hong Kong"}

}

HM{"Heard Island and McDonald Islands"}

}

}

Downside: Lack of forward compatibility - difficulty in mapping xpath to value for LDML2ICUConversion
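
One plausible reading of the compatibility concern is that an entry changes shape once it gains an alternate: HK is a table while GY and HM are still plain strings, so every reader has to handle both forms. The sketch below mirrors the structure above with a Python dict; get_name is a hypothetical helper for the example, not an ICU function.

```python
# Python stand-in for the "Original Proposal" bundle shown above.
countries = {
    "GY": "Guyana",
    "HK": {"default": "Hong Kong SAR China", "short": "Hong Kong"},
    "HM": "Heard Island and McDonald Islands",
}


def get_name(table, code, alt=None):
    """Return the requested alternate if present, else the default value."""
    entry = table[code]
    if isinstance(entry, dict):
        # Table form: the entry carries a default plus named alternates.
        if alt is not None and alt in entry:
            return entry[alt]
        return entry["default"]
    # String form: the entry has no alternates, so return it as-is.
    return entry
```

A reader written before HK became a table would expect a string for every key and would not know what to do with the nested entry.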

MARK's idea:

en{

Countries{

GY{"Guyana"}

HK{"Hong Kong SAR China"}

HK_alt{

short{"Hong Kong"}

}

HM{"Heard Island and McDonald Islands"}

}

}

Downside: Difficult to count/enumerate values, because you have to "sift out" the alternate values.

MARKUS's idea:

en{

Countries{

GY{"Guyana"}

HK{"Hong Kong SAR China"}

HK%short{"Hong Kong"}

HM{"Heard Island and McDonald Islands"}

}

}

Downside: Difficult to count/enumerate values, because you have to "sift out" the alternate values.
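
To make the enumeration cost concrete, here is a minimal sketch that models the table above as a Python dict. The helper names primary_keys and get_name are assumptions for the example; the same kind of filtering would apply to a suffix such as HK_alt.

```python
# Python stand-in for the Countries table with "%"-suffixed alternate keys.
countries = {
    "GY": "Guyana",
    "HK": "Hong Kong SAR China",
    "HK%short": "Hong Kong",
    "HM": "Heard Island and McDonald Islands",
}


def primary_keys(table):
    """Return only the keys that are not alternate entries."""
    return [key for key in table if "%" not in key]


def get_name(table, code, alt=None):
    """Try the key with the alternate suffix first, then the plain key."""
    if alt is not None:
        value = table.get(f"{code}%{alt}")
        if value is not None:
            return value
    return table.get(code)
```

Lookup stays simple (one extra key probe), but the table's raw size is 4 while it describes only 3 countries, so any code that counts or lists entries has to filter first.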

John's UPDATED Proposal:

en{

Countries{

GY{"Guyana"}

HK{"Hong Kong SAR China"}

HM{"Heard Island and McDonald Islands"}

}

Countries%short{

HK{"Hong Kong"}

}

}
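
A rough sketch of how a lookup against this layout could work, again using a Python dict in place of the bundle. The get_name helper and its signature are hypothetical. It follows the "nice to have" order within a single locale: check the Countries%short table first, then fall back to Countries.

```python
# Python stand-in for the bundle above: alternates live in a sibling table.
en = {
    "Countries": {
        "GY": "Guyana",
        "HK": "Hong Kong SAR China",
        "HM": "Heard Island and McDonald Islands",
    },
    "Countries%short": {
        "HK": "Hong Kong",
    },
}


def get_name(bundle, table, code, alt=None):
    """Look in the alternate table first, then in the main table."""
    if alt is not None:
        alt_table = bundle.get(f"{table}%{alt}", {})
        if code in alt_table:
            return alt_table[code]
    return bundle[table].get(code)
```

Because the alternates sit in their own table, the main Countries table can be counted or enumerated without filtering out any keys.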