ICU - International Components for Unicode

Norwegian locales changes in v39

Changes for Norwegian locale IDs in CLDR 39, ICU 69

(P. Edberg, 2021-Mar-01)

Summary:

In CLDR 39 and ICU 69, the locale code “no” (Norwegian) is no longer an alias (for CLDR and ICU purposes) and becomes the parent locale for locales "nb" (Norwegian Bokmål) and "nn" (Norwegian Nynorsk). The latter two locales retain their regional locales: “nb_NO”, “nb_SJ”, “nn_NO”. The locale code “no” is no longer deprecated. The actual content for locale “nb” (which represents “nb_NO”) is being moved into “no”, so that “nb” and “nb_NO” will be empty “default content” locales.

Motivation:

- Handle the ISO 639 macrolanguage code “no” in a way that is more consistent with the way that other macrolanguage codes are handled by CLDR and ICU; see Background & Change Details below. Note that some CLDR and ICU clients already swap the handling of “nb” and “no” in order to treat “no” more like other macrolanguages in CLDR.
- Provide a user experience that more closely matches how Norwegian users think of the languages. For example, it has been confusing having only collations labeled "Norwegian Bokmål" and "Norwegian Nynorsk", instead of a collation “Norwegian” which is shared by both Bokmål and Nynorsk.
- Provide better data organization and consistency, and reduce data size, by allowing inheritance of items like formats from “no” to “nn”. Something on the order of 20% or more of the data items for “nn” can be inherited from “no”.

Background:

Based on ISO 639-3, the IANA subtag registry designates certain language subtags as macrolanguages to which a set of individual language codes belong. For example, the macrolanguage subtag "zh" (Chinese) encompasses the following, among others:

- "cmn" (Mandarin)
- "yue" (Cantonese)
- "wuu" (Wu Chinese)
- "hak" (Hakka Chinese)
- ...

Another such macrolanguage is "no" (Norwegian), which encompasses the following individual language codes:

- "nn" (Norwegian Nynorsk)
- "nb" (Norwegian Bokmål)

In general, the way that CLDR and ICU have handled such macrolanguage codes (e.g. "zh" or "ar") is to treat them as representing the most prominent encompassed language, so that (for example) "zh" really means "cmn" and is used instead of it, and "ar" really means "arb" (see unicode_language_subtags). The CLDR supplemental <alias> element has languageAlias entries that map the individual language code to the macrolanguage code (for example, "cmn" to "zh" and "arb" to "ar").

CLDR has not had separate display names for e.g. "cmn" or "arb".

However, this approach has not been used for the Norwegian codes in CLDR, because the whole ISO 639-3 macrolanguage mechanism postdated the original ICU/CLDR decision on how to handle the Norwegian codes. Instead, "no" has been treated as a legacy code that is aliased to "nb" (via a languageAlias entry mapping "no" to "nb").

CLDR has not had locale data for "no", only for "nb" and "nn", which have been treated as separate languages (not inheriting data from one another), with separate (but identical) collation data etc. However, CLDR has had localized display names for all of "nb", "nn", and "no".
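
The alias handling described in this Background section can be pictured with a small sketch. This is only an illustration under stated assumptions: the table names and the canonical_language helper are hypothetical, and the tables contain just the three mappings mentioned in the text, not the real CLDR alias data.

```python
# Hypothetical, heavily simplified alias tables. Real CLDR alias data has
# many more entries; only the mappings named in the text are shown here.
ALIASES_BEFORE_39 = {
    "cmn": "zh",  # individual language code -> macrolanguage code
    "arb": "ar",
    "no": "nb",   # "no" treated as a legacy alias for "nb"
}

# CLDR 39 removes the "no" -> "nb" mapping; the other entries are unchanged.
ALIASES_FROM_39 = {k: v for k, v in ALIASES_BEFORE_39.items() if k != "no"}


def canonical_language(code, aliases):
    """Return the replacement for an aliased language code, else the code itself."""
    return aliases.get(code, code)
```

With the older table, canonical_language("no", ALIASES_BEFORE_39) yields "nb"; with the newer table the code "no" is returned unchanged.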

Change Details:

In CLDR 39 the specific changes are:

- The CLDR languageAlias mapping for “no” is removed, i.e. “no” is no longer a deprecated alias to “nb”.
- "no" is made the parent locale of “nb” and "nn", so that "nn" can inherit data such as formats and collation from "no". Note that this is the first instance of a parent locale with a different language code than one of its child locales. Such inheritance only makes sense if the written forms are quite close; see the Wikipedia article on the two forms of written Norwegian.
- The content of the "nb" locale is moved to "no", so that "nb" is empty in CLDR.
- For data such as collation and rbnf where “nb” and “nn” previously had identical content, the real content is now in “no”, and “nb” and “nn” both are empty stubs inheriting from “no”.
- Coverage, likelySubtags, and validity data are updated accordingly.
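
The resulting inheritance chains can be sketched as follows. This is a minimal illustration, not CLDR or ICU code: EXPLICIT_PARENTS, parent, and fallback_chain are hypothetical names, and the default rule (drop the last "_"-separated subtag) is a simplification of real locale fallback.

```python
# Assumed explicit parent overrides reflecting the CLDR 39 change.
EXPLICIT_PARENTS = {"nb": "no", "nn": "no"}


def parent(locale_id):
    """Return the parent of a locale ID, or None for "root"."""
    if locale_id == "root":
        return None
    if locale_id in EXPLICIT_PARENTS:
        return EXPLICIT_PARENTS[locale_id]
    # Simplified default rule: drop the last subtag; a bare language
    # code falls back to "root".
    head, sep, _ = locale_id.rpartition("_")
    return head if sep else "root"


def fallback_chain(locale_id):
    """List the locale followed by each ancestor, ending at "root"."""
    chain = []
    current = locale_id
    while current is not None:
        chain.append(current)
        current = parent(current)
    return chain
```

Under these assumptions, fallback_chain("nn_NO") gives ["nn_NO", "nn", "no", "root"], while fallback_chain("no") gives ["no", "root"].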

On the ICU side, some consequences are:

- The “no” locale data files have actual content (for Bokmål), instead of just being a stub with an %%ALIAS entry pointing to the corresponding “nb” file.
- The “nb” and “nn” locale data files now begin with a %%Parent{"no"} element. In some cases this may be the entire content, as with the collation files.
- The actual collation data is now called just “Norwegian”.
- In most cases (except in some of the associated locale directories such as curr/ or lang/), the locale files that previously existed still exist; they may just have different contents (inheriting more from “no”).
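
As a rough picture of how a stub file with only a %%Parent entry still resolves data, the following toy model uses plain dictionaries in place of resource bundles. Everything here is assumed for illustration: the BUNDLES table, the lookup helper, and all keys and values other than %%Parent are invented placeholders, and this is not how ICU implements resource loading.

```python
# Toy stand-ins for locale data files; item names and values are placeholders.
BUNDLES = {
    "root": {"fallback_item": "root value"},
    "no": {"collation": "Norwegian rules", "sample_item": "no value"},
    "nb": {"%%Parent": "no"},                             # empty stub
    "nn": {"%%Parent": "no", "sample_item": "nn value"},  # partial override
}


def lookup(locale_id, key):
    """Find key in the locale's data, walking %%Parent links up to root."""
    current = locale_id
    while current is not None:
        bundle = BUNDLES[current]
        if key in bundle:
            return bundle[key]
        # Without an explicit %%Parent entry, fall back to root.
        current = None if current == "root" else bundle.get("%%Parent", "root")
    raise KeyError(key)
```

In this model, lookup("nb", "collation") finds the value stored under "no", and lookup("nn", "sample_item") finds the override stored under "nn".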

Migration issues:

- “no” is no longer a deprecated alias; it is a fully valid locale code.
- “nb” and “nn” inherit from a parent locale “no”. This is the first case in which a language-only locale code has a parent other than “root”. Code that assumes that language-only locale codes (i.e. locale IDs without ‘_’ or ‘-’) can only have a parent of “root” may now fail.
- Implementations cannot strip the “no” locale if they support either the “nb” or “nn” locales.
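
For implementations that trim locale data, the last point above can be sketched as a closure over parent locales. This is a hedged sketch only: PARENTS and with_required_parents are hypothetical names, not part of any ICU data-filtering tool, and "root" is assumed to be always present.

```python
# Assumed explicit parents after the CLDR 39 change.
PARENTS = {"nb": "no", "nn": "no"}


def with_required_parents(keep):
    """Return the locales in keep plus every ancestor they inherit from
    (excluding "root", which is assumed to be always present)."""
    required = set(keep)
    pending = list(keep)
    while pending:
        loc = pending.pop()
        if loc in PARENTS:
            par = PARENTS[loc]
        elif "_" in loc:
            par = loc.rsplit("_", 1)[0]  # simplified: drop the last subtag
        else:
            par = None  # bare language code: parent is root
        if par is not None and par not in required:
            required.add(par)
            pending.append(par)
    return required
```

For example, keeping only "nn_NO" also requires "nn" and "no" under these assumptions, whereas keeping "de" requires nothing beyond "de" itself.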

For further information and updates, please also see the CLDR 39 release note.
