ICU - International Components for Unicode

ICU 71

ICU is the premier library for software internationalization, used by a wide array of companies and organizations.

Release Overview

ICU 71 updates to CLDR 41 locale data with various additions and corrections.

ICU 71 adds phrase-based line breaking for Japanese. Existing line breaking methods follow standards and conventions for body text but do not work well for short Japanese text, such as in titles and headings. This new feature is optimized for these use cases.

ICU 71 adds support for Hindi written in Latin letters (hi_Latn). The CLDR data for this increasingly popular locale has been significantly revised and expanded. Note that based on user expectations, hi_Latn incorporates a large amount of English, and can also be referred to as “Hinglish”.

ICU 71 and CLDR 41 are minor releases, mostly focused on bug fixes and small enhancements. (The fall CLDR/ICU releases will update to Unicode 15, which is planned for September.) We are also working to re-establish continuous performance testing for ICU, and on development towards future versions.

ICU 71 updates to the time zone data version 2022a. Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.

For more details, including migration issues, see below.

To ask questions or report problems, please use the icu-support mailing list, or search for and submit error reports.

🔴🔴🔴 Do you need ICU to work on EBCDIC platforms? 🔴🔴🔴

- We need help: Someone needs to build ICU4C on a native-EBCDIC machine (z or i), fix C++ compiler issues (if any), fix issues related to an EBCDIC codepage as the system encoding, and test frequently (or add their machine into our CI). Please contact us via the icu-support mailing list.
- Otherwise we will remove the support code for non-ASCII-family platforms. Details: ICU-21672

Version Number

The initial release has library version number 71.1.

- Release date: 2022-04-07
- List of tickets fixed in ICU 71

If there are maintenance releases, they will be 71.2, 71.3, etc. (During ICU 71 development, the library version number was 71.0.x.)

Note: There may be additional commits on the maint/maint-71 branch that are not included in the prepackaged download files.

Common Changes

- CLDR 41 (blog):
  - Limited-submission release. Phase 3 of the grammatical units of measurement project, to be supported by a future ICU release.
  - Hindi (Latin) (newly in ICU): There have been substantial additions made to hi_Latn. Note that based on user expectations, hi_Latn incorporates a large amount of English, and can also be referred to as “Hinglish”. That is, it is assumed to be content more formally identified as hi-Latn-t-en-h0-hybrid.
  - Transliteration: Fourteen new transforms have been added for the Ethiopic script and languages written in it.
- Phrase-based line breaking for Japanese (“Bunsetsu”, 文節): Preferred line breaking style for headings and other short text (ICU-21699)
  - From CLDR: “Prioritize keeping natural phrases (of multiple words) together when breaking, used in short text like title and headline”
  - Usage: Create a line BreakIterator for language tag "ja-u-lw-phrase" (legacy locale ID "ja@lw=phrase")
  - See the LDML specification (CLDR) for lw
- The DateTimePatternGenerator now uses the appropriate date-time combining pattern, as specified by CLDR, for skeletons that combine date and time elements but do not match the skeleton of any single availableFormats entry (ICU-21353).
  - For example, if you use DateFormat.getInstanceForSkeleton("MMMMEEEEdjm", Locale.ENGLISH), the output will change from "Thursday, February 22, 12:30 PM" to the correct "Thursday, February 22 at 12:30 PM" (changing from a comma to "at").
  - To go along with this, there are new interfaces for retrieving or overriding any of the four standard date-time combining patterns.
- NumberRangeFormatter: New output field for the “approximately sign” (ICU-21765)
  - In ICU 70, this was categorized under the generic SIGN field.
- Time zone data (tzdata) version 2022a (2022-mar). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.
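
The two language tags mentioned in this list carry their options in BCP 47 extensions: lw is a key in the Unicode ("-u-") extension of "ja-u-lw-phrase", and "en-h0-hybrid" is the content of the transform ("-t-") extension of "hi-Latn-t-en-h0-hybrid". The following is a minimal sketch that only takes those tags apart with the JDK's java.util.Locale. It does not call ICU4J, and the class and method names are made up for illustration.

```java
import java.util.Locale;

// Illustration only: uses the JDK's java.util.Locale, not ICU4J.
// LanguageTagParts and its method names are hypothetical.
class LanguageTagParts {

    // Returns the value of the Unicode extension key "lw", or null if the tag has none.
    static String lineBreakWordOption(String languageTag) {
        return Locale.forLanguageTag(languageTag).getUnicodeLocaleType("lw");
    }

    // Returns the raw content of the "t" extension, or null if the tag has none.
    static String transformExtension(String languageTag) {
        return Locale.forLanguageTag(languageTag).getExtension('t');
    }

    public static void main(String[] args) {
        // The tag used to request phrase-based line breaking for Japanese.
        System.out.println(lineBreakWordOption("ja-u-lw-phrase"));

        // The more formal identifier for "Hinglish" content.
        Locale hinglish = Locale.forLanguageTag("hi-Latn-t-en-h0-hybrid");
        System.out.println(hinglish.getLanguage() + " / " + hinglish.getScript());
        System.out.println(transformExtension("hi-Latn-t-en-h0-hybrid"));
    }
}
```

In an application, the same tag string would be handed to ICU when creating the line BreakIterator, as described in the usage note above.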

ICU4C Specific Changes

- NumberFormat: Support arbitrary-precision rounding increment (ICU-21908)
  - (Already supported in ICU4J via BigDecimal.)
- API changes since ICU4C 70 (Markdown) / (HTML)
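
To make "arbitrary-precision rounding increment" concrete, here is a rough sketch of the arithmetic using only java.math.BigDecimal. It is not ICU's implementation or API. The class name, the method name, and the choice of half-even rounding are assumptions made for this example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustration only: plain JDK arithmetic, not the ICU number formatting API.
// IncrementRounding and roundToIncrement are hypothetical names.
class IncrementRounding {

    // Rounds value to the nearest multiple of increment; ties go to the even multiple.
    static BigDecimal roundToIncrement(BigDecimal value, BigDecimal increment) {
        if (increment.signum() <= 0) {
            throw new IllegalArgumentException("increment must be positive");
        }
        // Number of whole increments, rounded to an integer (scale 0).
        BigDecimal steps = value.divide(increment, 0, RoundingMode.HALF_EVEN);
        return steps.multiply(increment);
    }

    public static void main(String[] args) {
        // A "nickel rounding" style increment.
        System.out.println(roundToIncrement(new BigDecimal("1.2345"), new BigDecimal("0.05")));

        // An increment with more digits than a double can represent exactly.
        System.out.println(roundToIncrement(
                new BigDecimal("0.12345678901234567894"),
                new BigDecimal("0.00000000000000000005")));
    }
}
```

Because the increment is held as a decimal value rather than a binary floating-point number, increments with many fractional digits do not lose precision.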

ICU4J Specific Changes

- ICU 71 still only requires Java 7 but has also been tested with Java 8..16
- API Changes since ICU4J 70

Migration Issues

1. There is a regression in DateIntervalFormat which can cause the Java version to fail to construct when the input skeleton string includes both hour and day period, but the day period doesn't immediately follow the hour (for example: "Bh" or "hmsa"). (ICU-21984, fixed in pull request #2060, cherry-picked to the maint/maint-71 branch: #2066)
2. The DateTimePatternGenerator bug fix for choosing the correct date+time combining pattern will cause easily visible changes in output for applications that use date/time skeletons, for example changing "<date>, <time>" to "<date> at <time>".
3. See also CLDR 41 migration issues.
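
The visible effect of item 2 can be sketched without ICU. A CLDR date-time combining pattern is a small template in which {1} stands for the formatted date and {0} for the formatted time. The example below assumes, for illustration, that the two English patterns involved are "{1}, {0}" and "{1} 'at' {0}". It uses java.text.MessageFormat as a stand-in for the substitution step, and the class and constant names are hypothetical.

```java
import java.text.MessageFormat;

// Illustration only: mimics how a combining pattern joins a date and a time.
// This is not the DateTimePatternGenerator API; all names here are hypothetical.
class CombiningPatternSketch {

    // Assumed English combining patterns: {1} is the date, {0} is the time.
    static final String COMMA_STYLE = "{1}, {0}";
    static final String AT_STYLE = "{1} 'at' {0}";

    // Substitutes the already formatted date and time into the combining pattern.
    static String combine(String combiningPattern, String date, String time) {
        return MessageFormat.format(combiningPattern, time, date);
    }

    public static void main(String[] args) {
        String date = "Thursday, February 22";
        String time = "12:30 PM";
        System.out.println(combine(COMMA_STYLE, date, time)); // output style before the fix
        System.out.println(combine(AT_STYLE, date, time));    // output style after the fix
    }
}
```

Applications that compare formatted output against stored strings, for example in tests or snapshot files, are the most likely to notice this change.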

ICU4C Platform Support

ICU4C requires C++11 and has been tested with up to C++20.

We routinely test on recent versions of Linux, macOS, and Windows.

We accept patches for other platforms.

Windows: The minimum supported version is Windows 7. (See How To Build And Install On Windows for more details.)

ICU4J Platform Support

ICU4J works on Java 7..16 and on Android API level 21 and later.

Download

Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-71-1

See the Source Code Access page for how to download the ICU file tree directly from GitHub.

ICU locale data was generated from CLDR tag https://github.com/unicode-org/cldr/releases/tag/release-41.

Maven dependency:

<dependency>
    <groupId>com.ibm.icu</groupId>
    <artifactId>icu4j</artifactId>
    <version>71.1</version>
</dependency>
