ICU - International Components for Unicode

ICU 73

ICU is the premier library for software internationalization, used by a wide array of companies and organizations.

Release Overview

Note: The ICU 73.2 maintenance update (see below) was released on 2023-06-15.

ICU 73 updates to CLDR 43 (blog) locale data with various additions and corrections.

ICU 73 improves Japanese and Korean short-text line breaking, reduces C++ memory use in date formatting, and promotes the Java person name formatter from tech preview to draft.

ICU 73 updates to the time zone data version 2023c (March 2023). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.

For more details, including migration issues, see below.

Please use the icu-support mailing list and/or find/submit error reports.

Version Number

The initial release has library version number 73.1.

- Release date: 2023-04-13
- List of tickets fixed in ICU 73

If there are maintenance releases, they will be 73.2, 73.3, etc. (During ICU 73 development, the library version number was 73.0.x.)

Note: There may be additional commits on the maint/maint-73 branch that are not included in the prepackaged download files.

ICU 73.2 maintenance release

Release date: 2023-06-15

ICU 73.2 updates to CLDR 43.1 locale data. These are maintenance releases for ICU 73 and CLDR 43, with limited sets of bug fixes and no API or structural changes.

There are significant changes for GB18030-2022 compliance support:

- CLDR extends the support for “short” Chinese sort orders to cover some additional, required characters for Level 2. This is carried over into ICU collation.
- ICU has a modified character conversion table, mapping some GB18030 characters to Unicode characters that were encoded after GB18030-2005.

There are also changes for compatibility:

- There are optional variants of time formats with AM/PM (only for English) using ASCII spaces in CLDR that can also be used in ICU via custom data generation. This is intended to help certain implementers transition to the improved patterns, which have used a narrow no-break space between the time and AM/PM since CLDR 42.
  - For how to generate ICU data with this option, look for alt="ascii" in tools/cldr/cldr-to-icu/README.md
- The changes to the word segmentation behavior of the @ sign that were in CLDR 42 (ICU 72) have been reverted. These caused problems for certain parsers that did not expect @ to join to letters.
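To illustrate why the spacing change matters to consumers of formatted times, here is a minimal sketch of a pre-processing step that a strict parser might apply. It assumes the only difference from the older output is the narrow no-break space (U+202F) before AM/PM. The class and method names are hypothetical, and the helper uses only the JDK; it is not part of ICU or CLDR.

```java
/** Hypothetical helper, not an ICU API: maps U+202F to an ASCII space. */
final class TimeSpaceSketch {

    /** NARROW NO-BREAK SPACE, used between the time and AM/PM since CLDR 42. */
    static final char NNBSP = '\u202F';

    /** Returns the input with every U+202F replaced by U+0020. */
    static String asciiSpaces(String formatted) {
        return formatted.replace(NNBSP, ' ');
    }

    public static void main(String[] args) {
        // Sample input written by hand; real formatter output may differ.
        String formatted = "3:45" + NNBSP + "PM";
        System.out.println(asciiSpaces(formatted));
    }
}
```

A step like this only papers over the difference on the receiving side. Generating data with the ASCII variants, as described above, changes the formatted output itself.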

ICU 73.2 and CLDR 43.1 include several other bug fixes, including fixes for person name formatting and Cyrillic transforms.

List of tickets fixed in ICU 73.2

Next Release (FYI)

For the next release, ICU 74 in 2023-oct, we plan to make the following changes:

- C: Require C11 (up from C99)
- C++: Require C++17 (up from C++11)
- Java: Switch from ant to Maven, and rearrange the source file tree to the Maven default

Common Changes

- CLDR 43 (blog):
  - CLDR 43 is a limited-submission release. Data for many languages has been improved.
  - In English, the name “Türkiye” is now used for the country instead of “Turkey” (the alternate spelling is also available in the data). Where appropriate, a corresponding term is used in other languages.
  - Person name formatting data is now complete and out of “tech preview”.
  - Collation: Improved sorting & matching of “fancy quotes”, Geresh, and Gershayim in the default (CLDR root) sort order. (CLDR-15946, L2/23-016)
    - Several punctuation marks now compare primary-equal to their single and double quote ASCII fallbacks. This makes them easier to find, and groups names together that only differ in whether ASCII quotes or typographic quotes are used.
  - A new unit was added for the Beaufort scale (wind speed).
  - Improved and expanded data for likely subtags.
- Japanese phrase-based line breaking now uses the BudouX machine learning implementation for better quality. (ICU-22100, see ICU 71 ICU-21699 for context)
- Phrase-based line breaking for Korean now breaks at spaces (approximates word boundaries). (ICU-22119)
- The UnicodeSet::closeOver() function has a new option for simple case folding. (ICU-6065)
  - C: USET_SIMPLE_CASE_INSENSITIVE / Java: UnicodeSet.SIMPLE_CASE_INSENSITIVE
  - This is useful for implementations that use Simple_Case_Folding (1:1 code points) for case-insensitive matching rather than the full Case_Folding (1:n) mappings. For example, ECMAScript (JavaScript) regular expressions use simple case foldings.
- Several small Calendar API additions to facilitate implementations of the proposed ECMAScript Temporal API. (ICU-22027)
- Time zone data (tzdata) version 2023c (2023-mar). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.
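The distinction between 1:1 and 1:n case mappings mentioned for UnicodeSet::closeOver() can be seen with a small JDK-only sketch. It does not call ICU, and it uses uppercasing as a stand-in for case folding, since the JDK has no case folding API. This is an analogy under that assumption, not a demonstration of the new closeOver() option. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical JDK-only sketch contrasting 1:1 and 1:n case mappings. */
final class CaseMappingSketch {

    /** Simple mapping: every code point maps to exactly one code point. */
    static String simpleUpper(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .map(Character::toUpperCase) // Character.toUpperCase(int) is a 1:1 mapping
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Full mapping: a single code point may expand to several. */
    static String fullUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String s = "straße";
        // ß has no single-code-point uppercase form, so the simple mapping keeps it.
        System.out.println(simpleUpper(s));
        // The full mapping expands ß to "SS", so the string grows by one character.
        System.out.println(fullUpper(s));
    }
}
```

A matcher built on 1:1 mappings can compare text code point by code point, while one built on 1:n mappings also has to handle multi-character strings such as "ss".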

ICU4C Specific Changes

- API changes since ICU4C 72 (Markdown) / (HTML)
- New classes SimpleNumber and SimpleNumberFormatter, with a subset of NumberFormatter functionality for less memory, more object reuse, and fewer code dependencies. (ICU-22093)
  - The SimpleDateFormat class now uses SimpleNumberFormatter, significantly reducing heap memory use. (ICU-20115)

Some internal changes:

- Continuous Integration with undefined-behavior sanitizer (UBSan) and alignment sanitizer, and code changes. (ICU-22224)
- Continuous Integration with a subset of Control Flow Integrity checks and code changes. (ICU-21374)
- Implementation code relies more on C++11 (char16_t, nullptr, override, ...) with fewer typedefs and conditional definitions. (ICU-21833)

ICU4J Specific Changes

- New class PersonNameFormatter implementing the draft specification of CLDR person name formatting. (ICU-22081)
  - Added in ICU 72 as a technology preview.
  - Promoted to draft in ICU 73, with some API changes. (ICU-22287)
  - CLDR background on why this feature is being added and what it does.
- Technology Preview since ICU 72: New class MessageFormatter implementing the draft specification of the CLDR MessageFormat Working Group. (ICU-22124, draft message syntax)
- Since ICU 72: ICU now requires Java 8 but has also been tested with Java 11 & 16 (ICU-22116)
  - On Android, you may need to enable “library desugaring” depending on your target API level and which parts of ICU you include.
  - Most of the ICU 72 library code should still work with Java 7 / Android API level 21, but we no longer test with Java 7.
- API Changes since ICU4J 72

Known Issues

ICU4J

ICU-22333 Ant target releaseJarCheck fails with "No runnable methods" in ExhaustivePersonNameFormatterTest. ICU4J users building the library from source with the build target 'releaseVer' are affected. The problem has been fixed on the main branch by PR#2425.

Migration Issues

1. See CLDR 43 migration issues
  1. For ICU users who generate ICU data directly from CLDR: In the CLDR repo, the "seed" data has been merged into the "common" file tree (CLDR-6396). As a result, there are many more locale data files in CLDR "common", but many that were moved do not have usable data item coverage and are therefore not automatically added to ICU. See the CLDR Migration section for details.
2. Interval Formats: A small number of interval formats (like “Dec 2 – 3”) have their spacing changed for consistency. This is unlikely to cause problems, as these changes resemble the large number of spacing changes already made in CLDR 42/ICU 72.
3. The “gb2312” and “big5han” Chinese collation tailorings are no longer included in the ICU binary data. (ICU-22285)
  1. These are based on the code point order of their respective legacy charsets. By contrast, the “pinyin” and “stroke” sort orders, which are the defaults for the regional variants of Chinese, are based on current Unicode Han character data.
  2. The ICU source data files still include the data for these tailorings. See the User Guide for how to include them in the binary data.
  3. Future versions of CLDR and ICU may remove the source data for these tailorings. (CLDR-16062)

ICU4C Platform Support

ICU4C requires C++11 and has been tested with up to C++20.

We routinely test on recent versions of Linux, macOS, and Windows.

We accept patches for other platforms.

Windows: The minimum supported version is Windows 7. (See How To Build And Install On Windows for more details.)

ICU4J Platform Support

ICU4J works on Java 8..17.

ICU4J should work on Android API level 21 and later but may require “library desugaring”.

Download

Source and binary downloads are available on the git/GitHub tag page.

See the Source Code Access page for how to download the ICU file tree directly from GitHub.

ICU locale data was generated from the CLDR tag:

73.2 from 43.1: https://github.com/unicode-org/cldr/releases/tag/release-43-1
73.1 from 43: https://github.com/unicode-org/cldr/releases/tag/release-43

Maven dependency:

<dependency>
    <groupId>com.ibm.icu</groupId>
    <artifactId>icu4j</artifactId>
    <version>73.2</version>
</dependency>
