ICU 4.4

All

Regex using the abstract text access API (UText), rolling in work by Jordan Rose: #4521 (ensure performance is acceptable; ensure UTF-8 support)
Ensure that APIs provide access to all CLDR data, e.g. #4836, #5478 (Google also interested). Peter has a CLDR task to enumerate the data that is missing; based on that, we can file additional bugs and divide up the work.
Improved search capabilities (Peter to write a design doc), mainly asymmetric search (#7093; Google also interested): typing e matches e, é, and è; typing é matches é, probably not e, and certainly not è. Other possibilities (lower priority) include:
- Position-dependent matching? (e.g. Arabic HEH and TEH MARBUTA should match each other in a search when both occur at the end of a word)
- A search object distinct from the collator? (A possible optimization; may not be necessary; not of interest to others)
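
The asymmetric rule can be sketched with plain Unicode canonical decomposition. This is a minimal illustration, assuming accents are combining marks that NFD separates from their base letters. ICU's string search is collation-based, so a real implementation would compare collation elements rather than code points. The function names below are hypothetical, not ICU APIs.

```python
import unicodedata


def _clusters(s):
    """Split s into (base, marks) pairs after canonical decomposition (NFD)."""
    out = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and out:
            base, marks = out[-1]
            out[-1] = (base, marks + ch)  # attach the mark to the preceding base
        else:
            out.append((ch, ""))
    return out


def _cluster_matches(query, target):
    """Asymmetric rule: an unaccented query letter matches any accented form
    of the same base; an accented query letter requires identical accents."""
    q_base, q_marks = query
    t_base, t_marks = target
    if q_base != t_base:
        return False
    return q_marks == "" or q_marks == t_marks


def asymmetric_find_all(pattern, text):
    """Return every substring of text (in NFC) that matches pattern asymmetrically."""
    p = _clusters(pattern)
    t = _clusters(text)
    if not p:
        return []
    hits = []
    for i in range(len(t) - len(p) + 1):
        window = t[i:i + len(p)]
        if all(_cluster_matches(pc, tc) for pc, tc in zip(p, window)):
            hits.append(unicodedata.normalize("NFC", "".join(b + m for b, m in window)))
    return hits
```

With this rule, searching for "e" in "e é è" finds all three letters, while searching for "é" finds only "é".
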
Reduce ICU4C dynamically allocated memory, especially for time zone data; more compact data formats may help (#6873, #6879; Google also interested). Peter will look at porting Yoshito's ICU4J work to C. This requires interpreting const in a "logical" way: const methods may load data lazily, as long as the loading is thread-safe. This interpretation should be documented. Peter to coordinate with Andy on this.
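
The "logical const" idea can be illustrated with a cache that callers see as read-only but that loads each entry on first use under a lock. In ICU4C this would be a const method filling a mutable member; Python has no const, so the sketch below shows only the lazy, thread-safe loading. ZoneDataCache and the loader callback are hypothetical names, not ICU APIs.

```python
import threading

_MISSING = object()  # sentinel so a loader may legitimately return None


class ZoneDataCache:
    """Looks read-only to callers; lazily loads and caches zone data (hypothetical API)."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callback: zone ID -> parsed zone data
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, zone_id):
        # Fast path: entry already loaded. A lock-free dict lookup is safe in CPython.
        data = self._cache.get(zone_id, _MISSING)
        if data is not _MISSING:
            return data
        with self._lock:
            # Re-check: another thread may have loaded the entry while we waited.
            data = self._cache.get(zone_id, _MISSING)
            if data is _MISSING:
                data = self._loader(zone_id)
                self._cache[zone_id] = data
            return data
```

Only the first request for a given zone pays the loading cost, and memory is spent only on zones that are actually used.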

- Number spellout formatting and parsing support for CJK numbers, including in dates. Note: CLDR 1.7 added the relevant capability per cldrbug:1927. Open question: does anything else need to be done in ICU? It may already work if appropriate patterns are used; Peter will do some experiments.
- Support text longer than 2 GB for search, regex, break iteration, encoding conversion, and perhaps transliteration. UText will provide the appropriate interfaces for regex and RBBI, with additional internal changes. #5451 covers the RBBI changes.
- Encoding detection for a wider range of encodings, with some finer distinctions. For example, pure Shift_JIS text should be reported as both Shift_JIS and cp932 with 100% confidence; text that includes cp932 extensions should also be reported as both, but with lower confidence for Shift_JIS.
- Additional conversion tables (not necessarily in the default build). No ticket is needed for this yet.
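
The Shift_JIS/cp932 distinction in the encoding detection item can be sketched with Python's codecs: the shift_jis codec rejects cp932 extension characters such as the circled digits in the NEC row 13 range, while the cp932 codec accepts them. This is a minimal decode-based sketch, not ICU's CharsetDetector, which scores encodings statistically; the function name and the reduced confidence value of 60 are hypothetical.

```python
# Hypothetical confidence reported for Shift_JIS when cp932-only characters appear.
SJIS_CONFIDENCE_WITH_CP932_EXTENSIONS = 60


def detect_sjis_family(data):
    """Return {charset: confidence 0-100} for Shift_JIS and cp932; {} if neither decodes."""
    try:
        data.decode("shift_jis")  # strict: rejects NEC/IBM extension characters
        return {"Shift_JIS": 100, "cp932": 100}  # per the plan, report both at 100
    except UnicodeDecodeError:
        pass
    try:
        data.decode("cp932")  # Microsoft's variant, including the extensions
    except UnicodeDecodeError:
        return {}
    # Valid only as cp932: still report Shift_JIS, as the plan asks, but lower.
    return {"Shift_JIS": SJIS_CONFIDENCE_WITH_CP932_EXTENSIONS, "cp932": 100}
```

A production detector would also weigh byte statistics; for example, this sketch reports pure ASCII input as Shift_JIS with full confidence.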

General focus: Usability, Maintainability and Performance
- Code and data maintainability improvements, e.g. separating time zone data from code.
- Overriding/updating locale information in an ICU installation: #4597, #6633
- Collation and string search service code cleanup: #4562
- Miscellaneous layout bug fixes: #5589, #6625, #5431, #6113, #6182
- Improved ICU performance and regression testing for selected service areas only, e.g. collation
- Extended IETF BCP47 support: language/locale specification for HTTP/XML/OpenJDK
- Lenient parsing, e.g. DateFormat (already implemented by Apple on a branch)
- Locale service SPI
- JSR-310 Date and Time APIs
- @provider multiple-version support: calling old ICU service code through the new ICU API
- Java 5 migration (ICU4J)
- Supporting generics to match JDK APIs
- ICU 4.4 will no longer support Java 1.4 or older versions
- Java Logging support (ICU4J)
- ICU Resource Bundle footprint optimization
