Proposal 20070719

Proposal email sent to the icu-design list on 2007-jul-19.

time zone API: getDisplayName()

Markus Scherer <markus.icu@gmail.com>

Thu, Jul 19, 2007 at 1:50 PM

To: icu-design@lists.sourceforge.net

Dear ICU team,

Below please see an exchange from last November between John Emmons

and myself. It contains an API proposal of sorts, showing wrapper code

and suggesting something like it for ICU.

I would like to see if we could add this for ICU 3.8, even if it were

to use DateFormat under the covers right now, like my wrapper here

does.

On the question of an API that takes a "bool daylight" but not a

date/time value, I understand from John's reply that it is problematic

-- a time zone might not have used daylight saving time consistently

in the past. However, it might still be useful for getting a display

name when you have a Unix struct tm or similar so that you need not

puzzle together (or guess!) an appropriate date/time value. What do

you think? (If this is too controversial, although it follows the

current API more closely, I think I can do without it, at least for

now. If we had to use a DateFormat right now, then this variant would

not be easy to implement anyway.)

John & Mark, could you please bring me up to speed on your work on

meta time zones?

markus

Forwarded Conversation

Subject: time zone API: getDisplayName()

------------------------

From: Markus Scherer <markus.icu@gmail.com>

To: John Emmons <emmo@us.ibm.com>, Mark Davis <mark.davis@icu-project.org>

Date: Sat, Nov 4, 2006 at 8:02 AM

Hi John,

Some ICU meetings ago you said you were working on improved time zone

display name look-ups, and I said I would work with you on the API

where we need to be able to request particular forms. Sorry it took me

so long to start the discussion!

So here we go. I have created a thin wrapper around the ICU4C TimeZone

class to provide a smaller API with the requested features. I am

copying the relevant parts below. I don't know if you are working only

on getDisplayName() or also on getOffset(). This just includes the

parts for getDisplayName(). Please let me know if you are also working

on getOffset().

For getDisplayName(), I essentially added a DisplayStyle enum

parameter with the CLDR-defined choices directly selectable. These are

the preferred formats; of course there will be fallbacks as necessary.

I also have a DisplayLength enum mirroring ICU's EDisplayType (short

vs. long format).
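
The "preferred format with fallbacks" idea can be sketched as follows. This is a minimal illustration and not how ICU or the wrapper resolves names: StyleNames and LookupWithFallback() are hypothetical, the enum repeats the DisplayStyle values from the wrapper, and the caller supplies the fallback order. The final fallback to the time zone ID follows the wrapper's documented behavior.

```cpp
#include <map>
#include <string>
#include <vector>

enum DisplayStyle { GMT_OFFSET, RFC822, GENERIC, SPECIFIC, LOCATION, STYLE_COUNT };

// Hypothetical per-zone table: the names that happen to be available.
typedef std::map<DisplayStyle, std::string> StyleNames;

// Returns the first available name in preference order.
// If none of the requested styles has data, returns the zone ID itself.
std::string LookupWithFallback(const std::string &zone_id,
                               const StyleNames &names,
                               const std::vector<DisplayStyle> &preference) {
  for (DisplayStyle style : preference) {
    StyleNames::const_iterator it = names.find(style);
    if (it != names.end()) return it->second;
  }
  return zone_id;
}
```

For example, asking for LOCATION, then GENERIC, on a zone that only has a GENERIC name yields the GENERIC name. A zone with no names at all yields its ID.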

My implementation currently uses a DateFormat, which is slow and does

not quite provide the granularity of format selection, at least in the

current implementation. (The missing granularity should probably be

fixed in DateFormat as well.) Also because of the DateFormat, I ended

up only implementing a function for now that takes a point-in-time

parameter (so that I have a time to stick into the DateFormat), rather

than the more direct function that takes the boolean daylight

selector.

The goal is to have a TimeZone::getDisplayName() function, much like

the one in my wrapper, with a selector like the DisplayStyle here so

that I can implement my wrapper much more directly, without the detour

through the DateFormat.

What do you think?

The following parts of my wrapper API include the getDisplayName().

// Constants for use with GetDisplayName(), for whether a short or
// a long display name is desired.
// Keep the constants and their numeric values in sync with
// ICU's TimeZone::EDisplayType.
enum DisplayLength {
  SHORT = 1,
  LONG = 2
};

// Constants for use with GetDisplayName(), selecting the
// style of time zone display name.
enum DisplayStyle {
  GMT_OFFSET,  // GMT+9:30
  RFC822,      // +0930
  GENERIC,     // Pacific Time
  SPECIFIC,    // Pacific Standard Time or Pacific Daylight Time
  LOCATION,    // Los Angeles (US)
  STYLE_COUNT
};

// Get a display name for the time zone and the specified display locale.
// The locale should be a string like "en", "de_CH" or "zh_Hans".
// If there is no good display name available for the time zone ID, then
// the time zone ID itself is returned.
// The returned string will usually contain non-ASCII characters.
//
// TODO(mscherer): Currently ICU is missing functionality:
// If the LOCATION style is requested, the function may return
// the GENERIC or SPECIFIC style instead.
UnicodeText GetDisplayName(const DateTime &time,
                           DisplayStyle style,
                           DisplayLength length,
                           const string &display_locale) const;

#if 0
// TODO(mscherer): Add this API function here once ICU has a corresponding API
// function. The current icu::TimeZone::getDisplayName() takes a bool daylight
// but does not support this style parameter.
// Instead, the current GetDisplayName(time, ...) implementation
// uses an ICU DateFormat object which requires a datetime parameter.
// We would have to guess a datetime for implementing the version below.

// Overload that takes a bool daylight instead of the time value.
UnicodeText GetDisplayName(bool daylight,
                           DisplayStyle style,
                           DisplayLength length,
                           const string &display_locale) const;
#endif
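
The two offset-based styles need no name data at all, which a small sketch can show. The functions FormatGmtOffset() and FormatRfc822() are hypothetical and only reproduce the sample shapes from the enum comments ("GMT+9:30" and "+0930"). They assume the UTC offset is already known in minutes and ignore locale-specific GMT patterns.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats an offset in minutes east of GMT like "GMT+9:30" or "GMT-8:00".
std::string FormatGmtOffset(int offset_minutes) {
  const char sign = offset_minutes < 0 ? '-' : '+';
  const int abs_minutes = std::abs(offset_minutes);
  char buf[32];
  std::snprintf(buf, sizeof buf, "GMT%c%d:%02d",
                sign, abs_minutes / 60, abs_minutes % 60);
  return buf;
}

// Formats the same offset in the RFC 822 shape, like "+0930" or "-0800".
std::string FormatRfc822(int offset_minutes) {
  const char sign = offset_minutes < 0 ? '-' : '+';
  const int abs_minutes = std::abs(offset_minutes);
  char buf[32];
  std::snprintf(buf, sizeof buf, "%c%02d%02d",
                sign, abs_minutes / 60, abs_minutes % 60);
  return buf;
}
```

With an offset of 570 minutes these produce the two sample strings from the DisplayStyle comments.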

Best regards,

markus

--------

From: John Emmons <emmo@us.ibm.com>

To: Markus Scherer <markus.icu@gmail.com>

Date: Mon, Nov 6, 2006 at 8:32 AM

Hi Markus,

Looks like a good start. However, my biggest concern, which is the

same one that Mark and I are grappling with right now, is how to deal

with Olson zones that may have a different display name depending on

the time in question. In these scenarios, it is difficult or nearly

impossible to implement a getDisplayName() function without going

through DateFormat.

For example,

America/Indiana/Knox - Includes many counties in Indiana that

currently observe CST in winter and CDT in summer. But, prior to

2006, these counties observed EST year round. So in these cases, you

can't do a reliable lookup of the time zone's display name without

knowing which time we are talking about, unless you are willing to

live with an API that returns the display name only as it applies to

the current modern time, and I question how useful such an API would

be in practice.

We are also dealing with the complexity of the fact that

often many Olson zones share a commonly used display name, and we

don't want to have to duplicate these display names everywhere in

CLDR. Things like "Atlantic Standard Time" can apply to

"America/Halifax", but also to "Atlantic/Bermuda", "America/Barbados",

etc. Since they often cross country boundaries, we have the

potential for political conflicts. For example, if I decide I'm going

to put the translations for "Central European Time" in "Europe/Paris",

and alias "Europe/Berlin" to it, do the Germans get upset? And then

what happens when "Europe/Paris" changes its rules? I think you can

appreciate the complexities involved here...

At this point, I am toying with the possibilities of having a

"meta-time zone" that we could define in CLDR for naming purposes, and

then we could define the fact that a certain Olson zone "observes" one

of the meta zones during a specific time period. Right now I'm trying

to formulate a syntax for this that would make sense and cover the

scenarios we need it to.
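
One possible shape for such "observes" data is sketched below. It is purely illustrative and does not reflect any CLDR syntax: MetaZoneSpan, MetaZoneTable, Observe() and MetaZoneAt() are hypothetical, the meta zone labels are made up, and the Knox cut-over instant is a placeholder (start of 2006 in UTC), since the text above only says "prior to 2006".

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// One period during which an Olson zone observes a given meta zone.
struct MetaZoneSpan {
  std::int64_t from;      // inclusive, seconds since 1970-01-01 UTC
  std::int64_t to;        // exclusive
  std::string meta_zone;  // hypothetical label, e.g. "Atlantic"
};

typedef std::map<std::string, std::vector<MetaZoneSpan> > MetaZoneTable;

const std::int64_t kBeginningOfTime = std::numeric_limits<std::int64_t>::min();
const std::int64_t kEndOfTime = std::numeric_limits<std::int64_t>::max();
const std::int64_t kStartOf2006 = 1136073600;  // placeholder cut-over instant

// Records that zone_id observes meta_zone during [from, to).
void Observe(MetaZoneTable &table, const std::string &zone_id,
             std::int64_t from, std::int64_t to, const std::string &meta_zone) {
  table[zone_id].push_back(MetaZoneSpan{from, to, meta_zone});
}

// Builds a tiny table from the examples in this thread.
MetaZoneTable MakeSampleTable() {
  MetaZoneTable table;
  Observe(table, "America/Indiana/Knox", kBeginningOfTime, kStartOf2006, "Eastern");
  Observe(table, "America/Indiana/Knox", kStartOf2006, kEndOfTime, "Central");
  Observe(table, "America/Halifax", kBeginningOfTime, kEndOfTime, "Atlantic");
  Observe(table, "Atlantic/Bermuda", kBeginningOfTime, kEndOfTime, "Atlantic");
  return table;
}

// Returns the meta zone observed at the given instant, or "" if unknown.
std::string MetaZoneAt(const MetaZoneTable &table, const std::string &zone_id,
                       std::int64_t when) {
  MetaZoneTable::const_iterator zone = table.find(zone_id);
  if (zone == table.end()) return "";
  for (const MetaZoneSpan &span : zone->second) {
    if (span.from <= when && when < span.to) return span.meta_zone;
  }
  return "";
}
```

Translations would then be attached once per meta zone instead of once per Olson zone, and a zone that changes its rules only gains a new span.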

You're certainly welcome to participate in the discussion and design

of this. Right now Mark and I are working on it together since no one

else seems to care...

Regards,

John C. Emmons

Globalization Architect

IBM Software Group, Austin TX

Ph. 512-838-8184/512-259-9051

Internet: emmo@us.ibm.com

--------

From: Markus Scherer <markus.icu@gmail.com>

To: John Emmons <emmo@us.ibm.com>

Date: Mon, Nov 6, 2006 at 11:08 AM

Hi John, thanks for the reply and the reminder that I am still

underestimating how messy time zones are!

On 11/6/06, John Emmons <emmo@us.ibm.com> wrote:

... my biggest concern, which is the same one that Mark and I are grappling with right now, is how to deal with Olson zones that may have a different display name depending on the time in question. In these scenarios, it is difficult or nearly impossible to implement a getDisplayName() function without going through DateFormat.

... America/Indiana/Knox - Includes many counties in Indiana that currently observe CST in winter and CDT in summer. But, prior to 2006, these counties observed EST year round. ...

Very good point. This does smell like deprecating versions of

getDisplayName() that do not take a date/time value, and adding ones

that do. However, I would hate for such methods to go through

DateFormat, particularly because that means creating one inside the

method, using it once, and throwing it away -- or else mutexing the

use of an owned DateFormat object. Either way is a slow bottleneck. It

seems like it should be the other way around: A new version of

TimeZone::getDisplayName() should be able to figure out the display

name based on the provided date/time, and DateFormat should call it

with the date/time and with the style and length selectors.
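
The intended direction of the dependency can be sketched like this. Everything here is hypothetical and heavily simplified: SimpleZone stands in for a time zone whose names change at a single instant, daylight time and display styles are ignored, and FormatWithZone() stands in for the formatter. The point is only that the formatter calls the zone's time-aware lookup, and the zone never creates a formatter.

```cpp
#include <cstdint>
#include <string>

enum DisplayLength { SHORT = 1, LONG = 2 };

// Hypothetical zone with one historical change of name.
struct SimpleZone {
  std::string id;
  std::int64_t switch_time;  // seconds since epoch; names change here
  std::string short_before;
  std::string long_before;
  std::string short_after;
  std::string long_after;

  // Time-aware lookup that needs no formatter object and no locking.
  std::string GetDisplayName(std::int64_t when, DisplayLength length) const {
    const bool after = when >= switch_time;
    if (length == SHORT) return after ? short_after : short_before;
    return after ? long_after : long_before;
  }
};

// Formatter-side caller: delegates the zone part to the zone itself.
std::string FormatWithZone(const std::string &formatted_time,
                           const SimpleZone &zone, std::int64_t when,
                           DisplayLength length) {
  return formatted_time + " " + zone.GetDisplayName(when, length);
}
```

Because GetDisplayName() is const and touches no shared mutable state, concurrent callers would not need a mutex in this arrangement.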

So in these cases, you can't do a reliable lookup of the time zone's display name without knowing which time we are talking about, unless you are willing to live with an API that returns the display name only as it applies to the current modern time, and I question how useful such an API would be in practice.

Makes sense. I think we will have to implement this "current behavior"

lookup for the current API though because we don't have the date/time

available and we can't remove the current API.

We are also dealing with the complexity of the fact that often many Olson zones share a commonly used display name...

For example, if I decide I'm going to put the translations for "Central European Time" in "Europe/Paris", and alias "Europe/Berlin" to it, do the Germans get upset? And then what happens when "Europe/Paris" changes its rules? I think you can appreciate the complexities involved here...

Somewhat. I am not sure that anyone would be upset by attaching shared

data to one or the other arbitrarily, for example by using alphabetic

order or something else neutral for choosing the anchor point for the

data.
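
A neutral rule such as alphabetic order is easy to state in code. In this sketch ChooseAnchorZone() is a hypothetical helper, and "alphabetic" is assumed to mean plain byte-wise comparison of the zone IDs.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Picks the zone ID that sorts first as the holder of the shared names.
// Returns an empty string for an empty list.
std::string ChooseAnchorZone(const std::vector<std::string> &zone_ids) {
  if (zone_ids.empty()) return std::string();
  return *std::min_element(zone_ids.begin(), zone_ids.end());
}
```

Under this rule the shared "Central European Time" data would sit under "Europe/Berlin" rather than "Europe/Paris", simply because it sorts first.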

At this point, I am toying with the possibilities of having a "meta-time zone" that we could define in CLDR for naming purposes, and then we could define the fact that a certain Olson zone "observes" one of the meta zones during a specific time period.

This seems like a nice solution even from a technical standpoint,

politics aside.

Right now I'm trying to formulate a syntax for this that would make sense and cover the scenarios we need it to.

You're certainly welcome to participate in the discussion and design of this. Right now Mark and I are working on it together since no one else seems to care...

Well, my main interest is getting to a more usable API, but I would be

happy to participate in bouncing around the data organization as well.

Best regards,

markus