Zend Framework: Zend_Locale Component Proposal
| Proposed Component Name | Zend_Locale |
|---|---|
| Developer Notes | http://framework.zend.com/wiki/display/ZFDEV/Zend_Locale |
| Proposers | |
| Revision | 4.0 - 19 Sept 2006: Extracted translation classes to Zend_Translate; 3.0 - 28 July 2006: Changed class skeletons and added new use cases based on comments from Zend; 2.1 - 11 July 2006: Changed class skeletons to match the actual work; 2.0 - 20 June 2006: Reworked based on the new Zend_Measure and Zend_Date proposals. (wiki revision: 39) |
Table of Contents
- 1. Overview
- 2. References
- 3. Component Requirements, Constraints, and Acceptance Criteria
- 4. Dependencies on Other Framework Components
- 5. Theory of Operation
- 6. Milestones / Tasks
- 7. Class Index
- 8. Use Cases
- 9. Class Skeletons
1. Overview
Zend_Locale is a basic wrapper for all I18N and L10N issues in the Zend Framework. It provides the framework with all locale-related information. All classes that are meant to be locale-aware should implement Zend_Locale.
2. References
Locale Description Standard
Details of the locale description standard that has to be implemented
LDML - Locale Data Markup Language
LDML, the Locale Data Markup Language, is part of the CLDR project and describes locales in an XML-based format.
- Description of the LDML format
- The Common Locale Data Repository (CLDR) Project
- Example XML for LDML (download only)
International Standards
The following international standards must be used:
ISO 639
International Language Code Definition
ISO 639-1 for two-letter and ISO 639-2 for three-letter language codes
ISO 3166
International Country Code Definitions
ISO 3166-1 for two-letter country codes
RFC 3066
Tags for the Identification of Languages
I18N General
Mailing Lists
Past mailing list discussions
Unicode discussion for the upcoming PHP 6
Discussions for Zend Framework
- Johannes Orth - first thoughts about locale support
- Jason Mynard - general I18N thoughts
- Gavin Vess - first summary of needed functions
- Thomas Weidner - summary of functionality for Zend_Locale
- Gavin Vess - simplifying I18N needs
- Gavin Vess - first response to the pre-proposal for Zend_Locale
3. Component Requirements, Constraints, and Acceptance Criteria
- Wrapper functionality
- Lightweight and fast implementation
- Simple use
- Automatic recognition of the language the browser requests
4. Dependencies on Other Framework Components
- Zend_Exception
- Zend_Cache - (optional)
5. Theory of Operation
Basics
Zend_Locale is a wrapper for all locale-related information in the Zend Framework, especially the CLDR data. It has to be simple to use and as lightweight as possible.
Locale description format
Zend_Locale has to know all language and country codes (see ISO 639 and ISO 3166). Therefore a locale description format has to be implemented. The freely available LDML format (part of CLDR) will be used, as it is already used by many open source projects.
Automatic locale recognition
Zend_Locale has to recognize which language the browser is requesting (see ISO 639 and ISO 3166) and automatically find the best matching locale for the client (internal fallback mechanism). Alternatively, it could check the system locale (environment variables), for example when used on the command line.
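The sketch below is a rough illustration of the browser-side recognition. It turns an HTTP Accept-Language header into locale identifiers ordered by preference. The function name parseAcceptLanguage() is hypothetical and not part of the proposed Zend_Locale API. The sketch assumes PHP 7.1 or later and only simple language or language-region tags such as "de-AT".

```php
<?php
// Hypothetical helper (not the Zend_Locale API): parse an Accept-Language
// header into [locale => quality], highest preference first.
// Assumes simple tags like "de" or "de-AT"; longer tags are not split further.
function parseAcceptLanguage(string $header): array
{
    $entries = [];
    foreach (explode(',', $header) as $position => $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        // Split "de-AT;q=0.8" into the tag and its optional parameters.
        $pieces = explode(';', $part);
        $tag = trim($pieces[0]);
        $quality = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strpos($param, 'q=') === 0) {
                $quality = (float) substr($param, 2);
            }
        }
        if ($tag === '*' || $quality <= 0.0) {
            continue; // wildcard or explicitly refused language
        }
        // Normalize "de-at" to the underscore form "de_AT".
        $sub = explode('-', $tag, 2);
        $locale = strtolower($sub[0]);
        if (isset($sub[1])) {
            $locale .= '_' . strtoupper($sub[1]);
        }
        $entries[] = ['locale' => $locale, 'q' => $quality, 'pos' => $position];
    }
    // Highest quality first; equal qualities keep their header order.
    usort($entries, function (array $a, array $b): int {
        if ($a['q'] === $b['q']) {
            return $a['pos'] <=> $b['pos'];
        }
        return $b['q'] <=> $a['q'];
    });
    return array_column($entries, 'q', 'locale');
}
```

The resulting keys could then be matched against the available LDML locales, using the fallback mechanism this proposal describes.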
Formatting with locales
Zend_Locale_Format tells other functions how to format data in different locales. To do this, the module knows how each locale formats data. Data can be anything that is defined in LDML. In particular, this covers:
- Date / Time
- Currency
- Measurement
- Numbers
- and all other descriptions which are handled through the LDML
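The following sketch makes the number-formatting idea concrete. It formats a number with locale-specific decimal and group separators. The hard-coded separator table is an assumed excerpt that stands in for LDML number symbols. formatLocalizedNumber() is a hypothetical name; in the proposal the real data would come from Zend_Locale_Data.

```php
<?php
// Hypothetical sketch, not the Zend_Locale_Format API.
// The table is an assumed excerpt of LDML number symbols, hard-coded for illustration.
function numberSymbols(string $locale): array
{
    $table = [
        'root' => ['decimal' => '.', 'group' => ','],
        'en'   => ['decimal' => '.', 'group' => ','],
        'de'   => ['decimal' => ',', 'group' => '.'],
    ];
    // Fall back from "de_AT" to "de" and finally to "root".
    $language = explode('_', $locale)[0];
    return $table[$locale] ?? $table[$language] ?? $table['root'];
}

function formatLocalizedNumber(float $value, int $decimals, string $locale): string
{
    $symbols = numberSymbols($locale);
    return number_format($value, $decimals, $symbols['decimal'], $symbols['group']);
}
```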
Exception handling
Zend_Locale_Exception is the exception class used for all exceptions thrown by the Zend_Locale classes and their subclasses.
Basically, we have to implement and know the following standards:
ISO 639 - Language Codes
ISO 3166 - Region Codes
RFC 3066 - Language Tags
Outsourced or delayed functionality
Zend_Locale_Collate defines how alphabetic sorting has to be done in different locales. Examples are German, Greek and Russian, whose alphabets differ from the basic 26-letter Latin alphabet or order letters differently. This class will be delayed for later implementation.
Framework Components which use Zend_Locale
- Zend_Date
- Zend_Calendar
- Zend_Measure
- Zend_Currency
- Zend_Http_Client
- Zend_Http_Server
- Zend_Format_Input
6. Milestones / Tasks
7. Class Index
- Zend_Locale - base class
- Zend_Locale_Exception - exception handling
- Zend_Locale_Format - A standard interface for locale formatting
- Zend_Locale_Data - internal LDML handling class
8. Use Cases
Get default locale
Set another language
Get region and language
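Here is a minimal sketch of splitting a locale identifier into language and region. It assumes identifiers of the form language[_REGION], with an ISO 639 language code (two or three letters) and an optional ISO 3166-1 two-letter country code. splitLocale() is a hypothetical helper, not the Zend_Locale getter.

```php
<?php
// Hypothetical helper: "de_AT" -> ['language' => 'de', 'region' => 'AT'].
// Returns null when the identifier does not match language[_REGION].
function splitLocale(string $locale): ?array
{
    if (!preg_match('/^([a-z]{2,3})(?:_([A-Z]{2}))?$/', $locale, $m)) {
        return null;
    }
    // A missing region group is simply absent from $m.
    return ['language' => $m[1], 'region' => $m[2] ?? null];
}
```

Special identifiers such as "root" are deliberately rejected here. A real implementation would treat them separately.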
Get the accepted languages from a user's browser
Formatting
Formatting numbers
Example usage of a custom date format (Zend_Date will use Zend_Locale_Format internally)
Example usage of a custom defined format
Checking if a locale string is a float value
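One way to check whether a string is a float in a given locale is to normalize it with that locale's separators and reject anything that does not fit. The sketch below uses a small hard-coded separator table in place of LDML data. normalizeLocalizedFloat() and isLocalizedFloat() are hypothetical names, not Zend_Locale_Format methods.

```php
<?php
// Hypothetical sketch: convert a locale-formatted number to a locale-independent
// float, or return null if it is not a valid number in that locale.
function normalizeLocalizedFloat(string $input, string $locale): ?float
{
    // Assumed excerpt of LDML number symbols.
    $table = [
        'root' => ['decimal' => '.', 'group' => ','],
        'en'   => ['decimal' => '.', 'group' => ','],
        'de'   => ['decimal' => ',', 'group' => '.'],
    ];
    $language = explode('_', $locale)[0];
    $symbols = $table[$locale] ?? $table[$language] ?? $table['root'];
    $dec = preg_quote($symbols['decimal'], '/');
    $grp = preg_quote($symbols['group'], '/');
    // Optional sign, digits grouped by three (or ungrouped), optional fraction.
    $pattern = '/^([+-]?)(\d{1,3}(?:' . $grp . '\d{3})+|\d+)(?:' . $dec . '(\d+))?$/';
    if (!preg_match($pattern, trim($input), $m)) {
        return null;
    }
    $integer = str_replace($symbols['group'], '', $m[2]);
    $fraction = $m[3] ?? '';
    return (float) ($m[1] . $integer . ($fraction !== '' ? '.' . $fraction : ''));
}

function isLocalizedFloat(string $input, string $locale): bool
{
    return normalizeLocalizedFloat($input, $locale) !== null;
}
```

Note that '1.234' is read as 1234 in German but as 1.234 in English. This is why input parsing has to know the locale.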
Getting a locale translated list of all countries
6 Comments
Jul 17, 2006
Gavin
Zend_Locale is conditionally accepted subject to the conventions,
stipulations, requirements, and changes listed below.
Zend_Locale should focus on encapsulating the data specifying a
particular locale. It should contain information either fully or
partially describing a particular "locale", and include functions to
access and modify this information.
A ZF default locale object should be used in most places where a
component or function expects an optional locale object.
The default locale object should be constructed from an instance of
Zend_Config shared by all ZF components.
The locale should be optional for constructors, not required.
We need some sample use cases showing how Zend_Locale and related
classes will be used with Zend_Input_Filter.
Where possible Zend_Date should be loosely coupled to Zend_Locale.
Ideally, many of the functions in Zend_Date would not even 'require()'
Zend_Locale.
Instances should be serializable.
The private static "class" variables should become protected instead of
private.
Text string translation functionality should be split to a separate
class. Will the TMX and gettext backends need Zend_Cache or use purely
internal optimizations (e.g. specialized caching or "compiled" data
structures) to improve performance?
Add to the requirements: "Functionality in Zend_Locale and
Zend_Locale_Translate that is duplicated in PHP 6 and the new
object-oriented data extension will be replaced when available. Effort
should be made to provide wrapper functionality and a compatible API."
Functionality to automatically create a locale object from information
sent in HTTP headers (i.e. sent by the browser) should be split to a
separate class. However, "auto-completing" a locale object given
partial data might still reside in Zend_Locale. Auto-completing may be
accomplished algorithmically, possibly using heuristics in some
circumstances. Auto-completion should not be automatic by default in
Zend_Locale.
The word "locale" is overloaded too much. Function names need to be
more intuitive and descriptive. For example, these functions:
getLanguage() returns a language code
getCountry() returns a country code
getRegion() returns a region code
The SQL backend storage should remain in the incubator until performance
benchmarks demonstrate performance is comparable to the TMX and Gettext
storage backends to avoid potential complications with developers
encountering severe performance problems.
If Zend_Locale::setLocale() is actually a getter-type function that
"returns the actual set default Locale", then a name like
"getDefaultLocaleName" is more descriptive, but terribly long. Do you
have suggestions for a naming convention to clearly distinguish between:
that sufficient to unambiguously specify a precise "locale"?)
The comments and names for setLocale(), getLocale(), and getLocales()
are confusing (possibly accidentally swapped?). Let us use more
descriptive names, like getBrowserDefaultLocale(), getLocalString(),
getDefaultLocale(), and getAllLocales() .. or something similar. Please
clarify the names and purpose of these functions, and give example use
cases.
Parsing, normalization, conversion, and formatting functions could
benefit from sharing a common naming scheme with other locale-related
classes.
Methods should not combine input parsing, normalization, and output
formatting all in one step. Instead, parsing and normalization could be
encapsulated by a constructor. Then a "formatTo($someLocale)" can be
applied to the instance.
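The suggested split between parsing/normalization and output formatting can be sketched as a small value object. The constructor normalizes locale-formatted input once, and formatTo() renders the stored value for any target locale. LocalizedNumber, toFloat() and formatTo() are hypothetical names. The separator table is an assumed stand-in for LDML data, and the normalization is deliberately simplified: it does not validate group positions.

```php
<?php
// Hypothetical value object: parse and normalize once, format on demand.
final class LocalizedNumber
{
    // Assumed excerpt of LDML number symbols.
    private const SYMBOLS = [
        'root' => ['decimal' => '.', 'group' => ','],
        'en'   => ['decimal' => '.', 'group' => ','],
        'de'   => ['decimal' => ',', 'group' => '.'],
    ];

    /** @var float locale-independent value */
    private $value;

    public function __construct(string $input, string $locale)
    {
        $symbols = self::symbols($locale);
        // Normalize: drop group separators, then turn the decimal separator into '.'.
        $normalized = str_replace([$symbols['group'], $symbols['decimal']], ['', '.'], trim($input));
        if (!is_numeric($normalized)) {
            throw new InvalidArgumentException("'$input' is not a number in locale $locale");
        }
        $this->value = (float) $normalized;
    }

    public function toFloat(): float
    {
        return $this->value;
    }

    // Output formatting happens only here, for whichever locale is requested.
    public function formatTo(string $locale, int $decimals = 2): string
    {
        $symbols = self::symbols($locale);
        return number_format($this->value, $decimals, $symbols['decimal'], $symbols['group']);
    }

    private static function symbols(string $locale): array
    {
        $table = self::SYMBOLS;
        $language = explode('_', $locale)[0];
        return $table[$locale] ?? $table[$language] ?? $table['root'];
    }
}
```

Because the instance stores only the normalized float, the same parsed input can be rendered for several locales without parsing it again.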
Questions for Proposal Author/Team
Zend_Locale_Format needs clarification, including a list of possible
values for $input. For example, does Zend_Locale_Format::getFormat()
provide generic number formatting based on locale?
Is there any portion of LDML that Locale does not need to model in order
to provide our lightweight implementation of "locale"?
Jul 18, 2006
Thomas Weidner
I will answer all related questions and make the changes a little later (there is much to redesign and think about, as I have to review 3 classes, one of which is already done).
But one thing, to make this a little clearer:
Locale in the sense of translation does not make use of LDML.
LDML is only needed for the additional locale-aware classes, for example
"how to parse numbers, how to format calendars, get a list of all countries" and much more.
So Zend_Locale itself currently makes no use of Zend_Locale_Data, as Zend_Locale only has to
handle translation issues.
Zend_Locale_Data is a passive, standalone, static ICU/LDML reader which returns only the
requested data from LDML. It COULD be used completely independently.
Of LDML itself, for now only the main and the supplemental files will be supported.
No support for collation, segments or transforms is planned for now.
All other questions will be answered later.
Jul 19, 2006
Gavin
Thank you Thomas for your hard work on this important proposal. I completely understand the need for more time. Also, everyone is here to help, and answers can be worked out one part at a time, with comments back and forth between everyone interested in helping with this component.
We repeated the lightweight design goal in the form of a question ("Is there any portion of LDML ..."), to make certain key design decisions are captured in these comment notes. These notes provide key reference material in the future, as components evolve. The question was added to this proposal, since Zend_Locale_Data/Format are implied by this proposal.
Jul 28, 2006
Thomas Weidner
Here I must separate two tasks which are both known as "locale".
TRANSLATION:
Here we have to
LOCALE INFORMATION MANAGEMENT:
Here we have to do several things in a locale-aware way. This implements
I've separated these tasks.
Translation and all related issues are handled by Zend_Locale.
Locale Information Management, which generally covers ICU/LDML usage, is handled by Zend_Locale_Format.
This class is static so that ICU data can be used independently of Zend_Locale if needed.
Therefore the Zend_Date object, for example, can be implemented independently of Zend_Locale,
which makes it faster.
Zend_Locale_Format is intended to be used by other classes which need to be locale-aware.
Here we have to distinguish between
Independently, the default locale can be found by calling the static method Zend_Locale::getDefault().
All locale-aware classes will call this function first when NO standard locale was given.
The standard locale will be stored by calling Zend_Cache for
Locales are always OPTIONAL.
When no locale was given, all locale aware classes will use the default locale.
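The resolution order for an optional locale could look like the following sketch. An explicitly passed locale wins. Otherwise the most preferred browser locale is used, then the environment, and finally root. resolveLocale() is a hypothetical helper that illustrates the rule. It is not the Zend_Locale::getDefault() implementation.

```php
<?php
// Hypothetical sketch of resolving an optional locale argument.
// $browserLocales: locale strings ordered by preference (e.g. from Accept-Language).
// $environment: a value such as getenv('LANG'), e.g. "de_DE.UTF-8", or null.
function resolveLocale(?string $explicit, array $browserLocales, ?string $environment): string
{
    if ($explicit !== null && $explicit !== '') {
        return $explicit;
    }
    if (!empty($browserLocales)) {
        return reset($browserLocales);
    }
    if ($environment !== null && $environment !== '' && $environment !== 'C' && $environment !== 'POSIX') {
        // Strip encoding and modifier: "de_DE.UTF-8@euro" -> "de_DE".
        return preg_split('/[.@]/', $environment)[0];
    }
    return 'root';
}
```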
I haven't used Zend_Filter_Input until now.
I'm not sure what you want me to show here...
How does Zend_Locale make use of Zend_Filter_Input?
Not at all, as Zend_Locale/_Format handles all locale things itself.
If it were to use Zend_Filter_Input, for example for parsing a locale-aware float number,
Zend_Filter_Input::getFloat($locale); could be called.
But the problem here is that Zend_Locale_Format knows how to parse an input string properly,
and Zend_Filter_Input does not.
Doing this, I would lose the ability to parse input in a locale-aware way.
How can Zend_Filter_Input use Zend_Locale?
There are 2 options:
1.) Zend_Filter_Input could use Zend_Locale_Format to find out whether a string is locale-aware and would get back a
locale-unaware string.
So Zend_Filter_Input's function getInt would first have to call Zend_Locale_Format and could then do its
job with the returned locale-unaware value.
This would be the best way in my opinion.
Zend_Filter_Input could use:
Zend_Locale_Format::getInteger
Zend_Locale_Format::getFloat
Zend_Locale_Format::getDate
and so on...
2.) Zend_Filter_Input could be locale aware by knowing how to format.
There are 2 ways.
2.1) Zend_Locale_Format::getFormat(Zend_Locale::NUMBERFORMAT) would, for example, return '#,##0.0#'.
Zend_Locale_Filter would then have to parse the input as defined by the format.
Not a good way in my opinion, as Zend_Locale_Format::getInteger, for example, already does this job in a locale-aware way.
2.2) Zend_Locale_Data::getContent($locale,'numbersymbols') could return the
locale aware symbols for number formatting.
Then Zend_Filter_Input would have to parse the input checking the right symbols.
Also not a good way in my opinion, as Zend_Locale_Format::getInteger already does this job.
3.) Zend_Filter_Input could stay locale independent and the user would have to do the job.
This could be done, but why should users have to know whether their input string is locale-aware or not?
It is better to do this job automatically, so the user does not have to think about it.
For dates it could look like this
So my opinion is that Zend_Filter_Input should be locale-aware, and it should use
Zend_Locale_Format.
So the user only has to do
all other things would be done internally.
This would be the best way in my opinion.
Zend_Date only makes use of Zend_Locale_Format and Zend_Locale_Data.
Both of these are static. Zend_Locale itself would not be used.
See above for LOCALE INFORMATION MANAGEMENT.
Added in the proposal
In Zend_Locale_Data the internal variables were made private because
they store the information found in LDML and which locales have
been searched so far.
Making them protected would break the class, as these variables are only
for internal use. If another class could change what Zend_Locale_Data has
found and searched so far, it would be a big problem.
What is the intention of making them protected?
Zend_Locale is this separate class.
LDML is only accessible through Zend_Locale_Format/Data.
The backends are expected to use Zend_Cache for internal caching.
Each backend has to do this on its own, because SQL caching is different from file caching,
and each source has to be handled, and therefore also cached, differently.
Caching has to be done in the source backends, as the base class doesn't know when a source is no longer available.
Agreed.
Only problem for now...
How could a "compatible API" be done when we don't know what the I18N object will look like in PHP 6?
I've looked at several other languages and took the most commonly used things. This should do the job best.
I think that creating a separate class for only one (1) function is a somewhat excessive option.
There are only 2 ways for detecting locales.
What do you mean by "autocompleting"?
When requesting "de_TG", de and root will automatically be used.
Locale information and translation are ALWAYS recursive.
If this is what you mean by autocompletion, we would break standard behavior.
Keep in mind that there must always be an output, even when the requested string cannot be translated.
I think this had already been done when your comments were written.
I only found getLocale and setLocale...
Were you referring to an older proposal?
Agreed.
But keep in mind that smaller projects often use a database for simpler access,
gettext is mostly used for medium-sized projects, and TMX in really big
projects. The exceptions prove the rule.
getDefault() will return the default locale.
But you can also choose to take the default locale only from
the environment or only from the browser: getEnv()/getBrowser()
To clarify:
When no locale name is given the "standard locale" has to be used.
This is normally the root. But it could also be an automatically found
locale from browser or environment.
A locale is always represented by
LANGUAGE and REGION
Language en, de or fr means a standard which can be used in all areas where en, de or fr is spoken.
Region defines information which is specific to that country.
So 'de' means all German-speaking people or countries.
'de_AT' means only the Austrian people who speak German.
'it_AT' could also be possible when there are Italian-speaking ethnic groups in this region.
As 'it_AT' cannot be found in LDML, the next recursion level is used:
'it_AT' degrades to 'it' only.
This is the same for translation and for reading the LDML.
'it' is also a precise definition of a locale, but this locale is region-unaware.
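The recursive degradation from 'it_AT' to 'it' and finally to root can be sketched as below. localeFallbackChain() and lookupWithFallback() are hypothetical helpers. The data array stands in for parsed LDML or translation sources.

```php
<?php
// Hypothetical sketch: "it_AT" -> ['it_AT', 'it', 'root'].
function localeFallbackChain(string $locale): array
{
    $chain = [];
    $parts = explode('_', $locale);
    // Drop the last subtag on each step: region first, then language.
    while (!empty($parts)) {
        $chain[] = implode('_', $parts);
        array_pop($parts);
    }
    if (end($chain) !== 'root') {
        $chain[] = 'root';
    }
    return $chain;
}

// Return the first value defined along the chain, or null if even root lacks it.
function lookupWithFallback(array $data, string $locale, string $key)
{
    foreach (localeFallbackChain($locale) as $candidate) {
        if (isset($data[$candidate][$key])) {
            return $data[$candidate][$key];
        }
    }
    return null;
}
```

Root is always the last candidate. A lookup therefore returns null only when root itself lacks the key, which matches the requirement that there is always some output.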
I think I had changed this even before your comments.
But I changed them a little more, so the purpose of each function should be clear now.
Take a look at Zend_Locale_Format.
All related functions are available in this class.
Parsing and Normalization has to be done in one step.
Formatting the output is another step.
Input parsing and normalization have to be done at construction time or
on function call.
Output formatting is only done when output is needed (toString, convert, ...)
Yes, it provides these rules based on the locale.
The following standard formats could be returned, based on the locale, by the function getFormat($type):
Date(Calendar, Format) // Formatting rules for a given date type
Time(Calendar, Format) // Formatting rules for a given time type
DateTime(Calendar, Format) // Formatting rules for a given calendar type
Timezone // Formatting rules for a timezone
Decimal // Formatting rules for decimal values (-123,456.789)
Scientific // Formatting rules for scientific values (-12.345e-67)
Percentage // Formatting rules for percentage values (1432%)
Currency // Formatting rules for currency values
LDML is only available through Zend_Locale_Format and Zend_Locale_Data.
Data is the ICU parser which gives us all the information we want.
Format is the parsing and normalization class which makes use of Zend_Locale_Data.
LDML contains a lot of very useful information,
for example a localized list of all languages, regions, month names and much more.
Things which will not be present for now are
They could be included in future.
I hope everything is clear now.
Aug 01, 2006
Gavin
Filtering
I like your approach to integrating Zend_Locale into Zend_Filter_Input, partly because it reduces the total amount of code a developer must write to use these classes compared to alternatives.
private static
If static class variables are private, then how can anyone extend or modify the class? If everyone is absolutely certain the class will never be extended by a developer in a meaningful way, then private static might be ok. If a developer can add a method that "breaks" the parent class by manipulating a protected static variable, that is not sufficient reason to prevent all developers from subclassing the parent class.
Compatibility with I18N PHP
lol .. so true! We do the best we can in the time we have, don't worry about it too much, as this is not really a strict requirement, but more of a guideline (compatibility with PHP, as it evolves).
Automatic Locale Recognition
Automatic locale recognition can take more forms than just using information from HTTP headers. HTTP_ACCEPT_LANGUAGE sometimes contains partial information, like only "en". Similarly, we might only know the region or country. Thus, in some circumstances a complete, full specification of a locale is not known. I use the term "auto-completing" for the process of algorithmically using known facts to determine the real locale. If you believe the total amount of code for detecting locales from environment variables, HTTP_ACCEPT_LANGUAGE, etc. would be small, then I don't think anyone will object to including it in the Zend_Locale class. We want to be careful that a simple "hello world" program doesn't start using classes with lots of unnecessary locale code. However, if we ever begin implementing other forms of auto-completion (e.g. deriving a locale from a postal address, phone number, etc.), I would be concerned about possible code bloat in Zend_Locale.
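Auto-completion, as described here, can be sketched as filling in the missing half of a locale from a likely-match table. The tables below are a tiny hard-coded assumption. CLDR's supplemental likely-subtags data would be one real source for such mappings. autoCompleteLocale() is hypothetical, and in line with the earlier review comment it is called explicitly rather than applied automatically.

```php
<?php
// Hypothetical sketch: derive language_REGION from partial information.
// Both tables are illustrative assumptions, not CLDR data.
function autoCompleteLocale(?string $language, ?string $region): ?string
{
    $likelyRegion = ['en' => 'US', 'de' => 'DE', 'fr' => 'FR', 'it' => 'IT'];
    $likelyLanguage = ['US' => 'en', 'GB' => 'en', 'DE' => 'de', 'AT' => 'de', 'FR' => 'fr', 'IT' => 'it'];

    if ($language !== null && $region !== null) {
        return $language . '_' . $region; // already complete
    }
    if ($language !== null) {
        // Unknown languages are returned unchanged (region-unaware).
        return isset($likelyRegion[$language]) ? $language . '_' . $likelyRegion[$language] : $language;
    }
    if ($region !== null && isset($likelyLanguage[$region])) {
        return $likelyLanguage[$region] . '_' . $region;
    }
    return null; // nothing usable is known; the caller decides
}
```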
$locale means .. ?
Yes, while our comments were assembled and reviewed the proposal was updated and function names improved, so our comments about function names were addressed before the comments were posted. I searched through the current usage of the word 'locale' in the Zend_Locale proposal, and only one key ambiguity concerns me. In the code and examples, $locale sometimes refers to an instance of Zend_Locale, and other times it refers to a string. Is there something we could do to reduce possible confusion between locale objects and locale strings?
LDML Bloat
In examining the XML files, I see numerous files containing mostly XML tag names, and very little data (e.g. en_US.xml, which even has bloat in the form of comments). I also see other files containing lots of data that very few will ever need (e.g. en.xml, ja.xml). I see that Locale/Data/ requires 8.6 MB of disk space in Cygwin / NTFS. However, there are only 3,615,863 bytes used for all of these files, with the majority comprising XML "overhead".
Is it possible or reasonable to consider alternative ways of storing and accessing this information, in order to avoid performance issues for those using only subsets of this data?
Hint: This might be a version 2 objective.
Text Translation
Zend_Locale still includes translation functionality. There have been concerns that this would result in unnecessary overhead for applications not using any translation features, but needing other functions of the locale classes, such as date-related functions. Given the separation of the other classes listed in the Zend_Locale proposal, perhaps these concerns are unfounded. However, without more information, it is difficult to determine the impact of supporting translation directly in the Zend_Locale API.
Help
I also highly recommend recruiting help from amongst the ZF community of CLA signees. There is much to do here. The early alpha and beta testers of these incubator components could be a great resource not only for helping with testing and coding, but also in figuring out good unit tests (at least 80% coverage prior to review for inclusion to core), helping with the documentation, and practical use case examples.
P.S.
Some of our previous comments apply to previous versions of the proposal. Nevertheless, I believe the comments are still relevant, as they establish and clarify requirements.
Aug 02, 2006
Laurent Melmoux
Text Translation
---------------------
Thomas, things are starting to look really great, thanks!!!
Just one more thing:
... I agree with Gavin that you should move the translation part out of the Zend_Locale class. I think Zend_Locale, as its name suggests, should only manage locales and support other locale-sensitive components to make localization possible. Then Zend_Translation will deal with translation strings. Each one does what it is supposed to do, and naming things this way helps to understand what the code is doing.
My $0.02