<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

<ac:macro ac:name="unmigrated-inline-wiki-markup"><ac:plain-text-body><![CDATA[

Zend Framework: Zend_Uri Component Proposal

Proposed Component Name Zend_Uri
Developer Notes http://framework.zend.com/wiki/display/ZFDEV/Zend_Uri
Proposers Shahar Evron
Zend Liaison TBD
Revision 1.0 - 2 August 2010: Ready for community review
0.1 - 26 July 2010: Initial Draft (wiki revision: 40)

1. Overview

Zend_Uri is the Zend Framework component responsible for representing URIs (Uniform Resource Identifiers) as objects.

In Zend Framework 1.x, Zend_Uri was mostly used to represent and validate URIs of specific schemes, and the only scheme implemented was HTTP. In addition, Zend_Uri 1.0 could not represent partial or relative URIs or URIs of arbitrary schemes, and did not provide tools for resolving, encoding and normalizing URIs.

This proposal describes a set of changes and improvements (effectively a complete rewrite) to Zend_Uri for Zend Framework 2.0 that will address the deficiencies mentioned above.

2. References

  • Preview code is available on the zend-uri-se branch at git://arr.gr/zf2.git

3. Component Requirements, Constraints, and Acceptance Criteria

  • Zend\Uri will allow representation of generic URIs as objects
  • Zend\Uri will allow programmatic composition of URIs using getter/setter methods to various URI parts
  • Zend\Uri will allow creation of URI objects through string parsing
  • Zend\Uri will allow representation of partial and relative URIs
  • Zend\Uri will closely follow RFC-3986 definitions for URI syntax
  • Zend\Uri will always produce RFC-3986 compliant URIs when converting URI objects back to a string
  • Zend\Uri will attempt to be flexible when parsing string URIs and accept invalid URIs or URI parts if these can be encoded into a syntactically valid URI
  • Zend\Uri will provide subclasses for representation of scheme-specific URIs
  • Zend\Uri will allow users to easily create their own scheme-specific classes
  • Zend\Uri scheme subclasses may enforce additional validation rules on URIs
  • Zend\Uri scheme subclasses may provide additional protocol or scheme specific APIs
  • Zend\Uri will allow automatic string-to-object conversion using the factory pattern
  • Zend\Uri will provide API for resolving relative URIs
  • Zend\Uri will provide API for normalizing URI strings
  • Zend\Uri will provide API for converting absolute URIs into relative URIs based on a shared absolute base URI
  • Zend\Uri will provide generic methods for validating and encoding different URI parts
  • Zend\Uri will not provide an interface for strict validation or encoding of URI strings.
    • These operations should be provided by Zend\Validate and Zend\Filter classes that may internally rely on Zend\Uri encoding and validation methods.

4. Dependencies on Other Framework Components

  • Zend\Validate\Hostname
  • Zend\Validate\Ip
  • Zend\Exception

5. Theory of Operation

Subclassing

The Zend\Uri component will provide a concrete class (Zend\Uri\Uri) implementing RFC-3986 compatible Generic URI Syntax parsing, composition, resolution, validation, encoding and normalization of URIs. It can be used to represent any compliant URI, including scheme-specific URIs and partial or relative URIs.

In addition, Zend\Uri will provide a set of subclasses of Zend\Uri\Uri (initially Zend\Uri\Http and Zend\Uri\File) that will only be capable of representing URIs of specific schemes, and will enforce validation rules beyond those defined by the Generic Syntax RFC. These subclasses may still be able to represent partial or relative URIs, as long as they comply with any rules imposed by the scheme.

Parsing and Composition

URI parsing and composition will be done following the parsing and composition rules defined in the RFC. The aim is to be relatively lax when parsing string URIs and setting different URI parts using accessor methods, and accept input as long as it can eventually be encoded into a valid URI when the object is converted back to a string.

For example, the URI file:///C:/Program Files/Zend will be accepted by the parser despite the fact that its path component (C:/Program Files/Zend) contains an invalid space character. When the URI is composed back into a string and is normalized, it will be represented as file:///C:/Program%20Files/Zend, which is a valid and RFC-compliant URI.
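
The encoding step described above can be illustrated with a minimal sketch in plain PHP. The function name encodePath() is hypothetical and is not part of the proposed Zend\Uri API; it only assumes that characters not allowed literally in an RFC-3986 path are percent-encoded, while existing escape sequences are left alone.

```php
<?php
// Hypothetical helper for illustration only (not the Zend\Uri API).
// Percent-encodes every byte that RFC 3986 does not allow literally in a
// path. A '%' is kept only when it starts a valid %XX escape sequence.
function encodePath(string $path): string
{
    // Allowed literally: unreserved, sub-delims, ':', '@' and '/'.
    $pattern = '/(?:[^A-Za-z0-9\-._~!$&\'()*+,;=:@\/%]|%(?![A-Fa-f0-9]{2}))/';

    return preg_replace_callback(
        $pattern,
        function (array $match): string {
            return rawurlencode($match[0]);
        },
        $path
    );
}

// The path from the example above, with its space encoded as %20.
echo 'file://' . encodePath('/C:/Program Files/Zend'), "\n";
```

Already-encoded input such as /a%20b passes through unchanged, so the helper can be applied repeatedly without double-encoding.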

Zend\Uri will refuse to parse a string or accept a part set through one of the mutator methods only if the input can never be unambiguously converted into a valid URI part.

For example, setting a scheme that contains a space will not be allowed, since the scheme of a URI may never contain spaces and the URI syntax rules do not define a means to represent a space character in the scheme part.

In contrast, setting a query that contains literal ' ' and '#' characters will be allowed, since although the ' ' and '#' signs may not be used literally in the query part of a URI, they can be encoded as '%20' and '%23' respectively when the URI is re-composed.
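
The difference between the two cases can be sketched as follows. The helper names isValidScheme() and encodeQuery() are assumptions made for this illustration and do not come from the proposal; the scheme grammar is the one given in RFC 3986.

```php
<?php
// Hypothetical helpers for illustration only (not the Zend\Uri API).

// RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// There is no escape mechanism for the scheme, so invalid input is rejected.
function isValidScheme(string $scheme): bool
{
    return preg_match('/^[A-Za-z][A-Za-z0-9+\-.]*$/D', $scheme) === 1;
}

// The query part does have an escape mechanism, so characters that may not
// appear literally are percent-encoded instead of being rejected.
function encodeQuery(string $query): string
{
    $pattern = '/(?:[^A-Za-z0-9\-._~!$&\'()*+,;=:@\/?%]|%(?![A-Fa-f0-9]{2}))/';

    return preg_replace_callback(
        $pattern,
        function (array $match): string {
            return rawurlencode($match[0]);
        },
        $query
    );
}

var_dump(isValidScheme('my scheme'));          // a space can never be represented
echo encodeQuery('q=zend framework#1'), "\n";  // space and '#' are encoded
```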

Relative URI Resolution

One of the common tasks to perform with URIs is resolving relative URIs: merging a base URI with a relative URI to form a canonical representation of the relative URI. Unlike Zend_Uri 1.0, the new implementation will expose an API for resolving a (possibly relative) URI against an absolute base URI to form an absolute URI.
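
As a rough illustration of what such resolution involves, the following sketch follows the reference resolution algorithm of RFC 3986 (sections 5.2.2 to 5.2.4 and 5.3) in plain PHP. The function names splitUri(), removeDotSegments() and resolveReference() are hypothetical and say nothing about the final Zend\Uri method names.

```php
<?php
// Illustrative sketch only; function names are not from the proposal.

// Splits a URI reference using the regular expression from RFC 3986,
// Appendix B. Components that are absent are returned as null.
function splitUri(string $uri): array
{
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s', $uri, $m, PREG_UNMATCHED_AS_NULL);
    return [
        'scheme' => $m[2] ?? null, 'authority' => $m[4] ?? null, 'path' => $m[5] ?? '',
        'query'  => $m[7] ?? null, 'fragment'  => $m[9] ?? null,
    ];
}

// RFC 3986, section 5.2.4: removes "." and ".." segments from a path.
function removeDotSegments(string $path): string
{
    $out = '';
    while ($path !== '') {
        if (strpos($path, '../') === 0) {
            $path = substr($path, 3);
        } elseif (strpos($path, './') === 0 || strpos($path, '/./') === 0) {
            $path = substr($path, 2);
        } elseif ($path === '/.') {
            $path = '/';
        } elseif (strpos($path, '/../') === 0 || $path === '/..') {
            $path = ($path === '/..') ? '/' : substr($path, 3);
            $pos  = strrpos($out, '/');
            $out  = ($pos === false) ? '' : substr($out, 0, $pos);
        } elseif ($path === '.' || $path === '..') {
            $path = '';
        } else {
            // Move the first segment (with its leading '/') to the output.
            $next = strpos($path, '/', 1);
            $out .= ($next === false) ? $path : substr($path, 0, $next);
            $path = ($next === false) ? '' : substr($path, $next);
        }
    }
    return $out;
}

// RFC 3986, section 5.2.2: resolves $ref against the absolute URI $base.
function resolveReference(string $base, string $ref): string
{
    $b = splitUri($base);
    $t = splitUri($ref);
    if ($t['scheme'] === null) {
        $t['scheme'] = $b['scheme'];
        if ($t['authority'] === null) {
            $t['authority'] = $b['authority'];
            if ($t['path'] === '') {
                $t['path']  = $b['path'];
                $t['query'] = $t['query'] ?? $b['query'];
            } elseif ($t['path'][0] !== '/') {
                // Merge: replace the last segment of the base path.
                $pos    = strrpos($b['path'], '/');
                $prefix = ($pos === false) ? '' : substr($b['path'], 0, $pos + 1);
                if ($b['authority'] !== null && $b['path'] === '') {
                    $prefix = '/';
                }
                $t['path'] = $prefix . $t['path'];
            }
        }
    }
    $t['path'] = removeDotSegments($t['path']);

    // RFC 3986, section 5.3: component recomposition.
    return ($t['scheme'] !== null ? $t['scheme'] . ':' : '')
        . ($t['authority'] !== null ? '//' . $t['authority'] : '')
        . $t['path']
        . ($t['query'] !== null ? '?' . $t['query'] : '')
        . ($t['fragment'] !== null ? '#' . $t['fragment'] : '');
}

echo resolveReference('http://a/b/c/d;p?q', '../../g'), "\n";
```

The base URI and references used here are taken from the examples in RFC 3986, section 5.4.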

Additionally, Zend\Uri will expose an API to perform the opposite operation: "subtract" a common base URI from an absolute URI to form a relative reference.
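
A simplified sketch of this "subtraction" is shown below. It works on path components only and assumes the caller has already checked that both URIs share the same scheme and authority; the function name makeRelativePath() is hypothetical and is not the proposed API.

```php
<?php
// Illustrative sketch only; the function name is not from the proposal.
// Computes a relative path that leads from $basePath to $targetPath.
// Both arguments are assumed to be absolute paths of URIs that share
// the same scheme and authority.
function makeRelativePath(string $basePath, string $targetPath): string
{
    $base   = explode('/', $basePath);
    $target = explode('/', $targetPath);

    // The last segment of each path is the "document", not a directory.
    array_pop($base);
    $document = array_pop($target);

    // Skip the directory segments that both paths have in common.
    $i = 0;
    while ($i < count($base) && $i < count($target) && $base[$i] === $target[$i]) {
        $i++;
    }

    // Climb out of what remains of the base, then descend into the target.
    $up       = array_fill(0, count($base) - $i, '..');
    $segments = array_merge($up, array_slice($target, $i), [$document]);
    $relative = implode('/', $segments);

    if ($relative === '') {
        return './';
    }
    // A first segment containing ':' would be mistaken for a scheme.
    if (strpos($segments[0], ':') !== false) {
        return './' . $relative;
    }
    return $relative;
}

echo makeRelativePath('/a/b/c/d', '/a/b/e/f'), "\n";
```

Resolving the result against the base path should give back the original target path, which is a convenient property to check in unit tests.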

Both methods can be useful for example when composing or parsing HTML pages, and when creating links in portable applications.

Normalization

Zend\Uri\Uri and its subclasses will expose an API to normalize URI objects. This normalization method should be used, for example, before comparing two URI strings to check if they are identical.

For example, the following URLs, while syntactically different, are semantically equivalent:

    http://www.example.com:80/?foo=b%61
    HTTP://www.example.com?foo=bar

The normalization API will allow the user to compare these two URIs by normalizing them using the RFC-defined normalization rules (and possibly scheme-specific normalization added in Zend\Uri\Uri subclasses). In the example above, both URIs would normalize to:

    http://www.example.com/?foo=bar

Normalizing URIs will include:

  • Converting the scheme to lower case
  • Removing the port if it is equal to the scheme's default port
  • Decoding any percent-encoded characters which do not need to be encoded
  • Replacing an empty path with '/' in URIs that have an authority part
  • Converting percent-encoding hexadecimal characters to upper case
  • Removing empty port, query or fragment parts
  • Additional scheme specific normalization (e.g. in HTTP URLs lower-casing the host name)
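
The rules listed above can be sketched in plain PHP as follows. This is only an approximation under stated assumptions: the function names and the default port table are hypothetical, and the host is lower-cased for every scheme (RFC 3986 treats the host as case-insensitive) rather than only for HTTP.

```php
<?php
// Illustrative sketch only; names and the port table are assumptions.

// Upper-cases percent-encoded triplets and decodes those that stand for
// unreserved characters (letters, digits, '-', '.', '_' and '~').
function normalizePercentEncoding(string $part): string
{
    return preg_replace_callback(
        '/%([A-Fa-f0-9]{2})/',
        function (array $m): string {
            $char = chr(hexdec($m[1]));
            return preg_match('/^[A-Za-z0-9\-._~]$/D', $char) === 1
                ? $char
                : '%' . strtoupper($m[1]);
        },
        $part
    );
}

function normalizeUri(string $uri): string
{
    // Assumed defaults; scheme classes would normally supply their own.
    $defaultPorts = ['http' => '80', 'https' => '443'];

    // Component-splitting regular expression from RFC 3986, Appendix B.
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s', $uri, $m, PREG_UNMATCHED_AS_NULL);
    $scheme    = isset($m[2]) ? strtolower($m[2]) : null;
    $authority = $m[4] ?? null;
    $path      = normalizePercentEncoding($m[5] ?? '');
    $query     = $m[7] ?? null;
    $fragment  = $m[9] ?? null;

    $result = ($scheme !== null) ? $scheme . ':' : '';
    if ($authority !== null) {
        // Split off the port, if any (an empty port is simply dropped).
        $port = '';
        if (preg_match('/^(.*):(\d*)$/sD', $authority, $a) === 1) {
            $authority = $a[1];
            $port      = $a[2];
        }
        // Lower-case the host but leave the user info untouched.
        $at       = strrpos($authority, '@');
        $userInfo = ($at === false) ? '' : substr($authority, 0, $at + 1);
        $host     = ($at === false) ? $authority : substr($authority, $at + 1);
        $result  .= '//' . $userInfo . strtolower($host);
        if ($port !== '' && $port !== ($defaultPorts[$scheme ?? ''] ?? null)) {
            $result .= ':' . $port;
        }
        if ($path === '') {
            $path = '/';
        }
    }
    $result .= $path;
    if ($query !== null && $query !== '') {
        $result .= '?' . normalizePercentEncoding($query);
    }
    if ($fragment !== null && $fragment !== '') {
        $result .= '#' . normalizePercentEncoding($fragment);
    }
    return $result;
}

// The two example URLs above compare equal once normalized.
var_dump(
    normalizeUri('http://www.example.com:80/?foo=b%61')
    === normalizeUri('HTTP://www.example.com?foo=bar')
);
```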

Automatic Scheme-specific Class Selection

Zend\Uri will provide a factory class (Zend\Uri\UriFactory) that accepts URI strings. Depending on the scheme of the URI string, the factory method will return an instance of a scheme-specific class if such a class is registered with UriFactory. By default, the scheme-specific classes provided by Zend\Uri will be registered with the factory, and users who wish to implement their own URI classes will be able to register additional scheme-specific subclasses of Zend\Uri\Uri (or override any pre-registered schemes).

If the URI string is relative and does not specify a scheme, users may specify a default scheme to fall back to (e.g. when parsing an HTML page fetched using HTTP, relative href links are assumed to be of the HTTP scheme).

If the factory method does not find an appropriate scheme class, or is unable to detect the URI scheme and no default scheme was specified, it will fall back to using the generic syntax Zend\Uri\Uri class.
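
The selection logic described in this section could look roughly like the sketch below. Every class and method name in it (GenericUri, HttpUri, FileUri, SketchUriFactory, registerScheme(), detectScheme(), factory()) is a stand-in invented for this illustration; none of them should be read as the proposed Zend\Uri API.

```php
<?php
// Illustrative stand-ins only; these are not the proposed Zend\Uri classes.
class GenericUri
{
    public $uri;

    public function __construct(string $uri)
    {
        $this->uri = $uri;
    }
}

class HttpUri extends GenericUri {}
class FileUri extends GenericUri {}

class SketchUriFactory
{
    // Scheme => class map; the defaults are registered up front.
    private static $schemeClasses = [
        'http'  => HttpUri::class,
        'https' => HttpUri::class,
        'file'  => FileUri::class,
    ];

    // Registers (or overrides) the class used for a scheme.
    public static function registerScheme(string $scheme, string $class): void
    {
        if (!is_a($class, GenericUri::class, true)) {
            throw new InvalidArgumentException("$class must extend GenericUri");
        }
        self::$schemeClasses[strtolower($scheme)] = $class;
    }

    // Returns the lower-cased scheme of a URI string, or null if it has none.
    public static function detectScheme(string $uri): ?string
    {
        return preg_match('/^([A-Za-z][A-Za-z0-9+\-.]*):/', $uri, $m) === 1
            ? strtolower($m[1])
            : null;
    }

    // Picks the registered class for the scheme, falling back to the
    // default scheme (if given) and finally to the generic class.
    public static function factory(string $uri, ?string $defaultScheme = null): GenericUri
    {
        $scheme = self::detectScheme($uri) ?? $defaultScheme;
        $class  = GenericUri::class;
        if ($scheme !== null) {
            $class = self::$schemeClasses[strtolower($scheme)] ?? GenericUri::class;
        }
        return new $class($uri);
    }
}

echo get_class(SketchUriFactory::factory('http://example.com/')), "\n";
echo get_class(SketchUriFactory::factory('/foo/bar', 'http')), "\n";
echo get_class(SketchUriFactory::factory('mailto:foo@bar.com')), "\n";
```

Keeping scheme detection in its own static method also gives callers a cheap way to check a URI's scheme without instantiating an object.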

Scheme-specific Functionality

In addition to the above, and to the enforcement of specific validation rules, Zend\Uri\Uri subclasses may also expose additional scheme-specific functionality.

For example, the File scheme class may expose methods to convert a Win32 or Unix file path into a file:/// URI.
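
A possible shape for such conversion methods is sketched below in plain PHP. The function names are hypothetical, only absolute local paths are handled (no UNC paths or relative paths), and the encoding rule is the generic RFC-3986 path rule rather than anything specified by the proposal.

```php
<?php
// Illustrative sketch only; function names are not from the proposal.

// Percent-encodes characters not allowed literally in a URI path,
// keeping '/' and ':' so that drive letters stay readable.
function encodeFilePath(string $path): string
{
    $pattern = '/(?:[^A-Za-z0-9\-._~!$&\'()*+,;=:@\/%]|%(?![A-Fa-f0-9]{2}))/';

    return preg_replace_callback(
        $pattern,
        function (array $match): string {
            return rawurlencode($match[0]);
        },
        $path
    );
}

// Converts an absolute Unix path into a file:/// URI.
function unixPathToFileUri(string $path): string
{
    if ($path === '' || $path[0] !== '/') {
        throw new InvalidArgumentException('Expected an absolute Unix path');
    }
    return 'file://' . encodeFilePath($path);
}

// Converts an absolute Win32 path with a drive letter into a file:/// URI.
function win32PathToFileUri(string $path): string
{
    $path = str_replace('\\', '/', $path);
    if (preg_match('~^[A-Za-z]:/~', $path) !== 1) {
        throw new InvalidArgumentException('Expected a path such as C:\\dir\\file');
    }
    return 'file:///' . encodeFilePath($path);
}

echo win32PathToFileUri('C:\Program Files\Zend'), "\n";
echo unixPathToFileUri('/var/www/my site'), "\n";
```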

6. Milestones / Tasks

The implementation and general availability of this change should be coordinated with the release schedule of Zend Framework 2.0, with the following milestones:

  • Milestone 1: [PARTIALLY DONE] design notes will be published here
  • Milestone 2: [PARTIALLY DONE] A working prototype with 80%+ unit test coverage is checked into the developer's git repository
  • Milestone 3: Working prototypes of scheme specific classes are checked into the developer's git repository
  • Milestone 4: Complete unit test coverage and documentation is checked into the developer's git repository
  • Milestone 5: Fixes to components relying on Zend_Uri (namely Zend\Http\Client) are checked into the developer's git repository
  • Milestone 6: Zend\Filter and Zend\Validate classes for URI normalization and validation are made available
  • Milestone 7: Developer branch is merged into the public git repository in time for preview releases of Zend Framework 2.0

7. Class Index

  • Zend\Uri\Uri
  • Zend\Uri\Http
  • Zend\Uri\File
  • Zend\Uri\UriFactory

Exception classes:

  • Zend\Uri\Exception
  • Zend\Uri\InvalidUriException
  • Zend\Uri\InvalidUriPartException
  • Zend\Uri\InvalidUriClassException

Filter / Validator implementations (optional):

  • Zend\Filter\Uri - Zend Filter for normalizing URI strings
  • Zend\Validator\Uri - Zend Validator for validating URIs

8. Use Cases

UC-01: URI parsing
UC-02: Accessing different URI parts
UC-03: Resolving relative URIs
UC-04: Creating configuration-based URIs depending on environment
UC-05: Normalizing URIs
UC-06: Creating a relative reference from an absolute URI
UC-07: Extracting HTTP URLs from an HTML page
UC-08: Using the Factory class with custom URI classes
UC-09: Example of using Zend\Uri to get files from an FTP server
UC-10: Accessing and Encoding various HTTP URI parts (part of an HTTP client implementation)

9. Class Skeletons

The Zend\Uri\Uri class
The Zend\Uri\UriFactory class

]]></ac:plain-text-body></ac:macro>

]]></ac:plain-text-body></ac:macro>

  1. Aug 02, 2010

    <p>This is badly needed. The API looks really good so far, as does the capacity to add new schemes. When it gets through the proposal process, I'd love to add support for a few other schemes which are (or will be) used in other parts of the framework (tag: and data: spring to mind, among others). I look forward to seeing this in the community review section!</p>

  2. Aug 02, 2010

    <p>Looks nice. Some notes:<br />
    1. It would be nice if relative resolution functions were able to accept absolute URLs too (returning the absolute one back then, normalized or not). There are situations where it is not known if the URL you're getting is relative or not, it'd make the API more convenient. Maybe it was the idea but description says "relative" so I mention this just in case <ac:emoticon ac:name="smile" /></p>

    <p>2. "generate" looks a bit confusing - it's not obvious that it will produce the URL string. Maybe just toString() would be ok?</p>

    <p>3. host, port, etc. are part of base URI API, but not all URIs have those. On the other hand, there seems to be no method of separating the scheme from the rest, it is unclear how to work with URIs like <a class="external-link" href="mailto:foo@bar.com">foo@bar.com</a> - there's getScheme() but not getting the rest of it. </p>

    <p>4. Continuing from the previous one, it would be nice if for particular URI handler (like Http) there was a way to know if certain URI "belongs" to it - without going through factory and creating the actual class. </p>

    <p>5. Having out-of-the-box support for something like mailto: and data: would be nice <ac:emoticon ac:name="smile" /></p>

    1. Aug 02, 2010

      <p>Hi,</p>

      <p>1. That's the plan - see the last example in UC-04 above. </p>

      <p>2. I agree. BTW ->generate() is wrapped in __toString() anyway. Although I agree the name toString() makes more sense (+ wrapping with __toString() for magic casting purposes)</p>

      <p>3. In fact, for a URI like foo@bar.com getPath() would do - for a URI like <a class="external-link" href="http://foo@bar.com/baz">http://foo@bar.com/baz</a> it obviously won't but I'm not sure why you'd want / need access to "anything but the scheme". Can you give a use case where you would need more than that? </p>

      <p>BTW with Zend\Http\Client we do need access to things like getPath() + getQuery() (when constructing the HTTP request) but that's really not a reason to implement another method for it. </p>

      <p>4. I'm not entirely sure what you mean. If you'd try to do:</p>

      <ac:macro ac:name="code"><ac:default-parameter>php</ac:default-parameter><ac:plain-text-body><![CDATA[
      $uri = new Zend\Uri\Http('file:///foo/bar');
      ]]></ac:plain-text-body></ac:macro>

      <p>It would throw an exception. Each subclass "knows" which schemes it accepts. The Factory class allows you to parse a page full of URIs and get exactly the right subclass for each one without checking its scheme in advance. Again, can you explain with a use case what you need?</p>

      <p>5. I think that makes sense (I started proposing mailto: but dropped it because personally I have no idea what specific APIs are required from it). Once this generally makes sense I can try to propose additional scheme specific classes.</p>

      <p>Cheers,</p>

      1. Aug 05, 2010

        <p>4. What I meant is a way to know that a URI belongs to a module without actually running a constructor and getting an exception, etc. For example, suppose I have a script that goes over all URIs in an HTML page, converts them from relative to absolute and then replaces the server name with another name. I could instantiate every URI via the factory and check the type, but a) I don't really care about non-HTTP ones, so why waste resources on instantiating them, and b) if there's a link to a scheme that is not supported I get an exception even though the only thing I wanted to know is that it's "not HTTP" - I don't care what it is beyond that. So I think a quick way to check "HTTP or not" makes sense if I need to ignore URIs of different types.</p>

        1. Nov 13, 2010

          <p>I've modified UC-07 to demonstrate the use of a new parseScheme() static method, following your suggestion.</p>

          1. Nov 15, 2010

            <p>In one narrow case, web browsers use the URL from the current page's base tag as the base URL, if it is available (&lt;html&gt;&lt;head&gt;&lt;base href=&gt;). It may be worth mentioning this in the manual as a note.</p>

            1. Nov 15, 2010

              <p>Indeed that's true, and the URI RFC actually provides good examples of how partial URI resolving should work based on context. However, I do not see this as a part of the Zend\Uri domain - Zend\Uri is about constructing URIs and not about understanding the context in which they are used. </p>

              <p>Zend\Http\Client will have to do some of that (for example when handling redirects), and possibly some of the other components (Zend\Feed comes to mind), and will be able to use Zend\Uri APIs to do so. </p>

              <p>I do agree it may be worth mentioning this in the documentation.</p>

  3. Aug 02, 2010

    <p>Can you rename the method generate to toString?</p>

    <p>There is already code by DASPRiD:
    <a class="external-link" href="http://framework.zend.com/svn/framework/standard/branches/user/dasprid/Zend_Uri-2.0">http://framework.zend.com/svn/framework/standard/branches/user/dasprid/Zend_Uri-2.0</a></p>

    1. Aug 04, 2010

      <p>Yes I think it makes sense (see above).</p>

      <p>Edit: generate() renamed to toString() as suggested. </p>

  4. Aug 03, 2010

    <p>Very nice design, let's get it into the incubator so I can run it in my own test field :-D</p>

  5. Nov 14, 2010

    <p>It would also be nice to add methods for adding, setting and deleting query parameters. At the moment you can only set a complete query string, although you can get the query parameters back as an array.<br />
    As a use case, one could pass a URI object to several methods, each of which can add, set or delete a parameter.<br />
    It is also useful for building query strings, because you don't have to decide whether to use "?" or "&" for concatenation when building the URI.</p>

    1. Nov 15, 2010

      <p>Currently the setQuery() method will accept an associative array of parameters, from which it can build the query string. </p>

      <p>I am not sure offering a set of functions to manage specific parameters (set value, delete, add) is in the scope of Zend\Uri. Query string parameter representation is actually not a part of the URI specification - in the generic URI syntax the query string can represent any kind of information as long as it uses the right set of characters. This is probably more in the context of other components using Zend\Uri, such as Zend\Http\Client. </p>

      <p>In any case, if demand for this comes up I will add such functionality (perhaps through a standalone class "Zend\Uri\QueryBuilder") but I'm not sure this is a must-have for the first version. </p>

  6. Nov 15, 2010

    <p>Why use toString()?</p>

    <p>Can't we use a magic method like __toString() and have something quicker to write: echo $uri->resolve($baseUri); instead of echo $uri->resolve($baseUri)->toString();?</p>

    1. Nov 15, 2010

      <p>__toString is implemented and is a wrapper around toString() - so you can use automatic casting just as well. </p>

      <p>There is a subtle difference however: toString() may throw an exception if the URL as a whole is not valid and cannot be composed into a string. This is an unlikely situation but can happen, for example, if you start from a blank URI, and only set the port, but not the host. This cannot be a valid URI because if a port or a userInfo section are not empty, the host must also not be empty. </p>

      <p>In this case toString() will throw an InvalidUriException, but __toString() will not (an exception cannot be thrown from __toString() in PHP 5.3). Instead, __toString() will return a blank string. </p>

  7. Feb 11, 2011

    <ac:macro ac:name="note"><ac:rich-text-body><p><strong>Community Review Team Recommendation</strong></p>

    <p>The CR-Team recommends this proposal be accepted as-is.</p></ac:rich-text-body></ac:macro>