ZF-10247: Zend_Http_Client URL encoding of spaces (which RFC are we applying?)

Description

In attempting to resolve a Zend_Oauth issue (uses a subclass of Zend_Http_Client), the original reporter noted that Zend_Http_Client uses http_build_query() to URL encode GET/POST parameters. This encodes all spaces as +, and not the percent encoding %20. The + encoding is not acceptable for OAuth or any other digitally signed protocol under RFC 3986. I'm currently investigating how live services and my own testing are not impacted by this but other implementations are - OAuth is a minefield for interoperability issues.

I am not a HTTP expert, but it seems a relatively simple task to switch from + to %20 (BC can be considered - not sure of any impact), and it will certainly prevent a number of dependent components suffering from unnoticed bugs. Consistency of URIs may not seem to matter from the perspective of general querying where + and %20 can be interpreted similarly, but the same is not true of protocols where URI parameters are digitially signed and interpreted strictly as-is in accordance with the latest RFC. This means that + and %20 alter the base signature of a URI depending on the encoding selected which feeds back to generated HMACs.

It would be great to get some feedback/opinions on the above since, not being an HTTP expert, I could be misunderstanding something. The only thing I am sure of is that PHP is fairly bad at applying RFCs consistently across all functions and Zend_Http_Client may be an unwitting victim of http_build_query() whose documentation does not clarify the RFC it applies. This should, without doubt, be fixed in ZF2 where these is no hesitation needed in making BC breaking changes given the continuing growth of digital signing in HTTP based protocols.

Once I figure out the compensatory measures employed by implementations like Twitter (unaffected by this), I can assess if a fix is absolutely necessary. If it is, the dependency on Zend_Http_Client will require copy-paste editing of the request() method if the above fix cannot be added. It would be great for ZF 2.0 if subclassing Zend_Http_Client was considered going forward and we could introduce more bit-size splits of concerns in the class - subclassing it is not all that easy to do without duplicating a lot of its code.

Comments

Writers write, that has just been proved.

You've been selected for writing the longest issue report not containing any code. Congratulations.

Ahh, but wait for the blog post! :P

I'll try to give a short as possible answer :)

  1. I completely agree about ZF2.0. In fact, I'm writing a new proposal (and experimenting with an implementation) of Zend\Uri for 2.0, which will be completely RFC-3986 compliant, and already (in my implementation) contains methods which replace http_build_query() and similar internal ambiguous functions with consistent RFC compliant encoding methods. The main diff between them and the internal functions is encoding of space as %20 and not '+'. This is clearly the 100% consistent method of encoding.

I didn't start working on Zend\Http\Client for 2.0 yet, but when I will I will try to re-use the Zend\Uri code for this purpose. I will also keep your note on subclassing, which I completely agree with, in mind.

  1. That said, I am not sure what the impact on BC will be. I am 80% sure it won't break BC because %20 is understood globally as space, and that includes all places where '+' is understood as space as well. However, I can't say I'm 100% sure. More advice on this will be most welcome.

Shahar.

Only place it might break BC is in situations where stuff expects + explicitly. I can't think of any examples though. Could be one for Matthew/#zftalk.dev to talk about and make a call on.

Shahar, would it be possible to add an option to switch the default encoding from + to %20? It might be simpler to add a quick fix as an option for client which really really need it, and let the full fix wait for ZF2. It may also be possible to extract the encoding to a separate protected method where it's easier to override in a subclass. This would avoid any BC issues altogether for the ZF1 version.

I think this is fixed in the latest ZF (1.11.5). When using Zend_Oauth_Client the config option "rfc3986_strict" is set to true by default which will encode all spaces to "%20" instead of "+".

However this is only done for all GET params not the HTTP body (POST and PUT params). I think the method Zend_Http_Client::_prepareBody() should also be aware of the "rfc3986_strict" option. I guess something like the snippet above should do the trick:

snipp

case self::ENC_URLENCODED: // Encode body as application/x-www-form-urlencoded $this->setHeaders(self::CONTENT_TYPE, self::ENC_URLENCODED); $body = http_build_query($this->paramsPost, '', '&'); if ($this->config['rfc3986_strict']) { $body = str_replace('+', '%20', $body); } break;

snipp

@[~padraic] if you identify the areas where the strict flag needs to be implemented I can whip up a patch