Issues

ZF-1897: Leading/trailing newlines and spaces are preserved when loading XML from HTTP response.

Description

When parsing an HTTP response with a "Transfer-Encoding: chunked" header, Zend_Http_Response::getBody() tries to decode the body using Zend_Http_Response::decodeChunkedBody(). Unfortunately the decoded body is not trimmed, as Zend_Http_Response::extractBody() already does for non-chunked bodies. This causes a problem when using Zend_XmlRpc_Client because SimpleXML expects the XML declaration at the start of the document. I'd like to suggest changing Zend_Http_Response::decodeChunkedBody() as follows (i.e. simply adding an ltrim() call before returning the decoded body): see attachment.
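To make the behaviour described above concrete, here is a minimal sketch of chunked decoding. It is not the Zend_Http_Response implementation: the function name decodeChunked() is hypothetical, and the code assumes PHP 7 or later and CRLF line endings. It only illustrates that whitespace sent inside a chunk is part of the chunk data and therefore survives decoding unless something trims it afterwards.

```php
<?php
// Hypothetical helper for illustration only; NOT Zend_Http_Response::decodeChunkedBody().
// Decodes a "Transfer-Encoding: chunked" body: <hex size>CRLF<data>CRLF ... 0CRLF CRLF
function decodeChunked(string $body): string
{
    $decoded = '';
    $offset  = 0;
    $length  = strlen($body);

    while ($offset < $length) {
        // Each chunk starts with a hexadecimal size line terminated by CRLF.
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            throw new RuntimeException('Malformed chunked body: missing size line');
        }

        // Drop optional chunk extensions (anything after ';') and padding.
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $sizeHex  = trim(explode(';', $sizeLine, 2)[0]);
        if ($sizeHex === '' || !ctype_xdigit($sizeHex)) {
            throw new RuntimeException('Malformed chunked body: bad chunk size');
        }

        $size   = hexdec($sizeHex);
        $offset = $lineEnd + 2;
        if ($size === 0) {
            break; // a zero-sized chunk marks the end of the body
        }

        // Copy the chunk data verbatim, including any leading whitespace.
        $decoded .= substr($body, $offset, $size);
        $offset  += $size + 2; // skip the data and its trailing CRLF
    }

    return $decoded;
}

// A chunk whose data begins with a blank line before the XML declaration.
$xml     = "\r\n<?xml version=\"1.0\"?><r/>";
$chunked = dechex(strlen($xml)) . "\r\n" . $xml . "\r\n0\r\n\r\n";

// The blank line is still there after decoding.
var_dump(decodeChunked($chunked) === $xml);
```

Whether that whitespace should be removed by the HTTP layer or by the XML consumer is a separate question; the sketch only shows where it comes from.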

Comments

Assign to Shahar.

Hi, and thanks for the report and patch.

Are you using ::decodeChunkedBody() statically without instantiating a Response object?

If you could supply me with some reproduction code (for example a URL that I can send a request to that returns such a problematic response), that would be most useful. I need it because the body which is decoded using decodeChunkedBody() should already be trimmed of the leading blank line as it is received from the server.

Additionally, your bug made me notice an even bigger issue: in some cases, the extractBody() method removes intentional blank lines and spaces from the response body. This is of course not acceptable, and I would like to change it, and not use ltrim() in any case.

Hi,

I noticed this problem when using Zend_XmlRpc_Client. Zend_XmlRpc_Client aggregates an instance of Zend_Http_Client and calls Zend_Http_Client::request() within Zend_XmlRpc_Client::doRequest(). Zend_Http_Client::request() calls Zend_Http_Response::fromString() to construct an instance of Zend_Http_Response. Zend_Http_Response::fromString() in turn calls Zend_Http_Response::extractBody() to get the contents. ::extractBody() uses ltrim() to strip newlines. Later Zend_Http_Response::getBody() is called. If the body is not chunked there is no problem because there are no leading newlines and the XML declaration is at the start of the document. But if the body is chunked the decoded body may contain leading newlines, which results in an error when SimpleXML tries to parse the document in Zend_XmlRpc_Response::loadXml(). Please look at the attached sample responses (XML-RPC response bodies cut off). On closer inspection you will notice that Zend_Http_Response will cut off all leading newlines in response-not-chunked.txt, which results in a valid XML document. But in response-chunked.txt Zend_Http_Response will only cut off newlines that stand ahead of the first length information (line 9).

I'm not sure whether simply removing all leading newlines (as I proposed) is the best way to solve the problem. Maybe it would be a better approach for this particular problem to remove leading newlines in Zend_XmlRpc_Response::loadXml().
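The SimpleXML behaviour described above can be checked in isolation, without any Zend classes. The snippet below is only a sketch: the sample document is made up, and it assumes the SimpleXML extension is loaded. It compares parsing a body that starts with the XML declaration against the same body preceded by whitespace, and then against the ltrim()'d version.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Made-up sample body; only the position of the XML declaration matters here.
$clean  = "<?xml version=\"1.0\"?>\n<methodResponse/>";
$padded = "\r\n \r\n" . $clean;

$parsedClean   = simplexml_load_string($clean);
$parsedPadded  = simplexml_load_string($padded);
$parsedTrimmed = simplexml_load_string(ltrim($padded));

var_dump($parsedClean !== false);   // declaration at the very start
var_dump($parsedPadded !== false);  // whitespace before the declaration
var_dump($parsedTrimmed !== false); // same body after ltrim()

// Show what libxml complained about, if anything.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```

With libxml2, whitespace ahead of the declaration is expected to make the second call return false, while the first and third should succeed.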

Looking at the samples you sent, it looks like the problem is in the response (and in the way Zend_XmlRpc parses it) and not in Zend_Http_Client. The chunked response actually contains leading spaces (sent by the server). Logically, Zend_Http_Client should not be cutting off any data from the response sent by the server - including leading white space, as long as it's a part of the response body.

I suggest changing Zend_XmlRpc to trim any whitespace from the response body if it causes SimpleXML to fail.
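One way to read this suggestion is "parse first, and trim only if parsing fails". The following is a rough sketch of that idea, not the code that went into Zend_XmlRpc: the function name parseXmlLenient() and the retry policy are assumptions made for illustration, and it assumes PHP 7 or later with the SimpleXML extension.

```php
<?php
// Hypothetical helper; the name and the retry-on-failure policy are illustrative only.
// Returns a SimpleXMLElement, or false if the body cannot be parsed even after trimming.
function parseXmlLenient(string $body)
{
    // Silence libxml warnings while probing, and restore the old setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $xml = simplexml_load_string($body);
    if ($xml === false) {
        // First attempt failed: retry once with surrounding whitespace removed.
        libxml_clear_errors();
        $xml = simplexml_load_string(trim($body));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml;
}

// Made-up body with whitespace on both sides of the document.
$body = " \r\n<?xml version=\"1.0\"?>\n<value>FOO</value>\r\n ";
$xml  = parseXmlLenient($body);

echo $xml === false ? "unparseable\n" : (string) $xml . "\n";
```

Trimming unconditionally before parsing is simpler and avoids the double parse; the retry variant only matters if untouched bodies must be passed through whenever they are already valid.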

I am assigning this to the XmlRpc owners. I will also file a new bug on Zend_Http_Response because it shouldn't be trimming anything at all (which it currently does).

Assigning to Matthew

Setting component to Zend_XmlRpc

Scheduling for 1.5.0 RC2

I'm having trouble finding a way to re-create the issue in a test, and will need to consult with Shahar to fix. Rescheduling for next mini release following 1.5.0.

This issue should have been fixed for the 1.5 release.

This doesn't appear to have been fixed in 1.5.0. Please update if this is not correct.

Scheduling for next minor release; need to work with Shahar on a way to test this.

Assigning to Shahar; need help determining how to test to recreate the issue.

Shahar, any ideas here?

How about this for a reproducing test case: I used the OP's 'response-chunked.txt' file as a basis to create this test response:


HTTP/1.1 200 OK
Date: Thu, 06 Sep 2007 14:58:44 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3 Python/2.3.5 mod_ssl/2.0.54 OpenSSL/0.9.7e PHP/5.1.6
X-Powered-By: PHP/5.1.6
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

92 

 
<?xml version="1.0"?>
FOO
 
0 
 

Then run the following unit test using it:


    /**
     * @group ZF-1897
     */
    public function testCallFailsWhenHttpClientDoesNotTrimChunkedResponse()
    {
        $baseUri = "http://foo:80";
        $this->httpAdapter = new Zend_Http_Client_Adapter_Test();
        $this->httpClient = new Zend_Http_Client(null, array('adapter' => $this->httpAdapter));
        
        $respBody = file_get_contents(dirname(__FILE__) . "/_files/ZF1897-response-chunked.txt");
        $this->httpAdapter->setResponse($respBody);

        $this->xmlrpcClient = new Zend_XmlRpc_Client($baseUri);
        $this->xmlrpcClient->setHttpClient($this->httpClient);
        
        $this->assertEquals('FOO', $this->xmlrpcClient->call('foo'));
    }

The result of this test is:


++ phpunit --verbose --group ZF-1897 AllTests
PHPUnit 3.5.13 by Sebastian Bergmann.
[...snipped...]
There was 1 error:
1) Zend_XmlRpc_ClientTest::testCallFailsWhenHttpClientDoesNotTrimChunkedResponse
Zend_XmlRpc_Client_FaultException: Failed to parse response
[...snipped...]

Apply Shahar's suggested fix (trim the body):


Index: library/Zend/XmlRpc/Client.php
===================================================================
--- library/Zend/XmlRpc/Client.php      (revision 24104)
+++ library/Zend/XmlRpc/Client.php      (working copy)
@@ -294,7 +294,7 @@
             $response = new Zend_XmlRpc_Response();
         }
         $this->_lastResponse = $response;
-        $this->_lastResponse->loadXml($httpResponse->getBody());
+        $this->_lastResponse->loadXml(trim($httpResponse->getBody()));
     }

     /**

...and the unit test above (and the XmlRpc test suite as a whole) now passes.

So it looks to me like trimming the body is the way to go. Thoughts?

Fixed in trunk r24150

Merged to release-1.11 in r24159