Zend Framework

Empty items array when parsing rss1.0/RDF feed

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: 0.1.5, 1.7.3
  • Fix Version/s: 1.8.1
  • Component/s: Zend_Feed
  • Labels:
    None
  • Fix Version Priority:
    Should Have

Description

When parsing an RSS 1.0 / RDF feed (rdf namespace), the items array is empty.

Example : http://www.php.net/news.rss (-;

This produces:
[title] => PHP: Hypertext Preprocessor
[link] => http://www.php.net/
[description] => The PHP scripting language web site
[items] => Array
(
)

The XML dump from Zend_Feed is:

<?xml version="1.0" encoding="utf-8"?>
<channel xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:about="http://www.php.net/">
<title>PHP: Hypertext Preprocessor</title>
<link>http://www.php.net/</link>
<description>The PHP scripting language web site</description>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://www.zendcon.com"/>
.../...

<rdf:li rdf:resource="http://www.php.net/archive/index.php"/>
</rdf:Seq>
</items>
</channel>

I did a quick review of Zend_Feed and found a namespace registration that seems to be in trouble, though I am not sure. It probably needs either to switch the item tag of the entryRSS class or to add a new entryRDF class.

It's not a big deal, but since this is the www.php.net feed, it's rather funny (-;

Thanks for all your work.

Thierry

Activity

Daniel Bezruchkin added a comment -

Extracting this into your zend directory will get RDF feeds to work.

Dave Liefbroer added a comment -

The attached zip doesn't work. It causes a fatal error (blank page). I will investigate the error.

Dave Liefbroer added a comment -

It's because of:
$success = @$doc->loadXML(Zend_Feed::utf8ToUnicodeEntities($string));
in Feed.php

The utf8ToUnicodeEntities function doesn't exist (wrong code version?)

In the previous version it was:
$success = @$doc->loadXML($string);
That works!

Bill Karwin added a comment -

Changing fix version to 0.6.0.

Ronnie Schwartz added a comment -

I have the exact same issue. Any idea when this will be resolved? I've used PEAR's RSS class, no good. I've used Magpie/simplepie, no good. This one was able to parse all of the new feeds but cannot parse the 1.0 rdf feeds. So it's the best so far!

Simone Carletti added a comment -

This bug depends on ZF-26.
RSS 1.0 lists items outside the channel node, and Zend_Feed currently can't handle this situation.

Rather than fixing the behavior, I would suggest adding a new RDF class, as proposed in the description of this issue.
RSS 1.0 is a completely different branch from RSS 2.0.

The main difference between the RSS 0.91 branch (created by Dave Winer) and the RSS 1.0 branch (managed by the RSS-DEV Working Group) is that the latter is RDF-based, while the RDF architecture was completely removed in RSS 0.91, RSS 0.92 and RSS 2.0.

Additionally, I would suggest adding a new class property that returns the feed type/version.
The following seems to be the list of formats currently supported by Zend_Feed:

  • Atom 0.3
  • Atom 0.5
  • Atom 1.0
  • RSS 0.91
  • RSS 0.92
  • RSS 2.0

The following format should be supported but is not right now:

  • RSS 1.0

Perhaps a new ticket would be a better place for a new proposal than a comment.
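
As a rough illustration of the feed type/version idea, the format can usually be guessed from the root element of the document. The sketch below is only an assumption about how such a check might look: detectFeedType() is a hypothetical helper, not part of Zend_Feed, and it uses only PHP's DOM extension.

```php
<?php
// Hypothetical helper (not a Zend_Feed API): guess the feed format from the
// root element of the XML document.
function detectFeedType(string $xml): string
{
    $doc = new DOMDocument();
    if (!@$doc->loadXML($xml)) {
        return 'unknown';
    }
    $root = $doc->documentElement;

    // RSS 1.0: the root is <rdf:RDF> in the RDF syntax namespace.
    if ($root->localName === 'RDF'
        && $root->namespaceURI === 'http://www.w3.org/1999/02/22-rdf-syntax-ns#') {
        return 'RSS 1.0';
    }

    // RSS 0.91 / 0.92 / 2.0: the root is <rss version="...">.
    if ($root->nodeName === 'rss') {
        $version = $root->getAttribute('version');
        return $version !== '' ? 'RSS ' . $version : 'RSS';
    }

    // Atom: the root is <feed>; the exact version is not inspected here.
    if ($root->localName === 'feed') {
        return 'Atom';
    }

    return 'unknown';
}
```

A real implementation would probably also distinguish the Atom versions, which this sketch deliberately leaves out.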
Simone Carletti added a comment -

I forgot to say that my previous comment has been inspired by http://www.nabble.com/zend-feed-issue--tf4928553s16154.html#a14108105

Matthew Turland added a comment -

The only difference between RSS 1.0 and other versions that is related to this issue is that item elements are not contained within the channel element. The attached file patch.diff modifies Zend_Feed_Rss to check for this and also patches the appropriate test in the test suite so that, without the patch to Zend_Feed_Rss, RSS 1.0 feed tests will fail.
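
The structural difference can be seen with plain DOM calls. This is a minimal sketch on a made-up example.org feed, not the code from patch.diff: it only shows that a lookup restricted to the channel element finds no items in RSS 1.0, while a lookup from the document root does.

```php
<?php
// Made-up minimal RSS 1.0 document: <item> is a sibling of <channel>,
// not a child of it.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://example.org/">
    <title>Example</title>
    <link>http://example.org/</link>
    <description>Example feed</description>
  </channel>
  <item rdf:about="http://example.org/1">
    <title>First</title>
    <link>http://example.org/1</link>
  </item>
</rdf:RDF>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$channel = $doc->getElementsByTagName('channel')->item(0);

// Searching below <channel> (the RSS 2.0 layout) finds nothing...
$insideChannel = $channel->getElementsByTagName('item')->length;

// ...while searching from the document root finds the item.
$inDocument = $doc->documentElement->getElementsByTagName('item')->length;

echo "items inside channel: $insideChannel\n";
echo "items in document: $inDocument\n";
```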

Simone Carletti added a comment -

Hi Matthew,
I took a look at the patch you submitted a few days ago.

The following line doesn't really make sense to me.

$this->assertTrue($feed->count() > 0);

The _importRssValid method is a utility method, and we cannot assume in advance that the file it is going to fetch is not a valid empty feed.
I would create some valid RSS 1.0 unit tests instead.

The other part of the patch, the code fragment that should introduce RSS 1.0 compatibility, is fine, but I think it's incomplete.
Zend_Feed doesn't only handle feed import; it is able to create and edit a feed as well.

Did you think about how an imported RSS 1.0 feed will be printed out?
I assume it would be handled by the Zend_Feed_Rss class, but this library, as underlined by ZF-44, always returns an RSS 2.0 instance.
That means an RSS 1.0 feed comes in and an RSS 2.0 feed comes out... I suppose this is not a good workflow.

What do you propose to fix this consequential issue?

For the sake of completeness, I'd like to share an additional thought.
http://www.feedparser.org/ is, so far, the best feed parser written in Python and probably one of the best feed parsers in the world.
Zend_Feed should probably learn something from this library!

Simone Carletti added a comment -

Any news on this feature?
I would suggest changing the status to unassigned if work is not in progress.

Benjamin Eberlei added a comment -

I am resolving and then reopening this bug, since it has been assigned for over a year now.

Please speak up, Matthew, if this is a no-go.

Benjamin Eberlei added a comment -

Reopened issue

Matthias Sch. added a comment -

Any news on this bug?
I think it's just a matter of including the patch?

Matthew Turland added a comment -

As far as I'm aware, no conflicting changes have been made to Zend_Feed_Rss since this patch was suggested, so the patch should work. Note that only the portion of the patch for library/Zend/Feed/Rss.php is really needed.

In terms of the portion that patches tests, it may be a better design decision to create an additional supporting method that first calls _importRssValid and then applies a non-empty check, and have all tests with non-empty test data files call that instead of _importRssValid, so that cases where data is expected to be empty can continue to function as normal.

Thoughts anyone?

Wil Sinclair added a comment -

Matthew, could you please evaluate the proposed solution and determine what we need to do to get this fixed? According to the votes, there seems to be a lot of interest in this issue.

Matthew Turland added a comment -

I've considered Simone's point and have updated my patch accordingly. _importRssValid no longer checks the feed item count in this new patch. Instead, it modifies _importRssValid to return the $feed object it creates to be used by the calling method and modifies the two existing RSS 1.0 test methods to check their respective feed item counts.

I've applied my patch to Zend_Feed_Rss in a current SVN checkout to confirm that it still works. If I run the modified unit tests on the unpatched version of this class file, I get this output:

$ phpunit Zend_Feed_ImportTest tests/Zend/Feed/ImportTest.php 
PHPUnit 3.3.8 by Sebastian Bergmann.

..............FF..........

Time: 3 seconds

There were 2 failures:

1) testRss100Sample1(Zend_Feed_ImportTest)
Failed asserting that <integer:2> matches expected value <integer:0>.

2) testRss100Sample2(Zend_Feed_ImportTest)
Failed asserting that <integer:1> matches expected value <integer:0>.

FAILURES!
Tests: 26, Assertions: 30, Failures: 2.

If I apply the patch and run the modified unit tests again, I get this output:

$ phpunit Zend_Feed_ImportTest tests/Zend/Feed/ImportTest.php 
PHPUnit 3.3.8 by Sebastian Bergmann.

..........................

Time: 2 seconds

OK (26 tests, 30 assertions)

Is this an acceptable solution?

Benjamin Eberlei added a comment -

Issue resolved. I have verified and applied Matthew's test cases and bug fixes. Thanks! Two very old bugs are gone now.

Satoru Yoshida added a comment -

Sorry, not in 1.7.4. I think it may be released in the next minor release.

twk added a comment - - edited

The problem is reproducible with some feeds like
http://ranking.goo.ne.jp/rss/keyword/keyrank_all1/index.rdf

The source of that feed begins with
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xml:lang="ja">
<channel rdf:about="http://ranking.goo.ne.jp/service/001/">

Zend_Feed_Rss#__wakeup() checks whether the feed is RDF with the following code,
but the firstChild of that feed is "xml-stylesheet", so it is not treated as RDF.
Please improve the check routine.
if ($this->_element->firstChild->nodeName == 'rdf:RDF') {
    $this->_element = $this->_element->firstChild;
} else {
    $this->_element = $this->_element->getElementsByTagName('channel')->item(0);
}

Quick fix for the client user:
Replace
$feed = Zend_Feed::import($url);
with something like
$string = file_get_contents($url);
$string = str_replace('<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen" ?>', '', $string); // or whatever between <?xml ?> and <rdf:RDF
$feed = Zend_Feed::importString($string);
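
A slightly more general variant of this workaround, as a sketch only: instead of matching one exact stylesheet line, strip any xml-stylesheet processing instruction with a regular expression. stripXmlStylesheet() is a hypothetical helper name, not part of Zend_Feed.

```php
<?php
// Hypothetical helper: remove every xml-stylesheet processing instruction
// (and the whitespace that follows it) from an XML string.
function stripXmlStylesheet(string $xml): string
{
    $result = preg_replace('/<\?xml-stylesheet\b.*?\?>\s*/s', '', $xml);

    // preg_replace() returns null on error; fall back to the original input.
    return $result === null ? $xml : $result;
}

// Intended use with the calls quoted in this comment (not executed here):
// $feed = Zend_Feed::importString(stripXmlStylesheet(file_get_contents($url)));
```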

twk added a comment -

To fix the problem, replace the following in Zend_Feed_Rss#__wakeup()
// Find the base channel element and create an alias to it.
if ($this->_element->firstChild->nodeName == 'rdf:RDF') {
    $this->_element = $this->_element->firstChild;
} else {
with
// Find the base channel element and create an alias to it.
$rdf = $this->_element->getElementsByTagNameNS('http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'RDF')->item(0);
if ($rdf) {
    $this->_element = $rdf;
} else {
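
The difference between the two checks can be reproduced outside Zend_Feed with PHP's DOM extension alone. The sketch below assumes the element being inspected is the DOMDocument itself and uses a made-up example.org feed that starts with a stylesheet processing instruction.

```php
<?php
// Made-up RSS 1.0 document preceded by an xml-stylesheet instruction.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="/rss/user.xsl" type="text/xsl" media="screen"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://example.org/"/>
</rdf:RDF>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// The first child of the document is the processing instruction, so a
// comparison of firstChild->nodeName with 'rdf:RDF' fails for this feed.
$firstName = $doc->firstChild->nodeName;

// A namespace-aware lookup finds the RDF root whatever precedes it.
$rdf = $doc->getElementsByTagNameNS(
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'RDF'
)->item(0);

echo "first child: $firstName\n";
echo 'RDF root found: ' . ($rdf !== null ? $rdf->nodeName : 'no') . "\n";
```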

Matthew Weier O'Phinney added a comment -

Assigning to Alex.

Alexander Veremyev added a comment -

Fixed.

Matt Steele added a comment -

Was this added to 1.8.1? I don't see a Zend_Feed_Rdf class...

Nico Haase added a comment -

I don't see this as resolved. A feed that was linked in ZF-6516 is not accessible, and neither is this one from a German computer magazine: http://www.heise.de/newsticker/heise.rdf

Nico Haase added a comment -

Sorry, please forget my last comment - I used an old version of ZF... shame on me...


People

Vote (16)
Watch (13)
