
PHP: RSS Feed Reader: Source Code


This page presents a simple class with a constructor and two public functions: getOutput returns an HTML-formatted version of the RSS feed, while getRawOutput returns all the attributes in a single multi-level array.

<?PHP
  use \Chirp\RSSParser;

  // where is the feed located?
  $url = "http://www.the-art-of-web.com/rss.xml";

  // create object to hold data and display output
  $rss_parser = new RSSParser($url);

  $output = $rss_parser->getOutput(); // returns string containing HTML

  echo $output;
?>

Yes, it really can be that simple.
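For getRawOutput, a rough sketch of walking the returned array might look like the following. The array shape is inferred from how getOutput reads the parsed data in the class below (RSS, then a list of CHANNEL entries, each with a list of ITEM entries); the sample data and the helper name list_item_titles are made up for illustration.

```php
<?php
// Hypothetical sample shaped like getRawOutput() for an RSS 2.0 feed:
// RSS -> CHANNEL (list) -> ITEM (list) -> upper-case field names.
$raw = [
  "RSS" => [
    "CHANNEL" => [
      [
        "TITLE" => "Example Feed",
        "ITEM" => [
          ["TITLE" => "First post", "LINK" => "http://www.example.com/1"],
          ["TITLE" => "Second post", "LINK" => "http://www.example.com/2"],
        ],
      ],
    ],
  ],
];

// Illustrative helper (not part of the class): collect all item titles.
function list_item_titles(array $raw)
{
  $titles = [];
  $channels = isset($raw["RSS"]["CHANNEL"]) ? $raw["RSS"]["CHANNEL"] : [];
  foreach ($channels as $channel) {
    $items = isset($channel["ITEM"]) ? $channel["ITEM"] : [];
    foreach ($items as $item) {
      if (isset($item["TITLE"])) $titles[] = $item["TITLE"];
    }
  }
  return $titles;
}

print_r(list_item_titles($raw));
```

Checking each level with isset keeps the loop quiet when a feed omits a channel or has no items.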

Source code of rssparser.php

This class is by no means the be-all and end-all of RSS parsing. It's designed to be simple, functional and easily customisable. It appears to work for all RSS formats, and can be extended to handle new formats - or perhaps further to handle general XML parsing.

File: rssparser.php

<?PHP
  namespace Chirp;

  // Original PHP code by Chirp Internet: www.chirp.com.au
  // Please acknowledge use of this code by including this header.

  class RSSParser
  {

    // keeps track of current and preceding elements
    var $tags = array();

    // array containing all feed data
    var $output = array();

    // return value for display functions
    var $retval = "";

    var $errorlevel = 0;

    // constructor for new object
    function __construct($file)
    {
      $errorlevel = error_reporting();
      error_reporting($errorlevel & ~E_NOTICE);

      // instantiate xml-parser and assign event handlers
      $xml_parser = xml_parser_create("");
      xml_set_object($xml_parser, $this);
      xml_set_element_handler($xml_parser, "startElement", "endElement");
      xml_set_character_data_handler($xml_parser, "parseData");

      $curl_opts = [
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE => "/tmp/newspapers.tmp"
      ];

      // open file for reading and send data to xml-parser
      $data = preg_match("/^http/", $file) ? http_get_contents($file, $curl_opts) : file_get_contents($file);

      xml_parse($xml_parser, $data) or die(
        sprintf(get_class() . ": Error <b>%s</b> at line <b>%d</b><br>",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser))
      );

      // dismiss xml parser
      xml_parser_free($xml_parser);

      error_reporting($errorlevel);
    }

    function startElement($parser, $tagname, $attrs=array())
    {
      // RSS 2.0 - ENCLOSURE
      if($tagname == "ENCLOSURE" && $attrs) {
        $this->startElement($parser, "ENCLOSURE");
        foreach($attrs as $attr => $attrval) {
          $this->startElement($parser, $attr);
          $this->parseData($parser, $attrval);
          $this->endElement($parser, $attr);
        }
        $this->endElement($parser, "ENCLOSURE");
      }

      // Yahoo! Media RSS - images
      if($tagname == "MEDIA:CONTENT" && $attrs['URL'] && $attrs['MEDIUM'] == 'image') {
        $this->startElement($parser, "IMAGE");
        $this->parseData($parser, $attrs['URL']);
        $this->endElement($parser, "IMAGE");
      }

      // check if this element can contain others - list may be edited
      if(preg_match("/^(RDF|RSS|CHANNEL|IMAGE|ITEM)/", $tagname)) {
        if($this->tags) {
          $depth = count($this->tags);
          if(is_array($tmp = end($this->tags))) {
            list($parent, $num) = each($tmp);
            if($parent) $this->tags[$depth-1][$parent][$tagname]++;
          }
        }
        array_push($this->tags, array($tagname => array()));
      } else {
        if(!preg_match("/^(A|B|I)$/", $tagname)) {
          // add tag to tags array
          array_push($this->tags, $tagname);
        }
      }
    }

    function endElement($parser, $tagname)
    {
      if(!preg_match("/^(A|B|I)$/", $tagname)) {
        // remove tag from tags array
        array_pop($this->tags);
      }
    }

    function parseData($parser, $data)
    {
      // return if data contains no text
      if(!trim($data)) return;

      $evalcode = "\$this->output";
      foreach($this->tags as $tag) {
        if(is_array($tag)) {
          list($tagname, $indexes) = each($tag);
          $evalcode .= "[\"$tagname\"]";
          if(${$tagname}) $evalcode .= "[" . (${$tagname} - 1) . "]";
          if($indexes) extract($indexes);
        } else {
          if(preg_match("/^([A-Z]+):([A-Z]+)$/", $tag, $matches)) {
            $evalcode .= "[\"$matches[1]\"][\"$matches[2]\"]";
          } else {
            $evalcode .= "[\"$tag\"]";
          }
        }
      }
      eval("$evalcode = $evalcode . '" . addslashes($data) . "';");
    }

    // display a single channel as HTML
    function display_channel($data, $limit)
    {
      extract($data);
      if($IMAGE) {
        // display channel image(s)
        foreach($IMAGE as $image) $this->display_image($image);
      }
      if($TITLE) {
        // display channel information
        $this->retval .= "<h1>";
        if($LINK) $this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
        $this->retval .= stripslashes($TITLE);
        if($LINK) $this->retval .= "</a>";
        $this->retval .= "</h1>\n";
        if($DESCRIPTION) $this->retval .= "<p>$DESCRIPTION</p>\n\n";
        $tmp = array();
        if($PUBDATE) $tmp[] = "<small>Published: $PUBDATE</small>";
        if($COPYRIGHT) $tmp[] = "<small>Copyright: $COPYRIGHT</small>";
        if($tmp) $this->retval .= "<p>" . implode("<br>\n", $tmp) . "</p>\n\n";
        $this->retval .= "<div class=\"divider\"><!-- --></div>\n\n";
      }
      if($ITEM) {
        // display channel item(s)
        foreach($ITEM as $item) {
          $this->display_item($item, "CHANNEL");
          if(is_int($limit) && --$limit <= 0) break;
        }
      }
    }

    // display a single image as HTML
    function display_image($data, $parent="")
    {
      extract($data);
      if(!$URL) return;

      $this->retval .= "<p>";
      if($LINK) $this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
      $this->retval .= "<img src=\"$URL\"";
      if($WIDTH && $HEIGHT) $this->retval .= " width=\"$WIDTH\" height=\"$HEIGHT\"";
      $this->retval .= " border=\"0\" alt=\"$TITLE\">";
      if($LINK) $this->retval .= "</a>";
      $this->retval .= "</p>\n\n";
    }

    // display a single item as HTML
    function display_item($data, $parent)
    {
      extract($data);
      if(!$TITLE) return;

      $this->retval .= "<p><b>";
      if($LINK) $this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
      $this->retval .= stripslashes($TITLE);
      if($LINK) $this->retval .= "</a>";
      $this->retval .= "</b>";
      if(!$PUBDATE && $DC["DATE"]) $PUBDATE = $DC["DATE"];
      if($PUBDATE) $this->retval .= " <small>($PUBDATE)</small>";
      $this->retval .= "</p>\n";

      // use feed-formatted HTML if provided
      if($CONTENT['ENCODED']) {
        $this->retval .= "<p>" . stripslashes($CONTENT['ENCODED']) . "</p>\n";
      } elseif($DESCRIPTION) {
        if($IMAGE) {
          foreach($IMAGE as $IMG) $this->retval .= "<img src=\"$IMG\">\n";
        }
        $this->retval .= "<p>" . stripslashes($DESCRIPTION) . "</p>\n\n";
      }

      // RSS 2.0 - ENCLOSURE
      if($ENCLOSURE) {
        $this->retval .= "<p><small><b>Media:</b> <a href=\"{$ENCLOSURE['URL']}\">";
        $this->retval .= $ENCLOSURE['TYPE'];
        $this->retval .= "</a> ({$ENCLOSURE['LENGTH']} bytes)</small></p>\n\n";
      }

      if($COMMENTS) {
        $this->retval .= "<p style=\"text-align: right;\"><small>";
        $this->retval .= "<a href=\"$COMMENTS\">Comments</a>";
        $this->retval .= "</small></p>\n\n";
      }
    }

    function fixEncoding(&$input, $key, $output_encoding)
    {
      if(!function_exists('mb_detect_encoding')) return $input;

      $encoding = mb_detect_encoding($input);
      switch($encoding)
      {
        case 'ASCII':
        case $output_encoding:
          break;
        case '':
          $input = mb_convert_encoding($input, $output_encoding);
          break;
        default:
          $input = mb_convert_encoding($input, $output_encoding, $encoding);
      }
    }

    // display entire feed as HTML
    function getOutput($limit=false, $output_encoding='UTF-8')
    {
      $this->retval = "";
      $start_tag = key($this->output);

      switch($start_tag)
      {
        case "RSS":
          // new format - channel contains all
          foreach($this->output[$start_tag]["CHANNEL"] as $channel) {
            $this->display_channel($channel, $limit);
          }
          break;

        case "RDF:RDF":
          // old format - channel and items are separate
          if(isset($this->output[$start_tag]['IMAGE'])) {
            foreach($this->output[$start_tag]['IMAGE'] as $image) {
              $this->display_image($image);
            }
          }
          foreach($this->output[$start_tag]['CHANNEL'] as $channel) {
            $this->display_channel($channel, $limit);
          }
          foreach($this->output[$start_tag]['ITEM'] as $item) {
            $this->display_item($item, $start_tag);
          }
          break;

        case "HTML":
          die("Error: cannot parse HTML document as RSS");

        default:
          die("Error: unrecognized start tag '$start_tag' in getOutput()");
      }

      if($this->retval && is_array($this->retval)) {
        array_walk_recursive($this->retval, array($this, 'fixEncoding'), $output_encoding);
      }
      return $this->retval;
    }

    // return raw data as array
    function getRawOutput($output_encoding='UTF-8')
    {
      array_walk_recursive($this->output, array($this, 'fixEncoding'), $output_encoding);
      return $this->output;
    }

  }
?>


The parsing of the RSS feed into a PHP array is done by the RSSParser class using the startElement, endElement and parseData functions. The remaining functions are used only for displaying the data or accessing the raw data.
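The event-driven approach can be seen in isolation in the sketch below. It is not the class's own logic: it uses the same three kinds of handler from PHP's XML Parser extension, but simply records the text found at each tag path. The function name collect_text_by_path and the sample XML are invented for the example.

```php
<?php
// Minimal event-driven parse: record the text found at each tag path.
function collect_text_by_path($xml)
{
  $stack = [];   // names of the currently open elements
  $found = [];   // "PATH/TO/TAG" => text

  $parser = xml_parser_create();
  // Case folding is on by default, so tag names arrive in upper case.
  xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$stack) { $stack[] = $name; },
    function ($p, $name) use (&$stack) { array_pop($stack); }
  );
  xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$stack, &$found) {
      if (trim($data) === "") return;   // skip whitespace between tags
      $path = implode("/", $stack);
      // text can arrive in several chunks, so append rather than assign
      $found[$path] = (isset($found[$path]) ? $found[$path] : "") . $data;
    }
  );

  if (!xml_parse($parser, $xml, true)) {
    $error = xml_error_string(xml_get_error_code($parser));
    throw new RuntimeException("XML error: $error");
  }
  return $found;
}

$xml = "<rss><channel><title>Example</title>"
     . "<item><title>Hello</title></item></channel></rss>";

// keys: RSS/CHANNEL/TITLE and RSS/CHANNEL/ITEM/TITLE
print_r(collect_text_by_path($xml));
```

The stack plays the same role as the $tags array in the class: the start handler pushes, the end handler pops, and the character data handler uses whatever is on the stack to decide where the text belongs.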


Fields Supported by Default

This script supports the following attributes (fields) by default but can easily be extended. See the Feed Reader Demonstration for examples of parsed RSS (and Atom) feeds.

Channel (RSS or RDF:RDF)

Item

If you think it's worth adding support for other RSS attributes, please let us know using the Feedback link below.

Multibyte String Function support

If your PHP install doesn't include Multibyte String Function support then you will see some errors. You can get around that by jettisoning the fixEncoding function.

In other words, replacing:

return $this->fixEncoding($this->retval, $output_encoding);

with just:

return $this->retval;

The feed will then be displayed using its original character encoding, which may or may not match the encoding of your HTML page, but other than that it shouldn't be a problem.
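An alternative to removing the conversion altogether is to skip it only when the extension is missing, which is what the function_exists check in fixEncoding above does. The same idea as a standalone helper, as a sketch only: the name convert_if_possible and the list of candidate encodings are assumptions, not part of the class.

```php
<?php
// Convert to the target encoding only when mbstring is loaded.
// The candidate list is an assumption; adjust it to the feeds you read.
function convert_if_possible($text, $target = "UTF-8")
{
  if (!function_exists("mb_detect_encoding")) {
    return $text;   // no mbstring: leave the original bytes untouched
  }
  $source = mb_detect_encoding($text, ["UTF-8", "ISO-8859-1"], true);
  if ($source === false || $source === $target) {
    return $text;   // unknown, or already in the target encoding
  }
  return mb_convert_encoding($text, $target, $source);
}

echo convert_if_possible("caf\xE9");   // ISO-8859-1 input
```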



User Comments

Most recent 20 of 27 comments:


14 January, 2016

I tried to use your code on my WordPress site on a local server and got the error message 'Call to undefined function http_get_contents()....'. Please help me proceed further.

The function you're after is here.

21 December, 2014

Any update on the deprecated http_get_contents($file) function? Would love to use this code, but getting an error. Same issue as @mike_root. Thank you so much!

You can find a basic version of the http_get_contents function here.

28 October, 2014

I get an error message saying that the function "http_get_contents(" in your "RSS Feed Reader: Source Code" article doesn't exist. Has it been deprecated (I'm using Apache 2.4, PHP 5.4)?

On the contrary it's something we've just written - to get the file contents over http using cURL. Only I haven't had time to write it up yet. Stay tuned.
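Until you have the real function, a stand-in along the following lines may be enough to get the class running. This is a hypothetical sketch, not the implementation referred to in the reply: it only matches the way rssparser.php calls the function (a URL plus an array of cURL options), and the timeout and user agent values are arbitrary.

```php
<?php
// Hypothetical stand-in for the helper called in rssparser.php.
// Not the original implementation, which is not shown on this page.
function http_get_contents($url, array $opts = [])
{
  if (!function_exists("curl_init")) {
    throw new RuntimeException("cURL extension is not available");
  }
  $ch = curl_init($url);

  // defaults first, so that caller-supplied options can override them
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
  curl_setopt($ch, CURLOPT_USERAGENT, "rssparser-example/1.0");
  curl_setopt_array($ch, $opts);

  // always return the body as a string instead of printing it
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

  $data = curl_exec($ch);
  if ($data === false) {
    throw new RuntimeException("Request failed: " . curl_error($ch));
  }
  return $data;
}
```

Because the class is declared inside the Chirp namespace and calls the function without a prefix, PHP falls back to a global function of that name, so defining it in the global scope before the parser is used should work.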

13 October, 2014

Thank you for this awesome class.

I just want to post a probable fix for

htmlspecialchars(stripslashes($itemdata['DESCRIPTION']))
and TITLE too

I have replaced it with

html_entity_decode(utf8_decode (stripslashes($itemdata['TITLE'])))

because I had a problem with special characters.

Regards.
(sorry for my bad english =D )

13 October, 2014

I like your scripting for the PHP: RSS Feed Reader. Do you have a version that uses the PHP command CURL instead of FOPEN?
Thanks!
Gary

1 June, 2014

hi,

thanks for the code.
Really like it.

especially this:
$output = $rss_parser->getOutput(1);

I just wanted 1 item in my list.

great work.

greetings from the netherlands.

7 April, 2013

I had developed some code that basically works, but this looks more complete. One surprise I ran into... If the content included in the <description> tag contains <img...>, I would like to be able to scale the size if the width is greater than my current content pane... Any ideas on how to scale images that are included inside the <description> section?

Have you tried some generic CSS such as:

img { max-width: 100%; }
img { height: auto; }

30 May, 2012

The first of eight tested that works without errors,
absolutely top code, the best
Thanks to a huge mountain
from germany, berlin

9 October, 2011

Hi there. I should say this parser was the perfect solution to my problem, but I have one issue. I can't set the number of feed items shown. Can you help me with this?

What you're looking for is just:

$output = $rss_parser->getOutput(3);

This will limit the display to the first 3 items in the feed.

23 September, 2011

Love the parser.
I had something like this in my feed

6:00 PM

and added

$tagname = ereg_replace(":","",$tagname);

inside startElement, then could reference the variable as MCSTARTTIME

I would use str_replace instead of the ereg functions, which are now deprecated, but yes, that's a good way to extract other variables.
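For example, assuming the tag name reaches startElement as "MC:STARTTIME" (the commenter's original tag is not shown, so that name is a guess based on the variable mentioned):

```php
<?php
// Strip the namespace colon so "MC:STARTTIME" becomes "MCSTARTTIME".
$tagname = "MC:STARTTIME";   // assumed example tag name
$tagname = str_replace(":", "", $tagname);
echo $tagname;   // MCSTARTTIME
```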

4 September, 2011

Is there a way to alter the class so it doesn't fail when encountering "undefined entities"?

The feed I am displaying apparently has some characters that the script doesn't understand and it is causing my page to fail with the error: "myRSSParser: Error undefined entity at line 385"

Line 385 would be the line in the cache file being read by the script.

The RSS feed reader class relies on the XML Parser extension included with PHP. That is where the error is being thrown rather than from our code (ref: php.net/xml_parse).

To avoid XML errors you need to make sure that the input is valid, or maybe just tweak the character-encoding or use utf8_encode if that's the problem.

For your particular case you can insert the following patch:

while($data = fread($fp, 4096)) {
if(!in_array(mb_detect_encoding($data), array("UTF-8", "ASCII"))) {
$data = preg_replace('/[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S', '?', $data);
}
xml_parse(...) or die(
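If the underlying problem is invalid byte sequences rather than a missing entity definition, a simpler approach may be to scrub the whole feed string before it reaches xml_parse. A minimal sketch, assuming the Multibyte String extension is available and the feed is meant to be UTF-8; scrub_utf8 is an illustrative name, not part of the class.

```php
<?php
// Replace invalid UTF-8 byte sequences with mbstring's substitute
// character so the XML parser does not reject them.
function scrub_utf8($data)
{
  if (!function_exists("mb_convert_encoding")) {
    return $data;   // nothing we can do without mbstring
  }
  if (mb_check_encoding($data, "UTF-8")) {
    return $data;   // already valid, leave untouched
  }
  // Converting UTF-8 to UTF-8 substitutes the invalid sequences.
  return mb_convert_encoding($data, "UTF-8", "UTF-8");
}

$dirty = "valid text, then a stray byte: \xE9";
$clean = scrub_utf8($dirty);
```

This works best on the complete document. Applied to fixed-size chunks, as in a read loop, a multi-byte character split across two chunks would look invalid even though the feed is fine.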

21 June, 2011

Thanks for your great post. I have one question:
How can I use this code to put RSS from two sources on one page?

You only need to include the PHP class one time. The code that follows can then be repeated as many times as you want on the page, though it's probably a good idea to use caching.
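One possible shape for that caching is a small file cache wrapped around whatever produces the HTML. Everything in this sketch is an assumption made for illustration: the function name cached_output, the 15 minute lifetime, the cache file naming, and the stand-in closures that take the place of real RSSParser calls.

```php
<?php
// Illustrative file cache: $fetch is any callable that returns the
// content to store, e.g. a closure that builds and renders an RSSParser.
function cached_output($key, $ttl, callable $fetch, $dir = null)
{
  $dir = $dir === null ? sys_get_temp_dir() : $dir;
  $file = $dir . "/rsscache_" . md5($key) . ".html";

  if (is_file($file) && time() - filemtime($file) < $ttl) {
    return file_get_contents($file);   // cached copy is still fresh
  }

  $content = $fetch();
  file_put_contents($file, $content, LOCK_EX);
  return $content;
}

// Usage with stand-in closures; in practice each would do something like
//   $p = new \Chirp\RSSParser($url); return $p->getOutput(5);
$feeds = [
  "http://www.example.com/a.xml" => function () { return "<p>Feed A</p>"; },
  "http://www.example.com/b.xml" => function () { return "<p>Feed B</p>"; },
];

foreach ($feeds as $url => $render) {
  echo cached_output($url, 900, $render), "\n";
}
```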

13 April, 2011

this solution is fantastic. one question though: if i was to include a truncate function to truncate each entry to a certain number of words, where would i put that in the code? i was reading your other page on truncating and couldn't figure out how to merge the two.

thanks a bunch.

You can truncate the DESCRIPTION field in the display_item function just before it's added to the return string:
e.g.
$DESCRIPTION = myTruncate($DESCRIPTION, 200);
$this->retval .= "<p>" . stripslashes($DESCRIPTION) . "</p>\n\n";
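The myTruncate function itself is described on another page and is not reproduced here. If you need something to experiment with in the meantime, a simple stand-in could look like this; truncate_at_word is a hypothetical name, and the sketch counts bytes rather than characters, so multibyte text may be cut shorter than expected.

```php
<?php
// Hypothetical stand-in for a truncate helper: strip tags, then cut at
// the last space that fits within $limit bytes and append a marker.
function truncate_at_word($text, $limit, $pad = "...")
{
  $text = strip_tags($text);   // avoid cutting through an HTML tag
  if (strlen($text) <= $limit) return $text;

  // look one byte past the limit so a space exactly at the limit counts
  $cut = strrpos(substr($text, 0, $limit + 1), " ");
  if ($cut === false) $cut = $limit;   // no space found: hard cut

  return rtrim(substr($text, 0, $cut)) . $pad;
}

echo truncate_at_word("The quick brown fox jumps", 12);   // The quick...
```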

10 January, 2011

I got an error.
Error: unrecognized start tag 'FEED' in getOutput()

can you please explain

See comments above.

21 September, 2009

Your class.myrssparser.php has been extremely helpful to me in understanding creating/displaying RSS feeds, but with the code as copied onto my server I get a huge string of error messages. The first few are as follows:
Notice: Undefined index: CHANNEL in C:InetpubVTRADERRfactorclass.myrssparser.php on line 42

Notice: Undefined variable: RSS in ...
Notice: Undefined index: RSS in ...
Notice: Undefined index: LINK in ...

The errors you're seeing are really "Notices" saying that a variable (array index) is being referenced without previously being created/initialised. You can suppress these messages by setting your error_reporting level in PHP to "E_ALL ^ E_NOTICE" so it displays only actual errors and warnings and not notices.
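In code, that could be done around the part of the page that renders the feed. Note that E_ALL & ~E_NOTICE gives the same mask as E_ALL ^ E_NOTICE; the variable names below are only for illustration. On PHP 8 and later, undefined variables and array keys are reported as warnings rather than notices, so this mask alone would not hide them there.

```php
<?php
// Report everything except notices while the feed is rendered.
$mask = E_ALL & ~E_NOTICE;
$previous = error_reporting($mask);

// ... include the parser class and output the feed here ...

// Restore the earlier level in case other code depends on it.
error_reporting($previous);
```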

21 July, 2009

First of all, your RSS Feed Reader class is great. Thanks for sharing it.

I've been using it without major problems, although I found one little issue I have not been able to resolve yet: I would like to change the date format that comes within the item->pubdate tag to something more friendly. Could you guys give any ideas?

A few people have asked about this. I suggest something like the following:

if($PUBDATE) {
  $PUBDATE = date('l, jS F Y', strtotime($PUBDATE));
  $tmp[] = "<small>Published: $PUBDATE</small>";
}
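The same conversion as a standalone function, with a fallback for dates that strtotime cannot parse. This is a sketch: friendly_date is an invented name, and it uses gmdate rather than date so that the result does not depend on the server's timezone setting.

```php
<?php
// Reformat an RFC 822 style pubDate; fall back to the raw string
// if strtotime() cannot parse it.
function friendly_date($pubdate, $format = 'l, jS F Y')
{
  $timestamp = strtotime($pubdate);
  return $timestamp === false ? $pubdate : gmdate($format, $timestamp);
}

echo friendly_date("Tue, 21 Jul 2009 10:00:00 GMT");   // Tuesday, 21st July 2009
```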

11 June, 2009

Using blogger's atom.xml, the &gt and &lt and some / used in <br /> are not being parsed out, and are appearing in the html. Any ideas?

If you send me the feed URL I can check it out

28 March, 2009

I have no display whatsoever!!!!
The only error I receive is
myRSSParser: Could not open www.example.net/rss.xml for input.
I just cannot work out what the problem is - it is not just this example but at least two other reader examples also. Any ideas?

Hi Keith, it sounds like your webserver is denying access to the request from PHP. That can happen for example if you have a firewall or filtering rules (mod_rewrite) that deny access when there is no HTTP_USER_AGENT. Check your server logs for a 403 error.
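If a missing user agent does turn out to be the cause, one way to send one with PHP's built-in HTTP wrapper is a stream context. A minimal sketch; the agent string and timeout are arbitrary example values, and the actual request is left commented out.

```php
<?php
// Build a stream context that identifies the client to the server.
$context = stream_context_create([
  "http" => [
    "user_agent" => "ExampleFeedReader/1.0",
    "timeout"    => 10,   // seconds
  ],
]);

// Pass the context as the third argument when fetching the feed:
// $data = file_get_contents("http://www.example.net/rss.xml", false, $context);
```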

19 March, 2009

this is an excellent tutorial. i searched high and low for an rss tutorial and this one is miles ahead of the others. thank you very much for it. i would like to ask, how do you limit the results per page?

Hi Joe, you just need to pass the number of items you want to display as the first argument to the getOutput() function.

15 September, 2008

Thanks for this article, it really helped me.
(^_^)
