Example 3 - generating an RSS feed from a page using regular expression syntax to identify feed items


This example shows how to generate a feed from a page, using regular expression syntax to identify feed items.

This example is very similar to Example 2, but uses regular expression syntax so that the second paragraph is included in the first item. This technique is for advanced users only! If you are not familiar with regular expression syntax, don't attempt it.

The following box shows some of the HTML for the examples.htm page on this site:

<h1>Tutorial</h1>
<h2>Introduction</h2>
<p>You have three options open to you, depending on whether you can edit the 
HTML on the target page, and depending on whether you are familiar with Regular 
Expression syntax.</p>
<!-- start feed -->
<!-- start item -->
<h2><a href="example1.htm">Example 1</a></h2>
<!-- start description -->
<p>This is an example of generating a feed from your own page by 
inserting HTML comments to identify feed items.</p>
<p>This is the easiest method of converting your own web pages.</p><!-- end description -->
<!-- end item -->

<!-- start item --> 
<h2><a href="example2.htm">Example 2</a></h2> 
<!-- start description --> 
<p>This is an example of generating a feed from a page using existing 
HTML tags on the page to identify feed items.</p> 
<!-- end description --> 
<!-- end item -->

<!-- start item --> 
<h2><a href="example3.htm">Example 3</a></h2> 
<!-- start description --> 
<p>This is an example of generating a feed from your own page using some 
regular expression syntax to identify feed items.</p> 
<!-- end description --> 
<!-- end item --> 

In this example the <h2> heading "Introduction" has been used to mark the start of the feed items. RSSxl will then look for the first instance of the Start and End Item Strings after that. The <h2> tag has been used to identify the start of each item. The end of each item is identified by a </p> tag, followed by an optional carriage return and newline, followed by the start of another tag whose name does not begin with "p", using the regular expression syntax: </p>\r?\n?<[^p]

Because a </p> that is directly followed by another <p> does not match this expression, the second paragraph of Example 1 stays inside the first item. A </p> followed by an HTML comment such as <!-- end description --> does match, and ends the item.

The "Start Description String" and "End Description String" have again been left blank, because RSSxl will automatically split each item into a Title and Description based on the page layout.

In the RSSxl tool, type in the following parameters:

Page URL = http://www.wotzwot.com/examples.htm
Start String = <h2>Introduction</h2>
Start Item String = <h2>
End Item String = </p>\r?\n?<[^p]
Start Description String =  
End Description String =  

After entering these parameters into RSSxl, click on the "Generate RSS" button to see the RSS feed.
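
To see how the End Item String behaves, the following minimal Python sketch applies the same pattern to an abridged copy of the first item. It assumes that Python's re module treats the pattern the same way as RSSxl's regular expression engine, and that the matched </p> is kept as part of the item. The shortened paragraph text and the variable names are illustrative only.

```python
import re

# End Item String from the parameter list (assumed to behave the same
# in Python's re module as it does in RSSxl).
END_ITEM = re.compile(r"</p>\r?\n?<[^p]")

# Abridged copy of the first item on examples.htm; paragraph text shortened.
html = (
    '<h2><a href="example1.htm">Example 1</a></h2>\n'
    "<!-- start description -->\n"
    "<p>First paragraph.</p>\n"
    "<p>Second paragraph.</p><!-- end description -->\n"
    "<!-- end item -->\n"
)

# The first </p> is followed by a newline and "<p", so it is skipped.
# The second </p> is followed by "<!", so the pattern matches there.
match = END_ITEM.search(html)

# Assumption: the item runs up to and including the matched </p>.
item = html[: match.start() + len("</p>")]

print(repr(match.group(0)))
print(item)
```

The item printed at the end contains both paragraphs, which is the difference from Example 2 that this example sets out to show.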

Note that in this example the "Start String" cannot be left blank, because the introduction on the page would also match the "Start Item String" and "End Item String" and create an additional feed item.
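
The effect of a blank Start String can be sketched in Python as follows. The extract_items function is a hypothetical stand-in for RSSxl's scan, not its actual code, and the page text is abridged. The sketch also assumes that Python's re module handles the End Item String the same way RSSxl does.

```python
import re

START_ITEM = "<h2>"
END_ITEM = re.compile(r"</p>\r?\n?<[^p]")

# Abridged copy of examples.htm; paragraph text shortened.
page = (
    "<h1>Tutorial</h1>\n"
    "<h2>Introduction</h2>\n"
    "<p>You have three options open to you.</p>\n"
    "<!-- start feed -->\n"
    '<h2><a href="example1.htm">Example 1</a></h2>\n'
    "<p>First paragraph.</p>\n"
    "<p>Second paragraph.</p><!-- end description -->\n"
)


def extract_items(text, start_string=""):
    """Hypothetical item scan: skip past start_string, then collect items."""
    pos = 0
    if start_string:
        found = text.find(start_string)
        if found == -1:
            return []
        pos = found + len(start_string)
    items = []
    while True:
        begin = text.find(START_ITEM, pos)
        if begin == -1:
            break
        end = END_ITEM.search(text, begin)
        if end is None:
            break
        stop = end.start() + len("</p>")  # keep the closing </p> in the item
        items.append(text[begin:stop])
        pos = stop
    return items


without_start = extract_items(page)
with_start = extract_items(page, "<h2>Introduction</h2>")

print(len(without_start), len(with_start))
```

With a blank Start String, the sketch picks up the "Introduction" heading and its paragraph as an extra item ahead of Example 1. With the Start String set, only the real item is returned.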
