Python rss parser:


regexorcist

OMG!! It appears I'm very rusty when it comes to Python, as this took me (I hate to admit it) a few hours to finish. :whistling:

The code below is a small RSS feed parser that handles only the <title>, <link> and <pubDate> tags. Besides using only three tags, I designed it to read the RSS structure of this forum only; it may or may not work anywhere else. The real purpose was to show some Python code doing something. (It's neither lean nor pretty, and I know there are much easier ways to do the same thing.)

NOTE: there is no validation of the input RSS feed path.

#!/usr/bin/python
# Python 2 script (urllib2, print statement): prints the title, link and
# pubDate of each entry in an RSS feed.
import sys
import string
from urllib2 import urlopen
import xml.dom.minidom

# The feed URL is the first command-line argument (not validated).
var_feed_url = sys.argv[1]
var_xml = urlopen(var_feed_url)
var_all = xml.dom.minidom.parse(var_xml)

def extract_content(var_all, var_tag, var_loop_count):
    # Text of the var_loop_count-th <var_tag> element found under the root.
    return var_all.firstChild.getElementsByTagName(var_tag)[var_loop_count].firstChild.data

var_loop_count = 0
var_item = " "
while len(var_item) > 0:
    var_title = extract_content(var_all, "title", var_loop_count)
    var_link = extract_content(var_all, "link", var_loop_count)
    var_date = extract_content(var_all, "pubDate", var_loop_count)
    print "Title:          ", var_title
    print "URL Link:       ", var_link
    print "Published Date: ", var_date
    print " "
    var_loop_count += 1
    try:
        # Probe for the next <item>; any failure here ends the loop.
        var_item = var_all.firstChild.getElementsByTagName("item")[var_loop_count].firstChild.data
    except:
        var_item = ""
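As for the "much easier ways" mentioned above, here is one possible sketch, not the script used in this post. It assumes Python 3 and the standard library's xml.etree.ElementTree, and it walks each <item> on its own instead of indexing every <title> in the document. A typical RSS 2.0 feed also carries a <title> and <link> for the channel itself, so index-based lookups can drift out of step with the items. The function name parse_items and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration only; not real forum data.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example forum</title>
    <link>http://example.com/</link>
    <item>
      <title>First topic</title>
      <link>http://example.com/topic/1</link>
      <pubDate>Mon, 01 Jan 2007 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second topic</title>
      <link>http://example.com/topic/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return a list of (title, link, pubDate) tuples, one per <item>."""
    root = ET.fromstring(xml_text)
    entries = []
    # iter("item") visits only <item> elements, so the channel's own
    # <title> and <link> never get mixed in with the entries.
    for item in root.iter("item"):
        entries.append((
            # findtext returns the default when the child tag is missing.
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return entries


if __name__ == "__main__":
    for title, link, date in parse_items(SAMPLE_FEED):
        print("Title:          ", title)
        print("URL Link:       ", link)
        print("Published Date: ", date)
        print()
```

To read a live feed instead of the sample string, the bytes returned by urllib.request.urlopen(url).read() can be passed to parse_items in the same way.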

Here is this ugly script in action...

Here is the feed path used (notice the path at the bottom of the image):

[image: python_rss1.png]

Because the path is so long and has to be typed on the command line after the Python script name, I decided to assign it to a bash variable, r_path. This makes the script easier to run multiple times and keeps the image from being too wide for this post.

[image: python_rss2.png]

Here is what the command line looks like when running the script (I named the script rss.py, and an RSS path argument must be entered after it; no validation...sorry):

[image: python_rss3.png]

Here you can see that the script has run, but you're only seeing the last entries:

[image: python_rss4.png]

The above problem is easily remedied by piping the command into more:

./rss.py $r_path | more

[image: python_rss5.png]

Hope this helps someone :rolleyes:
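A footnote on the missing validation: one way to check the argument before fetching anything might look like the sketch below. It is written for Python 3, is not part of the script above, and the names feed_url_from_args and main are hypothetical. It only checks that exactly one argument was given and that it looks like an http(s) URL; it does not confirm that the URL actually serves RSS.

```python
import sys
from urllib.parse import urlparse


def feed_url_from_args(argv):
    """Return the feed URL from argv, or raise ValueError with a hint."""
    # argv[0] is the script name, so exactly one real argument is expected.
    if len(argv) != 2:
        raise ValueError("usage: rss.py <feed-url>")
    url = argv[1]
    parts = urlparse(url)
    # Accept only http(s) URLs that actually name a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not an http(s) URL: %r" % url)
    return url


def main(argv):
    """Validate the argument and return a shell-style exit code."""
    try:
        url = feed_url_from_args(argv)
    except ValueError as err:
        print(err, file=sys.stderr)
        return 1
    print("Feed URL looks usable:", url)
    return 0
```

In a real script this would be wired up with sys.exit(main(sys.argv)) at the bottom, so a bad argument gives a short message and a non-zero exit status instead of a traceback.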
