Welcome to RSS-reader documentation!

Welcome to the documentation of RSS-reader.

RSS (RDF Site Summary or Really Simple Syndication) is a web feed that allows users and applications to access updates to websites in a standardized, computer-readable format. Subscribing to RSS feeds can allow a user to keep track of many different websites in a single news aggregator, which constantly monitors sites for new content, removing the need for the user to manually check them. News aggregators (or “RSS readers”) can be built into a browser, installed on a desktop computer, or installed on a mobile device. [Wikipedia]

This RSS-reader is a command-line utility. Given the RSS link of a webpage, it prints the feed items of that webpage in one of two formats: plain text or JSON. The user can choose how many feed items to receive, and only that many items are shown.

How to use the RSS-reader

The RSS-reader has a command-line interface (CLI). When running the program, the user enters the RSS URL of a webpage and, optionally, the number of feed items to show.

The help output of the program is shown below:

RSS-reader help

As the help output shows, the user has the following options:

help displays the options, flags and arguments of the program.
version displays the version of the program. If this option is selected, only the version is printed.
json if selected, the news feed is printed in JSON format.
verbose if selected, log messages are shown so that the user can follow how the program progresses.
limit the number of feed items the user wants to be shown. If not specified, all the items available in the RSS feed are output.

As its argument, the program requires an RSS URL:

source the RSS URL from which the user expects to see feed items
date the date for which the user wants to see the feed items
--to-html path at which the HTML file should be stored
--to-pdf path at which the PDF file should be stored
--colorize prints the output in colorized format

If the distribution is installed, the program can also be run using the rss_reader command, without needing the py command.
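
The options and arguments listed above could be declared with argparse roughly as follows. This is a minimal sketch, not the project's actual parser: the function names build_parser and parse_args, the help texts, and the placeholder version string are assumptions, and the date is kept as a plain string because its expected format is not stated here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="rss_reader",
        description="Command-line RSS reader (sketch).",
    )
    # The source is optional here because cached news can be requested by date alone.
    parser.add_argument("source", nargs="?", help="RSS URL to read feed items from")
    # Placeholder version string; the real version is not given in this document.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0 (placeholder)")
    parser.add_argument("--json", action="store_true", help="print the news in JSON format")
    parser.add_argument("--verbose", action="store_true", help="print log messages")
    parser.add_argument("--limit", type=int, default=None, help="number of feed items to show")
    parser.add_argument("--date", help="show cached news from this date")
    parser.add_argument("--to-html", dest="to_html", help="path for the HTML file")
    parser.add_argument("--to-pdf", dest="to_pdf", help="path for the PDF file")
    parser.add_argument("--colorize", action="store_true", help="print colorized output")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the arguments and reject a call with neither a source nor a date."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.source is None and args.date is None:
        parser.error("either an RSS URL or --date is required")
    return args
```

For example, parse_args(["https://example.com/rss", "--json", "--limit", "3"]) yields a namespace with json set to True and limit set to 3 (the URL is a placeholder).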

An example of running the program with multiple options is shown below:

RSS-reader help

News Feed in JSON format

If the user chooses to have the feed presented in JSON format, the number of feed items given by the --limit option is shown in JSON format.

The JSON format implemented for the RSS-reader is as follows. The feed items are numbered starting from 1 according to their position in the RSS file, and the output appears in the following form:

json format for news feed

with the corresponding information about each news item next to its field.

An actual output of the news feed in JSON format is shown below:
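
The numbering scheme described above can be sketched with the json module. The exact layout used by the RSS-reader is the one shown in the image; the function name feeds_to_json, the use of the number as a dictionary key, and the field names in the sample are assumptions made for illustration.

```python
import json


def feeds_to_json(feeds, limit=None):
    """Number the items from 1 in their original order and serialize them."""
    selected = feeds if limit is None else feeds[:limit]
    # Keys "1", "2", ... follow the position of each item in the feed.
    numbered = {str(i): item for i, item in enumerate(selected, start=1)}
    return json.dumps(numbered, indent=4, ensure_ascii=False)


# Hypothetical sample items; the real fields come from the RSS source.
sample = [
    {"title": "First headline", "link": "https://example.com/1"},
    {"title": "Second headline", "link": "https://example.com/2"},
]

print(feeds_to_json(sample, limit=1))
```

Passing ensure_ascii=False keeps non-ASCII headlines readable in the terminal instead of escaping them.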

json format for news feed - CLI

Caching the News

When the user enters an RSS URL and no date, the rss-reader fetches the feed items from the specified source and prints them in normal or JSON format, depending on the options selected. While doing this, it also caches the news it has read.

The utility caches the feed data as follows. When a feed is read, a dictionary of the feed’s information is created, storing its title, date, content, news link, image link, the RSS source and the path to the feed’s cache directory. The utility creates a cache directory in the cached_news folder for each feed. In the feed’s directory, the article from the feed’s news page is saved to a text file, the links in that article are extracted and stored in another text file, and the images in the article are downloaded into a subdirectory named “images”. These files are used when the utility converts the feeds to HTML or PDF.

Then, for each feed, a tuple is constructed whose first element is the news date and whose second element is the dictionary described above. The tuples, one per feed, are stored in a list that is saved to a file in the cached_news directory. Cached news is fetched by news date, which is why the date comes first in each tuple; the structure is shown in the image below.
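
The list-of-tuples cache described above can be sketched with pickle as follows. This is an illustration under stated assumptions, not the project's code: the file name cache.pkl, the helper names save_cache and load_by_date, the date strings, and the reduced set of dictionary fields are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

CACHE_FILE_NAME = "cache.pkl"  # hypothetical name; the real file name is not documented here


def save_cache(entries, cache_dir):
    """Store a list of (date, feed_dict) tuples in the cache directory."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with (cache_dir / CACHE_FILE_NAME).open("wb") as fh:
        pickle.dump(entries, fh)


def load_by_date(cache_dir, wanted_date, limit=None):
    """Return the cached feed dictionaries whose date equals wanted_date."""
    cache_file = Path(cache_dir) / CACHE_FILE_NAME
    if not cache_file.exists():
        return []
    with cache_file.open("rb") as fh:
        entries = pickle.load(fh)
    matches = [feed for date, feed in entries if date == wanted_date]
    return matches if limit is None else matches[:limit]


# Small demonstration in a temporary directory, so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "cached_news"
    entries = [
        ("20240101", {"title": "First headline", "source": "https://example.com/rss"}),
        ("20240102", {"title": "Second headline", "source": "https://example.com/rss"}),
    ]
    save_cache(entries, cache_dir)
    found = load_by_date(cache_dir, "20240101")
    missing = load_by_date(cache_dir, "19990101")
```

Keeping the date as the first tuple element means a lookup only has to compare that element and never needs to open the dictionary. Note that pickle should only be used to load files the utility wrote itself, since unpickling untrusted data is unsafe.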

caching structure

The cached_news directory would look like this:

cached_news directory structure

Each feed’s directory looks like this inside:

feed directory structure

How RSS-reader Utility Works

  • If the user selects the --version option, the version of the utility is printed and the program exits.

  • If the user selects the --verbose option, log messages are printed to stdout.

  • If the user selects the --colorize option, the output on stdout is printed in color.

  • If the user enters neither an RSS URL nor --date, an error is raised.

Otherwise, the utility behaves as illustrated in the diagram below:

utility behavior diagram

Used Libraries in the Project

A number of libraries were used to implement this project. The most important of them are as follows:

argparse for parsing the arguments of the CLI
xml.etree.ElementTree for parsing the XML file of RSS into XML objects
requests for fetching web pages
json for converting the dictionary into JSON format. This library is used in the second approach to this goal and not in the initial implementation (which is in the commented section)
logging for logging info/warning/error messages when the verbose option is set to on
re for regular expression operations
datetime and dateutil for the date format conversions
textwrap for wrapping the text to a width of 120 characters
pickle for data serialization during the caching process
reportlab for producing PDF documents
BeautifulSoup for parsing HTML content into elements
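
As a rough illustration of how two of these libraries fit together, the sketch below parses RSS 2.0 text with xml.etree.ElementTree and wraps each description to 120 characters with textwrap. The sample XML, the function name parse_items and the dictionary keys are assumptions; in the utility the XML text would come from a web request (for example via requests) rather than from an inline string.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document used only for this example.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/1</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 +0000</pubDate>
      <description>First summary</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/2</link>
      <pubDate>Tue, 02 Jan 2024 10:00:00 +0000</pubDate>
      <description>Second summary</description>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text, limit=None):
    """Return up to `limit` items of an RSS 2.0 document as dictionaries."""
    root = ET.fromstring(xml_text)
    items = root.findall("./channel/item")
    if limit is not None:
        items = items[:limit]
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
            # Wrap long descriptions to the 120-character width mentioned above.
            "description": textwrap.fill(item.findtext("description", default=""), width=120),
        }
        for item in items
    ]


for entry in parse_items(SAMPLE_RSS, limit=1):
    print(entry["title"], entry["link"])
```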

Notes for the Reviewers

Dear reviewers, during the implementation of this project I ran into a few unclear points and complications, which I present in this section, some of them for the purpose of clarification.

pycodestyle errors

The pycodestyle output contained a few “too many blank lines” errors. They concerned the 2 blank lines with which I surrounded my module functions, in accordance with the PEP 8 guideline.

Future works

I will keep improving this project whenever I have time, as I have learned a lot from it and I am still learning. My next step is to implement tests, as this step was not mandatory. Then I will work on the implementation of the 6th iteration.