Welcome to RSS-reader documentation!
Welcome to the documentation of RSS-reader.
RSS (RDF Site Summary or Really Simple Syndication) is a web feed that allows users and applications to access updates to websites in a standardized, computer-readable format. Subscribing to RSS feeds can allow a user to keep track of many different websites in a single news aggregator, which constantly monitors sites for new content, removing the need for the user to manually check them. News aggregators (or “RSS readers”) can be built into a browser, installed on a desktop computer, or installed on a mobile device. [Wikipedia]
This RSS-reader is a command-line utility. Given the RSS link of a webpage, it prints the feeds of that webpage in one of two formats, one of which is JSON. The user can choose how many feeds to retrieve, and that many feeds will be shown.
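As a rough illustration of the idea above, the following sketch reads the items of an RSS document with xml.etree.ElementTree and applies an optional limit. The function name read_items, the sample XML and the dictionary keys are assumptions made for this example and are not taken from the utility's source; a real run would fetch the XML from the RSS URL instead of using an embedded string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used here instead of a network request.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example channel</title>
    <item>
      <title>First story</title>
      <link>https://example.com/1</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/2</link>
      <pubDate>Tue, 02 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Third story</title>
      <link>https://example.com/3</link>
      <pubDate>Wed, 03 Jan 2024 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def read_items(rss_text, limit=None):
    """Return the feed items as dictionaries, in document order."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    # No limit means every item in the feed is returned.
    return items if limit is None else items[:limit]


for entry in read_items(SAMPLE_RSS, limit=2):
    print(entry["title"], entry["link"])
```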
How to use the RSS-reader
The RSS-reader is provided as a CLI. When running the program, the user can enter the RSS URL of a webpage and, as an option, the number of feeds to show.
The help of the program is shown below:
As mentioned in the help of the program, the user has the following options:
help
displays the options, flags and arguments of the program.
version
displays the version of the program.
If this option is selected, only the version will be printed.
json
if selected, the news feed will be printed in JSON format.
verbose
if selected, log messages inform the user of how the program is progressing.
limit
the number of news feeds the user wants to be shown. If not specified, all the feeds available in the RSS will be output.
As an argument, an RSS URL must be given:
source
the RSS URL from which the user expects to see feeds.
date
the date specified by the user; the feed from that date is shown.
--to-html
the path at which the HTML file should be stored.
--to-pdf
the path at which the PDF file should be stored.
--colorize
prints the output in colorized format.
An example of running the program with multiple options is illustrated below:
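Separately from the program's own help and example, the options listed above could be declared with argparse roughly as in the sketch below. This is an illustration under assumptions, not the utility's actual parser: the program name rss_reader, the placeholder version string, the spelling --json and the choice to make source optional (so that --date can be used without a URL) are all guesses based on the descriptions in this document.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="rss_reader",  # hypothetical program name
        description="Command-line RSS reader (sketch).",
    )
    # Optional here, because the document allows --date without a URL.
    parser.add_argument("source", nargs="?", help="RSS URL")
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.0 (placeholder)")
    parser.add_argument("--json", action="store_true",
                        help="print the news feed in JSON format")
    parser.add_argument("--verbose", action="store_true",
                        help="show log messages")
    parser.add_argument("--limit", type=int, default=None,
                        help="number of news feeds to show")
    parser.add_argument("--date", help="show cached feeds from this date")
    parser.add_argument("--to-html", metavar="PATH",
                        help="path at which to store the HTML file")
    parser.add_argument("--to-pdf", metavar="PATH",
                        help="path at which to store the PDF file")
    parser.add_argument("--colorize", action="store_true",
                        help="print the output in colorized format")
    return parser


def parse_args(argv=None):
    """Parse argv and reject calls with neither an RSS URL nor --date."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.source is None and args.date is None:
        parser.error("either an RSS URL or --date is required")
    return args


args = parse_args(["https://example.com/rss", "--limit", "3", "--json"])
print(args.source, args.limit, args.json)
```

Using action="version" makes argparse print the version and exit immediately, which matches the documented behaviour that only the version is printed when that option is selected.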
News Feed in JSON format
If the user chooses to be shown the feed in JSON format, the number of feeds set by the --limit option will be shown in JSON format.
The JSON format implemented for the RSS-reader is as follows: the feeds are numbered starting from 1, in order of their relevance in the RSS file, and the output appears in the following form:
with the corresponding information about each news item next to its field.
An actual output of the news feed in JSON format is shown below:
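Independently of the actual output, the numbering scheme described in this section can be sketched as follows. The function name feeds_to_json and the field names title and link are assumptions chosen for illustration; the utility's real keys and layout may differ.

```python
import json


def feeds_to_json(feeds, limit=None):
    """Number the feeds from 1 and serialize them as a JSON object."""
    selected = feeds if limit is None else feeds[:limit]
    # JSON object keys are strings, so the numbers are stored as "1", "2", ...
    numbered = {str(number): feed
                for number, feed in enumerate(selected, start=1)}
    return json.dumps(numbered, indent=4, ensure_ascii=False)


example_feeds = [
    {"title": "First story", "link": "https://example.com/1"},
    {"title": "Second story", "link": "https://example.com/2"},
    {"title": "Third story", "link": "https://example.com/3"},
]
print(feeds_to_json(example_feeds, limit=2))
```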
Caching the News
When the user inputs an RSS URL and no date is entered, the rss-reader fetches the feed items from the specified source and prints them in normal or JSON format, depending on the options selected. While doing this, it also caches the news it has read.
The utility caches the feed data as follows. When a feed is read, a dictionary of the feed’s information is created, storing its title, date, content, news link, image link, the RSS source and the path to the feed’s cache directory. The utility creates a cache directory in the cached_news folder for each feed. In the feed’s directory, the article from the feed’s news page is downloaded into a text file, the links in that article are extracted and stored in a text file, and the images in the article are downloaded into a subdirectory named “images”. This is done so that the utility can later convert the feeds into HTML or PDF.
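A minimal sketch of this step is given below, under several assumptions: the function name cache_feed, the file names article.txt and links.txt, and the dictionary key cache_dir are hypothetical, and the article text and links are passed in directly rather than downloaded, so the example runs without network access.

```python
from pathlib import Path


def cache_feed(cache_root, feed_id, feed, article_text, article_links):
    """Create the per-feed cache directory and return the feed dictionary."""
    feed_dir = Path(cache_root) / feed_id
    # Images from the article would be downloaded into this subdirectory.
    (feed_dir / "images").mkdir(parents=True, exist_ok=True)
    # The article and its extracted links each go into their own text file.
    (feed_dir / "article.txt").write_text(article_text, encoding="utf-8")
    (feed_dir / "links.txt").write_text("\n".join(article_links),
                                        encoding="utf-8")
    # Copy the feed information and record where its cache lives.
    entry = dict(feed)
    entry["cache_dir"] = str(feed_dir)
    return entry
```

It could be called as cache_feed("cached_news", "feed_1", feed, text, links), where feed is a dictionary holding the title, date, content, links and RSS source of one feed.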
Then, for each feed, a tuple is constructed whose first element is the news date and whose second element is the dictionary mentioned above. The tuples, one per feed, are stored in a list that is saved to a file in the cached_news directory. The cached news is fetched by news date, which is the reason for this design; it is demonstrated in the image below.
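In code, the list of (date, dictionary) tuples and the lookup by date might look like the sketch below. The function names, the cache file path and the use of a YYYYMMDD string for the date are assumptions for illustration; the document only states that pickle is used for serialization during caching.

```python
import pickle


def save_cache(cache_file, entries):
    """Store a list of (news_date, feed_dict) tuples in one file."""
    with open(cache_file, "wb") as handle:
        pickle.dump(entries, handle)


def load_by_date(cache_file, news_date):
    """Return the cached feed dictionaries whose date equals news_date."""
    with open(cache_file, "rb") as handle:
        entries = pickle.load(handle)
    # Each entry is a (date, dictionary) tuple, so filtering needs only
    # the first element and never has to inspect the dictionary itself.
    return [feed for date, feed in entries if date == news_date]
```

Keeping the date as the first element of each tuple means a lookup by date does not depend on the layout of the feed dictionary.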
The cached_news directory looks like this:
And the inside of each feed’s directory looks like this:
How RSS-reader Utility Works
If the user selects the --version option, the version of the utility is printed and the program exits.
If the user selects the --verbose option, verbose log messages are printed to stdout.
If the user selects the --colorize option, the output on stdout is printed in color.
If the user enters neither an RSS URL nor --date, an error is raised.
Otherwise, the utility behaves as illustrated in the diagram below:

Used Libraries in the Project
A number of libraries are used in the implementation of this project. The most important of them are as follows:
argparse
for parsing the arguments of the CLI
xml.etree.ElementTree
for parsing the XML file of the RSS into XML objects
requests
for fetching web pages
json
for converting the dictionary into JSON format
This library is used in the second approach to this goal, not in the initial implementation (which is in the commented section)
logging
for logging info/warning/error messages when the verbose option is switched on
re
for regular expression operations
datetime and dateutil
for date format conversions
textwrap
for wrapping the text at 120 characters
pickle
for data serialization during the caching process
reportlab
for producing PDF documents
BeautifulSoup
for parsing HTML content into elements
Notes for the Reviewers
Dear reviewers, during the implementation of this project I ran into a few unclear points and complications, which I present in this section, some of them for the purpose of clarification.
pycodestyle errors
In the output of pycodestyle, there were a few “too many blank lines” errors. They concerned the two blank lines with which I surrounded my module-level functions, in accordance with the PEP 8 guideline.
Future works
I will keep improving this project whenever I have time, as I have learned a lot from it and am still learning. My next step is to implement tests, as this step was not mandatory. After that, I will work on the implementation of the 6th iteration.