RSS feed printer

Using an old printer to print the latest news

When retro computing is mentioned, printers hardly get any attention. Which is one of the reasons I thought it would be fun to do a project where the printer could fulfill a more important task than printing out a screenshot of a Micky Mouse image from a silly game. So by using an old printer to print the latest news, printers could perhaps be considered a bit more be fun or interesting. The printer this project uses (Brother HR-5C), is chosen for reasons explained in the video. But mostly because this printer has a special place in my retro computing heart. See my page about this printer Brother_HR-5C.htm
I wanted to make a video about this project right from the start and I fantasized a scene about the telexes of the past, whizzing and buzzing all day long spewing out the latest, with a nostalgic and noisy charm. News from reporters around the world, with the telexes placed in the newsroom of an important newspaper giant. Now, the fact is, that I never grew up with these machines, I only know telexes from disaster of spy movies and. So nostalgically speaking, I have no deep feeling for these machines (other then the intense admiration for the technology). But the noisy dot-matrix impact printer, now that's a thing I grew up with. So I decided to focus the video a bit more around the printers itself, maybe even show some different kind of printers. Which I did, and the fact that this project is about printing the latest news, the focus is more on the printers, which was actually the idea in the first place. So the news itself doesn't get too much screen time. And technically speaking, decoding an RSS feed and sending it to a printer is, thanks to the ESP8266, not that difficult these days. And the news itself, well... that gets old pretty quickly.

What is an RSS feed?

RSS feeds are simplified webpages. They are a standard with a logo that looks like the image on the right, although slight variations to this logo exist. Technically speaking RSS feeds are a simple wrapper around very short messages or "items". Messages consisting of a "title", "description" and "pubDate". These messages tell the absolute minimum of a news item. These kind of messages are ideal to scroll along your screen of in the banner of a web-browser. Some consider RSS feeds as a things of the past, but plenty of news related websites offer an RSS feed with the latest news from various categories. For instance the BBC news, CNN or the Dutch news website NU.nl
RSS feeds should be simple and short .XML based files that could be downloaded front he internet. Unfortunately, some files are hundreds of kilobytes in size, which is a bit awkward for something which is intended to be compact. But these are exceptions and if processed correctly, should not pose a problem for this project.
RSS feeds are XML based, a human readable text format, where the data is between special markers. Markers that are always used in pairs. One marker to identify the beginning of the marked area and one marker to identify the ending of the marked area. The markers this project uses are, title, description and pubDate and are placed in an area marked by the labels <item> and </item>. An example of an RSS file, as supplied by the BBC world website, is to be found in the download section below. From that file we only require three fields (of each news-item) to be processed

  • title
  • The title is a piece of text which could be compared to a newspapers headline.
    <itle> Ocean liner sunk of first voyage </title>

  • description
  • The description is a piece of text that could be considered as the article, although it is mostly compressed into one or two brief lines of text and sometimes very similar to the headline.
    <description> The prestigious ocean liner "Titanic" sunk after collision with iceberg. More than 1500 people died.</description>

  • pubdate
  • The publication date is the date of the release of the item. You could call it a timestamp. However experience has learned that this timestamp might be adjusted/corrected/changed by the authors when the RSS feed is updated.
    <pubDate> 14 Apr 1912, 05:18 GMT</pubDate>

    Processing the RSS feed

    Processing XML is a thing that can be rather complex if you want to do it right. But it can be rather simple if you are willing to make some compromises. An annoying fact is that there exist various version of the RSS format: v0.9, v1.0, v2.0. So how could we ever handle all of these without implementing very version separately. Well, we don't. Since there is no need for it to do so. We just search for the most basic functionality and ignore everything else. And another thing that is annoying is the pretty large file-size that RSS feeds seem to have these days. On an embedded system this simply means that we cannot afford large buffers (or run the risk of files exceeding the buffer size). In other words we cannot buffer the entire file and process it extensively. What we can do is process the data as it comes along, buffering nothing but the contents we are interested in. Meaning that we only need some string buffers of a limited size. Since we don't know what RSS type we are going to receive it is best to keep it simple and only search for the things we know that do exist in all versions. And since we are only interested in the headlines and short story, we are going to be fine. So these are the items we are willing to search for and process their contents.
    The item-tag <item>, encapsulates each and every news item. There is no guarantee that the item will contain all three tags: <itle>, &ldescription>, &lpubDate>. So we must be able to accept and be able to handle that they might not present in an item. This is what the simplified RSS processing code needs to do, The method described here is capable of handling the presence (and absence) of the 3 most important tags in the item:
    Clear all the variables we are about to fill with new data.
    Search for the tag >item> if this is not found, then report an error "RSS feed contains no items" or something else if desired, to the console for debugging purposes.
    Processing the incoming data:
    if it starts with <title> store the data to the title variable until we detect </title>
    if it starts with <description> store the data to the description variable until we detect </description>
    if it starts with <pubdate> store the data to the title variable until we detect </pubdate>
    if we detect </item> (anywhere in the data-stream during processing) then we have completed in processing the current item.
    If an item contained information suited for printing, then print it and repeat everything for the next item. Keep repeating the above steps until the number of desired items are printed or the file ends. Below and example of how the output on the printer looks like.
    This information is extracted from the XML file for every item in the file (or only the first 10 or whatever). The main problem is that you just don't know when the RSS file will be updated. So at any point in time the RSS file could updated on the website and could contain one or more new items. Which means that the firmware for this project will periodically get the RSS file from the website of interest. For example every 30 min. And then check for new items. There is no need to check for updates every minute, updates are not that frequent. And a high RSS feed download rate most likely only annoy the website that supplies the RSS feed, you do not want them to block you.
    Because it would be very silly to print the same news item several times the firmware of this project needs to do some checking in order to prevent printing duplicates. This can be achieved by calculating a checksum value for every printed message and storing these checksums in a table. When we open an new RSS file and detect a checksum that already exists, then there is a very high change this message has been printed before and should not be printed again. The checksum is calculated by adding every character value of the title and description field, can it be any simpler? Now although there is a chance that different message generate the same checksum... this chance is so tiny that we take that risk of missing. The pubDate field is explicitly NOTincluded in the checksum calculation, because experience has learned that this field sometimes is changed, while the title and description fields remain the same. Now there is probably a very good reason why this happens, but for us this only annoying and not really important as the text/item has already been printed and cannot be changed, printing the same line again with the new pubDate information only leads to readers attempting to spot a difference, which in many cases, will not be found because there isn't any.

    Downloads:

    The PCB is very simple, just an ESP8266, some level shifting transistors. Although there level shifting isn't really special since the entire system operates as an open collector bus. But technically, this is a 3.3V system connected to a 5V bus. And we cannot connect the IO-pins directly to the bus, simply because a programming mistake is easily made and then we might connect the 3.3V output to the 5V bus voltage. Now it is disputable of this would cause any significant damage as the bus is pretty weak. But transistors are cheap, so why play dirty, adding the transistors is just the right thing to do. NOw if we were using a 5V operated Arduino then we could mimic the open collector setup by switching the IO-pins between output and input. But that's something for another project.
    The data-line is the only bi-directional line. Meaning that we should be able to read it back to. But the use of a pull-up resistor combined with a Shottky diode, we can safely read the 5V bus level. For the complete schematic, please refer to the PDF in the download section below.
    The ESP8266 connects to the printer using a Commodore IEC serial port. This ports is a bus system that uses a 6-pin DIN connector. Meaning that we do need only a few IO-pins. Which is pretty practical and much easier compared to much more pins that are required by the GPIB (IEEE-488) or the Centronics parallel cables. Now, there is nothing like a free lunch... so we pay for this in firmware, which is a bit more complex than a parallel system. But this isn't a big deal, the protocol is reasonably well described. Although the Commodore IEC serial interface has the reputation of being slow (a reputation that is entirely correct) speed is not an issue at all for a slow device like a printer. In other words, the bus is just the perfect for a simple project like this. Below an image of an IEC serial cable and the coneector on the projects PCB.

    Downloads:

    Schematic in PDF format: schematic.pdf
    Example of an XML file as downloaded from the BBC world website: example_BBC_world_rss.xml
    The firmware is located at my Github page