Using an old printer to print the latest news
RSS feeds are simplified webpages. They are a standard with a logo that looks like the image on the right, although slight variations to this logo exist. Technically speaking RSS feeds are a simple wrapper around very short messages or "items". Messages consisting of a "title", "description" and "pubDate". These messages tell the absolute minimum of a news item. These kind of messages are ideal to scroll along your screen of in the banner of a web-browser. Some consider RSS feeds as a things of the past, but plenty of news related websites offer an RSS feed with the latest news from various categories. For instance the BBC news, CNN or the Dutch news website NU.nl
RSS feeds should be simple and short .XML based files that could be downloaded front he internet. Unfortunately, some files are hundreds of kilobytes in size, which is a bit awkward for something which is intended to be compact. But these are exceptions and if processed correctly, should not pose a problem for this project.
RSS feeds are XML based, a human readable text format, where the data is between special markers. Markers that are always used in pairs. One marker to identify the beginning of the marked area and one marker to identify the ending of the marked area. The markers this project uses are, title, description and pubDate and are placed in an area marked by the labels
<item> and </item>. An example of an RSS file, as supplied by the BBC world website, is to be found in the download section below. From that file we only require three fields (of each news-item) to be processed
Processing XML is a thing that can be rather complex if you want to do it right. But it can be rather simple if you are willing to make some compromises. An annoying fact is that there exist various version of the RSS format: v0.9, v1.0, v2.0. So how could we ever handle all of these without implementing very version separately. Well, we don't. Since there is no need for it to do so. We just search for the most basic functionality and ignore everything else. And another thing that is annoying is the pretty large file-size that RSS feeds seem to have these days. On an embedded system this simply means that we cannot afford large buffers (or run the risk of files exceeding the buffer size). In other words we cannot buffer the entire file and process it extensively. What we can do is process the data as it comes along, buffering nothing but the contents we are interested in. Meaning that we only need some string buffers of a limited size. Since we don't know what RSS type we are going to receive it is best to keep it simple and only search for the things we know that do exist in all versions. And since we are only interested in the headlines and short story, we are going to be fine. So these are the items we are willing to search for and process their contents.
The item-tag <item>, encapsulates each and every news item. There is no guarantee that the item will contain all three tags: <itle>, &ldescription>, &lpubDate>. So we must be able to accept and be able to handle that they might not present in an item. This is what the simplified RSS processing code needs to do, The method described here is capable of handling the presence (and absence) of the 3 most important tags in the item:
Clear all the variables we are about to fill with new data.
Search for the tag >item> if this is not found, then report an error "RSS feed contains no items" or something else if desired, to the console for debugging purposes.
Processing the incoming data:
if it starts with <title> store the data to the title variable until we detect </title>
if it starts with <description> store the data to the description variable until we detect </description>
if it starts with <pubdate> store the data to the title variable until we detect </pubdate>
if we detect </item> (anywhere in the data-stream during processing) then we have completed in processing the current item.
If an item contained information suited for printing, then print it and repeat everything for the next item. Keep repeating the above steps until the number of desired items are printed or the file ends. Below and example of how the output on the printer looks like.
This information is extracted from the XML file for every item in the file (or only the first 10 or whatever). The main problem is that you just don't know when the RSS file will be updated. So at any point in time the RSS file could updated on the website and could contain one or more new items. Which means that the firmware for this project will periodically get the RSS file from the website of interest. For example every 30 min. And then check for new items. There is no need to check for updates every minute, updates are not that frequent. And a high RSS feed download rate most likely only annoy the website that supplies the RSS feed, you do not want them to block you.
Because it would be very silly to print the same news item several times the firmware of this project needs to do some checking in order to prevent printing duplicates. This can be achieved by calculating a checksum value for every printed message and storing these checksums in a table. When we open an new RSS file and detect a checksum that already exists, then there is a very high change this message has been printed before and should not be printed again. The checksum is calculated by adding every character value of the title and description field, can it be any simpler? Now although there is a chance that different message generate the same checksum... this chance is so tiny that we take that risk of missing. The pubDate field is explicitly NOTincluded in the checksum calculation, because experience has learned that this field sometimes is changed, while the title and description fields remain the same. Now there is probably a very good reason why this happens, but for us this only annoying and not really important as the text/item has already been printed and cannot be changed, printing the same line again with the new pubDate information only leads to readers attempting to spot a difference, which in many cases, will not be found because there isn't any.
The PCB is very simple, just an ESP8266, some level shifting transistors. Although there level shifting isn't really special since the entire system operates as an open collector bus. But technically, this is a 3.3V system connected to a 5V bus. And we cannot connect the IO-pins directly to the bus, simply because a programming mistake is easily made and then we might connect the 3.3V output to the 5V bus voltage. Now it is disputable of this would cause any significant damage as the bus is pretty weak. But transistors are cheap, so why play dirty, adding the transistors is just the right thing to do. NOw if we were using a 5V operated Arduino then we could mimic the open collector setup by switching the IO-pins between output and input. But that's something for another project.
The data-line is the only bi-directional line. Meaning that we should be able to read it back to. But the use of a pull-up resistor combined with a Shottky diode, we can safely read the 5V bus level. For the complete schematic, please refer to the PDF in the download section below.
The ESP8266 connects to the printer using a Commodore IEC serial port. This ports is a bus system that uses a 6-pin DIN connector. Meaning that we do need only a few IO-pins. Which is pretty practical and much easier compared to much more pins that are required by the GPIB (IEEE-488) or the Centronics parallel cables. Now, there is nothing like a free lunch... so we pay for this in firmware, which is a bit more complex than a parallel system. But this isn't a big deal, the protocol is reasonably well described. Although the Commodore IEC serial interface has the reputation of being slow (a reputation that is entirely correct) speed is not an issue at all for a slow device like a printer. In other words, the bus is just the perfect for a simple project like this. Below an image of an IEC serial cable and the coneector on the projects PCB.
Schematic in PDF format: schematic.pdf
Example of an XML file as downloaded from the BBC world website: example_BBC_world_rss.xml
The firmware is located at my Github page