DISE RSS Reader

From DISE KnowledgeBase

Overview

RSS (Really Simple Syndication) is a common format for distributing news and other text content. Most newspapers and news websites use RSS to provide easy access to their news.

The DISE RSS Reader takes any RSS feed available on the Internet and converts it to a text file that can be used as a data source for the Ticker Object in the DISE system.
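As a rough illustration of what "converting a feed to text" means, the sketch below parses a small RSS 2.0 document and produces one line of text per news item. It is not the DISE RSS Reader's own code; the sample feed, the function name rss_to_lines and the " - " separator are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for this illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <description>Sample channel</description>
    <item>
      <title>First headline</title>
      <description>First story text.</description>
    </item>
    <item>
      <title>Second headline</title>
      <description>Second story text.</description>
    </item>
  </channel>
</rss>"""


def rss_to_lines(xml_text):
    """Return one 'title - description' line per <item> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        lines.append(f"{title} - {description}")
    return lines


lines = rss_to_lines(SAMPLE_RSS)
print("\n".join(lines))
```

A text file built from lines like these is the kind of plain-text data source a ticker can scroll.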

Note: Always make sure that you have permission to show an RSS feed in public. Contact the owner of the feed and get their permission before you start using it.

RSS Reader.png

In the main window of the DISE RSS Reader you can manage several RSS feeds, change the update interval, and start or stop processing. The DISE RSS Reader can also be set to start minimized in the tray.

The DISE RSS Reader always starts in running mode. To edit any of the settings or feeds, first click the Stop button.

To add a new RSS feed, click the Add button. To edit a feed, click the Edit button.

RSS Feed Settings dialog

RSS Reader Feed.png

Name

The name of the RSS feed.

URL

The location of the RSS feed. This can be a path to a local file, an HTTP URL, or an FTP URL.

Output Filename

The path to the file that should be the output from this feed.

Test

Test the settings and fill the preview box. This function will update the output text file as well.

Get RSS Code

Fetch the actual RSS XML code and show it in the preview window. This helps when resolving issues with a feed.

Include options tab

An RSS Feed can contain extra information describing the feed. Each article can also contain more information that can be included in the output text.

Include description of feed (at the top)

Include the description of the feed at the beginning of the text.

Include date in description

If enabled, the date of the channel is included at the beginning of the text.

Include date to each news item

If enabled, the date and time of each article are included at the end of each news item.

Only include first sentence of each news item

Try to shorten the news items by including only the first sentence (up to the first ".").

Do not include news older than

Skip all news items that are older than the specified number of hours.

Maximum number of news items to include

Limit the number of news items listed.
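The three limiting options above can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, news items are modelled as (published, text) tuples, and the age limit is applied before the item limit, which the text does not specify.

```python
from datetime import datetime, timedelta, timezone


def first_sentence(text):
    """Keep the text up to and including the first '.', or all of it if there is none."""
    head, dot, _rest = text.partition(".")
    return head + dot


def select_items(items, now, max_age_hours, max_items):
    """Drop items older than max_age_hours, then keep at most max_items.

    items is a list of (published_datetime, text) tuples in feed order.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    fresh = [(published, text) for published, text in items if published >= cutoff]
    return fresh[:max_items]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
items = [
    (now - timedelta(hours=1), "Storm hits coast. More at ten."),
    (now - timedelta(hours=30), "Yesterday's news."),
]
for published, text in select_items(items, now, max_age_hours=24, max_items=10):
    print(first_sentence(text))
```

Note that cutting at the first "." also cuts abbreviations and decimal numbers, which is why the option is described as only trying to shorten the text.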

Separators tab

For each news item you can specify text to add before and after the item, as well as the separators placed between its parts.

Text to insert before each news item

Separator between title and description

Separator between description and date

Text to add after each news item

In any of these fields, enter \n to get a new line and \t to get a tab.

RSS Reader Feed Separators.png
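To show how the four separator fields fit together, here is a minimal sketch that assembles one news item and expands the \n and \t sequences. The function names and default values are assumptions for the example, not settings taken from the DISE RSS Reader.

```python
def expand_escapes(text):
    """Turn a typed backslash-n or backslash-t into a real newline or tab."""
    return text.replace("\\n", "\n").replace("\\t", "\t")


def format_item(title, description, date,
                before="", title_sep=" - ", date_sep=" ", after="\\n"):
    """Join the parts of one news item using the four separator fields."""
    # Only the separator fields are expanded; feed text is left untouched.
    before, title_sep, date_sep, after = (
        expand_escapes(field) for field in (before, title_sep, date_sep, after)
    )
    return f"{before}{title}{title_sep}{description}{date_sep}{date}{after}"


line = format_item("Title", "Body", "2024-01-01",
                   before="* ", title_sep=": ", date_sep="\\t", after="\\n")
print(repr(line))
```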

Filter options tab

RSS feeds can contain many kinds of data and are often badly formatted, with incorrect characters or text. With these options you can filter out incorrect news items.

Remove news items with empty title or description

Instruct the parser not to include news items that have no text in the title or the description.

Remove duplicate news items

If enabled the parser will remove any duplicate news items.

Remove news items with invalid characters

If enabled the parser will remove any characters that are unreadable.

Remove html tags from news items

If enabled the parser will remove HTML tags if any exists in the description.

Remove description if it is equal to news item title

Remove the description if it is equal to the title of the news item. Only the title is then displayed.

Keep line breaks in text

Keep the line breaks in the text from the feed. This is only applicable to some feeds.

Convert no compliant XML entities

Convert entities that do not comply with the XML standard to compatible entities, so that the feed can be parsed successfully.

RSS Reader Feed Filter options.png
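Several of the filter options can be pictured as one small clean-up pass over the news items. The sketch below is an assumption-laden illustration, not the DISE parser: items are (title, description) tuples, HTML tags are stripped with a deliberately naive regular expression, and the order in which the filters run is a guess.

```python
import re

# Naive tag matcher: good enough for a sketch, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text):
    """Remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", text)


def filter_items(items):
    """Apply tag removal, empty-item removal, title/description
    de-duplication and duplicate-item removal to (title, description) tuples."""
    seen = set()
    result = []
    for title, description in items:
        title = strip_html(title).strip()
        description = strip_html(description).strip()
        if not title or not description:
            continue  # empty title or description: skip the item
        if description == title:
            description = ""  # show only the title
        key = (title, description)
        if key in seen:
            continue  # duplicate news item
        seen.add(key)
        result.append(key)
    return result


cleaned = filter_items([
    ("<b>Storm</b>", "Heavy rain <i>expected</i>"),
    ("Storm", "Heavy rain expected"),
    ("", "No title"),
    ("Same", "Same"),
])
print(cleaned)
```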

Additional text tab

It is possible to add additional text before and after all the news items. You may also want to change the default error message.

Text row to insert before all news items

Text inserted before the whole text.

Text row to insert after all news items

Text to be inserted after the whole text.

Display error text

The text shown if the RSS Reader is unable to download the news feed.
Enter \n to add a new line and \t to add a tab.

RSS Reader Feed Additional text.png

Sort options tab

If the RSS feed contains date and time information for its news items, the items can be sorted by date. Otherwise they stay in the original order of the RSS feed.

Sort news items by date

Sort the news items based on the date specified in the RSS Feed.

Reverse line order

Reverse the order of the lines. With date sorting enabled, the oldest news comes first; without date sorting, the last item in the feed comes first.

RSS Reader Feed Sort options.png
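The interaction between the two sort options can be sketched like this. The code assumes that date sorting puts the newest item first, which is inferred from the statement that reversing gives the oldest news first; the function name and the (date, text) tuples are hypothetical.

```python
from datetime import datetime


def order_items(items, sort_by_date=False, reverse=False):
    """Return items (date, text) in display order.

    Assumption: date sorting is newest first, so reversing it
    yields oldest first. Without sorting, feed order is kept.
    """
    ordered = list(items)
    if sort_by_date:
        ordered.sort(key=lambda item: item[0], reverse=True)
    if reverse:
        ordered.reverse()
    return ordered


feed = [
    (datetime(2024, 1, 1), "old"),
    (datetime(2024, 1, 3), "new"),
    (datetime(2024, 1, 2), "mid"),
]
print([text for _, text in order_items(feed, sort_by_date=True)])
print([text for _, text in order_items(feed, reverse=True)])
```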

Output file options tab

Codepage

Internally the DISE RSS Reader works with Unicode text, but you can specify the code page to use when the output file is written.

Append to text file

By default, each RSS feed overwrites the output file(s) specified. With this option enabled, the text from this RSS feed is instead added to the end of the output file. This lets you combine several feeds in the same text file: disable the option in the first feed and enable it in the feeds that follow.
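The overwrite and append behaviour, together with the code page choice, maps closely onto ordinary file modes and encodings. The following sketch shows two feeds sharing one output file; the helper write_feed, the file name ticker.txt and the cp1252 code page are assumptions chosen for the example.

```python
import os
import tempfile


def write_feed(path, text, append=False, encoding="utf-8"):
    """Write feed text, overwriting by default or appending when asked."""
    mode = "a" if append else "w"
    # newline="" keeps the line endings exactly as given in the text.
    with open(path, mode, encoding=encoding, newline="") as handle:
        handle.write(text)


out_path = os.path.join(tempfile.mkdtemp(), "ticker.txt")

# First feed: append disabled, so the file starts fresh.
write_feed(out_path, "Feed A news\n", append=False, encoding="cp1252")
# Following feeds: append enabled, so their text is added at the end.
write_feed(out_path, "Feed B news\n", append=True, encoding="cp1252")

with open(out_path, encoding="cp1252", newline="") as handle:
    combined = handle.read()
print(combined)
```

Feeds that share a file this way would also need the same code page, otherwise the file ends up with mixed encodings.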

Several output files

Instead of using a single output file, you can set up several output files and distribute them directly from the RSS Reader.

RSS Reader Feed Output file options.png

Authentication tab

Some web servers and most FTP servers require a username and password to log in.

RSS Reader Feed Authentication.png

Proxy options tab

At certain locations you may have to use a proxy server to reach the Internet. This proxy server will be used by all web (HTTP) and FTP connections.

RSS Reader Feed Proxy options.png

Download enclosed files tab

An enclosed file is an image or video file attached to a news item. These files can be downloaded to a specified path. If a news item does not contain an image, the default image is used.

Download files to path

Directory where the files will be downloaded to.

Main part of file name

The name that will be used for the downloaded files, for example "NewsImage1.image" for the image corresponding to the first news item in the output text.
All images get the extension ".image" instead of .jpg, .png, etc. This makes it possible to handle different image file formats without changing any settings or content.
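The naming scheme can be pictured with a one-line helper. It assumes that the main part of the file name is "NewsImage" in the example above and that the number appended to it is the 1-based position of the news item; the function name is hypothetical.

```python
def enclosure_filename(main_part, item_number):
    """Build the download name for the Nth news item (1-based, assumed)."""
    return f"{main_part}{item_number}.image"


names = [enclosure_filename("NewsImage", n) for n in range(1, 4)]
print(names)
```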

Only download image files

If checked, only image files are accepted for download. Otherwise any file format is accepted.

Default file

If a file could not be downloaded, or if no file could be connected to the news item, this default file is used instead.
If you leave this empty, no default is used and the file is instead deleted if it already exists.

RSS Reader Feed Download enclosed files.png

Multiple file sources tab

If a wildcard is used to download news items from an FTP server, these options apply to the downloaded files.

Delete source news file(s)

If possible, delete the files used as source.

Cache news file(s) on disk

Download the feed files to disk before doing any processing.

Archive old news file(s)

Archive old files instead of deleting the files.

RSS Reader Feed Multiple file sources.png

RegExp tab

Use regular expressions (RegExp) to identify strings of interest, such as particular characters, words, or patterns of characters.

Read more: Regular expressions

Process lines separate

Process each line separately, instead of applying the regular expression to all the text at once.

Use RegEx replace

Perform a regular expression replace.

RegEx options

Change how regular expressions will behave.

Regular expression

The regular expression to use.

Replace matches

The replacement text into which the matched backreferences are inserted.

Matches filters

List of filters to apply to the backreferences.

Example

Input text

Bangkok - ;;;0.1;1013;68;;;THUNDER;30;;O;2|;;;0.2;1012;69;;;Thunder;30;;O;1

Regular expression

(.*) - (?:[\d\w\s,.]*;){8}([\d\w\s,.-]*);([-{0,1}\d\w\s,.]*);

Replace matches text

C:\\Weather\\\2.png

Result

C:\Weather\THUNDER.png

RSS Reader Feed RegExp.png
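The example above can be reproduced with Python's re module, which happens to accept the same pattern and the same \\ and \2 replacement syntax. This is only a way to experiment with the expression; the regular expression engine used by the DISE RSS Reader may differ in details.

```python
import re

input_text = ("Bangkok - ;;;0.1;1013;68;;;THUNDER;30;;O;2|"
              ";;;0.2;1012;69;;;Thunder;30;;O;1")
pattern = r"(.*) - (?:[\d\w\s,.]*;){8}([\d\w\s,.-]*);([-{0,1}\d\w\s,.]*);"
replace_text = r"C:\\Weather\\\2.png"

match = re.match(pattern, input_text)
if match:
    # Group 1 is the city, group 2 the ninth field, group 3 the tenth.
    print(match.group(1), match.group(2), match.group(3))
    # expand() fills the backreferences into the replacement text.
    result = match.expand(replace_text)
    print(result)
```

The non-capturing group skips the first eight semicolon-terminated fields, so backreference 2 captures the ninth field, here the weather condition.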

RegEx Filter

Choose a backreference number between 0 and 9 to apply a filter function to.

The available filter functions for each backreference are:

  • Upper case - Make all characters upper case.
  • Lower case - Make all characters lower case.
  • First char upper case - Make the first character of the text upper case and everything else lower case.
  • String replace - Replace the text of the backreference.
  • String replace (wild) - Same as String replace, but does not require the whole backreference to match.

DISE RSS Reader RegEx filter.png
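One plausible reading of the filter functions is sketched below. The filter names, the parameters find and replace_with, and in particular the interpretation of the "wild" variant as a substring replacement are assumptions; the DISE RSS Reader may behave differently.

```python
def apply_filter(value, name, find="", replace_with=""):
    """Apply one filter function to the text of a backreference."""
    if name == "upper":
        return value.upper()
    if name == "lower":
        return value.lower()
    if name == "first_upper":
        # First character upper case, everything else lower case.
        return value.capitalize()
    if name == "replace":
        # Assumed: replaces only when the whole backreference matches.
        return replace_with if value == find else value
    if name == "replace_wild":
        # Assumed: replaces the search text wherever it occurs.
        return value.replace(find, replace_with)
    raise ValueError(f"unknown filter: {name}")


print(apply_filter("THUNDER", "first_upper"))
print(apply_filter("THUNDER", "replace", find="THUNDER", replace_with="storm"))
print(apply_filter("THUNDERSTORM", "replace_wild", find="THUNDER", replace_with="rain"))
```

Applied to the earlier weather example, a lower-case or first-upper filter on backreference 2 would let "THUNDER" and "Thunder" resolve to the same image file name.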
