Commit 9b7292df authored by Mattia Rizzolo

Delete venus from the repository

Signed-off-by: Mattia Rizzolo <mapreri@ubuntu.com>
parent 8eda8cd8
Sam Ruby <rubys@intertwingly.net>
This codebase represents a radical refactoring of Planet 2.0, which lists
the following authors:
Scott James Remnant <scott@netsplit.com>
Jeff Waugh <jdub@perkypants.org>
Planet
------
Planet is a flexible feed aggregator. It downloads news feeds published by
web sites and aggregates their content together into a single combined feed,
latest news first. This version of Planet is named Venus as it is the
second major version. The first version is still in wide use and is
also actively being maintained.
It uses Mark Pilgrim's Universal Feed Parser to read from CDF, RDF, RSS and
Atom feeds; Leonard Richardson's Beautiful Soup to correct markup issues;
and either Tomas Styblo's templating engine or Daniel Veillard's implementation
of XSLT to output static files in any format you can dream up.
To get started, check out the documentation in the docs directory. If you have
any questions or comments, please don't hesitate to use the planet mailing list:
http://lists.planetplanet.org/mailman/listinfo/devel
Keywords: feed, blog, aggregator, RSS, RDF, Atom, OPML, Python
DeWitt Clinton - Mac OSX
Mary Gardiner - PythonPath
Elias Torres - FOAF OnlineAccounts
Jacques Distler - Template patches
Michael Koziarski - HTTP Auth fix
Brian Ewins - Win32 / Portalocker
Joe Gregorio - python versioning for filters, verbose tests, spider_threads
Harry Fuecks - Pipe characters in file names, filter bug
Eric van der Vlist - Filters to add language, category information
Chris Dolan - mkdir cache; default template_dirs; fix xsltproc
David Sifry - rss 2.0 xslt template based on http://atom.geekhood.net/
Morten Frederiksen - Support WordPress LinkManager OPML
Harry Fuecks - default item date to feed date
Antonio Cavedoni - Django templates
Morten Frederiksen - expungeCache
Lenny Domnitser - Coral CDN support for URLs with non-standard ports
Amit Chakradeo - Allow read-only files to be overwritten
Matt Brubeck - fix new_channel
Aristotle Pagaltzis - ensure byline_author filter doesn't drop foreign markup
This codebase represents a radical refactoring of Planet 2.0, which lists
the following contributors:
Patches and Bug Fixes
---------------------
Chris Dolan - fixes, exclude filtering, duplicate culling
David Edmondson - filtering
Lucas Nussbaum - locale configuration
David Pashley - cache code profiling and recursion fixing
Gediminas Paulauskas - days per page
Spycyroll Maintainers
---------------------
Vattekkat Satheesh Babu
Richard Jones
Garth Kidd
Eliot Landrum
Bryan Richard
TODO
====
* Allow display normalisation to specified timezone
Some Planet admins would like their feed to be displayed in the local
timezone, instead of UTC.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import cgi
import cgitb
cgitb.enable()
from urllib import unquote
import sys, os
# Modify this to point to where you usually run planet.
BASE_DIR = '..'
# Modify this to point to your venus installation dir, relative to planet dir above.
VENUS_INSTALL = "venus"
# Config file, relative to planet dir above
CONFIG_FILE = "config/live"
# Admin page URL, relative to this script's URL
ADMIN_URL = "admin.html"
# chdir to planet dir - config may be relative from there
os.chdir(os.path.abspath(BASE_DIR))
# Add venus to path.
sys.path.append(VENUS_INSTALL)
# Add shell dir to path - auto detection does not work
sys.path.append(os.path.join(VENUS_INSTALL, "planet", "shell"))
# import necessary planet items
from planet import config
from planet.spider import filename
# Load config
config.load(CONFIG_FILE)
# parse query parameters
form = cgi.FieldStorage()
# Start HTML output at once
print "Content-Type: text/html;charset=utf-8" # HTML is following
print # blank line, end of headers
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
print '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sv"><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8" /><title>Admin results</title></head><body>'
print '<div>'
# Cache and blacklist dirs
cache = config.cache_directory()
blacklist = config.cache_blacklist_directory()
# Must have command parameter
if not "command" in form:
    print "<p>Unknown command</p>"
elif form['command'].value == "blacklist":
    # Create the blacklist dir if it does not exist
    if not os.path.exists(blacklist):
        os.mkdir(blacklist)
        print "<p>Created directory %s</p>" % blacklist
    # find list of urls, in the form bl[n]=url
    for key in form.keys():
        if not key.startswith("bl"): continue
        url = unquote(form[key].value)
        # find corresponding files
        cache_file = filename(cache, url)
        blacklist_file = filename(blacklist, url)
        # move to blacklist if found
        if os.path.exists(cache_file):
            os.rename(cache_file, blacklist_file)
            print "<p>Blacklisted <a href='%s'>%s</a></p>" % (url, url)
        else:
            print "<p>Unknown file: %s</p>" % cache_file
    print """
    <p>Note that blacklisting does not automatically
    refresh the planet. You will need to either wait for
    a scheduled planet run, or refresh manually from the admin interface.</p>
    """
elif form['command'].value == "run":
    # run spider and refresh
    from planet import spider, splice
    try:
        spider.spiderPlanet(only_if_new=False)
        print "<p>Successfully ran spider</p>"
    except Exception, e:
        print e
    doc = splice.splice()
    splice.apply(doc.toxml('utf-8'))
elif form['command'].value == "refresh":
    # only refresh
    from planet import splice
    doc = splice.splice()
    splice.apply(doc.toxml('utf-8'))
    print "<p>Successfully refreshed</p>"
elif form['command'].value == "expunge":
    # only expunge
    from planet import expunge
    expunge.expungeCache()
    print "<p>Successfully expunged</p>"

print "<p><strong><a href='" + ADMIN_URL + "'>Return</a> to admin interface</strong></p>"
print "</body></html>"
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Administration interface</title>
</head>
<body>
<h2>Administration interface</h2>
<p>Venus comes with a basic administration interface, allowing you to manually run planet, do a refresh from cache, expunge the cache or blacklist individual entries from the planet.</p>
<h3>Using the administration interface</h3>
<p>The administration interface allows you to manage the everyday tasks related to your venus installation.</p>
<ul><li><strong>Running planet</strong>. By clicking the "Run planet" button, you can do a full run of the planet script, rechecking all the feeds and recreating the generated files. This corresponds to running <code>python planet.py config.ini</code> with no arguments. Note that, depending on the number of feeds, this operation may take some time.</li>
<li><strong>Refreshing planet</strong>. By clicking the "Refresh planet" button, you can do an "offline" run of the planet script, without rechecking all the feeds but still recreating the generated files. This corresponds to running <code>python planet.py -o config.ini</code>.</li>
<li><strong>Expunging the planet cache</strong>. By clicking the "Expunge cache" button, you can clean the cache from outdated entries. This corresponds to running <code>python planet.py -x config.ini</code>.</li>
<li><strong>Blacklisting</strong>. By selecting one or more of the entries in the list of entries, and clicking the "Blacklist" button, you can stop these items from displaying on the planet. This is very useful for quickly blocking inappropriate or malformed content from your planet. <i>Note that blacklisting does not take effect until you refresh or rerun the planet</i>. (Blacklisting can also be done manually on the server by moving files from the cache directory to the blacklist directory.)</li>
</ul>
<p>Installing the administration interface securely requires some knowledge of web server configuration.</p>
<p>The admin interface consists of two parts: the admin template file and the server callback script. Both must be correctly installed for the administration interface to work.</p>
<h3>Installing the admin template</h3>
The admin page template is found in <code>themes/common/admin.html.tmpl</code>. This template needs to be added to your config file along with your other templates, and optionally customized. Make sure that <code>action="admin_cb.py"</code> found in several places in the file points to the URL (or relative URL) of the admin callback script below.
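<p>For example (the other template names here are purely illustrative), the relevant line of your configuration might look like:</p>
<blockquote><pre>[planet]
template_files = index.html.tmpl atom.xml.xslt admin.html.tmpl</pre></blockquote>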
<h3>Installing the admin callback script</h3>
<p>The admin callback script, admin_cb.py, needs to be copied to somewhere among your web server files. Depending on the details of your web server, your permissions, etc., this can be done in several different ways and in different places. There are three steps involved:</p>
<ol><li>Configuring the script</li>
<li>Enabling CGI</li>
<li>Secure access</li></ol>
<h4>Configuring the script</h4>
<p>At the top of the script, there are four variables you must customize. The correct values of the first three variables can be found by analyzing how you normally run the <code>planet.py</code> script. If you typically run planet from within the working directory <code>BASE_DIR</code>, using a command like <blockquote><code>python [VENUS_INSTALL]/planet.py [CONFIG_FILE]</code></blockquote> you know all three values.</p>
<dl><dt><code>BASE_DIR</code></dt><dd>
This variable must contain the directory from where you usually run the planet.py script, to ensure that relative file names in the config files work correctly.</dd>
<dt><code>VENUS_INSTALL</code></dt><dd>
This variable must contain your venus installation directory, relative to BASE_DIR above.</dd>
<dt><code>CONFIG_FILE</code></dt><dd>
This variable must contain your configuration file, relative to BASE_DIR above.</dd>
<dt><code>ADMIN_URL</code></dt><dd>
This variable must contain the URL (or relative URL) of the administration page, relative to this script's URL.</dd>
</dl>
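<p>As a minimal sketch, assuming a purely hypothetical layout in which the planet runs out of <code>/home/planet</code>, Venus is checked out in <code>/home/planet/venus</code>, and the admin page is published next to the callback script, the four variables might read:</p>
<blockquote><pre>BASE_DIR = '/home/planet'
VENUS_INSTALL = "venus"
CONFIG_FILE = "config.ini"
ADMIN_URL = "admin.html"</pre></blockquote>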
<h4>Enabling CGI</h4>
<p>You will need to ensure that the callback script can be run as a CGI script. This is done differently on different web server platforms, but there are at least three common patterns:</p>
<ul><li><b>Apache with <code>.htaccess</code></b>. If your server allows you to use <code>.htaccess</code> files, you can simply add
<blockquote><code>Options +ExecCGI<br />
AddHandler cgi-script .py</code></blockquote>
in an .htaccess file in the planet output directory to enable the server to run the script. In this case, the admin_cb.py file can be put alongside the rest of the planet output files.
</li>
<li><b>Apache without <code>.htaccess</code></b>. If your server does not allow you to add CGI handlers to <code>.htaccess</code> files, you can add
<blockquote><code>Options +ExecCGI<br />
AddHandler cgi-script .py</code></blockquote>
to the relevant part of the central apache configuration files.
</li>
<li><b>Apache with cgi-bin</b>. If your server only allows CGI handlers in pre-defined directories, you can place the <code>admin_cb.py</code> file there, and make sure to update the <code>action="admin_cb.py"</code> code in the template file <code>admin.html.tmpl</code>, as well as the <code>ADMIN_URL</code> in the callback script.
</li>
</ul>
<p>In all cases, it is necessary to make sure that the script is executed as the same user that owns the planet output files and the cache. Either the planet output is owned by the apache user (usually <code>www-data</code>), or Apache's <a href="http://httpd.apache.org/docs/2.0/suexec.html">suexec</a> feature can be used to run the script as the right user.</p>
<h4>Securing the admin interface</h4>
<p>If you don't want every user to be able to administrate your planet, you must secure at least the <code>admin_cb.py</code> file, and preferably the <code>admin.html</code> file as well. This can be done using your web server's regular access control features. See <a href="http://httpd.apache.org/docs/2.0/howto/auth.html">here</a> for Apache documentation.</p>
</body>
</html>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Configuration</title>
</head>
<body>
<h2>Configuration</h2>
<p>Configuration files are in <a href="http://docs.python.org/lib/module-ConfigParser.html">ConfigParser</a> format, which basically means the same
format as INI files, i.e., they consist of a series of
<code>[sections]</code>, in square brackets, with each section containing a
list of <code>name:value</code> pairs (or <code>name=value</code> pairs, if
you prefer).</p>
<p>You are welcome to place your entire configuration into one file.
Alternately, you may factor out the templating into a "theme", and
the list of subscriptions into one or more "reading lists".</p>
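<p>To illustrate the general shape of such a file (all names and values below are hypothetical), a small single-file configuration might look like:</p>
<blockquote><pre>[planet]
name = Example Planet
link = http://planet.example.com/
owner_name = Jane Doe
owner_email = jane@example.com
cache_directory = cache
output_dir = output
template_files = index.html.tmpl atom.xml.xslt

[http://example.com/blog/atom.xml]
name = An Example Blog</pre></blockquote>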
<h3 id="planet"><code>[planet]</code></h3>
<p>This is the only required section, which is a bit odd as none of the
parameters listed below are required. Even so, you really do want to
provide many of these, especially ones that identify your planet and
either (or both) of <code>template_files</code> and <code>theme</code>.</p>
<p>Below is a complete list of predefined planet configuration parameters,
including <del>ones not (yet) implemented by Venus</del> and <ins>ones that
are either new or implemented differently by Venus</ins>.</p>
<blockquote>
<dl class="compact code">
<dt>name</dt>
<dd>Your planet's name</dd>
<dt>link</dt>
<dd>Link to the main page</dd>
<dt>owner_name</dt>
<dd>Your name</dd>
<dt>owner_email</dt>
<dd>Your e-mail address</dd>
</dl>
<dl class="compact code">
<dt>cache_directory</dt>
<dd>Where cached feeds are stored</dd>
<dt>output_dir</dt>
<dd>Directory to place output files</dd>
</dl>
<dl class="compact code">
<dt><ins>output_theme</ins></dt>
<dd>Directory containing a <code>config.ini</code> file which is merged
with this one. This is typically used to specify templating and bill of
materials information.</dd>
<dt>template_files</dt>
<dd>Space-separated list of output template files</dd>
<dt><ins>template_directories</ins></dt>
<dd>Space-separated list of directories in which <code>template_files</code>
can be found</dd>
<dt><ins>bill_of_materials</ins></dt>
<dd>Space-separated list of files to be copied as is directly from the <code>template_directories</code> to the <code>output_dir</code></dd>
<dt>filter</dt>
<dd>Regular expression that must be found in the textual portion of the entry</dd>
<dt>exclude</dt>
<dd>Regular expression that must <b>not</b> be found in the textual portion of the entry</dd>
<dt><ins>filters</ins></dt>
<dd>Space-separated list of <a href="filters.html">filters</a> to apply to
each entry</dd>
<dt><ins>filter_directories</ins></dt>
<dd>Space-separated list of directories in which <code>filters</code>
can be found</dd>
</dl>
<dl class="compact code">
<dt>items_per_page</dt>
<dd>How many items to put on each page. <ins>Whereas Planet 2.0 allows this to
be overridden on a per template basis, Venus currently takes the maximum value
for this across all templates.</ins></dd>
<dt><del>days_per_page</del></dt>
<dd>How many complete days of posts to put on each page. This is the absolute, hard limit (over the item limit)</dd>
<dt>date_format</dt>
<dd><a href="http://docs.python.org/lib/module-time.html#l2h-2816">strftime</a> format for the default 'date' template variable</dd>
<dt>new_date_format</dt>
<dd><a href="http://docs.python.org/lib/module-time.html#l2h-2816">strftime</a> format for the 'new_date' template variable <ins>only applies to htmltmpl templates</ins></dd>
<dt><del>encoding</del></dt>
<dd>Output encoding for the file; Python 2.3+ users can use the special "xml" value to output ASCII with XML character references</dd>
<dt><del>locale</del></dt>
<dd>Locale to use for (e.g.) strings in dates, default is taken from your system</dd>
<dt>activity_threshold</dt>
<dd>If non-zero, all feeds which have not been updated in the indicated
number of days will be marked as inactive</dd>
</dl>
<dl class="compact code">
<dt>log_level</dt>
<dd>One of <code>DEBUG</code>, <code>INFO</code>, <code>WARNING</code>, <code>ERROR</code> or <code>CRITICAL</code></dd>
<dt><ins>log_format</ins></dt>
<dd><a href="http://docs.python.org/lib/node422.html">format string</a> to
use for logging output. Note: this configuration value is processed
<a href="http://docs.python.org/lib/ConfigParser-objects.html">raw</a></dd>
<dt>feed_timeout</dt>
<dd>Number of seconds to wait for any given feed</dd>
<dt>new_feed_items</dt>
<dd>Maximum number of items to include in the output from any one feed</dd>
<dt><ins>spider_threads</ins></dt>
<dd>The number of threads to use when spidering. When set to 0, the default,
no threads are used and spidering follows the traditional algorithm.</dd>
<dt><ins>http_cache_directory</ins></dt>
<dd>If <code>spider_threads</code> is specified, you can also specify a
directory to be used for an additional HTTP cache to front end the Venus
cache. If specified as a relative path, it is evaluated relative to the
<code>cache_directory</code>.</dd>
<dt><ins>cache_keep_entries</ins></dt>
<dd>Used by <code>expunge</code> to determine how many entries should be
kept for each source when expunging old entries from the cache directory.
This may be overridden on a per subscription feed basis.</dd>
<dt><ins>pubsubhubbub_hub</ins></dt>
<dd>URL to a PubSubHubbub hub, for example <a
href="http://pubsubhubbub.appspot.com">http://pubsubhubbub.appspot.com</a>.
Used by <code>publish</code> to ping the
hub when feeds are published, speeding delivery of updates to
subscribers. See
the <a href="http://code.google.com/p/pubsubhubbub/"> PubSubHubbub
home page</a> for more information.</dd>
<dt><ins>pubsubhubbub_feeds</ins></dt>
<dd>List of feeds to publish. Defaults to <code>atom.xml rss10.xml
rss20.xml</code>.</dd>
<dt id="django_autoescape"><ins>django_autoescape</ins></dt>
<dd>Control <a href="http://docs.djangoproject.com/en/dev/ref/templates/builtins/#autoescape">autoescaping</a> behavior of django templates. Defaults to <code>on</code>.</dd>
</dl>
<p>Additional options can be found in
<a href="normalization.html#overrides">normalization level overrides</a>.</p>
</blockquote>
<h3 id="default"><code>[DEFAULT]</code></h3>
<p>Values placed in this section are used as default values for all sections.
While it is true that few values make sense in all sections, in most cases
unused parameters cause few problems.</p>
<h3 id="subscription"><code>[</code><em>subscription</em><code>]</code></h3>
<p>All sections other than <code>planet</code>, <code>DEFAULT</code>, or those
named in <code>[planet]</code>'s <code>filters</code> or
<code>template_files</code> parameters
are treated as subscriptions and typically take the form of a
<acronym title="Uniform Resource Identifier">URI</acronym>.</p>
<p>Parameters placed in this section are passed to templates. While
you are free to include as few or as many parameters as you like, most of
the predefined themes presume that at least <code>name</code> is defined.</p>
<p>The <code>content_type</code> parameter can be defined to indicate that
this subscription is a <em>reading list</em>, i.e., is an external list
of subscriptions. At the moment, four formats of reading lists are supported:
<code>opml</code>, <code>foaf</code>, <code>csv</code>, and
<code>config</code>. In the future,
support for formats like <code>xoxo</code> could be added.</p>
<p><a href="normalization.html#overrides">Normalization overrides</a> can
also be defined here.</p>
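<p>For example (hypothetical URIs), a plain subscription and an OPML reading list might be declared as follows:</p>
<blockquote><pre>[http://example.com/blog/atom.xml]
name = An Example Blog

[http://example.com/blogroll.opml]
content_type = opml</pre></blockquote>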
<h3 id="template"><code>[</code><em>template</em><code>]</code></h3>
<p>Sections which are listed in <code>[planet] template_files</code> are
processed as <a href="templates.html">templates</a>. With Planet 2.0,
it is possible to override parameters like <code>items_per_page</code>
on a per template basis, but at the current time Planet Venus doesn't
implement this.</p>
<p><ins><a href="filters.html">Filters</a> can be defined on a per-template basis, and will be used to post-process the output of the template.</ins></p>
<h3 id="filter"><code>[</code><em>filter</em><code>]</code></h3>
<p>Sections which are listed in <code>[planet] filters</code> are
processed as <a href="filters.html">filters</a>.</p>
<p>Parameters which are listed in this section are passed to the filter
in a language specific manner. Given the way defaults work, filters
should be prepared to ignore parameters that they didn't expect.</p>
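<p>As a sketch (the filter and its parameter names below are hypothetical), parameters are simply listed in a section named after the filter:</p>
<blockquote><pre>[planet]
filters = my_filter.py

[my_filter.py]
pattern = example
verbose = true</pre></blockquote>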
</body>
</html>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Contributing</title>
</head>
<body>
<h2>Contributing</h2>
<p>If you make changes to Venus, you have no obligation to share them.
And unlike systems based on <code>CVS</code> or <code>subversion</code>,
there is no notion of &ldquo;committers&rdquo; &mdash; everybody is
a peer.</p>
<p>If you should choose to share your changes, the steps outlined below may
increase the chances of your code being picked up.</p>
<h3>Documentation and Tests</h3>
<p>For best results, include both documentation and tests in your
contribution.</p>
<p>Documentation can be found in the <code>docs</code> directory. It is
straight XHTML.</p>
<p>Test cases can be found in the
<a href="http://intertwingly.net/code/venus/tests/">tests</a> directory, and
make use of the
<a href="http://docs.python.org/lib/module-unittest.html">Python Unit testing framework</a>. To run them, simply enter:</p>
<blockquote><pre>python runtests.py</pre></blockquote>
<h3>Git</h3>
<p>If you have done a <a href="index.html">git pull</a>, you have already set up
a repository. The only additional step you might need to do is to introduce
yourself to <a href="http://git-scm.com/">git</a>. Type in the following,
after replacing the <b>bold text</b> with your information:</p>
<blockquote><pre>git config --global user.name '<b>Your Name</b>'
git config --global user.email '<b>youremail</b>@<b>example.com</b>'</pre></blockquote>
<p>Then, simply make the changes you like. When you are done, type:</p>
<blockquote><pre>git status</pre></blockquote>
<p>This will tell you which files you have modified, and which ones you may
have added. If you add files and you want them to be included, simply do a:</p>
<blockquote><pre>git add file1 file2...</pre></blockquote>
<p>You can also do a <code>git diff</code> to see if there are any changes
which you made that you don't want included. I can't tell you how many
debug print statements I have caught this way.</p>
<p>Next, type:</p>
<blockquote><pre>git commit -a</pre></blockquote>
<p>This will allow you to enter a comment describing your change. If your
repository is already on your web server, simply let others know where they
can find it. If not, consider using <a href="https://github.com/">github</a> to host your
<a href="http://help.github.com/forking/">fork</a> of Venus.</p>
<h3>Telling others</h3>
<p>Once you have a change worth sharing, post a message on the
<a href="http://lists.planetplanet.org/mailman/listinfo/devel">mailing
list</a>, or use github to send a <a
href="http://github.com/guides/pull-requests">pull request</a>.</p>
</body>
</html>
body {
background-color: #fff;
color: #333;
font-family: 'Lucida Grande', Verdana, Geneva, Lucida, Helvetica, sans-serif;
font-size: small;
margin: 40px;
padding: 0;
}
a:link, a:visited {
background-color: transparent;
color: #333;
text-decoration: none !important;
border-bottom: 1px dotted #333 !important;
}
a:hover {
background-color: transparent;
color: #934;
text-decoration: none !important;
border-bottom: 1px dotted #993344 !important;
}
pre, code {
background-color: #FFF;
color: #00F;
font-size: large
}
h1 {
margin: 8px 0 10px 20px;
padding: 0;
font-variant: small-caps;
letter-spacing: 0.1em;
font-family: "Book Antiqua", Georgia, Palatino, Times, "Times New Roman", serif;
}
h2 {
clear: both;
}
ul, ul.outer > li {
margin: 14px 0 10px 0;
}
.z {
float:left;
background: url(img/shadowAlpha.png) no-repeat bottom right !important;
margin: -15px 0 20px -15px !important;
}
.z .logo {
color: magenta;
}
.z p {
margin: 14px 0 10px 15px !important;
}
.z .sectionInner {
width: 730px;
background: none !important;
padding: 0 !important;
}
.z .sectionInner .sectionInner2 {
border: 1px solid #a9a9a9;
padding: 4px;
margin: -6px 6px 6px -6px !important;
}
ins {
background-color: #FFF;
color: #F0F;
text-decoration: none;
}
dl.compact {
margin-bottom: 1em;
margin-top: 1em;
}
dl.compact > dt {
clear: left;
float: left;
margin-bottom: 0;
padding-right: 8px;
margin-top: 0;
list-style-type: none;
}
dl.compact > dd {
margin-bottom: 0;
margin-top: 0;
margin-left: 10em;
}
th, td {
font-size: small;
}
window.onload=function() {
var vindex = document.URL.lastIndexOf('venus/');
if (vindex<0) vindex = document.URL.lastIndexOf('planet/');
var base = document.URL.substring(0,vindex+6);
var body = document.getElementsByTagName('body')[0];
var div = document.createElement('div');
div.setAttribute('class','z');
var h1 = document.createElement('h1');
var span = document.createElement('span');
span.appendChild(document.createTextNode('\u2640'));
span.setAttribute('class','logo');
h1.appendChild(span);
h1.appendChild(document.createTextNode(' Planet Venus'));
var inner2=document.createElement('div');
inner2.setAttribute('class','sectionInner2');
inner2.appendChild(h1);
var p = document.createElement('p');
p.appendChild(document.createTextNode("Planet Venus is an awesome \u2018river of news\u2019 feed reader. It downloads news feeds published by web sites and aggregates their content together into a single combined feed, latest news first."));
inner2.appendChild(p);
p = document.createElement('p');
var a = document.createElement('a');
a.setAttribute('href',base);
a.appendChild(document.createTextNode('Download'));
p.appendChild(a);
p.appendChild(document.createTextNode(" \u00b7 "));
a = document.createElement('a');
a.setAttribute('href',base+'docs/index.html');
a.appendChild(document.createTextNode('Documentation'));
p.appendChild(a);
p.appendChild(document.createTextNode(" \u00b7 "));
a = document.createElement('a');
a.setAttribute('href',base+'tests/');
a.appendChild(document.createTextNode('Unit tests'));
p.appendChild(a);
p.appendChild(document.createTextNode(" \u00b7 "));
a = document.createElement('a');
a.setAttribute('href','http://lists.planetplanet.org/mailman/listinfo/devel');
a.appendChild(document.createTextNode('Mailing list'));
p.appendChild(a);
inner2.appendChild(p);
var inner1=document.createElement('div');
inner1.setAttribute('class','sectionInner');
inner1.setAttribute('id','inner1');
inner1.appendChild(inner2);
div.appendChild(inner1);
body.insertBefore(div, body.firstChild);
}
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Etiquette</title>
</head>
<body>
<h2>Etiquette</h2>
<p>You would think that people who publish syndication feeds do it with the
intent to be syndicated. But the truth is that we live in a world where
<a href="http://en.wikipedia.org/wiki/Deep_linking">deep linking</a> can
cause people to complain. Nothing is safe. But that doesn&#8217;t
stop us from doing links.</p>
<p>These concerns tend to increase when you profit, either directly via ads or
indirectly via search engine rankings, from the content of others.</p>
<p>While there are no hard and fast rules that apply here, here are a
few things you can do to mitigate the concern:</p>
<ul>
<li>Aggressively use robots.txt, meta tags, and the google/livejournal
atom namespace to mark your pages as not to be indexed by search
engines.</li>
<blockquote><p><dl>
<dt><a href="http://www.robotstxt.org/">robots.txt</a>:</dt>
<dd><p><code>User-agent: *<br/>
Disallow: /</code></p></dd>
<dt>index.html:</dt>
<dd><p><code>&lt;<a href="http://www.robotstxt.org/wc/meta-user.html">meta name="robots"</a> content="noindex,nofollow"/&gt;</code></p></dd>
<dt>atom.xml:</dt>
<dd><p><code>&lt;feed xmlns:indexing="<a href="http://community.livejournal.com/lj_dev/696793.html">urn:atom-extension:indexing</a>" indexing:index="no"&gt;</code></p>
<p><code>&lt;access:restriction xmlns:access="<a href="http://www.bloglines.com/about/specs/fac-1.0">http://www.bloglines.com/about/specs/fac-1.0</a>" relationship="deny"/&gt;</code></p></dd>
</dl></p></blockquote>
<li><p>Ensure that all <a href="http://nightly.feedparser.org/docs/reference-entry-source.html#reference.entry.source.rights">copyright</a> and <a href="http://nightly.feedparser.org/docs/reference-entry-license.html">licensing</a> information is propagated to the
combined feed(s) that you produce.</p></li>
<li><p>Add no advertising. Consider filtering out ads, lest you
be accused of using someone&#8217;s content to help your friends profit.</p></li>
<li><p>Most importantly, if anyone does object to their content being included,
quickly and without any complaint, remove them.</p></li>
</ul>
</body>
</html>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Filters</title>
</head>
<body>
<h2>Filters and Plugins</h2>
<p>Filters and plugins are simple Unix pipes. Input comes in
<code>stdin</code>, parameters come from the config file, and output goes to
<code>stdout</code>. Anything written to <code>stderr</code> is logged as an
ERROR message. If no <code>stdout</code> is produced, the entry is not written
to the cache or processed further; in fact, if the entry had previously been
written to the cache, it will be removed.</p>
<p>There are two types of filters supported by Venus, input and template.</p>
<p>Input to an input filter is an aggressively
<a href="normalization.html">normalized</a> entry. For
example, if a feed is RSS 1.0 with 10 items, the filter will be called ten
times, each with a single Atom 1.0 entry, with all textConstructs
expressed as XHTML, and everything encoded as UTF-8.</p>
<p>Input to a template filter will be the output produced by the template.</p>
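<p>A filter can therefore be a very small script. The sketch below (not one of the bundled filters) simply copies its input through to its output unchanged; for an input filter, producing no output at all would instead cause the entry to be dropped:</p>
<blockquote><pre>#!/usr/bin/env python
# Minimal pass-through Venus filter (illustrative sketch, not bundled).
# stdin:  a normalized Atom entry (input filter) or template output (template filter)
# stdout: the possibly modified data; for an input filter, no output drops the entry
import sys

data = sys.stdin.read()   # read everything from the pipe
sys.stdout.write(data)    # write it back unmodified</pre></blockquote>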
<p>You will find a small set of example filters in the <a
href="../filters">filters</a> directory. The <a
href="../filters/coral_cdn_filter.py">coral cdn filter</a> will change links
to images in the entry itself. The filters in the <a
href="../filters/stripAd/">stripAd</a> subdirectory will strip specific
types of advertisements that you may find in feeds.</p>
<p>The <a href="../filters/excerpt.py">excerpt</a> filter adds metadata (in
the form of a <code>planet:excerpt</code> element) to the feed itself. You
can see examples of how parameters are passed to this program in either
<a href="../tests/data/filter/excerpt-images.ini">excerpt-images</a> or
<a href="../examples/opml-top100.ini">opml-top100.ini</a>.
Alternately parameters may be passed
<abbr title="Uniform Resource Identifier">URI</abbr> style, for example:
<a href="../tests/data/filter/excerpt-images2.ini">excerpt-images2</a>.
</p>
<p>The <a href="../filters/xpath_sifter.py">xpath sifter</a> is a variation of
the above, including or excluding feeds based on the presence (or absence) of
data specified by <a href="http://www.w3.org/TR/xpath20/">xpath</a>
expressions. Again, parameters can be passed as
<a href="../tests/data/filter/xpath-sifter.ini">config options</a> or
<a href="../tests/data/filter/xpath-sifter2.ini">URI style</a>.
</p>
<p>The <a href="../filters/regexp_sifter.py">regexp sifter</a> operates just
like the xpath sifter, except it uses
<a href="http://docs.python.org/lib/re-syntax.html">regular expressions</a>
instead of XPath expressions.</p>
<h3>Notes</h3>
<ul>
<li>Any filters listed in the <code>[planet]</code> section of your config.ini
will be invoked on all feeds. Filters listed in individual
<code>[feed]</code> sections will only be invoked on those feeds.
Filters listed in <code>[template]</code> sections will be invoked on the
output of that template (see the configuration sketch after this list).</li>
<li>Input filters are executed when a feed is fetched, and the results are
placed into the cache. Changing a configuration file alone is not sufficient to
change the contents of the cache &mdash; typically that only occurs after
a feed is modified.</li>
<li>Filters are simply invoked in the order they are listed in the
configuration file (think unix pipes). Planet wide filters are executed before
feed specific filters.</li>
<li>The file extension of the filter is significant. <code>.py</code> invokes
python. <code>.xslt</code> invokes XSLT. <code>.sed</code> and
<code>.tmpl</code> (a.k.a. htmltmpl) are also options. Other languages, like
perl or ruby or class/jar (java), aren't supported at the moment, but these
would be easy to add.</li>
<li>If the filter name contains a redirection character (<code>&gt;</code>),
then the output stream is
<a href="http://en.wikipedia.org/wiki/Tee_(Unix)">tee</a>d; one branch flows
through the specified filter and the output is placed into the named file; the
other unmodified branch continues onto the next filter, if any.
One use case for this function is to use
<a href="../filters/xhtml2html.plugin">xhtml2html</a> to produce both an XHTML
and an HTML output stream from one source.</li>
<li>Templates written using htmltmpl or django currently only have access to a
fixed set of fields, whereas XSLT and genshi templates have access to
everything.</li>
<li>Plugins differ from filters in that while filters are forked, plugins are
<a href="http://docs.python.org/lib/module-imp.html">imported</a>. This
means that plugins are limited to Python and are run in-process. Plugins
therefore have direct access to planet internals like configuration and
logging facilities, as well as access to the bundled libraries like the
<a href="http://feedparser.org/docs/">Universal Feed Parser</a> and
<a href="http://code.google.com/p/html5lib/">html5lib</a>; but it also
means that functions like <code>os.abort()</code> can't be recovered
from.</li>
</ul>
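<p>To illustrate the scoping described above (the filter and feed names are hypothetical), a planet-wide filter and a feed-specific filter might be configured as:</p>
<blockquote><pre>[planet]
filters = my_global_filter.py

[http://example.com/blog/atom.xml]
filters = my_feed_filter.py</pre></blockquote>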
</body>
</html>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Documentation</title>
</head>
<body>
<h2>Table of Contents</h2>
<ul class="outer">
<li><a href="installation.html">Getting started</a></li>
<li>Basic Features
<ul>
<li><a href="config.html">Configuration</a></li>
<li><a href="templates.html">Templates</a></li>
</ul>
</li>
<li>Advanced Features
<ul>
<li><a href="venus.svg">Architecture</a></li>
<li><a href="normalization.html">Normalization</a></li>
<li><a href="filters.html">Filters and Plugins</a></li>
<li><a href="admin.html">Administration interface</a></li>
</ul>
</li>
<li>Other
<ul>
<li><a href="migration.html">Migration from Planet 2.0</a></li>
<li><a href="contributing.html">Contributing</a></li>
<li><a href="etiquette.html">Etiquette</a></li>
</ul>
</li>
<li>Reference
<ul>
<li><a href="http://www.planetplanet.org/">Planet</a></li>
<li><a href="http://feedparser.org/docs/">Universal Feed Parser</a></li>
<li><a href="http://code.google.com/p/html5lib/">html5lib</a></li>
<li><a href="http://htmltmpl.sourceforge.net/">htmltmpl</a></li>
<li><a href="http://bitworking.org/projects/httplib2/">httplib2</a></li>
<li><a href="http://www.w3.org/TR/xslt">XSLT</a></li>
<li><a href="http://www.gnu.org/software/sed/manual/html_mono/sed.html">sed</a></li>
<li><a href="http://www.djangoproject.com/documentation/templates/">Django templates</a></li>
</ul>
</li>
<li>Credits and License
<ul>
<li><a href="../AUTHORS">Authors</a></li>
<li><a href="../THANKS">Contributors</a></li>
<li><a href="../LICENCE">License</a></li>
</ul>
</li>
</ul>
</body>
</html>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Installation</title>
</head>
<body>
<h2>Installation</h2>
<p>Venus has been tested on Linux, Mac OS X, and Windows.</p>
<p>You'll need at least Python 2.2 installed on your system; we recommend
Python 2.4, though, as there may be bugs with the earlier libraries.</p>
<p>Everything Pythonesque Planet needs to provide basic operation should be
included in the distribution. Some optional features may require
additional libraries, for example:</p>
<ul>
<li>Usage of XSLT requires either
<a href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a>
or <a href="http://xmlsoft.org/XSLT/python.html">python-libxslt</a>.</li>
<li>The current interface to filters written in non-templating languages
(e.g., python) uses the
<a href="http://docs.python.org/lib/module-subprocess.html">subprocess</a>
module which was introduced in Python 2.4.</li>
<li>Usage of FOAF as a reading list requires
<a href="http://librdf.org/">librdf</a>.</li>
</ul>
<h3>General Instructions</h3>
<p>
These instructions apply to any platform. Check the instructions
below for more specific instructions for your platform.
</p>
<ol>
<li><p>If you are reading this online, you will need to
<a href="../index.html">download</a> and extract the files into a folder somewhere.
You can place this wherever you like, <code>~/planet</code>
and <code>~/venus</code> are good
choices, but so's anywhere else you prefer.</p></li>
<li><p>This is very important: from within that directory, type the following
command:</p>
<blockquote><code>python runtests.py</code></blockquote>
<p>This should take anywhere from one to ten seconds to execute. No network
connection is required, and the script cleans up after itself. If the
script completes with an "OK", you are good to go. Otherwise stopping here
and inquiring on the
<a href="http://lists.planetplanet.org/mailman/listinfo/devel">mailing list</a>
is a good idea as it can save you lots of frustration down the road.</p></li>
<li><p>Make a copy of one of the <code>ini</code> files in the
<a href="../examples">examples</a> subdirectory,
and put it wherever you like; I like to use the Planet's name (so
<code>~/planet/debian</code>), but it's really up to you.</p></li>
<li><p>Edit the <code>config.ini</code> file in this directory to taste;
it's pretty well documented, so you shouldn't have any problems here. Pay
particular attention to the <code>output_dir</code> option, which should be
readable by your web server. If the directory you specify in your
<code>cache_directory</code> exists, make sure that it is empty.</p></li>
<li><p>Run it: <code>python planet.py pathto/config.ini</code></p>
<p>You'll want to add this to cron; make sure you run it from the
right directory.</p></li>
<li><p>(Optional)</p>
<p>Tell us about it! We'd love to link to you on planetplanet.org :-)</p></li>
<li><p>(Optional)</p>
<p>Build your own themes, templates, or filters! And share!</p></li>
</ol>
<h3 id="macosx">Mac OS X and Fink Instructions</h3>
<p>
The <a href="http://fink.sourceforge.net/">Fink Project</a> packages
various open source software for Mac OS X. This makes it a little easier
to get started with projects like Planet Venus.
</p>
<p>
Note: in the following, we recommend explicitly
using <code>python2.4</code>. As of this writing, Fink is starting to
support <code>python2.5</code>, but the XML libraries, for example, are
not yet ported to the newer Python, so Venus will be less featureful.
</p>
<ol>
<li><p>Install the XCode development tools from your Mac OS X install
disks</p></li>
<li><p><a href="http://fink.sourceforge.net/download/">Download</a>
and install Fink</p></li>
<li><p>Tell fink to install the Planet Venus prerequisites:<br />
<code>fink install python24 celementtree-py24 bzr-py24 libxslt-py24
libxml2-py24</code></p></li>
<li><p><a href="../index.html">Download</a> and extract the Venus files into a
folder somewhere</p></li>
<li><p>Run the tests: <code>python2.4 runtests.py</code><br /> This
will warn you that the RDF library is missing, but that's
OK.</p></li>
<li><p>Continue with the general steps above, starting with Step 3. You
may want to explicitly specify <code>python2.4</code>.</p></li>
</ol>
<h3 id="ubuntu">Ubuntu Linux (Edgy Eft) instructions</h3>
<p>Before starting, issue the following command:</p>
<blockquote><pre>sudo apt-get install bzr python2.4-librdf</pre></blockquote>
<h3 id="windows">Windows instructions</h3>
<p>
htmltmpl templates (and Django too, since it currently piggybacks on
the htmltmpl implementation) on Windows require
the <a href="http://sourceforge.net/projects/pywin32/">pywin32</a>
module.
</p>
<h3 id="python22">Python 2.2 instructions</h3>
<p>If you are running Python 2.2, you may also need to install <a href="http://pyxml.sourceforge.net/">pyxml</a>. If the
following runs without error, you do <b>not</b> have the problem.</p>
<blockquote><pre>python -c "__import__('xml.dom.minidom').dom.minidom.parseString('&lt;entry xml:lang=\"en\"/&gt;')"</pre></blockquote>
<p>Installation of pyxml varies by platform. For Ubuntu Linux (Dapper Drake), issue the following command:</p>
<blockquote><pre>sudo apt-get install python2.2-xml</pre></blockquote>
</body>
</html>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Migration</title>
</head>
<body>
<h2>Migration from Planet 2.0</h2>
<p>The intent is that existing Planet 2.0 users should be able to reuse
their existing <code>config.ini</code> and <code>.tmpl</code> files,
but the reality is that users will need to be aware of the following:</p>
<ul>
<li>You will need to start over with a new cache directory as the format
of the cache has changed dramatically.</li>
<li>Existing <code>.tmpl</code> and <code>.ini</code> files should work,
though some <a href="config.html">configuration</a> options (e.g.,
<code>days_per_page</code>) have not yet been implemented</li>
<li>No testing has been done on Python 2.1, and it is presumed not to work.</li>
<li>To take advantage of all features, you should install the optional
XML and RDF libraries described on
the <a href="installation.html">Installation</a> page.</li>
</ul>
<p>
Common changes to config.ini include:
</p>
<ul>
<li><p>Filename changes:</p>
<pre>
examples/fancy/index.html.tmpl => themes/classic_fancy/index.html.tmpl
examples/atom.xml.tmpl => themes/common/atom.xml.xslt
examples/rss20.xml.tmpl => themes/common/rss20.xml.tmpl
examples/rss10.xml.tmpl => themes/common/rss10.xml.tmpl
examples/opml.xml.tmpl => themes/common/opml.xml.xslt
examples/foafroll.xml.tmpl => themes/common/foafroll.xml.xslt
</pre></li>
</ul>
</body>
</html>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="docs.js"></script>
<link rel="stylesheet" type="text/css" href="docs.css"/>
<title>Venus Normalization</title>
</head>
<body>
<h2>Normalization</h2>
<p>Venus builds on, and extends, the <a
href="http://www.feedparser.org/">Universal Feed Parser</a> and <a
href="http://code.google.com/p/html5lib/">html5lib</a> to
convert all feeds into Atom 1.0, with well formed XHTML, and encoded as UTF-8,
meaning that you don't have to worry about funky feeds, tag soup, or character
encoding.</p>
<h3>Encoding</h3>
<p>Input data in feeds may be encoded in a variety of formats, most commonly
ASCII, ISO-8859-1, WIN-1252, and UTF-8. Additionally, many feeds make use of
the wide range of
<a href="http://www.w3.org/TR/html401/sgml/entities.html">character entity
references</a> provided by HTML. Each is converted to UTF-8, an encoding
which is a proper superset of ASCII, supports the entire range of Unicode
characters, and is one of
<a href="http://www.w3.org/TR/2006/REC-xml-20060816/#charsets">only two</a>
encodings required to be supported by all conformant XML processors.</p>
<p>Encoding problems are one of the more common feed errors, and every
attempt is made to correct common errors, such as the inclusion of
the so-called
<a href="http://www.fourmilab.ch/webtools/demoroniser/">moronic</a> versions
of smart-quotes. In rare cases where individual characters can not be
converted to valid UTF-8 or into
<a href="http://www.w3.org/TR/xml/#charsets">characters allowed in XML 1.0
documents</a>, such characters will be replaced with the Unicode
<a href="http://www.fileformat.info/info/unicode/char/fffd/index.htm">Replacement character</a>, with a title that describes the original character whenever possible.</p>
<p>In order to support the widest range of inputs, use of Python 2.3 or later,
as well as the installation of the python <code>iconvcodec</code>, is
recommended.</p>
<h3>HTML</h3>
<p>A number of different normalizations of HTML are performed. For starters,
the HTML is
<a href="http://www.feedparser.org/docs/html-sanitization.html">sanitized</a>,
meaning that HTML tags and attributes that could introduce javascript or
other security risks are removed.</p>
<p>Then,
<a href="http://www.feedparser.org/docs/resolving-relative-links.html">relative
links are resolved</a> within the HTML. This is also done for links
in other areas in the feed too.</p>
<p>Finally, unmatched tags are closed. This is done with a
<a href="http://code.google.com/p/html5lib/">knowledge of the semantics of HTML</a>. Additionally, a
<a href="http://golem.ph.utexas.edu/~distler/blog/archives/000165.html#sanitizespec">large
subset of MathML</a>, as well as a
<a href="http://www.w3.org/TR/SVGMobile/">tiny profile of SVG</a>
is also supported.</p>
<h3>Atom 1.0</h3>
<p>The Universal Feed Parser also
<a href="http://www.feedparser.org/docs/content-normalization.html">normalizes the content of feeds</a>. This involves a
<a href="http://www.feedparser.org/docs/reference.html">large number of elements</a>; the best place to start is to look at
<a href="http://www.feedparser.org/docs/annotated-examples.html">annotated examples</a>. Among other things a wide variety of
<a href="http://www.feedparser.org/docs/date-parsing.html">date formats</a>
are converted into
<a href="http://www.ietf.org/rfc/rfc3339.txt">RFC 3339</a> formatted dates.</p>
<p>If no <a href="http://www.feedparser.org/docs/reference-entry-id.html">ids</a> are found in entries, attempts are made to synthesize one using (in order):</p>
<ul>
<li><a href="http://www.feedparser.org/docs/reference-entry-link.html">link</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-title.html">title</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-summary.html">summary</a></li>
<li><a href="http://www.feedparser.org/docs/reference-entry-content.html">content</a></li>
</ul>
<p>If no <a href="http://www.feedparser.org/docs/reference-feed-updated.html">updated</a> dates are found in an entry, the updated date from
the feed is used. If no updated date is found in either the feed or
the entry, the current time is substituted.</p>
<h3 id="overrides">Overrides</h3>
<p>All of the above describes what Venus does automatically, either directly
or through its dependencies. There are a number of errors which can not
be corrected automatically, and for these, there are configuration parameters
that can be used to help.</p>
<ul>
<li><code>ignore_in_feed</code> allows you to list any number of elements
or attributes which are to be ignored in feeds. This is often handy in the
case of feeds where the <code>author</code>, <code>id</code>,
<code>updated</code> or <code>xml:lang</code> values can't be trusted.</li>
<li><code>title_type</code>, <code>summary_type</code>,
<code>content_type</code> allow you to override the
<a href="http://www.feedparser.org/docs/reference-entry-title_detail.html#reference.entry.title_detail.type"><code>type</code></a>
attributes on these elements.</li>
<li><code>name_type</code> does something similar for
<a href="http://www.feedparser.org/docs/reference-entry-author_detail.html#reference.entry.author_detail.name">author names</a></li>
<li><code>future_dates</code> allows you to specify how to deal with dates which are in the future.
<ul style="margin:0">
<li><code>ignore_date</code> will cause the date to be ignored (and will therefore default to the time the entry was first seen) until the feed is updated and the time indicated is past, at which point the entry will be updated with the new date.</li>
<li><code>ignore_entry</code> will cause the entire entry containing the future date to be ignored until the date is past.</li>
<li>Anything else (i.e., the default) will leave the date as is, causing the entries that contain these dates to sort to the top of the planet until the time passes.</li>
</ul>
</li>
<li><code>xml_base</code> will adjust the <code>xml:base</code> values in effect for each of the text constructs in the feed (things like <code>title</code>, <code>summary</code>, and <code>content</code>). Other elements in the feed (most notably <code>link</code>) are not affected by this value.
<ul style="margin:0">
<li><code>feed_alternate</code> will replace the <code>xml:base</code> in effect with the value of the <code>alternate</code> <code>link</code> found either in the enclosed <code>source</code> or enclosing <code>feed</code> element.</li>
<li><code>entry_alternate</code> will replace the <code>xml:base</code> in effect with the value of the <code>alternate</code> <code>link</code> found in this entry.</li>
<li>Any other value will be treated as a <a href="http://www.ietf.org/rfc/rfc3986.txt">URI reference</a>. These values may be relative or absolute. If relative, the <code>xml:base</code> values in each text construct will each be adjusted separately using the specified value.</li>
</ul>
</li>
</ul>
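<p>These overrides are placed in the relevant subscription section of the configuration file. For example (hypothetical feed URI, illustrative values):</p>
<blockquote><pre>[http://example.com/feed.xml]
ignore_in_feed = updated
future_dates = ignore_date</pre></blockquote>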
</body>
</html>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1280 1024" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<g id="feed">
<path d="M10,15l75,0l0,75l-75,0z" fill="#F80"
stroke-linejoin="round" stroke-width="20" stroke="#F80"/>
<circle cx="15" cy="82" r="6" fill="#FFF"/>
<path d="M35,82s0-20-20-20 M55,82s0-40-40-40 M75,82s0-60-60-60"
stroke-linecap="round" stroke-width="12" stroke="#FFF" fill="none"/>
</g>
<g id="entry">
<g fill="none">
<ellipse stroke="#689" rx="3" ry="22"/>
<ellipse stroke="#eb4" rx="3" ry="22" transform="rotate(-66)"/>
<ellipse stroke="#8ac" rx="3" ry="22" transform="rotate(66)"/>
<circle stroke="#451" r="22"/>
</g>
<g fill="#689" stroke="#FFF">
<circle fill="#8ac" r="6.5"/>
<circle cy="-22" r="4.5"/>
<circle cx="-20" cy="9" r="4.5"/>
<circle cx="20" cy="9" r="4.5"/>
</g>
</g>
<g id="node" stroke="none">
<circle r="18" fill="#049"/>
<path d="M-14,7a16,16,0,0,1,22-21a15,15,0,0,0-14,2a3,3,0,1,1-5,5
a15,15,0,0,0-3,14" fill="#FFF"/>
</g>
<path d="M-14-6a44,62,0,0,0,28,0l0,12a44,62,0,0,0-28,0z"
fill="#049" id="arc"/>
</defs>
<rect height="1024" width="1280" fill="#0D0"/>
<use xlink:href="#feed" x="220" y="30"/>
<use xlink:href="#feed" x="150" y="60"/>
<use xlink:href="#feed" x="100" y="100"/>
<use xlink:href="#feed" x="60" y="150"/>
<use xlink:href="#feed" x="30" y="220"/>
<g fill="#F00" stroke-linejoin="round" stroke-width="12" stroke="#F88">
<path d="M50,800l0,180l1000,0l0-180z" fill="#FFF"/>
<path d="M150,330l400,0l0,300l-400,0z"/>
<path d="M750,200l200,0 l0,110l100,0l0,60l-100,0 l0,40l100,0l0,60l-100,0
l0,40l100,0l0,60l-100,0 l0,130l70,70l-340,0l70,-70z"/>
</g>
<path d="M1080,360l100,0l0,-70l-30,-30l-70,0z" fill="#FFF"/>
<path d="M1180,290l-30,0l0,-30" fill="none" stroke="#000"/>
<use xlink:href="#feed" x="1080" y="380"/>
<g transform="translate(1080,500)">
<use xlink:href="#arc" transform="translate(76,50) rotate(90)"/>
<use xlink:href="#arc" transform="translate(50,35) rotate(-30)"/>
<use xlink:href="#arc" transform="translate(50,65) rotate(30)"/>
<use xlink:href="#node" transform="translate(24,50)"/>
<use xlink:href="#node" transform="translate(76,80)"/>
<use xlink:href="#node" transform="translate(76,20)"/>
</g>
<path d="M260,150s100,60,90,280 M170,270s150,0,180,120
M200,200s150,0,150,200l0,450m-100,-70l100,70l100,-70
M850,807l0,-200m-70,70l70,-70l70,70"
stroke="#000" fill="none" stroke-width="40"/>
<ellipse cx="350" cy="368" fill="#FFF" rx="80" ry="30"/>
<ellipse cx="850" cy="238" fill="#FFF" rx="80" ry="30"/>
<g font-size="32" fill="#FFF" text-anchor="middle">
<text x="350" y="380" fill="#F00">Spider</text>
<text x="350" y="460">Universal Feed Parser</text>
<text x="350" y="530">html5lib</text>
<text x="350" y="600">Reconstitute</text>
<text x="350" y="750">Filter(s)</text>
<text x="850" y="250" fill="#F00">Splice</text>
<text x="950" y="350">Template</text>
<text x="950" y="450">Template</text>
<text x="950" y="550">Template</text>
<text x="1126" y="330" fill="#000">HTML</text>
</g>
<use xlink:href="#entry" x="100" y="900"/>
<use xlink:href="#entry" x="180" y="950"/>
<use xlink:href="#entry" x="200" y="850"/>
<use xlink:href="#entry" x="290" y="920"/>
<use xlink:href="#entry" x="400" y="900"/>
<use xlink:href="#entry" x="470" y="840"/>
<use xlink:href="#entry" x="500" y="930"/>
<use xlink:href="#entry" x="570" y="870"/>
<use xlink:href="#entry" x="620" y="935"/>
<use xlink:href="#entry" x="650" y="835"/>
<use xlink:href="#entry" x="690" y="900"/>
<use xlink:href="#entry" x="720" y="835"/>
<use xlink:href="#entry" x="730" y="950"/>
<use xlink:href="#entry" x="760" y="900"/>
<use xlink:href="#entry" x="790" y="835"/>
<use xlink:href="#entry" x="800" y="950"/>
<use xlink:href="#entry" x="830" y="900"/>
<use xlink:href="#entry" x="860" y="835"/>
<use xlink:href="#entry" x="870" y="950"/>
<use xlink:href="#entry" x="900" y="900"/>
<use xlink:href="#entry" x="930" y="835"/>
<use xlink:href="#entry" x="940" y="950"/>
<use xlink:href="#entry" x="970" y="900"/>
<use xlink:href="#entry" x="1000" y="835"/>
<use xlink:href="#entry" x="1010" y="950"/>
</svg>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY categoryTerm "WebSemantique">
]>
<!--
This transformation is released under the same licence as Python
see http://www.intertwingly.net/code/venus/LICENCE.
Author: Eric van der Vlist <vdv@dyomedea.com>
This transformation is meant to be used as a filter that determines if
Atom entries are relevant to a specific topic and adds the corresponding
<category/> element when it is the case.
This is done by a simple keyword matching mechanism.
To customize this filter to your needs:
1) Replace WebSemantique by your own category name in the definition of
the categoryTerm entity above.
2) Review the "upper" and "lower" variables that are used to convert text
nodes to lower case and replace common punctuation signs with spaces
to check that they meet your needs.
3) Define your own list of keywords in <d:keyword/> elements. Note that
the leading and trailing spaces are significant: "> rdf <" will match rdf
as an entire word, while ">rdf<" would match the substring "rdf" and
"> rdf<" would match words starting with rdf. Also note that the test is done
after conversion to lowercase.
To use it with venus, just add this filter to the list of filters, for instance:
filters= categories.xslt guess_language.py
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:atom="http://www.w3.org/2005/Atom" xmlns="http://www.w3.org/2005/Atom"
xmlns:d="http://ns.websemantique.org/data/" exclude-result-prefixes="d atom" version="1.0">
<xsl:variable name="upper"
>,.;AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzÀàÁáÂâÃãÄäÅåÆæÇçÈèÉéÊêËëÌìÍíÎîÏïÐðÑñÒòÓóÔôÕõÖöØøÙùÚúÛûÜüÝýÞþ</xsl:variable>
<xsl:variable name="lower"
> aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzzaaaaaaaaaaaaææcceeeeeeeeiiiiiiiiððnnooooooooooøøuuuuuuuuyyþþ</xsl:variable>
<d:keywords>
<d:keyword> wiki semantique </d:keyword>
<d:keyword> wikis semantiques </d:keyword>
<d:keyword> web semantique </d:keyword>
<d:keyword> websemantique </d:keyword>
<d:keyword> semantic web</d:keyword>
<d:keyword> semweb</d:keyword>
<d:keyword> rdf</d:keyword>
<d:keyword> owl </d:keyword>
<d:keyword> sparql </d:keyword>
<d:keyword> topic map</d:keyword>
<d:keyword> doap </d:keyword>
<d:keyword> foaf </d:keyword>
<d:keyword> sioc </d:keyword>
<d:keyword> ontology </d:keyword>
<d:keyword> ontologie</d:keyword>
<d:keyword> dublin core </d:keyword>
</d:keywords>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="atom:entry/atom:updated">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
<xsl:variable name="concatenatedText">
<xsl:for-each select="../atom:title|../atom:summary|../atom:content|../atom:category/@term">
<xsl:text> </xsl:text>
<xsl:value-of select="translate(., $upper, $lower)"/>
</xsl:for-each>
<xsl:text> </xsl:text>
</xsl:variable>
<xsl:if test="document('')/*/d:keywords/d:keyword[contains($concatenatedText, .)]">
<category term="WebSemantique"/>
</xsl:if>
</xsl:template>
<xsl:template match="atom:category[@term='&categoryTerm;']"/>
</xsl:stylesheet>
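The keyword test above boils down to lower-casing the entry text, mapping common punctuation to spaces, padding with spaces, and checking whether any space-padded keyword appears as a substring. A minimal Python sketch of the same rule, assuming nothing beyond the description in the comment (function name and sample strings are hypothetical):

# -*- coding: utf-8 -*-
# Hypothetical re-statement of the categories.xslt matching rule.
def matches(text, keywords):
    # lower-case and turn common punctuation into spaces, mirroring the
    # $upper/$lower translate() tables in the stylesheet
    normalized = text.lower()
    for punct in ',.;':
        normalized = normalized.replace(punct, ' ')
    normalized = ' %s ' % normalized
    for keyword in keywords:
        # a keyword such as ' rdf ' only matches whole words; ' rdf' would
        # also match words starting with rdf, as described in the comment
        if keyword in normalized:
            return True
    return False

print matches("Notes on RDF and SPARQL endpoints", [' rdf ', ' sparql '])  # True
print matches("My herdfolk diary", [' rdf '])                              # False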
This filter is released under the same licence as Python
see http://www.intertwingly.net/code/venus/LICENCE.
Author: Eric van der Vlist <vdv@dyomedea.com>
This filter guesses whether an Atom entry is written
in English or French. It should be trivial to choose between
two other languages, easy to extend to more than two languages,
and useful to pass these languages as Venus configuration
parameters.
The code used to guess the language is the one that has been
described by Douglas Bagnall as the Python recipe titled
"Language detection using character trigrams"
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576.
To add support for a new language, this language must first be
"learned" using learn-language.py. This learning phase is nothing
more than saving a pickled version of the Trigram object for this
language.
To learn Finnish, you would execute:
$ ./learn-language.py http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt fi.data
where http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt is a text
representative of the Finnish language and "fi.data" is the name of the
data file for "fi" (ISO code for Finnish).
To install this filter, copy this directory under the Venus
filter directory and declare it in your filters list, for instance:
filters= categories.xslt guess-language/guess-language.py
NOTE: this filter depends on Amara
(http://uche.ogbuji.net/tech/4suite/amara/)
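Going past two languages mostly means comparing the entry's trigram against every learned language and keeping the closest match. A rough sketch of that selection step, assuming data files named <iso-code>.data as produced by learn-language.py (the helper below is hypothetical, not part of the shipped filter):

import cPickle
from trigram import Trigram

def guess(text, codes=('en', 'fr', 'fi')):
    # Hypothetical extension: load each learned Trigram and pick the
    # language with the smallest difference from the entry's text.
    t = Trigram()
    t.parseString(text)
    best, best_distance = None, None
    for code in codes:
        f = open('filters/guess-language/%s.data' % code, 'r')
        reference = cPickle.load(f)
        f.close()
        distance = reference - t
        if best_distance is None or distance < best_distance:
            best, best_distance = code, distance
    return best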
#!/usr/bin/env python
"""A filter to guess languages.
This filter guesses whether an Atom entry is written
in English or French. It should be trivial to choose between
two other languages, easy to extend to more than two languages,
and useful to pass these languages as Venus configuration
parameters.
(See the README file for more details).
Requires Python 2.1, recommends 2.4.
"""
__authors__ = [ "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"
import amara
from sys import stdin, stdout
from trigram import Trigram
from xml.dom import XML_NAMESPACE as XML_NS
import cPickle
ATOM_NSS = {
u'atom': u'http://www.w3.org/2005/Atom',
u'xml': XML_NS
}
langs = {}
def tri(lang):
if not langs.has_key(lang):
f = open('filters/guess-language/%s.data' % lang, 'r')
t = cPickle.load(f)
f.close()
langs[lang] = t
return langs[lang]
def guess_language(entry):
text = u'';
for child in entry.xml_xpath(u'atom:title|atom:summary|atom:content'):
text = text + u' '+ child.__unicode__()
t = Trigram()
t.parseString(text)
if tri('fr') - t > tri('en') - t:
lang=u'en'
else:
lang=u'fr'
entry.xml_set_attribute((u'xml:lang', XML_NS), lang)
def main():
feed = amara.parse(stdin, prefixes=ATOM_NSS)
for entry in feed.xml_xpath(u'//atom:entry[not(@xml:lang)]'):
guess_language(entry)
feed.xml(stdout)
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""A filter to guess languages.
This utility saves a Trigram object on file.
(See the REAME file for more details).
Requires Python 2.1, recommends 2.4.
"""
__authors__ = [ "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"
from trigram import Trigram
from sys import argv
from cPickle import dump
def main():
tri = Trigram(argv[1])
out = open(argv[2], 'w')
dump(tri, out)
out.close()
if __name__ == '__main__':
main()
#!/usr/bin/python
# -*- coding: UTF-8 -*-
"""
This class is based on the Python recipe titled
"Language detection using character trigrams"
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576
by Douglas Bagnall.
It has been (slightly) adapted by Eric van der Vlist to support
Unicode and accept a method to parse strings.
"""
__authors__ = [ "Douglas Bagnall", "Eric van der Vlist <vdv@dyomedea.com>"]
__license__ = "Python"
import random
from urllib import urlopen
class Trigram:
"""
From one or more text files, the frequency of three character
sequences is calculated. When treated as a vector, this information
can be compared to other trigrams, and the difference between them
seen as an angle. The cosine of this angle varies between 1 for
complete similarity, and 0 for utter difference. Since letter
combinations are characteristic of a language, this can be used to
determine the language of a body of text. For example:
>>> reference_en = Trigram('/path/to/reference/text/english')
>>> reference_de = Trigram('/path/to/reference/text/german')
>>> unknown = Trigram('url://pointing/to/unknown/text')
>>> unknown.similarity(reference_de)
0.4
>>> unknown.similarity(reference_en)
0.95
would indicate the unknown text is almost certainly English. As
syntax sugar, the minus sign is overloaded to return the difference
between texts, so the above objects would give you:
>>> unknown - reference_de
0.6
>>> reference_en - unknown # order doesn't matter.
0.05
As it stands, the Trigram ignores character set information, which
means you can only accurately compare within a single encoding
(iso-8859-1 in the examples). A more complete implementation might
convert to unicode first.
As an extra bonus, there is a method to make up nonsense words in the
style of the Trigram's text.
>>> reference_en.makeWords(30)
My withillonquiver and ald, by now wittlectionsurper, may sequia,
tory, I ad my notter. Marriusbabilly She lady for rachalle spen
hat knong al elf
Beware when using urls: HTML won't be parsed out.
Most methods chatter away to standard output, to let you know they're
still there.
"""
length = 0
def __init__(self, fn=None):
self.lut = {}
if fn is not None:
self.parseFile(fn)
def _parseAFragment(self, line, pair=' '):
for letter in line:
d = self.lut.setdefault(pair, {})
d[letter] = d.get(letter, 0) + 1
pair = pair[1] + letter
return pair
def parseString(self, string):
self._parseAFragment(string)
self.measure()
def parseFile(self, fn, encoding="iso-8859-1"):
pair = ' '
if '://' in fn:
#print "trying to fetch url, may take time..."
f = urlopen(fn)
else:
f = open(fn)
for z, line in enumerate(f):
#if not z % 1000:
# print "line %s" % z
# \n's are spurious in a prose context
pair = self._parseAFragment(line.strip().decode(encoding) + ' ')
f.close()
self.measure()
def measure(self):
"""calculates the scalar length of the trigram vector and
stores it in self.length."""
total = 0
for y in self.lut.values():
total += sum([ x * x for x in y.values() ])
self.length = total ** 0.5
def similarity(self, other):
"""returns a number between 0 and 1 indicating similarity.
1 means an identical ratio of trigrams;
0 means no trigrams in common.
"""
if not isinstance(other, Trigram):
raise TypeError("can't compare Trigram with non-Trigram")
lut1 = self.lut
lut2 = other.lut
total = 0
for k in lut1.keys():
if k in lut2:
a = lut1[k]
b = lut2[k]
for x in a:
if x in b:
total += a[x] * b[x]
return float(total) / (self.length * other.length)
def __sub__(self, other):
"""indicates difference between trigram sets; 1 is entirely
different, 0 is entirely the same."""
return 1 - self.similarity(other)
def makeWords(self, count):
"""returns a string of made-up words based on the known text."""
text = []
k = ' '
while count:
n = self.likely(k)
text.append(n)
k = k[1] + n
if n in ' \t':
count -= 1
return ''.join(text)
def likely(self, k):
"""Returns a character likely to follow the given string
two character string, or a space if nothing is found."""
if k not in self.lut:
return ' '
# if you were using this a lot, caching would be a good idea.
letters = []
for k, v in self.lut[k].items():
letters.append(k * v)
letters = ''.join(letters)
return random.choice(letters)
def test():
en = Trigram('http://gutenberg.net/dirs/etext97/lsusn11.txt')
#NB: fr and some others have English license text.
# 'no' (Norwegian) has English excerpts.
fr = Trigram('http://gutenberg.net/dirs/etext03/candi10.txt')
fi = Trigram('http://gutenberg.net/dirs/1/0/4/9/10492/10492-8.txt')
no = Trigram('http://gutenberg.net/dirs/1/2/8/4/12844/12844-8.txt')
se = Trigram('http://gutenberg.net/dirs/1/0/1/1/10117/10117-8.txt')
no2 = Trigram('http://gutenberg.net/dirs/1/3/0/4/13041/13041-8.txt')
en2 = Trigram('http://gutenberg.net/dirs/etext05/cfgsh10.txt')
fr2 = Trigram('http://gutenberg.net/dirs/1/3/7/0/13704/13704-8.txt')
print "calculating difference:"
print "en - fr is %s" % (en - fr)
print "fr - en is %s" % (fr - en)
print "en - en2 is %s" % (en - en2)
print "en - fr2 is %s" % (en - fr2)
print "fr - en2 is %s" % (fr - en2)
print "fr - fr2 is %s" % (fr - fr2)
print "fr2 - en2 is %s" % (fr2 - en2)
print "fi - fr is %s" % (fi - fr)
print "fi - en is %s" % (fi - en)
print "fi - se is %s" % (fi - se)
print "no - se is %s" % (no - se)
print "en - no is %s" % (en - no)
print "no - no2 is %s" % (no - no2)
print "se - no2 is %s" % (se - no2)
print "en - no2 is %s" % (en - no2)
print "fr - no2 is %s" % (fr - no2)
if __name__ == '__main__':
test()
# The xpath_sifter filter allows you to stop entries from a feed being displayed
# if they do not match a particular pattern.
# It is useful for things like only displaying entries in a particular category
# even if the site does not provide per category feeds, and displaying only entries
# that contain a particular string in their title.
# The xpath_sifter filter applies only after all feeds are normalised to Atom 1.0.
# Look in your cache to see what entries look like.
[Planet]
# we are only applying the filter to certain feeds, so we do not configure it in the
# [Planet] section
### FIRST FEED: FILTER ON CATEGORY ###
# We are only interested in entries in the category "two" from this blogger, but
# he does not provide a per-category feed.
# The Atom for categories looks like this: <category term="two"/>, so here
# we filter the http://example.com/uncategorised.xml file for entries with a
# category tag with the term attribute equal to 'two'
[http://example.com/uncategorised.xml]
name = Category 'two' (from Site Without a Categorised Feed)
# This first version is the readable way to do it, but you'll run into trouble
# if you have any special characters, like spaces, in your require string
# filters = xpath_sifter.py?require=//atom:category[@term='two']
# Here's a URL quoted version:
filters = xpath_sifter.py?require=//atom%3Acategory%5B%40term%3D%27two%27%5D
# Here's a way to get the URL quoted version on the command line:
# python -c "import urllib; print urllib.quote('STRING');"
# eg
# python -c "import urllib; print urllib.quote('atom:category[@term=\'two\']');"
### SECOND FEED: FILTER ON TITLE ###
# The verbose blogger whose feed is below blogs about many subjects but we are
# only interested in entries about Venus. She does not use categories but
# fortunately her titles are very consistent, so we search within the title
# tag's text for the text 'Venus'
[http://example.com/verbose.xml]
name = Venus (from Verbose Site)
# Non-quoted version
# filters = xpath_sifter.py?require=//atom:title[contains(.,'Venus')]
# Quoted version
filters = xpath_sifter.py?require=//atom%3Atitle%5Bcontains%28.%2C%27Venus%27%29%5D
### THIRD FEED: NO FILTER ###
# We can include other feeds that do not have the filter applied
[http://example.com/normal.xml]
name = No filter applied
# Planet configuration file
# Every planet needs a [Planet] section
[Planet]
# name: Your planet's name
# link: Link to the main page
# owner_name: Your name
# owner_email: Your e-mail address
name = Elias' Planet
link = http://torrez.us/planet/
owner_name = Elias Torres
owner_email = elias@torrez.us
# cache_directory: Where cached feeds are stored
# log_level: One of DEBUG, INFO, WARNING, ERROR or CRITICAL
cache_directory = /tmp/venus/
log_level = DEBUG
# The following provide defaults for each template:
# output_theme: "theme" of the output
# output_dir: Directory to place output files
# items_per_page: How many items to put on each page
output_theme = mobile
output_dir = /var/www/planet
items_per_page = 60
# If non-zero, all feeds which have not been updated in the indicated
# number of days will be marked as inactive
activity_threshold = 90
# filters to be run
filters = excerpt.py
# filter parameters
[excerpt.py]
omit = img p br
width = 500
# subscription list
[http://torrez.us/who#elias]
content_type = foaf
online_accounts =
http://del.icio.us/|http://del.icio.us/rss/{foaf:accountName}
http://flickr.com/|http://api.flickr.com/services/feeds/photos_public.gne?id={foaf:accountName}
# Planet configuration file
# Every planet needs a [Planet] section
[Planet]
# name: Your planet's name
# link: Link to the main page
# owner_name: Your name
# owner_email: Your e-mail address
name = Techmeme Leaderboard
link = http://planet.intertwingly.net/top100/
owner_name = Sam Ruby
owner_email = rubys@intertwingly.net
# cache_directory: Where cached feeds are stored
# log_level: One of DEBUG, INFO, WARNING, ERROR or CRITICAL
cache_directory = /home/rubys/planet/top100
log_level = INFO
# The following provide defaults for each template:
# output_theme: "theme" of the output
# output_dir: Directory to place output files
# items_per_page: How many items to put on each page
output_theme = mobile
output_dir = /home/rubys/public_html/top100
items_per_page = 60
# If non-zero, all feeds which have not been updated in the indicated
# number of days will be marked as inactive
activity_threshold = 90
# filters to be run
filters = excerpt.py
# Don't let any one feed monopolize the output (a symptom that often occurs
# when somebody 'migrates' their weblog).
new_feed_items = 4
bill_of_materials:
.htaccess
favicon.ico
robots.txt
# filter parameters
[excerpt.py]
omit = img p br
width = 500
# add memes to output
[index.html.xslt]
filters = mememe.plugin
[mememe.plugin]
sidebar = //*[@id="footer"]
# subscription list
[http://www.techmeme.com/lb.opml]
content_type = opml
# Planet configuration file based on the 'fancy' Planet 2.0 example.
#
# This illustrates some of Planet's fancier features by example.
# Every planet needs a [Planet] section
[Planet]
# name: Your planet's name
# link: Link to the main page
# owner_name: Your name
# owner_email: Your e-mail address
name = Planet Schmanet
link = http://planet.schmanet.janet/
owner_name = Janet Weiss
owner_email = janet@slut.sex
# cache_directory: Where cached feeds are stored
# log_level: One of DEBUG, INFO, WARNING, ERROR or CRITICAL
# feed_timeout: number of seconds to wait for any given feed
cache_directory = /home/rubys/planet/pscache
log_level = DEBUG
feed_timeout = 20
# output_theme: "theme" of the output
# output_dir: Directory to place output files
# items_per_page: How many items to put on each page
output_theme = classic_fancy
output_dir = /home/rubys/public_html/fancy
items_per_page = 60
# additional files to copy (note the per-feed substitution!)
bill_of_materials:
images/#{face}
# Options placed in the [DEFAULT] section provide defaults for the feed
# sections. Placing a default here means you only need to override the
# special cases later.
[DEFAULT]
# Hackergotchi default size.
# If we want to put a face alongside a feed, and it's this size, we
# can omit these variables.
facewidth = 65
faceheight = 85
# Any other section defines a feed to subscribe to. The section title
# (in the []s) is the URI of the feed itself. A section can also
# have any of the following options:
#
# name: Name of the feed (defaults to the title found in the feed)
#
# Additionally any other option placed here will be available in
# the template (prefixed with channel_ for the Items loop). We use
# this trick to make the faces work -- this isn't something Planet
# "natively" knows about. Look at fancy-examples/index.html.tmpl
# for the flip-side of this.
[http://www.netsplit.com/blog/index.rss]
name = Scott James Remnant
face = keybuk.png
# pick up the default facewidth and faceheight
[http://www.gnome.org/~jdub/blog/?flav=rss]
name = Jeff Waugh
face = jdub.png
facewidth = 70
faceheight = 74
[http://usefulinc.com/edd/blog/rss91]
name = Edd Dumbill
face = edd.png
facewidth = 62
faceheight = 80
[http://blog.clearairturbulence.org/?flav=rss]
name = Thom May
face = thom.png
# pick up the default faceheight only
facewidth = 59
#!/usr/bin/env python
"""
Main program to run just the expunge portion of planet
"""
import os.path
import sys
from planet import expunge, config
if __name__ == '__main__':
if len(sys.argv) == 2 and os.path.isfile(sys.argv[1]):
config.load(sys.argv[1])
expunge.expungeCache()
else:
print "Usage:"
print " python %s config.ini" % sys.argv[0]
import sys, socket
from planet import config, feedparser
from planet.spider import filename
from urllib2 import urlopen
from urlparse import urljoin
from html5lib import html5parser, treebuilders
from ConfigParser import ConfigParser
# load config files (default: config.ini)
for arg in sys.argv[1:]:
config.load(arg)
if len(sys.argv) == 1:
config.load('config.ini')
from Queue import Queue
from threading import Thread
# determine which subscriptions have no icon but do have an HTML page
fetch_queue = Queue()
html = ['text/html', 'application/xhtml+xml']
sources = config.cache_sources_directory()
for sub in config.subscriptions():
data=feedparser.parse(filename(sources,sub))
if data.feed.get('icon'): continue
if not data.feed.get('links'): continue
for link in data.feed.links:
if link.rel=='alternate' and link.type in html:
fetch_queue.put((sub, link.href))
break
# find the favicon for a given webpage
def favicon(page):
parser=html5parser.HTMLParser(tree=treebuilders.getTreeBuilder('dom'))
doc=parser.parse(urlopen(page))
favicon = urljoin(page, '/favicon.ico')
for link in doc.getElementsByTagName('link'):
if link.hasAttribute('rel') and link.hasAttribute('href'):
if 'icon' in link.attributes['rel'].value.lower().split(' '):
favicon = urljoin(page, link.attributes['href'].value)
if urlopen(favicon).info()['content-length'] != '0':
return favicon
# thread worker that fills in the dictionary which maps subs to favicon
icons = {}
def fetch(thread_index, fetch_queue, icons):
while 1:
sub, html = fetch_queue.get()
if not html: break
try:
icon = favicon(html)
if icon: icons[sub] = icon
except:
pass
# set timeout
try:
socket.setdefaulttimeout(float(config.feed_timeout()))
except:
pass
# (optionally) spawn threads, fetch pages
threads = {}
if int(config.spider_threads()):
for i in range(int(config.spider_threads())):
threads[i] = Thread(target=fetch, args=(i, fetch_queue, icons))
fetch_queue.put((None, None))
threads[i].start()
for i in range(int(config.spider_threads())):
threads[i].join()
else:
fetch_queue.put((None, None))
fetch(0, fetch_queue, icons)
# produce config file
config = ConfigParser()
for sub, icon in icons.items():
config.add_section(sub)
config.set(sub, 'favicon', icon)
config.write(sys.stdout)
<html xmlns:py="http://genshi.edgewall.org/" py:strip="">
<!--! insert search form -->
<div py:match="div[@id='sidebar']" py:attrs="select('@*')">
${select('*')}
<h2>Search</h2>
<form><input name="q"/></form>
</div>
<?python from urlparse import urljoin ?>
<!--! insert opensearch autodiscovery link -->
<head py:match="head" py:attrs="select('@*')">
${select('*')}
<link rel="search" type="application/opensearchdescription+xml"
href="${urljoin(str(select('link[@rel=\'alternate\']/@href')),
'opensearchdescription.xml')}"
title="${select('link[@rel=\'alternate\']/@title')} search"/>
</head>
<!--! ensure that scripts don't use empty tag syntax -->
<script py:match="script" py:attrs="select('@*')">
${select('*')}
</script>
<!--! Include the original stream, which will be processed by the rules
defined above -->
${input}
</html>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml">
<!-- insert search form -->
<xsl:template match="xhtml:div[@id='sidebar']">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<h2>Search</h2>
<form><input name="q"/></form>
</xsl:copy>
</xsl:template>
<!-- function to return baseuri of a given string -->
<xsl:template name="baseuri">
<xsl:param name="string" />
<xsl:if test="contains($string, '/')">
<xsl:value-of select="substring-before($string, '/')"/>
<xsl:text>/</xsl:text>
<xsl:call-template name="baseuri">
<xsl:with-param name="string">
<xsl:value-of select="substring-after($string, '/')"/>
</xsl:with-param>
</xsl:call-template>
</xsl:if>
</xsl:template>
<!-- insert opensearch autodiscovery link -->
<xsl:template match="xhtml:head">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<link rel="search" type="application/opensearchdescription+xml" title="{xhtml:link[@rel='alternate']/@title} search">
<xsl:attribute name="href">
<xsl:call-template name="baseuri">
<xsl:with-param name="string">
<xsl:value-of select="xhtml:link[@rel='alternate']/@href"/>
</xsl:with-param>
</xsl:call-template>
<xsl:text>opensearchdescription.xml</xsl:text>
</xsl:attribute>
</link>
</xsl:copy>
</xsl:template>
<!-- ensure that scripts don't use empty tag syntax -->
<xsl:template match="xhtml:script">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<xsl:if test="not(node())">
<xsl:comment><!--HTML Compatibility--></xsl:comment>
</xsl:if>
</xsl:copy>
</xsl:template>
<!-- add HTML5 doctype -->
<xsl:template match="/xhtml:html">
<xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;</xsl:text>
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- pass through everything else -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
"""
Remap all images to take advantage of the Coral Content Distribution
Network <http://www.coralcdn.org/>.
"""
import re, sys, urlparse, xml.dom.minidom
entry = xml.dom.minidom.parse(sys.stdin).documentElement
for node in entry.getElementsByTagName('img'):
if node.hasAttribute('src'):
component = list(urlparse.urlparse(node.getAttribute('src')))
if component[0] == 'http':
component[1] = re.sub(r':(\d+)$', r'.\1', component[1])
component[1] += '.nyud.net:8080'
node.setAttribute('src', urlparse.urlunparse(component))
print entry.toxml('utf-8')
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<!-- Replace atom:author/atom:name with the byline author -->
<xsl:template match="atom:entry/atom:author[../atom:content/xhtml:div/xhtml:span[@class='byline-author' and substring(.,1,10)='Posted by ']]">
<xsl:copy>
<atom:name>
<xsl:value-of select="substring(../atom:content/xhtml:div/xhtml:span[@class='byline-author'],11)"/>
</atom:name>
<xsl:apply-templates select="*[not(self::atom:name)]"/>
</xsl:copy>
</xsl:template>
<!-- Remove byline author -->
<xsl:template match="xhtml:div/xhtml:span[@class='byline-author' and substring(.,1,10)='Posted by ']"/>
<!-- Remove two line breaks following byline author -->
<xsl:template match="xhtml:br[preceding-sibling::*[1][@class='byline-author' and substring(.,1,10)='Posted by ']]"/>
<xsl:template match="xhtml:br[preceding-sibling::*[2][@class='byline-author' and substring(.,1,10)='Posted by ']]"/>
<!-- pass through everything else -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<!-- If the first paragraph consists exclusively of "By author-name",
delete it -->
<xsl:template match="atom:content/xhtml:div/xhtml:p[1][. =
concat('By ', ../../../atom:author/atom:name)]"/>
<!-- pass through everything else -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<!-- If the first paragraph contains @class="from", delete it -->
<xsl:template match="atom:content/xhtml:div/xhtml:div[@class='comment']/xhtml:p[1][@class='from']"/>
<!-- pass through everything else -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns="http://www.w3.org/1999/xhtml">
<!-- only retain titles that don't duplicate summary or content -->
<xsl:template match="atom:title">
<xsl:copy>
<xsl:if test="string-length(.) &lt; 30 or
( substring(.,1,string-length(.)-3) !=
substring(../atom:content,1,string-length(.)-3) and
substring(.,1,string-length(.)-3) !=
substring(../atom:summary,1,string-length(.)-3) )">
<xsl:apply-templates select="@*|node()"/>
</xsl:if>
</xsl:copy>
</xsl:template>
<!-- pass through everything else -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
"""
Generate an excerpt from either the summary or the content of an entry.
Parameters:
width: maximum number of characters in the excerpt. Default: 500
omit: whitespace delimited list of html tags to remove. Default: none
target: name of element created. Default: planet:excerpt
Notes:
* if 'img' is in the list of tags to be omitted, <img> tags are replaced with
hypertext links associated with the value of the 'alt' attribute. If there
is no alt attribute value, <img> is used instead. If the parent element
of the img tag is already an <a> tag, no additional hypertext links are
added.
"""
import sys, xml.dom.minidom, textwrap
from xml.dom import Node, minidom
atomNS = 'http://www.w3.org/2005/Atom'
planetNS = 'http://planet.intertwingly.net/'
args = dict(zip([name.lstrip('-') for name in sys.argv[1::2]], sys.argv[2::2]))
wrapper = textwrap.TextWrapper(width=int(args.get('width','500')))
omit = args.get('omit', '').split()
target = args.get('target', 'planet:excerpt')
class copy:
""" recursively copy a source to a target, up to a given width """
def __init__(self, dom, source, target):
self.dom = dom
self.full = False
self.text = []
self.textlen = 0
self.copyChildren(source, target)
def copyChildren(self, source, target):
""" copy child nodes of a source to the target """
for child in source.childNodes:
if child.nodeType == Node.ELEMENT_NODE:
self.copyElement(child, target)
elif child.nodeType == Node.TEXT_NODE:
self.copyText(child.data, target)
if self.full: break
def copyElement(self, source, target):
""" copy source element to the target """
# check the omit list
if source.nodeName in omit:
if source.nodeName == 'img':
return self.elideImage(source, target)
return self.copyChildren(source, target)
# copy element, attributes, and children
child = self.dom.createElementNS(source.namespaceURI, source.nodeName)
target.appendChild(child)
for i in range(0, source.attributes.length):
attr = source.attributes.item(i)
child.setAttributeNS(attr.namespaceURI, attr.name, attr.value)
self.copyChildren(source, child)
def elideImage(self, source, target):
""" copy an elided form of the image element to the target """
alt = source.getAttribute('alt') or '<img>'
src = source.getAttribute('src')
if target.nodeName == 'a' or not src:
self.copyText(alt, target)
else:
child = self.dom.createElement('a')
child.setAttribute('href', src)
self.copyText(alt, child)
target.appendChild(child)
def copyText(self, source, target):
""" copy text to the target, until the point where it would wrap """
if not source.isspace() and source.strip():
self.text.append(source.strip())
lines = wrapper.wrap(' '.join(self.text))
if len(lines) == 1:
target.appendChild(self.dom.createTextNode(source))
self.textlen = len(lines[0])
elif lines:
excerpt = source[:len(lines[0])-self.textlen] + u' \u2026'
target.appendChild(dom.createTextNode(excerpt))
self.full = True
# select summary or content element
dom = minidom.parse(sys.stdin)
source = dom.getElementsByTagNameNS(atomNS, 'summary')
if not source:
source = dom.getElementsByTagNameNS(atomNS, 'content')
# if present, recursively copy it to a planet:excerpt element
if source:
if target.startswith('planet:'):
dom.documentElement.setAttribute('xmlns:planet', planetNS)
if target.startswith('atom:'): target = target.split(':',1)[1]
excerpt = dom.createElementNS(planetNS, target)
source[0].parentNode.appendChild(excerpt)
copy(dom, source[0], excerpt)
if source[0].nodeName == excerpt.nodeName:
source[0].parentNode.removeChild(source[0])
# print out results
print dom.toxml('utf-8')
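The parameter handling at the top of this filter just zips alternating command-line tokens into a dict, so width, omit and target arrive as plain name/value pairs. A quick illustration of that idiom with made-up values:

# Hypothetical invocation: excerpt.py --width 500 --omit "img p br"
argv = ['excerpt.py', '--width', '500', '--omit', 'img p br']
args = dict(zip([name.lstrip('-') for name in argv[1::2]], argv[2::2]))
print args   # {'width': '500', 'omit': 'img p br'} (key order may vary)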
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<!-- Replace title with value of h1, if present -->
<xsl:template match="atom:title">
<xsl:apply-templates select="@*"/>
<xsl:copy>
<xsl:choose>
<xsl:when test="count(//xhtml:h1) = 1">
<xsl:value-of select="normalize-space(//xhtml:h1)"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="node()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
<!-- Remove all h1s -->
<xsl:template match="xhtml:h1"/>
<!-- pass through everything else -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
import sys
import html5lib
tree=html5lib.treebuilders.dom.TreeBuilder
parser = html5lib.html5parser.HTMLParser(tree=tree)
document = parser.parse(sys.stdin)
sys.stdout.write(document.toxml("utf-8"))
#
# Ensure that all headings are below a permissible maximum (like h3).
# If not, all heading levels will be changed to conform.
# Note: this may create "illegal" heading levels, like h7 and beyond.
#
import sys
from xml.dom import minidom, XHTML_NAMESPACE
# determine permissible minimum heading
if '--min' in sys.argv:
minhead = int(sys.argv[sys.argv.index('--min')+1])
else:
minhead=3
# parse input stream
doc = minidom.parse(sys.stdin)
# search for headings below the permissible minimum
first=minhead
for i in range(1,minhead):
if doc.getElementsByTagName('h%d' % i):
first=i
break
# if found, bump all headings so that the top is the permissible minimum
if first < minhead:
for i in range(6,0,-1):
for oldhead in doc.getElementsByTagName('h%d' % i):
newhead = doc.createElementNS(XHTML_NAMESPACE, 'h%d' % (i+minhead-first))
for child in oldhead.childNodes[:]:
newhead.appendChild(child)
oldhead.parentNode.replaceChild(newhead, oldhead)
# return (possibly modified) document
print doc.toxml('utf-8')
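In other words, with the default minimum of 3 and a document whose most prominent heading is h1, every heading level is shifted by minhead - first = 2. A throwaway illustration of the remapping (standalone, not part of the filter):

# Hypothetical illustration of the heading shift applied above
minhead, first = 3, 1   # permissible minimum, most prominent heading found
for i in range(1, 7):
    print 'h%d -> h%d' % (i, i + minhead - first)
# h1 -> h3, h2 -> h4, ..., h5 -> h7 (hence the note about "illegal" levels)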
#remove all tweets
import sys
data = sys.stdin.read()
if data.find('<id>tag:twitter.com,') < 0:
sys.stdout.write(data)
import sys, re
# parse options
options = dict(zip(sys.argv[1::2],sys.argv[2::2]))
# read entry
doc = data = sys.stdin.read()
# Apply a sequence of patterns which turn a normalized Atom entry into
# a stream of text, after removal of non-human metadata.
for pattern,replacement in [
(re.compile('<id>.*?</id>'),' '),
(re.compile('<url>.*?</url>'),' '),
(re.compile('<source>.*?</source>'),' '),
(re.compile('<updated.*?</updated>'),' '),
(re.compile('<published.*?</published>'),' '),
(re.compile('<link .*?>'),' '),
(re.compile('''<[^>]* alt=['"]([^'"]*)['"].*?>'''),r' \1 '),
(re.compile('''<[^>]* title=['"]([^'"]*)['"].*?>'''),r' \1 '),
(re.compile('''<[^>]* label=['"]([^'"]*)['"].*?>'''),r' \1 '),
(re.compile('''<[^>]* term=['"]([^'"]*)['"].*?>'''),r' \1 '),
(re.compile('<.*?>'),' '),
(re.compile('\s+'),' '),
(re.compile('&gt;'),'>'),
(re.compile('&lt;'),'<'),
(re.compile('&apos;'),"'"),
(re.compile('&quot;'),'"'),
(re.compile('&amp;'),'&'),
(re.compile('\s+'),' ')
]:
data=pattern.sub(replacement,data)
# process requirements
if options.has_key('--require'):
for regexp in options['--require'].split('\n'):
if regexp and not re.search(regexp,data): sys.exit(1)
# process exclusions
if options.has_key('--exclude'):
for regexp in options['--exclude'].split('\n'):
if regexp and re.search(regexp,data): sys.exit(1)
# if we get this far, the feed is to be included
print doc
s|<p><a href="http://[a-zA-Z0-9\-\.]*/~a/[a-zA-Z0-9]*?a=[a-zA-Z0-9]*"><img border="0" src="http://[a-zA-Z0-9\.\-]*/~a/[a-zA-Z0-9/]*?i=[a-zA-Z0-9]*"/></a></p>||g
s|<p><map name="google_ad_map.*</p>||
s|<p><!-- begin(Yahoo ad) -->.*<!-- end(Yahoo ad) --></p>||
# Example usages:
#
# filters:
# xhtml2html.plugin?quote_attr_values=True&quote_char="'"
#
# -- or --
#
# [xhtml2html.plugin]
# quote_attr_values=True
# quote_char="'"
import sys
opts = {}
for name,value in zip(sys.argv[1::2],sys.argv[2::2]):
name = name.lstrip('-')
try: opts[name] = eval(value)
except: opts[name] = value
try:
from xml.dom import minidom
doc = minidom.parse(sys.stdin)
except:
from html5lib import liberalxmlparser, treebuilders
parser = liberalxmlparser.XHTMLParser(tree=treebuilders.getTreeBuilder('dom'))
doc = parser.parse(sys.stdin, encoding='utf-8')
from html5lib import treewalkers, serializer
tokens = treewalkers.getTreeWalker('dom')(doc)
serializer = serializer.HTMLSerializer(**dict(opts))
for text in serializer.serialize(tokens, encoding='utf-8'):
sys.stdout.write(text)
import sys, libxml2
# parse options
options = dict(zip(sys.argv[1::2],sys.argv[2::2]))
# parse entry
doc = libxml2.parseDoc(sys.stdin.read())
ctxt = doc.xpathNewContext()
ctxt.xpathRegisterNs('atom','http://www.w3.org/2005/Atom')
ctxt.xpathRegisterNs('xhtml','http://www.w3.org/1999/xhtml')
# process requirements
if options.has_key('--require'):
for xpath in options['--require'].split('\n'):
if xpath and not ctxt.xpathEval(xpath): sys.exit(1)
# process exclusions
if options.has_key('--exclude'):
for xpath in options['--exclude'].split('\n'):
if xpath and ctxt.xpathEval(xpath): sys.exit(1)
# if we get this far, the feed is to be included
print doc
#!/usr/bin/env python
"""The Planet aggregator.
A flexible and easy-to-use aggregator for generating websites.
Visit http://www.planetplanet.org/ for more information and to download
the latest version.
Requires Python 2.1, recommends 2.3.
"""
__authors__ = [ "Scott James Remnant <scott@netsplit.com>",
"Jeff Waugh <jdub@perkypants.org>" ]
__license__ = "Python"
import os, sys
if __name__ == "__main__":
config_file = []
offline = 0
verbose = 0
only_if_new = 0
expunge = 0
debug_splice = 0
no_publish = 0
for arg in sys.argv[1:]:
if arg == "-h" or arg == "--help":
print "Usage: planet [options] [CONFIGFILE]"
print
print "Options:"
print " -v, --verbose DEBUG level logging during update"
print " -o, --offline Update the Planet from the cache only"
print " -h, --help Display this help message and exit"
print " -n, --only-if-new Only spider new feeds"
print " -x, --expunge Expunge old entries from cache"
print " --no-publish Do not publish feeds using PubSubHubbub"
print
sys.exit(0)
elif arg == "-v" or arg == "--verbose":
verbose = 1
elif arg == "-o" or arg == "--offline":
offline = 1
elif arg == "-n" or arg == "--only-if-new":
only_if_new = 1
elif arg == "-x" or arg == "--expunge":
expunge = 1
elif arg == "-d" or arg == "--debug-splice":
debug_splice = 1
elif arg == "--no-publish":
no_publish = 1
elif arg.startswith("-"):
print >>sys.stderr, "Unknown option:", arg
sys.exit(1)
else:
config_file.append(arg)
from planet import config
config.load(config_file or 'config.ini')
if verbose:
import planet
planet.getLogger('DEBUG',config.log_format())
if not offline:
from planet import spider
try:
spider.spiderPlanet(only_if_new=only_if_new)
except Exception, e:
print e
from planet import splice
doc = splice.splice()
if debug_splice:
from planet import logger
logger.info('writing debug.atom')
debug=open('debug.atom','w')
try:
from lxml import etree
from StringIO import StringIO
tree = etree.parse(StringIO(doc.toxml()))
debug.write(etree.tostring(tree, pretty_print=True))
except:
debug.write(doc.toprettyxml(indent=' ', encoding='utf-8'))
debug.close()
splice.apply(doc.toxml('utf-8'))
if config.pubsubhubbub_hub() and not no_publish:
from planet import publish
publish.publish(config)
if expunge:
from planet import expunge
expunge.expungeCache()
xmlns = 'http://planet.intertwingly.net/'
logger = None
loggerParms = None
import os, sys, re
import config
config.__init__()
from ConfigParser import ConfigParser
from urlparse import urljoin
def getLogger(level, format):
""" get a logger with the specified log level """
global logger, loggerParms
if logger and loggerParms == (level,format): return logger
try:
import logging
logging.basicConfig(format=format)
except:
import compat_logging as logging
logging.basicConfig(format=format)
logger = logging.getLogger("planet.runner")
logger.setLevel(logging.getLevelName(level))
try:
logger.warning
except:
logger.warning = logger.warn
loggerParms = (level,format)
return logger
sys.path.insert(1, os.path.join(os.path.dirname(__file__),'vendor'))
# Configure feed parser
import feedparser
feedparser.SANITIZE_HTML=1
feedparser.RESOLVE_RELATIVE_URIS=0
import publish
from ConfigParser import ConfigParser
import csv
# input = csv, output = ConfigParser
def csv2config(input, config=None):
if not hasattr(input, 'read'):
input = csv.StringIO(input)
if not config:
config = ConfigParser()
reader = csv.DictReader(input)
for row in reader:
section = row[reader.fieldnames[0]]
if not config.has_section(section):
config.add_section(section)
for name, value in row.items():
if value and name != reader.fieldnames[0]:
config.set(section, name, value)
return config
if __name__ == "__main__":
# small main program which converts CSV into config.ini format
import sys, urllib
config = ConfigParser()
for input in sys.argv[1:]:
csv2config(urllib.urlopen(input), config)
config.write(sys.stdout)
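As the __main__ block suggests, the first CSV column names the feed and becomes the section title, while the remaining columns become per-feed options. A hedged example of feeding csv2config a literal CSV string (sample data made up; the import path assumes this module lives under the planet package, as it does in Venus):

import sys
from planet.csv_config import csv2config   # module path assumed

sample = "url,name\nhttp://example.com/feed.xml,Example Feed\n"
config = csv2config(sample)
config.write(sys.stdout)
# expected output:
# [http://example.com/feed.xml]
# name = Example Feed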
""" Expunge old entries from a cache of entries """
import glob, os, planet, config, feedparser
from xml.dom import minidom
from spider import filename
def expungeCache():
""" Expunge old entries from a cache of entries """
log = planet.logger
log.info("Determining feed subscriptions")
entry_count = {}
sources = config.cache_sources_directory()
for sub in config.subscriptions():
data=feedparser.parse(filename(sources,sub))
if not data.feed.has_key('id'): continue
if config.feed_options(sub).has_key('cache_keep_entries'):
entry_count[data.feed.id] = int(config.feed_options(sub)['cache_keep_entries'])
else:
entry_count[data.feed.id] = config.cache_keep_entries()
log.info("Listing cached entries")
cache = config.cache_directory()
dir=[(os.stat(file).st_mtime,file) for file in glob.glob(cache+"/*")
if not os.path.isdir(file)]
dir.sort()
dir.reverse()
for mtime,file in dir:
try:
entry=minidom.parse(file)
# determine source of entry
entry.normalize()
sources = entry.getElementsByTagName('source')
if not sources:
# no source determined, do not delete
log.debug("No source found for %s", file)
continue
ids = sources[0].getElementsByTagName('id')
if not ids:
# feed id not found, do not delete
log.debug("No source feed id found for %s", file)
continue
if ids[0].childNodes[0].nodeValue in entry_count:
# subscribed to feed, update entry count
entry_count[ids[0].childNodes[0].nodeValue] = entry_count[
ids[0].childNodes[0].nodeValue] - 1
if entry_count[ids[0].childNodes[0].nodeValue] >= 0:
# maximum not reached, do not delete
log.debug("Maximum not reached for %s from %s",
file, ids[0].childNodes[0].nodeValue)
continue
else:
# maximum reached
log.debug("Removing %s, maximum reached for %s",
file, ids[0].childNodes[0].nodeValue)
else:
# not subscribed
log.debug("Removing %s, not subscribed to %s",
file, ids[0].childNodes[0].nodeValue)
# remove old entry
os.unlink(file)
except:
log.error("Error parsing %s", file)
# end of expungeCache()
from ConfigParser import ConfigParser
inheritable_options = [ 'online_accounts' ]
def load_accounts(config, section):
accounts = {}
if(config.has_option(section, 'online_accounts')):
values = config.get(section, 'online_accounts')
for account_map in values.split('\n'):
try:
homepage, map = account_map.split('|')
accounts[homepage] = map
except:
pass
return accounts
def load_model(rdf, base_uri):
if hasattr(rdf, 'find_statements'):
return rdf
if hasattr(rdf, 'read'):
rdf = rdf.read()
def handler(code, level, facility, message, line, column, byte, file, uri):
pass
from RDF import Model, Parser
model = Model()
Parser().parse_string_into_model(model,rdf,base_uri,handler)
return model
# input = foaf, output = ConfigParser
def foaf2config(rdf, config, subject=None, section=None):
if not config or not config.sections():
return
# there should only be 1 section
if not section: section = config.sections().pop()
try:
from RDF import Model, NS, Parser, Statement
except:
return
# account mappings, none by default
# form: accounts = {url to service homepage (as found in FOAF)}|{URI template}\n*
# example: http://del.icio.us/|http://del.icio.us/rss/{foaf:accountName}
accounts = load_accounts(config, section)
depth = 0
if(config.has_option(section, 'depth')):
depth = config.getint(section, 'depth')
model = load_model(rdf, section)
dc = NS('http://purl.org/dc/elements/1.1/')
foaf = NS('http://xmlns.com/foaf/0.1/')
rdfs = NS('http://www.w3.org/2000/01/rdf-schema#')
rdf = NS('http://www.w3.org/1999/02/22-rdf-syntax-ns#')
rss = NS('http://purl.org/rss/1.0/')
for statement in model.find_statements(Statement(subject,foaf.weblog,None)):
# feed owner
person = statement.subject
# title is required (at the moment)
title = model.get_target(person,foaf.name)
if not title: title = model.get_target(statement.object,dc.title)
if not title:
continue
# blog is optional
feed = model.get_target(statement.object,rdfs.seeAlso)
if feed and rss.channel == model.get_target(feed, rdf.type):
feed = str(feed.uri)
if not config.has_section(feed):
config.add_section(feed)
config.set(feed, 'name', str(title))
# now look for OnlineAccounts for the same person
if accounts.keys():
for statement in model.find_statements(Statement(person,foaf.holdsAccount,None)):
rdfaccthome = model.get_target(statement.object,foaf.accountServiceHomepage)
rdfacctname = model.get_target(statement.object,foaf.accountName)
if not rdfaccthome or not rdfacctname: continue
if not rdfaccthome.is_resource() or not accounts.has_key(str(rdfaccthome.uri)): continue
if not rdfacctname.is_literal(): continue
rdfacctname = rdfacctname.literal_value['string']
rdfaccthome = str(rdfaccthome.uri)
# shorten feed title a bit
try:
servicetitle = rdfaccthome.replace('http://','').split('/')[0]
except:
servicetitle = rdfaccthome
feed = accounts[rdfaccthome].replace("{foaf:accountName}", rdfacctname)
if not config.has_section(feed):
config.add_section(feed)
config.set(feed, 'name', "%s (%s)" % (title, servicetitle))
if depth > 0:
# now the fun part, let's go after more friends
for statement in model.find_statements(Statement(person,foaf.knows,None)):
friend = statement.object
# let's be safe
if friend.is_literal(): continue
seeAlso = model.get_target(friend,rdfs.seeAlso)
# nothing to see
if not seeAlso or not seeAlso.is_resource(): continue
seeAlso = str(seeAlso.uri)
if not config.has_section(seeAlso):
config.add_section(seeAlso)
copy_options(config, section, seeAlso,
{ 'content_type' : 'foaf',
'depth' : str(depth - 1) })
try:
from planet.config import downloadReadingList
downloadReadingList(seeAlso, config,
lambda data, subconfig : friend2config(model, friend, seeAlso, subconfig, data),
False)
except:
pass
return
def copy_options(config, parent_section, child_section, overrides = {}):
global inheritable_options
for option in [x for x in config.options(parent_section) if x in inheritable_options]:
if not overrides.has_key(option):
config.set(child_section, option, config.get(parent_section, option))
for option, value in overrides.items():
config.set(child_section, option, value)
def friend2config(friend_model, friend, seeAlso, subconfig, data):
try:
from RDF import Model, NS, Parser, Statement
except:
return
dc = NS('http://purl.org/dc/elements/1.1/')
foaf = NS('http://xmlns.com/foaf/0.1/')
rdf = NS('http://www.w3.org/1999/02/22-rdf-syntax-ns#')
rdfs = NS('http://www.w3.org/2000/01/rdf-schema#')
# FOAF InverseFunctionalProperties
ifps = [foaf.mbox, foaf.mbox_sha1sum, foaf.jabberID, foaf.aimChatID,
foaf.icqChatID, foaf.yahooChatID, foaf.msnChatID, foaf.homepage, foaf.weblog]
model = load_model(data, seeAlso)
for statement in model.find_statements(Statement(None,rdf.type,foaf.Person)):
samefriend = statement.subject
# maybe they have the same uri
if friend.is_resource() and samefriend.is_resource() and friend == samefriend:
foaf2config(model, subconfig, samefriend)
return
for ifp in ifps:
object = model.get_target(samefriend,ifp)
if object and object == friend_model.get_target(friend, ifp):
foaf2config(model, subconfig, samefriend)
return
if __name__ == "__main__":
import sys, urllib
config = ConfigParser()
for uri in sys.argv[1:]:
config.add_section(uri)
foaf2config(urllib.urlopen(uri), config, section=uri)
config.remove_section(uri)
config.write(sys.stdout)
from glob import glob
import os, sys
if __name__ == '__main__':
rootdir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, rootdir)
from planet.spider import filename
from planet import config
def open():
try:
cache = config.cache_directory()
index=os.path.join(cache,'index')
if not os.path.exists(index): return None
import dbhash
return dbhash.open(filename(index, 'id'),'w')
except Exception, e:
if e.__class__.__name__ == 'DBError': e = e.args[-1]
from planet import logger as log
log.error(str(e))
def destroy():
from planet import logger as log
cache = config.cache_directory()
index=os.path.join(cache,'index')
if not os.path.exists(index): return None
idindex = filename(index, 'id')
if os.path.exists(idindex): os.unlink(idindex)
os.removedirs(index)
log.info(idindex + " deleted")
def create():
from planet import logger as log
cache = config.cache_directory()
index=os.path.join(cache,'index')
if not os.path.exists(index): os.makedirs(index)
import dbhash
index = dbhash.open(filename(index, 'id'),'c')
try:
import libxml2
except:
libxml2 = False
from xml.dom import minidom
for file in glob(cache+"/*"):
if os.path.isdir(file):
continue
elif libxml2:
try:
doc = libxml2.parseFile(file)
ctxt = doc.xpathNewContext()
ctxt.xpathRegisterNs('atom','http://www.w3.org/2005/Atom')
entry = ctxt.xpathEval('/atom:entry/atom:id')
source = ctxt.xpathEval('/atom:entry/atom:source/atom:id')
if entry and source:
index[filename('',entry[0].content)] = source[0].content
doc.freeDoc()
except:
log.error(file)
else:
try:
doc = minidom.parse(file)
doc.normalize()
ids = doc.getElementsByTagName('id')
entry = [e for e in ids if e.parentNode.nodeName == 'entry']
source = [e for e in ids if e.parentNode.nodeName == 'source']
if entry and source:
index[filename('',entry[0].childNodes[0].nodeValue)] = \
source[0].childNodes[0].nodeValue
doc.unlink()
except:
log.error(file)
log.info(str(len(index.keys())) + " entries indexed")
index.close()
return open()
if __name__ == '__main__':
if len(sys.argv) < 2:
print 'Usage: %s [-c|-d]' % sys.argv[0]
sys.exit(1)
config.load(sys.argv[1])
if len(sys.argv) > 2 and sys.argv[2] == '-c':
create()
elif len(sys.argv) > 2 and sys.argv[2] == '-d':
destroy()
else:
from planet import logger as log
index = open()
if index:
log.info(str(len(index.keys())) + " entries indexed")
index.close()
else:
log.info("no entries indexed")
from xml.sax import ContentHandler, make_parser, SAXParseException
from xml.sax.xmlreader import InputSource
from sgmllib import SGMLParser
from cStringIO import StringIO
from ConfigParser import ConfigParser
from htmlentitydefs import entitydefs
import re
# input = opml, output = ConfigParser
def opml2config(opml, config=None):
if hasattr(opml, 'read'):
opml = opml.read()
if not config:
config = ConfigParser()
opmlParser = OpmlParser(config)
try:
# try SAX
source = InputSource()
source.setByteStream(StringIO(opml))
parser = make_parser()
parser.setContentHandler(opmlParser)
parser.parse(source)
except SAXParseException:
# try as SGML
opmlParser.feed(opml)
return config
# Parse OPML via either SAX or SGML
class OpmlParser(ContentHandler,SGMLParser):
entities = re.compile('&(#?\w+);')
def __init__(self, config):
ContentHandler.__init__(self)
SGMLParser.__init__(self)
self.config = config
def startElement(self, name, attrs):
# we are only looking for data in 'outline' nodes.
if name != 'outline': return
# A type of 'rss' is meant to be used generically to indicate that
# this is an entry in a subscription list, but some leave this
# attribute off, and others have placed 'atom' in here
if attrs.has_key('type'):
if attrs['type'] == 'link' and not attrs.has_key('url'):
# Auto-correct WordPress link manager OPML files
attrs = dict(attrs.items())
attrs['type'] = 'rss'
if attrs['type'].lower() not in ['rss','atom']: return
# The feed itself is supposed to be in an attribute named 'xmlUrl'
# (note the camel casing), but this has proven to be problematic,
# with the most common misspelling being in all lower-case
if not attrs.has_key('xmlUrl') or not attrs['xmlUrl'].strip():
for attribute in attrs.keys():
if attribute.lower() == 'xmlurl' and attrs[attribute].strip():
attrs = dict(attrs.items())
attrs['xmlUrl'] = attrs[attribute]
break
else:
return
# the text attribute is nominally required in OPML, but this
# data is often found in a title attribute instead
if not attrs.has_key('text') or not attrs['text'].strip():
if not attrs.has_key('title') or not attrs['title'].strip(): return
attrs = dict(attrs.items())
attrs['text'] = attrs['title']
# if we get this far, we either have a valid subscription list entry,
# or one with a correctable error. Add it to the configuration, if
# it is not already there.
xmlUrl = attrs['xmlUrl']
if not self.config.has_section(xmlUrl):
self.config.add_section(xmlUrl)
self.config.set(xmlUrl, 'name', self.unescape(attrs['text']))
def unescape(self, text):
parsed = self.entities.split(text)
for i in range(1,len(parsed),2):
if parsed[i] in entitydefs.keys():
# named entities
codepoint=entitydefs[parsed[i]]
match=self.entities.match(codepoint)
if match:
parsed[i]=match.group(1)
else:
parsed[i]=unichr(ord(codepoint))
# numeric entities
if parsed[i].startswith('#'):
if parsed[i].startswith('#x'):
parsed[i]=unichr(int(parsed[i][2:],16))
else:
parsed[i]=unichr(int(parsed[i][1:]))
return u''.join(parsed).encode('utf-8')
# SGML => SAX
def unknown_starttag(self, name, attrs):
attrs = dict(attrs)
for attribute in attrs:
try:
attrs[attribute] = attrs[attribute].decode('utf-8')
except:
work = attrs[attribute].decode('iso-8859-1')
work = u''.join([c in cp1252 and cp1252[c] or c for c in work])
attrs[attribute] = work
self.startElement(name, attrs)
# http://www.intertwingly.net/stories/2004/04/14/i18n.html#CleaningWindows
cp1252 = {
unichr(128): unichr(8364), # euro sign
unichr(130): unichr(8218), # single low-9 quotation mark
unichr(131): unichr( 402), # latin small letter f with hook
unichr(132): unichr(8222), # double low-9 quotation mark
unichr(133): unichr(8230), # horizontal ellipsis
unichr(134): unichr(8224), # dagger
unichr(135): unichr(8225), # double dagger
unichr(136): unichr( 710), # modifier letter circumflex accent
unichr(137): unichr(8240), # per mille sign
unichr(138): unichr( 352), # latin capital letter s with caron
unichr(139): unichr(8249), # single left-pointing angle quotation mark
unichr(140): unichr( 338), # latin capital ligature oe
unichr(142): unichr( 381), # latin capital letter z with caron
unichr(145): unichr(8216), # left single quotation mark
unichr(146): unichr(8217), # right single quotation mark
unichr(147): unichr(8220), # left double quotation mark
unichr(148): unichr(8221), # right double quotation mark
unichr(149): unichr(8226), # bullet
unichr(150): unichr(8211), # en dash
unichr(151): unichr(8212), # em dash
unichr(152): unichr( 732), # small tilde
unichr(153): unichr(8482), # trade mark sign
unichr(154): unichr( 353), # latin small letter s with caron
unichr(155): unichr(8250), # single right-pointing angle quotation mark
unichr(156): unichr( 339), # latin small ligature oe
unichr(158): unichr( 382), # latin small letter z with caron
unichr(159): unichr( 376)} # latin capital letter y with diaeresis
if __name__ == "__main__":
# small main program which converts OPML into config.ini format
import sys, urllib
config = ConfigParser()
for opml in sys.argv[1:]:
opml2config(urllib.urlopen(opml), config)
config.write(sys.stdout)
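A hedged example of the corrections described in the comments above: an upper-cased type, a lower-cased xmlurl and a title in place of text are all tolerated (sample OPML is made up; the import path assumes this module lives under the planet package, as it does in Venus):

import sys
from planet.opml import opml2config   # module path assumed

sample = """<opml version="1.1"><body>
<outline type="RSS" xmlurl="http://example.org/feed.xml" title="Example Feed"/>
</body></opml>"""
config = opml2config(sample)
config.write(sys.stdout)
# expected output:
# [http://example.org/feed.xml]
# name = Example Feed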
import os, sys
import urlparse
import planet
import pubsubhubbub_publisher as PuSH
def publish(config):
log = planet.logger
hub = config.pubsubhubbub_hub()
link = config.link()
# identify feeds
feeds = []
if hub and link:
for root, dirs, files in os.walk(config.output_dir()):
for file in files:
if file in config.pubsubhubbub_feeds():
feeds.append(urlparse.urljoin(link, file))
# publish feeds
if feeds:
try:
PuSH.publish(hub, feeds)
for feed in feeds:
log.info("Published %s to %s\n" % (feed, hub))
except PuSH.PublishError, e:
log.error("PubSubHubbub publishing error: %s\n" % e)
"""
Process a set of configuration-defined sanitizations on a given feed.
"""
# Standard library modules
import time
# Planet modules
import planet, config, shell
from planet import feedparser
type_map = {'text': 'text/plain', 'html': 'text/html',
'xhtml': 'application/xhtml+xml'}
def scrub(feed_uri, data):
# some data is not trustworthy
for tag in config.ignore_in_feed(feed_uri).split():
if tag.find('lang')>=0: tag='language'
if data.feed.has_key(tag): del data.feed[tag]
for entry in data.entries:
if entry.has_key(tag): del entry[tag]
if entry.has_key(tag + "_detail"): del entry[tag + "_detail"]
if entry.has_key(tag + "_parsed"): del entry[tag + "_parsed"]
for key in entry.keys():
if not key.endswith('_detail'): continue
for detail in entry[key].copy():
if detail == tag: del entry[key][detail]
# adjust title types
if config.title_type(feed_uri):
title_type = config.title_type(feed_uri)
title_type = type_map.get(title_type, title_type)
for entry in data.entries:
if entry.has_key('title_detail'):
entry.title_detail['type'] = title_type
# adjust summary types
if config.summary_type(feed_uri):
summary_type = config.summary_type(feed_uri)
summary_type = type_map.get(summary_type, summary_type)
for entry in data.entries:
if entry.has_key('summary_detail'):
entry.summary_detail['type'] = summary_type
# adjust content types
if config.content_type(feed_uri):
content_type = config.content_type(feed_uri)
content_type = type_map.get(content_type, content_type)
for entry in data.entries:
if entry.has_key('content'):
entry.content[0]['type'] = content_type
# some people put html in author names
if config.name_type(feed_uri).find('html')>=0:
from shell.tmpl import stripHtml
if data.feed.has_key('author_detail') and \
data.feed.author_detail.has_key('name'):
data.feed.author_detail['name'] = \
str(stripHtml(data.feed.author_detail.name))
for entry in data.entries:
if entry.has_key('author_detail') and \
entry.author_detail.has_key('name'):
entry.author_detail['name'] = \
str(stripHtml(entry.author_detail.name))
if entry.has_key('source'):
source = entry.source
if source.has_key('author_detail') and \
source.author_detail.has_key('name'):
source.author_detail['name'] = \
str(stripHtml(source.author_detail.name))
# handle dates in the future
future_dates = config.future_dates(feed_uri).lower()
if future_dates == 'ignore_date':
now = time.gmtime()
if data.feed.has_key('updated_parsed') and data.feed['updated_parsed']:
if data.feed['updated_parsed'] > now: del data.feed['updated_parsed']
for entry in data.entries:
if entry.has_key('published_parsed') and entry['published_parsed']:
if entry['published_parsed'] > now:
del entry['published_parsed']
del entry['published']
if entry.has_key('updated_parsed') and entry['updated_parsed']:
if entry['updated_parsed'] > now:
del entry['updated_parsed']
del entry['updated']
elif future_dates == 'ignore_entry':
now = time.time()
if data.feed.has_key('updated_parsed') and data.feed['updated_parsed']:
if data.feed['updated_parsed'] > now: del data.feed['updated_parsed']
data.entries = [entry for entry in data.entries if
(not entry.has_key('published_parsed') or not entry['published_parsed']
or entry['published_parsed'] <= now) and
(not entry.has_key('updated_parsed') or not entry['updated_parsed']
or entry['updated_parsed'] <= now)]
scrub_xmlbase = config.xml_base(feed_uri)
# resolve relative URIs and sanitize
for entry in data.entries + [data.feed]:
for key in entry.keys():
if key == 'content' and not entry.has_key('content_detail'):
node = entry.content[0]
elif key.endswith('_detail'):
node = entry[key]
else:
continue
if not node.has_key('type'): continue
if not 'html' in node['type']: continue
if not node.has_key('value'): continue
if node.has_key('base'):
if scrub_xmlbase:
if scrub_xmlbase == 'feed_alternate':
if entry.has_key('source') and \
entry.source.has_key('link'):
node['base'] = entry.source.link
elif data.feed.has_key('link'):
node['base'] = data.feed.link
elif scrub_xmlbase == 'entry_alternate':
if entry.has_key('link'):
node['base'] = entry.link
else:
node['base'] = feedparser._urljoin(
node['base'], scrub_xmlbase)
node['value'] = feedparser._resolveRelativeURIs(
node.value, node.base, 'utf-8', node.type)
# Run this through HTML5's sanitizer
doc = None
if 'xhtml' in node['type']:
try:
from xml.dom import minidom
doc = minidom.parseString(node['value'])
except:
node['type']='text/html'
if not doc:
from html5lib import html5parser, treebuilders
p=html5parser.HTMLParser(tree=treebuilders.getTreeBuilder('dom'))
doc = p.parseFragment(node['value'], encoding='utf-8')
from html5lib import treewalkers, serializer
from html5lib.filters import sanitizer
walker = sanitizer.Filter(treewalkers.getTreeWalker('dom')(doc))
xhtml = serializer.XHTMLSerializer(inject_meta_charset = False)
tree = xhtml.serialize(walker, encoding='utf-8')
node['value'] = ''.join([str(token) for token in tree])
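# A minimal usage sketch for scrub() (hypothetical config file and feed URL;
# assumes the feed URI matches a section in an already-loaded planet config):
#
#   from planet import config, feedparser
#   config.load('config.ini')
#   feed_uri = 'http://example.org/atom.xml'
#   data = feedparser.parse(feed_uri)
#   scrub(feed_uri, data)    # modifies the parsed feed in place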
import planet
import os
import sys
logged_modes = []
def run(template_file, doc, mode='template'):
""" select a template module based on file extension and execute it """
log = planet.logger
if mode == 'template':
dirs = planet.config.template_directories()
else:
dirs = planet.config.filter_directories()
# parse out "extra" options
if template_file.find('?') < 0:
extra_options = {}
else:
import cgi
template_file, extra_options = template_file.split('?',1)
extra_options = dict(cgi.parse_qsl(extra_options))
# see if the template can be located
for template_dir in dirs:
template_resolved = os.path.join(template_dir, template_file)
if os.path.exists(template_resolved): break
else:
log.error("Unable to locate %s %s", mode, template_file)
if not mode in logged_modes:
log.info("%s search path:", mode)
for template_dir in dirs:
log.info(" %s", os.path.realpath(template_dir))
logged_modes.append(mode)
return
template_resolved = os.path.abspath(template_resolved)
# Add shell directory to the path, if not already there
shellpath = os.path.join(sys.path[0],'planet','shell')
if shellpath not in sys.path:
sys.path.append(shellpath)
# Try loading module for processing this template, based on the extension
base,ext = os.path.splitext(os.path.basename(template_resolved))
module_name = ext[1:]
try:
try:
module = __import__("_" + module_name)
except:
module = __import__(module_name)
except Exception, inst:
return log.error("Skipping %s '%s' after failing to load '%s': %s",
mode, template_resolved, module_name, inst)
# Execute the shell module
options = planet.config.template_options(template_file)
if module_name == 'plugin': options['__file__'] = template_file
options.update(extra_options)
log.debug("Processing %s %s using %s", mode,
os.path.realpath(template_resolved), module_name)
if mode == 'filter':
return module.run(template_resolved, doc, None, options)
else:
output_dir = planet.config.output_dir()
output_file = os.path.join(output_dir, base)
module.run(template_resolved, doc, output_file, options)
return output_file
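# A minimal usage sketch for run() (hypothetical file names; assumes a planet
# config has been loaded and the spliced feed document is available as a string):
#
#   import planet
#   from planet import config, shell
#   config.load('config.ini')
#   doc = open('cache/splice.xml').read()      # hypothetical spliced document
#   shell.run('index.html.tmpl', doc)          # template mode: writes output_dir/index.html
#   result = shell.run('excerpt.py?width=500', doc, mode='filter')
#                                              # filter mode: "?" passes extra options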
This diff is collapsed.