Developing Web Applications with Quixote
PyCon 2004
March 24, 2004
A.M. Kuchling
www.amk.ca
amk @ amk.ca
Overview
Quixote is a Web development framework written in Python.
Some of Quixote's goals:
- Simplicity
- Familiar to Python programmers
- Encourages good design (in several senses)
- Easy to use with other Python libraries
Related tools:
- Sancho: unit testing
- Dulcinea: various ZODB and Quixote-related utility classes
Sites/Applications using Quixote
Sites:
Applications:
Quixote was originally written for the MEMS Exchange, a project
that aims to implement a network for distributed semiconductor
fabrication, a network coordinated over the web. For more
information about the architecture we used on that project, see
"The MEMS Exchange Architecture", a paper presented at PyCon 2003.
Linux Weekly News is the highest-traffic Quixote site, and
demonstrates that Quixote can be pretty scalable. Using Quixote and
mod_python, LWN survived a Slashdotting while running on a
relatively small machine, a 1 GHz Pentium with 512 MB of RAM.
Most Quixote projects are for internal use. One publicly
available project is Cartwheel, which performs
genomic sequence analysis. I'm working on a Slashdot clone named
Solidus, and hope to have an alpha version available before
PyCon.
The Quixote Toolkit
- Publisher: maps URLs to Python code
- HTTPRequest, HTTPResponse classes
- Web server interfaces for CGI, FastCGI, SCGI, mod_python
- Python Templating Language (optional)
- Form framework (optional)
How Quixote Works: The basic idea
- Take the URL path and split it apart:
    http://example.com/catalog/item/details
    → ['catalog', 'item', 'details']
- Starting at a configured root package (e.g. 'store.web'),
follow the URL path down:
    http://example.com/   → store.web._q_index()
    http://.../catalog/   → store.web.catalog()
                            or store.web.catalog._q_index()
    /catalog/item/        → store.web.catalog.item()
                            or store.web.catalog.item._q_index()
- Call it and send the output to the client:
    output = store.web.catalog(request)
Quixote applications are Python packages, so they can be
installed using the Distutils and similar tools. Incoming HTTP
requests are mapped to a chunk of Python code, which is executed
and passed an object representing the contents of the request; the
code returns a string containing the contents that will be returned
to the client.
The code to be run is determined like this:
- Quixote is configured with the name of a root package, such as
store.web.
- On receiving a request, the path is split apart at the slashes;
/catalog/item/details becomes ['catalog', 'item', 'details'].
- Quixote then looks for the 'catalog' attribute of store.web, and
finds the corresponding object.
- Quixote then looks for the 'item' attribute of store.web.catalog,
and so forth.
- The search stops when the end of the URL path is reached, or
when a callable object is found.
- If the object is callable -- a function, a method, an object
instance with a __call__ method -- it's called. If not, Quixote
looks for a _q_index method.
- The thing being called gets one argument, request, and must
return a string. (Actually, it can also return an instance of a
Stream class in order to return streaming output.)
- The string is sent back to the HTTP client.
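The steps above can be sketched in a few lines of plain Python. This is only an illustration, not Quixote's actual publisher code: the traverse() helper and the SimpleNamespace objects standing in for the store.web package are made up for the example, and the check against _q_exports reflects Quixote's rule that only explicitly exported names are reachable from a URL.

```python
from types import SimpleNamespace


def traverse(root, path, request):
    """Walk `path` starting at `root` and call whatever is found."""
    obj = root
    for component in [c for c in path.split("/") if c]:
        # Only names listed in _q_exports may be reached from a URL.
        if component not in getattr(obj, "_q_exports", []):
            raise LookupError("not exported: %r" % component)
        obj = getattr(obj, component)
        if callable(obj):
            break  # the search stops once a callable is found
    # Call the object itself if possible, else fall back to _q_index.
    target = obj if callable(obj) else obj._q_index
    return target(request)


# Hypothetical stand-ins for the store.web package and its catalog module.
catalog = SimpleNamespace(
    _q_exports=["item"],
    _q_index=lambda request: "catalog index",
    item=lambda request: "item page",
)
root = SimpleNamespace(
    _q_exports=["catalog"],
    _q_index=lambda request: "front page",
    catalog=catalog,
)
```

With these objects, traverse(root, "/catalog/item", None) ends at the item function, while traverse(root, "/catalog/", None) falls back to the catalog's _q_index.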
This is something like Zope's traversal, but the rules are
simpler; applications can't change this algorithm or override it.
There are still some special names, though, that we'll look at
after providing a simple example.
Simple example
From quixote/demo/__init__.py:
# Every publicly accessible attribute has to be listed in _q_exports.
_q_exports = ["simple"]

def _q_index (request):
    return """
    <html>
    <body>
    ..."""

def simple (request):
    # This function returns a plain text document, not HTML.
    request.response.set_content_type("text/plain")
    return "This is the Python function 'quixote.demo.simple'.\n"
Because Quixote publishes the contents of Python modules, there
has to be a way of declaring which functions should be considered
public and can be called through HTTP requests. This is done by
listing the public names in a _q_exports module variable or object
attribute; Quixote will not traverse into an object or module that
lacks a _q_exports attribute.
How Quixote Works: Special names
_q_index: If traversal ends up at an object that isn't callable,
this name is checked for and called.
_q_lookup: If an attribute isn't found, this name is checked for
and called with the attribute.
_q_access: At every step, this name is checked for and called to
perform access checks.
_q_resolve: Like a memoized version of _q_lookup (rarely used).
All the names special to Quixote begin with _q_.
How Quixote Works: _q_lookup example
This example handles URLs such as /whatever/1/, .../2/, etc.
from quixote.errors import TraversalError

def _q_lookup (request, component):
    try:
        key = int(component)
    except ValueError:
        raise TraversalError("URL component is not an integer")
    obj = ... database lookup (key) ...
    if obj is None:
        raise TraversalError("No such object.")
    # Traversal will continue with the ObjectUI instance
    return ObjectUI(obj)
How Quixote works: _q_access example
_q_access is always called before traversing any further.
This example requires all users to be logged in.
from quixote.errors import AccessError, TraversalError

def _q_access (request):
    if request.session.user is None:
        raise AccessError("You must be logged in.")
    # exits quietly if nothing is wrong

def _q_index [html] (request):
    """Here is some security-critical material ..."""
_q_access is used to impose an access control condition on an
entire object; this saves the user from having to add access
control checks to each attribute and running the risk of
forgetting one. At every step of traversal, _q_access is checked
for and called if present. The function can raise an exception to
abort further traversal; if no exception is raised, any return
value is ignored.
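A rough sketch of that rule, under the assumption that traversal visits a sequence of objects in order. The check_access() helper, the local AccessError class, and the SimpleNamespace objects are all invented for illustration and are not part of Quixote.

```python
from types import SimpleNamespace


class AccessError(Exception):
    """Local stand-in for quixote.errors.AccessError."""


def check_access(objects_on_path, request):
    """Call _q_access, when present, on each object visited in order."""
    for obj in objects_on_path:
        access = getattr(obj, "_q_access", None)
        if access is not None:
            access(request)  # return value ignored; exceptions propagate


def require_login(request):
    if request.session.user is None:
        raise AccessError("You must be logged in.")


# Hypothetical traversal path: a public root, then a protected object.
public_root = SimpleNamespace()
private_area = SimpleNamespace(_q_access=require_login)

anonymous = SimpleNamespace(session=SimpleNamespace(user=None))
member = SimpleNamespace(session=SimpleNamespace(user="amk"))
```

Because the check runs on the containing object, everything below private_area is protected without each page repeating the test.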
How Quixote works: The HTTPRequest class
.response -- an HTTPResponse instance
.session -- a Session instance
request.get_environ('SERVER_PORT', 80) -- various standard CGI environment vars
request.get_form_var('user') -- get form variables
request.get_cookie('session') -- get cookie values
request.get_url(n=1) -- get URL of request, chopping off n pieces
request.get_accepted_types() -- get a dict mapping {MIME type: quality value}
browser, version = request.guess_browser_version()
return request.redirect('../../catalog') -- redirect to the specified URL
How Quixote works: The HTTPResponse class
.headers -- dict of HTTP headers
.cache -- number of seconds to cache response
.set_content_type('text/plain') -- specify MIME content type of the response
.set_cookie('session', '12345', path='/') -- return a Set-Cookie header to the client
.expire_cookie('session') -- delete cookie from the client
How Quixote works: Enabling sessions
Instead of Publisher, use SessionPublisher:
from quixote.publisher import SessionPublisher
app = SessionPublisher('quixote.demo')
The request will then have a .session attribute containing a
Session instance.
Two other classes:
SessionManager -- stores/retrieves sessions
Session -- can hold .user attribute
SessionManager is a dictionary-like object responsible for
storing sessions. The default implementation stores sessions
in-memory, but you can provide your own session manager that stores
them using a persistence mechanism such as ZODB or a relational
database.
The only interesting attribute of Session is a .user attribute,
whose value is undefined by Quixote and left up to the application.
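To give a feel for what a persistent, dictionary-like session store might look like, here is a minimal sketch backed by SQLite. The SqliteSessionStore class and its methods are hypothetical and do not reproduce the actual SessionManager interface; they only show the mapping-style idea of storing and retrieving session objects by ID.

```python
import pickle
import sqlite3


class SqliteSessionStore:
    """Hypothetical dictionary-like store mapping session IDs to objects."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, data BLOB)")

    def __setitem__(self, session_id, session):
        # Pickle the session object and insert or overwrite its row.
        self.db.execute("REPLACE INTO sessions VALUES (?, ?)",
                        (session_id, pickle.dumps(session)))
        self.db.commit()

    def __getitem__(self, session_id):
        row = self.db.execute("SELECT data FROM sessions WHERE id = ?",
                              (session_id,)).fetchone()
        if row is None:
            raise KeyError(session_id)
        return pickle.loads(row[0])

    def __contains__(self, session_id):
        row = self.db.execute("SELECT 1 FROM sessions WHERE id = ?",
                              (session_id,)).fetchone()
        return row is not None

    def __delitem__(self, session_id):
        if session_id not in self:
            raise KeyError(session_id)
        self.db.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
        self.db.commit()
```

Passing a file name instead of ":memory:" would keep sessions across restarts, which is the point of swapping out the in-memory default.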
Running Quixote applications
Several options:
- CGI (slow, not recommended)
- FastCGI (fast and scalable; Apache mod_fastcgi possibly
buggy)
- Apache + mod_scgi (fast, scalable; recommended for high
traffic)
- Pure Python web server (slower, not many features, but useful
for standalone applications).
Running Quixote applications: CGI/FastCGI
demo.cgi:
#!/www/python/bin/python
# Example driver script for the Quixote demo:
# publishes the quixote.demo package.
from quixote import Publisher
# Create a Publisher instance, giving it the root package name
app = Publisher('quixote.demo')
# Open the configured log files
app.setup_logs()
# Enter the publishing main loop
app.publish_cgi()
The above code will also handle FastCGI. CGI scripts will run
through publish_cgi() once and exit; under FastCGI it will loop
and service multiple requests.
Running Quixote Applications: Stand-alone
Running a server on localhost is really easy:
import os, time
from quixote.server import medusa_http

if __name__ == '__main__':
    s = medusa_http.Server('quixote.demo', port=8000)
    s.run()
This can even be used for writing desktop applications: run a
Quixote server locally and use Python's webbrowser.open() function
to open a browser pointing at it.
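The desktop-application pattern can be tried without Quixote installed. In this sketch the standard library's http.server stands in for the Quixote server, and HelloHandler, start_local_server(), and main() are names invented for the example; only webbrowser.open() is the piece mentioned above.

```python
import threading
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for the real application: answers every GET the same way."""

    def do_GET(self):
        body = b"Hello from a local server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_local_server():
    """Serve on a free localhost port from a background thread."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d/" % server.server_port


def main():
    server, url = start_local_server()
    webbrowser.open(url)  # point the user's browser at the local app
    input("Press Enter to stop the server\n")
    server.shutdown()
    server.server_close()
```

Calling main() starts the server, opens the browser, and waits; binding to port 0 lets the operating system pick a free port so the application doesn't collide with anything else on the machine.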
PTL: Overview
PTL = Python Templating Language
example.ptl:
# To callers, templates behave like regular Python functions
def cell [html] (content):
    '<td>'      # Literal expressions are appended to the output
    content     # Expressions are evaluated, too.
    '</td>'

def row [html] (L):
    # L: list of strings containing cell content
    '<tr>'
    for s in L:
        cell(s)
    '</tr>\n'

def loop ():    # No [html], so this is a regular Python function
    output = ""
    for i in range(1, 10):
        output += row([str(i), i*'a', i*'b'])
    return output
PTL: Using templates
Templates live in .ptl files, which can be imported. To enable
this:
import quixote ; quixote.enable_ptl() # Enable import hook
Templates behave just like Python functions:
>>> import example
>>> example.cell('abc')
<htmltext '<td>abc</td>'>
>>> example.loop()
<htmltext '<tr><td>1</td><td>a</td><td>b</td>...</tr>\n'>
In .ptl files, methods can even be PTL templates.
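One way to picture what a template does is to hand-write a rough plain-Python equivalent of cell() and row(). This is an approximation for illustration, not the code PTL generates: real [html] templates return htmltext instances rather than plain strings, and html.escape() is used here as a stand-in for PTL's own escaping.

```python
from html import escape


def cell(content):
    # Each expression statement in the template contributes to the output.
    output = []
    output.append("<td>")                # literal, trusted as-is
    output.append(escape(str(content)))  # value, escaped in [html] mode
    output.append("</td>")
    return "".join(output)


def row(cells):
    output = ["<tr>"]
    for s in cells:
        # cell() already returns finished markup, so it is added unchanged.
        output.append(cell(s))
    output.append("</tr>\n")
    return "".join(output)
```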
PTL: Comparison with other syntaxes
System     | Syntax
Apache SSI | <!--#include virtual="/script/"-->
PHP        | <?php func()?>
ASP        | <% func() %>
ZPT        | <span tal:replace="content">...</span>
PTL        | def f [html] (): content
PTL's advantages over other syntaxes:
- Python users have no new rules to learn; templates have
parameters, keyword parameters, can be methods, etc.
- All Python language features are available: for, if, while,
exceptions, classes, nested functions.
- Common code can be easily refactored out into functions
- Most templating syntaxes use HTML and provide escapes into
code. As templates get more complicated, the density of escape
sequences rises; this is worsened by escape sequences being
multi-character sequences such as '<?' or '<!--#'. PTL uses
code and escapes into HTML, so it doesn't suffer from this problem,
and the escape sequence is a single character (the quotation
mark).
PTL: Automatic escaping
def no_quote [plain] (arg):
    '<title>'
    arg     # Converted to string
    '</title>'

def quote [html] (arg):
    '<title>'
    arg     # Converted to string and HTML-escaped
    '</title>'

>>> no_quote('A history of the < symbol')
'<title>A history of the < symbol</title>'
>>> quote('A history of the < symbol')
<htmltext '<title>A history of the &lt; symbol</title>'>
By using '[html]' instead of '[plain]', string literals are
compiled as htmltext instances. When combined with regular strings
using a + b or '%s' % b, htmltext HTML-escapes the regular string.
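The escaping rule can be demonstrated with a toy class. HtmlText below is a simplified stand-in written for this sketch, not Quixote's htmltext implementation: it supports only +, reversed +, and %s-style formatting with positional arguments, and it relies on html.escape() from the standard library.

```python
from html import escape


class HtmlText:
    """Toy stand-in for PTL's htmltext: markup that is trusted as-is."""

    def __init__(self, markup):
        self.markup = markup

    @staticmethod
    def _quote(value):
        # Trusted markup passes through; anything else is escaped.
        if isinstance(value, HtmlText):
            return value.markup
        return escape(str(value))

    def __add__(self, other):
        return HtmlText(self.markup + self._quote(other))

    def __radd__(self, other):
        return HtmlText(self._quote(other) + self.markup)

    def __mod__(self, args):
        # Positional arguments only; each one is quoted before formatting.
        if not isinstance(args, tuple):
            args = (args,)
        return HtmlText(self.markup % tuple(self._quote(a) for a in args))

    def __repr__(self):
        return "<HtmlText %r>" % self.markup
```

Only text wrapped in HtmlText is treated as markup; an ordinary string that meets it through + or % is escaped first, which mirrors the idea that only literals in the program text are trusted.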
This mechanism is both a convenience for the application writer
and a security feature. Cross-site scripting (XSS) attacks are a
class of security hole caused by forgetting to escape HTML tags in
untrusted data; you might forget to escape the title of a mail
message, for example. An attacker could insert JavaScript that
opened pop-up windows or redirected the user to another site.
It's easy to forget the required function call, and forgetting
to escape a single snippet is all it takes. PTL's automatic
escaping trusts only the string literals supplied in the program
text, and it also fails securely. When you mess up, the usual
result is double-escaping a string, resulting in web site users
seeing '<p>blah blah blah...'. This is embarrassing, but
doesn't open up any vulnerabilities.
It should be remembered that while we think PTL is really neat,
it's still optional. Using alternative templating isn't hard, and
there are Quixote users who never use PTL.
Generating HTML: Nevow
Graham Fawcett wrote a small Nevow implementation.
from nevow import *

_q_exports = ['template']

def template(doctitle, docbody):
    """
    A page template. The stylesheet is there as a visual check
    that class and id attributes are set properly.
    """
    return html [
        head [
            title[doctitle],
            style(type='text/css')[
                'body { background-color: lightblue; } ',
                '.section { border: blue 3px solid; padding: 6px; } ',
                '#mainbody { background-color: white; } '
            ],
        ],
        body [
            h1 [doctitle],
            div({'class':'section'}, id='mainbody')[docbody],
            hr
        ]
    ]
Toolkit: Form processing
Quixote contains a set of classes for implementing forms.
Example:
from quixote.form import Form

class UserForm (Form):
    def __init__ (self):
        Form.__init__(self)
        user = get_session().user
        self.add_widget("string", "name", title="Your name",
                        value=user.name)
        self.add_widget("password", "password1", title="Password",
                        value="")
        self.add_widget("password", "password2",
                        title="Password, again", value="")
        self.add_widget("single_select", "vote",
                        title = "Vote on proposal",
                        allowed_values=[None] + range(4),
                        descriptions=['No vote', '+1', '+0',
                                      '-0', '-1'],
                        hint = "Your vote on this proposal")
        self.add_widget("submit_button", "submit",
                        value="Update information")
The basic idea is that you subclass the Form class to create a
single new form. A form contains a number of widgets. Widgets
represent a form element such as a text field or checkbox, or
multiple form elements; multiple form elements could be used to
enter a date, for example. Widgets can also perform additional
checks such as requiring that a text field contain an integer.
The framework handles processing of a form. The Form instance
creates widgets in its __init__ method. The render() method is
called to generate HTML to display the form. On submitting the
form, the process() method is called to read the values of fields
and perform any error checking, and if no errors are reported, the
action() method is called to perform the actual work of the form
(e.g. inserting data into a database, sending an e-mail, etc.).
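The render/process/action cycle described above can be sketched without the framework. MiniForm is a hypothetical, much-reduced class written for this illustration; its method signatures and the handle() dispatcher are assumptions and do not match quixote.form.Form.

```python
class MiniForm:
    """Hypothetical, much-reduced form; not the quixote.form.Form API."""

    def __init__(self):
        self.fields = ["name", "password1", "password2"]
        self.error = {}

    def render(self):
        inputs = "".join('<input name="%s">' % f for f in self.fields)
        return "<form method='post'>%s</form>" % inputs

    def process(self, submitted):
        # Read the field values and record any validation errors.
        self.error = {}
        values = {f: submitted.get(f) for f in self.fields}
        if values["password1"] != values["password2"]:
            self.error["password1"] = "The two passwords must match."
        return values

    def action(self, values):
        # Only reached when process() reported no errors.
        return "Saved settings for %s" % values["name"]

    def handle(self, submitted=None):
        # No submission yet: just show the form.
        if submitted is None:
            return self.render()
        values = self.process(submitted)
        # Redisplay on errors; otherwise do the real work.
        if self.error:
            return self.render()
        return self.action(values)
```

Keeping validation in process() means action() can assume clean input, which is the same division of labour the framework imposes.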
For a more detailed explanation of the form framework, see part
2 of the Quixote tutorial at http://www.quixote.ca/learn/2.
Toolkit: Form processing (cont'd)
class UserForm (Form):
    ...
    def render [html] (self, request, action_url):
        standard.header("Edit Your User Information")
        Form.render(self, request, action_url)
        standard.footer()

    def process (self, request):
        values = Form.process(self, request)
        if not (values['password1'] == values['password2']):
            self.error['password1'] = 'The two passwords must match.'
        return values

    def action (self, request, submit, values):
        user = request.session.user
        user.name = values['name']
        if values['password1'] is not None:
            user.password = values['password1']
        return request.response.redirect(request.get_url(1))
This render() implementation uses the default rendering of the
form, but wraps our own header/footer around that rendering.
process() gets the values and performs error checks.
action() does the work of the form, and can assume the input data
is all correct.
Toolkit: Serving Static Files
For Quixote-only apps, you often need to return static files
such as PNGs, PDFs, etc.
from quixote.util import StaticFile, StaticDirectory

_q_exports = ['images', 'report_pdf']

report_pdf = StaticFile('/www/sites/qx/docroot/report.pdf',
                        mime_type='application/pdf')
images = StaticDirectory('/www/sites/qx/docroot/images/')
The quixote.util module also contains helpers for XML-RPC, for
streaming files back to the client, etc.
These classes include a number of conveniences. If you don't
provide a MIME media type, Python's mimetypes module will be used
to guess the correct MIME type. Files can optionally be cached in
memory to save on I/O.
StaticDirectory defaults to security: it doesn't follow symlinks
or allow listing the directory unless you explicitly enable this.
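The MIME type guessing mentioned above comes from the standard library and is easy to try on its own. The guess_mime_type() wrapper and its fallback value are choices made for this sketch, not something taken from quixote.util.

```python
import mimetypes


def guess_mime_type(path, default="application/octet-stream"):
    """Return the MIME type guessed from the file name, or `default`."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or default
```

The guess is based purely on the file extension, so guess_mime_type('report.pdf') works even if no such file exists, and a name with no recognised extension falls back to the default.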
Design patterns: Good URL design matters
The canonical bad URL:
http://gandalf.example.com/cgi-bin/catalog.py
?item=9876543&display=complete&tag=nfse_lfde
A better set of URLs:
http://www.example.com/catalog/9876543/complete
.../brief
.../features
Quixote features such as _q_lookup make it easy to support
sensible URLs.
Design patterns: Separate problem objects and UI classes
Don't mix the basic objects for your problem with the HTML for
the user interface.
For each object, represent it by a class and put the user
interface in a *UI class elsewhere.
Advantages:
- You can write unit tests for the problem objects.
- Problem objects are usable from scripts.
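A small sketch of the split, with names chosen for illustration: Proposal is a plain problem object, and ProposalUI is the corresponding *UI class. The summary() method and the HTML produced are invented for the example, and html.escape() stands in for whatever escaping the real UI layer would use.

```python
from html import escape


class Proposal:
    """Problem object: knows nothing about HTML or HTTP."""

    def __init__(self, title, author):
        self.title = title
        self.author = author

    def summary(self):
        return "%s (by %s)" % (self.title, self.author)


class ProposalUI:
    """UI class: wraps a Proposal and produces HTML for it."""

    _q_exports = []

    def __init__(self, proposal):
        self.proposal = proposal

    def _q_index(self, request):
        return "<h1>%s</h1>" % escape(self.proposal.summary())
```

Proposal can be unit-tested or used from a command-line script without touching any web code, while ProposalUI stays a thin layer over it.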
Structure of an application: PyCon proposal submission
Directory organization:
qx/bin/ # Various scripts
qx/conference/__init__.py # Marker
qx/conference/objects.py # Basic objects: Proposal, Author, Review
qx/ui/conference/__init__.py
qx/ui/conference/email.ptl # Text of e-mail messages
qx/ui/conference/standard.ptl # Header, footer, display_proposal()
qx/ui/conference/pages.ptl # Login form, base CSS
qx/ui/conference/proposal.ptl # ProposalUI class
Design patterns: common filenames
Naming conventions for common modules:
application/ui/standard.ptl -- contains header(), footer(), and any
other commonly-used HTML snippets.
application/session.py -- Custom session class (if you have one)
application/pages.ptl -- PTL pages not tied to an object