RELAX NG for Python

IPC10
February 7, 2002

A.M. Kuchling
www.amk.ca

What is RELAX NG?

RELAX NG (or RNG) is a schema language for XML.

Schema languages let you check whether an XML document conforms to a given schema: that it follows a certain structure of elements and attributes.

RELAX NG:

Other Schema Languages (I): DTDs

DTDs date back to SGML, and are also part of XML 1.0.

<!ELEMENT title (#PCDATA)>
<!ATTLIST title
        id ID #IMPLIED
        class CDATA #IMPLIED
        title CDATA #IMPLIED
        %i18n;>

Opinion in the XML community seems to be running against them:

So long, and thanks for all the fish...

Other Schema Languages (II): XML Schema

Variously called XML Schema or XSD.
(http://www.w3.org/XML/Schema)

<xs:schema xmlns:xs="http://.../2001/XMLSchema">
  <xs:element name="title">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID"
                    use="required"/>
      <xs:attribute name="class" type="xs:string"/>
      <xs:attribute name="title" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Drawbacks:

RELAX NG Example

<?xml version="1.0"?>
<element name="title" xmlns="http://relaxng.org/...">

  <attribute name="id"><text/></attribute>

  <attribute name="class">
    <choice>
      <value>big</value><value>small</value>
    </choice>
  </attribute>

  <attribute name="timestamp">
    <data type="dateTime" 
          datatypeLibrary="http://.../XMLSchema-datatypes"/>
  </attribute>
  
</element>

Implementation: Derivatives

The algorithm for RELAX NG is remarkably elegant, and is based on computing the derivative of a pattern.

A pattern P is nullable if the empty string (or empty tree) matches it.

The derivative of a pattern P with respect to a tree X is
a pattern matching what's left of P after matching X.

Pattern    Text    Derivative
a+b+       a       a*b+
a*b+       aaa     a*b+
a*b        b       Empty pattern
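
The table above can be reproduced with a small sketch. This is an illustration of the derivative idea on plain strings, not the RELAX NG validator itself, which works on trees of elements and attributes. The class names (NotAllowed, Empty, Char, Seq, Choice, Star) and the helpers plus(), nullable(), deriv(), and matches() are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


class Pattern:
    """Base class for the toy pattern types (hypothetical names)."""


@dataclass(frozen=True)
class NotAllowed(Pattern):
    """Matches nothing at all."""


@dataclass(frozen=True)
class Empty(Pattern):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Char(Pattern):
    ch: str


@dataclass(frozen=True)
class Seq(Pattern):
    first: Pattern
    second: Pattern


@dataclass(frozen=True)
class Choice(Pattern):
    left: Pattern
    right: Pattern


@dataclass(frozen=True)
class Star(Pattern):
    inner: Pattern


def plus(p):
    """p+ is p followed by p*."""
    return Seq(p, Star(p))


def nullable(p):
    """True if the empty string matches p."""
    if isinstance(p, (Empty, Star)):
        return True
    if isinstance(p, Seq):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    return False  # NotAllowed and Char never match the empty string


def deriv(p, c):
    """Pattern matching what is left of p after consuming character c."""
    if isinstance(p, Char):
        return Empty() if p.ch == c else NotAllowed()
    if isinstance(p, Seq):
        d = Seq(deriv(p.first, c), p.second)
        # If the first part can be empty, c may also start the second part.
        if nullable(p.first):
            return Choice(d, deriv(p.second, c))
        return d
    if isinstance(p, Choice):
        return Choice(deriv(p.left, c), deriv(p.right, c))
    if isinstance(p, Star):
        return Seq(deriv(p.inner, c), p)
    return NotAllowed()  # Empty and NotAllowed cannot consume anything


def matches(p, text):
    """Take the derivative for each character; accept if the result is nullable."""
    for c in text:
        p = deriv(p, c)
    return nullable(p)


a, b = Char("a"), Char("b")
a_plus_b_plus = Seq(plus(a), plus(b))   # a+b+
a_star_b = Seq(Star(a), b)              # a*b
```

Matching is just repeated differentiation followed by a nullability check, so matches(a_plus_b_plus, "aab") succeeds while matches(a_plus_b_plus, "a") fails. The sketch never simplifies the patterns it builds; a practical implementation would likely collapse results such as Seq(NotAllowed(), ...) to keep them small.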

Current Status

Availability

The code is in the pyxml.sourceforge.net CVS
under sandbox/relaxng.

These slides:
http://www.amk.ca/talks