ZPUG DC
December 2, 2004
A.M. Kuchling
www.amk.ca
amk @ amk.ca
The Semantic Web has been a W3C project since around 1999.
The existing Web of HTML documents is good for humans:
The Semantic Web will augment the existing human-readable Web with structured data that's easy for software to process.
The Semantic Web is split into three layers:
Layer | Purpose |
---|---|
Web Ontology Language (OWL) | Relationships between vocabularies |
RDF Schema | Vocabulary definitions |
Resource Description Framework (RDF) | Assertions of facts, e.g. Resource X is named "Drew". |
RDF is a specification that defines a model for representing the world, and a syntax for serializing and exchanging the model.
Facts are 3-tuples of (subject, property, object).
Subject has a property of object |
---|
Resource X has a name of "Drew" |
ISBN 1234567890 has an author of resource X |
Resource X has a type of Person |
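As a rough illustration, the three facts above can be written as plain Python tuples. This is only a sketch in standard-library Python 3; the `Triple` type and the `ex:resourceX` identifier are placeholders invented here, not part of RDF or of any library.

```python
from collections import namedtuple

# Hypothetical container for one fact; the field names follow the slide.
Triple = namedtuple("Triple", ["subject", "property", "object"])

facts = [
    Triple("ex:resourceX", "name", "Drew"),
    Triple("urn:isbn:1234567890", "author", "ex:resourceX"),
    Triple("ex:resourceX", "type", "Person"),
]


def describe(triple):
    """Render a fact in the 'subject has a property of object' pattern."""
    return "%s has a %s of %s" % (triple.subject, triple.property, triple.object)


for fact in facts:
    print(describe(fact))
```

Real RDF uses URIs for the subject and the property rather than short names like these.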
Dublin Core: http://purl.org/dc/elements/1.1/
FOAF (Friend-of-a-friend): http://xmlns.com/foaf/0.1/
DOAP (Description of a Project): http://usefulinc.com/ns/doap#
Resources are identified by URIs: http://example.com/person/0042, urn:isbn:1930110111
How are properties identified? They could be just names or serial numbers, but that wouldn't be very scalable.
Instead, properties have URIs just like resources.
http://amk.ca/xml/review/1.0#
http://amk.ca/xml/review/1.0#subject
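A property URI is just the vocabulary's namespace URI with a local name appended. The following sketch shows that expansion in plain Python 3; the `PREFIXES` table and the `expand` helper are names made up for this example, and the URIs are the namespaces quoted in these slides.

```python
# Prefix table; the lowercase prefix names are chosen for this sketch.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rev": "http://amk.ca/xml/review/1.0#",
}


def expand(qname):
    """Turn a prefixed name such as 'rev:subject' into a full property URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


print(expand("rev:subject"))
print(expand("dc:title"))
```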
Graphs are usually represented as a bunch of (subject,property,object) 3-tuples.
Subject | Property | Object |
---|---|---|
http://example.com/rev1 | rev:subject → | urn:isbn:1930110111 |
urn:isbn:1930110111 | dc:title → | "XSLT Quickly" |
urn:isbn:1930110111 | dc:creator → | http://example.com/author/0042 |
http://example.com/author/0042 | FOAF:surname → | DuCharme |
http://example.com/author/0042 | FOAF:homepage → | http://www.snee.com/bob/ |
http://example.com/author/0042 | FOAF:pastProject → | urn:isbn:1930110111 |
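Because each triple is an edge, answering a question means following edges from node to node. Here is a minimal sketch, in plain Python 3 with no RDF library, that holds the table above as a set of tuples and walks from the review to the surname of the reviewed book's author. The helper names are invented for this example, and the sketch does not distinguish literals from URIs.

```python
REV = "http://amk.ca/xml/review/1.0#"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

AUTHOR = "http://example.com/author/0042"
BOOK = "urn:isbn:1930110111"

graph = {
    ("http://example.com/rev1", REV + "subject", BOOK),
    (BOOK, DC + "title", "XSLT Quickly"),
    (BOOK, DC + "creator", AUTHOR),
    (AUTHOR, FOAF + "surname", "DuCharme"),
    (AUTHOR, FOAF + "homepage", "http://www.snee.com/bob/"),
    (AUTHOR, FOAF + "pastProject", BOOK),
}


def objects(graph, subject, prop):
    """Return every object reachable from subject along prop, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


def reviewed_author_surnames(graph, review):
    """Follow review -> rev:subject -> dc:creator -> FOAF:surname."""
    surnames = []
    for book in objects(graph, review, REV + "subject"):
        for author in objects(graph, book, DC + "creator"):
            surnames.extend(objects(graph, author, FOAF + "surname"))
    return surnames


print(reviewed_author_surnames(graph, "http://example.com/rev1"))
```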
RDF Core defines an XML-based serialization for RDF.
    <rdf:RDF xmlns:FOAF="http://xmlns.com/foaf/0.1/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rev="http://amk.ca/xml/review/1.0#">

      <!-- Implies rdf:type property is rev:Review -->
      <rev:Review rdf:about="http://example.com/rev1">
        <rev:subject rdf:resource="urn:isbn:1930110111"/>
      </rev:Review>

      <rdf:Description rdf:about="http://example.com/author/0042">
        <FOAF:firstName>Bob</FOAF:firstName>
        <FOAF:homepage rdf:resource="http://www.snee.com/bob/"/>
        <FOAF:pastProject rdf:resource="urn:isbn:1930110111"/>
        <FOAF:surname>DuCharme</FOAF:surname>
      </rdf:Description>
    </rdf:RDF>
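To see how this XML maps back to triples, here is a rough sketch that uses only Python 3's standard `xml.etree.ElementTree`. It is not an RDF parser: it assumes flat node elements that carry `rdf:about`, with property children that hold either an `rdf:resource` attribute or plain text. The shortened `DOC` and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cut-down version of the document above, kept short for the example.
DOC = """<rdf:RDF
    xmlns:FOAF="http://xmlns.com/foaf/0.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rev="http://amk.ca/xml/review/1.0#">
  <rev:Review rdf:about="http://example.com/rev1">
    <rev:subject rdf:resource="urn:isbn:1930110111"/>
  </rev:Review>
  <rdf:Description rdf:about="http://example.com/author/0042">
    <FOAF:surname>DuCharme</FOAF:surname>
  </rdf:Description>
</rdf:RDF>"""


def uri(tag):
    """ElementTree spells names as '{namespace}local'; join the two parts."""
    return tag[1:].replace("}", "", 1)


def triples_from_rdfxml(text):
    """Extract triples from flat RDF/XML; handles only the cases named above."""
    result = []
    for node in ET.fromstring(text):
        subject = node.get("{%s}about" % RDF)
        if node.tag != "{%s}Description" % RDF:
            # A typed node such as rev:Review implies an rdf:type triple.
            result.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            value = prop.get("{%s}resource" % RDF)
            if value is None:
                # No rdf:resource attribute, so the value is literal text.
                value = (prop.text or "").strip()
            result.append((subject, uri(prop.tag), value))
    return result


for triple in triples_from_rdfxml(DOC):
    print(triple)
```

Full RDF/XML allows many more forms (nested nodes, blank nodes, `rdf:ID`, datatypes), which is why a real parser is the right tool for anything beyond a demonstration.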
Notation3 (N3) is an informal syntax that's easier to read and easier to scribble.
    @prefix rev: <http://amk.ca/xml/review/1.0#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix FOAF: <http://xmlns.com/foaf/0.1/> .

    <http://example.com/author/0042>
        FOAF:firstName "Bob";
        FOAF:surname "DuCharme";
        FOAF:homepage <http://www.snee.com/bob/>;
        FOAF:pastProject <urn:isbn:1930110111> .

    <http://example.com/rev1> rev:subject [
        = <urn:isbn:1930110111>;
        dc:title "XSLT Quickly";
        dc:creator <http://example.com/author/0042>;
        dc:publisher "Manning"
    ] .
Virtues:
Sins:
The most basic form of RDF software is simply an RDF parser. Parsers are available for most of the languages you might need:
Here's a Python example using rdflib 2.0.4 (www.rdflib.net).
    #
    # Initial setup -- create a TripleStore to hold RDF data
    #
    from rdflib.TripleStore import TripleStore

    store = TripleStore()
You can add the contents of several URLs, parsing the data as RDF/XML:
    store.load('http://www.amk.ca/amk.rdf')
    store.load('http://www.python.org/pypi/?project=Twisted?format=doap')
    store.load(...)
You can output the contents of a store:
print store.serialize(format='xml')
You can add triples to a store:
    from rdflib.URIRef import URIRef
    from rdflib.Literal import Literal
    from rdflib.Namespace import Namespace

    REVIEW_NS = Namespace('http://amk.ca/xml/review/1.0#')
    REVIEW_SUBJECT = REVIEW_NS['subject']
    # Equivalent to:
    ##REVIEW_SUBJECT = URIRef('http://amk.ca/xml/review/1.0#subject')

    book_uri = URIRef('urn:isbn:0609602330')
    t = (URIRef('http://www.amk.ca/books/h/Isaacs_Storm.html'),
         REVIEW_SUBJECT, book_uri)
    store.add(t)
You can also remove a triple:
store.remove(t)
The most general query method is triples(), which takes a (subject, property, object) 3-tuple, returning an iterator over the matching triples.
For example, to list all things which have a dc:title property:
    >>> DC_NS = Namespace('http://purl.org/dc/elements/1.1/')
    >>> DC_TITLE = DC_NS['title']
    >>> for s,p,o in store.triples((None, DC_TITLE, None)):
    ...     print s,p,o
    ...
    urn:isbn:0609602330 http://purl.org/dc/elements/1.1/title \
       Isaac's Storm
    urn:isbn:1930110111 http://purl.org/dc/elements/1.1/title \
       XSLT Quickly
    >>>
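The wildcard behaviour of such a query is easy to imitate. The sketch below is plain Python 3 over a list of tuples, with `None` standing for "match anything" in any slot. It only illustrates the idea and is not how rdflib implements `triples()`; the standalone `triples` function and the sample `store` are assumptions of this example.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
REV_SUBJECT = "http://amk.ca/xml/review/1.0#subject"

# Stand-in for a triple store: just a list of 3-tuples.
store = [
    ("urn:isbn:0609602330", DC_TITLE, "Isaac's Storm"),
    ("urn:isbn:1930110111", DC_TITLE, "XSLT Quickly"),
    ("http://example.com/rev1", REV_SUBJECT, "urn:isbn:1930110111"),
]


def triples(store, pattern):
    """Yield each triple whose slots equal the pattern's non-None slots."""
    for triple in store:
        if all(want is None or want == have
               for want, have in zip(pattern, triple)):
            yield triple


# Everything that has a dc:title property.
for s, p, o in triples(store, (None, DC_TITLE, None)):
    print(s, p, o)
```

A linear scan like this is fine for a demonstration; a store meant for larger graphs would normally index triples by subject, property, and object.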
Someday, there will be a query language (SPARQL example):
SELECT ?title WHERE (<urn:isbn:1930110111> dc:title ?title)
RDF Schema lets us define vocabularies (sets of classes and/or properties).
Example vocabulary:
Review class.
subject property.
First, define a prefix for the schema's namespace URI:
    @prefix rev: <http://amk.ca/xml/review/1.0#> .
To declare that a particular resource is a rev:Review, assert that the resource's rdf:type property is the class:
    # Declare a resource
    <http://example.com/review1> rdf:type rev:Review .
Describe what this resource is reviewing; what's the subject?
    # Supply subject
    <http://example.com/review1> rev:subject <http://www.music.com/album/6542>.
    <http://example.com/review1> rev:subject <urn:isbn:1930110111>.
So how do we define the class?
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix rev: <http://amk.ca/xml/review/1.0#> .

    # Declare a Review class
    rev:Review    # class URI: http://amk.ca/xml/review/1.0#Review
        rdf:type rdfs:Class ;
        rdf:ID "Review" ;
        rdfs:comment """Reviews are resources that express
          an opinion about some other resource.""" ;
        .

    # Declare a subclass of Review.
    rev:ComparativeReview
        rdf:type rdfs:Class ;
        rdfs:subClassOf rev:Review ;
        rdfs:comment """Comparative reviews examine multiple
          resources, comparing their relative merits and usually
          offering an opinion about which one is the best.""" .
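What software gains from `rdfs:subClassOf` is the ability to infer extra types: anything declared a `rev:ComparativeReview` is also a `rev:Review`. The following plain Python 3 sketch shows that inference. The function names and the `http://example.com/review2` resource are invented for illustration, and prefixed names are kept as short strings instead of full URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

graph = {
    ("rev:ComparativeReview", SUBCLASS_OF, "rev:Review"),
    # Hypothetical resource, declared only with the more specific class.
    ("http://example.com/review2", RDF_TYPE, "rev:ComparativeReview"),
}


def superclasses(graph, cls):
    """Collect every class reachable from cls via rdfs:subClassOf."""
    found = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(graph, resource):
    """Declared types plus every superclass of those types."""
    declared = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    inferred = set(declared)
    for cls in declared:
        inferred |= superclasses(graph, cls)
    return inferred


print(sorted(types_of(graph, "http://example.com/review2")))
```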
You can also specify properties in a vocabulary. The following fragment defines the rev:subject property:
    rev:subject
        rdf:type rdf:Property;
        rdfs:label "Subject property" ;
        # Resources which can have this property
        rdfs:domain rev:Review ;
        # Values this property can take
        rdfs:range rdfs:Resource ;
        rdfs:comment "Value is the resource being reviewed." ;
        .

    rev:title
        rdf:type rdf:Property;
        # This property only takes literal values
        rdfs:range rdfs:Literal;
        .
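One way to picture what `rdfs:domain` and `rdfs:range` provide is the sketch below, in plain Python 3. It treats the domain as an inference (a resource that uses `rev:subject` must be a `rev:Review`) and, as a simplification, treats a literal range as a check; RDF Schema itself describes inferences, not validation rules. The `schema` dictionary, the `Literal` marker class, and the sample data are all assumptions made for this example.

```python
class Literal(str):
    """Marks a value as a literal rather than a resource URI."""


# Hand-written summary of the two property declarations above.
schema = {
    "rev:subject": {"domain": "rev:Review", "range": "rdfs:Resource"},
    "rev:title": {"range": "rdfs:Literal"},
}

# Hypothetical data; the last triple misuses rev:title on purpose.
data = [
    ("http://example.com/review1", "rev:subject", "urn:isbn:1930110111"),
    ("http://example.com/review1", "rev:title", Literal("A fine book")),
    ("http://example.com/review1", "rev:title", "http://example.com/oops"),
]


def inferred_types(data, schema):
    """rdfs:domain: a subject using the property belongs to the domain class."""
    result = set()
    for s, p, o in data:
        domain = schema.get(p, {}).get("domain")
        if domain is not None:
            result.add((s, "rdf:type", domain))
    return result


def range_problems(data, schema):
    """List triples whose property expects a literal but got a resource."""
    return [(s, p, o) for s, p, o in data
            if schema.get(p, {}).get("range") == "rdfs:Literal"
            and not isinstance(o, Literal)]


print(inferred_types(data, schema))
print(range_problems(data, schema))
```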
With RDF Schema, we know:
We don't know:
OWL is a W3C language for defining this sort of relationship. Possible relationships:
Here's an OWL declaration of a class representing persons:
    @prefix gen: <http://genealogy.example.com/schema#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    gen:Person
        rdf:type owl:Class;
        rdf:ID "person" ;
        rdfs:comment "Resource representing a person." ;
        owl:equivalentClass foaf:Person;
        .
Define a property:
    gen:ancestor
        # Declare as a transitive property:
        # X -> Y, Y -> Z implies X -> Z
        rdf:type owl:TransitiveProperty;
        rdfs:domain gen:Person;
        rdfs:range gen:Person;
        # Declare as inverse of some other property
        owl:inverseOf gen:descendant;
        .
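To make the two declarations concrete, here is a small plain Python 3 sketch of what a reasoner could derive from them: the transitive closure of `gen:ancestor`, and the `gen:descendant` pairs obtained by flipping each one. It is a naive illustration, not an OWL engine, and the people `gen:alice`, `gen:bob`, and `gen:carol` are invented for the example.

```python
def transitive_closure(pairs):
    """Apply 'X -> Y, Y -> Z implies X -> Z' until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(x, z)
               for x, y in closure
               for y2, z in closure
               if y == y2}
        if new <= closure:
            return closure
        closure |= new


def inverse(pairs):
    """owl:inverseOf: swap subject and object of every pair."""
    return {(y, x) for x, y in pairs}


# Stated facts, as (subject, object) pairs of gen:ancestor:
# alice has ancestor bob, and bob has ancestor carol.
stated = {("gen:alice", "gen:bob"), ("gen:bob", "gen:carol")}

ancestor = transitive_closure(stated)
descendant = inverse(ancestor)

print(sorted(ancestor))
print(sorted(descendant))
```

The closure adds the pair (`gen:alice`, `gen:carol`), which was never stated directly, and the inverse then yields `gen:carol` having `gen:alice` as a descendant.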
OWL adds the ability to indicate when two classes or properties are identical.
OWL declarations provide additional information to let rule-checking and theorem-proving systems work with RDF data.
So how much of this stuff do you need to learn about and use?
But we don't need to aim for the stars. Simple things can be done without much effort, and can still be useful:
There are signs of life: FOAF has caught on, DOAP is rising, and many small projects are using RDF internally.
For example, LiveJournal publishes FOAF data for each user at http://<whatever>.livejournal.com/data/foaf.
These slides: www.amk.ca/talks/2004-12-02
For further information:
What Python library to use?
PyCon will be March 23-25 at GWU's Cafritz Center.
Deadline for proposals: Dec. 31st.
Call for papers:
http://www.python.org/pycon/2005/cfp.html
Proposal submissions: http://submit.pycon.org