To work on this assignment, you will need to find and do the following:
<sch:schema>
down. To write Schematron rules for a document in the TEI namespace, you will then replace this with:
<schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
    xmlns="http://purl.oclc.org/dsdl/schematron">
    <ns uri="http://www.tei-c.org/ns/1.0" prefix="tei"/>
</schema>
<schema>
root element.
Use the tei: prefix before each of your elements, since we are working with a document in the TEI namespace; otherwise none of your schema rules involving elements will fire! However, we do not use that prefix before attributes because the attributes are in no namespace.

The Digital Mitford project is working on a collection of prosopography data: a record of people, places, organizations, published works, and other named entities relevant to British author Mary Russell Mitford’s world in the nineteenth century. After some years of collaborative research, the collection of data (which we call our Site Index) contains thousands of entries, and it keeps growing as members of our project team contribute batches of new entries in the course of their research. It’s common for our editors to make typographical errors as they enter details about historical people in particular, since these entries can be especially complicated! That is why we need to write some Schematron rules to help people find and correct the common errors they will likely make as they are coding.
As you work on the rules below, think about how to group them logically into related pattern elements. You can use an @id on pattern elements to help label them and organize your work. Also, be sure to associate your Schematron file with the XML file you are testing as soon as you write your first rule so you can see if it is responding as you expect.
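As a starting point, here is a minimal sketch of how patterns might be organized inside the schema element. The @id values, the single rule, and the message wording are placeholders invented for illustration, not names the assignment requires.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns uri="http://www.tei-c.org/ns/1.0" prefix="tei"/>

    <!-- One pattern per group of related rules; the @id is only a label. -->
    <pattern id="person-checks">
        <!-- Elements take the tei: prefix; attributes are written without it. -->
        <rule context="tei:person">
            <assert test="@xml:id">Every person entry needs an @xml:id.</assert>
        </rule>
    </pattern>

    <pattern id="date-checks">
        <!-- Rules about birth and death dates would go here. -->
    </pattern>
</schema>
```

Associating the schema usually means that a processing instruction like the following appears at the top of the XML file being tested. The href value here is a hypothetical file name: <?xml-model href="mitford-rules.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>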
Skim through the Digital Mitford project XML you downloaded, and get a sense of how it is organized and the way we have nested information about individuals inside each person element. Notice:
tei:person has an @xml:id whose value is a distinct identity marker.
Inside tei:person elements there are tei:persName elements, some of which contain nested tei:surname and tei:forename elements.
tei:birth and tei:death elements, with attributes and contents telling us about when and where a person was born and died.
tei:person elements contain a biographical tei:note element with more information. These notes sometimes include references (made with @ref attributes) to people, places, books, and more listed elsewhere in the site index.

On the tei:person element, we want to check the way its @xml:id is written. In our project, when a historical person is given a unique identifier, that @xml:id value is supposed to begin with the most distinctive part of the person’s name, their last name. Since we code the tei:surname element as a descendant of tei:person, you may write a Schematron rule that tests whether the @xml:id starts with the contents of the tei:surname element. Hint: To specify an XML node (an element or attribute) as an argument in an XPath function, simply give the element name (without quotation marks) instead of a specific string.
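One possible shape for this rule is sketched below; treat it as an illustration under assumptions rather than the expected answer. It assumes the pattern sits inside a schema that declares the tei prefix, that only the first tei:surname inside a person matters, and that surnames appear in the @xml:id spelled exactly as in the element. The pattern @id and the message text are made up.

```xml
<pattern id="person-id-check">
    <!-- Only test person entries that actually contain a surname. -->
    <rule context="tei:person[.//tei:surname]">
        <!-- (…)[1] keeps starts-with() from receiving more than one node;
             normalize-space() trims stray whitespace around the name. -->
        <assert test="starts-with(@xml:id, normalize-space((.//tei:surname)[1]))">
            The @xml:id "<value-of select="@xml:id"/>" should begin with the surname
            "<value-of select="normalize-space((.//tei:surname)[1])"/>".
        </assert>
    </rule>
</pattern>
```

Notice that tei:surname is passed to the functions as a node, without quotation marks, as the hint suggests. Surnames containing spaces or apostrophes would probably need extra handling, since those characters cannot appear in an @xml:id.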
tei:forename, tei:surname, and tei:placeName elements, as well as any tei:persName elements that hold text and do not wrap around forename and surname elements, start with capital letters. Hints: use the pipe (|) to join these together. You last used the pipe when writing Relax NG. You can use it in Schematron (and XSLT) contexts, here specifically to join together multiple context items in one rule.

tei:birth and tei:death elements. All death dates need to be later than birth dates, but surprisingly, the TEI does not have a built-in way of checking this. Write a Schematron rule to flag when the dates coded in the @when attributes on any tei:birth and tei:death elements don’t make sense. For the purposes of this homework, it is fine to concentrate only on the @when attributes coded on tei:birth and tei:death (you can ignore other attributes containing dates).
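Returning to the capital-letter check on name elements, the pipe can join several context items in one rule, roughly as in the sketch below. This is only one way to do it: the pattern @id, the predicate used to pick out text-only tei:persName elements, and the regular expression are our own assumptions, and the fragment expects a schema that declares the tei prefix.

```xml
<pattern id="name-capitals">
    <!-- The pipe (|) unions four context items into a single rule. -->
    <rule context="tei:forename | tei:surname | tei:placeName
                   | tei:persName[not(tei:forename | tei:surname)]">
        <!-- \p{Lu} matches any uppercase letter, including accented ones. -->
        <assert test="matches(normalize-space(.), '^\p{Lu}')">
            This name should start with a capital letter.
        </assert>
    </rule>
</pattern>
```

A test like this will also flag names that legitimately begin in lowercase, so you may decide to refine it once you see what it catches.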
Some of these @when values are complete dates (yyyy-mm-dd) and others are only partial, and those, alas, will NOT convert to a machine-readable date with xs:date(), so we do not want to use that function here. Instead, we recommend that you work with the tokenize() function to isolate the year as the piece that we really need to look at, that is, the four-digit year that sits in front of the first hyphen. To reliably capture this piece, write the tokenize() function to break the attribute values into pieces around hyphens (tokenize on the hyphen) and write a position predicate to grab the first of the tokens. (Note: tokenize() is a wonderfully adaptable function! Even if the date value lacks any hyphens and only contains a year, this will still return that year, since the token just won’t break off!)
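The tokenize() approach might look something like the following sketch. It assumes each tei:person holds at most one tei:birth and one tei:death with a @when attribute (the [1] guards against extras), that the rule lives in a schema declaring the tei prefix, and that comparing years alone is good enough. The variable names, the pattern @id, and the message are invented for illustration.

```xml
<pattern id="life-dates">
    <!-- Fire only when both dates are available to compare. -->
    <rule context="tei:person[.//tei:birth/@when and .//tei:death/@when]">
        <!-- Split each @when on hyphens and keep the first token, the year. -->
        <let name="birthYear" value="tokenize((.//tei:birth/@when)[1], '-')[1]"/>
        <let name="deathYear" value="tokenize((.//tei:death/@when)[1], '-')[1]"/>
        <!-- number() converts the year strings so they compare numerically. -->
        <assert test="number($deathYear) ge number($birthYear)">
            The death year (<value-of select="$deathYear"/>) comes before the
            birth year (<value-of select="$birthYear"/>).
        </assert>
    </rule>
</pattern>
```

Because only years are compared, this sketch accepts a birth and a death in the same year; it cannot tell whether the months and days are in the right order.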
Upload your completed Schematron schema AND the si-Add-MRMsample.xml file with your Schematron associated to Courseweb, and follow our standard filenaming conventions for homework assignments uploaded to Courseweb.