NewtFire logo: a mosaic rendering of a firebelly newt
newtFire {upg dh|ds}
Creative Commons License Last modified: Friday, 03-May-2019 10:43:35 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

Schematroll, the Schematron mascotMeet Schematroll, the Schematron mascot! Schematroll is a cross between a bilby and a bettong.

Preliminaries

To work on this assignment, you will need to to find and do the following:

The goal:

The Digital Mitford project is working on a collection of prosopography data: a record of people, places, organizations, published works, and other named entities relevant to British author Mary Russell Mitford’s world in the nineteenth century. After some years of collaborative research the collection of data (which we call our Site Index) contains thousands of entries, and it keeps growing as members of our project team contribute batches of new entries in the course of their research. It’s common for our editors to make typographical errors as they enter details about historical people in particular, since these entries can be especially complicated! That is why we need to write some Schematron rules to help people find and correct the common errors they will likely make they are coding.

As you work on the rules below, think about how to group them logically into related pattern elements. You can use an @id on pattern elements to help label them and organize your work. Also, be sure to associate your Schematron file with the XML file you are testing as soon as you write your first rule so you can see if it is responding as you expect.

Survey the code

Skim through the Digital Mitford project XML you downloaded, and get a sense of how it is organized and the way we have nested information about individuals inside each person element. Notice:

Schematron rules to write and test

  1. On the tei:person element, we want to check the way its @xml:id is written. In our project when a historical person is given a unique identifier, that @xml:id value is supposed to begin with the most distinctive part of the person’s name, their last name. Since we code the tei:surname element as a descendant of tei:person, you may write a Schematron rule that tests whether the @xml:id starts with the contents of the TEI's surname element. Hint: To specify an XML node (an element or attribute) as an argument in an XPath function, simply give the element name (without quotation marks) instead of a specific string.

  2. Sometimes our editors don’t capitalize proper names! Check that all the tei:forename, tei:surname, and tei:placeName elements, as well as any tei:persName elements that hold text and do not wrap around forename and surname elements start with capital letters. Hints:
    • You can do that with one rule, and you can set multiple contexts using the union operator or pipe: | to join these together. You last used the pipe when writing Relax NG. You can use it in Schematron (and XSLT) contexts here specifically to join together multiple context items in one rule.
  3. Now let’s take a look at the dates coded in this file, coded in the tei:birth and tei:death elements. All death dates need to be later than birth dates, but surprisingly, the TEI does not have a built-in way of checking this. Write a Schematron rule to flag when the dates coded in the @when attributes on any tei:birth and tei:death elements don’t make sense. For the purposes of this homework, it is fine to concentrate only on the @when attributes coded on tei:birth and tei:death (you can ignore other attributes containing dates).
    • How to test for this: Some dates are given as full ISO years (yyyy-mm-dd) and others are only partial and those, alas, will NOT convert to a machine-readable date with xs:date(), so we do not want to use that function here. Instead, we recommend that you work with the tokenize() function to isolate the year as the piece that we really need to look at, that is, the four-digit year that sits in front of the first hyphen. To reliably capture this piece, write the tokenize() function to break the attribute values in pieces around hyphens (tokenize on the hyphen) and write a position predicate to grab the first of the tokens. (Note: tokenize() is a wonderfully adaptable function! Even if the date value lacks any hyphens and only contains a year, this will still return that year since the token just won’t break off!)
    • Remember, you are testing to see when a birth year is later than a death year, so you need to write a test that uses comparison operators.

Submission

Upload your completed Schematron schema AND the si-Add-MRMsample.xml file with your Schematron associated to Courseweb, and follow our standard filenaming conventions for homework assignments uploaded to Courseweb.