For our first XQuery exercise we’ll be working with a collection of Shakespeare’s
plays (actually not coded in TEI). We have uploaded these into our eXist XML database, and you can locate them in this collection: collection('/db/shakespeare/plays')
Write XQuery expressions for each of the following tasks using the eXide window in our eXist database, and test them by hitting the Eval
button. Then paste your XQuery expressions into a text
file, adding comments as needed. You will be submitting your text file to Courseweb.
… <TITLE> element is, which you will need to know in order to construct the XPath to retrieve it. The simplest answer is a single XPath expression. The output should look something like:

<TITLE>The Tragedy of Hamlet, Prince of Denmark</TITLE>
<TITLE>The Tragedy of Macbeth</TITLE>
<TITLE>The Tragedy of Romeo and Juliet</TITLE>
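Why the location of <TITLE> matters can be sketched with a small, hypothetical document built in memory. The sketch assumes that <TITLE> is also reused for act headings, as it is in many XML editions of Shakespeare. Check the actual files in /db/shakespeare/plays before relying on that. A child step from the document node reaches only the play-level title, while a descendant step (//) picks up every <TITLE> in the tree.

```xquery
xquery version "3.1";
(: A hypothetical, cut-down play built in memory for illustration only;
   the real documents live in collection('/db/shakespeare/plays'). :)
let $doc := document {
  <PLAY>
    <TITLE>The Tragedy of Macbeth</TITLE>
    <ACT>
      <TITLE>ACT I</TITLE>
    </ACT>
  </PLAY>
}
return (
  (: child steps: only the TITLE directly inside PLAY :)
  $doc/PLAY/TITLE,
  (: descendant step: every TITLE anywhere, including the act heading :)
  $doc//TITLE
)
```

If the assumption about act headings holds for the real plays, the same distinction decides whether you get exactly three titles or many more.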
… text() or data() or string(). Your output should look something like:

The Tragedy of Hamlet, Prince of Denmark
The Tragedy of Macbeth
The Tragedy of Romeo and Juliet
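The three functions named above can be compared on hypothetical elements built in memory. This is a minimal sketch, not a query against the play collection. On a simple element they all yield the same text. They differ once an element contains child elements: text() returns only the element's direct text-node children, while data() and string() take all of its descendant text. The <I> child in the second element is invented for the illustration.

```xquery
xquery version "3.1";
(: Hypothetical elements built in memory for illustration. :)
let $title := <TITLE>The Tragedy of Macbeth</TITLE>
let $mixed := <TITLE>The Tragedy of <I>Macbeth</I></TITLE>
return (
  $title/text(),   (: the element's text-node child :)
  data($title),    (: its typed value, an xs:untypedAtomic :)
  string($title),  (: its string value, an xs:string :)
  string($mixed),  (: 'The Tragedy of Macbeth': all descendant text :)
  $mixed/text()    (: only the direct text child, 'The Tragedy of ' :)
)
```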
… <TITLE> element. You will need to use count() and distinct-values(), and you’ll need a construction involving count(of something) gt 40.
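How count() and distinct-values() combine can be sketched on a literal sequence. The speaker names below are hypothetical stand-ins for values taken from <SPEAKER> elements. distinct-values() atomizes its input and keeps each value once, so counting its result counts different names rather than every occurrence.

```xquery
xquery version "3.1";
(: Hypothetical speaker names; in the plays they would come from
   the <SPEAKER> elements. :)
let $speakers := ('HAMLET', 'HORATIO', 'HAMLET', 'MARCELLUS', 'HORATIO')
return (
  count($speakers),                        (: 5: every occurrence :)
  count(distinct-values($speakers)),       (: 3: each name once :)
  count(distinct-values($speakers)) gt 2   (: true :)
)
```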
Find the collection, drill down to the <PLAY> elements in it (you know there are three of them), then filter them on whether they contain more than 40 distinct <SPEAKER> values, that is, more than 40 different speaker names. Once you’re getting the one play that meets that description, you can add a path step to retrieve its <TITLE>.
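The filtering step described above can be sketched on two hypothetical, cut-down plays built in memory. The threshold is 2 instead of 40 to keep the data small. The predicate keeps only the <PLAY> elements that pass the test, and the final /TITLE step then retrieves their titles. Play B also contains three <SPEAKER> elements, so filtering on count(.//SPEAKER) alone would keep both plays. distinct-values() is what separates them.

```xquery
xquery version "3.1";
(: Two hypothetical plays built in memory for illustration only. :)
let $plays := (
  <PLAY>
    <TITLE>Play A</TITLE>
    <SPEECH><SPEAKER>X</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>Y</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>Z</SPEAKER></SPEECH>
  </PLAY>,
  <PLAY>
    <TITLE>Play B</TITLE>
    <SPEECH><SPEAKER>X</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>X</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>X</SPEAKER></SPEECH>
  </PLAY>
)
(: keep plays with more than 2 different speaker names,
   then step down to their TITLE; returns <TITLE>Play A</TITLE> :)
return $plays[count(distinct-values(.//SPEAKER)) gt 2]/TITLE
```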
… base-uri() function can be useful. Try appending base-uri() to your XQuery expression and run it: What result do you see in the output window, and what is it telling you?

… ( / ) in the preceding results of base-uri()? How could we remove the text that comes before the part we want, so that the output shows only the final piece? We would use the tokenize() function (which you can look up in the w3schools list of XPath functions or in the Michael Kay book). That function breaks a string apart by splitting it wherever a particular regex pattern matches. In this case the pattern is the forward slash. The tokenize() function returns tokens, the broken-off pieces of the string: each chunk before, between, and after the matches of the regex you enter.

To isolate just the piece we want, we can identify it by its position in the sequence of tokens: is it the first token, the second, the third, or the last? To retrieve a token by position, place a predicate holding the position value after the tokenize() call: [1], [2], etc. Keep in mind that when the string begins with a slash, as a path like /db/shakespeare/plays does, the first token is an empty string. To retrieve the last item in a sequence without knowing its numerical position, you can use the last() function inside the predicate, as in [last()]. You can read about it in the same resources mentioned above or in The XPath functions we use the most. Note that nothing goes inside the parentheses in last().

With this information, then, how would you write your XQuery to return just the last part of the results of the base-uri() function, the part that appears after the last forward slash character? Record your expression. When you have completed the assignment, upload your text file containing your XQuery expressions to Courseweb.
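How tokenize() combines with positional predicates and last() can be sketched on a literal string. The path below is made up, shaped like a document path inside /db/shakespeare/plays. The filename hamlet.xml is hypothetical, and in the exercise the string would come from base-uri() rather than a literal. Because the string starts with a slash, tokenize() returns an empty string as its first token. For that reason the count is 5 and "db" sits at position 2.

```xquery
xquery version "3.1";
(: A made-up path for illustration; the filename is hypothetical. :)
let $uri := '/db/shakespeare/plays/hamlet.xml'
let $tokens := tokenize($uri, '/')
return (
  count($tokens),   (: 5: the leading slash yields an empty first token :)
  $tokens[2],       (: 'db' :)
  $tokens[last()]   (: 'hamlet.xml': no position number needed :)
)
```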