Spring 2023: Classes meet M W F 12:20 - 1:10pm in Witkowski 109.
Schedule: Spring 2023
DIGIT 210: Lionpath class number: 5835. This course fulfills a core Digital Humanities requirement for the Digital Media, Arts, and Technology (DIGIT) major and an elective toward the Data Visualization Minor at Penn State.
Instructor
Dr. Elisa Beshero-Bondar (“Dr. B”), Professor of Digital Humanities and Program Chair of DIGIT.
- E-mail: eeb4 at psu.edu
- Office locations: 128 Kochel, also the lounge area near Witkowski 109, and online over Slack and Zoom.
- Office Hours: Mon. 1:10 - 2:30pm near Witkowski 109, Tues. 1:45 - 3:30 pm in Kochel 128, and by appointment.
Text Analysis: Course Description
This course orients you to text and document data formats and engages you in hands-on coding and programming activities to manipulate them: to explore how computers read and process natural language patterns in texts, and to translate unstructured texts into structure that lets you track, visualize, and explore complex data. In this course, you will learn methods for marking, extracting, and analyzing data from digital documents to produce infographics such as graphs, charts, diagrams, and maps, which you will design in the context of real projects. This course is meant to complement DIGIT 110: Text Encoding, but where the emphasis in that course is on curating and preparing reading views of documents, this course concentrates on analyzing data. Neither course is a prerequisite for the other: you may take either one as a beginner. Returning students (in either semester) review and help mentor beginning students in the overlapping units they have already experienced in the other course.
Learning to Code: Our Context
You do not need any background at all with computer programming or web development to succeed in this course. We teach practical programming as a foundational skill (like reading, writing, and arithmetic) that all students should experience regardless of major or background. We also teach it in the writerly context of clear communication and documentation, which helps to build communities and connect projects over long periods of time.
Learning Objectives:
- Work with Texts as Data-bearing Digital Artifacts
  - Prepare digital documents to curate and organize information accessible on the World Wide Web.
  - Gain practical experience with document data curation.
  - Learn and practice several kinds of coding to address unstructured texts and to produce marked-up structure at scale in support of analytical research.
- Gain Confidence with Reading and Writing Code in Multiple Environments
  - Gain code literacy: recognize common patterns in code syntax.
  - Gain experience with looking things up and applying them to your purpose.
  - Gain confidence in your ability to learn, adapt, and experiment with code.
- Gain Experience with Natural Language Processing (NLP), Document Data Modeling, Distant Reading, and Autotagging Techniques
  - Write code to search and extract data with multiple kinds of pattern-matching algorithms, including forms of regular expression matching, taking conventional Boolean searches and library database searches to new levels.
  - Learn how to autotag large texts or collections of texts for practical results: to encode the structure of enormous texts from a distance in order to locate data and make it accessible for navigation.
  - Apply mining and drilling methods to interact with texts and visualizations in ways we could not "manually" or with unassisted eyes and brains.
  - Reflect on the possibilities and limitations of text data processing and visualization.
- Gain Project Design and Editing Experience
  - Gain digital editing experience by proposing, designing, and contributing to one or more digital research projects, applying coding to preserve, share, and investigate textual resources.
  - Transform unstructured and structured text into publishable web formats to build a project website.
  - Design navigation elements, and build visual aids and models (such as timelines and tree diagrams) from texts, generating charts and images from extracted data.
Optional Textbook and Other Class Resources
- Michael Kay, XSLT 2.0 and XPath 2.0: Programmer’s Reference, 4th edition (Wiley Publishing, 2008) ISBN-13: 978-0-470-19274-0. This book is optional, and I have not requested it at the bookstore. I have two copies, and it is available in the Penn State Libraries as an e-book. This is the authoritative word on XSLT and XPath, written by a designer of the official W3C specifications of XSLT that we are using. We do not require you to buy it, but we recommend it as a powerful reference to have at your fingertips and for learning more on your own. A Kindle edition is available, but it is poorly designed for searching, so we prefer the hardcover print edition. If you purchase it, be sure to pick up the latest edition (from 2008), not the earlier versions.
Software to install
Download and install the following software on your own personal computer(s) on or before the first day of class. These software tools are available in our campus computing labs, too.
- <oXygen/> XML Editor. (You will probably have this installed from DIGIT 100 or 110.) The DIGIT program has purchased a site license for this software, which is installed in Kochel 77, the Lilley Library computers, and Witkowski 109. The license also permits students enrolled in the course to install the software on their home computers (for course-related use only). When installing this on your own computers, you will need the license key, which we have posted in the Announcements section of our Canvas course.
- AntConc: (You may have this installed from DIGIT 100.) Free corpus text analysis tool.
- We will ask you to install Python version 3.8 or higher on your computer, and to install PyCharm Edu to help you learn and write Python code with syntax checking. Follow the instructions and links from PyCharm ( https://www.jetbrains.com/help/pycharm/quick-start-guide.html#meet ), paying attention to what you need for your own computer system. Feel free to download and explore PyCharm Edu on your own before we start working with it together: https://www.jetbrains.com/pycharm-edu/. Also, configure Anaconda so that it is available within PyCharm, following this guide: https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html. (We will provide guidance on this in class.)
- Zoom: Make sure your Zoom installation is up-to-date, and you are ready to connect. Sometimes we will record portions of class meetings and tutorial sessions for future reference to share over Zoom. Look for these in Canvas Announcements and use the Zoom menu option in Canvas to access these meetings.
- We will use GitHub for sharing code and for project management. Create an account (choose the free options) at https://github.com and install the GitHub client software for your operating system on your own computer. (We will explain how to use git and GitHub in this course.)
- We will use the Slack chat platform for discussion and for asking questions (see https://slack.com/help/articles/218080037-Getting-started-for-new-members). Download and install the Slack client, configuring your account to use your Penn State email address (the official address, which looks like xyz123@psu.edu, and not an alias based on your name that you may have set up), so you can join our Slack workspace: DIGIT-coders. When you receive an invitation to join this workspace, please accept it.
- Later in the semester we may ask you to install a local copy of the eXist-db XML database, which you can download from https://exist-db.org/.
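Once Python is installed, a short script like the following can confirm that the interpreter PyCharm (or your terminal) is using meets the 3.8-or-higher requirement. This is a minimal sketch, not an official setup step; the helper name `meets_requirement` is just an illustration, and you could save the script under any file name you like.

```python
import platform
import sys

# Minimum version named in the software list (Python 3.8 or higher).
REQUIRED = (3, 8)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) part of version_info is at least `required`."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    # sys.executable shows *which* interpreter is running, which helps when
    # several Pythons (for example, an Anaconda environment) are installed.
    print("Python", platform.python_version(), "at", sys.executable)
    if meets_requirement():
        print("OK: this interpreter meets the Python 3.8+ requirement.")
    else:
        print("This interpreter is too old: please install Python 3.8 or higher.")
```

If the path it prints is not the Python or Anaconda environment you expected, that is a sign PyCharm is configured to use a different interpreter.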
Class Web Resources:
- Course Home Website: https://newtfire.org/courses/textAnalysis/ Home of our syllabus and schedule.
- textAnalysis-Hub: https://github.com/newtfire/textAnalysis-Hub Class GitHub Repository and Issues Board
- Canvas: http://canvas.psu.edu To submit homework assignments and exams and read private course announcements
- File Conventions for Canvas Assignments
- Guidelines for Projects Developed in This Course
- Student Course Projects
- Explanatory Guides and Exercises: Complete List
- More resources will be added as we work together this semester.
Grading:
Homework Exercises (30%):
To keep up with this class, you must work on exercises regularly. Each day will involve a small assignment to prepare you for the next class and to help you build your course project. Homework grading: Homework is scored on satisfactory completion, so your grade for the homework portion is the percentage of homework marked complete. Earn an A- in homework by completing 90% of it. Homework may be returned with a request to revise and resubmit; it will not count as completed unless you resubmit it.
About homework assignments: Coding and project review exercises in this course are about your active learning, and not, as in other courses, a way of testing whether you have already learned something we covered in class or in an assigned reading. You may often need to look up how to do something that you don’t already know how to do. Often there will be multiple ways of accomplishing a task, and I am not looking for you to do things perfectly in just one way. Instead, I am looking for signs of your active learning process as you take on a challenge. Documenting problems is key to learning, and sometimes just writing out what you are trying to do helps lead you to a solution! There may be times when you don’t get the result you want in the homework, and that is to be expected! In those cases you can still get full credit for the assignment if you’ve made a serious attempt and if you submit, along with your code, a description of what else you tried, what results you expected, what results you got, and what you think went wrong. Getting stuck is part of the learning process. You will see me get stuck sometimes, and I will need your eyes to help me fix something! As long as you’ve described your understanding of the problem and your attempts to resolve it on your own, you will do well: documentation of how you get stuck is key. One of our goals is to form a supportive coding community in this class, so that we are comfortable helping each other get unstuck.
I will read and evaluate all student homework, and will post assessments on Canvas. Coding homework is basically marked “complete” (1 point) or “incomplete” or “redo” (0 pts). If you are asked to redo an assignment, it is considered incomplete or problematic. If you resubmit a “redo” to correct a serious problem, you will receive full credit for the assignment. I will post comments for feedback and learning purposes, and you will find these comments on Canvas, sometimes in your coded homework file. If you have not engaged with the assignment adequately (whether that means solving the tasks or discussing the coding obstacles you encountered and how you dealt with them), I may ask you to meet with me to review the issues and then complete a follow-up (redo) task in order to receive credit. For assignments with posted solutions, I will invite you to review the posted solution on GitHub and comment on it (we will show you how to do this) to note something you learned from the solution or did in a different way. For some assignments where we review posted solutions together in class, we will write back to you with individual comments only if your specific submission raises an issue that we don’t address elsewhere. When much of the class is stuck on something, we will also go over the assignment together in class. If I don’t return your assignment, that means I found nothing to add to our posted solution. In those cases, if you have any questions about your work after reading the posted solution, please ask.
Participation: In Class, on GitHub, and on Slack (15%):
Coding and programming in real life is a social activity. Professionals in the real world aren’t “know-it-all” experts who work alone; rather, they are tuned into discussion boards and regularly ask and answer questions to stay sharp and to learn from their community. In this class, we want you to work together and talk to each other and to your instructors as your community resource, so we have built this into our course participation grade as a formal expectation. Beginning in week two, we expect each student to post at least once per week on our course GitHub repo, and we strongly encourage you to do more than this minimum. Earn an “A” in participation by asking questions, making suggestions, and sharing helpful resources you’ve found. Help each other out by trying to answer questions on GitHub (and read the instructor posts too, as we wade in to help). Your instructors will likely dominate class time as we model concepts and methods, so the GitHub Issues board gives students a good space to form a coding community, help each other, and reflect together. Also, if you have a question about an assignment, always think of our GitHub Issues board as your first resource to check for helpful hints and to post your questions, because others may have the same question and answers are best shared! Of course you may e-mail us, but we really prefer that you go to the discussion board first, and doing so is, after all, worth course credit toward your participation grade.
Issue posts: Throughout the course, we’ll assign discussion posts on our class GitHub site and in our Slack workspace, worth points toward your Participation grade. You will be discussing online readings or evaluating web resources. Your post should do more than summarize the article or site (which you could do just by skimming or reading the first paragraph); it should demonstrate thoughtful reflection on specific ideas and issues. When evaluating a web resource, don’t simply praise or condemn it; go into detail about why a key component is effective or poorly designed. Good posts demonstrate care and reflection, and you may choose to respond to the overarching ideas of a piece or to selected details of specific interest.
Tests (25%):
There will be a few (three or four) tests, scheduled throughout the course, on the concepts and the various kinds of markup technologies we are learning. All will be take-home or taken online between classes. They are open-book and open-notes, but they must be completed individually, and they are designed to demonstrate that you have learned from the class material, coding assignments, and posted solutions. Tests may resemble homework assignments, but unlike homework exercises, which are scored on completion, tests receive letter grades because they are evaluative: they show what you have learned after we have finished a coding unit.
Projects (30%):
This course involves working on a team-based semester project. Project work will be scheduled with paced due dates throughout the semester, and will give you experience working as a team to explore a research question and to document methods and discoveries using the coding and text analysis technologies addressed in our course.
Grading Scale:
Grades for the course are calculated and posted on Canvas, and follow this standard scale: A: 93-100%, A-: 90-92%, B+: 87-89%, B: 83-86%, B-: 80-82%, C+: 77-79%, C: 70-76%, D: 60-69%, F: 59% and below. Students taking the course on an S / NC (pass-fail) basis must earn a C to receive Satisfactory credit.
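To see how the category weights and the letter-grade scale combine, here is a hypothetical sketch in Python. It is an illustration only, not how Canvas actually computes grades, and it rests on stated assumptions: each category is already expressed as a percentage; the tests described under Tests and the exams described under Exam Policy are the same category, with the single lowest score dropped; and a fractional percentage (such as 92.5) falls into the lower band, since the posted scale lists whole numbers only. The function names and sample scores are made up.

```python
from typing import Dict, List

# Category weights as listed in the Grading section (they sum to 100%).
WEIGHTS = {
    "homework": 0.30,       # percentage of homework marked complete
    "participation": 0.15,
    "tests": 0.25,
    "projects": 0.30,
}

# Lower bound of each letter grade on the posted scale, highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (70, "C"), (60, "D"), (0, "F"),
]


def homework_percent(completed: int, assigned: int) -> float:
    """Homework is scored on completion: the share of assignments marked complete."""
    return 100 * completed / assigned


def average_dropping_lowest(scores: List[float]) -> float:
    """Average the test scores after dropping the single lowest one (assumption above)."""
    kept = sorted(scores)[1:] if len(scores) > 1 else list(scores)
    return sum(kept) / len(kept)


def course_percent(category_percents: Dict[str, float]) -> float:
    """Weighted sum of the four category percentages."""
    return sum(WEIGHTS[name] * pct for name, pct in category_percents.items())


def letter_grade(percent: float) -> str:
    """Map a course percentage to a letter using the posted cutoffs."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    scores = {
        "homework": homework_percent(27, 30),                 # 90.0
        "participation": 95.0,
        "tests": average_dropping_lowest([78, 85, 91]),       # 78 dropped -> 88.0
        "projects": 92.0,
    }
    total = course_percent(scores)
    print(f"{total:.2f}% -> {letter_grade(total)}")
    print("Satisfactory under S / NC (C or better):", total >= 70)
```

With these made-up numbers the weighted total is 27 + 14.25 + 22 + 27.6 = 90.85%, which lands in the A- band.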
Course Policies:
Each day we are covering material that builds on earlier material and assignments, so your success depends upon regular attendance and completing each assignment on time.
Due dates and why we need them:
Your daily homework for this course is time-sensitive, because it is connected to a daily learning process. Keeping up with assignments, even if you do not always do them correctly, is key to what we discuss in class, helps you with your next steps, and makes it possible for you to help build the semester project with your team. Work with the schedule and upload coding assignments, response posts, and other homework exercises to Canvas (or to GitHub or our web server, as specified) by the due date and time indicated on the class schedule. Sometimes I will ask you to revise your work, and it always helps to have a starting point, even if you know it is not correct when you first submit it. Homework assignments will be posted online to our class website and linked from Canvas, so students who miss class are still expected to consult the schedule and submit assignments on time. Because we post, discuss, and share answers to homework exercises after submission deadlines, we will usually not accept late homework submissions.
Exam Policy:
Exams will be take-home, to complete on your own time, with submissions due in Canvas or by web submission. Because I will be posting answers and sharing them in class, I cannot accept exams written after the solutions are posted. However, I will drop your lowest exam score for the class, so you may miss one exam without penalty.
Attendance and Classroom Courtesy:
I expect your active presence and interaction with me and your classmates this semester. Being an active part of this class means helping to form a community. We need to rely on each other in the classroom and online in our coding environments to learn and develop projects. Attendance is about connecting and being part of our class community of coders.
Our class is fast-paced and requires that we all make the best use we can of our in-person class sessions. Arriving late and leaving early (physically or remotely) disrupts the important collective mental activity of class. So does in-class texting and checking your cell phone. During class time, I ask that you put mobile devices in Do Not Disturb mode. While class is in progress, talking disruptively, leaving the classroom, texting or using a cell phone or computer, reading a newspaper, or other distracting behavior will be actively discouraged.
If you need to miss classes for health reasons, make arrangements with me and your peers to catch up. Stay in the class loop by consulting Canvas and checking in over Slack and GitHub.
Student (and Faculty) Health and Wellness Services
If any of us, students or professor, are feeling sick this semester, whether with COVID, flu-like symptoms, or other serious ailments, please contact Behrend Student Health & Wellness Services at 814-898-6217. Reporting in when you do not feel well is not shameful; it is responsible and important to protect yourself and our community.
Please do not attend our physical class if you are not feeling healthy! Stay home, report symptoms, get tested. This applies to me as your professor as well as to you!
Counseling Services
This semester may be stressful for all of us. Many people at Penn State face personal challenges or have psychological needs that may interfere with their academic progress, social development, or emotional well-being. Seek help! The university offers a variety of confidential services to help you through difficult times, including individual and group counseling, crisis intervention, consultations, online chats, and mental health screenings: see resources posted at https://behrend.psu.edu/student-life/student-services/personal-counseling. These services are provided by staff who welcome all students and embrace a philosophy respectful of clients’ cultural and religious backgrounds, and sensitive to differences in race, ability, gender identity and sexual orientation. Counseling and Psychological services are available through the Personal Counseling Office in Reed Union Bldg. Rm 1: 814-898-6504.
LionHelp App
LionHELP is a smartphone application, available for both iOS and Android, that you can download if you or someone you know may be facing a mental health emergency. This app provides information about the signs of a mental health crisis, how to talk to someone who may be in crisis, a guide to help refer someone to the appropriate resource, and a full list of resources available on campus. The app can be downloaded free of charge, and there is absolutely no tracking of any information. Please note that LionHELP is not a diagnostic tool and should not take the place of services provided by a licensed mental health professional.
Equity
Penn State takes great pride in fostering a diverse and inclusive environment for students, faculty, and staff. Acts of intolerance, discrimination, or harassment due to age, ancestry, color, disability, gender, gender identity, national origin, race, religious belief, sexual orientation, or veteran status are not tolerated and can be reported through Educational Equity via the Report Bias webpage (http://equity.psu.edu/reportbias/).
E-mail:
Each student is issued a University email address (username@psu.edu) upon admission. This email address may be used by the University for official communication with students. Students are expected to read email sent to this account on a regular basis. Failure to read and react to University communications in a timely manner does not absolve the student from knowing and complying with the content of the communications. The University provides an email forwarding service that allows students to read their email via other service providers (e.g., Hotmail, AOL, Yahoo). Students who choose to forward their email from their psu.edu address to another address do so at their own risk. If email is lost as a result of forwarding, it does not absolve the student from responding to official communications sent to their University email address. To forward email sent to your University account, go to http://accounts.psu.edu, log into your account, click on Edit Forwarding Addresses, and follow the instructions on the page. Be sure to log out of your account when you have finished.
Academic Integrity
Penn State Erie, The Behrend College, puts a very high value on academic integrity, and violations are not tolerated. Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity; respect other students’ dignity, rights and property; and help create and maintain an environment in which all can succeed through the fruits of their efforts. Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others (Senate Policy 49-20 and G-9 Procedures). Any violation of academic integrity will receive academic and possibly disciplinary sanctions, including the possible awarding of an XF grade, which is recorded on the transcript and states that failure of the course was due to an act of academic dishonesty. All acts of academic dishonesty are recorded so that repeat offenders can be sanctioned accordingly. More information on academic integrity can be found at: http://psbehrend.psu.edu/intranet/faculty-resources/academic-integrity/academic-integrity.
Source Citation and Plagiarism: One goal of our course is to reflect on how best to cite sources in digital contexts, including applications of artificial intelligence. We will consider how and why such citations differ from documenting printed texts. We will also consider the ease and frequency with which digital texts and graphics are plagiarized on the worldwide web, and discuss how the omission of source citations detracts from the authority of a digital information resource. We expect you to practice mindful source citation, and plagiarism on your part will have very serious consequences.
Representing the voice of another individual as your own voice constitutes plagiarism, however generous that person may be in “helping” you with an assignment. Turning in an assignment generated collectively under the name of a single individual is considered plagiarism. When instructed to collaborate on a project, project collaborators share collective authorship and should identify themselves directly as a team. To avoid plagiarism, cite your sources whenever you quote, paraphrase, or summarize material, or use digital images from any outside source (including websites, articles, books, course readings, Courseweb or GitHub postings, or someone else’s notes). When using the “copy” and “paste” features as you read and research, be sure to mark clearly that these passages are unprocessed text from their source, so that you know to process them later. Forgetting to do so not only produces sloppy work but (whether you intended it or not) results in a false representation. As long as you make a good faith and clear effort to cite your sources, you will not be faulted for plagiarism, but your work will be penalized if citations are inaccurate, unclear, or lack important information.
That said, the coding and digital development we do encourages collaboration, and for that reason we adopt our colleague David Birnbaum's Collaboration policy, since his course is very similar to ours. This policy specifies that students identify collaborators in a comment on submitted assignments, and that on projects we take care that all students contribute equally (with no student contributing far more than everyone else). When joining a group homework session, always work on the assignment by yourself first so you can be an equal participant, and write up the assignment by yourself after the session is over, taking care not to copy from the other students. While we want you to consult with each other, you are responsible for doing all your writing and coding by yourself, in your own words.
Disability Services:
This course could pose certain issues related to physical abilities. Please talk to me if you need help navigating the course or accessing our resources. In the case of documented disabilities, students must meet with the instructor to discuss their specific accommodations. In order to receive consideration for reasonable accommodations, you must contact the appropriate disability services office at the campus where you are officially enrolled, participate in an intake interview, and provide documentation: see the documentation guidelines (http://equity.psu.edu/sdr/guidelines). If the documentation supports your request for reasonable accommodations, your campus disability services office will provide you with an accommodation letter. Please share this letter with your instructors and discuss the accommodations with them as early as possible. You must follow this process for every semester that you request accommodations. Penn State Behrend’s Disability Services Coordinator is Amy James (ajk7@psu.edu).
Career Services
Career Services prepares Penn State students to enter the workforce or graduate school through a variety of services. Career professionals will assist with resume and cover letter reviews, internship and job searches, interview prep and mock interviews, career fair prep, development of career competencies, and graduate school prep. Be sure to use Career Services for all of your career endeavors, and start planning your career early! Do not wait any longer: check out their website and/or stop by their office in Reed 125 during drop-in hours, Monday-Friday, 12:00-4:00 p.m. Make an appointment following the instructions on the Career Services website, or call 814-898-6164.
Inspiration
We gratefully acknowledge David Birnbaum’s Digital Humanities course at the University of Pittsburgh as our starting point and supporting resource for much of our development. Other inspirational resources include:
Projects that inspire us:
- Obdurodon: where we learned what we can teach, and where we’re still learning.
- Venice Time Machine: very ambitious, enormous project team of faculty and students to study and model a thousand years of Venice, digitizing "kilometers of archives."
- Map of Early Modern London
- Lord Byron and His Times: The very thoughtful stylistic design of this important project reproduces the style of nineteenth-century print and layout. The content makes many rare materials about Lord Byron’s social network searchable and connected to the web of linked open data.
- The Shelley-Godwin Archive: digitizes the manuscripts of Percy and Mary Shelley, and Mary Shelley’s parents, William Godwin and Mary Wollstonecraft—manuscripts often written in multiple hands. Provides an important study of the Frankenstein notebooks to demonstrate how much of a role Percy Shelley played in the writing of Frankenstein. The archive provides a good model of the use of TEI for manuscript encoding and of complex and multiple visualizations of manuscript texts.
- A Tour Through the Visualization Zoo
- Clay Shirky on Love, Internet Style (9 minutes of Youtube inspiration: on what lasts, and why community matters in our digital worlds.)
Previous versions of this course
- Spring 2022:
- Spring 2021:

