UMBC CMSC 491/691 Fall 2022
Knowledge Graphs


HW1: Genealogy with rdflib

Out Oct. 2; due Oct. 15

Get your repo

In this assignment you will use rdflib to create a simple ontology that defines a few family relations and to infer additional facts using RDFS and OWL. We recommend doing this on your own computer, which will be easier. You can also try developing it on Colab, but that will require wrangling the files a bit more. At the end, we need the following files:

  • family.ttl: A slightly extended ontology for describing people and family relations
  • myfamily.ttl: Data about you and some of your family, which can be anonymized or fictional
  • hw1.ipynb: The modified notebook you used to combine your ontology and data and produce the required output
  • hw1.py: The Python program generated from your notebook
  • myfamily_plus.ttl: The file created by the notebook with your family and the additional facts inferred by owlrl
  • me.ttl: A file with all of the triples where "you" are either the subject or object
  • README.md: A file with a few questions to answer
You might find the webpage OWL References for Humans to be a useful reference for OWL properties and classes.

1. Installing required software

On your own computer, you will need a fairly recent version of Python 3. Use pip to install the rdflib and owlrl packages. If you choose to use Colab, you can follow the Colab notebook example from class that does RDFS and owlrl reasoning. It's fine to do the development in a Jupyter or Colab notebook, but we want you to download the notebook both as a notebook and as a Python program.
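As a quick sanity check after installing, a short script along the following lines can report whether the two packages are importable. It is only a sketch: the helper name missing_packages is made up for illustration and is not part of the assignment.

```python
import sys
from importlib.util import find_spec

# Packages this assignment relies on.
REQUIRED = ("rdflib", "owlrl")


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


print("Python version:", sys.version.split()[0])
missing = missing_packages()
if missing:
    # Suggest the pip command for whatever is not installed yet.
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("rdflib and owlrl are both importable.")
```

If anything is reported missing, running the suggested pip command in the same Python environment should fix it.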

Here are some tools that you might find useful.

  • The Turtle Web Editor website lets you load, paste, or type some Turtle into its editor, which has syntax highlighting and can do validation. When you click on the button to validate the Turtle, if it finds problems, it will tell you the line number of the first one. You can then edit that line and try validating again. When you have fixed all of the problems, you can download your file or just select the text to paste it into an editor on your computer. The system is open-source and you can download it and run it locally if you like.
  • If you can't remember the full URI for a prefix like foaf, rdfs, or yago, Prefix.cc will help. The simple interface lets you enter a common prefix (or a name like Wikidata) and it will show you the full URI with a link to its specification or documentation if any exist. It can also show the most popular namespaces based on how often people have inquired about them.
  • One way to create ontologies or data for them is to type Turtle directly into your favorite text editor. Many popular editors, like VS Code and Emacs, have support for Turtle. VS Code with GitHub Copilot even seems to help. Another way is to use the Protege editor to visually create or edit a file of RDF or OWL in any of the common serializations. There's a bit of a learning curve and we will cover how to use it later, but if you want to get started, see the Protege Getting Started pages.

2. Cloning your HW1 repository

Your initial repository will have a starter file for each of the files that are part of the assignment, but some of them will initially be empty.

3. Developing your ontology and data files

This assignment asks you to make a few additions to a simple ontology about family relations and then to use it to represent data about a family with at least ten people spanning at least four generations. You might use part of your own family network but feel free to protect your privacy by changing names and other data. One of the people should be identified in the notebook as you, though.

3.1 family.ttl

We've defined an initial file family.ttl that uses the URI <http://example.org/family/> and prefix fam. It defines a single entity type, fam:Person, and a set of properties, such as fam:hasChild, fam:hasParent, and fam:grandParent. The predefined relations have RDFS domain and range properties and some have properties that allow inference. fam:grandParent, for example, says that it holds between two fam:Person entities if they are connected by a chain of exactly two fam:hasParent links.

This homework asks you to extend the family.ttl ontology with at least four additional properties. You should use each of the four on several of the people in your myfamily.ttl file. You could use, for example, some of the properties from the foaf vocabulary, like foaf:mbox or foaf:schoolHomepage. At least one of the new properties should hold between two objects (i.e., be an owl:ObjectProperty) rather than going from an object to a literal (i.e., be an owl:DatatypeProperty). This could be something like fam:sibling or fam:uncle. For these, you need not provide definitions that would allow them to be inferred from other relations (e.g., that a sibling relation holds between two different people if they share a parent). We need to cover more of OWL to do that.

3.2 myfamily.ttl

You should specify facts about your own immediate family and include all of the relations. Include at least ten people (including yourself) and all relevant base relations. Include people that span at least four generations, so that there will be some grandparent relations. Again, feel free to anonymize everyone but you or to add fictional people. You should add only enough facts to ensure that the RDFS and OWL inferences can complete the data. For example, you might only add fam:childOf relations and let the owlrl package add inverses. Similarly, the domain and range constraints should allow the system to infer that an entity is a fam:Person, so you need not explicitly assert that many individuals are instances of fam:Person.

4. Your notebook

Edit the notebook file hw1.ipynb to specify the identifier that represents you in your myfamily.ttl file. The notebook will (1) create a graph using your ontology (family.ttl) and data (myfamily.ttl), (2) compute its deductive closure using owlrl, and (3) serialize the results as the files myfamily_plus.ttl and me.ttl. If you get errors when the notebook tries to load the Turtle files into a graph, you can use the Turtle Web Editor to find and correct syntax problems, like a missing period.

The myfamily_plus.ttl output has a lot of cruft from other namespaces, so it is hard to sanity-check. But the notebook does show you (and download) a file me.ttl that has all of the triples where your Person node is either the subject or object. Review this to see if it looks OK.

5. Commit your files

Commit and push the following seven files to your hw1 repository: family.ttl, myfamily.ttl, me.ttl, myfamily_plus.ttl, hw1.ipynb, hw1.py, and README.md.