Assignment 1 Solution



Learning Goals

By the end of this assignment you should be able to:

  1. read a new relational schema, and determine whether or not a particular instance is valid with respect to that schema,

  1. apply the individual techniques for writing relational algebra queries and integrity constraints that we learned in class,

  1. combine the individual techniques to solve complex problems, and

  1. identify problems that cannot be solved using relational algebra.

These skills we leave you well prepared to learn SQL.

Later in the course, you will learn about how to develop your own schema based on knowledge of the domain. Even though developing a schema is really the rst step in building a database, it is a more advanced task than querying an existing database; this is why we will be learning about it and practising it later.


For this assignment, you will operate on a database for Instagram. Instagram is a social media platform on which users share pictures and videos, and post comments about them. You need to know a few facts about the domain:

As on many social media platforms, a user may follow another user.

The follows relationship is not necessarily symmetric: x can follow y even if y doesn’t follow x. If x follows y, we say that x is a follower of y, and that y is followed by x.

A post has one or more photos and/or videos. Users can like posts, and write comments on them. Their comments may include hashtags (for example, #csc343, #computerScience).

A user can create stories. Like a post, a story has one or more photos and/or videos, but it has no hashtags and can’t be \liked”. Instagram does, however, keep track of the users that view a story.

A user has the option of having a single current story at any one time. Their current story is displayed prominently when other users view their pro le.

The above may or may not be completely accurate for Instagram; regardless, our schema and your queries will be based on it.


Rather than store pictures and videos themselves, our database will store URL references to their locations in the cloud.



User(uid, name, website, about, email, phone, photo)

A tuple in this relation represents an Instagram user. uid is the string identi er selected by the user. name, website, about, email, and phone are information about this user. photo is the url of the pro le photo of this user.

Follows(follower, followed, start)

A tuple in this relation represents the fact that the user with identi er follower follows the user with the identi er followed, beginning at date-time start.

Post(pid, uid, when, location, caption)

A tuple in this relation represents a post added by a user to their pro le. pid is the post identi cation number. uid is the identi cation number of the user who posted this post, which we will call the poster. when is the date-time when this post was posted by the poster. location is the location selected by the poster for this post. caption is the text description given to this post by the poster. Each post can have at most one caption.

PIncludes(pid, url)

A tuple in this relation represents the fact that the photo or video stored at location url is included in post pid.

Hashtag(pid, tag)

A tuple in this relation represents the fact that the caption of post pid includes the hashtag tag.

Likes(liker, pid, when)

A tuple in this relation represents the fact that user liker has liked the post pid at date-time when.

Comment(pid, commenter, when, text)

A tuple in this relation represents the fact that user commenter left the comment text for post pid at date-time when.

Story(sid, uid, when, current)

A tuple in this tuple represents a story created by a user. sid is the identi er of the story, uid is the user who created it, and when is the date-time when they created it. current is true i this is the current story for this user.

SIncludes(sid, url)

A tuple in this relation represents the fact that the photo or video stored at location url is included in story sid.

Saw(viewerid, sid, when)

A tuple in this relation represents the fact that viewer viewerid saw story sid at date-time when.

Integrity constraints

Follows[follower] User[uid] Follows[followed] User[uid] Post[uid] User[uid]


PIncludes[pid] Post[pid] Hashtag[pid] Post[pid] Likes[liker id] User[uid] Likes[pid] Post[pid]

Comment[pid] Post[pid]

Comment[commenter] User[uid] Story[uid] User[uid]

SIncludes[sid] Story[sid] Saw[viewerid] User[uid] Saw[sid] Story[sid]

follower=followedF ollows = ;

Story[current] f\yes”; \no”g

Warmup: Getting to know the schema

To get familiar with the schema, ask yourself questions like these (but don’t hand in your answers):

What does this integrity constraint mean? follower=followedF ollows = ;

Would it be a good idea to de ne the Follows relation like this? Follows(follower, followed, start) Can the database represent a single post that has multiple comments?

Can the database represent multiple comments from the same user on one post?

How does the schema allow any number of photos or videos to be included in one story, but restrict the user to having only one pro le photo?

Can the database represent that a user likes the same post more than once? (If not, how would one change the schema to allow this?)

Can the database represent that a user makes two posts at the same time?

Can the database represent that the same user makes the same comment on two di erent posts? Can the database represent that the same picture is included in 2 stories?

Part 1: Queries

Write the queries below in relational algebra. There are a number of variations on relational algebra, and di erent notations for the operations. You must use the same notation as we have used in class and on the slides. You may use assignment, and the operators we have used in class: ; ; ./; ./condition; ; \; [; ; . Assume that all relations are sets (not bags), as we have done in class, and do not use any of the extended relational algebra operations from Chapter 5 of the textbook (for example, do not use the division operator).

Some additional points to keep in mind:


Do not make any assumptions about the data that are not enforced by the original constraints given above, including the ones written in English. Your queries should work for any database that satis es those constraints.

Assume that every tuple has a value for every attribute. For those of you who know some SQL, in other words, there are no null values.

Remember that the condition on a select operation may only examine the values of the attributes in one tuple, not whole columns. In other words, to use a value (other than a literal value such as 100 or \Adele”), you must get that value into the tuples that your select will examine.

The condition on a select operation can use comparison operators (such as and 6=) and boolean operators (_, ^ and :). Simple arithmetic is also okay, e.g., attribute1 attribute2 + 5000.

Some relations in our schema have a date-time attribute. You may use comparison operators on such values. You may refer to the year component of a date-time attribute d using the notation d.year.

You are encouraged to use assignment to de ne intermediate results.

It’s a good idea to add commentary explaining what you’re doing. This way, even if your nal answer is not completely correct, you may receive partial marks.

The order of the columns in the result doesn’t matter.

When asked for a maximum or minimum, if there are ties, report all of them.

At least one of the queries cannot be expressed in the language that you are using. In those cases, simply write \cannot be expressed”. Note: The queries are not in order according to di culty.

  1. Find all the users who have never liked or viewed a post or story of a user that they do not follow. Report their user id and \about” information. Put the information into a relation with attributes \username” and \description”.

  1. Find every hashtag that has been mentioned in at least two post captions on every day of 2018. You may assume that there is at least one post on each day of a year.

  1. Let’s say that a pair of users are \reciprocal followers” if they follow each other. For each pair of reciprocal followers, nd all of their \uncommon followers”: users who follow one of them but not the other. Report one row for each of the pair’s uncommon follower. In it, include the identi ers of the reciprocal followers, and the identi er, name and email of the uncommon follower.

  1. Find the user who has liked the least posts. Report the user’s id, name and email, and the id of the posts they have liked. If there is a tie, report them all.

  1. Let’s say a pair of users are \backscratchers” if they follow each other and like all of each others’ posts. Report the user id of all users who follow some pair of backscratcher users.

  1. The \most recent activity” of a user is his or her latest story or post. The \most recently active user” is the user whose most recent activity occurred most recently.

Report the name of every user, and for the most recently active user they follow, report their name and email, and the date of their most-recent activity. If there is a tie for the most recently active user that a user follows, report a row for each of them.

  1. Report the name and email of the user who has gained the minimum number of new followers in 2018. If there is a tie, report them all.

  1. For each user who has ever put any comments, report their id and the id of the rst and of the last post they commented on.


Part 2: Additional Integrity Constraints

Express the following integrity constraints with the notation R = ;, where R is an expression of relational algebra. You are welcome to de ne intermediate results with assignment and then use them in an integrity constraint.

  1. A view on a story must occur after the date-time of the story itself. (Remember that you can compare two date-time attributes with simple <, >= etc.)

  1. Each user can have at most one current story.

When writing your queries for Part 1, don’t assume that these additional integrity constraints hold, except for the second one | it was described above as a constraint that holds.

Style and formatting requirements

In order to make your algebra more readable, and to minimize errors, we are including these style and formatting requirements:

In your assignment statements, you must include names for all attributes in the intermediate relation you are de ning. For example, write

HighestGrade(sID; oID; grade) := : : :

Use meaningful names for intermediate relations and attributes, just as you would in a program.

If you want to include comments, put them before the algebra that they pertain to, not after. Make them stand out from the algebra, for example by using a di erent font. For example, this looks reasonable:

{ Students who had very high grades in any o ering of a csc course. High(sID) := sID dept=0 csc0 ^grade>95(T ook ./ Of fering)

A modest portion of your mark will be for good style and formatting.

Submission instructions

Your assignment must be typed; handwritten assignments will not be marked. You may use any word-processing software you like. Many academics use LaTeX. It produces beautifully typeset text and handles mathematical notation well. If you would like to learn LaTeX, there are helpful resources online. Whatever you choose to use, you need to produce a nal document in pdf format.

You must declare your team (whether it is a team of one, two or three students) and hand in your work electronically using the MarkUs online system. Instructions for doing so are posted on the Assignments page of the course website. Well before the due date, you should declare your team and try submitting with MarkUs. You can submit an empty le as a placeholder, and then submit a new version of the le later (before the deadline, of course); look in the \Replace” column.

For this assignment, hand in just one le: A1.pdf. If you are working in a pair, only one of you should hand it in.

Check that you have submitted the correct version of your le by downloading it from MarkUs; new les will not be accepted after the due date.