

Introduction

 

The goal of this assignment is to get you started writing Python functions, and to evaluate your ability to create high-quality, efficient code. The work you will be doing in this assignment is not harder than what you have been doing in exercises up to this point, but this code will be marked by TAs who will evaluate not only whether your code works, but also whether it is efficient and well designed. Also remember that the assignments must be completed entirely on your own. So if you get stuck, the TAs can help with general concepts, but cannot look at your code or help you directly with your implementation. The same goes for your fellow students. No one should ever look at your assignment code and you should never look at anyone else’s (until after it’s submitted, at which time you can feel free to share and compare).

 

What Not to Use

 

You can complete this assignment using only the material covered in the first four weeks of the course. In particular, you are not allowed to use loops anywhere in the assignment (there is no good reason to use them in this assignment!) or any data structures not covered in weeks 1-4.

 

 

DNA Sequencing

 

This assignment deals with DNA sequencing. A DNA sequence is a series of nucleotides: adenine (A), guanine (G), cytosine (C) and thymine (T). These sequences can be represented as strings of their first letters, e.g., GCACTAG.

 

Within these DNA sequences, researchers are interested in finding specific genic sequences (genes). The input we receive from DNA sequencers usually comes in a continuous stream, so the string may start with some upstream sequence (data from a previous gene), and may continue beyond the end of the specific gene that is of interest with a downstream sequence (the start of another gene). Fortunately, all genes start with the sequence ATG, and the sequence ATG cannot appear in the middle of a gene [1]. This makes it possible to isolate and analyze a specific gene from a sequence.

 

Your Tasks

 

For this assignment, you will be required to build the following 5 functions:

 

split_input: Takes in a DNA sequence (as described above) and returns a list with three elements: the upstream data, the gene (if any is found, or an empty string if no gene is found), and the downstream data, in that order.
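One possible shape for this function, sketched without loops. It rests on two assumptions the handout does not spell out: the gene runs from the first ATG up to (but not including) the next ATG, or to the end of the input if there is no later ATG; and when no ATG is present, the whole input is returned as upstream data.

```python
START_CODON = "ATG"


def split_input(sequence):
    """Return [upstream, gene, downstream] for a DNA sequence string.

    Assumption: the gene ends just before the next ATG, or at the end
    of the sequence if no later ATG exists.
    """
    start = sequence.find(START_CODON)
    if start == -1:
        # Assumption: with no start codon, everything counts as upstream.
        return [sequence, "", ""]
    # Search for the next start codon, beginning just past the first one.
    end = sequence.find(START_CODON, start + len(START_CODON))
    if end == -1:
        end = len(sequence)
    return [sequence[:start], sequence[start:end], sequence[end:]]
```

Under these assumptions, split_input("CCATGAAACCCATGTT") gives ["CC", "ATGAAACCC", "ATGTT"]. A different reading of where a gene ends would change only the computation of end.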

 

get_gene: Takes in a DNA sequence (as described above) and returns a string representation of the gene if one is present, or the string ERROR if no gene is present.

 

validate_gene: Takes in a string representation of a gene, and returns True iff the gene presented is valid. For a gene to be valid it must satisfy the following criteria:

 

- It must start with the start codon (3 character sequence) ATG
- It must contain at least one codon after the start codon
- It must contain only full codons (i.e., it cannot end mid-way through a 3 character codon)
- It must never contain four consecutive identical nucleotides

[1] Note: This isn’t actually true in the real world, but it makes our lives a lot easier, so we’ll pretend.
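The four validity criteria for a gene can each be written as one boolean expression, which keeps the check loop-free. This is a rough sketch rather than a reference solution; the constant names and intermediate variable names are illustrative choices, not part of the handout.

```python
START_CODON = "ATG"
CODON_LENGTH = 3


def validate_gene(gene):
    """Return True iff gene satisfies all four validity criteria."""
    starts_correctly = gene.startswith(START_CODON)
    # Start codon plus at least one more codon means 6 or more characters.
    has_second_codon = len(gene) >= 2 * CODON_LENGTH
    only_full_codons = len(gene) % CODON_LENGTH == 0
    # Four in a row of any single nucleotide makes the gene invalid.
    no_long_runs = ("AAAA" not in gene and "CCCC" not in gene
                    and "GGGG" not in gene and "TTTT" not in gene)
    return (starts_correctly and has_second_codon
            and only_full_codons and no_long_runs)
```

For example, validate_gene("ATGCCC") is True, while "ATG" (no second codon), "ATGCC" (partial codon) and "ATGAAAACC" (four consecutive A's) are all rejected.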

 

 


 

is_palindromic: Takes a string representation of a gene, and returns True iff that gene is palindromic (reads the same forwards as backwards).
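A palindrome check needs no loop if the string is compared with its own reversal. The sketch below assumes slicing with a negative step is among the material you are allowed to use.

```python
def is_palindromic(gene):
    """Return True iff gene reads the same forwards and backwards."""
    # gene[::-1] is a reversed copy of the string.
    return gene == gene[::-1]
```

Since every gene starts with ATG, a palindromic gene necessarily ends with GTA: is_palindromic("ATGGTA") is True, while is_palindromic("ATGCCC") is False.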

 

evaluate_sequence: Takes in a DNA sequence (as described above) and returns one of the following strings as appropriate: {No Gene Found, Invalid Gene, Valid Gene Found, Valid Palindromic Gene Found}.
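This last function can be built almost entirely by calling the other ones, which is one way to keep duplicated code to a minimum. The sketch below includes compact stand-in versions of the helpers so that it runs on its own; they assume that a gene ends just before the next ATG (or at the end of the input), which the handout does not state explicitly.

```python
START_CODON = "ATG"


def get_gene(sequence):
    """Return the gene in sequence, or "ERROR" if there is none.

    Assumption: the gene ends just before the next ATG, if any.
    """
    start = sequence.find(START_CODON)
    if start == -1:
        return "ERROR"
    end = sequence.find(START_CODON, start + len(START_CODON))
    if end == -1:
        end = len(sequence)
    return sequence[start:end]


def validate_gene(gene):
    """Return True iff gene meets the four validity criteria."""
    has_long_run = ("AAAA" in gene or "CCCC" in gene
                    or "GGGG" in gene or "TTTT" in gene)
    return (gene.startswith(START_CODON) and len(gene) >= 6
            and len(gene) % 3 == 0 and not has_long_run)


def is_palindromic(gene):
    """Return True iff gene reads the same forwards and backwards."""
    return gene == gene[::-1]


def evaluate_sequence(sequence):
    """Return the status string describing the gene found in sequence."""
    gene = get_gene(sequence)
    if gene == "ERROR":
        return "No Gene Found"
    if not validate_gene(gene):
        return "Invalid Gene"
    # Only a valid gene is ever reported as palindromic.
    if is_palindromic(gene):
        return "Valid Palindromic Gene Found"
    return "Valid Gene Found"
```

With these stand-ins, evaluate_sequence("TTATGCCCATGAA") returns "Valid Gene Found" because the isolated gene is ATGCCC, and evaluate_sequence("ATGGTA") returns "Valid Palindromic Gene Found".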

 

Marking

 

Your assignment will be marked for correctness in a similar manner to your exercises. But it will also be marked by a TA for elements such as:

 

Programming style: Your variable names should be meaningful and your code as simple and clear as possible.

 

Commenting: Your documentation should be clear and concise and allow a user who has not read this handout to fully understand how to use and manipulate your functions.

 

Code re-use: You should have as little duplicated code as possible. If you find yourself repeating code, there’s a good chance you could find a simpler (lazier) method.

 

Testing coverage: Your test cases should not only cover all input categories, but should also be clearly labelled and organized in a sensible manner.

 

What to Submit

 

Submit a0.py and a0.test on MarkUs. Your files must be named exactly as given here (check that MarkUs says you have submitted all required files after you’re done submitting).

 

Before you submit:

 

- Ensure that you have read the header at the top of the file and added your name and login to it
- Test your code for PEP-8 compliance
- Run DocTest and make sure you pass all your own cases
- Re-test all examples
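As a reminder of how docstring examples get checked, here is a minimal sketch using Python's standard doctest module. The function count_nucleotide is a hypothetical example made up for illustration; it is not one of the required functions.

```python
import doctest


def count_nucleotide(sequence, nucleotide):
    """Return how many times nucleotide occurs in sequence.

    >>> count_nucleotide("GCACTAG", "A")
    2
    >>> count_nucleotide("", "G")
    0
    """
    return sequence.count(nucleotide)


if __name__ == "__main__":
    # Runs every docstring example in this module; prints nothing
    # when all of them pass.
    doctest.testmod()
```

Running the file directly executes the examples. The same check can be started from the command line with python -m doctest followed by the file name, adding -v to see each example as it runs.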

 

Happy Sequencing!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
