Protein Synthesis Solution




The goal of this project is to write a program that mimics the process of protein synthesis in eukaryotic cells. The first half focuses on transcription and translation. The second half introduces the concept of mutation.

Background Information

All living organisms store their genetic information in chains of nucleic acid. All eukaryotes (i.e., organisms whose cells contain membrane-bound organelles, or “little organs”) use deoxyribonucleic acid, or DNA, as the “hard drive” where information is stored. DNA is composed of four distinct nucleobases: adenine, thymine, cytosine, and guanine, which are abbreviated by their first letters (ATCG). A chain of nucleobases forms a DNA strand. Although one strand is enough to store information, each eukaryotic cell contains two complementary strands that bind to each other to form a double helix. The rules of base pairing are as follows: A and T pair together, C and G pair together.

Fun Fact: The human genome contains roughly 2.9 billion base pairs. If unwound in a straight line, this would amount to about 2 m in length. Thanks to ingenious folding techniques, our cells are able to store DNA in their nucleus, which is only 6 microns across (1 micron is a millionth of a meter). As if this weren’t impressive enough, remember that each cell contains two strands of DNA!

Task A: Transcription

Each gene codes for a protein, and transcription is the first step of gene expression. Most protein synthesis occurs in organelles known as ribosomes, which are located outside of the nucleus where DNA is stored. To relay information to a ribosome, the cell makes a copy of the relevant gene from DNA and sends that copy out of the nucleus. The copy is called a messenger ribonucleic acid, or mRNA. mRNA is made of the same nucleobases as DNA, with one exception: it contains uracil [U] instead of thymine [T]. That means that the complement of [A] in mRNA is [U]. As such, the rules of complementation in mRNA are as follows:

  • [A] becomes [U]

  • [T] becomes [A]

  • [C] becomes [G]

  • [G] becomes [C]
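The four rules above map naturally onto a single-character lookup. The following is only a minimal sketch: the function name complement_base is hypothetical, and returning '?' for any character other than A, T, C, or G is an assumption, since the rules do not say how invalid input should be handled.

```cpp
#include <cctype>

// Hypothetical helper (the name is an assumption for illustration):
// returns the mRNA complement of a single DNA base, in upper case.
char complement_base(char base) {
    // Normalize to upper case so 'a' and 'A' are treated alike.
    switch (std::toupper(static_cast<unsigned char>(base))) {
        case 'A': return 'U';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default:  return '?';  // assumption: flag anything that is not A, T, C, or G
    }
}
```

With this sketch, complement_base('A') returns 'U' and complement_base('t') returns 'A'.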

Your task is to write a program called transcriptase.cpp that reads a text file called dna.txt that contains one DNA strand per line, which looks as follows:


and outputs to the console (terminal) the corresponding mRNA strands. Each output line must contain exactly one mRNA strand. This is a sample output of the program:


Recall that to read from a file, the following code snippet can be used:

    ifstream fin("dna.txt");
    if (fin.fail()) {
        cerr << "File cannot be read, opened, or does not exist.\n";
        exit(1);
    }
    string strand;
    while (getline(fin, strand)) {
        cout << strand << endl;
    }
    fin.close();

The best way to do this is in two steps. First, create a function that gives the complement of a base, and then write another function that applies it to each base of a whole strand. For example, we could have a function that returns a char to give the complement of a base, and a function that returns a string and uses the first function for each base in the strand. Note that the output must be in capital letters, regardless of how the input is formatted. To do this, you may include <cctype> and use int toupper(int c), which returns the upper-case version of any alphabetic character passed to it.
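The two-step approach could be sketched as follows. The function names complement_base and transcribe_strand are hypothetical, and mapping unrecognized characters to '?' is an assumption rather than part of the assignment. The per-base helper is included so that the sketch compiles on its own.

```cpp
#include <cctype>
#include <string>

// Step 1 (hypothetical name): mRNA complement of one DNA base, upper case.
char complement_base(char base) {
    switch (std::toupper(static_cast<unsigned char>(base))) {
        case 'A': return 'U';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default:  return '?';  // assumption: mark characters that are not bases
    }
}

// Step 2 (hypothetical name): apply the per-base rule to a whole strand.
std::string transcribe_strand(const std::string& dna) {
    std::string mrna;
    mrna.reserve(dna.size());  // output has the same length as the input
    for (char base : dna) {
        mrna += complement_base(base);
    }
    return mrna;
}
```

Inside the file-reading loop, each line would then be printed as cout << transcribe_strand(strand) << endl; so that every output line holds exactly one mRNA strand. Because the upper-casing happens in the per-base helper, mixed-case input such as "aTcG" yields "UAGC".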