Here we store material for the course CB2442, Bioinformatics.
This project is maintained by kth-gt
This is an introductory lab for the programming part of the course. Your task is to write a function that converts a DNA sequence into an amino acid sequence. To help you, there is a scaffold of Python code that you should use both to validate your code and to make sure you follow a standard that the TAs can validate automatically.
Begin by downloading the project to your local computer using this link.
Unzip the files into a directory and open the directory in VS Code.
$ unzip 'kth-gt cb2442 main prog-p1.zip'
$ code .
If you are not yet familiar with VS Code, watch, for example, this short introduction video. Note that you can run your code directly in VS Code and don’t need to open a separate terminal window. You can run the whole script either by pressing the start symbol in the upper right corner or by typing python labp1.py in the terminal window below the code window (if you don’t see it, select “Terminal” in the “View” menu). You can also run Python in interactive mode by typing python3 in the terminal window. This gives you a >>> prompt, where you can paste (or type) sections of the code to be executed. Another way to do this is to select a section of the code, right-click, and choose “Run Python” / “Run Selection/Line in Python Terminal” (or press Shift+Enter). Don’t forget to save every now and then.
We suggest activating the “autosave” option.
In the Python script file labp1.py, edit the function
def dna2aa(dna_str):
so that it takes a DNA sequence as input and returns an amino acid sequence. You may use the dictionary codon2aa, which translates triplets of bases (codons) into amino acid symbols.
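As a rough illustration of the idea, the sketch below walks through the DNA string three bases at a time and looks each codon up in a dictionary. The codon table here is a hypothetical, abbreviated stand-in for the codon2aa dictionary in labp1.py, which is assumed to cover all 64 codons. Mapping unknown codons to "X" and keeping the stop symbol "*" in the output are assumptions too, and the course scaffold may expect something different.

```python
# Hypothetical, abbreviated codon table for illustration only; the real
# codon2aa in labp1.py is assumed to contain all 64 codons.
codon2aa = {
    "ATG": "M", "GCC": "A", "AAA": "K", "TGG": "W",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def dna2aa(dna_str):
    """Translate a DNA string codon by codon, starting at position 0."""
    dna_str = dna_str.upper()
    amino_acids = []
    # Step through complete triplets only; one or two trailing bases are ignored.
    for i in range(0, len(dna_str) - 2, 3):
        codon = dna_str[i:i + 3]
        # "X" for codons missing from the table is an assumption of this sketch.
        amino_acids.append(codon2aa.get(codon, "X"))
    return "".join(amino_acids)


print(dna2aa("ATGGCCAAATGGTAA"))  # prints MAKW*
```

Whether translation should stop at the first stop codon is not specified here, so this sketch simply translates every complete codon.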
Also, set the list authors to contain the names of all group members.
You can make a first test run of your dna2aa function by running the Python file directly as top-level code:
$ python labp1.py
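Running a file as top-level code usually relies on the standard `__name__ == "__main__"` guard. The sketch below shows the general pattern. The placeholder body of dna2aa, the main function, and the example sequence are all hypothetical, and the actual layout of labp1.py may differ.

```python
def dna2aa(dna_str):
    # Placeholder body for illustration; your implementation goes here.
    return ""


def main():
    # Hypothetical example input for a quick manual check.
    example = "ATGGCC"
    print(example, "->", dna2aa(example))


# This block runs only when the file is executed directly
# (python labp1.py), not when it is imported by another script.
if __name__ == "__main__":
    main()
```

With this pattern, a separate script can import the function without triggering the manual check.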
However, the final test of the code is done with the runnerp1.py executable, which can be run from the terminal with
$ python runnerp1.py
or just
$ ./runnerp1.py
This executes the code in labp1.py and validates the results against some known test vectors.
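Validation against test vectors generally means comparing the function's output with known input/output pairs. The following is a minimal sketch of that idea only. The codon table, the test vectors, and the run_tests helper are all hypothetical, and the internals of runnerp1.py are not described here and may work differently.

```python
# Hypothetical, abbreviated codon table; not the one shipped with the lab.
codon2aa = {"ATG": "M", "AAA": "K", "TAA": "*"}


def dna2aa(dna_str):
    # Translate complete triplets; unknown codons become "X" (assumption).
    return "".join(
        codon2aa.get(dna_str[i:i + 3], "X")
        for i in range(0, len(dna_str) - 2, 3)
    )


# Hypothetical test vectors: (input DNA, expected amino acid sequence).
test_vectors = [("ATGAAATAA", "MK*"), ("ATGAA", "M"), ("", "")]


def run_tests(func, vectors):
    """Return a list of (input, expected, got) for every failing vector."""
    failures = []
    for dna, expected in vectors:
        got = func(dna)
        if got != expected:
            failures.append((dna, expected, got))
    return failures


failures = run_tests(dna2aa, test_vectors)
print("all tests passed" if not failures else failures)
```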
If you implemented the function correctly, you will see your names appear.
Change the behavior of dna2aa so that it tries all three possible reading frames and returns the amino acid sequence with the longest open reading frame (ORF) of the three alternatives.
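One possible way to approach this is sketched below, under stated assumptions. An ORF is taken to be a run of amino acids that begins with "M" and extends to the next stop symbol "*" or to the end of the sequence, and ties go to the lowest frame offset. The abbreviated codon table and the helper names translate_frame and longest_orf_len are hypothetical, and the definition of ORF used in the course may differ.

```python
# Hypothetical, abbreviated codon table for illustration only.
codon2aa = {
    "ATG": "M", "AAA": "K", "GGG": "G", "TGG": "W",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate_frame(dna_str, offset):
    """Translate complete codons starting at the given offset (0, 1 or 2)."""
    return "".join(
        codon2aa.get(dna_str[i:i + 3], "X")  # "X" for unknown codons (assumption)
        for i in range(offset, len(dna_str) - 2, 3)
    )


def longest_orf_len(aa_seq):
    """Length of the longest run that starts at 'M' and contains no stop."""
    best = 0
    for segment in aa_seq.split("*"):
        start = segment.find("M")
        if start != -1:
            best = max(best, len(segment) - start)
    return best


def dna2aa(dna_str):
    """Return the translation, out of three frames, with the longest ORF."""
    dna_str = dna_str.upper()
    candidates = [translate_frame(dna_str, offset) for offset in range(3)]
    # max() keeps the first maximal element, so ties go to the lowest offset.
    return max(candidates, key=longest_orf_len)


print(dna2aa("CCATGAAAGGGTGGTAAC"))  # prints MKGW*
```

In the example, only the frame starting at offset 2 contains a start codon, so that translation is selected.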