

Deliverables: 36 total points

seqCleaner 9 points

fastqParse 9 points

coordinateMathSoln 9 points

converter 9 points

Lab 2 – Manipulating Data Types (36 points)

Overview

Last week we introduced the Python function input(), which is used to take in string data. This week we will use input() to take in string and numeric (float and integer) data.

Remember that all data returned from input() is a string object, so this will mean that you need to convert any numeric data to their respective numeric objects. The exercises in this lab will give you practice manipulating various types of data that commonly arise in computational biology problems.
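As a minimal sketch of that conversion step, the helper below turns a string into an int when possible and a float otherwise. The name to_number is hypothetical and not part of the lab; string literals stand in for the value that input() would return.

```python
def to_number(text):
    """Convert a string (such as one returned by input()) to an int or a float."""
    text = text.strip()          # drop surrounding whitespace
    try:
        return int(text)         # works for strings like "42" or "-7"
    except ValueError:
        return float(text)       # falls back for strings like "3.5"


# In a real program the string would come from input(); literals are used here.
print(to_number("42") + 1)       # integer arithmetic
print(to_number("3.5") * 2)      # float arithmetic
```

A string that is neither an integer nor a float still raises ValueError here, which is usually the desired behaviour for bad input.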

We are again using Jupyter for this assignment, and as a final step we will create distinct command line programs from each of the parts. Please save each of the four (or five) Python programs as .py files into your LAB02 folder. You can copy the text from the code cell, then paste it into an editor like Notepad or TextEdit. Save that new file with the appropriate name (e.g. seqCleaner.py) into your LAB02 folder. Using terminal or cmd, you can now do a final test on that new program with the command:

python3 seqCleaner.py

All together, you will submit:

seqCleaner.py, fastqParse.py, coordinateMathSoln.py, converter.py, and (optionally) Triad.py

Sequence cleanup

In this exercise, you will create a program to “clean up” a sequence of DNA by removing ambiguous bases (denoted by “N”) output from a sequencer. Your task is to create a Python program called seqCleaner that

asks for and collects a sequence of DNA using input()

removes the ambiguous parts of the sequence

outputs the “cleaned” sequence, replacing the ambiguous parts with a count in {}’s

For example, if I enter the sequence of DNA “AaNNNNNNGTC” (without quotes), the program will output:

AA{6}GTC

Hints:

The input sequence is not guaranteed to be uppercase, but should be interpreted as though it is all uppercase. Only the letters (A,C,G,T,N) will be included in the input.

Only the DNA characters (A,C,G,T) should remain after the cleanup. The input may include, at most, one block of 'N' characters.
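One possible way to approach the cleanup, shown only as a sketch, relies on the hint that there is at most one block of N characters: count the N's, find where the block starts, and splice the count in. The function name clean_sequence is hypothetical; a full seqCleaner program would obtain the sequence from input() and still needs its own docstrings and comments.

```python
def clean_sequence(seq):
    """Return seq in uppercase with its single block of N's replaced by {count}."""
    seq = seq.upper()                # input is interpreted as all uppercase
    n_count = seq.count("N")         # assumes at most one contiguous block of N
    if n_count == 0:
        return seq                   # nothing ambiguous to replace
    start = seq.find("N")            # index of the first N in the block
    end = start + n_count            # index just past the last N
    return seq[:start] + "{" + str(n_count) + "}" + seq[end:]


print(clean_sequence("AaNNNNNNGTC"))   # AA{6}GTC
```

If the input could contain several separate N blocks, this approach would give the wrong answer, so the single-block assumption is worth stating in the program docstring.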

To get full credit on this assignment, your code needs to:

Run properly (execute and produce the correct output)

Contain docstrings and specific line or block comments that explain the semantics of your implementation. Include any assumptions or design decisions you made in writing your code

Include an overview describing what your program does with expected inputs and outputs. This should be in the form of a program-level docstring.

Sequence information parsing

In this exercise, you will create a program to “parse” sequence name information from a single line of a FASTQ formatted file. Your task is to create a Python script called fastqParse that: 

asks for and collects the seqname line of a FASTQ file using input()

parses out each field of the run information from the string and displays each of them on a new line

For example, if I enter the FASTQ seqname line:

@EAS139:136:FC706VJ:2:2104:15343:197393

then the program will output:

Instrument = EAS139

Run ID = 136

Flow Cell ID = FC706VJ

Flow Cell Lane = 2

Tile Number = 2104

X-coord = 15343

Y-coord = 197393

Hints:

The input string is guaranteed to have 7 fields.

The first character of the FASTQ seqname line is “@” and each field of the run information is separated by a colon “:”.

A reasonable solution would be around 16 lines of code excluding comments.
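The hints above suggest dropping the leading “@” and splitting on colons. The sketch below illustrates that idea; the names FIELD_LABELS and parse_seqname are hypothetical, and a complete fastqParse program would read the line with input() and print each returned string.

```python
# Labels in the order the fields appear in the seqname line (taken from the example output).
FIELD_LABELS = [
    "Instrument", "Run ID", "Flow Cell ID", "Flow Cell Lane",
    "Tile Number", "X-coord", "Y-coord",
]


def parse_seqname(line):
    """Return a list of 'Label = value' strings for a FASTQ seqname line."""
    line = line.strip()
    if line.startswith("@"):
        line = line[1:]              # drop the leading '@'
    fields = line.split(":")         # assumes exactly 7 colon-separated fields
    return [label + " = " + value for label, value in zip(FIELD_LABELS, fields)]


for text in parse_seqname("@EAS139:136:FC706VJ:2:2104:15343:197393"):
    print(text)
```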

To get full credit on this assignment, your code needs to:

Run properly (execute and produce the correct output)

Contain documentation/comments

Include any assumptions or design decisions you made in writing your code

Include an overview describing what your program does with expected inputs and outputs

Protein coordinates

In this exercise, you will create a program that takes three sets of atomic coordinates, all provided on a single line. The program then calculates the bond lengths and angles. For this program, you can start with the Triad class (provided). Your task is to create a Python program called coordinateMathSoln that: 

asks for and collects three sets of coordinates using input(); use only one line for this data!

outputs the N-C and N-Ca bond lengths and the C-N-Ca bond angle with the correct number of significant digits (see below)

For example, if I enter the following coordinates (notice that they are all on one line!):

C = (39.447, 94.657, 11.824) N = (39.292, 95.716, 11.027) Ca = (39.462, 97.101, 11.465)

then the program will output the following three lines:

N-C bond length = 1.33

N-Ca bond length = 1.46

C-N-Ca bond angle = 124.0

(Note: make sure that the angle returned is in degrees!)
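Getting nine numbers out of that single line is the first hurdle. One possible approach, sketched here with a regular expression, pulls out every numeric token and groups them in threes. The function name parse_coordinates is hypothetical, and the sketch assumes the atoms always appear in the order C, N, Ca as in the example.

```python
import re


def parse_coordinates(line):
    """Return three (x, y, z) tuples of floats, assumed to be in the order C, N, Ca."""
    # Match optionally signed decimal numbers such as 39.447 or -1.5
    numbers = [float(token) for token in re.findall(r"[-+]?\d+(?:\.\d+)?", line)]
    if len(numbers) != 9:
        raise ValueError("expected 9 numeric values, found " + str(len(numbers)))
    c = tuple(numbers[0:3])
    n = tuple(numbers[3:6])
    ca = tuple(numbers[6:9])
    return c, n, ca


example = "C = (39.447, 94.657, 11.824) N = (39.292, 95.716, 11.027) Ca = (39.462, 97.101, 11.465)"
print(parse_coordinates(example))
```

Replacing the parentheses with commas and then using split(",") is another route that stays within plain string methods.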

Hints:

Each coordinate will contain only 3 numeric values.

Bond lengths are the distance between two points. For points P and Q in 3-space, (Px, Py, Pz) and (Qx, Qy, Qz) respectively, the distance between them is:

$$\lVert PQ \rVert = \sqrt{(P_x - Q_x)^2 + (P_y - Q_y)^2 + (P_z - Q_z)^2} = \sqrt{\sum_{i \in \{x,y,z\}} (P_i - Q_i)^2}$$
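The distance formula can be checked against the example values with a few lines of Python. This is only an illustrative sketch: the function name distance is hypothetical, and the provided Triad class already offers its own distance methods.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Coordinates taken from the example input
c = (39.447, 94.657, 11.824)
n = (39.292, 95.716, 11.027)
ca = (39.462, 97.101, 11.465)

print("N-C bond length = {0:.2f}".format(distance(n, c)))     # 1.33
print("N-Ca bond length = {0:.2f}".format(distance(n, ca)))   # 1.46
```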

Bond angles can be calculated from the dot product.

Let’s say that we have three points in space labeled P, Q and R. We are interested in the angle at point Q that is made from line segments QP and QR. The dot product tells us (for standard vectors P and R) that:

$$P \cdot R = \lVert P \rVert \lVert R \rVert \cos\theta$$

In this notation, ∥P∥ refers to the length of vector P as a standard vector (assumed to begin at the origin (0,0,0)). We can then see that the angle between vectors P and R can be found by:

$$\cos\theta = \frac{P \cdot R}{\lVert P \rVert \lVert R \rVert}$$

We can calculate the dot product using the sum of products of the vector components:

$$P \cdot R = \sum_{i \in \{x,y,z\}} P_i R_i$$

Now, to find vector P in standard form, we need to remember that QP starts at Q, so we need to place the origin at Q and find out where P is in that new space. We do that by subtracting the components of Q from P. Putting all of this together, we get:

$$\theta = \cos^{-1}\left(\frac{\sum_{i \in \{x,y,z\}} (P_i - Q_i)(R_i - Q_i)}{\lVert QP \rVert \lVert QR \rVert}\right)$$

Remember, θ is in radians.
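To make the derivation concrete, here is a minimal sketch that computes the angle at Q from translated vectors and converts it to degrees with math.degrees. The function name angle_at_q is hypothetical; the provided Triad class supplies equivalent angle methods that return radians.

```python
import math


def angle_at_q(p, q, r):
    """Angle (in radians) at point Q formed by segments QP and QR."""
    qp = [pi - qi for pi, qi in zip(p, q)]    # vector from Q to P
    qr = [ri - qi for ri, qi in zip(r, q)]    # vector from Q to R
    dot = sum(a * b for a, b in zip(qp, qr))  # dot product of the translated vectors
    len_qp = math.sqrt(sum(a * a for a in qp))
    len_qr = math.sqrt(sum(b * b for b in qr))
    return math.acos(dot / (len_qp * len_qr))


# Coordinates taken from the example input; the angle is at N
c = (39.447, 94.657, 11.824)
n = (39.292, 95.716, 11.027)
ca = (39.462, 97.101, 11.465)

theta = math.degrees(angle_at_q(c, n, ca))    # convert radians to degrees
print("C-N-Ca bond angle = {0:.1f}".format(theta))   # 124.0
```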

Below I have given you a class (Triad) with methods to calculate dot products (dot), dot products of translated vectors (ndot), distances (dPQ, dPR, dQR) and angles in radians (angleP, angleQ and angleR) for each of the three points in a Triad object. A reasonable solution for this exercise involves around 12 lines of additional code excluding comments.

To get full credit on this assignment, your code needs to:

Run properly (execute and produce the correct output)

Contain docstrings and line comments (using #)

Include any assumptions or design decisions you made in writing your code

Include an overview describing what your program does with expected inputs and outputs as a program-level docstring

Extra credit (5 points): Rewrite the Triad class.

For extra credit, provide a direct replacement for the Triad class. The external methods that calculate angles, distances, and points (tuples) p, q and r must be maintained such that either version of the Triad class can be used.

You could use the cosine law to calculate angles instead of the dot product. You might make use of the numpy module. You might recode each of the methods to avoid using zip. You might consider using list iterations. Your Triad replacement must reimplement all of Triad's public functions, without using zip and without being a trivial rewrite. Your implementation need not be as compact as the current implementation, and it needs to be correct and fully documented to receive full credit.
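As an illustration of the cosine-law idea only, not a Triad replacement, the sketch below computes the angle at Q from the three squared side lengths, using index loops rather than zip. Both function names are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two (x, y, z) points, without zip."""
    total = 0.0
    for i in range(3):
        diff = a[i] - b[i]
        total += diff * diff
    return total


def angle_q_cosine_law(p, q, r):
    """Angle (in radians) at Q from the law of cosines: PR^2 = QP^2 + QR^2 - 2*QP*QR*cos(Q)."""
    qp2 = squared_distance(q, p)
    qr2 = squared_distance(q, r)
    pr2 = squared_distance(p, r)
    cos_theta = (qp2 + qr2 - pr2) / (2.0 * math.sqrt(qp2) * math.sqrt(qr2))
    return math.acos(cos_theta)


c = (39.447, 94.657, 11.824)
n = (39.292, 95.716, 11.027)
ca = (39.462, 97.101, 11.465)
print("{0:.1f}".format(math.degrees(angle_q_cosine_law(c, n, ca))))   # 124.0
```

For the example coordinates this agrees with the dot-product result, which is a useful cross-check when testing a rewritten class against the original.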
