INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

ASSIGNMENT

Submission:

 

  1. Source Code: Python source files (upload .zip file in case of multiple files) containing your code only (no test data needed) and ReadMe.txt file (template provided) describing how to run your code. Note that we will NOT debug your code. If your code does not execute as described in ReadMe.txt, you will receive a zero grade.

  2. Presentation Slide: One slide only, in PPT/PPTX/PDF format, to be used during the oral presentations (see below). If your submitted file spans more than one page, we will extract the first page for the oral presentation.

 

Presentations:

 

Everyone is required to deliver a 3-minute flash presentation, accompanied by the submitted slide and following the Three Minute Thesis (3MT) format, with an additional 2 minutes for Q&A:

 

  1. Your presentation should at least contain methods (i.e., implementation), results (e.g., output), and

  2. Having appropriate graphics and visuals (e.g., figures, plots) in the presentation slides to help illustrate key concepts or results will be positively

  3. Any additional scientific insights and/or challenges faced and/or limitations of your implementation and/or efficiency analyses and/or comparisons with alternative approaches will be positively

 

Implementing decision tree for protein RSA prediction

 

Objective: Implement decision tree for protein relative solvent accessibility prediction.

 

Note: You must use the standard Python programming language. You are NOT allowed to use non-standard packages or libraries (e.g. Biopython, scikit-learn, SciPy, NumPy, etc.).

 

A: Raw Data:

 

Two directories (fasta and sa) are supplied. The fasta directory contains 150 protein sequences in FASTA format. A FASTA file is as follows:

 

The true binary relative solvent accessibility (RSA) labels of these proteins can be found in the sa directory. These files are also in FASTA format. RSA labels have two possible values:

 

‘E’: exposed
‘B’: buried
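Reading the two directories can be sketched as follows. This is a minimal sketch, not a required design: the function name parse_fasta and the demo record are hypothetical, and it assumes each label record has the same header and length as its sequence record.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequences may be wrapped over several lines; collect the parts.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical records, for illustration only (not taken from the data set).
seq_text = ">demo\nMKV\nLA\n"
sa_text = ">demo\nEBBEB\n"

sequences = parse_fasta(seq_text)
labels = parse_fasta(sa_text)

# Pair every residue with its RSA label ('E' or 'B').
pairs = list(zip(sequences["demo"], labels["demo"]))
```

In practice the text would come from reading each file in the fasta and sa directories, for example with open(path).read().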

 

N.B. The true RSA labels are calculated using the DSSP (Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Kabsch and Sander, 1983) software at a 25% threshold.

 

B: Curating Training and Test Datasets:

 

Divide the raw data into non-overlapping sets of training (~75%) and test (~25%) datasets using simple random sampling without replacement.
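One possible way to draw the split with the standard library is random.sample, which samples without replacement. The sketch below is illustrative: the function name split_train_test, the fixed seed, and the protein identifiers are assumptions. Splitting by protein (rather than by residue) keeps all residues of one protein in the same set.

```python
import random


def split_train_test(items, test_fraction=0.25, seed=0):
    """Split items into disjoint train and test lists by simple random sampling."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    items = list(items)
    n_test = int(len(items) * test_fraction)
    # Sample positions without replacement; every item lands in exactly one set.
    test_idx = set(rng.sample(range(len(items)), n_test))
    train = [x for i, x in enumerate(items) if i not in test_idx]
    test = [x for i, x in enumerate(items) if i in test_idx]
    return train, test


# Hypothetical identifiers standing in for the 150 supplied proteins.
protein_ids = ["protein_%03d" % i for i in range(150)]
train_ids, test_ids = split_train_test(protein_ids)
```

With 150 proteins and a 25% test fraction this gives 113 training and 37 test proteins.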

 

 

C.  Feature Extraction:

 

Using chemical properties of 20 naturally occurring amino acid residues as detailed in Table 1 and Figure 1, construct a feature matrix (or vector) for the training and test datasets.

 

 

Table 1. Chemical properties of 20 naturally occurring amino acid residues (Livingstone & Barton, CABIOS, 9, 745-756, 1993)

 
   

 

 

 

Figure 1. Venn diagram of chemical properties of 20 naturally occurring amino acid residues (Livingstone & Barton, CABIOS, 9, 745-756, 1993)

 

 

 

Specifically, the feature set should include the following binary attributes:

 

Attribute     Description

Hydrophobic   Whether a residue is hydrophobic
Polar         Whether a residue is polar
Small         Whether a residue size is small
Proline       Whether a residue is Proline (PRO, P)
Tiny          Whether a residue size is tiny
Aliphatic     Whether a residue is aliphatic
Aromatic      Whether a residue is aromatic
Positive      Whether a residue is positively charged
Negative      Whether a residue is negatively charged
Charged       Whether a residue is charged

The output labels are already binary and can be encoded numerically (e.g. 1 for exposed and 0 for buried, or vice versa).
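A per-residue feature vector over the ten attributes can be sketched as below. Note the assumption: Table 1 is not reproduced in this text, so the residue memberships in PROPERTIES are illustrative placeholders and must be checked against Table 1 and Figure 1 before use. The names PROPERTIES, featurize, and label_to_int are hypothetical.

```python
# ASSUMPTION: these memberships are illustrative only; verify each set
# against Table 1 / Figure 1 of the assignment before relying on them.
PROPERTIES = [
    ("hydrophobic", frozenset("ACFGHIKLMTVWY")),
    ("polar", frozenset("CDEHKNQRSTWY")),
    ("small", frozenset("ACDGNPSTV")),
    ("proline", frozenset("P")),
    ("tiny", frozenset("AGS")),
    ("aliphatic", frozenset("ILV")),
    ("aromatic", frozenset("FHWY")),
    ("positive", frozenset("HKR")),
    ("negative", frozenset("DE")),
    ("charged", frozenset("DEHKR")),
]
FEATURE_NAMES = [name for name, _ in PROPERTIES]


def featurize(residue):
    """Return the ten binary attributes for a one-letter residue code.

    Residues that belong to no set (e.g. 'X') map to all zeros.
    """
    return [1 if residue in members else 0 for _, members in PROPERTIES]


def label_to_int(label):
    """Encode an RSA label: 1 for exposed ('E'), 0 for buried ('B')."""
    return 1 if label == "E" else 0


# One row per residue of a hypothetical sequence.
feature_matrix = [featurize(res) for res in "PK"]
```

Because each residue maps to one of at most 20 distinct vectors, the feature matrix for a whole data set is simply the concatenation of these rows over all residues of all proteins.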

 

D.  Decision Tree Learning using ID3 on Training Set:

 

Implement the ID3 decision tree learning algorithm, which grows the tree greedily from the top down, using information gain to learn the best hypothesis on the training dataset.
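The greedy growth can be sketched as follows for binary attributes. Information gain for an attribute A on a sample S is H(S) minus the weighted sum of H(S_v) over the values v of A, where H is the Shannon entropy of the labels. This is a minimal sketch under stated assumptions: the nested-dict tree layout is one possible representation, and stopping when the best gain is zero is a design choice rather than part of the assignment.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on binary attribute attr."""
    total = len(labels)
    remainder = 0.0
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[attr] == value]
        if subset:
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Grow a tree top-down; leaves are labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure node, or no attributes left to split on
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    if information_gain(rows, labels, best) <= 0:
        return majority  # assumed stopping rule: no attribute helps
    node = {"attr": best, "children": {}}
    remaining = [a for a in attrs if a != best]
    for value in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[best] == value]
        if idx:
            node["children"][value] = id3([rows[i] for i in idx],
                                          [labels[i] for i in idx], remaining)
        else:
            node["children"][value] = majority  # empty branch falls back
    return node


# Toy data for illustration: the label simply copies attribute 1.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 0, 1]
tree = id3(rows, labels, [0, 1])
```

On the toy data the root splits on attribute 1, since attribute 0 carries no information about the label.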

 

E.  Decision Tree Classification on Test Set:

 

Implement a decision tree classification algorithm that walks the trained tree generated in step D and outputs predicted labels on the test dataset.
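Walking a trained tree can be sketched as below. The sketch assumes a nested-dict tree in which an inner node is {"attr": index, "children": {0: ..., 1: ...}} and a leaf is a plain label; this layout and the example tree are hypothetical, not prescribed by the assignment.

```python
def classify(tree, features):
    """Follow attribute tests from the root until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        value = features[node["attr"]]
        node = node["children"][value]
    return node


# Hypothetical tree: test attribute 0 first, then attribute 7.
tree = {
    "attr": 0,
    "children": {
        0: 1,
        1: {"attr": 7, "children": {0: 0, 1: 1}},
    },
}

all_zero = [0] * 10
only_first = [1] + [0] * 9
first_and_eighth = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

predictions = [classify(tree, f) for f in (all_zero, only_first, first_and_eighth)]
```

Predicting a whole protein then amounts to calling classify once per residue feature vector and joining the resulting labels.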

 

N.B. ID3 is an offline learning algorithm. Therefore, training and classification should be implemented separately. The classification algorithm should take a protein sequence in FASTA format as input and predict labels in standalone mode. You may save the parameters learned during training to a file that can then be fed into the classifier.
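One way to hand the trained tree from the training script to the standalone classifier is a JSON file, written with the standard json module. The sketch below assumes a nested-dict tree with integer branch keys; because JSON object keys are always strings, the loader converts them back. The function names and the file name tree.json are hypothetical.

```python
import json
import os
import tempfile


def tree_to_json(tree):
    """Serialise a nested-dict tree; integer branch keys become strings."""
    return json.dumps(tree)


def tree_from_json(text):
    """Load a tree and restore the integer branch keys (0 and 1)."""
    def restore(node):
        if isinstance(node, dict):
            return {
                "attr": node["attr"],
                "children": {int(k): restore(v)
                             for k, v in node["children"].items()},
            }
        return node  # leaf label
    return restore(json.loads(text))


# Hypothetical trained tree.
tree = {"attr": 0, "children": {0: 1, 1: {"attr": 7, "children": {0: 0, 1: 1}}}}

# Training side writes the file; classifier side reads it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tree.json")
    with open(path, "w") as handle:
        handle.write(tree_to_json(tree))
    with open(path) as handle:
        loaded = tree_from_json(handle.read())
```

A plain-text format such as JSON also makes the learned tree easy to inspect by hand, which can help when preparing the presentation slide.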

 

 

F.  Evaluate Accuracy:

 

Use precision, recall, and F1 score to evaluate the decision tree classifier implemented in step E on the test dataset.
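The three measures follow from the counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). A minimal sketch is given below. Treating exposed (1) as the positive class and returning 0.0 when a denominator is zero are assumptions, as are the example label lists.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical true and predicted labels (1 = exposed, 0 = buried).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here TP = 2, FP = 1, and FN = 1, so precision, recall, and F1 are all 2/3.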
