Build effective classifiers to distinguish between malware and benign files, which are described by static and dynamic features.

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Task

Build effective classifiers to distinguish between malware and benign files, which are described by static and dynamic features.

Learning Goals

Gain experience building a variety of classification models.

Learn how to handle data with significant class imbalance.

Gain experience in using classification methods to identify malware, including cases where there are an enormous number of features.

Dataset Description

The data set we use for this lab was generated from data obtained from the UCI repository (footnote 1). The original data was distributed across three separate files: two containing descriptions of malware and one containing descriptions of benign files. The three files were merged into a single file for this lab; a few fields in the separate files did not overlap, so some preprocessing was needed to merge them. The data file that you will need for this lab is located at: https://storm.cis.fordham.edu/~gweiss/classes/cisc5660/data/Malware-staDyn-data.csv
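As a starting point, the file can be read directly into a pandas DataFrame. This is a minimal sketch, assuming pandas is installed and the URL above is still reachable; the helper name `load_malware_data` is hypothetical and not part of the lab handout.

```python
import pandas as pd

# URL taken from the dataset description above.
DATA_URL = (
    "https://storm.cis.fordham.edu/~gweiss/classes/cisc5660/"
    "data/Malware-staDyn-data.csv"
)


def load_malware_data(source=DATA_URL):
    """Read the merged CSV into a DataFrame.

    `source` may be the URL above, a local file path, or a file-like
    object (hypothetical helper, for illustration only).
    """
    return pd.read_csv(source)


# Usage (requires network access, or pass a local copy of the file):
# df = load_malware_data()
# print(df.shape)
```

Downloading the file once and passing the local path avoids repeated network requests while experimenting.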

Table 1: Summary Statistics of dataset

Name                 Malware-staDyn-data.csv
# Records            6,248
# Attributes         1,085
Class variable       “label” located as last feature
Class values         0 (benign) and 1 (malware)
Class distribution   90.5% malware (n=5653) and 9.5% benign (n=595)
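The summary in Table 1 can be checked against the loaded data. The sketch below assumes the class column is named "label", as the table states; the function names `split_features_and_label` and `class_distribution` are hypothetical helpers introduced for illustration.

```python
import pandas as pd


def split_features_and_label(df, label_col="label"):
    """Return (X, y): y is the class column, X is every other column."""
    y = df[label_col]
    X = df.drop(columns=[label_col])
    return X, y


def class_distribution(y):
    """Return the count and percentage of each class value."""
    counts = y.value_counts().sort_index()
    percent = (100 * counts / counts.sum()).round(1)
    return pd.DataFrame({"count": counts, "percent": percent})


# Usage, given a DataFrame `df` holding the lab data:
# X, y = split_features_and_label(df)
# print(class_distribution(y))
```

If the file is unchanged, the counts should agree with Table 1 (595 benign, 5,653 malware). A mismatch would suggest the file or the label column differs from the description.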

 

 

The original data did not contain a detailed description of each feature, but some general information can be obtained from the paper “Protecting from Malware Obfuscation Attacks through Adversarial Risk Analysis” (footnote 2; see Section 2.2 on feature extraction). As that paper discusses, static features include Assembly Language File (ASM) features, Hex dump features, and Portable Executable File Header (PE Header) features. Dynamic features are generated based on the runtime behavior of the binaries executed within a Virtual Machine (the Cuckoo Sandbox environment was used with a two-minute default time). The 12 features most relevant for malware detection were included.

Please follow footnote 2, read Section 2.2, and browse other parts of the paper if interested.

 

What You Need to Do for this Lab

For this lab you need to build and evaluate a Decision Tree, a Random Forest, and a kNN classifier to distinguish between malware and benign examples from the supplied dataset. You will need to run all experiments on the unbalanced training data, one experiment on training data rebalanced using random oversampling (ROS), and several on training data rebalanced using SMOTE. You will also need to vary certain model parameters. You should generate and submit a table that is formatted like Table 1 below, which effectively specifies the details of the experiments. You also need to supply your code (I suggest you use Jupyter Notebook). The code must be well commented; the quality of the comments will affect your grade.
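The experiments described above could be organized roughly as follows. This is a sketch under several assumptions that the handout does not confirm: scikit-learn is installed, a 70/30 stratified split is used, balanced accuracy is the evaluation metric, and the model parameters shown are placeholders to be varied. The names `random_oversample` and `run_experiments` are hypothetical, and the ROS step is written by hand here with `sklearn.utils.resample` rather than taken from a dedicated library.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


def random_oversample(X, y, random_state=0):
    """Random oversampling (ROS): duplicate randomly chosen rows of each
    smaller class until every class matches the largest class in size."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = int(counts.max())
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        X_cls = X[y == cls]
        missing = target - int(count)
        if missing > 0:
            # Sample with replacement to add the missing rows.
            extra = resample(X_cls, replace=True, n_samples=missing,
                             random_state=random_state)
            X_cls = np.vstack([X_cls, extra])
        X_parts.append(X_cls)
        y_parts.append(np.full(len(X_cls), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def run_experiments(X, y, oversample=False, random_state=0):
    """Train the three classifiers; return {name: balanced accuracy}."""
    X, y = np.asarray(X), np.asarray(y)
    # Assumed split: 70% train, 30% test, preserving class proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state)
    if oversample:
        # Rebalance the training split only; the test split keeps the
        # original class distribution.
        X_train, y_train = random_oversample(X_train, y_train, random_state)
    # Placeholder parameters; the lab asks for these to be varied.
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = balanced_accuracy_score(y_test, model.predict(X_test))
    return scores
```

Calling `run_experiments(X, y)` gives the unbalanced baseline and `run_experiments(X, y, oversample=True)` the ROS variant. For the SMOTE runs, one option is the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling`), whose `fit_resample(X_train, y_train)` method could replace the `random_oversample` call, assuming that package is installed. Because kNN relies on distances, scaling the features before fitting may also be worth trying.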

 
