DALT7011 Introduction to Machine Learning Resit Coursework Assignment, Semester 1, 2022/23

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Nature of the assignment

This assignment allows you to practise your knowledge of machine learning by applying existing algorithms in R to solve the MNIST handwritten digit classification problem.

The goal of the assignment is to take as input an image of a single handwritten digit and determine which digit it is.

First, the image data from the MNIST dataset needs to be loaded and prepared.
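The MNIST homepage distributes the data as four gzipped files in the IDX format: a header of big-endian 32-bit integers (a magic number, the item count and, for images, the row and column counts) followed by unsigned bytes. Below is a minimal base-R sketch of a reader. `read_idx_images()` and `read_idx_labels()` are hypothetical helper names, not functions from any package. The self-check writes a tiny synthetic file rather than using the real data.

```r
# Hypothetical helpers for reading MNIST IDX files (format described on the MNIST homepage).
# gzfile() can read both the distributed .gz files and uncompressed copies.
read_idx_images <- function(path) {
  con <- gzfile(path, "rb")
  on.exit(close(con))
  # Header: magic number 2051, image count, rows, columns (big-endian 32-bit integers)
  header <- readBin(con, "integer", n = 4, size = 4, endian = "big")
  stopifnot(header[1] == 2051L)
  n <- header[2]
  n_pixels <- header[3] * header[4]
  # Pixel intensities are unsigned bytes (0-255), stored image by image, row by row
  pixels <- readBin(con, "integer", n = n * n_pixels, size = 1, signed = FALSE)
  matrix(pixels, nrow = n, ncol = n_pixels, byrow = TRUE)
}

read_idx_labels <- function(path) {
  con <- gzfile(path, "rb")
  on.exit(close(con))
  # Header: magic number 2049 and label count
  header <- readBin(con, "integer", n = 2, size = 4, endian = "big")
  stopifnot(header[1] == 2049L)
  readBin(con, "integer", n = header[2], size = 1, signed = FALSE)
}

# Self-check on tiny synthetic files: two 2x2 "images" and two labels
tmp_img <- tempfile(fileext = ".gz")
con <- gzfile(tmp_img, "wb")
writeBin(c(2051L, 2L, 2L, 2L), con, size = 4, endian = "big")
writeBin(as.raw(c(0, 255, 10, 20, 1, 2, 3, 4)), con)
close(con)

tmp_lab <- tempfile(fileext = ".gz")
con <- gzfile(tmp_lab, "wb")
writeBin(c(2049L, 2L), con, size = 4, endian = "big")
writeBin(as.raw(c(7, 3)), con)
close(con)

imgs <- read_idx_images(tmp_img) / 255               # rescale pixels to [0, 1]
labels <- factor(read_idx_labels(tmp_lab), levels = 0:9)
```

Assuming the files have been downloaded into the working directory under their homepage names, `x_train <- read_idx_images("train-images-idx3-ubyte.gz") / 255` should give a 60,000 × 784 matrix with one row per image.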

Then principal component analysis (PCA), the most common linear technique for dimensionality reduction, should be applied to map the data to a lower-dimensional space and to visualise it.

 Next, you should use the pre-processed and dimension-reduced data to train a k-nearest-neighbour (kNN) classifier on the dataset. Use appropriate statistical tools to estimate the classification error and determine which value of k is best suited for the task. 
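One simple statistical tool for the error estimate is a binomial confidence interval. If each test image is treated as an independent trial, the number of misclassified images follows a binomial distribution. The sketch below uses base R's `binom.test()`, which reports an exact (Clopper-Pearson) interval. The helper name `error_with_ci()` and the toy predictions are illustrative assumptions, not part of the brief.

```r
# Sketch: test-set error rate with an exact 95% binomial confidence interval
error_with_ci <- function(predicted, actual, conf_level = 0.95) {
  n_err <- sum(predicted != actual)      # number of misclassified items
  n <- length(actual)
  ci <- binom.test(n_err, n, conf.level = conf_level)$conf.int
  c(error = n_err / n, lower = ci[1], upper = ci[2])
}

# Toy example (not MNIST results): 3 mistakes out of 20 predictions
actual <- factor(rep(0:1, each = 10))
predicted <- actual
predicted[c(1, 5, 12)] <- c("1", "1", "0")
res <- error_with_ci(predicted, actual)
res
```

On the full 10,000-image MNIST test set the interval is narrow. For an error rate near 2%, the standard error is about sqrt(0.02 × 0.98 / 10000) ≈ 0.0014. This helps you judge whether two reported error rates really differ.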

After that, you should choose one additional, suitable machine learning algorithm and evaluate its effectiveness for the same task. Please use a different method from the one you proposed in your original report. Your description of the experiments should provide a clear comparison of the two algorithms and formulate appropriate recommendations.

The entire experiment must be thoroughly documented and accompanied by a set of R scripts from which it can be reproduced.

 

Dataset 

The MNIST dataset is an open-source database of handwritten greyscale digit images. It is widely used for training and testing machine learning algorithms, as well as for testing various image-processing systems. The MNIST dataset is an industry standard and one of the most common benchmarks for classification algorithms.

The performance of the most common algorithms on this dataset is well known and is documented as test-set error rates on the dataset’s homepage (see below). The error rate varies with the chosen machine learning algorithm.

There are 60,000 images in the training set and 10,000 images in the test set. The dataset comprises 10 classes, one per digit, with roughly 7,000 images per class on average (about 6,000 training and 1,000 test images). The classes are not exactly balanced.

The data and information about MNIST can be found on the dataset homepage: 

http://yann.lecun.com/exdb/mnist/ 

Dataset curators: Chris Burges, Corinna Cortes and Yann LeCun

 Licensing information: MIT licence

 Suggested contents of the report 

Title page

 The following information should appear on the front page of your report: 

• Module number: DALT7011. 

• Student Number: 

• MSc Course: MSc in …

 • Word count: 

1. Introduction (15%) 

Provide a brief, clear introduction to the general topic with the following components:

 1) The notion of a general classification problem.

 2) The notion of and rationale for separating test and train data sets. 

3) Explain what the MNIST data set is about and how it is an example of the previous two points. 

2. Data preparation and PCA Dimensionality Reduction (20%) 

Prepare your data set for analysis. 

Apply PCA dimensionality reduction to the data set in R and, based on it, visualise the different classes in two dimensions.
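The sketch below produces a two-dimensional PCA view with base R's `prcomp()` and a scatter plot of the first two component scores. To keep it self-contained, a small synthetic matrix with three classes stands in for the MNIST pixels. Zero-variance columns are dropped first. Some MNIST border pixels are zero in every image, and `prcomp(..., scale. = TRUE)` fails on constant columns.

```r
# Sketch: PCA projection to two dimensions, coloured by class (synthetic stand-in data)
set.seed(42)
n <- 300; p <- 64
y <- factor(sample(0:2, n, replace = TRUE))            # 3 hypothetical classes
centres <- matrix(runif(3 * p), nrow = 3)              # one prototype per class
x <- centres[as.integer(y), ] + matrix(rnorm(n * p, sd = 0.3), n, p)
x[, 1:4] <- 0                                          # constant "border" pixels

# Drop zero-variance columns: they carry no information and break scale. = TRUE
keep <- apply(x, 2, var) > 0
pca <- prcomp(x[, keep], center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]                                 # coordinates on PC1 and PC2

pdf(tempfile(fileext = ".pdf"))                        # write the plot to a temporary file
plot(scores, col = as.integer(y), pch = 19, cex = 0.6,
     xlab = "PC1", ylab = "PC2", main = "Projection onto the first two PCs")
legend("topright", legend = levels(y), col = seq_along(levels(y)), pch = 19)
dev.off()
```

With the real data, `x` would be the 60,000 × 784 training matrix and `y` the digit labels. Colouring by label shows how well two components separate the ten classes.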

Investigate how many principal components are needed to encode the data set and give an example of reconstruction.
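One common way to decide how many components are needed is the cumulative proportion of variance explained. Reconstruction maps the first d scores back to pixel space: multiply them by the transposed loadings, then add back the column means. The sketch below assumes a 90% threshold purely for illustration and uses synthetic data in place of MNIST.

```r
# Sketch: choosing the number of components and reconstructing from them
set.seed(42)
n <- 300; p <- 64
y <- factor(sample(0:2, n, replace = TRUE))
centres <- matrix(runif(3 * p), nrow = 3)
x <- centres[as.integer(y), ] + matrix(rnorm(n * p, sd = 0.3), n, p)

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Cumulative proportion of total variance explained by the first d components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
d <- which(cum_var >= 0.90)[1]          # smallest d reaching 90% (the threshold is a choice)

# Reconstruction from d components: scores %*% t(loadings), then add the column means back
x_hat <- pca$x[, 1:d, drop = FALSE] %*% t(pca$rotation[, 1:d, drop = FALSE])
x_hat <- sweep(x_hat, 2, pca$center, "+")
recon_mse <- mean((x - x_hat)^2)        # mean squared reconstruction error

# Using every component reproduces the data exactly (up to floating-point rounding)
x_full <- sweep(pca$x %*% t(pca$rotation), 2, pca$center, "+")
```

For a 784-pixel MNIST row, `image(matrix(x_hat[1, ], 28, 28)[, 28:1], col = gray.colors(256))` should display the reconstructed digit upright. The column flip compensates for `image()` drawing the first matrix column at the bottom.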

3. K-Nearest-Neighbour Classification Error Rate Evaluation (30%)

Implement and describe an experiment in R that evaluates the classification error rate of a k-nearest-neighbour (kNN) classifier on the MNIST dataset.

 Apply appropriate pre-processing and dimensionality reduction to the data.

Run the kNN classifier on the reduced dataset and assess the impact of PCA on kNN.
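One way to assess the impact of PCA on kNN is to compare two test errors: one on the raw features and one on the first d component scores, with the same split and the same k. The sketch uses `knn()` from the `class` package, which ships with R as a recommended package. The split sizes, d = 20 and k = 3 are arbitrary assumptions, and the data are synthetic. PCA is fitted on the training rows only, so no information from the test rows leaks into the projection.

```r
library(class)   # knn()

set.seed(7)
n <- 600; p <- 100
y <- factor(sample(0:2, n, replace = TRUE))            # synthetic labels
centres <- matrix(runif(3 * p), nrow = 3)              # one prototype per class
x <- centres[as.integer(y), ] + matrix(rnorm(n * p, sd = 0.5), n, p)

train <- sample(n, 400)
x_tr <- x[train, ];  y_tr <- y[train]
x_te <- x[-train, ]; y_te <- y[-train]

# Fit PCA on the training rows only, then project the test rows with the same rotation
pca <- prcomp(x_tr, center = TRUE, scale. = FALSE)
d <- 20                                                # components kept (a tuning choice)
z_tr <- pca$x[, 1:d]
z_te <- predict(pca, x_te)[, 1:d]

# Test error of kNN for a given feature representation
knn_error <- function(f_tr, f_te, k) {
  pred <- knn(train = f_tr, test = f_te, cl = y_tr, k = k)
  mean(pred != y_te)
}

err_raw <- knn_error(x_tr, x_te, k = 3)
err_pca <- knn_error(z_tr, z_te, k = 3)
c(raw = err_raw, pca = err_pca)
```

Repeating the comparison for several values of d, ideally together with the running time, gives a fuller picture of the trade-off than a single pair of numbers.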

Determine the most suitable value for k experimentally, using a suitable error measurement. Use appropriate illustrations, diagrams and statistics. 
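The sketch below chooses k by 5-fold cross-validation on the training data. For each candidate k it reports the mean error and its standard error across folds, and plots them with error bars. The candidate values, fold count and synthetic data are assumptions. For smaller data sets, `class::knn.cv()` (leave-one-out cross-validation) is a built-in alternative.

```r
library(class)   # knn()

set.seed(11)
n <- 500; p <- 30
y <- factor(sample(0:2, n, replace = TRUE))
centres <- matrix(runif(3 * p), nrow = 3)
x <- centres[as.integer(y), ] + matrix(rnorm(n * p, sd = 0.6), n, p)

n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = n))  # random fold assignment
k_values <- c(1, 3, 5, 7, 9, 15)

# cv_err[f, j] = error of k_values[j] when fold f is held out
cv_err <- sapply(k_values, function(k) {
  sapply(seq_len(n_folds), function(f) {
    held_out <- folds == f
    pred <- knn(x[!held_out, ], x[held_out, ], cl = y[!held_out], k = k)
    mean(pred != y[held_out])
  })
})

cv_mean <- colMeans(cv_err)
cv_se <- apply(cv_err, 2, sd) / sqrt(n_folds)          # standard error across folds
best_k <- k_values[which.min(cv_mean)]

pdf(tempfile(fileext = ".pdf"))
plot(k_values, cv_mean, type = "b", xlab = "k", ylab = "Cross-validated error rate")
arrows(k_values, cv_mean - cv_se, k_values, cv_mean + cv_se,
       angle = 90, code = 3, length = 0.05)             # +/- 1 standard error
dev.off()
```

Instead of the minimum, you can apply the one-standard-error rule: take the largest k whose mean error lies within one standard error of the best. This favours a smoother decision boundary when several values of k perform similarly.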

 

4. Second ML Technique and Error Rate Evaluation (20%)

Implement and describe an experiment in R that evaluates a second suitable classification algorithm of your choice on the MNIST dataset.

Use appropriate illustrations, diagrams and statistics to compare the results with those of the previous section.
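When both classifiers are evaluated on the same test images, their predictions are paired. McNemar's test (base R `mcnemar.test()`) is one way to check whether the difference in their error rates is larger than chance would explain. It uses only the images on which exactly one of the two classifiers is correct. The helper `compare_classifiers()` and the toy predictions below are hypothetical.

```r
# Sketch: McNemar's test on paired predictions from two classifiers
compare_classifiers <- function(pred_a, pred_b, actual) {
  correct_a <- factor(pred_a == actual, levels = c(TRUE, FALSE))
  correct_b <- factor(pred_b == actual, levels = c(TRUE, FALSE))
  tab <- table(A_correct = correct_a, B_correct = correct_b)  # 2x2 agreement table
  list(table = tab, test = mcnemar.test(tab))
}

# Toy predictions (not MNIST results)
actual <- factor(rep(0:1, each = 10))
pred_a <- actual; pred_a[c(2, 15)] <- c("1", "0")                                  # A: 2 errors
pred_b <- actual; pred_b[c(2, 4, 6, 8, 13, 17)] <- c("1", "1", "1", "1", "0", "0")  # B: 6 errors
res <- compare_classifiers(pred_a, pred_b, actual)
res$table
res$test
```

Here A is right and B wrong on 5 images, and the reverse holds for 1 image. The continuity-corrected statistic is (|5 - 1| - 1)^2 / (5 + 1) = 1.5.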

 Does PCA have the same effect on the chosen classifier as it had on kNN in the previous section?
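As an illustration only, the sketch below uses multinomial logistic regression (`nnet::multinom()`) as the second classifier. This choice is assumed for the example, not prescribed by the brief. The sketch compares the classifier's test error on raw features with its error on the first 10 principal components. With raw MNIST pixels (784 inputs, 10 classes) the model has roughly 7,850 weights, well above the default `MaxNWts` limit of 1,000, hence the larger value. The data are synthetic.

```r
library(nnet)   # multinom(); nnet ships with R as a recommended package

set.seed(3)
n <- 600; p <- 50
y <- factor(sample(0:2, n, replace = TRUE))            # 3 synthetic classes
centres <- matrix(runif(3 * p), nrow = 3)
x <- centres[as.integer(y), ] + matrix(rnorm(n * p, sd = 0.8), n, p)
train <- sample(n, 400)

# Fit multinomial logistic regression on the given features and return the test error
multinom_error <- function(f_tr, f_te) {
  d_tr <- data.frame(label = y[train], f_tr)
  fit <- multinom(label ~ ., data = d_tr, maxit = 300, trace = FALSE,
                  MaxNWts = 10000)                      # raise the weight limit for many inputs
  pred <- predict(fit, newdata = data.frame(f_te), type = "class")
  mean(pred != y[-train])
}

# Raw features versus the first 10 principal components (PCA fitted on training rows)
pca <- prcomp(x[train, ], center = TRUE, scale. = FALSE, rank. = 10)
err_raw <- multinom_error(x[train, ], x[-train, ])
err_pca <- multinom_error(pca$x, predict(pca, x[-train, ]))
c(raw = err_raw, pca = err_pca)
```

The column names match between training and test frames (`X1`, `X2`, … for raw features; `PC1`, `PC2`, … for scores), which `predict()` relies on.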

5. Conclusion (15%)

 Write a brief conclusion on your results and compare them to the results published for other algorithms on the MNIST dataset homepage.
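The homepage reports each method's test error rate as a percentage, so results are easiest to compare in that unit. A confusion matrix additionally shows which digits are confused with each other. Below is a minimal sketch with toy vectors (not real results).

```r
# Sketch: overall test error in percent and a confusion matrix
actual    <- factor(c(0, 0, 1, 1, 2, 2, 2, 1), levels = 0:2)  # toy labels
predicted <- factor(c(0, 1, 1, 1, 2, 0, 2, 1), levels = 0:2)  # toy predictions

test_error_pct <- 100 * mean(predicted != actual)
conf_mat <- table(Predicted = predicted, Actual = actual)
per_class_error <- 1 - diag(conf_mat) / colSums(conf_mat)   # error rate within each true class

test_error_pct
conf_mat
```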

 Summarise your main findings.

 1) Which approach and parameter value is best suited? 

2) What properties other than the classification error could be important in deciding which method is most suitable?
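Besides the error rate, training time, prediction time and memory can matter. kNN, for example, stores the whole training set and compares every query against it, so its prediction cost grows with both the number of training images and the number of features. The sketch below measures elapsed time with `system.time()` and storage with `object.size()` on synthetic data. The sizes, d = 20 and k = 5 are assumptions, and timings depend on the machine.

```r
library(class)   # knn()

set.seed(5)
n <- 2000; p <- 200
x <- matrix(runif(n * p), n, p)                        # synthetic stand-in features
y <- factor(sample(0:9, n, replace = TRUE))
train <- 1:1500

# PCA fitted on the training rows, keeping 20 components
pca <- prcomp(x[train, ], rank. = 20)
z_tr <- pca$x
z_te <- predict(pca, x[-train, ])

# Elapsed prediction time of kNN on raw versus reduced features
t_raw <- system.time(knn(x[train, ], x[-train, ], cl = y[train], k = 5))[["elapsed"]]
t_pca <- system.time(knn(z_tr, z_te, cl = y[train], k = 5))[["elapsed"]]

# Memory kNN must keep at prediction time: the stored training features
size_raw <- object.size(x[train, ])
size_pca <- object.size(z_tr)
format(size_raw, units = "Kb"); format(size_pca, units = "Kb")
```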

3) Explain the current limitations of your solutions and possible strategies to improve on the results.

All arguments must be evidence-based. 

 

References, plagiarism and collusion

 Provide a list of references. 

You are required to cite the work of others used in your solution, include a list of references, and avoid plagiarism and collusion. Remember that each borrowed idea or source should have at least one citation (use the university's recommended referencing style).

 

 Appendix 

Append all source code to reproduce your experiments.

 References and Appendices themselves will not be marked. However, inappropriate use of these sections or their absence will be taken into consideration when awarding the final mark. 

 

Report format

All the components mentioned above should be combined in a single file.

 

The assignment must be presented in the following format: 

• Font must be 11-point Arial, with either single line spacing and 10 pt spacing after paragraphs, or double line spacing and 0 pt spacing after paragraphs.

 • All pages must be numbered. 

• Margins must be as follows: top and bottom 1 inch (2.5 cm); left and right 1.25 inches (3.2 cm).

Report word limit: 2,000 words. 

The word count excludes the cover sheet, title, tables, figure labels, bibliography and appendices (100-250 words).
