DALT7011 Introduction to Machine Learning Resit Coursework Assignment Semester 1, 2022/23
Nature of the assignment
This assignment allows you to practise your knowledge of machine learning by applying existing algorithms in R to solve the MNIST handwritten digit classification problem.
The goal of the assignment is to take as input an image of a handwritten single digit, and determine what digit it is.
First, the image data from the MNIST dataset needs to be loaded and prepared.
Then principal component analysis (PCA), the most common linear technique for dimensionality reduction, should be applied to map the data to a lower-dimensional space and to visualise it.
Next, you should use the pre-processed and dimension-reduced data to train a k-nearest-neighbour (kNN) classifier on the dataset. Use appropriate statistical tools to estimate the classification error and determine which value of k is best suited for the task.
After that, you should choose one additional, suitable machine learning algorithm and evaluate its effectiveness for the same task. Please use a different method than the one you proposed in your original report. Your description of the experiments should provide a clear comparison of these two algorithms and formulate appropriate recommendations.
The entire experiment must be thoroughly documented and accompanied by a set of R scripts from which it can be reproduced.
Dataset
The MNIST dataset is an open-source database of handwritten black-and-white digits that is widely used for training and testing machine learning algorithms, as well as for testing various image-processing systems. The MNIST dataset is an industry standard and one of the most common benchmarks for classification algorithms.
The performances of most common algorithms on this dataset are well known and documented as error rates on the test set on the dataset’s homepage (see below). The accuracy rate may vary depending on the chosen machine learning algorithm.
There are 60,000 images in the training fold and 10,000 images in the testing fold. The dataset comprises 10 classes, one class per digit, with roughly 7,000 images (about 6,000 train images and 1,000 test images) per class; the classes are not exactly balanced.
The data and information about MNIST can be found on the dataset homepage:
http://yann.lecun.com/exdb/mnist/
Dataset curators: Chris Burges, Corinna Cortes and Yann LeCun
Licensing information: MIT licence
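As a starting point for loading the data, the following is a minimal sketch of reading the IDX binary format in base R. It assumes the files follow the layout described on the dataset homepage (big-endian 32-bit header fields followed by unsigned bytes); the function names read_idx_images and read_idx_labels are hypothetical helpers, and the tiny file written at the end is a stand-in used only to show the round trip, not real MNIST data.

```r
# Hypothetical helpers for the IDX format (assumed layout: big-endian header,
# then one unsigned byte per pixel or label).
read_idx_images <- function(path) {
  con <- gzfile(path, "rb")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  header <- readBin(con, integer(), n = 4, size = 4, endian = "big")
  stopifnot(header[1] == 2051)  # magic number for image files
  n <- header[2]; rows <- header[3]; cols <- header[4]
  pixels <- readBin(con, integer(), n = n * rows * cols, size = 1, signed = FALSE)
  # One image per row, one pixel per column
  matrix(pixels, nrow = n, ncol = rows * cols, byrow = TRUE)
}

read_idx_labels <- function(path) {
  con <- gzfile(path, "rb")
  on.exit(close(con))
  header <- readBin(con, integer(), n = 2, size = 4, endian = "big")
  stopifnot(header[1] == 2049)  # magic number for label files
  readBin(con, integer(), n = header[2], size = 1, signed = FALSE)
}

# Round-trip demonstration on a tiny stand-in file: two 2x2 "images".
tmp <- tempfile()
con <- file(tmp, "wb")
writeBin(c(2051L, 2L, 2L, 2L), con, size = 4, endian = "big")
writeBin(as.raw(c(0, 50, 100, 150, 200, 250, 255, 1)), con)
close(con)

imgs <- read_idx_images(tmp)
imgs_scaled <- imgs / 255  # scale pixel intensities to [0, 1]
```

For the real data, the same functions would be pointed at the downloaded image and label files for the training and testing folds.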
Suggested contents of the report
Title page
The following information should appear on the front page of your report:
• Module number: DALT7011.
• Student Number:
• MSc Course: MSc in …
• Word count:
1. Introduction (15%)
Provide a brief, clear introduction to the general topic with the following components:
1) The notion of a general classification problem.
2) The notion of and rationale for separating test and train data sets.
3) Explain what the MNIST data set is about and how it is an example of the previous two points.
2. Data preparation and PCA Dimensionality Reduction (20%)
Prepare your data set for analysis.
Apply dimensionality reduction on the data set in R and visualize different classes in two dimensions based on it.
Investigate how many principal components are needed to encode the data set and give an example of reconstruction.
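One possible way to approach the component count and the reconstruction is sketched below using prcomp from base R. The data here is a synthetic stand-in (200 random "images" of 64 pixels), and the 95% variance threshold is an assumption chosen for illustration; neither is prescribed by this brief.

```r
set.seed(1)

# Synthetic stand-in for the image matrix: low-rank structure plus noise.
n <- 200; p <- 64
latent   <- matrix(rnorm(n * 5), n, 5)
loadings <- matrix(rnorm(5 * p), 5, p)
x <- latent %*% loadings + matrix(rnorm(n * p, sd = 0.1), n, p)

# PCA with centring only; constant pixels would break scale. = TRUE.
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Cumulative proportion of variance explained by the first d components.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching the (assumed) 95% threshold.
k <- which(var_explained >= 0.95)[1]

# Reconstruct from the first k components and add the mean back.
scores <- pca$x[, 1:k, drop = FALSE]
x_hat  <- scores %*% t(pca$rotation[, 1:k, drop = FALSE])
x_hat  <- sweep(x_hat, 2, pca$center, "+")

# Root-mean-square reconstruction error per pixel.
rmse <- sqrt(mean((x - x_hat)^2))

# With class labels available, pca$x[, 1:2] gives the two-dimensional view.
```

Plotting var_explained against the number of components, and showing an original image next to its reconstruction for a few values of k, are natural illustrations for this section.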
2. K-Nearest-Neighbour Classification Error Rate Evaluation (30%)
Implement and describe an experiment in R that evaluates the classification error rate for a k-nearest-neighbour (kNN) classifier on the MNIST dataset.
Apply appropriate pre-processing and dimensionality reduction to the data.
Run the kNN classifier on the reduced dataset. Assess the impact of PCA on kNN.
Determine the most suitable value for k experimentally, using a suitable error measurement. Use appropriate illustrations, diagrams and statistics.
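A minimal sketch of choosing k by cross-validation is given below. It assumes the class package (a recommended package normally shipped with R) and uses five folds, a small candidate grid and synthetic three-class data purely for illustration; the fold count, the grid and the error measure are choices you would need to justify yourself.

```r
library(class)  # provides knn(); assumed to be installed
set.seed(42)

# Synthetic stand-in: three Gaussian classes in two dimensions.
n_per_class <- 100
centres <- rbind(c(0, 0), c(3, 0), c(0, 3))
x <- do.call(rbind, lapply(1:3, function(i) {
  sweep(matrix(rnorm(n_per_class * 2), ncol = 2), 2, centres[i, ], "+")
}))
y <- factor(rep(1:3, each = n_per_class))

# Random assignment of each observation to one of five folds.
n_folds <- 5
folds <- sample(rep(1:n_folds, length.out = nrow(x)))

# Candidate values of k (odd values reduce ties).
ks <- c(1, 3, 5, 7, 9, 11)

# Mean misclassification rate over the folds for each k.
cv_error <- sapply(ks, function(k) {
  mean(sapply(1:n_folds, function(f) {
    held_out <- folds == f
    pred <- knn(train = x[!held_out, , drop = FALSE],
                test  = x[held_out, , drop = FALSE],
                cl    = y[!held_out],
                k     = k)
    mean(pred != y[held_out])
  }))
})

best_k <- ks[which.min(cv_error)]
```

When PCA is part of the pipeline, fitting it on the training part of each fold only keeps the held-out error estimate honest. Plotting cv_error against ks is a simple way to illustrate the choice.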
3. Second ML Technique and Error Rate Evaluation (20%)
Implement and describe an experiment in R that evaluates the second, suitable classification algorithm of your choice on the MNIST dataset.
Use appropriate illustrations and diagrams as well as statistics in order to compare to the previous results.
Does PCA have the same effect on the chosen classifier as it had on kNN in the previous section?
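The comparison asked for here could be organised as in the sketch below, which evaluates two classifiers on the same PCA projections for several numbers of components. Linear discriminant analysis (lda from the MASS package) is used purely as a placeholder for the second algorithm and is not a recommendation; the synthetic data, the fixed k = 5 and the grid of dimensions are likewise assumptions for illustration.

```r
library(class)  # knn()
library(MASS)   # lda(); placeholder second classifier
set.seed(7)

# Synthetic stand-in: three classes in 20 dimensions, separated in the
# first three coordinates.
n <- 300; p <- 20
y <- factor(rep(1:3, each = n / 3))
means <- matrix(0, 3, p)
means[1, 1] <- 3; means[2, 2] <- 3; means[3, 3] <- 3
x <- matrix(rnorm(n * p), n, p) + means[as.integer(y), ]

# Hold out 100 observations for testing.
train <- sample(n, 200)

# Fit PCA on the training data only, then project both sets.
pca     <- prcomp(x[train, ], center = TRUE)
z_train <- pca$x
z_test  <- predict(pca, x[-train, ])

# Test error of both classifiers using the first d components.
eval_at <- function(d) {
  tr <- z_train[, 1:d, drop = FALSE]
  te <- z_test[, 1:d, drop = FALSE]
  knn_pred <- knn(tr, te, cl = y[train], k = 5)
  lda_pred <- predict(lda(tr, grouping = y[train]), te)$class
  c(knn = mean(knn_pred != y[-train]),
    lda = mean(lda_pred != y[-train]))
}

dims <- c(2, 5, 10, 20)
errors <- t(sapply(dims, eval_at))
rownames(errors) <- paste0("pc", dims)
```

Each row of errors holds the test error of both methods at one dimensionality, so the effect of PCA on each classifier can be read off and plotted side by side.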
4. Conclusion (15%)
Write a brief conclusion on your results and compare them to the results published for other algorithms on the MNIST dataset homepage.
Summarise your main findings.
1) Which approach and parameter value are best suited?
2) What properties, other than the classification error, could be important when deciding which method is most suitable?
3) Explain the possible current limitations of your solutions and possible further strategies to improve on the results.
All arguments must be evidence-based.
References, plagiarism and collusion
Provide a list of references.
You are required to cite the work of others used in your solution, include a list of references, and avoid plagiarism and collusion. Remember that each borrowed item should have at least one citation (use the university recommended referencing style).
Appendix
Append all source code to reproduce your experiments.
References and Appendices themselves will not be marked. However, inappropriate use of these sections or their absence will be taken into consideration when awarding the final mark.
Report format
All of the above-mentioned components should be combined in a single file.
The assignment must be presented in the following format:
• Font must be 11-point Arial, with either single line spacing and 10 pt spacing after, or double line spacing and 0 pt spacing after.
• All pages must be numbered.
• Margins must be as follows: Top: 1 inch, Bottom: 1 inch (2.5 cm), Left: 1.25 inches, Right: 1.25 inches (3.2 cm).
Report word limit: 2,000 words.
The word count excludes cover sheet, title, tables, figure labels, bibliography and appendices (100-250 words).