The following nomenclature is used for this assignment:
1. Report text available against each student -> document or instance
2. Words in the report text -> tokens
3. Splitting a sentence or document into its constituent words -> tokenization
4. Stop-words -> commonly used words; the meaning can still be retained without them
5. White spaces, commas and full stops act as token splitters (delimiters).
M1. Load the documents for all 37 students. Tokenize all the documents and store the tokens. You may:
• use the NLTK package for tokenization, or
(Suggestion for Python users:
import nltk
from nltk.tokenize import word_tokenize).
• write your own code for splitting the document into tokens (words)
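For the second option in M1, a minimal sketch of a hand-written tokenizer is given here. It splits on the delimiters named in the nomenclature (white space, commas and full stops). The `documents` dictionary and its contents are hypothetical stand-ins for the 37 loaded reports, and lowercasing the tokens is an assumption that is not required by the assignment text.

```python
import re

def tokenize(text, lowercase=True):
    """Split text on white space, commas and full stops.

    Lowercasing is an assumption made here so that tokens later match a
    lowercase stop-word list; pass lowercase=False to keep the original case.
    """
    if lowercase:
        text = text.lower()
    # re.split leaves empty strings at the edges, so filter them out.
    return [tok for tok in re.split(r"[\s,.]+", text) if tok]

# Hypothetical stand-in for the loaded student reports.
documents = {
    "student_01": "The model works well. The data, however, is small.",
    "student_02": "Small data sets limit the model.",
}

# Store the tokens of every document, keyed by document name.
tokens_by_doc = {name: tokenize(text) for name, text in documents.items()}
```

Note that splitting on every full stop also breaks up abbreviations and decimal numbers, which a library tokenizer may handle differently.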
M2. Merge the tokens from all documents to create a master list of distinct tokens available across all documents. Let us call this the “token population”.
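One possible way to build the token population in M2 is to take the union of the token sets of all documents. The sketch below assumes the tokens are held in a dictionary named `tokens_by_doc` (a hypothetical name) and sorts the result so that the attribute order stays fixed in later steps.

```python
def build_token_population(tokens_by_doc):
    """Return the sorted list of distinct tokens across all documents."""
    population = set()
    for tokens in tokens_by_doc.values():
        population.update(tokens)
    # Sorting gives a stable order, which matters once vectors are built.
    return sorted(population)

# Hypothetical tokenized documents.
tokens_by_doc = {
    "student_01": ["the", "model", "works", "well", "the", "data"],
    "student_02": ["small", "data", "sets", "limit", "the", "model"],
}

token_population = build_token_population(tokens_by_doc)
```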
M3. Load the stop-words using the NLTK package. Study these stop-words. What do you think they represent?
(Suggestion for Python users: from nltk.corpus import stopwords).
M4. Create a “bag-of-words” from the “token population” by removing the stop-words.
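The filtering in M4 can be sketched as follows. To keep the example runnable without NLTK data, `stop_words` is a tiny hand-written stand-in and not the real NLTK list; in the assignment it would be loaded as suggested in M3. The names `build_bag_of_words` and `token_population` are illustrative only.

```python
# In the assignment the list would come from NLTK, for example:
#   from nltk.corpus import stopwords
#   stop_words = set(stopwords.words("english"))
# A small hand-written stand-in (an assumption) is used here instead.
stop_words = {"the", "is", "a", "of", "and"}

def build_bag_of_words(token_population, stop_words):
    """Keep the tokens of the population that are not stop-words."""
    return [tok for tok in token_population if tok not in stop_words]

# Hypothetical token population from M2.
token_population = ["data", "is", "limit", "model", "sets",
                    "small", "the", "well", "works"]

bag_of_words = build_bag_of_words(token_population, stop_words)
```

Holding the stop-words in a set keeps each membership check cheap, and the list comprehension preserves the order of the token population.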
M5. For each document / instance, create 2 feature vectors as follows:
o First vector: attributes indicate the presence / absence of each token from the bag-of-words in that document
o Second vector: attributes indicate the number of occurrences of each word from the bag-of-words in the document
o Do this for all documents. After this is done, you should have 37 vectors each of the first and second kind.
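The two vectors described in M5 could be built along these lines. This is a sketch under the assumption that each document is already a list of tokens and that the bag-of-words is an ordered list; `feature_vectors` is a hypothetical helper name.

```python
from collections import Counter

def feature_vectors(tokens, bag_of_words):
    """Return (binary_vector, count_vector) for one document."""
    counts = Counter(tokens)
    # Second vector: how often each bag-of-words entry occurs.
    count_vec = [counts[word] for word in bag_of_words]
    # First vector: 1 if the entry occurs at all, otherwise 0.
    binary_vec = [1 if c > 0 else 0 for c in count_vec]
    return binary_vec, count_vec

# Hypothetical inputs.
bag_of_words = ["data", "limit", "model", "small"]
tokens = ["the", "model", "and", "the", "data", "model"]

binary_vec, count_vec = feature_vectors(tokens, bag_of_words)
```

Tokens that are not in the bag-of-words (here the stop-words) are simply ignored, and both vectors have one attribute per bag-of-words entry.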
M6. Calculate the Jaccard Coefficient between the document vectors. Use the first vector of each document for this. The Jaccard coefficient between two binary vectors x and y is defined as:
JC = (f11) / (f01 + f10 + f11)
f11 = number of attributes where x was 1 and y was 1
f10 = number of attributes where x was 1 and y was 0
f01 = number of attributes where x was 0 and y was 1
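A direct translation of the JC formula into code might look like the sketch below. Returning 0.0 when both vectors are all zeros is an assumption, since the formula is undefined in that case; `jaccard_coefficient` is an illustrative name.

```python
def jaccard_coefficient(x, y):
    """Jaccard coefficient of two binary vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denominator = f01 + f10 + f11
    # Assumption: two all-zero vectors are given a coefficient of 0.0.
    return f11 / denominator if denominator else 0.0

# Hypothetical binary vectors of two documents.
x = [1, 0, 1, 0]
y = [1, 1, 0, 0]
jc = jaccard_coefficient(x, y)  # f11 = 1, f10 = 1, f01 = 1
```

Attributes where both vectors are 0 do not enter the formula, so words absent from both documents do not inflate the similarity.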
M7. Calculate the Cosine similarity between the documents by using the second feature vector for each document.
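The assignment text does not spell out the cosine formula; the sketch below uses the standard definition, the dot product of the two vectors divided by the product of their Euclidean norms. Returning 0.0 for a zero vector is an assumption, and `cosine_similarity` is an illustrative name.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity of two count vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumption: similarity with an all-zero vector is taken as 0.0.
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

# Hypothetical count vectors of two documents.
x = [1, 0, 2, 0]
y = [2, 0, 4, 0]
cs = cosine_similarity(x, y)
```

Because `y` is a scaled copy of `x`, the two vectors point in the same direction, so the similarity is 1 up to floating-point rounding regardless of the difference in document length.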