INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS
Tasks:
- Write a program that preprocesses the collection. This preprocessing stage should specifically include a function that tokenizes the text. In doing so, tokenize on whitespace and remove […]. For this task, please use your own implementation of a tokenizer.
- Determine the frequency of occurrence of all the words in the collection. Answer the following questions:
  a. What is the total number of words in the collection?
  b. What is the vocabulary size (i.e., the number of unique terms)?
  c. What are the top 20 words in the ranking (i.e., the words with the highest frequencies)?
  d. From these top 20 words, which ones are stop-words?
  e. What is the minimum number of unique words accounting for 15% of the total number of words in the collection?
Example: if the total number of words in the collection is 100, and we have the following word-frequency pairs:

| Word | tf |
|--------|----|
| the | 20 |
| of | 10 |
| a | 10 |
| data | 8 |
| mining | 7 |
| … | … |

the answer to this question will be 1 (1 word accounts for 15% of the total 100 words).
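The tokenizing and counting steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not a reference solution: the function names `tokenize` and `min_words_for_share` and the `strip_chars` parameter are hypothetical, and `strip_chars` defaults to removing nothing because the task text does not say what should be removed. The coverage check uses integer arithmetic so that a share of exactly 15% is not lost to floating-point rounding.

```python
from collections import Counter


def tokenize(text, strip_chars=""):
    """Split text on whitespace by scanning characters (no str.split, no regex).

    strip_chars is a hypothetical knob: characters trimmed from both ends of
    each token. It is empty by default because the task text does not state
    what should be removed.
    """
    tokens, current = [], []
    for ch in text:
        if ch.isspace():
            if current:  # a whitespace character closes the current token
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # flush the last token when the text does not end in whitespace
        tokens.append("".join(current))
    if strip_chars:
        tokens = [t.strip(strip_chars) for t in tokens]
        tokens = [t for t in tokens if t]  # drop tokens that became empty
    return tokens


def min_words_for_share(freqs, percent, total=None):
    """Smallest number of top-ranked words whose counts cover percent% of total."""
    if total is None:
        total = sum(freqs.values())
    covered = 0
    for rank, count in enumerate(sorted(freqs.values(), reverse=True), start=1):
        covered += count
        if covered * 100 >= percent * total:  # integer comparison, no floats
            return rank
    return None  # the listed words never reach the requested share


# Questions a.-c. on a toy string: total words, vocabulary size, top words.
freqs = Counter(tokenize("the data of the mining  data the\n"))
print(sum(freqs.values()), len(freqs), freqs.most_common(2))

# Question e. on the word-frequency pairs from the example (total given as 100).
example = {"the": 20, "of": 10, "a": 10, "data": 8, "mining": 7}
print(min_words_for_share(example, 15, total=100))
```

Passing `total` separately lets the example be checked even though its table is elided after the fifth row; for a real collection the total can simply be left to default to the sum of all counts.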
- Integrate the Porter stemmer and a stop-word eliminator into your code. Answer questions a.-e. from the previous point again. (Links to a Java Porter stemmer implementation and to a stop-word list are given below.)
https://www.dropbox.com/s/rexuzz3j56vi4bt/Porter.java
https://www.dropbox.com/s/5789sj8v07j2id0/stopwords.txt
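One way to wire stop-word elimination and stemming into the counting step is sketched below in Python. Several things here are assumptions rather than part of the task: the stop-word file is taken to hold one word per line, tokens are lowercased before comparison, and the stemmer is passed in as a plain function. The `toy_stem` helper is a hypothetical placeholder that only strips a trailing "s"; it is not the Porter algorithm and would be replaced by a real Porter stemmer.

```python
from collections import Counter


def load_stopwords(path):
    """Read one stop-word per line (assumed file layout), lowercased."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def preprocess(tokens, stopwords, stem):
    """Lowercase each token, drop stop-words, then stem what is left."""
    lowered = (t.lower() for t in tokens)
    return [stem(t) for t in lowered if t not in stopwords]


def toy_stem(word):
    """Placeholder only: strips a trailing 's'. NOT the Porter algorithm."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


tokens = ["The", "miners", "of", "data", "mines"]
terms = preprocess(tokens, stopwords={"the", "of", "a"}, stem=toy_stem)
print(Counter(terms))
```

Filtering before stemming is a design choice in this sketch: stop-word lists usually contain unstemmed forms, so comparing the raw lowercased token avoids missing a stop-word whose stem differs from its listed form. The resulting `terms` list can then be counted exactly as before to answer questions a.-e. again.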