Big Data
Text Analytics in R
A Scandal in Bohemia by Arthur Conan Doyle.
The problem is to process a large document and analyze it.
1. Create a VCorpus after you separate the chapters. You should have three documents in your corpus.
a. Prior to removing the punctuation, find the 10 longest words and the 10 longest sentences in each chapter. Prepare a table of this data and show the items themselves.
b. Read Introduction to Text Analytics.docx and apply the methods in that document to the documents in your VCorpus. Also read text_analysis_in_R.pdf.
Show all your work, with an explanation of what you are doing. Tables are a good way to show your work compactly.
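As a starting point for step 1 and part (a), the sketch below splits a text into chapters and lists the longest words and sentences using base R only. It rests on several assumptions that are not part of the assignment: chapter headings are lines containing only "I.", "II." or "III."; sentence length is measured in words; sentences are split with a rough regular expression rather than a proper sentence tokenizer; and the names split_chapters, longest_words and longest_sentences are hypothetical helpers. The sample lines are placeholders, not text from the story.

```r
# Placeholder lines standing in for the story; in practice, read your own
# plain-text copy with readLines().
raw <- c(
  "I.",
  "A short line. This placeholder sentence is considerably longer than the first one!",
  "II.",
  "Second chapter text here.",
  "Is it extraordinarily brief?",
  "III.",
  "Third chapter placeholder text."
)

# Assumption: a chapter heading is a line holding only I., II. or III.
split_chapters <- function(lines) {
  is_heading <- grepl("^[[:space:]]*(I|II|III)\\.[[:space:]]*$", lines)
  chapter_id <- cumsum(is_heading)
  keep <- !is_heading & chapter_id > 0
  # Paste the lines of each chapter into one string per chapter.
  vapply(split(lines[keep], chapter_id[keep]), paste, character(1), collapse = " ")
}

# Longest distinct words, computed before any punctuation is removed.
longest_words <- function(text, n = 10) {
  words <- unique(tolower(unlist(strsplit(text, "[^[:alpha:]'-]+"))))
  words <- words[nzchar(words)]
  words <- words[order(-nchar(words), words)]
  head(data.frame(word = words, length = nchar(words), stringsAsFactors = FALSE), n)
}

# Longest sentences by word count; the split on ., ! or ? is only approximate.
longest_sentences <- function(text, n = 10) {
  sents <- trimws(unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)))
  sents <- sents[nzchar(sents)]
  n_words <- lengths(strsplit(sents, "\\s+"))
  ord <- order(-n_words)
  head(data.frame(sentence = sents[ord], words = n_words[ord],
                  stringsAsFactors = FALSE), n)
}

chapters <- split_chapters(raw)
word_tables <- lapply(chapters, longest_words)
sentence_tables <- lapply(chapters, longest_sentences)
print(word_tables[["2"]])

# If the tm package is installed, the chapters can be wrapped in a VCorpus.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(chapters))
  print(length(corpus))
}
```

Abbreviations such as "Mr." will split sentences in the wrong place with this regular expression, so a dedicated sentence tokenizer is likely a better choice for the final tables.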
After removing the stop words and punctuation:
c. If the dendrogram is unreadable, remove words of length 2, 3, 4, etc. to see how the dendrogram is affected. Show the dendrogram each time.
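One way to approach part (c) is sketched below in base R: build a term-by-chapter count matrix, drop words below a minimum length, and redraw the dendrogram for each threshold. The three short documents are placeholders for the cleaned chapters, term_matrix is a hypothetical helper, and Ward clustering on Euclidean distances is just one reasonable choice. With a tm corpus, the wordLengths control option of DocumentTermMatrix serves a similar filtering purpose.

```r
# Placeholder documents standing in for the three cleaned chapters.
docs <- c(
  ch1 = "king photograph woman king letter mask",
  ch2 = "church wedding witness photograph street",
  ch3 = "letter woman photograph king farewell"
)

# Term-by-document count matrix, keeping words of at least min_len characters.
term_matrix <- function(docs, min_len = 1) {
  tokens <- strsplit(tolower(docs), "[^a-z]+")
  tokens <- lapply(tokens, function(w) w[nchar(w) >= min_len])
  vocab <- sort(unique(unlist(tokens)))
  counts <- vapply(
    tokens,
    function(w) as.numeric(table(factor(w, levels = vocab))),
    numeric(length(vocab))
  )
  matrix(counts, nrow = length(vocab), dimnames = list(vocab, names(docs)))
}

# Redraw the dendrogram of terms as progressively longer words are required.
for (min_len in c(1, 6, 7)) {
  m <- term_matrix(docs, min_len)
  hc <- hclust(dist(m), method = "ward.D2")
  plot(hc, main = paste("Words of length >=", min_len), xlab = "", sub = "")
}
```

Each pass leaves fewer terms on the tree, which is what makes the labels readable on a full chapter.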
For the following you will need to write R functions to help you compute the results.
Use the packages textreuse, wordnet, and zipfR.
d. Use openNLP to mark the parts of speech in the 10 longest sentences found in part a, keeping the nouns and verbs with a length of 5 or greater. Prepare a table with this data.
e. Analyze word frequency using functions from package zipfR. Prepare a table with this data.
f. Generate bigrams and trigrams for all words whose length is greater than 6 characters in the 10 longest sentences of each chapter. Prepare a table with this data.
g. Perform the Word Cloud and Sentiment Analysis procedures described in Introduction to Text Analytics.
h. Install the packages stringi and quanteda and select three additional methods from each (not the ones used in (g)), apply them, and show the results.
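For part (f), a small helper function along the following lines may be enough. It is a sketch under stated assumptions: "length greater than 6" is read as 7 or more characters, short words are removed first and the n-grams are then formed from the remaining words in order, and the name ngrams and the sample sentence are hypothetical. The textreuse package also ships tokenizers such as tokenize_ngrams(), which may suit the final write-up better.

```r
# Build n-grams from the words of one sentence that have at least
# min_chars characters (7 by default, i.e. longer than 6).
ngrams <- function(sentence, n, min_chars = 7) {
  words <- unlist(strsplit(tolower(sentence), "[^a-z']+"))
  words <- words[nchar(words) >= min_chars]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Placeholder sentence, not a quotation from the story.
s <- "A certain photograph was hidden somewhere inside the drawing room"

bigrams <- ngrams(s, 2)
trigrams <- ngrams(s, 3)

# One table holding both kinds of n-gram.
ngram_table <- data.frame(
  n = rep(c(2, 3), c(length(bigrams), length(trigrams))),
  ngram = c(bigrams, trigrams),
  stringsAsFactors = FALSE
)
print(ngram_table)
```

Applying the function with lapply() over the 10 longest sentences of each chapter gives one table per chapter.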
Describe the methods you use, the results you get, and what you understand about the theme of the book.
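For the word-frequency analysis in part (e), the sketch below builds a ranked frequency table and a frequency spectrum in base R, then hands the same counts to zipfR if that package is installed. The token vector is a placeholder for the cleaned words of a chapter, and the variable names are hypothetical; which zipfR functions to go on to (for example model fitting) is left open.

```r
# Placeholder tokens standing in for the cleaned words of one chapter.
words <- c("king", "photograph", "king", "woman", "letter", "king", "woman", "mask")

# Ranked word-frequency table.
freq <- sort(table(words), decreasing = TRUE)
freq_table <- data.frame(
  rank = seq_along(freq),
  word = names(freq),
  f = as.integer(freq),
  stringsAsFactors = FALSE
)
print(freq_table)

# Frequency spectrum: how many distinct words occur exactly m times.
spectrum <- table(freq_table$f)
print(spectrum)

# If zipfR is available, wrap the counts in its type-frequency-list and
# frequency-spectrum objects for further analysis.
if (requireNamespace("zipfR", quietly = TRUE)) {
  story_tfl <- zipfR::tfl(f = freq_table$f, type = freq_table$word)
  story_spc <- zipfR::tfl2spc(story_tfl)
  print(story_spc)
  cat("Tokens N =", zipfR::N(story_spc), " Types V =", zipfR::V(story_spc), "\n")
}
```

On a real chapter, most words fall into the first rows of the spectrum (words seen once or twice), which is the pattern the zipfR tools are built to describe.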