Instructions
The purpose is to use the provided text file to create the table shown below.
Section 1) Split the large string on a pattern
Section 2) Extract the dates
Section 3) Create a tibble
The assignment should take 15 to 30 minutes.
## # A tibble: 131 x 2
## date text
## <date> <chr>
## 1 2015-02-01 "MSNBC Febru~
## 2 2015-02-01 "MSNBC Febru~
## 3 2015-02-02 "MSNBC Febru~
## # ... with 128 more rows
First, download msnbc_text.TXT and load it with the readr package. Save it to a single string variable called text.
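A minimal sketch of the loading step. So that it runs without the real file, it writes a small stand-in file (the name sample_path and its contents are hypothetical) and reads it back with readr::read_file(), which returns the whole file as one string.

```r
library(readr)

# Hypothetical stand-in for msnbc_text.TXT so the sketch runs on its own.
sample_path <- tempfile(fileext = ".TXT")
write_file("1 of 2 DOCUMENTS\nfirst\n2 of 2 DOCUMENTS\nsecond\n", sample_path)

# read_file() reads the entire file into a single string.
text <- read_file(sample_path)

length(text)        # number of elements in the result
is.character(text)  # whether the result is a character vector
```

For the assignment itself, pass the path to the downloaded msnbc_text.TXT instead of sample_path.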
1) Split the string based on a pattern
Right now, the text variable should be a single giant string that contains multiple documents.
If you open msnbc_text.TXT, notice how each document starts with something like 1 of 131 DOCUMENTS, 2 of 131 DOCUMENTS, and so on. This is the pattern that separates the documents in the file.
Instead of one big string, split the string (which should be in a variable called text at this point) on the pattern that separates each document and save it as a character vector.
You can do this by writing a regular expression that captures this pattern and then using str_split(text, pattern) %>% unlist() to split the single string you read in with readr::read_file() into separate documents.
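One way the split could look, shown on a tiny made-up string rather than the real file. The names doc_pattern and docs are illustrative, and the regular expression is one reasonable reading of the "N of M DOCUMENTS" separator, not the only valid one.

```r
library(stringr)

# Hypothetical miniature of the file contents (the real text comes from read_file()).
text <- "\r\n1 of 2 DOCUMENTS\nMSNBC\nfirst body\n\n2 of 2 DOCUMENTS\nMSNBC\nsecond body\n"

# One or more digits, " of ", one or more digits, then " DOCUMENTS".
doc_pattern <- "\\d+ of \\d+ DOCUMENTS"

# str_split() returns a list; unlist() flattens it into a character vector.
# This is equivalent to the piped form str_split(text, doc_pattern) %>% unlist().
docs <- unlist(str_split(text, doc_pattern))

length(docs)  # one more than the number of separators found
```

Because the string begins with a separator (after some leading whitespace), the split yields one extra leading element, which is why the count comes out one higher than the number of documents.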
Check the length of your new character vector (make sure you have a character vector and not a list). You should have 132 items in your vector, which is strange because we have 131 documents. If you did this correctly, R will have created a string containing only whitespace as the first element (line-break characters such as "\r" and "\n" are whitespace characters); check to make sure this is the case. If not, you did something wrong. If so, subset the vector so that it includes only elements 2 onward and save it back into the variable text.
Lastly, trim whitespace from both sides of each document in the vector.
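The check, subset, and trim steps might be sketched as follows. The starting vector is a hypothetical stand-in for the result of the split, and first_is_blank is an illustrative name.

```r
library(stringr)

# Hypothetical result of the split: a whitespace-only first element, then the documents.
text <- c("\r\n", "\nMSNBC\nfirst body\n\n", "\nMSNBC\nsecond body\n")

# Confirm the first element contains nothing but whitespace.
first_is_blank <- str_trim(text[1]) == ""

# Drop the first element, then trim both ends of every remaining document.
text <- text[-1]
text <- str_trim(text, side = "both")

length(text)  # number of documents left after dropping the blank element
```

With the real file, length(text) should go from 132 to 131 after the subset.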
2) Extract the dates
You should notice another pattern in the text of each document: the date appears near the top in a consistent format. Use this pattern to extract the date from each document and save the result in a variable called dates.
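A hedged sketch of the date extraction. It assumes each document carries its date as a full English month name, a day, and a four-digit year (for example "February 1, 2015"); that layout is a guess based on the table preview, so check it against the actual file. The sample documents and the names date_pattern and date_strings are hypothetical.

```r
library(stringr)

# Hypothetical trimmed documents; the "Month D, YYYY" layout is an assumption.
text <- c("MSNBC\n\nFebruary 1, 2015 Sunday\n\nfirst body",
          "MSNBC\n\nFebruary 2, 2015 Monday\n\nsecond body")

# Full month name, a space, a 1-2 digit day, a comma, a space, and a 4-digit year.
date_pattern <- paste0("(", paste(month.name, collapse = "|"), ") \\d{1,2}, \\d{4}")

# str_extract() returns the first match in each document (NA if there is none).
date_strings <- str_extract(text, date_pattern)

# Parse the strings into Date objects; %B matches full month names in an English locale.
dates <- as.Date(date_strings, format = "%B %d, %Y")
```

Taking only the first match matters because later dates can appear in the body of a document. If any element of dates is NA, the pattern did not match that document and needs adjusting.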
3) Create a table with the data
Create a table with these variables in order (date, text) and call it df. Each document’s data should be one row in the table.
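Assembling the table could look like the sketch below. The two short vectors are hypothetical placeholders for the dates and text produced in the earlier sections.

```r
library(tibble)

# Hypothetical placeholders for the vectors built in sections 1 and 2.
dates <- as.Date(c("2015-02-01", "2015-02-02"))
text <- c("MSNBC February 1, 2015 ...", "MSNBC February 2, 2015 ...")

# Column order follows the argument order: date first, then text.
df <- tibble(date = dates, text = text)

df
```

Both vectors must have the same length, one element per document, so that each document ends up as one row.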