This project is to develop tools for performing basic text analysis tasks, such as processing text read from a text file

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Introduction

The goal of this project is to develop tools for performing basic text analysis tasks, such as processing text read from a text file, tokenizing the text, and performing word counts. You will be asked to write several functions to be used in performing these tasks. You will test your functions by performing text analysis on the novel “A Tale of Two Cities”.

General Instructions

Create a Python script file and a Jupyter notebook file within the same directory. The script file should be named word_count.py and the Jupyter notebook file named Project_02_YourLastName.ipynb. You will use the script to define your functions and the notebook will be used to load the script and to test your functions.

Please download the files tale_of_two_cities.txt and stopwords.txt, storing these in the same directory as your script file and notebook file. It is important that these files are all in the same directory. Otherwise, your code will not run correctly when I run it.

Instructions for the Script File

Define functions with the following names: process_word(), process_line(), process_file(), find_unique(), find_frequency(), most_common(), remove_stop(), count_by_length(), and count_by_first().

Descriptions of each of these functions are provided below.

process_word()

This function should accept a single parameter named word. This parameter is expected to contain a string representing a word. The function should remove any punctuation from the string and convert it to lowercase. This can be done by performing the following steps.

1. Store the string '.!?,"\'()*_:;0123456789' in a variable named remove. This string contains all of the characters to be removed from the beginning and end of word, if they are present.

2. Use the strip() method for strings to remove punctuation and digits from the beginning and end of word. Pass the method the string remove. Store the stripped string in a variable.

3. Use the replace() method on the string created in Step 2 to replace any single quote characters (likely representing apostrophes) with an empty string. That is, replace occurrences of "'" with "". Store the result.

4. Use the lower() method on the string created in Step 3 to convert it to lowercase. Store the result.

The function should return the string created in Step 4.
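The four steps above can be sketched as follows. This is one possible reading of the instructions, not a reference solution; the intermediate variable names stripped and no_apostrophes are illustrative choices that the assignment does not prescribe.

```python
def process_word(word):
    # Step 1: characters to strip from the beginning and end of the word.
    remove = '.!?,"\'()*_:;0123456789'
    # Step 2: strip() only removes characters at the two ends of the string.
    stripped = word.strip(remove)
    # Step 3: drop any apostrophes that remain inside the word.
    no_apostrophes = stripped.replace("'", "")
    # Step 4: convert to lowercase and return.
    return no_apostrophes.lower()


print(process_word('"Don\'t!"'))  # dont
```

Note that a string made up entirely of characters in remove, such as '1859', is reduced to an empty string.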

process_line()

This function should accept a single parameter named line. This parameter is expected to contain a string representing a line of text read from a file. The function should perform the following processing steps to the line:

1. Use the replace() method to replace any dash characters "-" with spaces, storing the result in a variable.

2. Apply the split() method to the string created in Step 1 to create a list of individual words contained within the string. Store the resulting list in a variable named words.

3. Loop over the elements of words. Apply the process_word() function to each string in this list. It is possible for the resulting processed word to be an empty string. If the processed word is not empty (in other words, if it has a length greater than 0), then store it in a list named processed_words.

The function should return the list processed_words.
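A minimal sketch of process_line() under the steps above. The compact process_word() is repeated here only so that the example runs on its own; the variable name no_dashes is an illustrative assumption.

```python
def process_word(word):
    # Compact version of process_word(), repeated so this example is standalone.
    remove = '.!?,"\'()*_:;0123456789'
    return word.strip(remove).replace("'", "").lower()


def process_line(line):
    # Step 1: turn dashes into spaces so "times--it" becomes two words.
    no_dashes = line.replace("-", " ")
    # Step 2: split on whitespace.
    words = no_dashes.split()
    # Step 3: keep only the words that are non-empty after processing.
    processed_words = []
    for word in words:
        processed = process_word(word)
        if len(processed) > 0:
            processed_words.append(processed)
    return processed_words


print(process_line("It was the best of times--it was the worst of times, 1859."))
# ['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times']
```

The token "1859." is dropped because process_word() reduces it to an empty string.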

process_file()

This function should accept a single parameter named path. This parameter is expected to contain a string representing the relative location of a text file. The function will create and return a list of processed words contained in the file by performing the following tasks.

1. Use with and open() to open the file whose location is stored in path. Use readlines() to read the contents of the file into a list. Each string in this list will represent an entire line of text from the file.

2. Create an empty list named words.

3. Loop over the list created in Step 1. Apply the process_line() function to each string in this list. The list of words returned by process_line() should be concatenated to the end of the list words. The combined list should be stored back into words. Recall that you can concatenate two lists using the + operator.

The function should return the list words.
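The tasks above might look like the sketch below. To keep the example runnable without tale_of_two_cities.txt, it writes a small temporary file (a hypothetical sample.txt) and processes that instead; the helper functions are repeated so the block stands alone.

```python
import os
import tempfile


def process_word(word):
    remove = '.!?,"\'()*_:;0123456789'
    return word.strip(remove).replace("'", "").lower()


def process_line(line):
    processed_words = []
    for word in line.replace("-", " ").split():
        processed = process_word(word)
        if len(processed) > 0:
            processed_words.append(processed)
    return processed_words


def process_file(path):
    # Step 1: read every line of the file into a list of strings.
    with open(path) as file:
        lines = file.readlines()
    # Step 2: start with an empty list of words.
    words = []
    # Step 3: concatenate the processed words from each line.
    for line in lines:
        words = words + process_line(line)
    return words


# Demonstration on a temporary two-line file (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w") as sample_file:
        sample_file.write("It was the best of times,\nit was the worst of times.\n")
    sample_words = process_file(sample_path)

print(sample_words)
```

Depending on the platform and how the text file was saved, open() may need an explicit encoding argument; the assignment does not specify one, so none is passed here.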

find_unique()

This function should accept a single parameter named words. This parameter is expected to contain a list of strings representing words. The function should create a list that contains exactly one copy of any string that appears in words.

1. Create an empty list to store the unique words.

2. Loop over the elements of words. If a particular element has not already been added to the list of unique words, then append it to that list. Do nothing if the element has already been added to the unique list.

The function should return the list of unique words.
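The two steps above can be sketched as follows; the list name unique is an illustrative choice.

```python
def find_unique(words):
    # Step 1: list that will hold one copy of each distinct word.
    unique = []
    # Step 2: append a word only the first time it is seen.
    for word in words:
        if word not in unique:
            unique.append(word)
    return unique


print(find_unique(["it", "was", "the", "best", "it", "was", "the", "worst"]))
# ['it', 'was', 'the', 'best', 'worst']
```

This approach keeps words in order of first appearance. The membership test on a list is a linear scan, so it can be slow on a full novel; it is shown here because it matches the steps as written.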

find_frequency()

This function should accept a single parameter named words. This parameter is expected to contain a list of strings representing words. The function should create a dictionary recording the number of times each individual word appears in words. Each dictionary key should be a string representing a word, and each value should be a count representing the number of times that string appeared in words.

1. Create an empty dictionary named freq_dict to store the counts.

2. Loop over the elements of words. If a particular element has already been added to freq_dict as a key then increment the value associated with that key. If the element does not appear as a key in freq_dict, then add it as a key with a value of 1.

The function should return the dictionary freq_dict.
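A minimal sketch of the counting loop described above:

```python
def find_frequency(words):
    # Step 1: dictionary mapping each word to its count.
    freq_dict = {}
    # Step 2: increment existing keys, create new keys with a count of 1.
    for word in words:
        if word in freq_dict:
            freq_dict[word] += 1
        else:
            freq_dict[word] = 1
    return freq_dict


print(find_frequency(["it", "was", "the", "best", "it", "was", "the", "worst"]))
# {'it': 2, 'was': 2, 'the': 2, 'best': 1, 'worst': 1}
```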

remove_stop()

Stop words are words that are removed from a collection of words when performing a text analysis. These are typically very common words such as “a” and “the”.

This function should accept two parameters named words and stop. Both parameters are expected to contain a list of strings representing words. The function should return a list obtained by removing from words any strings that also appear in stop.

1. Create an empty list to store the non-stop words.

2. Loop over the elements of words. If a particular element does not appear in stop, then add it to the list created in Step 1. If the element does appear in stop, then do nothing.

The function should return the list of non-stop words.
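The filtering described above might be written as in this sketch; the list name non_stop is an illustrative assumption.

```python
def remove_stop(words, stop):
    # Step 1: list that will hold the words that are not stop words.
    non_stop = []
    # Step 2: keep a word only if it does not appear in stop.
    for word in words:
        if word not in stop:
            non_stop.append(word)
    return non_stop


print(remove_stop(["it", "was", "the", "best", "of", "times"], ["it", "was", "the", "of"]))
# ['best', 'times']
```

Duplicates in words are preserved; only words that appear in stop are dropped.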

most_common()

This function should accept two parameters named freq_dict and n. The parameter freq_dict is expected to contain a dictionary recording word counts. The parameter n should be an integer. The function should find and display the n words with the highest frequency in freq_dict. One method of finding the words with the highest frequencies is described below.

1. Create an empty list named freq_list. This list will be used to store tuples created from key/value pairs found in freq_dict. These tuples will have the form (value, key).

2. Loop over freq_dict.items(). For each key/value pair in freq_dict, create a tuple of the form (value, key) and append this to freq_list. It is important that the value (i.e. word count) appears first in the tuple.

3. Use the sort() method to sort freq_list in descending order. This will sort the list of tuples according to the first element in each tuple, which represents the word count.

4. Print out the first n results from freq_list in the format shown below. The xxxx symbols should be replaced with words and the #### symbols should be replaced with word counts. The dashed line should be 16 characters long. Allot 12 characters for the word column and 4 characters for the count column. The word column should be left-aligned and the count column should be right-aligned. The desired alignments can be obtained using f-strings.

Word       Count
----------------
xxxx        ####
xxxx        ####
xxxx        ####

This function should not return any value.
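One way the steps above could be implemented is sketched below. The row format follows the stated widths (12 characters left-aligned, 4 characters right-aligned). The header widths of 11 and 5 are an assumption made so that the five-letter label "Count" fits within the 16-character line.

```python
def most_common(freq_dict, n):
    # Steps 1 and 2: build (count, word) tuples so sorting uses the count first.
    freq_list = []
    for word, count in freq_dict.items():
        freq_list.append((count, word))
    # Step 3: sort in descending order of count.
    freq_list.sort(reverse=True)
    # Step 4: print the header, a 16-character dashed line, and the top n rows.
    print(f"{'Word':<11}{'Count':>5}")  # header widths are an assumption
    print("-" * 16)
    for count, word in freq_list[:n]:
        print(f"{word:<12}{count:>4}")


most_common({"the": 5, "tale": 1, "of": 3, "two": 1}, 2)
```

Because whole tuples are compared, words with equal counts appear in reverse alphabetical order. The function prints its results and returns None.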

count_by_length()

This function should accept a parameter named words, which is expected to contain a list of strings representing words. The function should determine the number of strings in words of each possible length and display the resulting counts.

1. Create an empty dictionary named count_dict.

2. Loop over the elements of words. For each element of words, calculate the length of the element, storing the result in a variable. If this length has been previously added as a key in count_dict, then increment the value corresponding to that key. If the length does not appear as a key in count_dict, then add it as a key with a value of 1.

3. Create an empty list named count_list. Loop over count_dict.items(). For each key/value pair in count_dict, create a tuple of the form (key, value) and append this to count_list.

4. Sort count_list in descending order. Note that this will sort the list of tuples according to the first element in each tuple, which represents a specific word length.

5. Print the results in count_list in the format shown below. The xxxx symbols should be replaced with word lengths and the #### symbols should be replaced with word counts. The dashed line should be 16 characters long. Allot 12 characters for the length column and 4 characters for the count column. The length column should be left-aligned and the count column should be right-aligned. The desired alignments can be obtained using f-strings.

Length     Count
----------------
xxxx        ####
xxxx        ####
xxxx        ####

This function should not return any value. 
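A minimal sketch of count_by_length(), following the steps above and sorting in descending order of length as stated. As with the other tables, the header widths of 11 and 5 are an assumption chosen so that the header fits within the 16-character line.

```python
def count_by_length(words):
    # Steps 1 and 2: count how many words have each length.
    count_dict = {}
    for word in words:
        length = len(word)
        if length in count_dict:
            count_dict[length] += 1
        else:
            count_dict[length] = 1
    # Step 3: convert to (length, count) tuples.
    count_list = []
    for length, count in count_dict.items():
        count_list.append((length, count))
    # Step 4: sort by length, longest first.
    count_list.sort(reverse=True)
    # Step 5: print the table.
    print(f"{'Length':<11}{'Count':>5}")  # header widths are an assumption
    print("-" * 16)
    for length, count in count_list:
        print(f"{length:<12}{count:>4}")


count_by_length(["a", "tale", "of", "two", "cities", "it", "was"])
```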

count_by_first()

This function should accept a parameter named words, which is expected to contain a list of strings representing words. The function should determine the number of strings in words with each possible starting letter and display the results.

The steps performed by this function are very similar to those described in the count_by_length() function. The main difference is that you will be using the first characters of strings in words as keys in count_dict rather than the length of the string. Note that if my_string is a string, then you can access the first character of my_string using my_string[0].

Print the results in the format shown below. The x symbols should be replaced with letters and the #### symbols should be replaced with word counts. The dashed line should be 16 characters long. Allot 12 characters for the letter column and 4 characters for the count column. The letter column should be left-aligned and the count column should be right-aligned. The desired alignments can be obtained using f-strings.

The rows in your output should be arranged so that the letter column is in increasing order from a to z.

Letter     Count
----------------
x           ####
x           ####
x           ####

This function should not return any value.
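A sketch of count_by_first(), mirroring the count_by_length() steps but keyed on the first character and sorted in increasing order. It assumes every string in words is non-empty, which holds for lists produced by process_line(). The header widths of 11 and 5 are again an assumption.

```python
def count_by_first(words):
    # Count how many words start with each character.
    count_dict = {}
    for word in words:
        first = word[0]  # assumes word is a non-empty string
        if first in count_dict:
            count_dict[first] += 1
        else:
            count_dict[first] = 1
    # Convert to (letter, count) tuples and sort from a to z.
    count_list = []
    for letter, count in count_dict.items():
        count_list.append((letter, count))
    count_list.sort()
    # Print the table.
    print(f"{'Letter':<11}{'Count':>5}")  # header widths are an assumption
    print("-" * 16)
    for letter, count in count_list:
        print(f"{letter:<12}{count:>4}")


count_by_first(["a", "tale", "of", "two", "cities"])
```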
