INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Web page mining

Assignment Specification

Description: This program will extract data from a web page and perform some analysis.

Input: No user-provided input. Data will be collected from any news website.

Output: 

Print the headlines

Generate a wordcloud for the words/bigrams in the headlines

Calculate the sentiment

See details in the Procedure.

Procedure:

1. Import the needed libraries 

These are some of the libraries you will likely need:

# import the libraries

import bs4 as bs

import requests

import matplotlib.pyplot as plt  # needed for plt.imshow / plt.show in step 11

from wordcloud import WordCloud

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

2. Define the target URL and open it

# defining the target source (fill in the address of the news page you chose)

url = ''

# getting the content

body = requests.get(url)

3. Load the page into your “soup” (assuming you are using BeautifulSoup)

soup = bs.BeautifulSoup(body.content,'html.parser')

4. Create an empty list to host the list of words from the headlines

5. Loop through the “soup”, looking for the section with the headlines

6. Transform each story heading/headline first into a string and then into a list of words
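Steps 4 to 6 could be sketched roughly as follows. This is only an illustration: the sample markup, the `h2` tag and the `headline` class are hypothetical stand-ins, because every news site structures its headlines differently. Inspect the page you chose and adjust the tag and class accordingly.

```python
import bs4 as bs

# Hypothetical markup standing in for the downloaded page (body.content).
SAMPLE_HTML = """
<html><body>
  <h2 class="headline">Council approves new park budget</h2>
  <h2 class="headline">Storm damages homes, roads</h2>
  <p>Not a headline</p>
</body></html>
"""


def extract_headlines(html, tag="h2", css_class="headline"):
    """Return the text of every element matching the (assumed) headline tag/class."""
    soup = bs.BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=css_class)]


headlines = extract_headlines(SAMPLE_HTML)

# Step 4: empty list that will host the words of each headline
headline_words = []

# Steps 5-6: each headline becomes a string first, then a list of words
for headline in headlines:
    headline_str = str(headline)
    headline_words.append(headline_str.lower().split())

for headline in headlines:
    print(headline)
```

At this point the words still carry punctuation (for example `homes,`); that is dealt with in the cleaning step.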

7. Remove from the list you created all the non-semantically relevant words (the “stopwords”), using the attached file “stopwords_en.txt” as the list of stopwords. Feel free to extend the list with words that are too frequent and, in your opinion, not relevant enough (explain why you want to remove them). Filter out non-alphabetical elements and perform any other preliminary cleaning the text may require.

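# stopwords file is attached; loading it (whitespace-separated words are assumed)

with open('stopwords_en.txt') as f:
    stopwords = f.read().split()

# adding words to the stopwords

stopwords.extend(['dallas', 'texas', 'city'])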
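The cleaning itself might look like the sketch below. The function name `clean_headline` and the short stopword list are made up for the example; in the assignment the list would come from “stopwords_en.txt”. The choice to strip surrounding punctuation before testing `isalpha()` is one possible reading of “non-alphabetical elements”, not the only one.

```python
import string


def clean_headline(headline, stopwords):
    """Lowercase a headline and keep only alphabetical, non-stopword tokens."""
    stop = set(stopwords)  # set lookup is faster than scanning a list
    cleaned = []
    for raw in headline.lower().split():
        token = raw.strip(string.punctuation)  # drop leading/trailing punctuation
        if token.isalpha() and token not in stop:
            cleaned.append(token)
    return cleaned


# Hypothetical stopword list, standing in for the attached file
demo_stopwords = ["in", "the", "dallas", "city"]

demo_clean = clean_headline("Dallas City Council approves $5M budget in 2024!", demo_stopwords)
print(demo_clean)
```

Tokens with inner punctuation, such as `don't` or `co-op`, fail `isalpha()` and are dropped under this rule; whether that is acceptable depends on your headlines.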

8. Looping over the list of clean headlines, print the headlines with the highest and lowest sentiment (3 each)

This is just an example of how to do the sentiment analysis (Pro_words stands for a list of cleaned words; replace it with your own list)

##### Sentiment Analysis #####

# calculating the sentiment using vader library

analyzer = SentimentIntensityAnalyzer()

# vader needs strings as input. Transforming the list into string

clean_text_str_pro = ' '.join(Pro_words)

vad_sentiment = analyzer.polarity_scores(clean_text_str_pro)

pos_pro = vad_sentiment['pos']

neg_pro = vad_sentiment['neg']

neu_pro = vad_sentiment['neu']
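To pick the 3 highest and 3 lowest headlines, each headline needs a single score. The sketch below takes the scoring function as a parameter, so it runs here with a toy word-score table (entirely made up for the example). With VADER, one option is to pass `lambda h: analyzer.polarity_scores(h)['compound']`; using the `compound` value as the ranking key is an assumption, since the assignment does not say which score to rank on.

```python
def rank_by_sentiment(headlines, score_fn, n=3):
    """Return (highest, lowest): the n best- and n worst-scoring headlines."""
    ordered = sorted(headlines, key=score_fn)  # ascending by score
    lowest = ordered[:n]
    highest = ordered[::-1][:n]                # best first
    return highest, lowest


# Toy scorer so the sketch runs without VADER installed (hypothetical values)
TOY_SCORES = {"win": 1.0, "great": 0.8, "crash": -0.9, "fear": -0.6}


def toy_score(headline):
    return sum(TOY_SCORES.get(word, 0.0) for word in headline.lower().split())


sample = [
    "team win great final",
    "market crash fear",
    "council meets monday",
    "great weather ahead",
    "fear over outage",
]

highest, lowest = rank_by_sentiment(sample, toy_score, n=2)
print("Highest:", highest)
print("Lowest:", lowest)
```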

9. Extract the bigrams, generating a separate list. Consider as bigrams any 2 words appearing together more than 2 times in the whole text. Bigrams will look like “word1_word2”, meaning you create a new string composed of the 2 words, separated by an underscore (“_”)

10. Merge the list of single words with the list of bigrams
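Steps 9 and 10 might be sketched as below. Two choices here are assumptions rather than requirements: pairs are counted only within a headline (never across the boundary between two headlines), and each qualifying bigram is repeated once per occurrence so that its frequency carries over to the wordcloud. The sample data is invented.

```python
from collections import Counter


def frequent_bigrams(token_lists, min_count=3):
    """Build 'word1_word2' strings for adjacent pairs seen at least min_count times.

    min_count=3 corresponds to "more than 2 times".
    """
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))  # adjacent pairs in one headline
    bigrams = []
    for (first, second), count in counts.items():
        if count >= min_count:
            bigrams.extend([f"{first}_{second}"] * count)
    return bigrams


# Hypothetical cleaned headlines (one list of words per headline)
clean_headlines = [
    ["city", "council", "votes"],
    ["city", "council", "meets"],
    ["city", "council", "budget"],
    ["city", "hall", "closed"],
]

# Step 9: separate list of bigrams
bigram_list = frequent_bigrams(clean_headlines)

# Step 10: merge single words and bigrams
single_words = [word for tokens in clean_headlines for word in tokens]
all_terms = single_words + bigram_list
print(all_terms)
```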

11. Create a wordcloud with the resulting list. If wordcloud is not available on your computer, either use an online option (see previous assignments) or calculate the sentiment as in previous assignments

This is an example of how to do the Word Cloud

##### Word cloud #####

print('\n\n--- Generating the wordcloud')

# Transforming the lists of words into strings

Pro_words_string = ' '.join(Pro_words)

# Defining the wordcloud parameters

wc = WordCloud(background_color = "white", max_words = 2000, stopwords = stopwords)

# Generating the cloud from the string

wc.generate(Pro_words_string)

# Store to file

#wc.to_file('Pro.png')

# Show the cloud

plt.imshow(wc)

plt.axis('off')

plt.show()

12. Submit the py file
