Description: This program will extract headlines from a news web page and perform some text analysis on them.
Input: No user-provided input. Data will be collected from a news website of your choice.
Output:
Print the headlines
Generate a wordcloud for the words/bigrams in the headlines
Calculate the sentiment
See details in the Procedure.
Procedure:
1. Import the needed libraries
These are some of the libraries you will likely need:
# import the libraries
import bs4 as bs
import requests
import matplotlib.pyplot as plt   # needed to display the wordcloud (step 11)
from wordcloud import WordCloud
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
2. Define the target URL and open it
# defining the target source
url = 'https://www.example-news-site.com'   # replace with the news site of your choice
# getting the content
body = requests.get(url)
3. Load the page into your “soup” (assuming you are using BeautifulSoup)
soup = bs.BeautifulSoup(body.content,'html.parser')
4. Create an empty list to host the list of words from the headlines
5. Loop through the “soup”, looking for the section that contains the headlines
6. Transform each story heading/headline into a string first and then into a list of words (see the example below)
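This is just a sketch of how steps 4–6 might look; the 'h3' tag is only an assumption, so inspect your target site to find the right tag or class for the headlines
# create an empty list to host the words from the headlines
headline_words = []
headlines = []

# loop through the soup, looking for the headline tags
# NOTE: 'h3' is only an assumption; check the HTML of the site you chose
for tag in soup.find_all('h3'):
    headline = tag.get_text().strip()
    if headline:
        headlines.append(headline)
        # transform the headline into a string first and then into a list of words
        headline_words.extend(headline.lower().split())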
7. Remove all the non-semantically relevant words (the “stopwords”) from the list you created, using the attached file “stopwords_en.txt” as the list of stopwords. Feel free to update the list, adding words that may be too frequent and, in your opinion, not very relevant (explaining why you want to remove them). Also filter out non-alphabetical elements and perform any other preliminary cleaning of the text that you may require
# stopwords file is attached
with open('stopwords_en.txt') as f:
    stopwords = f.read().split()

# adding words to the stopwords
stopwords.extend(['dallas', 'texas', 'city'])
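This is just an example of how the cleaning could look, assuming the word list from steps 4–6 is called headline_words (an assumption)
# keep only alphabetical words that are not in the stopword list
clean_words = [w for w in headline_words if w.isalpha() and w not in stopwords]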
8. Loop through the list of clean headlines and print the headlines with the highest and the lowest sentiment (3 each)
This is just an example of how to do the Sentiment Analysis
##### Sentiment Analysis #####
# calculating the sentiment using the vader library
analyzer = SentimentIntensityAnalyzer()
# vader needs strings as input. Transforming the list into a string
clean_text_str_pro = ' '.join(Pro_words)
vad_sentiment = analyzer.polarity_scores(clean_text_str_pro)
pos_pro = vad_sentiment['pos']
neg_pro = vad_sentiment['neg']
neu_pro = vad_sentiment['neu']
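The example above only computes the overall scores. This is just a sketch of how you might rank the individual headlines for step 8, assuming the headlines are stored in a list called headlines (an assumption); vader's compound score is used as the overall measure
# score each headline and sort by the compound (overall) score
scored = [(analyzer.polarity_scores(h)['compound'], h) for h in headlines]
scored.sort(reverse=True)

print('--- 3 headlines with the highest sentiment ---')
for score, headline in scored[:3]:
    print(f'{score:+.3f}  {headline}')

print('--- 3 headlines with the lowest sentiment ---')
for score, headline in scored[-3:]:
    print(f'{score:+.3f}  {headline}')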
9. Extract the bigrams, generating a separate list. Consider as bigrams two words that appear together more than 2 times in the whole text. Bigrams will look like “word1_word2”, i.e. you will create a new string composed of the 2 words, separated by an underscore (“_”)
10. Merge the list of single words with the list of bigrams
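This is just an example of how steps 9 and 10 could be done, assuming the cleaned word list is called clean_words (an assumption)
from collections import Counter

# count how often each pair of consecutive words appears in the whole text
pair_counts = Counter(zip(clean_words, clean_words[1:]))

# keep only the pairs appearing more than 2 times, joined by an underscore
bigrams = [f'{w1}_{w2}' for (w1, w2), count in pair_counts.items() if count > 2]

# merge the list of single words with the list of bigrams
all_words = clean_words + bigrams
The merged list (here all_words) is what you would then feed into the wordcloud in step 11, e.g. as ' '.join(all_words)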
11. Create a wordcloud with the resulting list. If wordcloud is not available on your computer, either use an online option (see previous assignments) or calculate the sentiment as in previous assignments
This is an example of how to do the Word Cloud
##### Word cloud #####
print('\n\n--- Generating the wordcloud')

# Transforming the list of words into a string
Pro_words_string = ' '.join(Pro_words)

# Defining the wordcloud parameters
wc = WordCloud(background_color="white", max_words=2000, stopwords=stopwords)

# Generating the wordcloud from the string
wc.generate(Pro_words_string)

# Store to file
#wc.to_file('Pro.png')

# Show the cloud
plt.imshow(wc)
plt.axis('off')
plt.show()
12. Submit the .py file