This Assignment
Welcome to the third homework assignment of Data 100! In this assignment, we will be exploring tweets from several high-profile Twitter users.
In this assignment you will gain practice with:
Conducting data cleaning and EDA on a text-based dataset.
Manipulating data in pandas with the datetime and string accessors.
Writing regular expressions and using pandas regex methods.
Performing sentiment analysis on social media using VADER.
In [2]:
# Run this cell to set up your notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
from ds100_utils import *
# Ensure that Pandas shows at least 280 characters in columns, so we can see ful
pd.set_option('max_colwidth', 280)
plt.style.use('fivethirtyeight')
sns.set()
sns.set_context("talk")
def horiz_concat_df(dict_of_df, head=None):
    """
    Horizontally concatenate multiple DataFrames for easier visualization.
    Each DataFrame must have the same columns.
    """
    df = pd.concat([df.reset_index(drop=True) for df in dict_of_df.values()], ax
    if head is None:
        return df
    return df.head(head)
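To see what horizontal concatenation produces, here is a minimal sketch on two small made-up DataFrames. It assumes the truncated pd.concat call in the helper uses axis=1. The keys argument and the sample frames are illustrative additions, not part of the original helper.

```python
import pandas as pd

# Two tiny, made-up DataFrames with the same columns (hypothetical data).
left = pd.DataFrame({"id": [1, 2], "text": ["a", "b"]}, index=[10, 11])
right = pd.DataFrame({"id": [3, 4], "text": ["c", "d"]}, index=[20, 21])

frames = {"left": left, "right": right}

# Resetting the index first makes rows line up by position rather than by label.
side_by_side = pd.concat(
    [df.reset_index(drop=True) for df in frames.values()],
    axis=1,             # assumption: concatenate along columns
    keys=list(frames),  # illustrative: label each block by its dict key
)
print(side_by_side)
```

Without the reset_index step, rows would be aligned by index label, and frames with different indexes would be padded with missing values instead of sitting side by side.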
Score Breakdown
Question Points
1a 1
1b 1
1c 3
1d 1
2a 2
2b 2
2c 2
2d 2
2e 2
2f 1
3a 1
3b 1
3c 1
4a 1
4b 1
4ci 1
4cii 1
4d 1
4e 2
4f 2
4g 2
5a 2
5b 2
Total 35
Question 1: Importing the Data
The data for this assignment was obtained using the Twitter APIs (https://developer.twitter.com/en/docs/twitter-api). To ensure that everyone has the same data and to eliminate the need for every student to apply for a Twitter developer account, we have collected a sample of tweets from several high-profile public figures. The data is stored in the folder data. Run the following cell to list the contents of the directory:
In [3]:
AOC_recent_tweets.txt
BernieSanders_recent_tweets.txt
BillGates_recent_tweets.txt
Cristiano_recent_tweets.txt
EmmanuelMacron_recent_tweets.txt
elonmusk_recent_tweets.txt
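The code of the cell that produced this listing is not shown here. One way to produce a similar listing from Python, as a sketch using only the standard library, is below. The helper name list_data_files is hypothetical, and the demo runs on a temporary directory with made-up file names rather than the real data folder.

```python
import tempfile
from pathlib import Path

def list_data_files(folder):
    """Return the sorted names of the regular files in `folder` (hypothetical helper)."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file())

# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b_recent_tweets.txt", "a_recent_tweets.txt"]:
        (Path(tmp) / name).write_text("[]")
    demo_listing = list_data_files(tmp)

print(demo_listing)
```

In the notebook, calling list_data_files("data") would return the names of the files in the data folder.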
Question 1a
Let's examine the contents of one of these files. Using the open function (https://docs.python.org/3/library/functions.html#open) and the read operation (https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects) on a Python file object, read the first 1000 characters of data/BernieSanders_recent_tweets.txt and store your result in the variable q1a. Then display the result so you can read it.
Caution: Viewing the contents of large files in a Jupyter notebook could crash your browser. Be careful not to print the entire contents of the file.
Hint: You might want to use with:
with open("filename", "r") as f:
    f.read(2)
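As a sketch of how read behaves with a size argument, the example below writes a small made-up file to a temporary location and reads back only its first few characters. The file contents and variable names are illustrative, not part of the assignment data.

```python
import os
import tempfile

# Write a small made-up file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("hello, world")
    path = tmp.name

# In text mode, read(n) returns at most n characters.
# The with block closes the file automatically when it exits.
with open(path, "r", encoding="utf-8") as f:
    first_chars = f.read(5)

os.remove(path)  # clean up the temporary file
print(first_chars)  # hello
```

Passing a size to read keeps the amount of text loaded and displayed small, which is the concern raised in the caution above.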
In [7]:
  File "/tmp/ipykernel_115/3899424161.py", line 1
    q1a =
          ^
SyntaxError: invalid syntax
In [ ]:
Question 1b
What format is the data in? Answer this question by entering the letter corresponding to the right format in the variable q1b below.
A. CSV
B. HTML
C. JavaScript Object Notation (JSON)
D. Excel XML