Process two or more collections of data and compare some summary data about the two collections together.

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Data Analysis

For this mini project, you must work individually.

Semi-structured Data Processing:

The goal of your assignment is to write a program that reads JSON (or another semi-structured data format) from a MongoDB collection, a file, or the web.

You’ll then convert the data to a structured format in which each line represents one unit of analysis (for example, one tweet from Twitter). Your program should hold the data as lists of JSON structures (which, as you’ll recall, are just Python dictionaries and lists). Your program may also use pandas DataFrames for processed data.

Finally, your program will process the data, collecting values from the fields needed to answer one or more of the questions described below, and write a file with data suitable for answering each question. Remember that some fields may be optional or have null values, so you may need to test for those conditions. Graphing is optional.
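As a minimal sketch of the read-and-clean step described above, the following flattens nested JSON records into one row per unit while testing for optional or null fields. The sample records and the field names (`user`, `screen_name`, `retweet_count`, `entities`, `hashtags`) are assumptions chosen to resemble tweet data; they are not taken from the assignment, and your own data source will likely need different keys.

```python
import json

# Hypothetical sample: two tweet-like records; the second has null/missing fields.
RAW_JSON = """
[
  {"id": 1, "user": {"screen_name": "alice"}, "retweet_count": 3,
   "entities": {"hashtags": [{"text": "data"}, {"text": "python"}]}},
  {"id": 2, "user": null, "retweet_count": null}
]
"""


def flatten_record(record):
    """Return one flat row per record, tolerating missing or null fields."""
    # `or {}` covers both a missing key and an explicit JSON null.
    user = record.get("user") or {}
    entities = record.get("entities") or {}
    hashtags = entities.get("hashtags") or []
    return {
        "id": record.get("id"),
        "screen_name": user.get("screen_name", ""),
        "retweet_count": record.get("retweet_count") or 0,
        "hashtags": ";".join(tag.get("text", "") for tag in hashtags),
    }


records = json.loads(RAW_JSON)           # list of Python dictionaries
rows = [flatten_record(r) for r in records]
for row in rows:
    print(row)
```

The same `flatten_record` function could be applied to documents read from a file or a MongoDB cursor, since both yield dictionaries of the same shape.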

Questions:

1. Process one collection of data and summarize information from several fields.

o This is like the example program for analyzing Twitter hashtags, but it must access different fields, and more of them, than those examples do.

2. Process one collection of data, separate it into different categories, and provide summary statistics on those categories.

o For example, bin the tweets by day or by hour and report on the number of tweets per day or hour.

3. Process two or more collections of data and compare some summary data about the two collections together.

o For example, collect Twitter user timelines from different political candidates and compare the number of retweets of their tweets.
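The binning and comparison ideas in questions 2 and 3 might be sketched as follows. The two small collections are hand-made illustrative data, not real results, and the `created_at` values are assumed to be ISO 8601 strings; real Twitter timestamps use a different format and would need `datetime.strptime` with a matching pattern instead.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical, hand-made collections standing in for two user timelines.
candidate_a = [
    {"created_at": "2020-03-01T09:15:00", "retweet_count": 10},
    {"created_at": "2020-03-01T18:40:00", "retweet_count": 30},
    {"created_at": "2020-03-02T07:05:00", "retweet_count": None},
]
candidate_b = [
    {"created_at": "2020-03-01T12:00:00", "retweet_count": 5},
    {"created_at": "2020-03-02T13:30:00", "retweet_count": 15},
]


def count_per_day(records):
    """Bin records by calendar day and count how many fall in each bin."""
    days = Counter()
    for record in records:
        stamp = record.get("created_at")
        if stamp:  # skip records with a missing or null timestamp
            day = datetime.fromisoformat(stamp).date().isoformat()
            days[day] += 1
    return dict(days)


def mean_retweets(records):
    """Average retweet count, ignoring records where the field is null."""
    counts = [r["retweet_count"] for r in records
              if r.get("retweet_count") is not None]
    return mean(counts) if counts else 0.0


print("Tweets per day (A):", count_per_day(candidate_a))
print("Mean retweets A vs B:", mean_retweets(candidate_a), mean_retweets(candidate_b))
```

Whether null retweet counts should be skipped, as here, or treated as zero is a cleaning decision worth stating in the report, because it changes the averages.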

Data:

You may collect data from Twitter or from some other URL that returns JSON data. (You may also choose another semi-structured data format: XML, HTML, YAML, etc.) You should collect at least several hundred data items, if possible.

Deliverable [total: 15 points]:

For this mini project, you must submit your data set, a program*, a report**, and output files. Your program must be submitted as a Jupyter Notebook file that can be executed (.ipynb).

* A program (.ipynb) which does the following [subtotal: 10 points]:

o Reads in data from MongoDB, a file, or the web [1 point]

o Cleans and formats the data [2 points]

o Analyzes/Summarizes the data in three different ways [6 points (2 x 3)]

o Outputs the summaries in new files (3) with column headers [1 point]
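One way to write a summary with column headers, sketched with the standard `csv` module. The rows, the column names `day` and `tweet_count`, and the helper `write_summary` are hypothetical; an in-memory buffer is used here so the example runs anywhere, and a real notebook would pass an open file instead.

```python
import csv
import io

# Hypothetical summary rows, e.g. the result of a per-day count.
summary_rows = [
    {"day": "2020-03-01", "tweet_count": 2},
    {"day": "2020-03-02", "tweet_count": 1},
]


def write_summary(rows, fieldnames, handle):
    """Write rows to an open text handle as CSV, starting with a header row."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()      # column headers come first
    writer.writerows(rows)


# For a real output file, use:
#   with open("tweets_per_day.csv", "w", newline="") as handle: ...
buffer = io.StringIO()
write_summary(summary_rows, ["day", "tweet_count"], buffer)
print(buffer.getvalue())
```

Calling `write_summary` once per question, each time with its own file name and column list, would give the three output files.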

** A report which describes the following [subtotal: 5 points]:

o The data and its source [1 point] 

o A description of your data exploration and data cleaning steps [1 point]

o Three clearly stated comparison questions with the unit of analysis, the comparison values and how they are computed. [1 point]

o A description of the program [1 point]

o A description of the output files [1 point]

For your program, you may use any of the code developed in class as a template, but it is absolutely essential that you use appropriate variable names and that you write original comments for what your program does.  Recall that good comments demonstrate your understanding of the code that you write and the problem that you are trying to solve.
