
INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Big Data Architecture

Please complete your assignment, save it as a PDF or Word document, and then submit it electronically in the Assignments section of Canvas. If you submit multiple assignments, only the latest submission will be graded.

Section 1 of 2: Reading invoice data in Spark

In this homework assignment, you will use Databricks Community Edition and Spark to answer questions about an invoice dataset. You may use Python or Scala, but you are expected to write code that completes the following tasks.

1) To get started, create a Spark cluster in the Databricks console. Once your cluster is up and running, take a screenshot and include it below.

2) Read the invoice CSV into a resilient distributed dataset (RDD) using the code below. Retrieve the first five rows and print them. Take a screenshot of both the code and the printed output and include it here.

# sc is the SparkContext that Databricks notebooks create automatically
invoice_rdd = sc.textFile("/databricks-datasets/online_retail/data-001/data.csv")

print(invoice_rdd.take(5)) 
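The lines returned by take(5) are raw strings, and the first one is a header row. Below is a minimal local sketch of turning such lines into fields before any map/reduce work. It assumes the header is InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country (check this against your own printed output), uses invented sample lines rather than real rows, and parses with Python's csv module instead of line.split(",") in case a description contains a quoted comma.

```python
import csv

# Assumed header of data.csv; compare it with the first line printed by take(5).
HEADER = "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country"

def parse_line(line):
    """Split one CSV line into fields; csv.reader keeps a quoted comma inside its field."""
    return next(csv.reader([line]))

# Invented sample lines in the same shape as the dataset (not real rows).
lines = [
    HEADER,
    '500001,10001,"MUG, RED",6,12/1/10 8:26,2.50,12345,United Kingdom',
    '500002,10002,PLAIN NOTEBOOK,2,12/2/10 9:15,1.25,,France',
]

# Local equivalent of invoice_rdd.filter(lambda l: l != header).map(parse_line)
rows = [parse_line(l) for l in lines if l != HEADER]

print(len(lines[1].split(",")))  # 9: a naive split cuts the quoted description in two
print(len(rows[0]))              # 8
print(rows[0][2])                # MUG, RED
```

In the notebook, the same steps could read header = invoice_rdd.first() followed by invoice_rdd.filter(lambda l: l != header).map(parse_line). The second sample line also shows that CustomerID can be empty, which matters for the customer question.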

Section 2 of 2: Answer the following questions about the invoice data

For each question below, please: 

Use map and reduce functions to answer the question.

Provide the snippet of Spark code that you used to answer the question.

Include a screenshot of your notebook that includes both the code and the printed answer.

1) Which customer in the dataset has spent the most on products?  The quantity multiplied by the unit price will give you the total dollar amount spent per invoice line. 
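A local, pure-Python sketch of the map/reduce logic for this question: each line is mapped to a (CustomerID, Quantity * UnitPrice) pair, the pairs are summed per customer, and a final reduce keeps the larger total. The helper reduce_by_key is hypothetical; it imitates Spark's reduceByKey so the example runs without a cluster. The column positions and the sample lines are assumptions, not real data.

```python
import csv
from functools import reduce

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for Spark's rdd.reduceByKey(func)."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out

def to_customer_spend(line):
    """Map one CSV line to (CustomerID, Quantity * UnitPrice).
    Assumed columns: InvoiceNo, StockCode, Description, Quantity,
    InvoiceDate, UnitPrice, CustomerID, Country."""
    f = next(csv.reader([line]))
    return (f[6], int(f[3]) * float(f[5]))

# Invented sample lines, header already removed.
lines = [
    '500001,10001,"MUG, RED",6,12/1/10 8:26,2.50,12345,United Kingdom',
    '500002,10002,PLAIN NOTEBOOK,10,12/1/10 9:15,1.25,12346,France',
    '500003,10001,"MUG, RED",4,12/2/10 10:00,2.50,12346,France',
    '500004,10003,DESK LAMP,1,12/2/10 11:30,30.00,,Germany',
]

pairs = map(to_customer_spend, lines)                # like rdd.map(...)
pairs = filter(lambda kv: kv[0] != "", pairs)        # drop lines with no CustomerID
totals = reduce_by_key(pairs, lambda a, b: a + b)    # like rdd.reduceByKey(...)
# Like rdd.reduce(...): keep whichever (customer, total) pair is larger.
top_customer = reduce(lambda a, b: a if a[1] >= b[1] else b, totals.items())
print(top_customer)  # ('12346', 22.5)
```

On the cluster, the equivalent chain would be data_rdd.map(to_customer_spend).filter(lambda kv: kv[0] != "").reduceByKey(lambda a, b: a + b).reduce(lambda a, b: a if a[1] >= b[1] else b), where data_rdd is a hypothetical header-free RDD of lines. Filtering out empty CustomerIDs matters: in this sample the anonymous 30.00 line would otherwise come out on top.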

2) What is the product description for the best-selling product in the dataset? Define "best-selling" as the product with the highest total quantity sold.
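The same map/reduce pattern answers this question with (Description, Quantity) pairs. This local sketch keys on the description text, which assumes each product has one consistent description; keying on StockCode and looking up the description afterwards is an alternative. The sample includes an invented cancellation line with a negative quantity, so the sum is net quantity sold; whether to filter such lines first is a judgment call. reduce_by_key is again a hypothetical stand-in for Spark's reduceByKey.

```python
import csv
from functools import reduce

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for Spark's rdd.reduceByKey(func)."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out

def to_description_qty(line):
    """Map one CSV line to (Description, Quantity); assumes Description is
    column 2 and Quantity is column 3."""
    f = next(csv.reader([line]))
    return (f[2], int(f[3]))

# Invented sample lines; the last one mimics a cancellation with negative quantity.
lines = [
    '500001,10001,"MUG, RED",6,12/1/10 8:26,2.50,12345,United Kingdom',
    '500002,10002,PLAIN NOTEBOOK,10,12/1/10 9:15,1.25,12346,France',
    '500003,10001,"MUG, RED",7,12/2/10 10:00,2.50,12346,France',
    'C500004,10002,PLAIN NOTEBOOK,-2,12/2/10 11:30,1.25,12346,France',
]

# Like rdd.map(to_description_qty).reduceByKey(lambda a, b: a + b)
qty_by_product = reduce_by_key(map(to_description_qty, lines), lambda a, b: a + b)
# Like rdd.reduce(...): keep the product with the larger total quantity.
best = reduce(lambda a, b: a if a[1] >= b[1] else b, qty_by_product.items())
print(best)  # ('MUG, RED', 13)
```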

3) How much has each country spent on products? The output should have two columns: the country and the gross dollar amount spent across all products. Sort the output by dollar amount in descending order, and print the entire output so that the gross dollar amount for every country is shown.
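For the country totals, the key becomes Country and the summed result is sorted by value before printing. This is again a local sketch with invented lines and the hypothetical reduce_by_key helper; sorted(..., reverse=True) stands in for Spark's sortBy(lambda kv: kv[1], ascending=False). Lines with an empty CustomerID still count toward their country here, which is one reasonable reading of "gross" spend.

```python
import csv

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for Spark's rdd.reduceByKey(func)."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out

def to_country_spend(line):
    """Map one CSV line to (Country, Quantity * UnitPrice); assumes Country
    is column 7, Quantity column 3 and UnitPrice column 5."""
    f = next(csv.reader([line]))
    return (f[7], int(f[3]) * float(f[5]))

# Invented sample lines, header already removed.
lines = [
    '500001,10001,"MUG, RED",6,12/1/10 8:26,2.50,12345,United Kingdom',
    '500002,10002,PLAIN NOTEBOOK,10,12/1/10 9:15,1.25,12346,France',
    '500003,10001,"MUG, RED",4,12/2/10 10:00,2.50,12346,France',
    '500004,10003,DESK LAMP,1,12/2/10 11:30,30.00,,Germany',
]

totals = reduce_by_key(map(to_country_spend, lines), lambda a, b: a + b)
# Like rdd.sortBy(lambda kv: kv[1], ascending=False).collect()
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for country, amount in ranked:
    print(f"{country}\t{amount:.2f}")
```

On the cluster, ending the chain with .sortBy(lambda kv: kv[1], ascending=False).collect() returns one row per country; that list is small, so collecting it to the driver for printing is reasonable.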

4) What is the highest-grossing day in the dataset?  Again, use quantity multiplied by unit price to get the revenue per line.
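Finding the highest-grossing day only needs a different key: the date part of InvoiceDate. This local sketch assumes dates look like 12/1/10 8:26, so splitting on the space keeps the day; if your printed rows show another format, adjust the hypothetical to_day_revenue function accordingly. The sample lines are invented and reduce_by_key is a hypothetical stand-in for Spark's reduceByKey.

```python
import csv
from functools import reduce

def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for Spark's rdd.reduceByKey(func)."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out

def to_day_revenue(line):
    """Map one CSV line to (day, Quantity * UnitPrice).
    Assumes InvoiceDate (column 4) is 'date time' separated by one space."""
    f = next(csv.reader([line]))
    day = f[4].split(" ")[0]
    return (day, int(f[3]) * float(f[5]))

# Invented sample lines, header already removed.
lines = [
    '500001,10001,"MUG, RED",6,12/1/10 8:26,2.50,12345,United Kingdom',
    '500002,10002,PLAIN NOTEBOOK,10,12/1/10 9:15,1.25,12346,France',
    '500003,10001,"MUG, RED",4,12/2/10 10:00,2.50,12346,France',
    '500004,10003,DESK LAMP,1,12/2/10 11:30,30.00,,Germany',
]

revenue_by_day = reduce_by_key(map(to_day_revenue, lines), lambda a, b: a + b)
# Like rdd.reduce(...): keep the day with the larger revenue.
best_day = reduce(lambda a, b: a if a[1] >= b[1] else b, revenue_by_day.items())
print(best_day)  # ('12/2/10', 40.0)
```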

5) Finally, try out one of the Databricks visualizations. Note that you will need to convert the RDD back to a DataFrame in order to visualize the data (hint: look at rdd.toDF()). Create an appropriate DataFrame for visualization and call display() on it.

Take a screenshot of your code and the resulting visualization. You can find the available visualizations by expanding the visualization icon at the bottom of a cell.
