In this project, we will be working with the Diamonds dataset.

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Background

In this project, we will be working with the Diamonds dataset. This dataset contains information about several thousand diamonds sold in the United States. Each diamond in the dataset has 10 attributes recorded for it, but we will only be interested in the following 5 attributes:

• price - The sales price of the diamond, in US Dollars.

• carat - The weight of the diamond, measured in carats. One carat is 200 mg.

• cut - Quality of the cut of the diamond. The levels (from worst to best) are Fair, Good, Very Good, Premium, and Ideal.

• color - Level of the tint in the diamond. Colorless diamonds are generally preferred. The levels of this variable (from worst to best) are: J, I, H, G, F, E, and D.

• clarity - Indicates the level of internal defects in the diamond. The levels (from worst to best) are: I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF.

General Instructions

Create a new notebook named Project_04_YourLastName.ipynb and complete the instructions provided below.

Any set of instructions you see in this document with an orange bar to the left will indicate a place where you should create a markdown cell. If no instructions are provided regarding formatting, then the text should be unformatted.

Any set of instructions you see with a blue bar to the left will provide instructions for creating a single code cell. Read the instructions carefully.

Any time that you are asked to display a DataFrame in this assignment, you should do so without using the print() function.

Assignment Header

Create a markdown cell with a level 1 header that reads: "DSCI 303 – Project 04". Add your name below that as a level 3 header.

Import the following packages using the standard aliases: numpy, pandas, and matplotlib.pyplot. No other packages should be used in this project.

Part 1: Loading the Dataset; Preliminary Analysis

In this section, we will load the data into a DataFrame, and will explore the structure of the data set.

Create a markdown cell that displays a level 2 header that reads: "Part 1: Loading the Dataset; Preliminary Analysis". Also add some text briefly describing the purpose of your code in this part.

The data is stored in the tab-delimited text file diamonds.txt. Download this file into the directory that contains your notebook, and then load the data into a DataFrame named diamonds. Use the head() method to display the first 10 rows of this DataFrame.
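A minimal sketch of reading a tab-delimited file with pandas follows. The in-memory `sample_file` and its three rows are made-up stand-ins for diamonds.txt, and the column layout is an assumption; the real file may contain additional columns.

```python
import io
import pandas as pd

# Hypothetical stand-in for diamonds.txt: three made-up rows, tab-delimited.
sample_file = io.StringIO(
    "carat\tcut\tcolor\tclarity\tprice\n"
    "0.30\tIdeal\tE\tVS1\t500\n"
    "1.10\tFair\tJ\tI1\t3200\n"
    "0.70\tPremium\tG\tSI1\t2400\n"
)

# For the real file, pass the path instead: pd.read_csv("diamonds.txt", sep="\t")
diamonds = pd.read_csv(sample_file, sep="\t")

# In a notebook, ending the cell with this expression displays the rows
# without calling print().
diamonds.head(10)
```

Passing `sep="\t"` is what tells `read_csv` that fields are separated by tabs rather than commas.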

Add a markdown cell explaining that we will determine the size of the dataset.

Print the shape of the diamonds DataFrame.

We will now inspect the distribution of the columns in diamonds. Add a markdown cell to briefly explain this.

Call the DataFrame method describe() on diamonds to display a DataFrame containing descriptive statistics for each of the columns.
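The two inspection steps above can be sketched on a small made-up DataFrame. The values here are illustrative only and do not come from the Diamonds dataset.

```python
import pandas as pd

# Made-up rows used only to illustrate shape and describe().
diamonds = pd.DataFrame({
    "carat": [0.30, 1.10, 0.70, 0.50],
    "cut": ["Ideal", "Fair", "Premium", "Good"],
    "price": [500, 3200, 2400, 1100],
})

# shape is an attribute (rows, columns), not a method.
print(diamonds.shape)  # (4, 3)

# By default, describe() summarizes only the numeric columns of a mixed DataFrame.
diamonds.describe()
```

Because `cut` holds text, it is left out of the default summary; only `carat` and `price` appear.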

Part 2: Filtering and Sorting

In this part, you will be asked to use filtering and sorting techniques to display information for diamonds satisfying certain criteria.

Create a markdown cell that displays a level 2 header that reads: "Part 2: Filtering and Sorting". Also add some text explaining that we will start by viewing information about the 5 most expensive diamonds in the dataset.

Complete the following steps by chaining DataFrame methods, and without creating any new DataFrame variables.

1. Select the columns price, carat, cut, color, and clarity from diamonds.

2. Sort the resulting DataFrame by price, in descending order.

3. Use head() to display the first five rows of the result.
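One possible way to chain the three steps above is sketched here on made-up data. The rows are hypothetical; only the column names come from the instructions.

```python
import pandas as pd

# Made-up rows; only the column names follow the assignment.
diamonds = pd.DataFrame({
    "price": [326, 18823, 2757, 554, 9000, 4200],
    "carat": [0.23, 2.29, 0.70, 0.31, 1.51, 1.01],
    "cut": ["Ideal", "Premium", "Good", "Fair", "Very Good", "Ideal"],
    "color": ["E", "I", "G", "J", "D", "F"],
    "clarity": ["SI2", "VS2", "SI1", "I1", "IF", "VS1"],
})

# Select columns, sort by price (highest first), keep the top five rows.
# Each method returns a new DataFrame, so no intermediate variable is needed.
(
    diamonds[["price", "carat", "cut", "color", "clarity"]]
    .sort_values("price", ascending=False)
    .head(5)
)
```

Wrapping the chain in parentheses lets each method sit on its own line; switching to `ascending=True` gives the least expensive rows instead.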

Create a markdown cell explaining that we will now view information about the 5 least expensive diamonds in the dataset.

Complete the following steps by chaining DataFrame methods, and without creating any new DataFrame variables.

1. Select the columns price, carat, cut, color, and clarity from diamonds.

2. Sort the resulting DataFrame by price, in ascending order.

3. Use head() to display the first five rows of the result.

Create a markdown cell explaining that we will now view information about the 5 largest diamonds in the dataset with an ideal cut.

Complete the following steps by chaining DataFrame methods, and without creating any new DataFrame variables.

1. Select the columns price, carat, cut, color, and clarity from diamonds.

2. Use boolean masking to filter the DataFrame, keeping only records for diamonds with an ideal cut.

3. Sort the resulting DataFrame by carat, in descending order.

4. Use head() to display the first five rows of the result.
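The masking step can be added to the same kind of chain. This is a sketch on made-up rows, assuming the cut level is stored as the string "Ideal".

```python
import pandas as pd

# Made-up rows; only the column names follow the assignment.
diamonds = pd.DataFrame({
    "price": [500, 9000, 7000, 3000, 5200],
    "carat": [0.30, 2.00, 1.50, 0.90, 1.20],
    "cut": ["Ideal", "Fair", "Ideal", "Ideal", "Premium"],
    "color": ["E", "J", "G", "F", "H"],
    "clarity": ["VS1", "I1", "SI1", "VVS2", "SI2"],
})

# The boolean mask diamonds["cut"] == "Ideal" is True for the rows to keep.
# It shares the index of the column selection, so it can filter it directly.
(
    diamonds[["price", "carat", "cut", "color", "clarity"]]
    [diamonds["cut"] == "Ideal"]
    .sort_values("carat", ascending=False)
    .head(5)
)
```

Only three rows match here, so `head(5)` returns all three; it never fails when fewer rows are available.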

Create a markdown cell explaining that we will now view information about the 5 largest diamonds in the dataset with a fair cut.

Complete the following steps by chaining DataFrame methods, and without creating any new DataFrame variables.

1. Select the columns price, carat, cut, color, and clarity from diamonds.

2. Use boolean masking to filter the DataFrame, keeping only records for diamonds with a fair cut.

3. Sort the resulting DataFrame by carat, in descending order.

4. Use head() to display the first five rows of the result.

Part 3: Working with Categorical Variables

The columns cut, color, and clarity are categorical variables whose values represent discrete categories that the diamonds can be classified into. Any possible value that a categorical variable can take is referred to as a level of that variable.

As mentioned at the beginning of these instructions, the levels of each of the variables have a natural ordering, or ranking. However, Pandas will not understand the order that these levels should be in unless we specify the ordering ourselves.

Create a markdown cell that displays a level 2 header that reads: "Part 3: Working with Categorical Variables". Add some text explaining that we will be creating lists to specify the order for each of the three categorical variables.

Create three lists named clarity_levels, cut_levels, and color_levels. Each list should contain strings representing the levels of the associated categorical variable in order from worst to best.

We can specify the order for the levels of a categorical variable stored as a column in a DataFrame by using the pd.Categorical() function. To use this function, you will pass it two arguments: The first is the column whose levels you are setting, and the second is a list or array containing the levels in order. This function will return a new series object, which can be stored back in place of the original column. An example of this syntax is provided below:

df.some_column = pd.Categorical(df.some_column, levels_list)
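A small sketch of the effect of this syntax, using the cut levels listed in the Background section and a made-up four-row DataFrame:

```python
import pandas as pd

# Made-up rows; the level order is the one given in the Background section.
diamonds = pd.DataFrame({"cut": ["Ideal", "Fair", "Premium", "Good"]})
cut_levels = ["Fair", "Good", "Very Good", "Premium", "Ideal"]

# Before conversion, sorting a text column is alphabetical.
print(diamonds.sort_values("cut")["cut"].tolist())

# Replace the column with a categorical that carries the level order.
diamonds["cut"] = pd.Categorical(diamonds["cut"], cut_levels)

# After conversion, sorting follows the order of cut_levels.
print(diamonds.sort_values("cut")["cut"].tolist())
```

Levels that never occur in the data, such as "Very Good" here, are still kept as categories. By default the result is not flagged as ordered; passing `ordered=True` to `pd.Categorical()` additionally permits comparisons such as `<` between values.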

Create a markdown cell explaining that we will now use these lists to communicate to Pandas the correct order for the levels of the three categorical variables.

Use pd.Categorical() to set the levels of the cut, color, and clarity columns. This will require three calls to pd.Categorical().

Create a markdown cell explaining that we will now create lists of named colors to serve as palettes to be used for visualizations later in the notebook.

Create three lists named clarity_pal, color_pal, and cut_pal. Each list should contain a number of named colors equal to the number of levels found for the associated categorical variable. The colors within each list should be easy to distinguish from one another.
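As an illustration of the idea for one variable only, a palette can be a plain list of color names kept the same length as the list of levels. The specific colors below are an arbitrary choice for this sketch, not ones prescribed by the assignment.

```python
# Levels of cut, worst to best, as listed in the Background section.
cut_levels = ["Fair", "Good", "Very Good", "Premium", "Ideal"]

# One named color per level; the particular names are an illustrative choice.
cut_pal = ["firebrick", "darkorange", "gold", "seagreen", "steelblue"]

# Pair each level with its color, e.g. to reuse the mapping in later plots.
cut_color_by_level = dict(zip(cut_levels, cut_pal))
print(cut_color_by_level["Ideal"])  # steelblue
```

Keeping the two lists in the same order means position i in the palette always refers to level i.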
