Instructions:
This lab is designed to teach you Spark MLlib with Python on the Databricks platform step by step, covering tools for data preprocessing, machine learning (ML) pipelines, several different machine learning algorithms, and corresponding ML applications.
This lab includes two tasks/mini-projects: 1) market basket analysis and recommendation, and 2) binary classification using logistic regression. Both are very helpful for your final project. More specifically, in this lab, you will learn:
• How to preprocess data with Spark MLlib
• How to use machine learning pipelines for optimization with Spark MLlib
• How to conduct advanced data analytics with Spark MLlib
o Association rules
o Classification
o Hyperparameter tuning and cross validation
o Market basket analysis
o Product recommendation
• How to embed SQL queries in Spark on Databricks
In this lab, Databricks Runtime 7.3 ML or above is required for creating a new cluster to run Spark MLlib application programs.
Task 1 (40 points + Extra Credit 20 points): Market Basket Analysis and Recommendation
The goal of this task is to use Spark MLlib to build a model that generates association rules for market basket analysis, uncovering associations between different items, and then to provide purchase recommendations on a distributed platform.
Please follow the step-by-step instructions below to write the PySpark code with Spark MLlib to complete Task 1:
1.1 (2 points) Create a new notebook on Databricks, AIT614-Sec#_Lab5.1_YourName.ipynb, and write your full name and course# with section# on the notebook.
1.2 (3 points) Please read Spark MLlib Main Guide - Frequent Pattern Mining, copy the FP-Growth Python code example into the notebook, and run it.
Hint: The outputs should be:
1.3 (10 points) Create a training set. Build your own dataset for training; for example, you can create different kinds of food items. Then display this dataset. The dataset should have >= 15 records/rows.
Hint: For example, the output of some rows could be:
1.4 (5 points) Use FP-Growth to build a model. Try different minSupport and minConfidence values to test the model.
1.5 (2 points) Display frequent itemsets.
Hint: For example, the output could be:
1.6 (3 points) Display generated association rules.
Hint: For example, the output could be:
1.7 (5 points) Create a test set. Create your own dataset for testing. The rows of the dataset should be >= 5.
Hint: For example, the output could be:
1.8 (5 points) Make recommendation/prediction.
Hint: For example, the output could be:
1.9 (5 points) List citations/references to credit others' work in a professional manner.
1.10 (Extra Credit 20 points) Load a real-world dataset, split it into training and test sets, and then generate the corresponding outputs.
Task 2 (60 points): Binary Classification using Logistic Regression
The goal of this task is to build a machine learning pipeline including a classification model that predicts `Attrition` (Yes or No) from the features included in the dataset (income, working years, education level, marital status, job role, and so on), which we used in Labs 3 and 4.
Please follow the step-by-step instructions below and the Databricks Spark MLlib tutorials and code examples (Getting started with MLlib) to write the PySpark code with MLlib to complete Task 2:
2.1 (2 points) Create a new notebook on Databricks, AIT614-Sec#_Lab5.2_YourName.ipynb, and write your full name and course# with section# on the notebook.
2.2 (5 points) Please follow the Lab 4 instruction to read the data file EmployeeAttrition.csv into the notebook and display the dataset schema.
Hint: Please add inferSchema="true" when reading the file. The code snippet should be similar to:
2.3 (5 points) Randomly split the data into training and test sets. Display the distribution of HourlyRate using summary() and of Education using groupBy() in the training set. Hint: The outputs are similar to:
2.4 (12 points) Feature preprocessing.
2.4.1 (5 points) In the displayed schema, you can find the categorical features with string data type. Select five categorical features "Department", "EducationField", "Gender", "JobRole", and "MaritalStatus" and convert them to index values using StringIndexer() only.
2.4.2 (2 points) Set the Attrition feature (Yes or No) as the label.
2.4.3 (3 points) Select these numeric features: "Age", "DailyRate", "Education", "DistanceFromHome", "HourlyRate", "JobInvolvement", "JobLevel", "JobSatisfaction", "MonthlyIncome", "YearsAtCompany", "YearsInCurrentRole", "YearsWithCurrManager", "NumCompaniesWorked", "PerformanceRating", "EnvironmentSatisfaction"
2.4.4 (2 points) Combine all feature columns into a single feature vector.
2.5 (2 points) Define the model. Use a logistic regression model LogisticRegression().
2.6 (8 points) Build the pipeline. Define a pipeline using Pipeline(), fit it on the training set, and then apply the pipeline model to the test dataset. Note that this step may take a few minutes. Then display the predictions from the model.
Hint: Note that there are only FOUR stages in the pipeline. The output is similar to: