DSCI553 Foundations and Applications of Data Mining

Spring 2022

Assignment 3

1. Overview of the Assignment

In Assignment 3, you will complete two tasks. The goal is to familiarize you with Locality Sensitive Hashing (LSH) and different types of collaborative-filtering recommendation systems. The dataset you will use is a subset of the Yelp dataset used in the previous assignments.

4. Tasks

Note: This assignment has been divided into two parts on Vocareum to provide more computational resources.

4.1 Task 1: Jaccard-based LSH (2 points)

In this task, you will implement the Locality Sensitive Hashing algorithm with Jaccard similarity. Here we focus on binary (“0 or 1”) ratings rather than the actual star ratings given by users: if a user has rated a business, that user’s entry in the business’s column of the characteristic matrix is 1; if the user hasn’t rated the business, the entry is 0. You need to identify pairs of similar businesses whose Jaccard similarity is >= 0.5.
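A minimal Python sketch of the binary characteristic matrix, assuming the ratings have already been loaded as (user_id, business_id) pairs. The loading step, the helper names `build_characteristic_sets` and `jaccard`, and the toy data are hypothetical, not part of the assignment. Instead of a dense 0/1 matrix, each business column is stored as the set of row indices of the users who rated it (a common sparse representation), so the Jaccard similarity of two columns is |A ∩ B| / |A ∪ B|.

```python
from collections import defaultdict


def build_characteristic_sets(ratings):
    """Map each business to the set of user row indices that rated it.

    `ratings` is an iterable of (user_id, business_id) pairs. Star values are
    ignored: a rating only means "this cell of the characteristic matrix is 1".
    """
    user_index = {}                    # user_id -> row number in the matrix
    business_users = defaultdict(set)  # business_id -> rows holding a 1
    for user_id, business_id in ratings:
        row = user_index.setdefault(user_id, len(user_index))
        business_users[business_id].add(row)
    return dict(business_users), len(user_index)


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sample of (user_id, business_id) pairs.
sample = [("u1", "b1"), ("u2", "b1"), ("u1", "b2"),
          ("u2", "b2"), ("u3", "b2"), ("u3", "b3")]
sets, n_users = build_characteristic_sets(sample)
print(n_users)                                    # 3
print(round(jaccard(sets["b1"], sets["b2"]), 4))  # 0.6667
```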

You can define any collection of hash functions that you think would result in a consistent permutation of the row entries of the characteristic matrix. Some potential hash functions are:

f(x) = (ax + b) % m   or   f(x) = ((ax + b) % p) % m

where a and b are integer coefficients (typically chosen at random for each hash function), p is any prime number, and m is the number of bins. Please design your hash functions carefully.
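One possible way to generate a family of hash functions of the second form above, sketched in Python. The helper name `make_hash_functions`, the default prime 2**31 - 1, and the fixed seed are illustrative assumptions; any prime larger than the number of rows being hashed would serve.

```python
import random


def make_hash_functions(n, m, p=2_147_483_647, seed=553):
    """Return n hash functions f(x) = ((a*x + b) % p) % m.

    a and b are drawn independently for each function (a >= 1 so f is not
    constant). p = 2**31 - 1 is a Mersenne prime; it should exceed the
    number of rows being hashed. The fixed seed is an illustrative choice.
    """
    rng = random.Random(seed)
    params = [(rng.randint(1, p - 1), rng.randint(0, p - 1)) for _ in range(n)]
    # Default arguments bind each (a, b) pair to its own lambda.
    return [lambda x, a=a, b=b: ((a * x + b) % p) % m for a, b in params]


hashes = make_hash_functions(n=4, m=1000)
print([h(7) for h in hashes])  # four ints, each in range(0, 1000)
```

Fixing the seed makes the functions, and therefore the signatures built from them, reproducible across runs.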

After you have defined all the hash functions, build the signature matrix. Then divide the matrix into b bands with r rows each, where b × r = n (n is the number of hash functions). Choose the combination of b and r carefully (b > 1 and r > 1): it controls the trade-off between missing truly similar pairs and generating too many false-positive candidates. Remember that two items are a candidate pair if their signatures are identical in at least one band.
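The signature and banding steps could look like the Python sketch below, assuming each business is represented by the set of row indices of the users who rated it and the hash functions follow the ((ax + b) % p) % m form. The function names, the toy data, m = 10, and the choice b = 10, r = 2 are hypothetical. `candidate_probability` evaluates the standard LSH S-curve 1 - (1 - s^r)^b, one way to compare (b, r) combinations: with b = 10 and r = 2, a pair with similarity 0.5 becomes a candidate with probability of about 0.944.

```python
import random
from collections import defaultdict
from itertools import combinations


def minhash_signatures(business_users, hash_funcs):
    """For each business, record the minimum hash value over its rated rows,
    once per hash function (each function simulates a row permutation)."""
    return {
        bid: [min(h(row) for row in rows) for h in hash_funcs]
        for bid, rows in business_users.items()
        if rows  # a column with no 1s has no defined min-hash
    }


def lsh_candidates(signatures, b, r):
    """Split each signature into b bands of r rows; businesses whose slices
    match in at least one band become candidate pairs."""
    candidates = set()
    for band in range(b):
        buckets = defaultdict(list)  # fresh buckets for every band
        for bid, sig in signatures.items():
            buckets[tuple(sig[band * r:(band + 1) * r])].append(bid)
        for bucket in buckets.values():
            candidates.update(combinations(sorted(bucket), 2))
    return candidates


def candidate_probability(s, b, r):
    """LSH S-curve: chance that a pair with Jaccard similarity s is a candidate."""
    return 1 - (1 - s ** r) ** b


# Hypothetical setup: n = b * r = 20 hash functions over m = 10 user rows.
b, r, P, M = 10, 2, 2_147_483_647, 10
rng = random.Random(553)
params = [(rng.randint(1, P - 1), rng.randint(0, P - 1)) for _ in range(b * r)]
# c plays the role of the additive constant b in f(x), renamed so it does
# not clash with the band count b.
hash_funcs = [lambda x, a=a, c=c: ((a * x + c) % P) % M for a, c in params]

business_users = {"b1": {0, 1, 2, 3}, "b2": {0, 1, 2, 3}, "b3": {7, 8, 9}}
sigs = minhash_signatures(business_users, hash_funcs)
print(sorted(lsh_candidates(sigs, b, r)))          # always contains ('b1', 'b2')
print(round(candidate_probability(0.5, b, r), 3))  # 0.944
```

Because b1 and b2 have identical user sets, their signatures match in every band, so they are guaranteed to be a candidate pair; whether b3 collides with them depends on the random coefficients.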

Your final results are the candidate pairs whose actual Jaccard similarity (computed from the characteristic matrix, not estimated from the signatures) is >= 0.5. Write the final results to a CSV file according to the output format below.
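A sketch of the final verification and output step in Python. The output-format section is not reproduced here, so the header row, the column names, and the lexicographic ordering in this example are assumptions; replace them with whatever the assignment's output format specifies. The function name `write_similar_pairs` is likewise hypothetical.

```python
import csv
import io


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def write_similar_pairs(candidates, business_users, out, threshold=0.5):
    """Keep candidates whose true Jaccard similarity (on the original user
    sets, not the signatures) is >= threshold and write them as CSV rows."""
    results = []
    for b1, b2 in candidates:
        b1, b2 = sorted((b1, b2))  # assumed: smaller id first in each pair
        sim = jaccard(business_users[b1], business_users[b2])
        if sim >= threshold:
            results.append((b1, b2, sim))
    results.sort()  # assumed: rows sorted lexicographically by id pair
    writer = csv.writer(out)
    # Hypothetical header; use the column names the output format requires.
    writer.writerow(["business_id_1", "business_id_2", "similarity"])
    writer.writerows(results)
    return results


users = {"b1": {0, 1}, "b2": {0, 1, 2}, "b3": {2}}
buf = io.StringIO()
rows = write_similar_pairs({("b2", "b1"), ("b1", "b3")}, users, buf)
print(rows)  # one row: b1, b2 with similarity 2/3; (b1, b3) is dropped
```

In practice `out` would be a file opened with `open(path, "w", newline="")`; an in-memory buffer is used here only so the sketch runs on its own.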

 
