

INTRODUCTION TO DATA SCIENCE

INSTRUCTIONS TO CANDIDATES

1. The Extended Assignment (EA) is an open-book assessment. Students can refer to online resources, learning materials, textbooks, and other reading materials to answer the questions that have been posted in the assessment.

2. Answer ALL questions.

3. The duration to complete the EA is TWENTY-FOUR (24) HOURS.

4. Students are allowed ONE (1) attempt to do the EA successfully where only ONE (1) duly completed EA submission is permitted. Multiple submissions are NOT allowed.

5. The MAXIMUM file size for an EA submission uploaded to ULearn is 20 MB.

6. Please upload your answers in ONE (1) PDF file.

7. Please make sure your answer in the PDF file is clear and readable and name your file as follows: "your name_your ID_EA Answer"

8. Late submissions and unclear or unreadable answers will not be accepted.

NOTE: You are required to submit the “CERTIFICATION OF ORIGINALITY” on the first page of your answer sheet.

1. a. Consider the structure of training data with 32 rows as shown in TABLE Q1 for a classification problem with four possible classes.

Fill in appropriate headers for <attr1>, <attr2>, <attr3>, and <attr4> according to the attributes of your own training data. Headers for ID and Class have been provided. Then fill in the data for the four attributes and the four-option Class. The data type of <attr1> is binary, <attr2> is continuous, <attr3> is nominal, and <attr4> is ordinal. The values for each attribute must be diverse. Based on the training data you have supplied, compute the entropy for the overall collection of the training data, the attribute with binary data, the attribute with continuous data using a multiway split, the attribute with nominal data using a multiway split, and the attribute with ordinal data using a multiway split. The split breakdowns for attributes requiring a multiway split must be clearly indicated. Lastly, suggest with justification which attribute in the training data is the most heterogeneous.

[35 marks]
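As a minimal sketch of the entropy calculations the question asks for, the Python below computes the entropy of a set of class labels and the weighted entropy after a multiway split on one attribute. The function names (`entropy`, `split_entropy`, `bin_continuous`), the cut points, and the toy labels are illustrative assumptions, not part of the assignment; a continuous attribute is assumed to be discretized into bins before the multiway split.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def split_entropy(values, labels):
    """Weighted average class entropy after a multiway split on `values`."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def bin_continuous(x, edges):
    """Map a continuous value to a bin index, given sorted cut points (assumed)."""
    return sum(x >= e for e in edges)


# Hypothetical toy data: 32 rows, four equally frequent classes.
classes = ["A"] * 8 + ["B"] * 8 + ["C"] * 8 + ["D"] * 8
# Hypothetical binary attribute that separates {A, B} from {C, D}.
binary_attr = [0] * 16 + [1] * 16

overall = entropy(classes)                          # entropy of the whole collection
after_split = split_entropy(binary_attr, classes)   # entropy after the binary split
```

An attribute whose split leaves a higher weighted entropy leaves the classes more mixed within its branches, which is one way to argue for which attribute is the most heterogeneous.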

b. Suppose that you have been hired by a digital news agency to summarize the top 10 daily news items in a specific vertical, such as computing, medicine, finance, entertainment, or law, in Malaysia. As a junior data scientist, suggest a complete text mining process that you would perform to achieve this goal.

[15 marks]
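One stage of such a process, preprocessing followed by extractive summarization, might be sketched as below. This is only an illustration under simplifying assumptions: the stop-word list is a tiny hypothetical one, sentences are split on end punctuation, and each sentence is scored by the average frequency of its words in the article. Data collection, classification into a vertical, and ranking of the top 10 items are outside this sketch.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for", "by", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


# Hypothetical article text, for illustration only.
article = "The bank raised rates. Rates affect loans and rates affect savings. Weather was mild."
summary = summarize(article, n_sentences=1)
```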

2. TABLE Q2 displays an unfilled summary of temperature readings from the ABC weather station in East Borneo, comparing the months of October, November, and December from 1990 to 1999 with the same months from 2010 to 2019. The table should display the number of months in which the average maximum daily temperature was low (< 16 °C), medium, or high (> 26 °C). The investigation aims to discover whether a significant difference exists between the two rows.

First, fill in TABLE Q2 with data in the Low, Medium, and High columns. The data for 1990-1999 must differ from the data for 2010-2019.

Assuming that the readings are independent from month to month, let the unknown parameters $p_{d,m}$ be the probability that a month's reading falls into bin $m \in \{\text{Low}, \text{Medium}, \text{High}\}$ in decade $d \in \{\text{1990-1999}, \text{2010-2019}\}$. As a junior data scientist, you have been requested to (i) provide expressions for the maximum likelihood estimates $\hat{p}_{d,m}$, stating what to maximize and over which variables, (ii) establish a null hypothesis $H_0$ under which the probabilities are identical in 1990-1999 and 2010-2019, call these probabilities $q_k$, provide the maximum likelihood estimates $\hat{q}_k$ under $H_0$, and test $H_0$ using the test statistic given as

[test statistic $t$; the expression is not preserved in the source]

and (iii) use parametric sampling to compute the distribution of $t$ under $H_0$. Additionally, your tasks include (iv) explaining the relevance of a one-sided test versus a two-sided test for this investigation, (v) providing pseudocode to compute the p-value for the test of $H_0$, and finally (vi) explaining one advantage and one disadvantage of a count-based test as opposed to a linear regression-based test.

[50 marks]
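The parametric sampling step can be illustrated with a short Python sketch. Because the test statistic is not reproduced in this text, the sketch assumes Pearson's chi-squared statistic as a stand-in; the assignment's own statistic may differ. The function names, the counts, and the number of simulations are likewise hypothetical. Each decade contributes 30 months (three months over ten years), and under the null hypothesis both rows are simulated from the pooled proportions.

```python
import random


def mle(counts):
    """Multinomial MLE: the observed proportion in each bin."""
    n = sum(counts)
    return [c / n for c in counts]


def pearson_stat(row_a, row_b):
    """Pearson chi-squared statistic for a 2 x K table (assumed stand-in for t)."""
    q_hat = mle([a + b for a, b in zip(row_a, row_b)])  # pooled MLE under H0
    t = 0.0
    for row in (row_a, row_b):
        n = sum(row)
        for observed, q in zip(row, q_hat):
            expected = n * q
            if expected > 0:  # skip bins that are empty in both rows
                t += (observed - expected) ** 2 / expected
    return t


def simulate_row(n, probs, rng):
    """Draw n months from the multinomial with bin probabilities `probs`."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return [draws.count(k) for k in range(len(probs))]


def bootstrap_p_value(row_a, row_b, n_sim=2000, seed=0):
    """Estimate P(t >= t_observed) under H0 by parametric sampling."""
    rng = random.Random(seed)
    t_obs = pearson_stat(row_a, row_b)
    q_hat = mle([a + b for a, b in zip(row_a, row_b)])
    n_a, n_b = sum(row_a), sum(row_b)
    exceed = 0
    for _ in range(n_sim):
        t_sim = pearson_stat(simulate_row(n_a, q_hat, rng),
                             simulate_row(n_b, q_hat, rng))
        if t_sim >= t_obs:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical counts (Low, Medium, High); each row sums to 30 months.
p_value = bootstrap_p_value([12, 14, 4], [5, 13, 12])
```

Only large values of this statistic count as evidence against the null hypothesis, so the sketch compares simulated values with the observed one in the upper tail only.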
