Coursework must be attempted in groups of 4 or 5 students: Big Data analytics on a real case study

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Data Analytics

This coursework (CRWK) must be attempted in groups of 4 or 5 students. It is divided into two sections: (1) Big Data analytics on a real case study and (2) a group presentation. All group members must attend the presentation, which will be held online through Microsoft Teams. If you do not join the video call on the presentation date, you will fail the module.

Overall mark for CRWK comes from two main activities as follows:

1- Big Data Analytics report (around 3,000 words, with a tolerance of ± 10%) in HTML format (60%)

2- Presentation (40%)

Marking Scheme

Topic	Total mark	Remarks (breakdown of marks for each sub-task)
Big Data Analytics using Spark SQL	30	(6)	Providing 2 queries using Spark SQL.
		(14)	Developing advanced SQL statements. Refer to: https://spark.apache.org/docs/3.0.0/sql-ref.html
		(10)	Visualizing the outcomes of queries in graphical and textual formats, and being able to interpret them.
Big Data Analytics using PySpark	60	(45)	Analyzing the dataset through 3 statistical analytics methods, including advanced descriptive statistics, correlation, hypothesis testing, density estimation, etc.
		(15)	Designing one classifier, then evaluating and visualizing its accuracy/performance. A multi-class classifier is needed for full marks.
Documentation	10	(10)	Writing a well-organized report for a programming and analytics project.
Total:	100

IMPORTANT: You must use the CRWK template in HTML format; otherwise, the submission will be counted as plagiarism and your group mark will be zero. Please refer to the “THE FORMAT OF FINAL SUBMISSION” section.

Big Data Analytics using Spark

Big Data Analytics

(1) Understanding Dataset: CSE-CIC-IDS2018¹

This dataset was originally created by the University of New Brunswick for analyzing DDoS data. You can find the full dataset and its description here. The dataset is based on logs of the university's servers, which recorded various DoS attacks over the publicly available period; it has 80 attributes in total and is 6.40 GB in size. We will use about 2.6 GB of the data so that it can be processed on PCs restricted to 4 GB of RAM. Download it from here. When writing machine learning or statistical analysis code for this data, note that the Label column is arguably the most important part of the data, as it indicates whether the packets sent are malicious or not.

a) The features are described in the “IDS2018_Features.xlsx” file in Moodle page.

b) The labels are as follows:

• “Benign”: normal traffic

• Any other value of “Label”: traffic belonging to a DoS attack

c) In this coursework, we use more than 8.2 million records, about 2.6 GB in size. As big data specialists, we should first read and understand the features and then apply modeling techniques. To inspect a few records of this dataset, you can use [1] Hadoop HDFS and Hive, [2] Spark SQL, or [3] RDDs.

1 Source: https://registry.opendata.aws/cse-cic-ids2018/ & https://www.unb.ca/cic/datasets/ids-2018.html
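As a minimal sketch of option [2], the snippet below loads a CSV into Spark, registers it as a temporary view, and returns a handful of rows through Spark SQL. The view name `flows`, the helper names, and the idea of normalizing headers such as `Dst Port` to `dst_port` are assumptions made for illustration; the actual file name and column headers should be checked against “IDS2018_Features.xlsx”.

```python
import re


def normalize_column(name: str) -> str:
    """Turn a raw CSV header such as 'Dst Port' into 'dst_port'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def preview_records(csv_path: str, n: int = 5):
    """Load the CSV, register it as a temp view, and return the first n rows."""
    # Imported here so normalize_column stays usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ids2018-preview").getOrCreate()
    # inferSchema costs an extra pass over the file but yields numeric column types.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    # Spaces and slashes in column names are awkward in SQL, so rename them.
    df = df.toDF(*[normalize_column(c) for c in df.columns])
    # "flows" is a hypothetical view name chosen for this sketch.
    df.createOrReplaceTempView("flows")
    return spark.sql(f"SELECT * FROM flows LIMIT {int(n)}")
```

Calling `preview_records(path).show(truncate=False)` with the path of the downloaded subset prints the rows; `printSchema()` on the same result lists the inferred column types.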

(2) Big Data Query & Analysis using Spark SQL [30 marks]

This task uses Spark SQL to convert large volumes of raw data into useful information. Each member of a group should implement 2 complex SQL queries (refer to the marking scheme). Apply appropriate visualization tools to present your findings numerically and graphically, and briefly interpret your findings.

You can use https://spark.apache.org/docs/3.0.0/sql-ref.html for more information.

• What do you need to put in the HTML report per student?

1. At least two Spark SQL queries.

2. A short explanation of the queries.

3. The working solution, i.e., plot or table.

• Tip: The mark for this section depends on the complexity of your queries; for instance, a simple SELECT query will not earn full marks.
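To illustrate what goes beyond a simple SELECT, here is a sketch of a query that combines common table expressions, aggregation, and window functions to list the busiest destination ports for each label. It assumes a temporary view named `flows` with columns `label` and `dst_port`; these names are hypothetical and must be adapted to the actual schema. It is one possible shape of a more advanced query, not a model answer.

```python
def top_ports_per_label_query(view: str = "flows", top_k: int = 10) -> str:
    """Build a Spark SQL query ranking destination ports by flow count per label."""
    return f"""
        WITH counts AS (
            SELECT label, dst_port, COUNT(*) AS n_flows
            FROM {view}
            GROUP BY label, dst_port
        ),
        ranked AS (
            SELECT label, dst_port, n_flows,
                   RANK() OVER (PARTITION BY label ORDER BY n_flows DESC) AS port_rank,
                   ROUND(100 * n_flows / SUM(n_flows) OVER (PARTITION BY label), 2)
                       AS pct_of_label
            FROM counts
        )
        SELECT label, dst_port, n_flows, pct_of_label
        FROM ranked
        WHERE port_rank <= {int(top_k)}
        ORDER BY label, port_rank
    """


def run_query(spark, query: str):
    """Execute the query on an existing SparkSession and return a pandas DataFrame."""
    # The result is small (top_k rows per label), so collecting it is safe.
    return spark.sql(query).toPandas()
```

The ranking happens in a second CTE because a window function cannot be filtered in the `WHERE` clause of the `SELECT` that defines it. The returned pandas DataFrame can be shown as a table or passed to a plotting library for the graphical part of the report.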

(3) Advanced Analytics using PySpark [60 marks]

In this section, you will conduct advanced analytics using PySpark.

3.1. Analyze and Interpret Big Data using PySpark (45 marks)

Every member of a group should analyze data through 3 analytical methods (e.g., advanced descriptive statistics, correlation, hypothesis testing, density estimation, etc.). You need to present your work numerically and graphically. Apply tooltip text, legend, title, X-Y labels etc. accordingly.

Note: A good or full mark requires a working solution free of system and logical errors.
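As one example of the analytical methods listed above, the sketch below computes a Pearson correlation between two numeric columns with Spark and cross-checks it against a plain-Python computation on a small random sample. The helper names, the sampling fraction, and the column names passed in are assumptions; the sample estimate will only approximate the full-data value.

```python
import math


def pearson(xs, ys):
    """Plain-Python Pearson correlation, used only to cross-check a small sample."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spark_correlation(df, col_a: str, col_b: str) -> float:
    """Pearson correlation computed by Spark across the whole DataFrame."""
    return df.stat.corr(col_a, col_b, method="pearson")


def cross_check(df, col_a: str, col_b: str, fraction: float = 0.001, seed: int = 42):
    """Return (full-data correlation, correlation of a small collected sample)."""
    rows = (
        df.select(col_a, col_b)
        .dropna()
        .sample(fraction=fraction, seed=seed)
        .collect()
    )
    xs = [float(r[0]) for r in rows]
    ys = [float(r[1]) for r in rows]
    return spark_correlation(df, col_a, col_b), pearson(xs, ys)
```

A large gap between the two returned values is a hint to look for missing or extreme values before interpreting the result. For hypothesis testing, `pyspark.ml.stat.ChiSquareTest` is one option worth exploring in the same way.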

3.2. Design and Build a Machine Learning (ML) technique (15 marks)

Every member of a group should go over https://spark.apache.org/docs/3.0.0/ml-guide.html and apply one ML technique. You can apply one of the following approaches: Classification, Regression, Clustering, Dimensionality Reduction, Feature Extraction, Frequent Pattern Mining, or Optimization. Explain and evaluate your model and its results using numerical and/or graphical representations.

Note: If there are 4 students in your group, you should develop 4 different models. If group members submit similar models, the mark will be zero.
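The following sketch shows how one classifier could be assembled and evaluated with the Spark ML pipeline API. A random forest is used purely as an illustration; since each group member needs a different model, it is only one of several possible choices. The column names `label`, `label_idx`, and `features`, the hyperparameters, and the helper names are assumptions.

```python
def build_pipeline(feature_cols, label_col: str = "label"):
    """Assemble a multi-class random-forest pipeline (one possible model choice)."""
    # Imported lazily so the metric helper below works without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Map the string labels to numeric class indices.
    indexer = StringIndexer(
        inputCol=label_col, outputCol="label_idx", handleInvalid="skip"
    )
    # Combine the chosen numeric columns into a single feature vector.
    assembler = VectorAssembler(
        inputCols=list(feature_cols), outputCol="features", handleInvalid="skip"
    )
    forest = RandomForestClassifier(
        labelCol="label_idx", featuresCol="features", numTrees=20, seed=42
    )
    return Pipeline(stages=[indexer, assembler, forest])


def evaluate_accuracy(predictions) -> float:
    """Accuracy of a transformed test set, as reported by Spark's evaluator."""
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    evaluator = MulticlassClassificationEvaluator(
        labelCol="label_idx", predictionCol="prediction", metricName="accuracy"
    )
    return evaluator.evaluate(predictions)


def accuracy_from_confusion(matrix) -> float:
    """Accuracy = sum of the diagonal / sum of all cells of a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```

A typical use would split the data with `df.randomSplit([0.8, 0.2], seed=42)`, fit the pipeline on the training part, and pass `model.transform(test)` to `evaluate_accuracy`. The confusion-matrix helper gives a way to confirm the reported accuracy by hand and to support a plotted confusion matrix in the report.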

(4) Documentation [10 marks]

Your final report must follow the “The format of final submission” section. Your work must demonstrate an appropriate understanding of how to build a user-friendly, efficient, and comprehensive analytics report for a big data project, one that helps readers navigate to the relevant content.
