
INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Summary

 

This document provides guidance for trainees on the final project of the Data Science pathway. Trainees will use the following datasets.

 

1. 2019-01.csv

2. 2019-02.csv

3. 2019-03.csv

4. airports.csv

5. carriers.csv

6. plane-data.csv

The Data:

The data for this project falls into four categories. Flight details are captured in the files 2019-01.csv, 2019-02.csv and 2019-03.csv. Airport data are in airports.csv, carrier information in carriers.csv, and plane data in plane-data.csv.

 

Attributes shared by the flight details include:

Year  - 2019 

Month - 1-12 

DayofMonth  - 1-31 

DayOfWeek - 1 (Monday) - 7 (Sunday) 

DepTime  - actual departure time (local, hhmm) 

CRSDepTime  - scheduled departure time (local, hhmm) 

ArrTime  - actual arrival time (local, hhmm) 

CRSArrTime  - scheduled arrival time (local, hhmm) 

UniqueCarrier - unique carrier code 

FlightNum  - flight number 

TailNum  - plane tail number 

ActualElapsedTime - in minutes 

CRSElapsedTime  - in minutes 

AirTime - in minutes 

ArrDelay  - arrival delay, in minutes 

DepDelay  - departure delay, in minutes 

Origin  - origin IATA airport code 

Dest  - destination IATA airport code 

Distance  - in miles 

TaxiIn  - taxi in time, in minutes 

TaxiOut  - taxi out time in minutes 

Cancelled  - was the flight cancelled? 

CancellationCode  - reason for cancellation (A = carrier, B = weather, C = NAS, D = security) 

Diverted  - 1 = yes, 0 = no 

CarrierDelay  - in minutes 

WeatherDelay  - in minutes 

NASDelay  - in minutes 

SecurityDelay  - in minutes 

LateAircraftDelay  - in minutes
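
The time fields above (DepTime, CRSDepTime, ArrTime, CRSArrTime) are stored as local hhmm values, so they usually need converting before any time-of-day analysis. The following is a minimal plain-Python sketch of one way to do that. The helper names `hhmm_to_minutes` and `hour_of_day`, and the treatment of blank or `NA` values as missing, are assumptions for illustration and not part of the brief.

```python
from typing import Optional


def hhmm_to_minutes(value: Optional[str]) -> Optional[int]:
    """Convert a local hhmm value such as '1435' or '905' to minutes after midnight.

    Returns None for missing values (assumed here to be '', 'NA' or None),
    which can occur for cancelled flights.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text in ("", "NA"):
        return None
    # Values may be read as floats (e.g. '1435.0') depending on the loader.
    number = int(float(text))
    hours, minutes = divmod(number, 100)
    if not (0 <= hours <= 24 and 0 <= minutes < 60):
        return None  # treat malformed times as missing
    # Some sources write midnight as 2400; fold it back to hour 0.
    return (hours % 24) * 60 + minutes


def hour_of_day(value: Optional[str]) -> Optional[int]:
    """Return the hour bucket (0-23) for an hhmm value, or None if missing."""
    total = hhmm_to_minutes(value)
    return None if total is None else total // 60
```

The same logic could be ported to a Spark or HiveQL expression once it behaves as expected on a small sample.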

 

The airports.csv contains the following attributes:

iata - the international airport abbreviation code

airport - name of the airport

city  - name of the city where the airport is located

state - the state code for the airport's location

country - country in which the airport is located

lat - latitude coordinate

long - longitude coordinate

 

The carriers.csv contains the following attributes:

code - unique carrier code

description - carrier full name 

 

The plane-data.csv contains the following attributes:

tailnum - unique tail number of the aircraft

type - the type of ownership of the aircraft

manufacturer - the manufacturer of the aircraft

issue_date - date aircraft was issued 

model - the model number of the aircraft

status - status indicator

aircraft_type - description of aircraft 

engine_type - engine type used

year - year of manufacturing
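
Judging by the attribute descriptions, the files appear to link through shared keys: TailNum in the flight files to tailnum in plane-data.csv, UniqueCarrier to code in carriers.csv, and Origin/Dest to iata in airports.csv. The sketch below shows that lookup-style join in plain Python on tiny made-up samples. All row values and the helper names (`index_by`, `enrich_flights`, `read_rows`) are hypothetical, and the real files may need extra cleaning.

```python
import csv
import io

# Tiny made-up samples standing in for the real files (values are illustrative only).
FLIGHTS_CSV = """Year,Month,DayofMonth,UniqueCarrier,TailNum,Origin,Dest,ArrDelay
2019,1,1,AA,N001AA,JFK,LAX,12
2019,1,1,DL,N002DL,ATL,JFK,-3
2019,1,2,AA,N999ZZ,LAX,JFK,45
"""
PLANES_CSV = """tailnum,manufacturer,year
N001AA,BOEING,2001
N002DL,AIRBUS,2015
"""
CARRIERS_CSV = """code,description
AA,Example Carrier A
DL,Example Carrier B
"""


def read_rows(text):
    """Parse CSV text into a list of dicts (use open(path) instead on real files)."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by(rows, key):
    """Build a lookup dict from an iterable of row dicts, keyed on one column."""
    return {row[key]: row for row in rows}


def enrich_flights(flights, planes_by_tail, carriers_by_code):
    """Left-join each flight to its plane and carrier rows; unmatched keys give None."""
    for flight in flights:
        plane = planes_by_tail.get(flight["TailNum"])
        carrier = carriers_by_code.get(flight["UniqueCarrier"])
        yield {
            **flight,
            "plane_year": plane["year"] if plane else None,
            "carrier_name": carrier["description"] if carrier else None,
        }


planes = index_by(read_rows(PLANES_CSV), "tailnum")
carriers = index_by(read_rows(CARRIERS_CSV), "code")
enriched = list(enrich_flights(read_rows(FLIGHTS_CSV), planes, carriers))
```

A left join is used here so that flights whose tail number is missing from the plane data are kept rather than silently dropped. The equivalent in Spark or HiveQL would be a left join on the same key columns.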

The Task:

 

The aim of the project is to provide a graphical summary of important features of the data set, combined with your own exploration.

 

You must use HiveQL and/or Spark to process the data. It is up to you whether to use both or just one; the only requirement is that you can justify your choice. Python can be used to visualise your results.

 

Your analysis should focus on detecting the occurrence and causes of flight delays. For example, you could look into:

 

When is the best time of day/day of week/time of year to fly to minimise delays?

Do older planes suffer more delays?

How does the number of people flying between different locations change over time?

Can you detect cascading failures as delays in one airport create delays in others? Are there critical links in the system?

Which carrier experiences the lowest/highest number of delays?
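
Several of the questions above reduce to grouping flights by one attribute and comparing average delays. As a rough illustration, the sketch below computes the mean ArrDelay per DayOfWeek in plain Python. The sample rows are made up, the function names `mean_delay_by` and `best_group` are hypothetical, and treating blank or `NA` delays as missing is an assumption.

```python
from collections import defaultdict

# Hypothetical rows with only the columns needed; real rows come from the flight files.
SAMPLE_FLIGHTS = [
    {"DayOfWeek": "1", "ArrDelay": "10"},
    {"DayOfWeek": "1", "ArrDelay": "30"},
    {"DayOfWeek": "2", "ArrDelay": "-5"},
    {"DayOfWeek": "2", "ArrDelay": ""},  # missing delay, e.g. a cancelled flight
    {"DayOfWeek": "3", "ArrDelay": "0"},
]


def mean_delay_by(rows, group_col, delay_col="ArrDelay"):
    """Return {group value: mean delay in minutes}, skipping rows with no delay recorded."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        raw = (row.get(delay_col) or "").strip()
        if raw in ("", "NA"):
            continue
        totals[row[group_col]] += float(raw)
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in counts}


def best_group(means):
    """Return the group with the lowest mean delay."""
    return min(means, key=means.get)


means = mean_delay_by(SAMPLE_FLIGHTS, "DayOfWeek")
```

Passing a different `group_col` (for example UniqueCarrier or Month) gives the corresponding comparison. The mean is sensitive to a few very long delays, so a median or a share of late flights may be worth reporting alongside it.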

 

Aside from this requirement, you are free to choose which other feature(s) to explore, how far to take the analysis and the methods you use. You could, for example, look at the operational efficiency of the larger airports.

 

 

What do you need to deliver?

 

Create a 15-20 minute presentation that analyses the dataset and be prepared to present to and engage with the class by the end of the week.

 

To Include:

Introduction to your presentation

Method (What do you want to find out? How did you go about it?)

Findings (With accompanying evidence, any code & data visuals)

Any challenges/issues, and how you resolved them

Conclusion

 

Notes:

Take care to take legible screenshots of the important parts of your code. 

Have the application open for practical demonstration.

You will need to make clear use of at least 4 of the datasets.

 

This task is to be completed individually. It will be graded on insight derived, technical content and professional practice.

 

Marks are available for clear data visualisation, the quality of your code, and how well your presentation conveys your interpretation of and insight into the data.

 
