INSTRUCTIONS TO CANDIDATES

Detailed Requirements

An important experiential component of the course is the completion of the report assignment, which gives students an opportunity to reflect on their learning. Students are encouraged to work in groups (2-3 students per group), but it is also fine to complete this assignment individually. Students need to pick the project topic on their own. Based on the topic, students need to find a suitable dataset, select a subset of variables, describe the data using statistics and graphs, build a model to answer the business question, and write a report summarizing the findings.

The data set can come from any source as long as it is something that interests you. Possible sources of data include:

o Sports statistics, such as those from the Baseball Archive: http://www.seanlahman.com/baseball-archive/statistics

o Kaggle datasets: https://www.kaggle.com/datasets

o FiveThirtyEight: http://fivethirtyeight.com/

o Philly Open Data: https://www.opendataphilly.org/

o Data Gov: https://www.data.gov/

o Guardian Data Blog: http://www.theguardian.com/data

o Flowing Data: http://flowingdata.com/

o Financial Times Data Blog: http://blogs.ft.com/ftdata/

o Our World in Data: https://ourworldindata.org/

o Pew Research Data: http://www.pewresearch.org/data/

o Reddit / Data is Beautiful: https://www.reddit.com/r/dataisbeautiful/

If you cannot decide on a suitable dataset, please use the Titanic dataset to complete the project. You can find the Titanic dataset at https://www.kaggle.com/.

The Report

The report must cover the following:

Determine the type of model (descriptive or predictive) based on the question.

Describe the question and your data in text.

Describe the data using summary statistics and graphs (presented as tables and charts).

Analyze the data to answer the business question (describe the model, run the model, and show the results).

Evaluate the model (assess how well it performs and whether it answers the question).

Discuss potential problems and improvements.
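The analysis steps above (describe the data, run a model, evaluate it) can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it uses Python rather than the tools recommended in this handout, the numbers are made up and do not come from any real dataset, and a simple linear regression is just one possible model choice, not a required one. The variable names (ad_spend, units_sold) and helper functions are hypothetical.

```python
import math
import statistics

# Made-up illustrative data (NOT from any real dataset):
# advertising spend (in $1000s) and units sold for eight hypothetical weeks.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
units_sold = [12.0, 15.0, 19.0, 22.0, 24.0, 29.0, 31.0, 36.0]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def evaluate(x, y, intercept, slope):
    """Return R-squared and RMSE of the fitted line on the given data."""
    predictions = [intercept + slope * xi for xi in x]
    y_mean = statistics.mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(y))
    return r_squared, rmse


# Step 1: describe the data. Step 2: run the model. Step 3: evaluate it.
summary = describe(units_sold)
intercept, slope = fit_line(ad_spend, units_sold)
r_squared, rmse = evaluate(ad_spend, units_sold, intercept, slope)

print("Summary of units sold:", summary)
print(f"Model: units_sold = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"R-squared = {r_squared:.3f}, RMSE = {rmse:.2f}")
```

In a report, the summary statistics would go into a table, the fitted line would be shown on a scatter plot, and R-squared and RMSE would support the model evaluation section. Because this sketch evaluates the model on the same data used to fit it, the scores are likely optimistic, which is the kind of limitation worth raising when discussing potential problems and improvements.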

The report should be a minimum of 5 pages (excluding the title page, table of contents, and reference page), with 1.5 line spacing, regular margins, and 12pt font.

Data Analysis and Visual Presentation

For the data analysis and visual presentation, you can use MS Excel.

Excel Data Analysis (by Springer): https://link.springer.com/book/10.1007/978-3-030-01279-3

For data analysis, you can also use RapidMiner. RapidMiner Online Training Videos: https://www.youtube.com/watch?v=9i05kf0AxoE

For visual analysis, you can also use Tableau. Tableau Online Training Videos: https://www.tableau.com/learn

Marking Criteria

Your project will be marked based on the following rubric.

