Big Data Analytics
I. Guidance
Your reassessment for Big Data Analytics is by resubmission of your original assessment. From the feedback on your first attempt, you will now be aware of which elements you were stronger on.
A good strategy is to work on any of your answers that scored less than 5/10.
For each of your answers, consider what you have been asked to add or improve. Remember that the word count for these questions still stands, so you may have to rewrite all or part of a section. Be careful not to lose the points for which you were awarded marks in your first attempt. Consider how you can add evidence to your answers; further reading and referencing (IEEE) may help.
For Q1: The feedback may indicate that you did not identify three data analysis questions, or that your questions were similar enough to be merged. Alternatively, you may have presented three questions but not justified some or all of them; in that case, justify them through wider reading. The feedback may also point to a lack of analysis, discussion and evaluation of the data when exploring the questions; here you can add some analysis and explain why you selected them.
For Q2, Q3 and Q4, you may have described your approach without detailing the tools and techniques you used to explore, prepare or analyse the data; name them and justify why you selected them. Alternatively, you may have discussed the tools and techniques reasonably well but provided few or no citations to support them. Adding citations to support your discussion and decisions will make your responses more trustworthy. It is also a good idea to consider “How, What and Why” when answering these questions.
Q5 asks for a correct summary of your results with some discussion and analysis. You may have provided some results with minimal discussion, with or without support from wider reading; an appropriate summary of your results, with discussion supported by citations from wider reading, is an effective way to achieve good marks. You may also not have mapped your results fully to the data analysis questions, or you may have given only a brief analysis and explanation; make sure your summary of the analysis maps to each of your data analysis questions.
Q6 focuses on how your results respond to your three data analysis questions. You may have provided answers that do not address all of your data analysis questions. You may also have attempted to respond to your questions but given limited discussion or drawn no conclusions; expand your conclusions and map them to the data analysis questions. Your answer may be strengthened by suggestions for further analysis, supported by wider reading.
Q7, Q9 and Q10 call for research-based responses that depend on the scenario; the reading materials given in the module content, as well as further wider reading, can help you answer them. For Q7 in particular, you may have described threats with little validity or in too little detail; identify appropriate threats, explain them, and discuss potential solutions with support from wider reading. For Q9, you can provide more discussion of relevant tools and technologies and evidence of their potential benefit over a conventional DBMS. For Q10, you may have given limited discussion of privacy issues or no supporting sources; cite relevant sources to demonstrate the required research activity, with a good level of discussion.
Think carefully when answering Q8. You may have provided a schema that lacks evidence of design: for example, only a few tables with few primary and foreign keys or joins/relations. Adding more tables, with appropriate attributes in each, would help you achieve more marks. You also need to present evidence of the data conversion, which must be compatible with WEKA.
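As an illustration of what a WEKA-compatible conversion might look like, the sketch below turns CSV text into WEKA's ARFF format using only the Python standard library. It is a minimal sketch under stated assumptions, not a required method: the function names (`csv_to_arff`, `quote`, `is_number`), the relation name and the sample columns are hypothetical, and attribute types are inferred naively as either numeric or string. WEKA can also open .csv files directly.

```python
import csv
import io


def is_number(value):
    """Return True if the text parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def quote(value):
    """Single-quote a name or value that contains characters ARFF treats specially."""
    if any(ch in value for ch in " ,'\"%{}\t"):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return value


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = header) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + quote(relation), ""]
    for i, name in enumerate(header):
        # A column is numeric only if every non-empty value parses as a number.
        values = [row[i] for row in data if row[i] != ""]
        kind = "numeric" if values and all(is_number(v) for v in values) else "string"
        lines.append("@attribute " + quote(name) + " " + kind)
    lines += ["", "@data"]
    for row in data:
        # Empty cells become '?', which ARFF uses for missing values.
        lines.append(",".join("?" if v == "" else quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample rows; the real data set's columns may differ.
SAMPLE = (
    "Date,Location,New Cases\n"
    "3/1/2020,DKI Jakarta,2\n"
    "3/2/2020,Jawa Barat,\n"
)

if __name__ == "__main__":
    print(csv_to_arff(SAMPLE, "covid_sample"))
```

Dates are left as string attributes here; declaring them as ARFF date attributes, or declaring nominal attributes for categorical columns, is a design decision you would need to justify in your own answer.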
Overall, revisit your answers and feedback, see where you received a pass mark or slightly more or less than that, and think carefully about how you can improve those answers by adding further evidence. If you have not already done so, you can complete additional reading in order to add citations that support your answers. If you were penalised in your first submission for exceeding the limit on figures (maximum five), note that combining multiple relevant charts in one image is permissible, and that tables do not count as figures unless they are inserted as screenshots in the report.
For the following reasons, you may think that it is not necessary to improve your work for the resubmission. However, you can still consider improving your answers.
• You were awarded pass marks for some responses in your first submission, so you are focusing only on the responses that were awarded less than a pass mark.
• You were penalised for late submission and are thinking of resubmitting your original work unchanged because the non-penalised mark was a pass.
II. Module Learning Outcomes
1. Create a data set using modern database models and technology.
2. Manipulate a data set to extract statistics and features.
3. Critically evaluate and apply data mining techniques/tools to build a classifier or regression model, and predict values for new examples.
4. Analyse and communicate issues with scaling up to large data sets, and use appropriate techniques to scale up the computation.
5. Critically discuss the need for privacy, identify privacy risks in releasing information, and design techniques to mediate these risks.
This assessment will contribute to all the learning outcomes for this module.
III. Assessment Background/Scenario
Data
In the assessment submission point in Canvas, you will find a data set called covid_19_indonesia_time_series_all.csv, which describes the distribution and progression of COVID-19 across Indonesia over time, measured against local population. It is presented in .csv format and is therefore compatible with WEKA and Excel (the tools you may use in this assignment).
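Before opening the file in WEKA or Excel, it can help to check what each column contains. The sketch below is one possible way to do this with the Python standard library; it counts rows, missing values and distinct values per column. The function name `profile_columns` and the sample columns are hypothetical and are not taken from the actual data set.

```python
import csv
import io


def profile_columns(csv_file):
    """Count rows, missing values and distinct values for each CSV column."""
    reader = csv.DictReader(csv_file)
    names = reader.fieldnames or []
    stats = {name: {"rows": 0, "missing": 0, "distinct": set()} for name in names}
    for row in reader:
        for name in names:
            # Short rows yield None for absent cells; treat them as missing.
            value = (row[name] or "").strip()
            stats[name]["rows"] += 1
            if value == "":
                stats[name]["missing"] += 1
            else:
                stats[name]["distinct"].add(value)
    return {
        name: {"rows": s["rows"], "missing": s["missing"], "distinct": len(s["distinct"])}
        for name, s in stats.items()
    }


# Hypothetical sample rows; the real data set's columns may differ.
SAMPLE = (
    "Date,Location,New Cases\n"
    "3/1/2020,DKI Jakarta,2\n"
    "3/2/2020,Jawa Barat,\n"
)

if __name__ == "__main__":
    for column, summary in profile_columns(io.StringIO(SAMPLE)).items():
        print(column, summary)
```

To run it on the provided file, pass an open file handle instead of the sample, for example `open("covid_19_indonesia_time_series_all.csv", newline="", encoding="utf-8")`; the encoding shown is an assumption.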
IV. Assessment Task
Your task:
Use the attached data set, any information you can find about it elsewhere, and the techniques taught in the module to pose three questions on which to base your data analysis. You will then need to consider how you might store the (relevant) data in a database, how you might spread a very large version of that data over multiple computers, what the privacy concerns are here, and how you might address them.
You must produce a structured analysis report using the given template. The structured report consists of five sections, each containing specific questions which you must answer; there is also an additional space for your references.
In sections 2 and 3 of the report, which require you to use data analysis tools, you may use WEKA or Python tools. In section 2 you may also use Excel for visualization.