Background. Social data analytics have become a vital asset for organizations and governments. For example, over the last few years, governments started to extract knowledge and derive insights from vastly growing open/social data to personalize the advertisements in elections, improve government services, predict intelligence activities, as well as to improve national security and public health. A challenge in analyzing social data is to model the data in graph models and query the social data graph to understand the hidden insights. In this assignment you will explore Big Data Technologies for organizing and querying social data graphs.
Resource: Beheshti, A., Ghodratnama, S., Elahi, M., & Farhood, H. (2022). Social Data Analytics (1st ed.). CRC Press. https://doi.org/10.1201/9781003260141
Dataset. The Twitter dataset, including 10k tweets, is available on iLearn.
Twitter1 serves many objects as JSON2, including Tweets and Users. These objects all encapsulate core attributes that describe the object. Each Tweet has an author, a message, a unique ID, a timestamp of when it was posted, and sometimes geo metadata shared by the user. Each User has a Twitter name, an ID, a number of followers, and most often an account bio.
With each Tweet, Twitter generates 'entity' objects, which are arrays of common Tweet contents such as hashtags, mentions, media, and links. If there are links, the JSON payload can also provide metadata such as the fully unwound URL and the webpage’s title and description.
So, in addition to the text content itself, a Tweet can have over 140 attributes associated with it. Let’s start with an example Tweet:
1 https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json
2 JSON is based on key-value pairs, with named attributes and associated values. These attributes, and their state, are used to describe objects.
Figure 1. A sample Tweet.
The following JSON illustrates the structure3 for these objects and some of their attributes:
Figure 2. Illustrating the structure of a Tweet using JSON.
Part 1. Data Curation (10%)
Write a program (in Python) to apply data cleaning/curation on the 10k Tweet Dataset. A sample 50 cleaned Tweets are provided to help you with this step.
Part 2. Graph Data Model (40%)
Considering the following Graph Data Model for Tweets in Twitter, in Figure 3, Write a program (in Python) to generate an RDF4 graph model from the 10k Tweet Dataset.
Note that the data required to create the “FOLLOWS” relationship shown in this model does not exist in the source dataset and will be inferred as an exercise during Part 3. For the remaining relationships, data can be imported to Neo4j using the APOC support for importing JSON. See appendix for information.
CS 340 Milestone One Guidelines and Rubric Overview: For this assignment, you will implement the fundamental operations of create, read, update,
Retail Transaction Programming Project Project Requirements: Develop a program to emulate a purchase transaction at a retail store. This
7COM1028 Secure Systems Programming Referral Coursework: Secure
Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip
CS 340 Final Project Guidelines and Rubric Overview The final project will encompass developing a web service using a software stack and impleme