HW4 - Analyzing Bias in Networks
DSCI 531 - Spring 2022 - University of Southern California
1 Overview
In this homework, you will conduct basic network analyses on two different networks, perform link prediction on one of them, and analyze bias in networks.
2 Dataset
The dataset consists of the friendship networks at UChicago and Caltech. Each node represents a person with a gender attribute of 0, 1, or 2: 1 and 2 are two genders, and 0 means the gender is not specified.
3 Tasks
3.1 Task1 – Network Analysis
1. For each network, calculate the centrality scores of the nodes, including PageRank, betweenness centrality, degree centrality, and eigenvector centrality. You can use network analysis tools and libraries for this part; for example, networkx includes all the needed algorithms. Separate these centrality scores by gender and compare them. Which network has the larger gender gap in terms of centrality? Which centrality score(s) show(s) this gender gap? Give your insights on why this network has a larger gender gap on the centrality score(s).
2. For each network, use Spearman’s rank correlation to find the two most similar centrality scores. Why are these two scores more strongly correlated than the others?
3. For each network, calculate the clustering coefficient of the nodes. Calculate the Spearman’s rank correlation between the clustering coefficient and each of the four centrality scores. Which centrality score has the least correlation with the clustering coefficient? Please give your insights.
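The three steps above can be sketched with networkx and SciPy as follows. This is only an illustrative starting point, not a reference solution: the helper names (`centrality_table`, `mean_by_gender`, `spearman`), the node attribute name `gender`, and the toy graph with arbitrarily assigned genders are all assumptions, since the actual data format of the UChicago and Caltech networks is not specified here.

```python
from itertools import combinations

import networkx as nx
from scipy.stats import spearmanr


def centrality_table(G):
    """Return {measure name: {node: score}} for the four centralities and clustering."""
    return {
        "pagerank": nx.pagerank(G),
        "betweenness": nx.betweenness_centrality(G),
        "degree": nx.degree_centrality(G),
        "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
        "clustering": nx.clustering(G),
    }


def mean_by_gender(scores, G, attr="gender"):
    """Average a {node: score} mapping within each gender group (attr name is assumed)."""
    groups = {}
    for node, value in scores.items():
        groups.setdefault(G.nodes[node].get(attr, 0), []).append(value)
    return {g: sum(v) / len(v) for g, v in groups.items()}


def spearman(scores_a, scores_b):
    """Spearman's rank correlation between two {node: score} mappings."""
    nodes = list(scores_a)
    rho, _ = spearmanr([scores_a[n] for n in nodes], [scores_b[n] for n in nodes])
    return rho


# Toy stand-in graph; the gender labels here are arbitrary, for illustration only.
G = nx.karate_club_graph()
for i, n in enumerate(G.nodes):
    G.nodes[n]["gender"] = 1 + (i % 2)

table = centrality_table(G)

# Step 1: compare mean centrality per gender.
gender_means = {name: mean_by_gender(table[name], G) for name in table}

# Step 2: pairwise Spearman correlation between the four centrality scores.
names = ["pagerank", "betweenness", "degree", "eigenvector"]
pairs = {(a, b): spearman(table[a], table[b]) for a, b in combinations(names, 2)}
most_similar = max(pairs, key=pairs.get)

# Step 3: correlation of each centrality with the clustering coefficient.
vs_clustering = {name: spearman(table[name], table["clustering"]) for name in names}
least_correlated = min(vs_clustering, key=lambda name: abs(vs_clustering[name]))
```

Comparing group means is just one simple way to look at a gender gap; comparing full distributions (for example with histograms or medians) may be more informative on the real networks.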
3.2 Task2 – Link Prediction
Use the Caltech network for this task. We want to perform link prediction on it. In link prediction, we have positive edges and negative edges. Positive edges are edges that are in the graph, and negative edges are edges that are not in the graph; in other words, negative edges are the edges of the complement of the graph. We train the link prediction model on a fraction of the positive edges, and we test the model on how well it can retrieve the remaining positive edges.2
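As a minimal sketch of this setup, the positive edges can be split at random into a training graph and a held-out test set, and the negative edges can be enumerated as the edges of the complement. The function name `split_edges`, the 80/20 split, the fixed seed, and the toy graph are assumptions made for illustration; the assignment does not prescribe them.

```python
import random

import networkx as nx


def split_edges(G, train_fraction=0.8, seed=0):
    """Randomly split the positive edges of G into a training graph and test edges."""
    rng = random.Random(seed)
    edges = list(G.edges())
    rng.shuffle(edges)
    cut = int(train_fraction * len(edges))
    train_edges, test_edges = edges[:cut], edges[cut:]

    # The training graph keeps every node (with attributes) but only the training edges.
    G_train = nx.Graph()
    G_train.add_nodes_from(G.nodes(data=True))
    G_train.add_edges_from(train_edges)
    return G_train, test_edges


# Toy stand-in for the Caltech network.
G = nx.karate_club_graph()
G_train, test_edges = split_edges(G)

# Negative edges: node pairs that are not connected in the original graph.
negative_edges = list(nx.non_edges(G))
```

A random split can leave some nodes with no training edges at all; whether that matters depends on the link prediction model used.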
For evaluation, for each node, we first retrieve the top-k incident edges as ranked by the scores given by the model, and then count how many of the retrieved edges are test edges; dividing this count by k gives precision@k for this node. The average precision@k over all the nodes is used to evaluate the model’s performance on the entire graph.
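The evaluation described above can be sketched as follows. Several details are assumptions rather than part of the assignment: the candidate edges for a node are taken to be all node pairs not already connected in the training graph, a node with fewer than k candidates is still divided by k, and a simple common-neighbors count stands in for the model's score function.

```python
import networkx as nx


def common_neighbors_score(G, u, v):
    """Stand-in scoring model: number of neighbors shared by u and v."""
    return len(list(nx.common_neighbors(G, u, v)))


def precision_at_k(G_train, test_edges, score, k=10):
    """Average per-node precision@k of a link scoring function."""
    test_set = {frozenset(e) for e in test_edges}
    precisions = []
    for u in G_train.nodes():
        # Candidate edges incident to u: pairs not already linked in training.
        candidates = [
            v for v in G_train.nodes() if v != u and not G_train.has_edge(u, v)
        ]
        # Keep the k highest-scoring candidates.
        ranked = sorted(
            candidates, key=lambda v: score(G_train, u, v), reverse=True
        )[:k]
        hits = sum(frozenset((u, v)) in test_set for v in ranked)
        precisions.append(hits / k)
    return sum(precisions) / len(precisions)


# Tiny worked example: a path 0-1-2-3 for training, with edge (0, 2) held out.
toy_train = nx.Graph([(0, 1), (1, 2), (2, 3)])
toy_test = [(0, 2)]
toy_precision = precision_at_k(toy_train, toy_test, common_neighbors_score, k=1)
# Nodes 0 and 2 each retrieve the held-out edge; nodes 1 and 3 do not,
# so the average precision@1 is 2 / 4 = 0.5.
```

Any function with the signature `score(G, u, v)` can be passed in place of `common_neighbors_score`, which makes it easy to compare several link prediction heuristics or learned models under the same metric.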
2 This is not the only way to evaluate link prediction. In some works, the model is also evaluated on how badly it retrieves negative edges, which we will not consider in this assignment.