LCA Warmup - revisited
In this document, we walk through the LCA warm-up again, but we break down the code input and output step-by-step.
library(tidyverse)
library(poLCA)
Load data.
week9 <- read.csv("week10-practice.csv", header = TRUE, skip = 3)
head(week9)
##      id    team manager jobTasksSDSA jobBalanceSDSA jobPaySDSA
## 1 W472V  Ethics  X8022Z            7              3          7
## 2 S227N     DSI  V8472W            1              6          3
## 3 N333E Finance  X7645X            7              7          7
## 4 K503P Finance  Y4468Y            2              2          4
## 5 J845U     R&D  X8066X            6              3          7
## 6 O834K Finance  Z7298V            3              3          4
##   jobRecognitionSDSA jobAdvancementSDSA jobCoworkersSDSA jobManagementSDSA
## 1                  7                  7                5                 4
## 2                  2                  2                7                 7
## 3                  7                  7                7                NA
## 4                  3                  4                3                 3
## 5                  7                  7                4                 4
## 6                  4                  4                4                 3
##   salary tenure age genderFM degreeBMP citizenNY
## 1 126460      6  30        2         3         2
## 2  47749      1  38        1         1         1
## 3  80828      6  24        2         3         1
## 4  52852      6  35        1         1         2
## 5  98429     10  46        2         3         2
## 6  45397      1  50        1         1         2
str(week9)
## 'data.frame': 1384 obs. of 16 variables:
## $ id : Factor w/ 1384 levels "A103R","A105D",..: 1208 975 708 549 509 792 972 637 235 311 ...
## $ team : Factor w/ 5 levels "DSI","Ethics",..: 2 1 3 3 5 3 5 5 5 3 ...
## $ manager : Factor w/ 18 levels "I511L","V4497X",..: 10 4 9 13 11 18 16 8 8 15 ...
## $ jobTasksSDSA : int 7 1 7 2 6 3 2 7 3 4 ...
## $ jobBalanceSDSA : int 3 6 7 2 3 3 3 4 6 5 ...
## $ jobPaySDSA : int 7 3 7 4 7 4 3 7 4 5 ...
## $ jobRecognitionSDSA: int 7 2 7 3 7 4 4 7 3 5 ...
## $ jobAdvancementSDSA: int 7 2 7 4 7 4 3 7 4 6 ...
## $ jobCoworkersSDSA : int 5 7 7 3 4 4 4 5 7 7 ...
## $ jobManagementSDSA : int 4 7 NA 3 4 3 5 6 7 6 ...
## $ salary : int 126460 47749 80828 52852 98429 45397 171648 53330 145440 71567 ...
## $ tenure : int 6 1 6 6 10 1 8 7 2 8 ...
## $ age : int 30 38 24 35 46 50 34 53 22 42 ...
## $ genderFM : int 2 1 2 1 2 1 2 2 2 2 ...
## $ degreeBMP : int 3 1 3 1 3 1 3 1 3 2 ...
## $ citizenNY : int 2 1 1 2 2 2 2 2 1 2 ...
We want to determine whether there is a latent grouping in the data that accounts for the relationship among gender, education, and citizenship.
The poLCA package
The poLCA package was introduced in the walkthrough, but I was asked for a bit more clarity on its usage.
The idea behind LCA is to take a set of categorical dependent variables, such as survey responses, and account for the associations among them through membership in a small number of unobserved (latent) classes.
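One common way to write this idea down is sketched here. The notation is an assumption for illustration and is not taken from the walkthrough: there are J categorical items, item j has K_j response categories, there are R latent classes with shares p_r, and \pi_{jr}(k) is the probability that a member of class r gives response k on item j. Assuming the items are independent within a class, the probability of a full response pattern is a weighted sum over classes:

```latex
P(Y_1 = y_1, \dots, Y_J = y_J)
  \;=\; \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \pi_{jr}(y_j),
\qquad
\sum_{r=1}^{R} p_r = 1,
\quad
\sum_{k=1}^{K_j} \pi_{jr}(k) = 1 .
```

In the poLCA output, the p_r correspond to the "Estimated class population shares" and the \pi_{jr}(k) to the "Conditional item response (column) probabilities".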
poLCA requires a formula with two parts: the dependent variables (the multiple outcomes or survey items) on the left, and the independent variables (the predictors of class membership) on the right. The dependent variables must be wrapped in cbind(); the independent variables are not.
Important: predictors for class membership are not required. If you just want to determine latent classes for a set of variables without a predictor, then simply use a 1 on the right side of the ~. See comments below.
For our hypothesis, we have:
## for ease, write formula into object
## form:
## dependent variables are on the lefthand side, and need cbind()
## the formula is specified by a tilde, ~
## the predictors of class membership are on the righthand side, normal formula specification (iv1 + iv2, etc)
## if '~ 1', then there is no predictor of class membership
unconditionalLCA <- cbind(genderFM, degreeBMP, citizenNY) ~ 1
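To make the two sides of the formula concrete, here is a small sketch that puts the unconditional formula next to a conditional one. The choice of tenure and age as predictors is hypothetical and only for illustration; it is not part of the hypothesis in this document. The sketch uses base R only, so it runs without the data or poLCA loaded.

```r
# Unconditional model from the text: no predictors of class membership
unconditionalLCA <- cbind(genderFM, degreeBMP, citizenNY) ~ 1

# Hypothetical conditional model: tenure and age predict class membership
# (this choice of predictors is for illustration only)
conditionalLCA <- cbind(genderFM, degreeBMP, citizenNY) ~ tenure + age

# Both are ordinary R formula objects, so base R tools can inspect them
class(conditionalLCA)                          # "formula"
all.vars(conditionalLCA[[2]])                  # the dependent (manifest) variables
attr(terms(conditionalLCA), "term.labels")     # the predictors
attr(terms(unconditionalLCA), "term.labels")   # character(0): no predictors
```

Either object can be passed as the first argument to poLCA() in the same way as unconditionalLCA.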
In addition to the formula, poLCA takes a number of options; see the commented code below.
set.seed(105)
unconditional2class <- poLCA(unconditionalLCA, # the formula for the model
                             week9,            # the data
                             nclass = 2,       # number of classes to fit
                             maxiter = 10000,  # maximum number of iterations
                             tol = 1e-8,       # convergence tolerance: estimation stops once the log-likelihood improves by less than this from one iteration to the next. 1e-10 is very conservative, can relax to 1e-6.
                             nrep = 1,         # number of times to estimate the model, each from different starting values. complex models sometimes need multiple reps to increase confidence in the solution
                             verbose = FALSE   # I don't want all the output for this document!
)
unconditional2class
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$genderFM
Pr(1) Pr(2)
class 1: 0.2009 0.7991
class 2: 0.8026 0.1974
$degreeBMP
Pr(1) Pr(2) Pr(3)
class 1: 0.1412 0.1875 0.6713
class 2: 0.5930 0.4041 0.0029
$citizenNY
Pr(1) Pr(2)
class 1: 0.2085 0.7915
class 2: 0.2018 0.7982
Estimated class population shares
0.5113 0.4887
Predicted class memberships (by modal posterior prob.)
0.4841 0.5159
=========================================================
Fit for 2 latent classes:
=========================================================
number of observations: 1384
number of estimated parameters: 9
residual degrees of freedom: 2
maximum log-likelihood: -3043.681
AIC(2): 6105.362
BIC(2): 6152.457
G^2(2): 0.2288443 (Likelihood ratio/deviance statistic)
X^2(2): 0.2285847 (Chi-square goodness of fit)
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND
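The parameter count, residual degrees of freedom, AIC, and BIC in the fit table above can be reproduced by hand, which is a useful check on how the output is read. The sketch below uses only the numbers printed above and the usual definitions (AIC = -2 logLik + 2 * parameters, BIC = -2 logLik + log(N) * parameters); the variable names are my own.

```r
# Values read from the printed output above
n_obs   <- 1384
loglik  <- -3043.681
n_class <- 2
n_cat   <- c(genderFM = 2, degreeBMP = 3, citizenNY = 2)  # categories per item

# Free parameters: (K_j - 1) response probabilities per item per class,
# plus (n_class - 1) class shares
n_par <- n_class * sum(n_cat - 1) + (n_class - 1)

# Residual df: number of distinct response patterns minus 1, minus the
# free parameters (fine here because N is far larger than the 12 patterns)
resid_df <- prod(n_cat) - 1 - n_par

aic <- -2 * loglik + 2 * n_par
bic <- -2 * loglik + log(n_obs) * n_par

c(n_par = n_par, resid_df = resid_df)   # 9 and 2, as in the output
round(c(AIC = aic, BIC = bic), 3)       # compare with AIC(2) and BIC(2) above
```

With only 2 residual degrees of freedom, three classes would need 14 free parameters against 11 available, so this set of three items cannot support more than two classes without further constraints.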
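The model above was estimated very quickly, but the ALERT line at the end of its printed output shows that the maximum likelihood was not found: the estimation used all 10000 iterations without the log-likelihood change falling below tol, so it did not converge. I will raise maxiter from 10000 to 15000 to see if a solution is reached. Note that the call below also sets tol to 1e-10, which is stricter than the 1e-8 used above.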
set.seed(105)
unconditional2class <- poLCA(unconditionalLCA, # the formula for the model
                             week9,            # the data
                             nclass = 2,       # number of classes to fit
                             maxiter = 15000,  # maximum number of iterations (raised from 10000)
                             tol = 1e-10,      # convergence tolerance: estimation stops once the log-likelihood improves by less than this from one iteration to the next. 1e-10 is very conservative, can relax to 1e-6.
                             nrep = 1,         # number of times to estimate the model, each from different starting values. complex models sometimes need multiple reps to increase confidence in the solution
                             verbose = FALSE   # I don't want all the output for this document!
)
unconditional2class
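Rather than scanning the printed output for the ALERT line, convergence can also be checked from the fitted object. The sketch below assumes the fit carries numiter (iterations used) and maxiter (the cap) components, as poLCA fits do to my knowledge, and treats a run that used every allowed iteration as not converged. The helper name reached_ml and the stand-in lists are hypothetical; the iteration counts in them are made up so the example runs without the data.

```r
# Hypothetical helper: TRUE when the run stopped before hitting maxiter.
# Assumes `fit` has the `numiter` and `maxiter` components of a poLCA fit.
reached_ml <- function(fit) {
  fit$numiter < fit$maxiter
}

# Stand-in objects with made-up iteration counts, so the sketch runs
# without the data; with a real fit, call reached_ml(unconditional2class).
hit_cap  <- list(numiter = 10000, maxiter = 10000)
finished <- list(numiter = 412,   maxiter = 15000)

reached_ml(hit_cap)    # FALSE: every allowed iteration was used
reached_ml(finished)   # TRUE: stopped early because the tolerance was met
```

If the check still fails after raising maxiter, relaxing tol or setting nrep above 1 are the other options already described in the code comments.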