The code here is part of my work studying the impact of anti-immigration laws on employment. To study the effect of an event, a standard approach is to compare two groups that are broadly similar, where one received the treatment (e.g., passage of an anti-immigration law) and the other did not. We identify the event’s impact from the difference in the two groups’ outcomes before and after treatment.
For our purpose, we construct a series of county pairs in which one county is in a state that passed an anti-immigration law, and the other is in a neighboring state without similar laws. We ensure the two counties in each pair are highly alike by computing similarity scores based on each county’s industrial composition and keeping only the top 5% of pairs by score. Here is how I did it:
library(data.table)  # fread, fast joins, keyed lookups
library(dplyr)       # sample_n

setwd('E:/Github/proj1')
qwi = fread('qwiNeeded.csv')
qwibad = readRDS('qwi_datacheck.rds')

# Keep all-race, all-ethnicity, all-education aggregates only
qwi = qwi[race == 'A0' & ethnicity == 'A0' & education == 'E0']

# Drop county-industry cells flagged as bad in the data check
qwi = merge(qwi, qwibad, by = c('geo', 'industry'), all.x = TRUE)
qwi = qwi[bad == 0]
names(qwi)[7] = 'Emp'  # rename the employment column
sample_n(qwi, 10)
## geo industry dt race ethnicity education Emp HirA HirAS
## 1: 47081 52 2012-07-01 A0 A0 E0 46 NA NA
## 2: 13085 42 2011-01-01 A0 A0 E0 1046 130 95
## 3: 39027 72 2011-04-01 A0 A0 E0 1618 415 94
## 4: 28001 71 2010-04-01 A0 A0 E0 418 53 25
## 5: 17069 48-49 2014-07-01 A0 A0 E0 80 8 7
## 6: 28115 23 2011-07-01 A0 A0 E0 360 60 28
## 7: 13177 52 2010-10-01 A0 A0 E0 146 19 8
## 8: 21213 23 2011-04-01 A0 A0 E0 316 78 16
## 9: 35025 53 2014-07-01 A0 A0 E0 638 108 46
## 10: 13287 92 2010-07-01 A0 A0 E0 267 26 28
## statefipsn cntyfipsn bad
## 1: 47 81 0
## 2: 13 85 0
## 3: 39 27 0
## 4: 28 1 0
## 5: 17 69 0
## 6: 28 115 0
## 7: 13 177 0
## 8: 21 213 0
## 9: 35 25 0
## 10: 13 287 0
naics = unique(qwi$industry)           # industry codes present in the data
naicsdt = data.table(industry = naics)
naicsdt = naicsdt[industry != '00']    # drop the all-industry total
geo = unique(qwi$geo)
edu = unique(qwi$education)
time = unique(qwi$dt)
Some counties in qwi are missing one or more industries; we add those combinations back and fill the cells with NA:
qwiIndus = qwi[dt=='2010-01-01' , .(geo, industry, Emp)]
framework = as.data.table(expand.grid(geo = unique(qwiIndus$geo), industry = unique(qwiIndus$industry)))
qwiIndus = merge(framework, qwiIndus, by = c('geo','industry'), all.x = T)
qwiIndus = qwiIndus[,statefipsn := floor(geo/1000)][,cntyfipsn := geo - statefipsn*1000] # get state/cnty fips
sample_n(qwiIndus, 10)
## geo industry Emp statefipsn cntyfipsn
## 1: 1105 22 NA 1 105
## 2: 21165 21 NA 21 165
## 3: 18161 00 1306 18 161
## 4: 47087 31-33 169 47 87
## 5: 8051 92 614 8 51
## 6: 13079 00 1638 13 79
## 7: 18121 00 3272 18 121
## 8: 26043 52 345 26 43
## 9: 16081 42 67 16 81
## 10: 39119 23 897 39 119
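The expand.grid-plus-merge pattern above can be seen on a toy example (hypothetical mini-data, not the QWI file): crossing every county with every industry and left-joining the observed rows leaves NA in the combinations that were missing.

```r
library(data.table)

# Hypothetical mini-dataset: county 1 is missing industry '22',
# county 2 is missing industries '11' and '23'
obs = data.table(geo = c(1, 1, 2),
                 industry = c('11', '23', '22'),
                 Emp = c(10, 20, 30))

# Cross every county with every industry, then left-join the observed rows
frame = as.data.table(expand.grid(geo = unique(obs$geo),
                                  industry = unique(obs$industry),
                                  stringsAsFactors = FALSE))
full = merge(frame, obs, by = c('geo', 'industry'), all.x = TRUE)
full  # 6 rows; the 3 unobserved combinations have Emp = NA
```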
Here we create a variable called pctInState that captures each county’s industrial composition. We first compute each industry’s share of a county’s total employment (EmpShareCounty), then convert that share into its percentile rank among counties in the same state; that percentile rank is stored in pctInState. This variable will later be used to calculate two counties’ similarity score:
# Each industry's share of the county's total employment
qwiIndus = qwiIndus[, EmpTotCounty := sum(Emp, na.rm = TRUE), by = .(geo)]
qwiIndus = qwiIndus[, EmpShareCounty := Emp/EmpTotCounty]
# Percentile rank of that share among counties in the same state
qwiIndus = qwiIndus[!is.na(EmpShareCounty),
                    pctInState := rank(EmpShareCounty)/length(EmpShareCounty),
                    by = c('statefipsn','industry')]
qwiIndus = qwiIndus[order(statefipsn, industry, EmpShareCounty)]
setkey(qwiIndus, geo, industry)  # key used later for fast lookups in matchScores
head(qwiIndus, 20)
## geo industry Emp statefipsn cntyfipsn EmpTotCounty EmpShareCounty
## 1: 1001 00 12364 1 1 24658 0.501419418
## 2: 1001 11 169 1 1 24658 0.006853759
## 3: 1001 21 43 1 1 24658 0.001743856
## 4: 1001 22 133 1 1 24658 0.005393787
## 5: 1001 23 601 1 1 24658 0.024373429
## 6: 1001 31-33 1305 1 1 24658 0.052924000
## 7: 1001 42 262 1 1 24658 0.010625355
## 8: 1001 44-45 2064 1 1 24658 0.083705086
## 9: 1001 48-49 288 1 1 24658 0.011679779
## 10: 1001 51 104 1 1 24658 0.004217698
## 11: 1001 52 321 1 1 24658 0.013018087
## 12: 1001 53 165 1 1 24658 0.006691540
## 13: 1001 54 356 1 1 24658 0.014437505
## 14: 1001 55 NA 1 1 24658 NA
## 15: 1001 56 268 1 1 24658 0.010868684
## 16: 1001 61 1453 1 1 24658 0.058926109
## 17: 1001 62 1488 1 1 24658 0.060345527
## 18: 1001 71 81 1 1 24658 0.003284938
## 19: 1001 72 1937 1 1 24658 0.078554627
## 20: 1001 81 439 1 1 24658 0.017803553
## pctInState
## 1: 0.5671642
## 2: 0.4696970
## 3: 0.5263158
## 4: 0.5102041
## 5: 0.5151515
## 6: 0.1791045
## 7: 0.3166667
## 8: 0.8805970
## 9: 0.2968750
## 10: 0.4347826
## 11: 0.5873016
## 12: 0.8260870
## 13: 0.7222222
## 14: NA
## 15: 0.3396226
## 16: 0.6865672
## 17: 0.5074627
## 18: 0.4705882
## 19: 0.9846154
## 20: 0.9491525
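The rank(x)/length(x) transform used above maps each value to its within-group percentile; a quick standalone check with toy numbers (not QWI values) shows how it behaves:

```r
# Toy employment shares for five counties in one state-industry cell
shares = c(0.02, 0.10, 0.05, 0.30, 0.07)

# rank()/length() converts shares to percentiles in (0, 1]:
# the largest share maps to exactly 1, the smallest to 1/n
pct = rank(shares) / length(shares)
pct  # 0.2 0.8 0.4 1.0 0.6
```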
The chunk below defines the function getCountyPairs, which takes a vector of state FIPS codes (the first being the treated state, the rest its controls) and generates all possible treated-control county pairs:
getCountyPairs = function(statelist) {
  treat_state = statelist[1]     # first element: treated state
  control_state = statelist[-1]  # remaining elements: control states
  treat_counties = unique(qwiIndus[statefipsn %in% treat_state, geo])
  control_counties = unique(qwiIndus[statefipsn %in% control_state, geo])
  # Cross join: every treated county paired with every control county
  pairs = as.data.table(expand.grid(county1 = treat_counties,
                                    county2 = control_counties))
  return(pairs)
}
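The pairing step itself is just a cross join; with hypothetical county lists (the real function pulls them from qwiIndus) it looks like this:

```r
library(data.table)

# Hypothetical county FIPS lists: two treated counties, three controls
treat_counties = c(1001, 1003)
control_counties = c(12033, 12091, 28059)

# Every treated county crossed with every control county
pairs = as.data.table(expand.grid(county1 = treat_counties,
                                  county2 = control_counties))
nrow(pairs)  # 2 x 3 = 6 candidate pairs
```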
The following chunk defines the function matchScores, which calculates the correlation between the industrial compositions (the pctInState vectors) of the two counties in each pair and uses that correlation as the similarity score:
matchScores = function(countyPairs) {
  scores = numeric()
  for (i in 1:nrow(countyPairs)) {
    # if (i %% 1000 == 0) {print(i)}  # optional progress indicator
    geo1 = countyPairs[i, county1]
    geo2 = countyPairs[i, county2]
    # Keyed lookups on qwiIndus (key: geo, industry)
    vec1 = qwiIndus[.(geo1), pctInState]
    vec2 = qwiIndus[.(geo2), pctInState]
    scores[i] = cor(vec1, vec2, use = 'pairwise.complete.obs', method = 'pearson')
  }
  return(scores)
}
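Since cor() accepts a matrix and computes all pairwise column correlations at once, the loop above could in principle be replaced by casting pctInState into an industry-by-county matrix and calling cor() a single time. A minimal sketch on toy data (hypothetical values, not the author's code):

```r
library(data.table)

# Toy pctInState values: three counties, three industries (hypothetical)
toy = data.table(geo = rep(c(101, 102, 201), each = 3),
                 industry = rep(c('11', '23', '52'), times = 3),
                 pctInState = c(0.2, 0.8, 0.5,
                                0.3, 0.7, 0.6,
                                0.9, 0.1, 0.4))

# Cast to a wide industry-by-county matrix, then one cor() call
wide = dcast(toy, industry ~ geo, value.var = 'pctInState')
m = as.matrix(wide[, -1])
simm = cor(m, use = 'pairwise.complete.obs')
simm['101', '102']  # similarity score for the pair (101, 102)
```

This trades the per-pair loop for one vectorized call, at the cost of holding the full county-by-county correlation matrix in memory.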
The states of interest are Indiana, Arizona, Utah, Alabama, Georgia, and South Carolina, all of which passed harsh anti-immigration laws in the early 2010s. The chunk below creates six vectors that specify these states (first element) and their neighboring states that do not have such laws:
stateIN = c(18,26,17,21,39)
stateAZ = c(4,32,8,35,6)
stateUT = c(49,32,16,56,8)
stateAL = c(1,12,28,47)
stateGA = c(13,37,12,47)
stateSC = c(45,37)
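In each vector, the first FIPS code is the treated state and the rest are its control neighbors; a small lookup using the standard state FIPS codes makes that explicit (the fips_names table here is my own illustration, not part of the original code):

```r
# Standard state FIPS codes for the states used in the vectors above
fips_names = c('1' = 'AL', '4' = 'AZ', '12' = 'FL', '13' = 'GA', '17' = 'IL',
               '18' = 'IN', '21' = 'KY', '26' = 'MI', '28' = 'MS', '39' = 'OH',
               '45' = 'SC', '47' = 'TN')

stateAL = c(1, 12, 28, 47)
fips_names[as.character(stateAL)]  # AL first, then control neighbors FL, MS, TN
```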
Now we can run the previously defined functions on a list of states. For each county in Alabama, how similar is it to the counties in the neighboring states? Let’s build that table:
pairs_AL = getCountyPairs(stateAL)
scores_AL = matchScores(pairs_AL)
scores_AL_dt = cbind(pairs_AL, scores_AL)
scores_AL_Jefferson = scores_AL_dt[county1 == 1073]
sample_n(scores_AL_Jefferson, 15)
## county1 county2 scores_AL
## 1: 1073 47187 0.77440535
## 2: 1073 28057 -0.39996481
## 3: 1073 28093 -0.18778672
## 4: 1073 28109 -0.10970364
## 5: 1073 47057 -0.58552775
## 6: 1073 47105 0.05074975
## 7: 1073 47183 0.18787882
## 8: 1073 12111 0.07783896
## 9: 1073 12029 -0.90976931
## 10: 1073 12127 0.33768272
## 11: 1073 47165 0.46295008
## 12: 1073 47103 -0.01739118
## 13: 1073 28129 0.09167980
## 14: 1073 47155 0.24747681
## 15: 1073 47071 0.19843300
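As described in the introduction, only the top 5% of pairs by score are kept for the matched sample. One way to apply that cutoff, sketched on a hypothetical score table with the same shape as scores_AL_dt (the exact selection code is not shown in this chunk):

```r
library(data.table)

# Hypothetical score table in the same shape as scores_AL_dt
set.seed(1)
toy_scores = data.table(county1 = 1073,
                        county2 = 1:200,
                        scores = runif(200, -1, 1))

# Keep only pairs whose score is in the top 5% of the distribution
cutoff = quantile(toy_scores$scores, 0.95, na.rm = TRUE)
top_pairs = toy_scores[scores >= cutoff]
nrow(top_pairs)  # roughly 5% of 200, i.e. about 10 pairs
```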
We can easily visualize the county pairs’ similarity scores on a map. Let’s use Jefferson County, AL as an example:
library(choroplethr)  # county_choropleth

z1 = scores_AL_Jefferson[, -1]    # drop county1; keep county2 and its score
names(z1) = c('region', 'value')  # column names required by county_choropleth
# Shade Alabama counties too: Jefferson itself gets 1, all others -1
z2 = data.table(region = unique(scores_AL_dt$county1), value = -1)
z2[z2$region == 1073, 2] = 1
z1 = rbind(z1, z2)
z1[is.na(z1$value), 2] = -1       # counties with no score also get -1
county_choropleth(z1, state_zoom = c('alabama', 'tennessee', 'mississippi', 'florida'),
                  legend = 'Similarity Scores')
We can also do this for other states. Here are some examples:
Note: we did not include Utah in the figure above because Utah also passed some anti-immigration bill around the same time.
Note: Again, we did not include Alabama and South Carolina in the figure above because these two passed some anti-immigration bill around the same time.