Jonathan Albright


Cambridge Analytica: the Geotargeting and Emotional Data Mining Scripts

Last year, Michael Phillips, a data science intern at Cambridge Analytica, posted the following scripts to a set of “work samples” on his personal GitHub account.

The GitHub profile, MichaelPhillipsData, is still around. It contains a selection of Phillips’ coding projects. Two of the “commits,” still online today, appear to be scripts that were used by Cambridge Analytica around the election. One of them even lists his email address. The rest of his current work, Phillips notes on his GitHub profile, he unfortunately “cannot share.”

The first of Phillips’ two election data processing GitHub scripts is titled GeoLocation.py, a list-completing and enrichment tool that can be used to:

“complete an array of addresses with accurate latitudes and longitudes using the completeAddress function. Includes another function compareAPItoSource for testing APIs with source latitude longitudes.”

Phillips describes the geolocation list completion script as performing the following tasks (to enrich clients’ personal information files):

“Essentially what it does is: For each address in the addresses file, try to get an accurate lng/lat quickly (comparing available data from Aristotle/IG to the zip code file data to determine accuracy), but if we can’t, we fetch it from ArcGIS.”
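The logic Phillips describes can be sketched roughly as follows. This is a minimal illustration, not the code from GeoLocation.py: the field names, the `geocode_fallback` callable (standing in for an ArcGIS lookup), and the 25 km accuracy threshold are all assumptions made for the example.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))

def complete_address(record, zip_centroids, geocode_fallback, max_km=25.0):
    """Return (lat, lng, source) for one address record.

    record: dict with 'zip' and optional 'lat'/'lng' (hypothetical field names)
    zip_centroids: dict mapping zip code -> (lat, lng) from a zip code file
    geocode_fallback: callable(record) -> (lat, lng); stand-in for a geocoding API
    max_km: assumed accuracy threshold, not taken from the original script
    """
    lat, lng = record.get("lat"), record.get("lng")
    centroid = zip_centroids.get(record.get("zip"))
    if lat is not None and lng is not None and centroid is not None:
        # Trust the source coordinates only if they fall near the zip centroid.
        if haversine_km(lat, lng, *centroid) <= max_km:
            return lat, lng, "source"
    # Otherwise pay for the slower external lookup.
    lat, lng = geocode_fallback(record)
    return lat, lng, "fallback"
```

The design point is simply that the cheap check (source coordinates against the zip code file) runs first, and the external service is only called when that check fails.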

>Don’t miss the line item called “Voter_ID”

line, line, line … “voter_ID?”

The second “work-related” script sitting on Phillips’ Github repo is called Twitteranalysis.py.

Phillips offers a quick starter for how the Twitter sentiment-mining code works:

For starters, we will just get sentiment from textBlob for tweets containing keywords like “Trump”, “Carson”, “Cruz”, “Bern”, “Bernie”, “guns”, “immigration”, “immigrants”, etc.

Twitteranalysis.py also finds the Twitter user IDs amongst the tweet sample it collects in order to “retrieve all the user’s recent tweets and favorites.”

Looking in more detail, it then:

  1. Separates users’ tweets into [control] groups containing each keyword
  2. Produces a “sentiment graph” of the whole group using textBlob and matplotlib

As a real-time social media mining tool built on common libraries like tweepy and matplotlib, this is neither science fiction nor especially complex. That, however, is not what makes the code interesting as a research object, as political evidence, and as a cultural artifact.

The most fascinating part of the Twitter sentiment-miner that Phillips posted is how it appears to pull users’ IDs and find their “recent tweets” and favorites to expand the company’s corpus of keywords around specific objects of election “outrage” sentiment (i.e., immigration, border control, etc.).

Looking below, nearly all “sentiments” within the lines of code involve “hot-button” 2016 election topics such as abortion, citizenship, naturalization, guns, the NRA, liberals, Obama, and Planned Parenthood.

See for yourself. Here’s the actual code:

#each sentiments list will have tuples: (sentiment, tweetID)

#note: could include many more keywords like “feelthebern” for example, but need neutral keywords to get true sentiments. feelthebern would be a biased term.

In any case, here are the “sentiments” the script was set to look for via Twitter’s API:

hillarySentiments = []
hillaryKeywords = ['hillary', 'clinton', 'hillaryclinton']

trumpSentiments = []
trumpKeywords = ['trump', 'realdonaldtrump']

cruzSentiments = []
cruzKeywords = ['cruz', 'tedcruz']

bernieSentiments = []
bernieKeywords = ['bern', 'bernie', 'sanders', 'sensanders']

obamaSentiments = []
obamaKeywords = ['obama', 'barack', 'barackobama']

republicanSentiments = []
republicanKeywords = ['republican', 'conservative']

democratSentiments = []
democratKeywords = ['democrat', 'dems', 'liberal']

gunsSentiments = []
gunsKeywords = ['guns', 'gun', 'nra', 'pistol', 'firearm', 'shooting']

immigrationSentiments = []
immigrationKeywords = ['immigration', 'immigrants', 'citizenship', 'naturalization', 'visas']

employmentSentiments = []
emplyomentKeywords = ['jobs', 'employment', 'unemployment', 'job']

inflationSentiments = []
inflationKeywords = ['inflate', 'inflation', 'price hike', 'price increase', 'prices rais']

minimumwageupSentiments = []
minimumwageupKeywords = ['raise minimum wage', 'wage increase', 'raise wage', 'wage hike']

abortionSentiments = []
abortionKeywords = ['abortion', 'pro-choice', 'planned parenthood']

governmentspendingSentiments = []
governmentspendingKeywords = ['gov spending', 'government spending', 'gov. spending', 'expenditure']

taxesupSentiments = []
taxesupKeywords = ['raise tax', 'tax hike', 'taxes up', 'tax up', 'increase taxes', 'taxes increase', 'tax increase']

taxesdownSentiments = []
taxesdownKeywords = ['lower tax', 'tax cut', 'tax slash', 'taxes down', 'tax down', 'decrease taxes', 'taxes decrease', 'tax decrease']

Drilling down to the list of terms that are linked to each election sentiment keyword (in the code as #(nameOfTuple, sentimentList, keywordList)), we can see:

personSentimentList = [
    ('hillary', hillarySentiments, hillaryKeywords),
    ('trump', trumpSentiments, trumpKeywords),
    ('cruz', cruzSentiments, cruzKeywords),
    ('bernie', bernieSentiments, bernieKeywords),
    ('obama', obamaSentiments, obamaKeywords),
]

issueSentimentList = [
    ('guns', gunsSentiments, gunsKeywords),
    ('immigration', immigrationSentiments, immigrationKeywords),
    ('employment', employmentSentiments, emplyomentKeywords),
    ('inflation', inflationSentiments, inflationKeywords),
    ('minimum wage up', minimumwageupSentiments, minimumwageupKeywords),
    ('abortion', abortionSentiments, abortionKeywords),
    ('government spending', governmentspendingSentiments, governmentspendingKeywords),
    ('taxes up', taxesupSentiments, taxesupKeywords),
    ('taxes down', taxesdownSentiments, taxesdownKeywords),
]

Phillips also provides a snippet of code “for taking random twitter IDs” to create a Twitter “control group.” This part of the code appears to “skim the most recent tweets that have mentioned one of our [Cambridge Analytica’s pre-defined] keywords.”

code snippet “for taking random twitter IDs for the control group”

In the notes within his code, Phillips explains the practicalities of sentiment mining. This was not a big-data exercise (i.e., “all the tweets”):

“it turned out that skimming all of the tweets found very very few occurances of keywords since twitter is such a global/multilingual platform.”
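To make the control-group idea concrete, here is a small sketch of drawing random user IDs from recent keyword-matching tweets. It is an illustration under assumptions, not the snippet from the script: the `(user_id, text)` input shape, the function name, and the substring matching are all hypothetical, and no Twitter API call is shown.

```python
import random

def sample_control_ids(tweets, keywords, k, seed=None):
    """Pick up to k distinct user IDs from tweets that mention any keyword.

    tweets: iterable of (user_id, text) pairs (assumed shape)
    keywords: lowercase keyword strings
    seed: optional seed so a sample can be reproduced
    """
    matched = sorted({
        user_id
        for user_id, text in tweets
        if any(keyword in text.lower() for keyword in keywords)
    })
    rng = random.Random(seed)
    return rng.sample(matched, min(k, len(matched)))
```

Sampling only from users who already mentioned a keyword reflects the constraint in Phillips’ note: a sample of all tweets would contain too few keyword hits to be useful.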

Next, Phillips provides a snippet to parse *any* text that CA was “looking for through non-tweets (like transcripts of some sort),” noting that the tool is set up to “find sentiment and adds [it] to the respective keywords’ data list”:

Interesting functionality, indeed. The code then continues with a function that, in Phillips’ words:

“goes through tweets of each user, looks for keywords, and if the keyword is there, we find the sentiment for that tweet and add it to the sentiment data list”
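A loop of this kind might look like the sketch below. It reuses the (name, sentimentList, keywordList) tuple layout quoted earlier, but everything else is an assumption: the original uses TextBlob for scoring, whereas `score_text` here is a toy word-list scorer included only so the example runs without third-party packages, and the substring matching is a guess at how keywords are detected.

```python
def score_text(text):
    """Toy polarity score in [-1, 1]; a stand-in for a real sentiment library."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = [w.strip(".,!?#@\"'").lower() for w in text.split()]
    hits = [1 if w in positive else -1 for w in words if w in positive or w in negative]
    return sum(hits) / len(hits) if hits else 0.0

def collect_sentiments(tweets, groups, scorer=score_text):
    """Append (sentiment, tweet_id) to every group whose keywords occur in a tweet.

    tweets: iterable of (tweet_id, text) pairs (assumed shape)
    groups: list of (name, sentiment_list, keyword_list) tuples
    """
    for tweet_id, text in tweets:
        lowered = text.lower()
        for _name, sentiments, keywords in groups:
            # Plain substring match: 'bern' also matches 'bernie'.
            if any(keyword in lowered for keyword in keywords):
                sentiments.append((scorer(text), tweet_id))
    return groups
```

Substring matching also produces false positives (for example, “gun” inside “begun”), which is one reason keyword-based sentiment needs a control group to interpret.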

sentiment roundup

Finally, the code compiles the collected and refined Twitter data into a set. Phillips describes this step as one that:

“compiles the sentiment data for each keyword group into an easier to work with format (dataframe) … it is only meaningful if compared with a control group, since keyword selection is impossible to employ neutrally.”
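The comparison Phillips alludes to can be illustrated with a short sketch. The original compiles the data into a dataframe; the version below uses plain dictionaries instead so it runs on the standard library alone, and the function names and the choice of mean sentiment as the summary statistic are assumptions made for the example.

```python
from statistics import mean

def summarize(groups):
    """Turn (name, sentiment_list, keyword_list) tuples into one summary row per group."""
    rows = []
    for name, sentiments, _keywords in groups:
        scores = [score for score, _tweet_id in sentiments]
        rows.append({
            "group": name,
            "n": len(scores),
            "mean_sentiment": mean(scores) if scores else None,
        })
    return rows

def compare_to_control(target_rows, control_rows):
    """Mean sentiment of the target minus the control, per keyword group."""
    control = {row["group"]: row["mean_sentiment"] for row in control_rows}
    diffs = {}
    for row in target_rows:
        base = control.get(row["group"])
        if row["mean_sentiment"] is not None and base is not None:
            diffs[row["group"]] = row["mean_sentiment"] - base
    return diffs
```

A raw mean for “guns” says little on its own, because the keyword list itself skews which tweets are captured; the difference from a control group measured with the same keywords is the more interpretable number.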

The final output of Twitteranalysis.py is a list of tweets and Twitter users (via user IDs) from a pre-defined set of keywords (abortion, NRA, Hillary, Obama, lower taxes, guns, immigration, liberals, etc.). All relate to #Election2016 campaign issues. The code also appears to be extensible: it can be used outside of Twitter, for example to mine the transcripts and recorded text from focus groups and survey respondents.

snippet for focus group transcripts (and surveys? or NLP?)

These scripts normally wouldn’t be that interesting. But given that both were added by a Cambridge Analytica intern (at least at the time), that they contain a running dialog of what the tools do, how they work, and why they were built, and that they are *still* available on GitHub, I thought I’d share.

Almost every pronoun used in the script walkthroughs (see the archived GitHub links) is inclusive and plural: “our,” “we,” “we’re,” etc. There is also a reference to converting data into a format “to put into the neural network.”

neural network?

Wait, there’s one more thing. When Phillips committed his original Twitteranalysis.py script, he accidentally left the working Twitter API keys in the code (via the consumer key and consumer “secret”). These are the alphanumeric strings that a developer account uses to access data from Twitter’s API.

Interestingly, on Feb 23, 2017 (yes, 2017), Phillips removed the API keys:

Removal of API keys and replacement with placeholder keys.
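For context, the usual way to avoid this mistake is to keep credentials out of the source file entirely and read them at runtime. The sketch below shows one common pattern; the environment variable names and the function are hypothetical and are not taken from Phillips’ script.

```python
import os

def load_twitter_credentials(env=None):
    """Read API credentials from environment variables instead of source code.

    The variable names are hypothetical; use whatever the deployment defines.
    """
    env = os.environ if env is None else env
    names = ("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Note that replacing keys with placeholders in a later commit does not erase them from the repository’s history, so leaked keys also need to be revoked with the provider.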

Two days later, another GitHub user added a comment about Phillips’ mistake:

user comments on Phillips’ API keys

Was this API key Cambridge Analytica’s? Or SCL’s? Both scripts are clearly related to voter data and the election, and the first includes Phillips’ @cambridgeanalytica.org email address. But from the commentary in the script, it’s not clear who the API key belonged to. It might have been Phillips’ own account.

Regardless, this code shows the inner workings of client voter file geo-data “enrichment” and presumably automated voter database processing for clients by Cambridge Analytica.

This code also offers proof of how Twitter users’ emotional reactions and real-time discussions, even favorites/likes (pulled from the API), are mined in real time and used to create test phrases, establish control groups, and apparently provide sets of future terms around keywords related to political campaign issues.

The fact that Cambridge Analytica was using this kind of code to mine emotional responses that surface from users’ “recent tweets” from a defined set of 2016 presidential campaign “trigger words” is interesting.

I’m confident Phillips provided this data in earnest, as he includes an excellent working description of the purposes and uses of these scripts. He was a CA intern who wanted to show his work to get a job in the future. Yet this is part of the arsenal of tools used by Cambridge Analytica to geolocate American voters and harness Americans’ real-time emotional sentiment (see example below for Instagram targeting).

Voter ID + Address > Lat/Long + Congressional District =[real-time Instagram API location targeting]

I’d argue the question of the ownership of Cambridge Analytica, a foreign business previously registered in the United States as a foreign corporation (SCL Elections), just became a bit more relevant.

Foreign influence: sound familiar?

And the fact that a working Twitter developer API key, possibly one of Cambridge Analytica’s own, was left sitting on GitHub by a data intern for anyone to use is, well, another story. The code will likely be removed soon, so it’s available here: https://data.world/d1gi/ca-data
