BACKGROUND: Social media data is a promising source of social science data. However, deriving the demographic characteristics of users and dealing with the nonrandom, nonrepresentative populations from which they are drawn represent challenges for social scientists.

OBJECTIVE: Given the growing use of social media data in social science research, this paper asks two questions: 1) To what extent are findings obtained with social media data generalizable to broader populations, and 2) what is the best practice for estimating demographic information from Twitter data?

METHODS: Our analyses use information gathered from 979,992 geo-located Tweets sent by 22,356 unique users in South East England between 23 June and 4 July 2014. We estimate demographic characteristics of the Twitter users with the crowd-sourcing platform CrowdFlower and the image-recognition software Face++. To evaluate bias in the data, we run a series of log-linear models with offsets and calibrate the nonrepresentative sample of Twitter users with mid-year population estimates for South East England.

RESULTS: CrowdFlower proves to be more accurate than Face++ for the measurement of age, whereas both tools are highly reliable for measuring the sex of Twitter users. The calibration exercise allows bias correction in the age-, sex-, and location-specific population counts obtained from the Twitter population by augmenting Twitter data with mid-year population estimates.

CONTRIBUTION: The paper proposes best practices for estimating Twitter users’ basic demographic characteristics and a calibration method to address the selection bias in the Twitter population, allowing researchers to generalize findings based on Twitter to the general population.
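To make the phrase "log-linear models with offsets" concrete, the following is a minimal sketch of one such model, not the authors' actual specification. It assumes a main-effects Poisson model, log(mu_ij) = log(N_ij) + a_i + b_j, where N_ij is the census population in age group i and sex j, and fits it by iterative proportional fitting started from the offset. The function name, the two-way age-by-sex layout, and all numbers are hypothetical and purely illustrative.

```python
def fit_main_effects_with_offset(counts, population, max_iter=500, tol=1e-9):
    """Fit log(mu_ij) = log(N_ij) + a_i + b_j by iterative proportional fitting.

    Starting from the offset N_ij and rescaling rows and columns until the
    fitted margins match the observed margins yields the maximum-likelihood
    fit of this Poisson log-linear model.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    mu = [[float(population[i][j]) for j in range(n_cols)] for i in range(n_rows)]
    row_targets = [sum(counts[i]) for i in range(n_rows)]
    col_targets = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]

    for _ in range(max_iter):
        # Rescale each row so its total matches the observed row total.
        for i in range(n_rows):
            factor = row_targets[i] / sum(mu[i])
            mu[i] = [m * factor for m in mu[i]]
        # Rescale each column so its total matches the observed column total.
        for j in range(n_cols):
            factor = col_targets[j] / sum(mu[i][j] for i in range(n_rows))
            for i in range(n_rows):
                mu[i][j] *= factor
        # Columns now match exactly; stop once the rows also match.
        row_error = max(abs(sum(mu[i]) - row_targets[i]) for i in range(n_rows))
        if row_error < tol:
            break
    return mu


# Hypothetical toy data: rows are age groups, columns are sexes.
counts = [[30, 20], [50, 40], [10, 5]]                       # Twitter users (invented)
population = [[1000, 1000], [1500, 1600], [2000, 2200]]      # census counts (invented)

fitted = fit_main_effects_with_offset(counts, population)

# Fitted coverage: modelled Twitter users per resident in each cell.
coverage = [[fitted[i][j] / population[i][j] for j in range(2)] for i in range(3)]

# Calibration weights: residents represented by one Twitter user in each cell.
weights = [[1.0 / c for c in row] for row in coverage]
```

In this sketch the fitted coverage factorizes into an age effect and a sex effect, and its inverse acts as a calibration weight that scales each Twitter user up to the population they stand for. Adding interaction terms or a location dimension would follow the same pattern with more margins to match.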

Using Twitter data for demographic research / Yildiz, Dilek; Munson, Jo; Vitali, Agnese; Tinati, Ramine; Holland, Jennifer A.. - In: DEMOGRAPHIC RESEARCH. - ISSN 2363-7064. - 37:1(2017), pp. 1477-1514. [10.4054/DemRes.2017.37.46]

Using Twitter data for demographic research

Vitali, Agnese
2017-01-01

Files in this item:
37_46.pdf (open access)
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 1.44 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/226332