Research article, Open Access
Authors:
- Roberto Ulloa, Computational Social Science, GESIS – Leibniz-Institute for the Social Sciences, Germany (ORCID: https://orcid.org/0000-0002-9870-5505)
- Mykola Makhortykh, Institute of Communication and Media Studies, University of Bern, Switzerland
- Aleksandra Urman, Social Computing Group, University of Zurich, Switzerland
Journal of Information Science, Volume 50, Issue 2, April 2024, pp. 404–419. https://doi.org/10.1177/01655515221093029
Published: 16 April 2024
Abstract
Algorithm audits have increased in recent years due to a growing need to independently assess the performance of automatically curated services that process, filter and rank the large and dynamic volume of information available on the Internet. Among the methodologies for performing such audits, virtual agents stand out because they make it possible to run systematic experiments that simulate human behaviour without the costs of recruiting participants. Motivated by the importance of research transparency and the replicability of results, this article focuses on the challenges of this approach. It provides methodological details, recommendations, lessons learned and limitations based on our experience of setting up experiments for eight search engines (including main, news, image and video sections) with hundreds of virtual agents placed in different regions. We demonstrate that our research infrastructure performed successfully across multiple data collections with diverse experimental designs, and we point to changes and strategies that improve the quality of the method. We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms over long periods of time, and we hope that this article can serve as a basis for further research in this area.
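The abstract describes experiments in which hundreds of virtual agents query eight search engines from different regions. One way to keep such a design balanced is to spread the agents evenly over every region/engine combination. The sketch below does this with a simple round-robin allocation; it is an illustrative assumption, not the authors' infrastructure, and the names `Assignment`, `allocate_agents` and `cell_counts`, as well as the region and engine labels, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import cycle, product


@dataclass(frozen=True)
class Assignment:
    """One virtual agent bound to a region and a search engine (hypothetical)."""
    agent_id: int
    region: str
    engine: str


def allocate_agents(n_agents, regions, engines):
    """Assign agents round-robin over every (region, engine) cell.

    Cells are visited in a fixed order, so when n_agents is a multiple of
    len(regions) * len(engines), every cell receives the same number of agents.
    """
    if not regions or not engines:
        raise ValueError("need at least one region and one engine")
    cells = cycle(product(regions, engines))
    return [Assignment(i, *next(cells)) for i in range(n_agents)]


def cell_counts(plan):
    """Count how many agents ended up in each (region, engine) cell."""
    return Counter((a.region, a.engine) for a in plan)


if __name__ == "__main__":
    # Placeholder labels: the concrete regions and engines are not named here.
    regions = ["region_a", "region_b"]
    engines = [f"engine_{k}" for k in range(1, 9)]  # eight engines, as in the abstract
    plan = allocate_agents(32, regions, engines)
    print(cell_counts(plan)[("region_a", "engine_1")])  # prints 2
```

Round-robin over a fixed cell order is only one possible choice; shuffling agents before allocation would be an equally simple alternative, and neither is claimed to be the procedure used in the article.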
Published in: Journal of Information Science, Volume 50, Issue 2 (April 2024), 264 pages
ISSN: 0165-5515
© The Author(s) 2022
This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
Publisher: Sage Publications, Inc., United States
Author Tags
- Algorithm auditing
- data collection
- search engine audits
- user modelling