References
Acemoglu, Daron, and Simon Johnson. 2025. “Power and Progress: Our
Thousand-Year Struggle over Technology and Prosperity.”
Perspectives on Science and Christian Faith. https://api.semanticscholar.org/CorpusID:265119352.
Ardila, Rosana, Megan Branson, Kelly Davis, Michael Henretty, Michael
Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M Tyers,
and Gregor Weber. 2019. “Common Voice: A Massively-Multilingual
Speech Corpus.” arXiv Preprint arXiv:1912.06670.
Arnold, Eckhart. 2014. “What’s Wrong with Social
Simulations?” The Monist 97: 359–77. https://api.semanticscholar.org/CorpusID:67844223.
Aryabumi, Viraat, Yixuan Su, Raymond Ma, Adrien Morisot, Ivan Zhang,
Acyr Locatelli, Marzieh Fadaee, Ahmet Üstün, and Sara Hooker. 2024.
“To Code, or Not to Code? Exploring Impact of Code in
Pre-Training.” arXiv Preprint arXiv:2408.10914.
Barocas, Solon, and Andrew D. Selbst. 2016. “Big Data’s Disparate
Impact.” California Law Review 104 (3): 671–732.
Batty, Michael, and Paul M. Torrens. 2001. “Modeling Complexity: The
Limits to Prediction.” Cybergeo: European Journal of
Geography. https://api.semanticscholar.org/CorpusID:102344300.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019.
“Reconciling Modern Machine-Learning Practice and the Classical
Bias–Variance Trade-Off.” Proceedings of the National Academy
of Sciences 116 (32): 15849–54. https://doi.org/10.1073/pnas.1903070116.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret
Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can
Language Models Be Too Big?” In Proceedings of the ACM
Conference on Fairness, Accountability, and Transparency (FAccT),
610–23.
Blodgett, Su Lin, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020.
“Language (Technology) Is Power: A Critical Survey of
‘Bias’ in NLP.” In Proceedings of ACL,
5454–76.
Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades:
Intersectional Accuracy Disparities in Commercial Gender
Classification.” In Proceedings of the Conference on
Fairness, Accountability and Transparency (FAT*), 77–91.
Carlini, Nicholas, Matthew Jagielski, Christopher A. Choquette-Choo,
Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas,
and Florian Tramèr. 2024. “Poisoning Web-Scale Training Datasets
Is Practical.” https://arxiv.org/abs/2302.10149.
Carlini, Nicholas, Florian Tramèr, Eric Wallace, Matthew Jagielski,
Ariel Herbert-Voss, Katherine Lee, Adam Roberts, et al. 2021.
“Extracting Training Data from Large Language Models.” In
Proceedings of USENIX Security Symposium.
Crawford, Kate, and Trevor Paglen. 2019. “Excavating AI: The
Politics of Images in Machine Learning Training Sets.” https://www.excavating.ai/.
Creative Commons. 2023. “Understanding CC Licenses and Generative
AI.” https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/.
Deckelmann, Selena. 2023. “Wikipedia’s Value in the Age of
Generative AI.” Wikimedia Foundation. https://wikimediafoundation.org/news/2023/07/12/wikipedias-value-in-the-age-of-generative-ai/.
European Union. 2016. “General Data Protection Regulation (EU)
2016/679.” https://eur-lex.europa.eu/eli/reg/2016/679/oj.
———. 2024. “Artificial Intelligence Act.” https://eur-lex.europa.eu/.
Federal Trade Commission. 2013. “Children’s Online Privacy
Protection Rule (COPPA) — 16 CFR Part 312.” https://www.ftc.gov/legal-library/browse/rules/childrens-online-privacy-protection-rule-coppa.
Fernandez, Raul Castro. 2023. “Data-Sharing Markets: Model,
Protocol, and Algorithms to Incentivize the Formation of Data-Sharing
Consortia.” Proceedings of the ACM on Management of Data
1: 1–25. https://api.semanticscholar.org/CorpusID:259213174.
Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman
Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2018.
“Datasheets for Datasets.” arXiv Preprint arXiv:1803.09010.
Grother, Patrick, Mei Ngan, and Kayee Hanaoka. 2019. “Face
Recognition Vendor Test (FRVT) Part 3: Demographic Effects.”
NISTIR 8280. NIST. https://doi.org/10.6028/NIST.IR.8280.
Hestness, Joel, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo
Jun, Hassan Kianinejad, Md Mostofa Ali Patwary, Yang Yang, and Yanqi
Zhou. 2017. “Deep Learning Scaling Is Predictable,
Empirically.” arXiv Preprint arXiv:1712.00409.
Holland, Sarah, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia
Chmielinski. 2018. “The Dataset Nutrition Label: A Framework to
Drive Higher Data Quality Standards.” https://arxiv.org/abs/1805.03677.
Hubinger, Evan, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte
MacDiarmid, Tamera Lanham, et al. 2024. “Sleeper Agents: Training
Deceptive LLMs That Persist Through Safety Training.” https://arxiv.org/abs/2401.05566.
Hwang, Sohyeon, Priyanka Nanayakkara, and Yan Shvartzshnaider. 2025.
“Trust and Friction: Negotiating How Information Flows Through
Decentralized Social Media.” arXiv Preprint
arXiv:2503.02150.
Illinois General Assembly. 2008. “Biometric Information Privacy
Act (BIPA), 740 ILCS 14.” https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004.
ISO/IEC 23894:2023 Information Technology—Artificial
Intelligence—Risk Management. 2023. ISO/IEC.
Jackson, Brandon, B Cavello, Flynn Devine, Nick Garcia, Samuel J. Klein,
Alex Krasodomski, Joshua Tan, and Eleanor Tursman. 2024. “Public
AI: Infrastructure for the Common
Good.” Public AI Network. https://doi.org/10.5281/zenodo.13914560.
Jo, Emily, and Timnit Gebru. 2020. “Lessons from Archives:
Strategies for Collecting Sociocultural Data in Machine
Learning.” In Proceedings of FAccT, 306–16.
Johnson, Isaac, Lucie-Aimée Kaffee, and Miriam Redi. 2024.
“Wikimedia Data for AI: A Review of Wikimedia Datasets for NLP
Tasks and AI-Assisted Editing.” arXiv Preprint
arXiv:2410.08918.
Liu, Jason. 2024. “Data Flywheel Go Brrr: Using Your Users to Build
Better Products.” https://jxnl.co/writing/2024/03/28/data-flywheel/.
Liu, Jiacheng, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen,
Arnavi Chheda-Kothary, Huy Tran, et al. 2025. “OLMoTrace: Tracing
Language Model Outputs Back to Trillions of Training Tokens.”
arXiv Preprint arXiv:2504.07096.
Marda, Nik, Jasmine Sun, and Mark Surman. 2024. “Public AI: Making
AI Work for Everyone, by Everyone.” Mozilla. https://assets.mofoprod.net/network/documents/Public_AI_Mozilla.pdf.
Marwell, Gerald, and Pamela Oliver. 1993. The Critical Mass in
Collective Action. Cambridge University Press.
McCallister, Erika, Tim Grance, and Karen Scarfone. 2010. “Guide
to Protecting the Confidentiality of Personally Identifiable Information
(PII).” SP 800-122. NIST.
McDonald, Nora, Benjamin Mako Hill, Rachel Greenstadt, and Andrea Forte.
2019. “Privacy, Anonymity, and Perceived Risk in Open
Collaboration: A Study of Service Providers.” In Proceedings
of the 2019 CHI Conference on Human Factors in Computing Systems,
1–12.
Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy
Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and
Timnit Gebru. 2019. “Model Cards for Model Reporting.” In
Proceedings of the ACM Conference on Fairness, Accountability, and
Transparency (FAccT), 220–29.
Murphy, Kevin P. 2022. Probabilistic Machine Learning: An
Introduction. MIT Press. http://probml.github.io/book1.
Narayanan, Arvind, and Vitaly Shmatikov. 2008. “Robust
De-Anonymization of Large Sparse Datasets.” In Proceedings of
the IEEE Symposium on Security and Privacy, 111–25.
Nissenbaum, Helen. 2004. “Privacy as Contextual Integrity.”
Washington Law Review 79 (1): 119–57.
NIST. 2023. “Artificial Intelligence Risk Management Framework (AI
RMF 1.0).” NIST AI 100-1. National Institute of Standards and
Technology. https://www.nist.gov/ai.
Obermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil
Mullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used
to Manage the Health of Populations.” Science 366
(6464): 447–53.
OWASP. 2023. “OWASP Top 10 for Large Language Model
Applications.” https://owasp.org/www-project-top-10-for-large-language-model-applications/.
Raji, Inioluwa Deborah, Indra Elizabeth Kumar, Aaron Horowitz, and
Andrew D. Selbst. 2022. “The Fallacy of AI Functionality.”
Proceedings of the 2022 ACM Conference on Fairness, Accountability,
and Transparency. https://api.semanticscholar.org/CorpusID:249872658.
Rakova, Bogdana, Renee Shelby, and Megan Ma. 2023.
“Terms-We-Serve-with: Five Dimensions for Anticipating and
Repairing Algorithmic Harm.” Big Data & Society 10
(2): 20539517231211553.
Roche, Adam, and Yali Sassoon. 2024. “What Is a Data Flywheel? A
Guide to Sustainable Business Growth.” Snowplow Blog. https://snowplow.io/blog/what-is-a-data-flywheel.
“Rosenbach v. Six Flags Entertainment Corp.” 2019. 2019 IL
123186, Supreme Court of Illinois.
Selbst, Andrew D., Danah Boyd, Sorelle A. Friedler, Suresh
Venkatasubramanian, and Janet Vertesi. 2019. “Fairness and
Abstraction in Sociotechnical Systems.” In Proceedings of the
ACM Conference on Fairness, Accountability, and Transparency
(FAccT), 59–68.
Shankar, Shreya. 2024. “Data Flywheels for
LLM Applications.” Shreya Shankar’s
Blog. https://www.sh-reya.com/blog/ai-engineering-flywheel/.
Shelby, Renee, Shalaleh Rismani, Kathryn Henne, AJung Moon, Negar
Rostamzadeh, Paul Nicholas, N’Mah Yilla-Akbari, et al. 2023.
“Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy
for Harm Reduction.” In Proceedings of the 2023 AAAI/ACM
Conference on AI, Ethics, and Society, 723–41. AIES ’23. New York,
NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3600211.3604673.
Shen, Judy Hanwen, Inioluwa Deborah Raji, and Irene Y Chen. 2024.
“The Data Addition Dilemma.” arXiv Preprint
arXiv:2408.04154. https://arxiv.org/abs/2408.04154.
Sorscher, Ben, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari
Morcos. 2022. “Beyond Neural Scaling Laws: Beating Power Law
Scaling via Data Pruning.” Advances in Neural Information
Processing Systems 35: 19523–36.
Sweeney, Latanya. 2000. “Simple Demographics Often Identify People
Uniquely.” Carnegie Mellon University, Data Privacy Working
Paper.
Tan, Joshua, Nicholas Vincent, Katherine Elkins, and Magnus Sahlgren.
2025. “If Open Source Is to Win, It Must Go Public.”
arXiv Preprint arXiv:2507.09296.
U.S. Copyright Office. 2024. “Copyright and Artificial
Intelligence: Policy Studies and Guidance.” https://copyright.gov/ai/.
U.S. Department of Education. 1974. “Family Educational Rights and
Privacy Act (FERPA).” https://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html.
U.S. Department of Health and Human Services. 2000. “HIPAA Privacy
Rule — 45 CFR Parts 160 and 164.” https://www.hhs.gov/hipaa/for-professionals/privacy/index.html.
Vincent, Nicholas, David Bau, Sarah Schwettmann, and Joshua Tan. 2023.
“An Alternative to Regulation: The Case for Public AI.”
arXiv Preprint arXiv:2311.11350.
Vincent, Nicholas, Mark Surman, and Jake Hirsch-Allen. 2025.
“Canada as a Champion for Public AI: Data, Compute and Open Source
Infrastructure for Economic Growth and Inclusive Innovation.”
Weidinger, Laura, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan
Uesato, Po-Sen Huang, Myra Cheng, et al. 2021. “Ethical and Social
Risks of Harm from Language Models.” arXiv Preprint
arXiv:2112.04359.
Wolpert, David H., and William G. Macready. 1997. “No Free Lunch
Theorems for Optimization.” IEEE Transactions on Evolutionary
Computation 1 (1): 67–82.