Long Posts - Data Leverage Digital Garden

AI Dividends Without Taxing Compute, Automation, or Equity: A Presumptive Commons-Rent Tax Based on Capabilities and Data Dependence

2026-06-07 Long Posts Public copy

A proposal for updating data dividend ideas around capability measurement, data provenance, and AI auditing.

The AI "Evaluation Crisis" Is an Opportunity to Get Data Flow Right

2026-04-30 Long Posts Public copy

Why the AI evaluation crisis could force a reckoning on dataset provenance, attribution, and consent.

Attestation across the AI Supply Chain

2026-04-08 Long Posts Public copy

A proposal for interoperable attestation objects that connect training data, evaluation labor, and AI-generated outputs across the AI supply chain.

AI is driving the cost of polish down; some musings on fancy versus terse artifacts

2026-04-01 Long Posts Public copy

AI progress means the "polish" of a figure or website no longer proxies for quality. Can we try to turn this into a good thing for curation, attention allocation, and even AI progress itself?

A Short Guide to Data Strikes and Conscious Data Contribution in the Context of 2026 Frontier AI

2026-03-03 Long Posts Public copy

Back to the basics of data leverage.

The Paradox of Reuse in 2026: A Case of Quasi-Enclosure, or "Subsidized Club Goods that Sort of Look Like Public Goods"

2026-02-17 Long Posts Public copy

How we can understand, and react to, the complicated impacts of AI systems on online communities and knowledge commons

The Coding Agent Data Deal

2026-01-12 Long Posts Public copy

On user data control, coding agents as retrievers, and the value of your coding transcripts

Coding agents are (1) a big deal, (2) very relevant to data leverage, and (3) able to help build tools that support data leverage!

2026-01-05 Long Posts Public copy

Sharing an early reaction to recent coding agent discourse and two relevant projects

Almost Everybody -- Including Both Data Creators and AI Companies -- Stands to Benefit from Clearer "Data Rules".

2025-11-26 Long Posts Public copy

In fact, anyone who doesn't think they will be a "big winner" long term benefits from clear rules, even if it means training data costs more in the short term.

How collective bargaining for information, public AI, and HCI research all fit together

2025-10-11 Long Posts Public copy

Another recap post for the Data Leverage newsletter!

Which datasets should we assume are "in all the AI models"?

2025-09-24 Long Posts Public copy

New model releases keep (re)sparking discussions about training data. What can we assume is upstream in the data river, and what do we want to see happen?

Algorithmic Collective Action With Two Collectives [crosspost]

2025-06-20 Long Posts Public copy

This post was written by Aditya Karan, with support from Nick Vincent and Karrie Karahalios, to accompany a FAccT 2025 paper. It was originally published on Jun 19, 2025 via the Crowd Dynamics Lab blog.

On AI-driven Job Apocalypses and Collective Bargaining for Information

2025-06-05 Long Posts Public copy

Reacting to a fresh wave of discussion about AI's impact on the economy and power concentration, and reiterating the potential role of collective bargaining.

How do we know our AI output is good? Double checks, bar charts, vibes, and training data.

2025-05-30 Long Posts Public copy

Connecting evaluation and dataset documentation via the lens of "AI as ranking".

Each Instance of "AI Utility" Stems from Some Human Act(s) of Information Recording and Ranking

2025-05-28 Long Posts Public copy

It's ranking information all the way down.

Google and TikTok rank bundles of information; ChatGPT ranks grains.

2025-05-27 Long Posts Public copy

Google and others solve our attentional problem by ranking discrete bundles of information, whereas ChatGPT ranks more granular chunks. This lens can help us reason about AI policy.

Public AI, Data Appraisal, and Data Debates

2025-04-03 Long Posts Public copy

A consortium of Public AI labs can substantially improve data pricing, which may also help to concretize debates about the ethics and legality of training practices.

Evaluation Data Leverage: Advances like "Deep Research" Highlight a Looming Opportunity for Bargaining Power

2025-03-02 Long Posts Public copy

Research agents and increasingly general reasoning models open the door for immense "evaluation data leverage".

Tipping Points for Content Ecosystems

2025-02-12 Long Posts Public copy

Our AI design choices in 2024 could preclude "Powerful AI" in 2030.

AI Labs Should Open Source Data Protection Technologies

2025-01-31 Long Posts Public copy

There's still incredible tension in the current data paradigm, but sharing "data protection" technologies, like those used by OpenAI to accuse DeepSeek of model theft, can help cut a path forward.

Live by the free-content-for-training sword, die by the free-content-for-training sword

2025-01-28 Long Posts Public copy

There's deep tension in the current ask-for-forgiveness-free-for-all approach to acquiring data for model training. Will "open" models cause this tension to reach a breaking point?

Selling AGI like AG1: Will Consumers Push Back Against Proprietary Blends of Herbs and of Data?

2024-12-12 Long Posts Public copy

The race to produce premier AI products with high price tags might change the standards around data disclosure.

Is Zuckerberg right to say that your specific creative work has no value to AI?

2024-09-28 Long Posts Public copy

Examining the Meta CEO's claim that the "individual work of most creators isn’t valuable enough for it to matter" in the context of AI training.

"Many Models" and "Track Changes" for AI: Some Thoughts on LLM Interfaces

2024-08-08 Long Posts Public copy

Interacting with many models and harnessing the power of `diff`

Building a Data Pipeworks for Democratic AI: From Human Knowledge to Records to AI Systems

2023-11-13 Long Posts Public copy

Focusing on feedback loops -- connecting modern AI to early cybernetics-style thinking -- could help solve looming challenges and support democratic inputs to AI.

Will the New York Times Data Strike Have a Large Impact on ChatGPT?

2023-09-28 Long Posts Public copy

How can we start thinking about how opt-out decisions by content-producing organizations will affect LLMs?

A Harbinger of the Future of Content? The New York Times Starts a Data Strike

2023-08-25 Long Posts Public copy

The New York Times is trying to remove its content from OpenAI models, surfacing tensions around copyright, economic harms, privacy, and the distribution of AI benefits.

The WGA Strike is a Canary in the Coal Mine for AI Labor Concerns

2023-05-05 Long Posts Public copy

Could Upcoming Data Legislation Enable a "Right to Data Strike"?

Reddit, StackOverflow, and Europe: All Trending Towards Data Dignity

2023-05-01 Long Posts Public copy

Once again, we’ve had an eventful few weeks in the space of data-dependent computing!

Data Leverage Recap: December 2022 - April 2023

2023-04-18 Long Posts Public copy

The Last Three Months in Review: What's New and What's Next

Bing Rewards for the AI Age

2023-03-30 Long Posts Public copy

The plants in the Gardens by the Bay evoke a sense of flourishing-by-design; photo by Victor from Unsplash.

Plural AI Data Alignment

2023-03-02 Long Posts Public copy

Measuring the Alignment of AI Systems Based on their Data Pipelines

AI Technologies are System Maps, and You are a Cartographer

2023-02-03 Long Posts Public copy

Much of my work is in pursuit of “data dignity”, an idea that stems in part from scholars arguing that we should sometimes think of “data as labor”.

AI Artist or AI Art Thief? Innovation, Public Mandates, and the Case for Talking in Terms of Leverage

2022-12-16 Long Posts Public copy

The public debate over AI has seriously heated up in the wake of new advances in the design and deployment of large generative AI models.

ChatGPT is Awesome and Scary: You Deserve Credit for the Good Parts (and Might Help Fix the Bad Parts)

2022-12-04 Long Posts Public copy

More on why you're an expert language model trainer

The Paradox of Reuse, Language Models Edition

2022-12-02 Long Posts Public copy

Background

Don’t give OpenAI all the credit for GPT-3: You might have helped create the latest “astonishing” advance in AI too

2020-09-22 Long Posts Public copy

The much-celebrated GPT-3 that can answer questions, write poems, and more wouldn’t be possible without content written by millions of people around the world. Shouldn’t they get some credit?