AI Dividends Without Taxing Compute, Automation, or Equity: A Presumptive Commons-Rent Tax Based on Capabilities and Data Dependence

A proposal for updating data dividend ideas around capability measurement, data provenance, and AI auditing.

2026-06-07

This post will capture some fresh thoughts on how Bernie Sanders's recent AI dividend proposal connects with suggestions from our 2021 data dividends report. I'll describe how my recent interest in attestation across the AI supply chain, the AI evaluation crisis, and Clear Data Rules connects to the topic of data dividends.

More specifically, I have a concrete proposal for how a data dividend funded by a data dependence tax can be updated to the 2026 context by implementing it as a "presumptive commons-rent tax that can be avoided by providing capabilities-to-data attribution." Instead of trying to come up with some proxy for data dependence to rank different companies in terms of how much data they use (e.g., by counting their users or auditing the volume of data within their organizational databases), we start from the assumption that more capable AI systems draw more heavily on the data commons (broadly construed) that humanity has built. AI companies can show evidence that they used licensed or governed data to achieve certain capabilities, and an ecosystem of international AI auditing organizations would work together to verify these claims. Verified claims would lower the company's tax burden.

If something like this were implemented, we would end up in one of two worlds. In the first, companies profiting from highly capable data-dependent systems (which need not be the labs themselves) pay large sums into various national funds (or ideally a single internationally governed fund). In the second, the vast majority of upstream data flows through healthy data markets where data creators have the collective leverage necessary to get paid through a mixture of upfront and royalty payments.

Of course, there are a lot of details to be worked out. The main goal of this post is to paint a picture of this policy direction with broad strokes while starting to get specific about the research program needed to implement something like this.

To expand on the idea, let's begin with some context.

Brief history of data dividends research in 2021

In the 2021 report, we analyzed a variety of possible fundraising and disbursement mechanisms for a “data dividend” (which was being discussed by Governor Newsom of California at the time). While we were not aiming to pick a single answer, our "likely good first step" suggestion was a data dependence tax to fund public goods. To make "data dependence" operational, we suggested using user count as a proxy: firms with lots of users are probably getting lots of value from aggregated user/public data.

Some other notable works on data dividends around that time include:

Bax's computational treatment of individual and grouped data dividends using Shapley and Owen values
Wadhwa's Data Catalyst review of the economic impact and feasibility of data dividends
I contributed to this preprint on data dividend design choices and this later EAAMO poster paper on the challenges of “meritocratic” data valuation for dividends

Another key idea from the report was that, in the context of retroactive dividends (as opposed to forward-looking markets), it is probably best to avoid “fine-grained valuation” (e.g., just trying to write individual-specific checks). Of course, in the context of data markets, it could still make sense in some cases to price both individual data points and collective bundles.

In short: for a dividend, we should tax dependence on collective data and disburse it coarsely while we figure out better valuation and interpretability methods. I think the reasoning from that report holds up pretty well in light of AI progress. I also think it’s notable that the motivation described in the Sanders proposal matches the arguments in our original report pretty closely.

However, I also think we should be sensitive to concerns from economists about the impacts of compute taxes, automation taxes, capital taxes, etc., such as those discussed in a recent NBER working paper on AI-related taxes and sovereign wealth funds. After the Sanders op-ed went out, the idea quickly drew a cross-ideological mix of interest, skepticism, and pushback: AP covered the Sanders/Altman/Trump public-ownership conversation, while the Washington Post editorial board, Reason, Cato, and Fortune's coverage of takes from David Sacks captured pro-market, libertarian, and tech policy critiques of government equity stakes in AI companies. The economics concern, summarized: depending on design, an AI dividend tax could have negative effects on growth, investment, diffusion, etc.

A tax that targets "commons extraction"?

If we can avoid it, we might want to avoid explicitly targeting “AI,” “compute,” or “automation.” Instead, the thing we actually want to target is private value extraction from the commons.

Something that’s complicated about foundation models is that they depend on several categories of data that are common-ish. This includes literal, governed digital commons like Wikipedia; public-domain works; commons-y public-web resources like Common Crawl; open-source code; and more implicit commons like click data, trace data, public posts, interaction data, and social graphs.

Capability as a proxy for data dependence (and presumed commons dependence)

I think there’s actually an easy fix to target extraction from the commons: take a "rebuttable presumption of data-dependence" approach to the existence of powerful AI. Currently (and barring a major paradigm shift in AI) all of humanity’s approaches to building powerful AI are data-dependent. Even approaches that use reinforcement learning or synthetic data still have massive data dependencies in the overall training and evaluation pipeline needed to build an AI system.

Instead of using user count or something else as a primary proxy for data dependence, we might consider using capability itself. We would basically be working on a default assumption that if an AI system is very capable and monetized, a meaningful chunk of that value came from commons data. I think this assumption is currently very justified and will remain so in the near term.

There are several reasonable ways we can coarsely estimate the fraction of value attributable to data versus the value attributable to compute, non-data technical progress, interface progress, and other factors. We just need to pick some number. A share of 50%, which happens to appear in the Sanders proposal, might be a reasonable starting placeholder. The more capable a system is, the heavier the burden a company should face in explaining how it got so good.

Thus, we could iterate on various data dividends proposals to design what we might call a “presumptive commons-rent tax.” When a firm makes money from a powerful AI system, we presume some share of the rent came from public/user/commons data. AI operators can lower their presumptive commons-rent tax by showing that capabilities came from data that was acquired under non-commons conditions, e.g., licensed data purchased via a healthy data market.

As a toy example, suppose a highly capable AI system earns $10B in annual rents and the presumptive commons-rent share is set at 50%. If the operator can substantiate that half of its capability-relevant data contribution came from licensed, governed, or reciprocal sources, the taxable commons-rent base might fall from $5B to $2.5B.
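The arithmetic in the toy example can be written out as a short sketch. It assumes, purely for illustration, that the taxable base shrinks linearly with the substantiated fraction; the function and parameter names are hypothetical and not part of any existing proposal.

```python
def taxable_commons_rent_base(annual_rents, presumptive_share, explained_fraction):
    """Rent base still presumed to come from commons data.

    Assumption (not from the post): the base falls linearly with the
    fraction of capability-relevant data the operator can substantiate.
    """
    if not 0.0 <= presumptive_share <= 1.0:
        raise ValueError("presumptive_share must be between 0 and 1")
    if not 0.0 <= explained_fraction <= 1.0:
        raise ValueError("explained_fraction must be between 0 and 1")
    presumptive_base = annual_rents * presumptive_share
    return presumptive_base * (1.0 - explained_fraction)


# Numbers from the toy example: $10B in rents, a 50% presumptive share.
base_before = taxable_commons_rent_base(10e9, 0.5, 0.0)  # nothing substantiated
base_after = taxable_commons_rent_base(10e9, 0.5, 0.5)   # half substantiated
print(f"${base_before / 1e9:.1f}B -> ${base_after / 1e9:.1f}B")
```

A real schedule would not need to be linear; that choice is only the simplest one consistent with the numbers above.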

How evidence of data use would lower the tax burden

We would need a clean accounting scheme here, with a possible unit being explained data. Explained data would estimate the share of a model’s effective, capability-relevant data contribution that the company can actually account for. To count as tax-reducing explained data, a data source would need to be documented, have provenance and proof of fair acquisition (e.g., because it was licensed, bought under contract, or similar), and be plausibly relevant to the capabilities being taxed. Critically, the tax rate would depend on the capability level achieved by a model, so more capable models would require more explained data, in accordance with our scientific understanding of scaling laws and training data attribution.

Preparing such evidence would look something like this:

first, a company profiting from AI models (an important qualifier -- more below, though working through the details would extend beyond this "long blog post" format) would prepare a datasheet
second, each entry in the datasheet would be labeled with an acquisition/governance status (licensed, internally generated, public-domain, governed by a data union/trust, etc.)
third, provenance evidence would be collected to support the acquisition/governance classifications
fourth, usage evidence would show how much each data component was actually used (this could include details about mixture fractions, sampling rates, repetition, deduplication, training stage, post-training role, eval role, and upstream sources for synthetic or RL data). Ultimately this evidence would be reviewed by AI auditor organizations, so it does not have to be 100% standardized, and there can be some flexibility for different model types
fifth, just as datasheet entries would be linked to provenance evidence, usage entries would be linked to ablations/data mixture experiments to show how those data components actually mattered for the relevant capabilities
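The five steps above can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class names, field names, evidence identifiers, and the choice of which statuses qualify are all hypothetical, and a usage weight (such as a mixture fraction) stands in for "capability-relevant contribution," which real audits would estimate far more carefully.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    LICENSED = "licensed"
    INTERNAL = "internally generated"
    PUBLIC_DOMAIN = "public-domain"
    DATA_TRUST = "governed by a data union/trust"
    COMMONS = "unlicensed commons or public-web data"


# Hypothetical policy choice: which statuses can count as tax-reducing.
QUALIFYING = frozenset({Status.LICENSED, Status.INTERNAL, Status.DATA_TRUST})


@dataclass
class DatasheetEntry:
    name: str
    status: Status                      # step 2: acquisition/governance label
    usage_weight: float                 # step 4: e.g. a mixture fraction
    provenance_evidence: list = field(default_factory=list)  # step 3
    ablation_evidence: list = field(default_factory=list)    # step 5

    def is_explained(self, qualifying):
        """Explained only if the status qualifies and both kinds of evidence exist."""
        return (
            self.status in qualifying
            and bool(self.provenance_evidence)
            and bool(self.ablation_evidence)
        )


def explained_share(entries, qualifying):
    """Usage-weighted share of the datasheet that counts as explained data."""
    total = sum(e.usage_weight for e in entries)
    if total <= 0:
        raise ValueError("datasheet has no usage weight")
    explained = sum(e.usage_weight for e in entries if e.is_explained(qualifying))
    return explained / total


# Step 1: a (made-up) datasheet with three components.
entries = [
    DatasheetEntry("licensed_news_archive", Status.LICENSED, 0.3,
                   ["contract-001"], ["ablation-report-A"]),
    DatasheetEntry("internal_support_logs", Status.INTERNAL, 0.2,
                   ["internal-policy-7"], ["ablation-report-B"]),
    DatasheetEntry("public_web_crawl", Status.COMMONS, 0.5),
]
share = explained_share(entries, QUALIFYING)
print(f"explained share: {share:.2f}")
```

Whether public-domain material should qualify is left as a policy parameter here rather than decided in code.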

Model capability comes from a production process involving compute, model size, data quantity, data quality, interface and tool access, etc. Capability measurement would be used to set a default presumed tax rate. Companies could present data details to reduce their tax burden, and an auditor (or a network of auditing organizations) would convert the evidence into "accepted explained-data points" to determine a final rate.

For a first implementation, we need not rely on a single precise definition of “accepted explained-data points.” A more ready-to-use version could just use evidence tiers as determined by the adjudicating entities.
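To make the rate-setting flow concrete, here is a minimal sketch of how a default rate and evidence tiers might combine. Everything numeric is a placeholder assumption: the capability bands, the 0-to-1 capability score, the tier names, and the credit each tier earns are hypothetical and would be set by tax authorities and auditors, not by this code.

```python
from bisect import bisect_right

# Hypothetical capability bands: (minimum audited capability score, default rate).
CAPABILITY_BANDS = [(0.0, 0.00), (0.5, 0.10), (0.8, 0.25)]

# Hypothetical evidence tiers and the credit an auditor grants for each.
TIER_CREDIT = {
    "documented": 0.25,
    "provenance_verified": 0.50,
    "ablation_supported": 1.00,
}


def default_rate(capability_score):
    """Look up the presumed rate for the band containing the score."""
    thresholds = [threshold for threshold, _ in CAPABILITY_BANDS]
    idx = bisect_right(thresholds, capability_score) - 1
    if idx < 0:
        raise ValueError("capability_score is below the lowest band")
    return CAPABILITY_BANDS[idx][1]


def final_rate(capability_score, claims):
    """claims: list of (share_of_data_contribution, tier) pairs.

    Each claim earns 'accepted explained-data points' equal to its share
    times the credit for its evidence tier; the default rate is then
    reduced in proportion to the accepted total.
    """
    total_share = sum(share for share, _ in claims)
    if total_share > 1.0 + 1e-9:
        raise ValueError("claimed shares exceed 100% of the data contribution")
    accepted = sum(share * TIER_CREDIT[tier] for share, tier in claims)
    return default_rate(capability_score) * (1.0 - accepted)


claims = [(0.4, "ablation_supported"), (0.2, "provenance_verified")]
rate = final_rate(0.9, claims)
print(f"default {default_rate(0.9):.3f} -> final {rate:.3f}")
```

In this sketch stronger evidence earns more credit per unit of data, which mirrors the idea that larger reductions require ablations rather than documentation alone.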

This could create a good set of incentives:

if companies want lower taxes, they should build datasheets and provenance systems from the start
if they want larger reductions, they need to run and share data-centric scientific experiments
if they rely heavily on commons data, they can still do that, but they should pay something back or give something back
the tax is fully avoidable!

Importantly, everything described above amounts to preparing a report that would look a lot like what existing or proposed data transparency obligations already call for, such as the EU AI Code of Practice. This is close to something AI companies might need to do anyway!

What does this tax really target?

But wait a second -- if the whole concern around taxing compute or automation is that "we shouldn't tax stuff that we want more of," isn't this proposal potentially even worse than those other taxes, since it could be read as a tax on intelligence or capability itself? Critically, this is not a tax on intelligence or capability, but rather a tax on unexplained or mysterious capability. If an AI operator trains on 100% licensed/accounted-for data, and can show that this data actually drove the relevant capabilities, its commons tax is near zero.

If you train on Wikipedia, Common Crawl, public code, user traces, etc., you would end up paying some reasonable tax back to the commons (and the tax might also be reduced if you show evidence of, e.g., contributing to something like Wikimedia Enterprise, or making in-kind contributions of data, model weights, gold standard generated code, etc.). Ideally, during any kind of transition period, there would be a way to transfer existing reciprocity programs into tax credits as well. And perhaps reciprocity programs could just be integrated into the program in the long term.
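As a rough sketch of how reciprocity could enter the calculation, credits might simply be subtracted from the tax owed. The function name, the dollar valuation of in-kind contributions, and the rule that credits cannot push the tax below zero are all assumptions made for illustration.

```python
def tax_after_reciprocity_credits(gross_tax, credits):
    """credits: mapping of program name -> auditor-accepted credit in dollars.

    Assumption: credits are non-refundable, so the result is floored at zero.
    """
    if gross_tax < 0 or any(value < 0 for value in credits.values()):
        raise ValueError("tax and credits must be non-negative")
    return max(0.0, gross_tax - sum(credits.values()))


# Made-up figures: a paid commons partnership plus an in-kind data release.
owed = tax_after_reciprocity_credits(
    100.0, {"commons_partnership": 30.0, "in_kind_data_release": 20.0}
)
print(owed)
```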

Auditing and enforcement

How would this be enforced? This is where the recent momentum around auditing and safety comes in. Two tasks could be handled by an ecosystem of independent auditing institutions, along the lines of the frontier AI auditing ecosystem: measuring capability, and assessing the ablations to judge whether AI operators can give a plausible high-level account of how data choices drive capabilities.

The ecosystem of auditing orgs would become part of the infrastructure for data governance: measuring capabilities, reviewing provenance, looking at ablations, etc. This would also get companies to contribute to advancing and sharing science about where model capabilities come from, in turn helping the auditing organizations.

Critically, by looping in auditing and safety organizations, this proposal could also take advantage of the fact that AI safety is one area in which there is a plausible path to international cooperation. In fact, I think this may be one of the only plausible ways to at least lay out a path toward a global wealth fund rather than various national funds and sovereign-focused economic interventions.

A single global wealth fund is morally attractive, because the data commons is transnational, but the more realistic path may be federated: national or regional AI commons funds collect revenue, while treaty or club arrangements allocate some share to global public goods and commons institutions.

Of course, we likely would not want independent AI auditors to be burdened with global taxation responsibility (nor would they likely want a bunch of extra work). Public tax authorities would still set the rules, accredited auditors (with proportionate support to hire staff to do all this) would review evidence, and a public technical board would maintain standards. Courts or administrative tribunals would handle disputes.

Compared to compute and automation taxes, I believe this kind of approach would avoid some of the concerns raised by economists, and instead target a specific harm: companies turning collective human activity and public knowledge into private rents without proportionate return.

In the current world, this would mean that AI companies would pay a bunch of taxes, which then might fuel, e.g., a national wealth fund, or ideally a global wealth fund. But it also creates a path to lower the burden: license data, work with data unions/trusts, document provenance, run ablations, or give value back to the commons.

Many open questions

Of course, things that seem simple are often laundering complexity. To highlight a few critical questions and considerations:

Who classifies exactly what systems count as frontier AI, and exactly which organizations are subject to this tax?
The very challenging task of accurate capabilities measurement becomes pivotal -- but we need to figure this out anyway!
Power dynamics within the ecosystem of evaluation institutions will be critical to success. If they get captured, the whole thing fails. But if those institutions are going to get power anyway, this seems like one of the better uses of that power!
Anti-avoidance rules would be essential: synthetic data should inherit provenance obligations from upstream models and datasets; related-party data licenses should face transfer-pricing-style scrutiny; covered revenue should attach to deployment and monetization jurisdiction, not only training location; and open-source or research releases should not automatically exempt closed downstream monetization.
Attribution will always be imperfect. Capability can come from many things (data, compute, algorithmic progress, post-training tricks, scaffolding, inference-time compute, distribution, interface design, and more).
Eventually, the base assumption about capability as a proxy for data dependence might break.
The policy would need careful safe-harbor handling for small, open, nonprofit, and public-interest uses, as well as existing reciprocal commons arrangements.
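One of the anti-avoidance rules above, that synthetic data should inherit provenance obligations from upstream models and datasets, can be sketched as a walk over a provenance graph. This is a minimal sketch with hypothetical dataset names; it assumes a strict all-or-nothing rule (a derived dataset counts as explained only if every root source it descends from is explained), whereas a real scheme might prorate instead.

```python
def inherits_explained_status(name, upstream, explained_roots, _path=frozenset()):
    """Return True only if every root source that `name` derives from is explained.

    upstream: mapping of dataset name -> list of datasets/models it was derived from.
    explained_roots: set of root sources already accepted as explained data.
    """
    if name in _path:
        raise ValueError(f"cycle in provenance graph at {name!r}")
    parents = upstream.get(name, [])
    if not parents:
        # A root source: explained only if the auditor has accepted it.
        return name in explained_roots
    path = _path | {name}
    return all(
        inherits_explained_status(parent, upstream, explained_roots, path)
        for parent in parents
    )


# Made-up provenance graph: one synthetic set is tainted by an unexplained crawl.
upstream = {
    "synthetic_math_v2": ["teacher_model_outputs"],
    "teacher_model_outputs": ["licensed_textbooks", "web_crawl"],
    "synthetic_code_v1": ["licensed_code_corpus"],
}
explained_roots = {"licensed_textbooks", "licensed_code_corpus"}

math_ok = inherits_explained_status("synthetic_math_v2", upstream, explained_roots)
code_ok = inherits_explained_status("synthetic_code_v1", upstream, explained_roots)
print(math_ok, code_ok)
```

Under this rule, routing commons data through a teacher model does not launder it: the synthetic output stays unexplained until the upstream crawl is accounted for.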

But the overall direction here seems very promising to me. Please let me know what you think. As time permits (and conditional on your feedback -- is this too crazy, too strong of an assumption, redundant with existing proposals, etc.), I hope to put together a more whitepaper-looking version of this blog post to socialize the idea.

Notes and Links for Inline Hyperlinks

Sanders's recent AI dividend proposal - the motivating op-ed calling for a public ownership stake in large AI companies.
Our 2021 data dividends report - the earlier report this post updates, especially its data-dependence tax and coarse disbursement framing.
Attestation across the AI supply chain - background on why provenance, disclosure, and institutional verification matter for AI governance.
The AI evaluation crisis is an opportunity - related argument for treating evaluation infrastructure as a governance opportunity.
Clear Data Rules - related argument that clearer rules can benefit both data creators and AI companies.
NBER working paper on AI-related taxes and sovereign wealth funds - economics background on AI-related tax instruments, growth, investment, diffusion, and wealth-fund design.
AP overview of Sanders, Altman, and Trump discussing public ownership in AI - mainstream coverage showing how quickly the public-ownership idea became a cross-ideological political conversation.
Washington Post editorial board critique of government stakes in AI companies - mainstream editorial skepticism about public equity stakes and government ownership.
Reason critique of the proposed AI wealth fund - libertarian/pro-market critique of the stock-tax and voting-share design.
Cato critique of Sanders's proposal and Trump-era government-ownership precedents - pro-market critique that links the Sanders proposal to broader concerns about state ownership and corporate control.
Fortune coverage of David Sacks's warning about AI nationalization - tech-policy/industry-facing coverage of criticism from David Sacks.
EU AI Code of Practice - example of existing or proposed data-transparency obligations that could overlap with the reporting burden discussed here.
Frontier AI auditing ecosystem - institutional model for independent auditing capacity that could support capability measurement and provenance review.
Wikimedia Enterprise - useful example of high-volume commercial reuse supporting a commons institution.
Data Provenance Initiative paper - useful on data provenance tracking, dataset licensing, attribution, and why “show receipts” is not totally imaginary.
Kaplan et al. scaling laws - useful background for the idea that performance can be modeled as a function of data, model size, and compute.
Data mixing work - useful background for thinking about data mixture experiments and how data composition affects model performance.

Other Notes and Links

OpenAI, “Industrial Policy for the Intelligence Age” - April 2026 policy proposal from inside an AI lab that discusses public wealth funds, social safety nets, industrial policy, and AI-economy redistribution.
Sam Altman, “Moore’s Law for Everything” - 2021 antecedent for AI-funded public dividends, an American Equity Fund, and broad-based ownership of AI-driven wealth.
California AB 2013 - training-data transparency law requiring certain generative AI developers to publish documentation about training data on or before January 1, 2026.
U.S. Copyright Office, “Copyright and Artificial Intelligence Part 3: Generative AI Training” - May 2025 report on fair use, licensing, market effects, and copyright policy for generative AI training.
AP coverage of the Bartz v. Anthropic 2025 ruling and settlement approval - useful legal backdrop for distinguishing fair-use analysis of training from liability tied to pirated acquisition.
The Windfall Clause - classic AI benefits-sharing proposal that is voluntary and profit-triggered, useful as a contrast with tax- and provenance-triggered approaches.
Longpre et al., “Consent in Crisis: The Rapid Decline of the AI Data Commons” - evidence that access to the open web/data commons is becoming more restricted and contested.
Foundation Model Transparency Index and the 2025 update - empirical support for the claim that foundation-model transparency, including around data, remains limited.

Bax, Eric. 2019. “Computing a Data Dividend.” ACM Economics & Computation 2019 (EC ’19), poster presentation. arXiv:1905.01805. https://doi.org/10.48550/arXiv.1905.01805

Wadhwa, Tarun. 2020. “Economic Impact and Feasibility of Data Dividends.” Data Catalyst Institute white paper.

Vincent, Nicholas, Yichun Li, Renee Zha, and Brent Hecht. 2019. “Mapping the Potential and Pitfalls of ‘Data Dividends’ as a Means of Sharing the Profits of Artificial Intelligence.” arXiv:1912.00757 [cs.CY]. https://doi.org/10.48550/arXiv.1912.00757

Vincent, Nicholas, and Brent Hecht. 2023. “Sharing the Winnings of AI with Data Dividends: Challenges with ‘Meritocratic’ Data Valuation.” EAAMO ’23 Poster Track, Boston, MA, USA, October 30-November 1, 2023. Non-archival.

Source revision history

Selected Git commits that changed this source file.

9fb4674b8a 2026-07-12 - Migrate blog into digital presence monorepo

Source and AT Protocol record

Source path
content/writing/posts/2026-06-07-presumptive-commons-rent-tax-ai-dividends.md

AT Protocol URI
at://did:plc:doxvahqvyhyqf32v7wz7p5xk/site.standard.document/3mnqonrblbhwi

Local AT Protocol-shaped preview used to inspect the record before an exact public cache is refreshed.

{
  "note": "Local AT Protocol-shaped preview. Run `make garden-refresh-atproto` to cache exact public records where available.",
  "sourcePath": "content/writing/posts/2026-06-07-presumptive-commons-rent-tax-ai-dividends.md",
  "uri": "at://did:plc:doxvahqvyhyqf32v7wz7p5xk/site.standard.document/3mnqonrblbhwi",
  "value": {
    "$type": "site.standard.document",
    "title": "AI Dividends Without Taxing Compute, Automation, or Equity: A Presumptive Commons-Rent Tax Based on Capabilities and Data Dependence",
    "description": "A proposal for updating data dividend ideas around capability measurement, data provenance, and AI auditing.",
    "publishedAt": "2026-06-07",
    "site": "at://did:plc:doxvahqvyhyqf32v7wz7p5xk/site.standard.publication/3lzrsw2kvwc2m",
    "content": {
      "$type": "at.markpub.markdown",
      "text": "This post will capture some fresh thoughts on how Bernie Sanders's recent [AI dividend proposal][sanders-ai-dividend] connects with suggestions from our 2021 [data dividends report][data-dividend-report]. I'll describe how my recent interest in [attestation across the AI supply chain][attestation], [the AI evaluation crisis][evaluation-crisis], and [Clear Data Rules][clear-data-rules] connects to the topic of data dividends.\n\nMore specifically, I have a concrete proposal for how a data-dependence-tax-based data dividend can be updated to the 2026 context by implementing it as a \"presumptive commons-rent tax that can be avoided by providing capabilities-to-data attribution.\" Instead of trying to come up with some proxy for data dependence to rank different companies in terms of how much data they use (e.g. by counting their users, auditing the volume of data within their organizational databases, etc.), we instead start from the assumption that more capable AI systems draw more heavily on the data commons (broadly construed) that humanity has built. AI companies can show evidence that they used licensed or governed data to achieve certain capabilities, and the ecosystem of international AI auditing organizations would work together to verify these claims and lower their tax burden.\n\nIf something like this were implemented, we would either end up in a world where companies profiting from highly capable data-dependent systems (which need not be the labs themselves) pay a large amount of aggregate funds into various national funds (or ideally a single internationally governed fund) or a world where the vast majority of upstream data flows through healthy data markets where data creators have the collective leverage necessary to get paid through a mixture of upfront and royalty payments.\n\nOf course, there are a lot of details to be worked out. 
The main goal of this post is to paint a picture of this policy direction with broad strokes while starting to get specific about the research program needed to implement something like this.\n\nTo expand on the idea, let's begin with some context.\n\n## Brief history on data dividends research in 2021\n\nIn the 2021 report, we analyzed a variety of possible fundraising and disbursement mechanisms for a “data dividend” (which was being discussed by Governor Newsom of California at the time). While we were not aiming to pick a single answer, our \"likely good first step\" suggestion was a data dependence tax to fund public goods. To make \"data dependence\" operational, we suggested using user count as a proxy: firms with lots of users are probably getting lots of value from aggregated user/public data.\n\nSome other notable works on data dividends around that time include:\n- [Bax's computational treatment](https://arxiv.org/pdf/1905.01805) of individual and grouped data dividends using Shapley and Owen values\n- [Wadhwa's Data Catalyst review](https://datacatalyst.org/wp-content/uploads/2020/06/Economic-Impact-and-Feasibility-of-Data-Dividends-1.pdf) of the economic impact and feasibility of data dividends\n- I contributed to this [preprint](https://arxiv.org/abs/1912.00757) on data dividend design choices and this later [EAAMO poster paper](http://www.nickmvincent.com/static/eaamo_data_dividends.pdf) on the challenges of “meritocratic” data valuation for dividends\n\nAnother key idea from the report was that, in the context of retroactive dividends (as opposed to forward-looking markets), it is probably best to avoid “fine-grained valuation” (e.g., just trying to write individual-specific checks). 
Of course, in the context of data markets, it could still make sense in some cases to price both individual data points and collective bundles.\n\nIn short: for a dividend, we should tax dependence on collective data and disburse it coarsely while we figure out better valuation and interpretability methods. I think the reasoning from that report holds up pretty well in light of AI progress. I also think it’s notable that the motivation described in the Sanders proposal matches the arguments in our original report pretty closely.\n\nHowever, I also think we should be sensitive to concerns from economists about the impacts of compute taxes, automation taxes, capital taxes, etc., such as those discussed in a recent [NBER working paper on AI-related taxes and sovereign wealth funds][nber-ai-taxes]. After the Sanders op-ed went out, the idea quickly drew a cross-ideological mix of interest, skepticism, and pushback: [AP covered][ap-ai-public-ownership] the Sanders/Altman/Trump public-ownership conversation, while the [Washington Post editorial board][washpost-sanders-ai-stake], [Reason][reason-sanders-ai-wealth], [Cato][cato-trump-sanders-swf], and [Fortune's coverage of takes from David Sacks][fortune-sacks-ai-equity] captured pro-market, libertarian, and tech policy critiques of government equity stakes in AI companies. The economics concern, summarized: depending on design, an AI dividend tax could have negative effects on growth, investment, diffusion, etc.\n\n## A tax that targets \"commons extraction\"?\n\nIf we can avoid it, we might want to avoid explicitly targeting “AI,” “compute,” or “automation.” _Instead, the thing we actually want to target is private value extraction from the commons._\n\nSomething that’s complicated about foundation models is that they depend on several categories of data that are common-ish. 
This includes literal, governed digital commons like Wikipedia; public-domain works; commons-y public-web resources like Common Crawl; open-source code; and more implicit commons like click data, trace data, public posts, interaction data, and social graphs.\n\n## Capability as a proxy for data dependence (and presumed commons dependence)\n\nI think there’s actually an easy fix to target extraction from the commons: take a \"rebuttable presumption of data-dependence\" approach to the existence of powerful AI. Currently (and barring a major paradigm shift in AI) all of humanity’s approaches to building powerful AI are data-dependent. Even approaches that use reinforcement learning or synthetic data still have massive data dependencies in the overall training and evaluation pipeline needed to build an AI system.\n\nInstead of using user count or something else as a primary proxy for data dependence, we might consider using capability itself. We would basically be working on a default assumption that if an AI system is very capable and monetized, a meaningful chunk of that value came from commons data. I think this assumption is currently very justified and will remain so in the near term.\n\nThere are several reasonable ways we can coarsely estimate the fraction of value attributable to data versus the value attributable to compute, non-data technical progress, interface progress, and other factors. We just need to pick some number. 50%, which happens to appear in the Sanders proposal, might be a reasonable starting placeholder. The more capable a system is, the more burden a company should face in explaining how it got so good.\n\nThus, we could iterate on various data dividends proposals to design what we might call a “presumptive commons-rent tax.” When a firm makes money from a powerful AI system, we presume some share of the rent came from public/user/commons data. 
AI operators can lower their presumptive commons-rent tax by showing that capabilities came from data that was acquired under non-commons conditions, e.g., licensed data purchased via a healthy data market.\n\nAs a toy example, suppose a highly capable AI system earns $10B in annual rents and the presumptive commons-rent share is set at 50%. If the operator can substantiate that half of its capability-relevant data contribution came from licensed, governed, or reciprocal sources, the taxable commons-rent base might fall from $5B to $2.5B.\n\n## How evidence of data use would lower the tax burden\n\nWe would need a clean accounting scheme here, with a possible unit being **explained data**. Explained data would estimate the share of a model’s effective, capability-relevant data contribution that the company can actually account for. To count as tax-reducing explained data, a data source would need to be documented, have provenance and proof of fair acquisition (e.g., because it was licensed, bought under contract, or similar), and plausibly relevant to the capabilities being taxed. 
Critically, the tax rate would depend on the capability level achieved by a model, so more capable models would require more explained data, in accordance with our scientific understanding of [scaling laws][kaplan-scaling-laws] and training data attribution.\n\nPreparing such evidence would look something like this:\n\n- first, a company profiting from AI models (an important qualifier -- more below, though working through details here will extend beyond this \"long blog post\" format) prepares a datasheet\n- second, each entry in the datasheet would be labeled with an acquisition/governance status (licensed, internally generated, public-domain, governed by a data union/trust, etc.)\n- third, provenance evidence would be collected to support the acquisition/governance classifications\n- fourth, usage evidence shows how much each data component was actually used (could include details about mixture fractions, sampling rates, repetition, deduplication, training stage, post-training role, eval role, upstream sources for synthetic or RL data -- ultimately this evidence will be reviewed by AI auditor organizations, so it does not have to be 100% standardized and there can be some flexibility for different model types)\n- fifth, just as datasheet entries would be linked to provenance evidence, usage entries would be linked to ablations/[data mixture experiments][data-mixing-work] to show how those data components actually mattered for the relevant capabilities\n\nFor a first implementation, we need not rely on a single precise definition of “accepted explained-data points.” A more ready-to-use version could just use evidence tiers as determined by the adjudicating entities.\n\nModel capability comes from a production process involving compute, model size, data quantity, data quality, interface and tool access, etc. Capability measurement would be used to set a default presumed tax rate. 
Companies could present data details to reduce their tax burden, and an auditor (or a network of auditing organizations) would convert the evidence into \"accepted explained-data points\" to determine a final rate.\n\nThis could create a good set of incentives:\n\n- if companies want lower taxes, they should build datasheets and [provenance systems][data-provenance-initiative] from the start\n- if they want larger reductions, they need to run and share data-centric scientific experiments\n- if they rely heavily on commons data, they can still do that, but they should pay something back or give something back\n- the tax is fully avoidable!\n\nImportantly, everything described above would basically involve preparing a report that would look a lot like something required by existing or proposed data transparency laws, such as the [EU AI Code of Practice][eu-ai-code]. This is close to something AI companies might need to do anyway!\n\n## What does this tax really target?\n\nBut wait a second -- if the whole concern around taxing compute or automation is that \"we shouldn't tax stuff that we want more of,\" isn't this potentially even worse than those other taxes, if we interpret this proposal as a tax on intelligence itself or capability itself? Critically, this is not a tax on intelligence or capability, but rather a tax on unexplained or mysterious capability. If an AI operator trains on 100% licensed/accounted-for data, and can show that this data actually drove the relevant capabilities, their commons tax is near zero.\n\nIf you train on Wikipedia, Common Crawl, public code, user traces, etc., you would end up paying some reasonable tax back to the commons (and the tax might also be reduced if you show evidence of, e.g., contributing to something like [Wikimedia Enterprise][wikimedia-enterprise], or making in-kind contributions of data, model weights, gold standard generated code, etc.). 
Ideally, during any kind of transition period, there would be a way to convert existing reciprocity programs into tax credits as well. And perhaps reciprocity programs could just be integrated into the tax scheme in the long term.\n\n## Auditing and enforcement\n\nHow would this be enforced? This is where the recent momentum around auditing and safety comes in. Capability measurement -- along with assessment of the ablations and of whether AI operators can provide plausible high-level accounts of how data choices drive capabilities -- could be handled by an ecosystem of independent auditing institutions, along the lines of the [frontier AI auditing ecosystem][frontier-ai-auditing].\n\nThe ecosystem of auditing orgs would become part of the infrastructure for data governance: measuring capabilities, reviewing provenance, looking at ablations, etc. This would also get companies to contribute to advancing and sharing science about where model capabilities come from, in turn helping the auditing organizations.\n\nCritically, by looping in auditing and safety organizations, this proposal could also take advantage of the fact that AI safety is one area in which there is a plausible path to international cooperation. In fact, I think this may be one of the few plausible ways to at least lay out a path toward a global wealth fund rather than various national funds and sovereign-focused economic interventions.\n\nA single global wealth fund is morally attractive, because the data commons is transnational, but the more realistic path may be federated: national or regional AI commons funds collect revenue, while treaty or club arrangements allocate some share to global public goods and commons institutions.\n\nOf course, we likely would not want independent AI auditors to be burdened with global taxation responsibility (nor would they likely want a bunch of extra work). 
Public tax authorities would still set the rules and then accredited auditors (with proportionate support to hire staff to do all this) could review evidence while a public technical board maintains standards. Courts or administrative tribunals would handle disputes.\n\nCompared to compute and automation taxes, I believe this kind of approach would avoid some of the concerns raised by economists, and instead target a specific harm: companies turning collective human activity and public knowledge into private rents without proportionate return.\n\nIn the current world, this would mean that AI companies would pay a bunch of taxes, which then might fuel, e.g., a national wealth fund, or ideally a global wealth fund. But it also creates a path to lower the burden: license data, work with data unions/trusts, document provenance, run ablations, or give value back to the commons.\n\n## Many open questions\n\nOf course, things that seem simple are often laundering complexity. To highlight a few critical questions and considerations:\n\n- Who classifies exactly what systems count as frontier AI, and exactly which organizations are subject to this tax?\n- The very challenging task of accurate capabilities measurement becomes pivotal -- but we need to figure this out anyway!\n- Power dynamics within the ecosystem of evaluation institutions will be critical to success. If they get captured, the whole thing fails. But if those institutions are going to get power anyway, this seems like one of the better uses of that power!\n- Anti-avoidance rules would be essential: synthetic data should inherit provenance obligations from upstream models and datasets; related-party data licenses should face transfer-pricing-style scrutiny; covered revenue should attach to deployment and monetization jurisdiction, not only training location; and open-source or research releases should not automatically exempt closed downstream monetization.\n- Attribution will always be imperfect. 
Capability can come from many things (data, compute, algorithmic progress, post-training tricks, scaffolding, inference-time compute, distribution, interface design, and more).\n- Eventually, the base assumption about capability as a proxy for data dependence might break.\n- The policy would need careful safe-harbor handling for small, open, nonprofit, and public-interest uses, as well as existing reciprocal commons arrangements.\n\nBut the overall direction here seems very promising to me. Please let me know what you think. As time permits (and conditional on your feedback -- is this too crazy, too strong an assumption, redundant with existing proposals, etc.), I hope to put together a more whitepaper-looking version of this blog post to socialize the idea.\n\n## Notes and Links for Inline Hyperlinks\n\n- [Sanders's recent AI dividend proposal][sanders-ai-dividend] - the motivating op-ed calling for a public ownership stake in large AI companies.\n- [Our 2021 data dividends report][data-dividend-report] - the earlier report this post updates, especially its data-dependence tax and coarse disbursement framing.\n- [Attestation across the AI supply chain][attestation] - background on why provenance, disclosure, and institutional verification matter for AI governance.\n- [The AI evaluation crisis is an opportunity][evaluation-crisis] - related argument for treating evaluation infrastructure as a governance opportunity.\n- [Clear Data Rules][clear-data-rules] - related argument that clearer rules can benefit both data creators and AI companies.\n- [NBER working paper on AI-related taxes and sovereign wealth funds][nber-ai-taxes] - economics background on AI-related tax instruments, growth, investment, diffusion, and wealth-fund design.\n- [AP overview of Sanders, Altman, and Trump discussing public ownership in AI][ap-ai-public-ownership] - mainstream coverage showing how quickly the public-ownership idea became a cross-ideological political conversation.\n- 
[Washington Post editorial board critique of government stakes in AI companies][washpost-sanders-ai-stake] - mainstream editorial skepticism about public equity stakes and government ownership.\n- [Reason critique of the proposed AI wealth fund][reason-sanders-ai-wealth] - libertarian/pro-market critique of the stock-tax and voting-share design.\n- [Cato critique of Sanders's proposal and Trump-era government-ownership precedents][cato-trump-sanders-swf] - pro-market critique that links the Sanders proposal to broader concerns about state ownership and corporate control.\n- [Fortune coverage of David Sacks's warning about AI nationalization][fortune-sacks-ai-equity] - tech-policy/industry-facing coverage of criticism from David Sacks.\n- [EU AI Code of Practice][eu-ai-code] - example of existing or proposed data-transparency obligations that could overlap with the reporting burden discussed here.\n- [Frontier AI auditing ecosystem][frontier-ai-auditing] - institutional model for independent auditing capacity that could support capability measurement and provenance review.\n- [Wikimedia Enterprise][wikimedia-enterprise] - useful example of high-volume commercial reuse supporting a commons institution.\n- [Data Provenance Initiative paper][data-provenance-initiative] - useful on data provenance tracking, dataset licensing, attribution, and why “show receipts” is not totally imaginary.\n- [Kaplan et al. 
scaling laws][kaplan-scaling-laws] - useful background for the idea that performance can be modeled as a function of data, model size, and compute.\n- [Data mixing work][data-mixing-work] - useful background for thinking about data mixture experiments and how data composition affects model performance.\n\n## Other Notes and Links\n\n- [OpenAI, “Industrial Policy for the Intelligence Age”][openai-industrial-policy] - April 2026 policy proposal from inside an AI lab that discusses public wealth funds, social safety nets, industrial policy, and AI-economy redistribution.\n- [Sam Altman, “Moore’s Law for Everything”][moores-law-for-everything] - 2021 antecedent for AI-funded public dividends, an American Equity Fund, and broad-based ownership of AI-driven wealth.\n- [California AB 2013][california-ab-2013] - training-data transparency law requiring certain generative AI developers to publish documentation about training data on or before January 1, 2026.\n- [U.S. Copyright Office, “Copyright and Artificial Intelligence Part 3: Generative AI Training”][copyright-office-ai-training] - May 2025 report on fair use, licensing, market effects, and copyright policy for generative AI training.\n- [AP coverage of the Bartz v. 
Anthropic 2025 ruling][bartz-anthropic-ruling] and [settlement approval][bartz-anthropic-settlement] - useful legal backdrop for distinguishing fair-use analysis of training from liability tied to pirated acquisition.\n- [The Windfall Clause][windfall-clause] - classic AI benefits-sharing proposal that is voluntary and profit-triggered, useful as a contrast with tax- and provenance-triggered approaches.\n- [Longpre et al., “Consent in Crisis: The Rapid Decline of the AI Data Commons”][consent-in-crisis] - evidence that access to the open web/data commons is becoming more restricted and contested.\n- [Foundation Model Transparency Index][foundation-model-transparency-index] and [the 2025 update][foundation-model-transparency-index-2025] - empirical support for the claim that foundation-model transparency, including around data, remains limited.\n\n## Related Data Dividend References\n\nBax, Eric. 2019. “[Computing a Data Dividend](https://arxiv.org/pdf/1905.01805).” *ACM Economics & Computation 2019 (EC ’19)*, poster presentation. arXiv:1905.01805. https://doi.org/10.48550/arXiv.1905.01805\n\nWadhwa, Tarun. 2020. “[Economic Impact and Feasibility of Data Dividends](https://datacatalyst.org/wp-content/uploads/2020/06/Economic-Impact-and-Feasibility-of-Data-Dividends-1.pdf).” Data Catalyst Institute white paper.\n\nVincent, Nicholas, Yichun Li, Renee Zha, and Brent Hecht. 2019. “[Mapping the Potential and Pitfalls of ‘Data Dividends’ as a Means of Sharing the Profits of Artificial Intelligence](https://arxiv.org/abs/1912.00757).” arXiv:1912.00757 [cs.CY]. https://doi.org/10.48550/arXiv.1912.00757\n\nVincent, Nicholas, and Brent Hecht. 2023. “[Sharing the Winnings of AI with Data Dividends: Challenges with ‘Meritocratic’ Data Valuation](http://www.nickmvincent.com/static/eaamo_data_dividends.pdf).” *EAAMO ’23 Poster Track*, Boston, MA, USA, October 30-November 1, 2023. 
Non-archival.\n\n[sanders-ai-dividend]: https://www.sanders.senate.gov/op-eds/the-public-should-own-half-of-the-big-a-i-companies/\n[data-dividend-report]: https://www.nickmvincent.com/static/Data-Dividend_final.pdf\n[attestation]: https://dataleverage.substack.com/p/attestation-across-the-ai-supply\n[evaluation-crisis]: https://dataleverage.substack.com/p/the-ai-evaluation-crisis-is-an-opportunity\n[clear-data-rules]: https://dataleverage.substack.com/p/almost-everybody-including-both-data\n[eu-ai-code]: https://digital-strategy.ec.europa.eu/en/policies/ai-code-practice\n[frontier-ai-auditing]: https://www.averi.org/ourwork/frontier-ai-auditing\n[nber-ai-taxes]: https://www.nber.org/papers/w34873\n[ap-ai-public-ownership]: https://apnews.com/article/sam-altman-ai-bernie-sanders-trump-public-ownership-772224f9cd138eb79d3ef3336858a5d5\n[washpost-sanders-ai-stake]: https://www.washingtonpost.com/opinions/2026/06/03/bernie-sanders-wants-government-stake-ai-companies/\n[reason-sanders-ai-wealth]: https://reason.com/2026/06/02/bernie-sanders-ai-wealth-fund-bill-shows-that-he-doesnt-understand-ai-or-wealth/\n[cato-trump-sanders-swf]: https://www.cato.org/blog/trump-opened-door-sanderss-sovereign-wealth-fund\n[fortune-sacks-ai-equity]: https://fortune.com/2026/06/06/former-ai-czar-david-sacks-bernie-sanders-bill-government-equity-stupidity-tax-nationalization-trump-public-stakes/\n[wikimedia-enterprise]: https://wikimediafoundation.org/news/2021/10/25/wikimedia-foundation-launches-wikimedia-enterprise-the-new-opt-in-product-for-companies-and-organizations-to-easily-reuse-content-from-wikipedia-and-wikimedia-projects/\n[data-provenance-initiative]: https://www.nature.com/articles/s42256-024-00878-8\n[kaplan-scaling-laws]: https://arxiv.org/abs/2001.08361\n[data-mixing-work]: https://arxiv.org/abs/2403.16952\n[openai-industrial-policy]: https://openai.com/index/industrial-policy-for-the-intelligence-age/\n[moores-law-for-everything]: 
https://moores.samaltman.com/\n[california-ab-2013]: https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202320240AB2013\n[copyright-office-ai-training]: https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf\n[bartz-anthropic-ruling]: https://apnews.com/article/1e5cece51c2e4bd0bb21d94de2abb035\n[bartz-anthropic-settlement]: https://apnews.com/article/9643064e847a5e88ef6ee8b620b3a44c\n[windfall-clause]: https://arxiv.org/abs/1912.11595\n[consent-in-crisis]: https://arxiv.org/abs/2407.14933\n[foundation-model-transparency-index]: https://crfm.stanford.edu/fmti/May-2024/index.html\n[foundation-model-transparency-index-2025]: https://arxiv.org/abs/2512.10169\n"
    }
  }
}