10  A Record-Generation Model

Status: first draft complete.

This section defines a compact model for how interactions become public records. The goal is a small, measurable pipeline you can use for product design and metrics without locking into a single implementation.

10.1 Notation and Objects

We define symbols once and reuse them:

Symbol   Meaning
X        system or deployment where interactions happen; time is discrete, indexed by event
i        interaction index
x_i      interaction context (agent, UI, model, route, timestamp)
o_i      origin, in {natural, prompted}
r_i      record type, in the set R (e.g., chat, feedback, label, summary)
K        set of publish channels (e.g., git, huggingface, bluesky, site)

10.2 Pipeline States (per interaction)

Each interaction can move through a simple chain of stages. Every state variable is binary except M_i, which is categorical.

  • E_i: eligible under policy (type, scope, age).
  • C_i: captured locally (structured log or draft exists).
  • S_i: share prompt shown (only if the product prompts).
  • A_i: authorization or consent to publish (may include license + name).
  • T_i: transform or sanitize succeeded (PII checks, schema, hashing).
  • P_i^k: published to channel k ∈ K (accepted or merged state).
  • M_i: workflow state for PR-style channels in {open, merged, closed, tombstoned}.

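Reading left to right, the pipeline is: E -> C -> S -> A -> T -> P. If any gating stage (E, C, A, T) is 0, the record cannot be published to any channel. S_i is not a gate on its own, since it applies only when the product prompts; it acts through the consent stage A_i.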
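
The chain can be sketched as a small data structure. This is a minimal illustration, not a prescribed implementation: the class name, field names, and helper are hypothetical. It leaves S_i out of the gate because the publish formula in 10.5 has no factor for it, which is an interpretive assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PipelineState:
    """Binary pipeline states for one interaction (hypothetical field names)."""
    eligible: bool = False      # E_i
    captured: bool = False      # C_i
    prompt_shown: bool = False  # S_i (recorded, but not treated as a gate here)
    authorized: bool = False    # A_i
    transformed: bool = False   # T_i
    published: Dict[str, bool] = field(default_factory=dict)  # P_i^k per channel k


def first_blocking_stage(state: PipelineState) -> Optional[str]:
    """Return the first gating stage that is 0, or None if the record may be published."""
    gates = (
        ("E", state.eligible),
        ("C", state.captured),
        ("A", state.authorized),
        ("T", state.transformed),
    )
    for name, passed in gates:
        if not passed:
            return name
    return None
```

Calling first_blocking_stage on a captured but unauthorized interaction returns "A", which is a quick way to see where a given record stopped.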

10.3 Record Metadata

When an interaction is publishable, attach:

  • ψ(x_i) (psi): schema mapping from interaction to structured payload.
  • L_i: license (e.g., CC0, CC-BY, CC-BY-SA).
  • U_i: AI-use preference (e.g., train-genai=n;exceptions=cc-cr).
  • name_i: attribution string (username, pseudonym, anonymous).
  • uid_i: stable contribution id.

10.4 Policy Knobs and Rates

Let θ collect product and policy settings (logging mode, prompt rules, defaults). Define the following rates; each p is conditional on the preceding gating stage having succeeded:

  • q_i = Pr[S_i=1 | x_i, θ]: prompt rate.
  • p_c(i) = Pr[C_i=1 | E_i=1, x_i, θ]: capture rate.
  • p_a(i) = Pr[A_i=1 | C_i=1, x_i, S_i, θ]: consent rate.
  • p_t(i) = Pr[T_i=1 | A_i=1, x_i, θ]: transform success rate.
  • p_k(i) = Pr[P_i^k=1 | T_i=1, x_i, θ]: channel acceptance or merge rate.

10.5 Publish Probability

For a single channel k:

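Pr[P_i^k=1 | x_i, S_i, θ] = 1[E_i] · p_c(i) · p_a(i) · p_t(i) · p_k(i)

Here, 1[E_i] is an indicator: it is 1 when the interaction is eligible and 0 otherwise. The product form follows from the chain rule when each rate is read as conditional on the preceding stage having succeeded. The prompt rate q_i does not appear as a factor because S_i enters only through p_a(i).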

For multiple channels, the publish vector is P_i = (P_i^k)_{k∈K}.
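
As a rough sketch, the product above can be computed directly. The function names and the dictionary layout for per-channel rates are assumptions made for illustration; each entry of the resulting vector is a marginal probability, and nothing here models dependence between channels.

```python
from typing import Dict


def publish_probability(eligible: bool, p_c: float, p_a: float,
                        p_t: float, p_k: float) -> float:
    """Pr[P_i^k = 1] as the product 1[E_i] * p_c * p_a * p_t * p_k."""
    for name, p in (("p_c", p_c), ("p_a", p_a), ("p_t", p_t), ("p_k", p_k)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    indicator = 1.0 if eligible else 0.0  # 1[E_i]
    return indicator * p_c * p_a * p_t * p_k


def publish_vector(eligible: bool, p_c: float, p_a: float, p_t: float,
                   p_k_by_channel: Dict[str, float]) -> Dict[str, float]:
    """Per-channel publish probabilities; only p_k differs across channels."""
    return {
        channel: publish_probability(eligible, p_c, p_a, p_t, p_k)
        for channel, p_k in p_k_by_channel.items()
    }
```

For example, with p_c = 0.9, p_a = 0.5, p_t = 0.8 and p_k = 0.5 for a git channel, the product is about 0.18; an ineligible interaction yields 0 for every channel.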

10.6 Retraction and Lifecycle

After publication, items can be retracted. Model this with:

  • D_i^k(t) ∈ {0,1}: item i is withdrawn from channel k at time t.
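  • h_d(i,k): hazard of retraction, i.e. the probability that item i is withdrawn from channel k at the next time step, given that it is still published.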
  • M_i: tracks the visible lifecycle state.
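
To make the hazard concrete, here is a minimal sketch under a simplifying assumption that the section does not state: the hazard is constant per time step. Under that assumption the probability that an item is still published after n steps is (1 - h)^n. The function names are hypothetical.

```python
def survival_probability(hazard: float, steps: int) -> float:
    """Probability an item is still published after `steps` time steps,
    assuming a constant per-step retraction hazard (an assumption)."""
    if not 0.0 <= hazard <= 1.0:
        raise ValueError("hazard must lie in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - hazard) ** steps


def live_probability(publish_prob: float, hazard: float, steps: int) -> float:
    """Probability an item was published and has not been retracted after `steps`."""
    return publish_prob * survival_probability(hazard, steps)
```

A time-varying or item-specific hazard would replace the power with a product of per-step survival terms.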

10.7 Prompted vs. Natural

Origin affects consent. For prompted interactions (o_i = prompted), q_i is higher by design, and p_a(i) typically rises as well because consent is requested at the right time. Natural interactions can still be captured (C_i=1) without being published (A_i=0).

10.8 Minimal Schema

ψ(x_i) should at least include:

  • uid, type, created_at, route, model.
  • content (messages or summary) and optional feedback.
  • license=L_i, ai_use=U_i, attribution=name_i.
  • provenance: source=o_i, prompt_shown=S_i, checks (PII flags), hashes.
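
One way the payload could be assembled is sketched below. The function name, argument names, and nesting are illustrative assumptions; only the field list comes from the bullets above. Hashing the canonical JSON of the content with SHA-256 is likewise one possible choice for the hashes entry, not a requirement of the model.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def build_record(uid: str, record_type: str, created_at: str, route: str,
                 model: str, content: Any, license_id: str, ai_use: str,
                 attribution: str, origin: str, prompt_shown: bool,
                 pii_flags: List[str],
                 feedback: Optional[str] = None) -> Dict[str, Any]:
    """Build a minimal structured payload for one interaction (sketch)."""
    if origin not in ("natural", "prompted"):
        raise ValueError("origin must be 'natural' or 'prompted'")
    # Canonical JSON so the same content always yields the same hash.
    canonical = json.dumps(content, sort_keys=True, ensure_ascii=False)
    content_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "uid": uid,
        "type": record_type,
        "created_at": created_at,
        "route": route,
        "model": model,
        "content": content,
        "feedback": feedback,
        "license": license_id,       # L_i
        "ai_use": ai_use,            # U_i
        "attribution": attribution,  # name_i
        "provenance": {
            "source": origin,              # o_i
            "prompt_shown": prompt_shown,  # S_i
            "checks": {"pii_flags": pii_flags},
            "hashes": {"sha256": content_hash},
        },
    }
```

Keeping provenance in its own nested object makes it easy to audit consent and sanitization separately from the content itself.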

10.9 Metrics and Levers

  • Throughput for window W: expected published count Λ_k = E[∑_{i∈W} P_i^k] = ∑_{i∈W} Pr[P_i^k=1], by linearity of expectation.
  • Stage yield: estimate p_c, p_a, p_t, p_k from logs to locate bottlenecks.
  • Quality gates: raise or lower p_t with validators; adjust p_k via review policy.
  • UX levers: adjust q_i (when to prompt) to change p_a(i) without over-prompting users.
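
Stage yields and throughput can be estimated from logs along these lines. The log layout is hypothetical: each row is assumed to be a dict with 0/1 entries under the keys "E", "C", "A", "T", "P" for a single channel. Each yield is computed among the interactions that passed the previous stage, so the yields multiply back to the overall published fraction.

```python
from typing import Dict, Iterable, List, Optional

STAGES = ["E", "C", "A", "T", "P"]


def stage_yields(rows: Iterable[Dict[str, int]]) -> Dict[str, Optional[float]]:
    """Empirical conditional yield of each stage given the previous one passed.
    A yield is None when no interaction reached that stage."""
    survivors: List[Dict[str, int]] = list(rows)
    yields: Dict[str, Optional[float]] = {}
    for stage in STAGES:
        passed = [r for r in survivors if r.get(stage, 0) == 1]
        yields[stage] = len(passed) / len(survivors) if survivors else None
        survivors = passed
    return yields


def expected_throughput(publish_probs: Iterable[float]) -> float:
    """Lambda_k for a window: the sum of per-interaction publish probabilities."""
    return sum(publish_probs)
```

The stage with the lowest yield is the first place to look for a bottleneck, although a low yield may be intended (for example, a strict review policy).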

10.10 Practical Mapping

  • OpenWebUI action: sets S_i=1, collects A_i, builds ψ(x_i) with L_i, U_i, name_i.
  • Git PR: P_i^git=1 iff merged; otherwise M_i ∈ {open, closed}.
  • Bluesky or microblog: optional P_i^bsky with a short preview and uid link.
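
For PR-style channels, the lifecycle state M_i can be sketched as a small state machine. The state set and the rule that P_i^git=1 iff merged come from the text; the allowed transitions below (open to merged or closed, merged to tombstoned) are assumptions for illustration, as are all names.

```python
from enum import Enum


class PRState(Enum):
    OPEN = "open"
    MERGED = "merged"
    CLOSED = "closed"
    TOMBSTONED = "tombstoned"


# Assumed transitions; a real workflow may allow others (e.g., reopening).
ALLOWED = {
    PRState.OPEN: {PRState.MERGED, PRState.CLOSED},
    PRState.MERGED: {PRState.TOMBSTONED},
    PRState.CLOSED: set(),
    PRState.TOMBSTONED: set(),
}


def advance(current: PRState, target: PRState) -> PRState:
    """Move to `target` if the assumed workflow allows it, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def published_flag(state: PRState) -> int:
    """P_i^git: 1 iff the PR is in the merged state."""
    return 1 if state is PRState.MERGED else 0
```

Under these assumptions a retraction corresponds to the merged to tombstoned transition, after which published_flag returns 0.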

This model is intentionally small: a staged Bernoulli chain with explicit metadata. It is sufficient to reason about product changes, consent flows, and publishing policies, and it is easy to compute stage-level rates from operational logs.