10 A Record-Generation Model
Status: first draft complete.
This section defines a compact model for how interactions become public records. The goal is a small, measurable pipeline you can use for product design and metrics without locking into a single implementation.
10.1 Notation and Objects
We define symbols once and reuse them:
| Symbol | Meaning |
|---|---|
X |
system or deployment where interactions happen; time is discrete by event |
i |
interaction index |
x_i |
interaction context (agent, UI, model, route, timestamp) |
o_i |
origin in {natural, prompted} |
r_i |
record type in R (e.g., chat, feedback, label, summary) |
K |
publish channels (e.g., git, huggingface, bluesky, site) |
10.2 Pipeline States (per interaction)
Each interaction can move through a simple chain. Most variables are binary.
E_i: eligible under policy (type, scope, age).C_i: captured locally (structured log or draft exists).S_i: share prompt shown (only if the product prompts).A_i: authorization or consent to publish (may include license + name).T_i: transform or sanitize succeeded (PII checks, schema, hashing).P_i^k: published to channelk ∈ K(accepted or merged state).M_i: workflow state for PR-style channels in{open, merged, closed, tombstoned}.
Reading left to right, the pipeline is: E -> C -> S -> A -> T -> P. If any stage is 0, the record cannot be published for that channel.
10.3 Record Metadata
When an interaction is publishable, attach:
ψ(x_i)(psi): schema mapping from interaction to structured payload.L_i: license (e.g., CC0, CC-BY, CC-BY-SA).U_i: AI-use preference (e.g.,train-genai=n;exceptions=cc-cr).name_i: attribution string (username, pseudonym, anonymous).uid_i: stable contribution id.
10.4 Policy Knobs and Rates
Let θ collect product and policy settings (logging mode, prompt rules, defaults). Define rates:
q_i = Pr[S_i=1 | x_i, θ]: prompt rate.p_c(i) = Pr[C_i=1 | x_i, θ]: capture rate.p_a(i) = Pr[A_i=1 | x_i, S_i, θ]: consent rate.p_t(i) = Pr[T_i=1 | x_i, θ]: transform success.p_k(i) = Pr[P_i^k=1 | x_i, θ]: channel acceptance or merge.
10.5 Publish Probability
For a single channel k:
Pr[P_i^k=1] = 1[E_i] · p_c(i) · p_a(i) · p_t(i) · p_k(i)
Here, 1[E_i] is an indicator: it is 1 when the interaction is eligible and 0 otherwise.
For multiple channels, the publish vector is P_i = (P_i^k)_{k∈K}.
10.6 Retraction and Lifecycle
After publication, items can be retracted. Model this with:
D_i^k(t) ∈ {0,1}: itemiis withdrawn from channelkat timet.h_d(i,k): hazard of retraction.M_i: tracks the visible lifecycle state.
10.7 Prompted vs. Natural
Origin affects consent. When o_i=prompted, the product increases q_i and typically increases p_a(i) by surfacing consent at the right time. Natural interactions can still be captured (C_i=1) without publishing (A_i=0).
10.8 Minimal Schema
ψ(x_i) should at least include:
uid,type,created_at,route,model.content(messages or summary) and optionalfeedback.license=L_i,ai_use=U_i,attribution=name_i.- provenance:
source=o_i,prompt_shown=S_i,checks(PII flags),hashes.
10.9 Metrics and Levers
- Throughput for window
W: expected published countΛ_k = E[∑_{i∈W} P_i^k]. - Stage yield: estimate
p_c, p_a, p_t, p_kfrom logs to locate bottlenecks. - Quality gates: raise or lower
p_twith validators; adjustp_kvia review policy. - UX levers: adjust
q_i(when to prompt) to changep_a(i)without spamming.
10.10 Practical Mapping
- OpenWebUI action: sets
S_i=1, collectsA_i, buildsψ(x_i)withL_i, U_i, name_i. - Git PR:
P_i^git=1iff merged; otherwiseM_i ∈ {open, closed}. - Bluesky or microblog: optional
P_i^bskywith a short preview anduidlink.
This model is intentionally small: a staged Bernoulli chain with explicit metadata. It is sufficient to reason about product changes, consent flows, and publishing policies, and it is easy to compute stage-level rates from operational logs.