Page-to-Bibliography Matching
When you visit an academic paper online, Extenote checks whether it is already in your bibliography. It tries several matching strategies in order: exact URL, DOI, arXiv ID, and finally title similarity. Each match type carries its own confidence score.
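The ordered fallback described above can be sketched as a list of strategies tried one after another. This is a minimal illustration, not Extenote's actual implementation: the `Entry`, `MatchResult`, and `Strategy` types and the `firstMatch`, `byExactUrl`, and `byDoi` helpers are hypothetical names, and the DOI check is deliberately simplified to a substring test. Only the confidence values (1.0 for URL, 0.95 for DOI) come from the tests on this page.

```typescript
// Hypothetical shapes for illustration; the real Extenote types may differ.
type MatchType = "url" | "doi" | "arxiv" | "title";

interface Entry {
  id: string;
  title: string;
  url?: string;
  doi?: string;
}

interface MatchResult {
  entry: Entry;
  matchType: MatchType;
  confidence: number;
}

// A strategy inspects the page and returns a match, or null if it has none.
type Strategy = (
  pageUrl: string,
  pageTitle: string,
  entries: Entry[]
) => MatchResult | null;

// Try each strategy in order and return the first hit.
function firstMatch(
  strategies: Strategy[],
  pageUrl: string,
  pageTitle: string,
  entries: Entry[]
): MatchResult | null {
  for (const strategy of strategies) {
    const result = strategy(pageUrl, pageTitle, entries);
    if (result !== null) return result;
  }
  return null;
}

// Exact URL comparison against the entry's url field.
const byExactUrl: Strategy = (pageUrl, _title, entries) => {
  const entry = entries.find((e) => e.url === pageUrl);
  return entry ? { entry, matchType: "url", confidence: 1.0 } : null;
};

// Simplified DOI check (assumption): the entry's DOI appears somewhere in the URL.
const byDoi: Strategy = (pageUrl, _title, entries) => {
  const url = pageUrl.toLowerCase();
  const entry = entries.find(
    (e) => e.doi !== undefined && url.indexOf(e.doi.toLowerCase()) !== -1
  );
  return entry ? { entry, matchType: "doi", confidence: 0.95 } : null;
};
```

Because the strategies are ordered, a page that satisfies more than one of them is reported under the earliest one in the list, which is why an entry that stores the exact page URL is reported as a `url` match.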
Step 1: matches exact URL
The fastest match is by exact URL. If the page URL matches an entry’s url field, that’s a perfect match with 100% confidence.
Test: matches exact URL
File: packages/refcheck/tests/matcher.test.ts:74
it("matches exact URL", () => {
  const result = matchPageToVault(
    "https://arxiv.org/abs/1706.03762",
    "Some Page Title",
    testEntries
  );
  expect(result).not.toBeNull();
  expect(result!.entry.id).toBe("attention2017");
  expect(result!.matchType).toBe("url");
  expect(result!.confidence).toBe(1.0);
});
Step 2: matches DOI in URL
DOIs are extracted from resolver URLs like doi.org/10.xxxx or from publisher URLs that embed the DOI. DOI matches have 95% confidence because a DOI uniquely identifies a publication.
Test: matches DOI in URL
File: packages/refcheck/tests/matcher.test.ts:131
it("matches DOI in URL", () => {
  const result = matchPageToVault(
    "https://doi.org/10.18653/v1/N19-1423",
    "Some Title",
    testEntries
  );
  expect(result).not.toBeNull();
  expect(result!.entry.id).toBe("bert2019");
  expect(result!.matchType).toBe("doi");
  expect(result!.confidence).toBe(0.95);
});
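As a rough illustration of the DOI extraction described in this step, the sketch below pulls a DOI out of a URL with a regular expression. The `extractDoi` helper and its pattern are assumptions for illustration and may not match how Extenote does it. The pattern relies on the general DOI shape (`10.`, a numeric registrant code, a slash, then a suffix) and lowercases the result because DOI names are case-insensitive.

```typescript
// Hypothetical helper: returns the DOI embedded in a URL, or null if none is found.
function extractDoi(url: string): string | null {
  // "10." + 4-9 digit registrant code + "/" + suffix up to whitespace, "?" or "#".
  const match = url.match(/\b10\.\d{4,9}\/[^\s?#]+/);
  if (!match) return null;
  // Strip trailing punctuation and normalize case for comparison.
  return match[0].replace(/[.,;]+$/, "").toLowerCase();
}
```

Stopping at `?` and `#` keeps query strings and fragments out of the extracted DOI. The sketch does not decode percent-escaped characters, so a URL-encoded DOI would need decoding first.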
Step 3: matches arXiv abs URL
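arXiv IDs are extracted from several URL formats: /abs/xxxx, /pdf/xxxx.pdf, and versioned URLs like /abs/xxxxv3. All resolve to the same paper. In the test below, the entry also stores this exact URL, so the higher-priority URL match is reported instead of an arXiv match.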
Test: matches arXiv abs URL
File: packages/refcheck/tests/matcher.test.ts:188
it("matches arXiv abs URL", () => {
  const result = matchPageToVault(
    "https://arxiv.org/abs/2005.14165",
    "Some Title",
    testEntries
  );
  expect(result).not.toBeNull();
  expect(result!.entry.id).toBe("gpt3-2020");
  // URL match takes precedence over arXiv match since the entry has exact URL
  expect(result!.matchType).toBe("url");
  expect(result!.confidence).toBe(1.0);
});
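To show how the different arXiv URL formats can collapse to one identifier, here is a minimal sketch. The `extractArxivId` helper is a hypothetical name, and the pattern only covers new-style IDs of the form `YYMM.NNNNN`; older category-prefixed IDs are out of scope. Extenote's own extraction may differ.

```typescript
// Hypothetical helper: returns the version-less arXiv ID from an abs or pdf URL.
function extractArxivId(url: string): string | null {
  // Capture the ID, then allow an optional version suffix (v3) and ".pdf".
  const match = url.match(
    /arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?/i
  );
  return match?.[1] ?? null;
}
```

Dropping the version suffix means every version of a paper maps to the same ID, so a bibliography entry saved from one version still matches a page showing another.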
Step 4: matches by title when URL doesn’t match
When URL-based matching fails, Extenote falls back to title similarity. The page title is compared against all entry titles using fuzzy matching. Confidence depends on how closely the titles match.
Test: matches by title when URL doesn't match
File: packages/refcheck/tests/matcher.test.ts:248
it("matches by title when URL doesn't match", () => {
  const result = matchPageToVault(
    "https://example.com/unknown",
    "Attention Is All You Need",
    testEntries
  );
  expect(result).not.toBeNull();
  expect(result!.entry.id).toBe("attention2017");
  expect(result!.matchType).toBe("title");
  expect(result!.confidence).toBeGreaterThan(0.85);
});
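This page does not say which fuzzy-matching algorithm Extenote uses, so the sketch below substitutes a simple one: Jaccard similarity over lowercase word tokens. The `titleTokens`, `titleSimilarity`, and `bestTitleMatch` helpers and the `minScore` parameter are hypothetical, and the scores this produces are not Extenote's confidence values.

```typescript
// Split a title into unique lowercase alphanumeric words.
function titleTokens(title: string): string[] {
  const tokens = title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
  // Remove duplicates so each word counts once.
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

// Jaccard similarity: shared words divided by total distinct words (0 to 1).
function titleSimilarity(a: string, b: string): number {
  const ta = titleTokens(a);
  const tb = titleTokens(b);
  if (ta.length === 0 || tb.length === 0) return 0;
  const shared = ta.filter((t) => tb.indexOf(t) !== -1).length;
  return shared / (ta.length + tb.length - shared);
}

// Return the closest title at or above minScore, or null if none qualifies.
function bestTitleMatch(
  pageTitle: string,
  titles: string[],
  minScore: number
): { title: string; score: number } | null {
  let best: { title: string; score: number } | null = null;
  for (const title of titles) {
    const score = titleSimilarity(pageTitle, title);
    if (score >= minScore && (best === null || score > best.score)) {
      best = { title, score };
    }
  }
  return best;
}
```

Word-level comparison ignores punctuation and case, and tolerates a few extra words such as a site name appended to the page title. A character-level measure such as edit distance would handle typos better, at the cost of more computation per comparison.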
This documentation is generated from test annotations. Edit the source test file to update.