Methodology
Where the data comes from, what we verify, and the specific places where we are least reliable.
★ The pipeline in four steps ★
Every scan runs through the same four-step pipeline. First, a headless browser visits the restaurant’s menu URL and captures the rendered HTML, including the dishes that lazy-load after page paint. Second, a parser walks the DOM and extracts a structured record per dish - name, ingredient list, allergen tags, price, and category - matching the most common schema patterns we have seen across thousands of restaurant menus. Third, a language model audits each dish against a vegan rule set built from years of menu reading. The audit produces a confidence score that gets calibrated against ground-truth data from human verifiers. Fourth, the scan caches its results in our database so the next visitor to the same restaurant gets the same answer in under a second.
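As a rough illustration of the four steps above, the sketch below models the per-dish record and chains the stages together. The `DishRecord` fields follow the list in the text; the function names, the pluggable stage callables, and the in-memory dictionary standing in for the database are all hypothetical, not the production code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DishRecord:
    # Fields named in the text: name, ingredient list, allergen tags, price, category.
    name: str
    ingredients: list[str] = field(default_factory=list)
    allergen_tags: list[str] = field(default_factory=list)
    price: Optional[float] = None
    category: Optional[str] = None
    confidence: Optional[float] = None  # filled in by the audit step


def run_scan(
    menu_url: str,
    fetch: Callable[[str], str],               # step 1: rendered HTML (hypothetical stage)
    parse: Callable[[str], list[DishRecord]],  # step 2: DOM -> dish records (hypothetical stage)
    audit: Callable[[DishRecord], float],      # step 3: model audit -> confidence (hypothetical stage)
    cache: dict[str, list[DishRecord]],        # step 4: stand-in for the database
) -> list[DishRecord]:
    # A previously scanned menu is answered from the cache without re-fetching.
    if menu_url in cache:
        return cache[menu_url]
    html = fetch(menu_url)
    dishes = parse(html)
    for dish in dishes:
        dish.confidence = audit(dish)
    cache[menu_url] = dishes
    return dishes
```

Passing the stages in as callables keeps the sketch runnable without a browser or a model; a second call with the same URL returns the cached list and never reaches `fetch`.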
None of the four steps is a black box. We log the captured menu HTML, the parsed dish records, the language model’s reasoning trace, the human verifier’s confirmation (or correction), and the final order script committed for every item. When a verifier disagrees with the model’s call, the trace tells us exactly why and the rule set gets a small adjustment for the next scan cycle so the disagreement is captured in the system rather than re-litigated every refresh.
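One way to picture the per-item log described above is a single record holding all five artifacts. This is a sketch under assumptions: the class name, the field names, and the JSON serialization are illustrative and do not describe the actual log format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AuditTrail:
    menu_html: str                       # captured menu HTML
    dish_record: dict                    # parsed dish record
    reasoning_trace: str                 # the language model's reasoning trace
    model_says_vegan: bool               # the model's call
    verifier_says_vegan: Optional[bool]  # None until a human has reviewed the item
    order_script: str                    # final order script committed for the item

    def is_disagreement(self) -> bool:
        # True only when a verifier has reviewed the item and overruled the model.
        return (
            self.verifier_says_vegan is not None
            and self.verifier_says_vegan != self.model_says_vegan
        )

    def to_log_line(self) -> str:
        # One JSON object per item, with sorted keys for stable diffs.
        return json.dumps(asdict(self), sort_keys=True)
```

Records where `is_disagreement()` is true are the ones that would feed a rule-set adjustment.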
★ Where the data comes from ★
The primary source is the restaurant’s own menu page. When that page publishes structured allergen data - a public allergen PDF, an in-page allergen table, schema.org markup with ingredient fields - we read it and treat it as the highest-confidence input. For chains that publish detailed allergen sheets (Chipotle, Taco Bell, Panera, P.F. Chang’s, Sweetgreen), the documented allergen status is essentially the source of truth and the model’s job is mostly to write the order script in the language the kitchen will actually ring.
When the menu page does not publish allergen data, the scan falls back to ingredient inference: read the dish name, read the description, cross-reference common recipes, and assign a confidence score that reflects the gap between documented and inferred ingredients. An item described as “black bean burrito with rice and salsa” on a page with no allergen sheet still earns a high vegan confidence because the ingredient pattern is unambiguous. An item described as “chef’s special vegetable bowl” on the same page earns a low confidence because the dish name does not commit to specific ingredients.
★ What “verified vegan” means here ★
We use vegan in the practical sense: no animal-derived ingredients in the dish as ordered, including no dairy, no eggs, no honey, no fish, no shellfish, no broth or stock derived from animals, and no animal-fat cooking medium. Shared-equipment cooking (a fryer used for both fries and chicken, a grill used for vegetables and steak) is flagged as a counter-side check rather than treated as disqualifying, because most vegan diners we have spoken to personally accept shared equipment but do not accept animal-derived ingredients in the dish itself.
Honey is treated as non-vegan by default per the dominant North American vegan convention, but every honey-containing item is explicitly tagged so a vegan who eats honey can override the recommendation. The same explicit-tag pattern applies to wine and beer used as cooking ingredients, fish sauce and oyster sauce in Asian preparations, and gelatin-thickened sauces.
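The definition above can be sketched as a small rule check: ingredients map to explicit tags, any tag disqualifies by default, a diner can allow specific tags, and shared equipment only adds a counter-side check. The tag names, the ingredient lists, and the `RuleResult` shape are illustrative assumptions; the real rule set is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical ingredient lists, keyed by explicit tag.
TAGGED_INGREDIENTS = {
    "dairy": {"milk", "butter", "cheese", "cream"},
    "egg": {"egg", "mayonnaise"},
    "honey": {"honey"},
    "fish": {"anchovy", "fish sauce"},
    "shellfish": {"shrimp", "oyster sauce"},
    "animal_stock": {"chicken stock", "beef broth"},
    "animal_fat": {"lard", "tallow"},
    "gelatin": {"gelatin"},
}


@dataclass
class RuleResult:
    vegan: bool
    tags: list[str]            # explicit tags that matched, sorted
    counter_checks: list[str]  # things to confirm at the counter


def check_dish(ingredients, shared_equipment=False, allowed_tags=frozenset()):
    # Exact matching on normalized names, so "eggplant" does not match "egg"
    # and "peanut butter" does not match "butter".
    normalized = {item.strip().lower() for item in ingredients}
    tags = sorted(tag for tag, words in TAGGED_INGREDIENTS.items() if normalized & words)
    # A dish passes when every matched tag has been explicitly allowed by the diner.
    vegan = all(tag in allowed_tags for tag in tags)
    # Shared equipment is flagged for a counter-side check, never disqualifying.
    counter_checks = ["shared equipment"] if shared_equipment else []
    return RuleResult(vegan=vegan, tags=tags, counter_checks=counter_checks)
```

For example, `check_dish(["rice", "honey"], allowed_tags={"honey"})` passes for a vegan who eats honey, while the default call reports the dish as non-vegan with the `honey` tag attached.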
We are not a medical-allergen reference. If you have a clinically diagnosed dairy or egg allergy, the canonical source for what is in any dish is always the restaurant itself plus a conversation with the kitchen. Vegan Recon is a research aid that gets you to the right question to ask, not a substitute for that question.
★ What the confidence score means ★
Every item carries a confidence score between zero and one. A score above 0.85 means the scan resolved the item against a documented ingredient list (chain allergen sheet, public recipe, in-page ingredient markup) and the order script has been verified by a human. A score between 0.7 and 0.85 means the documentation is solid but no human has verified the specific order script yet. A score between 0.5 and 0.7 means the scan inferred the vegan status from dish naming patterns and recipe norms; the item is probably vegan but a counter-side check is recommended. Below 0.5 we usually do not surface the item at all - if you see one, treat it as a lead rather than a verified order.
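The bands above translate into a simple lookup. In this sketch the tier labels are made up for illustration, and the handling of the exact boundary values is an assumption: the text does not say which band a score of exactly 0.7 or 0.5 belongs to, so the lower bound of each band is treated as inclusive.

```python
def confidence_tier(score: float) -> str:
    """Map a confidence score in [0, 1] to one of the four bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must be between zero and one")
    if score > 0.85:
        # Documented ingredient list and a human-verified order script.
        return "documented_and_verified"
    if score >= 0.7:  # assumption: exactly 0.7 falls in this band
        # Solid documentation, order script not yet verified by a human.
        return "documented_unverified"
    if score >= 0.5:  # "below 0.5" is the lowest band, so 0.5 itself lands here
        # Inferred from naming patterns; counter-side check recommended.
        return "inferred"
    # Usually not surfaced at all; treat as a lead.
    return "lead_only"
```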
★ Where we are least reliable ★
Three places. First, local restaurants without a structured online menu - the scan does its best with whatever the page shows, but a hand-typed paragraph menu without ingredient details is hard to verify against any rule set. Second, regional chains that change their ingredient suppliers between markets - the corporate allergen sheet may say one thing while the specific store you walk into uses a different supplier whose recipe we have not catalogued. Third, anywhere the kitchen reformulates a dish faster than the website updates - this happens at every chain we cover, and our weekly re-verification cron catches most of it but not all.
When you order a Vegan Recon-flagged item and the kitchen confirms something the scan missed - a hidden butter brush, a new chicken-stock base, a swapped supplier - send us a correction by email. Every correction trains the next round of scans, and the rule-set adjustment lands on the affected items across the whole catalogue, not just the dish you reported. The contact line is on the About page.
★ How fresh the data is ★
Every chain page carries a “Last verified” date stamp at the top. A weekly cron re-fetches every cached restaurant menu and re-runs the audit; the stamp updates on each successful run. When a chain rolls out a menu change that the cron picks up, the affected items’ confidence scores drop temporarily and surface a re-verify note until a human curator confirms the new state.
Local restaurant menus get a re-verify pass every 30 days because they change less often than national chain menus. If you know a local restaurant’s menu changed recently, request a fresh re-scan and the scan will ignore the cached state. The result then re-enters the regular re-verification cycle from the new baseline.
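The refresh schedule can be sketched as a due-date check. This assumes a 7-day interval for chains and a 30-day interval for local restaurants, as read from the text; the function name, the `kind` labels, and the `force_rescan` flag are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed intervals: weekly for chains, every 30 days for local restaurants.
REVERIFY_INTERVAL = {
    "chain": timedelta(days=7),
    "local": timedelta(days=30),
}


def is_due(kind: str, last_verified: datetime, now: datetime, force_rescan: bool = False) -> bool:
    # A fresh re-scan request bypasses the cached state entirely.
    if force_rescan:
        return True
    # Otherwise the menu is due once its interval has fully elapsed.
    return now - last_verified >= REVERIFY_INTERVAL[kind]
```

After a forced re-scan, the caller would store the new `last_verified` timestamp, so the next due date is counted from the new baseline.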