| | OLD (current main) | NEW (Ashwin's branch) |
|---|---|---|
| Saves delivery | 3 soft blocks: USER SAVES ("weave in, ignore mismatches") + PRIORITY SAVES (hearted) + KNOWN PLACES | 1 block: CANDIDATE POOLS BY CITY — pre-built, scored, tagged [HEARTED]/[SAVED]/[ICONIC]/[GEM]/[SEEDED] |
| Who picks places | Model picks/invents; saves are hints | Model sequences a pre-built pool; tagged items MUST appear; invented = explicit (gap-fill) |
| Eateries | Included in saves (told to ignore mismatches) | Excluded (day-eligible only; eateries → food rail) |
| Day allocation | Model allocates | Deterministic DAY ALLOCATION + post-gen repairCityCoverage |
| Seasonality | none | SEASONALITY block (month + climate) |
| Item schedule | time: morning/afternoon labels | ordered list = the schedule |

candidate-pool.ts (the scorer) is net-new; main still runs OLD. The system prompt is identical on both sides, so this A/B isolates the user-message structure.
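
To make the NEW column concrete, here is a minimal sketch of how a per-city pool might be assembled: eateries are diverted to the food rail, and the remaining candidates are tagged and ordered. The `Candidate` shape, the `TAG_WEIGHT` values, and the `buildPools`/`renderPool` names are assumptions for illustration; the real logic lives in candidate-pool.ts (assemblePool/scoreSave) and may differ.

```typescript
// Hypothetical shapes and weights, for illustration only; the real
// assemblePool/scoreSave in candidate-pool.ts may work differently.
type Tag = "HEARTED" | "SAVED" | "ICONIC" | "GEM" | "SEEDED";

interface Candidate {
  name: string;
  city: string;
  tag: Tag;
  isEatery: boolean; // eateries are not day-eligible
}

// Assumed ordering: hearted saves first, then plain saves, then curated picks.
const TAG_WEIGHT: Record<Tag, number> = {
  HEARTED: 100,
  SAVED: 80,
  ICONIC: 60,
  GEM: 40,
  SEEDED: 20,
};

interface Pools {
  byCity: Record<string, Candidate[]>;
  foodRail: Candidate[];
}

// Route eateries to the food rail, group the rest by city, sort by weight.
function buildPools(candidates: Candidate[]): Pools {
  const byCity: Record<string, Candidate[]> = {};
  const foodRail: Candidate[] = [];
  for (const c of candidates) {
    if (c.isEatery) {
      foodRail.push(c); // never enters a day pool, even when hearted
      continue;
    }
    const pool = byCity[c.city] || [];
    pool.push(c);
    byCity[c.city] = pool;
  }
  for (const city of Object.keys(byCity)) {
    byCity[city].sort((a, b) => TAG_WEIGHT[b.tag] - TAG_WEIGHT[a.tag]);
  }
  return { byCity, foodRail };
}

// Render one city's pool as prompt text, one tagged line per candidate.
function renderPool(city: string, pool: Candidate[]): string {
  const lines = pool.map((c) => "  [" + c.tag + "] " + c.name);
  return [city + ":"].concat(lines).join("\n");
}
```

Because the model only sequences what `renderPool` emits, a restaurant cannot be scheduled as an activity unless the model invents one.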

verso-db for 2 real saves-heavy users across 4 countries. 4 scenarios exercise different shapes:

assemblePool/scoreSave; OLD via faithful re-impl of main's loadSavedPlaceNamesByCity + getTopCuratedPlaceNamesByCity).

Reproduce via workers/api/scripts/bench-render.ts (render) + bench-score.ts (deterministic score) + the itinerary-prompt-ab workflow.

| Dimension | OLD | NEW | Δ | Winner |
|---|---|---|---|---|
| saves_coverage | 7.81 | 8.87 | +1.06 | NEW |
| schema_validity | 7.75 | 9.79 | +2.04 | NEW |
| invented_discipline | 8.16 | 8.46 | +0.30 | NEW (mixed) |
| city_lock | 9.88 | 9.75 | -0.13 | tie |
| day_balance | 8.75 | 8.38 | -0.37 | OLD (small) |
| narrative_quality | 8.67 | 8.42 | -0.25 | OLD (small) |

| Scenario | OLD coverage | NEW coverage | OLD lock | NEW lock |
|---|---|---|---|---|
| s1 Thailand | 19/40 (48%) | 21/40 (53%) | 3/3 | 3/3 |
| s2 Italy | 10/10 (100%) | 10/10 (100%) | 3/3 | 3/3 |
| s3 Japan | 12/17 (71%) | 16/17 (94%) | 2/2 | 2/2 |
| s4 India | 9/11 (82%) | 11/11 (100%) | 3/3 | 3/3 |
| Average | 75% | 87% | 100% | 100% |

Panel and deterministic scorer agree: NEW covers more of the user's real saves, with city-lock effectively tied at 100%.
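
The coverage and lock columns can be reproduced with simple name matching. The sketch below is an assumption about what a deterministic scorer such as bench-score.ts might compute. The normalization rule, the definition of lock (requested cities that appear in the output), and all function names are hypothetical.

```typescript
// Hypothetical scorer sketch; the actual bench-score.ts may match names differently.

// Normalize names so case and whitespace differences do not count as misses.
function normalize(name: string): string {
  return name.toLowerCase().replace(/\s+/g, " ").trim();
}

// Count how many of the user's saves appear among the scheduled places.
function savesCoverage(saves: string[], scheduled: string[]) {
  const placed: Record<string, boolean> = {};
  for (const s of scheduled) placed[normalize(s)] = true;
  const hit = saves.filter((s) => placed[normalize(s)] === true).length;
  const percent = saves.length === 0 ? 100 : Math.round((hit / saves.length) * 100);
  return { hit, total: saves.length, percent };
}

// Assumed definition of city lock: requested cities that the itinerary visits.
function cityLock(requested: string[], generated: string[]) {
  const present: Record<string, boolean> = {};
  for (const g of generated) present[normalize(g)] = true;
  const kept = requested.filter((c) => present[normalize(c)] === true).length;
  return { kept, total: requested.length };
}
```

With 16 of 17 saves placed, `savesCoverage` reports 94%, which matches the s3 Japan row for NEW.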

OLD's USER SAVES + PRIORITY SAVES blocks feed eateries, including a hearted restaurant, into the day list, and the model schedules "Ristorante Belvedere" / "Il Riccio" as kind:activity (the schema says never a restaurant). This tanked OLD's schema score in s2 Italy (4.83 vs 10). NEW's pool is day-eligible-only, so eateries route to the food rail. OLD is actively misusing the saves it's given.

NEW's only losses are small (day_balance −0.37, narrative −0.25, within panel noise), but the rationales surface 3 concrete, fixable weaknesses:

- repairCityCoverage guarantees city coverage, but nothing guarantees every hearted item survives when the pool exceeds the day cap.
- The (gap-fill) escape hatch let the model place a duplicate (s3: Fushimi Inari twice) and a Jaipur mall on a Jodhpur day (s4). Gap-fill isn't constrained to the day's city or de-duped.

Data note: "Villa del Balbianello" is a Lake Como villa mis-saved under Rome, and it hurt both variants. NEW's MUST-include rule forces the mis-geocoded save in; a pre-pool geo-sanity filter would help, but it's a data issue, not a prompt issue.
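
The gap-fill weakness listed above (a repeated place, and a place from the wrong city) could be caught by a post-generation check along these lines. This is a sketch under assumptions: the `Day`/`DayItem` shapes, the `gapFill` flag, and the `cityOf` lookup (lower-cased place name to known city) are hypothetical and not taken from the branch.

```typescript
// Hypothetical post-generation check; shapes and the cityOf lookup are assumptions.
interface DayItem {
  name: string;
  gapFill: boolean; // true when the model marked the item "(gap-fill)"
}

interface Day {
  city: string;
  items: DayItem[];
}

// Report repeated places, and gap-fills whose known city differs from the day's city.
function checkGapFills(days: Day[], cityOf: Record<string, string>): string[] {
  const problems: string[] = [];
  const seen: Record<string, boolean> = {};
  for (const day of days) {
    for (const item of day.items) {
      const key = item.name.toLowerCase().trim();
      if (seen[key] === true) problems.push("duplicate: " + item.name);
      seen[key] = true;
      const knownCity = cityOf[key];
      if (item.gapFill && knownCity !== undefined && knownCity !== day.city) {
        problems.push(
          "wrong city: " + item.name + " (" + knownCity + " on a " + day.city + " day)"
        );
      }
    }
  }
  return problems;
}
```

A gap-fill whose city is unknown passes the check, so this only catches places the lookup already knows about.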

Land NEW (the collection-driven rewrite). It wins the two load-bearing dimensions, saves coverage and schema validity, by clear margins, and ties on city-lock. Its losses are marginal and addressable. Graft 3 small tweaks from OLD's strengths into NEW's CANDIDATE POOLS BY CITY rules block (prompts-v2.ts) during the port, with no architecture change:

- PRIORITY SAVES idea) so dense-city hearted items can't be silently dropped; optionally extend repairCityCoverage to inject any unplaced [HEARTED] item.
- repairCityCoverage as the per-day floor.

Net: the rewrite is the right call and directly delivers the "uses more saves" goal (+12 pts coverage), while also fixing a real OLD bug (restaurants-as-activities). The 3 tunings recover NEW's only soft spots.
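
The optional repair step mentioned in the tweaks, injecting any unplaced [HEARTED] item, might look roughly like the sketch below. It is not the real repairCityCoverage: the `PoolItem`/`PlannedDay` shapes and the `injectUnplacedHearted` name are hypothetical, and the placement rule (append to the least-loaded day in the same city) is an assumption.

```typescript
// Hypothetical repair pass; names and shapes are assumptions, not the real repairCityCoverage.
interface PoolItem {
  name: string;
  city: string;
  tag: string; // e.g. "HEARTED", "SAVED"
}

interface PlannedDay {
  city: string;
  items: string[]; // place names in schedule order
}

// Append every unplaced [HEARTED] item to the least-loaded day in its city.
// Mutates `days`; returns names that no day can host because their city is never visited.
function injectUnplacedHearted(days: PlannedDay[], pool: PoolItem[]): string[] {
  const placed: Record<string, boolean> = {};
  for (const d of days) for (const n of d.items) placed[n] = true;

  const orphans: string[] = [];
  for (const p of pool) {
    if (p.tag !== "HEARTED" || placed[p.name] === true) continue;
    const cityDays = days.filter((d) => d.city === p.city);
    if (cityDays.length === 0) {
      orphans.push(p.name);
      continue;
    }
    // Ties go to the earlier day.
    const target = cityDays.reduce((a, b) => (b.items.length < a.items.length ? b : a));
    target.items.push(p.name);
    placed[p.name] = true;
  }
  return orphans;
}
```

This sketch ignores the day cap. A version that respects it would need to evict a lower-priority item from the target day before appending.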