Decision Gate System

How each experiment idea flows from hypothesis → final submission

Stages: IDEA → PRE-CHECK → GATE 1 → GATE 2 → GATE 3 → GATE 4 → SUBMIT

IDEA: New Technique Idea (e.g. int5 MLP, trigram hash, SWA retune)

PRE-CHECK: Check failures.md. Has this been tried and ruled out?
  YES (ruled out in failures.md): Drop idea, unless new context changes outcome.
  NO: Write plan file (hypothesis · changes · conflicts · fallback · env vars).

🔵 Gate 1: Smoke Test. 1×H100, 1 seed. Check: no crash, artifact < 16MB. ~$0.30 · ~10 minutes.
  FAIL: Debug & fix, then retry G1.
  PASS: Continue to Gate 2.

🟢 Gate 2: 1×H100 Full Run. 1×H100, compare val_bpb vs base. bpb improvement visible? ~$0.30 · ~12 minutes.
  NO: Log failure; add to failures.md with reason.
  YES: Continue to Gate 3.

🟠 Gate 3: 8×H100 Single Seed. 8×H100, 1 seed. Target: val_bpb < 1.139. ~$3.50 · ~12 minutes (full 600s run).
  NO: Analyze & retune, or log as conditional failure.
  YES: Continue to Gate 4.

🟣 Gate 4: 8×H100, 3 Seeds. 8×H100, seeds 42/1337/7. Mean bpb < 1.1378, p < 0.01. ~$10.50 · 3 × 12 min · final validation.

COST TRACKER
G1: $0.30
G2: $0.30
G3: $3.50
G4: $10.50
$14.60 max per technique
If G1/G2 abort: $0.60

Gate 1: cheap smoke
Gate 2: quality check
Gate 3: scale check
Gate 4: final 3-seed
Abort / document
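
The cost tracker totals follow from summing the per-gate costs up to the gate where a technique stops. A minimal Python sketch of that arithmetic is below; the names `GATE_COSTS` and `cost_through` are illustrative and not part of any original tooling, and retries after a failed Gate 1 are not counted.

```python
# Approximate per-gate costs in USD, taken from the cost tracker.
GATE_COSTS = {"G1": 0.30, "G2": 0.30, "G3": 3.50, "G4": 10.50}


def cost_through(last_gate: str) -> float:
    """Total spend for a technique that runs every gate up to and including last_gate."""
    total = 0.0
    for gate, cost in GATE_COSTS.items():  # dicts preserve insertion order (G1..G4)
        total += cost
        if gate == last_gate:
            return round(total, 2)
    raise ValueError(f"unknown gate: {last_gate}")


print(cost_through("G2"))  # 0.6  (abort after G1/G2)
print(cost_through("G4"))  # 14.6 (maximum per technique)
```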

Gate details

🔵 Gate 1: Smoke Test

1×H100, 1 seed (42). Pass criteria: training doesn't crash and the artifact file is under 16,000,000 bytes. A bpb improvement is not required; the gate only checks that the code works.

~$0.30
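
The two Gate 1 criteria can be checked mechanically. The sketch below is one possible way to do it, assuming that a crash shows up as a nonzero process exit code and that the artifact is a single file on disk; `gate1_passes` and its parameters are hypothetical names.

```python
import os

MAX_ARTIFACT_BYTES = 16_000_000  # Gate 1 size limit from the text


def gate1_passes(exit_code: int, artifact_path: str) -> bool:
    """Return True if the smoke run exited cleanly and the artifact is under the limit."""
    if exit_code != 0:  # assumption: a training crash yields a nonzero exit code
        return False
    if not os.path.isfile(artifact_path):  # no artifact produced counts as a failure
        return False
    return os.path.getsize(artifact_path) < MAX_ARTIFACT_BYTES
```

Note that val_bpb is deliberately not an input here, since Gate 1 does not judge quality.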

🟢 Gate 2: Single GPU Quality

1×H100 full 600s run vs base. val_bpb must show a visible improvement over the copy base (1.1458), meaning a lower value. Even a small reduction is enough to continue.

~$0.30

🟠 Gate 3: Scale Validation

8×H100, 1 seed. val_bpb must be < 1.139 to continue to the final 3-seed run. This is where single-GPU gains that don't scale get caught before spending ~$10.50 on Gate 4.

~$3.50
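
The Gate 2 and Gate 3 decisions are both threshold comparisons on val_bpb, where lower is better. A minimal sketch, assuming that any strict reduction counts as a "visible" improvement at Gate 2 (the text gives no noise margin); the function names are illustrative.

```python
BASE_VAL_BPB = 1.1458  # copy base used by Gate 2
GATE3_TARGET = 1.139   # Gate 3 threshold


def gate2_decision(val_bpb: float, base: float = BASE_VAL_BPB) -> str:
    """Gate 2: continue on any reduction in val_bpb relative to the base."""
    if val_bpb < base:
        return "continue to Gate 3"
    return "log failure in failures.md"


def gate3_decision(val_bpb: float) -> str:
    """Gate 3: the 8xH100 single-seed run must land below the target."""
    if val_bpb < GATE3_TARGET:
        return "continue to Gate 4"
    return "analyze & retune, or log as conditional failure"
```

A run can pass Gate 2 and still fail Gate 3: a 1×H100 result of 1.1450 beats the base, but an 8×H100 result of 1.1400 does not clear 1.139.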

🟣 Gate 4: Statistical Confirmation

8×H100, seeds 42, 1337, and 7. Mean val_bpb must be < 1.1378 (SOTA − 0.005), with the result significant at p < 0.01 across the three seeds. This is the final bar for submission.

~$10.50
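
The text does not say which statistical test produces the p-value. The sketch below is therefore only one plausible reading: it assumes a one-sided, one-sample t-test of the three seed results against the SOTA value implied by "SOTA − 0.005" (1.1378 + 0.005). The critical value 6.965 is the Student t quantile for 2 degrees of freedom at one-sided α = 0.01; `gate4_passes` is a hypothetical name.

```python
import math
import statistics

TARGET_MEAN = 1.1378           # required mean val_bpb (SOTA − 0.005)
SOTA = TARGET_MEAN + 0.005     # implied by the text, not stated directly
T_CRIT_DF2_ONE_SIDED = 6.965   # Student t, df = 2, one-sided alpha = 0.01


def gate4_passes(seed_bpbs: list[float]) -> bool:
    """Check the mean threshold, then an assumed one-sided t-test against SOTA."""
    if len(seed_bpbs) != 3:
        raise ValueError("Gate 4 expects exactly three seed results")
    mean = statistics.mean(seed_bpbs)
    if mean >= TARGET_MEAN:
        return False
    sd = statistics.stdev(seed_bpbs)  # sample standard deviation (n - 1)
    if sd == 0:
        return True  # identical results below target: no variance to test
    t_stat = (SOTA - mean) / (sd / math.sqrt(len(seed_bpbs)))
    return t_stat > T_CRIT_DF2_ONE_SIDED


print(gate4_passes([1.1370, 1.1374, 1.1372]))  # True: low mean, tight spread
print(gate4_passes([1.130, 1.145, 1.137]))     # False: mean passes, spread too wide
```

With only three seeds the test has two degrees of freedom, so the seed-to-seed spread must be small relative to the gap from SOTA for the result to count as significant.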