What if a free AI model could write game code, run the tests, and move on to the next task — all by itself?

That’s exactly what I tested this week. Three experiments, two instructive failures, and finally: a task completed end-to-end by an autonomous AI agent, with no human intervention, for zero dollars.

The setup

I’m working on a DragonRuby project: reproducing the Sonic 1 bonus stage (Mega Drive) using lookup tables, at Game Boy resolution (160×144). The code is Ruby (mRuby), tests use dr_spec, and the workflow follows git flow.

For automation, I’m using:

The experiment

I gave Ralph a simple task: create a sin/cos lookup table module in fixed-point Q8 (integers, no floats, just like the Mega Drive did). With specs, tests, and a commit.
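For readers unfamiliar with Q8, here’s a minimal sketch of what such a module looks like. The module and method names are hypothetical (not the agent’s actual output); in Q8, the integer 256 represents 1.0, and the table is generated once at load time so that runtime lookups are pure integer reads:

```ruby
# Hypothetical Q8 trig lookup sketch (not the agent's actual code).
# Q8 fixed point: 256 == 1.0; one full turn = 256 angle steps.
module TrigQ8
  STEPS = 256  # angle resolution
  ONE   = 256  # Q8 scale factor

  # Floats are used only once, at table-generation time;
  # every runtime lookup is an integer array read.
  SIN = Array.new(STEPS) { |i| (Math.sin(i * 2 * Math::PI / STEPS) * ONE).round }

  def self.sin(angle)
    SIN[angle & (STEPS - 1)]  # bitmask wraps the angle into 0..255
  end

  def self.cos(angle)
    SIN[(angle + STEPS / 4) & (STEPS - 1)]  # cos(x) = sin(x + quarter turn)
  end
end
```

So `TrigQ8.sin(64)` (a quarter turn) returns 256, i.e. 1.0 in Q8, and a position on a circle is just `cx + (radius * TrigQ8.cos(angle)) / 256` — integer math all the way down.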

Attempt 1: the silent failure

Ralph runs 5 iterations. M2.5 writes the code. It’s clean. But:

Attempt 2: the debug loop

I fix the DragonRuby path in the docs. Ralph relaunches. M2.5 finds the binary, runs the tests… and discovers the test runner is broken. It spends 10 minutes debugging dragon_specs.rb (a file copied from another project with hardcoded requires pointing to files that don’t exist). I kill it.

Attempt 3: completion

I fix everything: the test runner, mRuby-incompatible syntax, non-existent matchers. I rewrite the PRD with ultra-precise instructions: numbered steps, full binary path, exact matcher list, mRuby pitfalls documented.

Result:

|                 | Exp 001   | Exp 002       | Exp 003          |
|-----------------|-----------|---------------|------------------|
| Tasks completed | 0/3       | 0/3           | 1/1              |
| Duration        | 7m28s     | 10m+ (killed) | 2m23s            |
| DR tests        | Never ran | Crashed       | 25 tests, exit 0 |
| COMPLETE signal | No        | No            | Yes              |
| Cost            | $0        | $0            | $0               |

What I learned

1. A free model can code — but you need to chew the work for it

M2.5 produced correct Q8 code on the very first attempt. But it didn’t know:

Each of these points required a fix in the project docs.

2. The real work is the template, not the code

Across 3 experiments, I spent 80% of my time writing docs and 20% watching the AI code. It’s counterintuitive: you think AI will save you time, but the time shifts toward preparation.

The good news: template work is cumulative. Every error fixed in the docs prevents dozens of future failures.

3. The cost is unbeatable

Claude Opus costs ~$0.50–1.00 per run ($5/$25 per million tokens). M2.5 free = $0. Over dozens of Ralph iterations, the difference is massive.

And the code quality? Nearly identical. Both produce correct Q8, clean specs, and Sandi Metz-compliant methods.
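To give a flavor of what “correct Q8” means in practice, here are the kinds of sanity checks the specs boil down to, written as plain-Ruby assertions (the project’s real specs use dr_spec, whose matcher syntax differs):

```ruby
# Plain-Ruby sanity checks for a Q8 sine table (illustrative, not dr_spec).
SCALE = 256  # Q8: 256 == 1.0
SIN = Array.new(256) { |i| (Math.sin(i * 2 * Math::PI / 256) * SCALE).round }

raise "sin(0) must be 0"            unless SIN[0] == 0
raise "sin(quarter turn) must be 1.0" unless SIN[64] == SCALE
raise "all values must stay in Q8 range" unless SIN.all? { |v| v.between?(-SCALE, SCALE) }
puts "all Q8 checks passed"
```

Checks like these are cheap to run and catch the classic fixed-point mistakes (off-by-one scale factors, unrounded truncation) regardless of which model wrote the table.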

What’s next

I have 7 TDD issues lined up on the project. The goal: get Ralph to run the first 3 end-to-end, with no intervention. Then benchmark different free models on the same tasks.

Follow this series

This is the first post in a series about automating game development with AI agents. Coming up:

You can find me on GitHub or follow this blog — every step is documented in detail, code included.