RLPlay: A vibe project towards automatic research

<aside> 💡

Reinforcement Learning Play on a tiny 4090D.

</aside>

Hey…Math, you there?

Although we both want small models to be able to run…

Set your expectations for small models…

Setup

Environmental Setup

Takeaways: Three kinds of issues come up at this stage: network, environment, and deps. Save time by choosing the right tools and system, and let agents do the tedious stuff!

(For Mainland China Users) Make sure you can download packages and deps and reach the global internet. Many students in mainland China get stuck at this step. Network problems tend to come back again and again, and they are super annoying.

Tip: If you already have reliable access to the global internet, you can skip this step.

Use Modern Toolchains to Set Up Environments. Linux is always a good choice for running deep-learning projects; if you are a Windows user, you can use WSL (https://github.com/microsoft/WSL). Instead of fighting with conda, use Pixi (https://pixi.prefix.dev/latest/) to save time. For non-DL environments, use uv (https://docs.astral.sh/uv/). Use Ruff (https://docs.astral.sh/ruff/) as the linter.
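A quick way to confirm the toolchain is in place is to check which of the tools are on `PATH`. The snippet below is a minimal sketch, not part of the project itself: the executable names `pixi`, `uv`, and `ruff` are assumed to match the tools named above, and `find_tools` / `report` are hypothetical helper names.

```python
import shutil

# Executable names assumed from the tools mentioned in the text.
TOOLS = ("pixi", "uv", "ruff")


def find_tools(names=TOOLS, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def report(found):
    """Format the lookup result as one line per tool."""
    lines = []
    for name, path in found.items():
        status = path if path else "NOT FOUND"
        lines.append(f"{name:<6} {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report(find_tools()))
```

Running this once on a fresh machine (or asking an agent to run it) shows at a glance which tools still need installing before any environment is created.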

Observability, or the harness, matters.

Solve Deps Issues.

Good First Moves

Look for issues in the model's real answers.
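One lightweight way to read real answers is to append every sampled answer to a JSONL file and then pull out the lowest-scoring ones for manual inspection. This is only a sketch under assumptions: the field names `prompt`, `answer`, and `reward`, and the helpers `log_samples` and `worst_samples`, are hypothetical and not taken from the project.

```python
import json
from pathlib import Path


def log_samples(path, step, samples):
    """Append one JSON line per sample.

    Each sample is a dict with 'prompt', 'answer', and 'reward' keys
    (hypothetical field names chosen for this sketch).
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        for sample in samples:
            record = {"step": step, **sample}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def worst_samples(path, k=5):
    """Return the k lowest-reward records from the log, for manual reading."""
    with Path(path).open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return sorted(records, key=lambda r: r["reward"])[:k]
```

Reading a handful of the worst records after each short run often reveals formatting or parsing problems that aggregate metrics hide.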

Run smoke tests and probe experiments before large-scale runs. Earlier found the
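A smoke test in this sense can be as small as running a few steps and failing fast. The sketch below assumes a hypothetical zero-argument callable `train_step` that runs one optimization step on a tiny batch and returns the loss as a float; the function name `smoke_test` and its thresholds are illustrative choices, not the project's actual harness.

```python
import math
import time


def smoke_test(train_step, n_steps=3, max_seconds=60.0):
    """Run a few steps; fail fast on crashes, non-finite losses, or slowness.

    `train_step` is a hypothetical callable: one step on a tiny batch,
    returning the loss as a float.
    """
    start = time.perf_counter()
    losses = []
    for i in range(n_steps):
        loss = float(train_step())
        if not math.isfinite(loss):
            raise RuntimeError(f"non-finite loss {loss} at step {i}")
        losses.append(loss)
        elapsed = time.perf_counter() - start
        if elapsed > max_seconds:
            raise TimeoutError(f"smoke test exceeded {max_seconds}s at step {i}")
    return losses


if __name__ == "__main__":
    # Toy stand-in for a real training step: a loss that shrinks each call.
    fake_losses = iter([3.0, 2.0, 1.0])
    print(smoke_test(lambda: next(fake_losses)))
```

Any exception raised inside `train_step` propagates unchanged, so import errors, shape mismatches, and out-of-memory failures surface within seconds instead of hours into a long run.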

Operations

Fast, fast, fully utilized, and fast. Make full use of the existing resources (specifically CPUs, GPU(s), data, etc.). Ideally, in the first steps, one experiment iteration should take about 5-10 minutes.
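To keep an iteration inside a 5-10 minute window, one option is to time a few steps and derive how many steps fit in the budget. This is a rough sketch: `step_fn` is a hypothetical zero-argument callable that runs one training step, and `steps_for_budget` is an illustrative helper, not part of any library.

```python
import time


def steps_for_budget(step_fn, budget_minutes=5.0, warmup=1, probe=3,
                     clock=time.perf_counter):
    """Estimate how many steps fit in a wall-clock budget.

    Runs `warmup` untimed steps (to absorb one-off startup costs), then times
    `probe` steps and extrapolates. Returns (n_steps, seconds_per_step).
    """
    for _ in range(warmup):
        step_fn()
    t0 = clock()
    for _ in range(probe):
        step_fn()
    sec_per_step = (clock() - t0) / probe
    if sec_per_step <= 0:
        raise ValueError("measured step time is not positive; probe more steps")
    n_steps = max(1, int(budget_minutes * 60 / sec_per_step))
    return n_steps, sec_per_step


if __name__ == "__main__":
    # Toy step that sleeps briefly, standing in for a real training step.
    n, sec = steps_for_budget(lambda: time.sleep(0.01), budget_minutes=5.0)
    print(f"{sec:.4f} s/step -> about {n} steps in 5 minutes")
```

The estimate ignores evaluation and checkpointing time, so it is best treated as an upper bound on the step count for a given budget.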

Set up automatic workflows with your favorite vibe tools. With capable vibe coding tools (e.g., Claude Code, Codex), you can easily set up the automatic,

Set up a bootstrapping knowledge base.

Order the experiments in your favorite style.