<aside> 💡
Reinforcement Learning Play on a tiny 4090D.
</aside>
Although we all want small models to be capable of running reinforcement learning experiments on a tiny GPU, the setup stage itself deserves care.
Takeaways: three kinds of issues tend to appear at this stage: network, environment, and dependencies. Save time with the right tool and system choices.
(For China Mainland users) Make sure you can download packages and dependencies and reach the global internet… Many students in mainland China get stuck at this step. Network problems recur again and again and are super annoying…
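As a minimal sketch, two environment variables cover the most common cases: pointing pip at a mainland mirror and routing Hugging Face downloads through a mirror endpoint. The TUNA and hf-mirror.com URLs below are well-known public mirrors; adjust to whatever mirror you actually use.

```shell
# Point pip at a mainland PyPI mirror (Tsinghua TUNA shown here).
export PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple

# Route Hugging Face Hub downloads through the hf-mirror.com endpoint.
export HF_ENDPOINT=https://hf-mirror.com
```

Put these in your shell profile (or your job launcher's environment) so every tool in the pipeline picks them up.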
Use modern toolchains to set up environments. Linux is always a good choice for deep-learning projects; if you are a Windows user, use WSL (https://github.com/microsoft/WSL). Save the time you would spend fighting with conda: use Pixi (https://pixi.prefix.dev/latest/) instead. For non-DL environments, use uv (https://docs.astral.sh/uv/), and use Ruff (https://docs.astral.sh/ruff/) as your linter.
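For concreteness, a minimal `pixi.toml` sketch for a DL project might look like the following; the project name, channels, and version pins are illustrative, not a recommendation.

```toml
[project]
name = "rl-play"                         # hypothetical project name
channels = ["conda-forge", "pytorch"]
platforms = ["linux-64"]

[dependencies]
python = "3.11.*"
pytorch = ">=2.2"

[tasks]
train = "python train.py"                # run with: pixi run train
```

Pixi resolves this into a lockfile, so the same environment reproduces on any machine with one `pixi install`.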
Observability, or the harness, matters. If you cannot see what a run is doing, you cannot debug it.
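The cheapest form of observability is appending one JSON line per step to a file you can `tail -f` or load into pandas. A minimal sketch (the file name and metric names are made up for illustration):

```python
import json
import time

def log_metrics(path, step, **metrics):
    """Append one JSON line per training step so a run can be inspected live."""
    record = {"step": step, "time": time.time(), **metrics}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Hypothetical usage: log whatever you care about at every step.
log_metrics("run_metrics.jsonl", step=1, reward=0.42, kl=0.013)
```

This costs almost nothing and survives crashes, unlike metrics kept only in memory or in a dashboard session.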
Solve dependency issues up front; lockfiles (both Pixi and uv generate them) keep environments reproducible.
Observe issues from real model answers: reading actual outputs surfaces problems that aggregate metrics hide.
Run smoke tests and probe experiments before large-scale runs. The earlier an issue is found, the less compute and time it wastes.
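A smoke test can be as simple as pushing a tiny batch through the whole pipeline and checking invariants before committing GPU-hours. A sketch, where `pipeline` is a hypothetical stand-in for your real rollout-plus-scoring loop:

```python
def smoke_test(pipeline, tiny_batch):
    """Run the full pipeline on a tiny input and check basic invariants."""
    rewards = pipeline(tiny_batch)
    assert len(rewards) == len(tiny_batch), "pipeline dropped samples"
    assert all(isinstance(r, (int, float)) for r in rewards), "non-numeric reward"
    return rewards

# Hypothetical stand-in pipeline: reward = prompt length.
rewards = smoke_test(lambda batch: [len(p) for p in batch], ["2+2=?", "1+1=?"])
```

Catching a shape mismatch or a reward parser bug here takes seconds; catching it three hours into a run does not.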
Fast, fast, fully utilize, and fast. To fully utilize your existing resources (specifically CPUs, GPU(s), data, etc.), profile first and find out where the bottlenecks actually are.
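One common waste on a single-GPU box is CPU-bound preprocessing running in one Python process while the GPU idles. A minimal sketch of spreading that work across cores (the `preprocess` body is a placeholder for your real tokenization or reward parsing):

```python
from concurrent.futures import ProcessPoolExecutor

def preprocess(text):
    # Stand-in for CPU-heavy per-sample work (tokenization, reward parsing, ...).
    return text.strip().lower().split()

def preprocess_all(texts, workers=4):
    # Spread CPU-bound work across cores so a single Python process
    # (and the GIL) is not the bottleneck feeding the GPU.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, texts, chunksize=16))
```

The same idea applies to data loading: overlap CPU work with GPU compute rather than alternating between them.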
Set up automatic workflows with your favorite vibe tools. With capable vibe coding tools (e.g., Claude Code, Codex), you can easily set up automatic pipelines for launching, monitoring, and summarizing experiments.
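The glue such tools generate is usually small. As one hedged example, expanding a hyperparameter grid into launchable commands (script name and flags are hypothetical; feed the list to `subprocess`, Slurm, or whatever queue you use):

```python
import itertools
import shlex

def sweep_commands(script, grid):
    """Expand a hyperparameter grid into a list of shell commands."""
    keys = sorted(grid)
    cmds = []
    for values in itertools.product(*(grid[k] for k in keys)):
        flags = " ".join(f"--{k} {shlex.quote(str(v))}" for k, v in zip(keys, values))
        cmds.append(f"python {script} {flags}")
    return cmds

cmds = sweep_commands("train.py", {"lr": [1e-5, 3e-5], "kl_coef": [0.01]})
```

Generating commands instead of running them directly makes the sweep easy to review before anything touches the GPU.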
Set up a bootstrapping knowledge base: record what worked and what failed so later experiments start from accumulated notes, not from scratch.
Order the experiments in your favorite style.