Contribute GPU time — Loka v14

Loka's v11, v12, and v13 model rungs all train on a laptop GPU. v14 (the 4 M-triple top tier) doesn't. If you have a GPU with ≥8 GB VRAM and roughly two days of wall-clock to donate, this is the one-command path to ship v14.

Why this page exists

The maintainer of this project is Emma Leonhart. The development hardware is a laptop with an NVIDIA RTX 4070 Laptop GPU (8 GB VRAM, 35–115 W TGP). Three rungs of the normalized-wikidata model series — v11 (350 k triples), v12 (672 k triples), and v13 (2.5 M triples, training in flight as of 2026-05-14) — all fit within that envelope at --batch-size 16.

The fourth rung, v14 on the 4 M-triple v14-1M corpus, doesn't. Rough estimate: 10 epochs × ~4 h/epoch on an RTX 4090 = ~40 h of exclusive GPU time. On the 4070 Laptop, the same run would be ~80 h. That's outside what the maintainer can practically dedicate to the project on her own hardware.

The training data is already public — EmmaLeonhart/normalized-wikidata@v14-1M on Hugging Face, 4 021 409 clean labeled triples, CC-BY-SA 4.0 (inherits from Wikidata). The training code is in this repository. What's missing is the wall-time. If that's something you can spare, this page tells you how.
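
If you want to look at the corpus before committing GPU time, here is a minimal sketch for pulling that exact revision with huggingface_hub (the file layout inside the dataset repo isn't documented on this page, so inspect whatever the download returns):

# Pull the v14-1M corpus revision locally. The contributor script does this
# for you; this is only for inspecting the data yourself.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EmmaLeonhart/normalized-wikidata",
    repo_type="dataset",
    revision="v14-1M",   # the 4,021,409-triple tier described above
)
print(local_dir)         # directory containing the corpus files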

What you'd be doing, in one command

git clone https://github.com/EmmaLeonhart/Loka
cd Loka

# Authenticate with YOUR Hugging Face account
huggingface-cli login

# Run the contributor script
python tools/contribute_v14_training.py --hf-user YOUR_HF_USERNAME

The script handles everything from there:

  1. Pulls the v14-1M training corpus from Hugging Face (~180 MB, cached on first run).
  2. Pulls the BPE tokenizer and vocab from EmmaLeonhart/loka@v12.
  3. Ensures a Hugging Face dataset repo exists under your account — default <your-user>/loka-v14-contribution.
  4. Runs training/train.py for 10 epochs at the standard architecture (44.5 M parameters, BPE, 6 layers, batch 16).
  5. Runs tools/epoch_snapshot_pusher.py in parallel, which watches the training log and uploads each completed epoch's checkpoint to your HF repo, tagged v14.1, v14.2, ... v14.10 (a minimal sketch of this upload loop follows the list).
  6. If a late epoch crashes for any reason, every earlier epoch is already preserved both locally and on Hugging Face. (This safety net was added after the v12 disaster — see history.)
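
The real uploader is tools/epoch_snapshot_pusher.py; the sketch below only illustrates the per-epoch upload-and-tag idea. The checkpoints/epoch_<N>.pt layout is an assumption, not a description of the script's actual internals (the repo type matches the list above, which calls the contribution repo a dataset repo).

# Illustrative sketch only, NOT the real tools/epoch_snapshot_pusher.py.
# Assumes checkpoints appear as checkpoints/epoch_<N>.pt, which is a guess.
import time
from pathlib import Path
from huggingface_hub import HfApi

api = HfApi()                                  # uses the token from `huggingface-cli login`
repo_id = "your-user/loka-v14-contribution"    # placeholder username
api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

pushed = set()
while len(pushed) < 10:                        # 10 epochs in the default recipe
    for ckpt in sorted(Path("checkpoints").glob("epoch_*.pt")):
        if ckpt.name in pushed:
            continue
        epoch = int(ckpt.stem.split("_")[1])
        api.upload_file(
            path_or_fileobj=str(ckpt),
            path_in_repo=ckpt.name,
            repo_id=repo_id,
            repo_type="dataset",
        )
        api.create_tag(repo_id, tag=f"v14.{epoch}", repo_type="dataset")   # v14.1 ... v14.10
        pushed.add(ckpt.name)
    if len(pushed) < 10:
        time.sleep(60)                         # poll once a minute

Pushing each epoch the moment it finishes is exactly what makes point 6 work: a crash in epoch 8 still leaves epochs 1 through 7 on the Hub.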

Estimated wall-time, by hardware

GPU                                          Per epoch   10 epochs total
RTX 4090 (24 GB)                             ~4 h        ~40 h
RTX 4080 (16 GB)                             ~5 h        ~50 h
RTX 3090 (24 GB)                             ~6 h        ~60 h
RTX 4070 Laptop (8 GB) — the project's box   ~8 h        ~80 h
A100 / H100 (cloud)                          ~1–2 h      ~10–20 h

If you have less than 8 GB VRAM, the script will likely OOM in epoch 1. If you have more, you can pass --batch-size 32 or higher and finish proportionally faster.
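
If you're not sure which side of that line your card falls on, here is a quick pre-flight check (assumes PyTorch, which training needs anyway; reported totals come in slightly under the nominal size, hence the margins):

# Pre-flight VRAM check; the thresholds mirror the guidance above
# (8 GB runs the default batch 16, more leaves room for --batch-size 32+).
import torch

assert torch.cuda.is_available(), "No CUDA device visible; training will not run."
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024 ** 3       # GiB; ~7.9 on a nominal 8 GB card
print(f"{props.name}: {vram_gb:.1f} GB VRAM")

if vram_gb < 7.5:
    print("Under 8 GB: the default config will likely OOM in epoch 1.")
elif vram_gb < 10:
    print("8 GB class: stick with the default --batch-size 16.")
else:
    print("Extra headroom: try --batch-size 32 or higher.")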

Before you start — please raise a GitHub issue

Please raise an issue at github.com/EmmaLeonhart/Loka/issues before kicking off training. Otherwise the maintainer has no idea anyone is running this, and two contributors might duplicate the work.

Issue template (copy-paste, fill in the placeholders):

Title: [v14] GPU donation: planning to train

Body:
Hi! Going to run `tools/contribute_v14_training.py --hf-user <mine>` on my hardware.

System:
- GPU: <your-GPU>
- VRAM: <X GB>
- Estimated wall-clock: ~<hours> h

Will update this issue with the HF link to <your-user>/loka-v14-contribution
when the run is in progress / complete.

When the run finishes, please comment on the same issue with the HF link — Emma can then mirror your result to EmmaLeonhart/loka@v14 and credit you in the paper.

What gets pushed to your HF account, and who owns it

Each epoch's checkpoint is pushed to your Hugging Face account as <your-user>/loka-v14-contribution, tagged v14.N. You own that repo and the upload uses your token, not Emma's. The training script never has access to EmmaLeonhart/loka's write credentials.
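
Because every epoch carries its own tag, anyone (Emma included, when mirroring) can pin an exact checkpoint later. A minimal sketch; the filename is an assumption, so check the repo's file list on the Hub if it differs:

# Fetch one specific epoch from a contributor repo by its v14.N tag.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-user/loka-v14-contribution",   # placeholder username
    repo_type="dataset",                         # the contribution repo is described above as a dataset repo
    filename="epoch_3.pt",                       # assumed filename, not documented
    revision="v14.3",                            # per-epoch tag
)
print(path)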

After you comment on the GitHub issue, Emma mirrors your result to EmmaLeonhart/loka@v14 (a separate, explicit step), updates MODEL.json to pin v14, and adds your contribution to the paper's acknowledgements section. Your repo stays in place as the canonical original.

Observed hardware constraints — and why your hardware might converge differently

The numbers reported in the project's DEVLOG and paper are laptop numbers. The 4070 Laptop's sustained TGP cap is ~80 W, hit consistently during training (verified via nvidia-smi while v13 was running: 74 W actual against the 80 W cap, 74 °C, 83 % GPU utilization). A desktop 4070 has roughly a 200 W power budget, 2.5× the laptop's sustained cap, and an A100 or H100 has far more again.
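
If you want to report numbers comparable to those, the same fields can be polled during your run. A small sketch using nvidia-smi's query interface (stop it with Ctrl-C when training ends):

# Log power draw, temperature, utilization, and memory once a minute so a
# contributor run can be compared against the laptop baseline quoted above.
import subprocess
import time

FIELDS = "power.draw,temperature.gpu,utilization.gpu,memory.used"

with open("gpu_log.csv", "a") as log:
    while True:
        row = subprocess.run(
            ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        log.write(f"{int(time.time())},{row}\n")
        log.flush()
        time.sleep(60)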

For wall-time this matters in the obvious way (per-epoch time scales roughly inversely with power, modulo memory bandwidth). But it can also change what the model converges to. Adam's momentum and learning-rate schedule interact with batch size, and with more VRAM you can take larger batches — which changes the gradient noise and sometimes the depth of the basin Adam settles into. The v13 run on Emma's laptop appears to plateau around ppl 240–260 starting at epoch 2; we don't know yet whether that's the data's information ceiling or whether more aggressive batch sizes + power headroom would push lower. A contributor run on a 24 GB card at --batch-size 64 would be a genuinely different empirical data point, not just a faster version of the same run.
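
If you do raise the batch size, one common heuristic is to scale the learning rate along with it, either linearly or by the square root of the batch ratio. This is shown purely for illustration; nothing on this page says train.py applies either rule for you, and BASE_LR below is a placeholder, not the project's actual value:

# Two common batch-size/learning-rate scaling heuristics, illustration only.
BASE_BATCH = 16     # the laptop baseline batch size
BASE_LR = 3e-4      # placeholder, NOT the project's configured learning rate

def lr_linear(batch_size: int) -> float:
    return BASE_LR * batch_size / BASE_BATCH

def lr_sqrt(batch_size: int) -> float:
    return BASE_LR * (batch_size / BASE_BATCH) ** 0.5

print(lr_linear(64), lr_sqrt(64))   # 4x vs 2x the base rate at --batch-size 64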

Concrete things a contributor with better hardware might observe that we can't:

  - whether the ppl 240–260 plateau from epoch 2 onward is the data's information ceiling or an artifact of the laptop-constrained batch size;
  - how per-epoch wall-time actually scales once the ~80 W sustained power cap no longer applies;
  - what a --batch-size 64 run on a 24 GB card converges to, as a comparison point against the batch-16 baseline.

None of this is a blocker for contribution at the laptop-equivalent config. But if you're contributing and have more headroom, feel free to experiment with the --batch-size and --epochs flags on contribute_v14_training.py — the per-epoch HF snapshots mean every alternate-config run produces shipped artifacts other people can compare against, even if a given experiment turns out worse than baseline.
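
On a 24 GB card, that alternate-config run might be launched like this; the flag values are examples, and only --hf-user, --batch-size, and --epochs are flags this page documents:

# Launch an alternate-config contributor run from Python (e.g. inside a
# longer orchestration script). Replace the placeholder username.
import subprocess

subprocess.run(
    [
        "python", "tools/contribute_v14_training.py",
        "--hf-user", "your-hf-username",   # placeholder
        "--batch-size", "64",              # uses the extra VRAM on a 24 GB card
        "--epochs", "10",                  # the default recipe length
    ],
    check=True,
)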

Cost (if you don't have hardware locally)

If you're considering a cloud GPU, the rough budget:

Provider            GPU class        Approx hourly   Approx total (10 epochs)
RunPod              RTX 4090         ~$0.40          ~$16
RunPod / Vast.ai    RTX 3090         ~$0.30          ~$18
Lambda Labs         A100 (40 GB)     ~$1.10          ~$15
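
The totals are just hourly rate × hours per epoch × 10 epochs, using the per-epoch estimates from the wall-time table above; redo the arithmetic with whatever rate and epoch time you actually see:

# Recompute the cloud budget for an observed hourly rate and per-epoch time.
def training_cost(hourly_usd: float, hours_per_epoch: float, epochs: int = 10) -> float:
    return hourly_usd * hours_per_epoch * epochs

print(training_cost(0.40, 4.0))   # RTX 4090: ~$16
print(training_cost(0.30, 6.0))   # RTX 3090: ~$18
print(training_cost(1.10, 1.5))   # A100: ~$16.50, near the table's ~$15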

Emma is not paying for cloud GPU on this project, so v14 stays on the donor-only path. But if you're already paying for cloud GPU for other work and have a spare day's worth of compute, this is a fairly small budget to put a meaningful result on Hugging Face.

What you get out of it

Your Hugging Face repo, <your-user>/loka-v14-contribution, stays up as the canonical original of the v14 checkpoints; EmmaLeonhart/loka@v14 is mirrored from it. You're credited in the paper's acknowledgements section, and every epoch you push is a shipped artifact other people can compare against.

Anything else?

If something fails or you want to do something different from the default recipe (more epochs, different architecture, a different corpus tier), open an issue and we'll figure it out. The contributor script is intentionally simple — one corpus, one model size, one batch size that fits 8 GB — but the underlying training code (training/train.py) supports everything that v3 onward supported.

Thank you for considering this. The whole project is open-source under Apache 2.0, the corpus is CC-BY-SA 4.0, and your contribution moves the entire normalized-wikidata model series forward.

Open a GitHub issue    Read the contributor script