Training Mode

Pre-Train Direct: enter raw text directly
Pre-Train File: upload a text file
SFT Direct: enter ChatML directly
SFT File: upload a ChatML file (see the ChatML example after this list)
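
The two SFT modes expect conversations in ChatML. A minimal single-turn example, using the standard <|im_start|>/<|im_end|> wrapper tokens; whether the app also expects a system turn is an assumption not settled here:

    <|im_start|>user
    What is the capital of France?<|im_end|>
    <|im_start|>assistant
    The capital of France is Paris.<|im_end|>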

For pre-training, enter raw text; the model learns to predict the next token.
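
To make that objective concrete, here is an illustrative TypeScript sketch (not the app's code) of how an encoded token stream turns into (context, next-token) training pairs:

    // Build (context, target) pairs for next-token prediction from encoded ids.
    function nextTokenPairs(ids: number[]): Array<[number[], number]> {
      const pairs: Array<[number[], number]> = [];
      for (let t = 1; t < ids.length; t++) {
        pairs.push([ids.slice(0, t), ids[t]]); // predict ids[t] from everything before it
      }
      return pairs;
    }

In practice the context is capped at the model's block size rather than growing without bound, but the objective is the same.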

Model Architecture
Est. Params: ~21M (vocab ~1,756)
Tokenizer (BPE)

Higher values yield a larger vocabulary and stronger compression (fewer tokens per character of input).
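
A minimal sketch of why that trade-off holds: each BPE merge turns the most frequent adjacent token pair into one new vocabulary entry, so every additional merge grows the vocabulary by one and shortens the encoded text (illustrative TypeScript, not the app's implementation):

    // Find the most frequent adjacent pair in the current token sequence.
    function mostFrequentPair(tokens: string[]): [string, string] | null {
      const counts = new Map<string, number>();
      for (let i = 0; i < tokens.length - 1; i++) {
        const key = tokens[i] + "\u0000" + tokens[i + 1];
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
      let best: string | null = null;
      let bestCount = 0;
      for (const [key, count] of counts) {
        if (count > bestCount) { best = key; bestCount = count; }
      }
      return best === null ? null : (best.split("\u0000") as [string, string]);
    }

    // Replace each occurrence of the pair with one merged token.
    // e.g. ["a","b","a","b"] with pair ["a","b"] becomes ["ab","ab"].
    function applyMerge(tokens: string[], pair: [string, string]): string[] {
      const out: string[] = [];
      for (let i = 0; i < tokens.length; i++) {
        if (i + 1 < tokens.length && tokens[i] === pair[0] && tokens[i + 1] === pair[1]) {
          out.push(pair[0] + pair[1]); // each merge adds one vocabulary entry
          i++;
        } else {
          out.push(tokens[i]);
        }
      }
      return out;
    }

Encoding then means starting from characters and applying the learned merges, in the order they were learned.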

Training Configuration

Adam uses per-parameter adaptive learning rates with momentum; disable it to fall back to plain SGD.
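
For reference, the two update rules per scalar parameter, as an illustrative TypeScript sketch (the defaults below are the common Adam defaults, not necessarily the app's):

    // Plain SGD: step directly along the negative gradient.
    function sgdStep(p: number, g: number, lr = 0.01): number {
      return p - lr * g;
    }

    // Adam: momentum (first moment) plus an adaptive per-parameter scale (second moment).
    function adamStep(
      p: number, g: number, m: number, v: number, t: number,
      lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8,
    ): { p: number; m: number; v: number } {
      m = beta1 * m + (1 - beta1) * g;          // running mean of gradients
      v = beta2 * v + (1 - beta2) * g * g;      // running mean of squared gradients
      const mHat = m / (1 - Math.pow(beta1, t)); // bias correction (t starts at 1)
      const vHat = v / (1 - Math.pow(beta2, t));
      return { p: p - lr * mHat / (Math.sqrt(vHat) + eps), m, v };
    }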

Training Controls
Status panel: run state (Idle / Waiting...) plus live Epoch, Step, Loss, and Avg Loss readouts.
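
One plausible reading of the Avg Loss readout is a running mean of the per-step loss (an assumption; the app may use a windowed average instead):

    // Running mean of the step loss; reset alongside the step counter.
    let steps = 0;
    let lossSum = 0;
    function recordLoss(stepLoss: number): number {
      steps += 1;
      lossSum += stepLoss;
      return lossSum / steps; // displayed as Avg Loss
    }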
Training Log
Save & Load

Save/load the complete training state, including the model, tokenizer, hyperparameters, and training data.

Export/import just the trained model weights.

Export/import just the trained tokenizer.
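
As a sketch of what a full-state checkpoint might look like as a single JSON blob (every field name below is an assumption for illustration, not the app's actual schema):

    // Hypothetical checkpoint layout covering the four saved components.
    interface Checkpoint {
      weights: Record<string, number[]>;   // flattened tensors by parameter name
      merges: Array<[string, string]>;     // BPE merge rules, in the order learned
      hyperparams: Record<string, number>; // e.g. learning rate, layer count
      trainingData: string;                // raw text or ChatML
    }

    function saveCheckpoint(state: Checkpoint): string {
      return JSON.stringify(state);
    }

    function loadCheckpoint(json: string): Checkpoint {
      return JSON.parse(json) as Checkpoint;
    }

The model-only and tokenizer-only exports would then correspond to just the weights and merges fields, respectively.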