Training Mode

Pre-Train Direct: enter raw text directly
Pre-Train File: upload a text file
SFT Direct: enter ChatML directly
SFT File: upload a ChatML file (see the ChatML example after this list)
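
The two SFT modes expect conversations in ChatML. A minimal single-turn example, using the standard <|im_start|>/<|im_end|> wrapper tokens; whether the app also expects a system turn is an assumption not settled here:

    <|im_start|>user
    What is the capital of France?<|im_end|>
    <|im_start|>assistant
    The capital of France is Paris.<|im_end|>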

For pre-training, enter raw text; the model learns to predict the next token.
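
To make that objective concrete, here is an illustrative TypeScript sketch (not the app's code) of how an encoded token stream turns into (context, next-token) training pairs:

    // Build (context, target) pairs for next-token prediction from encoded ids.
    function nextTokenPairs(ids: number[]): Array<[number[], number]> {
      const pairs: Array<[number[], number]> = [];
      for (let t = 1; t < ids.length; t++) {
        pairs.push([ids.slice(0, t), ids[t]]); // predict ids[t] from everything before it
      }
      return pairs;
    }

In practice the context is capped at the model's block size rather than growing without bound, but the objective is the same.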

Model Architecture
Est. Params: ~21M (vocab ~1,756)
Tokenizer (BPE)

Higher values yield a larger vocabulary and stronger compression (fewer tokens per character of input).
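
A minimal sketch of why that trade-off holds: each BPE merge turns the most frequent adjacent token pair into one new vocabulary entry, so every additional merge grows the vocabulary by one and shortens the encoded text (illustrative TypeScript, not the app's implementation):

    // Find the most frequent adjacent pair in the current token sequence.
    function mostFrequentPair(tokens: string[]): [string, string] | null {
      const counts = new Map<string, number>();
      for (let i = 0; i < tokens.length - 1; i++) {
        const key = tokens[i] + "\u0000" + tokens[i + 1];
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
      let best: string | null = null;
      let bestCount = 0;
      for (const [key, count] of counts) {
        if (count > bestCount) { best = key; bestCount = count; }
      }
      return best === null ? null : (best.split("\u0000") as [string, string]);
    }

    // Replace each occurrence of the pair with one merged token.
    // e.g. ["a","b","a","b"] with pair ["a","b"] becomes ["ab","ab"].
    function applyMerge(tokens: string[], pair: [string, string]): string[] {
      const out: string[] = [];
      for (let i = 0; i < tokens.length; i++) {
        if (i + 1 < tokens.length && tokens[i] === pair[0] && tokens[i + 1] === pair[1]) {
          out.push(pair[0] + pair[1]); // each merge adds one vocabulary entry
          i++;
        } else {
          out.push(tokens[i]);
        }
      }
      return out;
    }

Encoding then means starting from characters and applying the learned merges, in the order they were learned.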

Training Configuration

Adam uses per-parameter adaptive learning rates with momentum; disable it to fall back to plain SGD.
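
For reference, the two update rules per scalar parameter, as an illustrative TypeScript sketch (the defaults below are the common Adam defaults, not necessarily the app's):

    // Plain SGD: step directly along the negative gradient.
    function sgdStep(p: number, g: number, lr = 0.01): number {
      return p - lr * g;
    }

    // Adam: momentum (first moment) plus an adaptive per-parameter scale (second moment).
    function adamStep(
      p: number, g: number, m: number, v: number, t: number,
      lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8,
    ): { p: number; m: number; v: number } {
      m = beta1 * m + (1 - beta1) * g;          // running mean of gradients
      v = beta2 * v + (1 - beta2) * g * g;      // running mean of squared gradients
      const mHat = m / (1 - Math.pow(beta1, t)); // bias correction (t starts at 1)
      const vHat = v / (1 - Math.pow(beta2, t));
      return { p: p - lr * mHat / (Math.sqrt(vHat) + eps), m, v };
    }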

Training Controls
Status panel: run state (Idle / Waiting...) plus live Epoch, Step, Loss, and Avg Loss readouts.
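
One plausible reading of the Avg Loss readout is a running mean of the per-step loss (an assumption; the app may use a windowed average instead):

    // Running mean of the step loss; reset alongside the step counter.
    let steps = 0;
    let lossSum = 0;
    function recordLoss(stepLoss: number): number {
      steps += 1;
      lossSum += stepLoss;
      return lossSum / steps; // displayed as Avg Loss
    }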
Training Log
Save & Load

Save/load the complete training state, including the model, tokenizer, hyperparameters, and training data.

Export/import just the trained model weights.

Export/import just the trained tokenizer.
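
As a sketch of what a full-state checkpoint might look like as a single JSON blob (every field name below is an assumption for illustration, not the app's actual schema):

    // Hypothetical checkpoint layout covering the four saved components.
    interface Checkpoint {
      weights: Record<string, number[]>;   // flattened tensors by parameter name
      merges: Array<[string, string]>;     // BPE merge rules, in the order learned
      hyperparams: Record<string, number>; // e.g. learning rate, layer count
      trainingData: string;                // raw text or ChatML
    }

    function saveCheckpoint(state: Checkpoint): string {
      return JSON.stringify(state);
    }

    function loadCheckpoint(json: string): Checkpoint {
      return JSON.parse(json) as Checkpoint;
    }

The model-only and tokenizer-only exports would then correspond to just the weights and merges fields, respectively.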