# Troubleshooting

## Training Issues

### Out of Memory (OOM)
Symptoms: CUDA out of memory error during training.
Solutions:
- Reduce sequence length

  ```python
  # Trainer truncation
  args = SFTConfig(max_length=4096)  # Reduced from 8192

  # Or reduce the model context window during load
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name=input_model,
      max_seq_length=4096,
      load_in_4bit=True,
  )
  ```

- Ensure gradient checkpointing is enabled

  ```python
  use_gradient_checkpointing = "unsloth"
  ```

- Reduce LoRA rank

  ```python
  r = 16  # Reduced from 32
  ```

- Reduce batch size

  ```python
  per_device_train_batch_size = 1
  ```

- Use 4-bit quantization

  ```python
  load_in_4bit = True
  ```
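These settings can be combined in one place. A minimal sketch of a memory-lean setup, assuming the Unsloth + TRL workflow used in this guide; `input_model` is a placeholder for your base model name, the accumulation value is illustrative, and the LoRA call relies on Unsloth's default target modules:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig

# Load the base model with a shorter context window and 4-bit weights
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=input_model,   # placeholder for your base model
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters with a smaller rank and Unsloth's gradient checkpointing
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    use_gradient_checkpointing="unsloth",
)

# Keep the per-device batch small; recover effective batch size via accumulation
args = SFTConfig(
    max_length=4096,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
# Pass model, tokenizer, dataset, and args to SFTTrainer as usual
```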
### Loss Not Decreasing
Symptoms: Training loss stays flat or increases.
Solutions:
- Check dataset quality - validate that your dataset has the proper format (see the sketch after this list)
- Verify chat template - must match the model type
- Increase learning rate slightly

  ```python
  learning_rate = 5e-4  # Try higher if stuck
  ```

- Check for data corruption - run the validation script
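A quick sanity check for format problems is to render a few rows through the chat template by hand. A minimal sketch, assuming a Hugging Face `datasets` dataset with a `messages` column and the tokenizer from training (`dataset` and `tokenizer` are placeholders):

```python
# Render the first few examples and eyeball the prompt/response boundaries
for row in dataset.select(range(3)):
    messages = row["messages"]
    assert isinstance(messages, list) and len(messages) > 0, "bad messages field"
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    print(text[:500])
    print("-" * 40)
```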
### Training Very Slow
Solutions:
- Reduce sequence length (`max_length` in SFTConfig and/or `max_seq_length` when loading the model)
- Enable 4-bit loading
- Use the `adamw_8bit` optimizer
- Reduce logging frequency (these settings are combined in the sketch after this list)

  ```python
  logging_steps = 10  # Instead of 1
  ```
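A sketch of these speed settings gathered into one TRL config, assuming the same Unsloth setup as above; batch size and accumulation values are illustrative:

```python
from trl import SFTConfig

args = SFTConfig(
    max_length=4096,               # shorter sequences train faster
    optim="adamw_8bit",            # 8-bit AdamW optimizer
    logging_steps=10,              # log less often
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
)
# load_in_4bit=True goes on FastLanguageModel.from_pretrained, not here
```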
## Dataset Issues

### "messages_not_list_or_empty" Error
Cause: Dataset rows don't have a valid `messages` field.
Solution: Ensure the dataset follows this format:

```json
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

### "think_tag_unbalanced" Error
Cause: Mismatched `<think>` and `</think>` tags.
Solution: Validate that think tags are properly balanced:

```python
content = assistant_message["content"]
assert content.count("<think>") == content.count("</think>")
assert content.find("<think>") < content.find("</think>")
```

### "missing_final_after_think" Error
Cause: Assistant message ends with `</think>` and has no final answer.
Solution: Ensure content exists after the closing think tag:

```python
after_think = content.split("</think>", 1)[1].strip()
assert len(after_think) > 0
```

### "does_not_end_with_assistant" Error
Cause: Conversation doesn’t end with an assistant turn.
Solution: Each training example must end with an assistant message.
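The four checks above can be combined into a single validation pass over the dataset. A minimal sketch, assuming rows shaped like the `messages` example above; `validate_row` is a hypothetical helper, not part of any library:

```python
def validate_row(row):
    """Return a list of error codes for one dataset row (empty list = valid)."""
    errors = []
    messages = row.get("messages")
    if not isinstance(messages, list) or len(messages) == 0:
        return ["messages_not_list_or_empty"]

    if messages[-1].get("role") != "assistant":
        errors.append("does_not_end_with_assistant")

    for message in messages:
        if message.get("role") != "assistant":
            continue
        content = message.get("content", "")
        if content.count("<think>") != content.count("</think>"):
            errors.append("think_tag_unbalanced")
        elif "</think>" in content:
            after_think = content.split("</think>", 1)[1].strip()
            if not after_think:
                errors.append("missing_final_after_think")
    return errors
```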
## Model Issues

### Wrong Output Format
Symptoms: Model doesn’t use think tags, or uses them incorrectly.
Solutions:
- Verify the chat template matches the model type (see the sketch after this list)

  ```python
  # For thinking models
  chat_template = "qwen3-thinking"

  # For instruct models
  chat_template = "qwen3-instruct"
  ```

- Check training data format - ensure all assistant messages use the correct format
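A sketch of re-applying the template with Unsloth's `get_chat_template` helper; this assumes your Unsloth version accepts the template names shown above, and `example` is a placeholder for one training row:

```python
from unsloth.chat_templates import get_chat_template

# Re-apply the template that matches the base model family
tokenizer = get_chat_template(tokenizer, chat_template="qwen3-thinking")

# Render one training example and check the <think> section lands where expected
text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
print(text)
```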
### Model Outputs Gibberish
Solutions:
- Check that the correct tokenizer is loaded with the model
- Verify the chat template is applied (the sketch after this list runs a quick end-to-end check)
- Ensure the model loaded correctly (check for warnings during load)
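A quick way to isolate tokenizer and template problems is to run a single generation by hand. A minimal sketch, assuming the Unsloth model and tokenizer from training are still in memory:

```python
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch Unsloth kernels to inference mode

messages = [{"role": "user", "content": "What is 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Keep special tokens visible so you can see the chat template and think tags
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```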
### Inconsistent Quality
Possible causes:
- Training data quality varies
- Training stopped too early
- Overfitting to training data
Solutions:
- Use a higher-quality dataset (e.g., one generated with Claude 4.5 Opus)
- Train for more steps
- Increase dataset diversity
## Export Issues

### GGUF Export Fails
Solutions:
- Ensure llama.cpp is available

  ```bash
  pip install llama-cpp-python
  ```

- Try fewer quantization methods (see the export sketch after this list)

  ```python
  quantization_method = ["q8_0"]  # Start with one
  ```

- Check disk space - GGUF files can be large
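A sketch of a single-quantization export using Unsloth's GGUF helper; this assumes your Unsloth version exposes `save_pretrained_gguf`, and `gguf_out` is a placeholder directory:

```python
# Export one GGUF quantization first; add more methods once this succeeds
model.save_pretrained_gguf(
    "gguf_out",                 # placeholder output directory
    tokenizer,
    quantization_method="q8_0",
)
```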
### HuggingFace Upload Fails
Solutions:
- Verify the token has write access (the sketch after this list fails fast on a bad token or repo name)
- Check repository name is valid
- Ensure enough storage quota
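A quick credential check with `huggingface_hub` before starting a long upload; the token and repository names below are placeholders:

```python
from huggingface_hub import HfApi, create_repo

token = "hf_..."                      # placeholder; use your actual token
api = HfApi(token=token)
print(api.whoami()["name"])           # fails fast if the token is invalid

# Creating (or reusing) the target repo surfaces permission and naming errors early
create_repo("your-username/your-model", token=token, exist_ok=True)
```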
## Environment Issues

### CUDA Version Mismatch
Symptoms: Various CUDA errors or model won’t load.
Solutions:
- Use Unsloth's recommended versions

  ```bash
  pip install unsloth
  ```

- Check PyTorch CUDA compatibility

  ```python
  import torch
  print(torch.cuda.is_available())
  print(torch.version.cuda)
  ```
### Tokenizer Parallelism Warning
Symptoms: Warning about tokenizer parallelism.
Solution:

```python
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

### Windows Multiprocessing Issues
Symptoms: Hangs or errors on Windows.
Solution:

```python
if __name__ == "__main__":
    import multiprocessing as mp
    mp.freeze_support()
    # Training code here
```

Also set:

```python
dataset_num_proc = 1
dataloader_num_workers = 0
```

## Getting Help
If you’re still stuck:
- Check Unsloth documentation
- Visit TeichAI on HuggingFace
- Review the Unsloth Discord