# Troubleshooting

## Training Issues

### Out of Memory (OOM)
Symptoms: CUDA out of memory error during training.
Solutions:
- Reduce sequence length

  ```python
  # Trainer truncation
  args = SFTConfig(max_length=4096)  # Reduced from 8192

  # Or reduce the model context window during load
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name=input_model,
      max_seq_length=4096,
      load_in_4bit=True,
  )
  ```

- Ensure gradient checkpointing is enabled

  ```python
  use_gradient_checkpointing = "unsloth"
  ```

- Reduce LoRA rank

  ```python
  r = 16  # Reduced from 32
  ```

- Reduce batch size

  ```python
  per_device_train_batch_size = 1
  ```

- Use 4-bit quantization

  ```python
  load_in_4bit = True
  ```
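These settings can be combined in one place. A minimal sketch of a memory-lean setup, assuming the Unsloth + TRL workflow used in this guide; `input_model` is a placeholder for your base model name, the accumulation value is illustrative, and the LoRA call relies on Unsloth's default target modules:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig

# Load the base model with a shorter context window and 4-bit weights
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=input_model,   # placeholder for your base model
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters with a smaller rank and Unsloth's gradient checkpointing
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    use_gradient_checkpointing="unsloth",
)

# Keep the per-device batch small; recover effective batch size via accumulation
args = SFTConfig(
    max_length=4096,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
# Pass model, tokenizer, dataset, and args to SFTTrainer as usual
```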
### Loss Not Decreasing
Symptoms: Training loss stays flat or increases.
Solutions:
- Check dataset quality - validate that your dataset has the proper format (see the sketch after this list)
- Verify chat template - must match the model type
- Increase learning rate slightly

  ```python
  learning_rate = 5e-4  # Try higher if stuck
  ```

- Check for data corruption - run the validation script
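A quick sanity check for format problems is to render a few rows through the chat template by hand. A minimal sketch, assuming a Hugging Face `datasets` dataset with a `messages` column and the tokenizer from training (`dataset` and `tokenizer` are placeholders):

```python
# Render the first few examples and eyeball the prompt/response boundaries
for row in dataset.select(range(3)):
    messages = row["messages"]
    assert isinstance(messages, list) and len(messages) > 0, "bad messages field"
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    print(text[:500])
    print("-" * 40)
```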
### Training Very Slow
Solutions:
- Reduce sequence length (`max_length` in SFTConfig and/or `max_seq_length` when loading the model)
- Enable 4-bit loading
- Use the `adamw_8bit` optimizer
- Reduce logging frequency (these settings are combined in the sketch after this list)

  ```python
  logging_steps = 10  # Instead of 1
  ```
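A sketch of these speed settings gathered into one TRL config, assuming the same Unsloth setup as above; batch size and accumulation values are illustrative:

```python
from trl import SFTConfig

args = SFTConfig(
    max_length=4096,               # shorter sequences train faster
    optim="adamw_8bit",            # 8-bit AdamW optimizer
    logging_steps=10,              # log less often
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
)
# load_in_4bit=True goes on FastLanguageModel.from_pretrained, not here
```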
## Dataset Issues

### "messages_not_list_or_empty" Error
Cause: Dataset rows don't have a valid `messages` field.
Solution: Ensure the dataset follows this format:

```json
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

### "think_tag_unbalanced" Error
Cause: Mismatched `<think>` and `</think>` tags.
Solution: Validate that think tags are properly balanced:

```python
content = assistant_message["content"]
assert content.count("<think>") == content.count("</think>")
assert content.find("<think>") < content.find("</think>")
```

### "missing_final_after_think" Error
Cause: Assistant message ends with `</think>` and has no final answer.
Solution: Ensure content exists after the closing think tag:

```python
after_think = content.split("</think>", 1)[1].strip()
assert len(after_think) > 0
```

### "does_not_end_with_assistant" Error
Cause: Conversation doesn’t end with an assistant turn.
Solution: Each training example must end with an assistant message.
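The four checks above can be combined into a single validation pass over the dataset. A minimal sketch, assuming rows shaped like the `messages` example above; `validate_row` is a hypothetical helper, not part of any library:

```python
def validate_row(row):
    """Return a list of error codes for one dataset row (empty list = valid)."""
    errors = []
    messages = row.get("messages")
    if not isinstance(messages, list) or len(messages) == 0:
        return ["messages_not_list_or_empty"]

    if messages[-1].get("role") != "assistant":
        errors.append("does_not_end_with_assistant")

    for message in messages:
        if message.get("role") != "assistant":
            continue
        content = message.get("content", "")
        if content.count("<think>") != content.count("</think>"):
            errors.append("think_tag_unbalanced")
        elif "</think>" in content:
            after_think = content.split("</think>", 1)[1].strip()
            if not after_think:
                errors.append("missing_final_after_think")
    return errors
```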
## Model Issues

### Wrong Output Format
Symptoms: Model doesn’t use think tags, or uses them incorrectly.
Solutions:
- Verify the chat template matches the model type (see the sketch after this list)

  ```python
  # For thinking models
  chat_template = "qwen3-thinking"

  # For instruct models
  chat_template = "qwen3-instruct"
  ```

- Check training data format - ensure all assistant messages use the correct format
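A sketch of re-applying the template with Unsloth's `get_chat_template` helper; this assumes your Unsloth version accepts the template names shown above, and `example` is a placeholder for one training row:

```python
from unsloth.chat_templates import get_chat_template

# Re-apply the template that matches the base model family
tokenizer = get_chat_template(tokenizer, chat_template="qwen3-thinking")

# Render one training example and check the <think> section lands where expected
text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
print(text)
```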
### Model Outputs Gibberish
Solutions:
- Check that the correct tokenizer is loaded with the model
- Verify the chat template is applied (the sketch after this list runs a quick end-to-end check)
- Ensure the model loaded correctly (check for warnings during load)
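A quick way to isolate tokenizer and template problems is to run a single generation by hand. A minimal sketch, assuming the Unsloth model and tokenizer from training are still in memory:

```python
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch Unsloth kernels to inference mode

messages = [{"role": "user", "content": "What is 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Keep special tokens visible so you can see the chat template and think tags
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```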
### Inconsistent Quality
Possible causes:
- Training data quality varies
- Training stopped too early
- Overfitting to training data
Solutions:
- Use a higher-quality dataset (e.g., one generated with Claude 4.5 Opus)
- Train for more steps
- Increase dataset diversity
## Export Issues

### GGUF Export Fails
Solutions:
- Ensure llama.cpp is available

  ```bash
  pip install llama-cpp-python
  ```

- Try fewer quantization methods (see the export sketch after this list)

  ```python
  quantization_method = ["q8_0"]  # Start with one
  ```

- Check disk space - GGUF files can be large
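A sketch of a single-quantization export using Unsloth's GGUF helper; this assumes your Unsloth version exposes `save_pretrained_gguf`, and `gguf_out` is a placeholder directory:

```python
# Export one GGUF quantization first; add more methods once this succeeds
model.save_pretrained_gguf(
    "gguf_out",                 # placeholder output directory
    tokenizer,
    quantization_method="q8_0",
)
```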
### HuggingFace Upload Fails
Solutions:
- Verify the token has write access (the sketch after this list fails fast on a bad token or repo name)
- Check repository name is valid
- Ensure enough storage quota
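A quick credential check with `huggingface_hub` before starting a long upload; the token and repository names below are placeholders:

```python
from huggingface_hub import HfApi, create_repo

token = "hf_..."                      # placeholder; use your actual token
api = HfApi(token=token)
print(api.whoami()["name"])           # fails fast if the token is invalid

# Creating (or reusing) the target repo surfaces permission and naming errors early
create_repo("your-username/your-model", token=token, exist_ok=True)
```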
## Environment Issues

### CUDA Version Mismatch
Symptoms: Various CUDA errors or model won’t load.
Solutions:
- Use Unsloth's recommended versions

  ```bash
  pip install unsloth
  ```

- Check PyTorch CUDA compatibility

  ```python
  import torch
  print(torch.cuda.is_available())
  print(torch.version.cuda)
  ```
### Tokenizer Parallelism Warning
Symptoms: Warning about tokenizer parallelism.
Solution:

```python
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

### Windows Multiprocessing Issues
Symptoms: Hangs or errors on Windows.
Solution:

```python
if __name__ == "__main__":
    import multiprocessing as mp
    mp.freeze_support()
    # Training code here
```

Also set:

```python
dataset_num_proc = 1
dataloader_num_workers = 0
```

## Getting Help
If you’re still stuck:
- Check Unsloth documentation
- Visit TeichAI on HuggingFace
- Review the Unsloth Discord