Troubleshooting

Training Issues

Out of Memory (OOM)

Symptoms: CUDA out of memory error during training.

Solutions:

  1. Reduce sequence length

    # Trainer truncation
    args = SFTConfig(max_length=4096) # Reduced from 8192
    # Or reduce model context window during load
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=input_model,
        max_seq_length=4096,
        load_in_4bit=True,
    )
  2. Ensure gradient checkpointing is enabled

    use_gradient_checkpointing = "unsloth"
  3. Reduce LoRA rank

    r = 16 # Reduced from 32
  4. Reduce batch size

    per_device_train_batch_size = 1
  5. Use 4-bit quantization (a combined sketch follows this list)

    load_in_4bit = True
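
Combining these settings in one place, a minimal sketch (other LoRA and trainer arguments are omitted; input_model is the placeholder used above, and exact argument names can differ between Unsloth/TRL versions):

from unsloth import FastLanguageModel
from trl import SFTConfig

# Smaller context window and 4-bit weights at load time
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=input_model,   # placeholder, as in the examples above
    max_seq_length=4096,
    load_in_4bit=True,
)

# Lower LoRA rank plus Unsloth's gradient checkpointing
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    use_gradient_checkpointing="unsloth",
)

# Shorter sequences and the smallest per-device batch size in the trainer config
args = SFTConfig(
    max_length=4096,
    per_device_train_batch_size=1,
)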

Loss Not Decreasing

Symptoms: Training loss stays flat or increases.

Solutions:

  1. Check dataset quality - Validate that your dataset has the proper format (see the validation sketch after this list)
  2. Verify chat template - It must match the model type
  3. Increase learning rate slightly
    learning_rate = 5e-4 # Try higher if stuck
  4. Check for data corruption - Run a validation script over every row
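
A minimal validation sketch, assuming dataset is an iterable of rows in the messages format described under Dataset Issues below:

def validate_row(row):
    msgs = row.get("messages")
    assert isinstance(msgs, list) and len(msgs) > 0, "messages missing or empty"
    for m in msgs:
        assert m.get("role") in {"system", "user", "assistant"}, "unexpected role"
        assert isinstance(m.get("content"), str) and m["content"].strip(), "empty content"
    assert msgs[-1]["role"] == "assistant", "must end with an assistant turn"

for i, row in enumerate(dataset):
    try:
        validate_row(row)
    except AssertionError as err:
        print(f"row {i}: {err}")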

Training Very Slow

Solutions:

  1. Reduce sequence length (max_length in SFTConfig and/or max_seq_length when loading the model)
  2. Enable 4-bit loading
  3. Use the adamw_8bit optimizer (a config sketch follows this list)
  4. Reduce logging frequency
    logging_steps = 10 # Instead of 1
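
The trainer-side settings from this list in one sketch (values are illustrative; 4-bit loading itself is set with load_in_4bit=True in FastLanguageModel.from_pretrained, as shown in the OOM section above):

from trl import SFTConfig

args = SFTConfig(
    max_length=4096,      # shorter sequences -> faster steps
    optim="adamw_8bit",   # 8-bit optimizer states
    logging_steps=10,     # log less often than every step
)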

Dataset Issues

“messages_not_list_or_empty” Error

Cause: Dataset rows don’t have a valid messages field.

Solution: Ensure dataset format:

{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

“think_tag_unbalanced” Error

Cause: Mismatched <think> and </think> tags.

Solution: Validate think tags are properly balanced:

content = assistant_message["content"]
# Tags must be balanced, and the opening tag must come before the closing tag
assert content.count("<think>") == content.count("</think>")
if "<think>" in content:
    assert content.find("<think>") < content.find("</think>")

“missing_final_after_think” Error

Cause: Assistant message ends at </think> with no final answer after it.

Solution: Ensure content exists after the closing think tag:

if "</think>" in content:
    after_think = content.split("</think>", 1)[1].strip()
    assert len(after_think) > 0, "no final answer after </think>"

“does_not_end_with_assistant” Error

Cause: Conversation doesn’t end with an assistant turn.

Solution: Each training example must end with an assistant message.
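
A one-line check per example (sketch; example is one dataset row in the messages format above):

assert example["messages"][-1]["role"] == "assistant"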

Model Issues

Wrong Output Format

Symptoms: Model doesn’t use think tags, or uses them incorrectly.

Solutions:

  1. Verify chat template matches model type

    # For thinking models
    chat_template = "qwen3-thinking"
    # For instruct models
    chat_template = "qwen3-instruct"
  2. Check training data format - Ensure all assistant messages use the correct format (the rendering check below helps verify this)
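
To see exactly what the model is trained and prompted on, render one example through the tokenizer's chat template and inspect it (a sketch; tokenizer is the one returned by FastLanguageModel.from_pretrained):

messages = [{"role": "user", "content": "What is 2 + 2?"}]

rendered = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant header the model completes
)
print(rendered)  # check for the expected <think> ... </think> scaffolding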

Model Outputs Gibberish

Solutions:

  1. Check that the correct tokenizer is loaded with the model
  2. Verify the chat template is applied (a quick smoke test follows this list)
  3. Ensure the model loaded correctly (check for warnings during load)
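
A quick generation smoke test (sketch; model and tokenizer come from FastLanguageModel.from_pretrained):

from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch the Unsloth model to inference mode

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Say hello."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If this prints coherent text, the tokenizer and chat template are being applied; gibberish here points at a load or template problem rather than the training data.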

Inconsistent Quality

Possible causes:

  1. Training data quality varies
  2. Training stopped too early
  3. Overfitting to training data

Solutions:

  1. Use a higher-quality dataset (e.g., one generated with Claude 4.5 Opus)
  2. Train for more steps
  3. Increase dataset diversity

Export Issues

GGUF Export Fails

Solutions:

  1. Ensure llama.cpp is available

    pip install llama-cpp-python
  2. Try fewer quantization methods (see the export sketch after this list)

    quantization_method = ["q8_0"] # Start with one
  3. Check disk space - GGUF files can be large
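
A hedged sketch for isolating the failure: check free disk space, then export a single quantization with Unsloth's save_pretrained_gguf (the output directory name is a placeholder):

import shutil

free_gb = shutil.disk_usage(".").free / 1e9
print(f"{free_gb:.1f} GB free")  # GGUF outputs can be tens of GB

# Export one quantization method at a time to narrow down failures
model.save_pretrained_gguf(
    "gguf_model",              # placeholder output directory
    tokenizer,
    quantization_method="q8_0",
)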

HuggingFace Upload Fails

Solutions:

  1. Verify the token has write access (a quick check follows this list)
  2. Check repository name is valid
  3. Ensure enough storage quota
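
To fail fast before a long upload, a quick token check with huggingface_hub (sketch; hf_token is a placeholder for your write-scoped token):

from huggingface_hub import HfApi

api = HfApi(token=hf_token)  # placeholder: a token with write access
print(api.whoami())          # raises if the token is invalid or expired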

Environment Issues

CUDA Version Mismatch

Symptoms: Various CUDA errors or model won’t load.

Solutions:

  1. Use Unsloth’s recommended versions

    pip install unsloth
  2. Check PyTorch CUDA compatibility

import torch
print(torch.cuda.is_available())  # should print True
print(torch.version.cuda)         # CUDA version PyTorch was built with

Tokenizer Parallelism Warning

Symptoms: Warning about tokenizer parallelism.

Solution:

import os
# Set this before tokenizers/transformers are imported or used in the process
os.environ["TOKENIZERS_PARALLELISM"] = "false"

Windows Multiprocessing Issues

Symptoms: Hangs or errors on Windows.

Solution:

import multiprocessing as mp

if __name__ == "__main__":
    mp.freeze_support()
    # Training code here

Also set:

dataset_num_proc = 1
dataloader_num_workers = 0

Getting Help

If you’re still stuck:

  1. Check Unsloth documentation
  2. Visit TeichAI on HuggingFace
  3. Review the Unsloth Discord