Configuration Options

Overview

This reference documents all configuration options used in TeichAI training scripts.

Global Configuration

# Identity
hf_account = "your-username" # HuggingFace username or organization
hf_token = "hf_..." # HuggingFace write token
output_model_name = "My-Distill" # Name for output model
# Model
input_model = "unsloth/Qwen3-4B" # Base model to fine-tune
chat_template = "qwen3" # Chat template type
# Dataset
dataset_id = "TeichAI/dataset-name" # HuggingFace dataset ID
dataset_file = "" # Local JSONL file path (alternative)
# Training
max_len = 8192 # Maximum sequence length
steps = 2000 # Training steps
resume = False # Resume from checkpoint
# Upload
private_upload = False # Upload as private model

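dataset_id and dataset_file are alternatives: set one or the other. As a rough sketch of how the two sources can be resolved (the exact loading logic in the TeichAI scripts may differ), a Hub dataset and a local JSONL file load like this:

from datasets import load_dataset

if dataset_file:
    # Local JSONL file, one example per line
    dataset = load_dataset("json", data_files=dataset_file, split="train")
else:
    # Dataset hosted on the HuggingFace Hub
    dataset = load_dataset(dataset_id, split="train")
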
Model Configuration

input_model

The base model to fine-tune. Use Unsloth model IDs for optimized training.

Model Type     | Example ID
Qwen3 Dense    | unsloth/Qwen3-4B, unsloth/Qwen3-8B
Qwen3 Thinking | unsloth/Qwen3-4B-Thinking-2507
Qwen3 Instruct | unsloth/Qwen3-4B-Instruct-2507
Qwen3 MoE      | unsloth/Qwen3-30B-A3B-Thinking-2507
Nemotron       | nvidia/Nemotron-Cascade-8B-Thinking

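For context, the base model named by input_model is typically loaded through Unsloth before LoRA adapters are attached. A minimal sketch (load_in_4bit=True is an illustrative choice here, not a documented default of these scripts):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=input_model,  # e.g. "unsloth/Qwen3-4B"
    max_seq_length=max_len,  # mirrors the max_len training option
    load_in_4bit=True,       # 4-bit loading to reduce VRAM
)
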
chat_template

Chat template for formatting conversations. Must match model type.

Template       | Use Case
qwen3          | Base Qwen3 models
qwen3-thinking | Thinking variants with <think> tags
qwen3-instruct | Instruct variants

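Unsloth provides a helper for attaching these templates to the tokenizer. A sketch, assuming the get_chat_template helper from unsloth.chat_templates:

from unsloth.chat_templates import get_chat_template

# Format conversations according to the selected template,
# e.g. "qwen3", "qwen3-thinking", or "qwen3-instruct"
tokenizer = get_chat_template(tokenizer, chat_template=chat_template)
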
LoRA Configuration

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                                  # LoRA rank
    target_modules=[...],                  # Layers to adapt
    lora_alpha=32,                         # Scaling factor
    lora_dropout=0,                        # Dropout rate
    bias="none",                           # Bias training
    use_gradient_checkpointing="unsloth",  # Memory optimization
    random_state=3407,                     # Reproducibility seed
    use_rslora=False,                      # Rank-stabilized LoRA
    loftq_config=None,                     # LoftQ configuration
)

r (LoRA Rank)

Value | Memory  | Capacity  | Recommendation
8     | Low     | Basic     | Simple tasks
16    | Low     | Medium    | Resource-constrained
32    | Medium  | High      | Default
64    | High    | Very High | Complex tasks
128   | Highest | Maximum   | Full capability

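Higher ranks mean more trainable adapter parameters. To see what a given r actually costs, you can inspect the model returned by get_peft_model; print_trainable_parameters comes from PEFT and should be available on it:

# Report trainable vs. total parameter counts for the chosen rank
model.print_trainable_parameters()
# prints something of the form "trainable params: ... || all params: ... || trainable%: ..."
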
target_modules

Layers to apply LoRA adapters to:

target_modules = [
    "q_proj",     # Query projection
    "k_proj",     # Key projection
    "v_proj",     # Value projection
    "o_proj",     # Output projection
    "gate_proj",  # Gate projection (MLP)
    "up_proj",    # Up projection (MLP)
    "down_proj",  # Down projection (MLP)
]

use_gradient_checkpointing

Value     | Effect
False     | Fastest, most memory
True      | Standard checkpointing
"unsloth" | Recommended; about 30% less VRAM

SFT Configuration

args = SFTConfig(
    # Data
    dataset_text_field="text",
    max_length=8192,
    # Batch
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    # Learning
    warmup_ratio=0.05,
    max_steps=2000,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    # Optimization
    optim="adamw_8bit",
    weight_decay=0.01,
    # Logging
    logging_steps=1,
    report_to="none",
    # Checkpoints
    output_dir="outputs",
    save_strategy="steps",
    save_steps=200,
    save_total_limit=20,
    # System
    seed=3447,
    dataloader_num_workers=0,
)

Key Training Parameters

Parameter                   | Default | Description
max_length                  | 8192    | Maximum tokens per example
per_device_train_batch_size | 1       | Batch size per GPU
gradient_accumulation_steps | 4       | Steps before each optimizer update
learning_rate               | 2e-4    | Learning rate
max_steps                   | 2000    | Total training steps
warmup_ratio                | 0.05    | Fraction of total steps for LR warmup

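per_device_train_batch_size and gradient_accumulation_steps together set the effective batch size per optimizer update. With the defaults above on a single GPU:

# Effective batch size = per-device batch size x accumulation steps x number of GPUs
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 1  # single-GPU example

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 4 examples per optimizer update
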
optim

Value       | Memory | Accuracy
adamw_8bit  | Low    | Good
adamw_torch | High   | Best
sgd         | Lowest | Worse

lr_scheduler_type

Value    | Behavior
linear   | Linear decay to 0
cosine   | Cosine decay
constant | No decay

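The SFTConfig above is passed to a trainer together with the model, tokenizer, and dataset. A minimal wiring sketch, assuming trl's SFTTrainer and a dataset exposing a "text" column (the actual TeichAI scripts may differ in detail):

from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,            # PEFT-wrapped model from get_peft_model
    tokenizer=tokenizer,
    train_dataset=dataset,  # must contain the dataset_text_field column
    args=args,              # the SFTConfig shown above
)

# The global `resume` option maps to resume_from_checkpoint
trainer.train(resume_from_checkpoint=resume)
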
Export Configuration

Merged Model Upload

model.push_to_hub_merged(
    f"{hf_account}/{output_model_name}",
    tokenizer,
    save_method="merged_16bit",  # "merged_16bit" or "merged_4bit"
    token=hf_token,
    private=False,               # Set True to upload as a private repository
)

GGUF Export

model.push_to_hub_gguf(
    f"{hf_account}/{output_model_name}-GGUF",
    tokenizer,
    quantization_method=[
        "bf16",    # BFloat16 (full precision)
        "f16",     # Float16 (full precision)
        "q8_0",    # 8-bit quantization
        "q6_k",    # 6-bit k-quant
        "q5_k_m",  # 5-bit k-quant mixed
        "q4_k_m",  # 4-bit k-quant mixed
    ],
    token=hf_token,
    private=False,
)

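If you want the exported artifacts on local disk rather than pushed to the Hub, Unsloth also exposes local-save counterparts. A hedged sketch, assuming save_pretrained_merged and save_pretrained_gguf mirror the push_to_hub methods (directory names here are placeholders):

# Save the merged model to a local directory
model.save_pretrained_merged(
    "merged-model",
    tokenizer,
    save_method="merged_16bit",
)

# Save a single GGUF quantization locally
model.save_pretrained_gguf(
    "gguf-model",
    tokenizer,
    quantization_method="q4_k_m",
)
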
Environment Variables

import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["HF_DATASETS_DISABLE_MULTIPROCESSING"] = "1"

# Debug modes
os.environ["CHECK_DATASET_ONLY"] = "1"  # Validate dataset and exit
os.environ["CHECK_LENGTHS_ONLY"] = "1"  # Check token lengths and exit
os.environ["SANITY_MAXLEN"] = "1"       # Debug max length settings