Configuration Options
Overview
This reference documents all configuration options used in TeichAI training scripts.
Global Configuration
# Identityhf_account = "your-username" # HuggingFace username or organizationhf_token = "hf_..." # HuggingFace write tokenoutput_model_name = "My-Distill" # Name for output model
# Modelinput_model = "unsloth/Qwen3-4B" # Base model to fine-tunechat_template = "qwen3" # Chat template type
# Datasetdataset_id = "TeichAI/dataset-name" # HuggingFace dataset IDdataset_file = "" # Local JSONL file path (alternative)
# Trainingmax_len = 8192 # Maximum sequence lengthsteps = 2000 # Training stepsresume = False # Resume from checkpoint
# Uploadprivate_upload = False # Upload as private modelModel Configuration
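Because uploads happen only at the end of a run, an invalid hf_token surfaces late. A quick optional sanity check up front, assuming huggingface_hub is available in the environment:

```python
from huggingface_hub import whoami

# Fails fast with a clear error if the token is invalid.
info = whoami(token=hf_token)
print(f"Authenticated to the Hub as: {info['name']}")
```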
Model Configuration
input_model
The base model to fine-tune. Use Unsloth model IDs for optimized training.
| Model Type | Example ID |
|---|---|
| Qwen3 Dense | unsloth/Qwen3-4B, unsloth/Qwen3-8B |
| Qwen3 Thinking | unsloth/Qwen3-4B-Thinking-2507 |
| Qwen3 Instruct | unsloth/Qwen3-4B-Instruct-2507 |
| Qwen3 MoE | unsloth/Qwen3-30B-A3B-Thinking-2507 |
| Nemotron | nvidia/Nemotron-Cascade-8B-Thinking |
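These IDs are passed to Unsloth's loader. A minimal loading sketch, assuming Unsloth's FastLanguageModel API and the max_len variable from the global configuration; load_in_4bit is an illustrative choice, not a setting from this reference:

```python
from unsloth import FastLanguageModel

# Load the base model and tokenizer for fine-tuning.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=input_model,   # e.g. "unsloth/Qwen3-4B"
    max_seq_length=max_len,   # matches the training max length
    load_in_4bit=True,        # illustrative: 4-bit base weights to reduce VRAM
)
```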
chat_template
Chat template for formatting conversations. Must match model type.
| Template | Use Case |
|---|---|
| qwen3 | Base Qwen3 models |
| qwen3-thinking | Thinking variants with <think> tags |
| qwen3-instruct | Instruct variants |
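In Unsloth-based scripts the template is typically attached to the tokenizer before the dataset is formatted. A hedged sketch using Unsloth's get_chat_template helper; the chat_template variable comes from the global configuration, and the example messages are illustrative:

```python
from unsloth.chat_templates import get_chat_template

# Attach the matching template so conversations render with the correct
# special tokens (including <think> blocks for thinking variants).
tokenizer = get_chat_template(tokenizer, chat_template=chat_template)

# Format a two-turn conversation into the text the model trains on.
messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
```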
LoRA Configuration
```python
model = FastLanguageModel.get_peft_model(
    model,
    r=32,                                  # LoRA rank
    target_modules=[...],                  # Layers to adapt
    lora_alpha=32,                         # Scaling factor
    lora_dropout=0,                        # Dropout rate
    bias="none",                           # Bias training
    use_gradient_checkpointing="unsloth",  # Memory optimization
    random_state=3407,                     # Reproducibility seed
    use_rslora=False,                      # Rank-stabilized LoRA
    loftq_config=None,                     # LoftQ configuration
)
```
r (LoRA Rank)
| Value | Memory | Capacity | Recommendation |
|---|---|---|---|
| 8 | Low | Basic | Simple tasks |
| 16 | Low | Medium | Resource-constrained |
| 32 | Medium | High | Default |
| 64 | Higher | Very High | Complex tasks |
| 128 | High | Maximum | Full capability |
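Higher ranks add trainable parameters (and adapter size) roughly in proportion to r; lora_alpha is commonly kept equal to r so the effective scaling lora_alpha / r stays at 1. A quick, hedged way to see the impact of a chosen rank, assuming the PEFT-style helper is available on the model returned by get_peft_model:

```python
# Report trainable adapter parameters vs. total parameters for the chosen rank.
model.print_trainable_parameters()
```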
target_modules
Layers to apply LoRA adapters to:
target_modules = [ "q_proj", # Query projection "k_proj", # Key projection "v_proj", # Value projection "o_proj", # Output projection "gate_proj", # Gate projection (MLP) "up_proj", # Up projection (MLP) "down_proj", # Down projection (MLP)]use_gradient_checkpointing
| Value | Effect |
|---|---|
| False | Fastest, most memory |
| True | Standard checkpointing |
| "unsloth" | Recommended - 30% less VRAM |
SFT Configuration
```python
args = SFTConfig(
    # Data
    dataset_text_field="text",
    max_length=8192,

    # Batch
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,

    # Learning
    warmup_ratio=0.05,
    max_steps=2000,
    learning_rate=2e-4,
    lr_scheduler_type="linear",

    # Optimization
    optim="adamw_8bit",
    weight_decay=0.01,

    # Logging
    logging_steps=1,
    report_to="none",

    # Checkpoints
    output_dir="outputs",
    save_strategy="steps",
    save_steps=200,
    save_total_limit=20,

    # System
    seed=3447,
    dataloader_num_workers=0,
)
```
Key Training Parameters
| Parameter | Default | Description |
|---|---|---|
| max_length | 8192 | Maximum tokens per example |
| per_device_train_batch_size | 1 | Batch size per GPU |
| gradient_accumulation_steps | 4 | Steps before optimizer update |
| learning_rate | 2e-4 | Learning rate |
| max_steps | 2000 | Total training steps |
| warmup_ratio | 0.05 | Fraction of total steps for LR warmup |
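With the defaults above, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 1 × 4 = 4, and LR warmup lasts 0.05 × 2000 = 100 steps. The config object is then handed to the trainer; a minimal sketch assuming trl's SFTTrainer and the model, tokenizer, and dataset objects prepared earlier:

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,     # recent trl versions name this processing_class
    train_dataset=dataset,   # formatted dataset with a "text" column
    args=args,               # the SFTConfig shown above
)
trainer.train(resume_from_checkpoint=resume)  # resume comes from the global configuration
```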
optim
| Value | Memory | Accuracy |
|---|---|---|
| adamw_8bit | Low | Good |
| adamw_torch | High | Best |
| sgd | Lowest | Worse |
lr_scheduler_type
| Value | Behavior |
|---|---|
| linear | Linear decay to 0 |
| cosine | Cosine decay |
| constant | No decay |
Export Configuration
Merged Model Upload
```python
model.push_to_hub_merged(
    f"{hf_account}/{output_model_name}",
    tokenizer,
    save_method="merged_16bit",  # "merged_16bit" or "merged_4bit"
    token=hf_token,
    private=False,               # Private repository
)
```
GGUF Export
```python
model.push_to_hub_gguf(
    f"{hf_account}/{output_model_name}-GGUF",
    tokenizer,
    quantization_method=[
        "bf16",    # BFloat16 (full precision)
        "f16",     # Float16 (full precision)
        "q8_0",    # 8-bit quantization
        "q6_k",    # 6-bit k-quant
        "q5_k_m",  # 5-bit k-quant mixed
        "q4_k_m",  # 4-bit k-quant mixed
    ],
    token=hf_token,
    private=False,
)
```
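If you want the GGUF files locally instead of (or before) pushing to the Hub, Unsloth exposes an analogous local-save call; a hedged sketch, with the output directory name chosen purely for illustration:

```python
# Write GGUF files to a local directory instead of uploading.
model.save_pretrained_gguf(
    "gguf-out",                    # illustrative output directory
    tokenizer,
    quantization_method="q4_k_m",  # 4-bit k-quant mixed
)
```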
os.environ["TOKENIZERS_PARALLELISM"] = "false"os.environ["HF_DATASETS_DISABLE_MULTIPROCESSING"] = "1"
# Debug modesos.environ["CHECK_DATASET_ONLY"] = "1" # Validate dataset and exitos.environ["CHECK_LENGTHS_ONLY"] = "1" # Check token lengths and exitos.environ["SANITY_MAXLEN"] = "1" # Debug max length settings
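The intended workflow is to set one of these flags, run the training script, and let it exit after the check instead of starting a full run. A hypothetical sketch of how a script can honor such a flag; the print statement stands in for whatever validation the TeichAI scripts actually perform:

```python
import os
import sys

# Hypothetical early-exit branch: report on the dataset, then stop before training.
if os.environ.get("CHECK_DATASET_ONLY") == "1":
    print(f"Dataset loaded with {len(dataset)} examples")  # stand-in for the real checks
    sys.exit(0)
```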