High-Quality Datasets
We generate reasoning datasets by querying frontier models with high reasoning effort settings, capturing their step-by-step thought processes.
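The capture step described above can be sketched as follows. `query_teacher` is a hypothetical stand-in for the real provider API call (which would set the model's reasoning-effort option); the stubbed response only illustrates the shape of one dataset record.

```python
import json

# Sketch of capturing a reasoning trace as one dataset record.
# `query_teacher` is a hypothetical stand-in for a real API call to a
# frontier model with a high reasoning-effort setting.
def query_teacher(prompt: str) -> dict:
    # Stubbed response; a real implementation would call the provider's API.
    return {
        "reasoning": "Step 1: ... Step 2: ...",
        "answer": "42",
    }

def make_record(prompt: str) -> dict:
    response = query_teacher(prompt)
    # Store the chain of thought alongside the final answer so the
    # student model can be trained to reproduce both.
    return {
        "prompt": prompt,
        "reasoning": response["reasoning"],
        "answer": response["answer"],
    }

record = make_record("What is 6 * 7?")
print(json.dumps(record, indent=2))
```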
TeichAI is focused on one mission: making frontier AI capabilities accessible through open-source model distillation.
We take the reasoning capabilities of proprietary models like GPT-5, Claude 4.5 Opus, Gemini 3 Pro Preview, and DeepSeek v3.2, and transfer that knowledge to smaller, open-source models that you can run locally on your own hardware.
Knowledge distillation is a technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model. In our case:
┌────────────────── DISTILLATION PIPELINE ──────────────────┐
│                                                           │
│   ┌──────────────┐    Query     ┌──────────────┐          │
│   │   Teacher    │ ──────────►  │   Dataset    │          │
│   │  (GPT-5,     │              │  (Reasoning  │          │
│   │   Claude)    │              │   Traces)    │          │
│   └──────────────┘              └──────┬───────┘          │
│                                        │                  │
│                                        ▼                  │
│   ┌──────────────┐     SFT      ┌──────────────┐          │
│   │   Student    │ ──────────►  │  Distilled   │          │
│   │  (Qwen3,     │              │    Model     │          │
│   │   Nemotron)  │              │              │          │
│   └──────────────┘              └──────────────┘          │
│                                                           │
└───────────────────────────────────────────────────────────┘
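The SFT step in the pipeline trains the student to assign high probability to the teacher's tokens. A minimal sketch of that objective, with toy logits standing in for real model outputs (no real model or tokenizer is involved here):

```python
import math

# Minimal sketch of the SFT objective behind distillation: the student
# is trained with cross-entropy to assign high probability to the
# teacher's tokens. The logits below are toy values for illustration.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sft_loss(student_logits, teacher_token_ids):
    # Average negative log-likelihood of the teacher's tokens under
    # the student's predicted next-token distributions.
    total = 0.0
    for logits, target in zip(student_logits, teacher_token_ids):
        probs = softmax(logits)
        total += -math.log(probs[target])
    return total / len(teacher_token_ids)

# Two positions, vocabulary of 4 tokens; the teacher emitted token 2, then 0.
logits = [[0.1, 0.2, 2.0, -1.0], [1.5, 0.0, 0.3, 0.2]]
loss = sft_loss(logits, [2, 0])
print(round(loss, 4))
```

Minimizing this loss over many teacher traces is what transfers the teacher's step-by-step behavior into the student's weights.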
Efficient Training
Using Unsloth and LoRA, we can fine-tune models efficiently on consumer GPUs (as little as 16GB VRAM for 4B models).
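A back-of-the-envelope sketch of why LoRA makes this feasible on consumer GPUs: instead of updating a full weight matrix, training touches only two low-rank factors. The hidden size and rank below are illustrative values, not taken from any specific model config.

```python
# LoRA replaces the update to a full d_out x d_in weight matrix with
# two trainable low-rank factors: B (d_out x r) and A (r x d_in).
# Dimensions and rank here are illustrative.
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    return d_out * r + r * d_in

d_out = d_in = 4096   # a typical transformer hidden size
r = 16                # a common LoRA rank

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
# → full: 16,777,216  lora: 131,072  ratio: 128x
```

A 128x reduction in trainable parameters per matrix (plus keeping the base weights frozen, often in 4-bit) is what brings fine-tuning within reach of a 16GB card.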
Multiple Formats
Every model is released in both HuggingFace Transformers format (16-bit) and GGUF quantizations for local deployment.
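The idea behind the GGUF quantizations can be sketched with a generic symmetric int4 scheme: store weights as small integers plus a per-block scale, trading a little precision for a large reduction in size. This is an illustration of the principle, not the exact GGUF block format.

```python
# Generic symmetric int4 quantization sketch (illustrative, not the
# exact GGUF format): each block of weights becomes small integers
# plus one floating-point scale.
def quantize_block(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1           # 7 for int4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_block(w)
restored = dequantize_block(q, s)
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(q, round(max_err, 4))
```

Four bits per weight instead of sixteen is roughly a 4x smaller file, which is why the GGUF releases are the practical choice for local deployment.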
Full Transparency
All datasets, training scripts, and model weights are publicly available. You can replicate everything we do.