LoRA Routing

This guide shows how to enable intent-aware LoRA (Low-Rank Adaptation) routing in the Semantic Router:

  • Minimal configuration for LoRA routing
  • vLLM server setup with LoRA adapters
  • Example request/response showing automatic LoRA selection
  • Verification steps

Prerequisites

  • A running vLLM server with LoRA support enabled
  • LoRA adapter files (fine-tuned for specific domains)
  • Envoy + the router (see Installation guide)

1. Start vLLM with LoRA Adapters

First, start your vLLM server with LoRA support enabled:

vllm serve meta-llama/Llama-2-7b-hf \
  --enable-lora \
  --lora-modules \
    technical-lora=/path/to/technical-adapter \
    medical-lora=/path/to/medical-adapter \
    legal-lora=/path/to/legal-adapter \
  --host 0.0.0.0 \
  --port 8000

Key flags:

  • --enable-lora: Enables LoRA adapter support
  • --lora-modules: Registers each LoRA adapter by name and path, in the format adapter-name=/path/to/adapter
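
Before wiring up the router, you can confirm the adapters registered correctly: vLLM's OpenAI-compatible /v1/models endpoint lists each LoRA module alongside the base model.

# Each adapter name (e.g. technical-lora) should appear in the model list
curl http://localhost:8000/v1/models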

2. Minimal Configuration

Put this in config/config.yaml (or merge into your existing config):

# Category classifier (required for intent detection)
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

# vLLM endpoint hosting your base model + LoRA adapters
vllm_endpoints:
  - name: "vllm-primary"
    address: "127.0.0.1"
    port: 8000
    weight: 1

# Define the base model and its available LoRA adapters
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    # IMPORTANT: Define all available LoRA adapters here
    loras:
      - name: "technical-lora"
        description: "Optimized for programming and technical questions"
      - name: "medical-lora"
        description: "Specialized for medical and healthcare domain"
      - name: "legal-lora"
        description: "Fine-tuned for legal questions"

# Default model for fallback
default_model: "llama2-7b"

# Categories with LoRA routing
categories:
  - name: "technical"
    description: "Programming, software engineering, and technical questions"
    system_prompt: "You are an expert software engineer."
    model_scores:
      - model: "llama2-7b"            # Base model name
        lora_name: "technical-lora"   # LoRA adapter to use
        score: 1.0
        use_reasoning: true
        reasoning_effort: "medium"

  - name: "medical"
    description: "Medical and healthcare questions"
    system_prompt: "You are a medical expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "medical-lora"     # Different LoRA for medical
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "legal"
    description: "Legal questions and law-related topics"
    system_prompt: "You are a legal expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "legal-lora"       # Different LoRA for legal
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "general"
    description: "General questions"
    system_prompt: "You are a helpful assistant."
    model_scores:
      - model: "llama2-7b"            # No lora_name = uses the base model
        score: 0.8
        use_reasoning: false

3. How It Works

Flow:

  1. User sends a query to the router
  2. Category classifier detects the intent (e.g., "technical")
  3. Router looks up the best ModelScore for that category
  4. If lora_name is specified, it becomes the final model name
  5. Request is sent to vLLM with model="technical-lora" (see the example after this list)
  6. vLLM routes to the appropriate LoRA adapter
  7. Response is returned to the user
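
For example, when a query is classified as "technical", the request the router forwards to vLLM is equivalent to calling the adapter directly, as sketched below (the adapter name comes from the lora_name configured in step 2):

# Roughly what the router sends to vLLM for a "technical" query
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "technical-lora", "messages": [{"role": "user", "content": "Explain async/await in JavaScript"}]}'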

4. Test Domain-Aware LoRA Routing

Send test queries and verify they're classified correctly:

# Technical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MoM", "messages": [{"role": "user", "content": "Explain async/await in JavaScript"}]}'

# Medical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MoM", "messages": [{"role": "user", "content": "What causes high blood pressure?"}]}'

Check the router logs to confirm the correct LoRA adapter is selected for each query.
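
The response body offers a second check: since the router rewrites the model field to the adapter name (step 4 above), the model reported back for a technical query should be the adapter rather than the base model. A minimal sketch, assuming jq is installed:

# Should print "technical-lora" if the adapter was selected
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MoM", "messages": [{"role": "user", "content": "Explain async/await in JavaScript"}]}' \
  | jq -r '.model'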

Benefits

  • Domain Expertise: Each LoRA adapter is fine-tuned for specific domains
  • Cost Efficiency: Share base model weights across adapters (lower memory usage)
  • Easy A/B Testing: Compare adapter versions by adjusting scores (see the sketch after this list)
  • Flexible Deployment: Add/remove adapters without restarting the router
  • Automatic Selection: Users don't need to know which adapter to use
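
As an illustration of the A/B point above, promoting a new adapter version is just a score change in the category config. The technical-lora-v2 name below is hypothetical; you would first register it with --lora-modules and list it under loras:

categories:
  - name: "technical"
    description: "Programming, software engineering, and technical questions"
    model_scores:
      - model: "llama2-7b"
        lora_name: "technical-lora"     # incumbent adapter, currently preferred
        score: 1.0
        use_reasoning: true
      - model: "llama2-7b"
        lora_name: "technical-lora-v2"  # hypothetical candidate; swap the scores to route to it instead
        score: 0.9
        use_reasoning: true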

Next Steps