Inference Engine

The inference engine is built from the following core components.

Components

  • LLMEngine - Main inference engine
  • Scheduler - Request scheduling
  • ModelRunner - Model execution
  • BlockManager - KV cache block management
  • Sequence - Sequence tracking