mini-infer
psmarter/mini-infer — LLM inference engine from scratch: paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graphs, tensor parallelism, OpenAI-compatible serving
182 stars
10 forks
Python
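The first technique in the description, a paged KV cache, manages attention key/value memory the way an OS manages virtual memory: each sequence gets a table of small fixed-size blocks instead of one contiguous buffer, so memory is allocated on demand and reclaimed exactly when a sequence finishes. The sketch below is illustrative only — the class name, block size, and methods are assumptions for exposition, not mini-infer's actual API.

```python
# Minimal sketch of a paged KV cache block allocator (hypothetical API,
# not mini-infer's actual implementation).

class PagedKVCache:
    """Maps each sequence to fixed-size cache blocks, like memory pages."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size                   # tokens per block
        self.free_blocks = list(range(num_blocks))     # pool of physical block ids
        self.block_tables: dict[int, list[int]] = {}   # seq_id -> its block ids

    def append_token(self, seq_id: int, pos: int) -> tuple[int, int]:
        """Return (block_id, offset) where the KV for token `pos` is stored."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):       # current blocks are full
            table.append(self.free_blocks.pop())       # grab a free physical block
        return table[pos // self.block_size], pos % self.block_size

    def free(self, seq_id: int) -> None:
        """Release all blocks of a finished sequence back to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

Because blocks are freed per sequence, a continuous-batching scheduler can admit a new request the moment any running one finishes, without compacting or reallocating the cache of the others.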