The use of timestamps at the beginning of requests significantly reduces the prefix cache hit rate in large language models.

AI Technology · Mar 30, 2026 · score 0.17 · 2 posts · 0 replies across 1 instance

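The thread links to an article covering 7 common anti-patterns that lower the prefix cache hit rate in large language models (LLMs): among them a timestamp at the start of the request, a floating tool order, requests landing on different replicas, unstable RAG chunk order, and a KV cache lifetime that is too short. The post notes that these, rather than the model or the provider, are often why cached_tokens stays flat in production even with caching enabled.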

Claims

Parent: AI · Entity: Large Language Models (LLMs) · Sub-entity: Prefix Cache Optimization · Impact: negative · Date: Mar 30, 2026 · Target: The use of timestamps at the beginning of requests
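A minimal sketch of why a leading timestamp hurts, assuming the cache matches requests by their longest shared prefix. The prompt text, the function names, and the use of characters as a stand-in for tokens are all illustrative assumptions, not taken from the linked article.

```python
from datetime import datetime, timezone

# Hypothetical static instructions; in practice this is the long, reusable part.
SYSTEM_PROMPT = "You are a support assistant. Answer concisely."


def timestamp_first(user_text: str, now: datetime) -> str:
    """Anti-pattern: the volatile value comes before the stable text."""
    return f"Current time: {now.isoformat()}\n{SYSTEM_PROMPT}\n{user_text}"


def timestamp_last(user_text: str, now: datetime) -> str:
    """Stable text first, volatile value at the very end."""
    return f"{SYSTEM_PROMPT}\n{user_text}\nCurrent time: {now.isoformat()}"


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix, in characters (a rough proxy for tokens)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    t1 = datetime(2026, 3, 30, 10, 0, 0, tzinfo=timezone.utc)
    t2 = datetime(2026, 3, 30, 10, 0, 5, tzinfo=timezone.utc)
    print("timestamp first:", shared_prefix_len(timestamp_first("hi", t1), timestamp_first("hi", t2)))
    print("timestamp last: ", shared_prefix_len(timestamp_last("hi", t1), timestamp_last("hi", t2)))
```

With the timestamp first, two requests sent seconds apart diverge inside the timestamp itself, so none of the system prompt can be reused. With the timestamp last, the whole system prompt and user text stay in the shared prefix.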

Floating tool lists in requests lead to a decrease in the prefix cache hit rate in large language models.

Parent: AI · Entity: Large Language Models (LLMs) · Sub-entity: Prefix Cache Optimization · Impact: negative · Date: Mar 30, 2026 · Target: Floating tool lists in requests
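One possible mitigation for a floating tool list is to canonicalize it before every request, so the same set of tools always serializes to the same bytes. The sketch below assumes each tool definition is a dict with a "name" key; that shape and the function name `canonical_tools` are hypothetical.

```python
import json


def canonical_tools(tools: list[dict]) -> str:
    """Serialize tool definitions deterministically.

    Tools are sorted by name and keys inside each definition are sorted too,
    so the order in which the caller assembled them no longer matters.
    """
    ordered = sorted(tools, key=lambda tool: tool["name"])
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    search = {"name": "search", "description": "Search the knowledge base"}
    weather = {"description": "Get the weather", "name": "weather"}
    # Both orderings produce the same string, hence the same prompt prefix.
    print(canonical_tools([search, weather]) == canonical_tools([weather, search]))
```

This only stabilizes ordering. Adding or removing a tool still changes the serialized list and therefore the prefix from that point on.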

Unstable chunk order in Retrieval-Augmented Generation (RAG) systems negatively affects the prefix cache hit rate in large language models.

Parent: AI · Entity: Large Language Models (LLMs) · Sub-entity: Prefix Cache Optimization · Impact: negative · Date: Mar 30, 2026 · Target: Unstable chunk order in RAG systems
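For unstable chunk order, one option is to order the retrieved chunks by a stable key instead of by similarity score, which can shift slightly between otherwise identical queries. The sketch below is an assumption about how that might look; the `Chunk` fields and `stable_context` are hypothetical names, and the approach deliberately gives up relevance ordering in exchange for a repeatable prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str   # hypothetical stable document identifier
    index: int    # position of the chunk inside its document
    text: str
    score: float  # retrieval score; may fluctuate between calls


def stable_context(chunks: list[Chunk]) -> str:
    """Join chunks in (doc_id, index) order, ignoring score-based order."""
    ordered = sorted(chunks, key=lambda c: (c.doc_id, c.index))
    return "\n\n".join(f"[{c.doc_id}#{c.index}] {c.text}" for c in ordered)


if __name__ == "__main__":
    first_call = [Chunk("faq", 2, "Refunds take 5 days.", 0.91), Chunk("faq", 0, "Contact support.", 0.90)]
    second_call = [Chunk("faq", 0, "Contact support.", 0.92), Chunk("faq", 2, "Refunds take 5 days.", 0.89)]
    # Same chunks, different score ranking, identical context string.
    print(stable_context(first_call) == stable_context(second_call))
```

If the set of retrieved chunks itself changes, the context still diverges at the first differing chunk, so this helps only when the same chunks come back in a different order.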

Source posts

@[email protected]

One timestamp, one round-robin, one floating tools list: 7 anti-patterns that kill the LLM prefix cache

Caching is enabled, but cached_tokens still isn't growing? Often the problem is neither the model nor the provider. The hit rate is usually cut by entirely different things: a timestamp at the start of the request, a floating order of tools, different replicas, RAG with an unstable chunk order, and a KV cache lifetime that is too short. In the article I go through 7 typical anti-patterns that kill prefix_cache_hit in production.

habr.com/ru/companies/bitrix/articles/1016732/

#prefix_cache #искусственный_интеллект #vllm #openai #anthropic #maas #selfhosted #promptengineering #contextengineering #agents

0 boosts · 0 favs · 0 replies · Mar 30, 2026


@[email protected]

One timestamp, one round-robin, one floating tools list: 7 anti-patterns that kill the LLM prefix cache

Caching is enabled, but cached_tokens still isn't growing? Often the problem is neither the model nor the provider. The hit rate is usually cut by entirely different things: a timestamp at the start of the request, a floating order of tools, different replicas, RAG with an unstable chunk order, and a KV cache lifetime that is too short. In the article I go through 7 typical anti-patterns that kill prefix_cache_hit in production.

habr.com/ru/companies/bitrix/articles/1016734/

#prefix_cache #искусственный_интеллект #vllm #openai #anthropic #maas #selfhosted #promptengineering #contextengineering #agents

0 boosts · 0 favs · 0 replies · Mar 30, 2026
