Retrieval has two stages. First, each selected layer returns candidates. Then summary.ts reduces those candidates into the compact prompt block used by the agent turn.

Retrieval pipeline

pre_turn requires a query. If the caller omits limits, the lower-level loop defaults to limit: 5, maxItems: 8, and maxItemChars: 500. The native agent turn passes a larger limit but tighter packing values: limit: 12, maxItems: 6, maxItemChars: 120, and budget.maxCharacters: 900.
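The defaulting described above can be sketched as a plain options merge. This is a minimal illustration, not the runtime's code: the names `RetrievalLimits`, `LOOP_DEFAULTS`, `AGENT_TURN_LIMITS`, and `resolveLimits` are hypothetical, and only the numeric values come from the text.

```typescript
// Hypothetical shape: the field names follow the text, the type name is illustrative.
interface RetrievalLimits {
  limit: number;
  maxItems: number;
  maxItemChars: number;
  budget?: { maxCharacters: number };
}

// Defaults the lower-level loop falls back to when the caller omits limits.
const LOOP_DEFAULTS: RetrievalLimits = { limit: 5, maxItems: 8, maxItemChars: 500 };

// Values the native agent turn passes, per the text.
const AGENT_TURN_LIMITS: RetrievalLimits = {
  limit: 12,
  maxItems: 6,
  maxItemChars: 120,
  budget: { maxCharacters: 900 },
};

// Caller-supplied values win; anything omitted falls back to the loop defaults.
function resolveLimits(overrides: Partial<RetrievalLimits> = {}): RetrievalLimits {
  return { ...LOOP_DEFAULTS, ...overrides };
}
```

Calling `resolveLimits()` with no argument yields the loop defaults, while `resolveLimits(AGENT_TURN_LIMITS)` yields the agent-turn values including the character budget.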

Embeddings

embedding.ts uses MongoDB’s AI gateway to reach Voyage embeddings.

| Setting | Value |
| --- | --- |
| Default model | voyage-4-large |
| Default dimensions | 1024 |
| Default timeout | 8000 ms |
| Document input type | document |
| Query input type | query |
| Cache TTL | 86400 seconds |

Query and document embeddings use separate Redis keys. The same text can produce different vectors because Voyage’s asymmetric retrieval models treat query and document inputs differently.

The embedder has no silent provider fallback. If embeddings are unavailable, the operation fails instead of mixing vector spaces.

hybridVectorSearch() builds a durable Mongo filter, runs Atlas $vectorSearch, applies a score threshold, and bumps access counters for returned rows. If Atlas vector search fails, or if the vector query returns no usable results, the runtime falls back to keyword search over recent scoped documents. That fallback keeps recall available during vector-index outages, but it is not treated as equivalent to semantic retrieval.

Optional Atlas prefilter fields come from MEMORY_VECTOR_PREFILTER_FIELDS. The runtime still applies memory filters after search, so prefilter configuration is an optimization, not the trust boundary.
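The search-then-fallback flow above can be sketched with injected search functions. This is a simplified, synchronous illustration under stated assumptions: `searchWithFallback`, `MemoryHit`, the `agentId` scope field, and the `SCORE_THRESHOLD` value are all hypothetical, and the real calls to Atlas would be asynchronous.

```typescript
interface MemoryHit {
  id: string;
  text: string;
  score: number;
  agentId: string; // hypothetical scope field used by the post-search filter
}

interface SearchOutcome {
  mode: "vector" | "keyword-fallback";
  hits: MemoryHit[];
}

// Hypothetical threshold; the source does not state the actual value.
const SCORE_THRESHOLD = 0.5;

function searchWithFallback(
  vectorSearch: () => MemoryHit[],      // stand-in for the Atlas $vectorSearch call; may throw
  keywordSearch: () => MemoryHit[],     // stand-in for keyword search over recent scoped documents
  allowed: (hit: MemoryHit) => boolean, // memory filter, applied after either search
): SearchOutcome {
  let usable: MemoryHit[] = [];
  try {
    // Assumption: "usable" means the hit clears the score threshold.
    usable = vectorSearch().filter((hit) => hit.score >= SCORE_THRESHOLD);
  } catch {
    usable = []; // a vector-index failure is handled like an empty result
  }
  if (usable.length > 0) {
    return { mode: "vector", hits: usable.filter(allowed) };
  }
  // The mode is reported so callers never mistake fallback hits for semantic hits.
  return { mode: "keyword-fallback", hits: keywordSearch().filter(allowed) };
}
```

Because `allowed` runs on both paths, a prefilter inside the vector query only narrows the work the index does; the filter after search is what enforces scope.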

Ranking

applyVectorRanking() applies ranking in this order:

| Stage | Default | Behavior |
| --- | --- | --- |
| Temporal decay | On | Multiplies scores by a half-life decay curve. Default half-life is 30 days. |
| Rerank | On | Uses Cloudflare Workers AI with @cf/baai/bge-reranker-base by default. |
| Rerank skip | On | Skips rerank when candidate count is below 3 or top-1 beats top-2 by at least 0.05. |
| MMR | Off unless requested | Reduces near-duplicate results with lambda 0.7 by default. |

Rerank is fail-soft. Missing Cloudflare credentials, timeouts, or upstream failures log a warning and keep the decayed order.
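The decay, rerank-skip, and MMR stages can be illustrated with three small functions. This is a sketch under assumptions, not the body of applyVectorRanking(): the exponential form of the decay curve, the use of cosine similarity over embedding vectors for MMR, and every function and type name below are hypothetical. Only the constants (30 days, 3 candidates, 0.05 margin, lambda 0.7) come from the table.

```typescript
const HALF_LIFE_DAYS = 30;
const MIN_RERANK_CANDIDATES = 3;
const CLEAR_WINNER_MARGIN = 0.05;
const MMR_LAMBDA = 0.7;

// Assumed curve: the score halves every `halfLifeDays`.
function applyTemporalDecay(score: number, ageDays: number, halfLifeDays = HALF_LIFE_DAYS): number {
  return score * Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Skip rerank for tiny candidate sets or when the top result is a clear winner.
function shouldSkipRerank(scores: number[]): boolean {
  if (scores.length < MIN_RERANK_CANDIDATES) return true;
  const sorted = scores.slice().sort((a, b) => b - a);
  const top1 = sorted[0] ?? 0;
  const top2 = sorted[1] ?? 0;
  return top1 - top2 >= CLEAR_WINNER_MARGIN;
}

interface RankedItem {
  id: string;
  score: number;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Maximal marginal relevance: repeatedly pick the item with the best trade-off
// between its own score and its similarity to items already chosen.
function selectWithMmr(items: RankedItem[], k: number, lambda = MMR_LAMBDA): RankedItem[] {
  const pool = items.slice();
  const chosen: RankedItem[] = [];
  while (chosen.length < k && pool.length > 0) {
    let bestIndex = 0;
    let bestValue = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      const cand = pool[i];
      if (!cand) continue;
      let redundancy = 0;
      for (const picked of chosen) {
        redundancy = Math.max(redundancy, cosine(cand.vector, picked.vector));
      }
      const value = lambda * cand.score - (1 - lambda) * redundancy;
      if (value > bestValue) {
        bestValue = value;
        bestIndex = i;
      }
    }
    const next = pool.splice(bestIndex, 1)[0];
    if (next) chosen.push(next);
  }
  return chosen;
}
```

With lambda at 0.7, relevance still dominates, but a candidate whose vector is identical to one already chosen loses 0.3 from its value, which is usually enough to let a distinct result through.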

Prompt packing

Layer search returns records. The prompt packer turns them into text.

| Signal | Weight in packing |
| --- | --- |
| Layer priority | Working and patterns get the highest priority multipliers; archives get the lowest. |
| Semantic score | Uses vector or layer score when present. |
| Query overlap | Rewards candidates whose text contains query terms. |
| Confidence | Uses fact confidence or metadata confidence. |
| Importance | Reads metadata.importance when present. |
| Recency | Uses a 30 day recency score. |
| Access count | Adds a capped logarithmic boost. |
| Character cost | Ranks by score per character before packing. |

Layer priorities are:

| Layer | Priority |
| --- | --- |
| working | 1.15 |
| patterns | 1.12 |
| graph | 1.08 |
| vectors | 1.02 |
| scene | 0.96 |
| archives | 0.88 |

The packer dedupes candidates by a normalized fingerprint. It has a specific user-turn fingerprint for transcript-like rows so repeated session summaries do not crowd out durable facts.
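The dedupe, score, and pack steps can be sketched as follows. This is a simplified illustration, not the logic in summary.ts: it uses only three of the signals (layer priority, semantic score, access count), and the way they are combined, the boost weight and cap, the fingerprint normalization, and all names other than the layer names and priority values are assumptions.

```typescript
type Layer = "working" | "patterns" | "graph" | "vectors" | "scene" | "archives";

// Priority multipliers from the table above.
const LAYER_PRIORITY: Record<Layer, number> = {
  working: 1.15, patterns: 1.12, graph: 1.08, vectors: 1.02, scene: 0.96, archives: 0.88,
};

interface Candidate {
  layer: Layer;
  text: string;
  semanticScore: number;
  accessCount: number;
}

// Assumed normalization: lowercase, drop a leading speaker label on transcript-like
// rows, strip ASCII punctuation, collapse whitespace.
function fingerprint(text: string): string {
  return text
    .toLowerCase()
    .replace(/^\s*(user|assistant)\s*:\s*/, "")
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Capped logarithmic boost; the 0.02 weight and 0.1 cap are illustrative values.
function accessBoost(accessCount: number): number {
  return Math.min(0.1, 0.02 * Math.log(1 + Math.max(0, accessCount)));
}

// Assumed combination: priority scales the base score, the boost is added on top.
function packingScore(c: Candidate): number {
  return LAYER_PRIORITY[c.layer] * c.semanticScore + accessBoost(c.accessCount);
}

function packCandidates(cands: Candidate[], maxItems: number, maxCharacters: number): Candidate[] {
  // 1. Dedupe: keep the best-scoring candidate per fingerprint.
  const best = new Map<string, Candidate>();
  for (const c of cands) {
    if (c.text.length === 0) continue;
    const key = fingerprint(c.text);
    const seen = best.get(key);
    if (!seen || packingScore(c) > packingScore(seen)) best.set(key, c);
  }
  // 2. Rank by score per character so short, strong items go first.
  const ranked = Array.from(best.values()).sort(
    (a, b) => packingScore(b) / b.text.length - packingScore(a) / a.text.length,
  );
  // 3. Greedily fill the item and character budgets.
  const packed: Candidate[] = [];
  let used = 0;
  for (const c of ranked) {
    if (packed.length >= maxItems) break;
    if (used + c.text.length > maxCharacters) continue; // too long, try a shorter one
    packed.push(c);
    used += c.text.length;
  }
  return packed;
}
```

Ranking by score per character means a long archive row needs a much higher score than a short working-memory row to earn the same place in the budget.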

Output shape

The final prompt always starts with the same header:
    Memory context:

    [GRAPH] The user prefers short TypeScript examples.

    [WORKING] user: Continue the deployment checklist.

pre_turn also returns structured accounting:

| Field | Meaning |
| --- | --- |
| contextUsage.characters | Characters in the returned prompt. |
| contextUsage.rawCharacters | Approximate characters in the raw layer payload. |
| contextUsage.savedCharactersVsRaw | Raw payload size minus compact prompt size. |
| contextUsage.items | Number of selected compact items. |
| omitted | Per-layer raw hits not selected for the prompt. |
| totals | Per-layer hit counts before packing. |
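The prompt shape and the contextUsage fields can be illustrated together. This is a sketch under assumptions: `formatMemoryPrompt`, `measureUsage`, and `SelectedItem` are hypothetical names, the blank-line separator is inferred from the sample output, and measuring the raw payload as serialized JSON is just one plausible reading of "approximate characters".

```typescript
interface SelectedItem {
  layer: string; // for example "graph" or "working"
  text: string;
}

interface ContextUsage {
  characters: number;
  rawCharacters: number;
  savedCharactersVsRaw: number;
  items: number;
}

const PROMPT_HEADER = "Memory context:";

// Each item is tagged with its layer in upper case, as in the sample output.
function formatMemoryPrompt(selected: SelectedItem[]): string {
  const lines = selected.map((item) => `[${item.layer.toUpperCase()}] ${item.text}`);
  return [PROMPT_HEADER, ...lines].join("\n\n");
}

function measureUsage(promptText: string, rawPayload: object, selected: SelectedItem[]): ContextUsage {
  // Assumption: raw size is approximated by the JSON-serialized payload length.
  const rawCharacters = JSON.stringify(rawPayload).length;
  return {
    characters: promptText.length,
    rawCharacters,
    savedCharactersVsRaw: rawCharacters - promptText.length,
    items: selected.length,
  };
}
```

A caller can compare `characters` against its budget and use `savedCharactersVsRaw` to see how much the packer trimmed relative to pasting raw layer output.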

Comparison notes

A common memory path is “retrieve top-K vectors, paste snippets into the prompt.” Manowar does a little more before the model sees memory:

| Concern | Common approach | Manowar approach |
| --- | --- | --- |
| Cross-thread durable memory | Store all turns in one retriever or manually choose a namespace. | Durable layers ignore threadId; hot layers keep threadId. |
| Prompt growth | Return top-K snippets and let the caller format them. | Score, dedupe, and pack by character budget before the model sees memory. |
| Rerank failure | Upstream reranker can fail the retrieval path. | Rerank is optional and fail-soft. |
| Fresh writes | Cache can serve stale results until TTL. | Namespace tokens invalidate scoped query keys after writes. |
| Fact extraction | Inline LLM extraction can slow the turn response. | Graph extraction is queued after post_turn; explicit saves index directly. |
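The namespace-token idea in the "Fresh writes" row can be sketched with an in-memory stand-in for the cache. This is an illustration of the general technique only: the class name `ScopedQueryCache`, its methods, and the key format are hypothetical and do not describe the runtime's actual Redis keys.

```typescript
// In-memory stand-in for a Redis-backed cache.
class ScopedQueryCache {
  private tokens = new Map<string, number>();
  private entries = new Map<string, string>();

  private token(scope: string): number {
    return this.tokens.get(scope) ?? 0;
  }

  // The current namespace token is embedded in every query key (assumed format).
  private key(scope: string, query: string): string {
    return `${scope}:v${this.token(scope)}:${query}`;
  }

  get(scope: string, query: string): string | undefined {
    return this.entries.get(this.key(scope, query));
  }

  set(scope: string, query: string, result: string): void {
    this.entries.set(this.key(scope, query), result);
  }

  // Called after a write. Bumping the token makes every older key in this scope
  // unreachable; in a real cache the orphaned entries would expire by TTL.
  invalidateScope(scope: string): void {
    this.tokens.set(scope, this.token(scope) + 1);
  }
}
```

The design choice is that invalidation costs one counter update per write rather than a scan for matching keys, and other scopes are unaffected.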