To track token usage, enforce quotas, and build accurate dashboards, ONR needs to parse token counts and finish reasons from upstream responses. Different providers place these values in different locations.

Usage Extraction Modes

The usage_extract directive applies native parsing strategies tailored to standard provider schemas:
metrics { usage_extract openai_chat_completions; }
metrics { usage_extract anthropic_messages; }
metrics { usage_extract gemini_generate_content; }
usage_extract accepts either a global usage_mode preset or custom. The default repository config focuses on the more specific presets from config/modes/usage_modes.conf, such as openai_chat_completions, openai_prompt_completion, openai_responses, openai_responses_stream, anthropic_messages, anthropic_messages_stream, gemini_generate_content, and gemini_generate_content_stream.
Generic names such as openai, anthropic, and gemini are no longer special builtin usage_extract modes. If you want them, define them explicitly as global usage_mode presets.
Inside metrics, declaring usage_fact, *_tokens_path, or *_tokens_expr without usage_extract is equivalent to usage_extract custom;.
Migration notes for the old generic provider modes:
  • gemini: the current default preset behavior can be fully replaced by custom; the input token fact should read the total input token count, and multimodal details can be exposed as input.image, input.audio, and input.video token facts.
  • anthropic: custom can cover the same core token/cache extraction, including streaming, but the config is more verbose because stream events may place usage under either message.usage or top-level usage.
  • openai: custom covers the core token/cache extraction, but image/audio/tool supplemental facts still require extra explicit usage_fact rules.
For Gemini output tokens, both of the following styles are valid in custom mode:
# Summed implicitly because both rules share the same dimension + unit
metrics {
  usage_extract custom;
  usage_fact output token path="$.usageMetadata.candidatesTokenCount";
  usage_fact output token path="$.usageMetadata.thoughtsTokenCount";
}

# Equivalent explicit arithmetic form
metrics {
  usage_extract custom;
  output_tokens_expr = $.usageMetadata.candidatesTokenCount + $.usageMetadata.thoughtsTokenCount;
}
Anthropic streaming custom sketch:
metrics {
  usage_fact input token path="$.message.usage.input_tokens" event="message_start";
  usage_fact input token path="$.message.usage.cache_read_input_tokens" event="message_start";
  usage_fact input token path="$.message.usage.cache_creation_input_tokens" event="message_start";
  usage_fact input token path="$.usage.cache_read_input_tokens" event="message_delta";
  usage_fact input token path="$.usage.cache_creation_input_tokens" event="message_delta";
  usage_fact output token path="$.usage.output_tokens" event="message_delta";
  usage_fact cache_read token path="$.message.usage.cache_read_input_tokens" event="message_start";
  usage_fact cache_read token path="$.usage.cache_read_input_tokens" event="message_delta";
  usage_fact cache_write token path="$.message.usage.cache_creation.ephemeral_5m_input_tokens" attr.ttl="5m" event="message_start";
  usage_fact cache_write token path="$.usage.cache_creation.ephemeral_5m_input_tokens" attr.ttl="5m" event="message_delta";
  usage_fact cache_write token path="$.message.usage.cache_creation.ephemeral_1h_input_tokens" attr.ttl="1h" event="message_start";
  usage_fact cache_write token path="$.usage.cache_creation.ephemeral_1h_input_tokens" attr.ttl="1h" event="message_delta";
  usage_fact cache_write token path="$.message.usage.cache_creation_input_tokens" fallback=true event="message_start";
  usage_fact cache_write token path="$.usage.cache_creation_input_tokens" fallback=true event="message_delta";
}
OpenAI supplemental facts custom sketches:
# responses: completed web search calls
metrics {
  usage_extract custom;
  usage_fact input token path="$.usage.input_tokens";
  usage_fact output token path="$.usage.output_tokens";
  usage_fact server_tool.web_search call count_path="$.output[*]" type="web_search_call" status="completed";
}

# images.generations
metrics {
  usage_extract custom;
  usage_fact input token path="$.usage.input_tokens";
  usage_fact output token path="$.usage.output_tokens";
  usage_fact image.generate image count_path="$.data[*]";
}

# audio.speech
metrics {
  usage_extract custom;
  usage_fact audio.tts second source=derived path="$.audio_duration_seconds";
}

Custom Token Extraction

If a provider reports token counts at a non-standard JSON path, you can map each count to that path explicitly.
metrics {
  usage_extract custom;
  
  input_tokens_path "$.usage.input_tokens";
  output_tokens_path "$.usage.output_tokens";
  
  # For sophisticated caching insights
  cache_read_tokens_path "$.usage.cache_read_input_tokens";
  cache_write_tokens_path "$.usage.cache_creation_input_tokens";
}
You can use the $.items[*].x JSONPath syntax to sum all numeric occurrences in an array.
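The wildcard summation described above can be modeled in a few lines of Python. This is only an illustrative sketch, not ONR's implementation: `sum_wildcard` is a hypothetical helper, and skipping non-numeric or missing values is an assumption, since the page does not say how such matches are treated.

```python
from numbers import Number


def sum_wildcard(doc: dict, array_key: str, field: str) -> float:
    """Sum numeric `field` values across doc[array_key], like $.<array_key>[*].<field>."""
    total = 0
    for item in doc.get(array_key, []):
        value = item.get(field) if isinstance(item, dict) else None
        # Assumption: non-numeric and missing values are skipped.
        # bool is excluded explicitly because it is a subclass of int in Python.
        if isinstance(value, Number) and not isinstance(value, bool):
            total += value
    return total


response = {"items": [{"x": 3}, {"x": 4.5}, {"x": "n/a"}, {"y": 10}]}
print(sum_wildcard(response, "items", "x"))  # 7.5
```

Only the first two elements contribute here; the string value and the element without an x field are ignored under the stated assumption.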

Usage Facts

For the new custom-first flow, use usage_fact to describe each measurable item explicitly. Use usage_root first when the upstream response has a nested usage object and most facts should read from that object.
metrics {
  usage_root path="$.usage";

  usage_fact input token path="$.input_tokens";
  usage_fact output token path="$.output_tokens";
  usage_fact cache_read token path="$.cache_read_input_tokens";

  usage_fact cache_write token path="$.cache_creation.ephemeral_5m_input_tokens" attr.ttl="5m";
  usage_fact cache_write token path="$.cache_creation.ephemeral_1h_input_tokens" attr.ttl="1h";
  usage_fact cache_write token path="$.cache_creation_input_tokens" fallback=true;
}
  • usage_root path="..." extracts a usage JSON object before facts run.
  • Multiple usage_root rules merge into one usage object; later non-zero fields can fill or replace earlier zero fields.
  • In stream extraction, ONR merges usage roots from chunks first, then runs default-source / source=usage facts once at stream end. Explicit source=response, request, and derived facts still run during chunk processing.
  • event="a|b" can restrict usage_root or usage_fact to specific SSE event names.
  • path, count_path, sum_path, and expr are supported.
  • String values may use either double quotes or single quotes.
  • count_path can be combined with type and status filters.
  • event_optional=true may be combined with event="..." when an upstream stream sometimes omits SSE event: framing.
  • attr.ttl distinguishes Anthropic cache write tiers.
  • Multiple usage_fact rules may share the same dimension + unit; all matched non-fallback rules are summed.
  • fallback=true uses a total field only when the more specific facts do not exist.
  • source defaults to the merged usage_root when configured; otherwise it defaults to response.
  • source currently supports usage, response, request, and derived.
  • dimension is a flat registry key; . is part of the name and does not imply nesting.
  • For filter JSONPath, single-quoted DSL strings avoid escaping inner double quotes, for example:
    • path='$.usageMetadata.promptTokensDetails[?(@.modality=="AUDIO")].tokenCount'

Supported dimensions

  • input
  • output
  • input.image
  • input.video
  • input.audio
  • output.image
  • output.video
  • output.audio
  • cache_read
  • cache_write
  • server_tool.web_search
  • server_tool.file_search
  • image.generate
  • image.edit
  • image.variation
  • audio.tts
  • audio.stt
  • audio.translate

Supported dimension + unit pairs

The current registry intentionally accepts a limited set of dimension + unit pairs:
  • Token and cache:
    • input token
    • output token
    • input.image token
    • input.video token
    • input.audio token
    • output.image token
    • output.video token
    • output.audio token
    • cache_read token
    • cache_write token
  • Tool usage:
    • server_tool.web_search call
    • server_tool.file_search call
  • Image and audio:
    • image.generate image
    • image.edit image
    • image.variation image
    • audio.tts second
    • audio.stt second
    • audio.translate second

Source examples

OpenAI Responses tool usage:
metrics {
  usage_extract openai_responses;

  usage_fact server_tool.web_search call count_path="$.output[*]" type="web_search_call" status="completed";
}
Image generation with request fallback:
metrics {
  usage_extract openai_images_generations;

  usage_fact image.generate image count_path="$.data[*]";
  usage_fact image.generate image source=request expr="$.n" fallback=true;
}
Speech synthesis with runtime-derived usage:
metrics {
  usage_extract openai_audio_speech;

  usage_fact audio.tts second source=derived path="$.audio.tts.seconds";
}
Gemini native multimodal input tokens:
metrics {
  usage_extract custom;

  usage_fact input token path="$.usageMetadata.promptTokenCount";
  usage_fact input.image token path='$.usageMetadata.promptTokensDetails[?(@.modality=="IMAGE")].tokenCount';
  usage_fact input.video token path='$.usageMetadata.promptTokensDetails[?(@.modality=="VIDEO")].tokenCount';
  usage_fact input.audio token path='$.usageMetadata.promptTokensDetails[?(@.modality=="AUDIO")].tokenCount';

  usage_fact output token path="$.usageMetadata.candidatesTokenCount";
  usage_fact output token path="$.usageMetadata.thoughtsTokenCount";
}

OpenAI-specific presets and supplemental facts

The repository’s OpenAI-specific presets commonly model canonical facts such as:
  • images.generations -> image.generate image
  • images.edits -> image.edit image
  • audio.transcriptions -> audio.stt second
  • audio.translations -> audio.translate second
  • audio.speech -> audio.tts second when derived runtime usage is available
  • responses -> server_tool.web_search call
This lets provider authors choose between reusable named presets and a fully explicit custom-first configuration.

Formula & Arithmetic Configuration

Alternatively, you can compute token counts from expressions instead of reading each count from a single fixed path.
metrics {
  input_tokens_expr = "$.usage.in";
  output_tokens_expr = "$.usage.out";
}
total_tokens is derived from input + output automatically. In most cases, avoid setting total_tokens_expr explicitly, because it introduces a second total fact source that can drift from the totals derived from input and output.
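To see why a separately sourced total can drift, consider this small Python sketch. It is illustrative only: the `reported_total` field is hypothetical and stands in for whatever an explicit total_tokens_expr might read from an upstream response.

```python
def derived_total(input_tokens: int, output_tokens: int) -> int:
    """Total derived the way the text describes: input + output."""
    return input_tokens + output_tokens


# Hypothetical upstream usage object whose own total includes tokens
# that are not counted in either the input or the output field.
usage = {"in": 120, "out": 30, "reported_total": 155}

derived = derived_total(usage["in"], usage["out"])
drift = usage["reported_total"] - derived

print(derived)  # 150
print(drift)    # 5
```

With a single derived total, dashboards and quota checks always agree with the input and output facts; a second source can disagree, as the 5-token gap shows.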

Finish Reason Extraction

As with usage extraction, you can tell ONR where to find the finish reason string (e.g. stop, length, tool_calls) that indicates why generation ended.
metrics { finish_reason_extract openai_chat_completions; }
metrics { finish_reason_extract anthropic_messages; }

# Custom location override:
metrics { finish_reason_path "$.choices[0].finish_reason"; }
Inside metrics, declaring finish_reason_path without finish_reason_extract is equivalent to finish_reason_extract custom;.
Like usage extraction, finish reason extraction also supports reusable top-level presets:
finish_reason_mode "anthropic_messages_stream" {
  finish_reason_path "$.delta.stop_reason";
  finish_reason_path "$.message.stop_reason" fallback=true;
}
The default repository config keeps these presets in config/modes/finish_reason_modes.conf, which is included by config/onr.conf. It focuses on path-specific presets such as openai_chat_completions, openai_completions, openai_responses, anthropic_messages, anthropic_messages_stream, gemini_generate_content, and gemini_generate_content_stream.
Generic names such as openai, anthropic, and gemini are no longer special builtin finish_reason_extract modes. If you want them, define them explicitly as global finish_reason_mode presets.
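The primary-then-fallback lookup in the anthropic_messages_stream preset shown earlier can be approximated in Python as follows. This is a rough sketch under stated assumptions: `first_finish_reason` is a hypothetical helper, paths are plain dotted keys rather than real JSONPath, and the fallback path is assumed to be consulted only when the primary path yields no non-empty string.

```python
def first_finish_reason(event: dict, paths):
    """Return the first non-empty string found by walking each dotted path in order."""
    for path in paths:
        node = event
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if isinstance(node, str) and node:
            return node
    return None


# Primary path first, then the path marked fallback=true in the preset.
PATHS = ["delta.stop_reason", "message.stop_reason"]

delta_event = {"type": "message_delta", "delta": {"stop_reason": "end_turn"}}
start_event = {"type": "message_start", "message": {"stop_reason": None}}
message_only = {"message": {"stop_reason": "max_tokens"}}

print(first_finish_reason(delta_event, PATHS))   # end_turn
print(first_finish_reason(start_event, PATHS))   # None
print(first_finish_reason(message_only, PATHS))  # max_tokens
```

An event that carries a null stop reason produces no finish reason, so a later event in the same stream can still supply it.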