Stage Support Matrix

Fluxion exposes the full MongoDB-style aggregation vocabulary, but the runtime you choose determines how practical a stage is. The Streaming Engine must emit results incrementally, whereas a future batch job engine can materialise whole result sets before emitting. This guide calls out how each stage behaves today.

Legend
  • ✅ – Well suited to streaming pipelines.
  • ⚠️ – Supported with caveats (state, ordering, performance).
  • ⏳ – More appropriate for batch-style execution; avoid in streaming unless the pipeline structure is tightly controlled.

Stateless/document-scoped stages

| Stage | Streaming | Batch | Notes |
| --- | --- | --- | --- |
| $match | ✅ | ✅ | Filters each document; ideal for ingress throttling. |
| $project / $set / $addFields | ✅ | ✅ | Reshape or add computed fields without global state. |
| $unset / $replaceRoot / $replaceWith | ✅ | ✅ | Useful for tidying payloads before sinks receive them. |
| $limit / $skip | ⚠️ | ✅ | Limit/skip only make sense in streaming when used after $unwind to constrain per-document arrays; for global limits, prefer batch mode. |
| $unwind | ✅ | ✅ | Enables array fan-out and unlocks later stages (e.g., $group), because repeated keys then represent atomic events. |
| $sampleRate | ✅ | ✅ | Streaming-safe sampling for observability or rate limiting. |
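
As a minimal sketch of how these stateless stages compose, the pipeline below filters, fans out an array, and reshapes each event. The event shape and field names (status, customerId, items) are hypothetical, and the syntax is standard MongoDB-style aggregation:

```js
// Hypothetical order events: { status, customerId, items: [{ sku, price, qty }] }
const statelessPipeline = [
  { $match: { status: "active" } },        // ingress filter: drop irrelevant events early
  { $unwind: "$items" },                   // fan out: one event per line item
  { $set: {                                // computed field, no global state needed
      lineTotal: { $multiply: ["$items.price", "$items.qty"] }
  } },
  { $project: { customerId: 1, "items.sku": 1, lineTotal: 1 } }
];
```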

Stateful aggregations

| Stage | Streaming | Batch | Notes |
| --- | --- | --- | --- |
| $group | ⚠️ | ✅ | Supported in streaming when grouping by a stable key (e.g., customerId) and using incremental accumulators ($sum, $avg, $min, $max, $count, $push, $addToSet); see the first sketch below the table. Avoid grouping by expressions that require whole-stream ordering, and accumulators that need global knowledge ($first, $last, $stdDevSamp, $stdDevPop). See the $group reference for the full accumulator catalogue. |
| $setWindowFields | ⚠️ | ✅ | Works with streaming windows (tumbling, hopping) when a bounded window is declared; see the second sketch below the table. Window functions such as $rank, $shift, and $derivative, as well as accumulator-backed windows ($sum, $avg, etc.), are safe in streaming if the window bounds are finite and the partition key is stable. Unbounded windows devolve into global state; reserve them for batch use. See the $setWindowFields guide for function specifics. |
| $bucket / $bucketAuto | ⏳ | ✅ | Require knowledge of the global min/max or distribution. Reserve for batch or bounded replays. |
| $densify | ⏳ | ✅ | Needs complete interval knowledge to backfill gaps; better suited to batch jobs. |
| $facet | ⚠️ | ✅ | Streaming facets run several sub-pipelines over the same event, so ensure each sub-pipeline is itself streaming-friendly. Fan out to separate sinks if results diverge wildly. |
| $sort | ⏳ | ✅ | A global sort is incompatible with infinite streams. For streaming, sort within a window (e.g., via $setWindowFields + $push + client-side ordering) or rely on upstream ordering. |
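
For the $group row above, a streaming-friendly shape keys on a stable field and uses only incremental accumulators. Field names are hypothetical; the syntax is standard MongoDB-style aggregation:

```js
// Per-customer running totals: every accumulator here can be updated
// incrementally as each event arrives, so no whole-stream knowledge is needed.
const streamingGroup = {
  $group: {
    _id: "$customerId",                    // stable key with bounded cardinality
    orders: { $count: {} },                // incremental counter
    revenue: { $sum: "$lineTotal" },       // incremental sum
    largest: { $max: "$lineTotal" }        // incremental max
  }
};
```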
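For $setWindowFields, declaring finite bounds keeps per-key state bounded. The sketch below uses standard MongoDB window syntax with hypothetical fields; how Fluxion declares tumbling or hopping windows may differ, so treat this as illustrative and consult the $setWindowFields guide:

```js
// Rolling average over the current event and the five before it, per customer.
// Finite document bounds mean the engine can discard anything older.
const boundedWindow = {
  $setWindowFields: {
    partitionBy: "$customerId",            // stable partition key
    sortBy: { ts: 1 },
    output: {
      rollingAvg: {
        $avg: "$lineTotal",
        window: { documents: [-5, 0] }     // finite bounds, streaming-safe
      }
    }
  }
};
```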

Expression-focused stages

| Stage | Streaming | Batch | Notes |
| --- | --- | --- | --- |
| $fill | ⚠️ | ✅ | Works in streaming when backing values can be sourced from window state or defaults. Window-aware fills that need previous/next documents should run inside bounded windows. |
| $function | ✅ | ✅ | Ideal for bespoke logic. In streaming, ensure functions are free of side effects and idempotent so retries are safe; see the sketch below the table. |
| $densify / $sampleRate | see above | see above | Already covered; listed here for quick scanning. |
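
To illustrate the $function caveat, the sketch below is a pure, idempotent function: it performs no I/O and depends only on its arguments, so re-running it after a retry yields the same result. Field names are hypothetical; the $function syntax is standard MongoDB:

```js
// Derive a risk score with bespoke logic. The body is deterministic and
// side-effect free, so a retried event cannot double-apply anything.
const riskStage = {
  $set: {
    riskScore: {
      $function: {
        body: function (total, count) {
          // pure computation: no I/O, no globals, no randomness
          return total > 1000 ? count * 2 : count;
        },
        args: ["$lineTotal", "$orderCount"],
        lang: "js"
      }
    }
  }
};
```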

Output-oriented stages

| Stage | Streaming | Batch | Notes |
| --- | --- | --- | --- |
| $merge / $out | n/a | ⏳ | Not part of the current Fluxion stage set. For streaming, write to sinks through Fluxion Connect instead. Once batch jobs are available, $out-style materialisation becomes more attractive. |
| $lookup | ⚠️ | ✅ | Supported via Enrich or native stages. Streaming lookups must guard against high latency; pair them with caching or asynchronous enrichment (see the sketch below the table). |
| $graphLookup | ⏳ | ✅ | Depth-first traversal is expensive for streaming; reserve it for batch workloads. |
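
As a sketch of the $lookup caveat, the stage below joins each event against a hypothetical customers reference collection using standard MongoDB syntax. In streaming, keep the foreign side small and cacheable, or move the join to asynchronous enrichment so a slow lookup cannot stall the pipeline:

```js
// Enrich each event with its customer record. An equality match on an
// indexed key keeps per-event lookup latency predictable.
const enrichStages = [
  { $lookup: {
      from: "customers",                   // hypothetical reference collection
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
  } },
  { $unwind: "$customer" }                 // one matching record expected per event
];
```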

Guidance for structuring pipelines

  1. Normalise arrays early. Apply $unwind as soon as possible so downstream stages operate on single logical events. This is especially important in streaming pipelines that need $group or $setWindowFields.
  2. Isolate batch-heavy logic. If a pipeline requires $sort, $densify, or $bucketAuto, split it: stream the critical detection piece, and hand off the expensive report-building stages to a scheduled batch job (see the sketch after this list).
  3. Use sinks for persistence. Until a batch job engine ships with $out semantics, the recommended way to materialise results is via Fluxion Connect sinks (HTTP, SQL, Kafka, MongoDB, …).
  4. Budget state carefully. Stateful stages keep per-key accumulators in memory. Pick grouping keys with bounded cardinality and monitor metrics such as stream.state.size.
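
A minimal sketch of the split recommended in item 2, with hypothetical field names: the streaming half stays cheap and stateless, while the batch half keeps the stages that need bounded input.

```js
// Streaming half: cheap detection; results flow to a Fluxion Connect sink.
const detectionPipeline = [
  { $match: { level: "error" } },          // stateless filter
  { $set: { alert: true } }                // flag the event for the downstream sink
];

// Batch half (scheduled job): stages that need the whole bounded dataset.
const reportPipeline = [
  { $sort: { ts: 1 } },                    // global ordering is fine over bounded input
  { $densify: {                            // backfill gaps in the time series
      field: "ts",
      range: { step: 1, unit: "hour", bounds: "full" }
  } },
  { $group: { _id: { $hour: "$ts" }, errors: { $sum: 1 } } }
];
```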

Looking ahead: batch job engine

The forthcoming batch job engine will reuse the same pipeline definitions but execute over bounded document sets. Expect the following stages to shine in that environment:

  • $bucket / $bucketAuto for histogramming over complete datasets.
  • $densify and $fill for time series repair.
  • $sort + $group for report-ready ordering.
  • $graphLookup or expensive $lookup joins that need full data stores.

Document your intent using pipeline metadata today so it is clear which stages assume bounded versus unbounded inputs. That will make migrating to batch execution straightforward once the engine lands.

Accumulator cheat sheet for streaming

The following accumulators are generally safe in long-lived pipelines because they can be updated incrementally per key:

  • $sum, $avg, $min, $max, $count
  • $push, $addToSet (monitor cardinality to avoid unbounded arrays)
  • $first / $last only when scoped to a finite window (e.g., inside $setWindowFields with finite bounds; see the sketch after this list)
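
For that last point, here is a sketch of $first scoped to a finite window, using standard MongoDB window syntax and hypothetical fields. The accumulator reads the oldest document inside the declared bounds rather than the first event of the whole stream, so per-key state stays finite:

```js
const windowedFirst = {
  $setWindowFields: {
    partitionBy: "$customerId",
    sortBy: { ts: 1 },
    output: {
      windowOpen: {
        $first: "$lineTotal",
        window: { documents: [-9, 0] }     // only the last 10 events per key
      }
    }
  }
};
```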

Prefer to defer these accumulators to batch execution where the full dataset is available:

  • $stdDevPop, $stdDevSamp, $covariancePop, $covarianceSamp
  • $mergeObjects when it must observe every document to build the final shape
  • $bottomN / $topN style accumulators (coming soon) that inherently need global ordering

For a comprehensive list of accumulators and their semantics, see the operators reference. Use this matrix to decide when a feature is practical in streaming, then dive into the stage documentation for syntax details and examples.