MongoDB Aggregation Pipeline Performance: Analysis of Query Plan Selection and Optimizer Behavior Across Versions and Collection Scales

This article examines how MongoDB optimizes aggregation pipeline queries, focusing on two mechanisms: a trial-based plan selection process that runs candidate execution plans in parallel and picks the one returning the most results for the least work, and rule-based operator rewriting by the Pipeline Optimizer. The study tests nine aggregation query types on a synthetic e-commerce dataset of 50K documents, using MongoDB versions 6.0.3 and 8.2.5 under identical conditions. For each query, it evaluates all valid operator orderings and examines both the physical execution plan and the Pipeline Optimizer output. Each test runs 20 times, and the plan cache is cleared before every run. Scalability is tested with datasets of 150K and 250K documents. The study identifies three cases where the rule-based optimizer falls short: IXSCAN preference bias at low selectivity, where the suboptimal plan is up to 9 times slower than the optimal one (699 ms vs. 80 ms at 250K under MongoDB 8.2.5); unbounded document multiplication after $unwind; and failure to account for $group output cardinality. MongoDB 8.2.5 improves performance over 6.0.3 in most cases: $match + $group queries run up to 28% faster, and queries that rely on IXSCAN improve by up to 18%. Unbounded projection operations, however, run slower in MongoDB 8.2.5 at all tested sizes. The slowdown is +23% at 50K, +3% at 150K, and +14% at 250K, pointing to a change in the projection execution path between versions.
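
The measurement procedure described above (every valid operator ordering, 20 runs each, plan cache cleared before every run) could be sketched roughly as follows. This is a minimal illustration, not the study's actual harness: the names `candidate_orderings`, `benchmark`, `run_pipeline`, `before_run`, and `is_valid` are hypothetical, and reporting the median is an assumption, since the abstract does not say which statistic is used.

```python
import itertools
import statistics
import time


def candidate_orderings(stages, is_valid=lambda order: True):
    """Yield every permutation of `stages` that `is_valid` accepts.

    `is_valid` is a hypothetical predicate; which orderings count as
    valid depends on the query and is not specified here.
    """
    for order in itertools.permutations(stages):
        if is_valid(order):
            yield list(order)


def benchmark(run_pipeline, stages, runs=20, before_run=None,
              is_valid=lambda order: True):
    """Time each valid ordering `runs` times.

    Returns (ordering, median milliseconds) pairs, fastest first.
    `before_run` is called before every timed run, for example to
    clear the plan cache.
    """
    results = []
    for order in candidate_orderings(stages, is_valid):
        timings_ms = []
        for _ in range(runs):
            if before_run is not None:
                before_run()
            start = time.perf_counter()
            run_pipeline(order)
            timings_ms.append((time.perf_counter() - start) * 1000.0)
        # Median is an assumed summary statistic, not the study's choice.
        results.append((order, statistics.median(timings_ms)))
    return sorted(results, key=lambda pair: pair[1])
```

With a driver such as PyMongo, `run_pipeline` might be something like `lambda p: list(collection.aggregate(p))`, and `before_run` might issue MongoDB's `planCacheClear` command for the collection. Both are assumptions about how such a harness could be wired up.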
