# Data Aggregation in MongoDB
In MongoDB, "aggregation" refers to the process of transforming and computing data from multiple documents to produce insightful, summarized results.
While you can perform simple aggregations by fetching data to your application and processing it there, MongoDB's native aggregation tools are far more powerful and efficient.
There are three primary ways to perform aggregation in MongoDB:
* **Single-Purpose Aggregation Methods:** A set of simple, standalone methods for common aggregation needs (e.g., `countDocuments()`, `distinct()`). These are best for quick, straightforward calculations (see the sketch below).
* **The Aggregation Pipeline (Preferred Method):** A highly flexible and powerful framework that processes documents through a multi-stage pipeline. This is the modern, standard approach for all complex aggregation tasks.
* **Map-Reduce (Legacy Method):** A more complex, multi-phase processing framework modeled after the classic MapReduce paradigm. It is powerful but has been largely superseded by the Aggregation Pipeline for most use cases due to the pipeline's superior performance and ease of use.
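A minimal sketch of the single-purpose methods, assuming the `Orders` and `Products` collections that the later examples use:

```javascript
// Count delivered orders directly, without building a pipeline.
db.Orders.countDocuments({ status: "Delivered" });

// List the distinct values of the 'category' field across all products.
// (A 'category' field on Products is assumed here for illustration.)
db.Products.distinct("category");
```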
## The Aggregation Pipeline
The Aggregation Pipeline is best visualized as an assembly line for your data. Documents from a collection enter at one end and proceed through a series of stages. Each stage transforms the documents in some way (e.g., filtering, grouping, reshaping) and passes its output to the next stage.
### Why Use the Aggregation Pipeline?
* **Server-Side Processing:** All computations happen directly on the database server, minimizing network latency and reducing the load on your application.
* **Efficiency:** The pipeline is implemented in optimized native C++ code, making it significantly faster than manual processing or legacy Map-Reduce.
* **Power and Flexibility:** It can perform complex, multi-step data transformations and computations in a single command.
```javascript
db.collection.aggregate([
  { <stage_1> },
  { <stage_2> },
  { <stage_3> },
  // ...and so on
]);
```
## Core Stages of the Aggregation Pipeline
The following are the most common and important stages, explained with examples against an `ECommerceDB` database.
### 1. `$match`: The Funnel (Filtering Documents)
* **Purpose:** To filter the stream of documents, allowing only those that match specified conditions to pass to the next stage. It's the equivalent of a `WHERE` clause in SQL.
* **Best Practice:** Place `$match` as early as possible in your pipeline. Filtering documents early significantly reduces the amount of data that subsequent stages need to process, boosting performance.
**Example: Find all orders that have a "Delivered" status.**
```javascript
db.Orders.aggregate([
{ $match: { status: "Delivered" } }
]);
```
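`$match` accepts the same query operators as `find()`. A small sketch, reusing the `totalAmount` field from later examples:

```javascript
// Delivered orders worth more than 500.
db.Orders.aggregate([
  { $match: { status: "Delivered", totalAmount: { $gt: 500 } } }
]);
```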
### 2. `$group`: The Summarizer (Grouping Documents)
* **Purpose:** To group documents by a specified key (the `_id` field) and apply accumulator expressions to the grouped data to compute results. This is the core of aggregation, similar to SQL's `GROUP BY`.
* **Key Concepts:**
  * `_id`: This field defines the group key. You can group by a single field (`_id: "$category"`) or by multiple fields to create a compound group key (`_id: { category: "$category", inStock: "$inStock" }`). `_id: null` groups all documents into a single result.
  * **Accumulators:** Operators like `$sum`, `$avg`, `$min`, `$max`, and `$push` that calculate values over the documents in each group (`_id: null` and `$push` are sketched after the example below).
**Example: Calculate the total revenue and average order amount for each order status.**
```javascript
db.Orders.aggregate([
  {
    $group: {
      _id: "$status",                              // Group documents by the value of their 'status' field
      totalRevenue: { $sum: "$totalAmount" },      // For each group, sum up the 'totalAmount'
      averageOrderValue: { $avg: "$totalAmount" }, // For each group, average the 'totalAmount'
      orderCount: { $sum: 1 }                      // For each group, add 1 to the count for each document
    }
  }
]);
```
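And a minimal sketch of the two remaining concepts from the list above, `_id: null` and `$push` (the `orderId` field is assumed only for illustration):

```javascript
// _id: null collapses all documents into a single summary group.
db.Orders.aggregate([
  { $group: { _id: null, grandTotal: { $sum: "$totalAmount" } } }
]);

// $push collects a value from every document in each group into an array.
// ('orderId' is a hypothetical field used only for illustration.)
db.Orders.aggregate([
  { $group: { _id: "$userId", orderIds: { $push: "$orderId" } } }
]);
```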
### 3. `$project`: The Reshaper (Selecting and Creating Fields)
* **Purpose:** To reshape the output documents. You can use it to include or exclude fields, rename fields, or create new computed fields. It's like the `SELECT` clause in SQL.

**Example: Show all product names, their prices, and a new `priceWithTax` field (assuming 18% tax).**
```javascript
db.Products.aggregate([
  {
    $project: {
      _id: 0,                 // Exclude the original _id field
      productName: "$name",   // Rename 'name' to 'productName'
      price: 1,               // Include the 'price' field as is
      priceWithTax: { $multiply: ["$price", 1.18] } // Create a new computed field
    }
  }
]);
```
**Example: Include the product name and compute a `discounted` price.**

```javascript
db.Products.aggregate([
  {
    $project: {
      name: 1,                                           // Include 'name'
      discounted: { $subtract: ["$price", "$discount"] } // Compute price minus discount
    }
  }
]);
```
### 4. `$sort`: The Organizer (Sorting Documents)
* **Purpose:** To sort the documents based on a specified field. Use `1` for ascending order and `-1` for descending order.
**Example: Find the total amount spent by each user, sorted from highest to lowest.**
```javascript
db.Orders.aggregate([
  {
    $group: {
      _id: "$userId",
      totalSpent: { $sum: "$totalAmount" }
    }
  },
  { $sort: { totalSpent: -1 } } // Sort by the newly computed 'totalSpent' field
]);
```
**Example: Total the delivered-order amount per customer, sorted from highest to lowest.**

```javascript
db.Orders.aggregate([
  { $match: { status: "Delivered" } }, // Filter early to reduce work for later stages
  {
    $group: {
      _id: "$customerId",
      totalAmount: { $sum: "$totalAmount" }
    }
  },
  { $sort: { totalAmount: -1 } }
]);
```
### 5. `$unwind`: The Deconstructor (Working with Arrays)
* **Purpose:** To deconstruct an array field from the input documents, outputting one document per array element. If a document's array has three elements, `$unwind` will produce three separate documents, each with the data of the original document plus one of the array elements.
**Example: Create a separate document for each item in every order.**
```javascript
db.Orders.aggregate([
  { $unwind: "$items" }
]);
// This will transform an order with 2 items into 2 separate documents.
```
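`$unwind` is most useful as a setup step for `$group`. A sketch, assuming each `items` element carries `productId` and `quantity` fields:

```javascript
// Total units sold per product across all orders.
// (The shape of 'items' elements is assumed for illustration.)
db.Orders.aggregate([
  { $unwind: "$items" },
  {
    $group: {
      _id: "$items.productId",
      unitsSold: { $sum: "$items.quantity" }
    }
  }
]);
```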
### 6. `$lookup`: The Joiner (Combining Collections)
* **Purpose:** To perform a left outer join to another collection in the same database, allowing you to enrich your documents with data from another collection.
**Example: For each order, pull in the details of the user who placed it.**
```javascript
db.Orders.aggregate([
  {
    $lookup: {
      from: "Users",          // The collection to join with
      localField: "userId",   // The field from the input documents (Orders)
      foreignField: "_id",    // The field from the documents of the "from" collection (Users)
      as: "userDetails"       // The name of the new array field to add to the output
    }
  }
]);
```
* **Note:** `$lookup` always outputs the matched documents as an array (`userDetails` in this case). You often use `$unwind` right after to deconstruct it if you expect a one-to-one match, as sketched below.
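A sketch of that `$lookup` + `$unwind` pattern, continuing the example above:

```javascript
// Join each order with its user, then flatten the one-element array
// so userDetails becomes an embedded document instead of an array.
db.Orders.aggregate([
  {
    $lookup: {
      from: "Users",
      localField: "userId",
      foreignField: "_id",
      as: "userDetails"
    }
  },
  { $unwind: "$userDetails" }
]);
```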
## Other Important Stages
| Stage | Description |
| --- | --- |
| `$limit` | Restricts the number of documents passed to the next stage. |
| `$skip` | Skips over a specified number of documents. |
| `$count` | Returns a count of the documents at that point in the pipeline. |
| `$addFields` / `$set` | Adds new fields to documents. `$set` is an alias for `$addFields`. |
| `$replaceRoot` / `$replaceWith` | Replaces the entire document with a specified sub-document. |
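A brief sketch of how `$skip`, `$limit`, and `$count` combine in practice (the page size here is chosen arbitrarily):

```javascript
// Page 2 of delivered orders, 10 per page: sort before skip/limit
// so the page boundaries are deterministic.
db.Orders.aggregate([
  { $match: { status: "Delivered" } },
  { $sort: { totalAmount: -1 } },
  { $skip: 10 },
  { $limit: 10 }
]);

// $count replaces the stream with a single { deliveredOrders: <n> } document.
db.Orders.aggregate([
  { $match: { status: "Delivered" } },
  { $count: "deliveredOrders" }
]);
```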
## Aggregation Operators Reference
| Operator Group | Examples |
| --- | --- |
| Aggregation Math | `$sum`, `$avg`, `$min`, `$max` |
| Arithmetic | `$multiply`, `$divide`, `$add`, `$subtract` |
| Conditionals | `$cond`, `$ifNull`, `$switch` |
| Type Conversion | `$toInt`, `$toString`, `$toDate` |
| Array Operations | `$arrayElemAt`, `$size`, `$slice`, `$filter` |
| String Operations | `$regexMatch`, `$concat`, `$substr`, `$toLower`, `$toUpper` |
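A few of these operators combined in one illustrative `$project` (the output field names are invented; `inStock` is borrowed from the `$group` example earlier):

```javascript
// Illustrative only: builds a display label and an availability flag.
db.Products.aggregate([
  {
    $project: {
      _id: 0,
      label: {
        $concat: [{ $toUpper: "$name" }, " - ", { $toString: "$price" }]
      },
      availability: {
        $cond: { if: "$inStock", then: "Available", else: "Out of stock" }
      }
    }
  }
]);
```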
## Bulk Write Operations
Bulk Write is a high-performance operation that allows you to send multiple write operations (`insertOne`, `updateOne`, `deleteOne`, `replaceOne`, etc.) to the server in a single batch. This drastically reduces network round-trips and is the preferred method for any large-scale data modification.
```javascript
db.collection.bulkWrite([
  { insertOne: { "document": { _id: 1, name: "New Product", price: 100 } } },
  { updateOne: {
      "filter": { name: "Old Product" },
      "update": { $set: { price: 200 } }
  } },
  { deleteOne: { "filter": { inStock: false } } },
  { replaceOne: {
      "filter": { _id: 2 },
      "replacement": { _id: 2, name: "Replaced Product", price: 300 }
  } }
], { ordered: false }); // Optional: specify ordered or unordered execution
```
**Example: A simpler bulk write against a `students` collection.**

```javascript
db.students.bulkWrite([
  { insertOne: {
      document: { name: "John", age: 20 }
  } },
  { updateOne: {
      filter: { name: "John" },
      update: { $set: { age: 21 } }
  } },
  { deleteOne: {
      filter: { name: "Old Student" }
  } }
]);
```
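In mongosh, `bulkWrite()` returns a result object summarizing what happened. A brief sketch of inspecting its standard count fields:

```javascript
// Run a small batch and inspect the summary counts it returns.
const result = db.students.bulkWrite([
  { insertOne: { document: { name: "Jane", age: 22 } } },
  { updateOne: { filter: { name: "Jane" }, update: { $set: { age: 23 } } } }
]);
printjson({
  inserted: result.insertedCount,
  matched: result.matchedCount,
  modified: result.modifiedCount
});
```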
### Ordered vs. Unordered Execution
* `ordered: true` **(default):** MongoDB executes the operations serially. If an error occurs on one operation, the entire bulk write stops, and the remaining operations are not attempted.
* `ordered: false`: MongoDB can execute the operations in parallel. If an error occurs, the process continues with the remaining operations and reports all errors at the end. This is faster but should be used only when the order of operations does not matter.
```javascript
db.collection.bulkWrite([...], { ordered: false })
```
### Advantages of Bulk Write
- **Performance:** The single most performant way to execute batch write operations.
- **Efficiency:** Minimizes network traffic between your application and the database.
- **Use Cases:** Ideal for data migrations, large-scale data cleanup jobs, or synchronizing data from an external source.