Skip to content

Greedy Technique

Spanning Tree.

A spanning tree of an undirected connected graph is defined as its connected acyclic subgraph (i.e., a tree) that contains all the vertices of the graph.

If the graph has weights assigned to its edges, a minimum spanning tree (MST) is a spanning tree with the smallest total weight, where the weight of the tree is the sum of the weights of all its edges.

The problem of finding such a tree is known as the minimum spanning tree problem.

Page 351

Kruskal’s Algorithm for Constructing a Minimum Spanning Tree

Kruskal’s Algorithm is a greedy algorithm that always yields an optimal solution for the minimum spanning tree problem. It approaches the problem by considering a minimum spanning tree as an acyclic subgraph that contains |V| - 1 edges ( |V| is the number of vertices) and for which the sum of the edge weights is the smallest.

The algorithm constructs the MST as an expanding sequence of subgraphs that are always acyclic, though they are not necessarily connected during intermediate stages.

The algorithm works as follows:

  1. It begins by sorting all the graph’s edges in nondecreasing order of their weights.
  2. Starting with an empty subgraph, it iterates through this sorted list of edges.
  3. For each edge, it checks if adding it to the current subgraph would create a cycle.
  4. If the addition does not create a cycle, the edge is added to the growing set of tree edges; otherwise, the edge is skipped.
  5. The algorithm stops when |V| - 1 edges have been selected, forming the MST.
  6. A new cycle is formed if and only if the new edge connects two vertices that are already connected by a path (i.e., they belong to the same connected component). This crucial check is efficiently managed using union-find algorithms.
ALGORITHM Kruskal(G)
// Implements Kruskal’s algorithm for constructing a minimum spanning tree
// Input: A weighted connected graph G = 〈V, E〉
// Output: E(T), the set of edges composing a minimum spanning tree of G

sort E in nondecreasing order of the edge weights w(ei1) ≤ . . . ≤ w(ei|E|)

ET <- ∅
ecounter <- 0   // initialize the set of tree edges and its size

k <- 0 // initialize the number of processed edges

while ecounter < |V| - 1 do
    k <- k + 1
    if ET ∪ {eik} is acyclic
        ET <- ET ∪ {eik}; 
        ecounter <- ecounter + 1
return ET

Time Complexity

The time efficiency of Kruskal’s Algorithm is primarily determined by the initial step of sorting the edges. With an efficient sorting algorithm, the time efficiency of Kruskal's algorithm is in O(|E| log |E|). This is because the union-find operations, when implemented efficiently (e.g., with path compression), contribute a complexity that is only slightly worse than linear, making the sorting step the dominant factor.

Dijkstra's Algorithm

Dijkstra's Algorithm is a widely recognized algorithm designed to solve the single-source shortest-paths problem in a weighted connected graph. This problem aims to find the shortest paths from a given starting vertex (the "source") to all other vertices in the graph.

The algorithm is applicable to both undirected and directed graphs, provided that all edge weights are non-negative.

This condition is met in most practical applications, contributing to the algorithm's widespread use in fields such as transportation planning, packet routing in communication networks, motion planning in computer games, and finding shortest paths in social networks.

Core Idea of Dijkstra's Algorithm

Dijkstra's algorithm operates by systematically finding the shortest paths to the graph's vertices in increasing order of their distance from the source. It does this by maintaining a set of vertices for which the shortest path from the source has already been determined, building up a "subtree" of shortest paths.

The general process is as follows:

  • It first identifies the shortest path from the source to the vertex nearest to it.

  • Then, it finds the shortest path to the second nearest vertex, and so on.

  • The algorithm labels each vertex with two pieces of information:

    • A numeric label (d) that indicates the length of the shortest path found so far from the source to that vertex.
    • When a vertex is added to the "tree" (the set of vertices whose shortest paths are finalized), this label represents the final shortest path length.
    • A non-numeric label (p) that indicates the penultimate vertex on that path, essentially the parent of the vertex in the shortest-path tree being constructed. This allows for the reconstruction of the actual paths.
  • Vertices not yet included in the "tree" but adjacent to a vertex in the tree are considered "fringe vertices". From these, the algorithm selects the next vertex u* that is closest to the source. This selection is based on comparing the sum of the shortest distance to a tree vertex v (which is already known) and the weight of the edge connecting v to the fringe vertex u (dv + w(v, u)).

  • Once u* is identified and moved from the fringe to the set of tree vertices, the labels of its adjacent "fringe" vertices are updated if a shorter path to them is found through u*.

Pseudocode for Dijkstra’s Algorithm

The algorithm uses a priority queue to efficiently manage the fringe vertices, storing them with their current shortest path lengths as priorities.

ALGORITHM Dijkstra(G, s)
//Dijkstra’s algorithm for single-source shortest paths
//Input: A weighted connected graph G = 〈V, E〉 with nonnegative weights
// and its vertex s (source)
//Output: The length dv of a shortest path from s to v
// and its penultimate vertex pv for every vertex v in V

Initialize(Q)            // Initialize priority queue to empty

for every vertex v in V do
    dv ← ∞; pv ← null
    Insert(Q, v, dv) // Initialize vertex priority in the priority queue

ds ← 0; Decrease(Q, s, ds) // Update priority of s with ds (source distance is 0)

VT ← ∅ // Initialize the set of tree vertices (shortest paths found)

for i ← 0 to |V| - 1 do
    u* ← DeleteMin(Q) // Delete the minimum priority element (closest fringe vertex)
    VT ← VT ∪ {u*}

    for every vertex u in V - VT that is adjacent to u* do
        if du* + w(u*, u) < du then
            du ← du* + w(u*, u); pu ← u*
            Decrease(Q, u, du)

return the distances (dv) and predecessors (pv) for all v in V

Time Efficiency

The time efficiency of Dijkstra's algorithm depends significantly on the data structures used for implementing the priority queue and representing the input graph.

  • When a min-heap is used for the priority queue, and the graph is represented by adjacency lists, the time efficiency of Dijkstra’s algorithm is in Θ((|V| + |E|) log |V|).
  • For a simpler implementation using an unordered array for the priority queue and an adjacency matrix for the graph, the running time would be in Θ(|V|²).

Dijkstra's algorithm is typically more efficient for sparse graphs (graphs with relatively few edges) when implemented with adjacency lists and a min-heap, while for dense graphs (graphs with many edges), an adjacency matrix implementation can be competitive.

Huffman Trees

Page 365

Huffman's Algorithm is a greedy algorithm used to construct a Huffman tree, which in turn defines a Huffman code. This code is an optimal prefix-free variable-length encoding scheme that assigns bit strings (codewords) to symbols based on their frequencies in a given text.

  • Encoding Text: Huffman's algorithm is designed to encode text comprising symbols from an alphabet.

  • Variable-Length Encoding: Unlike fixed-length encoding (like ASCII), Huffman's assigns codewords of different lengths to different symbols.

  • Prefix-Free Codes: To avoid ambiguity when decoding variable-length bit strings, Huffman's algorithm generates "prefix-free" codes. In a prefix-free code, no codeword is a prefix of another symbol's codeword. This allows for straightforward decoding by scanning the bit string and identifying the first matching codeword.

  • Optimality: The algorithm yields an optimal, or minimal-length, encoding, provided that the frequencies of symbol occurrences are independent and known in advance.

The core idea is to assign shorter codewords to more frequent symbols and longer codewords to less frequent symbols, aiming to yield a shorter average bit string for the encoded text.

Algorithm Steps for Constructing a Huffman Tree

The algorithm constructs the Huffman tree from the bottom up, by iteratively combining the two smallest-weight trees.

  1. Initialization: Start with n one-node trees. Each tree is labeled with a symbol from the alphabet, and its root's weight is set to the symbol's frequency (probability). The weight of a tree is generally the sum of the frequencies in its leaves.

  2. Iterative Combination: Repeat the following steps until only a single tree remains:

    • Find Smallest Weights: Identify the two trees with the smallest weights from the current set of trees. Ties can be broken arbitrarily.
    • Create New Tree: Form a new tree by making the two selected trees the left and right subtrees of a new root node.
    • Assign New Weight: The weight of this new tree's root is the sum of the weights of its two children (the two combined trees).
    • Replace Trees: Remove the two original trees from the set and add the newly formed tree to the set.

The single tree obtained at the end of this process is the Huffman tree.

This tree then defines the Huffman code, where tracing paths from the root to each leaf (symbol) forms its unique codeword, typically by labeling left edges with '0' and right edges with '1'.

Example : For an alphabet with symbols A, B, C, D, _ and probabilities A: 0.35, B: 0.1, C: 0.2, D: 0.2, _: 0.15:

  • The resulting codewords might be: A: 11, B: 100, C: 00, D: 01, _: 101.

Made with ❤️ for students, by a fellow learner.