Space and Time Tradeoffs
Sorting by Counting Algorithm
Page 280
Design an algorithm for Sorting by Counting. Trace the algorithm with an example.
Sorting by Counting, also known as Comparison-Counting Sort, is an algorithm that sorts a given list by determining, for each element, the number of elements smaller than it. This count directly indicates the final position of the element in the sorted output list.
For instance, if an element has three smaller elements, it will be placed at index 3 of a 0-indexed array (the fourth position) in the sorted list.
ComparisonCountingSort(A[0..n-1])
// Sorts an array by comparison counting
// Input: An array A[0..n-1] of orderable elements
// Output: Array S[0..n-1] of A's elements sorted in increasing order
for i <- 0 to n-1 do
    Count[i] <- 0                      // initialize the Count array
for i <- 0 to n-2 do
    for j <- i+1 to n-1 do
        if A[i] < A[j]
            Count[j] <- Count[j] + 1   // A[j] is larger: one more element precedes it
        else
            Count[i] <- Count[i] + 1   // A[i] is larger (or equal): one more element precedes it
for i <- 0 to n-1 do
    S[Count[i]] <- A[i]                // Count[i] is A[i]'s final position
return S
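To trace the algorithm, the following is a minimal Python rendering of the pseudocode above; the sample list [62, 31, 84, 96, 19, 47] is an illustrative choice, not prescribed by the algorithm.

def comparison_counting_sort(a):
    """Sort a list by counting, for each element, how many elements are smaller."""
    n = len(a)
    count = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1   # a[j] is larger: one more element precedes it
            else:
                count[i] += 1   # a[i] is larger (or equal): one more element precedes it
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]      # count[i] is a[i]'s final position
    return s

print(comparison_counting_sort([62, 31, 84, 96, 19, 47]))
# Counts come out as [3, 1, 4, 5, 0, 2], so 62 goes to index 3, 31 to index 1,
# 84 to index 4, 96 to index 5, 19 to index 0, and 47 to index 2:
# -> [19, 31, 47, 62, 84, 96]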
Characteristics:
Time Efficiency: The core of the algorithm involves nested loops comparing every pair of elements. This leads to a time complexity of Θ(n^2) in all cases (worst, average, and best), as the number of key comparisons is n(n-1)/2.
Space Efficiency: The algorithm requires Θ(n) extra space for the Count and S arrays.
Sorting by Distribution Counting
Explain the algorithm for Sorting by Distribution Counting.
Sorting by Distribution Counting is an input-enhancement sorting technique particularly effective when the elements to be sorted are integers within a known, limited range of values (e.g., from a lower bound l to an upper bound u).
Instead of direct comparisons between elements, this method leverages the distribution of values to place elements directly into their sorted positions.
DistributionCountingSort(A[0..n-1], l, u)
// Sorts an array of integers from a limited range by distribution counting
// Input: An array A[0..n-1] of integers between l and u (l <= u)
// Output: Array S[0..n-1] of A's elements sorted in increasing order
for j <- 0 to u-l do
    D[j] <- 0                    // initialize frequency array (size u-l+1)
for i <- 0 to n-1 do             // count frequencies
    D[A[i] - l] <- D[A[i] - l] + 1
for j <- 1 to u-l do             // compute prefix sums (distribution)
    D[j] <- D[j-1] + D[j]
for i <- n-1 downto 0 do         // place elements into the output array
    j <- A[i] - l
    S[D[j] - 1] <- A[i]
    D[j] <- D[j] - 1
return S
The algorithm proceeds in three main steps:
Compute Frequencies:
- An auxiliary array, D (for distribution or frequencies), is initialized. Its size is u - l + 1 to cover all possible integer values in the range [l, u].
- The algorithm iterates through the original input array A. For each element A[i], it increments the counter D[A[i] - l], recording the frequency of each distinct value in the input array.
Compute Distribution (Accumulated Frequencies):
- The D array is then transformed to store accumulated frequencies. This is done by iterating from the second element of D (index 1) to its end, updating each D[j] to D[j-1] + D[j].
- These accumulated sums directly indicate the correct positions for the last occurrences of their respective values in the final sorted array.
Place Elements in Sorted Array:
- A new array S of the same size as the input array is created to hold the sorted results.
- The algorithm iterates through the original input array A from right to left (from i = n-1 down to 0).
- For each element A[i]:
  - It determines its corresponding index j in the D array (j = A[i] - l).
  - It places A[i] into S[D[j] - 1]. The -1 adjusts for 0-based indexing, as D[j] currently holds a count (which is 1-based).
  - Immediately after placing A[i], D[j] is decremented by 1 (D[j] <- D[j] - 1), so the next occurrence of the same value lands one cell to the left.
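The three steps above translate directly into code. Here is a minimal Python sketch; the sample array [13, 11, 12, 13, 12, 12] with bounds l = 11, u = 13 is an assumed illustrative input.

def distribution_counting_sort(a, l, u):
    """Sort integers known to lie in [l, u] by distribution counting."""
    d = [0] * (u - l + 1)          # step 1: frequency array of size u-l+1
    for x in a:
        d[x - l] += 1
    for j in range(1, u - l + 1):  # step 2: prefix sums (distribution)
        d[j] += d[j - 1]
    s = [None] * len(a)
    for x in reversed(a):          # step 3: right-to-left pass keeps the sort stable
        j = x - l
        s[d[j] - 1] = x            # d[j] is a 1-based count, hence the -1
        d[j] -= 1
    return s

print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
# -> [11, 12, 12, 12, 13, 13]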
Characteristics:
Time Efficiency: Assuming the range of values (u - l) is fixed (i.e., constant), the algorithm performs a fixed number of passes through the input array and the frequency array. This makes it a linear-time algorithm with a time complexity of Θ(n), asymptotically more efficient than comparison-based sorting algorithms like mergesort or quicksort for inputs satisfying its prerequisites.
Space Efficiency: It requires Θ(n + (u - l)) extra space for the D and S arrays. The (u - l) term is constant if the range is fixed, making it Θ(n) overall.
Stability: The right-to-left processing of the input array, combined with decrementing the distribution counts, ensures that the relative order of equal elements is preserved in the sorted output, making the algorithm stable.
Hashing – Open Hashing, Closed Hashing
Page 295
The core idea of hashing involves distributing keys (the elements to be stored) into a one-dimensional array known as a hash table, typically denoted H[0..m-1].
This distribution is achieved by a hash function, h, which computes an integer between 0 and m-1 for each key. This integer is called the hash address.
Example: If the key is a non-negative integer, h(K) = K mod m can serve as the hash function, since the remainder of division by m always lies between 0 and m-1.
Hashing is a highly efficient method for implementing dictionaries, which are abstract data types that support operations like searching, inserting, and deleting elements.
The fundamental issue in hashing is the occurrence of collisions, which happen when two or more distinct keys are assigned the same hash address. Collisions are unavoidable if the hash table's size m is smaller than the total number of keys n, but they can also occur even when m is considerably larger than n.
Every hashing scheme must therefore incorporate a collision resolution mechanism, which is handled differently in the two main versions of hashing: open hashing and closed hashing.
Open Hashing
Open hashing, also known as separate chaining, resolves collisions by storing colliding keys outside the table's cells.
Separate Chaining: Keys are not stored directly in the hash table itself. Instead, each cell (or "bucket") of the hash table H[0..m-1] holds a linked list containing all the keys that hash to that specific cell. This means that the table size m does not necessarily have to be at least as large as the number of keys n.
Example: To insert a key K (e.g., a word like "A", "FOOL", or "ARE"), the hash function h(K) is computed to determine its hash address. With a hash function that sums the alphabet positions of a word's letters modulo 13, h(ARE) = 11 and h(SOON) = 11, so both "ARE" and "SOON" would be placed in the linked list for cell 11.
- Searching: To search for a key K, the same hash function h(K) is computed to get the hash address. If the linked list at h(K) is empty, the search is unsuccessful. If it's not empty, the list is traversed, comparing K against each element, until a match is found (successful search) or the end of the list is reached (unsuccessful search).
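As an illustration, here is a minimal Python sketch of separate chaining using the word example above; the letter-summing hash function (A=1, ..., Z=26, summed modulo m = 13) is the one implied by that example, and Python lists stand in for the linked lists.

def word_hash(key, m=13):
    """Sum of the letters' alphabet positions, modulo the table size."""
    return sum(ord(c) - ord('A') + 1 for c in key) % m

class ChainedHashTable:
    def __init__(self, m=13):
        self.m = m
        self.buckets = [[] for _ in range(m)]   # one chain per cell

    def insert(self, key):
        self.buckets[word_hash(key, self.m)].append(key)

    def search(self, key):
        # Traverse the chain at the key's hash address.
        return key in self.buckets[word_hash(key, self.m)]

table = ChainedHashTable()
for word in ["A", "FOOL", "ARE", "SOON"]:
    table.insert(word)

print(word_hash("ARE"), word_hash("SOON"))          # 11 11 -> both share cell 11
print(table.search("SOON"), table.search("MONEY"))  # True False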
Closed Hashing
In closed hashing, also known as open addressing, all keys are stored directly within the hash table itself, without resorting to linked lists or other external structures. This implies that the table size m must be at least as large as the number of keys n.
When a collision occurs, a strategy is employed to find an alternative empty cell in the table for the new key.
Collision Resolution Techniques in Closed Hashing:
1. Linear Probing
Linear probing is the simplest strategy for resolving collisions in closed hashing.
When a key K hashes to an occupied address h(K), the algorithm sequentially checks the next cell, (h(K) + 1) mod m.
If this cell is empty, the key is inserted there. If it is also occupied, the process continues to the subsequent cell, (h(K) + 2) mod m, and so on, until an empty slot is found.
The search "wraps around" to the beginning of the table if the end is reached, treating the table as a circular array.
Example: Let's insert the keys 30, 20, 56, 75, 31, 19 into a hash table of size m = 11 using the hash function h(K) = K mod 11.
For key 30: h(30) = 30 mod 11 = 8. Place 30 at index 8.
[_, _, _, _, _, _, _, _, 30, _, _]
For key 20: h(20) = 20 mod 11 = 9. Place 20 at index 9.
[_, _, _, _, _, _, _, _, 30, 20, _]
For key 56: h(56) = 56 mod 11 = 1. Place 56 at index 1.
[_, 56, _, _, _, _, _, _, 30, 20, _]
For key 75: h(75) = 75 mod 11 = 9. Collision at index 9 (occupied by 20)!
- Probe 1: (9 + 1) mod 11 = 10. Index 10 is empty. Place 75 at index 10.
[_, 56, _, _, _, _, _, _, 30, 20, 75]
For key 31: h(31) = 31 mod 11 = 9. Collision at index 9 (occupied by 20)!
- Probe 1: (9 + 1) mod 11 = 10. Index 10 is occupied (by 75).
- Probe 2: (9 + 2) mod 11 = 0. Index 0 is empty. Place 31 at index 0.
[31, 56, _, _, _, _, _, _, 30, 20, 75]
For key 19: h(19) = 19 mod 11 = 8. Collision at index 8 (occupied by 30)!
- Probe 1: (8 + 1) mod 11 = 9. Index 9 is occupied (by 20).
- Probe 2: (8 + 2) mod 11 = 10. Index 10 is occupied (by 75).
- Probe 3: (8 + 3) mod 11 = 0. Index 0 is occupied (by 31).
- Probe 4: (8 + 4) mod 11 = 1. Index 1 is occupied (by 56).
- Probe 5: (8 + 5) mod 11 = 2. Index 2 is empty. Place 19 at index 2.
[31, 56, 19, _, _, _, _, _, 30, 20, 75]
Searching for a key involves the same probing sequence. If a match is found, the search is successful; if an empty cell is encountered before a match, the search is unsuccessful.
Deletion is more complex and often uses "lazy deletion" to avoid breaking search paths for other keys.
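The probing logic can be captured in a short Python sketch; it reproduces the example above (m = 11, h(K) = K mod 11) and assumes the table always has a free cell, so it omits an overflow check.

def linear_probe_insert(table, key):
    """Insert key by linear probing; assumes at least one empty cell."""
    m = len(table)
    i = key % m                    # h(K) = K mod m
    while table[i] is not None:    # probe consecutive cells, wrapping around
        i = (i + 1) % m
    table[i] = key

def linear_probe_search(table, key):
    """Return key's index, or None if an empty cell is reached first."""
    m = len(table)
    i = key % m
    while table[i] is not None:
        if table[i] == key:
            return i
        i = (i + 1) % m
    return None                    # an empty cell ends the probe sequence

table = [None] * 11
for k in [30, 20, 56, 75, 31, 19]:
    linear_probe_insert(table, k)

print(table)  # [31, 56, 19, None, None, None, None, None, 30, 20, 75]
print(linear_probe_search(table, 19))  # 2
print(linear_probe_search(table, 42))  # None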
2. Double Hashing
Double hashing is another collision resolution strategy designed to address the clustering issue found in linear probing.
When a collision occurs at a hash address l = h(K), a second hash function, s(K), is employed to determine a fixed increment for the probing sequence.
The sequence of cells to check then becomes (l + s(K)) mod m, (l + 2s(K)) mod m, (l + 3s(K)) mod m, and so forth.
To ensure that every location in the hash table can potentially be probed by this sequence, the increment s(K) and the table size m must be relatively prime (their only common divisor is 1).
This condition is automatically satisfied if m is chosen as a prime number, since any nonzero increment smaller than m is then relatively prime to it. A recommended secondary hash function is s(K) = m - 2 - K mod (m - 2).
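For comparison with linear probing, here is a minimal double-hashing sketch under the conditions above (prime m, s(K) = m - 2 - K mod (m - 2)); the table size m = 11 and the keys are reused from the linear probing example as an illustrative assumption.

def double_hash_insert(table, key):
    """Insert key using double hashing; assumes a prime table size with a free cell."""
    m = len(table)
    i = key % m                    # primary hash h(K) = K mod m
    s = m - 2 - key % (m - 2)      # secondary hash gives the probe increment
    while table[i] is not None:
        i = (i + s) % m            # step by s(K) instead of 1
    table[i] = key

table = [None] * 11
for k in [30, 20, 56, 75, 31, 19]:
    double_hash_insert(table, k)

print(table)  # [None, 56, None, 31, 75, 19, None, None, 30, 20, None]
# Unlike linear probing, the colliding keys 75, 31, and 19 scatter to cells
# 4, 3, and 5 instead of piling up next to indices 9 and 8.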