Transform and Conquer
Presorting
Page 228
Presorting is an algorithm design technique that involves preprocessing a problem's input by sorting it to make the problem easier or more efficient to solve afterward. It is a widely used idea in computer science, as many questions about a list are simpler to answer if the list is sorted.
It transforms a given instance into a simpler or more convenient instance of the same problem. The benefits of sorting should ideally outweigh the time spent on the sorting itself.
Presorting for the Element Uniqueness Problem
How can presorting help solve the Element Uniqueness Problem? Write an algorithm and analyze its time efficiency.
The Element Uniqueness Problem involves checking whether all elements in a given array of n elements are distinct.
Presorting allows us to solve the problem more efficiently. After sorting the array, if there are any equal elements, they must be adjacent to each other. If a pair of equal adjacent elements is found, the algorithm returns "false"; otherwise, after checking all consecutive pairs, it returns "true".
Therefore, the algorithm only needs to check consecutive elements in the sorted array. Instead of comparing every pair of elements (as a brute-force approach would, which has a worst-case efficiency of $\Theta(n^2)$),
ALGORITHM PresortElementUniqueness(A[0..n − 1])
//Solves the element uniqueness problem by sorting the array first
//Input: An array A[0..n − 1] of orderable elements
//Output: Returns “true” if A has no equal elements, “false” otherwise
sort the array A // Using an efficient sorting algorithm
for i <- 0 to n − 2 do
if A[i] = A[i + 1]
return false
return true
Time Efficiency Analysis:
The running time of the PresortElementUniqueness
algorithm is determined by the sum of the time spent on sorting the array and the time spent on checking consecutive elements.
Sorting Part (
sort the array A
):- If a good sorting algorithm like Mergesort is employed, its worst-case efficiency is in $\Theta(n \log n)$.
Checking Part (
for i ← 0 to n − 2 do
):- This part of the algorithm scans the sorted array, making at most $n - 1$ comparisons between adjacent elements.
- This part, therefore, has a time complexity in $\Theta(n)$.
Overall Time Efficiency: The total running time T(n)
is the sum of the sorting time Tsort(n)
and the scanning time Tscan(n)
:
T(n) = Tsort(n) + Tscan(n)
If an $\Theta(n \log n)$ sorting algorithm is used: T(n) = Θ(n log n) + Θ(n)
The term with the higher order of growth dominates the sum. Therefore, the worst-case efficiency of the entire presorting-based algorithm for element uniqueness is in $\Theta(n \log n)$.
This is a more efficient asymptotic class compared to the $\Theta(n^2)$ efficiency of the brute-force algorithm.
Algorithm to Find the Mode of a Dataset Using Presorting and Time Complexity
Write an algorithm to find the mode of a dataset using presorting and discuss time complexity. (i) When the list is not sorted
(ii) When the list is pre-sorted
Compare the time efficiency of both approaches.
A mode is defined as a value that occurs most frequently in a given list of numbers. If multiple values have the same highest frequency, any one of them can be considered the mode.
Using presorting to find the mode: Sorting the array first brings all identical values together, forming "runs" of equal adjacent elements.
Once sorted, the algorithm can simply scan the array to find the longest run of adjacent equal values, and that value will be the mode.
ALGORITHM PresortMode(A[0..n − 1])
// Computes the mode of an array by sorting it first
// Input: An array A[0..n − 1] of orderable elements
// Output: The array’s mode
sort the array A
i <- 0 // current run begins at position i
mode_frequency <- 0 // highest frequency seen so far
mode_value <- null // value of the highest frequency
while i ≤ n − 1 do
run_length <- 1
run_value <- A[i]
while i + run_length ≤ n − 1
AND A[i + run_length] = run_value
run_length <- run_length + 1
if run_length > mode_frequency
mode_frequency <- run_length
mode_value <- run_value
i <- i + run_length //beginning of the next distinct run
return mode_value
Time Efficiency Analysis of PresortMode
: The algorithm's running time is primarily dictated by the sorting step, as the subsequent scan to find the longest run of adjacent elements takes linear time ($\Theta(n)$).
Therefore, if an $\Theta(n \log n)$ sorting algorithm (like Mergesort or Quicksort in average case) is used, the overall worst-case efficiency of PresortMode
will be $\Theta(n \log n)$.
When the list is not sorted (Brute-Force Approach)
For an unsorted list, a brute-force approach would involve:
- Scanning the list to identify all distinct values.
- Computing the frequency of each distinct value.
- Finding the value with the highest frequency.
This can be implemented by maintaining an auxiliary list to store distinct values encountered so far, along with their frequencies.
Each element from the original list is compared against this auxiliary list. If a match is found, its frequency is incremented; otherwise, the element is added to the auxiliary list with a frequency of 1.
Time Efficiency for Unsorted List (Brute-Force): The worst-case scenario for this brute-force approach occurs when all elements in the input list are distinct. In this case, the i-th element of the original list is compared with i-1 elements already in the auxiliary list of distinct values before being added.
This results in a total of $\sum_{i=1}^{n} (i-1) = \frac{(n-1)n}{2}$ comparisons to build the frequency list. An additional $n-1$ comparisons are needed to find the maximum frequency.
Thus, the worst-case efficiency of the brute-force mode calculation is in $\Theta(n^2)$.