Skip to content

Transform and Conquer

Presorting

Page 228

Presorting is an algorithm design technique that involves preprocessing a problem's input by sorting it to make the problem easier or more efficient to solve afterward. It is a widely used idea in computer science, as many questions about a list are simpler to answer if the list is sorted.

It transforms a given instance into a simpler or more convenient instance of the same problem. The benefits of sorting should ideally outweigh the time spent on the sorting itself.

Presorting for the Element Uniqueness Problem

How can presorting help solve the Element Uniqueness Problem? Write an algorithm and analyze its time efficiency.

The Element Uniqueness Problem involves checking whether all elements in a given array of n elements are distinct.

Presorting allows us to solve the problem more efficiently. After sorting the array, if there are any equal elements, they must be adjacent to each other. If a pair of equal adjacent elements is found, the algorithm returns "false"; otherwise, after checking all consecutive pairs, it returns "true".

Therefore, the algorithm only needs to check consecutive elements in the sorted array. Instead of comparing every pair of elements (as a brute-force approach would, which has a worst-case efficiency of $\Theta(n^2)$),

ALGORITHM PresortElementUniqueness(A[0..n − 1])
//Solves the element uniqueness problem by sorting the array first
//Input: An array A[0..n − 1] of orderable elements
//Output: Returns “true” if A has no equal elements, “false” otherwise

sort the array A       // Using an efficient sorting algorithm
for i <- 0 to n − 2 do
	if A[i] = A[i + 1]
		return false
return true

Time Efficiency Analysis:

The running time of the PresortElementUniqueness algorithm is determined by the sum of the time spent on sorting the array and the time spent on checking consecutive elements.

  1. Sorting Part ( sort the array A ):

    • If a good sorting algorithm like Mergesort is employed, its worst-case efficiency is in $\Theta(n \log n)$.
  2. Checking Part ( for i ← 0 to n − 2 do ):

    • This part of the algorithm scans the sorted array, making at most $n - 1$ comparisons between adjacent elements.
    • This part, therefore, has a time complexity in $\Theta(n)$.

Overall Time Efficiency: The total running time T(n) is the sum of the sorting time Tsort(n) and the scanning time Tscan(n):

T(n) = Tsort(n) + Tscan(n)

If an $\Theta(n \log n)$ sorting algorithm is used: T(n) = Θ(n log n) + Θ(n)

The term with the higher order of growth dominates the sum. Therefore, the worst-case efficiency of the entire presorting-based algorithm for element uniqueness is in $\Theta(n \log n)$.

This is a more efficient asymptotic class compared to the $\Theta(n^2)$ efficiency of the brute-force algorithm.

Algorithm to Find the Mode of a Dataset Using Presorting and Time Complexity

Write an algorithm to find the mode of a dataset using presorting and discuss time complexity. (i) When the list is not sorted
(ii) When the list is pre-sorted
Compare the time efficiency of both approaches.

A mode is defined as a value that occurs most frequently in a given list of numbers. If multiple values have the same highest frequency, any one of them can be considered the mode.

Using presorting to find the mode: Sorting the array first brings all identical values together, forming "runs" of equal adjacent elements.

Once sorted, the algorithm can simply scan the array to find the longest run of adjacent equal values, and that value will be the mode.

ALGORITHM PresortMode(A[0..n − 1])
// Computes the mode of an array by sorting it first
// Input: An array A[0..n − 1] of orderable elements
// Output: The array’s mode

sort the array A
i <- 0                 // current run begins at position i
mode_frequency <- 0    // highest frequency seen so far
mode_value <- null     // value of the highest frequency

while i ≤ n − 1 do
	run_length <- 1
	run_value <- A[i]
	while i + run_length ≤ n − 1 
		AND A[i + run_length] = run_value
	  run_length <- run_length + 1

	if run_length > mode_frequency
		mode_frequency <- run_length
		mode_value <- run_value

	i <- i + run_length   //beginning of the next distinct run

return mode_value

Time Efficiency Analysis of PresortMode: The algorithm's running time is primarily dictated by the sorting step, as the subsequent scan to find the longest run of adjacent elements takes linear time ($\Theta(n)$).

Therefore, if an $\Theta(n \log n)$ sorting algorithm (like Mergesort or Quicksort in average case) is used, the overall worst-case efficiency of PresortMode will be $\Theta(n \log n)$.

When the list is not sorted (Brute-Force Approach)

For an unsorted list, a brute-force approach would involve:

  1. Scanning the list to identify all distinct values.
  2. Computing the frequency of each distinct value.
  3. Finding the value with the highest frequency.

This can be implemented by maintaining an auxiliary list to store distinct values encountered so far, along with their frequencies.

Each element from the original list is compared against this auxiliary list. If a match is found, its frequency is incremented; otherwise, the element is added to the auxiliary list with a frequency of 1.

Time Efficiency for Unsorted List (Brute-Force): The worst-case scenario for this brute-force approach occurs when all elements in the input list are distinct. In this case, the i-th element of the original list is compared with i-1 elements already in the auxiliary list of distinct values before being added.

This results in a total of $\sum_{i=1}^{n} (i-1) = \frac{(n-1)n}{2}$ comparisons to build the frequency list. An additional $n-1$ comparisons are needed to find the maximum frequency.

Thus, the worst-case efficiency of the brute-force mode calculation is in $\Theta(n^2)$.

Made with ❤️ for students, by a fellow learner.