P, NP, NP-Complete, and NP-Hard Problems
Page 427
In computer science, computational complexity theory is a way of classifying problems based on how difficult they are to solve.
We focus on decision problems, which are questions with a simple "yes" or "no" answer. A problem's difficulty is measured by its time complexity—how the runtime of the best possible algorithm grows as the input size (n
) increases.
A problem is considered tractable (or "easy") if it can be solved in polynomial time, meaning its worst-case runtime is O(n^k) for some constant k. These problems are solvable in a practical amount of time, even for large inputs.
Problems that require more than polynomial time (e.g., exponential time, O(2^n) ) are called intractable (or "hard"). As the input size grows, the time needed to solve them becomes impossibly long.
Class P (Polynomial Time)
P is the class of decision problems that can be solved by a deterministic algorithm in polynomial time. These are the tractable problems that we consider "efficiently solvable."
- Examples: Searching for an element in a sorted list, Sorting an array of numbers, Finding the shortest path between two points in a graph.
Class NP (Nondeterministic Polynomial Time)
NP is the class of decision problems for which a proposed solution can be verified as right or wrong in polynomial time even if finding it might be hard. A "yes" answer can be verified quickly.
A nondeterministic algorithm for NP problems has two stages: guessing a candidate solution and verifying it using a deterministic polynomial-time algorithm that checks if the candidate in a solution.
All problems in P are also in NP (P ⊆ NP), but NP contains many harder combinatorial optimization problems such as the Hamiltonian circuit, traveling salesman (decision version), and knapsack (decision version) problems for which no polynomial-time algorithms have been found yet.
Analogy: A Sudoku puzzle. Solving a hard Sudoku can take a long time, but if completed grid is given, very quickly it can be checked if it follows all the rules.
Relationship to P: All problems in P are also in NP (written as P ⊆ NP). If you can solve a problem quickly, you can certainly verify a solution quickly.
Class NP-Complete (The Hardest in NP)
NP-Complete problems are the "hardest" problems within the NP class. They have two defining properties:
The problem itself is in NP (solutions can be verified quickly).
Every other problem in NP can be polynomially reduced to it.
This means that an arbitrary instance of any NP problem can be transformed into an instance of an NP-complete problem in polynomial time.
Reduction means that you had an algorithm to solve just one NP-Complete problem, you could use it to solve every problem in NP efficiently. Therefore, finding a polynomial-time solution for a single NP-Complete problem would prove that P = NP.
Key Idea: The hardest problems in NP that are all linked together.
The Hamiltonian Circuit Problem: Is there a path in a graph that visits every node exactly once and returns to the start?
Boolean Satisfiability (SAT): Is there an assignment of true/false values that makes a given complex logical expression true?
Graph Coloring: Can the nodes of a graph be colored with 'k' colors so that no two adjacent nodes share the same color?
Class NP-Hard (At Least as Hard as NP-Complete)
NP-Hard problems are at least as difficult as NP-Complete problems, but they don't have to be in NP themselves. They are not necessarily decision problems.
Distinction: An NP-Hard problem does not need to have solutions that are verifiable in polynomial time. Many are optimization problems (which ask "what is the best solution?") rather than decision problems. No polynomial-time algorithms are known for NP-hard problems, and it is widely believed that none exist.
The Traveling Salesperson Problem (Optimization Version): Find the absolute shortest possible route that visits a set of cities and returns to the origin. This asks for a value (the shortest distance), not a yes/no answer, so it's not in NP, but it is NP-Hard.
The Knapsack Problem (Optimization Version): Given a set of items with weights and values, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
Approximation Algorithms
Approximation algorithms are a crucial approach for handling NP-hard problems, such as the Traveling Salesman Problem (TSP) and the Knapsack Problem, for which no known polynomial-time exact algorithms exist. These algorithms aim to find good, though not necessarily optimal, solutions within an acceptable amount of time.
- Feasible Solution In the context of optimization problems is a point in the problem’s search space that satisfies all the problem’s constraints. For example, in the Traveling Salesman Problem, a Hamiltonian circuit is a feasible solution, and for the Knapsack Problem, a subset of items whose total weight does not exceed the knapsack’s capacity is a feasible solution.
1. Two Approximation Algorithms for the Traveling Salesman Problem (TSP)
The Traveling Salesman Problem (TSP) involves finding the shortest tour through a given set of n cities that visits each city exactly once before returning to the starting city. This problem has important applications in route planning and manufacturing.
Here are two approximation algorithms used to solve the TSP:
Nearest-Neighbor Algorithm : This is a simple greedy algorithm. It starts at an arbitrary city and, at each step, visits the nearest unvisited city until all cities have been included in the tour. Finally, it returns to the starting city to complete the circuit.
- Example: If the algorithm starts at city 'a', it would choose the closest city to 'a', then the closest unvisited city to that one, and so on.
- Performance: While straightforward due to its simplicity, the nearest-neighbor algorithm has a significant drawback: its accuracy ratio is unbounded. This means that for some instances, the length of the approximate tour can be arbitrarily longer than the optimal tour. Its performance ratio (RA) is considered to be infinity.
Twice-Around-the-Tree Algorithm : This algorithm leverages the concept of a Minimum Spanning Tree (MST):
- Construct a Minimum Spanning Tree (MST) of the graph corresponding to the TSP instance.
- Starting at an arbitrary vertex, perform a walk around the MST, recording all vertices as they are encountered (e.g., using a Depth-First Search traversal).
- Scan the recorded list of vertices and eliminate all repeated occurrences except for the starting vertex at the end of the list. This step creates "shortcuts". The resulting sequence of unique vertices forms a Hamiltonian circuit, which is the algorithm's approximate solution.
2. Approximation Algorithms for the Knapsack Problem
The Knapsack Problem involves selecting the most valuable subset of items that fit into a knapsack of a given capacity, given their weights and values. This is also an NP-hard problem.
Approximation algorithms for the Knapsack Problem include:
Greedy Algorithms : A common greedy strategy for the knapsack problem is to compute the value-to-weight ratio (vi/wi) for each item. Items are then selected and added to the knapsack in decreasing order of these ratios.
Continuous (Fractional) Knapsack Problem: If taking arbitrary fractions of items is permitted, this greedy algorithm always yields an exact optimal solution. The steps involve computing the ratios, sorting items by these ratios, and then taking items from the top of the sorted list, including a fraction of an item if needed to fill the knapsack's remaining capacity. The optimal value found for the continuous version can serve as an upper bound for the discrete version.
Discrete (0-1) Knapsack Problem: For the more common discrete version, where items must be taken entirely or not at all, this simple greedy algorithm does not guarantee an optimal solution. However, an enhanced greedy algorithm for the discrete knapsack problem has a performance ratio of 2, meaning its approximate solution's value is at least half of the optimal value.
Polynomial-Time Approximation Schemes (PTAS) : For the discrete knapsack problem, more sophisticated approaches exist as polynomial-time approximation schemes. These are families of algorithms that can achieve arbitrarily high levels of accuracy. The accuracy level is typically controlled by a parameter, k.
- Sahni's Scheme: An example is Sahni's scheme. It works by considering all subsets of the k most valuable items (or items whose total weight doesn't exceed a threshold
W0
) that fit into the knapsack. For each of these subsets, it then fills the remaining knapsack capacity by selecting items from the rest of the set using the greedy approach (based on value-to-weight ratios).
- Sahni's Scheme: An example is Sahni's scheme. It works by considering all subsets of the k most valuable items (or items whose total weight doesn't exceed a threshold