Unsupervised Learning: Descriptive Modeling

Unsupervised learning, also called descriptive learning, is the paradigm whose objective is to discover hidden patterns or intrinsic structures (natural groupings or associations) in an unlabeled dataset. This process is known as pattern discovery or knowledge discovery.

  • The algorithms work on the data without any prior training, but they are constructed in such a way that they can identify patterns, groupings, orderings, and other interesting structures in the data.
  • The process works on unlabeled and unclassified data (i.e., data without a known output, class, or label).

Process Flow: Unlabeled Data → Unsupervised Learning Model → Data Patterns

Applications of Unsupervised Learning

  • Customer Segmentation: Grouping consumers based on demography or purchasing habits.

  • Recommender Systems: Identifying products or media that similar users like.

  • Anomaly/Fraud Detection: Identifying unusual patterns in data (e.g., fraudulent transactions).

  • Document Clustering: Grouping similar texts or articles.

  • Dimensionality Reduction: Using techniques like Principal Component Analysis (PCA) to reduce the number of features in a dataset while retaining important information.
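
As a minimal sketch of the dimensionality-reduction idea, the following uses scikit-learn's PCA on made-up data (the dataset and the choice of two components are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 samples with 5 correlated features (illustrative only)
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])  # 5 features, rank ~2

# Project onto 2 principal components while retaining most of the variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # share of variance kept per component
```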

Comparison to Supervised Learning

| Feature | Supervised Learning (Predictive) | Unsupervised Learning (Descriptive) |
| --- | --- | --- |
| Data Type | Labeled training data (known output) | Unlabeled data (no known output) |
| Objective | Predict the class or value of unknown objects | Find groups, structures, or patterns |
| Output Variable | An explicit target variable (Y) exists | No target variable; input features (X) only |
| Examples | Classification, Regression | Clustering, Association Analysis, PCA |

In terms of statistics,

  • Unsupervised learning is closely related to density estimation: it models the distribution of the inputs, P(X), without reference to any output.
  • A supervised learning algorithm, in contrast, tries to learn the probability of outcome Y for a particular input X, which is called the posterior probability P(Y | X).
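
To make the density-estimation connection concrete, here is a minimal sketch, assuming scikit-learn's KernelDensity and made-up one-dimensional samples:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Unlabeled 1-D samples drawn from two overlapping groups (illustrative)
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2, 0.5, 200),
                    rng.normal(2, 0.8, 200)]).reshape(-1, 1)

# Fit a kernel density estimate -- no labels involved anywhere
kde = KernelDensity(kernel="gaussian", bandwidth=0.4).fit(X)

# score_samples returns log-density; high values mark the two natural modes
grid = np.linspace(-5, 5, 11).reshape(-1, 1)
print(np.exp(kde.score_samples(grid)))
```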

Types of Unsupervised Learning

The two major techniques are Clustering and Association Analysis.

A. Clustering

It involves grouping similar objects together and thereby discovering previously unknown subgroups in the data.

  • Grouping Principle: The goal is to partition a dataset into disjoint subsets (clusters). Objects within the same cluster should be highly similar (high intra-cluster similarity), while objects in different clusters should be highly dissimilar (low inter-cluster similarity).

  • Similarity Measure: The technique relies on a similarity measure, often a distance metric (e.g., Euclidean or Minkowski distance). Data items are assigned to the same cluster when the distance between them is small. This is known as distance-based clustering.
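
As a small illustration of distance-based similarity, the following computes Euclidean and Minkowski distances with SciPy; the points are made up for the example:

```python
from scipy.spatial.distance import euclidean, minkowski

a, b = [1.0, 2.0, 3.0], [4.0, 6.0, 8.0]

# Euclidean distance is the Minkowski distance with p = 2
print(euclidean(a, b))        # ~7.07
print(minkowski(a, b, p=2))   # same value
print(minkowski(a, b, p=1))   # Manhattan distance (p = 1)
```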

Fields where clustering analysis is used:

  • Text data mining, summarization
  • Customer segmentation
  • Anomaly detection

Common Techniques:

  1. Partitioning Methods: Divides data into non-overlapping subgroups (e.g., K-Means, K-Medoids).

  2. Hierarchical Methods: Builds a tree-like hierarchy of clusters, which can be agglomerative (bottom-up, starting with single clusters) or divisive (top-down, starting with one cluster).

  3. Density-Based Methods: Defines clusters as dense regions of data points (e.g., DBSCAN).

Performance Evaluation: Evaluating cluster quality is often subjective, as there is no "ground truth." A popular internal measure is the silhouette coefficient, which measures intra-cluster homogeneity and inter-cluster heterogeneity.
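
A brief sketch tying these pieces together, assuming scikit-learn and toy blob data: K-Means (a partitioning method) is fit with an illustrative choice of k = 3, and the result is scored with the silhouette coefficient:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data with three natural groupings (illustrative)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Partition the data into k = 3 disjoint clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Silhouette coefficient in [-1, 1]: higher means tighter,
# better-separated clusters
print(silhouette_score(X, labels))
```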

B. Association Analysis (Association Rule Learning)

Association analysis focuses on identifying "if-then" relationships or patterns between attributes in a dataset.

  • It extracts rules that best explain observed relationships between variables.

  • The Apriori algorithm is widely used for association rule learning.

  • The most common application is Market Basket Analysis, which finds strong associations like, "If a customer buys item A, they are also likely to buy item B."

The application of association analysis is also widespread in other domains such as bioinformatics, medical diagnosis, scientific data analysis, and web data mining.
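
A minimal market-basket sketch using the Apriori algorithm, assuming the third-party mlxtend library and a handful of made-up transactions:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up transactions (each row is one customer's basket)
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

# One-hot encode the baskets, then mine frequent itemsets with Apriori
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)
frequent = apriori(onehot, min_support=0.4, use_colnames=True)

# Extract "if-then" rules such as {diapers} -> {beer}
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```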

Reinforcement Learning (RL)

Reinforcement learning (RL) is a type of machine learning where an agent (the model) learns to make optimal decisions by interacting with an environment through a penalty/reward mechanism; it receives rewards for actions that lead to good outcomes and penalties (or no reward) for those that do not.

It is used for solving problems where decision-making is sequential and the goal is long-term (e.g., game playing or resource management).

1. The Reinforcement Learning Framework

RL attempts to emulate "learning by oneself" through trial and error.

Analogy (Child Learning to Walk):

  • The child is the agent.
  • The goal is the task (walking).
  • The floor with hurdles is the environment.
  • Successfully taking a step is a reward.
  • Falling is a penalty (negative reward).

2. The Markov Decision Process (MDP)

RL problems are formally described using the Markov Decision Process (MDP), which consists of a quadruplet E = ⟨X, A, P, R⟩:

  • X: The set of all possible states (a description of the environment).
  • A: The set of all possible actions the agent can perform.
  • P: The transition function (the probability of moving from one state to another given an action).
  • R: The reward function (the reward returned by the environment after a state transition).

The agent influences the environment by taking actions, but it perceives the environment only by observing the resulting state transitions and the returned rewards.
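
As a concrete illustration, the quadruplet for a tiny, made-up two-state environment can be written out directly in Python (every state, action, probability, and reward below is invented for the example):

```python
# A toy two-state MDP written out explicitly (all numbers are made up)
X = ["s0", "s1"]          # states
A = ["stay", "move"]      # actions

# P[(state, action)] -> {next_state: probability}
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# R[(state, action, next_state)] -> reward for that transition
R = {
    ("s0", "move", "s1"): +1.0,   # reaching s1 is rewarded
    ("s1", "move", "s0"): -1.0,   # leaving s1 is penalized
}
# Any transition not listed yields reward 0 (look up with R.get(key, 0.0))
```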

The Objective: Learning a Policy

The agent's goal is to learn a policy (π), which is a "strategy" that selects the best action to take in any given state and maximizes the long-term cumulative rewards.

  • Policy Representation: A policy can be deterministic (π: X → A, mapping a state to a single action) or stochastic (π: X × A → [0, 1], mapping a state-action pair to a probability).
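
In code, the two policy forms for such a toy environment might look like the following hand-written (not learned) examples:

```python
# Deterministic policy: maps each state to exactly one action
def pi_det(state: str) -> str:
    return {"s0": "move", "s1": "stay"}[state]

# Stochastic policy: maps a (state, action) pair to a probability
def pi_stoch(state: str, action: str) -> float:
    table = {
        "s0": {"stay": 0.1, "move": 0.9},
        "s1": {"stay": 0.8, "move": 0.2},
    }
    return table[state][action]

print(pi_det("s0"))            # "move"
print(pi_stoch("s0", "move"))  # 0.9
```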

Comparison with Supervised Learning

While both use "feedback," the mechanism is different:

  • Sequential Decisions: RL handles sequential, long-term goals, whereas supervised learning typically involves single-shot predictions.

  • Time-Delayed Labels: In supervised learning, the label (feedback) is immediate and explicit. In RL, the feedback (reward) is often time-delayed; an agent may not know if an action was "good" or "bad" until many steps later.

  • No Labels: In RL, there are no labeled samples to tell the agent what to do for a given state.

RL Sub-Paradigms and Techniques

Model-Based vs. Model-Free:

  • Model-Based: The agent knows or learns the environment's model (P and R).

  • Model-Free: The agent does not know the model and learns purely from trial and error (e.g., Q-Learning).

Exploration vs. Exploitation: The agent must balance Exploitation (using the best action it knows) with Exploration (trying new actions to discover if they are better). This is studied in the K-Armed Bandit problem.
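
A compact, self-contained sketch combining the ideas above: tabular Q-Learning (model-free) with an ε-greedy rule that trades off exploration and exploitation, run on the same made-up two-state MDP (all hyperparameters are illustrative):

```python
import random

random.seed(0)

# Toy two-state environment (same made-up MDP as sketched earlier)
X, A = ["s0", "s1"], ["stay", "move"]
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}
R = {("s0", "move", "s1"): 1.0, ("s1", "move", "s0"): -1.0}

Q = {(s, a): 0.0 for s in X for a in A}   # tabular action-value estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.2     # illustrative hyperparameters

state = "s0"
for _ in range(5000):
    # epsilon-greedy: explore with probability epsilon, otherwise exploit
    if random.random() < epsilon:
        action = random.choice(A)
    else:
        action = max(A, key=lambda a: Q[(state, a)])
    # Sample the environment: next state from P, reward from R
    nexts = P[(state, action)]
    s_next = random.choices(list(nexts), weights=list(nexts.values()))[0]
    reward = R.get((state, action, s_next), 0.0)
    # Q-Learning update toward reward + discounted best future value
    best_next = max(Q[(s_next, a)] for a in A)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = s_next

# Greedy policy read off Q: should prefer "move" in s0 and "stay" in s1
print({s: max(A, key=lambda a: Q[(s, a)]) for s in X})
```

The agent never consults P or R directly during learning; it improves purely from sampled transitions and rewards, which is what makes the method model-free.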

Imitation Learning: Learning from examples of decisions provided by human experts (demonstrations).

Inverse Reinforcement Learning (IRL): Deriving the (unknown) reward function by observing an expert.

Reinforcement Learning from Human Feedback (RLHF): An alignment technique, critical to modern Large Language Models (LLMs), where a model is trained using feedback (rewards) generated by human overseers to align its behavior with human expectations.

Applications

  • Game Playing: Intelligent game bots (e.g., DeepMind's AlphaGo).

  • Self-Driving Cars: Making real-time decisions about speed, steering, and braking.

  • Robotics: Training robots to navigate complex environments or manipulate objects.

  • Resource Management: Optimizing energy consumption, logistics, or financial trading.

Comparison: Supervised, Unsupervised, and Reinforcement Learning

| Feature | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
| --- | --- | --- | --- |
| Model Type | Predictive Model | Descriptive Model | Policy/Agent Learning |
| Input Data | Labeled training data. | Unlabeled data. | State X, Action A, and Reward R from the environment. |
| Core Goal | Prediction (class or value). | Pattern/Group Discovery. | Maximizing long-term cumulative rewards. |
| Mechanism | Guided learning from labeled input. | Self-discovery of intrinsic relationships. | Trial-and-error using a reward/penalty system. |
| Feedback | Immediate (label provided in data). | None / Implicit in data structure. | Time-delayed (received after actions lead to outcomes). |
| Output | A Classification or Regression model. | Clusters, Association Rules, or reduced dimensions. | An optimal Policy (a function mapping states to actions). |
| Examples | Tumor prediction, stock price forecasting. | Customer segmentation, Market Basket Analysis. | Self-driving cars, game playing (AlphaGo). |

Other Learning Types

Semi-Supervised Learning

  • A hybrid approach that uses both labeled and unlabeled data for training, typically a small amount of labeled data and a large amount of unlabeled data.

  • The goal is the same as supervised learning (prediction), but it leverages the unlabeled data to improve model accuracy.
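
A minimal semi-supervised sketch, assuming scikit-learn's SelfTrainingClassifier and made-up data, where unlabeled points are marked with -1 (scikit-learn's convention):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy dataset: keep true labels for roughly 10% of points,
# mark the rest as unlabeled with -1
X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) > 0.1] = -1

# Wrap a supervised base learner so it can pseudo-label the unlabeled points
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)

print(model.predict(X[:5]))
```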

Active Learning

  • A form of supervised learning applied when obtaining labeled examples is expensive or time-consuming.

  • The learning algorithm interactively queries the user to obtain labels for new data points that it calculates would be the most informative for improving its model.
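
As an illustrative (hypothetical) sketch of that query loop, the following uses uncertainty sampling: at each round the model asks for the label of the pool point it is least sure about. Here an "oracle" array stands in for the human annotator:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# The "oracle" array stands in for a human annotator who answers queries
X, oracle = make_classification(n_samples=500, random_state=0)

labeled = list(range(20))          # small initial labeled set
pool = list(range(20, len(X)))     # unlabeled pool the model may query

model = LogisticRegression(max_iter=1000)
for _ in range(20):                # illustrative query budget
    model.fit(X[labeled], oracle[labeled])
    # Uncertainty sampling: query the pool point whose top-class
    # probability is lowest (the model's least confident prediction)
    proba = model.predict_proba(X[pool])
    query = pool[int(np.argmin(proba.max(axis=1)))]
    labeled.append(query)          # "ask the user" for this point's label
    pool.remove(query)

print(model.score(X, oracle))
```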