Unravel the mysteries of unsupervised learning, a powerful technique that enables machines to act as autonomous data analysts, extracting valuable insights with minimal human guidance.
Table of contents
- What is unsupervised learning?
- Unsupervised vs. supervised learning
- How unsupervised learning works
- Types of unsupervised learning
- Applications of unsupervised learning
- Advantages of unsupervised learning
- Disadvantages of unsupervised learning
What is unsupervised learning?
Unsupervised learning is a type of machine learning (ML) that finds patterns and relationships within data on its own. The term unsupervised means the model trains on unlabeled data: it gets no instructions from humans on what to look for, or even guidance on what it’s looking at. Instead, it uses algorithms to evaluate datasets and find correlations, similarities, differences, and other ways to describe the data mathematically.
Machine learning is a subset of artificial intelligence (AI) that uses data and statistical methods to build models that mimic human reasoning rather than relying on hard-coded instructions. Unsupervised learning takes an exploratory, data-driven approach to draw conclusions from large datasets, such as grouping entities by common characteristics or finding which data points tend to co-occur—which could play out as sorting pictures of deciduous from evergreen trees, or finding that people who stream Sesame Street are likely to watch Daniel Tiger too.
Unsupervised vs. supervised learning
In contrast to unsupervised methods, supervised learning uses labeled data that pairs inputs with the correct outputs. Unsupervised learning has no such input-output pairs for the model to learn from, only raw data to analyze.
Labels provide the so-called supervision of the model’s learning process, guiding it to reverse-engineer its way to the correct answer from a given input. Using supervised learning makes sense when you have this sort of data that the model can aim toward and extrapolate from, including:
- Yes or no decisions, such as spam or fraud detection
- Classification, such as identifying objects within an image or speech recognition
- Forecasting, such as home prices or weather
Unsupervised learning, by contrast, isn’t for arriving at the right answer but rather for finding patterns or groupings within data. The three main applications are:
- Clustering, such as customer segmentation or document grouping
- Association, such as recommendation engines or security anomalies
- Dimensionality reduction, generally used to compress large datasets to make them more manageable
Machine learning isn’t limited to just supervised or unsupervised methods; these are merely two ends of a spectrum. Other types of machine learning methods include semi-supervised, reinforcement, and self-supervised learning.
How unsupervised learning works
Unsupervised learning is conceptually simple: Algorithms process large amounts of data to determine how various data points are related. Because the data is unlabeled, there is no predefined target or “right answer.” The model is simply trying to find patterns and other characteristics of the data.
Here’s a brief overview of the unsupervised learning process:
1 Data collection and cleaning. Most unsupervised algorithms evaluate a single table at a time (one row per data point, one column per feature), so if you have multiple datasets, you must carefully merge them. It’s also important to tidy up the data to the best of your ability, such as removing duplicates and correcting errors.
2 Feature scaling. Unsupervised algorithms can be thrown off by features with very different ranges, so consider transforming features into tighter, comparable ranges, as in the code sketch after this list, using techniques including:
- Normalization (min-max scaling): maps the highest value to 1, the lowest value to 0, and everything else to a proportional decimal in between.
- Standardization: rescales values so that the mean is 0 and the standard deviation is 1, with each data point adjusted accordingly.
- Logarithmic transformation: compresses wide ranges, so with a base-10 logarithm, 100,000 becomes 5, and 1,000,000 becomes 6.
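To make these transformations concrete, here’s a minimal sketch in Python using NumPy and scikit-learn (the libraries and sample values are our own illustrative choices, not prescribed by any particular workflow):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature with a wide range, e.g., transaction amounts in dollars
amounts = np.array([[3.0], [45.0], [120.0], [9500.0], [1000000.0]])

# Normalization (min-max): lowest value -> 0, highest -> 1
normalized = MinMaxScaler().fit_transform(amounts)

# Standardization: rescales so the mean is 0 and the standard deviation is 1
standardized = StandardScaler().fit_transform(amounts)

# Logarithmic transformation: compresses the wide range
log_scaled = np.log10(amounts)  # 1,000,000 becomes 6.0

print(normalized.ravel())    # values between 0 and 1
print(standardized.ravel())  # mean ~0, standard deviation ~1
print(log_scaled.ravel())    # e.g., [0.48, 1.65, 2.08, 3.98, 6.0]
```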
3 Algorithm selection. There are multiple algorithms for each type of unsupervised learning, each with strengths and weaknesses (we’ll go through them in the next section). You may choose to apply different algorithms to the same dataset and compare.
4 Pattern discovery and identification. The chosen algorithm gets to work. This can take seconds to hours, depending on the size of the dataset and the algorithm’s efficiency. If you have a large dataset, you may want to run the algorithm on a subset before processing the whole thing.
5 Interpretation. At this stage, it’s time for humans to take over. A data analyst can use charts, spot checks, and various calculations to analyze and interpret the data.
6 Application. Once you’re confident you’re getting useful results, put it to use. We’ll talk about some applications of unsupervised learning later on.
Types of unsupervised learning
There are several types of unsupervised learning, but the three most widely used are clustering, association rules, and dimensionality reduction.
Clustering
Clustering creates groups of data points. It’s especially useful for bundling items that are similar to each other so they can later be classified by human analysis. For instance, if you have a dataset that includes customer age and average transaction dollar amount, a clustering algorithm might find groupings that help you decide where to target your ad dollars.
Types of clustering include:
- Exclusive or hard clustering. Each data point can belong to only one cluster. One popular approach, k-means, requires you to specify how many clusters you want to create, though other algorithms can determine the optimal number themselves (see the sketch after this list).
- Overlapping or soft clustering. This approach allows a data point to be in multiple clusters and have a “degree” of membership in each rather than purely in or out.
- Hierarchical clustering. This approach organizes small clusters into larger and larger ones, forming a tree. Done bottom-up, by repeatedly merging the closest clusters, it’s called hierarchical agglomerative clustering, or HAC; done top-down, by repeatedly splitting one all-encompassing cluster, it’s called divisive clustering.
- Probabilistic clustering. This approach calculates the probability of each data point belonging to each cluster. One advantage is that it can assign a data point a very low probability of belonging to any cluster, which might highlight anomalous or corrupt data.
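To make the hard vs. probabilistic distinction concrete, here’s a minimal sketch using scikit-learn’s KMeans and GaussianMixture; the two-feature customer data and the choice of three clusters are illustrative assumptions, not part of any prescribed method:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Hypothetical customers: [age, average transaction amount in dollars]
customers = np.vstack([
    rng.normal([25, 40], [3, 10], size=(50, 2)),   # younger, small purchases
    rng.normal([45, 250], [5, 40], size=(50, 2)),  # middle-aged, larger purchases
    rng.normal([65, 90], [4, 20], size=(50, 2)),   # older, mid-sized purchases
])

# Exclusive (hard) clustering: each customer lands in exactly one cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_[:5])  # one cluster label per customer

# Probabilistic clustering: each customer gets a degree of membership per cluster
gmm = GaussianMixture(n_components=3, random_state=0).fit(customers)
print(gmm.predict_proba(customers[:1]).round(3))  # e.g., [[0.998 0.001 0.001]]
```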
Association rules
Also known as association rule mining or association rule learning, this approach finds interesting relationships between data points. The most common use of association rules is to figure out which items are commonly bought or used together so the model can suggest the next thing to buy or show to watch.
The three core concepts of association rules, computed in the sketch after this list, are:
- Support. How frequently are A and B found together as a percentage of all the available instances (e.g., transactions)? A and B can be individual items or sets representing multiple items.
- Confidence. When A is seen, how often is B also seen?
- Lift. How likely are A and B to be seen together, compared to what you’d expect if they were independent? Lift is the measure of the “interestingness” of an association.
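Here’s a minimal sketch that computes all three metrics in plain Python over a handful of made-up shopping baskets (the items echo the market basket example later in this article):

```python
# Hypothetical shopping baskets; item names are illustrative
transactions = [
    {"hot dogs", "buns", "ketchup"},
    {"hot dogs", "buns"},
    {"milk", "bread"},
    {"hot dogs", "ketchup"},
    {"buns", "ketchup"},
    {"milk", "hot dogs", "buns"},
]

def support(itemset):
    """Fraction of all transactions that contain every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b):
    """Of the transactions containing A, what fraction also contain B?"""
    return support(a | b) / support(a)

def lift(a, b):
    """How much more often do A and B co-occur than if they were independent?"""
    return support(a | b) / (support(a) * support(b))

a, b = {"hot dogs"}, {"buns"}
print(f"support:    {support(a | b):.2f}")    # 0.50 -> half of all baskets
print(f"confidence: {confidence(a, b):.2f}")  # 0.75 -> buns in 3 of 4 hot-dog baskets
print(f"lift:       {lift(a, b):.2f}")        # >1 means positively correlated
```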
Dimensionality reduction
Dimensionality refers to the number of columns in a table; other terms for columns in this context are features or attributes. As the number of features in a dataset grows, analyzing the data and achieving optimal results becomes more challenging.
High-dimensional data takes more time, computing power, and energy to process. It can also lead to substandard outputs. One particularly pernicious example is overfitting, the tendency of machine learning models to learn too much from the details in the training data at the expense of broader patterns that generalize well to new data.
Dimensionality-reduction algorithms create simplified datasets by condensing the original data into smaller, more manageable versions that retain the most important information. They work by merging correlated features and keeping track of how each data point varies from the general trend, effectively reducing the number of columns without losing key details.
For instance, if you had a dataset about hotels and their amenities, the model might find that many features are correlated with the star rating, so it could compress attributes such as spa, room service, and 24-hour reception into a single column.
Typically, engineers reduce dimensionality as a pre-processing step to improve the performance and outcomes of other processes, including but not limited to clustering and association rule learning.
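This article doesn’t name a specific algorithm, but principal component analysis (PCA) is one widely used option. Here’s a minimal sketch on made-up hotel-amenity data echoing the example above:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical hotels: three amenity columns that all track an underlying
# "upscale" factor, so they are strongly correlated with one another
upscale = rng.random(200)
amenities = np.column_stack([
    (upscale + rng.normal(0, 0.2, 200)) > 0.5,  # spa
    (upscale + rng.normal(0, 0.2, 200)) > 0.5,  # room service
    (upscale + rng.normal(0, 0.2, 200)) > 0.5,  # 24-hour reception
]).astype(float)

# Compress the three correlated columns into a single component
pca = PCA(n_components=1)
compressed = pca.fit_transform(amenities)

print(compressed.shape)                        # (200, 1): one column, not three
print(pca.explained_variance_ratio_.round(2))  # share of the variation retained
```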
Applications of unsupervised learning
Some examples include:
- Market basket analysis. Retailers make abundant use of association rules. For instance, if you’ve put hot dogs in your grocery shopping cart, the store’s system may suggest ketchup and hot dog buns because it has seen a high lift for those combinations among other shoppers. The same data may also lead the retailer to put ketchup and hot dogs next to each other in the supermarket.
- Recommendation engines. These look at your personal data—demographics and behavior patterns—and compare it to others’ to guess what you might enjoy buying or watching next. They can use the three types of unsupervised learning: clustering to determine which other customers’ patterns might predict yours, association rules to find correlations between certain activities or purchases, and dimensionality reduction to make complex datasets easier to process.
- Customer segmentation. While marketers have been dividing their audiences into named categories for decades, unsupervised clustering can pick out groupings that may not have been on any human’s mind. This approach allows for behavior-based analysis and can help teams target messaging and promotions in new ways.
- Anomaly detection. Because it’s very good at learning what normal looks like, unsupervised learning is often used to raise an alert when something abnormal happens (see the sketch after this list). Uses include flagging fraudulent credit card purchases, spotting corrupted data in a table, and identifying arbitrage opportunities in financial markets.
- Speech recognition. Speech is complicated for computers to parse, as they have to contend with background noise, accents, dialects, and voices. Unsupervised learning helps speech recognition engines learn which sounds correlate with which phonemes (units of speech) and which phonemes are typically heard together, in addition to filtering background noise and other enhancements.
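As one concrete illustration of anomaly detection, here’s a minimal sketch using scikit-learn’s IsolationForest, a common unsupervised anomaly detector; the transaction data and contamination setting are made-up assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Hypothetical credit card transactions: [amount in dollars, hour of day]
normal = np.column_stack([rng.normal(60, 20, 500), rng.normal(14, 3, 500)])
odd = np.array([[4800.0, 3.0]])  # a huge purchase at 3 a.m.
transactions = np.vstack([normal, odd])

# Fit on unlabeled data; the model learns what "typical" looks like
detector = IsolationForest(contamination=0.01, random_state=0).fit(transactions)

# predict() returns 1 for inliers and -1 for anomalies
flags = detector.predict(transactions)
print(flags[-1])            # -1: the extreme purchase gets flagged
print((flags == -1).sum())  # roughly 1% of transactions flagged
```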
Advantages of unsupervised learning
- Low human involvement. Once an unsupervised learning system is proven reliable, running it takes little effort beyond ensuring the inputs and outputs are properly routed.
- Works on raw data. There’s no need to provide labels—that is, to specify what output should result from a given input. This capability to handle data as it comes is extremely valuable when dealing with enormous amounts of untouched data.
- Hidden pattern discovery. With no goal or agenda other than finding patterns, unsupervised learning can point you to “unknown knowns”—conclusions based on data you hadn’t previously considered but that make sense once presented. This approach is particularly useful for finding needles in haystacks, such as analyzing DNA for the cause of cell death.
- Data exploration. By reducing dimensionality and finding patterns and clusters, unsupervised learning gives analysts a head start on making sense of novel datasets.
- Incremental training. Many unsupervised models can learn as they go: As more data comes in, they can evaluate the latest input in relation to what they’ve already discovered, which takes far less time and computing effort than retraining from scratch (see the sketch after this list).
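For instance, scikit-learn’s MiniBatchKMeans can learn incrementally through its partial_fit method; here’s a minimal sketch with a simulated stream of data (the chunked setup is an illustrative assumption):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(1)
model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)

# Simulate data arriving in chunks; each call refines the existing clusters
for _ in range(10):
    offsets = rng.choice([-5.0, 0.0, 5.0], size=(100, 1))
    chunk = rng.normal(0, 1, size=(100, 2)) + offsets
    model.partial_fit(chunk)  # updates centroids without retraining from scratch

print(model.cluster_centers_.round(1))  # centers near (-5, -5), (0, 0), and (5, 5)
```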
Disadvantages of unsupervised learning
- You need a lot of data. Unsupervised learning is prone to big mistakes if trained on limited examples. It might find patterns in the data that don’t hold in the real world (overfitting), change dramatically in the face of new data (instability), or not have enough information to determine anything meaningful (limited pattern discovery).
- Low interpretability. It might be hard to understand why an algorithm reached a particular conclusion, such as the logic behind a given clustering.
- False positives. An unsupervised model might read too much into anomalous but unimportant data points without labels to teach it what’s worth attention.
- Hard to systematically evaluate. Since there is no “right” answer to compare it to, there’s no straightforward way to measure the accuracy or utility of the output. The issue can be somewhat mitigated by running different algorithms on the same data, but in the end, the measure of quality will be largely subjective.