Visualizing Embedding Spaces

Modern embeddings have hundreds or thousands of dimensions. You can't plot that directly. So we reduce to two or three dimensions first.

The standard approaches:

PCA (Principal Component Analysis): linear. Captures global linear relationships. Fast. Use it when you want to find the major axes of variation in your data.
t-SNE: nonlinear. Constructs a low-dimensional representation where locally similar points stay close together. Good for revealing clusters. It does not preserve global structure, so distances between clusters in a t-SNE plot are not reliable: two clusters that look far apart may not be far apart in the original space.
UMAP: nonlinear, based on manifold learning. Similar goals to t-SNE but typically faster and better at preserving some global structure too.
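
To make the PCA entry concrete, here is a minimal sketch of a 2D PCA projection using NumPy's SVD. The `pca_project` helper and the random stand-in embeddings are assumptions for illustration, not part of any particular library; in practice you would pass in your own embedding matrix.

```python
import numpy as np

def pca_project(embeddings: np.ndarray, n_components: int = 2):
    """Project rows of `embeddings` onto their top principal components."""
    # Center the data: PCA is defined on mean-subtracted features.
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    # Fraction of total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:n_components]

# Hypothetical stand-in for real embeddings: 500 random 300-d vectors.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 300))

coords, explained = pca_project(embeddings, n_components=2)
print(coords.shape)      # (500, 2)
print(explained.sum())   # share of total variance kept in the 2D view
```

The explained-variance share is worth checking before trusting the plot: if two components capture only a small fraction of the variance, the 2D picture is leaving most of the structure out.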

For large embedding spaces where you want to see clusters and local relationships, prefer t-SNE or UMAP over PCA.
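
As a rough sketch of the nonlinear route, the following runs scikit-learn's `TSNE` on synthetic clustered data. The three Gaussian blobs, the 50-dimensional size, and the `perplexity=30.0` setting are all assumptions chosen for illustration; real embeddings will usually need some tuning.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical data: three well-separated Gaussian blobs in 50 dimensions,
# 100 points each, standing in for real embeddings.
centers = rng.normal(scale=10.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])
labels = np.repeat(np.arange(3), 100)  # blob index, useful for coloring a plot

# perplexity roughly controls how many neighbors each point attends to;
# it must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
coords = tsne.fit_transform(X)

print(coords.shape)  # (300, 2)
```

Assuming the separate `umap-learn` package is installed, UMAP follows the same pattern: `umap.UMAP(n_components=2).fit_transform(X)`. In either case, read the output for which points group together, not for how far apart the groups sit.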

TensorFlow: visualize high-dimensional data and explore word embeddings in 2D and 3D using PCA, t-SNE, and UMAP.