When to Still Use Traditional ML for NLP

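These models share several serious limitations: they don't carry context across long distances; they treat each word form as an atomic symbol, so they can't tell apart the senses of a homonym or recognize that two synonyms are related; and their representations are sparse and inefficient.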

However, they're computationally cheap, easy to implement, and they don't require a GPU. For text classification or clustering where context doesn't matter much, a TF-IDF + logistic regression baseline often gets you 90% of the way there in 10% of the engineering time.
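As a rough illustration of how little code such a baseline needs, here is a minimal sketch using scikit-learn (an assumption; the text does not name a library). The example tickets, the two labels, and the hyperparameters are invented for illustration and are not tuned recommendations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only.
train_texts = [
    "I was charged twice for my subscription",
    "Please refund the duplicate charge on my card",
    "My invoice shows the wrong amount",
    "I cannot log in to my account",
    "The password reset link does not work",
    "Login fails with an error after the update",
]
train_labels = ["billing", "billing", "billing", "login", "login", "login"]

# TF-IDF over unigrams and bigrams feeding a linear classifier.
# The settings below are assumptions chosen to keep the sketch small.
baseline = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

# Classify two unseen messages.
new_texts = ["refund the duplicate charge", "password reset link fails"]
predictions = baseline.predict(new_texts)
for text, label in zip(new_texts, predictions):
    print(f"{label}: {text}")
```

The whole model trains on a CPU in well under a second at this scale, and the learned coefficients can be inspected per word, which makes it a convenient reference point before trying anything heavier.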

Always Build a Stupid Baseline First

A TF-IDF + logistic regression baseline tells you whether your fancy model is actually adding anything. If your transformer can't beat a bag-of-words baseline by a meaningful margin, ask whether the task actually requires context. Surprisingly many tasks don't.
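One way to make "a meaningful margin" concrete is to put an interval around the accuracy gap instead of comparing two point estimates. The sketch below uses a paired bootstrap percentile interval over per-example correctness; this is one common choice, not a method prescribed by the text. The function name `bootstrap_accuracy_gap` and the example predictions are hypothetical.

```python
import random


def bootstrap_accuracy_gap(y_true, baseline_pred, model_pred,
                           n_resamples=2000, seed=0):
    """Return (observed_gap, lower, upper) for model accuracy minus
    baseline accuracy, with an approximate 95% percentile interval."""
    if not y_true or not (len(y_true) == len(baseline_pred) == len(model_pred)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-example difference in correctness: +1, 0, or -1.
    diffs = [
        int(m == t) - int(b == t)
        for t, b, m in zip(y_true, baseline_pred, model_pred)
    ]
    n = len(diffs)
    observed = sum(diffs) / n

    # Resample examples with replacement, keeping the two models paired.
    rng = random.Random(seed)
    gaps = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lower = gaps[int(0.025 * n_resamples)]
    upper = gaps[int(0.975 * n_resamples) - 1]
    return observed, lower, upper


# Invented example: the "transformer" fixes one of the baseline's two errors.
y_true = ["billing", "login", "billing", "login", "billing", "login"]
baseline_pred = ["billing", "login", "login", "login", "billing", "billing"]
model_pred = ["billing", "login", "billing", "login", "billing", "billing"]

gap, low, high = bootstrap_accuracy_gap(y_true, baseline_pred, model_pred)
print(f"accuracy gap: {gap:.3f} (95% interval {low:.3f} to {high:.3f})")
```

If the interval comfortably excludes zero on a held-out set of realistic size, the larger model is probably earning its cost; if it straddles zero, the baseline may be all the task needs.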

Checkpoint

You're building a customer support ticket classifier for a company with 12 labeled intent categories and ~50,000 tickets of training data. Describe the baseline you would build first and justify your choice.
