Stop Words

Stop Word Removal

Many common words — the, of, and, is — appear so frequently that they swamp the signal in your features. Stop word removal drops them so the model can focus on what carries meaning.

NLTK ships with a default English stop word list, and you can add to it. If you're classifying product reviews, the word "product" is meaningful in general but useless as a feature, since it appears in nearly every document. Add it.
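One way to extend the list is sketched below. The helper names `build_stop_words` and `load_nltk_english_stop_words` are hypothetical, and `DEMO_BASE` is a tiny stand-in list (an assumption) so the sketch runs even without NLTK installed. The NLTK loader relies on `nltk.download` and `stopwords.words("english")`.

```python
def build_stop_words(base_words, extra_words=()):
    """Return a lowercase stop word set: the base list plus domain-specific extras."""
    return {w.lower() for w in base_words} | {w.lower() for w in extra_words}


def load_nltk_english_stop_words():
    """Load NLTK's English list; needs the nltk package and its 'stopwords' corpus."""
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)  # fetches the corpus if it is missing
    return stopwords.words("english")


# Stand-in base list so the sketch runs without NLTK (assumption, not NLTK's real list).
DEMO_BASE = ["the", "of", "and", "is"]

# Add the domain-specific word from the product review example.
stop_words = build_stop_words(DEMO_BASE, extra_words=["product"])
print(sorted(stop_words))
```

With NLTK available, swap the stand-in for the real list: `build_stop_words(load_nltk_english_stop_words(), ["product"])`. Storing the result as a set keeps membership checks fast when filtering large corpora.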

Apply stop word removal to our example tokens and watch what gets stripped:

Stop Word Removal (interactive figure)
Before (13 tokens): Which [stop], class, is [stop], the [stop], best, class, at [stop], Duke, ? [stop], Deep, Learning, Applications, . [stop]
After (7 tokens, 6 removed): class, best, class, Duke, Deep, Learning, Applications
Legend: unmarked tokens are kept; [stop] marks a token that is removed.

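Tokens marked as stop words are removed, and the filtered list keeps only content-bearing words. The punctuation tokens "?" and "." are dropped too, although NLTK's default list contains only words, so removing them takes a separate punctuation check.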
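The filtering step for the example sentence can be sketched as follows. This is a minimal sketch, not the page's actual implementation: `remove_stop_words` is a hypothetical helper, `STOP_WORDS` is a hand-written subset standing in for NLTK's English list, and the lowercase matching and punctuation check are assumptions about how the example arrives at its result.

```python
import string

# Stand-in subset of NLTK's English list covering this example (assumption).
STOP_WORDS = {"which", "is", "the", "at"}


def remove_stop_words(tokens, stop_words, drop_punctuation=True):
    """Keep tokens that are neither stop words nor (optionally) pure punctuation."""
    kept = []
    for tok in tokens:
        # Compare in lowercase so "Which" matches the entry "which".
        if tok.lower() in stop_words:
            continue
        # Drop tokens made up entirely of punctuation characters, such as "?" or ".".
        if drop_punctuation and all(ch in string.punctuation for ch in tok):
            continue
        kept.append(tok)
    return kept


tokens = ["Which", "class", "is", "the", "best", "class", "at", "Duke", "?",
          "Deep", "Learning", "Applications", "."]
filtered = remove_stop_words(tokens, STOP_WORDS)
print(filtered)
# ['class', 'best', 'class', 'Duke', 'Deep', 'Learning', 'Applications']
```

The original casing of kept tokens is preserved; only the comparison is lowercased. Passing `drop_punctuation=False` leaves "?" and "." in place, which shows that they are not removed by the stop word set itself.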

Real World: Search, Sentiment, Legal Archives

Search engines and traditional sentiment classifiers still rely on this exact pipeline. When Grammarly checks for plagiarism, when a hospital tags chart notes, when a law firm searches its case archive — these three preprocessing steps are the first thing that happens to your text.
