Stop Words

Stop Word Removal

Many common words — the, of, and, is — appear so frequently that they swamp the signal in your features. Stop word removal drops them so the model can focus on what carries meaning.

NLTK ships with a default English stop word list, but you can absolutely add to it. If you're classifying product reviews, the word "product" is technically informative but in practice useless, since it appears in every document. Add it.

Apply stop word removal to our example tokens and watch what gets stripped:

Stop Word Removal

Before13 tokens

Whichstop

class

isstop

thestop

best

class

atstop

Duke

?stop

Deep

Learning

Applications

.stop

filtered

After7 tokens6 removed

classbestclassDukeDeepLearningApplications

KeptStop word (removed)

Tokens struck through in red are NLTK stop words. The filtered list keeps only content-bearing words.

◆

Real World: Search, Sentiment, Legal Archives

Search engines and traditional sentiment classifiers still rely on this exact pipeline. When Grammarly checks for plagiarism, when a hospital tags chart notes, when a law firm searches its case archive — these three preprocessing steps are the first thing that happens to your text.

NLTK stop word list — NLTK's default stop word list

←PreviousTokenizationText Preprocessing Next→Stemming and LemmatizationText Preprocessing