Feed-Forward Networks Are Forgetful
Everything we've seen so far in deep learning (fully-connected networks and CNNs) is feed-forward. Information flows once, in one direction, from input to output. Observations are treated as independent.
That's problematic for language. The meaning of "it" in "The dog ate the bone. It tasted good." comes entirely from the previous sentence. A feed-forward network has no mechanism to remember.
We need an architecture that carries information forward through time.
Enter: Recurrent Neural Networks