Handling Variable-Length Sequences
Natural text comes in every length, but batched computation needs every sequence in a batch to share one shape. Two fixes:
Padding. Append zeros (or [PAD] tokens) to shorter sequences until every sequence in the batch has the same length. Then provide a padding mask so the model knows which positions to ignore. The mask can be applied in the loss calculation, to RNN hidden states, or to attention scores. Without a mask, your model will attend to padding tokens as if they were meaningful, and the loss signal will be polluted.
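The padding-plus-mask idea can be sketched in PyTorch roughly as follows. This is a minimal illustration, not a full model: the token IDs, PAD_ID = 0, and the all-zeros "scores" and all-ones "losses" are stand-ins chosen for the example.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed padding index; real vocabularies define their own

# Four toy sequences of token IDs (hypothetical values), lengths 5, 3, 2, 1.
sequences = [
    torch.tensor([4, 9, 2, 7, 5]),
    torch.tensor([3, 8, 6]),
    torch.tensor([5, 1]),
    torch.tensor([2]),
]

# Pad on the right so the batch becomes a rectangular (batch, max_len) tensor.
padded = pad_sequence(sequences, batch_first=True, padding_value=PAD_ID)

# Build the mask from the true lengths: True = real token, False = padding.
lengths = torch.tensor([len(s) for s in sequences])
mask = torch.arange(padded.size(1)).unsqueeze(0) < lengths.unsqueeze(1)

# Masking attention scores: set padded positions to -inf before the softmax
# so they receive exactly zero weight.
scores = torch.zeros(padded.shape)  # stand-in for real attention scores
weights = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)

# Masking the loss: average per-token losses over real tokens only.
per_token_loss = torch.ones(padded.shape)  # stand-in for real per-token losses
masked_loss = (per_token_loss * mask).sum() / mask.sum()
```

Deriving the mask from the lengths, rather than from `padded != PAD_ID`, keeps it correct even if the padding value also appears as a real input value.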
Packed Sequences (PyTorch). An optimization for RNNs that only processes actual tokens, not padding. At each time step, the batch size shrinks as shorter sequences finish. Same correctness, less wasted compute.
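A minimal sketch of packing, assuming a single-layer GRU and random features in place of real embeddings. The shapes (4 sequences, 8 features, 16 hidden units) are arbitrary choices for the example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical batch: 4 sequences padded to length 5, 8 features per step.
lengths = torch.tensor([5, 3, 2, 1])  # true lengths, sorted longest first
padded = torch.randn(4, 5, 8)
for i, n in enumerate(lengths.tolist()):
    padded[i, n:] = 0.0  # zero out the padded positions

rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

# Pack: only the 11 real time steps are kept. The default enforce_sorted=True
# expects lengths in descending order, and lengths must live on the CPU.
packed = pack_padded_sequence(padded, lengths, batch_first=True)

# batch_sizes records how many sequences are still active at each time step.
print(packed.batch_sizes)  # tensor([4, 3, 2, 1, 1])

packed_out, h_n = rnn(packed)

# Unpack to a padded (batch, max_len, hidden) tensor for downstream layers.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```

With packed input, `h_n` holds each sequence's hidden state at its last real token, so no manual indexing by length is needed. If the batch is not sorted by length, passing `enforce_sorted=False` lets PyTorch handle the reordering.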
PyTorch functions: pack_padded_sequence / pad_packed_sequence.
Toggle between padding and packing to see how the same four sentences are handled differently. Use the mask toggle on the padding view to see how [PAD] tokens get excluded from computation.