Recurrent Neural Network Explained

Process sequential data with memory — the neural network architecture that reads inputs step by step while remembering what came before.

Recurrent Neural Network

A Recurrent Neural Network (RNN) is a neural network architecture designed for sequential data, processing inputs one step at a time while maintaining a hidden state that captures information from previous steps.

Explanation

RNNs process sequences by maintaining a hidden state that is updated at each time step, creating a memory of previous inputs. This makes them suitable for time-series data, text, audio, and any data where order matters. Vanilla RNNs suffer from vanishing gradients, making it hard to learn long-range dependencies. LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) architectures solve this with gating mechanisms that control information flow. While transformers have largely replaced RNNs for NLP, RNNs remain useful for real-time sequential processing and resource-constrained environments.
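
To make the recurrence concrete, here is a minimal sketch of a single vanilla RNN step in NumPy. The names and dimensions are illustrative, not taken from any particular library:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h):
    # the new hidden state mixes the current input with the previous state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
W_xh = 0.1 * rng.normal(size=(4, 3))   # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(4)

h = np.zeros(4)                        # initial hidden state: no memory yet
for x_t in rng.normal(size=(5, 3)):    # a toy sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)                               # final state summarizes the sequence
```

The same weights are reused at every step; only the hidden state changes, which is what lets the network handle sequences of any length.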

Bookuvai Implementation

Bookuvai uses LSTMs for time-series forecasting and anomaly detection where real-time sequential processing is required. For most text tasks, we use transformer models instead. We select RNN architectures when latency and model size constraints make transformers impractical.
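
Bookuvai's production models are not shown here, but as a hypothetical sketch of the kind of one-step-ahead LSTM forecaster described above, assuming PyTorch and made-up dimensions:

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Hypothetical forecaster: encode a window of past values with an
    LSTM, then predict the next value from the final hidden state."""
    def __init__(self, n_features=1, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_features)

    def forward(self, x):             # x: (batch, time, n_features)
        out, _ = self.lstm(x)         # out: (batch, time, hidden_size)
        return self.head(out[:, -1])  # predict from the last time step

model = LSTMForecaster()
window = torch.randn(8, 24, 1)        # batch of 8 series, 24 past steps each
next_value = model(window)            # shape (8, 1): one forecast per series
```

For streaming use, the same layer can be stepped one input at a time by carrying the returned hidden and cell states forward between calls.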

Key Facts

  • Processes sequential data with a hidden state capturing previous context
  • LSTM and GRU architectures solve the vanishing gradient problem
  • Well-suited for time-series forecasting and real-time sequence processing
  • Largely replaced by transformers for NLP tasks
  • Remains valuable for resource-constrained and streaming applications

Frequently Asked Questions

What is the vanishing gradient problem?
During training, gradients can shrink exponentially as they propagate backward through many time steps, so the network receives almost no learning signal from distant past inputs. LSTMs mitigate this with a cell state and gates that create a path along which gradients can flow largely unattenuated.
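A crude numerical illustration of why this happens: backpropagating through T steps multiplies roughly T per-step factors together, so a factor even slightly below 1 collapses exponentially (the 0.9 here is just a stand-in for the per-step gradient magnitude):

```python
# Backpropagating through T steps multiplies ~T per-step factors.
# If each factor is a bit below 1, the product collapses toward 0.
factor = 0.9                 # stand-in for |dh_t/dh_{t-1}| at each step
for T in (10, 50, 100):
    print(T, factor ** T)    # 0.35, 0.0052, 2.7e-05: distant steps get no signal
```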
Should I use an RNN or a transformer?
Use transformers for most text and language tasks; they are generally more accurate and parallelize better during training. Use RNNs for real-time streaming data, resource-constrained devices, or when the model must stay small. For offline batch processing of language, transformers have largely won.
What is the difference between LSTM and GRU?
An LSTM has three gates (forget, input, output) plus a separate cell state; a GRU has two gates (reset, update) and no separate cell state, making it simpler. GRUs have fewer parameters, train faster, and perform comparably on many tasks, while LSTMs can have an edge on tasks with very long dependencies where the separate cell state helps.
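
One concrete way to see the size difference, assuming PyTorch: count parameters for same-sized layers. The exact 4:3 ratio reflects LSTM's four weight blocks versus GRU's three:

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=64, hidden_size=128)
gru = nn.GRU(input_size=64, hidden_size=128)
print(n_params(lstm), n_params(gru))  # 99328 vs 74496: exactly a 4:3 ratio
```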