A deep dive into the transformer architecture and the self-attention mechanism that power modern NLP models.