The Transformer is a powerful architecture that can be difficult to understand. There are many great explanations on the web, each approaching the subject in a different way. Here I link the explanations I liked and note who I believe the target audience is for each one.

The goal is to give you a collection of links to choose from. That said, reading all of them is still worthwhile: engaging with the concept from different perspectives helps cement your knowledge.

Transformers from scratch

Target audience: people with a Machine Learning background

Comment/opinion: An all-around outstanding explanation that includes clear code & excellent illustrations. A personal favorite.

The transformer … “explained”?

Target audience: people with a general Computer Science background

Comment/opinion: Excellent overview, motivation, and intuition. No pictures. No math. Short.

Formal Algorithms for Transformers

Target audience: mathematically-minded ML people

Comment/opinion: Very nice & clear formalism. No pictures.

The Illustrated Transformer

Target audience: ML people who know what an embedding is

Comment/opinion: Great illustrations. The explanation of self-attention was not intuitively clear to me.

Transformer - Illustration and code

Target audience: ML people who know what an embedding is & find reading code helpful

Description by the author: “This notebook combines the excellent illustration of the transfomer by Jay Alammar and the code annonation by harvardnlp lab.”

Comment/opinion: Reading the code was very helpful for me. Math not rendered nicely. Should be read after The Illustrated Transformer.

The Annotated Transformer

Target audience: ML people who know what an embedding is, find reading code helpful, and are interested in full details of the original Transformer paper

Comment/opinion: This is a rearranged version of the paper intermingled with the code. Extensive. Math rendered nicely. Illustrations are ok.

That’s it! If you have suggestions on what else to include, send me an email :)