A summary of the revolutionary paper “Attention Is All You Need” and an implementation of the Transformer using PyTorch. I have been a Machine Learning Engineer for almost 4 years now. I started with what are now called the “classical models” (logistic, tree-based, Bayesian, etc.), and since last year I have moved into neural networks and deep learning. I would say I did pretty well. That was until my attention…