Transformer from scratch

My own implementation of an encoder-decoder Transformer, trained on a random sequence dataset.
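
For orientation, here is a minimal sketch of scaled dot-product attention, the operation at the core of any encoder-decoder Transformer: softmax(Q K^T / sqrt(d_k)) V. It is not taken from this repository's code. The function names (`softmax`, `causal_mask`, `scaled_dot_product_attention`), the plain-list representation, and the `-1e9` masking constant are assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_mask(n):
    # Position i may attend to positions 0..i (used in decoder self-attention).
    return [[j <= i for j in range(n)] for i in range(n)]


def scaled_dot_product_attention(queries, keys, values, mask=None):
    """Hypothetical helper, not the repository's API.

    queries: n_q vectors of size d_k
    keys:    n_k vectors of size d_k
    values:  n_k vectors of size d_v
    mask:    optional n_q x n_k booleans, False = attention blocked
    Returns (outputs, weights).
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        if mask is not None:
            # A large negative score drives the softmax weight to zero.
            scores = [s if ok else -1e9 for s, ok in zip(scores, mask[i])]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0]]
    vals = [[1.0, 2.0], [3.0, 4.0]]
    out, w = scaled_dot_product_attention(x, x, vals, mask=causal_mask(2))
    print(out)
    print(w)
```

With the causal mask, the first position can only attend to itself, so its output equals the first value vector. The second position mixes both value vectors according to its attention weights.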