[P] I made my first Transformer architecture code
In this code I used PyTorch and the math module to implement each block of the Transformer as a separate class, then composed those blocks in the top-level Transformer class. I used the hyperparameters suggested in the original paper: embedding size 512, 6 encoder/decoder layers, and 8 attention heads.
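Roughly, the top-level composition looks like the sketch below (a simplified version; `nn.Transformer` stands in here for the hand-written encoder/decoder block classes, which are in the notebook):

```python
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding from 'Attention Is All You Need'."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # shape (1, max_len, d_model)

    def forward(self, x):                            # x: (batch, seq, d_model)
        return x + self.pe[:, : x.size(1)]


class Seq2SeqTransformer(nn.Module):
    """Base-paper configuration: d_model=512, 6 layers, 8 heads, d_ff=2048."""
    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8,
                 n_layers: int = 6, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = PositionalEncoding(d_model)
        # Stand-in for the separate encoder/decoder block classes from the notebook.
        self.core = nn.Transformer(d_model, n_heads, n_layers, n_layers,
                                   d_ff, dropout, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)
        self.scale = math.sqrt(d_model)

    def forward(self, src, tgt):                     # token ids: (batch, seq)
        src = self.pos(self.embed(src) * self.scale)
        tgt = self.pos(self.embed(tgt) * self.scale)
        # Causal mask so each target position attends only to earlier positions.
        causal = torch.triu(torch.full((tgt.size(1), tgt.size(1)), float("-inf"),
                                       device=tgt.device), diagonal=1)
        return self.out(self.core(src, tgt, tgt_mask=causal))
```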
My question: is there any way to optimize this further before I train it?
Also, what dataset would be a good fit for a T4 GPU (Google Colab)? Here is the link to my code:
https://github.com/Rishikesh-2006/NNs/blob/main/Pytorch%2FTransformer.ipynb
submitted by /u/Jumbledsaturn52