Weekly updates of the project - Transformer and BERT in mlpack
Author: mrityunjay-tripathi
Tue, June 23
Oh, this week! This week WW3 was trending on Twitter for the second time in 2020 😂.
So, as I work on the MultiheadAttention class, more and more things are becoming clear. This week I had to correct some of the equations in the Backward function and implement the Gradient function. I also had to introduce a regularizer in the MultiheadAttention class. One of the regularization methods used while training the Transformer model is Label Smoothing, so next week I will be exploring ways to add it to mlpack.
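For readers who haven't met Label Smoothing before, here is a minimal sketch of the usual formulation: the one-hot target is mixed with a uniform distribution, so the true class keeps `1 - epsilon` plus its share of `epsilon`, and every other class gets `epsilon / numClasses`. The helper name `SmoothLabels` and its parameters are made up for illustration; this is plain C++ and not an mlpack API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): builds the smoothed target
// distribution for a single sample whose correct class is `trueClass`.
std::vector<double> SmoothLabels(const std::size_t trueClass,
                                 const std::size_t numClasses,
                                 const double epsilon)
{
  assert(numClasses > 0 && trueClass < numClasses);
  assert(epsilon >= 0.0 && epsilon <= 1.0);

  // Every class receives an equal share of the smoothing mass epsilon.
  std::vector<double> target(numClasses, epsilon / numClasses);

  // The true class additionally keeps the remaining (1 - epsilon).
  target[trueClass] += 1.0 - epsilon;

  return target;
}
```

For example, `SmoothLabels(2, 4, 0.1)` puts 0.925 on class 2 and 0.025 on each of the other three classes, so the target still sums to 1.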
I had to introduce another method to each initialization rule, one that initializes pre-allocated matrices and cubes. We decided on this to prevent accidentally mutating any pre-allocated matrix or cube.
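A rough sketch of what such a pair of methods can look like, assuming the intent is that the new overload only overwrites values and never changes the shape of the object it is given. The `Matrix` struct and the `ConstFill` rule below are hypothetical stand-ins written in plain C++; the real rules in mlpack work on Armadillo matrices and cubes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix type.
struct Matrix
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> data;
};

// Hypothetical initialization rule that fills a matrix with one constant.
class ConstFill
{
 public:
  explicit ConstFill(const double fillValue) : value(fillValue) {}

  // Allocating overload: sets the shape, then fills every element.
  void Initialize(Matrix& W,
                  const std::size_t rows,
                  const std::size_t cols) const
  {
    W.rows = rows;
    W.cols = cols;
    W.data.assign(rows * cols, value);
  }

  // Pre-allocated overload: the shape is left untouched and only the
  // existing elements are overwritten.
  void Initialize(Matrix& W) const
  {
    assert(W.data.size() == W.rows * W.cols);
    for (double& x : W.data)
      x = value;
  }

 private:
  double value;
};
```

With this split, a caller that already owns a correctly sized matrix can call `Initialize(W)` without any risk of it being resized behind its back.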
The ongoing border conflict between India and China made me read a few war stories 😬. One of them, which I didn't know about, is the Sino-Soviet Border War of 1969.