Transformer and BERT in mlpack

Tue, June 23

Week 3 (15 June - 21 June)

Oh, this week! WW3 was trending on Twitter for the second time in 2020 😂.

As I keep working on the MultiheadAttention class, more and more things are becoming clear. This week I corrected some of the equations in the Backward function and implemented the Gradient function. I also had to introduce a regularizer in the MultiheadAttention class. One of the regularization methods used while training the Transformer model is label smoothing, so next week I will explore ways to add it to mlpack.

I also had to add another method to each initialization rule for initializing pre-allocated matrices and cubes. We decided on this to prevent accidentally mutating any pre-allocated matrix or cube.
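
Roughly, the split looks like the sketch below. This is an illustration under my own assumptions, not the actual mlpack code: `Matrix` is a hypothetical stand-in for a real matrix type, `UniformInit` is a made-up rule, and I am assuming the point of the second overload is to fill existing storage without ever changing its shape.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a dense matrix type (not an Armadillo type).
struct Matrix
{
  Matrix(const std::size_t r, const std::size_t c) :
      rows(r), cols(c), data(r * c, 0.0) { }

  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
};

// Hypothetical initialization rule with two separate entry points.
class UniformInit
{
 public:
  UniformInit(const double lower, const double upper,
              const unsigned seed = 42) :
      lower(lower), upper(upper), gen(seed) { }

  // Allocating version: the caller asks for a new shape explicitly.
  void Initialize(Matrix& W, const std::size_t rows, const std::size_t cols)
  {
    W = Matrix(rows, cols);
    Initialize(W);
  }

  // Pre-allocated version: only overwrites the values, never the shape.
  void Initialize(Matrix& W)
  {
    std::uniform_real_distribution<double> dist(lower, upper);
    for (double& value : W.data)
      value = dist(gen);
  }

 private:
  double lower;
  double upper;
  std::mt19937 gen;
};
```

Keeping the two paths apart means a resize only happens when the caller passes dimensions on purpose.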

The ongoing border conflict between India and China made me read a few war stories 😬. One of them, which I didn't know about before, is the Sino-Soviet Border War of 1969.