GSoC with mlpack

Weekly updates of the project - Transformer and BERT in mlpack

Author: mrityunjay-tripathi

Transformer and BERT in mlpack

Fri, May 29

Week 1 (18 May - 24 May)

This week started with adding Softmax, an essential ANN layer, to mlpack. It was important to add because Softmax is one of the building blocks of attention models. I had worked on the Softmax layer earlier, and it got merged this week. Then I started implementing the Forward and Backward functions of the MultiheadAttention layer. Along the way I was introduced to the visitor classes in mlpack as well as the apply_visitor method in Boost; they are really useful (a small sketch of the pattern is below). While writing these functions for the MultiheadAttention class, I found that accessor methods for the weights and biases of some ANN layers were missing, so I added them as well. Next week I will focus on implementing the Gradient function for the MultiheadAttention class and on writing tests.
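To show why the visitor pattern is handy, here is a minimal, self-contained sketch of how boost::apply_visitor dispatches on a variant. The toy layer types and the OutputSizeVisitor below are my own illustrative names, not mlpack's actual classes; mlpack's real visitors work on its layer variant in the same spirit.

```cpp
#include <boost/variant.hpp>
#include <cstddef>
#include <iostream>

// Toy stand-ins for ANN layer classes; illustrative only.
struct Softmax { std::size_t OutputSize() const { return 10; } };
struct Linear  { std::size_t OutputSize() const { return 128; } };

// Every layer is stored in one variant type, similar in spirit to how
// mlpack keeps heterogeneous layers together.
using LayerVariant = boost::variant<Softmax, Linear>;

// A visitor: a single templated operator() that works for whichever
// layer type the variant currently holds.
struct OutputSizeVisitor : public boost::static_visitor<std::size_t>
{
  template<typename LayerType>
  std::size_t operator()(const LayerType& layer) const
  {
    return layer.OutputSize();
  }
};

int main()
{
  LayerVariant layer = Softmax();

  // apply_visitor picks the overload matching the stored type at runtime.
  std::cout << boost::apply_visitor(OutputSizeVisitor(), layer) << std::endl;
}
```

The nice part is that new operations on layers only need a new visitor, not changes to every layer class.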

One very interesting thing I learnt this week, which I wanted to share with everyone, is how to generate very large prime numbers. This Medium blog post explains it well, and this is also fun to watch. Thank You! 🙂
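Roughly, the recipe is: pick a random odd number of the target size, run a probabilistic primality test such as Miller-Rabin on it, and repeat until a prime shows up. Below is a minimal C++ sketch of that idea restricted to 64-bit integers (truly large cryptographic primes would need arbitrary-precision arithmetic, e.g. GMP); the function names are mine, not taken from the post.

```cpp
#include <cstdint>
#include <iostream>
#include <random>

// Modular multiplication with a 128-bit intermediate (GCC/Clang extension)
// so a * b does not overflow before the reduction.
uint64_t MulMod(uint64_t a, uint64_t b, uint64_t m)
{
  return (unsigned __int128) a * b % m;
}

// Modular exponentiation by repeated squaring.
uint64_t PowMod(uint64_t base, uint64_t exp, uint64_t m)
{
  uint64_t result = 1;
  base %= m;
  for (; exp > 0; exp >>= 1)
  {
    if (exp & 1)
      result = MulMod(result, base, m);
    base = MulMod(base, base, m);
  }
  return result;
}

// Miller-Rabin primality test.  With this fixed witness set the result
// is exact for every 64-bit integer.
bool IsPrime(uint64_t n)
{
  if (n < 2)
    return false;
  for (uint64_t p : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37})
  {
    if (n % p == 0)
      return n == p;
  }

  // Write n - 1 as 2^r * d with d odd.
  uint64_t d = n - 1;
  int r = 0;
  while ((d & 1) == 0) { d >>= 1; ++r; }

  for (uint64_t a : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37})
  {
    uint64_t x = PowMod(a, d, n);
    if (x == 1 || x == n - 1)
      continue;
    bool composite = true;
    for (int i = 0; i < r - 1; ++i)
    {
      x = MulMod(x, x, n);
      if (x == n - 1) { composite = false; break; }
    }
    if (composite)
      return false;
  }
  return true;
}

int main()
{
  // Draw random odd 64-bit candidates until one passes the test.
  std::mt19937_64 rng(std::random_device{}());
  std::uniform_int_distribution<uint64_t> dist(1ULL << 62, UINT64_MAX);

  uint64_t candidate;
  do
  {
    candidate = dist(rng) | 1;  // Force the candidate to be odd.
  } while (!IsPrime(candidate));

  std::cout << "Found a prime: " << candidate << std::endl;
}
```

For primes with hundreds of digits the same loop applies; only the arithmetic has to move to a big-integer type.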
