GSoC with mlpack

Weekly updates of the project - Transformer and BERT in mlpack

Author: mrityunjay-tripathi

Transformer and BERT in mlpack

Mon, August 24

Weeks 11 and 12 (10 August - 23 August)

Hello everyone! Here is the update for weeks 11 and 12. Time flew by and GSoC has come to an end. The last two weeks were quite busy, and we got some important things done.

1. The Positional Encoding layer got merged. We faced some memory issues initially, but they appear to originate in some other ANN module(s), as valgrind didn't report any errors in Positional Encoding.

2. The Linear3D layer also got merged. This PR had similar memory issues, but valgrind didn't report any errors in Linear3D either.

3. We completed the Multihead Attention layer. Some API changes were made in this PR over these two weeks, and they are now finalized.

4. The Transformer model is also complete. There were some tricky parts in this PR; thanks to Mikhail, who helped get them sorted out.

5. The BERT model is also implemented, and we are left with implementing the tokenizers. Hopefully this will be done in the next two weeks (post-GSoC).

I faced a few tricky problems in these two weeks. Implementing the Transformer model in line with the new API for Multihead Attention was tricky and involved extensive use of various layers from mlpack's ANN module.

There are a few things still remaining to be done such as:
