Weekly updates of the project - Transformer and BERT in mlpack
Author: mrityunjay-tripathi
Mon, August 24
Hello everyone! Here is the update for weeks 11 and 12. Time flew by, and GSoC came to an end. The last two weeks were quite busy, and we got some important things done.
1. The Positional Encoding layer got merged. We faced some memory issues initially, but they appear to come from some other ANN module(s), as valgrind didn't report any errors in Positional Encoding itself.
2. The Linear3D layer also got merged. This PR had similar memory issues, but valgrind didn't report any errors in Linear3D either.
3. We completed the Multihead Attention layer. Some API changes were made in this PR over these two weeks, and the API has now been finalized.
4. The Transformer model is also complete. There were some tricky things in this PR, but thanks to Mikhail, who helped get them sorted out.
5. The BERT model is also implemented, and only the tokenizers remain to be done. Hopefully this will be finished within the next two weeks (post-GSoC).
I faced a few tricky problems in these two weeks. Implementing the Transformer model in line with the new Multihead Attention API was tricky and involved extensive use of various layers from mlpack's ANN module.
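The core operation behind Multihead Attention is scaled dot-product attention. As a reference for what each head computes, here is a stdlib-only single-head sketch (again an illustration, not mlpack's armadillo-based code; shapes are row-major `vector<vector<double>>` for simplicity):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Scaled dot-product attention for one head:
//   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
Matrix Attention(const Matrix& Q, const Matrix& K, const Matrix& V)
{
  const size_t tgtLen = Q.size(), srcLen = K.size();
  const size_t dk = Q[0].size(), dv = V[0].size();

  // Scores: (tgtLen x srcLen), scaled by 1 / sqrt(d_k).
  Matrix scores(tgtLen, std::vector<double>(srcLen, 0.0));
  for (size_t i = 0; i < tgtLen; ++i)
    for (size_t j = 0; j < srcLen; ++j)
    {
      for (size_t k = 0; k < dk; ++k)
        scores[i][j] += Q[i][k] * K[j][k];
      scores[i][j] /= std::sqrt(double(dk));
    }

  // Numerically stable row-wise softmax.
  for (auto& row : scores)
  {
    double maxv = *std::max_element(row.begin(), row.end());
    double sum = 0.0;
    for (double& v : row) { v = std::exp(v - maxv); sum += v; }
    for (double& v : row) v /= sum;
  }

  // Output: attention weights times V, shape (tgtLen x d_v).
  Matrix out(tgtLen, std::vector<double>(dv, 0.0));
  for (size_t i = 0; i < tgtLen; ++i)
    for (size_t j = 0; j < srcLen; ++j)
      for (size_t k = 0; k < dv; ++k)
        out[i][k] += scores[i][j] * V[j][k];
  return out;
}
```

A multi-head layer runs several of these in parallel on learned projections of Q, K, and V and concatenates the results, which is where the 3-D (sequence, embedding, batch) shape handling, and hence Linear3D, comes in.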
There are a few things still remaining to be done such as:
Implementing the tokenizers for BERT, which use special tokenization techniques.
Loading pre-trained weights of the BERT-Base model from TensorFlow into mlpack.
Adding an example for Language Modelling to the examples repository.
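On the first of those tasks: BERT uses WordPiece tokenization, which splits each word into the longest subword pieces found in a vocabulary, marking continuation pieces with a "##" prefix. A minimal stdlib-only sketch of the greedy longest-match step (the function name and `set`-based vocabulary are assumptions for illustration, not the planned mlpack API):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Greedy longest-match WordPiece tokenization of a single word:
// repeatedly take the longest prefix present in the vocabulary;
// pieces after the first are looked up with a "##" prefix.
std::vector<std::string> WordPiece(const std::string& word,
                                   const std::set<std::string>& vocab)
{
  std::vector<std::string> pieces;
  size_t start = 0;
  while (start < word.size())
  {
    size_t end = word.size();
    std::string piece;
    bool found = false;
    while (end > start)
    {
      piece = word.substr(start, end - start);
      if (start > 0)
        piece = "##" + piece;  // Continuation piece.
      if (vocab.count(piece)) { found = true; break; }
      --end;
    }
    if (!found)
      return { "[UNK]" };  // No piece matches: the whole word is unknown.
    pieces.push_back(piece);
    start = end;
  }
  return pieces;
}
```

For example, with a vocabulary containing "un", "##break", and "##able", the word "unbreakable" splits into those three pieces. The real tokenizer also handles lowercasing, punctuation splitting, and the special [CLS]/[SEP] tokens around each sequence.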