Weekly updates of the project - Transformer and BERT in mlpack
Author: mrityunjay-tripathi
Mon, July 13
Hello, everyone! I hope everything is going smoothly. This week I managed to complete a few things. To mention them:
Completed the multihead attention layer, after a long wait. Changing the approach and getting the concept clearer finally helped me implement it.
Implemented a 3-dimensional Linear layer, since it is required in some cases. More details here.
Fixed the Lookup layer so that it can be used with batch sizes greater than 1. It is used as the first layer in the Transformer model.
Added documentation for scaled dot-product attention. It was not much, but it had been hanging on my list (a small sketch of the computation is included below).
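For anyone unfamiliar with what scaled dot-product attention actually computes, here is a minimal standalone sketch written with Armadillo (the matrix library mlpack builds on). It is only illustrative: the function name, shapes, and helper structure are my own for this post and are not mlpack's actual layer API.

```cpp
// Minimal sketch of scaled dot-product attention:
//   Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V
// Names and shapes here are illustrative, not mlpack's layer interface.
#include <armadillo>
#include <cmath>

arma::mat ScaledDotProductAttention(const arma::mat& query,
                                    const arma::mat& key,
                                    const arma::mat& value)
{
  const double dk = key.n_cols;  // Dimension of the key vectors.
  arma::mat scores = query * key.t() / std::sqrt(dk);

  // Row-wise softmax over the attention scores.
  scores.each_row([](arma::rowvec& row)
  {
    row = arma::exp(row - row.max());
    row /= arma::accu(row);
  });

  // Weighted sum of the value vectors.
  return scores * value;
}

int main()
{
  // Toy example: sequence length 4, embedding dimension 8.
  arma::mat q(4, 8, arma::fill::randu);
  arma::mat k(4, 8, arma::fill::randu);
  arma::mat v(4, 8, arma::fill::randu);

  ScaledDotProductAttention(q, k, v).print("attention output:");
}
```

Multihead attention simply runs several of these attention computations in parallel on learned projections of the input and concatenates the results.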
That was it for this week. Not much, but a few things got (almost) finalized. You can help me with reviews and/or ideas on any of my pull requests here.
See you next time. Be Safe! Be Healthy!