GSoC with mlpack

Weekly updates of the project - Transformer and BERT in mlpack

Author: mrityunjay-tripathi

Transformer and BERT in mlpack

Mon, July 27

Week 7 and 8 (13 July - 26 July)

Hello, everyone! I'm really sorry that I could not post an update for the previous week, so here is the update for both weeks.

There were some problems with the BLEU Score metric. One interesting error was that a negative value was assigned to a variable of type size_t. I always thought that if a negative value is assigned to a variable of type size_t, it would become zero, but it actually wraps around, so 18446744073709551614 gets stored in place of the negative value (I know you skipped that number!). There were some other optimizations and minor bug fixes as well. You can view the pull request here.
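For example, this tiny snippet (assuming a platform where size_t is 64 bits) shows the wraparound:

```cpp
#include <cstddef>
#include <iostream>

int main()
{
  // Assigning a negative value to an unsigned type wraps around modulo 2^64
  // instead of clamping to zero.
  size_t n = -2;
  std::cout << n << std::endl;  // Prints 18446744073709551614.
}
```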

The Lookup layer now looks ready. I also added a Gradient test, as the original test didn't look convincing. The Lookup layer doesn't require a Backward function because it is always used as the first layer in the network, so I removed it. You can view the pull request here.
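The idea behind a gradient test is to compare the analytic gradient produced by the layer against a finite-difference approximation of it. Here is a rough sketch of that idea (illustrative names only, not the actual mlpack test harness):

```cpp
#include <algorithm>
#include <armadillo>

// `f` computes the loss for a given parameter vector; `analyticGrad` is the
// gradient computed by the layer's backward/gradient pass.  Returns a relative
// error that should be very small if the analytic gradient is correct.
template<typename LossFunction>
double GradientCheck(LossFunction f,
                     arma::vec params,
                     const arma::vec& analyticGrad,
                     const double eps = 1e-6)
{
  arma::vec numericGrad(params.n_elem);
  for (size_t i = 0; i < params.n_elem; ++i)
  {
    params(i) += eps;
    const double lossPlus = f(params);
    params(i) -= 2 * eps;
    const double lossMinus = f(params);
    params(i) += eps;  // Restore the original parameter.
    numericGrad(i) = (lossPlus - lossMinus) / (2 * eps);
  }

  return arma::norm(analyticGrad - numericGrad) /
      std::max(arma::norm(analyticGrad) + arma::norm(numericGrad), eps);
}
```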

There was some success regarding the Multihead Attention layer as well. Most of the tests now pass, and the Backward function works smoothly. I had to change some things as silly and as small as changing 'true' to 'false' 😀, but it took considerable time to spot them. Removing the dropout layer also helped. The error, which was of the order of 1e+06, has now come down to 1e-02.

Gradient descent was not as well-behaved as I thought 😛. Probably some problem exists with the reshaping of some matrices. You can view the pull request here.
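Reshaping in Armadillo is an easy place to slip up, because arma::reshape() refills the matrix in column-major order rather than "re-wrapping" rows. A small illustration (not tied to the actual code in the pull request):

```cpp
#include <armadillo>

int main()
{
  arma::mat A = { { 1, 2, 3 },
                  { 4, 5, 6 } };

  // Elements are taken column by column from A and placed column by column
  // into the new shape, so the layout changes in a way that is easy to miss.
  arma::mat B = arma::reshape(A, 3, 2);
  B.print("reshape(A, 3, 2):");
  // 1   5
  // 4   3
  // 2   6
}
```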

I also tried to get the Transformer Encoder layer working using scaled dot-product attention instead of multihead attention, but since the required pull requests are not merged yet, it turned out to be a little tricky. I ended up getting lots of Armadillo matrix errors 😛. It will be easier once #2500 or #2375 gets merged.
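For reference, the core of scaled dot-product attention boils down to a few Armadillo operations. Here is a minimal sketch for a single head, with one column per token, shown only to illustrate the computation and not the interface of the mlpack layer:

```cpp
#include <armadillo>
#include <cmath>

// query: (dk x seqLen), key: (dk x seqLen), value: (dv x seqLen).
// Returns the attended values, shape (dv x seqLen).
arma::mat ScaledDotProductAttention(const arma::mat& query,
                                    const arma::mat& key,
                                    const arma::mat& value)
{
  const double dk = query.n_rows;

  // Attention scores between every key and every query, scaled by sqrt(dk).
  arma::mat scores = (key.t() * query) / std::sqrt(dk);

  // Column-wise softmax over the keys (numerically stabilized).
  arma::mat weights = scores;
  weights.each_row() -= arma::max(scores, 0);
  weights = arma::exp(weights);
  weights.each_row() /= arma::sum(weights, 0);

  // Weighted sum of the values for each query position.
  return value * weights;
}
```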

This is it for now. See you next time. Be Safe! Be Healthy!
