Weekly updates of the project - Transformer and BERT in mlpack
Author: mrityunjay-tripathi
Mon, June 15
Time flies by. The second week is already over. Last week I successfully implemented the Forward function of the MultiheadAttention class, and this week I worked on implementing the Backward function. The MultiheadAttention class is somewhat different in that it takes three inputs (query, key, and value), which in turn makes backward propagation a bit complicated. I will have to add more tests to make sure the implementation is valid. The code in its current form is a little messy, so I plan to clean it up and make it more robust by next week. I also hope to implement the Gradient function for the MultiheadAttention class this week, which should make it ready for review. You can help me with reviews and/or ideas here.
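For anyone following along, here is a rough sketch of the computation the Forward pass performs for a single attention head, written directly in Armadillo rather than as the actual mlpack layer. The matrix names, dimensions, and the ColSoftmax helper are only illustrative assumptions for this post; the real code lives in the MultiheadAttention class that is under review.

```cpp
#include <armadillo>
#include <cmath>

// Numerically stabilized softmax over each column of x.
arma::mat ColSoftmax(const arma::mat& x)
{
  arma::rowvec colMax = arma::max(x, 0);
  arma::mat shifted = x.each_row() - colMax;
  arma::mat e = arma::exp(shifted);
  arma::rowvec colSum = arma::sum(e, 0);
  return e.each_row() / colSum;
}

int main()
{
  const size_t embedDim = 8;  // embedding (model) dimension
  const size_t seqLen = 4;    // sequence length

  // Hypothetical query, key, and value matrices, one column per token.
  arma::mat query(embedDim, seqLen, arma::fill::randn);
  arma::mat key(embedDim, seqLen, arma::fill::randn);
  arma::mat value(embedDim, seqLen, arma::fill::randn);

  // Scaled dot-product scores: entry (i, j) measures how much
  // query j attends to key i.
  arma::mat scores = (key.t() * query) / std::sqrt((double) embedDim);

  // Normalize the scores over the keys; columns correspond to queries.
  arma::mat weights = ColSoftmax(scores);

  // Each query position receives a weighted combination of the values.
  arma::mat output = value * weights;  // embedDim x seqLen

  output.print("Attention output:");
  return 0;
}
```

Because the output depends on all three of query, key, and value, the Backward function has to propagate the incoming gradient through the softmax and back to each of those inputs separately, which is what makes it trickier than a single-input layer.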
I also encountered an issue with the initialization classes in ann. You can see pull request #2404. It will most probably be sorted out in a day or two, and the pull request should then be ready to be merged.
I will keep you posted next week. See you soon. Be Healthy! Be Safe!