novice ai researcher & engineer

Joined August 2025
Today I learned about git rebase, here is a mini-thread about it:
Why rebase?
- Cleaner history
- Easier to read logs
- Looks like your work was always based on the newest main
But be careful!!! Rebase rewrites history: every replayed commit gets a new hash. Never rebase shared branches. Safe rule: only rebase your own local branches.
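To see why rebase "rewrites history", here is a toy model in Python. It is only a sketch: `commit_id` and `build` are hypothetical helpers, and the hash here depends only on the parent id and the message, whereas a real git commit hash also covers the tree, author, and timestamps. The point it shows is that replaying the same changes onto a new base gives every commit a new id.

```python
import hashlib

def commit_id(parent_id, message):
    """Toy commit id: a hash of the parent's id plus the message (assumption, not git's real format)."""
    return hashlib.sha1(f"{parent_id}:{message}".encode()).hexdigest()

def build(base_id, messages):
    """Stack one commit per message on top of base_id and return the ids in order."""
    ids = []
    parent = base_id
    for msg in messages:
        parent = commit_id(parent, msg)
        ids.append(parent)
    return ids

root = commit_id("", "initial commit")

# main moved ahead while we were working on our feature branch
main = build(root, ["main: fix typo", "main: add tests"])

feature_msgs = ["feature: add parser", "feature: handle errors"]
feature_before = build(root, feature_msgs)      # branched from the old base
feature_after = build(main[-1], feature_msgs)   # "rebased": replayed onto newest main

for before, after, msg in zip(feature_before, feature_after, feature_msgs):
    print(msg, before[:7], "->", after[:7])
```

Same messages, different ids. Anyone who already pulled the old commits now has a history that no longer matches yours, which is why shared branches are off limits.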
OK, so I get that there are two main approaches to solving an RL problem: value-based methods and policy-based methods. But is there a rule of thumb for choosing between them?
Let me explain the attention mechanism in Transformers really simply. Let's say there is this sentence: "The animal didn't cross the street because it was too tired." We want the model to focus on "the animal" when processing "it" - like humans do.
Finally, we use these weights to combine the Value vectors from all words. This gives "it" a context vector - a smart blend of relevant info. In our example, the weight for "animal" will be the largest, so most of the context vector for "it" comes from "animal’s" Value.
And that’s attention:
1. Q asks the question
2. K says “this is what I am”
3. The dot product of Q and K gives a score
4. Softmax turns the scores into weights, deciding who to listen to most
5. V provides the answer, blended using those weights
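The steps above can be sketched in a few lines of plain Python. This is a minimal sketch, not a real Transformer layer: the 2-dimensional Q, K, and V vectors are made up by hand so that "animal" wins, and the division by sqrt(d_k) is an addition taken from standard scaled dot-product attention that the thread does not mention.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query attention: score each key, softmax, then blend the values."""
    d_k = len(query)
    scores = [dot(query, k) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hand-picked toy vectors (assumptions for illustration only)
words = ["animal", "street", "it"]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_it = [2.0, 0.0]  # the Query vector for "it"

weights, context = attend(query_it, keys, values)
most_attended = words[weights.index(max(weights))]
print(dict(zip(words, weights)), context)
```

With these vectors, the Key for "animal" lines up best with the Query for "it", so "animal" gets the largest weight and contributes most of the context vector.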
Why do we need a value function when we can just try to maximize the reward?
tl;dr
- unfairness: someone who worked hard and improved a lot should not be penalized just because their absolute reward is low
- instability: we should aim for a stable 90, not a one-time 100 from an extreme strategy
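One way to read the "unfairness" point is through the advantage: instead of judging an action by its raw reward, compare the reward to what the value function expected from that situation. The sketch below assumes that reading; the task names, baseline values, and rewards are all invented for illustration.

```python
def advantage(reward, baseline):
    """How much better (or worse) the outcome was than the value function expected."""
    return reward - baseline

# Hypothetical value estimates: what we usually score in each situation
baselines = {"hard_task": 20.0, "easy_task": 90.0}

# Hypothetical outcomes: (situation, reward actually received)
episodes = [("hard_task", 40.0), ("easy_task", 85.0)]

advantages = {state: advantage(r, baselines[state]) for state, r in episodes}
print(advantages)
```

By raw reward, 85 beats 40. By advantage, the hard task did better than expected and gets reinforced, while the easy task fell short of its usual 90 and gets a negative signal.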