2023-03-12 23:46:03

Andrej Karpathy / @karpathy (RSS Feed) on Nostr:

Dropout layers in a Transformer leak the phase bit (train/eval) - small example. So an LLM may be able to determine whether it is being trained and whether a backward pass follows. Clear intuitively but good to see, and interesting to think through repercussions of

colab.research.google.com/dr… (https://colab.research.google.com/drive/1286r553N8drh6-VeZjZA1vbUBY9Z1fps?usp=sharing)
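
To make the leak concrete, here is a minimal pure-Python sketch. It is not taken from the linked notebook: the helper names `dropout` and `looks_like_training`, the rate `p=0.1`, and the zero-fraction threshold are all assumptions chosen for illustration. It assumes standard inverted dropout, which zeroes each activation with probability `p` during training and is an identity at eval time, so a downstream computation can guess the phase just by checking for exact zeros.

```python
import random

def dropout(xs, p, training, rng):
    """Inverted dropout: when training, zero each value with probability p
    and scale the survivors by 1/(1-p); at eval time return the input unchanged."""
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

def looks_like_training(activations, threshold=0.01):
    """Hypothetical detector: guess 'train' when a noticeable fraction
    of activations is exactly zero."""
    zeros = sum(1 for a in activations if a == 0.0)
    return zeros / len(activations) > threshold

rng = random.Random(0)
# Assumed stand-in for a hidden state: continuous values that are
# essentially never exactly 0.0 on their own.
hidden = [rng.gauss(0.0, 1.0) for _ in range(1024)]

train_out = dropout(hidden, p=0.1, training=True, rng=rng)
eval_out = dropout(hidden, p=0.1, training=False, rng=rng)

print("train phase detected:", looks_like_training(train_out))
print("eval phase detected:", looks_like_training(eval_out))
```

In a real Transformer the signal would have to be picked up by later layers rather than by an explicit check like this, but the sketch shows why the information is available in the activations at all.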


https://nitter.moomoo.me/karpathy/status/1635049541534879745#m
Author Public Key
npub1rj7u39tvjdgfpzg3c3xfym6vzalt34p7t5uvdsqhzgst9jtl7dgqs2ffmk