Author: naitian.org (did:plc:35r6coyqassnzhfg276rcqib)

Record

uri:
"at://did:plc:35r6coyqassnzhfg276rcqib/app.bsky.feed.post/3l42byq5you2a"
cid:
"bafyreig7mpdbkozrrzgesm7gp6kklfyyc7i6ldgty527xrhcgvnre2cq3y"
value:
text:
"FWIW, I don’t really buy that argument because the model isn’t just trained on reproducing previous rationales, but through an RL process with an external reward (but again, that’s been true since RLHF)"
$type:
"app.bsky.feed.post"
langs:
  • "en"
reply:
root:
cid:
"bafyreibahbpz5pc6g4svmlmwgfqdcjzvpqr4cbm25rqgclsk4nly6ahhma"
parent:
cid:
"bafyreib4vptgrmrjoyatkn4slgmvdsipna222o4l3m6fz3itchkxcoi5ae"
createdAt:
"2024-09-13T15:36:41.529Z"