cid: "bafyreig7mpdbkozrrzgesm7gp6kklfyyc7i6ldgty527xrhcgvnre2cq3y"
value:
  text: "FWIW, I don’t really buy that argument because the model isn’t just trained on reproducing previous rationales, but through an RL process with an external reward (but again, that’s been true since RLHF)"
  $type: "app.bsky.feed.post"
  langs:
    "en"
  reply:
    root:
      cid: "bafyreibahbpz5pc6g4svmlmwgfqdcjzvpqr4cbm25rqgclsk4nly6ahhma"
    parent:
      cid: "bafyreib4vptgrmrjoyatkn4slgmvdsipna222o4l3m6fz3itchkxcoi5ae"
  createdAt: "2024-09-13T15:36:41.529Z"
success: true
identity:
  @context:
  alsoKnownAs:
  verificationMethod:
    type: "Multikey"
    controller: "did:plc:35r6coyqassnzhfg276rcqib"
    publicKeyMultibase: "zQ3shSDQuVq5nQmytjnoDZoFq6LWGPuo9HGzv6XnnvJT81S1A"
  service:
    id: "#atproto_pds"
    type: "AtprotoPersonalDataServer"
    serviceEndpoint: "https://morel.us-east.host.bsky.network"
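
A record like the one above can be fetched directly from the listed PDS via the standard `com.atproto.repo.getRecord` XRPC endpoint. A minimal sketch in Python, assuming the post's record key (`rkey`, which does not appear in the dump above) is known; the `RKEY` value here is a hypothetical placeholder:

```python
import urllib.parse

# Values taken from the record/identity dump above.
PDS = "https://morel.us-east.host.bsky.network"
DID = "did:plc:35r6coyqassnzhfg276rcqib"
COLLECTION = "app.bsky.feed.post"
RKEY = "3l3example"  # hypothetical -- the real rkey is not shown in the dump

def get_record_url(pds: str, repo: str, collection: str, rkey: str) -> str:
    """Build the com.atproto.repo.getRecord XRPC URL for one record."""
    query = urllib.parse.urlencode(
        {"repo": repo, "collection": collection, "rkey": rkey}
    )
    return f"{pds}/xrpc/com.atproto.repo.getRecord?{query}"

# An HTTP GET on this URL returns JSON with the record's uri, cid,
# and value (text, $type, langs, reply, createdAt), matching the dump.
url = get_record_url(PDS, DID, COLLECTION, RKEY)
print(url)
```

The same request works against any PDS; the `serviceEndpoint` in the identity's `#atproto_pds` service entry tells a client which host actually stores this repo.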