A new benchmark paper has created quite an uproar in the community. TruthfulQA is a dataset of 817 questions probing for imitative falsehoods, on which language models become less truthful the larger they get. This counter-intuitive finding seems to validate many people's criticisms of large language models, but is it really the correct conclusion?
0:00 - Intro
0:30 - Twitter Paper Announcement
4:10 - Large Language Models are to blame!
5:50 - How was the dataset constructed?
9:25 - The questions are adversarial
12:30 - Are you surprised?!