Multimodal datasets for emotion/sentiment recognition

Good morning, community. Does anyone know of multimodal datasets (image/video + text) for emotion or sentiment recognition? I found some, but not many that combine text and visual information. My findings so far are listed below. Any resources are greatly appreciated. Thanks!

  • Emotions
    • Emotionet
    • Task (image, set of keywords) → emotion
    • Datasets:
      • StockEmotion
      • DE [52]: 23K images + 8 emotions
      • UBE [33]: 3K images + 6 emotions
      • AffectNet [32]: 400K facial images + 10 categories
      • SE30K8: Subset of StockEmotion dataset
      • EMOTIC: 18K images + 26 categories
      • EMOTIC-B [21]
      • EMOTIC-I [21]
    • Other baselines:
      • DeepEmotion [52]
      • UnBiasedEmotion [33]
      • Kosti et al. [21]
  • Intent prediction
  • Human multimodal emotion recognition
    • CMU-MOSI [24]: 2K monologue video clips + sentiment scores ranging from -3 (strongly negative) to +3 (strongly positive)
    • CMU-MOSEI [23]: 22K video clips of YouTube monologues + sentiment scores ranging from -3 to +3
    • IEMOCAP [2]: 4K video clips + 4 emotions (happy, sad, angry, neutral)
  • Distributional visual emotion datasets
    • Flickr LDL: 11K images + 8 emotions
    • Twitter LDL [39]: 10K images + 8 emotions
    • Abstract Paintings [21]: 8 emotions
  • Multimedia emotion causality
    • SENDv1: Video clips about people sharing life stories + valence ratings
    • MovieGraphs: movie clips about social situations + actor relationships + 8 emotions
    • LIRIS-ACCEDE: movie excerpts annotated for induced valence and arousal
  • Video emotion recognition
    • SEWA [32]: emotion dataset with valence and arousal
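As a side note for the CMU-MOSI/MOSEI entries above: sentiment is annotated as a continuous score in [-3, +3], and papers commonly evaluate on binarized (positive/negative) or 7-class discretizations of that score. A minimal preprocessing sketch (the helper name `bin_sentiment` is my own, not part of any dataset SDK):

```python
from typing import Tuple

def bin_sentiment(score: float) -> Tuple[int, str]:
    """Map a continuous CMU-MOSI/MOSEI-style sentiment score in [-3, 3]
    to a 7-class integer label and a coarse polarity string."""
    score = max(-3.0, min(3.0, score))  # clamp to the annotated range
    seven_class = int(round(score))     # 7-class setup: labels -3 .. +3
    polarity = (
        "negative" if score < 0
        else "positive" if score > 0
        else "neutral"
    )
    return seven_class, polarity

# Example: a clip annotated at 2.4 falls in class +2, polarity "positive".
```

Note that Python's `round` uses banker's rounding at exact .5 boundaries, which matters only for the handful of scores annotated exactly on a half-integer.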