Evaluating Image Hallucination in Text-to-Image (TTI) Generative Models with I-HallA via PaliGemma
I-HallA via PaliGemma ✨
This is an unofficial, PaliGemma-based implementation of the paper Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering.
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
Youngsun Lim, Hojun Choi, Pin-Yu Chen, Hyunjung Shim [Paper] [Supp] [Project page (TBD)] [BibTeX]
Installation
This project is based on PaliGemma.
It requires the following Python version and packages:
python==3.10.0
transformers==4.45.1
numpy==1.26.3
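To confirm that an environment matches these pins before running the evaluation, a small check along the following lines may help. It is a minimal sketch rather than part of this project: `PINNED`, `PYTHON_PIN`, and `find_mismatches` are hypothetical names introduced here, and only the standard-library `importlib.metadata` is used.

```python
import sys
from importlib import metadata

# Pins copied from the requirements listed above (names are illustrative).
PINNED = {"transformers": "4.45.1", "numpy": "1.26.3"}
PYTHON_PIN = (3, 10)


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != PYTHON_PIN:
        print(f"Python 3.10 expected, found {sys.version.split()[0]}")
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

The `get_version` parameter exists only so the lookup can be replaced in tests; in normal use the default is sufficient.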
Testing
PaliGemma
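The PaliGemma-based implementation ranks the models in the same order as the GPT-4o results reported in the paper, with I-HallA Scores that are 0.100 lower in every cell. I-HallA Score† is not reported for PaliGemma (shown as -).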
Models | I-HallA Score (Science) | I-HallA Score (History) | I-HallA Score† (Science) | I-HallA Score† (History) |
---|---|---|---|---|
SD v1.4 | 0.253 | 0.435 | - | - |
SD v1.5 | 0.209 | 0.433 | - | - |
SD v2.0 | 0.236 | 0.440 | - | - |
SD XL | 0.298 | 0.479 | - | - |
DallE-3 | 0.561 | 0.566 | - | - |
Factual | 0.756 | 0.773 | - | - |
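For readers unfamiliar with how such scores are aggregated, the sketch below shows one plausible computation from per-question results. It rests on an assumption that this README does not confirm: that I-HallA Score is the mean fraction of correctly answered questions per image, and that the † variant credits an image only when every question is answered correctly. The function name `ihalla_scores` and the input layout are hypothetical.

```python
def ihalla_scores(results):
    """Aggregate per-question correctness into (score, strict_score).

    results: one list of booleans per image, True when the VQA model
    answered that question correctly. Both definitions are assumptions
    made for illustration, not taken from this project's code.
    """
    if not results or any(len(answers) == 0 for answers in results):
        raise ValueError("every image needs at least one question")

    # Assumed I-HallA Score: fraction of correct answers, averaged over images.
    per_image = [sum(answers) / len(answers) for answers in results]
    # Assumed I-HallA Score†: an image counts only if all answers are correct.
    strict = [1.0 if all(answers) else 0.0 for answers in results]

    n = len(results)
    return sum(per_image) / n, sum(strict) / n


# Three images with five questions each (made-up correctness values).
example = [
    [True, True, False, True, True],
    [True, True, True, True, True],
    [False, True, False, False, True],
]
score, strict_score = ihalla_scores(example)
print(f"I-HallA Score: {score:.3f}, strict variant: {strict_score:.3f}")
```

Under these assumed definitions the strict variant can never exceed the plain score, which is consistent with the † columns being lower in the GPT-4o table.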
GPT-4o
Models | I-HallA Score (Science) | I-HallA Score (History) | I-HallA Score† (Science) | I-HallA Score† (History) |
---|---|---|---|---|
SD v1.4 | 0.353 ± 0.002 | 0.535 ± 0.013 | 0.033 ± 0.012 | 0.110 ± 0.010 |
SD v1.5 | 0.309 ± 0.011 | 0.533 ± 0.004 | 0.030 ± 0.017 | 0.117 ± 0.021 |
SD v2.0 | 0.336 ± 0.006 | 0.540 ± 0.014 | 0.027 ± 0.021 | 0.120 ± 0.010 |
SD XL | 0.398 ± 0.015 | 0.579 ± 0.012 | 0.077 ± 0.050 | 0.110 ± 0.066 |
DallE-3 | 0.661 ± 0.020 | 0.666 ± 0.003 | 0.227 ± 0.029 | 0.133 ± 0.031 |
Factual | 0.856 ± 0.002 | 0.873 ± 0.006 | 0.517 ± 0.038 | 0.533 ± 0.015 |
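The two tables can be compared directly. The sketch below copies the I-HallA Score columns from both tables (GPT-4o means only, with the ± terms dropped) and computes the per-model gap and the model ranking. The names `PALIGEMMA`, `GPT4O`, `score_gaps`, and `ranking` are illustrative and not part of this project.

```python
# (science, history) I-HallA Scores copied from the two tables above.
PALIGEMMA = {
    "SD v1.4": (0.253, 0.435),
    "SD v1.5": (0.209, 0.433),
    "SD v2.0": (0.236, 0.440),
    "SD XL": (0.298, 0.479),
    "DallE-3": (0.561, 0.566),
    "Factual": (0.756, 0.773),
}
GPT4O = {
    "SD v1.4": (0.353, 0.535),
    "SD v1.5": (0.309, 0.533),
    "SD v2.0": (0.336, 0.540),
    "SD XL": (0.398, 0.579),
    "DallE-3": (0.661, 0.666),
    "Factual": (0.856, 0.873),
}


def score_gaps(ours, reference):
    """Return reference minus ours per model, rounded to three decimals."""
    return {
        model: tuple(round(r - o, 3) for o, r in zip(ours[model], reference[model]))
        for model in ours
    }


def ranking(scores, column):
    """Return model names ordered from lowest to highest score in one column."""
    return sorted(scores, key=lambda model: scores[model][column])


if __name__ == "__main__":
    for model, gap in score_gaps(PALIGEMMA, GPT4O).items():
        print(f"{model}: science gap {gap[0]}, history gap {gap[1]}")
    for column, name in enumerate(("science", "history")):
        same = ranking(PALIGEMMA, column) == ranking(GPT4O, column)
        print(f"{name} ranking identical: {same}")
```

With the table values, every gap rounds to 0.1 and both evaluators order the models identically in each category.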
Citation
@inproceedings{ihalla,
title={Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering},
author={Lim, Youngsun and Choi, Hojun and Chen, Pin-Yu and Shim, Hyunjung},
year={2024},
booktitle={arXiv},
}