Ana Carolina Condez, Diogo Tavares, João Magalhães
NOVA School of Science and Technology (FCT NOVA), NOVA LINCS — Lisbon, Portugal
Our model aligns multimodal representations across five fundamental moral dimensions, each with opposing virtue–vice pairs.
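In standard MFT terminology (the exact label set used by MoralCLIP may differ), the five foundations and their opposing vices can be listed as a simple mapping:

# The five Moral Foundations with their opposing virtue-vice poles,
# in standard MFT terminology; label names in the released dataset may differ.
MORAL_FOUNDATIONS = {
    "care": "harm",
    "fairness": "cheating",
    "loyalty": "betrayal",
    "authority": "subversion",
    "purity": "degradation",
}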
MoralCLIP extends multimodal learning with explicit moral grounding based on Moral Foundations Theory (MFT). By integrating visual and textual moral cues into a unified embedding space, the model aligns inputs by shared moral meaning rather than by semantic similarity alone, enabling morally aware cross-modal retrieval and analysis.
See the full abstract in the paper.
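The paper defines the exact training objective; as a rough sketch of the idea, a CLIP-style contrastive loss can be augmented with a term that pushes cross-modal similarity toward agreement on foundation labels. The snippet below is a minimal illustration, assuming L2-normalized embeddings, multi-hot foundation labels, and a hypothetical moral_weight hyperparameter; it is not the released implementation.

import torch
import torch.nn.functional as F

def moral_clip_loss(img_emb, txt_emb, moral_labels,
                    temperature=0.07, moral_weight=0.5):
    """Illustrative only: CLIP-style InfoNCE plus a moral-alignment term.

    img_emb, txt_emb: (N, D) L2-normalized image/text embeddings.
    moral_labels:     (N, 5) multi-hot vectors over the five foundations.
    moral_weight:     hypothetical hyperparameter balancing the two terms.
    """
    logits = img_emb @ txt_emb.t() / temperature  # (N, N) cross-modal similarities
    targets = torch.arange(len(img_emb), device=img_emb.device)

    # Standard symmetric CLIP contrastive loss over matched pairs.
    clip_loss = (F.cross_entropy(logits, targets)
                 + F.cross_entropy(logits.t(), targets)) / 2

    # Moral agreement between examples: Jaccard overlap of foundation labels.
    m = moral_labels.float()
    inter = m @ m.t()                                     # (N, N) shared labels
    union = (m.sum(1, keepdim=True) + m.sum(1) - inter).clamp(min=1)
    moral_sim = inter / union                             # (N, N) in [0, 1]

    # Encourage cross-modal similarity to track moral-label agreement.
    moral_loss = F.mse_loss(torch.sigmoid(logits), moral_sim)

    return clip_loss + moral_weight * moral_loss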
Coming Soon
The MoralCLIP dataset provides multi-label annotations for the five Moral Foundations (care, fairness, loyalty, authority, purity) across image–text pairs. It is designed for training and evaluating morally aware multimodal models.
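As an illustration of how such multi-label annotations might be consumed, each pair's foundations can be encoded as a 5-dimensional multi-hot vector (the helper below is hypothetical; consult the released dataset for its actual schema):

FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "purity"]

def encode_foundations(active):
    """Encode a set of active foundations as a 5-dim multi-hot list."""
    return [1 if f in active else 0 for f in FOUNDATIONS]

# e.g., an image-text pair annotated with care and purity:
encode_foundations({"care", "purity"})  # -> [1, 0, 0, 0, 1]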
If you use MoralCLIP in your research, please cite:
@inproceedings{10.1145/3746027.3758166,
author = {Condez, Ana Carolina and Tavares, Diogo and Magalh\~{a}es, Jo\~{a}o},
title = {MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory},
year = {2025},
isbn = {9798400720352},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3746027.3758166},
doi = {10.1145/3746027.3758166},
booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia},
pages = {12399--12408},
numpages = {10},
keywords = {ai, clip, ethics, mft, moral, moral foundations, moralclip},
location = {Dublin, Ireland},
series = {MM '25}
}
Code and models will be released under a permissive research license. Portions of the dataset build on SMID (Crone et al., 2018) annotations; please consult the original licenses for any third-party data.