
Visual-semantic domain adaptation in Digital Humanities

While several approaches to bring vision and language together are emerging, none of them has yet addressed the digital humanities domain, which, nevertheless, is a rich source of visual and textual data. To foster research in this direction, we investigate the learning of visual-semantic embeddings for historical document illustrations and data from the digital humanities, devising both supervised and semi-supervised approaches.



We have developed a domain adaptation model for cross-modal retrieval, in which the knowledge learned from a supervised dataset can be transferred to a target dataset where the pairing between images and sentences is either unknown or unusable for training because the set is too small. In a nutshell, two auto-encoders process visual and textual data and produce an intermediate representation for each modality. These representations are used to build a common embedding space into which images and their corresponding sentences can be projected and compared. A semi-supervised visual-semantic alignment then matches images and captions from a target domain that differs from the one used to train the model.
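To make the idea of a common embedding space concrete, the following is a minimal sketch of cross-modal retrieval by cosine similarity. It is an illustration under stated assumptions and not the model described above: simple linear projections stand in for the auto-encoders, the features are random toy data, and all names (project, retrieve, W_img, W_txt) are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products equal cosine similarities.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def project(features, weights):
    # Assumption: a linear map stands in for a learned encoder.
    return l2_normalize(features @ weights)

def retrieve(query_emb, gallery_emb, k=1):
    # Rank gallery items by cosine similarity to each query, best first.
    sims = query_emb @ gallery_emb.T
    return np.argsort(-sims, axis=1)[:, :k]

rng = np.random.default_rng(0)

# Toy data: 4 "images" with 8-dimensional features.
image_feats = rng.normal(size=(4, 8))
# Toy "captions": noisy copies of the image features, so row i pairs with row i.
text_feats = image_feats + 0.01 * rng.normal(size=(4, 8))

# Hypothetical projection matrices into a 5-dimensional common space.
W_img = rng.normal(size=(8, 5))
W_txt = W_img.copy()

img_emb = project(image_feats, W_img)
txt_emb = project(text_feats, W_txt)

# Sentence-to-image retrieval: for each caption, the index of the closest image.
top1 = retrieve(txt_emb, img_emb, k=1)
print(top1[:, 0])
```

In a real system the two projections would be learned separately for each modality, and the retrieval step would stay the same: once both modalities live in one space, matching reduces to a nearest-neighbour search.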

EsteArtworks dataset

The EsteArtworks dataset contains 553 artworks from the Este family collection, which comprises Italian paintings and sculptures from the fourteenth to the eighteenth centuries. For each artwork, we collected at least one sentence describing its visual content, without relying on personal cultural background regarding the work or the depicted characters. Overall, the dataset contains 1,278 textual descriptions.

Publications

1. Carraggi, Angelo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. "Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach". European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8-14 September 2018.
2. Baraldi, Lorenzo; Cornia, Marcella; Grana, Costantino; Cucchiara, Rita. "Aligning Text and Document Illustrations: towards Visually Explainable Digital Humanities". Proceedings of the 24th International Conference on Pattern Recognition, Beijing, China, pp. 1097-1102, 20-24 August 2018. DOI: 10.1109/ICPR.2018.8545064
