Contents Science Lab

Nagoya University Graduate School of Informatics

Project: Image captioning considering imageability

In this research, we aim to generate image captions tailored to how they will actually be used. To achieve this, we incorporate psycholinguistic measures into caption generation.

For example, for visually impaired people to understand an image, captions that describe its content in as much detail as possible are preferred. For images in news articles, on the other hand, captions that reflect the content of the article are preferred over a description of the image itself.

Image captions are used in many different situations in the real world, and the desired properties differ from one application to another. To generate captions suited to each, Ide Laboratory is working on image captioning that can freely adjust the level of detail with which the image content is described.

To this end, we consider “imageability”, a measure of how easily the content of a word can be imagined. By incorporating it into caption generation, the caption can be tailored to an intended degree of visualness. With the resulting model, if you input an image and specify a low imageability value, a concise caption is generated. In contrast, if you specify a high value, a caption that describes the image content in detail is generated.
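As a rough illustration of what controlling a caption by an imageability value means, the toy sketch below scores a few candidate captions and picks the one closest to a requested value. It is not the model from the publications listed on this page: the word scores, the 1-to-7 scale, the averaging rule, and the names `WORD_IMAGEABILITY`, `sentence_imageability`, and `pick_caption` are all assumptions made for this example, and the actual model generates the caption from the image rather than choosing from a fixed list.

```python
# Toy illustration only: hypothetical word scores and a simple averaging rule,
# not the captioning model described in the publications on this page.

# Hypothetical word-level imageability scores (1 = hard to picture, 7 = easy).
WORD_IMAGEABILITY = {
    "animal": 4.5,
    "dog": 6.5,
    "puppy": 6.7,
    "brown": 5.5,
    "grass": 6.2,
    "outside": 4.0,
    "runs": 4.8,
    "is": 1.5,
}


def sentence_imageability(caption: str) -> float:
    """Average the scores of the words found in the dictionary (assumed rule)."""
    scores = [
        WORD_IMAGEABILITY[word]
        for word in caption.lower().split()
        if word in WORD_IMAGEABILITY
    ]
    if not scores:
        raise ValueError("caption contains no scored words")
    return sum(scores) / len(scores)


def pick_caption(candidates: list[str], target: float) -> str:
    """Return the candidate whose estimated imageability is closest to the target."""
    return min(candidates, key=lambda c: abs(sentence_imageability(c) - target))


# Hypothetical candidate captions for one image, from generic to specific.
CANDIDATES = [
    "an animal is outside",
    "a dog runs on grass",
    "a brown puppy on grass",
]

if __name__ == "__main__":
    for target in (3.0, 7.0):
        print(target, "->", pick_caption(CANDIDATES, target))
```

With these made-up scores, a low target selects the generic caption and a high target selects the most visually specific one, which mirrors the behavior described above in a much simplified form.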


Main project members

Kazuki Umemura

Completed Master's degree in AY2020

Dr. Marc A. Kastner

Cooperative Research Fellow (Hiroshima City University)

Recent publications

    Imageability- and length-controllable image captioning
    Marc A. Kastner, Kazuki Umemura, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, Hiroshi Murase, Shin'ichi Satoh
    IEEE Access, vol.9, pp.162951-162961, November 2021.
    Tell as you imagine: Sentence imageability-aware image captioning
    Kazuki Umemura, Marc A. Kastner, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, Hiroshi Murase
    MultiMedia Modeling - 27th Int. Conf., MMM2021, Prague, Czech Republic, June 22-24, 2021, Procs., Part II, Jakub Lokoč, Tomáš Skopal, Klaus Schoeffmann, Vasileios Mezaris, Xirong Li, Stefanos Vrochidis, Ioannis Patras, eds., Lecture Notes in Computer Science, vol. 12573, pp.62-73, Online, June 2021.
    TBA (in Japanese)
    Kazuki Umemura, Marc Aurel Kastner, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, Hiroshi Murase
    Meeting on Image Recognition and Understanding (MIRU) 2020, no.IS3-2-1, Online, August 2020.
    A study on image captioning considering its imageability (in Japanese)
    Kazuki Umemura, Marc Aurel Kastner, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, Hiroshi Murase
    IEICE Tech. Rep. Media Experience and Virtual Environment, MVE2019-69; MVE Award, Online, March 2020.
    Estimating the imageability of a sentence for image caption evaluation
    Kazuki Umemura, Marc Aurel Kastner, Ichiro Ide, Yasutomo Kawanishi, Daisuke Deguchi, Hiroshi Murase
    Japan-Taiwan Joint Workshop on Multimedia and HCI, National Cheng-Kung University (Tainan, Taiwan), April 2019.
    TBA (in Japanese)
    Kazuki Umemura, Marc Aurel Kastner, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, Hiroshi Murase
    Proc. 25th ANLP Annual Meeting, no.A4-9; pp.755-758, Nagoya Univ, March 2019.


Last updated: 2024-10-10