One Picture and One Thousand Words: Toward integrated multimodal generative models / Zamparelli, Roberto. - In: IJCOL. - ISSN 2499-4553. - Electronic. - 10:2(2024), pp. 31-55. [10.17454/ijcol102.03]
One Picture and One Thousand Words: Toward integrated multimodal generative models
Zamparelli, Roberto
2024-01-01
Abstract
Thanks to independent advances in language and image generation, we could soon have systems that communicate with us by combining language and images in their output, a skill that humans do not possess (we receive images at high speed, but we do not produce them at that rate). This paper explores some implications of this idea: which kinds of datasets need to be developed to train such systems, in which cases language and images could be most usefully integrated, and which issues could arise on the image generation and language+image integration side. Story and dialogue illustration could be relatively low-hanging fruit for this technology, and a looped combination of image-to-text (I2T) LLMs and text-to-image (T2I) diffusion models is likely to play a role in solving some of the issues that arise in the design of such systems.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| ijcol-1432.pdf | Open access | Published version (publisher's layout) | Creative Commons | 1.04 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



