Demonstration results of multi-modal instruction. (IMAGE)
Caption
Demonstration results of multi-modal instruction. The first row shows the visual stimuli, and the second row shows our intermediate reconstructions. The results of manipulation with the instruction “In the [V] style” are shown in the red boxes in the last row. The images are sourced from Microsoft COCO.
Credit
Visual Intelligence, Tsinghua University Press
Usage Restrictions
News organizations may use or redistribute this image, with proper attribution, as part of news coverage of this paper only.
License
Original content