ICCV 2019 Workshop on

Cross-Modal Learning in Real World

Room 301 COEX Convention Center, Seoul, Korea
Saturday afternoon, November 2, 2019


To understand the world around us, an intelligent system needs to be able to interpret multimodal signals together, because humans routinely perform complex vision tasks that involve interactions among different modalities. With the rapid growth of multimodal data (e.g., image, video, audio, depth, IR, text, sketch, and synthetic data), cross-modal learning, which aims to develop techniques that can process and relate information across different modalities, has drawn increasing attention. It is a vibrant multidisciplinary field of increasing importance and extraordinary potential. It has been applied to many tasks, such as cross-modal retrieval, phrase localization, visual dialogue, visual captioning, visual question answering, and language-based person search, action detection, and semantic segmentation.

However, real-world applications pose various challenges to cross-modal learning, such as limited training data, multimodal content imbalance, large visual-semantic discrepancy, cross-dataset discrepancy, and missing modalities. To address these challenges, many approaches motivated by various perspectives (including visual attributes, data generation, and meta-learning) have been proposed and published in top conferences (including CVPR, ICCV, ECCV, NIPS and ICLR) and top journals (including TPAMI and IJCV). Nevertheless, these challenges are far from solved. The goal of this workshop is to encourage researchers to present high-quality work and to facilitate effective discussion of potential solutions to these challenges.


We encourage researchers to study and develop new cross-modal learning methods that address various practical challenges while exhibiting good discriminability and robustness in real applications. We are soliciting original contributions including, but not limited to:

1) Cross-modal representation, e.g., hybrid feature representations and adversarial sample generation.

2) Cross-modal generation, e.g., visual captioning, text-based image/video generation and visual question answering.

3) Zero/few-shot cross-modal learning, e.g., zero/few-shot cross-modal retrieval/localization.

4) Cross-modal alignment, e.g., referring expression, dense cross-modal retrieval and phrase localization.

5) Cross-modal fusion, e.g., audio-visual speech recognition and text-based image classification.

6) Cross-dataset adaptation in cross-modal learning, e.g., cross-dataset generalization.

7) Binary cross-modal learning, e.g., binary cross-modal retrieval.

8) Unsupervised/semi-supervised cross-modal learning.

9) New applications of existing cross-modal learning methods.

Paper Submission

All submissions will be handled electronically via the workshop’s CMT website. The submission site is: https://cmt3.research.microsoft.com/CroMoL2019.

Authors should submit full-length papers (ICCV format) online, including:

1) The title of the paper and a short abstract summarizing the main contribution,

2) The names and contact information of all authors, specifying the contact author,

3) The paper in PDF format.

Contributions must be written and presented in English.

All submissions will be peer-reviewed by at least 3 members of the program committee.

Event                              Date
Paper Submission Deadline          August 1, 2019
Notification of Acceptance         August 25, 2019
Camera-ready due                   August 30, 2019
Workshop (Half day, afternoon)     November 2, 2019



13:00 - 13:05 . Welcome and Introduction

13:10 - 13:50 . Oral Session (2 presentations: 20 min each)

13:55 - 14:35 . Invited Talk (Talk 1)

14:40 - 15:20 . Invited Talk (Talk 2)

15:25 - 16:05 . Invited Talk (Talk 3)

16:10 - 17:00 . Poster Session (12 posters)

17:00 - 17:05 . Closing Remarks


Qi Wu
University of Adelaide
Li Liu
NUDT & University of Oulu
Matti Pietikäinen
University of Oulu

Please contact Yan Huang if you have any questions. The webpage template is courtesy of the awesome Georgia.