ICCV 2019 Workshop on Cross-Modal Learning in Real World
Room 307A, COEX Convention Center, Seoul, Korea
To understand the world around us better, intelligent systems need to interpret multimodal signals jointly, since humans routinely perform complex vision tasks that involve interactions among different modalities. With the rapid growth of multimodal data (e.g., image, video, audio, depth, IR, text, sketch, and synthetic data), cross-modal learning, which aims to develop techniques that process and relate information across different modalities, has drawn increasing attention recently. It is a vibrant multidisciplinary field of growing importance and extraordinary potential, and it has been widely applied to tasks such as cross-modal retrieval, phrase localization, visual dialogue, visual captioning, visual question answering, and language-based person search, action detection, and semantic segmentation.
However, real-world applications pose various challenges to cross-modal learning, such as limited training data, multimodal content imbalance, large visual-semantic discrepancy, cross-dataset discrepancy, and missing modalities. To address these challenges, many approaches motivated by different perspectives (including visual attributes, data generation, and meta-learning) have appeared in top conferences (including CVPR, ICCV, ECCV, NIPS, and ICLR) and top journals (including TPAMI and IJCV). Nevertheless, these challenges are far from solved. The goal of this workshop is to encourage researchers to present high-quality work and to facilitate effective discussions on potential solutions to these challenges.
We encourage researchers to study and develop new cross-modal learning methods that address these practical challenges while exhibiting good discriminability and robustness in real applications. We solicit original contributions including, but not limited to:
1) Cross-modal representation, e.g., hybrid feature representations and adversarial sample generation.
2) Cross-modal generation, e.g., visual captioning, text-based image/video generation and visual question answering.
3) Zero/few-shot cross-modal learning, e.g., zero/few-shot cross-modal retrieval/localization.
4) Cross-modal alignment, e.g., referring expression, dense cross-modal retrieval and phrase localization.
5) Cross-modal fusion, e.g., audio-visual speech recognition and text-based image classification.
6) Cross-dataset adaptation in cross-modal learning, e.g., cross-dataset generalization.
7) Binary cross-modal learning, e.g., binary cross-modal retrieval.
8) Unsupervised/semi-supervised cross-modal learning.
9) New applications of existing cross-modal learning methods.
All submissions will be handled electronically via the workshop’s CMT Website. Click the following link to go to the submission site: https://cmt3.research.microsoft.com/CroMoL2019.
Authors should submit full-length papers (ICCV format) online, including:
1) Title of paper and short abstract summarizing the main contribution,
2) Names and contact info of all authors, also specifying the contact author,
3) The contribution, written and presented in English,
4) The paper in PDF format.
All submissions will be double-blind peer-reviewed by at least 3 members of the program committee.
Paper Submission Deadline | August 7, 2019
Notification of Acceptance | August 25, 2019
Camera-Ready Due | August 30, 2019
Workshop (half day, afternoon) | November 2, 2019
13:00 - 13:05 | Welcome Introduction
13:10 - 13:50 | Oral Session (2 presentations, 20 min each)
13:55 - 14:35 | Invited Talk (Bohyung Han)
14:40 - 15:20 | Invited Talk (Lior Wolf)
15:25 - 16:25 | Poster Session
16:30 - 17:10 | Invited Talk (Honglak Lee)
17:10 - 17:15 | Closing Remarks