Flood detection via Twitter streams using textual and visual features / Alam, F.; Hassan, Z.; Ahmad, K.; Gul, A.; Riegler, M. A.; Conci, N.; Al-Fuqaha, A. - Vol. 2882 (2020). (Paper presented at the Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020, held online in 2020.)
Flood detection via Twitter streams using textual and visual features
Alam F.; Hassan Z.; Ahmad K.; Conci N.
2020-01-01
Abstract
This paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task, which aims to analyze and detect flooding events in multimedia content shared on Twitter. In total, we proposed four solutions: a multi-modal solution combining textual and visual information for the mandatory run, and three single-modal image- and text-based solutions as optional runs. In the multi-modal method, we rely on a supervised multimodal bitransformer model that combines textual and visual features via early fusion, achieving a micro F1-score of .859 on the development data set. For text-based flood event detection, we use a transformer network (a pretrained Italian BERT model), achieving an F1-score of .853. For the image-based solutions, we employed multiple deep models pre-trained on the ImageNet and Places data sets, used individually and combined via early fusion, achieving F1-scores of .816 and .805 on the development data set, respectively.
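To make the multi-modal run concrete, below is a rough Python sketch of the supervised multimodal bitransformer idea (in the spirit of Kiela et al.'s MMBT): CNN image features are projected into the transformer's token-embedding space and prepended to the text tokens, so a single BERT encoder attends over both modalities, i.e., early fusion. The checkpoint names, the single-image-token design, and all sizes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models
from transformers import AutoModel, AutoTokenizer

class MiniBitransformer(nn.Module):
    """Toy early-fusion bitransformer: one projected image token + text tokens."""
    def __init__(self, text_model="dbmdz/bert-base-italian-uncased", num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(text_model)  # assumed Italian BERT checkpoint
        cnn = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        cnn.fc = nn.Identity()                             # keep the 512-d pooled features
        self.cnn = cnn
        hidden = self.bert.config.hidden_size
        self.img_proj = nn.Linear(512, hidden)             # image features -> token space
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask, images):
        # Look up the text token embeddings, project the image into the same
        # space, and prepend it so self-attention fuses both modalities early.
        text_emb = self.bert.embeddings.word_embeddings(input_ids)
        img_tok = self.img_proj(self.cnn(images)).unsqueeze(1)
        embeds = torch.cat([img_tok, text_emb], dim=1)
        img_mask = torch.ones(images.size(0), 1,
                              dtype=attention_mask.dtype, device=attention_mask.device)
        mask = torch.cat([img_mask, attention_mask], dim=1)
        out = self.bert(inputs_embeds=embeds, attention_mask=mask)
        return self.classifier(out.last_hidden_state[:, 0])  # classify from first position

# Usage sketch (untrained, so the logits are not yet meaningful):
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-uncased")
enc = tokenizer(["Fiume esondato vicino alla stazione"], return_tensors="pt")
logits = MiniBitransformer()(enc["input_ids"], enc["attention_mask"],
                             torch.randn(1, 3, 224, 224))
```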
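The text-only run can be sketched even more simply. The abstract only says "pretrained Italian BERT model", so the specific checkpoint below (the widely used dbmdz Italian BERT) is an assumption; the two-class head would need fine-tuning on the task's development data before its predictions mean anything.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dbmdz/bert-base-italian-uncased"  # assumed checkpoint, not stated in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2,  # flood-related vs. not flood-related
)
model.eval()

def classify(tweets):
    """Return 0/1 flood-relevance predictions for a list of tweet texts."""
    batch = tokenizer(tweets, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()

# Example call (meaningful only after fine-tuning on the task data):
print(classify(["Strade allagate in centro dopo il temporale"]))
```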
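Finally, for the image-based runs, a minimal sketch of the early-fusion variant follows: two CNN backbones, one pretrained on ImageNet and one on Places, whose pooled features are concatenated and fed to a linear head. The ResNet-50 choice and the head design are assumptions; torchvision ships ImageNet weights, while Places365 weights must be obtained from the Places project, so the second backbone here is only a placeholder.

```python
import torch
import torch.nn as nn
from torchvision import models

class EarlyFusionFloodClassifier(nn.Module):
    """Concatenate ImageNet- and Places-pretrained features, then classify."""
    def __init__(self, num_classes=2):
        super().__init__()
        imagenet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        places = models.resnet50()  # placeholder: load Places365 weights here in practice
        # Drop the classification heads; keep the 2048-d pooled features of each.
        imagenet.fc = nn.Identity()
        places.fc = nn.Identity()
        self.imagenet = imagenet
        self.places = places
        # Early fusion: concatenate the two feature vectors before the head.
        self.head = nn.Linear(2048 * 2, num_classes)

    def forward(self, images):
        fused = torch.cat([self.imagenet(images), self.places(images)], dim=1)
        return self.head(fused)

model = EarlyFusionFloodClassifier()
logits = model(torch.randn(4, 3, 224, 224))  # batch of 4 RGB images
print(logits.shape)                          # torch.Size([4, 2])
```

Dropping one backbone and halving the head's input recovers the single-model image runs the abstract reports alongside the fused one.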