
Exploring Vision Transformers on the Frontier Supercomputer for Remote Sensing and Geoscientific Applications

Padovani, Gabriele; Fiore, Sandro
2024-01-01

Abstract

The earth sciences research community has an unprecedented opportunity to exploit the vast amount of data available from earth observation (EO) satellites and earth system models (ESMs). The ascent and application of artificial intelligence foundation models (FMs) can be attributed to the availability of large volumes of curated data, access to extensive computing resources, and the maturity of deep learning techniques. Vision transformer (ViT) architectures have been adapted for image and image-like data, such as EO data and ESM simulation output. Pretraining foundation models is a compute-intensive process, often requiring 10^5–10^7 GPU hours for large-scale scientific applications. There is a limited body of knowledge on compute-optimal methods for pretraining, necessitating a trial-and-error process. We have performed a series of experiments using ViT backbones at different scales to understand optimal and cost-effective ways to improve scientific throughput. This preliminary benchmark provides an assessment of which architectures and model configurations are favorable in a given scientific context.
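For scale, the sketch below is added for illustration only: the ViT-Base/Large/Huge configurations are the standard variants from the vision transformer literature, not necessarily the exact backbones benchmarked in this paper, and the per-block parameter count (~12·d^2) is a common rule-of-thumb approximation.

# Back-of-the-envelope scales for ViT pretraining (illustrative sketch).
# Each transformer block holds roughly 12 * d^2 parameters:
# 4*d^2 for the attention projections plus 8*d^2 for the MLP.
variants = {
    "ViT-Base":  {"layers": 12, "hidden": 768},   # ~86M params in practice
    "ViT-Large": {"layers": 24, "hidden": 1024},  # ~307M
    "ViT-Huge":  {"layers": 32, "hidden": 1280},  # ~632M
}
for name, cfg in variants.items():
    approx_params = 12 * cfg["layers"] * cfg["hidden"] ** 2
    print(f"{name}: ~{approx_params / 1e6:.0f}M parameters")

# The quoted 10^5-10^7 GPU-hour pretraining budget maps to wall-clock
# time as budget / n_gpus, e.g. 10^6 GPU hours spread over 8,000 GPUs:
gpu_hours, n_gpus = 1e6, 8_000
print(f"~{gpu_hours / n_gpus:.0f} hours (~{gpu_hours / n_gpus / 24:.1f} days) on {n_gpus:,} GPUs")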
2024
IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium
345 E 47TH ST, NEW YORK, NY 10017 USA
Institute of Electrical and Electronics Engineers Inc.
ISBN: 9798350360325
Anantharaj, Valentine; Kurihana, Takuya; Dash, Sajal; Padovani, Gabriele; Fiore, Sandro
Exploring Vision Transformers on the Frontier Supercomputer for Remote Sensing and Geoscientific Applications / Anantharaj, Valentine; Kurihana, Takuya; Dash, Sajal; Padovani, Gabriele; Fiore, Sandro. - (2024), pp. 3085-3088. (2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024, Athens, Greece, 2024) [10.1109/igarss53475.2024.10640929].
Files in this item:

Exploring_Vision_Transformers_on_the_Frontier_Supercomputer_for_Remote_Sensing_and_Geoscientific_Applications.pdf
Access: Archive administrators only
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 863.49 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/451573
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: 1
  • OpenAlex: 0