Anantharaj, Valentine; Kurihana, Takuya; Dash, Sajal; Padovani, Gabriele; Fiore, Sandro. Exploring Vision Transformers on the Frontier Supercomputer for Remote Sensing and Geoscientific Applications. In: 2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2024), Athens, Greece, 2024, pp. 3085-3088. DOI: 10.1109/IGARSS53475.2024.10640929.
Exploring Vision Transformers on the Frontier Supercomputer for Remote Sensing and Geoscientific Applications
Padovani, Gabriele; Fiore, Sandro
2024-01-01
Abstract
The earth sciences research community has an unprecedented opportunity to exploit the vast amount of data available from earth observation (EO) satellites and earth system models (ESM). The ascent and application of artificial intelligence foundation models (FM) can be attributed to the availability of large volumes of curated data, access to extensive computing resources, and the maturity of deep learning techniques. Vision transformer (ViT) architectures have been adapted for image and image-like data, such as EO data and ESM simulation output. Pretraining foundation models is a compute-intensive process, often requiring 10^5 - 10^7 GPU hours for large-scale scientific applications. There is a limited body of knowledge on compute-optimal methods for pretraining, necessitating a trial-and-error process. We have performed a series of experiments using ViT backbones at different scales to understand optimal and cost-effective ways to improve scientific throughput. This preliminary benchmark provides an assessment of which architectures and model configurations are favorable in a given scientific context.
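For context on what "ViT backbones at different scales" means in practice, the standard ViT scales introduced by Dosovitskiy et al. differ primarily in encoder depth, embedding width, and head count. The minimal Python sketch below is illustrative only (the record does not list the exact configurations benchmarked on Frontier); it estimates encoder parameter counts from those three hyperparameters.

```python
# Illustrative sketch, not from the paper: approximate encoder parameter
# counts for the standard ViT scales (Dosovitskiy et al., 2021). The
# configurations actually benchmarked in this work are not listed in the
# record, so these serve only to show how scale is parameterized.

from dataclasses import dataclass


@dataclass
class ViTConfig:
    name: str
    layers: int        # number of transformer encoder blocks
    hidden: int        # embedding dimension
    heads: int         # attention heads
    mlp_ratio: int = 4  # MLP expansion factor

    def approx_params(self) -> int:
        # Each encoder block holds ~4*h^2 attention weights (Q, K, V, and
        # output projections) plus ~2*mlp_ratio*h^2 MLP weights; biases and
        # layer norms are negligible at these scales.
        per_block = 4 * self.hidden ** 2 + 2 * self.mlp_ratio * self.hidden ** 2
        return self.layers * per_block


SCALES = [
    ViTConfig("ViT-Base",  layers=12, hidden=768,  heads=12),
    ViTConfig("ViT-Large", layers=24, hidden=1024, heads=16),
    ViTConfig("ViT-Huge",  layers=32, hidden=1280, heads=16),
]

for cfg in SCALES:
    print(f"{cfg.name:9s} ~{cfg.approx_params() / 1e6:5.0f} M encoder params")
```

The estimates this prints (~85 M, ~302 M, ~629 M) closely track the published totals for ViT-Base, ViT-Large, and ViT-Huge (86 M, 307 M, 632 M), which is why pretraining cost grows so quickly across scales.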
File: Exploring_Vision_Transformers_on_the_Frontier_Supercomputer_for_Remote_Sensing_and_Geoscientific_Applications.pdf — publisher's version (Versione editoriale), all rights reserved, 863.49 kB, Adobe PDF; access restricted to archive administrators.