New frontiers in AI for biodiversity research and conservation with multimodal language models
- Zhongqi Miao
- Yuanhan Zhang
- Zalan Fabian
- Andres Hernandez Celis
- Sara Beery
- Chunyuan Li
- Ziwei Liu
- Amrita Gupta
- Md Nasir
- Wanhua Li
- Jason Holmberg
- Meredith Palmer
- Kaitlyn Gaynor
- Rahul Dodhia
- Juan M. Lavista Ferres
The integration of Artificial Intelligence (AI) into biodiversity research and conservation is growing rapidly, demonstrating great potential to reduce the intensive human labor required for data preprocessing and thereby facilitate larger data collections that offer ecological insights at unprecedented scales. However, most AI applications for biodiversity are still in the early stages of development, hindered by challenges inherent in real-world datasets and by the limited accessibility of these technologies to practitioners without extensive programming knowledge. The recent advent of multimodal language models, which can process and generate multiple data modalities, has significantly expanded the range of possible AI applications in biodiversity research. These models have demonstrated the ability to classify species and to recognize more complex concepts, such as animal postures and orientations, without prior exposure during training. Multimodal language models can also explain their predictions and interact with humans in natural language, making them more transparent, intuitive, and accessible to non-specialists. Despite these advancements, the use of multimodal language models for biodiversity still faces distinct barriers to application, including high computational and financial demands, reliance on prompt engineering for consistent model performance on large datasets, and insufficient open-source sharing of state-of-the-art methods. This paper explores the transformative potential of multimodal language models for biodiversity research in comparison with traditional machine learning methods and discusses several potential applications. We also discuss challenges to implementing these models in real-world conservation scenarios and propose directions for future research to overcome these hurdles.
Our goal is to encourage robust discussions and research into the integration of multimodal language models to advance AI for biodiversity research and conservation.