Resource

Paper: Large language models overcome the challenges of unstructured text data in ecology

Author/Contact:

Cesar Capinha, Universidade de Lisboa, Portugal.

cesarcapinha@edu.ulisboa.pt

Publication date:

Resource description:

A paper by Andry Castro et al. exploring the potential of large language models (LLMs) to overcome the challenges of unstructured text data in ecology.

The paper notes that manual processing of such data is labour-intensive and poses a significant challenge. In this study, Castro and team used three prompt-based LLMs (GPT-3.5, GPT-4 and LLaMA-2-70B) to automate the identification, interpretation, extraction and structuring of relevant ecological information from unstructured text sources. The study found that GPT-4 consistently outperformed the other models, often exceeding 90% accuracy (average accuracy ranged from 87% to 100%).

The results demonstrate the potential benefit of integrating prompt-based LLMs into ecological data assimilation workflows as essential tools to efficiently process large volumes of textual data.
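
The kind of prompt-based extraction described above can be illustrated with a minimal Python sketch. It is not the authors' code. The field names, the prompt wording, the example sentence and the `fake_llm` stand-in are all assumptions made for illustration; a real workflow would replace `fake_llm` with a call to an actual model.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical target fields, chosen only for this illustration.
FIELDS = ["species", "location", "year"]


def build_prompt(text: str, fields: List[str]) -> str:
    """Ask the model to reply with one JSON object holding the given keys."""
    keys = ", ".join(fields)
    return (
        "Extract the following fields from the report below and answer "
        f"with a single JSON object using exactly these keys: {keys}. "
        "Use null for anything the text does not state.\n\n"
        f"Report:\n{text}"
    )


def parse_response(raw: str, fields: List[str]) -> Dict[str, Any]:
    """Turn the model's reply into a fixed-schema record.

    Malformed or non-object replies yield a record of None values,
    so one bad answer does not break a batch run.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in fields}


def extract_record(
    text: str, llm: Callable[[str], str], fields: List[str] = FIELDS
) -> Dict[str, Any]:
    """Build the prompt, query the model, and structure its answer."""
    return parse_response(llm(build_prompt(text, fields)), fields)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); returns a canned reply."""
    return '{"species": "Vespa velutina", "location": "Lisbon", "year": 2023}'


# Invented example sentence, not taken from the paper's data.
record = extract_record(
    "A nest of Vespa velutina was reported in Lisbon in 2023.", fake_llm
)
print(record)
```

Keeping the model behind a plain callable means the same prompt and parser can be run against different models and their structured outputs compared field by field with manually extracted records.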

Castro, A., Pinto, J., Reino, L., Pipek, P., Capinha, C. (2024) Large language models overcome the challenges of unstructured text data in ecology. Universidade de Lisboa, Portugal.

Shared under Creative Commons licence CC BY-ND 4.0.

Partners:

Universidade de Lisboa, Portugal.

Licence:

  • Public/open source