Paper: large language models overcome the challenges of unstructured text data in ecology

Resource description:

A paper by Andry Castro et al. exploring the potential of large language models (LLMs) to overcome the challenges of unstructured text data in ecology.

The paper notes that manually processing such data is labour-intensive and poses a significant challenge. In this study, Castro and team used three prompt-based LLMs (GPT-3.5, GPT-4 and LLaMA-2-70B) to automate the identification, interpretation, extraction and structuring of relevant ecological information from unstructured text sources. The study found that GPT-4 consistently outperformed the other models, often exceeding 90% accuracy, with average accuracy ranging from 87% to 100%.

The results demonstrate the potential benefit of integrating prompt-based LLMs into ecological data assimilation workflows as essential tools to efficiently process large volumes of textual data.

Castro, A., Pinto, J., Reino, L., Pipek, P., Capinha, C. (2024) Large language models overcome the challenges of unstructured text data in ecology. Universidade de Lisboa, Portugal.

Shared under Creative Commons license CC-BY-ND 4.0.

Author/Contact:

Cesar Capinha, Universidade de Lisboa, Portugal.

cesarcapinha@edu.ulisboa.pt

Partners:

Universidade de Lisboa, Portugal.
