Using machine learning based emulators for the sensitivity analysis of process-driven biophysical models

PhD Thesis

Johnston, David B. 2022. Using machine learning based emulators for the sensitivity analysis of process-driven biophysical models. PhD Thesis, Doctor of Philosophy. University of Southern Queensland.

Type: PhD Thesis
Author: Johnston, David B.
1. First: A/Pr Keith Pembleton
2. Second: Prof Ravinesh Deo
3. Third: Neil I. Huth
Institution of Origin: University of Southern Queensland
Qualification Name: Doctor of Philosophy
Number of Pages: 172
Publisher: University of Southern Queensland
Place of Publication: Australia

Sensitivity Analysis (SA) is a versatile and well-established tool used in the development and application of computer models. Although considered an integral part of the modelling process in multiple disciplines, its use in the development of process-driven biophysical models is relatively rare. One contributing reason is the computational burden associated with performing SA on complex models. The literature reports examples of emulators, or metamodels, being used to reduce the computational burden of complex models, but there are no reports of machine learning based emulators being used to undertake SA of the underlying process-driven biophysical models. This doctoral thesis explores the potential of machine learning emulators (MLEs) to reduce the computational burden of performing SA on process-driven biophysical models. Firstly, a new method is developed that confirms that the variable importance indices of MLEs are comparable to the sensitivity indices produced by the commonly used Morris and Sobol methods. This provides the confidence needed to further investigate the role that MLEs might play in reducing the computational burden of SA. Secondly, three different machine learning (ML) algorithms are used to generate MLEs of the APSIM-NextGen chickpea model to evaluate whether some MLEs are better suited to the task of emulating process-driven biophysical crop models. The MLEs were assessed on the accuracy of their predicted values and on the computational effort required to develop the MLEs themselves. The emulators based on random forest models were shown to produce the most accurate predictions, but also required the most computational effort to develop and train. Thirdly, two MLEs are used to undertake SA of all 22 input parameters of the MLEs, as well as a selected subset of six input parameters linked to the phenology of the crop. These analyses required more than 40 million simulations to be run.
The MLEs were assessed based on their speed of execution, and on the Morris and Sobol indices produced. The impressive computational speed of the MLEs was quantified in comparison to the speed of the process-driven biophysical model. Some discrepancies were also noted between the results generated by the two types of MLE, so no firm conclusions could be drawn about the sensitivities of the underlying process-driven model. This work is at the juncture of the fields of process-driven biophysical model development, agronomy, plant physiology, machine learning emulators, and global sensitivity analysis. The outcomes of this work have implications for model development and model application in all these disciplines. Firstly, the Morris method remains a more computationally efficient choice, when compared with the development and use of MLEs, for screening the importance of parameters of process-driven models. Secondly, the results show that, while both Morris and Sobol analyses produce very similar results across different MLEs, the discrepancies indicate that great caution is needed when interpreting these results as a way of understanding the underlying process-driven model and its input-output sensitivities. The results suggest that by using the computational efficiency of an MLE, SA of large-scale simulation experiments becomes more feasible, and this can contribute to efficiency gains for scientific research. The SA of enhanced forms of simulation experiments produced by hybrid models, which use the outputs of process-driven models and combine these with other sources of data to create new forms of ML based agroecological models, is suggested by this research as a direction that could be pursued to advance agroecological modelling. This work has demonstrated how applied research in these areas, when combined, can better serve the needs of researchers and modelling practitioners alike.
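
As a rough illustration of why a fast emulator makes Sobol analysis tractable, the sketch below estimates first-order Sobol indices for any cheap `predict` callable using a standard pick-freeze (Saltelli-style) estimator. It is not the thesis's code: `toy_emulator` is a hypothetical linear stand-in rather than an APSIM emulator, the function names are invented for this example, and inputs are assumed to be independent and uniform on [0, 1]. Each analysis needs `n_samples * (n_inputs + 2)` model evaluations, which is where the cost of running the full process-driven model would accumulate.

```python
import random


def sobol_first_order(predict, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order Sobol indices with a pick-freeze estimator.

    Assumes independent inputs, each uniform on [0, 1] (an assumption of
    this sketch, not something stated in the abstract).
    """
    rng = random.Random(seed)
    # Two independent sample matrices, A and B.
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [predict(row) for row in A]
    y_b = [predict(row) for row in B]

    # Output mean and variance from the pooled A and B evaluations.
    pooled = y_a + y_b
    mean_y = sum(pooled) / len(pooled)
    var_y = sum((y - mean_y) ** 2 for y in pooled) / len(pooled)

    indices = []
    for i in range(n_inputs):
        # Rows of A with only column i taken from B.
        y_ab = [predict(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Partial variance due to input i alone; centring y_b on the mean
        # reduces the variance of the estimate.
        v_i = sum(
            (yb - mean_y) * (yab - ya)
            for yb, yab, ya in zip(y_b, y_ab, y_a)
        ) / n_samples
        indices.append(v_i / var_y)
    return indices


def toy_emulator(x):
    """Hypothetical stand-in for a trained emulator's predict function."""
    return 4.0 * x[0] + 2.0 * x[1] + 1.0 * x[2]


indices = sobol_first_order(toy_emulator, n_inputs=3)
print([round(s, 3) for s in indices])
```

For this linear toy function the exact first-order indices are 16/21, 4/21, and 1/21, so the estimates can be checked against known values before the same routine is pointed at a real emulator.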

Keywords: Machine learning, emulator, sensitivity analysis, process-driven models, APSIM
ANZSRC Field of Research 2020: 300207. Agricultural systems analysis and modelling
300205. Agricultural production systems simulation
460207. Modelling and simulation
461104. Neural networks
Public Notes

File reproduced in accordance with the copyright policy of the publisher/author.

Byline Affiliations: School of Agriculture and Environmental Science