Source code for dpest.uts

from dpest.functions import *


def uts(
        uts_file_path=None,
        treatment=None,
        variables=None,
        experiment=None,
        nspaces_year_header=None,
        nspaces_doy_header=None,
        nspaces_columns_header=None,
):
    """
    Updates DSSAT **time-series** output (``.OUT``) files by adding rows so that simulated values exist for all measured
    observation dates. This is needed during calibration, when PEST compares a measured
    value from the DSSAT "T file" with the corresponding simulated value in the **time-series** output (``.OUT``) file. If the simulation
    ends *before* the date of a measured observation, PEST terminates the calibration with a missing
    observation error. This often occurs when measurements, such as remote sensing data, are taken close to the plant's
    maturity phase.

    This module addresses the issue by adding rows filled with the default value (``0``) to the **time-series** output (``.OUT``) file,
    extending the simulation period to cover all measured observation dates. The format of the time-series file is preserved. The first three columns must be:

        @YEAR DOY   DAS

    and the remaining columns correspond to daily simulated variables (for example, from
    ``PlantGro.OUT``, ``PlantN.OUT``, ``PlantC.OUT``, ``PlantGrf.OUT``, ``SoilNi.OUT``,
    ``SoilWat.OUT``, ``SoilTemp.OUT``).

    **Example Scenario:**
    =======

    Suppose the ``PlantGro.OUT`` simulation results extend to the year 2022 and day of year (DOY) 102.

    However, the DSSAT "T file" contains measurements for the same treatment with the following dates:

    * 2022 DOY 031
    * 2022 DOY 046
    * 2022 DOY 060
    * 2022 DOY 070
    * 2022 DOY 083
    * 2022 DOY 095
    * 2022 DOY 109

    In this case, PEST will throw an error and terminate the calibration because the time-series
    output file does not contain a row for the last measurement date (DOY 109). The ``uts()``
    module adds zero-filled rows for the days after the end of the simulation, up to the last measurement date.
    Each added row looks similar to the following (shown here for DOY 103):

    .. code-block:: none

       2022  103   224     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0

    **Required Arguments:**
    =======

        * **uts_file_path** (*str*):
          Path to the DSSAT time-series output (``.OUT``) file to update. This can be any
          daily DSSAT output with ``@YEAR DOY DAS`` as the first three header columns, for example:

          - ``C:/DSSAT48/Wheat/PlantGro.OUT``
          - ``C:/DSSAT48/Wheat/PlantN.OUT``
          - ``C:/DSSAT48/Wheat/PlantC.OUT``
          - ``C:/DSSAT48/Wheat/SoilNi.OUT``
          - ``C:/DSSAT48/Wheat/SoilWat.OUT``

        * **treatment** (*str*): The name of the treatment for which the cultivar is being calibrated. This should
        match exactly the treatment name as shown in the DSSAT application interface when an experiment is selected.
        For example, "164.0 KG N/HA IRRIG" is a treatment of the ``SWSW7501WH.WHX`` experiment.
        * **variables** (*list* or *str*): Variable(s) from the DSSAT "T file" (and thus present in the time-series output file)
        that PEST will extract. The PEST instruction file will use these to read the model output. You may specify a
        single variable as a string (e.g., ``'LAID'``) or multiple variables as a list (e.g., ``['LAID', 'CWAD', 'T#AD']``).

    **Optional Arguments:**
    =======

        * **experiment** (*str*, *optional*):
        Experiment code as shown in the PlantGro.OUT header (e.g. ``"AZMC9311"``).
        When the same treatment name appears in more than one experiment within
        the same time-series file, this argument is used to select the correct
        experiment block. If not provided and the treatment is unique in the file,
        the function will use the unique experiment automatically. If the
        treatment appears in multiple experiments and ``experiment`` is not
        specified, a clear error is raised indicating the available experiments.

        * **nspaces_year_header** (*int*, *default: 5*): Number of spaces reserved for the year header in
        the ``.OUT`` file. It is unlikely that the format of the ``.OUT`` file changes in a way that
        necessitates modifying this value.

        * **nspaces_doy_header** (*int*, *default: 4*): Number of spaces reserved for the day-of-year header in
        the ``.OUT`` file. It is unlikely that the format of the time-series output files changes in a way
        that necessitates modifying this value.

        * **nspaces_columns_header** (*int*, *default: 6*): Number of spaces reserved for other columns in the
        ``.OUT`` file. It is unlikely that the format of the time-series output files changes in a way that
        necessitates modifying this value.

    **Returns:**
    =======

        * ``None``

    **Examples:**
    =======

    1. **Basic Usage (List of variables):**

       .. code-block:: python

          from dpest import uts

          uts(
              uts_file_path='C:/DSSAT48/Wheat/PlantGro.OUT',
              treatment='164.0 KG N/HA IRRIG',
              variables=['LAID', 'CWAD', 'T#AD']
          )

       This example demonstrates the basic usage of the module with a list of variables (``LAID``, ``CWAD``, and ``T#AD``).
       If the simulation end date in the existing ``PlantGro.OUT`` file is earlier than the latest measurement date in
       the DSSAT "T file", then the ``PlantGro.OUT`` file will be extended by adding new rows. The values of all variables
       present in the ``PlantGro.OUT`` file will be set to ``0`` in the added rows.

    2. **Basic Usage (Single variable):**

       .. code-block:: python

          from dpest import uts

          uts(
              uts_file_path='C:/DSSAT48/Wheat/PlantGro.OUT',
              treatment='164.0 KG N/HA IRRIG',
              variables='LAID'
          )

       This example demonstrates the basic usage of the module when only one variable (``LAID``) is specified. If the
       simulation end date in the existing ``PlantGro.OUT`` file is earlier than the latest measurement date in the
       DSSAT "T file", then the ``PlantGro.OUT`` file will be extended by adding new rows. The values of all variables
       present in the ``PlantGro.OUT`` file will be set to 0 in the added rows.
    """
    rows_added = 0  # Initialize
    yaml_sim_models_key = 'SIMULATION_CROP_MODELS'

    try:
        ## Get the yaml_data
        # Get the directory of the current script
        current_dir = os.path.dirname(os.path.abspath(__file__))
        # Construct the path to arguments.yml
        arguments_file = os.path.join(current_dir, 'arguments.yml')
        # Ensure the YAML file exists
        if not os.path.isfile(arguments_file):
            raise FileNotFoundError(f"YAML file not found: {arguments_file}")
        # Load YAML configuration
        with open(arguments_file, 'r') as yml_file:
            yaml_data = yaml.safe_load(yml_file)

        # Validate uts_file_path
        validated_path = validate_file(uts_file_path, '.OUT')

        # Validate treatment
        if not treatment or not isinstance(treatment, str):
            raise ValueError("The 'treatment' must be a non-empty string.")

        # Convert 'variables' to a list if it's not already a list
        if not isinstance(variables, list):
            variables = [variables]

        # Validate that 'variables' is a non-empty list of strings
        if not variables or not all(isinstance(var, str) for var in variables):
            raise ValueError(
                "The 'variables' should be a non-empty string or a list of strings. For example: 'LAID' or ['LAID', 'CWAD']")

        # Assign default values if None and validate integer input
        if nspaces_year_header is None:
            nspaces_year_header = 5
        elif not isinstance(nspaces_year_header, int):
            raise ValueError("nspaces_year_header must be an integer.")

        if nspaces_doy_header is None:
            nspaces_doy_header = 4
        elif not isinstance(nspaces_doy_header, int):
            raise ValueError("nspaces_doy_header must be an integer.")

        if nspaces_columns_header is None:
            nspaces_columns_header = 6
        elif not isinstance(nspaces_columns_header, int):
            raise ValueError("nspaces_columns_header must be an integer.")

        # Get treatment range
        treatment_dict = simulations_lines(validated_path)
        (start_i, end_i), experiment_code = resolve_treatment_block_by_experiment(
            file_path=validated_path,
            treatment=treatment,
            treatment_dict=treatment_dict,
            experiment=experiment,
        )
        selected_treatment_dict = {treatment: (start_i, end_i)}
        treatment_range = (start_i, end_i)

        # Read growth file
        ts_file_df = read_growth_file(validated_path, treatment_range)

        # Get dictionaries mapping the treatment name to its treatment number,
        # experiment code, and crop name
        treatment_number_name, treatment_experiment_name, treatment_crop_name = \
            extract_treatment_info_plantgrowth(validated_path, selected_treatment_dict)

        crop_name_from_header = treatment_crop_name.get(treatment)
        if crop_name_from_header is None:
            raise ValueError(f"Could not determine crop name for treatment '{treatment}'.")

        # Load simulation crop/model mappings
        sim_models = yaml_data.get(yaml_sim_models_key, {})

        # Find the crop entry whose alias list (lower-cased) contains crop_name_from_header
        crop_code = None
        for crop_key, crop_info in sim_models.items():
            aliases = [a.lower() for a in crop_info.get('crop_aliases', [])]
            if crop_name_from_header.lower() in aliases:
                crop_code = aliases[1] if len(aliases) > 1 else aliases[0]
                break

        if crop_code is None:
            raise ValueError(
                f"Could not infer crop code from crop name '{crop_name_from_header}'. "
                "Check SIMULATION_CROP_MODELS in arguments.yml."
            )

        if experiment_code is None:
            raise ValueError(f"Could not determine experiment code for treatment '{treatment}'.")

        # Build T-file name: <EXPCODE>.<CROPCODE>T, e.g. SWSW7501 + . + WH + T -> SWSW7501.WHT
        t_file_name = f"{experiment_code}.{crop_code.upper()}T"
        t_file_path = os.path.join(os.path.dirname(validated_path), t_file_name)

        # Get the dataframe from the T file data
        t_df = wht_filedata_to_dataframe(t_file_path)

        # Load and filter data for all variables and get the measured year
        dates_variable_values_dict = filter_dataframe(t_df, treatment, treatment_number_name, variables)

        # Check if the filter_dataframe returned an empty dictionary (indicating an error)
        if not dates_variable_values_dict:
            raise ValueError(f"No valid data found for treatment '{treatment}' with variables {variables}")

        # Get the year and day of the year and join it as one unique number
        year_sim = int(str(ts_file_df['@YEAR'].iloc[-1]) + f"{ts_file_df['DOY'].iloc[-1]:03}")

        # Handle both 4-digit and 2-digit years for year_measured
        year_measured_key_str = str(list(dates_variable_values_dict.keys())[-1])

        if len(year_measured_key_str) == 5:  # If year_measured has a 2-digit year
            year_measured_year = int(year_measured_key_str[:2])
            doy_measured = int(year_measured_key_str[2:])

            # Determine the correct century for the 2-digit year
            century = year_sim // 100000  # Get the century from year_sim
            year_measured = int(f"{century}{year_measured_year:02d}{doy_measured:03d}")
        else:  # If year_measured has a 4-digit year
            year_measured = int(year_measured_key_str)

        # Create the new rows to insert
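        if year_sim < year_measured:
            # Count calendar days between the two YYYYDDD stamps. Plain integer
            # subtraction over-counts when the dates span a year boundary
            # (e.g. 2022002 - 2021363 = 639 instead of 4).
            from datetime import datetime
            number_rows_add = (
                datetime.strptime(str(year_measured), "%Y%j")
                - datetime.strptime(str(year_sim), "%Y%j")
            ).days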

            # Get the new rows using the new_rows_add() function
            new_rows = new_rows_add(ts_file_df, number_rows_add)

            # Read the existing file and store its contents
            with open(validated_path, 'r') as file:
                lines = file.readlines()

            # Identify the line where the headers are defined (e.g., '@YEAR')
            header_line = next(line for line in lines if '@YEAR' in line)

            # Extract column headers to maintain correct order
            headers = header_line.strip().split()

            # Convert each dictionary into a formatted row string
            new_rows_dic = []
            for row_data in new_rows:
                row = (
                        str(row_data.get('@YEAR', 0)).rjust(nspaces_year_header) +
                        str(row_data.get('DOY', 0)).rjust(nspaces_doy_header) +
                        ''.join(str(row_data.get(col, 0)).rjust(nspaces_columns_header) for col in headers if
                                col not in ['@YEAR', 'DOY']) +
                        '\n'
                )
                new_rows_dic.append(row)

            # Add new rows to the lines list
            lines[treatment_range[1]:treatment_range[1]] = new_rows_dic

            # Update the rows_added counter
            rows_added = len(new_rows)

            # Write the updated content back to the file
            with open(validated_path, 'w') as file:
                file.writelines(lines)

        # Report whether any rows were added
        if rows_added > 0:
            print(f"{validated_path} update: {rows_added} row{'s' if rows_added > 1 else ''} added successfully.")
        else:
            print(f"{validated_path} status: No update required.")

    except ValueError as ve:
        print(f"ValueError: {ve}")
    except FileNotFoundError as fe:
        print(f"FileNotFoundError: {fe}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
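
The row-extension step above can be illustrated in isolation. The following is a minimal sketch, not part of the dpest API: the helpers `yyyyddd_to_date` and `padding_rows` are hypothetical names introduced only for this example. It assumes dates are encoded as `YYYYDDD` integers (as `year_sim` and `year_measured` are in `uts()`), that the column widths are the documented defaults (5, 4, 6), that DOY is written zero-padded to three digits, and that the last simulated row has DAS 223 (inferred from the docstring's example row, where DOY 103 has DAS 224).

```python
from datetime import date, datetime, timedelta


def yyyyddd_to_date(stamp: int) -> date:
    """Convert an integer such as 2022102 (year 2022, DOY 102) to a date."""
    return datetime.strptime(f"{stamp:07d}", "%Y%j").date()


def padding_rows(last_sim, last_das, last_measured, n_vars, widths=(5, 4, 6)):
    """Build zero-filled, fixed-width rows for every day after ``last_sim``
    up to and including ``last_measured`` (both YYYYDDD integers)."""
    year_w, doy_w, col_w = widths
    start = yyyyddd_to_date(last_sim)
    # Calendar-day difference, so year boundaries are handled correctly.
    n_days = (yyyyddd_to_date(last_measured) - start).days
    rows = []
    for offset in range(1, n_days + 1):
        day = start + timedelta(days=offset)
        doy = day.timetuple().tm_yday
        # DAS keeps counting up; every simulated variable is set to 0.
        cells = [last_das + offset] + [0] * n_vars
        rows.append(
            str(day.year).rjust(year_w)
            + f"{doy:03d}".rjust(doy_w)
            + "".join(str(c).rjust(col_w) for c in cells)
        )
    return rows


# Docstring scenario: simulation ends at 2022 DOY 102, last measurement at DOY 109.
rows = padding_rows(2022102, 223, 2022109, n_vars=3)
for row in rows:
    print(row)

# A case that spans a year boundary (2021 DOY 363 to 2022 DOY 002).
cross = padding_rows(2021363, 10, 2022002, n_vars=1)
print(len(cross))
```

Converting to real dates before subtracting is the main design choice here: it keeps the row count correct when the last measurement falls in the year after the simulation ends.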