Source code for dpest.overview

import yaml
from dpest.functions import *

[docs] def overview( treatment=None, overview_file_path=None, output_path=None, experiment=None, suffix=None, variables=None, variables_classification=None, overview_ins_first_line=None, mrk="~", smk="!", ): """ Creates a ``PEST instruction file (.INS)``. This instruction file contains directions for PEST to read the simulated values from the ``OVERVIEW.OUT`` file and compare them with the corresponding observed values (originally entered in the DSSAT "A file"). The ``PEST instruction file (.INS)`` guides PEST in extracting specific model-generated observations from the ``OVERVIEW.OUT`` file, which includes a list of end-of-season crop performance metrics, and critical phenological observations used for model evaluation. Additionally, this module creates a tuple containing: 1. A DataFrame with the MEASURED observations from the ``OVERVIEW.OUT`` file (originally entered in the DSSAT "A file"). 2. The path to the generated ``PEST instruction file (.INS)``. **Required Arguments:** ======= * **treatment** (*str*): The name of the treatment for which the cultivar is being calibrated. This should match exactly the treatment name as shown in the DSSAT application interface when an experiment is selected. For example, ``"164.0 KG N/HA IRRIG"`` is a treatment of the ``SWSW7501WH.WHX`` experiment. * **overview_file_path** (*str*): Path to the ``OVERVIEW.OUT`` file to read. Usually the file is in ``C:\\DSSAT48\\Wheat\\OVERVIEW.OUT``. **Optional Arguments:** ======= * **output_path** (*str*, *default: current working directory*): Directory where the generated ``PEST instruction file (.INS)`` will be saved. * **experiment** (*str*, *optional*): Experiment code as shown in the PlantGro.OUT header (e.g. ``"AZMC9311"``). When the same treatment name appears in more than one experiment within the same OVERVIEW.OUT file, this argument is used to select the correct experiment block. If not provided and the treatment is unique in the file, the function will use the unique experiment automatically. If the treatment appears in multiple experiments and ``experiment`` is not specified, it will the variables for that treatment that appear in the last part of the file. * **suffix** (*str*, *default: ""*): Suffix to append to the output filename and variable names in the .INS file. This short code (e.g., ``TRT1``, ``TRT2``, ``TRT3``) identifies different treatments used for calibrating the same cultivar (or ecotype) in the same calibration process. It must be 1–4 characters long, containing only uppercase letters and/or numbers. For example, if ``suffix="TRT1"``, the output file will be named ``OVERVIEW_TRT1.ins`` and variable markers will include the suffix (e.g., ``!Anthesis_DAP_TRT1!``). This ensures that PEST can distinguish between variables from different treatments, as PEST does not allow variables with the same name. * **variables** (*list* or *str*): Variable(s) from the ``OVERVIEW.OUT`` file that PEST will extract in case the user does not want to use all the variables present in the DSSAT “A file” for the calibration. The PEST instruction file will use these to read the model output. You may specify a single variable as a string (e.g., ``'Anthesis (DAP)'``) or multiple variables as a list (e.g., ``['Anthesis (DAP)', 'Maturity (DAP)', 'Product wt (kg dm/ha;no loss)',`` ``'Maximum leaf area index', 'Canopy (tops) wt (kg dm/ha)', 'Above-ground N (kg/ha)']``). * **variables_classification** (*dict*): Mapping of variable names to their respective categories. If provided, it is used directly to classify variables. If not provided, the function will attempt to load crop- and model-specific classification from a configuration file located at ``dpest/<crop>/<model>/arguments.yml``. * **overview_ins_first_line** (*str*, *default: "pif"*): First line of the ``PEST instruction file (.INS)``. This is the PEST default value and is obtained from the package configuration file (``dpest/arguments.yml``) when not provided by the user. * **mrk** (*str*, *default: "~"*): Primary marker delimiter character for the instruction file. Must be a single character and cannot be A–Z, a–z, 0–9, ``!``, ``[``, ``]``, ``(``, ``)``, ``:``, space, tab, or ``&``. * **smk** (*str*, *default: "!"*): Secondary marker delimiter character for the instruction file. Must be a single character and cannot be A–Z, a–z, 0–9, ``[``, ``]``, ``(``, ``)``, ``:``, space, tab, or ``&``. **Returns:** ======= * *tuple*: A tuple containing: * *pandas.DataFrame*: A filtered DataFrame used to generate the ``PEST instruction file (.INS)``. * *str*: The full path to the generated ``PEST instruction file (.INS)``. **Examples:** ======= 1. **Basic Usage (Required arguments only, crop/model-based defaults):** .. code-block:: python from dpest import overview overview_observations, overview_ins_path = overview( treatment='164.0 KG N/HA DRY', overview_file_path='C:/DSSAT48/Wheat/OVERVIEW.OUT', crop='wheat', model='ceres', ) 2. **Specifying Variable Classifications Manually:** .. code-block:: python from dpest import overview overview( treatment='164.0 KG N/HA DRY', overview_file_path='C:/DSSAT48/Wheat/OVERVIEW.OUT', variables=[ 'Anthesis (DAP)', 'Maturity (DAP)', 'Product wt (kg dm/ha;no loss)', 'Maximum leaf area index', 'Canopy (tops) wt (kg dm/ha)', 'Above-ground N (kg/ha)', ], variables_classification={ 'Anthesis (DAP)': 'phenology', 'Maturity (DAP)': 'phenology', 'Product wt (kg dm/ha;no loss)': 'yield', 'Maximum leaf area index': 'lai', 'Canopy (tops) wt (kg dm/ha)': 'biomass', 'Above-ground N (kg/ha)': 'nitrogen', }, ) """ # Define YAML keys used in configuration files yml_file_block = "OVERVIEW_FILE" yaml_file_variables = "INS_FILE_VARIABLES" yaml_variables_classification = "VARIABLES_CLASSIFICATION" MAX_VAR_LENGTH = 20 # In PEST, the variable names should not exceed 20 characters try: # Load package-level defaults from dpest/arguments.yml current_dir = os.path.dirname(os.path.abspath(__file__)) arguments_file = os.path.join(current_dir, "arguments.yml") if not os.path.isfile(arguments_file): raise FileNotFoundError(f"YAML file not found: {arguments_file}") with open(arguments_file, "r") as yml_file: yaml_data = yaml.safe_load(yml_file) # Validate treatment if treatment is None: raise ValueError( "The 'treatment' argument is required and must be specified by the user." ) # Validate marker delimiters using the validate_marker() function mrk = validate_marker(mrk, "mrk") smk = validate_marker(smk, "smk") if mrk == smk: raise ValueError("mrk and smk must be different characters.") # Load default arguments from the YAML file if not provided if overview_ins_first_line is None: function_arguments = yaml_data[yaml_file_variables] overview_ins_first_line = function_arguments["first_line"] # Handle the optional list of variables if variables is not None: if not isinstance(variables, list): variables = [variables] if not variables or not all(isinstance(var, str) for var in variables): raise ValueError( "The 'variables' should be a non-empty string or a list of " "strings. For example: 'Maturity (DAP)' or " "['Emergence (DAP)', 'Maturity (DAP)', " "'Product wt (kg dm/ha;no loss)']" ) # Validate overview_file_path using the validate_file() function validated_path = validate_file(overview_file_path, ".OUT") # ------------------------------------------------------------------ # Resolve treatment block by experiment (same logic used in ts()) # ------------------------------------------------------------------ # Get treatment block ranges from the OVERVIEW.OUT file treatment_dict = simulations_lines(validated_path) # Resolve the correct block range and experiment code (start_i, end_i), experiment_code = resolve_treatment_block_by_experiment( file_path=validated_path, treatment=treatment, treatment_dict=treatment_dict, experiment=experiment, ) # Restrict downstream parsing to the selected block only selected_treatment_dict = {treatment: (start_i, end_i)} # Read and parse the overview file overview_df, header_line, crop_model = extract_simulation_data( validated_path, treatment_dict=selected_treatment_dict ) # Load variables_classification: # - use user-provided dict if given; # - otherwise, obtain crop and model from the .OUT file, and load variables # classification from dpest/<crop>/<model>/arguments.yml if variables_classification is None: model = crop_model.split('-')[0].strip()[:5] crop = crop_model.split('-')[1].strip().lower() crop_model_arguments_file = get_crop_model_arguments_file_path( crop=crop, model=model ) if not os.path.isfile(crop_model_arguments_file): raise FileNotFoundError( f"YAML file not found for crop='{crop}' and model='{model}': " f"{crop_model_arguments_file}" ) with open(crop_model_arguments_file, "r") as cm_yml: crop_model_yaml_data = yaml.safe_load(cm_yml) try: variables_classification = crop_model_yaml_data[yml_file_block][ yaml_variables_classification ] except KeyError as exc: raise KeyError( "The crop/model configuration file does not define the " f"'{yml_file_block}.{yaml_variables_classification}' section " f"required by the overview() function." ) from exc # Filter the DataFrame for the specified treatment filtered_df = overview_df.loc[ (overview_df["treatment"] == treatment) ].copy() if filtered_df.empty: raise ValueError( f"No data found for treatment '{treatment}'. " "Please check if the treatment exists in the OVERVIEW.OUT file." ) # Map variables to their respective groups filtered_df["group"] = filtered_df["variable"].map(variables_classification) # Remove rows where 'value_measured' column contains NaN values filtered_df = filtered_df.dropna(subset=["value_measured"]) # Filter variables if a list of variables was provided by the user if variables is not None: filtered_df = filtered_df[filtered_df["variable"].isin(variables)] # Adjust the 'position' column to create 'position_adjusted' filtered_df["position_adjusted"] = ( filtered_df["position"] - filtered_df["position"].shift(1) ) # Ensure the first row retains its original position filtered_df.loc[ filtered_df.index[0], "position_adjusted" ] = filtered_df.loc[filtered_df.index[0], "position"] # Transform the variable names to fit the max 20 characters required by PEST filtered_df = process_variable_names(filtered_df) # Validate suffix if provided if suffix is not None: if not isinstance(suffix, str): raise ValueError("Suffix must be a string.") if not suffix.isalnum(): raise ValueError("Suffix must only contain letters and numbers.") if len(suffix) > 4: raise ValueError("Suffix must be at most 4 characters long.") suffix = "_" + suffix # only add underscore *after* validation # Create a dictionary to add the treatment suffix replace_dict = add_suffix_to_variables( filtered_df["variable_name"], suffix, MAX_VAR_LENGTH ) filtered_df["variable_name"] = filtered_df["variable_name"].map( replace_dict ) # Generate the .ins file content output_text = "" for _, row in filtered_df.iterrows(): output_text += ( f"l{row['position_adjusted']} " f"{mrk}{row['variable']}{mrk} " f"{smk}{row['variable_name']}{smk}\n" ) # Combine the content into the full .ins file structure ins_file_content = ( f"{overview_ins_first_line} {mrk}\n" f"{mrk}{experiment_code}{mrk}\n" f"{mrk}{treatment}{mrk}\n" f"{mrk}{header_line[1:].strip()}{mrk}\n" f"{output_text}" ) # Validate output_path output_path = validate_output_path(output_path) # Determine and validate output_filename if suffix: output_filename = os.path.basename(validated_path).replace( ".OUT", f"{suffix}.ins" ) if not output_filename.lower().endswith(".ins"): output_filename += ".ins" else: output_filename = os.path.basename(validated_path).replace(".OUT", ".ins") # Create the path and file name for the new file output_new_file_path = os.path.join(output_path, output_filename) # Write the generated content to the .ins file with open(output_new_file_path, "w") as ins_file: ins_file.write(ins_file_content) print(f"OVERVIEW.INS file generated and saved to: {output_new_file_path}") # Remove non-useful columns from the dataframe to export ouput_overview_df = filtered_df[["variable_name", "value_measured", "group"]] return ouput_overview_df, output_new_file_path except ValueError as ve: print(f"ValueError: {ve}") except FileNotFoundError as fe: print(f"FileNotFoundError: {fe}") except Exception as e: print(f"An unexpected error occurred: {e}")