Source code for dpest.ts

import os

import pandas as pd
import yaml

from dpest.functions import *

def ts(
    ts_file_path=None,
    treatment=None,
    variables=None,
    output_path=None,
    experiment=None,
    suffix=None,
    variables_classification=None,
    ts_ins_first_line=None,
    mrk='~',
    smk='!',
):
    """
    Creates a ``PEST instruction file (.INS)`` for DSSAT **time-series** output files.

    This instruction file contains directions for PEST to read simulated
    time-series values from daily DSSAT outputs (e.g. ``PlantGro.OUT``,
    ``PlantN.OUT``, ``PlantC.OUT``, ``PlantGrf.OUT``, ``SoilNi.OUT``,
    ``SoilWat.OUT``, ``SoilTemp.OUT``) and to match them with the corresponding
    measured values stored in the DSSAT T file.

    The time-series output files supported by this function share a common
    tabular format where the first three columns are:

        @YEAR DOY DAS

    The ``PEST instruction file (.INS)`` guides PEST in extracting specific
    model-generated observations at specific time points for the variables
    specified by the user. The function returns a tuple containing:

    1) A DataFrame with the MEASURED observations (entered by the user in the
       DSSAT T file) for the specified variables.
    2) The path to the generated ``PEST instruction file (.INS)``.

    **Required Arguments**
    ======================

    * **ts_file_path** (*str*):
        Path to the DSSAT time-series output file to read. This can be any
        daily DSSAT output with ``@YEAR DOY DAS`` as the first three header
        columns, for example:

        - ``C:/DSSAT48/Soybean/PlantGro.OUT``
        - ``C:/DSSAT48/Soybean/PlantN.OUT``
        - ``C:/DSSAT48/Soybean/PlantC.OUT``
        - ``C:/DSSAT48/Soybean/SoilNi.OUT``
        - ``C:/DSSAT48/Soybean/SoilWat.OUT``

    * **treatment** (*str*):
        Name of the treatment for which the cultivar is being calibrated.
        This must match exactly the treatment name as shown in the DSSAT
        interface when an experiment is selected (i.e. the value that appears
        in the ``TREATMENT`` line of the output header).
        Example: ``"164.0 KG N/HA IRRIG"`` or ``"76 Equidist BRAGG"``.

    * **variables** (*list* or *str*):
        Variable code(s) from the DSSAT T file (and thus present in the header
        of the selected time-series output file) that PEST will extract. The
        instruction file uses these codes to read the model output at the
        specified dates.

        - A single variable can be provided as a string, e.g. ``"LAID"``.
        - Multiple variables can be provided as a list, e.g.
          ``["LWAD", "SWAD", "GWAD", "RWAD", "CWAD", "HIAD", "PWAD"]``.

    **Optional Arguments**
    ======================

    * **output_path** (*str*, *default: current working directory*):
        Directory where the generated ``PEST instruction file (.INS)`` will be
        saved. If not provided, the current working directory is used.

    * **experiment** (*str*, *optional*):
        Experiment code as shown in the PlantGro.OUT header (e.g.
        ``"AZMC9311"``). When the same treatment name appears in more than one
        experiment within the same time-series file, this argument is used to
        select the correct experiment block. If not provided and the treatment
        is unique in the file, the function will use the unique experiment
        automatically. If the treatment appears in multiple experiments and
        ``experiment`` is not specified, it will use the variables for that
        treatment that appear in the last part of the file.

    * **suffix** (*str*, *default: None*):
        Suffix to append to the output filename and variable names in the
        .INS file. This short code (e.g. ``"TRT1"``, ``"TRT2"``) identifies
        different treatments used for calibrating the same cultivar in the
        same calibration process. It must be 1–4 characters long and
        alphanumeric. For example, if ``suffix="TRT1"`` and
        ``ts_file_path="C:/DSSAT48/Wheat/PlantGro.OUT"``, the output file will
        be named ``PlantGro_TRT1.ins`` and markers will look like
        ``!LAID_75167_TRT1!``.

    * **variables_classification** (*dict*, *optional*):
        Mapping of variable codes to their respective categories (groups).
        When provided, this dictionary is used directly, with the format::

            {"LAID": "lai", "CWAD": "biomass", ...}

        When ``variables_classification`` is ``None``, the function loads a
        global classification from the package configuration file
        (``dpest/arguments.yml``, key ``VARIABLES_CLASSIFICATION_GLOBAL``),
        and maps each variable code to its group from that dictionary.

    * **ts_ins_first_line** (*str*, *default: "pif"*):
        First line of the ``PEST instruction file (.INS)``. By default this is
        read from the package configuration (key ``INS_FILE_VARIABLES`` in
        ``dpest/arguments.yml``) when not provided.

    * **mrk** (*str*, *default: "~"*):
        Primary marker delimiter character for the instruction file. Must be a
        single character and cannot be A–Z, a–z, 0–9, ``!``, ``[``, ``]``,
        ``(``, ``)``, ``:``, space, tab, or ``&``.

    * **smk** (*str*, *default: "!"*):
        Secondary marker delimiter character for the instruction file. Must be
        a single character and cannot be A–Z, a–z, 0–9, ``[``, ``]``, ``(``,
        ``)``, ``:``, space, tab, or ``&``. It must also differ from ``mrk``.

    **Internal behaviour**
    ======================

    * The function parses the header of the selected time-series file to
      determine the experiment code, model, and crop name for the specified
      treatment.
    * From the crop name and the ``SIMULATION_CROP_MODELS`` section in
      ``dpest/arguments.yml``, it infers the DSSAT crop code (e.g. ``WH``,
      ``SB``, ``MZ``) and constructs the name of the T file as
      ``<EXPCODE>.<CROPCODE>T`` (e.g. ``SWSW7501.WHT``, ``CLMO8501.SBT``).
    * The T file is then read to obtain the measured time-series values for
      the requested variables and dates, which are used to build the
      DataFrame and the .INS file.

    **Returns**
    ===========

    * *tuple*:

        * *pandas.DataFrame*: A DataFrame containing the measured values for
          the selected variables and dates, with columns:

          - ``variable_name`` (including date and optional suffix)
          - ``value_measured`` (float)
          - ``group`` (classification group)

        * *str*: Full path to the generated ``PEST instruction file (.INS)``.

    **Examples**
    ============

    1. **PlantGro time series (Soybean)**

       .. code-block:: python

          from dpest import ts

          plantgro_observations, plantgro_ins_path = ts(
              treatment='76 Equidist BRAGG',
              ts_file_path='C:/DSSAT48/Soybean/PlantGro.OUT',
              variables=['LWAD', 'SWAD', 'GWAD', 'RWAD', 'CWAD', 'HIAD',
                         'PWAD', 'LN%D', 'SH%D', 'HIPD', 'SLAD'],
          )

       This creates a ``PlantGro.ins`` file and a DataFrame with the measured
       values for the selected plant growth variables for treatment
       ``"76 Equidist BRAGG"``.

    2. **PlantN time series (Soybean)**

       .. code-block:: python

          from dpest import ts

          plantn_observations, plantn_ins_path = ts(
              treatment='76 Equidist BRAGG',
              ts_file_path='C:/DSSAT48/Soybean/PlantN.OUT',
              variables=['LN%D', 'SN%D'],
          )

       This reads nitrogen-related time-series variables from ``PlantN.OUT``
       and creates a matching instruction file and DataFrame. Global variable
       classifications are used unless a custom mapping is supplied.

    3. **Soil water time series**

       .. code-block:: python

          from dpest import ts

          soilwat_observations, soilwat_ins_path = ts(
              treatment='76 Equidist BRAGG',
              ts_file_path='C:/DSSAT48/Soybean/SoilWat.OUT',
              variables=['SW1D', 'SW2D', 'SW3D'],
          )

       This reads daily soil water content in the top layers from
       ``SoilWat.OUT`` and builds a PEST instruction file for those variables.
    """
    # Keys used to look up defaults in arguments.yml
    yaml_file_variables = 'INS_FILE_VARIABLES'
    yaml_variables_classification = 'VARIABLES_CLASSIFICATION_GLOBAL'
    yaml_sim_models_key = 'SIMULATION_CROP_MODELS'
    MAX_VAR_LENGTH = 20  # In PEST, the variable names should not exceed 20 characters

    try:
        ## Get the yaml_data
        # Get the directory of the current script
        current_dir = os.path.dirname(os.path.abspath(__file__))

        # Construct the path to arguments.yml
        arguments_file = os.path.join(current_dir, 'arguments.yml')

        # Ensure the YAML file exists
        if not os.path.isfile(arguments_file):
            raise FileNotFoundError(f"YAML file not found: {arguments_file}")

        # Load YAML configuration
        with open(arguments_file, 'r') as yml_file:
            yaml_data = yaml.safe_load(yml_file)

        # Validate treatment
        if not treatment or not isinstance(treatment, str):
            raise ValueError("The 'treatment' must be a non-empty string.")

        # Convert 'variables' to a list if it's not already a list
        if not isinstance(variables, list):
            variables = [variables]

        # Validate that 'variables' is a non-empty list of strings
        if not variables or not all(isinstance(var, str) for var in variables):
            raise ValueError(
                "The 'variables' should be a non-empty string or a list of strings. "
                "For example: 'LAID' or ['LAID', 'CWAD']")

        # Validate yaml_data (safe_load returns None for an empty file)
        if yaml_data is None:
            raise ValueError(
                f"The configuration file is empty or could not be parsed: {arguments_file}")

        # Validate marker delimiters using the validate_marker() function
        mrk = validate_marker(mrk, "mrk")
        smk = validate_marker(smk, "smk")

        # Ensure mrk and smk are different
        if mrk == smk:
            raise ValueError("mrk and smk must be different characters.")

        # Fall back to the global classification when none is provided
        if variables_classification is None:
            variables_classification = yaml_data[yaml_variables_classification]

        if ts_ins_first_line is None:
            # Load default arguments from the YAML file if not provided
            function_arguments = yaml_data[yaml_file_variables]
            ts_ins_first_line = function_arguments['first_line']

        # Validate ts_file_path
        validated_path = validate_file(ts_file_path, '.OUT')

        # Get treatment number
        treatment_dict = simulations_lines(validated_path)

        # Resolve the correct DSSAT block when the same treatment appears in multiple experiments
        selected_block, resolved_experiment_code = resolve_treatment_block_by_experiment(
            file_path=validated_path,
            treatment=treatment,
            treatment_dict=treatment_dict,
            experiment=experiment
        )

        # Use only the selected block from here on to avoid using the wrong duplicated treatment
        selected_treatment_dict = {treatment: selected_block}

        # Get dictionaries with treatment name, treatment number, and experiment code
        treatment_number_name, treatment_experiment_name, treatment_crop_name = \
            extract_treatment_info_plantgrowth(validated_path, selected_treatment_dict)

        crop_name_from_header = treatment_crop_name.get(treatment)
        if crop_name_from_header is None:
            raise ValueError(f"Could not determine crop name for treatment '{treatment}'.")

        # Load simulation crop/model mappings
        sim_models = yaml_data.get(yaml_sim_models_key, {})

        # Find the crop entry whose alias list (lower-cased) contains crop_name_from_header
        crop_code = None
        for crop_key, crop_info in sim_models.items():
            aliases = [a.lower() for a in crop_info.get('crop_aliases', [])]
            if crop_name_from_header.lower() in aliases:
                crop_code = aliases[1] if len(aliases) > 1 else aliases[0]
                break

        if crop_code is None:
            raise ValueError(
                f"Could not infer crop code from crop name '{crop_name_from_header}'. "
                "Check SIMULATION_CROP_MODELS in arguments.yml."
            )

        # Use the resolved experiment code from the selected block
        experiment_code = resolved_experiment_code
        if experiment_code is None:
            raise ValueError(f"Could not determine experiment code for treatment '{treatment}'.")

        # Build T-file name: <EXPCODE>.<CROPCODE>T, e.g. SWSW7501 + "." + WH + T -> SWSW7501.WHT
        t_file_name = f"{experiment_code}.{crop_code.upper()}T"
        t_file_path = os.path.join(os.path.dirname(validated_path), t_file_name)

        # Get the dataframe from the T file data
        t_df = wht_filedata_to_dataframe(t_file_path)

        # Load and filter data for all variables
        dates_variable_values_dict = filter_dataframe(t_df, treatment, treatment_number_name, variables)

        # Check if filter_dataframe returned an empty dictionary (indicating an error)
        if not dates_variable_values_dict:
            raise ValueError(f"No valid data found for treatment '{treatment}' with variables {variables}")

        # Get the header and first simulation date
        header_line, first_sim_line, date_first_sim = get_header_and_first_sim(
            validated_path,
            treatment,
            treatment_dict=selected_treatment_dict
        )

        # Calculate the days after the first simulation date for each observation date
        days_dict = calculate_days_dict(dates_variable_values_dict, date_first_sim)

        # Adjust the days after first simulation
        adjusted_days_dict = adjust_days_dict(days_dict)

        # Validate suffix if provided
        if suffix is not None:
            if not suffix.isalnum():
                raise ValueError("Suffix must only contain letters and numbers.")
            if len(suffix) > 4:
                raise ValueError("Suffix must be at most 4 characters long.")
            suffix = "_" + suffix

        # Process each observation date and generate the instruction lines
        output_text = ""
        for date, (days, vars_at_date) in adjusted_days_dict.items():
            positions = find_variable_position(header_line, first_sim_line, vars_at_date)

            line = f"l{days}"
            current_pos = 0  # before first token on the line

            for var in sorted(positions, key=positions.get):
                pos = positions[var]

                # From start-of-line, reaching token `pos` requires `pos` times "w"
                w_count = pos if current_pos == 0 else (pos - current_pos)
                if w_count < 0:
                    raise ValueError(
                        f"Non-monotonic positions: {var} pos={pos}, current_pos={current_pos}"
                    )

                line += " w" * w_count
                line += f" {smk}{var}_{date}{suffix or ''}{smk}"
                current_pos = pos

            output_text += line + "\n"

        # Validate output_path
        output_path = validate_output_path(output_path)

        # Determine and validate output_filename
        if suffix is not None:
            # Extract the file name and insert the suffix before the extension
            output_filename = os.path.basename(validated_path).replace('.OUT', f'{suffix}.ins')
            # Ensure it ends with '.ins'
            if not output_filename.lower().endswith('.ins'):
                output_filename += '.ins'
        else:
            # No suffix: keep the base name of the time-series file
            output_filename = os.path.basename(validated_path).replace('.OUT', '.ins')

        # Full path of the instruction file to create
        ts_ins_file_path = os.path.join(output_path, output_filename)

        # Construct the content for the new .ins file
        # Include the experiment code as an anchor before the treatment (prevents wrong block when duplicated)
        ins_file_content = (
            f"{ts_ins_first_line} {mrk}\n"
            f"{mrk}{experiment_code}{mrk}\n"
            f"{mrk}{treatment}{mrk}\n"
            f"{mrk}{header_line[1:].strip()}{mrk}\n"
            f"{output_text}"
        )

        # --------- GET THE GROUP NAME OF THE VARIABLES
        dates_variable_values_data = [
            {
                'date': date,
                'variable': variable,
                'value_measured': value,
                'variable_name': f"{variable}_{date}"
            }
            for date, values_by_variable in dates_variable_values_dict.items()
            for variable, value in values_by_variable.items()
        ]

        # Create the DataFrame
        dates_variable_values_df = pd.DataFrame(dates_variable_values_data)

        # Map variables to their respective groups
        dates_variable_values_df['group'] = dates_variable_values_df['variable'].map(variables_classification)

        # Convert 'value_measured' column to float
        dates_variable_values_df['value_measured'] = dates_variable_values_df['value_measured'].astype(float)

        # Add the suffix to the variable_name
        if suffix is not None:
            dates_variable_values_df['variable_name'] = dates_variable_values_df['variable_name'] + suffix

        # Select and reorder the columns
        result_df = dates_variable_values_df[['variable_name', 'value_measured', 'group']]

        # Write the content to the .ins file
        with open(ts_ins_file_path, 'w') as ins_file:
            ins_file.write(ins_file_content)

        print(f"{output_filename} file generated and saved to: {ts_ins_file_path}")

        return result_df, ts_ins_file_path

    except ValueError as ve:
        print(f"ValueError: {ve}")
    except FileNotFoundError as fe:
        print(f"FileNotFoundError: {fe}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
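
The loop that assembles each instruction line can be tried in isolation. The following is a minimal sketch and not part of dpest: `build_ins_line` is a hypothetical helper that mirrors the loop in `ts()`, and the column positions in the example are made-up values standing in for whatever `find_variable_position` returns for a real header.

```python
def build_ins_line(days, positions, date, smk="!", suffix=""):
    """Assemble one instruction line for a single observation date.

    days      -- number used in the leading ``l<days>`` item
    positions -- dict mapping variable code -> column position on the line
                 (hypothetical values; in ts() they come from
                 find_variable_position)
    date      -- date label appended to each observation name
    smk       -- secondary marker that wraps each observation name
    suffix    -- optional suffix, already prefixed with "_" as in ts()
    """
    line = f"l{days}"
    current_pos = 0  # nothing consumed yet on this line

    # Visit variables from left to right so the "w" counts stay non-negative
    for var in sorted(positions, key=positions.get):
        pos = positions[var]
        w_count = pos - current_pos  # columns to skip since the last read
        line += " w" * w_count
        line += f" {smk}{var}_{date}{suffix}{smk}"
        current_pos = pos

    return line


# Example with assumed positions: LAID in column 6, CWAD in column 12
example_line = build_ins_line(
    days=35,
    positions={"CWAD": 12, "LAID": 6},
    date="75167",
    suffix="_TRT1",
)
print(example_line)
```

Because the variables are sorted by position before the loop runs, the number of `w` items between two observations is always the difference between their column positions, regardless of the order in which the caller listed them.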