import os

import yaml

from dpest.functions import *
def uts(
    uts_file_path=None,
    treatment=None,
    variables=None,
    experiment=None,
    nspaces_year_header=None,
    nspaces_doy_header=None,
    nspaces_columns_header=None,
):
    """
Updates DSSAT **time-series** output (``.OUT``) files by adding rows so that simulated values exist for all measured
observation dates. This situation arises during calibration, when PEST compares a measured value from the DSSAT
"T file" with the corresponding simulated value in the **time-series** output (``.OUT``) file. If the simulation
ends *before* the date of a measured observation, PEST terminates the calibration with a missing-observation error.
This often occurs when measurements, such as remote sensing data, are taken close to the plant's maturity phase.

This function addresses the issue by adding rows with default values (``0``) to the **time-series** output
(``.OUT``) file, extending the simulation period to cover all measured observation dates. The format of the
time-series file is preserved. The first three columns must be::

    @YEAR DOY DAS

and the remaining columns correspond to daily simulated variables (for example, from
``PlantGro.OUT``, ``PlantN.OUT``, ``PlantC.OUT``, ``PlantGrf.OUT``, ``SoilNi.OUT``,
``SoilWat.OUT``, ``SoilTemp.OUT``).
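
You can check the header requirement above before adding any rows. The following is a minimal sketch and is not part of dpest. ``check_time_series_header`` is a hypothetical helper, and the sample header line is illustrative rather than copied from a real ``PlantGro.OUT`` file.

```python
from typing import List


def check_time_series_header(header_line: str) -> List[str]:
    # Hypothetical helper: split a DSSAT-style header line on whitespace and
    # confirm that the first three columns are @YEAR, DOY and DAS.
    columns = header_line.split()
    if columns[:3] != ["@YEAR", "DOY", "DAS"]:
        raise ValueError(f"Expected '@YEAR DOY DAS' first, found: {columns[:3]}")
    return columns


header = "@YEAR DOY   DAS   LAID   CWAD"
print(check_time_series_header(header)[3:])  # ['LAID', 'CWAD']
```

Splitting on arbitrary whitespace keeps the check independent of the fixed column widths.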

**Example Scenario:**
=======
Suppose the ``PlantGro.OUT`` simulation results extend to the year 2022, day of year (DOY) 102.
However, the DSSAT "T file" contains measurements for the same treatment on the following dates:

* 2022 DOY 031
* 2022 DOY 046
* 2022 DOY 060
* 2022 DOY 070
* 2022 DOY 083
* 2022 DOY 095
* 2022 DOY 109

In this case, PEST throws an error and terminates the calibration because the time-series
output file has no row for the last measurement date (2022 DOY 109). The ``uts()``
function adds one row for each day between the last simulated date and the last measurement date
(DOY 103 to 109). The first added row looks similar to:

.. code-block:: none

    2022 103 224 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
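
In the scenario above, the number of added rows is the calendar gap between the last simulated date and the last measured date. The sketch below makes two assumptions. Both dates are encoded as ``YYYYDDD`` integers. A five-digit ``YYDDD`` value from the T file takes its century from the simulated date, mirroring the logic inside ``uts()``. The helper names ``to_yyyyddd`` and ``rows_to_add`` are hypothetical. The sketch converts to calendar dates instead of subtracting the integers directly, because direct subtraction gives the wrong answer across a year boundary.

```python
from datetime import datetime


def to_yyyyddd(measured: str, simulated: int) -> int:
    # Expand a 2-digit-year YYDDD key using the century of the simulated date.
    if len(measured) == 5:
        century = simulated // 100000  # e.g. 2022102 // 100000 -> 20
        return int(f"{century}{measured}")
    return int(measured)


def rows_to_add(last_simulated: int, last_measured: int) -> int:
    # Number of daily rows needed so the output reaches the last measured date.
    start = datetime.strptime(str(last_simulated), "%Y%j")
    end = datetime.strptime(str(last_measured), "%Y%j")
    return max((end - start).days, 0)


print(rows_to_add(2022102, to_yyyyddd("22109", 2022102)))  # 7 (DOY 103 to 109)
print(rows_to_add(2021364, 2022002))  # 3, whereas 2022002 - 2021364 = 638
```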

**Required Arguments:**
=======
* **uts_file_path** (*str*):
  Path to the DSSAT time-series output (``.OUT``) file to update. This can be any
  daily DSSAT output with ``@YEAR DOY DAS`` as the first three header columns, for example:

  - ``C:/DSSAT48/Wheat/PlantGro.OUT``
  - ``C:/DSSAT48/Wheat/PlantN.OUT``
  - ``C:/DSSAT48/Wheat/PlantC.OUT``
  - ``C:/DSSAT48/Wheat/SoilNi.OUT``
  - ``C:/DSSAT48/Wheat/SoilWat.OUT``

* **treatment** (*str*): The name of the treatment for which the cultivar is being calibrated. This must
  match the treatment name exactly as shown in the DSSAT application interface when an experiment is selected.
  For example, "164.0 KG N/HA IRRIG" is a treatment of the ``SWSW7501WH.WHX`` experiment.
* **variables** (*list* or *str*): Variable(s) from the DSSAT "T file" that PEST will extract; they must also be
  present in the time-series output file. The PEST instruction file uses these variables to read the model output,
  and their measurement dates determine how far the time-series output file is extended. Specify a
  single variable as a string (e.g., ``'LAID'``) or multiple variables as a list (e.g., ``['LAID', 'CWAD', 'T#AD']``).
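
``variables`` accepts either a string or a list, so the value can be normalized to a list before use. The following is a minimal sketch using a hypothetical helper, ``normalize_variables`` (not part of dpest). It follows the validation rule described above: a single string is wrapped in a list, and anything other than a non-empty list of strings is rejected.

```python
from typing import List, Union


def normalize_variables(variables: Union[str, List[str]]) -> List[str]:
    # Wrap a single variable name in a list.
    if isinstance(variables, str):
        variables = [variables]
    # Reject empty input and non-string entries.
    is_valid = (isinstance(variables, list) and len(variables) > 0
                and all(isinstance(v, str) for v in variables))
    if not is_valid:
        raise ValueError("'variables' must be a non-empty string or a list of strings")
    return variables


print(normalize_variables("LAID"))  # ['LAID']
print(normalize_variables(["LAID", "CWAD", "T#AD"]))  # ['LAID', 'CWAD', 'T#AD']
```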

**Optional Arguments:**
=======
* **experiment** (*str*, *optional*):
  Experiment code as shown in the time-series output file header (e.g. ``"AZMC9311"``).
  When the same treatment name appears in more than one experiment within
  the same time-series file, this argument selects the correct
  experiment block. If it is not provided and the treatment is unique in the file,
  the function uses that experiment automatically. If the
  treatment appears in multiple experiments and ``experiment`` is not
  specified, an error listing the available experiments is reported.
* **nspaces_year_header** (*int*, *default: 5*): Field width, in characters, of the ``@YEAR`` column in
  the ``.OUT`` file. The ``.OUT`` format is unlikely to change in a way that requires modifying this value.
* **nspaces_doy_header** (*int*, *default: 4*): Field width, in characters, of the ``DOY`` column in
  the ``.OUT`` file. The time-series output format is unlikely to change in a way that requires
  modifying this value.
* **nspaces_columns_header** (*int*, *default: 6*): Field width, in characters, of every other column in the
  ``.OUT`` file. The time-series output format is unlikely to change in a way that requires
  modifying this value.
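
The ``experiment`` selection rule can be sketched as a lookup over an index of treatment blocks. In this minimal sketch, several pieces are hypothetical and do not reflect how dpest performs this step internally with its own helpers:

- the index layout, which maps each experiment code to its treatment names and line ranges;
- the function name ``select_block``;
- the line numbers;
- the pairing of the treatment with two experiments.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index: experiment code -> {treatment name: (first_line, last_line)}
BlockIndex = Dict[str, Dict[str, Tuple[int, int]]]


def select_block(index: BlockIndex, treatment: str,
                 experiment: Optional[str] = None) -> Tuple[Tuple[int, int], str]:
    # Experiments whose blocks contain the requested treatment
    matches = [exp for exp, blocks in index.items() if treatment in blocks]
    if experiment is not None:
        if experiment not in matches:
            raise ValueError(f"'{treatment}' not found in experiment '{experiment}'")
        return index[experiment][treatment], experiment
    if len(matches) == 1:
        return index[matches[0]][treatment], matches[0]
    if not matches:
        raise ValueError(f"Treatment '{treatment}' not found")
    # Ambiguous: the same treatment name appears in several experiments
    raise ValueError(f"'{treatment}' appears in several experiments: {matches}")


index = {
    "SWSW7501": {"164.0 KG N/HA IRRIG": (10, 140)},
    "AZMC9311": {"164.0 KG N/HA IRRIG": (150, 280)},
}
print(select_block(index, "164.0 KG N/HA IRRIG", experiment="AZMC9311"))
```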
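
**Returns:**
=======
* ``None``. The file is updated in place and a status message is printed. Errors (for example, a missing
  file or an unknown treatment) are printed rather than raised.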

**Examples:**
=======
1. **Basic Usage (List of variables):**

   .. code-block:: python

       from dpest import uts

       uts(
           uts_file_path='C:/DSSAT48/Wheat/PlantGro.OUT',
           treatment='164.0 KG N/HA IRRIG',
           variables=['LAID', 'CWAD', 'T#AD']
       )

   This example shows basic usage of the function with a list of variables (``LAID``, ``CWAD``, and ``T#AD``).
   If the simulation end date in the existing ``PlantGro.OUT`` file is earlier than the latest measurement date in
   the DSSAT "T file", the ``PlantGro.OUT`` file is extended with new rows. The values of all simulated variables
   in the added rows are set to ``0``.

2. **Basic Usage (Single variable):**

   .. code-block:: python

       from dpest import uts

       uts(
           uts_file_path='C:/DSSAT48/Wheat/PlantGro.OUT',
           treatment='164.0 KG N/HA IRRIG',
           variables='LAID'
       )

   This example shows basic usage of the function when only one variable (``LAID``) is specified. If the
   simulation end date in the existing ``PlantGro.OUT`` file is earlier than the latest measurement date in the
   DSSAT "T file", the ``PlantGro.OUT`` file is extended with new rows. The values of all simulated variables
   in the added rows are set to ``0``.
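
The following sketch builds rows like the ones added in the examples above. It assumes, as the sample row in the scenario suggests, that each added row advances ``@YEAR``, ``DOY`` and ``DAS`` by one day. Every other column gets ``0``, and all values are right-justified to the default field widths (5, 4 and 6 characters). ``padding_rows`` is a hypothetical helper, not dpest's internal implementation.

```python
from datetime import datetime, timedelta
from typing import List


def padding_rows(last_yyyyddd: int, last_das: int, n_rows: int, headers: List[str],
                 year_width: int = 5, doy_width: int = 4, col_width: int = 6) -> List[str]:
    # Build n_rows fixed-width rows that follow the last simulated day.
    last_day = datetime.strptime(str(last_yyyyddd), "%Y%j")
    rows = []
    for i in range(1, n_rows + 1):
        day = last_day + timedelta(days=i)  # rolls over into the next year if needed
        values = {"@YEAR": day.year, "DOY": day.timetuple().tm_yday, "DAS": last_das + i}
        row = str(values["@YEAR"]).rjust(year_width) + str(values["DOY"]).rjust(doy_width)
        # Remaining columns (DAS included) use the generic width; other variables get 0.
        row += "".join(str(values.get(col, 0)).rjust(col_width)
                       for col in headers if col not in ("@YEAR", "DOY"))
        rows.append(row)
    return rows


headers = ["@YEAR", "DOY", "DAS", "LAID", "CWAD"]
for line in padding_rows(2022102, 223, 2, headers):
    print(line)
```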
"""
    rows_added = 0  # Initialize
    yaml_sim_models_key = 'SIMULATION_CROP_MODELS'
    try:
        # Get the yaml_data
        # Get the directory of the current script
        current_dir = os.path.dirname(os.path.abspath(__file__))
        # Construct the path to arguments.yml
        arguments_file = os.path.join(current_dir, 'arguments.yml')
        # Ensure the YAML file exists
        if not os.path.isfile(arguments_file):
            raise FileNotFoundError(f"YAML file not found: {arguments_file}")
        # Load YAML configuration
        with open(arguments_file, 'r') as yml_file:
            yaml_data = yaml.safe_load(yml_file)
        # Validate uts_file_path
        validated_path = validate_file(uts_file_path, '.OUT')
        # Validate treatment
        if not treatment or not isinstance(treatment, str):
            raise ValueError("The 'treatment' must be a non-empty string.")
        # Convert 'variables' to a list if it's not already a list
        if not isinstance(variables, list):
            variables = [variables]
        # Validate that 'variables' is a non-empty list of strings
        if not variables or not all(isinstance(var, str) for var in variables):
            raise ValueError(
                "The 'variables' should be a non-empty string or a list of strings. For example: 'LAID' or ['LAID', 'CWAD']")
        # Assign default values if None and validate integer input
        if nspaces_year_header is None:
            nspaces_year_header = 5
        elif not isinstance(nspaces_year_header, int):
            raise ValueError("nspaces_year_header must be an integer.")
        if nspaces_doy_header is None:
            nspaces_doy_header = 4
        elif not isinstance(nspaces_doy_header, int):
            raise ValueError("nspaces_doy_header must be an integer.")
        if nspaces_columns_header is None:
            nspaces_columns_header = 6
        elif not isinstance(nspaces_columns_header, int):
            raise ValueError("nspaces_columns_header must be an integer.")
        # Get the line range of the selected treatment (and experiment) block
        treatment_dict = simulations_lines(validated_path)
        (start_i, end_i), experiment_code = resolve_treatment_block_by_experiment(
            file_path=validated_path,
            treatment=treatment,
            treatment_dict=treatment_dict,
            experiment=experiment,
        )
        selected_treatment_dict = {treatment: (start_i, end_i)}
        treatment_range = (start_i, end_i)
        # Read the time-series file for the selected treatment block
        ts_file_df = read_growth_file(validated_path, treatment_range)
        # Get dictionaries with the treatment number, experiment code and crop name of the treatment
        treatment_number_name, treatment_experiment_name, treatment_crop_name = \
            extract_treatment_info_plantgrowth(validated_path, selected_treatment_dict)
        crop_name_from_header = treatment_crop_name.get(treatment)
        if crop_name_from_header is None:
            raise ValueError(f"Could not determine crop name for treatment '{treatment}'.")
        # Load simulation crop/model mappings
        sim_models = yaml_data.get(yaml_sim_models_key, {})
        # Find the crop entry whose alias list (lower-cased) contains crop_name_from_header
        crop_code = None
        for crop_key, crop_info in sim_models.items():
            aliases = [a.lower() for a in crop_info.get('crop_aliases', [])]
            if crop_name_from_header.lower() in aliases:
                # Use the second alias as the crop code when available, otherwise the first
                crop_code = aliases[1] if len(aliases) > 1 else aliases[0]
                break
        if crop_code is None:
            raise ValueError(
                f"Could not infer crop code from crop name '{crop_name_from_header}'. "
                "Check SIMULATION_CROP_MODELS in arguments.yml."
            )
        if experiment_code is None:
            raise ValueError(f"Could not determine experiment code for treatment '{treatment}'.")
        # Build the T-file name: <EXPCODE>.<CROPCODE>T, e.g. SWSW7501 + WH -> SWSW7501.WHT
        t_file_name = f"{experiment_code}.{crop_code.upper()}T"
        t_file_path = os.path.join(os.path.dirname(validated_path), t_file_name)
        # Get the dataframe from the T file data
        t_df = wht_filedata_to_dataframe(t_file_path)
        # Load and filter data for all variables and get the measurement dates
        dates_variable_values_dict = filter_dataframe(t_df, treatment, treatment_number_name, variables)
        # Check if filter_dataframe returned an empty dictionary (indicating an error)
        if not dates_variable_values_dict:
            raise ValueError(f"No valid data found for treatment '{treatment}' with variables {variables}")
        # Join the last simulated year and day of year into one YYYYDDD number
        year_sim = int(str(ts_file_df['@YEAR'].iloc[-1]) + f"{ts_file_df['DOY'].iloc[-1]:03}")
        # Handle both 4-digit and 2-digit years for year_measured
        year_measured_key_str = str(list(dates_variable_values_dict.keys())[-1])
        if len(year_measured_key_str) == 5:  # YYDDD: 2-digit year
            year_measured_year = int(year_measured_key_str[:2])
            doy_measured = int(year_measured_key_str[2:])
            # Take the century from year_sim (e.g. 2022102 // 100000 -> 20)
            century = year_sim // 100000
            year_measured = int(f"{century}{year_measured_year:02d}{doy_measured:03d}")
        else:  # YYYYDDD: 4-digit year
            year_measured = int(year_measured_key_str)
        # Create the new rows to insert
        if year_sim < year_measured:
            from datetime import datetime
            # Count calendar days between the two YYYYDDD dates; plain integer subtraction
            # fails across a year boundary (2021365 -> 2022001 is 1 day, not 636)
            number_rows_add = (datetime.strptime(str(year_measured), '%Y%j')
                               - datetime.strptime(str(year_sim), '%Y%j')).days
            # Get the new rows using the new_rows_add() function
            new_rows = new_rows_add(ts_file_df, number_rows_add)
            # Read the existing file and store its contents
            with open(validated_path, 'r') as file:
                lines = file.readlines()
            # Identify the line where the headers are defined (e.g., '@YEAR')
            header_line = next(line for line in lines if '@YEAR' in line)
            # Extract column headers to maintain the correct order
            headers = header_line.strip().split()
            # Convert each row dictionary into a fixed-width row string
            formatted_rows = []
            for row_data in new_rows:
                row = (
                    str(row_data.get('@YEAR', 0)).rjust(nspaces_year_header) +
                    str(row_data.get('DOY', 0)).rjust(nspaces_doy_header) +
                    ''.join(str(row_data.get(col, 0)).rjust(nspaces_columns_header) for col in headers if
                            col not in ['@YEAR', 'DOY']) +
                    '\n'
                )
                formatted_rows.append(row)
            # Add new rows to the lines list
            lines[treatment_range[1]:treatment_range[1]] = formatted_rows
            # Update the rows_added counter
            rows_added = len(new_rows)
            # Write the updated content back to the file
            with open(validated_path, 'w') as file:
                file.writelines(lines)
        # Report how many rows were added
        if rows_added > 0:
            print(f"{validated_path} update: {rows_added} row{'s' if rows_added > 1 else ''} added successfully.")
        else:
            print(f"{validated_path} status: No update required.")
    except ValueError as ve:
        print(f"ValueError: {ve}")
    except FileNotFoundError as fe:
        print(f"FileNotFoundError: {fe}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")