import os

import pandas as pd
import yaml
from dpest.functions import *
def ts(
ts_file_path = None,
treatment = None,
variables = None,
output_path = None,
experiment=None,
suffix = None,
variables_classification = None,
ts_ins_first_line = None,
mrk = '~',
smk = '!',
):
"""
Creates a ``PEST instruction file (.INS)`` for DSSAT **time-series** output
files. This instruction file contains directions for PEST to read simulated
time-series values from daily DSSAT outputs (e.g. ``PlantGro.OUT``,
``PlantN.OUT``, ``PlantC.OUT``, ``PlantGrf.OUT``, ``SoilNi.OUT``,
``SoilWat.OUT``, ``SoilTemp.OUT``) and to match them with the corresponding
measured values stored in the DSSAT T file.
The time-series output files supported by this function share a common
tabular format where the first three columns are:
@YEAR DOY DAS
The ``PEST instruction file (.INS)`` guides PEST in extracting specific
model-generated observations at specific time points for the variables
specified by the user. Additionally, this function returns a tuple containing:
1) A DataFrame with the MEASURED observations (entered by the user in the
DSSAT T file) for the specified variables.
2) The path to the generated ``PEST instruction file (.INS)``.
**Required Arguments**
======================
* **ts_file_path** (*str*):
Path to the DSSAT time-series output file to read. This can be any
daily DSSAT output with ``@YEAR DOY DAS`` as the first three header
columns, for example:
- ``C:/DSSAT48/Soybean/PlantGro.OUT``
- ``C:/DSSAT48/Soybean/PlantN.OUT``
- ``C:/DSSAT48/Soybean/PlantC.OUT``
- ``C:/DSSAT48/Soybean/SoilNi.OUT``
- ``C:/DSSAT48/Soybean/SoilWat.OUT``
* **treatment** (*str*):
Name of the treatment for which the cultivar is being calibrated.
This must match exactly the treatment name as shown in the DSSAT
interface when an experiment is selected (i.e. the value that appears
in the ``TREATMENT`` line of the output header).
Example: ``"164.0 KG N/HA IRRIG"`` or ``"76 Equidist BRAGG"``.
* **variables** (*list* or *str*):
Variable code(s) from the DSSAT T file (and thus present in the header
of the selected time-series output file) that PEST will extract. The
instruction file uses these codes to read the model output at the
specified dates.
- A single variable can be provided as a string, e.g. ``"LAID"``.
- Multiple variables can be provided as a list, e.g.
``["LWAD", "SWAD", "GWAD", "RWAD", "CWAD", "HIAD", "PWAD"]``.
**Optional Arguments**
======================
* **output_path** (*str*, *default: current working directory*):
Directory where the generated ``PEST instruction file (.INS)`` will be
saved. If not provided, the current working directory is used.
* **experiment** (*str*, *optional*):
Experiment code as shown in the header of the time-series output file
(e.g. ``"AZMC9311"``). When the same treatment name appears in more than
one experiment within the same time-series file, this argument selects the
correct experiment block. If it is not provided and the treatment is unique
in the file, the function uses that experiment automatically. If the
treatment appears in multiple experiments and ``experiment`` is not
specified, the function uses the values for that treatment that appear in
the last part of the file.
* **suffix** (*str*, *default: None*):
Suffix to append to the output filename and variable names in the
.INS file. This short code (e.g. ``"TRT1"``, ``"TRT2"``) identifies
different treatments used for calibrating the same cultivar in the same
calibration process. It must be 1–4 characters long and alphanumeric.
For example, if ``suffix="TRT1"`` and
``ts_file_path="C:/DSSAT48/Wheat/PlantGro.OUT"``, the output
file will be named ``PlantGro_TRT1.ins`` and markers will look like
``!LAID_75167_TRT1!``.
* **variables_classification** (*dict*, *optional*):
Mapping of variable codes to their respective categories (groups).
When provided, this dictionary is used directly, with the format::
{"LAID": "lai", "CWAD": "biomass", ...}
When ``variables_classification`` is ``None``, the function loads a
global classification from the package configuration file
(``dpest/arguments.yml``, key ``VARIABLES_CLASSIFICATION_GLOBAL``),
and maps each variable code to its group from that dictionary.
* **ts_ins_first_line** (*str*, *default: "pif"*):
First line of the ``PEST instruction file (.INS)``. By default this
is read from the package configuration (key ``INS_FILE_VARIABLES`` in
``dpest/arguments.yml``) when not provided.
* **mrk** (*str*, *default: "~"*):
Primary marker delimiter character for the instruction file. Must be a
single character and cannot be A–Z, a–z, 0–9, ``!``, ``[``, ``]``,
``(``, ``)``, ``:``, space, tab, or ``&``.
* **smk** (*str*, *default: "!"*):
Secondary marker delimiter character for the instruction file. Must be
a single character and cannot be A–Z, a–z, 0–9, ``[``, ``]``, ``(``,
``)``, ``:``, space, tab, or ``&``.
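The delimiter rules for ``mrk`` and ``smk`` can be sketched as a small check.
This is only an illustration of the rules listed above, not the package's own
``validate_marker`` implementation; the helper name ``check_marker`` is
hypothetical.

```python
import string

# Characters that neither delimiter may use, per the rules listed above.
# chr(9) is the tab character.
COMMON_FORBIDDEN = set(string.ascii_letters + string.digits + "[]():&") | {" ", chr(9)}


def check_marker(marker, name):
    # Hypothetical helper, not the package's validate_marker().
    if not isinstance(marker, str) or len(marker) != 1:
        raise ValueError(f"{name} must be a single character.")
    forbidden = set(COMMON_FORBIDDEN)
    if name == "mrk":
        forbidden.add("!")  # "!" is only excluded for the primary marker
    if marker in forbidden:
        raise ValueError(f"{name} cannot be {marker!r}.")
    return marker


print(check_marker("~", "mrk"), check_marker("!", "smk"))
```

The two defaults (``~`` and ``!``) pass this check, while a call such as
``check_marker("!", "mrk")`` raises ``ValueError``.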
**Internal behaviour**
======================
* The function parses the header of the selected time-series file to
determine the experiment code, model, and crop name for the specified
treatment.
* From the crop name and the ``SIMULATION_CROP_MODELS`` section in
``dpest/arguments.yml``, it infers the DSSAT crop code (e.g. ``WH``,
``SB``, ``MZ``) and constructs the name of the T file as
``<EXPCODE>.<CROPCODE>T`` (e.g. ``SWSW7501.WHT``, ``CLMO8501.SBT``).
* The T file is then read to obtain the measured time-series values for
the requested variables and dates, which are used to build the DataFrame
and the .INS file.
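As a rough sketch of the naming step described above, the T file name can be
derived from the experiment code and the crop code and looked up next to the
time-series output file. The helper name ``t_file_path_for`` is hypothetical
and only illustrates the ``<EXPCODE>.<CROPCODE>T`` pattern.

```python
import os


def t_file_path_for(ts_file_path, experiment_code, crop_code):
    # Hypothetical helper: <EXPCODE>.<CROPCODE>T in the same folder as the output file.
    t_file_name = f"{experiment_code}.{crop_code.upper()}T"
    return os.path.join(os.path.dirname(ts_file_path), t_file_name)


path = t_file_path_for("C:/DSSAT48/Wheat/PlantGro.OUT", "SWSW7501", "wh")
print(os.path.basename(path))  # SWSW7501.WHT
```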
**Returns**
===========
* *tuple*:
* *pandas.DataFrame*:
A DataFrame containing the measured values for the selected
variables and dates, with columns:
- ``variable_name`` (including date and optional suffix)
- ``value_measured`` (float)
- ``group`` (classification group)
* *str*:
Full path to the generated ``PEST instruction file (.INS)``.
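The ``variable_name`` values follow the pattern
``<VARIABLE>_<DATE>[_<SUFFIX>]``. A minimal sketch of that naming, using a
hypothetical helper ``observation_name`` and the suffix rules described under
**suffix**:

```python
def observation_name(variable, date, suffix=None):
    # Hypothetical helper mirroring the pattern <VARIABLE>_<DATE>[_<SUFFIX>].
    if suffix is not None:
        if not suffix.isalnum() or len(suffix) > 4:
            raise ValueError("Suffix must be 1-4 alphanumeric characters.")
        return f"{variable}_{date}_{suffix}"
    return f"{variable}_{date}"


print(observation_name("LAID", "75167", "TRT1"))  # LAID_75167_TRT1
print(observation_name("LAID", "75167"))          # LAID_75167
```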
**Examples**
============
1. **PlantGro time series (Soybean)**
.. code-block:: python
from dpest import ts
plantgro_observations, plantgro_ins_path = ts(
treatment='76 Equidist BRAGG',
ts_file_path='C:/DSSAT48/Soybean/PlantGro.OUT',
variables=['LWAD', 'SWAD', 'GWAD', 'RWAD', 'CWAD',
'HIAD', 'PWAD', 'LN%D', 'SH%D', 'HIPD', 'SLAD'],
)
This creates a ``PlantGro.ins`` file and a DataFrame with the measured
values for the selected plant growth variables for treatment
``"76 Equidist BRAGG"``.
2. **PlantN time series (Soybean)**
.. code-block:: python
from dpest import ts
plantn_observations, plantn_ins_path = ts(
treatment='76 Equidist BRAGG',
ts_file_path='C:/DSSAT48/Soybean/PlantN.OUT',
variables=['LN%D', 'SN%D'],
)
This reads nitrogen-related time-series variables from ``PlantN.OUT``
and creates a matching instruction file and DataFrame. Global variable
classifications are used unless a custom mapping is supplied.
3. **Soil water time series**
.. code-block:: python
from dpest import ts
soilwat_observations, soilwat_ins_path = ts(
treatment='76 Equidist BRAGG',
ts_file_path='C:/DSSAT48/Soybean/SoilWat.OUT',
variables=['SW1D', 'SW2D', 'SW3D'],
)
This reads daily soil water content in the top layers from
``SoilWat.OUT`` and builds a PEST instruction file for those variables.
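Each data line of the generated instruction file combines a line advance
(``l<N>``), one ``w`` per whitespace-delimited column to skip, and the
observation name wrapped in the secondary marker. The sketch below assumes the
column positions are already known; the helper ``instruction_line`` and the
positions used here are hypothetical, chosen only to show the layout.

```python
def instruction_line(days, positions, date, smk="!", suffix=""):
    # positions maps each variable code to its column position on the data line
    # (hypothetical input; ts() derives it from the output-file header).
    line = f"l{days}"
    current_pos = 0
    for var in sorted(positions, key=positions.get):
        pos = positions[var]
        line += " w" * (pos - current_pos)  # skip columns up to the target one
        line += f" {smk}{var}_{date}{suffix}{smk}"
        current_pos = pos
    return line


print(instruction_line(12, {"LAID": 6, "CWAD": 12}, "75167"))
```

With these made-up positions the result is
``l12 w w w w w w !LAID_75167! w w w w w w !CWAD_75167!``.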
"""
# Define default variables:
yaml_file_variables = 'INS_FILE_VARIABLES'
yaml_variables_classification = 'VARIABLES_CLASSIFICATION_GLOBAL'
yaml_sim_models_key = 'SIMULATION_CROP_MODELS'
MAX_VAR_LENGTH = 20 # In PEST, the variable names should not exceed 20 characters
try:
## Get the yaml_data
# Get the directory of the current script
current_dir = os.path.dirname(os.path.abspath(__file__))
# Construct the path to arguments.yml
arguments_file = os.path.join(current_dir, 'arguments.yml')
# Ensure the YAML file exists
if not os.path.isfile(arguments_file):
raise FileNotFoundError(f"YAML file not found: {arguments_file}")
# Load YAML configuration
with open(arguments_file, 'r') as yml_file:
yaml_data = yaml.safe_load(yml_file)
# Validate treatment
if not treatment or not isinstance(treatment, str):
raise ValueError("The 'treatment' must be a non-empty string.")
# Convert 'variables' to a list if it's not already a list
if not isinstance(variables, list):
variables = [variables]
# Validate that 'variables' is a non-empty list of strings
if not variables or not all(isinstance(var, str) for var in variables):
raise ValueError(
"The 'variables' should be a non-empty string or a list of strings. For example: 'LAID' or ['LAID', 'CWAD']")
# Validate that the YAML configuration was loaded
if yaml_data is None:
raise ValueError(f"The configuration file is empty or could not be parsed: {arguments_file}")
# Validate marker delimiters using the validate_marker() function
mrk = validate_marker(mrk, "mrk")
smk = validate_marker(smk, "smk")
# Ensure mrk and smk are different
if mrk == smk:
raise ValueError("mrk and smk must be different characters.")
# Fall back to the global variable classification when none is provided
if variables_classification is None:
variables_classification = yaml_data[yaml_variables_classification]
if ts_ins_first_line is None:
# Load default arguments from the YAML file if not provided
function_arguments = yaml_data[yaml_file_variables]
ts_ins_first_line = function_arguments['first_line']
# Validate ts_file_path
validated_path = validate_file(ts_file_path, '.OUT')
# Get treatment number
treatment_dict = simulations_lines(validated_path)
# Resolve the correct DSSAT block when the same treatment appears in multiple experiments
selected_block, resolved_experiment_code = resolve_treatment_block_by_experiment(
file_path=validated_path,
treatment=treatment,
treatment_dict=treatment_dict,
experiment=experiment
)
# Use only the selected block from here on to avoid using the wrong duplicated treatment
selected_treatment_dict = {treatment: selected_block}
# Get dictionaries keyed by treatment name: treatment number, experiment code, and crop name
treatment_number_name, treatment_experiment_name, treatment_crop_name = \
extract_treatment_info_plantgrowth(validated_path, selected_treatment_dict)
crop_name_from_header = treatment_crop_name.get(treatment)
if crop_name_from_header is None:
raise ValueError(f"Could not determine crop name for treatment '{treatment}'.")
# Load simulation crop/model mappings
sim_models = yaml_data.get(yaml_sim_models_key, {})
# Find the crop entry whose alias list (lower-cased) contains crop_name_from_header
crop_code = None
for crop_key, crop_info in sim_models.items():
aliases = [a.lower() for a in crop_info.get('crop_aliases', [])]
if crop_name_from_header.lower() in aliases:
crop_code = aliases[1] if len(aliases) > 1 else aliases[0]
break
if crop_code is None:
raise ValueError(
f"Could not infer crop code from crop name '{crop_name_from_header}'. "
"Check SIMULATION_CROP_MODELS in arguments.yml."
)
# Use the resolved experiment code from the selected block
experiment_code = resolved_experiment_code
if experiment_code is None:
raise ValueError(f"Could not determine experiment code for treatment '{treatment}'.")
# Build T-file name: <EXPCODE>.<CROPCODE>T, e.g. SWSW7501 + . + WH + T -> SWSW7501.WHT
t_file_name = f"{experiment_code}.{crop_code.upper()}T"
t_file_path = os.path.join(os.path.dirname(validated_path), t_file_name)
# Get the dataframe from the T file data
t_df = wht_filedata_to_dataframe(t_file_path)
# Load and filter data for all variables
dates_variable_values_dict = filter_dataframe(t_df, treatment, treatment_number_name, variables)
# Check if the filter_dataframe returned an empty dictionary (indicating an error)
if not dates_variable_values_dict:
raise ValueError(f"No valid data found for treatment '{treatment}' with variables {variables}")
# Get the header and first simulation date
header_line, first_sim_line, date_first_sim = get_header_and_first_sim(
validated_path,
treatment,
treatment_dict=selected_treatment_dict
)
# For each measurement date, calculate the number of days after the first simulation date
days_dict = calculate_days_dict(dates_variable_values_dict, date_first_sim)
# Adjust the days after the first simulation date
adjusted_days_dict = adjust_days_dict(days_dict)
# Validate suffix if provided
if suffix is not None:
if not suffix.isalnum():
raise ValueError("Suffix must only contain letters and numbers.")
if len(suffix) > 4:
raise ValueError("Suffix must be at most 4 characters long.")
suffix = "_" + suffix
# Process each variable and generate output text
output_text = ""
for date, (days, vars_at_date) in adjusted_days_dict.items():
positions = find_variable_position(header_line, first_sim_line, vars_at_date)
line = f"l{days}"
current_pos = 0 # before first token on the line
for var in sorted(positions, key=positions.get):
pos = positions[var]
# From start-of-line, reaching token `pos` requires `pos` times "w"
w_count = pos if current_pos == 0 else (pos - current_pos)
if w_count < 0:
raise ValueError(
f"Non-monotonic positions: {var} pos={pos}, current_pos={current_pos}"
)
line += " w" * w_count
line += f" {smk}{var}_{date}{suffix or ''}{smk}"
current_pos = pos
output_text += line + "\n"
# Validate output_path
output_path = validate_output_path(output_path)
# Determine the output filename: <basename><suffix>.ins
# (splitext handles any letter case of the '.OUT' extension; suffix already includes the leading '_')
base_name = os.path.splitext(os.path.basename(validated_path))[0]
output_filename = f"{base_name}{suffix or ''}.ins"
# Create output text file
ts_ins_file_path = os.path.join(output_path, output_filename)
# Construct the content for the new .ins file
# Include the experiment code as an anchor before the treatment (prevents wrong block when duplicated)
ins_file_content = (
f"{ts_ins_first_line} {mrk}\n"
f"{mrk}{experiment_code}{mrk}\n"
f"{mrk}{treatment}{mrk}\n"
f"{mrk}{header_line[1:].strip()}{mrk}\n"
f"{output_text}"
)
#--------- GET THE GROUP NAME OF THE VARIABLES
dates_variable_values_data = [
{
'date': date,
'variable': variable,
'value_measured': value,
'variable_name': f"{variable}_{date}"
}
for date, variables in dates_variable_values_dict.items()
for variable, value in variables.items()
]
# Create the DataFrame
dates_variable_values_df = pd.DataFrame(dates_variable_values_data)
# Map variables to their respective groups
dates_variable_values_df['group'] = dates_variable_values_df['variable'].map(variables_classification)
# Convert 'value_measured' column to float
dates_variable_values_df['value_measured'] = dates_variable_values_df['value_measured'].astype(float)
# Add the suffix to the variable_name
if suffix is not None:
dates_variable_values_df['variable_name'] = dates_variable_values_df['variable_name'] + suffix
# Select and reorder the columns
result_df = dates_variable_values_df[['variable_name', 'value_measured', 'group']]
# Write the content to the .ins file
with open(ts_ins_file_path, 'w') as ins_file:
ins_file.write(ins_file_content)
print(f"{output_filename} file generated and saved to: {ts_ins_file_path}")
return result_df, ts_ins_file_path
except ValueError as ve:
print(f"ValueError: {ve}")
except FileNotFoundError as fe:
print(f"FileNotFoundError: {fe}")
except Exception as e:
print(f"An unexpected error occurred: {e}")