Source code for dpest.spe

# import os
# import yaml
from dpest.functions import *



[docs]
def spe(
    species_file_path = None,
    output_path = None,
    new_template_file_extension = None,
    tpl_first_line = None,
    mrk = '~',
    **parameters_grouped,
):
    """
    Creates a ``PEST template file (.TPL)`` for species-level parameters based on a
    ``DSSAT species file (.SPE)`` (e.g. ``SBGRO048.SPE``, ``WHCER048.SPE``).

    Unlike cultivar (``.CUL``) and ecotype (``.ECO``) files, species files do not follow
    a strictly tabular structure. Parameters are often arranged in blocks, tuples, or
    vectors, and may lack explicit column headers. This module therefore relies on
    user-specified parameter locations (line number and column position) to build the
    template.

    Each parameter to be calibrated is provided as a keyword argument, where the keyword
    is the user-defined parameter name, and the value describes the location and bounds
    of that parameter. Two input formats are supported:

    1) **Tuple / list syntax** (no explicit keys):

       .. code-block:: python

          PARMAX = (line, column, min_value, max_value, group_name)

       For example:

       .. code-block:: python

          PARMAX = (5, 1, 20.0, 60.0, 'PHOTOSYN')

       where:

       - ``line`` (*int*): 1-based line number in the ``.SPE`` file where the parameter
         value (to calibrate) is located.
       - ``column`` (*int*): 1-based column index of the parameter value (to calibrate) within that
         line, counting numeric entries from left to right.
       - ``min_value`` (*float*): Minimum value of the parameter search range to be used
         in the PEST calibration.
       - ``max_value`` (*float*): Maximum value of the parameter search range to be used
         in the PEST calibration.
       - ``group_name`` (*str*, *optional*): Name of the PEST parameter group to which
         this parameter belongs. If omitted, the group name defaults to the parameter
         name (e.g. ``PARMAX``).

    2) **Dictionary syntax** (explicit keys):

       .. code-block:: python

          PARMAX = {
              'line': 5,
              'column': 1,
              'min': 20.0,
              'max': 60.0,
              'group': 'PHOTOSYN',
          }

       If ``'group'`` is not provided, it defaults to the parameter name.

    The function replaces the original numeric values at the specified locations with
    PEST template markers, truncating or padding parameter identifiers as needed to fit
    within the available space, and ensuring that each truncated identifier is unique
    (e.g. ``~PM0~``, ``~PM1~``). The resulting template file is written alongside the
    original species file (or to ``output_path`` if provided).

    **Required Arguments:**
    =======

        * **species_file_path** (*str*):
            Full path to the ``DSSAT species file (.SPE)``. For example:

            - ``C:/DSSAT48/Genotype/WHCER048.SPE``
            - ``C:/DSSAT48/Genotype/SBGRO048.SPE``

    **Optional Arguments:**
    =======

        * **output_path** (*str*, *default: current working directory*):
            Directory to save the generated ``PEST template file (.TPL)``.

        * **new_template_file_extension** (*str*, *default: "TPL"*):
            Extension for the generated ``PEST template file (.TPL)``. This is the PEST
            default value and should not be changed without good reason.

        * **tpl_first_line** (*str*, *default: "ptf"*):
            First line to include in the ``PEST template file (.TPL)``. This is the PEST
            default value and should not be changed without good reason.

        * **mrk** (*str*, *default: "~"*):
            Primary marker delimiter character for the template file. Must be a single
            character and cannot be A–Z, a–z, 0–9, ``!``, ``[``, ``]``, ``(``, ``)``,
            ``:``, space, tab, or ``&``.

        * **parameters_grouped** (*dict*, *optional*):
            Species parameters to calibrate, passed as keyword arguments. Each keyword
            corresponds to a parameter name (used as the PEST parameter identifier before
            truncation), and its value is either:

            - A tuple/list: ``(line, column, min_value, max_value[, group_name])``
            - A dictionary with keys: ``'line'``, ``'column'``, ``'min'``, ``'max'``,
              and optional ``'group'``.

            For example:

            .. code-block:: python

               from dpest import spe

               species_parameters, species_tpl_path = spe(
                   species_file_path = 'C:/DSSAT48/Genotype/WHCER048.SPE',
                   PGERM = (15, 1, 0.0, 20.0, 'Phase_dur'),
                   P0    = (15, 3, -5.0, 5.0, 'Phase_dur'),
                   Fac   = (19, 1, 0.0, 2.0,  'Phase_dur'),
                   h     = (19, 2, 10.0, 30.0, 'Phase_dur'),
                   GrStg = (19, 3, 0.0, 4.0,  'Phase_dur'),
                   LWLOS = (30, 3, 0.0, 6.0,  'Phase_dur'),
               )

            In this case, several CERES-Wheat species parameters from different sections
            of the WHCER048.SPE file are selected for calibration in a single template.

    **Returns:**
    =======

    * *tuple*: A tuple containing:

        * *dict*: A dictionary containing:

            * ``'parameters'``:
                Current parameter values at the specified locations. Keys are the
                truncated parameter identifiers that are written into the template file.

            * ``'minima_parameters'``:
                User-specified minimum values for each parameter, keyed by the truncated
                parameter identifiers. These values define the lower limits of the
                parameter ranges explored by PEST.

            * ``'maxima_parameters'``:
                User-specified maximum values for each parameter, keyed by the truncated
                parameter identifiers. These values define the upper limits of the
                parameter ranges explored by PEST.

            * ``'parameters_grouped'``:
                Parameter group definitions, with each group containing a comma-separated
                list of the truncated parameter identifiers used in the template file.

        * *str*:
            The full path to the generated ``PEST template file (.TPL)``.

    **Notes:**
    =======

    * Line numbering convention:
        This function assumes that the user-provided ``line`` values are 1-based
        (i.e. the first line in the file is line 1). Internally, these values are
        converted to 0-based indices when accessing the file content.

    * Column indexing:
        The ``column`` argument is interpreted as a 1-based index of the numeric entry
        within the specified line, counting from left to right. Internally, the line is
        split into entries using default whitespace separation in order to locate the
        corresponding value.

    **Examples:**
    =======

    1. **CERES-Wheat species file (WHCER048.SPE):**

       .. code-block:: python

          from dpest import spe

          species_parameters, species_tpl_path = spe(
              species_file_path = 'C:/DSSAT48/Genotype/WHCER048.SPE',
              PGERM = (15, 1, 0.0, 20.0, 'Phase_dur'),
              P0    = (15, 3, -5.0, 5.0, 'Phase_dur'),
              Fac   = (19, 1, 0.0, 2.0,  'Phase_dur'),
              h     = (19, 2, 10.0, 30.0, 'Phase_dur'),
              GrStg = (19, 3, 0.0, 4.0,  'Phase_dur'),
              LWLOS = (30, 3, 0.0, 6.0,  'Phase_dur'),
          )

       The returned ``species_parameters`` dictionary can be used to define PEST
       parameter groups and parameter ranges in the PEST control file using the ``pst``
       module, and ``species_tpl_path`` is used in the ``input_output_file_pairs``
       argument of the ``pst`` module to match the original WHCER048.SPE file to the
       template file.

    2. **Soybean species file (SBGRO048.SPE) using dictionary syntax:**

       .. code-block:: python

          from dpest import spe

          species_parameters, species_tpl_path = spe(
              species_file_path = 'C:/DSSAT48/Genotype/SBGRO048.SPE',
              PARMAX = {
                  'line': 5,
                  'column': 1,
                  'min': 20.0,
                  'max': 60.0,
                  'group': 'PHOTOSYN',
              },
              PHTMAX = {
                  'line': 5,
                  'column': 2,
                  'min': 40.0,
                  'max': 80.0,
                  'group': 'PHOTOSYN',
              },
              XLMAXT_2 = {
                  'line': 11,
                  'column': 2,
                  'min': 0.0,
                  'max': 20.0,
                  'group': 'TEMP_RESP',
              },
              XPGSLW_3 = {
                  'line': 18,
                  'column': 3,
                  'min': 0.0,
                  'max': 0.01,
                  'group': 'SLW_SHAPE',
              },
              RTDEPI = {
                  'line': 40,
                  'column': 1,
                  'min': 10.0,
                  'max': 40.0,
                  'group': 'ROOTS',
              },
          )

       This example illustrates the dictionary syntax for specifying parameter locations
       and bounds in the soybean SBGRO048.SPE file, where several parameters are stored
       in tuple or vector‑like blocks rather than simple scalar entries.
    """

    # YAML configuration block for species template settings
    yml_species_block = 'SPECIES_TPL_FILE'
    yaml_file_variables = 'FILE_VARIABLES'

    # Fixed field width (characters per numeric entry in SPE lines)
    FIELD_WIDTH = 6
    # Fixed number of characters for the ID between markers: ~XXX~
    ID_LEN = 3

    try:
        # Locate YAML configuration in the same directory as this module
        current_dir = os.path.dirname(os.path.abspath(__file__))
        arguments_file = os.path.join(current_dir, 'arguments.yml')
        if not os.path.isfile(arguments_file):
            raise FileNotFoundError(f"YAML file not found: {arguments_file}")

        with open(arguments_file, 'r') as yml_file:
            yaml_data = yaml.safe_load(yml_file)

        # Validate species file path and extension
        if species_file_path is None:
            raise ValueError("The 'species_file_path' argument is required and must be specified by the user.")

        spe_extension = yaml_data[yml_species_block][yaml_file_variables].get('spe_file_extension', '.SPE')
        validated_spe_file_path = validate_file(species_file_path, spe_extension)

        # Read template-related defaults from YAML
        function_arguments = yaml_data[yml_species_block][yaml_file_variables]
        mrk = validate_marker(mrk, "mrk")

        if new_template_file_extension is None:
            new_template_file_extension = function_arguments['new_template_file_extension']
        if tpl_first_line is None:
            tpl_first_line = function_arguments['tpl_first_line']

        # Read the entire species file into a list of lines
        file_content = read_dssat_file(validated_spe_file_path)
        lines = file_content.split('\n')

        # Parameter value dicts (keyed initially by full parameter name)
        current_parameter_values = {}
        minima_parameter_values = {}
        maxima_parameter_values = {}

        # Mapping from full parameter name -> truncated PEST ID
        parameter_name_truncated = {}

        # Normalised parameter definitions: [{'name','line','column','min','max','group'}, ...]
        normalized_parameters = []

        # ------------------------------------------------------------------
        # Normalise user-provided parameter specifications
        # ------------------------------------------------------------------
        for param_name, spec in parameters_grouped.items():
            full_name = str(param_name).strip()

            # Tuple/list syntax: (line, column, min, max[, group])
            if isinstance(spec, (tuple, list)):
                if len(spec) < 4:
                    raise ValueError(
                        f"Parameter '{full_name}' must be specified as "
                        "(line, column, min, max[, group]) when using tuple/list syntax."
                    )
                line = spec[0]
                column = spec[1]
                p_min = spec[2]
                p_max = spec[3]
                group = spec[4] if len(spec) > 4 else full_name

            # Dict syntax with explicit keys
            elif isinstance(spec, dict):
                try:
                    line = spec['line']
                    column = spec['column']
                    p_min = spec['min']
                    p_max = spec['max']
                except KeyError as exc:
                    raise ValueError(
                        f"Parameter '{full_name}' dictionary must contain 'line', 'column', 'min', and 'max' keys."
                    ) from exc
                group = spec.get('group', full_name)

            else:
                raise ValueError(
                    f"Parameter '{full_name}' must be specified as a tuple/list "
                    "(line, column, min, max[, group]) or as a dictionary."
                )

            # Ensure integer indices for line and column
            try:
                line = int(line)
                column = int(column)
            except Exception as exc:
                raise ValueError(
                    f"'line' and 'column' for parameter '{full_name}' must be integers."
                ) from exc

            normalized_parameters.append(
                {
                    'name': full_name,
                    'line': line,
                    'column': column,
                    'min': p_min,
                    'max': p_max,
                    'group': str(group).strip(),
                }
            )

        # ------------------------------------------------------------------
        # First pass: assign truncated IDs based on fixed field widths, enforce uniqueness
        # ------------------------------------------------------------------
        used_truncated_ids = set()

        for param_def in normalized_parameters:
            full_name = param_def['name']
            line_number_user = param_def['line']
            column_index = param_def['column']

            # 1-based line index from user -> 0-based in list
            line_idx = line_number_user - 1
            if line_idx < 0 or line_idx >= len(lines):
                raise ValueError(
                    f"Line index {line_number_user} for parameter '{full_name}' is out of range "
                    f"in file {validated_spe_file_path}."
                )

            line_text = lines[line_idx]

            # Fixed-width field for this column (6 characters wide)
            col0 = column_index - 1  # 0-based column index
            field_start = col0 * FIELD_WIDTH
            field_end = field_start + FIELD_WIDTH

            # Ensure line is long enough for this field
            if field_end > len(line_text):
                line_text = line_text.ljust(field_end)
                lines[line_idx] = line_text

            # We always have FIELD_WIDTH characters available for the marker
            # We want 3 characters for the ID between markers: ~XXX~
            max_id_len = ID_LEN

            # Start base_id from a short code (first 3 characters of the name)
            base_id = full_name.strip()
            if len(base_id) > 3:
                base_id = base_id[:3]

            # Enforce maximum length inside the field
            if len(base_id) > max_id_len:
                base_id = base_id[:max_id_len]

            # If there is still room, pad on the right to use full ID length
            if len(base_id) < max_id_len:
                base_id = base_id.ljust(max_id_len)

            truncated = base_id
            counter = 0
            # Ensure unique IDs by altering the tail with numeric suffixes when needed
            while truncated in used_truncated_ids:
                suffix = str(counter)
                if len(base_id) > len(suffix):
                    truncated = base_id[:-len(suffix)] + suffix
                else:
                    truncated = base_id + suffix
                counter += 1

            used_truncated_ids.add(truncated)
            parameter_name_truncated[full_name] = truncated

        # ------------------------------------------------------------------
        # Second pass: build markers and modify lines
        # ------------------------------------------------------------------
        # Work on a copy of lines so multiple parameters on the same line compose correctly
        line_buffer = list(lines)

        for param_def in normalized_parameters:
            full_name = param_def['name']
            line_number_user = param_def['line']
            column_index = param_def['column']
            p_min = param_def['min']
            p_max = param_def['max']

            line_idx = line_number_user - 1
            line_text = line_buffer[line_idx]

            # Fixed-width field for this column
            col0 = column_index - 1
            field_start = col0 * FIELD_WIDTH
            field_end = field_start + FIELD_WIDTH

            if field_end > len(line_text):
                line_text = line_text.ljust(field_end)
                line_buffer[line_idx] = line_text

            field_width = field_end - field_start  # should be FIELD_WIDTH

            # Original numeric value for information (strip spaces)
            field_str = line_text[field_start:field_end]
            token_str = field_str.strip()

            # Store the original numeric value and bounds using the full parameter name
            current_value = token_str
            current_parameter_values[full_name] = current_value
            minima_parameter_values[full_name] = p_min
            maxima_parameter_values[full_name] = p_max

            # Build the marker using the precomputed truncated ID
            truncated_id = parameter_name_truncated[full_name]

            # Fixed: 3 characters for ID inside 6-char field: ~XXX~
            max_id_len = ID_LEN

            # Start from the truncated ID, further limit to max_id_len and strip spaces
            base_core = truncated_id[:max_id_len].strip()

            # Always pad with '-' to exactly max_id_len
            if len(base_core) > max_id_len:
                base_core = base_core[:max_id_len]
            if len(base_core) < max_id_len:
                base_core = base_core.ljust(max_id_len, '-')

            id_for_marker = base_core

            # Assemble the marker core with markers around the padded ID
            marker_core = f"{mrk}{id_for_marker}{mrk}"  # e.g. ~PGE~ or ~h--~

            # Right-align marker inside the field, using spaces to the left if available
            marker = marker_core.rjust(field_width)

            # Replace the field region in the line with the marker
            new_line = line_text[:field_start] + marker + line_text[field_end:]
            line_buffer[line_idx] = new_line

        # ------------------------------------------------------------------
        # Insert PEST header, write TPL, and build return structures
        # ------------------------------------------------------------------
        line_buffer.insert(0, f"{tpl_first_line} {mrk}")

        output_path = validate_output_path(output_path)
        output_new_file_path = os.path.join(
            output_path,
            os.path.splitext(os.path.basename(validated_spe_file_path))[0] + '_SPE' + '.' + new_template_file_extension
        )

        with open(output_new_file_path, 'w') as file:
            file.write("\n".join(line_buffer))

        # Translate dictionaries from full_name keys to truncated IDs
        current_parameter_values = {
            parameter_name_truncated[k]: v
            for k, v in current_parameter_values.items()
            if k in parameter_name_truncated
        }
        minima_parameter_values = {
            parameter_name_truncated[k]: v
            for k, v in minima_parameter_values.items()
            if k in parameter_name_truncated
        }
        maxima_parameter_values = {
            parameter_name_truncated[k]: v
            for k, v in maxima_parameter_values.items()
            if k in parameter_name_truncated
        }

        # Build group definitions (group -> comma-separated truncated IDs)
        grouped_truncated = {}
        for param_def in normalized_parameters:
            full_name = param_def['name']
            group_name = param_def['group']
            if full_name in parameter_name_truncated:
                tid = parameter_name_truncated[full_name]
                grouped_truncated.setdefault(group_name, []).append(tid)

        grouped_truncated = {g: ', '.join(v) for g, v in grouped_truncated.items()}

        print(f"Template file successfully created at: {output_new_file_path}")

        return {
            'parameters': current_parameter_values,
            'minima_parameters': minima_parameter_values,
            'maxima_parameters': maxima_parameter_values,
            'parameters_grouped': grouped_truncated,
        }, output_new_file_path

    except ValueError as ve:
        print(f"ValueError: {ve}")
    except FileNotFoundError as fe:
        print(f"FileNotFoundError: {fe}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
Table of Contents

Source code for dpest.spe