Validating jobs

Validating job creation

Before submitting the task, the server will transform the ecf script to a job file.

This process, known as job creation, is performed by the ecflow_server when the task is ready for submission.

The resulting job file is the script that the ecflow_server will actually submit for execution.

Considering the $HOME/course/test/t1.ecf file, defined in the previous section, the generation of the job file will include the following steps:

  • %include "../head.h" will be substituted by the content of the selected file.

  • %include "../tail.h" will be substituted by the content of the selected file.

  • All variable occurrences (i.e. any text of the form %<VAR>%) will be substituted by the value of the named variable. For example, %ECF_NAME% will be replaced by /test/t1, the full path of the task.
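The substitution steps above can be mimicked with a few lines of Python. This is only a toy sketch to make the idea concrete, not how ecFlow implements pre-processing: the function name preprocess, the two regular expressions, and the choice to resolve quoted includes relative to the script's directory are all assumptions of this example, and nested includes, %include <...> lookups and escaped percent characters are deliberately ignored.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical patterns for this sketch; real ecFlow pre-processing has more rules.
INCLUDE_RE = re.compile(r'^%include\s+"([^"]+)"\s*$')
VARIABLE_RE = re.compile(r"%(\w+)%")


def preprocess(script: Path, variables: dict[str, str]) -> str:
    """Expand quoted %include lines, then substitute %VAR% occurrences."""
    expanded = []
    for line in script.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # Assumption: the quoted path is resolved relative to the script.
            included = (script.parent / match.group(1)).resolve()
            expanded.append(included.read_text().rstrip("\n"))
        else:
            expanded.append(line)
    text = "\n".join(expanded) + "\n"
    # An unknown variable raises KeyError, a stand-in for the
    # "reference to an unspecified variable" error described in the text.
    return VARIABLE_RE.sub(lambda m: variables[m.group(1)], text)


if __name__ == "__main__":
    # Stand-in files created in a temporary directory; their contents are
    # invented for the demo and are not the tutorial's head.h/tail.h.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "test").mkdir()
        (root / "head.h").write_text("# head for %ECF_NAME%\n")
        (root / "tail.h").write_text("# tail\n")
        ecf = root / "test" / "t1.ecf"
        ecf.write_text('%include "../head.h"\necho "running %ECF_NAME%"\n%include "../tail.h"\n')
        print(preprocess(ecf, {"ECF_NAME": "/test/t1"}))
```

Substituting variables only after the includes have been expanded means that variables used inside the included files are replaced as well.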

For practical purposes, it is often useful to check the job creation process even before loading the suite definition. This allows the early detection of potential problems, such as missing ecf script or include files, references to unspecified variables and other errors during pre-processing.

Using the ecFlow Python API it is possible to execute the job creation process locally.

Consider the following regarding the job creation process performed by the Python API:

  • The job creation is independent of the ecflow_server, so default values will be used for server specific variables such as ECF_PORT and ECF_HOST.

  • The resulting job files will use the extension .job0, whereas the server will always generate jobs with the extension .job<N> (where <N> corresponds to ECF_TRYNO, which is never zero).

  • The job file is created in the same directory as the ecf script.

Listing 6 $HOME/course/validate.py
#!/usr/bin/env python3

import pathlib
from ecflow import Defs

if __name__ == '__main__':

    base = pathlib.Path.home() / "course"

    print("[1] Load suite definition from file 'test.def'")
    defs = Defs(str(base / "test.def"))
    print(defs)

    print("[2] Validating job creation: .ecf -> .job0")
    defs.check_job_creation(throw_on_error=True, verbose=True)

The script above loads the suite definition from the $HOME/course/test.def file and performs the check via the call to ecflow.Defs.check_job_creation. An all-in-one script could also create the suite definition programmatically, followed by the job creation check.
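A minimal sketch of such an all-in-one script is given here. It assumes that the suite from the previous section is named test, contains a single task t1, and sets ECF_HOME to $HOME/course; adjust these to match the actual definition. The helper name build_and_check is hypothetical.

```python
#!/usr/bin/env python3

import pathlib


def build_and_check(base: pathlib.Path):
    """Build the suite definition in memory and run the job creation check."""
    # Imported inside the function so that the import error, if ecFlow's
    # Python API is not installed, only appears when the check is requested.
    from ecflow import Defs

    defs = Defs()
    suite = defs.add_suite("test")  # assumed suite name
    # ECF_HOME tells job creation where to look for test/t1.ecf.
    suite.add_variable("ECF_HOME", str(base))
    suite.add_task("t1")  # assumed task name

    # Same check as in validate.py; raises if job creation fails.
    defs.check_job_creation(throw_on_error=True, verbose=True)
    return defs


if __name__ == "__main__":
    print(build_and_check(pathlib.Path.home() / "course"))
```

Because the definition is built in memory, this variant does not depend on test.def being present or up to date.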

What to do

  1. Create the $HOME/course/validate.py script as shown above, and execute it as follows:

    cd $HOME/course
    
    # Either run by explicitly invoking python
    python3 ./validate.py
    
    # Or make the script executable, and run it directly
    chmod +x validate.py
    ./validate.py
    
  2. Examine the job file $HOME/course/test/t1.job0; in particular, note the variable substitutions made during job creation (e.g. ECF_PORT and ECF_HOST, which take default values because no server is involved).
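To help with the second step, the substituted lines can be picked out of the job file with a short helper. This is only a convenience sketch: the function name ecf_lines is hypothetical, and it assumes that the head.h file from the previous section mentions ECF_HOST and ECF_PORT by name.

```python
from pathlib import Path


def ecf_lines(job_text: str, names=("ECF_HOST", "ECF_PORT")) -> list[str]:
    """Return the lines of a job file that mention any of the given names."""
    return [
        line
        for line in job_text.splitlines()
        if any(name in line for name in names)
    ]


if __name__ == "__main__":
    job = Path.home() / "course" / "test" / "t1.job0"
    # Print only the lines where server-specific variables were substituted.
    for line in ecf_lines(job.read_text()):
        print(line)
```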

Validating job execution

The previous section demonstrated how a task script can be transformed into a job script.

Unfortunately, trying to run this job script locally will fail, because the ecflow_client commands embedded in the job will not be able to communicate with the server. In particular, server-specific variables such as ECF_PORT and ECF_HOST were generated by the Python API and will not typically correspond to an existing ecFlow server. Even if a server were running on the specified host and port, the job would be rejected, because the server uses the ECF_PASSWD variable to identify the specific task and the locally generated value would not match. When this happens, i.e. a job uses an incorrect ECF_PASSWD, the job is treated as a zombie and essentially ignored by the server.

To disable the calls to ecflow_client, and allow the job to be executed locally, export the environment variable NO_ECF=1. When NO_ECF is set, the ecflow_client executable returns immediately with a success value, and allows the job to proceed uninterrupted.

export NO_ECF=1
$HOME/course/test/t1.job0
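The same run can be driven from Python, which keeps NO_ECF confined to the child process instead of exporting it into the current shell. This is a minimal sketch using only the standard library; the helper name run_job_locally is hypothetical, and the job file is assumed to be executable, as in the shell example above.

```python
import os
import subprocess
from pathlib import Path


def run_job_locally(job: Path) -> subprocess.CompletedProcess:
    """Run a job file with NO_ECF=1 set only for the child process."""
    # Copy the current environment and add NO_ECF for this one run.
    env = {**os.environ, "NO_ECF": "1"}
    return subprocess.run(
        [str(job)],
        env=env,
        capture_output=True,
        text=True,
        check=False,  # report the exit code instead of raising
    )


if __name__ == "__main__":
    result = run_job_locally(Path.home() / "course" / "test" / "t1.job0")
    print("exit code:", result.returncode)
    print(result.stdout)
    print(result.stderr)
```

Since the variable is never exported, later commands in the same terminal session still talk to the server as usual.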

Warning

NO_ECF can be used in any job script, regardless of whether it was generated using the Python API or by the ecFlow server itself.

This makes NO_ECF useful for testing and debugging purposes, but it should never be used in a production environment.

What to do

  1. Run the job $HOME/course/test/t1.job0, disabling the calls to ecflow_client.