Validating jobs
Validating job creation
Before submitting a task, the server transforms the ecf script into a job file.
This process, known as job creation, is performed by the ecflow_server when the task is ready for submission, and includes the following steps:
Locating and loading the ecf script – see more about the location algorithm.
Perform pre-processing of %include directives.
Perform variable substitution.
Store the resulting script in the job file, with a .job extension.
The resulting job file is the script that the ecflow_server will actually submit for execution.
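The steps above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration and not ecFlow's implementation: it handles only quoted %include lines and plain %VAR% substitution, and ignores everything else the real pre-processor supports. The function names, file contents and variable values are hypothetical.

```python
import re
import tempfile
from pathlib import Path

INCLUDE = re.compile(r'^%include\s+"([^"]+)"\s*$')
VARIABLE = re.compile(r"%(\w+)%")


def expand_includes(script: Path) -> str:
    """Replace each %include "file" line with that file's content (recursively)."""
    out = []
    for line in script.read_text().splitlines():
        match = INCLUDE.match(line)
        if match:
            # Assumption: quoted paths are resolved relative to the including file.
            out.append(expand_includes(script.parent / match.group(1)))
        else:
            out.append(line)
    return "\n".join(out)


def substitute(text: str, variables: dict) -> str:
    """Replace every %VAR% with its value; an unknown name raises KeyError."""
    return VARIABLE.sub(lambda m: variables[m.group(1)], text)


def create_job(script: Path, variables: dict) -> Path:
    """Store the processed script next to the ecf script, with a .job extension."""
    job = script.with_suffix(".job")
    job.write_text(substitute(expand_includes(script), variables) + "\n")
    return job


# Hypothetical miniature of a suite directory, built in a scratch location.
root = Path(tempfile.mkdtemp())
(root / "test").mkdir()
(root / "head.h").write_text('echo "start of %TASK%"\n')
(root / "tail.h").write_text('echo "end"\n')
(root / "test" / "t1.ecf").write_text(
    '%include "../head.h"\necho "I live in %ECF_HOME%"\n%include "../tail.h"\n'
)

job = create_job(root / "test" / "t1.ecf", {"TASK": "t1", "ECF_HOME": "/tmp/course"})
print(job.read_text())
```

Includes are expanded before variables are substituted, so variables that appear inside an included file are replaced as well.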
Considering the $HOME/course/test/t1.ecf file, defined in the previous section,
the generation of the job file will include the following steps:
%include "../head.h" will be substituted by the content of the selected file.
%include "../tail.h" will be substituted by the content of the selected file.
All variable occurrences (i.e. any text of the form %<VAR>%) will be substituted by the value of the named variable. For example, %ECF_NAME% will be replaced by /test/t1, the full path of the task.
For practical purposes, it is often useful to check the job creation process even before loading the suite definition. This allows the early detection of potential problems, such as a missing ecf script or include file, references to unspecified variables, and other pre-processing errors.
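To make the kinds of problems concrete, here is a minimal sketch of such an early check, written in plain Python. It is an illustration only and not how ecFlow performs the check: the function name check_script, the set of defined variables and the sample script are all hypothetical, and included files are not scanned recursively.

```python
import re
import tempfile
from pathlib import Path

INCLUDE = re.compile(r'^%include\s+"([^"]+)"')
VARIABLE = re.compile(r"%(\w+)%")


def check_script(script: Path, defined: set) -> list:
    """Return human-readable problems found in one ecf script (empty if none)."""
    if not script.is_file():
        return [f"missing script: {script.name}"]
    problems = []
    for number, line in enumerate(script.read_text().splitlines(), start=1):
        include = INCLUDE.match(line)
        if include:
            # Assumption: quoted paths are resolved relative to the script.
            if not (script.parent / include.group(1)).is_file():
                problems.append(f"line {number}: missing include {include.group(1)}")
            continue
        for name in VARIABLE.findall(line):
            if name not in defined:
                problems.append(f"line {number}: undefined variable {name}")
    return problems


# Hypothetical script with one missing include and one undefined variable.
root = Path(tempfile.mkdtemp())
script = root / "t1.ecf"
script.write_text('%include "head.h"\necho "%ECF_HOME% %COLOUR%"\n')

problems = check_script(script, defined={"ECF_HOME"})
for problem in problems:
    print(problem)
```

For this sample the sketch reports the missing head.h on line 1 and the undefined COLOUR variable on line 2, without any server being involved.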
Using the ecFlow Python API it is possible to execute the job creation process locally.
Consider the following regarding the job creation process performed by the Python API:
The job creation is independent of the ecflow_server, so default values will be used for server specific variables such as ECF_PORT and ECF_HOST.
The resulting job files will use extension .job0, whereas the server will always generate jobs with extension .job<N> (where <N> corresponds to ECF_TRYNO which is never zero).
The job file is created in the same directory as the ecf script.

#!/usr/bin/env python3

import pathlib
from ecflow import Defs

if __name__ == '__main__':
    base = pathlib.Path.home() / "course"

    print("[1] Load suite definition from file 'test.def'")
    defs = Defs(str(base / "test.def"))
    print(defs)

    print("[2] Validating job creation: .ecf -> .job0")
    defs.check_job_creation(throw_on_error=True, verbose=True)

The script above loads the suite definition from the $HOME/course/test.def file and performs the check via the call to ecflow.Defs.check_job_creation. An all-in-one script could also create the suite definition programmatically, followed by the job creation check.
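A minimal sketch of such an all-in-one variant follows. It assumes that the suite is named test, contains the single task t1, and sets ECF_HOME to the course directory; adjust these to match the actual test.def. The function name build_and_check is hypothetical.

```python
import pathlib


def build_and_check(base: pathlib.Path) -> None:
    """Build the 'test' suite in memory, then run the job creation check."""
    # Imported here so the module can be loaded where ecflow is not installed.
    from ecflow import Defs

    defs = Defs()
    suite = defs.add_suite("test")
    # Assumption: ECF_HOME points at the course directory, as in test.def.
    suite.add_variable("ECF_HOME", str(base))
    suite.add_task("t1")
    print(defs)

    # Same check as in validate.py: raises if any job cannot be created.
    defs.check_job_creation(throw_on_error=True, verbose=True)
```

Calling build_and_check(pathlib.Path.home() / "course") would then take the place of loading test.def before running the check.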
What to do:
Create the $HOME/course/validate.py script as shown above, and execute it as follows:

cd $HOME/course
# Either run by explicitly invoking python
python3 ./validate.py
# Or make the script executable, and run it directly
chmod +x validate.py
./validate.py

Examine the job file $HOME/course/test/t1.job0; in particular, note the variable substitutions, including the default values used for server specific variables (e.g. ECF_PORT, ECF_HOST).
Validating job execution
The previous section demonstrated how a task script can be transformed into a job script.
Unfortunately, trying to run this job script locally will fail, because the ecflow_client
commands embedded in the script/job will not be able to communicate with the server.
In particular, the server specific variables such as ECF_PORT and ECF_HOST
were generated by the Python API and will not typically correspond to an existing ecFlow server.
Even if a server were running on the specified host and port, the job would be rejected because
the server uses the ECF_PASSWD variable to identify the specific task, and the value in the
locally generated job would not match. When this happens,
i.e. a job uses an incorrect ECF_PASSWD, the job is treated as a zombie and essentially ignored
by the server.
To disable the calls to ecflow_client, and allow the job to be executed locally,
export the environment variable NO_ECF=1. When NO_ECF is set, the ecflow_client
executable returns immediately with a success value, and allows the job to proceed uninterrupted.
export NO_ECF=1
$HOME/course/test/t1.job0
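The same can be done from Python, which keeps NO_ECF confined to the child process instead of exporting it in the current shell. The sketch below is an illustration under stated assumptions: the function name run_job_locally and its interpreter parameter are hypothetical, and the job used here is a stand-in that only echoes the variable, not the real t1.job0.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def run_job_locally(job: Path, interpreter: str = "/bin/sh") -> subprocess.CompletedProcess:
    """Run a job file with NO_ECF=1 added to a copy of the current environment."""
    env = dict(os.environ, NO_ECF="1")
    return subprocess.run(
        [interpreter, str(job)], env=env, capture_output=True, text=True
    )


# Hypothetical stand-in for t1.job0: it only reports the value of NO_ECF.
job = Path(tempfile.mkdtemp()) / "t1.job0"
job.write_text('echo "NO_ECF=$NO_ECF"\n')

result = run_job_locally(job)
print(result.returncode, result.stdout.strip())
```

For a real job file, pass the shell named in its first line as the interpreter. Because only the child's environment is modified, NO_ECF does not remain set in the calling session afterwards.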
Warning
NO_ECF can be used in any job script, regardless of whether it was generated using the Python API
or by the ecFlow server itself. This makes NO_ECF useful for testing and debugging,
but it should never be used in a production environment.
What to do
Run the job $HOME/course/test/t1.job0, disabling the calls to ecflow_client.