Running (remote) jobs
To start a job, the ecflow_server uses the content of the ECF_JOB_CMD variable.
By modifying this variable, it is possible to control where and how a job file will run.
The command should be used in conjunction with the variables ECF_JOB and ECF_JOBOUT.
ECF_JOB specifies the path of the job file, and ECF_JOBOUT defines the path of the file where the output of the job is stored.
ECF_JOB_CMD = %ECF_JOB% 1> %ECF_JOBOUT% 2>&1 &
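To make the substitution concrete, here is a minimal Python sketch of how a command template like the one above becomes a shell command. It is a simplified model, not ecFlow's implementation: the helper `expand` is hypothetical, it handles only plain %NAME% references, and the two paths are made-up example values.

```python
import re


def expand(template, variables):
    """Replace each %NAME% in template with variables[NAME] (simplified model)."""

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError("undefined variable: " + name)
        return str(variables[name])

    return re.sub(r"%(\w+)%", lookup, template)


# Hypothetical example values; in a real suite the server generates them.
variables = {
    "ECF_JOB": "/home/user/course/test/f5/t1.job1",
    "ECF_JOBOUT": "/home/user/course/test/f5/t1.1",
}

command = expand("%ECF_JOB% 1> %ECF_JOBOUT% 2>&1 &", variables)
print(command)
```

The resulting string is what gets handed to the shell: the job file is executed, standard output and standard error are both redirected to the output file, and the trailing & puts the job in the background.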
Updating ECF_JOB_CMD makes it possible to run tasks on a remote machine, taking advantage of a remote shell command such as ssh.
In the following examples, a variable HOST defines the name of the remote host, and all files are assumed to be visible on all hosts (e.g. using NFS).
Replace the string <REMOTE-HOSTNAME> with a host name of your choice.
Note
The environment of a task running on a remote host may differ from that of a task running locally, depending on how the local and remote systems are set up.
On the remote system, the PATH environment variable will likely need to be adjusted so that the task commands can be used.
Consider adding the following line to the head.h file, before calling ecflow_client --init ....
export PATH=$PATH:/usr/local/apps/ecflow/%ECF_VERSION%/bin
For the following setup, ensure that an SSH connection based on a private/public key pair is available to the remote machine.
Try to access the remote machine through ssh without a password. If a password is requested, consider adding the public key to the remote machine with the following commands:
REMOTE_HOST=<REMOTE-HOSTNAME> # change this to the remote host name
ssh $USER@$REMOTE_HOST mkdir -p \$HOME/.ssh
cat $HOME/.ssh/id_rsa.pub || ssh-keygen -t rsa -b 2048
cat $HOME/.ssh/id_rsa.pub | ssh $USER@$REMOTE_HOST 'cat >> $HOME/.ssh/authorized_keys'
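Once the key is installed, the passwordless login can be verified from a script. The following is a minimal sketch, not part of ecFlow: both helper names are hypothetical, and it assumes an OpenSSH client is available as `ssh`. It relies on the OpenSSH options BatchMode (no password prompts) and ConnectTimeout.

```python
import subprocess


def ssh_check_command(host, user):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes disables interactive password prompts, so the command
    # only succeeds when key-based authentication works.
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=5",
        "{}@{}".format(user, host),
        "true",
    ]


def passwordless_ssh_works(host, user):
    """Return True if running 'true' on the remote host succeeds without a password."""
    result = subprocess.run(
        ssh_check_command(host, user),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling `passwordless_ssh_works("<REMOTE-HOSTNAME>", os.getenv("USER"))` (after `import os`) should return True before the suite is run; if it returns False, jobs submitted through ssh are likely to fail as well.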
Suite Definition
Modify the suite definition file as follows:
# Definition of the suite test
suite test
  edit ECF_INCLUDE "$HOME/course"
  edit ECF_HOME "$HOME/course"
  limit l1 2
  family f5
    edit HOST <REMOTE-HOSTNAME>
    edit ECF_OUT /tmp/$USER
    edit ECF_JOB_CMD "ssh %HOST% 'mkdir -p %ECF_OUT%/%SUITE%/%FAMILY% && %ECF_JOB% > %ECF_JOBOUT% 2>&1 &'"
    inlimit l1
    edit SLEEP 20
    task t1
    task t2
    task t3
    task t4
    task t5
    task t6
    task t7
    task t8
    task t9
  endfamily
endsuite
When using csh as login shell, define ECF_JOB_CMD as:
edit ECF_JOB_CMD "ssh %HOST% 'mkdir -p %ECF_OUT%/%SUITE%/%FAMILY%; %ECF_JOB% >& %ECF_JOBOUT%'"
If the suite is defined in Python, modify the function create_family_f5() created earlier to add the variables HOST, ECF_OUT, ECF_LOGHOST, ECF_LOGPORT, and ECF_JOB_CMD:
import os
from ecflow import (
    Defs,
    Suite,
    Family,
    Task,
    Edit,
    Trigger,
    Complete,
    Event,
    Meter,
    Time,
    Day,
    Date,
    Label,
    RepeatString,
    RepeatInteger,
    RepeatDate,
    InLimit,
    Limit,
)


def create_family_f5():
    return Family(
        "f5",
        InLimit("l1"),
        Edit(
            SLEEP=20,
            HOST="?????",
            ECF_OUT="/tmp/%s" % os.getenv("USER"),
            ECF_LOGHOST="%HOST%",
            ECF_LOGPORT="?????",  # port=$((35000 + $(id -u))) run this on the command line
            ECF_JOB_CMD="ssh %HOST% 'mkdir -p %ECF_OUT%/%SUITE%/%FAMILY%; %ECF_JOB% > %ECF_JOBOUT% 2>&1 &'",
        ),
        [Task("t{}".format(i)) for i in range(1, 10)],
    )


print("Creating suite definition")
home = os.path.join(os.getenv("HOME"), "course")
defs = Defs(
    Suite(
        "test",
        Edit(ECF_INCLUDE=home, ECF_HOME=home),
        Limit("l1", 2),
        create_family_f5(),
    )
)
print(defs)

print("Checking job creation: .ecf -> .job0")
print(defs.check_job_creation())

print("Checking trigger expressions")
assert len(defs.check()) == 0, defs.check()

print("Saving definition to file 'test.def'")
defs.save_as_defs("test.def")
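The comment next to ECF_LOGPORT above gives the port as a shell expression, 35000 plus the numeric user ID. A small Python equivalent, shown here as a sketch with a hypothetical helper name, can be used to fill in the placeholder on a Unix system:

```python
import os


def default_log_port(uid=None):
    """Return 35000 + uid, mirroring port=$((35000 + $(id -u)))."""
    if uid is None:
        uid = os.getuid()  # numeric user ID, the same value `id -u` prints
    return 35000 + uid


print(default_log_port())
```

For large user IDs the sum can exceed 65535, the highest valid TCP port number; in that case a different free port has to be chosen by hand.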
Logserver
The job output generated on the remote machine can be inspected by using a log server.
This assumes that the variables ECF_LOGHOST and ECF_LOGPORT are present in the suite definition.
Launch the log server on the remote machine:
ssh $USER@<REMOTE-HOSTNAME> /path/to/ecflow/%ECF_VERSION%/bin/ecflow_logserver.sh -d /tmp/$USER -m /tmp/$USER:/tmp/$USER
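After launching the log server, it can be useful to confirm that something is listening on the chosen port before turning to the ecflow_ui. The sketch below is a generic TCP reachability check using only the Python standard library. The function name is hypothetical, and a successful connection only shows that the port is open, not that the listener is actually the log server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("<REMOTE-HOSTNAME>", 36000)` would test the port used for ECF_LOGPORT, where 36000 stands in for whatever value was set in the suite definition.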
What to do
Adjust the PATH environment variable in head.h.
Apply the changes to the suite definition.
In the ecflow_ui, execute the suite.
In case of errors, inspect the ecflow server log file (i.e. host.port.ecf.log) and determine the cause of the error.
Add uname -n to the task script to determine on which machine the task is running.
Launch the log server, and access the remote job output using the ecflow_ui.