Running (remote) jobs

To start a job, the ecflow_server uses the content of the ECF_JOB_CMD variable. By modifying this variable, it is possible to control where and how a job file will run. The command should be used in conjunction with the variables ECF_JOB and ECF_JOBOUT: ECF_JOB specifies the path of the job file, and ECF_JOBOUT defines the path of the file where the job output is stored.

Listing 63 A typical ECF_JOB_CMD, spawns the job on the local machine in the background
ECF_JOB_CMD = %ECF_JOB% 1> %ECF_JOBOUT% 2>&1 &
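
To make the substitution concrete, the sketch below imitates how the %ECF_JOB% and %ECF_JOBOUT% placeholders are replaced before the command is run. It is only an illustration, not the server's implementation: the two paths are hypothetical values, and the expand_job_cmd helper is an assumed name introduced here.

```shell
#!/bin/sh
# Hypothetical values chosen for illustration; the server derives the real ones.
ECF_JOB="/home/user/course/test/f5/t1.job1"
ECF_JOBOUT="/home/user/course/test/f5/t1.1"

# The command template, as it would appear in the suite definition.
ECF_JOB_CMD='%ECF_JOB% 1> %ECF_JOBOUT% 2>&1 &'

# expand_job_cmd TEMPLATE: replace the two placeholders with their values.
# (Assumed helper name; "|" is the sed delimiter because the paths contain "/".)
expand_job_cmd() {
    printf '%s\n' "$1" |
        sed -e "s|%ECF_JOB%|$ECF_JOB|g" -e "s|%ECF_JOBOUT%|$ECF_JOBOUT|g"
}

expand_job_cmd "$ECF_JOB_CMD"
# prints: /home/user/course/test/f5/t1.job1 1> /home/user/course/test/f5/t1.1 2>&1 &
```

The expanded line is what a shell would execute: the job file is run in the background, with both standard output and standard error redirected to the job output file.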

Updating ECF_JOB_CMD makes it possible to run tasks on a remote machine, taking advantage of the Unix command ssh.

In the following examples, consider a variable HOST that defines the name of the remote host, and assume that all the files are visible on all the hosts (e.g. using NFS). Replace the string <REMOTE-HOSTNAME> with a host name of your choice.

Note

The environment of a task running on a remote host may differ from that of a task running locally, depending on how the local and remote systems are set up.

On the remote system, it is likely that the environment variable PATH needs to be adjusted so that the task commands can be found.

Consider adding the following line to the head.h file, before calling ecflow_client --init ....

export PATH=$PATH:/usr/local/apps/ecflow/%ECF_VERSION%/bin
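
Whether the adjusted PATH is sufficient can be checked from within the job itself. The following is a minimal sketch, assuming a POSIX shell on the remote host; have_cmd is a hypothetical helper name introduced here, not part of ecFlow.

```shell
# have_cmd NAME: succeed if NAME resolves to a command on the current PATH.
# (Hypothetical helper; "command -v" is the POSIX way to look a command up.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report a missing client on stderr so the message ends up in the job output.
if ! have_cmd ecflow_client; then
    echo "ecflow_client not found in PATH=$PATH" >&2
fi
```

Placed after the export line, a check like this makes a wrong PATH visible in the job output instead of surfacing later as a failed client call.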

For the following setup, ensure that an SSH connection based on a private/public key pair is available to the remote machine.

Try to access the remote machine through ssh without a password. If a password is requested, consider adding your public key to the remote machine with the following commands:

Listing 64 no password for ssh connection
REMOTE_HOST=<REMOTE-HOSTNAME>  # change this to the remote host name
ssh $USER@$REMOTE_HOST mkdir -p \$HOME/.ssh
cat $HOME/.ssh/id_rsa.pub || ssh-keygen -t rsa -b 2048
cat $HOME/.ssh/id_rsa.pub | ssh $USER@$REMOTE_HOST 'cat >> $HOME/.ssh/authorized_keys'
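
Since the server launches jobs non-interactively, it helps to confirm that the key-based login works without any prompt. The sketch below is one way to do that, using the OpenSSH BatchMode option, which makes ssh fail rather than ask for a password. The function name can_login_without_password and the SSH_BIN override are assumptions introduced for this example.

```shell
# SSH_BIN is a hypothetical hook that allows a dry run; it defaults to ssh.
SSH_BIN=${SSH_BIN:-ssh}

# can_login_without_password HOST: succeed only if a key-based login works.
# BatchMode=yes disables password prompts; ConnectTimeout bounds the wait.
can_login_without_password() {
    "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$USER@$1" true
}

# Run the check only when REMOTE_HOST has been set.
if [ -n "${REMOTE_HOST:-}" ]; then
    if can_login_without_password "$REMOTE_HOST"; then
        echo "passwordless ssh to $REMOTE_HOST works"
    else
        echo "passwordless ssh to $REMOTE_HOST failed" >&2
    fi
fi
```

If the check fails, the job submission through ECF_JOB_CMD would most likely fail in the same way, so it is worth resolving before loading the suite.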

Suite Definition

Modify the suite definition file as follows:

# Definition of the suite test
suite test
 edit ECF_INCLUDE "$HOME/course"
 edit ECF_HOME    "$HOME/course"
 limit l1 2

 family f5
     edit HOST <REMOTE-HOSTNAME>
     edit ECF_OUT /tmp/$USER
     edit ECF_JOB_CMD "ssh %HOST% 'mkdir -p %ECF_OUT%/%SUITE%/%FAMILY% && %ECF_JOB% > %ECF_JOBOUT% 2>&1 &'"
     inlimit l1
     edit SLEEP 20
     task t1
     task t2
     task t3
     task t4
     task t5
     task t6
     task t7
     task t8
     task t9
 endfamily
endsuite

When using csh as the login shell, define ECF_JOB_CMD as:

edit ECF_JOB_CMD "ssh %HOST% 'mkdir -p %ECF_OUT%/%SUITE%/%FAMILY%; %ECF_JOB% >& %ECF_JOBOUT%'"

Logserver

The job output generated on the remote machine can be inspected by using a log server. This assumes that the variables ECF_LOGHOST and ECF_LOGPORT are defined in the suite definition.

Launch the log server on the remote machine:

ssh $USER@<REMOTE-HOSTNAME> /path/to/ecflow/%ECF_VERSION%/bin/ecflow_logserver.sh -d /tmp/$USER -m /tmp/$USER:/tmp/$USER

What to do

  1. Adjust the PATH environment variable in head.h.

  2. Apply the changes to the suite definition.

  3. In ecflow_ui, execute the suite.

  4. In case of errors, inspect the ecflow server log file (i.e. host.port.ecf.log) to determine the cause of the error.

  5. Add uname -n to the task script to determine on which machine the task is running.

  6. Launch the log server, and access the remote job output using ecflow_ui.
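
For step 5, a minimal sketch of the line that could be added to the body of the task script is shown below. The wording of the message is an assumption; only uname -n comes from the exercise.

```shell
# Print the network node name of the machine executing this job.
echo "Task running on host: $(uname -n)"
```

The host name then appears in the job output, where it can be compared with the value given to HOST in the suite definition.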