Autoarchive and autorestore

Real-world suites can contain several thousand tasks, and not all of them are needed all the time. A server holding an extremely large number of tasks can suffer performance issues:

  • The server writes to the checkpoint file periodically. This disk i/o can interfere with job scheduling when dealing with an excessively large number of tasks.

  • Clients such as the GUI (ecflow_ui) are also adversely affected: memory requirements grow and the interactive experience slows down

  • Network traffic is heavily affected

This is where autoarchive becomes useful.

Listing 55 autoarchive example
autoarchive +01:00 # archive one hour after complete
autoarchive 01:00  # archive at 1 am, after complete
autoarchive 10     # archive 10 days after complete
autoarchive 0      # archive immediately after complete, can take up to a minute
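
The four forms above can be told apart mechanically. The following is a minimal sketch in plain Python, not ecFlow's own parser: the function name describe_autoarchive is hypothetical, and the interpretation simply follows the comments in the listing.

```python
def describe_autoarchive(spec):
    """Explain an autoarchive argument, following the comments in the listing.

    Illustration only: this is not how ecFlow itself parses the attribute.
    """
    spec = spec.strip()
    if spec.startswith("+"):
        # Relative time: hh:mm counted from the moment the node completes.
        hours, minutes = (int(part) for part in spec[1:].split(":"))
        return "archive {}h {}m after complete".format(hours, minutes)
    if ":" in spec:
        # Absolute time: the clock time hh:mm, once the node is complete.
        hours, minutes = (int(part) for part in spec.split(":"))
        return "archive at {:02d}:{:02d} after complete".format(hours, minutes)
    days = int(spec)
    if days == 0:
        return "archive immediately after complete"
    return "archive {} days after complete".format(days)


for spec in ("+01:00", "01:00", "10", "0"):
    print("autoarchive {:<7} -> {}".format(spec, describe_autoarchive(spec)))
```

A leading "+" marks a relative time, a bare hh:mm marks a clock time, and a plain integer is a number of days.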

Autoarchive will write a portion of the definition to disk.

  • Archives suite or family nodes if they have child nodes (otherwise does nothing).

  • Saves the suite/family nodes to disk, and then removes the in-memory child nodes from the definition.

  • It reduces the time taken to checkpoint and reduces network bandwidth

  • If an archived node is re-queued or begun, the child nodes are automatically restored

  • The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where ‘/’ has been replaced with ‘:’ in ECF_NAME

  • Care must be taken if you have trigger references to the archived nodes
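
To make the archive file naming concrete, here is a small sketch that builds the file name from the pattern given above. It assumes that ECF_NAME is the full path of the archived node (for example /test/lf1). The function name archive_file_path and the example values are hypothetical.

```python
def archive_file_path(ecf_home, host, port, node_path):
    """Build the archive file name following the pattern described above."""
    # Assumption: ECF_NAME is the node path; every '/' becomes ':' in it.
    ecf_name = node_path.replace("/", ":")
    return "{}/{}.{}.{}.check".format(ecf_home, host, port, ecf_name)


# Example values are placeholders, not taken from a real server.
print(archive_file_path("/home/user/course", "localhost", 3141, "/test/lf1"))
```

Looking for a file of this shape under ECF_HOME is a quick way to confirm that a family really has been archived.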

Use ecflow_client --archive to archive manually:

  • ecflow_client --archive=/s1 # archive suite s1

  • ecflow_client --archive=/s1/f1 /s2 # archive family /s1/f1 and suite /s2

  • ecflow_client --archive=force /s1 /s2 # archive suites /s1,/s2 even if they have active tasks

Restoring can also be done automatically with autorestore, but it is only applied when the node holding the autorestore completes.

To restore archived nodes manually use:

  • ecflow_client --restore=/s1/f1 # restore family /s1/f1

  • ecflow_client --restore=/s1 /s2 # restore suites /s1 and /s2
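
When the manual archive and restore commands need to be issued from a script, the command lines can be assembled programmatically. This is a minimal sketch that only builds the argument list in the same shape as the examples above. The helper name client_args is hypothetical, and nothing is sent to a server.

```python
def client_args(action, paths, force=False):
    """Return the argument list for a manual archive or restore request."""
    if action not in ("archive", "restore"):
        raise ValueError("action must be 'archive' or 'restore'")
    if not paths:
        raise ValueError("at least one node path is required")
    if force and action != "archive":
        # The examples above only show 'force' together with --archive.
        raise ValueError("force is only used with 'archive' here")
    values = (["force"] if force else []) + list(paths)
    # The first value is attached to the option, the rest follow as arguments.
    return ["ecflow_client", "--{}={}".format(action, values[0])] + values[1:]


print(" ".join(client_args("archive", ["/s1/f1", "/s2"])))
print(" ".join(client_args("archive", ["/s1", "/s2"], force=True)))
print(" ".join(client_args("restore", ["/s1", "/s2"])))
```

On a machine where ecflow_client is installed and the server is reachable, the returned list could be handed to subprocess.run(args, check=True).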

Text

Let us modify the suite definition file again. To avoid waiting, this exercise archives immediately.

# Definition of the suite test.
suite test
 edit ECF_INCLUDE "$HOME/course"
 edit ECF_HOME    "$HOME/course"
 edit SLEEP 20
 family lf1
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf2
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf3
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family restore
    trigger ./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived
    task t1
       autorestore ../lf1 ../lf2 ../lf3   # restore when t1 completes
 endfamily
endsuite
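
The trigger on family restore uses ./lf1 while the autorestore on task t1 uses ../lf1, yet both refer to the same family. The sketch below shows why, under the assumption that relative references behave like file system paths evaluated from the parent of the node that holds the attribute. The helper name resolve is hypothetical and this is not ecFlow's own path resolution.

```python
import posixpath


def resolve(node_path, reference):
    """Resolve a relative reference against the parent of node_path.

    Assumption: relative references behave like file system paths
    evaluated from the parent of the node that holds the attribute.
    """
    parent = posixpath.dirname(node_path)
    return posixpath.normpath(posixpath.join(parent, reference))


# The trigger sits on family /test/restore, so './lf1' is a sibling family.
print(resolve("/test/restore", "./lf1"))
# The autorestore sits on task /test/restore/t1, one level deeper.
print(resolve("/test/restore/t1", "../lf1"))
```

Under this assumption both calls give the same absolute path, /test/lf1.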

Python

Listing 56 $HOME/course/test.py
import os
from ecflow import (
    Defs,
    Suite,
    Family,
    Task,
    Edit,
    Trigger,
    Complete,
    Event,
    Meter,
    Time,
    Day,
    Date,
    Label,
    RepeatString,
    RepeatInteger,
    RepeatDate,
    InLimit,
    Limit,
    Autoarchive,
    Autorestore,
)


def create_family(name):
    return Family(name, Autoarchive(0), [Task("t{}".format(i)) for i in range(1, 10)])


def create_family_restore():
    return Family(
        "restore",
        Trigger("./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived"),
        Task("t1", Autorestore(["../lf1", "../lf2", "../lf3"])),
    )


print("Creating suite definition")
home = os.path.join(os.getenv("HOME"), "course")
defs = Defs(
    Suite(
        "test",
        Edit(ECF_INCLUDE=home, ECF_HOME=home, SLEEP=20),
        create_family("lf1"),
        create_family("lf2"),
        create_family("lf3"),
        create_family_restore(),
    )
)
print(defs)

print("Checking job creation: .ecf -> .job0")
print(defs.check_job_creation())

print("Checking trigger expressions and inlimits")
assert len(defs.check()) == 0, defs.check()

print("Saving definition to file 'test.def'")
defs.save_as_defs("test.def")

What to do

  1. Make the changes, i.e. copy the existing f5 directory once for each new family: cp -r f5 lf1; cp -r f5 lf2; cp -r f5 lf3

  2. Replace the suite definition.

  3. Run the suite. You should see nodes getting archived, and then restored, in ecflow_ui.

  4. Experiment with archive and restore in ecflow_ui.

  5. Experiment with archive and restore from the command line.

Note

Autoarchive(0) can take up to one minute to take effect, because the server has a 1-minute resolution.