Autoarchive and autorestore
In the real world, suites can have several thousand tasks. These tasks are not required all the time. Having a server with an extremely large number of tasks can cause performance issues.
The server writes to the checkpoint file periodically. With an excessively large number of tasks, this disk I/O can interfere with job scheduling.
Clients such as the GUI (ecflow_ui) are also adversely affected, through higher memory requirements and a slower interactive experience.
Network traffic is also heavily affected.
This is where autoarchive becomes useful.
autoarchive +01:00 # archive one hour after completion
autoarchive 01:00  # archive at 1 am after completion
autoarchive 10     # archive 10 days after completion
autoarchive 0      # archive immediately after completion (can take up to a minute)
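To make the four forms above concrete, the sketch below computes when a node would nominally become eligible for archiving, given its completion time. `archive_due` is a hypothetical helper, not part of ecFlow. It assumes that an absolute time such as `01:00` means the next occurrence of that clock time after completion, and it ignores the server's one-minute resolution.

```python
from datetime import datetime, timedelta


def archive_due(spec: str, completed: datetime) -> datetime:
    """Nominal archive time for an autoarchive spec (illustrative sketch only)."""
    spec = spec.strip()
    if spec.startswith("+"):
        # Relative form, e.g. +01:00: an offset from the completion time
        hours, minutes = map(int, spec[1:].split(":"))
        return completed + timedelta(hours=hours, minutes=minutes)
    if ":" in spec:
        # Absolute form, e.g. 01:00: assumed to mean the next such clock time
        hours, minutes = map(int, spec.split(":"))
        due = completed.replace(hour=hours, minute=minutes, second=0, microsecond=0)
        if due < completed:
            due += timedelta(days=1)
        return due
    # Plain integer: number of days after completion (0 means immediately)
    return completed + timedelta(days=int(spec))


done = datetime(2024, 3, 5, 14, 30)
for spec in ("+01:00", "01:00", "10", "0"):
    print("autoarchive", spec, "->", archive_due(spec, done))
```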
Autoarchive will write a portion of the definition to disk.
Archives suite or family nodes only if they have child nodes (otherwise it does nothing).
Saves the suite/family nodes to disk, and then removes the in-memory child nodes from the definition.
This reduces the time taken to checkpoint and reduces network bandwidth.
If an archived node is re-queued or begun, its child nodes are automatically restored.
The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where every '/' in ECF_NAME is replaced by ':'.
Care must be taken if triggers reference the archived nodes.
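The naming rule for the archive file can be written as a small function, which is handy when checking whether a node has been archived to disk. `archive_path` is a hypothetical helper that only applies the rule stated above; the home directory, host, port and node path in the usage lines are placeholders.

```python
import os


def archive_path(ecf_home: str, host: str, port: int, ecf_name: str) -> str:
    """File an archived node is saved to: ECF_HOME/<host>.<port>.ECF_NAME.check."""
    # Every '/' in the node path (ECF_NAME) is replaced by ':'
    file_name = "{}.{}.{}.check".format(host, port, ecf_name.replace("/", ":"))
    return os.path.join(ecf_home, file_name)


path = archive_path("/home/user/course", "localhost", 3141, "/test/lf1")
print(path)  # on POSIX: /home/user/course/localhost.3141.:test:lf1.check
print("archived on disk:", os.path.exists(path))
```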
Use ecflow_client --archive to archive manually:
ecflow_client --archive=/s1           # archive suite /s1
ecflow_client --archive=/s1/f1 /s2    # archive family /s1/f1 and suite /s2
ecflow_client --archive=force /s1 /s2 # archive suites /s1 and /s2 even if they have active tasks
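When scripting manual archiving, it helps to build the command as an argument list rather than a shell string. `archive_command` below is a hypothetical helper that reproduces the `ecflow_client --archive` forms shown above; it only builds the command and does not contact a server.

```python
import shlex
from typing import List, Sequence


def archive_command(paths: Sequence[str], force: bool = False) -> List[str]:
    """Build an ecflow_client --archive command for one or more node paths."""
    if not paths:
        raise ValueError("at least one node path is required")
    if force:
        # Mirrors: ecflow_client --archive=force /s1 /s2
        return ["ecflow_client", "--archive=force", *paths]
    first, *rest = paths
    # Mirrors: ecflow_client --archive=/s1/f1 /s2
    return ["ecflow_client", "--archive=" + first, *rest]


print(shlex.join(archive_command(["/s1/f1", "/s2"])))  # ecflow_client --archive=/s1/f1 /s2
print(shlex.join(archive_command(["/s1", "/s2"], force=True)))  # ecflow_client --archive=force /s1 /s2
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`; `ecflow_client` must be on `PATH` and pointed at your server, for example through the `ECF_HOST` and `ECF_PORT` environment variables.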
Restoring can also be done automatically with autorestore, but it is only applied when the node carrying the autorestore attribute completes.
To restore archived nodes manually, use:
ecflow_client --restore=/s1/f1  # restore family /s1/f1
ecflow_client --restore=/s1 /s2 # restore suites /s1 and /s2
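Restoring follows the same pattern. This sketch (again a hypothetical helper, not part of ecFlow) builds the `--restore` command shown above and only executes it through `subprocess.run` when `dry_run` is `False`, so it can be tried safely without a running server.

```python
import subprocess
from typing import List, Sequence


def restore_command(paths: Sequence[str]) -> List[str]:
    """Build an ecflow_client --restore command for one or more node paths."""
    if not paths:
        raise ValueError("at least one node path is required")
    first, *rest = paths
    # Mirrors: ecflow_client --restore=/s1 /s2
    return ["ecflow_client", "--restore=" + first, *rest]


def restore(paths: Sequence[str], dry_run: bool = True) -> List[str]:
    """Return the restore command and, unless dry_run is set, execute it."""
    cmd = restore_command(paths)
    if not dry_run:
        # Raises CalledProcessError if ecflow_client exits with an error
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(restore(["/s1/f1"]))  # dry run: prints the command without executing it
```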
Let us modify the suite definition file again. To avoid waiting, this exercise archives immediately.
Text
# Definition of the suite test.
suite test
  edit ECF_INCLUDE "$HOME/course"
  edit ECF_HOME "$HOME/course"
  edit SLEEP 20
  family lf1
    autoarchive 0
    task t1 ; task t2 ; task t3 ; task t4 ; task t5 ; task t6 ; task t7 ; task t8 ; task t9
  endfamily
  family lf2
    autoarchive 0
    task t1 ; task t2 ; task t3 ; task t4 ; task t5 ; task t6 ; task t7 ; task t8 ; task t9
  endfamily
  family lf3
    autoarchive 0
    task t1 ; task t2 ; task t3 ; task t4 ; task t5 ; task t6 ; task t7 ; task t8 ; task t9
  endfamily
  family restore
    trigger ./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived
    task t1
      autorestore ../lf1 ../lf2 ../lf3 # restore when t1 completes
  endfamily
endsuite
Python
import os

from ecflow import Autoarchive, Autorestore, Defs, Edit, Family, Suite, Task, Trigger


def create_family(name):
    # Family of nine tasks, archived immediately after it completes
    return Family(name, Autoarchive(0), [Task("t{}".format(i)) for i in range(1, 10)])


def create_family_restore():
    # Runs once lf1, lf2 and lf3 have all been archived;
    # when t1 completes, the archived families are restored
    return Family(
        "restore",
        Trigger("./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived"),
        Task("t1", Autorestore(["../lf1", "../lf2", "../lf3"])),
    )


print("Creating suite definition")
home = os.path.join(os.getenv("HOME"), "course")
defs = Defs(
    Suite(
        "test",
        Edit(ECF_INCLUDE=home, ECF_HOME=home, SLEEP=20),
        create_family("lf1"),
        create_family("lf2"),
        create_family("lf3"),
        create_family_restore(),
    )
)
print(defs)

print("Checking job creation: .ecf -> .job0")
print(defs.check_job_creation())

print("Checking trigger expressions and inlimits")
assert len(defs.check()) == 0, defs.check()

print("Saving definition to file 'test.def'")
defs.save_as_defs("test.def")
What to do
Make the changes, i.e. copy the task scripts: cp -r f5 lf1; cp -r f5 lf2; cp -r f5 lf3
Replace the suite definition.
Run the suite. In ecflow_ui you should see the nodes being archived and then restored.
Experiment with archive and restore in ecflow_ui.
Experiment with archive and restore from the command line.
Note
Autoarchive(0) can take up to one minute to take effect, because the server has a 1-minute resolution.