Friday, April 15, 2016

HTCondor 8.4.5 causes problems with partitionable slots

HTCondor 8.4.5 (released in OSG 3.3.11 on Tuesday, April 12) contains a bug that may affect some sites significantly. If your site uses partitionable slots on execute nodes, OSG recommends that you avoid HTCondor 8.4.5, either by skipping the update or reverting to a previous version (e.g., HTCondor 8.4.4).

(Technical details: Jobs that land on partitionable slots fail to start about 10% of the time. A failed job correctly returns to the queue in the Idle state, and HTCondor keeps trying to match and run it, so the job should complete eventually. However, the repeated matching and start attempts add inefficiency to the overall system.)
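To get a rough sense of that overhead, here is a minimal sketch. It assumes each start attempt fails independently with probability 0.10; independence is an assumption made here, since only the approximate failure rate has been reported. The function names are hypothetical and are not part of HTCondor.

```python
def expected_attempts(p_fail: float) -> float:
    """Mean number of start attempts until a job first starts.

    Assumes independent attempts, so the count follows a geometric
    distribution with success probability (1 - p_fail).
    """
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    return 1.0 / (1.0 - p_fail)


def prob_more_than(k: int, p_fail: float) -> float:
    """Probability that a job needs more than k start attempts."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return p_fail ** k


P_FAIL = 0.10  # approximate failure rate quoted above

if __name__ == "__main__":
    print(f"expected attempts per job: {expected_attempts(P_FAIL):.2f}")
    print(f"chance of needing more than 3 attempts: {prob_more_than(3, P_FAIL):.3f}")
```

Under these assumptions a job needs about 1.11 start attempts on average, which is roughly 11% more start attempts than with no failures, and very few jobs would need more than a handful of tries.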

The HTCondor development and OSG Software teams discovered the bug after the OSG release, and plan to release a patched version soon.