Thursday, July 15, 2010

Gratia Upgrade status update 15:40 UTC

At 10:40 CDT (11:40 EDT, 15:40 UTC) on 2010/07/15, the status of the Gratia upgrades was as follows:

The OSG-PROD service completed its table upgrade at 18:24 CDT 2010/07/14 (19:24 EDT, 23:24 UTC) and started receiving data shortly thereafter.
At 19:08 CDT (20:08 EDT, 00:08 UTC on 2010/07/15) the reporting DB IP was switched over to the collector DB, meaning that reporting "snapped forward" in time and continues to catch up as data come in from remote probes. As of this time, the downtime for all services except OSG-TRANSFER is considered complete.

At 21:51 CDT (22:51 EDT, 02:51 UTC on 2010/07/15) the OSG-TRANSFER table upgrade was completed. The service had to be restarted due to a possible timed-out connection, so there is a small possibility that some older probes "froze" and may have to be restarted. As of this time, however, the downtime for OSG-TRANSFER is considered complete.

At 10:13 CDT on 2010/07/15 (11:13 EDT, 15:13 UTC) the reporting DB was observed to have caught up to the collector DB and the IP was switched back and backups enabled. Reporter URLs are now pulling their data from the reporting DB.
At this time, the Gratia upgrade is complete and no more disruptions are anticipated.

If anyone believes their probe has become stuck (this should be a rare occurrence), they should check for processes with "gratia" in the command string (ps auwwx | grep gratia) and kill any that were started yesterday. All probes except dCache-transfer will recover automatically; the dCache-transfer probe should be restarted using:

service gratia-dcache-transfer stop
service gratia-dcache-transfer start
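
For anyone who would rather not read start times off the ps listing by eye, the check for old gratia processes could be sketched roughly as below. This is only an illustration, not part of the Gratia tooling: the function name stale_gratia_pids and the 12-hour threshold are made up for the example, and the suggested ps invocation assumes a ps that supports the etimes (elapsed seconds) output field, which older versions may lack. It only prints candidate PIDs; review them before killing anything.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Gratia): print the PIDs of processes
# whose command line mentions "gratia" and that have been running for at
# least a given number of seconds.
# Input on stdin: one process per line, "PID ELAPSED_SECONDS COMMAND...".
stale_gratia_pids() {
    min_age="$1"
    awk -v min_age="$min_age" '
        {
            # Rebuild the command line from field 3 onwards.
            cmd = ""
            for (i = 3; i <= NF; i++) cmd = cmd " " $i
            # Keep only gratia processes that are old enough.
            if (cmd ~ /gratia/ && $2 + 0 >= min_age + 0) print $1
        }'
}

# Possible usage, assuming ps supports "etimes"; 43200 s (12 hours) is an
# arbitrary threshold chosen for illustration:
#   ps -eo pid=,etimes=,args= | stale_gratia_pids 43200
```

The age threshold also keeps the helper's own short-lived awk process, whose command line contains the word "gratia", out of the results.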