Wednesday, July 21, 2010

VDT ticket system downtime, Thursday July 22nd, 5:30pm Central Time

Hi everyone,

There will be a brief outage of the VDT support system on Thursday, July 22nd beginning at 5:30pm (Central US Timezone). It should last about 30 minutes.

During this time:

* email to vdt-support@opensciencegrid.org will be delayed (but should not be lost)
* web access to our tickets on crt.cs.wisc.edu will be unavailable

We apologize for any inconvenience.

Tuesday, July 20, 2010

GOC Service Update - Tuesday, July 27th at 14:00 UTC

The GOC will upgrade the following services beginning on Tuesday, July 27th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.
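
For sites planning around the reserved window, the UTC times can be converted to US local time. The following is a minimal sketch, not an official GOC tool: it assumes fixed daylight-saving offsets for July 2010 (CDT = UTC-5, EDT = UTC-4) instead of a time zone database, and the helper name `window_in_zone` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets, valid while daylight saving time is in effect.
CDT = timezone(timedelta(hours=-5), "CDT")
EDT = timezone(timedelta(hours=-4), "EDT")

# Reserved maintenance window from the announcement: 14:00 - 18:00 UTC.
WINDOW_START = datetime(2010, 7, 27, 14, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2010, 7, 27, 18, 0, tzinfo=timezone.utc)


def window_in_zone(tz):
    """Return the window as an 'HH:MM - HH:MM NAME' string in the given zone."""
    start = WINDOW_START.astimezone(tz)
    end = WINDOW_END.astimezone(tz)
    return f"{start:%H:%M} - {end:%H:%M} {start.tzname()}"


if __name__ == "__main__":
    for tz in (timezone.utc, CDT, EDT):
        print(window_in_zone(tz))
```

With these offsets the window falls at 09:00 - 13:00 CDT and 10:00 - 14:00 EDT.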

Host relocation at Indianapolis site

GOC staff will be relocating the servers at the Indianapolis site to another rack and installing additional equipment. The following services will be down during the relocation:
OIM
OSG TWiki

The following services will be redirected to Bloomington and will remain up, although there may be a temporary decrease in performance during the maintenance:
CEMon/BDII
GOC backend database cluster
GOC RSV client (monitors various GOC services)
GOC Ticket
MyOSG
MyOSG consolidater for RSV data
Software Cache/CA Certificate Repository

OIM 2.22 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:
Added link to OSG homepage on OSG header logo.
Fixed a broken link on the home page.


MyOSG 1.23 (https://myosg.grid.iu.edu)

ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

RSV Status Map / Changed the tooltip displayed for each status icon to site names
RSV Status Map / Fixed typo on RSV Status Map legend [MYOSG-72]
RSV Status Map / Adjusted z-index so that the site selector won't get overlapped by the sub menu
RSV Status Map / Made the warning icon display below critical
Resource Group / Status History / Fixed the lazy-loading of the graphs. Tested on FF and IE8 [MYOSG-57]
Resource Group / GIP Validation Status / Updated controller & views reflecting upgrade to GIP Validator.
Misc / Status Overview / Fixed a bug on resource group status calculation logic.
Misc / Status Overview / Various bug fixes and updated CA distribution and other services to use newly created OIM Resource Groups instead of showing "UNKNOWN" [MYOSG-60]
Updated application error page to show exception summary
Misc / Added a few checks for missing parameters (for more user-friendly error messages)
Added a link to OSG homepage from OSG logo
Updated application banner to be more conspicuous.
Home Page / Updated full_header so that images under menu will be loaded after everything else is loaded on the page


GOC Ticket 1.23 (https://ticket.grid.iu.edu)

ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

Added capability to show an application banner like MyOSG
Added Gratia form. Updated assignment rule for ReSS & BDII & GOC Services forms based on Arvind's departure.
Updated menu page so that it can display a different set of menus based on application ID (for /goc, /itb, /storage) [GOCTICKET-63]
Implemented ticket viewer auto-refresh feature [GOCTICKET-51]
Added a checkbox that lets the user close the window once a ticket update has been submitted successfully (default: off) [GOCTICKET-52]
Added link to OSG homepage from OSG logo
Various minor bug fixes & cosmetic changes


OSG Display 1.0.6 (http://display.grid.iu.edu)

ITB version is now available for testing at http://display-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

Added link to a PDF file titled "Gratia Buffering Points" [DISPLAY-4]
Added various new tabs and controller for Graph periods.
Updated side content with new set of data
Added Google Analytics tracker


GOC Ticket Synchronizer 1.9

Release Notes:
GGUS Accesser / Added the "Diary of Steps" field, which is used when a user updates a ticket via email, to the list of fields used to construct ticket history.

Thursday, July 15, 2010

GOC Services Restored

Today, shortly before 3pm EDT, the GOC experienced an intermittent outage that affected our services. That outage has been resolved.

This intermittent outage caused connectivity issues for services including, but not limited to, the following:

* GOCTicket
* MyOSG
* OIM
* VOMS
* TWiki

System administrators restored service at approximately 6:45pm EDT and are currently monitoring services to ensure that they are fully functional.

Let us know if you see any outstanding issues with these services after the outage.

GOC Service Outage - July 15, 2010

Today, shortly before 3pm EDT, the GOC began experiencing an intermittent service outage that is affecting our services.

This intermittent outage is causing connectivity issues for services including, but not limited to, the following:

* GOCTicket
* MyOSG
* OIM
* VOMS
* TWiki

No time estimate is available, but system engineers are currently investigating and we will provide another update when service is restored.

Gratia Upgrade status update 15:40 UTC

At 10:40 CDT (11:40 EDT, 15:40 UTC) on 2010/07/15, the status of the Gratia upgrades was as follows:

The OSG-PROD service completed its table upgrade at 18:24 CDT 2010/07/14 (19:24 EDT, 23:24 UTC) and started receiving data shortly thereafter.
At 19:08 CDT (20:08 EDT, 00:08 UTC on 2010/07/15) the reporting DB IP was switched over to the collector DB, meaning that the reporting "snapped forward" in time and continues to catch up as data come in from remote probes. As of this time, the downtime for all services except OSG-TRANSFER is considered complete.

At 21:51 CDT (22:51 EDT, 02:51 UTC on 2010/07/15) the OSG-TRANSFER table upgrade was completed. The service had to be restarted due to a possible timed-out connection, meaning that there is a small possibility that some older probes "froze" and may have to be restarted. As of this time, however, the downtime for OSG-TRANSFER is considered complete.

At 10:13 CDT on 2010/07/15 (11:13 EDT, 15:13 UTC) the reporting DB was observed to have caught up to the collector DB and the IP was switched back and backups enabled. Reporter URLs are now pulling their data from the reporting DB.
At this time, the Gratia upgrade is complete and no more disruptions are anticipated.

If anyone believes their probe has become stuck (this should be only a rare occurrence), they should check for processes with "gratia" in the command string (ps auwwx | grep gratia) and kill any that were started yesterday. All probes except dCache-transfer will recover automatically; the dCache-transfer probe should be restarted using:

service gratia-dcache-transfer stop
service gratia-dcache-transfer start
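
The check for stuck probes described above can be sketched in Python. This is only an illustration, not part of Gratia: it assumes a ps that accepts "-e" and the "lstart" output field (as procps does), the function names are hypothetical, and it only lists candidate processes rather than killing them.

```python
import subprocess
from datetime import datetime

# Month abbreviations as printed by ps in the C/English locale (assumption).
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def stale_gratia_processes(ps_output, cutoff):
    """Return (pid, command) for gratia processes started before cutoff.

    Each input line is expected to look like:
        1234 Wed Jul 14 09:00:00 2010 /path/to/command args
    """
    stale = []
    for line in ps_output.splitlines():
        parts = line.split(None, 6)
        if len(parts) < 7:
            continue  # blank or malformed line
        pid, _weekday, month, day, clock, year, command = parts
        hour, minute, second = (int(x) for x in clock.split(":"))
        started = datetime(int(year), MONTHS[month], int(day),
                           hour, minute, second)
        if "gratia" in command and started < cutoff:
            stale.append((int(pid), command.strip()))
    return stale


def current_ps_output():
    """Run ps with one -o option per field, each with an empty header."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "lstart=", "-o", "args="],
        capture_output=True, text=True, check=True)
    return result.stdout
```

To list processes started before today, one could call stale_gratia_processes(current_ps_output(), cutoff) with cutoff set to midnight of the current day, and then review the result by hand before killing anything.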

Wednesday, July 14, 2010

Gratia Upgrade status update 20:00 UTC

A little after 15:00 CDT (16:00 EDT, 20:00 UTC), the status of the Gratia upgrades was as follows:

1) The legacy redirector service was deactivated at 09:00 CDT (10:00 EDT, 14:00 UTC) as previously announced.

2) OSG-DAILY and OSG-ITB services have been upgraded successfully and are receiving incoming probe data.

3) Based on a very rough estimate of progress (the relative sizes of the upgrading table file and its temporary counterpart on disk), OSG-PROD is expected to come online around 18:30 CDT (19:30 EDT, 23:30 UTC) this evening.

4) At the time OSG-PROD comes back online, we will swap the reporting-DB IP over to the collector DB and the reporting will start to catch up. Meanwhile the upgrades will be replicated to the reporting DB. All service downtimes except OSG-TRANSFER will be considered complete at this time.

5) By the same rough estimate, OSG-TRANSFER is expected to complete around or before 23:30 CDT (00:30 EDT, 04:30 UTC), at which time it will quietly start receiving data without human intervention.

6) The reporting-DB IP will be switched back to the reporting DB once the replication has caught up, which is likely to be a day or two from now. This will be transparent and will not affect user-visible service.

One particular note: please do not try to kill or restart probes until the corresponding service has been marked available in OIM; a hang is expected until then.
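
The rough progress estimate mentioned in item 3 (comparing the size of the temporary file with the size of the table file being upgraded) can be illustrated with a simple linear extrapolation. This is a sketch under stated assumptions, not the procedure actually used: it assumes the temporary file grows at a steady rate and ends up about the same size as the original, and the function name is hypothetical.

```python
from datetime import datetime


def estimate_completion(started, now, original_bytes, temporary_bytes):
    """Linearly extrapolate a finish time from relative file sizes.

    Assumes steady growth of the temporary file and a final size roughly
    equal to the original table file. Both are rough assumptions.
    """
    if original_bytes <= 0 or temporary_bytes <= 0:
        raise ValueError("file sizes must be positive to extrapolate")
    fraction_done = min(temporary_bytes / original_bytes, 1.0)
    elapsed = now - started
    return started + elapsed / fraction_done


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measured values.
    eta = estimate_completion(
        started=datetime(2010, 7, 14, 9, 0),
        now=datetime(2010, 7, 14, 15, 0),
        original_bytes=100,
        temporary_bytes=50,
    )
    print(eta)
```

An estimate of this kind is only as good as the steady-rate assumption, which is why the times above are given as approximate.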

Monday, July 12, 2010

Gratia Scheduled Maintenance 7/14

All FNAL-based Gratia services will be down on 2010/07/14 for OS and service
upgrades. In addition, the previously announced decommissioning of the legacy
redirector service will also take place at this time.

Gratia release notes for v1.06.16:

* Principal improvement is to the housekeeping feature: this is expected to greatly reduce and hopefully eliminate the instances of significant data lag in reporting.

Outage details:

* At about 09:00 CDT (14:00 UTC), incoming data to the GRATIA-OSG-PROD, GRATIA-OSG-ITB, GRATIA-OSG-TRANSFER and GRATIA-OSG-DAILY services will be stopped in such a way as to hopefully eliminate the possibility of probes becoming "stuck" as has happened in the past (note: probe release currently in VDT test cycle will eliminate this completely).

* There will be one data-collection outage of about 20 minutes, with a shorter outage in reporting as services are shuffled between highly-available servers and the reporting services are upgraded.

* There will be another outage in data collection of much longer duration as collector services are upgraded. Because a DB schema upgrade is involved, this could be several hours in duration. During this period, reporting services will be available but data will of course be stale.

* When all collector upgrades are complete, data collection will resume. This is not expected to be later than 16:00 CDT (21:00 UTC), but given the uncertainty in the time required to upgrade each schema, service may actually resume somewhat earlier or later than this estimate. In any event, the MyOSG downtime page will have the latest details.

* During any data collection outage, data are retained on probes and re-sent when collector service resumes.

* At some point during this period, the legacy redirector service, which has up to now redirected probe data sent to obsolete addresses and port numbers, will be deactivated. The upcoming demise of this service has been announced previously.
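
The retention behavior described above (data are kept on the probe during an outage and re-sent when the collector returns) follows a store-and-forward pattern. The sketch below is a toy illustration of that pattern only. It is not Gratia's implementation: the class name, the `send` callable, and the in-memory queue are all assumptions.

```python
from collections import deque


class BufferingProbe:
    """Toy probe that queues records while the collector is unreachable."""

    def __init__(self, send):
        # send is a hypothetical callable(record) returning True on success.
        self._send = send
        self._pending = deque()  # records waiting to be sent, oldest first

    def report(self, record):
        """Queue a new record, then try to drain the backlog."""
        self._pending.append(record)
        self.flush()

    def flush(self):
        """Send queued records in order, stopping at the first failure."""
        while self._pending:
            if not self._send(self._pending[0]):
                break  # collector still down: keep the record for later
            self._pending.popleft()

    @property
    def backlog(self):
        """Number of records still waiting to be delivered."""
        return len(self._pending)
```

A record is removed from the queue only after the collector accepts it, so an outage delays delivery without dropping data. A queue held in memory would not survive a probe restart, so durable storage would be needed for that case.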

Friday, July 9, 2010

Scheduled Outage for Gratia services 2010/07/14

All FNAL-based Gratia services will be down on 2010/07/14 for OS and service
upgrades. In addition, the previously announced decommissioning of the legacy
redirector service will also take place at this time.

Gratia release notes for v1.06.16:

* Principal improvement is to the housekeeping feature: this is expected to greatly reduce and hopefully eliminate the instances of significant data lag in reporting.

Outage details:

* At about 9am, incoming data to the GRATIA-OSG-PROD, GRATIA-OSG-ITB, GRATIA-OSG-TRANSFER and GRATIA-OSG-DAILY services will be stopped in such a way as to hopefully eliminate the possibility of probes becoming "stuck" as has happened in the past (note: probe release currently in VDT test cycle will eliminate this completely).

* There will be one data-collection outage of about 20 minutes, with a shorter outage in reporting as services are shuffled between highly-available servers and the reporting services are upgraded.

* There will be another outage in data collection of much longer duration as collector services are upgraded. Because a DB schema upgrade is involved, this could be several hours in duration. During this period, reporting services will be available but data will of course be stale.

* When all collector upgrades are complete, data collection will resume.

* During any data collection outage, data are retained on probes and re-sent when collector service resumes.

* At some point during this period, the legacy redirector service, which has up to now redirected probe data sent to obsolete addresses and port numbers, will be deactivated. The upcoming demise of this service has been announced previously.


Thanks for your help and time,

Chris Green

Tuesday, July 6, 2010

GOC Service Update - Tuesday, July 13th at 14:00 UTC

The GOC will upgrade the following services beginning on Tuesday, July 13th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

MyOSG 1.22 (https://myosg.grid.iu.edu)
ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Updated the way expired status was displayed on RSV status map.
* Made a native iGoogle wrapper in order to fix iGoogle UWA issue [MyOSG-64]
* Fixed the style issues for mobile content [MyOSG-65]
* On resource group / summary XML and for OIM Hierarchy information, fixed the issue where facility / site information was not output correctly. Also added ID elements (No registered XML)
* Minor cosmetic changes & bug fixes


GOC Ticket 1.22 (https://ticket.grid.iu.edu)
ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Fixed the bug where Footprints-encoded VO names weren't correctly handled by the resource issue submitter.
* (For GIP validator) Added SAM-BDII to top level BDII testing [MyOSG-70]
* Minor cosmetic changes

GOC Ticket Synchronizer 1.8

Release Notes:

* Added capability to receive test trigger (for GOC service monitoring)
* Added more debug logs.

OSG CA Distribution Release 1.15a

A new release of the CA certificates is available at
http://software.grid.iu.edu/pacman/cadist/.

This is version 1.15a and uses IGTF 1.36 as the basis.

Tarball Version: 1.15a
RPM Version: 1.15a-0

Changes:
===== Version 1.15a =================
Built 29 Jun 2010
IGTF 1.36 - current hash format (openssl 0.9x)
Updated relative to 1.13:
8a661490 root certificate for PLGrid with corrected SAN extension (PL)
ff94d436 root certificate for SRCE with new extensions and life time (HR)
1f3834d0 root certificate for ROSA with new AKI extension and serial (RO)

Removed relative to 1.13:
e1fce4e9 FNAL_KCA obsolete CA from experimental area (US)

Updated format of INDEX.txt and INDEX.html files to be consistent
with the new IGTF layout coming in a future release.

Thursday, July 1, 2010

OSG 1.2.11 Update Notification

Date: July 1, 2010

Affected Components
This update affects all OSG installations.

Summary

This update replaces Java 5 with Java 6 because Java 5 is past its end of life and no longer has security updates. This affects all software in the OSG software stack that uses Java. In addition, we have updated to the latest version of Java 6, 1.6.0_20.

This release updates several software components to new versions; see the complete list below.

* Java
* Java software has been altered to use Java 6 instead of Java 5
* osg-version


Issues Fixed
The default for Java software has been altered to use the Java 6 VM. In addition, Java 6 has been updated to the latest Sun Java release which has some bug and security fixes.

Update Instructions
Update instructions can be found on the OSG TWiki under the OSG 1.2.11 update instructions: https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/OSG12UpdateInstructions.

Additional Information
The release notes for the VDT 2.0.0p18 release underlying this release can be found here: http://vdt.cs.wisc.edu/releases/2.0.0/release-p18.html.