Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Author | Message |
---|---|
Joined: 30 Mar 20 Posts: 516 |
WCG Operational Status update: https://www.cs.toronto.edu/~juris/jlab/wcg.html (click operational status heading) - August 27. August 27, 2025: MAM1 7.07 updates: The addition of spdlog as a dependency to replace the previous debug level printouts with more useful output for those who like to look at stderr.txt before it gets cleaned up by the BOINC client. Some kludgey math, flags, and options are now set in the application's main function to try to rein in the dependency chain Ensmallen -> Armadillo -> OpenMP/OpenBLAS, whose nested thread creation was causing suspension of concurrently running tasks under the BOINC client and using more CPU than the plan class and --nthreads parameter dictated. Essentially, bad behaviour. Thanks for posting feedback for the first few batches released, I will endeavour to address any further feedback if MDMG/MAM1 continues to over-schedule w.r.t. the plan class or otherwise behaves badly. Added two built-in, configurable options for adjusting the learning rate when using the LibTorch backend, which was observed to improve the model's avg. loss progression during cross validation in fewer epochs, corresponding to: https://docs.pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html https://docs.pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingLR.html Other fixes, additional work on features that will release in later versions after the migration. |
Joined: 30 Mar 20 Posts: 516 |
New update: https://www.cs.toronto.edu/~juris/jlab/wcg.html (click operational status heading) - August 29. Also pushed to the BOINC client. August 29, 2025: Full migration of WCG from the Graham to Nibi cloud facilities will be completed between 3:00-5:00 p.m. on August 31st, 2025. Sharcnet will then power down all hardware at Graham. We have put in a ticket with UHN Digital to move our DNS records to the new IP addresses we have been allocated in Nibi cloud, and all storage, networking, and compute resources are already provisioned at Nibi. We continue testing QA and Prod on the new infrastructure. We will experience some downtime as *.worldcommunitygrid.org URLs switch over. We will be bringing down workunit creation scripting, BOINC server components, and upload/download servers in sequence, halting the database, performing a final rsync and then bringing down the website, forums, and internal services over the next 48h. In the best case, our DNS records will be switched over on the 31st and everything behind the load balancer will be up and running. However, we want to prepare users for the possibility of additional downtime as we stand up prod on Nibi. |
Joined: 28 Mar 18 Posts: 141 |
Thanks for posting this link. I had forgotten about this thread. I'm here. As an obsessive poster this will be hard on me :-) |
Joined: 30 Mar 20 Posts: 516 |
In reply to unixchick's message of 30 Aug 2025: "Thanks for posting this link. I had forgotten about this thread. I'm here. As an obsessive poster this will be hard on me :-)" I'm sure we will get through this migration too. Uploading of tasks is still working, but reporting and asking for new work are not. The WCG forum and the rest of the site are also still online. |
Joined: 25 May 09 Posts: 1362 |
...and now all down, website is giving a 503. Let's hope this move is a smooth and rapid one. |
Joined: 30 Mar 20 Posts: 516 |
Yes indeed. Everything WCG is down and out. With 152 tasks left to crunch in the cache, I believe my slow old laptop (Core(TM) i7-3630QM CPU @ 2.40GHz) will still have work left when WCG is back, even if it takes a day or so longer than planned for WCG to come back online. |
Joined: 28 Mar 18 Posts: 141 |
I think it is 3pm Toronto time... hopefully it will come back soon in 2 hours, but some of that depends on DNS tables |
Joined: 30 Mar 20 Posts: 516 |
In reply to unixchick's message of 31 Aug 2025: "I think it is 3pm Toronto time... hopefully it will come back soon in 2 hours, but some of that depends on DNS tables" Yeah, let's hope that everything works as they have planned it. I still have 100 tasks left to crunch, 34 tasks ready to report (uploaded before the outage), and 60 tasks waiting to upload and report. |
Joined: 28 Mar 18 Posts: 141 |
We now get this message on the WCG web page "503: WCG Migration to Nibi Cloud in Progress See the "Operational Status" tab of the Jurisica Lab WCG pages for details: https://www.cs.toronto.edu/~juris/jlab/wcg.html " |
Joined: 30 Mar 20 Posts: 516 |
In reply to unixchick's message of 31 Aug 2025: "We now get this message on the WCG web page" Thank you unixchick. I just checked, and indeed, that's the message now on the WCG pages. It is still served from the old Graham address though (gra-cloud118.graham.sharcnet.ca 199.241.167.118). So Graham is still active and not powered down. |
Joined: 28 Mar 18 Posts: 141 |
Can't reach the web page now. I get a time out error. |
Joined: 25 May 09 Posts: 1362 |
Another change..... The forum & home pages are now blank, with no error messages (402/503 types), so are we in the middle of the DNS change over? |
Joined: 30 Mar 20 Posts: 516 |
In reply to robsmith's message of 1 Sep 2025: "Another change....." Yes, the new IP for WCG seems to be 199.241.161.110, instead of the old 199.241.167.118. However, Nibi cloud, the server, and/or the DNS change probably isn't totally ready yet. I get "ERR_CONNECTION_TIMED_OUT" now. |
Joined: 10 May 07 Posts: 1603 |
It's midday in Toronto and still no communication on the Krembil/WCG status site. I get the same DNS pointers to the new IP address but no website. Will just have to wait for a sign of intelligence from Krembil. |
Joined: 25 May 09 Posts: 1362 |
The sign of intelligence we need is not from Krembil, but from the new cloud host (and this sort of thing is all too common when moving from one cloud environment to another....). |
Joined: 30 Mar 20 Posts: 516 |
No signs of life when it comes to BOINC's connection to WCG either. Yeah well, the waiting game continues.... Btw: the "new" cloud host is the same as the old one. Still Sharcnet (UHN Digital). It's just a new cloud environment within the same host. |
Joined: 30 Mar 20 Posts: 516 |
Well, I guess this migration didn't go as planned. Still no contact with WCG website, or through BOINC. It seems as if I really needed the upped cache. |
Joined: 28 Mar 18 Posts: 141 |
Nothing yet. Can we tell if it is a DNS thing or a machine thing? If it is a DNS thing then it is just time. If it is a machine thing, then I wish we had info. |
Joined: 3 Nov 20 Posts: 5 |
Hi! Seems like a DNS problem!? https://dnschecker.org/all-dns-records-of-domain.php?query=www.worldcommunitygrid.org%2F&rtype=ALL&dns=google Hans S. P.S. The last bullet point of the latest update (Aug. 29) says: "In the best case, our DNS records will be switched over on the 31st and everything behind the load balancer will be up and running. However, we want to prepare users for the possibility of additional downtime as we stand up prod on Nibi." |
Joined: 3 Mar 23 Posts: 17 |
In reply to Hans Sveen's message of 2 Sep 2025: Hi! No, there's no DNS problem with the domain zone (worldcommunitygrid.org) or with the particular website (www.worldcommunitygrid.org). Obviously, the problem is at the application level. P.S. If you want to check a resource for "DNS" problems, you should use more "tech-savvy" tools, like Hardenize: https://www.hardenize.com/report/worldcommunitygrid.org/ or Zonemaster: https://zonemaster.net/en/result/55ccbde7e145a1fb |
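
For anyone who wants to check for themselves whether it is "a DNS thing or a machine thing", here is a minimal Python sketch using only the standard library. It first asks the system resolver for an address and then tries a plain TCP connection to that address. The function names (`diagnose`, `_resolve`, `_connect`), the default port 443, and the 5 second timeout are assumptions made for illustration; nothing here comes from the WCG team. The resolver and connector are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket
from typing import Callable, Tuple


def _resolve(host: str) -> str:
    """Return the first IPv4 address the system resolver gives for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return infos[0][4][0]  # sockaddr is (ip, port); keep the ip


def _connect(ip: str, port: int, timeout: float) -> None:
    """Open and immediately close a TCP connection; raises OSError on failure."""
    with socket.create_connection((ip, port), timeout=timeout):
        pass


def diagnose(host: str, port: int = 443, timeout: float = 5.0,
             resolve: Callable[[str], str] = _resolve,
             connect: Callable[[str, int, float], None] = _connect
             ) -> Tuple[str, str]:
    """Classify a failure as 'dns', 'connect', or report 'ok'.

    Hypothetical helper: 'ok' only means the TCP handshake succeeded,
    not that the web application behind it is healthy.
    """
    try:
        ip = resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return ("dns", f"could not resolve {host}: {exc}")
    try:
        connect(ip, port, timeout)
    except OSError as exc:  # covers timeouts and refused connections
        return ("connect", f"{host} resolves to {ip} but port {port} failed: {exc}")
    return ("ok", f"{host} resolves to {ip} and accepts connections on port {port}")
```

Calling `diagnose("www.worldcommunitygrid.org")` returns a `(stage, detail)` pair. A `"dns"` result points at name resolution, while a `"connect"` result (for example a timeout like the ERR_CONNECTION_TIMED_OUT reported above) means the name resolves but the machine or load balancer behind it is not answering. An `"ok"` result with a page that still shows a 503 would point at the application level instead.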
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.