Message boards : BOINC client : Mac OS X: Can't destroy/create shared memory
Author | Message |
---|---|
Joined: 7 Sep 05 Posts: 13 |
After many days of running on Mac OS X 10.4.3 on a dual G5 machine, the message log eventually starts showing errors whenever the client switches work units: "*Date Time* Couldn't destroy shared memory: system shmctl". After several of those during work unit switches, it is eventually followed by: "*Date Time Project* Can't create shared memory: system shmget" and "*Date Time Project* Unrecoverable error for result *result_name*". The client then rapidly downloads work units and aborts them, over and over, until it hits its work unit quota. This is my work development machine, so I run CodeWarrior, Xcode and various versions of MacBU products. I know Office uses shared memory, but I am not personally familiar with the implementation. Is anyone else seeing this behavior? -nh |
Joined: 30 Aug 05 Posts: 297 |
There is one other person having this same problem, but they are running OS X Server. The thread is over on SETI, Problem with Mac clients?. I'm afraid my Mac development days ended about the same time OS X came out, so I haven't been able to be much help. I'm pointing him over here as well - maybe the two of you can figure something out! |
Joined: 6 Dec 05 Posts: 10 |
Hmmm ... I too am working on an up-to-date dual G5 (2.7 GHz). How much RAM do you have? Are you running the Server version or just the normal one? So far the only solution I've found is to reboot (this is quite impractical). I don't think I'm running CodeWarrior, but the Xcode tools are installed as well ... I'm also not familiar with the MacBU products ... The machine spends its days running terminals for me (ssh'd into my Linux boxes), Firefox, RDC to a Windows 2003 Server, and iTunes. That's pretty much it, beyond working on SETI packets ... I'm no programmer - I'm a systems administrator. So even if I could get the BOINC source, I don't know what kind of assistance I could offer ... I'm fairly good at running things in debugging mode and foisting off the output onto someone else, once I know *how* to run something in debugging mode. Thing is, I think the client software is giving us the important information ... |
Joined: 6 Dec 05 Posts: 10 |
I had an epiphany this morning ... What if this is a "memory management on a dual processor system" thing? I know it should be the same as single processor, but maybe it's not ... |
Joined: 29 Aug 05 Posts: 225 |
If the fix suggested in the link works, I would love to have your log files and a note so I could add this as an example to the Wiki ... Zip up the TXT files in the BOINC directory and send them to [email protected] Thanks! |
Joined: 6 Dec 05 Posts: 10 |
Paul: It may fend off the eventuality, but no - it is not a permanent fix. And, now that I think about it, it may have just been a placebo - the "fix" requires a reboot to implement, and when the problem recurs after the "fix", a reboot makes it go away (until it happens again). It would be my guess that there's an issue with memory access on dual processor Macs - creation doesn't seem to be the problem (until you run out), but destruction (as per Nathan's original post, and my comments in the SETI thread) remains a problem, and the "fix" may only delay the inevitable recurrence of "create" errors. Mind you, this is with the full-blown GUI client - I haven't had a chance to try any others yet ... |
Joined: 29 Aug 05 Posts: 225 |
Yes, well, that is why I wanted the logs. And I can say all of that ... |
Joined: 30 Aug 05 Posts: 297 |
"It would be my guess that there's an issue with memory access on dual processor Macs" I would limit this even further, as _most_ dual processor Macs aren't reporting the problem. Both of you who are have "more advanced" things running - OS X Server in one case, and a lot of development tools in the other. While I would understand Server being "different", I don't know why the Xcode tools, etc., would make any difference... but there are an awful lot of Dual and Quad systems on 10.4.3 out there crunching. If the problem were common, I think we'd see a lot more reports of it. Not that this helps _you_ two! :-/ |
Joined: 29 Aug 05 Posts: 225 |
Well, I am running "Tiger" and Xcode so I am not sure it is that ... But, the server version is different so ... |
Joined: 6 Dec 05 Posts: 10 |
Nathan didn't specify "Server" when he posted his original message, and I haven't seen a response from him yet verifying or disproving the idea, so it could still be a "Server on a dual PowerMac G5" thing ... Paul: You've got my logs - do you need them again? I can resend them to you; or if I didn't provide what you wanted, could you be more specific? I wonder how many folks running Tiger on a dual processor G5 turn their machines off? Or reboot them daily (or every couple of days)? Both of those situations would resolve the problem. Given how the PowerMac G5s are presented when you configure one at the Apple site, I would say very, very few people probably order them with Server installed. And then how many of those that do keep them on 24 hours a day running BOINC? So far, it may just be me. And maybe Nathan. My G5 is currently having issues uploading/downloading packets (I guess because the SETI database servers are busy), so I can't provide much information at this time ... it's finished with all the work it has ... I'm willing to give all information I possibly could to resolve this issue - I just need to know what I should try (and the database servers to talk to me ;) or what information people need ... |
Joined: 30 Aug 05 Posts: 297 |
"I'm willing to give all information I possibly could to resolve this issue - I just need to know what I should try (and the database servers to talk to me ;) or what information people need ..." Are you running any other projects? If you suspended SETI while they're having trouble, and ran, say, Einstein, it would tell us if it was SETI-specific (their app) or something system-wide... If it's application specific, then possibly changing to an Altivec-optimized app would solve it. |
Joined: 6 Dec 05 Posts: 10 |
I had to reboot - when I signed on to Einstein, I couldn't create any shared memory objects. After the reboot though, Einstein exhibits the same behavior: Fri Dec 9 10:23:09 2005|Einstein@Home|Pausing result l1_0197.0__0197.4_0.1_T04_S4lD_2 (removed from memory) Fri Dec 9 10:23:09 2005|Einstein@Home|Pausing result l1_0197.0__0197.0_0.1_T05_S4lD_2 (removed from memory) Fri Dec 9 10:23:10 2005||Couldn't destroy shared memory: system shmctl Fri Dec 9 10:23:10 2005||request_reschedule_cpus: process exited Fri Dec 9 10:23:11 2005||Couldn't destroy shared memory: system shmctl Fri Dec 9 10:23:11 2005||request_reschedule_cpus: process exited It would appear that the main BOINC client is where the issue lies. But that's just me talking out of my butt ... |
Joined: 30 Aug 05 Posts: 297 |
"It would appear that the main BOINC client is where the issue lies. But that's just me talking out of my butt ..." I suspect you're right though. Are you running 5.2.13? If so, we might try _older_ versions, see if this is something they messed up recently... I don't know how much time you want to spend on this. If you're willing, I'll dig up URLs for two older versions that I had good luck with, that should be compatible with your current xml files and such. I probably won't be back on for at least 12 hours though... |
Joined: 6 Dec 05 Posts: 10 |
Well it's my work machine we're talking about, and the weekend is coming ... so *I* probably won't get to it until Monday. I am using 5.2.13 now but I'm pretty sure I saw the behavior in earlier versions. I can always give a couple a try though ... |
Joined: 29 Aug 05 Posts: 225 |
"Paul: You've got my logs - do you need them again? I can resend them to you; or if I didn't provide what you wanted, could you be more specific?" Eric, Sorry ... yes I have them ... mind is not working well ... :( |
Joined: 6 Dec 05 Posts: 10 |
Bill: Haven't seen you post the versions; I can tell you you might have to go back before 5.2.8; I think I saw the behavior on that one ... I've checked and I've cleaned up after myself :( I don't have any older versions on my hard drive to check at this time. |
Joined: 30 Aug 05 Posts: 297 |
"Haven't seen you post the versions; I can tell you you might have to go back before 5.2.8; I think I saw the behavior on that one ..." Sorry to let this slip through the cracks... been rather busy around BOINCdom the last few days! :-( I tried to check the bugs database, just to make sure you weren't doing all this hassle unnecessarily, if someone had already identified the problem - but the bugs database is down... perfect. I did go through the last couple of months of developer mailing list archives looking for anything (and I'm impressed by how much Mac stuff IS being done...) but didn't see anything that looked related to this. Here is V4.72, and V5.2.4. I wouldn't go back any further than 4.72 as that is where the "new scheduler" code came in, and I'm not sure how the project website info would react to having missing information. If we can definitely say "this started happening between version x and version y", then I know who to get that info sent to now, anyway. I _am_ pretty sure by this point that this is an OS X Server problem, or at least that it's activated by something that Server does that 'normal' installs don't do. It could be something that is set up within OS X like the journalled file system was though, where before 10.4 only server versions had it "turned on", but you _could_ turn it on in any version. I've also learned a bit about how BOINC uses shared memory, but nothing that helps with the problem. Sigh. |
Joined: 6 Dec 05 Posts: 10 |
V5.2.4: Wed Dec 14 12:38:59 2005||Suspending computation and network activity - user is active Wed Dec 14 12:38:59 2005|Einstein@Home|Pausing result l1_0197.0__0197.3_0.1_T12_S4lD_1 (removed from memory) Wed Dec 14 12:38:59 2005|Einstein@Home|Pausing result l1_0197.0__0197.4_0.1_T12_S4lD_1 (removed from memory) Wed Dec 14 12:39:00 2005||Couldn't destroy shared memory: system shmctl Wed Dec 14 12:39:00 2005||request_reschedule_cpus: process exited Wed Dec 14 12:39:01 2005||Couldn't destroy shared memory: system shmctl Wed Dec 14 12:39:01 2005||request_reschedule_cpus: process exited V4.72 wanted an account key; I found my SETI, but it wouldn't let me attach to the project. I don't think I ever got one for Einstein. :( |
Joined: 30 Aug 05 Posts: 297 |
Eric, thanks for all the time you've put into this. I think at this point it's going to require a developer to take a look at it. I'll try to get the right guy looking at this thread... |
Joined: 7 Sep 05 Posts: 13 |
Sorry I haven't responded quickly! I am running 10.4.3 non-server. I run Xcode, and by MacBU products, I mean the Microsoft MacBU (Microsoft Office, Microsoft Messenger, Remote Desktop Connection, etc.). Again, Microsoft Office uses shared memory and may be influencing how it is used by BOINC. Eric: Are you running Microsoft Office or Messenger on that machine? If necessary, I can make myself familiar with the shared memory implementation in Office, but that does not seem like a fun task. If Eric isn't running Office or Messenger then we can remove that from the list of problem sources. |
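
For anyone who wants to compare logs across machines or client versions, here is a rough sketch of a script that tallies the shared memory failures in a message log. It assumes the `timestamp|project|message` layout seen in the excerpts posted in this thread, and it tells "destroy" from "create" failures purely by the `shmctl` and `shmget` wording quoted above. The function name `tally_shm_errors` and the "(client)" label for lines with an empty project field are made up for illustration; nothing here comes from the BOINC source.

```python
from collections import Counter


def tally_shm_errors(lines):
    """Count shared memory failures per (project, kind) in message-log lines.

    Assumes each line looks like "timestamp|project|message", as in the
    excerpts posted in this thread. Lines in any other shape are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|", 2)
        if len(fields) != 3:
            continue  # not a timestamp|project|message line
        _timestamp, project, message = fields
        if "shared memory" not in message:
            continue
        if "shmctl" in message:
            kind = "destroy"
        elif "shmget" in message:
            kind = "create"
        else:
            kind = "other"
        # An empty project field means the message came from the client itself.
        counts[(project or "(client)", kind)] += 1
    return counts


# Lines copied from the Einstein@Home excerpt earlier in the thread.
sample = [
    "Fri Dec 9 10:23:09 2005|Einstein@Home|Pausing result "
    "l1_0197.0__0197.4_0.1_T04_S4lD_2 (removed from memory)",
    "Fri Dec 9 10:23:10 2005||Couldn't destroy shared memory: system shmctl",
    "Fri Dec 9 10:23:10 2005||request_reschedule_cpus: process exited",
    "Fri Dec 9 10:23:11 2005||Couldn't destroy shared memory: system shmctl",
]

for (project, kind), n in sorted(tally_shm_errors(sample).items()):
    print(f"{project}: {n} {kind} failure(s)")
# prints "(client): 2 destroy failure(s)"
```

Running it over a whole log (for example `tally_shm_errors(open(path))`, with `path` pointing at wherever your client writes its messages) should show whether the destroy failures pile up before the first create failure, which is the pattern Eric and I both seem to be describing.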
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.