Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Re: Work Available

Results returned have dropped from 22K to 17K per day, and those would have been done by fast machines, so there are more priority units for us ordinary crunchers.

Mike.
[Dec 8, 2021 7:50:30 PM]
AnandBhat
Cruncher
Joined: Apr 2, 2020
Post Count: 10
Re: Work Available

The big hitters may have run into this issue when ramping up their ARP output -- https://github.com/BOINC/boinc/issues/4572
[Dec 9, 2021 12:46:50 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Work Available

I'm skeptical of the problem described in 4572. I currently run 64 concurrent ARP1 work units and have done so for the past 28 days (I broke away from ARP1 to run MCM for a challenge). I have also run 128 concurrently, but cut back because of the length of time it took to complete a WU. I have never encountered the issue described, after contributing over 100 years of computer time to ARP1.

I have seen uploads fail and accumulate, for example during system maintenance windows or when WCG was encountering filesystem errors on the upload storage device (resulting in a lot of HTTP errors). During those times I did see the same messages as described in the incident, but I was able to retry the uploads and get them to clear. Yes, if enough accumulated in upload-pending status the downloads would cease, but at no time did BOINCMGR disconnect or require a reboot or client restart to clear.

In the past there have been very rare instances where, due to circumstances, the upload process got interrupted and would not restart without intervention by the WCG staff (for example, deleting the upload file from the upload filesystem so that the client and remote end were back in sync), after which the upload would finish as normal. I have only seen this documented 4 or 5 times in 14 years, and it was usually due to a power outage or similar immediate disconnect that took down not only the client but also the OS.

I have also encountered times when I lost internet connectivity and the uploads accumulated on the client until a connection was reestablished (maybe 24 hours or more later). Once the connection was established, a flood of large file uploads would commence (I have mine set to 10 concurrent uploads) and would complete without a problem. Yes, it sometimes took as long as an hour, but they did complete without intervention.
[Dec 9, 2021 2:19:05 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Re: Work Available

Kevin has already said why the big hitters' output fluctuates. It is due to them concentrating on the work they were bought to perform and only contributing to WCG when they have spare capacity.

Problem 4572 seems to occur when the uploads are batched. The answer would seem to me to be not to batch them but to upload each unit as it completes, thereby spreading the load.

My broadband is on 24/7, so I have no reason to batch. I don't look very often, but if I see ARP units checkpointing close to each other I would suspend the second one for a few minutes. This avoids any possible overloading at checkpointing or uploading.

Mike
[Dec 9, 2021 9:39:05 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Re: Work Available

Kevin's last report said that there were 130 units currently stuck, having errored out. 96 of these would seem to be what remains of generations 079 - 095 plus most of 098 & 099.

I will look a little deeper into this for my next report at the weekend.

Mike
[Dec 9, 2021 9:54:59 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Re: Work Available

I have seen a similar problem on 24 and 32 thread systems, but not very often. Since I don't run a lot of ARP units, it has occurred with both MCM and OPN units. I have had luck suspending network activity, waiting about 15 seconds and then resuming network activity. Once the logjam breaks, the rest of the uploads proceed as normal. I have in the past run up to 120 threads on a single internet connection through some Rube Goldberg concoctions of switches, routers and range extenders. Most of the time it works without any problems, but occasionally these logjams happen for no apparent reason.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
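
For anyone who would rather script the suspend / wait / resume workaround described above than click through the Manager, here is a minimal sketch in Python. It assumes the `boinccmd` command-line tool is on the PATH and uses its `--set_network_mode` option. The helper names (`network_kick_commands`, `kick_network`) are made up for illustration, and resuming to `auto` (follow preferences) is an assumption; use `always` if that is how the client was set before.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def _run(cmd: Sequence[str]) -> None:
    """Run one command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def network_kick_commands(boinccmd: str = "boinccmd") -> List[List[str]]:
    """Return the two commands: suspend networking, then resume it.

    Assumption: "auto" (follow preferences) is the mode to go back to.
    """
    return [
        [boinccmd, "--set_network_mode", "never"],
        [boinccmd, "--set_network_mode", "auto"],
    ]


def kick_network(
    pause_seconds: float = 15.0,
    runner: Callable[[Sequence[str]], None] = _run,
    sleep: Callable[[float], None] = time.sleep,
    boinccmd: str = "boinccmd",
) -> None:
    """Suspend network activity, wait, then resume it.

    The runner and sleep arguments exist only so the sequence can be
    dry-run without a BOINC client installed.
    """
    suspend, resume = network_kick_commands(boinccmd)
    runner(suspend)
    try:
        sleep(pause_seconds)
    finally:
        # Always try to resume, even if the wait is interrupted.
        runner(resume)
```

Calling `kick_network()` with no arguments performs the real sequence with the 15 second pause mentioned in the post. Passing `runner=print` shows the commands without running them.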
[Dec 9, 2021 9:59:38 PM]
AnandBhat
Cruncher
Joined: Apr 2, 2020
Post Count: 10
Re: Work Available

My apologies for the distraction. I ran into connectivity issues with my 16 thread system and a similar logjam. I found that report by chance, and since the reporter had expressed an interest in pushing ARP at a rate of 2000 WUs/day, I thought they (and other similar contributors) might have work sitting there waiting to be sent.
[Dec 10, 2021 1:06:37 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Work Available

Over the past 24 hours I have been getting approximately 85% non-priority work, but total validated results are still approximately 17,000 per day, perhaps suggesting that more machines have become reliable.
[Dec 12, 2021 1:31:22 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 277
Status: Offline
Project Badges:
Reply to this Post  Reply with Quote 
Re: Work Available

I've bumped my ARP limits from 3 to 6 on the main cruncher (16 threads/15 active) and from 1 to 2 on the laptops (4 threads each). So far, there don't seem to be any issues, but it will mean more lost work when I need to reboot them.
[Dec 12, 2021 1:41:55 PM]