Total posts in this thread: 3195
This topic has been viewed 2710492 times and has 3194 replies
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2153
Re: Work Available

Aperture_Science_Innovators:
I wonder why they drop the time_step (& thus make the WU take 2x as long to complete) _and_ also give it such a short deadline. Seems like a poor combination....

There is a legitimate reason for this.

Sometimes all tasks in an ARP1 workunit error out, so the error has to be fixed somehow. To resolve the issue, they found that they had to reduce the time_step. (Note: in three cases they even had to start all over again from generation 000 to resolve a particular issue.) See this interesting post (post 669410) and this one (post 671450): "3 cannot be processed even with changing the more granular step_size of 24" (by the way, Kevin means time_step there).

Also interesting is this question about reliable machines in post 671952: "If you are reducing the deadline for extreme case, are you reducing the definition of reliable to narrow the machines eligible?" Another question, about running tasks on the machines with the fastest turnaround, is also interesting here, and the answer is: "I agree that would be nice, but no such mechanism exists in BOINC." Also, about the three most Extreme units: "We have also discussed this with the Delft team and they know that these 3 might take another 3-4 weeks to finish after the bulk are done."

Hope this helps a lot!

Adri

PS I noticed slot number 125 on your device. (This is outrageous, nearly insane!) Could you tell me some more about it, please?
----------------------------------------
[Edit 1 times, last edit by adriverhoef at May 8, 2023 3:27:02 PM]
[May 8, 2023 3:25:04 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 266
Re: Work Available

Don't have access to the system to check now
As Mike Gibson pointed out, the time_step can be identified by watching how much the progress % in BOINC Manager increases with each step. "If it increases by 0.014% (occasionally 0.013%) then that is a 24 second time step. If it increases by 0.021% (occasionally 0.020%) then that is the standard 36 second time step."
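Mike's rule of thumb lends itself to a tiny classifier. The sketch below is hypothetical (BOINC Manager has no such function): it assumes you have noted two consecutive progress readings, and it picks whichever reference increment from the quote is closer, so the occasional 0.013% or 0.020% still maps to 24 s or 36 s. Other time_steps, such as 18 s, are not covered.

```python
# Reference increments per step, taken from the forum rule of thumb:
# 0.014% -> 24 s time_step, 0.021% -> 36 s (standard) time_step.
REFERENCE_INCREMENTS = {24: 0.014, 36: 0.021}


def increment(previous_pct: float, current_pct: float) -> float:
    """Difference between two consecutive progress readings, rounded to 3 decimals."""
    return round(current_pct - previous_pct, 3)


def classify_time_step(increment_pct: float) -> int:
    """Guess the time_step (seconds) from one observed progress increment in percent.

    Hypothetical helper: chooses the time_step whose reference increment is
    nearest the observed one, so rounded readings (0.013%, 0.020%) still match.
    """
    return min(
        REFERENCE_INCREMENTS,
        key=lambda step: abs(REFERENCE_INCREMENTS[step] - increment_pct),
    )


if __name__ == "__main__":
    print(classify_time_step(increment(41.230, 41.244)))  # 24
    print(classify_time_step(increment(41.230, 41.250)))  # 36
```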
Aperture_Science_Innovators:
I wonder why they drop the time_step (& thus make the WU take 2x as long to complete) _and_ also give it such a short deadline
Dropping the time_step was a workaround to enable "stuck" workunits to make progress. Reducing the deadline was intended to help the laggards catch up. I agree that the short deadline doesn't work so well for 18- or 24-second time steps. I think that's why they send these laggards out to 3 machines initially, rather than 2. Once a stuck unit gets unstuck, they increase the time_step back to 36 seconds (or at least they used to when IBM was in charge).
Cheers,
Mark
Edit: I see that Adri was too quick for me and has already answered the question rather better than I did!
----------------------------------------
[Edit 1 times, last edit by MJH333 at May 8, 2023 3:27:28 PM]
[May 8, 2023 3:25:26 PM]
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Re: Work Available

Thank you both. I knew I'd find some experts here who could weigh in. It looks like they sent out another three copies of the WU about two hours ago, after none of the devices returned it on time. I suppose I'll see what happens this evening when my copy comes back: whether it's deemed "too late" or whether they abort one of the resends.
----------------------------------------

[May 8, 2023 5:09:09 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2153
Re: Work Available

Mark:
Edit: I see that Adri was too quick for me and has already answered the question rather better than I did!

After some digging in the vast vaults of the forum, I posted my message, then wanted to add a PS, only to find out (after the edit) that you had also answered. Luckily, it didn't do any harm. In any case, I found your post interesting too; it is another way to look at the matter. For example, you mentioned the buzzwords 'stuck' and 'laggards' and that these are sent to 3 machines instead of 2, which I failed to do. You did well. At least we didn't give the same lecture, so maybe we can pat ourselves on the back for that.

Just another thought: are they perhaps sending all Extreme workunits to 3 machines initially?

Adri
----------------------------------------
[Edit 1 times, last edit by adriverhoef at May 8, 2023 5:38:48 PM]
[May 8, 2023 5:17:51 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12349
Re: Work Available

I notice that 2 of the recently flagged extremes were sent out as newbies with 36-hour deadlines. This seems to be a departure from the previous rule of 72 hours for extreme newbies. 36 hours used to be reserved for extreme re-sends issued after the halfway mark of the original deadline.

This can create a problem when the time_step has been reduced, because the unit then takes longer to crunch.

To summarise recent posts:
Normals go to 2 machines with 6 day deadlines
Accelerated go to 2 machines with 3 day deadlines
Extremes go to 3 machines with 3 day deadlines
Re-sends replace errors or no replies; a re-send issued after the halfway point of the original deadline gets half that deadline.
Units with reduced timesteps have their 36 secs. timestep restored after 2 or 3 successful generations.

Or that was how IBM did it!
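The summary above can be encoded as a small lookup plus one rule for re-sends. This is a sketch of the rules as posters describe them (IBM-era behaviour), not the server's actual scheduler logic; the names are hypothetical, and treating an early re-send as getting the full deadline is an assumption the thread does not state.

```python
from datetime import datetime, timedelta

# Initial replication and per-copy deadline by category, as summarised in the
# thread. The live server may differ (e.g. the recent 36-hour extreme newbies).
INITIAL_RULES = {
    "normal": (2, timedelta(days=6)),
    "accelerated": (2, timedelta(days=3)),
    "extreme": (3, timedelta(days=3)),
}


def initial_copies(category: str) -> tuple[int, timedelta]:
    """Number of copies sent out initially and the deadline each copy gets."""
    return INITIAL_RULES[category]


def resend_deadline(category: str, original_sent: datetime, resend_sent: datetime) -> timedelta:
    """Deadline for a re-send that replaces an error or a no-reply.

    A re-send issued after the halfway point of the original deadline gets
    half of it. ASSUMPTION: an earlier re-send keeps the full deadline.
    """
    _, deadline = INITIAL_RULES[category]
    halfway = original_sent + deadline / 2
    return deadline / 2 if resend_sent > halfway else deadline


if __name__ == "__main__":
    sent = datetime(2023, 5, 4, 12, 29, 8)
    print(initial_copies("extreme"))  # (3, datetime.timedelta(days=3))
    print(resend_deadline("extreme", sent, datetime(2023, 5, 7, 15, 0, 34)))  # 1 day, 12:00:00
```

The halving matches the OPNG example later in this thread, where the original copies had 3-day deadlines and the late re-send _2 was given 36 hours.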

Mike
----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at May 8, 2023 11:46:32 PM]
[May 8, 2023 5:46:33 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12349
Re: Work Available

Sunday Report

Sorry this is a day late. I missed the 7 May output, so I have interpolated between 6 and 8 May.

All categories have been released this week.

977 units have been validated this week.

Assuming that a full generation 182 will be the last, there are 1,637,090 units still outstanding. I will restart my forecasting once output stabilises.

The definitions of accelerated and extreme are unchanged.

There are still 35 Extreme and 57 Accelerated units listed, as none have moved. The numbers in their generations are 2,200 and 3,837.
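The simplest possible forecast divides the outstanding units by the weekly validation count. The sketch below is an assumption about how such a forecast could work, not Mike's actual method. Plugging in this week's 977 validations gives about 1,676 weeks, an implausible horizon that shows why a single low-output week is a poor basis for forecasting.

```python
import math


def weeks_remaining(units_outstanding: int, validated_per_week: int) -> int:
    """Naive forecast: weeks left if weekly output stayed exactly constant.

    ASSUMPTION: ignores ramp-up, stuck units and the long tail of Extremes,
    so it is only meaningful once weekly output has stabilised.
    """
    if validated_per_week <= 0:
        raise ValueError("weekly output must be positive")
    # Round up: a partially used week still counts as a week.
    return math.ceil(units_outstanding / validated_per_week)


if __name__ == "__main__":
    # Figures from this week's report: 1,637,090 outstanding, 977 validated.
    print(weeks_remaining(1_637_090, 977))  # 1676
```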

Mike
[May 8, 2023 6:11:43 PM]
paulch2
Cruncher
Joined: Aug 6, 2020
Post Count: 25
Re: Work Available

Some of the extremes seem to be 36 hours rather than 3 days
https://www.worldcommunitygrid.org/contribution/workunit/301765111
[May 8, 2023 6:19:01 PM]
paulch2
Cruncher
Joined: Aug 6, 2020
Post Count: 25
Status: Offline
Project Badges:
Reply to this Post  Reply with Quote 
Re: Work Available

Some of the extremes seem to be 36 hours rather than 3 days
https://www.worldcommunitygrid.org/contribution/workunit/301765111


Ignore that. Mike already mentioned it.
[May 8, 2023 7:01:06 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2153
Re: Work Available

Aperture_Science_Innovators:
I suppose I'll see what happens this evening when my copy comes back, if it's deemed "too late" or if they abort one of the resends.

If you return it within 24 hours of the needed results being received, you should be safe. So, as long as the others have not returned their results, you still have at least 24 hours after they come in to return yours, provided the others meet their deadline. Likewise, if you didn't meet the deadline, the others get an extra 24 hours if they miss theirs too. However, if you return your result more than 24 hours after the needed results were received, you're out of luck.

Example 1:
Our task, OPNG_0184849_00003_1, missed the deadline ("Due time") by some 30 hours (!); luckily the last needed wingman returned their result, OPNG_0184849_00003_2, only some 20 hours and 38 minutes before we did:

$ wcgstats -wJSrrr= 300214794
OPNG_0184849_00003_0 Linux Ubuntu Valid 2023-05-04T12:29:08 2023-05-06T02:11:18
OPNG_0184849_00003_1 Fedora Linux Valid 2023-05-04T14:58:20 2023-05-08T20:41:40
OPNG_0184849_00003_2 Linux Debian Valid 2023-05-07T15:00:34 2023-05-08T00:03:44
-----------------------------------------------------------------------------------
OPNG_0184849_00003_0 Linux Ubuntu Valid 2023-05-04T12:29:08 2023-05-06T02:11:18
Sent Time: 2023-05-04T12:29:08+0000
Due Time: 2023-05-07T12:29:08+0000
Returned: 2023-05-06T02:11:18+0000
Result-ID: 497779324
OPNG_0184849_00003_1 Fedora Linux Valid 2023-05-04T14:58:20 2023-05-08T20:41:40
Sent Time: 2023-05-04T14:58:20+0000
Due Time: 2023-05-07T14:58:20+0000
Returned: 2023-05-08T20:41:40+0000
Result-ID: 498529679
OPNG_0184849_00003_2 Linux Debian Valid 2023-05-07T15:00:34 2023-05-08T00:03:44
Sent Time: 2023-05-07T15:00:34+0000
Due Time: 2023-05-09T03:00:34+0000
Returned: 2023-05-08T00:03:44+0000
Result-ID: 501490555


Example 2:
Our task, OPNG_0184849_00005_1, missed the deadline ("Due time") by some 28 hours (!), only to find that the last needed wingman had returned their result, OPNG_0184849_00005_2, more than 24 hours (!) before we did:

$ wcgstats -wJSrr= 300214793
OPNG_0184849_00005_0 Linux Ubuntu Valid 2023-05-04T12:29:08 2023-05-04T18:11:18
OPNG_0184849_00005_1 Fedora Linux Too Late 2023-05-04T14:58:20 2023-05-08T19:06:54
OPNG_0184849_00005_2 Linux Debian Valid 2023-05-07T15:00:34 2023-05-07T18:12:07
--------------------------------------------------------------------------------------
OPNG_0184849_00005_0 Linux Ubuntu Valid 2023-05-04T12:29:08 2023-05-04T18:11:18
Sent Time: 2023-05-04T12:29:08+0000
Due Time: 2023-05-07T12:29:08+0000
Returned: 2023-05-04T18:11:18+0000
Result-ID: 497779724
OPNG_0184849_00005_1 Fedora Linux Too Late 2023-05-04T14:58:20 2023-05-08T19:06:54
Sent Time: 2023-05-04T14:58:20+0000
Due Time: 2023-05-07T14:58:20+0000
Returned: 2023-05-08T19:06:54+0000
Result-ID: 498531551
OPNG_0184849_00005_2 Linux Debian Valid 2023-05-07T15:00:34 2023-05-07T18:12:07
Sent Time: 2023-05-07T15:00:34+0000
Due Time: 2023-05-09T03:00:34+0000
Returned: 2023-05-07T18:12:07+0000
Result-ID: 501490521
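The 24-hour rule of thumb can be checked against both examples with a short script. This is a hypothetical sketch: the 24-hour window is behaviour inferred from these observations rather than a documented server setting, the function and status labels are made up, and all timestamps are assumed to be UTC, copied without the "+0000" suffix from the wcgstats output.

```python
from datetime import datetime, timedelta

# ASSUMPTION: inferred from observed results, not an official setting.
# A late result is still accepted if it arrives within 24 hours of the
# last result needed for validation.
GRACE = timedelta(hours=24)


def late_result_status(due: str, returned: str, last_needed_wingman: str) -> str:
    """Classify our result as 'Accepted (on time)', 'Accepted (late)' or 'Too Late'."""
    due_t = datetime.fromisoformat(due)
    ret_t = datetime.fromisoformat(returned)
    wing_t = datetime.fromisoformat(last_needed_wingman)
    if ret_t <= due_t:
        return "Accepted (on time)"
    # Late: accepted only while still inside the grace window after the
    # last needed wingman (a negative difference means we beat the wingman).
    return "Accepted (late)" if ret_t - wing_t <= GRACE else "Too Late"


if __name__ == "__main__":
    # Example 1: we returned 20 h 38 min after the last needed wingman.
    print(late_result_status("2023-05-07T14:58:20", "2023-05-08T20:41:40", "2023-05-08T00:03:44"))
    # Example 2: we returned almost 25 hours after the last needed wingman.
    print(late_result_status("2023-05-07T14:58:20", "2023-05-08T19:06:54", "2023-05-07T18:12:07"))
```

Run as-is, it prints "Accepted (late)" for Example 1 and "Too Late" for Example 2, matching the statuses the server assigned.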

Adri

PS I just noticed one result is now Pending Validation:
Result name        OS type   Status Sent time           Due / Return time   CPUtime/Elapsed Claimed/Granted
ARP1_0033793_102_1 Linuxmint P.Val. 2023-05-07 03:20:17 2023-05-09 06:36:50 40.95/41.1 1,548.5/0

Whoopsie, another one coming in right at this time:
ARP1_0033793_102_2 Linux Debian P.Val. 2023-05-07 03:22:47 2023-05-09 09:23:19   18.89/19        1,422/0

----------------------------------------
[Edit 1 times, last edit by adriverhoef at May 9, 2023 9:32:13 AM]
[May 9, 2023 9:29:45 AM]
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Re: Work Available

Yup, that Linux Mint device is mine. 41 hours of CPU time is astonishing. Hopefully, when a third one comes back, it proves to be what the task needed and validates without a hitch.
----------------------------------------

[May 9, 2023 1:44:25 PM]