adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2171
Status: Recently Active
Re: Work Available

Mike, I've been running one task from a triplet for 37 hours with the 32-bit application:
../../projects/www.worldcommunitygrid.org/wcgrid_arp1_wrf_7.32_i686-pc-linux-gnu

workunit 101956363
ARP1_0033475_101_0  Linux Ubuntu  Valid  2022-01-02T15:38:26  2022-01-05T00:47:49  44.38/45.90  1451.3/1472.6
ARP1_0033475_101_1  Linux Fedora  Valid  2022-01-02T15:36:25  2022-01-04T11:11:24  36.06/36.99  1493.9/1472.6
ARP1_0033475_101_2  Linuxmint     InPrg  2022-01-02T15:31:07  2022-01-07T03:31:07   0.00/0.00      0.0/0.0

Details:
ARP1_0033475_101_1  Linux Fedora  P.Val  2022-01-02T15:36:25  2022-01-04T11:11:24   36.06/36.99   1493.9/0.0   
Logfile:
<core_client_version>7.16.11</core_client_version>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[03:38:37] INFO: Checkpoint taken at 2019-01-19_06:00:00
[08:26:21] INFO: Checkpoint taken at 2019-01-19_12:00:00
[13:19:17] INFO: Checkpoint taken at 2019-01-19_18:00:00
[17:38:31] INFO: Checkpoint taken at 2019-01-20_00:00:00
[22:14:35] INFO: Checkpoint taken at 2019-01-20_06:00:00
[03:13:48] INFO: Checkpoint taken at 2019-01-20_12:00:00
[07:55:55] INFO: Checkpoint taken at 2019-01-20_18:00:00
[12:07:58] INFO: Checkpoint taken at 2019-01-21_00:00:00
INFO: Simulation complete compressing output.
12:09:46 (1083638): called boinc_finish(0)

</stderr_txt>

Normal 64-bit execution time for ARP1 tasks on that device is about 12 hours.
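
For anyone who wants to check the pace from a log like this, here is a minimal Python sketch that measures the wall-clock gap between consecutive checkpoints. It assumes the bracketed times are local wall-clock times of day and that consecutive checkpoints are less than 24 hours apart (so a drop in the clock time means midnight was crossed once). The function name `checkpoint_gaps` is made up for illustration and is not part of BOINC or the ARP1 application.

```python
import re
from datetime import timedelta

# Checkpoint lines copied from the stderr log in this post.
LOG = """\
[03:38:37] INFO: Checkpoint taken at 2019-01-19_06:00:00
[08:26:21] INFO: Checkpoint taken at 2019-01-19_12:00:00
[13:19:17] INFO: Checkpoint taken at 2019-01-19_18:00:00
[17:38:31] INFO: Checkpoint taken at 2019-01-20_00:00:00
[22:14:35] INFO: Checkpoint taken at 2019-01-20_06:00:00
[03:13:48] INFO: Checkpoint taken at 2019-01-20_12:00:00
[07:55:55] INFO: Checkpoint taken at 2019-01-20_18:00:00
[12:07:58] INFO: Checkpoint taken at 2019-01-21_00:00:00
"""

CHECKPOINT = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] INFO: Checkpoint taken at \S+")


def checkpoint_gaps(log):
    """Return the wall-clock time between consecutive checkpoint lines."""
    gaps = []
    previous = None
    for hours, minutes, seconds in CHECKPOINT.findall(log):
        current = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
        if previous is not None:
            gap = current - previous
            if gap < timedelta(0):
                # Assumption: the clock wrapped past midnight exactly once.
                gap += timedelta(days=1)
            gaps.append(gap)
        previous = current
    return gaps


gaps = checkpoint_gaps(LOG)
total = sum(gaps, timedelta(0))
print("gaps:", [str(g) for g in gaps])
print("total:", total, "mean:", total / len(gaps))
```

For the log above this gives seven gaps adding up to 32:29:21, so each 6-hour simulated block took a bit over 4.6 hours of wall-clock time on average.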

EDIT: Two results are in.
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jan 5, 2022 1:09:50 AM]
[Jan 4, 2022 3:52:59 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12435
Status: Offline

101 is one of the generations involved in the stuck units which Kevin unstuck for us. Here is his post on the subject:

"A quick note. The test of changing the time-step was successful and the test workunits are moving forward. As a result, all of the workunits in the same situation have had the same time-step modification applied and they are now running. This gets about 60 of the stuck work-units moving again.

"The other 70 look like they will need to be backed up a couple of generations and then have the same change applied. I won't be able to get to that until next week. However at that point we will then have all of the workunits moving again."

[Jan 4, 2022 4:24:15 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 990
Status: Recently Active

Having the unstuck WUs take more time per WU while having a shorter window might cause some issues. Issuing them as triplets might make it work. It will be interesting to see what happens. It might work better to give the triplets a longer window now that the time-step has been modified.
[Jan 4, 2022 4:36:50 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12435
Status: Offline

There are only 63 units in the extreme category. Hopefully, they will only go to the fastest machines, but the triplet aspect should cover it. However, we are expecting the other 70 next week, so it could become a problem.

Bear in mind the extra 24 hours allowed these days before a resend.

Mike
[Jan 4, 2022 6:00:36 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7695
Status: Offline

I got one of those triple stuck units ARP1_0033946_091. I'll see how it runs.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jan 4, 2022 6:01:10 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12435
Status: Offline

Please bear in mind that the initial issue of the unstuck units would probably be twins and would only become triplets when the next generation is created.

Mike
[Jan 4, 2022 6:53:53 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

Mike.Gibson wrote: "Bear in mind the extra 24 hours allowed these days before a resend."


I would think that would make the machine unreliable. Don't the WUs have to be returned within 2 days for the host to be considered reliable? Once it is unreliable, you won't get the priority work anymore and the pool of reliable hosts shrinks.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jan 4, 2022 8:38:56 PM]
[Jan 4, 2022 8:38:28 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12435
Status: Offline

I'm not sure of the actual qualification criteria. There was some debate about it early on, but it is not a permanent demotion. It makes it more important to keep your cache to a minimum, so as not to lose time queueing.

However, the extra 24 hours means your time would not be wasted.

Mike
[Jan 4, 2022 9:26:03 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 277
Status: Offline

Sgt.Joe wrote: "I got one of those triple stuck units ARP1_0033946_091. I'll see how it runs."


I managed to snag ARP1_0033475_101_0, and sure enough it's running the 32-bit application. I was wondering why this one was progressing so slowly; it's showing over 42 hours of CPU time. It still has over two hours to go before it finally finishes, though the time estimates for ARP can sometimes be a bit wacky.
[Jan 4, 2022 9:48:22 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7695
Status: Offline

Sgt.Joe wrote: "I got one of those triple stuck units ARP1_0033946_091. I'll see how it runs. Cheers"

I do not know if it is a 32-bit application, but it certainly is running slowly. The machine it is on generally runs ARP units in 22 to 25 hours, but this unit is showing about 51% done after a little over 24 hours. That is roughly twice as long as other units have taken on this machine.
Cheers
Edit: It is a 32-bit unit.
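
As a rough check of the "twice as long" estimate, here is a small Python sketch that extrapolates the total run time from the progress figure. It assumes progress is linear in elapsed time, which is only an approximation for ARP, and it rounds "a little over 24 hours" down to 24. The helper name `projected_total_hours` is hypothetical.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Linear extrapolation of total run time from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures from the post: about 51% done after roughly 24 hours.
total = projected_total_hours(24.0, 0.51)
print(f"projected total: {total:.1f} hours")

# Compare with the usual 22 to 25 hours for ARP units on this machine.
for usual in (22.0, 25.0):
    print(f"vs {usual:.0f} h usual: {total / usual:.2f}x")
```

Under those assumptions the projected total is about 47 hours, which is between 1.9 and 2.1 times the usual run time.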
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Jan 5, 2022 12:34:46 PM]
[Jan 4, 2022 10:20:26 PM]