Thread Status: Active
Total posts in this thread: 3281
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Work Available

Thank you, Kevin. Now we know.

Have you allowed for 0033711_099 and 0034392_089, which got stuck again this week?

Cheers

Mike


0033711_099 has been fixed, resubmitted to the grid and validated:
https://www.worldcommunitygrid.org/contribution/workunit/127861651

0034392_089 has been fixed, resubmitted to the grid and is currently running:
https://www.worldcommunitygrid.org/contribution/workunit/131568984


[edit - and apparently entity is running 0034392_089 as noted above]
----------------------------------------
[Edit 1 times, last edit by knreed at Jan 26, 2022 3:36:03 PM]
[Jan 26, 2022 3:35:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

We are re-running the remaining 41 on our servers before we put them back on the grid. This is a slightly slow process, but it allows us to be sure that we understand the issue and that they are running properly before we send them out again. Some have to be re-run for multiple generations, and as a result we have only been able to put 4-5 back into circulation each day. With the help of Delft, we now have a way to detect the issue in the validator, so from now on we will identify it in the first generation in which it occurs and shouldn't get these stuck jobs again (we will still have to periodically re-run the jobs with a smaller step size).

I hope to have the remaining 41 running again within the next 7-10 days.

Will they come back into the grid at the same generation as when they were stuck or will they come back to the grid at a future generation?
[Jan 26, 2022 4:15:05 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Work Available

We are re-running the remaining 41 on our servers before we put them back on the grid. This is a slightly slow process, but it allows us to be sure that we understand the issue and that they are running properly before we send them out again. Some have to be re-run for multiple generations, and as a result we have only been able to put 4-5 back into circulation each day. With the help of Delft, we now have a way to detect the issue in the validator, so from now on we will identify it in the first generation in which it occurs and shouldn't get these stuck jobs again (we will still have to periodically re-run the jobs with a smaller step size).

I hope to have the remaining 41 running again within the next 7-10 days.

Will they come back into the grid at the same generation as when they were stuck or will they come back to the grid at a future generation?


I expect to put them into the grid at the same generation as when they were stuck (reserving, of course, the right to do something different if something unexpected comes up).
[Jan 26, 2022 4:48:14 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12409
Status: Recently Active
Re: Work Available

No need for any backtracking, then?

Mike
[Jan 26, 2022 9:01:57 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 268
Status: Offline
Re: Work Available

Mike,

I've got ARP1_0034245_107, a triplet with a 36 second time step.

Could that be a formerly stuck unit?

Cheers,
Mark
[Jan 27, 2022 12:44:07 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12409
Status: Recently Active
Re: Work Available

Thank you, Mark.

All the extremes are either ultras, which seem to be in the low 10000s, or unstuck units, which seem to be in the 30000s. The recently unstuck units will have 24 second time steps for a few generations before reverting to 36 seconds.

Mike
[Jan 27, 2022 3:50:24 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 268
Status: Offline
Re: Work Available

Thanks Mike.

I've now got ARP1_0034390_092, another triplet with a 36 second time step.

It has errored out on one of the initial 3 machines with an unhandled exception error.

Cheers,
Mark
[Jan 27, 2022 5:18:31 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12409
Status: Recently Active
Re: Work Available

That is one to watch. 34392, which could be near, errored out in 089.

Mike
[Jan 27, 2022 5:27:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

That is one to watch. 34392, which could be near, errored out in 089.

Mike

It is still running after 33 hours. Supposedly, it has about 10 hours left.
[Jan 27, 2022 9:36:06 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 971
Status: Recently Active
Re: Work Available

Mike,

Just noticed I've got ARP1_0033880_104_1, which is part of a triplet with a 36 second time step. For what it's worth, it's running the 32-bit executable :-(

Cheers - Al.
[Jan 27, 2022 10:07:53 PM]