World Community Grid Forums
Thread Status: Active Total posts in this thread: 21
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline |
I don't think the shortened due date is necessarily going to have the desired effect since this project can have very long running jobs. I recently completed a 2-day resend. It was crunching for 38.7 hours. It timed out before my rig finished it and now it is out to another wingman. If it had a 3-day due date, it would be validated now and the 4th computer could be crunching new work rather than wasting time on a WU that could be validated right now.
----------------------------------------
Cheers |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, as I noted in a previous post [heck, my last post in this thread, first line], I saw my PV count drop by 40% [my rigs do about 120 results daily at this time]. If that's no evidence, I don't know what. The 38 hour tasks are outliers, the average for MCM is about 4.7 hours at this time, 0.8 hours less than yesterday.
|
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
We have been dealing with a large build-up of data for mcm1 batches 'in-progress'. This project has larger-than-average results and is currently using 1.4TB of storage for the in-process batches, and growing. As a temporary measure to limit the impact of this, we dropped the length of time for resends from 30% of the original deadline to 20%. However, we have found another way of dealing with the issue, so I have just now restored the deadline back to 30%.
We also found a bug that allowed a resend to be assigned 100% of the original deadline in limited cases. We have fixed that bug. Now if the code determines that a user needs more time than 3 days it will simply not send it to them.
----------------------------------------
[Edit 2 times, last edit by knreed at Dec 4, 2013 7:59:27 PM] |
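For readers following the arithmetic, the percentage rule described in the post above can be sketched roughly as follows. This is only an illustration, not the actual scheduler code: the function names are hypothetical, and the 10-day original deadline is an assumption taken from the "10 day general deadline" mentioned elsewhere in this thread.

```python
from datetime import timedelta

# Assumption: the 10-day general deadline mentioned elsewhere in this thread.
ORIGINAL_DEADLINE = timedelta(days=10)


def resend_deadline(original: timedelta, percent: int) -> timedelta:
    """Deadline for a resend as a fixed percentage of the original deadline."""
    return original * percent / 100


def should_send_resend(estimated_turnaround: timedelta,
                       original: timedelta = ORIGINAL_DEADLINE,
                       percent: int = 30) -> bool:
    """Hypothetical check: skip hosts expected to need longer than the resend deadline."""
    return estimated_turnaround <= resend_deadline(original, percent)


print(resend_deadline(ORIGINAL_DEADLINE, 30))  # 3 days, 0:00:00
print(resend_deadline(ORIGINAL_DEADLINE, 20))  # 2 days, 0:00:00
```

Under these assumptions, 30% gives the 3-day resend and 20% gives the 2-day resend discussed in this thread. A host estimated to need 4 days would simply not be sent the resend.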
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"However, we have found another way of dealing with the issue so I have just not restored the deadline back to 30%."
Typo: not -> now? |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
Yup. Corrected above.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Now if the code determines that a user needs more time than 3 days it will simply not send it to them."
Return to sanity and the ability to cache a week's work and go trippin' off-line without having to sort through the received work and delete these shorties, or compute them, come back and find you're too late, too late [or task unknown]. Moreover, fewer of these queue jumpers, with those jumped sitting there with 'waiting to run'. |
||
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline |
Instead of just a fixed percentage, couldn't the deadline be the same as the other outstanding work or 30%, whichever is greater? In the example I have crunched (MCM1_0000216_8191), the other original copy is still out and has 5 days left to go. The resend that I got was for a wingman that errored out in 2-1/2 days.
----------------------------------------
Cheers |
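The "whichever is greater" suggestion in the post above might look something like this. It is a minimal sketch of the proposal only, not anything the project runs: `proposed_resend_deadline` and its parameters are hypothetical names, and the 10-day original deadline in the usage note is assumed from elsewhere in the thread.

```python
from datetime import timedelta
from typing import Iterable


def proposed_resend_deadline(original: timedelta,
                             percent: int,
                             remaining_on_other_copies: Iterable[timedelta]) -> timedelta:
    """Take the larger of the fixed-percentage deadline and the time
    still left on any other outstanding copy of the same work unit."""
    fixed = original * percent / 100
    return max([fixed, *remaining_on_other_copies])


# The case from the post: the other original copy still has 5 days left.
print(proposed_resend_deadline(timedelta(days=10), 30, [timedelta(days=5)]))
```

With a 10-day original and 30%, the fixed rule alone would give 3 days, while the case described in the post (another copy with 5 days left) would give 5 days. Note that the lookup of the other copies' remaining time is passed in here; where that information comes from is outside this sketch.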
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The trouble with that, as was explained some years ago, is that with the simple % rule and the 10-day general deadline, you get n days, irrespective of whether the repair has to go out on the first day, the 8th day or the 10th day. Giving the original deadline of course then requires formulating an exception for any task for which the original had a deadline less than the %. The main drawback was that for each assignment the scheduler would have to access the database to find what that original deadline was. Not to say that I would not be in favor; it's a load question with X million tasks and results on the live database.
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 5, 2013 7:45:20 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I think I have to explain why I opened a new thread for this 2-day deadline issue.
This is not the first time I have received 2-day deadline WUs; I have had them for other projects. However, MCM1 WUs are special. As you answered in reply to the other question, the runtimes of MCM1 WUs are unpredictable. Some may need more than 2 days to complete. In my case, I have never experienced a timeout for MCM1 WUs, but I set a 1-day buffer in BOINC (this buffer is rather smaller than that of others running 24/7), so it is very easy to exceed the timeout. As you say, if my rig times out, somebody picks it up and eventually some results will be returned, but multiple timeouts could take place. That may impact the project itself. It doesn't make sense. I have been an administrator of a datacenter for about 10 years, so I can understand that a shorter timeout saves disk space, but I believe this is not the right way to solve the disk shortage issue. Thanks. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
As knreed posted:
"However, we have found another way of dealing with the issue so I have just now restored the deadline back to 30%."
What do you do when a threatening emergency state develops [running out of disk space, which truly would have led to a full stop... see the "No more disk-space thread"], with MCM alone exceeding 1.4TB of use? You grab the emergency brake. As it turned out, the solution was put in place in about 48 hours. Why this happened: the quorum validation issue in MCMv7.26, not present in MCMv7.24, and the out-of-left-field explosive growth of contributions [No secret what came out of left field... it gave a ripple effect ;>] |
||
|