Total posts in this thread: 179
This topic has been viewed 558132 times and has 178 replies
Psalm103
Cruncher
Joined: Jan 6, 2007
Post Count: 24
Re: New Beta Test for PC v7.10 - August 25, 2015 [ Issues Thread ]

I realize that run-time projections can be very wrong. The unusual thing here was that, after the reboot, the time between steps nearly doubled on all 6 beta WUs I had at the time. They reached 50% in about 5.5-6 hours, but then ended up taking around 17-19 hours each to complete.

Thanks,
Ed
[Aug 29, 2015 7:13:54 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

Interesting ones In Progress:
BETA_ avx101118-031_ r6_ 1_ wcgfahb00060000_ 0--
BETA_ avx101118-075_ r8_ 1_ wcgfahb00070000_ 0--
Yeah, interesting ones.
This numbering was expected to show up at some point.
Parent tasks must be from the 1st batch, where the user either aborted the task or exceeded the deadline (more probably) when the tasks were at 60% or 70% completion.

It would be nice to know whether, when the deadline is missed, the task will be server-aborted or run to 100%.
I think Keith is the one to enlighten us on that.

This is normal. There are a few instances where this happens. One is when the user has returned a trickle message that was determined to be invalid. When that happens, the next work unit is generated from the last valid point of the work unit. This allows the larger run of, say, 3 million steps to proceed without having to restart a given work unit from the beginning, which cuts down on wasted CPU cycles.

Thanks,
-Uplinger

That did not fully answer my question.
What happens to the task on the client's machine?
- When a trickle turns out to be invalid, will the rest of the client task be aborted by the server?

A few things happen when a result is determined invalid based on the trickle message and intermediate upload (both are required to validate a 10k step section).
1. The result will be marked as returned and ready for validation. This allows the validator to make sure that everything is in order and create the next work unit based on the step it completed. If a machine has completed 40k valid steps, then the result is marked valid and the next result is sent starting at the 40k mark. If a machine returns trickle messages and the first one is invalid, then the whole result is considered invalid and a fresh copy is sent out.
2. The user will be sent a trickle up message to "hard stop", which means just stop what you're doing now and check in with the server. This gives the client, on its next scheduler request, a heads-up that "hey, you only completed 40% good, we don't want you to continue." There will be some lost CPU time here, as it is based on the next scheduler request, but the same would go for a server abort.
3. Once the result is assimilated up to the x% valid, and it has been assimilated for a full day, the result will be purged from the database, which also tells the client to stop (server abort).
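
The first step above can be sketched roughly as follows. This is only a reader's illustration, not WCG's validator code: the function name `next_start_step`, the constant `SECTION_STEPS`, and the list-of-booleans input are all assumptions made for the example.

```python
SECTION_STEPS = 10_000  # one trickle + intermediate upload per 10k-step section (from the post)


def next_start_step(section_valid):
    """Return the step at which the follow-up work unit would start.

    section_valid: one boolean per returned 10k-step section, True when both
    the trickle message and the intermediate upload validated.
    Hypothetical helper; the real validator is not shown in this thread.
    """
    good_sections = 0
    for ok in section_valid:
        if not ok:
            break  # nothing after the first invalid section is counted
        good_sections += 1
    # 0 means the first section was invalid: the whole result is invalid
    # and a fresh copy goes out from the beginning.
    return good_sections * SECTION_STEPS


print(next_start_step([True, True, True, True, False]))  # 40000
print(next_start_step([False, True, True]))              # 0
```

A return value of 0 corresponds to the "fresh copy" case; anything else is the point from which the next result would resume.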

- When the task is at, e.g., 60% when the deadline is reached, will the task be server-aborted for the remaining 40%?
Answered in the last section, I believe.

Another interesting point.
The trickle message is requested before the corresponding upload files are received by the server.
Would it be better to upload the files successfully first, before sending/requesting a trickle message?

We have a check in the trickle message handler that keeps re-checking for X minutes. If an intermediate upload is not sent back within X minutes (set to 30 right now), then the result is marked as ready for validation, as step 1 above says.

29 Aug 18:32:06 [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_BETA_avx101118-002_r0_1_wcgfahb00200000_0_1440865872.xml
29 Aug 18:32:06 Sending scheduler request: To send trickle-up message.
29 Aug 18:34:40 Started upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_3
29 Aug 18:34:40 Started upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_13
29 Aug 18:34:44 Finished upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_3
29 Aug 18:34:44 Finished upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_13
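
As a rough check of the log excerpt above against the 30-minute window, the gap between the trickle-up request and the last finished upload can be computed as below. This is a sketch under assumptions: the log lines carry no year, so 2015 is taken from the post date, and client-side log times are only an approximation of when the server actually saw each event.

```python
from datetime import datetime, timedelta

WAIT_WINDOW = timedelta(minutes=30)  # "X minutes (set to 30 right now)"
ASSUMED_YEAR = 2015  # not in the log; taken from the date of the post

LOG = """\
29 Aug 18:32:06 Sending scheduler request: To send trickle-up message.
29 Aug 18:34:40 Started upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_3
29 Aug 18:34:40 Started upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_13
29 Aug 18:34:44 Finished upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_3
29 Aug 18:34:44 Finished upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_13
"""


def stamp(line):
    """Parse the leading 'DD Mon HH:MM:SS' of a client log line."""
    day, month, clock = line.split()[:3]
    return datetime.strptime(f"{ASSUMED_YEAR} {day} {month} {clock}",
                             "%Y %d %b %H:%M:%S")


lines = LOG.splitlines()
trickle_sent = next(stamp(l) for l in lines if "trickle-up" in l)
last_upload = max(stamp(l) for l in lines if "Finished upload" in l)
gap = last_upload - trickle_sent

print(gap)                 # 0:02:38
print(gap <= WAIT_WINDOW)  # True
```

In this excerpt the uploads finish well inside the window, so the ordering of trickle before upload would not by itself cause a problem here.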



Hope this helps better answer your questions.

Thanks,
-Uplinger
[Aug 29, 2015 9:58:50 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1297

Interesting reading. It raises a question for me.
1. The result will be marked as returned and ready for validation. This allows the validator to make sure that everything is in order and create the next work unit based on the step it completed. If a machine has completed 40k valid steps, then the result is marked valid and the next result is sent starting at the 40k mark. If a machine returns trickle messages and the first one is invalid, then the whole result is considered invalid and a fresh copy is sent out.

We have a check in the trickle message handler that keeps re-checking for X minutes. If an intermediate upload is not sent back within X minutes (set to 30 right now), then the result is marked as ready for validation, as step 1 above says.

If a user decides to have networking disabled for longer than 30 minutes, what will the server do? Would it mark the job invalid?
[Aug 29, 2015 10:52:30 PM]
Mathilde2006
Senior Cruncher
Germany
Joined: Sep 30, 2006
Post Count: 269

Does it surprise you?

BETA_ avx101118-033_ r18_ 1_ 1-- - No Reply Sent Time 8/29/15 18:01:10 Due Time 8/29/15 18:01:10 0.00 0.0 / 0.0


I got two (no reply) betas on two clients.
Both betas are still running, but one client was apparently in the future, powered by a flux capacitor, and returned the result exactly at the due time:
BETA_ avx101118-008_ r12_ 1_ 1-- - In Progress 29.08.15 18:00:30 31.08.15 03:36:29 3.24 38.8 / 0.0
wingman:
BETA_ avx101118-008_ r12_ 1_ 0-- - No Reply 25.08.15 18:00:23 29.08.15 18:00:23 32.82 310.8 / 0.0

The client is still crunching here: 13% at 4.5 hours.
A 36-hour return time is very short.
It could be too late on my 1.5 GHz AMD Turion.

Update
Over the last seven hours my second beta WU has also claimed time, but it is still running without a wingman:

BETA_ avx101118-072_ r19_ 1_ wcgfahb00100000_ 0-- - In Progress 29.08.15 19:07:36 02.09.15 19:07:36 6.22 155.4 / 0.0

Crunching at 46% after 7 hours.
Core2 Quad 2.66 GHz
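
For what it is worth, the two progress figures in this post can be extrapolated linearly to a total run time. This is a minimal sketch that assumes the remaining work proceeds at the same rate, which (as noted earlier in the thread) run-time projections often do not; the function name is made up for the example.

```python
def projected_total_hours(hours_so_far, fraction_done):
    """Linear extrapolation: assumes the remaining work runs at the same rate."""
    return hours_so_far / fraction_done


turion = projected_total_hours(4.5, 0.13)  # 13% at 4.5 hours
core2 = projected_total_hours(7.0, 0.46)   # 46% after 7 hours

print(round(turion, 1))  # 34.6
print(round(core2, 1))   # 15.2
print(turion < 36)       # True, but only just inside a 36-hour return time
```

On these numbers the Turion would finish with well under two hours to spare, and only if it crunches without interruption.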
[Edit 4 times, last edit by Mathilde2006 at Aug 30, 2015 7:13:27 AM]
[Aug 30, 2015 12:28:25 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Interesting reading. It raises a question for me.
1. The result will be marked as returned and ready for validation. This allows the validator to make sure that everything is in order and create the next work unit based on the step it completed. If a machine has completed 40k valid steps, then the result is marked valid and the next result is sent starting at the 40k mark. If a machine returns trickle messages and the first one is invalid, then the whole result is considered invalid and a fresh copy is sent out.

We have a check in the trickle message handler that keeps re-checking for X minutes. If an intermediate upload is not sent back within X minutes (set to 30 right now), then the result is marked as ready for validation, as step 1 above says.

If a user decides to have networking disabled for longer than 30 minutes, what will the server do? Would it mark the job invalid?

It's a harder read, but my take is:

1) The trickle message request is sent for the 10K step upload.
2) The server waits at most 30 minutes for the upload to arrive.
3) No upload, the end. (Better get your BOINC version up to scratch ;)

This can very well happen for those on, e.g., a scheduled connection period. If that is set to, say, 08:00 to 18:00 and the trickle message is sent at 7:59:58, the client will cut off networking at 08:00 and the upload is left in limbo.

Maybe WCG may wish to ask Dr. Anderson to modify the scheduling, which is already designed to report immediately in the 30 minutes before cut-off time: if an upload/trickle cycle is in progress, hold off flipping the network switch until the upload is finished and none are queued. A 10K block takes 2.1 to 5.5 hours on my computers, going by this test. Swept aside in the 'unfortunate' event? I can see cries of frustration coming, if this is the correct interpretation.
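
The scheduling change suggested above amounts to a small rule, sketched here for illustration. Every name and parameter in it is hypothetical; it is not the BOINC client's API and says nothing about how the client actually decides when to drop the network.

```python
from datetime import datetime


def may_disable_network(now, cutoff, uploads_in_progress, uploads_queued):
    """Suggested rule: at cut-off time, keep the network up until transfers drain.

    All four parameters are hypothetical stand-ins for client state.
    """
    if now < cutoff:
        return False  # cut-off time has not been reached yet
    return uploads_in_progress == 0 and uploads_queued == 0


cutoff = datetime(2015, 8, 30, 8, 0, 0)
just_after = datetime(2015, 8, 30, 8, 0, 5)

print(may_disable_network(just_after, cutoff, uploads_in_progress=1, uploads_queued=0))  # False
print(may_disable_network(just_after, cutoff, uploads_in_progress=0, uploads_queued=0))  # True
```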

There's a rush, it appears, but if the deadline is 7-10 days for the whole work unit, I wonder. Also, we have regular work that takes less than an hour. A trickle is 1-5.5 hours of computing, and maybe some had even longer times. The actual mean times of the beta for the past 4 days, in hours:

13.7383
19.1683
17.5747
19.6399

i.e. 1.37 to 1.96 hours between uploads [assuming all were 100K results]
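
The per-upload figures follow from dividing each daily mean by the number of 10K blocks in a result. A minimal sketch, using the same assumption stated above that every result was a full 100K steps:

```python
mean_hours = [13.7383, 19.1683, 17.5747, 19.6399]  # daily means quoted above
BLOCKS_PER_RESULT = 100_000 // 10_000  # assumes 100K-step results, 10K steps per upload

per_block = [round(h / BLOCKS_PER_RESULT, 2) for h in mean_hours]
print(per_block)  # [1.37, 1.92, 1.76, 1.96]
```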

Plan B: maybe just get rid of the trickle notion. Yes, it's fun playing with features, but this whole complexity may not be needed when we have production tasks of under 1 hour. I just wonder, if instead of 20-22 computers being involved in the 2M steps there were 200-220, how close the last step would be compared to the whole 2M being computed on 1 computer. It'd be interesting to understand how QC is maintained on such a jumble of computer results being concatenated; not my problem, but interesting nonetheless.
[Aug 30, 2015 7:24:31 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323

text
text
text



Hope this helps better answer your questions.

Thanks,
-Uplinger

Keith, thanks for your enlightenment.
[Aug 30, 2015 7:26:08 AM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1297

Thanks, Rob, for the explanation. It will certainly be interesting to see what happens when this phase (if that is the correct terminology) goes live.
[Aug 30, 2015 8:23:21 AM]
vepaul
Senior Cruncher
Belgium
Joined: Nov 17, 2004
Post Count: 261

One error, others OK

Beta test

Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed/Granted BOINC Credit
BETA_ avx101118-004_ r10_ 1_ wcgfahb00400000_ 0-- | Bureau2-HP | In Progress | 29/08/15 19:07:05 | 02/09/15 19:07:05 | 10.69 / 0.00 | 310.8 / 0.0
BETA_ avx101118-009_ r15_ 1_ wcgfahb100000_ 0-- | Bureau2-HP | Valid | 27/08/15 10:37:45 | 28/08/15 13:18:28 | 20.76 / 22.70 | 455.0 / 455.0
BETA_ avx101118-036_ r2_ 1_ 0-- | paul-HP2 | Error | 25/08/15 18:03:18 | 29/08/15 18:03:24 | 0.00 / 0.00 | 484.1 / 0.0
BETA_ avx101118-074_ r14_ 1_ 0-- | Bureau2-HP | Valid | 25/08/15 17:59:33 | 26/08/15 14:07:43 | 13.17 / 16.48 | 457.1 / 457.1
BETA_ avx101118-005_ r6_ 1af_ 3-- | paul-HP2 | Valid | 04/08/15 05:42:28 | 04/08/15 19:57:36 | 11.81 / 11.88 | 353.2 / 443.9
BETA_ avx101118-005_ r0_ 1ar_ 0-- | paul-HP2 | Valid | 29/07/15 18:19:55 | 30/07/15 09:14:26 | 12.86 / 12.94 | 384.9 / 453.8
BETA_ avx101118-005_ r1_ 1cs_ 1-- | Bureau2-HP | Valid | 29/07/15 18:15:59 | 30/07/15 15:25:48 | 14.46 / 17.72 | 489.6 / 506.1
----------------------------------------
[Edit 1 times, last edit by vep at Aug 30, 2015 9:02:40 AM]
[Aug 30, 2015 8:57:41 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697

I was curious about the figures for the latest beta so I made a little spreadsheet just for this beta.



I have one completed at 27.18 hours on Linux with 99.92% efficiency on a Xeon E5410. I have two more in progress on other machines, one of them looking to go about 52 hours on an AMD 9150e at 1.8 GHz with Win 7. I hope they allow at least a 10-day window for completion once the project is up and running. Based on the chart, the bulk of the units have been completed by some pretty fast machines, but some units have been done by some pretty slow machines. If individual results were available, it would be possible to calculate the median value, which may be a better indicator of the machine speed distribution. The overall mean time is 18.39 hours.
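
To illustrate why the median might tell a different story than the mean, here is a small sketch using only run times that individual posters quoted in this thread (five valid results from vepaul's table, some from earlier beta batches, plus the 27.18-hour result above). It is a tiny, mixed sample chosen for illustration, not the project-wide data behind the 18.39-hour mean.

```python
from statistics import mean, median

# Run times in hours as quoted by posters in this thread; a handful of
# values only, mixing batches and machines, so purely illustrative.
run_hours = [20.76, 13.17, 11.81, 12.86, 14.46, 27.18]

print(round(mean(run_hours), 1))    # 16.7
print(round(median(run_hours), 1))  # 13.8
```

A couple of slow results pull the mean well above the median, which is the effect described above.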

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Aug 30, 2015 11:24:26 AM]
[Aug 30, 2015 11:18:08 AM]
fablefox
Senior Cruncher
Joined: May 31, 2010
Post Count: 161

The deadline date: is it international (IBM time?) or local (my time?)

I may have to abort it since it has 21 hours remaining. Better off giving it to those with better hardware.

But I thought the client would not run a task if it thinks it won't have enough time, right?
[Aug 30, 2015 2:04:07 PM]