Total posts in this thread: 38
This topic has been viewed 2365 times and has 37 replies
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1292
Status: Offline
Re: All tasks failing on linux host

If ARP work can be processed adequately on a host, I do not see any reason for that work to stop. The choice is completely up to the individual who owns the hardware ARP is running on.
----------------------------------------

[Nov 10, 2024 8:39:44 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2161
Status: Offline
Re: All tasks failing on linux host

- there are no breakpoints: I had to reboot when all tasks had 20 hours of calculation, and most of them restarted from 0 (but a few restarted with a few hours of calculation, very strange)

You should know that ARP1 tasks have only 8 checkpoints, spaced evenly across the run, that is, one every 12.5% of progress. Some devices are even slower than yours:

ARP1_0030988_140_3  Linux Ubuntu  Valid  2024-11-06T14:54:43  2024-11-11T06:46:56   49.45/51.46    817.7/652.8
OS-Version: Ubuntu 24.04.1 LTS [6.8.0-48-generic|libc 2.39]
Logfile:
<core_client_version>7.24.1</core_client_version>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[01:03:44] INFO: Checkpoint taken at 2019-04-07_06:00:00
[12:50:18] INFO: Checkpoint taken at 2019-04-07_12:00:00
[00:26:10] INFO: Checkpoint taken at 2019-04-07_18:00:00
[09:12:43] INFO: Checkpoint taken at 2019-04-08_00:00:00
[18:59:25] INFO: Checkpoint taken at 2019-04-08_06:00:00
[07:41:20] INFO: Checkpoint taken at 2019-04-08_12:00:00
[20:23:59] INFO: Checkpoint taken at 2019-04-08_18:00:00
[05:38:51] INFO: Checkpoint taken at 2019-04-09_00:00:00
INFO: Simulation complete compressing output.
05:45:39 (244552): called boinc_finish(0)

</stderr_txt>

In the ARP1 run above you can see that this device needed more than 12 hours to get from the task's 6th checkpoint to its 7th. If you know you must reboot, it is better to do it right after a checkpoint is reached, or to pause your task right after a checkpoint.
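
To check the gaps between checkpoints in a log like the one above, something along these lines could be used. It is only a rough sketch: it assumes the bracketed stamps are wall-clock times of day, and that fewer than 24 hours pass between two consecutive checkpoints, so a stamp that is earlier than the previous one means midnight was crossed once. The gaps are wall-clock time, so any period the task spent paused is included. The function name `checkpoint_gaps` is made up for this example.

```python
import re
from datetime import timedelta

# Checkpoint lines copied from the stderr log above.
LOG = """\
[01:03:44] INFO: Checkpoint taken at 2019-04-07_06:00:00
[12:50:18] INFO: Checkpoint taken at 2019-04-07_12:00:00
[00:26:10] INFO: Checkpoint taken at 2019-04-07_18:00:00
[09:12:43] INFO: Checkpoint taken at 2019-04-08_00:00:00
[18:59:25] INFO: Checkpoint taken at 2019-04-08_06:00:00
[07:41:20] INFO: Checkpoint taken at 2019-04-08_12:00:00
[20:23:59] INFO: Checkpoint taken at 2019-04-08_18:00:00
[05:38:51] INFO: Checkpoint taken at 2019-04-09_00:00:00
"""

CHECKPOINT_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] INFO: Checkpoint taken at")


def checkpoint_gaps(log_text):
    """Return the wall-clock gap between consecutive checkpoint lines."""
    times = []
    for line in log_text.splitlines():
        match = CHECKPOINT_RE.match(line)
        if match:
            hours, minutes, seconds = map(int, match.groups())
            times.append(timedelta(hours=hours, minutes=minutes, seconds=seconds))

    gaps = []
    for previous, current in zip(times, times[1:]):
        gap = current - previous
        if gap <= timedelta(0):
            # Assumption: the clock rolled over midnight exactly once.
            gap += timedelta(days=1)
        gaps.append(gap)
    return gaps


for number, gap in enumerate(checkpoint_gaps(LOG), start=2):
    print(f"checkpoint {number - 1} -> {number}: {gap}")
```

Under those assumptions the longest gap in this log is the one between the 6th and 7th checkpoints, a bit under 12 hours 43 minutes, which is how much work a reboot just before the 7th checkpoint would throw away.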

Adri
[Nov 11, 2024 9:52:36 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Status: Offline
Re: All tasks failing on linux host

This combined with the previous issues (status too late, download terror) = I'm done for now with ARP.

ARP has steep requirements for a reason i.e. don't try to run it on a machine with inadequate resources. I think you are right in giving up on ARP. 36 hours for an ARP unit indicates although your machine can adequately process ARP units, it is probably marginal at best.
The processing speed of a host for a single ARP1 WU is not the issue. The real problem is what our friend isn't explicitly stating: that he is likely trying to run multiple WUs at once, in defiance of the default restrictions set when the ARP project was first introduced years ago. Looking at his badges, I think it is safe to assume (as far as WCG is concerned) that he likely isn't running more than one host to crunch on WCG projects.
And if he has 15 ARP1 WUs to terminate on one host, that means he is just one of those hoarders and doesn't understand why those restrictions have been put in place.
It's 1GB of RAM per ARP1 WU, so if he has 15 of them to terminate, those alone take up 15GB of RAM. Add another couple of GB for the OS itself, and consider that, by a general rule of thumb, RAM usage should never exceed 80% of the physical RAM installed: this would mean that he would have to have at least 20GB of RAM in the system.
With anything less than that, the system will start swapping excessively to disk (maybe less noticeable immediately on an SSD system), which will drive up the time it takes to process each single WU and thus the (clock) time between checkpoints, which are, as you mentioned, fixed at every 12.5%.
The fact that he even noticed this WU restart means that he is resetting the processing constantly (laptop moving around, sleep settings?), thus never passing that first 12.5% checkpoint.
So all his complaints seem completely self-induced...
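
The RAM estimate in this post can be written out as a quick calculation. This is a back-of-the-envelope sketch, not an official sizing rule: the 1GB per WU and the 80% ceiling are the figures given in the post, "a couple of GB" for the OS is taken as 2GB here, and the function name `min_ram_gb` is made up for the example.

```python
def min_ram_gb(tasks, gb_per_task=1.0, os_overhead_gb=2.0, max_usage=0.80):
    """Physical RAM needed so that the tasks plus the OS stay under
    max_usage of the installed memory (all figures are assumptions)."""
    return (tasks * gb_per_task + os_overhead_gb) / max_usage


# 15 concurrent ARP1 WUs, as in the post.
print(f"{min_ram_gb(15):.2f} GB")
```

With 15 WUs this comes out slightly above 21GB, in line with the "at least 20GB" figure.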


Ralf
----------------------------------------

[Nov 11, 2024 4:04:02 PM]
catchercradle
Advanced Cruncher
Joined: Jan 16, 2009
Post Count: 127
Status: Offline
Re: All tasks failing on linux host

1GB RAM/core is not exactly high end these days. I can happily run 12 or more concurrently and still be using only 20% of the memory on my machine. That said, in testing for other projects I have run tasks that peak at 8GB each.

The other thing that slows down ARP tasks is that they hammer the level 3 (L3) cache on the CPU.
I find maximum throughput on my machine is at 15 out of 16 real cores. Using virtual cores does not help throughput on these tasks!

Edit: Just been reading the whole thread. Lots of work in VM now coming to an end so I will gradually ramp up the number of tasks in native Linux client as it seems the problems were machine related and others are completing tasks in Linux OK.
----------------------------------------
[Edit 1 times, last edit by catchercradle at Nov 11, 2024 5:09:09 PM]
[Nov 11, 2024 5:02:20 PM]
gj82854
Advanced Cruncher
Joined: Sep 26, 2022
Post Count: 104
Status: Offline
Re: All tasks failing on linux host

I'm running 32 ARP1 WUs concurrently on one host and it is using 23.444 GB memory (less than 50% of the memory). Total run time per WU is about 10.5 to 11 hours
[Nov 11, 2024 5:18:54 PM]
Boca Raton Community HS
Advanced Cruncher
Joined: Aug 27, 2021
Post Count: 126
Status: Offline
Re: All tasks failing on linux host

I'm running 32 ARP1 WUs concurrently on one host and it is using 23.444 GB memory (less than 50% of the memory). Total run time per WU is about 10.5 to 11 hours



And here we are, where just getting the files to run 1 ARP1 work unit is a cause for celebration on our end....
[Nov 11, 2024 5:24:57 PM]
gj82854
Advanced Cruncher
Joined: Sep 26, 2022
Post Count: 104
Status: Offline
Re: All tasks failing on linux host

I'm running 32 ARP1 WUs concurrently on one host and it is using 23.444 GB memory (less than 50% of the memory). Total run time per WU is about 10.5 to 11 hours



And here we are, where just getting the files to run 1 ARP1 work unit is a cause for celebration on our end....

I downloaded 48 ARP1 tasks in about 3 hours this morning. It's hard for me to believe they have given me priority in the download queue. I would like to think it is a FIFO queue.
[Nov 11, 2024 7:29:13 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12376
Status: Offline
Re: All tasks failing on linux host

Once the empty caches have filled, demand slows down. Everyone was totally empty this time, so it has taken a while, but they should be able to cope with the demand now that we are about full.

Mike
[Nov 11, 2024 9:04:42 PM]
[AF>Le_Pommier] Jerome_C2005
Cruncher
Joined: Aug 17, 2006
Post Count: 29
Status: Offline
Re: All tasks failing on linux host

This combined with the previous issues (status too late, download terror) = I'm done for now with ARP.

ARP has steep requirements for a reason i.e. don't try to run it on a machine with inadequate resources. I think you are right in giving up on ARP. 36 hours for an ARP unit indicates although your machine can adequately process ARP units, it is probably marginal at best.

The processing speed of a host for a single ARP1 WU is not the issue. The real problem is what our friend isn't explicitly stating: that he is likely trying to run multiple WUs at once, in defiance of the default restrictions set when the ARP project was first introduced years ago. Looking at his badges, I think it is safe to assume (as far as WCG is concerned) that he likely isn't running more than one host to crunch on WCG projects.
And if he has 15 ARP1 WUs to terminate on one host, that means he is just one of those hoarders and doesn't understand why those restrictions have been put in place.
It's 1GB of RAM per ARP1 WU, so if he has 15 of them to terminate, those alone take up 15GB of RAM. Add another couple of GB for the OS itself, and consider that, by a general rule of thumb, RAM usage should never exceed 80% of the physical RAM installed: this would mean that he would have to have at least 20GB of RAM in the system.
With anything less than that, the system will start swapping excessively to disk (maybe less noticeable immediately on an SSD system), which will drive up the time it takes to process each single WU and thus the (clock) time between checkpoints, which are, as you mentioned, fixed at every 12.5%.
The fact that he even noticed this WU restart means that he is resetting the processing constantly (laptop moving around, sleep settings?), thus never passing that first 12.5% checkpoint.
So all his complaints seem completely self-induced...

Ralf

I have an i9 with 20 threads and 40 GB of RAM, it's a fix computer, Sherlock.

I had to reboot it (once) when most tasks had over 20 hours calculating, and most of them restarted from 0. So the "breakpoints" were just implemented with 2 left feet.

Anyway I stopped trying, hoarder.
----------------------------------------

[Nov 12, 2024 7:48:18 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7666
Status: Offline
Re: All tasks failing on linux host

I have an i9 with 20 threads and 40 GB of RAM, it's a fix computer

Given the specs of your computer, there really should be no way the ARP work units take 36 hours. If you are running all twenty threads with ARP units, I would suspect your computer is doing some self-throttling due to heat issues. If that is not the case, you must have some other bottleneck in the processing stream someplace. I have an i7-7700 with 8GB RAM and ARP units take 18-20 hours. I only run 2 threads on ARP; the rest are MCM.

I am curious to know what a "fix" computer is.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Nov 12, 2024 9:04:19 PM]