adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2166
Re: Time to Retire My GPU?

Bryn Mawr:
So both machines are now downloading and successfully running tasks in just over 11 hours.
However, about 15-20% of the tasks fail at 7.88 hours with time limit exceeded.
Is there any reason why some tasks have a shorter time limit? More importantly, is there any way of predicting which jobs will be affected?

adriverhoef:
Read through this thread and you might be able to understand and correct the problem on your machine.

Adri

Bryn Mawr:
No, purely based on the runtime.

I'm sorry, Bryn, then I must have misunderstood your problem.

Anyway, the runtime depends on the number of jobs inside each OPNG task, and most of the time that number is around 80:
OPNG_0176011_00090, jobs: 81 OPNG_0175999_00121, jobs: 81
OPNG_0176011_00121, jobs: 81 OPNG_0175420_00034, jobs: 79
OPNG_0175965_00085, jobs: 81 OPNG_0175875_00072, jobs: 78
OPNG_0175965_00071, jobs: 81 OPNG_0175892_00110, jobs: 77
OPNG_0175965_00076, jobs: 80 OPNG_0175949_00044, jobs: 83
OPNG_0176011_00098, jobs: 81 OPNG_0175949_00054, jobs: 84
OPNG_0176011_00057, jobs: 81 OPNG_0175949_00058, jobs: 82
OPNG_0176011_00051, jobs: 81 OPNG_0175949_00037, jobs: 85

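You can easily check this yourself with the commands below, once you have verified that the variables point to the right places on your machine. Note this caveat from the BOINC wiki, though: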
# https://boinc.berkeley.edu/wiki/client_state
# "You shouldn't rely on the client_state.xml format staying unchanged between BOINC versions.
# If you're writing a program or script that needs to get information from BOINC, use GUI RPCs instead."

# Files and directories used by BOINC:
BOINCDIR=~boinc; DIR=/var/lib/boinc; [ -d "$DIR" ] && BOINCDIR=$DIR
CLIENT_STATE=$BOINCDIR/client_state.xml
# Print each OPNG task name with its job count, in two columns.
sed -n 's/^-jobs \(OPNG_[^ ]*\)\.job .* -wcgdpf/\1, jobs:/p' "$CLIENT_STATE" | pr -t -2
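
To see what the sed expression does without touching a real client_state.xml, here is a minimal sketch that feeds it a single made-up line. Only the leading -jobs argument and the trailing -wcgdpf field come from the pattern above; the -input argument in the sample is hypothetical, and extract_jobs is just an illustrative name.

```shell
#!/bin/sh
# Sketch: apply the same sed expression to text read from stdin.
extract_jobs() {
    sed -n 's/^-jobs \(OPNG_[^ ]*\)\.job .* -wcgdpf/\1, jobs:/p'
}

# Hypothetical command line; the real one in client_state.xml may differ.
sample='-jobs OPNG_0176011_00090.job -input OPNG_0176011_00090.zip -wcgdpf 81'

# The match is replaced by "<task name>, jobs:" and the rest of the line
# (the number after -wcgdpf) is left in place.
printf '%s\n' "$sample" | extract_jobs
```

Lines that do not start with "-jobs OPNG_" are not printed at all, because of sed's -n option combined with the p flag.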

Adri
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Apr 22, 2023 12:27:30 PM]
[Apr 22, 2023 12:26:57 PM]
Bryn Mawr
Senior Cruncher
Joined: Dec 26, 2018
Post Count: 345
Re: Time to Retire My GPU?

Sorry, I was on a 3-hour bus journey round the twisties, and trying to read through the thread you linked upset my stomach. I’m now at my destination, fed and watered.

Basically, my problem is that, whilst 80+% of the tasks happily run to completion in 11+ hours, the other 15-20% abort with "time limit exceeded" after 7.88 hours.

I want to be able to identify the tasks with short time limits, so that I can either kill them early or raise their time limit to 12+ hours.

It’s not the length of time the job will run, which is fairly constant; it’s the time limit that BOINC sets before it aborts the task.
----------------------------------------
[Edit 1 times, last edit by Bryn Mawr at Apr 22, 2023 6:49:10 PM]
[Apr 22, 2023 6:45:52 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2166
Re: Time to Retire My GPU?

Bryn,
Are you comfortable using a script to circumvent the problem?
This is what I do:
First I determine the value of <rsc_fpops_est> for OPNG tasks in BOINC's client_state.xml file. I wonder whether it is the same for everyone. For me, it's 25320265880000 (or 25320265880000.000000, if you must). It seems to be a fairly constant value, because it hasn't changed in the past half year.
Then I run a script from crontab at chosen intervals to change a few values in client_state.xml. This only works while BOINC is stopped, so the script stops BOINC, makes the change, and starts BOINC again.
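
For the first step, one way to look up the value is sketched here. It lists every distinct <rsc_fpops_est> in a file together with how often it occurs. The sample file is made up and only stands in for client_state.xml; list_fpops_est is an illustrative name, and the exact layout of the real file is an assumption. Note that a real client_state.xml would also show values from other projects.

```shell
#!/bin/sh
# Sketch: count the distinct <rsc_fpops_est> values in a given file.
list_fpops_est() {
    sed -n 's/.*<rsc_fpops_est>\([^<]*\)<\/rsc_fpops_est>.*/\1/p' "$1" |
        sort | uniq -c
}

# Hypothetical sample standing in for client_state.xml.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<workunit>
    <name>OPNG_0176011_00090</name>
    <rsc_fpops_est>25320265880000.000000</rsc_fpops_est>
    <rsc_fpops_bound>1012810635200000.000000</rsc_fpops_bound>
</workunit>
<workunit>
    <name>OPNG_0175999_00121</name>
    <rsc_fpops_est>25320265880000.000000</rsc_fpops_est>
    <rsc_fpops_bound>1012810635200000.000000</rsc_fpops_bound>
</workunit>
EOF

list_fpops_est "$sample"
rm -f "$sample"
```

On a real machine you would call it read-only as list_fpops_est "$CLIENT_STATE", which does not require stopping BOINC.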

Please find my script below. Put it in a file on your machine so that you can execute it. The script should be run from root's crontab.

BOINCDIR=~boinc; DIR=/var/lib/boinc; [ -d "$DIR" ] && BOINCDIR=$DIR
CLIENT_STATE=$BOINCDIR/client_state.xml
# NB: Exporting FPOPS_EST will not work, because the variable does not get through $SUDO.
FPOPS_EST=25320265880000 # 24-08-2022
GREP="grep -q" # silent
GREP=grep # not silent; put '#' in front of this line if you want 'grep' to be silent.
# Only act if at least one task still carries the unmodified estimate.
if $GREP "<rsc_fpops_est>$FPOPS_EST\.000000</rsc_fpops_est>" "$CLIENT_STATE"; then
    case $USER in (root) SUDO=;; (*) SUDO=sudo;; esac
    # Keep a backup copy, tagged with the process ID of this shell.
    cp -p "$CLIENT_STATE" "/var/tmp/${CLIENT_STATE##*/}.$$"
    # (1) Stop BOINC.
    $SUDO systemctl stop boinc-client
    # (2) Rewrite $CLIENT_STATE: multiply the matching estimate and bound by 4.
    $SUDO perl -w -i -p -e '
        if (($b,$v,$e) = /(<rsc_fpops_est>)('$FPOPS_EST')(\.000000<\/rsc_fpops_est>)/) {
            $v *= 4;
            s//$b$v$e/;
        }
        $FPOPS_BOUND = 40 * '$FPOPS_EST';
        if (($b,$v,$e) = /(<rsc_fpops_bound>)($FPOPS_BOUND)(\.000000<\/rsc_fpops_bound>)/) {
            $v *= 4;
            s//$b$v$e/;
        }
    ' "$CLIENT_STATE"
    # (3) Restart BOINC.
    $SUDO systemctl start boinc-client
fi


See if this solves your problem. If it doesn't, experiment with the multiplication factor (its current value is 4, see above).
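
To get a feel for what a given factor does to the time limit, here is a rough sketch of the arithmetic. It assumes that the client aborts a task once its elapsed time exceeds rsc_fpops_bound divided by the flops figure of the app version, which is how I understand BOINC to work. The FLOPS value is hypothetical, picked so that the unmodified bound gives the 7.88 hours you are seeing; limit_hours is an illustrative name.

```shell
#!/bin/sh
# Sketch: elapsed-time limit in hours implied by a bound and a speed estimate.
# Assumption: limit in seconds = rsc_fpops_bound / flops.
limit_hours() {
    # $1 = rsc_fpops_bound, $2 = flops (operations per second)
    awk -v bound="$1" -v flops="$2" \
        'BEGIN { printf "%.2f\n", bound / flops / 3600 }'
}

FPOPS_EST=25320265880000
FPOPS_BOUND=$((40 * FPOPS_EST))   # same relation as in the script above
FLOPS=35700000000                 # hypothetical; check your own client_state.xml

# Limit with the original bound, then with the bound multiplied by 4.
limit_hours "$FPOPS_BOUND" "$FLOPS"
limit_hours "$((4 * FPOPS_BOUND))" "$FLOPS"
```

With these assumed numbers the first line comes out at 7.88 hours and the second at 31.52 hours, so under this model a factor of 2 (about 15.76 hours) would already clear a 12-hour run.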
Good luck!
Adri
[Apr 23, 2023 10:10:42 AM]
Bryn Mawr
Senior Cruncher
Joined: Dec 26, 2018
Post Count: 345
Re: Time to Retire My GPU?

Many thanks, I’ll try this as soon as I’m back in front of my computers.
[Apr 23, 2023 12:46:35 PM]