Thread Status: Active
Total posts in this thread: 70
This topic has been viewed 42930 times and has 69 replies
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Re: Lots of MIP1 WUs error out

BTW, here is the BOINC startup log (from syslog) for the machine WITH errors:

May 10 16:43:01 pc-14 systemd[1]: Started Berkeley Open Infrastructure Network Computing Client.
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Starting BOINC client version 7.16.16 for x86_64-pc-linux-gnu
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] log flags: file_xfer, sched_ops, task
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Libraries: libcurl/7.74.0 OpenSSL/1.1.1j zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh/0.9.5/openssl/zlib nghttp2/1.43.0 librtmp/2.3
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Data directory: /var/lib/boinc-client
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] OpenCL: AMD/ATI GPU 0: AMD VERDE (DRM 2.50.0, 5.11.0-16-generic, LLVM 11.0.1) (driver version 21.0.1, device version OpenCL 1.1 Mesa 21.0.1, 2048MB, 2048MB available, 512 GFLOPS peak)
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Creating new client state file
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] libc: Ubuntu GLIBC 2.33-0ubuntu5 version 2.33
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Host name: pc-14
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] OS: Linux Ubuntu: Ubuntu 21.04 [5.11.0-16-generic|libc 2.33 (Ubuntu GLIBC 2.33-0ubuntu5)]
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Memory: 11.60 GB physical, 9.31 GB virtual
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Disk: 91.17 GB total, 85.31 GB free
May 10 16:43:01 pc-14 boinc[6662]: 10-May-2021 16:43:01 [---] Local time is UTC -4 hours

----------------------------------------

[May 13, 2021 5:41:17 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Re: Lots of MIP1 WUs error out

And here is the same startup log for the machine WITHOUT errors:

May 12 08:29:12 pc-15 boinc[1088]: 12-May-2021 08:29:12 [---] Starting BOINC client version 7.16.6 for x86_64-pc-linux-gnu
May 12 08:29:12 pc-15 boinc[1088]: 12-May-2021 08:29:12 [---] log flags: file_xfer, sched_ops, task
May 12 08:29:12 pc-15 boinc[1088]: 12-May-2021 08:29:12 [---] Libraries: libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
May 12 08:29:12 pc-15 boinc[1088]: 12-May-2021 08:29:12 [---] Data directory: /var/lib/boinc-client
May 12 08:29:12 pc-15 boinc[1088]: 12-May-2021 08:29:12 [---] No usable GPUs found
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] libc: Ubuntu GLIBC 2.31-0ubuntu9.2 version 2.31
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Host name: pc-15
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] OS: Linux Ubuntu: Ubuntu 20.04.2 LTS [5.4.0-73-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.2)]
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Memory: 7.47 GB physical, 5.86 GB virtual
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Disk: 9.61 GB total, 6.16 GB free
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Local time is UTC -4 hours
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Config: GUI RPCs allowed from:
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Config: report completed tasks immediately
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [World Community Grid] General prefs: from World Community Grid (last modified 11-May-2021 07:49:49)
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [World Community Grid] Computer location: home
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] General prefs: using separate prefs for home
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Reading preferences override file
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] Preferences:
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] max memory usage when active: 5355.39 MB
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] max memory usage when idle: 5355.39 MB
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] max disk usage: 2.40 GB
May 12 08:29:13 pc-15 boinc[1088]: 12-May-2021 08:29:13 [---] max CPUs used: 3

----------------------------------------

[May 13, 2021 5:50:49 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Re: Lots of MIP1 WUs error out

Making it more readable, I hope.
Here is a side-by-side diff.
I am still getting errors on the AMD chip and not on the Intel chip.

diff -w 138 14-flagsSorted.txt 15-flagsSORTED.txt
00 WITH ERRORS | 00 WITHOUT ERRORS (pc-15)
01 Starting BOINC client version 7.16.16 for x86_64-pc-linux-gnu | 01 Starting BOINC client version 7.16.6 for x86_64-pc-linux-gnu
02 Libraries: libcurl/7.74.0 OpenSSL/1.1.1j zlib/1.2.11 | 02 Libraries: libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11
02a brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) | 02a brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0)
02b libssh/0.9.5/openssl/zlib nghttp2/1.43.0 librtmp/2.3 | 02b libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
03 libc: Ubuntu GLIBC 2.33-0ubuntu5 version 2.33 | 03 libc: Ubuntu GLIBC 2.31-0ubuntu9.2 version 2.31
04 Processor: 8 AuthenticAMD AMD FX(tm)-8150 | 04 Processor: 4 GenuineIntel Intel(R) Core(TM) i7-3770
04a Eight-Core Processor [Family 21 Model 1 Stepping 2] | 04a CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
> 05 Processor features:

3dnowprefetch 3dnowprefetch
abm abm
aes aes
aperfmperf aperfmperf
apic apic
arat arat
avx avx
clflush clflush
cmov cmov
cmp_legacy cmp_legacy
constant_tsc constant_tsc
cpb cpb
cpuid cpuid
cr8_legacy cr8_legacy
cx16 cx16
cx8 cx8
de de
decodeassists decodeassists
extapic extapic
extd_apicid extd_apicid
flushbyasid flushbyasid
fma4 fma4
fpu fpu
fxsr fxsr
fxsr_opt fxsr_opt
ht ht
hw_pstate hw_pstate
ibpb ibpb
ibs ibs
lahf_lm lahf_lm
lbrv lbrv
lm lm
mca mca
mce mce
misalignsse misalignsse
mmx mmx
mmxext mmxext
monitor monitor
msr msr
mtrr mtrr
nodeid_msr nodeid_msr
nonstop_tsc nonstop_tsc
nopl nopl
npt npt
nrip_save nrip_save
nx nx
osvw osvw
pae pae
pat pat
pausefilter pausefilter
pclmulqdq pclmulqdq
pdpe1gb pdpe1gb
perfc | perfctr_core
perfctr_nb perfctr_nb
pfthreshold pfthreshold
pge pge
pni pni
popcnt popcnt
pse pse
pse36 pse36
rdtscp rdtscp
rep_good rep_good
sep sep
skinit skinit
ssbd ssbd
sse sse
sse2 sse2
sse4_1 sse4_1
sse4_2 sse4_2
sse4a sse4a
ssse3 ssse3
svm svm
svm_lock svm_lock
syscall syscall
topoext topoext
tr_core <
tsc tsc
tsc_scale tsc_scale
vmcb_clean vmcb_clean
vme vme
vmmcall vmmcall
wdt wdt
xop xop
xsave xsave
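
A side-by-side diff of flag lists copied from wrapped syslog lines is easy to misread, because a flag that was split across two lines shows up as a spurious difference. A minimal sketch of a set-style comparison instead, assuming each host's flags are available as one space-separated string. The helper names flag_lines and only_in_first are made up for illustration, and the two sample lists are shortened extracts, not the full flag sets from the logs.

```shell
# Print one flag per line, sorted and de-duplicated, from a space-separated list.
flag_lines() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed '/^$/d' | sort -u
}

# Print the flags that are in the first list but not in the second.
# Trick: emit the second list twice, so any flag it contains occurs at least
# twice after sorting and is dropped by "uniq -u".
only_in_first() {
    { flag_lines "$1"; flag_lines "$2"; flag_lines "$2"; } | sort | uniq -u
}

# Shortened sample lists (illustrative extracts only).
amd="fpu tsc avx xop fma4 sse4a"
intel="fpu tsc avx f16c rdrand"

echo "only on AMD:";   only_in_first "$amd" "$intel"
echo "only on Intel:"; only_in_first "$intel" "$amd"
```

On a live Linux host the input string could be taken from the first flags line of /proc/cpuinfo, for example with: grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2. That avoids the line wrapping that syslog introduces.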


----------------------------------------

[May 25, 2021 11:47:40 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Re: Lots of MIP1 WUs error out

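The differences in features are:
perfc versus perfctr_core
tr_core on one side only (tsc itself is on both sides)
Both come from the syslog line wrap, which split perfctr_core into "perfc" and "tr_core", so neither is a real difference.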
----------------------------------------

[May 25, 2021 11:50:10 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Re: Lots of MIP1 WUs error out

2 and 3 June 2021 - Update.
On the AMD machine with the failing MIP1 WUs, I have installed many Fortran libraries and slowed down all the options in the BIOS.
There has been a change: instead of failing in less than 2 or 3 minutes, a WU can now run for about an hour.
None complete without failing, though.
For example (from https://www.worldcommunitygrid.org/ms/viewBoi...atus=-1&projectId=123):

Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
MIP1_00334749_5146_0 | pc-14 | Error | 6/2/21 18:04:07 | 6/2/21 20:15:10 | 0.97 / 0.98 | 7.1 / 0.0


The error results now look like:

https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1735139001


<core_client_version>7.16.16</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
[2021- 6- 2 15:14:57:] :: BOINC:: Initializing ... ok.
[2021- 6- 2 15:14:57:] :: BOINC :: boinc_init()
INFO: result number = 0
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/www.worldcommunitygrid.org/wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu -in::file::zip MIP1_databasev2.zip @./MIP1_00334749.flags -out::file::silent result_silent.out -run:jran 154344808 -nstruct 1 -out::level 100 -run::no_scorefile true
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/mip1.MIP1_databasev2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
set_shared_memory_fully_initialized ...
abrelax ...
abrelax.run
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Sequence Length = 355
Starting work on structure: _0001

</stderr_txt>
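
For reference, signal 11 on Linux is SIGSEGV, i.e. the science application died with a segmentation fault. A minimal sketch of looking that up from the shell, assuming the message has the same "process got signal N" form as in the result log above. The helper names signal_number and signal_name are made up for illustration.

```shell
# Extract the signal number from a BOINC "process got signal N" message.
signal_number() {
    printf '%s\n' "$1" | sed -n 's/.*process got signal \([0-9][0-9]*\).*/\1/p'
}

# Translate a signal number into its name using the shell's kill builtin.
signal_name() {
    kill -l "$1"
}

# Prints SEGV on Linux.
signal_name "$(signal_number 'process got signal 11</message>')"
```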



----------------------------------------

[Jun 3, 2021 7:40:58 AM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Re: Lots of MIP1 WUs error out

Since MIP1 is ending, I'll call it quits, unless anyone has a firm lead.

T H A N K S !!
Jay
----------------------------------------

[Jun 3, 2021 7:42:52 AM]
xdarma
Cruncher
Joined: Oct 4, 2014
Post Count: 5
Re: Lots of MIP1 WUs error out

I can confirm that all WUs fail on Linux with AMD CPUs.
In case it helps, here are the feature flags of the CPUs I tested:

Phenom II x4 960t (K10 core): fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr cpb hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter

FX 8300 (Piledriver core): fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

Ryzen 7 3700x (Zen2 core): fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
[Jun 13, 2021 3:55:16 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2159
Re: Lots of MIP1 WUs error out

Just the other night my AMD machine received this executable:
-rwxr-xr-x. 1 boinc boinc 80977064 Jul 26 20:07 wcgrid_mip1_rosetta_7.16_i686-pc-linux-gnu
It's a new machine, in service for two weeks, running all WCG projects, including MIP1.

This setup has been running without problems, until today, when I started seeing Computation Errors. They all happen to be related to the i686 binary, because all the MIP1 tasks that have been running ever since with the x86_64 MIP1 binary are Valid. There isn't any valid i686 MIP1 task so far.
[Jul 27, 2021 11:24:30 AM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2982
Re: Lots of MIP1 WUs error out

Personally, adriverhoef, I wouldn't worry about it, as the MIP project is coming to an end within days.
----------------------------------------

[Jul 27, 2021 12:35:55 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2159
Re: Lots of MIP1 WUs error out

Has anybody tried replacing the i686 MIP1 binary on their Linux system with a link to the x86_64 MIP1 binary? I mean by doing this:
  if cd ~boinc/projects/www.worldcommunitygrid.org/; then
      # Only replace the i686 binary if the x86_64 binary is actually present.
      [ -f wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu ] &&
          ln -sf wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu wcgrid_mip1_rosetta_7.16_i686-pc-linux-gnu
  fi
I did this at home before I went to work. When I came back, all 79 MIP1 results returned during my absence turned out to be Valid, and some of them thought they had been run by the i686 MIP1 binary when in fact they had been run by the x86_64 MIP1 binary.
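
To double-check that the link described in this post is in place, here is a minimal sketch, assuming readlink is available (it is on typical Linux systems). check_link is a hypothetical helper, and the directory and file names are the ones from the post.

```shell
# Print "linked" if path $1 is a symlink whose target is exactly $2,
# otherwise print "not linked".
check_link() {
    if [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]; then
        echo "linked"
    else
        echo "not linked"
    fi
}

# Assumed location of the project directory, as used in the post.
dir=~boinc/projects/www.worldcommunitygrid.org
check_link "$dir/wcgrid_mip1_rosetta_7.16_i686-pc-linux-gnu" \
           wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu
```

Because ln -sf was given a bare file name, the link stores that relative name as its target, which is why the comparison is against the file name rather than a full path.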
[Jul 28, 2021 12:19:39 AM]