Thread Status: Active
Total posts in this thread: 19
This topic has been viewed 12042 times and has 18 replies.
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1957
Status: Offline
Re: Never ending WUs blocking slots

Coming home tonight, I found that the host from my first post in this thread has another blocking WU.
Application                 Mapping Cancer Markers 7.41
Name                        MCM1_0160081_1049
State                       Running
Received                    3/4/2020 10:54:02 AM
Report deadline             3/11/2020 11:54:03 AM
Estimated computation size  47,923 GFLOPs
CPU time                    22:38:33
CPU time since checkpoint   00:18:14
Elapsed time                20:26:00
Estimated time remaining    ---
Fraction done               100.000%
Virtual memory size         6.40 MB
Working set size            7.52 MB
Directory                   slots/8
Process ID                  13104
Progress rate               5.040% per hour
Executable                  wcgrid_mcm1_map_7.41_windows_x86_64
And again, how can the CPU time be larger than the elapsed time?
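To put a number on the anomaly, the two times can be converted to seconds and divided. The helper names `to_seconds` and `cpu_ratio` below are made up for illustration; a ratio above 1.0 means more CPU time was recorded than wall-clock time passed.

```python
def to_seconds(hms: str) -> int:
    """Convert a BOINC-style 'HH:MM:SS' duration into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_ratio(cpu_time: str, elapsed_time: str) -> float:
    """Return CPU time divided by elapsed (wall-clock) time."""
    return to_seconds(cpu_time) / to_seconds(elapsed_time)


# Values copied from the task properties above.
ratio = cpu_ratio("22:38:33", "20:26:00")
print(f"CPU/elapsed = {ratio:.3f}")  # prints "CPU/elapsed = 1.108"
```

That is roughly 11% more CPU time than elapsed time for this task.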
----------------------------------------

[Mar 7, 2020 5:38:41 AM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Status: Offline
Re: Never ending WUs blocking slots

Hey There.
I am running Linux and having a problem with Mapping Cancer Markers (MCM).

Is this the same problem, or should I start another thread?
I have 11 GB of memory and 8 processors, yet I see:
- 12 different boincmgr processes (by process ID) running, each using 98 GB of VM
- 10 WebKit processes, each using 97 GB of VM
- 11 WebKit processes, each using 81.9 GB of VM
I was running:
- 1 Einstein WU with 1.0 dedicated CPU
- 4 MCM WUs
- 3 MIP WUs
I have let all WUs complete, reset and removed the projects, removed and reinstalled BOINC, and done several cold shutdowns and restarts.

As for the user/system time split, about 25% of each WU process's time is system time (per htop and gkrellm).

HOWEVER, vmstat and the system monitor show less than 25% of my memory in use.
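On Linux, the VM figure that htop shows counts all mapped virtual address space, including space that is reserved but not backed by RAM, so large VM numbers may not mean memory is actually in use. One way to check is to compare `VmSize` (virtual) with `VmRSS` (resident) in `/proc/<pid>/status`. This is a minimal sketch assuming a Linux `/proc` filesystem; `parse_status` and `vm_summary` are hypothetical helper names.

```python
def parse_status(text: str) -> dict:
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmSize:\t  102400 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields


def vm_summary(pid: int) -> dict:
    """Read /proc/<pid>/status for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status(fh.read())
```

Calling `vm_summary` with a PID taken from htop would show whether a process reporting tens of GB of VM has a much smaller resident set.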

My BOINC Computing Preferences include:
- Memory, computer in use: 80%
- Memory, computer not in use: 80%
- Page/swap file: 75%
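For scale, here are the limits these percentages would give on this host, assuming each percentage is applied to the totals BOINC reports in the event log in this post (11.62 GB physical, 48.83 GB virtual). Treating the 48.83 GB figure as the base for the page/swap setting is an assumption; this is plain arithmetic, not BOINC's own code.

```python
# Host totals taken from the BOINC event log in this post (GB).
PHYSICAL_GB = 11.62
VIRTUAL_GB = 48.83

# Preference percentages from the list above.
MEMORY_PCT = 80  # same value whether or not the computer is in use
SWAP_PCT = 75    # assumed to apply to the 48.83 GB "virtual" total


def limit(total_gb: float, pct: float) -> float:
    """Return pct percent of total_gb."""
    return total_gb * pct / 100


memory_limit = limit(PHYSICAL_GB, MEMORY_PCT)  # about 9.30 GB
swap_limit = limit(VIRTUAL_GB, SWAP_PCT)       # about 36.62 GB
print(f"memory limit: {memory_limit:.2f} GB, page/swap limit: {swap_limit:.2f} GB")
```

Each of the 98 GB per-process VM figures above exceeds both limits, which is consistent with those figures describing reserved address space rather than memory BOINC counts as used.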

System monitor shows NO swap in use (0 of 48.83 GiB).

I am stumped.
Also, the BOINC event log shows no errors. Here is what it says:

Tue 31 Mar 2020 02:09:43 AM EDT | | Starting BOINC client version 7.16.3 for x86_64-pc-linux-gnu
Tue 31 Mar 2020 02:09:43 AM EDT | | log flags: file_xfer, sched_ops, task
Tue 31 Mar 2020 02:09:43 AM EDT | | Libraries: libcurl/7.65.3 OpenSSL/1.1.1c zlib/1.2.11 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.39.2 librtmp/2.3
Tue 31 Mar 2020 02:09:43 AM EDT | | Data directory: /var/lib/boinc-client
Tue 31 Mar 2020 02:09:43 AM EDT | | OpenCL: AMD/ATI GPU 0: AMD VERDE (DRM 2.50.0, 5.3.0-42-generic, LLVM 10.0.0) (driver version 20.0.0-devel - padoka PPA, device version OpenCL 1.1 Mesa 20.0.0-devel - padoka PPA, 2048MB, 2048MB available, 512 GFLOPS peak)
Tue 31 Mar 2020 02:09:43 AM EDT | | [libc detection] gathered: 2.30, Ubuntu GLIBC 2.30-0ubuntu2.1
Tue 31 Mar 2020 02:09:43 AM EDT | | Host name: pc-14
Tue 31 Mar 2020 02:09:43 AM EDT | | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Tue 31 Mar 2020 02:09:43 AM EDT | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Tue 31 Mar 2020 02:09:43 AM EDT | | OS: Linux Ubuntu: Ubuntu 19.10 [5.3.0-42-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2.1)]
Tue 31 Mar 2020 02:09:43 AM EDT | | Memory: 11.62 GB physical, 48.83 GB virtual
Tue 31 Mar 2020 02:09:43 AM EDT | | Disk: 133.57 GB total, 124.70 GB free
Tue 31 Mar 2020 02:09:43 AM EDT | | Local time is UTC -4 hours


Thanks in ADVANCE!!
jay
PS
Extensive memory tests show no errors.
A read/write test of every block in the partition holding /var shows no errors.
I don't always get this problem. I don't always run MCM.
I am aborting or letting all MCM tasks complete, and will then reboot and see if the problem continues.
(( Stay safe out there!! ))
jay
------------------
[edit]
PPS
The system-time usage goes away when no MCM WU is running.
However, 32 (!) BOINC-related processes are *still* using 98, 97, or 81 GB of virtual memory.
[/edit]
----------------------------------------
[Edit 1 times, last edit by jay_Orlando at Mar 31, 2020 2:34:30 PM]
[Mar 31, 2020 2:27:48 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Status: Offline
Re: Never ending WUs blocking slots

Jay

Off topic, but related to your setup: I am running 1 Einstein task alongside WCG, but my Einstein task runs on the GPU only, leaving 8 threads for WCG.

Mike
[Apr 1, 2020 1:21:42 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 181
Status: Offline
Re: Never ending WUs blocking slots

Hello Mike,
That is what I am doing. The Einstein WU runs on the GPU, but I have 1.0 CPU assigned to support the GPU, leaving 7 of my cores to crunch WCG.

Do you see any weirdness with MCM??

Jay
----------------------------------------

[Apr 2, 2020 8:27:26 AM]
Macroman
Advanced Cruncher
Joined: Jun 4, 2005
Post Count: 112
Status: Offline
Re: Never ending WUs blocking slots

I had what I think is the same issue on an old machine running Windows 10. Tasks occasionally get stuck at various points, but in my experience I can suspend and resume the task and it then finishes normally, although several hours of computation are often lost after the resume. I have considered creating a tool to monitor stuck tasks and correct them automatically, but have not undertaken this so far.
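A monitoring tool like the one described could start from a simple heuristic: sample each task's fraction done periodically and flag it when the value has not advanced within some window. The sketch below is hypothetical (the `Sample` type, `is_stalled` function, and the 2-hour default window are all assumptions); it only detects a stall and leaves the suspend/resume step out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One observation of a task."""
    t: float              # seconds since monitoring started
    fraction_done: float  # percent, as shown in BOINC Manager


def is_stalled(samples: Sequence[Sample], window_s: float = 2 * 3600) -> bool:
    """Return True if fraction done has not increased over the last window_s seconds.

    Returns False while the samples do not yet span a full window.
    """
    if not samples:
        return False
    latest = samples[-1]
    # Samples at least window_s older than the latest one.
    older = [s for s in samples if latest.t - s.t >= window_s]
    if not older:
        return False  # not enough history yet
    baseline = older[-1]  # the most recent sample that is at least window_s old
    return latest.fraction_done <= baseline.fraction_done
```

With hourly readings, a task like the first one in this thread (stuck at 100.000% while still Running) would be flagged once the window has passed, while a task that keeps advancing would not.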
[Apr 4, 2020 4:28:38 AM]
DE113936
Cruncher
Joined: Mar 28, 2016
Post Count: 7
Status: Offline
Re: Never ending WUs blocking slots

I experienced this issue on Windows-based systems in the past. After an undefined time span, the WU (Zika or MIP) resets itself, restarts from the last checkpoint, and finishes successfully. If you want to speed up the process, close BOINC with the option to stop all running tasks. Open Task Manager, check whether a project-related process is still listed (in my case the affected task won't stop gracefully), and end it. Then restart BOINC and resume processing. I have never seen the same task run into this issue twice.
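The Task Manager check amounts to filtering the process list taken after BOINC has been shut down: anything whose name still starts with the project's executable prefix is a leftover. The `wcgrid_` prefix comes from the executable names posted in this thread; the function name and the sample process list are hypothetical, and obtaining the live process list (for example with a third-party library such as psutil) is left out.

```python
from typing import Iterable, List, Tuple

# Executable prefix seen in this thread, e.g. "wcgrid_mcm1_map_7.41_windows_x86_64".
WCG_PREFIX = "wcgrid_"


def leftover_tasks(processes: Iterable[Tuple[int, str]],
                   prefix: str = WCG_PREFIX) -> List[Tuple[int, str]]:
    """Return (pid, name) pairs whose executable name starts with prefix.

    Meant to be run after BOINC was closed with "stop running tasks",
    when no project executable should still be present.
    """
    return [(pid, name) for pid, name in processes
            if name.lower().startswith(prefix)]


# Hypothetical snapshot of a process list after shutting BOINC down.
snapshot = [
    (4012, "explorer.exe"),
    (5120, "wcgrid_mcm1_map_7.41_windows_x86_64.exe"),
]
print(leftover_tasks(snapshot))  # [(5120, 'wcgrid_mcm1_map_7.41_windows_x86_64.exe')]
```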
[Apr 4, 2020 9:00:56 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Never ending WUs blocking slots

As far as I know, BOINC can handle at least 200 slots. A slot is created for each new job, while the slot of the last finished job is held until its result is transmitted and reported; the slot then goes through a cleanup cycle and is deleted. A never-ending job will of course hold on to its slot, but I don't think it "blocks slots" to the extent that it affects any other jobs.
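The slot behaviour described here can be illustrated with a toy allocator in which each new job takes the lowest free slot number and a slot stays occupied until its job has been reported. The `SlotPool` class and its lowest-free-number policy are assumptions made for illustration, not BOINC's client code; the point is that a never-ending job holds only its own slot while other jobs keep getting one.

```python
class SlotPool:
    """Toy model of slot directories (slots/0, slots/1, ...); not BOINC's code."""

    def __init__(self, max_slots: int = 200):
        self.max_slots = max_slots  # "at least 200" per the post above
        self.in_use = {}            # slot number -> job name

    def acquire(self, job: str) -> int:
        """Give a new job the lowest free slot number."""
        for n in range(self.max_slots):
            if n not in self.in_use:
                self.in_use[n] = job
                return n
        raise RuntimeError("no free slots")

    def release(self, n: int) -> None:
        """Free a slot once its job has been transmitted and reported."""
        del self.in_use[n]


pool = SlotPool()
stuck = pool.acquire("never_ending_job")  # slot 0, never released
a = pool.acquire("job_a")                 # slot 1
pool.release(a)                           # job_a reported, slot 1 freed
b = pool.acquire("job_b")                 # reuses slot 1
print(stuck, a, b)  # 0 1 1
```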
[Apr 4, 2020 10:22:12 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1957
Status: Offline
Re: Never ending WUs blocking slots

And here is a new version of that issue:

Application                 Mapping Cancer Markers 7.41
Name                        MCM1_0161869_4260
State                       Running
Received                    4/11/2020 1:47:54 AM
Report deadline             4/18/2020 1:47:54 AM
Estimated computation size  34,132 GFLOPs
CPU time                    00:00:26
CPU time since checkpoint   00:00:26
Elapsed time                05:41:41
Estimated time remaining    ---
Fraction done               0.500%
Virtual memory size         70.76 MB
Working set size            71.97 MB
Directory                   slots/1
Process ID                  4972
Executable                  wcgrid_mcm1_map_7.41_windows_x86_64
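Here the anomaly runs the other way: the task has used almost no CPU time over more than five hours of elapsed time. A hypothetical check could flag a task whose CPU time is only a small share of its elapsed time; the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' into seconds."""
    hours, minutes, seconds = (int(p) for p in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_share(cpu_time: str, elapsed_time: str) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    return to_seconds(cpu_time) / to_seconds(elapsed_time)


def looks_stalled(cpu_time: str, elapsed_time: str, threshold: float = 0.5) -> bool:
    """Flag a task that used the CPU for less than `threshold` of its elapsed time.

    The 0.5 default is an arbitrary assumption; a CPU-bound task that has a
    core to itself would normally sit close to 1.0.
    """
    return cpu_share(cpu_time, elapsed_time) < threshold


share = cpu_share("00:00:26", "05:41:41")
print(f"CPU share {share:.4f}, stalled: {looks_stalled('00:00:26', '05:41:41')}")
# prints "CPU share 0.0013, stalled: True"
```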

----------------------------------------

[Apr 11, 2020 9:45:03 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Status: Offline
Re: Never ending WUs blocking slots

Jay

If you restrict Einstein to BRP4, it only uses part of a thread, leaving 8 available for WCG.

Mike
[Apr 12, 2020 1:46:04 AM]