Total posts in this thread: 129
ysaillet@de.ibm.com
Cruncher
Joined: Jan 12, 2005
Post Count: 2
Status: Offline
Re: A few unusual HPF2 work units

I have a work unit that has been stuck at 28.00% for about 10 hours without any progress since then (the CPU time is blocked at 01:17:29, with 02:44:45 to completion).

The name of the work unit is za060_00297_9 using hpf2 5.07. The device name is tpyannic.boeblingen.de.ibm.com

I'm running the BOINC client on RHEL. The machine is a laptop with a Centrino dual-core 2.16 GHz CPU and 2 GB of RAM. Another work unit is running in parallel, and it runs fine.

The deadline for the report is Mon 10 Jul 2006. Should I kill the work unit?

Thanks,
Yannick
[Jul 7, 2006 2:48:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: A few unusual HPF2 work units

Hello ysaillet,
BOINC Manager shows your Device ID number at the start of your log, just after boot. If you look at the second post in this thread, you will see some information copied from the start of a log, with irrelevant details deleted. Your work unit is probably stuck. Even so, to be certain, let it run a few more hours before you abort it.

Lawrence
[Jul 7, 2006 3:04:01 PM]
ysaillet@de.ibm.com
Cruncher
Joined: Jan 12, 2005
Post Count: 2
Status: Offline
Re: A few unusual HPF2 work units

Hello ysaillet,
BOINC Manager shows your Device ID Number at the start of your log, just after boot. If you look at the second post in this thread, you will see some information copied from the start of a log, with irrelevant details deleted. Your stuck work unit is probably stuck. Even so, for certainty, let it run a few more hours before you abort it.

Lawrence


Here is the info with the device ID:
2006-07-07 09:12:43 [---] Starting BOINC client version 5.2.8 for i686-pc-linux-gnu
2006-07-07 09:12:43 [---] libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
2006-07-07 09:12:43 [---] Data directory: /home/yannick/BOINC
2006-07-07 09:12:43 [---] Processor: 2 GenuineIntel Genuine Intel(R) CPU T2600 @ 2.16GHz
2006-07-07 09:12:43 [---] Memory: 1.98 GB physical, 1019.71 MB virtual
2006-07-07 09:12:43 [---] Disk: 55.13 GB total, 34.54 GB free
2006-07-07 09:12:43 [World Community Grid] Computer ID: 3714; location: Default; project prefs: default
2006-07-07 09:12:43 [---] General prefs: from World Community Grid (last modified 1970-01-01 01:00:01)
2006-07-07 09:12:43 [---] General prefs: no separate prefs for Default; using your defaults
2006-07-07 09:12:43 [---] Remote control not allowed; using loopback address
2006-07-07 09:12:43 [World Community Grid] Resuming computation for result faah0681_d477cb097_x1hpv_03_0 using faah version 509
2006-07-07 09:12:43 [World Community Grid] Resuming computation for result za060_00297_9 using hpf2 version 507
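For anyone who would rather pull the ID out of a saved log than read it by eye, here is a minimal Python sketch. It assumes the startup lines look like the ones quoted in this thread (a project name in square brackets followed somewhere by `Computer ID: <number>`). The function name `find_computer_ids` and the regular expression are illustrative only and are not part of BOINC.

```python
import re

# Assumed pattern, based only on the log lines quoted in this thread:
# "... [Project Name] ... Computer ID: 3714; ..."
COMPUTER_ID = re.compile(r"\[(?P<project>[^\]]+)\].*?Computer ID: (?P<id>\d+)")

def find_computer_ids(log_text):
    """Return (project, computer_id) pairs found in BOINC client log text."""
    pairs = []
    for line in log_text.splitlines():
        match = COMPUTER_ID.search(line)
        if match:
            pairs.append((match.group("project"), int(match.group("id"))))
    return pairs

sample = (
    "2006-07-07 09:12:43 [---] Disk: 55.13 GB total, 34.54 GB free\n"
    "2006-07-07 09:12:43 [World Community Grid] Computer ID: 3714; "
    "location: Default; project prefs: default\n"
)
print(find_computer_ids(sample))
```

Lines without a `Computer ID:` field are simply skipped, so the whole log can be passed in without trimming it first.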
[Jul 7, 2006 3:32:28 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: A few unusual HPF2 work units

BOINC User ID 225561, Host ID 41341.

Error 7: za078_ 00637, returned 07/07/2006 15:24:38, 1 other copy with error, 3 copies in progress, "The environment is incorrect. (0xa) - exit code 10 (0xa), Exception occurred while running Rosetta: Exception code: 0xc0000005, Exception address: 0x00A87C94"

(I should note, since all these Errors make it look like my machine is unstable, that it's not just me. If I Error out, the other copies do too, and in several hundred HPF1 and FAAH units I never got an Error or Invalid.)

And another Endlessly Inconclusive:

za062_ 00587, returned 07/07/2006 10:07:00, 11 other Inconclusives so far (12 total), with a 13th In Progress.
[Jul 7, 2006 7:19:46 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: A few unusual HPF2 work units

Hi Mark099,
Could you give us your Computer ID and the work unit name? Also, which client are you running, what CPU and how much RAM and Virtual Memory do you have?
Lawrence


Agent Version: 3.0 (2844)
Device Name: Mark099-2
Device ID: 235956
Tasks: Proteome_Folding_2

At 0% after 49 1/2 hours now.

How do I abort this WU without reinstalling WCG?

My computer specs:

Opteron 146 @ 2.6 GHz
2GB Corsair PC4000 DDR
Virtual memory?
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 8, 2006 1:39:26 AM]
[Jul 8, 2006 1:35:39 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: A few unusual HPF2 work units

Hi Mark099.

Please will you give us a little more information?

What is your Device ID, and what time did you download the work unit (UTC, or local time + timezone)?

After over 24 hours with no progress, you should feel free to abort the work unit.

Thank you.


How do I abort the unit?
[Jul 8, 2006 1:40:12 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: A few unusual HPF2 work units

2006-06-26 07:07:52 [---] Starting BOINC client version 5.4.9 for windows_intelx86
2006-06-26 07:07:52 [---] libcurl/7.15.3 OpenSSL/0.9.8a zlib/1.2.3
2006-06-26 07:07:52 [---] Executing as a daemon
2006-06-26 07:07:52 [---] Data directory: C:\Program Files (x86)\BOINC
2006-06-26 07:07:52 [---] BOINC is running as a service and as a non-system user.
2006-06-26 07:07:52 [---] No application graphics will be available.
2006-06-26 07:07:53 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 3300+
2006-06-26 07:07:53 [---] Memory: 895.39 MB physical, 2.13 GB virtual
2006-06-26 07:07:53 [---] Disk: 59.62 GB total, 50.00 GB free
2006-06-26 07:07:54 [World Community Grid] URL: http://www.worldcommunitygrid.org/; Computer ID: 37184; location: ; project prefs: default
2006-06-26 07:07:54 [---] General prefs: from http://boinc.bio.wzw.tum.de/boincsimap/ (last modified 2006-06-25 18:39:36)
2006-06-26 07:07:54 [---] General prefs: using your defaults
2006-06-26 07:07:54 [---] Reading preferences override file
2006-06-26 07:07:54 [---] Remote control allowed

------------------------ snip ---------------
2006-07-04 17:21:21 [World Community Grid] Finished download of file za092_00323_aaza09209_05.075_v1_3.gz
2006-07-04 17:21:21 [World Community Grid] Throughput 321224 bytes/sec
2006-07-04 17:21:22 [---] Rescheduling CPU: files downloaded
2006-07-04 19:47:56 [World Community Grid] Unrecoverable error for result za066_00933_5 ( - exit code -1073741819 (0xc0000005))

------------------------ snip ---------------

2006-07-07 18:36:45 [World Community Grid] Finished download of file za099_00020_aaza09909_05.075_v1_3.gz
2006-07-07 18:36:45 [World Community Grid] Throughput 322911 bytes/sec
2006-07-07 18:36:46 [---] Rescheduling CPU: files downloaded
2006-07-07 19:24:06 [World Community Grid] Unrecoverable error for result za098_00645_2 (The environment is incorrect. (0xa) - exit code 10 (0xa))
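The two forms of the first error code in the log are the same 32-bit value: -1073741819 is 0xc0000005 read as a signed integer. On Windows, 0xC0000005 is the access-violation status code, and it is the same exception code quoted in the earlier Error 7 report. A small Python sketch of the conversion follows; the helper names are made up for illustration.

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF

def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed integer."""
    return value - 0x100000000 if value & 0x80000000 else value

print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
print(to_signed32(0xC0000005))          # -1073741819
```

The second error, exit code 10 (0xa), is small and positive, so it reads the same either way.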
[Jul 8, 2006 2:47:39 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: A few unusual HPF2 work units

BOINC User ID 225561, Host ID 41341.

Not sure if this is worth reporting or not. I've never seen a result of "Other" before, but the 13-copy Inconclusive I reported earlier has changed. Probably just means "finally gave up," in which case the other 2 Endlessly Inconclusives I've got will probably go this way eventually.

za062_ 00587, returned 07/07/2006 10:07:00, 13 copies sent, all now show status "Other".

BTW, the Result Status list doesn't filter for status Other correctly. When you select Other from the drop-down list, the Filter button does nothing.
[Jul 8, 2006 3:30:13 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: A few unusual HPF2 work units

Hello Mark099,
Right-click the taskbar at the bottom of your screen, select Task Manager, then select WCGrid_Rosetta in the Processes list and end (kill) that process.
Lawrence
[Jul 8, 2006 3:57:34 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: A few unusual HPF2 work units

RickH, if you see "Other" and possibly a date of 1.1.1970, it means the sending was retracted. So if you count 13 on a WU, 12 were sent. But now we already have 14 and counting, which goes against the 12-copy cut-off algorithm that knreed described in that other thread.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Jul 8, 2006 6:47:19 AM]