

mcelog


16 replies to this topic

#1 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 15 July 2012 - 08:50 PM

Hi all!
I think my CPU is on its way out. The computer keeps crashing for no apparent reason, sometimes after a few hours and sometimes after just a few minutes. When I start it, I get two messages at the very top of the screen; they go by so fast that it took me hours to write them down:

Hardware Error; no human readable MCE decoding support on this cpu type
Hardware Error; run the message through 'mcelog --ascii'

or something to that effect.

I found this on Wikipedia:

Quote

The error usually occurs due to failure or overstressing of hardware components where the error cannot be more specifically identified with a different error message.[clarification needed] Diagnosing the error message can be difficult, although Intel Pentium processors do generate more specific codes which can be decoded by contacting the manufacturer.[citation needed]

MCEs require a restart of the system before users can continue normal operation: they often indicate a long-term problem of a general nature.

and:

Quote

Normal causes[clarification needed] for MCE errors include overheating and/or incorrect hardware installation. Some specific manually induced causes could include:

overclocking (which normally increases heat-output)
poorly fitted heatsink/computer fans (the same problem can happen with excessive dust in the CPU fan)
an overloaded internal or external power supply (fixable by upgrading)

Computer software can also cause MCE errors (normally by corrupting data which programs read or write). For example, software performing read or write operations from or to non-existent memory regions can lead to confusion for the processor and/or the system bus.[citation needed]

I didn't overclock, and it happens mostly when I have too many tabs open in Firefox (in Linux) or when I play a game in Windows. It just happened three times while I was writing this post and opening a new tab to go to Wikipedia to retrieve the info I found last night. I am now writing it first in gedit (kind of like Notepad), saving after every paragraph so I don't have to retype the whole thing over and over again.

I did get a 12-foot S-Video cable and connected a big TV set someone gave me to what used to be my wife's video card (EVGA e-GeForce FX 5500), but I still get the crashes even when the cable is unplugged.

I have blown out the dust (which I do on a regular basis), and I also tried reseating the RAM as someone suggested in a recent post.

So would you say the problem is the CPU? And how could I retrieve the info by running the message through 'mcelog --ascii'? I've tried a whole lot of variations in a console without any luck.

Edited by réjean, 15 July 2012 - 08:56 PM.


registered linux user #374889

#2 zlim

    It's me, plodr

  • Forum MVP
  • 6,067 posts

Posted 15 July 2012 - 09:53 PM

Quote

This is pretty much a bug in newer Linux kernels. They print this message on every corrected error, even though it's useless and also the decoding into the kernel log is not very useful because mcelog can aggregate the information much better.
Source: http://mcelog.org/faq.html#13
Patch http://git.kernel.or...432e8e862037bfd
Liz
Registered Linux User # 401459

#3 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 15 July 2012 - 10:10 PM

Thanks Liz!
I am not very concerned about the message appearing. What worries me most is all the crashing, what it might do to the CPU, and the inconvenience of rebooting all the time.


#4 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 15 July 2012 - 10:23 PM

After reading the mcelog FAQ, I looked for /var/log/mcelog, but I don't see such a file in PCLinuxOS Zen.


#5 sunrat

    Discussion Deity

  • Forum Moderators
  • 3,577 posts

Posted 15 July 2012 - 10:30 PM

Have you checked CPU temperatures? I had a similar problem when the heatsink fan got jammed. Also run a memory testing utility; you can use the Parted Magic live CD for that.
Definitely sounds like a hardware problem if it exists in Linux and Windows.
registered Linux user number 324659  || The importance of Reading The *Fine* Manual! :D

#6 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 15 July 2012 - 10:41 PM

What could I use to check the CPU temperatures? I have GNOME System Monitor installed and use it to check CPU usage, but I don't see a temperature readout.
I'll give the Parted Magic live CD a try to check the memory, if I still have it.
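Besides desktop monitors, the Linux kernel exposes whatever temperature sensors its hwmon drivers detect under /sys/class/hwmon. The sketch below is a minimal illustration under stated assumptions, not a PCLinuxOS tool: it assumes a sensor driver for the board or CPU is loaded, and relies on the hwmon convention that each temp*_input file holds millidegrees Celsius. The helper names show_temps and c_to_f are hypothetical.

```shell
#!/bin/sh
# Sketch: print every temperature the kernel's hwmon drivers expose.
# Assumes a sensor driver is loaded; hwmon reports millidegrees Celsius.

# Convert a Celsius value to Fahrenheit, one decimal place.
c_to_f() {
    awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}

show_temps() {
    found=0
    # Newer kernels put the files directly in hwmonN/, older ones in hwmonN/device/.
    for f in /sys/class/hwmon/hwmon*/temp*_input \
             /sys/class/hwmon/hwmon*/device/temp*_input; do
        [ -r "$f" ] || continue               # an unmatched glob stays literal; skip it
        milli=$(cat "$f" 2>/dev/null) || continue
        c=$(awk -v m="$milli" 'BEGIN { printf "%.1f", m / 1000 }')
        echo "$f: $c C ($(c_to_f "$c") F)"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no hwmon temperature sensors found" >&2
    fi
    return 0
}

show_temps
```

If the lm-sensors package is installed, its 'sensors' command prints the same readings with human-readable labels, which is usually easier than reading raw sysfs files.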


#7 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 15 July 2012 - 11:17 PM

I am presently running a memory test from the latest openSUSE RC, and everything was okay until test #7 [random number sequence], where I got over 1,000,000 error bits. So far, 46% has passed.


#8 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 15 July 2012 - 11:34 PM

Actually, the CPU temperature is 42 C (107 F),
the M/B temperature is 36 C (96 F),
and the CPU fan speed is 2207 RPM.

I should add that I haven't seen the messages show up this evening, nor has the computer crashed. But I haven't pushed the machine either.


#9 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 16 July 2012 - 12:13 AM

I just finished redoing the same Memtest 86 v4.20 test, this time using a Parted Magic live CD. Test #7 had no errors, and test #8 [modulo 20, Random Pattern] also went flawlessly. So this time it passed 100% without any errors.
So who knows what is going on right now.


#10 sunrat

    Discussion Deity

  • Forum Moderators
  • 3,577 posts

Posted 16 July 2012 - 02:36 AM

réjean, on 15 July 2012 - 11:34 PM, said:

Actually the CPU temperature is 42 C,107 F
the M/B temperature is 36 C, 96 F
and the CPU Fan Speed is 2207 RPM.

I should had that I haven't seen the messages showing up this evening nore has the computer crashed. But I haven't forced the machine either.
Temps are normal. Memtest errors are a worry. Try running it for a few hours or overnight. Intermittent errors are hard to diagnose.
Computers have a curious habit of crashing when you need them most, and working perfectly when you have time to troubleshoot. :angry: :rolleyes:

#11 ross549

    I live here.

  • Forum Admins
  • 7,620 posts

Posted 16 July 2012 - 04:34 AM

réjean, on 15 July 2012 - 08:50 PM, said:

So would you say the problem is the cpu and how could I retrieve the info running the message through 'mcelog --ascii'? I've tried a whole lot of variations in a console without any luck?

mcelog --ascii > log.txt

That should drop a file in your current location with the output from mcelog.
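One hedged caveat on that command: according to the mcelog man page, --ascii decodes machine-check text that mcelog reads from standard input, so run exactly as shown it will most likely just sit waiting for input from the keyboard. The kernel's messages have to be fed in, either from a file where they were saved or from dmesg. A minimal sketch, assuming mcelog is installed; the function name decode_mce and the default output name log.txt are illustrative, and reading dmesg may require root on some systems.

```shell
#!/bin/sh
# Sketch: feed kernel machine-check messages to mcelog's ASCII decoder.
# mcelog --ascii reads the raw text on standard input, so it must be piped in.
# Assumes mcelog is installed; dmesg may need root on some systems.

# Usage: decode_mce [saved-messages-file] [output-file]
decode_mce() {
    input="$1"
    out="${2:-log.txt}"
    if ! command -v mcelog >/dev/null 2>&1; then
        echo "mcelog is not installed" >&2
        return 127
    fi
    if [ -n "$input" ]; then
        # Decode a file where the kernel messages were saved.
        if [ ! -r "$input" ]; then
            echo "cannot read $input" >&2
            return 1
        fi
        mcelog --ascii < "$input" > "$out"
    else
        # Otherwise decode whatever is still in the kernel ring buffer.
        dmesg | mcelog --ascii > "$out"
    fi
}
```

Called with no arguments, decode_mce writes its result to log.txt in the current directory. Whether there is anything for mcelog to decode depends on what the kernel actually logged.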

Adam
I don't suffer from insanity, I enjoy it.

#12 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 16 July 2012 - 11:40 AM

This morning I am not seeing the error messages, but I had the hardest time connecting to the Internet. My wife kept getting a message about some conflict in the ipconfig, or something to that effect. Anyway, here is what I get, Ross:
[rejean@localhost ~]$ mcelog --ascii > log.txt
bash: mcelog: command not found
[rejean@localhost ~]$ su
Password:
[root@localhost rejean]# mcelog --ascii > log.txt
bash: mcelog: command not found
[root@localhost rejean]#


Edited by réjean, 16 July 2012 - 12:10 PM.



#13 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 16 July 2012 - 12:09 PM

I just had a look at my wife's computer, and here is the exact message she is getting:

Quote

There is an IP address conflict with another system on the network.

And today the clock is off by six and a half hours.
I am passing this info along in case it helps find out what the main problem is.

Edited by réjean, 16 July 2012 - 01:27 PM.



#14 ross549

    I live here.

  • Forum Admins
  • 7,620 posts

Posted 16 July 2012 - 04:19 PM

If the command is not found, try installing mcelog through your favorite package manager and then run it again.
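A minimal sketch of that check, assuming an apt-based package manager (Debian's apt, or the apt-rpm that PCLinuxOS uses behind Synaptic). The helper name need_mcelog is hypothetical, and the package may be named differently, or not packaged at all, on a given distribution.

```shell
#!/bin/sh
# Sketch: check whether mcelog is available before trying to use it.
# Assumes an apt-based package manager; the package name may differ by distro.

need_mcelog() {
    if command -v mcelog >/dev/null 2>&1; then
        echo "mcelog found at $(command -v mcelog)"
        return 0
    fi
    # Not on the PATH: suggest installing it (installation needs root).
    echo "mcelog not found; as root, try: apt-get update && apt-get install mcelog"
    return 1
}
```

Running need_mcelog either prints where mcelog lives or prints the install command to try; it does not install anything itself.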


#15 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 16 July 2012 - 04:53 PM

It's not there. I had thought about it yesterday but couldn't find it. I looked again in Synaptic, and it is not listed.


#16 sunrat

    Discussion Deity

  • Forum Moderators
  • 3,577 posts

Posted 16 July 2012 - 08:56 PM

You're on PCLOS, correct? I'm not sure if you have apt-cache, but in Debian you can do:
roger@brain:~$ apt-cache search mcelog
mcelog - x86 Machine Check Exceptions collector and decoder
apt-cache search will find the term in package name or description, so if the package is named differently it should still find it.

Quote

There is an IP address conflict with another system on the network.
I get this one occasionally too (shared house with up to 8 computers), mainly in Windows 7. The most likely cause is two computers with the same static IP address trying to connect at the same time, or DHCP assigning an address and a computer with that same static address joining later. Make sure all computers on the network use dynamic addresses with DHCP, and this shouldn't happen.

#17 réjean

    Discussion Deity

  • Forum MVP
  • 3,984 posts

Posted 16 July 2012 - 09:01 PM

It didn't find anything:
[rejean@localhost ~]$ apt-cache search mcelog
[rejean@localhost ~]$


after searching for a good minute.
Thanks for the info about DHCP. I am pretty sure I always use dynamic addresses, but I'll check again.




