
Arch big update caused kernel panic -- help !!!!!


abarbarian


V.T. Eric Layton

WIPE IT CLEAN!

Start nice and shiny and new again! :)

 

Oh, and learn from your mistakes. ✅


securitybreach
10 hours ago, abarbarian said:

 

Yeah, me neither of late. Apart from trying out a few programs, I have done no fiddling around at all, which is why the catastrophe is such a surprise.

 

Things got even worse.

 

I decided to try to replace the borked system with me backup. I seem to remember that I had actually done a real-life test that had worked out, so it should work, right?

It might have gone pretty smoothly if only I had not had a bright idea.

My ESP and ROOT partitions needed some adjustments. I wanted the ESP larger and the ROOT smaller, so I made the changes with GParted. Now, I only have one NVMe drive, and I use "/dev/nvme0n1p*" descriptors in both my refind.conf and fstab, so I thought that making small changes to partition sizes would not be problematic for booting. I was right and wrong in my thinking.

I used the rsync script and only gave it a quick look, as it seemed to work well.

I tried to boot and ended up with a black screen with a GRUB prompt. Hmmmmmmmmmmmm. After some deep thought, it struck me that I may have transferred the stuff correctly, but with the partition changes the initramfs-linux.img stuff would not work, so I would need to make some changes.

A long time ago, when I first tried to use chroot, I found it very hard to understand and use. Years later, like now, using chroot seemed so easy. I was just about to run "mkinitcpio -P" when I had the bright idea to run "pacman -Syu" first. Why not? I would have to upgrade sometime, so why not now? I should have waited.

On rebooting, I had exactly the same failure with my main image.

The fallback image did work. Sort of. Whilst I could get to a working login prompt, I could no longer use "startx" or "/usr/bin/wmaker" or anything else to get to my old graphical desktop. It took me quite a while to figure out what the heck was going on. Seems like my rsync script was not 100%. My boot part worked OK, my ROOT part worked OK, and my HOME part sort of worked: it placed its contents in "/home" instead of "/home/bloodaxe". Once I found the problem, a simple copy and paste brought me almost back to normal. My Firefox was messed up, but I did manage to get my bookmarks back, which was my main concern.

So here I am, after a day of beating my head against a brick wall, still in the same situation: only able to boot via the fallback image.

Thinking about it, I should not have done an update whilst in chroot. Then again, I would have ended up with a fully working setup, and how would that have helped me? After an update I would still have ended up as I am now, with no clue as to why.

 

This information below still bothers me. If the main image and the fallback image are made from the same components apart from the "autodetect" hook, then the problem must be something to do with "modules". I am going to try running mkinitcpio with autodetect left out of the build.

I hate to say it, but I am at a loss, as none of it makes any sense. At this point, I would back up my configs (home folder/hidden folders and /etc), then reinstall. When I got it up and running, I would immediately do a Clonezilla backup (or whatever you use). I know the feeling of wanting to know what the heck happened, but I think it's time to throw in the towel on this one.


securitybreach
9 hours ago, crp said:

yeah, don't use Arch.

    🥵

 

I have not had any real breakage on Arch since 2012, with the systemd move.


V.T. Eric Layton
12 hours ago, securitybreach said:

When I got it up and running, I would immediately do a Clonezilla backup (or whatever you use).

 

Good advice, BUT... make sure everything is working fine once you have it all set up BEFORE you make that backup. AND do NOT overwrite your older backup, as you may need to pick & choose goodies from that one to get your new installation the way you want it. Once you're sure that you don't need that older backup (for /home/Erik stuff, etc.), then it's safe to overwrite it with a new backup. This is assuming you are "mirroring" individual partitions onto a secondary drive. If you store your backups a different way, then do what works best for you.


V.T. Eric Layton
12 hours ago, wa4chq said:

May I suggest this?

 

"Thoughts and Prayers" won't cut it. However, an exorcism may be in order. ;)

 


abarbarian

Thanks folks for all the thoughts.

 

As to mistakes -- apart from doing an update, I did nothing.

Backups -- I'm almost sure that I did a trial run of my rsync script when I got my larger NVMe drive, and it worked. So I'm not sure why it did not work properly this time.

Clonezilla -- can only be used with same or smaller partitions - one reason why I looked to other solutions.

 

I do not have the motivation to do a new install. Life has kicked me in the teeth at this time, and fixing a broken PC is low down on my list of things to cheer meself up. I'll carry on using the fallback image for now. When I get my mojo back, I'll come up with another plan 9.

 

Cannot resist fiddling, though, so I fiddled with neofetch to cheer meself up. 😋

 

[neofetch screenshot]

 

This is a hint as to the change

 

[embedded YouTube video]


V.T. Eric Layton
1 hour ago, abarbarian said:

Life has kicked me in the teeth at this time, and fixing a broken PC is low down on my list of things to cheer meself up.

 

I can totally relate to this statement/mood.


raymac46

After reading through this thread, I thought it was high time to update Arch on my pathetic 2019 Toshiba netbook. After 477 package upgrades and a reboot, everything is just fine. Mind you, I'm using GRUB, not rEFInd.

 


abarbarian
9 hours ago, raymac46 said:

After reading through this thread, I thought it was high time to update Arch on my pathetic 2019 Toshiba netbook. After 477 package upgrades and a reboot, everything is just fine. Mind you, I'm using GRUB, not rEFInd.

 

 

I do updates at least twice a week and always make a backup before I do a Linux kernel update. So the large update was a real shock, and the subsequent catastrophe even more so.

I have looked at logs to see if I can spot anything, but with no luck. Nothing jumps out as a gremlin.

I thought it may have something to do with the strange mobo/RAM issue I had a while back. I lost all ability to use more than one stick of RAM. Then, on a whim, a few weeks later I tried a second stick of RAM and it worked. So I now have 16GB RAM, which is adequate, and I have not tried to see if all four RAM sticks work. Could all of that be hardware gremlins, or could it all have been software gremlins?

 

Whatever. I will keep on using my fallback image, as it seems to update and work. When I get my mojo back, I will have a look into mobo/RAM and OS gremlins and annihilate them.

 

Oh, and rEFInd is not part of the problem at all.

 

😎


securitybreach
34 minutes ago, abarbarian said:

 

I do updates at least twice a week and always make a backup before I do a Linux kernel update. So the large update was a real shock, and the subsequent catastrophe even more so.

I have looked at logs to see if I can spot anything, but with no luck. Nothing jumps out as a gremlin.

I thought it may have something to do with the strange mobo/RAM issue I had a while back. I lost all ability to use more than one stick of RAM. Then, on a whim, a few weeks later I tried a second stick of RAM and it worked. So I now have 16GB RAM, which is adequate, and I have not tried to see if all four RAM sticks work. Could all of that be hardware gremlins, or could it all have been software gremlins?



Whatever. I will keep on using my fallback image, as it seems to update and work. When I get my mojo back, I will have a look into mobo/RAM and OS gremlins and annihilate them.



Oh, and rEFInd is not part of the problem at all.

 

😎

 

I know it sounds strange, but a hardware issue could have caused this. Whenever it makes zero sense, it's usually a hardware issue.


  • 2 weeks later...
abarbarian

Found the little gremlin that was causing mayhem.

 

It was connected with this,

 

https://forums.scotsnewsletter.com/index.php?/topic/97641-call-depth-tracking-with-linux-62-on-skylake-cpus/

 

Found this snippet

 

https://bbs.archlinux.org/viewtopic.php?id=283924

 

which led to here

 

https://bugs.archlinux.org/task/77601

 

that had a screenshot of a failed boot that looked identical to mine.

 

So, to boot, I had this in my refind_linux.conf:

 

Quote

"Boot using default options" "root=/dev/nvme0n1p2 retbleed=stuff rw initrd=intel-ucode.img  initrd=initramfs-%v.img"

 

changed that to

 

Quote

"Boot using default options" "root=/dev/nvme0n1p2  rw initrd=intel-ucode.img  initrd=initramfs-%v.img"

 

and the jolly old Skylake booted up just fine.

 

As there are different options open to me and I do not want to run fully unprotected, I am using

 

Quote

"Boot using default options" "root=/dev/nvme0n1p2 retbleed=off  rw initrd=intel-ucode.img  initrd=initramfs-%v.img"

 

As it does not impact so much as retbleed=stuff, I may go back to it if I can see no perceptible gains with retbleed=off. That is, if it works, of course, after I have tested it out.
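For anyone repeating this on their own box, the change boils down to swapping or dropping one token in the options string. Below is a minimal Python sketch, assuming the usual refind_linux.conf layout of a quoted menu label followed by a quoted options string on one line. The function name `set_kernel_param` is made up for illustration; it is not part of rEFInd.

```python
import shlex
from typing import Optional


def set_kernel_param(conf_line: str, name: str, value: Optional[str]) -> str:
    """Rewrite one refind_linux.conf entry so kernel parameter `name` has `value`.

    Assumes the entry is a quoted menu label followed by a quoted options
    string, e.g. "Boot using default options" "root=... rw ...".
    value=None removes the parameter entirely (kernel falls back to its default).
    """
    label, options = shlex.split(conf_line)
    # Drop every existing `name=...` (or bare `name`) token.
    kept = [tok for tok in options.split() if tok.split("=", 1)[0] != name]
    if value is not None:
        # Put the new token right after root=..., or first if there is no root=.
        pos = next((i + 1 for i, tok in enumerate(kept) if tok.startswith("root=")), 0)
        kept.insert(pos, f"{name}={value}")
    joined = " ".join(kept)
    return f'"{label}" "{joined}"'


if __name__ == "__main__":
    entry = ('"Boot using default options" '
             '"root=/dev/nvme0n1p2 retbleed=stuff rw initrd=intel-ucode.img  initrd=initramfs-%v.img"')
    print(set_kernel_param(entry, "retbleed", "off"))   # the retbleed=off entry
    print(set_kernel_param(entry, "retbleed", None))    # the plain entry
```

Passing `None` reproduces the plain entry with no retbleed parameter. Runs of spaces in the options string get collapsed to one, which makes no difference to how the options are split.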

 

So, happy days again: all is good, and I tracked down yet another pesky gremlin. 😋


raymac46

Sometimes it's nice to have crummy old CPUs like AMD Beema or Intel Broadwell, which aren't affected by retbleed very much (if at all).

Edited by raymac46

abarbarian
11 hours ago, raymac46 said:

Sometimes it's nice to have crummy old CPUs like AMD Beema or Intel Broadwell, which aren't affected by retbleed very much (if at all).

 

Yup, you should never discount the value of some old stuff. This was a tricky gremlin, and I only tracked it down by pure chance. 😎


raymac46

Right now, in my dubious collection of trailing-edge and junker hardware, I have one CPU (Zen 2) that might be affected by retbleed. It is running Windows 11 right now.

AMD appears to be slightly less affected by retbleed overhead, and it's not as if I am Amazon Web Services. If I keep away from dodgy sites and use an up-to-date browser, I feel secure enough.


securitybreach
1 hour ago, raymac46 said:

Right now in my dubious collection of trailing edge and junker hardware I have one CPU (Zen 2) that might be affected by retbleed. It is running Windows 11 right now.

AMD appears to be slightly less affected by retbleed overhead and it's not as if I am Amazon Web Services. If I keep away from dodgy sites and use an up-to-date browser I feel secure enough.

 

Retbleed was fixed last year.

Quote

 

Windows is not vulnerable because the existing mitigations already tackle it.[1] Linux kernels 5.18.14 and 5.19 contain the fixes.[5][6] The 32-bit Linux kernel, which is vulnerable, will not receive updates to fix the issue.[7]

 

 

https://en.wikipedia.org/wiki/Retbleed

 


wa4chq

On 5/10/2023 at 12:05 PM, V.T. Eric Layton said:

WIPE IT CLEAN!

Start nice and shiny and new again! :)

 

Oh, and learn from your mistakes. ✅

Wipe it good...


V.T. Eric Layton

AMD Phenom II 1090 6-core CPU here with 4.4.301 kernel in Slackware 14.2. I never even heard of ratblood, redsteed, or whatever until this thread here. ;)


V.T. Eric Layton
1 hour ago, wa4chq said:

Wipe it good...

 

Hmm... suddenly there's a thump-thump-thump bass line playing in my head, and I'm picturing red flower pots for some reason.


13 hours ago, V.T. Eric Layton said:

AMD Phenom II 1090 6-core CPU here with 4.4.301 kernel in Slackware 14.2. I never even heard of ratblood, redsteed, or whatever until this thread here. ;)

Ah, the joy of old hardware. Your neighbor hands you his 8-year-old laptop, and in a couple of days you have it running as well as his new one.


securitybreach
16 hours ago, V.T. Eric Layton said:

AMD Phenom II 1090 6-core CPU here with 4.4.301 kernel in Slackware 14.2. I never even heard of ratblood, redsteed, or whatever until this thread here. ;)

 

Insane that Slackware is only at 4.4.301. That's years of missed improvements.

 

I just looked it up and it was released on January 10, 2016 https://en.wikipedia.org/wiki/Linux_kernel_version_history


V.T. Eric Layton

The above is just my system's Slackware. My kernel is up-to-date for 14.2, but Slackware 15 is running 5.15.19.


securitybreach
1 hour ago, V.T. Eric Layton said:

The above is just my system's Slackware. My kernel is up-to-date for 14.2, but Slackware 15 is running 5.15.19.

 

Ah, OK.


abarbarian
On 5/27/2023 at 4:00 PM, securitybreach said:

Retbleed was fixed last year.

 

Indeed, yes, and then they went and unfixed it.

 

This post from March 2023 led me to the truth:

 

https://bbs.archlinux.org/viewtopic.php?id=283924

 

Quote

I've encountered some problems with Vanilla and Zen kernel v6.2.1 together with the new retbleed=stuff kernel parameter. Both problems seem to occur for other users too.

 

From there I got to here:

 

https://bugs.archlinux.org/task/77601

 

Apparently the bug appeared in package version 6.2.arch1-1. Reading down the messages, it seems to have been more permanently patched by mid-February.

 

Quote

Comment by Worty (w0rty) - Friday, 24 February 2023, 15:17 GMT

Yeah that patch fixed it on my VM and host!

output of lscpu: Retbleed: Mitigation; Stuffing

Thanks for your help! I did some benchmarks and it is a few percent faster.
Hope this patch gets merged before linux hitting core.

 

However, by May 2023, with the introduction of linux 6.3.4.arch1-1, the bug is back.

 

Quote

Comment by Toolybird (Toolybird) - Saturday, 06 May 2023, 21:38 GMT

Dupe  FS#78425 

Patch hasn't been applied upstream. The Arch patch was dropped when we upgraded to 6.3.x. Maybe @heftig assumed it had been applied? Someone needs to lobby the kernel folks...

 

 

Quote

Comment by Worty (w0rty) - Monday, 15 May 2023, 22:00 GMT

Yep, can confirm its the same bug again.

 

Yup, I'll confirm that too. My solution is to run with "retbleed=off" for the moment.

 

😎
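To check which retbleed mode a running kernel actually ended up in, the kernel reports its verdict in sysfs, and that is where lscpu's "Retbleed:" line comes from (lscpu appears to print the kernel's "Mitigation:" as "Mitigation;", as in the quote above). Here is a minimal Python sketch, assuming a kernel recent enough to expose /sys/devices/system/cpu/vulnerabilities/retbleed. The helper names `retbleed_status` and `cmdline_value` are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Standard sysfs location that lscpu's "Vulnerability ..." lines are built from.
VULN_FILE = Path("/sys/devices/system/cpu/vulnerabilities/retbleed")


def retbleed_status(path: Path = VULN_FILE) -> Optional[str]:
    """Kernel's own retbleed verdict (e.g. 'Mitigation: Stuffing'), or None if absent."""
    try:
        return path.read_text().strip()
    except (FileNotFoundError, PermissionError):
        return None


def cmdline_value(cmdline: str, name: str) -> Optional[str]:
    """Value of `name=` in a kernel command line ('' for a bare flag, None if absent).

    If the parameter appears more than once, this sketch returns the last one.
    """
    found: Optional[str] = None
    for tok in cmdline.split():
        key, sep, value = tok.partition("=")
        if key == name:
            found = value if sep else ""
    return found


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    cmdline = cmdline_path.read_text() if cmdline_path.exists() else ""
    print("retbleed= on cmdline:", cmdline_value(cmdline, "retbleed"))
    print("kernel reports:", retbleed_status() or "not reported")
```

Comparing the two lines shows whether the parameter set in refind_linux.conf actually reached the kernel and what the kernel did with it. The parsing is kept separate from the file reading so it can be tried on any command line string.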

 

 

 


I honestly wonder how much effect all these "vulnerabilities" are having on the Linux universe. Are they out in the wild or just academic curiosities? Would the mitigations really mess up an old desktop Linux system that is used for light Web surfing and some office stuff? Seems to me that the risk of a kernel panic is worse than anything that Spectre can do to you.
