Hello,

"Any ideas on this or should i just keep running it. I'm just concerned this will happen again. Restore or not to restore..."

As @wizard10000 already pointed out, you reported a kernel error message:
Code:
BUG: Bad page state in process node pfn:329600
Aug 14 22:02:00 net1server kernel: page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Aug 14 22:02:00 net1server kernel: Modules linked in: dm_mod xt_REDIRECT nvidia_uvm(PO) nfsv3 nfs_acl ip_vs_rr xt_ipvs ip_vs veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_ad>
Aug 14 22:02:00 net1server kernel: configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci efivarfs qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sd_mod t10_pi hid_generic sr_mod usbhid cdrom crc64_rocksoft hid crc64 crc_t10dif crct10dif_generic>
Aug 14 22:02:00 net1server kernel: CPU: 3 PID: 3510135 Comm: node Tainted: P O 6.10.3-amd64 #1 Debian 6.10.3-1
Aug 14 22:02:00 net1server kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2023.08-4 02/15/2024

The memory management subsystem of your installed Linux kernel (version 6.10.3-amd64 - Debian 6.10.3-1) tried to free a memory page used by the process named "node" (what is it?), but according to some internal kernel checks this page was in an inconsistent state, so the kernel reported it. Memory corruption is probably the cause of the VM freeze.

Why was the memory page in an inconsistent state? That is quite difficult to say.

Your kernel reports that it is "tainted" [1]:
Code:
Tainted: P O
where:
- P - proprietary module was loaded
- O - externally-built ("out-of-tree") module was loaded

Furthermore, in your assumption quoted below:

"[..] this is the first time this error has ever happen out of the three years my cluster has been up."

you are probably not taking into account that Debian released the 6.10.3-1 Linux kernel for Debian Unstable about two weeks ago and for Debian Testing (Trixie) one week ago (2024-08-11):
Code:
linux (6.10.3-1) unstable; urgency=medium

  * New upstream stable update:
    https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.10.2
    https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.10.3
    - ext4: don't track ranges in fast_commit if inode has inlined data (Closes: #1039883)

  [ Salvatore Bonaccorso ]
  * [rt] Update to 6.10.2-rt14
    - Refresh patches and drop patches applied upstream
  * [arm64] drivers/net/ethernet/microsoft: Enable MICROSOFT_MANA as module
  * drivers/net/ethernet/pensando: Enable IONIC as module (Closes: #1041893)

  [ Vincent Blut ]
  * [arm64] drivers/phy/marvell: Enable PHY_MVEBU_CP110_UTMI as module (Closes: #1076934)

  [ Ben Hutchings ]
  * net: drop bad gso csum_start and offset in virtio_net_hdr (regression in 6.10.3)
  * spi: spidev: Add missing spi_device_id for bh2228fv (regression in 6.10.3)

 -- Ben Hutchings <benh@debian.org>  Sun, 04 Aug 2024 22:10:58 +0200

So you have been running a new kernel version for only about a week. This could of course be part of the causal chain, as it is the major change according to what you have reported so far.

Hope this helps.
--
[1] Tainted kernels
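
If you want to check the taint state yourself, the letters shown after "Tainted:" correspond to bits in the number exposed in /proc/sys/kernel/tainted. The following is only a minimal sketch, not an official tool: the table covers just three flags (P, B and O) with bit positions taken from the kernel's "Tainted kernels" documentation, the function name decode_taint is made up for illustration, and the fallback value 4097 (P + O) is an assumption used when the proc file is not available.

```python
from pathlib import Path

# Partial table: bit position -> (letter, meaning).
# Only a subset of the flags listed in the kernel's "Tainted kernels" document.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    5: ("B", "bad page referenced or some unexpected page flags"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
}


def decode_taint(mask: int) -> list[str]:
    """Return a description for each known taint bit set in mask."""
    return [
        f"{letter}: {meaning}"
        for bit, (letter, meaning) in sorted(TAINT_FLAGS.items())
        if (mask >> bit) & 1
    ]


if __name__ == "__main__":
    proc_file = Path("/proc/sys/kernel/tainted")
    # Assumed fallback: 4097 = bit 0 (P) + bit 12 (O), as in the report above.
    mask = int(proc_file.read_text()) if proc_file.exists() else 4097
    for line in decode_taint(mask):
        print(line)
```

Bits that are not in this partial table are silently ignored, so an empty output does not prove that the kernel is untainted.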
Statistics: Posted by Aki — 2024-08-17 18:58 — Replies 4 — Views 105