Module 'devicehealth' has failed: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Bug #1964322 reported by sascha arthur
This bug affects 2 people
Affects: ceph (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: James Page

Bug Description

# ceph -s

...
health: HEALTH_ERR
            Module 'devicehealth' has failed: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
..

# dpkg -l | grep ceph
ii ceph-base 17.1.0-0ubuntu1 amd64 common ceph daemon libraries and management tools
ii ceph-common 17.1.0-0ubuntu1 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-mds 17.1.0-0ubuntu1 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 17.1.0-0ubuntu1 amd64 manager for the ceph distributed file system
ii ceph-mgr-modules-core 17.1.0-0ubuntu1 all ceph manager modules which are always enabled
ii ceph-mon 17.1.0-0ubuntu1 amd64 monitor server for the ceph storage system
ii ceph-osd 17.1.0-0ubuntu1 amd64 OSD server for the ceph storage system
ii ceph-volume 17.1.0-0ubuntu1 all tool to facilidate OSD deployment
ii libcephfs2 17.1.0-0ubuntu1 amd64 Ceph distributed file system client library
ii libsqlite3-mod-ceph 17.1.0-0ubuntu1 amd64 SQLite3 VFS for Ceph
ii python3-ceph-argparse 17.1.0-0ubuntu1 amd64 Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 17.1.0-0ubuntu1 all Python 3 utility libraries for Ceph
ii python3-cephfs 17.1.0-0ubuntu1 amd64 Python 3 libraries for the Ceph libcephfs library

# ls -alh /usr/bin/python3
lrwxrwxrwx 1 root root 10 Jan 13 16:58 /usr/bin/python3 -> python3.10

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Jammy Jellyfish (development branch)
Release: 22.04
Codename: jammy

2022-03-08T17:58:18.807+0000 7f7734ff9640 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 338, in serve
    if self.db_ready() and self.enable_monitoring:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1189, in db_ready
    return self.db is not None
  File "/usr/share/ceph/mgr/mgr_module.py", line 1201, in db
    self._db = self.open_db()
  File "/usr/share/ceph/mgr/mgr_module.py", line 1178, in open_db
    self.create_mgr_pool()
  File "/usr/share/ceph/mgr/mgr_module.py", line 1084, in create_mgr_pool
    self.rename_pool(devhealth, self.MGR_POOL_NAME)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1048, in rename_pool
    self.check_mon_command(c)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1560, in check_mon_command
    r = HandleCommandResult(*self.mon_command(cmd_dict, inbuf))
  File "/usr/share/ceph/mgr/mgr_module.py", line 1577, in mon_command
    self.send_command(result, "mon", "", json.dumps(cmd_dict), "", inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1641, in send_command
    self._ceph_send_command(result, svc_type, svc_id, command, tag, inbuf)
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

A possible fix is here: https://github.com/ceph/ceph/pull/44112
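
For context: since Python 3.10, C extension code that uses '#' format units in calls such as PyArg_ParseTuple() or Py_BuildValue() must define PY_SSIZE_T_CLEAN before including Python.h, otherwise the interpreter raises the SystemError shown above. The sketch below is only a rough heuristic for spotting candidate source files; it is not how the linked pull request works. The function name is hypothetical, and the check cannot see a macro supplied through compiler flags or a shared header.

```python
import re

# Heuristic patterns; they only look at the text of a single source file.
INCLUDE_PYTHON_H = re.compile(r'^\s*#\s*include\s*[<"]Python\.h[>"]', re.MULTILINE)
DEFINE_CLEAN = re.compile(r'^\s*#\s*define\s+PY_SSIZE_T_CLEAN\b', re.MULTILINE)


def missing_ssize_t_clean(source: str) -> bool:
    """Return True if the file includes Python.h with no earlier PY_SSIZE_T_CLEAN define."""
    include = INCLUDE_PYTHON_H.search(source)
    if include is None:
        return False  # does not include Python.h, nothing to check
    define = DEFINE_CLEAN.search(source)
    # The define only takes effect if it appears before the include.
    return define is None or define.start() > include.start()
```

A file flagged this way is only a candidate: it matters in practice only if the file also uses a '#' format unit.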

sascha arthur (sarthur)
description: updated
sascha arthur (sarthur) wrote :

The balancer module is affected as well:

Module 'balancer' has failed: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

sascha arthur (sarthur) wrote :

# ceph telemetry status
Error EINVAL: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
James Page (james-page) wrote :

Cherry picked and pushed to git ready for the next upload to 22.04 development.

Changed in ceph (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
sascha arthur (sarthur) wrote :

Thanks @james-page, any ETA for the fix being on the apt mirrors?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 17.1.0-0ubuntu3

---------------
ceph (17.1.0-0ubuntu3) jammy; urgency=medium

  * d/p/py310-py-ssize-t-compat.patch: Cherry pick fix to resolve
    compatibility issues with Python 3.10 (LP: #1964322).
  * d/ceph-osd.postinst: apply sysctl tuning for ceph-osd daemons
    on installation (LP: #1903221).
  * d/control: Drop use of google-perftools on armhf (LP: #1812179).

 -- James Page <email address hidden> Tue, 22 Mar 2022 10:22:37 +0000

Changed in ceph (Ubuntu):
status: In Progress → Fix Released
sascha arthur (sarthur) wrote :

~ # dpkg -l | grep ceph
ii ceph-base 17.1.0-0ubuntu3 amd64 common ceph daemon libraries and management tools
ii ceph-common 17.1.0-0ubuntu3 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-mds 17.1.0-0ubuntu3 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 17.1.0-0ubuntu3 amd64 manager for the ceph distributed file system
ii ceph-mgr-modules-core 17.1.0-0ubuntu3 all ceph manager modules which are always enabled
ii ceph-mon 17.1.0-0ubuntu3 amd64 monitor server for the ceph storage system
ii ceph-osd 17.1.0-0ubuntu3 amd64 OSD server for the ceph storage system
ii ceph-volume 17.1.0-0ubuntu3 all tool to facilidate OSD deployment
ii libcephfs2 17.1.0-0ubuntu3 amd64 Ceph distributed file system client library
ii libsqlite3-mod-ceph 17.1.0-0ubuntu3 amd64 SQLite3 VFS for Ceph
ii python3-ceph-argparse 17.1.0-0ubuntu3 amd64 Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 17.1.0-0ubuntu3 all Python 3 utility libraries for Ceph
ii python3-cephfs 17.1.0-0ubuntu3 amd64 Python 3 libraries for the Ceph libcephfs library

~ # ceph telemetry status
Error EINVAL: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

The fix doesn't work.

Changed in ceph (Ubuntu):
status: Fix Released → Confirmed
sascha arthur (sarthur) wrote :

Hm, interesting. It may be fixed after all.

But it needs an SSH reconnect or a reboot after installing the new packages.

Changed in ceph (Ubuntu):
status: Confirmed → Fix Released
sascha arthur (sarthur) wrote :

Even after a reboot/restart, the bug comes back under seemingly random circumstances.

~ # dpkg -l | grep ceph
ii ceph-base 17.1.0-0ubuntu3 amd64 common ceph daemon libraries and management tools
ii ceph-common 17.1.0-0ubuntu3 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-mds 17.1.0-0ubuntu3 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 17.1.0-0ubuntu3 amd64 manager for the ceph distributed file system
ii ceph-mgr-modules-core 17.1.0-0ubuntu3 all ceph manager modules which are always enabled
ii ceph-mon 17.1.0-0ubuntu3 amd64 monitor server for the ceph storage system
ii ceph-osd 17.1.0-0ubuntu3 amd64 OSD server for the ceph storage system
ii ceph-volume 17.1.0-0ubuntu3 all tool to facilidate OSD deployment
ii libcephfs2 17.1.0-0ubuntu3 amd64 Ceph distributed file system client library
ii libsqlite3-mod-ceph 17.1.0-0ubuntu3 amd64 SQLite3 VFS for Ceph
ii python3-ceph-argparse 17.1.0-0ubuntu3 amd64 Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 17.1.0-0ubuntu3 all Python 3 utility libraries for Ceph
ii python3-cephfs 17.1.0-0ubuntu3 amd64 Python 3 libraries for the Ceph libcephfs library

~ # uptime
 20:55:23 up 52 min, 2 users, load average: 0.04, 0.13, 0.22

~ # ceph crash ls
Error EINVAL: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

~ # ceph telemetry status
Error EINVAL: SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Any ideas?

Changed in ceph (Ubuntu):
status: Fix Released → Confirmed
Chris MacNaughton (chris.macnaughton) wrote :

@sarthur: Have you restarted all of the Ceph daemons? The packages take pains not to restart the services during a package upgrade. An intermittent failure like this sounds like the commands work when talking to newer daemons and fail when talking to older ones.
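
One way to check a host for daemons that are still running pre-upgrade code is to look for mapped files that have since been replaced on disk; Linux marks those with a " (deleted)" suffix in /proc/<pid>/maps. This is a minimal sketch under stated assumptions, not a packaged tool: the function names are hypothetical, the "ceph-" process name prefix and the ignored path prefixes are assumptions, and reading another user's maps file usually requires root.

```python
from pathlib import Path
from typing import Dict, Set

# Assumption: mappings under these prefixes are shared memory, not upgraded files.
IGNORED_PREFIXES = ("/memfd:", "/dev/", "/SYSV")
DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text: str) -> Set[str]:
    """Return paths of mapped files that were removed or replaced on disk."""
    deleted = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname; pathname may be absent.
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue
        path = fields[5]
        if not path.startswith("/") or path.startswith(IGNORED_PREFIXES):
            continue
        if path.endswith(DELETED_SUFFIX):
            deleted.add(path[: -len(DELETED_SUFFIX)])
    return deleted


def stale_ceph_processes(proc_root: str = "/proc") -> Dict[int, Set[str]]:
    """Map PID to deleted mappings for each process whose name starts with 'ceph-'."""
    stale = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            name = (entry / "comm").read_text().strip()
            if not name.startswith("ceph-"):
                continue
            found = deleted_mappings((entry / "maps").read_text())
        except OSError:
            continue  # process exited or we lack permission
        if found:
            stale[int(entry.name)] = found
    return stale
```

Running stale_ceph_processes() on every node after the upgrade and restarting whatever it lists would be one way to apply the advice above. An empty result on one host says nothing about the other hosts.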

sascha arthur (sarthur) wrote :

Hey @chris.macnaughton,

Yes, I restarted all daemons and even rebooted the whole machine, as you can see in my console logs above. The cluster has more than 10 nodes, and I tried it on multiple hosts.

It was working at some point, but now it seems fully broken again, which is weird.

The mgr daemons are broken (the balancer module, for example), and so are CLI tools like "ceph crash ls" and "ceph telemetry status".

I wonder where this non-deterministic factor is coming from. Some dynamic loading of Python?

sascha arthur (sarthur) wrote :

That was a lot of trouble, but I traced it down.

So basically the fix is working.

The cause was an 'old' daemon on another host that had not been restarted.

What is very confusing is that an old daemon can break not only itself (which is expected) but also the CLI tools on another host. I did not expect that, and it confused me badly. Sorry for the confusing report.

Thanks for solving it.

Finally closed.

Changed in ceph (Ubuntu):
status: Confirmed → Fix Released
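
A note on why an old daemon on one host can break the CLI on another: commands such as "ceph crash ls" and "ceph telemetry status" are answered by modules running inside the active ceph-mgr, so the host you type them on matters less than the host where the active manager runs. The sketch below pulls the active and standby manager names out of the JSON printed by "ceph mgr dump". The function name is hypothetical, and the field names "active_name" and "standbys" are how I recall that output looking, so verify them on your release.

```python
import json


def mgr_daemons(mgr_dump_json: str) -> dict:
    """Split the managers in a 'ceph mgr dump' JSON document into active and standby."""
    dump = json.loads(mgr_dump_json)
    return {
        # An empty or missing name means no manager is currently active.
        "active": dump.get("active_name") or None,
        "standbys": sorted(s["name"] for s in dump.get("standbys", [])),
    }
```

Every manager it lists, active or standby, needs a restart after the upgrade, since a standby that still runs old code can take over later.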