v0.72.3 Emperor (pending release)

Upgrading

  • Monitor ‘auth’ read-only commands now expect the user to have ‘rx’ caps. This is the same behavior that was present in dumpling, but in emperor and more recent development releases the ‘r’ cap was sufficient. Note that this backported security fix will break mon keys that are using the following commands but do not have the ‘x’ bit in the mon capability:

    1. ceph auth export
    2. ceph auth get
    3. ceph auth get-key
    4. ceph auth print-key
    5. ceph auth list

v0.72.2 Emperor

This is the second bugfix release for the v0.72.x Emperor series. We have fixed a hang in radosgw, and fixed (again) a problem with monitor CLI compatibility with mixed version monitors. (In the future this will no longer be a problem.)

Upgrading

  • The JSON schema for the ‘osd pool set …’ command changed slightly. Please avoid issuing this particular command via the CLI while there is a mix of v0.72.1 and v0.72.2 monitor daemons running.

  • As part of fix for #6796, ‘ceph osd pool set <pool> <var> <arg>’ now receives <arg> as an integer instead of a string. This affects how ‘hashpspool’ flag is set/unset: instead of ‘true’ or ‘false’, it now must be ‘0’ or ‘1’.

Changes

  • mon: ‘osd pool set …’ syntax change

  • osd: added test for missing on-disk HEAD object

  • osd: fix osd bench block size argument

  • rgw: fix hang on large object GET

  • rgw: fix rare use-after-free

  • rgw: various DR bug fixes

  • rgw: do not return error on empty owner when setting ACL

  • sysvinit, upstart: prevent starting daemons using both init systems

For more detailed information, see the complete changelog.

v0.72.1 Emperor

Important Note

When you are upgrading from Dumpling to Emperor, do not run any of the “ceph osd pool set” commands while your monitors are running separate versions. Doing so could result in inadvertently changing cluster configuration settings that exhaust compute resources in your OSDs.

Changes

  • osd: fix upgrade bug #6761

  • ceph_filestore_tool: introduced tool to repair errors caused by #6761

This release addresses issue #6761. Upgrading to Emperor can cause reads to begin returning ENFILE (too many open files). v0.72.1 fixes that upgrade issue and adds a tool ceph_filestore_tool to repair osd stores affected by this bug.

To repair a cluster affected by this bug:

  1. Upgrade all osd machines to v0.72.1

  2. Install the ceph-test package on each osd machine to get ceph_filestore_tool

  3. Stop all osd processes

  4. To see all lost objects, run the following on each osd with the osd stopped and the osd data directory mounted:

    1. ceph_filestore_tool --list-lost-objects=true --filestore-path=<path-to-osd-filestore> --journal-path=<path-to-osd-journal>
  5. To fix all lost objects, run the following on each osd with the osd stopped and the osd data directory mounted:

    1. ceph_filestore_tool --fix-lost-objects=true --list-lost-objects=true --filestore-path=<path-to-osd-filestore> --journal-path=<path-to-osd-journal>
  6. Once lost objects have been repaired on each osd, you can restart the cluster.

Note, the ceph_filestore_tool performs a scan of all objects on the osd and may take some time.

v0.72 Emperor

This is the fifth major release of Ceph, the fourth since adopting a 3-month development cycle. This release brings several new features, including multi-datacenter replication for the radosgw, improved usability, and lands a lot of incremental performance and internal refactoring work to support upcoming features in Firefly.

Important Note

When you are upgrading from Dumpling to Emperor, do not run any of the “ceph osd pool set” commands while your monitors are running separate versions. Doing so could result in inadvertently changing cluster configuration settings that exhaust compute resources in your OSDs.

Highlights

  • common: improved crc32c performance

  • librados: new example client and class code

  • mds: many bug fixes and stability improvements

  • mon: health warnings when pool pg_num values are not reasonable

  • mon: per-pool performance stats

  • osd, librados: new object copy primitives

  • osd: improved interaction with backend file system to reduce latency

  • osd: much internal refactoring to support ongoing erasure coding and tiering support

  • rgw: bucket quotas

  • rgw: improved CORS support

  • rgw: performance improvements

  • rgw: validate S3 tokens against Keystone

Coincident with core Ceph, the Emperor release also brings:

  • radosgw-agent: support for multi-datacenter replication for disaster recovery

  • tgt: improved support for iSCSI via upstream tgt

Packages for both are available on ceph.com.

Upgrade sequencing

There are no specific upgrade restrictions on the order or sequence of upgrading from 0.67.x Dumpling. However, you cannot run any of the “ceph osd pool set” commands while your monitors are running separate versions. Doing so could result in inadvertently changing cluster configuration settings and exhausting compute resources in your OSDs.

It is also possible to do a rolling upgrade from 0.61.x Cuttlefish, but there are ordering restrictions. (This is the same set of restrictions for Cuttlefish to Dumpling.)

  1. Upgrade ceph-common on all nodes that will use the command line ‘ceph’ utility.

  2. Upgrade all monitors (upgrade ceph package, restart ceph-mon daemons). This can happen one daemon or host at a time. Note that because cuttlefish and dumpling monitors can’t talk to each other, all monitors should be upgraded in relatively short succession to minimize the risk that an a untimely failure will reduce availability.

  3. Upgrade all osds (upgrade ceph package, restart ceph-osd daemons). This can happen one daemon or host at a time.

  4. Upgrade radosgw (upgrade radosgw package, restart radosgw daemons).

Upgrading from v0.71

  • ceph-fuse and radosgw now use the same default values for the admin socket and log file paths that the other daemons (ceph-osd, ceph-mon, etc.) do. If you run these daemons as non-root, you may need to adjust your ceph.conf to disable these options or to adjust the permissions on /var/run/ceph and /var/log/ceph.

Upgrading from v0.67 Dumpling

  • ceph-fuse and radosgw now use the same default values for the admin socket and log file paths that the other daemons (ceph-osd, ceph-mon, etc.) do. If you run these daemons as non-root, you may need to adjust your ceph.conf to disable these options or to adjust the permissions on /var/run/ceph and /var/log/ceph.

  • The MDS now disallows snapshots by default as they are not considered stable. The command ‘ceph mds set allow_snaps’ will enable them.

  • For clusters that were created before v0.44 (pre-argonaut, Spring 2012) and store radosgw data, the auto-upgrade from TMAP to OMAP objects has been disabled. Before upgrading, make sure that any buckets created on pre-argonaut releases have been modified (e.g., by PUTing and then DELETEing an object from each bucket). Any cluster created with argonaut (v0.48) or a later release or not using radosgw never relied on the automatic conversion and is not affected by this change.

  • Any direct users of the ‘tmap’ portion of the librados API should be aware that the automatic tmap -> omap conversion functionality has been removed.

  • Most output that used K or KB (e.g., for kilobyte) now uses a lower-case k to match the official SI convention. Any scripts that parse output and check for an upper-case K will need to be modified.

  • librados::Rados::pool_create_async() and librados::Rados::pool_delete_async() don’t drop a reference to the completion object on error, caller needs to take care of that. This has never really worked correctly and we were leaking an object

  • ‘ceph osd crush set <id> <weight> <loc..>’ no longer adds the osd to the specified location, as that’s a job for ‘ceph osd crush add’. It will however continue to work just the same as long as the osd already exists in the crush map.

  • The OSD now enforces that class write methods cannot both mutate an object and return data. The rbd.assign_bid method, the lone offender, has been removed. This breaks compatibility with pre-bobtail librbd clients by preventing them from creating new images.

  • librados now returns on commit instead of ack for synchronous calls. This is a bit safer in the case where both OSDs and the client crash, and is probably how it should have been acting from the beginning. Users are unlikely to notice but it could result in lower performance in some circumstances. Those who care should switch to using the async interfaces, which let you specify safety semantics precisely.

  • The C++ librados AioComplete::get_version() method was incorrectly returning an int (usually 32-bits). To avoid breaking library compatibility, a get_version64() method is added that returns the full-width value. The old method is deprecated and will be removed in a future release. Users of the C++ librados API that make use of the get_version() method should modify their code to avoid getting a value that is truncated from 64 to to 32 bits.

Notable Changes since v0.71

  • build: fix [/usr]/sbin locations (Alan Somers)

  • ceph-fuse, radosgw: enable admin socket and logging by default

  • ceph: make -h behave when monitors are down

  • common: cache crc32c values where possible

  • common: fix looping on BSD (Alan Somers)

  • librados, mon: ability to query/ping out-of-quorum monitor status (Joao Luis)

  • librbd python bindings: fix parent image name limit (Josh Durgin)

  • mds: avoid leaking objects when deleting truncated files (Yan, Zheng)

  • mds: fix F_GETLK (Yan, Zheng)

  • mds: fix many bugs with stray (unlinked) inodes (Yan, Zheng)

  • mds: fix many directory fragmentation bugs (Yan, Zheng)

  • mon: allow (un)setting HASHPSPOOL flag on existing pools (Joao Luis)

  • mon: make ‘osd pool rename’ idempotent (Joao Luis)

  • osd: COPY_GET on-wire encoding improvements (Greg Farnum)

  • osd: bloom_filter encodability, fixes, cleanups (Loic Dachary, Sage Weil)

  • osd: fix handling of racing read vs write (Samuel Just)

  • osd: reduce blocking on backing fs (Samuel Just)

  • radosgw-agent: multi-region replication/DR

  • rgw: fix/improve swift COPY support (Yehuda Sadeh)

  • rgw: misc fixes to support DR (Josh Durgin, Yehuda Sadeh)

  • rgw: per-bucket quota (Yehuda Sadeh)

  • rpm: fix junit dependencies (Alan Grosskurth)

Notable Changes since v0.67 Dumpling

  • build cleanly under clang (Christophe Courtaut)

  • build: Makefile refactor (Roald J. van Loon)

  • build: fix [/usr]/sbin locations (Alan Somers)

  • ceph-disk: fix journal preallocation

  • ceph-fuse, radosgw: enable admin socket and logging by default

  • ceph-fuse: fix problem with readahead vs truncate race (Yan, Zheng)

  • ceph-fuse: trim deleted inodes from cache (Yan, Zheng)

  • ceph-fuse: use newer fuse api (Jianpeng Ma)

  • ceph-kvstore-tool: new tool for working with leveldb (copy, crc) (Joao Luis)

  • ceph-post-file: new command to easily share logs or other files with ceph devs

  • ceph: improve parsing of CEPH_ARGS (Benoit Knecht)

  • ceph: make -h behave when monitors are down

  • ceph: parse CEPH_ARGS env variable

  • common: bloom_filter improvements, cleanups

  • common: cache crc32c values where possible

  • common: correct SI is kB not KB (Dan Mick)

  • common: fix looping on BSD (Alan Somers)

  • common: migrate SharedPtrRegistry to use boost::shared_ptr<> (Loic Dachary)

  • common: misc portability fixes (Noah Watkins)

  • crc32c: fix optimized crc32c code (it now detects arch support properly)

  • crc32c: improved intel-optimized crc32c support (~8x faster on my laptop!)

  • crush: fix name caching

  • doc: erasure coding design notes (Loic Dachary)

  • hadoop: removed old version of shim to avoid confusing users (Noah Watkins)

  • librados, mon: ability to query/ping out-of-quorum monitor status (Joao Luis)

  • librados: fix async aio completion wakeup

  • librados: fix installed header #includes (Dan Mick)

  • librados: get_version64() method for C++ API

  • librados: hello_world example (Greg Farnum)

  • librados: sync calls now return on commit (instead of ack) (Greg Farnum)

  • librbd python bindings: fix parent image name limit (Josh Durgin)

  • librbd, ceph-fuse: avoid some sources of ceph-fuse, rbd cache stalls

  • mds: avoid leaking objects when deleting truncated files (Yan, Zheng)

  • mds: fix F_GETLK (Yan, Zheng)

  • mds: fix LOOKUPSNAP bug

  • mds: fix heap profiler commands (Joao Luis)

  • mds: fix locking deadlock (David Disseldorp)

  • mds: fix many bugs with stray (unlinked) inodes (Yan, Zheng)

  • mds: fix many directory fragmentation bugs (Yan, Zheng)

  • mds: fix mds rejoin with legacy parent backpointer xattrs (Alexandre Oliva)

  • mds: fix rare restart/failure race during fs creation

  • mds: fix standby-replay when we fall behind (Yan, Zheng)

  • mds: fix stray directory purging (Yan, Zheng)

  • mds: notify clients about deleted files (so they can release from their cache) (Yan, Zheng)

  • mds: several bug fixes with clustered mds (Yan, Zheng)

  • mon, osd: improve osdmap trimming logic (Samuel Just)

  • mon, osd: initial CLI for configuring tiering

  • mon: a few ‘ceph mon add’ races fixed (command is now idempotent) (Joao Luis)

  • mon: allow (un)setting HASHPSPOOL flag on existing pools (Joao Luis)

  • mon: allow cap strings with . to be unquoted

  • mon: allow logging level of cluster log (/var/log/ceph/ceph.log) to be adjusted

  • mon: avoid rewriting full osdmaps on restart (Joao Luis)

  • mon: continue to discover peer addr info during election phase

  • mon: disallow CephFS snapshots until ‘ceph mds set allow_new_snaps’ (Greg Farnum)

  • mon: do not expose uncommitted state from ‘osd crush {add,set} …’ (Joao Luis)

  • mon: fix ‘ceph osd crush reweight …’ (Joao Luis)

  • mon: fix ‘osd crush move …’ command for buckets (Joao Luis)

  • mon: fix byte counts (off by factor of 4) (Dan Mick, Joao Luis)

  • mon: fix paxos corner case

  • mon: kv properties for pools to support EC (Loic Dachary)

  • mon: make ‘osd pool rename’ idempotent (Joao Luis)

  • mon: modify ‘auth add’ semantics to make a bit more sense (Joao Luis)

  • mon: new ‘osd perf’ command to dump recent performance information (Samuel Just)

  • mon: new and improved ‘ceph -s’ or ‘ceph status’ command (more info, easier to read)

  • mon: some auth check cleanups (Joao Luis)

  • mon: track per-pool stats (Joao Luis)

  • mon: warn about pools with bad pg_num

  • mon: warn when mon data stores grow very large (Joao Luis)

  • monc: fix small memory leak

  • new wireshark patches pulled into the tree (Kevin Jones)

  • objecter, librados: redirect requests based on cache tier config

  • objecter: fix possible hang when cluster is unpaused (Josh Durgin)

  • osd, librados: add new COPY_FROM rados operation

  • osd, librados: add new COPY_GET rados operations (used by COPY_FROM)

  • osd: ‘osd recover clone overlap limit’ option to limit cloning during recovery (Samuel Just)

  • osd: COPY_GET on-wire encoding improvements (Greg Farnum)

  • osd: add ‘osd heartbeat min healthy ratio’ configurable (was hard-coded at 33%)

  • osd: add option to disable pg log debug code (which burns CPU)

  • osd: allow cap strings with . to be unquoted

  • osd: automatically detect proper xattr limits (David Zafman)

  • osd: avoid extra copy in erasure coding reference implementation (Loic Dachary)

  • osd: basic cache pool redirects (Greg Farnum)

  • osd: basic whiteout, dirty flag support (not yet used)

  • osd: bloom_filter encodability, fixes, cleanups (Loic Dachary, Sage Weil)

  • osd: clean up and generalize copy-from code (Greg Farnum)

  • osd: cls_hello OSD class example

  • osd: erasure coding doc updates (Loic Dachary)

  • osd: erasure coding plugin infrastructure, tests (Loic Dachary)

  • osd: experiemental support for ZFS (zfsonlinux.org) (Yan, Zheng)

  • osd: fix RWORDER flags

  • osd: fix exponential backoff of slow request warnings (Loic Dachary)

  • osd: fix handling of racing read vs write (Samuel Just)

  • osd: fix version value returned by various operations (Greg Farnum)

  • osd: generalized temp object infrastructure

  • osd: ghobject_t infrastructure for EC (David Zafman)

  • osd: improvements for compatset support and storage (David Zafman)

  • osd: infrastructure to copy objects from other OSDs

  • osd: instrument peering states (David Zafman)

  • osd: misc copy-from improvements

  • osd: opportunistic crc checking on stored data (off by default)

  • osd: properly enforce RD/WR flags for rados classes

  • osd: reduce blocking on backing fs (Samuel Just)

  • osd: refactor recovery using PGBackend (Samuel Just)

  • osd: remove old magical tmap->omap conversion

  • osd: remove old pg log on upgrade (Samuel Just)

  • osd: revert xattr size limit (fixes large rgw uploads)

  • osd: use fdatasync(2) instead of fsync(2) to improve performance (Sam Just)

  • pybind: fix blacklisting nonce (Loic Dachary)

  • radosgw-agent: multi-region replication/DR

  • rgw: complete in-progress requests before shutting down

  • rgw: default log level is now more reasonable (Yehuda Sadeh)

  • rgw: fix S3 auth with response-* query string params (Sylvain Munaut, Yehuda Sadeh)

  • rgw: fix a few minor memory leaks (Yehuda Sadeh)

  • rgw: fix acl group check (Yehuda Sadeh)

  • rgw: fix inefficient use of std::list::size() (Yehuda Sadeh)

  • rgw: fix major CPU utilization bug with internal caching (Yehuda Sadeh, Mark Nelson)

  • rgw: fix ordering of write operations (preventing data loss on crash) (Yehuda Sadeh)

  • rgw: fix ordering of writes for mulitpart upload (Yehuda Sadeh)

  • rgw: fix various CORS bugs (Yehuda Sadeh)

  • rgw: fix/improve swift COPY support (Yehuda Sadeh)

  • rgw: improve help output (Christophe Courtaut)

  • rgw: misc fixes to support DR (Josh Durgin, Yehuda Sadeh)

  • rgw: per-bucket quota (Yehuda Sadeh)

  • rgw: validate S3 tokens against keystone (Roald J. van Loon)

  • rgw: wildcard support for keystone roles (Christophe Courtaut)

  • rpm: fix junit dependencies (Alan Grosskurth)

  • sysvinit radosgw: fix status return code (Danny Al-Gaaf)

  • sysvinit rbdmap: fix error ‘service rbdmap stop’ (Laurent Barbe)

  • sysvinit: add condrestart command (Dan van der Ster)

  • sysvinit: fix shutdown order (mons last) (Alfredo Deza)

v0.71

This development release includes a significant amount of new code and refactoring, as well as a lot of preliminary functionality that will be needed for erasure coding and tiering support. There are also several significant patch sets improving this with the MDS.

Upgrading

  • The MDS now disallows snapshots by default as they are not considered stable. The command ‘ceph mds set allow_snaps’ will enable them.

  • For clusters that were created before v0.44 (pre-argonaut, Spring 2012) and store radosgw data, the auto-upgrade from TMAP to OMAP objects has been disabled. Before upgrading, make sure that any buckets created on pre-argonaut releases have been modified (e.g., by PUTing and then DELETEing an object from each bucket). Any cluster created with argonaut (v0.48) or a later release or not using radosgw never relied on the automatic conversion and is not affected by this change.

  • Any direct users of the ‘tmap’ portion of the librados API should be aware that the automatic tmap -> omap conversion functionality has been removed.

  • Most output that used K or KB (e.g., for kilobyte) now uses a lower-case k to match the official SI convention. Any scripts that parse output and check for an upper-case K will need to be modified.

Notable Changes

  • build: Makefile refactor (Roald J. van Loon)

  • ceph-disk: fix journal preallocation

  • ceph-fuse: trim deleted inodes from cache (Yan, Zheng)

  • ceph-fuse: use newer fuse api (Jianpeng Ma)

  • ceph-kvstore-tool: new tool for working with leveldb (copy, crc) (Joao Luis)

  • common: bloom_filter improvements, cleanups

  • common: correct SI is kB not KB (Dan Mick)

  • common: misc portability fixes (Noah Watkins)

  • hadoop: removed old version of shim to avoid confusing users (Noah Watkins)

  • librados: fix installed header #includes (Dan Mick)

  • librbd, ceph-fuse: avoid some sources of ceph-fuse, rbd cache stalls

  • mds: fix LOOKUPSNAP bug

  • mds: fix standby-replay when we fall behind (Yan, Zheng)

  • mds: fix stray directory purging (Yan, Zheng)

  • mon: disallow CephFS snapshots until ‘ceph mds set allow_new_snaps’ (Greg Farnum)

  • mon, osd: improve osdmap trimming logic (Samuel Just)

  • mon: kv properties for pools to support EC (Loic Dachary)

  • mon: some auth check cleanups (Joao Luis)

  • mon: track per-pool stats (Joao Luis)

  • mon: warn about pools with bad pg_num

  • osd: automatically detect proper xattr limits (David Zafman)

  • osd: avoid extra copy in erasure coding reference implementation (Loic Dachary)

  • osd: basic cache pool redirects (Greg Farnum)

  • osd: basic whiteout, dirty flag support (not yet used)

  • osd: clean up and generalize copy-from code (Greg Farnum)

  • osd: erasure coding doc updates (Loic Dachary)

  • osd: erasure coding plugin infrastructure, tests (Loic Dachary)

  • osd: fix RWORDER flags

  • osd: fix exponential backoff of slow request warnings (Loic Dachary)

  • osd: generalized temp object infrastructure

  • osd: ghobject_t infrastructure for EC (David Zafman)

  • osd: improvements for compatset support and storage (David Zafman)

  • osd: misc copy-from improvements

  • osd: opportunistic crc checking on stored data (off by default)

  • osd: refactor recovery using PGBackend (Samuel Just)

  • osd: remove old magical tmap->omap conversion

  • pybind: fix blacklisting nonce (Loic Dachary)

  • rgw: default log level is now more reasonable (Yehuda Sadeh)

  • rgw: fix acl group check (Yehuda Sadeh)

  • sysvinit: fix shutdown order (mons last) (Alfredo Deza)

v0.70

Upgrading

  • librados::Rados::pool_create_async() and librados::Rados::pool_delete_async() don’t drop a reference to the completion object on error, caller needs to take care of that. This has never really worked correctly and we were leaking an object

  • ‘ceph osd crush set <id> <weight> <loc..>’ no longer adds the osd to the specified location, as that’s a job for ‘ceph osd crush add’. It will however continue to work just the same as long as the osd already exists in the crush map.

Notable Changes

  • mon: a few ‘ceph mon add’ races fixed (command is now idempotent) (Joao Luis)

  • crush: fix name caching

  • rgw: fix a few minor memory leaks (Yehuda Sadeh)

  • ceph: improve parsing of CEPH_ARGS (Benoit Knecht)

  • mon: avoid rewriting full osdmaps on restart (Joao Luis)

  • crc32c: fix optimized crc32c code (it now detects arch support properly)

  • mon: fix ‘ceph osd crush reweight …’ (Joao Luis)

  • osd: revert xattr size limit (fixes large rgw uploads)

  • mds: fix heap profiler commands (Joao Luis)

  • rgw: fix inefficient use of std::list::size() (Yehuda Sadeh)

v0.69

Upgrading

  • The sysvinit /etc/init.d/ceph script will, by default, update the CRUSH location of an OSD when it starts. Previously, if the monitors were not available, this command would hang indefinitely. Now, that step will time out after 10 seconds and the ceph-osd daemon will not be started.

  • Users of the librados C++ API should replace users of get_version() with get_version64() as the old method only returns a 32-bit value for a 64-bit field. The existing 32-bit get_version() method is now deprecated.

  • The OSDs are now more picky that request payload match their declared size. A write operation across N bytes that includes M bytes of data will now be rejected. No known clients do this, but the because the server-side behavior has changed it is possible that an application misusing the interface may now get errors.

  • The OSD now enforces that class write methods cannot both mutate an object and return data. The rbd.assign_bid method, the lone offender, has been removed. This breaks compatibility with pre-bobtail librbd clients by preventing them from creating new images.

  • librados now returns on commit instead of ack for synchronous calls. This is a bit safer in the case where both OSDs and the client crash, and is probably how it should have been acting from the beginning. Users are unlikely to notice but it could result in lower performance in some circumstances. Those who care should switch to using the async interfaces, which let you specify safety semantics precisely.

  • The C++ librados AioComplete::get_version() method was incorrectly returning an int (usually 32-bits). To avoid breaking library compatibility, a get_version64() method is added that returns the full-width value. The old method is deprecated and will be removed in a future release. Users of the C++ librados API that make use of the get_version() method should modify their code to avoid getting a value that is truncated from 64 to to 32 bits.

Notable Changes

  • build cleanly under clang (Christophe Courtaut)

  • common: migrate SharedPtrRegistry to use boost::shared_ptr<> (Loic Dachary)

  • doc: erasure coding design notes (Loic Dachary)

  • improved intel-optimized crc32c support (~8x faster on my laptop!)

  • librados: get_version64() method for C++ API

  • mds: fix locking deadlock (David Disseldorp)

  • mon, osd: initial CLI for configuring tiering

  • mon: allow cap strings with . to be unquoted

  • mon: continue to discover peer addr info during election phase

  • mon: fix ‘osd crush move …’ command for buckets (Joao Luis)

  • mon: warn when mon data stores grow very large (Joao Luis)

  • objecter, librados: redirect requests based on cache tier config

  • osd, librados: add new COPY_FROM rados operation

  • osd, librados: add new COPY_GET rados operations (used by COPY_FROM)

  • osd: add ‘osd heartbeat min healthy ratio’ configurable (was hard-coded at 33%)

  • osd: add option to disable pg log debug code (which burns CPU)

  • osd: allow cap strings with . to be unquoted

  • osd: fix version value returned by various operations (Greg Farnum)

  • osd: infrastructure to copy objects from other OSDs

  • osd: use fdatasync(2) instead of fsync(2) to improve performance (Sam Just)

  • rgw: fix major CPU utilization bug with internal caching (Yehuda Sadeh, Mark Nelson)

  • rgw: fix ordering of write operations (preventing data loss on crash) (Yehuda Sadeh)

  • rgw: fix ordering of writes for mulitpart upload (Yehuda Sadeh)

  • rgw: fix various CORS bugs (Yehuda Sadeh)

  • rgw: improve help output (Christophe Courtaut)

  • rgw: validate S3 tokens against keystone (Roald J. van Loon)

  • rgw: wildcard support for keystone roles (Christophe Courtaut)

  • sysvinit radosgw: fix status return code (Danny Al-Gaaf)

  • sysvinit rbdmap: fix error ‘service rbdmap stop’ (Laurent Barbe)

v0.68

Upgrading

  • ‘ceph osd crush set <id> <weight> <loc..>’ no longer adds the osd to the specified location, as that’s a job for ‘ceph osd crush add’. It will however continue to work just the same as long as the osd already exists in the crush map.

  • The OSD now enforces that class write methods cannot both mutate an object and return data. The rbd.assign_bid method, the lone offender, has been removed. This breaks compatibility with pre-bobtail librbd clients by preventing them from creating new images.

  • librados now returns on commit instead of ack for synchronous calls. This is a bit safer in the case where both OSDs and the client crash, and is probably how it should have been acting from the beginning. Users are unlikely to notice but it could result in lower performance in some circumstances. Those who care should switch to using the async interfaces, which let you specify safety semantics precisely.

  • The C++ librados AioComplete::get_version() method was incorrectly returning an int (usually 32-bits). To avoid breaking library compatibility, a get_version64() method is added that returns the full-width value. The old method is deprecated and will be removed in a future release. Users of the C++ librados API that make use of the get_version() method should modify their code to avoid getting a value that is truncated from 64 to to 32 bits.

Notable Changes

  • ceph-fuse: fix problem with readahead vs truncate race (Yan, Zheng)

  • ceph-post-file: new command to easily share logs or other files with ceph devs

  • ceph: parse CEPH_ARGS env variable

  • librados: fix async aio completion wakeup

  • librados: hello_world example (Greg Farnum)

  • librados: sync calls now return on commit (instead of ack) (Greg Farnum)

  • mds: fix mds rejoin with legacy parent backpointer xattrs (Alexandre Oliva)

  • mds: fix rare restart/failure race during fs creation

  • mds: notify clients about deleted files (so they can release from their cache) (Yan, Zheng)

  • mds: several bug fixes with clustered mds (Yan, Zheng)

  • mon: allow logging level of cluster log (/var/log/ceph/ceph.log) to be adjusted

  • mon: do not expose uncommitted state from ‘osd crush {add,set} …’ (Joao Luis)

  • mon: fix byte counts (off by factor of 4) (Dan Mick, Joao Luis)

  • mon: fix paxos corner case

  • mon: modify ‘auth add’ semantics to make a bit more sense (Joao Luis)

  • mon: new ‘osd perf’ command to dump recent performance information (Samuel Just)

  • mon: new and improved ‘ceph -s’ or ‘ceph status’ command (more info, easier to read)

  • monc: fix small memory leak

  • new wireshark patches pulled into the tree (Kevin Jones)

  • objecter: fix possible hang when cluster is unpaused (Josh Durgin)

  • osd: ‘osd recover clone overlap limit’ option to limit cloning during recovery (Samuel Just)

  • osd: cls_hello OSD class example

  • osd: experiemental support for ZFS (zfsonlinux.org) (Yan, Zheng)

  • osd: instrument peering states (David Zafman)

  • osd: properly enforce RD/WR flags for rados classes

  • osd: remove old pg log on upgrade (Samuel Just)

  • rgw: complete in-progress requests before shutting down

  • rgw: fix S3 auth with response-* query string params (Sylvain Munaut, Yehuda Sadeh)

  • sysvinit: add condrestart command (Dan van der Ster)