[arch-general] BTRFS scrub from systemd unit
Hi, folks. I've been noodling over this rather odd issue I've been having, and I thought I'd get a second opinion on things. I've got a server with four hard disks, set up in two (separate) BTRFS RAID1 mounts. I just recovered from a bout of SATA signalling issues (which BTRFS did a marvellous job of saving my data from), and I'd like to periodically fire off scrubs to ensure my data's OK. If I manually run a scrub:
----------------------------------------------------
$ sudo btrfs scrub start /media/data/
scrub started on /media/data/, fsid 490b8b7c-59c4-45dc-ac63-6a90f0966776 (pid=14434)
----------------------------------------------------
Then things work as they should:
----------------------------------------------------
$ sudo btrfs scrub status /media/data/
scrub status for 490b8b7c-59c4-45dc-ac63-6a90f0966776
scrub started at Wed Mar 19 20:27:14 2014, running for 40 seconds
total bytes scrubbed: 12.70GiB with 0 errors

$ sudo btrfs scrub cancel /media/data/
scrub cancelled

$ sudo btrfs scrub status /media/data/
scrub status for 490b8b7c-59c4-45dc-ac63-6a90f0966776
scrub started at Wed Mar 19 20:27:14 2014 and was aborted after 89 seconds
total bytes scrubbed: 28.17GiB with 0 errors
----------------------------------------------------
So far, so good. So I made a systemd unit to trigger the scrub:
----------------------------------------------------
$ cat /etc/systemd/system/diskscrub.service
[Unit]
Description=Disk scrub runner

[Service]
Type=oneshot
ExecStart=/usr/bin/btrfs scrub start /media/data
----------------------------------------------------
Now, this unit starts fine, and exits immediately as it should, being a oneshot service and given that the btrfs scrub command backgrounds itself (or something? I'm not exactly sure what it does, but it returns immediately). However, the scrub status does not work and the scrub cannot be canceled:
----------------------------------------------------
$ sudo systemctl start diskscrub

$ sudo systemctl status diskscrub
diskscrub.service - Disk scrub runner
Loaded: loaded (/etc/systemd/system/diskscrub.service; static)
Active: inactive (dead)

Mar 19 20:33:38 rat systemd[1]: Starting Disk scrub runner...
Mar 19 20:33:38 rat systemd[1]: Started Disk scrub runner.
Mar 19 20:33:38 rat btrfs[15221]: scrub started on /media/data, fsid 490b8b7c-59c4-45dc-ac63-6a90f0966776 (pid=15222)

$ sudo btrfs scrub status /media/data/
scrub status for 490b8b7c-59c4-45dc-ac63-6a90f0966776
no stats available
total bytes scrubbed: 0.00 with 0 errors

$ sudo btrfs scrub cancel /media/data/
ERROR: scrub cancel failed on /media/data/: not running
----------------------------------------------------
New scrubs cannot be started until the stale/invalid scrub states are deleted (rm -f /var/lib/btrfs/*).
So I'm stumped, here. Anyone have any clue as to what's happening?
Thanks,
--Sean
On 20-03-2014 00:41, Sean Greenslade wrote:
<SNIP>
Just a guess but you might want to change the unit type to simple instead of oneshot.
-- Mauro Santos
On Thu, Mar 20, 2014 at 01:06:18AM +0000, Mauro Santos wrote:
<SNIP>
I thought of that, but it just does the same thing. The scrub command returns after forking(?) off the real scrub process. Now, maybe if someone has a clever way of making the service detect when the scrub finishes, I could do a RemainAfterExit unit, but I can't see a way to do that.
--Sean
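One possible way to let systemd notice when the scrub finishes, which nobody tried in this thread: `btrfs scrub start` accepts `-B`, which keeps the command in the foreground until the scrub completes. A plausible reading of the symptoms (not confirmed here) is that systemd cleans up whatever is left in the unit's control group once the oneshot command returns, which would take the backgrounded scrub with it. The sketch below assumes the same mount point and a hypothetical unit name, `diskscrub-fg.service`, and writes the file to a scratch directory rather than `/etc/systemd/system/` so that nothing gets installed by accident.

```shell
#!/bin/sh
# Sketch only: writes a candidate unit file into a scratch directory.
# The unit name diskscrub-fg.service is hypothetical; copy the file to
# /etc/systemd/system/ by hand if you want to try it.
unit_dir=$(mktemp -d)
unit_file="$unit_dir/diskscrub-fg.service"

cat > "$unit_file" <<'EOF'
[Unit]
Description=Disk scrub runner (foreground)

[Service]
Type=oneshot
# -B: do not background; return only when the scrub has finished
ExecStart=/usr/bin/btrfs scrub start -B /media/data
EOF

# Show the result for review.
cat "$unit_file"
```

With the command staying in the foreground, the unit should remain in the activating state for as long as the scrub runs, so `systemctl status` reflects whether it is still going.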
On 03/19/2014 09:40 PM, Sean Greenslade wrote:
<SNIP>
Salutations,
If it's supposed to fork, you may want to switch to Type=forking.
See <http://www.freedesktop.org/software/systemd/man/systemd.service.html>
Regards, Mark
On Wed, Mar 19, 2014 at 09:42:46PM -0400, Mark Lee wrote:
<SNIP>
I did an strace on the start scrub process, but my knowledge on its output is limited. I _believe_ this line means that it is forking, but can someone else confirm this?
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ffbb9fddb50) = 713
--Sean
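For readers puzzling over the same line: on Linux with glibc, a plain fork() typically shows up in strace as a clone() call with exactly these flags, and the value after `=` is the child's PID as seen by the parent. Here is a small shell sketch of the same parent/child pattern. It uses `sleep` as a hypothetical stand-in for the scrub worker, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Stand-in for a command that forks a worker and returns at once.
# 'sleep 5' plays the role of the long-running scrub process.
sleep 5 &
worker_pid=$!          # the child's PID, as reported back to the parent

echo "worker started (pid=$worker_pid)"

# The parent is free to continue while the worker is still alive.
if kill -0 "$worker_pid" 2>/dev/null; then
    worker_state="running"
else
    worker_state="gone"
fi
echo "worker is $worker_state"

# Clean up the stand-in worker so nothing is left behind.
kill "$worker_pid" 2>/dev/null
wait "$worker_pid" 2>/dev/null || true
```

The parent carrying on while the child keeps working matches what the strace output suggests `btrfs scrub start` does.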
On 03/19/2014 10:34 PM, Sean Greenslade wrote:
<SNIP>
Salutations,
Did you try just switching to Type=forking?
Regards, Mark
On Wed, Mar 19, 2014 at 10:35:38PM -0400, Mark Lee wrote:
<SNIP>
Just tried it and it seems to be working. And if the clone line's return value is to be believed, systemd's automatic PID guesser seems to be working as well. Fantastic!
Thanks, guys.
--Sean
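The thread never shows the final unit, so this is a sketch of what it presumably looks like: the unit from the first message with only `Type=` changed. Since the stated goal was periodic scrubs, a companion timer is included as well. The timer is not from the thread; its name, the weekly schedule and `Persistent=true` are all assumptions. Both files are written to a scratch directory for review rather than installed.

```shell
#!/bin/sh
# Sketch only: writes both files into a scratch directory for review.
unit_dir=$(mktemp -d)

# The unit from the first message with the one change reported to work.
cat > "$unit_dir/diskscrub.service" <<'EOF'
[Unit]
Description=Disk scrub runner

[Service]
Type=forking
ExecStart=/usr/bin/btrfs scrub start /media/data
EOF

# Hypothetical companion timer (not from the thread). A timer activates
# the service with the same base name unless Unit= says otherwise.
cat > "$unit_dir/diskscrub.timer" <<'EOF'
[Unit]
Description=Periodic disk scrub

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
EOF

# List what was written.
ls "$unit_dir"
```

If both files were copied to `/etc/systemd/system/`, running `systemctl daemon-reload` and then `systemctl enable diskscrub.timer` and `systemctl start diskscrub.timer` should schedule the scrub. Adjust `OnCalendar=` to taste.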
participants (3)
- Mark Lee
- Mauro Santos
- Sean Greenslade