[pacman-dev] pacman-disowned

Jeremy Heiner

29 Sep 2013

4:53 p.m.

Greetings, everyone! My name is Jeremy. Pleased to make your acquaintance.

The "Pacman Tips" wiki page provides a pacman-disowned shell script and suggests running it periodically. I completely agree: I have found it both helpful and educational.

I've played around with the script, trying to squeeze more information into its output. Just so you can get a sense of what direction my exploration led me, I've attached my Scala implementation. To be clear: I am *not* suggesting this Scala program as a replacement for pacman-disowned. I can't even guarantee that my Scala program does anything useful, and I eschew any liability should you ever run it :). I've only included it in case you're wondering where my head is at.

What do I mean by "squeeze more info in"? Well, it would be fairly easy to tweak the shell script to list missing files (even though you can get that info using "-Qk"). So my Scala program outputs a line for each file that indicates if it is missing or if it is not owned by a package: it combines the info from both "-Qk" and pacman-disowned.

Anyway, I've been thinking about what it would take to bake this functionality into pacman itself. Now I'm certain many of you are already objecting: "it's better to layer this on top than to bake it in". And I concede that is a powerful objection. Still, I can't see any harm in exploring the idea. I'm benefiting by learning about the insides of pacman, even if my code is not adopted.

I must also admit that I have not got to the point of implementing my Scala stuff in C yet. I do have five-ish patches ready to post here, but they are all about preparing for eventually adding that functionality. It is possible that I might never post a patch that attains my end goal, but still some of these preliminary patches might prove useful.

The first clump of patches I will post add unit tests for "-Qk". My ideas will (eventually) require big changes to check.c, and I wouldn't want to break anything there. This involved adding a new "hook" mechanism to the test framework, so that a package could be installed, then mangled by removing items, all in the generation phase so that the "-Qk" output could be verified in the test phase.

Thanks in advance for your time evaluating my offerings, and for any feedback you have,
Jeremy

Attachments:

installed.scala (application/octet-stream — 2.0 KB)


Allan McRae

30 Sep

midnight

On 30/09/13 02:53, Jeremy Heiner wrote:

...

Greetings, everyone! My name is Jeremy. Pleased to make your acquaintance.

The "Pacman Tips" wiki page provides a pacman-disowned shell script and suggests running it periodically. I completely agree: I have found it both helpful and educational.

You do realize that you have not described what pacman-disowned is and that it is not a part of the pacman project? But based on what was below, I guess it finds files that are not tracked by pacman?

If that is the case, I would be happy for it to go into contrib/, but I don't see this as a future feature for pacman. Unless someone can convince me that a package manager should deal with files it does not track.

Allan

Jeremy Heiner

12:53 a.m.

Hi, Allan,

You correctly surmised the purpose of the script. Sorry for not being more explicit about that. I agree that a package manager bears no responsibility for files it does not track. But I also think it is very helpful to be able to easily tell exactly which files are being dealt with by the package manager and which files got onto the system by other means. And the package manager is the only thing that is able to provide that information.

What I'm thinking of is an enhanced "-Qkk". It would provide the same info (in a more compact form) for managed files, but it would also let you know about any unmanaged files it finds within the directories it is responsible for. Does that make a little more sense?

Jeremy

On Sun, Sep 29, 2013 at 8:00 PM, Allan McRae <allan@archlinux.org> wrote:

...

On 30/09/13 02:53, Jeremy Heiner wrote:

...
Greetings, everyone! My name is Jeremy. Pleased to make your acquaintance.

The "Pacman Tips" wiki page provides a pacman-disowned shell script and suggests running it periodically. I completely agree: I have found it both helpful and educational.

You do realize that you have not described what pacman-disowned is and that it is not a part of the pacman project? But based on what was below, I guess it finds files that are not tracked by pacman?

If that is the case, I would be happy for it to go into contrib/, but I don't see this as a future feature for pacman. Unless someone can convince me that a package manager should deal with files it does not track.

Allan

Allan McRae

1 Oct

2:39 a.m.

On 30/09/13 10:53, Jeremy Heiner wrote:

...

Hi, Allan,

You correctly surmised the purpose of the script. Sorry for not being more explicit about that. I agree that a package manager bears no responsibility for files it does not track. But I also think it is very helpful to be able to easily tell exactly which files are being dealt with by the package manager and which files got onto the system by other means. And the package manager is the only thing that is able to provide that information.

What I'm thinking of is an enhanced "-Qkk". It would provide the same info (in a more compact form) for managed files, but it would also let you know about any unmanaged files it finds within the directories it is responsible for. Does that make a little more sense?

Can you give some example output so that we can understand what "enhanced -Qkk" means?

Also, look at "pacman -Qo /home" and note it is tracked in Arch Linux. There are a lot of unmanaged files in that directory... How will that be managed?

Allan

Jeremy Heiner

2 Oct

12:55 a.m.

First, the /home question...

Everything under an unmanaged dir is guaranteed(*) also to be unmanaged. So only the immediate contents of /home would be listed. That's actually one of the mistakes the shell script makes: it runs find, so it could potentially waste a lot of time digging deep into the filesystem. My Scala prototype only ever examines the contents of managed dirs. That is all that's needed to answer the question "where is all the unmanaged stuff on my filesystem?".

(*) Actually the approach works even if this is relaxed (i.e. managed items within unmanaged dirs). My prototype examines all dirs containing managed items and all parents of all such dirs - still way fewer than find would.

My thoughts on the output are still nebulous. But I'm envisioning something like `ls -l`...

The "-rwx" part would indicate status. The first letter could be 'p' (or '-') to indicate managed (or not). Next maybe 'f' (or '-') if it exists in the filesystem (or not), and an 'm' if there is an mtree signature. Then one letter for each of the mtree tests: Uid, Gid, Mode, Time, Kind, Link, Size, ... where the capital letter indicates failure and '-' indicates success. I'm sure I'll remember more to add later.

The next column (like "size" for ls) could show the number of packages claiming ownership. My prototype actually lists all the owners instead of just counting, but that's maybe a "-verbose" option. I include the count because a 1 here is boring so you want a way to filter those entries out.

Add the item name at the end and you get:

    pf-------- 94 /etc/
    pfm-------  1 /etc/arch-release
    -f--------  0 /etc/crackdict
    pfm---T---  1 /etc/crypttab
    -f--------  0 /etc/group-
    p---------  1 /etc/motd

Something along those lines. The unit tests for -Qk and -Qkk are almost ready to go, but real life stuff keeps interrupting... tomorrow, I hope.

Jeremy
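To make the traversal idea concrete, here is a minimal Python sketch. It is not the attached Scala prototype. It assumes the set of managed paths is already known (for instance, collected from the package file lists), and `find_unmanaged`, `root` and `managed` are hypothetical names chosen for illustration. The point it shows is that only managed directories are ever listed, so an unmanaged directory is reported once and never descended into.

```python
import os


def find_unmanaged(root, managed):
    """Return unmanaged entries found directly inside managed directories.

    `managed` is a set of paths relative to `root`. That representation is
    an assumption of this sketch, not something pacman provides as-is.
    """
    unmanaged = []
    # Only directories that are tracked get listed; "" stands for root itself.
    managed_dirs = [""] + sorted(
        p for p in managed if os.path.isdir(os.path.join(root, p))
    )
    for rel_dir in managed_dirs:
        for name in sorted(os.listdir(os.path.join(root, rel_dir))):
            rel = os.path.join(rel_dir, name) if rel_dir else name
            if rel not in managed:
                # Reported once; whatever lies beneath it is never examined.
                unmanaged.append(rel)
    return unmanaged
```

With `etc` and `home` managed, a stray `home/user` tree shows up as a single entry however deep it goes, which is the saving over running find.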

Allan McRae

3 Oct

2:38 a.m.

On 02/10/13 10:55, Jeremy Heiner wrote:

...

First, the /home question...

Everything under an unmanaged dir is guaranteed(*) also to be unmanaged. So only the immediate contents of /home would be listed. That's actually one of the mistakes the shell script makes: it runs find, so it could potentially waste a lot of time digging deep into the filesystem. My scala prototype only ever examines the contents of managed dirs. That is all that's needed to answer the question "where is all the unmanaged stuff on my filesystem?".

(*) Actually the approach works even if this is relaxed (i.e. managed items within unmanaged dirs). My prototype examines all dirs containing managed items and all parents of all such dirs - still way fewer than find would.

My thoughts on the output are still nebulous. But I'm envisioning something like `ls -l`...

The "-rwx" part would indicate status. The first letter could be 'p' (or '-') to indicate managed (or not). Next maybe 'f' (or '-') if it exists in the filesystem (or not), and an 'm' if there is an mtree signature. Then one letter for each of the mtree tests: Uid, Gid, Mode, Time, Kind, Link, Size, ... where the capital letter indicates failure and '-' indicates success. I'm sure I'll remember more to add later.

The next column (like "size" for ls) could show the number of packages claiming ownership. My prototype actually lists all the owners instead of just counting, but that's maybe a "-verbose" option. I include the count because a 1 here is boring so you want a way to filter those entries out.

Add the item name at the end and you get:

    pf-------- 94 /etc/
    pfm-------  1 /etc/arch-release
    -f--------  0 /etc/crackdict
    pfm---T---  1 /etc/crypttab
    -f--------  0 /etc/group-
    p---------  1 /etc/motd

Something along those lines. The unit tests for -Qk and -Qkk are almost ready to go, but real life stuff keeps interrupting... tomorrow, I hope.

I am very against that style of output. I want it to be clear what the change is without having to decipher a code.

Allan
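For readers weighing the two styles, the compact status column from the quoted sample can be spelled out in words. The Python sketch below assumes the ten-character layout implied by that sample: managed, exists, mtree, then the seven tests in the order Uid, Gid, Mode, Time, Kind, Link, Size. `describe_status` is a hypothetical helper and its wording is invented for illustration; none of this is actual pacman output.

```python
# Order of the per-file tests as listed in the quoted proposal.
TESTS = ["Uid", "Gid", "Mode", "Time", "Kind", "Link", "Size"]


def describe_status(flags):
    """Translate a status string such as 'pfm---T---' into plain words."""
    if len(flags) != 3 + len(TESTS):
        raise ValueError("expected %d characters" % (3 + len(TESTS)))
    parts = [
        "managed" if flags[0] == "p" else "unmanaged",
        "present" if flags[1] == "f" else "missing",
        "mtree" if flags[2] == "m" else "no mtree",
    ]
    # A letter in a test slot marks a failed check; '-' means it passed.
    failed = [name for name, ch in zip(TESTS, flags[3:]) if ch != "-"]
    if failed:
        parts.append("failed: " + ", ".join(failed))
    return "; ".join(parts)
```

For example, `describe_status("pfm---T---")` yields `managed; present; mtree; failed: Time`.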

Jeremy Heiner

11:48 p.m.

On Wed, Oct 2, 2013 at 10:38 PM, Allan McRae <allan@archlinux.org> wrote:

...

I am very against that style of output. I want it to be clear what the change is without having to decipher a code.

Like I said, the output format isn't something I've put much thought into. The "ls -l" style is just something that is so ubiquitous that I thought it would be easy to grok. Any suggestions for a style of output would be great. But, perhaps, discussing output format might be a bit premature.

The thing I want to put thinking time into right now is the use case scenario. The motivation. Part of system maintenance should be comparing pacman's idea of the filesystem with the actual contents of the filesystem. And part of that is keeping an eye out for stray files in managed dirs. I've identified check.c as the place for this as it already iterates over the package files and mtrees. Details like output format can be settled later. But are there any major pitfalls here that I am just not seeing? (It certainly wouldn't be the first time that's happened to me ;)

Thanks,
Jeremy

Allan McRae

4 Oct

4:40 a.m.

On 04/10/13 09:48, Jeremy Heiner wrote:

...

On Wed, Oct 2, 2013 at 10:38 PM, Allan McRae <allan@archlinux.org> wrote:

...
I am very against that style of output. I want it to be clear what the change is without having to decipher a code.

Like I said, the output format isn't something I've put much thought into. The "ls -l" style is just something that is so ubiquitous that I thought it would be easy to grok. Any suggestions for a style of output would be great. But, perhaps, discussing output format might be a bit premature.

The thing I want to put thinking time into right now is the use case scenario. The motivation. Part of system maintenance should be comparing pacman's idea of the filesystem with the actual contents of the filesystem. And part of that is keeping an eye out for stray files in managed dirs. I've identified check.c as the place for this as it already iterates over the package files and mtrees. Details like output format can be settled later. But are there any major pitfalls here that I am just not seeing? (It certainly wouldn't be the first time that's happened to me ;)

OK... I am mildly convinced that this should be part of pacman rather than in a separate tool. But, I do not think it should be part of -Qk or -Qkk. These check that what is listed in the local database is correct. Looking for untracked files is a separate task and should be treated as such.

Allan

Jeremy Heiner

2:25 p.m.

I'm using -Qkkk right now (easy hack, minimal footprint), but, like the output format, that can easily be tweaked.

One reason I keep associating this new find-untracked feature with the existing '--check's is that they are algorithmic cousins. From the controller (query.c) point of view these 3 features are called in basically identical ways. And from the implementation (check.c) view they all have the same shape (for each file in list call 1 or more predicates). However, I'm definitely not implying that implementation details should dictate user interface.

But there seems to be a deeper reason. It's rooted in the use case. Consider what actions the user must do to achieve the goal. At a minimum(*) they must invoke pacman twice. Just -Qk isn't enough because it ignores the mtrees. And just -Qkk checks nothing for packages without an mtree. Am I wrong to think that adding another step is the wrong direction to go to help the user achieve their goal?

So I want to advocate for a solution that does all the steps in a single invocation. I don't want to remove the ability to run the steps independently. In fact, I think it makes a lot of sense for the output of the single invocation to be very terse, providing the 10,000 foot view, and the user needs to re-invoke (w/ different args) for more detail on any problems noted in the overview.

(*) Are there other steps that should be folded in? My brain is so down in the weeds of the implementation right now that I don't completely trust my view of the trees, much less the forest.
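The "same shape" remark (for each file in a list, call one or more predicates) can be illustrated with a few lines of Python. This is only a sketch of that shape, not a rendering of check.c; `run_checks` and the `(name, function)` pairing are assumptions made for the example.

```python
def run_checks(paths, predicates):
    """Apply every named predicate to every path.

    `predicates` is a list of (name, function) pairs, where each function
    takes a path and returns True when the check passes. Only paths with
    at least one failing check appear in the result.
    """
    report = {}
    for path in paths:
        failed = [name for name, check in predicates if not check(path)]
        if failed:
            report[path] = failed
    return report
```

A caller could pass `[("exists", os.path.lexists)]` for a basic presence check and append further predicates for stricter runs; a terse overview would then just be `len(report)`.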

郑文辉(Techlive Zheng)

5:26 p.m.

2013/10/4 Jeremy Heiner <scalaprotractor@gmail.com>:

...

I'm using -Qkkk right now (easy hack, minimal footprint), but like the output format that can easily be tweaked.

One reason I keep associating this new find untracked feature with the existing '--check's is that they are algorithmic cousins. From the controller (query.c) point of view these 3 features are called in basically identical ways. And from the implementation (check.c) view they all have the same shape (for each file in list call 1 or more predicates). However, I'm definitely not implying that implementation details should dictate user interface.

But there seems to be a deeper reason. It's rooted in the use case. Consider what actions the user must do to achieve the goal. At a minimum(*) they must invoke pacman twice. Just -Qk isn't enough because it ignores the mtrees. And just -Qkk checks nothing for packages without an mtree. Am I wrong to think that adding another step is the wrong direction to go to help the user achieve their goal?

So I want to advocate for a solution that does all the steps in a single invocation. I don't want to remove the ability to run the steps independently. In fact, I think it makes a lot of sense for the output of the single invocation to be very terse, providing the 10,000 foot view, and the user needs to re-invoke (w/ different args) for more detail on any problems noted in the overview.

(*)Are there other steps that should be folded in? My brain is so down in the weeds of the implementation right now that I don't completely trust my view of the trees, much less the forest.

Yeah, I second your proposal. I used to combine `find` and `pacman -Qo` to accomplish this, but it is too time-consuming. Having this as a built-in feature and an option would be really great.

郑文辉(Techlive Zheng)

5:30 p.m.

2013/10/5 郑文辉(Techlive Zheng) <techlivezheng@gmail.com>:

...

2013/10/4 Jeremy Heiner <scalaprotractor@gmail.com>:

...
I'm using -Qkkk right now (easy hack, minimal footprint), but like the output format that can easily be tweaked.

One reason I keep associating this new find untracked feature with the existing '--check's is that they are algorithmic cousins. From the controller (query.c) point of view these 3 features are called in basically identical ways. And from the implementation (check.c) view they all have the same shape (for each file in list call 1 or more predicates). However, I'm definitely not implying that implementation details should dictate user interface.

But there seems to be a deeper reason. It's rooted in the use case. Consider what actions the user must do to achieve the goal. At a minimum(*) they must invoke pacman twice. Just -Qk isn't enough because it ignores the mtrees. And just -Qkk checks nothing for packages without an mtree. Am I wrong to think that adding another step is the wrong direction to go to help the user achieve their goal?

So I want to advocate for a solution that does all the steps in a single invocation. I don't want to remove the ability to run the steps independently. In fact, I think it makes a lot of sense for the output of the single invocation to be very terse, providing the 10,000 foot view, and the user needs to re-invoke (w/ different args) for more detail on any problems noted in the overview.

(*)Are there other steps that should be folded in? My brain is so down in the weeds of the implementation right now that I don't completely trust my view of the trees, much less the forest.

Yeah, I second your proposal. I used to combine `find` and `pacman -Qo` to accomplish this, but it is too time-consuming. Having this as a built-in feature and an option would be really great.

As for the output format, anything parsable for later processing would be fine.

Dave Reisner

5 Oct

1:15 a.m.

On Fri, Oct 04, 2013 at 10:25:01AM -0400, Jeremy Heiner wrote:

...

I'm using -Qkkk right now (easy hack, minimal footprint), but like the output format that can easily be tweaked.

One reason I keep associating this new find untracked feature with the existing '--check's is that they are algorithmic cousins. From the controller (query.c) point of view these 3 features are called in basically identical ways. And from the implementation (check.c) view they all have the same shape (for each file in list call 1 or more predicates). However, I'm definitely not implying that implementation details should dictate user interface.

But there seems to be a deeper reason. It's rooted in the use case. Consider what actions the user must do to achieve the goal. At a minimum(*) they must invoke pacman twice. Just -Qk isn't enough because it ignores the mtrees. And just -Qkk checks nothing for packages without an mtree. Am I wrong to think that adding another step is the wrong direction to go to help the user achieve their goal?

So I want to advocate for a solution that does all the steps in a single invocation. I don't want to remove the ability to run the steps independently. In fact, I think it makes a lot of sense for the output of the single invocation to be very terse, providing the 10,000 foot view, and the user needs to re-invoke (w/ different args) for more detail on any problems noted in the overview.

(*)Are there other steps that should be folded in? My brain is so down in the weeds of the implementation right now that I don't completely trust my view of the trees, much less the forest.

So, I'm still not convinced that this belongs in pacman. The package manager manages *packages* and the files that belong to them. That they're algorithmically similar doesn't really appeal to me -- it's about problem domain.

In addition, the local DB and the files are structured in such a way that they're extremely inefficient at lookups of this nature. As you've yet to post any code, output, or performance numbers, I'm going to blindly guess that this is a *long* operation. You could, of course, restructure the data to make it quicker to search.

I'm not against the idea in principle, but I really don't see why it needs to be in pacman. For fun, I cobbled together the attached shell script which eschews some accuracy for speed. I'm sure it could be improved. Currently, it runs in 4 seconds on my machine.

Cheers,
Dave
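On the point about restructuring the data for quicker searches: one option for an external tool is to read the file lists once and keep an in-memory index from path to owning packages. The Python sketch below assumes input lines in the `pkgname /path` form printed by `pacman -Ql`, with directories carrying a trailing slash. `build_owner_index` is a hypothetical name, and this is not the attached shell script.

```python
from collections import defaultdict


def build_owner_index(ql_lines):
    """Map each listed path to the set of packages that claim it."""
    index = defaultdict(set)
    for line in ql_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Package names contain no spaces, so split on the first one only.
        pkg, _, path = line.partition(" ")
        # Normalise directories ("/etc/") and files to the same key style.
        index[path.rstrip("/") or "/"].add(pkg)
    return index
```

Membership tests such as `"/etc/crackdict" in index` are then constant-time, and `len(index["/etc"])` gives the owner count discussed earlier in the thread.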

Ashley Whetter

10:16 a.m.

On 2013-10-05 02:15, Dave Reisner wrote:

...

On Fri, Oct 04, 2013 at 10:25:01AM -0400, Jeremy Heiner wrote:

...
I'm using -Qkkk right now (easy hack, minimal footprint), but like the output format that can easily be tweaked.

One reason I keep associating this new find untracked feature with the existing '--check's is that they are algorithmic cousins. From the controller (query.c) point of view these 3 features are called in basically identical ways. And from the implementation (check.c) view they all have the same shape (for each file in list call 1 or more predicates). However, I'm definitely not implying that implementation details should dictate user interface.

But there seems to be a deeper reason. It's rooted in the use case. Consider what actions the user must do to achieve the goal. At a minimum(*) they must invoke pacman twice. Just -Qk isn't enough because it ignores the mtrees. And just -Qkk checks nothing for packages without an mtree. Am I wrong to think that adding another step is the wrong direction to go to help the user achieve their goal?

So I want to advocate for a solution that does all the steps in a single invocation. I don't want to remove the ability to run the steps independently. In fact, I think it makes a lot of sense for the output of the single invocation to be very terse, providing the 10,000 foot view, and the user needs to re-invoke (w/ different args) for more detail on any problems noted in the overview.

(*)Are there other steps that should be folded in? My brain is so down in the weeds of the implementation right now that I don't completely trust my view of the trees, much less the forest.

So, I'm still not convinced that this belongs in pacman. The package manager manages *packages* and the files that belong to them. That they're algorithmically similar doesn't really appeal to me -- it's about problem domain.

In addition, the local DB and the files are structured in such a way that they're extremely inefficient at lookups of this nature. As you've yet to post any code, output, or performance numbers, I'm going to blindly guess that this is a *long* operation. You could, of course, restructure the data to make it quicker to search.

I'm not against the idea in principle, but I really don't see why it needs to be in pacman. For fun, I cobbled together the attached shell script which eschews some accuracy for speed. I'm sure it could be improved. Currently, it runs in 4 seconds on my machine.

Cheers, Dave

I agree with Dave here. I don't think this functionality is really "core" to pacman and what it does. I do, however, think it would be a really useful function and that therefore it should go into contrib. One of the reasons pacman is so great is because it does a good job of keeping the filesystem clean. But it's still a package manager, rather than a filesystem manager.

I'm also going to argue that whatever we do here, we should do the same with pacdiff. The reasons for this new functionality being in pacman, contrib, whatever are likely going to be the same as pacdiff because it's also dealing with files that aren't part of a package. In fact pacdiff might even have more reason to go into pacman than this new functionality because it's only dealing with files created by pacman itself.

Ashley

Jeremy Heiner

1:13 p.m.

On Sat, Oct 5, 2013 at 6:16 AM, Ashley Whetter <ashley@awhetter.co.uk> wrote:

...

I agree with Dave here. I don't think this functionality is really "core" to pacman and what it does. I do, however, think it would be a really useful function and that therefore it should go into contrib. One of the reasons pacman is so great is because it does a good job of keeping the filesystem clean. But it's still a package manager, rather than a filesystem manager.

I'm also going to argue that whatever we do here, we should do the same with pacdiff. The reasons for this new functionality being in pacman, contrib, whatever are likely going to be the same as pacdiff because it's also dealing with files that aren't part of a package. In fact pacdiff might even have more reason to go into pacman than this new functionality because it's only dealing with files created by pacman itself.

Ashley

Hi, Ashley, Thanks for the feedback.

This point isn't just to you, but to everyone drawing that solid line separating package manager from filesystem manager. I totally get that certain functions fall clearly to one side of that line or the other. And the benefits to a software project of being able to define what is inside and what is outside its scope are obvious. But because packages are fairly useless until they get unpacked into a filesystem I can't see that line as being as solid as people are suggesting. I see interdependencies and functions that straddle that line and gray areas that don't fit neatly into the rigidly drawn boundaries.

Of course, the lines do need to be drawn. And I know I don't get to do that in this case. What I can do is suggest again that boundaries which align to the tasks the user needs to accomplish are in many ways preferable to those that align to the computer's tasks.

And regarding pacdiff, I think there are some very significant differences. You pointed out the biggest one: that pacman itself creates and logs the creation of the pacnew files. That is a huge advantage to pacdiff and makes its task very unlike the task of finding files that got onto the system by who knows what crazy unpredictable reason or accident.

Jeremy

Jeremy Heiner

12:03 p.m.

On Fri, Oct 4, 2013 at 9:15 PM, Dave Reisner <d@falconindy.com> wrote:

...

So, I'm still not convinced that this belongs in pacman. The package manager manages *packages* and the files that belong to them. That they're algorithmically similar doesn't really appeal to me -- it's about problem domain.

Hi, Dave, Thank you for your response.

I want to clarify that I never meant to cite algorithmic similarity as justification for this feature's inclusion within pacman. The only justifications I have offered for this are based on the use case analysis. I'm sorry my words were confusing. I was only referencing the algorithms to indicate where the feature seemed best to fit so as to maximize the potential for code reuse.

I agree that this decision is very much about the problem domain. And I've tried to provide justification by examining the user's goal of performing system maintenance. I would very much like to hear any thoughts you have about that.

...

In addition, the local DB and the files are structured in such a way that they're extremely inefficient at lookups of this nature. As you've yet to post any code, output, or performance numbers, I'm going to blindly guess that this is a *long* operation. You could, of course, restructure the data to make it quicker to search.

My original post (sept.29) has an attachment containing my prototype. Somehow it got marked as non-text (?) but it is just Scala source code. Sorry, it's not documented at all. But it's perfectly obvious to me :). I would be happy to answer any questions.

I don't think the local db structure is much of a factor because the time reading+parsing 1 or 2 files per package is swamped by reading the filesystem. And if you count up the filesystem accesses, this new feature is on par with "pacman -Qk", so not that long at all.

...

I'm not against the idea in principle, but I really don't see why it needs to be in pacman. For fun, I cobbled together the attached shell script which eschews some accuracy for speed. I'm sure it could be improved. Currently, it runs in 4 seconds on my machine.

Note that unmanaged files may appear in non-leaf managed dirs, but such files are not reported by this script.

Thanks again,
Jeremy

Dave Reisner

4:52 p.m.

On Sat, Oct 05, 2013 at 08:03:43AM -0400, Jeremy Heiner wrote:

...

On Fri, Oct 4, 2013 at 9:15 PM, Dave Reisner <d@falconindy.com> wrote:

...
So, I'm still not convinced that this belongs in pacman. The package manager manages *packages* and the files that belong to them. That they're algorithmically similar doesn't really appeal to me -- it's about problem domain.

Hi, Dave, Thank you for your response.

I want to clarify that I never meant to cite algorithmic similarity as justification for this feature's inclusion within pacman. The only justifications I have offered for this are based on the use case analysis. I'm sorry my words were confusing. I was only referencing the algorithms to indicate where the feature seemed best to fit so as to maximize the potential for code reuse.

I agree that this decision is very much about the problem domain. And I've tried to provide justification by examining the user's goal of performing system maintenance. I would very much like to hear any thoughts you have about that.

We're getting a bit into the abstract here. Pacman is just *one* of the tools used as part of maintaining a system. You surely wouldn't propose that we merge functionality into pacman that allows it to do full system backups, fsck devices, or scrub raid arrays. I think it's equally strange that you'd propose that pacman directly deal with the files that it knows the least about.

...

...
In addition, the local DB and the files are structured in such a way that they're extremely inefficient at lookups of this nature. As you've yet to post any code, output, or performance numbers, I'm going to blindly guess that this is a *long* operation. You could, of course, restructure the data to make it quicker to search.

My original post (sept.29) has an attachment containing my prototype. Somehow it got marked as non-text (?) but it is just Scala source code. Sorry, it's not documented at all. But it's perfectly obvious to me :). I would be happy to answer any questions.

The fact that there are now 2 concise implementations which take little from pacman itself is fairly solid evidence that this belongs in contrib.

...

I don't think the local db structure is much of a factor because the time reading+parsing 1 or 2 files per package is swamped by reading the filesystem. And if you count up the filesystem accesses, this new feature is on par with "pacman -Qk", so not that long at all.

...
I'm not against the idea in principle, but I really don't see why it needs to be in pacman. For fun, I cobbled together the attached shell script which eschews some accuracy for speed. I'm sure it could be improved. Currently, it runs in 4 seconds on my machine.

Note that unmanaged files may appear in non-leaf managed dirs, but such files are not reported by this script. Thanks again, Jeremy

Hence my comment about eschewing accuracy for speed...

Jeremy Heiner

8:04 p.m.

Hi, Dave,

You raise good points. I shall do my best to answer them, but first I would very much like to hear your thoughts on...

What are the responsibilities of pacman regarding managed directories versus files/links? The mtrees enable -Qkk to report altered files/links. But what things should count as altered for dirs? Is it only (as currently implemented) the removal of the entire dir? Why doesn't adding to a dir count as an alteration?

So I don't feel my proposal is fairly characterized as having pacman deal with files it is not responsible for. It owns those managed dirs and is responsible for them. It knows exactly what it put into those dirs, and any additions constitute a divergence of the filesystem from pacman's database image of it.

There are 3 implementations if you count the original "pacman-disowned". But they all fail to meet the requirements of the scenario I described previously. You've mentioned the longish run time for this feature, and that is in fact motivation for including it in the pacman binary. Any external script would need to invoke pacman three times (-Ql, -Qk, -Qkk) in order to meet the requirements. It does not seem advantageous to force the user to wait three times longer than necessary.

Looking forward to your response,
Jeremy

Allan McRae

6 Oct

1:10 a.m.

On 06/10/13 06:04, Jeremy Heiner wrote:

...

There are 3 implementations if you count the original "pacman-disowned". But they all fail to meet the requirements of the scenario I described previously. You've mentioned the longish run time for this feature, and that is in fact motivation for including it in the pacman binary. Any external script would need to invoke pacman three times (-Ql, -Qk, -Qkk) in order to meet the requirements. It does not seem advantageous to force the user to wait three times longer than necessary.

Why call both -Qk and -Qkk? So let's say any external script calls -Ql and -Qkk. Two things. If checking for unowned files becomes part of pacman, it _will not_ be part of -Qkk, so both -Qkk and -Q --unowned will need to be called. Two things... Also, I think this is the key factor to excluding this from pacman.

...

From your implementation:

    for ( item <- List( "dev", "proc", "run", "sys", "tmp",
        // these look wrong, but see .head & .tail below
        "certs/etc/ssl", "locale/usr/share", "pkg/var/cache/pacman" ) ) {

I assume these are exclusion lists. They will always be incomplete and non-portable, so they need to be configurable. We are not having such configuration in pacman.conf. This all points to an external script.

Allan

Jeremy Heiner

2:42 a.m.

On Sat, Oct 5, 2013 at 9:10 PM, Allan McRae <allan@archlinux.org> wrote:

...

Why call both -Qk and -Qkk? So let's say any external script calls -Ql and -Qkk. Two things.

Hi, Allan, 35% of the packages I have installed lack an mtree. And -Qkk performs no checks at all for those. Sure, I guess the external script can use the -Ql to fill in the gaps that -Qkk didn't check. But that's just shifting the work around: one fewer exec of pacman, but someone's got to perform those checks. If I were writing the script I would just as soon let -Qk take care of it for me instead of reimplementing something pacman can do for me.
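The 35% figure is specific to Jeremy's machine. A rough way to get the equivalent number elsewhere is sketched below, assuming the local database keeps one directory per installed package with an optional `mtree` file inside it (typically under /var/lib/pacman/local). `count_mtree_coverage` is a hypothetical helper name, not something pacman provides.

```sh
#!/bin/sh
# Sketch only: count package entries that have no mtree file.
#
# count_mtree_coverage DBDIR
#   DBDIR: directory holding one subdirectory per installed package
#          (assumed layout; typically /var/lib/pacman/local)
# Prints "<missing> <total>".
count_mtree_coverage() {
    total=0
    missing=0
    for pkgdir in "$1"/*/; do
        # If the glob matched nothing, the literal pattern is left behind; skip it.
        [ -d "$pkgdir" ] || continue
        total=$((total + 1))
        # pkgdir already ends in a slash because of the glob.
        [ -f "${pkgdir}mtree" ] || missing=$((missing + 1))
    done
    echo "$missing $total"
}
```

Calling `count_mtree_coverage /var/lib/pacman/local` prints two numbers; dividing the first by the second gives the share of packages for which -Qkk has nothing to check against.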

...

If checking for unowned files becomes part of pacman, it _will not_ be part of -Qkk, so both -Qkk and -Q --unowned will need to be called. Two things...

The consequence of this decision is that the user will experience twice the necessary run time, or three times if they want to also check packages lacking mtrees.

...

Also, I think this is the key factor to excluding this from pacman. From your implementation:

    for ( item <- List( "dev", "proc", "run", "sys", "tmp",
        // these look wrong, but see .head & .tail below
        "certs/etc/ssl", "locale/usr/share", "pkg/var/cache/pacman" ) ) {

I assume these are exclusion lists. They will always be incomplete and non-portable so need to be configurable. We are not having such configuration in pacman.conf.

Those are only there because it's my script that I use on my system and I've hacked it to suit my needs. It's just a prototype, not the final product. I have not suggested, nor would I ever, that these be incorporated into pacman. A simple "|grep -v exclusion-pattern" is all that would be required to achieve what my hack does, and would certainly be the preferable solution for pacman. Jeremy
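The "|grep -v" idea can be made slightly more reusable by keeping the exclusions in a file outside pacman.conf. The following is a minimal sketch of that approach, not anything pacman ships: `filter_excluded` and the pattern file are hypothetical, and the patterns are ordinary extended regular expressions chosen by the user.

```sh
#!/bin/sh
# Sketch only: drop reported paths that match a user-maintained exclusion list.
#
# filter_excluded PATTERN_FILE
#   Reads paths on stdin, one per line.
#   PATTERN_FILE holds one extended regular expression per line.
filter_excluded() {
    # grep exits with status 1 when every line was filtered out;
    # treat that as success and only propagate real errors (status 2 or higher).
    grep -v -E -f "$1" || [ $? -eq 1 ]
}
```

With a pattern file containing lines such as `^/dev(/|$)` and `^/tmp(/|$)`, the output of any unowned-file listing can be piped through `filter_excluded patterns.txt`. Anchoring each pattern at a path boundary keeps an exclusion for /dev from also hiding /devel.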

Xyne

4:56 p.m.

On 2013-10-05 16:04 -0400 Jeremy Heiner wrote:

...

Any external script would need to invoke pacman three times (-Ql, -Qk, -Qkk) in order to meet the requirements. It does not seem advantageous to force the user to wait three times longer than necessary.

What is preventing someone from writing a dedicated, independent tool in C that hooks into alpm? Wouldn't that achieve the same speed without including this in Pacman? This is a genuine question. I do not know if alpm is sufficiently modular to support this. Regards, Xyne

Andrew Gregory

5:22 p.m.

On 10/06/13 at 04:56pm, Xyne wrote:

...

On 2013-10-05 16:04 -0400 Jeremy Heiner wrote:

...
Any external script would need to invoke pacman three times (-Ql, -Qk, -Qkk) in order to meet the requirements. It does not seem advantageous to force the user to wait three times longer than necessary.

What is preventing someone from writing a dedicated, independent tool in C that hooks into alpm? Wouldn't that achieve the same speed without including this in Pacman?

This is a genuine question. I do not know if alpm is sufficiently modular to support this.

Regards, Xyne

You mean like this? https://github.com/andrewgregory/pacreport It does more than just find unowned files, but that could easily be extracted. apg

Jeremy Heiner

7:54 p.m.

On Sun, Oct 6, 2013 at 1:22 PM, Andrew Gregory <andrew.gregory.8@gmail.com> wrote:

...

You mean like this? https://github.com/andrewgregory/pacreport

Thank you, Andrew! I wish I had known about that earlier! Thanks also to Xyne for asking exactly the right question. This will very likely be the end of this thread, which I have no objection to. I've said what I needed to about how -Qk and -Qkk don't quite meet the needs of one looking for the list of discrepancies between pacman's db and the filesystem. I would still very much like to hear Dave's and Allan's and anyone else's thoughts on the extent of the responsibilities pacman bears for managed directories. But I have a path forward that circumvents the need for any unsettling changes to the pacman code base, so I am satisfied. Actually, it seems Andrew has already coded it up, so I am happy.

Jeremy

Jeremy Heiner

30 Sep

1:07 a.m.

Sigh... I waited 8 hours for my original post to work its way to my inbox so I could send in patches as replies to it... But apparently gmail had other plans. Sorry about that! Anybody with more experience with gmail's quirks know what I did wrong? Jeremy

Florian Pritz

9:51 a.m.

On 30.09.2013 03:07, Jeremy Heiner wrote:

...

Sigh... I waited 8 hours for my original post to work its way to my inbox so I could send in patches as replies to it... But apparently gmail had other plans. Sorry about that! Anybody with more experience with gmail's quirks know what I did wrong?

Gmail removes "duplicate" mails. So if you send a mail to an ML the mail is already in your sent folder and gmail won't show it anywhere else. That check is based on the message id of the mail.

Jeremy Heiner

11:03 a.m.

Thank you, Florian. That explains the unexpectedly long 8-hour "delay". I thought maybe since that was my first post ever to the list that it was being held pending moderator approval (as an anti-spam measure) or something. But it was just gmail hiding my "original duplicate" from me, until Allan's reply arrived, then showing it as part of the thread.

But I'm still confused by the patches I sent as replies to my original showing up as new top-level threads. It seems that gmail trashed the in-reply-to chain when I edited the subject? That seems drastic (if that's what's happening), and makes me wonder: how do I properly submit multi-patch patches using gmail if I can't edit the subject line?

Jeremy

On Mon, Sep 30, 2013 at 5:51 AM, Florian Pritz <bluewind@xinu.at> wrote:

...

Gmail removes "duplicate" mails. So if you send a mail to an ML the mail is already in your sent folder and gmail won't show it anywhere else.

That check is based on the message id of the mail.

Andrew Gregory

11:49 a.m.

On 09/30/13 at 07:03am, Jeremy Heiner wrote:

...

Thank you, Florian. That explains the unexpectedly long 8 hour "delay". I thought maybe since that was my first post ever to the list that it was being held pending moderator approval (as an anti-spam measure) or something. But it was just gmail hiding my "original duplicate" from me, until Allan's reply arrived then showing it as part of the thread.

But I'm still confused by the patches I sent as replies to my original showing up as new top-level threads. It seems that gmail trashed the in-reply-to chain when I edited the subject? That seems drastic (if that's what's happening), and makes me wonder: how do I properly submit multi-patch patches using gmail if I can't edit the subject line? Jeremy

Don't use gmail for patches, ever; it mangles them. Use `git send-email`. If you look at your patches, it broke them by wrapping a few long lines. apg

Jeremy Heiner

12:10 p.m.

Thanks, Andrew. My apologies to everyone. Consider my two patches rescinded. I'll figure out how to configure git to send email and then resubmit. Sorry!

Jeremy

On Mon, Sep 30, 2013 at 7:49 AM, Andrew Gregory <andrew.gregory.8@gmail.com> wrote:

...

Don't use gmail for patches, ever; it mangles them. Use `git send-email`. If you look at your patches, it broke them by wrapping a few long lines.

apg

Karol Blazewicz

12:13 p.m.

On Mon, Sep 30, 2013 at 2:10 PM, Jeremy Heiner <scalaprotractor@gmail.com> wrote:

...

Thanks, Andrew. My apologies to everyone. Consider my two patches rescinded. I'll figure out how to configure git to send email and then resubmit. Sorry! Jeremy

https://wiki.archlinux.org/index.php/Super_Quick_Git_Guide

Please stop top-posting.

Florian Pritz

12:14 p.m.

On 30.09.2013 14:10, Jeremy Heiner wrote:

...

My apologies to everyone. Consider my two patches rescinded. I'll figure out how to configure git to send email and then resubmit.

You can use gmail (the service), you just can't use the web interface.

    git config --global sendemail.smtpserver /usr/bin/msmtp
    git config --global sendemail.suppresscc self
    git config --global sendemail.chainreplyto false

https://wiki.archlinux.org/index.php/Msmtp#Quick_start
https://wiki.archlinux.org/index.php/Super_Quick_Git_Guide#Sending_patches

PS: Please don't top-post (yes, the gmail web interface doesn't make that easy, afaik).

participants (9)

Allan McRae
Andrew Gregory
Ashley Whetter
Dave Reisner
Florian Pritz
Jeremy Heiner
Karol Blazewicz
Xyne
郑文辉(Techlive Zheng)