Re: [aur-dev] [aur-general] AurJson - orphan packages
2009/9/14 Pierre Chapuis <catwell@archlinux.us>:
Le Sun, 13 Sep 2009 18:23:02 +0200, Pierre Chapuis <catwell@archlinux.us> a écrit :
Hi,
I was wondering if there's a way to know if a package is orphan using the AurJson interface.
If that's not the case, I think I will find a feature request for something like:
"Uploader":"pseudonym"
in "info/results", with "pseudonym" set as "orphan" if the package is orphan (I hope nobody has taken the pseudonym "orphan" on AUR...).
Does that look fine to you?
Of course I meant: "I will file a feature request"...
Another possibility might be to add a boolean like OutOfDate, but the solution I proposed would also enable scripts to find the uploader of a package, which might be useful later. As far as I'm concerned, I don't really care since I only want to know if the package is orphan...
I'm CC-ing this to the aur-dev, as I think it belongs there. The following patch adds the Orphan field to the aurjson output. For username we would need to hit the database once more (maybe, have to think more SQL for that), but the Orphan-ness is quite straightforward to evaluate. So let's just do that first. This patch, however, might clash with one patch I sent in a few days ago [1], since that one hasn't been applied to the repo, yet. Anyway, neither of those are hard to see where they should go... Any comments? Cheers, Greg [1] http://mailman.archlinux.org/pipermail/aur-dev/2009-September/000868.html
From 2aba57717cac80f643886b25900c8d7ad14e9198 Mon Sep 17 00:00:00 2001 From: Gergely Imreh <imrehg@gmail.com> Date: Mon, 14 Sep 2009 22:28:58 +0800 Subject: [PATCH] aurjson: add "Orphan" boolean field to JSON output
Signed-off-by: Gergely Imreh <imrehg@gmail.com> --- web/lib/aurjson.class.php | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php index 29fc424..85a3b89 100644 --- a/web/lib/aurjson.class.php +++ b/web/lib/aurjson.class.php @@ -19,8 +19,9 @@ include_once("aur.inc"); class AurJSON { private $dbh = false; private $exposed_methods = array('search','info'); - private $fields = array('ID','Name','Version','CategoryID','Description', - 'LocationID', 'URL','URLPath','License','NumVotes','OutOfDate'); + private $fields = array('ID','Name','Version','MaintainerUID','CategoryID', + 'Description','LocationID', 'URL','URLPath','License','NumVotes', + 'OutOfDate'); /** * Handles post data, and routes the request. @@ -139,6 +140,8 @@ class AurJSON { if ( $result && (mysql_num_rows($result) > 0) ) { $row = mysql_fetch_assoc($result); mysql_free_result($result); + $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); + unset($row['MaintainerUID']); return $this->json_results('info', $row); } else { -- 1.6.4.2
On Mon 14 Sep 2009 22:46 +0800, Gergely Imreh wrote:
The following patch adds the Orphan field to the aurjson output. For username we would need to hit the database once more (maybe, have to think more SQL for that), but the Orphan-ness is quite straightforward to evaluate. So let's just do that first.
This patch, however, might clash with one patch I sent in a few days ago [1], since that one hasn't been applied to the repo, yet. Anyway, neither of those are hard to see where they should go...
Any comments?
@@ -139,6 +140,8 @@ class AurJSON { if ( $result && (mysql_num_rows($result) > 0) ) { $row = mysql_fetch_assoc($result); mysql_free_result($result); + $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); + unset($row['MaintainerUID']);
Just leaving MaintainerUID as-is here should be good enough I think. I'll push that if you want to resubmit. Thanks.
2009/9/17 Loui Chang <louipc.ist@gmail.com>:
On Mon 14 Sep 2009 22:46 +0800, Gergely Imreh wrote:
The following patch adds the Orphan field to the aurjson output. For username we would need to hit the database once more (maybe, have to think more SQL for that), but the Orphan-ness is quite straightforward to evaluate. So let's just do that first.
This patch, however, might clash with one patch I sent in a few days ago [1], since that one hasn't been applied to the repo, yet. Anyway, neither of those are hard to see where they should go...
Any comments?
@@ -139,6 +140,8 @@ class AurJSON { if ( $result && (mysql_num_rows($result) > 0) ) { $row = mysql_fetch_assoc($result); mysql_free_result($result); + $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); + unset($row['MaintainerUID']);
Just leaving MaintainerUID as-is here should be good enough I think. I'll push that if you want to resubmit. Thanks.
The reason I have the "Orphan" and not the "MaintainerUID", because from the website it is not possible to connect the MaintainerUID and the username, and I'm not aware of any way one can search for the username or packages using MaintainerUID. That makes this number pretty meaningless except for AUR internal usage. Also, having an "Orphan" value is completely self-explanatory (as are the OutOfDate and all other fields...) - one can just look at the json output for the very first time in their life and see what means what. The "If MaintainerUID is 0 then it is an orphan package, if whatever
0 then someone maintains it" is a needed explanation beyond the aurjson output.
If you want to change something, then remove the removal of the MaintainerUID (for some usage I might not thing of right now). The "orphan" field should stay in my opinion. Of course this is ultimately your call, these are just some thoughts.... Cheers, Greg
On Thu 17 Sep 2009 10:34 +0800, Gergely Imreh wrote:
2009/9/17 Loui Chang <louipc.ist@gmail.com>:
On Mon 14 Sep 2009 22:46 +0800, Gergely Imreh wrote:
The following patch adds the Orphan field to the aurjson output. For username we would need to hit the database once more (maybe, have to think more SQL for that), but the Orphan-ness is quite straightforward to evaluate. So let's just do that first.
This patch, however, might clash with one patch I sent in a few days ago [1], since that one hasn't been applied to the repo, yet. Anyway, neither of those are hard to see where they should go...
Any comments?
@@ -139,6 +140,8 @@ class AurJSON { if ( $result && (mysql_num_rows($result) > 0) ) { $row = mysql_fetch_assoc($result); mysql_free_result($result); + $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); + unset($row['MaintainerUID']);
Just leaving MaintainerUID as-is here should be good enough I think. I'll push that if you want to resubmit. Thanks.
The reason I have the "Orphan" and not the "MaintainerUID", because from the website it is not possible to connect the MaintainerUID and the username, and I'm not aware of any way one can search for the username or packages using MaintainerUID. That makes this number pretty meaningless except for AUR internal usage.
Also, having an "Orphan" value is completely self-explanatory (as are the OutOfDate and all other fields...) - one can just look at the json output for the very first time in their life and see what means what. The "If MaintainerUID is 0 then it is an orphan package, if whatever
0 then someone maintains it" is a needed explanation beyond the aurjson output.
If you want to change something, then remove the removal of the MaintainerUID (for some usage I might not thing of right now). The "orphan" field should stay in my opinion.
I understand how this might be useful for the average user, but they'll most likely never see it. It only needs to be understood once by the application developer, and it's documented - in the code heh. It's inefficient to add more processing just for aesthetic purposes. So let's think about adding something useful. You can get usernames with one query: select Name,Username from Packages left join Users on (Packages.MaintainerUID = Users.ID) where DummyPkg = 0 limit 10; I just tested this against plain MaintainerUID. That yields a performance hit of about 200 milliseconds for 15828 packages, which doesn't seem too bad. I'm testing on a slow system anyways. So I think it may be alright to just grab Username. Orphaned packages will have a NULL Username. This should be added to both info and search functions.
On Wed, Sep 16, 2009 at 10:43 PM, Loui Chang <louipc.ist@gmail.com> wrote:
On Thu 17 Sep 2009 10:34 +0800, Gergely Imreh wrote:
2009/9/17 Loui Chang <louipc.ist@gmail.com>:
On Mon 14 Sep 2009 22:46 +0800, Gergely Imreh wrote:
The following patch adds the Orphan field to the aurjson output. For username we would need to hit the database once more (maybe, have to think more SQL for that), but the Orphan-ness is quite straightforward to evaluate. So let's just do that first.
This patch, however, might clash with one patch I sent in a few days ago [1], since that one hasn't been applied to the repo, yet. Anyway, neither of those are hard to see where they should go...
Any comments?
@@ -139,6 +140,8 @@ class AurJSON { if ( $result && (mysql_num_rows($result) > 0) ) { $row = mysql_fetch_assoc($result); mysql_free_result($result); + $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); + unset($row['MaintainerUID']);
Just leaving MaintainerUID as-is here should be good enough I think. I'll push that if you want to resubmit. Thanks.
The reason I have the "Orphan" and not the "MaintainerUID", because from the website it is not possible to connect the MaintainerUID and the username, and I'm not aware of any way one can search for the username or packages using MaintainerUID. That makes this number pretty meaningless except for AUR internal usage.
Also, having an "Orphan" value is completely self-explanatory (as are the OutOfDate and all other fields...) - one can just look at the json output for the very first time in their life and see what means what. The "If MaintainerUID is 0 then it is an orphan package, if whatever
0 then someone maintains it" is a needed explanation beyond the aurjson output.
If you want to change something, then remove the removal of the MaintainerUID (for some usage I might not thing of right now). The "orphan" field should stay in my opinion.
I understand how this might be useful for the average user, but they'll most likely never see it. It only needs to be understood once by the application developer, and it's documented - in the code heh.
It's inefficient to add more processing just for aesthetic purposes.
So let's think about adding something useful.
You can get usernames with one query:
select Name,Username from Packages left join Users on (Packages.MaintainerUID = Users.ID) where DummyPkg = 0 limit 10;
I just tested this against plain MaintainerUID. That yields a performance hit of about 200 milliseconds for 15828 packages, which doesn't seem too bad.
I'm testing on a slow system anyways.
So I think it may be alright to just grab Username. Orphaned packages will have a NULL Username. This should be added to both info and search functions.
OMG, query benchmarking? I've never seen this in the AUR before. :P I would just do it right and return a maintainer name. Worry about performance later; there really is little to no penalty here for this. I already told you one query that needs optimizing the other day that is much more prevalent than this one. -Dan
On Wed 16 Sep 2009 22:48 -0500, Dan McGee wrote:
OMG, query benchmarking? I've never seen this in the AUR before. :P
Hah, why not. There's kind of a tradition of being picky with the JSON interface.
I would just do it right and return a maintainer name. Worry about performance later; there really is little to no penalty here for this. I already told you one query that needs optimizing the other day that is much more prevalent than this one.
Which other day? I thought you fixed all of them. Please refresh my memory. Thanks.
On Wed, Sep 16, 2009 at 11:06 PM, Loui Chang <louipc.ist@gmail.com> wrote:
On Wed 16 Sep 2009 22:48 -0500, Dan McGee wrote:
OMG, query benchmarking? I've never seen this in the AUR before. :P
Hah, why not. There's kind of a tradition of being picky with the JSON interface.
I would just do it right and return a maintainer name. Worry about performance later; there really is little to no penalty here for this. I already told you one query that needs optimizing the other day that is much more prevalent than this one.
Which other day? I thought you fixed all of them. Please refresh my memory. Thanks.
Attached is the processed full query from sigurd, generated like so: $ sudo mysqldumpslow -s t /var/lib/mysql/sigurd-slow.log | gzip -9 > slow-queries-processed.txt.gz It is sorted by total time, so those queries that run the most (as opposed to take the most time per query) will bubble to the top, as those are the best ones to attack. Here are the top two entries in that file, for example: Count: 49 Time=2.19s (107s) Lock=0.02s (1s) Rows=26.5 (1300), aur[aur]@localhost SELECT SQL_CALC_FOUND_ROWS Users.Username AS Maintainer, PackageCategories.Category, PackageLocations.Location, Packages.Name, Packages.Version, Packages.Description, Packages.NumVotes, Packages.ID, Packages.OutOfDate FROM Packages LEFT JOIN Users ON (Packages.MaintainerUID = Users.ID) LEFT JOIN PackageCategories ON (Packages.CategoryID = PackageCategories.ID) LEFT JOIN PackageLocations ON (Packages.LocationID = PackageLocations.ID) WHERE Packages.DummyPkg = N ORDER BY Name ASC, LocationID ASC, CategoryID DESC LIMIT N, N Count: 50 Time=2.01s (100s) Lock=0.00s (0s) Rows=25.0 (1250), aur[aur]@localhost SELECT SQL_CALC_FOUND_ROWS CommentNotify.UserID AS Notify, PackageVotes.UsersID AS Voted, Users.Username AS Maintainer, PackageCategories.Category, PackageLocations.Location, Packages.Name, Packages.Version, Packages.Description, Packages.NumVotes, Packages.ID, Packages.OutOfDate FROM Packages LEFT JOIN Users ON (Packages.MaintainerUID = Users.ID) LEFT JOIN PackageVotes ON (Packages.ID = PackageVotes.PackageID AND PackageVotes.UsersID = N) LEFT JOIN CommentNotify ON (Packages.ID = CommentNotify.PkgID AND CommentNotify.UserID = N) LEFT JOIN PackageCategories ON (Packages.CategoryID = PackageCategories.ID) LEFT JOIN PackageLocations ON (Packages.LocationID = PackageLocations.ID) WHERE Packages.DummyPkg = N ORDER BY Name ASC, LocationID ASC, CategoryID DESC LIMIT N, N These two queries (for 99 total slow queries) are generated from the package view page. One has votes and one does not; that may be logged in vs. not logged in. For something like this, the best bet is to run an explain on the query, see why it sucks (I can already tell you it is extremely inefficient here at using the right indexes), and try to improve it. In all honesty, the AUR can keep up with this stuff just fine, and the server isn't overloaded, so it probably isn't a big deal, but it is a good exercise if you are interested in performance. -Dan
2009/9/17 Loui Chang <louipc.ist@gmail.com>:
On Thu 17 Sep 2009 10:34 +0800, Gergely Imreh wrote:
2009/9/17 Loui Chang <louipc.ist@gmail.com>:
On Mon 14 Sep 2009 22:46 +0800, Gergely Imreh wrote:
The following patch adds the Orphan field to the aurjson output. For username we would need to hit the database once more (maybe, have to think more SQL for that), but the Orphan-ness is quite straightforward to evaluate. So let's just do that first.
This patch, however, might clash with one patch I sent in a few days ago [1], since that one hasn't been applied to the repo, yet. Anyway, neither of those are hard to see where they should go...
Any comments?
@@ -139,6 +140,8 @@ class AurJSON { if ( $result && (mysql_num_rows($result) > 0) ) { $row = mysql_fetch_assoc($result); mysql_free_result($result); + $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); + unset($row['MaintainerUID']);
Just leaving MaintainerUID as-is here should be good enough I think. I'll push that if you want to resubmit. Thanks.
The reason I have the "Orphan" and not the "MaintainerUID", because from the website it is not possible to connect the MaintainerUID and the username, and I'm not aware of any way one can search for the username or packages using MaintainerUID. That makes this number pretty meaningless except for AUR internal usage.
Also, having an "Orphan" value is completely self-explanatory (as are the OutOfDate and all other fields...) - one can just look at the json output for the very first time in their life and see what means what. The "If MaintainerUID is 0 then it is an orphan package, if whatever
0 then someone maintains it" is a needed explanation beyond the aurjson output.
If you want to change something, then remove the removal of the MaintainerUID (for some usage I might not thing of right now). The "orphan" field should stay in my opinion.
I understand how this might be useful for the average user, but they'll most likely never see it. It only needs to be understood once by the application developer, and it's documented - in the code heh.
It's inefficient to add more processing just for aesthetic purposes.
So let's think about adding something useful.
You can get usernames with one query:
select Name,Username from Packages left join Users on (Packages.MaintainerUID = Users.ID) where DummyPkg = 0 limit 10;
I just tested this against plain MaintainerUID. That yields a performance hit of about 200 milliseconds for 15828 packages, which doesn't seem too bad.
I'm testing on a slow system anyways.
So I think it may be alright to just grab Username. Orphaned packages will have a NULL Username. This should be added to both info and search functions.
I personally don't think that one should have to read the AUR codebase to be able to interpret the rpc.php output. It is simple enough that it should be self explanatory - just like it was before this conversation started. And don't really feel it's an aesthetic issue but a purely logical. Ways to know if a package is an orphan: 1st version: "Are you an orphan? yes / no" 2nd version: "Who's your daddy? JohnDoe / NULL" Just because it's software mainly talking to software, it still can be clear. Also, I'm a bit puzzled by the objection to a single line of $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); which was countered with a suggestion of a full database re-query. Would THAT really take that much shorter? Have to do some checks in my test aur install to wrap my head around it. Would have thought that AUR has a lot of other inefficiencies that can be improved (pretty sure that my PKGBUILD parsing routines are no exceptions), before we worry about a single "if". Having said all this, I can see the point of returning the username / "orphan" (or username / ""?) instead of Orphan logical variable. I wouldn't have chosen it myself originally because of the extra database hit, but now it seems to be more in line with the behaviour of the web interface. Loui, do you want to write that changes, or should I send an updated patch? I don't mind either way, glad to get the job done. :) Cheers, Greg
On Wed, Sep 16, 2009 at 11:31 PM, Gergely Imreh <imrehg@gmail.com> wrote:
Having said all this, I can see the point of returning the username / "orphan" (or username / ""?) instead of Orphan logical variable. I wouldn't have chosen it myself originally because of the extra database hit, but now it seems to be more in line with the behaviour of the web interface.
What? You do know a join in SQL requires no extra DB hit, right? -Dan
2009/9/17 Dan McGee <dpmcgee@gmail.com>:
On Wed, Sep 16, 2009 at 11:31 PM, Gergely Imreh <imrehg@gmail.com> wrote:
Having said all this, I can see the point of returning the username / "orphan" (or username / ""?) instead of Orphan logical variable. I wouldn't have chosen it myself originally because of the extra database hit, but now it seems to be more in line with the behaviour of the web interface.
What? You do know a join in SQL requires no extra DB hit, right?
Should have read it more carefully. Of course it isn't... :) Greg
On Thu 17 Sep 2009 12:31 +0800, Gergely Imreh wrote:
Also, I'm a bit puzzled by the objection to a single line of $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); which was countered with a suggestion of a full database re-query.
Would THAT really take that much shorter? Have to do some checks in my test aur install to wrap my head around it. Would have thought that AUR has a lot of other inefficiencies that can be improved (pretty sure that my PKGBUILD parsing routines are no exceptions), before we worry about a single "if".
My concern is that nowadays the JSON interface is being accessed quite a lot. That can only grow as more users access the site and more features are added. It's best to be prudent and get the best value out of the code that we can. Those instructions don't add any value in my opinion. I see it as the equivalent of adding: # This is a comment echo "";
Having said all this, I can see the point of returning the username / "orphan" (or username / ""?) instead of Orphan logical variable. I
There's no possibility of someone having "" as a username, so that's best. No extra processing either.
wouldn't have chosen it myself originally because of the extra database hit, but now it seems to be more in line with the behaviour of the web interface.
Sorry. I was just showing how to grab the Username in a query. You would build that into the existing queries, so there's only one query per rpc request.
Loui, do you want to write that changes, or should I send an updated patch? I don't mind either way, glad to get the job done. :)
You're certainly welcome to give it a shot. If you're finding it difficult, I can write the patch. I'm a bit lazy though. :P
2009/9/17 Loui Chang <louipc.ist@gmail.com>:
On Thu 17 Sep 2009 12:31 +0800, Gergely Imreh wrote:
Also, I'm a bit puzzled by the objection to a single line of $row['Orphan'] = ($row['MaintainerUID'] == "0" ? "1" : "0"); which was countered with a suggestion of a full database re-query.
Would THAT really take that much shorter? Have to do some checks in my test aur install to wrap my head around it. Would have thought that AUR has a lot of other inefficiencies that can be improved (pretty sure that my PKGBUILD parsing routines are no exceptions), before we worry about a single "if".
My concern is that nowadays the JSON interface is being accessed quite a lot. That can only grow as more users access the site and more features are added. It's best to be prudent and get the best value out of the code that we can. Those instructions don't add any value in my opinion. I see it as the equivalent of adding:
# This is a comment echo "";
I agree with the best value of code completely. Didn't know the usage statistics, and I didn't really use the json interface (or used it until now), only when there were things to fix. I start to like it, though. :) The example of Zero Added Value: come on, that's not quite the same. An output variable that you prefer with another name but maybe some would like the way I sent it, compared with an empty instruction? Opinions disagree, and I know very well, that the "patches are welcome" != "your patch will be applied". The discussion is very helpful and I certaily learn from it. Also, will try not to turn these things into bikeshed painting...
Having said all this, I can see the point of returning the username / "orphan" (or username / ""?) instead of Orphan logical variable. I
There's no possibility of someone having "" as a username, so that's best. No extra processing either.
Sounds good.
wouldn't have chosen it myself originally because of the extra database hit, but now it seems to be more in line with the behaviour of the web interface.
Sorry. I was just showing how to grab the Username in a query. You would build that into the existing queries, so there's only one query per rpc request.
My fault, should have read it more carefully. And my SQL needs to be improved anyway, so my original objection had no basis.
Loui, do you want to write that changes, or should I send an updated patch? I don't mind either way, glad to get the job done. :)
You're certainly welcome to give it a shot. If you're finding it difficult, I can write the patch. I'm a bit lazy though. :P
Sure thing. I know what it should do, so shouldn't be to bad. Be back with some code later. Cheers, Greg
participants (3)
-
Dan McGee
-
Gergely Imreh
-
Loui Chang