On Mon, Apr 11, 2011 at 10:15 PM, Dan McGee <dan@archlinux.org> wrote:
[1] Rough data from April 11, 2011, with a total hit count of 1,109,163:

        12 /login.php
        13 /rpc.php?type=sarch
        15 /rpc.php?type=msearch
        16 /pingserver.php
        16 /rpc.php
        22 /logout.php
       163 /passreset.php
       335 /account.php
       530 /pkgsubmit.php
       916 /rss2.php
      3838 /index.php
      6752 /rss.php
      9699 /
     42478 /rpc.php?type=search
    184737 /packages.php
    681725 /rpc.php?type=info

That means a whopping 61.5% of our requests were for info over the RPC interface; package pages are a distant second at only 16.7%.

I had a question about this data. What is the breakdown of those 'info' type queries between lookups by ID and lookups by name?

There was a time (pre-2009) when the search endpoint's result set wasn't quite as complete as the info result set. I am wondering whether the preponderance of info queries could come from legacy clients that first search, then perform 'by ID' or 'by name' lookups to populate the result set for the end user. Since the search result set is now just as complete as the info result set, this could provide an avenue to alleviate load if those clients were identified and fixed or updated.

Additionally, if the 'ID' type queries are a low percentage, it may be a good idea to drop them and simply do exact package name equality comparisons. This would also be more consistent: ID is more of an internal database representation, names are unique, and search does not allow searching on partial ID matches (which would be silly).

This would also rule out the need for 'server side ordering'. You get back a list of packages from a list of submitted names, and you can order, sort, and identify them client side very easily, with no ambiguity about whether an info result came from a name or from an ID search term.
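
To illustrate how simple the client-side ordering could be, here is a minimal sketch in Python. It assumes each result is a dict with a "Name" key and that the server returns results in arbitrary order; the function name and the sample data are hypothetical, not taken from any existing client.

```python
def order_by_request(requested_names, results):
    """Return results in the order the names were submitted.

    Names with no matching package are skipped.
    """
    # Package names are unique, so a dict keyed by name is unambiguous.
    by_name = {pkg["Name"]: pkg for pkg in results}
    return [by_name[name] for name in requested_names if name in by_name]


# Hypothetical result set, in whatever order the server happened to return it.
results = [
    {"ID": 3, "Name": "yaourt"},
    {"ID": 1, "Name": "cower"},
]

ordered = order_by_request(["cower", "missing-pkg", "yaourt"], results)
print([pkg["Name"] for pkg in ordered])  # prints ['cower', 'yaourt']
```

Because the lookup key is always the name, the client never has to guess whether a given result corresponds to an ID term or a name term.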