On Tue, Apr 12, 2011 at 1:04 AM, elij <elij.mx@gmail.com> wrote:
On Mon, Apr 11, 2011 at 10:15 PM, Dan McGee <dan@archlinux.org> wrote:
The majority of "real world" info requests [1] come in hefty batches. We would be better served to handle these in one request rather than multiple by allowing AUR clients to send multiple arguments.
This enables things like this to work: http://aur.test/rpc.php?type=info&arg[]=cups-xerox&arg[]=cups-mc2430dl&arg[]=10673
This kind of breaks the query format convention (thanks php!). It might be cleaner to use a separator (like a space) to break up args (and simply split on space).
http://aur.test/rpc.php?type=info&arg=cups-xerox+cups-mc1430dl+10673
(note using plus signs here as an example since spaces or %20s are harder to 'see')
I know it breaks "convention", but introducing another seemed a bit silly to me. All old queries work 100% however, if that wasn't clear. If we want to expand this to msearch or search down the road (which seemed totally plausible to me when designing the code here), where a space has totally different meaning and you would not want to split on it, this new syntax would not work at all.
Note to RPC users: unfortunately due to the asinine design of PHP, you unfortunately have to use the 'arg[]' syntax if you want more than one query argument, or you will only get the package satisfying the last arg you pass.
[1] Rough data from April 11, 2011, with a total hit count of 1,109,163: 12 /login.php 13 /rpc.php?type=sarch 15 /rpc.php?type=msearch 16 /pingserver.php 16 /rpc.php 22 /logout.php 163 /passreset.php 335 /account.php 530 /pkgsubmit.php 916 /rss2.php 3838 /index.php 6752 /rss.php 9699 / 42478 /rpc.php?type=search 184737 /packages.php 681725 /rpc.php?type=info
That means a whopping 61.5% of our requests were for info over the RPC interface; package pages are a distant second at only 16.7%.
Signed-off-by: Dan McGee <dan@archlinux.org> --- web/lib/aurjson.class.php | 59 ++++++++++++++++++++++++++++++++++++-------- 1 files changed, 48 insertions(+), 11 deletions(-)
diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php index 9b0e1a0..57096d8 100644 --- a/web/lib/aurjson.class.php +++ b/web/lib/aurjson.class.php @@ -101,6 +101,36 @@ class AurJSON { }
/** + * Parse the args to the info function. We may have a string or an array, + * so do the appropriate thing. Within the elements, both * package IDs + * and package names are valid; sort them into the relevant arrays and + * escape/quote the names. + * @param $args the arg string or array to parse. + * @return mixed An array containing 'ids' and 'names'. + **/ + private function parse_info_args($args) { + if (!is_array($args)) { + $args = array($args); + } + + $id_args = array(); + $name_args = array(); + foreach ($args as $arg) { + if (!$arg) { + continue; + } + if (is_numeric($arg)) { + $id_args[] = intval($arg); + } else { + $escaped = mysql_real_escape_string($arg, $this->dbh); + $name_args[] = "'" . $escaped . "'"; + } + } + + return array('ids' => $id_args, 'names' => $name_args); + } + + /** * Performs a fulltext mysql search of the package database. * @param $keyword_string A string of keywords to search with. * @return mixed Returns an array of package matches. @@ -129,21 +159,28 @@ class AurJSON { **/ private function info($pqdata) { $fields = implode(',', self::$fields); + $args = $this->parse_info_args($pqdata); + $ids = $args['ids']; + $names = $args['names'];
- $base_query = "SELECT {$fields} " . - " FROM Packages WHERE "; + if (!$ids && !$names) { + return $this->json_error('Invalid query arguments'); + }
- if ( is_numeric($pqdata) ) { - // just using sprintf to coerce the pqd to an int - // should handle sql injection issues, since sprintf will - // bork if not an int, or convert the string to a number 0 - $query_stub = "ID={$pqdata}"; + $query = "SELECT {$fields} " . + " FROM Packages WHERE "; + if ($ids) { + $ids_value = implode(',', $args['ids']); + $query .= "ID IN ({$ids_value})";
I thought 'where in' queries in mysql were known to be terribly slow compared to using 'where ='.
Nope, at least not in this case- if things are indexed properly you have no problems, and we aren't using a subquery for the IN clause. mysql is at least smart enough to recognize the columns being filtered on are unique so it knows the row count is extremely limited before it even runs the query. mysql> explain select * from Packages where id in (10673, 6941) or Name in ('cups-xerox', 'cups-mc2430dl'); +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+ | 1 | SIMPLE | Packages | index_merge | PRIMARY,Name | Name,PRIMARY | 66,4 | NULL | 4 | Using sort_union(Name,PRIMARY); Using where | +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+ 1 row in set (0.00 sec)