[aur-dev] [PATCH 4/4] rpc: allow multiple args on info query

Dan McGee dpmcgee at gmail.com
Tue Apr 12 02:11:26 EDT 2011


On Tue, Apr 12, 2011 at 1:04 AM, elij <elij.mx at gmail.com> wrote:
> On Mon, Apr 11, 2011 at 10:15 PM, Dan McGee <dan at archlinux.org> wrote:
>> The majority of "real world" info requests [1] come in hefty batches. We
>> would be better served to handle these in one request rather than
>> multiple by allowing AUR clients to send multiple arguments.
>>
>> This enables things like this to work:
>>    http://aur.test/rpc.php?type=info&arg[]=cups-xerox&arg[]=cups-mc2430dl&arg[]=10673
>
> This kind of breaks the query format convention (thanks php!).
> It might be cleaner to use a separator (like a space) to break up args
> (and simply split on space).
>
>    http://aur.test/rpc.php?type=info&arg=cups-xerox+cups-mc1430dl+10673
>
> (note using plus signs here as an example since spaces or %20s are
> harder to 'see')

I know it breaks "convention", but introducing another seemed a bit
silly to me. All old queries work 100% however, if that wasn't clear.
If we want to expand this to msearch or search down the road (which
seemed totally plausible to me when designing the code here), where a
space has totally different meaning and you would not want to split on
it, this new syntax would not work at all.

>> Note to RPC users: unfortunately due to the asinine design of PHP, you
>> unfortunately have to use the 'arg[]' syntax if you want more than one
>> query argument, or you will only get the package satisfying the last arg
>> you pass.
>>
>> [1] Rough data from April 11, 2011, with a total hit count of 1,109,163:
>>     12 /login.php
>>     13 /rpc.php?type=sarch
>>     15 /rpc.php?type=msearch
>>     16 /pingserver.php
>>     16 /rpc.php
>>     22 /logout.php
>>    163 /passreset.php
>>    335 /account.php
>>    530 /pkgsubmit.php
>>    916 /rss2.php
>>   3838 /index.php
>>   6752 /rss.php
>>   9699 /
>>  42478 /rpc.php?type=search
>>  184737 /packages.php
>>  681725 /rpc.php?type=info
>>
>> That means a whopping 61.5% of our requests were for info over the RPC
>> interface; package pages are a distant second at only 16.7%.
>>
>> Signed-off-by: Dan McGee <dan at archlinux.org>
>> ---
>>  web/lib/aurjson.class.php |   59 ++++++++++++++++++++++++++++++++++++--------
>>  1 files changed, 48 insertions(+), 11 deletions(-)
>>
>> diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
>> index 9b0e1a0..57096d8 100644
>> --- a/web/lib/aurjson.class.php
>> +++ b/web/lib/aurjson.class.php
>> @@ -101,6 +101,36 @@ class AurJSON {
>>     }
>>
>>     /**
>> +     * Parse the args to the info function. We may have a string or an array,
>> +     * so do the appropriate thing. Within the elements, both * package IDs
>> +     * and package names are valid; sort them into the relevant arrays and
>> +     * escape/quote the names.
>> +     * @param $args the arg string or array to parse.
>> +     * @return mixed An array containing 'ids' and 'names'.
>> +     **/
>> +    private function parse_info_args($args) {
>> +        if (!is_array($args)) {
>> +            $args = array($args);
>> +        }
>> +
>> +        $id_args = array();
>> +        $name_args = array();
>> +        foreach ($args as $arg) {
>> +            if (!$arg) {
>> +                continue;
>> +            }
>> +            if (is_numeric($arg)) {
>> +                $id_args[] = intval($arg);
>> +            } else {
>> +                $escaped = mysql_real_escape_string($arg, $this->dbh);
>> +                $name_args[] = "'" . $escaped . "'";
>> +            }
>> +        }
>> +
>> +        return array('ids' => $id_args, 'names' => $name_args);
>> +    }
>> +
>> +    /**
>>      * Performs a fulltext mysql search of the package database.
>>      * @param $keyword_string A string of keywords to search with.
>>      * @return mixed Returns an array of package matches.
>> @@ -129,21 +159,28 @@ class AurJSON {
>>      **/
>>     private function info($pqdata) {
>>         $fields = implode(',', self::$fields);
>> +        $args = $this->parse_info_args($pqdata);
>> +        $ids = $args['ids'];
>> +        $names = $args['names'];
>>
>> -        $base_query = "SELECT {$fields} " .
>> -            " FROM Packages WHERE ";
>> +        if (!$ids && !$names) {
>> +            return $this->json_error('Invalid query arguments');
>> +        }
>>
>> -        if ( is_numeric($pqdata) ) {
>> -            // just using sprintf to coerce the pqd to an int
>> -            // should handle sql injection issues, since sprintf will
>> -            // bork if not an int, or convert the string to a number 0
>> -            $query_stub = "ID={$pqdata}";
>> +        $query = "SELECT {$fields} " .
>> +            " FROM Packages WHERE ";
>> +        if ($ids) {
>> +            $ids_value = implode(',', $args['ids']);
>> +            $query .= "ID IN ({$ids_value})";
>
> I thought 'where in' queries in mysql were known to be terribly slow
> compared to using 'where ='.

Nope, at least not in this case- if things are indexed properly you
have no problems, and we aren't using a subquery for the IN clause.
mysql is at least smart enough to recognize the columns being filtered
on are unique so it knows the row count is extremely limited before it
even runs the query.

mysql> explain select * from Packages where id in (10673, 6941) or
Name in ('cups-xerox', 'cups-mc2430dl');
+----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
| id | select_type | table    | type        | possible_keys | key
    | key_len | ref  | rows | Extra
   |
+----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
|  1 | SIMPLE      | Packages | index_merge | PRIMARY,Name  |
Name,PRIMARY | 66,4    | NULL |    4 | Using sort_union(Name,PRIMARY);
Using where |
+----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
1 row in set (0.00 sec)


More information about the aur-dev mailing list