[PATCH] Add rate limit support to API
Florian Pritz
bluewind at xinu.at
Fri Jan 26 21:02:42 UTC 2018
On 26.01.2018 20:39, Lukas Fleischer wrote:
> On Thu, 07 Dec 2017 at 23:38:59, Florian Pritz wrote:
>> This allows us to prevent users from hammering the API every few seconds
>> to check if any of their packages were updated. Real world users check
>> as often as every 5 or 10 seconds.
>
> First of all, thanks for the patch!
>
> Out of curiosity: is there a noticeable performance impact caused by
> these requests? Or is this just a safety measure for the future? I am
> also asking because below, you are setting the default limit to one
> request every ~20 seconds. Do you think a reduction by a factor of two
> is sufficient to mitigate the issue?
They don't send one request every 5 or 10 seconds; they send one request
for each AUR package they have installed, because they use cower -u. So
in the bad cases it's more like 20-100 requests from a single user every
5 or 10 seconds.
The most noticeable impact is that the log file contains millions of
requests per day, which amounts to 1-2 GiB of text per day. I don't care
so much about the performance impact, since I have no idea how to
measure that with everything else we run on the box anyway. What I
really care about is people not abusing server resources just because
they are lazy: they put cower -u into their conky config and conky then
runs the command all the time for no good reason.
Here's the number of requests that a single IPv6 address made today,
i.e. in the last 20.5 hours: 1808234. That's a whopping ~24.5 requests
per second averaged over that time frame.
Looking for a specific package name in the requests shows that this user
ran cower -u 5713 times in that time period. That's an average delay of
~13 seconds between runs. Looking at the start and end of each batch of
requests, it takes roughly 2 seconds for one cower -u run to complete.
While this one is certainly quite an extreme case (and not the worst
yet, mind you!), there are 47 more addresses that would already have hit
the 4000 request limit today.
I can only imagine that this problem will get worse in the future unless
we start implementing rate limiting. Even then, users may take a while
to notice that they are running into the limit and they might not adjust
their scripts that quickly. Still, I think being able to limit API usage
is important, because otherwise our only option is to block offenders
completely, which is hard for them to debug and, in the worst case,
might be interpreted as our servers not working.
I plan to eventually reduce the limit once cower is used less, but first
we need the feature.
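For reference, the defaults I have in mind would look roughly like this
in the config (the [ratelimit] section and option names match the
config_get() calls in the patch; the concrete values are just what the
numbers above suggest, i.e. 4000 requests per 24-hour window, or about
one request every 20 seconds on average):

[ratelimit]
request_limit = 4000
window_length = 86400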
>> [...]
>> --- /dev/null
>> +++ b/upgrading/4.7.0.txt
>
> You also need to create the table in our database schema (which is
> located under schema/aur-schema.sql).
>
>> @@ -0,0 +1,11 @@
>> +1. Add ApiRateLimit table:
>> +
>> +---
>> +CREATE TABLE `ApiRateLimit` (
>> + `ip` varchar(45) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
>> + `requests` int(11) NOT NULL,
>> + `window_start` bigint(20) NOT NULL,
>
> We use CamelCase for the field names in all other tables and I would
> like to keep this consistent. Upper case is used for the data types.
>
>> + PRIMARY KEY (`ip`),
>> + KEY `window_start_idx` (`window_start`)
>
> Same applies to the key.
>
>> +) ENGINE=InnoDB;
>
> Space before and after the "=".
I'll look into fixing all those things.
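Just so I'm sure I understood the conventions correctly, the table would
then end up looking roughly like this (the exact CamelCase column and
key names are my guess based on the other tables):

CREATE TABLE ApiRateLimit (
  IP VARCHAR(45) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  Requests INT(11) NOT NULL,
  WindowStart BIGINT(20) NOT NULL,
  PRIMARY KEY (IP),
  KEY ApiRateLimitWindowStart (WindowStart)
) ENGINE = InnoDB;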
>> @@ -130,6 +135,72 @@ public function handle($http_data) {
>> }
>> }
>>
>> + /*
>> + * Check if an IP needs to be rate limited.
>> + *
>> + * @param $ip IP of the current request
>> + *
>> + * @return true if IP needs to be rate limited, false otherwise.
>> + */
>> + private function check_ratelimit($ip) {
>> + $limit = config_get("ratelimit", "request_limit");
>> + $window_length = config_get("ratelimit", "window_length");
>> + $this->update_ratelimit($ip);
>> + $stmt = $this->dbh->prepare("
>> + SELECT requests,window_start FROM ApiRateLimit
>> + WHERE ip = :ip");
>> + $stmt->bindParam(":ip", $ip);
>
> We usually do not use prepared statements but I think it does not
> hurt -- unless it breaks with SQLite.
They are so much nicer to write and I believe they work with all PDO
drivers. I'll look into testing that.
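If it helps, a throwaway snippet along these lines should show quickly
whether the SQLite driver has any problem with them (not part of the
patch, just a sanity check):

<?php
// Run the same kind of prepared statement against an in-memory SQLite database.
$dbh = new PDO("sqlite::memory:");
$dbh->exec("CREATE TABLE ApiRateLimit (ip TEXT PRIMARY KEY, requests INT, window_start INT)");

$ip = "127.0.0.1";
$stmt = $dbh->prepare("SELECT requests, window_start FROM ApiRateLimit WHERE ip = :ip");
$stmt->bindParam(":ip", $ip);
$stmt->execute();
var_dump($stmt->fetch(PDO::FETCH_ASSOC)); // false, since no row exists yet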
>
>> + $result = $stmt->execute();
>> +
>> + if (!$result) {
>> + return false;
>> + }
>> +
>> + $row = $stmt->fetch(PDO::FETCH_ASSOC);
>> +
>> + if ($row['window_start'] < time() - $window_length) {
>> + $stmt = $this->dbh->prepare("
>> + DELETE FROM ApiRateLimit
>> + WHERE ip = :ip");
>> + $stmt->bindParam(":ip", $ip);
>> + $stmt->execute();
>> + return false;
>> + }
>> +
>> + if ($row['requests'] > $limit) {
>> + return true;
>> + }
>> + return false;
>> + }
>> +
>> + /*
>> + * Update a rate limit for an IP by increasing its requests value by one.
>> + *
>> + * @param $ip IP of the current request
>> + *
>> + * @return void
>> + */
>> + private function update_ratelimit($ip) {
>> + $window_length = config_get("ratelimit", "window_length");
>> + $time = time();
>> + $deletion_time = $time - $window_length;
>> + $stmt = $this->dbh->prepare("
>> + INSERT INTO ApiRateLimit
>> + (ip, requests, window_start)
>> + VALUES (:ip, 0, :window_start)
>> + ON DUPLICATE KEY UPDATE requests=requests+1");
>
> I do not think this works with SQLite. Is there a more standardized way
> of doing this? Maybe "INSERT OR REPLACE"?
I didn't know SQLite was supported. I'll look into it.
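One portable alternative I can think of is to drop the vendor-specific
upsert and do an UPDATE first, falling back to an INSERT when no row was
touched. A rough, untested sketch (a concurrent first request from the
same IP could still race on the INSERT, so the duplicate-key error would
have to be caught or ignored):

$stmt = $this->dbh->prepare("
	UPDATE ApiRateLimit
	SET requests = requests + 1
	WHERE ip = :ip");
$stmt->bindParam(":ip", $ip);
$stmt->execute();

if ($stmt->rowCount() == 0) {
	/* No row yet: start a new window, at 0 like the current patch does. */
	$stmt = $this->dbh->prepare("
		INSERT INTO ApiRateLimit
		(ip, requests, window_start)
		VALUES (:ip, 0, :window_start)");
	$stmt->bindParam(":ip", $ip);
	$stmt->bindParam(":window_start", $time);
	$stmt->execute();
}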
>
>> + $stmt->bindParam(":ip", $ip);
>> + $stmt->bindParam(":window_start", $time);
>> + $stmt->execute();
>> +
>> + // TODO: this should move into some cronjob
>> + $stmt = $this->dbh->prepare("
>> + DELETE FROM ApiRateLimit
>> + WHERE window_start < :time");
>> + $stmt->bindParam(":time", $deletion_time);
>> + $stmt->execute();
>
> I am quite confused. There is a similar piece of code in
> check_ratelimit() already. Is either of these statements a leftover or
> do they serve different purposes?
The query above removes all limits for IPs that have not visited in a
long time. The one in check_ratelimit removes only the limit for a
single IP if that limit is no longer relevant. Afterwards a new one is
created.
> I am also curious about the general performance impact of this patch. It
> reduces request flooding but at the same time adds a couple of
> additional database queries to each request. Do you happen to have any
> statistics?
It adds a SELECT and an UPDATE/INSERT in most cases. True, those add
load, but without them we cannot limit API usage at all, which leads to
the kind of unchecked growth shown above. I didn't look at the
performance impact, but I think we need the feature nevertheless. Unless
you want to compare this to an implementation based on a different
technology (memcached/redis or something like that), the exact numbers
don't matter much.
Are you interested in the feature implemented like this (with the
fixes), or do you want it built differently/not at all/...? Another idea
would be to drop the API and distribute the database via our mirrors,
similar to the pacman repo databases. However, I'm not interested in
working on that; it's just an idea.
If you generally accept the idea of the patch, I'll work on fixing the
issues you mentioned.
Florian