Hi Florian,

On Fri, 26 Jan 2018 at 22:02:42, Florian Pritz wrote:
[...] Here's the number of requests that a single IPv6 address made today, over the last 20.5 hours: 1808234. That's a whopping 24 requests per second, averaged over the whole time frame.
Looking for a specific package name in the requests shows that this user ran cower -u 5713 times in that time period. That's an average delay of ~13 seconds between runs. Looking at the start and end of each batch of requests, it takes roughly 2 seconds for one cower -u run to complete.
While this one is certainly quite an extreme case (not the worst yet, mind you!), there are 47 more addresses that would already have reached the 4000-request limit today.
Ok, that sounds quite bad indeed. Thanks for the clarification!
[...] I'll look into fixing all those things.

[...] They are so much nicer to write and I believe they work with all PDO drivers. I'll look into testing that.

[...] I didn't know sqlite was supported. I'll look into it.
Great!
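For reference (and assuming "they" refers to prepared statements with named placeholders, as used in the hunk below), a minimal self-contained PDO example could look like this; the table layout and values are made up for illustration:

    <?php
    // Minimal PDO sketch: named placeholders with the sqlite driver
    // (requires pdo_sqlite). Table and values are hypothetical.
    $dbh = new PDO("sqlite::memory:");
    $dbh->exec("CREATE TABLE ApiRateLimit (ip TEXT, window_start INTEGER)");

    $ip = "2001:db8::1";
    $time = time();

    $stmt = $dbh->prepare("
        INSERT INTO ApiRateLimit (ip, window_start)
        VALUES (:ip, :window_start)");
    $stmt->bindParam(":ip", $ip);
    $stmt->bindParam(":window_start", $time);
    $stmt->execute();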
+        $stmt->bindParam(":ip", $ip);
+        $stmt->bindParam(":window_start", $time);
+        $stmt->execute();
+
+        // TODO: this should move into some cronjob
+        $stmt = $this->dbh->prepare("
+            DELETE FROM ApiRateLimit
+            WHERE window_start < :time");
+        $stmt->bindParam(":time", $deletion_time);
+        $stmt->execute();
I am quite confused. There is a similar piece of code in check_ratelimit() already. Is either of these statements a leftover or do they serve different purposes?
The query above removes all limits for IPs that have not visited in a long time. The one in check_ratelimit removes only the limit for a single IP if that limit is no longer relevant. Afterwards a new one is created.
It is not clear to me why this needs to be done in two separate steps. Can't we simply remove all outdated entries with a single query before doing anything else?
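Something along these lines, run once at the beginning of the request, should cover both cases (a sketch against the ApiRateLimit table from the hunk above; $window stands for whatever the patch uses as the rate limit window length in seconds):

    // One cleanup query handles both the stale per-IP entry and
    // the entries of IPs that have not visited in a long time.
    $deletion_time = time() - $window;
    $stmt = $this->dbh->prepare("
        DELETE FROM ApiRateLimit
        WHERE window_start < :time");
    $stmt->bindParam(":time", $deletion_time);
    $stmt->execute();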
I am also curious about the general performance impact of this patch. It reduces request flooding but at the same time adds a couple of additional database queries to each request. Do you happen to have any statistics?
It adds a SELECT and an UPDATE/INSERT for most cases. True, those add load, but without them we cannot limit API usage at all, which leads to the kind of unchecked growth shown above. I didn't look at the performance impact, but I think we need the feature nevertheless. Unless you want to compare this to an implementation based on a different technology (memcached/redis or something), it doesn't matter much.
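In other words, per request this boils down to roughly the following (a sketch, not the actual patch code; the "requests" counter column is an assumption for illustration):

    // Per-request flow: one SELECT, then either an INSERT (first
    // request in the window) or an UPDATE (every later request).
    $stmt = $this->dbh->prepare("
        SELECT requests FROM ApiRateLimit
        WHERE ip = :ip");
    $stmt->bindParam(":ip", $ip);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($row === false) {
        $stmt = $this->dbh->prepare("
            INSERT INTO ApiRateLimit (ip, requests, window_start)
            VALUES (:ip, 1, :window_start)");
        $stmt->bindParam(":ip", $ip);
        $stmt->bindParam(":window_start", $time);
    } else {
        $stmt = $this->dbh->prepare("
            UPDATE ApiRateLimit SET requests = requests + 1
            WHERE ip = :ip");
        $stmt->bindParam(":ip", $ip);
    }
    $stmt->execute();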
Are you interested in the feature implemented like this (with the fixes), or do you want it built differently/not at all/...? Another idea would be to remove the API and distribute the database via our mirrors, similar to how the pacman repo databases are distributed. However, I'm not interested in working on that. It's just an idea.
In case you generally accept the patch idea, I'll work on fixing the issues you mentioned.
I like the idea and I think we should give your implementation (modulo the small issues I mentioned) a try. If it turns out to produce too much load, we can still switch to "something better" later.

Regards,
Lukas