Hi Florian,

On Fri, 26 Jan 2018 at 22:02:42, Florian Pritz wrote:
[...] Here's the number of requests that a single IPv6 address made today, over the last 20.5 hours: 1808234. That's a whopping 24 requests per second, averaged over the whole time frame.
Looking for a specific package name in the requests shows that this user ran cower -u 5713 times in that time period. That's an average delay of ~13 seconds between runs. Looking at the start and end of each batch of requests, it takes roughly 2 seconds for one cower -u run to complete.
While this one is certainly quite an extreme case (not the worst yet, mind you!), there are 47 more addresses that would already have reached the 4000-request limit today.
Ok, that sounds quite bad indeed. Thanks for the clarification!
[...] I'll look into fixing all those things.

[...] They are so much nicer to write and I believe they work with all PDO drivers. I'll look into testing that.

[...] I didn't know sqlite was supported. I'll look into it.
Great!
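For reference (and assuming "they" refers to prepared statements with named placeholders, as used in the hunk below), a minimal self-contained PDO example could look like this; the table layout and values are made up for illustration:

    <?php
    // Minimal PDO sketch: named placeholders with the sqlite driver
    // (requires pdo_sqlite). Table and values are hypothetical.
    $dbh = new PDO("sqlite::memory:");
    $dbh->exec("CREATE TABLE ApiRateLimit (ip TEXT, window_start INTEGER)");

    $ip = "2001:db8::1";
    $time = time();

    $stmt = $dbh->prepare("
        INSERT INTO ApiRateLimit (ip, window_start)
        VALUES (:ip, :window_start)");
    $stmt->bindParam(":ip", $ip);
    $stmt->bindParam(":window_start", $time);
    $stmt->execute();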
+        $stmt->bindParam(":ip", $ip);
+        $stmt->bindParam(":window_start", $time);
+        $stmt->execute();
+
+        // TODO: this should move into some cronjob
+        $stmt = $this->dbh->prepare("
+            DELETE FROM ApiRateLimit
+            WHERE window_start < :time");
+        $stmt->bindParam(":time", $deletion_time);
+        $stmt->execute();
I am quite confused. There is a similar piece of code in check_ratelimit() already. Is either of these statements a leftover or do they serve different purposes?
The query above removes all limits for IPs that have not visited in a long time. The one in check_ratelimit removes only the limit for a single IP if that limit is no longer relevant. Afterwards a new one is created.
It is not clear to me why this needs to be done in two separate steps. Can't we simply remove all outdated entries with a single query before doing anything else?
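Something along these lines, run once at the beginning of the request, should cover both cases (a sketch against the ApiRateLimit table from the hunk above; $window stands for whatever the patch uses as the rate limit window length in seconds):

    // One cleanup query handles both the stale per-IP entry and
    // the entries of IPs that have not visited in a long time.
    $deletion_time = time() - $window;
    $stmt = $this->dbh->prepare("
        DELETE FROM ApiRateLimit
        WHERE window_start < :time");
    $stmt->bindParam(":time", $deletion_time);
    $stmt->execute();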
I am also curious about the general performance impact of this patch. It reduces request flooding but at the same time adds a couple of additional database queries to each request. Do you happen to have any statistics?
It adds a SELECT and an UPDATE/INSERT for most cases. True, those add load, but without them we cannot limit API usage at all, which leads to the kind of unchecked growth shown above. I didn't look at the performance impact, but I think we need the feature nevertheless. Unless you want to compare this to an implementation based on a different technology (memcached/redis or something), it doesn't matter much.
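In other words, per request this boils down to roughly the following (a sketch, not the actual patch code; the "requests" counter column is an assumption for illustration):

    // Per-request flow: one SELECT, then either an INSERT (first
    // request in the window) or an UPDATE (every later request).
    $stmt = $this->dbh->prepare("
        SELECT requests FROM ApiRateLimit
        WHERE ip = :ip");
    $stmt->bindParam(":ip", $ip);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($row === false) {
        $stmt = $this->dbh->prepare("
            INSERT INTO ApiRateLimit (ip, requests, window_start)
            VALUES (:ip, 1, :window_start)");
        $stmt->bindParam(":ip", $ip);
        $stmt->bindParam(":window_start", $time);
    } else {
        $stmt = $this->dbh->prepare("
            UPDATE ApiRateLimit SET requests = requests + 1
            WHERE ip = :ip");
        $stmt->bindParam(":ip", $ip);
    }
    $stmt->execute();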
Are you interested in the feature implemented like this (with the fixes), or do you want it built differently/not at all/...? Another idea would be to remove the API and distribute the database via our mirrors, similar to how the pacman repo databases are distributed. However, I'm not interested in working on that. It's just an idea.
In case you generally accept the patch idea, I'll work on fixing the issues you mentioned.
I like the idea and I think we should give your implementation (modulo the small issues I mentioned) a try. If it turns out to produce too much load, we can still switch to "something better" later.

Regards,
Lukas