[PATCH] Add rate limit support to API

Lukas Fleischer lfleischer at archlinux.org
Sun Jan 28 12:12:41 UTC 2018


Hi Florian,

On Fri, 26 Jan 2018 at 22:02:42, Florian Pritz wrote:
> [...]
> Here's the number of requests that a single ipv6 address made today in
> the last 20.5 hours: 1808234. That's a whopping 24 requests per second
> averaged over the time frame.
> 
> Looking for a specific package name in the requests shows that this user
> ran cower -u 5713 times in that time period. That's an average delay of
> ~13 seconds between runs. Looking at the start and end of each batch of
> requests, it takes roughly 2 seconds for one cower -u run to complete.
> 
> While this one is certainly a quite extreme case (not the worst yet,
> mind you!), there are 47 more that would already have reached the 4000
> request limit today.

Ok, that sounds quite bad indeed. Thanks for the clarification!

> [...]
> I'll look into fixing all those things.
> [...]
> They are so much nicer to write and I believe they work with all PDO
> drivers. I'll look into testing that.
> [...]
> I didn't know sqlite was supported. I'll look into it.

Great!

> 
> > 
> >> +               $stmt->bindParam(":ip", $ip);
> >> +               $stmt->bindParam(":window_start", $time);
> >> +               $stmt->execute();
> >> +
> >> +               // TODO: this should move into some cronjob
> >> +               $stmt = $this->dbh->prepare("
> >> +                       DELETE FROM ApiRateLimit
> >> +                       WHERE window_start < :time");
> >> +               $stmt->bindParam(":time", $deletion_time);
> >> +               $stmt->execute();
> > 
> > I am quite confused. There is a similar piece of code in
> > check_ratelimit() already. Is either of these statements a leftover or
> > do they serve different purposes?
> 
> The query above removes all limits for IPs that have not visited in a
> long time. The one in check_ratelimit removes only the limit for a
> single IP if that limit is no longer relevant. Afterwards a new one is
> created.

It is not clear to me why this needs to be done in two separate steps.
Can't we simply remove all outdated entries with a single query before
doing anything else?

> 
> > I am also curious about the general performance impact of this patch. It
> > reduces request flooding but at the same time adds a couple of
> > additional database queries to each request. Do you happen to have any
> > statistics?
> 
> It adds a select and update/insert for most cases. True, those add load,
> but without them we can not limit API usage at all which leads to wild
> growth without any consideration as shown above. I didn't look at the
> performance impact, but I think we need the feature nevertheless. Unless
> you want to compare this to an implementation with a different
> technology (maybe memcached/redis or something), it doesn't matter much.
> 
> Are you interested in the feature implemented like this (with the fixes)
> or do you want it built differently/not at all/...? Another idea I have
> would be to remove the API and distribute the database similar to the
> pacman repo databases via our mirrors. However, I'm not interested in
> working on that. It's just an idea.
> 
> In case you generally accept the patch idea, I'll work on fixing the
> issues you mentioned.

I like the idea and I think we should give your implementation (modulo
the small issues I mentioned) a try. If it turns out to produce too much
load, we can still switch to "something better" later.

Regards,
Lukas


More information about the aur-dev mailing list