[arch-dev-public] pkgstats: second try

Pierre Schmitz pierre at archlinux.de
Mon Sep 13 06:39:27 EDT 2010


On Fri, 10 Sep 2010 15:46:53 -0500, Dan McGee <dpmcgee at gmail.com>
wrote:
> On Fri, Sep 10, 2010 at 3:27 PM, Pierre Schmitz <pierre at archlinux.de> wrote:
>> Well, we have discussed all this before. If I don't limit the
>> submission by ip it will be too easy for a single person to flood us
>> with false data making the whole stats pointless. The ip is the only
>> value you cannot easily spoof over internet.
> 
> Sure- I'm not saying don't validate IP addresses at all, but the limit
> should probably be higher than 1 submission per IP in the given time
> frame.

You are now allowed to do 10 submissions per IP within 24h. Of course
there are always corner cases, but I am happy if we catch most use cases
here without making it too easy for a single person to mess up the
whole stats.
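A per-IP limit of 10 submissions per 24h can be sketched as a sliding
window. This is a hypothetical illustration, not the actual pkgstats
server code: the class name `SubmissionLimiter`, the in-memory storage
and the example IP addresses are assumptions.

```python
import time
from collections import defaultdict, deque

# Assumed limits mirroring the policy above: 10 submissions per IP per 24h.
MAX_SUBMISSIONS = 10
WINDOW_SECONDS = 24 * 60 * 60


class SubmissionLimiter:
    """Hypothetical in-memory sliding-window limiter keyed by client IP."""

    def __init__(self, limit=MAX_SUBMISSIONS, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._seen = defaultdict(deque)  # ip -> timestamps of accepted submissions

    def allow(self, ip, now=None):
        """Return True and record the submission if `ip` is under its limit."""
        now = time.time() if now is None else now
        stamps = self._seen[ip]
        # Forget submissions that have fallen out of the 24h window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


limiter = SubmissionLimiter()
accepted = [limiter.allow("192.0.2.1", now=0) for _ in range(11)]
# The first 10 calls are accepted and the 11th is rejected; once 24h have
# passed, the same IP may submit again.
```

A real server would keep these timestamps in its database rather than
in process memory, so the limit survives restarts.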

> What about something like this:
> 1. Submit something "unique" but relatively harmless- first network
> device MAC address seems reasonable. Root UUID would probably be a bit
> more work.
> 2. This suggestion forms a (IP, MAC) combo. If we've seen it before,
> let it through- what does it matter? We should just update the
> statistics list for this guy.
> 3. Same IP, new MAC, and MAC is nowhere else in system- let it through
> if we haven't had more than X (5? 10?) submissions in the last 24h
> from this IP.
> 4. Same IP, new MAC, MAC is already in system- update the stored IP
> address of the system entry, allow submission through overwriting old
> submission.
> 5. Different IP, MAC already in system- same as above in 4- change the
> system entry and then allow submission, replacing old values.
> 6. And so on- we can "trust" IP address, we can't trust MAC address.
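The rules in the quoted proposal boil down to: a MAC already known to
the system (from any IP) is re-bound to the current IP and its old
submission overwritten, while an unknown MAC is only accepted if its
IP is under a daily quota. A minimal sketch, assuming a hypothetical
`Registry` class, an arbitrary X of 10, and that only new-system
submissions count toward the quota (the proposal leaves this open):

```python
from dataclasses import dataclass, field

DAY = 24 * 60 * 60
MAX_NEW_SYSTEMS_PER_IP = 10  # the proposal's "X (5? 10?)"; 10 is an arbitrary pick


@dataclass
class System:
    ip: str
    packages: list
    last_seen: float


@dataclass
class Registry:
    systems: dict = field(default_factory=dict)    # mac -> System
    new_by_ip: dict = field(default_factory=dict)  # ip -> times new systems were registered

    def submit(self, ip, mac, packages, now):
        system = self.systems.get(mac)
        if system is not None:
            # Rules 2, 4 and 5: the MAC is already known (from this IP or
            # another), so re-bind it to the current IP and overwrite the
            # old submission.
            system.ip = ip
            system.packages = packages
            system.last_seen = now
            return True
        # Rule 3: unknown MAC; accept it only if this IP has registered
        # fewer than MAX_NEW_SYSTEMS_PER_IP new systems in the last 24h.
        recent = [t for t in self.new_by_ip.get(ip, []) if now - t < DAY]
        if len(recent) >= MAX_NEW_SYSTEMS_PER_IP:
            self.new_by_ip[ip] = recent
            return False
        recent.append(now)
        self.new_by_ip[ip] = recent
        self.systems[mac] = System(ip, packages, now)
        return True
```

Note that the MAC here is whatever the client sends, so the sketch
trusts only the IP for rate limiting, as rule 6 says.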

Thinking about that, it's probably not worth the effort. The MAC address
or root (/) UUID would just be another user-submitted value, so it
wouldn't make it any harder for idiots to flood the db. It would only
increase the workload on our side. We would also collect more data from
users than we need, which might raise privacy concerns.

> Every month, cull the stats- if we haven't heard from you in two
> months, you are removed from the counted values. Gather submissions
> once a week or so. Thus someone that wanted to poison the stats would
> have to keep up with the submissions from all of their bogus MAC
> addresses.

Indeed. The more honest users participate on a regular basis, the better
the results, and some false data from idiots won't matter.
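The culling step quoted above (drop anyone not heard from in two months
before counting) could look roughly like this. It is only a sketch:
the function names, the `{system_id: (last_seen, packages)}` layout and
treating "two months" as 60 days are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=60)  # "two months", approximated as 60 days


def cull(submissions, now):
    """Keep only systems whose last submission is within STALE_AFTER of `now`.

    `submissions` maps a hypothetical system id to (last_seen, packages).
    """
    return {
        sid: (last_seen, packages)
        for sid, (last_seen, packages) in submissions.items()
        if now - last_seen <= STALE_AFTER
    }


def package_counts(submissions):
    """Count each package once per system in `submissions`."""
    counts = Counter()
    for _last_seen, packages in submissions.values():
        counts.update(set(packages))
    return counts


now = datetime(2010, 9, 13)
subs = {
    "a": (now - timedelta(days=5), ["pacman", "vim"]),
    "b": (now - timedelta(days=90), ["pacman", "emacs"]),  # stale, dropped
    "c": (now - timedelta(days=30), ["pacman"]),
}
active = cull(subs, now)
counts = package_counts(active)
```

Because stale systems simply stop counting, bogus entries fade out
unless whoever created them keeps resubmitting.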

-- 
Pierre Schmitz, https://users.archlinux.de/~pierre

