[arch-dev-public] pkgstats: second try

Dan McGee dpmcgee at gmail.com
Fri Sep 10 16:46:53 EDT 2010


On Fri, Sep 10, 2010 at 3:27 PM, Pierre Schmitz <pierre at archlinux.de> wrote:
> On Fri, 10 Sep 2010 16:16:46 -0400, Daenyth Blank
> <daenyth+arch at gmail.com> wrote:
>> On Fri, Sep 10, 2010 at 16:15, Ionuț Bîru <ibiru at archlinux.org> wrote:
>>> I noticed this myself when I tried to submit data from another machine on
>>> my network.
>>>
>>> As an idea, we could use the UUID of the root partition.
>> Maybe... Would it make more sense to take a hash of the eth0 MAC
>> address? Not sure if that is sensible... I guess the UUID doesn't
>> change that often.
>
> Well, we have discussed all this before. If I don't limit
> submissions by IP, it will be too easy for a single person to flood us
> with false data, making the whole stats pointless. The IP address is the
> only value you cannot easily spoof over the internet.

Sure, I'm not saying don't validate IP addresses at all, but the limit
should probably be higher than one submission per IP in the given time
frame.
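A per-IP limit above one submission per time frame could be sketched as a sliding window: remember when each accepted submission from an IP arrived, forget entries older than the window, and refuse once the window is full. The class name `PerIpLimiter`, its parameters, and the in-memory storage are hypothetical; 10 submissions per 24 hours is only an example value.

```python
from collections import defaultdict, deque


class PerIpLimiter:
    """Hypothetical sliding-window limiter: at most max_per_window accepted
    submissions per IP within window_seconds. State is kept in memory only."""

    def __init__(self, max_per_window=10, window_seconds=24 * 3600):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # IP -> timestamps of accepted submissions

    def allow(self, ip, now):
        stamps = self._seen[ip]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_per_window:
            return False
        stamps.append(now)
        return True


limiter = PerIpLimiter(max_per_window=10)
results = [limiter.allow("192.0.2.1", now=t) for t in range(11)]
# ten True values, then False
```

A sliding window avoids the burst a fixed "per calendar day" counter allows at midnight, at the cost of storing one timestamp per accepted submission.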

> Whatever we implement on the client side (pkgstats) doesn't
> matter, as you can still post your data directly or just modify the
> script. (And yes, client SSL certs are overkill; people wouldn't use
> pkgstats if it required them.)
>
> One thing I could do, though, is allow more than one submission per IP
> per day. What would be a reasonable value? Something like 10 submissions
> per IP within 24h?

What about something like this:
1. Submit something "unique" but relatively harmless; the MAC address of
the first network device seems reasonable. The root UUID would probably
be a bit more work.
2. This forms an (IP, MAC) pair. If we've seen it before, let it
through; what does it matter? We should just update the statistics for
this system.
3. Same IP, new MAC, and the MAC is nowhere else in the system: let it
through if we haven't had more than X (5? 10?) submissions from this IP
in the last 24h.
4. Same IP, MAC not yet seen with this IP but already in the system:
update the stored IP address of the system entry and allow the
submission, overwriting the old one.
5. Different IP, MAC already in the system: same as 4; change the
system entry and then allow the submission, replacing the old values.
6. And so on. We can "trust" IP addresses; we can't trust MAC addresses.
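The rules above could be sketched as a single admission function. Note that steps 2, 4 and 5 all reduce to "MAC already known: store the current IP and replace the old data", so only an unknown MAC needs the per-IP check. Everything here is a hypothetical, in-memory sketch: `Registry`, `SystemEntry`, and `max_per_ip` (standing in for "X") are illustrative names, "more than X" is read as "reject once X submissions are already counted", and the unlisted "new IP, new MAC" case is assumed to follow step 3.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

DAY = 24 * 3600  # seconds


@dataclass
class SystemEntry:
    ip: str
    packages: list
    last_seen: float


@dataclass
class Registry:
    max_per_ip: int = 10  # hypothetical value for "X" in step 3
    systems: dict = field(default_factory=dict)  # MAC -> SystemEntry
    recent: dict = field(default_factory=lambda: defaultdict(deque))  # IP -> accepted timestamps

    def _count_recent(self, ip, now):
        """Prune and count accepted submissions from `ip` in the last 24h."""
        stamps = self.recent[ip]
        while stamps and now - stamps[0] >= DAY:
            stamps.popleft()
        return len(stamps)

    def submit(self, ip, mac, packages, now):
        """Return True if the submission is accepted."""
        recent_count = self._count_recent(ip, now)
        entry = self.systems.get(mac)
        if entry is None:
            # Step 3: MAC is nowhere in the system, so apply the per-IP limit.
            if recent_count >= self.max_per_ip:
                return False
            self.systems[mac] = SystemEntry(ip, packages, now)
        else:
            # Steps 2, 4 and 5: known MAC. Store the current IP (a no-op when
            # the (IP, MAC) pair was already known) and replace the old data.
            entry.ip = ip
            entry.packages = packages
            entry.last_seen = now
        self.recent[ip].append(now)
        return True
```

Because a known MAC always gets through, the only thing the per-IP limit bounds is how fast one address can introduce new systems, which matches the point in step 6 that the IP is trusted and the MAC is not.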

Every month, cull the stats: if we haven't heard from you in two
months, you are removed from the counted values. Gather submissions
once a week or so. Thus someone who wanted to poison the stats would
have to keep submitting from all of their bogus MAC addresses.

-Dan

