On Fri, 10 Sep 2010 15:46:53 -0500, Dan McGee <dpmcgee@gmail.com> wrote:
On Fri, Sep 10, 2010 at 3:27 PM, Pierre Schmitz <pierre@archlinux.de> wrote:
Well, we have discussed all this before. If I don't limit submissions by IP, it will be too easy for a single person to flood us with false data, making the whole stats pointless. The IP is the only value you cannot easily spoof over the internet.
Sure- I'm not saying don't validate IP addresses at all, but the limit should probably be higher than 1 submission per IP in the given time frame.
You are now allowed to do 10 submissions per IP within 24h. Of course there are always corner cases, but I am happy if we catch most use cases here without making it too easy for one single person to screw up the whole stats.
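For illustration, a limit of 10 submissions per IP within 24h could be sketched as a sliding window over submission timestamps. This is only a minimal in-memory sketch, not the code running on the stats server; the class name `IpRateLimiter` and its methods are hypothetical, and a real deployment would presumably keep the timestamps in the database.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # 24h window, as stated above
MAX_SUBMISSIONS = 10           # limit per IP, as stated above


class IpRateLimiter:
    """Hypothetical sliding-window limiter keyed by IP address."""

    def __init__(self, limit=MAX_SUBMISSIONS, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        # ip -> timestamps (seconds) of accepted submissions, oldest first
        self._accepted = defaultdict(deque)

    def allow(self, ip, now):
        """Return True and record the submission if `ip` is under the limit."""
        timestamps = self._accepted[ip]
        # Forget submissions that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

Rejected submissions are not recorded, so a client that keeps retrying does not extend its own lockout.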
What about something like this:

1. Submit something "unique" but relatively harmless- first network device MAC address seems reasonable. Root UUID would probably be a bit more work.
2. This suggestion forms a (IP, MAC) combo. If we've seen it before, let it through- what does it matter? We should just update the statistics list for this guy.
3. Same IP, new MAC, and MAC is nowhere else in system- let it through if we haven't had more than X (5? 10?) submissions in the last 24h from this IP.
4. Same IP, new MAC, MAC is already in system- update the stored IP address of the system entry, allow submission through overwriting old submission.
5. Different IP, MAC already in system- same as above in 4- change the system entry and then allow submission, replacing old values.
6. And so on- we can "trust" IP address, we can't trust MAC address.
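One way to read the rules above as code, purely as a sketch: a known MAC always overwrites its existing entry (rules 2, 4 and 5), and an unknown MAC is accepted only while the submitting IP is under its 24h quota (rule 3). The names `SubmissionStore` and `submit` are hypothetical, storage is in-memory, and counting only newly registered MACs against the quota is an assumption about what rule 3 intends.

```python
DAY = 24 * 60 * 60  # seconds


class SubmissionStore:
    """Hypothetical in-memory store implementing the (IP, MAC) rules."""

    def __init__(self, max_new_per_ip=10):
        self.max_new_per_ip = max_new_per_ip  # the "X" from rule 3
        self.systems = {}    # mac -> {"ip": ..., "data": ..., "last_seen": ...}
        self.new_by_ip = {}  # ip -> timestamps of newly registered MACs

    def submit(self, ip, mac, data, now):
        """Return True if the submission is accepted."""
        entry = self.systems.get(mac)
        if entry is not None:
            # Rules 2, 4, 5: known MAC, so replace the old submission
            # and remember the IP it came from this time.
            entry.update(ip=ip, data=data, last_seen=now)
            return True

        # Rule 3: unknown MAC, so the (trusted) IP is rate limited.
        recent = [t for t in self.new_by_ip.get(ip, []) if now - t < DAY]
        self.new_by_ip[ip] = recent
        if len(recent) >= self.max_new_per_ip:
            return False
        recent.append(now)
        self.systems[mac] = {"ip": ip, "data": data, "last_seen": now}
        return True
```

Since the MAC is a user-supplied value, this only caps how many new entries one IP can create per day; it does not stop someone from overwriting an entry whose MAC they know.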
Thinking about it, that's probably not worth the effort. The MAC address or root UUID would just be a user-submitted value. This wouldn't make it any harder for idiots to flood the db. It just increases the workload on our side. We would also collect more data from the users than we need, which might raise privacy concerns.
Every month, cull the stats- if we haven't heard from you in two months, you are removed from the counted values. Gather submissions once a week or so. Thus someone that wanted to poison the stats would have to keep up with the submissions from all of their bogus MAC addresses.
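The monthly cull could look roughly like the following sketch. The function name `cull` and the mapping of system identifiers to last-seen times are hypothetical, and "two months" is approximated as 60 days.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=60)  # "two months", approximated as 60 days


def cull(last_seen, now, max_age=MAX_AGE):
    """Return only the systems heard from within `max_age` of `now`.

    `last_seen` maps a system identifier to the datetime of its most
    recent submission; stale systems are dropped from the result.
    """
    return {
        system: seen
        for system, seen in last_seen.items()
        if now - seen <= max_age
    }
```

With weekly submissions, a system that keeps reporting is never more than about a week stale, so only entries that stop being refreshed fall out of the counted values.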
Indeed. The more fair users participate on a regular basis, the better the results are, and some false data from idiots won't matter.

--
Pierre Schmitz, https://users.archlinux.de/~pierre