[aur-dev] [PATCH 1/4] make gendummydata script more friendly

Wed Apr 6 02:29:54 EDT 2011

On Tue, Apr 5, 2011 at 11:20 PM, Rémy Oudompheng
<remyoudompheng at gmail.com> wrote:
> On Tue 05 April 2011 at 17:54 -0700, elij wrote:
>> - no need to use mysql
>> - just considering categories as an integer range, specified to the size
>>   of that in the aur-schema.
>
> So does this produce valid SQL commands ? Why don't you escape the
> strings anymore ?
>
>> - use logging module instead of writing directly to stderr
>>   this makes the code cleaner as there is only one test for the value of
>>   DBUG.
>
> Why is this in the same patch? And I don't really see the point of using
> the logging module here: it seems to spam the user with dozens of
> "DEBUG: working..." where the previous little dots actually looked nice.

I removed that in a later patch.
Because of the space format issue previously mentioned, I didn't
squash history and turn it into a giant single patch.

>> ---
>>  support/schema/gendummydata.py |  100 +++++++++++++---------------------------
>>  1 files changed, 32 insertions(+), 68 deletions(-)
>>
>> diff --git a/support/schema/gendummydata.py b/support/schema/gendummydata.py
>> index 7b1d0cf..47d9bd5 100755
>> --- a/support/schema/gendummydata.py
>> +++ b/support/schema/gendummydata.py
>> @@ -15,7 +15,8 @@ import os
>>  import sys
>>  import cStringIO
>>  import commands
>> -
>> +import logging
>> +import re
>
> Where is the re module used ?

I forgot to remove this.

I had used it at one point for extracting the category names from the
aur-schema.sql file, but then realized that it was a pointless
endeavor. The names are not used to generate the package data, just
the IDs. Since the category ID for a dummydata package is chosen via
randomization, just choosing a random number from 0 to
count_of_categories is enough. In the case of the current AUR, that is
17. I made it a variable right alongside many of the other variables.
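
A minimal sketch of that selection, for illustration only: the helper
name is hypothetical (it is not in the patch), and whether the lower
bound should be 0 or 1 depends on how the schema numbers its category
IDs, so the bounds here are assumptions.

```python
import random

CATEGORIES_COUNT = 17  # the number of categories from aur-schema


def random_category_id(count=CATEGORIES_COUNT, rng=random):
    """Pick a category ID without knowing any category names.

    Hypothetical helper. randint() includes both endpoints, so this
    returns an integer in [0, count]; reading "0 to count_of_categories"
    as inclusive is an assumption.
    """
    return rng.randint(0, count)
```

Only the count is needed, so nothing has to be parsed out of
aur-schema.sql to generate the package rows.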

Looks like I need to clean up my patch set a bit.

>>  DBUG      = 1
>>  SEED_FILE = "/usr/share/dict/words"
>> @@ -33,6 +34,7 @@ PKG_FILES = (8, 30)    # min/max number of files in a package
>>  PKG_DEPS  = (1, 5)     # min/max depends a package has
>>  PKG_SRC   = (1, 3)     # min/max sources a package has
>>  PKG_CMNTS = (1, 5)     # min/max number of comments a package has
>> +CATEGORIES_COUNT = 17  # the number of categories from aur-schema
>
> I am wondering whether something like counting the matching lines in
> aur-schema.sql would not be a better idea.

I think the number of categories in the schema changes so seldom that
counting them would be pointless.
If the count of categories is increased beyond 17, there simply would
be no test packages with the new categories (not a critical failure). If
the names of the categories change, it would not matter at all (only
the IDs are used). The only case that matters is if categories are
removed. In that case, update the variable.
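
For comparison, counting the matching lines could look roughly like the
sketch below. It assumes each category is added by its own line
beginning with "INSERT INTO PackageCategories"; that line format and the
function name are assumptions, not something taken from the patch.

```python
def count_categories(schema_sql):
    """Count category INSERT lines in the text of a schema file.

    Hypothetical helper: the prefix below is an assumed line format,
    so it must be adjusted to whatever the real schema file uses.
    """
    prefix = "insert into packagecategories"
    return sum(
        1
        for line in schema_sql.splitlines()
        if line.strip().lower().startswith(prefix)
    )
```

This would replace the hard-coded 17, at the price of tying the script
to the exact formatting of the schema file.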