[aur-dev] some patches to gendummydata (take 3) - Aur-dev - lists.archlinux.org


elij

7 Apr 2011

5:23 a.m.

Updated with changes Lukas requested.

- split logging and mysql removal into separate patches
- update logging patch with uppercase variable convention
- drop tabs -> spaces patch
- clarified commit messages


elij

7 Apr

5:23 a.m.

New subject: [aur-dev] [PATCH 1/4] remove mysql dependency from gendummydata

- remove need to use mysql for escaping the sql -- removing single quote
  should be enough
- instead of using sql to fetch categories from a live database, simply
  consider categories an integer range, specified to the size of that in
  the aur-schema.
---
 support/schema/gendummydata.py |   43 +--------------------------------------
 1 files changed, 2 insertions(+), 41 deletions(-)

diff --git a/support/schema/gendummydata.py b/support/schema/gendummydata.py
index 7b1d0cf..4dc0de1 100755
--- a/support/schema/gendummydata.py
+++ b/support/schema/gendummydata.py
@@ -56,33 +56,10 @@ if not os.path.exists(SEED_FILE):
 	sys.stderr.write("Please install the 'words' Arch package\n");
 	raise SystemExit
 
-# Make sure database access will be available
-#
-try:
-	import MySQLdb
-except:
-	sys.stderr.write("Please install the 'mysql-python' Arch package\n");
-	raise SystemExit
-
-# try to connect to database
-#
-try:
-	db = MySQLdb.connect(host = DB_HOST, user = DB_USER,
-			db = DB_NAME, passwd = DB_PASS)
-	dbc = db.cursor()
-except:
-	sys.stderr.write("Could not connect to database\n");
-	raise SystemExit
-
-esc = db.escape_string
-
-
 # track what users/package names have been used
 #
 seen_users = {}
 seen_pkgs = {}
-categories = {}
-category_keys = []
 user_keys = []
 
 # some functions to generate random data
@@ -95,7 +72,7 @@ def genVersion():
 		ver.append("%d" % random.randrange(0,100))
 	return ".".join(ver) + "-u%d" % random.randrange(1,11)
 def genCategory():
-	return categories[category_keys[random.randrange(0,len(category_keys))]]
+	return random.randrange(0,CATEGORIES_COUNT)
 def genUID():
 	return seen_users[user_keys[random.randrange(0,len(user_keys))]]
 
@@ -149,22 +126,6 @@ while len(seen_pkgs) < MAX_PKGS:
 #
 contents = None
 
-# Load package categories from database
-#
-if DBUG: print "Loading package categories..."
-q = "SELECT * FROM PackageCategories"
-dbc.execute(q)
-row = dbc.fetchone()
-while row:
-	categories[row[1]] = row[0]
-	row = dbc.fetchone()
-category_keys = categories.keys()
-
-# done with the database
-#
-dbc.close()
-db.close()
-
 # developer/tu IDs
 #
 developers = []
@@ -245,7 +206,7 @@ for p in seen_pkgs.keys():
 	#
 	num_comments = random.randrange(PKG_CMNTS[0], PKG_CMNTS[1])
 	for i in range(0, num_comments):
-		fortune = esc(commands.getoutput(FORTUNE_CMD).replace("'",""))
+		fortune = commands.getoutput(FORTUNE_CMD).replace("'","")
 		now = NOW + random.randrange(400, 86400*3)
 		s = "INSERT INTO PackageComments (PackageID, UsersID, Comments, CommentTS) VALUES (%d, %d, '%s', %d);\n" % (seen_pkgs[p], genUID(), fortune, now)
 		out.write(s)
-- 
1.7.4.1
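The first point of the commit message (dropping `db.escape_string` and relying on single-quote removal) can be sketched as follows. This is only an illustration in Python 3, not the script's code: `strip_quotes` and `comment_insert` are hypothetical helper names, and the sample values are made up. Stripping quotes is plausibly adequate for throwaway dummy data, but it is not a general escaping strategy, since it does nothing about other characters a server may treat specially, such as backslashes.

```python
def strip_quotes(text):
    """Drop single quotes so the text can sit inside a '...' SQL literal."""
    return text.replace("'", "")


def comment_insert(package_id, user_id, comment, timestamp):
    # Same statement shape as the PackageComments INSERT in the patch;
    # the helper itself is hypothetical.
    template = ("INSERT INTO PackageComments (PackageID, UsersID,"
                " Comments, CommentTS) VALUES (%d, %d, '%s', %d);\n")
    return template % (package_id, user_id, strip_quotes(comment), timestamp)


# Made-up sample values, for illustration only.
stmt = comment_insert(1, 2, "Don't panic", 1302146580)
print(stmt, end="")
```

After stripping, the only single quotes left in the statement are the two that delimit the comment literal.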

elij

5:23 a.m.

New subject: [aur-dev] [PATCH 2/4] replace print statements with logging module in gendummydata

use the logging module instead of writing directly to stderr

this makes the code cleaner as it removes the numerous tests for the
value of DBUG, yet allows devs to control the level of output verbosity.
---
 support/schema/gendummydata.py |   63 ++++++++++++++-------------------------
 1 files changed, 23 insertions(+), 40 deletions(-)

diff --git a/support/schema/gendummydata.py b/support/schema/gendummydata.py
index 4dc0de1..290002a 100755
--- a/support/schema/gendummydata.py
+++ b/support/schema/gendummydata.py
@@ -15,9 +15,9 @@ import os
 import sys
 import cStringIO
 import commands
+import logging
 
-
-DBUG = 1
+LOG_LEVEL = logging.DEBUG # logging level. set to logging.INFO to reduce output
 SEED_FILE = "/usr/share/dict/words"
 DB_HOST = os.getenv("DB_HOST", "localhost")
 DB_NAME = os.getenv("DB_NAME", "AUR")
@@ -33,6 +33,7 @@ PKG_FILES = (8, 30) # min/max number of files in a package
 PKG_DEPS = (1, 5) # min/max depends a package has
 PKG_SRC = (1, 3) # min/max sources a package has
 PKG_CMNTS = (1, 5) # min/max number of comments a package has
+CATEGORIES_COUNT = 17 # the number of categories from aur-schema
 VOTING = (0, .30) # percentage range for package voting
 RANDOM_PATHS = ( # random path locations for package files
 	"/usr/bin", "/usr/lib", "/etc", "/etc/rc.d", "/usr/share", "/lib",
@@ -45,15 +46,19 @@ RANDOM_URL = ("http://www.", "ftp://ftp.", "http://", "ftp://")
 RANDOM_LOCS = ("pub", "release", "files", "downloads", "src")
 FORTUNE_CMD = "/usr/bin/fortune -l"
 
+# setup logging
+logformat = "%(levelname)s: %(message)s"
+logging.basicConfig(format=logformat, level=LOG_LEVEL)
+log = logging.getLogger()
 
 if len(sys.argv) != 2:
-	sys.stderr.write("Missing output filename argument");
+	log.error("Missing output filename argument")
 	raise SystemExit
 
 # make sure the seed file exists
 #
 if not os.path.exists(SEED_FILE):
-	sys.stderr.write("Please install the 'words' Arch package\n");
+	log.error("Please install the 'words' Arch package")
 	raise SystemExit
 
 # track what users/package names have been used
@@ -79,7 +84,7 @@ def genUID():
 
 # load the words, and make sure there are enough words for users/pkgs
 #
-if DBUG: print "Grabbing words from seed file..."
+log.debug("Grabbing words from seed file...")
 fp = open(SEED_FILE, "r")
 contents = fp.readlines()
 fp.close()
@@ -94,7 +99,7 @@ else:
 
 # select random usernames
 #
-if DBUG: print "Generating random user names..."
+log.debug("Generating random user names...")
 user_id = USER_ID
 while len(seen_users) < MAX_USERS:
 	user = random.randrange(0, len(contents))
@@ -107,7 +112,7 @@ user_keys = seen_users.keys()
 
 # select random package names
 #
-if DBUG: print "Generating random package names..."
+log.debug("Generating random package names...")
 num_pkgs = PKG_ID
 while len(seen_pkgs) < MAX_PKGS:
 	pkg = random.randrange(0, len(contents))
@@ -140,8 +145,7 @@ out.write("BEGIN;\n")
 
 # Begin by creating the User statements
 #
-if DBUG: print "Creating SQL statements for users.",
-count = 0
+log.debug("Creating SQL statements for users.")
 for u in user_keys:
 	account_type = 1 # default to normal user
 	if not has_devs or not has_tus:
@@ -162,22 +166,18 @@ for u in user_keys:
 		# a normal user account
 		#
 		pass
-	
+
 	s = "INSERT INTO Users (ID, AccountTypeID, Username, Email, Passwd) VALUES (%d, %d, '%s', '%s@example.com', MD5('%s'));\n" % (seen_users[u], account_type, u, u, u)
 	out.write(s)
-	if count % 10 == 0:
-		if DBUG: print ".",
-	count += 1
-if DBUG: print "."
-if DBUG:
-	print "Number of developers:", len(developers)
-	print "Number of trusted users:", len(trustedusers)
-	print "Number of users:", (MAX_USERS-len(developers)-len(trustedusers))
-	print "Number of packages:", MAX_PKGS
+
+log.debug("Number of developers: %d" % len(developers))
+log.debug("Number of trusted users: %d" % len(trustedusers))
+log.debug("Number of users: %d" % (MAX_USERS-len(developers)-len(trustedusers)))
+log.debug("Number of packages: %d" % MAX_PKGS)
 
 # Create the package statements
 #
-if DBUG: print "Creating SQL statements for packages.",
+log.debug("Creating SQL statements for packages.")
 count = 0
 for p in seen_pkgs.keys():
 	NOW = int(time.time())
@@ -198,8 +198,6 @@ for p in seen_pkgs.keys():
 			genCategory(), NOW, uuid, muid)
 
 	out.write(s)
-	if count % 100 == 0:
-		if DBUG: print ".",
 	count += 1
 
 	# create random comments for this package
@@ -211,13 +209,10 @@ for p in seen_pkgs.keys():
 		s = "INSERT INTO PackageComments (PackageID, UsersID, Comments, CommentTS) VALUES (%d, %d, '%s', %d);\n" % (seen_pkgs[p], genUID(), fortune, now)
 		out.write(s)
 
-if DBUG: print "."
-
 # Cast votes
 #
 track_votes = {}
-if DBUG: print "Casting votes for packages.",
-count = 0
+log.debug("Casting votes for packages.")
 for u in user_keys:
 	num_votes = random.randrange(int(len(seen_pkgs)*VOTING[0]),
 			int(len(seen_pkgs)*VOTING[1]))
@@ -231,9 +226,6 @@ for u in user_keys:
 				track_votes[pkg] = 0
 			track_votes[pkg] += 1
 			out.write(s)
-	if count % 100 == 0:
-		if DBUG: print ".",
-	count += 1
 
 # Update statements for package votes
 #
@@ -243,8 +235,7 @@ for p in track_votes.keys():
 
 # Create package dependencies and sources
 #
-if DBUG: print "."; print "Creating statements for package depends/sources.",
-count = 0
+log.debug("Creating statements for package depends/sources.")
 for p in seen_pkgs.keys():
 	num_deps = random.randrange(PKG_DEPS[0], PKG_DEPS[1])
 	this_deps = {}
@@ -268,17 +259,9 @@ for p in seen_pkgs.keys():
 				seen_pkgs[p], src)
 		out.write(s)
 
-	if count % 100 == 0:
-		if DBUG: print ".",
-	count += 1
-
-
 # close output file
 #
 out.write("COMMIT;\n")
 out.write("\n")
 out.close()
-
-if DBUG: print "."
-if DBUG: print "Done."
-
+log.debug("Done.")
-- 
1.7.4.1
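To illustrate the verbosity control the commit message describes, here is a minimal Python 3 sketch of how a level setting such as `LOG_LEVEL` filters messages. It is an assumption-laden illustration rather than the patch's code: the patch configures the root logger with `logging.basicConfig`, whereas this sketch uses a named logger and an in-memory stream so the output can be inspected; `make_logger`, `run`, and the logger name are hypothetical.

```python
import io
import logging


def make_logger(level, stream):
    # Hypothetical helper: a named logger writing to `stream`, using the
    # same "%(levelname)s: %(message)s" format string as the patch.
    logger = logging.getLogger("gendummydata-sketch")
    logger.setLevel(level)
    logger.handlers.clear()   # start clean when called more than once
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def run(level):
    # Emit one debug and one error message, return what was written.
    buf = io.StringIO()
    log = make_logger(level, buf)
    log.debug("Grabbing words from seed file...")
    log.error("Missing output filename argument")
    return buf.getvalue()


debug_output = run(logging.DEBUG)  # both messages are written
info_output = run(logging.INFO)    # the debug message is filtered out
print(debug_output, end="")
```

With the level at `logging.INFO`, the `log.debug(...)` calls stay in the code but produce no output, which is what replaces the repeated `if DBUG:` tests.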

elij

5:23 a.m.

New subject: [aur-dev] [PATCH 3/4] wrap long SQL commands to improve formatting and readability

---
 support/schema/gendummydata.py |   34 +++++++++++++++++++++++-----------
 1 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/support/schema/gendummydata.py b/support/schema/gendummydata.py
index 290002a..cb27f9c 100755
--- a/support/schema/gendummydata.py
+++ b/support/schema/gendummydata.py
@@ -167,7 +167,9 @@ for u in user_keys:
 		#
 		pass
 
-	s = "INSERT INTO Users (ID, AccountTypeID, Username, Email, Passwd) VALUES (%d, %d, '%s', '%s@example.com', MD5('%s'));\n" % (seen_users[u], account_type, u, u, u)
+	s = ("INSERT INTO Users (ID, AccountTypeID, Username, Email, Passwd)"
+		" VALUES (%d, %d, '%s', '%s@example.com', MD5('%s'));\n")
+	s = s % (seen_users[u], account_type, u, u, u)
 	out.write(s)
 
 log.debug("Number of developers: %d" % len(developers))
@@ -191,11 +193,15 @@ for p in seen_pkgs.keys():
 	uuid = genUID() # the submitter/user
 
 	if muid == 0:
-		s = "INSERT INTO Packages (ID, Name, Version, CategoryID, SubmittedTS, SubmitterUID, MaintainerUID) VALUES (%d, '%s', '%s', %d, %d, %d, NULL);\n" % (seen_pkgs[p], p, genVersion(),
-			genCategory(), NOW, uuid)
+		s = ("INSERT INTO Packages (ID, Name, Version, CategoryID,"
+			" SubmittedTS, SubmitterUID, MaintainerUID) VALUES"
+			" (%d, '%s', '%s', %d, %d, %d, NULL);\n")
+		s = s % (seen_pkgs[p], p, genVersion(), genCategory(), NOW, uuid)
 	else:
-		s = "INSERT INTO Packages (ID, Name, Version, CategoryID, SubmittedTS, SubmitterUID, MaintainerUID) VALUES (%d, '%s', '%s', %d, %d, %d, %d);\n" % (seen_pkgs[p], p, genVersion(),
-			genCategory(), NOW, uuid, muid)
+		s = ("INSERT INTO Packages (ID, Name, Version, CategoryID,"
+			" SubmittedTS, SubmitterUID, MaintainerUID) VALUES "
+			" (%d, '%s', '%s', %d, %d, %d, %d);\n")
+		s = s % (seen_pkgs[p], p, genVersion(), genCategory(), NOW, uuid, muid)
 
 	out.write(s)
 	count += 1
@@ -206,7 +212,9 @@ for p in seen_pkgs.keys():
 	for i in range(0, num_comments):
 		fortune = commands.getoutput(FORTUNE_CMD).replace("'","")
 		now = NOW + random.randrange(400, 86400*3)
-		s = "INSERT INTO PackageComments (PackageID, UsersID, Comments, CommentTS) VALUES (%d, %d, '%s', %d);\n" % (seen_pkgs[p], genUID(), fortune, now)
+		s = ("INSERT INTO PackageComments (PackageID, UsersID,"
+			" Comments, CommentTS) VALUES (%d, %d, '%s', %d);\n")
+		s = s % (seen_pkgs[p], genUID(), fortune, now)
 		out.write(s)
 
 # Cast votes
@@ -220,7 +228,9 @@ for u in user_keys:
 	for v in range(num_votes):
 		pkg = random.randrange(1, len(seen_pkgs) + 1)
 		if not pkgvote.has_key(pkg):
-			s = "INSERT INTO PackageVotes (UsersID, PackageID) VALUES (%d, %d);\n" % (seen_users[u], pkg)
+			s = ("INSERT INTO PackageVotes (UsersID, PackageID)"
+				" VALUES (%d, %d);\n")
+			s = s % (seen_users[u], pkg)
 			pkgvote[pkg] = 1
 			if not track_votes.has_key(pkg):
 				track_votes[pkg] = 0
@@ -230,7 +240,8 @@ for u in user_keys:
 # Update statements for package votes
 #
 for p in track_votes.keys():
-	s = "UPDATE Packages SET NumVotes = %d WHERE ID = %d;\n" % (track_votes[p], p)
+	s = "UPDATE Packages SET NumVotes = %d WHERE ID = %d;\n"
+	s = s % (track_votes[p], p)
 	out.write(s)
 
 # Create package dependencies and sources
@@ -243,7 +254,8 @@ for p in seen_pkgs.keys():
 	while i != num_deps:
 		dep = random.randrange(1, len(seen_pkgs) + 1)
 		if not this_deps.has_key(dep):
-			s = "INSERT INTO PackageDepends VALUES (%d, %d, NULL);\n" % (seen_pkgs[p], dep)
+			s = "INSERT INTO PackageDepends VALUES (%d, %d, NULL);\n"
+			s = s % (seen_pkgs[p], dep)
 			out.write(s)
 			i += 1
 
@@ -255,8 +267,8 @@ for p in seen_pkgs.keys():
 			p, RANDOM_TLDS[random.randrange(0,len(RANDOM_TLDS))],
 			RANDOM_LOCS[random.randrange(0,len(RANDOM_LOCS))],
 			src_file, genVersion())
-		s = "INSERT INTO PackageSources VALUES (%d, '%s');\n" % (
-				seen_pkgs[p], src)
+		s = "INSERT INTO PackageSources VALUES (%d, '%s');\n"
+		s = s % (seen_pkgs[p], src)
 		out.write(s)
 
 # close output file
-- 
1.7.4.1
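The wrapping technique this patch relies on is Python's implicit concatenation of adjacent string literals inside parentheses, followed by `%` formatting as a separate step. A short Python 3 sketch, with made-up sample values (`user_id`, `account_type`, `name` are illustrative, not taken from the script), shows that the wrapped and unwrapped forms produce the same statement:

```python
# Adjacent string literals inside parentheses are joined into one string,
# so the template can be split across lines without a "+" operator.
wrapped = ("INSERT INTO Users (ID, AccountTypeID, Username, Email, Passwd)"
           " VALUES (%d, %d, '%s', '%s@example.com', MD5('%s'));\n")

# The original single-line form of the same template.
one_line = "INSERT INTO Users (ID, AccountTypeID, Username, Email, Passwd) VALUES (%d, %d, '%s', '%s@example.com', MD5('%s'));\n"

# Made-up sample values, for illustration only.
user_id, account_type, name = 10, 1, "alice"

# Formatting happens in a second step, as in the patch ("s = s % (...)").
stmt = wrapped % (user_id, account_type, name, name, name)
print(stmt, end="")
```

Each continuation piece has to carry its own separating space, since nothing is inserted between the joined literals. In the `else` branch of the Packages hunk above, both the trailing and the leading space are present, which yields two spaces after `VALUES`; that is harmless, because SQL ignores extra whitespace between tokens.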

elij

5:27 a.m.

On Wed, Apr 6, 2011 at 7:23 PM, elij <elij.mx@gmail.com> wrote:

> Updated with changes Lukas requested.
>
> - split logging and mysql removal into separate patches
> - update logging patch with uppercase variable convention
> - drop tabs -> spaces patch
> - clarified commit messages

hmm. looks like I missed a single line extraction in my rebase -i for splitting the mysql and logging patches.

> +CATEGORIES_COUNT = 17 # the number of categories from aur-schema

I can fix and resend if desired.

Lukas Fleischer

4:21 p.m.

On Wed, Apr 06, 2011 at 07:27:59PM -0700, elij wrote:

> On Wed, Apr 6, 2011 at 7:23 PM, elij <elij.mx@gmail.com> wrote:
> > Updated with changes Lukas requested.
> >
> > - split logging and mysql removal into separate patches
> > - update logging patch with uppercase variable convention
> > - drop tabs -> spaces patch
> > - clarified commit messages
>
> hmm. looks like I missed a single line extraction in my rebase -i for splitting the mysql and logging patches.
>
> > +CATEGORIES_COUNT = 17 # the number of categories from aur-schema
>
> I can fix and resend if desired.

Nah, 's alright. I'll keep that in mind and fix it when applying your patches.

Apart from that, your patches look fine to me now. I'll push them as soon as the gettext/Transifex transition is done.

Lukas Fleischer

12 Apr

2:08 p.m.

On Thu, Apr 07, 2011 at 03:21:48PM +0200, Lukas Fleischer wrote:

> On Wed, Apr 06, 2011 at 07:27:59PM -0700, elij wrote:
> > On Wed, Apr 6, 2011 at 7:23 PM, elij <elij.mx@gmail.com> wrote:
> > > Updated with changes Lukas requested.
> > >
> > > - split logging and mysql removal into separate patches
> > > - update logging patch with uppercase variable convention
> > > - drop tabs -> spaces patch
> > > - clarified commit messages
> >
> > hmm. looks like I missed a single line extraction in my rebase -i for splitting the mysql and logging patches.
> >
> > > +CATEGORIES_COUNT = 17 # the number of categories from aur-schema
> >
> > I can fix and resend if desired.
>
> Nah, 's alright. I'll keep that in mind and fix it when applying your patches.
>
> Apart from that, your patches look fine to me now. I'll push them as soon as the gettext/Transifex transition is done.

Pushed, including the "CATEGORIES_COUNT" fix. I also changed the random number range used in genCategory() in the MySQL dependency patch to generate 1-based IDs instead of 0-based ones. Otherwise it would generate wrong IDs and foreign key constraints on the "CategoryID" column would fail when importing dummy data into the database.
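The off-by-one described above can be sketched like this in Python 3. `random.randrange(a, b)` excludes `b`, so the posted `randrange(0, CATEGORIES_COUNT)` yields 0 through 16. The one-based variant below is an assumption about what a fix could look like (IDs 1 through `CATEGORIES_COUNT` inclusive); the range actually committed is not shown in this thread and may differ. Both function names are hypothetical.

```python
import random

CATEGORIES_COUNT = 17  # value from the patch


def gen_category_zero_based():
    # As posted in the patch: yields 0 .. CATEGORIES_COUNT - 1.
    # An ID of 0 would not match any row if category IDs start at 1.
    return random.randrange(0, CATEGORIES_COUNT)


def gen_category_one_based():
    # Assumed shape of a fix: yields 1 .. CATEGORIES_COUNT inclusive.
    return random.randrange(1, CATEGORIES_COUNT + 1)


zero_based = [gen_category_zero_based() for _ in range(1000)]
one_based = [gen_category_one_based() for _ in range(1000)]
print(min(one_based) >= 1, max(one_based) <= CATEGORIES_COUNT)
```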
