[aur-dev] some notes from converting the aur from mysql to postgresql
elij
elij.mx at gmail.com
Mon Apr 25 23:38:05 EDT 2011
On Mon, Apr 25, 2011 at 7:49 PM, Dan McGee <dpmcgee at gmail.com> wrote:
> On Mon, Apr 25, 2011 at 9:35 PM, elij <elij.mx at gmail.com> wrote:
>> While converting the aur codebase from mysql to postgresql (just for
>> fun/to see if I could), I ran into a few mysql specific gotchas, and a
>> few oddities. Here are some of my notes from the experience.
>
> Mind sending the diff along for funzies? Even if we can't apply it as
> a patch, it would be good to see and maybe chop up into "this will be
> compatible with all DBs so we should fix it anyway" and "if we switch
> we will need to look into this".
Sure. I will probably just run git format-patch, dump the patches into
a tarball, and upload it somewhere, since the series is quite large and
not exactly 'split into single-change patches'. ;)
>> * I ran into a few sql integer comparisons that are using string
>> comparisons. postgres didn't like this, but mysql was fine with it
>> ex: `$q .= "WHERE s.UsersId = '" . $userID . "' ";`
> Should definitely be fixed anyway...awesome.
Yeah. Some of these fixes in the above patchset are wrapped up in
other changes, but I can try to split them out into "these can be
fixed regardless" and "these were needed to get postgres working".
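For what it's worth, PostgreSQL will still coerce an untyped numeric
literal like '42'; the usual failure mode is an empty or non-numeric
string, which MySQL silently treats as 0. A sketch (table and column
names here are just illustrative):

```sql
-- MySQL coerces the string to a number ('' becomes 0) and happily matches;
-- PostgreSQL raises "invalid input syntax for integer" instead:
SELECT * FROM PackageComments WHERE UsersID = '';

-- Passing a real integer sidesteps the problem in both databases:
SELECT * FROM PackageComments WHERE UsersID = 42;
```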
>> * using the limit offset format of `"LIMIT x,y"`
>> postgresql doesn't support this format, so I had to track each
>> occurrence down and change it to `"LIMIT y OFFSET x"` (note that the
>> operands swap: in MySQL's comma form the offset comes first).
> This is SQL standard, so you could use this form anyway, correct? Of
> course the MySQL docs claim this:
> For compatibility with PostgreSQL, MySQL also supports the LIMIT
> row_count OFFSET offset syntax.
Yeah. LIMIT with OFFSET is the SQL standard form. Not sure why mysql
had to invent its own comma syntax, but well.. yeah.
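To make the operand swap concrete (column names are illustrative):

```sql
-- MySQL's comma form: skip 10 rows, then return 25
--   SELECT Name FROM Packages ORDER BY Name LIMIT 10, 25;

-- Standard form, accepted by both PostgreSQL and MySQL:
SELECT Name FROM Packages ORDER BY Name LIMIT 25 OFFSET 10;
```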
>> * A few instances of `"LIMIT 0,10"`. A `"LIMIT 10"` would have been
>> sufficient. Not sure why offset was specified in these few instances.
> Sure it wasn't auto-generated, in which case it is easier for the DB
> query analyzer to just throw it away anyway?
It looks like there were just two instances.
web/html/rss.php:
$q = "SELECT * FROM Packages ";
$q.= "WHERE DummyPkg != 1 ";
$q.= "ORDER BY SubmittedTS DESC ";
$q.= "LIMIT 0 , 20";
web/lib/stats.inc:
$q = 'SELECT * FROM Packages WHERE DummyPkg != 1 ORDER BY
GREATEST(SubmittedTS,ModifiedTS) DESC LIMIT 0 , 10';
>> * A pair of column names were reserved words: 'user' and 'end' in the
>> TU_VoteInfo table. I changed the column names to
>> username and ending, to avoid issues and/or having to quote the name
>> all the time.
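For reference, the two options look roughly like this (a sketch; the
actual conversion presumably just used new column names in the schema
rather than ALTER statements):

```sql
-- Reserved words work in PostgreSQL only when double-quoted,
-- and the quotes are needed at every single reference:
SELECT "user", "end" FROM TU_VoteInfo;

-- Renaming avoids the quoting entirely:
ALTER TABLE TU_VoteInfo RENAME COLUMN "user" TO username;
ALTER TABLE TU_VoteInfo RENAME COLUMN "end" TO ending;
```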
>>
>> * postgresql table and column names are lowercase (when not quoted).
>> This required me to either quote all instances of table and
>> column names (ugh) or change the fetch_assoc php references to
>> lowercase. I chose the latter, even though it was tedious to
>> track down all instances and change them.
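The folding behavior is easy to demonstrate; it is also why
pg_fetch_assoc hands back lowercase array keys for unquoted columns:

```sql
-- Unquoted identifiers fold to lower case in PostgreSQL,
-- so these all refer to the same table and column:
SELECT Name FROM Packages;
SELECT name FROM packages;

-- Quoting preserves case, but then every reference must match exactly:
-- SELECT "Name" FROM "Packages";  -- fails if the table was created unquoted
```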
>>
>> * postgres has no mysql_insert_id counterpart. Instead I appended to
>> the query "RETURNING ID" and then used pg_fetch_result
>> to get the ID that was created.
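A minimal sketch of that pattern, assuming ID is backed by a sequence
(serial); on the PHP side the single row/column comes back via
pg_fetch_result($result, 0, 0):

```sql
-- MySQL pattern: INSERT, then read LAST_INSERT_ID() / mysql_insert_id().
-- PostgreSQL can return the generated key from the INSERT itself:
INSERT INTO Packages (Name) VALUES ('newpkg') RETURNING ID;
```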
>>
>> * postgres LIKE and ILIKE queries did full table scans and were quite
>> slow; I think mysql was caching these queries when I was doing my
>> performance tests. The end result was postgres performing poorly in
>> my load test for LIKE/ILIKE operations. So I created a postgres
>> fulltext search column for the packages table, indexed it, defined
>> triggers to keep it updated, and converted to postgres fulltext
>> searching via `"@@ to_tsquery('term')"`. The result was quite speedy
>> and gave nice results. The syntax was a bit different for 'prefix
>> searching' though, and the results were not the same as the
>> LIKE/ILIKE queries, because fulltext uses stemming instead of partial
>> string comparison.
> And case sensitivity would come into play, no?
I built the fulltext index as follows:

alter table packages add column tsv tsvector;
update packages set tsv = to_tsvector('english',
    coalesce(lower(name), '') || ' ' || coalesce(lower(description), ''));

then added a trigger to update it for new data:

DROP FUNCTION IF EXISTS package_fulltext_trigger() CASCADE;
CREATE FUNCTION package_fulltext_trigger() RETURNS trigger AS $$
begin
  new.tsv :=
    setweight(to_tsvector('english', coalesce(lower(new.name), '')), 'A') ||
    setweight(to_tsvector('english', coalesce(lower(new.description), '')), 'B');
  return new;
end
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS packages_tsv_update ON packages;
CREATE TRIGGER packages_tsv_update BEFORE INSERT OR UPDATE
ON packages FOR EACH ROW EXECUTE PROCEDURE package_fulltext_trigger();

(Note: unlike MySQL, PostgreSQL's DROP TRIGGER needs the ON <table>
clause.)
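The prefix-search difference mentioned above comes down to tsquery's
`:*` modifier; a sketch of the query side, plus the index I alluded to:

```sql
-- Stemmed full-text match ('python' also matches 'pythons', etc.):
SELECT Name FROM Packages WHERE tsv @@ to_tsquery('english', 'python');

-- Prefix matching needs the :* modifier on the lexeme,
-- rather than a LIKE 'pyth%' pattern:
SELECT Name FROM Packages WHERE tsv @@ to_tsquery('english', 'pyth:*');

-- A GIN index on the tsvector column keeps both fast:
CREATE INDEX packages_tsv_idx ON Packages USING gin(tsv);
```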
>> * There is quite a bit of sql code sprinkled all over the place. It
>> might make sense to try and consolidate some of the sql to fewer
>> files.
>>
>> * It might make sense to generalize the 'mysql_real_escape_string'
>> instances into a wrapper like 'db_escape' or something similar. This
>> would make things a bit cleaner and ease possible future porting to
>> other databases (fewer things to track down and change individually).
>>
>> * postgres doesn't support `"UNIX_TIMESTAMP()"`. Instead of changing
>> all instances of it, I defined a few convenience sql functions.
>>
>> --
>> -- helper functions to ease conversion from mysql
>> -- from:
>> http://janusz.slota.name/blog/2009/06/mysql-from_unixtime-and-unix_timestamp-functions-in-postgresql/
>> --
>> -- no params
>> CREATE OR REPLACE FUNCTION unix_timestamp() RETURNS BIGINT AS '
>> SELECT EXTRACT(EPOCH FROM CURRENT_TIMESTAMP(0))::bigint AS result;
>> ' LANGUAGE 'SQL';
>>
>> -- timestamp without time zone (i.e. 1973-11-29 21:33:09)
>> CREATE OR REPLACE FUNCTION unix_timestamp(TIMESTAMP) RETURNS BIGINT AS '
>> SELECT EXTRACT(EPOCH FROM $1)::bigint AS result;
>> ' LANGUAGE 'SQL';
>>
>> -- timestamp with time zone (i.e. 1973-11-29 21:33:09+01)
>> CREATE OR REPLACE FUNCTION unix_timestamp(TIMESTAMP WITH TIME
>> zone) RETURNS BIGINT AS '
>> SELECT EXTRACT(EPOCH FROM $1)::bigint AS result;
>> ' LANGUAGE 'SQL';
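With those compatibility functions in place, the MySQL-style calls
should run unchanged (assuming, as in the mysql schema, that columns
like SubmittedTS store epoch seconds):

```sql
-- current epoch seconds, same spelling as MySQL:
SELECT unix_timestamp();

-- e.g. packages submitted in the last day:
SELECT * FROM Packages WHERE SubmittedTS > unix_timestamp() - 86400;
```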
>>
>>
>> I think that is it.
>>
>