[aur-dev] some notes from converting the aur from mysql to postgresql
elij
elij.mx at gmail.com
Mon Apr 25 23:38:05 EDT 2011
On Mon, Apr 25, 2011 at 7:49 PM, Dan McGee <dpmcgee at gmail.com> wrote:
> On Mon, Apr 25, 2011 at 9:35 PM, elij <elij.mx at gmail.com> wrote:
>> While converting the aur codebase from mysql to postgresql (just for
>> fun/to see if I could), I ran into a few mysql specific gotchas, and a
>> few oddities. Here are some of my notes from the experience.
>
> Mind sending the diff along for funzies? Even if we can't apply it as
> a patch, it would be good to see and maybe chop up into "this will be
> compatible with all DBs so we should fix it anyway" and "if we switch
> we will need to look into this".
Sure. I will probably just run git format-patch, dump the patches into
a tarball, and upload it somewhere, since the series is quite large and
not exactly 'split into single-change patches'. ;)
>> * I ran into a few sql integer comparisons that are using string
>> comparisons. postgres didn't like this, but mysql was fine with it
>> ex: `$q .= "WHERE s.UsersId = '" . $userID . "' ";`
> Should definitely be fixed anyway...awesome.
Yeah. Some of these fixes in the above patchset are wrapped up in
other changes, but I can try to split them out into "these can be
fixed regardless" and "these were needed to get postgres working".
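For what it's worth, PostgreSQL will still coerce an untyped numeric
literal like '42'; the usual failure mode is an empty or non-numeric
string, which MySQL silently treats as 0. A sketch (table and column
names here are just illustrative):

```sql
-- MySQL coerces the string to a number ('' becomes 0) and happily matches;
-- PostgreSQL raises "invalid input syntax for integer" instead:
SELECT * FROM PackageComments WHERE UsersID = '';

-- Passing a real integer sidesteps the problem in both databases:
SELECT * FROM PackageComments WHERE UsersID = 42;
```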
>> * using the limit offset format of `"LIMIT x,y"`
>> postgresql doesn't support this format, so I had to track each
>> occurrence down and change it to `"LIMIT y OFFSET x"` (note that the
>> operands swap: in MySQL's comma form the offset comes first).
> This is SQL standard, so you could use this form anyway, correct? Of
> course the MySQL docs claim this:
> For compatibility with PostgreSQL, MySQL also supports the LIMIT
> row_count OFFSET offset syntax.
Yeah. LIMIT with OFFSET is the SQL standard form. Not sure why mysql
had to invent its own comma syntax, but well.. yeah.
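To make the operand swap concrete (column names are illustrative):

```sql
-- MySQL's comma form: skip 10 rows, then return 25
--   SELECT Name FROM Packages ORDER BY Name LIMIT 10, 25;

-- Standard form, accepted by both PostgreSQL and MySQL:
SELECT Name FROM Packages ORDER BY Name LIMIT 25 OFFSET 10;
```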
>> * A few instances of `"LIMIT 0,10"`. A `"LIMIT 10"` would have been
>> sufficient. Not sure why offset was specified in these few instances.
> Sure it wasn't auto-generated, in which case it is easier for the DB
> query analyzer to just throw it away anyway?
It looks like there were just two instances.
web/html/rss.php:
$q = "SELECT * FROM Packages ";
$q.= "WHERE DummyPkg != 1 ";
$q.= "ORDER BY SubmittedTS DESC ";
$q.= "LIMIT 0 , 20";
web/lib/stats.inc:
$q = 'SELECT * FROM Packages WHERE DummyPkg != 1 ORDER BY
GREATEST(SubmittedTS,ModifiedTS) DESC LIMIT 0 , 10';
>> * A pair of column names were reserved words: 'user' and 'end' in the
>> TU_VoteInfo table. I changed the column names to
>> username and ending, to avoid issues and/or having to quote the name
>> all the time.
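For reference, the two options look roughly like this (a sketch; the
actual conversion presumably just used new column names in the schema
rather than ALTER statements):

```sql
-- Reserved words work in PostgreSQL only when double-quoted,
-- and the quotes are needed at every single reference:
SELECT "user", "end" FROM TU_VoteInfo;

-- Renaming avoids the quoting entirely:
ALTER TABLE TU_VoteInfo RENAME COLUMN "user" TO username;
ALTER TABLE TU_VoteInfo RENAME COLUMN "end" TO ending;
```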
>>
>> * postgresql table and column names are lowercase (when not quoted).
>> This required me to either quote all instances of table and
>> column names (ugh) or change the fetch_assoc php references to
>> lowercase. I chose the latter, even though it was tedious to
>> track down all instances and change them.
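The folding behavior is easy to demonstrate; it is also why
pg_fetch_assoc hands back lowercase array keys for unquoted columns:

```sql
-- Unquoted identifiers fold to lower case in PostgreSQL,
-- so these all refer to the same table and column:
SELECT Name FROM Packages;
SELECT name FROM packages;

-- Quoting preserves case, but then every reference must match exactly:
-- SELECT "Name" FROM "Packages";  -- fails if the table was created unquoted
```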
>>
>> * postgres has no mysql_insert_id counterpart. Instead I appended to
>> the query "RETURNING ID" and then used pg_fetch_result
>> to get the ID that was created.
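A minimal sketch of that pattern, assuming ID is backed by a sequence
(serial); on the PHP side the single row/column comes back via
pg_fetch_result($result, 0, 0):

```sql
-- MySQL pattern: INSERT, then read LAST_INSERT_ID() / mysql_insert_id().
-- PostgreSQL can return the generated key from the INSERT itself:
INSERT INTO Packages (Name) VALUES ('newpkg') RETURNING ID;
```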
>>
>> * postgres LIKE and ILIKE queries did full table scans and were quite
>> slow; I think mysql was caching these queries when I was doing my
>> performance tests. The end result was postgres performing poorly in
>> my load test for LIKE/ILIKE operations. So I created a postgres
>> fulltext search column for the packages table, indexed it, defined
>> triggers to keep it updated, and converted to postgres fulltext
>> searching via `"@@ to_tsquery('term')"`. The result was quite speedy
>> and gave nice results. The syntax was a bit different for 'prefix
>> searching' though, and the results were not the same as the
>> LIKE/ILIKE queries, because fulltext uses stemming instead of partial
>> string comparison.
> And case sensitivity would come into play, no?
I built the fulltext index as follows:

alter table packages add column tsv tsvector;
update packages set tsv = to_tsvector('english',
    coalesce(lower(name), '') || ' ' || coalesce(lower(description), ''));

then added a trigger to update it for new data:

DROP FUNCTION IF EXISTS package_fulltext_trigger() CASCADE;
CREATE FUNCTION package_fulltext_trigger() RETURNS trigger AS $$
begin
  new.tsv :=
    setweight(to_tsvector('english', coalesce(lower(new.name), '')), 'A') ||
    setweight(to_tsvector('english', coalesce(lower(new.description), '')), 'B');
  return new;
end
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS packages_tsv_update ON packages;
CREATE TRIGGER packages_tsv_update BEFORE INSERT OR UPDATE
ON packages FOR EACH ROW EXECUTE PROCEDURE package_fulltext_trigger();

(Note: unlike MySQL, PostgreSQL's DROP TRIGGER needs the ON <table>
clause.)
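The prefix-search difference mentioned above comes down to tsquery's
`:*` modifier; a sketch of the query side, plus the index I alluded to:

```sql
-- Stemmed full-text match ('python' also matches 'pythons', etc.):
SELECT Name FROM Packages WHERE tsv @@ to_tsquery('english', 'python');

-- Prefix matching needs the :* modifier on the lexeme,
-- rather than a LIKE 'pyth%' pattern:
SELECT Name FROM Packages WHERE tsv @@ to_tsquery('english', 'pyth:*');

-- A GIN index on the tsvector column keeps both fast:
CREATE INDEX packages_tsv_idx ON Packages USING gin(tsv);
```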
>> * There is quite a bit of sql code sprinkled all over the place. It
>> might make sense to try and consolidate some of the sql to fewer
>> files.
>>
>> * It might make sense to generalize the 'mysql_real_escape_string'
>> instances into a wrapper like 'db_escape' or something similar. This
>> would make things a bit cleaner and ease possible future porting to
>> other databases (fewer things to track down and change individually).
>>
>> * postgres doesn't support `"UNIX_TIMESTAMP()"`. Instead of changing
>> all instances of it, I defined a few convenience sql functions.
>>
>> --
>> -- helper functions to ease conversion from mysql
>> -- from:
>> http://janusz.slota.name/blog/2009/06/mysql-from_unixtime-and-unix_timestamp-functions-in-postgresql/
>> --
>> -- no params
>> CREATE OR REPLACE FUNCTION unix_timestamp() RETURNS BIGINT AS '
>> SELECT EXTRACT(EPOCH FROM CURRENT_TIMESTAMP(0))::bigint AS result;
>> ' LANGUAGE 'SQL';
>>
>> -- timestamp without time zone (i.e. 1973-11-29 21:33:09)
>> CREATE OR REPLACE FUNCTION unix_timestamp(TIMESTAMP) RETURNS BIGINT AS '
>> SELECT EXTRACT(EPOCH FROM $1)::bigint AS result;
>> ' LANGUAGE 'SQL';
>>
>> -- timestamp with time zone (i.e. 1973-11-29 21:33:09+01)
>> CREATE OR REPLACE FUNCTION unix_timestamp(TIMESTAMP WITH TIME
>> zone) RETURNS BIGINT AS '
>> SELECT EXTRACT(EPOCH FROM $1)::bigint AS result;
>> ' LANGUAGE 'SQL';
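With those compatibility functions in place, the MySQL-style calls
should run unchanged (assuming, as in the mysql schema, that columns
like SubmittedTS store epoch seconds):

```sql
-- current epoch seconds, same spelling as MySQL:
SELECT unix_timestamp();

-- e.g. packages submitted in the last day:
SELECT * FROM Packages WHERE SubmittedTS > unix_timestamp() - 86400;
```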
>>
>>
>> I think that is it.
>>
>