On Mon, Apr 25, 2011 at 7:49 PM, Dan McGee <dpmcgee@gmail.com> wrote:
On Mon, Apr 25, 2011 at 9:35 PM, elij <elij.mx@gmail.com> wrote:
While converting the aur codebase from mysql to postgresql (just for fun/to see if I could), I ran into a few mysql specific gotchas, and a few oddities. Here are some of my notes from the experience.
Mind sending the diff along for funzies? Even if we can't apply it as a patch, it would be good to see and maybe chop up into "this will be compatible with all DBs so we should fix it anyway" and "if we switch we will need to look into this".
Sure. I will probably just do a git format-patch, dump the patches into a tarball and upload it somewhere, since it is quite large, and not exactly 'split into single change patches'. ;)
* I ran into a few SQL comparisons against integer columns that were written as string comparisons. postgres didn't like this, but mysql was fine with it, e.g.: `$q .= "WHERE s.UsersId = '" . $userID . "' ";`
Should definitely be fixed anyway...awesome.
Yeah. Some of these fixes in the above patchset are wrapped up in other things, but I can try to split out the "these can be fixed regardless" and "there were fixes to get postgres working".
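For that particular example the fix is basically just dropping the quotes and forcing an integer, something along these lines (the intval() cast is my suggestion here, not necessarily exactly what the patch does):

    // numeric comparison that both mysql and postgres are happy with
    $q .= "WHERE s.UsersId = " . intval($userID) . " ";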
* The code uses the MySQL-only limit/offset format `"LIMIT x,y"`. postgresql doesn't support this format, so I had to track each occurrence down and change it to `"LIMIT y OFFSET x"`.
This is SQL standard, so you could use this form anyway, correct? Of course the MySQL docs claim this: "For compatibility with PostgreSQL, MySQL also supports the LIMIT row_count OFFSET offset syntax."
Yeah. LIMIT w/OFFSET is sql standard. Not sure why mysql had to create their own limit/offset standard, but well.. yeah.
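For pagination-style queries the portable form ends up looking roughly like this ($per_page and $offset are just illustrative names, not variables from the AUR code):

    // mysql-only form would be: "LIMIT $offset,$per_page"
    // portable form accepted by both mysql and postgres:
    $q.= "LIMIT " . intval($per_page) . " OFFSET " . intval($offset) . " ";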
* A few instances of `"LIMIT 0,10"` where a plain `"LIMIT 10"` would have been sufficient. Not sure why an offset was specified in these few instances.
Sure it wasn't auto-generated? In that case it is easier for the DB query analyzer to just throw it away anyway.
It looks like there were just two instances.

web/html/rss.php:

    $q = "SELECT * FROM Packages ";
    $q.= "WHERE DummyPkg != 1 ";
    $q.= "ORDER BY SubmittedTS DESC ";
    $q.= "LIMIT 0 , 20";

web/lib/stats.inc:

    $q = 'SELECT * FROM Packages WHERE DummyPkg != 1 ORDER BY GREATEST(SubmittedTS,ModifiedTS) DESC LIMIT 0 , 10';
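So the fix for the rss.php one would just be the following (stats.inc is analogous):

    // an offset of 0 adds nothing, so just cap the row count
    $q.= "LIMIT 20";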
* A pair of column names were reserved words: 'user' and 'end' in the TU_VoteInfo table. I changed the column names to username and ending, to avoid issues and/or having to quote the names all the time.
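Without the rename, every query touching those columns would have needed double quoting to keep postgres happy, roughly like this (the query itself is only illustrative, not taken from the code):

    // "user" and "end" are reserved words in postgres, so unquoted references
    // either error out or (in the case of "user") silently mean something else
    $q = 'SELECT "user", "end" FROM TU_VoteInfo WHERE ID = ' . intval($voteid);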
* postgresql table and column names are lowercase (when not quoted). This required me to either quote all instances of table and column names (ugh) or change the fetch_assoc php references to lowercase. I chose the latter, even though it was tedious to track down all instances and change them.
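To make the second option concrete, here is roughly what the lookups turn into (assuming the pg_* functions; the column names are only an example):

    // with unquoted identifiers postgres folds names to lowercase, so a column
    // declared as "Name" comes back under the key 'name'
    $result = pg_query($dbh, "SELECT ID, Name FROM Packages LIMIT 1");
    $row = pg_fetch_assoc($result);
    echo $row['name'];    // $row['Name'] would be an undefined index under postgres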
* postgres has no mysql_insert_id counterpart. Instead I appended to the query "RETURNING ID" and then used pg_fetch_result to get the ID that was created.
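Roughly like this (assuming a pg_* connection handle; the table and column names are just an example):

    // postgres replacement for mysql_insert_id(): have the INSERT hand back the ID
    $result = pg_query($dbh, "INSERT INTO PackageComments (PackageID, UsersID, Comments) "
                           . "VALUES (12, 34, 'test comment') RETURNING ID");
    $new_id = pg_fetch_result($result, 0, 0);   // row 0, field 0 is the new ID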
* postgres LIKE and ILIKE queries did full table scans and were quite slow. I think mysql was caching these queries when I was doing my performance tests. The end result was postgres performing poorly in my load test for LIKE/ILIKE operations. So I created a postgres fulltext search column for the packages table, indexed it, defined triggers to keep it updated, and converted to using postgres fulltext searching (`"@@ to_tsquery('term')"`). The result was quite speedy and gave nice results. The syntax is a bit different for 'prefix searching' though, and the results were not the same as the LIKE/ILIKE queries, because fulltext uses stemming instead of partial string comparison.
And case sensitivity would come into play, no?
I built the fulltext index as follows:

    alter table packages add column tsv tsvector;
    update packages set tsv = to_tsvector('english',
        coalesce(lower(name), '') || ' ' || coalesce(lower(description), ''));

then added a trigger to update it for new data:

    DROP FUNCTION IF EXISTS package_fulltext_trigger() CASCADE;
    CREATE FUNCTION package_fulltext_trigger() RETURNS trigger AS $$
    begin
        new.tsv := setweight(to_tsvector('english', coalesce(lower(new.name), '')), 'A') ||
                   setweight(to_tsvector('english', coalesce(lower(new.description), '')), 'B');
        return new;
    end
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS packages_tsv_update ON packages;
    CREATE TRIGGER packages_tsv_update BEFORE INSERT OR UPDATE ON packages
        FOR EACH ROW EXECUTE PROCEDURE package_fulltext_trigger();
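For reference, a search against that column then looks roughly like this (the GIN index and the exact query text here are from memory, so treat them as illustrative rather than what the patch literally contains):

    // one-time index so the @@ match doesn't scan the whole table:
    //   CREATE INDEX packages_tsv_idx ON packages USING gin(tsv);

    // normal word search against the tsv column
    $q = "SELECT Name, Description FROM Packages ";
    $q.= "WHERE tsv @@ to_tsquery('english', 'firefox') ";

    // prefix search needs the ':*' suffix (the rough equivalent of LIKE 'fire%')
    $q2 = "SELECT Name, Description FROM Packages ";
    $q2.= "WHERE tsv @@ to_tsquery('english', 'fire:*') ";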
* There is quite a bit of sql code sprinkled all over the place. It might make sense to try and consolidate some of the sql to fewer files.
* It might make sense to generalize the 'mysql_real_escape_string' instances behind a wrapper, 'db_escape' or something similar. This would make things a bit cleaner and ease possible future porting to other databases (fewer things to track down and change individually).
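Something as small as this would probably do; db_escape and the USE_PGSQL switch are made-up names here, not anything that currently exists in the codebase:

    // hypothetical wrapper; callers use db_escape() instead of calling
    // mysql_real_escape_string() / pg_escape_string() directly
    function db_escape($str, $dbh) {
        if (defined('USE_PGSQL') && USE_PGSQL) {   // made-up config switch
            return pg_escape_string($dbh, $str);
        }
        return mysql_real_escape_string($str, $dbh);
    }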
* postgres doesn't support `"UNIX_TIMESTAMP()"`. Instead of changing all instances of it, I defined a few convenience sql functions.
    --
    -- helper functions to ease conversion from mysql
    -- from: http://janusz.slota.name/blog/2009/06/mysql-from_unixtime-and-unix_timestamp...
    --

    -- no params
    CREATE OR REPLACE FUNCTION unix_timestamp() RETURNS BIGINT AS '
        SELECT EXTRACT(EPOCH FROM CURRENT_TIMESTAMP(0))::bigint AS result;
    ' LANGUAGE 'SQL';

    -- timestamp without time zone (i.e. 1973-11-29 21:33:09)
    CREATE OR REPLACE FUNCTION unix_timestamp(TIMESTAMP) RETURNS BIGINT AS '
        SELECT EXTRACT(EPOCH FROM $1)::bigint AS result;
    ' LANGUAGE 'SQL';

    -- timestamp with time zone (i.e. 1973-11-29 21:33:09+01)
    CREATE OR REPLACE FUNCTION unix_timestamp(TIMESTAMP WITH TIME ZONE) RETURNS BIGINT AS '
        SELECT EXTRACT(EPOCH FROM $1)::bigint AS result;
    ' LANGUAGE 'SQL';
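With those in place, existing queries that call UNIX_TIMESTAMP() should keep working on postgres without PHP-side changes; for example something like this ($pkgid is just an illustrative variable):

    // unquoted function names are folded to lowercase, so UNIX_TIMESTAMP()
    // resolves to the unix_timestamp() helper defined above
    $q = "UPDATE Packages SET ModifiedTS = UNIX_TIMESTAMP() WHERE ID = " . intval($pkgid);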
I think that is it.