On Fri, Oct 01, 2010 at 06:23:06PM +0200, PyroPeter wrote:
On 10/01/2010 05:52 PM, Lukas Fleischer wrote:
This won't match URLs like "https://aur.archlinux.org/packages.php?O=0&K=", and an ampersand at the end of a URL won't be converted correctly :/ I'll try to implement it properly over the next few days. Maybe I'll actually go with splitting comments at link boundaries as you suggested before... :)
Well, that's the problem. Which characters at the end should belong to the URL, and which should not? There are also cases in which punctuation belongs to the URL. If trailing punctuation is parsed as not belonging to the URL, there would be no way to post a working link to URLs that end in punctuation. If punctuation is parsed as part of the URL, one can simply insert a space between the URL and any punctuation that should not belong to it. One should also consider that inserting a URL into the middle of a sentence looks horrible and is normally not done (by me, at least).
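To make the two options concrete, here is a rough sketch in Python (not the AUR's PHP code). The pattern URL_RE, the TRAILING character set and the function find_urls are made up for illustration; "everything up to the next whitespace" is just an assumed definition of a URL.

```python
import re

# Assumption: a URL is "http://" or "https://" followed by everything
# up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")

# Assumption: the characters treated as sentence punctuation.
TRAILING = ".,;:!?)"


def find_urls(text, strip_trailing=False):
    """Return the URLs found in text.

    With strip_trailing=False, punctuation directly after a URL is
    kept as part of it. With strip_trailing=True, trailing punctuation
    is cut off, which breaks URLs that really do end in punctuation.
    """
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        if strip_trailing:
            url = url.rstrip(TRAILING)
        urls.append(url)
    return urls


# Punctuation counted as part of the URL: a space keeps it out.
print(find_urls("See http://example.com/a ."))
# Punctuation stripped: the closing parenthesis of this URL is lost.
print(find_urls("http://example.com/Foo_(bar)", strip_trailing=True))
```

With the first policy the author of a comment stays in control; with the second, a URL such as the last one can no longer be posted as a working link.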
About splitting at boundaries: Contrary to what I said before, using regular expressions seems to be a valid and efficient way to do it. (I thought you would have to escape tag content and attributes in different ways (percent-encoding vs. HTML entities). After reading the HTML 4 specification I realized this is not the case, as content and attributes are both escaped using HTML entities.)
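A minimal sketch of splitting at link boundaries, again in Python rather than the AUR's PHP. The names URL_RE and linkify are hypothetical, and the URL pattern is the same simplistic "up to the next whitespace" assumption. It relies on re.split() returning the captured URLs at the odd indices of the result, and on html.escape() being usable for both the link text and the href attribute.

```python
import html
import re

# Assumption: same simplistic URL pattern as before. The capturing
# group makes re.split() keep the URLs in its result list.
URL_RE = re.compile(r"(https?://\S+)")


def linkify(comment):
    """Escape a comment and turn the URLs in it into links."""
    parts = URL_RE.split(comment)
    out = []
    for i, part in enumerate(parts):
        # The same entity escaping is applied to plain text,
        # link text and the attribute value.
        escaped = html.escape(part, quote=True)
        if i % 2 == 1:
            # Odd indices hold the captured URLs.
            out.append('<a href="{0}">{0}</a>'.format(escaped))
        else:
            out.append(escaped)
    return "".join(out)


print(linkify("see https://aur.archlinux.org/packages.php?O=0&K= <b>"))
```

Because each piece is escaped exactly once after splitting, an ampersand at the end of a URL ends up as a single "&amp;" in both the href and the link text.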
Regards, PyroPeter -- freenode/pyropeter "12:50 - I press Return."
I didn't read the whole thread, but as far as I understand you're looking for a proper way to find URLs in comments. John Gruber's regex seems well suited for this: http://daringfireball.net/2010/07/improved_regex_for_matching_urls Does this help? Jan-Erik (badboy_)