On 09/30/2010 06:38 PM, Lukas Fleischer wrote:
On Thu, Sep 30, 2010 at 06:18:24PM +0200, PyroPeter wrote:
+ $url = str_replace('&','&', $url); + $url = str_replace('&', '&', $url);
What about the occurrences of "&(html-entity-code-here);" you produced the line before?
Nothing? Any occurrence of an HTML entity code is correctly encoded as "&". People shouldn't be able to manually insert HTML entities in comments. The first line is actually even superfluous as I realized just now since ampersands should already have been replaced by htmlspecialchars() before at the time this line is executed (didn't check that before, this part of code has been extracted from the DokuWiki plugin).
Well, but you are encoding existing entities, that are not "&" as "&foo;". See the example below.
+ $patterns[] = '(\b(?i)www?(?-i)\.[' . $host . ']+?\.[' . $host . ']+?[' . $any . ']+?(?=[' . $punc . ']*[^' . $any . ']))'; + $patterns[] = '(\b(?i)ftp?(?-i)\.['. $host . ']+?\.[' . $host . ']+?[' . $any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
I am not that experienced with PHP, but this looks like the $patterns array got replaced instead of extended.
Nope, it doesn't. Check [1].
I see, "$var[] = foo" creates the array $var if necessary and appends foo.
+ $comment = htmlspecialchars($comment);
Won't this render the next instruction useless if there are html-characters in a link?
Nope. Links need to be escaped as well. Not sure what happens if a link contains quotes or "<"/">". This shouldn't happen too often tho.
Imo, you should split the message at the link boundaries. ( "foo ", "http://foo.bar.tld", " baz") Then you should encode the html-entities in all elements, wrap the links in <a>'s, and then join all that together. I can not think of a way to connect <a>-tags with proper encoding of the user input using just "normal" string functions like replace, or regexes. I would be happy if you could prove the opposite, and will help by providing you with input that breaks your system. == example 1 == input: "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz" If I am not mistaken, $regex would be "/http://foo.tld/iLikeToUseApersands/foo&bar.html/msS" (are the "/" correctly escaped? I will assume they are.) Then, $regex would be: "/http:\/\/foo\.tld\/iLikeToUseApersands\/foo&bar\.html/msS" $comment would be set by htmlspecialchars() to: "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz" => preg_replace_callback() would not match, as & got replaced.
Generally I would not make hostnames ("www.foo.tld") clickable. If people are not able to provide proper URL's, they have a serious problem. (there is also the technical argument that the hostname is not a good indicator for the kind of service the host provides.)
Why not? What if you explicitly want to link to a project's home page? It'll also just convert hostnames if they start with a "www" or "ftp" subdomain, so comments refering to domains in other ways won't be converted.
You can also link to a homepage using valid URL's. The additional "feature" may be nice, but makes the code more complex. It also trains users to omit the "http://" and produces more work for devs, as they all now have to parse this invalid hostname+path stuff. Unrelated: You seem to accept only a-zA-Z in hostnames? Or does PHP's \w include 0-9 and language-dependent letters? What about underscores? Why does the <a>'s content only include the Path of the URL? Regards, PyroPeter -- freenode/pyropeter "12:50 - Ich drücke Return."