On Thu, Sep 30, 2010 at 08:56:56PM +0200, PyroPeter wrote:
Well, but you are also encoding entities that already exist in the input, turning something like "&foo;" into "&amp;foo;". See the example below.
Yep, and that's how it's supposed to be. There shouldn't be any entities that users put in the comments and that are not encoded.
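To make the behaviour concrete, here is a minimal Python sketch. It assumes Python's `html.escape(..., quote=False)` as a stand-in for PHP's `htmlspecialchars()` in `ENT_NOQUOTES` mode: both escape `&`, `<` and `>`, so an entity the user typed is encoded again rather than passed through. `escape_comment` is a hypothetical helper, not a function from the patch.

```python
import html


def escape_comment(text: str) -> str:
    # Hypothetical helper: escapes &, < and > only (quotes left alone),
    # used here as an analogue of htmlspecialchars() with ENT_NOQUOTES.
    return html.escape(text, quote=False)


print(escape_comment("use &foo; here"))  # → use &amp;foo; here
print(escape_comment("a <b> c"))         # → a &lt;b&gt; c
```

Because every `&` is escaped unconditionally, nothing the user types can end up as a live entity in the page.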
I see, "$var[] = foo" creates the array $var if necessary and appends foo.
Correct.
IMO, you should split the message at the link boundaries ("foo ", "http://foo.bar.tld", " baz"), then encode the HTML entities in every element, wrap the links in <a> tags, and join everything back together.
Yes... That would be cleaner, but it would also be much more complicated to implement and would require a lot of code just to make links clickable.
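For comparison, the split-escape-join approach proposed above can be sketched in Python. This is not the patch's code: `URL_RE` is a deliberately simple, hypothetical pattern (it only recognises `http://` and `https://` links), and `html.escape` again stands in for `htmlspecialchars()`. The plain-text pieces and the URLs are escaped separately, so the link is found in the raw input and the escaping can never break the match.

```python
import html
import re

# Hypothetical URL pattern for illustration only: scheme plus everything up to
# whitespace, '<', '>' or '"'. Not the regex used by the actual patch.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def linkify(comment: str) -> str:
    parts = []
    pos = 0
    for m in URL_RE.finditer(comment):
        # Escape the plain text between the previous link and this one.
        parts.append(html.escape(comment[pos:m.start()], quote=False))
        # Escape the URL once and reuse it for both the href and the link text.
        url = html.escape(m.group(0), quote=True)
        parts.append(f'<a href="{url}">{url}</a>')
        pos = m.end()
    # Escape whatever trails the last link.
    parts.append(html.escape(comment[pos:], quote=False))
    return "".join(parts)


print(linkify("foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz"))
```

Usage: the call above prints `foo <a href="http://foo.tld/iLikeToUseApersands/foo&amp;bar.html">http://foo.tld/iLikeToUseApersands/foo&amp;bar.html</a> baz`. Whether a PHP version stays this small depends on how much URL detection the real regex has to do.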
== example 1 ==
input: "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz"
If I am not mistaken, $regex would be "/http://foo.tld/iLikeToUseApersands/foo&bar.html/msS" (are the "/" correctly escaped? I will assume they are.)
With the special characters escaped, $regex would be: "/http:\/\/foo\.tld\/iLikeToUseApersands\/foo&bar\.html/msS"
$comment would be set by htmlspecialchars() to: "foo http://foo.tld/iLikeToUseApersands/foo&amp;bar.html baz"
=> preg_replace_callback() would not match, as "&" has been replaced by "&amp;".
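Example 1 can be reproduced with a small Python sketch. It assumes `re.escape` as an analogue of PHP's `preg_quote()` and `html.escape(..., quote=False)` as an analogue of `htmlspecialchars()`; `matches` is a hypothetical helper. It shows that the outcome hinges on one detail: whether the pattern is built from the raw URL or from the already-escaped one.

```python
import html
import re


def matches(url: str, comment: str, escape_url_first: bool) -> bool:
    # The comment is always escaped, as htmlspecialchars() would do.
    escaped_comment = html.escape(comment, quote=False)
    # The pattern is built either from the raw URL or from its escaped form.
    needle = html.escape(url, quote=False) if escape_url_first else url
    return re.search(re.escape(needle), escaped_comment) is not None


comment = "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz"
url = "http://foo.tld/iLikeToUseApersands/foo&bar.html"

print(matches(url, comment, escape_url_first=False))  # → False ("&" became "&amp;")
print(matches(url, comment, escape_url_first=True))   # → True
```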
Why shouldn't it work? preg_replace_callback() still matches if the URL contains a semicolon. This is parsed and outputs a valid link (tested with the current Git version with the patch applied).
You can also link to a homepage using valid URLs. The additional "feature" may be nice, but it makes the code more complex. It also trains users to omit the "http://" and creates more work for developers, as they all now have to parse this invalid hostname+path format.
Hm, that's a question of taste. We'll let Loui decide :p
Unrelated: You seem to accept only a-zA-Z in hostnames? Or does PHP's \w include 0-9 and language-dependent letters? What about underscores?
"\w" in Perl-compatible regular expressions includes all alphanumeric characters plus the underscore ("_").
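A quick Python sketch of what `\w` covers. Python's `re` module is used only as an analogue here: with `re.ASCII`, `\w` is exactly `[A-Za-z0-9_]`, while the default Unicode mode also matches non-ASCII letters. PCRE's own behaviour depends on its mode flags and locale tables, so this illustrates the question rather than answering it for PHP.

```python
import re

# ASCII mode: \w is exactly [A-Za-z0-9_], so digits and "_" are included.
ascii_words = re.findall(r"\w+", "foo_bar-2.example", flags=re.ASCII)
print(ascii_words)  # → ['foo_bar', '2', 'example']

# Default Unicode mode for str patterns: non-ASCII letters such as "ß" match too.
unicode_words = re.findall(r"\w+", "straße.de")
print(unicode_words)  # → ['straße', 'de']

# The same input in ASCII mode splits at the non-ASCII letter.
print(re.findall(r"\w+", "straße.de", flags=re.ASCII))  # → ['stra', 'e', 'de']
```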
Why does the <a>'s content only include the Path of the URL?
It doesn't. The content of the "<a></a>" element is exactly what the user typed (with special characters converted by htmlspecialchars()). In future, please don't just assume things; test your examples against a current Git checkout with the patch applied.