[aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

Lukas Fleischer archlinux at cryptocrack.de
Thu Sep 30 19:05:47 EDT 2010


On Thu, Sep 30, 2010 at 08:56:56PM +0200, PyroPeter wrote:
> Well, but you are encoding existing entities, that are not "&" as
> "&foo;". See the example below.

Yep, and that's how it's supposed to be. Any entity a user types into a
comment should end up encoded; none should pass through unencoded.

> I see, "$var[] = foo" creates the array $var if necessary and appends
> foo.

Correct.
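A minimal illustration of that behaviour (the variable name $links is made
up for the example and does not come from the patch):

```php
<?php
// $links has not been defined before this point.
$links[] = 'http://foo.tld/';   // creates the array and stores at index 0
$links[] = 'http://bar.tld/';   // appends at the next integer index, 1

var_dump($links === array('http://foo.tld/', 'http://bar.tld/'));
```

The var_dump() call prints bool(true).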

> Imo, you should split the message at the link boundaries.
> ( "foo ", "http://foo.bar.tld", " baz")
> Then you should encode the html-entities in all elements, wrap the links
> in <a>'s, and then join all that together.

Yes... That would be cleaner, but also far more complicated to implement,
and it would take a lot of code just to make links clickable.
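For reference, the split/encode/join idea from the quoted text might look
roughly like this in its simplest form. This is only a sketch: the function
name and the pattern are made up, and it only recognises links that start
with "http://" or "https://". Anything beyond that (trailing punctuation,
hostnames without a scheme) would need further code.

```php
<?php
// Hypothetical helper, not part of the patch.
function linkify_by_split($comment) {
    // The capture group makes preg_split() return the links as well, so the
    // result alternates: text, link, text, link, ...
    $parts = preg_split('/(https?:\/\/\S+)/', $comment, -1,
                        PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part);
        if ($i % 2 === 1) {
            // Odd indexes hold the captured links.
            $out .= '<a href="' . $escaped . '">' . $escaped . '</a>';
        } else {
            $out .= $escaped;
        }
    }
    return $out;
}

echo linkify_by_split('foo http://foo.bar.tld baz'), "\n";
```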

> == example 1 ==
> 
> input: "foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz"
> 
> If I am not mistaken, $regex would be
> "/http://foo.tld/iLikeToUseApersands/foo&bar.html/msS"
> (are the "/" correctly escaped? I will assume they are.)
> 
> Then, $regex would be:
> "/http:\/\/foo\.tld\/iLikeToUseApersands\/foo&bar\.html/msS"
> 
> $comment would be set by htmlspecialchars() to:
> "foo http://foo.tld/iLikeToUseApersands/foo&amp;bar.html baz"
> 
> => preg_replace_callback() would not match, as & got replaced.

Why shouldn't it work? preg_replace_callback() still matches when the URL
contains a semicolon, so the escaped URL is parsed and output as a valid
link (tested with the current Git version with the patch applied).
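A small sketch of why escaping first does not prevent the match. It assumes
the pattern is run against the already escaped comment. The pattern and the
callback name below are made up for the example and are not the ones from
the patch:

```php
<?php
// Hypothetical pattern: a scheme followed by any run of non-whitespace
// characters, so "&", "a", "m", "p" and ";" are all allowed in the match.
$pattern = '/https?:\/\/\S+/';

$input   = 'foo http://foo.tld/iLikeToUseApersands/foo&bar.html baz';
$comment = htmlspecialchars($input);   // "&" becomes "&amp;"

// Hypothetical callback: $matches[0] is already escaped, so it can be
// used unchanged for both the href and the link text.
function wrap_link($matches) {
    return '<a href="' . $matches[0] . '">' . $matches[0] . '</a>';
}

$html = preg_replace_callback($pattern, 'wrap_link', $comment);
echo $html, "\n";
```

With this pattern the whole escaped URL, including "&amp;", ends up in both
the href attribute and the link text.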

> You can also link to a homepage using valid URL's. The additional
> "feature" may be nice, but makes the code more complex. It also
> trains users to omit the "http://" and produces more work for devs,
> as they all now have to parse this invalid hostname+path stuff.

Hm, that's a question of taste. We'll let Loui decide :p

> Unrelated: You seem to accept only a-zA-Z in hostnames? Or does
> PHP's \w include 0-9 and language-dependent letters? What about
> underscores?

"\w" in Perl-compatible regular expressions matches every alphanumeric
character (so digits too) plus the underscore ("_").
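A quick way to check this with preg_match(). The test strings are arbitrary
ASCII examples and have nothing to do with the pattern used in the patch:

```php
<?php
// preg_match() returns 1 on a match and 0 otherwise.
$letters_digits_underscore = preg_match('/^\w+$/', 'foo_42');
$with_hyphen               = preg_match('/^\w+$/', 'foo-42');
$with_dot                  = preg_match('/^\w+$/', 'foo.tld');

var_dump($letters_digits_underscore);  // int(1)
var_dump($with_hyphen);                // int(0): "-" is not a word character
var_dump($with_dot);                   // int(0): "." is not a word character
```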

> Why does the <a>'s content only include the Path of the URL?

It doesn't. The content of the "<a></a>" element is exactly what the user
typed (with special chars converted by htmlspecialchars()).

In future, please don't just assume things; test your examples against a
current Git checkout with the patch applied.

