[aur-dev][PATCH] Correctly handle package sources which do not validate as an url
php's parse_url does not handle proper rfc3986 URIs, specifically, it does not handle the case of an empty authority such as file:/// or local:/// and only handles the case of file by applying a special case for file itself. These URIs are deemed "malformed" and return false. When such URIs were used, we would end up always treating the package source as a filename (despite that this is incorrect, since plain files will be correctly handled by parse_url, we will correctly determine that there is no schema, and we will go to the source_file_uri). Instead, handle the case of a "malformed" URI by treating it as another example of a source with a schema, and linking it as-is. See https://lists.archlinux.org/pipermail/aur-general/2019-January/034782.html for details. Signed-off-by: Eli Schwartz <eschwartz@archlinux.org> --- This fixes the case of local:///, but are there other cases where php would claim a malformed url where we would actually want to link to cgit? Should we just be dumb like makepkg and git/update.py, and check if it has the string literal '://'? Given the other two places where a source url might be handled don't even make a pretense of being proper rfc3986 parsers, this would at least mean we're highly consistent in our behavior. web/lib/pkgfuncs.inc.php | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/lib/pkgfuncs.inc.php b/web/lib/pkgfuncs.inc.php index ced1f8e..126b5c3 100644 --- a/web/lib/pkgfuncs.inc.php +++ b/web/lib/pkgfuncs.inc.php @@ -481,7 +481,7 @@ function pkg_source_link($url, $arch, $package) { $url = explode('::', $url); $parsed_url = parse_url($url[0]); - if (isset($parsed_url['scheme']) || isset($url[1])) { + if ($parsed_url === false || isset($parsed_url['scheme']) || isset($url[1])) { $link = '<a href="' . htmlspecialchars((isset($url[1]) ? $url[1] : $url[0]), ENT_QUOTES) . '">' . htmlspecialchars($url[0]) . '</a>'; } else { $file_url = sprintf(config_get('options', 'source_file_uri'), htmlspecialchars($url[0]), $package); -- 2.20.1
On Sun, 06 Jan 2019 at 18:56:04, Eli Schwartz wrote:
php's parse_url does not handle proper rfc3986 URIs, specifically, it does not handle the case of an empty authority such as file:/// or local:/// and only handles the case of file by applying a special case for file itself. These URIs are deemed "malformed" and return false.
When such URIs were used, we would end up always treating the package source as a filename (despite that this is incorrect, since plain files will be correctly handled by parse_url, we will correctly determine that there is no schema, and we will go to the source_file_uri).
Instead, handle the case of a "malformed" URI by treating it as another example of a source with a schema, and linking it as-is.
Sorry for replying only now, this somehow slipped through the cracks. But I realized it's not yet in master, so it's probably not too late! What happens if somebody uses "javascript:alert('XSS!')" in their sources? I hope it is not converted to a link? I think we shouldn't create links for anything other than HTTP and HTTPs schemes (and maybe FTP as well). These links are just for convenience and probably not used very often. So it's likely a good idea to err on the safe side. Lukas
On 4/27/19 8:39 AM, Lukas Fleischer wrote:
On Sun, 06 Jan 2019 at 18:56:04, Eli Schwartz wrote:
php's parse_url does not handle proper rfc3986 URIs, specifically, it does not handle the case of an empty authority such as file:/// or local:/// and only handles the case of file by applying a special case for file itself. These URIs are deemed "malformed" and return false.
When such URIs were used, we would end up always treating the package source as a filename (despite that this is incorrect, since plain files will be correctly handled by parse_url, we will correctly determine that there is no schema, and we will go to the source_file_uri).
Instead, handle the case of a "malformed" URI by treating it as another example of a source with a schema, and linking it as-is.
Sorry for replying only now, this somehow slipped through the cracks. But I realized it's not yet in master, so it's probably not too late!
What happens if somebody uses "javascript:alert('XSS!')" in their sources? I hope it is not converted to a link?
I think we shouldn't create links for anything other than HTTP and HTTPs schemes (and maybe FTP as well). These links are just for convenience and probably not used very often. So it's likely a good idea to err on the safe side.
Ah... yeah, that is a pretty good point. I'd probably want to display that as straight up plaintext. It definitely should not be appended to the cgit url if it is a valid schema, though. And regarding not making a link at all (e.g. for the common git:// protocol), how would that play with renamed sources like "$pkgname::git://example.com/project-something"? I wish php had a schema validator that wasn't broken... python's urlparse cleverly handles all this nonsense, and you can just refuse to print urls with ParseResult(scheme='javascript',...). Maybe we should do string comparisons to reject javascript schemes? Is there anything else which matters in this context? -- Eli Schwartz Bug Wrangler and Trusted User
On Sat, 27 Apr 2019 at 23:49:11, Eli Schwartz wrote:
Ah... yeah, that is a pretty good point. I'd probably want to display that as straight up plaintext.
It definitely should not be appended to the cgit url if it is a valid schema, though. And regarding not making a link at all (e.g. for the common git:// protocol), how would that play with renamed sources like "$pkgname::git://example.com/project-something"?
What's the problem with that? We always remove the "$pkgname::" prefix. And then, this source would be shown as "git://example.com/project-something" without linking to anything.
I wish php had a schema validator that wasn't broken... python's urlparse cleverly handles all this nonsense, and you can just refuse to print urls with ParseResult(scheme='javascript',...). Maybe we should do string comparisons to reject javascript schemes? Is there anything else which matters in this context?
We could do that. But why blacklist instead of whitelist for such a minor feature? I suggest to convert local links to cgit URLs, convert HTTP/HTTPs/FTP to absolute links, and display everything else in plain text. Lukas
On 4/28/19 8:26 AM, Lukas Fleischer wrote:
On Sat, 27 Apr 2019 at 23:49:11, Eli Schwartz wrote:
Ah... yeah, that is a pretty good point. I'd probably want to display that as straight up plaintext.
It definitely should not be appended to the cgit url if it is a valid schema, though. And regarding not making a link at all (e.g. for the common git:// protocol), how would that play with renamed sources like "$pkgname::git://example.com/project-something"?
What's the problem with that?
We always remove the "$pkgname::" prefix. And then, this source would be shown as "git://example.com/project-something" without linking to anything.
We could do as you say. But currently, we are setting the link text to "$pkgname", which I guess would not have an analogue for text-only sources without an href. It would be a touch inconsistent.
I wish php had a schema validator that wasn't broken... python's urlparse cleverly handles all this nonsense, and you can just refuse to print urls with ParseResult(scheme='javascript',...). Maybe we should do string comparisons to reject javascript schemes? Is there anything else which matters in this context?
We could do that. But why blacklist instead of whitelist for such a minor feature? I suggest to convert local links to cgit URLs, convert HTTP/HTTPs/FTP to absolute links, and display everything else in plain text.
I don't know. :D It seems like it would be reasonable to go either way: fix some current urls, or drop support for all the irregular ones. Do we have any statistics on whether people actually use different urls for anything? I mean, I suppose in theory one could have a custom scheme handler for git:// which would open in some GUI shell for git repositories. But that be sort of reaching for an excuse. -- Eli Schwartz Bug Wrangler and Trusted User
participants (2)
-
Eli Schwartz
-
Lukas Fleischer