On 07-04-2024 13:10, Jan Alexander Steffens (heftig) wrote:
Hi Arvid,
Thanks for bringing this issue to my attention and your detailed email about it. I'm CCïng our public development mailing list in this response so our other maintainers get informed, too.
I agree that Arch needs a solution for this eventually. Unlike Fedora we do not package Rust libraries so I think we need some help from Cargo for this. Preferably from upstream, but a third-party tool would work as well.
Ideally, I think we would create an SPDX license expression from the entire crate tree and then simplify it, e.g. to turn `(MIT) AND (MPL-2.0 OR MIT) AND (MIT AND BSD-2-Clause) AND (MPL-2.0 OR BSD-3-Clause)` into `MIT AND BSD-2-Clause AND (MPL-2.0 OR BSD-3-Clause)`. Or perhaps even simpler if the tool had knowledge about which licenses are covered by others.
We could call such a tool in the `package()` function to set the `license` for the package.
I'm not sure how feasible this would be. Are crates required to use SPDX expressions?
Greetings, Jan
Hey,
Replying on the general mailing list since the dev list is staff only.
The license field of the pacman package is actually only a secondary concern. Many libraries have a license that requires shipping the copyright information along with binary distributions (such as the MIT and BSD licenses). This is more than just the name or SPDX identifier of the license. Usually it is included at the top of the license file and looks like this:
Copyright (c) 2024, Maarten de Vries
There are tools to help with this:
https://crates.io/crates/cargo-bundle-licenses
https://crates.io/crates/cargo-lichking
Personally I think having an incomplete SPDX identifier in the pacman package is not in itself a license violation as long as the individual license files are shipped with the package. Although it would certainly be nice for tooling if the package information is complete too.
Kind regards,
Maarten de Vries
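To illustrate why the copyright line is more than an SPDX identifier, here is a minimal Python sketch that pulls notice lines like the one above out of a license text. The regular expression and the function name `copyright_lines` are assumptions made for this example; real license files word their notices in many ways, and this is not how the tools linked above work internally.

```python
import re

# Assumed pattern: a line starting with "Copyright" followed by "(c)", "©" or a year.
# This deliberately skips prose such as "The above copyright notice ...".
COPYRIGHT_RE = re.compile(
    r"^[ \t]*Copyright[ \t]+(?:\(c\)|©|\d{4}).*$",
    re.IGNORECASE | re.MULTILINE,
)


def copyright_lines(license_text: str) -> list[str]:
    """Return the lines of a license text that look like copyright notices."""
    return [match.group(0).strip() for match in COPYRIGHT_RE.finditer(license_text)]


# Abbreviated sample text, for illustration only.
sample = """Copyright (c) 2024, Maarten de Vries

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files.
"""

print(copyright_lines(sample))
```

Two crates under the same SPDX identifier will normally yield different notice lines, which is why the license files themselves have to be shipped.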
Hi,
Replying on the general mailing list since the dev list is staff only.
I tried to reply to arch-dev-public earlier; that explains why it didn’t work.
Personally I think having incomplete SPDX identifier in the pacman package is not in itself a license violation as long as the individual license files are shipped with the package. Although it would certainly be nice for tooling if the package information is complete too.
I think having the licenses of all dependencies in the license field is (1) a lot of clutter and (2) not what I would expect.
If I want to check under which license linux is released, the result
$ pacman -Si linux
...
Licenses : GPL-2.0-only
...
is a lot more useful (to me) than
$ pacman -Si linux-lts
...
Licenses : Apache-2.0 OR MIT BSD-2-Clause OR GPL-2.0-or-later BSD-3-Clause BSD-3-Clause OR GPL-2.0-only BSD-3-Clause OR GPL-2.0-or-later BSD-3-Clause-Clear GPL-1.0-or-later GPL-1.0-or-later OR BSD-3-Clause GPL-2.0-only GPL-2.0-only OR Apache-2.0 GPL-2.0-only OR BSD-2-Clause GPL-2.0-only OR BSD-3-Clause GPL-2.0-only OR CDDL-1.0 GPL-2.0-only OR Linux-OpenIB GPL-2.0-only OR MIT GPL-2.0-only OR MPL-1.1 GPL-2.0-only OR X11 GPL-2.0-only WITH Linux-syscall-note GPL-2.0-or-later GPL-2.0-or-later OR BSD-2-Clause GPL-2.0-or-later OR BSD-3-Clause GPL-2.0-or-later OR MIT GPL-2.0-or-later OR X11 GPL-2.0-or-later WITH GCC-exception-2.0 ISC LGPL-2.0-or-later LGPL-2.1-only LGPL-2.1-only OR BSD-2-Clause LGPL-2.1-or-later MIT MPL-1.1 X11 Zlib
...
(though I’m not sure why they differ)
Best regards,
tippfehlr
On Sun, Apr 7, 2024, at 12:42 PM, tippfehlr wrote:
I think having the licenses of all dependencies in the license field is (1) a lot of clutter and (2) not what I would expect.
I agree with this.
The "license" of the package isn't the collection of licenses that make up the software along with all of its libraries, it's the license of the software itself. Including the license of all the libraries in the "license" field would just muddy the waters and make that field effectively useless.
What could be done, IMO, is that all of the relevant licenses and copyright notices be included in the licenses directory for that package, for instance:
/usr/share/licenses/<package name>/LICENSE_<library name>
Based on my knowledge and readings of open source licenses (I am not a lawyer and this is not legal advice), this should satisfy the majority if not all conditions of binary distribution of licenses.
However, I do think that the rust/cargo maintainers need to have some skin in the game here (along with go, nodejs, and similar languages) and have some way of dumping the licenses of dependencies. When you have a packaging system that makes it easy to pull in hundreds of dependencies, there should be an easy way of checking what those licenses are anyway because you could otherwise end up in a bad situation.
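The directory layout suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `install_dependency_licenses` and its arguments are hypothetical, the license texts are assumed to have been collected already, and a real PKGBUILD would more likely do the equivalent in shell inside `package()`.

```python
import tempfile
from pathlib import Path


def install_dependency_licenses(
    pkgdir: Path, pkgname: str, licenses: dict[str, str]
) -> list[Path]:
    """Write each library's license text to
    <pkgdir>/usr/share/licenses/<pkgname>/LICENSE_<library name>."""
    target = pkgdir / "usr" / "share" / "licenses" / pkgname
    target.mkdir(parents=True, exist_ok=True)
    written = []
    # Sorted so the result is the same on every build.
    for library, text in sorted(licenses.items()):
        path = target / f"LICENSE_{library}"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


# Hypothetical package and library names, written into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for path in install_dependency_licenses(
        Path(tmp), "example-app", {"example-dep": "Copyright (c) 2024, Example Author\n"}
    ):
        print(path.relative_to(tmp))
```

The package's own `license` field is left untouched by this; only the files under /usr/share/licenses grow.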
On 04/08/24 at 05:45am, Ryan Petris wrote:
I agree with this.
The "license" of the package isn't the collection of licenses that make up the software along with all of its libraries, it's the license of the software itself. Including the license of all the libraries in the "license" field would just muddy the waters and make that field effectively useless.
I would argue the exact opposite. The package is a separate product from the software it packages. If multiple projects are being bundled into the package, the license of the package is not as simple as just the license of the primary upstream project. If a user is actually concerned about the license of the software they install on their machine, omitting relevant licenses because they don't apply directly to the primary upstream obfuscates that information.
apg
Could someone forward the contents of the original message? Thanks.
--
Cheers,
Aᴀʀᴏɴ
Here is the original email sent to arch-dev-public:
Forwarded message from Jan Alexander Steffens (heftig) on Sun Apr 7, 2024 at 1:10 PM:
---snip---
On Sat, Apr 6, 2024 at 10:42 PM Arvid Norlander <arvid@vorpal.se> wrote:
Hi,
After talking to people on Arch Linux IRC channels (mpan in particular) about this, they recommended I contact you directly about this, since it affects many packages (so filing a bug on any specific package wasn't really appropriate).
Arch only packages the final binary crates for Rust (as opposed to separate packages for every single Rust dependency, which would rightfully drive people crazy). As a result, you only get one single package that contains many libraries.
Let's take an example: ripgrep, which links to ~53 dependencies (maybe fewer, since not all dependencies are used on all platforms, etc). Many of these dependencies use the MIT license or similar, which requires:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Looking in /usr/share/licenses/ripgrep I only see the copyright notice for ripgrep itself. This is a bit of a problem.
Similarly, one of its dependencies is unicode-ident, which uses "Unicode-DFS-2016" as its license. That is missing. encoding_rs is "(Apache-2.0 OR MIT) AND BSD-3-Clause" (which may collapse down to MIT, not sure, I am not a lawyer).
So I can see two problems here:
1. Copyright notices that need to be included for dependencies, aren't.
2. The SPDX expression may be incorrect (missing some things from dependencies that should be included).
Looking at some other packages (not just ripgrep) I see similar issues across all of those packages.
Now, for something like Rust it would be impossible to require the maintainers to handle this by hand. So let's talk solutions.
* There is cargo-about (already packaged in Arch, also has suspect license/copyright info). It doesn't do quite what you want: given a config and a Handlebars template, it will generate an HTML page with all the licenses you use. It will collapse "OR" in the license info based on a priority list of which licenses you prefer.
You could perhaps wrangle the Handlebars template to generate a text file instead of an HTML file, not sure. It can also output a JSON file instead, though (I have used that at my day job for license compliance).
While it is a good solution, it is not a drop-in solution without additional scripting on top (and you need to specify accepted licenses for your project).
* I believe Fedora has some automated tooling based on this comment by a Fedora packager: https://users.rust-lang.org/t/psa-check-if-your-cargo-crates-are-clean-and-t...
That is for the SPDX expression. I have not looked into the details of their tooling.
* Debian has probably thought about this too; they tend to be rather careful (some may even say uptight) about this sort of issue. It might be worth checking out what they did.
I'd like some automated tooling that works for AUR too, I maintain some Rust packages there (some of which I'm also the upstream for). For that reason, I'd be happy to stay "in the loop" on this issue as well as possibly help (time and energy permitting, due to recently recovering from burnout, energy tends to vary on a day to day basis), rather than treating it as a one-off issue report.
Is it a high priority issue? No. Should it be solved eventually? Yes, for legal reasons.
Best regards, Arvid Norlander (Arch Linux user and Rust / C++ software developer)
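The "collapse OR based on a priority list" idea from the cargo-about bullet can be sketched as follows. This is not cargo-about's implementation; it is a hypothetical Python illustration that only understands a flat AND of OR groups, such as the encoding_rs expression quoted above. The names `parse_and_of_ors` and `collapse_or` are made up for this example, and which licenses are acceptable is a policy decision, not something the code can answer.

```python
def parse_and_of_ors(expr: str) -> list[list[str]]:
    """Split 'A AND (B OR C)' into [['A'], ['B', 'C']].

    Assumption: parenthesised groups contain only OR; anything more
    complex needs a real SPDX expression parser.
    """
    clauses = []
    for part in expr.split(" AND "):
        part = part.strip()
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1]
        clauses.append([alt.strip() for alt in part.split(" OR ")])
    return clauses


def collapse_or(expr: str, preferred: list[str]) -> str:
    """Pick, for every OR group, the alternative ranked highest in `preferred`."""
    rank = {license_id: index for index, license_id in enumerate(preferred)}
    chosen = []
    for alternatives in parse_and_of_ors(expr):
        accepted = [alt for alt in alternatives if alt in rank]
        if not accepted:
            raise ValueError(f"no accepted license among {alternatives}")
        best = min(accepted, key=rank.__getitem__)
        if best not in chosen:
            chosen.append(best)
    return " AND ".join(chosen)


print(collapse_or("(Apache-2.0 OR MIT) AND BSD-3-Clause",
                  ["MIT", "Apache-2.0", "BSD-3-Clause"]))
# prints: MIT AND BSD-3-Clause
```

With MIT ranked first, the encoding_rs expression becomes `MIT AND BSD-3-Clause`; the BSD-3-Clause part remains because it is joined with AND, not offered as an alternative.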
Hi Arvid,
Thanks for bringing this issue to my attention and your detailed email about it. I'm CCïng our public development mailing list in this response so our other maintainers get informed, too.
I agree that Arch needs a solution for this eventually. Unlike Fedora we do not package Rust libraries so I think we need some help from Cargo for this. Preferably from upstream, but a third-party tool would work as well.
Ideally, I think we would create an SPDX license expression from the entire crate tree and then simplify it, e.g. to turn `(MIT) AND (MPL-2.0 OR MIT) AND (MIT AND BSD-2-Clause) AND (MPL-2.0 OR BSD-3-Clause)` into `MIT AND BSD-2-Clause AND (MPL-2.0 OR BSD-3-Clause)`. Or perhaps even simpler if the tool had knowledge about which licenses are covered by others.
We could call such a tool in the `package()` function to set the `license` for the package.
I'm not sure how feasible this would be. Are crates required to use SPDX expressions?
Greetings, Jan
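The simplification in Jan's example follows from two Boolean rules: duplicate terms can be dropped, and `A AND (A OR B)` reduces to `A` (absorption). A minimal Python sketch, assuming each crate's expression is either a plain AND list or a plain OR list; the function names are hypothetical, and the "licenses covered by others" knowledge Jan mentions is not modelled here.

```python
def to_clauses(crate_expr: str) -> list[tuple[str, ...]]:
    """Turn one crate's expression into clauses that must all hold.

    'A AND B' yields two single-license clauses; 'A OR B' yields one clause
    with two alternatives. Mixed expressions are rejected in this sketch.
    """
    expr = crate_expr.strip()
    if " AND " in expr and " OR " in expr:
        raise ValueError("mixed AND/OR needs a real SPDX parser")
    if " AND " in expr:
        return [(lic.strip(),) for lic in expr.split(" AND ")]
    return [tuple(lic.strip() for lic in expr.split(" OR "))]


def simplify(crate_exprs: list[str]) -> str:
    """AND all crate expressions together, then drop redundant clauses."""
    clauses: list[tuple[str, ...]] = []
    for expr in crate_exprs:
        for clause in to_clauses(expr):
            # Skip exact duplicates, keeping first-seen order.
            if not any(set(clause) == set(seen) for seen in clauses):
                clauses.append(clause)
    # Absorption: a clause is redundant if a stricter clause (a proper
    # subset of its alternatives) is already required.
    kept = [c for c in clauses if not any(set(o) < set(c) for o in clauses)]
    parts = [c[0] if len(c) == 1 else "(" + " OR ".join(c) + ")" for c in kept]
    return " AND ".join(parts)


print(simplify(["MIT", "MPL-2.0 OR MIT", "MIT AND BSD-2-Clause", "MPL-2.0 OR BSD-3-Clause"]))
# prints: MIT AND BSD-2-Clause AND (MPL-2.0 OR BSD-3-Clause)
```

Here `(MPL-2.0 OR MIT)` disappears because MIT is already required on its own, while `(MPL-2.0 OR BSD-3-Clause)` stays because neither alternative is otherwise required.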
Hello.
Before this hit mails/MLs, I had a talk with Arvid in #archlinux-offtopic, where the issue was first mentioned, finally suggesting to mail heftig directly. Two points from that talk.
First. I believe the “/usr/share/licenses” part is both more important and easier to solve. The importance comes from many licenses requiring keeping the copyright notice (or other form of attribution). A solution may be as simple as concatenating licenses from deps into a single file.
Second. The `license` array on the other hand has no such requirement. In most cases it may be the same as upstream’s declared license. If they use a given dependency, they’re already required to adjust their own license to match. If there is a mismatch, it’s still best if it’s fixed at the upstream, not by the distro.
The exception is a situation where the source uses an API, the API has multiple equivalent implementations, and the Arch package maintainer is choosing one of them. In this case it’s IMO the maintainer’s job to attach the right license to the `license` array, as the upstream couldn’t do that. But I’m not sure this actually happens with the Rust deps discussed here.
Blindly combining license identifiers is also suboptimal and, as tippfehlr noted, leads to a meaningless mess. Collapsing the graph is theoretically possible, but isn’t trivial. With OR clauses one of the options has to be chosen (which? by whom?). It all depends on the specific license’s language (hard to automate; sure, Foo is compatible with Foo, but is Foo compatible with GPL-2.1-only?). So I would leave that part for later.
Cheers
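The "concatenate licenses from deps into a single file" approach from the first point might look roughly like this in Python. It is a sketch under assumptions: the license texts are taken to be already gathered into a dictionary keyed by dependency name, and `bundle_licenses` and the separator format are invented for illustration.

```python
SEPARATOR = "=" * 72


def bundle_licenses(licenses: dict[str, str]) -> str:
    """Join per-dependency license texts into one document.

    Each section is headed by the dependency name so a reader can tell
    which notice belongs to which dependency.
    """
    sections = []
    for name in sorted(licenses):
        text = licenses[name].strip("\n")
        sections.append(f"{SEPARATOR}\n{name}\n{SEPARATOR}\n{text}\n")
    return "\n".join(sections)


# Hypothetical dependency names and abbreviated texts.
print(bundle_licenses({
    "example-dep-b": "Copyright (c) 2023, Second Author\n",
    "example-dep-a": "Copyright (c) 2024, First Author\n",
}))
```

The resulting text could be installed as one extra file under /usr/share/licenses/<package name>/ without changing the `license` array.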
Ey,
I don't know much about the kinks of this, but maybe we could put licenses for each library in their own folders and symlink to these folders in the package's license folder, all while keeping the parent package's license field the same? I feel like there are probably a lot of issues in this solution, but I haven't thought of any.
Cheers,
Aᴀʀᴏɴ
Hello,
I don't know much about the kinks of this, but maybe we could put licenses for each library in their own folders and symlink to these folders in the package's license folder, all while keeping the parent package's license field the same? I feel like there are probably a lot of issues in this solution, but I haven't thought of any.
Sounds like a horrible idea; you would need a package per licence... I dub it "package sprawling".
Question: Why can't Rust libraries be built from source and installed as shared objects, and then dynamically linked against? I know it's not "the Rust way" (which roughly translates to "trying to break every convention"), but it solves every single issue here...
I do wonder how many rustls duplicates there are in an oxidised Arch Linux installation. I am aware the licence is truncated and only what is needed is linked into the executable.
Anyway, if each library is built from source, each licence has its respective package and it integrates right into the build system: no issues, flawless. But I assume "cargo doesn't allow this".
I would also like to say this issue exists for Java packages too. I spoke to Artafinde about it and was told that it's not the responsibility of the developer to ensure the program correctly attributes the library authors. So I dropped the topic and only worried about the program and not the dependencies.
One way of doing it (which I have heard some codebases do) is to append all the dependency licences into a single file, "DEPENDENCYLICENSES" or "3RDPARTYLICENSES". A lot of Android apps do this and then display the file in a "licence" screen. I have seen proprietary products do this as well; I believe Discord has a file on their website with all the attribution. Simply install this next to the LICENCE in /usr/share/licenses and all is solved. Although then you would need to stick this in the licenses array, which will again cause sprawling, but this time on your screen. So a possible implementation is a "dependency_licenses" array, which can then be a minimised list or a second page, <package url>/dependency-licenses as an example. Although this then needs to be coded into the Arch build system... which isn't ideal either.
If Rust is going to get looked at for the licence issue, I kindly ask for the same to be done with Java. It would be nice to stop uberjaring and instead package all the dependencies from source (also good for repro, no? And for keeping dependencies up to date, which would be useful for log4j-like exploits). Java compiles fast too, so it shouldn't be a huge burden. The issue, again discussed with Artafinde, is dependency compatibility: lots of Java codebases use old versions of libraries, meaning multiple versions of, say, slf4j-api would need to be packaged, which is a headache. This would also need tooling to load each classpath of the dependencies... so also not an easy solution.
Apologies, I went off-topic. TL;DR: this issue isn't exclusive to Rust, and there doesn't seem to be any "good" solution apart from the standard one... compiling all dependencies from source, dynamically linking to them, and installing appropriate licences for each dependency.
Questions:
- Is this solution worth the manpower?
- Has Arch ever been sued or hit with legal action over attribution?
- Is it upstream's responsibility to attribute the dependencies?
- Does Arch have the manpower to undergo any solution to this problem?
Take care,
--
Polarian
GPG signature: 0770E5312238C760
Website: https://polarian.dev
JID/XMPP: polarian@icebound.dev
Hey,
On 08/04/2024 19:11, Polarian wrote:
<SNIP>
One way of doing it (which I heard of some codebases doing) is to append all the dependency licences into a single file "DEPENDENCYLICENSES" or "3RDPARTYLICENSES", a lot of android apps do this and then spit out the file in a "licence" screen, I have seen proprietary products do this as well, I believe Discord has a file on their website with all the attribution. Simply install this next to the LICENCE in /usr/share/licenses and all is solved. Although then you would need to stick this in the licenses array, which will again cause sprawling, but this time on your screen. So possible implementation of a "dependency_licenses" array, and then that can be a minimised list or a second page <package url>/dependency-licenses as an example. Although this then needs to be coded into the Arch build system... which isn't ideal either.
I don't see why you would need to stick this in the license array. Putting it in /usr/share/licenses is good enough for compliance with the license requirements.
<SNIP>
Questions:
- Is this solution worth the manpower?
Personally I doubt it. A Rust project can easily explode into 300 dependencies. For somewhat bigger ones I have seen around 500~600 too. I suppose in the end many projects will share a large subset of those, but quite possibly at different versions. And for me more importantly, dynamic linking is not the supported way of building the software. Who knows what kind of edge cases you would run into. The upstream developers will certainly not have done that.
- Has Arch ever been sued or hit with legal action over attribution?
A quick search and my memory seem to suggest not. That is not a good reason to knowingly ignore license compliance though.
- Is it upstreams responsibility to attribute the dependencies?
No, because upstream doesn't ship the dependencies. Arch Linux ships the dependencies, so Arch Linux must do the attribution.
- Does Arch have the manpower to undergo any solution to this problem?
I think so. Using `cargo-about` or `cargo-bundle-licenses` is pretty easy and solves the problem of license compliance.
Kind regards, Maarten de Vries
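For a quick look at what a dependency tree declares, independent of the tools named above, Cargo itself can print per-package metadata as JSON via `cargo metadata --format-version 1`; each entry in its `packages` array carries a `license` string, which can be null (for example when a crate only points at a license file). The Python sketch below tallies those strings. The sample input is a hypothetical, heavily trimmed stand-in for real output, and the package list may include crates that never end up in the final binary, so treat the counts as an upper bound.

```python
import json
from collections import Counter


def license_counts(metadata_json: str) -> Counter:
    """Count how many packages declare each `license` string."""
    metadata = json.loads(metadata_json)
    return Counter(
        pkg.get("license") or "<none declared>"
        for pkg in metadata["packages"]
    )


# Hypothetical sample; real input would come from
# `cargo metadata --format-version 1` run in the project directory.
sample = json.dumps({
    "packages": [
        {"name": "example-app", "version": "0.1.0", "license": "MIT"},
        {"name": "example-dep-a", "version": "1.2.3", "license": "MIT OR Apache-2.0"},
        {"name": "example-dep-b", "version": "0.4.0", "license": "MIT OR Apache-2.0"},
        {"name": "example-dep-c", "version": "2.0.0", "license": None},
    ]
})

for license_expr, count in license_counts(sample).most_common():
    print(f"{count:3d}  {license_expr}")
```

This only reports the declared expressions; it does not collect the license texts or copyright notices that have to be shipped.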
participants (8)
- Aaron Liu
- Andrew Gregory
- Lime In a Jacket (Aaron Liu)
- Maarten de Vries
- mpan
- Polarian
- Ryan Petris
- tippfehlr