Commons:Village pump/Proposals/Archive/2019/03

From Wikimedia Commons, the free media repository

RfD for registered users only?

I wish to propose a small change to the rules for deletion. Looking at RfDs like this one, which looks to me more like a revenge RfD by a German registered user acting as a logged-out user (it's just an example; there are plenty of RfDs raised by IPs), could we amend the policy to state that only registered, autoconfirmed users can open RfDs? After all, it takes a registered user to upload, so I don't see why an RfD shouldn't also be opened by a user who is accountable to the community rather than anonymous. -- SERGIO (aka the Blackcat) 10:48, 27 March 2019 (UTC)

Wikimedia Commons text contributions to remain CC-BY-SA by policy

Proposal

Wikimedia Commons text contributions are given as CC-BY-SA. This includes all non-trivial contributions.

Trivial contributions may be presumed to be donated as CC0, such as button-click choices of related Wikidata items or the data for drawing boxes for annotations. Trivial text of either a purely technical nature or very short "facts" in metadata format may be scraped as CC0 data where the text is legally ineligible for copyright and contains no subjective, creative or personal content. Any text as complex as a sentence, descriptive title, or subjective list must be presumed to be non-trivial unless independently contributed from a CC0 or public domain source.

Background

The standard Commons page footer on copyright for contributions was recently changed, fundamentally altering the respect for attribution of user text contributions that was established when Wikimedia Commons was launched. A consequence of this change is that all "structured" user-entered text may be mass-scraped and commercially reused without attribution, and is most likely to be quickly monetized by large commercial internet reusers such as Google and Amazon. There has been no proposal or associated community consensus to support the change; this proposal closes that gap.

Thanks -- (talk) 17:45, 28 March 2019 (UTC)

Votes (CC-BY-SA text contributions by policy)

Titles and captions can and do have copyrightable content, there is currently nothing to stop this happening. -- (talk) 19:55, 1 April 2019 (UTC)
+1. And if all of it were really PD, we wouldn't need the CC0 license. --Habitator terrae 🌍 20:58, 1 April 2019 (UTC)
I definitely agree that captions and titles can contain copyrighted text, but that's the exception, not the rule (and why we have the CC0 license just in case). Giving our structured data a more rigid license would just cause confusion and headaches for no good reason, IMO. The whole idea behind structured data is to provide metadata that can easily be reused alongside our media in a wide variety of contexts. Forcing re-users to credit not only the media author but every author of every piece of displayed metadata would be counterproductive, especially in mobile contexts where space is limited. (Note that this is just my opinion as a volunteer and carries no official weight.) Kaldari (talk) 21:38, 1 April 2019 (UTC)
This is off topic, but could you point to an analysis, somewhere else, showing that CC0 text under the banner of 'structured data' is not being used to abuse contributors' moral rights in the same proportion as copyvios exist for any other type of contribution to this project? I did not believe that such an analysis existed. Thanks -- (talk) 09:10, 2 April 2019 (UTC)

Discussion of CC-BY-SA text contributions by policy

Please add discussion here rather than risking taking the votes on tangents. Thanks -- (talk) 17:45, 28 March 2019 (UTC)

  • Every contribution here could conceivably already be monetized by Big Tech, especially any files that are CC-0. As I note above, if your actual concern is that Big Tech could monetize our content aggressively and in a predatory fashion, then Commons should discuss hosting noncommercial-only files and adopt a viewpoint similar to the late Aaron Swartz. Since we are named Commons and we care about "the sum of human knowledge", I am open to what structured data can bring. (Although the implementation has been a bit poor, with the confusing disclosures, petty bickering over trivial matters like this, and depicts testing somehow creating a nasty bug that had to be fixed yesterday.) I understand the desire to keep Commons stable, but Commons doesn't exist in a hermetically sealed environment; it should work well technically and socially with sister projects. (Obviously certain deletionist traits like selfiephobia and overenforcement of URAA can be a hindrance to the latter.) Abzeronow (talk) 19:36, 28 March 2019 (UTC)
The big difference is harvesting volunteer contributions without attribution. This project has always allowed commercial reuse, with attribution. So if Alexa starts using your descriptions of paintings from harvesting Wikimedia Commons, you will at least get your moral right of attribution. -- (talk) 19:39, 28 March 2019 (UTC)
In practice, attribution of volunteer contributions is only practical as far as attributing the uploader of files since attributing descriptions either requires deep diving into the listed source if applicable or deep diving into the history tab since we don't sign when we make description changes. Also, personally, I don't really care if someone attributes my contributions or not, I just want pages to be useful. Abzeronow (talk) 19:50, 28 March 2019 (UTC)
Great that you don't care about your contributions. However, everyone else released theirs on the understanding that they will always be CC-BY-SA. -- (talk) 19:56, 28 March 2019 (UTC)

In my opinion, any license without copyleft limits the possibilities for working with data, which is what we want to do. Copyleft could potentially keep the data free, so that we get more and more free content everywhere. Habitator terrae 🌍 19:22, 29 March 2019 (UTC)

  •  Moral question: what is wrong with others making a profit from our works? Take the example of an image I made being used by Breitbart News (Internet Archive), a right-wing outlet, while I myself am far-left politically; my image is now used on a page with advertisements. I take no issue with this, as the image was donated; in fact I feel a certain pride in it. Of course, this image is fairly attributed, and it's in public view. Now, what would I gain if a database used some short descriptions, attributed or unattributed? Consider that most of these databases aren't public and that the CC-0 (zero) license might be beneficial for them.
But here is a better question: whenever you save a line of text, Wikimedia tells you that by clicking "save" you agree that simply a link to the original page counts as "attribution", and what website doesn't link? Google always links to its sources, so companies will probably just treat the text the same way they would anyhow. I fail to see the benefits of using a more restrictive license, though I am open to changing my mind on the issue. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:26, 31 March 2019 (UTC)
Sorry, but you misunderstood the question: the proposal is to change the license from CC-0 to CC-BY-SA, not to NC! That only means that if this work is used, the author must be attributed and it must remain under this free license. It could still be used commercially. Habitator terrae 🌍 08:51, 31 March 2019 (UTC)
I didn't misunderstand the question; I just worded it badly, because I changed my mind about how to phrase it halfway through writing and didn't bother removing the old part. To put it simply: what benefits would we gain from being attributed? Because that's the only actual difference. Also, a company would have less incentive to build a database from our works if it had to redistribute it under the same or a similar license. As much as I dislike seeing those databases copyrighted, a financial incentive to create them is a stronger motivation to start them in the first place. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 09:42, 31 March 2019 (UTC)

Fae wrote "So if Alexa starts using your descriptions of paintings from harvesting Wikimedia Commons, you will at least get your moral right of attribution.". No, you won't. I can ask Alexa about topics I've written on Wikipedia and Alexa will read out the first few sentences of the article. She does not say "This is what Colin wrote on Wikipedia... " even though all those words are mine. She will attribute "Here's what I found on Wikipedia". While the screen UI of smart AI devices (Google, Amazon, MS, Apple) may display a Wikipedia link that suffices for "attribution", there's no way the spoken UI is going to read out the contributors, the licence name and terms, or give a URL. Wikipedia is a collaborative editing project, so the attribution is complex. Mostly we've thought only about the media on Commons, which usually has only one creator/owner deserving attribution. If titles and other fields are going to be more heavily and collaboratively edited in future, then I reckon attribution will also get diminished to a hyperlink somewhere, just as it does for Wikipedia. I don't think the issue of spoken attribution has been addressed legally, and I don't think most users of such devices would appreciate specific personal attribution. Attribution of metadata like title, categories, tags, subject and location is probably in a similarly murky area.

I think being concerned about Google, Amazon, MS and Apple monetising Wikipedia and Commons content is a bit weird. I can enjoy Wikipedia or photos on Commons using my desktop PC. Is my Dell monitor monetising Commons images because it displays them for free? Perhaps my Firefox browser is? Or my broadband provider? I'd like Google to do a better job of attributing images in its search results, which is trivially easy for Commons images. I don't really see how it could attribute (more specifically than "Wikipedia" or "Commons") in brief spoken communication. I think using copyright licence terms to solve this is probably too clumsy a tool.

Take File:Bronze Age Dagger (FindID 424034).jpg as an example. The description appears to be scraped from this source; however, there is no formal record attributing the text separately from the file. On the Finds.org website, the image is clearly CC BY-SA 4.0. The text is less clear but possibly CC BY 3.0, according to a link at the bottom right of the page. The Commons file mentions CC BY-SA 2.0, which is a different version. The Commons file history does not record the text attribution. If someone writes text on Wikipedia drawn from a freely licensed source, they are required to indicate the source attribution in the edit summary; we don't follow that practice on Commons. If I were to edit the description of that dagger, I would release my edits under CC BY-SA 3.0 and GFDL. And I would get attribution in the file history, but nothing more explicit than that. It is all a mess. -- Colin (talk) 10:15, 1 April 2019 (UTC)

 Comment That the structured data must be CC-0 is understandable. By now I have repeatedly seen an image description of mine, part of a file description page published under CC-4, copied unchanged into the caption by another author. I'm happy to concede that the texts affected (so far) do not reach the threshold of originality, and that this is therefore probably legally possible. But the foundation of the entire Wikipedia project is the legally sound handling of licensing questions. This protects users and re-users, authors, the Foundation, and the secure continued existence of the project. Unfortunately, most people in the general public are completely unversed in questions of licensing law. I think anyone who has ever tried to explain how it works to someone outside the project, or even to solicit contributions (text or media, or rights to publish self-made media of protected subjects), will have experienced how difficult such explanations are, and how complicated and off-putting Wikipedia seems as a result. That a snippet of text can simply be transferred from CC-4 into a CC-0 text field may be legally sound, but it is not intuitively obvious (especially to an outsider). Such behaviour is confusing, makes the Wikipedia project even more alien to the general public, and so harms the project, because it becomes even harder to motivate new authors and to anchor Wikipedia in society. I would therefore find it really helpful if copying text from Information-template descriptions into the caption field were technically prevented. Moreover, in the spirit of structured data, it also seems more sensible to me to use a controlled vocabulary in the captions. --C.Suthorn (talk) 14:18, 2 April 2019 (UTC)

Agreed, but it seems that WMF development has nothing like this planned. Structured data remains unstructured. -- (talk) 14:23, 2 April 2019 (UTC)

 Comment Changing a license from one to another without the permission of the author is not very simple. We know en:Wikipedia:Licensing update and the background work behind it (https://creativecommons.org/2007/12/01/progress-on-license-interoperability-with-wikipedia/, https://creativecommons.org/2008/11/03/wikipediacc-news-fsf-releases-fdl-13/). So I wonder how WMF can simply change the license from "Text is available under the Creative Commons Attribution-ShareAlike License" to "Files are available under licenses specified on their description page. All structured data from the file and property namespaces is available under the Creative Commons CC0 License; all unstructured text is available under the Creative Commons Attribution-ShareAlike License" without asking permission from the original authors (or Creative Commons). As some others stated above, copyright-ineligible texts can be used without attribution, but nothing more. Jee 14:48, 4 April 2019 (UTC)

@Donald Trung: Structured Data and "playing nice with Wikidata" are supposed to be two separate things. Structured Data is sold on the basis that it is not Wikidata: the reason for doing it is not to hoover up Commons content into Wikidata, or to become an extension of Wikidata, but to facilitate a better Wikimedia Commons suited to this project's scope. Wikimedia Commons was established with attribution at its foundation, in precisely the same way as Wikipedia was. If the community now wants a copyright-flexitarian approach where some (yet to be properly defined) parts of Wikipedia and some parts of Commons are CC0 while others are CC-BY-SA, then it needs a proper proposal with a real understanding of the implications and practical application, not just the voices of evangelical Wikidataites (regardless of whether they are paid or unpaid). -- (talk) 09:51, 6 April 2019 (UTC)

@: , I proposed separating Structured Data on Wikimedia Commons from Wikidata by allowing local "items" to be created for depicts, but that failed. It is also clear that Wikidata doesn't want to play nice with Wikimedia Commons: when I proposed lowering their notability standards to better fit Wikimedia Commons, they basically described this website as "a spamfarm" for allowing almost anything, due to the vagueness of Wikimedia Commons' scope and its non-existent notability standards. But with respect to creating a Structured Data on Wikimedia Commons programme, we basically have 2 (two) options: either we build everything from the ground up (including creating a separate item for every depicted statement), or we utilise the millions of Wikidata pages that every other Wikimedia website uses for content organisation. If Structured Data on Wikimedia Commons worked 100% (one-hundred percent) independently of Wikidata, it would mean creating millions of pages, taking a lot of volunteer time. I don't like this dependence on Wikidata either, but they have the infrastructure we need; otherwise we are basically forced to recreate the project in its entirety here. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:49, 6 April 2019 (UTC)
The key issue is that strategically, this is happening the wrong way around. Because Wikidata has all the money and WMF PR attention, projects and development changes have been pitched with Wikidata glasses on. If Commons inheriting data from Wikidata is handy, great, it's CC0 by default, nobody cares. But Commons always has been CC-BY-SA, the dancing about trying to get volunteers to enter duplicate "new" text as CC0 is bizarre. I have personally created a million Commons image pages with titles, notes, references, and other fields which the WMF and Wikidataists are trying to now claim as CC0, with no actual lawyer making any truthfully clear or legally binding statement about it, because that would be convenient for Wikidata and Wikidataists pet projects like 'the sum of all paintings'. The playing around with licencing parts of Commons as CC0 has absolutely no rationale based on Wikimedia Commons' project scope and has never had a proper proposal, despite this being a fundamental and system wide change. -- (talk) 13:02, 6 April 2019 (UTC)
I completely understand where you're coming from, but the (legal) fact is that titles can't be copyrighted (PlagiarismToday), of course I am not trying to excuse the sloppy launch of file captions and their license or how for whatever reason the Structured Data on Wikimedia Commons team doesn't seem to have a direct line of communications with the Wikimedia Foundation's (WMF's) legal department, but simple titles aren't copyrightable. The other things you mentioned are in fact protected by (our) copyright © and we could enforce it (I am not making a legal threat towards our colleagues at the SDC, I am merely talking about a hypothetical scenario). --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 13:36, 6 April 2019 (UTC)
@Donald Trung: If this would be the case, not CC-0 (a license), but the PD-Mark would be the right way to show this;) --Habitator terrae 🌍 13:40, 6 April 2019 (UTC)
Your reference is misleading and you are misinterpreting what "titles" are on Commons. For the vast majority of our uploads of photographs, the uploaders put a title in the filename and sometimes include a title in the image page information box. This is specifically not the same thing in law as the "title" of a work, unless the original photographer has published a title for the photograph/image. Consequently a file with the filename File:Burking Poor Old Mrs Constitution. Wellcome L0019663.jpg which has a "title" of "Burke and Hare suffocating Mrs Docherty for sale to Dr. Knox" is not public domain text, as the "title" is actually a subjectively created description, the artwork's original title being entirely different, and the "filename" itself is a subjective transcription from the artwork, which is copyrightable, as is any modern transcription of old works. The truth is that even in this example of an old PD work, WMF legal cannot state that transcriptions, translations, descriptions or any other text that may have subjective modern elements are public domain or CC0, unless the source or author has specifically released these text works as CC0.
In addition as Habitator terrae points out, if a text work is public domain and the "title" is extracted from the PD text, then it cannot be licensed as CC0 as the person typing in that text has no possible rights over the text and therefore has no legal authority to release it, it must be correctly published as public domain. -- (talk) 13:53, 6 April 2019 (UTC)

The attempt to define what is and is not copyrightable text by dividing into "trivial" vs "complex" is naive. This assumes copyright can be determined by examining the brevity of a single fact or sentence in isolation. The Wikidata:Licensing page cautions that while individual facts may not merit copyright, larger datasets of facts may be copyrightable. This will affect those who upload images + textual information scraped on large scale from other websites. Many websites are a confusing mess wrt copyright licensing and often only clear about the licence governing the images, but not the associated texts and facts or the dataset that the website is built from. Anyone performing large-scale scraping of Commons pages into Wikidata may also run into this problem. -- Colin (talk)

The statement "Trivial contributions may be presumed to be donated as CC0" is legal nonsense. Being potentially ineligible for copyright protection is not the same thing as the author making a CC0 dedication about works they own rights in. This legal declaration is not something that occurs spontaneously or can be inferred from someone's contribution based on some rule like "sentence: CC BY-SA 3.0" vs "single fact: CC0". CC0 is more than just a copyright declaration: it is a waiver of "all related and neighboring rights, to the extent allowed by law" as well as a protection that the author "makes no warranties about the work, and disclaims liability for all uses of the work, to the fullest extent permitted by applicable law." The Wikimedia Terms of Use make it clear how our text contributions are licensed and what conditions are needed for uploading text written by others. You and I still agree to licence our text under "CC BY-SA 3.0 & GFDL". What Wikimedia declares at the bottom of the page wrt "All structured data from the file and property namespaces" is kinda up to their own lawyers. Anyone using media, text and data from Commons is also pointed to the General disclaimer, which makes no promises at all about the legality of what you find here. -- Colin (talk) 15:58, 6 April 2019 (UTC)

What did WMF mean by "All structured data from the file and property namespaces"? Did they mean the new fancy sub-head "Structured data"? Structured data has a broader meaning: structured data refers to any data that resides in a fixed field within a record, file or database. So most of the information under "Summary" is structured data. Jee 03:20, 7 April 2019 (UTC)