Commons:Village pump/Technical/Archive/2024/04

From Wikimedia Commons, the free media repository

Can someone figure out why {{CategoryTOC}} doesn't work on this page: Category:EC-Audiovisual_Center_review_needed, and fix it? Thank you! // sikander { talk } 🦖 17:57, 1 April 2024 (UTC)

@Sikander: As far as I can tell, the horizontal Category TOC on that page is working fine. I tried several different browsers as well as both logged in and logged out. I suspect the issue is at your end, perhaps try emptying your browser cache. —RP88 (talk) 18:03, 1 April 2024 (UTC)
It looks for subcategories, but you want files: try Template:FileCategoryTOC. Enhancing999 (talk) 18:04, 1 April 2024 (UTC)
Actually, no. It's just that they are all sorted under "0-9". Enhancing999 (talk) 18:08, 1 April 2024 (UTC)
Sort through Template:LicenseReview/layout-reviewme is by pageid. Enhancing999 (talk) 18:15, 1 April 2024 (UTC)
Ah, seeing Enhancing999's replies led me to think what you might have meant by "doesn't work". In my reply I was saying that I saw {{CategoryTOC}} correctly displaying and working exactly as intended (but apparently not how you probably expected). As Enhancing999 implied, the template {{EC-Audiovisual Center}} invokes {{LicenseReview}}, which (while in the needs review state) uses Template:LicenseReview/layout-reviewme, which sets [[Category:EC-Audiovisual Center review needed|< PAGEID >]]. So the files in this particular category are sorted by their page id, like all other license review needed categories. —RP88 (talk) 18:43, 1 April 2024 (UTC)
@RP88 and Enhancing999: Ah, sorry for creating some confusion here. My intention is to navigate the contents of Category:EC-Audiovisual_Center_review_needed category by file name and thought {{CategoryTOC}} would work. {{FileCategoryTOC}} is also not doing what I thought it would. Any other suggestions? Thank you. // sikander { talk } 🦖 19:30, 1 April 2024 (UTC)
Hmm... depending on what exactly you want to do, you might be able to use Special:Search with incategory:"EC-Audiovisual Center review needed" and then adjust the advanced search fields. This won't get you an alphabetically ordered list, but it would let you search the contents of the category. —RP88 (talk) 20:13, 1 April 2024 (UTC)
Template:LRCategoryTOC has some logic for this type of category. Enhancing999 (talk) 21:13, 1 April 2024 (UTC)
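The sort-key behaviour described in this thread can be illustrated with a small sketch. It is a simplified model, not MediaWiki's actual collation code: the function name `toc_bucket` and the sample titles and page ids are made up for illustration. It only shows why numeric sort keys (page ids) all land under "0-9" while file names would spread across letters.

```python
def toc_bucket(sort_key: str) -> str:
    """Return the TOC heading a category member would be listed under.

    Simplified assumption: the heading is derived from the first
    character of the sort key (digits -> "0-9", ASCII letters -> letter).
    """
    first = sort_key.strip()[:1].upper()
    if first.isdigit():
        return "0-9"
    if first.isascii() and first.isalpha():
        return first
    return "other"


# Hypothetical files with hypothetical page ids.
files = {"File:Apple.jpg": 12345, "File:Zebra.jpg": 987}

# Sorted by page id, as layout-reviewme does: every key starts with a digit.
by_pageid = {title: toc_bucket(str(pageid)) for title, pageid in files.items()}

# Sorted by file name (namespace prefix stripped): keys start with letters.
by_name = {title: toc_bucket(title.split(":", 1)[1]) for title in files}

print(by_pageid)
print(by_name)
```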

Tech News: 2024-14

MediaWiki message delivery 03:33, 2 April 2024 (UTC)

why does the time scale end in the 49th century BC?

Compare, for example, Romania in the 35th century BC with Romania in the 54th century BC. I couldn't find out what is missing. anro (talk) 21:18, 2 April 2024 (UTC)

On an immediate level, that's because the {{Subject by century}} template (which is used to implement templates like {{Centuries BC in Romania}}) only runs from the 50th century BC to the 25th century CE. It will not display categories which lie outside those bounds.
Realistically, the use of year-based categories for time periods well outside recorded history like "49th century BC" may not be appropriate, as the margin of error on dates that far back is often wider than a century. (Nor am I convinced that it's correct to describe subjects like the Turdaș culture as being "in Romania" over five thousand years before any such country existed.) Omphalographer (talk) 00:02, 4 April 2024 (UTC)

Viewer to highlight topic on annotated files in category (sample: mountain panoramas)

Category:Plattenhörner (Vereina) has several photos with the two summits "Plattenhörner" annotated. I think it could be interesting to have a viewer that shows those annotations on all files at once, ideally with larger versions of the photos. @Kuhni74 fyi. Enhancing999 (talk) 09:15, 7 April 2024 (UTC)

Tech News: 2024-15

MediaWiki message delivery 23:34, 8 April 2024 (UTC)

How do I flag my bot’s edits as coming from a bot?

I'm the operator of the bot User:FlickypediaBackfillrBot. I've been through the bot approval process and it was approved by User:Krd in March. The user appears in the list of bots on Special:ListUsers.

I'm using the wbeditentity API and setting the bot=1 query parameter like it suggests, but the edits aren't being flagged as bot edits. This means that when I run the bot, the edits can overwhelm people’s watch lists (see feedback here: User talk:FlickypediaBackfillrBot#BOT flag).

I notice the wbeditentity API says “This URL flag will only be respected if the user belongs to the group "Bots"” – but this user is part of the "Bots" user group, isn't it?

I've extracted a minimal version of the code here, and you can see me setting the bot=1 parameter on L28: https://gist.github.com/alexwlchan/0acdb92d2e94a1c47d54fcef5b3e1fe9 I got this by looking at the equivalent code in mwclient.

Is this a bug in my code, or is there some other step in the bot approval procedure that I’ve missed? Alexwlchan (talk) 11:54, 10 April 2024 (UTC)

@Alexwlchan, I tried to replicate the problem using pywikibot, and your POST parameters look good. One thing you could try is to check whether the bot flag works when you make edits through pywikibot, to see if there is some problem with the user rights (though as far as I know the bot flag should be working).
My example code
Running the code:
python3 -m venv ./venv
source venv/bin/activate
pip install pywikibot
echo "usernames['commons']['commons'] = 'YourBotUserName'" > user-config.py
python editcaption.py
-- Zache (talk) 12:48, 11 April 2024 (UTC)
Also, you can use Quarry to check if the edits are bot flagged (example: https://quarry.wmcloud.org/query/81935 ) -- Zache (talk) 12:49, 11 April 2024 (UTC)
Thanks User:Zache!
If I run your code, I do see my edits showing up with the bot flag, so it must be a bug on my side. I'll have a look through the pywikibot code to understand what I’m doing differently and find my mistake. Alexwlchan (talk) 13:30, 11 April 2024 (UTC)
Right, the issue seems to be how I'm authenticating – I'm using a Personal API token whereas the Pywikibot code is using some login cookies to make its request. I think I need to take a closer look at my auth code. :) Alexwlchan (talk) 15:20, 11 April 2024 (UTC)
For the sake of future readers: one of the key steps in debugging was observing the HTTP requests made by pywikibot. I've written about how I did that here: https://alexwlchan.net/til/2024/how-to-see-pywikibot-http-requests/ Alexwlchan (talk) 15:34, 12 April 2024 (UTC)
Thank you for the debugging example! --Zache (talk) 17:40, 12 April 2024 (UTC)
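For future readers, here is a minimal sketch of the request parameters discussed in this thread. It only builds the POST body for a caption edit through wbeditentity with `bot=1` and sends nothing. The helper name `build_caption_edit`, the entity id and the token value are placeholders, and, as the thread found, the flag is only honoured when the request is authenticated as a user in the bot group.

```python
import json


def build_caption_edit(media_id, lang, caption, csrf_token, as_bot=True):
    """Build POST parameters for a wbeditentity caption edit.

    On Commons, captions are stored as labels on the MediaInfo entity
    (ids of the form "M<pageid>"). Nothing is sent over the network here.
    """
    data = {"labels": {lang: {"language": lang, "value": caption}}}
    params = {
        "action": "wbeditentity",
        "format": "json",
        "id": media_id,
        "data": json.dumps(data),
        "token": csrf_token,
    }
    if as_bot:
        # Only respected if the authenticated user is in the "Bots" group.
        params["bot"] = "1"
    return params


# Placeholder values for illustration only.
params = build_caption_edit("M123", "en", "A potato", "placeholder-token")
print(params)
```

Comparing a dictionary like this with the request pywikibot actually sends is one way to check whether the parameters or the authentication differ.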

Is there any way to get a list of users who uploaded images which appear in a Commons category?

Hi all

I'm working on some documentation around organising photography events and competitions and would really like to find a way to do this, e.g. I want to get a list of the user names of the uploaders of all the images in Category:Potatoes and all its subcategories. It would be a way to understand who is interested in the topic. I have a feeling this info might also be really useful for people who run the Wiki Loves photo competitions, e.g. 'this many people took part this year', although they may have other ways to do this.

Any suggestions would be really appreciated.

Thanks very much

John Cummings (talk) 09:16, 8 April 2024 (UTC)

@John Cummings: It’s possible using the API: query+imageinfo can be used to get uploaders from a given category (but not subcategories), and query+categorymembers can be used to get the direct subcategories of a category. I don’t know about any ready-to-use user-friendly solutions. —Tacsipacsi (talk) 19:33, 14 April 2024 (UTC)
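The API approach mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not a ready-made tool: the helper names are invented, the sample response is fabricated to show the expected shape (with `formatversion=2`), and the sketch covers one category level and one result page only. Subcategories and continuation would need additional requests.

```python
from collections import Counter

API_URL = "https://commons.wikimedia.org/w/api.php"


def uploader_query_params(category):
    """Parameters for listing files in a category with their uploader."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "categorymembers",  # iterate over category members
        "gcmtitle": category,
        "gcmtype": "file",               # files only, no subcategories
        "gcmlimit": "50",
        "prop": "imageinfo",
        "iiprop": "user",                # uploader of the current version
    }


def count_uploaders(response):
    """Count files per uploader in one API response."""
    counts = Counter()
    for page in response.get("query", {}).get("pages", []):
        for info in page.get("imageinfo", []):
            if "user" in info:
                counts[info["user"]] += 1
    return counts


# Fabricated sample response, only to show the shape being parsed.
sample = {"query": {"pages": [
    {"title": "File:Potato1.jpg", "imageinfo": [{"user": "ExampleUserA"}]},
    {"title": "File:Potato2.jpg", "imageinfo": [{"user": "ExampleUserA"}]},
    {"title": "File:Potato3.jpg", "imageinfo": [{"user": "ExampleUserB"}]},
]}}

print(uploader_query_params("Category:Potatoes"))
print(count_uploaders(sample))
```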

Tech News: 2024-16

MediaWiki message delivery 23:26, 15 April 2024 (UTC)

Module:Autotranslate was altered today to use a more efficient test for the presence of a language subtemplate. The new version was tested extensively, but please be on the lookout for any issues with templates using it. Please ping me if any issues are found or suspected. Jarekt (talk) 13:34, 18 April 2024 (UTC)

Tech News: 2024-17

MediaWiki message delivery 20:25, 22 April 2024 (UTC)

Mobile versions of user talk pages are useless

Though I'm not active on Commons, I got bored and used my mobile device to look through some deletion requests, then at a user's talk page. I knew that user had several deletion requests, but saw only one on their talk page. The talk page's history showed that old sections were archived, which was fine, but the page seemed to have no links to those archives.

I couldn't find the 'Read as wiki page' (or whatever it is) option, so out of desperation, I switched to the desktop version of the page. I was shocked at what I found there: not only the archive links, but also a notice that the user had been blocked indefinitely. (Before that, I hadn't seen any reference to a block.)

In summary, the mobile versions of user talk pages (and perhaps talk pages in general) are so crippled as to be useless. Brianjd (talk) 08:22, 4 April 2024 (UTC)

Yes, the mobile experience has always been really bad and as someone who basically exclusively edits with a mobile device it's not a fun experience. The worst part is that in most of the developing world the vast majority of people who surf the web do so using a mobile device. Not too long ago the experience of talk pages on mobile devices wasn't this bad, but they basically made it worse during an update a few months ago. After many, many years of neglecting the mobile experience I sincerely doubt that the developers are planning on fixing it anytime soon. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:47, 9 April 2024 (UTC)
should mobile view on Commons be deactivated? Enhancing999 (talk) 15:18, 17 April 2024 (UTC)
That might go too far. To be honest, the best solution would be to make the mobile version identical to the desktop version but tailored for smaller screens, though the Wikimedia Foundation (WMF) have confirmed multiple times that they will not do that. I have no idea why they insist on delivering an inferior and often infuriating experience to us mobile users, but there doesn't seem to be much we can do about it. De-activating it would break more than it would fix. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 15:44, 17 April 2024 (UTC)
I wonder what there is to break. Consider file description pages: the mobile version lacks half of the content (categories).
I think the skin running on each WMF site is determined by the communities. Accordingly, e.g. dewiki doesn't use the same as enwiki, as they consider it not suitable. Enhancing999 (talk) 15:49, 17 April 2024 (UTC)
@Enhancing999 If you login in mobile view, select settings from top right menu and then toggle Advanced mode to be enabled then the categories are visible. --Zache (talk) 17:37, 17 April 2024 (UTC)
I don't think it's available without login. We could keep the default mobile view from logged in users who require it (e.g. WMF staff for testing purposes). Enhancing999 (talk) 18:14, 17 April 2024 (UTC)
Yes, this "solution" only works if you have a Wikimedia SUL account and happen to know where to find the button to enable it. Imagine you're a re-user looking for similar images (or other files) from a certain category and can't figure out how to do that. Most people surf the web on mobile devices these days, this is especially true for people from developing countries. When I was in the rural Philippines a few weeks ago I met several people who have never worked with or used a laptop or desktop computer in their lives but all use smartphones, even homeless people have smartphones, these are the most affordable internet-connected devices for most people in the world. Yet, for whatever reason "the mobile experience" of Wikimedia websites is beyond bad. If possible we should put it to a vote to show the MediaWiki developers that we don't want to suppress vital information for a major part of the audience. Files are for re-users and if they can't find the files then there's something seriously wrong with the system. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 18:34, 17 April 2024 (UTC)
I have been under the impression that "advanced mode" is for things in beta testing that are not ready for prime time, for whatever reason. We could, though, ask what the plans are to bring it to prime time. --Zache (talk) 18:40, 17 April 2024 (UTC)
In mobile view parent categories aren't even visible. How useful can that be? Enhancing999 (talk) 19:34, 17 April 2024 (UTC)
@Donald Trung, Please, provide the link here to any question "put to a vote." Thanks, -- Ooligan (talk) 17:49, 23 April 2024 (UTC)

Deletions on "structured tab"

Anybody know if the feature is still being maintained? I reported a problem a while back about the impossibility of removing wrong statements, but it's still broken. Enhancing999 (talk) 07:44, 20 April 2024 (UTC)

Just to confirm: are you referring to the coordinate deletion problem described in phab:T313638, or is there some other bug? -- Zache (talk) 16:08, 20 April 2024 (UTC)
It's the problem noted here in January 2023 (more than a year ago): Commons:Village_pump/Technical/Archive/2023/01#Delete_wrong_coordinates_not_working. Any idea why nothing happened? Enhancing999 (talk) 14:40, 23 April 2024 (UTC)

How to search among 6807 photos

I usually edit topics related to rivers of Chile, and there is a lack of images directly showing those subjects. Some time ago I asked here how to search for a photo of an object located between coordinates (lat1, lon1) and (lat2, lon2) among the 6807 photos of Category:Satellite pictures of Chile. @HyperGaruda: kindly answered here with a reference to "Template:Object location". I tried it and got a new window on the directory page but nothing more. Can anyone give a more explicit indication? Thanks in advance, Juan Villalobos (talk) 13:53, 28 April 2024 (UTC)

what or how exactly do you want to find?
try using "Map of all coordinates on OSM" i just added. RZuo (talk) 14:29, 28 April 2024 (UTC)
structured data might allow you to search with a bounding box. Enhancing999 (talk) 14:38, 28 April 2024 (UTC)
RZuo, the "Map of all coordinates on OSM" does it very well. I had thought of an Excel file with filename;lat;lon;description, but the map is at least as good.
Enhancing999, what is "structured data" in this case and how can I use it?
My problem is that many regions of Chile have never been visited by anyone, let alone photographed. The last option is to show a satellite photo.
Thank you. --Juan Villalobos (talk) 12:58, 29 April 2024 (UTC)
if you want to search about a specific coord, here's a way to search within r km from (x,y): mw:Help:CirrusSearch#Geo_Search.
take a look at other sections of the page for more creative ways to restrict search results.
i dont know how to search within a box bounded by two pairs of coords though.
Commons:Structured data. RZuo (talk) 15:02, 29 April 2024 (UTC)
https://commons.wikimedia.org/w/index.php?search=nearcoord:99km,-38.73,-72.66 for temuco.
https://commons.wikimedia.org/w/index.php?search=nearcoord:99km,-38.73,-72.66+deepcategory:"Satellite+pictures+of+Chile" :) --RZuo (talk) 15:18, 29 April 2024 (UTC)
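The two search links above follow a simple pattern, so they can be generated for any point. Below is a small sketch that builds such a URL. The function name `nearcoord_search_url` is made up, and only the `nearcoord:` and `deepcategory:` keywords already used in this thread are assumed.

```python
from urllib.parse import urlencode


def nearcoord_search_url(radius_km, lat, lon, deep_category=None):
    """Build a Commons search URL for files within radius_km of (lat, lon).

    Optionally restrict results to a category tree with deepcategory:.
    """
    query = f"nearcoord:{radius_km}km,{lat},{lon}"
    if deep_category:
        query += f' deepcategory:"{deep_category}"'
    # urlencode takes care of escaping spaces, quotes and colons.
    return "https://commons.wikimedia.org/w/index.php?" + urlencode({"search": query})


# Same point as in the links above (Temuco).
url = nearcoord_search_url(99, -38.73, -72.66, "Satellite pictures of Chile")
print(url)
```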

For your question, the following may do:

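#defaultView:Map
# Files whose "coordinates of the point of view" (P1259) lie inside a bounding box.
SELECT ?file ?image ?location ?filename WHERE 
{
  SERVICE wikibase:box 
  {
    ?file wdt:P1259 ?location.
    # Corners are WKT points written as "longitude latitude".
    bd:serviceParam wikibase:cornerWest "Point(-121.872777777 36.304166666)"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerEast "Point(-121.486111111 38.575277777)"^^geo:wktLiteral .
  }
  ?file schema:url ?image; 
  schema:contentUrl ?url.
  # Derive "File:<name>" by dropping the fixed-length prefix of the content URL.
  BIND(wikibase:decodeUri(CONCAT("File:", SUBSTR(STR(?url), 53 ))) AS ?filename)
}
LIMIT 10000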


You need to pick eastmost and westmost points. Enhancing999 (talk) 23:38, 29 April 2024 (UTC)
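To adapt the query above to another area, the two corner literals have to be rewritten in "longitude latitude" order, with the westmost point first. A small sketch of that step follows. The helper names and the sample coordinates (a hypothetical box in southern Chile) are assumptions for illustration, and a box crossing the 180° meridian is not handled.

```python
def wkt_point(lat, lon):
    """Format a coordinate as a WKT point; WKT puts longitude first."""
    return f"Point({lon} {lat})"


def box_corners(lat1, lon1, lat2, lon2):
    """Return (west, east) WKT points for two diagonal corners of a box."""
    west, east = sorted([(lat1, lon1), (lat2, lon2)], key=lambda p: p[1])
    return wkt_point(*west), wkt_point(*east)


def corner_lines(lat1, lon1, lat2, lon2):
    """Build the two bd:serviceParam lines used inside SERVICE wikibase:box."""
    west, east = box_corners(lat1, lon1, lat2, lon2)
    return (
        f'bd:serviceParam wikibase:cornerWest "{west}"^^geo:wktLiteral .\n'
        f'bd:serviceParam wikibase:cornerEast "{east}"^^geo:wktLiteral .'
    )


# Hypothetical box given as (lat1, lon1) and (lat2, lon2).
print(corner_lines(-38.0, -72.0, -39.0, -73.5))
```

The printed lines can replace the two `bd:serviceParam` lines in the query.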

Tech News: 2024-18

MediaWiki message delivery 03:30, 30 April 2024 (UTC)

New tool for detecting logos

Hi all! As you already probably know, the Structured Content team is working this year on improving the current user experience with UploadWizard. We already have done some work on the “release rights” step, and we recently concluded a community discussion about the “describe” step. We are currently integrating the feedback received from you into our workflow.

Another thing we are working on is a potential improvement to automatically detect logos when uploaded on Commons through UploadWizard, in order to facilitate their evaluation by the community. A need for machine detection tools was raised in several discussions and user interviews we had in the past with the community, and logos are the second reason for media deletion after Freedom of Panorama, so we decided to work in that direction.

The tool we developed has shown promising results (accuracy is ~96%); in case you’re interested, you can see a brief summary of an evaluation of a sample batch of images. Our intention is for you to discuss and then, if consensus is reached, to integrate the tool in UploadWizard, in a way that would be beneficial for moderation workflow.

We would love to have your input. Do you think this tool could be useful? Do you think this tool could be integrated in UploadWizard, and then integrated in your moderation workflow? Sannita (WMF) (talk) 10:18, 9 April 2024 (UTC)

Maybe you could run it through the "icons" category to test. Enhancing999 (talk) 10:53, 9 April 2024 (UTC)
If integrated into the MediaWiki Upload Wizard, how would it work? Would it prevent the uploading of a PD-textlogo or would it detect if an incompatible license or attribution is present? Or would it simply add them to a daily page for community evaluation, akin to "User:Minorax/PD textlogo/2020 June 6"? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:03, 9 April 2024 (UTC)
@Donald Trung This is actually something that we wanted to discuss with the community. The tool, as of now, only recognises logos with a very high accuracy, but we want to ask the community what to do next. We are open to suggestions. Sannita (WMF) (talk) 11:17, 9 April 2024 (UTC)
Sannita, I think that it might be wise to index all logos by date of upload on a single page and have the tool also detect the licenses, and then add these into sections like "Logos with Creative Commons licenses tagged as Own work", "Logos with public domain licenses", "Logos with Creative Commons licenses attributed to an external source". That way we can easily go through each type of upload, a lot of (new) users upload free logos as "Own work" with wrong licenses, these are the most problematic, but those uploaded with external links and / or specific licenses tend to be less problematic, so we would immediately know which areas to focus on, but still evaluate the other logos. For example, a very complex logo shouldn't be in "PD-textlogo" and can be deleted if not found to have a free license, which would also be easily detected using such a system. -- — Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:26, 9 April 2024 (UTC)
For clarification "I think that it might be wise to index all logos by date of upload on a single page and have the tool also detect the licenses" would be a page like "Commons:Image detection/Logos/2024/November/12", I deliberately made logos as a sub-category of image detection because of the suggestions below by user "Adamant1" to also add this for postcards. I think that a tool like this could supersede the groups based on category done by the OgreBot today in the future. The page "Commons:Image detection" could be a central hub where reviewers (in the broadest sense) can use the AI-powered tool to find and detect logos. Heck, maybe in the distant future it can also detect images based on freedom of panorama (exterior images) and many other categories. Logos is a good start, though we should make sure that we don't discourage people from uploading, we should simply make it easier to detect possible copyright ©️ violations. We also used to have a page for uploads by new users, but I am not sure if we still have that (I think that I read somewhere that an update to the MediaWiki software made it difficult to maintain), perhaps the "Image detection" page can also have a sub-category for new users as well.
Heck, maybe we can even use this tool retroactively to group images together on a page like "Commons:Image detection/Logos/2012/December/8" and have like a button that trusted users can press to mark an image on that page as "patrolled" independent from the current solution we have. There are many ways that such a tool could be implemented, however, I sincerely hope that it won't be included before an image is published, as we could be missing out of valuable uploads because a new user (or a Wikipedian not familiar with this website) would be scared off by a warning message. -- — Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 13:21, 9 April 2024 (UTC)
My own humble suggestion is to disable and dismantle this whole thing, and rather redirect WMF’s funding and priorities to actually useful things requested by the community — like, say, fixing the 10 year old bugs that still plague the unified login process.
I have nothing against some random AI outfit making use of Commons’s publicly available data to “train” their GIGO machines: Good luck with that. But WMF funding should not be used for this sort of useless, misleading, hyped-up nonsensical makework.
And I know that this suggestion will never be given any attention, but that’s how it goes. -- Tuválkin 01:14, 10 April 2024 (UTC)
@Tuvalkin, I'm not sure I'd have ranked it as the top priority (just since the old wizard wasn't especially bad), but I do think that improving the wizard is a reasonable place to focus. We're a media repository, which makes the process for uploading media a pretty essential function. We lose an untold number of contributions (Sannita, do you have any data?) from people who aren't able to go through it successfully, and it's the first line of defense to prevent/fix problematic contributions (which reduces moderation work later). Sdkbtalk 04:48, 10 April 2024 (UTC)
Oh, I don’t think the Upload Wizard should exist at all. So there’s that. (Before you ask, I use Special:Upload or external tools, like Vicuña and Commonist.) I agree that uploading media is a pretty essential function; therefore gamified uploads from clueless jokers should be discouraged. You, however, both bemoan those lost uploaders (can it be lost, that which never existed?…) and also call for a better rat maze to make sure their “work” is not too detrimental to Commons, in terms of scope and copyright blunders.
At the same time many of the same people who cheer for this sort of continued addition of even more bells and whistles to the upload process, supposedly to avoid that loss of untold poor dears, are the same who constantly decry mass uploads, and who are happy to dance on the grave of the most prolific uploader (both quantity and quality) Commons ever had.
So, basically not impressed. It won’t help the Upload Wizard to add to the loop an opaque step which likely takes pre-published media files away from WMF purview off to some black box; the environmental impact of the additional computing resources needed is the cherry on top. (Reminds me of: Hop on a jet plane to join all the WMF cronies for needless face-to-face meetings: The cafeteria is 100% vegan because environment, dontcha know?)
To clarify: In my opinion, one-off uploads, especially by editors creating articles in other projects, should be allowed into Commons via a pipeline that’s not the same as that for mass or “expert” uploading; I presume the Visual Editor has such a function (never used it; unsurprisingly, I think it’s also nonsense). However, training an A.I. (and I’m all out of irony quote marks at this point) that will nag said inexperienced uploader Clippy-style does not help with this; it muddles the process even more. Especially since it will, most of the time, mistake all kinds of icons, diagrams, maps, flags, and traffic signs for logos.
Was I the only one who shook their head and palmed their face at the apparent fact that the control group for this A.I. training was the contents of Category:Logos, with depth=0…? Yes, that one garbage-bag category where end up stuck all the poorly categorized logos, including, more than any deeper subcat, miscurated images which are anything but logos! Just wow — the more I think of this, the worse it looks like.
But you’re right that this sort of nonsense is (also) «requested by the community». Sad…!
-- Tuválkin 06:39, 10 April 2024 (UTC)
Re training using a depth of 0, oof — Sannita, you all should rethink that.
Re your broader point, you seem to be pursuing a world in which it's far harder for newcomers to contribute to Wikimedia. That's a world in which our projects have more systemic bias and less overall content, which is not what I'd want. Sdkbtalk 14:17, 10 April 2024 (UTC)
@Tuválkin: thanks for your comments. Logo samples in the evaluation dataset were collected via a PetScan query with a category depth > 0. I'm sorry I couldn't retrieve the exact depth from that query.
Cheers, MFossati (WMF) (talk) 16:55, 11 April 2024 (UTC)
I agree to Tuválkin’s point that this looks like very weird priorities. Recently, UploadWizard became so broken for me that I have to look into different tools for upload. (I don’t know if it would still be usable with high-income country bandwidth and no power cuts.) Why do people start to work on new features of something whose basic function is more or less broken? Just because AI sounds fancy or it’s interesting to play around with it?
Just to be clear: Also agree with Sdkb that I’m not in favor of a world in which it's far harder for newcomers [and] more systemic bias. However, this doesn’t invalidate wondering about reasons for prioritizing a potentially useful feature over basic functionality. —Marsupium (talk) 01:25, 7 May 2024 (UTC)
This looks like a super cool tool. I was just wishing there was something similar for detecting images of postcards. I have to agree with Donald Trung that it's probably better to use an indexer of existing files instead of integrating it directly into UploadWizard. I imagine things like this are going to be the future of detecting and organizing specific types of images anyway and I doubt every use case going forward could be integrated into UploadWizard, but a hub for different types of detected images going forward would be great, starting out with logos and then integrating it with other types in the future. I don't think turning UploadWizard into a metaphorical Swiss Army knife of image detection at the point of upload would really be practical or useful though. From what I've seen most people are usually turned off by that sort of thing. Especially new users. UploadWizard is hard enough to understand and work with as it is already. --Adamant1 (talk) 12:57, 9 April 2024 (UTC)
This seems like a really useful tool! As far as integration into the Upload Wizard, we could use it to customize the user experience. For instance, there could be a dialogue "This looks like a logo. Is that correct? [Yes] [No]". [Yes] would autocategorize the image and lead to follow-up steps trying to confirm the copyright (e.g. directing them to COM:RELGEN if they claim it's their own work), and [No] would apply a hidden category or add it to a feed of reported false positives. Sdkbtalk 16:29, 9 April 2024 (UTC)
Very nice! The model seems to reliably distinguish graphics from random photos. Let me suggest some curveballs, from the type of images I often encounter: How does it perform when you include flags and coats of arms (CoAs) in the mix? Uploads in those categories are also often deleted, but less often than logos. Another test could include map details (i.e. cutouts), which are usually not deleted.
Also, what would this tool actually do after the detection? Bring the positives into a dedicated (hidden) category for users to evaluate? Yes, I could really see a use in that, for quicker processing of uncategorized uploads. --Enyavar (talk) 12:19, 9 April 2024 (UTC)
@Enyavar Thanks for your opinion. For now, we limited ourselves to logos because they were easier to detect, but if there is consensus for it, we can expand our work to other kinds of images.
About what the tool can actually do... that's the point of the discussion. We have suggestions to make, but we want to hear from you first, not to influence the discussion. The dedicated category/tag was actually one of the suggestions that we had, to be fair. But we'll take note of this suggestion, thanks! Sannita (WMF) (talk) 14:12, 9 April 2024 (UTC)
Ah, sorry, I meant something else with my first comment: can the tool reliably detect logos in the presence of these other (similar looking) images? What are the percentages for examples A, B, C, D, E or possibly F, G, H? It won't be totally bad if these get detected as logos with a certainty of 90%+... I'm just curious whether or not files from these categories have a higher false positive rate for the current tool. --Enyavar (talk) 15:08, 9 April 2024 (UTC)
Good idea, to test on coats of arms, flags and maps in addition to icons I mentioned above.
Wonder if the training set was useful: Commons mainly has simple logos for which it doesn't actually matter that they are also logos. Enhancing999 (talk) 20:04, 9 April 2024 (UTC)
Hi @Enyavar and Enhancing999: I'm the main technical point of contact behind this effort. Thanks for your feedback, very useful and appreciated! Here are the probability scores of the images you mentioned:
  • A = 77.82 %
  • B = 48.96 %
  • C = 88.9 %
  • D = 97.9 %
  • E = 99.87 %
  • F = 0.45 %
  • G = 0.36 %
  • H = 43.29 %
A few observations:
  • all scores but D and E seem low enough to be cut out;
  • as a human, I'd consider D and E as logos if I didn't know what coats of arms are. The model was indeed trained on logos and non-logos, so it hasn't got any notion of coats of arms;
  • probability threshold selection will be crucial to tune the amount of what we’d like to consider as true positives.
Anyway, I definitely think that your suggestions make a lot of sense: testing the robustness of the model against visually similar inputs is now in my to-do list.
Cheers, MFossati (WMF) (talk) 11:03, 10 April 2024 (UTC)
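The effect of threshold selection on the scores listed above can be shown with a short sketch. The scores are the ones reported in this comment; the function name and the two threshold values (95 % and 75 %) are arbitrary choices for illustration, not values used by the actual tool.

```python
# Probability scores (in percent) reported above for test images A to H.
SCORES = {
    "A": 77.82, "B": 48.96, "C": 88.9, "D": 97.9,
    "E": 99.87, "F": 0.45, "G": 0.36, "H": 43.29,
}


def flagged_as_logo(scores, threshold):
    """Return the labels whose score is at or above the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# A strict threshold keeps only D and E; a looser one also lets A and C through.
print(flagged_as_logo(SCORES, 95.0))
print(flagged_as_logo(SCORES, 75.0))
```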
This doesn't seem un-useful, but I am a bit skeptical that it will move the needle much in terms of checking logos for copyvio/spam. There is already an overwhelming backlog of this stuff -- 118,798 files in Category:Unidentified logos alone. The most likely outcome is another backlog. Gnomingstuff (talk) 20:42, 9 April 2024 (UTC)
What would probably make it a lot more useful is if it was used for flow control in the upload wizard. If an image is detected as a logo and the user confirms that it's a logo, that could eventually shunt the user into a workflow that asks them for logo-specific information, e.g. "what does this logo stand for", "where did you find this logo", etc. Omphalographer (talk) 18:57, 10 April 2024 (UTC)
I can see both sides and arguments here. Tuvalkin is strictly against adding such "another whistle and bell" to the upload wizard (I find it tedious too, and I can understand how a Clippy-style intervention during the upload would frustrate the more experienced uploaders). On the other hand, this addition to the wizard could immensely enrich the descriptions of uploaded logos by one-time contributors (often by the companies/brands who hold copyright of the logo, i.e. these are officially sanctioned uploads that we'll be able to use) who we can never contact again even a week later for clarification on their uploads. --Enyavar (talk) 09:32, 11 April 2024 (UTC)

Thanks all for your contributions! We've already identified some good things we can work on. We would like the discussion to go on for some more days, so that you can continue to have your say and we can try to reach some consensus. Sannita (WMF) (talk) 10:17, 15 April 2024 (UTC)