Commons talk:Structured data/Computer-aided tagging/Archive 2019

From Wikimedia Commons, the free media repository

Few questions

  • How long is the waiting period for analysis that you mentioned in the notes?
  • Is this an image recognition technology?

Juandev (talk) 05:40, 17 September 2019 (UTC)

The wait period hasn't been decided yet, at least a week or two if not 30 days. One of the main reasons for waiting before sending the images for processing is to give the community time to do normal maintenance on new images - mainly speedy deletions and DRs for new images unsuitable for Commons. There's little reason sending everything as it comes in when some of it is going to be deleted. And yes, this is image recognition. Keegan (WMF) (talk) 15:53, 17 September 2019 (UTC)
I wonder who will wait 30 days with their 300 images before being able to tag them. That sounds very user-unfriendly. Juandev (talk) 17:05, 17 September 2019 (UTC)
Good point, noted, thank you. Keegan (WMF) (talk) 19:42, 17 September 2019 (UTC)
Well, I think there is no reason to wait at all. On Wikimedia Commons, it doesn't work like on Wikipedia, where within minutes or hours possible copyvio is discovered and processed out. On Wikimedia Commons, this may be found after years.
The second point is already mentioned above. If I upload 300 images, I'd like to describe them at the point of upload, because in 4 weeks' time I will have another 300 images and I don't want to go back to the first batch. There are people who go back and curate their files, but I would say that most work is done around the time of upload, so if this feature is to help, it has to run immediately. Juandev (talk) 02:28, 19 September 2019 (UTC)

Sign up for design and prototype testing

I normally (and will continue to do so) publish an open call for people to review and comment on designs and prototypes as they're developed. I'd like to make sure that people who would like to be directly involved are, so please sign up for me to make sure you're contacted about both calls for feedback as well as making sure that your feedback is given to the team.

  1. Keegan (WMF) (talk) 17:55, 17 September 2019 (UTC)
  2. John Samuel (talk) 18:39, 17 September 2019 (UTC)
  3. Tris T7 TT me 04:15, 18 September 2019 (UTC)
  4. Juandev (talk) 05:40, 18 September 2019 (UTC)
  5. Jarekt (talk) 03:18, 23 September 2019 (UTC)
  6. Ayack (talk) 07:53, 24 September 2019 (UTC)
  7. GPSLeo (talk) 08:11, 24 September 2019 (UTC)
  8. Christian Ferrer (talk) 10:49, 24 September 2019 (UTC)
  9. Aktron (talk) 18:08, 20 November 2019 (UTC)
  10. Raymond 17:28, 9 December 2019 (UTC)

Mapping output to Wikidata items

The Commons Android app has been brainstorming this feature since 2016. The biggest challenge we recognized is to map the output (most probably WordNet concepts) to Wikidata items.

Most image classification implementations output WordNet 3.0 concepts. I wrote this query that shows the mapping between WordNet concepts, Wikidata items, and Commons categories. It takes a while to execute, so here is a screenshot.

There are currently 474 mappings, and it has not increased in a year. We really need to motivate people to add more mappings. Any idea how to do so? Maybe via a game?

Great to see this becoming a reality! :-) Syced (talk) 02:55, 18 September 2019 (UTC)

What does "mappings" mean? Juandev (talk) 05:44, 18 September 2019 (UTC)
@Juandev: in this context, a "mapping" is making a note of the relationship between an entry in one database and another database. For example, Wikidata's item for house (Q3947) maps to Google's Freebase ID "/m/03jm5". Keegan (WMF) (talk) 14:58, 19 September 2019 (UTC)
Fortunately, the Google Cloud Vision API returns Freebase concept IDs, and millions of those IDs are already included as external IDs on Wikidata items. There is some improvement to be made here, but the mappings we already have are a very good start and we're building a lookup table to make things fast. RIsler (WMF) (talk) 23:25, 18 September 2019 (UTC)
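The lookup table mentioned above could look roughly like the following. This is only a minimal sketch, not the team's implementation: the table contents, the function name `map_labels`, and the placeholder ID `/m/placeholder` are assumptions made for illustration. Only the pair house (Q3947) and "/m/03jm5" comes from this discussion.

```python
# Hypothetical lookup table: Freebase ID -> Wikidata item ID.
# Only the first pair is taken from the discussion; a real table would be
# built from the Freebase external IDs already stored on Wikidata items.
FREEBASE_TO_WIKIDATA = {
    "/m/03jm5": "Q3947",  # house
}


def map_labels(labels):
    """Split (freebase_id, description) pairs into mapped and unmapped lists."""
    mapped, unmapped = [], []
    for freebase_id, description in labels:
        qid = FREEBASE_TO_WIKIDATA.get(freebase_id)
        if qid is None:
            # No known Wikidata item: keep it aside so the mapping can be added later.
            unmapped.append((freebase_id, description))
        else:
            mapped.append((qid, description))
    return mapped, unmapped


# "/m/placeholder" is a made-up ID standing in for a label with no mapping.
mapped, unmapped = map_labels([("/m/03jm5", "house"), ("/m/placeholder", "unknown")])
print(mapped)    # mapped labels, as (Wikidata ID, description)
print(unmapped)  # labels that still need a mapping
```

Keeping the unmapped labels separate makes it easy to see which mappings are still missing.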

Combination with coordinates

Could this technology be combined with coordinates? Here is what I mean. One way is to ask artificial intelligence to recognize the object, so in the case of a church, it will recognize a church. But if we have the coordinates of the photographer's position, we can retrieve from OpenStreetMap what lies within a specified radius around the photographer, and we may arrive at the "Church of Saint Stephen". So combining both technologies would give better results. Would it be possible to develop such a tool in the future? As far as I remember, there were some negotiations with the OSM foundation... Juandev (talk) 18:03, 19 September 2019 (UTC)

I think that in these cases we already have the category to say which church is in the picture. But this tool could maybe say which part of the church is depicted. --GPSLeo (talk) 18:31, 19 September 2019 (UTC)
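The idea of combining a generic label with the photographer's coordinates can be sketched as a radius search. This is a rough illustration only: the feature list, its field names, the coordinates, and the 200 m default radius are all invented for the example, and no real OpenStreetMap query is made.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_candidates(photo, features, label, radius_m=200):
    """Names of features of kind `label` within `radius_m` of the photo, nearest first."""
    hits = []
    for feature in features:
        if feature["kind"] != label:
            continue
        distance = haversine_m(photo[0], photo[1], feature["lat"], feature["lon"])
        if distance <= radius_m:
            hits.append((distance, feature["name"]))
    return [name for _, name in sorted(hits)]


# Hypothetical map features; the coordinates are made up.
FEATURES = [
    {"name": "Church of Saint Stephen", "kind": "church", "lat": 50.0, "lon": 14.0},
    {"name": "Some other church", "kind": "church", "lat": 50.1, "lon": 14.0},
    {"name": "Village school", "kind": "school", "lat": 50.0004, "lon": 14.0},
]

print(nearby_candidates((50.0005, 14.0), FEATURES, "church"))
```

The image recognizer supplies the generic label ("church"), and the coordinates narrow it to named candidates that a user would still need to confirm.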

How many tags, and how detailed, should we expect?

I have some concerns and some hopes. I wonder how many depicts statements we should expect. The Complete Encyclopedia of World Aircraft (Q21014429) currently depicts (P180) 1982 items and The Sawley Map (Q22815091) 171 items. I hope this tool will pick the top 5-10 items and ignore the rest. I also wonder how detailed they might get:

I would love if this tool was able to tell that image depicts Mona Lisa or any other known painting, or do facial recognition and figure out who is there. --Jarekt (talk) 03:56, 23 September 2019 (UTC)

The depicts guideline says you have to be as specific as possible. So in the case of Albert Einstein, the only value should be Albert Einstein. The fact that he is something else as well should be harvestable from his item on Wikidata. Juandev (talk) 10:15, 23 September 2019 (UTC)
So my question was how specific the tags provided by this proposed tool will be. If the tool were able to identify specific people, paintings, buildings, etc., then it would be great. Identifying the content of images to the level of "person" or "building" might be useful, but only marginally. I am also weirded out by the proliferation of tags like white people (Q235155). --Jarekt (talk) 13:22, 23 September 2019 (UTC)
It'll probably be most helpful to answer this question after there's a prototype up and running in a few weeks, after some designs are put forward to the community. Keegan (WMF) (talk) 18:09, 23 September 2019 (UTC)
As Keegan said above, community will be able to test this out and answer these questions solidly in the coming weeks. There are some questions we can answer now though. First, artwork rules and conventions on Commons are a bit tricky so we're going to try to avoid running those types of images through the tool for now (generally by looking for the Artwork and other related templates). Google Vision's Labels functionality, which we'll use primarily, doesn't try to get too specific on paintings anyway. For photos of landmarks, however, we'll try to make use of Google's landmark API to identify specific buildings, monuments, locations, etc. As for how many suggested tags to expect from the tool, generally it'll be 5 to 10, but please keep in mind that nothing actually gets added to the file's structured data until a user confirms the tags. For example, the tool may show 10 suggested tags but the user who sees them may only pick 3 of them and ignore the rest. Finally, the odds of seeing "White people" as a suggested tag from the tool are near zero (the Google label API doesn't suggest ethnicity/race). RIsler (WMF) (talk) 18:49, 23 September 2019 (UTC)
RIsler (WMF), thank you for the explanation; it mostly answers my question, and you and Keegan are right that we will just have to wait and see to learn the details. One last thought about artworks and the {{Artwork}} template: true, artwork rules and conventions on Commons are a bit tricky and in flux at the moment, but one use scenario might be identifying artworks. Here is the idea: we have almost 250k images using the {{Artwork}} template that are associated with a specific Wikidata ID. We also have over 2M images using the {{Artwork}} template that are not associated with any Wikidata ID. Many of the images with Wikidata IDs depict the same artworks as images without. It would be great to recognize that they show the same artwork and copy the Wikidata ID. But perhaps I just described an idea for a different tool.
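The flow described in this thread (skip files carrying the Artwork template, offer at most about 10 suggestions, store only what a user confirms) can be sketched as follows. The function names, the score values, and the way templates are passed in are assumptions for illustration, not the actual tool's code.

```python
def suggest_tags(labels, templates, max_tags=10, skip_templates=("Artwork",)):
    """Return up to `max_tags` label names, highest score first.

    `labels` is a list of (name, score) pairs; `templates` lists the
    templates used on the file page. Files using a skipped template
    get no suggestions at all.
    """
    if any(template in skip_templates for template in templates):
        return []
    ranked = sorted(labels, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:max_tags]]


def confirmed_tags(suggested, picked):
    """Keep only the suggestions the user picked, in suggestion order."""
    return [tag for tag in suggested if tag in picked]


# Twelve made-up labels with made-up scores.
labels = [(f"label{i}", i / 100) for i in range(12)]
suggested = suggest_tags(labels, templates=["Information"])
print(suggested)
print(confirmed_tags(suggested, {"label11", "label5", "label3"}))
```

Nothing in `suggested` is stored by itself; only the output of `confirmed_tags` would end up in the file's structured data.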

Some questions - About the usage and the Vision API

  • 1) Are you using targeted custom labels detection (custom ML model ) or the native one from google ?
  • 2) Are you using the web detection feature ? or only label detection ? If only label detection, how do you deal with the portrait of Albert Einstein (see File:Albert Einstein Head.jpg)? Labels are (Photograph 96 %), (Portrait 89%), (Black-white 89% ) and blah blah blah .... It does not show his name(as we expect).
  • 3) Web detection identifies that he is who he is. Can you make it automatic (auto-tagging) if it scores above a specific detection score ?
  • 4) MOST IMPORTANT: Is there any limit on the API usage ? I remember there was a limit on OCR(it is/was used on a different project). -- Eatcha (talk) 14:44, 12 October 2019 (UTC)
Hello! Sorry for the late reply. Answers to your questions.
  • 1) We are using the default native model from Google
  • 2) Only label detection. Web detection isn't reliable for new photographs especially (and web detection info often comes from Commons anyway)
  • 3) Moot, see above.
  • 4) Yes. We'll have a set number of credits, but we're aiming to have enough to cover all images on Commons eventually. RIsler (WMF) (talk) 22:43, 21 October 2019 (UTC)
Thanks! -- Eatcha (talk) 01:47, 22 October 2019 (UTC)

Ability to remove suggestions from the Queue

There are a lot of tags that are completely inappropriate in the suggestions for "depicts" statements. For example, I just saw one for "close up", which is a genre of photography, or an instance of a problem, but should never be in depicts. Could we have a function for removing these Wikidata items from the queue? Magnus's WDFIST tool allows users to put images on a "never again use this" list that is really transparent and kept in list format on the wiki: https://www.wikidata.org/wiki/User:Magnus_Manske/FIST_icons . By doing so, the workflow becomes cleaner and the chance of a less experienced user adding an inappropriate tag becomes smaller (moreover, it gives feedback to the machine learning for later refinement of these kinds of tools). Sadads (talk) 21:53, 13 November 2019 (UTC)

I.e. 2 of the three tags on: File:Horyu-ji36s3200.jpg are entirely inappropriate clutter, and the third Japanese architecture (Q1422874) is only useful if someone is going to sort that general set later . Sadads (talk) 21:59, 13 November 2019 (UTC)
And almost all of the depicts at File:Shakedown_2008_Figure_4.jpg are inappropriate. Sadads (talk) 22:00, 13 November 2019 (UTC)
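A "never suggest this" list of the kind requested above could be applied as a simple filter. The sketch below assumes a hypothetical plain-text list format (one label per line, `#` for comments); it is not how WDFIST or the tagging tool actually stores such a list.

```python
def parse_blocklist(text):
    """Read one label per line, ignoring blank lines and '#' comment lines."""
    blocked = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            blocked.add(line.casefold())
    return blocked


def drop_blocked(suggestions, blocked):
    """Remove suggestions whose label appears on the blocklist (case-insensitive)."""
    return [s for s in suggestions if s.casefold() not in blocked]


# Hypothetical list content, using the "close up" example from this thread.
BLOCKLIST_TEXT = """
# Labels that should never be offered as depicts
close up
"""

blocked = parse_blocklist(BLOCKLIST_TEXT)
print(drop_blocked(["Close up", "Japanese architecture"], blocked))
```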

Tests

  • I have discovered that... I'm a bit upset that the categories are not visible. I'm not saying it is bad; I'm saying that it differs from what we are used to... This makes me think that the tool could maybe also propose some tags based on the files' existing categories. Other than that, I'm a little frustrated that I cannot add tags that are not offered. But I have a rather good general impression, and it looks pretty accomplished. Regards, Christian Ferrer (talk) 22:18, 14 November 2019 (UTC)
    • I would also appreciate the ability to add tags not offered: there are a lot of hints in file titles and descriptions of the "right" potential item. It also feels like we aren't gathering potential tags from the text, which are likely to be a lot more accurate, especially with specific locations and specific species. Sadads (talk) 13:37, 15 November 2019 (UTC)
  • I just tested the tagging prototype. I tagged about 6 images and all but 2 of them published successfully (for those 2 I received an error message). The error message just said that it didn't go through. Feedback from the tagging: I liked being able to easily tap the pill buttons, and it was super intuitive to publish. I wish 1) it were easier to preview the image after publishing, by clicking the feedback confirmation message or something, and 2) there were a way to add tags. Iamjessklein (talk) 20:37, 15 November 2019 (UTC)
  • I was able to easily add tags to images, but I did wonder if the large image sizes would make it hard for first time users to locate the tags, which often required scrolling. Additionally, one time the review pop-up did not appear for me. --Climadeo (talk) 21:05, 15 November 2019 (UTC)

Theoretical concepts as suggestions

There are many suggestions that do not make any sense because they are theoretical concepts, such as legal or philosophical concepts. I think it is clear that they cannot be depicted in an image. Some examples: nature reserve (Q179049), natural environment (Q43619), demonstration (Q1395149) (in this case the problem seems to be two different meanings of the English word "demonstration"), people (Q2472587), social work (Q205398). Maybe it would be good to exclude, in general, items that are a subclass of concept (Q151885). --GPSLeo (talk) 13:00, 16 November 2019 (UTC)
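Excluding every item that is a subclass of concept (Q151885) amounts to walking the "subclass of" chain upward from each suggestion. The sketch below does this on a small in-memory graph. The edges and the `Q_EXAMPLE_*` identifiers are invented for the example and do not reflect the actual Wikidata hierarchy; on real data, the same check would typically follow subclass of (P279) statements.

```python
CONCEPT = "Q151885"  # concept, as named in the comment above

# Toy "subclass of" graph: item -> direct parent classes.
# All edges are made up for illustration.
PARENTS = {
    "Q179049": ["Q_EXAMPLE_LEGAL_TERM"],   # nature reserve
    "Q_EXAMPLE_LEGAL_TERM": [CONCEPT],
    "Q_EXAMPLE_RIVER": ["Q_EXAMPLE_PHYSICAL_OBJECT"],
}


def is_subclass_of(qid, target, parents):
    """True if `target` is reachable from `qid` by following parent links."""
    stack, seen = [qid], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue  # guards against cycles in the graph
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def drop_abstract(suggestions, parents, target=CONCEPT):
    """Remove suggestions that are (transitively) a subclass of `target`."""
    return [q for q in suggestions if not is_subclass_of(q, target, parents)]


print(drop_abstract(["Q179049", "Q_EXAMPLE_RIVER"], PARENTS))
```

Whether such a blanket exclusion removes too much depends on how the real class tree is modelled, so the result would need checking against actual suggestions.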

Feedback on Computer-aided tagging

After testing tagging both popular and personal pictures on Wikimedia Commons, I feel that the overall experience was fluid. I just had the error a couple of times while adding tags (Something went wrong...). But tags were added, when I retried. John Samuel (talk) 18:59, 16 November 2019 (UTC)

Aktron

  • It would be much better to have a more keyboard-friendly interface. When I review the tags, I can't just press Enter (FFS!); I have to take the mouse and click a button. Also, it would be nice if tags could be added just by typing them. It suggests "building", "sea", "shore": it is faster to type these than to take the mouse, move it somewhere and click.
  • It would be nice to have an interface with thumbnails of several pictures and the possibility to add Wikidata labels to all of them at once and THEN to submit only once. That would make the whole process of adding labels more streamlined, since you'd have to click or submit only once per 10 pictures, not once per picture. There would be less loading, or loading would happen in batches.
  • The number of tags suggested seems to be surprisingly low, especially for photography from urban areas.
  • The tags are utterly generic (sea/building); more specific tags would be cool.
  • I got the error message "Something went wrong and the tags cannot be...Please try again later" multiple times. Web version, Chrome.
  • It's nice to do valued images; it would, however, be even better to be able to go through images a user uploaded in the past. Aktron (talk) 14:20, 22 November 2019 (UTC)
    • Regarding this point, it would be nice to have multiple ways of selecting "your" files for tagging, e.g. some filters, a whole category, a whole gallery, and also manual selection (the way Cat-a-lot works, for example). --Juandev (talk) 21:59, 25 November 2019 (UTC)

Thanks for the feedback

The development team appreciates everyone who tried out the tool and left comments here. There seems to be overwhelming consensus for the ability to add tags that aren't suggested within the tool, and have the tool work with other gadgets. While the team doesn't currently plan on building out any new or additional features, the requests are heard and will be remembered. I'll be back with more information about final development and release in the near future. Keegan (WMF) (talk) 18:28, 25 November 2019 (UTC)

Do note that I'll be going over the feedback throughout the week with relevant people involved in development, and I'll get back with updates as best as I can. Keegan (WMF) (talk) 19:58, 25 November 2019 (UTC)
The bug(s) experienced during testing have been patched, and there have been some minor tweaks in the design. As I previously mentioned, the team isn't able to build any new features for the tool at this time (such as category integration, all personal uploads), but the suggestions will be remembered in case the team is able to return to the tool in the future for enhancements. The tool should be live for all logged-in auto-confirmed users on Wednesday, 11 December. Keegan (WMF) (talk) 18:54, 6 December 2019 (UTC)

Tags that already exist are still offered to me

I just used the tool to tag File:Bald Eagle Portrait.jpg with Bald Eagle. I then looked at my contribs and there was no record. The truth is the photo already has a statement depicts (P180)=Bald Eagle (Q127216).--Roy17 (talk) 21:54, 11 December 2019 (UTC)

Thanks, passing this along and I'll post a bug link if/when I have one. Keegan (WMF) (talk) 22:06, 11 December 2019 (UTC)
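The behaviour expected in this report is that a suggestion is dropped when the file already has the same depicts (P180) value. A minimal sketch of that check follows; the function name and the placeholder `Q_EXAMPLE_OTHER` are assumptions, and only Bald Eagle (Q127216) comes from the report.

```python
def new_suggestions(suggested_qids, existing_depicts):
    """Return suggestions that are not already depicts values on the file."""
    existing = set(existing_depicts)
    return [qid for qid in suggested_qids if qid not in existing]


# The file already depicts Bald Eagle (Q127216); "Q_EXAMPLE_OTHER" is a
# made-up placeholder for some other suggested item.
print(new_suggestions(["Q127216", "Q_EXAMPLE_OTHER"], ["Q127216"]))
```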

Instructions marked for translation yet?

The instructions at the top and the CC0 warning below were still in English when I tried using the German UI. --Roy17 (talk) 21:54, 11 December 2019 (UTC)

These are system messages so they live on translatewiki.net, and translations are in progress. I've sent out an ask to the translators mailing list for help with this. Translations on translatewiki are pushed here to Commons not too long after they become available, so you should be seeing German soon as the messages are processed by translators. Keegan (WMF) (talk) 22:22, 11 December 2019 (UTC)

The tool is live

The tool is now turned on for all autoconfirmed, logged in users. Keegan (WMF) (talk) 22:20, 11 December 2019 (UTC)

No tags at all

No Suggested tags 20191212

--Roy17 (talk) 12:42, 12 December 2019 (UTC)

Should have image description

I think it would be better if the image description was included, so I have context for what I am looking at. I'm also not sure precisely what is supposed to be labelled (does "water" mean it's all water? Has any water in the picture at all? Has water as a significant element?). Also, we should have an input to add additional tags if the suggestions are missing something. Bawolff (talk) 18:18, 12 December 2019 (UTC)

The tags (that is, the structured data) have to be as specific as possible. So if you have an image of a river, you should add the tag river, but if the tool does not offer the tag river, I would just skip the image. Sometimes you can even see the name of the river in the filename, but you are unable to add it because there is no such feature, and if you look behind the scenes, the WMF team does not plan to add such functionality, sadly! --Juandev (talk) 11:29, 13 December 2019 (UTC)

Feedback on some images I skipped and that keep coming back

  • Some tags seem too general/not applicable
  • For the last one, I wasn't really sure.
  • For the second to last one, maybe "river, tree, house, sky" could do
  • One image has nothing suggested
  • Somehow I doubt "art of painting, illustration, visual arts, modern art, art" are of much use to any image, but I can understand that in a way of adding statements different from mine this could be seen as useful.
  • The drawings seem to get hardly any suggestions about what is depicted.
  • Some images are in use and suggestions could be based on the articles they are used in (but I understand that isn't the purpose of the module)

BTW it could be that there are just many similar images and I only think they are the same ones coming back. Jura1 (talk) 23:29, 12 December 2019 (UTC)

Some other

Above a few others where I actually added statements, based on what was suggested.

Maybe adding "insect" to all insects is useful (and I probably couldn't be more precise by merely looking at them), so someone else can try to look at all "insects" starting out from the initial "tagging". Oddly, the first two images already have much more precise "depicts" statements (I found that when going through my additions now).

Interesting new feature. Jura1 (talk) 23:52, 12 December 2019 (UTC)

I think the insect example in particular does not make sense, because most files are categorized very well, so we already have the information about the exact species. --GPSLeo (talk) 19:37, 13 December 2019 (UTC)
I'd think the feature would be most useful for images that have no categories or statements, aren't in use, and might not even have much of a description or an explicit filename. Obviously, starting out from "Popular" probably doesn't select these. That categories aren't taken into account might also be due to the fact that statements are meant to be an alternative. Jura1 (talk) 04:43, 14 December 2019 (UTC)

How to reach the tool?

What are the ways to reach the tool? It's not even quickly reachable from this page! --Juandev (talk) 11:15, 13 December 2019 (UTC)

Oh good catch, now that it's live I should link it :)
If you opt-in to the service in the UploadWizard or to receive "Suggested tags for review" notifications in preferences, you'll receive a notification link to Special:SuggestedTags. Otherwise, the page is listed in Special:SpecialPages. The tool is not highly visible by design; it's meant to be very unobtrusive and not really easy to find because structured data is so new, and Community standards around depicts and other data modeling are still under discussion. If you plan to use the tool often, I suggest bookmarking Special:SuggestedTags for now, unless someone writes a line for your Commons .css to link you in the sidebar directly. Keegan (WMF) (talk) 17:22, 13 December 2019 (UTC)

Funny tags

For this one, of pots and saucer-like items, it suggests "organism" (what?) and "vehicle", likely a flying saucer. Thanks for the laugh :) Pueblopassingby (talk) 02:07, 15 December 2019 (UTC)

An error occurred and the tags were not published. Please try again later

Though I was not very convinced by the suggestions, I added the tags in order to practice.

But this message appears again and again: "An error occurred and the tags were not published. Please try again later."

Now there are more and more files waiting [1] and I feel discouraged. How can I escape? -- Ji-Elle (talk) 10:16, 16 December 2019 (UTC)

I believe there were some server problems earlier today causing this problem that were unrelated to the tool. Sorry about that. Keegan (WMF) (talk) 18:20, 16 December 2019 (UTC)
Thank you for answering. However, I cannot go any further here [2]... Ji-Elle (talk) 11:41, 17 December 2019 (UTC)
Five days later nothing has changed. The same message appears: "An error occurred and the tags were not published. Please try again later." What can we do? -- Ji-Elle (talk) 10:59, 21 December 2019 (UTC)

How to opt-out?

Is it possible to opt-out from this service? I do not like the automated notifications I am receiving (at the bell on the top of my page) for suggested tags. Elly (talk) 10:31, 16 December 2019 (UTC)

That is an option at Special:Preferences#mw-prefsection-echo --GPSLeo (talk) 11:23, 16 December 2019 (UTC)
thanks so much for your quick answer. Elly (talk) 12:38, 16 December 2019 (UTC)

I have no idea how to use this tool

I answered by mistake and could not work out how to correct my mistake! --Bel Bonjour, Ambre Troizat (talk) 12:20, 17 December 2019 (UTC)