Commons:Bots/Requests/VortBot

From Wikimedia Commons, the free media repository

Operator: Vort

Bot's tasks for which permission is being sought: replacing Flickr-sourced images with their higher-resolution versions from Flickr. Discussion: COM:Village pump#Flickr import quality.

Automatic or manually assisted: automatic

Edit type (e.g. Continuous, daily, one time run): one time run

Maximum edit rate (e.g. edits per minute): 12

Bot flag requested: (Y/N): no. I have implemented multipart/form-data support in my code, so the flag is no longer needed. (Original answer, since withdrawn: yes, the bot can't operate without the flag because it lacks permission for URL upload.)

Programming language(s): C#, source code: WikiTasks/cm_flickr_update

Vort (talk) 08:05, 4 October 2017 (UTC)

Discussion

  • For this to work, the image has to still be on Flickr. Therefore, if the flickrreview template was reset, the normal User:FlickreviewR 2 would re-upload the full size image and fill in the template. This seems a lot simpler than creating another script to do the same thing, and would require no permissions beyond editing. -- (talk) 10:42, 4 October 2017 (UTC)
    • @: The problem is that human Image reviewers haven't been uploading the highest resolution images, as User:FlickreviewR 2 would. Please see the examples at COM:Village pump#Flickr import quality. We may need to add such a requirement to COM:LR (I thought we had one, but perhaps I just internalized FlickreviewR's behavior). For test runs, the bot could download the images from Flickr to its local computer and overwrite them using Vort's account.   — Jeff G. ツ 11:06, 4 October 2017 (UTC)
      • For Flickr uploads, human reviewers should not be wasting their time; it's all done automatically. Where image reviewers have been doing this in the past (we really should be telling them to stop), a script can simply replace the flickrreview template with a blanked one, which will kick off the current automated review. When the automated process fails, the suggested VortBot automation will not work either, as the same discrepancies will still need manual intervention. One example is a Flickr account overwriting the original image with a different one, such as an alternate of the same scene, sometime after it was uploaded to Commons.
      BTW, I don't think it's reasonable to force image reviewers to upload the full size file from Flickr manually. Fortunately they don't have to.
      PS If VortBot does get approved, I think it could do the job with an ImageReviewer flag rather than a bot flag, and there may be benefits to keeping the overwrites at that level of visibility. I'm presuming there are not a humongous number of files that can be upgraded in this way. Intuitively I'd guess this was a smallish number, so I would be interested to see some estimate of how many files can be upgraded, maybe by using some sample stats. -- (talk) 11:36, 4 October 2017 (UTC)
1. Any of these rights should be good: Bots, Administrators, Image reviewers, GWToolset users. 2. My bot checks image identity, so images will not be overwritten by different versions. — Vort (talk) 11:42, 4 October 2017 (UTC)
Could you explain what you mean by image identity? I'm presuming that's not the Flickr photo id, as I think that will be unchanged on an overwrite at source. -- (talk) 11:48, 4 October 2017 (UTC)
Pixel by pixel comparison using CIE76 DeltaE threshold. — Vort (talk) 11:54, 4 October 2017 (UTC)
Nice way of doing the image hash. I think you mean 'region' rather than pixel, as the latter will obviously not match. -- (talk) 11:59, 4 October 2017 (UTC)
I resample the image from its Flickr dimensions to the Commons dimensions, and then the pixel-by-pixel check is done. — Vort (talk) 12:02, 4 October 2017 (UTC)
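For readers unfamiliar with the comparison described above, here is a minimal Python sketch of a per-pixel CIE76 DeltaE check. It is not the bot's C# code. It assumes both images have already been resampled to the same dimensions and flattened into lists of 8-bit sRGB triples, and the function names, the D65 white point, and the "less than or equal" threshold test are all illustrative assumptions. The default threshold of 45 is the value mentioned later in this discussion.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point assumed)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e_cie76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance in L*a*b* space."""
    return math.dist(lab1, lab2)

def max_delta_e(pixels_a, pixels_b):
    """Largest per-pixel DeltaE between two equally sized pixel lists."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must be resampled to the same size first")
    return max(
        (delta_e_cie76(srgb_to_lab(p), srgb_to_lab(q))
         for p, q in zip(pixels_a, pixels_b)),
        default=0.0,
    )

def images_match(pixels_a, pixels_b, threshold=45.0):
    """Hypothetical decision rule: treat the images as the same picture
    when no pixel differs by more than the threshold."""
    return max_delta_e(pixels_a, pixels_b) <= threshold
```

Taking the maximum rather than the mean makes the check sensitive to small local changes, at the cost of being more easily tripped by isolated compression artifacts.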
You may find User:Fæ/Imagehash an interesting case study. Image hashes like this are efficient to generate and would be sufficient. -- (talk) 12:32, 4 October 2017 (UTC)
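As a rough illustration of what an image hash can look like, here is a small Python sketch of an "average hash", one common kind of perceptual hash. It is not necessarily the method used in the case study mentioned above. It assumes the image is supplied as a 2D list of grayscale values at least `hash_size` pixels in each dimension, and all names are hypothetical.

```python
def average_hash(gray, hash_size=8):
    """Return a hash_size*hash_size-bit integer hash of a grayscale image.

    gray is a 2D list (rows of pixel values); both dimensions must be
    at least hash_size so that every cell contains at least one pixel.
    """
    h, w = len(gray), len(gray[0])
    cells = []
    # Downscale by averaging the pixels that fall into each grid cell
    for i in range(hash_size):
        r0, r1 = i * h // hash_size, (i + 1) * h // hash_size
        for j in range(hash_size):
            c0, c1 = j * w // hash_size, (j + 1) * w // hash_size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: set when the cell is brighter than the overall mean
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two copies of the same picture at different resolutions should give hashes with a small Hamming distance, while unrelated pictures usually differ in many bits.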
If the current approach fails (for example, because of JPEG artifacts, though I can try to simulate them), I will try different algorithms. — Vort (talk) 12:39, 4 October 2017 (UTC)
Status codes for 200 random images:

Code                  Count     %
NoHighResolution        157  78.5
FlickrErrorNotFound      24  12.0
Success                  10   5.0
NoFlickrLink              6   3.0
OriginalNotFound          2   1.0
ImagesNotEqual            1   0.5

Vort (talk) 13:39, 4 October 2017 (UTC)
Interesting, which of these would trigger a reupload? We are on a bit of a tangent, but out of curiosity are these error types you created or are they based on FlickrAPI codes? Thanks -- (talk) 13:56, 4 October 2017 (UTC)
Only the Success code will trigger an upload. NoFlickrLink is triggered by files like File:-Sssh - someone might hear.-.jpg, which are in Category:Flickr images uploaded by Flickr upload bot but have no "Uploaded from *** using Flickr upload bot" comment. OriginalNotFound means that the Flickr sizes dropdown list has no "Original" entry (example: File:Art Of Dying - 2007.11.20.jpg [2085812490]). NoHighResolution means that the "Original" entry is present but has the same resolution as the copy on Commons. FlickrErrorNotFound is the only actual Flickr error (strangely, it covers not only 404 but also 403 responses; examples: File:Armenian Qarhunj01.jpg [404], File:Azulejo panel in Lisbon - Jul 2008.jpg [403]). ImagesNotEqual contains images that the comparison algorithm detected as different (example: File:Azulejos - Córdoba (España) 001.jpg). Maybe I will add some more status codes in the future. — Vort (talk) 14:26, 4 October 2017 (UTC)
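The status codes described above can be read as a chain of checks. The Python sketch below shows one possible ordering and is not taken from the bot's C# source. The `Candidate` fields, the order of the checks, and the use of pixel count to decide whether the Flickr original is larger are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Status(Enum):
    SUCCESS = "Success"
    NO_FLICKR_LINK = "NoFlickrLink"
    FLICKR_ERROR_NOT_FOUND = "FlickrErrorNotFound"
    ORIGINAL_NOT_FOUND = "OriginalNotFound"
    NO_HIGH_RESOLUTION = "NoHighResolution"
    IMAGES_NOT_EQUAL = "ImagesNotEqual"

@dataclass
class Candidate:
    """Hypothetical record of what is known about one Commons file."""
    flickr_photo_id: Optional[str]           # None if no Flickr upload comment was found
    flickr_found: bool                       # False if Flickr answered 404 or 403
    original_size: Optional[Tuple[int, int]] # None if Flickr offers no "Original" size
    commons_size: Tuple[int, int]            # (width, height) of the Commons copy
    images_equal: bool                       # result of the pixel comparison

def classify(c: Candidate) -> Status:
    if c.flickr_photo_id is None:
        return Status.NO_FLICKR_LINK
    if not c.flickr_found:
        return Status.FLICKR_ERROR_NOT_FOUND
    if c.original_size is None:
        return Status.ORIGINAL_NOT_FOUND
    orig_w, orig_h = c.original_size
    com_w, com_h = c.commons_size
    # Assumption: "no high resolution" means the original is not larger
    if orig_w * orig_h <= com_w * com_h:
        return Status.NO_HIGH_RESOLUTION
    if not c.images_equal:
        return Status.IMAGES_NOT_EQUAL
    return Status.SUCCESS  # the only status that leads to an upload
```

Ordering the cheap metadata checks before the pixel comparison means the expensive image download and comparison only happen for files that could actually be upgraded.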
@Vort: NoHighResolution would be better as NoHigherResolution. Also, you may want to check the licensing in the template, description page, and Flickr against each other.   — Jeff G. ツ 23:10, 4 October 2017 (UTC)
CC-BY-SA licenses are non-revocable, so I prefer to trust Flickr upload bot. — Vort (talk) 03:25, 5 October 2017 (UTC)
Could a Flickr user change licence and then upload a higher-resolution version, so that the high-res version is under a more restrictive licence than the low-res version was? Geograph Update Bot does a similar job, and it refuses to overwrite a file when the attribution text has changed. --bjh21 (talk) 19:25, 7 October 2017 (UTC)
"if the low-resolution and high-resolution copies are the same work under applicable copyright law, permission under a CC license is not limited to a particular copy, and someone who receives a copy in high resolution may use it under the terms of the CC license applied to the low-resolution copy.", "digitally enhancing or changing the format of a work absent some originality, such as expressive choices made in the enhancement or encoding, will not likely create a separate work for copyright purposes". I am checking images to be sure they are equal. — Vort (talk) 05:23, 8 October 2017 (UTC)
Excellent. I wasn't aware of that rule. --bjh21 (talk) 10:04, 8 October 2017 (UTC)
FYI, this may not be true for v4; there was a long debate about it on Commons and it remains a controversial topic. However, as Flickr currently uses older versions, it is not an immediate concern. -- (talk) 11:10, 11 October 2017 (UTC)
I have implemented multipart/form-data support, so additional permissions are no longer required. I will make a test run soon. — Vort (talk) 07:27, 5 October 2017 (UTC)
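To show what multipart/form-data support involves, here is a minimal Python sketch that builds such a request body by hand for a direct file upload. It is not the bot's C# implementation. The helper name and its parameters are hypothetical, it does not escape quotes in field names or filenames, and it performs no network request.

```python
import uuid

def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="application/octet-stream", boundary=None):
    """Return (body, content_type_header) for a multipart/form-data POST.

    fields is a dict of plain text form fields; the file is sent as one
    extra part named file_field.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The file part carries its own filename and content type
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
        + file_bytes + b"\r\n"
    )
    # The closing delimiter has two extra trailing dashes
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

For a MediaWiki `action=upload` request, the text fields would typically include `action`, `format`, `filename`, `comment`, and `token`, with the image sent in the `file` part. Sending the bytes this way avoids the upload-by-URL route, which is what required the extra permissions.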
The test run is finished. Here are the statistics on the MaxDeltaE criterion: [1]. The x axis shows the MaxDeltaE value, and the y axis shows the number of processed images. Yellow bars are images classified as identical; the blue bar is images classified as different. The images at values 50 and 80 are in fact duplicates too, but I think it is safer to leave the threshold at 45. If my edits are considered good, I will continue the processing. — Vort (talk) 13:13, 5 October 2017 (UTC)
I think your upload comments, “better quality”, are rather short. I think they should at least note that the new version comes from Flickr, and maybe that it's only the resolution that's changed. — Preceding unsigned comment added by Bjh21 (talk • contribs) 19:25, 7 October 2017 (UTC)
Bjh21: My English is poor, so I'm not sure what the comment should look like. Is "Uploading higher resolution from Flickr" fine? — Vort (talk) 05:10, 8 October 2017 (UTC)
That would be good, yes. That makes it clear what the change is and where it came from. --bjh21 (talk) 10:04, 8 October 2017 (UTC)
Done. — Vort (talk) 11:39, 8 October 2017 (UTC)

Approved. --Krd 07:11, 16 October 2017 (UTC)