User:Geograph Update Bot

This user has uploaded images to Wikimedia Commons.

To contact the operator, use his talk page.

This bot's general purpose is to operate on images from Geograph Britain and Ireland that are already on Commons, improving them by copying information from Geograph as appropriate.

It currently has four permitted tasks:

The bot's code is available on Wikimedia GitLab.

Resolution improvement

This task runs once a week over new uploads (either on Commons or Geograph). When running, it operates at a maximum of one edit per minute.
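A rate cap such as "one edit per minute" can be enforced with a small throttle. The sketch below is illustrative only and is not taken from the bot's code; the class name EditThrottle and its parameters are hypothetical. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class EditThrottle:
    """Space out edits so that at most `edits_per_minute` happen per minute."""

    def __init__(self, edits_per_minute, clock=time.monotonic, sleep=time.sleep):
        # Minimum number of seconds between two consecutive edits.
        self.interval = 60.0 / edits_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no edit made yet

    def wait(self):
        """Block until the next edit is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling wait() before each edit with EditThrottle(1) gives the one-edit-per-minute cap described here; EditThrottle(12) would give a twelve-edits-per-minute cap.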

Background

In the beginning, Geograph only stored images up to 640px in each dimension. The Geograph software would downscale (initially badly) any larger image that was uploaded. Geograph now permits larger uploads, including uploading larger versions of existing images. The 640px image is still special, though, in that it's stored separately and is immutable, and it's what's displayed on the Geograph Web site.

Because of this, many images from Geograph had a higher-resolution version available on Geograph than was on Commons. While 40% of the images on Geograph have versions over 640px, fewer than 5% of the images from Geograph on Commons were that large before Geograph Update Bot started upgrading them. This is largely because Commons is biased towards earlier Geograph images, but even allowing for that it seemed that there might be 50,000 images on Commons that could be improved by uploading a higher-resolution version from Geograph in accordance with the guideline on overwriting existing files. The Geograph Update Bot's first task was to upload those higher-resolution versions.

In 2018, Geograph started presenting slightly higher-resolution images on its Web pages, leading to a number of uploads of these intermediate-resolution images. The bot's code has been extended to handle overwriting these.

Results

On its first pass through the database, the bot uploaded 8,910 images, with another 420 found on a second pass (largely from added {{Geograph}} templates). That's rather fewer than expected, but still a worthwhile effort. The bot will continue to be run occasionally to pick up newly-uploaded images.

Method

The bot extracts the Geograph ID for an image from its {{Geograph}} template. It checks the image dimensions against a dump of the gridimage_size table, and copies across the Geograph full-resolution image if all of these criteria are fulfilled:

  • No version of the image has been uploaded by User:Geograph Update Bot already (since 27 August 2017)
  • There is precisely one {{Geograph}} template in the image description.
  • The {{Geograph}} template contains a valid Geograph ID.
  • That image has a high-resolution version on Geograph.
  • The aspect ratios of the old and new images agree within 1% or are inverses within 1% (since 25 August 2017).
  • The current image on Commons has the same dimensions as the 640px image on Geograph.
  • The current image on Commons has the same SHA-1 as the 640px image on Geograph.
  • The attribution specified by the {{Geograph}} template is the same as the attribution specified on Geograph.
  • If the image on Commons has a {{Credit line}}, it specifies the same title as the image has on Geograph.
  • No warnings (apart from overwriting an existing file) are generated by the upload.
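Three of the criteria above are mechanical enough to sketch in code: finding a single valid {{Geograph}} template, comparing aspect ratios, and comparing SHA-1 hashes. This is a minimal illustration rather than the bot's actual code. The function names are hypothetical, and the regular expressions assume that {{Geograph}} takes the numeric Geograph ID as its first positional parameter.

```python
import hashlib
import re

# Matches the start of any {{Geograph}} template (assumed syntax).
TEMPLATE_RE = re.compile(r"\{\{\s*[Gg]eograph\s*(?:\||\}\})")
# Matches a {{Geograph}} template whose first parameter is all digits.
ID_RE = re.compile(r"\{\{\s*[Gg]eograph\s*\|\s*(\d+)\s*(?:\||\}\})")


def single_geograph_id(wikitext):
    """Return the Geograph ID if there is exactly one {{Geograph}}
    template and it carries a numeric ID; otherwise return None."""
    if len(TEMPLATE_RE.findall(wikitext)) != 1:
        return None
    match = ID_RE.search(wikitext)
    return int(match.group(1)) if match else None


def aspect_ratios_agree(old_size, new_size, tolerance=0.01):
    """True if the (width, height) pairs have aspect ratios that agree
    within `tolerance`, or are inverses of each other within `tolerance`
    (for example, when one image is rotated by 90 degrees)."""
    old_ratio = old_size[0] / old_size[1]
    new_ratio = new_size[0] / new_size[1]
    same = abs(new_ratio / old_ratio - 1) <= tolerance
    inverse = abs(new_ratio * old_ratio - 1) <= tolerance
    return same or inverse


def same_sha1(data_a, data_b):
    """True if two byte strings have the same SHA-1 digest."""
    return hashlib.sha1(data_a).hexdigest() == hashlib.sha1(data_b).hexdigest()
```

In practice the SHA-1 of the current Commons file would more likely be read from file metadata than recomputed; same_sha1 only shows the comparison itself.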

It then compares the new and old 120px thumbnails. If they differ by more than a specified amount, it adds the file to Category:Dubious uploads by Geograph Update Bot for human attention.
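The page does not say how the thumbnails are compared or what the threshold is. As one possible approach, the sketch below computes a normalised mean absolute difference between two equal-sized greyscale thumbnails given as flat sequences of 0 to 255 pixel values. The function names, the metric, and the default threshold of 0.1 are all assumptions made for illustration.

```python
def mean_abs_difference(pixels_a, pixels_b):
    """Mean absolute per-pixel difference, scaled to the range 0.0 to 1.0.

    Both arguments are flat sequences of greyscale values (0 to 255)
    taken from thumbnails of identical dimensions.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("thumbnails must have the same number of pixels")
    if not pixels_a:
        raise ValueError("thumbnails must not be empty")
    total = sum(abs(a - b) for a, b in zip(pixels_a, pixels_b))
    return total / (255 * len(pixels_a))


def is_dubious(pixels_a, pixels_b, threshold=0.1):
    """True if the thumbnails differ by more than the (assumed) threshold."""
    return mean_abs_difference(pixels_a, pixels_b) > threshold
```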

Location updates

This task runs once a week over new uploads (either on Commons or Geograph). When running, this task operates at a maximum of twelve edits per minute.

Background

All photographs on Geograph are geolocated to some extent. Every photo has a subject location recorded, and 95% have a camera location as well. Locations in Great Britain and the Isle of Man are recorded in the British National Grid, while locations in Ireland use the Irish Grid. These locations are frequently used on Commons as well. Locations can be updated on Geograph, and when that happens, it's helpful to update Commons to match.

Method

This task operates on files from Geograph whose {{Location}} and {{Object location}} templates are tagged with source:geograph, and those files from Geograph that have no locations at all.

The bot updates templates where the location on Geograph is significantly different. If the existing template has a source grid reference, then the bot simply compares that with the new grid reference. If there's no existing grid reference then the bot looks for a significant change in WGS-84 co-ordinates.
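The two-branch comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the bot's code: the function names are hypothetical, the 10-metre threshold for a "significant" change in WGS-84 co-ordinates is invented for the example, and the distance is a spherical (haversine) approximation.

```python
import math
import re

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def normalise_gridref(ref):
    """Strip whitespace and upper-case a grid reference for comparison."""
    return re.sub(r"\s+", "", ref).upper()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_changed(old_gridref, new_gridref, old_latlon, new_latlon, threshold_m=10.0):
    """Decide whether a template needs updating.

    If the existing template records a source grid reference, compare
    grid references directly. Otherwise fall back to the distance
    between the old and new WGS-84 co-ordinates.
    """
    if old_gridref is not None:
        return normalise_gridref(old_gridref) != normalise_gridref(new_gridref)
    return haversine_m(*old_latlon, *new_latlon) > threshold_m
```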

When the bot updates a template, it also checks if there's a coordinates of depicted place (P9149) or coordinates of the point of view (P1259) statement with the same co-ordinates as the existing template. If there is, it assumes that the statement came from Geograph and updates (or deletes) it accordingly. If the statement is updated, the bot also adds a suitable reference to Geograph. If the bot is adding a geocoding template and there is no corresponding statement, it will add a suitable statement, but only if it's also changing structured data for another reason.

Before 7 September 2021 20:12Z, the bot updated coordinate location (P625) instead of coordinates of depicted place (P9149).

Before 22 January 2022 19:15Z, the bot did not add references.

Before 13 February 2022 13:50Z, the bot did not add statements to structured data.

Sample templates

File:Woodchester Mansion - geograph.org.uk - 4.jpg
8-figure subject and camera references; use6fig set
{{Location|51.71051|-2.2766|source:geograph-osgb36(SO80980134)_heading:292|prec=100}}
{{Object location|51.71069|-2.2773|source:geograph-osgb36(SO80930136)_heading:292|prec=100}}
File:Lake at Woodchester Park - geograph.org.uk - 5.jpg
4-figure subject reference only, but moderated as Geograph
{{Location|51.712|-2.25|source:geograph-osgb36(SO8201)|prec=1000}}
{{Object location|51.712|-2.25|source:geograph-osgb36(SO8201)|prec=1000}}
Would have had the camera location removed after 28 November 2017 22:30Z.
File:Raised shoreline and creep terracettes - geograph.org.uk - 1803781.jpg
4-figure subject and camera references
{{Location|55.174|-4.93|source:geograph-osgb36(NX1390)_heading:225|prec=1000}}
{{Object location|55.174|-4.93|source:geograph-osgb36(NX1390)_heading:225|prec=1000}}
Would have had the camera location removed after 28 November 2017 22:30Z.
File:Ogham stones near Baile Mhic Íre (Ballymakeery) - geograph.org.uk - 2913.jpg
6-figure subject reference only; Ireland
{{Location|51.935|-9.16|source:geograph-irishgrid(W2076)|prec=1000}}
{{Object location|51.9360|-9.152|source:geograph-irishgrid(W208765)|prec=100}}
Would have had the camera location removed after 28 November 2017 22:30Z.
File:Captain's Pool - geograph.org.uk - 715.jpg
10-figure camera reference; 4-figure subject reference; no view direction
{{Location|52.372194|-2.22568|source:geograph-osgb36(SO8473274929)|prec=1}}
{{Object location|52.368|-2.23|source:geograph-osgb36(SO8474)|prec=1000}}
Would not have had object location added after 28 November 2017 22:30Z.
File:Fossilised tree stumps near Lulworth Cove - geograph.org.uk - 15.jpg
4-figure subject reference only, moderated as supplemental
{{Object location|50.615|-2.23|source:geograph-osgb36(SY8379)|prec=1000}}
File:Clifton Road Bridge, Clifton - Brighouse - geograph.org.uk - 190630.jpg
Geograph has no camera location recorded.
{{Object location|53.7028|-1.775|source:geograph-osgb36(SE149229)_heading:180|prec=100}}
File:The Stotts Arms updated, Wakefield Road, Brighouse - geograph.org.uk - 924556.jpg
Geograph displays the camera as SE 149 229. Internally, it was coded as the 6 figure grid ref SE 149 229.
{{Location|53.7028|-1.775|source:geograph-osgb36(SE149229)_heading:315|prec=100}}
{{Object location|53.7028|-1.775|source:geograph-osgb36(SE149229)_heading:315|prec=100}}
File:J W Lister Ltd Wireworkers - Clifton Road - geograph.org.uk - 802290.jpg
Geograph displays the camera as SE 149 229. Internally, it was coded as the 8 figure grid ref SE14942298, but with a command to drop to a 6 figure location on display.
{{Location|53.70309|-1.7751|source:geograph-osgb36(SE14942298)_heading:292|prec=100}}
{{Object location|53.70318|-1.7756|source:geograph-osgb36(SE14912299)_heading:292|prec=100}}
http://www.geograph.org.uk/photo/3419425
has not been uploaded to Commons, but Geograph displays the camera as SE 1493 2291. It was coded as an 8 figure grid ref.
{{Location|53.70246|-1.7753|source:geograph-osgb36(SE14932291)_heading:0|prec=10}}
{{Object location|53.70255|-1.7753|source:geograph-osgb36(SE14932292)_heading:0|prec=10}}
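The source parameters in these samples follow a regular shape, and the prec values are consistent with the number of digits in the grid reference (4 figures gives 1000 m, 6 gives 100 m, 8 gives 10 m, 10 gives 1 m, with 8-figure references reduced to 100 m when Geograph displays them as 6 figures). The sketch below parses the parameter and derives the precision on that basis. The rule is inferred from the samples rather than from the bot's code, and the function names are hypothetical.

```python
import re

SOURCE_RE = re.compile(
    r"source:geograph-(?P<grid>osgb36|irishgrid)"
    r"\((?P<ref>[A-Z]{1,2}\d*)\)"
    r"(?:_heading:(?P<heading>\d+))?"
)


def parse_source(param):
    """Split a source parameter into grid system, grid reference and
    optional heading. Returns None if the parameter does not match."""
    match = SOURCE_RE.fullmatch(param)
    if match is None:
        return None
    heading = match.group("heading")
    return {
        "grid": match.group("grid"),
        "ref": match.group("ref"),
        "heading": int(heading) if heading is not None else None,
    }


def precision_m(gridref, use6fig=False):
    """Precision in metres implied by a grid reference.

    `use6fig` models Geograph's option to display at most a 6-figure
    reference, which the samples show as prec=100.
    """
    digits = sum(ch.isdigit() for ch in gridref)
    if digits % 2 or not 2 <= digits <= 10:
        raise ValueError("unexpected number of digits in grid reference")
    if use6fig and digits > 6:
        digits = 6
    return 10 ** (5 - digits // 2)
```

For example, parse_source applied to the camera parameter of the Woodchester Mansion sample yields the reference SO80980134 with heading 292, and precision_m("SO80980134", use6fig=True) gives 100, matching the prec value in that template.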

Location correction

This job was a one-time run to correct the mistakes of GeographBot.

Background

All photographs on Geograph are geolocated to some extent. Every photo has a subject location recorded, and 95% have a camera location as well. Locations in Great Britain and the Isle of Man are recorded in the British National Grid, while locations in Ireland use the Irish Grid. There are also WGS84 geodetic co-ordinates in the Geograph database, and these (roughly) correspond with the subject location, but these are not actually used by Geograph.

When GeographBot imported 1.7 M images from Geograph, it generated {{Location}} templates from the WGS84 columns of the Geograph database. Since {{Location}} is meant to contain the camera location, this meant that a lot of locations were incorrect.

Method

The bot operates on files uploaded by GeographBot. It constructs a new {{Object location}} template based on the subject location from the Geograph database. If a viewpoint location is recorded (or implied), it also constructs a {{Location}} template, but only if the new camera location has a precision better than 1km.

The existing {{Location dec}} template is removed, and replaced by the new {{Location}}, if any, if all of the following conditions are met:

The new {{Object location}} is added if the file doesn't already have an {{Object location}} and either the new object location has a precision better than 1km or there is no camera location.
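The precision rules for which templates get written can be summarised as a small decision function. This is a sketch of one reading of the rules, not the bot's code: the name plan_templates is hypothetical, precisions are in metres, and "there is no camera location" is interpreted as "no {{Location}} template is being produced", which is consistent with the sample templates listed under Location updates.

```python
def plan_templates(subject_prec_m, camera_prec_m, has_object_location):
    """Return (add_location, add_object_location).

    subject_prec_m      precision of the subject grid reference, in metres
    camera_prec_m       precision of the camera grid reference, or None if
                        no viewpoint is recorded or implied
    has_object_location whether the file already has {{Object location}}
    """
    # A camera location is only written if it is better than 1 km.
    add_location = camera_prec_m is not None and camera_prec_m < 1000
    # An object location is written if none exists yet, and it is either
    # better than 1 km or there is no camera location to show instead.
    add_object = (not has_object_location) and (
        subject_prec_m < 1000 or not add_location
    )
    return add_location, add_object
```

Under this reading, a 10-figure camera reference with a 4-figure subject reference gives (True, False), and a 4-figure reference for both gives (False, True).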

Before 27 November 2017 22:30Z, the bot didn't treat 1km precision specially.

The update is flagged as a minor edit if it is only replacing {{Location dec}} with {{Location}} and the locations differ by less than the grid-reference precision.

Tagging locations

To simplify future work, the bot can add source parameters to {{Location dec}} templates that lack them where those locations came from Geograph. These are generally added in batch runs. When they occur, they run at a maximum of twelve edits per minute.

Background

Geocoding templates like {{Location}} and {{Object location}} can be marked with a source parameter to indicate where the co-ordinates came from. Co-ordinates generated by Geograph since 2018, and those added by Geograph Update Bot, are already tagged with source:geograph, but many templates derived from Geograph co-ordinates are not tagged. Tagging them would help the bot to recognise them in future as eligible to be updated from Geograph.

Method

There are various cases where the bot will tag a {{Location dec}} template:

  • Where it is identical to that on the first revision of the file description page, and that first revision was generated by geograph2commons or by GeographBot. The current, OAuth-based geograph2commons is identified from its edit summary. The older version is assumed to be responsible for all Geograph uploads by File Upload Bot (Magnus Manske).
  • Where it is identical to one added by DschwenBot with the edit summary "adding missing Location data from www.geograph.org.uk".
  • Where it has been edited by BotMultichill with the edit summary "Fixing location" and is now identical to the edited version.
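The three cases above amount to a check over the page's revision history. The sketch below models that check over a simplified, hypothetical data structure. The Revision class and its fields are assumptions, and how a first revision is recognised as coming from geograph2commons or GeographBot is reduced to a single flag.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

DSCHWEN_SUMMARY = "adding missing Location data from www.geograph.org.uk"
MULTICHILL_SUMMARY = "Fixing location"


@dataclass
class Revision:
    user: str
    summary: str
    location: Optional[str]  # text of {{Location dec}} in this revision
    # Hypothetical flag: first revision made by geograph2commons or GeographBot.
    from_geograph_upload: bool = False


def came_from_geograph(revisions: Sequence[Revision], current: Optional[str]) -> bool:
    """True if the current {{Location dec}} text can be traced to Geograph.

    `revisions` is ordered oldest first.
    """
    if not revisions or current is None:
        return False
    first = revisions[0]
    if first.from_geograph_upload and first.location == current:
        return True
    for rev in revisions[1:]:
        if rev.location != current:
            continue
        if rev.user == "DschwenBot" and rev.summary == DSCHWEN_SUMMARY:
            return True
        if rev.user == "BotMultichill" and rev.summary == MULTICHILL_SUMMARY:
            return True
    return False
```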

Credit lines

Most files from Geograph Britain and Ireland now have credit lines. This task is being run occasionally by hand since phab:T262750 shows no signs of progress. When running, this task operates at a maximum of twelve edits per minute.

Background

Geograph pictures have titles, and CC BY-SA 2.0 requires that these titles be conveyed along with them. Most Geograph pictures on Commons have their titles in their descriptions or filenames, but these sometimes get edited and there's no indication that they need to be preserved when the pictures are re-used. On Commons, the {{Credit line}} template is used to store information like this that's required to be kept with a work.

Method

The bot skips any file that already has a {{Credit line}} template, on the assumption that it's correct. It only works on images with a {{Geograph}} template, since the licensing arrangements for {{Also geograph}} images may be different. It only operates on images where the author name in the Geograph database matches that in the {{Geograph}} template, since otherwise there would be a danger of ending up with a title/author combination that had never actually been licensed. If all the preconditions are met then the bot adds a {{Credit line}} template to the other fields parameter of {{Information}}, adding the parameter if necessary.
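The preconditions can be expressed as a short predicate. This is a minimal sketch, not the bot's code: the function names are hypothetical, template detection is done with a simple regular expression rather than a wikitext parser, and author names are compared for exact equality after trimming whitespace, which is an assumption about what "matches" means.

```python
import re


def has_template(wikitext, name):
    """True if `wikitext` contains a template called `name`.

    The first letter is matched case-insensitively, as MediaWiki does.
    """
    first = "[" + name[0].upper() + name[0].lower() + "]"
    pattern = r"\{\{\s*" + first + re.escape(name[1:]) + r"\s*(?:\||\}\})"
    return re.search(pattern, wikitext) is not None


def should_add_credit_line(wikitext, template_author, database_author):
    """Apply the three preconditions for adding a {{Credit line}}."""
    if has_template(wikitext, "Credit line"):
        return False  # assume the existing credit line is correct
    if not has_template(wikitext, "Geograph"):
        return False  # {{Also geograph}} files may be licensed differently
    return template_author.strip() == database_author.strip()
```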

Reports

The bot can scan the Geograph images on Commons to detect anomalies that might merit human attention. These reports generally run once a week. Current reports are:

Method

These reports involve scanning Category:Images from Geograph Britain and Ireland and collecting the list of Geograph IDs that are represented on Commons. The bot can then detect duplicates in that list and identifiers that don't appear in the Geograph database.
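Both checks reduce to simple set and multiset operations once the IDs have been collected. The sketch below assumes the IDs are already available as plain integers; the function name find_anomalies and the input shapes are hypothetical.

```python
from collections import Counter


def find_anomalies(commons_ids, geograph_ids):
    """Return (duplicates, unknown).

    commons_ids   iterable of Geograph IDs found on Commons, one per file
    geograph_ids  set of IDs present in the Geograph database
    duplicates    IDs that appear on more than one Commons file
    unknown       IDs used on Commons that Geograph does not know about
    """
    counts = Counter(commons_ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    unknown = sorted(i for i in counts if i not in geograph_ids)
    return duplicates, unknown
```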