User:Martinvl/sandbox

From Wikimedia Commons, the free media repository

It is mandatory in Commons that all files are allocated to one or more categories. Although there are rules regarding categories, this essay aims to put those rules into perspective.

Three main types of user categories are discussed - categories that host images, categories that define attributes that might be applied to files, and categories that are designed to help humans navigate through Commons. At times, logic might suggest that a group of images hosted in one category should be broken up into a number of different categories, each reflecting a different additional attribute of that set of files. The writer of this essay proposes that this should normally be done only if the number of images in the original category is too large to be handled as a single set. There is a trade-off between having too many images on a single screen and having so many sub-categories that the reader has to open a large number of small categories when looking for suitable files. Allowing for that trade-off, it is suggested that categories with fewer than 20 members are prime cases for being left unchanged, but that categories hosting more than 200 members are prime candidates for being split into separate sub-categories.

Functions of categories

Categories serve a number of distinct functions. These include:

  • providing a host for images.
  • defining attributes that the image might use.
  • providing a navigation aid through Commons.

Hosting images

Categories form a multi-hierarchical tree. Using standard hierarchical network terminology, the children of categories can be either subtrees or leaves. Children that are subtrees are themselves categories; children that are leaves are files. Every child in a Commons tree must have at least one parent and may have several. Although from a data-handling point of view it might be considered poor practice for a category to host both sub-categories and images, in practice this is not uncommon.

Attribute definition

Water ripples illustrating Bessel's function
Church at Camp Bastion, the main British Army base in Afghanistan during Operation Herrick (2002-2014). When this page was written, this was the only image in Commons of a church in Afghanistan.

One way to describe attributes is to do so by example. The image on the right depicts the church used by British Forces in Afghanistan during Operation Herrick. The photo was taken by a British Army photographer and was filed in the Army records with the keywords "Helmand", "Afganistan", "Afghanistan", "Herrick", "Campaign", "Op", "Operation", "Army", "Camp", "Bastion", "Church", "Religion", "Worship", "Place". These keywords map onto Commons attributes. As can be seen, there are two distinct families of attributes - the military attributes and the religious attributes.

The military categories are split into two families - location (Camp Bastion) and operation (Operation Herrick). The image itself has entries in the categories "Camp Bastion" and "Operation Herrick", both of which map onto Ministry of Defence (MoD) keywords - one an operation and the other a location. Both categories have a common ancestor category in "War in Afghanistan (2001-present)".

  • Camp Bastion Church MOD 45150966.jpg
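
The multi-parent structure and the common ancestor described above can be sketched in a few lines of Python. This is only an illustration, not how Commons stores categories: the parent links are a simplifying assumption (the essay says only that the two categories share an ancestor, not that it is their direct parent), the church branch is omitted, and the function names are hypothetical.

```python
from typing import Dict, Set

# Child -> set of parents. A file is a leaf; a category may itself have parents.
# ASSUMPTION: the direct parent links are simplified for illustration; the real
# Commons tree may have intermediate categories between these levels.
PARENTS: Dict[str, Set[str]] = {
    "Camp Bastion Church MOD 45150966.jpg": {"Camp Bastion", "Operation Herrick"},
    "Camp Bastion": {"War in Afghanistan (2001-present)"},
    "Operation Herrick": {"War in Afghanistan (2001-present)"},
    "War in Afghanistan (2001-present)": set(),
}


def ancestors(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every category reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def common_ancestors(a: str, b: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the categories that are ancestors of both a and b."""
    return ancestors(a, parents) & ancestors(b, parents)


shared = common_ancestors("Camp Bastion", "Operation Herrick", PARENTS)
```

Because a child maps to a set of parents rather than to a single parent, the same file can sit under both the location category and the operation category without being duplicated.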

The category tree related to churches follows a standard pattern seen in many Commons trees. Immediately below the main entry are a number of categories of the type "XXX by location", "XXX by type" and many other such attributes that are specific to the main category. There are also a number of categories that are not grouped - one such category that is used in this example is "Temporary churches".

  • Camp Bastion Church MOD 45150966.jpg

Navigation Aid

A navigation category has many similar properties to a library shelf. In some village libraries, the medical section might not fill a full shelf. In this library however, several shelving units are dedicated to nursing alone.

If conventional network theory is applied, then many Commons categories are redundant. However, they assist with human navigation. This can be illustrated with a few examples.

In the tree in the preceding section, the category "Churches by country" does not add anything to the understanding of the image itself. There are many hundreds of categories that potentially describe attributes of groups of churches. If all of these categories were in a single list, that list might well be difficult for a human to navigate. Thus the category "Churches by country" was introduced to assist in human navigation. This single navigation-type category allows nearly 200 potential attributes that are very similar in nature to be grouped together. Other navigation-type categories combine two or more attributes into one to help the reader narrow down the number of images in the search list. One such example concerns churches dedicated to St Peter. The number of such churches is huge, so the category Saint Peter churches in France was introduced. According to network theory, this adds nothing to the attributes of the image, but it does assist humans in navigating through churches in France if they are interested in navigating by patron saint.
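
The claim that an intersection category adds nothing in theory can be illustrated with set operations. The following is a minimal sketch in Python; the file names and attribute values are invented placeholders, and `members` is a hypothetical helper, not a Commons function.

```python
from typing import Dict, Set

# HYPOTHETICAL files and attribute values, invented purely for illustration.
FILES: Dict[str, Dict[str, str]] = {
    "Church A.jpg": {"country": "France", "patron": "Saint Peter"},
    "Church B.jpg": {"country": "France", "patron": "Saint Paul"},
    "Church C.jpg": {"country": "Italy", "patron": "Saint Peter"},
}


def members(attribute: str, value: str) -> Set[str]:
    """Return the files whose given attribute has the given value."""
    return {name for name, attrs in FILES.items() if attrs.get(attribute) == value}


in_france = members("country", "France")
saint_peter = members("patron", "Saint Peter")

# The "intersection category" can be derived from the two attribute sets,
# which is why it is redundant from a purely theoretical point of view.
saint_peter_in_france = in_france & saint_peter
```

A machine can compute the intersection on demand; a human browsing category pages cannot, which is the practical case for keeping such categories.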

Adding categories

When is it appropriate to add new categories? If a new attribute is being introduced, then it is obvious that a new category should be introduced to reflect that attribute. The new category should be introduced at the appropriate level - in the case of the category "Temporary churches" it is appropriate that it be a child category of the category "Churches". Its members should be the churches themselves, with or without intermediate categories.

When should one have intermediate categories? Consider the difference between "Churches in France" and "Churches in Afghanistan". It could be argued that both should have similar structures, but "Churches in Afghanistan" has a single member whereas in France churches are categorised by multiple levels of geographical sub-region as is shown in the table below:

  • A number of images

If Afghanistan and France are compared, France is a wealthy country (and therefore has a high proportion of Commons photographers) whereas Afghanistan is a poor country with relatively few Commons photographers. Moreover, France has a Christian heritage whereas Afghanistan has a Muslim heritage. It is little surprise therefore to find that there are substantially more images of French churches than of Afghan churches. The introduction of classification by region, department, etc. is merely a device to make things manageable on a human scale.

Many categories are an intersection between two attributes that are of equal importance, such as the category Saint Peter churches in France. This is a trickier question. From a theoretical point of view, categories that are formed from the intersection of two sets of attributes are unnecessary. However, from a human point of view, they often prove very useful. My proposition is that one should look at the number of entries in the category. If the number is fewer than 20 (less than a screen-full), then unless there is a very good reason one should avoid creating categories that are intersections of attributes. They do not help in any automated filtering that might be done and they make navigating the structure long-winded for humans. That is why the single image in the category "Churches in Afghanistan" links directly to that category, whereas in France, the link between "Churches in France" and the actual churches themselves is via a hierarchy. So when should one look at splitting categories up? In my view, once the number of entries gets above 100 then one should be actively looking at ways of splitting the category up.
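
The rule of thumb above can be written down as a small function. This is a sketch under stated assumptions: `split_advice` and its return labels are hypothetical, and the thresholds default to the figures in this section (20 and 100). Both are parameters because the opening summary of the essay gives 200 as the upper figure.

```python
def split_advice(member_count: int, keep_below: int = 20, split_above: int = 100) -> str:
    """Suggest what to do with a category, given how many members it has.

    ASSUMPTION: the labels "keep", "judgement" and "split" are illustrative
    names only; the essay does not define them.
    """
    if member_count < keep_below:
        # Less than a screen-full: avoid creating sub-categories.
        return "keep"
    if member_count > split_above:
        # Large enough that one should actively look for ways to split.
        return "split"
    # In between, the essay gives no firm rule.
    return "judgement"


advice_for_single_image = split_advice(1)
advice_with_summary_threshold = split_advice(150, split_above=200)
```

A category with a single member, such as "Churches in Afghanistan", falls under "keep"; a category of 150 members is a candidate for splitting under the default threshold but not under the 200 figure from the summary.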

Users' point of view

Categories are there for the benefit of the users. A suitable case study into the way in which categories are used in the real world is to examine how the editor came to select the images used in this essay.

To choose the image of the church in Camp Bastion, the editor examined the category "Churches by country" and looked for an entry with no sub-categories and as few entries as possible. "Churches in Afghanistan" met those criteria and had the added bonus of a second classification structure - a military structure - and a string of keywords that had been assigned by a professional who had nothing to do with Wikipedia or Wikimedia Commons.

For the second image, the editor's criterion was that it should show the interior of a library. The category Interiors of libraries has a fairly large number of entries (114), but not too many, which allowed him to find a suitable image without visiting sub-categories.

References and Notes

  1. In this tree, categories are shown as Wikilinks to the actual category itself, keywords supplied by the MoD are in bold while the name of the actual image shown is in bold italics.

Aide Memoire

Categories

Reflected relations

Problem of over-categorisation (Categories#Categorization_tips)

A category can combine two (or more) different criteria; such categories are called “compound categories” or “intersection categories”

Location

The geocoded location of this image has been rounded to the nearest 10 km for privacy or other reasons. Please do not improve the accuracy of the coordinates even if you are able to do so.

Pieces of Eight

Geocoordinates for Artwork

A dispute has broken out regarding which geocoordinates are required for Valued Image (VI) submissions that directly or indirectly use the {{Artwork}} template.

If the "Institution" field of this template is populated, then the geocoordinates of the institution (usually an art gallery) are drawn from Wikidata. The dispute centres on whether in such circumstances it is necessary to use the "Object location" (or similar) template to repeat the geocode. In the last few days User:Archaeodontosaurus has opposed a number of VI submissions on the grounds that they did not have a geocode, even though a geocode was already present in the "Collections" (Institution) infobox. Although I have no objection to him adding this information to his own uploads, I believe that his interpretation of the need to repeat the geocode for VIs is unorthodox and as such should not be imposed on others.

As an example, consider the fragment of code shown below (an edited version of this file, uploaded by User:Archaeodontosaurus) which I have annotated.

{{Art photo
|wikidata=Q15974346
|institution={{Institution:Gallerie dell'Accademia (Venice)}} <!-- creates geocode -->
|Source={{own}}
|photographer=[[User:Archaeodontosaurus|Didier Descouens]]
}}
{{Object location dec|45.43122|12.3283|region:IT}} <!-- repeats geocode -->

This code yields the following display. The geocoordinates of the institution can be found by expanding the "Collection" infobox. The coordinates in the expanded box were copied manually to the "Object location" template resulting in them being displayed as part of the photograph information.

Object

Domenico Fetti: Magdalene in Meditation (wikidata:Q15974346)
Artist: Domenico Fetti (wikidata:Q551695), Italian painter. Born 1589 in Rome; died 16 April 1623 / 1624 in Venice. Work period: Baroque.
Title: Magdalene in Meditation
Object type: painting / prime version
Genre: religious art
Depicted people: Mary Magdalene
Date: 1610s
Medium: oil on canvas
Dimensions: height: 179 cm (70.4 in); width: 140 cm (55.1 in)
Collection: Gallerie dell'Accademia (Venice)

Photograph

Source: Own work
Author: Didier Descouens
Object location: 45° 25′ 52.39″ N, 12° 19′ 41.88″ E

The question is: is it necessary to repeat the geocoordinates in order for the image to be a VI?

I have looked at what other uploaders do. Using a Google search with the search string "valued image artwork" and a filter limiting results to Commons files, I very quickly found seven other uploaders who used the "Artwork" template. They are listed below:

An analysis of their work showed that only User:Archaeodontosaurus repeated the geocode in the manner described above. This tells me that under established practice one set of geocoordinates is sufficient for a VI. Comments please? Martinvl (talk) 20:23, 23 September 2020 (UTC)

Comments