Commons:Village pump/Request for extension to provide metadata support

From Wikimedia Commons, the free media repository

Given that

  1. the WMF Mission is to "disseminate [educational content] effectively and globally", and
  2. redistributing our files for inclusion in others' collections is "effective dissemination", and
  3. third-party repositories often/usually require machine-readable metadata for importing large collections of data, and
  4. MediaWiki as used by Wikimedia Commons lacks a method of storing (and offering access to) machine-readable file metadata,

the Wikimedia Commons community wishes that the Wikimedia administrators would do whatever is necessary to enable a method for editors to record metadata in a machine-readable format, such as mw:Extension:RDF. An even more powerful option would be mw:Extension:Semantic MediaWiki, if it can be made to work on a large site like Commons or Wikipedia.

Metadata here refers to information such as author(s), rights information (license/s), descriptions, date, location and keywords (tags/categories). The code that defines this metadata in a machine-readable way would be integrated into templates, especially into license tags and {{tl:Information}}, so normal users wouldn't have to deal with it. This also means that it would become available for millions of images immediately.
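
As a rough illustration of what "machine-readable" could mean for the fields listed above, the sketch below serializes a few of them as RDF N-Triples using Dublin Core terms. The file URL, the field values, and the choice of Dublin Core are assumptions made for this example; nothing here is confirmed to be what mw:Extension:RDF or any Commons template actually emits.

```python
# Hypothetical sketch: serialize file metadata as N-Triples.
# The vocabulary (Dublin Core terms) is an assumption for illustration.
DC = "http://purl.org/dc/terms/"


def escape(value: str) -> str:
    """Escape a plain string for use as an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject: str, fields: dict[str, str]) -> str:
    """Emit one triple per field; URL values become IRIs, the rest literals."""
    lines = []
    for term, value in sorted(fields.items()):
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = f'"{escape(value)}"'
        lines.append(f"<{subject}> <{DC}{term}> {obj} .")
    return "\n".join(lines)


# Placeholder file and values, not real Commons data.
example = to_ntriples(
    "https://commons.wikimedia.org/wiki/File:Example.jpg",
    {
        "creator": "Someone",
        "license": "https://creativecommons.org/licenses/by/3.0/",
        "description": "A bird in flight",
    },
)
print(example)
```

If templates produced output along these lines, a re-user could read author and license for a whole batch of files without scraping the rendered page.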

--> https://bugzilla.wikimedia.org/show_bug.cgi?id=17503 (but vote here to add your support rather than for the bug...)

Votes

  1.  Support pfctdayelise (说什么?) 09:18, 5 February 2009 (UTC)
  2.  Support Duesentrieb 09:26, 5 February 2009 (UTC) We really need this, yes!
  3.  Support Raymond Disc. 13:27, 5 February 2009 (UTC)
  4.  Support Multichill (talk) 13:37, 5 February 2009 (UTC)
  5.  Support Gnangarra 13:43, 5 February 2009 (UTC)
  6.  Support Pruneautalk 14:22, 5 February 2009 (UTC)
  7.  Support EvanProdromou (talk) 14:25, 5 February 2009 (UTC)
  8.  Support Platonides (talk) 15:00, 5 February 2009 (UTC)
  9.  Support Lupo 15:13, 5 February 2009 (UTC)
  10.  Support as a start. guillom 15:30, 5 February 2009 (UTC)
  11.  Support, vital feature for projects which offer a lot of content with different licenses and authors, like Commons. -- ChrisiPK (Talk|Contribs) 15:37, 5 February 2009 (UTC)
  12.  Support, why not? Wuzur 15:41, 5 February 2009 (UTC)
  13.  Support with a star. sугсго 15:42, 5 February 2009 (UTC)
  14.  Support But keep in mind that developers' time is a limited resource and voting itself is not enough to increase the priority of the issue (code review, deployment, etc.) for them. --EugeneZelenko (talk) 15:44, 5 February 2009 (UTC)
  15. support notafish }<';> 17:24, 5 February 2009 (UTC)
  16.  Support this is a big problem for the poster project on fr.wikipedia.org. We have to print the whole Image page with the poster to comply with the licence. Plyd (talk) 17:24, 5 February 2009 (UTC)
  17.  Support Yann (talk) 18:11, 5 February 2009 (UTC)
  18.  Support --Fajro (talk) 18:38, 5 February 2009 (UTC)
  19. extreme  Support for this. Bastique demandez 19:14, 5 February 2009 (UTC)
  20.  Support, yes, good suggestion.--Wing (talk) 20:18, 5 February 2009 (UTC)
  21.  Support --Nemo 20:23, 5 February 2009 (UTC)
  22.  Support - in terms of return on investment, this is one of the most productive things I've yet seen suggested for MediaWiki development. Shimgray (talk) 20:40, 5 February 2009 (UTC)
  23.  Support --MichaelMaggs (talk) 20:56, 5 February 2009 (UTC)
  24.  Support the concept - whether this implementation is what we want is not clear to me presently.  -- Mike.lifeguard 21:40, 5 February 2009 (UTC)
  25.  Support Railwayfan2005 (talk) 22:46, 5 February 2009 (UTC)
  26.  Support adding this functionality, though I cannot comment on the particular extension that was mentioned. --Sopoforic (talk) 00:22, 6 February 2009 (UTC)
  27.  Support--shizhao (talk) 03:59, 6 February 2009 (UTC)
  28.  Support --WikedKentaur (talk) 07:46, 6 February 2009 (UTC)
  29.  Support --Magnus Manske (talk) 10:23, 6 February 2009 (UTC)
  30.  Support --Geraki TLG 14:27, 6 February 2009 (UTC)
  31.  Support – I think functionality like this will make Commons 1000 times more useful, and will hopefully spur other sites to adopt Semantic Web–type technology. --bdesham  23:54, 6 February 2009 (UTC)
  32.  Support --Flominator (talk) 13:02, 7 February 2009 (UTC)
  33.  Support --He!ko (talk) 17:25, 8 February 2009 (UTC)
  34.  Support --Foroa (talk) 08:32, 9 February 2009 (UTC)
  35.  Support --Kam Solusar (talk) 19:21, 10 February 2009 (UTC)
  36.  Support --UV (talk) 23:17, 10 February 2009 (UTC)
  37. Support --Frank Schulenburg (talk) 18:35, 11 February 2009 (UTC)
  38.  Support --Mdale (talk) 01:26, 12 February 2009 (UTC)
  39.  Support -- Sure! - Badseed talk 01:45, 12 February 2009 (UTC)
  40.  Support --Longbow4u (talk) 10:20, 12 February 2009 (UTC) The sooner the better!
  41.  Support Stifle (talk) 10:23, 14 February 2009 (UTC)
  42.  Support Jastrow (Λέγετε) 19:55, 17 February 2009 (UTC)
  43.  Support Diti the penguin 08:44, 19 February 2009 (UTC)
  44.  Support Kolossos (talk) 17:11, 19 February 2009 (UTC)
  45.  Support! It will be a great experience. Анастасия Львоваru (ru-n, en-2) 19:48, 23 February 2009 (UTC)
  46.  Support progress. --Rave (talk) 19:54, 23 February 2009 (UTC)
  47.  Support If it looks like a good idea, quacks like a good idea and... well, you know. Patrícia msg 17:00, 24 February 2009 (UTC)
  48.  Support! --Miha (talk) 19:44, 25 February 2009 (UTC)
  49.  Support--Lilyu (talk) 07:45, 1 March 2009 (UTC)
  50.  Support Jbeigel (talk) 14:27, 4 March 2009 (UTC)
  51.  Support // tsca [re] 10:59, 13 March 2009 (UTC)
  52.  Support --Anatoliy (talk) 22:41, 17 March 2009 (UTC)
  53.  Support -- Gaurav (talk) 00:06, 12 August 2012 (UTC)

Discussion

We use Semantic MediaWiki for temporal media tagging within the MetavidWiki extension. It does work well... I can imagine a bit of work to scale it up to Commons with much less effort than starting from scratch ;)

Note that there is already an img_metadata field in the database, which seems to get its information directly from the file. That metadata is currently only readable through the API (not through the UI) and not changeable at all IIRC. An example of what this metadata looks like is here (note: format will change at the next scap). --Catrope (talk) 13:34, 5 February 2009 (UTC)

I think that just contains the EXIF stuff, i.e. mostly technical metadata, not the authorship/license stuff we want. Well, technically, EXIF *could* provide that info, but few people actually embed it that way.
For base functionality, it wouldn't even be necessary to store the data we want in the DB, although that would sure be useful.
-- Duesentrieb 14:23, 5 February 2009 (UTC)
It currently just contains the EXIF stuff, but that could be extended. Just pointing out an existing way of storing and getting metadata so someone else doesn't reinvent the wheel. --Catrope (talk) 15:40, 5 February 2009 (UTC)
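
For anyone who wants to look at that existing metadata, here is a minimal sketch that only builds the request URL. It assumes the standard `action=query` / `prop=imageinfo` API module with `iiprop=metadata`; the file title is a placeholder, and the response is not shown since its format was said above to be about to change.

```python
from urllib.parse import urlencode

# Commons API endpoint; the query below only constructs a URL, it does not fetch it.
API = "https://commons.wikimedia.org/w/api.php"


def metadata_query_url(title: str) -> str:
    """Build an imageinfo query asking for the file-derived metadata of one file."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "metadata",
        "titles": title,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


# "File:Example.jpg" is a placeholder title.
print(metadata_query_url("File:Example.jpg"))
```

This returns only what was read from the file itself (mostly EXIF), which is the limitation discussed in this thread.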

As a side note, the above-referenced RDF extension is in widespread use on Wikitravel and has been used to implement article status markers, geographical hierarchies, and various other metadata layering on top of MW. --EvanProdromou (talk) 14:27, 5 February 2009 (UTC)

As a side note to the side note: Evan wrote the extension, bug him if you have questions :) Well, he's a busy guy, but I hope he'll check back here every now and then. -- Duesentrieb 15:16, 5 February 2009 (UTC)

Why not just make it a microformat, like:

<span class="img"> <div class="des">A bird in flight</div> <div class="form">JPG</div> <div class="auth">Someone</div> <div class="lice">CC-BY-3.0</div> <div class="loc"><span class="lad">SOMETHING</span><span class="lon">SOMETHING</span></div> </span>

hmm? ViperSnake151 (talk) 01:56, 6 February 2009 (UTC)

Microformats are a good addition to, but no replacement for, proper metadata. Loading full HTML is massive overhead, parsing malformed HTML is slow and unreliable, there are not many good standards for microformats, stuff embedded this way can't easily be stored in the db for optimized queries, etc etc.
If we have proper metadata embedding, mapping them to microformats for html output would sure be nice, yes. But making up microformats as we go along and hoping people will find out how to parse our tag soup is not a good idea. -- Duesentrieb 09:47, 6 February 2009 (UTC)
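
To make the parsing burden concrete, here is a minimal sketch of what a consumer of the ad-hoc markup proposed above would have to write. The class names (`des`, `auth`, `lice`, and so on) come from that example and are not a published microformat; the parser class and its behaviour are illustrative assumptions only.

```python
from html.parser import HTMLParser

# Class names taken from the ad-hoc example in this thread; not a real standard.
FIELDS = {"des", "form", "auth", "lice", "lad", "lon"}


class AdHocMicroformatParser(HTMLParser):
    """Collect the text of elements whose class is one of the known field names."""

    def __init__(self):
        super().__init__()
        self.current = None  # field name of the element being read, if any
        self.data = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in FIELDS:
                self.current = name

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, text):
        if self.current and text.strip():
            self.data[self.current] = text.strip()


snippet = (
    '<span class="img"><div class="des">A bird in flight</div>'
    '<div class="auth">Someone</div><div class="lice">CC-BY-3.0</div></span>'
)
parser = AdHocMicroformatParser()
parser.feed(snippet)
parser.close()
print(parser.data)
```

Even this toy version has to download and walk the whole page, and it silently depends on class names that every re-user would need to learn from us.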

Embedded metadata

It would be terrific if MediaWiki could automatically edit the embedded image metadata (e.g. XMP) to match the information on the image page. I think most professional or semi-pro photographers do this already, but most people don't. --bdesham  23:58, 6 February 2009 (UTC)

I don't know about "automatically"... I think it should require people to press a button, at least. It could also be an option at upload time. Editing the embedded metadata should, however, create a new version of the file, so the original remains available. -- Duesentrieb 17:32, 7 February 2009 (UTC)
It would be cool if the upload page could detect the metadata in the file before uploading. --ƒajro @ 18:32, 9 February 2009 (UTC)

Other Metadata

Don't forget about the metadata in OGG and SVG files. I'd like to mark files with incorrect metadata; are there templates for that? --ƒajro @ 18:32, 9 February 2009 (UTC)

License management as a core feature of MW

While I appreciate this request, it is not really solving the problems of re-users. In our opinion, license management/handling needs to be a core feature of MediaWiki, because the software is explicitly developed for the collaborative creation and distribution of free content. Licenses of the contained articles and images should not be represented via some agreed-upon convention (if at all in smaller wikis) but via structured (and machine-readable) information, available for each relevant object in the wiki. This is not limited to images but also necessary for articles since wikis like Wikisource include text covered by various licenses.

Some information that would be desired:

  • Full (official) name of the license(s)
  • Whether the full text of the license has to be included or a reference is sufficient
  • Reference to the full text of the license(s) (in some rigidly defined format like wikitext)
  • Whether attribution is required
  • Data required to attribute

So, basically all the information that's required to check whether it's possible to take some part of a MediaWiki installation and use it somewhere else, and all the information that has to be included in derived works. This information should be available in every default MediaWiki setup, accessible via the API and also part of the XHTML.
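
The list above could be sketched as a structured record along the following lines. The class name, field names, and JSON shape are hypothetical choices for illustration, not an existing MediaWiki data model or API response; the CC BY 3.0 values are filled in only as an example.

```python
from dataclasses import dataclass, asdict, field
import json


@dataclass
class LicenseInfo:
    """Hypothetical per-object license record covering the items listed above."""
    name: str                   # full (official) name of the license
    full_text_required: bool    # must the full text be included, or is a reference enough?
    full_text_ref: str          # reference to the full text of the license
    attribution_required: bool  # whether attribution is required
    attribution: dict = field(default_factory=dict)  # data required to attribute


def reference_is_sufficient(lic: LicenseInfo) -> bool:
    """True when a re-user may point to the license instead of reproducing it."""
    return not lic.full_text_required


def to_json(lic: LicenseInfo) -> str:
    """Serialize the record the way an API module might expose it (assumed shape)."""
    return json.dumps(asdict(lic), indent=2, sort_keys=True)


# Example values for illustration only.
cc_by = LicenseInfo(
    name="Creative Commons Attribution 3.0 Unported",
    full_text_required=False,
    full_text_ref="https://creativecommons.org/licenses/by/3.0/legalcode",
    attribution_required=True,
    attribution={"author": "Someone", "title": "A bird in flight"},
)
print(to_json(cc_by))
```

With a record like this attached to every page and file, a re-user could decide mechanically what has to accompany a derived work.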

I know that these requirements are somewhat out of the scope of this request. But since it was motivated by the WMF's Mission to "disseminate [educational content] effectively and globally" I wanted to add my view on what would be necessary to achieve this. --He!ko (talk) 12:19, 11 March 2009 (UTC)

Hi He!ko,
This request is precisely about getting license data into a machine readable format. Perhaps it can be incorporated into the MW API even though it is via an extension. I don't think it is too far from your request. --pfctdayelise (说什么?) 03:08, 12 March 2009 (UTC)