Commons talk:Database reports/Archive 1


Thanks

Excellent initiative. As most special pages are made for the main namespace (article or gallery namespace), Commons is in need of such reports. -- User:Docu at 14:08, 1 January 2010 (UTC)

Request: long pages on file namespace (5k)

Would you run Commons:Database reports/Long pages on file namespace (checking file description page size, not file size)?

Not quite sure what the minimal size for the file description page should be though. I would think that pages in Category:Images with annotations are probably longer due to the code generated by the image annotation tool, but even Category:Images with 10+ annotations [1] has only 5 files longer than 10000 bytes. Maybe 5k is a reasonable limit if files in "Images with 10+ annotations" are excluded. For a start, we could try 10k and see what it outputs.

Ideally we would find file description pages like this (it was 18k). -- User:Docu at 14:51, 1 January 2010 (UTC)

Neither of these two queries factored in Category:Images with 10+ annotations; they're merely sample data to see if you'd like anything refined before outputting a proper report. That category can certainly be excluded if desired. --MZMcBride (talk) 20:26, 1 January 2010 (UTC)
Thanks, that was quick. I started looking through the first one. Some crap, a few essays, some source data for graphics, logs, pre-SVG conversions, etc. I will attempt to go through all of them. -- User:Docu at 21:05, 1 January 2010 (UTC)
For future queries, we could just exclude Category:Long file description pages and its subcategories. -- User:Docu at 06:03, 5 January 2010 (UTC)

Thanks for running this report. Of the >10 KB list most have been reviewed. The few remaining ones are in Category:Long file description pages for review. Several should probably be deleted.

There are about 2000 uploads by Ralf Roletschek that would also go into Category:Long file description pages with license text. These make up most entries of the list >5 KB. If the report is re-run, we could just exclude his uploads.

As I have seen too many of these pages right now, I wouldn't look into an updated report for now. ;) I left a note on COM:AN, maybe someone else would want to review some of the files in the subcategories of Category:Long file description pages. -- User:Docu at 04:57, 8 January 2010 (UTC)
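
The queries actually run for this request are not reproduced on this page. As a minimal sketch of the kind of selection discussed above, assuming the standard MediaWiki `page` and `categorylinks` columns (`page_len` is the size of the page text in bytes, namespace 6 is File), the following runs against a toy in-memory SQLite database. The sample rows, the `long_file_pages` helper and its parameters are made up for illustration, and subcategories are not followed.

```python
import sqlite3

# Toy stand-in for the MediaWiki `page` and `categorylinks` tables
# (only the columns used here; the rows are invented sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
  page_id INTEGER PRIMARY KEY,
  page_namespace INTEGER,
  page_title TEXT,
  page_len INTEGER
);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
INSERT INTO page VALUES
  (1, 6, 'Short.jpg', 800),
  (2, 6, 'Essay.jpg', 18000),
  (3, 6, 'Annotated.jpg', 12000),
  (4, 0, 'Long_gallery', 50000);
INSERT INTO categorylinks VALUES (3, 'Images_with_10+_annotations');
""")

# File description pages longer than a threshold, skipping pages that are
# direct members of one excluded category.
LONG_FILE_PAGES = """
SELECT page_title, page_len
FROM page
WHERE page_namespace = 6
  AND page_len > :min_len
  AND NOT EXISTS (
    SELECT 1
    FROM categorylinks
    WHERE cl_from = page_id
      AND cl_to = :excluded_cat
  )
ORDER BY page_len DESC;
"""


def long_file_pages(min_len=10000, excluded_cat="Images_with_10+_annotations"):
    """Return (title, length) pairs for long file description pages."""
    params = {"min_len": min_len, "excluded_cat": excluded_cat}
    return conn.execute(LONG_FILE_PAGES, params).fetchall()


for title, length in long_file_pages():
    print(title, length)
```

Lowering `min_len` to 5000 would correspond to the 5k variant; excluding Category:Long file description pages including its subcategories would need an extra step to walk the category tree.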

Request: Files in Category:Featured pictures on Wikimedia Commons with less than 4 non-hidden categories

As a tool to improve categorization, would you set up a report that lists the above selection from Category:Featured pictures on Wikimedia Commons? The output could have the following form:

Image                                    Current non-hidden categories   Count   Upload date
Sheet Lightning over Mt Wellington.jpg   Mount Wellington, Lightning     2       2010-01-12

Current categories don't necessarily need to link. Hidden categories shouldn't be taken into account. Upload date would be nice to have. Update could be daily or weekly. -- User:Docu at 11:40, 24 January 2010 (UTC)

I don't have too much free time right now, so I can only give you rough results: these are the pages in Category:Featured pictures on Wikimedia Commons by number of non-hidden categories. Because of the way GROUP BY works, this is the list of pages that are in 0 non-hidden categories. Let me know if you need further help. I can probably do a GROUP_CONCAT for you if you'd like or something similar. Not sure I have time right now to do a completely different output format, though. --MZMcBride (talk) 06:46, 25 January 2010 (UTC)
Thanks for the report. So many results (even some with 0 categories). I think it will take quite some time to work through the current results (584 with 1 category, 689 with 2, 501 with 3). Format is fine, I will convert it myself. -- User:Docu at 08:34, 25 January 2010 (UTC)
I mentioned it at VP. -- User:Docu at 09:13, 25 January 2010 (UTC)
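
The GROUP BY remark above reflects a general property of joins: an inner join drops files with no matching non-hidden category, so they never reach the grouped output and have to be listed separately. The sketch below is not the query that was run. It assumes the standard `page`, `categorylinks` and `page_props` tables (hidden categories carry a `hiddencat` page property), and uses a LEFT JOIN so that zero-category files appear with a count of 0, plus GROUP_CONCAT for the category list. The toy rows and the `under_categorised` helper are invented for illustration, and upload date is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);
-- Invented sample data: three featured files, one non-featured file, one hidden category.
INSERT INTO page VALUES
  (1, 6, 'Sheet_Lightning_over_Mt_Wellington.jpg'),
  (2, 6, 'Uncategorised_FP.jpg'),
  (3, 6, 'Well_categorised.jpg'),
  (4, 6, 'Not_featured.jpg'),
  (10, 14, 'Featured_pictures_on_Wikimedia_Commons'),
  (11, 14, 'Mount_Wellington'),
  (12, 14, 'Lightning');
INSERT INTO page_props VALUES (10, 'hiddencat', '');
INSERT INTO categorylinks VALUES
  (1, 'Featured_pictures_on_Wikimedia_Commons'), (1, 'Mount_Wellington'), (1, 'Lightning'),
  (2, 'Featured_pictures_on_Wikimedia_Commons'),
  (3, 'Featured_pictures_on_Wikimedia_Commons'), (3, 'A'), (3, 'B'), (3, 'C'), (3, 'D'),
  (4, 'Lightning');
""")

FEW_VISIBLE_CATEGORIES = """
SELECT fp.page_title,
       COUNT(vis.cl_to) AS visible_count,
       GROUP_CONCAT(vis.cl_to, ' | ') AS visible_cats
FROM page AS fp
JOIN categorylinks AS fpcl
  ON fpcl.cl_from = fp.page_id
 AND fpcl.cl_to = 'Featured_pictures_on_Wikimedia_Commons'
LEFT JOIN (
    -- category memberships whose category page is not flagged as hidden
    SELECT cl.cl_from, cl.cl_to
    FROM categorylinks AS cl
    LEFT JOIN page AS cat
      ON cat.page_namespace = 14 AND cat.page_title = cl.cl_to
    LEFT JOIN page_props AS pp
      ON pp.pp_page = cat.page_id AND pp.pp_propname = 'hiddencat'
    WHERE pp.pp_page IS NULL
) AS vis
  ON vis.cl_from = fp.page_id
WHERE fp.page_namespace = 6
GROUP BY fp.page_id
HAVING COUNT(vis.cl_to) < ?
ORDER BY visible_count, fp.page_title;
"""


def under_categorised(connection, limit=4):
    """Featured files with fewer than `limit` non-hidden categories."""
    return connection.execute(FEW_VISIBLE_CATEGORIES, (limit,)).fetchall()


for title, count, cats in under_categorised(conn):
    print(title, count, cats)
```

The two-argument GROUP_CONCAT form used here is SQLite's; MySQL writes the separator as `GROUP_CONCAT(col SEPARATOR ' | ')`.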


Request: pages in file namespace with interwikis

As there shouldn't be any interwiki links on these, we could do a report to check how many pages would need to be cleaned up. Not really urgent nor a priority, but maybe someone wants to do this one day by bot. -- User:Docu at 11:14, 1 February 2010 (UTC)

I'm not sure what you mean by "Request" pages. Do you have an example? --MZMcBride (talk) 16:23, 1 February 2010 (UTC)
Almost every section is prefixed with "(new query) request (to MZM):" ;)
Here it is: sample diff. guideline. -- User:Docu at 16:28, 1 February 2010 (UTC)
Haha, sorry. I guess the lack of sleep is showing. :-) I queried this just now. It's about 41,000 results, which makes it a bit unmanageable for a wiki page. I've put the results here: tools:~mzmcbride/doc-commons-langlinks-2010-02-01.txt (save as... or you'll crash your browser). I would think that people might intentionally have language links on the file description page if there is an intentionally local copy of the image on another wiki, but I'll admit to knowing little-to-nothing about Commons. Let me know if you need anything further. (This could be set to output to a file monthly or something if you'd like.) --MZMcBride (talk) 17:05, 1 February 2010 (UTC)

Query used:

SELECT
  page_title,
  COUNT(ll_lang)
FROM page
JOIN langlinks
ON ll_from = page_id
WHERE page_namespace = 6
GROUP BY page_title;

Thanks, that was quick. You even included the count of interwikis. The winner with 136 links was File:Scalable Vectorized Adminstrative Map of Perú JMK.SVG. As there are now these nice, up-to-date global usage links, I think people are less likely to attempt to add them. I will try to get a bot to remove these old ones. -- User:Docu at 17:23, 1 February 2010 (UTC)

I personally may have used interwiki language links on descriptions, sometimes. I'm sure many other people do it as well (some don't know the difference between [[en:]] and [[w:]], for example). I haven't taken a look at the query results, but if it counts interwiki language links that don't lead to the file namespace in the target project, it could potentially have lots of false positives. Killiondude (talk) 17:39, 1 February 2010 (UTC)
Here is the detail by number of links on the pages:
1	34026    (34026 file description pages have 1 interwiki link)
2	4155
3	1195
4	595
5	362
6	242
7	180
8	137
9	159
10	88
11	64
12	72
13	41
14	54
15	78
16	40
17	58
18	23
19	18
20	51
21	31
22	13
23	34
24	22
25	12
26	8
27	26
28	5
29	18
30	50
31	5
32	4
33	7
34	9
35	1
36	1
37	9
38	3
39	1
40	6
42	4
43	1
44	14
46	6
47	1
48	1
49	2
51	19
52	1
53	1
54	1
56	1
60	1
61	1
62	5
64	1
65	1
67	4
69	5
73	1
75	1
79	3
80	1
81	5
89	4
93	1
110	1
136	1 (1 page has 136 interwiki links)

I just did a few of the ones with the most links manually: one transcluded a category instead of linking to it, the others had all interwikis to a marginally related Wikipedia article ("Spain", "Peru"). -- User:Docu at 18:01, 1 February 2010 (UTC)
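
How the breakdown by number of links was produced is not stated on this page. One way to derive it directly in the database, as a sketch, is to wrap the per-page count from the query above in an outer aggregation. The example assumes the same `page` and `langlinks` columns and runs on invented sample rows in SQLite; `link_distribution` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE langlinks (ll_from INTEGER, ll_lang TEXT, ll_title TEXT);
-- Invented sample data: three file pages and one gallery page.
INSERT INTO page VALUES
  (1, 6, 'A.jpg'), (2, 6, 'B.jpg'), (3, 6, 'C.svg'), (4, 0, 'Gallery');
INSERT INTO langlinks VALUES
  (1, 'en', 'Spain'),
  (2, 'de', 'Peru'),
  (3, 'en', 'Peru'), (3, 'es', 'Perú'), (3, 'fr', 'Pérou'),
  (4, 'en', 'Gallery');
""")

# Inner query: language links per file page (as in the query used above).
# Outer query: how many pages share each link count.
DISTRIBUTION = """
SELECT link_count, COUNT(*) AS pages
FROM (
  SELECT COUNT(ll_lang) AS link_count
  FROM page
  JOIN langlinks ON ll_from = page_id
  WHERE page_namespace = 6
  GROUP BY page_id
) AS per_page
GROUP BY link_count
ORDER BY link_count;
"""


def link_distribution():
    """Return (number of links, number of file pages) pairs."""
    return conn.execute(DISTRIBUTION).fetchall()


for link_count, pages in link_distribution():
    print(link_count, pages, sep="\t")
```

On the sample rows, two file pages have one link and one has three; the gallery page in namespace 0 is ignored.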

There have been several discussions of the use of interwikis on images without reaching consensus or a simple rule. It is clear that properly described images in a proper category don't need interwikis and that general interwikis such as references to countries don't help.
There are, however, many cases where interwikis that point to Wikipedia articles add great value to the image files:
  • When the image has a poor description, an interwiki to an article might provide more information than the description itself. A link to another picture might help too if the other file is properly described and categorised. Sometimes, I find more image information by looking at the uploader's contributions on Commons and other Wikipedias.
  • When the interwiki refers to an article that is different from the category to which the image belongs (because it is not worth making a separate category for that image alone), for example a detail of a building that is discussed in another article, a member of a pop band, a specific model of an instrument that is used, ...
  • When no specific category exists yet (for example a single image of an "actor from China"), the interwiki might lead to documentation that tends to be hard to find when creating categories. A substantial part of the uncategorised categories can be solved by looking into the image categories and interwikis: the description of the image often provides the least information (the language used sometimes provides more information than the description itself).
So I don't think that a bot can systematically clear out those interwikis. --Foroa (talk) 18:12, 1 February 2010 (UTC)
Foroa, thanks for bringing us your usual veto. You probably missed the proper forum though. If you think the relevant guideline needs updating, you should bring this up at the VP. -- User:Docu at 18:27, 1 February 2010 (UTC)
(edit conflict) I think, and I could be totally wrong, that Docu is looking for (only?) interwiki language links, that is, links that are like [[en:]] or [[de:]], which are separate from interwiki ''project'' links like [[w:]]. But as I said above, if people don't know the difference, they could be linking with a file description as a language link rather than an interwiki project link. The reason for the confusion is that language links lead to Wikipedia by default, because Commons isn't a multilingual project like Wikisource, Wikipedia, etc., whereas language links on Wikipedia would lead to that title on the other language version of that project. In any case, I think that links to countries are dumb, but as Foroa said, there is some value in linking to other projects. 18:28, 2010 February 1 Killiondude
Yes, the ones marked with on Commons:Guide_to_layout#Interwiki_links. Removing primarily the interwikis that lead to file description pages in other wikis should avoid most problems. -- User:Docu at 18:35, 1 February 2010 (UTC)

Here's my understanding of internal or semi-internal links (links using the [[foo]] syntax):

  1. Internal page links: Tracked by the pagelinks table; includes any link that isn't using a special prefix;
  2. File links: Tracked by the imagelinks table; includes any link beginning with File or Image (and their translations);
  3. Category links: Tracked by the categorylinks table; includes any link beginning with Category (and its translations);
  4. Interwiki links: Defined primarily by the m:Interwiki map and stored in the interwiki table, but usage is not tracked; includes any link beginning with prefixes such as mw:, tswiki:, etc.;
  5. Interproject links: Defined inside the software, but usage is not tracked; includes links beginning with w:, wikt:, q:, etc.;
  6. Interlanguage or interproject links: Tracked by the langlinks table; includes links beginning with designated language code prefixes such as en:, de:, fr:; these only act as interlanguage links in certain namespaces;

In addition to these types of links, there are two primary groups that are tracked that do not use the [[foo]] syntax: (1) templates are tracked by the templatelinks table; and (2) external links are tracked by the externallinks table.

This particular query was run on the File namespace, which treats [[en:foo]] as an interlanguage/interproject link. For Commons, this is specified as Wikipedia. As far as I'm aware, the langlinks table only tracks uses that are in the sidebar; if the link is in the file description summary, it would not appear in the langlinks table and would not be counted in this query. In fact, it is not tracked at all by the database, as far as I know.

Someone may want to find a MediaWiki expert to confirm some or all of what I'm saying here. --MZMcBride (talk) 19:22, 1 February 2010 (UTC)
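
For anyone planning further reports, the overview above can be condensed into a small lookup of which link types have a tracking table to query. This is only a restatement of that overview as a sketch: the dictionary keys and the `tracked_by` and `queryable` helpers are made-up names, and the mapping is exactly as reliable as the description it is taken from.

```python
# Link type -> tracking table, as described in the overview above.
# None means usage is not tracked in the database, so no report can list it.
LINK_TRACKING = {
    "internal page link": "pagelinks",
    "file link": "imagelinks",
    "category link": "categorylinks",
    "interwiki link": None,        # prefixes live in the interwiki table, usage untracked
    "interproject link": None,     # w:, wikt:, q:, etc.
    "interlanguage link": "langlinks",
    "template": "templatelinks",
    "external link": "externallinks",
}


def tracked_by(kind):
    """Return the tracking table for a link type, or None if usage is untracked."""
    return LINK_TRACKING[kind]


def queryable():
    """Link types a database report could be built on."""
    return sorted(kind for kind, table in LINK_TRACKING.items() if table is not None)


print(queryable())
```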

From this, it is clear that Commons:Guide_to_layout#Interwiki_links is not linked to any Commons guideline, that it has never been seriously discussed or formally accepted, and that it is in the Commons To Do category. On the other hand, there have been many discussions on interwiki linking without ever coming to a global conclusion. So you cannot use that guideline to delete all interwikis. --Foroa (talk) 19:26, 1 February 2010 (UTC)

By the way, as a general note, I follow this page rather closely in order to be able to respond to requests here quickly and efficiently. So, if you're going to have a pissing match over something stupid and irrelevant to database reports, please, please, please take it elsewhere, as I plainly do not give a fuck and don't need the extra noise. Thanks! --MZMcBride (talk) 19:29, 1 February 2010 (UTC)

I think your report matches my specs. The point Killiondude raised can be taken into account when dealing with it. -- User:Docu at 22:43, 1 February 2010 (UTC)