User:Fæ/Project list/Cooper-Hewitt


Cooper-Hewitt batch upload project


Files uploaded as part of this project can be listed using this search.

French silk, c.1865, from chair back. Part of Textiles.

Cooper Hewitt is a design museum, part of the Smithsonian Institution. Its collections cover decorative arts and design. At the end of October 2017, the museum announced that 200,000 objects in its collection had been digitized and made available on its website.[1] By officially declaring its own photography as "no copyright known", the museum has waived potential rights in the photographs.

The Cooper-Hewitt API is used to generate details for the upload. Not all details are included; for example, exhibition history is not copied, but it can be seen on the linked live Cooper-Hewitt catalog page. The API uses access keys and OAuth confirmation. For this project, an app named "Commons" first used a key that expired after a month, then a second key of indefinite duration with read-only access.

The batch run completed on 14 December 2017 with 74,024 images uploaded, illustrating around 60,000 objects.

Technical details


File names are of the format:

File:<title> (CH <Object_id>).jpg

Phabricator request for whitelisting: Phab: T180241.

The API is queried for objects with no copyright known. An arbitrary results page size of 50 objects was chosen for convenience; the technical maximum is 500.
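A minimal sketch of paging through results in pages of 50, capped at the stated maximum of 500. Here `fetch_page` is a hypothetical callable standing in for the real API request; its signature and return shape (a list of object records, empty once the results are exhausted) are assumptions of this sketch, not part of the Cooper-Hewitt API.

```python
MAX_PER_PAGE = 500      # technical maximum stated above
DEFAULT_PER_PAGE = 50   # arbitrary page size chosen for this project


def iter_objects(fetch_page, per_page=DEFAULT_PER_PAGE):
    """Yield object records page by page until an empty page is returned.

    fetch_page(page, per_page) is a hypothetical callable that performs the
    API request and returns a list of object records.
    """
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError("per_page must be between 1 and %d" % MAX_PER_PAGE)
    page = 1
    while True:
        objects = fetch_page(page, per_page)
        if not objects:
            break
        for obj in objects:
            yield obj
        page += 1
```

Passing the request function in as a parameter keeps the paging logic independent of how the access key is supplied.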

Where the Artist is identified, the name is tested to see if a creator template is available. Non-artist roles like engraver or manufacturer are added to the Artist parameter for information, but are not tested for creator templates.

Multiple images


The largest image is selected from the images available in the API listing for an object. In many cases there is one large photograph plus derived thumbnails. Some objects also have alternative images, though experiments show that few have alternates larger than 1 megapixel. Where an object has two or more images larger than 1 megapixel, the file name format for the second image onwards is:

File:<title> (CH <Object_id>-<image sequence>).jpg
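The two file name formats can be sketched as a pair of small helpers. The function names are hypothetical, and starting the image sequence at 2 for the second image is an assumption of this sketch; the source only states that the suffix applies from the second image onwards.

```python
def commons_filename(title, object_id, sequence=None):
    """Build the Commons file name for one image of an object.

    sequence is None for the first (largest) image. For later images it is
    the image sequence number appended after the object id.
    """
    if sequence is None:
        suffix = str(object_id)
    else:
        suffix = "%s-%s" % (object_id, sequence)
    return "File:%s (CH %s).jpg" % (title, suffix)


def filenames_for_images(title, object_id, image_count):
    """Return file names for image_count images of the same object."""
    names = [commons_filename(title, object_id)]
    # Assumption: the second image is numbered 2, the third 3, and so on.
    for seq in range(2, image_count + 1):
        names.append(commons_filename(title, object_id, seq))
    return names
```

For example, `commons_filename("Drawing", 18102981)` gives `File:Drawing (CH 18102981).jpg`.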

Example of multiple images for one catalogued object:

A very large gallery can be seen with this French sample book.

Categorization

Male nude by Domenico Corvi, c.1740; categorized to Drawings.

A limited approach to categorization is applied at upload based on department.

# 'department' is the object's department name; 'cats' is the list of
# Commons categories being built for the upload.
if department == "Textiles":
    cats.append(u"Textiles in the Cooper–Hewitt, Smithsonian Design Museum")
elif department == "Drawings, Prints, and Graphic Design":
    cats.append(u"Drawings and paintings in the Cooper–Hewitt, Smithsonian Design Museum")
elif department == "Product Design and Decorative Arts":
    cats.append(u"Product Design and Decorative Arts in the Cooper–Hewitt, Smithsonian Design Museum")
elif department == "Wallcoverings":
    cats.append(u"Wallcoverings in the Cooper–Hewitt, Smithsonian Design Museum")
# Fall back to the parent category when no department matched.
if not cats:
    cats.append(u"Collections of the Cooper–Hewitt, Smithsonian Design Museum")

When "role_name" in "participants" is "Artist" and the artist's name is longer than 8 characters, a category matching that name is added if it exists. This may not be ideal, as many artists have custom subcategories such as "Drawings by <Artist>", but it is a reasonable starting point.

Where department_id for an object does not find a matching department in the standard API list, categorization will default to putting the image in the parent category.

Where date is given to a specific year, such as 1890, a general category of the form Category:1890 works is added so long as the category exists.

Where a country is given in "woe:country_name", the specific country category is added. For example Category:Artworks of the United Kingdom in the Cooper–Hewitt, Smithsonian Design Museum. These are manually created as they arise.
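The artist and year rules above can be sketched as follows. `category_exists` is a hypothetical callable that reports whether a category already exists on Commons. Reading "more than 8 characters" as the length of the artist's name is an interpretation, and category names are returned without the "Category:" prefix for simplicity.

```python
import re


def extra_categories(artist, date, category_exists):
    """Return optional artist and year categories for one object.

    category_exists(name) is a hypothetical callable returning True when
    the named category already exists on Commons.
    """
    cats = []
    # Artist category: only for names longer than 8 characters that
    # already have a matching category.
    if artist and len(artist) > 8 and category_exists(artist):
        cats.append(artist)
    # Year category: only when the date is exactly a four-digit year.
    if date and re.fullmatch(r"\d{4}", date.strip()):
        year_cat = "%s works" % date.strip()
        if category_exists(year_cat):
            cats.append(year_cat)
    return cats
```

Dates such as "ca. 1800" or "17th century" do not match the four-digit pattern, so no year category is added for them.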

Licensing

English door knob, c.1775. Part of decorative arts.

Only files with "no copyright known" status are listed; this is a true/false option available via the API search. The specific license used on the Commons image page is chosen based on the deduced date:

  lic = "Cooper Hewitt - No known copyright restrictions"
  if year < 1890:
    lic = "PD-old-100-1923"
  elif year < 1923:
    lic = "PD-old-70-1923"
  elif year < 1963:
    lic = "PD-US-not renewed"

The date is deduced from any value set for "date"; if that is unknown, the acquisition date is used as the latest possible date.
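One way to sketch this deduction is shown below. How the upload script actually parses dates is not documented here, so the regular expression, the choice of the latest four-digit year in the text, and the function name are all assumptions.

```python
import re

# Matches standalone four-digit years such as "1890" or the year in "ca. 1800".
YEAR_RE = re.compile(r"\b(\d{4})\b")


def deduce_year(date_text, acquisition_year=None):
    """Return a conservative year for licensing, or None if none is known.

    Assumption: when several years appear in the text, the latest one is
    taken as the only firm date. When no year can be read, the acquisition
    year is used as an upper bound.
    """
    years = [int(y) for y in YEAR_RE.findall(date_text or "")]
    if years:
        return max(years)
    return acquisition_year
```

Strings with no standalone four-digit year, such as "17th century" or "n.d.", fall through to the acquisition year.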

The {{PD-Art}} template is applied only as a 'wrapper' around the main license, and only for objects in the Drawings, Prints, and Graphic Design or Wallcoverings departments, as identified by the object's tagged department. These are presumed to be two-dimensional works.
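The wrapper rule can be sketched as below. The function name is hypothetical, and passing the main license as the first parameter of {{PD-Art}} follows the usual Commons convention; that this is exactly how the upload script builds the wikitext is an assumption.

```python
# Departments presumed to hold two-dimensional works.
TWO_DIMENSIONAL_DEPARTMENTS = {
    "Drawings, Prints, and Graphic Design",
    "Wallcoverings",
}


def license_wikitext(lic, department):
    """Return the license template wikitext for an image page.

    lic is the name of the main license template; it is wrapped in
    {{PD-Art}} only for departments presumed to be two-dimensional.
    """
    if department in TWO_DIMENSIONAL_DEPARTMENTS:
        return "{{PD-Art|%s}}" % lic
    return "{{%s}}" % lic
```

Objects in other departments, or with no recognized department, keep the unwrapped license.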

Where the date is after 1923 or no alternative is deduced, the custom template {{Cooper Hewitt - No known copyright restrictions}} is added to the image. For example File:Textile, Powdered, designed ca. 1874, printed ca. 1934 (CH 18340067).jpg probably was printed in the 19th C. when first designed, but the only firm date is the printed date of 1934, so the no known copyright restrictions template is added.

See Petscan report for a list of no known copyright files.

Notes and queries

Kitsch cats greeting card, 1890s.
  • Though the main query is for no copyright known, this may return objects for which there are no images.
  • The largest image returned by the API is chosen by height only.
  • Dates are returned in various formats like "17th century", "ca. 1800" and "1890". There is no forced consistency in format, e.g. "n.d." and "n. d." are both used for "no date". These are not automatically parsed into Wikimedia Commons date templates.
  • To avoid format problems that could be created by wiki-sensitive characters like "=" or "|", free form fields of description and markings are wrapped with <nowiki>.
  • Unlike other object metadata, the use of woe (Where On Earth) is not consistent as it may be left out in the record.
  • https://collection.cooperhewitt.org/objects/18105219 gave a file format error post-upload (file contains HTML or script code), even though it displays at source.
  • File:Drawing (CH 18102981).jpg, File:Drawing (CH 18103009).jpg, File:Drawing (CH 18103023).jpg are the same object with different object_ids and different but nearly identical scans. This seems to have been a cataloging error during scanning.
  • CH 18159949 https://images.collection.cooperhewitt.org/312634_b34153745609fa3f_x.jpg has access forbidden (Error 403). Similarly 18259547 and 18265367.
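Two of the notes above, choosing the largest image by height and wrapping free-form fields in <nowiki>, can be sketched as small helpers. The image records are assumed to be dictionaries with a "height" key; that shape and both function names are hypothetical.

```python
def largest_by_height(images):
    """Return the image record with the greatest height.

    Returns None when the object has no images, which can happen even
    for objects returned by the "no copyright known" query.
    """
    if not images:
        return None
    return max(images, key=lambda img: img["height"])


def wrap_nowiki(text):
    """Wrap free-form text so characters like "=" and "|" are not parsed."""
    return "<nowiki>%s</nowiki>" % text
```

Comparing by height alone mirrors the behaviour described in the notes; a wide but short alternate image would lose to a taller one even if it has more pixels.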