Investigating The Met collection through Data

The following Python script performs some basic data analysis on The Met's database dump of 400,000+ objects, with the goal of informing how it can be integrated with linked open data projects, especially Wikidata. Send any questions to andrew.lih@gmail.com

import csv
from urllib.request import urlopen
import codecs
# from tqdm import tqdm
# from tqdm._tqdm_notebook import tqdm_notebook
import pandas as pd
import matplotlib.pyplot as plt

# The Met's weekly CSV dump on GitHub is quite large, at about 250 MB.
# The remote URL is commented out for now, since it is slow to load.
# url = 'https://media.githubusercontent.com/media/metmuseum/openaccess/master/MetObjects.csv'

# Use local copy of CSV file for speed and read the CSV file into a pandas dataframe
url = 'metmuseum/MetObjects-20190425.csv'
df = pd.read_csv(url,low_memory=False)

# Make filtered copies of the dataframe. .copy() gives each subset its own data,
# so later changes to a subset do not touch df (and avoid SettingWithCopyWarning):

# Just the highlights (~1000 objects)
hdf = df[df['Is Highlight'] == True].copy()
# Just public domain
pddf = df[df['Is Public Domain'] == True].copy()
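
Since the full dump is large, one option is to load only the columns a given analysis needs. The sketch below is a minimal illustration, assuming pandas' `usecols` and `dtype` options; it reads a tiny in-memory stand-in for the CSV. The column names come from the dump, but the three sample rows are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for MetObjects.csv (made-up values).
sample_csv = io.StringIO(
    "Object Number,Is Highlight,Is Public Domain,Object ID,Department,Title\n"
    "1.1,False,True,1,Asian Art,Bowl\n"
    "1.2,True,True,2,Egyptian Art,Jar\n"
    "1.3,False,False,3,Asian Art,Vase\n"
)

# Read only the columns needed, and store the repetitive Department
# strings as a categorical to reduce memory use.
wanted = ["Object ID", "Is Highlight", "Is Public Domain", "Department"]
small_df = pd.read_csv(sample_csv, usecols=wanted, dtype={"Department": "category"})

# Boolean columns can be used directly as masks.
highlights = small_df[small_df["Is Highlight"]].copy()
print(small_df.dtypes)
print(len(highlights))
```

With the real file, the same call would take the local path or URL in place of `sample_csv`.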

Basic stats on Met collection

Below are some stats showing how full or empty each column in the database is.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 494311 entries, 0 to 494310
Data columns (total 44 columns):
Object Number              494311 non-null object
Is Highlight               494311 non-null bool
Is Public Domain           494311 non-null bool
Object ID                  494311 non-null int64
Department                 494311 non-null object
Object Name                489851 non-null object
Title                      463050 non-null object
Culture                    209050 non-null object
Period                     89550 non-null object
Dynasty                    23284 non-null object
Reign                      11205 non-null object
Portfolio                  22217 non-null object
Artist Role                285440 non-null object
Artist Prefix              98323 non-null object
Artist Display Name        287530 non-null object
Artist Display Bio         237989 non-null object
Artist Suffix              12208 non-null object
Artist Alpha Sort          287495 non-null object
Artist Nationality         193440 non-null object
Artist Begin Date          240617 non-null object
Artist End Date            237808 non-null object
Object Date                479252 non-null object
Object Begin Date          494311 non-null int64
Object End Date            494311 non-null int64
Medium                     486728 non-null object
Dimensions                 417876 non-null object
Credit Line                493520 non-null object
Geography Type             60369 non-null object
City                       32248 non-null object
State                      2805 non-null object
County                     8576 non-null object
Country                    76812 non-null object
Region                     31975 non-null object
Subregion                  22277 non-null object
Locale                     15562 non-null object
Locus                      7329 non-null object
Excavation                 15967 non-null object
River                      2098 non-null object
Classification             437900 non-null object
Rights and Reproduction    24940 non-null object
Link Resource              494311 non-null object
Metadata Date              494311 non-null object
Repository                 494311 non-null object
Tags                       277566 non-null object
dtypes: bool(2), int64(3), object(39)
memory usage: 159.3+ MB
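
The non-null counts above are easier to compare as percentages (River, for example, is filled for 2,098 of 494,311 rows, or about 0.4%). Here is a minimal sketch of that calculation. The helper name `fill_rates` and the four-row demo frame are made up for illustration; only the column names come from the dump.

```python
import pandas as pd

def fill_rates(frame: pd.DataFrame) -> pd.Series:
    """Return the percentage of non-null values per column, sparsest first."""
    return (frame.notna().mean() * 100).round(1).sort_values()

# Made-up miniature frame using three column names from the dump.
demo = pd.DataFrame({
    "Department": ["Asian Art", "Egyptian Art", "Photographs", "Asian Art"],
    "Dynasty": [None, "Dynasty 18", None, None],
    "River": [None, None, None, None],
})
rates = fill_rates(demo)
print(rates)
```

On the full dataframe, `fill_rates(df)` should list the sparsest columns first.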

Breakdown of columns

What departments have the largest number of objects?

ax = df['Department'].value_counts().plot(kind='barh')

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# Switch off ticks
ax.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=True, left=False, right=False, labelleft=True)

# Draw vertical axis lines
vals = ax.get_xticks()
for tick in vals:
    ax.axvline(x=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)

# Set x-axis label
ax.set_xlabel("Count", labelpad=20, weight='bold', size=12)

# Set y-axis label
ax.set_ylabel("Department", labelpad=20, weight='bold', size=12)

# Format x-axis tick labels with thousands separators
# (requires: from matplotlib.ticker import StrMethodFormatter)
# ax.xaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))

ax.invert_yaxis()
df['Department'].value_counts()
Drawings and Prints                          178182
European Sculpture and Decorative Arts        43024
Photographs                                   39040
Asian Art                                     37780
Greek and Roman Art                           33700
Costume Institute                             31331
Egyptian Art                                  27911
American Decorative Arts                      18571
Islamic Art                                   15785
Modern and Contemporary Art                   14790
Arms and Armor                                13582
Arts of Africa, Oceania, and the Americas     13053
Medieval Art                                   7503
Ancient Near Eastern Art                       6324
Musical Instruments                            5321
European Paintings                             2944
The Cloisters                                  2629
Robert Lehman Collection                       2586
The Libraries                                   255
Name: Department, dtype: int64
df['Medium'].value_counts().head(90)
Terracotta                                                   23557
Commercial color lithograph                                  23252
Etching                                                      16274
Engraving                                                    11042
Albumen photograph                                           10674
Gelatin silver print                                          9834
Silk                                                          8488
Bronze                                                        7328
Glass                                                         6681
Lithograph                                                    6019
Film negative                                                 5894
Albumen silver print from glass negative                      4998
Faience                                                       4874
Silver                                                        4723
silk                                                          4712
Woodcut                                                       4501
Photolithograph                                               4157
Gold                                                          3964
Oil on canvas                                                 3949
Commercial color photolithograph                              3469
Commercial photolithograph                                    3101
Polychrome woodblock print; ink and color on paper            3082
Etching and engraving                                         2998
Cotton                                                        2966
Wood                                                          2884
Albumen silver print                                          2820
Commercial lithograph                                         2673
Instant color print                                           2639
Hard-paste porcelain                                          2610
cotton                                                        2561
                                                             ...  
Photomechanical print                                          941
Copper                                                         934
Ink on paper                                                   918
Tin-glazed earthenware                                         903
Clay                                                           896
Platinum print                                                 889
Cut paper silhouette                                           818
Travertine (Egyptian alabaster)                                796
Commercial Color Lithograph                                    796
Stucco; carved                                                 767
Drypoint                                                       765
Illustrations: photomechanical process                         763
Illustrations: wood engraving                                  754
Wool, linen; plain weave, tapestry weave                       741
Graphite on paper                                              731
Hand-colored lithograph                                        730
Etching; second state of two (Lieure)                          713
Steatite                                                       702
Flint                                                          694
Carnelian                                                      692
Wool                                                           691
Pen and black ink, watercolor and gouache with gum arabic      688
Ink, opaque watercolor, and gold on paper                      671
Mud                                                            663
Silk, metallic thread                                          656
Commercial color lithographs                                   653
Jade                                                           648
Watercolor on ivory                                            635
Illustrations: engraving                                       627
Glazed steatite                                                626
Name: Medium, Length: 90, dtype: int64
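
The counts above list 'Silk' and 'silk', 'Cotton' and 'cotton', and 'Commercial color lithograph' and 'Commercial Color Lithograph' as separate values. Below is a rough sketch of merging values that differ only in capitalization, using counts copied from that output. Whether the capitalization carries any meaning in The Met's cataloguing is not known here, so treat the merge as an assumption.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
medium_counts = pd.Series({
    "Commercial color lithograph": 23252,
    "Silk": 8488,
    "silk": 4712,
    "Cotton": 2966,
    "cotton": 2561,
    "Commercial Color Lithograph": 796,
})

# Group by a whitespace-trimmed, lower-cased label and add up the counts.
merged = (
    medium_counts
    .groupby(lambda label: label.strip().lower())
    .sum()
    .sort_values(ascending=False)
)
print(merged)
```

On the full dataframe, a similar result should come from `df['Medium'].str.strip().str.lower().value_counts()`. Plural variants such as 'Commercial color lithographs' would still need separate handling.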

Object details

Here's a typical object returned from the API or read from the spreadsheet.

df.iloc[34]
Object Number                                                        04.1a–c
Is Highlight                                                           False
Is Public Domain                                                       False
Object ID                                                                 35
Department                                          American Decorative Arts
Object Name                                                             Vase
Title                                                         The Adams Vase
Culture                                                             American
Period                                                                   NaN
Dynasty                                                                  NaN
Reign                                                                    NaN
Portfolio                                                                NaN
Artist Role                                            Designer|Manufacturer
Artist Prefix                                    Designed by|Manufactured by
Artist Display Name                           Paulding Farnham|Tiffany & Co.
Artist Display Bio                                    1859–1927|1837–present
Artist Suffix                                                            NaN
Artist Alpha Sort                            Farnham, Paulding|Tiffany & Co.
Artist Nationality                                                       NaN
Artist Begin Date                                      1859      |1837      
Artist End Date                                        1927      |9999      
Object Date                                                          1893–95
Object Begin Date                                                       1893
Object End Date                                                         1895
Medium                     Gold, amethysts, spessartites, tourmalines, fr...
Dimensions                 Overall: 19 7/16 x 13 x 9 1/4 in. (49.4 x 33 x...
Credit Line                                    Gift of Edward D. Adams, 1904
Geography Type                                                       Made in
City                                                                New York
State                                                                    NaN
County                                                                   NaN
Country                                                        United States
Region                                                                   NaN
Subregion                                                                NaN
Locale                                                                   NaN
Locus                                                                    NaN
Excavation                                                               NaN
River                                                                    NaN
Classification                                                         Metal
Rights and Reproduction                                                  NaN
Link Resource              http://www.metmuseum.org/art/collection/search/35
Metadata Date                                           4/22/2019 8:00:03 AM
Repository                          Metropolitan Museum of Art, New York, NY
Tags                                               Birds|Palmettes|Men|Vases
Name: 34, dtype: object
df.iloc[34]['Medium']
'Gold, amethysts, spessartites, tourmalines, fresh water pearls, quartzes, rock crystal, and enamel'
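
The artist fields in this record hold several pipe-delimited values each (two roles, two names, two date pairs). Here is a small sketch of splitting them into one record per artist, which may help when matching against Wikidata items. It assumes the fields are positionally aligned, which holds for this record but has not been checked across the dump; the helper name `split_artists` is hypothetical.

```python
# Values copied from the record shown above (df.iloc[34]).
record = {
    "Artist Role": "Designer|Manufacturer",
    "Artist Display Name": "Paulding Farnham|Tiffany & Co.",
    "Artist Begin Date": "1859      |1837      ",
    "Artist End Date": "1927      |9999      ",
}

def split_artists(rec):
    """Split parallel pipe-delimited artist fields into one dict per artist.

    Assumes the n-th part of every field describes the same artist.
    """
    columns = {field: [part.strip() for part in value.split("|")]
               for field, value in rec.items()}
    return [dict(zip(columns, parts)) for parts in zip(*columns.values())]

artists = split_artists(record)
for artist in artists:
    print(artist)
```

Note the end date of 9999 for Tiffany & Co., which appears to stand in for "present" (the display bio reads 1837–present).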

Dimensions

One particularly challenging field is 'Dimensions', which may contain several newline-delimited parts and varies dramatically between 2D objects and 3D objects such as vases, furniture, and coins.

df.iloc[34]['Dimensions']
'Overall: 19 7/16 x 13 x 9 1/4 in. (49.4 x 33 x 23.5 cm); 352 oz. 18 dwt. (10977 g)\r\nBody: H. 18 7/8 in. (47.9 cm)\r\nCover: 4 1/4 x 4 13/16 in. (10.8 x 12.2 cm); 19 oz. 6 dwt. (600.1 g)'

Because of this, the CSV file must be read with a CSV-aware parser. Simple line-oriented UNIX tools, and even MS Excel, are typically fooled by records whose 'Dimensions' value is broken across multiple lines like this.
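
To illustrate the point, the sketch below writes the Dimensions value shown above into a one-row CSV, then compares what a line-based reader sees with what Python's `csv` module returns. It also pulls out the centimetre measurements with a regular expression. That pattern was written for this one record only, so treat it as a starting point rather than a parser for every object; `parse_cm` is a hypothetical helper.

```python
import csv
import io
import re

# The Dimensions value shown above, embedded in a one-row CSV as a quoted field.
dimensions = (
    "Overall: 19 7/16 x 13 x 9 1/4 in. (49.4 x 33 x 23.5 cm); "
    "352 oz. 18 dwt. (10977 g)\r\n"
    "Body: H. 18 7/8 in. (47.9 cm)\r\n"
    "Cover: 4 1/4 x 4 13/16 in. (10.8 x 12.2 cm); 19 oz. 6 dwt. (600.1 g)"
)
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(["04.1a–c", dimensions])
raw = buffer.getvalue()

# A line-based tool sees several physical lines for this single record...
physical_lines = raw.splitlines()
# ...while a CSV-aware parser returns one row with the newlines preserved.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# Matches a parenthesised group such as "(49.4 x 33 x 23.5 cm)".
CM_GROUP = re.compile(r"\(([\d.]+(?:\s*x\s*[\d.]+)*)\s*cm\)")

def parse_cm(text):
    """Map each 'Label: ...' line to its list of centimetre values."""
    parsed = {}
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        match = CM_GROUP.search(rest)
        if match:
            values = re.split(r"\s*x\s*", match.group(1))
            parsed[label.strip()] = [float(v) for v in values]
    return parsed

print(len(physical_lines), len(rows))
print(parse_cm(rows[0][1]))
```

`pd.read_csv` handles the quoted newlines in the same way, which is why the dataframe above has one row per object.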

By country

df['Country'].value_counts().head(50)
Egypt                               31314
United States                        8984
Iran                                 6124
Peru                                 3483
France                               1726
Byzantine Egypt                      1673
Mexico                               1644
India                                1463
Indonesia                            1438
England                              1117
China                                 918
Turkey                                906
Germany                               900
Papua New Guinea                      878
Nigeria                               664
Italy                                 549
Democratic Republic of the Congo      537
Syria                                 493
Spain                                 422
Iraq                                  404
Canada                                399
Mali                                  374
Colombia                              358
Côte d'Ivoire                         304
Egypt or Syria                        301
Iran|Iran                             280
America                               257
United States|United States           257
Cameroon                              254
Japan                                 239
United Kingdom                        207
Costa Rica                            206
Morocco                               197
present-day France                    178
Ghana                                 170
Bolivia                               164
Northern France                       137
Ecuador                               132
United States|England                 129
Panama                                127
Saudi Arabia                          127
Austria                               125
present-day Uzbekistan                122
Republic of Benin                     119
Guatemala                             114
Palestine                              98
United States: N.A.                    94
Netherlands                            94
Burkina Faso                           90
Russia                                 90
Name: Country, dtype: int64

Objects that have country "Iran|Iran"

df.loc[df.Country.isin(['Iran|Iran']), ['Object Number','Title','Object Begin Date']].sort_values('Object Begin Date',ascending=False)
        Object Number        Title                                              Object Begin Date
313984  30.95.174.1a, b      Davis Album                                        1777
320605  39.40.127.304        Coin                                               1500
320606  39.40.127.305        Coin                                               1500
320597  39.40.127.296        Coin                                               1307
311511  09.87                Tile Panel in the form of an Architectural Niche   1300
391147  SL.12.2016.14.4      Basin with a Snake and a Caparisoned Elephant ...  1190
391123  SL.12.2016.34.1      Bowl with Seated Figures by a Pond                 1186
411400  SL.12.2016.26.10     Dish with Polo Player                              1183
400262  SL.12.2016.28.2      Textile Fragment Depicting a Figure and Mythic...  1175
391212  SL.12.2016.41.2      Melting pan                                        1175
391179  SL.12.2016.11.2      Fragment of a box with a combination lock          1175
391141  SL.12.2016.38.2      Model of a House Depicting a Feast                 1175
400334  SL.12.2016.8.2       Fragment of a Base Depicting a Game of Backgammon  1175
391137  SL.12.2016.27.4      Bottle with Gilded and Mina'i Decoration           1175
391135  SL.12.2016.49.6      Container in the Form of a Camel Carrying a Jar    1175
391132  SL.12.2016.10.3      Star-Shaped Tile with a Seated Figure Holding ...  1175
391130  SL.12.2016.10.1      Figurine of a Falcon with Motifs of Seated Fig...  1175
391117  SL.12.2016.23.9      Bowl with Seated Figure                            1175
408059  SL.12.2016.27.11     Fragment of a Bowl Depicting a Seated Couple, ...  1175
391139  SL.12.2016.27.8      Jug in the Form of a Crouching Man                 1175
426556  SL.12.2016.5.5       Ewer in the Shape of a Lion                        1175
391237  SL.12.2016.39.3      Panel with Enthroned Ruler and Courtiers           1150
391125  SL.12.2016.2.1       Luster Bottle with Interlace Strapwork and Lob...  1150
391263  SL.12.2016.41.1      Folios from a Copy (Mushaf) of the Qur'an          1139
391131  SL.12.2016.10.2      Apothecary Jar with Seated Figures and Running...  1125
391151  SL.12.2016.41.7      Luster bowl with Harpy                             1125
391201  SL.12.2016.38.3a, b  Celestial globe                                    1119
315112  39.40.39a            Pottery Mold                                       1100
391138  SL.12.2016.27.7      Bowl with Colorless Glaze and Carved Vegetal M...  1100
391220  SL.12.2016.3.2       Bucket with Signs of the Zodiac                    1100
...     ...                  ...                                                ...
315320  40.170.87            Bowl                                               700
315266  40.170.31            Bowl                                               700
315248  40.170.13            Bowl                                               700
315097  39.40.24             Bowl                                               700
320625  39.40.127.324        Coin                                               700
315104  39.40.31             Bowl                                               700
315085  39.40.9              Bowl                                               700
320702  39.40.127.402        Coin                                               400
320390  39.40.127.89         Coin                                               375
320596  39.40.127.295        Coin                                               186
411492  SL.12.2016.9.2       Mina'i Bowl with an Assembly with Children in ...  12
391120  SL.12.2016.23.12     Bath Scraper                                       0
391140  SL.12.2016.27.10     Luster Bowl with Bear                              0
391225  SL.12.2016.2.3       Mihrab                                             0
391228  SL.12.2016.27.5      Tombstone of Abu Bakr b. Ibrahim                   0
391231  SL.12.2016.23.3      Tile with Griffin Motif                            0
391136  SL.12.2016.49.8      Figurine of ‘Sultan Tughril’                       0
391134  SL.12.2016.52.6      Blue Container in the Form of a Humped Bovine ...  0
391243  SL.12.2016.39.1      Fragmentary Burial Shroud with Checkered Pattern   0
391129  SL.12.2016.11.9      Dish with Schoolroom Scene                         0
391244  SL.12.2016.39.2      Textile Fragment with Scene of Apotheosis          0
391124  SL.12.2016.34.2      Bowl with Epigraphic Band and Signs of the Zodiac  0
397673  SL.12.2016.51.3      Textile Fragment with Scene of Apotheosis          0
397672  SL.12.2016.51.2      Textile Fragment with Scene of Apotheosis          0
397669  SL.12.2016.51.1      Textile Fragment with Scene of Apotheosis          0
391238  SL.12.2016.44.3      Relief with Two Fighting Horsemen                  0
391247  SL.12.2016.48.2a, b  Cenotaph with Finials                              0
391246  SL.12.2016.25.1      Robe                                               0
391245  SL.12.2016.39.4      Fragment of a Burial Textile                       0
391257  SL.12.2016.49.4      First Folio from a Single-Volume Qur'an            0

280 rows × 3 columns

Iranian and Persian content

The Country column has both "Iran" and "Iran|Iran" as separate values. The histograms below compare the distribution of Object Begin Date for the objects under each value.

Iran

df.loc[df.Country.isin(['Iran']), ['Object Begin Date']].sort_values('Object Begin Date',ascending=False).hist()
[Figure: histogram of Object Begin Date for objects with Country "Iran"]

Iran|Iran

df.loc[df.Country.isin(['Iran|Iran']), ['Object Begin Date']].sort_values('Object Begin Date',ascending=False).hist()
[Figure: histogram of Object Begin Date for objects with Country "Iran|Iran"]
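
If "Iran" and "Iran|Iran" should be counted together, one possible approach is to collapse repeated pipe-delimited parts before counting. The sketch below assumes the repeated value carries no extra meaning, which is unverified: it may reflect two geography entries on the same object, in which case collapsing loses information. The helper name `normalize_country` is hypothetical.

```python
import pandas as pd

def normalize_country(value):
    """Collapse repeated pipe-delimited parts, e.g. 'Iran|Iran' -> 'Iran'."""
    if pd.isna(value):
        return value
    parts = [part.strip() for part in str(value).split("|") if part.strip()]
    unique = list(dict.fromkeys(parts))  # drop repeats, keep first-seen order
    return "|".join(unique)

# Values taken from the Country counts above (one row each, for illustration).
countries = pd.Series(["Iran", "Iran|Iran", "United States|United States",
                       "United States|England", None])
normalized = countries.map(normalize_country)
print(normalized.tolist())
```

Values with genuinely different parts, such as "United States|England", are left as they are.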

Egyptian artifacts

Objects with Country = Egypt form the largest group among entries that have a country value (31,314 of 76,812). They span from hundreds of thousands of years ago to the modern day. To focus on artworks, we filter out objects dated earlier than 6000 B.C.E.

df.loc[df.Country.isin(['Egypt']), ['Object Number','Title','Object Begin Date']].loc[df['Object Begin Date']>-6001].sort_values('Object Begin Date',ascending=True)
        Object Number          Title                                              Object Begin Date
359409  26.10.11               Spearpoint                                         -5000
358996  26.10.15               Spearpoint                                         -5000
358991  25.10.16i              Arrowhead, tanged                                  -5000
377125  06.322.147             Tool                                               -5000
358993  26.10.99               Arrowhead, tanged                                  -5000
358994  26.10.105              Arrowhead, tanged                                  -5000
358992  26.10.80               Arrowhead, tanged                                  -5000
374955  07.228.65b             Bracelet                                           -4500
374956  07.228.65c             Bracelet                                           -4500
359568  36.1.54                Rough ware jar                                     -4500
374963  07.228.160             Palette                                            -4500
360528  07.228.60              Lugged jar depicting two boats                     -4500
354688  33.4.26                Bifacial blade                                     -4500
374959  07.228.63              Mace head                                          -4500
374960  07.228.64              Mace head                                          -4500
374961  07.228.104             Mace head                                          -4500
374962  07.228.159             Palette                                            -4500
375084  26.10.21               Tool                                               -4500
375759  26.10.94               Tool                                               -4500
354690  33.4.28                Ax or adze                                         -4500
374954  07.228.65a             Bracelet                                           -4500
354692  33.4.30                Ax or adze                                         -4500
375085  26.10.32               Tool                                               -4500
376098  33.4.57                Sandstone mortar                                   -4500
376097  33.4.35                Axe                                                -4500
376096  33.4.5                 Pounder                                            -4500
376095  33.4.3                 Pounder                                            -4500
375086  26.10.39               Tool                                               -4500
376046  29.4.4                 Axe                                                -4500
354693  33.4.31                Ax or adze                                         -4500
...     ...                    ...                                                ...
350871  89.4.3157              Naqqara                                            1875
350322  89.4.3447              Camel Bell                                         1885
350323  89.4.3448              Camel Bell                                         1885
346711  89.2.189               Goge                                               1885
371582  2016.371.2             Hatshepsut's Mother, Queen Ahmose                  1899
371581  2016.371.1             Hatshepsut's Grandmother, Seniseneb                1899
366127  26.8.210               Horus falcon Necklaces (2)                         1900
71629   1982.429.2             Headscarf                                          1900
316750  55.111.20              Anklet                                             1900
316749  55.111.19              Anklet                                             1900
71622   C.I.49.19a–c           Ensemble                                           1900
316751  55.111.21              Anklet                                             1900
321163  1980.389.10            Burka                                              1900
350450  11.151.742             Mizwij                                             1900
350449  11.151.741             Arghul                                             1900
71623   C.I.50.86.1            Scarf                                              1900
71618   C.I.39.124.19          Shawl                                              1900
71626   1974.88                Headscarf                                          1900
349942  1986.467.24            Riqq                                               1901
377169  2012.144               Model of the Mastaba Tomb of Perneb                1913
89014   1980.387.3             Scarf                                              1920
89392   2008.274.5a, b         Ensemble                                           1930
377812  2012.139               Model of the Temple of Hatshepsut at Deir el-B...  1934
491919  N7381.75.S97 S43 1940  La séance continue                                 1940
87888   1978.582.230a–c        Uniform                                            1946
348679  1974.231.2             Nay                                                1962
349790  1974.231.1             Zummārā Sattawija                                  1970
348813  1982.143.1             Ūd                                                 1977
87889   1981.178               Dress                                              1981
382528  N.A.2013.15            Model of the New York Obelisk ("Cleopatra's Ne...  2013

31251 rows × 3 columns

Distribution of Egyptian objects

Below is a histogram of begin dates for Egyptian objects. Some of the peaks may be artifacts, since many dates appear to be estimates rounded to a convenient year (e.g., 2000 BCE).

df.loc[df.Country.isin(['Egypt']), ['Object Number','Title','Object Begin Date']].loc[df['Object Begin Date']>-6001].sort_values('Object Begin Date',ascending=True).hist(bins=30)
[Figure: histogram of Object Begin Date for Egyptian objects, 30 bins]
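
One rough way to gauge how much rounding contributes to those peaks is to measure the share of begin dates that fall exactly on a round year. The sketch below uses a handful of Object Begin Date values taken from the Egyptian rows listed above. The helper name `round_year_share` and the choice of 100-year and 1000-year steps are assumptions for illustration.

```python
import pandas as pd

def round_year_share(years: pd.Series, step: int = 100) -> float:
    """Return the fraction of years that are exact multiples of `step`."""
    return float((years % step == 0).mean())

# A few Object Begin Date values taken from the Egyptian rows listed above.
egypt_years = pd.Series([-5000, -4500, -4500, 1875, 1885, 1899, 1900, 1900, 1913, 2013])

# The most frequent individual years hint at where estimates pile up.
print(egypt_years.value_counts().head(3))
print(round_year_share(egypt_years))
print(round_year_share(egypt_years, step=1000))
```

On the full data, the same function could be applied to `df.loc[df.Country == 'Egypt', 'Object Begin Date']`. A high share would suggest the histogram peaks owe more to estimation than to the collection itself.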

Classification analysis

A general breakdown shows that prints, photographs, and drawings are the most prevalent classifications.

Some general questions to be addressed:

  • What is the distinction between Prints and Prints|Ephemera?
  • What is the distinction between Photographs and Photographs|Ephemera?
  • What does the empty pipe ("|") mean? (3085 occurrences)
df['Classification'].value_counts()
Prints                                                         76967
Prints|Ephemera                                                42568
Photographs                                                    29218
Drawings                                                       26076
Vases                                                          21290
Books                                                          14868
Ceramics                                                       14035
Paintings                                                      11698
Photographs|Ephemera                                           11225
Textiles-Woven                                                 10994
Glass                                                           8947
Negatives                                                       6146
Prints|Ornament & Architecture                                  5152
Textiles-Laces                                                  4962
Sculpture                                                       4880
Ceramics-Porcelain                                              4325
Drawings|Ornament & Architecture                                4153
Textiles-Embroidered                                            4098
Metalwork-Silver                                                3994
Books|Prints|Ornament & Architecture                            3751
Ceramics-Pottery                                                3556
Textiles                                                        3275
|                                                               3085
Textiles-Printed                                                2559
Jewelry                                                         2522
Metal                                                           2406
Metalwork                                                       2389
Textiles-Trimmings                                              1950
Gold and Silver                                                 1938
Metal-Ornaments                                                 1895
                                                               ...  
Metalwork-Vessels|Silver                                           1
Leather|Woodwork-Furniture                                         1
Natural Substances|Woodwork-Miscellany                             1
Portfolios|Photographs                                             1
Hide-Furnishings                                                   1
Stone-Implements|Plates                                            1
Accessory-Jewelry-Childrenswear                                    1
Beads-Ornaments|Stone-Ornaments|Jewelry                            1
Metal-Implements|Gold                                              1
Ivories|Metalwork-Silver                                           1
Photographs|(not assigned)                                         1
Books|Ornament & Architecture|Albums                               1
Drawings|Photographs|Prints|Ephemera|Collages                      1
Miscellaneous|Ephemera|Prints                                      1
Textiles-Woven|Textiles-Embroidered|Textiles-Ecclesiastical        1
Albums|Prints|Books                                                1
Stone-Ornaments|Shell-Ornaments                                    1
Glass|Miscellaneous-Stone Vases                                    1
Woodwork–Miscellany                                                1
Woodwork|Paintings-Decorative|Paintings                            1
Textiles-Painted|Textiles-Woven                                    1
Prints-Fete|Prints|Ornament & Architecture                         1
Books|Ornament & Architecture|Albums|Drawings                      1
Metalwork-Tablets-Inscribed                                        1
Hide-Costumes|Beads-Containers                                     1
Beads-Costumes|Hide-Costumes                                       1
Textiles-Laces|Textiles-Woven                                      1
Membranophone-single-headed / trough                               1
Textiles-Embroidered-Painted and Printed                           1
Prints|Textiles-Painted|Paintings|Metalwork-Tin|Ephemera           1
Name: Classification, Length: 1227, dtype: int64
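
As a starting point for the questions above, the combined labels can be split on the pipe so that each term is counted on its own. The sketch below uses a few counts copied from the output above. It assumes the pipe separates independent terms, which is a guess about The Met's data rather than a documented rule; the "(empty)" label for the bare pipe is made up here.

```python
import pandas as pd

# Counts copied from the value_counts() output above (a small subset).
class_counts = pd.Series({
    "Prints": 76967,
    "Prints|Ephemera": 42568,
    "Photographs": 29218,
    "Photographs|Ephemera": 11225,
    "Prints|Ornament & Architecture": 5152,
    "|": 3085,
})

# Split each combined label on '|' and credit the row count to every term.
term_totals = {}
for label, count in class_counts.items():
    terms = [t.strip() for t in label.split("|") if t.strip()]
    if not terms:
        terms = ["(empty)"]  # the bare '|' has no term on either side
    for term in terms:
        term_totals[term] = term_totals.get(term, 0) + int(count)

by_term = pd.Series(term_totals).sort_values(ascending=False)
print(by_term)
```

On the full dataframe, something like `df['Classification'].str.split('|').explode().value_counts()` should give per-term counts across all 1,227 distinct labels.
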
df.head(50)
Object Number Is Highlight Is Public Domain Object ID Department Object Name Title Culture Period Dynasty ... Locale Locus Excavation River Classification Rights and Reproduction Link Resource Metadata Date Repository Tags
0 1979.486.1 False False 1 American Decorative Arts Coin One-dollar Liberty Head Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/1 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
1 1980.264.5 False False 2 American Decorative Arts Coin Ten-dollar Liberty Head Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/2 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
2 67.265.9 False False 3 American Decorative Arts Coin Two-and-a-Half Dollar Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/3 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
3 67.265.10 False False 4 American Decorative Arts Coin Two-and-a-Half Dollar Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/4 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
4 67.265.11 False False 5 American Decorative Arts Coin Two-and-a-Half Dollar Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/5 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
5 67.265.12 False False 6 American Decorative Arts Coin Two-and-a-Half Dollar Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/6 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
6 67.265.13 False False 7 American Decorative Arts Coin Two-and-a-Half Dollar Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/7 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Birds|Coins
7 67.265.14 False False 8 American Decorative Arts Coin Two-and-a-Half Dollar Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/8 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Inscriptions|Men|Profiles|Coins
8 67.265.15 False False 9 American Decorative Arts Coin Two-and-a-Half Dollar Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/9 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
9 1979.486.3 False False 10 American Decorative Arts Coin Two-and-a-half-dollar Indian Head Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/10 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
10 1979.486.2 False False 11 American Decorative Arts Coin Two-and-a-half-dollar Liberty Head Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/11 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
11 1979.486.7 False False 12 American Decorative Arts Coin Twenty-dollar Liberty Head Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/12 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
12 1979.486.4 False False 13 American Decorative Arts Coin Five-dollar Indian Head Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/13 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
13 1979.486.5 False False 14 American Decorative Arts Coin Five-dollar Liberty Head Coin NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/14 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
14 16.74.49 False False 15 American Decorative Arts Coin Coin, 1/2 Real Mexican NaN NaN ... NaN NaN NaN NaN Silver NaN http://www.metmuseum.org/art/collection/search/15 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
15 16.74.27 False False 16 American Decorative Arts Peso Coin, 1/4 Peso Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/16 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
16 16.74.28 False False 17 American Decorative Arts Peso Coin, 1/4 Peso Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/17 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
17 16.74.29 False False 18 American Decorative Arts Peso Coin, 1/4 Peso Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/18 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
18 16.74.30 False False 19 American Decorative Arts Peso Coin, 1/4 Peso Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/19 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
19 16.74.31 False False 20 American Decorative Arts Peso Coin, 1/4 Peso Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/20 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
20 16.74.32 False False 21 American Decorative Arts Peso Coin, 1/4 Peso Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/21 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
21 16.74.43 False False 22 American Decorative Arts Coin Coin, 1/4 Real Guatemalan NaN NaN ... NaN NaN NaN NaN Silver NaN http://www.metmuseum.org/art/collection/search/22 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
22 16.74.44 False False 23 American Decorative Arts Coin Coin, 1/4 Real Guatemalan NaN NaN ... NaN NaN NaN NaN Silver NaN http://www.metmuseum.org/art/collection/search/23 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
23 16.74.33 False False 24 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/24 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
24 16.74.34 False False 25 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/25 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
25 16.74.35 False False 26 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/26 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
26 16.74.36 False False 27 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/27 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
27 16.74.38 False False 28 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/28 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
28 16.74.39 False False 29 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/29 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
29 16.74.37 False False 30 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/30 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
30 16.74.40 False False 31 American Decorative Arts Centavos Coin, 10 Centavos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/31 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
31 09.9.15 False False 32 American Decorative Arts Pesos Coin, 20 Pesos Mexican NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/32 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
32 64.62 False False 33 American Decorative Arts Bust Bust of Abraham Lincoln American NaN NaN ... NaN NaN NaN NaN Glass NaN http://www.metmuseum.org/art/collection/search/33 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Sculpture|Abraham Lincoln|Portraits
33 1970.289.6 False True 34 American Decorative Arts Clock Acorn Clock American NaN NaN ... NaN NaN NaN NaN Furniture NaN http://www.metmuseum.org/art/collection/search/34 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Landscapes|Trees|Boats|Clocks
34 04.1a–c False False 35 American Decorative Arts Vase The Adams Vase American NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/35 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Birds|Palmettes|Men|Vases
35 1976.319 False False 36 American Decorative Arts Side Chair Side Chair American NaN NaN ... NaN NaN NaN NaN Furniture NaN http://www.metmuseum.org/art/collection/search/36 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Chairs
36 38.165.51 False True 37 American Decorative Arts Figure Figure of Admiral George Rodney British (American market) NaN NaN ... NaN NaN NaN NaN Ceramics NaN http://www.metmuseum.org/art/collection/search/37 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Cannons|Swords|Men
37 38.165.50 False True 38 American Decorative Arts Figure Figure of Admiral Samuel Hood British (American market) NaN NaN ... NaN NaN NaN NaN Ceramics NaN http://www.metmuseum.org/art/collection/search/38 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Cannons|Swords|Men|Admirals
38 18.11.10 False True 39 American Decorative Arts Advertisement Advertisement for Norwich Stone Ware Factory American NaN NaN ... NaN NaN NaN NaN Natural Substances NaN http://www.metmuseum.org/art/collection/search/39 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Advertisements
39 46.140.143 False True 40 American Decorative Arts Ale glass Ale Glass American NaN NaN ... NaN NaN NaN NaN Glass NaN http://www.metmuseum.org/art/collection/search/40 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Drinking Glasses
40 46.140.864 False True 41 American Decorative Arts Ale glass Ale Glass American NaN NaN ... NaN NaN NaN NaN Glass NaN http://www.metmuseum.org/art/collection/search/41 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Drinking Glasses
41 60.58.1 False True 42 American Decorative Arts Andiron Andiron American NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/42 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Utilitarian Objects
42 60.58.2 False True 43 American Decorative Arts Andiron Andiron American NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/43 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Utilitarian Objects
43 10.125.444a False True 44 American Decorative Arts Andiron Andiron NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/44 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Utilitarian Objects
44 10.125.444b False True 45 American Decorative Arts Andiron Andiron NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/45 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Utilitarian Objects
45 10.125.445a False True 46 American Decorative Arts Andiron Andiron NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/46 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
46 10.125.445b False True 47 American Decorative Arts Andiron Andiron NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/47 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
47 10.125.446a False False 48 American Decorative Arts Andiron Andiron NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/48 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
48 10.125.446b False False 49 American Decorative Arts Andiron Andiron NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/49 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY NaN
49 10.125.447a False True 50 American Decorative Arts Andiron Andiron NaN NaN NaN ... NaN NaN NaN NaN Metal NaN http://www.metmuseum.org/art/collection/search/50 4/22/2019 8:00:03 AM Metropolitan Museum of Art, New York, NY Utilitarian Objects|Men

50 rows × 44 columns

All objects

df['Object Number'].count()
494311

Just public domain released objects

pddf['Object Number'].count()
225886

For all PD objects, a breakdown of the classifications

Only for objects that have an image

pddf['Classification'].value_counts().head(500)
Prints                                            27588
Vases                                             19902
Drawings                                          14511
Ceramics                                          11255
Textiles-Woven                                     7847
Glass                                              6697
Paintings                                          6176
Photographs                                        5606
Photographs|Ephemera                               3835
Ceramics-Porcelain                                 3767
Textiles-Embroidered                               3419
Prints|Ephemera                                    3339
Textiles-Laces                                     3000
Sculpture                                          2915
Metalwork-Silver                                   2747
Ceramics-Pottery                                   2645
Drawings|Ornament & Architecture                   2609
Books|Prints|Ornament & Architecture               2262
Textiles                                           2255
Metalwork                                          1725
Stucco                                             1723
Gold and Silver                                    1666
Bronzes                                            1641
Metal-Ornaments                                    1633
Jade                                               1591
Metal                                              1482
Jewelry                                            1387
Terracottas                                        1337
Prints|Ornament & Architecture                     1288
Woodwork-Furniture                                 1240
                                                  ...  
Sculpture-Alabaster                                   6
Costumes-Accessories                                  6
Ceramics-Paintings                                    5
Stone-Vessels-Inscribed                               5
Chordophone-Zither                                    5
Glass-Cylinder Seals                                  5
Stone-Tablets-Inscribed                               5
Armor                                                 5
Stone-Flint                                           5
Idiophone-Friction                                    5
Woodwork-Furniture|Paintings                          5
Helmet Crests                                         5
Hide                                                  5
Miscellaneous-Leatherwork                             5
Membranophone-single-headed / cylindro-conical        5
Feathers-Ornaments                                    5
Textiles-Laces|Textiles-Embroidered                   5
Chordophone-Zither-struck-clavichord                  5
Chess Sets-Reproductions                              5
Aerophone-Automatic                                   5
Stone-Architectural                                   5
Shell-Equestrian                                      5
Chordophone-Lyre-plucked                              5
Chordophone-Lute                                      5
Aerophone                                             5
Textiles-Laces|Textiles-Ecclesiastical                5
Horology-Clocks                                       5
Enamels-Translucent                                   5
Miscellaneous-Bone                                    5
Firearms Parts                                        5
Name: Classification, Length: 500, dtype: int64

Met highlights

The object "hdf" contains an efficient copy of all the "Met highlights" items, which number around 2,000

hdf['Object Number'].count()
1967

How many of the Met highlight objects are declared as "public domain"?

This is the upper bound on the number of highlights that could be expected to be in Commons with an image.

hpddf = df[(df['Is Public Domain'] == True) & (df['Is Highlight'] == True)].copy()
hpddf['Object Number'].count()
1399
# Make a list of Object IDs (not Object Numbers)
hlist = hdf['Object ID'].tolist()

# Make a string of the Object IDs, each in double quotes, space separated
hstring = ' '.join('"' + str(item) + '"' for item in hlist)
len(hstring.split())
1967
hstring.split()
['"200"',
 '"237"',
 '"282"',
 '"364"',
 '"674"',
 '"802"',
 '"1029"',
 '"1047"',
 '"1083"',
 '"1084"',
 '"1503"',
 '"1524"',
 '"1674"',
 '"1815"',
 '"1997"',
 '"2059"',
 '"2390"',
 '"2408"',
 '"3141"',
 '"3152"',
 '"3158"',
 '"3497"',
 '"3555"',
 '"3577"',
 '"4282"',
 '"4285"',
 '"4785"',
 '"4923"',
 '"5223"',
 '"5582"',
 '"5630"',
 '"6186"',
 '"6635"',
 '"6778"',
 '"6906"',
 '"7586"',
 '"7595"',
 '"7873"',
 '"8288"',
 '"8489"',
 '"9317"',
 '"9334"',
 '"9480"',
 '"9724"',
 '"9982"',
 '"9997"',
 '"10127"',
 '"10154"',
 '"10159"',
 '"10174"',
 '"10388"',
 '"10391"',
 '"10409"',
 '"10464"',
 '"10481"',
 '"10482"',
 '"10497"',
 '"10499"',
 '"10522"',
 '"10527"',
 '"10531"',
 '"10574"',
 '"10586"',
 '"10732"',
 '"10786"',
 '"10793"',
 '"10819"',
 '"10827"',
 '"10830"',
 '"10843"',
 '"10909"',
 '"10912"',
 '"10946"',
 '"11040"',
 '"11050"',
 '"11055"',
 '"11080"',
 '"11120"',
 '"11122"',
 '"11130"',
 '"11133"',
 '"11207"',
 '"11227"',
 '"11234"',
 '"11263"',
 '"11311"',
 '"11329"',
 '"11375"',
 '"11383"',
 '"11396"',
 '"11417"',
 '"11476"',
 '"11477"',
 '"11494"',
 '"11605"',
 '"11619"',
 '"11680"',
 '"11707"',
 '"11734"',
 '"11737"',
 '"11789"',
 '"11790"',
 '"11792"',
 '"11797"',
 '"11865"',
 '"11900"',
 '"11999"',
 '"12004"',
 '"12012"',
 '"12015"',
 '"12019"',
 '"12127"',
 '"12138"',
 '"12388"',
 '"12600"',
 '"12602"',
 '"12613"',
 '"12649"',
 '"12667"',
 '"12702"',
 '"12828"',
 '"13052"',
 '"13084"',
 '"13134"',
 '"13137"',
 '"13211"',
 '"13215"',
 '"13315"',
 '"13471"',
 '"13606"',
 '"13647"',
 '"13758"',
 '"13763"',
 '"14049"',
 '"14092"',
 '"14282"',
 '"14336"',
 '"14380"',
 '"14430"',
 '"14472"',
 '"14488"',
 '"14494"',
 '"14871"',
 '"14930"',
 '"14931"',
 '"14932"',
 '"14972"',
 '"15387"',
 '"15541"',
 '"15583"',
 '"15588"',
 '"15590"',
 '"16255"',
 '"16571"',
 '"16577"',
 '"16578"',
 '"16584"',
 '"16687"',
 '"16863"',
 '"16947"',
 '"17053"',
 '"17066"',
 '"17139"',
 '"17447"',
 '"19261"',
 '"20144"',
 '"20612"',
 '"20615"',
 '"21698"',
 '"21940"',
 '"22275"',
 '"22387"',
 '"22405"',
 '"22506"',
 '"22521"',
 '"22631"',
 '"22634"',
 '"22739"',
 '"22769"',
 '"22860"',
 '"22871"',
 '"22876"',
 '"22914"',
 '"22932"',
 '"23026"',
 '"23205"',
 '"23216"',
 '"23367"',
 '"23939"',
 '"23944"',
 '"23948"',
 '"24014"',
 '"24320"',
 '"24623"',
 '"24648"',
 '"24671"',
 '"24681"',
 '"24685"',
 '"24686"',
 '"24693"',
 '"24813"',
 '"24832"',
 '"24860"',
 '"24861"',
 '"24865"',
 '"24900"',
 '"24907"',
 '"24927"',
 '"24931"',
 '"24937"',
 '"24948"',
 '"24953"',
 '"24957"',
 '"24960"',
 '"24975"',
 '"25111"',
 '"27789"',
 '"27790"',
 '"27791"',
 '"27792"',
 '"27936"',
 '"35650"',
 '"35775"',
 '"36029"',
 '"36131"',
 '"37145"',
 '"37743"',
 '"37788"',
 '"37801"',
 '"37942"',
 '"38133"',
 '"38162"',
 '"38198"',
 '"38341"',
 '"38468"',
 '"38574"',
 '"38648"',
 '"39097"',
 '"39637"',
 '"39649"',
 '"39666"',
 '"39668"',
 '"39707"',
 '"39733"',
 '"39738"',
 '"39901"',
 '"39915"',
 '"39918"',
 '"39936"',
 '"39957"',
 '"40055"',
 '"40081"',
 '"42162"',
 '"42163"',
 '"42178"',
 '"42183"',
 '"42229"',
 '"42716"',
 '"44696"',
 '"44858"',
 '"44859"',
 '"44918"',
 '"45428"',
 '"45432"',
 '"48948"',
 '"49156"',
 '"50342"',
 '"50688"',
 '"56220"',
 '"59669"',
 '"60870"',
 '"61429"',
 '"65397"',
 '"65576"',
 '"72498"',
 '"74832"',
 '"75960"',
 '"76974"',
 '"79048"',
 '"79091"',
 '"79101"',
 '"79142"',
 '"79220"',
 '"79269"',
 '"79585"',
 '"79778"',
 '"79884"',
 '"79893"',
 '"80208"',
 '"80378"',
 '"80547"',
 '"80591"',
 '"80790"',
 '"80840"',
 '"80911"',
 '"81100"',
 '"81101"',
 '"81102"',
 '"81103"',
 '"81104"',
 '"81105"',
 '"81106"',
 '"81107"',
 '"81108"',
 '"81110"',
 '"81111"',
 '"81112"',
 '"81113"',
 '"81114"',
 '"81115"',
 '"81116"',
 '"81118"',
 '"81119"',
 '"81121"',
 '"81122"',
 '"81123"',
 '"81124"',
 '"81125"',
 '"81127"',
 '"81128"',
 '"81130"',
 '"81131"',
 '"81132"',
 '"81134"',
 '"81135"',
 '"81136"',
 '"81137"',
 '"81138"',
 '"81139"',
 '"81140"',
 '"81141"',
 '"81142"',
 '"81143"',
 '"81168"',
 '"81387"',
 '"81453"',
 '"81460"',
 '"81468"',
 '"81472"',
 '"81476"',
 '"81480"',
 '"81490"',
 '"81558"',
 '"81678"',
 '"81692"',
 '"81754"',
 '"81781"',
 '"82103"',
 '"82105"',
 '"82170"',
 '"82419"',
 '"82426"',
 '"82433"',
 '"82443"',
 '"82446"',
 '"82879"',
 '"82880"',
 '"82981"',
 '"83175"',
 '"83179"',
 '"83209"',
 '"83234"',
 '"83242"',
 '"83250"',
 '"83259"',
 '"83442"',
 '"83443"',
 '"83605"',
 '"84034"',
 '"87016"',
 '"87370"',
 '"87613"',
 '"88206"',
 '"88645"',
 '"95534"',
 '"96434"',
 '"107375"',
 '"107620"',
 '"112900"',
 '"130481"',
 '"159388"',
 '"187784"',
 '"189164"',
 '"191259"',
 '"191803"',
 '"191843"',
 '"192716"',
 '"192727"',
 '"192729"',
 '"193344"',
 '"193506"',
 '"193606"',
 '"193614"',
 '"193632"',
 '"194243"',
 '"194432"',
 '"194622"',
 '"195223"',
 '"195456"',
 '"195473"',
 '"196439"',
 '"196910"',
 '"197462"',
 '"198556"',
 '"198715"',
 '"199003"',
 '"199404"',
 '"199410"',
 '"199674"',
 '"199708"',
 '"199737"',
 '"200668"',
 '"201633"',
 '"201862"',
 '"201895"',
 '"202115"',
 '"202141"',
 '"202192"',
 '"202614"',
 '"202718"',
 '"202996"',
 '"203008"',
 '"204533"',
 '"204758"',
 '"204804"',
 '"204812"',
 '"204896"',
 '"205116"',
 '"205250"',
 '"205351"',
 '"205485"',
 '"205526"',
 '"206045"',
 '"206399"',
 '"206499"',
 '"206587"',
 '"206918"',
 '"206976"',
 '"206989"',
 '"206990"',
 '"207032"',
 '"207394"',
 '"207667"',
 '"207754"',
 '"207797"',
 '"208149"',
 '"208523"',
 '"208555"',
 '"208816"',
 '"209028"',
 '"209063"',
 '"209104"',
 '"209329"',
 '"211383"',
 '"211486"',
 '"230011"',
 '"231564"',
 '"231667"',
 '"232038"',
 '"232119"',
 '"236643"',
 '"236688"',
 '"236691"',
 '"239584"',
 '"242006"',
 '"242008"',
 '"242408"',
 '"243823"',
 '"245376"',
 '"245787"',
 '"247008"',
 '"247009"',
 '"247017"',
 '"247020"',
 '"247117"',
 '"247173"',
 '"247916"',
 '"247964"',
 '"247967"',
 '"247993"',
 '"248132"',
 '"248140"',
 '"248268"',
 '"248466"',
 '"248483"',
 '"248499"',
 '"248579"',
 '"248644"',
 '"248696"',
 '"248851"',
 '"248876"',
 '"248891"',
 '"248892"',
 '"248899"',
 '"248902"',
 '"248904"',
 '"249186"',
 '"249222"',
 '"249223"',
 '"249228"',
 '"249414"',
 '"250551"',
 '"250939"',
 '"250945"',
 '"250951"',
 '"251050"',
 '"251428"',
 '"251476"',
 '"251532"',
 '"251838"',
 '"251929"',
 '"251935"',
 '"252451"',
 '"252452"',
 '"252453"',
 '"252884"',
 '"252890"',
 '"252948"',
 '"252973"',
 '"253050"',
 '"253056"',
 '"253135"',
 '"253343"',
 '"253348"',
 '"253349"',
 '"253351"',
 '"253370"',
 '"253373"',
 '"253505"',
 '"253566"',
 '"253592"',
 '"254473"',
 '"254478"',
 '"254502"',
 '"254587"',
 '"254589"',
 '"254595"',
 '"254597"',
 '"254613"',
 '"254649"',
 '"254779"',
 '"254801"',
 '"254819"',
 '"254825"',
 '"254842"',
 '"254843"',
 '"254896"',
 '"254923"',
 '"255122"',
 '"255154"',
 '"255275"',
 '"255344"',
 '"255367"',
 '"255391"',
 '"255408"',
 '"255417"',
 '"255949"',
 '"255973"',
 '"256126"',
 '"256184"',
 '"256205"',
 '"256403"',
 '"256548"',
 '"256570"',
 '"256846"',
 '"256861"',
 '"256970"',
 '"256974"',
 '"256975"',
 '"256976"',
 '"256977"',
 '"256978"',
 '"257511"',
 '"257603"',
 '"257640"',
 '"257875"',
 '"257880"',
 '"259797"',
 '"261941"',
 '"262612"',
 '"262720"',
 '"264688"',
 '"264711"',
 '"265047"',
 '"265132"',
 '"265133"',
 '"265166"',
 '"265447"',
 '"265465"',
 '"265543"',
 '"265550"',
 '"265553"',
 '"265556"',
 '"265563"',
 '"265616"',
 '"265726"',
 '"265903"',
 '"265904"',
 '"266102"',
 '"266121"',
 '"266133"',
 '"266162"',
 '"266199"',
 '"266262"',
 '"266284"',
 '"266332"',
 '"266349"',
 '"266427"',
 '"266480"',
 '"266537"',
 '"266538"',
 '"266644"',
 '"266709"',
 '"266746"',
 '"266784"',
 '"266850"',
 '"266853"',
 '"266855"',
 '"266859"',
 '"266982"',
 '"266983"',
 '"267010"',
 '"267019"',
 '"267041"',
 '"267042"',
 '"267087"',
 '"267124"',
 '"267175"',
 '"267193"',
 '"267271"',
 '"267362"',
 '"267363"',
 '"267426"',
 '"267530"',
 '"267717"',
 '"267748"',
 '"267757"',
 '"267775"',
 '"267803"',
 '"267838"',
 '"267839"',
 '"267842"',
 '"267891"',
 '"268621"',
 '"269436"',
 '"269442"',
 '"271570"',
 '"271615"',
 '"271716"',
 '"271885"',
 '"271890"',
 '"271948"',
 '"271954"',
 '"271963"',
 '"272071"',
 '"281940"',
 '"281977"',
 '"282004"',
 '"282022"',
 '"282039"',
 '"282040"',
 '"282043"',
 '"282046"',
 '"282051"',
 '"282163"',
 '"282190"',
 '"282234"',
 '"282515"',
 '"282602"',
 '"282740"',
 '"282756"',
 '"282757"',
 '"282774"',
 '"282778"',
 '"282883"',
 '"283099"',
 '"283121"',
 '"283222"',
 '"283236"',
 '"283277"',
 '"283626"',
 '"283720"',
 '"283742"',
 '"284086"',
 '"284572"',
 '"284712"',
 '"285142"',
 '"286725"',
 '"291739"',
 '"291951"',
 '"307474"',
 '"307630"',
 '"307651"',
 '"307975"',
 '"309427"',
 '"309428"',
 '"309861"',
 '"309868"',
 '"309959"',
 '"310073"',
 '"310175"',
 '"310279"',
 '"310325"',
 '"310364"',
 '"310453"',
 '"310454"',
 '"310542"',
 '"310552"',
 '"310563"',
 '"310604"',
 '"310652"',
 '"310765"',
 '"310870"',
 '"310950"',
 '"310960"',
 '"311021"',
 '"311024"',
 '"311159"',
 '"311171"',
 '"311183"',
 '"311237"',
 '"311290"',
 '"311294"',
 '"311336"',
 '"311651"',
 '"311944"',
 '"311950"',
 '"312119"',
 '"312180"',
 '"312231"',
 '"312288"',
 '"312290"',
 '"312336"',
 '"312342"',
 '"312429"',
 '"312460"',
 '"312602"',
 '"312677"',
 '"312747"',
 '"312781"',
 '"313240"',
 '"313256"',
 '"313313"',
 '"313327"',
 '"313330"',
 '"313386"',
 '"313546"',
 '"313629"',
 '"313654"',
 '"313658"',
 '"313697"',
 '"313780"',
 '"313830"',
 '"314034"',
 '"314204"',
 '"314299"',
 '"314362"',
 '"314366"',
 '"314370"',
 '"314704"',
 '"314826"',
 '"315786"',
 '"316007"',
 '"316008"',
 '"316145"',
 '"316173"',
 '"316330"',
 '"316393"',
 '"316442"',
 '"316457"',
 '"316503"',
 '"316594"',
 '"316661"',
 '"316682"',
 '"316945"',
 '"317196"',
 '"317618"',
 '"317735"',
 '"317736"',
 '"317747"',
 '"317792"',
 '"317793"',
 '"317823"',
 '"318345"',
 '"318622"',
 '"318667"',
 '"318764"',
 '"318765"',
 '"318899"',
 '"319033"',
 '"319233"',
 '"319458"',
 '"319872"',
 '"320053"',
 '"322114"',
 '"322375"',
 '"322443"',
 '"322483"',
 '"322485"',
 '"322499"',
 '"322585"',
 '"322608"',
 '"322609"',
 '"322611"',
 '"322626"',
 '"322889"',
 '"322890"',
 '"322903"',
 '"323163"',
 '"323178"',
 '"323735"',
 '"324008"',
 '"324029"',
 '"324075"',
 '"324111"',
 '"324155"',
 '"324158"',
 '"324290"',
 '"324291"',
 '"324444"',
 '"324492"',
 '"324687"',
 '"324695"',
 '"324739"',
 '"324830"',
 '"324917"',
 '"325005"',
 '"325089"',
 '"325511"',
 '"325562"',
 '"325584"',
 '"325688"',
 '"325693"',
 '"325710"',
 '"325955"',
 '"326230"',
 '"326243"',
 '"326374"',
 '"326623"',
 '"326655"',
 '"326695"',
 '"327066"',
 '"327104"',
 '"327369"',
 '"327399"',
 '"327401"',
 '"327409"',
 '"327427"',
 '"327434"',
 '"327497"',
 '"327518"',
 '"327520"',
 '"327544"',
 '"327830"',
 '"328905"',
 '"328961"',
 '"329072"',
 '"329073"',
 '"329074"',
 '"329075"',
 '"329076"',
 '"329077"',
 '"329078"',
 '"329079"',
 '"329081"',
 '"329090"',
 '"329091"',
 '"329227"',
 '"331619"',
 '"333813"',
 '"333915"',
 '"334002"',
 '"334245"',
 '"334650"',
 '"334710"',
 '"334769"',
 '"335240"',
 '"335287"',
 '"335663"',
 '"335668"',
 '"336046"',
 '"336162"',
 '"336222"',
 '"336228"',
 '"336256"',
 '"336259"',
 '"336327"',
 '"336489"',
 '"336773"',
 '"337057"',
 '"337058"',
 '"337061"',
 '"337062"',
 '"337063"',
 '"337064"',
 '"337068"',
 '"337069"',
 '"337070"',
 '"337071"',
 '"337075"',
 '"337088"',
 '"337105"',
 '"337172"',
 '"337487"',
 '"337489"',
 '"337490"',
 '"337491"',
 '"337493"',
 '"337494"',
 '"337496"',
 '"337497"',
 '"337498"',
 '"337499"',
 '"337500"',
 '"337632"',
 '"337699"',
 '"337700"',
 '"337702"',
 '"337725"',
 '"337844"',
 '"337894"',
 '"339751"',
 '"352328"',
 '"354631"',
 '"354644"',
 '"360255"',
 '"365017"',
 '"383125"',
 '"395485"',
 '"413556"',
 '"435600"',
 '"435621"',
 '"435641"',
 '"435702"',
 '"435728"',
 '"435739"',
 '"435802"',
 '"435809"',
 '"435817"',
 '"435826"',
 '"435839"',
 '"435844"',
 '"435851"',
 '"435853"',
 '"435868"',
 '"435876"',
 '"435882"',
 '"435888"',
 '"435896"',
 '"435908"',
 '"435962"',
 '"435984"',
 '"436002"',
 '"436037"',
 '"436095"',
 '"436101"',
 '"436105"',
 '"436106"',
 '"436121"',
 '"436173"',
 '"436244"',
 '"436253"',
 '"436282"',
 '"436322"',
 '"436435"',
 '"436504"',
 '"436528"',
 '"436532"',
 '"436535"',
 '"436545"',
 '"436573"',
 '"436575"',
 '"436579"',
 '"436603"',
 '"436616"',
 '"436622"',
 '"436658"',
 '"436819"',
 '"436838"',
 '"436851"',
 '"436892"',
 '"436896"',
 '"436918"',
 '"436947"',
 '"436964"',
 '"436966"',
 '"437056"',
 '"437097"',
 '"437127"',
 '"437133"',
 '"437153"',
 '"437175"',
 '"437261"',
 '"437283"',
 '"437299"',
 '"437326"',
 '"437329"',
 '"437344"',
 '"437372"',
 '"437394"',
 ...]
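
The quoted, space-separated string built above has the shape a SPARQL VALUES clause expects. The notebook sends all of the IDs in a single request; if that ever becomes too large for the endpoint, one option is to split the IDs into batches first. The following is a minimal sketch of that idea. The helper name `values_batches` and the batch size are assumptions for illustration and are not part of the original notebook.

```python
def values_batches(ids, batch_size=500):
    """Yield one VALUES-ready string per batch of IDs.

    Each string holds the IDs in double quotes, space separated,
    matching the format of hstring. batch_size is an arbitrary choice.
    """
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        yield ' '.join('"' + str(item) + '"' for item in chunk)


# A few Object IDs taken from the head of the list above
sample_ids = [200, 237, 282, 364, 674]
batches = list(values_batches(sample_ids, batch_size=2))
for batch in batches:
    print(batch)
```

Each batch could then be substituted into the `%s` placeholder of a query template, with the per-batch results combined afterwards.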

Checking Met Highlights against Wikidata

The variable "hstring" is a manual list of the 1,900+ Met highlight objects, stored as quoted Object IDs. How many of these are in Wikidata, by manual match?

import urllib.parse
from IPython.display import IFrame
baseurl='https://query.wikidata.org/embed.html#'

def wdq(query='', width=800, height=500):
    return IFrame(baseurl + urllib.parse.quote(query), width=width, height=height)

import requests

wikidata_api_url = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql'

How many of the 1,900+ Met highlight objects are in Wikidata now?

query = '''
SELECT (COUNT(?item) as ?count) WHERE { 
  VALUES ?ids { %s } 
  ?item wdt:P3634 ?ids.
}
'''
query = query % hstring
#query
data = requests.post(wikidata_api_url, data={'query': query, 'format': 'json'}).json()

print ('Of the', len(hstring.split()), 'Met highlight objects, Wikidata contains', int(data['results']['bindings'][0]['count']['value']), 'determined by manual object ID matches')
Of the 1967 Met highlight objects, Wikidata contains 1598 determined by manual object ID matches
query = '''
SELECT (COUNT(?item) as ?count) WHERE { 
  BIND (wd:Q160236 AS ?institution)
  ?item wdt:P195 ?institution .
  ?item p:P195 [ ps:P195 ?id ; pq:P2868 wd:Q29188408 ]  .
}
'''
data = requests.post(wikidata_api_url, data={'query': query, 'format': 'json'}).json()

print ('Wikidata has', int(data['results']['bindings'][0]['count']['value']), 'items with collection->Met, subject has role->collection highlight')
Wikidata has 1477 items with collection->Met, subject has role->collection highlight
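
Both count queries above read the result with the same chain of dictionary lookups, which raises an error if the endpoint returns no bindings. A small helper can make that step more defensive. This is only a sketch: the function name `sparql_count` is hypothetical, and the sample response is hand-written to imitate the JSON shape the code above indexes into, not captured from the live service.

```python
def sparql_count(data, var='count'):
    """Return the integer bound to `var` in the first result row, or 0."""
    bindings = data.get('results', {}).get('bindings', [])
    if not bindings or var not in bindings[0]:
        return 0
    return int(bindings[0][var]['value'])


# Hand-written sample in the shape of a SPARQL JSON result (illustrative only)
sample = {
    'head': {'vars': ['count']},
    'results': {'bindings': [
        {'count': {'type': 'literal', 'value': '1477'}}
    ]},
}
n = sparql_count(sample)
print('Wikidata has', n, 'items')
```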

Query for the Wikidata QIDs of all items with collection->Met and subject has role->collection highlight

query = '''
SELECT ?item WHERE { 
  BIND (wd:Q160236 AS ?institution)
  ?item wdt:P195 ?institution .
  ?item p:P195 [ ps:P195 ?id ; pq:P2868 wd:Q29188408 ]  .
}
'''
data = requests.post(wikidata_api_url, data={'query': query, 'format': 'json'}).json()

resultarray = []
for item in data['results']['bindings']:
    resultarray.append({
        'qid': int(item['item']['value'].replace('http://www.wikidata.org/entity/Q',''))})

df_highlights_byrole = pd.DataFrame(resultarray)
print(len(df_highlights_byrole))
df_highlights_byrole.info()
1477
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1477 entries, 0 to 1476
Data columns (total 1 columns):
qid    1477 non-null int64
dtypes: int64(1)
memory usage: 11.6 KB
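
The loop that turns entity URIs into integer QIDs appears twice in this notebook, so it could be factored into one function. The sketch below assumes the same response shape as above; the name `qids_from_bindings` is hypothetical, and the sample data is hand-written using two QIDs that appear later in this notebook.

```python
ENTITY_PREFIX = 'http://www.wikidata.org/entity/Q'


def qids_from_bindings(data, var='item'):
    """Extract numeric QIDs from the entity URIs bound to `var`."""
    qids = []
    for row in data['results']['bindings']:
        uri = row[var]['value']
        suffix = uri[len(ENTITY_PREFIX):]
        # Skip anything that is not a plain item URI such as .../entity/Q42
        if uri.startswith(ENTITY_PREFIX) and suffix.isdigit():
            qids.append(int(suffix))
    return qids


# Hand-written sample response (illustrative only)
sample = {'results': {'bindings': [
    {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q19911548'}},
    {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q867510'}},
]}}
qids = qids_from_bindings(sample)
print(qids)
```

The resulting list can be passed to `pd.DataFrame({'qid': qids})` to get the same single-column frame as above.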

Perform query to get all Wikidata QIDs that match the manual list of highlight items (hstring)

query = '''
SELECT ?item WHERE { 
  VALUES ?ids { %s } 
  ?item wdt:P3634 ?ids .
}
'''
query = query % hstring

data = requests.post(wikidata_api_url, data={'query': query, 'format': 'json'}).json()
resultarray = []
for item in data['results']['bindings']:
    resultarray.append({
        'qid': int(item['item']['value'].replace('http://www.wikidata.org/entity/Q',''))})

df_highlights_byvalues = pd.DataFrame(resultarray)
print(len(df_highlights_byvalues))
df_highlights_byvalues.info()
1598
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1598 entries, 0 to 1597
Data columns (total 1 columns):
qid    1598 non-null int64
dtypes: int64(1)
memory usage: 12.6 KB

Differences between the two highlights lists

If we compare the list of highlights found through "collection/subject has role" with the list matched manually by object ID, we find some discrepancies. These seem to be Wikidata items that have "subject has role"->"collection highlight" but come back as false when checked against the Met API.

All highlight items that have "collection->Met" and "subject has role->collection highlight" but are not in the manually matched rows

This means they are old or mislabeled entries in Wikidata, since the Met does not consider them highlights anymore. In a call, Jennie Choi of The Met confirmed that these are indeed former highlights and that the claim should be removed.

df_highlights_byrole[~df_highlights_byrole.qid.isin(df_highlights_byvalues.qid)]
qid
207 19911548
502 867510
509 18009719
605 19911857
723 19912313
918 20167211
1100 72650
1116 19905335
1199 55835403

All highlight items in the manually matched rows that have no statement "collection->Met" with "subject has role"

This implies that the statement with qualifier needs to be added to Wikidata.

df_highlights_byvalues[~df_highlights_byvalues.qid.isin(df_highlights_byrole.qid)]
qid
26 20200739
35 20172190
64 17008188
93 20176821
116 19922089
142 50943530
148 20189686
186 19912688
194 3937623
196 17275914
197 22443796
201 19920995
217 19911495
224 20169728
285 29384006
290 19911667
334 15290965
342 20190342
373 20192152
379 20184222
384 19917767
397 20189884
416 942523
420 19911654
452 19911604
483 3713914
492 19918956
496 293990
504 19336630
517 20181967
... ...
1262 19917726
1288 3984608
1310 19922092
1324 5026931
1327 19918404
1352 19918987
1356 19922087
1367 20189766
1370 18177483
1375 19917476
1381 20189430
1382 20191094
1394 19904853
1424 19919764
1426 20200599
1427 7994688
1428 16987384
1453 19919692
1462 860930
1469 19911684
1490 19922184
1516 19922659
1519 20190159
1544 19920648
1550 20199665
1552 20198815
1560 933665
1562 19917877
1574 20189382
1594 19917777

130 rows × 1 columns
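
The two comparisons above can also be done in one pass with plain Python sets, which gives a quick consistency check: if no QID is repeated, the lists should overlap in 1477 - 9 = 1598 - 130 = 1468 items. The following is a sketch of that approach, not the method used in this notebook. The function name `compare_qids` is hypothetical and the toy inputs are made up for illustration.

```python
def compare_qids(by_role, by_values):
    """Split two QID lists into role-only, values-only and shared items."""
    role_set, values_set = set(by_role), set(by_values)
    return {
        'only_role': sorted(role_set - values_set),      # candidates for claim removal
        'only_values': sorted(values_set - role_set),    # candidates for claim addition
        'both': sorted(role_set & values_set),
    }


# Toy inputs (illustrative values, not real query results)
by_role = [101, 102, 103]
by_values = [102, 103, 104, 105]
diff = compare_qids(by_role, by_values)
print(diff)
```

With the real data, the inputs would be `df_highlights_byrole['qid'].tolist()` and `df_highlights_byvalues['qid'].tolist()`.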

highlights_toadd = df_highlights_byvalues[~df_highlights_byvalues.qid.isin(df_highlights_byrole.qid)]['qid'].tolist()

qs_commands = '\n'.join('Q' + str(item) + '|P195|Q160236|P2868|Q29188408' for item in highlights_toadd)
print (qs_commands)
Q20200739|P195|Q160236|P2868|Q29188408
Q20172190|P195|Q160236|P2868|Q29188408
Q17008188|P195|Q160236|P2868|Q29188408
Q20176821|P195|Q160236|P2868|Q29188408
Q19922089|P195|Q160236|P2868|Q29188408
Q50943530|P195|Q160236|P2868|Q29188408
Q20189686|P195|Q160236|P2868|Q29188408
Q19912688|P195|Q160236|P2868|Q29188408
Q3937623|P195|Q160236|P2868|Q29188408
Q17275914|P195|Q160236|P2868|Q29188408
Q22443796|P195|Q160236|P2868|Q29188408
Q19920995|P195|Q160236|P2868|Q29188408
Q19911495|P195|Q160236|P2868|Q29188408
Q20169728|P195|Q160236|P2868|Q29188408
Q29384006|P195|Q160236|P2868|Q29188408
Q19911667|P195|Q160236|P2868|Q29188408
Q15290965|P195|Q160236|P2868|Q29188408
Q20190342|P195|Q160236|P2868|Q29188408
Q20192152|P195|Q160236|P2868|Q29188408
Q20184222|P195|Q160236|P2868|Q29188408
Q19917767|P195|Q160236|P2868|Q29188408
Q20189884|P195|Q160236|P2868|Q29188408
Q942523|P195|Q160236|P2868|Q29188408
Q19911654|P195|Q160236|P2868|Q29188408
Q19911604|P195|Q160236|P2868|Q29188408
Q3713914|P195|Q160236|P2868|Q29188408
Q19918956|P195|Q160236|P2868|Q29188408
Q293990|P195|Q160236|P2868|Q29188408
Q19336630|P195|Q160236|P2868|Q29188408
Q20181967|P195|Q160236|P2868|Q29188408
Q19920925|P195|Q160236|P2868|Q29188408
Q53445802|P195|Q160236|P2868|Q29188408
Q1671114|P195|Q160236|P2868|Q29188408
Q20189411|P195|Q160236|P2868|Q29188408
Q19922146|P195|Q160236|P2868|Q29188408
Q3395798|P195|Q160236|P2868|Q29188408
Q19914543|P195|Q160236|P2868|Q29188408
Q2873092|P195|Q160236|P2868|Q29188408
Q3976307|P195|Q160236|P2868|Q29188408
Q20190096|P195|Q160236|P2868|Q29188408
Q7979098|P195|Q160236|P2868|Q29188408
Q19921181|P195|Q160236|P2868|Q29188408
Q20199600|P195|Q160236|P2868|Q29188408
Q19917610|P195|Q160236|P2868|Q29188408
Q29385435|P195|Q160236|P2868|Q29188408
Q29383745|P195|Q160236|P2868|Q29188408
Q19917692|P195|Q160236|P2868|Q29188408
Q20183401|P195|Q160236|P2868|Q29188408
Q20190456|P195|Q160236|P2868|Q29188408
Q19920959|P195|Q160236|P2868|Q29188408
Q20198067|P195|Q160236|P2868|Q29188408
Q25936897|P195|Q160236|P2868|Q29188408
Q20190867|P195|Q160236|P2868|Q29188408
Q20189992|P195|Q160236|P2868|Q29188408
Q20189474|P195|Q160236|P2868|Q29188408
Q20184249|P195|Q160236|P2868|Q29188408
Q20185035|P195|Q160236|P2868|Q29188408
Q20198282|P195|Q160236|P2868|Q29188408
Q19919026|P195|Q160236|P2868|Q29188408
Q20200039|P195|Q160236|P2868|Q29188408
Q20199228|P195|Q160236|P2868|Q29188408
Q19921217|P195|Q160236|P2868|Q29188408
Q20190003|P195|Q160236|P2868|Q29188408
Q20189406|P195|Q160236|P2868|Q29188408
Q20190618|P195|Q160236|P2868|Q29188408
Q19923418|P195|Q160236|P2868|Q29188408
Q19917838|P195|Q160236|P2868|Q29188408
Q19917972|P195|Q160236|P2868|Q29188408
Q20167086|P195|Q160236|P2868|Q29188408
Q20190104|P195|Q160236|P2868|Q29188408
Q20190484|P195|Q160236|P2868|Q29188408
Q20189694|P195|Q160236|P2868|Q29188408
Q20189416|P195|Q160236|P2868|Q29188408
Q20182015|P195|Q160236|P2868|Q29188408
Q20191641|P195|Q160236|P2868|Q29188408
Q19920734|P195|Q160236|P2868|Q29188408
Q3622793|P195|Q160236|P2868|Q29188408
Q19920774|P195|Q160236|P2868|Q29188408
Q20189483|P195|Q160236|P2868|Q29188408
Q29049154|P195|Q160236|P2868|Q29188408
Q56741122|P195|Q160236|P2868|Q29188408
Q20190879|P195|Q160236|P2868|Q29188408
Q20190701|P195|Q160236|P2868|Q29188408
Q29385502|P195|Q160236|P2868|Q29188408
Q20190490|P195|Q160236|P2868|Q29188408
Q20191901|P195|Q160236|P2868|Q29188408
Q19917481|P195|Q160236|P2868|Q29188408
Q19905213|P195|Q160236|P2868|Q29188408
Q19905165|P195|Q160236|P2868|Q29188408
Q19905264|P195|Q160236|P2868|Q29188408
Q20183057|P195|Q160236|P2868|Q29188408
Q19905368|P195|Q160236|P2868|Q29188408
Q20189913|P195|Q160236|P2868|Q29188408
Q19918560|P195|Q160236|P2868|Q29188408
Q20185044|P195|Q160236|P2868|Q29188408
Q20198810|P195|Q160236|P2868|Q29188408
Q20192123|P195|Q160236|P2868|Q29188408
Q5179376|P195|Q160236|P2868|Q29188408
Q19912312|P195|Q160236|P2868|Q29188408
Q20198484|P195|Q160236|P2868|Q29188408
Q19917726|P195|Q160236|P2868|Q29188408
Q3984608|P195|Q160236|P2868|Q29188408
Q19922092|P195|Q160236|P2868|Q29188408
Q5026931|P195|Q160236|P2868|Q29188408
Q19918404|P195|Q160236|P2868|Q29188408
Q19918987|P195|Q160236|P2868|Q29188408
Q19922087|P195|Q160236|P2868|Q29188408
Q20189766|P195|Q160236|P2868|Q29188408
Q18177483|P195|Q160236|P2868|Q29188408
Q19917476|P195|Q160236|P2868|Q29188408
Q20189430|P195|Q160236|P2868|Q29188408
Q20191094|P195|Q160236|P2868|Q29188408
Q19904853|P195|Q160236|P2868|Q29188408
Q19919764|P195|Q160236|P2868|Q29188408
Q20200599|P195|Q160236|P2868|Q29188408
Q7994688|P195|Q160236|P2868|Q29188408
Q16987384|P195|Q160236|P2868|Q29188408
Q19919692|P195|Q160236|P2868|Q29188408
Q860930|P195|Q160236|P2868|Q29188408
Q19911684|P195|Q160236|P2868|Q29188408
Q19922184|P195|Q160236|P2868|Q29188408
Q19922659|P195|Q160236|P2868|Q29188408
Q20190159|P195|Q160236|P2868|Q29188408
Q19920648|P195|Q160236|P2868|Q29188408
Q20199665|P195|Q160236|P2868|Q29188408
Q20198815|P195|Q160236|P2868|Q29188408
Q933665|P195|Q160236|P2868|Q29188408
Q19917877|P195|Q160236|P2868|Q29188408
Q20189382|P195|Q160236|P2868|Q29188408
Q19917777|P195|Q160236|P2868|Q29188408
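
The list above is the result of an anti-join followed by string formatting: keep the QIDs found in one result set but absent from the other, then emit one QuickStatements line per QID. Each line appears to assert collection (P195) = The Metropolitan Museum of Art (Q160236), qualified with subject has role (P2868) pointing at Q29188408, which the variable names suggest is the highlight role. The sketch below reproduces that pattern on toy data so it can be checked in isolation. The function name `quickstatements_for_missing` and the sample QIDs are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the two query results (hypothetical QIDs, not real data)
by_values = pd.DataFrame({'qid': [101, 102, 103, 104]})
by_role = pd.DataFrame({'qid': [102, 104]})

def quickstatements_for_missing(found, already_tagged,
                                statement='P195|Q160236|P2868|Q29188408'):
    """Return QuickStatements lines for QIDs in `found` but not in `already_tagged`."""
    # Anti-join: keep rows of `found` whose qid does not appear in `already_tagged`
    missing = found.loc[~found['qid'].isin(already_tagged['qid']), 'qid'].tolist()
    # One pipe-delimited command per item, prefixed with 'Q'
    return '\n'.join(f'Q{qid}|{statement}' for qid in missing)

commands = quickstatements_for_missing(by_values, by_role)
print(commands)
```

Keeping the statement as a parameter makes it easy to reuse the same anti-join for a different property or qualifier, assuming the target items are again identified by a numeric `qid` column.
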
# Read the local toah CSV (a single ObjectID column) into a pandas dataframe
url = 'toah-20190228.csv'
toahobjects = pd.read_csv(url, low_memory=False)
toahobjects.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8017 entries, 0 to 8016
Data columns (total 1 columns):
ObjectID    8017 non-null int64
dtypes: int64(1)
memory usage: 62.7 KB
toahobjects.head(10)
ObjectID
0 239584
1 437920
2 436634
3 437376
4 436257
5 436427
6 437328
7 436558
8 437588
9 436752