Of the twelve thousand creators and depicted persons in the Amsterdam Museum collection, we were able to link 465 to one of the twenty thousand biographies in the Ecartico database. This post describes how we did it.
At first glance, 465 doesn’t sound like an awful lot, considering the twelve thousand persons on one side and more than twenty thousand on the other. Please keep in mind that the Amsterdam Museum collection spans more than five centuries, while Ecartico focuses on artists working in 17th-century Amsterdam. By nature, false negatives are hard to find, but the fact that we couldn’t think of a single person who should’ve been matched and wasn’t suggests we didn’t do a bad job.
The fields to match on were: name, date of birth, date of death and RKD URI (more on RKD URIs in a previous post). We considered ‘profession’ as well, but at the Amsterdam Museum this field was left empty most of the time, and mapping ‘persons that created objects tagged as silverware’ to the term ‘silversmith’ would have been time-consuming.
Not knowing which combinations would yield the best results, we decided to match on each of these fields separately and save the results in a matrix. So two people who were born in the same year and died in the same year scored a ‘1’ in the birth and death year column of the matrix, regardless of their names.
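To make the matrix idea concrete, here is a minimal sketch in Python. It is not the code we ran: the `Person` class, the field names, the record ids and the second Ecartico record are all made up for illustration, and the only real data is the Jan Claesz and Wytmans pair mentioned in this post.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Person:
    ident: str                         # hypothetical local record id
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    rkd_uri: Optional[str] = None


def same(a, b):
    """True only when the value is present and equal on both sides."""
    return a is not None and a == b


def match_row(a: Person, b: Person) -> dict:
    """Score each field on its own: 1 for a match, 0 otherwise."""
    return {
        "name": int(a.name == b.name),
        "years": int(same(a.birth_year, b.birth_year)
                     and same(a.death_year, b.death_year)),
        "rkd_uri": int(same(a.rkd_uri, b.rkd_uri)),
    }


def match_matrix(museum, ecartico):
    """One row of field scores for every (museum, ecartico) pair."""
    return {(a.ident, b.ident): match_row(a, b)
            for a, b in product(museum, ecartico)}


museum = [Person("am-1", "Jan Claesz", 1570, 1618)]
ecartico = [
    Person("ec-1", "Nicolaes Jansz. Wytmans", 1570, 1618),
    Person("ec-2", "Jan Claesz", 1600, 1650),   # invented record
]
matrix = match_matrix(museum, ecartico)
print(matrix[("am-1", "ec-1")])   # {'name': 0, 'years': 1, 'rkd_uri': 0}
```

Missing values never count as a match here, so two records that both lack a death year do not score a ‘1’ by accident. Comparing every pair is the simplest approach, but with collections of this size it means hundreds of millions of comparisons, so in practice you would probably group records by birth year first.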
The RKD URI proved to be the best, and only, one-field-matcher. It leaves no room for ambiguity, so only human error might result in mistakes here.
Matching persons on their exact names yielded a considerable number of false positives, because there’s more than one Jan Jansen and some fathers, like Romeyn de Hooghe, named their sons after themselves. The number of false negatives was much higher still: ‘Rembrandt’ just wouldn’t match with ‘Rembrandt Harmensz. van Rijn’.
Obviously, a fuzzy name search produced far fewer false negatives, but the number of false positives skyrocketed into the thousands. However, combining a fuzzy name match with a birth-and-death-year match (just years, not exact dates, since many records only had years) did the job very well. We were able to identify just one false positive: a Jan Claesz (1570-1618) matching Nicolaes Jansz. Wytmans, Claes Wijtmans (1570-1618).
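The combination can be sketched as follows. This post doesn’t go into the fuzzy algorithm itself, so the token comparison below, built on Python’s `difflib.SequenceMatcher` with a threshold of 0.8, is purely an assumption for illustration; the function names are hypothetical as well.

```python
import re
from difflib import SequenceMatcher


def tokens(name):
    """Lowercase name parts, with punctuation and digits stripped."""
    return re.findall(r"[^\W\d_]+", name.lower())


def fuzzy_name_match(a, b, threshold=0.8):
    """Loose on purpose: one pair of similar name parts is enough."""
    return any(
        SequenceMatcher(None, x, y).ratio() >= threshold
        for x in tokens(a)
        for y in tokens(b)
    )


def is_match(a, b):
    """Fuzzy name match AND identical, known birth and death years."""
    (name_a, born_a, died_a), (name_b, born_b, died_b) = a, b
    years_known = None not in (born_a, died_a, born_b, died_b)
    return (years_known
            and born_a == born_b
            and died_a == died_b
            and fuzzy_name_match(name_a, name_b))


# The name test alone bridges 'Rembrandt' and his full name ...
print(fuzzy_name_match("Rembrandt", "Rembrandt Harmensz. van Rijn"))  # True

# ... but the combined rule also lets through the false positive from
# this post, because 'claesz' and 'claes' are nearly identical.
print(is_match(("Jan Claesz", 1570, 1618),
               ("Nicolaes Jansz. Wytmans, Claes Wijtmans", 1570, 1618)))  # True
```

A name test this loose is only usable because the year test is strict: taken alone it would pair up anyone sharing a common first name, which is where false positives in the thousands come from.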
The final results:
- 280 matches on RKD URI alone
- 181 additional matches on the fuzzy-name-match and a birth-and-death-year-match combination
- 3 additional matches on an exact-name-match and a birth-and-death-year-match combination (implies a slight error in the fuzzy search!)
- 1 manual match: `Neeltje Willemsdr. van Zuijdtbrouck` marked as matching `Rembrandt’s moeder`.
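
The word ‘additional’ in this list suggests the rules were applied in order, each one only picking up pairs the previous ones had missed. A minimal sketch of such a cascade over matrix rows, assuming hypothetical column names (`rkd_uri`, `fuzzy_name`, `exact_name`, `years`) and invented example rows:

```python
from collections import Counter

# Rules in priority order; the first rule that fires labels the pair.
RULES = [
    ("rkd_uri", lambda r: r["rkd_uri"]),
    ("fuzzy_name+years", lambda r: r["fuzzy_name"] and r["years"]),
    ("exact_name+years", lambda r: r["exact_name"] and r["years"]),
]


def classify(row):
    """Return the label of the first matching rule, or None."""
    for label, rule in RULES:
        if rule(row):
            return label
    return None


rows = [  # invented matrix rows, one per candidate pair
    {"rkd_uri": 1, "fuzzy_name": 1, "exact_name": 1, "years": 1},
    {"rkd_uri": 0, "fuzzy_name": 1, "exact_name": 0, "years": 1},
    {"rkd_uri": 0, "fuzzy_name": 0, "exact_name": 1, "years": 1},
    {"rkd_uri": 0, "fuzzy_name": 1, "exact_name": 0, "years": 0},
]
tally = Counter(classify(row) for row in rows)

# The counts reported in the list above add up to the 465 links.
reported = {"rkd_uri": 280, "fuzzy_name+years": 181,
            "exact_name+years": 3, "manual": 1}
print(sum(reported.values()))  # 465
```

The third example row is the odd case behind the three extra matches: an exact name match that the fuzzy search missed. A fuzzy search that works properly should find every exact match too, so the third rule would never fire.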