Bilderatlas – Progress Post 2

Current progress

In the past three weeks our focus has mainly been on modifying and testing the bot that loads images into the CNN database. We encountered several problems in making it work, so we are slightly behind schedule.

In the fifth week we received the API address that allows us to communicate with the CNN server. However, we had to change a few things with respect to the original plan. Initially, after finding on the internet the images corresponding to the chosen tables from the Bilderatlas [1], the idea was to load them onto the DHCanvas, annotate them, receive a URL, and later send that URL to the CNN database in order to add the images; the CNN would have automatically taken the corresponding annotations from the URL and returned an ID for each image.

Instead, in order to load the images onto the CNN server, we were required to send the URLs of the images found on the internet directly to the CNN server, together with the corresponding annotations.

To do so more efficiently, we prepared an Excel file listing the URLs of the images together with the corresponding annotations (Figure 1). The tags follow the same style used for all the other images already present in the CNN server. Some information is missing for a few images. For example, in some cases the items are photographs, so it is meaningless to fill in fields that only apply to paintings, sculptures, or architecture. For other images, which show reliefs on ancient temples, it was impossible to find the author. However, this should not be a problem, as those fields are useful to us mainly for information visualization once the result of a query appears.

Figure 1 – Excel file containing the images from each table with the relative annotations.
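
For reference, this is roughly how each table sheet is organised: one header row, then one row per image. Columns A to D and the ID column J appear explicitly in the code below; the exact position of the remaining metadata columns is our own assumption and only meant as an orientation:

# Assumed layout of each table sheet (header in row 1, data from row 2 onwards).
# Columns A-D and J appear in the bot's code below; E-I are our assumption for the remaining fields.
COLUMNS = {
    'A': 'image URL',
    'B': 'author',
    'C': 'title',
    'D': 'date',
    'E': 'school',        # assumed position
    'F': 'form',          # assumed position
    'G': 'type',          # assumed position
    'H': 'origin',        # assumed position
    'I': 'webpage URL',   # assumed position
    'J': 'ID returned by the CNN server (filled in by the bot)',
}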

As a consequence of this change of plan, we had to adapt the bot to read the URLs and the corresponding annotations of the images directly from the Excel file. In short, with respect to the old code, we had to add a first part in which the bot extracts the data from the .xlsx file. The part of the code that accomplishes this is reported below:

import json
import openpyxl
import requests

print('Opening workbook')
wb = openpyxl.load_workbook('tables.xlsx')
sheets = ['Table46', 'Table2', 'Table25', 'Table45']

for s in range(0, len(sheets)):
    print('Reading', sheets[s])
    sheet = wb[sheets[s]]
    print('Reading rows')
    for r in range(2, sheet.max_row + 1):  # skip the header row

        imageURL = sheet['A' + str(r)].value
        if imageURL is None:
            print("Missing image URL at", sheets[s], "-", r)
            exit()
        author = sheet['B' + str(r)].value
        if author is None:
            author = ""
        title = sheet['C' + str(r)].value
        if title is None:
            title = ""
        date = sheet['D' + str(r)].value
        if date is None:
            date = ""
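
The fields used later in the upload but not shown above (school, form, type, origin, and webpage URL) are extracted from the sheet with the same pattern. Purely as an illustration, using the assumed columns E to I from the layout sketched earlier and a small hypothetical helper that collapses the repeated empty-cell checks, this could look as follows:

def read_cell(sheet, column, row):
    # Return the cell value as a string, or an empty string when the cell is empty
    value = sheet[column + str(row)].value
    return "" if value is None else str(value)

# Inside the row loop, next to the fields shown above (column letters are assumptions):
school = read_cell(sheet, 'E', r)
form = read_cell(sheet, 'F', r)
typeImg = read_cell(sheet, 'G', r)
origin = read_cell(sheet, 'H', r)
webpageURL = read_cell(sheet, 'I', r)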

The code uses the requests library to make an HTTP POST containing a JSON structure. For each row of each table we build the corresponding JSON structure and send it to the server. The server returns an ID for the image, which we add to the respective column of that row. Finally, we save an updated version of the Excel file that also contains the IDs returned by the server. The relevant part of the script is reported below:

        # Build the JSON structure for the current row (still inside the two loops above)
        jsonData = {"image_url": str(imageURL),
                    "metadata": {"author": str(author),
                                 "title": str(title),
                                 "date": str(date),
                                 "school": str(school),
                                 "form": str(form),
                                 "type": str(typeImg)
                                 },
                    "origin": str(origin),
                    "webpage_url": str(webpageURL)
                    }
        # 'url' holds the API address of the CNN server received earlier
        req = requests.post(url, data=json.dumps(jsonData))
        print(req)
        ID = str(req.json())

        # update the worksheet with the ID returned by the server
        sheet.cell(row=r, column=10).value = ID  # first index is 1, so column J <-> 10

wb.save('updatedTables.xlsx')
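
One small improvement we may still add is a check on the HTTP response before writing the ID back to the sheet, so that a failed upload does not store an error message as an ID. The following is a minimal sketch of such a check, assuming the server answers successful uploads with status code 200; it is not yet part of the bot:

        req = requests.post(url, data=json.dumps(jsonData))
        if req.status_code != 200:
            # Assumed handling: report the failing row and skip it instead of storing a bad ID
            print("Upload failed for", sheets[s], "row", r, "- status", req.status_code)
            continue
        ID = str(req.json())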

In case we are not satisfied with the results obtained with the uploaded images, we can always add more images to the database later on. This will likely be the case when we start launching queries that analyse single details inside one image, for example a specific figure in a painting (Figure 2). To do so, we must create a new image containing only the detail and then upload it to the CNN database; the query can then be launched with the returned ID.

Figure 2 – Selecting a detail from an image.
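
To give an idea of how such a detail could be prepared, the sketch below crops a region out of a locally saved copy of an image with the Pillow library and writes it to a new file, which could then be uploaded with the same POST request used above. The file names and the crop box are purely illustrative and not taken from our data:

from PIL import Image

# Illustrative example: cut a single figure out of a locally saved painting
painting = Image.open('painting_from_table46.jpg')   # hypothetical local copy of the full image
detail = painting.crop((350, 120, 620, 540))         # (left, upper, right, lower) box around the figure
detail.save('painting_from_table46_detail.jpg')      # new image containing only the detail
# This new file would then be uploaded to the CNN server and the query launched with the returned ID.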

At the moment we are working on the bot that will launch the queries, but since we are only at the beginning of it, we will report on it in the next post.

References

[1] Biugliarello E., Caputo E., Giunto A., Bilderatlas – Progress Post 1