This document is under construction. It will describe how to integrate already generated metadata with Clowder.

Background

At the time of writing, about 171,000 images from the Library of Congress Farm Security Administration/Office of War Information Photograph Collection have been processed to extract various features. These include faces, eyes, facial profiles, close-ups, printed text, presence of a Stryker hole, presence of a border, mean and standard deviation of grayscale values, subject details, photographer details, and category details. The features were computed on XSEDE Comet using stripped-down versions of Clowder extractors or, in certain cases, new standalone programs. Integrating this information with Clowder is important because it makes Clowder features available for this data, such as the RESTful API, authentication and authorization, and the existing visualizations.

Database Table Descriptions and Converted Sample JSON Documents

The following tables describe the database tables that contain the extracted metadata. Where available, each table description is followed by an example JSON document that the Extractor Integration Script will generate for that table.

CategoryInfo

Sl. No.	Database Column Name	Field Description	Remarks
1	id	LOC Index	String;
2	category	LOC Category number (other_number field in the image JSON document)	String;

CategoryInfo JSON Document

{
	"id": "fsa1997018591",
	"category": "F 665"
}
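The conversion from a table row to a JSON document like the one above can be sketched as follows. This is not the Extractor Integration Script itself. `row_to_document` is a hypothetical helper, and the sketch assumes that each row arrives as a plain sequence of values in column order, which is how most Python DB-API cursors return rows.

```python
import json
from typing import Any, Dict, Sequence


def row_to_document(columns: Sequence[str], row: Sequence[Any]) -> Dict[str, Any]:
    """Pair each column name with its value (hypothetical helper)."""
    if len(columns) != len(row):
        raise ValueError(f"{len(columns)} columns but {len(row)} values")
    return dict(zip(columns, row))


# A CategoryInfo row, using the values from the sample document above.
doc = row_to_document(("id", "category"), ("fsa1997018591", "F 665"))

# indent="\t" reproduces the tab indentation of the sample JSON document.
print(json.dumps(doc, indent="\t"))
```

The same helper applies to any of the tables below. Only the column tuple changes.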

CreatorInfo

NOTE: Some creator names are empty strings, so this table may need some refinement.

Sl. No.	Database Column Name	Field Description	Remarks
1	id	LOC Index	String;
2	name	Creator name in the format: <last name>, <first name>, <birth year> - <death year>. If the creator name is blank, the value is NULL.	String;
3	year_mon	Year and month (abbreviated in certain cases) in which the photograph was taken, in the format: <year> - <month | month1 - month2 | season>	String; Some year-month values look like '[between 1940 and 1946]', so the format in the Field Description column is not always strictly followed. This needs to be examined in detail during the transformation.
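Because both CreatorInfo columns only loosely follow their documented formats, a transformation step could parse them defensively. The sketch below has several assumptions. The regular expressions are inferred from the formats in the table. Values that do not match are kept as raw strings rather than guessed at. NULL and empty names both map to `None`. The function names are hypothetical.

```python
import re
from typing import Optional

# Inferred from "<last name>, <first name>, <birth year> - <death year>";
# the life dates are optional because real values may omit them.
NAME_RE = re.compile(
    r"^\s*(?P<last>[^,]+),\s*(?P<first>[^,]+)"
    r"(?:,\s*(?P<birth>\d{4})?\s*-\s*(?P<death>\d{4})?)?\s*$"
)

# Inferred from "<year> - <month | month1 - month2 | season>".
YEAR_MON_RE = re.compile(r"^\s*(?P<year>\d{4})\s*-\s*(?P<period>.+?)\s*$")


def parse_creator_name(name: Optional[str]) -> Optional[dict]:
    """Split a CreatorInfo.name value; None for NULL or empty names."""
    if name is None or not name.strip():
        return None
    m = NAME_RE.match(name)
    if m is None:
        return {"raw": name.strip()}  # does not follow the documented format
    return {
        "last_name": m.group("last").strip(),
        "first_name": m.group("first").strip(),
        "birth_year": m.group("birth"),
        "death_year": m.group("death"),
    }


def parse_year_mon(value: Optional[str]) -> Optional[dict]:
    """Split a year_mon value, keeping irregular values such as ranges raw."""
    if value is None or not value.strip():
        return None
    m = YEAR_MON_RE.match(value)
    if m is None:
        return {"raw": value.strip()}  # e.g. '[between 1940 and 1946]'
    return {"year": int(m.group("year")), "period": m.group("period")}
```

Keeping the unparsed value under a `raw` key means that no information is lost while the irregular formats are still being investigated.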

FacesInfo

Sl. No.	Database Column Name	Field Description	Remarks
1	id	LOC Index	String;
2	imght	Image height	Float;
3	imgwid	Image width	Float;
4	dumb1	The letter F, it's only there to help browse raw data	String;
5	num_faces	Number of faces found	Integer;
6	face_segs	Bounding box location of faces	String; a text string containing, for each face segment, the face index i followed by x, y, width, and height. Face segments are separated by semicolons.
7	dumb2	The letter P, it's only there to help browse raw data	String;
8	num_profiles	Number of profiles found	Integer;
9	prof_segs	Bounding box location of profiles	String;
10	dumb3	The letter Y, it's only there to help browse raw data	String;
11	num_eyes	Number of eyes found	Integer;
12	eye_segs	Bounding box location of eyes	String;
13	dumb4	The letter C, it's only there to help browse raw data	String;
14	num_fullcls	Number of face full closeups	Integer; 'FULL' is relative to image size
15	num_midcls	Number of face mid closeups	Integer; 'MID' is relative to image size
16	num_fullprof	Number of profile full closeups	Integer; 'FULL' is relative to image size
17	num_midprof	Number of profile mid closeups	Integer; 'MID' is relative to image size
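The face_segs string could be turned into structured bounding boxes before it is stored in a JSON document. This is a sketch with three assumptions. Only the semicolon between segments is documented, so the fields inside a segment are assumed to be separated by commas and/or whitespace. The prof_segs and eye_segs formats are not described above, so this parser may not apply to them. `parse_segments` and `faces_document` are hypothetical names.

```python
import re
from typing import Any, Dict, List


def parse_segments(segs: str) -> List[Dict[str, float]]:
    """Parse 'i, x, y, width, height; ...' into a list of bounding boxes.

    ASSUMPTION: fields inside a segment are separated by commas and/or
    whitespace; only the semicolon between segments is documented.
    """
    boxes = []
    for seg in (segs or "").split(";"):
        fields = [f for f in re.split(r"[,\s]+", seg.strip()) if f]
        if not fields:
            continue  # skip empty pieces, e.g. from a trailing semicolon
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {seg!r}")
        i, x, y, w, h = fields
        boxes.append({
            "index": int(float(i)),
            "x": float(x), "y": float(y),
            "width": float(w), "height": float(h),
        })
    return boxes


def faces_document(row: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical JSON document for the face part of a FacesInfo row."""
    boxes = parse_segments(row["face_segs"])
    # Cross-check the parsed segments against the stored face count.
    if len(boxes) != row["num_faces"]:
        raise ValueError(
            f"{row['id']}: num_faces={row['num_faces']} but {len(boxes)} segments"
        )
    return {"id": row["id"], "faces": {"count": row["num_faces"], "boxes": boxes}}
```

The count check catches rows where num_faces and face_segs disagree, which can be reported rather than silently stored.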

ImageProperties

Sl. No.	Database Column Name	Field Description	Remarks
1	id	LOC Index	String;
2	hole	Presence of Stryker hole	Boolean;
3	border	Presence of border	Boolean;
4	meangray	Mean of grayscale values (not including hole and border)	Float;
5	stdgray	Standard deviation of grayscale values (not including hole and border)	Float;
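The meangray and stdgray columns exclude the hole and border pixels. A minimal NumPy sketch of that kind of masked statistic follows, which could be useful for spot-checking stored values. It assumes a 2-D grayscale array and a boolean mask that is True for Stryker hole and border pixels. How the original Comet program built that mask is not documented here. Whether it used the population or the sample standard deviation is also undocumented; this sketch uses the population form (NumPy's default `ddof=0`).

```python
from typing import Tuple

import numpy as np


def masked_gray_stats(gray: np.ndarray, exclude: np.ndarray) -> Tuple[float, float]:
    """Mean and standard deviation of gray values where `exclude` is False."""
    if gray.shape != exclude.shape:
        raise ValueError("gray and exclude must have the same shape")
    kept = gray[~exclude].astype(np.float64)  # boolean indexing keeps unmasked pixels
    if kept.size == 0:
        raise ValueError("every pixel is masked")
    return float(kept.mean()), float(kept.std())  # population std (ddof=0)
```

For example, with `gray = np.array([[0, 10], [20, 255]])` and the two corner pixels masked out, only 10 and 20 contribute, which gives a mean of 15.0 and a standard deviation of 5.0.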

ImageFilesList

Sl. No.	Database Column Name	Field Description	Remarks
1	fileid	File ID (Serial number)	Integer;
2	id	LOC Index	String;
3	cometfn	Filename in Comet	String;
4	locurl	URL of the photograph on the LOC website	String;
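Every table shares the LOC Index column `id`. The rows for one image can therefore be gathered into a single per-image document, with ImageFilesList supplying the Comet filename and LOC URL. The sketch below assumes each table has already been read into a list of dicts. It keeps rows as lists because this page does not state that every table has exactly one row per image. `merge_by_id` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def merge_by_id(
    tables: Dict[str, Iterable[Dict[str, Any]]],
) -> Dict[str, Dict[str, List[Dict[str, Any]]]]:
    """Group rows from several tables by their LOC Index (`id`)."""
    docs: Dict[str, Dict[str, List[Dict[str, Any]]]] = defaultdict(dict)
    for table_name, rows in tables.items():
        for row in rows:
            loc_id = row["id"]
            # Drop the join key from the nested fields; it is the outer key.
            fields = {k: v for k, v in row.items() if k != "id"}
            docs[loc_id].setdefault(table_name, []).append(fields)
    return dict(docs)
```

An image whose document lacks an ImageFilesList entry has metadata but no known file, which is worth reporting before any upload.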

OCRInfo

Sl. No.	Database Column Name	Field Description	Remarks
1	id	LOC Index	String;
2	ocr_pred	Overall prediction of whether text is present in the image. If any text box predicts text, the final prediction is set to T. 'nop' means OCR found nothing.	String; What is the difference between nop and F?
3	scores	Prediction scores. A string consisting of sets of 3 numbers separated by semicolons, one set for each OCR text box found: a 0/1 classification value indicating that no text (0) or text (1) was predicted, followed by 2 floats giving the classification scores for no-text and text.	String;
4	box_sum	Number of 1's (text predicted) found across the text box score sets	Integer;
5	box_cnt	Number of text boxes. Note that box_sum / box_cnt is an alternative score to the T/F prediction in ocr_pred.	Integer; What is the difference between box_sum and box_cnt?
6	box_txt	Strings separated by semicolons, one for each text box found in the OCR process	String;
7	box_locs	A string consisting of sets of 4 numbers separated by semicolons, one set for each text box: upper-left x coordinate, upper-left y coordinate, box width, and box height.	String;
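The scores and box_locs strings could be parsed to recompute box_sum, box_cnt, and the box_sum / box_cnt ratio, and to store the box locations as structured objects. This is a sketch, not the original OCR code. It assumes that sets are separated by semicolons and that the numbers inside a set are separated by commas and/or whitespace; the exact inner separator is not documented above. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def _number_sets(text: str, size: int) -> List[List[float]]:
    """Split 'a b c; a b c; ...' into lists of `size` floats (assumed layout)."""
    sets = []
    for chunk in (text or "").split(";"):
        fields = [f for f in re.split(r"[,\s]+", chunk.strip()) if f]
        if not fields:
            continue  # tolerate empty pieces, e.g. a trailing semicolon
        if len(fields) != size:
            raise ValueError(f"expected {size} numbers, got {len(fields)}: {chunk!r}")
        sets.append([float(f) for f in fields])
    return sets


def summarize_scores(scores: str) -> Tuple[int, int, float]:
    """Return (box_sum, box_cnt, box_sum / box_cnt) from an OCRInfo scores value."""
    sets = _number_sets(scores, 3)
    box_sum = sum(1 for s in sets if int(s[0]) == 1)  # first number: 0/1 prediction
    box_cnt = len(sets)
    ratio = box_sum / box_cnt if box_cnt else 0.0  # no boxes: ratio of 0
    return box_sum, box_cnt, ratio


def parse_box_locs(box_locs: str) -> List[Dict[str, float]]:
    """Turn box_locs into dicts of upper-left x, upper-left y, width, height."""
    return [dict(zip(("x", "y", "width", "height"), s))
            for s in _number_sets(box_locs, 4)]
```

Comparing the recomputed counts with the stored box_sum and box_cnt columns is a cheap consistency check during the transformation.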

SubjectInfo

Sl. No.	Database Column Name	Field Description	Remarks
1	id	LOC Index	String;
2	subject	Subject information	String;
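After a per-image JSON document has been generated, it could be attached to the corresponding file through Clowder's RESTful API. The sketch below only builds the HTTP request and does not send it. It rests on several assumptions that should be checked against the API documentation of the target Clowder instance: the Clowder v1 `api/files/{id}/metadata.jsonld` endpoint, the metadata context URL, the `agent` structure, and passing the API key as a `key` query parameter. `clowder_file_id` is the ID Clowder assigned to the uploaded file, which differs from the LOC Index; how the two are mapped is not covered on this page. `extractor_id` is likewise a hypothetical value.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# ASSUMPTION: context URL used by Clowder v1 extractor metadata; verify it.
CONTEXT_URL = "https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld"


def build_metadata_request(
    host: str,
    clowder_file_id: str,
    api_key: str,
    content: Dict[str, Any],
    extractor_id: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST that attaches `content` to a Clowder file."""
    url = "%s/api/files/%s/metadata.jsonld?%s" % (
        host.rstrip("/"),
        urllib.parse.quote(clowder_file_id),
        urllib.parse.urlencode({"key": api_key}),  # ASSUMPTION: key as query param
    )
    payload = {
        "@context": [CONTEXT_URL],
        "agent": {"@type": "cat:extractor", "extractor_id": extractor_id},
        "content": content,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)`. Building it separately makes it easy to log or dry-run the roughly 171,000 uploads before touching the server.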
