This document is under construction. It will describe how to integrate previously generated metadata with Clowder.

Background

At the time of writing, about 171,000 images from the Library of Congress Farm Security Administration/Office of War Information Photograph Collection have been processed to extract various features: faces, eyes, facial profiles, closeups, printed text, presence of a Stryker hole, presence of a border, mean and standard deviation of grayscale values, subject details, photographer details, and category details. These were computed on XSEDE Comet using stripped-down versions of Clowder extractors or, in certain cases, new standalone programs. Integrating this information with Clowder is important in order to use Clowder features such as its RESTful API, authentication and authorization, and available visualizations.

...

The following tables describe the database tables that contain the extracted metadata. Each database table description is followed by an example JSON document that the Extractor Integration Script will generate for that table.

CategoryInfo

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	id	loc_id	LOC Index	String;
2	category	category	LOC Category number (other_number field in the image JSON document)	String;

CategoryInfo Sample JSON Document

// Input CSV Data: fsa1997018591#F 665
{
	"loc_id": "fsa1997018591",
	"category": "F 665"
}

...

NOTE: Some creator values are empty strings, so this table may need some refinement.

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	id	loc_id	LOC Index	String;
2	name	Creator	Creator name loosely in the format: <last name>, <first name>, <birth year> - <death year>. One or all of these sub parts could be missing. If the creator name is blank the value is NULL.	String;
3	year_mon	Year	Year and month (abbreviated in certain cases) in which the photograph was taken, in the format: <year> - <month | month1 - month2 | season> (null if not available).	String; Some year - month values are like '[between 1940 and 1946]'; The format mentioned in the left cell may not be strictly followed. Need to look into this in detail when doing the transformation.

CreatorInfo Sample JSON Document

// Input CSV Data: fsa1997018591#Shahn, Ben, 1898-1969#1938 Aug.
{
	"loc_id": "fsa1997018591",
	"Creator": "Shahn, Ben, 1898-1969",
	"Year": "1938"
}
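The CreatorInfo mapping above can be sketched in Python as follows. This is a minimal illustration, not the actual Extractor Integration Script: the function name `creator_row_to_json` is hypothetical, the `#` delimiter is taken from the sample input row, and keeping only the first four-digit year (including for values such as '[between 1940 and 1946]') is an assumption based on the sample document.

```python
import json
import re


def creator_row_to_json(row: str) -> dict:
    """Convert one '#'-delimited CreatorInfo row into a metadata dict."""
    loc_id, name, year_mon = row.split("#", 2)
    # Blank creator names become None, which json.dumps writes as null.
    creator = name.strip() or None
    # Assumption: keep the first four-digit year found in year_mon.
    match = re.search(r"\b(\d{4})\b", year_mon)
    year = match.group(1) if match else None
    return {"loc_id": loc_id, "Creator": creator, "Year": year}


sample = "fsa1997018591#Shahn, Ben, 1898-1969#1938 Aug."
print(json.dumps(creator_row_to_json(sample), indent=4))
```

Rows whose year field does not follow the documented format would still need a closer look before relying on this rule.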

FacesInfo

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	id	loc_id	LOC Index	String;
2	imght	image_height	Image height	Float;
3	imgwid	image_width	Image width	Float;
4	dumb1	N/A	The letter F, it's only there to help browse raw data	String;
5	num_faces	Number of People	Number of faces found	Integer;
6	face_segs	face_segments	Bounding box location of faces	String; a text string that holds, for each face segment, the face index i followed by the x, y, width, and height of the segment. Face segments are separated by semicolons.
7	dumb2	N/A	The letter P, it's only there to help browse raw data	String;
8	num_profiles	num_profiles	Number of profiles found	Integer;
9	prof_segs	profile_segments	Bounding box location of profiles	String;
10	dumb3	N/A	The letter Y, it's only there to help browse raw data	String;
11	num_eyes	num_eyes	Number of eyes found	Integer;
12	eye_segs	eye_segments	Bounding box location of eyes	String;
13	dumb4	N/A	The letter C, it's only there to help browse raw data	String;
14	num_fullcls	num_full_closeups	Number of face full closeups	Integer; 'FULL' is relative to image size
15	num_midcls	num_mid_closeups	Number of face mid closeups	Integer; 'MID' is relative to image size
16	num_fullprof	num_full_profiles	Number of profile full closeups	Integer; 'FULL' is relative to image size
17	num_midprof	num_mid_profiles	Number of profile mid closeups	Integer; 'MID' is relative to image size

FacesInfo Sample JSON Document

//Input CSV Data: fsa1997018503#723#1024#F#1#1,185,210,113,113;#P#0#-1#Y#1#1,55,31,26,26;#C#0#0#0#0
{
	"loc_id": "fsa1997018503",
	"image_height": 723.0,
	"image_width": 1024.0,
	"Number of People": 1,
	"face_segments": 
		[
			{
				"x": 185,
				"y": 210,
				"width": 113,
				"height": 113
			}
		],
	"num_profiles": 0,
	"profile_segments": [],
	"num_eyes": 1,
	"eye_segments": 
		[
			{
				"x": 55,
				"y": 31,
				"width": 26,
				"height": 26
			}
		],
	"num_full_closeups": 0,
	"num_mid_closeups": 0,
	"num_full_profiles": 0,
	"num_mid_profiles": 0
}
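A possible way to turn a FacesInfo row into the document above is sketched below in Python. The function names are hypothetical, and the handling of the `-1` placeholder (treated as "no segments") is an assumption inferred from the sample row, where zero profiles are recorded as `-1` and the sample document shows an empty list.

```python
def parse_segments(text: str) -> list:
    """Turn 'i,x,y,w,h;i,x,y,w,h;' into a list of bounding-box dicts."""
    boxes = []
    for chunk in text.split(";"):
        parts = chunk.split(",")
        if len(parts) != 5:
            # Skips the empty piece after a trailing ';' and the '-1' placeholder.
            continue
        _index, x, y, width, height = (int(p) for p in parts)
        boxes.append({"x": x, "y": y, "width": width, "height": height})
    return boxes


def faces_row_to_json(row: str) -> dict:
    """Convert one '#'-delimited FacesInfo row (17 columns) into a dict."""
    f = row.split("#")
    # Columns 3, 6, 9 and 12 are the marker letters F, P, Y, C and are dropped.
    return {
        "loc_id": f[0],
        "image_height": float(f[1]),
        "image_width": float(f[2]),
        "Number of People": int(f[4]),
        "face_segments": parse_segments(f[5]),
        "num_profiles": int(f[7]),
        "profile_segments": parse_segments(f[8]),
        "num_eyes": int(f[10]),
        "eye_segments": parse_segments(f[11]),
        "num_full_closeups": int(f[13]),
        "num_mid_closeups": int(f[14]),
        "num_full_profiles": int(f[15]),
        "num_mid_profiles": int(f[16]),
    }
```

The face index inside each segment is discarded here because the sample document does not carry it; it could be kept as an extra key if needed.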

ImageFilesList

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	fileid	N/A	File ID (Serial number)	Integer;
2	id	loc_id	LOC Index	String;
3	cometfn	N/A	Filename in Comet	String;
4	locurl	loc_url	URL of the photograph in LOC website	String;

ImageFilesList Sample JSON Document

// Input CSV Data: 1#fsa1997018564#/oasis/projects/nsf/vlp101/sandeeps/complete_fsa_owi_data/186/fsa1997018564.PP_large.jpg#//hdl.loc.gov/loc.pnp/fsa.8a18634
{
	"loc_id": "fsa1997018564",
	"loc_url": "http://hdl.loc.gov/loc.pnp/fsa.8a18634"
}
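The sample input stores the LOC URL without a scheme (`//hdl.loc.gov/...`), while the sample document carries a full `http://` URL. A minimal Python sketch of that step follows; the function name is hypothetical, and prepending `http:` is an assumption read off the sample rather than a confirmed rule of the integration script.

```python
def image_file_row_to_json(row: str) -> dict:
    """Convert one '#'-delimited ImageFilesList row into a metadata dict."""
    # fileid and cometfn have no JSON field (N/A in the table) and are dropped.
    _fileid, loc_id, _cometfn, locurl = row.split("#")
    if locurl.startswith("//"):
        # Assumption: scheme-less URLs are completed with 'http:'.
        locurl = "http:" + locurl
    return {"loc_id": loc_id, "loc_url": locurl}
```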

ImageProperties

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	id	loc_id	LOC Index	String;
2	hole	Stryker Punch	Presence of Stryker hole	Boolean;
3	border	Border	Presence of border	Boolean;
4	meangray	mean_grayscale_value	Mean of grayscale values (not including hole and border)	Float;
5	stdgray	std_grayscale_value	Standard deviation of grayscale values (not including hole and border)	Float;

...

ImageProperties Sample JSON Document

//  Input CSV Data: fsa1997018564#True#True#139.37#42.41
{
	"loc_id": "fsa1997018564",
	"Stryker Punch": true,
	"Border": true,
	"mean_grayscale_value": 139.37,
	"std_grayscale_value": 42.41
}
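The type conversions for ImageProperties (text flags to booleans, text numbers to floats) might look like the Python sketch below. The function name is hypothetical, and the sketch assumes the flags are always spelled exactly `True` or `False`, as in the sample row.

```python
def image_properties_row_to_json(row: str) -> dict:
    """Convert one '#'-delimited ImageProperties row into a metadata dict."""
    loc_id, hole, border, meangray, stdgray = row.split("#")
    return {
        "loc_id": loc_id,
        # Assumption: any value other than the literal 'True' counts as False.
        "Stryker Punch": hole == "True",
        "Border": border == "True",
        "mean_grayscale_value": float(meangray),
        "std_grayscale_value": float(stdgray),
    }
```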

LocationInfo

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	id	loc_id	LOC Index	String;
2	lat	latitude	Latitude coordinate	Float;
3	long	longitude	Longitude coordinate	Float;
4	state	State	State abbreviation, or the state name if no abbreviation is found, or "United States" if no state is available. Other country names are kept as is.	String;
5	city	city	City name, when available	String;
6	county	county	County name, when available (this value is mostly unavailable)	String;

LocationInfo Sample JSON Document

//  Input CSV Data: fsa1997018591#40.106639#-83.767142#OH#Urbana#nop
{
	"loc_id": "fsa1997018591",
	"latitude": 40.106639,
	"longitude": -83.767142,
	"State": "OH",
	"city": "Urbana",
	"county": null
}
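In the sample above, the county value `nop` in the input becomes `null` in the document. The Python sketch below shows one way to apply that rule; the function names are hypothetical, and treating `nop` (or an empty string) as "missing" in every LocationInfo column is an assumption generalized from that single sample.

```python
def _text_or_none(value: str):
    """Return None for missing markers, otherwise the stripped text."""
    value = value.strip()
    # Assumption: 'nop' and the empty string both mean "not available".
    return None if value in ("", "nop") else value


def _float_or_none(value: str):
    """Return a float, or None when the value is a missing marker."""
    text = _text_or_none(value)
    return None if text is None else float(text)


def location_row_to_json(row: str) -> dict:
    """Convert one '#'-delimited LocationInfo row into a metadata dict."""
    loc_id, lat, lon, state, city, county = row.split("#")
    return {
        "loc_id": loc_id,
        "latitude": _float_or_none(lat),
        "longitude": _float_or_none(lon),
        "State": _text_or_none(state),
        "city": _text_or_none(city),
        "county": _text_or_none(county),
    }
```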

OCRInfo

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	id	loc_id	LOC Index	String;
2	ocr_pred	Text	Overall prediction of whether text is present in the image. 'nop' means OCR found nothing. If any one box is predicted as text, the final prediction is set to T.	String; nop means no text was found; F means that possible text regions were found but were ruled out as not being text;
3	scores	ocr_scores	Prediction scores. A string of sets of 3 numbers (sets separated by semicolons), one set for each OCR text box found: a 0/1 classification value indicating no-text/text predicted, followed by 2 floats giving the classification scores for no-text/text.	String;
4	box_sum	num_text_predictions	A count of number of 1's found across text box score sets	Integer;
5	box_cnt	num_text_boxes	Number of text boxes. Note that box_sum / box_cnt is another possible score instead of the T/F above.	Integer; box_sum is the count of '1's while box_cnt is the count of 'T's.
6	box_txt	ocr_results	Set of strings separated by semicolon. One string for each text box found in OCR process	String;
7	box_locs	ocr_text_boxes	A string that consists of sets of 4 numbers (separated by semicolon) one set for each text box, where the numbers are upper left x coordinate, upper left y coordinate, box width, box height.	String;

OCRInfo Sample JSON Document

//  Input CSV Data: fsa1997000226#T#1,0.49,0.51;1,0.06,0.94#2#2#Ironnzny;GARAGE#703,264,71,60;713,277,55,14
{
	"loc_id": "fsa1997000226",
	"Text": "true",
	"ocr_scores": 
	[
		{
			"prediction_bit": 1,
			"prediction_score": 0.51
		},
		{
			"prediction_bit": 1,
			"prediction_score": 0.94
		}
	],
	"num_text_predictions": 2,
	"num_text_boxes": 2,
	"ocr_results": ["Ironnzny", "GARAGE"],
	"ocr_text_boxes": 
	[
		{
			"x": 703,
			"y": 264,
			"width": 71,
			"height": 60
		},
		{
			"x": 713,
			"y": 277,
			"width": 55,
			"height": 14
		}
	]
}
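The OCRInfo row packs several lists into semicolon-separated strings. The Python sketch below unpacks them into the structure shown above. All names are hypothetical, and several choices are assumptions drawn from the single sample: `prediction_score` is taken as the score of the predicted class, `T`/`F` are written as the strings "true"/"false" while other values such as `nop` are passed through unchanged, and the two count columns are assumed to be numeric.

```python
def _groups(text: str, size: int) -> list:
    """Split 'a,b,c;d,e,f' into lists of `size` items, skipping malformed chunks."""
    chunks = (chunk.split(",") for chunk in text.split(";"))
    return [parts for parts in chunks if len(parts) == size]


def ocr_row_to_json(row: str) -> dict:
    """Convert one '#'-delimited OCRInfo row into a metadata dict."""
    # OCR text may itself contain '#', so split the fixed columns from both ends.
    loc_id, pred, scores, box_sum, box_cnt, rest = row.split("#", 5)
    box_txt, box_locs = rest.rsplit("#", 1)

    ocr_scores = []
    for bit, no_text, text in _groups(scores, 3):
        bit = int(bit)
        # Assumption: report the score of whichever class was predicted.
        score = float(text) if bit == 1 else float(no_text)
        ocr_scores.append({"prediction_bit": bit, "prediction_score": score})

    boxes = [
        {"x": int(x), "y": int(y), "width": int(w), "height": int(h)}
        for x, y, w, h in _groups(box_locs, 4)
    ]
    return {
        "loc_id": loc_id,
        "Text": {"T": "true", "F": "false"}.get(pred, pred),
        "ocr_scores": ocr_scores,
        "num_text_predictions": int(box_sum),
        "num_text_boxes": int(box_cnt),
        "ocr_results": [] if box_txt in ("", "nop") else box_txt.split(";"),
        "ocr_text_boxes": boxes,
    }
```

Recognized text that contains a semicolon would be split into two results by this sketch, so the real data should be checked for that case.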

SubjectInfo

Sl. No.	Database Column Name	JSON Metadata Field Name	Field Description	Remarks
1	id	loc_id	LOC Index	String;
2	subject	subject	Subject information	String;

SubjectInfo Sample JSON Document

// Input CSV Data: fsa1998017950#Farms, rural scenes--Vermont
{
	"loc_id": "fsa1998017950",
	"subject": "Farms, rural scenes--Vermont"
}

PyClowder2

Write a short note about PyClowder2, the latest version of the Python library for writing Clowder extractors.

...
