Date
Attendees
Agenda
- GI Identification - tool information gathering
Discussion items
GI Identification
Luigi
Shannon
Ankit
Rob
Notes:
Overview – Luigi presented the algorithm for his project on the overhead
This is the final algorithm being used for the GI detector for BD
It takes an RGB image – 1024x1024 – and breaks it down by scaling by half to 512x512, and you get 4
Keeps the original as well
The input image can be of any resolution
Then, for each level, there are 3 different features
Creates a window of n x n size – slides the window from top left to top right across each level
Extracts features from each window
Runs the classifier
X and Y location recorded
Classes such as tree or bicyclist
Creates a bounding-box location for that classification
Detects multiple high priority regions
Creates a pyramid of images – x levels deep
Tests the classifier on those windows
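A minimal sketch of that pipeline, assuming the "4" refers to pyramid levels; extract_features and classify are hypothetical placeholders, and the window size n and stride were not specified in the meeting:

```python
# Sketch of the pyramid + sliding-window detection described above.
# extract_features, classify, n, and stride are illustrative placeholders.
import numpy as np
from PIL import Image

def image_pyramid(img, levels=4):
    """Yield the original image plus successively half-scaled copies."""
    for _ in range(levels):
        yield img
        if img.width < 2 or img.height < 2:
            break  # stop before resizing to a zero-sized image
        img = img.resize((img.width // 2, img.height // 2))

def sliding_windows(arr, n, stride):
    """Slide an n x n window from the top left across and down the image."""
    h, w = arr.shape[:2]
    for y in range(0, h - n + 1, stride):
        for x in range(0, w - n + 1, stride):
            yield x, y, arr[y:y + n, x:x + n]

def detect(img, extract_features, classify, n=64, stride=32):
    """Return (x, y, label) detections mapped back to the input resolution."""
    hits = []
    for level, scaled in enumerate(image_pyramid(img)):
        arr = np.asarray(scaled)
        for x, y, window in sliding_windows(arr, n, stride):
            label = classify(extract_features(window))  # e.g. "tree", "bicyclist"
            if label is not None:
                hits.append((x * 2 ** level, y * 2 ** level, label))
    return hits

# usage: hits = detect(Image.open("aerial.png"), extract_features, classify)
```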
Demo was 512x512 – but imagine high-resolution and more images, populating locations
Much larger scale / complexity
Want to test the model on BD on a single level of data – 100,000 to 2,000,000 images
Test real time online detection
Local machine got stuck on a couple thousand images
Can BD handle this load?
This algorithm will be included – he is testing deep learning models as well, which took 120 hours to train on Blue Waters
GPU clusters – all the learning / training data is hand labeled
So far 76 to 88 percent accurate
Improve algorithmically? Or increase computational power?
BD helps you scale large collections by deploying many instances of extractors – if you submit 1,000,000 images, BD would scale up to many instances to handle them; this is easy to parallelize since the images are independent
Need to tell the extractor how many instances we can run at once – then upload all the images to it
Scaling testing is currently underway
In this case we do not want to run every image extractor – we only want to run 1 extractor on all the photos
Can start doing this now – we can monitor together
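Since the images are independent, the fan-out can be mimicked locally with a worker pool; NUM_INSTANCES mirrors the current 5-instance cap noted below, and process_image is an illustrative stand-in for the GI extractor:

```python
# Local analogue of BD deploying many extractor instances over independent images.
# NUM_INSTANCES and process_image are illustrative, not actual BD settings.
from multiprocessing import Pool

NUM_INSTANCES = 5  # per the notes, BD currently allows 5 instances

def process_image(path):
    # Stand-in for the GI extractor; each instance handles one image at a time.
    return path, "processed"  # placeholder result

def run(paths):
    # Order and grouping do not matter because the images are independent.
    with Pool(NUM_INSTANCES) as pool:
        return pool.map(process_image, paths)

if __name__ == "__main__":
    print(run(["img_0001.png", "img_0002.png"]))  # hypothetical paths
```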
Inside each extractor you can modify the model – we can make sure there are enough resources on each node; each node only has so much memory, so we need to make sure the images are not too large
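One way to keep images within a node's memory, under an assumed per-image budget (MAX_BYTES is a placeholder, not a real BD setting):

```python
# Hypothetical guard for the per-node memory limit mentioned above.
from PIL import Image

MAX_BYTES = 512 * 1024 * 1024  # assumed ~512 MB decoded-pixel budget

def fit_to_memory(img):
    """Halve a PIL image until its decoded RGB size fits the budget."""
    while img.width * img.height * 3 > MAX_BYTES:
        img = img.resize((img.width // 2, img.height // 2))
    return img

# usage: safe = fit_to_memory(Image.open("aerial.png"))
```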
Eventually this would be used in Dallas – scale it up for the city of Dallas
Could make it so you give an XML file with the locations of the photos – specific to that use case – at this point it is heavily use-case specific
Need to be able to upload images to BD – send an image/images to BD
BD reacts to data coming in
In the future – connect to local data instead of having to upload?
Green path – call the Google API to get one image? Or chunk locally
Send the chunks
Get scores back
Then combine locally
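A sketch of that green path, assuming a hypothetical BD scoring endpoint; the URL, chunk size, and JSON response shape are placeholders:

```python
# Sketch of the green path: chunk locally, send chunks to BD, combine scores.
# BD_URL and the response shape are hypothetical placeholders.
import io
import requests
from PIL import Image

BD_URL = "https://bd.example.org/score"  # placeholder endpoint

def chunks(img, size=512):
    """Tile the image into size x size chunks, left to right, top to bottom."""
    for y in range(0, img.height, size):
        for x in range(0, img.width, size):
            yield x, y, img.crop((x, y, x + size, y + size))

def score_image(img):
    """Send each chunk, collect its score, then combine locally."""
    scores = []
    for x, y, chunk in chunks(img):
        buf = io.BytesIO()
        chunk.save(buf, format="PNG")
        resp = requests.post(BD_URL, files={"image": buf.getvalue()})
        scores.append((x, y, resp.json()["score"]))  # assumed response shape
    return scores  # combined locally as (x, y, score) tuples

# usage: results = score_image(Image.open("aerial.png"))
```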
Work with Marcus to refine, and Ankit needs to be updated on the new API
Call just that one extractor – and allow that extractor to scale up (right now we only allow 5 instances)
Will need to make sure the RAM allocated to the Docker VM is enough