In short, the fellowship is exploring the creation of reduced-dimensional term-topic matrices for the HathiTrust collection. This includes the exploration of scalable methods for dimension reduction/topic modeling (LSA/pLSA, LDA, autoencoders) for the full collection.

Updates

12/14/2017

BW access finally in place as of 12/12. Can start the transfer process, but first need to enable a Globus endpoint for the HT data.
Allocation will be used for two different projects related to HTRC: the faculty fellowship and ngramming of HT data. Will meet with both project teams on 12/15 to coordinate.

12/6/2017

BW allocation approved, still waiting for access.
Will work with Capitanu on syncing initial data for evaluation of deeplearning4j by end of week.
Will meet with Co-PI Bhattacharyya on 12/11 about the BW project we are piggy-backing on.

11/27/2017

Conference call (Willis, Capitanu)
Still waiting for BW allocation
Boris explored deploying TensorFlow on the TORQUE cluster and concluded that it's too complicated, given that deeplearning4j on Spark already has a variational autoencoder implementation.
Will focus on deeplearning4j for now. Craig to request update on BW access.

11/20/2017

Conference call (Willis, Capitanu)
Discussed TensorFlow vs. deeplearning4j for scalable autoencoder implementations
- Spark has support for SVD and LDA. Deeplearning4j adds autoencoders for Spark.
- Both can use GPUs
Autoencoders
- Proposing to use sparse autoencoders
- Hinton paper appears to be the motivation for applying autoencoders to text
- Hinton and Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 2006.
- Lecture on YouTube: https://www.youtube.com/watch?v=ARQ6PZh8vgE
- Compare results to LSA only (on Reuters collection)
- TensorFlow has a variational autoencoder implementation, as does deeplearning4j
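
For reference, a minimal sketch of what "sparse" could mean for the proposed autoencoders. The notes do not say which sparsity formulation is intended, so this assumes the common KL-divergence penalty on mean hidden-unit activations. The function name `kl_sparsity_penalty` and the target sparsity `rho=0.05` are illustrative choices, not taken from deeplearning4j or TensorFlow.

```python
import math


def kl_sparsity_penalty(activations, rho=0.05, eps=1e-8):
    """Sum over hidden units of KL(rho || rho_hat_j).

    activations: list of rows (one per training example), each a list of
    hidden-unit activations in (0, 1), e.g. sigmoid outputs.
    rho: assumed target mean activation per hidden unit (hypothetical default).
    """
    n_examples = len(activations)
    n_hidden = len(activations[0])
    penalty = 0.0
    for j in range(n_hidden):
        # Mean activation of hidden unit j over the batch.
        rho_hat = sum(row[j] for row in activations) / n_examples
        # Clamp away from 0 and 1 so the logarithms stay finite.
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        penalty += (rho * math.log(rho / rho_hat)
                    + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))
    return penalty


# Units that fire about half the time are penalized more than units whose
# mean activation is close to the target.
dense = [[0.5, 0.5], [0.5, 0.5]]
sparse = [[0.05, 0.05], [0.05, 0.05]]
print(kl_sparsity_penalty(dense) > kl_sparsity_penalty(sparse))
```

In training, a term like this would be added to the reconstruction loss with a weighting coefficient, so the network is pushed toward codes in which only a few hidden units are active for any given document.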
For next meeting, will prepare the following:
- Shared access to either BW, ROGER, or IU (HTRC) cluster
- Download and prepare Ted's 100K English volumes (need collection information)
- Preliminary scaling of TensorFlow and deeplearning4j autoencoders with either Ted's or another collection
- Access to BW allocation, if possible
