...

In short, the fellowship is exploring the creation of reduced-dimensional term-topic matrices for the HathiTrust collection. This includes evaluating scalable methods for dimensionality reduction and topic modeling (LSA/pLSA, LDA, autoencoders) that can be applied to the full collection.
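
As a rough illustration of what a reduced-dimensional term-topic matrix looks like, the sketch below applies LSA (a truncated SVD) to a tiny document-term count matrix using NumPy. The function name `lsa_term_topic`, the toy counts, and the choice of k=2 are all assumptions made for illustration and are not taken from the fellowship's code or data. A dense in-memory SVD like this would not scale to the full collection; it only shows the shape of the output.

```python
import numpy as np


def lsa_term_topic(doc_term, k):
    """Hypothetical helper: rank-k LSA via truncated SVD.

    doc_term is an (n_docs, n_terms) matrix of counts or weights.
    Returns (term_topic, doc_topic).
    """
    # Thin SVD: doc_term = u @ diag(s) @ vt, singular values sorted descending
    u, s, vt = np.linalg.svd(doc_term, full_matrices=False)
    term_topic = vt[:k].T         # (n_terms, k): term loadings on each latent topic
    doc_topic = u[:, :k] * s[:k]  # (n_docs, k): document coordinates in topic space
    return term_topic, doc_topic


# Toy data (made up): 4 documents x 5 terms
counts = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 1, 1, 2],
], dtype=float)

term_topic, doc_topic = lsa_term_topic(counts, k=2)
print(term_topic.shape, doc_topic.shape)
```

Multiplying `doc_topic` by the transpose of `term_topic` gives the best rank-k approximation of the original matrix in the least-squares sense, which is what makes the reduced matrices usable as a compact stand-in for the full term counts.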

Updates

11/27/2017

  • Conference call (Willis, Capitanu)
  • Still waiting for BW allocation
  • Boris explored deploying TensorFlow on the TORQUE cluster and concluded that it is too complicated, given that deeplearning4j on Spark already has a variational autoencoder implementation
  • Will focus on deeplearning4j for now. Craig to request an update on BW access.

11/20/2017

  • Conference call (Willis, Capitanu)
  • Discussed TensorFlow vs. deeplearning4j for scalable autoencoder implementations
    • Spark has support for SVD and LDA. Deeplearning4j adds autoencoders for Spark.
    • Both can use GPUs
  • Autoencoders
  • For next meeting, will prepare the following:
    • Shared access to either BW, ROGER, or IU (HTRC) cluster
    • Download and prepare Ted's 100K English volumes (need collection information)
    • Preliminary scaling tests of the TensorFlow and deeplearning4j autoencoders with either Ted's or another collection
    • Access to BW allocation, if possible

...