• Implemented zeroth-order optimization for neural networks, enabling effective gradient-free training of models for which backpropagation is impractical (see the sketch after this list).
  • Employed feature selection, feature reuse, iterative model pruning, and multi-GPU parallelism to accelerate training and further lower memory usage.
  • Demonstrated a classification accuracy drop of 1% or less while reducing memory consumption by 90% compared to the Adam optimizer.
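A minimal sketch of what zeroth-order (gradient-free) training can look like, using an SPSA-style two-point estimator on a toy logistic-regression problem. The variable names, hyperparameters, and choice of estimator here are illustrative assumptions, not the project's actual implementation.

```python
# Zeroth-order optimization sketch: estimate a directional derivative from two
# forward (loss) evaluations and update along a random perturbation direction.
# All names and hyperparameters below are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic binary-classification problem.
X = rng.normal(size=(256, 20))
true_w = rng.normal(size=20)
y = (X @ true_w > 0).astype(float)

def loss(w):
    # Logistic loss computed with forward passes only -- no backprop needed.
    logits = X @ w
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

w = np.zeros(20)
lr, mu = 0.5, 1e-3  # learning rate and perturbation scale (assumed values)

for step in range(500):
    # Sample a random direction and probe the loss at +mu and -mu along it.
    z = rng.normal(size=w.shape)
    g_hat = (loss(w + mu * z) - loss(w - mu * z)) / (2.0 * mu)  # projected gradient estimate
    # Update along the sampled direction, scaled by the scalar estimate.
    w -= lr * g_hat * z

print(f"final loss: {loss(w):.4f}")
```

Because each update relies only on forward evaluations and a single perturbation direction, no optimizer state comparable to Adam's moment estimates needs to be stored, which is the general mechanism by which zeroth-order methods reduce training memory.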
