Friday, October 7, 2022

Installation of Elephas (for distributed deep learning) on Ubuntu through archives (Apr 2020)


Elephas is an extension of Keras, which allows you to run distributed deep learning models at scale with Spark. Elephas currently supports a number of applications, including:

% Data-parallel training of deep learning models
% Distributed hyper-parameter optimization
% Distributed training of ensemble models

Schematically, elephas works as follows.
We have listed packages that are required outside of Anaconda distribution. The following code goes in a Shell (.sh) script for Ubuntu or .bat script Windows: pip install Keras_Applications-1.0.8.tar.gz pip install keras-team-keras-preprocessing-1.1.0-0-gff90696.tar.gz pip install Keras-2.3.1.tar.gz pip install hyperopt-0.2.4-py2.py3-none-any.whl pip install hyperas-0.4.1-py3-none-any.whl pip install tensorflow_estimator-2.1.0-py2.py3-none-any.whl pip install grpcio-1.28.1-cp37-cp37m-manylinux2010_x86_64.whl pip install protobuf-3.11.3-cp37-cp37m-manylinux1_x86_64.whl pip install gast-0.3.3.tar.gz pip install opt_einsum-3.2.1.tar.gz pip install astor-0.8.1.tar.gz pip install absl-py-0.9.0.tar.gz pip install cachetools-4.1.0.tar.gz pip install pyasn1-0.4.8.tar.gz pip install pyasn1-modules-0.2.8.tar.gz pip install rsa-4.0.tar.gz pip install google-auth-1.14.1.tar.gz pip install oauthlib-3.1.0.tar.gz pip install requests-oauthlib-1.3.0.tar.gz pip install google-auth-oauthlib-0.4.1.tar.gz pip install Markdown-3.2.1.tar.gz pip install tensorboard-2.1.1-py3-none-any.whl pip install google-pasta-0.2.0.tar.gz pip install gast-0.2.2.tar.gz pip install termcolor-1.1.0.tar.gz pip install tensorflow-2.1.0-cp37-cp37m-manylinux2010_x86_64.whl pip install pypandoc-1.5.tar.gz pip install py4j-0.10.7.zip pip install pyspark-2.4.5.tar.gz pip install elephas-0.4.3-py3-none-any.whl Generated .whl files are stored in directory (here 'ashish' is my username): /home/ashish/.cache/pip/wheels Few packages were not accepted for latest release labels: # tensorflow-estimator [2.2.0, >=2.1.0rc0] (from tensorflow==2.1.0). Latest available is: 2.2.0 # pip install gast-0.3.3.tar.gz # pip install py4j-0.10.9.tar.gz Most of these packages are required by TensorFlow, except: 1. hyperopt-0.2.4-py2.py3-none-any.whl 2. hyperas-0.4.1-py3-none-any.whl 3. pypandoc-1.5.tar.gz 4. py4j-0.10.7.zip 5. pyspark-2.4.5.tar.gz All the packages are present in this Google Drive link, except TensorFlow and PySpark due to their sizes. PySpark size: 207 MB TensorFlow size: 402 MB Running the shell script second time, uninstalls and reinstalls the packages again. Here is a Python script to avoid doing this and install only if packages are not installed already: import sys import subprocess import pkg_resources required = { 'pyspark', 'scipy', 'tensorflow' } installed = { pkg.key for pkg in pkg_resources.working_set } missing = required - installed if missing: python = sys.executable subprocess.check_call([python, '-m', 'pip', 'install', *missing], stdout=subprocess.DEVNULL) References: 1. Elephas Documentation 2. GitHub Repository
Tags: Technology,Deep Learning,Machine Learning,Big Data,

No comments:

Post a Comment