Enabling Python VirtualEnv in JupyterLab

This post shows how to enable Python virtual environments in GCP JupyterLab so that each .ipynb file can use its own virtual environment to track its Python package dependencies.

PROBLEM

  • You are using GCP JupyterLab.
  • You want to adhere to the Python development best practices by not polluting the global environment with your Python packages so that you can generate a cleaner “pip freeze” in the future.
  • You want each Notebook file (.ipynb) to have its own environment so that you can run them with different package versions.
  • You configured a Python virtual environment, but “pip install” from the Notebook file still installs the packages in the global environment.
  • You are fed up.

SOLUTION

Configuring Virtual Environments

In the JupyterLab Notebook’s terminal, create an empty directory to organize all virtual environments.

mkdir virtualenv

Ensure ipykernel is installed. This is used to create new IPython kernels.

python3 -m pip install ipykernel

For each new virtual environment, run the commands below. They perform these steps:

  • Change into the base virtual environment directory.
  • Define a new virtual environment name. Replace [NEW_ENV_NAME] with a new name.
  • Create a new Python virtual environment.
    • The --system-site-packages option ensures you can still use the “data-sciencey” packages that come pre-installed with the GCP JupyterLab Notebook within your new virtual environment.
  • Activate the newly created virtual environment.
  • Create a new IPython kernel.
  • Exit from the virtual environment.

cd virtualenv
VENV=[NEW_ENV_NAME] # Update this!
python3 -m venv $VENV --system-site-packages
source $VENV/bin/activate
python -m ipykernel install --user --name=$VENV
deactivate
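
To double-check that each new kernel really points at the interpreter inside its virtual environment, the kernel specs can be inspected. The sketch below is a minimal example, assuming a Linux machine where user-level kernels are stored under ~/.local/share/jupyter/kernels (run "jupyter kernelspec list" to see the actual locations on your instance). The helper name kernel_interpreters is made up for this illustration.

```python
import json
from pathlib import Path


def kernel_interpreters(kernels_dir):
    """Map each kernel directory name to the interpreter path in its kernel.json."""
    result = {}
    for spec in sorted(Path(kernels_dir).glob("*/kernel.json")):
        argv = json.loads(spec.read_text()).get("argv", [])
        # The first argv entry is the program Jupyter launches for this kernel.
        result[spec.parent.name] = argv[0] if argv else None
    return result


# Assumed default location for kernels installed with --user on Linux.
user_kernels = Path.home() / ".local/share/jupyter/kernels"
for name, interpreter in kernel_interpreters(user_kernels).items():
    print(f"{name}: {interpreter}")
```

For a kernel created with the steps above, the printed path should sit inside virtualenv/[NEW_ENV_NAME]/bin rather than in the system Python location.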

Configuring Notebook File

Create a new Notebook file (.ipynb).

In the Select Kernel dialog, select the kernel that you created. In this example, there are 2 new virtual environments (“smurfs” and “thundercats”).

Selecting a kernel in JupyterLab

To perform a simple test, install a new package.

%pip install pandas==1.3.0

IMPORTANT: You need to use IPython’s Magics (literally speaking) to ensure the packages are installed in the virtual environment.

  • %pip = Runs pip with the Python interpreter of the current kernel, so packages land in the kernel’s virtual environment. Magic!
  • ! pip = Runs whichever pip the shell finds first on its PATH, which is usually the global one rather than the kernel’s. No magic!
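
If you are unsure which environment a Notebook is actually running in, a quick check in a cell can help. This is a small sketch, not part of the original setup: it relies on the fact that inside an environment created with "python3 -m venv", sys.prefix differs from sys.base_prefix. The function name in_virtualenv is chosen for this example.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


# %pip installs with this interpreter, so this is where packages will go.
print("interpreter:", sys.executable)
print("environment:", sys.prefix)
print("virtual environment active:", in_virtualenv())
```

With one of the new kernels selected, the interpreter path should point inside your virtualenv directory.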

From the menu bar, select Kernel -> Restart Kernel and Clear All Outputs…. Always restart the kernel after installing new packages with %pip.

Kernel -> Restart Kernel and Clear All Outputs in JupyterLab

Inspect the package version. This should show the version you have just installed.

import pandas
print(pandas.__version__)
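
Because the environment was created with --system-site-packages, a package can be importable from either the virtual environment or the pre-installed system packages. The sketch below shows one way to see which copy the kernel would pick up. The helper names is_inside and package_location are invented for this illustration, and the check assumes POSIX-style paths.

```python
import importlib.util
import os
import sys


def is_inside(path, prefix):
    """True if path lies inside the directory tree rooted at prefix."""
    path, prefix = os.path.abspath(path), os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


location = package_location("pandas")
if location is None:
    print("pandas is not importable from this kernel")
else:
    print("pandas loads from:", location)
    print("inside this kernel's environment:", is_inside(location, sys.prefix))
```

If the package was installed with %pip in the selected kernel, the reported location should be inside that kernel's virtual environment.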

To verify this actually works, create another Notebook file pointing to another kernel. In this file, install the same package but with a different version.

Testing Python Virtualenv in JupyterLab

GCSFuse + Docker: “Error while creating mount source path ‘/a’: mkdir /a: file exists.”

This post illustrates how you can mount a GCS bucket using GCSFuse on your host machine and expose it as a volume to a Docker container.

PROBLEM

You want to volume mount a FUSE-mounted directory (in this example, /my-kfc-bucket) into a container.

When attempting to run the container…

docker run -it --rm -v /my-kfc-bucket:/home busybox

… an error occurred:

docker: Error response from daemon: error while creating
mount source path '/my-kfc-bucket': mkdir /my-kfc-bucket: 
file exists.

SOLUTION

Unmount the existing FUSE-mounted directory.

sudo umount /my-kfc-bucket

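Mount it again with the -o allow_other option. By default, only the user who mounted a FUSE file system can access it, so the Docker daemon, which runs as root, cannot use the directory as a volume source; allow_other lifts that restriction. Because the command with -o allow_other must be run with sudo, the mounted files would otherwise be owned by root, so pass --uid and --gid to assign ownership to yourself and keep the directory easy to read and write.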

sudo gcsfuse \
  -o allow_other \
  --uid $(id -u) \
  --gid $(id -g) \
  gcs-bucket /my-kfc-bucket  

If it is successful, the output should look like this:

Start gcsfuse/0.40.0 (Go version go1.17.6) for app "" using mount point: /my-kfc-bucket
Opening GCS connection...
Mounting file system "gcs-bucket"...
File system has been successfully mounted.
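
To confirm that the new mount really carries the allow_other option before retrying Docker, the mount table can be inspected. The following is a rough sketch that assumes a Linux host exposing /proc/mounts; the helper name mount_options is made up for this example.

```python
from pathlib import Path


def mount_options(mount_point, mounts_text):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, file system type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # a later mount shadows an earlier one
    return options


proc_mounts = Path("/proc/mounts")  # Linux only
if proc_mounts.exists():
    options = mount_options("/my-kfc-bucket", proc_mounts.read_text())
    if options is None:
        print("/my-kfc-bucket is not mounted")
    else:
        print("allow_other set:", "allow_other" in options)
```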

Rerun the Docker container.

docker run -it --rm -v /my-kfc-bucket:/home busybox

Now, you can read/write the GCS bucket’s data from the container. In this example, the GCS bucket’s data is located in /home.

GCP: Accessing GUI-Based App in GCE from Mac using X11

PROBLEM

You want to access GUI-based software that is installed on a GCE instance without using NoMachine.

SOLUTION

GCE Instance (One Time Configuration)

Ensure X11Forwarding is set to yes. If it is not, edit the file to change it.

grep X11Forwarding /etc/ssh/sshd_config

If a change is made to this file, restart the service.

sudo systemctl restart sshd
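
The grep output can be misleading when the file contains commented-out or repeated entries, since sshd uses the first value it finds for a keyword. The sketch below reads the file the same way in a simplified form. It is only an approximation: it stops at the first Match block and does not follow Include directives. The helper name effective_setting is invented for this example.

```python
def effective_setting(config_text, keyword):
    """Return the first uncommented global value for keyword, or None if absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.replace("=", " ", 1).split(None, 1)
        if parts[0].lower() == "match":
            break  # settings below a Match line are conditional; ignore them here
        if parts[0].lower() == keyword.lower() and len(parts) == 2:
            return parts[1].strip()
    return None


try:
    with open("/etc/ssh/sshd_config") as config:
        value = effective_setting(config.read(), "X11Forwarding")
    # When the keyword is absent, sshd falls back to its default, which is "no".
    print("X11Forwarding:", value if value is not None else "not set (default: no)")
except OSError as error:
    print("could not read sshd_config:", error)
```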

Mac (One Time Configuration)

Apple has not bundled X11 with macOS since OS X 10.8 Mountain Lion, so you need to install XQuartz, an open-source version of the X Window System for macOS.

Using Homebrew, install XQuartz.

brew install xquartz

Reboot the machine. If you don’t reboot, it will not work.

Once rebooted, add the following line to ~/.ssh/config. If this file does not exist, create it.

ForwardX11 yes

With this configuration, we no longer need to pass the -X option explicitly to the SSH command.

Accessing GUI-based software from Mac

Log into GCP.

gcloud auth login

SSH into the GCE instance.

gcloud compute ssh [INSTANCE] --project [PROJECT] --zone [ZONE]

Once you are in the GCE instance, run any GUI-based software from the command line.
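
If the GUI does not appear, a quick sanity check on the GCE instance can narrow things down. When X11 forwarding is active, the SSH session normally has a DISPLAY variable set, and the server typically needs the xauth program installed for forwarding to work. The sketch below only reports those two things; the function name x11_forwarding_status is made up for this example.

```python
import os
import shutil


def x11_forwarding_status(environ=None):
    """Report the DISPLAY value and the xauth path, each None when missing."""
    environ = os.environ if environ is None else environ
    return {
        "display": environ.get("DISPLAY") or None,
        "xauth": shutil.which("xauth"),
    }


status = x11_forwarding_status()
print("DISPLAY:", status["display"] or "not set (X11 forwarding is probably inactive)")
print("xauth:", status["xauth"] or "not found on PATH")
```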