Difference between revisions of "Running Jupyter Notebook"

Revision as of 01:32, 20 July 2020

On VDI

Currently, the easiest way to run a Jupyter Notebook is on NCI's Virtual Desktop Infrastructure (VDI). For a guide to setting up and using VDI, see the VDI User Guide (https://opus.nci.org.au/display/Help/VDI+User+Guide).

There are two options for running a Jupyter Notebook on the VDI: either use the provided script to automate the process, or open a notebook in the Strudel client window.

Scripted Access

Prerequisite

This method works most easily if you are a member of the hh5 project at NCI. You can request membership at my.nci.org.au (https://my.nci.org.au/mancini/project/hh5).

This project will give you access to conda environments for Python managed by the CMS team.

Method

Clone the nci_scripts repository (https://github.com/coecms/nci_scripts) and follow its instructions to run the vdi_jupyter.py script. The script prompts for an NCI username the first time it is used. After that it should run without prompting, provided SSH keys are set up properly (see Setting Up SSH below). Note that the SSH key has to be copied to your home directory on VDI. Once you have a running session you can see which VDI node you are running on and copy the SSH key to it, or use the SFTP transfer node.

The script should open a Jupyter window in your default browser, showing a listing of your VDI home directory. To access /g/data disks, make symbolic links to the directories you need, e.g.

ln -s /g/data/v45 ~/gdata_v45

To run this command you will need shell access to the VDI. You can open a shell terminal window from within the Jupyter notebook:

[Figure: how to open a terminal in a Jupyter window]
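If you need several /g/data projects, the symbolic-link step above can be repeated in a loop. The following is only a minimal sketch: link_gdata is a hypothetical helper name (it is not part of nci_scripts), and the data root and target directory are passed as arguments so that it can be tried out safely on test directories first.

```shell
# Link each named project directory under a data root into a target directory.
# link_gdata is a hypothetical helper name, not part of nci_scripts.
link_gdata() {
    data_root=$1    # e.g. /g/data
    target_dir=$2   # e.g. $HOME
    shift 2
    for project in "$@"; do
        src="$data_root/$project"
        dest="$target_dir/gdata_$project"
        if [ ! -d "$src" ]; then
            echo "skipping $project: $src not found" >&2
            continue
        fi
        if [ -e "$dest" ] || [ -L "$dest" ]; then
            echo "skipping $project: $dest already exists" >&2
            continue
        fi
        ln -s "$src" "$dest"
    done
}

# Example (the project codes are placeholders for the ones you belong to):
# link_gdata /g/data "$HOME" v45 hh5
```

Existing links are left untouched, so the function can be re-run whenever you join a new project.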

 

Access via strudel

Within VDI, open a Linux terminal (Applications menu -> System Tools -> Terminal). Inside the terminal, load the conda environment:

module use /g/data/hh5/public/modules
module load conda

You should then be able to start the notebook with:

jupyter notebook

Connect your local browser

Once the Jupyter notebook is running on VDI, you can connect the browser on your own computer to it.

You still need to connect to VDI in the usual way. Once the session has started, open a terminal. You need the name of the computer that is running your VDI instance, which has the form vdi-nXX, where XX is a number. It may already be displayed in your command line prompt. If not, run the command:

 

hostname

This prints the hostname. Note it down, as you will need it below.

Next, load the conda module as above, but tell Jupyter not to start a browser:

 

module use /g/data/hh5/public/modules
module load conda
jupyter notebook --no-browser

Wait for a line like this to appear:

 

http://localhost:8888/?token=...

The four digits after "localhost:" (often 8888, but not always) are the port. Use this number and the hostname to create a new SSH connection to VDI. In a new terminal session on your own computer, run the command:

 

ssh userXX@vdi-nXX.nci.org.au -L 8888:localhost:8888

Replace userXX with your NCI username, vdi-nXX with the hostname, and both instances of 8888 with the actual port number.

Once you've done that, copy the full URL that Jupyter printed (the line starting http://localhost:8888/?token=..., including the full token) into your browser of choice.
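The substitution of username, hostname and port described above can be sketched as a small shell function. This is an illustration only: vdi_tunnel_cmd is a hypothetical helper name, and it assumes the URL has the http://localhost:PORT/... form shown above. It prints the ssh command rather than running it, so you can check it first.

```shell
# Build the ssh port-forwarding command from the Jupyter URL and VDI hostname.
# vdi_tunnel_cmd is a hypothetical helper name used only for illustration.
vdi_tunnel_cmd() {
    user=$1   # NCI username
    host=$2   # VDI node name as printed by `hostname`, e.g. vdi-n12
    url=$3    # line printed by jupyter, e.g. http://localhost:8888/?token=...
    # Strip everything up to "localhost:" and everything from the next "/".
    port=${url#*localhost:}
    port=${port%%/*}
    case $port in
        ''|*[!0-9]*)
            echo "could not find a port in: $url" >&2
            return 1
            ;;
    esac
    echo "ssh $user@$host.nci.org.au -L $port:localhost:$port"
}

# Example (the username and node name are placeholders):
# vdi_tunnel_cmd abc123 vdi-n12 'http://localhost:8889/?token=...'
```

The same port is used on both sides of the tunnel so that the URL printed by Jupyter can be pasted into the local browser unchanged.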

 

On Gadi

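You can also run a notebook from Gadi's compute nodes using the gadi_jupyter script (https://github.com/coecms/nci_scripts/blob/master/gadi_jupyter).

Prerequisite

For this method to work, you must be a member of the hh5 project at NCI. You can request membership at my.nci.org.au (https://my.nci.org.au/mancini/project/hh5).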

This project will give you access to conda environments for Python managed by the CMS team.

Setting Up SSH

You will need to set up SSH keys to use gadi_jupyter.

Create the file ~/.ssh/config containing something like the following lines:

Host *.nci.org.au
    # Your NCI username here
    User abc123
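If you prefer to create the entry from the command line, the step above could be sketched as follows. This is an assumption-laden example, not part of nci_scripts: add_nci_ssh_config is a hypothetical helper name, and the config file path is passed in explicitly (normally ~/.ssh/config) so that you can try it on a scratch file first.

```shell
# Append a "Host *.nci.org.au" entry to an ssh config file if it is missing.
# add_nci_ssh_config is a hypothetical helper name used for illustration.
add_nci_ssh_config() {
    user=$1          # your NCI username, e.g. abc123
    config_file=$2   # normally ~/.ssh/config
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    chmod 600 "$config_file"
    # -F: match the text literally, -q: no output, only an exit status
    if grep -Fq 'Host *.nci.org.au' "$config_file"; then
        echo "entry already present"
        return 0
    fi
    printf 'Host *.nci.org.au\n    User %s\n' "$user" >> "$config_file"
    echo "entry added"
}

# Example:
# add_nci_ssh_config abc123 ~/.ssh/config
```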

Create an SSH key by running 'ssh-keygen'. It will ask for a passphrase. Make sure you enter one; it doesn't need to be the same as your NCI password. Use the default key file name.

Copy your SSH key to Gadi. Some computers have a command that can do this automatically: 'ssh-copy-id gadi.nci.org.au'. If not, you need to add the contents of the file '~/.ssh/id_rsa.pub' on your local computer to the end of the file '~/.ssh/authorized_keys' on Gadi.

Enable your SSH key by running the command 'ssh-add'. It will ask for the SSH passphrase you entered when you made the key (which may not be your NCI password). If it says 'Could not open a connection to your SSH agent' or 'could not connect to authentication agent', run 'ssh-agent bash' and then 'ssh-add'.

You can now run './gadi_jupyter'.
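For the manual route, adding the key amounts to appending one line to a file. The sketch below shows that step in isolation, assuming you have already copied the id_rsa.pub file to Gadi by some other means. append_pubkey is a hypothetical helper name, and both file paths are arguments so that it can be tested on scratch files.

```shell
# Append a public key to an authorized_keys file unless it is already there.
# append_pubkey is a hypothetical helper; on Gadi the second argument would
# normally be ~/.ssh/authorized_keys.
append_pubkey() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -F: literal match, -x: whole line must match, -q: quiet
    if grep -Fxq -e "$(cat "$pubkey_file")" "$auth_file"; then
        echo "key already present"
    else
        cat "$pubkey_file" >> "$auth_file"
        echo "key added"
    fi
}

# Example, run on Gadi after copying the .pub file there:
# append_pubkey ~/id_rsa.pub ~/.ssh/authorized_keys
```

Checking for the key first keeps the file free of duplicate entries if the step is repeated.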

Notes

  • You need to download this script to your own computer and execute it there. The script handles the connection.
  • The script requires an implementation of bash. This is the default on macOS and Linux, but not on Windows.
  • The job will cost SU depending on the queue and the resources requested (default: 1 core, express queue).
  • The notebook won't have access to the internet, so you cannot, for example, download data from within it.

Running

You can download the script from the link above directly, or by running

git clone https://github.com/coecms/nci_scripts.git

which will download the script into a directory called nci_scripts.

There are several options that you can use when running the script:

General Options:
    -h:         Print help
    -l:         NCI username
    -L:         NCI login node (default 'gadi.nci.org.au')
    -e:         Conda environment

Queue Options:
    -q QUEUE:   Queue name
    -n NCPU:    Use NCPU cpus
    -m MEM:     Memory allocation (default 4*NCPU GB)
    -t TIME:    Walltime limit (default 1 hour)
    -J JOBFS:   Jobfs allocation (default 100 GB)
    -P PROJ:    Submit job under project PROJ
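As an illustration of how the options combine, the sketch below assembles a command line from the flags listed above and computes the default memory that corresponds to a given CPU count (4*NCPU GB). The function names are hypothetical, the username and project code in the example are placeholders, and "normal" is assumed to be a valid queue name. The -m, -t and -J flags are left out because their value formats are not given here.

```shell
# Default memory in GB for a given CPU count, following "default 4*NCPU GB".
default_mem_gb() {
    ncpu=$1
    echo $((4 * ncpu))
}

# Print (not run) a gadi_jupyter command line built from the listed options.
# gadi_jupyter_cmd is a hypothetical helper name used for illustration.
gadi_jupyter_cmd() {
    user=$1      # -l: NCI username
    queue=$2     # -q: queue name
    ncpu=$3      # -n: number of cpus
    project=$4   # -P: project to charge the job to
    echo "./gadi_jupyter -l $user -q $queue -n $ncpu -P $project"
}

# Example (placeholder username and project code):
# gadi_jupyter_cmd abc123 normal 4 w35
# default_mem_gb 4
```

Printing the command before running it makes it easier to check the resources you are about to request, since they determine the SU cost.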

Windows Desktops

To run gadi_jupyter on Windows you'll need Bash and some form of SSH available. This could be from:

  • Git Bash - https://gitforwindows.org/
  • Cygwin terminal - https://www.cygwin.com/

Install one of these, then within the terminal follow the 'Setting Up SSH' instructions before running 'gadi_jupyter'. (The terminal should start in your 'My Documents' directory, if you need to find the downloaded script.)