Renku Python Library, CLI and Service

https://github.com/SwissDataScienceCenter/renku-python/workflows/Test,%20Integration%20Tests%20and%20Deploy/badge.svg https://img.shields.io/coveralls/SwissDataScienceCenter/renku-python.svg https://img.shields.io/github/tag/SwissDataScienceCenter/renku-python.svg https://img.shields.io/pypi/dm/renku.svg Documentation Status https://img.shields.io/github/license/SwissDataScienceCenter/renku-python.svg

A Python library for the Renku collaborative data science platform. It includes a CLI and SDK for end-users as well as a service backend. It provides functionality for the creation and management of projects and datasets, and simple utilities to capture data provenance while performing analysis tasks.

NOTE:
renku-python is the python library and core service for Renku - it does not start the Renku platform itself - for that, refer to the Renku docs on running the platform.

Installation

Renku releases and development versions are available from PyPI. You can install it using any tool that knows how to handle PyPI packages. Our recommendation is to use :code:pipx.

Note

We do not officially support Windows at this moment. The way Windows handles paths and symlinks interferes with some Renku functionality. We recommend using the Windows Subsystem for Linux (WSL) to use Renku on Windows.

Prerequisites

Renku depends on Git under the hood, so make sure that you have Git installed on your system.

Renku also offers support to store large files in Git LFS, which is used by default and should be installed on your system. If you do not wish to use Git LFS, you can run Renku commands with the -S flag, as in renku -S <command>. More information on Git LFS usage in renku can be found in the Data in Renku section of the docs.

Renku uses CWL to execute recorded workflows when calling renku update or renku rerun. CWL depends on NodeJs to execute the workflows, so installing NodeJs is required if you want to use those features.

For development of the service, Docker is recommended.

pipx

First, install pipx and make sure that the $PATH is correctly configured.

$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath

Once pipx is installed use following command to install renku.

$ pipx install renku
$ which renku
~/.local/bin/renku

pipx installs Renku into its own virtual environment, making sure that it does not pollute any other packages or versions that you may have already installed.

Note

If you install Renku as a dependency in a virtual environment and the environment is active, your shell will default to the version installed in the virtual environment, not the version installed by pipx.

To install a development release:

$ pipx install --pip-args pre renku

pip

$ pip install renku

The latest development versions are available on PyPI or from the Git repository:

$ pip install --pre renku
# - OR -
$ pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku

Use following installation steps based on your operating system and preferences if you would like to work with the command line interface and you do not need the Python library to be importable.

Windows

Note

We don’t officially support Windows yet, but Renku works well in the Windows Subsystem for Linux (WSL). As such, the following can be regarded as a best effort description on how to get started with Renku on Windows.

Renku can be run using the Windows Subsystem for Linux (WSL). To install the WSL, please follow the official instructions.

We recommend you use the Ubuntu 20.04 image in the WSL when you get to that step of the installation.

Once WSL is installed, launch the WSL terminal and install the packages required by Renku with:

$ sudo apt-get update && sudo apt-get install git python3 python3-pip python3-venv pipx

Since Ubuntu has an older version of git LFS installed by default which is known to have some bugs when cloning repositories, we recommend you manually install the newest version by following these instructions.

Once all the requirements are installed, you can install Renku normally by running:

$ pipx install renku
$ pipx ensurepath

After this, Renku is ready to use. You can access your Windows in the various mount points in /mnt/ and you can execute Windows executables (e.g. *.exe) as usual directly from the WSL (so renku run myexecutable.exe will work as expected).

Docker

The containerized version of the CLI can be launched using Docker command.

$ docker run -it -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku

It makes sure your current directory is mounted to the same place in the container.

Getting Started

Interaction with the platform can take place via the command-line interface (CLI).

Start by creating for folder where you want to keep your Renku project:

$ renku init my-renku-project
$ cd my-renku-project

Create a dataset and add data to it:

$ renku dataset create my-dataset
$ renku dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renku-python/master/README.rst

Run an analysis:

$ renku run wc < data/my-dataset/README.rst > wc_readme

Trace the data provenance:

$ renku log wc_readme

These are the basics, but there is much more that Renku allows you to do with your data analysis workflows.

For more information about using renku, refer to the renku –help.