Renku Python Library, CLI and Service¶
A Python library for the Renku collaborative data science platform. It includes a CLI and SDK for end-users as well as a service backend. It provides functionality for the creation and management of projects and datasets, and simple utilities to capture data provenance while performing analysis tasks.
Note
renku-python is the Python library and core service for Renku. It does not start the Renku platform itself; for that, refer to the Renku docs on running the platform.
Installation¶
Renku releases and development versions are available from PyPI. You can install it using any tool that knows how to handle PyPI packages. Our recommendation is to use pipx.
Note
We do not officially support Windows at this moment. The way Windows handles paths and symlinks interferes with some renku functionality. We recommend using the Windows Subsystem for Linux (WSL) to use renku on Windows.
pipx¶
First, install pipx and make sure that the $PATH is correctly configured.
$ python3 -m pip install --user pipx
$ pipx ensurepath
Once pipx is installed, use the following command to install renku.
$ pipx install renku
$ which renku
~/.local/bin/renku
pipx installs renku into its own virtual environment, making sure that it does not pollute any other packages or versions that you may have already installed.
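pipx also makes later upgrades a single command; a minimal sketch, assuming renku was installed with pipx as shown above:
$ pipx upgrade renku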
Note
If you install renku as a dependency in a virtual environment and the environment is active, your shell will default to the version installed in the virtual environment, not the version installed by pipx.
To install a development release:
$ pipx install --pip-args=--pre renku
pip¶
$ pip install renku
The latest development versions are available on PyPI or from the Git repository:
$ pip install --pre renku
# - OR -
$ pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku
Use the following installation steps based on your operating system and preferences if you would like to work with the command line interface and do not need the Python library to be importable.
Docker¶
The containerized version of the CLI can be launched using the following Docker command.
$ docker run -it -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku
It makes sure your current directory is mounted to the same place in the container.
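If you use the containerized CLI regularly, you can wrap the documented docker run invocation in a shell alias so that renku behaves like a locally installed command; a minimal sketch (the --rm flag simply removes the container after each run):
$ alias renku='docker run -it --rm -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku'
$ renku --version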
Getting Started¶
Interaction with the platform can take place via the command-line interface (CLI).
Start by creating a folder where you want to keep your Renku project:
$ renku init my-renku-project
$ cd my-renku-project
Create a dataset and add data to it:
$ renku dataset create my-dataset
$ renku dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renku-python/master/README.rst
Run an analysis:
$ renku run wc < data/my-dataset/README.rst > wc_readme
Trace the data provenance:
$ renku log wc_readme
These are the basics, but there is much more that Renku allows you to do with your data analysis workflows.
For more information about using renku, refer to renku --help.
Renku Command Line¶
The base command for interacting with the Renku platform.
renku (base command)¶
To list the available commands, either run renku with no parameters or execute renku help:
$ renku help
Usage: renku [OPTIONS] COMMAND [ARGS]...
Check common Renku commands used in various situations.
Options:
--version Print version number.
--global-config-path Print global application's config path.
--install-completion Install completion for the current shell.
--path <path> Location of a Renku repository.
[default: (dynamic)]
--external-storage / -S, --no-external-storage
Use an external file storage service.
-h, --help Show this message and exit.
Commands:
# [...]
Configuration files¶
Depending on your system, you may find the configuration files used by Renku command line in a different folder. By default, the following rules are used:
- MacOS: ~/Library/Application Support/Renku
- Unix: ~/.config/renku
- Windows: C:\Users\<user>\AppData\Roaming\Renku
If in doubt where to look for the configuration file, you can display its path by running renku --global-config-path.
renku init¶
Create an empty Renku project or reinitialize an existing one.
Start a Renku project¶
If you have an existing directory which you want to turn into a Renku project, you can type:
$ cd ~/my_project
$ renku init
or:
$ renku init ~/my_project
This creates a new subdirectory named .renku that contains all the necessary files for managing the project configuration. If the provided directory does not exist, it will be created.
Use a different template¶
Renku is installed together with a specific set of templates you can select when you initialize a project. You can check them by typing:
$ renku init --list-templates
INDEX ID DESCRIPTION PARAMETERS
----- ------ ------------------------------- -----------------------------
1 python The simplest Python-based [...] description: project des[...]
2 R R-based renku project with[...] description: project des[...]
If you know which template you are going to use, you can provide either the id --template-id or the template index number --template-index.
You can use a newer version of the templates or even create your own and provide it to the init command by specifying the target template repository source --template-source (both local path and remote URL are supported) and the reference --template-ref (branch, tag or commit).
You can take inspiration from the official Renku template repository:
$ renku init --template-ref master --template-source \
https://github.com/SwissDataScienceCenter/renku-project-template
Fetching template from
https://github.com/SwissDataScienceCenter/renku-project-template@master
... OK
INDEX ID DESCRIPTION PARAMETERS
----- -------------- -------------------------- ----------------------
1 python-minimal Basic Python Project:[...] description: proj[...]
2 R-minimal Basic R Project: The [...] description: proj[...]
Please choose a template by typing the index:
Provide parameters¶
Some templates require parameters to properly initialize a new project. You can check them by listing the templates with --list-templates.
To provide parameters, use the --parameter option and provide each parameter using --parameter "param1"="value1".
$ renku init --template-id python-minimal --parameter \
"description"="my new shiny project"
Initializing new Renku repository... OK
If you don’t provide the required parameters through the --parameter option, you will be asked to provide them. Empty values are allowed and passed to the template initialization function.
Note
Every project requires a name that can either be provided using --name or automatically taken from the target folder. This is also considered a special parameter, therefore it’s automatically added to the list of parameters forwarded to the init command.
Update an existing project¶
There are situations when the required structure of a Renku project needs to be recreated or you have an existing Git repository. You can solve these situations by simply adding the --force option.
$ git init .
$ echo "# Example\nThis is a README." > README.md
$ git add README.md
$ git commit -m 'Example readme file'
# renku init would fail because there is a git repository
$ renku init --force
You can also enable the external storage system for output files, if it was not installed previously.
$ renku init --force --external-storage
renku config¶
Get and set Renku repository or global options.
Set values¶
You can set various Renku configuration options, for example the image registry URL, with a command like:
$ renku config set registry https://registry.gitlab.com/demo/demo
By default, configuration is stored locally in the project’s directory. Use the --global option to store configuration for all projects in your home directory.
Remove values¶
To remove a specific key from configuration use:
$ renku config remove registry
By default, only local configuration is searched for removal. Use the --global option to remove a global configuration value.
Query values¶
You can display all configuration values with:
$ renku config show
Both local and global configuration files are read. Values in local configuration take precedence over global values. Use the --local or --global flag to read the corresponding configuration only.
You can provide a KEY to display only its value:
$ renku config show registry
https://registry.gitlab.com/demo/demo
Available configuration values¶
The following values are available for the renku config command:
Name | Description | Default |
---|---|---|
registry | The image registry to store Docker images in | None |
zenodo.access_token | Access token for Zenodo API | None |
dataverse.access_token | Access token for Dataverse API | None |
dataverse.server_url | URL for the Dataverse API server to use | None |
show_lfs_message | Whether to show messages about files being added to git LFS or not | True |
lfs_threshold | Threshold file size below which files are not added to git LFS | 100kb |
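For example, the LFS-related values from the table above can be set with the same renku config set command shown earlier; a short sketch in which the chosen values are only illustrative:
$ renku config set lfs_threshold "500kb"
$ renku config set --global show_lfs_message false
$ renku config show lfs_threshold
500kb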
renku dataset¶
Renku CLI commands for handling of datasets.
Manipulating datasets¶
Creating an empty dataset inside a Renku project:
$ renku dataset create my-dataset
Creating a dataset ... OK
You can pass the following options to this command to set various metadata for the dataset.
Option | Description |
---|---|
-t, --title | A human-readable title for the dataset. |
-d, --description | Dataset’s description. |
-c, --creator | Creator’s name, email, and an optional affiliation. Accepted format is ‘Forename Surname <email> [affiliation]’. Pass multiple times for a list of creators. |
-k, --keyword | Dataset’s keywords. Pass multiple times for a list of keywords. |
Editing a dataset’s metadata:
Use the edit subcommand to change a dataset’s metadata. You can edit the same set of metadata as with the create command by passing the options described in the table above.
$ renku dataset edit my-dataset --title 'New title'
Successfully updated: title.
Listing all datasets:
$ renku dataset ls
ID NAME TITLE VERSION
-------- ------------- ------------- ---------
0ad1cb9a some-dataset Some Dataset
9436e36c my-dataset My Dataset
You can select which columns to display by using --columns to pass a comma-separated list of column names:
$ renku dataset ls --columns id,name,date_created,creators
ID NAME CREATED CREATORS
-------- ------------- ------------------- ---------
0ad1cb9a some-dataset 2020-03-19 16:39:46 sam
9436e36c my-dataset 2020-02-28 16:48:09 sam
Displayed results are sorted based on the value of the first column.
To inspect the state of the dataset at a given commit, use the --revision flag:
$ renku dataset ls --revision=1103a42bd3006c94efcaf5d6a5e03a335f071215
ID NAME TITLE VERSION
a1fd8ce2 201901_us_flights_1 2019-01 US Flights 1
c2d80abe ds1 ds1
Deleting a dataset:
$ renku dataset rm some-dataset
OK
Working with data¶
Adding data to the dataset:
$ renku dataset add my-dataset http://data-url
This will copy the contents of data-url to the dataset and add it to the dataset metadata.
You can create a dataset when you add data to it for the first time by passing the --create flag to the add command:
$ renku dataset add --create new-dataset http://data-url
To add data from a git repository, you can specify it via https or git+ssh URL schemes. For example,
$ renku dataset add my-dataset git+ssh://host.io/namespace/project.git
Sometimes you want to add just specific paths within the parent project. In this case, use the --source or -s flag:
$ renku dataset add my-dataset --source path/within/repo/to/datafile \
git+ssh://host.io/namespace/project.git
The command above will result in a structure like
data/
my-dataset/
datafile
You can use shell-like wildcards (e.g. **, *, ?) when specifying paths to be added. Put wildcard patterns in quotes to prevent your shell from expanding them.
$ renku dataset add my-dataset --source 'path/**/datafile' \
git+ssh://host.io/namespace/project.git
You can use the --destination or -d flag to set the location where the new data is copied to. This location will be under the dataset’s data directory and will be created if it does not exist. You will get an error message if the destination exists and is a file.
$ renku dataset add my-dataset \
--source path/within/repo/to/datafile \
--destination new-dir/new-subdir \
git+ssh://host.io/namespace/project.git
will yield:
data/
my-dataset/
new-dir/
new-subdir/
datafile
To add a specific version of files, use the --ref option to select a branch, commit, or tag. The value passed to this option must be a valid reference in the remote Git repository.
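Putting these flags together, you can pin dataset files to a specific tag of the source repository and choose where they land in the data directory; a sketch in which the tag name and paths are only illustrative:
$ renku dataset add my-dataset \
    --source path/within/repo/to/datafile \
    --destination new-dir \
    --ref v1.0 \
    git+ssh://host.io/namespace/project.git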
Adding external data to the dataset:
Sometimes you might want to add data to your dataset without copying the actual files to your repository. This is useful, for example, when external data is too large to store locally. The external data must exist (i.e. be mounted) on your filesystem. Renku creates a symbolic link to your data and you can use this symbolic link in renku commands as a normal file. To add an external file, pass --external or -e when adding local data to a dataset:
$ renku dataset add my-dataset -e /path/to/external/file
Updating a dataset:
After adding files from a remote Git repository or importing a dataset from a provider like Dataverse or Zenodo, you can check for updates in those files by using the renku dataset update command. For Git repositories, this command checks all remote files and copies over new content if there is any. It does not delete files from the local dataset if they are deleted from the remote Git repository; to force the delete, use the --delete argument. You can update to a specific branch, commit, or tag by passing the --ref option.
For datasets from providers like Dataverse or Zenodo, the whole dataset is updated to ensure consistency between the remote and local versions. Due to this limitation, the --include and --exclude flags are not compatible with those datasets. Modifying those datasets locally will prevent them from being updated.
You can limit the scope of updated files by specifying dataset names, using --include and --exclude to filter based on file names, or using --creators to filter based on creators. For example, the following command updates only CSV files from my-dataset:
$ renku dataset update -I '*.csv' my-dataset
Note that putting glob patterns in quotes is needed to tell the Unix shell not to expand them.
External data are not updated automatically because they require a checksum calculation which can take a long time when data is large. To update external files, pass --external or -e to the update command:
$ renku dataset update -e
Tagging a dataset:
A dataset can be tagged with an arbitrary tag to refer to the dataset at that point in time. A tag can be added like this:
$ renku dataset tag my-dataset 1.0 -d "Version 1.0 tag"
A list of all tags can be seen by running:
$ renku dataset ls-tags my-dataset
CREATED NAME DESCRIPTION DATASET COMMIT
------------------- ------ --------------- ---------- ----------------
2020-09-19 17:29:13 1.0 Version 1.0 tag my-dataset 6c19a8d31545b...
A tag can be removed with:
$ renku dataset rm-tags my-dataset 1.0
Importing data from other Renku projects:
To import all data files and their metadata from another Renku dataset use:
$ renku dataset import \
https://renkulab.io/projects/<username>/<project>/datasets/<dataset-id>
or
$ renku dataset import \
https://renkulab.io/datasets/<dataset-id>
You can get the link to a dataset from the UI or you can construct it by knowing the dataset’s ID.
Importing data from an external provider:
$ renku dataset import 10.5281/zenodo.3352150
This will import the dataset with the DOI (Digital Object Identifier) 10.5281/zenodo.3352150 and make it locally available.
Dataverse and Zenodo are supported, with DOIs (e.g. 10.5281/zenodo.3352150 or doi:10.5281/zenodo.3352150) and full URLs (e.g. http://zenodo.org/record/3352150). A tag with the remote version of the dataset is automatically created.
Exporting data to an external provider:
$ renku dataset export my-dataset zenodo
This will export the dataset my-dataset to zenodo.org as a draft, allowing for publication later on. If the dataset has any tags set, you can choose whether the repository HEAD version or one of the tags should be exported. The remote version will be set to the local tag that is being exported.
To export to a Dataverse provider, you must pass the Dataverse server’s URL and the name of the parent dataverse where the dataset will be exported to. The server’s URL is stored in your Renku settings, so you don’t need to pass it every time.
Listing all files in the project associated with a dataset:
$ renku dataset ls-files
DATASET NAME ADDED PATH LFS
------------------- ------------------- ----------------------------- ----
my-dataset 2020-02-28 16:48:09 data/my-dataset/addme *
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file1 *
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file2
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file3 *
You can select which columns to display by using --columns to pass a comma-separated list of column names:
$ renku dataset ls-files --columns name,creators,path
DATASET NAME CREATORS PATH
------------------- --------- -----------------------------
my-dataset sam data/my-dataset/addme
my-dataset sam data/my-dataset/weather/file1
my-dataset sam data/my-dataset/weather/file2
my-dataset sam data/my-dataset/weather/file3
Displayed results are sorted based on the value of the first column.
Sometimes you want to filter the files. For this we use the --dataset, --include and --exclude flags:
$ renku dataset ls-files --include "file*" --exclude "file3"
DATASET NAME ADDED PATH LFS
------------------- ------------------- ----------------------------- ----
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file1 *
my-dataset 2020-02-28 16:49:02 data/my-dataset/weather/file2 *
Unlink a file from a dataset:
$ renku dataset unlink my-dataset --include file1
OK
Unlink all files within a directory from a dataset:
$ renku dataset unlink my-dataset --include "weather/*"
OK
Unlink all files from a dataset:
$ renku dataset unlink my-dataset
Warning: You are about to remove following from "my-dataset" dataset.
.../my-dataset/weather/file1
.../my-dataset/weather/file2
.../my-dataset/weather/file3
Do you wish to continue? [y/N]:
Note
The unlink command does not delete files, only the dataset record.
renku run¶
Track provenance of data created by executing programs.
Capture command line execution¶
Tracking execution of your command line script is done by simply adding the renku run command before the actual command. This will enable detection of:
- arguments (flags),
- string and integer options,
- input files or directories if linked to existing paths in the repository,
- output files or directories if modified or created while running the command.
Note
If there were uncommitted changes in the repository, then the renku run command fails. See git status for details.
Warning
Input and output paths can only be detected if they are passed as arguments to renku run.
Warning
Circular dependencies are not supported for renku run. See Circular Dependencies for more details.
Warning
When using output redirection in renku run on Windows (with > file or 2> file), all Renku errors and messages are redirected as well and renku run produces no output on the terminal. On Linux, this is detected by renku and only the output of the command to be run is actually redirected. Renku-specific messages such as errors get printed to the terminal as usual and don’t get redirected.
Detecting input paths¶
Any path passed as an argument to renku run, which was not changed during the execution, is identified as an input path. The identification only works if the path associated with the argument matches an existing file or directory in the repository.
The detection might not work as expected if:
- a file is modified during the execution. In this case it will be stored as an output;
- a path is not passed as an argument to renku run.
Specifying auxiliary inputs (--input)
You can specify extra inputs to your program explicitly by using the --input option. This is useful for specifying hidden dependencies that don’t appear on the command line. Explicit inputs must exist before execution of the renku run command. This option is not a replacement for the arguments that are passed on the command line. Files or directories specified with this option will not be passed as input arguments to the script.
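For instance, a script that reads a configuration file internally (so the file never appears on the command line) can still be tracked by declaring that file explicitly; a sketch in which the script and file names are only illustrative:
$ renku run --input config.yaml python analysis.py data/my-dataset/input.csv results.csv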
Disabling input detection (--no-input-detection)
Input path detection can be disabled by passing the --no-input-detection flag to renku run. In this case, only the directories/files that are passed as explicit input are considered to be file inputs. Those passed via command arguments are ignored unless they are in the explicit inputs list. This only affects files and directories; command options and flags are still treated as inputs.
Detecting output paths¶
Any path modified or created during the execution will be added as an output. Because the output path detection is based on the Git repository state after the execution of the renku run command, it is good to have a basic understanding of the underlying principles and limitations of tracking files in Git.
Git tracks not only the paths in a repository, but also the content stored in those paths. Therefore:
- a recreated file with the same content is not considered an output file, but instead is kept as an input;
- file moves are detected based on their content and can cause problems;
- directories cannot be empty.
Note
When in doubt whether the outputs will be detected, remove all outputs using git rm <path> followed by git commit before running the renku run command.
Command does not produce any files (--no-output)
If the program does not produce any outputs, the execution ends with an error:
Error: There are not any detected outputs in the repository.
You can specify the --no-output option to force tracking of such an execution.
Specifying outputs explicitly (--output)
You can specify expected outputs of your program explicitly by using the --output option. These outputs must exist after the execution of the renku run command. However, they do not need to be modified by the command.
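For example, if a script writes its results into a directory whose name never appears on the command line, you can declare that directory explicitly; a sketch with illustrative names:
$ renku run --output figures python plot_results.py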
Disabling output detection (--no-output-detection)
Output path detection can be disabled by passing the --no-output-detection flag to renku run. When disabled, only the directories/files that are passed as explicit output are considered to be outputs and those passed via command arguments are ignored.
Detecting standard streams¶
Often a program expects input as a standard input stream. This is detected and recorded in the tool specification when invoked, for example, by renku run cat < A.
Similarly, both redirects to standard output and standard error output can be done when invoking a command:
$ renku run grep "test" B > C 2> D
Warning
Detecting inputs and outputs from pipes | is not supported.
Specifying inputs and outputs programmatically¶
Sometimes the list of inputs and outputs are not known before execution of the program. For example, a program might accept a date range as input and access all files within that range during its execution.
To address this issue, the program can dump a list of the input and output files that it is accessing in inputs.txt and outputs.txt. Each line in these files is expected to be the path to an input or output file within the project’s directory. When the program is finished, Renku will look for the existence of these two files and add their content to the list of explicit inputs and outputs. Renku will then delete these two files.
By default, Renku looks for these two files in the .renku/tmp directory. One can change this default location by setting the RENKU_INDIRECT_PATH environment variable. When set, it points to a sub-directory within the .renku/tmp directory where inputs.txt and outputs.txt reside.
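A minimal sketch of a program that uses this mechanism, assuming the default .renku/tmp location and honoring RENKU_INDIRECT_PATH if it is set; the data file names are only illustrative:
import os
from pathlib import Path

# Resolve the directory where Renku expects inputs.txt and outputs.txt:
# .renku/tmp by default, or a sub-directory of it named by RENKU_INDIRECT_PATH.
indirect_dir = Path(".renku") / "tmp"
sub_dir = os.environ.get("RENKU_INDIRECT_PATH")
if sub_dir:
    indirect_dir = indirect_dir / sub_dir
indirect_dir.mkdir(parents=True, exist_ok=True)

# Pretend the program decided only at runtime which files it reads and writes.
used_inputs = ["data/my-dataset/2020-01.csv", "data/my-dataset/2020-02.csv"]
produced_outputs = ["results/monthly_summary.csv"]

# One path per line, relative to the project directory.
(indirect_dir / "inputs.txt").write_text("\n".join(used_inputs) + "\n")
(indirect_dir / "outputs.txt").write_text("\n".join(produced_outputs) + "\n")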
Exit codes¶
All Unix commands return a number between 0 and 255 which is called the “exit code”. In case other numbers are returned, they are treated modulo 256 (-10 is equivalent to 246, 257 is equivalent to 1). The exit code 0 represents a success and a non-zero exit code indicates a failure.
Therefore the command specified after renku run is expected to return exit code 0. If the command returns a different exit code, you can specify it with the --success-code=<INT> parameter.
$ renku run --success-code=1 --no-output fail
Circular Dependencies¶
Circular dependencies are not supported in renku run. This means you cannot
use the same file or directory as both an input and an output in the same step,
for instance reading from a file as input and then appending to it is not
allowed. Since renku records all steps of an analysis workflow in a dependency
graph and it allows you to update outputs when an input changes, this would
lead to problems with circular dependencies. An update command would change the
input again, leading to renku seeing it as a changed input, which would run
update again, and so on, without ever stopping.
Due to this, the renku dependency graph has to be acyclic. So instead of appending to an input file or writing an output file to the same directory that was used as an input directory, create new files or write to other directories, respectively.
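As a concrete illustration, the first command below would make the same file both an input and an output of one step, which is rejected, while the second writes the combined result to a new file and keeps the graph acyclic; the script and file names are only illustrative:
# Not allowed: data.csv would be both an input and an output of the step
$ renku run --input data.csv --output data.csv python append_rows.py
# Allowed: write the combined result to a new file instead
$ renku run --input data.csv --output merged.csv python merge_rows.py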
renku log¶
Show provenance of data created by executing programs.
File provenance¶
Unlike the traditional file history format, which shows previous revisions of the file, this format presents tool inputs together with their revision identifiers.
A * character shows to which lineage the specific file belongs. A @ character in the graph lineage means that the corresponding file does not have any inputs and the history starts there.
When called without file names, renku log shows the history of the most recently created files. With the --revision <refname> option, the output is shown as it was in the specified revision.
Provenance examples¶
renku log B
- Show the history of file B since its last creation or modification.
renku log --revision HEAD~5
- Show the history of files that have been created or modified 5 commits ago.
renku log --revision e3f0bd5a D E
- Show the history of files D and E as it looked in the commit e3f0bd5a.
Output formats¶
The following formats are supported when specified with the --format option:
- ascii
- dot
- dot-full
- dot-landscape
- dot-full-landscape
- dot-debug
- json-ld
- json-ld-graph
- Makefile
- nt
- rdf
You can generate a PNG of the full history of all files in the repository using the dot program.
$ FILES=$(git ls-files --no-empty-directory --recurse-submodules)
$ renku log --format dot $FILES | dot -Tpng > /tmp/graph.png
$ open /tmp/graph.png
Output validation¶
The --strict option forces the output to be validated against the Renku SHACL schema, causing the command to fail if the generated output is not valid, as well as printing detailed information on all the issues found.
The --strict option is only supported for the json-ld, rdf and nt output formats.
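For example, to validate the JSON-LD lineage of a single file against the schema (assuming the json-ld format name from the list above):
$ renku log --format json-ld --strict wc_readme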
renku status¶
Show status of data files created in the repository.
Inspecting a repository¶
Displays paths of outputs which were generated from newer input files, and paths of files that have been used in different versions.
The first paths are what need to be recreated by running renku update. See more in the section about renku update.
The paths mentioned in the output are made relative to the current directory if you are working in a subdirectory (this is on purpose, to help cutting and pasting to other commands). They also contain the first 8 characters of the corresponding commit identifier after the # (hash). If the file was imported from another repository, the short name of the source repository is shown together with the filename before @.
renku update¶
Update outdated files created by the “run” command.
Recreating outdated files¶
The information about dependencies for each file in the repository is generated from information stored in the underlying Git repository.
A minimal dependency graph is generated for each outdated file stored in the repository. It means that only the necessary steps will be executed and the workflow used to orchestrate these steps is stored in the repository.
Assume that the following history for the file H exists.
C---D---E
/ \
A---B---F---G---H
The first example shows a situation when D is modified and files E and H become outdated.
C--*D*--(E)
/ \
A---B---F---G---(H)
** - modified
() - needs update
In this situation, you can do effectively two things:
Recreate a single file by running
$ renku update E
Update all files by simply running
$ renku update --all
Note
If there were uncommitted changes then the command fails. Check git status to see details.
Pre-update checks¶
In the next example, files A or B are modified, hence the majority of dependent files must be recreated.
(C)--(D)--(E)
/ \
*A*--*B*--(F)--(G)--(H)
To avoid excessive recreation of a large portion of files which could have been affected by a simple change of an input file, consider specifying a single file (e.g. renku update G). See also renku status.
Update siblings¶
If a tool produces multiple output files, these outputs need to be always updated together.
(B)
/
*A*--[step 1]--(C)
\
(D)
An attempt to update a single file would fail with the following error.
$ renku update C
Error: There are missing output siblings:
B
D
Include the files above in the command or use --with-siblings option.
The following commands will produce the same result.
$ renku update --with-siblings C
$ renku update B C D
renku rerun¶
Recreate files created by the “run” command.
Recreating files¶
Assume you have run a step 2 that uses a stochastic algorithm, so each run will be slightly different. The goal is to regenerate output C several times to compare the output. In this situation it is not possible to simply call renku update since the input file A has not been modified after the execution of step 2.
A-[step 1]-B-[step 2*]-C
Recreate a specific output file by running:
$ renku rerun C
If you would like to recreate a file which was one of several produced by a tool, then these files must be recreated as well. See the explanation in updating siblings.
renku rm¶
Remove a file, a directory, or a symlink.
Removing a file that belongs to a dataset will update its metadata. It also will attempt to update tracking information for files stored in an external storage (using Git LFS).
renku mv¶
Move or rename a file, a directory, or a symlink.
Moving a file that belongs to a dataset will update its metadata. It also will attempt to update tracking information for files stored in an external storage (using Git LFS). Finally it makes sure that all relative symlinks work after the move.
renku workflow¶
Manage the set of CWL files created by renku commands.
Manipulating workflows¶
Listing workflows:
$ renku workflow ls
26be2e8d66f74130a087642768f2cef0_rerun.yaml:
199c4b9d462f4b27a4513e5e55f76eb2_cat.yaml:
9bea2eccf9624de387d9b06e61eec0b6_rerun.yaml:
b681b4e229764ceda161f6551370af12_update.yaml:
25d0805243e3468d92a3786df782a2c4_rerun.yaml:
Each *.yaml file corresponds to a renku run/update/rerun execution.
Exporting workflows:
You can export the workflow used to create a file in Common Workflow Language format by using:
$ renku workflow set-name create output_file
baseCommand:
- cat
class: CommandLineTool
cwlVersion: v1.0
id: 22943eca-fa4c-4f3b-a92d-f6ac7badc0d2
inputs:
- default:
class: File
path: /home/user/project/intermediate
id: inputs_1
inputBinding:
position: 1
type: File
- default:
class: File
path: /home/user/project/intermediate2
id: inputs_2
inputBinding:
position: 2
type: File
outputs:
- id: output_stdout
streamable: false
type: stdout
requirements:
InitialWorkDirRequirement:
listing:
- entry: $(inputs.inputs_1)
entryname: intermediate
writable: false
- entry: $(inputs.inputs_2)
entryname: intermediate2
writable: false
stdout: output_file
You can use --revision to specify the revision of the output file to generate the workflow for. You can also export to a file directly with -o <path>.
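For example, combining both options writes the CWL description of the file as it was at an earlier revision directly to disk; a sketch reusing the command form shown above, where the revision and output path are only illustrative:
$ renku workflow set-name create output_file --revision HEAD~1 -o output_file.cwl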
renku save¶
Convenience method to save local changes and push them to a remote server.
If you have local modifications to files, you can save them using
$ renku save
Username for 'https://renkulab.io': my.user
Password for 'https://my.user@renkulab.io':
Successfully saved:
file1
file2
OK
Warning
The username and password for renku save are your GitLab username and password, not your renkulab login!
You can additionally supply a message that describes the changes that you made by using the -m or --message parameter followed by your message.
$ renku save -m "Updated file1 and 2."
Successfully saved:
file1
file2
OK
If no remote server has been configured, you can specify one by using the -d or --destination parameter. Otherwise you will get an error.
$ renku save
Error: No remote has been set up for the current branch
$ renku save -d https://renkulab.io/gitlab/my.user/my-project.git
Successfully saved:
file1
file2
OK
You can also specify which paths to save:
$ renku save file1
Successfully saved:
file1
OK
renku show¶
Show information about objects in current repository.
Siblings¶
In situations when multiple outputs have been generated by a single renku run command, the siblings can be discovered by running the renku show siblings PATH command.
Assume that the following graph represents relations in the repository.
D---E---G
/ \
A---B---C F
Then the following outputs would be shown.
$ renku show siblings C
C
D
$ renku show siblings G
F
G
$ renku show siblings A
A
$ renku show siblings C G
C
D
---
F
G
$ renku show siblings
A
---
B
---
C
D
---
E
---
F
G
You can use the -f or --flat flag to output a flat list, as well as the -v or --verbose flag to also output commit information.
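For example, a flat and verbose listing of the siblings of C from the graph above:
$ renku show siblings --flat --verbose C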
Input and output files¶
You can list input and output files generated in the repository by running the renku show inputs and renku show outputs commands. Alternatively, you can check if all paths specified as arguments are input or output files respectively.
$ renku run wc < source.txt > result.wc
$ renku show inputs
source.txt
$ renku show outputs
result.wc
$ renku show outputs source.txt
$ echo $? # last command finished with an error code
1
You can use the -v or --verbose flag to print detailed information in a tabular format.
$ renku show inputs -v
PATH COMMIT USAGE TIME WORKFLOW
---------- ------- ------------------- -------------------...-----------
source.txt 6d10e05 2020-09-14 23:47:17 .renku/workflow/388...d8_head.yaml
renku storage¶
Manage an external storage.
Pulling files from git LFS¶
LFS works by checking small pointer files into git and saving the actual contents of a file in LFS. If instead of your file content, you see something like this, it means the file is stored in git LFS and its contents are not currently available locally (they are not pulled):
version https://git-lfs.github.com/spec/v1
oid sha256:42b5c7fb2acd54f6d3cd930f18fee3bdcb20598764ca93bdfb38d7989c054bcf
size 12
You can manually pull contents of file(s) you want with:
$ renku storage pull file1 file2
Removing local content of files stored in git LFS¶
If you want to restore a file back to its pointer file state, for instance to free up space locally, you can run:
$ renku storage clean file1 file2
This removes any data cached locally for files tracked in git LFS.
renku doctor¶
Check your system and repository for potential problems.
renku migrate¶
Migrate project to the latest Renku version.
renku githooks¶
Install and uninstall Git hooks.
Prevent modifications of output files¶
The commit hooks are enabled by default to prevent situations where an output file is manually modified.
$ renku init
$ renku run echo hello > greeting.txt
$ edit greeting.txt
$ git commit greeting.txt
You are trying to update some output files.
Modified outputs:
greeting.txt
If you are sure, use "git commit --no-verify".
Error Tracking¶
Renku is not bug-free and you can help us find bugs.
GitHub¶
You can quickly open an issue on GitHub with a traceback and minimal system information when you hit an unhandled exception in the CLI.
Ahhhhhhhh! You have found a bug. 🐞
1. Open an issue by typing "open";
2. Print human-readable information by typing "print";
3. See the full traceback without submitting details (default: "ignore").
Please select an action by typing its name (open, print, ignore) [ignore]:
Sentry¶
When using renku as a hosted service, the Sentry integration can be enabled to help developers iterate faster by showing them where bugs happen, how often, and who is affected.
- Install Sentry-SDK with python -m pip install sentry-sdk;
- Set environment variable SENTRY_DSN=https://<key>@sentry.<domain>/<project>.
Warning
User information might be sent to help resolve the problem. If you are not using your own Sentry instance, you should inform users that you are sending possibly sensitive information to a 3rd-party service.
Internals¶
Internals of the renku-python library.
Models¶
Model objects used in Python SDK.
Projects¶
Model objects representing projects.
class renku.core.models.projects.Project(name=None, created=NOTHING, version='8', agent_version='pre-0.11.0', template_source: str = None, template_ref: str = None, template_id: str = None, template_version: str = None, template_metadata: str = '{}', immutable_template_files=NOTHING, automated_update=False, *, client=None, creator=None, id=None)[source]¶
Represent a project.
Method generated by attrs for class Project.
project_id¶ Return the id for the project.
class renku.core.models.projects.ProjectCollection(client=None)[source]¶
Represent projects on the server.
Example: create a project and check its name.
>>> project = client.projects.create(name='test-project')
>>> project.name
'test-project'
Create a representation of objects on the server.
create(name=None, **kwargs)[source]¶ Create a new project.
Parameters: name – The name of the project.
Returns: An instance of the newly created project.
Return type: renku.core.models.projects.Project
Datasets¶
Model objects representing datasets.
Dataset object¶
class renku.core.models.datasets.Dataset(*, commit=None, client=None, path=None, project: renku.core.models.projects.Project = None, parent=None, creators, id=None, label=None, date_published=None, description=None, identifier=NOTHING, in_language=None, keywords=None, license=None, title: str = None, url=None, version=None, date_created=NOTHING, files=NOTHING, tags=NOTHING, same_as=None, name=None, derived_from=None, immutable=False)[source]¶
Represent a dataset.
Method generated by attrs for class Dataset.
creators_csv¶ Comma-separated list of creators associated with dataset.
creators_full_csv¶ Comma-separated list of creators with full identity.
data_dir¶ Directory where dataset files are stored.
default_id()¶ Configure calculated ID.
default_label()¶ Generate a default label.
editable¶ Subset of attributes which user can edit.
entities¶ Yield itself.
find_file(path, return_index=False)[source]¶ Find a file in files container using its relative path.
classmethod from_jsonld(data, client=None, commit=None, schema_class=None)[source]¶ Create an instance from JSON-LD data.
classmethod from_revision(client, path, revision='HEAD', parent=None, find_previous=True, **kwargs)¶ Return dependency from given path and revision.
keywords_csv¶ Comma-separated list of keywords associated with dataset.
mutate()[source]¶ Update mutation history and assign a new identifier.
Do not mutate more than once before committing the metadata or otherwise there would be missing links in the chain of changes.
parent¶ Return the parent object.
set_client(client)¶ Sets the clients on this entity.
short_id¶ Shorter version of identifier.
submodules¶ Proxy to client submodules.
tags_csv¶ Comma-separated list of tags associated with dataset.
Dataset file¶
Manage files in the dataset.
class renku.core.models.datasets.DatasetFile(*, commit=None, client=None, path=None, id=None, label=NOTHING, project: renku.core.models.projects.Project = None, parent=None, added=NOTHING, checksum=None, filename=NOTHING, name=None, filesize=None, filetype=None, url=None, based_on=None, external=False, source=None, is_lfs=False)[source]¶
Represent a file in a dataset.
Method generated by attrs for class DatasetFile.
commit_sha¶ Return commit hash.
default_id()¶ Configure calculated ID.
default_label()¶ Generate a default label.
entities¶ Yield itself.
classmethod from_revision(client, path, revision='HEAD', parent=None, find_previous=True, **kwargs)¶ Return dependency from given path and revision.
full_path¶ Return full path in the current reference frame.
parent¶ Return the parent object.
set_client(client)¶ Sets the clients on this entity.
size_in_mb¶ Return file size in megabytes.
submodules¶ Proxy to client submodules.
Provenance¶
Extract provenance information from the repository.
Activities¶
class renku.core.models.provenance.activities.Activity(*, commit=None, client=None, path=None, label=NOTHING, project: renku.core.models.projects.Project = None, id=None, message=NOTHING, was_informed_by=NOTHING, part_of=None, generated=None, invalidated=None, influenced=NOTHING, started_at_time=NOTHING, ended_at_time=NOTHING, agents=NOTHING)[source]¶
Represent an activity in the repository.
Method generated by attrs for class Activity.
default_label()¶ Generate a default label.
classmethod from_jsonld(data, client=None, commit=None)[source]¶ Create an instance from JSON-LD data.
nodes¶ Return topologically sorted nodes.
parents¶ Return parent commits.
paths¶ Return all paths in the commit.
removed_paths¶ Return all paths removed in the commit.
submodules¶ Proxy to client submodules.
class renku.core.models.provenance.activities.ProcessRun(*, commit=None, client=None, path=None, label=NOTHING, project: renku.core.models.projects.Project = None, id=None, message=NOTHING, was_informed_by=NOTHING, part_of=None, invalidated=None, influenced=NOTHING, started_at_time=NOTHING, ended_at_time=NOTHING, agents=NOTHING, generated=None, association=None, annotations=None, qualified_usage=None)[source]¶
A process run is a particular execution of a Process description.
Method generated by attrs for class ProcessRun.
default_agents()¶ Set person agent to be the author of the commit.
default_ended_at_time()¶ Configure calculated properties.
default_id()¶ Configure calculated ID.
default_influenced()¶ Calculate default values.
default_invalidated()¶ Entities invalidated by this Action.
default_label()¶ Generate a default label.
default_message()¶ Generate a default message.
default_started_at_time()¶ Configure calculated properties.
default_was_informed_by()¶ List parent actions.
classmethod from_jsonld(data, client=None, commit=None)[source]¶ Create an instance from JSON-LD data.
classmethod from_run(run, client, path, commit=None, subprocess_index=None, update_commits=False)[source]¶ Convert a Run to a ProcessRun.
classmethod from_yaml(path, client=None, commit=None)¶ Return an instance from a YAML file.
classmethod generate_id(commitsha)¶ Calculate action ID.
get_output_paths()¶ Gets all output paths generated by this run.
nodes¶ Return topologically sorted nodes.
parents¶ Return parent commits.
paths¶ Return all paths in the commit.
removed_paths¶ Return all paths removed in the commit.
submodules¶ Proxy to client submodules.
class renku.core.models.provenance.activities.WorkflowRun(*, commit=None, client=None, path=None, label=NOTHING, project: renku.core.models.projects.Project = None, id=None, message=NOTHING, was_informed_by=NOTHING, part_of=None, invalidated=None, influenced=NOTHING, started_at_time=NOTHING, ended_at_time=NOTHING, agents=NOTHING, generated=None, association=None, annotations=None, qualified_usage=None, processes=NOTHING)[source]¶
A workflow run typically contains several subprocesses.
Method generated by attrs for class WorkflowRun.
add_annotations(annotations)¶ Adds annotations from an external tool.
default_agents()¶ Set person agent to be the author of the commit.
default_ended_at_time()¶ Configure calculated properties.
default_generated()¶ Create default generated.
default_id()¶ Configure calculated ID.
default_influenced()¶ Calculate default values.
default_invalidated()¶ Entities invalidated by this Action.
default_label()¶ Generate a default label.
default_message()¶ Generate a default message.
default_started_at_time()¶ Configure calculated properties.
default_was_informed_by()¶ List parent actions.
classmethod from_jsonld(data, client=None, commit=None)[source]¶ Create an instance from JSON-LD data.
classmethod from_run(run, client, path, commit=None, update_commits=False)[source]¶ Convert a Run to a WorkflowRun.
classmethod from_yaml(path, client=None, commit=None)¶ Return an instance from a YAML file.
classmethod generate_id(commitsha)¶ Calculate action ID.
get_output_paths()¶ Gets all output paths generated by this run.
nodes¶ Yield all graph nodes.
parents¶ Return parent commits.
paths¶ Return all paths in the commit.
plugin_annotations()¶ Adds Annotations from plugins to a ProcessRun.
removed_paths¶ Return all paths removed in the commit.
submodules¶ Proxy to client submodules.
subprocesses¶ Subprocesses of this WorkflowRun.
Entities¶
class renku.core.models.entities.Entity(*, commit=None, client=None, path=None, id=None, label=NOTHING, project: renku.core.models.projects.Project = None, parent=None)[source]¶
Represent a data value or item.
Method generated by attrs for class Entity.
default_id()¶ Configure calculated ID.
default_label()¶ Generate a default label.
entities¶ Yield itself.
classmethod from_revision(client, path, revision='HEAD', parent=None, find_previous=True, **kwargs)[source]¶ Return dependency from given path and revision.
parent¶ Return the parent object.
submodules¶ Proxy to client submodules.
class renku.core.models.entities.Collection(*, commit=None, client=None, path=None, id=None, label=NOTHING, project: renku.core.models.projects.Project = None, parent=None, members=None)[source]¶
Represent a directory with files.
Method generated by attrs for class Collection.
default_id()¶ Configure calculated ID.
default_label()¶ Generate a default label.
entities¶ Recursively return all files.
classmethod from_revision(client, path, revision='HEAD', parent=None, find_previous=True, **kwargs)¶ Return dependency from given path and revision.
parent¶ Return the parent object.
submodules¶ Proxy to client submodules.
Agents¶
class renku.core.models.provenance.agents.Person(*, client=None, name, email=None, label=NOTHING, affiliation=None, alternate_name=None, id=None)[source]¶
Represent a person.
Method generated by attrs for class Person.
full_identity¶ Return name, email, and affiliation.
short_name¶ Gives full name in short form.
Relations¶
class renku.core.models.provenance.qualified.Usage(*, entity, role=None, id=None)[source]¶
Represent a dependent path.
Method generated by attrs for class Usage.
Renku Workflow¶
Renku uses PROV-O and its own Renku ontology to represent workflows.
Run¶
Represents a workflow template.
class renku.core.models.workflow.run.OrderedSubprocess(*, id, index: int, process)[source]¶
A subprocess with ordering.
Method generated by attrs for class OrderedSubprocess.
class renku.core.models.workflow.run.OrderedSubprocessSchema(*args, commit=None, client=None, **kwargs)[source]¶
OrderedSubprocess schema.
Create an instance.
class Meta[source]¶ Meta class.
model¶ alias of OrderedSubprocess
class renku.core.models.workflow.run.Run(*, commit=None, client=None, path=None, id=None, label=NOTHING, project: renku.core.models.projects.Project = None, command: str = None, successcodes: list = NOTHING, subprocesses=NOTHING, arguments=NOTHING, inputs=NOTHING, outputs=NOTHING, activity=None)[source]¶
Represents a renku run execution template.
Method generated by attrs for class Run.
activity¶ Return the activity object.
Parameters¶
Represents a workflow template.
class renku.core.models.workflow.parameters.CommandArgument(*, id=None, label=None, position: int = None, prefix: str = None, value: str = None)[source]¶
An argument to a command that is neither input nor output.
Method generated by attrs for class CommandArgument.
class renku.core.models.workflow.parameters.CommandArgumentSchema(*args, commit=None, client=None, **kwargs)[source]¶
CommandArgument schema.
Create an instance.
class Meta[source]¶ Meta class.
model¶ alias of CommandArgument
class renku.core.models.workflow.parameters.CommandInput(*, id=None, label=None, position: int = None, prefix: str = None, consumes, mapped_to=None)[source]¶
An input to a command.
Method generated by attrs for class CommandInput.
class renku.core.models.workflow.parameters.CommandInputSchema(*args, commit=None, client=None, **kwargs)[source]¶
CommandArgument schema.
Create an instance.
class Meta[source]¶ Meta class.
model¶ alias of CommandInput
class renku.core.models.workflow.parameters.CommandOutput(*, id=None, label=None, position: int = None, prefix: str = None, create_folder: bool = False, produces, mapped_to=None)[source]¶
An output of a command.
Method generated by attrs for class CommandOutput.
class renku.core.models.workflow.parameters.CommandOutputSchema(*args, commit=None, client=None, **kwargs)[source]¶
CommandArgument schema.
Create an instance.
class Meta[source]¶ Meta class.
model¶ alias of CommandOutput
class renku.core.models.workflow.parameters.CommandParameter(*, id=None, label=None, position: int = None, prefix: str = None)[source]¶
Represents a parameter for an execution template.
Method generated by attrs for class CommandParameter.
sanitized_id¶ Return _id sanitized for use in non-jsonld contexts.
class renku.core.models.workflow.parameters.CommandParameterSchema(*args, commit=None, client=None, **kwargs)[source]¶
CommandParameter schema.
Create an instance.
class Meta[source]¶ Meta class.
model¶ alias of CommandParameter
Renku Workflow Conversion¶
Renku allows conversion of tracked workflows to runnable workflows in supported tools (currently CWL).
Tools and Workflows¶
Manage creation of tools and workflows for workflow tracking.
Command-line tool¶
Represent a CommandLineToolFactory for tracking workflows.
class renku.core.models.cwl.command_line_tool.CommandLineToolFactory(command_line, explicit_inputs=NOTHING, explicit_outputs=NOTHING, no_input_detection=False, no_output_detection=False, directory='.', working_dir='.', stdin=None, stderr=None, stdout=None, successCodes=NOTHING, annotations=None, messages=None, warnings=None)[source]¶
Command Line Tool Factory.
Method generated by attrs for class CommandLineToolFactory.
Annotation¶
Represent an annotation for a workflow.
Parameter¶
Represent parameters from the Common Workflow Language.
class renku.core.models.cwl.parameter.CommandInputParameter(id=None, streamable=None, type='string', description=None, default=None, inputBinding=None)[source]¶
An input parameter for a CommandLineTool.
Method generated by attrs for class CommandInputParameter.
class renku.core.models.cwl.parameter.CommandLineBinding(position=None, prefix=None, separate: bool = True, itemSeparator=None, valueFrom=None, shellQuote: bool = True)[source]¶
Define the binding behavior when building the command line.
Method generated by attrs for class CommandLineBinding.
class renku.core.models.cwl.parameter.CommandOutputBinding(glob=None)[source]¶
Define the binding behavior for outputs.
Method generated by attrs for class CommandOutputBinding.
class renku.core.models.cwl.parameter.CommandOutputParameter(id=None, streamable=None, type='string', description=None, format=None, outputBinding=None)[source]¶
Define an output parameter for a CommandLineTool.
Method generated by attrs for class CommandOutputParameter.
class renku.core.models.cwl.parameter.InputParameter(id=None, streamable=None, type='string', description=None, default=None, inputBinding=None)[source]¶
An input parameter.
Method generated by attrs for class InputParameter.
class renku.core.models.cwl.parameter.OutputParameter(id=None, streamable=None, type='string', description=None, format=None, outputBinding=None)[source]¶
An output parameter.
Method generated by attrs for class OutputParameter.
class renku.core.models.cwl.parameter.Parameter(streamable=None)[source]¶
Define an input or output parameter to a process.
Method generated by attrs for class Parameter.
Types¶
Represent the Common Workflow Language types.
class renku.core.models.cwl.types.Directory(path=None, listing=NOTHING)[source]¶
Represent a directory.
Method generated by attrs for class Directory.
Workflow¶
Represent workflows from the Common Workflow Language.
class renku.core.models.cwl.workflow.Workflow(steps=NOTHING)[source]¶
Define a workflow representation.
Method generated by attrs for class Workflow.
File References¶
Manage names of Renku objects.
class renku.core.models.refs.LinkReference(client, name)[source]¶
Manage linked object names.
Method generated by attrs for class LinkReference.
REFS = 'refs'¶ Define a name of the folder with references in the Renku folder.
classmethod check_ref_format(name, no_slashes=False)[source]¶ Ensures that a reference name is well formed.
It follows the Git naming convention; a reference name is not well formed if:
- any path component of it begins with “.”, or
- it has double dots “..”, or
- it has ASCII control characters, or
- it has “:”, “?”, “[”, “\”, “^”, “~”, SP, or TAB anywhere, or
- it has “*” anywhere, or
- it ends with a “/”, or
- it ends with “.lock”, or
- it contains a “@{” portion
path¶ Return full reference path.
reference¶ Return the path we point to relative to the client.
Repository API¶
This API is built on top of Git and Git-LFS.
Renku repository management.
class renku.core.management.LocalClient(path=<function default_path>, renku_home='.renku', parent=None, commit_activity_cache=NOTHING, activity_index=None, remote_cache=NOTHING, external_storage_requested=True, *, data_dir='data')[source]¶
A low-level client for communicating with a local Renku repository.
Method generated by attrs for class LocalClient.
Datasets¶
Client for handling datasets.
class renku.core.management.datasets.DatasetsApiMixin[source]¶
Client for handling datasets.
Method generated by attrs for class DatasetsApiMixin.
CACHE = 'cache'¶ Directory to cache transient data.
DATASETS = 'datasets'¶ Directory for storing dataset metadata in Renku.
POINTERS = 'pointers'¶ Directory for storing external pointer files.
add_data_to_dataset(dataset, urls, force=False, overwrite=False, sources=(), destination='', ref=None, external=False, extract=False, all_at_once=False, destination_names=None, progress=None)[source]¶ Import the data into the data directory.
add_dataset_tag(dataset, tag, description='', force=False)[source]¶ Adds a new tag to a dataset.
Validates if the tag already exists and that the tag follows the same rules as docker tags. See https://docs.docker.com/engine/reference/commandline/tag/ for a documentation of docker tag syntax.
Raises: errors.ParameterError
create_dataset(name=None, title=None, description=None, creators=None, keywords=None)[source]¶ Create a dataset.
dataset_commits(dataset, max_results=None)[source]¶ Gets the newest commit for a dataset or its files.
Commits are returned sorted from newest to oldest.
datasets¶ Return mapping from path to dataset.
Removes tags from a dataset.
static remove_file(filepath)[source]¶ Remove a file/symlink and its pointer file (for external files).
renku_datasets_path¶ Return a Path instance of Renku dataset metadata folder.
renku_pointers_path¶ Return a Path instance of Renku pointer files folder.
Repository¶
Client for handling a local repository.
-
class
renku.core.management.repository.
PathMixin
(path=<function default_path>)[source]¶ Define a default path attribute.
Method generated by attrs for class PathMixin.
-
class
renku.core.management.repository.
RepositoryApiMixin
(renku_home='.renku', parent=None, commit_activity_cache=NOTHING, activity_index=None, remote_cache=NOTHING, *, data_dir='data')[source]¶ Client for handling a local repository.
Method generated by attrs for class RepositoryApiMixin.
-
ACTIVITY_INDEX
= 'activity_index.yaml'¶ Caches activities that generated a path.
-
DOCKERFILE
= 'Dockerfile'¶ Name of the Dockerfile in the repo.
-
LOCK_SUFFIX
= '.lock'¶ Default suffix for Renku lock file.
-
METADATA
= 'metadata.yml'¶ Default name of Renku config file.
-
WORKFLOW
= 'workflow'¶ Directory for storing workflow in Renku.
- activities_for_paths(paths, file_commit=None, revision='HEAD')[source]¶ Get all activities involving a path.
- activity_index_path¶ Path to the activity filepath cache.
- check_immutable_template_files(*paths)[source]¶ Check paths and return a list of those that are marked immutable in the project template.
- data_dir = None¶ Define a name of the folder for storing datasets.
- docker_path¶ Path to the Dockerfile.
- find_previous_commit(paths, revision='HEAD', return_first=False, full=False)[source]¶ Return a previous commit for a given path, starting from revision.
Parameters:
- revision – revision to start from, defaults to HEAD
- return_first – show the first commit in the history
- full – return full history
Raises: KeyError – if path is not present in the given commit
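A minimal usage sketch, reusing the client constructed in the LocalClient example above; the path is illustrative and the return type (a GitPython commit) is an assumption:

# Find the most recent commit that touched a given file.
commit = client.find_previous_commit("data/my-dataset/README.rst")
print(commit.hexsha)  # assumption: renku returns a GitPython Commit object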
- import_from_template(template_path, metadata, force=False)[source]¶ Render template files from a template directory.
- latest_agent¶ Returns latest agent version used in the repository.
- lock¶ Create a Renku config lock.
- parent = None¶ Store a pointer to the parent repository.
- path_activity_cache¶ Cache of all activities and their generated paths.
- process_commit(commit=None, path=None)[source]¶ Build an Activity.
Parameters:
- commit – Commit to process (default: HEAD)
- path – Process a specific CWL file.
- project¶ Return the Project instance.
- remote¶ Return host, owner and name of the remote if it exists.
- renku_home = None¶ Define a name of the Renku folder (default: .renku).
- renku_metadata_path¶ Return a Path instance of the Renku metadata file.
- renku_path = None¶ Store a Path instance of the Renku folder.
- template_checksums¶ Return a Path instance to the template checksums file.
- workflow_path¶ Return a Path instance of the workflow folder.
Git Internals¶
Wrap Git client.
- class renku.core.management.git.GitCore[source]¶ Wrap Git client.
Method generated by attrs for class GitCore.
- candidate_paths¶ Return all paths in the index and untracked files.
- commit(commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None)[source]¶ Automatic commit.
- dirty_paths¶ Get paths of dirty files in the repository.
- modified_paths¶ Return paths of modified files.
- repo = None¶ Store an instance of the Git repository.
- renku.core.management.git.get_mapped_std_streams(lookup_paths, streams=('stdin', 'stdout', 'stderr'))[source]¶ Get a mapping of standard streams to given paths.
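A small sketch of the module-level helper documented above; the lookup paths are illustrative and the exact shape of the returned mapping is an assumption:

from renku.core.management.git import get_mapped_std_streams

# Determine which of the given paths the current process's standard
# streams point to, e.g. after a redirected invocation such as
# `renku run wc < input.txt > output.txt`.
mapping = get_mapped_std_streams({"input.txt", "output.txt"})
# e.g. {"stdin": "input.txt", "stdout": "output.txt"} (assumed shape)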
Git utilities.
- class renku.core.models.git.GitURL(href, pathname=None, protocol='ssh', hostname='localhost', username=None, password=None, port=None, owner=None, name=None, regex=None)[source]¶ Parser for common Git URLs.
Method generated by attrs for class GitURL.
- image¶ Return image name.
- class renku.core.models.git.Range(start, stop)[source]¶ Represent parsed Git revision as an interval.
Method generated by attrs for class Range.
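A short usage sketch for GitURL: the constructor signature above is from the documentation, while the parse classmethod and the attribute access below are assumptions about this renku version and should be verified.

from renku.core.models.git import GitURL

# Assumption: GitURL provides a parse() classmethod that fills in the
# attrs fields (owner, name, hostname, ...) from a raw URL string.
url = GitURL.parse("git@github.com:SwissDataScienceCenter/renku-python.git")
print(url.owner)  # expected: 'SwissDataScienceCenter'
print(url.name)   # expected: 'renku-python'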
Plugin Support¶
Runtime Plugins¶
Runtime plugins are supported using the pluggy library.
Runtime plugins can be created as Python packages that contain the respective entry point definition in their setup.py file, like so:
from setuptools import setup

setup(
    ...
    entry_points={"renku": ["name_of_plugin = myproject.pluginmodule"]},
    ...
)
where myproject.pluginmodule points to a Renku hookimpl e.g.:
from renku.core.plugins import hookimpl

@hookimpl
def plugin_hook_implementation(param1, param2):
    ...
renku run hooks¶
Plugin hooks for renku run customization.
- renku.core.plugins.run.cmdline_tool_annotations(tool)[source]¶ Plugin hook to add an Annotation entry list to a WorkflowTool.
Parameters: tool – A WorkflowTool object to get annotations for.
Returns: A list of renku.core.models.cwl.annotation.Annotation objects.
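A sketch of a plugin implementing this hook: the hookimpl decorator and the hook name follow the documentation above, while the Annotation constructor arguments are assumptions about the model and should be checked against your renku version.

from renku.core.models.cwl.annotation import Annotation
from renku.core.plugins import hookimpl

@hookimpl
def cmdline_tool_annotations(tool):
    """Attach a custom annotation to every executed tool."""
    # Assumption: Annotation accepts id, source and body keyword arguments.
    annotation = Annotation(
        id="_:my-plugin-annotation",
        source="my-plugin",
        body={"@type": "MyAnnotation", "comment": "produced by my-plugin"},
    )
    return [annotation]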
CLI Plugins¶
Command-line interface plugins are supported using the click-plugins library (https://github.com/click-contrib/click-plugins).
As with runtime plugins, command-line plugins can be created as Python packages that contain the respective entry point definition in their setup.py file, like so:
from setuptools import setup

setup(
    ...
    entry_points={"renku.cli_plugins": ["mycmd = myproject.pluginmodule:mycmd"]},
    ...
)
where myproject.pluginmodule:mycmd points to a click command e.g.:
import click

# Note: the import path of pass_local_client may differ between renku versions.
from renku.core.commands.client import pass_local_client

@click.command()
@pass_local_client()
def mycmd(client):
    ...
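Once such a package is installed in the same environment as renku, click-plugins exposes the command as a renku subcommand. The package and command names below are the hypothetical ones from the example above:

$ pip install my-renku-plugin
$ renku mycmd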
Changes¶
0.12.1 (2020-11-16)¶
Bug Fixes¶
- core: re-raise renku handled exception on network failure (#1623) (4856a05)
- dataset: no commit if nothing is edited (#1706) (a68edf6)
- service: correctly determine resource age (#1695) (40153f0)
- service: correctly set project_name slug on project create (#1691) (234e1b3)
- service: set template version and metadata correctly (#1708) (ed98be3)
0.12.0 (2020-11-03)¶
Bug Fixes¶
- core: fix bug where remote_cache caused project ids to leak (#1618) (3ef04fb)
- core: fix graph building for nodes with same subpath (#1625) (7cae9be)
- core: fix importing a dataset referenced from non-existent projects (#1574) (92b8bf8)
- core: fix old dataset migration and activity dataset outputs (#1603) (a5339e2)
- core: fix project migration getting overwritten with old metadata (#1581) (c5a5960)
- core: fix update creating a commit when showing help (#1627) (529e582)
- core: fixes git encoding of paths with unicode characters (#1538) (053dac9)
- core: make Run migration ids unique by relative path instead of absolute (#1573) (cf96310)
- dataset: broken directory hierarchy after renku dataset imports (#1576) (9dcffce)
- dataset: deserialization error (#1675) (420653f)
- dataset: error when adding same file multiple times (#1639) (05bfde7)
- dataset: explicit failure when cannot pull LFS objects (#1590) (3b05816)
- dataset: invalid generated name in migration (#1593) (89b2e43)
- dataset: remove blank nodes (#1602) (478f08c)
- dataset: set isBasedOn for renku datasets (#1617) (3aee6b8)
- dataset: update local files metadata when overwriting (#1582) (59eaf25)
- dataset: various migration issues (#1620) (f24c2e4)
- service: correctly set job timeout (#1677) (25f0eb6)
- service: dataset rm endpoint supports new core API (#1622) (e71916e)
- service: push to protected branches (#1614) (34c7f92)
- service: raise exception on uninitialized projects (#1624) (a2025c3)
Features¶
- cli: add click plugin support (#1604) (47b007f)
- cli: adds consistent behaviour for cli commands (#1523) (20b7248)
- cli: show lfs status of dataset files (#1575) (a1c3e2a)
- cli: verbose output for renku show (#1524) (dae968c)
- core: Adds renku dataset update for Zenodo and Dataverse (#1331) (e38c51f)
- dataset: list dataset description (#1588) (7e13857)
- service: adds template and dockerfile migration to migration endpoint (#1509) (ea01795)
- service: adds version endpoint (#1548) (6193df6)
0.11.5 (2020-10-13)¶
Bug Fixes¶
- core: fix importing a dataset referenced from non-existent projects (#1574) (4bb13ef)
- core: fixes git encoding of paths with unicode characters (#1538) (9790707)
- dataset: fix broken directory hierarchy after renku dataset imports (#1576) (41e3e72)
- dataset: abort importing a dataset when cannot pull LFS objects (#1590) (9877a98)
- dataset: fix invalid dataset name after migration (#1593) (c7ec249)
- dataset: update dataset files metadata when adding and overwriting local files (#1582) (0a23e82)
0.11.2 (2020-09-24)¶
Bug Fixes¶
Features¶
- cli: show existing paths when initializing non-empty dir (#1535) (07c559f)
- core: follow URL redirections for dataset files (#1516) (5a37b3c)
- dataset: flattened JSON-LD metadata (#1518) (458ddb9)
- service: add additional template parameters (#1469) (6372a32)
- service: adds additional fields to datasets listings (#1508) (f8a395f)
- service: adds project details and renku operation on jobs endpoint (#1492) (6b3fafd)
- service: execute read operations via git remote (#1488) (84a0eb3)
- workflow: avoid unnecessary parent runs (#1476) (b908ffd)
0.11.0 (2020-08-14)¶
Bug Fixes¶
- cli: disable version check in githook calls (#1300) (5132db3)
- core: fix paths in migration of workflows (#1371) (8c3d34b)
- core: Fixes SoftwareAgent person context (#1323) (a207a7f)
- core: Only update project metadata if any migrations were executed (#1308) (1056a03)
- service: adds more custom logging and imp. except handling (#1435) (6c3adb5)
- service: fixes handlers for internal loggers (#1433) (a312f7c)
- service: move project_id to query string on migrations check (#1367) (0f89726)
- tests: integration tests (#1351) (3974a39)
Features¶
- cli: Adds renku save command (#1273) (4ddc1c2)
- cli: prompt for missing variables (1e1d408), closes #1126
- cli: Show detailed commands for renku log output (#1345) (19fb819)
- core: Calamus integration (#1281) (bda538f)
- core: configurable data dir (#1347) (e388773)
- core: disabling of inputs/outputs auto-detection (#1406) (3245ca0)
- core: migration check in core (#1320) (4bc52f4)
- core: Move workflow serialisation over to calamus (#1386) (f0fbc49)
- core: save and load workflow as jsonld (#1185) (d403289)
- core: separate models for migrations (#1431) (127d606)
- dataset: source and url for DatasetFile (#1451) (b4fa5db)
- service: added endpoints to execute all migrations on a project (#1322) (aca8cc2)
- service: adds endpoint for explicit migrations check (#1326) (146b1a7)
- service: adds source and destination versions to migrations check (#1372) (ea76b48)
- decode base64 headers (#1407) (9901cc3)
- service: adds endpoints for dataset remove (#1383) (289e4b9)
- service: adds endpoints for unlinking files from a dataset (#1314) (1b78b16)
- service: async migrations execution (#1344) (ff66953)
- service: create new projects from templates (#1287) (552f85c), closes #862
0.10.4 (2020-05-18)¶
Bug Fixes¶
Features¶
- cli: Adds warning messages for LFS, fix output redirection (#1199) (31969f5)
- core: Adds lfs file size limit and lfs ignore file (#1210) (1f3c81c)
- core: Adds renku storage clean command (#1235) (7029400)
- core: git hook to avoid committing large files (#1238) (e8f1a8b)
- core: renku doctor check for lfs migrate info (#1234) (480da06)
- dataset: fail early when external storage not installed (#1239) (e6ea6da)
- core: project clone API support for revision checkout (#1208) (74116e9)
- service: protected branches support (#1222) (8405ce5)
- dataset: doi variations for import (#1216) (0f329dd)
- dataset: keywords in metadata (#1209) (f98a800)
- dataset: no failure when adding ignored files (#1213) (b1e275f)
- service: read template manifest (#1254) (7eac85b)
0.10.3 (2020-04-22)¶
Bug Fixes¶
Features¶
- core: CLI warning when in non-root directory (#1162) (115e462)
- dataset: migrate submodule-based datasets (#1092) (dba20c4)
- dataset: no failure when adding existing files (#1177) (a68dcb7)
- dataset: remove --link flag (#1164) (969d4f8)
- dataset: show file size in ls-files (#1123) (0951930)
- datasets: specify title on dataset creation (#1204) (fb70ac5)
- init: read and display template variables (#1134) (0f86dc5), closes #1126
- service: add remote files to dataset (#1139) (f6bebfe)
0.10.0 (2020-03-25)¶
This release brings about several important Dataset features:
- importing renku datasets (#838)
- working with data external to the repository (#974)
- editing dataset metadata (#1111)
Please see the Dataset documentation for details.
Additional features were implemented for the backend service to facilitate a smoother user experience for dataset file manipulation.
IMPORTANT: starting with this version, a new metadata migration mechanism is in place (#1003). Renku commands will insist on migrating a project immediately if the metadata is found to be outdated.
Bug Fixes¶
- cli: consistently show correct contexts (#1096) (b333f0f)
- dataset: --no-external-storage flag not working (#1130) (c183e97)
- dataset: commit only updated dataset files (#1116) (d9739df)
- datasets: fixed importing large amount of small files (#1119) (8d61473)
- datasets: raises correct error message on import of protected dataset (#1112) (e579904)
Features¶
- core: new migration mechanism (#1003) (1cc33d4)
- dataset: adding external data without copying (#974) (6a17512)
- dataset: bypass import confirmation (#1124) (947210a)
- dataset: import renku datasets (#838) (6aa3651)
- dataset: metadata edit (#1111) (66cfbbc)
- dataset: wildcard support when adding data from git (#1128) (baa1c9f)
0.9.1 (2020-02-24)¶
Bug Fixes¶
- added test utility functions and cleanup (#1014) (f41100d)
- cache instance cleanup (#1051) (12f5446)
- enable dataset cmd in sub directories (#1012) (e3191e1)
- fields with default need to come last (#1046) (649b159)
- fixes renku show sibling handling with no paths (#1026) (8df678f)
- flush old keys for user projects and files (#1002) (7438c73)
- generate https IDs for entities instead of file:// (#1009) (87f7750)
- handle errors correctly (#1040) (950eeac)
- improved list datasets and files (#1034) (fd96d68)
- pin idna to 2.8 (#1020) (19ea7af)
- resync repo after import action (#1052) (b38341b)
- standardize test assertions (#1016) (16e8e63)
- temporarily disable integration tests (#1036) (8c8fd7a)
- updated readme to include local testing (#1000) (351a650)
- run tests via pipenv run commands (#999) (d8095e3)
0.9.0 (2020-02-07)¶
Bug Fixes¶
- adds git user check before running renku init (#892) (2e52dff)
- adds sorting to file listing (#960) (bcf6bcd)
- avoid empty commits when adding files (#842) (8533a7a)
- Fixes dataset naming (#898) (418deb3)
- Deletes temporary branch after renku init --force (#887) (eac0463)
- enforces label on SoftwareAgent (#869) (71badda)
- Fixes JSON-LD translation and related issues (#846) (65e5469)
- Fixes renku run error message handling (#961) (81d31ff)
- Fixes renku update workflow failure handling and renku status error handling (#888) (3879124)
- Fixes sameAs property to follow schema.org spec (#944) (291380e)
- handle missing renku directory (#989) (f938be9)
- resolves symlinks when pulling LFS (#981) (68bd8f5)
- serializes all zenodo metadata (#941) (787978a)
- Fixes various bugs in dataset import (#882) (be28bf5)
Features¶
- add project initialization from template (#809) (4405744)
- added renku service with cache and datasets (#788) (7a7068d), closes #767 #846
- Adds protection for renku relevant paths in dataset add (#939) (a3c02e8)
- Adds prov:Invalidated output to renku log (008ab20)
- better UX when adding to a dataset (#911) (c6ac967)
- check for required git hooks (#854) (54ba91d)
- Dataverse export (#909) (7e9e647)
- improve dataset command output (#927) (c7639d3)
- metadata on dataset creation (#850) (b357ee7)
- Plugin support for renku-run (#883) (7dbda83)
- python 3.8 compatibility (#861) (4aaac8d)
- SHACL Validation (#767) (255a01d)
- update bug_report template to be more renku-relevant (#988) (e00ded7)
0.8.0 (2019-11-21)¶
Bug Fixes¶
- addressed CI problems with git submodules (#783) (0d3eeb7)
- adds simple check on empty filename (#786) (8cd061b)
- ensure all Person instances have valid ids (4f80efc), closes #812
- Fixes jsonld issue when importing from dataverse (#759) (ffe36c6)
- fixes nested type scoped handling if a class only has a single class (#804) (16d03b6)
- ignore deleted paths in generated entities (86fedaf), closes #806
- integration tests (#831) (a4ad7f9)
- make Creator a subclass of Person (ac9bac3), closes #793
- Redesign scoped context in jsonld (#750) (2b1948d)
0.6.1 (2019-10-10)¶
Bug Fixes¶
- add .renku/tmp to default .gitignore (#728) (6212148)
- dataset import causes renku exception due to duplicate LocalClient (#724) (89411b0)
- delete new dataset ref if file add fails (#729) (2dea711)
- fixes bug with deleted files not getting committed (#741) (5de4b6f)
- force current project for entities (#707) (538ef07)
- integration tests for #681 (#747) (b08435d)
- use commit author for project creator (#715) (1a40ebe), closes #713
- zenodo dataset import error (f1d623a)
0.6.0 (2019-09-18)¶
Bug Fixes¶
- adds _label and commit data to imported dataset files, single commit for imports (#651) (75ce369)
- always add commit to dataset if possible (#648) (7659bc8), closes #646
- cleanup needed for integration tests on py35 (#653) (fdd7215)
- fixed serialization of datetime to iso format (#629) (693d59d)
- fixes broken integration test (#649) (04eba66)
- hide image, pull, runner, show, workon and deactivate commands (#672) (a3e9998)
- integration tests fixed (#685) (f0ea8f0)
- migration of old datasets (#639) (4d4d7d2)
- migration timezones (#683) (58c2de4)
- Removes unnecessary call to git lfs with no paths (#658) (e32d48b)
- renku home directory overwrite in tests (#657) (90e1c48)
- upload metadata before actual files (#652) (95ed468)
- use latest_html for version check (#647) (c6b0309), closes #641
- user-related metadata (#655) (44183e6)
- zenodo export failing with relative paths (d40967c)
0.5.1 (2019-07-12)¶
Bug Fixes¶
- ensure external storage is handled correctly (#592) (7938ac4)
- only check local repo for lfs filter (#575) (a64dc79)
- cli: allow renku run with many inputs (f60783e), closes #552
- added check for overwriting datasets (#541) (8c697fb)
- escape whitespaces in notebook name (#584) (0542fcc)
- modify json-ld for datasets (#534) (ab6a719), closes #525 #526
- refactored tests and docs to align with updated pydocstyle (#586) (6f981c8)
- cli: add check of missing references (9a373da)
- cli: fail when removing non existing dataset (dd728db)
- status: fix renku status output when not in root folder (#564) (873270d), closes #551
- added dependencies for SSL support (#565) (4fa0fed)
- datasets: strip query string from data filenames (450898b)
- fixed serialization of creators (#550) (6a9173c)
- updated docs (#539) (ff9a67c)
- cli: remove dataset aliases (6206e62)
- cwl: detect script as input parameter (e23b75a), closes #495
- deps: updated dependencies (691644d)
Features¶
- add dataset metadata to the KG (#558) (fb443d7)
- datasets: export dataset to zenodo (#529) (fc6fd4f)
- added support for working on dirty repo (ae67be7)
- datasets: edit dataset metadata (#549) (db39083)
- integrate metadata from zenodo (#545) (4273d2a)
- config: added global config manager (#533) (938f820)
- datasets: import data from zenodo (#509) (52b2769)
0.5.0 (2019-03-28)¶
Bug Fixes¶
Features¶
- api: list datasets from a commit (04a9fe9)
- cli: add dataset rm command (a70c7ce)
- cli: add rm command (cf0f502)
- cli: configurable format of dataset output (d37abf3)
- dataset: add existing file from current repo (575686b), closes #99
- datasets: added ls-files command (ccc4f59)
- models: reference context for relative paths (5d1e8e7), closes #452
- add JSON-LD output format for datasets (c755d7b), closes #426
- generate Makefile with log --format Makefile (1e440ce)
v0.4.0¶
(released 2019-03-05)
- Adds renku mv command which updates dataset metadata, .gitattributes and symlinks.
- Pulls LFS objects from submodules correctly.
- Adds listing of datasets.
- Adds reduced dot format for renku log.
- Adds doctor command to check missing files in datasets.
- Moves dataset metadata to .renku/datasets, adds migrate datasets command and uses UUID for metadata path.
- Gets git attrs for files to prevent duplicates in .gitattributes.
- Fixes renku show outputs for directories.
- Runs Git LFS checkout in a worktree and lazily pulls necessary LFS files before running commands.
- Asks user before overriding an existing file using renku init or renku runner template.
- Fixes renku init --force in an empty dir.
- Renames CommitMixin._location to _project.
- Addresses issue with commits editing multiple CWL files.
- Exports merge commits for full lineage.
- Exports path and parent directories.
- Adds an automatic check for the latest version.
- Simplifies issue submission from traceback to GitHub or Sentry. Requires the SENTRY_DSN variable to be set and the sentry-sdk package to be installed before sending any data.
- Removes outputs before run.
- Allows update of directories.
- Improves readability of the status message.
- Checks ignored path when added to a dataset.
- Adds API method for finding ignored paths.
- Uses branches for init --force.
- Fixes CVE-2017-18342.
- Fixes regex for parsing Git remote URLs.
- Handles --isolation option using git worktree.
- Renames client.git to client.repo.
- Supports python -m renku.
- Allows '.' and '-' in repo path.
v0.3.3¶
(released 2018-12-07)
- Fixes generated Homebrew formula.
- Renames renku pull path to renku storage pull with deprecation warning.
v0.3.0¶
(released 2018-11-26)
- Adds JSON-LD context to objects extracted from the Git repository (see renku show context --list).
- Uses PROV-O and WFPROV as provenance vocabularies and generates "stable" object identifiers (@id) for RDF and JSON-LD output formats.
- Refactors the log output to allow linking files and directories.
- Adds support for aliasing tools and workflows.
- Adds option to install shell completion (renku --install-completion).
- Fixes initialization of Git submodules.
- Uses relative submodule paths when appropriate.
- Simplifies external storage configuration.
License¶
Copyright 2017-2020 - Swiss Data Science Center (SDSC)
A partnership between École Polytechnique Fédérale de Lausanne (EPFL) and
Eidgenössische Technische Hochschule Zürich (ETHZ).
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Authors¶
Python SDK and CLI for the Renku platform.
- Swiss Data Science Center <contact@datascience.ch>