Renku Command Line

The base command for interacting with the Renku platform.

renku (base command)

To list the available commands, either run renku with no parameters or execute renku help:

$ renku help
Usage: renku [OPTIONS] COMMAND [ARGS]...

Check common Renku commands used in various situations.


Options:
  --version                       Print version number.
  --global-config-path            Print global application's config path.
  --install-completion            Install completion for the current shell.
  --path <path>                   Location of a Renku repository.
                                  [default: (dynamic)]
  --renku-home <path>             Location of the Renku directory.
                                  [default: .renku]
  --external-storage / -S, --no-external-storage
                                  Use an external file storage service.
  -h, --help                      Show this message and exit.

Commands:
  # [...]

Configuration files

The location of the configuration files used by the Renku command line depends on your operating system. By default, the following locations are used:

MacOS:
~/Library/Application Support/Renku
Unix:
~/.config/renku
Windows:
C:\Users\<user>\AppData\Roaming\Renku

If in doubt where to look for the configuration file, you can display its path by running renku --global-config-path.
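
The default locations above can be summarized as a small lookup. The following Python sketch is illustrative only: `default_config_dir` is a hypothetical helper, not part of Renku, and it simply encodes the three rules listed above. Running renku --global-config-path remains the authoritative way to find the path.

```python
import ntpath
import posixpath


def default_config_dir(platform: str, home: str) -> str:
    """Return the default Renku config directory for a platform name.

    `platform` uses the values of `sys.platform` ("darwin", "win32",
    "linux", ...). Hypothetical helper that encodes the rules above.
    """
    if platform == "darwin":
        return posixpath.join(home, "Library", "Application Support", "Renku")
    if platform == "win32":
        return ntpath.join(home, "AppData", "Roaming", "Renku")
    # Every other platform is treated as Unix-like in this sketch.
    return posixpath.join(home, ".config", "renku")


print(default_config_dir("linux", "/home/sam"))
```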

renku init

Create an empty Renku project or reinitialize an existing one.

Starting a Renku project

If you have an existing directory which you want to turn into a Renku project, you can type:

$ cd ~/my_project
$ renku init

or:

$ renku init ~/my_project

This creates a new subdirectory named .renku that contains all the necessary files for managing the project configuration.

If the provided directory does not exist, it will be created.

Updating an existing project

There are situations when the required structure of a Renku project needs to be recreated, or when you want to initialize a project in an existing Git repository. You can handle these situations by adding the --force option.

$ git init .
$ printf '# Example\nThis is a README.\n' > README.md
$ git add README.md
$ git commit -m 'Example readme file'
# renku init would fail because there is a git repository
$ renku init --force

You can also enable the external storage system for output files, if it was not installed previously.

$ renku init --force --external-storage

renku config

Get and set Renku repository or global options.

Set values

You can set various Renku configuration options, for example the image registry URL, with a command like:

$ renku config registry https://registry.gitlab.com/demo/demo

By default, configuration is stored locally in the project’s directory. Use the --global option to store configuration for all projects in your home directory.

Remove values

To remove a specific key from configuration use:

$ renku config --remove registry

By default, only the local configuration is searched for removal. Use the --global option to remove a global configuration value.

Query values

You can display all configuration values with:

$ renku config

Both local and global configuration files are read. Values in the local configuration take precedence over global values. Use the --local or --global flag to read only the corresponding configuration.

You can provide a KEY to display only its value:

$ renku config registry
https://registry.gitlab.com/demo/demo
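
The precedence rule described above (local values override global ones) can be pictured as a dictionary merge. This is a minimal Python sketch, not Renku's implementation: `effective_config` is a hypothetical helper, and the global registry URL and the `example_option` key are made-up values for illustration.

```python
def effective_config(global_values: dict, local_values: dict) -> dict:
    """Merge two configuration mappings; local values win on conflicts."""
    merged = dict(global_values)
    merged.update(local_values)
    return merged


# Hypothetical values, for illustration only.
global_values = {
    "registry": "https://registry.example.com/global",
    "example_option": "from-global",
}
local_values = {"registry": "https://registry.gitlab.com/demo/demo"}

config = effective_config(global_values, local_values)
print(config["registry"])
```

Keys that exist only in the global mapping are still visible in the merged result, which mirrors reading both files without --local or --global.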

renku dataset

Renku CLI commands for handling of datasets.

Manipulating datasets

Creating an empty dataset inside a Renku project:

$ renku dataset create my-dataset
Creating a dataset ... OK

Listing all datasets:

$ renku dataset
ID        SHORT_NAME     TITLE          VERSION
--------  -------------  -------------  ---------
0ad1cb9a  some-dataset   Some Dataset
9436e36c  my-dataset     My Dataset

You can select which columns to display by using --columns to pass a comma-separated list of column names:

$ renku dataset --columns id,short_name,created,creators
ID        SHORT_NAME     CREATED              CREATORS
--------  -------------  -------------------  ---------
0ad1cb9a  some-dataset   2020-03-19 16:39:46  sam
9436e36c  my-dataset     2020-02-28 16:48:09  sam

Displayed results are sorted based on the value of the first column.

Deleting a dataset:

$ renku dataset rm some-dataset
OK

Working with data

Adding data to the dataset:

$ renku dataset add my-dataset http://data-url

This will copy the contents of data-url to the dataset and add it to the dataset metadata.

You can create a dataset when you add data to it for the first time by passing the --create flag to the add command:

$ renku dataset add --create new-dataset http://data-url

To add data from a Git repository, you can specify it via the https or git+ssh URL schemes. For example:

$ renku dataset add my-dataset git+ssh://host.io/namespace/project.git

Sometimes you want to import just specific paths within the parent project. In this case, use the --source or -s flag:

$ renku dataset add my-dataset --source path/within/repo/to/datafile \
    git+ssh://host.io/namespace/project.git

The command above will result in a structure like

data/
  my-dataset/
    datafile

You can use --destination or -d flag to change the name of the target file or directory. The semantics here are similar to the POSIX copy command: if the destination does not exist or if it is a file then the source will be renamed; if the destination exists and is a directory the source will be copied to it. You will get an error message if you try to move a directory to a file or copy multiple files into one.

$ renku dataset add my-dataset \
    --source path/within/repo/to/datafile \
    --destination new-dir/new-filename \
    git+ssh://host.io/namespace/project.git

will yield:

data/
  my-dataset/
    new-dir/
      new-filename
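
The destination rules described above can be sketched as a small path-resolution function. This Python model is an illustration under stated assumptions, not Renku's code: `resolve_target` and its `dest_kind` parameter are hypothetical, and paths are taken relative to the dataset directory (data/my-dataset in the examples).

```python
from pathlib import PurePosixPath


def resolve_target(source, destination, dest_kind, source_is_dir=False):
    """Return where `source` ends up inside the dataset directory.

    `dest_kind` is one of "missing", "file" or "dir" and describes the
    current state of `destination` (hypothetical parameter).
    """
    source = PurePosixPath(source)
    destination = PurePosixPath(destination)
    if dest_kind == "dir":
        # Existing directory: the source is copied into it, keeping its name.
        return destination / source.name
    if dest_kind == "file" and source_is_dir:
        # A directory cannot be moved onto a file.
        raise ValueError("cannot move a directory to a file")
    # Missing destination or existing file: the source is renamed.
    return destination


print(resolve_target("path/within/repo/to/datafile", "new-dir/new-filename", "missing"))
```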

To add a specific version of files, use --ref option for selecting a branch, commit, or tag. The value passed to this option must be a valid reference in the remote Git repository.

Updating a dataset:

After adding files from a remote Git repository, you can check for updates in those files by using the renku dataset update command. This command checks all remote files and copies over new content if there is any. It does not delete files from the local dataset if they are deleted from the remote Git repository; to force deletion, use the --delete argument. You can update to a specific branch, commit, or tag by passing the --ref option.

You can limit the scope of updated files by specifying dataset names, using --include and --exclude to filter based on file names, or using --creators to filter based on creators. For example, the following command updates only CSV files from my-dataset:

$ renku dataset update -I '*.csv' my-dataset

Note that putting glob patterns in quotes is needed to tell Unix shell not to expand them.

Tagging a dataset:

A dataset can be tagged with an arbitrary tag to refer to the dataset at that point in time. A tag can be added like this:

$ renku dataset tag my-dataset 1.0 -d "Version 1.0 tag"

A list of all tags can be seen by running:

$ renku dataset ls-tags my-dataset
CREATED              NAME    DESCRIPTION      DATASET     COMMIT
-------------------  ------  ---------------  ----------  ----------------
2020-09-19 17:29:13  1.0     Version 1.0 tag  my-dataset  6c19a8d31545b...

A tag can be removed with:

$ renku dataset rm-tags my-dataset 1.0

Importing data from an external provider:

$ renku dataset import 10.5281/zenodo.3352150

This will import the dataset with the DOI (Digital Object Identifier) 10.5281/zenodo.3352150 and make it locally available. Dataverse and Zenodo are supported, with DOIs (e.g. 10.5281/zenodo.3352150 or doi:10.5281/zenodo.3352150) and full URLs (e.g. http://zenodo.org/record/3352150). A tag with the remote version of the dataset is automatically created.

Exporting data to an external provider:

$ renku dataset export my-dataset zenodo

This will export the dataset my-dataset to zenodo.org as a draft, allowing for publication later on. If the dataset has any tags set, you can choose whether the repository HEAD version or one of the tags should be exported. The remote version will be set to the local tag that is being exported.

To export to a Dataverse provider, you must pass the Dataverse server’s URL and the name of the parent dataverse where the dataset will be exported to. The server’s URL is stored in your Renku settings, so you don’t need to pass it every time.

Listing all files in the project associated with a dataset:

$ renku dataset ls-files
DATASET SHORT_NAME   ADDED                PATH
-------------------  -------------------  -----------------------------
my-dataset           2020-02-28 16:48:09  data/my-dataset/addme
my-dataset           2020-02-28 16:49:02  data/my-dataset/weather/file1
my-dataset           2020-02-28 16:49:02  data/my-dataset/weather/file2
my-dataset           2020-02-28 16:49:02  data/my-dataset/weather/file3

You can select which columns to display by using --columns to pass a comma-separated list of column names:

$ renku dataset ls-files --columns short_name,creators,path
DATASET SHORT_NAME   CREATORS   PATH
-------------------  ---------  -----------------------------
my-dataset           sam        data/my-dataset/addme
my-dataset           sam        data/my-dataset/weather/file1
my-dataset           sam        data/my-dataset/weather/file2
my-dataset           sam        data/my-dataset/weather/file3

Displayed results are sorted based on the value of the first column.

Sometimes you want to filter the files. For this we use --dataset, --include and --exclude flags:

$ renku dataset ls-files --include "file*" --exclude "file3"
DATASET SHORT_NAME  ADDED                PATH
------------------- -------------------  -----------------------------
my-dataset          2020-02-28 16:49:02  data/my-dataset/weather/file1
my-dataset          2020-02-28 16:49:02  data/my-dataset/weather/file2
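
The effect of --include and --exclude in the example above can be reproduced with a few lines of Python. This is a sketch under an assumption: patterns are matched from the right-hand end of each path, the way `PurePosixPath.match` from the standard library does. The text above does not say how Renku matches patterns internally, and `filter_paths` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def filter_paths(paths, include=(), exclude=()):
    """Keep paths matching any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        pure = PurePosixPath(path)
        # With no include patterns, every path is a candidate.
        if include and not any(pure.match(pattern) for pattern in include):
            continue
        if any(pure.match(pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected


files = [
    "data/my-dataset/addme",
    "data/my-dataset/weather/file1",
    "data/my-dataset/weather/file2",
    "data/my-dataset/weather/file3",
]
print(filter_paths(files, include=["file*"], exclude=["file3"]))
```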

Unlink a file from a dataset:

$ renku dataset unlink my-dataset --include file1
OK

Unlink all files within a directory from a dataset:

$ renku dataset unlink my-dataset --include "weather/*"
OK

Unlink all files from a dataset:

$ renku dataset unlink my-dataset
Warning: You are about to remove following from "my-dataset" dataset.
.../my-dataset/weather/file1
.../my-dataset/weather/file2
.../my-dataset/weather/file3
Do you wish to continue? [y/N]:

Note

The unlink command does not delete files, only the dataset record.

renku run

Track provenance of data created by executing programs.

Capture command line execution

Tracking execution of your command line script is done by simply adding the renku run command before the actual command. This will enable detection of:

  • arguments (flags),
  • string and integer options,
  • input files or directories if linked to existing paths in the repository,
  • output files or directories if modified or created while running the command.

Note

If there are uncommitted changes in the repository, the renku run command fails. See git status for details.

Warning

Input and output paths can only be detected if they are passed as arguments to renku run.

Warning

Circular dependencies are not supported for renku run. See Circular Dependencies for more details.

Detecting input paths

Any path passed as an argument to renku run, which was not changed during the execution, is identified as an input path. The identification only works if the path associated with the argument matches an existing file or directory in the repository.

The detection might not work as expected if:

  • a file is modified during the execution. In this case it will be stored as an output;
  • a path is not passed as an argument to renku run.
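
The detection rules above can be modelled as a simple classification. The following Python sketch is not how Renku is implemented; `classify_paths` and the file names are hypothetical, and the sets of existing and changed paths are supplied by hand rather than read from Git.

```python
def classify_paths(arguments, existing_before, changed_during):
    """Split paths into detected inputs and outputs, following the rules above."""
    existing = set(existing_before)
    changed = set(changed_during)
    # An argument is an input only if it names an existing path that the
    # command left untouched.
    inputs = [arg for arg in arguments if arg in existing and arg not in changed]
    # Anything modified or created during the run counts as an output.
    outputs = sorted(changed)
    return inputs, outputs


# Hypothetical run: the command reads data.csv and creates report.txt.
print(classify_paths(["--verbose", "data.csv", "report.txt"], ["data.csv"], ["report.txt"]))
```

In this model a file that exists in the repository but is not passed as an argument never shows up as an input, which is the case the --input option is meant to cover.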

Specifying auxiliary inputs (--input)

You can specify extra inputs to your program explicitly by using the --input option. This is useful for specifying hidden dependencies that don’t appear on the command line. These input files must exist before the renku run command is executed. This option is not a replacement for the arguments that are passed on the command line. Files or directories specified with this option will not be passed as input arguments to the script.

Detecting output paths

Any path modified or created during the execution will be added as an output.

Because the output path detection is based on the Git repository state after the execution of renku run command, it is good to have a basic understanding of the underlying principles and limitations of tracking files in Git.

Git tracks not only the paths in a repository, but also the content stored in those paths. Therefore:

  • a recreated file with the same content is not considered an output file, but instead is kept as an input;
  • file moves are detected based on their content and can cause problems;
  • directories cannot be empty.

Note

When in doubt whether the outputs will be detected, remove all outputs using git rm <path> followed by git commit before running the renku run command.

Command does not produce any files (--no-output)

If the program does not produce any outputs, the execution ends with an error:

Error: There are not any detected outputs in the repository.

You can specify the --no-output option to force tracking of such an execution.

Specifying outputs explicitly (--output)

You can specify expected outputs of your program explicitly by using the --output option. These outputs must exist after the execution of the renku run command. However, they do not need to be modified by the command.

Detecting standard streams

Often the program expects inputs as a standard input stream. This is detected and recorded in the tool specification when invoked by renku run cat < A.

Similarly, both redirects to standard output and standard error output can be done when invoking a command:

$ renku run grep "test" B > C 2> D

Warning

Detecting inputs and outputs from pipes | is not supported.

Specifying inputs and outputs programmatically

Sometimes the list of inputs and outputs is not known before execution of the program. For example, a program might accept a date range as input and access all files within that range during its execution.

To address this issue, the program can dump a list of input and output files that it is accessing in inputs.txt and outputs.txt. Each line in these files is expected to be the path to an input or output file within the project’s directory. When the program is finished, Renku will look for these two files and add their content to the list of explicit inputs and outputs. Renku will then delete these two files.

By default, Renku looks for these two files in .renku/tmp directory. One can change this default location by setting RENKU_FILELIST_PATH environment variable. When set, it points to the directory within the project’s directory where inputs.txt and outputs.txt reside.
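
A program run under renku run could record its file lists along these lines. This is a minimal Python sketch under two assumptions that the text above does not spell out: the value of RENKU_FILELIST_PATH is resolved relative to the project directory, and each path is written on its own line. `write_file_lists` and the example file names are hypothetical.

```python
import os
from pathlib import Path


def write_file_lists(project_root, inputs, outputs, env=None):
    """Write inputs.txt and outputs.txt where Renku is described to look."""
    env = os.environ if env is None else env
    # Fall back to the documented default directory when the variable is unset.
    list_dir = Path(project_root) / env.get("RENKU_FILELIST_PATH", ".renku/tmp")
    list_dir.mkdir(parents=True, exist_ok=True)
    # One path per line, relative to the project directory.
    (list_dir / "inputs.txt").write_text("".join(f"{p}\n" for p in inputs))
    (list_dir / "outputs.txt").write_text("".join(f"{p}\n" for p in outputs))
    return list_dir
```

A script would call `write_file_lists(".", accessed_inputs, written_outputs)` just before it exits, with the two lists collected while it was running.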

Exit codes

All Unix commands return a number between 0 and 255, which is called the “exit code”. If another number is returned, it is treated modulo 256 (-10 is equivalent to 246, 257 is equivalent to 1). Exit code 0 represents success and a non-zero exit code indicates failure.

Therefore the command specified after renku run is expected to return exit code 0. If the command returns a different exit code on success, you can specify it with the --success-code=<INT> parameter.

$ renku run --success-code=1 --no-output fail
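
The modulo rule and the effect of --success-code can be checked with a short Python sketch. The helper names `normalize_exit_code` and `is_success` are hypothetical and only model the behaviour described above.

```python
def normalize_exit_code(value: int) -> int:
    """Map any integer to the 0..255 range used for exit codes."""
    return value % 256


def is_success(exit_code: int, success_codes=(0,)) -> bool:
    """Return True if the normalized exit code is an accepted success code."""
    return normalize_exit_code(exit_code) in success_codes


print(normalize_exit_code(-10), normalize_exit_code(257))
print(is_success(1), is_success(1, success_codes=(1,)))
```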

Circular Dependencies

Circular dependencies are not supported in renku run. This means you cannot use the same file or directory as both an input and an output in the same step, for instance reading from a file as input and then appending to it is not allowed. Since renku records all steps of an analysis workflow in a dependency graph and it allows you to update outputs when an input changes, this would lead to problems with circular dependencies. An update command would change the input again, leading to renku seeing it as a changed input, which would run update again, and so on, without ever stopping.

Due to this, the renku dependency graph has to be acyclic. So instead of appending to an input file or writing an output file to the same directory that was used as an input directory, create new files or write to other directories, respectively.
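
A quick way to reason about this restriction is to check whether any input path overlaps an output path. The Python sketch below is a conservative model, not Renku's actual check: `find_circular_paths` is a hypothetical helper that flags identical paths as well as paths nested inside one another.

```python
from pathlib import PurePosixPath


def overlaps(first: str, second: str) -> bool:
    """Return True if the paths are equal or one contains the other."""
    a, b = PurePosixPath(first), PurePosixPath(second)
    return a == b or a in b.parents or b in a.parents


def find_circular_paths(inputs, outputs):
    """List (input, output) pairs that would make a step depend on itself."""
    return sorted((i, o) for i in inputs for o in outputs if overlaps(i, o))


# Writing into a directory that is also an input is flagged.
print(find_circular_paths(["data"], ["data/result.csv"]))
```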

renku log

Show provenance of data created by executing programs.

File provenance

Unlike the traditional file history format, which shows previous revisions of the file, this format presents tool inputs together with their revision identifiers.

A * character shows which lineage the specific file belongs to. A @ character in the graph lineage means that the corresponding file does not have any inputs and the history starts there.

When called without file names, renku log shows the history of most recently created files. With the --revision <refname> option the output is shown as it was in the specified revision.

Provenance examples

renku log B
Show the history of file B since its last creation or modification.
renku log --revision HEAD~5
Show the history of files that were created or modified 5 commits ago.
renku log --revision e3f0bd5a D E
Show the history of files D and E as it looked in the commit e3f0bd5a.

Output formats

The following formats are supported when specified with the --format option:

  • ascii
  • dot
  • dot-full
  • dot-landscape
  • dot-full-landscape
  • dot-debug
  • json-ld
  • json-ld-graph
  • Makefile
  • nt
  • rdf

You can generate a PNG of the full history of all files in the repository using the dot program.

$ FILES=$(git ls-files --no-empty-directory --recurse-submodules)
$ renku log --format dot $FILES | dot -Tpng > /tmp/graph.png
$ open /tmp/graph.png

Output validation

The --strict option forces the output to be validated against the Renku SHACL schema, causing the command to fail if the generated output is not valid, as well as printing detailed information on all the issues found. The --strict option is only supported for the json-ld, rdf and nt output formats.

renku status

Show status of data files created in the repository.

Inspecting a repository

Displays paths of outputs which were generated from newer input files and paths of files that have been used in different versions.

The paths in the first group are the ones that need to be recreated by running renku update. See the section about renku update for more details.

The paths mentioned in the output are made relative to the current directory if you are working in a subdirectory (this is on purpose, to help with cutting and pasting to other commands). They also contain the first 8 characters of the corresponding commit identifier after the # (hash). If the file was imported from another repository, the short name of that repository is shown together with the filename before the @.

renku update

Update outdated files created by the “run” command.

Recreating outdated files

The information about dependencies for each file in the repository is generated from information stored in the underlying Git repository.

A minimal dependency graph is generated for each outdated file stored in the repository. It means that only the necessary steps will be executed and the workflow used to orchestrate these steps is stored in the repository.

Assume that the following history for the file H exists.

      C---D---E
     /         \
A---B---F---G---H

The first example shows the situation when D is modified and files E and H become outdated.

      C--*D*--(E)
     /          \
A---B---F---G---(H)

** - modified
() - needs update

In this situation, you can effectively do one of two things:

  • Recreate a single file by running

    $ renku update E
    
  • Update all files by simply running

    $ renku update
    

Note

If there are uncommitted changes, the command fails. Check git status to see details.

Pre-update checks

In the next example, files A or B are modified, hence the majority of dependent files must be recreated.

        (C)--(D)--(E)
       /            \
*A*--*B*--(F)--(G)--(H)

To avoid excessive recreation of a large portion of files that could be affected by a simple change to an input file, consider specifying a single file (e.g. renku update G). See also renku status.
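
The two diagrams above can be modelled with a small graph traversal. This Python sketch only illustrates the idea of outdated files and of restricting an update to one target; it is not Renku's algorithm, and `DEPENDENTS`, `outdated_files` and `needed_for` are hypothetical names.

```python
from collections import deque

# Example graph from above: each key is used to produce the listed files.
DEPENDENTS = {
    "A": ["B"], "B": ["C", "F"], "C": ["D"], "D": ["E"],
    "E": ["H"], "F": ["G"], "G": ["H"], "H": [],
}


def outdated_files(dependents, modified):
    """Return every file downstream of a modified file."""
    outdated = set()
    queue = deque(modified)
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in outdated:
                outdated.add(child)
                queue.append(child)
    # Modified files are shown as *X* in the diagrams, not as outdated.
    return outdated - set(modified)


def needed_for(dependents, modified, target):
    """Return the outdated files that lie on a path leading to `target`."""
    parents = {}
    for node, children in dependents.items():
        for child in children:
            parents.setdefault(child, set()).add(node)
    upstream, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in upstream:
            upstream.add(node)
            stack.extend(parents.get(node, ()))
    return upstream & outdated_files(dependents, modified)


print(sorted(outdated_files(DEPENDENTS, {"D"})))
print(sorted(needed_for(DEPENDENTS, {"A", "B"}, "G")))
```

In this model, restricting the update to G after A and B change touches only F and G instead of all six outdated files.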

Update siblings

If a tool produces multiple output files, these outputs always need to be updated together.

               (B)
              /
*A*--[step 1]--(C)
              \
               (D)

An attempt to update a single file would fail with the following error.

$ renku update C
Error: There are missing output siblings:

     B
     D

Include the files above in the command or use --with-siblings option.

The following commands will produce the same result.

$ renku update --with-siblings C
$ renku update B C D
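
The sibling check behind the error above can be sketched as follows. This Python model assumes a mapping from each step to the files it produces; `missing_siblings` and the mapping are hypothetical and not part of Renku.

```python
def missing_siblings(step_outputs, requested):
    """Return outputs that must accompany the requested paths."""
    requested = set(requested)
    missing = set()
    for outputs in step_outputs.values():
        outputs = set(outputs)
        # If any output of a step is requested, all of its outputs are needed.
        if outputs & requested:
            missing |= outputs - requested
    return sorted(missing)


# Step 1 from the diagram above produces B, C and D from A.
step_outputs = {"step 1": ["B", "C", "D"]}
print(missing_siblings(step_outputs, ["C"]))
```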

renku rerun

Recreate files created by the “run” command.

Recreating files

Assume you have run a step 2 that uses a stochastic algorithm, so each run will be slightly different. The goal is to regenerate output C several times to compare the output. In this situation it is not possible to simply call renku update since the input file A has not been modified after the execution of step 2.

A-[step 1]-B-[step 2*]-C

Recreate a specific output file by running:

$ renku rerun C

If you would like to recreate a file which was one of several produced by a tool, then the other files produced by that tool must be recreated as well. See the explanation in Update siblings.

renku rm

Remove a file, a directory, or a symlink.

Removing a file that belongs to a dataset will update its metadata. It also will attempt to update tracking information for files stored in an external storage (using Git LFS).

renku mv

Move or rename a file, a directory, or a symlink.

Moving a file that belongs to a dataset will update its metadata. It also will attempt to update tracking information for files stored in an external storage (using Git LFS). Finally it makes sure that all relative symlinks work after the move.

renku workflow

Manage the set of CWL files created by renku commands.

With no arguments, shows a list of captured CWL files. Several subcommands are available to perform operations on CWL files.

Reference tools and workflows

Managing a large number of tools and workflows with automatically generated names may be cumbersome. A name can be added to the last executed run, rerun or update command by running renku workflow set-name <name>. A name can also be added to an arbitrary file in .renku/workflow/*.cwl at any later time.

renku show

Show information about objects in current repository.

Siblings

In situations when multiple outputs have been generated by a single renku run command, the siblings can be discovered by running renku show siblings PATH command.

Assume that the following graph represents relations in the repository.

      D---E---G
     /     \
A---B---C   F

Then the following outputs would be shown.

$ renku show siblings C
C
D
$ renku show siblings G
F
G
$ renku show siblings A
A

Input and output files

You can list input and output files generated in the repository by running renku show inputs and renku show outputs commands. Alternatively, you can check if all paths specified as arguments are input or output files respectively.

$ renku run wc < source.txt > result.wc
$ renku show inputs
source.txt
$ renku show outputs
result.wc
$ renku show outputs source.txt
$ echo $?  # last command finished with an error code
1

renku storage

Manage an external storage.

renku doctor

Check your system and repository for potential problems.

renku migrate

Migrate files and metadata to the latest Renku version.

Datasets

The location of dataset metadata files has been changed from data/<name>/metadata.yml to .renku/datasets/<UUID>/metadata.yml. All file paths inside a metadata file are relative to the file itself, and the renku migrate datasets command takes care of this.

renku githooks

Install and uninstall Git hooks.

Prevent modifications of output files

The commit hooks are enabled by default to prevent situations in which an output file is modified manually.

$ renku init
$ renku run echo hello > greeting.txt
$ edit greeting.txt
$ git commit greeting.txt
You are trying to update some output files.

Modified outputs:
  greeting.txt

If you are sure, use "git commit --no-verify".

Error Tracking

Renku is not bug-free, and you can help us find bugs.

GitHub

You can quickly open an issue on GitHub with a traceback and minimal system information when you hit an unhandled exception in the CLI.

Ahhhhhhhh! You have found a bug. 🐞

1. Open an issue by typing "open";
2. Print human-readable information by typing "print";
3. See the full traceback without submitting details (default: "ignore").

Please select an action by typing its name (open, print, ignore) [ignore]:

Sentry

When using renku as a hosted service the Sentry integration can be enabled to help developers iterate faster by showing them where bugs happen, how often, and who is affected.

  1. Install Sentry-SDK with python -m pip install sentry-sdk;
  2. Set environment variable SENTRY_DSN=https://<key>@sentry.<domain>/<project>.

Warning

User information might be sent to help resolve the problem. If you are not using your own Sentry instance, you should inform users that you are sending possibly sensitive information to a 3rd-party service.