Low-level API

This API is built on top of Git and Git-LFS.

Renku repository management.

class renku.core.management.LocalClient(path=<function default_path>, renku_home='.renku', parent=None, external_storage_requested=True, *, data_dir='data')[source]

A low-level client for communicating with a local Renku repository.

Datasets

Client for handling datasets.

class renku.core.management.datasets.DatasetsApiMixin[source]

Client for handling datasets.

CACHE = 'cache'

Directory to cache transient data.

DATASETS = 'datasets'

Directory for storing dataset metadata in Renku.

POINTERS = 'pointers'

Directory for storing external pointer files.

add_data_to_dataset(dataset, urls, force=False, overwrite=False, sources=(), destination='', ref=None, external=False, extract=False, all_at_once=False, destination_names=None, progress=None)[source]

Import the data into the data directory.

add_dataset_tag(dataset, tag, description='', force=False)[source]

Adds a new tag to a dataset.

Validates that the tag does not already exist and that it follows the same rules as Docker tags. See https://docs.docker.com/engine/reference/commandline/tag/ for documentation of the Docker tag syntax.

Raises:

errors.ParameterError

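The two checks described for add_dataset_tag can be sketched in plain Python. This is a minimal illustration, not Renku's implementation: the helper names `is_valid_tag` and `check_new_tag` are hypothetical, `ValueError` stands in for `errors.ParameterError`, and letting `force=True` bypass the duplicate check is an assumption about what the `force` argument does.

```python
import re

# Docker tag syntax: at most 128 characters drawn from letters, digits,
# underscores, periods and dashes; the first character may not be a
# period or a dash.
DOCKER_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_tag(tag):
    """Return True if ``tag`` follows Docker tag syntax (hypothetical helper)."""
    return DOCKER_TAG_PATTERN.fullmatch(tag) is not None


def check_new_tag(tag, existing_tags, force=False):
    """Raise ValueError for a malformed tag or an unforced duplicate.

    ValueError is a stand-in for errors.ParameterError; the meaning of
    ``force`` is assumed, not taken from the Renku source.
    """
    if not is_valid_tag(tag):
        raise ValueError(f"Tag {tag!r} is not a valid Docker-style tag")
    if tag in existing_tags and not force:
        raise ValueError(f"Tag {tag!r} already exists")
```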
create_dataset(short_name=None, title=None, description=None, creators=None, keywords=None)[source]

Create a dataset.

dataset_commits(dataset, max_results=None)[source]

Gets the newest commits for a dataset or its files.

Commits are returned sorted from newest to oldest.

datasets

Return mapping from path to dataset.

datasets_from_commit(commit=None)[source]

Return datasets defined in a commit.

get_dataset_path(short_name)[source]

Get dataset path from short_name.

get_relative_url(url)[source]

Determine whether the repo URL should be relative.

has_external_files()[source]

Return True if project has external files.

load_dataset(short_name=None)[source]

Load dataset reference file.

load_dataset_from_path(path, commit=None)[source]

Return a dataset from a given path.

prepare_git_repo(url, ref=None)[source]

Clone and cache a Git repo.

remove_dataset_tags(dataset, tags)[source]

Removes tags from a dataset.

remove_file(filepath)[source]

Remove a file/symlink and its pointer file (for external files).

renku_datasets_path

Return a Path instance of Renku dataset metadata folder.

renku_pointers_path

Return a Path instance of Renku pointer files folder.
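The relationship between the class constants and these two path properties can be illustrated with `pathlib`. The layout below (repository path, then the Renku home folder, then the constant) is an assumption inferred from the names and defaults listed in this reference, and `metadata_folders` is a hypothetical helper rather than part of the client.

```python
from pathlib import Path

RENKU_HOME = ".renku"  # default value of the renku_home argument
DATASETS = "datasets"  # value of DatasetsApiMixin.DATASETS
POINTERS = "pointers"  # value of DatasetsApiMixin.POINTERS


def metadata_folders(repo_path):
    """Return (datasets folder, pointers folder) for a repository path.

    Assumed layout: <repo>/<renku_home>/<constant>.
    """
    renku_path = Path(repo_path) / RENKU_HOME
    return renku_path / DATASETS, renku_path / POINTERS


datasets_path, pointers_path = metadata_folders("/tmp/project")
```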

update_dataset_files(files, ref, delete=False)[source]

Update files and dataset metadata according to their remotes.

Parameters:
  • files – List of files to be updated
  • delete – Indicates whether to delete files or not
Returns:

List of files that should be deleted

update_external_files(records)[source]

Update files linked to external storage.

with_dataset(short_name=None, create=False)[source]

Yield an editable metadata object for a dataset.
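The "yield an editable object" pattern is typically built as a context manager that loads metadata, hands it to the caller, and writes it back when the block exits cleanly. The sketch below shows that pattern only; it is not Renku's code. The function `editable_metadata` is hypothetical, and it stores JSON so that the example needs nothing beyond the standard library.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def editable_metadata(path, create=False):
    """Yield a dict loaded from ``path`` and write it back on a clean exit."""
    path = Path(path)
    if path.exists():
        metadata = json.loads(path.read_text())
    elif create:
        metadata = {}
    else:
        raise FileNotFoundError(path)
    yield metadata
    # Reached only if the with-block finished without raising.
    path.write_text(json.dumps(metadata))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "metadata.json"
    with editable_metadata(target, create=True) as metadata:
        metadata["title"] = "Example dataset"
    # Read the file back to confirm the edit was persisted.
    saved = json.loads(target.read_text())
```

Because the write happens after the `yield`, an exception raised inside the `with` block leaves the file on disk unchanged.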

class renku.core.management.datasets.DownloadProgressCallback(description, total_size)[source]

Interface to report various stages of a download.

Default initializer.

finalize()[source]

Called once when the download is finished.

update(size)[source]

Update the status.
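A progress reporter with this interface might look like the following. The class `PercentProgress` is a hypothetical stand-in that mirrors the three documented methods instead of subclassing the real `DownloadProgressCallback`, so the example runs without Renku installed. Treating `size` as the number of newly received bytes is an assumption.

```python
class PercentProgress:
    """Stand-in exposing the same methods as DownloadProgressCallback."""

    def __init__(self, description, total_size):
        self.description = description
        self.total_size = total_size
        self.downloaded = 0
        self.finished = False

    def update(self, size):
        """Record that ``size`` more bytes arrived (assumed meaning)."""
        self.downloaded += size

    def finalize(self):
        """Mark the download as finished."""
        self.finished = True

    @property
    def percent(self):
        """Return progress as a percentage, or 0.0 if the total is unknown."""
        if not self.total_size:
            return 0.0
        return 100.0 * self.downloaded / self.total_size


progress = PercentProgress("data.csv", total_size=200)
for chunk_size in (50, 50, 100):
    progress.update(chunk_size)
progress.finalize()
```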

Repository

Client for handling a local repository.

class renku.core.management.repository.PathMixin(path=<function default_path>)[source]

Define a default path attribute.

class renku.core.management.repository.RepositoryApiMixin(renku_home='.renku', parent=None, *, data_dir='data')[source]

Client for handling a local repository.

ACTIVITY_INDEX = 'activity_index.yaml'

Caches activities that generated a path.

LOCK_SUFFIX = '.lock'

Default suffix for Renku lock file.

METADATA = 'metadata.yml'

Default name of Renku config file.

WORKFLOW = 'workflow'

Directory for storing workflow in Renku.

activities_for_paths(paths, file_commit=None, revision='HEAD')[source]

Get all activities involving a path.

activity_index_path

Path to the activity filepath cache.

add_to_activity_index(activity)[source]

Add an activity and its generations to the cache.

cwl_prefix[source]

Return a CWL prefix.

data_dir = None

Define a name of the folder for storing datasets.

find_previous_commit(paths, revision='HEAD', return_first=False, full=False)[source]

Return a previous commit for a given path starting from revision.

Parameters:
  • revision – revision to start from, defaults to HEAD
  • return_first – show the first commit in the history
  • full – return full history
Raises:

KeyError – if path is not present in the given commit

import_from_template(template_path, metadata, force=False)[source]

Render template files from a template directory.

init_repository(force=False)[source]

Initialize an empty Renku repository.

is_project_set()[source]

Return whether a project is set for the client.

is_workflow(path)[source]

Check if the path is a valid CWL file.

lock

Create a Renku config lock.

parent = None

Store a pointer to the parent repository.

path_activity_cache

Cache of all activities and their generated paths.

process_commit(commit=None, path=None)[source]

Build an Activity.

Parameters:
  • commit – Commit to process. (default: HEAD)
  • path – Process a specific CWL file.

project

Return the Project instance.

remote

Return host, owner and name of the remote if it exists.

renku_home = None

Define a name of the Renku folder (default: .renku).

renku_metadata_path

Return a Path instance of Renku metadata file.

renku_path = None

Store a Path instance of the Renku folder.

resolve_in_submodules(commit, path)[source]

Resolve filename in submodules.

subclients(parent_commit)[source]

Return mapping from submodule to client.

submodules[source]

Return list of submodules it belongs to.

with_commit(commit)[source]

Yield the state of the repo at a specific commit.

with_metadata(read_only=False, name=None)[source]

Yield an editable metadata object.

with_workflow_storage()[source]

Yield a workflow storage.

workflow_names[source]

Return index of workflow names.

workflow_path

Return a Path instance of the workflow folder.

renku.core.management.repository.default_path()[source]

Return default repository path.

renku.core.management.repository.path_converter(path)[source]

Converter for path in PathMixin.

Git Internals

Wrap Git client.

class renku.core.management.git.GitCore[source]

Wrap Git client.

candidate_paths

Return all paths in the index and untracked files.

commit(commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None)[source]

Automatic commit.

dirty_paths

Get paths of dirty files in the repository.

ensure_clean(ignore_std_streams=False)[source]

Make sure the repository is clean.

ensure_unstaged(path)[source]

Ensure that path is not part of git staged files.

ensure_untracked(path)[source]

Ensure that path is not part of git untracked files.

find_attr(*paths)[source]

Return map with path and its attributes.

find_ignored_paths(*paths)[source]

Return ignored paths matching .gitignore file.

modified_paths

Return paths of modified files.

remove_unmodified(paths, autocommit=True)[source]

Remove unmodified paths and return their names.

repo = None

Store an instance of the Git repository.

setup_credential_helper()[source]

Set up the Git credential helper to cache credentials if it is not already set.

transaction(clean=True, commit=True, commit_empty=True, commit_message=None, commit_only=None, ignore_std_streams=False, raise_if_empty=False, up_to_date=False)[source]

Perform Git checks and operations.

worktree(path=None, branch_name=None, commit=None, merge_args=('--ff-only', ))[source]

Create new worktree.

Git utilities.

class renku.core.models.git.GitURL(href, pathname=None, protocol='ssh', hostname='localhost', username=None, password=None, port=None, owner=None, name=None, regex=None)[source]

Parser for common Git URLs.

image

Return image name.

classmethod parse(href)[source]

Derive basic information.

class renku.core.models.git.Range(start, stop)[source]

Represent parsed Git revision as an interval.

classmethod rev_parse(git, revision)[source]

Parse revision string.
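To show what "revision as an interval" means, the sketch below splits a two-dot revision string into its start and stop parts. It works on strings only and does not resolve anything against a repository, unlike rev_parse, which receives a `git` argument. The names `RevisionRange` and `split_revision` are hypothetical, and the three-dot form is rejected to keep the example small.

```python
from typing import NamedTuple, Optional


class RevisionRange(NamedTuple):
    """A revision interval; ``start`` is None for a single revision."""

    start: Optional[str]
    stop: str


def split_revision(revision):
    """Split 'A..B' into (A, B); a missing side defaults to HEAD, as in Git."""
    if "..." in revision:
        raise ValueError("three-dot ranges are not handled in this sketch")
    start, is_range, stop = revision.partition("..")
    if not is_range:
        return RevisionRange(start=None, stop=revision)
    return RevisionRange(start=start or "HEAD", stop=stop or "HEAD")
```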

renku.core.models.git.filter_repo_name(repo_name)[source]

Remove the .git extension from the repo name.
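The fields that GitURL exposes (protocol, hostname, owner, name) can be illustrated with the standard library. This is a simplified sketch, not the parser Renku uses: `strip_git_suffix` and `split_git_url` are hypothetical helpers, and only URL-style hrefs such as `https://` or `ssh://` are handled, not the scp-like `user@host:owner/repo.git` form.

```python
from urllib.parse import urlparse


def strip_git_suffix(repo_name):
    """Remove a trailing '.git' from a repository name."""
    if repo_name.endswith(".git"):
        return repo_name[: -len(".git")]
    return repo_name


def split_git_url(href):
    """Split a URL-style Git href into protocol, hostname, owner and name."""
    parsed = urlparse(href)
    parts = [part for part in parsed.path.split("/") if part]
    if not parsed.hostname or not parts:
        raise ValueError(f"Cannot parse Git URL: {href!r}")
    return {
        "protocol": parsed.scheme,
        "hostname": parsed.hostname,
        # Everything before the last path segment is treated as the owner.
        "owner": "/".join(parts[:-1]) or None,
        "name": strip_git_suffix(parts[-1]),
    }


info = split_git_url("https://example.com/owner/project.git")
```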