Repository API

This API is built on top of Git and Git-LFS.

Renku repository management.

renku.core.management.RENKU_HOME = '.renku'

Name of the Renku metadata directory inside a project.

Datasets

Dataset business logic.

renku.core.management.dataset.dataset.create_dataset(name, client_dispatcher, title=None, description=None, creators=None, keywords=None, images=None, update_provenance=True, custom_metadata=None)[source]

Create a dataset.

renku.core.management.dataset.dataset.edit_dataset(name, title, description, creators, client_dispatcher, keywords=None, images=None, skip_image_update=False, custom_metadata=None)[source]

Edit dataset metadata.

renku.core.management.dataset.dataset.export_dataset(name, provider_name, publish, tag, client_dispatcher, **kwargs)[source]

Export data to 3rd party provider.

Raises

ParameterError, HTTPError, InvalidAccessToken, DatasetNotFound

Remove matching files from a dataset.

renku.core.management.dataset.dataset.filter_dataset_files(client_dispatcher, dataset_gateway, names=None, creators=None, include=None, exclude=None, ignore=None, immutable=False)[source]

Filter dataset files by specified filters.

Parameters
  • names – Filter by specified dataset names.

  • creators – Filter by creators.

  • include – Include files matching file pattern.

  • exclude – Exclude files matching file pattern.

  • ignore – Ignored datasets.

  • immutable – Return immutable copies of dataset objects.
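The include/exclude semantics can be illustrated with a small stand-alone sketch. It assumes glob-style patterns matched with Python's `fnmatch` and assumes that exclusion wins over inclusion; `filter_paths` is a hypothetical helper, not Renku's implementation.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def filter_paths(
    paths: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths matching any include pattern and no exclude pattern.

    Hypothetical helper: assumes glob-style patterns as understood by fnmatch.
    """
    selected = []
    for path in paths:
        # With no include patterns, every path is a candidate.
        if include and not any(fnmatch(path, pattern) for pattern in include):
            continue
        # Assumption: exclude patterns take precedence over include patterns.
        if exclude and any(fnmatch(path, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected


files = ["data/ds/a.csv", "data/ds/b.csv", "data/ds/notes.txt"]
print(filter_paths(files, include=["*.csv"], exclude=["*b.csv"]))
```

Note that `fnmatch`'s `*` also matches `/`, so `*.csv` selects CSV files at any depth.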

renku.core.management.dataset.dataset.import_dataset(uri, client_dispatcher, database_dispatcher, name='', extract=False, yes=False, previous_dataset=None, delete=False, gitlab_token=None)[source]

Import data from a 3rd party provider or another renku project.

renku.core.management.dataset.dataset.list_dataset_files(client_dispatcher, datasets=None, creators=None, include=None, exclude=None)[source]

List dataset files.

renku.core.management.dataset.dataset.list_datasets()[source]

List all datasets.

renku.core.management.dataset.dataset.move_files(client_dispatcher, dataset_gateway, files, to_dataset=None)[source]

Move files and their metadata from one or more datasets to a target dataset.

renku.core.management.dataset.dataset.remove_dataset(name)[source]

Delete a dataset.

renku.core.management.dataset.dataset.search_datasets(name)[source]

Get all the datasets whose name starts with the given string.
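The matching rule amounts to a prefix test on dataset names. The sketch below assumes a case-sensitive comparison; `search_names` is a hypothetical helper operating on plain strings rather than dataset objects.

```python
from typing import Iterable, List


def search_names(names: Iterable[str], prefix: str) -> List[str]:
    """Return the names that start with ``prefix`` (case-sensitive, hypothetical helper)."""
    return [name for name in names if name.startswith(prefix)]


print(search_names(["flights", "flights-2020", "weather"], "flights"))
# ['flights', 'flights-2020']
```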

renku.core.management.dataset.dataset.set_dataset_images(client, dataset, images)[source]

Set a dataset’s images.

renku.core.management.dataset.dataset.show_dataset(name)[source]

Show detailed dataset information.

renku.core.management.dataset.dataset.update_dataset_custom_metadata(dataset, custom_metadata)[source]

Update custom metadata on a dataset.

renku.core.management.dataset.dataset.update_dataset_git_files(client_dispatcher, files, ref, delete=False)[source]

Update files and dataset metadata according to their remotes.

Parameters
  • files – List of files to be updated

  • delete – Indicates whether to delete files or not

Returns

List of files that should be deleted

renku.core.management.dataset.dataset.update_dataset_local_files(client_dispatcher, records, delete=False)[source]

Update file metadata from the Git history.

renku.core.management.dataset.dataset.update_datasets(names, creators, include, exclude, ref, delete, client_dispatcher, dataset_gateway, external=False)[source]

Update dataset files.

renku.core.management.dataset.dataset.update_external_files(client, records)[source]

Update files linked to external storage.

Dataset add business logic.

renku.core.management.dataset.dataset_add.add_data_to_dataset(dataset_name, urls, client_dispatcher, database_dispatcher, force=False, create=False, overwrite=False, sources=None, destination='', ref=None, external=False, extract=False, all_at_once=False, destination_names=None, repository=None, clear_files_before=False, total_size=None, with_metadata=None)[source]

Import the data into the data directory.

renku.core.management.dataset.dataset_add.move_files_to_dataset(client, files)[source]

Copy/Move files into a dataset’s directory.

Dataset constants.

renku.core.management.dataset.constant.CACHE = 'cache'

Directory to cache transient data.

renku.core.management.dataset.constant.DATASET_IMAGES = 'dataset_images'

Directory for dataset images.

renku.core.management.dataset.constant.POINTERS = 'pointers'

Directory for storing external pointer files.

renku.core.management.dataset.constant.renku_dataset_images_path(client)[source]

Return a Path instance of the Renku dataset images folder.

renku.core.management.dataset.constant.renku_pointers_path(client)[source]

Return a Path instance of the Renku pointer files folder.
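Assuming both folders live directly under the Renku home directory (`.renku`, see `RENKU_HOME`) at the repository root, the two helpers resolve paths as in this sketch. `repo_root`, `dataset_images_path` and `pointers_path` are hypothetical stand-ins for the `client`-based functions, and the sketch only computes paths; it does not touch the file system.

```python
from pathlib import Path

RENKU_HOME = ".renku"
DATASET_IMAGES = "dataset_images"
POINTERS = "pointers"


def dataset_images_path(repo_root: Path) -> Path:
    # Assumption: images are stored under <repo>/.renku/dataset_images
    return repo_root / RENKU_HOME / DATASET_IMAGES


def pointers_path(repo_root: Path) -> Path:
    # Assumption: pointer files are stored under <repo>/.renku/pointers
    return repo_root / RENKU_HOME / POINTERS


print(pointers_path(Path("/home/user/my-project")))
```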

Dataset context managers.

class renku.core.management.dataset.context.DatasetContext(name, create=False, commit_database=False, creator=None)[source]

Dataset context manager for metadata changes.

Pointer file business logic.

renku.core.management.dataset.pointer_file.create_external_file(client, src, dst)[source]

Create a new external file.

renku.core.management.dataset.pointer_file.create_pointer_file(client, target, checksum=None)[source]

Create a new pointer file.

renku.core.management.dataset.pointer_file.update_pointer_file(client, pointer_file_path)[source]

Update a pointer file.

Renku management dataset request models.

class renku.core.management.dataset.request_model.ImageRequestModel(content_url, position, mirror_locally=False, safe_image_paths=None)[source]

Model for passing image information to dataset use-cases.

to_image_object(dataset, client_dispatcher)[source]

Convert request model to ImageObject.

Tag management for datasets.

renku.core.management.dataset.tag.add_dataset_tag(dataset_name, tag, description='', force=False)[source]

Adds a new tag to a dataset.

Validates that the tag does not already exist and that it follows the same rules as Docker tags. See https://docs.docker.com/engine/reference/commandline/tag/ for documentation of the Docker tag syntax.

Raises

errors.ParameterError
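The Docker tag rules can be checked with a regular expression. This sketch encodes the constraints stated in the Docker documentation: ASCII letters, digits, underscores, periods and hyphens; no leading period or hyphen; at most 128 characters. `is_valid_tag` and `check_new_tag` are hypothetical helpers, `ValueError` stands in for `errors.ParameterError`, and the assumption that `force` permits re-using an existing tag is inferred from the signature, not confirmed by the source.

```python
import re
from typing import Set

# Docker tag syntax: first character is a letter, digit or underscore;
# the rest may also contain periods and hyphens; 128 characters maximum.
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_tag(tag: str) -> bool:
    """Return True if ``tag`` follows Docker's tag syntax."""
    return _TAG_PATTERN.fullmatch(tag) is not None


def check_new_tag(tag: str, existing_tags: Set[str], force: bool = False) -> None:
    """Raise ValueError for an invalid or duplicate tag (hypothetical helper)."""
    if not is_valid_tag(tag):
        raise ValueError(f"Invalid tag {tag!r}: see the Docker tag syntax rules")
    # Assumption: ``force`` allows overwriting an existing tag.
    if tag in existing_tags and not force:
        raise ValueError(f"Tag {tag!r} already exists")


print(is_valid_tag("v1.0"), is_valid_tag(".hidden"))
# True False
```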

renku.core.management.dataset.tag.list_dataset_tags(dataset_name, format)[source]

List all tags for a dataset.

renku.core.management.dataset.tag.prompt_access_token(exporter)[source]

Prompt user for an access token for a provider.

Returns

The new access token

renku.core.management.dataset.tag.prompt_tag_selection(tags)[source]

Prompt the user to choose a tag or <HEAD>.

renku.core.management.dataset.tag.remove_dataset_tags(dataset_name, tags)[source]

Removes tags from a dataset.

Datasets Provenance.

class renku.core.management.dataset.datasets_provenance.DatasetsProvenance[source]

A set of datasets.

add_or_update(dataset, date=None, creator=None)[source]

Add/update a dataset according to its new content.

NOTE: This function always mutates the dataset.

add_tag(dataset, tag)[source]

Add a tag to a dataset.

property datasets

Return an iterator of datasets.

get_all_tags(dataset)[source]

Return the list of all tags for a dataset.

get_by_id(id, immutable=False)[source]

Return a dataset by its id.

get_by_name(name, immutable=False, strict=False)[source]

Return a dataset by its name.

get_previous_version(dataset)[source]

Return the previous version of a dataset if any.

get_provenance_tails()[source]

Return the provenance for all datasets.

remove(dataset, date=None, creator=None)[source]

Remove a dataset.

remove_tag(dataset, tag)[source]

Remove a tag from a dataset.

update_during_migration(dataset, commit_sha, date=None, tags=None, remove=False, replace=False, preserve_identifiers=False)[source]

Add, update, remove, or replace a dataset in migration.

Repository

Client for handling a local repository.

class renku.core.management.repository.PathMixin(path=<function default_path>)[source]

Define a default path attribute.

Method generated by attrs for class PathMixin.

class renku.core.management.repository.RepositoryApiMixin(renku_home='.renku', parent=None, remote_cache=NOTHING, *, data_dir='data')[source]

Client for handling a local repository.

Method generated by attrs for class RepositoryApiMixin.

DATABASE_PATH = 'metadata'

Directory for metadata storage.

DOCKERFILE = 'Dockerfile'

Name of the Dockerfile in the repository.

LOCK_SUFFIX = '.lock'

Default suffix for Renku lock file.

data_dir

Name of the folder for storing datasets.

property database_path

Path to the metadata storage directory.

property docker_path

Path to the Dockerfile.

get_in_submodules(commit, path)[source]

Resolve filename in submodules.

has_graph_files()[source]

Return True if the database exists.

has_template_checksum()[source]

Return whether the project has a template checksums file.

init_repository(force=False, user=None, initial_branch=None)[source]

Initialize an empty Renku repository.

is_project_set()[source]

Return whether a project is set for the client.

is_protected_path(path)[source]

Checks if a path is a protected path.

property latest_agent

Returns latest agent version used in the repository.

property lock

Create a Renku config lock.

parent

Store a pointer to the parent repository.

property project

Return the Project instance.

property remote

Return host, owner and name of the remote if it exists.

renku_home

Name of the Renku folder (default: .renku).

renku_path

Store a Path instance of the Renku folder.

property template_checksums

Return a Path instance to the template checksums file.

property transaction_id

Get a transaction id for the current client to be used for grouping git commits.

with_metadata(project_gateway, database_gateway, read_only=False, name=None, description=None, keywords=None, custom_metadata=None)[source]

Yield an editable metadata object.

renku.core.management.repository.path_converter(path)[source]

Converter for path in PathMixin.

Git Internals

Wrap Git client.

class renku.core.management.git.GitCore[source]

Wrap Git client.

Method generated by attrs for class GitCore.

property candidate_paths

Return all paths in the index and untracked files.

commit(commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True, skip_dirty_checks=False)[source]

Automatic commit.

property dirty_paths

Get paths of dirty files in the repository.

ensure_clean(ignore_std_streams=False)[source]

Make sure the repository is clean.

ensure_unstaged(path)[source]

Ensure that path is not part of git staged files.

ensure_untracked(path)[source]

Ensure that path is not part of git untracked files.

find_ignored_paths(*paths)[source]

Return ignored paths matching .gitignore file.

property modified_paths

Return paths of modified files.

remove_unmodified(paths, autocommit=True)[source]

Remove unmodified paths and return their names.

setup_credential_helper()[source]

Set the git credential helper to cache if no helper is set already.

worktree(path=None, branch_name=None, commit=None, merge_args=('--ff-only',))[source]

Create new worktree.

renku.core.management.git.finalize_commit(client, diff_before, commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True)[source]

Commit modified/added paths.

renku.core.management.git.finalize_worktree(client, isolation, path, branch_name, delete, new_branch, merge_args=('--ff-only',), exception=None)[source]

Clean up and merge a previously created Git worktree.

renku.core.management.git.get_mapped_std_streams(lookup_paths, streams=('stdin', 'stdout', 'stderr'))[source]

Get a mapping of standard streams to given paths.
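One way to picture this mapping: a standard stream "maps" to a path when both refer to the same file on disk, for example when stdout is redirected to an output file. The sketch below assumes that identity is decided by device and inode, compared with `os.path.samestat`; `map_std_streams` and its `streams` argument are hypothetical and not Renku's implementation.

```python
import os
import sys
from typing import Dict, Iterable, Optional, TextIO


def map_std_streams(
    lookup_paths: Iterable[str],
    streams: Optional[Dict[str, TextIO]] = None,
) -> Dict[str, str]:
    """Map stream names to the lookup path each stream is connected to."""
    if streams is None:
        streams = {"stdin": sys.stdin, "stdout": sys.stdout, "stderr": sys.stderr}
    # Stat every candidate path once; missing paths cannot be stream targets.
    path_stats = {}
    for path in lookup_paths:
        try:
            path_stats[path] = os.stat(path)
        except OSError:
            continue
    mapping = {}
    for name, stream in streams.items():
        try:
            stream_stat = os.fstat(stream.fileno())
        except (AttributeError, OSError, ValueError):
            continue  # the stream has no usable file descriptor
        for path, stat in path_stats.items():
            # Same device and inode means the stream writes to/reads from this file.
            if os.path.samestat(stream_stat, stat):
                mapping[name] = path
                break
    return mapping
```

Running a script as `python script.py > out.txt` and calling `map_std_streams(["out.txt"])` inside it would, under these assumptions, yield `{"stdout": "out.txt"}`.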

renku.core.management.git.prepare_commit(client, commit_only=None, skip_dirty_checks=False)[source]

Gather information about the repository that is needed for committing later on.

renku.core.management.git.prepare_worktree(original_client, path=None, branch_name=None, commit=None)[source]

Set up a Git worktree to provide isolation.

Git utilities.

class renku.core.models.git.GitURL(href, path=None, scheme='ssh', hostname='localhost', username=None, password=None, port=None, owner=None, name=None, slug=None, regex=None)[source]

Parser for common Git URLs.

Method generated by attrs for class GitURL.

property image

Return image name.

property instance_url

Get the url of the git instance.

classmethod parse(href)[source]

Derive URI components.

renku.core.models.git.filter_repo_name(repo_name)[source]

Remove the .git extension from the repo name.
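The kind of decomposition `GitURL.parse` and `filter_repo_name` perform can be sketched for the two most common remote formats, SSH scp-like (`git@host:owner/name.git`) and HTTPS. This is a simplified, hypothetical parser: `ParsedURL`, `parse_git_url` and `strip_git_extension` are illustrative names, and the real `GitURL` handles more cases (ports, credentials, usernames, and so on).

```python
import re
from typing import NamedTuple, Optional

_PATTERNS = [
    # HTTP(S) syntax: https://host[:port]/owner/name.git
    re.compile(r"^https?://(?P<hostname>[^/:]+)(?::\d+)?/(?P<path>.+)$"),
    # SSH scp-like syntax: [user@]host:owner/name.git
    re.compile(r"^(?:[^@/]+@)?(?P<hostname>[^:/]+):(?P<path>[^/].*)$"),
]


class ParsedURL(NamedTuple):
    hostname: str
    owner: Optional[str]
    name: str


def strip_git_extension(repo_name: str) -> str:
    """Remove a trailing .git extension from a repository name."""
    return repo_name[: -len(".git")] if repo_name.endswith(".git") else repo_name


def parse_git_url(href: str) -> ParsedURL:
    for pattern in _PATTERNS:
        match = pattern.match(href)
        if match:
            # Split "group/subgroup/name.git" into owner and repository name.
            owner, _, name = match.group("path").rstrip("/").rpartition("/")
            return ParsedURL(match.group("hostname"), owner or None, strip_git_extension(name))
    raise ValueError(f"Unsupported Git URL: {href}")


print(parse_git_url("git@gitlab.com:owner/name.git"))
# ParsedURL(hostname='gitlab.com', owner='owner', name='name')
```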

Command Builder

Most renku commands require context (database/git/etc.) to be set up for them. The command builder pattern makes this easy by wrapping commands in factory methods.

Renku Command Builder.

class renku.core.management.command_builder.Command[source]

Base renku command builder.

__init__ of Command.

add_injection_pre_hook(order, hook)[source]

Add a pre-execution hook for dependency injection.

Parameters
  • order – Determines the order of executed hooks, lower numbers get executed first.

  • hook – The hook to add

add_post_hook(order, hook)[source]

Add a post-execution hook.

Parameters
  • order – Determines the order of executed hooks, lower numbers get executed first.

  • hook – The hook to add

add_pre_hook(order, hook)[source]

Add a pre-execution hook.

Parameters
  • order – Determines the order of executed hooks, lower numbers get executed first.

  • hook – The hook to add

build()[source]

Build (finalize) the command.

command(operation)[source]

Set the wrapped command.

Parameters

operation – The function to wrap in the command builder.

execute(*args, **kwargs)[source]

Execute the wrapped operation.

First executes pre_hooks in ascending order, passing a read/write context between them. It then calls the wrapped operation. The result of the operation is then passed to all post_hooks, in descending order. Finally, it returns the result, or the error if there was one.
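The hook ordering described for `execute` can be shown with a toy builder. `MiniCommand` is a hypothetical class, not Renku's `Command`; in particular, the post-hook signature `(context, result, error)` is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

PreHook = Callable[[Dict[str, Any]], None]
PostHook = Callable[[Dict[str, Any], Any, Optional[Exception]], None]


class MiniCommand:
    """Toy command builder illustrating hook ordering (not Renku's Command)."""

    def __init__(self) -> None:
        self._operation: Optional[Callable[..., Any]] = None
        self._pre_hooks: List[Tuple[int, PreHook]] = []
        self._post_hooks: List[Tuple[int, PostHook]] = []

    def command(self, operation: Callable[..., Any]) -> "MiniCommand":
        self._operation = operation
        return self

    def add_pre_hook(self, order: int, hook: PreHook) -> "MiniCommand":
        self._pre_hooks.append((order, hook))
        return self

    def add_post_hook(self, order: int, hook: PostHook) -> "MiniCommand":
        self._post_hooks.append((order, hook))
        return self

    def execute(self, *args: Any, **kwargs: Any) -> Any:
        context: Dict[str, Any] = {}
        # Pre-hooks run in ascending order and share a mutable context.
        for _, hook in sorted(self._pre_hooks, key=lambda item: item[0]):
            hook(context)
        result, error = None, None
        try:
            result = self._operation(*args, **kwargs)
        except Exception as exc:
            error = exc
        # Post-hooks run in descending order and see the result or the error.
        for _, hook in sorted(self._post_hooks, key=lambda item: item[0], reverse=True):
            hook(context, result, error)
        if error is not None:
            raise error
        return result


calls = []
cmd = (
    MiniCommand()
    .command(lambda x: x * 2)
    .add_pre_hook(2, lambda ctx: calls.append("pre-2"))
    .add_pre_hook(1, lambda ctx: calls.append("pre-1"))
    .add_post_hook(1, lambda ctx, res, err: calls.append("post-1"))
    .add_post_hook(2, lambda ctx, res, err: calls.append("post-2"))
)
print(cmd.execute(21), calls)
# 42 ['pre-1', 'pre-2', 'post-2', 'post-1']
```

Running post-hooks in reverse order lets a hook registered early (for example one that opens a resource) also be the last to see the outcome (and close it), much like nested context managers.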

property finalized

Whether this builder is still being constructed or has been finalized.

lock_dataset()[source]

Acquire a lock for a dataset.

lock_project()[source]

Acquire a lock for the whole project.

require_clean()[source]

Check that the repository is clean.

require_migration()[source]

Check if a migration is needed.

track_std_streams()[source]

Whether to track STD streams or not.

with_commit(message=None, commit_if_empty=False, raise_if_empty=False, commit_only=None)[source]

Create a commit.

Parameters
  • message – The commit message. Auto-generated if left empty.

  • commit_if_empty – Whether to commit if there are no modified files.

  • raise_if_empty – Whether to raise an exception if there are no modified files.

  • commit_only – Only commit the supplied paths.
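How the three options might interact can be sketched as a small decision function. `paths_to_commit` and `NothingToCommit` are hypothetical, and the precedence shown (restrict to `commit_only` first, then let `raise_if_empty` take priority over `commit_if_empty`) is an assumption rather than documented behaviour.

```python
from typing import Iterable, List, Optional, Set


class NothingToCommit(Exception):
    """Hypothetical error raised when a commit would be empty."""


def paths_to_commit(
    modified: Iterable[str],
    commit_only: Optional[Iterable[str]] = None,
    commit_if_empty: bool = False,
    raise_if_empty: bool = False,
) -> Optional[List[str]]:
    """Decide what a commit step should do (hypothetical helper).

    Returns the sorted paths to commit, an empty list for an explicitly
    requested empty commit, or None when the commit should be skipped.
    """
    paths: Set[str] = set(modified)
    if commit_only is not None:
        # Only the supplied paths are eligible for the commit.
        paths &= set(commit_only)
    if paths:
        return sorted(paths)
    if raise_if_empty:
        raise NothingToCommit("No modified files to commit")
    return [] if commit_if_empty else None
```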

with_communicator(communicator)[source]

Create a communicator.

with_database(write=False, path=None, create=False)[source]

Provide an object database connection.

with_git_isolation()[source]

Whether to run in git isolation or not.

working_directory(directory)[source]

Set the working directory for the command.

Parameters

directory – The working directory to work in.