Repository API

This API is built on top of Git and Git-LFS.

Renku repository management.

renku.core.management.RENKU_HOME = '.renku'

Project directory name.

Datasets

Dataset business logic.

renku.core.dataset.dataset.create_dataset(name, client_dispatcher, title=None, description=None, creators=None, keywords=None, images=None, update_provenance=True, custom_metadata=None)[source]

Create a dataset.

Parameters
  • name (str) – Name of the dataset

  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • title (Optional[str], optional) – Dataset title (Default value = None).

  • description (Optional[str], optional) – Dataset description (Default value = None).

  • creators (Optional[List[Person]], optional) – Dataset creators (Default value = None).

  • keywords (Optional[List[str]], optional) – Dataset keywords (Default value = None).

  • images (Optional[List[ImageRequestModel]], optional) – Dataset images (Default value = None).

  • update_provenance (bool, optional) – Whether to add this dataset to dataset provenance (Default value = True).

  • custom_metadata (Optional[Dict[str, Any]], optional) – Custom JSON-LD metadata (Default value = None).

Returns

The created dataset.

Return type

Dataset

renku.core.dataset.dataset.edit_dataset(name, title, description, creators, client_dispatcher, keywords=None, images=None, skip_image_update=False, custom_metadata=None)[source]

Edit dataset metadata.

Parameters
  • name (str) – Name of the dataset to edit

  • title (str) – New title for the dataset.

  • description (str) – New description for the dataset.

  • creators (List[Person]) – New creators for the dataset.

  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • keywords (List[str], optional) – New keywords for dataset (Default value = None).

  • images (List[ImageRequestModel], optional) – New images for dataset (Default value = None).

  • skip_image_update (bool, optional) – Whether or not to skip updating dataset images (Default value = False).

  • custom_metadata (Dict, optional) – Custom JSON-LD metadata (Default value = None).

Returns

True if updates were performed.

Return type

bool

renku.core.dataset.dataset.export_dataset(name, provider_name, publish, tag, client_dispatcher, **kwargs)[source]

Export data to a 3rd party provider.

Parameters
  • name – Name of dataset to export.

  • provider_name – Provider to use for export.

  • publish – Whether to export as proper version or draft.

  • tag – Dataset tag from which to export.

  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

Remove matching files from a dataset.

Parameters
  • name – Dataset name.

  • include – Include filter for files.

  • exclude – Exclude filter for files.

  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • yes – Whether to skip user confirmation or not (Default value = False).

Returns

List of files that were removed.

Return type

List[DynamicProxy]

renku.core.dataset.dataset.filter_dataset_files(client_dispatcher, dataset_gateway, names=None, creators=None, include=None, exclude=None, ignore=None, immutable=False)[source]

Filter dataset files by specified filters.

Parameters
  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • dataset_gateway (IDatasetGateway) – Injected dataset gateway.

  • names – Filter by specified dataset names. (Default value = None).

  • creators – Filter by creators. (Default value = None).

  • include – Include files matching file pattern. (Default value = None).

  • exclude – Exclude files matching file pattern. (Default value = None).

  • ignore – Ignored datasets. (Default value = None).

  • immutable – Return immutable copies of dataset objects. (Default value = False).

Returns

List of filtered files sorted by date added.

Return type

List[DynamicProxy]
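The include/exclude filtering and date ordering described above can be sketched as follows. This is a minimal illustration, not Renku's implementation: the `FileRecord` class and the `filter_files` helper are hypothetical, and glob-style matching via `fnmatch` is an assumption about how file patterns are interpreted.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Sequence


@dataclass
class FileRecord:
    """Hypothetical stand-in for a dataset file record."""

    path: str
    date_added: datetime


def filter_files(
    records: Iterable[FileRecord],
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[FileRecord]:
    """Keep records matching ``include`` and not matching ``exclude``.

    Patterns are treated as glob patterns (an assumption of this sketch).
    The result is sorted by the date each file was added.
    """
    selected = []
    for record in records:
        # With include patterns present, a file must match at least one.
        if include and not any(fnmatch(record.path, p) for p in include):
            continue
        # A file matching any exclude pattern is dropped.
        if exclude and any(fnmatch(record.path, p) for p in exclude):
            continue
        selected.append(record)
    return sorted(selected, key=lambda record: record.date_added)


if __name__ == "__main__":
    files = [
        FileRecord("data/a.csv", datetime(2022, 1, 2)),
        FileRecord("data/b.csv", datetime(2022, 1, 1)),
        FileRecord("data/notes.txt", datetime(2022, 1, 3)),
    ]
    for record in filter_files(files, include=["*.csv"]):
        print(record.path)
```

Passing neither `include` nor `exclude` returns every record, sorted by date.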

renku.core.dataset.dataset.import_dataset(uri, client_dispatcher, database_dispatcher, name='', extract=False, yes=False, previous_dataset=None, delete=False, gitlab_token=None)[source]

Import data from a 3rd party provider or another renku project.

Parameters
  • uri – DOI or URL of dataset to import.

  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • database_dispatcher (IDatabaseDispatcher) – Injected database dispatcher.

  • name – Name to give imported dataset (Default value = “”).

  • extract – Whether to extract compressed dataset data (Default value = False).

  • yes – Whether to skip user confirmation (Default value = False).

  • previous_dataset – Previously imported dataset version (Default value = None).

  • delete – Whether to delete files that don’t exist anymore (Default value = False).

  • gitlab_token – GitLab OAuth2 token (Default value = None).

renku.core.dataset.dataset.list_dataset_files(client_dispatcher, datasets=None, creators=None, include=None, exclude=None)[source]

List dataset files.

Parameters
  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • datasets – Datasets to list files for (Default value = None).

  • creators – Creators to filter by (Default value = None).

  • include – Include filters for file paths (Default value = None).

  • exclude – Exclude filters for file paths (Default value = None).

Returns

Filtered dataset files.

Return type

List[DynamicProxy]

renku.core.dataset.dataset.list_datasets()[source]

List all datasets.

renku.core.dataset.dataset.move_files(client_dispatcher, dataset_gateway, files, to_dataset_name=None)[source]

Move files and their metadata from one or more datasets to a target dataset.

Parameters
  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • dataset_gateway (IDatasetGateway) – Injected dataset gateway.

  • files (Dict[Path, Path]) – Files to move

  • to_dataset_name (Optional[str], optional) – Target dataset (Default value = None)

renku.core.dataset.dataset.remove_dataset(name)[source]

Delete a dataset.

Parameters

name – Name of dataset to delete.

renku.core.dataset.dataset.search_datasets(name)[source]

Get all the datasets whose name starts with the given string.

Parameters

name (str) – Beginning of dataset name to search for.

Returns

List of found dataset names.

Return type

List[str]
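The prefix search described above amounts to a starts-with filter over dataset names. A minimal sketch, assuming the names are available as plain strings; `search_names` is a hypothetical helper, not the Renku function:

```python
from typing import Iterable, List


def search_names(all_names: Iterable[str], prefix: str) -> List[str]:
    """Return the names that start with ``prefix`` (hypothetical stand-in)."""
    return [name for name in all_names if name.startswith(prefix)]


if __name__ == "__main__":
    print(search_names(["flights", "flights-2019", "weather"], "fli"))
```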

renku.core.dataset.dataset.set_dataset_images(client, dataset, images)[source]

Set a dataset’s images.

Parameters
  • client ("LocalClient") – The LocalClient.

  • dataset (Dataset) – The dataset to set images on.

  • images (List[ImageRequestModel]) – The images to set.

Returns

True if images were set/modified.

renku.core.dataset.dataset.show_dataset(name)[source]

Show detailed dataset information.

Parameters

name – Name of dataset to show details for.

Returns

JSON dictionary of dataset details.

Return type

dict

renku.core.dataset.dataset.update_dataset_custom_metadata(dataset, custom_metadata)[source]

Update custom metadata on a dataset.

Parameters
  • dataset (Dataset) – The dataset to update.

  • custom_metadata (Dict) – Custom JSON-LD metadata to set.

renku.core.dataset.dataset.update_dataset_git_files(client_dispatcher, files, ref, delete, dry_run)[source]

Update files and dataset metadata according to their remotes.

Parameters
  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • files (List[DynamicProxy]) – List of files to be updated.

  • ref (str) – Reference to use for update.

  • delete (bool, optional) – Indicates whether to delete files or not (Default value = False).

  • dry_run (bool) – Whether to perform update or only print changes.

Returns

Tuple of updated and deleted file records.

Return type

Tuple[List[DynamicProxy], List[DynamicProxy]]

renku.core.dataset.dataset.update_dataset_local_files(client_dispatcher, records)[source]

Update files metadata from the git history.

Parameters
  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • records (List[DynamicProxy]) – File records to update.

Returns

Tuple of updated and deleted file records.

Return type

Tuple[List[DynamicProxy], List[DynamicProxy]]

renku.core.dataset.dataset.update_datasets(names, creators, include, exclude, ref, delete, no_external, update_all, dry_run, client_dispatcher, dataset_gateway)[source]

Update dataset files.

Parameters
  • names – Names of datasets to update.

  • creators – Creators to filter dataset files by.

  • include – Include filter for paths to update.

  • exclude – Exclude filter for paths to update.

  • ref – Git reference to use for update.

  • delete – Whether to delete files that don’t exist on remote anymore.

  • no_external – Whether to exclude external files from the update.

  • update_all – Whether to update all datasets.

  • dry_run – Whether to return a preview of what would be updated.

  • client_dispatcher (IClientDispatcher) – Injected client dispatcher.

  • dataset_gateway (IDatasetGateway) – Injected dataset gateway.

renku.core.dataset.dataset.update_external_files(client, records, dry_run)[source]

Update files linked to external storage.

Parameters
  • client ("LocalClient") – The LocalClient.

  • records (List[DynamicProxy]) – File records to update.

  • dry_run (bool) – Whether to return a preview of what would be updated.

Dataset add business logic.

class renku.core.dataset.dataset_add.AddAction(value)[source]

Types of action when adding a file to a dataset.

renku.core.dataset.dataset_add.add_data_to_dataset(dataset_name, urls, client_dispatcher, database_dispatcher, force=False, create=False, overwrite=False, sources=None, destination='', ref=None, external=False, extract=False, all_at_once=False, destination_names=None, repository=None, clear_files_before=False, total_size=None, with_metadata=None)[source]

Import the data into the data directory.

renku.core.dataset.dataset_add.move_files_to_dataset(client, files)[source]

Copy/Move files into a dataset’s directory.

Dataset constants.

renku.core.dataset.constant.CACHE = 'cache'

Directory to cache transient data.

renku.core.dataset.constant.DATASET_IMAGES = 'dataset_images'

Directory for dataset images.

renku.core.dataset.constant.POINTERS = 'pointers'

Directory for storing external pointer files.

renku.core.dataset.constant.renku_dataset_images_path(client)[source]

Return a Path instance of Renku dataset metadata folder.

renku.core.dataset.constant.renku_pointers_path(client)[source]

Return a Path instance of Renku pointer files folder.
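The constants above can be combined into paths as in the following sketch. It assumes that the dataset images and pointer directories live inside the Renku home folder of the project; that layout, and the two helper names, are assumptions made for illustration rather than a statement of what the library does.

```python
from pathlib import PurePosixPath

# Values copied from the constants documented above.
RENKU_HOME = ".renku"
DATASET_IMAGES = "dataset_images"
POINTERS = "pointers"


def dataset_images_path(project_root: str) -> PurePosixPath:
    """Hypothetical helper: images folder under the Renku home (assumed layout)."""
    return PurePosixPath(project_root) / RENKU_HOME / DATASET_IMAGES


def pointers_path(project_root: str) -> PurePosixPath:
    """Hypothetical helper: pointer files folder under the Renku home (assumed layout)."""
    return PurePosixPath(project_root) / RENKU_HOME / POINTERS


if __name__ == "__main__":
    print(dataset_images_path("/work/project"))
    print(pointers_path("/work/project"))
```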

Dataset context managers.

class renku.core.dataset.context.DatasetContext(name, create=False, commit_database=False, creator=None)[source]

Dataset context manager for metadata changes.

Pointer file business logic.

renku.core.dataset.pointer_file.create_external_file(client, target, path, checksum=None)[source]

Create a new external file.

renku.core.dataset.pointer_file.create_pointer_file(client, target, checksum=None)[source]

Create a new pointer file.

renku.core.dataset.pointer_file.get_pointer_file(client_path, path)[source]

Return pointer file from an external file.

renku.core.dataset.pointer_file.is_external_file_updated(client_path, path)[source]

Check if an update to an external file is available.

renku.core.dataset.pointer_file.update_external_file(client, path, checksum)[source]

Delete existing external file and create a new one.

Renku management dataset request models.

class renku.core.dataset.request_model.ImageRequestModel(content_url, position, mirror_locally=False, safe_image_paths=None)[source]

Model for passing image information to dataset use-cases.

to_image_object(dataset, client_dispatcher)[source]

Convert request model to ImageObject.

Tag management for dataset.

renku.core.dataset.tag.add_dataset_tag(dataset_name, tag, description='', force=False)[source]

Adds a new tag to a dataset.

Checks whether the tag already exists and that it follows the same rules as Docker tags. See https://docs.docker.com/engine/reference/commandline/tag/ for documentation of the Docker tag syntax.

Raises

errors.ParameterError – If tag is too long or contains invalid characters.
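The Docker tag rules referenced above can be sketched as a small validator. This is an illustration only, not Renku's implementation: the helper name `is_valid_tag` and the regular expression are assumptions, based on the Docker rule that a tag may contain letters, digits, underscores, periods and dashes, may not start with a period or a dash, and may be at most 128 characters long.

```python
import re

# Hypothetical pattern mirroring the Docker tag rules:
# first character is a letter, digit or underscore, followed by up to
# 127 characters from letters, digits, underscore, period or dash.
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_tag(tag: str) -> bool:
    """Return True if ``tag`` satisfies the Docker tag syntax."""
    return _TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("v1.0", "release-2", ".hidden", "a" * 129):
        print(candidate[:12], is_valid_tag(candidate))
```

A caller that wants the behaviour described under "Raises" could raise an error whenever such a check returns False.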

renku.core.dataset.tag.list_dataset_tags(dataset_name, format)[source]

List all tags for a dataset.

renku.core.dataset.tag.prompt_access_token(exporter)[source]

Prompt user for an access token for a provider.

Returns

The new access token

renku.core.dataset.tag.prompt_tag_selection(tags)[source]

Prompt user to choose a tag or <HEAD>.

renku.core.dataset.tag.remove_dataset_tags(dataset_name, tags)[source]

Removes tags from a dataset.

Datasets Provenance.

class renku.core.dataset.datasets_provenance.DatasetsProvenance[source]

A set of datasets.

add_or_update(dataset, date=None, creator=None)[source]

Add/update a dataset according to its new content.

NOTE: This function always mutates the dataset.

add_tag(dataset, tag)[source]

Add a tag to a dataset.

property datasets

Return an iterator of datasets.

get_all_tags(dataset)[source]

Return the list of all tags for a dataset.

get_by_id(id, immutable=False)[source]

Return a dataset by its id.

get_by_name(name, immutable=False, strict=False)[source]

Return a dataset by its name.

get_previous_version(dataset)[source]

Return the previous version of a dataset if any.

get_provenance_tails()[source]

Return the provenance for all datasets.

remove(dataset, date=None, creator=None)[source]

Remove a dataset.

remove_tag(dataset, tag)[source]

Remove a tag from a dataset.

update_during_migration(dataset, commit_sha, date=None, tags=None, remove=False, replace=False, preserve_identifiers=False)[source]

Add, update, remove, or replace a dataset in migration.

Repository

Client for handling a local repository.

class renku.core.management.repository.PathMixin(path=<function default_path>)[source]

Define a default path attribute.

Method generated by attrs for class PathMixin.

class renku.core.management.repository.RepositoryApiMixin(renku_home='.renku', parent=None, remote_cache=NOTHING, *, data_dir='data')[source]

Client for handling a local repository.

Method generated by attrs for class RepositoryApiMixin.

DATABASE_PATH = 'metadata'

Directory for metadata storage.

DOCKERFILE = 'Dockerfile'

Name of the Dockerfile in the repository.

LOCK_SUFFIX = '.lock'

Default suffix for Renku lock file.

data_dir

Define a name of the folder for storing datasets.

property database_path

Path to the metadata storage directory.

property docker_path

Path to the Dockerfile.

get_in_submodules(commit, path)[source]

Resolve filename in submodules.

has_graph_files()[source]

Return true if database exists.

has_template_checksum()[source]

Return whether the project has a template checksums file.

init_repository(force=False, user=None, initial_branch=None)[source]

Initialize an empty Renku repository.

is_project_set()[source]

Return whether a project is set for the client.

is_protected_path(path)[source]

Checks if a path is a protected path.

property latest_agent

Returns latest agent version used in the repository.

property lock

Create a Renku config lock.

parent

Store a pointer to the parent repository.

property project

Return the Project instance.

property remote

Return host, owner and name of the remote if it exists.

renku_home

Define a name of the Renku folder (Default value = ‘.renku’).

renku_path

Store a Path instance of the Renku folder.

property template_checksums

Return a Path instance to the template checksums file.

property transaction_id

Get a transaction id for the current client to be used for grouping git commits.

with_metadata(project_gateway, database_gateway, read_only=False, name=None, description=None, keywords=None, custom_metadata=None)[source]

Yield an editable metadata object.

Git Internals

Wrap Git client.

class renku.core.management.git.GitCore[source]

Wrap Git client.

Method generated by attrs for class GitCore.

property candidate_paths

Return all paths in the index and untracked files.

commit(commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True, skip_dirty_checks=False)[source]

Automatic commit.

property dirty_paths

Get paths of dirty files in the repository.

ensure_clean(ignore_std_streams=False)[source]

Make sure the repository is clean.

ensure_unstaged(path)[source]

Ensure that path is not part of git staged files.

ensure_untracked(path)[source]

Ensure that path is not part of git untracked files.

find_ignored_paths(*paths)[source]

Return ignored paths matching .gitignore file.

property modified_paths

Return paths of modified files.

remove_unmodified(paths, autocommit=True)[source]

Remove unmodified paths and return their names.

setup_credential_helper()[source]

Set up the Git credential helper to cache credentials if it is not already set.

worktree(path=None, branch_name=None, commit=None, merge_args=('--ff-only',))[source]

Create new worktree.

renku.core.management.git.finalize_commit(client, diff_before, commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True)[source]

Commit modified/added paths.

renku.core.management.git.finalize_worktree(client, isolation, path, branch_name, delete, new_branch, merge_args=('--ff-only',), exception=None)[source]

Cleanup and merge a previously created Git worktree.

renku.core.management.git.get_mapped_std_streams(lookup_paths, streams=('stdin', 'stdout', 'stderr'))[source]

Get a mapping of standard streams to given paths.

renku.core.management.git.prepare_commit(client, commit_only=None, skip_dirty_checks=False)[source]

Gather information about repo needed for committing later on.

renku.core.management.git.prepare_worktree(original_client, path=None, branch_name=None, commit=None)[source]

Set up a Git worktree to provide isolation.

Git utilities.

class renku.domain_model.git.GitURL(href, path=None, scheme='ssh', hostname='localhost', username=None, password=None, port=None, owner=None, name=None, slug=None, regex=None)[source]

Parser for common Git URLs.

Method generated by attrs for class GitURL.

property image

Return image name.

property instance_url

Get the url of the git instance.

classmethod parse(href)[source]

Derive URI components.

renku.domain_model.git.filter_repo_name(repo_name)[source]

Remove the .git extension from the repo name.
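The URL parsing and `.git` stripping described above can be sketched together. This is a simplified illustration, not the `GitURL` implementation: `parse_git_url`, `strip_git_suffix` and the regular expression are hypothetical, the example host is made up, and only scp-like SSH URLs and URLs with an explicit scheme are handled.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlparse

# scp-like syntax, e.g. "git@host:owner/name.git" (no "scheme://" prefix).
_SCP_LIKE = re.compile(
    r"^(?:(?P<username>[^@/]+)@)?(?P<hostname>[^:/]+):(?P<path>[^/].*)$"
)


def strip_git_suffix(repo_name: str) -> str:
    """Drop a trailing ``.git`` from a repository name."""
    if repo_name.endswith(".git"):
        return repo_name[: -len(".git")]
    return repo_name


def parse_git_url(href: str) -> Dict[str, Optional[str]]:
    """Split a Git URL into scheme, hostname, username, owner and name."""
    match = None if "://" in href else _SCP_LIKE.match(href)
    if match:
        scheme = "ssh"
        username = match["username"]
        hostname = match["hostname"]
        path = match["path"]
    else:
        parsed = urlparse(href)
        scheme = parsed.scheme
        username = parsed.username
        hostname = parsed.hostname
        path = parsed.path.lstrip("/")
    # Everything before the last "/" is treated as the owner (may be nested).
    owner, _, name = path.rpartition("/")
    return {
        "scheme": scheme,
        "hostname": hostname,
        "username": username,
        "owner": owner or None,
        "name": strip_git_suffix(name),
    }


if __name__ == "__main__":
    print(parse_git_url("git@gitlab.example.com:group/project.git"))
    print(parse_git_url("https://gitlab.example.com/group/sub/project.git"))
```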

Command Builder

Most renku commands require context (database/git/etc.) to be set up for them. The command builder pattern makes this easy by wrapping commands in factory methods.

Renku Command Builder.

class renku.command.command_builder.Command[source]

Base renku command builder.

__init__ of Command.

add_injection_pre_hook(order, hook)[source]

Add a pre-execution hook for dependency injection.

Parameters
  • order (int) – Determines the order of executed hooks, lower numbers get executed first.

  • hook (Callable) – The hook to add.

add_post_hook(order, hook)[source]

Add a post-execution hook.

Parameters
  • order (int) – Determines the order of executed hooks, higher numbers get executed first.

  • hook (Callable) – The hook to add.

add_pre_hook(order, hook)[source]

Add a pre-execution hook.

Parameters
  • order (int) – Determines the order of executed hooks, lower numbers get executed first.

  • hook (Callable) – The hook to add.

build()[source]

Build (finalize) the command.

Returns

Finalized command that cannot be modified.

Return type

Command
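The idea of a builder that can no longer be modified once built can be sketched with a small stand-in class. `MiniBuilder` is hypothetical and much simpler than the real `Command`; the use of `RuntimeError` and the rule that `execute()` requires a prior `build()` are assumptions of this sketch.

```python
class MiniBuilder:
    """Hypothetical stand-in showing a builder that locks after build()."""

    def __init__(self) -> None:
        self._operation = None
        self._finalized = False

    @property
    def finalized(self) -> bool:
        """Whether build() has been called."""
        return self._finalized

    def command(self, operation):
        # Configuration is only allowed while the builder is not finalized.
        if self._finalized:
            raise RuntimeError("Cannot modify a finalized command.")
        self._operation = operation
        return self  # returning self allows method chaining

    def build(self):
        if self._operation is None:
            raise RuntimeError("No operation set; call command() first.")
        self._finalized = True
        return self

    def execute(self, *args, **kwargs):
        # Assumption of this sketch: only a finalized command may run.
        if not self._finalized:
            raise RuntimeError("Call build() before execute().")
        return self._operation(*args, **kwargs)


if __name__ == "__main__":
    built = MiniBuilder().command(lambda a, b: a + b).build()
    print(built.finalized, built.execute(1, 2))
```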

command(operation)[source]

Set the wrapped command.

Parameters

operation (Callable) – The function to wrap in the command builder.

Returns

This command.

Return type

Command

execute(*args, **kwargs)[source]

Execute the wrapped operation.

First executes pre_hooks in ascending order, passing a read/write context between them. It then calls the wrapped operation. The result of the operation is then passed to all the post_hooks, in descending order. Finally, it returns the result, or the error if there was one.

Returns

Result of execution of command.

Return type

CommandResult
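The execution order described above (pre-hooks ascending, then the operation, then post-hooks descending) can be sketched with a simplified stand-in. `MiniCommand` is hypothetical: the hook signatures (a shared context dict for pre-hooks, context plus result for post-hooks) and the plain return value are assumptions, and error handling is omitted.

```python
from typing import Any, Callable, Dict, List, Tuple


class MiniCommand:
    """Hypothetical, simplified stand-in for the command builder."""

    def __init__(self) -> None:
        self._pre_hooks: List[Tuple[int, Callable]] = []
        self._post_hooks: List[Tuple[int, Callable]] = []
        self._operation: Callable = lambda *args, **kwargs: None

    def add_pre_hook(self, order: int, hook: Callable) -> "MiniCommand":
        self._pre_hooks.append((order, hook))
        return self

    def add_post_hook(self, order: int, hook: Callable) -> "MiniCommand":
        self._post_hooks.append((order, hook))
        return self

    def command(self, operation: Callable) -> "MiniCommand":
        self._operation = operation
        return self

    def execute(self, *args: Any, **kwargs: Any) -> Any:
        context: Dict[str, Any] = {}  # shared read/write state for hooks
        # Pre-hooks: lower order numbers run first.
        for _, hook in sorted(self._pre_hooks, key=lambda item: item[0]):
            hook(context)
        result = self._operation(*args, **kwargs)
        # Post-hooks: higher order numbers run first.
        for _, hook in sorted(
            self._post_hooks, key=lambda item: item[0], reverse=True
        ):
            hook(context, result)
        return result


if __name__ == "__main__":
    calls: List[str] = []
    cmd = (
        MiniCommand()
        .add_pre_hook(20, lambda ctx: calls.append("pre-20"))
        .add_pre_hook(10, lambda ctx: calls.append("pre-10"))
        .add_post_hook(10, lambda ctx, res: calls.append("post-10"))
        .add_post_hook(20, lambda ctx, res: calls.append("post-20"))
        .command(lambda x: calls.append("operation") or x * 2)
    )
    print(cmd.execute(21), calls)
```

Running post-hooks in reverse order lets setup and teardown nest: whatever was prepared first is cleaned up last.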

property finalized

Whether this builder is still being constructed or has been finalized.

lock_dataset()[source]

Acquire a lock for a dataset.

lock_project()[source]

Acquire a lock for the whole project.

require_clean()[source]

Check that the repository is clean.

require_migration()[source]

Check if a migration is needed.

track_std_streams()[source]

Whether to track STD streams or not.

Returns

This command.

Return type

Command

with_commit(message=None, commit_if_empty=False, raise_if_empty=False, commit_only=None)[source]

Create a commit.

Parameters
  • message (str, optional) – The commit message. Auto-generated if left empty (Default value = None).

  • commit_if_empty (bool, optional) – Whether to commit if there are no modified files (Default value = False).

  • raise_if_empty (bool, optional) – Whether to raise an exception if there are no modified files (Default value = False).

  • commit_only (bool, optional) – Only commit the supplied paths (Default value = None).

with_communicator(communicator)[source]

Create a communicator.

Parameters

communicator (CommunicationCallback) – Communicator to use for writing to user.

with_database(write=False, path=None, create=False)[source]

Provide an object database connection.

Parameters
  • write (bool, optional) – Whether or not to persist changes to the database (Default value = False).

  • path (str, optional) – Location of the database (Default value = None).

  • create (bool, optional) – Whether the database should be created if it doesn’t exist (Default value = False).

with_git_isolation()[source]

Whether to run in git isolation or not.

working_directory(directory)[source]

Set the working directory for the command.

Parameters

directory (str) – The working directory to work in.

Returns

This command.

Return type

Command