Repository API
This API is built on top of Git and Git-LFS.
Renku repository management.
- renku.core.management.RENKU_HOME = '.renku'
Project directory name.
Datasets
Dataset business logic.
- renku.core.dataset.dataset.create_dataset(name, client_dispatcher, title=None, description=None, creators=None, keywords=None, images=None, update_provenance=True, custom_metadata=None)
Create a dataset.
- Parameters
name (str) – Name of the dataset
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
title (Optional[str], optional) – Dataset title (Default value = None).
description (Optional[str], optional) – Dataset description (Default value = None).
creators (Optional[List[Person]], optional) – Dataset creators (Default value = None).
keywords (Optional[List[str]], optional) – Dataset keywords (Default value = None).
images (Optional[List[ImageRequestModel]], optional) – Dataset images (Default value = None).
update_provenance (bool, optional) – Whether to add this dataset to dataset provenance (Default value = True).
custom_metadata (Optional[Dict[str, Any]], optional) – Custom JSON-LD metadata (Default value = None).
- Returns
The created dataset.
- Return type
- renku.core.dataset.dataset.edit_dataset(name, title, description, creators, client_dispatcher, keywords=None, images=None, skip_image_update=False, custom_metadata=None)
Edit dataset metadata.
- Parameters
name (str) – Name of the dataset to edit
title (str) – New title for the dataset.
description (str) – New description for the dataset.
creators (List[Person]) – New creators for the dataset.
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
keywords (List[str], optional) – New keywords for dataset (Default value = None).
images (List[ImageRequestModel], optional) – New images for dataset (Default value = None).
skip_image_update (bool, optional) – Whether or not to skip updating dataset images (Default value = False).
custom_metadata (Dict, optional) – Custom JSON-LD metadata (Default value = None).
- Returns
True if updates were performed.
- Return type
- renku.core.dataset.dataset.export_dataset(name, provider_name, publish, tag, client_dispatcher, **kwargs)
Export data to a third-party provider.
- Parameters
name – Name of dataset to export.
provider_name – Provider to use for export.
publish – Whether to export as proper version or draft.
tag – Dataset tag from which to export.
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
- renku.core.dataset.dataset.file_unlink(name, include, exclude, client_dispatcher, yes=False)
Remove matching files from a dataset.
- Parameters
name – Dataset name.
include – Include filter for files.
exclude – Exclude filter for files.
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
yes – Whether to skip user confirmation or not (Default value = False).
- Returns
List of files that were removed.
- Return type
List[DynamicProxy]
- renku.core.dataset.dataset.filter_dataset_files(client_dispatcher, dataset_gateway, names=None, creators=None, include=None, exclude=None, ignore=None, immutable=False)
Filter dataset files by specified filters.
- Parameters
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
dataset_gateway (IDatasetGateway) – Injected dataset gateway.
names – Filter by specified dataset names. (Default value = None).
creators – Filter by creators. (Default value = None).
include – Include files matching file pattern. (Default value = None).
exclude – Exclude files matching file pattern. (Default value = None).
ignore – Ignored datasets. (Default value = None).
immutable – Return immutable copies of dataset objects. (Default value = False).
- Returns
List of filtered files sorted by date added.
- Return type
List[DynamicProxy]
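The include and exclude filters described above can be illustrated with a small standalone sketch. It is not the Renku implementation: the helper name filter_paths is hypothetical, and it assumes that the patterns are shell-style globs applied to plain path strings.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def filter_paths(
    paths: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths that match any include pattern and no exclude pattern.

    Hypothetical helper for illustration; glob semantics are an assumption.
    """
    selected = []
    for path in paths:
        # With no include patterns, every path is a candidate.
        if include and not any(fnmatch(path, pattern) for pattern in include):
            continue
        # Any matching exclude pattern removes the path.
        if exclude and any(fnmatch(path, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected


files = ["data/ds/a.csv", "data/ds/b.txt", "data/ds/raw/c.csv"]
csv_only = filter_paths(files, include=["*.csv"], exclude=["*/raw/*"])
```

Applying the exclude filter after the include filter means an excluded path is dropped even when it also matches an include pattern.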
- renku.core.dataset.dataset.import_dataset(uri, client_dispatcher, database_dispatcher, name='', extract=False, yes=False, previous_dataset=None, delete=False, gitlab_token=None)
Import data from a third-party provider or another Renku project.
- Parameters
uri – DOI or URL of dataset to import.
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
database_dispatcher (IDatabaseDispatcher) – Injected database dispatcher.
name – Name to give imported dataset (Default value = “”).
extract – Whether to extract compressed dataset data (Default value = False).
yes – Whether to skip user confirmation (Default value = False).
previous_dataset – Previously imported dataset version (Default value = None).
delete – Whether to delete files that don’t exist anymore (Default value = False).
gitlab_token – GitLab OAuth2 token (Default value = None).
- renku.core.dataset.dataset.list_dataset_files(client_dispatcher, datasets=None, creators=None, include=None, exclude=None)
List dataset files.
- Parameters
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
datasets – Datasets to list files for (Default value = None).
creators – Creators to filter by (Default value = None).
include – Include filters for file paths (Default value = None).
exclude – Exclude filters for file paths (Default value = None).
- Returns
Filtered dataset files.
- Return type
List[DynamicProxy]
- renku.core.dataset.dataset.move_files(client_dispatcher, dataset_gateway, files, to_dataset_name=None)
Move files and their metadata from one or more datasets to a target dataset.
- Parameters
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
dataset_gateway (IDatasetGateway) – Injected dataset gateway.
files (Dict[Path, Path]) – Files to move
to_dataset_name (Optional[str], optional) – Target dataset (Default value = None)
- renku.core.dataset.dataset.remove_dataset(name)
Delete a dataset.
- Parameters
name – Name of dataset to delete.
- renku.core.dataset.dataset.search_datasets(name)
Get all the datasets whose name starts with the given string.
- renku.core.dataset.dataset.set_dataset_images(client, dataset, images)
Set a dataset’s images.
- Parameters
client ("LocalClient") – The LocalClient.
dataset (Dataset) – The dataset to set images on.
images (List[ImageRequestModel]) – The images to set.
- Returns
True if images were set/modified.
- renku.core.dataset.dataset.show_dataset(name)
Show detailed dataset information.
- Parameters
name – Name of dataset to show details for.
- Returns
JSON dictionary of dataset details.
- Return type
- renku.core.dataset.dataset.update_dataset_custom_metadata(dataset, custom_metadata)
Update custom metadata on a dataset.
- Parameters
dataset (Dataset) – The dataset to update.
custom_metadata (Dict) – Custom JSON-LD metadata to set.
- renku.core.dataset.dataset.update_dataset_git_files(client_dispatcher, files, ref, delete, dry_run)
Update files and dataset metadata according to their remotes.
- Parameters
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
files (List[DynamicProxy]) – List of files to be updated.
ref (str) – Reference to use for update.
delete (bool, optional) – Indicates whether to delete files or not (Default value = False).
dry_run (bool) – Whether to perform update or only print changes.
- Returns
Tuple of updated and deleted file records.
- Return type
Tuple[List[DynamicProxy], List[DynamicProxy]]
- renku.core.dataset.dataset.update_dataset_local_files(client_dispatcher, records)
Update files metadata from the git history.
- Parameters
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
records (List[DynamicProxy]) – File records to update.
- Returns
Tuple of updated and deleted file records.
- Return type
Tuple[List[DynamicProxy], List[DynamicProxy]]
- renku.core.dataset.dataset.update_datasets(names, creators, include, exclude, ref, delete, no_external, update_all, dry_run, client_dispatcher, dataset_gateway)
Update dataset files.
- Parameters
names – Names of datasets to update.
creators – Creators to filter dataset files by.
include – Include filter for paths to update.
exclude – Exclude filter for paths to update.
ref – Git reference to use for update.
delete – Whether to delete files that don’t exist on remote anymore.
no_external – Whether to exclude external files from the update.
update_all – Whether to update all datasets.
dry_run – Whether to return a preview of what would be updated.
client_dispatcher (IClientDispatcher) – Injected client dispatcher.
dataset_gateway (IDatasetGateway) – Injected dataset gateway.
- renku.core.dataset.dataset.update_external_files(client, records, dry_run)
Update files linked to external storage.
- Parameters
client ("LocalClient") – The LocalClient.
records (List[DynamicProxy]) – File records to update.
dry_run (bool) – Whether to return a preview of what would be updated.
Dataset add business logic.
- class renku.core.dataset.dataset_add.AddAction(value)
Types of action when adding a file to a dataset.
- renku.core.dataset.dataset_add.add_data_to_dataset(dataset_name, urls, client_dispatcher, database_dispatcher, force=False, create=False, overwrite=False, sources=None, destination='', ref=None, external=False, extract=False, all_at_once=False, destination_names=None, repository=None, clear_files_before=False, total_size=None, with_metadata=None)
Import the data into the data directory.
- renku.core.dataset.dataset_add.move_files_to_dataset(client, files)
Copy/Move files into a dataset’s directory.
Dataset constants.
- renku.core.dataset.constant.CACHE = 'cache'
Directory to cache transient data.
- renku.core.dataset.constant.DATASET_IMAGES = 'dataset_images'
Directory for dataset images.
- renku.core.dataset.constant.POINTERS = 'pointers'
Directory for storing external pointer files.
- renku.core.dataset.constant.renku_dataset_images_path(client)
Return a Path instance of the Renku dataset metadata folder.
- renku.core.dataset.constant.renku_pointers_path(client)
Return a Path instance of the Renku pointer files folder.
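As a rough illustration of how such path helpers can be composed from the documented constants, the following sketch builds the folders with pathlib. It assumes that both folders live directly under the Renku home directory of the project, which the reference above does not state; the helper names and the project_root parameter are hypothetical stand-ins for the client argument.

```python
from pathlib import Path

# Values copied from the constants documented above.
RENKU_HOME = ".renku"
POINTERS = "pointers"
DATASET_IMAGES = "dataset_images"


def pointers_path(project_root: Path) -> Path:
    """Hypothetical helper: pointer files folder (assumed to be under RENKU_HOME)."""
    return project_root / RENKU_HOME / POINTERS


def dataset_images_path(project_root: Path) -> Path:
    """Hypothetical helper: dataset images folder (assumed to be under RENKU_HOME)."""
    return project_root / RENKU_HOME / DATASET_IMAGES


root = Path("my-project")
print(pointers_path(root))
print(dataset_images_path(root))
```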
Dataset context managers.
- class renku.core.dataset.context.DatasetContext(name, create=False, commit_database=False, creator=None)
Dataset context manager for metadata changes.
Pointer file business logic.
- renku.core.dataset.pointer_file.create_external_file(client, target, path, checksum=None)
Create a new external file.
- renku.core.dataset.pointer_file.create_pointer_file(client, target, checksum=None)
Create a new pointer file.
- renku.core.dataset.pointer_file.get_pointer_file(client_path, path)
Return pointer file from an external file.
- renku.core.dataset.pointer_file.is_external_file_updated(client_path, path)
Check if an update to an external file is available.
- renku.core.dataset.pointer_file.update_external_file(client, path, checksum)
Delete existing external file and create a new one.
Renku management dataset request models.
- class renku.core.dataset.request_model.ImageRequestModel(content_url, position, mirror_locally=False, safe_image_paths=None)
Model for passing image information to dataset use-cases.
Tag management for dataset.
- renku.core.dataset.tag.add_dataset_tag(dataset_name, tag, description='', force=False)
Adds a new tag to a dataset.
Checks whether the tag already exists and validates that it follows the same rules as Docker tags. See https://docs.docker.com/engine/reference/commandline/tag/ for documentation of the Docker tag syntax.
- Raises
errors.ParameterError – If tag is too long or contains invalid characters.
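The Docker tag rules referenced above can be checked with a short regular expression. The sketch below is a standalone approximation and not the validation code used by Renku: the function name validate_tag is hypothetical, and it raises the built-in ValueError as a stand-in for errors.ParameterError.

```python
import re

# Docker tag rule: letters, digits, underscores, periods and hyphens;
# must not start with a period or a hyphen; at most 128 characters.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")
MAX_TAG_LENGTH = 128


def validate_tag(tag: str) -> str:
    """Return the tag unchanged, or raise ValueError if it is invalid.

    ValueError is a stand-in for errors.ParameterError in this sketch.
    """
    if len(tag) > MAX_TAG_LENGTH:
        raise ValueError(f"Tag is too long ({len(tag)} > {MAX_TAG_LENGTH} characters)")
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"Tag contains invalid characters: {tag!r}")
    return tag


validate_tag("v1.0-rc_1")  # accepted
```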
- renku.core.dataset.tag.list_dataset_tags(dataset_name, format)
List all tags for a dataset.
- renku.core.dataset.tag.prompt_access_token(exporter)
Prompt user for an access token for a provider.
- Returns
The new access token
- renku.core.dataset.tag.remove_dataset_tags(dataset_name, tags)
Removes tags from a dataset.
Datasets Provenance.
Repository
Client for handling a local repository.
- class renku.core.management.repository.PathMixin(path=<function default_path>)
Define a default path attribute.
Method generated by attrs for class PathMixin.
- class renku.core.management.repository.RepositoryApiMixin(renku_home='.renku', parent=None, remote_cache=NOTHING, *, data_dir='data')
Client for handling a local repository.
Method generated by attrs for class RepositoryApiMixin.
- DATABASE_PATH = 'metadata'
Directory for metadata storage.
- DOCKERFILE = 'Dockerfile'
Name of the Dockerfile in the repository.
- LOCK_SUFFIX = '.lock'
Default suffix for Renku lock file.
- data_dir
Define a name of the folder for storing datasets.
- property database_path
Path to the metadata storage directory.
- property docker_path
Path to the Dockerfile.
- init_repository(force=False, user=None, initial_branch=None)
Initialize an empty Renku repository.
- property latest_agent
Returns latest agent version used in the repository.
- property lock
Create a Renku config lock.
- parent
Store a pointer to the parent repository.
- property project
Return the Project instance.
- property remote
Return host, owner and name of the remote if it exists.
- renku_home
Define a name of the Renku folder (Default value = ‘.renku’).
- renku_path
Store a Path instance of the Renku folder.
- property template_checksums
Return a Path instance to the template checksums file.
- property transaction_id
Get a transaction id for the current client to be used for grouping git commits.
Git Internals
Wrap Git client.
- class renku.core.management.git.GitCore
Wrap Git client.
Method generated by attrs for class GitCore.
- property candidate_paths
Return all paths in the index and untracked files.
- commit(commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True, skip_dirty_checks=False)
Automatic commit.
- property dirty_paths
Get paths of dirty files in the repository.
- property modified_paths
Return paths of modified files.
- renku.core.management.git.finalize_commit(client, diff_before, commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True)
Commit modified/added paths.
- renku.core.management.git.finalize_worktree(client, isolation, path, branch_name, delete, new_branch, merge_args=('--ff-only',), exception=None)
Cleanup and merge a previously created Git worktree.
- renku.core.management.git.get_mapped_std_streams(lookup_paths, streams=('stdin', 'stdout', 'stderr'))
Get a mapping of standard streams to given paths.
- renku.core.management.git.prepare_commit(client, commit_only=None, skip_dirty_checks=False)
Gather information about repo needed for committing later on.
- renku.core.management.git.prepare_worktree(original_client, path=None, branch_name=None, commit=None)
Set up a Git worktree to provide isolation.
Git utilities.
- class renku.domain_model.git.GitURL(href, path=None, scheme='ssh', hostname='localhost', username=None, password=None, port=None, owner=None, name=None, slug=None, regex=None)
Parser for common Git URLs.
Method generated by attrs for class GitURL.
- property image
Return image name.
- property instance_url
Get the url of the git instance.
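To give a feel for what parsing common Git URLs involves, here is a minimal standalone sketch that splits an href into some of the fields listed in the GitURL signature. It does not use the regular expressions of the actual class: the function parse_git_url, the SCP_LIKE pattern, and the choice to treat everything before the last path segment as the owner are assumptions made for illustration.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlparse

# scp-like syntax such as git@host:owner/name.git (hypothetical pattern).
SCP_LIKE = re.compile(
    r"^(?:(?P<username>[^@/]+)@)?(?P<hostname>[^:/]+):(?P<path>[^/].*)$"
)


def parse_git_url(href: str) -> Dict[str, Optional[str]]:
    """Split a Git URL into scheme, hostname, username, owner and name."""
    match = None if "://" in href else SCP_LIKE.match(href)
    if match:
        # scp-like URLs are always accessed over SSH.
        scheme, hostname = "ssh", match["hostname"]
        username, path = match["username"], match["path"]
    else:
        parts = urlparse(href)
        scheme, hostname = parts.scheme, parts.hostname
        username, path = parts.username, parts.path.lstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Assumption: the last segment is the name, the rest is the owner.
    owner, _, name = path.rpartition("/")
    return {
        "scheme": scheme,
        "hostname": hostname,
        "username": username,
        "owner": owner,
        "name": name,
    }


ssh_url = parse_git_url("git@gitlab.example.com:group/project.git")
https_url = parse_git_url("https://gitlab.example.com/group/sub/project.git")
```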
Command Builder
Most renku commands require context (database/git/etc.) to be set up for them. The command builder pattern makes this easy by wrapping commands in factory methods.
Renku Command Builder.
- class renku.command.command_builder.Command
Base renku command builder.
__init__ of Command.
- add_injection_pre_hook(order, hook)
Add a pre-execution hook for dependency injection.
- Parameters
order (int) – Determines the order of executed hooks, lower numbers get executed first.
hook (Callable) – The hook to add.
- add_post_hook(order, hook)
Add a post-execution hook.
- Parameters
order (int) – Determines the order of executed hooks, higher numbers get executed first.
hook (Callable) – The hook to add.
- add_pre_hook(order, hook)
Add a pre-execution hook.
- Parameters
order (int) – Determines the order of executed hooks, lower numbers get executed first.
hook (Callable) – The hook to add.
- build()
Build (finalize) the command.
- Returns
Finalized command that cannot be modified.
- Return type
- command(operation)
Set the wrapped command.
- Parameters
operation (Callable) – The function to wrap in the command builder.
- Returns
This command.
- Return type
- execute(*args, **kwargs)
Execute the wrapped operation.
First executes the pre_hooks in ascending order, passing a read/write context between them. It then calls the wrapped operation. The result of the operation is then passed to all the post_hooks, in descending order. Finally, it returns the result, or the error if there was one.
- Returns
Result of execution of command.
- Return type
CommandResult
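The hook ordering described for execute can be sketched with a toy builder. This is a simplified model and not the Renku implementation: the class MiniCommand is hypothetical, the hook signatures (a context dictionary for pre-hooks, context plus result for post-hooks) are assumptions, and error capture and the CommandResult wrapper are left out.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class MiniCommand:
    """Toy builder illustrating hook ordering (hypothetical, simplified)."""

    def __init__(self) -> None:
        self._operation: Optional[Callable[..., Any]] = None
        self._pre_hooks: List[Tuple[int, Callable[[Dict[str, Any]], None]]] = []
        self._post_hooks: List[Tuple[int, Callable[[Dict[str, Any], Any], None]]] = []
        self._finalized = False

    def command(self, operation: Callable[..., Any]) -> "MiniCommand":
        self._operation = operation
        return self

    def add_pre_hook(self, order: int, hook: Callable[[Dict[str, Any]], None]) -> None:
        if self._finalized:
            raise RuntimeError("Cannot modify a finalized command.")
        self._pre_hooks.append((order, hook))

    def add_post_hook(self, order: int, hook: Callable[[Dict[str, Any], Any], None]) -> None:
        if self._finalized:
            raise RuntimeError("Cannot modify a finalized command.")
        self._post_hooks.append((order, hook))

    def build(self) -> "MiniCommand":
        self._finalized = True
        return self

    def execute(self, *args: Any, **kwargs: Any) -> Any:
        if self._operation is None:
            raise RuntimeError("No operation was set with command().")
        context: Dict[str, Any] = {}
        # Pre-hooks run in ascending order and share one mutable context.
        for _, hook in sorted(self._pre_hooks, key=lambda item: item[0]):
            hook(context)
        result = self._operation(*args, **kwargs)
        # Post-hooks receive the result and run in descending order.
        for _, hook in sorted(self._post_hooks, key=lambda item: item[0], reverse=True):
            hook(context, result)
        return result


calls: List[str] = []
cmd = MiniCommand().command(lambda x: x * 2)
cmd.add_pre_hook(10, lambda ctx: calls.append("pre-10"))
cmd.add_pre_hook(5, lambda ctx: calls.append("pre-5"))
cmd.add_post_hook(5, lambda ctx, res: calls.append("post-5"))
cmd.add_post_hook(10, lambda ctx, res: calls.append("post-10"))
result = cmd.build().execute(21)
```

Running post-hooks in descending order mirrors the pre-hooks, so that setup and teardown steps registered with the same order value nest like a stack.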
- property finalized
Whether this builder is still being constructed or has been finalized.
- with_commit(message=None, commit_if_empty=False, raise_if_empty=False, commit_only=None)
Create a commit.
- Parameters
message (str, optional) – The commit message. Auto-generated if left empty (Default value = None).
commit_if_empty (bool, optional) – Whether to commit if there are no modified files (Default value = False).
raise_if_empty (bool, optional) – Whether to raise an exception if there are no modified files (Default value = False).
commit_only (bool, optional) – Only commit the supplied paths (Default value = None).
- with_communicator(communicator)
Create a communicator.
- Parameters
communicator (CommunicationCallback) – Communicator to use for writing to user.