Gateways

Renku uses several gateways to abstract away dependencies on external systems such as the database or git.

Interfaces

Interfaces that the Gateways implement.

Renku activity gateway interface.

class renku.core.interface.activity_gateway.IActivityGateway[source]

Bases: ABC

Interface for the ActivityGateway.

add(activity)[source]

Add an Activity to storage.

add_activity_collection(activity_collection)[source]

Add an ActivityCollection to storage.

get_activities_by_generation(path, checksum=None)[source]

Return the list of all activities that generate a path.

get_activities_by_usage(path, checksum=None)[source]

Return the list of all activities that use a path.

get_all_activities(include_deleted=False)[source]

Get all activities in the project.

get_all_activity_collections()[source]

Get all activity collections in the project.

get_all_generation_paths()[source]

Return all generation paths.

get_all_usage_paths()[source]

Return all usage paths.

get_by_id(id)[source]

Get an activity by id.

get_downstream_activities(activity, max_depth=None)[source]

Get downstream activities that depend on this activity.

get_downstream_activity_chains(activity)[source]

Get a list of tuples of all downstream paths of this activity.

get_upstream_activities(activity, max_depth=None)[source]

Get upstream activities that this activity depends on.

get_upstream_activity_chains(activity)[source]

Get a list of tuples of all upstream paths of this activity.

remove(activity, keep_reference=True, force=False)[source]

Remove an activity from the storage.

Parameters:
  • activity (Activity) – The activity to be removed.

  • keep_reference (bool) – Whether to keep the activity in the activities index or not.

  • force (bool) – Force-delete the activity even if it has downstream activities.

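
The upstream and downstream queries above can be pictured as a depth-limited traversal of the activity graph. The following is a minimal sketch under stated assumptions, not Renku's implementation: the graph is a plain dict of hypothetical activity ids, and the helper name `downstream` is invented for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def downstream(graph: Dict[str, List[str]], start: str, max_depth: Optional[int] = None) -> Set[str]:
    """Collect every node reachable from start, optionally limited to max_depth hops."""
    seen: Set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        # Stop expanding once the depth limit is reached.
        if max_depth is not None and depth >= max_depth:
            continue
        for child in graph.get(node, []):
            if child not in seen and child != start:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


# Hypothetical activity ids: a -> b -> c, and a -> d
graph = {"a": ["b", "d"], "b": ["c"]}
direct = downstream(graph, "a", max_depth=1)   # only immediate dependents
everything = downstream(graph, "a")            # no depth limit
```

An upstream query would be the same traversal run over the reversed edges.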
Renku database gateway interface.

class renku.core.interface.database_gateway.IDatabaseGateway[source]

Bases: ABC

Gateway interface for basic database operations.

commit()[source]

Commit changes to database.

get_modified_objects_from_revision(revision_or_range)[source]

Get all database objects modified in a revision.

initialize()[source]

Initialize the database.

Renku dataset gateway interface.

class renku.core.interface.dataset_gateway.IDatasetGateway[source]

Bases: ABC

Interface for the DatasetGateway.

add_or_remove(dataset)[source]

Add or remove a dataset.

add_tag(dataset, tag)[source]

Add a tag to a dataset.

get_all_active_datasets()[source]

Get all datasets.

get_all_tags(dataset)[source]

Return the list of all tags for a dataset.

get_by_id(id)[source]

Get a dataset by id.

get_by_slug(slug)[source]

Get a dataset by slug.

get_provenance_tails()[source]

Return the provenance for all datasets.

remove_tag(dataset, tag)[source]

Remove a tag from a dataset.

External storage interface.

class renku.core.interface.storage.FileHash(uri, path, size, hash)[source]

Bases: object

The hash for a file at a specific location.

class renku.core.interface.storage.IStorage(storage_scheme, provider, credentials, provider_configuration)[source]

Bases: ABC

Interface for the external storage handler.

property credentials

Return the provider credentials for this storage handler.

abstract download(uri, destination)[source]

Download data from uri to destination.

abstract exists(uri)[source]

Checks if a remote storage URI exists.

abstract get_configurations()[source]

Get required configurations to access the storage.

abstract get_hashes(uri, hash_type='md5')[source]

Get the hashes of all files at the uri.

abstract is_directory(uri)[source]

Return True if URI points to a directory.

abstract mount(path)[source]

Mount the provider’s URI to the given path.

property provider

Return the dataset provider for this storage handler.

property storage_scheme

Storage’s URI scheme.

abstract upload(source, uri)[source]

Upload data from source to uri.

class renku.core.interface.storage.IStorageFactory[source]

Bases: ABC

Interface to get a cloud storage.

abstract static get_storage(storage_scheme, provider, credentials, configuration)[source]

Return a storage that handles provider.

Parameters:
  • storage_scheme (str) – Storage name.

  • provider (CloudStorageProviderType) – The backend provider.

  • credentials (ProviderCredentials) – Credentials for the provider.

  • configuration (Dict[str, str]) – Storage-specific configuration that is passed to the IStorage implementation.

Returns:

An instance of IStorage.

Renku plan gateway interface.

class renku.core.interface.plan_gateway.IPlanGateway[source]

Bases: ABC

Interface for the PlanGateway.

add(plan)[source]

Add a plan to the database.

get_all_plans()[source]

Get all plans in project.

get_by_id(id)[source]

Get a plan by id.

get_by_name(name)[source]

Get a plan by name.

get_by_name_or_id(name_or_id)[source]

Get a plan by name or id.

get_newest_plans_by_names(include_deleted=False)[source]

Return a mapping of all plan names to their newest plans.

list_by_name(starts_with, ends_with=None)[source]

Search plans by name.

Renku project gateway interface.

class renku.core.interface.project_gateway.IProjectGateway[source]

Bases: ABC

Interface for the ProjectGateway.

get_project()[source]

Get project metadata.

update_project(project)[source]

Update project metadata.
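
To illustrate why calling code depends on an interface rather than on a concrete gateway, here is a minimal sketch. All names in it (`Project`, `ProjectGatewayInterface`, `InMemoryProjectGateway`, `rename_project`) are hypothetical stand-ins and not Renku classes; only the two method names mirror the interface above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project metadata record."""
    name: str
    description: str = ""


class ProjectGatewayInterface(ABC):
    """Stand-in for an interface such as IProjectGateway."""

    @abstractmethod
    def get_project(self) -> Project:
        ...

    @abstractmethod
    def update_project(self, project: Project) -> None:
        ...


class InMemoryProjectGateway(ProjectGatewayInterface):
    """Toy implementation that keeps the project in memory instead of a database."""

    def __init__(self, project: Project) -> None:
        self._project = project

    def get_project(self) -> Project:
        return self._project

    def update_project(self, project: Project) -> None:
        self._project = project


def rename_project(gateway: ProjectGatewayInterface, new_name: str) -> Project:
    """Business logic that only knows about the interface."""
    current = gateway.get_project()
    updated = Project(name=new_name, description=current.description)
    gateway.update_project(updated)
    return updated


gateway = InMemoryProjectGateway(Project(name="old", description="demo"))
renamed = rename_project(gateway, "new")
```

Because `rename_project` never names a concrete class, a database-backed gateway could be swapped in without touching it.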

Implementations

Implementation of Gateway interfaces.

Renku activity database gateway implementation.

class renku.infrastructure.gateway.activity_gateway.ActivityGateway[source]

Bases: IActivityGateway

Gateway for activity database operations.

add(activity)[source]

Add an Activity to storage.

add_activity_collection(activity_collection)[source]

Add an ActivityCollection to storage.

get_activities_by_generation(path, checksum=None)[source]

Return the list of all activities that generate a path.

get_activities_by_usage(path, checksum=None)[source]

Return the list of all activities that use a path.

get_all_activities(include_deleted=False)[source]

Get all activities in the project.

get_all_activity_collections()[source]

Get all activity collections in the project.

get_all_generation_paths()[source]

Return all generation paths.

get_all_usage_paths()[source]

Return all usage paths.

get_by_id(id)[source]

Get an activity by id.

get_downstream_activities(activity, max_depth=None)[source]

Get downstream activities that depend on this activity.

get_downstream_activity_chains(activity)[source]

Get a list of tuples of all downstream paths of this activity.

get_upstream_activities(activity, max_depth=None)[source]

Get upstream activities that this activity depends on.

get_upstream_activity_chains(activity)[source]

Get a list of tuples of all upstream paths of this activity.

remove(activity, keep_reference=True, force=False)[source]

Remove an activity from the storage.

Parameters:
  • activity (Activity) – The activity to be removed.

  • keep_reference (bool) – Whether to keep the activity in the activities index or not.

  • force (bool) – Force-delete the activity even if it has downstream activities.

renku.infrastructure.gateway.activity_gateway.reindex_catalog(database)[source]

Clear and re-create database’s activity-catalog and its relations.

Renku generic database gateway implementation.

class renku.infrastructure.gateway.database_gateway.ActivityDownstreamRelation(downstream, upstream)[source]

Bases: object

Implementation of Downstream interface.

class renku.infrastructure.gateway.database_gateway.DatabaseGateway[source]

Bases: IDatabaseGateway

Gateway for base database operations.

commit()[source]

Commit changes to database.

get_modified_objects_from_revision(revision_or_range)[source]

Get all database objects modified in a revision.

initialize()[source]

Initialize the database.

renku.infrastructure.gateway.database_gateway.dump_activity(activity, catalog, cache)[source]

Get storage token for an activity.

renku.infrastructure.gateway.database_gateway.dump_downstream_relations(relation, catalog, cache)[source]

Dump relation entry to database.

renku.infrastructure.gateway.database_gateway.initialize_database(database)[source]

Initialize an empty database with all required metadata.

renku.infrastructure.gateway.database_gateway.load_activity(token, catalog, cache)[source]

Load activity from storage token.

renku.infrastructure.gateway.database_gateway.load_downstream_relations(token, catalog, cache)[source]

Load relation entry from database.

Renku dataset gateway implementation.

class renku.infrastructure.gateway.dataset_gateway.DatasetGateway[source]

Bases: IDatasetGateway

Gateway for dataset database operations.

add_or_remove(dataset)[source]

Add or remove a dataset.

add_tag(dataset, tag)[source]

Add a tag to a dataset.

get_all_active_datasets()[source]

Return all datasets.

get_all_tags(dataset)[source]

Return the list of all tags for a dataset.

get_by_id(id)[source]

Get a dataset by id.

get_by_slug(slug)[source]

Get a dataset by slug.

get_provenance_tails()[source]

Return the provenance for all datasets.

remove_tag(dataset, tag)[source]

Remove a tag from a dataset.

Storage factory implementation.

class renku.infrastructure.storage.factory.StorageFactory[source]

Bases: IStorageFactory

Return an external storage.

static get_storage(storage_scheme, provider, credentials, configuration)[source]

Return a storage that handles provider.

Parameters:
  • storage_scheme (str) – Storage name.

  • provider (CloudStorageProviderType) – The backend provider.

  • credentials (ProviderCredentials) – Credentials for the provider.

  • configuration (Dict[str, str]) – Storage-specific configuration that is passed to the IStorage implementation.

Returns:

An instance of IStorage.

Base storage handler.

class renku.infrastructure.storage.rclone.RCloneStorage(storage_scheme, provider, credentials, provider_configuration)[source]

Bases: IStorage

External storage implementation that uses RClone.

download(uri, destination)[source]

Download data from uri to destination.

exists(uri)[source]

Checks if a remote storage URI exists.

get_configurations()[source]

Get required configurations for rclone to access the storage.

get_hashes(uri, hash_type='md5')[source]

Download hashes with rclone and parse them.

Returns a tuple containing a list of parsed hashes.

Parameters:
  • uri (str) – Provider uri.

  • hash_type (str) – Type of hash to get from rclone (Default value = md5).

Example

hashes_raw json:

[
    {
        "Path":"resources/hg19.window.masker.bed.gz.tbi","Name":"hg19.window.masker.bed.gz.tbi",
        "Size":578288,"MimeType":"application/x-gzip","ModTime":"2022-02-07T18:45:52.000000000Z",
        "IsDir":false,"Hashes":{"md5":"e93ac5364e7799bbd866628d66c7b773"},"Tier":"STANDARD"
    }
]
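
As a rough illustration of how a listing like the one above could be turned into per-file hash records, the sketch below parses the JSON with the standard library. It is not RCloneStorage's actual parser: `ParsedHash`, `parse_hashes`, and the base URI `s3://bucket` are hypothetical, and the record fields are only modeled on FileHash(uri, path, size, hash).

```python
import json
from typing import List, NamedTuple, Optional


class ParsedHash(NamedTuple):
    """Hypothetical record modeled on FileHash(uri, path, size, hash)."""
    uri: str
    path: str
    size: int
    hash: Optional[str]


def parse_hashes(hashes_raw: str, base_uri: str, hash_type: str = "md5") -> List[ParsedHash]:
    """Convert an rclone-style JSON listing into a list of ParsedHash records."""
    result = []
    for entry in json.loads(hashes_raw):
        if entry.get("IsDir"):
            continue  # skip directories, which carry no content hash here
        result.append(
            ParsedHash(
                uri=f"{base_uri.rstrip('/')}/{entry['Path']}",
                path=entry["Path"],
                size=entry["Size"],
                hash=entry.get("Hashes", {}).get(hash_type),
            )
        )
    return result


hashes_raw = """
[
    {
        "Path":"resources/hg19.window.masker.bed.gz.tbi","Name":"hg19.window.masker.bed.gz.tbi",
        "Size":578288,"MimeType":"application/x-gzip","ModTime":"2022-02-07T18:45:52.000000000Z",
        "IsDir":false,"Hashes":{"md5":"e93ac5364e7799bbd866628d66c7b773"},"Tier":"STANDARD"
    }
]
"""
parsed = parse_hashes(hashes_raw, "s3://bucket")
```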

is_directory(uri)[source]

Return True if URI points to a directory.

NOTE: This returns True for non-existing paths on bucket-based backends like S3 since listing non-existing paths won’t fail and there is no way to distinguish between empty directories and non-existing paths.

list_files(uri, *args, **kwargs)[source]

List a URI and return results in JSON format.

mount(path)[source]

Mount the provider’s URI to the given path.

run_command(command, *args, **kwargs)[source]

Run a RClone command with storage-specific configuration.

run_command_with_uri(command, uri, *args, **kwargs)[source]

Run a RClone command by converting a given URI.

upload(source, uri)[source]

Upload data from source to uri.

renku.infrastructure.storage.rclone.get_rclone_env_var_name(provider_name, name)[source]

Get name of an RClone env var config.
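
rclone reads remote settings from environment variables named RCLONE_CONFIG_<REMOTE>_<OPTION>, all uppercase. The sketch below builds such names, assuming the function above follows that convention; the helper name `rclone_env_var_name` and the remote name `s3` are illustrative only.

```python
from typing import Dict


def rclone_env_var_name(remote_name: str, option: str) -> str:
    """Build an rclone config variable name: RCLONE_CONFIG_<REMOTE>_<OPTION>."""
    return f"RCLONE_CONFIG_{remote_name.upper()}_{option.upper()}"


def build_env(remote_name: str, options: Dict[str, str]) -> Dict[str, str]:
    """Map plain option names to the environment variables rclone would read."""
    return {rclone_env_var_name(remote_name, key): value for key, value in options.items()}


name = rclone_env_var_name("s3", "access_key_id")
env = build_env("s3", {"type": "s3", "region": "eu-west-1"})
```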

renku.infrastructure.storage.rclone.run_rclone_command(command, *args, env=None, **kwargs)[source]

Execute an RClone command.

renku.infrastructure.storage.rclone.transform_args(*args)[source]

Transforms args to command line args.

renku.infrastructure.storage.rclone.transform_kwargs(**kwargs)[source]

Transforms kwargs to command line args.

Renku plan database gateway implementation.

class renku.infrastructure.gateway.plan_gateway.PlanGateway[source]

Bases: IPlanGateway

Gateway for plan database operations.

add(plan)[source]

Add a plan to the database.

get_all_plans()[source]

Get all plans in project.

get_by_id(id)[source]

Get a plan by id.

get_by_name(name)[source]

Get a plan by name.

get_by_name_or_id(name_or_id)[source]

Get a plan by name or id.

get_newest_plans_by_names(include_deleted=False)[source]

Return a mapping of all plan names to their newest plans.

list_by_name(starts_with, ends_with=None)[source]

Search plans by name.

Renku project gateway implementation.

class renku.infrastructure.gateway.project_gateway.ProjectGateway[source]

Bases: IProjectGateway

Gateway for project database operations.

get_project()[source]

Get project metadata.

update_project(project)[source]

Update project metadata.

Repository

Renku uses git repositories for tracking changes. To abstract away git internals, we delegate all git calls to the Repository class.

An abstraction layer for the underlying VCS.

class renku.infrastructure.repository.Actor(name, email)[source]

Bases: NamedTuple

Author/creator of a commit.

Create new instance of Actor(name, email)

email

Alias for field number 1

name

Alias for field number 0

class renku.infrastructure.repository.BaseRepository(path='.', repository=None)[source]

Bases: object

Abstract Base repository.

property active_branch

Return current checked out branch.

add(*paths, force=False, all=False)[source]

Add a list of files to be committed to the VCS.

add_ignored_pattern(pattern)[source]

Add the pattern to the .gitignore file.

property all_files

Return absolute paths of all files in the index and untracked files.

property branches

Return all branches.

checkout(reference=None, sparse=None)[source]

Check-out a specific reference.

clean(paths=None)[source]

Remove untracked files.

close()[source]

Close the underlying repository.

Cleans up dangling processes.

commit(message, *, amend=False, author=None, committer=None, no_verify=False, no_edit=False, paths=None)[source]

Commit added files to the VCS.

contains(path)[source]

Return True if path is tracked in the repository.

copy_content_to_file(path, *, revision=None, checksum=None, output_path=None, apply_filters=True)[source]

Get content of an object using its checksum, write it to a file, and return the file’s path.

Parameters:
  • path (Union[Path, str]) – Relative or absolute path to the file.

  • revision (Optional[Union[Reference, str]]) – A commit/branch/tag to get the file from. This cannot be passed with checksum.

  • checksum (Optional[str]) – Git hash of the file to be retrieved. This cannot be passed with revision.

  • output_path (Optional[Union[Path, str]]) – A path to copy the content to. A temporary file is created if it is None.

  • apply_filters (bool) – Whether to apply Git filters to the retrieved object. Note that apply_filters still works if the repository was cloned with --skip-smudge or if GIT_LFS_SKIP_SMUDGE is set. It also works if there is no entry for the file in .gitattributes (e.g. when a file was deleted). The reason is that, when this option is passed, the git lfs smudge command is used to get the file content and GIT_LFS_SKIP_SMUDGE is disabled.

Returns:

The path to the created file.

create_worktree(path, reference, branch=None, checkout=True, detach=False)[source]

Create a git worktree.

Parameters:
  • path (Path) – Target folder.

  • reference (Union[Branch, Commit, Reference, str]) – the reference to base the tree on.

  • branch (str, optional) – Optional new branch to create in the worktree.

  • checkout (bool, optional) – Whether to perform a checkout of the reference (Default value = True).

  • detach (bool, optional) – Whether to detach HEAD in worktree (Default value = False).

fetch(remote=None, refspec=None, all=False, tags=False, unshallow=False, depth=None)[source]

Update remote branches.

property files

Return a list of all files in the current version of the repository.

get_attributes(*paths)[source]

Return a map from paths to their attributes.

NOTE: Dict keys are the same relative or absolute path as inputs.

get_commit(revision)[source]

Return Commit with the provided sha.

get_configuration(writable=False, scope=None)[source]

Return git configuration.

NOTE: Scope can be “global” or “local”.

get_content(path, *, revision=None, checksum=None, binary=False)[source]

Get content of a file in a given revision as text or binary.

get_existing_paths_in_revision(paths=None, revision='HEAD')[source]

List all paths that exist in a revision.

static get_global_configuration(writable=False)[source]

Return global git configuration.

static get_global_user()[source]

Return the global git user.

get_historical_changes_patch(path)[source]

Return a patch of all changes to a file.

get_ignored_paths(*paths)[source]

Return ignored paths matching .gitignore file.

NOTE: This function returns paths in the same form as its inputs: if an input is an absolute path, the output is an absolute path, and the same is true for relative paths.

NOTE: Relative paths should be relative to the current working directory and not the repository’s root.

get_object_hash(path, revision=None)[source]

Return git hash of an object in a Repo or its submodule.

NOTE: path must be relative to the repo’s root regardless if this function is called from a subdirectory or not.

get_object_hashes(paths, revision=None)[source]

Return git hashes of objects in a Repo or its submodule.

NOTE: paths must be relative to the repo’s root regardless of whether this function is called from a subdirectory or not.

get_previous_commit(path, revision=None, first=False, full_history=True, submodule=False)[source]

Return a previous commit for a given path starting from revision.

get_raw_content(*, path, revision=None, checksum=None)[source]

Get raw content of a file in a given revision as text without applying any filter on it.

get_revisions_paths(*checksums)[source]

Return a revision:path tuple for each checksum so that revision contains the given blob with the checksum.

get_sizes(*checksums)[source]

Return size of blobs given their checksum.

get_user()[source]

Return the local/global git user.

static hash_object(path)[source]

Create a git hash for a path. The path doesn’t need to be in a repository.

static hash_objects(paths)[source]

Create a git hash for a list of paths. The paths don’t need to be in a repository.

static hash_string(content)[source]

Calculate the object-hash for a blob with specified content.
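
For reference, git computes a blob's object hash as the SHA-1 of a `blob <size>` header, a NUL byte, and the content. The sketch below reproduces that with the standard library; it assumes a SHA-1 repository, and the function name `git_blob_hash` is illustrative rather than part of Renku.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the git object hash of a blob with the given content (SHA-1 repositories)."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


empty = git_blob_hash(b"")
greeting = git_blob_hash(b"hello world\n")
```

The value for the empty blob matches what `git hash-object` reports for an empty file, which makes it a convenient sanity check.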

property head

HEAD of the repository.

is_dirty(untracked_files=True)[source]

Return True if the repository has modified or untracked files ignoring submodules.

is_valid()[source]

Return True if a valid repository exists.

iterate_commits(*paths, revision=None, reverse=False, full_history=False, max_count=-1)[source]

Return a list of commits.

property lfs

Return a Git LFS manager.

move(*sources, destination, force=False)[source]

Move source files to the destination.

property path

Absolute path to the repository’s root.

pull(remote=None, refspec=None)[source]

Update changes from remotes.

push(remote=None, refspec=None, *, no_verify=False, set_upstream=False, delete=False, force=False)[source]

Push local changes to a remote repository.

property remotes

Return all remotes.

remove(*paths, index=False, not_exists_ok=False, recursive=False, force=False)[source]

Remove paths from repository or index.

remove_worktree(path)[source]

Remove a git worktree.

Parameters:

path (Path) – Worktree folder.

reset(reference=None, hard=False)[source]

Reset a git repository to a given reference.

run_git_command(command, *args, **kwargs)[source]

Run a git command in this repository.

property staged_changes

Return a list of staged changes.

NOTE: This can be implemented by git diff --cached --name-status -z.
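
With `-z`, git separates fields with NUL bytes: each entry is a status followed by a path, and renames or copies carry a status such as `R100` followed by the old and new paths. A minimal parsing sketch, assuming the command output has already been captured as a string; the function name and sample output are hypothetical:

```python
from typing import List, Tuple


def parse_name_status(output: str) -> List[Tuple[str, str]]:
    """Parse `git diff --cached --name-status -z` output into (status, path) pairs."""
    fields = [field for field in output.split("\0") if field]
    changes = []
    index = 0
    while index < len(fields):
        status = fields[index]
        if status[0] in ("R", "C"):
            # Renames and copies list the old path first, then the new path.
            changes.append((status[0], fields[index + 2]))
            index += 3
        else:
            changes.append((status, fields[index + 1]))
            index += 2
    return changes


sample = "M\0README.md\0A\0data/new.csv\0R100\0old.txt\0new.txt\0"
changes = parse_name_status(sample)
```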

status()[source]

Return status of a repository.

property submodules

Return a list of submodules.

property tags

Return all available tags.

property unmerged_blobs

Return a map of path to stage and blob for unmerged blobs in the current index.

property unstaged_changes

Return a list of changes that are not staged.

property untracked_files

Return the list of untracked files.

class renku.infrastructure.repository.Branch(repository, path)[source]

Bases: Reference

A git branch.

classmethod from_head(repository, head)[source]

Create an instance from a git.Head.

property remote_branch

Return the remote branch if any.

class renku.infrastructure.repository.BranchManager(repository)[source]

Bases: object

Manage branches of a Repository.

add(name)[source]

Add a new branch.

remove(branch, force=False, remote=False)[source]

Remove an existing branch.

class renku.infrastructure.repository.Commit(repository, commit)[source]

Bases: object

A VCS commit.

property author

Author of the commit.

property authored_datetime

Commit authored date.

property committed_datetime

Commit date.

property committer

Committer of the commit.

compare_to(other)[source]

Return -1 if self is made before other.

classmethod from_commit(repository, commit)[source]

Create an instance from a git Commit object.

get_changes(*paths, commit=None, patch=False)[source]

Return list of changes in a commit.

NOTE: This function can be implemented with git diff-tree.

NOTE: When patch is False, Diff.diff will be empty. We need to call Commit.diff twice when patch is True because GitPython won’t set Diff.change_type in this case.

property hexsha

Commit sha.

property message

Commit message.

property parents

List of commit parents.

property root

Return True if this commit is the root commit.

traverse()[source]

Traverse over all objects that are present in this commit.

property tree

Return all objects in the commit’s tree.

class renku.infrastructure.repository.Configuration(repository=None, scope=None, writable=True)[source]

Bases: object

Git configuration manager.

get_value(section, option, default=None)[source]

Return a config value.

has_section(section)[source]

Return True if the config file has the given section.

remove_value(section, option)[source]

Remove a config entry.

set_value(section, option, value=None)[source]

Set a config value.

class renku.infrastructure.repository.Diff(a_path, b_path, change_type, diff)[source]

Bases: NamedTuple

A single diff object between two trees.

Create new instance of Diff(a_path, b_path, change_type, diff)

a_path

Alias for field number 0

property added

True if file was added.

b_path

Alias for field number 1

change_type

Alias for field number 2

property deleted

True if file was deleted.

diff

Alias for field number 3

classmethod from_diff(diff)[source]

Create an instance from a git object.
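
The shape of Diff, a named tuple with positional fields plus convenience properties, can be sketched as follows. This is an illustration only: `ChangeType`, its member names and values, and `FileDiff` are assumptions and do not reproduce DiffChangeType or Diff.

```python
from enum import Enum
from typing import NamedTuple


class ChangeType(Enum):
    """Hypothetical change types; the real enum members may differ."""
    ADDED = "A"
    DELETED = "D"
    MODIFIED = "M"


class FileDiff(NamedTuple):
    """Hypothetical stand-in for Diff(a_path, b_path, change_type, diff)."""
    a_path: str
    b_path: str
    change_type: ChangeType
    diff: str = ""

    @property
    def added(self) -> bool:
        return self.change_type is ChangeType.ADDED

    @property
    def deleted(self) -> bool:
        return self.change_type is ChangeType.DELETED


entry = FileDiff("data.csv", "data.csv", ChangeType.ADDED)
first_field = entry[0]  # positional access mirrors "Alias for field number 0"
```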

class renku.infrastructure.repository.DiffChangeType(value)[source]

Bases: Enum

Type of change in a Diff.

class renku.infrastructure.repository.DiffLine(text, change_type)[source]

Bases: NamedTuple

A single line in a patch.

Create new instance of DiffLine(text, change_type)

property added

True if line was added.

change_type

Alias for field number 1

property deleted

True if line was deleted.

text

Alias for field number 0

class renku.infrastructure.repository.DiffLineChangeType(value)[source]

Bases: Enum

Type of change in a DiffLine.

class renku.infrastructure.repository.LFS(repository)[source]

Bases: object

Git LFS manager.

get_content(path, output_path)[source]

Get content from a given pointer file.

install(skip_smudge=True)[source]

Force install Git LFS in the repository.

is_pointer_file(path)[source]

Check if a file is an LFS pointer.
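
A Git LFS pointer is a small text file whose first line is a `version` key, followed by keys such as `oid` and `size`. The check below is a sketch based on that format and is not necessarily how this method is implemented; the function name and the sample pointer are hypothetical.

```python
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(text: str) -> bool:
    """Heuristically decide whether text has the layout of a Git LFS pointer file."""
    # Pointer files are small; anything of 1024 bytes or more is treated as real content.
    if len(text.encode("utf-8")) >= 1024:
        return False
    lines = text.splitlines()
    if not lines or lines[0] != LFS_VERSION_LINE:
        return False
    keys = {line.split(" ", 1)[0] for line in lines[1:]}
    return {"oid", "size"} <= keys


pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)
is_pointer = looks_like_lfs_pointer(pointer)
is_regular = looks_like_lfs_pointer("just some file content\n")
```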

class renku.infrastructure.repository.Object(path, type, size, hexsha)[source]

Bases: NamedTuple

Represent a git object.

Create new instance of Object(path, type, size, hexsha)

classmethod from_object(object)[source]

Create an instance from a git object.

hexsha

Alias for field number 3

path

Alias for field number 0

size

Alias for field number 2

type

Alias for field number 1

class renku.infrastructure.repository.Reference(repository, path)[source]

Bases: object

A git reference.

property commit

Commit pointed to by the reference.

classmethod from_reference(repository, reference)[source]

Create an instance from a git reference.

is_valid()[source]

Return True if the reference is valid.

property name

Reference name.

property path

Reference path.

class renku.infrastructure.repository.Remote(repository, name)[source]

Bases: object

Remote of a Repository.

classmethod from_remote(repository, remote)[source]

Create an instance from a git remote.

property head

The head commit of the remote.

is_valid()[source]

Return True if remote exists.

property name

Remote’s name.

property references

Return a list of remote references.

set_url(url)[source]

Change URL of a remote.

property url

Remote’s URL.

class renku.infrastructure.repository.RemoteManager(repository)[source]

Bases: object

Manage remotes of a Repository.

add(name, url)[source]

Add a new remote.

get(remote)[source]

Return the given remote if it exists.

remove(remote)[source]

Remove an existing remote.

class renku.infrastructure.repository.RemoteReference(repository, path)[source]

Bases: Reference

A git remote reference.

property remote

Return reference’s remote.

class renku.infrastructure.repository.Repository(path='.', search_parent_directories=False, repository=None)[source]

Bases: BaseRepository

Abstract Base repository.

classmethod clone_from(url, path, *, branch=None, recursive=False, depth=None, progress=None, no_checkout=False, env=None, clone_options=None)[source]

Clone a remote repository and create an instance.

Since this is just a thin wrapper around GitPython, the branch parameter accepts either a branch name or a tag, but it does not work with a commit SHA.

classmethod initialize(path, *, bare=False, branch=None)[source]

Initialize a git repository.

class renku.infrastructure.repository.Submodule(parent, name, path, url)[source]

Bases: BaseRepository

A git submodule.

classmethod from_submodule(parent, submodule)[source]

Create an instance from a git submodule.

property name

Return submodule’s name.

property relative_path

Submodule’s path relative to its parent repository.

property url

Return submodule’s url.

class renku.infrastructure.repository.SubmoduleManager(repository)[source]

Bases: object

Manage submodules of a Repository.

remove(submodule, force=False)[source]

Remove an existing submodule.

update(initialize=True)[source]

Update all submodules.

class renku.infrastructure.repository.SymbolicReference(repository, path)[source]

Bases: Reference

A git symbolic reference.

property detached

True if the reference is to a commit and not a branch.

property reference

Return the reference that this object points to.

class renku.infrastructure.repository.Tag(repository, path)[source]

Bases: Reference

A git tag.

property commit

Return the commit the tag refers to.

classmethod from_tag(repository, tag)[source]

Create an instance from a git tag reference.

class renku.infrastructure.repository.TagManager(repository)[source]

Bases: object

Manage tags of a Repository.

add(name)[source]

Add a new tag.

remove(tag)[source]

Remove an existing tag.

renku.infrastructure.repository.git_unicode_unescape(s, encoding='utf-8')[source]

Undoes git/GitPython unicode encoding.
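
Git may print non-ASCII paths wrapped in double quotes with octal escapes, for example `"caf\303\251.txt"`. One way to reverse that is sketched below; it assumes the quoting style just described, the helper name `unquote_git_path` is illustrative, and it is not claimed to match this function line for line.

```python
def unquote_git_path(s: str, encoding: str = "utf-8") -> str:
    """Turn a git-quoted path such as "caf\\303\\251.txt" back into readable text."""
    if s.startswith('"') and s.endswith('"') and len(s) >= 2:
        inner = s[1:-1]
        # Interpret the octal escapes as raw bytes, then decode those bytes as text.
        raw = inner.encode("latin-1").decode("unicode_escape").encode("latin-1")
        return raw.decode(encoding)
    return s


quoted = '"caf\\303\\251.txt"'
plain = unquote_git_path(quoted)
untouched = unquote_git_path("README.md")
```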

renku.infrastructure.repository.split_paths(*paths)[source]

Return a generator with split list of paths.