Renku Python API
The following sections describe the Renku Python API. If you work with the R programming language, you can also use this API through the reticulate package. For more information, visit our dedicated tutorial.
Activity
Renku API Activity.
Activity represents executed workflows in a Renku project. You can get a
list of all activities in a project by calling its list method:
from renku.api import Activity
activities = Activity.list()
The Activity class provides a static filter method that returns a
subset of activities. It can filter activities based on their inputs, outputs,
parameter names, and parameter values. You can pass a literal value, a list of
values, or a function predicate for each of these fields to filter activities:
from numbers import Number
from renku.api import Activity
# Return activities that use ``path/to/an/input``
Activity.filter(inputs="path/to/an/input")
# Return activities that use ``input-1`` or ``input-2`` AND generate
# output files whose names start with ``data-``
Activity.filter(inputs=["input-1", "input-2"], outputs=lambda path: path.startswith("data-"))
# Return activities that use values between ``0.5`` and ``1.5`` for the
# parameter ``lr``
Activity.filter(parameters="lr", values=lambda value: 0.5 <= value <= 1.5 if isinstance(value, Number) else False)
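The three accepted forms of a filter argument (a literal, a list of values, or a predicate) can be pictured with a small stand-alone helper. This is only an illustrative sketch of the matching semantics described above, not Renku's implementation: matches is a hypothetical name, and treating a literal as an exact-equality test is an assumption.

```python
from typing import Any


def matches(criterion: Any, value: Any) -> bool:
    """Return True if ``value`` satisfies ``criterion`` (hypothetical helper)."""
    if callable(criterion):
        # Function predicate: the caller decides what counts as a match.
        return bool(criterion(value))
    if isinstance(criterion, (list, tuple, set)):
        # List of values: any one of them may match.
        return value in criterion
    # Literal value: assumed here to mean exact equality.
    return criterion == value


literal_hit = matches("input-1", "input-1")
list_hit = matches(["input-1", "input-2"], "input-3")
predicate_hit = matches(lambda path: path.startswith("data-"), "data-2020.csv")
```

Checking for a callable first keeps the predicate form unambiguous, since a function is never compared for equality with a path or value.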
Dataset
Renku API Dataset.
The Dataset class allows listing datasets and files inside a Renku project and accessing their metadata.
To get a list of available datasets in a Renku project, use the list method:
from renku.api import Dataset
datasets = Dataset.list()
You can then access the metadata of a dataset, such as its name, slug,
and keywords. To get the list of files inside a dataset, use its files
property:
# Iterate over the datasets returned by Dataset.list() above
for dataset in datasets:
    for dataset_file in dataset.files:
        print(dataset_file.path)
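Because each file object exposes a path, ordinary Python tools can summarise a dataset's contents. The sketch below counts files per extension. It uses stand-in objects that carry only a path attribute so that it runs without a Renku project; count_by_suffix and the sample paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath
from types import SimpleNamespace


def count_by_suffix(files):
    """Count files per extension, given objects that expose a ``path`` attribute."""
    return Counter(PurePosixPath(str(f.path)).suffix for f in files)


# Stand-ins for the objects yielded by ``dataset.files`` (assumed shape).
files = [
    SimpleNamespace(path="data/a.csv"),
    SimpleNamespace(path="data/b.csv"),
    SimpleNamespace(path="data/README.md"),
]
suffix_counts = count_by_suffix(files)
```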
Inputs, Outputs, and Parameters
Renku API Workflow Models.
The Input and Output classes can be used to define the inputs and outputs of a
script within the script itself. Paths defined with these classes are added to
the explicit inputs and outputs in the workflow’s metadata. For example, the
following marks data/data.csv as an input named my-input:
from renku.api import Input
with open(Input("my-input", "data/data.csv")) as input_data:
    for line in input_data:
        print(line)
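Passing an Input directly to open() relies on Python's path-like protocol: open() accepts any object that defines __fspath__. Presumably Input implements this protocol and records the path as a side effect. The sketch below shows only the general mechanism, using a hypothetical TrackedPath class and a module-level registry, neither of which is part of Renku.

```python
import os
import tempfile

# Hypothetical registry standing in for workflow metadata.
tracked_inputs = {}


class TrackedPath(os.PathLike):
    """Path-like object that records its name and path when created."""

    def __init__(self, name, path):
        self.name = name
        self.path = os.fspath(path)
        tracked_inputs[name] = self.path

    def __fspath__(self):
        # open() and the os.path functions call this to obtain the real path.
        return self.path


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    with open(csv_path, "w") as f:
        f.write("a,b\n1,2\n")

    # The object can be handed to open() exactly like a string path.
    with open(TrackedPath("my-input", csv_path)) as input_data:
        lines = input_data.read().splitlines()
```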
Users can track parameter values in a workflow by defining them with the
Parameter function:
from renku.api import Parameter
nc = Parameter(name="n_components", value=10)
print(nc.value) # 10
Once a Parameter is tracked like this, its value can be overridden in commands
such as renku workflow execute by using the --set option.
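The override behaviour can be pictured as a lookup that prefers an externally supplied value over the default written in the script. The sketch below is a stand-alone illustration under that assumption; resolve_parameter and the overrides mapping are hypothetical and do not reflect how Renku stores or applies --set values.

```python
def resolve_parameter(name, default, overrides):
    """Return the overriding value for ``name`` if present, else ``default``."""
    return overrides.get(name, default)


# No override supplied: the value written in the script is used.
from_script = resolve_parameter("n_components", 10, {})

# An override for the same name replaces the default.
overridden = resolve_parameter("n_components", 10, {"n_components": 20})
```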
Plan, CompositePlan
Renku API Plan.
The Plan and CompositePlan classes represent Renku workflow plans executed
in a project. Each class has a static list method that returns a list of all
active plans or composite plans in a project:
from renku.api import CompositePlan, Plan
plans = Plan.list()
composite_plans = CompositePlan.list()
Project
Renku API Project.
The Project class acts as a context for other Renku entities such as Dataset or Inputs/Outputs. It gives these entities access to the internals of a Renku project.
Normally, you do not need to create an instance of the Project class directly unless you want to access Project metadata (e.g. path) or get its status. To separate the parts of your script that use Renku entities, you can create a Project context manager and interact with Renku inside it:
from renku.api import Project, Input
with Project():
    input_1 = Input("input_1", "path_1")
You can use Project’s status method to get info about outdated outputs and
activities, and modified or deleted inputs:
from renku.api import Project
outdated_generations, outdated_activities, modified_inputs, deleted_inputs = Project().status()
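A script might use these four values to decide whether anything needs to be re-run. The sketch below assumes only that each value is a sized collection (for example a dict, list, or set), since the exact return types are not specified here. The helper name summarize_status and the stand-in values are hypothetical.

```python
def summarize_status(outdated_generations, outdated_activities, modified_inputs, deleted_inputs):
    """Count the entries in each status collection and flag an up-to-date project."""
    counts = {
        "outdated_generations": len(outdated_generations),
        "outdated_activities": len(outdated_activities),
        "modified_inputs": len(modified_inputs),
        "deleted_inputs": len(deleted_inputs),
    }
    # The project is considered up to date only if every collection is empty.
    counts["up_to_date"] = all(count == 0 for count in counts.values())
    return counts


# Stand-in values in place of the result of Project().status().
summary = summarize_status({"out.csv": ["data.csv"]}, {}, {"data.csv"}, set())
```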
RDF Graph
Renku RDF Graph API.
The RDFGraph class allows for the quick creation of a searchable graph object
based on the project’s metadata.
To create the graph and query it:
from rdflib import URIRef
from renku.ui.api import RDFGraph
g = RDFGraph()
# get a list of contributors to the project
list(g.subjects(object=URIRef("http://schema.org/Person")))
For more information on querying the graph, see the RDFLib documentation.