# gfhub

DataLab.

Modules:

| Name | Description |
|---|---|
| client | DataLab Python SDK. |
| entry | Utilities. |
| function | Function class for defining DataLab functions from Python callables. |
| nodes | Helper functions for creating pipeline nodes. |
| pipeline | Friendly pipeline module. |
| tags | Utilities for working with tags. |
Classes:

| Name | Description |
|---|---|
| Client | DataLab client for managing files, functions, pipelines, and tags. |
| Function | A DataLab function defined from a Python callable. |
| Pipeline | A friendly pipeline representation. |
Functions:

| Name | Description |
|---|---|
| get_settings | Get settings from config files and environment variables. |
## Client

DataLab client for managing files, functions, pipelines, and tags.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| host | str \| None | The host URL of the DataLab server. If not provided, reads from settings (pyproject.toml or ~/.gdsfactory/gdsfactoryplus.toml). | None |
| api_key | str \| None | Optional API key for authentication. If not provided, reads from settings (only ~/.gdsfactory/gdsfactoryplus.toml, not local config). | None |
Methods:

| Name | Description |
|---|---|
| add_file | Upload a file to DataLab. |
| add_function | Add or update a Python function. |
| add_pipeline | Add or update a pipeline. |
| add_tag | Add or update a tag. |
| delete_file | Delete a file by ID. |
| disable_pipeline | Disable a pipeline. |
| download_file | Download a file by upload ID. |
| enable_pipeline | Enable a pipeline. |
| get_job | Get job details by ID. |
| get_jobs | Get multiple jobs by IDs (batch). |
| list_functions | List all functions in the organization. |
| pipeline_url | Get the pipeline URL for a given pipeline ID. |
| query_files | Query files by name pattern and/or tags. |
| trigger_pipeline | Trigger a pipeline manually with one or more files. |
| url | Get the full URL for a given path. |
| wait_for_job | Wait for a job to complete (SUCCESS or FAILED status). |
| wait_for_jobs | Wait for multiple jobs to complete. |
Source code in python/gfhub/client.py
### add_file

```python
add_file(data: str | Path | BinaryIO | DataFrame, tags: Iterable[str] = (), *, filename: str | None = None) -> dict
```
Upload a file to DataLab.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | str \| Path \| BinaryIO \| DataFrame | The data to upload: a str/Path to a file, a file-like object (e.g., io.BytesIO), or a pandas.DataFrame (converted to Parquet format). | required |
| tags | Iterable[str] | Optional list of tags to apply to the file. Tags can be simple names (e.g., "raw") or parameter tags in "key:value" format (e.g., "raw:3"). | () |
| filename | str \| None | Optional filename to use on the server. Required when uploading from BinaryIO or DataFrame; optional when uploading from a path (defaults to the actual filename). | None |
Returns:

| Type | Description |
|---|---|
| dict | Dictionary containing the upload response with file metadata. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the file upload fails. |
| ValueError | If filename is not provided when uploading from BinaryIO or DataFrame. |
Source code in python/gfhub/client.py
### add_function

```python
add_function(function: str | Path | Function | Callable, *, name: str = '', update: bool = True) -> dict
```
Add or update a Python function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| function | str \| Path \| Function \| Callable | The function to upload. One of: str (Python script content if it contains newlines, otherwise a path string), Path (path to a Python script file), Function (a Function instance created from a Python callable), or Callable (a callable to upload as a function; no import dependencies allowed). | required |
| name | str | Override the name of the function. Required when `function` is a Callable or raw script content. | '' |
| update | bool | If True, updates the function if it already exists. If False, raises an error on conflict. Defaults to True. | True |
Returns:

| Type | Description |
|---|---|
| dict | Dictionary containing the function response with metadata. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the function validation or upload fails. |
Examples:

Upload from a file path:

Upload from a Function instance:

```python
from gfhub import Function

def analyze(input_path: Path, /, *, threshold: float = 0.5) -> dict:
    df = pd.read_parquet(input_path)
    result = df[df["value"] > threshold]
    output = input_path.with_suffix(".filtered.parquet")
    result.to_parquet(output)
    return {"output": output}

func = Function(
    analyze, dependencies={"pandas>=2.0": "import pandas as pd"}
)
client.add_function(func)
```
Source code in python/gfhub/client.py
### add_pipeline

Add or update a pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the pipeline. | required |
| schema | dict \| str \| Pipeline | Either a dict or JSON string containing the pipeline schema. The schema uses JsonNode/JsonEdge format: nodes (list of nodes with name, type, and settings) and edges (list of edges connecting nodes). | required |
| update | bool | If True, updates the pipeline if it already exists. If False, raises an error on conflict. Defaults to True. | True |
Returns:

| Type | Description |
|---|---|
| dict | Dictionary containing the pipeline response with metadata. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the pipeline creation or update fails. |
Examples:

```python
schema = {
    "nodes": [
        {
            "name": "to_parquet",
            "type": "function",
            "settings": {
                "function": "csv2parquet",
                "settings": {}
            }
        }
    ],
    "edges": []
}
client.add_pipeline("csv_converter", schema)
```
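A minimal structural check of the nodes/edges schema shape described above can be sketched as follows. This is an illustrative helper, not an SDK function; the real server-side validation is more thorough:

```python
def check_schema(schema: dict) -> list[str]:
    """Return a list of structural problems in a pipeline schema.

    Illustrative sketch of the JsonNode/JsonEdge shape; the real
    server-side validation is more thorough.
    """
    problems = []
    for key in ("nodes", "edges"):
        if not isinstance(schema.get(key), list):
            problems.append(f"missing or non-list '{key}'")
    for i, node in enumerate(schema.get("nodes") or []):
        for field in ("name", "type", "settings"):
            if field not in node:
                problems.append(f"node {i} missing '{field}'")
    return problems

schema = {
    "nodes": [{"name": "to_parquet", "type": "function",
               "settings": {"function": "csv2parquet", "settings": {}}}],
    "edges": [],
}
print(check_schema(schema))  # []
```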
Source code in python/gfhub/client.py
### add_tag

Add or update a tag.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the tag. | required |
| color | str | Hex color code for the tag (e.g., "#ef4444"). | required |
| update | bool | If True, updates the tag if it already exists. If False, raises an error on conflict. Defaults to True. | True |

Returns:

| Type | Description |
|---|---|
| dict | Dictionary containing the tag response with metadata. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the tag creation or update fails. |
Source code in python/gfhub/client.py
### delete_file

```python
delete_file(file_id: str) -> None
```
Delete a file by ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| file_id | str | ID of the file to delete. | required |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the deletion fails. |
Source code in python/gfhub/client.py
### disable_pipeline

```python
disable_pipeline(pipeline_id: str) -> None
```
Disable a pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pipeline_id | str | ID of the pipeline. | required |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the operation fails. |
Source code in python/gfhub/client.py
### download_file

Download a file by upload ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| upload_id | str | ID of the file to download. | required |
| output | str \| Path \| BinaryIO \| None | Where to write the file: a str/Path file path to write to, a file handle opened in binary mode (e.g., open('file', 'wb')), an io.BytesIO buffer to write to, or None to return a new BytesIO buffer with the file contents. | None |

Returns:

| Type | Description |
|---|---|
| BinaryIO \| None | None if output is a path or file handle, io.BytesIO if output is None. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the download fails. |
Examples:

- Download to a file path
- Download to a file handle
- Download to a BytesIO buffer
- Get a BytesIO buffer directly
Source code in python/gfhub/client.py
### enable_pipeline

```python
enable_pipeline(pipeline_id: str) -> None
```
Enable a pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pipeline_id | str | ID of the pipeline. | required |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the operation fails. |
Source code in python/gfhub/client.py
### get_job

Get job details by ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job ID to retrieve. | required |

Returns:

| Type | Description |
|---|---|
| dict | Job details including status, inputs, outputs, timestamps, etc. |
Examples:

```python
job = client.get_job("job_123")
print(job["status"])         # QUEUED, RUNNING, SUCCESS, or FAILED
print(job["pipeline_name"])  # Name of the pipeline
```
Source code in python/gfhub/client.py
### get_jobs

Get multiple jobs by IDs (batch).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| job_ids | list[str] | List of job IDs to retrieve. | required |

Returns:

| Type | Description |
|---|---|
| list[dict] | Job details for each requested job. |
Examples:

```python
jobs = client.get_jobs(["job_123", "job_456"])
for job in jobs:
    print(job["status"])  # QUEUED, RUNNING, SUCCESS, or FAILED
```
Source code in python/gfhub/client.py
### list_functions

List all functions in the organization.
Raises:

| Type | Description |
|---|---|
| RuntimeError | If listing functions fails. |
Source code in python/gfhub/client.py
### pipeline_url

Get the pipeline URL for a given pipeline ID.
### query_files

Query files by name pattern and/or tags.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str \| None | Optional filename pattern to filter by. Supports exact match (case-insensitive, e.g., "lattice.gds") and glob patterns (e.g., "*.csv", "data*.parquet", "lattice*"). | None |
| tags | Iterable[str] | Optional list of tags to filter by. Files must have ALL given tags. Supports wildcards (e.g., "wafer_id:*") to match any parameter value. | () |

Returns:

| Type | Description |
|---|---|
| Entries | Dictionary containing a list of matching files with their metadata. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the query fails. |
Examples:

```python
# Find all CSV files by extension tag
client.query_files(tags=[".csv"])

# Find files by exact name (case-insensitive)
client.query_files(name="lattice.gds")

# Find files by glob pattern
client.query_files(name="*.csv")
client.query_files(name="data*.parquet")

# Find files with specific parameter values
client.query_files(tags=["wafer_id:wafer1", ".parquet"])

# Combine name pattern and tags
client.query_files(name="*.parquet", tags=["wafer_id:*"])

# Get all files
client.query_files()
```
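The name-matching semantics described above (case-insensitive exact or glob match) can be approximated locally with fnmatch. This is a sketch of the documented behavior, not the server's actual query code:

```python
from fnmatch import fnmatch

def name_matches(filename: str, pattern: str) -> bool:
    """Case-insensitive exact/glob match, approximating query_files.

    Sketch of the documented semantics; the server implementation
    may differ in edge cases.
    """
    return fnmatch(filename.lower(), pattern.lower())

print(name_matches("Lattice.GDS", "lattice.gds"))        # True (exact, case-insensitive)
print(name_matches("data_01.parquet", "data*.parquet"))  # True (glob)
print(name_matches("readme.md", "*.csv"))                # False
```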
Source code in python/gfhub/client.py
### trigger_pipeline

Trigger a pipeline manually with one or more files.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pipeline_name | str | Name of the pipeline to trigger. | required |
| upload_ids | str \| Iterable[str] | Single upload ID or list of upload IDs to process. | required |

Returns:

| Type | Description |
|---|---|
| dict | Dictionary containing the job metadata with job ID. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the pipeline trigger fails or the pipeline is not found. |
Examples:

- Trigger with a single file
- Trigger with multiple files
Source code in python/gfhub/client.py
### url

Get the full URL for a given path.
### wait_for_job

Wait for a job to complete (SUCCESS or FAILED status).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job ID to wait for. | required |
| timeout | float | Maximum seconds to wait (default: 300). | 300 |
| poll_interval | float | Seconds between polls (default: 1.0). | 1.0 |

Returns:

| Type | Description |
|---|---|
| dict | Final job details with status SUCCESS or FAILED. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the job is not found. |
Examples:

```python
job = client.trigger_pipeline("csv2json", "upload_123")
final_job = client.wait_for_job(job["id"])
print(final_job["status"])  # SUCCESS or FAILED
if final_job["status"] == "SUCCESS":
    print(final_job["output_filenames"])
```
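The waiting behavior can be sketched as a generic polling loop. This is illustrative, not the SDK's actual implementation; `fetch` stands in for `client.get_job`:

```python
import time

def wait_for(fetch, job_id: str, timeout: float = 300.0, poll_interval: float = 1.0) -> dict:
    """Poll `fetch(job_id)` until status is SUCCESS or FAILED.

    Generic sketch of the documented wait_for_job behavior; `fetch`
    stands in for a real client.get_job call.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] in ("SUCCESS", "FAILED"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
        time.sleep(poll_interval)

# Fake fetch that reports RUNNING once, then SUCCESS.
states = iter([{"status": "RUNNING"}, {"status": "SUCCESS"}])
result = wait_for(lambda _id: next(states), "job_123", poll_interval=0.01)
print(result["status"])  # SUCCESS
```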
Source code in python/gfhub/client.py
### wait_for_jobs

Wait for multiple jobs to complete.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| job_ids | list[str] | List of job IDs to wait for. | required |
| poll_interval | float | Seconds between polling cycles (default: 1.0). | 1.0 |

Returns:

| Type | Description |
|---|---|
| list[dict] | Final job details for each job. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If any job is not found. |
Examples:

```python
jobs = client.wait_for_jobs(["job_123", "job_456"])
for job in jobs:
    print(job["status"])  # SUCCESS or FAILED
```
Source code in python/gfhub/client.py
## Function

A DataLab function defined from a Python callable.

This class wraps a Python function and its dependencies, validates that all undefined globals are covered by the provided imports, and generates a uv-style script for upload to DataLab.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| func | Callable | A Python function to upload. Must have a valid signature with positional-only input parameters and keyword-only config parameters. | required |
| dependencies | dict[str, str \| list[str]] \| None | A dict mapping package specs to import statement(s). The package spec can include version constraints (e.g., "pandas>=2.0"). The import statement(s) define what names become available. | None |

Raises:

| Type | Description |
|---|---|
| ValueError | If the dependencies don't cover all undefined globals used in the function body. |
Examples:

```python
def analyze(input_path: Path, /, *, threshold: float = 0.5) -> dict:
    df = pd.read_parquet(input_path)
    result = df[df["value"] > threshold]
    output = input_path.with_suffix(".filtered.parquet")
    result.to_parquet(output)
    return {"output": output}

func = Function(
    analyze,
    dependencies={"pandas>=2.0": "import pandas as pd"},
)
client.add_function(func, name="filter_data")
Methods:

| Name | Description |
|---|---|
| eval | Evaluate the function locally using uv run. |
| to_script | Generate a uv-style Python script. |
Attributes:

| Name | Type | Description |
|---|---|---|
| dependencies | dict[str, list[str]] | The normalized dependencies dict. |
| func | Callable | The wrapped Python function. |
| name | str | The function name. |
| undefined_globals | set[str] | The set of undefined globals found in the function. |
Source code in python/gfhub/function.py
### undefined_globals (property)

The set of undefined globals found in the function.
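A rough idea of how undefined globals can be detected — an assumption about the approach; the SDK's real analysis may be more sophisticated:

```python
import builtins
import dis

def find_undefined_globals(func) -> set[str]:
    """Names a function loads globally that are neither builtins
    nor defined in its module globals.

    Rough sketch; the SDK's real analysis may differ.
    """
    loads = {ins.argval for ins in dis.get_instructions(func)
             if ins.opname == "LOAD_GLOBAL"}
    return loads - set(func.__globals__) - set(vars(builtins))

def analyze(x):
    return pd.DataFrame({"v": [x]})  # `pd` is never imported here

def ok(x):
    return len(x)  # `len` is a builtin, so nothing is undefined

print(find_undefined_globals(analyze))  # {'pd'}
print(find_undefined_globals(ok))       # set()
```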
### eval

Evaluate the function locally using uv run.

This runs the function in a subprocess with `uv run --script`, which will automatically install the required dependencies. The execution mirrors how the backend runs functions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| *inputs | Any | Positional inputs to pass to the function. Path objects will be resolved to absolute paths. Other types (int, float, str, dict, list) are passed as-is. | () |
| **kwargs | Any | Keyword parameters to pass to the function. | {} |
Returns:

| Type | Description |
|---|---|
| dict[str, Any] | A dictionary mapping output names to output values. Path strings in the output are converted back to Path objects. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the function execution fails. |
Examples:

```python
func = Function(analyze, dependencies={"pandas": "import pandas as pd"})
result = func.eval(Path("input.parquet"), threshold=0.5)
print(result)
# {"output": Path("/tmp/.../output.parquet")}
```
Source code in python/gfhub/function.py
### to_script

```python
to_script() -> str
```

Generate a uv-style Python script.
Returns:

| Type | Description |
|---|---|
| str | A string containing the complete uv script with dependency metadata, imports, and the function definition. |
Source code in python/gfhub/function.py
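A uv-style script typically starts with a PEP 723 inline-metadata block, which `uv run --script` reads to install dependencies. The exact header `to_script()` emits is an assumption; this sketch only shows the general shape:

```python
def script_header(dependencies: list[str]) -> str:
    """Build a PEP 723 inline-metadata block as read by `uv run --script`.

    Sketch only; the exact header to_script() emits is an assumption.
    """
    lines = ["# /// script", "# dependencies = ["]
    lines += [f'#     "{dep}",' for dep in dependencies]
    lines += ["# ]", "# ///"]
    return "\n".join(lines)

print(script_header(["pandas>=2.0"]))
# # /// script
# # dependencies = [
# #     "pandas>=2.0",
# # ]
# # ///
```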
## Pipeline

A friendly pipeline representation.
Methods:

| Name | Description |
|---|---|
| on_file_upload | Create a pipeline that triggers on file upload. |
| to_dict | Convert pipeline to dict representation. |
Source code in python/gfhub/pipeline.py
### on_file_upload (classmethod)

Create a pipeline that triggers on file upload.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| function_name | str | The name of the function node to use for plotting. | required |
| tags | Iterable[str] | The tags to filter file uploads on. | required |
| kwargs | dict \| None | Additional keyword arguments to pass to the function node. | None |

Returns:

| Type | Description |
|---|---|
| Self | A pipeline that triggers on file upload. |
Source code in python/gfhub/pipeline.py
## get_settings

Get settings from config files and environment variables.
Returns:

| Type | Description |
|---|---|
| tuple[str, str] | Tuple of (api_key, host) read from environment variables, the local pyproject.toml, and the global ~/.gdsfactory/gdsfactoryplus.toml. |

Priority: env vars > local > global (api_key only from global/env)
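The priority rule can be sketched with ChainMap. The keys and values here are hypothetical; the real settings loader parses TOML files:

```python
from collections import ChainMap

# Hypothetical settings sources, highest priority first:
# env vars > local pyproject.toml > global gdsfactoryplus.toml.
# Note: per the docs, api_key is only read from the global config or env.
env = {"host": "https://env.example.com"}
local = {"host": "https://local.example.com"}
global_cfg = {"host": "https://global.example.com", "api_key": "secret"}

settings = ChainMap(env, local, global_cfg)
print(settings["host"])     # https://env.example.com (env wins)
print(settings["api_key"])  # secret (only defined in the global config)
```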