query_files is the primary way to search and retrieve files stored in DataLab. You can filter by filename using glob patterns, by tags, or combine both. Every file object in the result includes an id field, which is the unique identifier you will use in later tutorials to download, delete, or feed files into a pipeline.

Setup

import gfhub

# Reads host and API key from ~/.gdsfactory/gdsfactoryplus.toml or environment variables.
# You can also pass them explicitly: gfhub.Client(host="http://...", api_key="...")
client = gfhub.Client()

Get all files

Calling query_files() with no arguments returns every file in your organization.

files = client.query_files()

print(f"Total files: {len(files)}")
for f in files[:5]:
    print(f"  - {f['original_name']}")
Total files: 1766
  - spirals_manifest.csv
  - spirals.gds
  - rings.gds
  - wafer_map.png
  - wafer_map.png
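The listing above shows wafer_map.png twice. Filenames are not unique, so id is the field to rely on. The sketch below works on plain dictionaries with the original_name and id keys used in this tutorial. It does not call gfhub, and the sample data is made up. With a live client you would pass client.query_files() in place of sample_files.

```python
from collections import defaultdict


def find_duplicate_names(files):
    """Map each original_name that occurs more than once to the ids that share it."""
    ids_by_name = defaultdict(list)
    for f in files:
        ids_by_name[f["original_name"]].append(f["id"])
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical sample shaped like query_files() results (ids shortened for readability).
sample_files = [
    {"id": "id-1", "original_name": "spirals_manifest.csv"},
    {"id": "id-2", "original_name": "wafer_map.png"},
    {"id": "id-3", "original_name": "wafer_map.png"},
]

print(find_duplicate_names(sample_files))  # {'wafer_map.png': ['id-2', 'id-3']}

# Index by id so later steps can look a file up directly.
files_by_id = {f["id"]: f for f in sample_files}
```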

Inspect a file object

Each entry in the result is a dictionary with the following key fields:

| Field | Description |
|---|---|
| id | Unique identifier. Use this to download, delete, or trigger pipelines |
| name / original_name | Filename as uploaded |
| mime_type | Detected MIME type (e.g. text/csv, application/octet-stream) |
| status | File availability (e.g. Available) |
| file_size | File size in bytes |
| created_at | Upload timestamp (ISO 8601) |
| tags | Dict of tags keyed by tag name. Each tag has id, name, color, and, for parameter tags, parameter_value |
| pipelines | List of pipelines this file is connected to. Each entry has an id and a name (covered in the pipelines tutorial) |

import json

if files:
    print(json.dumps(files[0], indent=2, default=str))
{
  "id": "019df3b5-4c9a-7692-a397-84adcafb7f99",
  "name": "spirals_manifest.csv",
  "original_name": "spirals_manifest.csv",
  "mime_type": "text/csv",
  "status": "Available",
  "created_at": "2026-05-04T15:57:18.362846Z",
  "file_size": 2525,
  "tags": {
    ".csv": {
      "id": "019d234b-e0f7-7901-a8b4-355e6c3cb2e9",
      "name": ".csv",
      "color": "#06b6d4"
    },
    "project": {
      "id": "019d3dea-adf4-7101-aa1c-0c89b8ebe7af",
      "name": "project",
      "color": "#0ea5e9",
      "parameter_value": "tutorial_spirals"
    },
    "runner": {
      "id": "019daa71-3dd8-75f2-a28c-405aa78a9eb6",
      "name": "runner",
      "color": "#ef4444"
    }
  },
  "pipelines": [
    {
      "id": "019df1ff-0e3f-7f71-aa82-c7559651cc3f",
      "name": "spiral_propagation_loss"
    },
    {
      "id": "019df1ff-c078-77c1-89c8-863d7c6e37cd",
      "name": "rib_wafer_analysis"
    },
    {
      "id": "019d3e8c-ef44-74c2-85b4-28be0081d0b3",
      "name": "plot_parquet"
    },
    {
      "id": "019df1ff-f27c-7603-b328-fb3eadd63605",
      "name": "ridge_wafer_analysis"
    },
    {
      "id": "019d4ee5-a5a5-76f3-be4e-2e4508b3b49b",
      "name": "Test Plot - Alejandro"
    },
    {
      "id": "019da9e2-1812-7d13-808f-077b937502e8",
      "name": "debug_noop_pipe_5880f52e"
    },
    {
      "id": "019df1ef-aa10-7740-8bfd-82710e1f0389",
      "name": "plot_parquet_tutorial"
    },
    {
      "id": "019df1ef-9eef-7551-9370-ad82ff490089",
      "name": "plot_spiral_spectrum"
    },
    {
      "id": "019df1ef-afcc-7f21-89be-f253124e630d",
      "name": "plot_ring_spectrum"
    },
    {
      "id": "019df1ef-aff0-7493-a06e-02422c2c86a8",
      "name": "csv_to_parquet_pipeline_example"
    },
    {
      "id": "019df1ef-f2ea-7bb0-aa44-2de295783fc9",
      "name": "cutback_die_analysis"
    },
    {
      "id": "019df1f0-1a86-7f63-8279-b07f07354aef",
      "name": "iv_resistance_fit"
    },
    {
      "id": "019df1f0-3775-7882-abf5-e409302db131",
      "name": "rings_fsr_analysis"
    },
    {
      "id": "019df1f0-c617-7f62-966e-ea22a97ecaa8",
      "name": "die_fsr_aggregation"
    },
    {
      "id": "019df1f3-1311-7281-b022-6572b003bad7",
      "name": "wafer_fsr_aggregation"
    },
    {
      "id": "019df1f1-9cad-7133-abb4-9c059a62cca4",
      "name": "spiral_device_analysis"
    },
    {
      "id": "019df1f3-3cc0-79f1-a92c-714269fcfdef",
      "name": "die_sheet_resistance"
    },
    {
      "id": "019df1fe-f384-7890-b599-f731b0bdaa83",
      "name": "aggregate_die_analyses"
    }
  ]
}
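A file object like the one above can be condensed in plain Python. The helper summarize_file is a hypothetical name, not part of gfhub. It assumes the field layout shown above: a created_at string ending in Z, file_size in bytes, tags keyed by name, and pipelines as a list of dicts. It separates simple tags from parameter tags by checking for parameter_value.

```python
from datetime import datetime


def summarize_file(f):
    """Condense one query_files() entry into commonly used fields (hypothetical helper)."""
    # created_at ends in "Z"; swapping it for "+00:00" lets fromisoformat parse it before Python 3.11.
    created = datetime.fromisoformat(f["created_at"].replace("Z", "+00:00"))
    tags = f.get("tags", {})
    return {
        "id": f["id"],
        "name": f["original_name"],
        "size_kib": round(f["file_size"] / 1024, 2),
        "created": created,
        # Simple tags have no parameter_value; parameter tags do.
        "simple_tags": sorted(n for n, t in tags.items() if "parameter_value" not in t),
        "parameter_tags": {n: t["parameter_value"] for n, t in tags.items() if "parameter_value" in t},
        "pipelines": [p["name"] for p in f.get("pipelines", [])],
    }


# Excerpt of the file object printed above (tag ids/colors and most pipelines omitted).
example = {
    "id": "019df3b5-4c9a-7692-a397-84adcafb7f99",
    "original_name": "spirals_manifest.csv",
    "created_at": "2026-05-04T15:57:18.362846Z",
    "file_size": 2525,
    "tags": {
        ".csv": {"name": ".csv"},
        "project": {"name": "project", "parameter_value": "tutorial_spirals"},
        "runner": {"name": "runner"},
    },
    "pipelines": [{"id": "019df1ff-0e3f-7f71-aa82-c7559651cc3f", "name": "spiral_propagation_loss"}],
}

summary = summarize_file(example)
print(summary["size_kib"], summary["simple_tags"], summary["parameter_tags"])
# 2.47 ['.csv', 'runner'] {'project': 'tutorial_spirals'}
```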

Filter by name

The name parameter supports glob patterns (case-insensitive). Use * to match any sequence of characters.

# Exact match (case-insensitive)
files = client.query_files(name="lattice.gds")
print(f"Exact match: {len(files)} file(s)")
for f in files:
    print(f"  - {f['original_name']}")
Exact match: 0 file(s)
# Glob patterns
for pattern in ["*.gds", "waveguide*.csv"]:
    matches = client.query_files(name=pattern)
    print(f"  '{pattern}' → {len(matches)} file(s)")
  '*.gds' → 35 file(s)
  'waveguide*.csv' → 1 file(s)
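If you have already fetched a list of files, you can apply a similar case-insensitive glob filter locally without another request. This sketch uses Python's standard fnmatch module and lowercases both the pattern and the name to ignore case. It is only an approximation of the server-side filter. fnmatch also understands ? and [seq] wildcards, and this tutorial does not say whether the server's glob syntax supports them, so treat any pattern beyond * as an assumption.

```python
from fnmatch import fnmatchcase


def filter_by_name(files, pattern):
    """Case-insensitive glob match against original_name, applied client-side."""
    pattern = pattern.lower()
    return [f for f in files if fnmatchcase(f["original_name"].lower(), pattern)]


# Hypothetical sample; in practice pass the list returned by client.query_files().
sample_files = [
    {"id": "id-1", "original_name": "spirals.gds"},
    {"id": "id-2", "original_name": "Rings.GDS"},
    {"id": "id-3", "original_name": "waveguide_transmission.csv"},
    {"id": "id-4", "original_name": "sample_data.csv"},
]

for pattern in ["*.gds", "waveguide*.csv"]:
    names = [f["original_name"] for f in filter_by_name(sample_files, pattern)]
    print(f"  '{pattern}' -> {names}")
```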

Filter by tags

Tags are labels attached to files at upload time. Simple tags are plain labels like "raw" or "reviewed". Parameter tags carry a value in the format "key:value", for example "wafer_id:wafer1". To query files that have a parameter tag regardless of its value, use just the key: "wafer_id". To filter for a specific value, use the full form: "wafer_id:wafer1".

Extension tags like .gds, .csv, and .parquet are applied automatically based on file type.

When you pass multiple tags, a file must have all of them to be returned.
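The matching rules above can be written as a small client-side predicate, which is useful for re-filtering results you already hold. This sketch follows the behavior as described here. It is not gfhub's implementation and makes the following assumptions:

- The tags dictionary has the layout shown in the file object earlier.
- A bare key matches a tag of that name regardless of its value.
- A key:value query compares parameter_value as an exact string. This is consistent with the wafer_id outputs later in this tutorial, where wafer_id:1 does not match wafer1.

The query is split on its first colon only.

```python
def has_tag(file_obj, query):
    """Check one tag query of the form "name" or "name:value" against a file object."""
    name, sep, value = query.partition(":")  # split on the first colon only
    tag = file_obj.get("tags", {}).get(name)
    if tag is None:
        return False
    if not sep:
        return True  # bare name: simple tag, or parameter tag with any value
    return tag.get("parameter_value") == value


def matches_all(file_obj, queries):
    """Multiple tags are combined with AND: every query must match."""
    return all(has_tag(file_obj, q) for q in queries)


# Hypothetical file objects using the tag layout shown earlier (ids and colors omitted).
sweep = {"original_name": "sweep_results.parquet",
         "tags": {".parquet": {}, "wafer_id": {"parameter_value": "wafer1"}}}
sample = {"original_name": "sample_data.csv",
          "tags": {".csv": {}, "test": {}, "wafer_id": {"parameter_value": "1"}}}

for queries in (["wafer_id"], ["wafer_id:1"], ["wafer_id:1", ".csv"], ["raw"]):
    hits = [f["original_name"] for f in (sweep, sample) if matches_all(f, queries)]
    print(queries, "->", hits)
```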

# Filter by file extension tag (auto-applied at upload)
for ext in [".gds", ".csv", ".parquet"]:
    matches = client.query_files(tags=[ext])
    print(f"  '{ext}' → {len(matches)} file(s)")
  '.gds' → 35 file(s)
  '.csv' → 23 file(s)
  '.parquet' → 481 file(s)
# Filter by a simple tag (applied manually at upload)
tag_name = "components"
files = client.query_files(tags=[tag_name])
print(f"Files tagged '{tag_name}': {len(files)}")
for f in files[:5]:
    print(f"  - {f['original_name']}")
Files tagged 'components': 409
  - cutback_device_400.png
  - cutback_device_16.png
  - cutback_device_16.png
  - cutback_device_816.png
  - cutback_device_16.png
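If you are unsure which tag names are in use, for example which simple tags such as raw or components actually exist, you can tally tag names across a result set. This sketch only applies collections.Counter to the tags dictionaries that query_files() already returns. The sample data is hypothetical. On a real organization you would pass client.query_files(), which returns every file and may be large.

```python
from collections import Counter


def tag_counts(files):
    """Count how many files carry each tag name (parameter values are ignored)."""
    return Counter(name for f in files for name in f.get("tags", {}))


# Hypothetical sample shaped like query_files() results.
sample_files = [
    {"original_name": "cutback_device_16.png", "tags": {"components": {}}},
    {"original_name": "cutback_device_400.png", "tags": {"components": {}}},
    {"original_name": "sample_data.csv",
     "tags": {".csv": {}, "test": {}, "wafer_id": {"parameter_value": "1"}}},
]

for name, count in tag_counts(sample_files).most_common():
    print(f"  {name}: {count}")
```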
# Parameter tag: any value (query by tag name alone)
files = client.query_files(tags=["wafer_id"])
print(f"Files with any wafer_id: {len(files)}")
for f in files:
    tag = f["tags"].get("wafer_id", {})
    wafer_value = tag.get("parameter_value", "?")
    print(f"  - {f['original_name']}  (wafer_id={wafer_value})")

# Parameter tag: exact value
files = client.query_files(tags=["wafer_id:1"])
print(f"\nFiles with wafer_id=1: {len(files)}")
for f in files:
    print(f"  - {f['original_name']}")
Files with any wafer_id: 16
  - sweep_results.parquet  (wafer_id=wafer1)
  - measurement_example.csv  (wafer_id=wafer1)
  - sweep_results.parquet  (wafer_id=wafer1)
  - measurement_example.csv  (wafer_id=wafer1)
  - sweep_results.parquet  (wafer_id=wafer1)
  - measurement_example.csv  (wafer_id=wafer1)
  - sweep_results.parquet  (wafer_id=wafer1)
  - measurement_example.csv  (wafer_id=wafer1)
  - sweep_results.parquet  (wafer_id=wafer1)
  - measurement_example.csv  (wafer_id=wafer1)
  - sweep_results.parquet  (wafer_id=wafer1)
  - measurement_example.csv  (wafer_id=wafer1)
  - sweep_results.parquet  (wafer_id=wafer1)
  - measurement_example.csv  (wafer_id=wafer1)
  - sample_data.csv  (wafer_id=1)
  - waveguide_transmission.csv  (wafer_id=2)

Files with wafer_id=1: 1
  - sample_data.csv
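The output above shows that wafer_id values are matched exactly as stored: wafer1 and 1 are different values, which is why the exact query for wafer_id:1 returns a single file. To see all values at once, you can run one tags=["wafer_id"] query and group the results on parameter_value. group_by_parameter is a hypothetical helper that operates on the returned dictionaries and assumes parameter_value is a string, as in the file object shown earlier. The sample mirrors a few rows of that output.

```python
from collections import defaultdict


def group_by_parameter(files, tag_name):
    """Group original_name values by the parameter_value of one parameter tag."""
    groups = defaultdict(list)
    for f in files:
        tag = f.get("tags", {}).get(tag_name)
        if tag is not None and "parameter_value" in tag:
            groups[tag["parameter_value"]].append(f["original_name"])
    return dict(groups)


# A few rows mirroring the wafer_id output above (other tag fields omitted).
sample_files = [
    {"original_name": "sweep_results.parquet", "tags": {"wafer_id": {"parameter_value": "wafer1"}}},
    {"original_name": "measurement_example.csv", "tags": {"wafer_id": {"parameter_value": "wafer1"}}},
    {"original_name": "sample_data.csv", "tags": {"wafer_id": {"parameter_value": "1"}}},
    {"original_name": "waveguide_transmission.csv", "tags": {"wafer_id": {"parameter_value": "2"}}},
]

for value, names in group_by_parameter(sample_files, "wafer_id").items():
    print(f"wafer_id={value}: {names}")
```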

Combine name and tag filters

Both filters can be used together. Only files that match the name pattern and have all the specified tags will be returned.

# CSV files for a specific wafer
files = client.query_files(name="*.csv", tags=["wafer_id:1"])

print(f"Found {len(files)} file(s):")
for f in files:
    tag_names = list(f["tags"].keys())
    print(f"  - {f['original_name']}  (id={f['id']}, tags={tag_names})")
Found 1 file(s):
  - sample_data.csv  (id=019d6283-90fe-7c70-8602-8f6fc2c0e7ba, tags=['.csv', 'test', 'wafer_id'])
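Several outputs in this tutorial list the same filename more than once, for example the repeated sweep_results.parquet uploads. If you only want the newest copy of each name after filtering, you can keep the entry with the latest created_at. latest_by_name is a hypothetical helper that assumes created_at is an ISO 8601 string, as in the file object shown earlier. The ids and timestamps in the sample are invented for illustration.

```python
from datetime import datetime


def parse_created(f):
    # Replace the trailing "Z" so fromisoformat accepts it on Python versions before 3.11.
    return datetime.fromisoformat(f["created_at"].replace("Z", "+00:00"))


def latest_by_name(files):
    """Return {original_name: newest file object}, comparing created_at timestamps."""
    latest = {}
    for f in files:
        name = f["original_name"]
        if name not in latest or parse_created(f) > parse_created(latest[name]):
            latest[name] = f
    return latest


# Hypothetical results with invented ids and timestamps.
sample_files = [
    {"id": "old", "original_name": "sweep_results.parquet", "created_at": "2026-04-01T09:00:00Z"},
    {"id": "new", "original_name": "sweep_results.parquet", "created_at": "2026-05-01T09:00:00Z"},
    {"id": "only", "original_name": "sample_data.csv", "created_at": "2026-03-15T12:30:00Z"},
]

for name, f in latest_by_name(sample_files).items():
    print(name, "->", f["id"])
```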