query_files is the primary way to search and retrieve files stored in DataLab. You can filter by filename using glob patterns, by tags, or combine both. Every file object in the result includes an id field, which is the unique identifier you will use in later tutorials to download, delete, or feed files into a pipeline.
Setup¶
import gfhub
# Reads host and API key from ~/.gdsfactory/gdsfactoryplus.toml or environment variables.
# You can also pass them explicitly: gfhub.Client(host="http://...", api_key="...")
client = gfhub.Client()
Get all files¶
Calling query_files() with no arguments returns every file in your organization.
files = client.query_files()
print(f"Total files: {len(files)}")
for f in files[:5]:
    print(f" - {f['original_name']}")
Total files: 1766
- spirals_manifest.csv
- spirals.gds
- rings.gds
- wafer_map.png
- wafer_map.png
Inspect a file object¶
Each entry in the result is a dictionary with the following key fields:

| Field | Description |
|---|---|
| id | Unique identifier. Use this to download, delete, or trigger pipelines |
| name / original_name | Filename as uploaded |
| mime_type | Detected MIME type (e.g. text/csv, application/octet-stream) |
| status | File availability (e.g. Available) |
| file_size | File size in bytes |
| created_at | Upload timestamp (ISO 8601) |
| tags | Dict of tags keyed by tag name. Each tag has id, name, color, and optionally parameter_value for parameter tags |
| pipelines | List of pipelines this file is connected to. Each entry has an id and name (covered in the pipelines tutorial) |
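The fields above can be read like any plain dictionary. Here is a minimal sketch using a trimmed sample record in the shape returned by query_files() (the values are illustrative, not real IDs); note that only parameter tags carry a parameter_value:

```python
import json

# Trimmed sample record in the shape returned by query_files()
# (illustrative values, not real IDs)
record = {
    "id": "0000-example-id",
    "original_name": "spirals_manifest.csv",
    "mime_type": "text/csv",
    "file_size": 2525,
    "tags": {
        ".csv": {"name": ".csv", "color": "#06b6d4"},
        "project": {"name": "project", "parameter_value": "tutorial_spirals"},
    },
    "pipelines": [{"id": "0001-example-id", "name": "spiral_propagation_loss"}],
}

# Parameter tags carry a parameter_value; plain tags do not
project = record["tags"]["project"].get("parameter_value")
print(project)  # tutorial_spirals
print(record["tags"][".csv"].get("parameter_value"))  # None

# Pretty-print a full record, like the sample below
print(json.dumps(record, indent=2))
```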
{
"id": "019df3b5-4c9a-7692-a397-84adcafb7f99",
"name": "spirals_manifest.csv",
"original_name": "spirals_manifest.csv",
"mime_type": "text/csv",
"status": "Available",
"created_at": "2026-05-04T15:57:18.362846Z",
"file_size": 2525,
"tags": {
".csv": {
"id": "019d234b-e0f7-7901-a8b4-355e6c3cb2e9",
"name": ".csv",
"color": "#06b6d4"
},
"project": {
"id": "019d3dea-adf4-7101-aa1c-0c89b8ebe7af",
"name": "project",
"color": "#0ea5e9",
"parameter_value": "tutorial_spirals"
},
"runner": {
"id": "019daa71-3dd8-75f2-a28c-405aa78a9eb6",
"name": "runner",
"color": "#ef4444"
}
},
"pipelines": [
{
"id": "019df1ff-0e3f-7f71-aa82-c7559651cc3f",
"name": "spiral_propagation_loss"
},
{
"id": "019df1ff-c078-77c1-89c8-863d7c6e37cd",
"name": "rib_wafer_analysis"
},
{
"id": "019d3e8c-ef44-74c2-85b4-28be0081d0b3",
"name": "plot_parquet"
},
{
"id": "019df1ff-f27c-7603-b328-fb3eadd63605",
"name": "ridge_wafer_analysis"
},
{
"id": "019d4ee5-a5a5-76f3-be4e-2e4508b3b49b",
"name": "Test Plot - Alejandro"
},
{
"id": "019da9e2-1812-7d13-808f-077b937502e8",
"name": "debug_noop_pipe_5880f52e"
},
{
"id": "019df1ef-aa10-7740-8bfd-82710e1f0389",
"name": "plot_parquet_tutorial"
},
{
"id": "019df1ef-9eef-7551-9370-ad82ff490089",
"name": "plot_spiral_spectrum"
},
{
"id": "019df1ef-afcc-7f21-89be-f253124e630d",
"name": "plot_ring_spectrum"
},
{
"id": "019df1ef-aff0-7493-a06e-02422c2c86a8",
"name": "csv_to_parquet_pipeline_example"
},
{
"id": "019df1ef-f2ea-7bb0-aa44-2de295783fc9",
"name": "cutback_die_analysis"
},
{
"id": "019df1f0-1a86-7f63-8279-b07f07354aef",
"name": "iv_resistance_fit"
},
{
"id": "019df1f0-3775-7882-abf5-e409302db131",
"name": "rings_fsr_analysis"
},
{
"id": "019df1f0-c617-7f62-966e-ea22a97ecaa8",
"name": "die_fsr_aggregation"
},
{
"id": "019df1f3-1311-7281-b022-6572b003bad7",
"name": "wafer_fsr_aggregation"
},
{
"id": "019df1f1-9cad-7133-abb4-9c059a62cca4",
"name": "spiral_device_analysis"
},
{
"id": "019df1f3-3cc0-79f1-a92c-714269fcfdef",
"name": "die_sheet_resistance"
},
{
"id": "019df1fe-f384-7890-b599-f731b0bdaa83",
"name": "aggregate_die_analyses"
}
]
}
Filter by name¶
The name parameter supports glob patterns (case-insensitive). Use * to match any sequence of characters.
# Exact match (case-insensitive)
files = client.query_files(name="lattice.gds")
print(f"Exact match: {len(files)} file(s)")
for f in files:
    print(f" - {f['original_name']}")
Exact match: 0 file(s)
# Glob patterns
for pattern in ["*.gds", "waveguide*.csv"]:
    matches = client.query_files(name=pattern)
    print(f" '{pattern}' → {len(matches)} file(s)")
'*.gds' → 35 file(s)
'waveguide*.csv' → 1 file(s)
Filter by tags¶
Tags are labels attached to files at upload time. Simple tags are plain labels like "raw" or "reviewed". Parameter tags carry a value in the format "key:value", for example "wafer_id:wafer1". To query files that have a parameter tag regardless of its value, use just the key: "wafer_id". To filter for a specific value, use the full form: "wafer_id:wafer1".
Extension tags like .gds, .csv, and .parquet are applied automatically based on file type.
When you pass multiple tags, a file must have all of them to be returned.
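The AND semantics can be sketched locally. The filter below is a hypothetical re-implementation for illustration only (the server does this for you); the records mirror the tag structure query_files() returns:

```python
# Hypothetical local records mirroring the tag structure query_files() returns
records = [
    {"original_name": "sample_data.csv",
     "tags": {".csv": {}, "wafer_id": {"parameter_value": "1"}}},
    {"original_name": "rings.gds", "tags": {".gds": {}}},
]

def matches_all(record, wanted_tags):
    """True only if the record carries every requested tag (AND semantics)."""
    for tag in wanted_tags:
        key, _, value = tag.partition(":")
        entry = record["tags"].get(key)
        if entry is None:
            return False
        # The "key:value" form also requires the parameter value to match
        if value and entry.get("parameter_value") != value:
            return False
    return True

hits = [r["original_name"] for r in records if matches_all(r, [".csv", "wafer_id:1"])]
print(hits)  # ['sample_data.csv']
```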
# Filter by file extension tag (auto-applied at upload)
for ext in [".gds", ".csv", ".parquet"]:
    matches = client.query_files(tags=[ext])
    print(f" '{ext}' → {len(matches)} file(s)")
'.gds' → 35 file(s)
'.csv' → 23 file(s)
'.parquet' → 481 file(s)
# Filter by a named tag (applied manually at upload)
files = client.query_files(tags=["components"])
print(f"Files tagged 'components': {len(files)}")
for f in files[:5]:
    print(f" - {f['original_name']}")
Files tagged 'components': 409
- cutback_device_400.png
- cutback_device_16.png
- cutback_device_16.png
- cutback_device_816.png
- cutback_device_16.png
# Parameter tag — any value (query by tag name alone)
files = client.query_files(tags=["wafer_id"])
print(f"Files with any wafer_id: {len(files)}")
for f in files:
    tag = f["tags"].get("wafer_id", {})
    wafer_value = tag.get("parameter_value", "?")
    print(f" - {f['original_name']} (wafer_id={wafer_value})")
# Parameter tag — exact value
files = client.query_files(tags=["wafer_id:1"])
print(f"\nFiles with wafer_id=1: {len(files)}")
for f in files:
    print(f" - {f['original_name']}")
Files with any wafer_id: 16
- sweep_results.parquet (wafer_id=wafer1)
- measurement_example.csv (wafer_id=wafer1)
- sweep_results.parquet (wafer_id=wafer1)
- measurement_example.csv (wafer_id=wafer1)
- sweep_results.parquet (wafer_id=wafer1)
- measurement_example.csv (wafer_id=wafer1)
- sweep_results.parquet (wafer_id=wafer1)
- measurement_example.csv (wafer_id=wafer1)
- sweep_results.parquet (wafer_id=wafer1)
- measurement_example.csv (wafer_id=wafer1)
- sweep_results.parquet (wafer_id=wafer1)
- measurement_example.csv (wafer_id=wafer1)
- sweep_results.parquet (wafer_id=wafer1)
- measurement_example.csv (wafer_id=wafer1)
- sample_data.csv (wafer_id=1)
- waveguide_transmission.csv (wafer_id=2)
Files with wafer_id=1: 1
- sample_data.csv
Combine name and tag filters¶
Both filters can be used together. Only files that match the name pattern and have all the specified tags will be returned.
# CSV measurement files for a specific wafer
files = client.query_files(name="*.csv", tags=["wafer_id:1"])
print(f"Found {len(files)} file(s):")
for f in files:
    tag_names = list(f["tags"].keys())
    print(f" - {f['original_name']} (id={f['id']}, tags={tag_names})")
Found 1 file(s):
- sample_data.csv (id=019d6283-90fe-7c70-8602-8f6fc2c0e7ba, tags=['.csv', 'test', 'wafer_id'])