Naming Conventions

Employing a unique naming strategy per resource ensures that the spec is followed uniformly regardless of metadata producer.

Jobs and Datasets have their own namespaces, job namespaces being derived from schedulers and dataset namespaces from datasources.

Dataset Naming

A dataset, or table, is organized according to a producer, namespace, database and (optionally) schema.

Data Store	Type	Namespace	Name	Format
Athena	Warehouse	Host	Catalog, Database, Table	awsathena://athena.{region_name}.amazonaws.com/{catalog}.{database}.{table}
Azure Cosmos DB	Warehouse	Host, Database	Schema, Table	azurecosmos://{host}/dbs/{database}/colls/{table}
Azure Data Explorer	Warehouse	Host	Database, Table	azurekusto://{host}.kusto.windows.net/{database}/{table}
Azure Synapse	Warehouse	Host, Port, Database	Schema, Table	sqlserver://{host}:{port};database={database}/{schema}.{table}
BigQuery	Warehouse	bigquery	Project ID, dataset, table	bigquery://{project id}.{dataset name}.{table name}
Cassandra	Warehouse	Host, Port	Keyspace, Table	cassandra://{host}:{port}/{keyspace}.{table}
MySQL	Warehouse	Host, Port	Database, Table	mysql://{host}:{port}/{database}.{table}
Postgres	Warehouse	Host, Port	Database, Schema, Table	postgres://{host}:{port}/{database}.{schema}.{table}
Redshift	Warehouse	Host, Port	Database, Schema, Table	redshift://{cluster_identifier}.{region_name}:{port}/{database}.{schema}.{table}
Snowflake	Warehouse	account identifier (composite of organization name and account name)	Database, Schema, Table	snowflake://{organization name}-{account name}/{database}.{schema}.{table}
Trino	Warehouse	Host, Port	Catalog, Schema, Table	trino://{host}:{port}/{catalog}.{schema}.{table}
ABFSS (Azure Data Lake Gen2)	Data lake	container, service	path	abfss://{container name}@{service name}/{path}
DBFS (Databricks File System)	Distributed file system	workspace	path	hdfs://{workspace name}/{path}
GCS	Blob storage	bucket	path	gs://{bucket name}/{path}
HDFS	Distributed file system	Namenode host and port	path	hdfs://{namenode host}:{namenode port}/{path}
Kafka	distributed event streaming platform	bootstrap server host and port	topic	kafka://{bootstrap server host}:{port}/{topic name}
Local file system	File system	Host	Path	file://{host}/{path}
S3	Blob Storage	bucket name	path	s3://{bucket name}/{path}
WASBS (Azure Blob Storage)	Blob Storage	container, service	path	wasbs://{container name}@{service name}/{path}

Job Naming

A Job is a recurring data transformation with inputs and outputs. Each execution is captured as a Run with corresponding metadata. A Run event identifies the Job it instances by providing the job’s unique identifier. The Job identifier is composed of a Namespace and Name. The job name is unique within its namespace.

Producer	Formula	Example
Airflow	namespace + DAG + task	airflow-staging.orders_etl.count_orders
SQL	namespace + name	gx.validate_datasets

Run Naming

Runs are named using client-generated UUIDs. The OpenLineage client is responsible for generating them and maintaining them throughout the duration of the runcycle.

from openlineage.client.run import Run
from openlineage.client.uuid import generate_new_uuid
run = Run(str(generate_new_uuid()))

Why Naming Matters

Naming enables focused insight into data flows, even when datasets and workflows are distributed across an organization. This focus enabled by naming is key to the production of useful lineage.

Naming Conventions

Dataset Naming​

Job Naming​

Run Naming​

Why Naming Matters​

Additional Resources​

Dataset Naming

Job Naming

Run Naming

Why Naming Matters

Additional Resources