Glossary
A list of terms and definitions
- Source: A data source (files, databases, etc.).
- Pack: A microprogram that analyzes a source.
- Metrics: Results of a pack's analysis on a source. Example: completeness=0.8
- Worker: An instance of the CLI that executes packs on data sources. Workers are deployed on-premises and are the only components with direct access to your data.
- Routine: A recurring execution of a pack on a source, scheduled with a cron expression.
- Scope: The part of a source (dataset, table, column, cell, item, etc.) to which a metric or recommendation refers.
- Ticket: A task assigned to one or more people, describing the actions to be performed. Also referred to as an "Issue".
- Project: An organizational entity that groups information about a set of sources so that teams can collaborate on their quality assurance.
- Report: A page presenting a set of information about the quality of one or more sources linked to a project. Exportable and shareable.
- Studio: The AI-powered conversational interface for exploring, investigating, and documenting data quality. Supports multiple LLM providers and streaming responses.
- LLM Configuration: A configuration entry defining which Large Language Model provider and model to use for Studio's AI features. Supports OpenAI, Azure OpenAI, Anthropic, Mistral, Ollama, and generic OpenAI-compatible APIs.
- MCP Server: A Model Context Protocol server that extends Studio's capabilities with custom external tools. MCP servers are configured per organization and communicate via JSON-RPC 2.0.
- Alert: A notification triggered when data quality scores fall below defined thresholds. Alerts can be configured per source and silenced when needed.
- Certification: A workflow for formally certifying the quality status of data sources based on defined criteria and quality metrics.
- Curation Plan: A structured plan that aligns teams around data quality challenges by defining the actions and responsibilities for improving data quality.
- Domain: The business domain associated with a data source (e.g., Finance, Marketing, Healthcare). Used for governance and catalog organization.
- Data Owner: The person or team responsible for the quality and governance of a data source.
- Habilitation: Granular permissions assigned to users on specific data sources, controlling access levels within the platform.
- Recommendation: A suggestion produced by a pack to improve the quality of a data source. Recommendations are attached to specific scopes.
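
To illustrate how Metrics, Scopes, and Alerts relate, here is a minimal Python sketch. It is not the platform's actual data model: the `Metric` class, the `breaches` function, the `table:orders` scope notation, and the 0.9 threshold are all hypothetical names and values chosen for the example. It also assumes that a single metric value can stand in for the quality score an alert is evaluated against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One result of a pack's analysis (hypothetical shape)."""
    key: str      # metric name, e.g. "completeness"
    value: float  # metric value, e.g. 0.8
    scope: str    # assumed notation for the part of the source concerned


def breaches(metrics, key, threshold):
    """Return the metrics named `key` whose value is below `threshold`."""
    return [m for m in metrics if m.key == key and m.value < threshold]


# Illustrative metrics, reusing the completeness=0.8 example from the glossary.
metrics = [
    Metric("completeness", 0.8, "table:orders"),
    Metric("completeness", 0.95, "table:customers"),
]

# An alert would concern only the scopes that fall below the threshold.
alerts = breaches(metrics, "completeness", threshold=0.9)
for m in alerts:
    print(f"ALERT {m.scope}: {m.key}={m.value}")
```

Attaching a scope to each metric is what lets a threshold breach be traced back to a specific table or column instead of the source as a whole.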