
Data Quality Workflow

QALITA Platform places data quality management at the core 🎯 of its operation. It provides a set of features that allow you to:

  1. Measure the quality 📏 of data and create analysis reports.
  2. Detect anomalies 🔍 and translate them into actionable items.
  3. Take corrective actions 🔧, track their execution, and measure their impact.

Data Quality Management Workflow

Measuring Data Quality (1/2/3)

1 Agent

The first step is to register an agent capable of communicating with the source.

2.1 Source

The second step is to register the source to be measured on the platform.

2.2 Pack

Next, choose an existing pack, or create a new one, to measure data quality.

Info: Packs are categorized based on the type of analysis they perform. For example, pack:quality:completeness measures data completeness.

QALITA Platform provides a set of default packs, but new ones can also be created.

QALITA Platform supports the following pack types:

  • completeness: Measures data completeness.
  • validity: Measures data validity.
  • accuracy: Measures data accuracy.
  • timeliness: Measures data timeliness.
  • consistency: Measures data consistency.
  • uniqueness: Measures data uniqueness.
  • reasonability: Measures data reasonability.
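To make two of these dimensions concrete, here is a minimal Python sketch of how completeness and uniqueness could be computed for one column. It is not QALITA Platform's implementation: the function names, the `MISSING` rule, and the sample rows are assumptions chosen for illustration.

```python
from typing import Any, Iterable, Mapping

# Values treated as "missing" in this sketch (an assumption, not QALITA's rule).
MISSING = (None, "")


def completeness(rows: Iterable[Mapping[str, Any]], column: str) -> float:
    """Share of rows whose `column` holds a non-missing value (0.0 to 1.0)."""
    rows = list(rows)
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in MISSING)
    return filled / len(rows)


def uniqueness(rows: Iterable[Mapping[str, Any]], column: str) -> float:
    """Share of distinct values among the non-missing values of `column`."""
    values = [row.get(column) for row in rows if row.get(column) not in MISSING]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample data: one missing email and one duplicated email.
sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]

email_completeness = completeness(sample_rows, "email")  # 3 of 4 rows filled
email_uniqueness = uniqueness(sample_rows, "email")      # 2 distinct of 3 values
```

A real pack would typically compute such scores per column and per dataset, but the ratio-based idea stays the same.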

3 Analysis

The final step is to use a pack to run an analysis on the source.

Analyses are run using a pack and an agent. Each analysis produces the following metadata, which is stored in QALITA Platform’s database:

  • metrics: quality indicators computed by the pack from the source data.
  • recommendations: improvement suggestions identified during the source analysis by the pack.
  • schema: a description of the data structure. This allows associating metrics and recommendations with data scopes.
  • logs: traces of the analysis, useful for understanding the computations performed by the pack.
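The four kinds of metadata above can be pictured as one record per analysis. The sketch below is a hypothetical Python model, not the platform's actual storage schema: every class name, field name, and scope string (such as `column:email`) is an assumption. It shows how a scope taken from the schema could tie metrics and recommendations to a part of the data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    key: str      # e.g. "completeness_score" (hypothetical key)
    value: float
    scope: str    # data scope the metric applies to, e.g. "column:email"


@dataclass
class Recommendation:
    message: str
    level: str    # severity label, e.g. "high", "warning", "info"
    scope: str


@dataclass
class AnalysisResult:
    pack: str
    source: str
    metrics: List[Metric] = field(default_factory=list)
    recommendations: List[Recommendation] = field(default_factory=list)
    schema: List[str] = field(default_factory=list)  # known data scopes
    logs: List[str] = field(default_factory=list)

    def metrics_for(self, scope: str) -> List[Metric]:
        """Return the metrics attached to one data scope."""
        return [m for m in self.metrics if m.scope == scope]


# Hypothetical result for a completeness analysis of a "customers" source.
result = AnalysisResult(
    pack="pack:quality:completeness",
    source="customers",
    metrics=[
        Metric("completeness_score", 0.75, "column:email"),
        Metric("completeness_score", 1.0, "column:id"),
    ],
    recommendations=[
        Recommendation("Fill in missing emails", "warning", "column:email"),
    ],
    schema=["column:id", "column:email"],
    logs=["analysis started", "analysis finished"],
)

email_metrics = result.metrics_for("column:email")
```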

Anomaly Detection (4)

Anomalies are detected by a data analyst or data manager using QALITA Platform’s graphical interface.
They are identified from the metrics and recommendations produced by analyses and then linked to a ticket.

4 Projects

For better organization, it is possible to create projects that group sources and analyses.

A project includes:

  • One or more data sources registered on the platform.
  • Data quality analysis reports.
  • Issues associated with the project’s data sources.

This is especially useful for data migration projects or research projects involving multiple data sources.

See the Project page for more details.

4.1 Reports

Reports provide a way to visualize the metadata of sources linked to a project.

They are generated by QALITA Platform and can be configured to display metrics and recommendations from one or more packs, on one or more sources.
They are also shareable, helping facilitate collaboration and information sharing between project stakeholders.

See the Reports page for more details.

4.2 Issues

Issues represent anomalies detected by a data analyst or data manager.
They are associated with a data source and created from the metrics and recommendations of analysis reports.

Recommendations are categorized by severity:

  • High: Data is unusable.
  • Warning: Data is usable but with risks.
  • Info: Data is usable but with limited risks.
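As an illustration of how these severity levels could drive triage, the following Python sketch orders recommendations so that the most severe come first. The `Severity` enum, its numeric ranks, and the sample recommendations are assumptions made for this example, not part of the platform's API.

```python
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    """Severity levels from the list above, ranked so a higher value is more severe."""
    INFO = 1      # data is usable, limited risks
    WARNING = 2   # data is usable, but with risks
    HIGH = 3      # data is unusable


def triage(recommendations: List[Tuple[str, Severity]]) -> List[Tuple[str, Severity]]:
    """Return recommendations sorted from most to least severe."""
    return sorted(recommendations, key=lambda rec: rec[1], reverse=True)


# Hypothetical recommendations produced by an analysis.
recommendations = [
    ("Column 'notes' is rarely filled", Severity.INFO),
    ("Primary key contains duplicates", Severity.HIGH),
    ("Some dates are in the future", Severity.WARNING),
]

ordered = triage(recommendations)
```

Sorting this way puts the recommendations most likely to need an issue at the top of the list.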

Issues allow tracking anomalies and their corrective actions.
They can be managed directly within the platform or via a project management tool such as Jira or GitLab thanks to QALITA Platform’s integration with these tools.

Corrective Actions (5)

There are two possible types of anomalies:

  • Human-related anomaly (e.g., incorrect entry, misconfiguration). The data is wrong at the source and must be corrected.
  • Technical anomaly (e.g., transformation issue). The data is wrong after transformation and must be transformed, excluded, or fixed in ETL programs.

5.1 Technical

If the anomaly is technical, QALITA Platform does not directly correct it.
The ETL program causing the anomaly must be fixed (by a data engineer).

However, the correction process can be tracked by a data analyst or data manager using Issues, after which a new analysis can be launched to verify the fix.
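Verifying a fix amounts to comparing the metrics of the analysis run before the ETL correction with those of the analysis run after it. A minimal Python sketch of that comparison follows. The metric names and values are hypothetical, and the platform may present this comparison differently.

```python
from typing import Dict


def compare_analyses(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return the change (after minus before) for each metric present in both analyses."""
    shared = before.keys() & after.keys()
    return {key: round(after[key] - before[key], 6) for key in sorted(shared)}


# Hypothetical metrics from the analyses run before and after the ETL fix.
before_fix = {"completeness": 0.75, "uniqueness": 0.9}
after_fix = {"completeness": 1.0, "uniqueness": 0.9}

deltas = compare_analyses(before_fix, after_fix)
# A positive delta means the metric improved after the fix.
improved = [key for key, delta in deltas.items() if delta > 0]
```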

5.2 Human

If the anomaly is human-related, QALITA Platform allows direct correction.

The original source is copied, and corrective actions are applied to the copy, keeping the original source intact.
A new version of the source is then created and analyzed to verify that the corrective actions were properly applied.
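The copy-then-correct principle can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the source fits in memory as a list of rows. The `correct_copy` function and the `(row_index, column, new_value)` correction format are hypothetical and do not describe how the platform stores versions.

```python
import copy
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Correction = Tuple[int, str, Any]  # (row index, column name, corrected value)


def correct_copy(source_rows: List[Row], corrections: List[Correction]) -> List[Row]:
    """Apply corrections to a deep copy, leaving the original rows untouched."""
    working = copy.deepcopy(source_rows)
    for row_index, column, new_value in corrections:
        working[row_index][column] = new_value
    return working


# Hypothetical original source with one missing value entered by mistake.
original = [
    {"id": 1, "country": "FR"},
    {"id": 2, "country": ""},
]

new_version = correct_copy(original, [(1, "country", "DE")])
```

After the corrections are applied, `new_version` can be analyzed again while `original` remains available for comparison.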

To learn more about correcting human-related anomalies, see the Studio page.