Internship Project/Data Labeling and Update Tool 2025. 2. 24.

(25.02.18) User Scenario

Key Features

Data Retrieval & Access
- Retrieve data by unique ID or fetch all data.
- Search and filter datasets based on various criteria.
- View the latest update log for specific data upon retrieval.
Data Modification & Labeling
- Update values of fields.
  - Modify entire field values.
  - Replace words (or column values) with labels.
- Perform CRUD operations on labels (create, update, delete annotations).
Logging & Version Control
- Track modifications with detailed logs (who modified, when, and what changed).
- Retrieve previous versions of modified data.
- Revert to an earlier version if necessary.
Role-Based Access Control
- Log in with assigned roles (reviewer, admin).
- Access only assigned datasets based on user roles.
- Register users with different roles and manage access permissions.
Data Organization & Assigning
- Sort and group datasets based on various criteria.
- Admin assigns dataset groups (and specific rows) to users.
- Allow multiple users to label and modify data simultaneously.
Reference Template Uploading
- Upload reference template data (e.g., no_sql_template, sql_template) for easy access and usage.
- Admin can upload new templates or modify existing ones.
- Associate templates with specific datasets for quick reference during labeling and data modification.
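
The Logging & Version Control items above (who modified, when, and what changed; retrieving and reverting to earlier versions) could be sketched roughly as follows. This is only an illustration under stated assumptions: `ChangeLog`, `SampleStore`, and all method names are hypothetical, and an in-memory dictionary stands in for whatever database the tool ends up using.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ChangeLog:
    """One log entry: who modified which field, when, and what changed."""
    sample_id: str
    field_name: str
    old_value: Any
    new_value: Any
    modified_by: str
    modified_at: datetime


@dataclass
class SampleStore:
    """Hypothetical in-memory stand-in for the real data store."""
    samples: dict[str, dict[str, Any]] = field(default_factory=dict)
    logs: list[ChangeLog] = field(default_factory=list)

    def update_field(self, sample_id, field_name, new_value, user):
        """Change one field and append a log entry describing the change."""
        sample = self.samples[sample_id]
        entry = ChangeLog(sample_id, field_name, sample.get(field_name),
                          new_value, user, datetime.now(timezone.utc))
        sample[field_name] = new_value
        self.logs.append(entry)
        return entry

    def history(self, sample_id):
        """All log entries for one sample, oldest first."""
        return [e for e in self.logs if e.sample_id == sample_id]

    def latest_log(self, sample_id):
        """Most recent log entry for a sample, or None if never modified."""
        entries = self.history(sample_id)
        return entries[-1] if entries else None

    def version_at(self, sample_id, version):
        """Rebuild the sample as it was after `version` changes (0 = original)."""
        state = dict(self.samples[sample_id])
        for entry in reversed(self.history(sample_id)[version:]):
            state[entry.field_name] = entry.old_value
        return state

    def revert_to(self, sample_id, version, user):
        """Revert by writing old values back as new, logged changes."""
        target = self.version_at(sample_id, version)
        current = self.samples[sample_id]
        for name, value in target.items():
            if current.get(name) != value:
                self.update_field(sample_id, name, value, user)
```

In this sketch a revert is itself logged as a new change, so the history is never rewritten. A field that did not exist in the original version is restored as `None` rather than removed, which is a simplification.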

Role	Description
admin	Manages data organization (assigning data to users), user roles, logging, reports, and access control.
reviewer (labeler)	Responsible for labeling, modifying, and reviewing data to ensure accuracy and consistency.
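
One possible way to express the two roles as an access check is sketched below. The names `Role`, `User`, `can_access`, and `assign_group` are hypothetical, and the rule that an admin can open every dataset group is an assumption made for the example, not something these notes decide.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    REVIEWER = "reviewer"


@dataclass
class User:
    name: str
    role: Role
    # IDs of the dataset groups assigned to this user (hypothetical field).
    assigned_group_ids: set[str] = field(default_factory=set)


def can_access(user: User, group_id: str) -> bool:
    """Assumption: admins see every group, reviewers only their assigned ones."""
    if user.role is Role.ADMIN:
        return True
    return group_id in user.assigned_group_ids


def assign_group(actor: User, target: User, group_id: str) -> None:
    """Only an admin may assign a dataset group to a user."""
    if actor.role is not Role.ADMIN:
        raise PermissionError(f"{actor.name} is not allowed to assign datasets")
    target.assigned_group_ids.add(group_id)
```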

(Grouping & Sorting Datasets, Assigning to Users)

Sort sample data by status (updated or not updated), Sample ID, SQL Template Type (number), or other custom criteria.
Select the specific samples.
Create dataset groups from the selected samples and dataset metadata (e.g., Assigned User, Due Date, Description) for specific annotation tasks.
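
The sort, select, and group steps above might look like the following sketch. `Sample`, `DatasetGroup`, `sort_samples`, and `create_group` are hypothetical names, and the field set is limited to the criteria and metadata mentioned in these notes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Sample:
    sample_id: int
    sql_template_type: int
    updated: bool = False  # status: has the sample been updated yet?


@dataclass
class DatasetGroup:
    name: str
    assigned_user: str
    due_date: date
    description: str
    sample_ids: list[int] = field(default_factory=list)


def sort_samples(samples, by=("updated", "sql_template_type", "sample_id")):
    """Sort by the given attribute names, in priority order.

    With the default order, not-yet-updated samples come first
    because False sorts before True.
    """
    return sorted(samples, key=lambda s: tuple(getattr(s, name) for name in by))


def create_group(name, selected, assigned_user, due_date, description=""):
    """Bundle the selected samples and the metadata into a dataset group."""
    return DatasetGroup(name, assigned_user, due_date, description,
                        [s.sample_id for s in selected])
```

For example, an admin could sort all samples, take the first few that are not yet updated, and pass them to `create_group` together with the reviewer name and due date.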

Select a dataset group to monitor.
Check the progress status (Updated/Not Updated) of each sample within the dataset.
Filter or sort the dataset to display the status of its samples.
Confirm the dataset.

Select a reviewer/labeler to track their assigned datasets.
Check the status (Updated/Not Updated) of each dataset group assigned to the user.
Leave comments for the user, or request a re-label/re-update.
Confirm the dataset.
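
The two monitoring flows above (per dataset group and per reviewer) both reduce to counting Updated/Not Updated samples. A minimal sketch, assuming each group's status is available as a mapping from sample ID to an updated flag; the function names and the data layout are hypothetical.

```python
def group_progress(statuses: dict[int, bool]) -> dict:
    """Summarize one dataset group; `statuses` maps sample ID -> updated?"""
    total = len(statuses)
    updated = sum(statuses.values())
    return {
        "updated": updated,
        "not_updated": total - updated,
        # An empty group is not treated as complete.
        "complete": total > 0 and updated == total,
    }


def pending_samples(statuses: dict[int, bool]) -> list[int]:
    """Filter view: IDs of samples that are still Not Updated, in ID order."""
    return sorted(sid for sid, done in statuses.items() if not done)


def reviewer_overview(assignments: dict[str, dict[str, dict[int, bool]]],
                      reviewer: str) -> dict[str, dict]:
    """Progress of every dataset group assigned to one reviewer.

    `assignments` maps reviewer name -> group name -> sample statuses.
    """
    return {group: group_progress(statuses)
            for group, statuses in assignments.get(reviewer, {}).items()}
```

Whether a dataset may be confirmed only once every sample is updated is not settled in these notes, so the `complete` flag is informational only.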

(Reviewing & Updating Assigned Data)

Select a dataset and retrieve its samples.
Select a specific sample of the dataset.
View the existing column field values.
Update the values:
- Full modification (replacing the entire value).
- Labeling (replacing column values with predefined labels).
- Pass/Skip/Confirmed (no need to modify or update).
Review the modifications and verify query correctness.
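
The three update options above can be illustrated with a small sketch. The `Action` enum, the `apply_action` function, and the bracketed label format such as `[COLUMN_1]` are assumptions made for the example; the notes do not define a label format.

```python
import re
from enum import Enum
from typing import Optional


class Action(Enum):
    FULL = "full"    # replace the entire value
    LABEL = "label"  # replace words/column values with predefined labels
    PASS = "pass"    # no modification needed


def apply_action(value: str, action: Action, *,
                 new_value: Optional[str] = None,
                 labels: Optional[dict] = None) -> str:
    """Return the field value after applying one reviewer action."""
    if action is Action.PASS:
        return value
    if action is Action.FULL:
        if new_value is None:
            raise ValueError("full modification requires new_value")
        return new_value
    # LABEL: replace whole-word matches in a single pass, so text that was
    # already replaced by a label is never matched again.
    if not labels:
        return value
    words = sorted(labels, key=len, reverse=True)  # prefer longer matches
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, lambda m: labels[m.group(0)], value)
```

For instance, labeling `SELECT name FROM users` with `{"name": "[COLUMN_1]", "users": "[TABLE_1]"}` replaces only the whole words `name` and `users` and leaves the SQL keywords untouched.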

Should labels be created, updated, and deleted individually?
Should all labels be uploaded in advance in bulk (at once) and then used for labeling tasks?
Should labels (column names, etc.) be generated from templates to avoid the need for manual uploads?
How can related labels be easily selected on the front-end?

(25.02.26) ERD - Simplified (0)	2025.02.26
(25.02.20) ERD (Event Sourcing Based) (0)	2025.02.24
(25.02.19) ERD (draft) (0)	2025.02.24
(25.02.11) Open Source Data Labeling & Updating Tool Ideation (1)	2025.02.24