Internship Project/Data Labeling and Update Tool 2025. 2. 24.

(25.02.18) User Scenario

Key Features

  • Data Retrieval & Access
    • Retrieve data by unique ID or fetch all data.
    • Search and filter datasets based on various criteria.
    • View the latest update log for specific data upon retrieval.
  • Data Modification & Labeling
    • Update values of fields.
      • Modify entire field values.
      • Replace words (or column values) with labels.
    • Perform CRUD operations on labels (create, read, update, and delete annotations).
  • Logging & Version Control
    • Track modifications with detailed logs (who modified, when, and what changed).
    • Retrieve previous versions of modified data.
    • Revert to an earlier version if necessary.
  • Role-Based Access Control
    • Log in with assigned roles (reviewer, admin).
    • Access only assigned datasets based on user roles.
    • Register users with different roles and manage access permissions.
  • Data Organization & Assigning
    • Sort and group datasets based on various criteria.
    • Admin assigns dataset groups (and specific rows) to users.
    • Allow multiple users to label and modify data simultaneously.
  • Reference Template Uploading
    • Upload reference template data (e.g., no_sql_template, sql_template) for easy access and usage.
    • Admin can upload new templates or modify existing ones.
    • Associate templates with specific datasets for quick reference during labeling and data modification.
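
As a rough illustration of the Logging & Version Control feature, the sketch below keeps every change as a new snapshot so that earlier versions can be retrieved or restored. The class names (`Version`, `VersionedRecord`), the field names, and the in-memory storage are assumptions made for illustration only; the actual tool may store logs differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Version:
    values: dict[str, Any]   # full snapshot of the field values
    modified_by: str         # who modified
    modified_at: datetime    # when


@dataclass
class VersionedRecord:
    """Hypothetical in-memory record; not the tool's real storage model."""
    record_id: str
    history: list[Version] = field(default_factory=list)

    def current(self) -> dict[str, Any]:
        # Return a copy so callers cannot mutate the stored snapshot.
        return dict(self.history[-1].values)

    def update(self, user: str, changes: dict[str, Any]) -> None:
        # Merge the changes into a copy of the latest snapshot.
        base = self.current() if self.history else {}
        base.update(changes)
        self.history.append(Version(base, user, datetime.now(timezone.utc)))

    def revert(self, user: str, version_index: int) -> None:
        # Reverting appends a new version, so the log is never rewritten.
        old = self.history[version_index].values
        self.history.append(Version(dict(old), user, datetime.now(timezone.utc)))


record = VersionedRecord("sample-1")
record.update("admin", {"question": "How many users?", "sql": "SELECT 1"})
record.update("reviewer1", {"sql": "SELECT COUNT(*) FROM users"})
record.revert("admin", 0)  # restore the first version
```

Appending on revert, rather than truncating the history, keeps the "who, when, what" trail intact even for the revert itself.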

User Scenario

  • Personas (by role)

| Role | Description |
|------|-------------|
| admin | Manages data organization (assigning data to users), user roles, access control, logging, and reports. |
| reviewer (labeler) | Responsible for labeling, modifying, and reviewing data to ensure accuracy and consistency. |
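
The two roles could be expressed as a small permission check. This is a minimal sketch, assuming that admins can see every dataset and that reviewers only see datasets assigned to them; the names `Role`, `can_access`, and `can_assign` are hypothetical.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    REVIEWER = "reviewer"


def can_access(role: Role, user_id: str, dataset_assignees: set[str]) -> bool:
    """Assumed rule: admins see every dataset, reviewers only assigned ones."""
    if role is Role.ADMIN:
        return True
    return user_id in dataset_assignees


def can_assign(role: Role) -> bool:
    """Assumed rule: only admins group datasets and assign them to users."""
    return role is Role.ADMIN
```

For example, `can_access(Role.REVIEWER, "r2", {"r1"})` is false because `r2` is not among the assignees.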

Admin Scenario

(Grouping & Sorting Datasets, Assigning to Users)

Step 1: Logging In & Selecting Data

  1. Log in with administrator credentials (ID/PW).
  2. Retrieve all available datasets and samples.
  3. Select a dataset.

Step 2: Sorting & Grouping Data

  1. Sort sample data by status (updated or not), sample ID, SQL template type (number), or other custom criteria.
  2. Select specific samples.
  3. Create dataset groups from the selected samples, with dataset metadata (e.g., assigned user, due date, description), for specific annotation tasks.
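
The sorting and grouping steps above might look like the following sketch. The `Sample` and `DatasetGroup` classes, their field names, the sort order, and the example values (including the due date) are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Sample:
    sample_id: int
    template_type: int      # SQL template type (number)
    updated: bool = False   # status: was the sample updated or not


@dataclass
class DatasetGroup:
    name: str
    samples: list[Sample]
    assigned_users: list[str] = field(default_factory=list)
    due_date: Optional[date] = None
    description: str = ""


def sort_samples(samples: list[Sample]) -> list[Sample]:
    # Not-yet-updated samples first, then by template type, then by sample ID.
    return sorted(samples, key=lambda s: (s.updated, s.template_type, s.sample_id))


samples = [Sample(3, 2, True), Sample(1, 2), Sample(2, 1)]
ordered = sort_samples(samples)

# Group every sample that uses template type 2 into one annotation task.
group = DatasetGroup(
    name="template-2-batch",
    samples=[s for s in ordered if s.template_type == 2],
    due_date=date(2025, 3, 1),  # made-up example date
    description="Samples using SQL template type 2",
)
```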

Step 3: Assigning Data to Users

  1. Select a dataset group.
  2. Assign it to specific users (reviewers/labelers).
  3. Confirm assignments.
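
One possible way to model the assign-then-confirm flow is to stage assignments and only expose them to reviewers once confirmed. The `AssignmentBoard` class and the staging behavior are assumptions; the scenario itself does not say what happens between assignment and confirmation.

```python
from collections import defaultdict


class AssignmentBoard:
    """Hypothetical in-memory tracker of dataset group assignments."""

    def __init__(self) -> None:
        self._pending: dict[str, set[str]] = defaultdict(set)
        self._confirmed: dict[str, set[str]] = defaultdict(set)

    def assign(self, group_name: str, user_ids: list[str]) -> None:
        # Stage the assignment; reviewers do not see it yet.
        self._pending[group_name].update(user_ids)

    def confirm(self, group_name: str) -> None:
        # Move staged users into the confirmed set for this group.
        self._confirmed[group_name] |= self._pending.pop(group_name, set())

    def groups_for(self, user_id: str) -> list[str]:
        # Only confirmed assignments are returned.
        return sorted(g for g, users in self._confirmed.items() if user_id in users)


board = AssignmentBoard()
board.assign("batch-a", ["r1", "r2"])
board.assign("batch-b", ["r1"])
board.confirm("batch-a")
```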

Step 4: Monitoring & Tracking Assignment Completion

  1. Select a dataset group to monitor.
  2. Check the progress status (Updated/Not Updated) of each sample within the dataset.
  3. Filter or sort the dataset to display the status of its samples.
  4. Confirm the dataset.
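
Progress monitoring for one dataset group could be summarized as below. This sketch assumes each sample's status is a simple Updated/Not Updated flag keyed by sample ID; the function names are hypothetical.

```python
def progress(statuses: dict[int, bool]) -> dict[str, float]:
    """Summarize Updated/Not Updated counts for one dataset group.

    `statuses` maps a sample ID to True (Updated) or False (Not Updated).
    """
    total = len(statuses)
    updated = sum(statuses.values())
    return {
        "updated": updated,
        "not_updated": total - updated,
        "percent_done": round(100 * updated / total, 1) if total else 0.0,
    }


def pending_ids(statuses: dict[int, bool]) -> list[int]:
    # Filter to the samples that still need work, sorted by sample ID.
    return sorted(sid for sid, done in statuses.items() if not done)
```

An admin view could call `progress` for the summary row and `pending_ids` to list the samples that still block confirmation.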

Step 5: Monitoring & Tracking User Assignment Completion

  1. Select a reviewer/labeler to track their assigned datasets.
  2. Check the status (Updated/Not Updated) of each dataset group assigned to the user.
  3. Optionally leave comments for the user or request a re-label/re-update.
  4. Confirm the dataset.

 

Reviewer (Labeler) Scenario

(Reviewing & Updating Assigned Data)

Step 1: Log in & Accessing Assigned Data

  1. Log in with assigned credentials (ID/PW).
  2. Navigate to the user's assigned datasets.
  3. Retrieve the datasets assigned by the admin.
  4. Optionally search or filter datasets by template number, status, metadata, etc.
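
Searching and filtering assigned datasets might be sketched as a function that applies only the criteria the reviewer actually provided. `DatasetInfo`, its fields, and the keyword match on the description are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetInfo:
    name: str
    template_no: int
    status: str            # "Updated" or "Not Updated"
    description: str = ""  # stands in for free-form metadata


def filter_datasets(
    datasets: list[DatasetInfo],
    template_no: Optional[int] = None,
    status: Optional[str] = None,
    keyword: Optional[str] = None,
) -> list[DatasetInfo]:
    """Return datasets matching every criterion that was provided."""
    result = []
    for d in datasets:
        if template_no is not None and d.template_no != template_no:
            continue
        if status is not None and d.status != status:
            continue
        if keyword is not None and keyword.lower() not in d.description.lower():
            continue
        result.append(d)
    return result


assigned = [
    DatasetInfo("batch-a", 1, "Updated", "Join queries"),
    DatasetInfo("batch-b", 2, "Not Updated", "Aggregation queries"),
    DatasetInfo("batch-c", 2, "Updated", "Nested join queries"),
]
```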

Step 2: Updating & Labeling Data & Checking Validation

  1. Select a dataset and retrieve its samples.
  2. Select a specific sample from the dataset.
  3. View the existing column field values.
  4. Update the values:
    • Full modification (replacing the entire value).
    • Labeling (replacing column values with predefined labels).
    • Pass/Skip/Confirmed (no modification or update needed).
  5. Review the modifications and verify query correctness.
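
The three update options in step 4 could be modeled as modes of a single function. This is a sketch under assumptions: the `UpdateMode` names, the `<COLUMN>`/`<TABLE>` label format, and whole-word matching are illustrative choices, not decisions recorded in this scenario.

```python
import re
from enum import Enum
from typing import Optional


class UpdateMode(Enum):
    FULL = "full"    # replace the entire value
    LABEL = "label"  # replace matching words with predefined labels
    PASS = "pass"    # confirmed as-is, nothing to change


def apply_update(
    value: str,
    mode: UpdateMode,
    new_value: str = "",
    labels: Optional[dict[str, str]] = None,
) -> str:
    if mode is UpdateMode.FULL:
        return new_value
    if mode is UpdateMode.LABEL:
        for word, label in (labels or {}).items():
            # Whole-word match so "age" does not rewrite "page".
            pattern = rf"\b{re.escape(word)}\b"
            # A function replacement avoids backslash handling in the label.
            value = re.sub(pattern, lambda _m, label=label: label, value)
        return value
    return value  # PASS: leave the value untouched


query = "SELECT name FROM users WHERE age > 30"
labeled = apply_update(
    query, UpdateMode.LABEL, labels={"name": "<COLUMN>", "users": "<TABLE>"}
)
# labeled == "SELECT <COLUMN> FROM <TABLE> WHERE age > 30"
```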

Step 3: Submitting Updates to Admin

  1. Submit the updated samples of the dataset.
  2. The system logs the modification details (who, when, and what changed).
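
The modification log written on submission could record a field-level diff alongside the user and timestamp. The function name `build_log_entry` and the entry layout are assumptions, shown only to make the "who, when, contents" idea concrete.

```python
from datetime import datetime, timezone
from typing import Any


def build_log_entry(
    user: str, before: dict[str, Any], after: dict[str, Any]
) -> dict[str, Any]:
    """Record who changed a sample, when, and which fields changed."""
    changes = {
        key: {"old": before.get(key), "new": after.get(key)}
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }
    return {
        "modified_by": user,
        "modified_at": datetime.now(timezone.utc).isoformat(),
        "changes": changes,  # only fields whose value actually differs
    }


entry = build_log_entry(
    "reviewer1",
    before={"sql": "SELECT 1", "question": "How many users?"},
    after={"sql": "SELECT COUNT(*) FROM users", "question": "How many users?"},
)
```

Storing only the changed fields keeps each log entry small, at the cost of needing the earlier snapshot to rebuild a full previous version.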

Discussion Points: Labeling Feature

  • Should labels be created, updated, and deleted individually?
  • Should all labels be uploaded in bulk in advance and then used for labeling tasks?
  • Should labels (column names, etc.) be generated from templates to avoid the need for manual uploads?
  • How can related labels be easily selected on the front-end?