bugfree Icon
interview-course
interview-course
interview-course
interview-course
interview-course
interview-course
interview-course
interview-course

Data Interview Question

Dynamic Customer Address Database

bugfree Icon

Hello, I am bugfree Assistant. Feel free to ask me for any question related to this problem

Answer

Designing a database schema to accommodate changes in customers' address information over time is crucial for maintaining data integrity and supporting historical data analysis. Here, we will explore various strategies to design such a schema effectively, focusing on the concept of Slowly Changing Dimensions (SCD).

1. Using Slowly Changing Dimension (SCD) Type 2

SCD Type 2 is ideal for tracking historical data over time. It involves adding a new row to the table whenever a customer's address changes, maintaining a full history of address changes.

Schema Design:

  • Customer Table:

    • customer_id (INT, Primary Key)
    • customer_name (VARCHAR)
    • other_customer_details
  • Customer Address Table:

    • address_id (INT, Primary Key)
    • customer_id (INT, Foreign Key)
    • street (VARCHAR)
    • city (VARCHAR)
    • state (VARCHAR)
    • postal_code (VARCHAR)
    • country (VARCHAR)
    • start_date (DATE)
    • end_date (DATE, NULL for current address)
    • active_flag (BOOLEAN, 1 for current, 0 for historical)

Advantages:

  • Keeps a complete history of address changes.
  • Allows for historical analysis and tracking customer movement over time.

Disadvantages:

  • Increases data storage requirements due to multiple rows per customer.
  • Requires careful querying to retrieve the current address.

2. Using a Separate Historical Address Table

This approach involves maintaining a separate table for historical addresses while keeping only the current address in the primary address table.

Schema Design:

  • Customer Table:

    • customer_id (INT, Primary Key)
    • customer_name (VARCHAR)
    • current_address_id (INT, Foreign Key)
  • Address Table:

    • address_id (INT, Primary Key)
    • street (VARCHAR)
    • city (VARCHAR)
    • state (VARCHAR)
    • postal_code (VARCHAR)
    • country (VARCHAR)
  • Historical Address Table:

    • history_id (INT, Primary Key)
    • customer_id (INT, Foreign Key)
    • address_id (INT, Foreign Key)
    • start_date (DATE)
    • end_date (DATE)

Advantages:

  • Simplifies queries for current addresses.
  • Reduces the complexity of the primary address table.

Disadvantages:

  • Requires additional joins to access historical data.

3. Using SCD Type 3

SCD Type 3 involves adding new columns to the table to store previous address information.

Schema Design:

  • Customer Table:
    • customer_id (INT, Primary Key)
    • customer_name (VARCHAR)
    • current_street (VARCHAR)
    • previous_street (VARCHAR)
    • current_city (VARCHAR)
    • previous_city (VARCHAR)
    • current_state (VARCHAR)
    • previous_state (VARCHAR)
    • current_postal_code (VARCHAR)
    • previous_postal_code (VARCHAR)
    • current_country (VARCHAR)
    • previous_country (VARCHAR)
    • last_updated (DATE)

Advantages:

  • Simple to implement for limited historical tracking.
  • No data duplication.

Disadvantages:

  • Limited to tracking only the most recent change.
  • Can become unwieldy with multiple address changes.

Conclusion

The choice of schema depends on the specific requirements for historical data analysis and the complexity you are willing to manage. SCD Type 2 is generally preferred for comprehensive historical tracking, while separate historical tables or SCD Type 3 might be suitable for scenarios with less frequent address changes or simpler requirements.