Normalisation for professors in organisations with SQL Server

Table of Content

  1. Chapter 1 - Project Overview

  2. Chapter 2 - Import data

  3. Chapter 3 - Create tables and populate them

  4. Chapter 4 - Set constraints and relationship

  5. Chapter 5 - Final two tips

Chapter 1 Project Overview

Normalisation is a process to store data while reducing data redundancy and keeping data consistency at the same time, it follows the common 3 Normal Forms and 2 Strict Normal Forms plus the Boyce-Codd Normal Form. This project shows how to split a complete table into a database with appropriate relationships and constraints in SQL Server.

This project mainly focuses on logic and techniques. There are two aspects this project show:

  • The method of extending the dimension tables (Snowflake Schema)
  • The achievement of referential integrity (Build primary key and foreign key relationship, Choose actions on update or delete)

You can find my complete SQL codes on my GitHub.

The data comes from Trenton Mckinney's GitHub.

Chapter 2 Import data

SQL Server 2019 Import and Export Data Wizard imported an Excel table into my SQL Server Database. Then, have a look at the original table.

It has 8 attributes:

  • firstname

  • lastname

  • university

  • university_shortname

  • university_city

  • function

  • organization

  • organization_sector

To achieve the map of tables below, the project will use a snowflake schema to organise these attributes and add two auto-incrementing ids for dim_professors and fact_affiliations, respectively.

-- Example to add surrgate key while setting it as primary key
ALTER TABLE dim_professors
ADD id INT IDENTITY(1,1) PRIMARY KEY;

Chapter 3 Create tables and populate them

This database mainly records what role professors play in each organisation, a professor can have different functions in an organisation of one university for the fact table, and professors, universities and organisations have their own dimension tables.

So, it has 1 fact table and 3 dimension tables

  • affiliations (fact table)

  • professors (dimension table)

  • organisations (dimension table)

  • universities (dimension table)

There are 3 things to consider when creating tables

  • datatype

  • nullable

  • primary key

datatype

Most attributes are varchar(), the maximum should be decided by the maximum length of the current record but should leave some space.

The column university_shortname should follow the fixed format and use char(3).

nullable

Most attributes should not be null except the [function] attribute.

primary key

For affiliations and professors, the primary key can use created surrogate key.

For organisations and universities, the names of organisations and universities are good choices to identify each row.

With CREATE, VARCHAR(), and PRIMARY KEY, the four empty tables are created in SQL Server.

With INSERT INTO, SELECT DISTINCT columns FROM the original table, they are populated.

Chapter 4 Set constraints and relationships

The final relationships should be

  • M:1 relationship from professors to universities with university_shortname column

  • N: M relationship from professors to organisations with affiliations table, divide it into 2 steps:

    • 1:N relationship from professors to affiliations with professor_id column

    • 1:M relationship from organisations to affiliations with organisation column

The key part of this chapter is to talk about how to achieve the connection between organisations and professors with affiliations. Because a professor can have many functions in an organisation while an organisation can have many professors, there is a many-to-many relationship. A method is to use an intermediate table to become the bridge.

Step 1 a. Add a professor_id column in the affiliations

b. Reference the professor_id from professors to affiliations;

c. Set when a deletion is attempted in the referenced table (i.e., professors), it will not be allowed if there are any dependent rows in the referencing table (i.e., affiliations)

ALTER TABLE affiliations
ADD professor_id INT,
    CONSTRAINT fk_affiliations_professors FOREIGN KEY (professor_id)
    REFERENCES professors(id) ON DELETE NO ACTION;

Step 2 Populate professor_id using the professors table

UPDATE affiliations
SET professor_id = professors.id
FROM affiliations
JOIN professors ON affiliations.firstname = professors.firstname
                AND affiliations.lastname = professors.lastname;

Step 3 Drop firstname and lastname

ALTER TABLE affiliations
DROP COLUMN firstname;

ALTER TABLE affiliations
DROP COLUMN lastname;

Step 4 1:M relationship from organisations to affiliations with referential integrity

ALTER TABLE affiliations
ADD CONSTRAINT fk_affiliations_organisations FOREIGN KEY (organisation) 
    REFERENCES organisations(organisation) ON DELETE NO ACTION;

The final diagram shows the complete database

Chapter 5 Final Two tips

Tip 1 Referential Integrity: ON DELETE NO ACTION

To keep data consistent, there are some options for referenced tables when a row is deleted. The project chose NO ACTION, it prevents the deletion of rows if it is referenced. Test it with

Tip 2 INFORMATION_SCHEMA

The INFORMATION_SCHEMA can check tables, columns, constraints and other metadata.

SELECT constraint_name, table_name, constraint_type
FROM INFORMATION_SCHEMA.table_constraints;

The constraints will be extracted in a table, super convenient, right?