How to Design a Database

database icons

Lesson 1 has two parts. This is part 1 where we create some student and course data and consider how we can design a database for this data.

Part 1: Design our student database in this article

Part 2: Learn about the relationships in the database.

Storing Data Using Excel

A spreadsheet is an easy way to see records held in rows. To start with a simple task we use Excel to hold some simple data.

Below is a table of students registered on course at university.

Student ID

Student Name

Course Code

Course Name

Registration Date

652415501

Zhang Wei

888342

Management Information Systems

2024-2

652415502

Chen Mei

887231

Business Analytics

2024-2

652415502

Chen Mei

888342

Management Information Systems

2024-1

652415503

Li Hua

881234

Marketing Principles

2024-2

652415503

Li Hua

888342

Management Information Systems

2024-1

652415504

Somchai Prasert

888341

Data Science Essentials

2024-2

652415504

Somchai Prasert

888342

Management Information Systems

2024-2

652415505

Wanchai Suthep

883123

Financial Accounting

2024-1

652415505

Wanchai Suthep

881234

Marketing Principles

2024-2

652415506

Liu Xiaojun

884567

Operations Management

2024-1

652415506

Liu Xiaojun

888341

Data Science Essentials

2024-2

652415507

Kritsada Anan

885678

Human Resource Management

2024-2

652415508

Wang Fang

886345

Supply Chain Management

2024-1

652415508

Wang Fang

888342

Management Information Systems

2024-1

652415509

Napat Siriwat

887654

International Business

2024-2

652415509

Napat Siriwat

888341

Data Science Essentials

2024-2

652415510

Zhang Min

888120

Corporate Strategy

2024-1

652415510

Zhang Min

884567

Operations Management

2024-1

652415511

Li Xiaomei

888342

Management Information Systems

2024-2

652415511

Li Xiaomei

888341

Data Science Essentials

2024-1

This data is stored in a csv file called register.csv. For reference, this data was created using chatGPT with the following prompts:

First prompt:

create 10 students with the following data in a table of rows for each student

   – Student ID

   – Student Name

   – Course Enrolled (course code)

   – Course Enrolled (course name)

   – Registration Date (year, semester, e.g. 2024-2)

Here is some example data for the format (mainly Thai & Chinese students)

652415501 Zhang Wei 888342 Management Information systems 2024-2

Follow-up prompt:

this is ok for data but there are no courses with more than one student, can you provide a course in both semesters for example, and give details of students that are registered on more than one course, and courses that have more than one student – provide as many rows as needed

Instructions:

each group

  • download the test data
  • add some more rows
  • add group members to courses

Storing Data Issues

think about these questions

  • what would happen if you have to register thousands of students on courses at one time?
  • would there be any errors?
  • how would you find any errors?
  • Would this be easy?
  • Could you guarantee that you could find 100% of the errors?
  • can you add students with an wrong student number?
  • can you add courses with an wrong course code?
  • can you create duplicate records incorrectly?
  • can you register students with an invalid student number, or a wrong course code?
  • can you re-register students by mistake?

we solve these problems by using a database. In relational databases

  • we solve these error issues by using keys
  • in tables
  • we use a unique key
  • such as a student number
  • called the primary key

So we know there will be issues if we continue using a spreadsheet to hold our data. We can list the benefits of using a database later, but for now lets focus on what we need to do to design a database.

Database Design

Instructions:

each group

  • create a student table
  • create a course table

Database Concepts

  • each row in a table, is an entity instance
  • Each column is an attribute
  • one value for each attribute of each entity instance
  • one value is in one row and one column

Entity-Relationship Diagrams (ERD)

  • a table
  • based on one object / concept
  • called an entity
  • e.g. student
  • an entity is a single data model
ER diagram entity
  • attributes are like column names
  • try not to use spaces
  • it is common to use an underscore (e.g. first_name)
  • list the attributes
ER diagram with an entity and attributes

Data Values

the value can be:

  • numeric
  • string of characters
  • date \ time
  • other basic data types
  • ‘NULL’ value (e.g. missing value)
    • not the same as empty or zero

wait

  • we used id_number
  • but what does this mean?
  • is it easy to know what entity it is for?
  • how about student_id
  • we choose a primary key
  • or, create one
  • in the ER diagram we underline the primary key
  • it is possible to use multiple attributes
  • but stick with the easy route for now
ER diagram with a primary key

does your design & test data match?

for example

  • if you have name, does your test data only have one field for name?
  • if you have first name and surname, does your test data have two fields for name?
  • always save your test data
  • make versions in case you want to go back
  • everyone always makes mistakes – this is how we get better!

Continue to the Database Relationships

second part of lesson 1: Database relationships

Leave a Reply