How to Design a Database
Lesson 1 has two parts. This is part 1 where we create some student and course data and consider how we can design a database for this data.
Part 1: Design our student database in this article
Part 2: Learn about the relationships in the database.
Storing Data Using Excel
A spreadsheet is an easy way to see records held in rows. To start with a simple task we use Excel to hold some simple data.
Below is a table of students registered on course at university.
Student ID | Student Name | Course Code | Course Name | Registration Date |
652415501 | Zhang Wei | 888342 | Management Information Systems | 2024-2 |
652415502 | Chen Mei | 887231 | Business Analytics | 2024-2 |
652415502 | Chen Mei | 888342 | Management Information Systems | 2024-1 |
652415503 | Li Hua | 881234 | Marketing Principles | 2024-2 |
652415503 | Li Hua | 888342 | Management Information Systems | 2024-1 |
652415504 | Somchai Prasert | 888341 | Data Science Essentials | 2024-2 |
652415504 | Somchai Prasert | 888342 | Management Information Systems | 2024-2 |
652415505 | Wanchai Suthep | 883123 | Financial Accounting | 2024-1 |
652415505 | Wanchai Suthep | 881234 | Marketing Principles | 2024-2 |
652415506 | Liu Xiaojun | 884567 | Operations Management | 2024-1 |
652415506 | Liu Xiaojun | 888341 | Data Science Essentials | 2024-2 |
652415507 | Kritsada Anan | 885678 | Human Resource Management | 2024-2 |
652415508 | Wang Fang | 886345 | Supply Chain Management | 2024-1 |
652415508 | Wang Fang | 888342 | Management Information Systems | 2024-1 |
652415509 | Napat Siriwat | 887654 | International Business | 2024-2 |
652415509 | Napat Siriwat | 888341 | Data Science Essentials | 2024-2 |
652415510 | Zhang Min | 888120 | Corporate Strategy | 2024-1 |
652415510 | Zhang Min | 884567 | Operations Management | 2024-1 |
652415511 | Li Xiaomei | 888342 | Management Information Systems | 2024-2 |
652415511 | Li Xiaomei | 888341 | Data Science Essentials | 2024-1 |
This data is stored in a csv file called register.csv. For reference, this data was created using chatGPT with the following prompts:
First prompt:
create 10 students with the following data in a table of rows for each student
– Student ID
– Student Name
– Course Enrolled (course code)
– Course Enrolled (course name)
– Registration Date (year, semester, e.g. 2024-2)
Here is some example data for the format (mainly Thai & Chinese students)
652415501 Zhang Wei 888342 Management Information systems 2024-2
Follow-up prompt:
this is ok for data but there are no courses with more than one student, can you provide a course in both semesters for example, and give details of students that are registered on more than one course, and courses that have more than one student – provide as many rows as needed
Instructions:
each group
- download the test data
- add some more rows
- add group members to courses
Storing Data Issues
think about these questions
- what would happen if you have to register thousands of students on courses at one time?
- would there be any errors?
- how would you find any errors?
- Would this be easy?
- Could you guarantee that you could find 100% of the errors?
- can you add students with an wrong student number?
- can you add courses with an wrong course code?
- can you create duplicate records incorrectly?
- can you register students with an invalid student number, or a wrong course code?
- can you re-register students by mistake?
we solve these problems by using a database. In relational databases
- we solve these error issues by using keys
- in tables
- we use a unique key
- such as a student number
- called the primary key
So we know there will be issues if we continue using a spreadsheet to hold our data. We can list the benefits of using a database later, but for now lets focus on what we need to do to design a database.
Database Design
Instructions:
each group
- create a student table
- create a course table
Database Concepts
- each row in a table, is an entity instance
- Each column is an attribute
- one value for each attribute of each entity instance
- one value is in one row and one column
Entity-Relationship Diagrams (ERD)
- a table
- based on one object / concept
- called an entity
- e.g. student
- an entity is a single data model
- attributes are like column names
- try not to use spaces
- it is common to use an underscore (e.g. first_name)
- list the attributes
Data Values
the value can be:
- numeric
- string of characters
- date \ time
- other basic data types
- ‘NULL’ value (e.g. missing value)
- not the same as empty or zero
wait
- we used id_number
- but what does this mean?
- is it easy to know what entity it is for?
- how about student_id
- we choose a primary key
- or, create one
- in the ER diagram we underline the primary key
- it is possible to use multiple attributes
- but stick with the easy route for now
does your design & test data match?
for example
- if you have name, does your test data only have one field for name?
- if you have first name and surname, does your test data have two fields for name?
- always save your test data
- make versions in case you want to go back
- everyone always makes mistakes – this is how we get better!
Continue to the Database Relationships
second part of lesson 1: Database relationships