I have three tables with common fields -
The last two tables have some of the
users fields. Here's an example:
I'm wondering if it's better to:
a)use one table with many NULL values
b)use three tables
The question is less about "one table with many NULL versus three tables" that about the data structure. The real question is how other tables in your data structure will refer to these entities.
This is a classic situation, where you have "one-of" relationships and need to represent them in SQL. There is a "right" way, and that is to have four tables:
- "users" (I can't think of a good name) would encompass everyone and have a unique id that could be referenced by other tables
- "normal", "admins", "guests" each of which would have a 1-0/1 relationship with "users"
This allows other tables to refer to any of the three types of users, or to users in general. This is important for maintaining proper relationships.
You have suggested two shortcuts. One is that there is no information about "normal" users so you dispense with that table. However, this means that you can't refer to "normal" users in another table.
Often, when the data structures are similar, the data is simply denormalized into a single row (as in your solution a).
All three approach are reasonable, in the context of applications that have specific needs. As for performance, the difference between having additional
NULLABLE columns is generally minimal when the data types are variable length. If a lot of the additional columns are numeric, then these occupy real space even when
NULL, which can be a factor in designing the best solution.
In short, I wouldn't choose between the different options based on the premature optimization of which might be better. I would choose between them based on the overall data structure needed for the database, and in particular, the relationships that these entities have with other entities.
Then there is the question of the
id that you use for the specialized tables. There are two ways of doing this. One is to have a separate id, such as
GuestId for each of these tables. Another column in each table would be the
This makes sense when other entities have relationships with these particular entities. For instance, "admins" might have a sub-system that describes rights and roles and privileges that they have, perhaps along with a history of changes. These tables (ahem, entities) would want to refer to an
AdminId. And, you should probably oblige by letting them.
If you don't have such tables, then you might still split out the Admins, because the 100 integer columns they need are a waste of space for the zillion other users. In that case, you can get by without a separate id.
I want to emphasize that you have asked a question that doesn't have a "best" answer in general. It does have a "correct" answer by the rules of normalization (that would be 4 tables with 4 separate ids). But the best answer in a given situation depends on the overall data model.