[KLUG Programming] questions on picking data types

Jamie McCarthy jamie at mccarthy.vg
Thu Jul 15 13:03:20 EDT 2004


awilliam at whitemice.org (Adam Tauno Williams) writes:

> > >> > If not, then it probably makes more sense to just store
> > >> > the 9-digit SSN as an INT.  That way when you need to
> > >> > retrieve a row ...
> > Oh, I can't WAIT for them to add a digit! :)
> 
> Thats actually not so funny, they've talked about doing so in the
> past.

I recall in the early 90s seeing a coding "tip" in Dr. Dobb's
Journal that advised C programmers on a great way to store phone
numbers.  Since all area codes have a 0 or 1 as their middle digit,
you just swap the first two numbers and it will fit into an unsigned
32-bit int!  So (616) 555-1212 becomes 1665551212.  I pity any
programmer who actually took this advice ten years ago (try this on
Kalamazoo's new area code).

But that's the beauty of SQL.  Databases are designed to allow for
data flexibility in this way.

If you store your SSNs as an INT UNSIGNED, and they add another
digit, you just do

    ALTER TABLE foo CHANGE COLUMN ssn ssn BIGINT UNSIGNED;

The disk churns for a few seconds, and voila!  You're done with the
backend portion (whether your UI is as easy to tweak is another
story).

If they add 50 more digits, you turn the column into a CHAR.  If
they make SSNs 300 digits long, turn it into TEXT.  It all happens
quite easily.  For someone who grew up on C, this flexibility is
like taking a 400-pound gorilla off my back.  So nice.

> And SSN numbers are already not unique.

Ouch, good point.  Don't make that a UNIQUE KEY.  And write your
code to expect multiple rows to come back if you search on SSN...


Rusty writes:

> I wonder if I use the string that I could put three fields on the
> form and constrain them for the appropriate number of characters
> and then concatenate them into the field on the write of the
> data?  I am just worried that I am going to get one that is typed
> in as 123456789 and another 123-45-6789 and another 123-456789
> and a forth 12-34-56789.  There are way to many permutations for
> efficient error trapping I think... well unless I strip out all
> hyphens and then re-input them as needed.  I think I would want
> to store the hyphens so that once I capture the SSN I do not have
> to continually write code to format the number properly every
> time I need to display it in a form or report.

You asked for advice, so here it is:  this is a really bad idea.

:)

Data storage should be unrelated to data formatting.  Another way to
say that is that formatting your data, both on input and output,
should be done explicitly, by code that is designed to format your
data.

Yes, there are a zillion ways to mistype a SSN -- or any other kind
of input.  Trapping such errors is the job of your input formatting
code, which ultimately needs to return either a consistently
formatted piece of data or an informative error.  _How_ that data is
formatted for storage should depend on the most logical way to
represent it internally, not on how it will appear on output.

And ditto for the output.  Write one function to format your
database's raw data into the format you need for output.  You only
need to write it once, that's how subroutines work.  :)  I don't
know if making it object-oriented would be right for your program,
but it definitely should be modular.

In any case, where your data in your DB needs now, or may need in
future, any kind of massaging either on input or output, it's not a
bad idea to write that subroutine anyway.  If SSNs do go to 10
digits and the standard format becomes ######-####, you will want to
change that in one place, not a hundred.

Proper conceptual separation of code and data will avoid a _lot_ of
headaches in the future.
-- 
  Jamie McCarthy
 http://mccarthy.vg/
  jamie at mccarthy.vg



More information about the Programming mailing list