Background: I'm working on a system where the developers seem to be using a function which executes a MYSQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).

I know this is a horrible practice (because concurrent requests will mess the records), and I'm trying to communicate that to the non-tech / management team, to which their response is...

"Oh okay, we'll only face this problem when we have 
(a) a lot of users, or 
(b) it'll only happen when two people try doing something
    at _exactly_ the same time"

I don't disagree with either point, and think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure a mechanism) to calculate how many users should be using the system before we start seeing messed up links.

Any mathematical insights into that? Again, I KNOW its a horrible practice, I just want to understand the variables in this situation...


Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!

Comments

It's MAX(bad). –

Written by webbiedave

Accepted Answer

The point is not if potential bad situations are likely. The point is if they are possible. As long as there's a non-trivial probability of the issue occurring, if it's known it should be avoided.

It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.

I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, the LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.

And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.

And right from the MySQL documentation on the subject:

You can generate sequences without calling LAST_INSERT_ID(), but the utility of 
using the function this way is that the ID value is maintained in the server as 
the last automatically generated value. It is multi-user safe because multiple 
clients can issue the UPDATE statement and get their own sequence value with the
SELECT statement (or mysql_insert_id()), without affecting or being affected by 
other clients that generate their own sequence values.
Written by ircmaxell
This page was build to provide you fast access to the question and the direct accepted answer.
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki