
Does the number of parking spaces in a parking garage follow Benford’s Law?

Updated: Dec 16, 2019

I recently stumbled across an old New York Times article that introduced me to Benford’s Law. Benford’s Law holds that in many large, naturally occurring sets of numbers, the leading digit is more likely to be small than large. A number is more likely to start with 1 than with 2, with 2 than with 3, with 3 than with 7, and so on. Specifically, the Law predicts that about 30.1% of numbers start with 1 and 17.6% start with 2. The share keeps falling all the way to 9, which leads only about 4.6% of the time.

The distribution of leading digits in a large set of numbers according to Benford’s Law
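Those percentages come from the standard form of the Law. The expected share of numbers with leading digit d is log10(1 + 1/d), for d from 1 to 9. The short Python sketch below reproduces the figures quoted above. The function name `benford_expected` is our own and is only for illustration.

```python
import math


def benford_expected() -> dict[int, float]:
    """Expected share of each leading digit 1-9 under Benford's Law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, share in benford_expected().items():
        print(f"{digit}: {share:.1%}")
```

Running it prints 30.1% for 1, 17.6% for 2, and 4.6% for 9. The nine shares sum to exactly 1 because the logs telescope to log10(10).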

The Law is named after the venerable Dr. Frank Benford, a physicist at General Electric who published it in 1938. The astronomer Simon Newcomb had noticed the same pattern back in 1881. The Law has since been applied in a number of areas, most notably fraud detection. It turns out that humans are not very good at generating fake or random data. That means we can check whether the distribution of leading digits in a data set matches what Benford’s Law predicts. If it doesn’t, that data set deserves a second look.
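Here is a minimal Python sketch of one common way to run that check, assuming integer inputs such as space counts. It takes each value’s leading digit, tallies the counts, and computes Pearson’s chi-square statistic against Benford’s expected shares. This is a generic technique, not necessarily the exact analysis we ran. The function names, the cutoff constant, and both sample data sets are hypothetical illustrations. With nine digit categories there are 8 degrees of freedom, and 15.507 is the standard 5% critical value for that case.

```python
import math
from collections import Counter

# Benford's expected share for each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CRITICAL_5PCT_DF8 = 15.507


def leading_digit(n: int) -> int:
    """Return the first digit of a non-zero integer, e.g. 347 -> 3."""
    if n == 0:
        raise ValueError("0 has no leading digit")
    return int(str(abs(n))[0])


def benford_chi_square(values: list[int]) -> float:
    """Pearson chi-square statistic of observed leading digits vs. Benford."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-zero value")
    stat = 0.0
    for digit, share in BENFORD.items():
        expected = total * share
        observed = counts.get(digit, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Hypothetical inputs: a Benford-like sequence and a narrow-range one.
    benford_like = [2 ** k for k in range(1, 1001)]
    narrow_range = list(range(51, 90))
    for name, data in [("powers of 2", benford_like), ("51-89", narrow_range)]:
        stat = benford_chi_square(data)
        print(f"{name}: chi-square = {stat:.2f}, flag = {stat > CRITICAL_5PCT_DF8}")
```

Powers of 2 are a classic Benford-conforming sequence and land far below the cutoff. Values confined to 51 through 89 can never start with 1, 2, 3, 4, or 9, so they fail badly. The chi-square approximation assumes reasonably large expected counts in every digit bucket, so results on small samples deserve a looser reading.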

So after learning about this intriguing Law, I (obviously) looked to apply it to parking data. (Personal aside: I am also constantly on the lookout for ways to interrupt the data scientists I work with and prove to them that this business professional is more numbers-savvy than they give me credit for.)

A glimpse into the Smarking ENG Slack channel. Minor losses in engineering productivity are justified by the small ego boost I get from talking with people smarter than me.

I quickly decided that Mike’s thumbs-up via Slack was not going to cut it; whatever he was working on would have to wait. There was a burning question on my mind that needed answering… does parking data follow Benford’s Law?

Mike, who handles our backend infrastructure, was the right guy to talk to. We quickly pulled up his terminal and started thinking about data sets that would be interesting to analyze. Number of parking transactions? Daily peak occupancy of a garage? Ultimately we settled on the number of parking spaces in a garage. Smarking has over 2,000 garages on our platform. That is a large enough data set for analysis, and we’re confident its values occur naturally rather than being confined to a narrow range. Daily peak occupancy would not have worked. The results would have been biased toward leading digits of 5, 6, 7, and 8, since average peak occupancy typically falls between 51% and 89%. After a couple of queries, sorting through the data, and removing some erroneous data points, we arrived at our answer: