Updated: Dec 16, 2019
I recently stumbled across an old New York Times article that introduced me to Benford’s Law. Benford’s Law holds that in many large, naturally occurring sets of numbers, the leading digit is more likely to be small than large, i.e. a number is more likely to start with 1 than 2, 2 than 3, 3 than 4, and so on. Specifically, numbers that start with 1 occur ~30.1% of the time and numbers that start with 2 occur ~17.6% of the time, with the share declining all the way to 9, which occurs roughly 4.6% of the time.
The distribution of leading digits in a large set of numbers according to Benford’s Law
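For the curious, those percentages come from a one-line formula: the expected share for leading digit d is log10(1 + 1/d). Here is a minimal Python sketch of it. The function name `benford_expected` is my own invention for illustration, not something from the article or from any library.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1 through 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The nine shares add up to 100%, since every positive number has to start with one of the digits 1 through 9.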
The law is named after the venerable Dr. Frank Benford, a physicist at General Electric who described it in 1938 (the astronomer Simon Newcomb had noticed the same pattern back in 1881), and it has since been applied in a number of capacities, most notably in fraud detection. It turns out that humans are not so great at generating fake or random data. We can apply Benford’s Law to a data set and find out if the distribution of leading digits follows the natural order of things. If it doesn’t, that specific data set deserves a second look.
So after learning about this intriguing Law I (obviously) looked to apply it to parking data. (Personal aside: I am also constantly on the lookout for ways to interrupt the data scientists I work with and prove to them that this business professional is more numbers savvy than they give me credit for.)
A glimpse into the Smarking ENG Slack channel…. Minor losses in engineering productivity are justified by the small ego boost I get from talking with people smarter than myself
I quickly decided that Mike’s thumbs up via Slack was not going to cut it — whatever he was working on would have to wait. There was a burning question on my mind that needed answering… does parking data follow Benford’s Law?
Mike, who handles our backend infrastructure, was the right guy to talk to. We quickly pulled up his terminal and started thinking about data sets that would be interesting to analyze: number of parking transactions? Daily peak occupancy of a garage? Ultimately we settled on the number of parking spaces in a garage. Smarking has over 2,000 garages on our platform, a data set large enough for analysis and one we’re confident is naturally distributed. Had we chosen daily peak occupancy, we would have had a bias toward leading digits 5, 6, 7, and 8, as average peak occupancies typically fall between 51% and 89%. After a couple of queries, sorting through the data, and removing some erroneous data points we arrived at our answer:
Distribution of parking garage space counts onboarded to Smarking, sample size > 2,000
Yes! Parking garage space counts do follow Benford’s Law! Huzzah! Based on more than 2,000 parking garages, 32.1% of garage space counts begin with 1, 17.4% of garage space counts begin with 2, 12.4% begin with 3… and so on. If we visually compare Benford’s distribution to Smarking’s distribution we get an eerily similar graph, see below.
The green bar represents the distribution of leading digits among garage space counts vs the blue bar representing Benford’s distribution.
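If you want to try this on your own numbers, the tally itself is simple. Below is a rough Python sketch, not the actual queries Mike ran (those aren't shown here). The function names and the `space_counts` list are made up purely for illustration and are not Smarking data.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first digit of a positive integer, e.g. 850 -> 8."""
    return int(str(n)[0])


def leading_digit_shares(values):
    """Share of positive values starting with each digit 1 through 9."""
    digits = [leading_digit(v) for v in values if v > 0]  # skip zero/negative entries
    if not digits:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical garage space counts, invented for this example only.
space_counts = [120, 85, 1500, 240, 310, 72, 199, 830, 45, 1020]

for digit, share in leading_digit_shares(space_counts).items():
    print(f"{digit}: {share:.1%}")
```

With only ten made-up garages the shares will be lumpy; the pattern only has a chance to show up with a much larger data set.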
How satisfying: math works! This obscure law I knew nothing about applies to the wild world of parking data! And what’s more, my data science credibility is at an all-time high! What a time to be alive.
Keen observers of the graph will notice one outlier: garage space counts are slightly more likely to start with 8 than with 7 (5.47% for 8 vs 5.35% for 7). This means garages are slightly more likely to have 80–89 or 800–899 spaces than 70–79 or 700–799. Smarking is currently unaware of any garages with 7,000 or 8,000 spaces, although maybe some airports have structures that large.
So I pose this question to the Smarking blog readers: is there a good explanation for this slight deviation from Benford’s Law? Or is Smarking’s sample size simply too small? To that end, if you own a parking structure with 70–79 spaces or 700–799 spaces, get in touch! We need you as a client! Dr. Benford needs you to be a Smarking client! The fundamentals of mathematics need you to be a Smarking client!
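One rough way to think about the sample size question is to ask how far the 7s and 8s sit from Benford’s prediction relative to ordinary sampling noise. The Python sketch below does that with a simple standard-error calculation. It assumes exactly 2,000 garages (the article only says "more than 2,000") and treats each garage as an independent draw; both are assumptions for illustration, and `z_score` is a hypothetical helper name.

```python
import math


def z_score(digit: int, observed_share: float, n: int) -> float:
    """Standard errors between an observed leading-digit share and Benford's."""
    expected = math.log10(1 + 1 / digit)                    # Benford's expected share
    std_error = math.sqrt(expected * (1 - expected) / n)    # binomial standard error
    return (observed_share - expected) / std_error


N_GARAGES = 2000  # assumption: the article only states "more than 2,000"

# Observed shares for 7 and 8 as reported in the article.
for digit, observed in {7: 0.0535, 8: 0.0547}.items():
    print(f"digit {digit}: z = {z_score(digit, observed, N_GARAGES):+.2f}")
```

Under those assumptions, both digits land within one standard error of Benford’s prediction, which would be consistent with plain sampling noise rather than anything special about garages with 70-something spaces. A proper goodness-of-fit test across all nine digits would be the more careful way to check.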
Okay, that’s all for now. As always, thanks for reading. We really appreciate you and your strange interest in parking data.
-Cassius & the Smarking Team