Approximating π Using the Monte Carlo Method in Postgres
Databases are commonly used as dumb storage bins for CRUD application data. However, this doesn’t do them justice: database systems such as PostgreSQL are much more capable and versatile than that, supporting a wide range of operations across many different data types.
The standard way of interfacing with Postgres is SQL – you know, that thing you may have briefly learned in webdev class where you SELECT
stuff FROM
a set of tables WHERE
some condition is met. But that’s a far cry from what you can achieve when taking advantage of the full feature set of this declarative and – surprise! – Turing complete programming language.
In this post, I’ll describe how to approximate with a fairly compact SQL1 query.
This post is loosely based on what I learned in the Advanced SQL lecture held by Torsten Grust in the summer of 2017 at the University of Tübingen. Take a look at the lecture slides for in-depth explanations and a wide array of examples.
Theory
Let’s make sure to understand2 the required math first: You might remember that the area of a circle with radius is
We can divide by to get an equation for :
That means that as long as we know the radius and find a way of estimating , we can estimate !
Now let’s imagine our circle centered in a tight square box:
Note that the square’s side length is , yielding the square area
If we generate random points3 in this square, we can count how many points fall inside the circle. That’s the Monte Carlo method – generating random samples and estimating based on the observed distribution:
This gives us an approximation of the fraction of the square that is occupied by the circle, which in turn lets us approximate :
That means that is approximately four times the number of points inside the circle divided by the total number of points within the bounds of our square. This is starting to sound easy to implement!
To make things a bit more straightforward later on, let’s agree to set and place the circle’s center at . This conveniently turns our square into a unit square:
Counting the random points in this example, we notice that out of points fall inside the circle. Let’s plug these values into the formula we derived above and see what we get:
Not too far off!
Implementation
Before showing you the query, here are some “advanced” SQL features that will come in handy:
- The
random()
function returns a pseudo-randomly generated floating-point value between4 0 and 1. -
Postgres5 happens to have built-in support for geometric shapes:
point(x, y)
represents a point with the givenx
andy
coordinates, andcircle(p, r)
creates a circle with radiusr
around the pointp
.We can thus create a random point using
point(random(), random())
. Our unit circle from above can be generated usingcircle(point(0.5, 0.5), 0.5)
. -
Natively representing shapes is not very useful without some common operations on them. For example, the
@>
operator checks if the shape given in the left argument envelops the right argument – you could read it as “contains”.We’ll use this to check if our unit circle contains each random point:
circle(point(0.5, 0.5), 0.5) @> point(random(), random())
returns a boolean ready to be used in our query’sWHERE
clause. -
In case you haven’t used it before:
generate_series(min, max)
generates a single-column table containing the range of numbers frommin
tomax
.We won’t actually need these numbers, instead we just want to ensure that the
WHERE
clause of our query is executedn
times, so by convention we’ll call the returned table_
in ourFROM
clause. - Postgres supports setting global parameters using
\set name value
and retrieving the value using:name
. We’ll use this to store our sample sizen
because we need to reference it twice in our query.
Putting all of this together and combining it with our approximation formula for , we end up with something along the lines of the following query:
\set n 1000000
SELECT 4 * count(*) :: float / :n AS pi
FROM generate_series(1, :n) AS _
WHERE circle(point(0.5, 0.5), 0.5) @> point(random(), random());
To run it, simply spin up psql
, paste the query, press return and bask in the glory of what we’ve achieved today:
$ psql
psql=# \set n 1000000
psql=# SELECT 4 * count(*) :: float / :n AS pi
psql-# FROM generate_series(1, :n) AS _
psql-# WHERE circle(point(0.5, 0.5), 0.5) @> point(random(), random());
+----------+
| pi |
+----------+
| 3.140892 |
+----------+
(1 row)
Time: 682.926 ms
Addendum: Accidental LaTeX Implementation
If you’re only here for the SQL query, go away.
The visualizations in the “theory” section above have been drawn in LaTeX/TikZ6, including the random points used to approximate in the SQL query.
At some point I realized that it wouldn’t be too hard to add a counter to the \foreach
loop that’s responsible for filling the points with different shades of gray depending on whether they fall inside7 the circle. This enables keeping track of the number of points inside the circle. Because the iteration count is known, computing the corresponding approximation for works the same way as in the SQL query’s SELECT
clause. Finally displaying the result as part of the drawing was trivial, as you can see once you scroll down a bit.
First, here’s the code:
\RequirePackage{luatex85}
\documentclass{standalone}
\usepackage{fontenc,unicode-math}
\setmainfont[Ligatures=TeX]{TeX Gyre Pagella}
\setmathfont[Ligatures=TeX]{TeX Gyre Pagella Math}
\usepackage{tikz}
\usetikzlibrary{calc}
\begin{document}
\scalebox{2}{
\begin{tikzpicture}[scale=3.5]
% axes
\draw [<->,thick] (0,1.2) node (yaxis) [above] {$y$}
|- (1.2,0) node (xaxis) [right] {$x$};
% draw circle
\coordinate (m) at (0.5,0.5);
\draw (m) circle (0.5cm) node (mlabel) [right] {$m$};
\fill[black] (m) circle (0.4pt);
\draw[dashed] (yaxis |- m) node[left] {$0.5$}
-| (xaxis -| m) node[below] {$0.5$};
\draw[dashed] (m) -- node[right] {$r$} (0.23,0.08);
% draw rectangle
\coordinate (one) at (1,1);
\draw (0,0) rectangle (one);
\draw[dashed] (yaxis |- one) node[left] {$1$}
-| (xaxis -| one) node[below] {$1$};
% draw random points
\pgfmathsetmacro{\i}{100}
\newcounter{inpoints}
\setcounter{inpoints}{0}
\pgfmathsetseed{3455632}
\def\incolor{gray!50!black}
\def\outcolor{gray!50!white}
\foreach \p in {1,...,\i} {
\pgfmathsetmacro{\x}{0.5*rand+0.5}
\pgfmathsetmacro{\y}{0.5*rand+0.5}
\pgfmathparse{(\x-0.5)^2+(\y-0.5)^2}
\pgfmathsetmacro{\dist}{\pgfmathresult}
\ifdim\dist pt < 0.25pt
\addtocounter{inpoints}{1}
\fill[fill=\incolor] (\x,\y) circle (0.25pt);
\else
\fill[fill=\outcolor] (\x,\y) circle (0.25pt);
\fi
}
\pgfmathparse{int(\i-\theinpoints)}
\pgfmathsetmacro{\theoutpoints}{\pgfmathresult}
\pgfmathparse{(4*\theinpoints/\i)}
\node[above of=m,yshift=1.15cm,xshift=0.18cm] {$\pi \approx \frac{4 \cdot \textcolor{\incolor}{\theinpoints}}{\textcolor{\outcolor}{\theoutpoints} + \textcolor{\incolor}{\theinpoints}} \approx \pgfmathresult$};
\end{tikzpicture}
}
\end{document}
Simply adjust the random seed 3455632
and optionally the iteration count \i
in the LaTeX document above, compile8 and you’ll observe a different distribution of the points and, most likely, a slightly different approximation of below the plot. You can also change the colors of the points to your liking:
Turning this visualization into an animation by making the points appear one by one (while continually adjusting the approximation) would be interesting. Consider that an exercise for the reader. 😉
-
Note that we’ll use some PostgreSQL-specific functions, however due to SQL’s Turing completeness, everything we do here is theoretically possible in any standards-compliant RDBMS. ↩
-
If points are too boring, you could do the same with darts. As long as you’re not too good at darts. ↩
-
As usual, the range is actually , meaning that is not included. As usual, this doesn’t really matter for the problem at hand. ↩
-
The excellent pdf2svg utility (which can be installed via Homebrew) was used to convert to a web-accessible format. ↩
-
Which can be tested by checking whether
(\x-0.5)^2+(\y-0.5)^2
(where0.5
is both the x and y coordinate of the circle’s center) is smaller than0.25
. ↩ -
If you remove lines 1 and 3-5, you can use any LaTeX engine, otherwise you’ll be constrained to LuaLaTeX. ↩