Approximating π Using the Monte Carlo Method in Postgres

Databases are commonly used as dumb storage bins for CRUD application data. However, this doesn’t do them justice: database systems such as PostgreSQL are much more capable and versatile than that, supporting a wide range of operations across many different data types.

The standard way of interfacing with Postgres is SQL – you know, that thing you may have briefly learned in webdev class where you SELECT stuff FROM a set of tables WHERE some condition is met. But that’s a far cry from what you can achieve when taking advantage of the full feature set of this declarative and – surprise! – Turing complete programming language.

In this post, I’ll describe how to approximate $\pi$ with a fairly compact SQL¹ query.

This post is loosely based on what I learned in the Advanced SQL lecture held by Torsten Grust in the summer of 2017 at the University of Tübingen. Take a look at the lecture slides for in-depth explanations and a wide array of examples.

Theory

Let’s make sure to understand² the required math first: You might remember that the area $A$ of a circle with radius $r$ is

A = \pi r^2.

We can divide by $r^2$ to get an equation for $\pi$ :

\pi = \frac{A}{r^2}

That means that as long as we know the radius and find a way of estimating $A$ , we can estimate $\pi$ !

Now let’s imagine our circle centered in a tight square box:

Note that the square’s side length is $2r$ , yielding the square area

B = (2r)^2 = 4r^2.

If we generate $n$ random points³ in this square, we can count how many points $m$ fall inside the circle. That’s the Monte Carlo method – generating random samples and estimating based on the observed distribution:

This gives us an approximation of the fraction of the square that is occupied by the circle, which in turn lets us approximate $\pi$ :

\begin{aligned} \frac{A}{B} \approx \frac{m}{n} &\Rightarrow \frac{\pi r^2}{4 r^2} \approx \frac{m}{n}\\ &\Rightarrow \frac{\pi}{4} \approx \frac{m}{n}\\ &\Rightarrow \pi \approx \frac{4m}{n} \end{aligned}

That means that $\pi$ is approximately four times the number of points inside the circle divided by the total number of points within the bounds of our square. This is starting to sound easy to implement!

To make things a bit more straightforward later on, let’s agree to set $r = 0.5$ and place the circle’s center at $m = (0.5, 0.5)$ . This conveniently turns our $2r \times 2r$ square into a unit square:

Counting the random points in this example, we notice that $m = 79$ out of $n = 100$ points fall inside the circle. Let’s plug these values into the formula we derived above and see what we get:

\pi \approx \frac{4m}{n} = \frac{4 \cdot 79}{100} = 3.16

Not too far off!

Implementation

Before showing you the query, here are some “advanced” SQL features that will come in handy:

The random() function returns a pseudo-randomly generated floating-point value between⁴ 0 and 1.
Postgres⁵ happens to have built-in support for geometric shapes: point(x, y) represents a point with the given x and y coordinates, and circle(p, r) creates a circle with radius r around the point p.

We can thus create a random point using point(random(), random()). Our unit circle from above can be generated using circle(point(0.5, 0.5), 0.5).
Natively representing shapes is not very useful without some common operations on them. For example, the @> operator checks if the shape given in the left argument envelops the right argument – you could read it as “contains”.

We’ll use this to check if our unit circle contains each random point: circle(point(0.5, 0.5), 0.5) @> point(random(), random()) returns a boolean ready to be used in our query’s WHERE clause.
In case you haven’t used it before: generate_series(min, max) generates a single-column table containing the range of numbers from min to max.

We won’t actually need these numbers, instead we just want to ensure that the WHERE clause of our query is executed n times, so by convention we’ll call the returned table _ in our FROM clause.
Postgres supports setting global parameters using \set name value and retrieving the value using :name. We’ll use this to store our sample size n because we need to reference it twice in our query.

Putting all of this together and combining it with our approximation formula for $\pi$ , we end up with something along the lines of the following query:

\set n 1000000

SELECT 4 * count(*) :: float / :n AS pi
FROM   generate_series(1, :n) AS _
WHERE  circle(point(0.5, 0.5), 0.5) @> point(random(), random());

To run it, simply spin up psql, paste the query, press return and bask in the glory of what we’ve achieved today:

$ psql
psql=# \set n 1000000
psql=# SELECT 4 * count(*) :: float / :n AS pi
psql-# FROM   generate_series(1, :n) AS _
psql-# WHERE  circle(point(0.5, 0.5), 0.5) @> point(random(), random());
+----------+
|    pi    |
+----------+
| 3.140892 |
+----------+
(1 row)

Time: 682.926 ms

Addendum: Accidental LaTeX Implementation

If you’re only here for the SQL query, go away.

The visualizations in the “theory” section above have been drawn in LaTeX/TikZ⁶, including the random points used to approximate $\pi$ in the SQL query.

At some point I realized that it wouldn’t be too hard to add a counter to the \foreach loop that’s responsible for filling the points with different shades of gray depending on whether they fall inside⁷ the circle. This enables keeping track of the number of points inside the circle. Because the iteration count is known, computing the corresponding approximation for $\pi$ works the same way as in the SQL query’s SELECT clause. Finally displaying the result as part of the drawing was trivial, as you can see once you scroll down a bit.

First, here’s the code:

\RequirePackage{luatex85}
\documentclass{standalone}
\usepackage{fontenc,unicode-math}
\setmainfont[Ligatures=TeX]{TeX Gyre Pagella}
\setmathfont[Ligatures=TeX]{TeX Gyre Pagella Math}
\usepackage{tikz}
\usetikzlibrary{calc}
\begin{document}

\scalebox{2}{
\begin{tikzpicture}[scale=3.5]
    % axes
    \draw [<->,thick] (0,1.2) node (yaxis) [above] {$y$}
                   |- (1.2,0) node (xaxis) [right] {$x$};
    % draw circle
    \coordinate (m) at (0.5,0.5);
    \draw (m) circle (0.5cm) node (mlabel) [right] {$m$};
    \fill[black] (m) circle (0.4pt);
    \draw[dashed] (yaxis |- m) node[left] {$0.5$}
               -| (xaxis -| m) node[below] {$0.5$};
    \draw[dashed] (m) -- node[right] {$r$} (0.23,0.08);
    % draw rectangle
    \coordinate (one) at (1,1);
    \draw (0,0) rectangle (one);
    \draw[dashed] (yaxis |- one) node[left] {$1$}
               -| (xaxis -| one) node[below] {$1$};
    % draw random points
    \pgfmathsetmacro{\i}{100}
    \newcounter{inpoints}
    \setcounter{inpoints}{0}
    \pgfmathsetseed{3455632}
    \def\incolor{gray!50!black}
    \def\outcolor{gray!50!white}
    \foreach \p in {1,...,\i} {
        \pgfmathsetmacro{\x}{0.5*rand+0.5}
        \pgfmathsetmacro{\y}{0.5*rand+0.5}
        \pgfmathparse{(\x-0.5)^2+(\y-0.5)^2}
        \pgfmathsetmacro{\dist}{\pgfmathresult}
        \ifdim\dist pt < 0.25pt
            \addtocounter{inpoints}{1}
            \fill[fill=\incolor] (\x,\y) circle (0.25pt);
        \else
            \fill[fill=\outcolor] (\x,\y) circle (0.25pt);
        \fi
    }
    \pgfmathparse{int(\i-\theinpoints)}
    \pgfmathsetmacro{\theoutpoints}{\pgfmathresult}
    \pgfmathparse{(4*\theinpoints/\i)}
    \node[above of=m,yshift=1.15cm,xshift=0.18cm] {$\pi \approx \frac{4 \cdot \textcolor{\incolor}{\theinpoints}}{\textcolor{\outcolor}{\theoutpoints} + \textcolor{\incolor}{\theinpoints}} \approx \pgfmathresult$};
\end{tikzpicture}
}

\end{document}

Simply adjust the random seed 3455632 and optionally the iteration count \i in the LaTeX document above, compile⁸ and you’ll observe a different distribution of the points and, most likely, a slightly different approximation of $\pi$ below the plot. You can also change the colors of the points to your liking:

Turning this visualization into an animation by making the points appear one by one (while continually adjusting the approximation) would be interesting. Consider that an exercise for the reader. 😉

Note that we’ll use some PostgreSQL-specific functions, however due to SQL’s Turing completeness, everything we do here is theoretically possible in any standards-compliant RDBMS. ↩
Bear with me here. ↩
If points are too boring, you could do the same with darts. As long as you’re not too good at darts. ↩
As usual, the range is actually $[0, 1)$ , meaning that $1$ is not included. As usual, this doesn’t really matter for the problem at hand. ↩
Other RDBMSes have similar features, see here or here. ↩
The excellent pdf2svg utility (which can be installed via Homebrew) was used to convert to a web-accessible format. ↩
Which can be tested by checking whether (\x-0.5)^2+(\y-0.5)^2 (where 0.5 is both the x and y coordinate of the circle’s center) is smaller than 0.25. ↩
If you remove lines 1 and 3-5, you can use any LaTeX engine, otherwise you’ll be constrained to LuaLaTeX. ↩