Digital Garden Visitor Counter

Estimated read time: 11 minutes
TL;DR: I created a visitor counter that runs in AWS Lambda and stores data in DynamoDB. It runs fast and uses little memory, and should fall within the AWS free tier. This was a fun little exercise!

When I started making this digital garden, I wanted it to have a late 90s-era visitor counter [1]. You know those little boxes that said something like, "You are visitor number 12345," on old GeoCities websites? Maybe I'm just showing my age here, but this is what one might look like:

I thought it would be fun to build one of these from scratch in Rust, since it would be a good opportunity to play around with Rust in an AWS Lambda Function URL with DynamoDB. Using these services should be cheap given the low amount of traffic I'm expecting on this site. In fact, as long as I don't get the hug of death, it will land well within the free tier.[2]

Since it is written in Rust, it can use the lowest memory Lambda configuration available (currently 128 MB), and should run blazingly fast. Using an ARM-based instance makes it even cheaper. At the time of writing, Lambda costs $0.0000000017 per 1 millisecond[3] with 128 MB of memory on ARM in US West 2. Converting that into more human comprehendible terms, if one instance of this Lambda ran continuously for a month, it would cost $4.41. Considering that's only after the free tier has been exhausted, that really shouldn't break the bank in the unlikely event I do get the hug of death.

Of course, it won't be running continuously, but only briefly when someone visits a web-page with it. How fast can we get? Let's talk more about how this thing will work before we get to that.

Design Decisions

First, how exactly should this thing behave? Ideally, the counter would count unique browser sessions, and reject all requests from bots, such as search engine crawlers. If it's embedded on multiple pages, and a visitor visits all them within, say, 2 hours, then it should only count that as one visit.

So what is available for tracking uniqueness? A Lambda Function URL is given a source IP address, and a User-Agent header string. The IP address is not unique, and can be shared by many people, such as from an organization, public WiFi, or a VPN. The User-Agent header is also not unique, but it adds slightly more uniqueness to the IP address, and does enable identifying well-behaved bots.

Of course, since a Lambda doesn't have persistent storage, something has to be stored in order to keep track of the semi-unique browser sessions. I don't want to store IP addresses or user agent strings, as I value privacy, and don't want any information that could be used to identify someone. Fortunately, with hashes, I can store just enough scrambled information to identify a session, but not enough to to pose any kind of privacy risk if the data is compromised.

To do this, I hashed the source IP and user agent together with MD5, and took the first 32-bits of that hash. Everyone just blanket says you shouldn't use MD5, but it's always about using the right tool for the job. I'm not going to claim MD5 is the best hash for this use-case, but it is very fast, and I don't care about cryptographic security or collision forgery in this instance. If someone wants to forge their user agent to mess with the Lambda, then more power to them!

I only take the first 32-bits of the hash since this is best-effort uniqueness tracking, and I don't want to store more than I need to. Storing less of a hash allows for storing more visitor hashes in a single DynamoDB item, which has a 400 KB limit. The time the visitor was last seen is also stored with the hash so that old entries can be pruned, and a session can be limited in duration.

Our visitor struct ends up being:

#[derive(Copy, Clone, Debug, serde::Deserialize, serde::Serialize)]
struct Visitor {
    /// The hashed IP and user agent.
    tag: u32,
    /// The last time the visitor was seen.
    last_seen: u32,
}

The last_seen can actually be a 32-bit integer to save on space by defining a non-standard time it is counting up from, say 2023-07-01 00:00:00 UTC. This would give us 4,294,967,295 seconds, or about 136 years of time to work with, making it work up until 2159. If my garden stays online that long, I'll be shocked. Or dead. 💀

One thing we need to be mindful of is how we prevent multiple concurrent Lambdas from clobbering each other, and for that, we need to think about the table design. DynamoDB has optimistic locking in the form of conditional expressions on the PutItem operation. What we can do is put a condition on the count itself when making an update, and if the count is not what we expect, then the update gets rejected. Then we just reload the item, apply our count increment again, and try again.

Given that, the item format can be the following:

counter name	count	recent visitors
default	1234	up to ~26,600 visitor hashes and times

After all that effort on making the visitor struct small (it should be 8-bytes in Rust), I lazily used Ciborium to serialize the visitors as CBOR. This brings the size of the struct up to 15-bytes when serialized where it really should be 8. So clearly, this can be improved by doing some manual binary serialization rather than using CBOR. 😅 But that's a problem for another day.

Blitting the Glyphs

Blitting, in software, originates from a function named BitBLT in 1975, and means combining two bitmaps together. I like to use this term since it sounds funny, but it seems like few people are familiar with it. So consider yourself familiar now if you weren't already.

I wanted the counter to be directly loadable as an image using the <img> HTML tag without requiring any client-side scripting. This means the Lambda needs to respond with image data using a Content-Type header. To do the rendering, I embedded an 8x16 pixel font for digits 0-9, and did manually blitting into a larger image buffer. This produces a small, pixelated, response in a very short order. Any desired styling can handled in CSS.

The font looks like the following in Rust code:

const GLYPH_WIDTH: usize = 8;
const GLYPH_HEIGHT: usize = 16;
const GLYPH_SIZE: usize = GLYPH_WIDTH * GLYPH_HEIGHT;
const GLYPH_COUNT: usize = 10;

// Foreground
const F: u8 = 1;

const GLYPH_BITMAPS: [[u8; GLYPH_SIZE]; GLYPH_COUNT] = [
    [ // 0
        0,0,0,F,F,0,0,0,
        0,0,F,F,F,F,0,0,
        0,F,F,0,0,F,F,0,
        0,F,0,0,0,0,F,0,
        F,F,0,0,0,0,F,F,
        F,F,0,0,0,0,F,F,
        F,F,0,0,0,0,F,F,
        F,F,0,0,F,0,F,F,
        F,F,0,F,0,0,F,F,
        F,F,0,0,0,0,F,F,
        F,F,0,0,0,0,F,F,
        F,F,0,0,0,0,F,F,
        0,F,0,0,0,0,F,0,
        0,F,F,0,0,F,F,0,
        0,0,F,F,F,F,0,0,
        0,0,0,F,F,0,0,0,
    ],
    // ...
];

It takes a little effort to draw each of the glyphs in a text editor like this, but it's not too bad. Overall, it took about ten minutes, and I'm happy with the result. I used an F constant to get some much needed syntax highlighting in the editor to make my life easier.

The blitting code is pretty straight forward. It just loops over the digits in the count, and blits them into an image buffer. The image buffer is then encoded as a PNG using the png crate. I increased the difficulty by putting in a larger gap between groups of 3 digits to make larger numbers easier to read, but other than that, it was pretty straight forward:

/// Render a number with a space between every 3 digits.
///
/// The `reserve_width` is a minimum width of the image in number of digits.
/// This is useful if you want the image to always be the same width.
pub fn render_separated_number(number: usize, reserve_width: usize) -> Render {
    let number = number.to_string();

    // Spacing between groups of digits in pixels.
    let group_spacing = 3;

    // Split the number into groups of 3 digits, starting from the right.
    let groups: Vec<_> = number
        .as_bytes()
        .rchunks(3)
        .rev()
        .map(|b| std::str::from_utf8(b).unwrap())
        .collect();

    // Calculate the image size and allocate memory.
    let (width, height) = font::text_size(number.len().max(reserve_width));
    let width = 2 + width + group_spacing * (reserve_width / 3).max(number.len() / 3);
    let height = 2 + height; // 2px padding total
    let mut pixels = vec![0; width * height * size_of::<u32>()];

    // Calculate the very first X offset such that the number ends up right-aligned.
    let mut x = 1 // 1px padding on the left
        // First, calculate offset in character widths
        + Some(reserve_width as isize - number.len() as isize)
            .filter(|&n| n > 0)
            .map(|n| font::text_size(n as usize).0)
            .unwrap_or(0)
        // Then refine the offset in number of group spaces skipped
        + Some((reserve_width / 3) as isize - (number.len() / 3) as isize)
            .filter(|&n| n > 0)
            .map(|n| n as usize * group_spacing)
            .unwrap_or(0);
    for group in groups {
        font::blit_into(&mut pixels, width, group, x, 1);
        x += font::text_size(group.len()).0 + group_spacing;
    }

    Render {
        width,
        height,
        pixels,
    }
}

If you want to improve this code, feel free to change it on GitHub. It can definitely be optimized and made easier to read.

Testing the blitting works correctly is more of a visual process, and rather than run the Lambda over and over with a ton of different counts in DynamoDB, I created a little test harness that uses minifb to quickly cycle through a wide variety of numbers in a native window.

So how fast is it?

Finally, the part you've been waiting for that I mentioned in the beginning. How fast is this thing? It turns out that the rendering is effectively instantaneous, clocking in at 2 microseconds in benchmarks. This is not so surprising since this is Rust, and we're just copying numbers between arrays in memory. The whole operation probably fits in the CPU cache. The slowest part of this is likely the memory allocation for converting the number to a string first. This could be removed by just directly operating on the input number.

The real bottleneck is the DynamoDB GetItem and PutItem operations, which I measured through CloudWatch below:

Operation	Average	P99
GetItem	5 ms	12 ms
PutItem	4 ms	4 ms

In the case that the visitor has already established a session, only the GetItem operation will run, which makes the minimum run time 5 milliseconds. But when a new visitor comes along, both GetItem and PutItem will run, which has a minimum run time 9 milliseconds. Of course, this is assuming only a single visitor at a time. In the case of concurrency, both of these operations may be repeated up to 5 times, which means the maximum run time should be 80 milliseconds.

There is a way the Lambda could be even slower than that, and that's if:

Calls to DynamoDB are returning errors or timing out. In this case, the built-in SDK retry will run, and this uses exponential backoff which can significantly slow things down. To mitigate this, I configured the client to only retry once, and set a timeout on the overall operation of 200 milliseconds.
The Lambda isn't warmed up, in which case, there will be a small cold startup time.

With low traffic, the cold start up will likely occur for every request, which other people have graciously measured for me, and amounts to approximately 17 milliseconds.

With all that, the absolute minimum time should be 5 milliseconds, and the maximum without DynamoDB failures should be 97 milliseconds. If everything goes wrong, it could be as high as 200 milliseconds since, although there are multiple calls being made, a timeout on one will fail the entire Lambda.

Conclusion

This was a fun little exercise, and I'm happy with the result. I'm sure there are many ways it could be improved, but it's good enough for now. Even with a lot of traffic on this site, the counter is likely to continue falling within the AWS free tier, and if it doesn't, it still shouldn't cost much.

Footnotes

[1]: I think these were also called "hit" counters back then.

[2]: The Lambda free tier includes 1 million requests, and 400,000 GB-seconds. What is a GB-second? It's the multiple of the Lambda memory allocation and the time the Lambda took. So if a Lambda ran for 1 second with 128 MB of memory, that would be 0.128 GB-seconds. If we assume this Lambda will take an average of 40 milliseconds to run at 128 MB, then we can calculate that we would only use 5,120 GB-seconds at 1 million requests. Therefore, the request limit is the limiting factor of the free tier for this use case. So long as the Lambda doesn't get called more than 1 million times, it will be free. This same exercise can be applied for DynamoDB, and this Lambda shouldn't get anywhere near its free tier.

[3]: (Opinions are my own) Not really sure why they chose such difficult to comprehend units for this. After about 5 zeros, I can't keep track anymore of how many there are when punching it into a calculator. I think it would have been better to show this in seconds, as that would have slightly less zeros in it so that I don't have to copy and paste the number to have confidence I'm doing calculations with it correctly.