Lua Memory That Stops Surprising You

Practical memory management and OOP patterns for Lua at scale in game systems

11 minute read

With my current projects, I’m spending a lot of time in Lua 5.4. Get ready for some posts about it: the good, the bad, and the ugly have made for an interesting journey, not just in the language itself, but in how it has modernized over the last few releases.

Let’s start with something I could nearly ignore in C# and JavaScript: garbage collection.

Lua’s garbage collector doesn’t announce itself. It waits until the heap crosses a threshold, then pauses execution while it sweeps. In a local tool or editor, that pause is invisible. In a game running at 60 frames per second, a 16ms budget per frame means any GC pause over a few milliseconds shows up as a hitch the player feels. Server-side code is worse: a single pause there hits hundreds of clients at once.

The Lua 5.4 documentation describes the incremental collector’s “atomic step”: a full-graph traversal that can’t be interrupted and can run into the tens of milliseconds depending on live object count. To keep that step comfortably inside the frame budget, the job isn’t to tune the collector. It’s to give it less to do.

That’s what I’m digging into today: not GC configuration, but the allocation patterns that create pressure in the first place–and the OOP structures that either compound or avoid it.

Before Optimizing: Know Whether You Have a Problem

Optimization without measurement is superstition. Before changing anything, establish a baseline.

Lua 5.4 exposes memory state through collectgarbage():

-- Snapshot memory usage across a frame
local function measureFrame(frameFunc)
    collectgarbage("stop")           -- pause automatic GC temporarily
    local before = collectgarbage("count")  -- returns KB used as float

    frameFunc()                      -- run your game logic

    local after = collectgarbage("count")
    collectgarbage("restart")        -- resume automatic GC

    return after - before            -- KB allocated this frame
end

-- Usage: log over 100 frames to find spikes in a hot path
for i = 1, 100 do
    local allocated = measureFrame(function() entity:doTheThing(dt) end)
    if allocated > 10 then           -- threshold: tune to your budget
        print(string.format("Frame %d: %.2f KB allocated", i, allocated))
    end
end

If per-frame allocation is flat (near zero on steady frames), your GC pressure is low. If it spikes regularly–especially on frames with many entities, spell effects, or AI decisions–you have allocation happening in hot paths that you can address.

 
A useful rule of thumb from the lua-users wiki on GC in real-time games: for smooth 60fps, target no more than 1–2ms per frame for GC. That means minimizing allocations per frame, not just in total. The collector handles steady-state lifetime objects efficiently. It’s the per-frame churn that kills frame pacing.
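One way to act on that rule of thumb is a sketch like the following: stop the automatic collector and run bounded incremental steps yourself at the end of each frame. The budget, step size, and `maxSteps` guard here are assumptions to tune for your system, not canonical values.

```lua
-- Minimal sketch: manually scheduled, time-boxed GC steps per frame.
collectgarbage("stop")  -- we schedule collection ourselves from here on

local function runGCSlice(budgetMs, maxSteps)
    maxSteps = maxSteps or 1000          -- safety cap on iterations
    local start = os.clock()
    for _ = 1, maxSteps do
        if (os.clock() - start) * 1000 >= budgetMs then break end
        -- step(n): perform an incremental step as if n KB were allocated;
        -- returns true when a full collection cycle completes.
        if collectgarbage("step", 8) then break end
    end
end

-- At the end of each frame:
runGCSlice(1.5)  -- spend at most ~1.5ms collecting
```

Whether you hand-schedule like this or let the collector run, the measurement loop above tells you if the slice is keeping up.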

The Allocation Problem in Hot Paths

The most common source of frame-by-frame allocation is table creation inside loops. It looks clean and harmless, like code I’ve written many times:

-- Runs every frame, for every entity:
function onEntityMoved(entity)
    local event = {              -- new table allocated here
        id     = entity.id,
        type   = "move",
        x      = entity.x,
        y      = entity.y
    }
    eventBus:dispatch(event)     -- event lives briefly, then becomes garbage
end

With 200 active entities at 60fps, this is 12,000 table allocations per second–each one short-lived, each one contributing to the GC’s next atomic step trigger. The collector handles it, but not for free.

Object pooling solves this by pre-allocating a fixed set of tables and reusing them instead of creating and discarding:

-- Pool size: base on your expected max entities, plus a safety buffer.
-- If your entity budget is 256, a 25% buffer gives 320 slots.

local MAX_ENTITIES  = 256
local POOL_SIZE     = math.ceil(MAX_ENTITIES * 1.25)  -- 320; adjust to your cap
local _pool         = {}
local _poolIndex    = 0

-- Initialize the pool once at startup--not per frame
for i = 1, POOL_SIZE do
    _pool[i] = { id = 0, type = "", x = 0.0, y = 0.0 }
end

-- Reuse a slot instead of allocating a new table
local function getPooledEvent(entity, eventType)
    -- Wrap index: when we hit POOL_SIZE, cycle back to 1
    _poolIndex = (_poolIndex % POOL_SIZE) + 1

    local event = _pool[_poolIndex]  -- grab existing table

    -- Overwrite fields; no new table is created
    event.id   = entity.id
    event.type = eventType
    event.x    = entity.x
    event.y    = entity.y

    return event
end

eventBus:dispatch(getPooledEvent(entity, "move"))
 
Pool reuse assumes events are consumed before the pool wraps: if you hold a reference to a pooled event across frame boundaries, the same slot gets reused next cycle with different values and you’ll read stale data. Size the pool to your peak-frame entity count, not your average, and if you need event data beyond a single dispatch, copy it out instead of holding the reference.
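The copy-out rule looks like this in practice. This is a sketch assuming the pooled event shape from above; `deferredQueue` and `onEvent` are hypothetical names for a listener that needs data past the current dispatch.

```lua
-- Copy fields out of a pooled event into a table the listener owns.
local function copyEvent(event)
    -- Fresh table: this allocation is deliberate and rare, paid only by
    -- listeners that actually keep data across frames.
    return { id = event.id, type = event.type, x = event.x, y = event.y }
end

local deferredQueue = {}

local function onEvent(event)
    -- Safe: the copy survives even after the pool slot is overwritten
    deferredQueue[#deferredQueue + 1] = copyEvent(event)
end
```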

Local Variables and the Global Lookup Cost

This one surprises developers coming from compiled languages, where variable scope and access speed are unrelated. In Lua’s virtual machine, they are one and the same.

Global variables in Lua live in a table called _ENV. Every access to a global name–math.cos, pairs, anything not declared local–is a hash table lookup: find the key in _ENV, retrieve the value. Local variables are stored in the VM’s register array and accessed by index. The difference is a hash traversal versus an array dereference.

Lua 5.4 Bytecode: Global vs. Local Access

Global access (math.cos):
GETTABUP A B "math"   -- _ENV is an upvalue; hash lookup of "math"
GETFIELD A A "cos"    -- hash lookup of "cos" in the math table
Resolves: _ENV → hash lookup → math table → hash lookup → value
Extra: two hash probes on every call

Local access (cos held in a local):
MOVE A B              -- register-to-register copy
Resolves: register[index]
Direct: one instruction, no lookup

In a loop running 10,000 times per frame, math.cos as a global means 10,000 hash lookups. Pulled into a local, it’s 10,000 register reads and one hash lookup at function entry:

-- Before: global lookup on every iteration
function updateProjectiles(dt)
    for i = 1, #projectiles do
        local p = projectiles[i]
        -- math.cos and math.sin are resolved via _ENV each call
        p.x = p.x + math.cos(p.angle) * p.speed * dt
        p.y = p.y + math.sin(p.angle) * p.speed * dt
    end
end

-- After: one lookup per function call, register access in the loop
function updateProjectiles(dt)
    local cos = math.cos   -- resolved once, stored in register
    local sin = math.sin
    local ps  = projectiles  -- local ref to the global table too

    for i = 1, #ps do
        local p = ps[i]
        p.x = p.x + cos(p.angle) * p.speed * dt
        p.y = p.y + sin(p.angle) * p.speed * dt
    end
end

It feels a bit like old Visual Basic, with Dims at the top of the page, and counterintuitive coming from C++ or C#, where scope is a readability concern, not a performance one. In Lua, local is a performance keyword as much as a scoping one.
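A rough micro-benchmark makes the difference concrete. This is a sketch, not a rigorous measurement: absolute timings vary by machine and Lua build, so it only illustrates the shape of the comparison.

```lua
local N = 1000000

-- Version 1: global lookup of math.cos on every iteration
local t0 = os.clock()
local sum1 = 0
for i = 1, N do
    sum1 = sum1 + math.cos(i)     -- _ENV lookup of "math", then "cos"
end
local globalTime = os.clock() - t0

-- Version 2: resolve once, then register access in the loop
local cos = math.cos
local t1 = os.clock()
local sum2 = 0
for i = 1, N do
    sum2 = sum2 + cos(i)          -- register read only
end
local localTime = os.clock() - t1

print(string.format("global: %.3fs  local: %.3fs", globalTime, localTime))
```

Both loops compute the same result; only the lookup path differs.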

OOP That Doesn’t Oops

Lua has no built-in class system. The common substitute is tables with metatables, but the specific pattern you use determines whether methods are shared across instances or duplicated per instance.

The closure approach creates a separate function object for every method on every instance. I’ll admit, this is what felt like the “right way” when I first dug into Lua objects, since it resembles closures in JavaScript.

-- Each call to Enemy.new() allocates new function objects for every method
local Enemy = {}

function Enemy.new(enemyType, maxHealth)
    local self = { type = enemyType, health = maxHealth }

    -- New function object allocated here for each instance:
    function self:takeDamage(amount)
        -- Note: the ':' syntax desugars to function(self, amount); that
        -- implicit self parameter shadows the constructor's 'self' local.
        -- Inside this method, self is the call-site receiver, not the
        -- constructor's local--a common source of confusion and bugs.
        self.health = math.max(0, self.health - amount)
    end

    function self:isAlive()
        return self.health > 0
    end

    return self
end
-- 500 enemies = 1,000 function allocations, none shared

The prototype pattern puts methods on a shared table and uses __index to find them at call time:

-- Enemy is a table acting as a "class" (prototype)
local Enemy = {}

-- __index: when a field isn't found on an instance, look here next.
-- This is Lua's mechanism for prototype-based inheritance.
Enemy.__index = Enemy

function Enemy.new(enemyType, maxHealth)
    -- setmetatable(table, meta): attaches a metatable to a table.
    -- When Lua can't find a key on the returned instance, it follows
    -- __index to Enemy and finds methods there instead.
    local instance = setmetatable({}, Enemy)

    instance.type      = enemyType
    instance.health    = maxHealth
    instance.maxHealth = maxHealth

    return instance
end

-- Methods live on Enemy--not copied per instance
function Enemy:takeDamage(amount)
    -- 'self' is the instance; colon syntax is shorthand for
    -- function Enemy.takeDamage(self, amount)
    self.health = math.max(0, self.health - amount)
end

function Enemy:isAlive()
    return self.health > 0
end

function Enemy:getHealthPercent()
    return self.health / self.maxHealth
end

-- 500 enemies share the same three function objects on Enemy

The difference is meaningful at scale. 500 enemies with the closure pattern hold 1,000 function allocations that are identical in behavior but distinct in memory. With the prototype pattern, 500 enemies hold 500 instance tables plus three function objects total.
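You can verify that claim with the same `collectgarbage("count")` technique from the measurement section. This is a sketch with stripped-down versions of both patterns; the absolute KB numbers will vary by Lua build, but the ordering should hold.

```lua
-- Closure pattern: two function objects allocated per instance
local function makeClosureEnemy(health)
    local self = { health = health }
    function self.takeDamage(amount) self.health = self.health - amount end
    function self.isAlive() return self.health > 0 end
    return self
end

-- Prototype pattern: methods shared via __index
local Proto = {}
Proto.__index = Proto
function Proto:takeDamage(amount) self.health = self.health - amount end
function Proto:isAlive() return self.health > 0 end
local function makeProtoEnemy(health)
    return setmetatable({ health = health }, Proto)
end

-- Measure retained KB for n instances built by a factory
local function measure(factory, n)
    collectgarbage("collect")
    local before = collectgarbage("count")
    local keep = {}                      -- hold instances alive
    for i = 1, n do keep[i] = factory(100) end
    collectgarbage("collect")
    return collectgarbage("count") - before, keep
end

local closureKB = measure(makeClosureEnemy, 500)
local protoKB   = measure(makeProtoEnemy, 500)
print(string.format("closure: %.1f KB  prototype: %.1f KB", closureKB, protoKB))
```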

Inheritance and the Super Call

Lua’s inheritance builds on the same __index mechanism. The pattern is: set up one class to delegate field lookups to another.

local Character = {}
Character.__index = Character

function Character.new(name, health)
    -- Base constructor: creates a Character instance
    return setmetatable({ name = name, health = health }, Character)
end

function Character:takeDamage(amount)
    self.health = math.max(0, self.health - amount)
end

function Character:isAlive()
    return self.health > 0
end

-- Enemy inherits from Character.
-- setmetatable on Enemy itself (not an instance): when Enemy doesn't
-- have a key, look at Character. This chains the prototype lookup.
local Enemy = setmetatable({}, { __index = Character })
Enemy.__index = Enemy

function Enemy.new(name, health, experienceValue)
    -- Call Character.new to set up the base fields--
    -- this is Lua's equivalent of super() in other languages.
    -- We then re-setmetatable the result to point to Enemy,
    -- so Enemy's methods shadow Character's where defined.
    local instance = Character.new(name, health)
    instance.experienceValue = experienceValue
    return setmetatable(instance, Enemy)  -- re-bind metatable to Enemy
end

function Enemy:takeDamage(amount)
    -- Override with armor mitigation, then call the base:
    local mitigated = math.max(0, amount - (self.armor or 0))
    -- Explicit super call: Character.takeDamage(self, ...) bypasses
    -- Enemy's __index and calls Character's version directly.
    Character.takeDamage(self, mitigated)
end

function Enemy:getExperience()
    return self.experienceValue
end

The Character.new(name, health) call in Enemy.new is the super() pattern in Lua. Unlike languages with explicit super syntax, you call the parent constructor directly and then rebind the metatable. It’s explicit rather than implicit, which makes the inheritance chain visible but requires you to remember to do it.

 
Deep inheritance chains–Character → Enemy → BossEnemy → DragonBoss–add a metatable hop for each level when looking up methods that don’t exist closer in the chain. For performance-sensitive types with many method calls per frame, consider flattening commonly used methods onto the final class, or storing method references as locals in your update loop rather than resolving them through the chain each call.
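Hoisting a method reference out of the loop looks like this. The sketch below uses a made-up two-level chain; in a real system the lookup would be resolved once per frame, before iterating the hot collection.

```lua
-- Minimal two-level chain: Enemy delegates to Character via __index
local Character = {}
Character.__index = Character
function Character:takeDamage(amount)
    self.health = self.health - amount
end

local Enemy = setmetatable({}, { __index = Character })
Enemy.__index = Enemy

local enemies = {}
for i = 1, 3 do
    enemies[i] = setmetatable({ health = 100 }, Enemy)
end

-- Resolve once: Enemy has no takeDamage, so __index walks to Character.
local takeDamage = Enemy.takeDamage

for i = 1, #enemies do
    takeDamage(enemies[i], 10)  -- plain call; no metatable hops per iteration
end
```

The caveat: this only works when everything in the loop shares the same resolved method. If subclasses override it, you’re back to per-instance resolution.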

String Caching in Hot Paths

Lua interns strings: two identical string literals share the same memory, which makes string comparison fast (pointer equality). What interning doesn’t eliminate is the cost of concatenation: .. has to build and hash the result on every call, allocating along the way, even when an equal string already exists.

-- This rebuilds the key on every call:
local function makeKey(entityId, category)
    return entityId .. "_" .. category  -- string construction each time
end

-- Called 10,000 times? 10,000 rounds of concatenation work.

For keys generated from components, cache on first construction:

-- Cache: stores the constructed key string so subsequent calls return
-- the existing interned value instead of re-concatenating.
local _keyCache = {}

local function getCachedKey(entityId, category)
    -- Two-level index: _keyCache[entityId] is a sub-table per entity
    local byEntity = _keyCache[entityId]
    if not byEntity then
        byEntity = {}
        _keyCache[entityId] = byEntity
    end

    local key = byEntity[category]
    if not key then
        -- Build the string once, store for reuse
        key = entityId .. "_" .. category
        byEntity[category] = key
    end

    return key  -- same string object on every subsequent call
end

For hot paths with known, bounded key spaces–entity types, component names, state labels–pre-compute all keys at startup and store them in constants rather than building them dynamically at all:

-- Pre-compute at module load; no concatenation at runtime
local Keys = {
    ENEMY_MOVE    = "enemy_move",
    ENEMY_ATTACK  = "enemy_attack",
    ENEMY_DEATH   = "enemy_death",
    PLAYER_MOVE   = "player_move",
}

-- Usage in hot path: no allocation, just a table lookup
eventBus:dispatch(Keys.ENEMY_MOVE, entity)

Where to Look First

The patterns above address the most common sources of per-frame allocation pressure: tables created in loops, global lookups in inner functions, per-instance function objects, and runtime string concatenation. I’ve been refactoring several of these out of our code base, measuring performance, and iterating. It’s been a great learning process.

The order of impact varies by system. A game with many short-lived projectiles will see the most benefit from pooling. A system with a dense AI update loop may benefit most from localizing globals. An entity-component system built with closure-style OOP will see the most change from switching to prototype-based classes.

Measure first. collectgarbage("count") before and after suspect sections gives you the KB-per-frame number you need to prioritize. Then address the highest-allocation hot path first. The GC is efficient at managing long-lived objects. What it struggles with is high-frequency short-lived ones, and those are the patterns worth finding.