Learn the Basics of Lua for Web Scraping as a Python Developer
Get started with the essentials of Lua in 10 minutes

Lua is a lightweight, high-level programming language that is typically used for scripting. It was designed to be embedded into other applications, allowing developers to extend the functionality of their software with custom scripts.
As a Python developer, you may not normally have a chance to work with Lua. However, if you need to scrape JavaScript web pages in your work, there is a good chance you will use it because of Splash, a lightweight and scriptable browser engine developed by Zyte (previously Scrapinghub), the same company that develops Scrapy.
Lua is used in Splash as a scripting language to provide more advanced control over the web scraping process. With Lua scripts, you can interact with web pages, manipulate the DOM, execute custom JavaScript, etc. In this post, we will introduce the basics of Lua that are essential for web scraping with Splash. You will then be able to understand Lua scripts in Splash and start writing your own.
Install Lua
Actually, you don’t need to install Lua for this post; you can just try the commands in the Lua Live Demo. However, if you want to run Lua locally on your own computer, you can download the source code and build it.
Basic syntax
This part is not meant to be comprehensive and will only cover the essentials that will very likely be needed in Splash scripting. For a more comprehensive introduction, the book “Programming in Lua” and the official reference manual are recommended.
- Comments in Lua start with two hyphens (--), as in SQL.
- Lua is case-sensitive.
- Variables don’t need to be declared before they are accessed. Variables are global by default but can be made local by declaring them with the local keyword.
- Lua is dynamically typed, meaning types are inferred from values, as in Python. We don’t need to (and can’t) specify types when we declare variables.
- A string can be created with single quotes, double quotes, or double square brackets ([[ ]]). Double square brackets are used to write multi-line strings, which are very common in Splash because JavaScript code is normally embedded as multi-line strings.
- nil is similar to None in Python. However, it does more than serve as an empty or undefined value in Lua. Assigning nil to a variable or table field effectively deletes it, and the value it previously held can then be garbage-collected if nothing else references it.
- Only false and nil are falsy in Lua; every other value is truthy, including 0 and empty strings.
- Functions are first-class values in Lua, as in Python, meaning functions can be used in the same way as other types of data. They can be stored in variables, passed to another function, or returned from another function.
- Tables are the only data structure in Lua. There are no other built-in data structures like lists/arrays or dictionaries/objects, which are commonly found in other languages. However, all of these can be constructed from tables, as we will see later.
- When a function is called with a single argument that is a string literal or a table constructor, the parentheses can be omitted, e.g. print "hello". This can be confusing for beginners.
- Tables can be treated as objects and can have methods. The methods can be called with either a dot (obj.method()) or a colon (obj:method()). The latter is syntactic sugar for obj.method(obj). This is very common in Splash and will be introduced in more detail later.
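A few of these points can be checked quickly in any Lua 5.x interpreter or in the Lua Live Demo. The sketch below covers truthiness, case sensitivity, multi-line strings, and calls without parentheses; the helper name isTruthy and the JavaScript snippet are only illustrative.

```lua
-- Only false and nil are falsy; 0 and "" are truthy (unlike in Python).
local function isTruthy(v)
  if v then return true else return false end
end

print(isTruthy(0))     -- true
print(isTruthy(""))    -- true
print(isTruthy(nil))   -- false
print(isTruthy(false)) -- false

-- Lua is case-sensitive: these are two different variables.
local name, Name = "lower", "upper"
print(name == Name) -- false

-- A multi-line string in double square brackets, the style Splash scripts
-- use to embed JavaScript. A newline right after [[ is skipped.
local js = [[
function getTitle() {
  return document.title;
}
]]
print(js:sub(1, 8)) -- function

-- Parentheses can be omitted for a single string-literal or table-constructor argument.
print "hello"        -- hello
print(type{1, 2, 3}) -- table
```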
Next, we will illustrate some of these points in more detail with some simple code.
Variables
It should be emphasized that only global variables can be used without a declaration. Local variables must be declared with the local keyword; otherwise, they will be global, even when created inside a function:
function testVariables()
var1 = 100
local var2 = 200
print(var1, var2)
end
testVariables() -- 100, 200
print(var1, var2) -- 100, nil
As you can see, var1 is a global variable even though it’s created in a function. As a best practice, always declare variables as local unless they really need to be global.
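To see where each kind of variable lives, the sketch below (function and variable names are illustrative) checks the global table _G, which holds all global variables, and uses a do … end block to show that a local is only visible inside the block where it is declared.

```lua
function makeVars()
  leaked = "global"      -- no `local`: stored in the global table _G
  local hidden = "local" -- visible only inside makeVars
end

makeVars()
print(_G.leaked) -- global
print(_G.hidden) -- nil

local x = 10
do
  local x = 20 -- a new local that shadows the outer x inside this block
  print(x)     -- 20
end
print(x)       -- 10: the inner x went out of scope at `end`
```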
Functions
As mentioned above, functions are first-class values in Lua, meaning they can be stored in variables, passed to another function, or returned from another function.
A function can be created with the function keyword directly:
function echo(var)
print(var)
end
It can also be created anonymously and then assigned to a variable:
echo = function (var)
print(var)
end
In fact, the former is syntactic sugar for the latter. This is more evident when creating functions for a table:
obj = {}
function obj.echo(var)
print(var)
end
-- Above is the same as:
obj.echo = function (var)
print(var)
end
-- We can call both with the same syntax:
obj.echo(100) -- 100
Closures
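A very commonly encountered concept in Lua is the closure: a function that captures variables from its enclosing scope, typically a function returned by another function. An important feature of a closure is that it can remember and update the variables of its parent function (called upvalues in Lua), even after the parent function has returned.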
Let’s see it in a simple example:
function createAdder(initVal)
local value = initVal or 0 -- This is the way to set default value in Lua.
return function (num)
value = value + num
print(value)
end
end
adder = createAdder()
adder(1) -- 1
adder(2) -- 3
adder100 = createAdder(100)
adder100(1) -- 101
adder100(2) -- 103
Closures also demonstrate that functions in Lua are first-class values that can be returned from another function.
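Functions can also be passed to other functions. For example, the standard table.sort() accepts an optional comparison function that returns true when its first argument should come first. The map helper below is hypothetical, written just for this illustration.

```lua
-- Sort in descending order by passing an anonymous comparison function.
local prices = {30, 10, 20}
table.sort(prices, function(a, b) return a > b end)
print(table.concat(prices, ", ")) -- 30, 20, 10

-- A hypothetical helper that applies `fn` to every item of an array.
local function map(arr, fn)
  local result = {}
  for i, v in ipairs(arr) do
    result[i] = fn(v)
  end
  return result
end

local doubled = map(prices, function(p) return p * 2 end)
print(table.concat(doubled, ", ")) -- 60, 40, 20
```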
Arrays
As mentioned above, tables are the only data structure in Lua, and there is no dedicated array or list type. However, tables can be used as arrays natively. You just need to put the values in curly brackets, much like creating a set in Python:
arr = {"red", "green", "blue"}
We can then access the values by index. However, note that unlike in most other programming languages, indices start at 1 in Lua!
print(arr[1]) -- red
print(arr[3]) -- blue
Under the hood, arrays in Lua are still associative arrays, which are collections of key-value pairs where each key is linked to a specific value. The above array is the same as:
arr = {[1]="red", [2]="green", [3]="blue"}
Note that the indices must be put in square brackets when they are specified explicitly.
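A quick check that the two forms are equivalent, and of what happens at index 0 or past the end (the variable names are illustrative):

```lua
local short = {"red", "green", "blue"}
local explicit = {[1] = "red", [2] = "green", [3] = "blue"}

print(short[1] == explicit[1]) -- true
print(short[3] == explicit[3]) -- true
print(short[0]) -- nil: indices start at 1, so there is no element 0
print(short[4]) -- nil: reading past the end returns nil instead of raising an error
```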
We can use the for loop to iterate over the items of an array. Let’s define a function that does this and prints the array in a readable way:
function printArr(arr)
if not arr then
print(arr) -- nil
return
end
local repr = '['
for _, v in ipairs(arr) do
repr = repr .. tostring(v) .. ', '
end
repr = string.gsub(repr, ",%s*$", "") .. ']'
print(repr)
end
arr = {"red", "green", "blue"}
printArr(arr) -- [red, green, blue]
This simple example illustrates several commonly used features of Lua:
- Note the syntax of the if condition and the for loop in Lua. We need to use then … end or do … end explicitly to delimit a code block.
- ipairs() iterates over the index/value pairs of an array in order. Since we are not using the index here, it’s assigned to the dummy variable _, just as in Python.
- .. is used to concatenate strings in Lua. Numbers are converted to strings automatically, but other values such as booleans, nil, or tables cause an error, which is why tostring() is called explicitly on each item.
- The string.gsub() function searches a string for a pattern and replaces each match with a replacement string. It returns two values: the new string and the number of replacements made. Lua patterns look similar to regular expressions but are simpler: for example, they use % instead of \ as the escape character (%s matches whitespace) and do not support alternation (|).
As a side note, we can get the length of an array with the length operator (#) and use it to loop through the array with a numeric for loop:
arr = {"red", "green", "blue"}
-- Note that the range includes both ends.
for i = 1, #arr do
print(arr[i])
end
-- red
-- green
-- blue
We can use table.insert() and table.remove() to insert or remove an item in an array:
table.insert(arr, "black") -- Inserted in the end.
table.insert(arr, 2, "pink") -- Insert at a specific position.
printArr(arr) -- [red, pink, green, blue, black]
table.remove(arr) -- Remove the last item.
table.remove(arr, 3) -- Remove the item at a specific position.
printArr(arr) -- [red, pink, blue]
Associative arrays
An associative array in Lua is a collection of key-value pairs, where each key is linked to a specific value. It is similar to a dictionary in Python, but technically it is closer to an object in vanilla JavaScript.
First, if the keys are strings that are also valid identifier names in Lua, we can use them as keys directly, with no need for quotes or square brackets. This is also the most common use case:
myTable = {value=100}
Technically, it’s the same as:
myTable = {['value']=100}
When the keys are variables, numbers, reserved keywords like if and for, or strings that are not valid identifier names in Lua, they must be put in square brackets:
myTable = {
value=100,
[1]='color',
['if']=true,
['1stName']='John',
['last name']='Doe'
}
When the key is a string that is also a valid identifier name, we can access its value either with a dot or with square brackets:
print(myTable.value) -- 100
print(myTable["value"]) -- 100
print(myTable[value]) -- nil
Note that the third one returns nil. This is because the value of the variable named value is used as the key, and since no such variable is defined, that key is nil. Reading a table with a nil key simply returns nil (although assigning to a nil key raises an error). However, if a variable called value does exist in your code, you may get unexpected results.
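The sketch below reproduces the pitfall: once a variable named value exists, myTable[value] looks up that variable’s content instead of the "value" key (the table here is a trimmed version of the one above).

```lua
local myTable = {value = 100, [1] = "color"}

print(myTable[value])   -- nil: `value` is an undefined global, so the key is nil
print(myTable["value"]) -- 100

local value = 1         -- now a variable called `value` exists...
print(myTable[value])   -- color: ...so this is myTable[1], not myTable.value
```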
For other types of keys, you must always use square brackets to access the value:
print(myTable[1]) -- color
print(myTable['if']) -- true
print(myTable['1stName']) -- John
print(myTable['last name']) -- Doe
We can loop through the key/value pairs of a table using the pairs() function. Note that the iteration order is not specified and may differ from the order in which the keys were created:
for k, v in pairs(myTable) do
print(k .. ' -> ' .. tostring(v))
end
-- value -> 100
-- last name -> Doe
-- 1 -> color
-- if -> true
-- 1stName -> John
Classes and objects
Lua is not a natively object-oriented programming (OOP) language, and it has no built-in concept of classes. Both classes and objects are just tables in Lua. However, OOP can be implemented easily with tables.
First, a table can already be seen as an object, and we can add functions to it as we saw previously. The function demonstrated above is like a static method that does not require an instance. We can also create classic instance methods that do require an instance, using the “magical” colon in Lua:
person = {firstName = "John", lastName = "Doe"}
function person:getFullName()
return self.firstName .. ' ' .. self.lastName
end
print(person:getFullName()) -- John Doe
In this example, self refers to the object on which the method is called, similar to self in Python.
The function declaration using a colon is just syntactic sugar for the following declaration (yes, there are many sugars 🍬 in Lua):
person = {firstName = "John", lastName = "Doe"}
function person.getFullName(self)
return self.firstName .. ' ' .. self.lastName
end
print(person.getFullName(person)) -- John Doe
Understanding this syntactic sugar is key to understanding classes, instantiation, and inheritance in Lua. Let’s demonstrate it with a simple example:
-- Create a class, which is just a table in Lua.
Animal = {}
-- Create a constructor function for the class:
function Animal:new()
local newAnimal = {}
-- Create a metatable, which can be associated with another table to customize its behavior.
local metatable = {}
metatable.__index = self
setmetatable(newAnimal, metatable)
return newAnimal
end
-- Create an instance method.
function Animal:breathe()
print("I'm breathing...")
end
-- Create an instance of the Animal class.
animal = Animal:new()
-- Call an instance method.
animal:breathe() -- I'm breathing...
When using tables as classes in Lua, there are two very important concepts: metatables and metamethods.
In Lua, a metatable is a special table (well, it’s just a regular table used for special purposes) that can be associated with another table to customize its behavior or provide fallbacks for missing keys. Metamethods are functions stored in the metatable under special keys such as __index or __add; they are used, for example, to overload operators or to implement inheritance.
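As an illustration of operator overloading, the sketch below gives tables a metatable with an __add metamethod, so + works on them. The vector type and the names newVector and vectorMeta are hypothetical; without the metatable, adding two tables would raise an error.

```lua
local vectorMeta = {}

-- Every vector created here shares vectorMeta as its metatable.
local function newVector(x, y)
  return setmetatable({x = x, y = y}, vectorMeta)
end

-- Lua calls __add when + is applied to a table whose metatable defines it.
vectorMeta.__add = function(a, b)
  return newVector(a.x + b.x, a.y + b.y)
end

local v = newVector(1, 2) + newVector(3, 4)
print(v.x) -- 4
print(v.y) -- 6
```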
The most important metamethod is __index. When a key is missing from a table, Lua looks up __index in the table’s metatable. If __index is a function, Lua calls it with the table as the first argument and the missing key as the second. Therefore, a verbose version of the constructor can be written as:
-- Create a constructor function for the class:
function Animal:new()
local newAnimal = {}
local metadataTable = {}
metadataTable.__index = function (_, key)
return self[key]
end
setmetatable(newAnimal, metadataTable)
return newAnimal
end
The first argument passed to __index (the table being indexed, here newAnimal) is not used, so it is named with the dummy variable _.
Using __index for class instantiation and inheritance is so common that Lua provides a shortcut: although __index is called a metamethod, its value can also be a table, as in the first version of the constructor. In that case, Lua looks up missing keys directly in that table, which behaves the same as the verbose function version above.
Moreover, since setmetatable() returns the table passed to it, we can simplify the constructor as follows, which is very common in practice:
-- Create a constructor function for the class:
function Animal:new()
local newAnimal = {}
return setmetatable(newAnimal, {__index = self})
end
Here, the metatable is created on the fly as a table literal whose __index field is set to self.
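A common variation, sketched below with our own naming, lets the constructor accept a table of initial fields. Fields stored on the instance take precedence, while anything missing falls back to the class through __index (here, a class-level default sound and a speak method added for illustration).

```lua
local Animal = {sound = "..."} -- a class-level default

-- Variant of the compact constructor; `fields` is an illustrative parameter.
function Animal:new(fields)
  local newAnimal = fields or {}
  return setmetatable(newAnimal, {__index = self})
end

function Animal:speak()
  return self.sound
end

local generic = Animal:new()
local dog = Animal:new({sound = "Woof"})

print(dog:speak())              -- Woof: found on the instance itself
print(generic:speak())          -- ...: not on the instance, so found on Animal
print(rawget(generic, "sound")) -- nil: rawget skips __index and sees only the instance
```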
With the knowledge above, class inheritance is easier to understand:
-- Well, in Lua, an instance of a class can be treated as another class, and
-- it's still just a table...
-- It inherits all the properties and methods of its parent.
Bird = Animal:new()
function Bird:fly()
print("I can fly!")
end
bird = Bird:new() -- Inherits from Animal.
bird:breathe() -- Also inherits from Animal.
bird:fly() -- New in the Bird class.
When a property or method is accessed, Lua checks the instance first, then its class, then the parent class, the grandparent class, and so on, and uses the first match it finds. If none of them has it, nil is returned.
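The lookup chain can be observed directly. This self-contained sketch repeats the Animal/Bird setup, adds a hypothetical describe method that Bird overrides, and looks up a key that exists nowhere in the chain.

```lua
local Animal = {}

function Animal:new()
  return setmetatable({}, {__index = self})
end

function Animal:breathe() return "I'm breathing..." end
function Animal:describe() return "an animal" end

local Bird = Animal:new() -- an Animal instance used as a subclass
function Bird:fly() return "I can fly!" end
function Bird:describe() return "a bird" end -- overrides Animal:describe

-- Animal.new is found through Bird's metatable; inside it, self is Bird.
local bird = Bird:new()

print(bird:describe()) -- a bird: found on Bird before reaching Animal
print(bird:breathe())  -- I'm breathing...: found on Animal, further up the chain
print(bird.swim)       -- nil: not found anywhere in the chain
```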
Splash scripting example
Finally, let’s look at a simple Splash example from the official documentation, which should be fairly easy to understand now:
function main(splash, args)
splash:go("http://example.com")
splash:wait(0.5)
local title = splash:evaljs("document.title")
return {title=title}
end
As you can see, the colon is used very heavily in Splash. It can look mysterious if you don’t know Lua. However, with the knowledge from this post, you should now be comfortable working with it.
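To make the colon calls concrete, here is a sketch that runs the same main function in a plain Lua interpreter against a hypothetical stub object. The stub (fakeSplash) is not part of Splash: its go, wait, and evaljs only mimic the shape of the real methods by recording each call and returning a canned title.

```lua
-- Hypothetical stand-in for the real splash object, for illustration only.
local fakeSplash = {calls = {}}

function fakeSplash:go(url)
  table.insert(self.calls, "go " .. url)
  return true
end

function fakeSplash:wait(seconds)
  table.insert(self.calls, "wait " .. seconds)
  return true
end

function fakeSplash:evaljs(snippet)
  table.insert(self.calls, "evaljs " .. snippet)
  return "Example Domain" -- canned value in place of a real page title
end

-- The script from the Splash documentation, unchanged.
function main(splash, args)
  splash:go("http://example.com")
  splash:wait(0.5)
  local title = splash:evaljs("document.title")
  return {title = title}
end

local result = main(fakeSplash, {})
print(result.title)      -- Example Domain
print(#fakeSplash.calls) -- 3

-- splash:wait(1) is just sugar for splash.wait(splash, 1):
fakeSplash.wait(fakeSplash, 1)
print(#fakeSplash.calls) -- 4
```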
Future posts will cover how to use Lua scripting in Splash to scrape JavaScript web pages in more detail.
