Free AI web copilot to create summaries, insights and extended knowledge, download it at here

9255

Abstract

ated_at DATETIME NOT NULL ); "</span> ><span class="hljs-variable">${SQL_SCRIPT_PATH}</span>

<span class="hljs-keyword">if</span> [ <span class="hljs-string">"<span class="hljs-variable"> ${ROWS}</span>"</span> -gt 0 ]; <span class="hljs-keyword">then</span> <span class="hljs-built_in">echo</span> <span class="hljs-string">"########### Generating insert statement for <span class="hljs-variable">$ {ROWS}</span> rows ###########"</span> <span class="hljs-built_in">echo</span> <span class="hljs-string">"INSERT INTO purchase_transaction VALUES"</span> >><span class="hljs-variable">${SQL_SCRIPT_PATH}</span> <span class="hljs-keyword">fi</span>

<span class="hljs-keyword">for</span> ((i = <span class="hljs-variable"> ${NUMBER_OF_CUSTOMERS}</span>; i >= <span class="hljs-number">0</span>; i--)); <span class="hljs-keyword">do</span> RANDOM_NUMBER=$ ((RANDOM))

<span class="hljs-keyword">if</span> [ <span class="hljs-variable"> ${i}</span> == 0 ]; <span class="hljs-keyword">then</span> NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER=<span class="hljs-variable">$ {ROWS}</span> <span class="hljs-keyword">else</span> MAX_NUMBER_OF_TRANSACTION_PER_CUSTOMER= $((<span class="hljs-variable">$ {ROWS}</span> / <span class="hljs-variable"> ${i}</span>)) NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER=$ ((<span class="hljs-variable"> ${RANDOM_NUMBER}</span> % <span class="hljs-variable">$ {MAX_NUMBER_OF_TRANSACTION_PER_CUSTOMER}</span> + <span class="hljs-number">2</span>)) ROWS= $((ROWS - NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER)) <span class="hljs-keyword">fi</span> getUUID <span class="hljs-keyword">for</span> ((j = <span class="hljs-number">1</span>; j <= <span class="hljs-variable">$ {NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER}</span>; j++)); <span class="hljs-keyword">do</span> RANDOM_NUMBER= $((RANDOM)) getPaymentType getAmount getDate <span class="hljs-keyword">if</span> [ <span class="hljs-variable">$ {i}</span> == 0 ] && [ <span class="hljs-variable"> ${j}</span> == <span class="hljs-variable">$ {NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER}</span> ]; <span class="hljs-keyword">then</span> LAST_CHAR=<span class="hljs-string">";"</span> <span class="hljs-keyword">else</span> LAST_CHAR=<span class="hljs-string">","</span> <span class="hljs-keyword">fi</span> <span class="hljs-built_in">echo</span> <span class="hljs-string">"(uuid(),"<span class="hljs-variable"> ${PAYMENT_TYPE}</span>\", \"<span class="hljs-variable">$ {AMOUNT}</span>", "<span class="hljs-variable"> ${UUID}</span>\", \"<span class="hljs-variable">$ {DATE}</span>")<span class="hljs-variable"> ${LAST_CHAR}</span>"</span> >><span class="hljs-variable">$ {SQL_SCRIPT_PATH}</span> <span class="hljs-keyword">done</span> <span class="hljs-keyword">done</span>

<span class="hljs-built_in">echo</span> <span class="hljs-string">"########### Running SQL script against DB ###########"</span> mysql --user=<span class="hljs-string">"customer_user"</span> --password=<span class="hljs-string">"customer_password"</span> --database=<span class="hljs-string">"customer"</span> <<span class="hljs-variable">${SQL_SCRIPT_PATH}</span>

<span class="hljs-built_in">echo</span> <span class="hljs-string">"########### Script execution finished! ###########"</span></pre></div><p id="8d56">The file looks confusing, but it is only the first look. At the beginning of the file, I declare variables. One variable is an array that contains payment types, the next variable is set the number of rows that will be inserted.</p><p id="1d6d">As I've mentioned before the customer can have multiple purchase transactions, so we will calculate the number of customers by dividing the number of rows by 3. This number for me makes more sense since if we have for example 10 transactions we can divide it between 3 customers and each customer can have from 1 to 8 transactions.</p><p id="8cd2">The next variable defines the path to the directory inside the container where the SQL script will be created. As you may remember this directory is mapped as a volume in the docker-compose file.</p><p id="16e6">To generate values for the insert statement I define a couple of functions. The first function will <a href="https://javarevisited.blogspot.com/2021/05/3-examples-to-generate-random-alphanumeric-string-in-java0.html">generate the UUI</a>D which is used as an id of a customer. The second function generates a random timestamp, then one more function gets the random value from the payment type array. And the last function generates a random value for the amount of transaction.</p><p id="c525">After variables and function definition, the SQL statement that creates a table is generated and saved into the file, created in the directory defined in the previously defined environment variables.</p><p id="5154">Then the most important part is where the script creates insert statements. For each customer, the number of transactions is randomly calculated. For each transaction defined above functions are called to generate the values. And then those values are combined into the insert statement and this statement is appended to a script file created earlier.</p><p id="6276">And last but not least the script has a command to upload the created file into a database.</p><p id="7e9d">The complete code can be found here:</p><div id="5d96" class="link-block"> <a href="https://github.com/polovyivan/mysql-init-db-on-container-startup"> <div> <div> <h2>GitHub - polovyivan/mysql-init-db-on-container-startup</h2> <div><h3>You can't perform that action at this time. You signed in with another tab or window. You signed out in another tab or…</h3></div> <div><p>github.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*_Zx5CxF7W4bJJWBh)"></div> </div> </div> </a> </div><h2 id="2a90">Setup execution</h2><p id="ab3d">To create and run the container, execute the command from the directory where the docker-compose file is located:</p><div id="7fa4"><pre><span class="hljs-variable"> $docker</span>-compose up</pre></div><p id="95bc">or for Docker Compose used as a plugin:</p><div id="36e2"><pre><span class="hljs-variable">$ docker</span> compose up</pre></div><p id="1724">Depending on the number of rows and the configuration of the PC the database creation time may take up to 5 minutes, so be patient. Creating the database instance with 100k records takes up to 5 minutes on my PC with 4 cores CPU and 8 GiB of RAM.</p><p id="20bf">Unfortunately, the bash script is not as performant as for example Java. This is the reason why the creation of the transaction id I’ve left for the database. The execution logs look like below:</p><figure id="15e4"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*f0zv_tAnLkvJ49mPi2sBBg.png"><figcaption></figcaption></figure><h2 id="8c9e">Run queries in the database</h2><p id="02bf">The easiest way to run the queries in the database is to enter inside the container and use the MySQL command line interface. To enter the container execute the command:</p><div id="4046"><pre><span class="hljs-variable"> $docker</span> container <span class="hljs-built_in">exec</span> -it my_sql_local bash</pre></div><p id="50ed">More about Docker commands you can find here:</p><div id="0bab" class="link-block"> <a href="https://faun.pub/docker-commands-for-software-developers-6101d511343"> <div> <div> <h2>Docker commands for software developers</h2> <div><h3>It is hard to imagine a software developer without Docker nowadays. Docker allows developers to keep a local\dots</h3></div> <div><p>faun.pub</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*BHrBnWPN3pm9QrA33Dg1fQ.png)"></div> </div> </div> </a> </div><p id="1f31">And to login inside the database run the command using credentials configured as environment variables in the docker-compose file:</p><div id="28bc"><pre><span class="hljs-variable">$ mysql</span> -ucustomer_user -pcustomer_password;</pre></div><p id="b2d6">After successful login, execute the below command to switch to the needed database:</p><div id="f5db"><pre>USE customer;

<span class="hljs-keyword">SELECT</span> <span class="hljs-built_in">count</span>(<span class="hljs-operator">*</span>) <span class="hljs-keyword">FROM</span> purchase_transaction;</pre></div><figure id="995a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*cUm0Lsod0NcKX0HTpUf1FA.png"><figcaption></figcaption></figure><p id="b5ba">If you don't like to use the CLI you can connect to a DB using any database tool, like DBeaver.</p><figure id="5c5f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*7NfBC3A_tCuExh4DLkpMjA.png"><figcaption></figcaption></figure><p id="490c">Just need to append the database with extra configurations:</p><div id="0adf"><pre><span class="hljs-string">?a</span>l

Options

lowPublicKeyRetrieval=<span class="hljs-literal">true</span>&useSSL=<span class="hljs-literal">false</span></pre></div><h2 id="82cf">Performance of queries</h2><p id="db8f">Now about performance. There is a great article explaining how you can check the performance of a query. I highly recommend you read it.</p><div id="f0a9" class="link-block"> <a href="https://vladmihalcea.com/mysql-query-profiling-performance-schema/"> <div> <div> <h2>MySQL Query Profiling Using Performance Schema — Vlad Mihalcea</h2> <div><h3>In this article, I’m going to explain how to do query profiling using the MySQL Performance Schema. The Performance…</h3></div> <div><p>vladmihalcea.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*BE3YGFlAPp_3KXBy)"></div> </div> </div> </a> </div><p id="d281">I will demonstrate to you how to check the performance of a query using the following example. Imagine our application needs to check if the customer has any purchase transactions. For this, we will create a simple query that checks the existence of any transaction by the customer id. To speed up the query we will add the index for the column that holds the customer id and then compare the performance of the query before and after the index.</p><p id="0f88">To be able to check a performance we have to log in to the database as a root user. If you logged in as a simple database user for example the one set as the environment variable in the docker-compose file with the user name <b><i>customer_user</i></b> you won’t be able to query tables with needed information, so keep it in mind. Logg out from the simple user and execute the command:</p><div id="bbe7"><pre><span class="hljs-variable">$mysql</span> -uroot -proot;</pre></div><figure id="ac60"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*EMKiBjJ-lDuWS-SVw_b4PA.png"><figcaption></figcaption></figure><p id="217c">After logging in as a root we have to perform a couple of queries to enable query performance profiling, as explained in the article mentioned before.</p><div id="4098"><pre><span class="hljs-keyword">UPDATE</span> performance_schema.setup_instruments <span class="hljs-keyword">SET</span> enabled <span class="hljs-operator">=</span> <span class="hljs-string">'YES'</span>, timed <span class="hljs-operator">=</span> <span class="hljs-string">'YES'</span>;

<span class="hljs-keyword">UPDATE</span> performance_schema.setup_consumers <span class="hljs-keyword">SET</span> enabled <span class="hljs-operator">=</span> <span class="hljs-string">'YES'</span>;</pre></div><p id="5220">First, we perform a simple query that checks if a customer has any purchase transactions. Remember that we don't have yet an index on this column.</p><div id="7024"><pre><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">EXISTS</span>(<span class="hljs-keyword">SELECT</span> <span class="hljs-operator"></span> <span class="hljs-keyword">FROM</span> purchase_transaction <span class="hljs-keyword">WHERE</span> customer_id <span class="hljs-operator">=</span> <span class="hljs-string">'e68316fa-4c54-419f-8e95-b2d3ce31e258'</span>);</pre></div><p id="b265">Because the customer id is randomly generated in your case it will be different than mine, just get anyone by executing the query.</p><div id="b2d1"><pre><span class="hljs-keyword">SELECT</span> <span class="hljs-operator"></span> <span class="hljs-keyword">FROM</span> purchase_transaction;</pre></div><figure id="e257"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Tvm7BDfb017kwLNGMGg_hw.png"><figcaption></figcaption></figure><p id="f1d5">Now we will execute a query to fetch the performance of the previous one:</p><div id="50bb"><pre><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">TRUNCATE</span>(timer_wait <span class="hljs-operator">/</span> <span class="hljs-number">10000000000</span>, <span class="hljs-number">2</span>) <span class="hljs-keyword">AS</span> timer_wait_sec, rows_sent,
rows_examined,
select_scan,
no_index_used, digest
<span class="hljs-keyword">FROM</span> performance_schema.events_statements_history_long <span class="hljs-keyword">WHERE</span> sql_text <span class="hljs-keyword">LIKE</span> <span class="hljs-string">'%e68316fa-4c54-419f-8e95-b2d3ce31e258%'</span>;</pre></div><p id="d0f4">The only way to get needed information at this point is by the SQL text of a query. Using this column is not that convenient because we have to find something quite distinct in the query to be able to fetch the right one. But this query returns a digest of an original query and in posterior queries, we can use it instead of a SQL text.</p><div id="c8a8"><pre><span class="hljs-keyword">SELECT</span> sql_text
<span class="hljs-keyword">FROM</span> performance_schema.events_statements_history_long <span class="hljs-keyword">WHERE</span> digest <span class="hljs-operator">=</span> <span class="hljs-string">'351a4a86f9401bf26de96e5312f2853aeed0b7177f8ce57090c16e86dadb894f'</span>;</pre></div><figure id="0de4"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*iTIXVopMxSN8eC-OC6W-rA.png"><figcaption></figcaption></figure><p id="ba84">I ran a query a couple of times just to have more performance records. And then checked performance using the digest of a query.</p><div id="d16d"><pre><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">TRUNCATE</span>(timer_wait <span class="hljs-operator">/</span> <span class="hljs-number">10000000000</span>, <span class="hljs-number">2</span>) <span class="hljs-keyword">AS</span> timer_wait_sec, rows_sent,
rows_examined,
select_scan,
no_index_used, digest
<span class="hljs-keyword">FROM</span> performance_schema.events_statements_history_long <span class="hljs-keyword">WHERE</span> digest <span class="hljs-operator">=</span> <span class="hljs-string">'351a4a86f9401bf26de96e5312f2853aeed0b7177f8ce57090c16e86dadb894f'</span>;</pre></div><figure id="8e4d"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*UMX_vemb94p9xn7iXOzuLA.png"><figcaption></figcaption></figure><p id="a892">As you can see from the screen the average time of a query fluctuates near 5 milliseconds. And the column <b><i>no_index_used</i></b> indicates that no index was used during the execution of a query.</p><p id="6b20">Now I add the index to the customer id column:</p><div id="5c07"><pre><span class="hljs-keyword">CREATE</span> INDEX customer_id_inx <span class="hljs-keyword">ON</span> purchase_transaction (customer_id);</pre></div><p id="e804">After I execute the original query couple of times.</p><figure id="b886"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*sK2CvHy3jz0gzJLfQy4N8A.png"><figcaption></figcaption></figure><p id="6ec1">And then I check performance again by the digest:</p><div id="e00b"><pre><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">TRUNCATE</span>(timer_wait <span class="hljs-operator">/</span> <span class="hljs-number">10000000000</span>, <span class="hljs-number">2</span>) <span class="hljs-keyword">AS</span> timer_wait_sec, rows_sent,
rows_examined,
select_scan,
no_index_used, digest
<span class="hljs-keyword">FROM</span> performance_schema.events_statements_history_long <span class="hljs-keyword">WHERE</span> digest <span class="hljs-operator">=</span> <span class="hljs-string">'351a4a86f9401bf26de96e5312f2853aeed0b7177f8ce57090c16e86dadb894f'</span>;</pre></div><figure id="adfc"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*dH0ZDipO8bBvzcD0tHad4A.png"><figcaption></figcaption></figure><p id="a464">As you can see the performance of the query improved a lot and now it spends less than a millisecond because it takes advantage of an index.</p><h2 id="de95">Conclusion</h2><p id="ba2a">Docker gives us as developers an opportunity to replicate the production environment on our local machine. This tutorial is meant to show how you can create a database with a large amount of data. And use it to test and improve the queries executed by your application. Here I use simple examples, but these examples can serve as a base for your use cases.</p><p id="3f67">Thank you for reading! Please like and follow. If you have any questions or suggestions, please feel free to write me on my LinkedIn <a href="https://www.linkedin.com/in/ivan-polovyi-5a4082178/">account</a>.</p><p id="8953">Become a <a href="https://polovyiivan.medium.com/membership">member</a> for full access to Medium content.</p></article></body>

Initialize MySQL database inside the docker container with large-volume data

Application performance is essential. As a web application developer, I do everything to guarantee that my code will run fast. One of the offenders of the application response time is communication with a database.

Offen during development a small amount of data is used to test the code. But when the application goes to production it may handle hundreds of thousands of records and the communication between the app and the database can slow down the application’s performance.

In this tutorial, I will show you how you can create a local instance of a database with a large amount of data and test its performance.

The use case

Imagine we have a database with a table that stores the purchase transactions of customers. The structure of a table:

It contains the id of a purchase transaction and a payment type that can be one of the following VISA, MASTERCARD, DISCOVER, AMEX, or CASH. The table has a column to store the amount of the transaction, the date and time the transaction occurs, and the id of the customer to whom the transaction belongs. Any given customer can have many transactions.

Setup creation

The plan is to during the Docker container initialization create a file with SQL script containing statements to create a table and insert data, then execute this file against the instance of a database inside the container.

To spin a local instance of a MySQL database I will use Docker. I will use a Docker Compose to create a container for more convenience.

First, we have to define the docker-compose file:

version: '3.8'
services:
  my_sql_local:
    container_name: my_sql_local
    image: mysql:8.0.31
    restart: always
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: customer
      MYSQL_USER: customer_user
      MYSQL_PASSWORD: customer_password
      MAX_ALLOWED_PACKET: 1073741824
    volumes:
      - ./init/mysql/scripts/bash:/docker-entrypoint-initdb.d
      - ./init/mysql/scripts/sql:/tmp/sql

This file is very simple to file, here I define one container based on the official MySQL image. More about the image you can find here:

mysql — Official Image | Docker Hub

MySQL is a widely used, open-source relational database management system (RDBMS).

hub.docker.com

I defined a couple of environment variables. Most of the variables have self-explanatory names and don’t need further explication.

As I will create the database instance with a large volume of data, I needed to set the MAX_ALLOWED_PACKET variable to the maximum to avoid any issue related to the size of data inserted into the database.

An important part of this definition file is a Docker volume mapping. In this section, I defined two of them. The first volume is mapped to the special directory inside the docker container.

When the container runs it will check this folder and if it finds files in this folder with extension sh or SQL it will execute the files. More about this feature you can find below:

Localstack resource creation on initialization.

This tutorial explains how to create the resources on localstac initialization.

aws.plainenglish.io

The second volume I map to store the SQL script mentioned before. This volume is not necessary to map. Without this volume, the setup will work properly. I do this mapping to have easy access to the script which is useful for testing and debugging purposes.

In the first volume, I put the following bash script file:

#!/bin/bash

echo "########### Creating env variables ###########"
PAYMENT_TYPES=("VISA" "MASTERCARD" "DISCOVER" "AMEX" "CASH")
ROWS=10000
NUMBER_OF_CUSTOMERS=$((${ROWS} / 3))
echo "########### Numbers of customers is ${NUMBER_OF_CUSTOMERS} ###########"

SQL_SCRIPT_PATH=/tmp/sql/01-init-sql-script.sql

function getUUID() {
  UUID=($(cat /proc/sys/kernel/random/uuid))
}

function getDate() {
  DATE=$(date -d "$((${RANDOM_NUMBER} % 22 + 2000))-
                  $((${RANDOM_NUMBER} % 12 + 1))-
                  $((${RANDOM_NUMBER} % 28 + 1))
                  $((${RANDOM_NUMBER} % 23 + 1)):00:00" '+%Y-%m-%d %H:%M:%S')
}

function getPaymentType() {
  PAYMENT_TYPE=${PAYMENT_TYPES[${RANDOM_NUMBER} % ${#PAYMENT_TYPES[@]}]}
}

function getAmount() {
  AMOUNT=$((1 + ${RANDOM_NUMBER} % (200 - 1))).$((${RANDOM_NUMBER} % 99))
}

echo "########### Creating sql script file ###########"
echo "
DROP TABLE IF EXISTS purchase_transaction;

CREATE TABLE purchase_transaction (
id VARCHAR(36) PRIMARY KEY,
payment_type VARCHAR(20) NOT NULL,
amount DECIMAL(19,4),
customer_id VARCHAR(36),
created_at DATETIME NOT NULL
);
" >${SQL_SCRIPT_PATH}

if [ "${ROWS}" -gt 0 ]; then
  echo "########### Generating insert statement for ${ROWS} rows ###########"
  echo "INSERT INTO purchase_transaction VALUES" >>${SQL_SCRIPT_PATH}
fi

for ((i = ${NUMBER_OF_CUSTOMERS}; i >= 0; i--)); do
  RANDOM_NUMBER=$((RANDOM))

  if [ ${i} == 0 ]; then
    NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER=${ROWS}
  else
    MAX_NUMBER_OF_TRANSACTION_PER_CUSTOMER=$((${ROWS} / ${i}))
    NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER=$((${RANDOM_NUMBER} % ${MAX_NUMBER_OF_TRANSACTION_PER_CUSTOMER} + 2))
    ROWS=$((ROWS - NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER))
  fi
  getUUID
  for ((j = 1; j <= ${NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER}; j++)); do
    RANDOM_NUMBER=$((RANDOM))
    getPaymentType
    getAmount
    getDate
    if [ ${i} == 0 ] && [ ${j} == ${NUMBER_OF_TRANSACTIONS_FOR_CUSTOMER} ]; then
      LAST_CHAR=";"
    else
      LAST_CHAR=","
    fi
    echo "(uuid(),\"${PAYMENT_TYPE}\", \"${AMOUNT}\", \"${UUID}\", \"${DATE}\")${LAST_CHAR}" >>${SQL_SCRIPT_PATH}
  done
done

echo "########### Running SQL script against DB ###########"
mysql --user="customer_user" --password="customer_password" --database="customer" <${SQL_SCRIPT_PATH}

echo "########### Script execution finished! ###########"

The file looks confusing, but it is only the first look. At the beginning of the file, I declare variables. One variable is an array that contains payment types, the next variable is set the number of rows that will be inserted.

As I've mentioned before the customer can have multiple purchase transactions, so we will calculate the number of customers by dividing the number of rows by 3. This number for me makes more sense since if we have for example 10 transactions we can divide it between 3 customers and each customer can have from 1 to 8 transactions.

The next variable defines the path to the directory inside the container where the SQL script will be created. As you may remember this directory is mapped as a volume in the docker-compose file.

To generate values for the insert statement I define a couple of functions. The first function will generate the UUID which is used as an id of a customer. The second function generates a random timestamp, then one more function gets the random value from the payment type array. And the last function generates a random value for the amount of transaction.

After variables and function definition, the SQL statement that creates a table is generated and saved into the file, created in the directory defined in the previously defined environment variables.

Then the most important part is where the script creates insert statements. For each customer, the number of transactions is randomly calculated. For each transaction defined above functions are called to generate the values. And then those values are combined into the insert statement and this statement is appended to a script file created earlier.

And last but not least the script has a command to upload the created file into a database.

The complete code can be found here:

GitHub - polovyivan/mysql-init-db-on-container-startup

You can't perform that action at this time. You signed in with another tab or window. You signed out in another tab or…

github.com

Setup execution

To create and run the container, execute the command from the directory where the docker-compose file is located:

$docker-compose up

or for Docker Compose used as a plugin:

$docker compose up

Depending on the number of rows and the configuration of the PC the database creation time may take up to 5 minutes, so be patient. Creating the database instance with 100k records takes up to 5 minutes on my PC with 4 cores CPU and 8 GiB of RAM.

Unfortunately, the bash script is not as performant as for example Java. This is the reason why the creation of the transaction id I’ve left for the database. The execution logs look like below:

Run queries in the database

The easiest way to run the queries in the database is to enter inside the container and use the MySQL command line interface. To enter the container execute the command:

$docker container exec -it my_sql_local bash

More about Docker commands you can find here:

Docker commands for software developers

It is hard to imagine a software developer without Docker nowadays. Docker allows developers to keep a local…

faun.pub

And to login inside the database run the command using credentials configured as environment variables in the docker-compose file:

$mysql -ucustomer_user -pcustomer_password;

After successful login, execute the below command to switch to the needed database:

USE customer;

SELECT count(*) FROM purchase_transaction;

If you don't like to use the CLI you can connect to a DB using any database tool, like DBeaver.

Just need to append the database with extra configurations:

?allowPublicKeyRetrieval=true&useSSL=false

Performance of queries

Now about performance. There is a great article explaining how you can check the performance of a query. I highly recommend you read it.

MySQL Query Profiling Using Performance Schema — Vlad Mihalcea

In this article, I’m going to explain how to do query profiling using the MySQL Performance Schema. The Performance…

vladmihalcea.com

I will demonstrate to you how to check the performance of a query using the following example. Imagine our application needs to check if the customer has any purchase transactions. For this, we will create a simple query that checks the existence of any transaction by the customer id. To speed up the query we will add the index for the column that holds the customer id and then compare the performance of the query before and after the index.

To be able to check a performance we have to log in to the database as a root user. If you logged in as a simple database user for example the one set as the environment variable in the docker-compose file with the user name customer_user you won’t be able to query tables with needed information, so keep it in mind. Logg out from the simple user and execute the command:

$mysql -uroot -proot;

After logging in as a root we have to perform a couple of queries to enable query performance profiling, as explained in the article mentioned before.

UPDATE performance_schema.setup_instruments SET enabled = 'YES', timed = 'YES';

UPDATE performance_schema.setup_consumers SET enabled = 'YES';

First, we perform a simple query that checks if a customer has any purchase transactions. Remember that we don't have yet an index on this column.

SELECT EXISTS(SELECT * FROM purchase_transaction 
              WHERE customer_id = 'e68316fa-4c54-419f-8e95-b2d3ce31e258');

Because the customer id is randomly generated in your case it will be different than mine, just get anyone by executing the query.

SELECT * FROM purchase_transaction;

Now we will execute a query to fetch the performance of the previous one:

SELECT
    TRUNCATE(timer_wait / 10000000000, 2) AS timer_wait_sec,
    rows_sent,                             
    rows_examined,                                           
    select_scan,                          
    no_index_used, 
    digest                             
FROM
    performance_schema.events_statements_history_long
WHERE
    sql_text LIKE '%e68316fa-4c54-419f-8e95-b2d3ce31e258%';

The only way to get needed information at this point is by the SQL text of a query. Using this column is not that convenient because we have to find something quite distinct in the query to be able to fetch the right one. But this query returns a digest of an original query and in posterior queries, we can use it instead of a SQL text.

SELECT
    sql_text               
FROM
    performance_schema.events_statements_history_long
WHERE
    digest = '351a4a86f9401bf26de96e5312f2853aeed0b7177f8ce57090c16e86dadb894f';

I ran a query a couple of times just to have more performance records. And then checked performance using the digest of a query.

SELECT
    TRUNCATE(timer_wait / 10000000000, 2) AS timer_wait_sec,
    rows_sent,                             
    rows_examined,                                           
    select_scan,                          
    no_index_used, 
    digest                             
FROM
    performance_schema.events_statements_history_long
WHERE
    digest = '351a4a86f9401bf26de96e5312f2853aeed0b7177f8ce57090c16e86dadb894f';

As you can see from the screen the average time of a query fluctuates near 5 milliseconds. And the column no_index_used indicates that no index was used during the execution of a query.

Now I add the index to the customer id column:

CREATE INDEX customer_id_inx ON purchase_transaction (customer_id);

After I execute the original query couple of times.

And then I check performance again by the digest:

SELECT
    TRUNCATE(timer_wait / 10000000000, 2) AS timer_wait_sec,
    rows_sent,                             
    rows_examined,                                           
    select_scan,                          
    no_index_used, 
    digest                             
FROM
    performance_schema.events_statements_history_long
WHERE
    digest = '351a4a86f9401bf26de96e5312f2853aeed0b7177f8ce57090c16e86dadb894f';

As you can see the performance of the query improved a lot and now it spends less than a millisecond because it takes advantage of an index.

Conclusion

Docker gives us as developers an opportunity to replicate the production environment on our local machine. This tutorial is meant to show how you can create a database with a large amount of data. And use it to test and improve the queries executed by your application. Here I use simple examples, but these examples can serve as a base for your use cases.

Thank you for reading! Please like and follow. If you have any questions or suggestions, please feel free to write me on my LinkedIn account.

Become a member for full access to Medium content.