MB20261

Summary

The provided content is a comprehensive guide on building Llama.cpp with GPU (CUDA) support, detailing the necessary steps and prerequisites for setting up the environment, installing dependencies, and compiling the software to leverage GPU acceleration for efficient execution of large language models.

Abstract

The article "LLM By Examples: Build Llama.cpp with GPU (CUDA) support" offers a detailed walkthrough for developers looking to enhance the performance of Llama.cpp, a framework for large language models, through GPU acceleration. It begins by emphasizing the importance of GPU support for reducing training and inference times, enabling real-time applications. The guide uses a Windows WSL2 environment with an Nvidia RTX2070 GPU as an example setup, providing command-line instructions and code snippets for verifying CUDA installation, installing necessary libraries, cloning the Llama.cpp Git repository, and building the software using CMake. The article also addresses common installation errors and provides links to additional resources for validating the installation and exploring different build environments, such as CPU-only configurations and Docker-based setups.

Opinions

  • The author believes that leveraging GPU acceleration is crucial for the efficient execution of large language models, particularly for real-time or near-real-time applications.
  • The guide assumes that readers may not be familiar with core concepts of Llama.cpp and provides links to introductory materials for those needing background information.
  • The use of a Windows WSL2 environment with a common gaming laptop GPU (Nvidia RTX2070) suggests that the setup process is accessible to a wide range of developers, not just those with specialized hardware.
  • The article implies that the Llama.cpp framework is designed with ease-of-use and performance in mind, catering to both command-line interface (CLI) enthusiasts and those looking to deploy server applications.
  • By including a section on potential installation errors, the author acknowledges the complexity of the build process and aims to prepare users for troubleshooting common issues.
  • The provision of additional resources for different installation scenarios indicates the author's commitment to supporting a diverse user base with varying needs and technical setups.

LLM By Examples: Build Llama.cpp with GPU (CUDA) support

As the demand for advanced language models continues to surge, developers increasingly seek high-performance solutions to harness their capabilities. Llama.cpp stands out as a powerful framework designed for efficient execution of large language models. This article aims to provide a comprehensive guide to building Llama.cpp with GPU (CUDA) support, enabling users to maximize computational efficiency.

Building Llama.cpp with GPU (CUDA) support unlocks the potential for accelerated performance and enhanced scalability. By leveraging the parallel processing power of modern GPUs, developers can significantly reduce the time taken for model training and inference, allowing for real-time or near-real-time applications. In this guide, we will explore the prerequisites for setting up your environment, such as compatible GPU hardware and CUDA software, along with detailed steps to configure your system. We will also walk through the installation and build process, ensuring you have the tools needed to effectively deploy Llama.cpp on GPU, opening new possibilities for your projects and applications in natural language processing and beyond.

If you are not familiar with the core concepts of Llama.cpp, take a look at the link below first.

Prepare for installation

To demonstrate the build and installation process, we will use a Windows WSL2 environment with an Nvidia RTX 2070 (8 GB) GPU. This configuration is very common and can be found in many gaming laptops and PCs.

If you are interested in how to set up such an environment, take a look at the links below:

Now, let’s check what we have here:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

$ nvidia-smi
Mon Oct 21 16:19:03 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.18       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070         On | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P0               34W /  N/A|      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
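The two checks above can also be scripted. The following is a minimal sketch, not part of the original article: the helper names `nvcc_release` and `smi_cuda_version` are made up for illustration, and the parsing assumes the output layout shown above. Note that the "CUDA Version" printed by nvidia-smi is the highest CUDA version the installed driver supports, so the toolkit release reported by nvcc should normally not be newer than it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of CUDA or llama.cpp).

# Extract the toolkit release (e.g. "12.1") from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Extract the "CUDA Version" field from `nvidia-smi` output on stdin.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only query the tools if both are actually on PATH.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    toolkit=$(nvcc --version | nvcc_release)
    driver=$(nvidia-smi | smi_cuda_version)
    echo "toolkit release: $toolkit, driver supports up to CUDA: $driver"
else
    echo "nvcc or nvidia-smi not found on PATH; install the CUDA toolkit and driver first"
fi
```

On the machine used in this guide, both values would be expected to read 12.1, matching the output shown above.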

Build and Installation

To build Llama.cpp, we will need:

  • cmake and its supporting libraries
  • git, to clone the llama.cpp Git repository
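Before installing anything, it can help to see which of these tools are already present. The snippet below is a small sketch under a few assumptions: `check_tool` is a hypothetical helper name, and the list of tools (git, cmake, and the C/C++ compilers that CMake looks for as `cc` and `c++`) is our reading of the requirements, not an official list.

```shell
#!/usr/bin/env bash
# check_tool is a hypothetical helper: report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the assumed build tools are missing.
missing=0
for tool in git cmake cc c++; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "tools missing: $missing"
```

Any tool reported as missing should be covered by the apt commands that follow.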

Now, let’s get started.

$ git clone https://github.com/ggerganov/llama.cpp
Cloning into 'llama.cpp'...
remote: Enumerating objects: 35858, done.
remote: Counting objects: 100% (105/105), done.
remote: Compressing objects: 100% (91/91), done.
remote: Total 35858 (delta 35), reused 44 (delta 11), pack-reused 35753 (from 1)
Receiving objects: 100% (35858/35858), 59.87 MiB | 346.00 KiB/s, done.
Resolving deltas: 100% (26027/26027), done.

$ cd llama.cpp

$ sudo apt-get install libcurl4-openssl-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  libcurl4-doc libidn11-dev libkrb5-dev libldap2-dev librtmp-dev libssh2-1-dev libssl-dev
The following NEW packages will be installed:
  libcurl4-openssl-dev
0 upgraded, 1 newly installed, 0 to remove and 27 not upgraded.
Need to get 386 kB of archives.
After this operation, 1698 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libcurl4-openssl-dev amd64 7.81.0-1ubuntu1.18 [386 kB]
Fetched 386 kB in 1s (468 kB/s)
Selecting previously unselected package libcurl4-openssl-dev:amd64.
(Reading database ... 37472 files and directories currently installed.)
Preparing to unpack .../libcurl4-openssl-dev_7.81.0-1ubuntu1.18_amd64.deb ...
Unpacking libcurl4-openssl-dev:amd64 (7.81.0-1ubuntu1.18) ...
Setting up libcurl4-openssl-dev:amd64 (7.81.0-1ubuntu1.18) ...
Processing triggers for man-db (2.10.2-1) ...

$ sudo apt install build-essential git cmake libopenblas-dev libatlas-base-dev
[sudo] password for wsluser:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
build-essential is already the newest version (12.9ubuntu3).
build-essential set to manually installed.
git is already the newest version (1:2.34.1-1ubuntu1.11).
git set to manually installed.
The following additional packages will be installed:
  cmake-data dh-elpa-helper emacsen-common libatlas3-base libjsoncpp25 libopenblas-pthread-dev libopenblas0
  libopenblas0-pthread librhash0
Suggested packages:
  cmake-doc ninja-build cmake-format libatlas-doc liblapack-doc
The following NEW packages will be installed:
  cmake cmake-data dh-elpa-helper emacsen-common libatlas-base-dev libatlas3-base libjsoncpp25 libopenblas-dev
  libopenblas-pthread-dev libopenblas0 libopenblas0-pthread librhash0
0 upgraded, 12 newly installed, 0 to remove and 27 not upgraded.
Need to get 25.5 MB of archives.
After this operation, 175 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libjsoncpp25 amd64 1.9.5-3 [80.0 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 librhash0 amd64 1.4.2-1ubuntu1 [125 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/main amd64 dh-elpa-helper all 2.0.9ubuntu1 [7610 B]
Get:4 http://archive.ubuntu.com/ubuntu jammy/main amd64 emacsen-common all 3.0.4 [14.9 kB]
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 cmake-data all 3.22.1-1ubuntu1.22.04.2 [1913 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 cmake amd64 3.22.1-1ubuntu1.22.04.2 [5010 kB]
Get:7 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libatlas3-base amd64 3.10.3-12ubuntu1 [3340 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libatlas-base-dev amd64 3.10.3-12ubuntu1 [3590 kB]
Get:9 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopenblas0-pthread amd64 0.3.20+ds-1 [6803 kB]
Get:10 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopenblas0 amd64 0.3.20+ds-1 [6098 B]
Get:11 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopenblas-pthread-dev amd64 0.3.20+ds-1 [4634 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libopenblas-dev amd64 0.3.20+ds-1 [18.6 kB]
Fetched 25.5 MB in 1min 1s (419 kB/s)
Selecting previously unselected package libjsoncpp25:amd64.
(Reading database ... 34093 files and directories currently installed.)
Preparing to unpack .../00-libjsoncpp25_1.9.5-3_amd64.deb ...
Unpacking libjsoncpp25:amd64 (1.9.5-3) ...
Selecting previously unselected package librhash0:amd64.
Preparing to unpack .../01-librhash0_1.4.2-1ubuntu1_amd64.deb ...
Unpacking librhash0:amd64 (1.4.2-1ubuntu1) ...
Selecting previously unselected package dh-elpa-helper.
Preparing to unpack .../02-dh-elpa-helper_2.0.9ubuntu1_all.deb ...
Unpacking dh-elpa-helper (2.0.9ubuntu1) ...
Selecting previously unselected package emacsen-common.
Preparing to unpack .../03-emacsen-common_3.0.4_all.deb ...
Unpacking emacsen-common (3.0.4) ...
Selecting previously unselected package cmake-data.
Preparing to unpack .../04-cmake-data_3.22.1-1ubuntu1.22.04.2_all.deb ...
Unpacking cmake-data (3.22.1-1ubuntu1.22.04.2) ...
Selecting previously unselected package cmake.
Preparing to unpack .../05-cmake_3.22.1-1ubuntu1.22.04.2_amd64.deb ...
Unpacking cmake (3.22.1-1ubuntu1.22.04.2) ...
Selecting previously unselected package libatlas3-base:amd64.
Preparing to unpack .../06-libatlas3-base_3.10.3-12ubuntu1_amd64.deb ...
Unpacking libatlas3-base:amd64 (3.10.3-12ubuntu1) ...
Selecting previously unselected package libatlas-base-dev:amd64.
Preparing to unpack .../07-libatlas-base-dev_3.10.3-12ubuntu1_amd64.deb ...
Unpacking libatlas-base-dev:amd64 (3.10.3-12ubuntu1) ...
Selecting previously unselected package libopenblas0-pthread:amd64.
Preparing to unpack .../08-libopenblas0-pthread_0.3.20+ds-1_amd64.deb ...
Unpacking libopenblas0-pthread:amd64 (0.3.20+ds-1) ...
Selecting previously unselected package libopenblas0:amd64.
Preparing to unpack .../09-libopenblas0_0.3.20+ds-1_amd64.deb ...
Unpacking libopenblas0:amd64 (0.3.20+ds-1) ...
Selecting previously unselected package libopenblas-pthread-dev:amd64.
Preparing to unpack .../10-libopenblas-pthread-dev_0.3.20+ds-1_amd64.deb ...
Unpacking libopenblas-pthread-dev:amd64 (0.3.20+ds-1) ...
Selecting previously unselected package libopenblas-dev:amd64.
Preparing to unpack .../11-libopenblas-dev_0.3.20+ds-1_amd64.deb ...
Unpacking libopenblas-dev:amd64 (0.3.20+ds-1) ...
Setting up libopenblas0-pthread:amd64 (0.3.20+ds-1) ...
update-alternatives: using /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 to provide /usr/lib/x86_64-linux-gnu/libblas.so.3 (libblas.so.3-x86_64-linux-gnu) in auto mode
update-alternatives: using /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 to provide /usr/lib/x86_64-linux-gnu/liblapack.so.3 (liblapack.so.3-x86_64-linux-gnu) in auto mode
update-alternatives: using /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.so.0 to provide /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (libopenblas.so.0-x86_64-linux-gnu) in auto mode
Setting up libatlas3-base:amd64 (3.10.3-12ubuntu1) ...
Setting up libatlas-base-dev:amd64 (3.10.3-12ubuntu1) ...
update-alternatives: using /usr/lib/x86_64-linux-gnu/atlas/libblas.so to provide /usr/lib/x86_64-linux-gnu/libblas.so (libblas.so-x86_64-linux-gnu) in auto mode
update-alternatives: using /usr/lib/x86_64-linux-gnu/atlas/liblapack.so to provide /usr/lib/x86_64-linux-gnu/liblapack.so (liblapack.so-x86_64-linux-gnu) in auto mode
Setting up emacsen-common (3.0.4) ...
Setting up dh-elpa-helper (2.0.9ubuntu1) ...
Setting up libjsoncpp25:amd64 (1.9.5-3) ...
Setting up libopenblas0:amd64 (0.3.20+ds-1) ...
Setting up librhash0:amd64 (1.4.2-1ubuntu1) ...
Setting up cmake-data (3.22.1-1ubuntu1.22.04.2) ...
Setting up libopenblas-pthread-dev:amd64 (0.3.20+ds-1) ...
update-alternatives: using /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so to provide /usr/lib/x86_64-linux-gnu/libblas.so (libblas.so-x86_64-linux-gnu) in auto mode
update-alternatives: using /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so to provide /usr/lib/x86_64-linux-gnu/liblapack.so (liblapack.so-x86_64-linux-gnu) in auto mode
update-alternatives: using /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.so to provide /usr/lib/x86_64-linux-gnu/libopenblas.so (libopenblas.so-x86_64-linux-gnu) in auto mode
Setting up libopenblas-dev:amd64 (0.3.20+ds-1) ...
Setting up cmake (3.22.1-1ubuntu1.22.04.2) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.8) ...
/sbin/ldconfig.real: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link

$ export PATH=/usr/local/cuda-12.1/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
$ export CUDA_HOME=/usr/local/cuda-12.1/
$ export CUDA_VERSION=121
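The exports above work as written, but running them repeatedly keeps growing PATH, and an empty LD_LIBRARY_PATH leaves a trailing colon, which the dynamic loader treats as the current directory. A possible variant is sketched below; `prepend_once` is a hypothetical helper name, and the install prefix /usr/local/cuda-12.1 is simply the one used in this guide.

```shell
#!/usr/bin/env bash
# prepend_once is a hypothetical helper: prepend a directory to a
# colon-separated list only if it is not already in the list.
#   $1 = directory to add, $2 = current list (may be empty)
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)
            if [ -n "$2" ]; then
                printf '%s\n' "$1:$2"
            else
                printf '%s\n' "$1"
            fi
            ;;
    esac
}

# Assumed install prefix, taken from the exports in this guide.
CUDA_HOME=/usr/local/cuda-12.1
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export CUDA_HOME PATH LD_LIBRARY_PATH
```

Placing these lines in a shell startup file such as ~/.bashrc would make the settings persist across sessions, since plain exports only last for the current shell.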

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

$ cmake -B build -DGGML_CUDA=ON
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP found
-- Using llamafile
-- Using AMX
-- Found CUDAToolkit: /usr/local/cuda-12.1/include (found version "12.1.66")
-- CUDA found
-- Using CUDA architectures: 52;61;70;75
-- The CUDA compiler identification is NVIDIA 12.1.66
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.1/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 11.4.0
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /work/GitHubs/MEAIDev/poc-ai-tool-llama-cpp/llama.cpp/build
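The configure log above shows the build targeting four CUDA architectures (52;61;70;75). If the binaries only need to run on one GPU, restricting the list with the standard CMake variable CMAKE_CUDA_ARCHITECTURES should shorten compilation; the RTX 2070 is a Turing card with compute capability 7.5, written as 75. The sketch below only prints the commands so they can be reviewed first. The variable names `GPU_ARCH` and `JOBS` are our own, and the `-j` option assumes CMake 3.12 or newer.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble the configure and build commands and print them.
GPU_ARCH=75                          # compute capability 7.5 (e.g. RTX 2070), without the dot
JOBS=$(nproc 2>/dev/null || echo 1)  # parallel build jobs; falls back to 1 if nproc is unavailable

configure_cmd="cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=$GPU_ARCH"
build_cmd="cmake --build build --config Release -j $JOBS"

echo "$configure_cmd"
echo "$build_cmd"
```

To run them, copy the two printed lines into the shell from inside the llama.cpp directory. A build restricted this way is not expected to run on GPUs of other architectures.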

$ cmake --build build --config Release
[  0%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[  1%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[  1%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-backend.cpp.o
[  2%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-quants.c.o
[  2%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/acc.cu.o
[  2%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/arange.cu.o
[  3%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/argmax.cu.o
[  3%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/argsort.cu.o
[  4%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/binbcast.cu.o
[  4%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/clamp.cu.o
[  4%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/concat.cu.o
[  5%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/conv-transpose-1d.cu.o
[  5%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/convert.cu.o
[  6%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/count-equal.cu.o
[  6%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/cpy.cu.o
[  6%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/cross-entropy-loss.cu.o
[  7%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/diagmask.cu.o
[  7%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/dmmv.cu.o
[  8%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn-tile-f16.cu.o
[  8%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn-tile-f32.cu.o
[  8%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/fattn.cu.o
[  9%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/getrows.cu.o
[  9%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/im2col.cu.o
[ 10%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/mmq.cu.o
[ 10%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/mmvq.cu.o
[ 10%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/norm.cu.o
[ 11%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/opt-step-adamw.cu.o
[ 11%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/out-prod.cu.o
[ 12%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/pad.cu.o
[ 12%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/pool2d.cu.o
[ 12%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/quantize.cu.o
[ 13%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/rope.cu.o
[ 13%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/rwkv-wkv.cu.o
[ 14%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/scale.cu.o
[ 14%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/softmax.cu.o
[ 14%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sum.cu.o
[ 15%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/sumrows.cu.o
[ 15%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/tsembd.cu.o
[ 16%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/unary.cu.o
[ 16%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/upscale.cu.o
[ 16%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda.cu.o
[ 17%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cu.o
[ 17%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cu.o
[ 18%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cu.o
[ 18%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cu.o
[ 18%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cu.o
[ 19%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq1_s.cu.o
[ 19%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_s.cu.o
[ 20%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_xs.cu.o
[ 20%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cu.o
[ 20%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq3_s.cu.o
[ 21%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cu.o
[ 21%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq4_nl.cu.o
[ 22%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-iq4_xs.cu.o
[ 22%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q2_k.cu.o
[ 22%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q3_k.cu.o
[ 23%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_0.cu.o
[ 23%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_1.cu.o
[ 24%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q4_k.cu.o
[ 24%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_0.cu.o
[ 24%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_1.cu.o
[ 25%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q5_k.cu.o
[ 25%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q6_k.cu.o
[ 26%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/mmq-instance-q8_0.cu.o
[ 26%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q4_0-q4_0.cu.o
[ 26%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q4_0-q4_0.cu.o
[ 27%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-q8_0-q8_0.cu.o
[ 27%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-q8_0-q8_0.cu.o
[ 28%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs128-f16-f16.cu.o
[ 28%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs256-f16-f16.cu.o
[ 28%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f16-instance-hs64-f16-f16.cu.o
[ 29%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs128-f16-f16.cu.o
[ 29%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs256-f16-f16.cu.o
[ 30%] Building CUDA object ggml/src/CMakeFiles/ggml.dir/ggml-cuda/template-instances/fattn-vec-f32-instance-hs64-f16-f16.cu.o
[ 30%] Building CXX object ggml/src/CMakeFiles/ggml.dir/llamafile/sgemm.cpp.o
[ 30%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-amx/mmq.cpp.o
[ 31%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-amx.cpp.o
[ 31%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-aarch64.c.o
[ 32%] Linking CXX shared library libggml.so
[ 32%] Built target ggml
[ 32%] Building CXX object src/CMakeFiles/llama.dir/llama.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/llama-vocab.cpp.o
[ 33%] Building CXX object src/CMakeFiles/llama.dir/llama-grammar.cpp.o
[ 33%] Building CXX object src/CMakeFiles/llama.dir/llama-sampling.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/unicode.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/unicode-data.cpp.o
[ 34%] Linking CXX shared library libllama.so
[ 34%] Built target llama
[ 34%] Generating build details from Git
-- Found Git: /usr/bin/git (found version "2.34.1")
[ 34%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 34%] Built target build_info
[ 35%] Building CXX object common/CMakeFiles/common.dir/arg.cpp.o
[ 35%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 36%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 36%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 36%] Building CXX object common/CMakeFiles/common.dir/log.cpp.o
[ 37%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 37%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 38%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o
[ 38%] Linking CXX static library libcommon.a
[ 38%] Built target common
[ 38%] Building CXX object tests/CMakeFiles/test-tokenizer-0.dir/test-tokenizer-0.cpp.o
[ 39%] Linking CXX executable ../bin/test-tokenizer-0
[ 39%] Built target test-tokenizer-0
[ 39%] Building CXX object tests/CMakeFiles/test-tokenizer-1-bpe.dir/test-tokenizer-1-bpe.cpp.o
[ 39%] Linking CXX executable ../bin/test-tokenizer-1-bpe
[ 39%] Built target test-tokenizer-1-bpe
[ 40%] Building CXX object tests/CMakeFiles/test-tokenizer-1-spm.dir/test-tokenizer-1-spm.cpp.o
[ 40%] Linking CXX executable ../bin/test-tokenizer-1-spm
[ 40%] Built target test-tokenizer-1-spm
[ 40%] Building CXX object tests/CMakeFiles/test-log.dir/test-log.cpp.o
[ 40%] Building CXX object tests/CMakeFiles/test-log.dir/get-model.cpp.o
[ 41%] Linking CXX executable ../bin/test-log
[ 41%] Built target test-log
[ 41%] Building CXX object tests/CMakeFiles/test-arg-parser.dir/test-arg-parser.cpp.o
[ 42%] Building CXX object tests/CMakeFiles/test-arg-parser.dir/get-model.cpp.o
[ 42%] Linking CXX executable ../bin/test-arg-parser
[ 42%] Built target test-arg-parser
[ 42%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/test-quantize-fns.cpp.o
[ 43%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/get-model.cpp.o
[ 43%] Linking CXX executable ../bin/test-quantize-fns
[ 43%] Built target test-quantize-fns
[ 44%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/test-quantize-perf.cpp.o
[ 44%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/get-model.cpp.o
[ 44%] Linking CXX executable ../bin/test-quantize-perf
[ 44%] Built target test-quantize-perf
[ 44%] Building CXX object tests/CMakeFiles/test-sampling.dir/test-sampling.cpp.o
[ 44%] Building CXX object tests/CMakeFiles/test-sampling.dir/get-model.cpp.o
[ 45%] Linking CXX executable ../bin/test-sampling
[ 45%] Built target test-sampling
[ 46%] Building CXX object tests/CMakeFiles/test-chat-template.dir/test-chat-template.cpp.o
[ 46%] Building CXX object tests/CMakeFiles/test-chat-template.dir/get-model.cpp.o
[ 47%] Linking CXX executable ../bin/test-chat-template
[ 47%] Built target test-chat-template
[ 47%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/test-grammar-parser.cpp.o
[ 48%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/get-model.cpp.o
[ 48%] Linking CXX executable ../bin/test-grammar-parser
[ 48%] Built target test-grammar-parser
[ 49%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/test-llama-grammar.cpp.o
[ 49%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/get-model.cpp.o
[ 50%] Linking CXX executable ../bin/test-llama-grammar
[ 50%] Built target test-llama-grammar
[ 50%] Building CXX object tests/CMakeFiles/test-grammar-integration.dir/test-grammar-integration.cpp.o
[ 51%] Building CXX object tests/CMakeFiles/test-grammar-integration.dir/get-model.cpp.o
[ 51%] Linking CXX executable ../bin/test-grammar-integration
[ 51%] Built target test-grammar-integration
[ 51%] Building CXX object tests/CMakeFiles/test-grad0.dir/test-grad0.cpp.o
[ 51%] Building CXX object tests/CMakeFiles/test-grad0.dir/get-model.cpp.o
[ 52%] Linking CXX executable ../bin/test-grad0
[ 52%] Built target test-grad0
[ 53%] Building CXX object tests/CMakeFiles/test-barrier.dir/test-barrier.cpp.o
[ 53%] Building CXX object tests/CMakeFiles/test-barrier.dir/get-model.cpp.o
[ 54%] Linking CXX executable ../bin/test-barrier
[ 54%] Built target test-barrier
[ 55%] Building CXX object tests/CMakeFiles/test-backend-ops.dir/test-backend-ops.cpp.o
[ 55%] Building CXX object tests/CMakeFiles/test-backend-ops.dir/get-model.cpp.o
[ 55%] Linking CXX executable ../bin/test-backend-ops
[ 55%] Built target test-backend-ops
[ 56%] Building CXX object tests/CMakeFiles/test-rope.dir/test-rope.cpp.o
[ 56%] Building CXX object tests/CMakeFiles/test-rope.dir/get-model.cpp.o
[ 57%] Linking CXX executable ../bin/test-rope
[ 57%] Built target test-rope
[ 57%] Building CXX object tests/CMakeFiles/test-model-load-cancel.dir/test-model-load-cancel.cpp.o
[ 58%] Building CXX object tests/CMakeFiles/test-model-load-cancel.dir/get-model.cpp.o
[ 58%] Linking CXX executable ../bin/test-model-load-cancel
[ 58%] Built target test-model-load-cancel
[ 58%] Building CXX object tests/CMakeFiles/test-autorelease.dir/test-autorelease.cpp.o
[ 59%] Building CXX object tests/CMakeFiles/test-autorelease.dir/get-model.cpp.o
[ 59%] Linking CXX executable ../bin/test-autorelease
[ 59%] Built target test-autorelease
[ 60%] Building CXX object tests/CMakeFiles/test-json-schema-to-grammar.dir/test-json-schema-to-grammar.cpp.o
[ 60%] Building CXX object tests/CMakeFiles/test-json-schema-to-grammar.dir/get-model.cpp.o
[ 60%] Linking CXX executable ../bin/test-json-schema-to-grammar
[ 60%] Built target test-json-schema-to-grammar
[ 60%] Building C object tests/CMakeFiles/test-c.dir/test-c.c.o
[ 60%] Linking C executable ../bin/test-c
[ 60%] Built target test-c
[ 61%] Building CXX object examples/cvector-generator/CMakeFiles/llama-cvector-generator.dir/cvector-generator.cpp.o
[ 61%] Linking CXX executable ../../bin/llama-cvector-generator
[ 61%] Built target llama-cvector-generator
[ 62%] Building CXX object examples/baby-llama/CMakeFiles/llama-baby-llama.dir/baby-llama.cpp.o
[ 62%] Linking CXX executable ../../bin/llama-baby-llama
[ 62%] Built target llama-baby-llama
[ 62%] Building CXX object examples/batched-bench/CMakeFiles/llama-batched-bench.dir/batched-bench.cpp.o
[ 63%] Linking CXX executable ../../bin/llama-batched-bench
[ 63%] Built target llama-batched-bench
[ 64%] Building CXX object examples/batched/CMakeFiles/llama-batched.dir/batched.cpp.o
[ 64%] Linking CXX executable ../../bin/llama-batched
[ 64%] Built target llama-batched
[ 65%] Building CXX object examples/convert-llama2c-to-ggml/CMakeFiles/llama-convert-llama2c-to-ggml.dir/convert-llama2c-to-ggml.cpp.o
[ 65%] Linking CXX executable ../../bin/llama-convert-llama2c-to-ggml
[ 65%] Built target llama-convert-llama2c-to-ggml
[ 65%] Building CXX object examples/embedding/CMakeFiles/llama-embedding.dir/embedding.cpp.o
[ 66%] Linking CXX executable ../../bin/llama-embedding
[ 66%] Built target llama-embedding
[ 66%] Building CXX object examples/eval-callback/CMakeFiles/llama-eval-callback.dir/eval-callback.cpp.o
[ 67%] Linking CXX executable ../../bin/llama-eval-callback
[ 67%] Built target llama-eval-callback
[ 67%] Building CXX object examples/export-lora/CMakeFiles/llama-export-lora.dir/export-lora.cpp.o
[ 67%] Linking CXX executable ../../bin/llama-export-lora
[ 67%] Built target llama-export-lora
[ 68%] Building CXX object examples/gbnf-validator/CMakeFiles/llama-gbnf-validator.dir/gbnf-validator.cpp.o
[ 68%] Linking CXX executable ../../bin/llama-gbnf-validator
[ 68%] Built target llama-gbnf-validator
[ 69%] Building C object examples/gguf-hash/CMakeFiles/sha256.dir/deps/sha256/sha256.c.o
[ 69%] Built target sha256
[ 70%] Building C object examples/gguf-hash/CMakeFiles/xxhash.dir/deps/xxhash/xxhash.c.o
[ 70%] Built target xxhash
[ 70%] Building C object examples/gguf-hash/CMakeFiles/sha1.dir/deps/sha1/sha1.c.o
[ 70%] Built target sha1
[ 70%] Building CXX object examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/gguf-hash.cpp.o
[ 71%] Linking CXX executable ../../bin/llama-gguf-hash
[ 71%] Built target llama-gguf-hash
[ 71%] Building CXX object examples/gguf-split/CMakeFiles/llama-gguf-split.dir/gguf-split.cpp.o
[ 72%] Linking CXX executable ../../bin/llama-gguf-split
[ 72%] Built target llama-gguf-split
[ 73%] Building CXX object examples/gguf/CMakeFiles/llama-gguf.dir/gguf.cpp.o
[ 73%] Linking CXX executable ../../bin/llama-gguf
[ 73%] Built target llama-gguf
[ 73%] Building CXX object examples/gritlm/CMakeFiles/llama-gritlm.dir/gritlm.cpp.o
[ 73%] Linking CXX executable ../../bin/llama-gritlm
[ 73%] Built target llama-gritlm
[ 74%] Building CXX object examples/imatrix/CMakeFiles/llama-imatrix.dir/imatrix.cpp.o
[ 74%] Linking CXX executable ../../bin/llama-imatrix
[ 74%] Built target llama-imatrix
[ 75%] Building CXX object examples/infill/CMakeFiles/llama-infill.dir/infill.cpp.o
[ 75%] Linking CXX executable ../../bin/llama-infill
[ 75%] Built target llama-infill
[ 75%] Building CXX object examples/llama-bench/CMakeFiles/llama-bench.dir/llama-bench.cpp.o
[ 76%] Linking CXX executable ../../bin/llama-bench
[ 76%] Built target llama-bench
[ 77%] Building CXX object examples/llava/CMakeFiles/llava.dir/llava.cpp.o
[ 77%] Building CXX object examples/llava/CMakeFiles/llava.dir/clip.cpp.o
[ 77%] Built target llava
[ 77%] Linking CXX static library libllava_static.a
[ 77%] Built target llava_static
[ 78%] Linking CXX shared library libllava_shared.so
[ 78%] Built target llava_shared
[ 78%] Building CXX object examples/llava/CMakeFiles/llama-llava-cli.dir/llava-cli.cpp.o
[ 79%] Linking CXX executable ../../bin/llama-llava-cli
[ 79%] Built target llama-llava-cli
[ 79%] Building CXX object examples/llava/CMakeFiles/llama-minicpmv-cli.dir/minicpmv-cli.cpp.o
[ 80%] Linking CXX executable ../../bin/llama-minicpmv-cli
[ 80%] Built target llama-minicpmv-cli
[ 80%] Building CXX object examples/lookahead/CMakeFiles/llama-lookahead.dir/lookahead.cpp.o
[ 81%] Linking CXX executable ../../bin/llama-lookahead
[ 81%] Built target llama-lookahead
[ 81%] Building CXX object examples/lookup/CMakeFiles/llama-lookup.dir/lookup.cpp.o
[ 81%] Linking CXX executable ../../bin/llama-lookup
[ 81%] Built target llama-lookup
[ 82%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-create.dir/lookup-create.cpp.o
[ 82%] Linking CXX executable ../../bin/llama-lookup-create
[ 82%] Built target llama-lookup-create
[ 83%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-merge.dir/lookup-merge.cpp.o
[ 83%] Linking CXX executable ../../bin/llama-lookup-merge
[ 83%] Built target llama-lookup-merge
[ 83%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-stats.dir/lookup-stats.cpp.o
[ 84%] Linking CXX executable ../../bin/llama-lookup-stats
[ 84%] Built target llama-lookup-stats
[ 84%] Building CXX object examples/main/CMakeFiles/llama-cli.dir/main.cpp.o
[ 84%] Linking CXX executable ../../bin/llama-cli
[ 84%] Built target llama-cli
[ 84%] Building CXX object examples/parallel/CMakeFiles/llama-parallel.dir/parallel.cpp.o
[ 84%] Linking CXX executable ../../bin/llama-parallel
[ 84%] Built target llama-parallel
[ 85%] Building CXX object examples/passkey/CMakeFiles/llama-passkey.dir/passkey.cpp.o
[ 85%] Linking CXX executable ../../bin/llama-passkey
[ 85%] Built target llama-passkey
[ 86%] Building CXX object examples/perplexity/CMakeFiles/llama-perplexity.dir/perplexity.cpp.o
[ 86%] Linking CXX executable ../../bin/llama-perplexity
[ 86%] Built target llama-perplexity
[ 86%] Building CXX object examples/quantize-stats/CMakeFiles/llama-quantize-stats.dir/quantize-stats.cpp.o
[ 86%] Linking CXX executable ../../bin/llama-quantize-stats
[ 86%] Built target llama-quantize-stats
[ 86%] Building CXX object examples/quantize/CMakeFiles/llama-quantize.dir/quantize.cpp.o
[ 87%] Linking CXX executable ../../bin/llama-quantize
[ 87%] Built target llama-quantize
[ 88%] Building CXX object examples/retrieval/CMakeFiles/llama-retrieval.dir/retrieval.cpp.o
[ 88%] Linking CXX executable ../../bin/llama-retrieval
[ 88%] Built target llama-retrieval
[ 88%] Generating theme-snowstorm.css.hpp
[ 88%] Generating colorthemes.css.hpp
[ 89%] Generating completion.js.hpp
[ 89%] Generating index-new.html.hpp
[ 90%] Generating index.html.hpp
[ 90%] Generating index.js.hpp
[ 90%] Generating json-schema-to-grammar.mjs.hpp
[ 90%] Generating loading.html.hpp
[ 91%] Generating prompt-formats.js.hpp
[ 92%] Generating style.css.hpp
[ 92%] Generating system-prompts.js.hpp
[ 92%] Generating theme-beeninorder.css.hpp
[ 93%] Generating theme-ketivah.css.hpp
[ 93%] Generating theme-mangotango.css.hpp
[ 93%] Generating theme-playground.css.hpp
[ 94%] Generating theme-polarnight.css.hpp
[ 95%] Building CXX object examples/server/CMakeFiles/llama-server.dir/server.cpp.o
[ 95%] Linking CXX executable ../../bin/llama-server
[ 95%] Built target llama-server
[ 96%] Building CXX object examples/save-load-state/CMakeFiles/llama-save-load-state.dir/save-load-state.cpp.o
[ 96%] Linking CXX executable ../../bin/llama-save-load-state
[ 96%] Built target llama-save-load-state
[ 97%] Building CXX object examples/simple/CMakeFiles/llama-simple.dir/simple.cpp.o
[ 97%] Linking CXX executable ../../bin/llama-simple
[ 97%] Built target llama-simple
[ 97%] Building CXX object examples/speculative/CMakeFiles/llama-speculative.dir/speculative.cpp.o
[ 98%] Linking CXX executable ../../bin/llama-speculative
[ 98%] Built target llama-speculative
[ 98%] Building CXX object examples/tokenize/CMakeFiles/llama-tokenize.dir/tokenize.cpp.o
[ 99%] Linking CXX executable ../../bin/llama-tokenize
[ 99%] Built target llama-tokenize
[ 99%] Building CXX object pocs/vdot/CMakeFiles/llama-vdot.dir/vdot.cpp.o
[ 99%] Linking CXX executable ../../bin/llama-vdot
[ 99%] Built target llama-vdot
[ 99%] Building CXX object pocs/vdot/CMakeFiles/llama-q8dot.dir/q8dot.cpp.o
[100%] Linking CXX executable ../../bin/llama-q8dot
[100%] Built target llama-q8dot

$

After the build, you can find all the command-line tools in the llama.cpp/build/bin directory:

$ ls -l build/bin
total 30280
-rwxr-xr-x 1 wsluser wsluser  420688 Oct 21 14:41 llama-baby-llama
-rwxr-xr-x 1 wsluser wsluser  961096 Oct 21 14:41 llama-batched
-rwxr-xr-x 1 wsluser wsluser  961144 Oct 21 14:41 llama-batched-bench
-rwxr-xr-x 1 wsluser wsluser  487264 Oct 21 14:42 llama-bench
-rwxr-xr-x 1 wsluser wsluser  998336 Oct 21 14:42 llama-cli
-rwxr-xr-x 1 wsluser wsluser  366528 Oct 21 14:41 llama-convert-llama2c-to-ggml
-rwxr-xr-x 1 wsluser wsluser  994784 Oct 21 14:41 llama-cvector-generator
-rwxr-xr-x 1 wsluser wsluser  965616 Oct 21 14:41 llama-embedding
-rwxr-xr-x 1 wsluser wsluser  961504 Oct 21 14:41 llama-eval-callback
-rwxr-xr-x 1 wsluser wsluser  999344 Oct 21 14:41 llama-export-lora
-rwxr-xr-x 1 wsluser wsluser   28344 Oct 21 14:41 llama-gbnf-validator
-rwxr-xr-x 1 wsluser wsluser   28056 Oct 21 14:41 llama-gguf
-rwxr-xr-x 1 wsluser wsluser  103448 Oct 21 14:41 llama-gguf-hash
-rwxr-xr-x 1 wsluser wsluser   48064 Oct 21 14:41 llama-gguf-split
-rwxr-xr-x 1 wsluser wsluser  961832 Oct 21 14:41 llama-gritlm
-rwxr-xr-x 1 wsluser wsluser 1004344 Oct 21 14:42 llama-imatrix
-rwxr-xr-x 1 wsluser wsluser  984752 Oct 21 14:42 llama-infill
-rwxr-xr-x 1 wsluser wsluser 1253696 Oct 21 14:42 llama-llava-cli
-rwxr-xr-x 1 wsluser wsluser  966048 Oct 21 14:42 llama-lookahead
-rwxr-xr-x 1 wsluser wsluser  995376 Oct 21 14:42 llama-lookup
-rwxr-xr-x 1 wsluser wsluser  978248 Oct 21 14:42 llama-lookup-create
-rwxr-xr-x 1 wsluser wsluser   69792 Oct 21 14:42 llama-lookup-merge
-rwxr-xr-x 1 wsluser wsluser  987240 Oct 21 14:42 llama-lookup-stats
-rwxr-xr-x 1 wsluser wsluser 1249088 Oct 21 14:42 llama-minicpmv-cli
-rwxr-xr-x 1 wsluser wsluser  970280 Oct 21 14:42 llama-parallel
-rwxr-xr-x 1 wsluser wsluser  961376 Oct 21 14:42 llama-passkey
-rwxr-xr-x 1 wsluser wsluser 1059232 Oct 21 14:42 llama-perplexity
-rwxr-xr-x 1 wsluser wsluser   21184 Oct 21 14:43 llama-q8dot
-rwxr-xr-x 1 wsluser wsluser  359752 Oct 21 14:42 llama-quantize
-rwxr-xr-x 1 wsluser wsluser  213016 Oct 21 14:42 llama-quantize-stats
-rwxr-xr-x 1 wsluser wsluser  975176 Oct 21 14:42 llama-retrieval
-rwxr-xr-x 1 wsluser wsluser  961656 Oct 21 14:43 llama-save-load-state
-rwxr-xr-x 1 wsluser wsluser 1932464 Oct 21 14:42 llama-server
-rwxr-xr-x 1 wsluser wsluser   26824 Oct 21 14:43 llama-simple
-rwxr-xr-x 1 wsluser wsluser  989720 Oct 21 14:43 llama-speculative
-rwxr-xr-x 1 wsluser wsluser  337904 Oct 21 14:43 llama-tokenize
-rwxr-xr-x 1 wsluser wsluser   21768 Oct 21 14:43 llama-vdot
-rwxr-xr-x 1 wsluser wsluser  966504 Oct 21 14:41 test-arg-parser
-rwxr-xr-x 1 wsluser wsluser   18152 Oct 21 14:41 test-autorelease
-rwxr-xr-x 1 wsluser wsluser  380080 Oct 21 14:41 test-backend-ops
-rwxr-xr-x 1 wsluser wsluser   22088 Oct 21 14:41 test-barrier
-rwxr-xr-x 1 wsluser wsluser   15776 Oct 21 14:41 test-c
-rwxr-xr-x 1 wsluser wsluser  354536 Oct 21 14:41 test-chat-template
-rwxr-xr-x 1 wsluser wsluser   61080 Oct 21 14:41 test-grad0
-rwxr-xr-x 1 wsluser wsluser  602752 Oct 21 14:41 test-grammar-integration
-rwxr-xr-x 1 wsluser wsluser   41072 Oct 21 14:41 test-grammar-parser
-rwxr-xr-x 1 wsluser wsluser  596752 Oct 21 14:41 test-json-schema-to-grammar
-rwxr-xr-x 1 wsluser wsluser   46336 Oct 21 14:41 test-llama-grammar
-rwxr-xr-x 1 wsluser wsluser   34064 Oct 21 14:41 test-log
-rwxr-xr-x 1 wsluser wsluser   16512 Oct 21 14:41 test-model-load-cancel
-rwxr-xr-x 1 wsluser wsluser   17552 Oct 21 14:41 test-quantize-fns
-rwxr-xr-x 1 wsluser wsluser   41632 Oct 21 14:41 test-quantize-perf
-rwxr-xr-x 1 wsluser wsluser   17336 Oct 21 14:41 test-rope
-rwxr-xr-x 1 wsluser wsluser   45368 Oct 21 14:41 test-sampling
-rwxr-xr-x 1 wsluser wsluser  352920 Oct 21 14:41 test-tokenizer-0
-rwxr-xr-x 1 wsluser wsluser  330064 Oct 21 14:41 test-tokenizer-1-bpe
-rwxr-xr-x 1 wsluser wsluser  329840 Oct 21 14:41 test-tokenizer-1-spm

$ 
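Rather than scanning a long listing by eye, a small helper can confirm that the tools you care about were actually produced. The following is a minimal sketch, not part of Llama.cpp itself: the function name `check_bins` is hypothetical, and the binary names in the usage comment are simply taken from the listing above.

```shell
#!/usr/bin/env bash
# Report which of the expected binaries are present and executable.
# Usage: check_bins <bin-dir> <name>...
# (check_bins is a hypothetical helper name, not a Llama.cpp tool.)
check_bins() {
    local dir="$1"; shift
    local missing=0
    local name
    for name in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "ok       $name"
        else
            echo "MISSING  $name"
            missing=$((missing + 1))
        fi
    done
    # Return a non-zero status when at least one binary is absent.
    [ "$missing" -eq 0 ]
}

# Example call from the llama.cpp directory:
#   check_bins build/bin llama-cli llama-server llama-quantize
```

Because the function returns a non-zero status when something is missing, it can also be used as a guard in a setup script before moving on to the validation step.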

If you see errors during the compile and build process, write down the name of the file that failed. Most of the time, an error affects only that file or the model it relates to, not the whole installation.
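If the build output was saved to a file, the failing file names can be pulled out automatically instead of being noted by hand. The sketch below assumes GCC/Clang-style diagnostics of the form `path:line:col: error: message`; errors reported in other formats (for example by the CUDA compiler) would need a different pattern. The function name `failed_files` and the log file name `build.log` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the unique source files named in GCC/Clang-style error lines
# ("path:line:col: error: message") of a saved build log.
# Usage: failed_files <build-log>
# (failed_files is a hypothetical helper name.)
failed_files() {
    grep -E '^[^: ]+:[0-9]+:[0-9]+: (fatal )?error:' "$1" \
        | cut -d: -f1 \
        | sort -u
}

# Capturing the log is left to the reader. One possible way, assuming
# the build directory is named "build" (substitute the build command
# you actually ran):
#   cmake --build build 2>&1 | tee build.log
#   failed_files build.log
```

An empty result means no error lines in that format were found, which is not proof of a clean build; it is still worth checking the last lines of the log.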

What’s next?

Typically, the next step is to validate the installation. The link below covers not only the hello-world use case but also most common modern use cases.

If you are interested in building and installing Llama.cpp for a different environment, check out the links below:
