Do you need to fine-tune large language models for semantic search?
Fine-tuning is expensive. Do you really need it for simple question answering?
Many people ask how to train a model on their own corpus of data so that they can ask it questions, and they assume fine-tuning is the way to go. However, fine-tuning is a type of transfer learning used to teach a model a new task, not new information. Semantic search, on the other hand, uses semantic embeddings, which are numerical representations of the meaning of a text, to search a database by context and topic.
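To make the idea of searching by embedding concrete, here is a minimal sketch in Python. It assumes a toy bag-of-words vector as a stand-in for a real neural embedding, so it only matches shared words rather than true meaning; the names embed, cosine, and search and the sample documents are all hypothetical. The mechanics are the same as in a real system: embed the documents once, embed the query, and rank by similarity.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a neural embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical corpus; each document is embedded once, up front.
docs = [
    "how to reset your account password",
    "shipping times for international orders",
    "refund policy for damaged items",
]
index = [(doc, embed(doc)) for doc in docs]


def search(query, k=1):
    """Return the k documents whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


print(search("reset password"))
```

With a real embedding model in place of embed, a query such as "forgot my login" could also land on the password document even though it shares no words with it, which is the point of searching by meaning.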
Semantic search is much cheaper, faster, and more straightforward than fine-tuning a model for question answering (QA). The only similarity between the two is that both rely on semantic embeddings; beyond that, they are entirely different technologies. The biggest misconception about fine-tuning is that it can teach a model new information, so that a single model can then perform QA on its own. That is not how transfer learning works.
Fine-tuning typically adjusts only a small portion of a model in order to adapt it to a new task. It does not retrain the entire model. Transfer learning means applying a previously learned skill in a different context, the way a person who has learned to tie shoes or stack blocks can reuse that skill elsewhere. Fine-tuning therefore gives the model a new task, not new information. To take a large language model "back to school" and teach it new knowledge, the entire model needs to be unfrozen, which is expensive and still does not solve the problems of confabulation and hallucination.
Fine-tuning is more challenging than prompt engineering and can be 100 or even 10,000 times more difficult to execute. Semantic search, on the other hand, is an easier and cheaper process that retrieves exact information from a database or index. It scales to very large corpora, unlike fine-tuning, where the cost goes up as the amount of data increases. Fine-tuning is not suitable for QA on its own, but it can be used in conjunction with semantic search for specific tasks. How much information to share remains an open question in the field of AI alignment, because of the risk that bad actors will use it for nefarious purposes. It is therefore essential to think carefully before sharing information.
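The scaling point can be illustrated with a small sketch. It assumes a hypothetical in-memory SemanticIndex class and again uses bag-of-words vectors as a stand-in for real embeddings. Adding new information is a single append to the index, with no training step, and the new document is searchable immediately.

```python
import math
from collections import Counter


class SemanticIndex:
    """Tiny in-memory index; bag-of-words vectors stand in for real embeddings."""

    def __init__(self):
        self.entries = []  # list of (text, vector) pairs

    @staticmethod
    def _embed(text):
        return Counter(text.lower().split())

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def add(self, text):
        """Adding knowledge is one embedding call and one append."""
        self.entries.append((text, self._embed(text)))

    def best_match(self, query):
        """Return the closest document, or None if nothing overlaps at all."""
        q = self._embed(query)
        scored = [(self._cosine(q, vec), text) for text, vec in self.entries]
        score, text = max(scored, default=(0.0, None))
        return text if score > 0 else None


index = SemanticIndex()
index.add("office hours are nine to five")
before = index.best_match("holiday schedule")  # nothing relevant indexed yet

index.add("the holiday schedule is posted in december")
after = index.best_match("holiday schedule")  # the new document is found

print(before, "|", after)
```

A production system would swap the list for a vector database and the toy vectors for model embeddings, but the cost of adding one more document stays roughly the cost of embedding that document.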
Fine-tuning teaches a model a pattern rather than new information. It is valuable for teaching a new task, especially a pattern-based one. Question answering over your own data is better handled as a pipeline: index the corpus with semantic embeddings to make it searchable, use a large language model (LLM) to generate relevant search terms or queries, and embed those queries to find the desired information. The semantic search engine matches your query against the index and pulls the most relevant documents. The LLM then reads and summarizes the necessary information from those documents. Finally, the relevant information is compiled into the answer you need.
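The steps above can be sketched roughly as follows. This is only an outline under stated assumptions: embed and llm are passed in as plain callables, and the toy_embed and toy_llm stand-ins (along with VOCAB, the sample corpus, and the prompt wording) are hypothetical placeholders so that the example runs without any external service. In practice they would be replaced by a real embedding model and a real LLM client.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(corpus, embed):
    """Step 1: embed every document once so the corpus is searchable."""
    return [(doc, embed(doc)) for doc in corpus]


def retrieve(query, index, embed, k):
    """Step 3: rank documents by similarity to the embedded query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def answer(question, index, embed, llm, k=2):
    # Step 2: let the LLM turn the question into a search query.
    query = llm(f"Write a short search query for: {question}")
    hits = retrieve(query, index, embed, k)
    # Step 4: let the LLM summarize each retrieved document.
    notes = [llm(f"Summarize what this says about '{question}':\n{doc}") for doc in hits]
    # Step 5: compile the notes into a final answer.
    return llm(
        "Answer the question using only these notes.\n"
        f"Question: {question}\nNotes:\n" + "\n".join(notes)
    )


# --- Toy stand-ins (assumptions, not real models) ---
VOCAB = ["refund", "policy", "shipping", "password", "reset", "days"]


def toy_embed(text):
    """Count occurrences of each vocabulary word."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]


def toy_llm(prompt):
    """Fake LLM: echoes the last line of the prompt."""
    return prompt.splitlines()[-1]


corpus = [
    "the refund policy covers damaged items for 14 days",
    "shipping takes 5 days for international orders",
    "reset your password from the account page",
]
index = build_index(corpus, toy_embed)
result = answer("what is the refund policy", index, toy_embed, toy_llm, k=1)
print(result)
```

Note that no model weights change anywhere in this flow. The knowledge lives in the index, and the LLM only does what it is already good at: writing queries, summarizing, and composing an answer.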
Fine-tuning for tasks such as classification and sentiment analysis is old-school NLP, and reaching for it by default overlooks cheaper and simpler methods for those tasks. In summary, fine-tuning can be effective for specific NLP tasks, but it is not the best option for every task, and more efficient approaches often exist. Fine-tuning teaches a model a new task without necessarily giving it new information, much like tuning a guitar to improve how it performs. Semantic search, by contrast, uses semantic (neural) embeddings to represent the meaning of text and searches by context and topic, which makes database searches faster and cheaper.
For question answering, humans typically start with a question, search for relevant data, compile it, extract the salient bits, and finally produce an answer. To use a library analogy, the Dewey Decimal System is like a human-readable semantic embedding that helps you find a book in the library. A question is matched to a Dewey Decimal number, and books are retrieved based on that number. This is essentially a semantic search: a large amount of data is filtered down to the specific subset of information you need. The index within each book then lets you pinpoint the necessary passage. A machine can follow a similar process: semantic embeddings make the documents searchable, and a large language model generates the relevant search terms. The result is a more efficient way of retrieving the necessary information.
As a junior data scientist, it's important to know how to use your semantic search engine effectively to find the most relevant documents for a given query. This involves matching your query against the index, much as you would look up a Dewey Decimal number, to pull the most relevant resources, and then using the LLM to quickly read and summarize the most important parts of those documents. Once you have compiled all the relevant information, you can formulate your answer.
Credits: David Shapiro






