Tag: Llama

Guide for Running Llama 2 Using LLAMA.CPP on AWS Fargate

Llama 2 is a new family of open-source large language models released by Meta (more on that here: https://ai.meta.com/llama/) that has become an industry standard for self-hosted LLM use cases. LLAMA.CPP is an open-source framework focused on running Llama models on C...
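
For context, a minimal sketch of what invoking such a model looks like through the llama-cpp-python bindings; the model path is a placeholder, and in a Fargate deployment the thread count would typically match the task's vCPU allocation:

```python
# Minimal sketch: running a GGML-quantized Llama 2 chat model on CPU
# with llama-cpp-python. Any CPU-quantized Llama 2 checkpoint works here.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,    # context window size
    n_threads=4,   # match the vCPUs allocated to the container/task
)

output = llm(
    "Q: What is AWS Fargate? A:",
    max_tokens=128,
    stop=["Q:"],   # stop before the model starts a new question
)
print(output["choices"][0]["text"])
```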

Run Llama 2 on Your CPU with Rust

A new one-file Rust implementation of Llama 2 is now available thanks to Sasha Rush. It’s a Rust port of Karpathy’s llama2.c. It already supports the following features: 4-bit GPT-Q quantization, SIMD support for fast CPU inference, and support for Grouped ...

Fine-tune Llama 2 on Your Computer with QLoRa and TRL

Llama 2 is a state-of-the-art large language model (LLM) released by Meta. In the paper presenting the model, Llama 2 demonstrates impressive capabilities on public benchmarks for various natural language generation and coding tasks. Meta also released Chat versions of Llama 2. These chat mode...
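
As a rough illustration of the QLoRa recipe such a fine-tune is built on, here is a minimal sketch using the 2023-era transformers/PEFT/TRL APIs; the dataset and hyperparameters below are illustrative assumptions, not the article's exact configuration:

```python
# Sketch of QLoRa fine-tuning: load the base model with 4-bit weights,
# attach LoRA adapters, and train with TRL's SFTTrainer.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # gated repo; requires access approval

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRa: 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 for stability
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

# Example instruction dataset commonly used in QLoRa demos (an assumption here).
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",   # column holding the training text
    max_seq_length=512,
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="llama2-qlora",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
)
trainer.train()
```

Only the small LoRA adapter weights are updated during training, which is what lets a 7B model fit on a single consumer GPU.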

Running Llama 2 on CPU Inference Locally for Document Q&A

Third-party commercial large language model (LLM) providers like OpenAI’s GPT-4 have democratized LLM use via simple API calls. However, teams may still require self-managed or private deployment for model inference within enterprise perimeters due to various reasons around data privacy and com...

Fine-tuning Llama 2 for news category prediction: A step-by-step comprehensive guide to fine-tuning any LLM (Part 1)

In this blog, I will guide you through the process of fine-tuning Meta’s Llama 2 7B model for news article categorization across 18 different categories. I will utilize a news classification instruction dataset that I previously created using GPT-3.5. If you’re interested ...
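
To make the setup concrete, here is a hypothetical sketch of turning labeled news articles into instruction examples; the field names, prompt template, and category list are assumptions, not the author's exact dataset schema:

```python
# Hypothetical sketch: convert a labeled news example into an
# instruction-formatted training string for supervised fine-tuning.
CATEGORIES = ["politics", "sports", "technology"]  # the guide uses 18 categories

def to_instruction(example: dict) -> dict:
    prompt = (
        "Categorize the news article below into one of the following "
        f"categories: {', '.join(CATEGORIES)}.\n\n"
        f"Article: {example['text']}\n\nCategory:"
    )
    # The target label is appended so the model learns to emit it.
    return {"text": f"{prompt} {example['label']}"}

sample = {"text": "The central bank raised interest rates today...",
          "label": "politics"}
print(to_instruction(sample)["text"])
```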

Using LLaMA 2.0, FAISS and LangChain for Question-Answering on Your Own Data

Over the past few weeks, I have been playing around with several large language models (LLMs) and exploring their potential with all sorts of methods available on the internet. Now it’s time for me to share what I have learned so far! I was super excited to learn that Meta released the n...
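
A minimal sketch of the FAISS-plus-LangChain retrieval pattern the title describes, using the 2023-era LangChain import paths; the embedding model, GGML checkpoint path, and sample texts are placeholders:

```python
# Sketch of retrieval-augmented QA: embed texts, index them with FAISS,
# and answer questions with a local Llama 2 model via LangChain.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA

texts = [
    "Llama 2 was released by Meta in July 2023.",
    "FAISS is a library for efficient similarity search.",
]

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
db = FAISS.from_texts(texts, embeddings)  # build the vector index

llm = LlamaCpp(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin")  # placeholder

qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=db.as_retriever()
)
print(qa.run("Who released Llama 2?"))
```

The "stuff" chain simply concatenates the retrieved chunks into the prompt, which is the simplest way to ground the model's answer in your own data.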

Fine-Tuning a Llama-2 7B Model for Python Code Generation

About two weeks ago, the world of generative AI was shaken by Meta's release of the new Llama-2 model. Its predecessor, Llama-1, was a turning point in the LLM industry: the release of its weights, along with new fine-tuning techniques, led to a massive creation of open-s...

Quantize Llama models with GGML and llama.cpp

Due to the massive size of Large Language Models (LLMs), quantization has become an essential technique to run them efficiently. By reducing the precision of their weights, you can save memory and speed up inference while preserving most of the model’s performance. Recently, 8-bit and 4-bit qu...
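
The memory savings are easy to estimate: weight memory is roughly the parameter count times bits per weight, divided by eight. A back-of-the-envelope sketch (figures are approximate and ignore activation memory and per-block quantization scales):

```python
# Approximate weight-memory footprint of a 7B-parameter model at
# different precisions: params * bits_per_weight / 8 bytes.
PARAMS_7B = 7e9

for bits, name in [(16, "fp16"), (8, "q8_0"), (4, "q4_0")]:
    gib = PARAMS_7B * bits / 8 / 2**30
    print(f"{name:>5}: ~{gib:.1f} GiB for a 7B model")
# fp16: ~13.0 GiB, q8_0: ~6.5 GiB, q4_0: ~3.3 GiB (plus scale overhead)
```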