NVIDIA and Meta Collaborate on Advanced RAG Pipelines with Llama 3.1 and NeMo Retriever NIMs


Peter Zhang
Jul 24, 2024 03:50

NVIDIA and Meta introduce scalable agentic RAG pipelines with Llama 3.1 and NeMo Retriever NIMs, optimizing LLM performance and decision-making capabilities.


In a significant advancement for large language models (LLMs), NVIDIA and Meta have jointly introduced a new framework incorporating Llama 3.1 and NVIDIA NeMo Retriever NIMs, designed to enhance retrieval-augmented generation (RAG) pipelines. This collaboration aims to optimize LLM responses, ensuring they are current and accurate, according to the NVIDIA Technical Blog.

Enhancing RAG Pipelines

Retrieval-augmented generation (RAG) is a crucial strategy for preventing LLMs from generating outdated or incorrect responses. Various retrieval strategies, such as semantic search or graph retrieval, improve the recall of documents needed for accurate generation. However, there is no one-size-fits-all approach, and the retrieval pipeline must be customized according to specific data requirements and hyperparameters.

Modern RAG systems increasingly incorporate an agentic framework to handle reasoning, decision-making, and reflection on the retrieved data. An agentic system enables an LLM to reason through problems, create plans, and execute them using a set of tools.
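The reason-plan-execute loop described above can be sketched in plain Python. All names here are illustrative stand-ins, not part of any NVIDIA or Meta API; the planner is hard-coded where a real agent would call Llama 3.1:

```python
# Minimal sketch of an agentic loop: a tool registry, a planner, and an
# executor. The "LLM" is stubbed out as a hard-coded plan() function.
from typing import Callable, Dict, List

# Tool registry: the set of tools the agent may execute.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"top result for '{q}'",
    "calculator": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
}

def plan(question: str) -> List[dict]:
    """Stand-in planner. A real agent would ask the LLM to emit this plan."""
    if any(ch.isdigit() for ch in question):
        return [{"tool": "calculator", "input": question}]
    return [{"tool": "search", "input": question}]

def run_agent(question: str) -> str:
    # Execute each planned step with the chosen tool and collect observations.
    observations = []
    for step in plan(question):
        tool = TOOLS[step["tool"]]
        observations.append(tool(step["input"]))
    return "; ".join(observations)

print(run_agent("2+2"))            # routed to the calculator tool
print(run_agent("What is RAG?"))   # routed to the search tool
```

In a production system, the plan would be generated by the model's tool-calling output and each observation would be fed back into the conversation for the next reasoning step.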

Meta’s Llama 3.1 and NVIDIA NeMo Retriever NIMs

Meta’s Llama 3.1 family, spanning models from 8 billion to 405 billion parameters, is equipped with capabilities for agentic workloads. These models can break down tasks, act as central planners, and perform multi-step reasoning, all while maintaining model- and system-level safety checks.

NVIDIA has optimized the deployment of these models through its NeMo Retriever NIM microservices, providing enterprises with scalable software to customize their data-dependent RAG pipelines. The NeMo Retriever NIMs can be integrated into existing RAG pipelines and work with open-source LLM frameworks like LangChain or LlamaIndex.

LLMs and NIMs: A Powerful Duo

In a customizable agentic RAG, LLMs equipped with function-calling capabilities play a crucial role in decision-making on retrieved data, structured output generation, and tool calling. NeMo Retriever NIMs enhance this process by providing state-of-the-art text embedding and reranking capabilities.
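To make the embedding-and-reranking step concrete, here is a toy sketch in which cosine similarity over hand-made vectors stands in for the embedding and reranking models served by NeMo Retriever NIMs (the passages and vectors are fabricated for illustration):

```python
# Toy reranker: cosine similarity over hand-made embedding vectors.
# A real pipeline would obtain these vectors from an embedding NIM and
# refine the ordering with a reranking NIM.
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rerank(query_vec: List[float],
           passages: List[Tuple[str, List[float]]],
           top_k: int = 2) -> List[str]:
    """Return the top_k passage texts ranked by similarity to the query."""
    scored = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]

passages = [
    ("GPU inference guide", [0.9, 0.1]),
    ("Cooking recipes",     [0.1, 0.9]),
    ("CUDA kernels intro",  [0.8, 0.2]),
]
print(rerank([1.0, 0.0], passages))  # GPU-related passages rank first
```

The point of the second reranking pass in a real pipeline is that a cross-encoder reranker can score query-passage pairs more accurately than raw embedding similarity, at the cost of running only over a short candidate list.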

NVIDIA NeMo Retriever NIMs

NeMo Retriever microservices, packaged with NVIDIA Triton Inference Server and NVIDIA TensorRT, offer several benefits:


  • Scalable deployment: Seamlessly scale to meet user demands.
  • Flexible integration: Integrate into existing workflows and applications with ease.
  • Secure processing: Ensure data privacy and rigorous data protection.

Meta Llama 3.1 Tool Calling

Llama 3.1 models are designed with strong agentic capabilities, allowing LLMs to plan and select appropriate tools to solve complex problems. These models support OpenAI-style tool calling, producing structured outputs without the need for regex parsing.
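Because the tool call arrives as structured JSON rather than free text, the application can dispatch it directly. A minimal sketch follows; the message shape mirrors the OpenAI chat format, but the `get_weather` tool and its output are made up for illustration:

```python
import json

# A made-up tool the application exposes to the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# An OpenAI-style assistant message containing a tool call, as a model
# like Llama 3.1 can emit when served behind a compatible API.
message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": json.dumps({"city": "Paris"})},
    }],
}

def dispatch(msg: dict) -> list:
    """Execute each requested tool call and return tool-role result messages."""
    results = []
    for call in msg.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        kwargs = json.loads(call["function"]["arguments"])  # arguments are JSON, not prose
        results.append({"role": "tool",
                        "tool_call_id": call["id"],
                        "content": fn(**kwargs)})
    return results

print(dispatch(message)[0]["content"])  # Sunny in Paris
```

The tool-role results are then appended to the conversation so the model can compose its final answer from them.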

RAG with Agents

Agentic frameworks enhance RAG pipelines by adding layers of decision-making and self-reflection. These frameworks, such as self-RAG and corrective RAG, improve the quality of retrieved data and generated responses by ensuring post-generation verification and alignment with factual information.
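A corrective-RAG-style control loop can be sketched as follows. Retrieval, grading, and generation are all stubbed here; in a real pipeline each step would call a model or an index, and the fallback source names are purely illustrative:

```python
# Sketch of a corrective-RAG control loop: retrieve, grade the passages,
# and fall back to a secondary source when relevance is judged too low.
def retrieve(query: str, source: str) -> list:
    # Stub stores standing in for a vector index and a web search tool.
    stores = {
        "vector_db": ["Llama 3.1 supports tool calling."],
        "web_search": ["Fresh web snippet about the query."],
    }
    return stores[source]

def grade(query: str, passages: list) -> bool:
    """Relevance grader stub: passes if any passage shares a word with the query.
    A real grader would ask an LLM for a relevance judgment."""
    words = set(query.lower().split())
    return any(words & set(p.lower().split()) for p in passages)

def corrective_rag(query: str) -> str:
    passages = retrieve(query, "vector_db")
    if not grade(query, passages):
        # Corrective step: first retrieval judged irrelevant, so fall back.
        passages = retrieve(query, "web_search")
    return "Answer grounded in: " + passages[0]

print(corrective_rag("llama tool calling"))   # vector store suffices
print(corrective_rag("stock prices today"))   # triggers the fallback
```

Self-RAG adds a similar check after generation, verifying that the answer is actually supported by the retrieved passages before returning it.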

Architecture and Node Specifications

Multi-agent frameworks like LangGraph allow developers to group LLM application-level logic into nodes and edges, offering finer control over agentic decision-making. Noteworthy nodes include:


  • Query decomposer: Breaks down complex questions into smaller logical parts.
  • Router: Decides the source of document retrieval or handles responses.
  • Retriever: Implements the core RAG pipeline, often combining semantic and keyword search methods.
  • Grader: Checks the relevance of retrieved passages.
  • Hallucination checker: Verifies the factual accuracy of generated content.

Additional tools can be integrated based on specific use cases, such as financial calculators for answering trend- or growth-related questions.
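The node-and-edge structure above can be sketched as a plain-Python state graph. LangGraph's actual API differs (it builds graphs with typed state and compiled edges); this only illustrates the control flow, and every node body is a stub:

```python
# Plain-Python sketch of a LangGraph-style agent graph: each node takes
# the shared state dict and returns it updated; a router edge picks the
# next node based on the current state.
def decompose(state: dict) -> dict:
    # Query decomposer: split a compound question into parts.
    state["subquestions"] = [q.strip() for q in state["question"].split(" and ")]
    return state

def route(state: dict) -> str:
    # Router edge: numeric questions go to the calculator tool,
    # everything else to the retriever.
    return "calculator" if any(c.isdigit() for c in state["question"]) else "retriever"

def retriever(state: dict) -> dict:
    state["docs"] = [f"doc for: {q}" for q in state["subquestions"]]
    return state

def calculator(state: dict) -> dict:
    state["docs"] = ["computed result"]
    return state

def grader(state: dict) -> dict:
    # Grader node: a real one would ask an LLM to judge relevance.
    state["relevant"] = len(state["docs"]) > 0
    return state

def run_graph(question: str) -> dict:
    state = decompose({"question": question})
    state = {"calculator": calculator, "retriever": retriever}[route(state)](state)
    return grader(state)

print(run_graph("What is RAG and how do agents help?"))
```

Keeping each responsibility in its own node is what gives the developer the finer-grained control mentioned above: any single node (the grader, say) can be swapped or re-prompted without touching the rest of the graph.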

Getting Started

Developers can access NeMo Retriever embedding and reranking NIM microservices, along with Llama 3.1 NIMs, on NVIDIA’s AI platform. A detailed implementation guide is available in NVIDIA’s developer Jupyter notebook.

Image source: Shutterstock
