Rust’s advanced concurrency features provide excellent tooling for efficient, safe multi-threaded programming, helping your server maximize LLM request throughput.</li><li><b>Web Ecosystem:</b> Although Rust is newer than languages like Python and JavaScript, its web development ecosystem is growing rapidly. Frameworks like Actix Web and Rocket offer mature solutions for building high-performance REST APIs.</li><li><b>Cross-Platform Compatibility:</b> Rust applications compile for all major operating systems (Windows, Linux, macOS, etc.), which is a major advantage when deploying to varied environments.</li></ol><h1 id="0df0">Let’s set the stage</h1><figure id="d2d7"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*AcpA4MkKboaPY0ONHHsN2g.jpeg"><figcaption></figcaption></figure><p id="4fe8">There are a few primary ways to interact with LLMs from Rust programs:</p><ol><li><b>API Clients:</b> Many LLM services provide REST APIs. Rust offers excellent HTTP client libraries, such as <code>reqwest</code>, for communicating with these APIs.</li><li><b>Model Hosting:</b> If you need low-latency or offline access, consider hosting language models directly within your Rust server. Rust bindings exist for inference runtimes such as ONNX Runtime, allowing you to load and execute models locally.</li><li><b>Hybrid Approaches:</b> In some cases, a combination of the above approaches might be optimal. 
Your Rust server could call an external API for larger, more computationally intensive LLMs while hosting smaller models locally for real-time tasks.</li></ol><h1 id="39fa">Our approach</h1><figure id="5b02"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*yxk1qTnQ9WfTTwKl-p20lg.jpeg"><figcaption></figcaption></figure><p id="c74d">In this design brainstorming session, we’ll outline the conceptual framework and key components for building a Rust-based REST server that serves language model (LM) requests efficiently. Our goal is a scalable, performant server architecture that handles LM-related functionality such as chat interactions, health checks, and version information retrieval.</p><h1 id="c75f">Problem Definition</h1><p id="8384"><b>Goal:</b> Establish a clear objective for our server. Possibilities include:</p><ul><li>Providing a central point of access and control for one or more large language models.</li><li>Offering an API layer that lets other applications easily leverage LLM capabilities.</li><li>Abstracting away platform-specific LLM details behind a simple REST interface.</li></ul><h1 id="93e8">Target Users</h1><p id="822d">Who are we building this server for?</p><ul><li>Developers building LLM-powered applications.</li><li>Data scientists conducting experiments with LLMs.</li><li>Internal services within an organization that need LLM functionality.</li></ul><h1 id="40ab">Design Thinking for a Rust LLM REST Server</h1><ol><li>Project Structure:</li></ol><p id="692a">We’ll start by defining the overall project structure, including modules, dependencies, and project organization. This involves setting up a Cargo-based project with the dependencies needed for handling HTTP requests, JSON serialization, and any required LM-related functionality.</p><p id="08d8">2. Endpoint Design:</p><p id="347b">Next, we’ll design the REST API endpoints that our server will expose. Key endpoints may include:</p><ul><li><code>/api/query</code>: Endpoint for handling chat interactions with the language model.</li><li><code>/api/health</code>: Endpoint for health checks that confirm the server is running correctly.</li><li><code>/api/app/version</code>: Endpoint for retrieving the server application’s version information.</li></ul><p id="7e21">Each endpoint will have specific request/response formats and logic for handling incoming requests and generating appropriate responses.</p><p id="5660">3. Language Model Integration:</p><p id="cfb7">We’ll integrate language model functionality into our server to handle chat interactions. This may involve leveraging existing LM libraries or implementing custom logic to interact with the LM backend.</p><p id="7166">4. Error Handling:</p><p id="51e4">Error handling is crucial for our server’s reliability. We’ll design robust error handling mechanisms that fail gracefully and return meaningful error responses to clients.</p><p id="4375">5. Concurrency and Performance:</p><p id="2b52">We’ll leverage Rust’s concurrency features so the server can handle multiple requests concurrently without compromising performance or safety, making efficient use of system resources and minimizing latency.</p><p id="b4f6">6. 
Configuration and Deployment:</p><p id="ee30">We’ll design our server to be configurable and deployable in various environments. This involves defining configuration options for server settings such as port number, log levels, and any other relevant parameters.</p><p id="e1e2">7. Testing and Quality Assurance:</p><p id="0a99">Comprehensive testing will be an integral part of our design process. We’ll plan for unit tests, integration tests, and possibly end-to-end tests to ensure the reliability and correctness of our server implementation.</p><p id="647b">Conclusion:</p><p id="860d">This design brainstorming session provides a high-level overview of the key components and considerations involved in building a Rust-based REST server for serving Language Model requests. By carefully planning and designing our server architecture, we can create a robust and scalable platform for handling LM interactions effectively.</p></article></body>

Migrating Pinterest profiles to React

Imad Elyafi | Pinterest engineer, Core Experience

Since 2012, we’ve scaled our web framework Denzel (named after the greatest actor of all time) on top of Backbone. But nowadays, React is the gold standard: it has a large developer community and enables excellent engineering velocity and performance. Here we’ll look at the techniques we tried and the challenges we faced while migrating to React, starting with Pinner profiles.

Preparing infrastructure (server-side Nunjucks)

We frequently ship new features and run hundreds of experiments every day, so we couldn’t freeze product development to rebuild our whole website in React. While it’s relatively easy to build a new web app in React, migrating a service that’s constantly changing and used by millions of people is a much more complicated challenge. It’s like changing an airplane’s engines mid-flight.

Previously, we used the Jinja templating engine for server-side rendering in Python, and its JavaScript equivalent, Nunjucks, for client-side rendering in the browser. These templating engines are very similar and allow us to have universal rendering (the same templates on both client and server). Since React can’t be rendered in Python, our very first step was to enable Nunjucks rendering on a stand-alone NodeJS server. Now we have pure isomorphic rendering, with JavaScript on both the server and the client.
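To make the isomorphic setup concrete, here is a minimal sketch in which one render function produces identical HTML on a Node server and in the browser. It is dependency-free, so the small `{{ key }}` interpolator is a hypothetical stand-in for Nunjucks. The names `renderProfile`, `startServer`, `SAMPLE_PROFILE` and the `app` element id are illustrative and are not taken from Pinterest's code.

```javascript
// Hypothetical stand-in for a real template engine such as Nunjucks:
// replaces {{ key }} placeholders with HTML-escaped values from `context`.
function renderTemplate(template, context) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    escapeHtml(String(context[key] ?? ''))
  );
}

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// One template and one render function, shared by server and client.
const PROFILE_TEMPLATE = '<h1>{{ username }}</h1><p>{{ bio }}</p>';
const renderProfile = (data) => renderTemplate(PROFILE_TEMPLATE, data);

// Illustrative data; a real page would load it from an API.
const SAMPLE_PROFILE = { username: 'pinner', bio: 'Collecting ideas' };

// Server side (Node): respond with the fully rendered first paint.
function startServer(port) {
  const http = require('http');
  return http
    .createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(`<div id="app">${renderProfile(SAMPLE_PROFILE)}</div>`);
    })
    .listen(port);
}

// Client side (browser): re-render with the very same function.
if (typeof document !== 'undefined') {
  document.getElementById('app').innerHTML = renderProfile(SAMPLE_PROFILE);
}
```

In Node, `startServer(3000)` would serve the same markup that the browser branch reproduces later. In the setup described above, Nunjucks itself plays the role of `renderTemplate`.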

Denzel-React bridge

To empower engineers to incrementally convert core parts of our UI to React, we enabled React rendering inside Denzel. Because React.render() can be used in most cases, we added React-specific bindings to Nunjucks’ templating language with a new keyword, component, which represents the “bridge” between Denzel and React. Combined with our A/B testing framework, this lets us easily measure React components against legacy Denzel components on metrics such as Pinner wait time, time-to-first-byte and error rate.

Here’s an example of rendering MyReactComponent.js in Nunjucks templates:

{% if in_react %}
  {{ component('MyReactComponent', {pinId: '123'}) }}
{% else %}
  {{ module('MyDenzelComponent', pinId='123') }}
{% endif %}
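How the `component` keyword and the `in_react` flag are implemented is not described here. One common way to build such a bridge is a template global that renders the named component to HTML on the server and wraps it in a marker element carrying its props, so that client-side code can find the element and mount the real component there. The sketch below follows that approach under stated assumptions. To stay dependency-free, components are plain functions that return HTML strings instead of React components. `registerComponent`, `component`, the `data-react-component` attribute and the hash-based `inReactExperiment` bucketing are all hypothetical and are not Denzel's or Pinterest's A/B framework API.

```javascript
// Hypothetical registry mapping component names to render functions.
// A real bridge would hold React components and render them with
// ReactDOMServer; plain (props) => html functions keep this runnable.
const registry = new Map();

function registerComponent(name, render) {
  registry.set(name, render);
}

// Escape text for safe use inside HTML text or a double-quoted attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Template-facing helper, i.e. {{ component('Name', {...}) }}.
// Emits server-rendered markup inside a marker element that carries the
// component name and its serialized props, so client-side code can find
// the element and mount the same component with the same props.
function component(name, props) {
  const render = registry.get(name);
  if (!render) {
    throw new Error(`Unknown component: ${name}`);
  }
  const serializedProps = escapeHtml(JSON.stringify(props));
  return (
    `<div data-react-component="${escapeHtml(name)}" ` +
    `data-props="${serializedProps}">${render(props)}</div>`
  );
}

// Hypothetical deterministic bucketing for the `in_react` flag: a given
// user always lands in the same group, so the React and Denzel variants
// can be compared on the same metrics.
function inReactExperiment(userId, percentInReact) {
  let hash = 0;
  for (const ch of String(userId)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep a 32-bit unsigned int
  }
  return hash % 100 < percentInReact;
}

registerComponent('MyReactComponent', (props) =>
  `<span>Pin ${escapeHtml(props.pinId)}</span>`
);
```

If this were registered with Nunjucks (for example via `env.addGlobal('component', component)`), the returned string would also need to be marked safe when autoescaping is enabled. On the client, code would then select the `[data-react-component]` elements, parse `data-props`, and hydrate each one with the matching React component.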

Higher order components for adapters

The last step is to supply data to the newly created React components. We created a Higher Order Component (HOC) called withResource to easily fetch data from our API while remaining composable with other HOCs. A simplified version of withResource provides a data prop to the wrapped component:

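import React, { Component } from 'react';
import ResourceFactory from 'ResourceFactory';

// Assumed helper: `options` may be a plain object or a function of props.
const getOptions = (options, props) =>
  typeof options === 'function' ? options(props) : options;

const withResource = ({ name, options }) => (Subject) => {
  return class extends Component {

    state = {
      data: null
    };

    componentDidMount() {
      this.mounted = true;
      const resourceOptions = getOptions(options, this.props);
      const resource = ResourceFactory.create(name, resourceOptions);

      resource.callGet().then((resourceResponse) => {
        // Skip setState if the component unmounted while the request was in flight.
        if (this.mounted) {
          this.setState({ data: resourceResponse.data });
        }
      });
    }

    componentWillUnmount() {
      this.mounted = false;
    }

    render() {
      // `data` stays null until the request resolves.
      return (
        <Subject
          data={this.state.data}
          {...this.props}
        />
      );
    }
  };
};

export default withResource;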

Example

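import React, { Component } from 'react';
import withResource from 'withResource';

class MyComponent extends Component {
  render() {
    const { data } = this.props;
    // `data` is null until withResource's request resolves.
    if (!data) {
      return null;
    }
    return (
      <div>{'Username: ' + data.name}</div>
    );
  }
}

export default withResource({
  name: 'MyResource',
  options: props => {
    return { pin_id: props.pinId };
  },
})(MyComponent);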
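`withResource` stays composable with other HOCs because every HOC has the same shape: it takes a component and returns a component. That means HOCs can be chained with an ordinary function-composition helper. The framework-free sketch below models components as functions from props to a string. `compose`, `withDefaultProps`, `withStaticData` (a synchronous stand-in for `withResource`) and the sample data are all hypothetical.

```javascript
// compose(f, g)(C) === f(g(C)): the first HOC listed ends up outermost.
const compose = (...hocs) => (Base) =>
  hocs.reduceRight((wrapped, hoc) => hoc(wrapped), Base);

// Hypothetical HOC: fills in props the caller did not pass.
const withDefaultProps = (defaults) => (Subject) => (props) =>
  Subject({ ...defaults, ...props });

// Hypothetical synchronous stand-in for withResource: derives a lookup
// from props and hands the result to Subject as the `data` prop.
const withStaticData = (fetchData) => (Subject) => (props) =>
  Subject({ ...props, data: fetchData(props) });

// In this sketch a "component" is simply a function from props to a string.
const Profile = (props) =>
  props.data ? `Username: ${props.data.name}` : 'Loading...';

// Illustrative data keyed by pin id.
const fakeUsers = { '123': { name: 'pinner' } };

const EnhancedProfile = compose(
  withDefaultProps({ pinId: '123' }),
  withStaticData((props) => fakeUsers[props.pinId])
)(Profile);

console.log(EnhancedProfile({})); // prints "Username: pinner"
```

Order matters. Because `withDefaultProps` is listed first, it wraps everything else and fills in `pinId` before `withStaticData` reads it. React HOCs chain the same way, since each one maps a component to a component.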

Results

In converting Pinner profiles to React, we’ve seen consistent improvements, with performance and engagement metrics each increasing by 20 percent. Throughout the conversion, web product engineers kept making changes to old Denzel components while simultaneously creating new React components.

Acknowledgements: The core contributors to the project were Imad Elyafi, Braden Anderson, Chris Lloyd, Kevin Grandon, Jessica Chan along with the rest of the WebCore Experience team. A number of engineers across the company also provided helpful feedback.