The provided content details the creation of Maia, an AI assistant for Microsoft Word, which leverages Go, TypeScript, and OpenAI APIs to enhance writing capabilities within the Word environment.
Abstract
The article outlines the development of Maia, an AI writing assistant designed to integrate with Microsoft Word. The assistant utilizes Go for the API proxy, TypeScript for the Word Add-in, and OpenAI's APIs for text and image generation. Maia is capable of completing sentences, translating and summarizing paragraphs, and generating images based on text prompts, all within the Word interface. The project is a response to the current AI hype and aims to simplify and streamline the writing process by embedding AI functionalities directly into Word. The author provides a comprehensive guide on how to create the API proxy, develop the Office Add-in, and deploy the system for use, emphasizing the ease of use and potential for customization.
Opinions
The author expresses enthusiasm about the potential of AI, particularly in the context of writing assistance.
There is a note of caution regarding the current AI hype, comparing it to the dotcom era and questioning its sustainability.
The author demonstrates a preference for simplicity in coding, opting not to use JSON for the API proxy due to the perceived complexity.
The author values the power and flexibility of Office Add-ins over traditional methods like VBA or WordBasic.
There is a pragmatic approach to CORS, enabling it during development and restricting it in production for security purposes.
The author shows a preference for certain tools and platforms, such as Digital Ocean App Platform for deployment and Replicate.com for exploring alternative image generation APIs.
The author suggests a personal commitment to further development, considering the integration of other APIs and the possibility of fine-tuning the model for customized functionality.
There is an acknowledgment of the limitations of the current image generation capabilities of OpenAI APIs compared to DALL-E 2.
The author provides a subjective account of the convenience and improved user experience when interacting with AI directly within Word, as opposed to using a separate browser tab.
Create an AI Assistant for Microsoft Word
How to write a Word Add-in AI Assistant using Go, TypeScript and the OpenAI APIs
You can’t read the news anywhere nowadays without seeing articles about ChatGPT, a new AI model, or someone using AI for something. The hype is so strong and thick that you might think it’s the mid-1990s and Web 1.0 again. Or is it?
Certainly, everyone’s jumping into the fray, even as layoffs sweep through the tech giants. In fact, tech giants like Microsoft, Google, Meta and Amazon are doubling down on AI despite laying off tens of thousands of workers. Whether they are lemmings leaping to their doom like the dotcoms of the late 1990s, or there really is a pot of gold at the end of the rainbow, only time will tell.
In the meantime, the technology is pretty exciting! I can’t resist getting into the midst of it all and writing my own AI assistant, and it’s a great time to do so. I can’t write my own AI model, but there are plenty of AI model APIs to choose from.
In the course of poking around with different stuff, I eventually wrote an AI Assistant for Microsoft Word, which I called Maia (My AI Assistant). Here is how it works in Word on my Mac.
And this is Maia running on my iPad.
Interested to know how I created Maia? Read on!
The big picture
I want an AI assistant that’s embedded into Word, which I can trigger to create new content, complete my sentences, translate and summarise paragraphs and generally help me with writing anything with Word.
To do this, I will need to embed some logic within Word or write some sort of plug-in for Word to extend Word’s capabilities. The entire Office suite has something called Office Add-ins, which allows programmers to do exactly this.
Next, I want to call some AI model APIs to do work for me. I could call the APIs directly from the add-in, but that would mean I must expose my API tokens and pass them to the users. It would also mean I need to put the processing logic in the add-in itself, which again is exposed and distributed to all users. If I want to change the logic or switch APIs, I will need to redistribute the add-in, which is a pain.
Instead, I chose to build my own simple API proxy that will take in the requests from the add-in, call the AI model APIs, get the responses and do whatever I want with the response before returning it to the user.
As for the model to use, I decided on OpenAI, partly because they have APIs for both text and image generation, which is what I want. And, of course, OpenAI’s text completion API is basically from the same family as ChatGPT and is really good too.
So here’s the 10,000-foot view of the AI assistant.
Word add-in <-> API proxy <-> OpenAI
Creating an API proxy to OpenAI
As you might have guessed, I wrote the API proxy using Go. It’s a relatively small web service, so everything goes into a single file. This is the entire program.
package main

import (
	"io"
	"log"
	"net/http"
	"os"

	"github.com/joho/godotenv"
	"github.com/sausheong/openai"
)

var env string
var openAIApiKey string
var openAIOrganization string

func init() {
	env = os.Getenv("ENV")
	if env != "prod" {
		// outside production, load the settings from a local .env file
		err := godotenv.Load()
		if err != nil {
			log.Printf("Failed to load the env vars: %v", err)
		}
	}
	openAIApiKey = os.Getenv("OPENAI_API_KEY")
	openAIOrganization = os.Getenv("OPENAI_ORGANIZATION")
}

func main() {
	addr := os.Getenv("PORT")
	mux := http.NewServeMux()
	// serve the files in ./static under the /static/ path
	mux.Handle("/static/", http.StripPrefix("/static",
		http.FileServer(http.Dir("./static"))))
	mux.HandleFunc("/ask", a)
	mux.HandleFunc("/gen", g)
	server := &http.Server{
		Addr:    ":" + addr,
		Handler: mux,
	}
	// ListenAndServe only returns on failure, so log the error and exit
	log.Fatal(server.ListenAndServe())
}

// handler for /ask
func a(w http.ResponseWriter, r *http.Request) {
	if env != "prod" {
		enableCors(&w)
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		log.Printf("Failed to read the body: %v", err)
	}
	data := string(body)
	text, err := ask(data)
	if err != nil {
		log.Printf("Failed to talk to OpenAI: %v", err)
	}
	w.Write([]byte(text))
}

// handler for /gen
func g(w http.ResponseWriter, r *http.Request) {
	if env != "prod" {
		enableCors(&w)
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		log.Printf("Failed to read the body: %v", err)
	}
	data := string(body)
	img, err := generate(data)
	if err != nil {
		log.Printf("Failed to talk to OpenAI: %v", err)
	}
	w.Write([]byte(img))
}

// enable CORS for the API by allowing every origin
func enableCors(w *http.ResponseWriter) {
	(*w).Header().Set("Access-Control-Allow-Origin", "*")
}

// ask OpenAI to generate text
func ask(prompt string) (string, error) {
	oaClient := openai.NewClient(openAIApiKey, openAIOrganization)
	request := make(openai.CompletionRequest)
	request.SetModel(openai.TEXT_DAVINCI_003)
	// " {}" marks the end of the input; "{}" is also the stop sequence
	request.SetPrompt(prompt + " {}")
	request["temperature"] = 0.75
	// len counts bytes, which is never fewer than the prompt's tokens
	request["max_tokens"] = 4096 - len(prompt)
	request["stop"] = "{}"
	cr, err := oaClient.Complete(request)
	if err != nil {
		log.Println("Completion request failed:", err)
		// don't touch the response when the request failed
		return "", err
	}
	return cr.Text(), nil
}

// ask OpenAI to generate an image
func generate(prompt string) (string, error) {
	oaClient := openai.NewClient(openAIApiKey, openAIOrganization)
	request := make(openai.ImageRequest)
	request.SetPrompt(prompt)
	// return the image as base64 text instead of a URL
	request.SetFormat("b64_json")
	request.SetSize("512x512")
	cr, err := oaClient.GenerateImage(request)
	if err != nil {
		log.Println("Image request failed:", err)
		// don't touch the response when the request failed
		return "", err
	}
	return cr.ImageBase64(), nil
}
The code is pretty straightforward to understand, but let me point out a few things.
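I didn’t bother with JSON. This is such a simple program that I thought it would be overkill. The request body is the raw prompt, and the response is plain text (or, for images, a base64 string).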
The proxy does only two things:
Ask the AI model (text-davinci-003, which is based on GPT-3) to do something. This could be writing something, summarising or translating a paragraph, or explaining something: whatever a good AI assistant should do, as long as the result is text.
Generate an image from a text prompt, again using the OpenAI APIs.
Correspondingly, there are two handler functions and two functions that call the OpenAI APIs. Here’s one of the handler functions:
func a(w http.ResponseWriter, r *http.Request) {
	if env != "prod" {
		enableCors(&w)
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		log.Printf("Failed to read the body: %v", err)
	}
	data := string(body)
	text, err := ask(data)
	if err != nil {
		log.Printf("Failed to talk to OpenAI: %v", err)
	}
	w.Write([]byte(text))
}
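In production, the add-in is served from the same domain as the proxy, so I leave CORS off. That stops web pages on other domains from calling this API from a browser. Note that browsers enforce CORS, so non-browser clients are not affected. When it’s not in production, I enable CORS by simply allowing every origin.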
Now let’s talk about how we call the OpenAI APIs. To ask OpenAI to generate text, I used their Completion API. I need an API key and an organisation ID. OpenAI provides both when you sign up for API access, and the proxy reads them from the OPENAI_API_KEY and OPENAI_ORGANIZATION environment variables.
I create a completion request, set the model to text-davinci-003 and pass it the prompt. Note that I added a {} string to the end of the prompt to tell OpenAI that this is the end of my input. I also pass the same {} as the stop sequence. Without the marker, the model can get confused and fail to answer properly, especially when there is a lot of input (for example, when you ask it to summarise or translate a paragraph).
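Here is a small sketch of the marker handling in isolation. It makes two assumptions that go beyond what `ask()` does. First, it trims surrounding whitespace (such as a stray trailing newline) from the input before appending ` {}`. Second, it trims whitespace from the completion, since completions can come back with leading blank lines. Both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// endMarker is the "{}" string appended to prompts and also sent as the
// "stop" sequence, so the model stops instead of echoing it.
const endMarker = "{}"

// buildPrompt (hypothetical) marks the end of the user's input after
// trimming surrounding whitespace.
func buildPrompt(input string) string {
	return strings.TrimSpace(input) + " " + endMarker
}

// cleanCompletion (hypothetical) strips leading blank lines and trailing
// spaces before the text is inserted into the document.
func cleanCompletion(text string) string {
	return strings.TrimSpace(text)
}

func main() {
	fmt.Printf("%q\n", buildPrompt("Translate into French: Good morning!\n"))
	// prints: "Translate into French: Good morning! {}"
	fmt.Printf("%q\n", cleanCompletion("\n\nBonjour !"))
	// prints: "Bonjour !"
}
```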
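I also set the max_tokens parameter, which is the maximum number of tokens allowed in the output. text-davinci-003 can handle at most 4096 tokens, including the input prompt, so the maximum number of output tokens is 4096 minus the size of the input prompt. Strictly speaking, len(prompt) in Go counts bytes rather than tokens. Every token covers at least one byte of text, so this approximation errs on the safe side and leaves the answer fewer tokens than the model could actually use.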
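One edge case remains: `4096 - len(prompt)` goes negative when a pasted paragraph is longer than 4096 bytes. GPT-3 models use a byte-level BPE tokenizer, so a prompt never has more tokens than bytes. The sketch below keeps the same byte-based bound but measures the full prompt, including the ` {}` marker. It refuses prompts that leave less than a minimum amount of room for the answer. The function name and the `minAnswer` guard are hypothetical additions.

```go
package main

import (
	"errors"
	"fmt"
)

// contextLimit is the 4096-token limit the proxy assumes for
// text-davinci-003 (prompt and completion combined).
const contextLimit = 4096

// maxTokensFor (hypothetical) returns a max_tokens value for the full prompt,
// marker included. Subtracting the byte length is conservative because a
// byte-level BPE prompt never has more tokens than bytes.
func maxTokensFor(fullPrompt string, minAnswer int) (int, error) {
	remaining := contextLimit - len(fullPrompt)
	if remaining < minAnswer {
		return 0, errors.New("prompt too long: not enough room left for an answer")
	}
	return remaining, nil
}

func main() {
	n, err := maxTokensFor("Summarise this paragraph. {}", 16)
	fmt.Println(n, err) // prints: 4068 <nil>
}
```

Returning an error lets the handler reply with a clear message instead of forwarding a request that OpenAI would reject anyway.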
For images, I create an ImageRequest, set the prompt and ask for the output in the b64_json format. This returns the image as base64, which is the image’s binary data encoded as text. I also set the size to 512x512.
That’s it: a simple API proxy to the OpenAI APIs. We’ll talk about how it’s deployed later, when we put it together with the Word Add-in.
Creating an Office Add-in for Word
The next thing to look at is the Word Add-in. A long time ago, I used to write WordBasic for Word 6. Later that became Visual Basic for Applications (VBA), which was prevalent for manipulating Word documents (amongst other things). VBA is still around and is powerful, but the new kid on the block is Office Add-ins, which are essentially web applications hosted inside an Office application (in this case, Word).
Office Add-ins are just client-side web applications alongside a manifest file that describes how your web application talks to the various Office applications.
To create an Office Add-in, we can use the Yeoman generator for Office Add-ins. The Yeoman generator creates a Node.js web application, and you can install it with this command (assuming you already have Node.js).
$ npm install -g yo generator-office
To create the project, do this.
$ yo office
You should see something like this.
_-----_ ╭──────────────────────────╮
| | │ Welcome to the Office │
|--(o)--| │ Add-in generator, by │
`---------´ │ @OfficeDev! Let's create │
( _´U`_ ) │ a project together! │
/___A___\ /╰──────────────────────────╯
| ~ |
__'.___.'__
´ ` |° ´ Y `
? Choose a project type: (Use arrow keys)
❯ Office Add-in Task Pane project
Office Add-in Task Pane project using React framework
Office Add-in Task Pane project using Angular framework
Excel Custom Functions using a Shared Runtime
Excel Custom Functions using a JavaScript-only Runtime
Office Add-in Task Pane project supporting single sign-on (localhost)
Outlook Add-in with Teams Manifest (Developer preview)
(Move up and down to reveal more choices)
Select Office Add-in Task Pane project, choose TypeScript, name your add-in and, finally, select Word as the application you want to support. The generator will spin and spit out our project, Maia.
Now go to the maia directory. You will see a bunch of generated files. You can ignore most of them for now, but take note of the manifest.xml file — this is the manifest file that defines the add-in, which you publish or pass around for your users to side-load.
Your sample source code will be in the src directory. The taskpane/taskpane.html HTML page is the page your user sees as the task pane on your Word application, while taskpane/taskpane.ts is where you will put your code to control the Word document and also the task pane. It might seem confusing if this is your first add-in, but these are the 2 most important files you need to modify.
For Maia, I have changed the taskpane.html to this.
<!-- Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License. -->
<!-- This file shows how to design a first-run page that provides a welcome screen to the user about the features of the add-in. -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Maia Add-in</title>
  <!-- Office JavaScript API -->
  <script type="text/javascript" src="https://appsforoffice.microsoft.com/lib/1.1/hosted/office.js"></script>
  <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-GLhlTQ8iRABdZLl6O3oVMWSktQOp6b7In1Zl3/Jr59b6EGGoI1aFkw7cmDA6j6gD" crossorigin="anonymous">
</head>
<body>
  <div class="container text-center">
    <div>
      <h1 class="display-1">Maia <img src="/assets/icon-32.png"/></h1>
    </div>
    <div class="p-3">
      Maia is an AI assistant that can help you with suggestions for writing and image generation.
    </div>
    <div class="p-3">
      Position your cursor on the paragraph with the request or the image prompt and click the Ask or Create button.
    </div>
    <div class="btn-group" role="group" aria-label="Maia buttons">
      <div class="btn-group" role="group">
        <button id="ask" type="button" class="btn btn-outline-secondary">Ask</button>
      </div>
      <div class="btn-group" role="group">
        <button id="gen" type="button" class="btn btn-outline-secondary">Create</button>
      </div>
    </div>
  </div>
</body>
</html>
We need the office.js script for the APIs that control the Word document. I used Bootstrap because it’s more familiar to me. The rest is simply descriptive text, while the two buttons (with the ids ask and gen) are wired up in the taskpane.ts file.
As expected, the 2 main functions are to ask the assistant to do something and to generate an image.
The first one is the ask function, which gets the current paragraph and then calls the API proxy server, passing it the input text. When it receives a response from the server, it inserts it into the document using the insertText function.
The second function works similarly. It also receives text from the proxy server, even though this time it’s an image encoded as base64. The function then inserts it into the document as an inline picture using the insertInlinePictureFromBase64 function.
During development, to test this add-in, you can go to the maia directory and run this.
$ npm start
This starts a local web server on port 3000 and opens up Word with your add-in loaded. You should see something like this on your terminal.
> [email protected] start
> office-addin-debugging start manifest.xml
Debugging is being started...
App type: desktop
Starting the dev server... (webpack serve --mode development)
The dev server is running on port 3000. Process id: 97999
Sideloading the Office Add-in...
Launching word via /var/folders/kc/btmw900j29vdrlx6lbm2lb2m0000gn/T/Word add-in 821d276f-2502-491e-af25-2f275d9a925d.docx
Debugging started
And then this will open up.
Putting them together
Now that we have both the Word Add-in and the API proxy, we need to put them together. It works on your computer, but if you want anyone else to use it, you need to integrate them and put them on the Internet.
While it’s possible to deploy the Add-in as static files to be hosted and then deploy the API proxy somewhere else, it’s easier to put them together as a single web application and push it out together.
Remember, I mentioned that the add-in is just a static HTML file with a TypeScript file. To compile it into plain JavaScript that can be deployed, run this.
$ npm run build
This will build the HTML and TypeScript into a nice package under a dist directory. Now copy this entire dist directory into the static directory alongside the API proxy source code. This is the directory the API proxy serves files from, under the /static/ path.
You can now deploy this somewhere, but please take note of the URLs used. The server needs to be served over HTTPS, and you need to edit the manifest.xml file and replace every occurrence of https://localhost:3000 with your new domain name. For example, I deployed to the subdomain maia.sausheong.com, so I changed every https://localhost:3000 to https://maia.sausheong.com.
I deployed this on the Digital Ocean App Platform because it’s something simple and familiar to me, but you can do it on any of the cloud services or anywhere else.
Distributing the AI assistant
There are two ways of distributing an Office Add-in. The standard and more scalable way is through Microsoft AppSource, where users can add it to their Word application. The other is sideloading, where users install the manifest file themselves.
To sideload the add-in on a macOS computer, drop the manifest.xml file into this directory /Users/<username>/Library/Containers/com.microsoft.Word/Data/Documents/wef (replacing <username> with your actual user name, of course). Then restart Word, and on the Word ribbon, go to the Insert tab. Select My Add-ins (dropdown menu) in the Add-ins group, and on the dropdown menu, choose Maia.
To sideload the add-in on an iPad, connect your iPad to your macOS computer (using the lightning cable or otherwise). Open up Finder, and under Locations, choose the iPad. On the top of the Finder window, click Files, then look for Word. Drop the manifest.xml file into the Word folder. Then restart Word, and on the Insert tab, choose Add-ins and select Maia.
You can, of course, also sideload on a Windows computer, but I have no practical way of showing you because I don’t have a Windows computer any more. However, the instructions for doing so are here.
You can also sideload on the Office for Web, but the process is a bit more elaborate, so if you are interested, you can also find the instructions here.
What’s next?
This was a simple and fun little project that was surprisingly useful. While practically this is no different from running ChatGPT on a browser tab and then cutting and pasting the response into Word, it’s quite a different experience getting the results directly into your document.
Whenever I need ideas to kick-start a piece of writing, Maia can seed something with just a couple of clicks. Translating, summarising and other tasks in Word are also simpler and more convenient.
The only grouse I have is that the images generated by the OpenAI APIs are not as good as DALL-E 2, which is the standard I was expecting. I’m still considering if I should switch to another API (there are plenty on Replicate.com).
I’m also considering other services I can offer within Maia through other APIs, and even fine-tuning on some specific data, effectively creating a customised model for Maia.
Code
Here’s the code for Maia, including the dist for the Word Add-in and the API proxy.