Innovative Web App for Email Drafting from Voice Notes
Chapter 1: Overview of the Email Drafting Web App
This project combines Chrome’s speech recognition with GPT-3 in a web application that drafts emails from your voice notes. It supports multiple languages, making it accessible to a wider audience.
This idea came from a request I received after publishing my previous articles on GPT-3 and speech recognition. A reader suggested it as a project and then went quiet. Nevertheless, I found it an intriguing challenge and decided to build it anyway.
The request was straightforward:
“I’m looking for a web application that lets me record voice notes, which then produces a draft email for me to easily copy and send. My daily routine involves writing numerous emails quickly, so this tool would be incredibly beneficial! I would, of course, compensate you for the work.”
This article describes the simplest version of the web app that I managed to create over the weekend. First, check out the functionality in the following video, then continue reading to understand how it operates, and finally, find a link to try it out in your Chrome browser:
Chapter 2: How It Works
The web application uses Chrome’s speech recognition to transcribe spoken notes and OpenAI’s GPT-3 to turn them into email drafts. It supports several of the languages offered by the speech recognition engine; it could support many more, but I kept the list short for now.
The app’s core logic is written in JavaScript, with HTML and CSS for the user interface. Now let’s look at how it works in detail.
Section 2.1: Application Mechanics
At first glance, you might think the app starts listening only when the “Start recording” button is pressed. In fact, it listens continuously, but it keeps the recognized text only while the button is in Recording mode, shown in red.
When the user clicks the button again to stop recording, a fetch call sends a request to the OpenAI GPT-3 API. The request combines the user’s spoken notes with the email-writing prompt described below, and the email GPT-3 generates is then displayed to the user.
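The listen-always, keep-only-while-recording behavior can be sketched as follows. This is a minimal sketch, not the app’s actual source: the `NoteRecorder` class, its method names, and the `"en-US"` language code are hypothetical. Only `webkitSpeechRecognition` and its `continuous`, `interimResults`, `lang`, `onresult`, `onend` and `start()` members come from Chrome’s Web Speech API. The browser wiring is skipped when speech recognition is unavailable, so the recording logic can also run on its own.

```javascript
// Hypothetical helper: keeps recognized text only while "recording" is on.
class NoteRecorder {
  constructor() {
    this.recording = false; // mirrors the red "Recording" state of the button
    this.notes = [];        // phrases captured while recording
  }

  // Called when the user clicks the button; returns the new state.
  toggle() {
    this.recording = !this.recording;
    if (this.recording) this.notes = []; // start a fresh set of notes
    return this.recording;
  }

  // Called for every final recognition result, whether recording or not.
  handleResult(transcript) {
    if (this.recording) this.notes.push(transcript.trim());
  }

  // The collected voice notes as a single string.
  text() {
    return this.notes.join(" ");
  }
}

const recorder = new NoteRecorder();

// Browser-only wiring (Chrome exposes the prefixed constructor).
if (typeof window !== "undefined" && "webkitSpeechRecognition" in window) {
  const recognition = new window.webkitSpeechRecognition();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = false; // deliver only final results
  recognition.lang = "en-US";         // assumed; would come from the language dropdown
  recognition.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (event.results[i].isFinal) {
        recorder.handleResult(event.results[i][0].transcript);
      }
    }
  };
  // Chrome may end a session after a long silence; restart to keep listening.
  recognition.onend = () => recognition.start();
  recognition.start();
}
```

Because the recognizer never stops, the button only flips `recording`; text heard while the button is not red is simply discarded.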
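A request of this kind might look like the sketch below. It assumes OpenAI’s legacy Completions endpoint (`https://api.openai.com/v1/completions`), which GPT-3 models used; the model name `text-davinci-003`, the `max_tokens` and `temperature` values, and the helper names `buildCompletionRequest`, `extractEmail` and `draftEmail` are assumptions for illustration, not the app’s actual code. That model has since been retired by OpenAI, so a current model name would be needed in practice.

```javascript
// Assumed endpoint: OpenAI's legacy Completions API used by GPT-3 models.
const COMPLETIONS_URL = "https://api.openai.com/v1/completions";

// Builds the fetch() arguments for a completion request (values are assumed).
function buildCompletionRequest(prompt, apiKey) {
  return {
    url: COMPLETIONS_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "text-davinci-003", // assumed GPT-3 model, now retired
        prompt: prompt,
        max_tokens: 500,           // assumed upper bound on email length
        temperature: 0.7,          // assumed; lower values give more predictable text
      }),
    },
  };
}

// Pulls the generated email out of a Completions API response.
function extractEmail(data) {
  if (!data || !Array.isArray(data.choices) || data.choices.length === 0) {
    throw new Error("Unexpected API response");
  }
  return data.choices[0].text.trim();
}

// Sends the prompt and resolves to the drafted email text.
async function draftEmail(prompt, apiKey) {
  const { url, options } = buildCompletionRequest(prompt, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`OpenAI API error: ${response.status}`);
  }
  return extractEmail(await response.json());
}
```

Keeping request construction and response parsing in separate functions makes both easy to check without calling the API.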
For more insights on implementing speech recognition in Chrome and making API calls to GPT-3, refer to these previous articles:
Section 2.2: The Email-Writing Prompt
A crucial aspect of this application is that the voice notes must be sent to GPT-3 along with specific instructions. These instructions guide the AI to draft an email based on the provided notes.
My implementation appends the voice notes to a prompt structured like this:
“Write an email complete with greetings and salutation following these indications: ”
I refer to this as the “basic prompt,” which is available in several languages:
```javascript
// Instruction prompts, one per language, in the same order as the language dropdown
var basicprompt = [];
basicprompt.push("Write an email complete with greetings and salutation following these indications: "); // English
basicprompt.push("Ecrivez un e-mail complet avec salutations et salutations en suivant ces indications: "); // French
basicprompt.push("Schreiben Sie eine E-Mail mit Grüßen und Anrede nach diesen Angaben: "); // German
basicprompt.push("Escribe un email completo con saludos, siguiendo estas indicaciones: "); // Spanish
basicprompt.push("Scrivi una mail completa di saluti e saluti seguendo queste indicazioni: "); // Italian
```
The prompts are stored in an array whose order matches the languages in the dropdown menu, so when the full prompt for GPT-3 is built, the instruction in the selected language is used.
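Selecting the instruction by the dropdown’s position and appending the notes could look like this minimal sketch. The `buildFullPrompt` function and the `LANGUAGE_CODES` list are hypothetical; the language codes are assumed values for Chrome’s recognizer, listed in the same order as the `basicprompt` array (English, French, German, Spanish, Italian).

```javascript
// Assumed recognition language codes, in the same order as basicprompt.
const LANGUAGE_CODES = ["en-US", "fr-FR", "de-DE", "es-ES", "it-IT"];

// Builds the full GPT-3 prompt: localized instruction followed by the notes.
function buildFullPrompt(prompts, languageIndex, notes) {
  if (!Number.isInteger(languageIndex) || languageIndex < 0 || languageIndex >= prompts.length) {
    throw new RangeError(`No prompt for language index ${languageIndex}`);
  }
  return prompts[languageIndex] + notes.trim();
}
```

In the page, the index would come from the dropdown (for example its `selectedIndex`), and the same index could set the recognizer’s language via `LANGUAGE_CODES[index]`, which keeps speech recognition and the GPT-3 instruction in the same language.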
Section 2.3: Future Enhancements
As mentioned earlier, adding languages is relatively easy and is limited only by what the speech recognition engine supports. One limitation of the current web app is that the emails tend to share the same tone, mainly because of how the basic prompt is worded. The style could be customized by making the prompt more formal or informal, or by tailoring it to personal preferences.
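One way to add tone control is to prepend a style hint to the basic prompt. The sketch below is hypothetical: the `TONE_HINTS` presets, their wording, and `buildStyledPrompt` are illustrative and have not been tested against GPT-3.

```javascript
// Hypothetical tone presets prepended to the basic instruction.
const TONE_HINTS = {
  neutral: "",
  formal: "Use a formal, professional tone. ",
  informal: "Use a friendly, informal tone. ",
};

// Builds a prompt with an optional tone hint; the notes stay at the end.
function buildStyledPrompt(basicPrompt, tone, notes) {
  // hasOwnProperty avoids matching inherited keys such as "toString"
  if (!Object.prototype.hasOwnProperty.call(TONE_HINTS, tone)) {
    throw new Error(`Unknown tone: ${tone}`);
  }
  return TONE_HINTS[tone] + basicPrompt + notes.trim();
}
```

Placing the hint before the instruction keeps the voice notes last, right after the colon the basic prompts end with.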
Link to Explore the Tool
Before diving in, keep these points in mind:
- This application relies on Google Chrome’s speech recognition, so use this browser to access it.
- I achieved consistent results on Windows laptops and desktops, but performance may vary on smartphones.
- You’ll need an API key from OpenAI, which can be obtained with free tokens here: OpenAI API Key.
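Both requirements can be checked when the page loads. This minimal sketch uses a hypothetical `checkPrerequisites` function and messages; it only tests whether a `SpeechRecognition` or `webkitSpeechRecognition` constructor exists on the global object and whether a non-empty key was entered, not whether the key is valid.

```javascript
// Hypothetical startup check; returns a list of problems (empty if all is well).
function checkPrerequisites(globalObj, apiKey) {
  const problems = [];
  const hasSpeech =
    globalObj != null &&
    ("SpeechRecognition" in globalObj || "webkitSpeechRecognition" in globalObj);
  if (!hasSpeech) {
    problems.push("Speech recognition is not available; please use Google Chrome.");
  }
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    problems.push("Please enter your OpenAI API key.");
  }
  return problems;
}
```

In the browser it would be called as `checkPrerequisites(window, key)`, with the key taken from wherever the page collects it.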
Enjoy crafting numerous emails effortlessly!