
Build a Speech-to-text Web App with Whisper, React and Node

In this article, we’ll build a speech-to-text application using OpenAI’s Whisper, along with React, Node.js, and FFmpeg. The app will take an audio file uploaded by the user, transcribe it with OpenAI’s Whisper API, and display the resulting text. Whisper gives the most accurate speech-to-text transcription I’ve used, even for a non-native English speaker.

Table of Contents
  1. Introducing Whisper
  2. Prerequisites
  3. Tech Stack
  4. Setting Up the Project
  5. Integrating Whisper
  6. Installing FFmpeg
  7. Trim Audio in the Code
  8. The Frontend
  9. Conclusion

Introducing Whisper

OpenAI explains that Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the Web.

Text is easier to search and store than audio, but transcribing audio by hand is laborious. ASR systems like Whisper can detect speech and transcribe it to text quickly and with a high level of accuracy, which makes them particularly useful tools.

Prerequisites

This article is aimed at developers who are familiar with JavaScript and have a basic understanding of React and Express.

If you want to build along, you’ll need an API key. You can obtain one by signing up for an account on the OpenAI platform. Once you have an API key, make sure to keep it secure and not share it publicly.

Tech Stack

We’ll be building the frontend of this app with Create React App (CRA). All we’ll be doing in the frontend is uploading files, picking time boundaries, making network requests and managing a little state. I chose CRA for simplicity. Feel free to use any frontend library you prefer or even plain old JS. The code should be mostly transferable.

For the backend, we’ll be using Node.js and Express, just so we can stick with a full JS stack for this app. You can use Fastify or any other alternative in place of Express and you should still be able to follow along.

Note: in order to keep this article focussed on the subject, long blocks of code will be linked to, so we can focus on the real tasks at hand.

Setting Up the Project

We start by creating a new folder to hold both the frontend and the backend of the project, which keeps things organized. Feel free to choose any other structure you prefer:

<code class="bash language-bash"><span class="token function">mkdir</span> speech-to-text-app
<span class="token builtin class-name">cd</span> speech-to-text-app
</code>

Next, we initialize a new React application using create-react-app:

<code class="bash language-bash">npx create-react-app frontend
</code>

Navigate to the new frontend folder and install axios for network requests and react-dropzone for file uploads, along with react-select and react-toastify, which we’ll use later in the frontend:

<code class="bash language-bash"><span class="token builtin class-name">cd</span> frontend
<span class="token function">npm</span> <span class="token function">install</span> axios react-dropzone react-select react-toastify
</code>

Now, let’s switch back into the main folder and create the backend folder:

<code class="bash language-bash"><span class="token builtin class-name">cd</span> <span class="token punctuation">..</span>
<span class="token function">mkdir</span> backend
<span class="token builtin class-name">cd</span> backend
</code>

Next, we initialize a new Node application in our backend directory, while also installing the required libraries:

<code class="bash language-bash"><span class="token function">npm</span> init -y
<span class="token function">npm</span> <span class="token function">install</span> express dotenv cors multer form-data axios fluent-ffmpeg ffmetadata ffmpeg-static
<span class="token function">npm</span> <span class="token function">install</span> --save-dev nodemon
</code>

In the code above, we’ve installed the following libraries:

  • dotenv: necessary to keep our OpenAI API key away from the source code.
  • cors: to enable cross-origin requests.
  • multer: middleware for uploading our audio files. It adds a .file or .files object to the request object, which we’ll then access in our route handlers.
  • form-data: to programmatically create and submit forms with file uploads and fields to a server.
  • axios: to make network requests to the Whisper endpoint.

Also, since we’ll be using FFmpeg for audio trimming, we have these libraries:

  • fluent-ffmpeg: this provides a fluent API to work with the FFmpeg tool, which we’ll use for audio trimming.
  • ffmetadata: this is used for reading and writing metadata in media files. We need it to retrieve the audio duration.
  • ffmpeg-static: this provides static FFmpeg binaries for different platforms, and simplifies deploying FFmpeg.

Our entry file for the Node.js app will be index.js. Create the file inside the backend folder and open it in a code editor. Let’s wire up a basic Express server:

<code class="javascript language-javascript"><span class="token keyword">const</span> express <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'express'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> cors <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'cors'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> app <span class="token operator">=</span> <span class="token function">express</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

app<span class="token punctuation">.</span><span class="token method function property-access">use</span><span class="token punctuation">(</span><span class="token function">cors</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
app<span class="token punctuation">.</span><span class="token method function property-access">use</span><span class="token punctuation">(</span>express<span class="token punctuation">.</span><span class="token method function property-access">json</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

app<span class="token punctuation">.</span><span class="token method function property-access">get</span><span class="token punctuation">(</span><span class="token string">'/'</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">req<span class="token punctuation">,</span> res</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
  res<span class="token punctuation">.</span><span class="token method function property-access">send</span><span class="token punctuation">(</span><span class="token string">'Welcome to the Speech-to-Text API!'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token keyword">const</span> <span class="token constant">PORT</span> <span class="token operator">=</span> process<span class="token punctuation">.</span><span class="token property-access">env</span><span class="token punctuation">.</span><span class="token constant">PORT</span> <span class="token operator">||</span> <span class="token number">3001</span><span class="token punctuation">;</span>
app<span class="token punctuation">.</span><span class="token method function property-access">listen</span><span class="token punctuation">(</span><span class="token constant">PORT</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
  <span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">log</span><span class="token punctuation">(</span><span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">Server is running on port </span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span><span class="token constant">PORT</span><span class="token interpolation-punctuation punctuation">}</span></span><span class="token template-punctuation string">`</span></span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code>

Update package.json in the backend folder to include start and dev scripts:

<code class="javascript language-javascript"><span class="token string">"scripts"</span><span class="token operator">:</span> <span class="token punctuation">{</span>
  <span class="token string">"start"</span><span class="token operator">:</span> <span class="token string">"node index.js"</span><span class="token punctuation">,</span>
  <span class="token string">"dev"</span><span class="token operator">:</span> <span class="token string">"nodemon index.js"</span>
<span class="token punctuation">}</span>
</code>

This code registers a single GET route at the root path. When we run npm run dev and open localhost:3001 (or whichever port we’ve configured), we should see the welcome text.
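You can also check the route from a script instead of a browser. The following is a minimal sketch. The helper name checkWelcome is hypothetical, and the code assumes Node.js 18 or later, where fetch is available globally.

<code class="javascript language-javascript">// Hypothetical smoke test for the welcome route. Assumes the Express server is
// already running (for example via npm run dev) and Node.js 18+ for global fetch.
const checkWelcome = async (baseUrl = 'http://localhost:3001') => {
  const res = await fetch(new URL('/', baseUrl));
  if (!res.ok) {
    throw new Error(`Unexpected status code: ${res.status}`);
  }
  // Resolves with the response body as plain text
  return res.text();
};

// Usage, with the server running in another terminal:
// checkWelcome().then(console.log).catch(console.error);
</code>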

Integrating Whisper

Now it’s time to add the secret sauce! In this section, we’ll:

  • accept a file upload on a POST route
  • convert the file to a readable stream
  • very importantly, send the file to Whisper for transcription
  • send the response back as JSON

Let’s now create a .env file at the root of the backend folder to store our API key. Remember to add it to .gitignore so it never gets committed:

<code class="bash language-bash"><span class="token assign-left variable">OPENAI_API_KEY</span><span class="token operator">=</span>YOUR_API_KEY_HERE
</code>
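dotenv copies the values from .env into process.env only after require('dotenv').config() has run, so that call belongs near the top of index.js. It also helps to fail fast if the key is missing, instead of discovering the problem on the first transcription request. The sketch below uses a hypothetical requireEnv helper:

<code class="javascript language-javascript">// Hypothetical helper: read a required environment variable or fail fast.
// Assumes require('dotenv').config() has already loaded .env into process.env.
const requireEnv = (name) => {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
};

// Usage at startup, before any routes are registered:
// const OPENAI_API_KEY = requireEnv('OPENAI_API_KEY');
</code>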

First, let’s load the .env file with dotenv and import the libraries we need for file uploads, network requests and streaming:

<code class="javascript language-javascript"><span class="token comment">// Load variables from .env into process.env (this is where OPENAI_API_KEY comes from)</span>
<span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'dotenv'</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">config</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> multer <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'multer'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> <span class="token maybe-class-name">FormData</span> <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'form-data'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> <span class="token punctuation">{</span> <span class="token maybe-class-name">Readable</span> <span class="token punctuation">}</span> <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'stream'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> axios <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'axios'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token keyword">const</span> upload <span class="token operator">=</span> <span class="token function">multer</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code>
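Here multer() is called without options, so uploaded files are kept in memory and req.file carries properties such as originalname, mimetype, size and buffer. A small validation step could check these before anything is sent to OpenAI. This is a sketch: validateAudioFile is a hypothetical helper. The 25 MB limit and the allowed MIME type prefixes are assumptions; verify them against OpenAI’s current documentation.

<code class="javascript language-javascript">// Hypothetical helper: checks the object multer attaches as req.file.
// multer() with no options uses in-memory storage, so the object has
// originalname, mimetype, size (in bytes) and buffer properties.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // assumed upload limit, confirm against OpenAI's docs
const ALLOWED_TYPE_PREFIXES = ['audio/', 'video/']; // assumed: some accepted formats are video containers

const validateAudioFile = (file) => {
  if (!file) return 'No audio file provided';
  const typeOk = ALLOWED_TYPE_PREFIXES.some((prefix) => file.mimetype.startsWith(prefix));
  if (!typeOk) return `Unsupported file type: ${file.mimetype}`;
  if (file.size > MAX_UPLOAD_BYTES) return 'File is too large';
  return null; // null means the file passed these basic checks
};

// Usage inside the route:
// const problem = validateAudioFile(req.file);
// if (problem) return res.status(400).json({ error: problem });
</code>

multer can also reject oversized uploads on its own through its limits option, for example multer({ limits: { fileSize: MAX_UPLOAD_BYTES } }).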

Next, we’ll create a simple utility function to convert the file buffer into a readable stream that we’ll send to Whisper:

<code class="javascript language-javascript"><span class="token keyword">const</span>  <span class="token function-variable function">bufferToStream</span>  <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token parameter">buffer</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
  <span class="token keyword control-flow">return</span>  <span class="token maybe-class-name">Readable</span><span class="token punctuation">.</span><span class="token keyword module">from</span><span class="token punctuation">(</span>buffer<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code>
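To see what bufferToStream produces, we can read the stream back into a buffer and compare the result with the original. Readable.from() emits a Buffer as a single chunk rather than iterating over it byte by byte. The snippet repeats bufferToStream so it runs on its own. streamToBuffer is a hypothetical helper added only for this check.

<code class="javascript language-javascript">const { Readable } = require('stream');

// Same conversion as the article's bufferToStream
const bufferToStream = (buffer) => Readable.from(buffer);

// Hypothetical helper: collect a stream's chunks back into one Buffer
const streamToBuffer = async (stream) => {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
};

// Usage:
// const original = Buffer.from('fake audio bytes');
// streamToBuffer(bufferToStream(original)).then((copy) => console.log(copy.equals(original)));
</code>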

We’ll create a new route, /api/transcribe, and use axios, which we imported above, to send the request to OpenAI. Add the route to index.js like so:

<code class="javascript language-javascript">app<span class="token punctuation">.</span><span class="token method function property-access">post</span><span class="token punctuation">(</span><span class="token string">'/api/transcribe'</span><span class="token punctuation">,</span> upload<span class="token punctuation">.</span><span class="token method function property-access">single</span><span class="token punctuation">(</span><span class="token string">'file'</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token keyword">async</span> <span class="token punctuation">(</span><span class="token parameter">req<span class="token punctuation">,</span> res</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
  <span class="token keyword control-flow">try</span> <span class="token punctuation">{</span>
    <span class="token keyword">const</span>  audioFile  <span class="token operator">=</span> req<span class="token punctuation">.</span><span class="token property-access">file</span><span class="token punctuation">;</span>
    <span class="token keyword control-flow">if</span> <span class="token punctuation">(</span><span class="token operator">!</span>audioFile<span class="token punctuation">)</span> <span class="token punctuation">{</span>
      <span class="token keyword control-flow">return</span> res<span class="token punctuation">.</span><span class="token method function property-access">status</span><span class="token punctuation">(</span><span class="token number">400</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">json</span><span class="token punctuation">(</span><span class="token punctuation">{</span> error<span class="token operator">:</span> <span class="token string">'No audio file provided'</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">const</span>  formData  <span class="token operator">=</span>  <span class="token keyword">new</span>  <span class="token class-name">FormData</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span>  audioStream  <span class="token operator">=</span>  <span class="token function">bufferToStream</span><span class="token punctuation">(</span>audioFile<span class="token punctuation">.</span><span class="token property-access">buffer</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// Keep the uploaded file's real name so its extension matches its actual format</span>
    formData<span class="token punctuation">.</span><span class="token method function property-access">append</span><span class="token punctuation">(</span><span class="token string">'file'</span><span class="token punctuation">,</span> audioStream<span class="token punctuation">,</span> <span class="token punctuation">{</span> filename<span class="token operator">:</span> audioFile<span class="token punctuation">.</span><span class="token property-access">originalname</span><span class="token punctuation">,</span> contentType<span class="token operator">:</span> audioFile<span class="token punctuation">.</span><span class="token property-access">mimetype</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    formData<span class="token punctuation">.</span><span class="token method function property-access">append</span><span class="token punctuation">(</span><span class="token string">'model'</span><span class="token punctuation">,</span> <span class="token string">'whisper-1'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    formData<span class="token punctuation">.</span><span class="token method function property-access">append</span><span class="token punctuation">(</span><span class="token string">'response_format'</span><span class="token punctuation">,</span> <span class="token string">'json'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span>  config  <span class="token operator">=</span> <span class="token punctuation">{</span>
      headers<span class="token operator">:</span> <span class="token punctuation">{</span>
        <span class="token comment">// getHeaders() returns the multipart Content-Type header, including the boundary</span>
        <span class="token spread operator">...</span>formData<span class="token punctuation">.</span><span class="token method function property-access">getHeaders</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
        <span class="token string">"Authorization"</span><span class="token operator">:</span> <span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">Bearer </span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>process<span class="token punctuation">.</span><span class="token property-access">env</span><span class="token punctuation">.</span><span class="token constant">OPENAI_API_KEY</span><span class="token interpolation-punctuation punctuation">}</span></span><span class="token template-punctuation string">`</span></span><span class="token punctuation">,</span>
      <span class="token punctuation">}</span><span class="token punctuation">,</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    
    <span class="token keyword">const</span>  response  <span class="token operator">=</span>  <span class="token keyword control-flow">await</span> axios<span class="token punctuation">.</span><span class="token method function property-access">post</span><span class="token punctuation">(</span><span class="token string">'https://api.openai.com/v1/audio/transcriptions'</span><span class="token punctuation">,</span> formData<span class="token punctuation">,</span> config<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span>  transcription  <span class="token operator">=</span> response<span class="token punctuation">.</span><span class="token property-access">data</span><span class="token punctuation">.</span><span class="token property-access">text</span><span class="token punctuation">;</span>
    res<span class="token punctuation">.</span><span class="token method function property-access">json</span><span class="token punctuation">(</span><span class="token punctuation">{</span> transcription <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
  <span class="token punctuation">}</span> <span class="token keyword control-flow">catch</span> <span class="token punctuation">(</span>error<span class="token punctuation">)</span> <span class="token punctuation">{</span>
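    <span class="token comment">// Log the real cause; axios exposes the API's error body as error.response.data</span>
    <span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">error</span><span class="token punctuation">(</span>error<span class="token punctuation">.</span><span class="token property-access">response</span><span class="token optional-chaining punctuation property-access">?.</span><span class="token property-access">data</span> <span class="token operator">||</span> error<span class="token punctuation">.</span><span class="token property-access">message</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    res<span class="token punctuation">.</span><span class="token method function property-access">status</span><span class="token punctuation">(</span><span class="token number">500</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">json</span><span class="token punctuation">(</span><span class="token punctuation">{</span> error<span class="token operator">:</span> <span class="token string">'Error transcribing audio'</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>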
  <span class="token punctuation">}</span>
<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code>

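In the code above, we use the bufferToStream utility to convert the uploaded file’s buffer into a readable stream. We append that stream to a multipart form, along with the model name (whisper-1) and the desired response format. We then post the form to Whisper’s transcription endpoint, await the response, and send the transcribed text back to the client as JSON.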

See OpenAI’s API documentation for more on Whisper’s request parameters and response format.
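If you’d rather not pull in form-data and axios, recent versions of Node.js provide FormData, Blob and fetch as globals. The sketch below assumes Node.js 18 or later and builds the same multipart request with those built-ins. The names buildWhisperForm and transcribeBuffer are hypothetical, and the route above doesn’t depend on them.

<code class="javascript language-javascript">// Assumes Node.js 18+, where FormData, Blob and fetch are globals.
// buildWhisperForm and transcribeBuffer are hypothetical names for this sketch.
const buildWhisperForm = (buffer, mimetype, filename) => {
  const form = new FormData();
  // Pass a file name whose extension matches the audio's actual format
  form.append('file', new Blob([buffer], { type: mimetype }), filename);
  form.append('model', 'whisper-1');
  form.append('response_format', 'json');
  return form;
};

const transcribeBuffer = async (buffer, mimetype, filename, apiKey) => {
  const res = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    // No Content-Type here: fetch derives it (and the boundary) from the FormData body
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildWhisperForm(buffer, mimetype, filename),
  });
  if (!res.ok) {
    throw new Error(`Whisper request failed with status ${res.status}`);
  }
  const data = await res.json();
  return data.text;
};
</code>

One difference from the axios version is that the built-in fetch generates the multipart Content-Type header itself, so we don’t set it by hand.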

Installing FFmpeg

Next, we’ll add functionality that lets the user transcribe only part of the audio. To do this, our API endpoint will accept a startTime and an endTime, which we’ll use to trim the audio with FFmpeg.

Installing FFmpeg for Windows

To install FFmpeg for Windows, follow the simple steps below:

  1. Visit the download page on the official FFmpeg website.
  2. Under the Windows icon there are several links. Choose the Windows builds provided by gyan.dev.
  3. Download the build that corresponds to our system (32 or 64 bit). Make sure to download the “static” version to get all the libraries included.
  4. Extract the downloaded ZIP file. We can place the extracted folder wherever we prefer.
  5. To use FFmpeg from the command line without having to navigate to its folder, add the FFmpeg bin folder to the system PATH.

Installing FFmpeg for macOS

If we’re on macOS, we can install FFmpeg with Homebrew:

<code class="bash language-bash">brew <span class="token function">install</span> ffmpeg
</code>

Installing FFmpeg for Linux

If we’re on Linux, we can install FFmpeg with apt, dnf or pacman, depending on our Linux distribution. Here’s the command for installing with apt:

<code class="bash language-bash"><span class="token function">sudo</span> <span class="token function">apt</span> update
<span class="token function">sudo</span> <span class="token function">apt</span> <span class="token function">install</span> ffmpeg
</code>
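On any platform, you can confirm that Node.js can launch FFmpeg by running ffmpeg -version from a child process. The following sketch uses Node’s built-in child_process module. isFfmpegAvailable is a hypothetical helper name.

<code class="javascript language-javascript">const { spawnSync } = require('child_process');

// Hypothetical check: returns true if the given FFmpeg binary can be launched.
const isFfmpegAvailable = (binary = 'ffmpeg') => {
  const result = spawnSync(binary, ['-version'], { encoding: 'utf8' });
  // result.error is set (for example ENOENT) when the binary can't be found or started
  if (result.error) return false;
  return result.status === 0;
};

// Usage:
// isFfmpegAvailable() checks the ffmpeg found on the system PATH
// isFfmpegAvailable(require('ffmpeg-static')) checks the binary bundled by ffmpeg-static
</code>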

Trim Audio in the Code

Why do we need to trim the audio? Say a user has an hour-long audio file and only wants to transcribe from the 15-minute mark to the 45-minute mark. With FFmpeg, we can trim the audio to the exact startTime and endTime before sending the trimmed audio to Whisper for transcription.

First, we’ll import the following libraries:

<code class="javascript language-javascript"><span class="token keyword">const</span> ffmpeg <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'fluent-ffmpeg'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> ffmpegPath <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'ffmpeg-static'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> ffmetadata <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'ffmetadata'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">const</span> fs  <span class="token operator">=</span>  <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">'fs'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

ffmpeg<span class="token punctuation">.</span><span class="token method function property-access">setFfmpegPath</span><span class="token punctuation">(</span>ffmpegPath<span class="token punctuation">)</span><span class="token punctuation">;</span>
</code>
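
  • fluent-ffmpeg is a Node.js module that provides a fluent API for interacting with FFmpeg.
  • ffmpeg-static supplies the path to a prebuilt FFmpeg binary. ffmpeg.setFfmpegPath(ffmpegPath) tells fluent-ffmpeg to use that binary instead of looking for FFmpeg on the system PATH.
  • ffmetadata will be used to read the metadata of the audio file, specifically its duration.
  • fs is Node’s built-in file system module, which we’ll use to write the temporary audio files.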
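Under the hood, fluent-ffmpeg builds and runs an ffmpeg command for us. For comparison, here is a sketch of a roughly equivalent trim that uses child_process directly. It seeks to a start offset with -ss and limits the output length with -t. This isn’t necessarily the exact command fluent-ffmpeg generates, and buildTrimArgs and trimWithFfmpeg are hypothetical helpers.

<code class="javascript language-javascript">const { execFile } = require('child_process');

// Hypothetical helper: arguments for a command roughly like
//   ffmpeg -y -ss START -i INPUT -t DURATION OUTPUT
// -ss placed before -i seeks the input; -t limits how much audio is written.
const buildTrimArgs = (input, output, startSeconds, durationSeconds) => [
  '-y', // overwrite the output file if it already exists
  '-ss', String(startSeconds),
  '-i', input,
  '-t', String(durationSeconds),
  output,
];

// Runs the trim with a given FFmpeg binary (for example the path from ffmpeg-static)
const trimWithFfmpeg = (ffmpegBinary, input, output, startSeconds, durationSeconds) =>
  new Promise((resolve, reject) => {
    execFile(ffmpegBinary, buildTrimArgs(input, output, startSeconds, durationSeconds), (err) => {
      if (err) return reject(err);
      resolve(output);
    });
  });
</code>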

Next, let’s create a utility function that converts a time given as mm:ss into seconds. Like bufferToStream, it can live outside the app.post route:

<code class="javascript language-javascript">
<span class="token keyword">const</span> <span class="token function-variable function">parseTimeStringToSeconds</span> <span class="token operator">=</span> <span class="token parameter">timeString</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
    <span class="token keyword">const</span> <span class="token punctuation">[</span>minutes<span class="token punctuation">,</span> seconds<span class="token punctuation">]</span> <span class="token operator">=</span> timeString<span class="token punctuation">.</span><span class="token method function property-access">split</span><span class="token punctuation">(</span><span class="token string">':'</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">map</span><span class="token punctuation">(</span><span class="token parameter">tm</span> <span class="token arrow operator">=></span> <span class="token function">parseInt</span><span class="token punctuation">(</span>tm<span class="token punctuation">,</span> <span class="token number">10</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword control-flow">return</span> minutes <span class="token operator">*</span> <span class="token number">60</span> <span class="token operator">+</span> seconds<span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code>
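parseTimeStringToSeconds assumes well-formed input. For example, 'abc' produces NaN, and '1:75' is quietly accepted as 135 seconds. If the times come straight from user input, a stricter variant can reject malformed values up front. The following is a hypothetical alternative, not the function used in the rest of the article.

<code class="javascript language-javascript">// Hypothetical stricter parser: accepts "mm:ss" (minutes may exceed 59, e.g. "75:30")
// and throws on malformed input instead of returning NaN or an out-of-range value.
const parseTimeStringToSecondsStrict = (timeString) => {
  const match = /^(\d+):([0-5]\d)$/.exec(String(timeString).trim());
  if (!match) {
    throw new Error(`Expected a time in mm:ss format, got "${timeString}"`);
  }
  const minutes = Number(match[1]);
  const seconds = Number(match[2]);
  return minutes * 60 + seconds;
};

// Usage: parseTimeStringToSecondsStrict('15:00') returns 900
</code>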

Next, we should update our app.post route to do the following:

  • accept the startTime and endTime
  • calculate the duration
  • handle basic errors
  • convert audio buffer to stream
  • trim audio with FFmpeg
  • send the trimmed audio to OpenAI for transcription
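The “calculate the duration” step comes down to a little arithmetic. FFmpeg needs a start offset and a duration, and the end time shouldn’t run past the end of the file. Take the earlier example of transcribing an hour-long file from the 15-minute mark to the 45-minute mark: that gives a start of 900 seconds and a duration of 1800 seconds. Here is a sketch of that calculation as a standalone function. computeTrimWindow is a hypothetical name, and the function assumes the audio’s total duration is already known (for example, from the file’s metadata).

<code class="javascript language-javascript">// Hypothetical helper: converts requested start/end times (in seconds) into the
// start offset and duration FFmpeg needs, without running past the end of the audio.
const computeTrimWindow = (startSeconds, endSeconds, audioDurationSeconds) => {
  if (!Number.isFinite(startSeconds) || !Number.isFinite(endSeconds)) {
    throw new Error('Start and end times must be finite numbers');
  }
  const start = Math.max(0, startSeconds); // treat negative starts as the beginning
  // If the total duration is unknown (NaN or undefined), fall back to the requested end
  const end = Number.isFinite(audioDurationSeconds)
    ? Math.min(endSeconds, audioDurationSeconds)
    : endSeconds;
  if (start >= end) {
    throw new Error('End time must be after start time and within the audio');
  }
  return { start, duration: end - start };
};

// Usage: computeTrimWindow(900, 2700, 3600) returns { start: 900, duration: 1800 }
</code>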

The trimAudio function trims an audio stream between a specified start time and end time, and returns a promise that resolves with the trimmed audio data. If an error occurs at any point in this process, the promise is rejected with that error.

Let’s break down the function step by step.

  1. Define the trim audio function. The trimAudio function is asynchronous and accepts the audioStream and endTime as arguments. We define temporary filenames for processing the audio:

    <code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token function-variable function">trimAudio</span> <span class="token operator">=</span> <span class="token keyword">async</span> <span class="token punctuation">(</span><span class="token parameter">audioStream<span class="token punctuation">,</span> endTime</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
        <span class="token keyword">const</span> tempFileName <span class="token operator">=</span> <span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">temp-</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span><span class="token known-class-name class-name">Date</span><span class="token punctuation">.</span><span class="token method function property-access">now</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">.mp3</span><span class="token template-punctuation string">`</span></span><span class="token punctuation">;</span>
        <span class="token keyword">const</span> outputFileName <span class="token operator">=</span> <span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">output-</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span><span class="token known-class-name class-name">Date</span><span class="token punctuation">.</span><span class="token method function property-access">now</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">.mp3</span><span class="token template-punctuation string">`</span></span><span class="token punctuation">;</span>
    </code>
  2. Write stream to a temporary file. We write the incoming audio stream into a temporary file using fs.createWriteStream(). If there’s an error, the Promise gets rejected:

    <code class="javascript language-javascript"><span class="token keyword control-flow">return</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token parameter">resolve<span class="token punctuation">,</span> reject</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
        audioStream<span class="token punctuation">.</span><span class="token method function property-access">pipe</span><span class="token punctuation">(</span>fs<span class="token punctuation">.</span><span class="token method function property-access">createWriteStream</span><span class="token punctuation">(</span>tempFileName<span class="token punctuation">)</span><span class="token punctuation">)</span>
    </code>
  3. Read metadata and set endTime. After the audio stream finishes writing to the temporary file, we read the metadata of the file using ffmetadata.read(). If the provided endTime is longer than the audio duration, we adjust endTime to be the duration of the audio:

    <code class="javascript language-javascript"><span class="token punctuation">.</span><span class="token method function property-access">on</span><span class="token punctuation">(</span><span class="token string">'finish'</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
        ffmetadata<span class="token punctuation">.</span><span class="token method function property-access">read</span><span class="token punctuation">(</span>tempFileName<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">err<span class="token punctuation">,</span> metadata</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
            <span class="token keyword control-flow">if</span> <span class="token punctuation">(</span>err<span class="token punctuation">)</span> <span class="token keyword control-flow">return</span> <span class="token function">reject</span><span class="token punctuation">(</span>err<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// stop here so metadata isn't used after a failed read</span>
            <span class="token keyword">const</span> duration <span class="token operator">=</span> <span class="token function">parseFloat</span><span class="token punctuation">(</span>metadata<span class="token punctuation">.</span><span class="token property-access">duration</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token keyword control-flow">if</span> <span class="token punctuation">(</span>endTime <span class="token operator">></span> duration<span class="token punctuation">)</span> endTime <span class="token operator">=</span> duration<span class="token punctuation">;</span>
    </code>
  4. Trim audio using FFmpeg. We use FFmpeg to trim the audio, seeking to the start time from the request (startSeconds) and keeping the duration calculated earlier (timeDuration). The trimmed audio is written to the output file:

    <code class="javascript language-javascript"><span class="token function">ffmpeg</span><span class="token punctuation">(</span>tempFileName<span class="token punctuation">)</span>
        <span class="token punctuation">.</span><span class="token method function property-access">setStartTime</span><span class="token punctuation">(</span>startSeconds<span class="token punctuation">)</span>
        <span class="token punctuation">.</span><span class="token method function property-access">setDuration</span><span class="token punctuation">(</span>timeDuration<span class="token punctuation">)</span>
        <span class="token punctuation">.</span><span class="token method function property-access">output</span><span class="token punctuation">(</span>outputFileName<span class="token punctuation">)</span>
    </code>
  5. Delete temporary files and resolve the Promise. After trimming, we delete the temporary file and read the trimmed audio from the output file into a buffer. Once the buffer has been read, we delete the output file as well, using the Node.js fs module. If everything goes well, the Promise resolves with trimmedAudioBuffer; if FFmpeg emits an error, the Promise is rejected:

    <code class="javascript language-javascript"><span class="token punctuation">.</span><span class="token method function property-access">on</span><span class="token punctuation">(</span><span class="token string">'end'</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
        fs<span class="token punctuation">.</span><span class="token method function property-access">unlink</span><span class="token punctuation">(</span>tempFileName<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">err</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
            <span class="token keyword control-flow">if</span> <span class="token punctuation">(</span>err<span class="token punctuation">)</span> <span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">error</span><span class="token punctuation">(</span><span class="token string">'Error deleting temp file:'</span><span class="token punctuation">,</span> err<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        
        <span class="token keyword">const</span> trimmedAudioBuffer <span class="token operator">=</span> fs<span class="token punctuation">.</span><span class="token method function property-access">readFileSync</span><span class="token punctuation">(</span>outputFileName<span class="token punctuation">)</span><span class="token punctuation">;</span>
        fs<span class="token punctuation">.</span><span class="token method function property-access">unlink</span><span class="token punctuation">(</span>outputFileName<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">err</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
            <span class="token keyword control-flow">if</span> <span class="token punctuation">(</span>err<span class="token punctuation">)</span> <span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">error</span><span class="token punctuation">(</span><span class="token string">'Error deleting output file:'</span><span class="token punctuation">,</span> err<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        
        <span class="token function">resolve</span><span class="token punctuation">(</span>trimmedAudioBuffer<span class="token punctuation">)</span><span class="token punctuation">;</span>
    
    <span class="token punctuation">}</span><span class="token punctuation">)</span>
    <span class="token punctuation">.</span><span class="token method function property-access">on</span><span class="token punctuation">(</span><span class="token string">'error'</span><span class="token punctuation">,</span> reject<span class="token punctuation">)</span>
    <span class="token punctuation">.</span><span class="token method function property-access">run</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code>
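The arithmetic behind steps 3 and 4 can be checked on its own, without FFmpeg. The sketch below is a minimal illustration, not the endpoint's actual code: toSeconds and computeTrimWindow are hypothetical helper names, and it assumes the start and end times arrive as HH:MM:SS strings and that the file's duration is already known in seconds. It clamps the end time before computing the duration, so the value passed to setDuration() cannot extend past the end of the file.

```javascript
// Hypothetical helper: convert an "HH:MM:SS" string into a number of seconds.
function toSeconds(hms) {
  const [hours, minutes, seconds] = hms.split(':').map((v) => parseInt(v, 10));
  return hours * 3600 + minutes * 60 + seconds;
}

// Hypothetical helper: derive the values passed to setStartTime() and setDuration().
// fileDuration is the audio length in seconds (e.g. parsed from the file's metadata).
function computeTrimWindow(startTime, endTime, fileDuration) {
  const startSeconds = toSeconds(startTime);
  // Clamp the end first, so the duration is computed from the adjusted value.
  const endSeconds = Math.min(toSeconds(endTime), fileDuration);
  if (startSeconds >= endSeconds) {
    throw new RangeError('startTime must be earlier than endTime and the end of the audio');
  }
  return { startSeconds, timeDuration: endSeconds - startSeconds };
}

// Example: a 95.5-second file, trimmed from 00:01:00 to a requested 00:10:00.
console.log(computeTrimWindow('00:01:00', '00:10:00', 95.5));
// prints { startSeconds: 60, timeDuration: 35.5 }
```

Throwing a RangeError for an empty window is just one option; an Express handler could catch it and respond with a 400 status instead.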

The full code for the endpoint is available in this GitHub repo.

The Frontend

We’ll style the app with Tailwind, but I won’t cover setting it up. You can read about how to set up and use Tailwind here.

Creating the TimePicker component

Since our API accepts startTime and endTime, let’s create a TimePicker component with react-select.
react-select adds extra features to the select menus, such as searching the options, but it isn’t essential to this article; you can use plain select elements instead.

Let’s break down the TimePicker React component below:

  1. Imports and component declaration. First, we import the necessary packages and declare our TimePicker component, which accepts the props id, label, value, onChange, and maxDuration:

    <code class="javascript language-javascript"><span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">React</span><span class="token punctuation">,</span> <span class="token punctuation">{</span> useState<span class="token punctuation">,</span> useEffect<span class="token punctuation">,</span> useCallback <span class="token punctuation">}</span></span> <span class="token keyword module">from</span> <span class="token string">'react'</span><span class="token punctuation">;</span>
    <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">Select</span></span> <span class="token keyword module">from</span> <span class="token string">'react-select'</span><span class="token punctuation">;</span>
    
    <span class="token keyword">const</span> <span class="token function-variable function"><span class="token maybe-class-name">TimePicker</span></span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token parameter"><span class="token punctuation">{</span> id<span class="token punctuation">,</span> label<span class="token punctuation">,</span> value<span class="token punctuation">,</span> onChange<span class="token punctuation">,</span> maxDuration <span class="token punctuation">}</span></span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
    </code>
  2. Parse the value prop. The value prop is expected to be a time string (format HH:MM:SS). Here we split the time into hours, minutes, and seconds:

    <code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token punctuation">[</span>hours<span class="token punctuation">,</span> minutes<span class="token punctuation">,</span> seconds<span class="token punctuation">]</span> <span class="token operator">=</span> value<span class="token punctuation">.</span><span class="token method function property-access">split</span><span class="token punctuation">(</span><span class="token string">':'</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">map</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token parameter">v</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token function">parseInt</span><span class="token punctuation">(</span>v<span class="token punctuation">,</span> <span class="token number">10</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code>
  3. Calculate maximum values. maxDuration is the maximum time in seconds that can be selected, based on the audio duration. If maxDuration is Infinity, validMaxDuration falls back to 0; the result is then split into hours, minutes, and seconds:

    <code class="javascript language-javascript"><span class="token keyword">const</span> validMaxDuration <span class="token operator">=</span> maxDuration <span class="token operator">===</span> <span class="token number">Infinity</span> <span class="token operator">?</span> <span class="token number">0</span> <span class="token operator">:</span> maxDuration<span class="token punctuation">;</span>
    <span class="token keyword">const</span> maxHours <span class="token operator">=</span> <span class="token known-class-name class-name">Math</span><span class="token punctuation">.</span><span class="token method function property-access">floor</span><span class="token punctuation">(</span>validMaxDuration <span class="token operator">/</span> <span class="token number">3600</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> maxMinutes <span class="token operator">=</span> <span class="token known-class-name class-name">Math</span><span class="token punctuation">.</span><span class="token method function property-access">floor</span><span class="token punctuation">(</span><span class="token punctuation">(</span>validMaxDuration <span class="token operator">%</span> <span class="token number">3600</span><span class="token punctuation">)</span> <span class="token operator">/</span> <span class="token number">60</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> maxSeconds <span class="token operator">=</span> <span class="token known-class-name class-name">Math</span><span class="token punctuation">.</span><span class="token method function property-access">floor</span><span class="token punctuation">(</span>validMaxDuration <span class="token operator">%</span> <span class="token number">60</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code>
  4. Options for time selects. We create arrays for possible hours, minutes, and seconds options, and state hooks to manage the minute and second options:

    <code class="javascript language-javascript"><span class="token keyword">const</span> hoursOptions <span class="token operator">=</span> <span class="token known-class-name class-name">Array</span><span class="token punctuation">.</span><span class="token keyword module">from</span><span class="token punctuation">(</span><span class="token punctuation">{</span> length<span class="token operator">:</span> <span class="token known-class-name class-name">Math</span><span class="token punctuation">.</span><span class="token method function property-access">max</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> maxHours<span class="token punctuation">)</span> <span class="token operator">+</span> <span class="token number">1</span> <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">_<span class="token punctuation">,</span> i</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> minutesSecondsOptions <span class="token operator">=</span> <span class="token known-class-name class-name">Array</span><span class="token punctuation">.</span><span class="token keyword module">from</span><span class="token punctuation">(</span><span class="token punctuation">{</span> length<span class="token operator">:</span> <span class="token number">60</span> <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">_<span class="token punctuation">,</span> i</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
    
    <span class="token keyword">const</span> <span class="token punctuation">[</span>minuteOptions<span class="token punctuation">,</span> setMinuteOptions<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span>minutesSecondsOptions<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token punctuation">[</span>secondOptions<span class="token punctuation">,</span> setSecondOptions<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span>minutesSecondsOptions<span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code>
  5. Update value function. This function zero-pads the new hours, minutes, and seconds into an HH:MM:SS string and passes it to the onChange function received as a prop:

    <code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token function-variable function">updateValue</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token parameter">newHours<span class="token punctuation">,</span> newMinutes<span class="token punctuation">,</span> newSeconds</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
        <span class="token function">onChange</span><span class="token punctuation">(</span><span class="token template-string"><span class="token template-punctuation string">`</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span><span class="token known-class-name class-name">String</span><span class="token punctuation">(</span>newHours<span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">padStart</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'0'</span><span class="token punctuation">)</span><span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">:</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span><span class="token known-class-name class-name">String</span><span class="token punctuation">(</span>newMinutes<span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">padStart</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'0'</span><span class="token punctuation">)</span><span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">:</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span><span class="token known-class-name class-name">String</span><span class="token punctuation">(</span>newSeconds<span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">padStart</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'0'</span><span class="token punctuation">)</span><span class="token interpolation-punctuation punctuation">}</span></span><span class="token template-punctuation string">`</span></span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    </code>
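  6. Update minute and second options function. This function narrows the minute and second options based on the selected hours and minutes. If the selected hour equals maxHours, minutes are limited to 0 through maxMinutes; if the selected minute also equals maxMinutes, seconds are limited to 0 through maxSeconds. Otherwise, all 60 values stay available: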

    <code class="javascript language-javascript"><span class="token keyword">const</span> updateMinuteAndSecondOptions <span class="token operator">=</span> <span class="token function">useCallback</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token parameter">newHours<span class="token punctuation">,</span> newMinutes</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
        <span class="token keyword">const</span> minutesSecondsOptions <span class="token operator">=</span> <span class="token known-class-name class-name">Array</span><span class="token punctuation">.</span><span class="token keyword module">from</span><span class="token punctuation">(</span><span class="token punctuation">{</span> length<span class="token operator">:</span> <span class="token number">60</span> <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">_<span class="token punctuation">,</span> i</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">let</span> newMinuteOptions <span class="token operator">=</span> minutesSecondsOptions<span class="token punctuation">;</span>
        <span class="token keyword">let</span> newSecondOptions <span class="token operator">=</span> minutesSecondsOptions<span class="token punctuation">;</span>
        <span class="token keyword control-flow">if</span> <span class="token punctuation">(</span>newHours <span class="token operator">===</span> maxHours<span class="token punctuation">)</span> <span class="token punctuation">{</span>
            newMinuteOptions <span class="token operator">=</span> <span class="token known-class-name class-name">Array</span><span class="token punctuation">.</span><span class="token keyword module">from</span><span class="token punctuation">(</span><span class="token punctuation">{</span> length<span class="token operator">:</span> <span class="token known-class-name class-name">Math</span><span class="token punctuation">.</span><span class="token method function property-access">max</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> maxMinutes<span class="token punctuation">)</span> <span class="token operator">+</span> <span class="token number">1</span> <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">_<span class="token punctuation">,</span> i</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token keyword control-flow">if</span> <span class="token punctuation">(</span>newMinutes <span class="token operator">===</span> maxMinutes<span class="token punctuation">)</span> <span class="token punctuation">{</span>
                newSecondOptions <span class="token operator">=</span> <span class="token known-class-name class-name">Array</span><span class="token punctuation">.</span><span class="token keyword module">from</span><span class="token punctuation">(</span><span class="token punctuation">{</span> length<span class="token operator">:</span> <span class="token known-class-name class-name">Math</span><span class="token punctuation">.</span><span class="token method function property-access">max</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> maxSeconds<span class="token punctuation">)</span> <span class="token operator">+</span> <span class="token number">1</span> <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token parameter">_<span class="token punctuation">,</span> i</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> i<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
        <span class="token punctuation">}</span>
        <span class="token function">setMinuteOptions</span><span class="token punctuation">(</span>newMinuteOptions<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token function">setSecondOptions</span><span class="token punctuation">(</span>newSecondOptions<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">[</span>maxHours<span class="token punctuation">,</span> maxMinutes<span class="token punctuation">,</span> maxSeconds<span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code>
  7. Effect Hook. This calls updateMinuteAndSecondOptions when hours or minutes change:

    <code class="javascript language-javascript"><span class="token function">useEffect</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
        <span class="token function">updateMinuteAndSecondOptions</span><span class="token punctuation">(</span>hours<span class="token punctuation">,</span> minutes<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">[</span>hours<span class="token punctuation">,</span> minutes<span class="token punctuation">,</span> updateMinuteAndSecondOptions<span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code>
  8. Helper functions. These two helper functions convert time integers to select options and vice versa:

    <code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token function-variable function">toOption</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token parameter">value</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">(</span><span class="token punctuation">{</span>
        value<span class="token operator">:</span> value<span class="token punctuation">,</span>
        label<span class="token operator">:</span> <span class="token known-class-name class-name">String</span><span class="token punctuation">(</span>value<span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token method function property-access">padStart</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token string">'0'</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token function-variable function">fromOption</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token parameter">option</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> option<span class="token punctuation">.</span><span class="token property-access">value</span><span class="token punctuation">;</span>
    </code>
  9. Render. The component returns the time picker: three dropdown menus (hours, minutes, and seconds) built with the react-select library. Changing a value in any of the select boxes calls updateValue and updateMinuteAndSecondOptions, both explained above.
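Steps 2, 5, and 8 are plain string and object conversions, so they can be exercised outside React. This sketch pulls them into standalone functions; parseTimeValue and formatTimeValue are hypothetical names for logic the component writes inline, while toOption and fromOption follow the helpers from step 8.

```javascript
// Split an "HH:MM:SS" value into numeric parts (mirrors step 2).
const parseTimeValue = (value) => value.split(':').map((v) => parseInt(v, 10));

// Zero-pad the parts back into "HH:MM:SS" (mirrors the string built in updateValue).
const formatTimeValue = (hours, minutes, seconds) =>
  [hours, minutes, seconds].map((n) => String(n).padStart(2, '0')).join(':');

// react-select works with { value, label } objects (mirrors step 8).
const toOption = (value) => ({ value, label: String(value).padStart(2, '0') });
const fromOption = (option) => option.value;

// Simulate picking 7 in the minutes dropdown while the value is 00:10:30.
const [hours, , seconds] = parseTimeValue('00:10:30');
const picked = toOption(7);
console.log(formatTimeValue(hours, fromOption(picked), seconds)); // prints 00:07:30
```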

You can find the full source code of the TimePicker component on GitHub.
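Steps 3, 4, and 6 together decide which values each dropdown offers. The sketch below restates that logic without React so it can be tested directly; optionRanges and range are hypothetical names, and the function returns the arrays instead of calling setMinuteOptions and setSecondOptions.

```javascript
// Build [0, 1, ..., max] (mirrors the Array.from calls in the component).
const range = (max) => Array.from({ length: Math.max(0, max) + 1 }, (_, i) => i);

// Decide which hours/minutes/seconds can be picked, given the current selection.
function optionRanges(hours, minutes, maxDuration) {
  // Same guard as step 3: treat an Infinity duration as 0.
  const validMaxDuration = maxDuration === Infinity ? 0 : maxDuration;
  const maxHours = Math.floor(validMaxDuration / 3600);
  const maxMinutes = Math.floor((validMaxDuration % 3600) / 60);
  const maxSeconds = Math.floor(validMaxDuration % 60);

  let minuteOptions = range(59);
  let secondOptions = range(59);
  // Only the last hour (and, within it, the last minute) is cut short.
  if (hours === maxHours) {
    minuteOptions = range(maxMinutes);
    if (minutes === maxMinutes) {
      secondOptions = range(maxSeconds);
    }
  }
  return { hourOptions: range(maxHours), minuteOptions, secondOptions };
}

// A file of 3725.8 seconds (about 1h 2m 5.8s) with 01:02 selected:
const r = optionRanges(1, 2, 3725.8);
console.log(r.hourOptions, r.minuteOptions, r.secondOptions);
// prints [ 0, 1 ] [ 0, 1, 2 ] [ 0, 1, 2, 3, 4, 5 ]
```

Returning plain arrays keeps the rule easy to test; inside the component, the same values would still be stored with the state setters.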

The main component

Now let’s build the main frontend component by replacing the contents of App.js.

The main component (declared as TranscriptionPage in the code below) implements a transcription page with the following functionality:

  • Define helper functions for time format conversion.
  • Update startTime and endTime based on selection from the TimePicker component.
  • Define a getAudioDuration function that retrieves the duration of the audio file and updates the audioDuration state.
  • Handle file uploads for the audio file to be transcribed.
  • Define a transcribeAudio function that sends the audio file by making an HTTP POST request to our API.
  • Render UI for file upload.
  • Render TimePicker components for selecting startTime and endTime.
  • Display notification messages.
  • Display the transcribed text.
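The time-conversion helpers from the first bullet aren’t shown in this section. As a rough idea of what such helpers need to do, here is a minimal sketch with hypothetical names: secondsToTime formats a duration in seconds (such as the value getAudioDuration stores in audioDuration, assumed here to be in seconds) as HH:MM:SS, and initialEndTime assumes a policy of shortening the default '00:10:00' end time when the uploaded file is shorter. The article’s own helpers may differ.

```javascript
// Hypothetical helper: format a duration in seconds (for example, an
// HTMLAudioElement's duration property) as "HH:MM:SS", dropping fractional seconds.
function secondsToTime(totalSeconds) {
  const whole = Math.floor(totalSeconds);
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const seconds = whole % 60;
  return [hours, minutes, seconds].map((n) => String(n).padStart(2, '0')).join(':');
}

// Hypothetical policy: keep the default end time unless the file is shorter,
// in which case end at the file's last full second. Infinity or NaN keeps the default.
function initialEndTime(audioDuration, defaultEnd = '00:10:00') {
  const [h, m, s] = defaultEnd.split(':').map(Number);
  const defaultSeconds = h * 3600 + m * 60 + s;
  return audioDuration < defaultSeconds ? secondsToTime(audioDuration) : defaultEnd;
}

console.log(secondsToTime(3725.8)); // prints 01:02:05
console.log(initialEndTime(95.5)); // prints 00:01:35
console.log(initialEndTime(1200)); // prints 00:10:00
```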

Let’s break this component down into several smaller sections:

  1. Imports and helper functions. Import necessary modules and define helper functions for time conversions:

    <code class="javascript language-javascript"><span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">React</span><span class="token punctuation">,</span> <span class="token punctuation">{</span> useState<span class="token punctuation">,</span> useCallback <span class="token punctuation">}</span></span> <span class="token keyword module">from</span> <span class="token string">'react'</span><span class="token punctuation">;</span>
    <span class="token keyword module">import</span> <span class="token imports"><span class="token punctuation">{</span> useDropzone <span class="token punctuation">}</span></span> <span class="token keyword module">from</span> <span class="token string">'react-dropzone'</span><span class="token punctuation">;</span> 
    <span class="token keyword module">import</span> <span class="token imports">axios</span> <span class="token keyword module">from</span> <span class="token string">'axios'</span><span class="token punctuation">;</span> 
    <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">TimePicker</span></span> <span class="token keyword module">from</span> <span class="token string">'./TimePicker'</span><span class="token punctuation">;</span> 
    <span class="token keyword module">import</span> <span class="token imports"><span class="token punctuation">{</span> toast<span class="token punctuation">,</span> <span class="token maybe-class-name">ToastContainer</span> <span class="token punctuation">}</span></span> <span class="token keyword module">from</span> <span class="token string">'react-toastify'</span><span class="token punctuation">;</span> 
    
    <span class="token comment">// Helper functions for time conversion (bodies not shown here)</span>
    </code>
  2. Component declaration and state hooks. Declare the TranscriptionPage component and initialize state hooks:

    <code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token function-variable function"><span class="token maybe-class-name">TranscriptionPage</span></span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
      <span class="token keyword">const</span> <span class="token punctuation">[</span>uploading<span class="token punctuation">,</span> setUploading<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span><span class="token boolean">false</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token keyword">const</span> <span class="token punctuation">[</span>transcription<span class="token punctuation">,</span> setTranscription<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span><span class="token string">''</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token keyword">const</span> <span class="token punctuation">[</span>audioFile<span class="token punctuation">,</span> setAudioFile<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span><span class="token keyword null nil">null</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token keyword">const</span> <span class="token punctuation">[</span>startTime<span class="token punctuation">,</span> setStartTime<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span><span class="token string">'00:00:00'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      <span class="token keyword">const</span> <span class="token punctuation">[</span>endTime<span class="token punctuation">,</span> setEndTime<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span><span class="token string">'00:10:00'</span><span class="token punctuation">)</span><span class="token punctuation">;</span> 
      <span class="token keyword">const</span> <span class="token punctuation">[</span>audioDuration<span class="token punctuation">,</span> setAudioDuration<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">useState</span><span class="token punctuation">(</span><span class="token keyword null nil">null</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      
    </code>
  3. Event handlers. Define the event handlers for changing the start time, getting the audio duration, handling file drops, and transcribing the audio:

    <code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token function-variable function">handleStartTimeChange</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token parameter">newStartTime</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
      
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    
    <span class="token keyword">const</span> <span class="token function-variable function">getAudioDuration</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token parameter">file</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
      
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    
    <span class="token keyword">const</span> onDrop <span class="token operator">=</span> <span class="token function">useCallback</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token parameter">acceptedFiles</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
      
    <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    
    <span class="token keyword">const</span> <span class="token function-variable function">transcribeAudio</span> <span class="token operator">=</span> <span class="token keyword">async</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span> 
      
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    </code>
  4. Use the Dropzone hook. Use the useDropzone hook from the react-dropzone library to handle file drops:

    <code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token punctuation">{</span> getRootProps<span class="token punctuation">,</span> getInputProps<span class="token punctuation">,</span> isDragActive<span class="token punctuation">,</span> isDragReject <span class="token punctuation">}</span> <span class="token operator">=</span> <span class="token function">useDropzone</span><span class="token punctuation">(</span><span class="token punctuation">{</span>
      onDrop<span class="token punctuation">,</span>
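      accept<span class="token operator">:</span> <span class="token punctuation">{</span> <span class="token string">'audio/*'</span><span class="token operator">:</span> <span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token comment">// react-dropzone v14+ expects an object mapping MIME types to extensions</span>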
    <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    </code>
  5. Render. Finally, render the component. This includes a dropzone for file upload, TimePicker components for setting start and end times, a button for starting the transcription process, and a display for the resulting transcription.
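The handler bodies are left as stubs above. As an illustration of the kind of check handleStartTimeChange (or the Transcribe button) might need, here is a small framework-free sketch that converts the 'HH:MM:SS' strings held in the startTime and endTime state into seconds and validates the range. toSeconds and isValidRange are hypothetical helpers, not part of the original app, and the rule that the range must fit inside audioDuration (assumed to be in seconds, as HTMLMediaElement.duration reports) is an assumption.

```javascript
// Hypothetical helpers (not from the original app): validate the time
// boundaries held in the startTime / endTime state ('HH:MM:SS' strings).

// Convert an 'HH:MM:SS' string into a whole number of seconds.
function toSeconds(time) {
  const parts = time.split(':').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Expected HH:MM:SS, got "${time}"`);
  }
  const [hours, minutes, seconds] = parts;
  return hours * 3600 + minutes * 60 + seconds;
}

// Assumed rule: the start must come before the end, and when the audio
// duration (in seconds) is known, the end must not run past it.
function isValidRange(startTime, endTime, audioDuration = null) {
  const start = toSeconds(startTime);
  const end = toSeconds(endTime);
  if (start >= end) return false;
  if (audioDuration !== null && end > audioDuration) return false;
  return true;
}

// Example using the component's default state values.
console.log(isValidRange('00:00:00', '00:10:00'));      // true
console.log(isValidRange('00:05:00', '00:02:00'));      // false
console.log(isValidRange('00:00:00', '00:10:00', 300)); // false
```

Because these helpers don't touch React state, they can be called from any handler and tested on their own; the handler logic in the author's repository may well differ.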

The transcribeAudio function is an asynchronous function responsible for sending the audio file to a server for transcription. Let’s break it down:

<code class="javascript language-javascript"><span class="token keyword">const</span> <span class="token function-variable function">transcribeAudio</span> <span class="token operator">=</span> <span class="token keyword">async</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token arrow operator">=></span> <span class="token punctuation">{</span>
    <span class="token function">setUploading</span><span class="token punctuation">(</span><span class="token boolean">true</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token keyword control-flow">try</span> <span class="token punctuation">{</span>
      <span class="token keyword">const</span> formData <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">FormData</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      audioFile <span class="token operator">&&</span> formData<span class="token punctuation">.</span><span class="token method function property-access">append</span><span class="token punctuation">(</span><span class="token string">'file'</span><span class="token punctuation">,</span> audioFile<span class="token punctuation">)</span><span class="token punctuation">;</span>
      formData<span class="token punctuation">.</span><span class="token method function property-access">append</span><span class="token punctuation">(</span><span class="token string">'startTime'</span><span class="token punctuation">,</span> <span class="token function">timeToMinutesAndSeconds</span><span class="token punctuation">(</span>startTime<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      formData<span class="token punctuation">.</span><span class="token method function property-access">append</span><span class="token punctuation">(</span><span class="token string">'endTime'</span><span class="token punctuation">,</span> <span class="token function">timeToMinutesAndSeconds</span><span class="token punctuation">(</span>endTime<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

      <span class="token keyword">const</span> response <span class="token operator">=</span> <span class="token keyword control-flow">await</span> axios<span class="token punctuation">.</span><span class="token method function property-access">post</span><span class="token punctuation">(</span><span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">http://localhost:3001/api/transcribe</span><span class="token template-punctuation string">`</span></span><span class="token punctuation">,</span> formData<span class="token punctuation">,</span> <span class="token punctuation">{</span>
        headers<span class="token operator">:</span> <span class="token punctuation">{</span> <span class="token string">'Content-Type'</span><span class="token operator">:</span> <span class="token string">'multipart/form-data'</span> <span class="token punctuation">}</span><span class="token punctuation">,</span>
      <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

      <span class="token function">setTranscription</span><span class="token punctuation">(</span>response<span class="token punctuation">.</span><span class="token property-access">data</span><span class="token punctuation">.</span><span class="token property-access">transcription</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
      toast<span class="token punctuation">.</span><span class="token method function property-access">success</span><span class="token punctuation">(</span><span class="token string">'Transcription successful.'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword control-flow">catch</span> <span class="token punctuation">(</span>error<span class="token punctuation">)</span> <span class="token punctuation">{</span>
      console<span class="token punctuation">.</span><span class="token method function property-access">error</span><span class="token punctuation">(</span>error<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// keep the details for debugging</span>
      toast<span class="token punctuation">.</span><span class="token method function property-access">error</span><span class="token punctuation">(</span><span class="token string">'An error occurred during transcription.'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword control-flow">finally</span> <span class="token punctuation">{</span>
      <span class="token function">setUploading</span><span class="token punctuation">(</span><span class="token boolean">false</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
  <span class="token punctuation">}</span><span class="token punctuation">;</span>
</code>

Here’s a more detailed look:

  1. setUploading(true). This line sets the uploading state to true, which we use to show the user that the transcription process has started.

  2. const formData = new FormData(). FormData is a Web API for building a set of key–value pairs that represent form fields, which can then be sent to the server as the request body. Each value can be a string, a Blob or a File.

  3. The audioFile is appended to the formData object, provided it’s not null (audioFile && formData.append('file', audioFile);). The start and end times are also appended, but timeToMinutesAndSeconds first converts them from the HH:MM:SS strings held in state to MM:SS format.

  4. The axios.post method sends formData to the server endpoint http://localhost:3001/api/transcribe. If your server runs elsewhere, replace http://localhost:3001 with its address. The call is awaited, so the function pauses until the Promise is either resolved or rejected.

  5. If the request succeeds, the response object contains the transcription result (response.data.transcription). This is stored in the transcription state with setTranscription, and a success toast notification is shown.

  6. If an error occurs during the process, an error toast notification is shown.

  7. In the finally block, regardless of the outcome (success or error), the uploading state is set back to false to allow the user to try again.
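To inspect the payload outside React, here is a framework-free sketch of steps 2 and 3. The body of timeToMinutesAndSeconds is a hypothetical stand-in (the real helper isn't shown in this section) that assumes 'HH:MM:SS' input and folds the hours into the minutes; buildTranscriptionForm is likewise a hypothetical name. It uses an if statement in place of the && short-circuit, which behaves the same here.

```javascript
// Hypothetical stand-in for the app's timeToMinutesAndSeconds helper.
// Assumes 'HH:MM:SS' input and folds hours into minutes: '01:05:09' -> '65:09'.
function timeToMinutesAndSeconds(time) {
  const [hours, minutes, seconds] = time.split(':').map(Number);
  const totalMinutes = hours * 60 + minutes;
  return `${String(totalMinutes).padStart(2, '0')}:${String(seconds).padStart(2, '0')}`;
}

// Builds the same multipart fields transcribeAudio sends. FormData and Blob
// are browser globals and are also available globally in Node.js 18+.
function buildTranscriptionForm(audioFile, startTime, endTime) {
  const formData = new FormData();
  if (audioFile) {
    formData.append('file', audioFile);
  }
  formData.append('startTime', timeToMinutesAndSeconds(startTime));
  formData.append('endTime', timeToMinutesAndSeconds(endTime));
  return formData;
}

// Demo with a stand-in Blob; in the app, audioFile would be the dropped File.
const fakeAudio = new Blob(['not really audio'], { type: 'audio/mpeg' });
const form = buildTranscriptionForm(fakeAudio, '00:00:00', '00:10:00');
for (const [key, value] of form) {
  // Strings print as-is; the file entry prints its size instead of its bytes.
  console.log(key, typeof value === 'string' ? value : `<${value.size} bytes>`);
}
```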

In essence, the transcribeAudio function is responsible for coordinating the entire transcription process, including handling the form data, making the server request, and handling the server response.
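If you'd rather not depend on axios, the same coordination can be written with the Fetch API. The sketch below is an alternative, not the article's code: requestTranscription and runTranscription are hypothetical names, the callbacks stand in for setUploading, setTranscription and the toast calls, and the endpoint and { transcription } response shape are taken from the axios version of transcribeAudio. fetchImpl is injectable so the flow can run without a server.

```javascript
// Hypothetical fetch-based variant of transcribeAudio's network step.
// fetchImpl defaults to the global fetch but can be swapped for a stub.
async function requestTranscription(formData, {
  url = 'http://localhost:3001/api/transcribe',
  fetchImpl = globalThis.fetch,
} = {}) {
  // No Content-Type header: with a FormData body, fetch sets
  // multipart/form-data together with the required boundary parameter.
  const response = await fetchImpl(url, { method: 'POST', body: formData });

  // Unlike axios, fetch does not reject on HTTP error statuses such as 500,
  // so the status must be checked explicitly.
  if (!response.ok) {
    throw new Error(`Transcription request failed with status ${response.status}`);
  }

  // Assumes the server replies with JSON shaped like { transcription: "..." }.
  const data = await response.json();
  return data.transcription;
}

// Mirrors the try/catch/finally structure of transcribeAudio, with plain
// callbacks standing in for React state setters and toast notifications.
async function runTranscription(formData, { setUploading, onSuccess, onError, fetchImpl }) {
  setUploading(true);
  try {
    const text = await requestTranscription(formData, { fetchImpl });
    onSuccess(text);
  } catch (error) {
    onError(error);
  } finally {
    setUploading(false);
  }
}
```

The main design difference: axios rejects the Promise for non-2xx responses by default, so the catch block in transcribeAudio already covers server errors; with fetch, the explicit response.ok check is what routes a 500 into the error path.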

You can find the full source code of the App component on GitHub.

Conclusion

We’ve reached the end and now have a full web application that transcribes speech to text with the power of Whisper.

We could definitely add a lot more functionality, but I’ll let you build the rest on your own. Hopefully we’ve gotten you off to a good start.

Here’s the full source code:



