
Using LLMs As Virtual Assistants for Python Programming

In recent years, artificial intelligence has dominated the technology landscape and made a transformative impact on virtually every industry, from the creative arts to finance to management. Large language models (LLMs) such as OpenAI’s GPT and Google’s Gemini are improving at breakneck speeds and have started to play an essential role in a software engineer’s toolkit.

Though the current generation of LLMs can’t replace software engineers, these models are capable of serving as intelligent digital assistants that can help with coding and debugging some straightforward and routine tasks. In this article, I leverage my experience developing AI and machine learning solutions to explain the intricacies of using LLMs to generate code capable of interacting with external resources.

Defining Large Language Models

An LLM is a machine learning model that has been trained on very large quantities of text data with the goal of understanding and generating human language. An LLM is typically built using transformers, a type of neural network architecture that works on a “self-attention mechanism,” meaning that entire input sequences are processed simultaneously rather than word by word. This allows the model to analyze entire sentences, significantly improving its understanding of latent semantics—the underlying meaning and intent conveyed by text. Essentially, LLMs understand context, making them effective in generating text in a humanlike style.
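To make the idea concrete, here is a deliberately simplified sketch of scaled dot-product self-attention using NumPy. It omits the learned query/key/value projections, multiple attention heads, and masking that real transformers use; it only illustrates how every token’s representation is recomputed, simultaneously, as a weighted mix of every other token in the sequence:

import numpy as np

def self_attention(X):
    # X has shape (sequence_length, embedding_dim): one embedding per token.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # Similarity of every token with every other token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # Softmax over each row.
    return weights @ X  # Each output row blends information from the whole sequence.

# Three "tokens," each with a four-dimensional embedding, processed simultaneously.
tokens = np.random.rand(3, 4)
print(self_attention(tokens).shape)  # (3, 4)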

The deeper the network, the better it can capture subtle meanings in human language. A modern LLM requires vast amounts of training data and might feature billions of parameters—the elements learned from the training data—since the hope is that increased depth will lead to improved performance in tasks like reasoning. GPT-3, for example, was trained on roughly 45TB of compressed text scraped from published books and the internet, and contains approximately 175 billion parameters.

Alongside GPT-3 and GPT-4, several other LLMs have made considerable advancements; these include Google’s PaLM 2 and Meta’s Llama 2.

Because their training data has included programming languages and software development, LLMs have learned to generate code as well. Modern LLMs are able to transform natural language text prompts into working code in a wide range of programming languages and technology stacks, though leveraging this powerful capability requires a certain level of technical expertise.

The Benefits and Limitations of LLM Code Generation

While complex tasks and problem-solving will most likely always require the attention of human developers, LLMs can act as intelligent assistants, writing code for less complicated tasks. Handing off repetitive tasks to an LLM can increase productivity and reduce development time in the design process, especially with early-phase tasks like prototyping and concept validation. Additionally, an LLM can provide valuable insights into the debugging process by explaining code and finding syntax errors that can be difficult for humans to spot after a long day of writing code.

That said, any code generated by an LLM should be considered a starting point and not a finished product—the code should always be reviewed and thoroughly tested. Developers should also be aware of the limitations of LLMs. Because they lack the problem-solving and improvisational skills of humans, LLMs struggle with complex business logic and challenges that require innovative solutions. Additionally, LLMs may not have the proper training to tackle projects that are domain specific or use specialized or proprietary frameworks. Overall, LLMs can be effective assistants, but human developers remain an essential part of the development process.

Generating Code With an LLM: A Weather API Call Example

One critical component of most modern applications is the requirement to interface with external resources. Developers are frequently required to call an API (application programming interface), a set of instructions or protocols that provide a standardized way for applications to communicate with each other. Writing the code to consume an API often involves a deep dive into its documentation, which details all of the requirements, including the naming conventions, the operations it supports, the data format of the input and response, and security requirements.

As long as it is given the required information, an LLM may be able to speed up the process by helping to generate the code for an API request. This example walks through how to prompt ChatGPT to generate a call to the OpenWeather API in Python, and how to avoid errors such as the use of deprecated functions. Our example then shows how to send the same prompt through OpenAI’s API, demonstrating how changing parameters can affect an LLM’s output.

Note that there is a cost associated with using the OpenWeather API, though the API can be called for free 1,000 times per day.

Turning ChatGPT Prompts Into API Calls

The requirements for our example application include displaying the current temperature and a description of the weather. To see how ChatGPT can help us write this code, we prompt as follows:

Write a Python script that uses the Python “requests” library to perform a GET request to the OpenWeather API at the endpoint “http://api.openweathermap.org/data/2.5/weather”. The API call should include an API key in the query string parameters under the key “appid.” The API should return data in JSON format. The function should return the current temperature in Fahrenheit.

The following Python code is returned:

import requests

# Replace "YOUR_API_KEY" with your actual OpenWeather API key.
api_key = "YOUR_API_KEY"
api_url = "http://api.openweathermap.org/data/2.5/weather"

# City for which you want to get weather information.
city_name = "London"

# Parameters for the API request.
params = {
    'q': city_name,
    'appid': api_key,
    'units': 'imperial'  # Use "metric" for Celsius or "imperial" for Fahrenheit.
}

try:
    # Make the GET request.
    response = requests.get(api_url, params=params)

    # Check if the request was successful (status code 200).
    if response.status_code == 200:
        # Parse and print the JSON response.
        weather_data = response.json()
        print("Weather Data for", city_name)
        print("Temperature:", weather_data['main']['temp'],"°F")
        print("Description:", weather_data['weather'][0]["description"])
    else:
        # Print an error message if the request was not successful.
        print("Error:", response.status_code, response.text)

except Exception as e:
    print("An error occurred:", e)

Note that in the prompt, the actual location was not specified; however, the generated code includes a variable named city_name, which has a default value of “London.” Don’t forget to replace YOUR_API_KEY with your personal OpenWeather API key.
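Rather than hardcoding the key in the script, one common option is to read it from an environment variable; the variable name OPENWEATHER_API_KEY below is simply an illustrative choice:

import os

# Read the OpenWeather API key from an environment variable instead of
# hardcoding it in the source (the variable name is an arbitrary example).
api_key = os.environ.get("OPENWEATHER_API_KEY")
if not api_key:
    raise RuntimeError("Please set the OPENWEATHER_API_KEY environment variable.")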

To test the code, the Python interpreter must be installed on your machine, after which the code can be run using an IDE such as Visual Studio Code, or by invoking the script via the command line. The following weather data is output for the city of London:

Weather Data for London
Temperature: 53.08°F       
Description: overcast clouds

In many systems, the information that is retrieved from an API might need to be stored or input into another system, such as a database or even another API. In these cases, the information would need to be converted to a specific format, such as XML or CSV, or even incorporated in a SQL command that inserts or updates data in a database table. The LLM can help by generating the code for custom output formats. This sentence is added to the original prompt:

The output should be returned in the form of a SQL command that inserts the returned data into a table called ‘current_weather’, where the columns are “city,” “temperature,” “weather_description,” and “update_date.”

ChatGPT returns the same code as above, with a few added import statements and an updated output format:

import requests
import json
from datetime import datetime

# …

try:
    # Make the GET request.
    response = requests.get(api_url, params=params)

    # Check if the request was successful (status code 200).
    if response.status_code == 200:
        # Parse and print the JSON response.
        weather_data = response.json()

        # Extract relevant information.
        city = city_name
        temperature = weather_data['main']['temp']
        weather_description = weather_data['weather'][0]['description']
        update_date = datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')

        # Generate SQL command for insertion.
        sql_command = f"INSERT INTO current_weather (city, temperature, weather_description, update_date) VALUES ('{city}', {temperature}, '{weather_description}', '{update_date}')"
            
        # Print the SQL command.
        print(sql_command)
    else:
        # Print an error message if the request was not successful.
        print("Error:", response.status_code, response.text)

except Exception as e:
    print("An error occurred:", e)

Running this code results in the correct SQL command:

INSERT INTO current_weather (city, temperature, weather_description, update_date) VALUES ('London', 53.37, 'broken clouds', '2024-02-06 04:43:35')

However, a warning is returned due to the utcnow() function being deprecated:

DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

To prevent ChatGPT from using deprecated functions, we add to our prompt:

Please do not use any functions that are deprecated.

After adding this line, ChatGPT replaces the deprecated utcnow() function with the following:

from datetime import datetime, timezone  # The import is updated to include timezone.

# Use timezone-aware object for update_date.
update_date = datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')

This code once again returns the correct SQL command. SQL commands can be tested using various IDEs like Visual Studio Code or query editors in database management tools. In a typical web application, the SQL command would be run immediately after the API call, updating a table in the database in real time.
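As a rough sketch of that last step, the generated statement could be executed with Python’s built-in sqlite3 module; using placeholders rather than string interpolation also protects against malformed or malicious values. This sketch assumes a local SQLite file named weather.db whose current_weather table already exists, and reuses the city, temperature, weather_description, and update_date variables from the snippet above:

import sqlite3

# Sketch only: insert the values extracted from the API response into a local
# SQLite database. Assumes weather.db and the current_weather table exist, and
# that city, temperature, weather_description, and update_date are defined above.
conn = sqlite3.connect("weather.db")
conn.execute(
    "INSERT INTO current_weather (city, temperature, weather_description, update_date) "
    "VALUES (?, ?, ?, ?)",
    (city, temperature, weather_description, update_date),
)
conn.commit()
conn.close()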

As long as they are given accurate guidance, LLMs are capable of structuring their output into virtually any format, including SQL commands, JSON, or even a call to another API.
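For instance, forwarding the same values as JSON to a downstream service requires only a small change; the endpoint URL below is a hypothetical placeholder:

import requests

# Illustrative sketch: package the extracted values as JSON and POST them to a
# downstream API. The endpoint URL is a hypothetical placeholder.
payload = {
    "city": city,
    "temperature": temperature,
    "weather_description": weather_description,
    "update_date": update_date,
}
requests.post("https://example.com/api/weather", json=payload, timeout=10)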

Using the OpenAI API Instead of ChatGPT

Many LLMs have API counterparts that enable developers to interact with LLMs programmatically and integrate them seamlessly into applications. This enables you to create your own virtual AI assistant, with features such as code generation for autocompletion, refactoring, and optimization. User interfaces can be optimized for specific domains and customized to populate predefined prompt templates. Integrating an LLM programmatically also allows tasks to be scheduled or triggered, facilitating the creation of an automated virtual assistant.

In this example, we’ll perform the same weather retrieval task, now using Python to interact with the OpenAI API instead of manually prompting ChatGPT for code. The OpenAI Python library can be installed using the following command:

pip install openai

To use the OpenAI API, an API key is required. The API has a cost associated with it; however, new accounts have free access for a limited time frame.

After the library is installed, it can be imported into your code. The following code sends the prompt that was created above, omitting the instruction to output a SQL command. The model that is specified is GPT-3.5:

import openai

openai.api_key = "YOUR_API_KEY"  # Replace with your actual API key.

response = openai.chat.completions.create(
  model='gpt-3.5-turbo',
  max_tokens=1024, temperature=0.3, top_p=0.9,
  messages=[
        {'role': 'system', 'content': 'You are an intelligent AI assistant.'},
        {'role': 'user', 'content': "Write a Python script that uses the Python 'requests' library to perform a GET request to the OpenWeather API at the endpoint 'http://api.openweathermap.org/data/2.5/weather'. The API call should include an API key in the query string parameters under the key 'appid.' The data returned by the API is in JSON format. Please do not use any functions that are deprecated."},
    ],
)

response_message = response.choices[0].message.content
print(response_message)

The LLM generates the following Python code, which retrieves the temperature for London:

import requests

# API endpoint and API key.
url = "http://api.openweathermap.org/data/2.5/weather"
api_key = "YOUR_API_KEY"

# Query string parameters.
params = {
    'q': 'London,uk',  # Example city and country code.
    'appid': api_key
}

# Send GET request.
response = requests.get(url, params=params)

# Check if request was successful.
if response.status_code == 200:
    # Parse JSON response.
    data = response.json()

    # Print the temperature in Celsius.
    temperature = data['main']['temp'] - 273.15
    print(f"The temperature in London is {temperature:.2f}°C.")
else:
    print(f"Error: {response.status_code}")

Note that the instruction to retrieve the temperature in degrees Fahrenheit was also omitted. The LLM did not specify the units in the API call, but it chose to mathematically convert the result from kelvin (the OpenWeather default) to Celsius when displaying it.
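As in the earlier ChatGPT-generated code, the manual conversion can be avoided by asking OpenWeather for the desired units directly in the query string parameters:

params = {
    'q': 'London,uk',
    'appid': api_key,
    'units': 'imperial'  # Returns Fahrenheit directly; use "metric" for Celsius.
}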

Leveraging LLM-specific Parameters

When using the API, many of the LLM’s parameters can be adjusted, altering the responses that are generated. Some parameters change the level of randomness and creativity, while others focus on repetition. While parameters may have more of an influence when generating natural language text, adjusting them can also influence code generation.

In the previous code, GPT’s parameters can be adjusted in line 7:

max_tokens=1024, temperature=0.3, top_p=0.9,

The following parameters can be adjusted:

temperature
Description: The temperature parameter adjusts the randomness of the generated text, essentially the “creativity” of the response. A higher temperature increases randomness, while a lower temperature results in more predictable responses. The temperature can be set between 0 and 2. The default is either 0.7 or 1, depending on the model.
Code generation impact: A lower temperature produces safer code that follows the patterns and structures learned during training. A higher temperature may result in more unique and unconventional code, but it may also introduce errors and inconsistencies.

max_tokens
Description: The max_tokens parameter sets a limit on how many tokens the LLM will generate. If it is set too low, the response may only be a few words. Setting it too high may waste tokens, increasing costs.
Code generation impact: max_tokens should be set high enough to include all the code that needs to be generated. It can be decreased if you don’t want any explanations from the LLM.

top_p
Description: Top P, or nucleus sampling, influences what the next word or phrase might be by limiting the choices that the LLM considers. top_p has a maximum value of 1 and a minimum value of 0. Setting top_p to 0.1 tells the LLM to limit the next token to the top 10% of the most probable ones; setting it to 0.5 changes that to the top 50%, yielding a wider range of responses.
Code generation impact: With a low top_p value, the generated code will be more predictable and contextually relevant, as only the most probable tokens are used. Raising top_p allows more diversity in the output, but it can lead to irrelevant or nonsensical code snippets.

frequency_penalty
Description: The frequency_penalty parameter reduces the repetition of words or phrases in the LLM’s response. With a high frequency penalty, the LLM avoids repeating words that were used earlier; a lower frequency penalty allows more repetition. The frequency_penalty parameter has a maximum value of 2 and a minimum value of 0.
Code generation impact: With a higher frequency penalty, the generated code will be less repetitive and potentially more innovative; however, the LLM may choose elements that are less efficient or even incorrect. With a lower frequency penalty, the code might not explore alternative approaches. Experimentation might be needed to find the optimal value.

presence_penalty
Description: The presence_penalty parameter is related to frequency_penalty in that both encourage a more diverse word range. However, while frequency_penalty penalizes tokens that have appeared many times in the text, presence_penalty penalizes any token that has already appeared, regardless of its frequency. The net effect is that frequency_penalty tries to reduce the repetition of words, while presence_penalty focuses on using entirely new words. The presence_penalty parameter has a maximum value of 2 and a minimum value of 0.
Code generation impact: Similar to frequency_penalty, a high presence penalty encourages the LLM to explore alternative solutions; however, the generated code may be less efficient or even incorrect. A lower presence penalty allows more code to be repeated, which can result in more concise and efficient code, especially when there are repetitive patterns.

stop
Description: A stop sequence tells the LLM when to stop generating text. For example, setting stop to "\n" tells the LLM to stop after a single line.
Code generation impact: A stop sequence prevents an LLM from generating excess or irrelevant code. The stop sequence should be a natural ending point for the code snippet, for example, the end of a function or a loop.

To see how these parameters affect code generation in action, we’ll experiment with adjusting the frequency_penalty and presence_penalty parameters.
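These parameters are passed directly to the completion call alongside the ones shown earlier. The following sketch shows the setup for the first experiment; the prompt variable stands in for the same weather prompt used above, and presence_penalty is added the same way in the second experiment:

import openai

openai.api_key = "YOUR_API_KEY"  # Replace with your actual API key.
prompt = "Write a Python script that uses the Python 'requests' library ..."  # Same weather prompt as above.

response = openai.chat.completions.create(
    model='gpt-3.5-turbo',
    max_tokens=1024,
    temperature=0.3,
    top_p=0.9,
    frequency_penalty=1.5,  # High value to discourage repeating earlier output.
    messages=[
        {'role': 'system', 'content': 'You are an intelligent AI assistant.'},
        {'role': 'user', 'content': prompt},
    ],
)
print(response.choices[0].message.content)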

When frequency_penalty was set to 1.5 (a higher value), the LLM generated additional code to catch exceptions, presumably in an effort to avoid repeating previous outputs:

#...

try:
    # Send GET request with params and get response data in JSON format.
    response = requests.get(url, params=params)

    if response.status_code == 200:
        weather_data = response.json()

        # Print out some relevant information from the weather data.
        print("City:", weather_data['name'])
        print("Temperature:", weather_data['main']['temp'], "K")

except requests.exceptions.RequestException as e:
   # Handle any error that occurred during the HTTP request.
   print("Error:", e)

While the additional code is highly beneficial, it is important to note that the results are unpredictable. The error handling prevents the program from crashing whenever the external API has issues, but because we did not ask the LLM to generate code to handle exceptions, its addition was more akin to a lucky guess. Running the API call with identical parameters a second time would most likely yield a different result. The only consistent way to tell the LLM to generate error handling code is to add those specific instructions to the initial prompt.

Setting presence_penalty to 2.0 (the highest value) had a similar effect. The LLM avoided repeating a previous output and instead placed the call to the OpenWeather API within a function, passing the API key as an argument:

import requests

def get_weather(api_key):
    url = "http://api.openweathermap.org/data/2.5/weather"
    params = {
        'q': 'London,uk',  # Example city and country code.
        'appid': api_key
    }

    # Send GET request.
    response = requests.get(url, params=params)

    if response.status_code == 200:
        data = response.json()
        return data
    else:
        print("Error:", response.status_code)

# Replace "YOUR_API_KEY" with your actual API key from OpenWeather.
api_key = "YOUR_API_KEY"

weather_data = get_weather(api_key)
print(weather_data)

While placing the API call in the function is a useful adjustment, passing the API key as an argument is curious. In most use cases, the argument that should be passed is city. This anomaly can be easily remedied by adjusting the initial prompt.

Adjusting an LLM’s parameters requires experimentation: the results are generally unpredictable, and adjusting certain parameters will often appear to have no effect. Changing the parameters has no consistent effect on whether best practices are followed. In general, settings that allow for more predictable outcomes and repetition will result in more concise and efficient code. On the other hand, settings that promote creativity and penalize repetition may result in more creative and innovative solutions, albeit with a higher chance of nonfunctioning or irrelevant code.

Integrating LLMs Into Codebases

Once the LLM-generated code is working and optimized, the next step in the development process is integrating it into an existing codebase. Directly pasting the code into existing code might lead to unorganized formatting and maintenance issues. A more streamlined approach is to convert the code into reusable functions or objects. This can be done with the LLM as long as detailed instructions are added to the prompt. Arguments should be specified, as well as output format.

To have the LLM create a Weather object that contains the get_weather function, the following instruction can be added to the prompt:

The API call should be placed in an object called “weather,” which contains a function called “get_weather,” which takes in an argument called “city.”

Based on this prompt, the LLM generates the following Python code:

import requests

class Weather:
    def __init__(self, api_key):
        self.api_key = api_key
        self.temperature = None

    def get_weather(self, city):
        url = 'http://api.openweathermap.org/data/2.5/weather'
        params = {
            'q': city,
            'appid': self.api_key,
            'units': 'imperial'  # Request temperature in Fahrenheit.
        }

        response = requests.get(url, params=params)
        data = response.json()

        if response.status_code == 200:
            self.temperature = data['main']['temp']
            print(f"The temperature in {city} is {self.temperature}°F.")
        else:
            print(f"Error: {data['message']}")

# Replace "YOUR_API_KEY" with your actual API key from OpenWeather.
api_key = "YOUR_API_KEY"

weather = Weather(api_key)
weather.get_weather('London')

The LLM generates a working function in a Weather object. The object can now be integrated into another application; however, error handling should be incorporated to ensure that the application does not crash or inadvertently display sensitive information.

Performing Error Handling and Testing

In the real world, API calls can fail for a number of reasons. To ensure reliability, the code must include proper error handling. In many of the previous examples, the LLM-generated code contained a check to see if the HTTP status code that was returned was 200 (a successful response) and showed an error description for any other response. But this necessary check does not take into account common error scenarios such as connection timeouts or DNS resolution errors.
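For example, the requests library lets the caller set an explicit timeout and raises distinct exception types for these failures. A minimal sketch follows; the 10-second timeout is an arbitrary choice, and api_url and params are the values defined in the earlier code:

import requests

# Sketch: handle the failure modes described above explicitly. The timeout
# value is an arbitrary example; api_url and params come from the earlier code.
try:
    response = requests.get(api_url, params=params, timeout=10)
    response.raise_for_status()  # Raises HTTPError for non-2xx status codes.
except requests.exceptions.Timeout:
    print("Error: the request timed out.")
except requests.exceptions.ConnectionError:
    print("Error: could not reach the API (DNS or network failure).")
except requests.exceptions.HTTPError as e:
    print("Error:", e)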

For reliable operation, the code should contain error handling mechanisms, such as wrapping the API call in a try-except block (Python’s equivalent of a try-catch block). The following text was appended to the previous prompt:

For error handling, incorporate a try-catch block when making the API call. The code should also confirm that the HTTP response status is 200 and display the error description when it is not.

This prompt generated Python code containing a try-except block to handle any potential errors:

import requests

class Weather:
    def __init__(self, api_key):
        self.api_key = api_key
        self.temperature = None

    def get_weather(self, city):
        try:
            # Construct the API URL with the provided city and API key.
            url = "http://api.openweathermap.org/data/2.5/weather"
            params = {'q': city, 
                      'appid': self.api_key,
                      'units': 'imperial'  # Get temperature in Fahrenheit.
                      }

            # Send GET request to the API.
            response = requests.get(url, params=params)

            # Check if the response status is 200 (OK).
            if response.status_code == 200:
                # Parse the JSON response
                data = response.json()

                # Extract the temperature in Fahrenheit from the response.
                self.temperature = data['main']['temp']
            else:
                # Display the error description if the response status is not 200.
                print(f'Error: {response.status_code} - {response.json()["message"]}')
        except requests.exceptions.RequestException as e:
            # Handle any request exceptions.
            print(f'Error: {e}')

While the code generated by the LLM provides a good starting point, further modifications are typically required for additional functionality or customization. For example, errors should be logged, and configuration errors (e.g., an API authentication error or a “Service Unavailable” status) should generate notifications to administrators. Additionally, developers might add code to access a backup API if the primary API is down.
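As an illustration of those customizations, the following sketch logs failures with Python’s standard logging module, alerts an administrator on authentication or availability errors, and falls back to a secondary endpoint. The backup URL and notify_admin helper are hypothetical placeholders for whatever secondary provider and alerting mechanism an application actually uses:

import logging
import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("weather")

# PRIMARY_URL is the real OpenWeather endpoint; BACKUP_URL and notify_admin()
# are hypothetical placeholders for a secondary provider and an alerting channel.
PRIMARY_URL = "http://api.openweathermap.org/data/2.5/weather"
BACKUP_URL = "https://backup-weather.example.com/current"

def notify_admin(message):
    logger.error("ADMIN ALERT: %s", message)

def fetch_weather(params):
    for url in (PRIMARY_URL, BACKUP_URL):
        try:
            response = requests.get(url, params=params, timeout=10)
            if response.status_code == 200:
                return response.json()
            if response.status_code in (401, 503):  # Auth error or service unavailable.
                notify_admin(f"{url} returned {response.status_code}")
            logger.error("Request to %s failed with status %s", url, response.status_code)
        except requests.exceptions.RequestException as e:
            logger.error("Request to %s raised %s", url, e)
    return None  # Both endpoints failed.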

Once the code does everything it’s supposed to, the next crucial step is to test and confirm that it will hold up in real-life situations. Testing should be comprehensive, with a diverse array of test cases that include potential error conditions and edge cases. For increased reliability and faster feedback, you can automate testing. To assess real-world performance, measure metrics such as execution time, memory usage, and resource consumption to help identify potential bottlenecks in the system. Insights derived from continuous testing and monitoring can help refine prompts and fine-tune LLM parameters.
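As one example of automated testing, the HTTP call can be mocked so a unit test exercises the Weather class from the previous section without hitting the real API. This sketch assumes that class is defined in, or imported into, the same file as the test:

import unittest
from unittest.mock import Mock, patch

# Assumes the Weather class from the previous section is defined or imported here.
class TestWeather(unittest.TestCase):
    @patch("requests.get")
    def test_get_weather_parses_temperature(self, mock_get):
        # Simulate a successful OpenWeather response without any network traffic.
        mock_get.return_value = Mock(
            status_code=200,
            json=lambda: {"main": {"temp": 53.08},
                          "weather": [{"description": "overcast clouds"}]},
        )
        weather = Weather(api_key="dummy")
        weather.get_weather("London")
        self.assertEqual(weather.temperature, 53.08)

if __name__ == "__main__":
    unittest.main()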

The Evolution of LLMs

While LLMs are in no way a replacement for human expertise, their ability to generate code is a transformative innovation that can be of valuable assistance to developers. Not only can LLMs speed up the development cycle, but an LLM-based smart virtual assistant can also quickly generate multiple variations of the code, letting developers choose the optimal version. Delegating simpler tasks to an LLM improves developers’ productivity, letting them focus on complicated tasks that require specialized knowledge and human thought, such as problem-solving and designing the next generation of applications. With clear prompts and comprehensive testing, a developer can leverage APIs to add the functionality of an LLM to an application.

With more and more developers discovering the benefits of AI, the technology will improve very quickly; however, it is important to keep in mind responsible and ethical usage. Just like all generative AI users, software developers have a duty to keep an eye on data privacy violations, intellectual property, security concerns, unintended output, and potential biases in LLM training. LLMs are currently being heavily researched, and as the technology advances, they will evolve into seamlessly integrated intelligent virtual assistants.
