What Can AI Do for Data Science?
Figure 1. KNIME AI nodes to prompt an LLM. <AI prov> indicates the selected AI provider.
Rather than pondering whether AI will substitute data science, we can start thinking of what AI can actually do for data science. Is there a way to exploit what AI can do best to enrich our data science solutions?
LLMs were born to analyze and generate text. This is what they can do best. Take a question and produce a human-like answer. Take a comment and interpret its sentiment. Take a book and summarize it. Take a topic and generate text around it in any language and style. And so on.
In this blog post, I would like to run a quick overview of AI-based tasks that could be integrated in your data science application and provide an advantage in terms of better results, expanded functionality, or time saved.
- The obvious chatbot
- Text and image creation
- Text summarization
- Sentiment analysis
- Image description
- Conversational search
- Coding
- Data generation
- Augment analytics with GenAI
- Augment GenAI with analytics
The Obvious Chatbot
I have lost count of the AI-based chatbots I have seen since ChatGPT went live. This is the most common and most obvious AI-based application. Raise your hand if you have not implemented a chatbot yet. We did too.
For example, Vittorio Haardt & Roberto Cadili explain how they built a chatbot without coding in the popular article “How to build a custom AI-powered job finder chatbot”, published on the KNIME Blog on May 2nd, 2024.
Even earlier, Dayanjan S. Wijesinghe and his colleagues showed how to build a chatbot that retrieves information about clinical practice guidelines in the article “KNIME-Med-Chat-Bot: A Low Code Solution For AI Driven Conversational Information Extraction from Clinical Practice Guidelines”, from January 13th, 2024, published in the “Low Code for Data Science” journal.
Despite being used and abused, and despite posing little technical challenge, a chatbot is always useful. It can be used to answer questions about the usage of a product, to educate newbies on common practice guidelines, or for other similar tasks. You can easily implement a chatbot and confine it to a mini window in the lower right corner of your web page.
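For readers who prefer to see the mechanics in code, here is a minimal sketch of the loop behind such a chatbot. It assumes the openai Python package (v1 or later), an API key in the OPENAI_API_KEY environment variable, and an example model name; in KNIME, the AI nodes in Figure 1 issue these calls for you.

```python
# Minimal command-line chatbot sketch (assumes `pip install openai` and
# OPENAI_API_KEY set in the environment; the model name is an example).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The system message confines the bot to its intended role.
history = [{"role": "system", "content": "You answer questions about our product."}]

while True:
    user_msg = input("You: ")
    if user_msg.lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user_msg})
    response = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # keep conversational context
    print("Bot:", answer)
```

The same loop, wrapped in a small web widget, is all it takes to confine the chatbot to that mini window in the corner of the page.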
Text and Image Creation
Among the many tasks Large Language Models excel at is text generation. Whether you would like to generate an email, the lyrics of a song, a handbook, an invitation, a letter to Santa, or some other text, this can be easily done with GenAI. Just craft a prompt with the right request, in terms of language, content, tone, and style, and you will get your text.
The same goes for image generation. Just craft a prompt with the right request, in terms of content, colors, and style, and you will get your image.
Recently, Akash, an intern at KNIME, managed to create lyrics for rap songs using GenAI that turned out to be much better than what I had obtained a few years earlier by training an LSTM network. The rapbot (the rap generator data app) is available on the KNIME Community Hub for free download, and you can admire Akash’s rapping in the TikTok video “Generate a rap using KNIME”.
Figure 2. Rapping AI generated lyrics on TikTok
A similar application for image generation is described in this article “How to use GenAI for Image Generation the no-code way” published on the KNIME Blog in June 2024.
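To give a flavor of the “craft a prompt, get an image” idea in code, here is a minimal sketch using the OpenAI image endpoint; the model name, size, and prompt are assumptions, and in KNIME the equivalent request is issued by the corresponding AI nodes.

```python
# Sketch: generate an image from a text prompt (assumes `pip install openai`
# and OPENAI_API_KEY; model name, size, and prompt are examples).
from openai import OpenAI

client = OpenAI()

prompt = "A watercolor painting of a data scientist rapping on stage, warm colors"
result = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024", n=1)

print("Image available at:", result.data[0].url)  # download or embed this URL
```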
Text Summarization
Another task LLMs excel at is text summarization. Imagine you need to read a long dissertation and have very little time available: you could just get a summary from AI. Even better, you could insert this summary into a presentation describing the key results of the dissertation for your management team. We used this feature of LLMs to summarize the CVs of job applicants.
The workflow is actually quite simple to build. Of course, the perfectionist in you can make it as detailed and sophisticated as you like; however, the basic application is easy to implement in 5 simple steps (a rough code sketch of the core summarization step follows the list):
- Get an API key to access your preferred LLM provider, like OpenAI or Hugging Face, and insert it into a KNIME workflow via the Credentials Widget node.
- Use the API key to authenticate with your selected LLM provider and select the LLM to connect to.
- Upload the CV file and prompt the selected LLM to summarize it.
- Using the KNIME Text Processing extension, extract all named entities in the CV, such as the schools the candidate attended or the previous companies they worked for.
- Finally, draw a dashboard displaying all the summarized content, organized so that it can be evaluated at a glance.
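If you wanted to reproduce the core of steps 2 and 3 in plain Python rather than with KNIME nodes, a minimal sketch could look like the one below; the model name and the plain-text CV file are assumptions, and the named-entity extraction and the dashboard are left to the KNIME nodes mentioned in the list.

```python
# Sketch of the CV summarization step (assumes `pip install openai`, OPENAI_API_KEY,
# and a plain-text CV in cv.txt; the model name is an example).
from openai import OpenAI

client = OpenAI()

with open("cv.txt", encoding="utf-8") as f:
    cv_text = f.read()

prompt = (
    "Summarize the following CV in five bullet points, covering education, "
    "work experience, and key skills:\n\n" + cv_text
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```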
The workflow “LLMs for CV Summarization” is available for free download from the KNIME Community Hub.
In the figure below you can see the dashboard with the summary of possible CVs that Alice in Wonderland, Jack Sparrow, and John W. Smith could have submitted.
Figure 3. The data app for CV Summarization
Sentiment Analysis
To confirm that GenAI is at its best when dealing with text, here is another use case where we got excellent results: sentiment analysis. Sentiment analysis is the process of extracting the “sentiment” from somebody’s text or speech. Widely used in polls, for example to quantify the popularity of political proposals, and in web reviews, for example to detect flaws in the service and hospitality industry, sentiment analysis is by now a common data science practice.
I report here a use case of sentiment analysis in the financial sector. This use case is described in the article “A beginners guide to build your own LLM-based solutions” published on the KNIME Blog (you need to scroll down quite a bit, because the use case is described at the very end of the article). The corresponding workflow “KNIME workflow for sentiment prediction with LLMs” can be found on the KNIME Community Hub.
Note that this workflow employs three distinct AI providers – open source and closed source – for the same task: Hugging Face, OpenAI, and GPT4All. Indeed, the KNIME AI extension is growing by the day, adding new functionality as well as new connectors to AI providers and LLMs.
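Outside of KNIME, the same task boils down to a classification prompt. The sketch below assumes the OpenAI Python client and a couple of example headlines; a Hugging Face or GPT4All model could be swapped in, much as the workflow does with its three providers.

```python
# Sketch: LLM-based sentiment classification of financial headlines
# (assumes `pip install openai` and OPENAI_API_KEY; model name and headlines are examples).
from openai import OpenAI

client = OpenAI()

headlines = [
    "Company X beats earnings expectations for the third quarter",
    "Regulator opens investigation into Company Y's accounting practices",
]

for text in headlines:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Classify the sentiment of this financial headline as "
                       f"positive, negative, or neutral. Answer with one word.\n\n{text}",
        }],
    )
    print(text, "->", response.choices[0].message.content.strip())
```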
Figure 4. The nodes of the KNIME AI Extension
Image Description
Remember that feeling in school, when the teacher would ask you, in front of the whole classroom, to describe the content of an art masterpiece in your own words? And there you stood, in silence, searching for what to say. Well, AI could have found the words for you. Indeed, another popular and successful use of AI is providing a description of all sorts of things, even complex concepts. We tried that too.
The workflow “Leverage open-source, local LLMs for vision and embeddings via Ollama” by Roberto Cadili accepts images as input and produces a description for each of them. The next step could be asking AI to describe plots, charts, and maybe even entire dashboards.
Figure 5. AI describing image content
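To try a similar idea in code with a local model, one option is to call the Ollama REST API, which the workflow above also relies on. The sketch below assumes an Ollama server running locally with a vision-capable model such as llava already pulled, and an example image file on disk.

```python
# Sketch: describe an image with a local vision model served by Ollama
# (assumes an Ollama server on localhost:11434, `ollama pull llava` already done,
# and an example image path).
import base64
import requests

with open("holiday_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "llava",
    "prompt": "Describe the content of this image in two sentences.",
    "images": [image_b64],   # Ollama expects base64-encoded images
    "stream": False,         # return one JSON object instead of a stream
}
response = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
print(response.json()["response"])
```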
Conversational Search
Conversational search is one of the most innovative use cases relying on AI. Let’s suppose that you need to filter your customers for a promotional campaign. You can go by age, if your product addresses the young, or you can go by geographical distribution, if your product suits some areas more than others. But what if you do not know which area or which demographic is best suited for your product? In this case, you can ask AI where a specific wine is most likely drunk, or which age group is most likely to listen to a certain kind of music, and then build your prospect base for the promotional campaign accordingly.
A similar use case, “Filter Chat App”, has been implemented by Alneeda San and can be downloaded from the KNIME Community Hub.
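One simple way to prototype conversational filtering in code is to let the LLM translate a natural-language question into a filter expression and then apply it to the customer table. The sketch below assumes the OpenAI client and a toy pandas DataFrame; the column names and prompt are illustrative, and the generated expression should be validated before it is used on real data.

```python
# Sketch: natural-language filtering of a customer table
# (assumes `pip install openai pandas` and OPENAI_API_KEY; data and columns are examples).
import pandas as pd
from openai import OpenAI

client = OpenAI()

customers = pd.DataFrame({
    "name": ["Ann", "Bob", "Carla"],
    "age": [24, 57, 38],
    "region": ["Tuscany", "Bavaria", "Provence"],
})

question = "Which customers are most likely to enjoy a young red wine?"
prompt = (
    "A pandas DataFrame named customers has columns: name, age, region. "
    f"Write a single expression for DataFrame.query() that selects the rows relevant to: {question} "
    "Return only the query expression, with no quotes or explanations."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
query_expr = response.choices[0].message.content.strip().strip("`")
print("Suggested filter:", query_expr)

prospects = customers.query(query_expr)  # review the expression before trusting it
print(prospects)
```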
Coding
Another use case for AI is code writing. Thanks to the many web-based examples and tutorials, AI has become really good at Python coding. You ask AI to implement a script to perform a given task and AI does it. If the result is not exactly what you wanted, you can keep asking to refine it until it is.
As an example, here I would like to report Dennis Ganzaroli’s post on Minard’s chart of Napoleon’s campaign in Russia in 1812. The plot was generated via the E-Charts nodes within KNIME Analytics Platform. These Python-based nodes provide an AI assistant (K-AI), which you can ask to write the code that builds your chart. Dennis created the whole chart using K-AI without writing a single line of code himself.
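The same pattern can be reproduced outside the K-AI assistant: prompt the model for a script and review it before running it. Below is a minimal sketch, assuming the OpenAI client; the charting request is just an example.

```python
# Sketch: ask an LLM to write plotting code and save it for review
# (assumes `pip install openai` and OPENAI_API_KEY; model name and request are examples).
from openai import OpenAI

client = OpenAI()

request = (
    "Write a self-contained Python script using matplotlib that draws a line chart "
    "of monthly sales from a CSV file with columns 'month' and 'sales'. "
    "Return only the code."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": request}],
)

with open("generated_chart_script.py", "w", encoding="utf-8") as f:
    f.write(response.choices[0].message.content)

print("Generated code saved to generated_chart_script.py; review it before executing.")
```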
Data Generation
AI has also been used for data generation. Large public datasets might be hard to find. An easy solution, then, is to generate the data yourself, according to specific statistical distributions and specific dependencies. Both the distributions and the dependencies must be clearly stated in the prompt to the LLM in order to obtain a dataset with the desired properties.
An example for artificial data generation for supply chain – “Generating data via LLM” – is available for download from Ali Marvi’s space on the KNIME Community Hub.
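A quick way to experiment with this in code is to spell out the columns, distributions, and dependencies in the prompt and ask for the result as CSV. The sketch below assumes the OpenAI client; the supply-chain columns and distributions are illustrative, and a robust version would also strip possible code fences from the reply and generate larger datasets in batches.

```python
# Sketch: prompt an LLM for a small synthetic supply-chain dataset
# (assumes `pip install openai pandas` and OPENAI_API_KEY; columns and distributions are examples).
from io import StringIO

import pandas as pd
from openai import OpenAI

client = OpenAI()

prompt = (
    "Generate 20 rows of synthetic supply-chain data as CSV with header "
    "order_id,lead_time_days,order_quantity,region. "
    "Lead time should be roughly normally distributed around 7 days, "
    "order quantity should tend to grow with lead time, and region is one of EU, US, APAC. "
    "Return only the CSV, with no explanations."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

csv_text = response.choices[0].message.content.strip()
df = pd.read_csv(StringIO(csv_text))
print(df.describe(include="all"))
```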
Augment Analytics with GenAI
A more interesting set of use cases combines the power of AI in text and image creation with traditional data science applications. Let’s take for example a fraud detection application. Fraud detection has been an ever-present problem in many businesses, and not an easy one to solve. Depending on data availability, business regulations, and privacy laws, many different techniques have been implemented for fraud detection, triggering custom actions in case of fraud.
In the old days, we had a set of template emails to send to customers in their language of preference. The maintenance of such an email template corpus was not easy. Some languages are rarely spoken, and we needed specialized writers when a change was required. Email texts could not all be easily customized by injecting information about the suspected fraud into the template language. And so on.
Well, with AI this becomes easier. We feed a prompt into the AI model, including the specific features of the suspicious event and the desired language, and a custom email text is generated in the desired language ready to be sent to the customer.
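In code, the generation step reduces to a templated prompt built from the features of the suspicious transaction. Here is a minimal sketch, assuming the OpenAI client; the transaction fields, language, and model name are illustrative, and in practice the email would still go through a human or rule-based review before being sent.

```python
# Sketch: draft a customer notification email for a suspected fraud case
# (assumes `pip install openai` and OPENAI_API_KEY; all field values are examples).
from openai import OpenAI

client = OpenAI()

suspicious_event = {
    "customer_name": "M. Rossi",
    "amount": "EUR 1,250.00",
    "merchant": "online electronics store",
    "timestamp": "2024-07-03 02:14",
    "language": "Italian",
}

prompt = (
    "Write a short, polite email in {language} informing {customer_name} that a card "
    "transaction of {amount} at an {merchant} on {timestamp} was flagged as suspicious, "
    "and asking them to confirm or deny the transaction."
).format(**suspicious_event)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```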
You can read the whole story in R. Cadili, “KNIME for Finance: Introducing AI to Finance Departments”, KNIME Blog, July 2024.
Figure 6. Deploying the fraud detection workflow on the KNIME Business Hub
Augment GenAI with Analytics
The previous use case adopted classic analytics techniques and then completed them with AI-generated emails. In this last use case, we would like to do the opposite. We prompt GenAI for a specific task, like generating cooking recipes for a given set of ingredients, and then we refine the result with some classic data operations, like calculating the number of calories that come with the proposed recipe. Linus Krause’s workflow “Random_recipe_json_api” does exactly that.
First, it asks you for the ingredients in your fridge that are waiting to be consumed before their expiration date; then it prompts a Large Language Model for compatible recipes; then it calculates the number of calories associated with the proposed meal; and finally it displays it all on a web page.
Figure 7. An example of integration of AI responses and classic data operations in this recipe AI generator.
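As a rough code analogue of this workflow, you could ask the LLM to return the recipe as structured JSON and then do the calorie arithmetic with ordinary code. The sketch below assumes the OpenAI client and a tiny hand-made calorie table; the actual workflow on the KNIME Community Hub implements the same idea with KNIME nodes rather than Python.

```python
# Sketch: LLM-generated recipe (as JSON) plus a classic calorie calculation
# (assumes `pip install openai` and OPENAI_API_KEY; ingredients, calorie table,
# and model name are examples).
import json

from openai import OpenAI

client = OpenAI()

ingredients = ["eggs", "spinach", "feta"]
calories_per_100g = {"eggs": 155, "spinach": 23, "feta": 264}  # toy lookup table

prompt = (
    f"Propose one recipe using only these ingredients: {', '.join(ingredients)}. "
    "Return JSON with keys 'name' and 'ingredients', where 'ingredients' maps each "
    "ingredient to the amount in grams. Return only the JSON, with no code fences."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

recipe = json.loads(response.choices[0].message.content)
total_kcal = sum(
    calories_per_100g.get(item, 0) * grams / 100
    for item, grams in recipe["ingredients"].items()
)
print(f"{recipe['name']}: roughly {total_kcal:.0f} kcal")
```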
Summary
We waited a long time to write this blog post, until solutions for all these types of use cases had been implemented and could be described. We finally made it. In this blog post, we describe ten types of data science use cases based on AI or integrating AI.
We move from the obvious chatbot to a recipe generator, passing through the generation of rap lyrics, CV summarization, that evergreen of sentiment analysis, AI-generated code to chart Napoleon’s Russian campaign, conversational search, data generation, fraud detection combining AI and classic data science, and a cat-vs-dog image recognition and description. All use cases include and link to a ready-to-use solution available on the KNIME Community Hub.
In case you were wondering how to apply AI to your business or how to integrate it with existing data science applications, in this article you might find some inspiration for your next project.
Rosaria Silipo is not only an expert in data mining, machine learning, reporting, and data warehousing, she has become a recognized expert on the KNIME data mining engine, about which she has published three books: KNIME Beginner’s Luck, The KNIME Cookbook, and The KNIME Booklet for SAS Users. Previously Rosaria worked as a freelance data analyst for many companies throughout Europe. She has also led the SAS development group at Viseca (Zürich), implemented the speech-to-text and text-to-speech interfaces in C# at Spoken Translation (Berkeley, California), and developed a number of speech recognition engines in different languages at Nuance Communications (Menlo Park, California). Rosaria gained her doctorate in biomedical engineering in 1996 from the University of Florence, Italy.