insideBIGDATA Latest News – 7/10/2023

In this regular column, we’ll bring you all the latest industry news centered around our main topics of focus: big data, data science, machine learning, AI, and deep learning. Our industry is constantly accelerating with new products and services being announced every day. Fortunately, we’re in close touch with vendors from this vast ecosystem, so we’re in a unique position to inform you about all that’s new and exciting. Our massive industry database is growing all the time, so stay tuned for the latest news items describing technology that may make you and your organization more competitive.

Dremio Revolutionizes Data Lakehouse Engine with Cutting-Edge Features, Empowering Faster Insights and Streamlined Operations

Dremio, the easy and open data lakehouse, announced robust new features that enhance the performance and versatility of its data platform. These new capabilities empower organizations to accelerate their data analytics and enable faster, more efficient decision-making. Dremio is ensuring easy self-service analytics—with data warehouse functionality and data lake flexibility—across customer data. Among the key features unveiled by Dremio are querying, performance, and compatibility enhancements.

“With these new key features, Dremio continues to provide the most powerful and flexible data lakehouse engine on the market,” said Tomer Shiran, founder and CPO at Dremio. “We are excited to empower our customers with capabilities that make lakehouses easier than ever, and allow companies to replace their expensive and proprietary cloud data warehouses with modern and open data architectures.”

Launch of Hasper, An LLM-Powered Intelligent Engine For Advanced Analytics

Scribble Data, a machine learning and generative AI company, announced the launch of Hasper, a full-stack large language model (LLM)-based engine for business leaders to rapidly build AI-powered data products. Hasper provides engineering guardrails to bring the power of LLMs to enterprises, enabling streamlined data workflows and facilitating contextually rich insights.

“Embracing the rise of generative AI marks a pivotal moment in our journey to unite enterprise analytics with the potential of LLMs. It’s the convergence of our Enrich Intelligence platform, built on machine learning and robust workflows, with the transformative power of LLMs. We’re already seeing this unlock compelling new use cases for our customers,” said Indrayudh Ghoshal, Co-founder and COO at Scribble Data.

Sama Launches Platform 2.0, Delivering 99% Client Acceptance Rate for AI Training Data

Sama, a leader in providing data annotation solutions that power the AI models of the future, announced Platform 2.0, a re-engineered computer vision platform to reduce the risk of machine learning (ML) algorithm failures. The completely redesigned, scalable platform, consisting of SamaIQ™, SamaAssure™ and SamaHub™, offers greater transparency for clients to minimize rework and allows Sama to deliver annotated data and insights three times faster. Platform 2.0 has successfully achieved a 99% client acceptance rate for AI training data through SamaAssure, the industry’s highest quality guarantee, with an annotation delivery rate of up to 300+ million frames, 850+ million shapes and 10 billion annotation points a month. With these enhancements, Sama clients can now deploy ML faster than ever, with the assurance that their AI models are trained on exceptional-quality data to avoid concerns of drift and/or bias.

“As companies continue to use machine learning algorithms to power some of the most innovative technologies today, our collective responsibility as leaders in tech is to ensure the applications are built using high-quality, unbiased data,” said Wendy Gonzalez, CEO of Sama. “Without accurate inputs, AI applications won’t be effective and may even yield potentially dangerous outcomes. Our Platform 2.0’s value lies in both its exceptional data accuracy and its unique combination of approach, people and technology.”

Anomalo Can Now Monitor an Entire Data Warehouse in Minutes With Major Expansion of Its Deep Data Quality Monitoring Platform

Anomalo, the complete data quality platform company, announced that it has expanded its platform to include metadata-based observability for all tables in an enterprise data warehouse. Enterprises can now do basic monitoring of the entire data warehouse in minutes and at low cost and use that as a pathway into deep data quality monitoring to identify issues with the contents of their data.
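To illustrate the general idea of metadata-based checks (not Anomalo’s actual implementation), the minimal sketch below flags tables using only metadata such as last-update time and row counts, without reading table contents. The field names (last_updated, row_count, previous_row_count) and the thresholds are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def check_table(meta, now, max_age=timedelta(hours=24), max_drop=0.5):
    """Return a list of issue labels derived from table metadata only."""
    issues = []
    # Freshness: the table has not been updated within the allowed window.
    if now - meta["last_updated"] > max_age:
        issues.append("stale")
    # Volume: the row count fell by more than max_drop versus the last snapshot.
    if meta["row_count"] < meta["previous_row_count"] * (1 - max_drop):
        issues.append("row_count_drop")
    return issues


# Hypothetical metadata snapshots for two tables.
now = datetime(2023, 7, 10, 12, 0)
tables = {
    "orders": {
        "last_updated": datetime(2023, 7, 10, 10, 0),
        "row_count": 1050,
        "previous_row_count": 1000,
    },
    "events": {
        "last_updated": datetime(2023, 7, 7, 12, 0),
        "row_count": 100,
        "previous_row_count": 1000,
    },
}

report = {name: check_table(meta, now) for name, meta in tables.items()}
print(report)  # {'orders': [], 'events': ['stale', 'row_count_drop']}
```

Because checks like these touch only metadata, they are cheap enough to run on every table, while deeper content checks can be reserved for the tables that matter most.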

“Our customers have always said that they want to monitor every single table in their data warehouse or data lake for data issues. But especially in this environment, applying Anomalo’s full deep monitoring capabilities to every single table is neither necessary nor cost-efficient,” said Elliot Shmukler, co-founder and CEO of Anomalo. “With our new table observability checks, basic monitoring can now be applied cost-efficiently to the entire data warehouse with the flexibility to use Anomalo’s unsupervised data monitoring, metric checks and validation rules only on the most important tables. Thus enterprises now have the flexibility with Anomalo that cover all of their data observability and data quality needs.”

Directus Introduces Real-Time Data Functionality via WebSockets

Directus, an open-source software company focused on helping organizations build composable experiences, introduced a new functionality within its open data platform that enables front-end developers to build real-time capabilities into their applications via WebSockets.

Innovative organizations of all sizes turn to Directus to power and scale their data-driven applications. Many of these applications increasingly demand real-time data updates, but historically, the only time data can be loaded is on page load or through polling, which can lead to out-of-date data and unnecessary HTTP requests. With WebSockets, the server can proactively push updates to the connected clients as soon as new data is available. This update means that developers using Directus can now easily integrate up-to-the-second features into their applications and unlock new potential use cases for real-time experiences, ranging from instant messaging to live updates and notifications, real-time analytics and monitoring, IoT applications and countless other possibilities.
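The difference between polling and server push can be sketched with a small in-process simulation. This is not the Directus API and involves no real network traffic; the Server class and its methods are hypothetical stand-ins, with an asyncio queue playing the role of a WebSocket connection.

```python
import asyncio


class Server:
    """Hypothetical in-process stand-in for a data server (no real network I/O)."""

    def __init__(self):
        self.value = 0
        self.requests_served = 0  # counts polling round trips
        self.subscribers = []     # one queue per connected push client

    def poll(self):
        # Polling: every call costs a request, even if nothing changed.
        self.requests_served += 1
        return self.value

    def subscribe(self):
        # Push: the client registers once and then simply waits for messages.
        queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    def update(self, value):
        # On every change, proactively push the new value to all subscribers.
        self.value = value
        for queue in self.subscribers:
            queue.put_nowait(value)


async def demo():
    server = Server()
    inbox = server.subscribe()

    # A polling client asks ten times while only two changes happen.
    for tick in range(10):
        if tick in (3, 7):
            server.update(tick)
        server.poll()

    # The push client received exactly one message per change.
    pushed = [await inbox.get() for _ in range(inbox.qsize())]
    return server.requests_served, pushed


polls, pushed = asyncio.run(demo())
print(polls, pushed)  # 10 [3, 7]
```

In this toy run the polling client makes ten requests to observe two changes, while the subscribed client receives two messages and nothing else.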

“This functionality is a game-changer for front-end developers relying on real-time data — for example, a ride-share app that shows vehicle location, a tool requiring live collaborative editing, or a financial app that must show to-the-second data for stock prices,” said Ben Haynes, co-founder and CEO of Directus. “Real-time has become the expectation in all aspects of life and business, and our users can rest assured knowing the data powering their applications is truly up-to-date.”

Memphis.dev Disrupts Event Streaming with Introduction of Memphis Cloud to Accelerate Development of Real-time Applications in Multi-cloud Environments

Event streaming innovator Memphis.dev introduced Memphis Cloud, which builds on the highly successful Memphis open source project to enable a full serverless experience for enterprises with added security and features that provide easy-to-deploy, stateless stream processing at scale. These include augmentation of Kafka clusters; built-in schema management, enforcement, and transformation; multi-tenancy for traffic isolation; use-based billing; and true multi-cloud capabilities. Memphis Cloud is poised to disrupt the dominance of the combination of Kafka and Flink with an intelligent, frictionless message broker that enables the ultra-fast development of real-time applications for developers and data engineers.

“The world is asynchronous and built out of events. Message brokers are the engine behind their flow in the modern software architecture, and when we looked at the bigger picture and the role message brokers play, we immediately understood that the modern message broker should be much more intelligent and by far with much less friction,” said Yaniv Ben Hemo, co-founder and CEO, Memphis. “With that insight, we built Memphis.dev which takes five minutes on average for a user to get to production and start building queue-based applications and distributed streaming pipelines.”

Databricks Introduces New Generative AI Tools, Investing in Lakehouse AI

Databricks, the Data and AI company, announced new Lakehouse AI innovations that allow customers to easily and efficiently develop generative AI applications, including large language models (LLMs), directly within the Databricks Lakehouse Platform. Lakehouse AI offers a unique, data-centric approach to AI, with built-in capabilities for the entire AI lifecycle and underlying monitoring and governance. New features that will help customers more easily implement generative AI use cases include: Vector Search, a curated collection of open source models, LLM-optimized Model Serving, MLflow 2.5 with LLM capabilities such as AI Gateway and Prompt Tools, and Lakehouse Monitoring.

The demand for generative AI is driving disruption across industries, creating urgency for technical teams to build generative AI models and LLMs on top of their own data to differentiate their offerings. However, data determines success with AI, and when the data platform is separate from the AI platform, it’s difficult to enforce and maintain clean, high-quality data. Additionally, the process of getting a model from experimentation to production, and the related tuning, operationalizing, and monitoring of the models, is complex and unreliable.

“We’ve reached an inflection point for organizations: leveraging AI is no longer aspirational — it is imperative for organizations to remain competitive,” said Ali Ghodsi, Co-Founder and CEO at Databricks. “Databricks has been on a mission to democratize data and AI for more than a decade and we’re continuing to innovate as we make the lakehouse the best place for building, owning and securing generative AI models.”

Databricks Announces LakehouseIQ, the Natural Language Interface That Opens Data Analytics to Everyone

Databricks, the Data and AI company, announced LakehouseIQ, a knowledge engine that learns what makes an organization’s data, culture and operations unique. LakehouseIQ uses generative AI to understand jargon, data usage patterns, organizational structure, and more to answer questions within the context of a business. Anyone in an organization can interact with LakehouseIQ using natural language to search, understand, and query data. LakehouseIQ is fully integrated with Databricks Unity Catalog to help ensure that democratizing access to data adheres to internal security and governance rules.

“LakehouseIQ will help democratize data access for every company to improve better decision-making and accelerate innovation. With LakehouseIQ, an employee can simply write a question and find the data they need for a project, or get answers to questions relevant to their company’s operations. It removes the roadblocks inherent in traditional data tools and doesn’t require programming skills,” said Ali Ghodsi, Co-Founder and CEO at Databricks. “Every employee knows the questions to ask to improve their day-to-day work and ultimately their business. With LakehouseIQ, they have the power to quickly and accurately discover the answers.”

Informatica Announces Superpipe for Snowflake with Up to 3.5x Faster Data Integration and Replication

Informatica (NYSE: INFA), an enterprise cloud data management leader, announced the launch of four new product capabilities to bring greater simplicity, speed, and performance to data integration and replication for customers in the Snowflake ecosystem: Informatica Superpipe for Snowflake, Enterprise Data Integrator Snowflake native application, Cloud Data Integration-Free for Snowflake, and support for Apache Iceberg on Snowflake.

“Customers are looking to move their data quickly and seamlessly into Snowflake to power trusted insights, and Informatica enables organizations to do just that by accelerating their time to value from our combined capabilities,” said Sumeet Kumar Agrawal, Vice President of Product Management, Informatica. “We’re introducing fast and seamless replication capabilities and making them more accessible than ever, empowering customers to leverage mission-critical data within the Snowflake ecosystem for better business outcomes.”

Reltio Expands AI/ML Innovation: Unveils Industry-first Pre-Trained ML Model for Automated Entity Resolution

As Artificial Intelligence (AI) and Machine Learning (ML) technologies disrupt traditional business models, Reltio® is changing the game with real-time AI-driven Master Data Management (MDM). Reltio, a leading cloud-native, SaaS MDM platform, announced new ML capabilities with its rollout of the Reltio Connected Data Platform 2023.2 release. Reltio’s technology focus in its 2023.2 release is on Entity Resolution – widely considered one of the most complex of all MDM challenges. Reltio has augmented entity resolution capabilities with the industry’s first pre-trained ML models.

Entity Resolution is the process of identifying and reconciling records from different systems across the entire enterprise into one entity for use across business processes. As application proliferation continues to grow exponentially across organizations, unifying data across siloed sources becomes increasingly challenging. Many organizations deploy teams of data stewards to manage this through manual processes.
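As a rough illustration of what entity resolution involves, the sketch below groups records from different systems that appear to describe the same organization. It is a simple rule-based approach using Python’s standard difflib module, not Reltio’s pre-trained ML models; the record fields, the sample data, and the 0.85 similarity threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase, drop punctuation, collapse whitespace.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def same_entity(a, b, threshold=0.85):
    # An exact email match is treated as decisive; otherwise compare names fuzzily.
    if a.get("email") and a.get("email") == b.get("email"):
        return True
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return score >= threshold


def resolve(records):
    # Greedy clustering: attach each record to the first cluster it matches.
    clusters = []
    for record in records:
        for cluster in clusters:
            if any(same_entity(record, member) for member in cluster):
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters


# Hypothetical records from three siloed systems.
records = [
    {"source": "crm", "name": "Acme Corp.", "email": "info@acme.example"},
    {"source": "billing", "name": "ACME Corp", "email": None},
    {"source": "support", "name": "Globex Inc", "email": "help@globex.example"},
]

clusters = resolve(records)
for cluster in clusters:
    print([r["source"] for r in cluster])
```

Hand-tuned rules like these tend to need per-domain adjustment, which is the kind of manual effort that ML-based matching aims to reduce.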

Reltio has leveraged AI/ML so that customers can quickly and accurately achieve value from their Customer, Product, Supplier, and other Core business data domains. Unlike other solutions that require weeks of manual effort before the ML recommendations can be effective, Reltio’s entity resolution capability delivers instant value. Industry and domain-specific ML models can now be enabled with a click of a button and make it easy for data stewards to review and consolidate matches – ultimately yielding high-quality data for the organization to use, and greater efficiency in data management. Reltio’s Connected Data Platform provides trusted high-quality matching recommendations out of the box, and in a fraction of the time and cost previously required.

“Reltio is pioneering the future of modern data management through substantial investments in AI/ML, sparking innovation, and delivering swift and superior value to our customers,” said Manish Sood, CEO, Founder, and Chairman of Reltio. “The indispensable partnership between AI/ML and data management empowers businesses with trusted, high-quality data, driving real-time operations and insightful analytics powered by machine learning.”

Airtable AI Opens Beta Program for Leading Enterprise Customers

Airtable launched a customer beta program and is expanding access to Airtable AI, the easiest and fastest way to deploy AI-powered applications for the enterprise. The beta program will include participation from leading global retail, services, technology, and media companies and will scale to thousands of organizations using Airtable later this summer. Beta customers will be able to experiment with and embed new AI capabilities in their data and workflows and provide valuable insights for Airtable’s roadmap, building on those the company gained during a limited alpha program and its own internal use of Airtable AI.

Integrating AI capabilities into Airtable’s next-generation platform enables organizations to embed AI across a wide range of critical business processes and power marketing campaigns, content production, product roadmaps, customer feedback, and other workflows, all in one place.

Launching initially with foundation models provided by OpenAI, Airtable AI will eventually offer customers the ability to choose from a variety of models that best fit their needs.

“The current generation of large language models are already highly capable of advanced reasoning and creativity tasks and can be applied to a broad range of knowledge work. Yet most companies are just beginning to scratch the surface with them,” says Airtable Co-founder and CEO Howie Liu. “Enterprises can now use Airtable’s no-code approach to immediately integrate these powerful models in the context of their existing data and workflows, making it possible to rapidly build and deploy highly engaging, AI-powered apps.”

Zetaris launches fast Data Expansion Kit and Industry Solutions offer powered by Dataiku

Zetaris, a leading unified data preparation studio, unveiled a lightning-fast data expansion kit and industry solution trial offer, both powered by Dataiku, the platform for everyday AI.

The Zetaris Data Expansion Kit (BYOD) enables organizations to bring their own data from any source, integrate it into a single view, and prepare all data assets in the studio. Users can connect to the data science layer to create and deliver data and advanced analytics using the latest techniques at scale. This saves time, reduces complexity and is far more cost-effective than traditional methods.

The Zetaris Industry Solutions offer provides users with vertical end-to-end analytical solutions to their most pressing business needs. It is through the Dataiku plug-in that Zetaris creates an easy way for Dataiku customers to connect to the hard-to-reach data sources in real-time, expanding the datasets you can connect to in a unified manner. Dataiku strengthens the proposition of Zetaris by providing an insights layer for AI.

Vinay Samuel, CEO and founder of Zetaris, enthuses, “We are delighted to partner with Dataiku to unlock the power and benefits of real-time access and query of data, without the complexity and cost of having to move the data and clean it. Users can self-serve and test an end-to-end environment preloaded with sample data, semantic model pipelines, a data mart and plumbed Dataiku model. Organizations can replace sample data connections to real data and seamlessly transition to a commercial model and leverage out-of-the-box data quality management of the source data set.”

RisingWave Cloud Democratizes Event Stream Processing, Making It Affordable at Cloud Scale

RisingWave Labs, the company behind the distributed SQL streaming database built for the cloud, announced the general availability of RisingWave Cloud, a fully managed SQL stream processing platform, built on a cloud-native architecture. RisingWave Cloud consumes streaming data, performs incremental computations when new data comes in, and updates results dynamically at a fraction of the cost compared to Apache Flink.
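The idea of incremental computation can be sketched in a few lines. The example below is a toy illustration in plain Python, not RisingWave’s engine or SQL interface: it keeps a small amount of state per key and updates a running average as each event arrives, rather than rescanning all past data. The RunningAverage class and the sensor names are hypothetical.

```python
from collections import defaultdict


class RunningAverage:
    """Maintains per-key averages incrementally instead of rescanning history."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def on_event(self, key, value):
        # Only the state for the affected key is touched: constant work per event.
        self.count[key] += 1
        self.total[key] += value
        return self.total[key] / self.count[key]

    def result(self):
        # The current "view" is always available without reprocessing old events.
        return {k: self.total[k] / self.count[k] for k in self.count}


view = RunningAverage()
for key, value in [("sensor_a", 10.0), ("sensor_b", 4.0), ("sensor_a", 20.0)]:
    view.on_event(key, value)

print(view.result())  # {'sensor_a': 15.0, 'sensor_b': 4.0}
```

A streaming database applies the same principle to much richer queries, which is where the cost of keeping results continuously up to date is won or lost.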

“Traditionally, event stream processing used older technologies, and those first-generation stream processing systems came with cost and performance barriers,” said Yingjun Wu, CEO and founder of RisingWave Labs. “We founded RisingWave Labs with the aim of empowering companies with real-time information to enable faster, better business decision-making in a cost-efficient manner. We have an extremely powerful distributed SQL engine that gives you fully featured, compute-efficient SQL analytics in the cloud.”

DataStax Taps Generative AI to Streamline Data Pipelines

DataStax, the real-time AI company, announced a new, GPT-based schema translator in its Astra Streaming cloud service that uses generative AI to efficiently and accurately transfer data between systems with different data structures within an enterprise. An industry first, the new Astra Streaming GPT Schema Translator automatically generates “schema mappings,” a traditionally difficult and time-consuming process when building and maintaining event streaming pipelines.
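For readers unfamiliar with the term, a schema mapping is essentially a set of rules that says which field in a source record fills which field in a target record. The sketch below shows a hand-written mapping applied in plain Python; it does not use Astra Streaming or GPT, and the field names and the MAPPING table are hypothetical. Writing and maintaining tables like this by hand is the step the announcement says is now generated automatically.

```python
# Hypothetical mapping from a source event schema to a target schema.
# Each target field names the dotted source path it is filled from.
MAPPING = {
    "customer_id": "user.id",
    "full_name": "user.name",
    "amount_cents": "order.total",
}


def get_path(record, path):
    # Walk a dotted path such as "user.id" through nested dictionaries.
    value = record
    for part in path.split("."):
        value = value[part]
    return value


def translate(record, mapping):
    # Build the target record by looking up each mapped source path.
    return {target: get_path(record, source) for target, source in mapping.items()}


source_event = {"user": {"id": 42, "name": "Ada"}, "order": {"total": 1999}}
translated = translate(source_event, MAPPING)
print(translated)  # {'customer_id': 42, 'full_name': 'Ada', 'amount_cents': 1999}
```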

“Organizations are still identifying the role that new AI technologies, like GPT, will play in their day-to-day business functions and we’re proud to be the first to integrate this new technology into our product to significantly reduce development time and support costs,” said Chris Latimer, general manager, streaming, DataStax. “At DataStax, we’re continually innovating to develop impactful integrations and features, like the new GPT Schema Translator, to support our users and maintain our competitive edge. We’re experimenting with new technologies to address known pain points, help make developers more productive, and make businesses more effective with the help of AI.”

Chronosphere launches new framework and features to tackle growing observability costs

As more organizations become cloud native, their observability data costs are spiraling out of control, creating a significant and increasingly unpredictable budget item. To mitigate this problem, Chronosphere, a leading cloud native observability platform, is launching the Observability Data Optimization Cycle – the industry’s first vendor-neutral framework. The framework helps companies regain control over observability data growth, which 69% of companies say is an area of concern, based on recent research by ESG on observability spending. Chronosphere is also introducing new product features to support this framework, enabling teams to better understand and optimize the management of their cloud observability resources.

“As more organizations adopt cloud native architectures, engineers are drowning in the massive amount of observability data that comes with it,” said Martin Mao, CEO of Chronosphere. “This is causing an explosion in observability costs, while simultaneously overwhelming engineers in the troubleshooting process, leading to longer incidents and unhappy customers. Our new framework and features helps organizations achieve the best possible observability outcomes while keeping costs under control.”

Edge Delta Revolutionizes Observability Pipelines with New Release

Edge Delta, a leading observability pipelines and analytics company, announced its latest release, Visual Pipelines. This release dramatically simplifies observability pipeline workflows. Now, teams can collect, process, and route data in just a few clicks. Visual Pipelines was designed and architected with large enterprise organizations in mind. It gives these organizations the ability to support and scale pipelines spanning hundreds of teams across thousands of data sources, thereby unlocking open observability.

“With Visual Pipelines, we aim to remove cumbersome manual processes from data collection, processing and routing. This release allows teams of all sizes to focus on higher-priority duties, like building great software, while staying ahead of data growth and observability budgets,” says Ozan Unlu, CEO, Edge Delta.

Cutting-Edge Monster API Platform Transforms AI Development: Up to 90% Cost Reduction Achieved Through Decentralized Computing

Monster API launched its platform to provide developers access to GPU infrastructure and pre-trained AI models at a far lower cost than other cloud-based options, delivering extraordinary ease of use and scalability. It utilizes decentralized computing to enable developers to quickly and efficiently create AI applications, saving up to 90% over traditional cloud options. Generative AI is disrupting every industry from content creation to code generation; however, the application development process in the machine learning world is very expensive, complex and hard to scale for all but the most sophisticated and largest ML teams.

The new platform allows developers to access the latest and most powerful AI models (like Stable Diffusion) ‘out-of-the-box’ at one-tenth the cost of traditional cloud ‘monopolies’ like AWS, GCP, and Azure. In just one example, an optimized version of the Whisper AI model on Monster API led to a 90% reduced cost as compared to running it on a traditional cloud like AWS.

“By 2030, AI will impact the lives of 8 billion people. With Monster API, our ultimate wish is to see developers unleash their genius and dazzle the universe by helping them bring their innovations to life in a matter of hours,” said Saurabh Vij, CEO and co-founder, Monster API.

Mendix Adds Powerful New AI and Machine Learning Capabilities to its Enterprise Low-Code Platform

Mendix, a Siemens business and global leader in modern enterprise application development, outlined powerful new and robust AI and machine learning capabilities, including innovative context-aware AI developer tools, which will all be available upon the release of Mendix 10.

The new AI and Machine Learning enhancements reinforce the status of Mendix’s low-code platform as the de facto standard for building smart business applications and solutions. Mendix 10 features greatly expanded AI capabilities in two major areas. First, Mendix 10 empowers the enterprise to seamlessly integrate AI use cases with low-code applications using Mendix’s new Machine Learning Kit. Secondly, the platform greatly expands the scope and functionality of AI-enabled application development.

“Enterprises with sophisticated machine learning capabilities can easily incorporate their models into Mendix applications using the ML Kit,” said Amir Piltan, Mendix’s senior product manager for AI. “But for those earlier in the adoption curve, it is not necessary for enterprises to build models from scratch. They can start with the ONNX Model Zoo, fine-tune the model for specific use cases, and keep their data and AI model secure, as it never leaves their Mendix ecosystem. This makes AI deployment easier from an operational, commercial, and governance standpoint.”

ThinkData Works Launches Data Lineage Solution for Improved Data Observability and Health Monitoring

ThinkData Works Inc., a leading data catalog provider, launched a new data lineage solution that allows companies to better visualize data flows, improving efficiency and observability for an organization’s entire data ecosystem. The offering provides a comprehensive view of data pipelines, displaying an active graph of data lineage from warehouse to model and letting users analyze and explore pipelines to ensure quality, limit risk, and reduce cost.

“When we started thinking about how we’d build a lineage solution, we knew exactly what we didn’t want to do,” says Lewis Wynne-Jones, VP Product at ThinkData Works. “We’ve seen too many lineage solutions that demo well but completely fall apart when they’re dropped into a production environment. In order to be truly useful, data lineage needs to be clear, practical, and easy-to-interpret. We tested our solution against pipelines generating models out of thousands of datasets and within minutes were able to spot dependencies and risks that we’d never seen before. That was when we knew we’d built something incredible.”

Promethium Brings the Power of Generative AI to the Data Fabric

Promethium, a leading innovator in data management and analytics, announced availability of its groundbreaking generative AI-powered data fabric, which leverages the power of ChatGPT to provide a significant leap forward in data management and analysis, promising to transform the way organizations harness and leverage their data assets. With the rapid proliferation of data across industries, traditional data management approaches and current tools that rely heavily on manual processes are increasingly inadequate. Recognizing the critical need for a faster and more intuitive solution, Promethium has harnessed the state-of-the-art language modeling capabilities of ChatGPT to develop a data fabric unlike anything seen before.

“Promethium’s Generative AI-powered data fabric is a game-changer for data management,” said Kaycee Lai, Founder and CEO at Promethium. “By incorporating Generative AI, we’re further enabling organizations to unlock the true potential of their data, empowering all of their employees to make data-driven decisions with confidence and speed.”

Speedata Launches Workload Analyzer – A Tool for Enterprises Looking to Accelerate Spark Queries

Speedata, whose first-of-its-kind Analytics Processing Unit (APU) accelerates big data analytic workloads across industries, announced the launch of its “Workload Analyzer.” The browser-based performance predictor tool analyzes Spark log files to help data engineers learn how to maximize workload performance, both in the cloud and on-prem.

“Our team is committed to providing enterprises with the tools needed to accelerate their big data analytics workloads,” said Jonathan Friedmann, Co-Founder & CEO of Speedata. “The Workload Analyzer is one of those tools, helping businesses focus on what’s working and how to improve what’s not. It’s designed to help data engineers optimize their analytics with available infrastructure, set realistic goals, maximize their data, and maintain their competitive edge.”

Parallel Domain Unveils Data Lab, a self-serve API for Synthetic Data Generation

Parallel Domain is a synthetic data platform that enhances perception performance for autonomy and robotics companies. They announced Data Lab, a new API to generate high-fidelity synthetic data for training and testing of perception systems. The solution enables ML engineers to create synthetic datasets with just a few lines of code, giving them control over dynamic virtual worlds to simulate any scenario imaginable.

“Autonomous systems, from ADAS to delivery drones, are already making our world safer and more accessible. But they’re only operating in limited contexts today,” said Kevin McNamara, CEO and Founder of Parallel Domain. “The launch of Data Lab moves us towards a world where ML developers across autonomy industries can generate exactly the datasets they need to prepare their systems for a wide variety of situations — from someone crossing the street with a stroller to a tree falling in the middle of the road. Our customers have an increasing need for data to expand their operations. We’re proud to meet this need with Data Lab and contribute to a future where our lives are improved by safe autonomous systems every single day.”

Aerospike Delivers New Developer-ready, Real-time Scalable Graph Database

Aerospike, Inc. unveiled a new graph database with native support for the Gremlin query language. More and more organizations need to integrate and support large graph data sets for better decisions in heavy transactional (OLTP) operations. But today’s graph databases fail to perform at scale with a sustainable cost model. This forces companies to waste time and resources creating inflexible custom solutions.

Aerospike Graph delivers millisecond multi-hop graph queries at extreme throughput across trillions of vertices and edges. Benchmarks show a throughput of more than 100,000 queries per second with sub-5ms latency — on a fraction of the infrastructure. Developers can write applications with new and existing Gremlin queries in Aerospike Graph. These OLTP applications for AdTech, Customer 360, fraud detection and prevention, and other use cases can now operate at scale in real time.
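A multi-hop query asks which vertices can be reached from a starting point within a given number of edges, for example which accounts share a device with a flagged account. The sketch below shows the idea with a breadth-first search in plain Python; it is not Gremlin and does not use Aerospike Graph, and the tiny fraud-style graph and the within_hops helper are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return each vertex reachable from start in at most max_hops edges, with its hop count."""
    seen = {start: 0}
    frontier = deque([start])
    while frontier:
        vertex = frontier.popleft()
        if seen[vertex] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(vertex, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[vertex] + 1
                frontier.append(neighbor)
    del seen[start]  # report only the other vertices
    return seen


# Hypothetical adjacency list: accounts linked through shared devices and cards.
graph = {
    "account_1": ["device_x"],
    "device_x": ["account_1", "account_2"],
    "account_2": ["device_x", "card_9"],
    "card_9": ["account_2"],
}

reachable = within_hops(graph, "account_1", 2)
print(reachable)  # {'device_x': 1, 'account_2': 2}
```

Here account_2 turns up two hops from account_1 because both used device_x, while card_9 is three hops away and is excluded. Doing this across very large graphs with low latency is the hard part that graph databases aim to solve.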

“To meet the demands for enterprises building rich, real-time data services and applications, Aerospike’s multi-model platform continues to expand its legendary scale, low latency, and lowest total cost of ownership to the most popular areas of data management,” said Subbu Iyer, CEO of Aerospike. “We’ve now delivered Aerospike’s unique value propositions to the graph database market, where enterprises no longer need to spend lavishly on bespoke solutions to get real-time performance and scale.”

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: https://twitter.com/InsideBigData1

Join us on LinkedIn: https://www.linkedin.com/company/insidebigdata/

Join us on Facebook: https://www.facebook.com/insideBIGDATANOW
