AI is Everywhere
After the first quarter of 2024, we want to recap the most influential and most talked-about trends in Artificial Intelligence and Deep Learning. No matter the industry or application, AI is poised to change the way we work, research, and stay productive.
As a high-performance computing solutions provider for AI and deep learning workloads, we have taken note of interesting and rising AI trends and listed those we feel will make, or have already made, a splash in AI. For each trend, we start with a definition and then share some thoughts on where it is headed.
Multi-Modal AI
Definition: Multi-modal AI systems are characterized by the ability to process and understand information from multiple modalities or sources. These modalities may include text, images, speech, videos, and other forms of data.
In the everyday world, we don't speak in binary or code. Instead, we communicate through text, images, video, and more: data we can comprehend in the blink of an eye. Translating that data back to a computer or an AI, however, has been challenging to say the least.
Through transformer networks, convolutions, and other high-level algorithms for computing sentiment, weights, and more, we can deliver real-world data to our computers more easily than ever before. Each pixel in an image can now carry meaning, and each sentence can be broken into tokens that an AI can comprehend.
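As a rough illustration of that tokenization step, here is a minimal sketch using the Hugging Face transformers library (our choice of tooling, not something named above); the model name is just a small, freely available example.

```python
# Minimal tokenization sketch with Hugging Face transformers (assumed tooling).
# Requires: pip install transformers
from transformers import AutoTokenizer

# "gpt2" is only an example of a small, openly available tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Each sentence can be broken up into tokens."
tokens = tokenizer.tokenize(text)   # human-readable sub-word pieces
ids = tokenizer.encode(text)        # the integer IDs the model actually consumes

print(tokens)  # sub-word strings
print(ids)     # token IDs fed to the network
```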
But the fascinating part isn't just the comprehension; it's the ability to respond. Over the last year or two, we were given GPT-4, Llama 2, DALL-E 3, Mistral AI, Stable Diffusion, and more. OpenAI also teased Sora, an AI model that can create realistic, imaginative videos from text. These models can ingest text as prompts and respond with coherent, knowledgeable answers in more text, or generate original art.
Open Source LLMs
Definition: Open Source Large Language Models are community-developed models with source code available to the general public for inspection, modification, and distribution, allowing any developer to adapt the code as they see fit.
Open Source LLMs promote constant improvement, collaboration, innovation, and the democratization of AI by enabling individuals and organizations to leverage and build upon them for further advancements. Having transparent, accessible AI models is a huge plus for continued development and innovation in the AI space.
Closed-source models still have a purpose, backed by dedicated centralized teams, clear goals, and defined compute resources, but open-source models drive innovation and pressure other AI companies to keep developing.
Open-source models from Meta (Llama) and Mistral AI are powerful foundations for developing custom AI and AI agents fine-tuned for specific tasks. Organizations can also train these LLMs on proprietary company data, and because they are open source, they can be downloaded and run locally in environments where API calls back to a closed-source model would be a security risk.
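As a minimal sketch of what "downloaded and run locally" can look like, the snippet below loads an open model through the Hugging Face transformers library (our assumed toolchain). The model name is a small placeholder; a Llama or Mistral checkpoint such as "mistralai/Mistral-7B-Instruct-v0.2" would typically require license acceptance and far more GPU memory.

```python
# Local inference sketch with Hugging Face transformers (assumed tooling).
# Requires: pip install transformers torch
from transformers import pipeline

# Placeholder model name -- swap in a Llama or Mistral checkpoint you have access to.
generator = pipeline("text-generation", model="distilgpt2")

prompt = "Summarize our internal returns policy in one sentence:"
result = generator(prompt, max_new_tokens=60, do_sample=False)

# Everything runs on local hardware; no API call leaves the machine.
print(result[0]["generated_text"])
```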
Custom Local Models
Definition: Custom local models are open-source LLMs trained on domain knowledge and fine-tuned for a specific task, a trend made possible by the increased adoption and surge of available Open Source LLMs.
Fine-tuned, domain-specific AI models are the future of implementing AI in every industry. By feeding AI models domain data and knowledge, they produce results more aligned with our expectations. Not every organization needs the most advanced, most generalized, best-performing model; frontier models like GPT-4 have enormous parameter counts and are computationally expensive to run.
By augmenting a smaller, existing open-source model, developers and enthusiasts can run AI efficiently on commodity hardware. These models can be custom fine-tuned for almost any scenario, from customer support to supply chain management to a compact LLM for document review. This is especially relevant for specialized industries with industry-specific jargon, such as healthcare, finance, and legal. Running these smaller models on local hardware also keeps sensitive data in-house, which is what lets data-security-minded industries like healthcare and legal adopt them.
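One common way to augment a smaller open-source model is parameter-efficient fine-tuning such as LoRA. The sketch below uses the Hugging Face peft library (our choice of tooling, not one named above) to wrap a small base model with trainable low-rank adapters before training it on domain data; the base model name is a placeholder.

```python
# Parameter-efficient fine-tuning sketch with LoRA via peft (assumed tooling).
# Requires: pip install transformers peft torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_name = "distilgpt2"  # placeholder; any small open-source causal LM works
tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(base_name)

# Only small low-rank adapter matrices are trained; the base weights stay frozen,
# which is what makes fine-tuning feasible on commodity hardware.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here the wrapped model is trained as usual (e.g. with transformers.Trainer)
# on domain-specific text such as support tickets, contracts, or clinical notes.
```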
AI Robotics
Definition: AI Robotics refers to robots that are not pre-programmed for a fixed routine and can perform generalized tasks, with the perception, goal-setting, and self-optimization capabilities that are paramount for robots to have better problem-solving skills.
AI Robotics bridges the digital AI world and the physical world that computers must interact with. Our computers already use computer vision to see, and now we are giving them the ability to process and perceive. NVIDIA highlighted robotics during its 2024 GTC keynote, emphasizing AI perception. Training humanoid robots to walk, run, lift boxes, and respond to commands is only possible with multi-modal AI, which helps them visualize and interpret physical objects as well as comprehend voice commands. For example, a robot can understand a vocal prompt to pick up an apple, and its built-in computer vision can decipher which object in front of it is the apple.
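As a hedged sketch of that "which object is the apple" step, the snippet below scores a camera frame against text labels using a zero-shot image-classification pipeline with a CLIP-style model from Hugging Face; this is our own illustration, not the stack used in NVIDIA's keynote, and the image path and labels are placeholders.

```python
# Zero-shot object identification sketch (illustrative only, assumed tooling).
# Requires: pip install transformers pillow torch
from transformers import pipeline
from PIL import Image

# A CLIP-style model scores an image against arbitrary text labels.
classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")

# Placeholder for a frame captured by the robot's camera.
image = Image.open("camera_frame.jpg")

# Candidate labels could be derived from the spoken command ("pick up an apple").
labels = ["an apple", "a banana", "a coffee mug", "a cardboard box"]
scores = classifier(image, candidate_labels=labels)

# The highest-scoring label tells the robot which object matches the request.
print(scores[0])  # e.g. {'score': ..., 'label': 'an apple'}
```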
Custom local models enable the robot's onboard processor to run highly specific, smaller AI tasks without being overloaded by a massive LLM, and open-source models let developers write custom functions to deploy on the robot. By keeping models small and letting developers build their own custom AI models, robotics processors can remain smaller and more energy-efficient while running a multitude of local AI tasks.