Radar Trends to Watch: January 2025

News

Despite its 31 days, December is a short month. It is hard for announcements and events other than office parties to get attention. To combat this trend, OpenAI released a series of announcements: “12 Days of OpenAI”. Not to be outdone, Google responded with a flurry of announcements, including its Gemini 2.0 Flash Thinking model. Models appeared that could use streaming audio and video for both input and output. But perhaps the most important announcement was DeepSeek-V3, a very large mixture-of-experts model (671B parameters) that performs on par with the other top models but cost roughly one-tenth as much to train.

AI

  • DeepSeek-V3 is another LLM to watch. Its performance is on par with Llama 3.1, GPT-4o, and Claude Sonnet. Although training was not cheap, its cost was estimated to be about 10% of what the larger models cost.
  • Not to be outdone by Google, OpenAI previewed its next models: o3 and o3-mini. Both are “reasoning models” that have been trained to solve logical problems. They may be released at the end of January; OpenAI is looking for safety and security researchers for testing.
  • Not to be outdone by the 12 Days of OpenAI, Google released a new experimental model that was trained to solve logical problems: Gemini 2.0 Flash Thinking. Unlike OpenAI’s GPT models that support reasoning, Flash Thinking shows its chain of thought explicitly.
  • Jeremy Howard and his team have released ModernBERT, a major upgrade to the original BERT model released six years ago. It comes in two sizes: 139M and 395M parameters. It is ideal for retrieval, classification, entity extraction, and other parts of a data pipeline.
  • AWS’s Bedrock service has the ability to check the output of other models for hallucinations.
  • Not to be outdone by the 12 Days of OpenAI, Google announced Android XR, an operating system for augmented reality headsets and glasses. Google doesn’t plan to build its own hardware; it is working with Samsung, Qualcomm, and other manufacturers.
  • Not to be outdone by the 12 Days of OpenAI, Anthropic announced Clio, a privacy-preserving approach to discovering how people use its models. This information will be used to help Anthropic better understand security issues and create more useful models.
  • Not to be outdone by the 12 Days of OpenAI, Google announced Gemini 2.0 Flash, a multimodal model that supports streaming for both input and output. The announcement also introduced Astra, an AI agent for smartphones. Neither is widely available yet.
  • OpenAI has released canvas, a new feature that combines programming with writing. Changes to the canvas (code or text) immediately become part of the context. Python code is run in the browser using Pyodide (Wasm), rather than in a container (as with Code Interpreter).
  • Stripe has announced a toolkit for agents that lets you build payments into agent workflows. Stripe recommends using the toolkit in test mode until the application is thoroughly validated.
  • Simon Willison shows how to run a GPT-4 class model (Llama 3.3 70B) on a reasonably well-equipped laptop (64GB MacBook Pro M2).
  • As part of its 12 Days of OpenAI series, OpenAI has finally released its video generation model, Sora. It’s free for ChatGPT Plus subscribers, though limited to 50 five-second video clips per month; a ChatGPT Pro account loosens many restrictions.
  • Researchers have shown that advanced AI models, including Claude 3 Opus and OpenAI o1, are capable of “scheming”: working against the interests of their users to achieve their own goals. Scheming includes subverting oversight mechanisms, intentionally delivering subpar results, and even taking steps to avoid being shut down or replaced. Hello, HAL?
  • Roaming RAG is a new technique for retrieval-augmented generation that finds relevant content by searching through headings and navigating documents, as a human would. It requires well-structured documents. A surprisingly simple idea, really.
  • Google announced PaliGemma 2, a new version of its Gemma models that includes vision.
  • o1-preview is no more; the preview is now the real thing, OpenAI o1. In addition to advanced reasoning capabilities, the production version claims to be faster and to provide more consistent results.
  • A group of AI agents inside Minecraft behaved surprisingly like humans, even developing occupations and religions. Is this a way to model how human groups work together?
  • One thing the AI industry desperately needs (besides more power) is better benchmarks. Current benchmarks are often closed, easy to game (which is what AI does), and unreproducible, and they may not test anything meaningful. Better Bench is a framework for assessing benchmark quality.
  • Palmyra Creative, a new language model from Writer, promises the ability to develop a “style” so that all AI-generated output doesn’t sound boringly the same.
  • During training, AI picks up biases from human data. When humans interact with AI, there is a feedback loop that reinforces those biases.
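
The Roaming RAG item above describes finding content by walking a document's headings rather than by embedding search. A minimal sketch of that idea follows, assuming Markdown-style "#" headings. The names Section, parse_outline, outline, and open_section are hypothetical and are not taken from any published Roaming RAG code. In a real system, the outline would be shown to the model, and open_section would be exposed as a tool the model calls to read a section.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """One heading and the text that sits directly under it."""
    title: str
    level: int
    body: list[str] = field(default_factory=list)
    children: list[Section] = field(default_factory=list)


def parse_outline(markdown: str) -> Section:
    """Build a tree of sections from '#'-style headings (assumed format)."""
    root = Section(title="(document)", level=0)
    stack = [root]
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            level = len(match.group(1))
            node = Section(title=match.group(2).strip(), level=level)
            # Climb back up until the top of the stack is a shallower heading.
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].body.append(line)
    return root


def outline(section: Section, depth: int = 0) -> list[str]:
    """Indented list of headings: the compact view a model would browse."""
    lines = []
    for child in section.children:
        lines.append("  " * depth + child.title)
        lines.extend(outline(child, depth + 1))
    return lines


def open_section(section: Section, title: str) -> str | None:
    """Return the text of the first section with this title, or None."""
    for child in section.children:
        if child.title.lower() == title.lower():
            return "\n".join(child.body).strip()
        found = open_section(child, title)
        if found is not None:
            return found
    return None


DOC = """# Handbook
Intro text.
## Installation
Run the installer.
## Configuration
Edit settings.toml.
### Logging
Set level to debug.
"""

root = parse_outline(DOC)
print("\n".join(outline(root)))
print(open_section(root, "Logging"))
```

The sketch also shows why the technique needs well-structured documents: without meaningful headings, the outline gives the model nothing to navigate.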

Programming

  • Unicon may never become one of the top 20 (or top 100) programming languages, but it is a descendant of Icon, which has always been my favorite string processing language.
  • What do CAPTCHAs mean when LLM-equipped bots can successfully complete tasks designed for humans?
  • egui, along with eframe, is a GUI library and framework for Rust. It’s portable and runs natively (on macOS, Windows, Linux, and Android), on the web (using Wasm), and in many game engines.
  • For the archivist in us: The Manx Project is not about an island in the Irish Sea or about cats. It is a catalog of manuals for old computers.
  • Cerbrec is a graphical Python framework for deep learning. It is aimed at Python programmers who do not have sufficient expertise to build applications with PyTorch or other artificial intelligence libraries.
  • GitHub has announced free access to GitHub Copilot for all existing and new users. Free access gives you 2,000 code completions and 50 chat messages per month. In addition to GPT-4o, GitHub has added the ability to use Claude 3.5 Sonnet.
  • Devin, an AI-assisted coding tool that claims to support end-to-end software development, including design and debugging, has reached general availability.
  • JSON5, also known as “JSON for humans”, is a variant of JSON that was designed to be human-readable so that it can be written and maintained by hand, for example in configuration files.
  • AWS announced two major new services: Aurora DSQL, which is a distributed SQL database, and S3 Tables, which supports data lakes through Apache Iceberg.
  • AutoFlow is an open source knowledge graph creation tool. It is based on TiDB (vector database), LlamaIndex and DSPy.

Security

  • Portspoof is a security tool that makes all 65,535 TCP ports appear to be open, emulating a valid service on each one. That makes it difficult for an attacker to determine which ports are actually open without examining each port.
  • Let’s Encrypt, the organization that issues certificates that websites (and other apps) use to prove their identity, has announced short-lived certificates that expire after six days. Short-lived certificates increase security by limiting the window of exposure if a private key is compromised.
  • Due to the continued presence of attackers in telecommunications networks, the US FBI and CISA have recommended the use of encrypted communication protocols. (Though they still want a backdoor into encryption systems, which would make them vulnerable to attack.)
  • A new phishing attack uses corrupted Word documents to bypass security checks. Even though the documents are damaged, Word can recover them.
  • LLM Flowbreaking is a new class of attacks against language models that prevent guardrails from stopping unwanted output from reaching the user. These attacks exploit race conditions in the application’s interaction with users.
  • Bootkitty is a UEFI bootkit that targets secure boot on Ubuntu systems. It appears to have been developed by cybersecurity students in Korea and then leaked (perhaps accidentally). It has not yet been found in the wild, but when it is, it will be a dangerous threat.
  • DEF CON has launched a project to improve the cybersecurity of US water infrastructure. It is starting with six water companies that serve rural communities.
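
The Portspoof item above describes answering on every port with something that looks like a real service. A toy sketch of that idea follows. It is not Portspoof's implementation: FAKE_BANNERS, banner_for, handle, and serve are hypothetical names, the banners are invented, and the sketch listens only on the ports it is given rather than on all 65,535.

```python
import asyncio

# Invented banners for illustration; a real tool would use a much larger
# and more convincing set of service signatures.
FAKE_BANNERS = [
    b"SSH-2.0-OpenSSH_8.9\r\n",
    b"220 mail.example.com ESMTP ready\r\n",
    b"+OK POP3 server ready\r\n",
    b"220 FTP server ready\r\n",
]


def banner_for(port: int) -> bytes:
    """Pick a banner deterministically so repeated scans get the same answer."""
    return FAKE_BANNERS[port % len(FAKE_BANNERS)]


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Send the fake banner for the local port that was contacted, then hang up."""
    port = writer.get_extra_info("sockname")[1]
    writer.write(banner_for(port))
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def serve(ports, host: str = "127.0.0.1"):
    """Start one listener per port; the caller is responsible for closing them."""
    return [await asyncio.start_server(handle, host, port) for port in ports]
```

Because every port responds with a plausible banner, a scanner cannot separate real services from decoys without probing each port more deeply, which is the cost the tool is meant to impose on an attacker.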

Quantum computing

  • Google has built a quantum computing chip in which an error-corrected logical qubit can remain stable for an hour. It operates “below threshold”: the error rate goes down as physical qubits are added to correct errors. The chip was built at Google’s new fabrication facility.
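
To see what “below threshold” means, here is a small sketch using the common textbook approximation for surface codes, in which the logical error rate scales as (p / p_th) raised to the power (d + 1) / 2, where p is the physical error rate, p_th is the threshold, and d is the code distance. The numeric values, the prefactor, and the function names are assumptions chosen for illustration; they are not measurements from Google's chip.

```python
def physical_qubits(distance: int) -> int:
    """Qubits in a distance-d surface code: d*d data plus d*d - 1 measurement."""
    return 2 * distance * distance - 1


def logical_error_rate(p: float, p_th: float, distance: int,
                       prefactor: float = 0.03) -> float:
    """Heuristic scaling law; the prefactor is an arbitrary illustrative value."""
    return prefactor * (p / p_th) ** ((distance + 1) / 2)


P_TH = 0.01  # assumed threshold, for illustration only

for d in (3, 5, 7):
    below = logical_error_rate(0.005, P_TH, d)  # physical error rate under threshold
    above = logical_error_rate(0.012, P_TH, d)  # physical error rate over threshold
    print(f"d={d} qubits={physical_qubits(d):3d} below={below:.5f} above={above:.5f}")
```

When the physical error rate is under the threshold, each increase in distance (and therefore in physical qubits) shrinks the logical error rate; when it is over the threshold, adding qubits makes things worse.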

Web

  • Google is adding “store reviews” to Chrome. The reviews are AI-generated summaries of reports from well-known sources that flag fraud and other problems.
  • Here’s how to create user interfaces for streaming text on the web. Text streaming is almost a must for building AI-driven chatbots.

Biology

  • Yes, we can have virtual taste. A research group has developed a lollipop interface so people can experience taste in virtual worlds.

