Devin, the First AI Programmer, is Born
On March 13, Cognition Labs launched its first AI software engineer, Devin. Based on the information released so far, Devin appears to be one of the strongest AI software engineers available.
On the SWE-bench benchmark, Devin solved 13.86% of the problems without human assistance, roughly seven times the 1.96% achieved by the previous state-of-the-art model.
Judged on these results alone, Devin is better at solving real-world software problems than current models such as GPT-4 and Claude.
The company tweeted that Devin not only passed an interview at one of the industry's leading AI companies but also completed a job on the freelance platform Upwork.
If real-world work is no longer out of reach for AI, will Devin make software engineers obsolete?
Devin is not yet available to the public. However, a developer who took part in testing reported that Devin was the first AI to actually help with their coding; earlier attempts with other tools had failed. The developer asked Devin to extract selectors from a simple HTML page. Other AI tools such as GPT-4-turbo, Claude, Groq, and Llama 2 were unable to complete this task, but Devin succeeded in just 10 seconds.
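The report does not say which page or which selectors were involved, so the following is only a rough illustration of what a task like "extract selectors from a simple HTML page" might look like. It is a minimal sketch using Python's standard library; the `SelectorCollector` class, the `extract_selectors` function, and the sample page are all hypothetical and are not taken from the developer's test or from Devin's output.

```python
from html.parser import HTMLParser


class SelectorCollector(HTMLParser):
    """Collect simple CSS selectors (tag, #id, .class) seen in a page."""

    def __init__(self):
        super().__init__()
        self.selectors = []  # unique selectors, in first-seen order

    def _add(self, selector):
        if selector not in self.selectors:
            self.selectors.append(selector)

    def handle_starttag(self, tag, attrs):
        # Every opening tag contributes a type selector.
        self._add(tag)
        for name, value in attrs:
            if not value:
                continue  # skip attributes with no value
            if name == "id":
                self._add(f"#{value}")
            elif name == "class":
                # A class attribute may list several class names.
                for cls in value.split():
                    self._add(f".{cls}")


def extract_selectors(html):
    """Return the simple selectors found in an HTML string."""
    collector = SelectorCollector()
    collector.feed(html)
    collector.close()
    return collector.selectors


# Hypothetical sample page, invented for this illustration.
SAMPLE_PAGE = """
<html><body>
  <div id="main" class="card wide">
    <p class="intro">Hello</p>
    <a class="card" href="/next">Next</a>
  </div>
</body></html>
"""

print(extract_selectors(SAMPLE_PAGE))
# ['html', 'body', 'div', '#main', '.card', '.wide', 'p', '.intro', 'a']
```

The sketch handles only tag, id, and class selectors. A real task could also call for attribute selectors or nested paths, which is part of why it makes a reasonable quick test of a coding assistant.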
So, how capable is this AI software engineer that has supposedly come to take programmers' jobs?
Full-stack abilities to create finished projects independently
Although many models on the market can program, they often require step-by-step prompts to generate a complete program.
With Devin, all you have to do is ask it to do something and wait for it to complete the task.
Devin writes code from natural-language instructions, builds complete programs, and deploys them online. It can also plan and execute complex tasks that require thousands of decisions.
This is made possible by advances in long-term reasoning and planning at Cognition AI, the company behind it, which allow Devin to recall relevant context at each step, learn over time, and correct its mistakes.
Devin can also learn unfamiliar technologies on its own and put them to use.
It can even train and fine-tune its own AI models. It appears that AI is about to close the loop.
According to Alex Atallah, former CTO of OpenSea, Devin is the first AI agent he has used that feels like working with a real, useful person.
Cognition AI claims that Devin has achieved a breakthrough in AI comprehension, going beyond predicting the next word or line of code to considering the overall approach to problem-solving.
Specifics still unknown, but the technology may follow the path of autonomous driving
The use of AI in software development is not a new concept. However, most AI products are designed to assist programmers rather than to work on their own. Cognition AI has not yet disclosed Devin's technical approach in detail, mentioning only briefly that the team has found a unique way to combine large language models (LLMs), such as OpenAI's GPT-4, with reinforcement learning techniques. This approach could be the key to a technological breakthrough.
After watching Devin's demo, Andrej Karpathy, an AI expert who recently departed OpenAI, provided some insights. He believes that the development of automated software engineering will follow a similar path to automated driving, where AI takes on more and more tasks while humans provide oversight.
$21 million Series A fundraising round, ten-person team
The company behind Devin is called Cognition AI. Based in New York and San Francisco, it positions itself as an applied AI lab focused on reasoning. The company was incorporated two months ago after working in secret. Along with the launch of Devin, it announced the closing of a $21 million Series A round led by Founders Fund.
The team has only 10 members, but between them they have won 10 IOI gold medals. The founding members have worked at the frontier of applied AI at Cursor, Scale AI, Lunchclub, Modal, Google DeepMind, Waymo, Nuro, and other companies.