Introducing YouAgent with Code Execution
You.com introduces YouAgent, an AI agent with access to a computing environment, enabling it to run code to answer your STEM questions more reliably.
In this example, YouAgent calculates a monthly mortgage by writing and executing code.
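As a rough illustration of the kind of code such a query might produce, here is a minimal sketch of a monthly payment calculation using the standard fixed-rate amortization formula. The function name and the loan figures are hypothetical and chosen only for illustration; they are not taken from the original example.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortizing loan."""
    n = years * 12            # total number of monthly payments
    r = annual_rate / 12      # monthly interest rate
    if r == 0:
        return principal / n  # no interest: spread the principal evenly
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


# Hypothetical inputs, chosen only for illustration.
payment = monthly_payment(principal=300_000, annual_rate=0.06, years=30)
print(f"Monthly payment: ${payment:,.2f}")
```

Running the calculation rather than estimating it in text is what makes the answer checkable: the formula and inputs are visible, and the number comes from the interpreter.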
Disclaimer: YouAgent functionality can now be accessed through Genius Mode. Learn more about Genius Mode and other AI Modes.
This blog post was published prior to You.com’s latest AI advancements and may not reflect our current capabilities. With a foundation in search and the team’s AI expertise, You.com was perfectly positioned to enhance LLMs with live access to the Internet to address issues around hallucinations and transparency. As such, You.com is capable of tasks ranging from searching online to writing an essay, debugging code, creating digital art, solving complex problems, and more. Learn more about getting the most out of You.com.
Background
LLMs have enabled new ways of learning and creating on the internet. They provide long-form, useful, and conversational answers to many different types of questions. However, they come with several serious shortcomings:
- They cannot be trained frequently enough to stay up-to-date, which is necessary to provide the most accurate references and citations.
- They hallucinate — often confidently providing incorrect answers — about stock prices, recent news, people, and other important questions.
- They cannot reliably reason about math, science, and logic.
In 2022, You.com was the first to launch a consumer product with an LLM that could access and refer to the Internet to provide answers that are up-to-date and include citations [1].
In the spring of 2023, You.com was the first to introduce multi-modal chat outputs for consumers, accurately providing plots, charts, and interactive apps to offer a reliable alternative to text that may contain hallucinations for real-time topics (e.g., stock prices, weather, etc.) [2].
Introducing YouAgent
Today, You.com introduces YouAgent. The term “AI agent” comes from the machine learning community’s term for an AI that not only observes its environment but also takes action within that environment. Since its founding, You.com has aimed to be a Do-Engine that can help people actually get things done, and YouAgent is the next major milestone on the path toward that vision.
YouAgent’s first set of actions is enabled by a computing environment that runs Python code. The LLM can write code, run it in this environment, and then take further action based on the output of the code execution. This code interpreter tool, along with YouAgent’s multi-step reasoning process, enables it to answer complex STEM questions far more accurately than pure LLMs.
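The write, run, and react loop described above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not YouAgent's actual implementation: `stub_model` is a hypothetical stand-in for the LLM, `run_python` and `agent` are made-up names, and `exec` is used purely for brevity (it is not a sandbox).

```python
import contextlib
import io


def run_python(code: str) -> str:
    """Execute code and return what it printed, or the error message.

    Illustration only: exec() is not a sandbox and must not run untrusted code.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def stub_model(question: str, observations: list[str]) -> dict:
    """Hypothetical stand-in for the LLM: write code first, then answer."""
    if not observations:
        return {"action": "run_code", "code": "print(123456789 * 987654321)"}
    return {"action": "answer", "text": f"The result is {observations[-1]}."}


def agent(question: str, model=stub_model, max_steps: int = 5) -> str:
    """Alternate between asking the model and executing the code it writes."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = model(question, observations)
        if step["action"] == "answer":
            return step["text"]
        # Feed the execution output back so the next step can build on it.
        observations.append(run_python(step["code"]))
    return "No answer within the step limit."


print(agent("What is 123456789 * 987654321?"))
```

The key design point is that the final answer is built from the interpreter's output rather than from the model's own guess at the arithmetic, and that errors are also returned as observations so a later step could correct the code.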
How to use YouAgent
You can use YouAgent by starting your query with “@agent” or “/agent” in our AI chat interface. These trigger words will tell You.com that you would like it to act, which today means executing Python code in a computing environment. Note that action capabilities will expand in the future.
Currently, any logged-in You.com user can make up to five YouAgent queries per day. YouPro subscribers can make up to 100 YouAgent queries per day. Learn more about YouPro.
To see how YouAgent generates a response on You.com, view this sample.
Putting YouAgent to the Test
Asking an LLM to multiply large numbers or solve complex math and physics problems is similar to asking a normal person what “55 to the power of 0.12” is without giving them a calculator. Many chatbots on the market provide confident but wrong answers to STEM questions. Some chat providers even offer citations for incorrect reasoning on these types of questions.
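With an interpreter, the “calculator” is one line of Python. The short sketch below evaluates the expression from the analogy and cross-checks it with the identity a^b = exp(b · ln a); the variable names are illustrative only.

```python
import math

value = 55 ** 0.12

# Cross-check using the identity a**b == exp(b * ln(a)).
check = math.exp(0.12 * math.log(55))

print(f"55 ** 0.12 = {value:.4f}")
print(f"Cross-check agrees: {math.isclose(value, check)}")
```

The result is roughly 1.62, a figure that is hard to produce by intuition but trivial to compute.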
We find that code execution helps with these issues. Concretely, we perform better on several STEM benchmarks, sampled from the academic MMLU dataset (college math / high school math / high school statistics / high school physics categories), the ACT (math section), and the GRE (math section). We report the performance of YouAgent against GPT-4 to demonstrate the effectiveness of YouAgent on STEM questions compared to pure LLMs.
The table and chart below report the accuracy of YouAgent and GPT-4 on various STEM benchmarks, including academic benchmarks as well as US undergraduate / graduate entrance exams.
As shown in the images above, YouAgent consistently performs similarly to or better than GPT-4 on every benchmark. We observe a 27-percentage-point absolute increase in accuracy over GPT-4 on an official practice ACT math section, which is the difference between a C- (69%) and an A+ (96%) student. Relative performance does vary across tasks: YouAgent performs significantly better than GPT-4 on computation-heavy tests (e.g., the ACT and high school statistics) and marginally better than or on par with GPT-4 on more abstract, less computation-heavy math tests (e.g., the GRE and certain college math questions).
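To make the ACT comparison concrete, the snippet below separates the absolute gain from the relative gain using the two accuracies quoted above (69% and 96%). Only those two figures come from this post; the variable names are illustrative.

```python
gpt4_accuracy = 0.69      # ACT math practice section, as quoted in the text
youagent_accuracy = 0.96  # ACT math practice section, as quoted in the text

absolute_gain = youagent_accuracy - gpt4_accuracy  # difference in accuracy
relative_gain = absolute_gain / gpt4_accuracy      # gain relative to GPT-4's score

print(f"Absolute gain: {absolute_gain * 100:.0f} percentage points")
print(f"Relative gain: {relative_gain:.0%}")
```

The absolute gain is the 27 points cited above; measured relative to GPT-4's score, the same gap is an improvement of about 39%.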
If you want to access the underlying datasets, feel free to shoot us an email. We’re continuously taking steps to further improve our accuracy across different mathematical and scientific domains.
Comparisons to other chatbots without code execution
To illustrate some of these improvements, we contrast YouAgent to example answers from other large consumer LLM offerings (Google, ChatGPT+ [3], and Bing) as well as some smaller platforms.
With access to a code execution environment and its multi-step reasoning capabilities, YouAgent answers questions that involve mathematical operations more reliably than consumer LLM offerings that don’t use code execution.
We find that if GPT-4 cannot solve a problem, none of the companies that use its API will be able to solve that problem either. Given how widely the GPT-4 API is used, this results in many consumer chatbots giving confident but wrong answers to questions that require mathematical reasoning. For STEM questions, some chat engines even give citations for wrong answers. In some cases, the citations do not contain the stated facts at all; in other cases, they are misleading but suggest that the answer is backed up and correct.
Below, we provide some examples of YouAgent and other chatbots responding differently to STEM questions. Note that YouAgent also outperforms YouChat itself (that is, YouChat without @agent) on certain STEM questions. To access the YouAgent benchmark dataset with additional examples, please reach out to us.
Example #1:
YouAgent ✅, Link to YouAgent Answer
Other chatbots ❌
Example #2:
YouAgent ✅, Link to YouAgent Answer
Other chatbots ❌
Example #3:
YouAgent ✅, Link to YouAgent Answer
Other chatbots ❌
Limitations and future work
While YouAgent is able to perform well on various STEM tasks given its multi-step reasoning process combined with access to a coding environment, we still have not achieved 100% accuracy on our benchmarks. Progressing closer toward that goal will take additional research and development.
Another known limitation is that YouAgent often tries to execute code even when code is not required. We plan to keep improving its judgment about when to execute code so that it can better solve the variety of questions our users ask You.com every day.
We aim to expand YouAgent in the near future to support:
- file uploads
- image outputs such as plots and graphs
- ability to perform web search in conjunction with code execution
- more mathematical and scientific libraries
- better formatting of mathematical text
- continued performance improvements across various STEM benchmarks
If you would like YouAgent to include additional libraries beyond the initial dozen that we support at the moment or would like to request any other functionalities, please let us know. We welcome you to join our Discord or apply to join the team if this is a direction that excites you.
Conclusion
At You.com, we want to provide accurate answers to all questions. We want to go beyond providing knowledge and help you get things done. To that end, we continue to innovate by bringing our users AI that can access up-to-date information online, decide how best to present that information in different modalities, and now reason much better about logic, math, physics, and chemistry by writing and executing code.
For additional info about YouAgent and You.com, please refer to our Frequently Asked Questions.
Reference Notes
[1] Various papers, such as LaMDA, had been published earlier that describe tool use, but no consumer product had launched with citations and continuous internet access prior to YouChat. For the launch date of YouChat, see our announcement on Twitter.
[3] By default, ChatGPT+ runs without a code interpreter; enabling one requires changing settings. ChatGPT+ offers the functionality most similar to YouAgent through its “Advanced data analysis” option. However, this option is not available to any of the companies that use the GPT-3 or GPT-4 APIs.