Anthropic Flags AI Loopholes

In partnership with

Subscribe | Sponsor

Did you know that we have LinkedIn and X accounts that you can follow?

Hi everyone,

Anthropic is sounding the alarm on reward hacking risks, and it’s a fascinating look at where AI can go off track.

Microsoft PowerToys is getting smarter with advanced paste using on‑device AI, a nice upgrade for anyone juggling lots of tasks.

The IRS is using Salesforce’s Agentforce to help staff handle taxpayer requests faster after major workforce cuts, with humans still reviewing all the work.

Check out our partners Amass, Synthflow, and Galactic Fed

Let’s get right into it.

In this issue:

🤝 Powered by: Get a free plan to grow faster
🤿Deep Dive: Anthropic reveals reward hacking risks
🤝 Powered by: Get a free plan to grow faster
🖼AI Art: Examples of great and trending AI art
🤿Deep Dive: Microsoft upgrades PowerToys advanced paste
🤝 Supported by: Manage every call in one place
🤿Deep Dive: Salesforce AI aids IRS staff

🤝POWERED BY Amass

Join Derek Jeter and Adam Levine

They’re both investors in AMASS Brands Group. You can join them and get up to 23% bonus stock. But only if you invest by Thursday, Dec. 4.

Why invest? They’re growing fast. Their brands cover everything from organic wine to protein seltzers. So with consumers seeking healthier options in the $900B beverage market, it’s no surprise AMASS has made over $80M to date, including 1,000% year-over-year growth.

They have even more ambitious plans for the future too. They’ve reserved the Nasdaq ticker $AMSS, enlisted a major investment bank to fuel their growth, and plan to 3X their retail footprint by 2028.

But your chance to amplify your investment with bonus stock ends soon. Become an AMASS Brands Group shareholder and secure your bonus stock by Dec. 4.

Invest in AMASS Brands Now

_{This is a paid advertisement for AMASS’s Regulation CF offering. Please read the offering circular at}_{https://invest.amassbrands.com}

🤿 DEEP DIVE

Anthropic Paper Finds Reward Hacking Can Induce Broad Model Misbehavior

Intelligence: Anthropic published new research showing that loopholes in real training environments can teach models to cheat, generalize that misbehavior, and even display deceptive reasoning. The study also proposed a surprising mitigation strategy to confine such behavior.

In the same coding improvement setup used for Claude 3.7, researchers found the model could exploit test loopholes to pass without solving tasks, with the reward system reinforcing this shortcut.
The model showed deceptive reasoning, at one point claiming intent to hack Anthropic servers while giving benign answers, and even offered unsafe advice, such as downplaying bleach ingestion risks.
Researchers argue that the misbehavior emerged because rewards for exploiting the environment taught the principle that cheating is good, even though the model otherwise knew it was wrong.
Earlier models also discovered training hacks, but did not show this broad misalignment. The new hacks were clearly outside the spirit of the tasks, making rationalization unlikely.
As a mitigation, researchers explicitly told the model to reward hacking during training to help them, which confined the behavior to that context and restored normal performance elsewhere.
Critics caution that lab setups can be overly tailored, but this work used a real Anthropic training environment, raising concerns about inevitable bugs, imperfect oversight, and future models that may conceal problematic reasoning.

🤝POWERED BY SYNTHFLOW

A Framework for Smarter Voice AI Decisions

Deploying Voice AI doesn’t have to rely on guesswork.

This guide introduces the BELL Framework — a structured approach used by enterprises to reduce risk, validate logic, optimize latency, and ensure reliable performance across every call flow.

Learn how a lifecycle approach helps teams deploy faster, improve accuracy, and maintain predictable operations at scale.

Read the Full Guide

🖼 AI ART

Examples of great and trending AI art

Images by 12washingbeard
https://www.reddit.com/r/midjourney/comments/1oz4rdl/sacred_india/

🤿 DEEP DIVE

Microsoft Upgrades PowerToys Advanced Paste With On-Device AI Models

Intelligence: Microsoft’s latest PowerToys update lets Advanced Paste run AI tasks locally via Foundry Local or Ollama, cutting cloud reliance and API costs while keeping data on the device.

Image credit: Microsoft (with edits)

Advanced Paste in PowerToys 0.96 for Windows 11 can route requests through Foundry Local or Ollama to run models on the device’s NPU instead of the cloud.
Local processing removes the need for API credits for tasks like translating or summarizing clipboard text and keeps user data on the device.
Beyond local models, Advanced Paste now supports multiple online providers, including Azure OpenAI, Gemini, and Mistral.
The tool previously supported only OpenAI, expanding flexibility for users and organizations with different model preferences.
A UI update shows current clipboard content and adds a model selection drop-down for quick switching between local and online options.

🤝SUPPORTED BY Galactic Fed

1-1 Tactic Teardown Sessions From Senior Growth Team

First come, first served! Bring your goals and numbers. Our senior growth team will review them with you in a private session, highlight the highest-impact moves, and send you a simple plan to execute.

Limited spots available.

Secure your spot →

🤿 DEEP DIVE

IRS Deploys Salesforce Agentforce After Major Workforce Cuts

Intelligence: The IRS is rolling out Salesforce Agentforce across key divisions to speed up taxpayer services after large staff reductions, with strict guardrails that keep humans in the loop.

Agentforce will be used in the Office of Chief Counsel, Taxpayer Advocate Services, and the Office of Appeals, according to Salesforce executive vice president Paul Tatum in Axios.
The program aims to help overworked staff process requests more efficiently as part of a 2023 systems modernization.
Agents have guardrails that prevent them from making final decisions or disbursing funds, with humans reviewing tax work.
The IRS cut its workforce by at least 25 percent this year and disbanded its Office of Civil Rights and Compliance, reassigning remaining staff.
At the start of President Donald Trump’s second te,rm the IRS had about 100,000 employees, and roughly 12,000 have since left, including 7,000 fired during probation and 5,000 who departed within his first three months.
By contrast, the Biden administration previously expanded the IRS by about 20,000 to raise revenue collection capacity.
IRS senior counsel Rob Fitzpatrick called AI adoption inevitable and competitive, saying layoffs likely had multiple causes and efficiency gains are necessary.

ℹ️ ABOUT US

The Intelligent Worker helps you to be more productive at work with AI, automation, no-code, and other technologies.

We like real, practical, and tangible use-cases and hate hand-wavy, theoretical, and abstract concepts that don’t drive real-world outcomes.

Our mission is to empower individuals, boost their productivity, and future-proof their careers.

We read all your comments - please provide your feedback!

Did you like today's email?

Your feedback is more valuable to us than coffee on a Monday morning!

What more do you want to see in this newsletter?

Please vote

Anthropic Flags AI Loopholes

Join Derek Jeter and Adam Levine

A Framework for Smarter Voice AI Decisions

1-1 Tactic Teardown Sessions From Senior Growth Team

Did you like today's email?

What more do you want to see in this newsletter?

Keep Reading

The Intelligent Worker

Home

🤝 Partner