"Tech Prophet" Who Predicted the iPhone Now Predicts...

George Gilder - who predicted the iPhone 17 years early and gave Reagan the first microchip - is making his boldest call yet. He says an American nanotech "super-convergence" could mint more millionaires than any event in recent memory. He's found 3 stocks set to benefit before November 18's bombshell.

AI 'gold rush' for chatbot training data could run out of human-written text

MATT O'BRIEN
June 06, 2024

Artificial intelligence systems like ChatGPT could soon run out of what keeps making them smarter -- the tens of trillions of words people have written and shared online.

A new study released Thursday by research group Epoch AI projects that tech companies will exhaust the supply of publicly available training data for AI language models by roughly the turn of the decade -- sometime between 2026 and 2032.

Comparing it to a "literal gold rush" that depletes finite natural resources, Tamay Besiroglu, an author of the study, said the AI field might face challenges in maintaining its current pace of progress once it drains the reserves of human-generated writing.

In the short term, tech companies like ChatGPT-maker OpenAI and Google are racing to secure, and sometimes pay for, high-quality data sources to train their AI large language models, for instance by signing deals to tap into the steady flow of sentences coming out of Reddit forums and news media outlets.

In the longer term, there won't be enough new blogs, news articles and social media commentary to sustain the current trajectory of AI development. That will put pressure on companies to tap into sensitive data now considered private, such as emails or text messages, or to rely on less reliable "synthetic data" spit out by the chatbots themselves.

"There is a serious bottleneck here," Besiroglu said. "If you start hitting those constraints about how much data you have, then you can't really scale up your models efficiently anymore. And scaling up models has been probably the most important way of expanding their capabilities and improving the quality of their output."

The researchers first made their projections two years ago -- shortly before ChatGPT's debut -- in a working paper that forecast a more imminent 2026 cutoff of high-quality text data. Much has changed since then, including new techniques that enabled AI researchers to make better use of the data they already have and sometimes "overtrain" on the same sources multiple times.

But there are limits, and after further research, Epoch now foresees running out of public text data sometime in the next two to eight years.

The team's latest study is peer-reviewed and due to be presented at this summer's International Conference on Machine Learning in Vienna, Austria. Epoch is a nonprofit institute hosted by San Francisco-based Rethink Priorities and funded by proponents of effective altruism -- a philanthropic movement that has poured money into mitigating AI's worst-case risks.

Besiroglu said AI researchers realized more than a decade ago that aggressively expanding two key ingredients -- computing power and vast stores of internet data -- could significantly improve the performance of AI systems.

The amount of text data fed into AI language models has been growing about 2.5 times per year, while computing has grown about 4 times per year, according to the Epoch study. Facebook parent company Meta Platforms recently claimed that the largest version of its upcoming Llama 3 model, which has not yet been released, has been trained on up to 15 trillion tokens, each of which can represent a piece of a word.
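The growth figures above lend themselves to a quick back-of-the-envelope check. The Python sketch below is illustrative only: the 2.5x and 4x yearly growth rates and the 15 trillion tokens come from the article, but treating Llama 3's training set as the 2024 baseline and the 300-trillion-token size of the usable public text stock (`HYPOTHETICAL_STOCK_TOKENS`) are assumptions made for illustration, not figures from the Epoch study as reported here.

```python
import math

# Yearly growth factors reported by the Epoch study, as quoted in the article.
DATA_GROWTH_PER_YEAR = 2.5
COMPUTE_GROWTH_PER_YEAR = 4.0

# Starting point: the ~15 trillion tokens Meta cited for Llama 3,
# treated here as the 2024 baseline (an assumption for illustration).
BASELINE_YEAR = 2024
BASELINE_TOKENS = 15e12

# HYPOTHETICAL size of the usable public text stock; not a figure from the article.
HYPOTHETICAL_STOCK_TOKENS = 300e12


def compute_per_token_growth(compute_growth: float, data_growth: float) -> float:
    """Yearly growth factor of training compute spent per token of data."""
    return compute_growth / data_growth


def years_until_exhausted(current_tokens: float, stock_tokens: float, growth: float) -> int:
    """Whole years until a dataset growing by `growth` per year reaches the stock."""
    if growth <= 1:
        raise ValueError("growth must be greater than 1")
    years, size = 0, current_tokens
    while size < stock_tokens:
        size *= growth  # one more year of dataset growth
        years += 1
    return years


if __name__ == "__main__":
    ratio = compute_per_token_growth(COMPUTE_GROWTH_PER_YEAR, DATA_GROWTH_PER_YEAR)
    years = years_until_exhausted(BASELINE_TOKENS, HYPOTHETICAL_STOCK_TOKENS, DATA_GROWTH_PER_YEAR)
    print(f"Compute per token grows about {ratio:.1f}x per year")
    print(f"Hypothetical stock reached after {years} years, around {BASELINE_YEAR + years}")
```

Under these assumptions the script reports a 1.6x yearly rise in compute per token and a crossover after 4 years, around 2028, inside Epoch's 2026 to 2032 window. With the same code, an assumed stock of 150 trillion tokens gives 2027 and one of 600 trillion gives 2029, which shows why a projection like this is better stated as a range than a single date.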

But how much it's worth worrying about the data bottleneck is debatable.

"I think it's important to keep in mind that we don't necessarily need to train larger and larger models," said Nicolas Papernot, an assistant professor of computer engineering at the University of Toronto and researcher at the nonprofit Vector Institute for Artificial Intelligence.

Papernot, who was not involved in the Epoch study, said more skilled AI systems can also come from training models that are specialized for specific tasks. But he has concerns about training generative AI systems on their own outputs, which can degrade performance in a phenomenon known as "model collapse."

Training on AI-generated data is "like what happens when you photocopy a piece of paper and then you photocopy the photocopy. You lose some of the information," Papernot said. Not only that, but Papernot's research has also found it can further encode the mistakes, bias and unfairness that are already baked into the information ecosystem.
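Papernot's photocopy analogy can be made concrete with a deliberately simplified toy; it is not the method used in his research. Here a "model" is nothing more than a word-frequency table over a hypothetical vocabulary `w0` through `w999` with Zipf-like frequencies, and the sample sizes and generation count are arbitrary assumptions. Each generation samples text from the previous model and refits by counting, so any word that happens not to be sampled gets probability zero and can never come back.

```python
import random
from collections import Counter


def refit(probs: dict[str, float], n_samples: int, rng: random.Random) -> dict[str, float]:
    """Generate n_samples words from the current model, then re-estimate frequencies from them."""
    words = list(probs)
    sample = rng.choices(words, weights=[probs[w] for w in words], k=n_samples)
    counts = Counter(sample)
    # Words that were never sampled drop out of the refitted model entirely.
    return {w: c / n_samples for w, c in counts.items()}


def simulate(vocab_size: int = 1000, n_samples: int = 2000,
             generations: int = 10, seed: int = 0) -> list[int]:
    """Return the number of distinct words the model knows after each generation."""
    rng = random.Random(seed)
    # Zipf-like starting distribution: a few common words and a long tail of rare ones.
    weights = [1 / rank for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    probs = {f"w{i}": w / total for i, w in enumerate(weights)}
    vocab_history = [len(probs)]
    for _ in range(generations):
        probs = refit(probs, n_samples, rng)  # train on the previous model's output
        vocab_history.append(len(probs))
    return vocab_history


if __name__ == "__main__":
    for gen, size in enumerate(simulate()):
        print(f"generation {gen}: {size} distinct words")
```

The exact counts depend on the random seed, but the vocabulary can only hold steady or shrink from one generation to the next, because a word missing from one sample has zero probability in every later one. Real language models are far richer, yet this steady loss of rare cases mirrors the "you lose some of the information" effect the analogy describes.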

If real human-crafted sentences remain a critical AI data source, those who are stewards of the most sought-after troves -- websites like Reddit and Wikipedia, as well as news and book publishers -- have been forced to think hard about how they're being used.

"Maybe you don't lop off the tops of every mountain," jokes Selena Deckelmann, chief product and technology officer at the Wikimedia Foundation, which runs Wikipedia. "It's an interesting problem right now that we're having natural resource conversations about human-created data. I shouldn't laugh about it, but I do find it kind of amazing."

While some have sought to close off their data from AI training -- often after it's already been taken without compensation -- Wikipedia has placed few restrictions on how AI companies use its volunteer-written entries. Still, Deckelmann said she hopes there continue to be incentives for people to keep contributing, especially as a flood of cheap and automatically generated "garbage content" starts polluting the internet.

AI companies should be "concerned about how human-generated content continues to exist and continues to be accessible," she said.

From the perspective of AI developers, Epoch's study says paying millions of humans to generate the text that AI models will need "is unlikely to be an economical way" to drive better technical performance.

As OpenAI begins work on training the next generation of its GPT large language models, CEO Sam Altman told the audience at a United Nations event last month that the company has already experimented with "generating lots of synthetic data" for training.

"I think what you need is high-quality data. There is low-quality synthetic data. There's low-quality human data," Altman said. But he also expressed reservations about relying too heavily on synthetic data over other technical methods to improve AI models.

"There'd be something very strange if the best way to train a model was to just generate, like, a quadrillion tokens of synthetic data and feed that back in," Altman said. "Somehow that seems inefficient."

------------

The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP's text archives.

Copyright markethundred.com