AI 'gold rush' for chatbot training data could run out of human-written text

MATT O'BRIEN
June 06, 2024

Artificial intelligence systems like ChatGPT could soon run out of what keeps making them smarter -- the tens of trillions of words people have written and shared online.

A new study released Thursday by research group Epoch AI projects that tech companies will exhaust the supply of publicly available training data for AI language models by roughly the turn of the decade -- sometime between 2026 and 2032.

Comparing it to a "literal gold rush" that depletes finite natural resources, Tamay Besiroglu, an author of the study, said the AI field might face challenges in maintaining its current pace of progress once it drains the reserves of human-generated writing.

In the short term, tech companies like ChatGPT-maker OpenAI and Google are racing to secure and sometimes pay for high-quality data sources to train their AI large language models -- for instance, by signing deals to tap into the steady flow of sentences coming out of Reddit forums and news media outlets.

In the longer term, there won't be enough new blogs, news articles and social media commentary to sustain the current trajectory of AI development, putting pressure on companies to tap into sensitive data now considered private -- such as emails or text messages -- or to rely on less-reliable "synthetic data" spit out by the chatbots themselves.

"There is a serious bottleneck here," Besiroglu said. "If you start hitting those constraints about how much data you have, then you can't really scale up your models efficiently anymore. And scaling up models has been probably the most important way of expanding their capabilities and improving the quality of their output."

The researchers first made their projections two years ago -- shortly before ChatGPT's debut -- in a working paper that forecast a more imminent 2026 cutoff of high-quality text data. Much has changed since then, including new techniques that enabled AI researchers to make better use of the data they already have and sometimes "overtrain" on the same sources multiple times.

But there are limits, and after further research, Epoch now foresees running out of public text data sometime in the next two to eight years.

The team's latest study is peer-reviewed and due to be presented at this summer's International Conference on Machine Learning in Vienna, Austria. Epoch is a nonprofit institute hosted by San Francisco-based Rethink Priorities and funded by proponents of effective altruism -- a philanthropic movement that has poured money into mitigating AI's worst-case risks.

Besiroglu said AI researchers realized more than a decade ago that aggressively expanding two key ingredients -- computing power and vast stores of internet data -- could significantly improve the performance of AI systems.

The amount of text data fed into AI language models has been growing about 2.5 times per year, while computing has grown about 4 times per year, according to the Epoch study. Facebook parent company Meta Platforms recently claimed the largest version of its upcoming Llama 3 model -- which has not yet been released -- has been trained on up to 15 trillion tokens, each of which can represent a piece of a word.
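
As a rough illustration of the arithmetic behind projections like these, the sketch below compounds the roughly 2.5-times yearly growth in training text from a 15-trillion-token starting point and asks when it would reach a fixed stock of public text. The 300-trillion-token stock, the function name and the assumption of steady growth are hypothetical, chosen only for illustration; this is not Epoch's model, which accounts for far more.

```python
import math


def years_until_exhausted(current_tokens, yearly_growth, stock_tokens):
    """Years until a dataset growing by a constant yearly factor
    reaches a fixed stock of available text."""
    if current_tokens >= stock_tokens:
        return 0.0
    # Solve current * growth**t = stock for t.
    return math.log(stock_tokens / current_tokens) / math.log(yearly_growth)


current = 15e12   # tokens, the Llama 3 figure cited in the article
growth = 2.5      # yearly growth factor cited from the Epoch study
stock = 300e12    # HYPOTHETICAL total stock of public text, for illustration only

years = years_until_exhausted(current, growth, stock)
print(f"Stock reached after about {years:.1f} years")
```

Under these assumed numbers the stock is reached in a bit over three years. Because the growth compounds, even a stock several times larger would push the date back by only a few years.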

But how much it's worth worrying about the data bottleneck is debatable.

"I think it's important to keep in mind that we don't necessarily need to train larger and larger models," said Nicolas Papernot, an assistant professor of computer engineering at the University of Toronto and researcher at the nonprofit Vector Institute for Artificial Intelligence.

Papernot, who was not involved in the Epoch study, said building more skilled AI systems can also come from training models that are more specialized for specific tasks. But he has concerns about training generative AI systems on the same outputs they're producing, leading to degraded performance known as "model collapse."

Training on AI-generated data is "like what happens when you photocopy a piece of paper and then you photocopy the photocopy. You lose some of the information," Papernot said. Not only that, but Papernot's research has also found it can further encode the mistakes, bias and unfairness that are already baked into the information ecosystem.

If real human-crafted sentences remain a critical AI data source, those who are stewards of the most sought-after troves -- websites like Reddit and Wikipedia, as well as news and book publishers -- have been forced to think hard about how they're being used.

"Maybe you don't lop off the tops of every mountain," jokes Selena Deckelmann, chief product and technology officer at the Wikimedia Foundation, which runs Wikipedia. "It's an interesting problem right now that we're having natural resource conversations about human-created data. I shouldn't laugh about it, but I do find it kind of amazing."

While some have sought to close off their data from AI training -- often after it's already been taken without compensation -- Wikipedia has placed few restrictions on how AI companies use its volunteer-written entries. Still, Deckelmann said she hopes there continue to be incentives for people to keep contributing, especially as a flood of cheap and automatically generated "garbage content" starts polluting the internet.

AI companies should be "concerned about how human-generated content continues to exist and continues to be accessible," she said.

From the perspective of AI developers, Epoch's study says paying millions of humans to generate the text that AI models will need "is unlikely to be an economical way" to drive better technical performance.

As OpenAI begins work on training the next generation of its GPT large language models, CEO Sam Altman told the audience at a United Nations event last month that the company has already experimented with "generating lots of synthetic data" for training.

"I think what you need is high-quality data. There is low-quality synthetic data. There's low-quality human data," Altman said. But he also expressed reservations about relying too heavily on synthetic data over other technical methods to improve AI models.

"There'd be something very strange if the best way to train a model was to just generate, like, a quadrillion tokens of synthetic data and feed that back in," Altman said. "Somehow that seems inefficient."

------------

The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP's text archives.
