AI 'gold rush' for chatbot training data could run out of human-written text

MATT O'BRIEN
June 06, 2024

Artificial intelligence systems like ChatGPT could soon run out of what keeps making them smarter -- the tens of trillions of words people have written and shared online.

A new study released Thursday by research group Epoch AI projects that tech companies will exhaust the supply of publicly available training data for AI language models by roughly the turn of the decade -- sometime between 2026 and 2032.

Comparing it to a "literal gold rush" that depletes finite natural resources, Tamay Besiroglu, an author of the study, said the AI field might face challenges in maintaining its current pace of progress once it drains the reserves of human-generated writing.

In the short term, tech companies like ChatGPT-maker OpenAI and Google are racing to secure and sometimes pay for high-quality data sources to train their AI large language models -- for instance, by signing deals to tap into the steady flow of sentences coming out of Reddit forums and news media outlets.

In the longer term, there won't be enough new blogs, news articles and social media commentary to sustain the current trajectory of AI development, putting pressure on companies to tap into sensitive data now considered private -- such as emails or text messages -- or to rely on less-reliable "synthetic data" spit out by the chatbots themselves.

"There is a serious bottleneck here," Besiroglu said. "If you start hitting those constraints about how much data you have, then you can't really scale up your models efficiently anymore. And scaling up models has been probably the most important way of expanding their capabilities and improving the quality of their output."

The researchers first made their projections two years ago -- shortly before ChatGPT's debut -- in a working paper that forecast a more imminent 2026 cutoff of high-quality text data. Much has changed since then, including new techniques that enabled AI researchers to make better use of the data they already have and sometimes "overtrain" on the same sources multiple times.

But there are limits, and after further research, Epoch now foresees running out of public text data sometime in the next two to eight years.

The team's latest study is peer-reviewed and due to be presented at this summer's International Conference on Machine Learning in Vienna, Austria. Epoch is a nonprofit institute hosted by San Francisco-based Rethink Priorities and funded by proponents of effective altruism -- a philanthropic movement that has poured money into mitigating AI's worst-case risks.

Besiroglu said AI researchers realized more than a decade ago that aggressively expanding two key ingredients -- computing power and vast stores of internet data -- could significantly improve the performance of AI systems.

The amount of text data fed into AI language models has been growing about 2.5 times per year, while computing has grown about 4 times per year, according to the Epoch study. Facebook parent company Meta Platforms recently claimed the largest version of its upcoming Llama 3 model -- which has not yet been released -- has been trained on up to 15 trillion tokens, each of which can represent a piece of a word.

But how much it's worth worrying about the data bottleneck is debatable.

"I think it's important to keep in mind that we don't necessarily need to train larger and larger models," said Nicolas Papernot, an assistant professor of computer engineering at the University of Toronto and researcher at the nonprofit Vector Institute for Artificial Intelligence.

Papernot, who was not involved in the Epoch study, said building more skilled AI systems can also come from training models that are more specialized for specific tasks. But he has concerns about training generative AI systems on the same outputs they're producing, leading to degraded performance known as "model collapse."

Training on AI-generated data is "like what happens when you photocopy a piece of paper and then you photocopy the photocopy. You lose some of the information," Papernot said. Not only that, but Papernot's research has also found it can further encode the mistakes, bias and unfairness that are already baked into the information ecosystem.

If real human-crafted sentences remain a critical AI data source, those who are stewards of the most sought-after troves -- websites like Reddit and Wikipedia, as well as news and book publishers -- have been forced to think hard about how they're being used.

"Maybe you don't lop off the tops of every mountain," jokes Selena Deckelmann, chief product and technology officer at the Wikimedia Foundation, which runs Wikipedia. "It's an interesting problem right now that we're having natural resource conversations about human-created data. I shouldn't laugh about it, but I do find it kind of amazing."

While some have sought to close off their data from AI training -- often after it's already been taken without compensation -- Wikipedia has placed few restrictions on how AI companies use its volunteer-written entries. Still, Deckelmann said she hopes there continue to be incentives for people to keep contributing, especially as a flood of cheap and automatically generated "garbage content" starts polluting the internet.

AI companies should be "concerned about how human-generated content continues to exist and continues to be accessible," she said.

From the perspective of AI developers, Epoch's study says paying millions of humans to generate the text that AI models will need "is unlikely to be an economical way" to drive better technical performance.

As OpenAI begins work on training the next generation of its GPT large language models, CEO Sam Altman told the audience at a United Nations event last month that the company has already experimented with "generating lots of synthetic data" for training.

"I think what you need is high-quality data. There is low-quality synthetic data. There's low-quality human data," Altman said. But he also expressed reservations about relying too heavily on synthetic data over other technical methods to improve AI models.

"There'd be something very strange if the best way to train a model was to just generate, like, a quadrillion tokens of synthetic data and feed that back in," Altman said. "Somehow that seems inefficient."

------------

The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP's text archives.

Copyright markethundred.com