Even_Adder@lemmy.dbzer0.com · 2 days ago

I think it’s really disingenuous to mention the DeviantArt/Midjourney/Runway AI/Stability AI lawsuit without talking about how most of the infringement claims were dismissed by the judge.

Even_Adder@lemmy.dbzer0.com · 2 days ago

This isn’t about research into AI. What some people want will impact all research, criticism, analysis, and archiving. Please re-read the letter.

Even_Adder@lemmy.dbzer0.com · 3 days ago

Damn, this article is so biased.

Even_Adder@lemmy.dbzer0.com · 3 days ago

You should read this letter by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.

Why are scholars and librarians so invested in protecting the precedent that training AI LLMs on copyright-protected works is a transformative fair use? Rachael G. Samberg, Timothy Vollmer, and Samantha Teremi (of UC Berkeley Library) recently wrote that maintaining the continued treatment of training AI models as fair use is “essential to protecting research,” including non-generative, nonprofit educational research methodologies like text and data mining (TDM). If fair use rights were overridden and licenses restricted researchers to training AI on public domain works, scholars would be limited in the scope of inquiries that can be made using AI tools. Works in the public domain are not representative of the full scope of culture, and training AI on public domain works would omit studies of contemporary history, culture, and society from the scholarly record, as Authors Alliance and LCA described in a recent petition to the US Copyright Office. Hampering researchers’ ability to interrogate modern in-copyright materials through a licensing regime would mean that research is less relevant and useful to the concerns of the day.

Even_Adder@lemmy.dbzer0.com · 12 days ago

That title looks like a typical sports anime title.

Haikyu!!
Kuroko’s Basketball
Blue Lock
Hajime no Ippo
Ace of Diamond
Slam Dunk
Major
The Prince of Tennis
Yowamushi Pedal
One Outs
Baby Steps

I could go on, but I think I’ve made my point.

Even_Adder@lemmy.dbzer0.com · 13 days ago

Have you read this article by Cory Doctorow yet?

Even_Adder@lemmy.dbzer0.com · 17 days ago

She can’t be fully 2 without the paws and face though.

Even_Adder@lemmy.dbzer0.com · 17 days ago

But without the face she isn’t quite there yet.

Even_Adder@lemmy.dbzer0.com · 17 days ago

I’d say you’re mostly safe since we’re not even reaching step two.

Even_Adder@lemmy.dbzer0.com · 18 days ago

It should be fully legal because it’s still a person doing it. Like Cory Doctorow said in this article:

Break down the steps of training a model and it quickly becomes apparent why it’s technically wrong to call this a copyright infringement. First, the act of making transient copies of works – even billions of works – is unequivocally fair use. Unless you think search engines and the Internet Archive shouldn’t exist, then you should support scraping at scale: https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

Making quantitative observations about works is a longstanding, respected and important tool for criticism, analysis, archiving and new acts of creation. Measuring the steady contraction of the vocabulary in successive Agatha Christie novels turns out to offer a fascinating window into her dementia: https://www.theguardian.com/books/2009/apr/03/agatha-christie-alzheimers-research

The final step in training a model is publishing the conclusions of the quantitative analysis of the temporarily copied documents as software code. Code itself is a form of expressive speech – and that expressivity is key to the fight for privacy, because the fact that code is speech limits how governments can censor software: https://www.eff.org/deeplinks/2015/04/remembering-case-established-code-speech/

That’s all these models are: someone’s analysis of how the pieces of training data relate to each other, not the data itself. I feel like this is where most people get tripped up. Understanding how these things work makes it all obvious.
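
To make the "quantitative observations" idea concrete, here is a minimal sketch of the kind of vocabulary measurement the Agatha Christie example describes. The function name `vocabulary_stats` and the two sample strings are made up for illustration (they are not from any real novel), and this is nowhere near how an actual model is trained. It only shows that the output is a set of numbers about the text, not the text itself.

```python
import re

def vocabulary_stats(text: str) -> dict:
    """Return simple word statistics for a text (hypothetical helper)."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    distinct = set(words)
    return {
        "total_words": len(words),
        "distinct_words": len(distinct),
        # Share of distinct words; a lower value means a more repetitive vocabulary.
        "type_token_ratio": len(distinct) / len(words) if words else 0.0,
    }

# Toy stand-ins for an "early" and a "late" work (invented strings).
early = "the quick brown fox jumps over the lazy dog"
late = "the thing did the thing and the thing did it"

early_stats = vocabulary_stats(early)
late_stats = vocabulary_stats(late)

print(early_stats)
print(late_stats)
```

The second string scores a lower ratio because it reuses the same few words, which is the sort of contraction the linked research measured across whole novels.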

Even_Adder@lemmy.dbzer0.com · 25 days ago

They don’t train on random social media posts. Everything is sorted and approved.

Even_Adder@lemmy.dbzer0.com · 25 days ago

As long as there’s supervision during training, which there always will be, this isn’t really a problem. This just shows how bad it can get if you just train on generated stuff.

Even_Adder@lemmy.dbzer0.com · 26 days ago

It sounds a lot like this quote from Andrej Karpathy:

Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it’s not even clear how prior LLMs learn anything at all.

Even_Adder@lemmy.dbzer0.com · 26 days ago

It’s important we don’t let people of influence insulate themselves from criticism using this as a scapegoat.

Even_Adder@lemmy.dbzer0.com · 27 days ago

It’s already happening. A quote from Andrej Karpathy:

Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it’s not even clear how prior LLMs learn anything at all.

Even_Adder@lemmy.dbzer0.com · edited 27 days ago

You have to pretty much intentionally give it enough synthetic data to wreck it. OpenAI and Anthropic train their models on generated data to improve them. As long as there’s supervision during training, which there always will be, this isn’t really a problem.

https://openai.com/index/prover-verifier-games-improve-legibility/

https://www.anthropic.com/research/claude-character
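
As a rough illustration of what "supervision" over generated data can look like, here is a toy sketch. It is an assumption made for the sake of example and not how OpenAI or Anthropic actually do it: generated samples are passed through a checker, and only the ones that pass are kept for training. The names `verifier`, `candidates`, and `kept` are hypothetical.

```python
def verifier(question: tuple, answer: int) -> bool:
    """Hypothetical checker: accept a sample only if the sum is correct."""
    a, b = question
    return a + b == answer

# Pretend these (question, answer) pairs came from a generator; two are wrong.
candidates = [
    ((2, 3), 5),
    ((4, 4), 9),    # wrong answer, should be filtered out
    ((10, 7), 17),
    ((1, 1), 3),    # wrong answer, should be filtered out
]

# Keep only the samples the checker accepts.
kept = [(q, a) for q, a in candidates if verifier(q, a)]

print(kept)
```

The bad samples never reach the training set, which is the point: generated data only becomes a problem when nothing is checking it.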

Even_Adder@lemmy.dbzer0.com · 29 days ago

Any place is a good place for laptop grand finals.

Even_Adder@lemmy.dbzer0.com · 30 days ago

Gintama?

Even_Adder@lemmy.dbzer0.com · 30 days ago

When am I supposed to play Melty Blood?

Even_Adder@lemmy.dbzer0.com · 1 month ago

I forgot what passes for dining in some parts of the country.

Even_Adder@lemmy.dbzer0.com · 10 months ago

zyddnys/manga-image-translator: Translate manga/image

Even_Adder@lemmy.dbzer0.com · edited 10 months ago

The weights for Show-1 have been released

Even_Adder@lemmy.dbzer0.com · 10 months ago

Stability.ai released a suite of open source audio diffusion tools.