TheOneCurly

TheOneCurly@lemm.ee · 15 days ago

The concept as I understand it is that Threads has the sheer volume of content to completely drown out the existing Fedi content if it fully opens the floodgates. If that occurs and, say, 90% of content becomes Threads, and they then start making Threads-only extensions to ActivityPub, servers will have to start patching those in and the ActivityPub project is de facto owned by Meta.

People also have issues with the Meta content moderation and the population on Threads, but as you noted that’s fixable on an individual and community level. The existential threat to the future of the Fediverse is why servers should defederate. Meta can’t and shouldn’t be trusted with any amount of power over this community project.

TheOneCurly@lemm.ee · 16 days ago

It’s a specter with a heart bleed.

TheOneCurly@lemm.ee · 19 days ago

Re-manifested? To fix it you have to re-enable Manifest V2. That should be simple for a while but will get more problematic over time.

TheOneCurly@lemm.ee · 27 days ago

That’s what this is about… Continual training of new models is becoming difficult because there’s so much generated content flooding data sets. They don’t become biased or overly refined, they stop producing output that resembles human text.

TheOneCurly@lemm.ee · 1 month ago

Doing good work takes time to make money, and execs need those quarterly bonuses right now. Much easier to do a bunch of layoffs and get that line up now.

TheOneCurly@lemm.ee · 3 months ago

What’s funny is right at launch I would have seriously considered upgrading, but I’m on second gen Ryzen and that platform was deemed not new enough at the time. Now they’ve added a bunch of BS, and even though I think they’ve removed the restriction, I’m over the new shiny thing and am looking heavily into a full Linux setup.

TheOneCurly@lemm.ee · 3 months ago

Yep, Lemmy uses SMTP, and in my experience most self-hostable platforms do as well. You can see in the Lemmy config documents how it gets set up: https://join-lemmy.org/docs/administration/configuration.html
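
For a rough idea of what "uses SMTP" means in practice, here is a minimal Python sketch of an app handing a notification email to an SMTP server. It is not Lemmy's code (Lemmy is not written in Python), and the host, port, addresses, and the `build_notification` / `send_notification` helper names are all made up for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text email; nothing is sent here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_notification(msg: EmailMessage, host: str, port: int, login: str, password: str) -> None:
    """Hand the message to an SMTP server (assumes the server supports STARTTLS)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()              # upgrade the connection to TLS
        server.login(login, password)  # authenticate with the mail provider
        server.send_message(msg)


# Hypothetical values, for illustration only.
message = build_notification(
    sender="noreply@instance.example",
    recipient="user@mail.example",
    subject="Verify your email",
    body="Click the link to verify your account.",
)
# send_notification(message, "smtp.mail.example", 587, "noreply", "secret")
```

The platform only needs a server address and credentials; any provider that speaks SMTP will do, which is why so many self-hosted apps rely on it.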

TheOneCurly@lemm.ee · 3 months ago

Made with Gtk4, WebKitGTK, libadwaita and Flatpak.

WebKit based, which is interesting. I don’t have much experience with WebKit on Linux.

TheOneCurly@lemm.ee · edit-2 3 months ago

This is legal vs rude. It certainly is legal, and it was in the terms of service, for them to use the data in any way they see fit. But it’s also rude to bait and switch from being a message board to being an AI data source company. Users were led to believe they were entering into an agreement with one type of company and are now in an agreement with a totally different one.

You can smugly tell people they shouldn’t have made that decision 15 years ago when they started, but a little empathy is also cool.

Additionally: When you owe your entire existence and value to user goodwill it might not be a great idea to be rude to them.

TheOneCurly@lemm.ee · 3 months ago

I can only really speak to reddit, but I think this applies to all of the user generated content websites. The original premise, that everyone agreed to, was the site provides a space and some tools and users provide content to fill it. As information gets added, it becomes a valuable resource for everyone. Ads and other revenue streams become a necessary evil in all this, but overall directly support the core use case.

Now that content is being packaged into large language models to be either put behind a paywall or packed into other non-freely available services. Since they no longer seem interested in supporting the model we all agreed on, I see no reason to continue adding value and since they provided tools to remove content I may as well use them.

TheOneCurly@lemm.ee · 5 months ago

The goal posts of … respecting basic copyright?

TheOneCurly@lemm.ee · 6 months ago

TMDB has a pretty good one. https://developer.themoviedb.org/reference/intro/getting-started

TheOneCurly@lemm.ee · 6 months ago

The section about “regular language” is the reason. That’s not being cheeky, that’s a technical term. It immediately dives into some complex set theory stuff but that’s the place to start understanding.

TheOneCurly@lemm.ee · 6 months ago

As I understand it, NAT is a firewall with only a very basic configuration: allow all outbound and accept only established inbound. If you don’t expect to have any incoming connections and completely trust all your internal devices, then it’s good enough.

However, if you start wanting to port forward for servers (SSH, FTP, video games) you need to poke holes in the NAT firewall, and it has no additional configuration options to help you. The same goes if you have internal (e.g. IoT) devices that you don’t necessarily trust: there are no rules to block outbound traffic.
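
To make the "allow all outbound, accept only established inbound" behaviour concrete, here is a toy Python model of it. This is a sketch of the idea only, not how any real router implements NAT; the `NatLikeFilter` class and the addresses are invented for illustration.

```python
class NatLikeFilter:
    """Toy model: every outbound flow is allowed and remembered,
    inbound traffic is accepted only if it matches a remembered flow."""

    def __init__(self):
        # Set of (internal_endpoint, remote_endpoint) pairs seen going out.
        self.established = set()

    def outbound(self, internal, remote):
        # No outbound rules at all: always allowed, and the flow is recorded.
        self.established.add((internal, remote))
        return True

    def inbound(self, remote, internal):
        # Accepted only as a reply to something an internal device started.
        return (internal, remote) in self.established


nat = NatLikeFilter()
laptop = ("192.168.1.10", 50000)   # hypothetical internal device
website = ("203.0.113.5", 443)     # hypothetical remote server

unsolicited = nat.inbound(website, laptop)   # nothing established yet
nat.outbound(laptop, website)                # laptop opens a connection
reply = nat.inbound(website, laptop)         # now treated as established
```

Note that `outbound` has no way to say no, which is the IoT problem: an untrusted internal device can reach anything it likes.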

TheOneCurly@lemm.ee · 6 months ago

Sora can sometimes do 1 minute clips that mostly look ok as long as you don’t pay too close attention. We are incredibly far away from coherent, feature-length narratives and even those aren’t likely to be thematically interesting or engaging.

TheOneCurly@lemm.ee · 6 months ago

I wonder what the risks are of including deleted and pre-edited content in training data. Most of the edits are going to be typos and formatting; do you want 2-3 copies of the same message with typos in them for training data? Similarly, deleted comments are mostly nonsense, unhelpful, duplicate, or highly controversial things.

If someone wants to dig through and find individual users to restore that’s one thing, but I don’t think I’d immediately choose to train off of that other data unless I had to.

TheOneCurly@lemm.ee · 6 months ago

That’s what finally did in my 10-year-old Corsair. I was technically within specs on wattage with my new 4070, but certain loads would cause it to trip the overcurrent protection anyway.

TheOneCurly@lemm.ee · 6 months ago

We made a tag that can’t be reliably and deterministically scanned so we also included a machine learning model that takes a good guess at it.

I just don’t see how you could possibly rely on a black box model for anything important. You have no way to mathematically prove if there are collisions in the model output or not, and newer versions of the model can’t be made backwards compatible. So if you have a database of thousands of these tags scanned, then they discover a critical vulnerability and provide a new model, you’re SOL and everything you have is worthless.

TheOneCurly@lemm.ee · 6 months ago

There are hundreds of gTLDs now, maybe everyone can stop abusing country code TLDs and leave them for their intended purposes.

TheOneCurly@lemm.ee · 6 months ago

That’s why DNS-over-HTTPS is so important.
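
For anyone curious what a DNS-over-HTTPS lookup looks like on the wire, here is a small Python sketch that builds the GET URL described in RFC 8484: an ordinary DNS query, base64url-encoded without padding, passed in the `dns` parameter. It only builds the URL and sends nothing. The resolver address `https://doh.example/dns-query` is a placeholder and the function names are my own.

```python
import base64
import struct


def build_dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (qtype 1 = A record)."""
    # Header: ID=0, flags=0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def build_doh_url(hostname: str, resolver: str = "https://doh.example/dns-query") -> str:
    """Return the GET URL for a DoH lookup; the resolver URL is a placeholder."""
    query = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{resolver}?dns={encoded}"


url = build_doh_url("example.com")
```

Because the request travels inside normal HTTPS, someone watching the network sees a TLS connection to the resolver rather than the name being looked up.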