Bypassing Newspapers.com paywall and hunting down obituaries

DrNeurohax@kbin.social · 1 year ago

Agreed. I’m in my 40s, and I’ve never seen anywhere near the level of subsurface signaling and intentional complacency we’re experiencing now.

DrNeurohax@kbin.social · 1 year ago

Well, terrorists became boring, and they still want the loony wing of the GOP’s clicks, so best to back off on Nazis and pro-Russians, leaving pedophiles as the safest bet.

DrNeurohax@kbin.social · 1 year ago

At first glance, I probably thought JXL was another attempt at JPEG2000 by a few bitter devs, so I had ignored it.

Yeah, my examples/description was more intended to be conceptual for folks that may not have dealt with the nitty gritty. Just mental exercises. I’ve only done a small bit of image analysis, so I have a general understanding of what’s possible, but I’m sure there are folks here (like you) that can waaay outclass me on details.

These intermediate-to-deep dives are very interesting. Not usually my cup of tea, but this does seem big. Thanks for the info.

DrNeurohax@kbin.social · 1 year ago

(fair warning - I go a little overboard on the examples. Sorry for the length.)

No idea on the details, but apparently it’s more efficient for multithreaded reading/writing.

I guess that you could have a few threads reading the file data at once into memory. While one CPU core reads the first 50% of the file, a second can be reading in the second 50% (though I’m sure it’s not actually like that, but as a general example). Image compression usually works by some form of averaging over an area, so figuring out ways to chop the area up, such that those patches can load cleanly without data from the adjoining patches, is probably tricky.
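To make the "two cores, two halves" idea concrete, here’s a rough Python sketch. It only shows splitting a file into byte ranges and reading them at the same time; it has nothing to do with how JPEG XL actually lays out its data, and the names `read_range` and `parallel_read` are made up for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def read_range(path, start, length):
    """Read `length` bytes starting at byte offset `start`."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)

def parallel_read(path, workers=2):
    """Split the file into up to `workers` byte ranges and read them concurrently."""
    size = os.path.getsize(path)
    chunk = max(1, -(-size // workers))  # ceiling division, never zero
    ranges = [(start, min(chunk, size - start)) for start in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in range order, so joining restores the file.
        parts = list(pool.map(lambda r: read_range(path, *r), ranges))
    return b"".join(parts)
```

This works for raw bytes because any byte range can be read without knowing about its neighbors. Compressed image data is the hard part, since a patch often depends on values outside its own range.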

I found this semi-visual explanation with a quick google. The image in 3.4 is kinda what I’m talking about. In the end you need equally sized pixels, but during compression, you’re kinda stretching out the values and/or mapping of values to pixels.

Not an actual example, but highlights some of the problems when trying to do simultaneous operations…

Instead of pixels 1, 2, 3, 4 being colors 1.1, 1.2, 1.3, 1.4, you apply a function that assigns the colors 1.1, 1.25, 1.25, 1.4. You now only need to store the values 1.1, 1.25, 1.4 (along with location). A 25% reduction in color data. If you wanted to cut that sequence in half for 2 CPUs with separate memory blocks to read at once, you lose some of that optimization. Now CPU1 and CPU2 both need color 1.25, so it’s duplicated. Not a big deal in this example, but these bundles of values can span many pixels and intersect with other bundles (like color channels - blue can be most efficiently read in 3-pixel-wide chunks, green in 2-pixel-wide chunks, and red in 10-pixel-wide chunks). Now where do you chop those pixels up for the two CPUs? Well, we can use our “average 2 middle values in 4 pixel blocks” approach, but we’re leaving a lot of performance on the table with empty or useless values. So, we can treat each of those basic color values as independent layers.
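Here’s that toy scheme as a few lines of Python, just to show the bookkeeping. It’s the made-up “average the 2 middle values in a 4 pixel block” rule from above, not a real codec, and the function names are hypothetical.

```python
def compress_block(pixels):
    """Toy scheme: keep the outer values, average the two middle ones."""
    first, mid_a, mid_b, last = pixels
    return [first, (mid_a + mid_b) / 2, last]

def expand_block(stored):
    """Rebuild 4 pixels; both middle pixels share the averaged value."""
    first, middle, last = stored
    return [first, middle, middle, last]

def split_for_two_readers(stored):
    """Give each reader a self-contained half; the middle value is duplicated."""
    first, middle, last = stored
    return [first, middle], [middle, last]

block = [1.1, 1.2, 1.3, 1.4]
stored = compress_block(block)
left, right = split_for_two_readers(stored)
print(len(block), len(stored))   # 4 values in, 3 values stored
print(len(left) + len(right))    # back to 4 values once split for 2 readers
```

So the split wipes out the saving in this tiny case: 3 stored values become 4 again, because each half has to carry its own copy of the shared value.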

But, now that we don’t care how they line up, how do we display a partially downloaded image? The easiest way is to not show anything until the full image is loaded. Nothing nothing nothing Tada!

Or we can say we’ll wait at the end of every horizontal line for the values to fill in, display that line, then start processing the next. This is the old waiting for the picture to slowly load in 1 line at a time cliche. Makes sense from a human interpretation perspective.

But, what if we take 2D chunks and progressively fill in sub-chunks? If every pixel is a different color, it doesn’t help, but what about a landscape photo?

First values in the file: Top half is blue, bottom green. 2 operations and you can display that. The next values divide the halves in half each. If it’s a perfect blue sky (ignoring the horizon line), you’re done and the user can see the result immediately. The bottom half will have its values refined as more data is read, and after a few cycles the user will be able to see that there’s a (currently pixelated) stream right up the middle and some brownish plant on the right, etc. That’s the image loading in blurry and appearing to focus in cliche.
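A rough Python sketch of that “blurry, then focused” idea, using plain block averages on a tiny grayscale grid. This is only an illustration of coarse-to-fine passes under my own assumptions (square image, side length a power of two); real formats use more sophisticated transforms, and `block_average` and `progressive_passes` are made-up names.

```python
def block_average(image, size):
    """Shrink a square image to size x size by averaging equal square blocks."""
    step = len(image) // size
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            cells = [image[y][x]
                     for y in range(by * step, (by + 1) * step)
                     for x in range(bx * step, (bx + 1) * step)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

def progressive_passes(image):
    """Yield coarse-to-fine approximations: 1x1, 2x2, 4x4, ... up to full size."""
    size = 1
    while size <= len(image):
        yield block_average(image, size)
        size *= 2

# Hypothetical 4x4 "landscape": 0 = flat sky, larger numbers = ground detail.
landscape = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [4, 8, 8, 4],
    [4, 8, 8, 4],
]
for approx in progressive_passes(landscape):
    print(approx)
```

In the 2x2 pass the sky is already exact (all zeros), while the ground is still a single averaged value per block. Only the bottom half needs the later passes.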

All that is to say, if we can do that 2D chunk method for an 8k image, maybe we don’t need to wait until the 8k resolution is loaded if we need smaller images for a set. Maybe we can stop reading the file once we have a 1024x1024 pixel grid. We can have 1 high res image of a stoplight, but treat it as any resolution less than the native high res, thanks to the progressive loading.

So, like I said, this is a general example of the types of conditions and compromises. In reality, almost no one deals with the files on this level. A few smart folks write libraries to handle the basic functions and everyone else just calls those libraries in their paint, or whatever, program.

Oh, that was long. Um, sorry? haha. Hope that made sense!

DrNeurohax@kbin.social · 1 year ago

Oh, I’ve just been toying around with Stable Diffusion and some general ML tidbits. I was just thinking from a practical point of view. From what I read, it sounds like the files are smaller at the same quality, require the same or less processor load (maybe), are tuned for parallel I/O, can be encoded and decoded faster (with less of a performance gap between the two), and support progressive loading. I’m kinda waiting for the catch, but haven’t seen any major downsides, besides less optimal performance for very low resolution images.

I don’t know how they ingest the image data, but I would assume they’d be constantly building sets, rather than keeping lots of subsets, if just for the space savings of de-duplication.

(I kinda ramble below, but you’ll get the idea.)

Mixing and matching the speed/efficiency and storage improvement could mean a whole bunch of improvements. I/O is always an annoyance in any large set analysis. With JPEG XL, there’s less storage needed (duh), more images in RAM at once, faster transfer to and from disc, fewer cycles wasted on waiting for I/O in general, the ability to store more intermediate datasets and more descriptive models, easier to archive the raw photo sets (which might be a big deal with all the legal issues popping up), etc. You want to cram a lot of data into memory, since the GPU will be performing lots of operations in parallel. Accessing the I/O bus must be one of the larger time sinks and CPU load becomes a concern just for moving data around.

I also wonder if the support for progressive loading might be useful for more efficient, low resolution variants of high resolution models. Just store one set of high res images and load them in progressive steps to make smaller data sets. Like, say you have a bunch of 8k images, but you only want to make a website banner based on the model from those 8k res images. I wonder if it’s possible to use the progressive loading support to halt reading in the images at 1k. Lower resolution = less model data = smaller datasets to store or transfer. Basically skipping the downsampling.
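To show what “halt reading at 1k” could mean mechanically, here’s a toy Python sketch of a coarse-first stream where you stop reading once you have the resolution you want. Big assumptions: the layout is invented for the example (it stores every pass in full, which is wasteful; a real format would store refinements), sizes are powers of two, and I’m not claiming any JPEG XL library exposes exactly this. All function names are hypothetical.

```python
def shrink(image, size):
    """Average equal square blocks so the result is size x size."""
    step = len(image) // size
    return [[sum(image[y][x]
                 for y in range(by * step, (by + 1) * step)
                 for x in range(bx * step, (bx + 1) * step)) / step ** 2
             for bx in range(size)]
            for by in range(size)]

def encode_progressive(image):
    """Flatten passes 1x1, 2x2, ... full size into one coarse-first stream."""
    stream, size = [], 1
    while size <= len(image):
        for row in shrink(image, size):
            stream.extend(row)
        size *= 2
    return stream

def decode_up_to(stream, target):
    """Return the target x target pass and how far into the stream it ends."""
    offset, size = 0, 1
    while size < target:
        offset += size * size   # values taken up by the coarser passes
        size *= 2
    values = stream[offset:offset + size * size]
    image = [values[r * size:(r + 1) * size] for r in range(size)]
    return image, offset + size * size

full = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [4, 8, 8, 4],
    [4, 8, 8, 4],
]
stream = encode_progressive(full)
small, values_read = decode_up_to(stream, 2)
print(small, values_read, len(stream))
```

The 2x2 version only needs the first 5 of the 21 stored values here, so the reader never touches the full-resolution data. That’s the “skipping the downsampling” idea in miniature.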

Any time I see a big feature jump, like better file size, I assume the trade off in another feature negates at least half the benefit. It’s pretty rare, from what I’ve seen, to have improvements on all fronts.

DrNeurohax@kbin.social · 1 year ago

Even better, this must be fantastic when you’re training AI models with millions of images. The compression level AND performance should be a game changer.

DrNeurohax@kbin.social · 1 year ago

Thank you so much for this! It reminded me to revisit my library’s general resources and look specifically for which archive collections they had available. I’m 1 state over, so I figured there was a good chance we would have Newspapers.com Library Edition access here.

The main/default collection my library sent me to was no help, but they had a Newspapers.com Library Edition portal listed further down. Final-fucking-ly got it. I really, really appreciate the help.

DrNeurohax@kbin.social · 1 year ago

I’m sure it’s a fine service, if you want to use it regularly, but I just wanted 1 tiny thing. If they had a $1 for an obit or a page deal, sure. Instead, there’s this whole microcosm of bullshit where some are archived, others available, some omitted from public collections, some on different 3rd party sites, etc.

The family paid for an obit. It wasn’t in the 1800s. The paper has been digitized. I should be able to go to the paper with the name, exact date, and city and find it. They literally say it doesn’t exist. Not that it’s on our archive site or our partner site, just nothing.

I would have thrown a couple bucks to any of the sites for access, but no, I need to sign up for a subscription, give them all my details, get spam calls for the next 100 years, just no. Super frustrating.

DrNeurohax@kbin.social · edit-2 1 year ago

Bypassing Newspapers.com paywall and hunting down obituaries

DrNeurohax@kbin.social · 1 year ago

Ah, yes, you’re right! Thanks for that.

DrNeurohax@kbin.social · 1 year ago

smh

That’s fucking tragic. Makes me want to whip out the ole Hacker Manifesto.

Kids will never again know the fun of dealing with long distance calling plans and the barely usable international calling that used to cost half your rent for a 15 minute conversation.

DrNeurohax@kbin.social · 1 year ago

Probably based on the Cap’n Crunch whistle pay phone hack.

Someone correct me if I’ve missed a few bits, but here’s the story…

First, a little history.

Payphones were common. If you’re younger, you’ve probably seen them in movies. To operate them, you picked up the handset, listened for the dial tone (to make sure no one yanked the cord loose), inserted the amount shown by the coin slot, and then dialed. You had a limited amount of time before an automatic message would ask you to add more money. If you dialed a long distance number, a message would play telling you how much more you needed to insert.

There were no digital controls to this - no modern networking. The primitive “computers” were more like equipment you’d see in a science class. So, to deal with the transaction details, the coin slot mechanism would detect the type of coin inserted, mute the microphone on the handset, and transmit a series of tones. Just voltage spikes. The muting prevented the background noise from interfering with the signal detection. Drop a quarter in the slot and you’d hear the background noise suddenly disappear followed by some tapping sounds (this was just bleed through).

It’s also relevant to know that cereals used to include a cheap, little toy inside. At one point, Cap’n Crunch had a whistle which had a pitch of 2600Hz.

The story goes that someone* figured out that the tones sent by the payphones were at 2600Hz - same as the whistle. You could pick up a payphone handset and puff into the whistle a certain number of times, and it would be detected as control signals (inserting money).

That’s right! Free phone calls to anywhere. I’m hazy on the specifics, but I’m pretty sure there were other tricks you could do, like directly calling restricted technician numbers, too. The reason the 2600Hz tone was special had to do with something like it was used as a general signal that didn’t trigger billing.
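For the curious, here’s a small Python sketch of what “listening for a 2600Hz tone on the line” can look like in software. To be clear about the assumptions: the old phone equipment was analog and did not run anything like this, the Goertzel algorithm is a modern technique I’m swapping in to illustrate single-tone detection, and the 8000 samples/second rate, the threshold, and the function names are all just picked for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed, telephone-like rate

def tone(freq_hz, seconds=0.1):
    """Generate a full-scale sine wave at freq_hz."""
    count = round(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]

def goertzel_power(samples, freq_hz):
    """Signal power near freq_hz, computed with the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq_hz / SAMPLE_RATE)          # nearest frequency bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    prev, prev2 = 0.0, 0.0
    for x in samples:
        prev, prev2 = x + coeff * prev - prev2, prev
    return prev * prev + prev2 * prev2 - coeff * prev * prev2

def has_2600(samples, threshold=0.5):
    """True if a strong 2600Hz component is present (1.0 = pure full-scale tone)."""
    if not samples:
        return False
    n = len(samples)
    return goertzel_power(samples, 2600) / (n / 2) ** 2 > threshold

print(has_2600(tone(2600)), has_2600(tone(1000)))  # True False
```

The point is just that the signal travels in the same audio channel as your voice, so anything that can make the right pitch, a circuit or a toy whistle, looks identical to the detector.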

It knocked the idea of phone hacking, or “phreaking”, from a little known quirk to an entire movement. Some of the stuff was wild and if you’re interested, look up the different “boxes” that people distributed blueprints for. Eventually, the phone companies caught on and started making it harder to get at the wires and building more sophisticated coin receptacles.

If you ever saw the magazine 2600 back in the 90s and early 00s, that’s the origin of the name.

All that is to say, if you knew nothing about technology and watched a guy whistle into a phone to get special access, you’d probably be freaked out. Who knows what that maniac could do with a flute!

*I could have sworn it was Mitnick, but might have been someone else.

DrNeurohax@kbin.social · 1 year ago

I looked into this before with a similar deal by a 3rd party seller on Amazon. The enterprise drives (I was looking at those EXOS drives, too) must be sold by a manufacturer-certified reseller or you run the chance of getting zero warranty. That being said, I’ve seen plenty of conflicting stories by people that bought them and needed to submit an RMA. I’d say it was a 60/40 split of honoring the warranty to not honoring it.

Long story short, it’s a gamble. They’re likely good drives, but you’re rolling the dice if something goes wrong with them.

DrNeurohax@kbin.social · 1 year ago

I wonder if that was born of the Dogecoin tipping system that was around for a while in… 2017/2018? I forget.

I’m pretty sure they thought the awards/gilding was going to be their best bet to Moneyville after Premium flopped. It’s basically just a rebranding with the ability to gift it.

DrNeurohax@kbin.social · 1 year ago

Delete everything after the long number
username/number(space)"I’m…

DrNeurohax@kbin.social · 1 year ago

How much of that Kickstarter money is set aside for the avalanche of lawsuits Adobe will launch when they see the name and icon?
There is zero chance it launches without rebranding, so I wonder what they’re actually naming it.

DrNeurohax@kbin.social · 1 year ago

I JUST started using SearXNG and have been also googling the same terms to see how they compare.

So far (less than a week), SearXNG has had what I was looking for in the first 5 links every time. The Google result was either below the scroll or I gave up. Maybe only a couple dozen tests, but it wasn’t even close.

DrNeurohax@kbin.social · 1 year ago

You dare question the majesty of AltaVista?

DrNeurohax@kbin.social · 1 year ago

Yeah, I’m pretty sure you’re right. I think before that was metasearch.com. It was basically a top frame that you entered the search in with a row of icons, and the bottom frame would render the search results from whichever sites you chose. I’m pretty sure it removed all the extra elements, too, so it was actually pretty decent.

DrNeurohax@kbin.social · 1 year ago

Which is why you need at least 1 account recovery email that isn’t yours, and that person should have your password/2FA key saved somewhere secure.

As for dead relatives, I’ve had a couple and I exported all their data, deleted everything from the account, and I check them around once per year. I figure 5 years is enough, but by 10, there isn’t really any reason to keep it going.
