Recent comments in /f/MachineLearning

kuraisle t1_jbyrulz wrote

Has anyone had any experience data mining bioRxiv? It's in a requester-pays Amazon S3 bucket, which isn't something I've used before, and I'm struggling to estimate how much I would have to pay to retrieve a few thousand articles. Thanks!
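For a rough sense of scale, a requester-pays download is billed to you as GET requests plus data transfer out. A minimal sketch of that arithmetic follows; the prices and the average article size are placeholder assumptions, not quoted AWS rates, so check the current S3 pricing for the bucket's region before relying on the result.

```python
def estimate_requester_pays_cost(
    n_objects: int,
    avg_object_mb: float,
    price_per_1000_gets: float,
    price_per_gb_out: float,
) -> float:
    """Rough cost in USD of downloading n_objects from a requester-pays bucket."""
    request_cost = n_objects / 1000 * price_per_1000_gets
    transfer_gb = n_objects * avg_object_mb / 1024
    transfer_cost = transfer_gb * price_per_gb_out
    return request_cost + transfer_cost


# Hypothetical inputs: 5,000 articles of ~10 MB each, with assumed prices
# of $0.0004 per 1,000 GET requests and $0.09 per GB transferred out.
cost = estimate_requester_pays_cost(5000, 10.0, 0.0004, 0.09)
print(f"approx. ${cost:.2f}")
```

With numbers like these the transfer term dominates the request term, so the total depends mostly on how many gigabytes you pull and where you pull them to.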

5

SuperNovaEmber t1_jbymjai wrote

Oh dear. You're missing the most important point, which I did not mention. I figured you knew?

Every empty point in space is theoretically capable of storing an atom, give or take.

Most of space is empty. The calculation of all the observable atoms 'brute forced' into all possible voids?

That's really the point, friend. You're talking about the combinatorics of one thing and then falsely equating it with simply the number of atoms?? Not the possible combinations of those atoms?? Which far exceeds the visible signs of your awakening?? It's not even astronomically close, bud.

In theory, a device around the size of a deck of cards contains more than enough energy to compute to end game.

The "observable" universe operates at an insanely high frequency. Consider the edge of the universe is over 10 orders of magnitude closer than the Planck length, using meters of course.

We're 10 billion times closer to the edge of the universe than the fabric of reality.

−6

WesternLettuce0 t1_jbyl2wi wrote

I used distilbert and legalbert separately to produce embeddings for my documents. What is the best way to use the embeddings for classification? Do I create document level embeddings before training my classifiers? Do I combine the two embeddings?
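One common baseline, sketched here under assumptions, is to mean-pool each model's chunk (or token) embeddings into one vector per document, L2-normalize each vector, and concatenate the two before fitting a classifier. The helper names and the toy 2- and 3-dimensional vectors are illustrative only; real DistilBERT and LegalBERT outputs would be much wider.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors into one document-level vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def l2_normalize(vec):
    """Scale a vector to unit length so neither model dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(doc_chunks_a, doc_chunks_b):
    """Concatenate the normalized, pooled embeddings from two models."""
    a = l2_normalize(mean_pool(doc_chunks_a))
    b = l2_normalize(mean_pool(doc_chunks_b))
    return a + b


# Toy stand-ins for per-chunk embeddings of one document from the two models.
distil_chunks = [[1.0, 3.0], [3.0, 1.0]]
legal_chunks = [[0.0, 0.0, 2.0], [0.0, 0.0, 4.0]]
features = combine(distil_chunks, legal_chunks)
print(len(features), features)
```

The concatenated vector can then go into any standard classifier. Training one classifier per model and comparing against the combined features is a cheap way to see whether the second embedding adds anything.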

1

Thog78 t1_jbyh4w1 wrote

What you're looking for when comparing UMAPs is if the local relationships are the same. Try to recognize clusters and see their neighbors, or whether they are distinct or not. A much finer colored clustering based on another reduction (typically PCA) helps with that. Without clustering, you can only try to recognize landmarks from their size and shape.
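One way to put a number on whether the local relationships are the same is to compare each point's k nearest neighbors in the two embeddings. A minimal sketch, assuming both layouts list the same points in the same order; the function names and toy coordinates are made up for illustration, and brute-force distances only suit small datasets.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()
        result.append({j for _, j in others[:k]})
    return result


def neighbor_overlap(embedding_a, embedding_b, k):
    """Mean fraction of k-nearest neighbors shared between two embeddings."""
    nn_a = knn_indices(embedding_a, k)
    nn_b = knn_indices(embedding_b, k)
    shared = [len(a & b) / k for a, b in zip(nn_a, nn_b)]
    return sum(shared) / len(shared)


# Two toy 2-D layouts of the same four points: the second is a rotated,
# shifted copy of the first, so local neighborhoods are unchanged.
layout_1 = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
layout_2 = [(5.0, 0.0), (5.0, 1.0), (5.0, 10.0), (5.0, 11.0)]
print(neighbor_overlap(layout_1, layout_2, k=1))
```

A score near 1 means the two layouts agree on who is next to whom even if the global picture looks different; a score near 0 means the neighborhoods were reshuffled.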

2

schwah t1_jby3w3r wrote

Okay fair enough, it's not as simple as 10^170 > 10^80.

But I don't think your math makes much sense either. You can't just count the number of isotopes - nearly all of the universe is hydrogen and helium. And even with compression, it is going to take a lot more than 1 bit to represent a board state. Memory (at least with today's technology) requires billions of atoms per bit. And that is only memory - a computational substrate is also needed. And obviously we are ignoring some pretty significant engineering challenges of a computer that size, like how to deal with gravitational collapse.
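A back-of-envelope version of that storage argument, as a sketch: the 10^80 atom count comes from this thread, while the 10^9 atoms per bit and the roughly 2.08 x 10^170 legal 19x19 positions are round figures assumed here for illustration.

```python
import math

ATOMS_IN_UNIVERSE = 10**80        # figure used in this thread
ATOMS_PER_BIT = 10**9             # assumed: "billions of atoms per bit"
LEGAL_POSITIONS = 208 * 10**168   # assumed: ~2.08e170 legal Go positions

# Each of the 361 points is empty, black, or white: 3**361 raw encodings.
bits_per_board = math.ceil(361 * math.log2(3))

bits_available = ATOMS_IN_UNIVERSE // ATOMS_PER_BIT
bits_needed = LEGAL_POSITIONS * bits_per_board

print(f"bits per board state: {bits_per_board}")
print(f"bits available: about 10^{len(str(bits_available)) - 1}")
print(f"bits to store every position: about 10^{len(str(bits_needed)) - 1}")
```

This only covers a naive table with one entry per position; it says nothing about cleverer search or encodings, which is part of why the comparison is not a slam dunk in either direction.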

I'll grant that it's potentially possible that you could brute force Go with a Universe-o-tron (if you ignore the practical limitations of physics), but it's definitely not a slam dunk like you're implying.

4

OptimizedGarbage t1_jbxticv wrote

It kinda was though? It was trained using self-play, so the agent it was playing against was adversarially searching for exploitable weaknesses. They actually cite this as one of the reasons for its success in the paper.

1

suflaj t1_jbx9h57 wrote

But there is evidence of a defense: taking as many adversarial attacks as possible and training against them. In the end, the ultimate defense is generalization. We know it exists, we know it is achievable, we just don't know HOW it's achievable (for non-trivial problems).

6

SuperNovaEmber t1_jbwxgu5 wrote

Wow, that understanding is deeply flawed. In computer systems we have compression and instancing and other tricks. But that's all beside the point that follows.

Take atoms, for instance: how many different types are possible? Let's even ignore isotopes!

It's just like calculating in a base. A byte can have 256 values. Put 4 bytes (32 bits) together and that's 4.3 billion states, or 256^4 (base 256 with 4 bytes), or 2^32 (binary, base 2 with 32 bits). So instead of 256 values we've got 118 unique atoms, and instead of bytes we've got atoms, 10^80 of them.

Simple, right? 118^(10^80) combinations possible. Highest exponent first, mind you. Otherwise you compute (118^10)^80 and will only get 1,658 digits instead of the actual result.... Which is not even remotely close..... Not 80 digits. Not 170 digits. Not 1,658, even.

That's 207188200730612538547439527925963726569493435639287375683771302641055893615162425 digits..... Again. This is not the answer. Just the number of digits in the answer.
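The grouping point can be checked with a short sketch. It uses Python's `decimal` module to count digits via logarithms rather than building the number itself; the helper name `digit_count_of_power` is introduced here for illustration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 120  # enough precision for an 81-digit result

# A byte has 256 values, so 4 bytes give 256**4 == 2**32 states.
assert 256**4 == 2**32 == 4_294_967_296


def digit_count_of_power(base: int, exponent: int) -> int:
    """Number of decimal digits in base**exponent, without computing the power."""
    return int(Decimal(base).log10() * exponent) + 1


# Left-to-right grouping, (118**10)**80 == 118**800, is small enough to check directly.
left_grouped_digits = len(str(118**800))

# Right-to-left grouping, 118**(10**80), is far too large to build, so use logs.
tower_digits = digit_count_of_power(118, 10**80)

print(left_grouped_digits)     # digits in (118**10)**80
print(len(str(tower_digits)))  # how many digits the digit count itself has
```

The first print is the size of the wrongly grouped result; the second shows that even the digit count of 118^(10^80) is itself an 81-digit number.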

Universe gots zero problems computing GO, bro

That's nothing compared to all the possible spaces all the possible atoms could occupy over all extents over space(and)time.

That's a calculation I'll leave up to you downvoters, gl hf!

−5

ApparatusCerebri t1_jbwwh5j wrote

Our visual system does use a couple of neat tricks to process what's around us, but that too is open to some edge cases, hence optical illusions. Other than that, in our case, evolution is the mother of all adversarial training :D

2