suflaj t1_iqsw33w wrote

The self-attention mechanism evaluates relationships within the inputs. DNNs evaluate the relationship between the input and the weights of the layer. Self-attention outputs the relationships between the inputs; DNNs just output the input transformed into another hyperspace.
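To make the contrast concrete, here is a minimal NumPy sketch (names like `self_attention` and `dense_layer` are illustrative, not from any particular library): the attention weights come from input-input dot products, while the dense layer only ever multiplies the input against fixed weights.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n_tokens, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise token-token relationships
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # each output mixes ALL inputs

def dense_layer(X, W):
    """A plain dense layer: input vs. fixed weights, no input-input interaction."""
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)
```

Note how swapping two rows of `X` also swaps the attention scores between them, whereas `dense_layer` treats each row in isolation.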

30

suflaj t1_iqou52g wrote

The tensor and CUDA cores between these 2 are not comparable. I don't know what "support for libraries" means here; CUDA compute capability versions are rarely relevant for DL, and neither card will be very relevant 10 years from now, when support for their generation might start to get deprecated. You must realize that even if you bought a 4090 on this very day, a product that hasn't even come out yet, it is going to be obsolete in 2-4 years.

The 3060 is comparable to the 2080; the 2060 is not comparable to any last-gen card. Obviously the answer is the 3060.

9

suflaj t1_iqns8dp wrote

To add to what others have said, you would still likely want mini-batches to better track progress. Even with infinite memory there is a limit to how fast you can process information (even at physical extremes), so these operations could never happen instantly. Unless mini-batches had significant drawbacks, you'd prefer updates every few seconds or minutes over a loop that hangs for hours between updates.

1