A recent multicore i7 (e.g. a 4-core Haswell with 8-way SIMD for single precision = 32 threads) is enough to prototype OpenCL code, which you can then run on larger CPUs or GPUs.
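For anyone wondering where the 32 comes from, here is a rough back-of-the-envelope sketch. It assumes 256-bit AVX2 registers and 32-bit floats (both my assumptions, not stated above) and ignores hyperthreading; the function names are just illustrative.

```python
# Assumptions: 256-bit AVX2 vector registers, 32-bit single-precision floats,
# and 4 physical cores with hyperthreading ignored.
SIMD_REGISTER_BITS = 256
FLOAT32_BITS = 32
CORES = 4


def simd_lanes(register_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


def parallel_work_items(cores, lanes):
    """Work-items that can execute in lockstep across all cores."""
    return cores * lanes


lanes = simd_lanes(SIMD_REGISTER_BITS, FLOAT32_BITS)
print(lanes, parallel_work_items(CORES, lanes))  # prints "8 32"
```

Strictly speaking these are SIMD lanes rather than OS threads, but an OpenCL CPU runtime can map work-items onto them in roughly this way.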
Intel was selling the Xeon Phi 31S1P for under $200 for a limited time (it's back to $500 now).
They will likely have cheap versions and promotions this time around too.
Because it's not made to accelerate training, just inference. The TPU is an 8-bit fixed-point processor that is less power hungry than GPUs, so it won't help research, only deployment of large projects running in the cloud.
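To give a feel for what 8-bit fixed point means for inference, here is a minimal sketch of symmetric 8-bit quantization of a dot product. This is a generic illustration, not the TPU's actual arithmetic; the helper names, the scaling scheme, and the sample numbers are all my own assumptions.

```python
def quantize(values, scale):
    """Map floats to signed 8-bit integers, clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(qa, qb, scale_a, scale_b):
    """Integer dot product, rescaled back to a float at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulation
    return acc * scale_a * scale_b


# Hypothetical trained weights and one input vector.
weights = [0.50, -0.25, 0.75]
inputs = [0.6, 2.0, -1.3]

# One scale per vector, chosen so the largest magnitude maps to 127.
w_scale = max(abs(w) for w in weights) / 127
x_scale = max(abs(x) for x in inputs) / 127

qw = quantize(weights, w_scale)
qx = quantize(inputs, x_scale)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(qw, qx, w_scale, x_scale)
print(exact, approx)
```

The quantized result stays close to the float one, which is fine for running an already trained network. Training is usually said to need more precision because the weight updates are small compared to the weights themselves.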
I care about being able to carry on doing my hobby stuff 5 or 10 years from now. With NVidia I can at least be confident that as long as my graphics card keeps working (which feels like something under my control, unlike Google shutting down their products) I can keep running my code on it.
I would like to play with these things.