Don't test it on your own job, because you won't experience it where it's at its strongest. If a person isn't good at, say, electronics, you wouldn't give them an electronics test and conclude from it that they're incapable of thinking.
Instead, have it generate an HTML/JavaScript game. There's plenty of training data for that, and it's easy for you to follow along. Then have it make simple modifications and customizations, one at a time. Ask it what some function does, then edit that function yourself and ask it what the function does now.
Give it tasks that obviously require some degree of understanding and reasoning and are highly unlikely to be in its training data. Then inspect the code to see whether it got them right.
You can do this with Claude, ChatGPT, Grok, MiniMax, Qwen, or whatever model you like, as long as it's not old, tiny, or quantized to hell. Use a dense model with more than 25B parameters or a mixture-of-experts (MoE) model with more than 100B parameters.
I use either Qwen 3.6 27B or Gemma 4 31B at a Q8 quant on my MacBook, but you can use a Q5 quant and get mostly OK results. I use llama.cpp as the inference engine and Unsloth's XL quants downloaded from Hugging Face. Local isn't as good as what you can get in the cloud, but it gives you full visibility into every generated token.
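If you're wondering whether a given model and quant will fit on your machine, a rough back-of-the-envelope check helps. The sketch below estimates weight memory only, assuming llama.cpp's nominal bits-per-weight for the base formats: Q8_0 packs 32 weights into 34 bytes (8.5 bits/weight) and Q5_K packs 256 weights into 176 bytes (5.5 bits/weight). Unsloth's XL quants mix tensor types, so real file sizes will differ somewhat, and the KV cache for your context length comes on top of this. The function name `weight_gib` is just illustrative.

```python
# Rough weight-memory estimate for a quantized model (weights only).
# Assumption: nominal llama.cpp block sizes. Q8_0 = 34 bytes per 32 weights,
# Q5_K = 176 bytes per 256 weights. Mixed "XL"-style quants will differ,
# and KV cache / runtime overhead are NOT included.

BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 34 * 8 / 32,    # 8.5 bits per weight
    "Q5_K": 176 * 8 / 256,  # 5.5 bits per weight
}


def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate size of the model weights in GiB for a given quant type."""
    total_bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30  # bits -> bytes -> GiB


if __name__ == "__main__":
    for params in (27, 31):
        for quant in ("Q8_0", "Q5_K"):
            print(f"{params}B @ {quant}: ~{weight_gib(params, quant):.1f} GiB")
```

By this estimate a 27B dense model at Q8_0 needs roughly 27 GiB just for weights, while Q5_K brings it down to around 17 GiB, which is why dropping to Q5 can be the difference between fitting in unified memory or not.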
I'm not gonna convince you that AGI is nigh, nor that AI in its current forms can boost every task, nor that it's always logical, nor that it's always right, nor that it's been a net productivity boost across the economy, nor that it'll turn 10x engineers into 100x engineers, nor that the autoregressive transformer makes sense long-term. I mostly agree with you on all of that. But it's a mistake to believe that it's incapable of logic or reasoning outright, that it's just randomly sampling from normal or uniform distributions, or that it's a hindrance to all knowledge work. It's pretty darn intelligent at some things. And once people either (a) stop wasting tokens and over-trusting it on the things it's bad at, or (b) figure out how to make continual learning feasible, we'll see net efficiency gains from it.