DeepSeek V4 Architecture: How Sparse Attention Cuts Inference Costs and What NIST Found

DeepSeek V4's architecture uses sparse attention to cut inference costs by 73% at one-million-token contexts, but a NIST evaluation found that it lags frontier US models by eight months on cross-domain tasks. China's National Intelligence Law also means the hosted API carries data obligations that self-hosting the open weights does not resolve.