RANDOM CHAOS

Norway's National Library Builds Sovereign Norwegian LLM on 2PB Huawei Flash

Original source: "Norway's 2 petabytes of Huawei flash storage and LLM training", via Hacker News

Norway’s National Library is training a sovereign Norwegian-language LLM because no commercial provider will build one, and a globally trained, English-centric model misses local history, news, and culture. The library was the natural home for the project: it holds the country’s largest digital collection under a legal deposit mandate covering books, broadcasts, and web content, and it secured an agreement with Norwegian newspapers that allows training on copyrighted material, leverage no private company has. Its 20PB of unique digitized content (60PB once stored as three copies under the 3-2-1 backup rule) feeds the pipeline.

The technical architecture splits across three systems that don’t naturally cooperate. A preservation archive optimized for durability and cheap cold storage feeds an on-prem pipeline that handles ingestion, cleaning, deduplication, and normalization on an Nvidia DGX H200, a 384-core CPU cluster, and 2PB of Huawei OceanStor Dorado all-flash storage. Cleaned data then ships to Sigma2’s Olivia supercomputer, an HPE Cray EX with 448 GPUs and 5.3PB of ClusterStor, for the actual training runs. IT platform head Marius Husnes says the bottleneck was never compute. It was data quality, plus the work of moving petabyte-scale datasets out of a high-latency archive into a high-throughput pipeline, a problem that nobody documents publicly.
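To make the cleaning, normalization, and deduplication stage concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the library's actual pipeline, whose implementation the article does not describe. The function names `normalize` and `dedupe` and the sample strings are hypothetical, and the sketch only catches exact duplicates after normalization; near-duplicate detection at this scale would need a different technique.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def dedupe(docs):
    """Yield each normalized document once, keyed by a SHA-256 content hash.

    Hypothetical helper: matching is exact (case-insensitive) after
    normalization, so it does not detect near-duplicates.
    """
    seen = set()
    for doc in docs:
        cleaned = normalize(doc)
        if not cleaned:
            continue  # drop documents that are empty after cleaning
        digest = hashlib.sha256(cleaned.casefold().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield cleaned


# Made-up sample records, not data from the library's collection.
sample = [
    "Bokmål  og nynorsk",
    "bokmål og nynorsk",        # same text, different case and spacing
    "Bokma\u030al og nynorsk",  # same text with a decomposed "å"
    "   ",                      # empty after cleaning
    "Aftenposten 1905",
]
unique = list(dedupe(sample))
```

Hashing the normalized text rather than storing it keeps the memory needed for the seen-set small per document, which is one common reason to structure this step around digests.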

Open questions remain around evaluation (Norwegian has two written standards, Bokmål and Nynorsk, and many dialects, with no standard benchmarks), governance (who controls a sovereign model and who authorizes its uses), and orchestration across the three-system stack. The broader signal: Huawei storage is winning serious European infrastructure deals, and the playbook Norway is improvising will matter for every non-English-speaking country attempting the same.

This is an AI-generated summary. Read the original for the full story.