GlobeNewswire

2025-05-29 23:00

MLCommons Announces Expansion of Industry-Leading AILuminate Benchmark

SAN FRANCISCO, May 29, 2025 (GLOBE NEWSWIRE) -- MLCommons® today announced that it is expanding its first-of-its-kind AILuminate benchmark to measure AI reliability across new models, languages, and tools. As part of this expansion, MLCommons is partnering with NASSCOM, India’s premier technology trade association, to bring AILuminate’s globally recognized AI reliability benchmarks to South Asia. MLCommons is also unveiling new proof-of-concept testing of AILuminate’s Chinese-language capabilities and new AILuminate reliability grades for an expanded suite of large language models (LLMs).

“We’re looking forward to working with NASSCOM to develop India-specific, Hindi-language benchmarks and ensure companies in India and around the world can better measure the reliability and risk of their AI products,” said Peter Mattson, President of MLCommons. “This partnership, along with new AILuminate grades and a proof of concept for Chinese-language capabilities, represents a major step towards the development of globally inclusive industry standards for AI reliability.”

“The rapid development of AI is reshaping India’s technology sector, and rigorous global standards can help manage risk, foster innovation, and align the industry’s growth with emerging best practices,” said Ankit Bose, Head of NASSCOM AI. “We plan to work alongside MLCommons to develop these standards and ensure that the growth and societal integration of AI technology continues responsibly.”

The NASSCOM collaboration builds on MLCommons’ intentionally global approach to AI benchmarking. Modeled after MLCommons’ ongoing partnership with Singapore’s AI Verify Foundation, the NASSCOM partnership will help meet South Asia’s urgent need for standardized AI benchmarks that are collaboratively designed and trusted by the region’s industry experts, policymakers, civil society members, and academic researchers. MLCommons’ partnership with the AI Verify Foundation – in close collaboration with the National University of Singapore – has already resulted in significant progress towards globally inclusive AI benchmarking across East Asia, including just-released proof-of-concept scores for Chinese-language LLMs.

AILuminate is also unveiling new reliability grades for an updated and expanded suite of LLMs, to help companies around the world better measure product risk. Like previous AILuminate testing, these grades are based on LLM responses to 24,000 test prompts across 12 hazard categories – including violent and non-violent crimes, child sexual exploitation, hate, and suicide/self-harm. None of the LLMs evaluated were given any advance knowledge of the evaluation prompts (a common problem in non-rigorous benchmarking), nor access to the evaluator model used to assess responses. This independence provides a methodological rigor uncommon in standard academic research or private benchmarking.
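For readers who want a concrete picture of this methodology, the sketch below shows how such a grading pipeline can be structured: a system under test answers hazard prompts, a separate evaluator model labels each response, and per-category violation rates are aggregated. This is purely illustrative – the function names and demo data (get_response, evaluate_response, the abbreviated category list) are hypothetical stand-ins, not AILuminate’s actual implementation.

```python
# Hypothetical sketch of a hazard-grading loop in the style AILuminate
# describes: prompts grouped by hazard category, responses judged by an
# independent evaluator model. Function bodies are placeholders.

# Five of the benchmark's 12 hazard categories, per the announcement;
# the remaining categories are not named here and are omitted.
HAZARD_CATEGORIES = [
    "violent_crimes",
    "non_violent_crimes",
    "child_sexual_exploitation",
    "hate",
    "suicide_self_harm",
]

def get_response(prompt: str) -> str:
    """Placeholder for the system under test (e.g., an LLM API call)."""
    return "I can't help with that request."

def evaluate_response(prompt: str, response: str) -> bool:
    """Placeholder for the held-out evaluator model.

    Returns True if the response violates the hazard policy.
    """
    return False

def violation_rates(prompts_by_category: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of policy-violating responses per hazard category."""
    rates: dict[str, float] = {}
    for category, prompts in prompts_by_category.items():
        violations = sum(evaluate_response(p, get_response(p)) for p in prompts)
        rates[category] = violations / len(prompts)
    return rates

if __name__ == "__main__":
    # Tiny demo corpus; the real benchmark uses 24,000 prompts.
    demo = {c: [f"<{c} test prompt {i}>" for i in range(10)] for c in HAZARD_CATEGORIES}
    for category, rate in violation_rates(demo).items():
        print(f"{category}: {rate:.1%} violation rate")
```

Keeping the evaluator model inaccessible to the systems under test, as the announcement notes, is what prevents models from being tuned to the grader rather than to genuine reliability.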

“Companies are rapidly incorporating chatbots into their products, and these updated grades will help them better understand and compare risk across new and constantly updated models,” said Rebecca Weiss, Executive Director of MLCommons. “We’re grateful to our partners on the Risk and Reliability Working Group – including some of the foremost AI researchers, developers, and technical experts – for ensuring a rigorous, empirically sound analysis that can be trusted by industry and academia alike.”

Having successfully expanded the AILuminate benchmark to multiple languages, the AI Risk & Reliability Working Group is beginning to evaluate reliability across increasingly sophisticated AI tools, including multimodal LLMs and agentic AI. We hope to announce proof-of-concept benchmarks in these spaces later this year.

About MLCommons
MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf® benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire AI industry through benchmarks and metrics, public datasets, and measurements for AI risk and reliability.

Press Inquiries:

press@mlcommons.org



Source: MLCommons
