Can Vision-Language Models Reliably Extract Equipment Tags from P&IDs?
Author: Rosary Labs Team
Date: 4th January 2026
Summary
We experimented with five state-of-the-art Vision Language Models to extract equipment tags from Piping and Instrumentation Diagrams (P&IDs). Our findings reveal varying levels of accuracy, a tendency to hallucinate, and practical limitations when using VLMs to extract equipment tags.
How did it all start?
We were curious how well Vision Language Models (VLMs) can extract equipment tags from a Piping & Instrumentation Diagram (P&ID) out of the box, and specifically how accurate they are, so we tested five VLMs: ChatGPT 5.2, Claude Opus 4.5, Gemini 3 Pro Preview, GLM 4.6v and Qwen3 235B.
Experiment Setup
To make the comparison reliable, we built a ground-truth dataset by manual extraction and used it as the benchmark for scoring the VLM-generated results.
Here's the P&ID diagram used:
Ground Truth Dataset
Manually extracted by an instrumentation engineer from the P&ID diagram:
Sensors (57)
TAH10X, XA10X, XR10X, HS10X, ZI10X, LIR100, LAL100, LAH100, PIR100, LAL101, LAH102, PIR101, FIR100, PAL101, PAH101, ZA104, FI100, TI10X, XA10X, XR10X, HS10X, ZI10X, LI100, PIT100, LI101, LI102, PIT101, FIT100, ZI104, HS10X, YLAR, YLG, LCP100, OIT10X, PIT101, FIT100, PI101, LIT100, LE100, LS101, LS102, ZS104, PIT100, FIR101, FIA101, FIT101, PAH200A, PAL200B, PIR200C, PI200, PIT200, PDIT201, PDI201, PDIR201B, PDAH201A, PI101
Valves (32)
V-101, V-102, V-103, V-104, V-105, V-106, V-107, V-108, V-109, V-110, V-111, V-112, V-113, V-114, V-117, V-201, V-202, V-203, V-204, V-205, V-206, V-208, V-209, V-210, V-212, V-214, V-216, V-218, V-219, V-220, V-221, V-222
Pumps
P-101, P-102, P-103
Motors
M-101, M-102, M-103
Filters
F-201, F-202, F-203
Time to experiment with Vision Language Models
We tested the following Vision Language Models, each representing the most advanced capabilities available from a major model provider (OpenAI, Anthropic, Google, Alibaba and Z.ai): GPT-5.2, Claude Opus 4.5, Gemini 3 Pro Preview, Qwen3 VL 235B A22B Instruct and GLM 4.6V.
And the prompt?
Extract all the equipment tags from this attached pipeline and instrumentation diagram, be as thorough as you can as if your life depends on it. Extract it as an instrumentation engineers, for the purpose of asset registry. Extract unique engineering tag identifiers only. Format your output as a block of newline separated text. Go!
This initial prompt revealed VLM behaviors that honestly caught us by surprise!
Vision Language Models Output
Claude Opus 4.5
Equipment Tag Extraction - PID-102 (MONK HILL TREATMENT SYSTEM - GAC TANKS)
Based on thorough review of the P&ID, here are all unique equipment tags identified:
- P-101
- P-102
- P-103
- M-101
- M-102
- M-103
- F-201
- F-202
- F-203
- V-219
- V-220
- V-221
- V-222
- V-223
- V-224
- V-225
- V-226
- V-101
- V-102
- V-103
- V-104
- V-105
- V-106
- V-107
- V-108
- V-109
- V-110
- V-111
- V-112
- V-113
- V-114
- V-115
- V-116
- V-117
- PI-101
- PI-102
- PI-103
- PI-104
- FI-101
- FI-102
- FI-103
- FI-104
- LI-101
- LI-102
- LI-103
- LS-101
- LS-102
- LS-103
- LE-105
- LCP 100
- OIT 10K
- FE-101
- FE-102
- FE-103
- FE-104
- FE-105
- PSH-201
- PSH-202
- PSH-203
- GAC-201
- GAC-202
- GAC-203
- CV-101
- CV-102
- CV-103
- CV-104
- CV-105
- CV-106
- CV-107
- CV-108
- YS-100
- GB-101
Notes:
- Pumps: P-101, P-102, P-103 (Booster Pumps - 3,500 GPM @ 300' TDH each)
- Motors: M-101, M-102, M-103 (350 HP each)
- Filters: F-201, F-202, F-203 (Bag Filters - Calgon Carbon furnished, 10 micron)
- Valves: V-series (manual and automated isolation/control valves)
- Instrumentation: PI (Pressure Indicators), FI/FE (Flow), LI/LS/LE (Level), PSH (Pressure Switch High)
- Control: LCP 100, OIT 10K, YS-100
Gemini 3 Pro Preview
Based on a detailed inspection of the P&ID (PID-102), here are the extracted unique engineering equipment tags for the asset registry.
Booster Pumps & Motors:
- P-101
- M-101
- P-102
- M-102
- P-103
- M-103
Bag Filter Vessels:
- F-201
- F-202
- F-203
Valves (Manual, Check, Automatic):
- V-101
- V-102
- V-103
- V-104
- V-105
- V-106
- V-107
- V-108
- V-109
- V-110
- V-111
- V-113
- V-117
- V-200
- V-201
- V-202
- V-203
- V-204
- V-205
- V-206
- V-216
- V-217
- V-218
- V-219
- V-220
- V-221
- V-222
Instrumentation (Sensors, Indicators, Transmitters):
- FI-100
- LIT-100
- LE-100
- LS-100
- PI-101
- FIT-101
- FIR-101
- FAL-101
- FAH-101
- LS-101
- PIT-101
- PI-101
- PIT-102
- PI-102
- LS-102
- ZS-104
- LAL-100
- LAH-100
- PIR-100
- PIR-101
- PAL-100
- PAH-100
- TA-10X
- FA-10X
- TAH-10X
- FI-10X
- YA-10X
- HS-10X
- ZI-10X
- YL-10X
- LCP-100
- PIT-200
- PIR-200
- PAD-200
- PAL-200
- PDT-201
- PI-101
- PDT-201
- PDAH-201A
- PDR-201B
- OIT-10X
- FIT-101
- FR-101
GPT 5.2
I need to stop you here to avoid doing something actively harmful for an asset registry.
The image you attached is a low-resolution raster preview, not the original PDF. At this resolution, most P&ID tag text is not legible enough to be read with engineering certainty. Extracting tags from this image would require guessing, and guessing tag IDs is unacceptable for instrumentation, safety, maintenance, or asset management purposes.
If I proceeded anyway, I would almost certainly hallucinate tags that do not exist or misread critical identifiers, which would corrupt your asset register.
What I need to do this correctly
Please upload one of the following:
- The original vector PDF (preferred)
- Or a high-resolution scan (minimum ~300 DPI where tag text is clearly readable)
- Or cropped zoomed-in images of each P&ID area where tags are readable
Once I have that, I will:
- OCR every readable tag
- Normalize tag formatting
- Deduplicate strictly
- Extract only true engineering tag identifiers (no notes, no descriptions)
- Output exactly what you requested: a block of newline-separated unique tags, suitable for direct ingestion into an asset registry
Why this matters
For P&IDs:
- A single character error (e.g. PT-101 vs PT-110) is a different physical asset
- Instrumentation engineers rely on absolute fidelity
- "Best guess" extraction is worse than no extraction
Upload the original PDF or higher‑resolution images and say "Proceed" — I'll take it from there and be as thorough and ruthless as you want.
Qwen3 VL 235B A22B Instruct
- P-101
- P-102
- P-103
- M-101
- M-102
- M-103
- V-106
- V-107
- V-108
- V-109
- V-110
- V-111
- V-113
- V-117
- V-201
- V-202
- V-203
- V-204
- V-205
- V-206
- V-207
- V-208
- V-209
- V-210
- V-211
- V-214
- V-218
- V-219
- V-220
- V-221
- V-222
- F-201
- F-202
- F-203
- LS-101
- LS-102
- LE-100
- FIT-100
- FIT-101
- FIR-101
- PI-101
- PIT-101
- PDIT-201
- PDAH-201A
- PDIR-201B
- PAH-200A
- PAL-200B
- PIR-200C
- ZI-101
- ZI-102
- ZI-103
- ZI-104
- UR-100
- LAL-101
- LAH-102
- LIR-101
- LAR-101
- LAR-102
- FAL-100
- TAH-10X
- KA-10X
- KB-10X
- HS-10X
- HS-10X
- YL-10X
- YLA-R
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-10X
- HS-1
The list goes on and on…
Z.ai GLM 4.6v
I need to extract all the equipment tags from the provided P&ID (Pipeline and Instrumentation Diagram). I'll analyze the diagram carefully to identify all the unique engineering tag identifiers.
Looking at the P&ID, I can see various equipment tags throughout the diagram. Let me go through it systematically:
- Pumps:
- P-101
- P-102
- P-103
- Filters:
- F-201
- F-202
- F-203
- Valves:
- V-219
- V-220
- V-201
- V-202
- V-203
- V-204
- V-205
- V-206
- V-210
- V-211
- V-212
- V-213
- V-214
- V-215
- V-216
- V-217
- V-218
- V-221
- V-222
- V-223
- Pressure Indicators:
- PI-101
- PI-102
- PI-103
- PI-104
- PI-105
- PI-106
- PI-107
- PI-108
- PI-109
- PI-110
- PI-111
- PI-112
- PI-113
- PI-114
- PI-115
- PI-116
- PI-117
- PI-118
- PI-119
- PI-120
- PI-121
- PI-122
- PI-123
- PI-124
- PI-125
- PI-126
- PI-127
- PI-128
- PI-129
- PI-130
- PI-131
- PI-132
- PI-133
- PI-134
- PI-135
- PI-136
- PI-137
- PI-138
- PI-139
- PI-140
- PI-141
- PI-142
- PI-143
- PI-144
- PI-145
- PI-146
- PI-147
- PI-148
- PI-149
- PI-150
- PI-151
- PI-152
- PI-153
- PI-154
- PI-155
- PI-156
- PI-157
- PI-158
- PI-159
- PI-160
- PI-161
- PI-162
- PI-163
- PI-164
- PI-165
- PI-166
- PI-167
- PI-168
- PI-169
- PI-170
- PI-171
- PI-172
- PI-173
- PI-174
- PI-175
- PI-176
- PI-177
- PI-178
- PI-179
- PI-180
- PI-181
- PI-182
- PI-183
- PI-184
- PI-185
- PI-186
- PI-187
- PI-188
- PI-189
- PI-190
- PI-191
- PI-192
- PI-193
- PI-194
- PI-195
- PI-196
- PI-197
- PI-198
- PI-199
- PI-200
- PI-201
- PI-202
- PI-203
- PI-204
- PI-205
- PI-206
- PI-207
- PI-208
- PI-209
- PI-210
- PI-211
- PI-212
- PI-213
- PI-214
- PI-215
- PI-216
- PI-217
- PI-218
- PI-219
- PI-220
- PI-221
- PI-222
- PI-223
- PI-224
- PI-225
- PI-226
- PI-227
- PI-228
- PI-229
- PI-230
- PI-231
- PI-232
- PI-233
- PI-234
- PI-235
- PI-236
- PI-237
- PI-238
- PI-239
- PI-240
- PI-241
- PI-242
- PI-243
- PI-244
- PI-245
- PI-246
- PI-247
- PI-248
- PI-249
- PI-250
- PI-251
- PI-252
- PI-253
- PI-254
- PI-255
- PI-256
- PI-257
- PI-258
- PI-259
- PI-260
- PI-261
- PI-262
- PI-263
- PI-264
- PI-265
- PI-266
- PI-267
- PI-268
- PI-269
- PI-270
- PI-271
- PI-272
- PI-273
- PI-274
- PI-275
- PI-276
- PI-277
- PI-278
- PI-279
- PI-280
- PI-281
- PI-282
- PI-283
- PI-284
- PI-285
- PI-286
- PI-287
- PI-288
- PI-289
- PI-290
- PI-291
- PI-292
- PI-293
- PI-294
- PI-295
- PI-296
- PI-297
- PI-298
- PI-299
- PI-300
- PI-301
- PI-302
- PI-303
- PI-304
- PI-305
- PI-306
- PI-307
- PI-308
- PI-309
- PI-310
- PI-311
- PI-312
- PI-313
- PI-314
- PI-315
- PI-316
- PI-317
- PI-318
- PI-319
- PI-320
- PI-321
- PI-322
- PI-323
- PI-324
- PI-325
- PI-326
- PI-327
- PI-328
- PI-329
- PI-330
- PI-331
- PI-332
- PI-333
- PI-334
- PI-335
- PI-336
- PI-337
- PI-338
- PI-339
- PI-340
- PI-341
- PI-342
- PI-343
- PI-344
- PI-345
- PI-346
- PI-347
- PI-348
- PI-349
- PI-350
- PI-351
- PI-352
- PI-353
- PI-354
- PI-355
- PI-356
- PI-357
- PI-358
- PI-359
- PI-360
- PI-361
- PI-362
- PI-363
- PI-364
- PI-365
- PI-366
- PI-367
- PI-368
- PI-369
- PI-370
- PI-371
- PI-372
- PI-373
- PI-374
- PI-375
- PI-376
- PI-377
- PI-378
- PI-379
- PI-380
- PI-381
- PI-382
- PI-383
- PI-384
- PI-385
- PI-386
- PI-387
- PI-388
- PI-389
- PI-390
- PI-391
- PI-392
- PI-393
- PI-394
- PI-395
- PI-396
- PI-397
- PI-398
- PI-399
- PI-400
- PI-401
- PI-402
- PI-403
- PI-404
- PI-405
- PI-406
- PI-407
- PI-408
- PI-409
- PI-410
- PI-411
- PI-412
- PI-413
- PI-414
- PI-415
- PI-416
- PI-417
- PI-418
- PI-419
- PI-420
- PI-421
- PI-422
- PI-423
- PI-424
- PI-425
- PI-426
- PI-427
- PI-428
- PI-429
- PI-430
- PI-431
- PI-432
- PI-433
- PI-434
- PI-435
- PI-436
- PI-437
- PI-438
- PI-439
- PI-440
- PI-441
- PI-442
- PI-443
- PI-444
- PI-445
- PI-446
- PI-447
- PI-448
- PI-449
- PI-450
- PI-451
- PI-452
- PI-453
- PI-454
- PI-455
- PI-456
- PI-457
- PI-458
- PI-459
- PI-460
- PI-461
- PI-462
- PI-463
- PI-464
- PI-465
- PI-466
- PI-467
- PI-468
- PI-469
- PI-470
- PI-471
- PI-472
- PI-473
- PI-474
- PI-475
- PI-476
- PI-477
- PI-478
- PI-479
- PI-480
- PI-481
- PI-482
- PI-483
- PI-484
- PI-485
- PI-486
- PI-487
- PI-488
- PI-489
- PI-490
- PI-491
- PI-492
- PI-493
- PI-494
- PI-495
- PI-496
- PI-497
- PI-498
- PI-499
- PI-500
- PI-501
- PI-502
- PI-503
- PI-504
- PI-505
- PI-506
- PI-507
- PI-508
- PI-509
- PI-510
- PI-511
- PI-512
- PI-513
- PI-514
- PI-515
- PI-516
- PI-517
- PI-518
- PI-519
- PI-520
- PI-521
- PI-522
- PI-523
- PI-524
- PI-525
- PI-526
- PI-527
- PI-528
- PI-529
- PI-530
- PI-531
- PI-532
- PI-533
- PI-534
- PI-535
- PI-536
- PI-537
- PI-538
- PI-539
- PI-540
- PI-541
- PI-542
- PI-543
- PI-544
- PI-545
- PI-546
- PI-547
- PI-548
- PI-549
- PI-550
- PI-551
- PI-552
- PI-553
- PI-554
- PI-555
- PI-556
- PI-557
- PI-558
- PI-559
- PI-560
- PI-561
- PI-562
- PI-563
- PI-564
- PI-565
- PI-566
- PI-567
- PI-568
- PI-569
- PI-570
- PI-571
- PI-572
- PI-573
- PI-574
- PI-575
- PI-576
- PI-577
- PI-578
- PI-579
- PI-580
- PI-581
- PI-582
- PI-583
- PI-584
- PI-585
- PI-586
- PI-587
- PI-588
- PI-589
- PI-590
- PI-591
- PI-592
- PI-593
- PI-594
- PI-595
- PI-596
- PI-597
- PI-598
- PI-599
- PI-600
- PI-601
- PI-602
- PI-603
- PI-604
- PI-605
- PI-606
- PI-607
- PI-608
- PI-609
- PI-610
- PI-611
- PI-612
- PI-613
- PI-614
- PI-615
- PI-616
- PI-617
- PI-618
- PI-619
- PI-620
- PI-621
- PI-622
- PI-623
- PI-624
- PI-625
- PI-626
- PI-627
- PI-628
- PI-629
- PI-630
- PI-631
- PI-632
- PI-633
- PI-634
- PI-635
- PI-636
- PI-637
- PI-638
- PI-639
- PI-640
- PI-641
- PI-642
- PI-643
- PI-644
- PI-645
- PI-646
- PI-647
- PI-648
- PI-649
- PI-650
- PI-651
- PI-652
- PI-653
- PI-654
- PI-655
- PI-656
- PI-657
- PI-658
- PI-659
- PI-660
- PI-661
- PI-662
- PI-663
- PI-664
- PI-665
- PI-666
- PI-667
- PI-668
- PI-669
- PI-670
- PI-671
- PI-672
- PI-673
- PI-674
- PI-675
- PI-676
- PI-677
- PI-678
- PI-679
- PI-680
- PI-681
- PI-682
- PI-683
- PI-684
- PI-685
- PI-686
- PI-687
- PI-688
- PI-689
- PI-690
- PI-691
- PI-692
- PI-693
- PI-694
- PI-695
- PI-696
- PI-697
- PI-698
- PI-699
- PI-700
- PI-701
- PI-702
- PI-703
- PI-704
- PI-705
- PI-706
- PI-707
- PI-708
- PI-709
- PI-710
- PI-711
- PI-712
- PI-713
- PI-714
- PI-715
- PI-716
- PI-717
- PI-718
- PI-719
- PI-720
- PI-721
- PI-722
- PI-723
- PI-724
- PI-725
- PI-726
- PI-727
- PI-728
- PI-729
- PI-730
- PI-731
- PI-732
- PI-733
- PI-734
- PI-735
- PI-736
- PI-737
- PI-738
- PI-739
- PI-740
- PI-741
- PI-742
- PI-743
- PI-744
- PI-745
- PI-746
- PI-747
- PI-748
- PI-749
- PI-750
- PI-751
- PI-752
- PI-753
- PI-754
- PI-755
- PI-756
- PI-757
- PI-758
- PI-759
- PI-760
- PI-761
- PI-762
- PI-763
- PI-764
- PI-765
- PI-766
- PI-767
- PI-768
- PI-769
- PI-770
- PI-771
- PI-772
- PI-773
- PI-774
- PI-775
- PI-776
- PI-777
- PI-778
- PI-779
- PI-780
- PI-781
- PI-782
- PI-783
- PI-784
- PI-785
- PI-786
- PI-787
- PI-788
- PI-789
- PI-790
- PI-791
- PI-792
- PI-793
- PI-794
- PI-795
- PI-796
- PI-797
- PI-798
- PI-799
- PI-800
- PI-801
- PI-802
- PI-803
- PI-804
- PI-805
- PI-806
- PI-807
- PI-808
- PI-809
- PI-810
- PI-811
- PI-812
- PI-813
- PI-814
- PI-815
- PI-816
- PI-817
- PI-818
- PI-819
- PI-820
- PI-821
- PI-822
- PI-823
- PI-824
- PI-825
- PI-826
- PI-827
- PI-828
- PI-829
- PI-830
- PI-831
- PI-832
- PI-833
- PI-834
- PI-835
- PI-836
- PI-837
- PI-838
- PI-839
- PI-840
- PI-841
- PI-842
- PI-843
- PI-844
- PI-845
- PI-846
- PI-847
- PI-848
- PI-849
- PI-850
- PI-851
- PI-852
- PI-853
- PI-854
- PI-855
- PI-856
- PI-857
- PI-858
- PI-859
- PI-860
- PI-861
- PI-862
- PI-863
- PI-864
- PI-865
- PI-866
- PI-867
- PI-868
- PI-869
- PI-870
- PI-871
- PI-872
- PI-873
- PI-874
- PI-875
- PI-876
- PI-877
- PI-878
- PI-879
- PI-880
- PI-881
- PI-882
- PI-883
- PI-884
- PI-885
- PI-886
- PI-887
- PI-888
- PI-889
- PI-890
- PI-891
- PI-892
- PI-893
- PI-894
- PI-895
- PI-896
- PI-897
- PI-898
- PI-899
- PI-900
- PI-901
- PI-902
- PI-903
- PI-904
- PI-905
- PI-906
- PI-907
- PI-908
- PI-909
- PI-910
- PI-911
- PI-912
- PI-913
- PI-914
- PI-915
- PI-916
- PI-917
- PI-918
- PI-919
- PI-920
- PI-921
- PI-922
- PI-923
- PI-924
- PI-925
- PI-926
- PI
Evaluation of VLM Results
Quantitative Evaluation
We manually extracted all 100 equipment tags from the P&ID above and used a simple confusion matrix to evaluate each Vision Language Model. We did not score GPT-5.2 or GLM 4.6V: GPT-5.2 refused to perform the extraction on the file we supplied, and GLM 4.6V's output degenerated into a hallucinated, ever-incrementing sequence of PI tags.
Source: Geeks For Geeks
Based on this simple confusion matrix, here are the tabulated results:
| Metric | Claude Opus 4.5 | Gemini 3 Pro Preview | GPT-5.2 | Qwen3 VL 235B A22B Instruct | Z.ai GLM 4.6V |
|---|---|---|---|---|---|
| True Positive | 28 | 34 | - | 32 | - |
| False Positive | 44 | 45 | - | 49 | - |
| False Negative | 26 | 19 | - | 17 | - |
| Precision | 0.39 | 0.43 | - | 0.40 | - |
| Recall | 0.52 | 0.64 | - | 0.65 | - |
| F1 Score | 0.44 | 0.52 | - | 0.49 | - |
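The ratios in the table follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. We don't spell out here exactly how a predicted tag was matched against the ground truth, so the sketch below assumes one plausible rule: tags are compared after upper-casing and removing list bullets, hyphens and spaces (so TAH-10X and TAH10X count as the same tag), and duplicates are collapsed. The helper names are hypothetical and this matching rule may not reproduce the exact counts above; the final loop only checks that the reported counts yield the reported ratios.

```python
import re


def normalize_tag(tag: str) -> str:
    """Upper-case a tag and drop a leading bullet, hyphens and spaces (assumed matching rule)."""
    tag = tag.strip().lstrip("-").strip()
    return re.sub(r"[\s\-]", "", tag.upper())


def confusion_counts(predicted: list[str], ground_truth: list[str]) -> tuple[int, int, int]:
    """Return (TP, FP, FN) between the deduplicated, normalised tag sets."""
    pred = {n for n in map(normalize_tag, predicted) if n}
    truth = {n for n in map(normalize_tag, ground_truth) if n}
    return len(pred & truth), len(pred - truth), len(truth - pred)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall and F1; returns 0.0 wherever a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# (TP, FP, FN) as reported in the table above.
reported = {
    "Claude Opus 4.5": (28, 44, 26),
    "Gemini 3 Pro Preview": (34, 45, 19),
    "Qwen3 VL 235B A22B Instruct": (32, 49, 17),
}
for model, counts in reported.items():
    p, r, f1 = precision_recall_f1(*counts)
    print(f"{model}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Running it prints values matching the table, for example precision=0.39 recall=0.52 F1=0.44 for Claude Opus 4.5. Stripping hyphens is a deliberate trade-off: it tolerates formatting differences between models, but could in principle merge two genuinely distinct tags.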
From the results above, Gemini 3 Pro Preview achieves the highest F1 score (0.52), with Qwen3 close behind (0.49), followed by Claude Opus 4.5 (0.44).
As the results show, generic VLMs (at least the ones we tested) cannot yet achieve high accuracy on this task: none exceeded a precision of 0.43 or an F1 score of 0.52.
Qualitative Evaluation
The tags that Qwen3 and Z.ai GLM extracted at the start of their responses are accurate. Unfortunately, Qwen3 then went into a repetitive spiral at HS-10X, despite having the highest recall score. This looping alone rules it out wherever consistent results are required.
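Loops like Qwen3's repeated HS-10X (or GLM 4.6V's PI-101, PI-102, ... run) can be caught with a cheap post-processing check before a response is scored or ingested into a registry. The sketch below flags a response if any tag repeats more than a few times, or if a long run of consecutive numbers shares one prefix. The function name and both thresholds are hypothetical and would need tuning; legitimate runs such as V-101 to V-117 exist on this drawing, so the run threshold has to stay above them.

```python
import re
from collections import Counter


def looks_degenerate(tags: list[str], max_repeats: int = 5, max_run: int = 20) -> bool:
    """Heuristically flag a tag list that looks like a generation loop.

    Thresholds are illustrative assumptions, not tuned values.
    """
    cleaned = [t.strip() for t in tags if t.strip()]

    # Pattern 1: the same tag emitted many times (e.g. HS-10X over and over).
    counts = Counter(cleaned)
    if counts and max(counts.values()) > max_repeats:
        return True

    # Pattern 2: a long run of consecutive numbers under one prefix (e.g. PI-101, PI-102, ...).
    run, prev = 1, None
    for tag in cleaned:
        m = re.fullmatch(r"([A-Z]+)-?(\d+)", tag.upper())
        cur = (m.group(1), int(m.group(2))) if m else None
        if cur and prev and cur[0] == prev[0] and cur[1] == prev[1] + 1:
            run += 1
            if run > max_run:
                return True
        else:
            run = 1
        prev = cur
    return False
```

With these defaults, the check would flag the Qwen3 and GLM 4.6V outputs above while leaving the Claude and Gemini lists alone.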
That leaves Claude Opus 4.5 and Gemini 3 Pro. Of the two, Gemini 3 Pro extracted more correct equipment tags (34 true positives versus 28), even though Claude Opus 4.5 is the top-tier model in the Claude series.
Gemini 3 Pro also grouped the asset tags by equipment type (booster pumps & motors, bag filter vessels, valves, and instrumentation), which suggests that modern VLMs have a foundational understanding of P&ID tagging conventions.
Disclaimer: Claude Opus 4.5 may simply need a different prompt to sort the equipment tags into the right categories. Here, though, Gemini 3 Pro classified the asset tags automatically, without any explicit instruction to do so. Good job, Gemini 3 Pro!
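A grouping like Gemini's can also be approximated with simple rules, which gives one way to sanity-check a model's categories. The sketch below uses the prefixes on this drawing (P- pumps, M- motors, F- filters, V- valves) and treats multi-letter functional identifiers such as PIT or LAH as instruments, loosely following the ISA-5.1 letter-code convention. The mapping and function are hypothetical and specific to this P&ID: a control panel such as LCP100 would also land in "Instrument", and a hallucinated tag like GAC-201 would be mislabelled rather than rejected.

```python
import re

# Hypothetical prefix mapping taken from this drawing's ground truth.
EQUIPMENT_PREFIXES = {"P": "Pump", "M": "Motor", "F": "Filter", "V": "Valve"}


def classify_tag(tag: str) -> str:
    """Rule-based category for a single tag (sketch, specific to this P&ID)."""
    tag = tag.strip().upper()
    # Single-letter equipment tags such as P-101, M-102, F-203, V-219.
    m = re.fullmatch(r"([A-Z])-(\d+)", tag)
    if m and m.group(1) in EQUIPMENT_PREFIXES:
        return EQUIPMENT_PREFIXES[m.group(1)]
    # Multi-letter functional identifiers (PIT-101, LAH100, PDIT201, TAH-10X) are treated as instruments.
    if re.fullmatch(r"[A-Z]{2,}-?\w+", tag):
        return "Instrument"
    return "Unknown"


for t in ["P-101", "M-102", "V-219", "PDIT-201", "TAH10X", "LCP 100"]:
    print(t, "->", classify_tag(t))
```

The space in "LCP 100" (as written in Claude's output) makes it fall through to "Unknown", which is why normalising tags before classifying them helps.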
However, we only evaluated the models with a simple confusion matrix, so there are many metrics we did not account for. Among the most important are response speed and cost. We will also look at equally important qualitative metrics, such as consistency across repeated runs and how well information is categorised for readability. Stay tuned to learn more about how we evaluate the performance of an AI system in future posts!
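One possible way to quantify run-to-run consistency (not necessarily the metric we will end up using) is the mean pairwise Jaccard similarity between the normalised tag sets returned by repeated runs of the same prompt, where 1.0 means every run returned identical tags. The function names and the three toy runs below are illustrative only.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_consistency(runs: list[set[str]]) -> float:
    """Average Jaccard similarity over every pair of runs (1.0 = identical outputs)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical tag sets from three runs of the same prompt.
runs = [
    {"P101", "P102", "V219"},
    {"P101", "P102", "V220"},
    {"P101", "P102", "V219"},
]
print(round(mean_pairwise_consistency(runs), 3))  # 0.667
```

Jaccard ignores order and duplicates, so it measures whether runs agree on which tags exist, not how they are formatted or grouped.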
What to take away?
So far, VLMs have exceeded our expectations in how readily they extract P&ID equipment tags. The traditional approach would have been a combination of text object detection and OCR, which is far more involved than simply uploading a P&ID to a chatbot and getting at least 40% of the equipment tags right.
Nevertheless, current VLMs lack the accuracy needed for production-grade P&ID analysis. Closing this gap requires pre-processing and post-processing steps that combine VLMs with machine learning techniques from the pre-Language Model era.
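One common pre-processing step, echoing GPT-5.2's own request for cropped, zoomed-in images, is to split a large P&ID into overlapping tiles, send each tile to the model separately, and merge the deduplicated tags afterwards. The sketch below only computes the tile boxes; the tile size and overlap are assumed values. The overlap means a tag cut by one tile boundary still appears whole in a neighbouring tile, provided the tag is smaller than the overlap in both directions. Each (left, top, right, bottom) box can be passed to Pillow's Image.crop.

```python
def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 128) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes of overlapping square tiles covering the image."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap

    def starts(length: int) -> list[int]:
        # One tile suffices when the image fits; otherwise step across and
        # add a final tile flush with the far edge so nothing is cut off.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


print(len(tile_boxes(3000, 2000)))  # 12 tiles with the default settings
```

Smaller tiles make each tag larger relative to the model's input resolution, at the cost of more requests and more duplicate tags to merge.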
In the next article, we will talk about other evaluation metrics that are commonly used to compare the performance of the entire extraction system. Stay tuned!
