Qual è la migliore espressione regolare per verificare se una stringa è un URL valido?

Come posso verificare se una determinata stringa è un indirizzo URL valido?

La mia conoscenza delle espressioni regolari è basilare e non mi permette di scegliere tra le centinaia di espressioni regolari che ho già visto sul web.

Ho scritto il mio pattern URL (in realtà IRI, internazionalizzato) per rispettare RFC 3987 ( http://www.faqs.org/rfcs/rfc3987.html ). Questi sono nella syntax PCRE.

Per IRI assoluto (internazionalizzato):

/^[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;[email protected]])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i 

Per consentire anche IRI relativi:

 /^(?:[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;[email protected]])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?|(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;[email protected]])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;[email protected]])+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?)$/i 

Come sono stati compilati (in PHP):

  

Modifica 7 marzo 2011: a causa del modo in cui PHP gestisce i backslash nelle stringhe tra virgolette, questi sono inutilizzabili per impostazione predefinita. Sarà necessario evitare i backslash, ad eccezione dei casi in cui il backslash ha un significato speciale nella regex. Puoi farlo in questo modo:

 $escape_backslash = '/(? 

Ho appena scritto un post sul blog per un’ottima soluzione per il riconoscimento degli URL nei formati più usati come:

L’espressione regolare utilizzata è:

 /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w][email protected])?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w][email protected])[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/ 

Tuttavia, ti consiglio di andare su http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-withwith-the per vedere l’esempio di lavoro.

Quale piattaforma? Se si utilizza .NET, utilizzare System.Uri.TryCreate , non System.Uri.TryCreate regolare.

Per esempio:

 static bool IsValidUrl(string urlString) { Uri uri; return Uri.TryCreate(urlString, UriKind.Absolute, out uri) && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps || uri.Scheme == Uri.UriSchemeFtp || uri.Scheme == Uri.UriSchemeMailto /*...*/); } // In test fixture... [Test] void IsValidUrl_Test() { Assert.True(IsValidUrl("http://www.example.com")); Assert.False(IsValidUrl("javascript:alert('xss')")); Assert.False(IsValidUrl("")); Assert.False(IsValidUrl(null)); } 

(Grazie a @Yoshi per il suggerimento su javascript: 🙂

Ecco cosa usa RegexBuddy .

 (\b(https?|ftp|file)://)?[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|] 

Corrisponde a questi sotto (all’interno dei ** ** segni ** ** ):

 **http://www.regexbuddy.com** **http://www.regexbuddy.com/** **http://www.regexbuddy.com/index.html** **http://www.regexbuddy.com/index.html?source=library** 

Puoi scaricare RegexBuddy su http://www.regexbuddy.com/download.html .

Per quanto riguarda il post di risposta alla palpebra che recita “Questo è basato sulla mia lettura delle specifiche dell’URI.”: Grazie Eyelidness, la tua è la soluzione perfetta che ho cercato, in quanto basata sulle specifiche URI! Lavoro superbo. 🙂

Ho dovuto fare due emendamenti. Il primo a ottenere la regexp per abbinare correttamente gli URL degli indirizzi IP in PHP (v5.2.10) con la funzione preg_match ().

Ho dovuto aggiungere un altro set di parentesi alla riga sopra “Indirizzo IP” attorno ai tubi:

 )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# 

Non so perché.

Ho anche ridotto la lunghezza minima del dominio di primo livello da 3 a 2 lettere per supportare .co.uk e simili.

Codice finale:

 /^(https?|ftp):\/\/(?# protocol )(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?# username )(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?# password )@)?(?# auth requires @ )((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?# domain segments AND )[az][a-z0-9-]*[a-z0-9](?# top level domain OR )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# )(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?# IP address ))(:\d+)?(?# port ))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path )(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?# query string )?)?)?(?# path and query string optional )(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?# fragment )$/i 

Questa versione modificata non è stata verificata rispetto alle specifiche URI, quindi non posso garantire la sua conformità, è stata modificata per gestire gli URL su ambienti di rete locali e TLD a due cifre e altri tipi di URL Web e per funzionare meglio in PHP setup che uso.

Come codice PHP :

 define('URL_FORMAT', '/^(https?):\/\/'. // protocol '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'. // username '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'. // password '@)?(?#'. // auth requires @ ')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'. // domain segments AND '[az][a-z0-9-]*[a-z0-9]'. // top level domain OR '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'. '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'. // IP address ')(:\d+)?'. // port ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'. // query string '?)?)?'. // path and query string optional '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'. // fragment '$/i'); 

Ecco un programma di test in PHP che convalida una varietà di URL usando la regex:

  

Grazie ancora alla palpebra per la regex!

Mathias Bynens ha un grande articolo sul miglior confronto di molte espressioni regolari: Alla ricerca della regex di convalida dell’URL perfetto

Il migliore pubblicato è un po ‘lungo, ma corrisponde a qualsiasi cosa tu possa lanciare.

Versione JavaScript

 /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)[email protected])?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\u00a1-\uffff0-9]-*)*[az\u00a1-\uffff0-9]+)(?:\.(?:[az\u00a1-\uffff0-9]-*)*[az\u00a1-\uffff0-9]+)*(?:\.(?:[az\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i 

Versione PHP

 _^(?:(?:https?|ftp)://)(?:\S+(?::\S*)[email protected])?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\x{00a1}-\x{ffff}0-9]-*)*[az\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[az\x{00a1}-\x{ffff}0-9]-*)*[az\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[az\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$_iuS 

Il post Ottenere parti di un URL (Regex) discute l’analisi di un URL per identificare i suoi vari componenti. Se si desidera verificare se un URL è ben formato, dovrebbe essere sufficiente per le proprie esigenze.

Se hai bisogno di verificare se è effettivamente valido, alla fine dovrai provare ad accedere a qualsiasi cosa dall’altra parte.

In generale, però, probabilmente starai meglio usando una funzione che ti viene fornita dal tuo framework o da un’altra libreria. Molte piattaforms includono funzioni che analizzano gli URL. Ad esempio, c’è il modulo urlparse di Python e in .NET è ansible utilizzare il costruttore della class System.Uri come mezzo per convalidare l’URL.

Questo potrebbe non essere un lavoro per le regex, ma per gli strumenti esistenti nella lingua scelta. Probabilmente si desidera utilizzare il codice esistente che è già stato scritto, testato e sottoposto a debug.

In PHP, usa la funzione parse_url .

Perl: modulo URI .

Ruby: modulo URI .

.NET: class ‘Uri’

I regexes non sono una bacchetta magica che saluti a tutti i problemi che coinvolgono le stringhe.

Parser di riferimento URI non convalidato

Per scopi di riferimento, ecco la specifica IETF: ( TXT | HTML ). In particolare, Appendice B. L’ analisi di un riferimento URI con un’espressione regolare dimostra come analizzare una espressione regolare valida . Questo è descritto come

per un esempio di un parser di riferimento URI non validante che prenderà una stringa qualsiasi ed estrarrà i componenti URI.

Ecco la regex che forniscono:

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? 

Come ha detto qualcun altro, probabilmente è meglio lasciarlo a un lib / framework che stai già usando.

Questo corrisponderà a tutti gli URL

  • con o senza http / https
  • con o senza www

… inclusi sottodomini e quelle nuove estensioni di nomi di dominio di primo livello come. museo,. accademia,. fondazione ecc. che può contenere fino a 63 caratteri (non solo. com , .net , info ecc.)

 (([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)[email protected])?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)? 

Perché oggi la lunghezza massima dell’estensione del nome di dominio di primo livello disponibile è di 13 caratteri come. internazionale , puoi cambiare il numero 63 nell’espressione a 13 per impedire che qualcuno lo abusi.

come javascript

 var urlreg=/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)[email protected])?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/; $('textarea').on('input',function(){ var url = $(this).val(); $(this).toggleClass('invalid', urlreg.test(url) == false) }); $('textarea').trigger('input'); 
 textarea{color:green;} .invalid{color:red;} 
      

La migliore espressione regolare per l’URL per me sarebbe:

 "(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)[email protected])?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?" 
  function validateURL(textval) { var urlregex = new RegExp( "^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$"); return urlregex.test(textval); } 

Corrisponde a http://site.com/dir/file.php?var=moo | ftp: // utente: [email protected]: 21 / file / dir

Partite non corrispondenti site.com | http://site.com/dir//

 function validateURL(textval) { var urlregex = new RegExp( "^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$"); return urlregex.test(textval); } 

Corrisponde a http://www.asdah.com/~joe | ftp://ftp.asdah.co.uk:2828/asdah%20asdah.gif | https://asdah.gov/asdh-ah.as

Se cerchi davvero la partita definitiva , probabilmente la troverai su ” Un’espressione regolare di un buon URL? “.

Ma una regex che corrisponde veramente a tutti i possibili domini e consente tutto ciò che è permesso in base alle RFC è orribilmente lunga e illeggibile, fidati di me 😉

Non sono stato in grado di trovare la regex che stavo cercando, quindi ho modificato una regex per soddisfare le mie esigenze e sembra che ora funzioni bene. I miei requisiti erano:

Ecco cosa mi è venuto in mente, ogni suggerimento è apprezzato:

 @Test public void testWebsiteUrl(){ String regularExpression = "((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?"; assertTrue("www.google.com".matches(regularExpression)); assertTrue("www.google.co.uk".matches(regularExpression)); assertTrue("http://www.google.com".matches(regularExpression)); assertTrue("http://www.google.co.uk".matches(regularExpression)); assertTrue("https://www.google.com".matches(regularExpression)); assertTrue("https://www.google.co.uk".matches(regularExpression)); assertTrue("google.com".matches(regularExpression)); assertTrue("google.co.uk".matches(regularExpression)); assertTrue("google.mu".matches(regularExpression)); assertTrue("mes.intnet.mu".matches(regularExpression)); assertTrue("cse.uom.ac.mu".matches(regularExpression)); assertTrue("http://www.google.com/path".matches(regularExpression)); assertTrue("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e".matches(regularExpression)); assertTrue("http://www.google.com/?queryparam=123".matches(regularExpression)); assertTrue("http://www.google.com/path?queryparam=123".matches(regularExpression)); assertFalse("www..dr.google".matches(regularExpression)); assertFalse("www:google.com".matches(regularExpression)); assertFalse("https://[email protected]".matches(regularExpression)); assertFalse("https://www.google.com\"".matches(regularExpression)); assertFalse("https://www.google.com'".matches(regularExpression)); assertFalse("http://www.google.com/path'".matches(regularExpression)); assertFalse("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e'".matches(regularExpression)); assertFalse("http://www.google.com/?queryparam=123'".matches(regularExpression)); assertFalse("http://www.google.com/path?queryparam=12'3".matches(regularExpression)); } 

Ho scritto una versione un po ‘groovy che puoi eseguire

corrisponde ai seguenti URL (che è abbastanza buono per me)

 public static void main(args){ String url = "go to http://www.m.abut.ly/abc its awesome" url = url.replaceAll(/https?:\/\/w{0,3}\w*?\.(\w*?\.)?\w{2,3}\S*|www\.(\w*?\.)?\w*?\.\w{2,3}\S*|(\w*?\.)?\w*?\.\w{2,3}[\/\?]\S*/ , { it -> "woof${it}woof" }) println url } 

http://google.com

http://google.com/help.php

http://google.com/help.php?a=5

http://www.google.com

http://www.google.com/help.php

http://www.google.com?a=5

google.com?a=5

google.com/help.php

google.com/help.php?a=5

http://www.m.google.com/help.php?a=5 (e tutte le sue permutazioni)

http://www.m.google.com/help.php?a=5 (e tutte le sue permutazioni)

m.google.com/help.php?a=5 (e tutte le sue permutazioni)

La cosa importante per tutti gli URL che non iniziano con http o www è che devono includere un / o?

Scommetto che questo può essere ottimizzato un po ‘di più, ma fa il lavoro abbastanza bene per essere così breve e compatto … perché puoi dividerlo in 3:

trova tutto ciò che inizia con http: https?: // w {0,3} \ w *?. \ w {2,3} \ S *

trova tutto ciò che inizia con www: www. \ w *?. \ w {2,3} \ S *

o trovare qualcosa che deve avere un testo quindi un punto quindi almeno 2 lettere e poi un? o /: \ w *?. \ w {2,3} [/ \?] \ S *

Io uso questo regex:

 ((https?:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)[email protected])?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)? 

Per supportare entrambi:

 http://stackoverflow.com https://stackoverflow.com 

E:

 //stackoverflow.com 

Ho lavorato a un approfondito articolo sulla convalida dell’URI usando espressioni regolari. È basato su RFC3986.

Convalida URI di espressioni regolari

Sebbene l’articolo non sia ancora completo, ho creato una funzione PHP che fa un ottimo lavoro di convalida degli URL HTTP e FTP. Ecco la versione attuale:

 // function url_valid($url) { Rev:20110423_2000 // // Return associative array of valid URI components, or FALSE if $url is not // RFC-3986 compliant. If the passed URL begins with: "www." or "ftp.", then // "http://" or "ftp://" is prepended and the corrected full-url is stored in // the return array with a key name "url". This value should be used by the caller. // // Return value: FALSE if $url is not valid, otherwise array of URI components: // eg // Given: "http://www.jmrware.com:80/articles?height=10&width=75#fragone" // Array( // [scheme] => http // [authority] => www.jmrware.com:80 // [userinfo] => // [host] => www.jmrware.com // [IP_literal] => // [IPV6address] => // [ls32] => // [IPvFuture] => // [IPv4address] => // [regname] => www.jmrware.com // [port] => 80 // [path_abempty] => /articles // [query] => height=10&width=75 // [fragment] => fragone // [url] => http://www.jmrware.com:80/articles?height=10&width=75#fragone // ) function url_valid($url) { if (strpos($url, 'www.') === 0) $url = 'http://'. $url; if (strpos($url, 'ftp.') === 0) $url = 'ftp://'. $url; if (!preg_match('/# Valid absolute URI having a non-empty, valid DNS host. ^ (?P[A-Za-z][A-Za-z0-9+\-.]*):\/\/ (?P (?:(?P(?:[A-Za-z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)? (?P (?P \[ (?: (?P (?: (?:[0-9A-Fa-f]{1,4}:){6} | ::(?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4} | (?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3} | (?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2} | (?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4}: | (?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?:: ) (?P[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4} | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4} | (?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?:: ) | (?P[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+) ) \] ) | (?P(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)) | (?P(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})+) ) (?::(?P[0-9]*))? ) (?P(?:\/(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*) (?:\?(?P (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))? (?:\#(?P (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))? $ /mx', $url, $m)) return FALSE; switch ($m['scheme']) { case 'https': case 'http': if ($m['userinfo']) return FALSE; // HTTP scheme does not allow userinfo. break; case 'ftps': case 'ftp': break; default: return FALSE; // Unrecognized URI scheme. Default to FALSE. } // Validate host name conforms to DNS "dot-separated-parts". if ($m['regname']) { // If host regname specified, check for DNS conformance. if (!preg_match('/# HTTP DNS host name. ^ # Anchor to beginning of string. (?!.{256}) # Overall host length is less than 256 chars. (?: # Group dot separated host part alternatives. [A-Za-z0-9]\. # Either a single alphanum followed by dot | # or... part has more than one char (63 chars max). [A-Za-z0-9] # Part first char is alphanum (no dash). [A-Za-z0-9\-]{0,61} # Internal chars are alphanum plus dash. [A-Za-z0-9] # Part last char is alphanum (no dash). \. # Each part followed by literal dot. )* # Zero or more parts before top level domain. (?: # Explicitly specify top level domains. com|edu|gov|int|mil|net|org|biz| info|name|pro|aero|coop|museum| asia|cat|jobs|mobi|tel|travel| [A-Za-z]{2}) # Country codes are exactly two alpha chars. \.? # Top level domain can end in a dot. $ # Anchor to end of string. /ix', $m['host'])) return FALSE; } $m['url'] = $url; for ($i = 0; isset($m[$i]); ++$i) unset($m[$i]); return $m; // return TRUE == array of useful named $matches plus the valid $url. } 

This function utilizes two regexes; one to match a subset of valid generic URIs (absolute ones having a non-empty host), and a second to validate the DNS “dot-separated-parts” host name. Although this function currently validates only HTTP and FTP schemes, it is structured such that it can be easily extended to handle other schemes.

This one works for me very well. (https?|ftp)://(www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(\:|\.)([a-zA-Z0-9.]+|(\d+)?)([/?:].*)?

Here’s a ready-to-go Java version from the Android source code. This is the best one I’ve found.

 public static final Matcher WEB = Pattern.compile(new StringBuilder() .append("((?:(http|https|Http|Https|rtsp|Rtsp):") .append("\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)") .append("\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_") .append("\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?") .append("((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+") // named host .append("(?:") // plus top level domain .append("(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])") .append("|(?:biz|b[abdefghijmnorstvwyz])") .append("|(?:cat|com|coop|c[acdfghiklmnoruvxyz])") .append("|d[ejkmoz]") .append("|(?:edu|e[cegrstu])") .append("|f[ijkmor]") .append("|(?:gov|g[abdefghilmnpqrstuwy])") .append("|h[kmnrtu]") .append("|(?:info|int|i[delmnoqrst])") .append("|(?:jobs|j[emop])") .append("|k[eghimnrwyz]") .append("|l[abcikrstuvy]") .append("|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])") .append("|(?:name|net|n[acefgilopruz])") .append("|(?:org|om)") .append("|(?:pro|p[aefghklmnrstwy])") .append("|qa") .append("|r[eouw]") .append("|s[abcdeghijklmnortuvyz]") .append("|(?:tel|travel|t[cdfghjklmnoprtvwz])") .append("|u[agkmsyz]") .append("|v[aceginu]") .append("|w[fs]") .append("|y[etu]") .append("|z[amw]))") .append("|(?:(?:25[0-5]|2[0-4]") // or ip address .append("[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]") .append("|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]") .append("[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}") .append("|[1-9][0-9]|[0-9])))") .append("(?:\\:\\d{1,5})?)") // plus option port number .append("(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~") // plus option query params .append("\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?") .append("(?:\\b|$)").toString() ).matcher(""); 

I found the following Regex for URLs, tested successfully with 500+ URLs :

/\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)[email protected])?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\x{00a1}-\x{ffff}0-9]+-?)*[az\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[az\x{00a1}-\x{ffff}0-9]+-?)*[az\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[az\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b/gi

I know it looks ugly, but the good thing is that it works. 🙂

Explanation and demo with 581 random URLs on regex101.

Source: In search of the perfect URL validation regex

I tried to formulate my version of url. My requirement was to capture instances in a String where possible url can be cse.uom.ac.mu – noting that it is not preceded by http nor www

 String regularExpression = "((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})"; assertTrue("www.google.com".matches(regularExpression)); assertTrue("www.google.co.uk".matches(regularExpression)); assertTrue("http://www.google.com".matches(regularExpression)); assertTrue("http://www.google.co.uk".matches(regularExpression)); assertTrue("https://www.google.com".matches(regularExpression)); assertTrue("https://www.google.co.uk".matches(regularExpression)); assertTrue("google.com".matches(regularExpression)); assertTrue("google.co.uk".matches(regularExpression)); assertTrue("google.mu".matches(regularExpression)); assertTrue("mes.intnet.mu".matches(regularExpression)); assertTrue("cse.uom.ac.mu".matches(regularExpression)); //cannot contain 2 '.' after www assertFalse("www..dr.google".matches(regularExpression)); //cannot contain 2 '.' just before com assertFalse("www.dr.google..com".matches(regularExpression)); // to test case where url www must be followed with a '.' assertFalse("www:google.com".matches(regularExpression)); // to test case where url www must be followed with a '.' //assertFalse("http://wwwe.google.com".matches(regularExpression)); // to test case where www must be preceded with a '.' assertFalse("https://[email protected]".matches(regularExpression)); 

For Python, this is the actual URL validating regex used in Django 1.5.1:

 import re regex = re.compile( r'^(?:http|ftp)s?://' # http:// or https:// r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[AZ]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' # domain... r'localhost|' # localhost... r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|' # ...or ipv4 r'\[?[A-F0-9]*:[A-F0-9:]+\]?)' # ...or ipv6 r'(?::\d+)?' # optional port r'(?:/?|[/?]\S+)$', re.IGNORECASE) 

This does both ipv4 and ipv6 addresses as well as ports and GET parameters.

Found in the code here , Line 44.

whats wrong with plain and simple FILTER_VALIDATE_URL ?

  $url = "http://www.example.com"; if(!filter_var($url, FILTER_VALIDATE_URL)) { echo "URL is not valid"; } else { echo "URL is valid"; } 

I know its not the question exactly but it did the job for me when I needed to validate urls so thought it might be useful to others who come across this post looking for the same thing

The following RegEx will work:

 "@((((ht)|(f))tp[s]?://)|(www\.))([az][-a-z0-9]+\.)?([az][-a-z0-9]+\.)?[az][-a-z0-9]+\.[az]+[/]?[a-z0-9._\/~#&=;%+?-]*@si" 

For convenience here’s a one-liner regexp for URL’s that will also match localhost where you’re more likely to have ports than .com or similar.

 (http(s)?:\/\/.)?(www\.)?[[email protected]:%._\+~#=]{2,256}(\.[az]{2,6}|:[0-9]{3,4})\b([[email protected]:%_\+.~#?&\/\/=]*) 

You don’t specify which language you’re using. If PHP is, there is a native function for that:

 $url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/'; if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) { // Wrong } else { // Valid } 

Returns the filtered data, or FALSE if the filter fails.

Check it here >>

Spero che sia d’aiuto.

I hope it’s helpful for you…

 ^(http|https):\/\/+[\www\d]+\.[\w]+(\/[\w\d]+)? 

This is a rather old thread now and the question asks for a regex based URL validator. I ran into the thread whilst looking for precisely the same thing. While it may well be possible to write a really comprehensive regex to validate URLs I eventually settled on another way to do things – by using PHP’s parse_url function.

It returns boolean false if the url cannot be parsed. Otherwise it returns the scheme, the host and other information. This may well not be enough for a comprehensive URL check on its own but can be drilled down into for further analysis. If the intent is to simply catch typos, invalid schemes etc it is perfectly adequate.

To Check URL regex would be:

 ^http(s{0,1})://[a-zA-Z0-9_/\\-\\.]+\\.([A-Za-z/]{2,5})[a-zA-Z0-9_/\\&\\?\\=\\-\\.\\~\\%]*