
Parsing

Algorithms which parse URLs return a view which references the underlying character buffer without taking ownership, avoiding memory allocations and copies. The following example parses a string literal containing a URI:

boost::core::string_view s = "https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor";
boost::system::result<url_view> r = parse_uri( s );

The function returns a result which holds a url_view if the string is a valid URL. Otherwise it holds an error_code. It is impossible to construct a url_view which refers to an invalid URL.

The caller is responsible for ensuring that the lifetime of the character buffer extends until it is no longer referenced by the view. These are the same semantics as those of std::string_view.

For convenience, a URL view can be constructed directly from the character buffer in a string_view. In this case, it parses the string according to the URI-reference grammar, throwing an exception upon failure. The following two statements are equivalent:

url_view u1 = parse_uri_reference( s ).value();
url_view u2( s );

In this library, free functions which parse things are named with the word "parse" followed by the name of the grammar used to match the string. There are several varieties of URLs, and depending on the use case a particular grammar may be needed. In the target of an HTTP GET request, for example, the scheme, authority, and fragment are omitted. This corresponds to the origin-form production rule described in RFC 7230. The function parse_origin_form is suited for this purpose. All the URL parsing functions are listed here:

Function             Grammar        Example                                                Notes
parse_absolute_uri   absolute-URI   http://www.boost.org/index.html?field=value            No fragment
parse_origin_form    origin-form    /index.html?field=value                                Used in HTTP
parse_relative_ref   relative-ref   //www.boost.org/index.html?field=value#downloads
parse_uri            URI            http://www.boost.org/index.html?field=value#downloads
parse_uri_reference  URI-reference  http://www.boost.org/index.html                        Any URI or relative-ref

The URL is stored in its serialized form. Therefore, it can always be easily output, sent, or embedded as part of a protocol:

url u = parse_uri_reference( "https://www.example.com/path/to/file.txt" ).value();

assert(u.encoded_path() == "/path/to/file.txt");

A url is an allocating container which owns its character buffer. Upon construction from url_view, it allocates dynamic storage to hold a copy of the string.

boost::system::result< url > rv = parse_uri_reference( "https://www.example.com/path/to/file.txt" );

static_assert( std::is_convertible< boost::system::result< url_view >, boost::system::result< url > >::value, "" );

A static_url is a container which owns its character buffer for a URL whose maximum size is known. Upon construction from url_view, it does not perform any dynamic memory allocations.

boost::system::result< static_url<1024> > rv = parse_uri_reference( "https://www.example.com/path/to/file.txt" );

static_assert( std::is_convertible< boost::system::result< static_url<1024> >, boost::system::result< url > >::value, "" );

Result Type

These functions have a return type which uses the result alias template. This class allows the parsing algorithms to report errors without referring to exceptions.

The function result::operator bool() reports whether the result holds a value rather than an error, and result::operator* then retrieves that value.

boost::system::result< url > ru = parse_uri_reference( "https://www.example.com/path/to/file.txt" );
if ( ru )
{
    url u = *ru;
    assert(u.encoded_path() == "/path/to/file.txt");
}
else
{
    boost::system::error_code e = ru.error();
    handle_error(e);
}

Once result::operator bool() has confirmed that the result holds a value, result::operator* provides an unchecked way to obtain it. In contexts where it is acceptable to throw, result::value can be called directly; it throws an exception if the result holds an error.

try
{
    url u = parse_uri_reference( "https://www.example.com/path/to/file.txt" ).value();
    assert(u.encoded_path() == "/path/to/file.txt");
}
catch (boost::system::system_error const& e)
{
    handle_error(e.code());
}

Check the reference for result for a synopsis of the type.