Index of /datasets/supplement/2019-imc-hoiho

Icon  Name                          Last modified      Size  Description
[DIR] Parent Directory                                    -
[   ] 201007-midar-iff.re           04-Jun-2019 20:22  480K
[   ] 201007-midar-iff.routers.bz2  17-Sep-2019 13:55  9.9M
[   ] 201104-midar-iff.re           04-Jun-2019 20:22  492K
[   ] 201104-midar-iff.routers.bz2  17-Sep-2019 13:55   13M
[   ] 201110-midar-iff.re           04-Jun-2019 20:22  481K
[   ] 201110-midar-iff.routers.bz2  17-Sep-2019 13:55   12M
[   ] 201207-midar-iff.re           04-Jun-2019 20:22  480K
[   ] 201207-midar-iff.routers.bz2  17-Sep-2019 13:55   14M
[   ] 201304-midar-iff.re           04-Jun-2019 20:22  445K
[   ] 201304-midar-iff.routers.bz2  17-Sep-2019 13:55   14M
[   ] 201307-midar-iff.re           04-Jun-2019 20:22  396K
[   ] 201307-midar-iff.routers.bz2  17-Sep-2019 13:56   15M
[   ] 201404-midar-iff.re           04-Jun-2019 20:22  447K
[   ] 201404-midar-iff.routers.bz2  17-Sep-2019 13:56   15M
[   ] 201412-midar-iff.re           04-Jun-2019 20:22  444K
[   ] 201412-midar-iff.routers.bz2  17-Sep-2019 13:56   16M
[   ] 201508-midar-iff.re           04-Jun-2019 20:22  438K
[   ] 201508-midar-iff.routers.bz2  17-Sep-2019 13:56   15M
[   ] 201603-midar-iff.re           04-Jun-2019 20:22  459K
[   ] 201603-midar-iff.routers.bz2  17-Sep-2019 13:56   16M
[   ] 201609-midar-iff.re           04-Jun-2019 20:22  451K
[   ] 201609-midar-iff.routers.bz2  17-Sep-2019 13:56   18M
[   ] 201702-midar-iff.re           04-Jun-2019 20:22  455K
[   ] 201702-midar-iff.routers.bz2  17-Sep-2019 13:56   17M
[   ] 201708-midar-iff.re           04-Jun-2019 20:22  473K
[   ] 201708-midar-iff.routers.bz2  17-Sep-2019 13:56   17M
[   ] 201803-midar-iff.re           04-Jun-2019 20:22  448K
[   ] 201803-midar-iff.routers.bz2  17-Sep-2019 13:57   18M
[   ] 201901-midar-iff.re           04-Jun-2019 20:22  356K
[   ] 201904-midar-iff.re           04-Jun-2019 20:22  394K
[TXT] README.txt                    17-Sep-2019 14:37  1.8K
[   ] public_suffix_list.dat        17-Sep-2019 13:57  188K
[DIR] web/                          17-Sep-2019 17:50     -

This public dataset contains the data used to train our system to
learn regular expressions that extract router names from hostnames.
It also includes the "best" regular expressions inferred for each
suffix with at least one training router.  Not all of these regular
expressions are useful, so you should exercise your own judgement
about which ones to rely on.  To help you do so, we have included
web pages showing how the best regular expressions apply to the
training data.
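
As a rough illustration of what "extracting a router name from a
hostname" means, the sketch below applies one regular expression to a
few hostnames and counts the interfaces that share each extracted
name.  The example.net hostnames, the naming convention, the regular
expression, and the extract_router_name helper are all hypothetical;
none of them is taken from this dataset or from sc_hoiho.

```shell
# Hypothetical convention: <interface>.<router>.<site>.example.net
# The capture group keeps "<router>.<site>" as the router name.
# Hostnames that do not match the expression produce no output.
extract_router_name() {
    sed -nE 's/^[^.]+\.([^.]+\.[^.]+)\.example\.net$/\1/p'
}

# Made-up hostnames: two interfaces on one router, one on another.
printf '%s\n' \
    ae1.cr1.lax.example.net \
    xe0.cr1.lax.example.net \
    ae3.cr2.sea.example.net |
    extract_router_name | sort | uniq -c
```

Hostnames that yield the same extracted string are treated here as
interfaces of the same router, which is the kind of grouping the
included web pages can help you check against the training data.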

If you use this data, you are required to cite:

 M. Luckie, B. Huffaker, and k. claffy.  Learning to Extract Router
 Names from Hostnames.  Proc. ACM Internet Measurement Conference
 2019.

You are also required to cite the ITDK, from which this data is
derived.  The instructions for citing the ITDK are included at:

 http://data.caida.org/datasets/topology/ark/ipv4/

The data is designed to be used with sc_hoiho, which is included
as part of scamper:

 https://www.caida.org/tools/measurement/scamper/

To generate the inferred regular expressions included in this
dataset release yourself, build sc_hoiho by passing --with-sc_hoiho
and either --with-pcre or --with-pcre2 to configure.  When building
sc_hoiho, ensure the pcre (or pcre2) headers and libraries are in the
paths your compiler searches.  For example:

CFLAGS='-I/usr/local/include' LDFLAGS='-L/usr/local/lib' ./configure \
 --with-sc_hoiho --with-pcre2

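The training sets are distributed as <training-set>.routers.bz2, so
decompress the one you need (for example, with bunzip2) and then run: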

sc_hoiho -d best-regex public_suffix_list.dat <training-set>.routers

Note that this can take some time to complete.  If you are only
interested in the regular expression for a single domain, you can
pass -D <domain> to sc_hoiho.  Other options are documented in the
sc_hoiho manual page:

 https://www.caida.org/tools/measurement/scamper/man/sc_hoiho.1.pdf
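
The steps above can be combined in a small shell sketch.  It assumes
that sc_hoiho is already built and on your PATH, that bunzip2 is
installed, and that sc_hoiho writes its result to standard output.
The hoiho_cmd and run_hoiho helper names, the <training-set>.out
output file, and the example.com domain are hypothetical and used only
for illustration; the -d and -D options are the ones described in this
README.

```shell
# Print the sc_hoiho command line for one training set.
#   $1: training-set name without extension, e.g. 201803-midar-iff
#   $2: optional domain to pass with -D (example.com is a placeholder)
hoiho_cmd() {
    if [ -n "${2:-}" ]; then
        echo "sc_hoiho -d best-regex -D $2 public_suffix_list.dat $1.routers"
    else
        echo "sc_hoiho -d best-regex public_suffix_list.dat $1.routers"
    fi
}

# Decompress the training set if needed (keeping the .bz2 file), then
# run sc_hoiho and save its output to <training-set>.out.
run_hoiho() {
    [ -f "$1.routers" ] || bunzip2 -k "$1.routers.bz2" || return 1
    $(hoiho_cmd "$1" "${2:-}") > "$1.out"
}

# Usage, from the directory holding the downloaded files:
#   run_hoiho 201803-midar-iff               # all suffixes (slow)
#   run_hoiho 201803-midar-iff example.com   # a single domain
```

Printing the command with hoiho_cmd before calling run_hoiho is a
simple way to check the arguments before starting a long run.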