COMMAND

    Passive System Fingerprinting using Network Client Applications

SYSTEMS AFFECTED

    munices

PROBLEM

    Jose Nazario posted the following white paper.  Here is his low-tech
    approach to passive network analysis.  Passive target fingerprinting
    involves the utilization of network traffic between two hosts by a
    third system to identify the types of systems being used.  Because
    no data is sent to either system by the monitoring party, detection
    approaches the impossible.  Methods which rely solely on the IP
    options present in normal traffic are limited in their accuracy
    about the targets.  Further inspection is also needed to determine
    avenues of vulnerability.  We describe a method to rapidly identify
    target operating systems and versions, as well as vectors of
    attack, based on data sent by client applications.  While
    simplistic, it is robust.  The accuracy of this method is also
    quite high in most cases.  Four methods of fingerprinting a system
    are presented, with sample data provided.

    Passive OS mapping has become a new area of research in both white
    hat and black  hat arenas.   For the white  hat, it becomes  a new
    method to map their network and monitor traffic for security.  For
    example,  a  new  and  possibly  subversive host can be identified
    quickly,  often  with  great  accuracy.   For  the black hat, this
    method provides  a nearly  undetectable method  to map  a network,
    finding vulnerable hosts.

    To be sure, passive mapping can be a time consuming process.  Even
    with automated tools like Siphon, a sufficient quantity of packets
    must arrive to build up a statistically significant reading of the
    subjects' operating systems.  Compare this to active OS
    fingerprinting methods, using tools like nmap and queso, which can
    usually operate in under a minute; only more determined attackers,
    or curious types, will be attracted to the passive method.

    Siphon, nmap and queso are available from:

        http://www.subterrain.net/projects/siphon/
        http://www.insecure.org/nmap/
        http://www.apostols.org/

    Two major methods of operating system fingerprinting exist in
    varying degrees of use, active and passive.  Active scanning
    involves sending IP packets to the host and monitoring the replies
    to guess the operating system.  Passive scanning, in contrast,
    allows the scanning party to obtain information in the absence of
    any packets sent from the listening system to the targets.  Each
    method has its advantages and its limitations.

    Active Scanning
    ===============
    By now nearly everyone is familiar with active scanning methods.
    The premier port scanning tool, nmap, has been equipped for some
    time now with accurate active scanning measures.  This code is
    based on an earlier tool, queso, from the group The Apostols.
    Nmap's author, Fyodor, has written an excellent paper on this
    topic in the e-zine Phrack (issue 54, article 9).  Ofir Arkin has
    been using ICMP bit handling to differentiate between certain
    types of operating systems.  Because ICMP usually slips below the
    threshold of analysis, and most of the ICMP messages used are
    legitimate, the detection of this scanning can be more difficult
    than, say, queso or nmap fingerprinting.

    The problems with active scanning are mainly twofold: first, we
    can readily firewall the packets used to fingerprint our system,
    obfuscating the information; second, we can detect it quite
    easily.  Because of this, it is less attractive to a truly
    stealthy adversary.


    Passive Scanning
    ================
    In a message dated June 30, 1999, Photon posted to the
    nmap-hackers list with some ideas on passive operating system
    fingerprinting (this note is available from the MARC archives of
    the nmap-hackers list).  He set up a webpage with some of his
    thoughts, which has since been taken down.  In short, by using
    default IP packet construction behavior, including default TTL
    values, the presence of the DF bit, and the like, one can gain a
    confident assessment of the system's OS.

    These ideas were quickly picked up by others and several lines of
    research have been active since then.  Lance Spitzner's paper,
    dated May 24, 2000:

        http://www.enteract.com/~lspitz/pubs.html

    on passive fingerprinting included much of the data needed to
    build such a tool.  In fact, two tools quickly appeared, one from
    Craig Smith and another called p0f from Michal Zalewski:

        http://www.enteract.com/~lspitz/passfing.tar.gz
        http://lcamtuf.hack.pl/p0f.tgz

    One  very  interesting  tool  that  is  under  active development,
    extending the earlier work, is  Siphon.  By utilizing not  only IP
    stack behavior,  but also  routing information  and spanning  tree
    updates, a complete network map  can be built over time.   Passive
    port  scans  also  take  place,  adding  to  the  data.  This tool
    promises to be truly useful for the white hat, and a patient black
    hat.

    One limitation of these methods, though, is that they only provide
    a measure of the operating system.  Vulnerabilities may or may not
    exist, and further investigation must be undertaken to evaluate
    whether this is the case.  While suitable for the white hat for
    most purposes (like accounting), this is not suitable for a
    would-be attacker.  Simply put, more information is needed.


    An Alternative Approach
    =======================
    An  alternative  method  to  merely  fingerprinting  the operating
    system  is   to  perform   an  identification   by  using   client
    applications.  Quite  a number of  network clients send  revealing
    information  about   their  host   systems,  either   directly  or
    indirectly.  We use application  level information to map back  to
    the operating system, either directly or indirectly.

    One very large advantage to  the method described here is  that in
    some  situations,  much  more  accurate  information can be gained
    about the  client.   Because of  stack similarities,  most Windows
    systems,  including  95,  98  and  NT  4.0,  look  too  similar to
    differentiate.   The client  application, however,  is willing  to
    reveal this information.

    This provides not only a measure of the target's likely operating
    system, but also a likely vector for entrance.  Most of these
    client applications have numerous security holes at which one can
    aim malicious data.  In some cases, this can provide the key
    information needed to begin infiltrating a network, and one can
    proceed more rapidly.  In most cases it provides a starting point
    for the analysis of the vulnerabilities of a network.

    One major limitation of this method, however, comes when a  system
    is emulating another to provide  access to client software.   This
    includes Solaris and SCO's support  for Linux binaries.  As  such,
    under  these  circumstances,  the  data  should be taken with some
    caution and evaluated in the presence of other information.   This
    limitation, however, is  similar to the  limitation that IP  stack
    tweaking can place on passive  fingerprinting at the IP level,  or
    the  effect  on   active  scanning  from   these  adjustments   or
    firewalling.

    Four different types of network clients are discussed here which
    provide suitable fingerprinting information: email clients, which
    in most cases leave telltale information on their messages; Usenet
    clients, which, like mail applications, litter their posts with
    client system information; web browsers, which send client
    information with each request; and even the ubiquitous telnet
    client, which sends such information more quietly, but can just
    as effectively fingerprint an operating system.

    Knowing this, one now only  needs to harvest the network  for this
    information  and  map  it  to  source  addresses.   Various tools,
    including sniffers,  both generic  and specialized,  and even  web
    searches will yield this information.  A rapid analysis of systems
    can be quickly performed.  This works quite well for the white hat
    and the black hat hacker alike.

    This paper describes a low-tech approach to fingerprinting systems
    for both their operating system and a likely route to gaining
    entry.  By using application level data sent from them over the
    network, we can quickly gather accurate data about a system.  In
    some cases, one doesn't even have to be on the same network as the
    targets; the information can be gathered from afar, compiled, and
    used at one's discretion at a later date.

    Mail Clients
    ------------
    One of the largest types of traffic the network sees is electronic
    mail.  Nearly everyone who uses the Internet on a regular basis
    uses email.  They not only receive mail, but also send a good
    amount of it.  Because it is ubiquitous, it makes an especially
    attractive avenue for system fingerprinting and ultimately
    penetration.

    Within the headers of nearly every mail message is some form of
    system identification, either through the use of crafted message
    identification tags, as used by Eudora and Pine, or through
    explicit header information, such as the headers generated by
    Outlook or CDE mail clients.

    The scope of this method, both in terms of information gained  and
    the potential impact, should not be underestimated.  If  anything,
    viruses that  spread by  email, including  ones that  are used  to
    steal passwords from systems, should illustrate the  effectiveness
    of this method.

    Pine, for example, is itself one of the worst offenders of any
    application with respect to the system it is on.  It gives away a
    whole host of information useful to an attacker in one fell
    swoop.  To wit:

        Message-ID: <Pine.LNX.4.10.9907191137080.14866-100000@somehost.example.ca>

    It is clear it's Pine, we know the version (4.10), and we know the
    system type.  Too much about it,  in fact.  This is a list  of the
    main ports of Pine as of 4.30:

        a41	IBM RS/6000 running AIX 4.1 or 4.2
        a32	IBM RS/6000 running AIX 3.2 or earlier
        aix	IBM S/370 AIX
        aos	AOS for IBM RT (untested)
        mnt	FreeMint
        aux	Macintosh A/UX
        bsd	BSD 4.3
        bs3	BSDi BSD/386 Version 3 and Version 4
        bs2	BSDi BSD/386 Version 2
        bsi	BSDi BSD/386 Version 1
        dpx	Bull DPX/2 B.O.S.
        cvx	Convex
        d54	Data General DG/UX 5.4
        d41	Data General DG/UX 4.11 or earlier
        d-g	Data General DG/UX (even earlier)
        ult	DECstation Ultrix 4.1 or 4.2
        gul	DECstation Ultrix using gcc compiler
        vul	VAX Ultrix
        os4	Digital Unix v4.0
        osf	DEC OSF/1 v2.0 and Digital Unix (OSF/1) 3.n
        sos	DEC OSF/1 v2.0 with SecureWare
        epx	EP/IX System V
        bsf	FreeBSD
        gen	Generic port
        hpx	Hewlett Packard HP-UX 10.x
        hxd	Hewlett Packard HP-UX 10.x with DCE security
        ghp	Hewlett Packard HP-UX 10.x using gcc compiler
        hpp	Hewlett Packard HP-UX 8.x and 9.x
        shp	Hewlett Packard HP-UX 8.x and 9.x with Trusted Computer Base
        gh9	Hewlett Packard HP-UX 8.x and 9.x using gcc compiler
        isc	Interactive Systems Unix
        lnx	Linux using crypt from the C library
        lnp	Linux using Pluggable Authentication Modules (PAM)
        slx	Linux using -lcrypt to get the crypt function
        sl4	Linux using -lshadow to get the crypt() function
        sl5	Linux using shadow passwords, no extra libraries
        lyn	Lynx Real-Time System (Lynxos)
        mct	Tenon MachTen (Mac)
        osx	Macintosh OS X
        neb	NetBSD
        nxt	NeXT 68030's and 68040's Mach 2.0
        bso	OpenBSD with shared-lib
        sc5	SCO Open Server 5.x
        sco	SCO Unix
        pt1	Sequent Dynix/ptx v1.4
        ptx	Sequent Dynix/ptx
        dyn	Sequent Dynix (not ptx)
        sgi	Silicon Graphics Irix
        sg6	Silicon Graphics Irix >= 6.5
        so5	Sun Solaris >= 2.5
        gs5	Sun Solaris >= 2.5 using gcc compiler
        so4	Sun Solaris <= 2.4
        gs4	Sun Solaris <= 2.4 using gcc compiler
        sun	Sun SunOS 4.1
        ssn	Sun SunOS 4.1 with shadow password security
        gsu	SunOS 4.1 using gcc compiler
        s40	Sun SunOS 4.0
        sv4	System V Release 4
        uw2	UnixWare 2.x and 7.x
        wnt	Windows NT 3.51

    Pine system types used in Message-ID  tags as of Pine 4.30.   This
    table was gathered from the  supported systems listed in the  Pine
    source code  documentation, in  the file  pine4.30/doc/pine-ports,
    and was edited for brevity.
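
    As an illustrative sketch (not part of the original paper), the
    mapping from a Pine Message-ID back to a host and operating system
    can be automated in a few lines of Python.  The port table below
    is only a small excerpt of the pine-ports list above, and the
    assumption that the three-letter port tag appears upcased in the
    Message-ID is based on the examples shown in this paper.

        # Sketch: map a Pine Message-ID to a client version, host and
        # likely operating system, using an excerpt of the pine-ports
        # table.  Extend PINE_PORTS as needed.
        import re

        PINE_PORTS = {
            "LNX": "Linux, crypt() from the C library (no shadow passwords)",
            "LNP": "Linux using PAM",
            "SL5": "Linux using shadow passwords",
            "SO5": "Sun Solaris >= 2.5",
            "OSF": "DEC OSF/1 / Digital Unix",
            "ULT": "DECstation Ultrix 4.1 or 4.2",
            "BSF": "FreeBSD",
            "SGI": "Silicon Graphics Irix",
            "WNT": "Windows NT 3.51",
        }

        # e.g. <Pine.LNX.4.10.9907191137080.14866-100000@somehost.example.ca>
        MSGID_RE = re.compile(
            r"<Pine\.([A-Z0-9]{3})\.([\d.]+?)\.\d{10,}[^@]*@([^>]+)>")

        def fingerprint(message_id):
            m = MSGID_RE.search(message_id)
            if not m:
                return None
            port, version, host = m.groups()
            return {"client": "Pine " + version,
                    "os": PINE_PORTS.get(port, "unknown port " + port),
                    "host": host}

        if __name__ == "__main__":
            mid = "<Pine.LNX.4.10.9907191137080.14866-100000@somehost.example.ca>"
            print(fingerprint(mid))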

    Hence, with the above message ID, one knows the target's hostname,
    an account on that machine that reads mail using Pine, and that
    it's Linux without shadowed passwords (the LNX host type).  Hang
    out on a mailing list, maybe something platform agnostic, and
    collect targets.  In this case, one could use a well known exploit
    within the mail message, grab the system password file and send it
    back to ourselves for analysis.  This can easily be scaled to as
    many clients as have been fingerprinted: one mass mailing, then
    sit back and wait for the password files to come in.

    This is not to say that other mail clients are not vulnerable to
    such information leaks.  Most mail clients give out similar
    information, either directly or indirectly.  Direct information
    would be an entry in the message headers, such as an X-Mailer tag.
    Indirect information would be similar to that seen for Pine, a
    distinctive message ID tag.  When this information is coupled with
    information about the originating host, a fingerprint can be made
    rapidly.

    Some examples:

        User-Agent: Mutt/1.2.4i

        X-Mailer: Microsoft Outlook Express 5.00.3018.1300
        X-MimeOLE: Produced By Microsoft MimeOLE V5.00.3018.1300

        X-Mailer: dtmail 1.2.1 CDE Version 1.2.1 SunOS 5.6 sun4u sparc

        X-Mailer: PMMail 2000 Professional (2.10.2010) For Windows 2000 (5.0.2195)

        X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
        Message-ID:  <4.3.2.7.2.20001117142518.043ad100@mailserver3.somewhere.gov>

    While not all clients, such as Mutt or Outlook Express, give out
    their host system or processor, this information can still be used
    by itself toward a larger vulnerability assessment.  For example,
    if we know which version strings appear only on Windows, as
    opposed to a MacOS system, we can determine the processor type.
    The dtmail application is entirely too friendly to someone
    determining vulnerabilities, giving up the processor and OS
    revision.  Given the problems that have appeared in the CDE suite,
    and in older versions of Solaris, an attack would be all too easy
    to construct.


    There are two main avenues  for finding this information for  lots
    of clients  quickly.   First, we  can sniff  the network  for this
    information.  Using  a tool like  mailsnarf, ngrep or  any sniffer
    with some basic filtering, a  modest collection of host to  client
    application data can be gathered.  The speed of collection and the
    ultimate size of  this database depends  chiefly on the  amount of
    traffic your network segment sees.  This is the main drawback to
    this method: a limited amount of data.
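
    As a rough sketch of what this collection step could look like (an
    illustration, not part of the original paper), the following
    Python reads mail messages saved to disk, for example by
    mailsnarf, and pulls out the client-identifying headers discussed
    above; the file name pattern and header list are only assumptions.

        # Sketch: harvest client-identifying headers from saved mail
        # messages; paths and header choices are illustrative only.
        import email
        import glob

        INTERESTING = ("X-Mailer", "User-Agent", "X-MimeOLE", "Message-ID")

        def harvest(pattern="captured/*.eml"):
            fingerprints = []
            for path in glob.glob(pattern):
                with open(path, "rb") as fh:
                    msg = email.message_from_binary_file(fh)
                info = {h: msg[h] for h in INTERESTING if msg[h]}
                if info:
                    info["From"] = msg.get("From", "")
                    fingerprints.append(info)
            return fingerprints

        if __name__ == "__main__":
            for fp in harvest():
                print(fp)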

    A much more efficient method, and one that can make use of the
    above information, is offline (with respect to the potential
    attacker and the target) system fingerprinting, with an exploit
    path included.  How do we do this?  We search the web, with its
    replete mailing list archives, and we turn up some boxes.

        Altavista: 2,033 pages found (for pine.ult)
        Google results 1-10 of about 141,000 for pine.lnx
        Altavista: 16,870 pages found (for pine.osf)

    You  get  the  idea.   Tens  of  thousands  of  hits, thousands of
    potentially exploitable boxes ready to be picked.  Simply evaluate
    the source host information and map it to the client data, and a
    large database of vulnerable hosts is rapidly built.

    The exploits  are easy.   Every week,  new exploits  are found  in
    client software,  either mail  applications like  Pine, or methods
    to  deliver  exploits  using  mail  software.   Examples  of  this
    include  the  various  buffer  overflows  that  have appeared (and
    persist) in Pine and Outlook, the delivery of malicious DLL files
    using Eudora  attachments, and  such.   We know  from viruses like
    ILOVEYOU and Melissa  that more people  than not will  open almost
    any mail message, and we  know from spammers that it's  trivial to
    bulk  send   messages  with   forged  headers,   making  traceback
    difficult.   These two  items combine  to make  for a very readily
    available exploit.


    Usenet Clients
    --------------
    In a manner similar to electronic mail, Usenet clients leave
    significant information in the headers of their posts which
    reveals information about their host operating systems.  One
    great advantage to Usenet, as opposed to email or even web
    traffic, is that posts are distributed.  As such, we can be remote
    and collect data on hosts without their knowledge or ever having
    to gain entry into their network.

    Among the various newsreaders commonly used, copious host info is
    included in the headers.  The popular UNIX newsreader 'tin' is
    among the worst offenders in revealing host information.
    Operating system versions, processors and applications are all
    listed in the 'User-Agent' field, and when coupled with the
    NNTP-Posting-Host information, a remote host fingerprint can be
    performed:

        User-Agent: tin/1.5.2-20000206 ("Black Planet") (UNIX) (SunOS/5.6(sun4u))
        User-Agent: tin/pre-1.4-980226 (UNIX) (FreeBSD/2.2.7-RELEASE (i386))
        User-Agent: tin/1.4.2-20000205 ("Possession") (UNIX) (Linux/2.2.13(i686))
        NNTP-Posting-Host: host.university.edu
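
    As a sketch of how such headers might be harvested remotely
    (again, an illustration rather than part of the original paper),
    Python's nntplib, present in older standard library releases, can
    pull the 'User-Agent' and 'NNTP-Posting-Host' headers for a range
    of articles, provided the news server permits XHDR on those
    fields.  The server and group names below are placeholders.

        # Sketch: couple User-Agent strings to posting hosts via XHDR.
        # Relies on nntplib (removed from the standard library in
        # Python 3.13); server, group and window size are placeholders.
        import nntplib

        def harvest(server="news.example.org",
                    group="comp.os.linux.misc", window=200):
            s = nntplib.NNTP(server)
            _resp, _count, first, last, _name = s.group(group)
            span = "%d-%d" % (max(int(first), int(last) - window), int(last))
            _resp, agents = s.xhdr("user-agent", span)
            _resp, hosts = s.xhdr("nntp-posting-host", span)
            s.quit()
            host_of = dict(hosts)   # article number -> posting host
            return [(host_of.get(art, "?"), agent) for art, agent in agents]

        if __name__ == "__main__":
            for host, agent in harvest():
                print("%-30s %s" % (host, agent))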

    The standard  web browsers  also leave  copious information  about
    themselves and their host systems,  as they do with HTTP  requests
    and mail.  We will elaborate  on web clients in the next  section,
    but they are also a problem as Usenet clients:

        X-Http-User-Agent: Mozilla/4.75  (Windows NT 5.0; U)
        X-Mailer: Mozilla 4.75  (X11; U; Linux 2.2.16-3smpi686)

    And several  other clients  also leave  verbose information  about
    their hosts  to varying  degrees.   Again, when  combined with the
    NNTP-Posting-Host or  other identifying  header, one  can begin to
    amass information about hosts without too much work:

        Message-ID: <Pine.LNX.4.21.0010261126210.32652-100000@host.example.co.nz>

        User-Agent: MT-NewsWatcher/3.0 (PPC)

        X-Operating-System: GNU/Linux 2.2.16
        User-Agent: Gnus/5.0807 (Gnus v5.8.7) XEmacs/21.1 (Bryce Canyon)

        X-Newsreader: Microsoft Outlook Express 5.50.4133.2400

        X-Newsreader: Forte Free Agent 1.21/32.243

        X-Newsreader: WinVN 0.99.9 (Released Version) (x86 32bit)

    Either directly or indirectly, we can fingerprint the operating
    system of the source host.  Other programs are not so forthcoming,
    but still leak information about a host that can be used in
    vulnerability analysis.

        X-Newsreader: KNode 0.1.13

        User-Agent: Pan/0.9.1 (Unix)

        User-Agent: Xnews/03.02.04

        X-Newsreader: trn 4.0-test74 (May 26, 2000)

        X-Newsreader: knews 1.0b.0 (mrsam/980423)

        User-Agent: slrn/0.9.5.7 (UNIX)

        X-Newsreader: InterChange (Hydra) News v3.61.08

    None of these header fields are required by the specifications for
    NNTP, as noted in RFC 2980.  They provide only some additional
    information about the host which was the source of the data.
    Moreover, given that most of the transactions that concern the
    servers are between servers, this data is entirely extraneous.  It
    is, it appears, absent from RFC 977, the original specification
    for NNTP.

    One interesting possibility for exploiting a user agent like
    Mozilla is to examine the accepted languages.  In the example
    below, we see not only that English is supported, but also that
    the browser is linked to Acrobat.  Given potential holes, and past
    problems, with malicious PDF files, this could be another avenue
    to gaining entry to a host.

        X-Mailer: Mozilla 4.75  (Win98; U)
        X-Accept-Language: en,pdf

    While it may seem that we are limited to fingerprinting hosts, or
    out of luck if they are using a proxy, this is not the case.  We
    can also retrieve proxy information from the headers:

        X-Http-Proxy: 1.0 x72.deja.com:80 (Squid/1.1.22) for client 10.32.34.18

    While in  this case  the proxy  is disconnected  from the client's
    network, if this were  a border proxy, we  could use this to  gain
    information about a possible entry point to the network and,  over
    time and with  enough sample data,  information about the  network
    behind the protected border.


    Web Clients
    -----------
    A remarkably simple and highly effective means of fingerprinting
    a target is to follow the web browsing that gets done from it.
    Nearly every system in use is a workstation, and nearly everyone
    uses their web browser for part of their day.  And just about
    every browser sends too much information in its 'User-Agent'
    field.

    RFC 1945 notes that the 'User-Agent' field is not required in an
    HTTP 1.0 request, but can be used.  The authors state, "user
    agents should include this field with requests."  They cite
    statistics as well as on-the-fly tailoring of data to meet the
    features or limitations of browsers.  The draft standard for HTTP
    version 1.1 requests, RFC 2616, also notes similar usage of the
    'User-Agent' field.

    We can gather this information in two ways.  First, we could run
    a website and turn on logging of the User-Agent field from the
    client (if it's not already on).  Simply generate a lot of hits
    and watch the data come in.  Get on Slashdot, advertise some
    pornographic material, or mirror some popular software (like
    warez) and you're ready to go.  Second, we can sniff web traffic
    on our visible segment.  While almost any sniffer will work, one
    of the easiest for this type of work is urlsnarf from the dsniff
    package by Dug Song.  This package is available at

        http://www.monkey.org/~dugsong/dsniff/
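
    As a sketch of the first approach (an illustration only, not part
    of the original paper), a few lines of Python will serve a trivial
    page and record each visitor's address and User-Agent string; the
    port number and log format are arbitrary.

        # Sketch: a throwaway web server that records the User-Agent
        # of every visitor.  Port and output format are arbitrary.
        from http.server import BaseHTTPRequestHandler, HTTPServer

        class AgentLogger(BaseHTTPRequestHandler):
            def do_GET(self):
                agent = self.headers.get("User-Agent", "-")
                print("%s %s" % (self.client_address[0], agent))
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.end_headers()
                self.wfile.write(b"<html><body>hello</body></html>")

        if __name__ == "__main__":
            HTTPServer(("", 8080), AgentLogger).serve_forever()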

    Examples  of  browsers  that  send  not  only  their   application
    information, such  as the  browser and  the version,  but also the
    operating system which the host runs include:

        - Netscape (UNIX, MacOS, and Windows)
        - Internet Explorer

    One shining example of a browser that doesn't send extraneous
    information is Lynx.  In both the 2.7 and 2.8 versions, only the
    browser information is sent, with no information about the host.

    The User-Agent field can be important to the web server for
    legitimate reasons.  Due to implementation differences, Netscape
    and Explorer are not equivalent on many items, including how they
    handle tables, scripting and style sheets.  However, host
    information is not needed and is sent gratuitously.

    A typical request from a popular browser looks like this:

        GET / HTTP/1.0
        Connection: Keep-Alive
        User-Agent: Mozilla/4.08  (X11; I; SunOS 5.7 sun4u)
        Host: 10.10.32.1
        Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
        Accept-Encoding: gzip
        Accept-Language: en
        Accept-Charset: iso-8859-1,*,utf-8

    The User-Agent field is littered with extra information that we
    don't need to know: the operating system type, version and even
    the hardware being used.

    Instantly we know everything there is to know about compromising
    this host: the operating system, the host's architecture, and even
    a route we could use to gain entry, for example recent problems
    in Netscape's JPEG handling.

    Using urlsnarf to log these transactions is the easiest method  to
    sniff this information from the network.  A typical line of output
    is below:

        10.10.1.232 - -  "GET http://www.latino.com/
        HTTP/1.0" - - "http://www.latino.com/" "Mozilla/4.07  (Win95; I ;Nav)"

    We  can  also  use  the  tool  ngrep to listen to this information
    on the  wire.   A simple  filter to  listen only  to packets  that
    contain the  information 'User-Agent'  can be  set up  and used to
    log information about hosts on the network.  ngrep can be obtained
    from the PacketFactory website:

        http://www.packetfactory.net/Projects/Ngrep/

    A simple regular expression filter can do the trick:

        ngrep -qid ep1 'User-Agent' tcp port 80

    This will print out all TCP packets on port 80 which contain the
    string 'User-Agent', matched case-insensitively.  And within this
    field, for too many browsers, is too much information about the
    host.  With the above options to ngrep, typical output will look
    like this:

        T 10.10.11.43:1860 -> 130.14.22.107:80
          GET /entrez/query/query.js HTTP/1.1..Accept: */*..Referer: http://www.
          ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search	DB=PubMed..Accept-Langua
          ge: en-us..Accept-Encoding: gzip, deflate..If-Modified-Since: Thu, 29
          Jun 2000 18:38:45 GMT; length=4558..User-Agent: Mozilla/4.0 (compatibl
          e; MSIE 5.5; Windows 98)..Host: www.ncbi.nlm.nih.gov..Connection: Keep
          -Alive..Cookie: WebEnv=FpEB]AfeA>>Hh^`Ba@<]^d]bCJfdADh@(j)@ =^a=T=EjIE=b<F
          bg<....

    Even more information is contained within the request than
    urlsnarf showed us, including cookies.


    Web Servers
    -----------
    In much the same way as one can use the strings sent during
    requests by the clients to determine what system type is in use,
    one can follow the replies sent back by the server to determine
    what type it is.  Again we will use ngrep, this time matching the
    expression 'server:' to gather the web server type:

        T 192.168.0.5:80 -> 192.168.0.1:1033
          HTTP/1.0 200 OK..Server: Netscape-FastTrack/2.01..Date: Mon, 30 Oct 20
          00 00:15:31 GMT..Content-type: text/html....
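
    For reference, an ngrep invocation analogous to the earlier
    'User-Agent' filter, which would produce output like the above,
    might look like this (the interface name is again an assumption
    about the local machine):

        ngrep -qid ep1 'Server:' tcp port 80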

    While specifics about the operating system are lost, this works to
    passively gather vulnerability information about the target
    server.  This can be coupled with other information to decide how
    best to proceed with an attack.

    This information will not be covered further, as this paper is
    limited to fingerprinting client applications and their systems.


    Telnet Clients
    --------------
    While telnet is no longer in widespread use because all of its
    data, including authentication data, is sent in plain text, it is
    still used widely enough to be of use in fingerprinting target
    systems.  What is interesting is that it not only gives us a
    mechanism to gather operating system data, it gives us the
    particular application in use, which can be of value in
    determining a mechanism of entry.

    The specification for the telnet protocol describes a negotiation
    between the client and the host for information such as line
    speed, terminal type and echoing (for descriptive information on
    these options and their negotiations, please see RFCs 857, 858,
    859, 860, 1091, 1073, 1079, 1184, 1372, and 1408; also see TCP/IP
    Illustrated, Volume 1: The Protocols by W. Richard Stevens).  What
    is interesting to note is that each client behaves in a unique
    way, even different client applications on the same host type.
    Similarly, the telnet server, running a telnet daemon, can be
    fingerprinted by following the negotiations with the client.  This
    information can be viewed from the telnet command line application
    on a UNIX host by issuing the 'toggle options' command at the
    telnet> prompt.

    This information can be gathered directly, using a wedge
    application or a honeypot as demonstrated on the network at
    Hope2k, or it can be sniffed off the network in a truly passive
    fashion.  Below we discuss gathering data about both the client
    system and the server being connected to.  The same principles
    apply to both host identification methods.

    The negotiations  described above,  and in  the references listed,
    can be used to fingerprint  the client based upon the  options set
    and the  order in  which they  are negotiated.   Table 1 describes
    the behavior of several telnet  clients in these respects.   Their
    differences are  immediately obvious,  even for  different clients
    on the same  operating system, such  as Tera Term  Pro and Windows
    Telnet on a Windows 95 host.

    In this  table, all  server commands  and negotiation  options are
    ignored and only data originating from the client is shown.

    The table is omitted in this version; please see:

        http://www.crimelabs.net/docs/passive.html

    for the PDF and/or PostScript versions.
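
    As an illustration of the kind of decoding involved (a sketch, not
    part of the original paper), the following Python walks a captured
    client-to-server byte stream and lists the option negotiations in
    the order they were sent, which can then be compared against known
    client behaviors such as those in Table 1.  The sample bytes are
    made up; subnegotiations (IAC SB ... IAC SE) are ignored for
    brevity.

        # Sketch: extract telnet option negotiations, in order, from a
        # raw client-to-server byte stream.
        IAC = 255
        COMMANDS = {251: "WILL", 252: "WONT", 253: "DO", 254: "DONT"}
        OPTIONS = {1: "ECHO", 3: "SUPPRESS-GO-AHEAD", 24: "TERMINAL-TYPE",
                   31: "WINDOW-SIZE", 32: "TERMINAL-SPEED", 34: "LINEMODE"}

        def negotiations(stream):
            out = []
            i = 0
            while i < len(stream) - 2:
                if stream[i] == IAC and stream[i + 1] in COMMANDS:
                    cmd = COMMANDS[stream[i + 1]]
                    opt = OPTIONS.get(stream[i + 2],
                                      "option %d" % stream[i + 2])
                    out.append((cmd, opt))
                    i += 3
                else:
                    i += 1
            return out

        if __name__ == "__main__":
            # made-up sample: a client offering terminal type, window
            # size and speed, and asking the server to echo
            sample = bytes([IAC, 251, 24, IAC, 251, 31,
                            IAC, 251, 32, IAC, 253, 1])
            for cmd, opt in negotiations(sample):
                print(cmd, opt)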

    Obviously, the most  direct method to  fingerprint a server  would
    be to connect  to it and  examine the order  of options and  their
    values  as  a  telnet  session  was  negotiated.  However, as this
    study  is  concerned  with  passive  scanning  of clients, we will
    leave it to the reader to  map this information and learn what  to
    do with it.

SOLUTION

    This paper has illustrated the effectiveness of target system
    identification using the information provided by network client
    applications.  This provides a very efficient and precise measure
    of the client operating system, as well as identifying a vector
    for attack.  This information is sent gratuitously and is not
    essential to the normal operation of many of these applications.

    The main limitation  of this information  is found when  a host is
    performing  emulation  of  another  operating  system  to  run the
    client software.   While this is  rare, it could  lead to a  false
    system identification.   This mainly  falls in  the open  software
    world, however, and only for some operating systems.

    For web browsers, which are ubiquitous and used by nearly everyone
    on the Internet, the host operating system should not be sent.
    Ideally, information about what protocols are spoken, what
    standards are met and what languages are supported (i.e. English,
    German, French) should suffice.  Lynx behaves nearly ideally in
    this regard, and both Netscape and Explorer should follow its
    lead.

    With respect to Usenet and electronic mail clients, again only
    what features are supported should be provided.  Pine is an
    example of how bad it can get, providing too much information
    about a host too quickly.  There is no reason why any legitimate
    recipient needs to know what processor and OS are being run on
    the sending host.

    Telnet clients are far more difficult.  It is tempting to say that
    all telnet applications should  support the same set  of features,
    but that is simply impossible.

    Proxy hosts should be used, if possible, to strip off information
    about the originating system, including the workstation address
    and operating system information.  This will help obscure the
    information needed to map a network from outside the perimeter.
    Coupled with strong measures to catch viruses and malicious code,
    such as in a web page script, the risks should be greatly reduced.

    The best solution is for application authors to not send
    gratuitous information in their headers or requests.  Furthermore,
    client applications should be scrutinized to the same degree as
    daemons that run with administrative privileges.  The lessons of
    RFC 1123 most certainly apply at this level.

    First, for this approach to be reliable, the admin/user must not
    be able to alter or remove the ident strings that are sent out by
    the application.  This is the case for most Windows apps, and even
    where it is possible, people usually do not take any measures in
    this department.  So we can move on.

    Second, the information displayed must actually be correct.  This
    is when the fun begins.  To take a really good example, Pine on
    most Linux systems *always* sends messages with a Message-ID that
    contains "LNX", although we think most of those systems are using
    shadow passwords.

    Also, most mail agents are quite good at rewriting headers if we
    ask them to, MTAs being another hidden champion of this.  If you
    ever happen to receive a mail from me with a From: header that
    says root, do not even for a second think that it was actually
    sent from that account.

    Also, this approach does not seem to adequately account for the
    fact that many people are not running mail servers on their own
    systems.  If you see, e.g., Outlook Express, then fine, you know
    that the sending machine was running this MUA.  But it also makes
    it more than likely that the email address you found is not
    actually one on the sending machine but rather one on a big mail
    server, which may be running anything.  You have no knowledge of
    how mail collection at that site works, so you cannot be sure that
    your exploit will actually work.  (E.g., a person can make an
    email enquiry with their browser's email client after clicking on
    a link, but use something else for "normal" mail, and may not even
    be aware of the difference; yes, we have seen a setup like this.)

    Also, emulation is (or rather can be) quite a big issue with
    lesser known OSs for which not enough native applications exist.
    E.g., there is no Netscape binary of the current release available
    for any BSD operating system (BSDi support having been dropped
    after 4.75), so if you want to use Netscape on any BSD, you have
    to use emulation.  But if you go after the presumably old, 2.0.x
    kernel based Linux system it reports itself as, you will be in
    for a surprise.  The real kicker is using Wine (the Windows
    emulation package for UNIX) and a Windows-based web browser...
    yes, people have done things like this.  Sometimes you are forced
    to, e.g. if there is not even a Linux port of the software you
    need to run.

    As for proxies: it is known that there exist proxies that hide
    your real IP address and cannot be detected in any easy way
    (because they do not insert an X-Forwarded-For field).  The proxy
    may or may not be local, so you do not necessarily have an entry
    point into the network either.

    Also, web search engines can be helpful for finding
    vulnerabilities in servers, but compiling lists of target hosts
    from mailing list archives is fragile... there may not be many
    live hits among those.  (Even for server fingerprinting, some
    surprises are in the game: e.g., Walmart was suspected of forging
    their server signature because on at least one occasion they
    reported themselves as Microsoft-IIS/4.0 (Unix) mod_ssl/2.6.6
    OpenSSL/0.9.5... outright funny.)  So the point is: although the
    information may be there, it may already be forged intentionally
    or otherwise incorrect.

    Also, the fact that you found, e.g., Mutt does not tell you a lot
    unless you have a specific exploit... because many of these
    programs run on many UNIX/UNIX-like systems as well as on
    DOS/Windows.  So you do not know a lot.

    And finally: with all this information, you still have to go out
    and do some actual scanning to verify it or gather more
    information, and this is where you can get caught.

    But, yes, even with the above points made, considering the average
    Windows/Mac user and admin, information leakage can be the cause
    of many an interesting occurrence... why, at this rate we got the
    idea for a paper titled "utilizing information gleaned from
    Internet-accessible support pages of various big organizations
    and institutions in network incidents"...  It is at least as
    interesting a topic as this one.