Ph.D. comic~
It is really funny!
Like this one. You can find it here: http://www.phdcomics.com/comics.php
American talking about Hanhan from China
Hanhan is almost my age, and he is now so famous that even people on the other side of the Pacific Ocean are talking about him. So jealous. The following is what I got from The New York Times.
SATURDAY PROFILE; Heartthrob’s Barbed Blog Challenges China’s Leaders
SHANGHAI
IT’S not so easy being Han Han, the heartthrob race car driver and pop novelist who just happens to be China’s most widely read blogger.
Traveling incognito is all but impossible. Local officials frequently vie for his endorsement of their latest architectural boondoggles. (He politely declines.) And love-lorn young women often approach him after races with letters bearing his name. (He says the women have been duped by impostors who have assumed his identity.)
But Mr. Han’s most vexing challenge comes from a more formidable nemesis: the unseen censors who delete blog posts they deem objectionable and the publishing police who have held up the release of his new magazine, ”A Chorus of Solos,” a provocative collection of essays and photographs. ”The government wants China to become a great cultural nation, but our leaders are so uncultured,” he said with a shrug, offering his characteristic Cheshire-cat grin. ”If things continue like this, China will only be known for tea and pandas.”
Since he began blogging in 2006, Mr. Han has been delivering increasingly caustic attacks on China’s leadership and the policies he contends are creating misery for those unlucky enough to lack a powerful government post. With more than 300 million hits to his blog, he may be the most popular living writer in the world.
In a recent interview at his office in Shanghai, he described party officials as ”useless” and prone to spouting nonsense, although he used more delicate language to dismiss their relevance. ”Their lives are nothing like ours,” he said. ”The only thing they have in common with young people is that like us, they too have girlfriends in their 20s, although theirs are on the side.”
Mr. Han has enjoyed widespread fame since he published his first novel at 19, but his popularity has ballooned in recent months through blog posts that seem to capture the zeitgeist of his peers, the so-called post-80s generation born after the economic reforms introduced by Deng Xiaoping.
Theirs is a generation of only children, the result of China’s one-child policy, and one that has known only uninterrupted growth. Whether true or not, it is also a demographic with a reputation for being spoiled, impatient and less accepting of the storyline fed to them by government-run media.
If Mr. Han’s tongue is sharp, he is careful to deliver his barbs through sarcasm and humorous anecdotes that obliquely take on corruption, censorship and everyday injustice.
In one recent post about redevelopment projects that often end in violence and forced evictions, he suggested that the government build public housing in the form of prisons. The benefits would be twofold, he explained: Tenants could make no claim on the apartments and those who make a fuss could simply be locked up in their homes.
His current gambit is a wryly subversive competition that will award $730 to the person who comes up with new lyrics to a song-and-dance routine that was broadcast last month during the reliably soporific Chinese New Year television gala.
The performance, staged by China’s national broadcaster and viewed by an estimated 400 million people, featured merry members of the Uighur minority belting out praise for Communist Party policies.
These were not the policies that many Uighurs bemoan as oppressive — and which may or may not have provoked the deadly riots in the western region of Xinjiang last summer — but ones that supposedly reduced taxes, increased health benefits and according to the singing farmer Maimaiti, filled his donkey sack with cash.
ALTHOUGH his posts are sometimes ”harmonized” — a popular euphemism for censorship –his blog, published by one of China’s most popular Web portals, has so far been allowed to continue. Ran Yunfei, a writer and blogger in Sichuan Province, says that Mr. Han is partly insulated by his celebrity, but also by his avoidance of the most politically charged topics.
How to get hired (what a CS student needs to know)
See here.
I’ve hired dozens of C/C++ programmers (mostly at the entry level). To do that, I had to interview hundreds of candidates. Many of them were woefully poorly prepared for the interview. This page is my attempt to help budding software engineers get and pass programming interviews.
To see more, click here.
What really happens when you navigate to a URL
See here
Translated into Chinese! See here.
As a software developer, you certainly have a high-level picture of how web apps work and what kinds of technologies are involved: the browser, HTTP, HTML, web server, request handlers, and so on.
In this article, we will take a deeper look at the sequence of events that take place when you visit a URL.
1. You enter a URL into the browser
It all starts here:

2. The browser looks up the IP address for the domain name

The first step in the navigation is to figure out the IP address for the visited domain. The DNS lookup proceeds as follows:
- Browser cache - The browser caches DNS records for some time. Interestingly, the OS does not tell the browser the time-to-live for each DNS record, and so the browser caches them for a fixed duration (varies between browsers, 2 – 30 minutes).
- OS cache – If the browser cache does not contain the desired record, the browser makes a system call (gethostbyname in Windows). The OS has its own cache.
- Router cache – The request continues on to your router, which typically has its own DNS cache.
- ISP DNS cache – The next place checked is your ISP’s DNS server, which naturally has a cache of its own.
- Recursive search – Your ISP’s DNS server begins a recursive search, from the root nameserver, through the .com top-level nameserver, to Facebook’s nameserver. Normally, the DNS server will have names of the .com nameservers in cache, and so a hit to the root nameserver will not be necessary.
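The lookup order above can be sketched in a few lines of C++. This is only an illustration under assumptions: the type DnsCache, the functions resolve and recursive_search, and the address 203.0.113.10 are all made up for the example, and time-to-live handling is left out entirely.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// One cache layer (browser, OS, router, ISP): hostname -> IP address.
using DnsCache = std::map<std::string, std::string>;

// Hypothetical stand-in for the recursive search. A real resolver would
// query the root, top-level, and authoritative nameservers here.
inline std::optional<std::string> recursive_search(const std::string& host) {
    if (host == "facebook.com") return "203.0.113.10";  // made-up address
    return std::nullopt;
}

// Check each cache in order. On a miss in every layer, fall back to the
// recursive search and store the answer in all layers.
inline std::optional<std::string> resolve(std::vector<DnsCache>& layers,
                                          const std::string& host) {
    for (DnsCache& cache : layers) {
        auto it = cache.find(host);
        if (it != cache.end()) return it->second;  // first hit wins
    }
    std::optional<std::string> ip = recursive_search(host);
    if (ip) {
        for (DnsCache& cache : layers) cache[host] = *ip;
    }
    return ip;
}
```

Calling resolve twice with the same name would hit the first cache layer the second time, which is the whole point of the chain.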
Here is a diagram of what a recursive DNS search looks like:

One worrying thing about DNS is that the entire domain like wikipedia.org or facebook.com seems to map to a single IP address. Fortunately, there are ways of mitigating the bottleneck:
- Round-robin DNS is a solution where the DNS lookup returns multiple IP addresses, rather than just one. For example, facebook.com actually maps to four IP addresses.
- Load-balancer is the piece of hardware that listens on a particular IP address and forwards the requests to other servers. Major sites will typically use expensive high-performance load balancers.
- Geographic DNS improves scalability by mapping a domain name to different IP addresses, depending on the client’s geographic location. This is great for hosting static content so that different servers don’t have to update shared state.
- Anycast is a routing technique where a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP and is rarely used in that scenario.
Most of the DNS servers themselves use anycast to achieve high availability and low latency of the DNS lookups.
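To make the round-robin idea concrete, here is a rough C++ sketch that hands out addresses from a fixed list in rotation. The class name RoundRobin is hypothetical, and this is a simplification: a real DNS server typically returns the whole list in rotated order and lets the client pick.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hands out addresses from a non-empty list, one after another.
class RoundRobin {
public:
    explicit RoundRobin(std::vector<std::string> addresses)
        : addresses_(std::move(addresses)) {}

    // Return the next address, wrapping around at the end of the list.
    const std::string& next() {
        const std::string& chosen = addresses_[index_];
        index_ = (index_ + 1) % addresses_.size();
        return chosen;
    }

private:
    std::vector<std::string> addresses_;
    std::size_t index_ = 0;
};
```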
3. The browser sends an HTTP request to the web server

You can be pretty sure that Facebook’s homepage will not be served from the browser cache because dynamic pages expire either very quickly or immediately (expiry date set to past).
So, the browser will send this request to the Facebook server:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]
The GET request names the URL to fetch: “http://facebook.com/”. The browser identifies itself (User-Agent header), and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
The request also contains the cookies that the browser has for this domain. As you probably already know, cookies are key-value pairs that track the state of a web site in between different page requests. And so the cookies store the name of the logged-in user, a secret number that was assigned to the user by the server, some of user’s settings, etc. The cookies will be stored in a text file on the client, and sent to the server with every request.
There is a variety of tools that let you view the raw HTTP requests and corresponding responses. My favorite tool for viewing the raw HTTP traffic is Fiddler, but there are many other tools (e.g., FireBug). These tools are a great help when optimizing a site.
In addition to GET requests, another type of request that you may be familiar with is a POST request, typically used to submit forms. A GET request sends its parameters via the URL (e.g.: http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just under the headers.
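The difference between the two request types is easy to see when you build the request text by hand. The following C++ sketch is only an illustration: build_get and build_post are hypothetical helper names, the requests carry the bare minimum of headers, and they use the plain path form of the request line rather than the full URL shown in the capture above.

```cpp
#include <string>

// GET: the parameters travel in the URL's query string.
inline std::string build_get(const std::string& host, const std::string& path,
                             const std::string& query) {
    std::string target = query.empty() ? path : path + "?" + query;
    return "GET " + target + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "\r\n";
}

// POST: the parameters travel in the body, right after the blank line
// that ends the headers.
inline std::string build_post(const std::string& host, const std::string& path,
                              const std::string& body) {
    return "POST " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Content-Type: application/x-www-form-urlencoded\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "\r\n" + body;
}
```

For example, build_get("robozzle.com", "/puzzle.aspx", "id=85") puts id=85 on the request line, while build_post with the same parameter puts it after the headers.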
The trailing slash in the URL “http://facebook.com/” is important. In this case, the browser can safely add the slash. For URLs of the form http://example.com/folderOrFile, the browser cannot automatically add a slash, because it is not clear whether folderOrFile is a folder or a file. In such cases, the browser will visit the URL without the slash, and the server will respond with a redirect, resulting in an unnecessary roundtrip.
4. The Facebook server responds with a permanent redirect

This is the response that the Facebook server sent back to the browser request:
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0
The server responded with a 301 Moved Permanently response to tell the browser to go to “http://www.facebook.com/” instead of “http://facebook.com/”.
There are interesting reasons why the server insists on the redirect instead of immediately responding with the web page that the user wants to see.
One reason has to do with search engine rankings. See, if there are two URLs for the same page, say http://www.igoro.com/ and http://igoro.com/, search engine may consider them to be two different sites, each with fewer incoming links and thus a lower ranking. Search engines understand permanent redirects (301), and will combine the incoming links from both sources into a single ranking.
Also, multiple URLs for the same content are not cache-friendly. When a piece of content has multiple names, it will potentially appear multiple times in caches.
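To see what the browser has to pull out of such a response before it can follow the redirect, here is a small C++ sketch. The names Redirect and parse_redirect are made up for this example, and the parsing is deliberately simplified: it assumes the header is spelled exactly "Location: ", whereas real header names are case-insensitive.

```cpp
#include <sstream>
#include <string>

struct Redirect {
    int status = 0;        // e.g. 301
    std::string location;  // value of the Location header, empty if absent
};

inline Redirect parse_redirect(const std::string& response) {
    Redirect result;
    std::istringstream in(response);
    std::string line;

    // Status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (std::getline(in, line)) {
        std::istringstream status_line(line);
        std::string version;
        status_line >> version >> result.status;
    }

    // Header lines, up to the blank line that ends the header section.
    const std::string key = "Location: ";
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;
        if (line.compare(0, key.size(), key) == 0) {
            result.location = line.substr(key.size());
        }
    }
    return result;
}
```

A browser-like client would check for status 301 and then send a new GET request to the returned location.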
5. The browser follows the redirect

The browser now knows that “http://www.facebook.com/” is the correct URL to go to, and so it sends out another GET request:
GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com
The meaning of the headers is the same as for the first request.
6. The server ‘handles’ the request

The server will receive the GET request, process it, and send back a response.
This may seem like a straightforward task, but in fact there is a lot of interesting stuff that happens here – even on a simple site like my blog, let alone on a massively scalable site like facebook.
- Web server software
The web server software (e.g., IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle this request. A request handler is a program (in ASP.NET, PHP, Ruby, …) that reads the request and generates the HTML for the response.
In the simplest case, the request handlers can be stored in a file hierarchy whose structure mirrors the URL structure, and so for example http://example.com/folder1/page1.aspx URL will map to file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that URLs are manually mapped to request handlers, and so the public URL of page1.aspx could be http://example.com/folder1/page1.
- Request handler
The request handler reads the request, its parameters, and cookies. It will read and possibly update some data stored on the server. Then, the request handler will generate an HTML response.
One interesting difficulty that every dynamic website faces is how to store data. Smaller sites will often have a single SQL database to store their data, but sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting up a table across multiple databases based on the primary key), replication, and usage of simplified databases with weakened consistency semantics.
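As a rough illustration of sharding by primary key, the simplest possible scheme takes the key modulo the number of databases. The function name shard_for is hypothetical, and real systems often use more elaborate schemes so that shards can be added without moving most of the data.

```cpp
#include <cstddef>
#include <cstdint>

// Pick the database shard that stores the row with this primary key.
// shard_count must be greater than zero.
inline std::size_t shard_for(std::uint64_t primary_key, std::size_t shard_count) {
    return static_cast<std::size_t>(primary_key % shard_count);
}
```

With four shards, every row whose key leaves the same remainder when divided by four lands in the same database.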
One technique to keep data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the newsfeed in a timely fashion, but the data backing the “People you may know” feature may only need to be updated nightly (my guess, I don’t actually know how they implement this feature). Batch job updates result in staleness of some less important data, but can make data updates much faster and simpler.
7. The server sends back an HTML response

Here is the response that the server generated and sent back:
HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT
2b3��������T�n�@����[...]
The entire response is 36 kB, the bulk of it in the byte blob at the end that I trimmed.
The Content-Encoding header tells the browser that the response body is compressed using the gzip algorithm. After decompressing the blob, you’ll see the HTML you’d expect:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en" id="facebook" class=" no_js">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...
In addition to compression, headers specify whether and how to cache the page, any cookies to set (none in this response), privacy information, etc.
Notice the header that sets Content-Type to text/html. The header instructs the browser to render the response content as HTML, instead of say downloading it as a file. The browser will use the header to decide how to interpret the response, but will consider other factors as well, such as the extension of the URL.
8. The browser begins rendering the HTML
Even before the browser has received the entire HTML document, it begins rendering the website:

9. The browser sends requests for objects embedded in HTML

As the browser renders the HTML, it will notice tags that require fetching of other URLs. The browser will send a GET request to retrieve each of these files.
Here are a few URLs that my visit to facebook.com retrieved:
- Images
http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
…
- CSS style sheets
http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
…
- JavaScript files
http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
…
Each of these URLs will go through a process similar to what the HTML page went through. So, the browser will look up the domain name in DNS, send a request to the URL, follow redirects, etc.
However, static files – unlike dynamic pages – allow the browser to cache them. Some of the files may be served up from cache, without contacting the server at all. The browser knows how long to cache a particular file because the response that returned the file contained an Expires header. Additionally, each response may also contain an ETag header that works like a version number – if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.
Can you guess what “fbcdn.net” in the URLs stands for? A safe bet is that it means “Facebook content delivery network”. Facebook uses a content delivery network (CDN) to distribute static content – images, style sheets, and JavaScript files. So, the files will be copied to many machines across the globe.
Static content often represents the bulk of the bandwidth of a site, and can be easily replicated across a CDN. Often, sites will use a third-party CDN provider, instead of operating a CDN themselves. For example, Facebook’s static files are hosted by Akamai, the largest CDN provider.
As a demonstration, when you try to ping static.ak.fbcdn.net, you will get a response from an akamai.net server. Also, interestingly, if you ping the host a couple of times, you may get responses from different servers, which demonstrates the load-balancing that happens behind the scenes.
10. The browser sends further asynchronous (AJAX) requests

In the spirit of Web 2.0, the client continues to communicate with the server even after the page is rendered.
For example, Facebook chat will continue to update the list of your logged in friends as they come and go. To update the list of your logged-in friends, the JavaScript executing in your browser has to send an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request that goes to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.
This pattern is sometimes referred to as “AJAX”, which stands for “Asynchronous JavaScript And XML”, even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.
Among other things, the Fiddler tool lets you view the asynchronous requests sent by your browser. In fact, not only can you observe the requests passively, but you can also modify and resend them. The fact that it is this easy to “spoof” AJAX requests causes a lot of grief to developers of online games with scoreboards. (Obviously, please don’t cheat that way.)
Facebook chat provides an example of an interesting problem with AJAX: pushing data from server to client. Since HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see if any new messages arrived.
Long polling is an interesting technique to decrease the load on the server in these types of scenarios. If the server does not have any new messages when polled, it simply does not send a response back. And, if a message for this client is received within the timeout period, the server will find the outstanding request and return the message with the response.
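The server side of long polling can be sketched without any networking. In the C++ below, everything is an assumption made for illustration: the ChatServer class, its poll and deliver methods, and the idea of representing the open HTTP request as a callback. Timeouts are left out to keep it short.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>

class ChatServer {
public:
    using Reply = std::function<void(const std::string&)>;

    // A client polls. If a message is waiting, answer at once. Otherwise
    // hold the request open by remembering how to reply later.
    void poll(const std::string& client, Reply reply) {
        std::queue<std::string>& inbox = inbox_[client];
        if (!inbox.empty()) {
            std::string message = inbox.front();
            inbox.pop();
            reply(message);
        } else {
            waiting_[client] = std::move(reply);
        }
    }

    // A new message arrives. If the client has an outstanding request,
    // complete it now. Otherwise queue the message for the next poll.
    void deliver(const std::string& client, const std::string& message) {
        auto it = waiting_.find(client);
        if (it != waiting_.end()) {
            Reply reply = std::move(it->second);
            waiting_.erase(it);
            reply(message);
        } else {
            inbox_[client].push(message);
        }
    }

private:
    std::map<std::string, std::queue<std::string>> inbox_;
    std::map<std::string, Reply> waiting_;
};
```

The key point is in deliver: the server answers an old request the moment a message shows up, so the client does not have to ask again every few seconds.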
Conclusion
Hopefully this gives you a better idea of how the different web pieces work together.
Happy new year 2010!
Let’s celebrate Chinese New Year! Hooray…
The importance of stupidity in scientific research
From the Journal of Cell Science, a well-known scientific journal in the vein of Nature and Science.
This essay tells us how to approach Ph.D. study. It helps.
The importance of stupidity in scientific research
Department of Microbiology, UVA Health System, University of Virginia, Charlottesville, VA 22908, USA
e-mail: maschwartz@virginia.edu
Accepted 9 April 2008
I recently saw an old friend for the first time in many years. We had been Ph.D. students at the same time, both studying science, although in different areas. She later dropped out of graduate school, went to Harvard Law School and is now a senior lawyer for a major environmental organization. At some point, the conversation turned to why she had left graduate school. To my utter astonishment, she said it was because it made her feel stupid. After a couple of years of feeling stupid every day, she was ready to do something else.
I had thought of her as one of the brightest people I knew and her subsequent career supports that view. What she said bothered me. I kept thinking about it; sometime the next day, it hit me. Science makes me feel stupid too. It’s just that I’ve gotten used to it. So used to it, in fact, that I actively seek out new opportunities to feel stupid. I wouldn’t know what to do without that feeling. I even think it’s supposed to be this way. Let me explain.
For almost all of us, one of the reasons that we liked science in high school and college is that we were good at it. That can’t be the only reason – fascination with understanding the physical world and an emotional need to discover new things has to enter into it too. But high-school and college science means taking courses, and doing well in courses means getting the right answers on tests. If you know those answers, you do well and get to feel smart.
A Ph.D., in which you have to do a research project, is a whole different thing. For me, it was a daunting task. How could I possibly frame the questions that would lead to significant discoveries; design and interpret an experiment so that the conclusions were absolutely convincing; foresee difficulties and see ways around them, or, failing that, solve them when they occurred? My Ph.D. project was somewhat interdisciplinary and, for a while, whenever I ran into a problem, I pestered the faculty in my department who were experts in the various disciplines that I needed. I remember the day when Henry Taube (who won the Nobel Prize two years later) told me he didn’t know how to solve the problem I was having in his area. I was a third-year graduate student and I figured that Taube knew about 1000 times more than I did (conservative estimate). If he didn’t have the answer, nobody did.
That’s when it hit me: nobody did. That’s why it was a research problem. And being my research problem, it was up to me to solve. Once I faced that fact, I solved the problem in a couple of days. (It wasn’t really very hard; I just had to try a few things.) The crucial lesson was that the scope of things I didn’t know wasn’t merely vast; it was, for all practical purposes, infinite. That realization, instead of being discouraging, was liberating. If our ignorance is infinite, the only possible course of action is to muddle through as best we can.
I’d like to suggest that our Ph.D. programs often do students a disservice in two ways. First, I don’t think students are made to understand how hard it is to do research. And how very, very hard it is to do important research. It’s a lot harder than taking even very demanding courses. What makes it difficult is that research is immersion in the unknown. We just don’t know what we’re doing. We can’t be sure whether we’re asking the right question or doing the right experiment until we get the answer or the result. Admittedly, science is made harder by competition for grants and space in top journals. But apart from all of that, doing significant research is intrinsically hard and changing departmental, institutional or national policies will not succeed in lessening its intrinsic difficulty.
Second, we don’t do a good enough job of teaching our students how to be productively stupid – that is, if we don’t feel stupid it means we’re not really trying. I’m not talking about `relative stupidity’, in which the other students in the class actually read the material, think about it and ace the exam, whereas you don’t. I’m also not talking about bright people who might be working in areas that don’t match their talents. Science involves confronting our `absolute stupidity’. That kind of stupidity is an existential fact, inherent in our efforts to push our way into the unknown. Preliminary and thesis exams have the right idea when the faculty committee pushes until the student starts getting the answers wrong or gives up and says, `I don’t know’. The point of the exam isn’t to see if the student gets all the answers right. If they do, it’s the faculty who failed the exam. The point is to identify the student’s weaknesses, partly to see where they need to invest some effort and partly to see whether the student’s knowledge fails at a sufficiently high level that they are ready to take on a research project.
Productive stupidity means being ignorant by choice. Focusing on important questions puts us in the awkward position of being ignorant. One of the beautiful things about science is that it allows us to bumble along, getting it wrong time after time, and feel perfectly fine as long as we learn something each time. No doubt, this can be difficult for students who are accustomed to getting the answers right. No doubt, reasonable levels of confidence and emotional resilience help, but I think scientific education might do more to ease what is a very big transition: from learning what other people once discovered to making your own discoveries. The more comfortable we become with being stupid, the deeper we will wade into the unknown and the more likely we are to make big discoveries.
say thanks in English
I saw a great article this afternoon from 5xue.com on how to say “thank you” in English. There are so many kinds of expressions, not just the “thank you” and “thanks very much” that we always say.
1. Thank you for one of the most memorable days of my trip.
2. Thank you for one of the most enjoyable visits we have had in many months.
3. Thank you for doing so much to make my trip to New York interesting.
4. Thank you for contributing so much to the pleasure of our stay in…
5. Thank you so much for your generous hospitality.
6. I hope something will bring you to New York soon so that I can reciprocate your kindness.
7. You must give me the chance to return your kindness when you visit here.
8. Thank you very much (ever so much) (most sincerely) (indeed) (from the bottom of my heart).
9. Many thanks for your kind and warm letter.
10. Thanks a million.
11. Please accept (I wish to express) my sincere (grateful) (profound) appreciation for..
12. I sincerely (deeply) (warmly) appreciate..
13. I am very sincerely (most) (truly) grateful to you for …
14. There is nothing more important to me than to receive one of your letters.
15. Your most courteous letter..
16. I cannot tell you how much your letter delighted me.
17. I love the way you say things in your letter. You make even the smallest incident seem so interesting …
18. It was good (fine)(charming)(thoughtful) of you…
19. It was nice (characteristically thoughtful) (more than kind) of you…
20. At the outset, I want to thank you for your kindness to me and for your compliments.
21. We were deeply touched by …
22. It is a hopeless understatement to say that I am deeply grateful.
23. It is generous of you to take so much interest in my work.
The Great Debaters
I saw a great movie, The Great Debaters, last night. It impressed me a lot. In particular, it showed me the power of words, which can be like a sword in war. Also see the Great Debaters Movie .com
Directed by Denzel Washington, it is a 2007 American biographical period drama, co-starring Forest Whitaker, Kimberly Elise, Nate Parker…
One thing needs to be corrected here: Wiley College never beat Harvard College in the 1930s. In fact, it beat the University of Southern California in the matchup. See Wikipedia:
The film depicts the Wiley Debate team beating Harvard College in the 1930s. This meeting actually never occurred. The debate most likely similar to the one depicted by the movie was the match up between Wiley and The University of Southern California, who at the time were the reigning debating champions. Wiley College did indeed win this matchup.[5] According to Robert Eisele: “In that era, there was much at stake when a black college debated any white school, particularly one with the stature of Harvard. We used Harvard to demonstrate the heights they achieved.”[6]
The film omits another reality: even though they beat the reigning champions, the Great Debaters were not allowed to call themselves victors because they were not truly considered to belong to the debate society; blacks were not admitted until after World War II.[7]
Very famous line:
- Who is the judge?
- The judge is God.
- Why is he God?
- Because he decides who wins or loses not my opponent.
- Who is your opponent?
- He doesn’t exist.
- Why does he not exist?
- Because he is a mere dissenting voice to the truth that I speak.
- Speak the truth!
- Speak the truth!
At the end, Farmer Jr. said this in his debate, which let him win:
St. Augustine said:
”An unjust law is no law at all.”
Which means
I have a right, even a duty to resist.
With violence or civil disobedience…
You should pray I choose the latter.
I like it.
You can also see the report from that day in a US newspaper. It is a true story.

C++ File Input/Output
C++ File Input and Output, from hmc.edu
Details of file I/O seem to be buried at the back, missing, or overly complicated in most C++ manuals. This page provides a quick reference for the most frequently used methods.
This page only discusses basic options that seem to be common to all my C++ references. Apparently there is a lot of variation from one manual to another, and from one implementation to another. I believe the methods below to be safe and portable, at least for ASCII (human-readable text) files.
Header files
To do input and output, you will need to load the iostream header file. You may also need to load the fstream (file I/O) and/or iomanip (format manipulation) header files. Put some/all of the following lines at the top of your code file (or in the header file for your program, if you are using one).
#include <iostream> // I/O
#include <fstream> // file I/O
#include <iomanip> // format manipulation
using namespace std; // lets the examples below write cout, ifstream, etc. unqualified
Getting a stream
Three streams just exist: cout (terminal output), cin (terminal input), and cerr (error output, which also goes to the terminal).
When writing error messages, use cerr rather than cout. In simple examples, the two appear to be the same. However, they behave differently when you use Unix redirection operators when you run your program. In particular, if you redirect output to a file, error messages printed to cerr will still appear on the user’s terminal whereas error messages printed to cout will be mixed into the output file.
File streams are of type ifstream (input) or ofstream (output).
ifstream fp_in; // declarations of streams fp_in and fp_out
ofstream fp_out;
fp_in.open("myfile.txt", ios::in); // open the streams
fp_out.open("myfile.txt", ios::out);
fp_in.close(); // close the streams
fp_out.close();
A file should be closed if you are done with it, but the program will continue running for a while longer. This is particularly important when you intend to open a lot of files, as there may be a limit on how many you can have open at once. It is also a good idea if you intend to open a file for input, and then re-open the file for output.
Declaring the pointer and opening the file can be combined:
ifstream fp_in("myfile.txt", ios::in); // declare and open
The parameters ios::in and ios::out specify the mode in which the file is to be opened. Depending on the implementation of C++, it may be possible to specify a variety of other mode options, such as appending to the end of an existing file, triggering an error rather than overwriting an existing file, or specifying that a file is binary for operating systems (e.g. MS-DOS) which distinguish binary and ASCII files.
Passing streams to functions
File streams must be passed to functions by reference, not by value.
void myfunction(ifstream &fp, ...) // use this
void myfunction(ifstream fp, ...) // not this
If you pass streams by value, older pre-standard C++ compilers will not complain. However, mysterious bad things will start happening, often in parts of the code which don't appear to be related to the offending function. Standard C++ stream classes cannot be copied, so a modern compiler rejects the by-value version outright.
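Here is a small sketch of a function that takes its streams by reference. The function name is hypothetical. Declaring the parameters as istream & and ostream & (the base classes) rather than ifstream & and ofstream & is a design choice, not something the text above requires: it lets the same function work on files, cin/cout, or string streams.

```cpp
#include <istream>
#include <ostream>
#include <sstream>   // string streams, handy for trying the function without a file
#include <string>

// Hypothetical helper: copies every line from `in` to `out`, prefixing a line
// number. Both streams are references, so the caller's streams really advance.
int copy_numbered(std::istream& in, std::ostream& out) {
    std::string line;
    int count = 0;
    while (std::getline(in, line)) {
        ++count;
        out << count << ": " << line << '\n';
    }
    return count;                        // number of lines copied
}
```

A call such as copy_numbered(fp_in, cout) would list an open input file on the terminal.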
Item by item input and output
If each input item is surrounded by whitespace (blanks, tabs, newlines), the items can be read easily using the extraction operator >>.
int myinteger; // declarations
float myfloat;
char mychar;
char mystring[64]; // a string needs storage: use an array, not an uninitialized char *
fp_in >> myinteger; // input from file stream or standard input
cin >> myfloat;
fp_in >> mychar;
cin >> mystring;
The extraction operator works for numbers (ints, floats), characters (char), and strings (declared as arrays of type char, or as pointers to type char that already point at allocated storage).
The extraction operator returns the stream itself, which tests as zero (false) if the read encountered a problem (typically, the end of the file). Therefore, it can be used as the test in an if statement or a while loop.
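For instance, a minimal sketch of using the extraction as a loop test (the function name is made up for illustration):

```cpp
#include <istream>
#include <sstream>   // string streams, handy for trying the function without a file

// Hypothetical helper: adds up integers until the end of the input or until
// something that is not an integer shows up, whichever comes first.
int sum_ints(std::istream& in) {
    int value = 0;
    int total = 0;
    while (in >> value) {                // the loop ends when an extraction fails
        total += value;
    }
    return total;
}
```

Note that the loop stops silently at the first item that is not a number, so anything after it is never read.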
WARNING: when reading data into a character string, bad things will happen if the input word is longer than your string. To avoid problems, use the manipulator setw to force excessively long input to be broken up. (You must include the iomanip header file.) The input to setw should be the length of your string (including the null character '\0' at the end of the string).
cin >> setw(length) >> mystring;
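A short sketch of what "broken up" means in practice, assuming an 8-character buffer (the function name is hypothetical):

```cpp
#include <iomanip>
#include <istream>
#include <sstream>   // string streams, handy for trying the function without a file
#include <string>

// Hypothetical helper: reads the next word, but never more than 7 characters
// of it, because the buffer holds 7 characters plus the terminating '\0'.
std::string read_chunk(std::istream& in) {
    char buffer[8];
    buffer[0] = '\0';                    // stays empty if nothing can be read
    in >> std::setw(static_cast<int>(sizeof buffer)) >> buffer;
    return std::string(buffer);
}
```

The width applies to one extraction only, which is why setw is repeated on every read. Whatever did not fit stays in the stream and is returned by the next call.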
Numbers, characters, and strings can be written to a file, standard output, or the standard error using the insertion operator <<.
cout << "Value of myinteger " << myinteger << endl;
cout << "My string is " << mystring << " plus a null character\n" << flush;
To insert a line break, either insert the magic variable endl or write the end-of-line character (‘\n’) to the output.
To make a pointer print out as a pointer, not as whatever type of data it points to, cast it to the type (void *). To make a character print as a number, cast it to type int. Similarly, you can use a cast to convince C++ to print an integer as the corresponding character.
cout << (void *)ptr;
cout << (int)ch;
cout << (char)ival;
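The same casts, sketched with static_cast and a string stream so the results can be inspected. The function names are hypothetical, and the numeric values assume an ASCII character set.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: prints a character followed by its numeric code.
std::string char_and_code(char ch) {
    std::ostringstream out;
    out << ch << '=' << static_cast<int>(ch);
    return out.str();
}

// Hypothetical helper: prints an integer as the corresponding character.
std::string code_to_char(int code) {
    std::ostringstream out;
    out << static_cast<char>(code);
    return out.str();
}

// Hypothetical helper: prints where the text lives instead of the text itself.
// The exact format of the address depends on the implementation.
void print_address(std::ostream& out, const char* text) {
    out << static_cast<const void*>(text);
}
```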
Buffering and flush
When you send output to a stream, it does not necessarily get printed immediately. Rather, it may wait in a buffer until some unspecified event, e.g. buffer full enough, reading from input, or exit from program. The details may vary.
Buffering makes it faster to print a large amount of output, particularly if you are producing the output bit-by-bit (e.g. one character at a time). However, it is a nuisance for output used in debugging (e.g. a statement that informs you that step 3 of your algorithm has been finished).
Forcing all buffered output to actually be printed is known as “flushing” the stream. A flush can be forced by calling the flush function associated with each output stream, inserting the magic variable flush into the stream, or inserting endl.
cout << flush;
cout.flush();
cout << endl;
Other input operations
All of the following are illustrated using the standard input, but they work just the same on file streams.
- cin.get(char &ch)
- Puts the next input character in the variable ch. Returns the stream, which tests as zero if it encountered a problem (e.g. end of file).
- cin.getline(char *buffer, int length)
- Reads characters into the string buffer, stopping when (a) it has read length-1 characters or (b) when it finds an end-of-line character ('\n') or the end of the file. Stores a null character ('\0') after the last character read.
- cin.read(char *buffer, int n)
- Reads n bytes (or until the end of the file) from the stream into the buffer.
- cin.gcount()
- Returns the number of characters read by a preceding get, getline, or read command.
- cin.ignore(int n)
- Remove the next n characters (or until end of file) from the input stream, throwing them away into the Great Bit Bucket.
- cin.putback(char ch)
- Puts character ch back onto the stream. Bad things will happen if this character is not the one most recently extracted from the stream.
Apart from gcount, which returns a count, these operations return the stream, which tests as zero if something goes wrong, e.g. they hit the end of the file. Therefore, they can be used as the condition in an if statement or while loop.
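A sketch that combines getline, gcount, and ignore to read lines into a small fixed buffer, cutting off lines that are too long. The function name and the buffer size are made up for illustration. It relies on two details of the standard library that are not spelled out above: getline puts the stream into a failed state when the line does not fit, and ignore accepts a second argument naming a character to stop at.

```cpp
#include <istream>
#include <limits>
#include <sstream>   // string streams, handy for trying the function without a file
#include <string>

// Hypothetical helper: reads one line, keeping at most its first 7 characters.
// Returns false once there is nothing left to read.
bool read_capped_line(std::istream& in, std::string& line) {
    char buffer[8];                              // 7 characters plus '\0'
    if (in.getline(buffer, sizeof buffer)) {     // the whole line fit
        line = buffer;
        return true;
    }
    if (in.gcount() == 0) return false;          // nothing was read at all
    // The line was too long: the first 7 characters are in the buffer and the
    // stream is in a failed state. Recover and throw away the rest of the line.
    in.clear();
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    line = buffer;
    return true;
}
```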
In addition, there are two more input operations, get and peek. These functions return EOF (which is secretly -1) if they encounter the end of the file. The output of these functions should be put into an integer (not a char) variable.
- cin.get()
- Returns the next character in the stream.
- cin.peek()
- Returns the next character in the stream but does not remove it from the stream.
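As a sketch of why the result belongs in an int, the following reads a run of digits one character at a time, looking at each character with peek before deciding whether to take it with get. The function name is hypothetical.

```cpp
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>   // string streams, handy for trying the function without a file
#include <string>

// Hypothetical helper: collects the digits at the front of the stream and
// leaves the first non-digit character unread.
std::string read_digits(std::istream& in) {
    std::string digits;
    int c;                               // int, so that EOF can be told apart from real characters
    while ((c = in.peek()) != EOF && std::isdigit(c)) {
        digits += static_cast<char>(in.get());
    }
    return digits;
}
```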
The following functions can be used to test the status of a stream. They return an integer, which is either zero or non-zero.
- cin.good()
- Returns 0 if the stream has encountered a problem, such as reaching the end of the file or trying to open a non-existent file.
- cin.bad()
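- Returns a non-zero value if the stream is totally unusable, e.g. a low-level read or write error has occurred (but not if the stream has merely hit the end of the file). With a standard C++ library, a file that cannot be opened shows up as good() returning 0 and fail() returning non-zero, not necessarily as bad().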
- cin.eof()
- Returns a non-zero value if the stream has hit the end of the file.
Notice that the stream's status changes (good becomes zero, eof becomes non-zero, etc.) only after the first read request which encounters a problem. So, to use one of these functions, you first attempt to do what you wanted to do (e.g. open the file, read the next number from the file, …). If the action can't succeed, the program won't crash, though some of your variables might not contain the values you intended. Next, use the status function to check whether the action succeeded.
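The "attempt first, check afterwards" pattern might look like the sketch below. The function name and the returned strings are made up for illustration. It also uses fail(), a standard status function not listed above, which is non-zero whenever the last operation did not succeed.

```cpp
#include <istream>
#include <sstream>   // string streams, handy for trying the function without a file
#include <string>

// Hypothetical helper: tries to read one integer, then reports what happened.
std::string try_read_int(std::istream& in, int& value) {
    in >> value;                                 // attempt the read first
    if (!in.fail()) return "ok";                 // then check the status
    if (in.bad()) return "stream unusable";
    if (in.eof()) return "end of input";
    // Otherwise the next item was not a number: reset the status and skip it.
    in.clear();
    std::string junk;
    in >> junk;
    return "bad format";
}
```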
Other output operations
Other options for writing data to an output stream are:
- cout.put(char ch)
- Puts ch onto the stream.
- cout.write(char *str, int n)
- Puts n characters onto the stream, reading them from the string str.
The function setf can be used to change formatting parameters for an output stream. For example, the following causes numbers to be left justified.
cout.setf(ios::left); // set option
cout.unsetf(ios::left); // unset option
The most obviously useful parameters are:
- ios::left
- Left justify output.
- ios::right
- Right justify output.
- ios::scientific
- Print floating point numbers using scientific notation.
- ios::fixed
- Print floating point numbers using fixed point notation.
- ios::showpoint
- Print a decimal point for all floating point numbers, even when it’s not needed (e.g. the number is exactly an integer).
The precision of numbers can be changed as follows. You can also set the width, i.e. the minimum number of characters used to print the next item. These features are used, for example, to make output items line up in columns when printed. The manipulators setprecision and setw require that you include the iomanip header file; the member functions precision and width do not.
cout << setprecision(2); // print two digits after the decimal point (when ios::fixed or ios::scientific is set)
cout.precision(2); // an alternative syntax
cout << setw(8); // make item occupy 8 characters
cout.width(8); // an alternative syntax
Setting the width to k forces the item to occupy at least k characters. If its printed representation is shorter, blanks will be added so that it occupies k characters. If its printed representation is longer, it will occupy more than k characters.
When you reset parameters such as the precision, the new value does not always last equally long. In standard C++, the width applies only to the next item printed and then goes back to zero, whereas the precision and the setf options persist until you change them. Older compilers did not always agree on this. Therefore, if you care about the value of some formatting parameter, explicitly set it to the right thing before you output each item.
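A small sketch of lining items up in columns, setting every parameter explicitly before each item. The function name and the column widths are made up for illustration. It uses the two-argument form of setf, which clears the competing options (e.g. ios::right when setting ios::left) in the same call.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: a left-justified name in 8 columns followed by a
// right-justified number in 8 columns with two digits after the decimal point.
std::string format_row(const std::string& name, double value) {
    std::ostringstream out;
    out.setf(std::ios::left, std::ios::adjustfield);
    out << std::setw(8) << name;                 // the width covers this item only
    out.setf(std::ios::right, std::ios::adjustfield);
    out.setf(std::ios::fixed, std::ios::floatfield);
    out << std::setprecision(2) << std::setw(8) << value;
    return out.str();
}
```

A name longer than 8 characters is printed in full and simply pushes the number column to the right.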
For examples of usage, and how the formatting options affect the printing of numbers, look at Bob Keller’s format sampler.
Repositioning and error states
In general, it is possible to move to any position in a file stream. This is a capability most programmers use rarely, if at all. For fairly obvious reasons, don’t try to use repositioning operations on the standard input, output, or error streams.
The most likely reason for using a repositioning command is to rewind a file to the start, so that you can read its contents again. This is done as follows:
fp_in.clear(); // forget we hit the end of file
fp_in.seekg(0, ios::beg); // move to the start of the file
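Wrapped into a function, the rewind can be tried out as in the sketch below. The function names are hypothetical, and a string stream stands in for a file so that the example is self-contained.

```cpp
#include <istream>
#include <sstream>   // string streams, handy for trying the functions without a file
#include <string>

// Hypothetical helper: reads to the end of the stream, counting lines.
int count_lines(std::istream& in) {
    std::string line;
    int count = 0;
    while (std::getline(in, line)) ++count;
    return count;
}

// Hypothetical helper: the two rewinding steps shown above.
void rewind_stream(std::istream& in) {
    in.clear();                          // forget we hit the end of file
    in.seekg(0, std::ios::beg);          // move to the start
}
```

Without the call to clear, the stream would still remember the end of file and a second pass would read nothing.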
using cin to get user input
The same question keeps popping up: how do I get user input from cin using >> into a variable of type X? Using the >> operator opens you up to a lot of problems. Just try entering some invalid data and see how it handles it.
cin >> is notorious for causing input issues because it doesn't remove the newline character from the stream and doesn't do type-checking. So anyone using cin >> var; and following it up with getline(); will receive an empty input. It's best practice to NOT MIX the different types of input methods from cin.
I know it's going to be easier to use cin >> integer; than the code below. However, the code below is type-safe, and if you enter something that isn't an integer it will handle it. A bare cin >> integer; inside an input loop will not: the invalid characters stay in the stream, every later read fails, and the loop typically spins forever while your variable never receives a meaningful value.
Another disadvantage of using cin >> stringvar; is that cin does no checks for length, and it breaks on a space. So if you enter more than one word, only the first word is loaded, leaving the space and the following words still in the input stream.
A more elegant solution, and one that is much easier to use, is the getline() function. The example below shows you how to load information and convert it between types.
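A minimal sketch of the getline approach, assuming the goal is to read an integer: take the whole line as a string, then convert it with a string stream and reject anything left over. The function names, the prompt handling, and the error message are made up for illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: true only if the whole line is one integer
// (surrounding spaces are allowed). `value` is left alone on failure.
bool parse_int(const std::string& line, int& value) {
    std::istringstream parser(line);
    int parsed = 0;
    char leftover;
    if (!(parser >> parsed)) return false;       // not a number at all
    if (parser >> leftover) return false;        // trailing junk such as "42abc"
    value = parsed;
    return true;
}

// Hypothetical helper: keeps asking until a valid integer is entered.
// Returns false if the input runs out first.
bool prompt_int(std::istream& in, std::ostream& out,
                const std::string& prompt, int& value) {
    std::string line;
    for (;;) {
        out << prompt;
        if (!std::getline(in, line)) return false;   // end of input
        if (parse_int(line, value)) return true;
        out << "Please enter a whole number.\n";
    }
}
```

With the standard streams this would be called as prompt_int(cin, cout, "Age: ", age). Because getline always consumes the whole line, including the newline, invalid input cannot get stuck in the stream.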