Manual Data Mining
One of the more poignant things about my thesis is that I analyzed 417 Friendster Profiles manually, over the span of five months. I still have bad memories of sitting in front of my computer, poring over the Profiles (and their screen-captures) for 12 straight hours a day for a full week, just to get the color categorization right. Needless to say, most experts on Internet research would stop just short of calling this “stupid.”
I hadn’t heard of techniques and tools dedicated to Internet research before, like “data mining” and “sentiment analysis.” From what I’ve read on the matter since, tools ranging from simple scripts to full-blown programs have taken the place of the manual method I used in my work. Personally, I feel a bit bad. Pissed, even. Had I known of these tools beforehand, my thesis wouldn’t have been such a pain in the ass to commit to writing. But with these tools at the disposal of new researchers, I expect the floodgates to open for students at my school to do more social research on the Internet.
I’m still stuck in the “dark ages” of Internet research. I’m not a computer scientist: I am not very well-versed in programming languages, and I would probably end up with better results doing manual data mining.
The disadvantages of manual data mining come to the fore here: the primary criticism is that the methodology is (scientifically) less objective. There is no way, as far as I’m concerned, to do strict, committed random sampling of an online social network if you’re going to do it manually. I relied on a particular Friendster group, so questions may be raised about (a pretentious sort of) objectivity.
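As a rough illustration of what a scripted random sample might look like, here is a minimal Python sketch. It assumes a complete list of profile IDs is already on hand, which is exactly the hard part when working manually. The ID format, the population size of 5,000, and the function name `draw_random_sample` are made up for illustration; only the sample size of 417 comes from my thesis.

```python
import random


def draw_random_sample(profile_ids, sample_size, seed=None):
    """Return a simple random sample (without replacement) of profile IDs.

    A fixed seed makes the draw reproducible, so someone else can
    check which profiles were selected.
    """
    rng = random.Random(seed)
    return rng.sample(list(profile_ids), sample_size)


# Hypothetical population: 5,000 made-up profile IDs.
population = [f"profile_{i:04d}" for i in range(1, 5001)]

# Draw 417 profiles, the number analyzed in the thesis.
sample = draw_random_sample(population, 417, seed=2024)
print(len(sample), "profiles drawn; first three:", sample[:3])
```

Every profile in the list has the same chance of being picked, which is the property a single hand-picked group cannot offer.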
But even then, large groups come with large samples. With large samples comes hard work, and hard work demands extreme commitment. Dedicated programs cancel out the hard work and extreme commitment, leaving you with the task of interpreting the returned data (in terms of correlations, variances, and so on).
There is also no escaping errors. Manual data collation, especially with large samples, inevitably leads to errors. While they were minor ones in the case of my thesis (rounding errors), I still can’t sleep at night knowing that the integrity of my thesis could be compromised by a single miscalculated element.
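To show how a harmless-looking rounding error creeps in, here is a small Python sketch. The three color categories and their counts are hypothetical (chosen only so that they add up to 417); they are not figures from my thesis.

```python
# Hypothetical tally of profiles per color category (sums to 417).
counts = {"red": 139, "blue": 139, "green": 139}
total = sum(counts.values())

# Exact percentage share of each category.
shares = {color: 100 * n / total for color, n in counts.items()}

# Rounding each share to a whole percent, as one might do by hand.
rounded = {color: round(share) for color, share in shares.items()}

exact_sum = sum(shares.values())
rounded_sum = sum(rounded.values())

print("sum of exact shares:  ", exact_sum)
print("sum of rounded shares:", rounded_sum)
```

Each category is 33.33...% of the total, so rounding each one to 33% leaves the column adding up to 99% rather than 100%. A script that keeps the exact values and rounds only when printing avoids carrying that gap into later calculations.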
But then again, you can’t do much with numbers alone, no matter how good you are at statistics. In general, I’m a skeptic when it comes to statistics: correlations, for example, don’t show actual relationships between arrays of instances. It is still important for any researcher in social network analysis to go through the tedious process of reading the site itself, because each element is unique. Establishing personality information, to me, is the first step in establishing the network for purposes of analysis: the whole is the sum of its parts.
In general, however, I am impressed with the possibilities that computer-aided data mining opens up for research on social media. Dammit, I should have had a tool.