The force of Annie Petit's argument is that observation needs to be protected as a legitimate activity, which is part of research.
CASRO wants responses by Sept 2nd; the MRS by Sept 16th.
I'm not going to explain what digividuals are - John does it much better than I can. What emerges from the webinar is that Brainjuicer has managed to navigate away from the dangerous shoals of stalking people and stealing their data. Because the digividuals collect internet content from all over, they don't track individuals. What is collected is loosely representative, but at this point John really treats the data as stimulus for researchers, who then develop hypothetical products and generate insights from these. In short it doesn't matter if the digividuals reflect actual people or not, and on balance it is better that they don't.
One of the most interesting claims John makes is that digividuals could replace research groups as a surer way of generating insights. So the battle is on!
The other really interesting idea that came from the webinar was that once you have your digividuals up and running you can subject them to all sorts of indignities, like Christmas shopping or losing their job. So the creative potential is considerable. You can log onto the webinar here. There's a film which aggregates the content collected by one of the first digividuals, a designer type called Nicole who lives in Hammersmith. She even has her own route to work - trackable on Google Maps - and her own eBay account. And Nicole is a research bot made up of a few lines of code. There's also another film of David Bausola and Will Goodhand, who went to CASRO Tech in New York last June to launch Digividuals there.
Digividuals can be seen in all sorts of interesting ways. They are a landgrab from quant into qual. It will be interesting to see how many clients switch qual budgets across to this, or if it attracts money from entirely different budgets altogether.
Annelies Verhaeghe of Insites Consulting, who will be bringing a paper to Cloud 4, was in action at ESOMAR. As were several from Brainjuicer. I notice that both Annelies and myself (and Christophe?) will be on the platform at the MRS Social Media conference the day after Cloud 4. So that will make interesting watching, because I'm still not sure what is going to be discussed on the day!
And I thought I would remind you that submissions for the MRS 2011 conference need to be made to the MRS by the end of September. Here are the details.
Hope to see you next Wednesday. John G
Regarding the robots - this was the first time they had been publicly mentioned on a conference platform. John Kearon, aka Chief Juicer and a member of the Cloud of Knowing group, had given me permission to say that Brainjuicer is now experimenting with the robots, known by Brainjuicer as Digividuals, and is working with Kraft on using them. Which meant that what might otherwise have seemed a rather theoretical concept now had a practical application, though for most there it had emerged completely out of left field. Later that day Will Goodhand, business development head at Brainjuicer, pitched the Digividuals as a concept at the Dragon's Den session which closed the conference. So there appeared to be two independent mentions of robots in the same day - making it one of the hot topics at the conference.
When the awards shortlistings were announced it was very gratifying to be nominated for Best New Thinking and for Best Presentation. Particularly the latter, since trying to introduce transmedia, text robots and probabilistic sampling and tagging in 15 minutes meant that my slides were quite dense. So much so that the chair of the session, Rita Clifton, had joked that I would be offering a free slide-by-slide run-through afterwards. Clearly my presentation wasn't as obscure as she had anticipated.
I'm not really sure where to log the Cloud of Knowing paper - it emerged far too quickly and under time pressure when the original idea was that it should be the output of the group working together. The good news about the publishing of the paper at the conference and its shortlisting is that it helps to put Cloud of Knowing on the map. I'm hoping that at our next meeting there will be some new curious faces willing to join in.
Graeme Lawrence of Virtual Surveys invited me to give the paper again at the Northern MRS event - highlights of the conference - so we may be on the way to building a Northern cloud fraternity as well!
What I would like to make very clear is that this paper was not the point of the project. It is work in progress and I look forward to many more papers, meetings and discussions about using web content.
The deadline for the Market Research Society Conference written paper is Monday (gasp). I will do my very best to circulate a draft of the paper by end of play tomorrow to give you the chance to comment on structure and content. Which leaves me Monday to knock the document on the head.
You will have caught the drift if you've seen either the webinar I gave for the IE business school or the PowerPoint (if you haven't got an hour to listen to me burble through it). The links are in a PowerPoint in the scriptorium.
I plan to set up a face to face meeting in February when we can talk about which ideas to feature in the conference presentation - which is only 15 minutes after all. And also to talk about one of the ideas which is Demographic replicators - search for the term and you can learn more about these as social media bots and their potential use in research. There's a blog dedicated to them.
So... wait for the paper - your inputs please, finishing line on Monday - and a meeting in the offing. Speak soon.
Quantitative research surveys tend to collect the information about who you are at the end of the survey, because it's dull and, being factual, you may be too wearied from the survey to bother to embroider. It is very clear which are the classification questions to be used for weighting and cross tabulation, and which is the content which will be carved up.

Qualitative research takes a lot of trouble to make sure you are the right person before you start the research at all. And in analysis much is made of the distinction between process data and content data. The bits of the research interview (or group) that are identified as being linked to the dynamics of the research process itself are discarded as contaminated. It could be the warm-up for the group, or something a respondent agrees with which is reckoned to be more a part of the group dynamics than a sign that person B agrees with person A.
I want to suggest that by using two parallel tagging systems we could make sampling dynamic and apply it to web content. Sample tags would be used to identify who is the author of the content. Content tags would be used to identify the content. Although a lot of data might carry both tags, the same data item should not be used for both sampling and content, because that would be bootstrapping. This approach would have to be probabilistic, because it is unlikely that we would ever have enough information from sources of internet content to identify them (apart from asking them to complete a sample frame questionnaire, of course) - but I'm trying to avoid this because it would turn content analytics into a different way of recruiting online respondents: great content - can you fill in this survey? It is similar to the scoring models used in direct mail, which use detailed information about a sample to make intelligent guesses about the propensities of people living in similar types of houses or buying similar branded goods.
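A minimal sketch of how the two tag streams might be kept apart in practice. Everything here is invented for illustration - the class, the tags and the confidence scores are mine, not anyone's working system; in a real version the sample tags would come from probabilistic profiling tools rather than being hand-set:

```python
# Toy sketch of parallel tagging: each item of web content carries sample
# tags (who probably wrote it, with a confidence score, since identification
# is probabilistic) and content tags (what it is about). The rule is that a
# tag used for sampling is never also counted as content - no bootstrapping.

from dataclasses import dataclass, field

@dataclass
class WebContent:
    text: str
    sample_tags: dict = field(default_factory=dict)   # e.g. {"gender:female": 0.8}
    content_tags: set = field(default_factory=set)    # e.g. {"retail"}

def draw_sample(items, tag, min_confidence=0.6):
    """Select items whose author probably matches a sample tag."""
    return [i for i in items if i.sample_tags.get(tag, 0.0) >= min_confidence]

items = [
    WebContent("Loved the new store", {"gender:female": 0.8}, {"retail"}),
    WebContent("Queues were awful", {"gender:female": 0.3}, {"retail"}),
]
sample = draw_sample(items, "gender:female")
print(len(sample))  # only the first item clears the 0.6 confidence threshold
```

The point of the threshold is the probabilistic part: we never know who an author is, only how confident the profiling guess is.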
The sampling bottleneck which we need to get past is analogous to the divide in early computing that the von Neumann architecture removed - a major step forward. Before von Neumann, computer designers used two different types of data: control/instruction data and the content which the instructions worked on. The old knitting machines programmed with cards are a hangover from that period, even though Alan Turing's original concept paper showed that a symbol-processing machine could handle both in the same medium. What von Neumann suggested was that data be stored in memory and be acted upon by a CPU which applied programming instructions using the same medium. No more knitting cards. And computers have been designed the same way ever since: the instructions and the data in the same format, stored in memory and processed in the CPU. I am suggesting that sampling and content need to go the same way - drawn from the same source and handled as part of the same medium.
Sorry - I've come over all Roger Dean. I'll be dusting down my old vinyl Yes albums in a minute. But it's just that I suddenly had a thought about what happens if we have all these measurement tools grinding away.
Then for every word in the English language we have a score extending back several years on how that word has trended. Some words trend a lot more dramatically than others. But I would have thought that the vector a word travels through is reasonably stable. There are levels it can go up and down to, but there are other levels it will never reach. This becomes a kind of magnetic resonance map of the language. And if we artificially reduced it to a 5000-word vocabulary (simple option) or a volatility index (the most exciting 5000 words), then wouldn't we have a brilliant way to predict the viability of social media campaigns, because we would know the tolerance parameters of the words they were drawing on.
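To make the volatility index concrete, here's a toy sketch. The trend numbers are invented and the scoring is deliberately crude - a real version would pull years of trend data per word - but it shows the mechanics of ranking words by how dramatically they move:

```python
# Illustrative volatility index: score each word's trend series by relative
# spread, then rank - the most volatile words are the 'most exciting' ones.

import statistics

trends = {
    "snow":    [10, 80, 15, 90, 12],   # spikes dramatically
    "bread":   [50, 52, 51, 49, 50],   # flat staple, barely moves
    "scandal": [5, 60, 8, 70, 9],
}

def volatility(series):
    """Relative spread: standard deviation scaled by the word's mean level."""
    return statistics.stdev(series) / statistics.mean(series)

index = sorted(trends, key=lambda w: volatility(trends[w]), reverse=True)
print(index[:2])  # the two most volatile words come first
```

The "tolerance parameters" idea would then be the observed range per word: levels it can reach versus levels it never has.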
It's a bit like Fourier analysis, which basically deconstructs any curve into the component curves of which it is an aggregate - which is how statistical modellers work out which factor is contributing which bit. At present we are looking at campaigns as discrete events, when really we should be building a map of the entire linguistic field. Ocean thingummy. Does this make sense or am I raving?
Now the Germans are notoriously sensitive to data issues - they have the toughest direct mail regulations in Europe (opt-in is standard, for example). It matters because on the wild frontier of analytics, data doesn't care who finds it. Having to filter data by regulatory authority, based on where the web user or website is located, so we can give them an opt-out is a potential nightmare. And now that Google has announced that it will limit the number of daily free news items for news sites, in an effort to stop the newspapers opting out of Google altogether, there are worrying signs that walls are starting to be built. The value of analytics largely consists in how easy the data is to get hold of; analysing it is where the hard work is. If getting hold of it becomes twice as difficult then a lot fewer clients are going to want to use it.
Why am I blogging about it? Because Google Wave is potentially a killer app for research. For the last two years I have been running training sessions doing pros and cons for chatroom versus bulletin board research, and Wave has eliminated the distinction. And because it also allows you to embed other objects, projective material and any other structured inputs you could wish for can be put into it. I presume this means outputs too, so voting can be embedded as well.
Would Wave be a suitable vehicle for dumping RSS feeds before the community get to work sorting? Dunno - need to think about that.
And as soon as Google are forward thinking enough to allow me to invite others I will of course let you know and you can trial it yourselves.
1/ Content identification and retrieval - to a wiki or friendfeed type place
2/ Tagging (probably working alongside respondents)
3/ Grading (probably working alongside respondents)
4/ Web user profiling - the ability to guess who people are without having to ask them.
Here's a little gem from No 4, which I picked up this week from Andrew Walmesley's Marketing column.
Check out http://www.hackerfactor.com/GenderGuesser.html#Analyze - a simple free tool: paste in some text you've written and it will guess your gender. (Though I do like the get-out clause they include: weak emphasis could indicate European. Welcome news for eurosceptics!)
Now this is a very basic idea, but clearly if there is a series of tools which can establish demographic identity with reasonable confidence (which is hard), then behavioural profiling is in the bag, because it is much easier. What I mean by this is patterns of site visiting, location, level of IT expertise, what they find interesting, what bores them. Those are much easier to guesstimate.
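For flavour, here's the sort of word-weighting such tools rely on. To be clear, the weights below are invented for illustration - this is not the actual Gender Guesser algorithm, just a sketch of the general approach of letting indicative words push a running score one way or the other:

```python
# Toy word-weighting classifier in the spirit of Gender Guesser. The word
# lists and weights are made up for illustration; real tools use weights
# derived from large corpora of text with known authors.

FEMALE_WEIGHTS = {"with": 52, "if": 47, "not": 27, "so": 64}
MALE_WEIGHTS = {"the": 17, "around": 42, "what": 35, "a": 6}

def guess_gender(text):
    """Sum the weights of indicative words and return the heavier side."""
    words = text.lower().split()
    female = sum(FEMALE_WEIGHTS.get(w, 0) for w in words)
    male = sum(MALE_WEIGHTS.get(w, 0) for w in words)
    return "female" if female > male else "male"

print(guess_gender("so not what I wanted"))  # → female
```

The same scoring shape works for any trait, which is why a battery of such tools starts to look like a profiling toolkit.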
So if these tools work and we have a big enough data set for each web poster we follow, it is possible that we could identify a group of blog/website posters by topic and then use the tools to construct a sample frame. Yes? What do you think?
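Here's one way that sample-frame construction might be sketched. All of it is hypothetical - the function names, the dummy profiler and the 0.6 threshold are mine, purely to show the shape of the idea:

```python
# Hypothetical sample-frame builder: filter posters by topic, run profiling
# guesses over their accumulated text, and keep posters whose guesses clear
# a confidence threshold.

def build_sample_frame(posters, topic, profilers, min_confidence=0.6):
    """posters: list of dicts with 'name', 'topics', 'text'.
    profilers: functions text -> (trait, confidence).
    Returns a mapping of poster name -> confident trait profile."""
    frame = {}
    for p in posters:
        if topic not in p["topics"]:
            continue
        profile = {}
        for profiler in profilers:
            trait, conf = profiler(p["text"])
            if conf >= min_confidence:
                profile[trait] = conf
        if profile:
            frame[p["name"]] = profile
    return frame

def dummy_gender(text):
    """Stand-in for a real profiling tool - invented rule for the demo."""
    return ("female", 0.8) if "she" in text else ("male", 0.5)

posters = [
    {"name": "nicole", "topics": {"design"}, "text": "she posts daily"},
    {"name": "bob", "topics": {"cars"}, "text": "engine talk"},
]
frame = build_sample_frame(posters, "design", [dummy_gender])
print(frame)  # {'nicole': {'female': 0.8}}
```

The frame is only ever probabilistic: a poster is in it because the guesses about them were confident, not because we asked.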