Experience with IRBs and metadata collection

Tags: PII, IRB, social media

Sara Meeder

Posts: 27

Does anyone have experience with metadata collection for social network communications in research? I am interested in how researchers and IRBs are dealing with metadata collection that involves non-participant identifiers (phone numbers/addresses).

Rubi Linares-Orozco

Posts: 33

Is the data you are collecting publicly available? Is it pre-existing?
If so, your study may qualify for Exempt Category 4:
"Research involving collection or study of existing data, documents, records, or specimens, if:
a) these sources are publicly available; or
b) the information is recorded by the researcher in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects."
In your request for exemption, you will need to demonstrate to the IRB how your data meets the criteria for exemption.

For data that is neither publicly available nor pre-existing, the IRB will want to know how the data is being obtained prospectively, whether obtaining it presents risks to participants (e.g., loss of confidentiality, infringement on someone's right to privacy), and how consent or permission to obtain the data was or will be obtained. The study may qualify for an expedited review category, depending on how the data is being or will be obtained.

I hope this information helps.

~ Rubi

Sara Meeder

Posts: 27
Posted in reply to Rubi Linares-Orozco

Thanks, Rubi! The data will be collected prospectively for the study, as metadata on participants' email, text, and phone communications. The plan is to replace non-participant numbers with a study identifier, but this will happen after the data has already been collected. I know other people are collecting metadata for studies, and I am interested in finding out what kind of language other researchers have used to explain this data collection to participants and to the IRB.
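
For readers who want to picture the replacement step described above, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the study's actual procedure: the record fields (`sender`, `recipient`, `timestamp`), the `NP-` prefix, and the use of a keyed hash (HMAC-SHA256) to derive the study identifier are all hypothetical choices. A keyed hash is one option because the same number always maps to the same identifier without storing a lookup table, but the secret key then has to be protected as carefully as the raw numbers.

```python
import hashlib
import hmac


def normalize_number(raw):
    """Keep digits only, so '555-0199' and '(555) 0199' are treated as one number."""
    return "".join(ch for ch in raw if ch.isdigit())


def study_id(raw_number, secret_key):
    """Derive a stable study identifier from a number with HMAC-SHA256.

    The 'NP-' prefix and the 12-character truncation are illustrative choices.
    """
    message = normalize_number(raw_number).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return "NP-" + digest[:12]


def scrub_record(record, participant_numbers, secret_key):
    """Return a copy of one metadata record with non-participant numbers replaced.

    participant_numbers is a set of already-normalized (digits-only) numbers
    belonging to consented participants; those are left unchanged here.
    """
    cleaned = dict(record)
    for field in ("sender", "recipient"):  # hypothetical field names
        if normalize_number(record[field]) not in participant_numbers:
            cleaned[field] = study_id(record[field], secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"example-key-stored-separately-from-the-data"  # hypothetical key
    participants = {"5550100"}
    record = {
        "sender": "555-0100",
        "recipient": "555-0199",
        "timestamp": "2016-03-01T10:15:00",
    }
    print(scrub_record(record, participants, key))
```

Because the replacement happens after collection, the raw numbers still exist for some period of time. How long that window lasts and where the raw data sits in the meantime are details worth stating explicitly.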


Rubi Linares-Orozco

Posts: 33
Posted in reply to Sara Meeder

Hi Sara,

Since the data is being obtained prospectively, the IRB will want to know which social media source the data will be extracted from (Twitter, Facebook, etc.) and how it is being obtained (API, OAuth, stream, data pools, etc.).

The IRB will also want to know what expectation of privacy the source sets for its users.
For example, Twitter has already incorporated this issue into its user agreement: “our public information includes the messages you Tweet; the metadata provided with Tweets, such as when you Tweeted and the client application you used to Tweet; the language and time zone associated with your account; and the lists you create, people you follow, Tweets you mark as likes or Retweet, and many other bits of information that result from your use of the Twitter Services. Twitter may receive information about your location. For example, you may choose to publish your location in your Tweets and in your Twitter profile. You may also tell us your location when you set your trend location on We may also determine location by using other data from your device, such as precise location information from GPS, information about wireless networks or cell towers near your mobile device, or your IP address.” If Twitter considers this data to be public, you have a strong argument that the data is "public" and that there is no expectation of privacy.

Depending on your source of data, you will want to read the user agreement and privacy statement for that social media site. Provide this information to the IRB Committee so that they better understand what the social media site considers to be public record, and what information users have already agreed to make public.

The IRB Committee will also want to know:
1. What types of identifiers are being captured.
2. What risks to participants are likely (e.g., loss of confidentiality, infringement on someone's right to privacy).
3. How you will remove or delete personally identifiable information (PII).
4. How you will mask IDs (user handles, IP addresses, phone numbers).
5. Where and how the data will be stored (e.g., encrypted secure server, cloud computing).
6. Who will have access to the data (e.g., “Only the PIs and designated research assistants will be allowed to access the data, and no identifying information will ever leave the secure environment.”).
7. When the data is written up, how it will be presented so that re-identification is unlikely to occur (e.g., aggregated results).
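
As a rough illustration of item 7, aggregated reporting could look something like the sketch below. The field name `channel` and the helper `aggregate_counts` are hypothetical, and the minimum cell size of 5 is an assumption added for illustration rather than anything stated in this thread; an IRB or institution may set its own threshold. The idea is that only group-level counts are reported, and groups too small to hide an individual are shown as a range instead of an exact number.

```python
from collections import Counter


def aggregate_counts(records, key_field, min_cell_size=5):
    """Count records per group and suppress small groups.

    Groups with fewer than min_cell_size records are reported as '<N'
    instead of an exact count, to make re-identification less likely.
    The default threshold of 5 is an illustrative assumption.
    """
    counts = Counter(record[key_field] for record in records)
    report = {}
    for group, n in counts.items():
        report[group] = n if n >= min_cell_size else "<{}".format(min_cell_size)
    return report


if __name__ == "__main__":
    # Hypothetical, already de-identified metadata records.
    records = [{"channel": "text"}] * 6 + [{"channel": "email"}] * 2
    print(aggregate_counts(records, "channel"))
```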

The IRB Committee may request more information, but these are the most common questions I have encountered as an Analyst.

I recommend you sit down with your IRB, as talking through the study and different scenarios can help in developing a protocol with appropriate safeguards to minimize the potential risks.

Best of Luck,
~ Rubi