Background Patients routinely use Twitter to share feedback about their experience receiving healthcare. Identifying and analysing the content of posts sent to hospitals may provide a novel real-time measure of quality, supplementing traditional, survey-based approaches.
Objective To assess the use of Twitter as a supplemental data stream for measuring patient-perceived quality of care in US hospitals and compare patient sentiments about hospitals with established quality measures.
Design 404 065 tweets directed to 2349 US hospitals over a 1-year period were classified as having to do with patient experience using a machine learning approach. Sentiment was calculated for these tweets using natural language processing. 11 602 tweets were manually categorised into patient experience topics. Finally, hospitals with ≥50 patient experience tweets were surveyed to understand how they use Twitter to interact with patients.
Key results Roughly half of the hospitals in the US have a presence on Twitter. Of the tweets directed toward these hospitals, 34 725 (9.4%) were related to patient experience and covered diverse topics. Analyses limited to hospitals with ≥50 patient experience tweets revealed that they were more active on Twitter, more likely to be below the national median of Medicare patients (p<0.001) and above the national median for nurse/patient ratio (p=0.006), and to be a non-profit hospital (p<0.001). After adjusting for hospital characteristics, we found that Twitter sentiment was not associated with Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) ratings (but having a Twitter account was), although there was a weak association with 30-day hospital readmission rates (p=0.003).
Conclusions Tweets describing patient experiences in hospitals cover a wide range of patient care aspects and can be identified using automated approaches. These tweets represent a potentially untapped indicator of quality and may be valuable to patients, researchers, policy makers and hospital administrators.
Statistics from Altmetric.com
- Healthcare quality improvement
- Patient satisfaction
- Quality measurement
- Performance measures
- Quality improvement methodologies
Over the past decade, patient experiences have drawn increasing interest, highlighting the importance of incorporating patients’ needs and perspectives into care delivery.1 ,2 With healthcare becoming more patient centred and outcome and value driven, healthcare stakeholders need to be able to measure, report and improve outcomes that are meaningful to patients.2–5 These outcomes can only be provided by patients, and thus systems are needed to capture patient-reported outcomes and facilitate the use of these data at both an individual patient level and the population level.2 ,3 Structured patient experience surveys such as Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) are common methods designed to assess patients’ perception of the quality of their own healthcare.4 ,6 ,7 A major drawback with these surveys is the significant time lag—often several months—before official data are released, making it difficult for patients and other concerned stakeholders to be informed about current opinions on the quality of a given institution. Moreover, these surveys traditionally have low response rates,4 ,8 raising concerns about potential response and selection bias in the results.
Social media usage is pervasive in the USA, with most networks seeing growth in their user base each year. As of 2014, approximately one out of five adults actively use Twitter; while most popular with adults under 50 years old, the network has seen significant growth in the 65 and older population in the past year.9 Although there are legitimate privacy, social, ethical and legal concerns about interacting with patients on social media,10–12 it is clear that patients are using these venues to provide feedback.13–21 In addition, the use of social media data for health research has been gaining popularity in recent years.22 ,23
Sentiment analysis of social media is useful for determining how people feel about products, events, people and services.24–28 It is widely used in other industries, including political polling29–32 and brand/reputation management.33 ,34 Researchers have also been experimenting with sentiment analysis of social media for healthcare research.13 ,14 ,16 ,17 Sentiment can be determined in several ways, with the goal being to classify the underlying emotional information as either positive or negative. This can be done either purely by human input or by an algorithm trained to complete this process based on a human-classified set of objects, and reliability is largely a function of the method used.
We seek to describe the use of Twitter as a novel, real-time supplementary data stream to identify and measure patient-perceived quality of care in US hospitals. This approach has previously been used to examine patient care in the UK.17 While there was no correlation between Twitter sentiment and other standardised measures of quality, the analysis provided useful insight for quality improvement. Our aims are to provide a current characterisation of US hospitals on Twitter, explore the unsolicited patient experience topics discussed by patients, and determine if Twitter data are associated with quality of care, as compared with other established metrics.
Hospital Twitter data
We compiled a list of Twitter accounts for each hospital in the USA. The October 2012–September 2013 HCAHPS report served as our hospital master list and included 4679 hospitals. We used Amazon's Mechanical Turk (AMT)—an online tool that allows large, tedious jobs to be completed very quickly by harnessing the efforts of crowd-sourced employees35—to identify a Twitter account for each hospital. Two AMT workers attempted to identify an account for each hospital, with any disagreements resolved by manual inspection (FG). We used the services of DataSift, a data broker for historical Twitter data, to obtain all tweets that mentioned any of these hospitals during the 1-year period from 1 October 2012 to 30 September 2013. Mentions were defined as tweets if they were specifically directed toward a hospitals’ Twitter account (ie, they included the full hospital Twitter handle, such as ‘@BostonChildrens’). During this time frame, we found 404 065 tweets that were directed at these hospitals. Tweets and associated metadata were cleaned and processed by custom Python scripts and stored in a database (MongoDB) for further analysis. This study only analysed tweets that were completely public (ie, no privacy settings were selected by the user) and that were original tweets—we ignored all retweets (tweets from another individual that have been reposted) to ensure we were capturing unique patient experience feedback. Furthermore, there were no personal identifiers used in our analysis, and thus there was no knowledge of the users’ identities. The study was approved by the Boston Children's Hospital Institutional Review Board, which granted waiver of informed consent.
Machine learning classifier
We manually curated a random subset of hospital tweets to identify those pertaining to patient experiences—defined as patient's, friend's, or family member's discussion of healthcare experience. Some examples of patient experience included: interactions with staff, treatment effectiveness, hospital environment (food, cleanliness, parking, etc), mistakes or errors in treatment or medication administration, and timing or access to treatment. Curation was achieved via two methods. The first method used a custom web-app that facilitated the curation of randomly selected tweets from the database, by allowing multiple curators (including TR and KB) to label them if related to patient experiences. Each tweet was labelled by two curators, and only those tweets labelled identically were used for the analysis. The second method of data curation used AMT for crowd-sourced labelling. Again, multiple curators, who were classified by Amazon as being highly experienced in the field of sentiment analysis (Master Workers),36 labelled each tweet, and only those tweets that agreed in their labelling were used. To test if curators for both methods were classifying tweets reliably, we calculated inter-rater agreement and Cohen's κ values37 ,38 between raters. Because we used multiple pairs of curators for the first method and AMT can use hundreds of individual curators for a project, we focused our efforts on the pairs of curators that were most prolific between specific sets of curators. The most prolific raters in the web-based method, representing four curators out of eight total and 52% of all classified tweets, showed an average agreement of 94.7% and an average κ value of 0.425 (p<0.001) across 12 620 tweets. The analogous raters for the AMT method, representing 20 curators out of 210 and 5% of classified tweets, had an average agreement of 97.7% and an average κ value of 0.788 (p<0.001) across 529 tweets. After multiple rounds of curation, curator pairs had rated 24 408 tweets using the web-app (overall agreement of 90.64%) and 15 000 tweets using AMT (overall agreement of 80.64%). These two sets were combined to create a training set of 2216 tweets relating to patient experiences and 22 757 tweets covering other aspects of the hospital.
This training set was used to build a classifier that could automatically label the full database of tweets. The machine learning approach looks at features of the tweets (eg, number of friends/followers/tweets from the user, user location and the specific words used in the tweet, but never username) and uses this information to develop a classifier. For the text of a tweet, we used a bag-of-words approach and included unigrams, bigrams and trigrams in the analysis. Specifically, we compared multiple different classifiers (naive Bayes and support vector machine) and subjectively selected the best classifier based on a variety of metrics such as F1 score, precision, recall and accuracy. Building the classifier was an iterative process and we retrained and improved the classifier over many rounds of curation. We used 10-fold cross-validation for evaluating the different classifiers, and selected a support vector machine classifier with an average accuracy of 0.895. This classifier on average had an F1 score of 0.806, precision of 0.818, and recall of 0.795.
We used natural language processing (NLP) to measure the sentiment of all patient experience tweets. Sentiment was determined using the open-source Python library TextBlob.39 The sentiment analyser implementation used by TextBlob is based on the Pattern library,40 which is trained from human annotated words commonly found in product reviews. Sentiment scores range from −1 to 1, and scores of exactly 0.0 were discarded, as they typically indicate that there was not sufficient context. The average number of patient experience tweets for all hospitals was 43. To ensure that there were enough tweets to provide an accurate assessment of sentiment, we calculated a mean sentiment score for each hospital with ≥50 patient experience tweets (n=297).
We compared the proportion of hospitals in each of the following American Hospital Association (AHA) categorical variables between the highest and lowest sentiment quartiles: region, urban status, bed count, nurse-to-patient ratio, profit status, teaching status and percentage of patients on Medicare/Medicaid. We compared nurse-to-patient ratio and percentage of patients on Medicare/Medicaid with the median national value. We used the following Twitter characteristics (measured in August 2014) for sentiment correlation and quartile comparison: days that the account has been active; number of status updates; number of followers; number of patient experience tweets received; and number of total tweets received.
We again used AMT to identify which topics were being discussed in the patient experience tweets. A total of 11 602 machine-identified patient experience tweets were classified by AMT workers as belonging to one or more predefined categories. Only tweets with agreed-upon labels were further analysed; this totalled 7511 tweets (overall agreement of 64.7%); of these, 3878 were identified as belonging to a patient experience category, and 3633 were found to be not truly about patient experience. Owing to the sheer number of topics, we calculated average agreement and Cohen's κ values for both workers for each topic. We found that the topics Food, Money, Pain, General, Room condition, and Time had an average agreement of 91.7% and a moderate κ of 0.52 (p<0.001), while the topics Communication, Discharge, Medication instructions, and Side effects had an average agreement of 97.4% and a low κ of 0.18 (p<0.001).
We emailed contacts with formal positions in the office of patient or public relations (or equivalent) of the 297 hospitals with ≥50 patient experience tweets (111 unique Twitter accounts) and asked them to provide feedback regarding their use of Twitter for patient relations. If employees could not be identified, either the department email (n=44) or general contact email (n=40) was used. Contact was attempted twice, with a second email sent 9 days after the first if necessary. The questions asked were: (1) “Does your hospital monitor Twitter activity?”; (2) “Do you follow-up with patients regarding comments they make on Twitter?”; and (3) “Are you aware that patients post about their hospital/care experience on Twitter?”. Informants were told their participation in the study was voluntary, confidential and anonymous.
Comparison with validated measures of quality of care
We chose two validated measures of quality of care. The first was HCAHPS, the formal US nationwide patient experience survey. The intent of the HCAHPS is to provide a standardised survey instrument and data-collection methodology for measuring patients’ perspectives on hospital care, which enables valid comparisons to be made across all hospitals. Like other traditional patient surveys, the HCAHPS is highly standardised and well validated.4 ,6 ,7 We focused on the percentage of patients who rated a hospital a 9 or 10 (out of 10), which has been shown to correlate with direct measures of quality,4 although we also looked at the percentage of patients who gave a 0 to 6 rating (not shown). We analysed data from the HCAHPS period 1 October 2012–30 September 2013. The second validated measure of quality of care was the Hospital Compare 30-day hospital readmission rate calculated from the period 1 July 2012–30 June 2013. This is a standardised metric covering 30-day overall rate of unplanned readmission after discharge from the hospital and includes patients admitted for internal medicine, surgery/gynaecology, cardiorespiratory, cardiovascular and neurology services.41 The score represents the ratio of predicted readmissions (within 30 days) to the number of expected readmissions, multiplied by the national observed rate.42
We used Pearson's correlation to assess the linear relationship between numeric variables, Fisher's exact test to compare proportions between categorical variables, and a two-tailed independent t test to compare the means of quantiles. Bonferroni correction was used to adjust for multiple comparisons. Multivariable linear regression was used to adjust for potential confounders such as: region, size, bed count, profit status, rural/urban status, teaching status, nurse-to-patient ratio, percentage of patients on Medicare and percentage of patients on Medicaid. Twitter account confounders (total statuses, total followers, and total days since account creation) were measured in August 2014. Additional Twitter covariates were the total number of patient experience tweets received during the study period and whether or not the hospital had a unique Twitter handle (as opposed to sharing with a larger healthcare network). A Wald test was used to test for trend significance.
Characteristics of US hospitals on Twitter
Of the 4679 US hospitals identified, 2349 (50.2%) had an account on Twitter; this included data from 1609 Twitter handles (as many hospitals within a provider network share the same Twitter handle). During the 1-year study period, we found 404 065 total tweets directed towards these hospitals (data from 1418 Twitter handles, representing 2137 hospitals); of these, 369 197 (91.4%) were original tweets (data from 1417 Twitter handles, representing 2136 hospitals). The classifier tagged 34 725 (9.4%) original tweets relating to patient experiences and 334 472 (90.6%) relating to other aspects of the hospital. Patient experience tweets were found for 1065 Twitter handles, representing 1726 hospitals (36.9%).
Table 1 describes the common characteristics for all of the hospitals with Twitter accounts. Overall, the mean number of patient experience tweets received for all hospitals during the 1-year study period was 43. The median sentiment values for the highest and lowest quartiles were 0.362 and 0.211, respectively. The proportion of hospitals in the profit status (p<0.001) and bed count (p=0.037) categories was significantly different between the highest and lowest sentiment quartiles, with public and larger hospitals over-represented in the lowest sentiment quartile.
We found no correlation between sentiment and Twitter characteristics, except a weak negative correlation (r=−0.18, p=0.002) with total days the account was active. When the highest and lowest quartiles were compared after hospitals had been ranked based on these characteristics, only the total number of tweets was shown to have an effect on sentiment (p=0.002).
Hospitals with 50+ tweets were more active on Twitter, as they had more posts and followers (p<0.001), but their accounts were not older. In addition, hospitals with more patient experience posts were more likely to be below the national median of Medicare patients (p<0.001), above the national median for nurse/patient ratio (p=0.006), and be a non-profit hospital (p<0.001). Figure 1 shows the geographical distribution of all US hospitals on Twitter, highlighting sentiment and number of patient experience tweets received.
Patient experience tweets
Discharge instructions (including care at home)
“…epic fail on my TKA discharge”
“12 hrs in the ER…..not good [hospital]. My father is just now getting a room. #notgoodenough #er #getitright”
Treatment side effects
“Im on 325 mg aspirin coming off blood thinners after blood clott found in lung will this affect my heart??”
Communication with staff
“I know it's Monday/Flu season but waiting 3.5 h for pediatrics to call me back is a little much”
“Hey [hospital], can you hold off on the collections calls until my bill is actually due? Please try to keep it classy.”
“[hospital] pediatric ER sucks! No doctors to assist, nurses in the back having coffee while there is a sick child in pain in empty ER”
Conditions of rooms/bathrooms
“what do you have against coat hooks? 3 exam rooms, 2 locations in 1 day and not 1 place to hang my coat and bag!”
“[hospital] gave my mom a prescription for a discontinued medicine. #Silly hospital. @Walgreens saved the day!”
“skipped dinner last night because of your terrible cafeteria food. Are you trying to get more patients? #eathealthy #healthcare”
General satisfaction/dissatisfaction with procedure and/or staff
“I'm thankful for the [hospital], their staff, my Doctors and to be treated as a person, not just a patient.”
Use of Twitter data by US hospitals
Of the 297 hospitals surveyed about Twitter use, 49.5% responded. All hospitals indicated that they monitored Twitter closely, actively interacted with patients via Twitter, and were aware that patients post about their care experiences. Box 2 includes some additional representative feedback received.
Hospital responses regarding their use of patient Twitter posts
“[O]ur goal is to respond to patient comments within an hour of their post. If the comment can be addressed via Twitter, we direct them to appropriate resources online. We are extremely careful to abide by HIPAA guidelines and the protection of patient privacy.”
“We also do social listening to find out what topics are important to our patient families and supporters so we can take a proactive approach and participate in the conversation by providing information and help.”
“We've had patients tweet us from waiting rooms, we've even had patients tweet us from their hospital beds!”
“We proactively respond to all social media conversations that mention us. We respond then try to facilitate a one-on-one email or phone conversation with the patient or caregiver to discuss their experience.”
“We utilize geocoding to narrow tweets to within .25 kilometers of each facility to capture any tweets that don't expressly mention [hospital network name], but are related to their health care experience at a facility.”
“If there is a serious complaint, we send those along to Patient Relations who may follow up outside of social media. We typically don't get into a back and forth around a negative comment, but rather let the person know we are sorry for their experience, and then direct them to patient relations.”
Linking Twitter data to quality of care
In the univariate analysis, we found a significant difference between percentage of people giving an HCAHPS rating of 9 or 10 for hospitals that have a Twitter account compared with those that do not (0.71 vs 0.69, p=0.001) and between HCAHPS rating in the highest versus lowest quartiles of hospitals ranked by the number of Twitter followers (0.72 vs 0.69, p<0.001). In addition, there was a significant difference in sentiment in the highest versus lowest quartiles of hospitals ranked on HCAHPS (0.30 vs 0.26, p=0.017). However, after adjusting for potential confounders (see Methods) through multivariate linear regression, we did not find any significant correlation between HCAHPS and any of these metrics.
We observed a correlation between 30-day readmission rates and sentiment. There is a weak negative correlation (r=−0.215, p<0.001), where higher sentiment scores are associated with lower 30-day readmission rates (figure 2). In addition, there was a small but significant difference (p=0.014) between the 30-day readmission rates in the highest versus lowest quartiles of hospitals ranked on sentiment. Finally, after adjustment for hospital and Twitter characteristics using multivariate linear regression, there was still a small but significant association between higher sentiment scores and lower 30-day readmission rates (table 3; p=0.003).
Our findings indicate that patients use Twitter to provide feedback about the quality of care they receive at US hospitals. We found that approximately half of the hospitals in the USA have a presence on Twitter and that sentiment towards hospitals was, on average, positive. Of the 297 surveyed, half responded and all confirmed that they closely monitor social media and interact with users. We therefore conclude that the stakeholders of these hospitals see the value of capturing information on the quality of care in general, and patient experience in particular. Surprisingly, we found only a weak association with one measure of hospital quality (30-day readmission), but not with an established standard of patient experience (HCAHPS). Taken together, our findings suggest that Twitter is a unique platform to engage with patients and to collect potentially untapped feedback—and possibly a useful measure for supplementing traditional approaches of assessing and improving quality of care.
Our findings on the extent of Twitter usage by hospitals are similar to what has been reported previously.43 The generally positive sentiment on Twitter is consistent with other analyses that suggest a positive language bias on social media.44 However, our analysis of Twitter sentiment, and exploring the association with conventional quality measures, is novel. There were some striking differences between the hospitals with the highest and lowest sentiment, with both large and public hospitals being over-represented in the lowest quartile. In addition, the number of tweets a hospital received influenced, in part, hospital sentiment; hospitals that received more tweets had, on average, higher sentiment. However, the number of tweets a hospital posted did not affect sentiment. Thus, having a more active online presence with a frequent posting behaviour is not sufficient to increase sentiment alone, although we did find that it increased the likelihood of receiving more patient experience tweets.
Twitter feedback is entirely unsolicited. As such, there was a wide range of patient experience topics discussed. These topics include those covered by the HCAHPS survey and previous research,45 as well as some not typically addressed (eg, time, side effects, money and food concerns). It is not surprising that some topics tended to be more negative than others—for example, discussion of time, money or pain is not likely to be positive. Thus, from an individual hospital's perspective, it might not be useful to heavily weight the number of positive or negative tweets within one topic category at any one moment. However, monitoring these topics over time and detecting when sentiment goes above or below an established baseline could be useful.
We used both HCAHPS scores and 30-day hospital readmission rates as conventional measures of quality of care to compare against. Readmission rate was only one of several metrics we could have used to compare against HCAHPS; other measures such as mortality and Hospital Compare metrics could also be analysed. While there are conflicting studies on this association with readmission rates and it is disputed by some,46–49 they have been used in this way before, including recent studies that showed correlation with ratings on Facebook15 and Yelp.18 We report associations between Twitter sentiment and readmission rates to evaluate the potential of this relationship and found a weak negative correlation, with higher-sentiment hospitals having a lower readmission rate. This association survived adjustment for potential confounders, with a small but significant downward trend for readmission rates as sentiment increases. Nonetheless, we acknowledge that the observed correlation was weak at best and probably influenced by confounding factors. Importantly, no association was observed between sentiment and HCAHPS score, after adjustment for hospital characteristics. This finding of only a weak association with a clinical metric, and no association with the more easily explainable alternative patient experience metric, suggests that Twitter sentiment must be treated cautiously in understanding quality. The use of Twitter data as we have in this analysis is in its infancy, and therefore development of methodologies to compare against traditional measures of patient experience is warranted. However, our findings suggest that there is new information here that hospital administrators may want to listen carefully to.
There were several limitations to our study. First, while the use of Twitter is becoming more pervasive in the older population, users under 30 years of age are the largest group, indicating there is a selection bias. Second, we only looked at tweets that explicitly included a hospital's Twitter handle. Broadening our criteria to include hospital names as keywords or attempting to assign tweets to nearby hospitals given geospatial data could have potentially increased the number of patient experience tweets we identified. In addition, many hospitals within a larger network shared a Twitter account and, without additional follow-up, it is difficult to determine which hospital is being discussed. Like all surveys, our hospital questionnaire may have been subject to a potential response and selection bias. Owing to the cross-sectional design, while we have shown association between organisations that use Twitter and their interactions with patients, we cannot confirm any causal relationship. Further investigation of how these findings change over time would be helpful. Finally, while patient experience classification had relatively high agreement rates and inter-rater κ values, topic classification only had an overall agreement of 64.7%. In addition, some of the topics had a low κ value. This is probably an effect of using crowd-sourced curators without a high level of domain-specific training, which also explains why 77.3% of patient experience tweets were non-specifically labelled as ‘General’. As for our automated approach, machine classification and sentiment analysis using NLP does not perform as well as human curation. With these caveats acknowledged, our approach enabled processing of an extremely large amount of data and illustrated that automated analysis of Twitter data can provide useful, unsolicited information to hospitals across a wide variety of patient experience topics.
Our findings have implications for various groups. Hospital administrators and clinicians should consider actively monitoring what their patients are saying on social media. Institutions that do not use Twitter should create accounts and analyse the data, while existing users might consider leveraging automated tools. Insight from key leaders at institutions will help to better understand gaps and potential opportunities. Regulators should continue to consider social media commentary as a supplemental source of data about care quality.50 The information is plentiful and, although the techniques for processing and understanding these data are still being developed and improved, potentially important. We recommend a larger survey in the USA and globally with all relevant stakeholders, including patients and their families, to obtain a better understanding of the use and value of social media for patient interactions. The public should pay attention to what other people are tweeting and posting on social media, and systems to collect, aggregate and summarise this information for a public audience in real-time should be considered to complement data from traditional reporting platforms. To increase the utility of these data, we would recommend that each hospital manages their own unique Twitter identity, rather than share an account across a larger healthcare network.
We show that monitoring Twitter provides useful, unsolicited, and real-time data that might not be captured by traditional feedback mechanisms—Twitter sentiment only weakly correlates with readmission rates but not HCAHPS ratings, as would be expected. While many hospitals monitor their own Twitter feeds, we recommend that patients, researchers and policy makers also attempt to utilise this data stream to understand the experiences of healthcare consumers and the quality of care they receive.
We would like to thank the hospitals that graciously participated in our survey. We would also like to thank Clark Freifeld for his help in acquiring the raw data from DataSift.
Contributors JBH had full access to all of the raw data in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. Study design: JBH, JSB, FTB and FG. Acquisition of data: JBH. Manual curation: TR and KB. Machine learning: JBH, GT. Analysis and interpretation of data: JBH, JSB, TR, KB, EON, DJM, RR, AW, FTB, FG. Drafting of manuscript: JBH and FG. Statistical analysis: JBH, TR, KB and DJM. Critical revision of the manuscript for important intellectual content: JBH, JSB, TR, KB, EON, DJM, RR, AW, FTB and FG. Study supervision: JSB and FG.
Funding Funded by NLM Training Grant T15LM007092 (to JBH), NLM 1R01LM010812–04 (to JSB), and a Harkness Fellowship from the Commonwealth Fund (to FG).
Competing interests None declared.
Ethics approval Boston Children's Hospital Institutional Review Board.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Aggregate data may be obtained by writing to the corresponding author.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.