Boletín SciELO-México: Academic data

Tuesday, October 1, 2024

Data cartels and surveillance publishing (interview with Sarah Lamdan)

Published in Knowledge Equity Lab
https://knowledgeequitylab.ca/podcast/s3-e2/

Listen to the audio: https://www.buzzsprout.com/1628326/episodes/11154367-data-cartels-surveillance-publishing?client_source=small_player&iframe=true&referrer=https://www.buzzsprout.com/1628326/11154367-data-cartels-and-surveillance-publishing.js?container_id=buzzsprout-player-11154367&player=small

Interview with: Sarah Lamdan, author of the book Data Cartels: The Companies That Control and Monopolize Our Information.

- user surveillance and data extraction have made their way into academic infrastructure in multiple ways

The author's book examines the dangers of letting a few data analytics companies monopolize information markets and act as cartels within them.
LexisNexis was among the companies competing for a contract for the extreme vetting surveillance program of ICE (Immigration and Customs Enforcement). ICE is a sub-agency under the Department of Homeland Security, a kind of immigration police.
LexisNexis is a legal research platform that is involved in government surveillance, including surveillance by ICE. That discovery opened a line of inquiry that, five years later, culminated in a book about data analytics companies.
LexisNexis is also one of the government's main data brokers, selling our personal information to the government. In addition, it is a major financial data provider, supplying data to banks, insurance companies, and other financial institutions.
RELX and a few other large data companies occupy and monopolize multiple information markets.
Companies such as Reed Elsevier LexisNexis and Thomson Reuters suppress competition, act in unison with the other members of their monopolizing group of companies, and behave like cartels that implicitly (it is always implicit) act in concert to avoid competition and to keep and gain more control over the big information markets, such as academic information, legal information, and personal data.
This is somewhat complex because these companies do different things in different markets, and each of those markets goes by a different name.
Despite the different names, what these companies have in common is that they all sell information, whether personal, academic, or legal, along with data points about all of those things, to different groups of consumers.
Take the data brokering market: companies that sell facial recognition data, geolocation data, or simply robust data dossiers containing public records, private records, and social media records. This kind of personal data industry is often called the data brokering industry.
However, we do not usually refer to academic information platforms as academic data brokers; we call them research platforms or even academic journals.

SEASON 3 EPISODE 2

Data cartels and surveillance publishing

In recent years, as the research and scholarship process has increasingly moved online, it has become clear that user surveillance and data extraction have crept into academic infrastructure in multiple ways.

For those committed to preserving academic freedom and knowledge equity, it is important to question the practices and structures of the companies that collect and sell this data, as well as the repercussions of this business model for academic infrastructure and, in particular, for already marginalized and underfunded scholars and students.

To help us understand this landscape and its implications, today we speak with Sarah Lamdan, author of the book Data Cartels: The Companies That Control and Monopolize Our Information.

Transcript

Safa

You are listening to the podcast "Unsettling Knowledge Inequities", presented by the Knowledge Equity Lab and SPARC, the Scholarly Publishing and Academic Resources Coalition.

Over the last few years, as the research and scholarship process has increasingly moved online, it has become evident that user surveillance and data extraction have crept into academic infrastructure in multiple ways.

To help us understand this landscape and its implications, today we speak with Dr. Sarah Lamdan, author of the book Data Cartels: The Companies That Control and Monopolize Our Information.

Sarah: My name is Sarah Lamdan. I am a professor of law at the City University of New York School of Law, and I am based in Long Island City.

In 2017 I was working as a librarian. I am both a law professor and a librarian with a master's degree in legal information management. I was working in a law library when someone sent me a news article saying that LexisNexis was one of the companies competing for a contract for ICE's extreme vetting surveillance program.

ICE is Immigration and Customs Enforcement. It is actually a sub-agency under the Department of Homeland Security, and you can think of it as a kind of immigration police, immigration enforcement. At the time I learned of the ICE-LexisNexis connection, ICE was involved in a lot of really ethically fraught activities. There were all the revelations about, you know, the separation of children, and people were really concerned about the human rights boundaries that immigration and customs enforcement in the United States was crossing.

And as a law librarian, you use LexisNexis all the time. You use the legal platform all the time, and I did not understand how Lexis, which to me was a legal research platform, could be involved in surveillance, government surveillance and ICE surveillance. That opened up this line of questions and research that five years later culminated in a book about data analytics companies like LexisNexis, Reed Elsevier LexisNexis.

I also learned that LexisNexis is one of the government's main data brokers, selling our personal information to the government. They are also a major financial data provider, supplying data to, you know, banks and insurance companies and other financial institutions.

And I started to see that RELX and a few other large data companies were occupying and monopolizing multiple information markets. I also saw how companies like Reed Elsevier LexisNexis and Thomson Reuters were suppressing competition, acting in unison with the other members of their monopolizing group of companies, and behaving like cartels that implicitly (it is always implicit) acted in concert to avoid competition and to keep and gain more control over the big information markets, such as academic information, legal information, and personal data.

So my book looks at the dangers of letting a few data analytics companies monopolize information markets and act as cartels within them.

It is somewhat complex because these companies do different things in different markets, right? And each of the different markets has a different name. Take the data brokering market: we usually think of that as dossiers or bits of our personal data, right? Companies that sell facial recognition data or geolocation data or, you know, just robust data dossiers with public and private records and social media records.

That whole kind of personal data industry is often called the data brokering industry, but we do not call academic information platforms academic data brokers, right? We call them research platforms or even academic journals. And legal research platforms are called computer-assisted legal research services, right?

So each of the markets has a different name. But really, what all these markets are doing, and what I think all these companies have in common, is that they are all selling information, whether personal information, academic information, or legal information, and data, data points about all those different things, to different groups of consumers, right?

And when you move through the database, you click on things, you look at them, you download them. You know, you click on hyperlinks. All of that data about you can be tracked and collected. And that data is really valuable. Personal data is very valuable, right?

It can be used to assess the impact of certain journals, you know, which journals are getting the most clicks, what information is most popular. But it can also be used to identify you as a researcher, right? This is what you research. This is who you affiliate with. And those are just two examples of the ways your information is valuable.

So now companies can take all of your information and turn it into an entirely new product to sell, whether that is impact factor projections, suggestions to institutions about what kind of research they should fund, or rankings of which researchers are the most productive.

But also information about you: are you a good bet for tenure? Which institution should hire you? Different assessments about you, as well as about the research and the institutions themselves. That information can be sold to a whole new range of consumers. It can be sold to technology and pharmaceutical companies and other companies that invest in and profit from research, and it can be sold to academic institutions and grant funders.

That opens up a whole new space of customers and profits.

Safa: While it is true that many of us sign up voluntarily to use these platforms, we often do not fully understand the terms and conditions we are agreeing to, nor do we have a real option to opt out if we want to keep participating in the academic research ecosystem.

Sarah: In a sense, we can all assume that when we use any online platform, whether a social media platform or a research platform, at some point we all click through some kind of agreement, right?

We all agree to some terms of service. We click "I agree." Or we fill in, you know, our names and email addresses or some other identifying information to set up a password for ourselves. So in that sense, yes, we are notified. But as we all know, those notices are never clear. They are never straightforward, and there is really no clear way to see how your data is being collected and how it is being used, even with the terms of service that are made available to you or that you consent to.

So, in a way, yes, we probably know we are being tracked, but we do not know what that tracking involves. The actual tracking and the actual use of our data are very opaque processes.

Dorothea Salo, a researcher at the University of Wisconsin, I believe, was able to see some of the data that was being collected by ProQuest, which is now a Clarivate data analytics entity.

She actually filed a freedom of information request in her state to obtain the kind of backlogs that would show what data was being collected about her.

And the interesting part was that you could see all the types of data that could be collected, listed across the top of the spreadsheet. And it was quite invasive: your name, your gender, your address, what you click on, how long you look at it, what time of day you are clicking and logging in, your institution, just all sorts of categories of information that could potentially be collected.

Now, much of the data that gets collected depends on your institution, because your institution can determine what kind of data, and how much data, is associated with your student ID. Your student ID can then be the link between you and your login to ProQuest and other school-affiliated platforms.

So it varies from institution to institution, but the capability is there to collect all sorts of identifying information, depending on how much information the institution gives them and how much information you give them in exchange for access. And one interesting thing that is starting to happen, for example with Reed Elsevier LexisNexis, is that these data analytics companies are acquiring other companies that provide other datasets.

So one thing SPARC has been doing great work on is tracking acquisitions of companies like Interfolio. Interfolio is an entirely different product that both grant funders and academic institutions use as a portal where job applicants and grant recipients enter all sorts of personal information about themselves.

So, you know, to apply for a job at an institution, you might use Interfolio as your hub, and you can upload your CV and all your letters of recommendation to it. If you work with a funder that uses Interfolio, you may be routing a lot of data through Interfolio about how you are spending the grant money and how your grant proposal is being carried out, step by step and with periodic updates.

And now that kind of data also flows through Elsevier, which is a research platform, right? So it is no longer only when you log in that your data goes to the same central data hub. It also happens through these other products that companies like Clarivate and Reed Elsevier LexisNexis are acquiring.

And again, we cannot see how they are using the data, so we do not know. Right? Because it is very opaque. They would probably say that the algorithms or processes they are using are protected trade secrets, you know. Just as the way they develop their impact factors and their datasets, and the way they use them, is their own proprietary information too. So we cannot know how the data is used, because there is no regulation requiring these companies to be transparent. There are no rules, or very few rules, about how these companies use our data.

So we are not sure what happens to our data, but we do know that it comes from us when we log in and do research on the platforms. But they are also getting our personal data through all sorts of other sources.

If you think about it, Reed Elsevier LexisNexis, the LexisNexis part of that company, is also a major data broker for ICE and for hundreds of other law enforcement and government agencies. So they also have those data dossiers on all of us, right?

So they have our academic data over here. They have our data that they are selling to law enforcement over there. So there is a lot of personal data flowing through these companies that also sell us our research products.

Safa: This kind of user surveillance and personal data extraction has become a leading business model within academic infrastructure, in North America and increasingly worldwide.

Sarah: I think other companies are looking at this model as a good way to move forward.

I mean, one thing about the publishing industry is that it has suffered in recent decades. As information has become more readily available online, it has had financial problems and has been, you know, looking for new sources of profit as its models change, right?

Once libraries stop buying paper journals, companies like Elsevier, Springer, and so on have to rethink how they are going to do business and make sure they remain profitable. So I think a lot about, and study, this kind of vertical growth into these other data industries that are flowing through the system.

But one thing Leslie Chan's research does is also show this horizontal kind of opening of new information markets. Elsevier used to make its main profits, get its main, you know, revenue stream, at the point of sale. Elsevier would make a journal, get editors to edit the materials and do the peer review, and then Elsevier would make money when the final journal was available and they could sell that final journal to libraries. But now Elsevier has figured out how to profit from the prepublication process.

So there are preprint platforms like bepress or other, you know, preprint services. SSRN is now a Reed Elsevier LexisNexis company. So they have found a way to profit on the preprint side and then also to develop the impact factors and the post-publication data and metrics services that can be sold after publication, right?

So there has also been this opportunity to profit horizontally in more places than just the point of sale. And I guess this goes back to what I was saying earlier about how companies are discovering new ways to profit in this changing publishing model, you know, in this transition from paper to digital information services.

So yes, I think what runs through that horizontal growth Leslie Chan describes, and also through the acquisitions of Interfolio, and of Clarivate buying ProQuest to get its data cloud, is that they all have in common the use of personal data to enrich these information products and create new information products to sell.

Safa: You may be wondering what this means for faculty. How alarmed should faculty be?

Sarah: A lot of times when I talk to faculty and librarians about this, people say: well, we all use Facebook, right? Our data is already out there. Who cares, right? Who cares if Elsevier wants to act like Facebook and suck up our data and use it?

But I think it is of special concern for academic information, for several reasons. The first reason is that we are giving a lot of power to these private companies, which are not scientists and are not experts in the various fields of the humanities or the sciences, to determine which scholarly endeavors get funded and which are valued as having higher or lower impact, right?

They assign value to academic and knowledge products in which they are not experts. Right. They take a lot of power away from academia to control the academic process and the whole knowledge enterprise, right? Like the way we develop and support scholarship and scholarly activities.

And then another big problem with using data this way, and academic data this way, is that it takes the kinds of systemic biases built into academia and perpetuates them and embeds them in digital systems. Right. We all know that academia has traditionally ignored the work of people of color, of women, and of institutions that are not considered as "elite".

So if you are a white man working at Harvard Law School, your scholarship is going to get more attention than mine, right? At CUNY School of Law, as a woman. And we know that it is even worse if you are a person of color at a historically Black college or university, or if you have published in a journal that has certain terminology in its title as opposed to the Harvard Law Review.

So these biases do not go away in electronic formats. They are also in the kind of data that Elsevier and Clarivate rely on to build metrics and to make the predictions and prescriptions that their data analytics products formulate, which they can then sell to hiring institutions and to funding and recruiting organizations around the world.

That is also really damaging, because when all of this data accumulation and collection, and then all of this data analysis, happens in an opaque way, we cannot even see how this bias is happening or what kinds of biases are happening, and then give advice or instructions on how to stop and break these biases and create a better, more equal, and more expansive ecosystem for academia.

So now you know that when you do research on any of these platforms, your data is being collected. There is no truly private, unmonitored research that can happen in these ecosystems as they are currently set up.

So that is problematic for academic and intellectual freedom, right? There is the lack of privacy, and then, as far as academic freedom goes, the more we rely on these platforms and systems to do the work of academia, to do the work of research, and to do the work of governing hiring processes and tenure processes and sorting out who gets higher metrics and who gets lower metrics, the more we cede to these companies, and the less control and the less freedom we have in our work, right? So we are giving up our privacy, and we are also giving up our ability to control our academic destinies and to understand how these decisions are made.

Safa: Among faculty, library professionals in particular face a tension between their commitment to their users' privacy and their role in purchasing products and services from these companies.

Sarah: It is interesting because, besides being a law professor, who could say that I am well removed from all of this and do not have to worry about it, I am also a librarian. So I am very intimately, and at first hand, aware of how these kinds of problems make it harder for libraries to do their work. Right.

It creates a real ethical problem in which librarians have to balance providing much-needed services to their patrons with ensuring that their patrons have privacy, that the people who use their libraries, whether scholars or the public, also get to keep their privacy. Because right now both things cannot happen at once.

And that is a real sticking point for librarians. It makes librarians' work really difficult and presents libraries with these really hard, if not impossible, choices, where they have to decide: am I going to provide these journals that my faculty and my students rely on?

Or am I going to say, "We cannot use these platforms because they are collecting our data and we do not know whether they are selling that data, or we do not know how they are using that data"? Right.

It is very hard, especially for me as a law librarian, because law libraries that depend heavily on Westlaw and Lexis do not feel that they can turn around and tell their students, faculty, and attorneys: "Hey, we are not going to use these products anymore. We are not going to license these products anymore."

There could be regulations or rules that created those kinds of safeguards and protections, but right now there are none.

Safa: Another cause for concern is the relationship between these data brokers operating in the academic publishing space and government entities. The personal data they sell can contribute to human rights violations, racial profiling, and other forms of abuse and harm.

Sarah: ICE has received the most attention because its work has really raised a lot of alarms, and its surveillance infrastructure has been used to separate children from their parents and to do really harsh and abusive things. There is also a movement called No Tech for ICE, led by excellent immigrant rights advocacy organizations that have done great work exposing how ICE uses these systems.

But these systems are not only used by ICE. They are being used by local, state, and federal law enforcement. They are being used by most government agencies that have some kind of LexisNexis special services contract or another data contract. Right.

The IRS uses them to determine who is most likely to commit tax fraud. They are being used by the United States Postal Service, right? It uses our social media data and these brokers to figure out who might be committing mail fraud.

They are serving as these big data hubs where local, state, and federal police forces can pool and use each other's data and can link up and use the data systems these companies provide.

They are being used by insurance companies, tenant screening companies, employment screening companies, and health systems that are fighting the opioid war, as that is how they like to describe what they are doing.

So all of these systems, both major public institutions and major private institutions, make very important decisions about our lives, right? About what kind of services we can get, whether we can access our bank accounts, whether we can access health care, including certain types of medications and medical interventions. All of them are using third-party data analytics companies to help them in their work. And that includes LexisNexis and Thomson Reuters, which also provide major research platforms.

Safa: It is deeply concerning that this infrastructure is built on existing inequalities and digitizes them, putting already underfunded students and scholars at risk and serving to marginalize them further.

Sarah: The first risk is that, by embedding those systems of injustice, historical bias, racism, xenophobia, and so on, into digital systems, and by using data that favors certain institutions and certain faculty, some kinds of scholarship get heavily overrepresented and others underrepresented, you know?

And think about impact factors or, you know, other metrics that simply infuse their historical bias into these digital systems.

Surveillance, especially government surveillance, tends to have a disparate impact on certain communities, certain types of institutions, and certain individuals, right?

This is a problem that occurred in New York City, and one that, you know, people at my law school worked on a lot: Muslim students were being surveilled, you know, singled out for surveillance, in the wake of September 11 in New York City. And there is real evidence that this happened.

And so with these kinds of data collection systems, if they choose to hand that data over to law enforcement or to other surveillance forces, that is also more likely to harm certain scholars in particular.

And without real concerted efforts to create a level playing field and not just gravitate toward certain things over and over again, you see how certain technology companies become the leading technology companies and certain academic institutions become the leading academic institutions. Certain academic approaches become the dominant ones to the detriment of everything else.

A lot of people love to think about artificial intelligence, even though there is plenty of evidence that artificial intelligence is not real, that computers cannot be sentient, that we are not there yet. And we may never get to the point of replacing humans with artificial intelligence. But artificial intelligence is a buzzword; it is profitable.

So imagine academic studies that use the phrase artificial intelligence and are done at particular institutions that work closely with Facebook or other big technology companies. Those studies get clicked on a lot more, especially by, you know, Twitter and Facebook, whose researchers are well funded and are using and perpetuating that work. So that work gets an artificially high, or simply a very high, impact factor. And it looks like it is very popular.

So the data analytics companies, you know, Clarivate, Scopus, Elsevier's data analytics, and all these data analytics companies predict that artificial intelligence research is going to be the most profitable, that it is going to be the next big thing.

So before you know it, the leading academic institutions are spending all their money on artificial intelligence labs. Right. And scholars who decide to go into artificial intelligence have a much better chance of getting postdocs and jobs at these big companies, because, you know, it is seen as the big kind of research. So all the money, all the talent, all the interest is being channeled in one direction by these data analytics companies. Right. So what about climate change research? What about research that, you know, critiques AI and says, ooh, maybe AI is not, you know, the wave of the future, maybe that is a mistake, or maybe it is not realistic?

If we let Scopus or Clarivate decide instead of letting scholars themselves decide, I think that does not bode well for academic decision-making and the knowledge enterprise.

Safa: That said, many stakeholder groups are organizing and pushing back against the consolidation of power and the opaque, harmful practices of these companies.

Sarah: So it is really inspiring, especially among librarians, who obviously are, we are, a kind of front line of people noticing that this is happening, and there really is a lot of concerted thinking and concerted effort around this.

Creo que todavía estamos en una especie de fase en la que estamos informando a todo el mundo sobre el problema, correcto.

He hablado con mucha gente este año y en cada sala en la que entro, virtual o real, hay gente que aún no ha oído hablar de este problema. Correcto.

Así que creo que todavía estamos en la fase en la que estamos explicando estos problemas a todo el mundo. Correcto.

Mi investigación es bastante nueva, la investigación de Leslie es bastante nueva, ¿correcto? Así que creo que ahora hay una masa crítica de personas que son conscientes de los problemas. Así que ahora estamos entrando en esta fase siguiente, que es realmente emocionante - organizaciones como Library Futures y Library Freedom Project están empezando a pensar, ahora que reconocemos que esto es un problema, que esta transición de análisis de datos que está sucediendo con nuestros editores es una amenaza para la privacidad, es una amenaza para la empresa del conocimiento, ¿qué hacemos ahora?

Y creo que, ya sabes, SPARC está empezando a pensar en la creación de un grupo, un espacio, supongo, para discutir estos temas, ¿correcto? Una especie de comunidad en torno a estas cuestiones en la que los bibliotecarios reflexionen sobre qué pueden hacer y qué sería lo mejor para sus instituciones.

Y luego trabajar juntos en ese espacio bien organizado.

Y otras organizaciones, como Library Freedom Project, han elaborado tablas de puntuación sobre la clasificación de diferentes productos, como qué productos te darán más privacidad, cómo están utilizando potencialmente tus datos estas empresas. Y estamos a punto de publicar una sobre investigación jurídica en la que hemos estado trabajando juntos. Es muy interesante.

Y es una herramienta muy útil para las personas preocupadas. Y realmente otras organizaciones como Library Futures, sólo piensan en otras intervenciones y otros caminos hacia adelante donde podemos seguir proporcionando recursos bibliotecarios y seguir siendo eficaces, incluso cuando estas empresas se alejan de su servicio tradicional de biblioteca.

We’ve found and seen activism around this issue in some interesting places. One of the most important is shareholders. So for one company, Thomson Reuters, which is a Canadian company, a large union that invests heavily in Thomson Reuters filed a shareholder resolution asking Thomson Reuters to identify and examine the financial risks of its involvement in ICE surveillance.

At the time the shareholder resolution was drafted, Thomson Reuters was the largest data broker for ICE’s surveillance infrastructure, and Thomson Reuters remains a very significant participant and holds multiple contracts with ICE for personal data.

So the shareholders demanded that Thomson Reuters investigate and report on the potential risk the company was taking on, and exposing its shareholders to, by being involved in those potentially human rights violating practices. And this, I think it was this year, Thomson Reuters realized it was going to have to do that report.

And now a report is going to be published on that, on data surveillance and ICE in particular, going forward. So yes, there has been effective activism from immigrant advocacy groups, but also from the shareholders of these companies.

And our academic institutions have a really big and important role to play in how these companies can access and collect our data.

And I think a lot of institutions aren’t even aware of the power they have. So much of the data these companies can collect through their research platforms depends on how much data academic institutions are collecting and how much data they allow to flow through these products.

I think it was in Dorothea Salo’s blog post, where she described the results of the records request she filed. But I think the way she framed it, the ProQuest platform couldn’t collect certain data about her because that data wasn’t collected by her institution, right? It wasn’t linked to her student ID. And so, if the institution doesn’t collect the data, then ProQuest can’t collect that data and see that data. Right.

So a lot of the time, the information your own academic institution is tracking will affect the kind of data that flows to these third-party companies, right?

And then, you know, to the metrics and whatever entities buy from these third parties; we’re not sure who they are.

One thing that has been coming up a bit, and that librarians especially have contacted me about, is that there is this kind of clash of interests at academic institutions, between libraries that want to ensure privacy for all of their users and the academic institutions themselves, which are worried about security and theft and want to track where people are on campus in case something gets stolen or in case someone is somewhere they’re not supposed to be.

They want to track who checks out an iPad and who, you know, logs in to a computer in the computer lab, or whatever, in order to make sure nobody is doing anything illegal on these platforms, right?

So there’s this clash between campus security and digital security and data privacy, because those two things don’t usually get along, right. One involves collecting a lot of data and doing a lot of tracking of students and faculty and people on campus. And then the other is the opposite of that. Right? We’re going to delete data. We’re not going to collect data.

So I think that, at academic institutions, there needs to be an open and frank discussion about where to draw those lines, how to balance the security interest and the privacy interest, and which values we care most about protecting.

I think it’s an important discussion. In fact, I’ve been contacted by some librarians whose campuses are putting spyware on all sorts of digital devices on their campus, and they’re doing it, their intent is to protect school property and protect students, right.

They think it’s a good protective policing measure, but librarians see it as an invasion of privacy and as another conduit for data that can eventually flow through these other systems. Because once a campus collects data, there is, I mean, apart from FERPA and certain privacy laws, which are very limited, really no way to ensure that the data doesn’t leak and spill over into other uses.

Safa: Recently, Dr. Lamdan was asked to testify before Congress about her research and expertise on this issue, as it considers some strategies for regulating the activities of these companies.

Sarah: It’s been a very long and hard road to pass any kind of data privacy legislation in Congress. Some states have been effective: California has passed some data privacy and data broker laws, and Vermont and a few other states have also moved toward starting these data broker registries, which would require companies like LexisNexis to register as data brokers and then also provide people with their data dossier so that people can see and correct their own data.

So at the state level there has been a kind of patchwork of activity, but federal activity on data privacy has been hard to come by.

However, I recently had the opportunity to speak before the House Judiciary Committee, which was considering a bill called the Fourth Amendment Is Not For Sale Act. It’s a bill that would impose warrant requirements on third-party data vendors like LexisNexis and other kinds of data vendors for facial recognition and geolocation data.

So basically what people are trying to do is make sure that if the government uses our data dossiers from these companies, from these data analytics companies, it first has to get a warrant, which involves going to a court and showing probable cause and getting a particularized warrant for a particularized reason. And right now none of those protections exist.

So it’s an interesting approach. It doesn’t solve the whole problem, but it’s a good step. And it’s an important step, right? The constitutional protection against unreasonable searches and seizures is important to our privacy and to due process.

So yes, that’s one of the more popular data privacy and data governance bills under consideration.

Honestly, it’s hard for me to connect that kind of narrow thing to this huge knowledge enterprise problem, because to me the Fourth Amendment Is Not For Sale Act is interesting, but it doesn’t come anywhere close to solving the problems we’re discussing here. Right?

It doesn’t, it doesn’t stop the whole knowledge enterprise from being overtaken by data analytics companies, which is kind of a bummer. I wish Congress would pay more attention to that problem, because I think it’s a big problem too.

And unfortunately, I think this is very Eurocentric, but most of the time the law or regulation people ask about is the GDPR in the EU, right? It’s their main data protection regulation. And it doesn’t solve the problems we’re describing here either, but it opens the door for consumers to see what kind of data these entities collect and to make sure the data being sold is correct. And to participate more in the pipeline of how our data is used and sold.

It’s an interesting model, one the United States hasn’t followed yet, but I think it’s a popular model internationally, and maybe other countries beyond the EU have put similar measures in place too. So these regulations don’t stop data from being collected. They don’t prohibit data from being sold, but they make the data collection process less opaque and allow more public participation in that process.

Sarah: What I want everyone to know, the main point of my research and the reason I wrote the book, is that I want people to understand that all of these information markets are connected and they are all controlled by the same companies. Legal research, academic research, financial data, and the collection and sale of personal data: all of these different services are provided by the same companies.

There are a few companies dominating all of the information markets, and we should be paying close attention to them. Because if we care about information flows and access to information and data privacy, these companies have a lot of control, even though we don’t talk about them.

We talk about the Big Five all the time. We talk about Facebook and Amazon and Google. But we don’t talk about Reed Elsevier LexisNexis, we don’t talk about Clarivate, we don’t talk about Thomson Reuters. These companies are also multibillion-dollar information and data giants that deserve scholars’ attention and regulators’ attention.

And we should all be keeping them in mind. That’s why I encourage people to read Data Cartels, because it describes and lays out the landscape pretty clearly.

My book Data Cartels comes out on November 8th, that’s its publication date, its birthday, and it can already be pre-ordered on the Stanford University Press website or wherever you buy books; it’s available on all the usual bookselling platforms.

Safa: Thank you so much for tuning in.

If you are provoked by what you heard today, we invite you to join us at the Knowledge Equity Lab. Together we can reimagine knowledge systems and build healthier relationships and communities of care that promote and enact equity at multiple levels.

*************

Knowledge Equity Lab

https://knowledgeequitylab.ca/podcast/s3-e2/

SEASON 3 EPISODE 2

Data Cartels and Surveillance Publishing

Over the last years, as the process of conducting research and scholarship has moved more and more online, it has become clear that user surveillance and data extraction has crept into academic infrastructure in multiple ways.

For those committed to preserving academic freedom and knowledge equity, it’s important to interrogate the practices and structures of the companies that are collecting and selling this data, and the impacts of this business model on academic infrastructure – and particularly on already marginalized and underfunded scholars and students.

To help us understand this landscape and its implications, today we are in conversation with Sarah Lamdan, author of the forthcoming book Data Cartels: The Companies That Control and Monopolize Our Information.

Transcript

Safa: You are listening to the Unsettling Knowledge Inequities podcast, presented by the Knowledge Equity Lab and SPARC – the Scholarly Publishing and Academic Resources Coalition.

Over the last years, as the process of conducting research and scholarship has moved more and more online, it’s become clear that user surveillance and data extraction has crept into academic infrastructure in multiple ways.

To help us understand this landscape and its implications, today we are in conversation with Dr Sarah Lamdan, author of the forthcoming book Data Cartels: The Companies That Control and Monopolize Our Information.

Sarah: My name is Sarah Lamdan. I’m a Professor of Law at the City University of New York School of Law. And I’m based in Long Island City.

In 2017 I was working as a Librarian. So I’m both a Professor of Law and I’m also a Librarian with a Master’s degree in Legal Information Management. So I was working in a law library and somebody sent me a news article that said: LexisNexis was one of the companies vying for a contract for ICE’s extreme vetting surveillance program.

ICE is Immigration and Customs Enforcement. So it’s actually a sub-agency under the Department of Homeland Security, and they’re thought of as kind of immigration police, immigration enforcement – and in the era where I learned about the ICE LexisNexis connection, ICE was involved in a lot of really ethically fraught activities. There was the whole revelation about, you know, child separation, and people were really concerned with the kind of human rights boundaries that Immigration and Customs Enforcement in the United States was crossing.

And as a Law Librarian, you use LexisNexis all the time. You use the legal platform all the time, and I didn’t understand how Lexis, which to me was a legal research platform, could be involved in surveillance, government surveillance and ICE surveillance – and that kind of opened up this line of questions and research that five years later has culminated in a book about data analytics companies like LexisNexis, Reed Elsevier LexisNexis.

So when I started researching, I was aware of the fact that LexisNexis was a legal information platform. You know, part of this duopoly that Thomson Reuters and RELX share, where they are the main legal information providers in the United States. Then I learned that Reed Elsevier LexisNexis is also the umbrella company for Elsevier, which is the biggest academic information and research company in the world.

And then also I learned that LexisNexis is one of several major government data brokers that sells our personal information to the government. And they’re also a major financial data provider that provides data to, you know, banks and insurance companies and other financial institutions.

And I started to see that RELX and a few other major data companies were occupying and kind of monopolizing multiple information markets. And I also saw how companies like Reed Elsevier LexisNexis and Thomson Reuters kind of stifle competition, and kind of act in lockstep with the other members of their kind of monopolizing group of companies, and they behave kind of like cartels that, you know, implicitly – it’s always implicit – act in concert to stave off competition and to kind of maintain and gain more control of big information markets, like academic information, legal information, personal data.

So my book discusses the dangers of letting several data analytics companies monopolize and act like cartels in information markets.

It’s kind of complex because these companies do different things in different markets, right? And each of the different markets has different names. Like the data brokering market, we usually think of that being like dossiers or bits and pieces of our personal data, right. Companies that sell facial recognition data or geolocation data, or, you know, just robust data dossiers with public and private records and social media records.

That whole kind of personal data industry is oftentimes called the data brokering industry, but we don’t call academic information platforms, we don’t call them academic data brokers. Right. We call them research platforms or even academic journals. Right. And legal research platforms are computer assisted legal information services. Right?

So each of the markets has different names. But really when you get down to it, what all of these markets are doing and what I think all of these companies have in common is that they’re all selling information, whether it’s personal information, academic information, legal information, and data – data points about all of those different things, to different consumer groups, right?

To academics, to lawyers, to surveillance enterprises.

So these companies are all information and data purveyors across multiple markets. Regardless of what kind of jargony names we give each of those markets.

Safa: In order to better understand how we got here, it’s important to note how the shift of academic information from paper resources to digital platforms has opened the door to a decrease in privacy and increase in surveillance practices.

Sarah: The transition of academic information from paper resources like journals, you know, in stacks and shelves and libraries to these digital platforms has opened up more informational opportunities for companies like Reed Elsevier LexisNexis, and all of the companies that are vending and selling this kind of information.

Because platforms – they can provide information and structure information and create databases that can be searched in certain ways, but they can also collect information, right. They can collect user information. So these platforms are these walled garden infrastructures where users have to log in, right. In order to fully use ScienceDirect or fully use LexisLaw, you have to use a password or affiliate with an institution that somehow is an identifier for you, whether it’s your personal identifier, you know, like Sarah Lamdan, or whether it’s your institutional identifier, CUNY School of Law. Right. You have to flag who you are.

And then as you move around the database, you click on things, you look at things, you download things. You know, you click on hyperlinks. All of that data about you can be tracked and collected. And that data is really valuable. Personal data is really valuable, right?

It can be used to evaluate the impact of certain journals, you know, which journals are getting more clicks, which information is more popular. But it can also be, you know, used to identify you as a researcher, right? This is what you research. This is who you affiliate with. And those, you know, that’s just two examples of the way your information is valuable.

So companies can now take all of your information and make that into a whole new product to sell – whether it’s impact factor projections or suggestions to institutions about what kind of research they should fund, or which researchers are the most productive.

But also information about you, you know, are you a good bet for tenure? What institution should hire you? Different evaluations about you as well as research and the institutions themselves. So that information can be sold to a whole new array of consumers. It can be sold to tech firms and pharmaceutical firms and other companies that are investing in research and profiting from research, and it can be sold to academic institutions and grant funders.

So it opens up a whole new venue for consumers and profits.

Safa: While it’s true that many of us voluntarily sign up to use these platforms, we often don’t fully understand the terms and conditions we are consenting to – nor have a real option to opt out, if we want to continue participating in the academic research ecosystem.

Sarah: In a sense, we can all assume that when we’re using any platform online, whether it’s a social media platform or research platform, we all, at some point, click assent to some sort of agreement, right.

We all agree to some terms of service. We click, I agree. Or we fill out some sort of, you know, our names and email addresses or whatever, some other identifying information to make some sort of password for ourselves. So in that sense, like, yes, we, we are notified. But as we all know, those notifications are never clear. They’re never straightforward and there’s really no clear way to see how your data is being collected and how it’s being used. Even with those kinds of terms of services being made available to you or you consenting to them.

So in some ways, yeah, we probably know that we are being tracked, but we don’t know what that tracking entails. The actual tracking and then the actual use of our data, those are both very opaque processes that don’t have a lot of transparency around them.

So, Dorothea Salo, who is a researcher at University of Wisconsin, I believe, she actually was able to see kind of the backend of some of the data that was being collected on ProQuest, which is now a Clarivate data analytics entity.

And she actually filed a freedom of information request in her state to get the kind of backlogs to see what data was being collected about her.

And the interesting part was that you could see all the types of data, like, on the top of the spreadsheet about what kind of data could be collected. And it was pretty invasive. It’s your name, your gender, your address. What you click on, how long you look at it, what time of day you’re clicking on, you know, you’re logging in, your institution, just all sorts of categories of information that could potentially be collected.

Now, a lot of what data gets collected depends on your institution because your institution might determine what kind of data is associated with your student ID and how much data is associated with your student ID – and your student ID might then be the link point between you and your login to ProQuest and other school affiliated platforms.

So it varies from institution to institution, but the capacity is to collect all sorts of identifying information, depending on how much your school gives them and how much you personally give them in exchange for your login. And one interesting thing that’s starting to happen, you see it happening with Reed Elsevier LexisNexis as an example: these companies, these data analytics companies, are acquiring other companies that provide other data sets.

So one thing that SPARC has actually been doing great work on is following the acquisitions of companies like Interfolio. So Interfolio is a whole separate product that is used by both grant funders and academic institutions as portals for job applicants and grant recipients to insert all sorts of personal information about themselves.

So, you know, in order to apply for a job at an institution, you might use Interfolio as your hub, and you might upload your CV into it, all of your letters of recommendation. If you’re using a grant funder that uses Interfolio, you might be directing a lot of data through Interfolio about how you’re spending the grant money and how your grant proposal is being carried out in a step by step way with regular updates.

And now that kind of data also flows through to Elsevier, which is a research platform, right? So it’s not just when you log in now, that your data is going to the same central data hub. It’s also through these other products that companies like Clarivate and companies like Reed Elsevier LexisNexis are acquiring.

And again, we can’t see how they are using the data we don’t know. Right. Because it’s very opaque. It’s very non-transparent. They would probably say that whatever algorithms they’re using or whatever processes they’re using are trade secret protected, you know. Like how they develop their impact factors and their data sets and how they use them is their personal, proprietary information also. So we can’t know how data is being used, because there’s no regulation requiring these companies to be transparent. There’s no rules about how these companies use our data or very few rules.

So we’re not sure what’s happening with our data, but we do know it’s coming from us when we log in and do research on the platforms. But they’re also getting our personal data through all sorts of other sources.

Like if you think about it, Reed Elsevier LexisNexis, the LexisNexis part of that company, is also a major data broker for ICE, and for hundreds of other law enforcement agencies and government agencies. So they also have those data dossiers about all of us, right?

So they have our academic data over here. They have our data that they’re selling to law enforcement over here. So there’s just a lot of personal data flowing through these companies that also sell us our research products.

Safa: This type of user surveillance and personal data extraction has become a leading business model within academic infrastructure, in North America and more and more so globally.

Sarah: I think other companies are looking to this model as a good way to proceed in the future.

I mean, one of the things about the publishing industry is that it has suffered over the last few decades. As information has become more readily available online, it has suffered financial issues and, you know, kind of new sources of profit seeking, as its models are changing, right?

Once libraries stop buying paper journals, companies like, you know, Elsevier, Springer, what have you, have to rethink how they’re going to do business and ensure that they’re going to continue to be profitable. So I think a lot about, and I study, this vertical kind of growth into these other data industries that are, like, flowing through the system.

But one thing Leslie Chan’s research does is also show this horizontal kind of opening of new information markets. So where Elsevier really used to get its main profits, get its main, you know, income flow at the point of sale. So Elsevier would make a journal, would get these editors to edit the materials, do peer review and then Elsevier would make money when the final journal was available and they could sell that final journal to libraries. But now Elsevier has figured out how to derive profits from the pre publication process.

So platforming preprints on like BPress or other, you know, preprint services. SSRN is now a Reed Elsevier LexisNexis company. So they’ve found ways to profit on the pre-print side and then also to develop those impact factors and those post publication data and metrics services that it can then sell post publication. Right?

So there’s also been this opportunity to profit horizontally in more places than just the point of sale. And I guess this loops back to what I was saying previously about how the companies are figuring out new ways to profit in this changing publication model, you know, in this transition from paper to digital information services.

So yeah, I think what flows through that horizontal growth that Leslie Chan describes and then also of the acquisitions of Interfolio, of Clarivate purchasing ProQuest to get its data cloud. All of those things have this common thread of using personal data to enrich these information products and to make new information products to sell.

Safa: You might be asking yourselves, what does this mean for faculty? How alarmed should faculty be by this?

Sarah: A lot of times when I talk to faculty and librarians about this issue, people are like, well, we all use Facebook, right? Our data’s already out there. Who cares, right? Who cares if Elsevier wants to act like Facebook and suck up our data and use our data?

But I think that it’s of a special concern for academic information for several reasons. The first reason is that we are giving a lot of power to these private companies who aren’t scientists and who aren’t experts in various humanities or science fields, to determine what academic enterprises get funded, which get valued for having higher impacts, lower impacts, right?

They assign value to academic and knowledge products that they aren’t experts in. Right. They take a lot of power away from academia to control the academic process and the whole knowledge enterprise, right? Like the way we develop and support academia and academic pursuits.

And then another huge problem with using data this way, and academic data this way, is it takes kind of the systemic biases built into academia, and it perpetuates them and embeds them into digital systems. Right. So we all know that academia has traditionally ignored the work of people of color, of women, of institutions that aren’t considered as ‘elite’.

So, if you are a white man working at Harvard Law School, your scholarship is gonna get more attention than my scholarship, right? At CUNY School of Law, as a woman. And we know that that is even worse if you are a person of color at a historically Black college or university, or you’ve published in a journal that has certain terminology in its title versus, like, Harvard Law Review.

So these biases don’t disappear in electronic formats – and then also in the kind of data that Elsevier and Clarivate rely on to make metrics and to make these predictions and prescriptions that their data analytics products formulate, and that they can then sell to hiring institutions and to all the funding companies and hiring companies in the world.

That also is really harmful, because when all of this data accumulation and collection, and then all these data analytics happen in an opaque way that’s not transparent, we can’t even see how this bias is happening, what kind of biases are happening and then give advice or instruction about how to stop and break down these biases, and create a better, more equal, and more expansive ecosystem for academia.

So you know now that when you research on any of these platforms, your data is being collected. There’s no truly unsurveilled, private research that can happen in these ecosystems as they are currently formulated.

So that is problematic for academic and intellectual freedom, right? Just a lack of privacy and then also, as far as academic freedom goes, the more we rely on these platforms and systems to do the work of academia and to do the work of research and to do the work of governing hiring processes and tenure processes and, sorting through who gets higher metrics, who gets lower metrics. The more we concede that to these companies, the less control we have and the less freedom we have, in our work, right? So we’re giving away our privacy and we’re also giving away our ability to control our academic destinies, and to understand how these decisions are being made.

Safa: Amongst faculty, library professionals particularly face a tension between their commitment to the privacy of their users and their role in purchasing products and services from these companies.

Sarah: It’s interesting because along with being a law professor who can say I’m very separate from that, you know, oh, I don’t have to worry about that – I’m also a librarian. So I’m very, intimately and first person aware of how these kinds of problems make it harder for libraries to do their jobs. Right.

It creates a real ethical problem where librarians have to balance providing much needed services to their patrons, and then also ensuring that their patrons have privacy, that people who use their libraries, whether it’s academics or the public, that they also get to maintain their privacy. Cause right now both of those things can’t happen at once.

And that is a real clash point for librarians. It makes librarians’ jobs really hard and it presents these really difficult, if not impossible, choices to libraries, where they have to decide: am I going to provide these journals that my faculty and that my students rely on?

Or am I going to say, “We can’t use these platforms because you are collecting our data and we don’t know if you’re selling that data, or we don’t know how you’re using that data”? Right.

It’s really hard, especially I’m a law librarian, so law libraries that rely heavily on Westlaw and Lexis don’t feel like they can just turn around and tell their students and faculties and lawyers, Hey, we’re not gonna use these products anymore. We’re not gonna contract for these products anymore.

There could be regulations or rules that create those types of safeguards and protections, but right now there are not.

Safa: Another area of concern is the relationship of these data brokers who operate in the academic publishing space with government entities. The personal data they sell can contribute to the violation of human rights, racial profiling and other forms of violence and harm.

Sarah: ICE has gotten the most attention because their work has really raised a lot of red flags, and their surveillance infrastructure has been used to separate children from their parents and do really harsh, abusive things. And also there’s this movement called No Tech for ICE that’s led by these really excellent immigration advocacy organizations that have done a really good job of shining a light on how ICE uses these systems.

But these systems aren’t just being used by ICE. They’re being used by local, state and federal law enforcement. Most government agencies have some sort of LexisNexis special services contract, or other data contract. Right.

They’re being used by the IRS to sift out who is more likely to commit tax fraud. They’re being used by the United States Postal Service, right? It uses our social media data and these brokers to figure out who might be committing mail fraud.

They’re serving as these major data centers for local, state and federal police forces to just pool and use each other’s data and to be able to link and use the data systems that these companies provide.

They’re being used by insurance companies, tenant screening companies, employment screening companies, healthcare systems that are fighting the opioid war, like that’s what they like to say that they are doing.

So all of these systems, both major public institutions and major private institutions, make really big decisions about our lives, right? About what kind of services we can get, whether we can access our bank accounts, whether we can access healthcare and certain types of medications and medical interventions. All of these systems are using third party data analytics companies to help them in their work. And that includes LexisNexis and Thomson Reuters, companies that also provide major research platforms.

Safa: Of great concern is the fact that this infrastructure is built on and digitizes already existing inequities, putting at risk and serving to further marginalize already underfunded students and scholars.

Sarah: The first risk is embedding those systems of injustice, historical bias, racism, xenophobia, what have you, into digital systems by using data that favors certain institutions, certain professors. It generously overrepresents some types of scholarship and then underrepresents others, you know?

And you think about impact factors or, you know, other metrics that just infuse historic bias into these digital systems.

But then the secondary type is if you are being surveilled – surveillance, especially government surveillance tends to disparately impact certain communities, certain types of institutions and certain individuals, right?

This is a problem that happened in New York City that, you know, people in my law school did a lot of work around: Muslim students were being surveilled, you know, uniquely surveilled, in the wake of September 11th in New York City. And there is actual evidence that that occurred.

And so these types of systems collecting data, if they choose to hand that data over to law enforcement, to other surveillance forces, also is more likely to harm certain academics and certain.

And without real concerted efforts to create equal playing field and to not gravitate just towards certain things again and again – you see how certain tech companies become the major tech companies, certain academic institutions become the major academic institutions. Certain academic focuses become the major focuses to the detriment of everything else.

So you know, a lot of people love thinking about artificial intelligence, even though there’s a lot of evidence that artificial intelligence isn’t real, that computers can’t be sentient, that we’re not there yet. And we may never be there with replacing humans with artificial intelligence. Artificial intelligence is a buzzword, it’s profitable.

So imagine academic studies that use the phrase artificial intelligence, and are done at particular institutions that work closely with, like, Facebook or other big tech companies. Those studies get clicked on a lot more, especially by, you know, Twitter and Facebook – like their researchers, and they’re well funded and they’re in there using that work and perpetuating that work. So that work gets an artificially high or just a very high impact factor. And it looks like it is very popular.

So the data analytics companies, you know, Clarivate, Scopus, what have you, and Elsevier data analytics and all these data analytics companies predict that artificial intelligence research is going to be the most profitable, gonna be the next big thing.

So before you know it major academic institutions are spending all of their money on artificial intelligence labs. Right. And academics who decide to go into artificial intelligence have a much better chance at getting postdocs and getting jobs at these big companies, you know, that’s seen as like the big type of research. So all of the money, all of the talent, all of the interest is being siphoned in one direction by these data analytics companies. Right. So what happens to climate change research? What happens to research you know, critiquing AI and saying, Ooh, maybe AI is not gonna, you know, maybe that’s not the wave of the future, maybe that’s a mistake or that’s not realistic.

If we let Scopus decide that, or, you know, Clarivate decide that instead of letting the actual academics decide that, I think that just doesn’t bode well for academic decision making and the knowledge enterprise.

Safa: All this being said, there are many groups of stakeholders who are organizing and pushing back against the consolidation of power and untransparent and harmful practices of these companies.

Sarah: So it’s really inspiring, especially among librarians who obviously they’re, we are kind of the first line people noticing that this is happening and there’s really a lot of concerted thought and concerted effort around this.

I think we’re still at kind of this phase where we’re informing everybody about the problem, right.

I talked to a lot of different people this year and every room that I walk into – virtual or real, there are people there who haven’t heard about this problem yet. Right.

So I think we’re still at the phase where we’re explaining these problems to everyone. Right.

My research is fairly new, Leslie’s research is fairly new, right? So I think now there’s a critical mass of people who are aware of the problems. So now we’re kind of entering this next phase, which is really exciting – organizations like Library Futures and Library Freedom Project are starting to think about, now that we recognize that this is a problem, that this data analytics transition that’s happening with our publishers is a threat to privacy, is a threat to the knowledge enterprise, what do we do next?

And I think, you know, SPARC is starting to think about setting up a group, a space, I guess, to discuss these issues, right? Kind of a community around these issues that will be librarians thinking about – authentically, like what can they do and what would be best for their institutions?

And then working together in that well organized space.

And other organizations like Library Freedom Project have made scorecards about rating different products, like which products will give you more privacy, how are these companies potentially using your data? And we’re about to release one about legal research that we’ve been working on together. So that’s really exciting.

And it’s a really useful tool for people who are concerned. And really other organizations like Library Futures, just thinking about other interventions and other paths forward where we can continue providing library resources and continue being effective, even as these companies transition away from their traditional library service.

We have found activism and seen activism around this topic popping up in some interesting places. And one of the big ones is shareholders. So for one company, Thomson Reuters, which is a Canadian company, a big union that invests heavily in Thomson Reuters, filed a shareholder resolution asking for Thomson Reuters to identify and examine the financial risks of its participation in ICE surveillance.

At the time that the shareholder resolution was drafted, Thomson Reuters was the largest data broker for ICE’s surveillance infrastructure, and Thomson Reuters is still a very large participant and has multiple contracts with ICE for personal data.

So the shareholders demanded that Thomson Reuters investigate and report out the potential risk that the company was taking on and exposing its shareholders to as a result of being involved in those potentially human rights abusing practices. And this, I believe it was this year, Thomson Reuters realized that it was going to have to do that reporting.

And now it will be putting out a report about its involvement with data surveillance, and particularly ICE, in the future. So yeah, there’s been effective activism from immigrant advocacy groups, but also from shareholders in these companies.

And our academic institutions have a really important and large role to play in the way these companies can access and collect our data.

And I think a lot of institutions aren’t even aware of how much power they wield. So a lot of how much data these companies can collect through their research platforms depends on how much data academic institutions are collecting and how much data they’re allowing to flow through to these products.

I think it was Dorothea Salo’s blog post, where she described the results of the Interfolio request she filed. But I think the way she put it, the ProQuest platform couldn’t collect certain bits of data about her because that data wasn’t collected by her institution, right? It wasn’t attached to her student ID. And so if the institution doesn’t collect the data, then ProQuest can’t collect that data and see that data. Right.

So a lot of times the information that your own academic institution is tracking will impact what kind of data is flowing out to these third party companies, right?

And so then, you know, the metrics and whatever entities are purchasing from these third parties, we’re not sure who they are.

One thing that has been cropping up a bit and that librarians especially have contacted me about is there’s this kind of clash of interest in academic institutions, between libraries who want to ensure privacy for all of their users and the academic institutions themselves who are concerned with security and theft – they wanna track where people are on campus in case something is stolen or in case somebody is somewhere where they’re not supposed to be.

They want to track who checks out an iPad and who, you know, logs onto a computer at the computer lab, or what have you, in order to ensure that nobody’s doing anything illegal on these platforms, right?

So there’s this clash of campus security and digital security and then data privacy, because those two things don’t usually get along, right. One involves collection of a lot of data and a lot of tracking of students and faculty and people on campus. And then the other one is the opposite of that. Right? We’re going to expunge data. We’re not going to collect data.

So I think having just an open frank discussion in academic institutions about where to draw those lines, how do we balance the interest of security and the interest of privacy and which values are we more interested in protecting?

I think that’s an important discussion to have. So I’ve actually been contacted by some librarians whose campuses are just all out inserting spyware into all sorts of digital devices on their campus and they’re doing it – their intent is to protect school property and to protect students, right.

They think that that’s a good kind of policing protecting measure, but librarians see it as an invasion of privacy and as another conduit for data that may eventually flow through to these other systems, because once a campus collects data, there’s no, I mean, besides like FERPA and certain privacy laws, that are very limited, there’s really no way to ensure that that data won’t creep out and spill out into other uses.

Safa: Recently, Dr. Lamdan was asked to testify in front of Congress about her research and expertise on this issue, as they are considering some strategies for regulating the activities of these companies.

Sarah: It has been a really long, tough road to pass any sort of data privacy legislation in Congress. Some states have been effective, California has passed some data privacy and data broker legislation and Vermont and some other states have also moved in the direction of starting these like data broker registries that would require companies like LexisNexis to register as data brokers and then also provide people with their data dossier so that people can see and correct their own data.

So there has been, on a state level, there’s been kind of a patchwork of activity, but federal data privacy activity has been hard to come by.

However, I recently had the opportunity to speak to the House Judiciary Committee and they were considering a law called the Fourth Amendment is Not for Sale Act. So that bill is an effort to impose warrant requirements on third party data providers like LexisNexis and other types of facial recognition, geolocation data providers.

So basically what people are trying to do is ensure that if the government uses our data dossiers from these companies, from these data analytics companies, they first have to get a warrant which involves going to a court and showing probable cause and getting a particularized warrant for a particularized reason – and right now none of those protections exist.

So it’s an interesting approach. It doesn’t solve the whole problem, but it’s a good step. And it’s an important step, right? The constitutional protection against unwarranted searches and seizures is important for our privacy and for due process.

So yeah, that’s one of the more popular data privacy and data governance type of laws that’s being considered.

Honestly it’s hard for me to link that kind of narrow thing with this huge problem of the knowledge enterprise, because to me the Fourth Amendment is Not for Sale Act is interesting, but it doesn’t come anywhere close to solving the problems that we are discussing here. Right?

It doesn’t, it doesn’t prevent the whole knowledge enterprise from being overtaken by data analytics companies, which is kind of a bummer, which I wish Congress would pay more attention to that problem, because I think it’s also a huge problem.

And unfortunately I feel like this is very Eurocentric, but most often the law, or the regulation, that people ask about is the GDPR in the EU, right? It’s their major data protection regulation. And it also does not solve the problems that we’re describing here, but it does kind of open the door for consumers to see what kind of data is being collected by these entities and to ensure that the data that they’re selling is correct. And to kind of participate more in the pipeline of our data use and data sales.

It’s an interesting model, one that the US has not yet followed, but I think it’s a popular model internationally that maybe other countries beyond the EU have also implemented similar kinds of activities. So these regulations don’t stop the data from being collected. They don’t prohibit the data from being sold, but they do make the data collection process less opaque and they allow for more public participation in that process.

Sarah: The one thing I want everyone to know – the main point of my research and the whole reason I wrote the book is I want people to understand that all of these information markets are connected and they’re all being controlled by the same several companies. Legal research, academic research, financial data, and personal data collection and sale – all of those different services are all being offered by the same several companies.

There are just a few companies that are overtaking every informational market, and we should pay really close attention to those companies. Because if we are interested in information flows and information access and data privacy, these companies have a lot of control, even though we don’t discuss them.

We discuss the big five all the time. We discuss Facebook, we discuss Amazon and Google. But we don’t talk about Reed Elsevier LexisNexis, we don’t talk about Clarivate, we don’t talk about Thomson Reuters. These companies are also multi-billion dollar informational giants and data giants that deserve the attention of academics, that deserve the attention of regulators.

And we should all be mindful of them. And that’s why I would encourage people to read Data Cartels because it describes and lays out that landscape pretty clearly.

So my book Data Cartels is coming out November 8th, that’s its publication date, its birthday, and it is available for pre-order on the Stanford University Press website or wherever you buy your books. It’s available on all the common book selling platforms for pre-order now.

Safa: Thank you so much for tuning in.

If you are provoked by what you heard today, we invite you to join us at the Knowledge Equity Lab. Together we can fundamentally reimagine knowledge systems and build healthier relationships and communities of care that promote and enact equity at multiple levels.

Please visit our website, sign up for our mailing list, follow us on social media and send us a message to get involved!
